Migrate from Matomo to PostHog

Prior to starting a historical data migration, ensure you do the following:

  1. Create a project on our US or EU Cloud.
  2. Sign up to a paid product analytics plan on the billing page (historic imports are free but this unlocks the necessary features).
  3. Raise an in-app support request with the Data pipelines topic detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
  4. Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
  5. Set the historical_migration option to true when capturing events in the migration.

Migrating data from Matomo is a two-step process:

  1. Exporting data via Matomo's API
  2. Converting Matomo event data to the PostHog schema and capturing in PostHog

1. Exporting data from Matomo

Matomo provides a full API you can get data from. To get the relevant event data, we make a request to the Live.getLastVisitsDetails method.

To access the API, you need to create an API token. This can be done in your instance's security settings.


With this token, you can make a request to get your event data and save it as a JSON file. This example gets up to 10,000 visits, each with its actions, from the first half of 2024:

Python
import requests
import json

base_url = "https://cool.matomo.cloud/"
endpoint = "?module=API&method=Live.getLastVisitsDetails"
site_id = "1"
period = "range"
date = "2024-01-01,2024-07-01"
response_format = "JSON"
token_auth = "11798cde5ksjadjslc3136dd09"
filter_limit = "10000"

url = f"{base_url}{endpoint}"
payload = {
    "idSite": site_id,
    "period": period,
    "date": date,
    "format": response_format,
    "token_auth": token_auth,
    "filter_limit": filter_limit
}

response = requests.post(url, data=payload)

if response.status_code == 200:
    print("Request was successful")
    events = response.json()
    # Save the exported visits for the conversion step
    with open('events.json', 'w') as json_file:
        json.dump(events, json_file, indent=4)
else:
    print(response.text)
    print(f"Request failed with status code: {response.status_code}")

To return all rows, set filter_limit to -1. Matomo recommends you only export 10,000 rows at a time. If you have more than that, you can combine filter_limit and filter_offset to fetch the rows in batches.
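As a rough sketch of that batching, the loop below keeps requesting pages until one comes back with fewer rows than the page size. The helper names `fetch_page` and `export_all` are hypothetical, the site, date range, and token are placeholders, and it uses the standard library's `urllib` rather than `requests`. It also assumes that a failed call returns a JSON object rather than a list of visits.

```python
import json
import urllib.parse
import urllib.request

PAGE_SIZE = 10000  # Matomo's recommended maximum rows per request


def fetch_page(offset, limit=PAGE_SIZE):
    """Fetch one page of visits from Matomo (all values are placeholders)."""
    params = {
        "idSite": "1",
        "period": "range",
        "date": "2024-01-01,2024-07-01",
        "format": "JSON",
        "token_auth": "<matomo_token>",
        "filter_limit": str(limit),
        "filter_offset": str(offset),
    }
    url = "https://cool.matomo.cloud/?module=API&method=Live.getLastVisitsDetails"
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(url, data=data) as response:
        return json.load(response)


def export_all(fetch=fetch_page, page_size=PAGE_SIZE):
    """Request pages until one comes back with fewer rows than page_size."""
    visits = []
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not isinstance(page, list):
            # Assumption: errors arrive as a JSON object instead of a list
            raise RuntimeError(f"Unexpected response: {page}")
        visits.extend(page)
        if len(page) < page_size:
            return visits
        offset += page_size
```

Passing the fetch function as an argument keeps the paging logic separate from the HTTP call, so you can swap in a version with retries or a delay between requests.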

2. Converting Matomo event data to the PostHog schema

The schema of Matomo's exported event data is similar to PostHog's schema, but it requires conversion to work with the rest of PostHog's data. You can see details on Matomo's schema in their docs and events and properties PostHog autocaptures in our docs.

The big difference is that Matomo structures their data around visits that contain one or more actions. Matomo actions are similar to PostHog's events. You can go through each visit and convert it to PostHog's schema by doing the following:

  1. Converting properties like operatingSystemVersion to $os_version.
  2. Omitting properties that aren't relevant, like visitEcommerceStatusIcon, plugins, timeSpentPretty, and many more. Matomo includes many more properties on their events than PostHog does.
  3. Looping through the actionDetails, converting:
    • Properties like url to $current_url
    • Action types like action to event names like $pageview
    • Action timestamp to an ISO 8601 timestamp
  4. Adding visit properties to the action properties
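To make the action-level conversion in step 3 concrete, here is a minimal sketch that maps one Matomo action to an event name and an ISO 8601 timestamp. The helper name `action_to_event` and the sample action are hypothetical. It assumes the action's timestamp is a Unix timestamp in seconds, and that any other action type can be used directly as the event name.

```python
from datetime import datetime, timezone


def action_to_event(action):
    """Return (event_name, iso_timestamp), or None if the action is skipped."""
    action_type = action.get("type")
    if action_type == "goal":
        # Goals duplicate another action in the same visit, so skip them
        return None
    if action_type == "action":
        name = "$pageview"
    elif action_type == "event":
        name = action.get("eventAction")
    else:
        # Assumption: other action types are used as the event name as-is
        name = action_type

    timestamp = action.get("timestamp")
    iso_timestamp = None
    if timestamp:
        # Interpret the Unix timestamp as UTC and format it as ISO 8601
        iso_timestamp = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    return name, iso_timestamp


sample_action = {"type": "action", "url": "https://example.com/", "timestamp": 1704067200}
print(action_to_event(sample_action))
```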

Once this is done, you can capture each action into PostHog using the Python SDK or the capture API endpoint with historical_migration set to true.

Here's an example version of a Python script:

Python
from posthog import Posthog
from datetime import datetime
import json

# historical_migration=True marks captured events as a historical import
posthog = Posthog(
    '<ph_project_api_key>',
    host='https://us.i.posthog.com',
    debug=True,
    historical_migration=True
)

# Matomo property names mapped to their PostHog equivalents
key_mapping = {
    'url': '$current_url',
    'browserName': '$browser',
    'browserVersion': '$browser_version',
    'operatingSystemName': '$os',
    'operatingSystemVersion': '$os_version',
    'deviceType': '$device_type',
    'referrerUrl': '$referrer',
    'city': '$geoip_city_name',
    'region': '$geoip_subdivision_1_name',
    'regionCode': '$geoip_subdivision_1_code',
    'country': '$geoip_country_name',
    'countryCode': '$geoip_country_code',
    'continent': '$geoip_continent_name',
    'continentCode': '$geoip_continent_code',
    'latitude': '$geoip_latitude',
    'longitude': '$geoip_longitude',
    'visitIp': '$ip',
    'siteName': '$host',
    'languageCode': '$browser_language'
}
# Matomo properties that are dropped during conversion
omitted_keys = [
    "accountId",
    "iconSVG",
    "idpageview",
    "pageIdAction",
    "pageviewPosition",
    "pageTitle",
    "serverTimePretty",
    "subtitle",
    "timeSpentPretty",
    "timestamp",
    "actions",
    'browser',
    'browserCode',
    'browserFamily',
    'browserFamilyDescription',
    'browserIcon',
    'countryFlag',
    'deviceTypeIcon',
    'events',
    'experiments',
    'fingerprint',
    'idSite',
    'idVisit',
    'language',
    'lastActionDateTime',
    'location',
    'operatingSystem',
    'operatingSystemCode',
    'operatingSystemIcon',
    'plugins',
    'pluginsIcons',
    'referrerSearchEngineIcon',
    'referrerSocialNetworkIcon',
    'serverDate',
    'serverDatePretty',
    'serverDatePrettyFirstAction',
    'serverTimePrettyFirstAction',
    'serverTimestamp',
    'sessionReplayUrl',
    'siteCurrency',
    'siteCurrencySymbol',
    'siteName',
    'totalAbandonedCarts',
    'totalAbandonedCartsItems',
    'totalAbandonedCartsRevenue',
    'totalEcommerceConversions',
    'totalEcommerceItems',
    'totalEcommerceRevenue',
    'userId',
    'visitConverted',
    'visitConvertedIcon',
    'visitCount',
    'visitDuration',
    'visitDurationPretty',
    'visitEcommerceStatus',
    'visitEcommerceStatusIcon',
    'visitLocalHour',
    'visitLocalTime',
    'visitServerHour',
    'lastActionTimestamp',
    'firstActionTimestamp',
    'visitorId',
    'visitorTypeIcon',
    'icon'
]
with open('events.json', 'r') as file:
    events = json.load(file)

# Each exported item is a Matomo visit containing one or more actions
for event in events:
    distinct_id = event.get("userId") or event.get("visitorId")

    # Build the visit-level properties shared by every action in the visit
    visitProperties = {}
    for key, value in event.items():
        if value == '' or value is None:
            continue
        elif key in omitted_keys:
            continue
        elif key in key_mapping:
            if key in ['browserVersion', 'longitude', 'latitude']:
                visitProperties[key_mapping[key]] = float(value)
            else:
                visitProperties[key_mapping[key]] = value
        elif key == 'actionDetails':
            # Handle actionDetails separately
            continue
        elif key == 'resolution':
            # Only parse values shaped like WIDTHxHEIGHT
            if 'x' in value:
                width, height = value.split('x')[:2]
                visitProperties['$screen_width'] = int(width)
                visitProperties['$screen_height'] = int(height)
        elif key == 'referrerKeyword':
            if value != 'Keyword not defined':
                visitProperties['referrerKeyword'] = value
        else:
            visitProperties[key] = value

    # Convert each action in the visit into a PostHog event
    actionDetails = event.get('actionDetails', [])
    for action in actionDetails:
        actionProperties = {}
        for key, value in action.items():
            if value == '' or value is None:
                continue
            if key in omitted_keys:
                continue
            if key in key_mapping:
                actionProperties[key_mapping[key]] = value
            else:
                actionProperties[key] = value
        actionProperties.update(visitProperties)

        ph_event_name = action.get('type')
        if ph_event_name == 'action':
            ph_event_name = "$pageview"
        if ph_event_name == 'event':
            ph_event_name = action.get('eventAction')
        if ph_event_name == 'goal':
            # Goal will be a duplicate of another action, so skip it
            continue

        # Matomo stores the action time as a Unix timestamp
        ph_timestamp = action.get("timestamp")
        if ph_timestamp:
            ph_timestamp = datetime.utcfromtimestamp(ph_timestamp)

        posthog.capture(
            distinct_id=distinct_id,
            event=ph_event_name,
            properties=actionProperties,
            timestamp=ph_timestamp
        )
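If you would rather use the capture API endpoint than the Python SDK, the sketch below shows one way the request body might be assembled. The payload shape (an `api_key`, a top-level `historical_migration` flag, and a `batch` list sent to `/batch/`) reflects our understanding of the batch endpoint and should be checked against the API docs before use. The helper names `build_batch_payload` and `send_batch` are hypothetical.

```python
import json
import urllib.request


def build_batch_payload(api_key, events):
    """Build a batch body from (distinct_id, name, properties, iso_timestamp) tuples."""
    batch = []
    for distinct_id, name, properties, timestamp in events:
        batch.append({
            "event": name,
            # Assumption: distinct_id travels inside the event properties
            "properties": {**properties, "distinct_id": distinct_id},
            "timestamp": timestamp,
        })
    return {"api_key": api_key, "historical_migration": True, "batch": batch}


def send_batch(payload, host="https://us.i.posthog.com"):
    """POST the payload to the batch endpoint and return the HTTP status."""
    request = urllib.request.Request(
        f"{host}/batch/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Sending events in batches at the rate you agreed with the support team helps keep the migration within the approved volume.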
