This is a Singer tap that produces JSON-formatted data following the Singer spec.
This tap:
- Pulls raw data from the Mixpanel Event Export API and the Mixpanel Query API.
- Supports the following two servers:
  - Standard Server
  - EU Residency Server
- Extracts the following resources:
  - Export (Events)
  - Engage (People/Users)
  - Funnels
  - Annotations
  - Cohorts
  - Cohort Members
  - Revenue
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
- Uses date-windowing to chunk/loop through the export, revenue, and funnels endpoints.
- Incorporates an attribution window for latency look-back to accommodate delays in data reconciliation.
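The date-windowing and attribution-window behavior described above can be sketched as follows. This is a minimal illustration, not the tap's code: it assumes day-granular, inclusive from_date/to_date values, does not clamp to start_date, and the function name date_windows is hypothetical. It steps back attribution_window days from the bookmark, then walks forward in chunks of date_window_size days until today.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(
    bookmark: date,
    today: date,
    attribution_window: int = 5,
    date_window_size: int = 30,
) -> Iterator[Tuple[str, str]]:
    """Yield inclusive (from_date, to_date) strings from bookmark minus look-back to today."""
    # Re-read the last `attribution_window` days so late-arriving data is picked up.
    start = bookmark - timedelta(days=attribution_window)
    while start <= today:
        # Each window spans at most `date_window_size` calendar days, both ends inclusive.
        end = min(start + timedelta(days=date_window_size - 1), today)
        yield start.isoformat(), end.isoformat()
        start = end + timedelta(days=1)


# Example: bookmark of 2019-09-28, 5-day look-back, 7-day windows.
windows = list(date_windows(date(2019, 9, 28), date(2019, 10, 10), 5, 7))
```

Smaller windows mean more requests but smaller responses, which is why high-volume projects are advised to lower date_window_size.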

Export (Events)
- Standard Server endpoint: https://data.mixpanel.com/api/2.0/export
- EU Residency Server endpoint: https://data-eu.mixpanel.com/api/2.0/export
- Primary key fields: event, time, distinct_id
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: time
  - Bookmark query fields: from_date, to_date
- Transformations: De-nest properties to root-level, re-name properties with leading $... to mp_reserved_..., convert datetimes from project timezone to UTC.
- Optional parameters:
  - export_events to export only certain events
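The Export transformations can be illustrated with a small sketch. It is not the tap's implementation: the function name transform_event is hypothetical, and it assumes, as the transformation note implies, that the raw integer time is the project's local wall-clock time expressed as epoch seconds. Reserved properties keep their name after the leading "$", prefixed with mp_reserved_.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def transform_event(event: dict, project_tz: tzinfo) -> dict:
    """Flatten one raw export event into a root-level record (illustrative sketch)."""
    record = {"event": event["event"]}
    for key, value in event.get("properties", {}).items():
        # Mixpanel reserved properties start with "$", e.g. "$os" -> "mp_reserved_os".
        if key.startswith("$"):
            key = "mp_reserved_" + key[1:]
        record[key] = value
    # Assumption: the integer "time" is the project's local wall-clock time in epoch form.
    wall_clock = datetime.fromtimestamp(record["time"], tz=timezone.utc).replace(tzinfo=None)
    # Attach the project time zone to that wall-clock value, then convert it to UTC.
    record["time"] = wall_clock.replace(tzinfo=project_tz).astimezone(timezone.utc).isoformat()
    return record


# Example with a fixed UTC-7 offset (Pacific daylight time); a real run would use
# zoneinfo.ZoneInfo(project_timezone) so daylight-saving changes are handled.
example = transform_event(
    {"event": "Signup", "properties": {"time": 1569628800, "distinct_id": "abc", "$os": "Mac"}},
    timezone(timedelta(hours=-7)),
)
```

After flattening, the primary key fields event, time, and distinct_id all sit at the root of the record.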

Engage (People/Users)
- Standard Server endpoint: https://mixpanel.com/api/2.0/engage
- EU Residency Server endpoint: https://eu.mixpanel.com/api/2.0/engage
- Primary key fields: distinct_id
- Replication strategy: FULL_TABLE (all records, every load)
- Transformations: De-nest $properties to root-level, re-name properties with leading $... to mp_reserved_....

Funnels
- Standard Server endpoint 1 (name, id): https://mixpanel.com/api/2.0/funnels/list
- Standard Server endpoint 2 (date, measures): https://mixpanel.com/api/2.0/funnels
- EU Residency Server endpoint 1 (name, id): https://eu.mixpanel.com/api/2.0/funnels/list
- EU Residency Server endpoint 2 (date, measures): https://eu.mixpanel.com/api/2.0/funnels
- Primary key fields: funnel_id, date
- Parameters:
  - funnel_id: {funnel_id} (from endpoint 1)
  - unit: day
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: date
  - Bookmark query fields: from_date, to_date
- Transformations: Combine endpoint 1 and 2 results, and convert the date keys into a results list-array.
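The Funnels transformation (combine the two endpoints, turn date keys into a list) might look roughly like this. The function name funnel_records is hypothetical, and the endpoint 2 response shape {"data": {"YYYY-MM-DD": {...}}} is an assumption made for illustration. Each output record carries the funnel_id and date primary key fields.

```python
from typing import Dict, List


def funnel_records(funnel: Dict, response: Dict) -> List[Dict]:
    """Turn one funnel's date-keyed measures into a list with one record per date."""
    records = []
    # Assumed endpoint 2 shape: {"data": {"YYYY-MM-DD": {<measures>}, ...}}.
    for day, measures in sorted(response.get("data", {}).items()):
        # Combine the name/id from endpoint 1 with that day's measures from endpoint 2.
        record = {"funnel_id": funnel["funnel_id"], "name": funnel["name"], "date": day}
        record.update(measures)
        records.append(record)
    return records


example = funnel_records(
    {"funnel_id": 7, "name": "Signup flow"},
    {"data": {"2019-09-29": {"steps": []}, "2019-09-28": {"steps": [{"count": 3}]}}},
)
```

The Revenue stream's "convert the date keys into a results list-array" step follows the same pattern, minus the endpoint 1 join.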

Revenue
- Standard Server endpoint: https://mixpanel.com/api/2.0/engage/revenue
- EU Residency Server endpoint: https://eu.mixpanel.com/api/2.0/engage/revenue
- Primary key fields: date
- Parameters:
  - unit: day
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: date
  - Bookmark query fields: from_date, to_date
- Transformations: Convert the date keys into a results list-array.

Annotations
- Standard Server endpoint: https://mixpanel.com/api/2.0/annotations
- EU Residency Server endpoint: https://eu.mixpanel.com/api/2.0/annotations
- Primary key fields: date
- Replication strategy: FULL_TABLE
- Transformations: None.

Cohorts
- Standard Server endpoint: https://mixpanel.com/api/2.0/cohorts/list
- EU Residency Server endpoint: https://eu.mixpanel.com/api/2.0/cohorts/list
- Primary key fields: id
- Replication strategy: FULL_TABLE
- Transformations: None.

Cohort Members
- Standard Server endpoint: https://mixpanel.com/api/2.0/engage (cohort IDs come from https://mixpanel.com/api/2.0/cohorts/list)
- EU Residency Server endpoint: https://eu.mixpanel.com/api/2.0/engage (cohort IDs come from https://eu.mixpanel.com/api/2.0/cohorts/list)
- Primary key fields: distinct_id, cohort_id
- Parameters:
  - filter_by_cohort: {cohort_id} (from the cohorts endpoint)
- Replication strategy: FULL_TABLE
- Transformations: For each cohort_id in the cohorts endpoint, query the engage endpoint with the filter_by_cohort parameter to create a list of distinct_id values for each cohort_id.
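The Cohort Members loop can be sketched as below. fetch_members is a hypothetical callable standing in for a request to the engage endpoint with filter_by_cohort set; the assumption that each returned profile carries its ID under "$distinct_id" is made for illustration. Injecting the fetcher keeps the join logic testable without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def cohort_member_records(
    cohort_ids: Iterable[int],
    fetch_members: Callable[[int], List[Dict]],
) -> Iterator[Dict]:
    """Yield one {distinct_id, cohort_id} record per member of each cohort."""
    for cohort_id in cohort_ids:
        # fetch_members is a hypothetical helper that queries the engage endpoint
        # with filter_by_cohort set for this cohort and returns the profiles.
        for profile in fetch_members(cohort_id):
            yield {"distinct_id": profile["$distinct_id"], "cohort_id": cohort_id}


# Stand-in for the engage API, keyed by cohort_id.
fake_engage = {1: [{"$distinct_id": "a"}, {"$distinct_id": "b"}], 2: [{"$distinct_id": "a"}]}
rows = list(cohort_member_records([1, 2], lambda cohort_id: fake_engage[cohort_id]))
```

A user who belongs to several cohorts appears once per cohort, which is why the primary key combines distinct_id and cohort_id.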

Authentication
The Mixpanel API uses Basic Authorization with the api_secret from the tap config, in base-64 encoded format. It is slightly different from normal Basic Authorization with a username and password: all requests should include this header with the api_secret as the username and no password:
- Authorization: Basic <base-64 encoded api_secret>
- If you set eu_residency_server to true, make sure you enter the api_secret of your EU Residency project.
More details may be found in the Mixpanel API Authentication instructions.
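A minimal sketch of building that header, assuming the standard Basic scheme (RFC 7617), in which the username and password are joined with a colon before base-64 encoding; with an empty password that means encoding "api_secret:". The function name mixpanel_auth_header is hypothetical.

```python
import base64


def mixpanel_auth_header(api_secret: str) -> dict:
    """Build the Basic Authorization header: api_secret as username, empty password."""
    # RFC 7617 Basic credentials are base64("username:password"); the password is empty.
    credentials = f"{api_secret}:".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


headers = mixpanel_auth_header("ABCdef123")
```

With the requests library, passing auth=(api_secret, "") to requests.get produces the same header.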

- Install
Clone this repository, and then install using setup.py. We recommend using a virtualenv:
> virtualenv -p python3 venv
> source venv/bin/activate
> python setup.py install
OR
> cd .../tap-mixpanel
> pip install .
- Dependent libraries
The following dependent libraries were installed:
> pip install singer-python
> pip install jsonlines
> pip install singer-tools
> pip install target-stitch
> pip install target-json
  - singer-tools
  - target-stitch
  - jsonlines: needed for the JSON Lines-formatted data returned by the export endpoint
- Create your tap's config.json file. The tap config file for this tap should include these entries:
  - start_date (rfc3339 date string): the default value to use if no bookmark exists for an endpoint.
  - user_agent (string, optional): process and email for API logging purposes. Example: tap-mixpanel <api_user_email@your_company.com>
  - api_secret (string, e.g. ABCdef123): an API secret for each project in Mixpanel. This can be found in the Mixpanel Console: upper-right Settings (gear icon), Organization Settings > Projects, in the Access Keys section. For this tap, only the api_secret is needed (the api_key is legacy and the token is used only for uploading data). Each Mixpanel project has a different api_secret; therefore each Singer tap pipeline instance is for a single project.
  - date_window_size (integer, default 30): number of days per date window when looping through transactional endpoints with from_date and to_date. Clients with large volumes of events may want to decrease this to 14, 7, or even 1-2 days.
  - attribution_window (integer, default 5): minimum number of days to look back to account for latency in attributing accurate results.
  - project_timezone (string, e.g. US/Pacific): time zone in which integer date times are stored. The project timezone may be found in the project settings in the Mixpanel console. More info about timezones.
  - select_properties_by_default (true or false): Mixpanel properties are not fixed and depend on the data being uploaded. During Discovery mode and catalog.json setup, all current/existing properties are captured. Setting this parameter to true ensures that new properties on events and engage records are also captured; otherwise, new properties are ignored.
  - eu_residency_server (true or false): Data Residency refers to the physical/geographical storage location of an organization's data. Setting this parameter to true makes the tap use the EU Residency Server endpoints to capture records. As a Mixpanel customer in the EU, you have the option, when creating a new project, to send your data to Mixpanel's EU data center and have it stored exclusively in the EU. More info about eu_residency_server.
  - request_timeout (integer, default 300): maximum time, in seconds, that a request waits for a response.
Example config.json:
{
  "api_secret": "YOUR_API_SECRET",
  "date_window_size": "30",
  "attribution_window": "5",
  "project_timezone": "US/Pacific",
  "select_properties_by_default": "true",
  "start_date": "2019-01-01T00:00:00Z",
  "user_agent": "tap-mixpanel <api_user_email@your_company.com>",
  "eu_residency_server": "true",
  "request_timeout": 300
}
If you want to export only certain events from the Raw Export API, provide them as a comma-separated value of export_events:
"export_events": "event_one,event_two"
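Because the example config passes numbers and booleans as strings ("30", "true"), a loader has to coerce them. The sketch below shows one way to do that; load_config and export_url are hypothetical names, the defaults come from the entry descriptions above, and treating a missing boolean as false is an assumption.

```python
import json

# Defaults taken from the config entry descriptions above.
INT_DEFAULTS = {"date_window_size": 30, "attribution_window": 5, "request_timeout": 300}
BOOL_KEYS = ("select_properties_by_default", "eu_residency_server")


def load_config(text: str) -> dict:
    """Parse config.json text and coerce string values such as "30" or "true"."""
    config = json.loads(text)
    for key, default in INT_DEFAULTS.items():
        config[key] = int(config.get(key, default))
    for key in BOOL_KEYS:
        value = config.get(key, False)  # assumption: a missing flag means false
        config[key] = value if isinstance(value, bool) else str(value).lower() == "true"
    events = config.get("export_events")
    if isinstance(events, str):
        # "event_one,event_two" -> ["event_one", "event_two"]
        config["export_events"] = [e.strip() for e in events.split(",") if e.strip()]
    return config


def export_url(config: dict) -> str:
    """Pick the export endpoint host based on eu_residency_server."""
    host = "https://data-eu.mixpanel.com" if config["eu_residency_server"] else "https://data.mixpanel.com"
    return host + "/api/2.0/export"


config = load_config('{"api_secret": "YOUR_API_SECRET", "date_window_size": "30", '
                     '"eu_residency_server": "true", "export_events": "event_one,event_two"}')
```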
Optionally, also create a state.json file. currently_syncing is an optional attribute used to identify the last object being synced in case the job is interrupted mid-stream; the next run begins where the last job left off.
{
  "currently_syncing": "engage",
  "bookmarks": {
    "export": "2019-09-27T22:34:39.000000Z",
    "funnels": "2019-09-28T15:30:26.000000Z",
    "revenue": "2019-09-28T18:23:53Z"
  }
}
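How such a state file could be consumed is sketched below. Both helpers are hypothetical: start_for falls back to start_date when a stream has no bookmark, and resume_order shows one plausible way to honor currently_syncing by syncing the interrupted stream first. The tap's actual resume logic may differ.

```python
from datetime import datetime
from typing import List


def start_for(stream: str, state: dict, start_date: str) -> datetime:
    """Use the stream's bookmark if present, otherwise fall back to config start_date."""
    value = state.get("bookmarks", {}).get(stream, start_date)
    # datetime.fromisoformat (before Python 3.11) does not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resume_order(streams: List[str], state: dict) -> List[str]:
    """Rotate the stream list so an interrupted stream is synced first."""
    current = state.get("currently_syncing")
    if current in streams:
        i = streams.index(current)
        return streams[i:] + streams[:i]
    return list(streams)


state = {"currently_syncing": "engage",
         "bookmarks": {"export": "2019-09-27T22:34:39.000000Z"}}
```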
- Run the Tap in Discovery Mode
This creates a catalog.json for selecting objects/fields to integrate:
> tap-mixpanel --config config.json --discover > catalog.json
See the Singer docs on discovery mode.
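Streams in the discovered catalog are selected through Singer metadata: the stream-level entry (the one whose breadcrumb is an empty list) carries "selected": true. A minimal sketch of toggling that flag programmatically; select_streams is a hypothetical helper, and field-level selection (breadcrumbs such as ["properties", "<field>"]) is not handled here.

```python
def select_streams(catalog: dict, wanted: set) -> dict:
    """Mark streams in `wanted` as selected via their stream-level (empty breadcrumb) metadata."""
    for stream in catalog.get("streams", []):
        selected = stream["tap_stream_id"] in wanted
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == []:
                entry.setdefault("metadata", {})["selected"] = selected
    return catalog


catalog = {"streams": [
    {"tap_stream_id": "export", "metadata": [{"breadcrumb": [], "metadata": {}}]},
    {"tap_stream_id": "engage", "metadata": [{"breadcrumb": [], "metadata": {}}]},
]}
select_streams(catalog, {"export"})
```

In practice you would json.load catalog.json, call the helper, and json.dump the result back before running sync mode.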
- Run the Tap in Sync Mode (with catalog) and write out to state file
For Sync mode:
> tap-mixpanel --config config.json --catalog catalog.json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
To load to JSON files to verify outputs:
> tap-mixpanel --config config.json --catalog catalog.json | target-json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
To pseudo-load to the Stitch Import API with a dry run:
> tap-mixpanel --config config.json --catalog catalog.json | target-stitch --config target_config.json --dry-run > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
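The tail -1 step keeps only the final line of output as the new state. A rough Python equivalent is sketched below; last_state is a hypothetical name. It assumes that raw tap output ends with a Singer STATE message ({"type": "STATE", "value": {...}}) while targets emit the bare state value, and it unwraps the former. Unlike tail -1, it skips trailing blank lines.

```python
import json


def last_state(output: str) -> dict:
    """Return the state on the last non-empty line, unwrapping a Singer STATE message."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return {}
    message = json.loads(lines[-1])
    # Assumed: raw tap output ends with a STATE message; targets emit bare state values.
    if message.get("type") == "STATE":
        return message["value"]
    return message


tap_output = ('{"type": "RECORD", "stream": "export", "record": {}}\n'
              '{"type": "STATE", "value": {"bookmarks": {"export": "2019-09-27T22:34:39Z"}}}\n')
```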
- Test the Tap
While developing the Mixpanel tap, the following utilities were run in accordance with Singer.io best practices.
Pylint to improve code quality:
> pylint tap_mixpanel -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments
Pylint test resulted in the following score:
Your code has been rated at 9.67/10
To check the tap and verify that it is working:
> tap-mixpanel --config config.json --catalog catalog.json | singer-check-tap > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
Check tap resulted in the following:
The output is valid.
It contained 15697 messages for 7 streams.

      7 schema messages
  15661 record messages
     29 state messages

Details by stream:
+----------------+---------+---------+
| stream         | records | schemas |
+----------------+---------+---------+
| revenue        | 134     | 1       |
| export         | 2811    | 1       |
| funnels        | 132     | 1       |
| cohort_members | 454     | 1       |
| engage         | 12119   | 1       |
| cohorts        | 5       | 1       |
| annotations    | 6       | 1       |
+----------------+---------+---------+
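singer-check-tap also checks the output for validity; the sketch below only reproduces the counting part of its summary (messages by type, records by stream), which can be handy as a quick cross-check on saved tap output. summarize is a hypothetical name, and the sample lines are made up for illustration.

```python
import json
from collections import Counter
from typing import Tuple


def summarize(output: str) -> Tuple[Counter, Counter]:
    """Count Singer messages by type, and RECORD messages by stream."""
    by_type, records = Counter(), Counter()
    for line in output.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)
        by_type[message["type"]] += 1
        if message["type"] == "RECORD":
            records[message["stream"]] += 1
    return by_type, records


sample = "\n".join([
    '{"type": "SCHEMA", "stream": "cohorts", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "cohorts", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "cohorts", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
])
by_type, records = summarize(sample)
```

On the run above, the per-type totals should add up to the overall message count (7 + 15661 + 29 = 15697).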
Unit tests may be run with the following:
> python -m pytest --verbose
Note: you may need to install test dependencies first:
> pip install -e .'[dev]'
Copyright © 2019 Stitch