Data Connector for Mixpanel

The Data Connector for Mixpanel allows you to back up the event data inside Mixpanel on Treasure Data. Use cases include:

  1. One-time migration: The organization is migrating away from Mixpanel and wants to keep a raw copy of the data
  2. Incremental daily backup: The organization needs to have more granular access with SQL to event data inside Mixpanel

Table of Contents

Option 1: Use Web Console

Create a new connection

Please visit Treasure Data Connections and search and select Mixpanel. The dialog below will open. Please fill the Mixpanel details, click Continue and provide a name for this connection.



Create a new transfer

Upon creating above connection, you will be automatically taken to My Connections tab. Look for the connection you created and click New Transfer.



The dialog below will open. Please fill the details and click Next.



Next, you will see a Preview of your data similar to the dialog below. If you wish to change anything, please click on Advanced Settings or else click on Next.



Third step is to select the database and table where you want to transfer the data, as per the following dialog:



Finally, specify the schedule of the data transfer using the dialog below and click Start Transfer:



You will see the new data transfer in progress listed under the My Input Transfers tab and a corresponding job will be listed in the Jobs section.

Now, you are ready to start analyzing your data!

Option 2: Use Command Line

Step 0: Install ‘td’ command v0.11.9 or later

Install the newest Treasure Data Toolbelt.

Step 1: Look up Mixpanel API credentials

Log in to your Mixpanel account, go to “Account” then “Projects” to look up your API key and API secret.

Step 2: Create the Seed Config File (seed.yml)

Create a file called seed.yml as below.

in:
  type: mixpanel
  api_key: MIXPANEL_API_KEY
  api_secret: MIXPANEL_API_SECRET
  timezone: 'UTC'
  from_date: "2015-10-28"
  fetch_days: 1
out:
  mode: append

This seed file will be used to “guess” the full configuration file with column names. In this example, one day’s worth of data on Oct 28, 2015 is exported from Mixpanel to Treasure Data.

For more details on available out modes, see Appendix.

Step 3: Guess Fields (Generate load.yml)

Based on seed.yml, generate load.yml with the connector:guess commnand.

$ td connector:guess seed.yml -o load.yml

The load.yml file should look as below. Your column names and types depend on how you configure Mixpanel.

in:
  type: mixpanel
  api_key: MIXPANEL_API_KEY
  api_secret: MIXPANEL_API_SECRET
  timezone: UTC
  from_date: '2015-10-28'
  fetch_days: 1
  columns:
  - name: event
    type: string
  - name: "$browser"
    type: string
  - name: "$browser_version"
    type: long
  - name: "$city"
    type: string
  - name: "$current_url"
    type: string
  - name: "$initial_referrer"
    type: string
  - name: "$initial_referring_domain"
    type: string
  - name: "$lib_version"
    type: string
  - name: "$os"
    type: string
  - name: "$referrer"
    type: string
  - name: "$referring_domain"
    type: string
  - name: "$region"
    type: string
  - name: "$screen_height"
    type: long
  - name: "$screen_width"
    type: long
  - name: ENV
    type: string
  - name: RAILS_ENV
    type: string
  - name: distinct_id
    type: long
  - name: from
    type: string
  - name: job_id
    type: long
  - name: mp_country_code
    type: string
  - name: mp_lib
    type: string
  - name: name
    type: string
  - name: time
    type: long
filters:
- type: rename
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: ["a-z", "0-9"]
    pass_characters: "_"
    replace: "_"
  - rule: first_character_types
    pass_types: ["a-z"]
    pass_characters: "_"
    prefix: "_"
  - rule: unique_number_suffix
    max_length: 128
out:
  mode: append
exec: {}

For more details on the rename filter, see rename filter plugin for Data Connector.

Note that the Data Connector picked up custom fields like “ENV” and “RAILS_ENV” as well as standard fields like “$current_url”. Also, note that column names are normalized in the type: rename filter section.

Step 4: Preview Data Loading

Use `connector:preview` command to preview the data load.

$ td connector:preview load.yml
+-------------------------------+-----------------+-----------------------+--------------+--------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+----------------+---------------------+--------------------+--------------+---------------------------+------------------+----------------------------------+-------------+------------------------+---------------+-------------------------+------------+
| event:string                  | _browser:string | _browser_version:long | _city:string | _current_url:string                                                            | _initial_referrer:string                                                                                                                                                                                               | _initial_referring_domain:string | _lib_version:string | _os:string | _referrer:string                                                                                                                                                                                                       | _referring_domain:string    | _region:string | _screen_height:long | _screen_width:long | env:string   | rails_env:string          | distinct_id:long | from:string                      | job_id:long | mp_country_code:string | mp_lib:string | name:string             | time:long  |
+-------------------------------+-----------------+-----------------------+--------------+--------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+----------------+---------------------+--------------------+--------------+---------------------------+------------------+----------------------------------+-------------+------------------------+---------------+-------------------------+------------+
| "view_job"                    | "Chrome"        | 46                    | "Tokyo"      | "https://console.treasuredata.com/jobs/37608183/edit"                          | "$direct"                                                                                                                                                                                                              | "$direct"                        | "2.7.1"             | "Mac OS X" | "https://console.treasuredata.com/databases"                                                                                                                                                                           | "console.treasuredata.com"  | "Tōkyō"        | 1440                | 2560               | "production" | "production_heroku"       | 5421             | nil                              | 37608204    | "JP"                   | "web"         | nil                     | 1445990400 |
| "view_job"                    | "Chrome"        | 46                    | nil          | "https://console.treasuredata.com/query_forms/new"                             | "$direct"                                                                                                                                                                                                              | "$direct"                        | "2.7.1"             | "Mac OS X" | "https://console.treasuredata.com/jobs/37602729"                                                                                                                                                                       | "console.treasuredata.com"  | nil            | 800                 | 1280               | "production" | "production_heroku"       | 6679             | nil                              | 37608227    | "JP"                   | "web"         | nil                     | 1445990404 |

If the data looks good, then create the database and the table on Treasure Data where you wish to import Mixpanel data:

td db:create mixpanel_historical
td table:create mixpanel_historical app_name

Step 5: Load Data

Finally, submit the load job. It may take a couple of hours depending on the size of the data. Users need to specify the database and table where their data is stored.

It’s also recommended to specify --time-column option, since Treasure Data’s storage is partitioned by time (see architecture) If the option is not provided, the Data Connector will choose the first long or timestamp column as the partitioning time. The type of the column specified by --time-column must be either of long and timestamp type.

td connector:issue load.yml --database mixpanel_historical --table app_name --time-column time

The above command assumes you have already created database(td_sample_db) and table(td_sample_table). If the database or the table do not exist in TD this command will not succeed, so create the database and table manually or use --auto-create-table option with td connector:issue command to auto create the database and table:

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at --auto-create-table
Untitled-3
You can assign Time Format column to the "Partitioning Key" by "--time-column" option.

You can check that the data is loaded by running a SQL query directly against your data in Treasure Data:

td query -T presto -w -d mixpanel_historical 'SELECT COUNT(1) FROM app_name'

Scheduling Incremental Data Loading

Unless you are migrating off of Mixpanel completely, Mixpanel data must be incrementally loaded into Treasure Data regularly. The Data Connector’s scheduling function comes in handy for this purpose.

Once scheduled, the Mixpanel Data Connector’s successive runs increment the from_date parameter by fetch_days. For example, if the initial run in load.yml was

from_date: '2015-10-28'
fetch_days: 1

Then, the next run will be

from_data: '2015-10-29'
fetch_days: 1

You do not need to update load.yml once it is uploaded since from_date field is automatically updated on the server side.

Suppose you wish to schedule a daily upload. Then, make sure that the initial from_date is at least two days ago and set fetch_days: 1 in load.yml. Then, the following command creates a daily job called “daily_mixpanel_import” which loads historical data to mixpanel_historical.app_name on Treasure Data every day.

$ td connector:create \
    daily_mixpanel_import \
    "10 0 * * *" \
    mixpanel_historical \
    app_name \
    load.yml \
    --time-column time # optional

The historical runs of the import can be seen with td connector:history <name>, e.g., td connector:history daily_mixpanel_import.

Appendix

A) Modes for out plugin

You can specify file import mode in out section of seed.yml.

append (default)

This is the default mode and records are appended to the target table.

in:
  ...
out:
  mode: append
replace (In td 0.11.10 and later)

This mode replaces data in the target table. Please note that any manual schema changes made to the target table will remain intact with this mode.

in:
  ...
out:
  mode: replace

Last modified: Nov 15 2016 07:41:10 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.