Use Treasure Data's Customer Data Platform to ingest your YouTube Analytics data.
Basic knowledge of Treasure Data Console (and Toolbelt)
Use from TD Console
You can create the data connector from the TD Console. This is the most common approach.
Create a new connection
Visit the Treasure Data Catalog, then search for and select YouTube.
The following dialog opens.
Choose one of the Authentication Modes:
Your custom OAuth app
Your custom OAuth app is the preferable mode. See this Q&A Section for more information.
If you choose Your custom OAuth app, provide the following parameters:
OAuth client_id: the client_id for the custom OAuth app set up on Google Console. For further details, see Appendix.
OAuth client_secret: the client_secret for the custom OAuth app set up on Google Console. For further details, see Appendix.
OAuth refresh_token: the refresh_token for your account that allows all scopes for the custom OAuth app. For further details, see Appendix.
If you choose OAuth, select an existing OAuth connection, or click Click Here to connect a new account.
Click Continue. Provide a name for your connector. Click Done.
Create a New Source
Click New Source from the saved authentications
Provide your account CMS ID and Channel IDs if you participate in the YouTube Partner Program (https://www.youtube.com/yt/creators/benefits/). Otherwise, leave the parameters empty.
Choose the Report Type (Video, Playlist, or Channel), and choose the report preset.
CMS ID: CMS ID of the YouTube account, if participating in the YouTube Partner Program. Otherwise, leave this blank.
Channel IDs: The list of Channel IDs under the management of the Content Owner. For non-partner YouTube accounts, leave this blank.
Report Type: The expected target of the analytics: video, playlist, or channels.
Report Presets: The collection of dimensions, metrics, and filters.
Playlists: playlist IDs to filter the list of videos. This parameter is only available for Videos report presets.
Dimensions: A list of Analytics dimensions
Metrics: A list of Analytics metrics
Filters: A list of filters (this parameter only appears in some presets)
Max Results: The number of analytics records for the API to return. This parameter is only mandatory for some specific reports; otherwise, leave it blank to retrieve all analytics.
Sort: The metrics to sort the analytics result from the API. This parameter is only mandatory for some specific reports.
Include Historical Channel Data: Specify whether the YouTube Analytics API returns the historical analytics for the channel before joining the Content Owner. This parameter is not mandatory for non-Content Owner reports.
Load from published date: The earliest published date of the specified Report Type becomes the start date to retrieve analytics.
Begin date: The start date to retrieve analytics (either "Load from published date" or "Begin date" must be specified).
End date: The end date to retrieve analytics (inclusive).
Aggregation Period (Day): The number of days to group each record of the analytics.
Duration (Day): The number of days to retrieve analytics from the current end date in the next incremental run.
Incremental: When running on schedule, the time window of the fetched data automatically shifts forward on each run. For example, if the initial configuration has a begin date of January 1, an end date of January 15, and a duration of ten days, the first run fetches data from January 1 to January 15, the second run fetches from January 16 to January 25, and so on.
Ignore Empty Playlist: Skip empty playlists instead of throwing errors.
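The shifting window described for Incremental and Duration (Day) can be sketched in Python. This is a minimal illustration, not the connector's implementation: the function name is hypothetical, and it assumes that each run after the first starts the day after the previous end date and covers exactly the configured duration.

```python
from datetime import date, timedelta


def incremental_windows(begin, end, duration_days, runs):
    """Return the (start, end) date pair fetched by each scheduled run.

    Assumption: every run after the first starts the day after the previous
    end date and covers `duration_days` days, as in the January example.
    """
    windows = [(begin, end)]
    for _ in range(runs - 1):
        prev_end = windows[-1][1]
        start = prev_end + timedelta(days=1)
        windows.append((start, prev_end + timedelta(days=duration_days)))
    return windows


# January 1 to January 15 with a duration of ten days.
# The year is arbitrary; 2024 is used only to make the dates concrete.
for start, end in incremental_windows(date(2024, 1, 1), date(2024, 1, 15), 10, 3):
    print(start.isoformat(), "to", end.isoformat())
```

The second window printed is January 16 to January 25, matching the example in the Incremental description.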
You can see a preview of your data before running the import by selecting Generate Preview.
Data preview is optional and you can safely skip to the next page of the dialog if you want.
Data shown in the data preview is approximated from your source. It is not the actual data that is imported.
To preview your data, select Generate Preview. Optionally, click Next.
Verify that the data looks approximately like you expect it to.
For data placement, select the target database and table where you want your data placed and indicate how often the import should run.
If the table does not exist, it will be created.
If you want to set a different partition key seed than the default key, you can specify the long or timestamp column as the partitioning time. As a default time column, it uses upload_time with the add_time filter.
Select Next. Under Storage, create a new or select an existing database and create a new or select an existing table for where you want to place the imported data.
Select a Database > Select an existing or Create New Database.
Optionally, type a database name.
Select a Table > Select an existing or Create New Table.
Optionally, type a table name.
Choose the method for importing the data.
Append (default): Data import results are appended to the table.
Always Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.
Replace on New Data: Replaces the entire content of an existing table with the result output only when there is new data.
Select the Timestamp-based Partition Key column.
Select the Timezone for your data storage.
Under Schedule, you can choose when and how often you want to run this query.
Run once:
Select Off.
Select Scheduling Timezone.
Select Create & Run Now.
Repeat the query:
Select On.
Select the Schedule. The UI provides these four options: @hourly, @daily, @monthly, or custom cron.
You can also select Delay Transfer and add a delay of execution time.
Select Scheduling Timezone.
Select Create & Run Now.
After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.
Using the Command Line
You can create the data connector from the CLI instead of the TD Console if you want.
Install the prerequisites
Install the latest td tool via Ruby gem:
There are other install methods. For more information, see Treasure Data Toolbelt.
Create the config file (config.yml)
The following is an example configuration file to request daily basic statistics for all videos on the YouTube channel.
Specify the client_id, client_secret, access_token, and refresh_token for authenticating with Google App. For more information see Appendix.
Specify the target of retrieving analytics in the report_type parameter:
video: retrieve analytics for individual videos in the channels or in specific playlists (specified as a comma-separated list of Playlist IDs in the playlist parameter).
playlist: retrieve analytics for individual playlists.
channel: retrieve analytics for individual channels under the management of the account.
Choose a Preset or Individual Dimensions and Metrics
A preset is a predefined group of parameters. The following are the available values for this parameter, along with the equivalent group of parameters.
For video and channel report_type:
audience_retention (only for video)
You can find the value for dimension and metric parameters (as well as other required parameters) in the following articles from Google:
CMS ID (required if your YouTube account is a Content Owner)
List of Channel IDs. Effective only if your account is Content Owner (array, optional, if empty, the plugin fetches analytics from channels that Content Owner has access to). This parameter applies to only content owner reports.
List of Playlist IDs, only effective for `video` report type (array, optional, if blank, the plugin fetches analytics of all videos)
A comma-separated list of YouTube Analytics metrics, such as views, likes, or dislikes (string, required if `report_preset` is omitted)
A comma-separated list of YouTube Analytics dimensions (string, optional)
A list of filters that should be applied when retrieving YouTube Analytics data, separated by semi-colons (string, optional)
The maximum number of rows to include in the response, required for only some reports (integer, optional)
A comma-separated list of dimensions or metrics that determine the sort order for YouTube Analytics data. By default, the sort order is ascending. The - prefix causes descending sort order (string, optional)
Incremental loading: when running on schedule, the time window of the fetched data automatically shifts forward on each run (boolean, optional, default: `true`). For example, if the initial start date is January 1 and the duration is ten days, the first run fetches data from January 1 to January 10, the second run fetches from January 11 to January 20, and so on.
The earliest published date of specified content (Report Type) becomes the start date to retrieve analytics (boolean, optional)
The start date to retrieve analytics, supported format: "yyyy-MM-dd" (string, optional). Specify either this start date or the load-from-published-date option.
The end date to retrieve analytics, supported format: "yyyy-MM-dd" (string, required).
The number of days to retrieve analytics from the current end date in the next incremental run (integer, required in incremental mode).
The number of days to break down analytics (integer, optional, default: `1`, that is, daily)
Indicates whether to include channels' watch time and view data from the time period prior to when the channels were linked to the content owner. The default parameter value is `false` which means importing only watch time and viewing data from the dates that channels were linked to the content owner (boolean, optional, default: `false`). This parameter only applies to content owner reports.
The number of retries before giving up (integer, optional, default: `7`)
The initial waiting duration between retries, in milliseconds (integer, optional, default: `30000`, i.e., 30 seconds)
The maximum waiting duration between retries, in milliseconds (integer, optional, default: `1800000`, i.e., 30 minutes)
Skip empty playlists instead of throwing an error
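To get a feel for how the three retry defaults interact, the following Python sketch lists the wait before each retry. The parameter descriptions give only the retry count, the initial wait, and the maximum wait; the doubling between attempts is an assumption made here for illustration, and the function name is hypothetical.

```python
def retry_waits_millis(retry_limit=7, initial_wait=30_000, max_wait=1_800_000):
    """List the wait before each retry, in milliseconds.

    Assumption: the wait doubles after every attempt and is capped at
    `max_wait`. Only the three default values come from the descriptions.
    """
    waits = []
    wait = initial_wait
    for _ in range(retry_limit):
        waits.append(min(wait, max_wait))
        wait *= 2
    return waits


waits = retry_waits_millis()
print([w // 1000 for w in waits], "seconds per retry")
print(sum(waits) // 60_000, "whole minutes of waiting before giving up")
```

Under that assumption, a job that exhausts all default retries waits roughly an hour in total, which is worth keeping in mind when choosing a schedule interval.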
Execute Load Job
You must specify the database and table to store the data.
The option --time-column is preferred because Treasure Data partitions the storage by time. If this option is not available, the data connector selects the first long or timestamp column as the partitioning time. The type of the column specified by --time-column must be either long or timestamp (use Preview results to check for the available column name and type). A time column is available at the end of the output.
If your data doesn’t have a time column, you can add the column by using the add_time filter option. See details at add_time filter function.
Submit the load job. It may take a couple of hours depending on the data size. You need to specify the database and table to store the data.
td connector:issue assumes you have already created a database (sample_db) and a table (sample_table). If the database or the table does not exist in TD, td connector:issue fails. Therefore, you must create the database and table manually or use --auto-create-table with td connector:issue to automatically create the database and table:
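As an illustration, the load job invocation can be assembled with a small Python helper. This is a sketch under assumptions: the helper name is hypothetical, config.yml is the configuration file created earlier, and the --database and --table flags are given as commonly used with td connector:issue, so confirm them against the help output of your installed td version.

```python
import shlex


def issue_command(config, database, table, time_column=None, auto_create_table=False):
    """Build the argument list for a one-off `td connector:issue` load job."""
    args = ["td", "connector:issue", config, "--database", database, "--table", table]
    if time_column:
        # The column must be of long or timestamp type.
        args += ["--time-column", time_column]
    if auto_create_table:
        # Create the database and table if they do not exist yet.
        args.append("--auto-create-table")
    return args


# sample_db and sample_table are the names used in the text.
# Pass the list to subprocess.run() to execute it, or print it for a shell.
print(shlex.join(issue_command("config.yml", "sample_db", "sample_table",
                               auto_create_table=True)))
```

Building the command as a list keeps arguments that contain spaces intact when the list is later passed to subprocess.run().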
You can schedule a periodic data connector execution for a periodic YouTube import. By using this feature, you no longer need a cron daemon in your local data center.
Create the schedule
td connector:create creates a new schedule. The name of the schedule, the cron-style schedule, the database and table to store the data, and the Data Connector configuration file are mandatory.
The cron parameter also accepts three options: @hourly, @daily, and @monthly. For details, see Scheduled Jobs.
By default, the schedule is set up in the UTC timezone. You can set the timezone using --timezone or -t option. The --timezone option supports only extended timezone formats like Asia/Tokyo, America/Los_Angeles, and so on. Timezone abbreviations like PST, CST are not supported and might lead to unexpected schedules.
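The scheduling rules above can be sketched as a hypothetical Python helper that assembles the td connector:create arguments in the order listed (name, cron, database, table, configuration file) and rejects timezone abbreviations. The pattern used to recognize extended timezone names is a simple Area/Location heuristic assumed for this example, not the CLI's own validation, and daily_import is a made-up schedule name.

```python
import re
import shlex

# Assumption: "extended" timezone names look like Area/Location.
EXTENDED_TZ = re.compile(r"^[A-Za-z_]+(/[A-Za-z0-9_+-]+)+$")


def create_command(name, cron, database, table, config, timezone=None):
    """Build the `td connector:create` argument list for a schedule."""
    args = ["td", "connector:create", name, cron, database, table, config]
    if timezone is not None:
        if not EXTENDED_TZ.match(timezone):
            raise ValueError(
                f"use an extended name such as Asia/Tokyo, not {timezone!r}")
        args += ["--timezone", timezone]
    return args


# Without a timezone, the schedule runs in UTC.
print(shlex.join(create_command("daily_import", "@daily", "sample_db",
                                "sample_table", "config.yml",
                                timezone="Asia/Tokyo")))
```

A cron expression such as 10 0 * * * contains spaces, so shlex.join quotes it automatically when the command is printed.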
List the Schedules
You can see the list of currently scheduled entries by running td connector:list.
Show the Setting and History of Schedules
td connector:show shows the execution setting of a schedule entry.
td connector:history shows the execution history of a schedule entry. To investigate the results of an individual execution, use td job <jobid>.
Delete the Schedule
td connector:delete removes the schedule.
YouTube Analytics data is based on PST and has a delay of up to 72 hours (https://support.google.com/youtube/answer/1714329?hl=en).
You cannot retrieve analytics for removed Videos and Playlists.
About YouTube API Quotas and Your custom OAuth app
YouTube has the following limitations on the YouTube Analytics API Quotas:
The connector makes many YouTube API calls to ingest daily YouTube analytics. For example, if your channel has 1000 videos, and you want to import 100 days of historical data, the total number of API calls is 1,000x100 = 100,000.
The connector estimates the total number of requests before sending any further requests to the YouTube API. If the estimate is higher than 100,000, the job stops. You must either increase the Aggregation Period or move the End Date earlier. The estimate is calculated as follows.
Let's say there are 4 videos:
Video 1 is published at 2010-02-10
Video 2 is published at 2012-03-11
Video 3 is published at 2016-04-12
Video 4 is published at 2018-05-13
In the following example, the begin date is the date that the data connector starts to fetch analytics data from YouTube Analytics. The publish date is the date that the video is published in YouTube. The end date is the date that you specify in Treasure Data to end the ingestion session.
Suppose the end date is 2017-06-14, the begin date is 2010-01-01, and the aggregation period is 7 days. The formula to calculate the total number of calls for one video is: (End Date - Begin Date + 1) / Aggregation_Period, rounded up, where the begin date for each video is its published date when the video was published after the configured begin date.
The end date is inclusive, meaning that the analytics data that is available on that end_date is ingested as well. With 4 videos:
Video 1: ("2017-06-14" - "2010-02-10" + 1) / 7 = 2682 / 7, rounded up to 384
Video 2: ("2017-06-14" - "2012-03-11" + 1) / 7 = 1922 / 7, rounded up to 275
Video 3: ("2017-06-14" - "2016-04-12" + 1) / 7 = 429 / 7, rounded up to 62
Video 4 is skipped because it is published after the end date (the end date is in 2017 and video 4 was published in 2018).
The estimated total is 384 + 275 + 62 = 721 requests, which is acceptable because it is below the 100,000-request limit.
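The estimation can be reproduced with a short Python sketch. It is an approximation of the described formula, not the connector's code: it assumes one request per aggregation period per video, exact calendar-day differences with an inclusive end date, and rounding up for each video, so its totals can differ from figures worked out by hand with other rounding. The function and constant names are hypothetical.

```python
from datetime import date
from math import ceil

REQUEST_LIMIT = 100_000  # above this estimate the job stops, per the text


def estimate_requests(published_dates, begin, end, aggregation_period):
    """Estimate API requests: one per aggregation period per video."""
    total = 0
    for published in published_dates:
        start = max(published, begin)  # no analytics exist before publication
        if start > end:
            continue  # published after the end date, so the video is skipped
        days = (end - start).days + 1  # the end date is inclusive
        total += ceil(days / aggregation_period)
    return total


videos = [date(2010, 2, 10), date(2012, 3, 11), date(2016, 4, 12), date(2018, 5, 13)]
total = estimate_requests(videos, date(2010, 1, 1), date(2017, 6, 14), 7)
print(total, "requests:", "ok" if total <= REQUEST_LIMIT else "job would stop")
```

Running the same function with a larger aggregation_period or an earlier end date shows how either change lowers the estimate.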
When using the same connection (see Create a New Connection) for multiple inputs, the quotas will run out quickly, and the import job will stop. A workaround is to create multiple connections with different OAuth apps (see Appendix).
Why can't I get analytics for Content Owner report presets?
Content Owner reports are only available to accounts that have joined the YouTube Partner Program (https://www.youtube.com/yt/creators/benefits/). If you are not participating in this YouTube program, retrieving content owner analytics causes an exception.
Why don't the TD analytics for the last day and today match those displayed on the YouTube site and YouTube Analytics Dashboard?
Be aware that it takes up to 72 hours for the analytics to accumulate. For more information, see https://support.google.com/youtube/answer/1714329?hl=en.
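One way to avoid comparing incomplete figures is to choose an end date that is already past the reporting delay. The Python sketch below computes such a date under two assumptions that the source does not spell out: the 72 hours are counted from the end of each day, and days are PST days represented by a fixed UTC-8 offset (daylight saving is ignored). The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # fixed offset; ignores daylight saving
MAX_DELAY = timedelta(hours=72)      # reporting delay of up to 72 hours


def latest_settled_date(now_utc):
    """Return the most recent PST date whose analytics should be complete.

    A day counts as settled once 72 hours have passed since it ended.
    """
    cutoff = now_utc.astimezone(PST) - MAX_DELAY
    return cutoff.date() - timedelta(days=1)


print(latest_settled_date(datetime.now(timezone.utc)).isoformat())
```

Using the returned date as the End date keeps each run to days whose numbers are unlikely to change afterwards.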
Why are there discrepancies between the data displayed on the YouTube Creator Studio Dashboard and the data ingested?
During detailed testing, we found some discrepancies between the analytics displayed in YouTube Creator Studio and the analytics ingested. We investigated and found the following reasons:
Creator Studio reports round some numbers before showing the data on the website.
The YouTube Analytics API applies some limitations to the returned analytics. To match Creator Studio, you may need to remove specific dimensions and filters. For more information, refer to https://support.google.com/youtube/answer/9101241.
Why does the basic statistics preset not include the redViews metric?
During the testing phase, we found that if a request contains certain combinations of dimensions and metrics, the YouTube Analytics API returns an empty record without any notice or exception. For example, if the country dimension and the redViews metric are in the same request, no analytics are returned.
To retrieve the redViews metric, you must remove the country dimension, and vice versa. Therefore, no report preset has this pair of dimension and metric.
How do I ingest analytics for users reaching YouTube videos through TrueView?
To retrieve the view counts that come from TrueView, use the Preset: Playback Traffic Source Detail but change the parameter Filter from insightTrafficSourceType==YT_SEARCH to insightTrafficSourceType==ADVERTISING. The ingested insightTrafficSourceDetail column then holds values including "TrueView in-search and in-display" and "TrueView in-stream". See the list of possible values for insightTrafficSourceType and insightTrafficSourceDetail here: https://developers.google.com/youtube/analytics/dimensions#Traffic_Source_Dimensions.
The default preset Playback Traffic Source Detail retrieves the top 25 search terms that lead to the video or channel. Changing from YT_SEARCH to ADVERTISING results in a different set of values for insightTrafficSourceDetail.
Why does my video, uploaded 1 month ago and published 1 week ago, show only analytics data starting from the published date?
Video creation and video upload are the same event. However, the creation date and the publish date can differ. When a video is uploaded to a YouTube channel, it remains private and no one can watch it until it is published. There is no interaction data between creation time and publish time; therefore, the YouTube Analytics API returns analytics data only from the publish time onward.
How to create a custom OAuth app
The following steps show how to set up a custom OAuth app on https://console.developers.google.com.
Create a new Google project
Follow https://cloud.google.com/resource-manager/docs/creating-managing-projects to create a new project on your Google Cloud Console.
Enable APIs and Services
Follow https://cloud.google.com/apis/docs/enable-disable-apis?hl=en and enable the following APIs:
YouTube Analytics API
YouTube Data API v3
Create an OAuth Client ID
Adding https://developers.google.com/oauthplayground to the Authorized redirect URIs helps to get the refresh_token using the OAuth Playground. You don't have to add OAuth Playground if you have other methods.
Save the client id and secret
Copy and save the client id and secret somewhere to use in the next steps.
How to get the refresh_token
The following steps show how to use OAuth Playground to retrieve the refresh_token for creating a YouTube data connection. You don't have to use OAuth Playground if you have other ways to retrieve this information.
Set up the Client ID and Secret on OAuth Playground
Enter the ID and Secret set up in Appendix.
Specify the scopes
Paste the following scopes into the Input your own scopes box:
Allow the scopes
Click on Authorize APIs and follow the usual Google steps to log in and allow the scopes.
Request the refresh_token
Click on Exchange authorization code for tokens.
Wait until the process finishes and then copy the Refresh token to use in your Treasure Data YouTube connector configuration.
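Before saving the credentials in Treasure Data, it can help to confirm that the client ID, client secret, and refresh token work together. The Python sketch below builds the standard OAuth 2.0 refresh-token request against Google's token endpoint. It is an optional check and not part of the connector; the function names are hypothetical and the credential strings are placeholders.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"  # Google's OAuth 2.0 token endpoint


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST request that trades a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def fetch_access_token(client_id, client_secret, refresh_token):
    """Send the request; raises urllib.error.HTTPError if Google rejects it."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]


# Placeholder values: substitute the credentials saved in the steps above,
# then call fetch_access_token(...) to confirm they are accepted.
request = build_refresh_request("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                                "YOUR_REFRESH_TOKEN")
print(request.get_method(), request.full_url)
```

If fetch_access_token returns a token, the three values are consistent; an HTTP 400 or 401 error usually points to a mismatched client or a revoked refresh token.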