Use Treasure Data's Customer Data Platform to ingest your YouTube Analytics data.
You need basic knowledge of:
- Treasure Data Console (and Toolbelt)
- YouTube Analytics
You can create the data connector from the TD Console. This is the most common approach.
Visit the Treasure Data Catalog, search and select YouTube.

The following dialog opens.

Choose one of the Authentication Modes:
- OAuth
- Your custom OAuth app
Your custom OAuth app is the preferred mode. See the Q&A section for more information.
If you choose Your custom OAuth app, provide the following parameters:
- OAuth client_id: the client_id for the custom OAuth app set up on Google Console. For further details, see Appendix.
- OAuth client_secret: the client_secret for the custom OAuth app set up on Google Console. For further details, see Appendix.
- OAuth refresh_token: the refresh_token for your account that allows all scopes for the custom OAuth app. For further details, see Appendix.
If you choose OAuth, select an existing OAuth connection, or click Click Here to connect a new account.

Click Continue. Provide a name for your connector. Click Done.

Click New Source from the saved authentications.

Provide your account CMS ID and Channel IDs if you participate in the YouTube Partner Program (https://www.youtube.com/yt/creators/benefits/). Otherwise, leave the parameters empty.
Choose the Report Type (Video, Playlist, or Channel), and choose the report preset.


Parameters:
- CMS ID: CMS ID of the YouTube account, if participating in the YouTube Partner Program. Otherwise, leave this blank.
- Channel IDs: The list of Channel IDs under the management of the Content Owner. For non-partner YouTube accounts, leave this blank.
- Report Type: The expected target of the analytics: video, playlist, or channel.
- Report Presets: The collection of dimensions, metrics, and filters.
- Playlists: playlist IDs to filter the list of videos. This parameter is only available for Videos report presets.
- Dimensions: A list of Analytics dimensions
- Metrics: A list of Analytics metrics
- Filters: A list of filters (this parameter only appears in some presets)
- Max Results: The number of analytics records for the API to return. This parameter is mandatory only for some specific reports. Otherwise, leave it blank to retrieve all analytics.
- Sort: The metrics used to sort the analytics result from the API. This parameter is mandatory only for some specific reports.
- Include Historical Channel Data: Specify whether the YouTube Analytics API returns the historical analytics for the channel from before it was linked to the Content Owner. This parameter is not mandatory for non-Content Owner reports.
- Load from published date: The earliest published date of the specified Report Type becomes the start date to retrieve analytics.
- Begin date: The start date to retrieve analytics (either "Load from published date" or "Begin date" must be specified).
- End date: The end date to retrieve analytics (inclusive).
- Aggregation Period (Day): The number of days to group each record of the analytics.
- Duration (Day): The number of days to retrieve analytics from the current end date in the next incremental run.
- Incremental: When running on a schedule, the time window of the fetched data automatically shifts forward on each run. For example, if the begin date is January 1, the end date is January 15, and the duration is ten days, the first run fetches data from January 1 to January 15, the second run fetches from January 16 to January 25, and so on.
- Ignore Empty Playlist: Skip empty playlists instead of throwing errors.
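The way the Incremental window shifts can be sketched in a few lines of Python. This is only an illustration of the January example in the parameter list, not the connector's implementation: `incremental_windows` is a hypothetical helper, and the year 2018 is an arbitrary assumption.

```python
from datetime import date, timedelta


def incremental_windows(begin, end, duration_days, runs):
    """Return inclusive (start, end) date pairs for successive scheduled runs.

    Assumption: the first run covers begin..end, and every later run starts
    the day after the previous end date and spans duration_days days.
    """
    windows = [(begin, end)]
    for _ in range(runs - 1):
        next_start = windows[-1][1] + timedelta(days=1)
        next_end = next_start + timedelta(days=duration_days - 1)
        windows.append((next_start, next_end))
    return windows


# Begin date January 1, end date January 15, duration of ten days, three runs.
for start, stop in incremental_windows(date(2018, 1, 1), date(2018, 1, 15), 10, 3):
    print(start.isoformat(), "to", stop.isoformat())
```

A helper like this can be handy for checking which dates a scheduled job is expected to cover before you enable the schedule.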
You can see a preview of your data before running the import by selecting Generate Preview. Data preview is optional and you can safely skip to the next page of the dialog if you choose to.
- Select Next. The Data Preview page opens.
- If you want to preview your data, select Generate Preview.
- Verify the data.
For data placement, select the target database and table where you want your data placed and indicate how often the import should run.
Select Next. Under Storage, you will create a new or select an existing database and create a new or select an existing table for where you want to place the imported data.
Select a Database > Select an existing or Create New Database.
Optionally, type a database name.
Select a Table > Select an existing or Create New Table.
Optionally, type a table name.
Choose the method for importing the data.
- Append (default): Data import results are appended to the table. If the table does not exist, it will be created.
- Always Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.
- Replace on New Data: Replaces the entire content of an existing table with the result output only when there is new data.
Select the Timestamp-based Partition Key column. If you want to set a different partition key seed than the default key, you can specify the long or timestamp column as the partitioning time. As a default time column, it uses upload_time with the add_time filter.
Select the Timezone for your data storage.
Under Schedule, you can choose when and how often you want to run this query.
To run the import once:
- Select Off.
- Select Scheduling Timezone.
- Select Create & Run Now.
To run the import on a recurring schedule:
- Select On.
- Select the Schedule. The UI provides these four options: @hourly, @daily, @monthly, or custom cron.
- You can also select Delay Transfer and add a delay of execution time.
- Select Scheduling Timezone.
- Select Create & Run Now.
After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.
You can create the data connector from the CLI instead of the TD Console if you want.
Install the latest td tool via Ruby gem:
$ gem install td
$ td --version
0.16.1

There are other install methods. For more information, see Treasure Data Toolbelt.
- Example (config.yml)
The following is an example configuration file to request daily basic statistics for all videos on the YouTube channel.
in:
  type: youtube
  client_id: xxxxxxxxxxxxx
  client_secret: xxxxxxxxxxxxx
  access_token: xxxxxxxxxxxxx
  refresh_token: xxxxxxxxxxxxx
  report_type: video
  video_report_preset: basic_statistics
  skip_dimension: false
  dimension_video_basic_statistics: country
  metric_video_basic_statistics: views,comments,likes,dislikes
  begin_date: 2018-01-01
  end_date: 2018-02-01
  duration: 30
  interval: 1
  ignore_empty_playlist: true
out:
  mode: append

Specify the client_id, client_secret, access_token, and refresh_token for authenticating with the Google app. For more information, see Appendix.
Specify the target of retrieving analytics in the report_type parameter:
- video: retrieve analytics for individual videos in the channels or in specific playlists (specified as a comma-separated list of Playlist IDs in the playlist parameter).
- playlist: retrieve analytics for individual playlists.
- channel: retrieve analytics for individual channels under the management of the account.
A preset is a predefined group of parameters. The following presets are available, each listed with its equivalent group of parameters.
basic_statistics
dimension: country
metric: views,comments,likes,dislikes,videosAddedToPlaylists,videosRemovedFromPlaylists,shares,estimatedMinutesWatched,averageViewDuration,averageViewPercentage,annotationClickThroughRate,annotationCloseRate,annotationImpressions,annotationClickableImpressions,annotationClosableImpressions,annotationClicks,annotationCloses,cardClickRate,cardTeaserClickRate,cardImpressions,cardTeaserImpressions,cardClicks,cardTeaserClicks,subscribersGained,subscribersLost

basic_statistics_co
dimension: country
metric: views,comments,likes,dislikes,videosAddedToPlaylists,videosRemovedFromPlaylists,shares,estimatedMinutesWatched,averageViewDuration,averageViewPercentage,annotationClickThroughRate,annotationCloseRate,annotationImpressions,annotationClickableImpressions,annotationClosableImpressions,annotationClicks,annotationCloses,cardClickRate,cardTeaserClickRate,cardImpressions,cardTeaserImpressions,cardClicks,cardTeaserClicks,subscribersGained,subscribersLost,estimatedRevenue,estimatedAdRevenue,grossRevenue,estimatedRedPartnerRevenue,monetizedPlaybacks,playbackBasedCpm,adImpressions,cpm

basic_statistics_us
dimension: province,subscribedStatus
metric: views,redViews,estimatedMinutesWatched,estimatedRedMinutesWatched,averageViewDuration,averageViewPercentage,annotationClickThroughRate,annotationCloseRate,annotationImpressions,annotationClickableImpressions,annotationClosableImpressions,annotationClicks,annotationCloses,cardClickRate,cardTeaserClickRate,cardImpressions,cardTeaserImpressions,cardClicks,cardTeaserClicks
filter: country==US

playback_detail
dimension: country,liveOrOnDemand,subscribedStatus,youtubeProduct
metric: views,estimatedMinutesWatched,averageViewDuration

playback_detail_us
dimension: province,liveOrOnDemand,subscribedStatus,youtubeProduct
metric: views,redViews,estimatedMinutesWatched,estimatedRedMinutesWatched,averageViewDuration
filter: country==US

playback_location
dimension: insightPlaybackLocationType,liveOrOnDemand,subscribedStatus
metric: views,estimatedMinutesWatched

playback_location_detail
dimension: insightPlaybackLocationDetail
metric: views,estimatedMinutesWatched
filter: insightPlaybackLocationType==EMBEDDED
max_results: 25
sort: -views

playback_traffic_source
dimension: insightTrafficSourceType,liveOrOnDemand,subscribedStatus
metric: views,estimatedMinutesWatched

playback_traffic_source_detail
dimension: insightTrafficSourceDetail
metric: views,estimatedMinutesWatched
filter: insightTrafficSourceType==YT_SEARCH
max_results: 25
sort: -views

device_os_type
dimension: deviceType,operatingSystem,liveOrOnDemand,subscribedStatus,youtubeProduct
metric: views,estimatedMinutesWatched

viewer_demographic
dimension: ageGroup,gender
metric: viewerPercentage

engagement_and_content_sharing
dimension: sharingService,subscribedStatus
metric: shares

audience_retention (only for video)
dimension: elapsedVideoTimeRatio
metric: audienceWatchRatio,relativeRetentionPerformance
filter: audienceType==ORGANIC

The following presets apply to playlist reports (each includes the isCurated==1 filter):

basic_statistics
dimension: country,subscribedStatus,youtubeProduct
metric: views,estimatedMinutesWatched,averageViewDuration,playlistStarts,viewsPerPlaylistStart,averageTimeInPlaylist
filter: isCurated==1

basic_statistics_us
dimension: province,subscribedStatus,youtubeProduct
metric: views,redViews,estimatedMinutesWatched,estimatedRedMinutesWatched,averageViewDuration,playlistStarts,viewsPerPlaylistStart,averageTimeInPlaylist
filter: isCurated==1;country==US

playback_location
dimension: insightPlaybackLocationType,subscribedStatus
metric: views,estimatedMinutesWatched,playlistStarts,viewsPerPlaylistStart,averageTimeInPlaylist
filter: isCurated==1

traffic_source
dimension: insightTrafficSourceType,subscribedStatus
metric: views,estimatedMinutesWatched,playlistStarts,viewsPerPlaylistStart,averageTimeInPlaylist
filter: isCurated==1

device_os_type
dimension: deviceType,operatingSystem,subscribedStatus,youtubeProduct
metric: views,estimatedMinutesWatched,playlistStarts,viewsPerPlaylistStart,averageTimeInPlaylist
filter: isCurated==1

viewer_demographic
dimension: ageGroup,gender
metric: viewerPercentage
filter: isCurated==1

You can find the values for the dimension and metric parameters (as well as other required parameters) in the following articles from Google:
https://developers.google.com/youtube/analytics/channel_reports
https://developers.google.com/youtube/analytics/content_owner_reports
| Value | Description |
|---|---|
| content_owner | CMS ID (required if your YouTube account is a Content Owner) |
| channel | List of Channel IDs. Effective only if your account is Content Owner (array, optional, if empty, the plugin fetches analytics from channels that Content Owner has access to). This parameter applies to only content owner reports. |
| playlist | List of Playlist IDs, only effective for video report type (array, optional, if blank, the plugin fetches analytics of all videos) |
| metric | A comma-separated list of YouTube Analytics metrics, such as views, likes, or dislikes (string, required if report_preset is omitted) |
| dimension | A comma-separated list of YouTube Analytics dimensions (string, optional) |
| filter | A list of filters that should be applied when retrieving YouTube Analytics data, separated by semi-colons (string, optional) |
| max_results | The maximum number of rows to include in the response, required for only some reports (integer, optional) |
| sort | A comma-separated list of dimensions or metrics that determine the sort order for YouTube Analytics data. By default, the sort order is ascending. The - prefix causes descending sort order (string, optional) |
| incremental | Incremental loading. When running on a schedule, the time window of the fetched data automatically shifts forward on each run (boolean, optional, default: true). For example, if the initial config is January 1, with ten days in duration, the first run fetches data from January 1 to January 10, the second run fetches from January 11 to January 20, and so on. |
| load_from_published_date | The earliest published date of specified content (Report Type) becomes the start date to retrieve analytics (boolean, optional) |
| begin_date | The start date to retrieve analytics, supported format: "yyyy-MM-dd" (string, optional). Specify either load_from_published_date or begin_date. |
| end_date | The end date to retrieve analytics, supported format: "yyyy-MM-dd" (string, required). |
| duration | The number of days to retrieve analytics from the current end date in the next incremental run (integer, required in incremental mode). |
| interval | The number of days to break down analytics (integer, optional, default: 1, for example, daily) |
| include_historical_channel_data | Indicates whether to include channels' watch time and view data from the time period prior to when the channels were linked to the content owner. The default parameter value is false which means importing only watch time and viewing data from the dates that channels were linked to the content owner (boolean, optional, default: false). This parameter only applies to content owner reports. |
| maximum_retries | The number of retries before giving up (integer, optional, default: 7) |
| retry_initial_wait_millis | The initial waiting duration between retries, in milliseconds (integer, optional, default: 30000, i.e., 30 seconds) |
| maximum_retry_wait_millis | The maximum waiting duration between retries, in milliseconds (integer, optional, default: 1800000, i.e., 30 minutes) |
| ignore_empty_playlist | Skip empty playlists instead of throwing an error (boolean) |
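
Several of the rules in the table interact (either load_from_published_date or begin_date, a required end_date, a duration in incremental mode). The following Python sketch checks them for an `in:` section loaded as a dictionary. It is an illustration only: `validate_config` is a hypothetical helper, not part of the td toolbelt, and it covers just the rules stated in the table.

```python
from datetime import datetime


def validate_config(cfg):
    """Return a list of problems found in an `in:` section given as a dict."""
    problems = []
    # Either load_from_published_date or begin_date must be specified.
    if not cfg.get("load_from_published_date") and not cfg.get("begin_date"):
        problems.append("specify either load_from_published_date or begin_date")
    # end_date is required.
    if "end_date" not in cfg:
        problems.append("end_date is required")
    # Dates must parse as year-month-day (the table's yyyy-MM-dd format).
    for key in ("begin_date", "end_date"):
        if key in cfg:
            try:
                datetime.strptime(str(cfg[key]), "%Y-%m-%d")
            except ValueError:
                problems.append(f"{key} must use the yyyy-MM-dd format")
    # incremental defaults to true, and duration is required in that mode.
    if cfg.get("incremental", True) and "duration" not in cfg:
        problems.append("duration is required in incremental mode")
    return problems


print(validate_config({"begin_date": "2018-01-01", "end_date": "2018-02-01", "duration": 30}))
```

An empty list means none of these particular rules is violated; it does not guarantee that the YouTube Analytics API accepts the chosen dimensions and metrics.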
To preview the data before loading it, run td connector:preview:
td connector:preview config.yml

You must specify the database and table to store the data.
The --time-column option is preferred because Treasure Data partitions the storage by time. If this option is not specified, the data connector selects the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type (use the preview results to check the available column names and types). A time column is available at the end of the output.
If your data doesn’t have a time column, you can add the column by using the add_time filter option. See details at add_time filter function.
Submit the load job. It may take a couple of hours depending on the data size. You need to specify the database and table to store the data.
td connector:issue config.yml \
--database sample_db \
--table sample_table

td connector:issue assumes you have already created a database (sample_db) and a table (sample_table). If the database or the table does not exist in TD, td connector:issue fails. Therefore, you must create the database and table manually or use --auto-create-table with td connector:issue to automatically create the database and table:
td connector:issue config.yml \
--database sample_db \
--table sample_table \
--auto-create-table

You can schedule periodic data connector execution for a recurring YouTube import. By using this feature, you no longer need a cron daemon on your local data center.
td connector:create creates a new schedule. The name of the schedule, the cron-style schedule, the database and table to store the data, and the Data Connector configuration file are mandatory.
td connector:create \
daily_youtube_import \
"10 0 * * *" \
sample_db \
sample_table \
config.yml

The cron parameter also accepts three options: @hourly, @daily, and @monthly. For details, see Scheduled Jobs.
By default, the schedule is set up in the UTC timezone. You can set the timezone using --timezone or -t option. The --timezone option supports only extended timezone formats like Asia/Tokyo, America/Los_Angeles, and so on. Timezone abbreviations like PST, CST are not supported and might lead to unexpected schedules.
You can see the list of currently scheduled entries by running td connector:list.
td connector:list

td connector:show shows the execution setting of a schedule entry.
td connector:show daily_youtube_import

YouTube Analytics data is based on PST and has a delay of up to 72 hours (https://support.google.com/youtube/answer/1714329?hl=en).
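Because the data is based on PST and can lag by up to 72 hours, it can help to work out the most recent date that is likely complete before choosing an end date. The Python sketch below is one conservative reading of that limitation, not the connector's own logic: `latest_settled_date` is a hypothetical helper, and it assumes a fixed UTC-8 offset that ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # assumption: fixed UTC-8, no daylight saving
DELAY = timedelta(hours=72)          # maximum delay quoted for YouTube Analytics


def latest_settled_date(now_utc):
    """Return the latest PST calendar date that ended at least 72 hours ago."""
    cutoff = (now_utc - DELAY).astimezone(PST)
    # The day containing the cutoff has not fully ended before it, so step back one day.
    return cutoff.date() - timedelta(days=1)


print(latest_settled_date(datetime.now(timezone.utc)).isoformat())
```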
You cannot retrieve analytics for removed Videos and Playlists.
YouTube has the following limitations on the YouTube Analytics API Quotas:

The connector makes many YouTube API calls to ingest daily YouTube analytics. For example, if your channel has 1,000 videos, and you want to import 100 days of historical data, the total number of API calls is 1,000 x 100 = 100,000.
The connector estimates the total number of requests before calling the YouTube API. If the estimate is higher than 100,000, the job stops. You must either increase the Aggregation Period or set an earlier End Date. The estimate is calculated as follows.
Suppose there are 4 videos:
- Video 1 is published at 2010-02-10
- Video 2 is published at 2012-03-11
- Video 3 is published at 2016-04-12
- Video 4 is published at 2018-05-13
In the following example, the begin date is the date that the data connector starts to fetch analytics data from YouTube Analytics. The publish date is the date that the video is published in YouTube. The end date is the date that you specify in Treasure Data to end the ingestion session.
Suppose the end date is 2017-06-14, the begin date is 2010-01-01, and the aggregation period is 7. The formula to calculate the total number of calls for one video is: (End Date - Begin Date + 1) / Aggregation Period, rounded up (with the begin date for each video being its published date).
The end date is inclusive, meaning that the analytics data that is available on that end_date is ingested as well. With 4 videos:
- Video 1: ("2017-06-14" - "2010-02-10" + 1) / 7 = 2682 / 7, rounded up to 384
- Video 2: ("2017-06-14" - "2012-03-11" + 1) / 7 = 1922 / 7, rounded up to 275
- Video 3: ("2017-06-14" - "2016-04-12" + 1) / 7 = 429 / 7, rounded up to 62
- Video 4 is skipped because it was published (2018) after the end date (2017).
The estimated total is 384 + 275 + 62 = 721 requests, which is acceptable because it is still below 100,000 requests.
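The estimate can be reproduced with a short Python sketch. This is an approximation of the description above, not the connector's code: `estimate_requests` is a hypothetical helper, it assumes each video starts at the later of its published date and the begin date, and it assumes partial aggregation periods are rounded up. The connector's exact rounding may differ.

```python
from datetime import date
from math import ceil


def estimate_requests(published_dates, begin, end, aggregation_days):
    """Estimate API calls: one call per aggregation period per video."""
    total = 0
    for published in published_dates:
        start = max(published, begin)  # assumption: no calls before publication
        if start > end:
            continue  # published after the end date, so the video is skipped
        days = (end - start).days + 1  # the end date is inclusive
        total += ceil(days / aggregation_days)
    return total


videos = [date(2010, 2, 10), date(2012, 3, 11), date(2016, 4, 12), date(2018, 5, 13)]
estimate = estimate_requests(videos, date(2010, 1, 1), date(2017, 6, 14), 7)
print(estimate, "requests; under the 100,000 limit:", estimate < 100_000)
```

Running the same function with a larger aggregation period or an earlier end date shows how either change lowers the estimate.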
When using the same connection (see Create a New Connection) for multiple inputs, the quotas will run out quickly, and the import job will stop. A workaround is to create multiple connections with different OAuth apps (see Appendix).
Content Owner reports are only available to accounts that participate in the YouTube Partner Program (https://www.youtube.com/yt/creators/benefits/). If you are not participating in this program, retrieving content owner analytics causes an exception.
Be aware that it takes up to 72 hours for the analytics to accumulate. For more information, see https://support.google.com/youtube/answer/1714329?hl=en.
During detailed testing, we found some discrepancies between the analytics displayed in YouTube Creator Studio and the analytics ingested by the connector. We investigated and found the following reasons:
- Creator Studio reports round some numbers before showing the data on the website.
- The YouTube Analytics API applies some limitations to the returned analytics. To match Creator Studio, you may need to remove specific dimensions and filters. For more information, refer to https://support.google.com/youtube/answer/9101241
During testing, we also observed that for some combinations of dimensions and metrics, the YouTube Analytics API returns an empty result without any notice or exception. For example, if the country dimension and the redViews metric are in the same request, no analytics are returned.
To retrieve the redViews metric, you must remove the country dimension, and vice versa. Therefore, no report preset contains this pair of dimension and metric.
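If you customize dimensions and metrics, a small pre-flight check can catch combinations known to produce empty results. The Python sketch below lists only the one pair named above; `KNOWN_EMPTY_COMBINATIONS` and `empty_result_pairs` are hypothetical names, and other problematic pairs may exist that are not covered here.

```python
# Only the pair documented above; extend this list as you discover others.
KNOWN_EMPTY_COMBINATIONS = [("country", "redViews")]


def empty_result_pairs(dimension, metric):
    """Return known (dimension, metric) pairs that yield no analytics.

    Both arguments are comma-separated strings, as in the connector config.
    """
    dims = {d.strip() for d in dimension.split(",") if d.strip()}
    mets = {m.strip() for m in metric.split(",") if m.strip()}
    return [(d, m) for d, m in KNOWN_EMPTY_COMBINATIONS if d in dims and m in mets]


print(empty_result_pairs("country,subscribedStatus", "views,redViews"))
```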
To retrieve the view counts that come from TrueView, use the Playback Traffic Source Detail preset, but change the Filter parameter from insightTrafficSourceType==YT_SEARCH to insightTrafficSourceType==ADVERTISING. The ingested insightTrafficSourceDetail column then holds values such as "TrueView in-search and in-display" and "TrueView in-stream". See the list of possible values for insightTrafficSourceType and insightTrafficSourceDetail here: https://developers.google.com/youtube/analytics/dimensions#Traffic_Source_Dimensions.
The default preset Playback Traffic Source Detail retrieves the top 25 search terms that lead to the video or channel. Changing from YT_SEARCH to ADVERTISING results in a different set of values for insightTrafficSourceDetail.
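The filter change can be illustrated in Python by treating the preset as a dictionary. This is only a sketch of the parameter change: the dictionary mirrors the playback_traffic_source_detail preset values listed in this article, and `with_traffic_source` is a hypothetical helper, not a connector function.

```python
PLAYBACK_TRAFFIC_SOURCE_DETAIL = {
    "dimension": "insightTrafficSourceDetail",
    "metric": "views,estimatedMinutesWatched",
    "filter": "insightTrafficSourceType==YT_SEARCH",
    "max_results": 25,
    "sort": "-views",
}


def with_traffic_source(preset, source_type):
    """Copy a preset, replacing only the insightTrafficSourceType filter clause."""
    prefix = "insightTrafficSourceType=="
    # Filters are separated by semicolons; keep every clause except the old source type.
    clauses = [c for c in preset["filter"].split(";") if c and not c.startswith(prefix)]
    clauses.append(prefix + source_type)
    return {**preset, "filter": ";".join(clauses)}


trueview = with_traffic_source(PLAYBACK_TRAFFIC_SOURCE_DETAIL, "ADVERTISING")
print(trueview["filter"])  # prints insightTrafficSourceType==ADVERTISING
```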
Video creation and video upload are the same event. However, the creation date and the publish date can differ. When a video is uploaded to a YouTube channel, it is still private, and no one can watch it until it is published. There is no interaction data between the creation time and the publish time, so the YouTube Analytics API returns analytics data only from the publish time onward.
The following steps show how to set up a custom OAuth app on https://console.developers.google.com.
Follow https://cloud.google.com/resource-manager/docs/creating-managing-projects to create a new project on your Google Cloud Console.
Follow https://cloud.google.com/apis/docs/enable-disable-apis?hl=en and enable the following APIs:
- YouTube Analytics API
- YouTube Data API v3


Adding https://developers.google.com/oauthplayground to the Authorized redirect URIs helps to get the refresh_token using the OAuth Playground. You don't have to add OAuth Playground if you have other methods.
Copy and save the client id and secret somewhere to use in the next steps.
The following steps show how to use OAuth Playground to retrieve the refresh_token for creating a YouTube data connection. You don't have to use OAuth Playground if you have other ways to retrieve this information.
Go to https://developers.google.com/oauthplayground/

Enter the client ID and secret that you set up in Appendix.
Paste the following scopes into the Input your own scopes box:
https://www.googleapis.com/auth/youtube.readonly https://www.googleapis.com/auth/yt-analytics-monetary.readonly https://www.googleapis.com/auth/yt-analytics.readonly https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
Click on Authorize APIs and follow the usual Google steps to log in and allow the scopes.
Click on Exchange authorization code for tokens.

Wait until the process finishes and then copy the Refresh token to use in your Treasure Data YouTube connector configuration.
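Before pasting the refresh token into the connector configuration, you may want to confirm that it works. The Python sketch below exchanges a refresh token for an access token at Google's OAuth 2.0 token endpoint using only the standard library. It is an optional check, not something the connector requires; `build_refresh_request` and `fetch_access_token` are hypothetical helper names, and the credential values are placeholders.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"  # Google's OAuth 2.0 token endpoint


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST request that exchanges a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def fetch_access_token(client_id, client_secret, refresh_token):
    """Send the request; raises urllib.error.HTTPError if the credentials are rejected."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]


# Example (placeholders; requires network access and real credentials):
# token = fetch_access_token("xxxxxxxxxxxxx", "xxxxxxxxxxxxx", "xxxxxxxxxxxxx")
```

If the call succeeds, the same client_id, client_secret, and refresh_token can be entered in the TD Console or in config.yml.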