You can import Google Search Analytics data into Treasure Data. You can filter and group data by dimensions such as page, query, country, or device. You can specify a schedule to periodically import the data.
Basic knowledge of Treasure Data
Basic knowledge of Google Search Analytics
A Google Search Console account.
Use the TD Console to Create Your Connection
Create a New Connection
When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and then specify the source information.
Go to Integrations Hub > Catalog and search and select Google Search Analytics.
The following dialog opens.
The Treasure Data Google Search Analytics integration requires OAuth2 authentication. Users must manually connect their Treasure Data account to their respective Google Search Analytics account.
To authenticate, complete the following procedure:
Select Click here to connect to a new account.
Log into your Google account in the popup window and grant access to the Treasure Data Connector app.
You are redirected back to the TD Console. Repeat the first step (Create a new connection) and choose your new OAuth connection.
Name your new Google Search Analytics connection. Select Done.
Transfer Your Google Search Analytics Data to Treasure Data
In Authentications, configure the New Source.
Complete the details and select Next.
Property tracked in Google Search Console. Google treats protocols (http:// or https://), domains, and subdomains (for example, example.com, m.example.com, and www.example.com) as distinct properties, so you must enter the exact URL of the domain, subdomain, or path branch from which you want to get data.
If you use Domain properties, use the following format when specifying your domain: sc-domain:example.com.
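For illustration, assuming the configuration key is named `site_url` (the key name is an assumption based on the connector's configuration options), valid values might look like:

```yaml
# Hypothetical examples; the key name site_url is an assumption.
# URL-prefix property: must match the exact protocol and subdomain.
site_url: https://www.example.com/
# Domain property: use the sc-domain: prefix.
# site_url: sc-domain:example.com
```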
Group by Dimension
You can group the data you import by one or more dimensions. The supported dimensions are Page, Country, Device, Query, and Search Appearance.
Note that you cannot group Search Appearance with other dimensions. An example of a valid dimension group is: Page, Country, Device.
The number of selected dimensions affects the output schema. For example, if the Country, Device, and Query dimensions are selected, the import has country, device, and query columns as shown:
Filter by Dimension
You can filter data in multiple categories. For example, if you are grouping data by query, you can add the filter "query contains 'treasure data'".
The Google Search Analytics connector currently supports only the AND operator. Therefore, a filter such as "Country equals 'USA'" AND "Device equals 'Mobile'" AND "Query contains 'sample data'" returns data, but "Country equals 'USA'" AND "Country equals 'JPN'" AND "Query contains 'sample data'" returns no data, because a row cannot match two different country values at once.
You can also filter analytics data by search type. The supported values are Web, Image, and Video; Web is selected by default. The selected value is included in the output schema in the search_type column.
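A grouping and filtering setup along these lines might look like the following in the configuration file; the key names (`dimensions`, `filters`) and the filter field names are assumptions, modeled on the dimension/operator/expression form used in the examples above:

```yaml
# Hypothetical fragment; key names are assumptions.
dimensions:
  - country
  - device
  - query
filters:
  - dimension: country
    operator: equals
    expression: USA
  - dimension: query
    operator: contains
    expression: sample data
search_type: web
```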
Start Date and End Date
The requested date range (Start Date and End Date) is required. Specify dates in YYYY-MM-DD format. The End Date must be greater than or equal to the Start Date. The values are inclusive: the range starts at the beginning of the Start Date (00:00:00) and ends at the end of the End Date (23:59:59). If you want to import a specific day, set the Start Date equal to the End Date. For example: Start Date = 2018-06-11, End Date = 2018-06-11.
The data is aggregated over the Start Date to End Date range. If you set a date range greater than one day, the clicks, ctr, impressions, and position values are the sum of the individual days over the range specified.
The resulting data is in PST time (UTC-8:00).
Data might have a 2-3 day delay before it becomes available in Google Search Console. As a best practice, don't set an End Date later than the date for which data is available.
By enabling Incremental Loading, you can schedule a job to run iteratively. The next iteration of the job run is calculated from the Start Date and End Date values.
The following is an example of the next iteration for single-day and multi-day time ranges. For example, after importing a one-week range, the next run advances the window by one week.
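The exact windowing algorithm isn't documented here; a plausible sketch, assuming the window advances by its own length starting the day after the previous End Date:

```python
from datetime import date, timedelta

def next_range(start: date, end: date):
    """Sketch of how the next incremental window could be derived.

    Assumption: the window advances by its own length, starting the
    day after the previous End Date. The connector's real algorithm
    is only described as "calculated from the Start Date and End
    Date values".
    """
    span = (end - start).days
    next_start = end + timedelta(days=1)
    next_end = next_start + timedelta(days=span)
    return next_start, next_end

# Single-day range: 2018-06-11..2018-06-11 advances to 2018-06-12..2018-06-12
print(next_range(date(2018, 6, 11), date(2018, 6, 11)))
# One-week range: 2018-06-01..2018-06-07 advances to 2018-06-08..2018-06-14
print(next_range(date(2018, 6, 1), date(2018, 6, 7)))
```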
Preview and Advanced Settings
You see a Preview of your data similar to the following dialog. Read more about Preview.
Select Advanced Settings to customize the data connector behavior.
Include Report Duration
The Include Report Duration option affects the output schema. If the checkbox is selected (the default), the result of the import job has two extra columns (Start Date and End Date, of type string) as shown in the following example:
A number of factors can affect the import of data into Treasure Data, such as network fluctuations and concurrency limits on the Google server. The connector retries the import the number of times you specify.
Initial retry time wait in millis
The connector initially waits the specified number of milliseconds before retrying the import. The next retry waits 2 * the initial retry time, and so on.
Max retry wait in millis
The connector stops retrying once the wait time reaches the limit you specify.
Choose the Target Database and Table
Choose an existing database and table, or create new ones.
Create a new database and give your database a name. Complete similar steps for Create new table.
Select whether to append records to an existing table or replace your existing table.
If you want to set a different partition key seed rather than use the default key, you can specify one using the popup menu.
In the Schedule tab, you can specify a one-time transfer, or you can schedule an automated recurring transfer. If you select Once now, select Start Transfer. If you select Repeat… specify your schedule options, then select Schedule Transfer.
Enter the New Source name and select Done.
Use Command Line
Install ‘td’ Command
Install the Treasure Data Toolbelt.
Obtain Required Google API Credentials
The connector configuration for the `td` command requires a client ID, client secret, and refresh token. You can obtain these parameters with the following steps:
Obtain client_id, client secret
If you already have set up OAuth 2 and enabled Google Search Console APIs access, and have client ID and client secret in place, you can skip this step.
To get the client ID and client secret, follow the instructions for Google Search Console APIs for developers: https://developers.google.com/webmaster-tools/search-console-api-original/v3/how-tos/authorizing
You can also use the Google Search Console API Wizard, which allows you to quickly create a project and turn on the Google Search Console API. Go to Credentials > Create Credentials > OAuth Client ID > Web Application. Enter a name, then select Create. The next screen shows the client ID and client secret.
The remaining credential you need is the refresh token. One way to obtain it is to use the Google OAuth 2.0 Playground.
First, in the OAuth 2.0 Playground, select the gear icon in the top right corner and make sure the Use your own OAuth credentials checkbox is selected. In the OAuth Client ID and OAuth Client secret fields, enter the credentials you got from the API console.
At Step 1, select the Search Console API v3, then select the "https://www.googleapis.com/auth/webmasters.readonly" scope, select Authorize APIs, and follow the instructions to grant the Search Console API access to your Google Search Console account.
At Step 2, select Exchange authorization code for tokens, which fills in the Refresh Token and Access Token fields. The refresh token is the value used in the connector configuration in the next steps.
Prepare Configuration and Preview Data
Prepare config.yml as shown:
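The original example isn't reproduced here; a minimal sketch follows, in which the connector type name and the exact key names are assumptions based on the options described elsewhere in this document:

```yaml
in:
  type: google_search_analytics   # type name is an assumption
  client_id: XXXXXXXX.apps.googleusercontent.com
  client_secret: XXXXXXXX
  refresh_token: XXXXXXXX
  site_url: sc-domain:example.com
  dimensions:
    - country
    - device
    - query
  search_type: web
  start_date: "2018-06-01"
  end_date: "2018-06-07"
  incremental: true
out:
  mode: append
```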
For details on available out modes, see Appendix A
For details of filters, see Appendix B
Then, you can see a preview of the data using the preview command.
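For example, using the Treasure Data Toolbelt:

```shell
td connector:preview config.yml
```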
You do not need to run td connector:guess for the Google Search Analytics connector.
Execute Load Job
Submit the load job. It may take a couple of hours to process depending on the size of the data.
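The load job can be submitted with `td connector:issue`; the database and table names below are placeholders:

```shell
td connector:issue config.yml \
  --database td_sample_db \
  --table td_sample_table
```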
You can schedule periodic data connector execution for incremental import of Search Analytics data.
For a scheduled import, on the first run, the Data Connector for Search Analytics imports all of your data within the range specified by "start_date" and "end_date".
On the second and subsequent runs, the connector imports only data that is newer than the last load.
Create the Schedule
You can create a schedule using the td connector:create command. The following are required: the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the data connector configuration file.
The `cron` parameter also accepts three special options: `@hourly`, `@daily`, and `@monthly`.
By default, the schedule is set up in the UTC timezone. You can set the schedule in a timezone using -t or --timezone option. Note that the `--timezone` option supports only extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles', etc. Timezone abbreviations like PST, CST are *not* supported and may lead to unexpected schedules.
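For example (the schedule name, cron expression, and database/table names below are placeholders):

```shell
td connector:create \
  daily_gsa_import \
  "10 0 * * *" \
  td_sample_db \
  td_sample_table \
  config.yml \
  --timezone "Asia/Tokyo"
```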
List All Schedules
You can see the list of all current schedule entries with the command td connector:list.
Show Schedule Settings and History
td connector:show shows the execution settings of a schedule entry.
td connector:history shows the execution history of a schedule entry. To investigate the results of each individual run, use td job <jobid>.
td connector:delete removes the schedule.
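The schedule management commands described above look like the following; `daily_gsa_import` is a placeholder schedule name:

```shell
td connector:list                       # list all schedule entries
td connector:show daily_gsa_import      # show execution settings
td connector:history daily_gsa_import   # show execution history
td connector:delete daily_gsa_import    # remove the schedule
```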
The following table provides details on the available configuration options. The option names below are reconstructed from the descriptions; several (start_date, end_date, search_type, incremental, client_id, client_secret, refresh_token) appear elsewhere in this document, while the rest are inferred.

| Option | Description | Default |
| --- | --- | --- |
| client_id | App client ID | |
| client_secret | App client secret | |
| refresh_token | App refresh token | |
| site_url | Site URL to be tracked | |
| dimensions | Search Analytics dimensions to group the result by | |
| filters | Filter by dimensions | |
| search_type | Filter by search type: web, image, or video | web |
| start_date | Start date of the requested date range | |
| end_date | End date of the requested date range | |
| incremental | Generate the next start_date and end_date for scheduled import | |
| include_report_duration | Specify whether start_date and end_date are included in the output result | |
| retry_limit | When a retryable error should give up | |
| initial_retry_wait | Initial wait when a retryable error occurs; each subsequent wait is 2 times the previous value | 15000 (15 seconds) |
| max_retry_wait | Max wait for each retry | 1800000 (30 minutes) |
Modes for Out Plugin
You can specify how new data is imported into Treasure Data by specifying mode in the out section of config.yml.
This is the default mode and records are appended to the target table.
replace (In td 0.11.10 and later)
This mode replaces data in the target table. Any manual schema changes made to the target table remain intact with this mode.
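A minimal out section sketch, assuming the structure described above:

```yaml
out:
  mode: replace   # the default mode is append
```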
The add_time filter plugin for Data Connector allows users to add a new time-based column to the schema, either by copying the value from another existing column in the schema or by specifying the value. See the add_time Filter Plugin for Integrations documentation for more details.
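A sketch of an add_time filter configuration; the exact field layout shown here is an assumption drawn from the add_time plugin's general documentation, not from this connector's docs:

```yaml
filters:
  - type: add_time
    to_column: {name: time, type: timestamp}
    from_value: {mode: upload_time}
```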
Supported Filter Operators
Added New Column
Starting from version v0.1.4, a new column named search_type is added to your job result, because the connector now supports the Search Type filter. The default value is "web" if you do not change the job configuration.