You can use the Facebook Page Insights connector to import Facebook data into Treasure Data.
Prerequisites
Basic knowledge of Treasure Data
Basic knowledge of Facebook Graph API
Required permissions for downloading Facebook Page data
Authorized Treasure Data account access
Using TD Console to Create Your Connection
Create a New Authentication
Go to Integrations Hub > Catalog. Search and select Facebook Page Insights. A dialog will open.
Select an existing OAuth connection for Facebook, or select the link under OAuth connection to create a new connection.
Create a New OAuth Connection
Log in to your Facebook account in the popup window and grant access to the Treasure Data app.
You will be redirected back to the TD Console. Repeat the first step (Create a new authentication) and choose your new OAuth connection.
Name your new authentication. Select Done.
Transfer Your Facebook Insights Data to Treasure Data
In Authentications, configure the New Source.
In this dialog, you can name the Source by editing the Data Transfer Name.
Name the Source in the Data Transfer field.
Select Next. The Source Table dialog opens.
In Source Table, edit the parameters and select Next.
Supported data types:
For the data type Video, the only period supported is Lifetime.
In this dialog, you can edit data settings or opt to skip this step.
Edit the Data Settings parameters.
Retrieve Video insights back to
Due to Facebook-specified data limits, a job may fail to retrieve all past data. The default setting is to import the last 3 months of data for Video Insights. Update this value to import more data.
Skip Error POST
Default: true. Skips errors that occur while importing Post insights.
The number of retries before the connector stops trying to connect and retrieve data.
Retry initial wait in millis
Initial interval, in milliseconds, to wait before retrying after a recoverable error.
Max retry wait in millis
Maximum time in milliseconds between retry attempts.
HTTP connect timeout in millis
HTTP connection timeout, in milliseconds.
HTTP idle timeout in millis
HTTP idle timeout, in milliseconds.
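The retry settings above work together as an exponential backoff. The following is a rough sketch of how the wait times could grow, assuming the wait doubles after each failed attempt and is capped at the maximum wait. The doubling factor and the helper name retry_waits_ms are assumptions for illustration, not documented connector behavior. The 500 ms and 300000 ms values are the defaults listed with the command-line configuration keys in this document.

```python
def retry_waits_ms(retry_limit, initial_wait_ms=500, max_wait_ms=300_000):
    """Return the wait (in ms) before each retry, doubling up to a cap.

    Hypothetical helper: the doubling factor is an assumption.
    """
    waits = []
    wait = initial_wait_ms
    for _ in range(retry_limit):
        # Never wait longer than the configured maximum.
        waits.append(min(wait, max_wait_ms))
        wait *= 2
    return waits


print(retry_waits_ms(5))
```

Under these assumptions, raising the retry count mostly adds long waits at the capped maximum, so a large retry count can extend a failing job considerably.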
You can see a preview of your data before running the import by selecting Generate Preview. You can opt to skip this step.
Select Generate Preview.
After the preview is generated and you have verified it is the correct data, select Next.
In this dialog, you will specify where your data will be placed and schedule how often it will run this import.
Under Storage, create a new database or select an existing one, and create a new table or select an existing one, to hold the imported data.
Select a Database > Select an existing or Create New Database.
Select a Table > Select an existing or Create New Table.
Choose the Append or Replace method for importing the data.
Append (default): Data import results are appended to the table. If the table does not exist, it will be created.
Replace: Replaces the entire content of an existing table with the resulting output of the query. If the table does not exist, a new table is created.
Select the Timestamp-based Partition Key column. If you want to set a different partition key seed than the default key, you can specify the long or timestamp column as the partitioning time. As a default time column, it uses upload_time with the add_time filter.
Select the Timezone for your data storage.
Under Schedule, you can choose when and how often you want to run this query.
Select Scheduling Timezone.
Select Create & Run Now.
Repeat the query:
Select the Schedule. The UI provides these four options: @hourly, @daily, @monthly, or custom cron.
You can also select Delay Transfer and add a delay of execution time.
Select Scheduling Timezone.
Select Create & Run Now.
After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.
Use Command-Line to Create Your Facebook Connection
You can use the Treasure Data command-line interface (TD Toolbelt) to configure your connection.
Install the Treasure Data Toolbelt
Open a terminal and run the following command to install the newest TD Toolbelt.
Obtain a Facebook Token
Facebook provides 3 types of tokens. You will need the Page Access Token. We recommend that you select the never-expiring Page Access Token.
To obtain the never-expiring Page Access Token, follow the instructions here: https://www.rocketmarketinginc.com/blog/get-never-expiring-facebook-page-access-token/
Prepare a Configuration File (config.yml)
Using a text editor, create a file called config.yml. Copy and paste the following information replacing the placeholder text with your Facebook connector info.
The in: section is where you specify what comes into the connector from Facebook and the out: section is where you specify what the connector puts out to the database in Treasure Data. For more details on available out: modes, see Appendix.
Configuration keys and descriptions are as follows:
Facebook Page Access Token.
Facebook Page ID. See Addendum
Import all supported insight metrics for the current Data Type. Set this value so you don't need to set metric_presets or metrics. Applicable for Page and Post. See Available Metrics.
Predefined category of metrics, or group of related metrics. See Supported Preset Metrics.
Facebook Graph insight metrics. You can specify as many individual metrics as you need. This setting overrides metric_presets if both are specified. See Supported Metrics.
Lower bound of the time range to consider. Supported formats: yyyy-MM-dd or Unix time, for example 1584697547.
Upper bound of the time range to consider. Supported formats: yyyy-MM-dd or Unix time, for example 1584697547.
The aggregation period. See Supported Periods.
Preset a date range, like ‘lastweek’ or ‘yesterday’. If a ‘since’ or ‘until’ date is specified and the date_preset is also selected, the data transfer request will fail. See Supported Date Presets.
Set to true to generate “config_diff” with embulk run -c config.diff.
Number of months of Video insights to retrieve, counting back from now. Specifying a range of more than 3 months can cause the job to fail because of Facebook API limits.
Skip error when importing POST insights
Number of error retries before connector gives up
Wait milliseconds for exponential backoff initial value
500 (0.5 second)
Maximum wait milliseconds for each retry
300000 (5 minutes)
HTTP connect timeout in milliseconds
180000 (3 minutes)
HTTP idle timeout in milliseconds
300000 (5 minutes)
Conversation folder: inbox, page_done, other, pending and spam
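Two of the rules above are easy to check before submitting a job: the time bounds must be yyyy-MM-dd or Unix time, and date_preset must not be combined with since or until. The following is a minimal pre-flight sketch, assuming the configuration has been loaded into a plain dictionary. The key names since, until, and date_preset come from the descriptions above; the function names are hypothetical and not part of the connector.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_bound(value):
    """True if value looks like yyyy-MM-dd or a Unix timestamp in seconds."""
    text = str(value)
    if DATE_RE.fullmatch(text):
        try:
            # Rejects impossible dates such as 2017-02-30.
            datetime.strptime(text, "%Y-%m-%d")
            return True
        except ValueError:
            return False
    return text.isascii() and text.isdigit()


def check_time_range(config):
    """Return a list of problems found in the time-range keys of config."""
    problems = []
    has_bounds = "since" in config or "until" in config
    if has_bounds and "date_preset" in config:
        problems.append("date_preset cannot be combined with since/until")
    for key in ("since", "until"):
        if key in config and not is_valid_bound(config[key]):
            problems.append(f"{key} must be yyyy-MM-dd or Unix time")
    return problems


print(check_time_range({"since": "2017-01-01", "date_preset": "yesterday"}))
```

An empty list means the time-range keys passed both checks; it does not guarantee that Facebook will accept the request.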
Example of config.yml with incremental and Page data type
Post data type
Video data type
Preview the Data to be Imported (Optional)
You can preview data to be imported using the command td connector:preview.
Execute Load Job
You must specify the database and table where you want to store the data before you execute the load job.
Use td connector:issue to execute the job. The following are required: the database and table where the data will be stored, and the Data Connector configuration file.
We recommend specifying the --time-column option because Treasure Data’s storage is partitioned by time. You can also use the --time-column option to override auto-generated time values by specifying end_time as the time column (this applies only when data_type is page). Data is accumulated daily, and end_time is the end of the day in the timezone of your Facebook Page, converted to UTC.
If your data doesn’t have a time column, you can add the column by using the add_time filter option. See details at add_time Filter Plugin for Integrations.
Finally, submit the load job. It may take a couple of hours depending on the data size. You must specify the database and table where the data is stored.
You can schedule periodic data connector execution for incremental Facebook Insights data. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon on your local data center.
For the scheduled import, on the first run, the Data Connector for Facebook Page Insights imports all of your Facebook Page Insights data.
On the second and subsequent runs, the connector imports only data that is newer than the last load.
Create the Schedule
A new schedule can be created using the td connector:create command. The following are required: the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the data connector configuration file.
The `cron` parameter also accepts three special options: `@hourly`, `@daily` and `@monthly`.
By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. Note that the `--timezone` option supports only extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles' etc. Timezone abbreviations like PST, CST are *not* supported and may lead to unexpected schedules.
List All Schedules
You can see the list of all current schedule entries with the command td connector:list.
Show Schedule Settings and History
td connector:show shows the execution settings of a schedule entry.
td connector:history shows the execution history of a schedule entry. To investigate the results of each individual run, use td job <jobid>.
td connector:delete removes the schedule.
Q: Why were my scheduled jobs categorized as "SUCCESS" but did not bring in new data?
This means that either ‘Start Date’ or ‘End Date’ has passed the current date, that is, it is set to a date in the future. A common cause is a cron interval that is shorter than the fetched time range, for example, a daily job that pulls monthly data.
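To see why a short cron interval runs into future dates, consider this illustrative model. It assumes that each scheduled run shifts the time range forward by the length of the initial range, which is a simplification for illustration and not the connector's documented algorithm. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def first_future_run(since, until, first_run, cron_interval_days, max_runs=1000):
    """Return the 1-based run number whose window first ends after the run date.

    Assumes each run shifts the window forward by its own length; this is an
    illustrative model, not the connector's documented algorithm.
    Returns None if no run within max_runs asks for future data.
    """
    window = until - since
    run_date = first_run
    for run in range(1, max_runs + 1):
        if until > run_date:
            return run
        # Advance the window and move to the next scheduled run date.
        since, until = until, until + window
        run_date += timedelta(days=cron_interval_days)
    return None


# A daily job that pulls a one-month range.
print(first_future_run(date(2018, 4, 1), date(2018, 5, 1), date(2018, 5, 9), 1))
```

In this model, a daily schedule with a one-month range already requests future dates on the second run, while a daily schedule with a one-day range never does.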
Q: How can I set up daily jobs to pull in new data each time?
You need an initial load (or multiple loads, due to the limitation of a 3-month time range for each load). Let’s say today is 2018-05-09, and you need to load data since 2018-01-01:
First job (one-time)
Second job (one-time)
[Incremental] Daily job
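The date ranges for the one-time jobs can be worked out by splitting the backfill period into windows of at most three months. The following sketch assumes calendar-month windows and reuses each window's End Date as the next window's Start Date. Whether adjacent windows need to overlap by a day depends on the End Date behavior described under How Data Aggregated, so treat the boundaries as a starting point. The helper names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    last_day = calendar.monthrange(year, month + 1)[1]
    return date(year, month + 1, min(d.day, last_day))


def split_range(since, until, max_months=3):
    """Split the period from since to until into windows of at most max_months."""
    windows = []
    start = since
    while start < until:
        end = min(add_months(start, max_months), until)
        windows.append((start, end))
        start = end
    return windows


for start, end in split_range(date(2018, 1, 1), date(2018, 5, 9)):
    print(start, end)
```

For the example in this answer, the sketch yields two windows: 2018-01-01 to 2018-04-01 and 2018-04-01 to 2018-05-09.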
Q: Why do I suddenly have so many Posts Insights data?
Starting with version v0.2.0, you get the insights of all Posts, from the very first Post of the page up to the End Date value. This improvement changed the date setting so that you get Insights data from Start Date to End Date for all available Posts. In contrast, version v0.1.16 returned insights data from Start Date to End Date only for Posts created within that date range.
Modes for Out Plugin
You can specify the file import mode in the out section of config.yml.
This is the default mode and records are appended to the target table.
This mode replaces data in the target table. Any manual schema changes made to the target table remain intact with this mode.
How Data Is Aggregated
Result data will be aggregated by the end of each day, meaning insights of 2017-01-01 will have end_time = 2017-01-02 00:00:00.
Facebook APIs and other Facebook interfaces vary in how Start date and End dates are defined. The Start and End date you specify while defining this connector is passed to the Facebook API that you specify. You might need to adjust the Start and End dates that you specify to ensure that, within Facebook, the various API and interface Start and End dates match.
If you use Incremental Loading and scheduled jobs, the time range for the next iteration (run) will be calculated based on the initial Start Date and End Date. Imported data will have no gaps.
Due to the way Facebook API query operates, you will need to pay attention to End Date if you’re running a one-time job.
Your result will include insights data with end_time spanning from <Start Date + 1> 00:00:00 to <End Date - 1> 00:00:00.
If you want to import the insights for the month of January 2017, the End Date must be used carefully. For example, if you chose the Start Date and End Date this way:
Start Date = 2017-01-01 (or 2017-01-01 00:00:00)
End Date = 2017-02-01 (or 2017-02-01 00:00:00)
Your output will look similar to the following example:
The result contains data up to and including Jan 30th, ending at the beginning of Jan 31st, and includes no data for Jan 31st. To get data for the entire month, set End Date = 2017-02-02.
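The date arithmetic in this section can be checked with a short sketch. It applies the two rules stated above: end_time values span from Start Date + 1 day to End Date - 1 day, and the insights for a given day carry an end_time of the following midnight. The function names are hypothetical and used only for illustration.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def end_time_range(start_date, end_date):
    """First and last end_time (as midnight dates) returned for a one-time job."""
    return start_date + ONE_DAY, end_date - ONE_DAY


def covered_days(start_date, end_date):
    """First and last calendar day whose insights are included.

    Insights for day D carry end_time = D + 1 day, so subtract one day.
    """
    first_end, last_end = end_time_range(start_date, end_date)
    return first_end - ONE_DAY, last_end - ONE_DAY


print(covered_days(date(2017, 1, 1), date(2017, 2, 1)))
print(covered_days(date(2017, 1, 1), date(2017, 2, 2)))
```

With End Date = 2017-02-01 the covered days are Jan 1st through Jan 30th; with End Date = 2017-02-02 they extend through Jan 31st.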
Supported Preset Metrics
Value to Use
Page CTA Clicks
Page User Demographics
Page Video Views
Page Post Impressions
Page Post Engagement
Page Post Reactions
Page Video Posts
People Talking About This
To import all periods, select all periods in TD Console, or leave the period unset in the TD CLI.
Supported Date Presets
Instead of using the Page's username when creating the connector, you can use the Page ID. To find the Page ID, on your Facebook page, select the About menu and scroll down to Page ID.