
Facebook Page Insights Import Integration

You can use the Facebook Page Insights integration to import Facebook Page data, such as Page, Post, Video, Conversation, and Comment insights, into Treasure Data.

Prerequisites

  • Basic knowledge of Treasure Data
  • Basic knowledge of Facebook Graph API
  • Required permissions for downloading Facebook Page data
  • Authorized Treasure Data account access

Using TD Console to Create Your Connection

Create a New Authentication

Go to Integrations Hub > Catalog. Search and select Facebook Page Insights. A dialog will open.

You can select an existing OAuth connection for Facebook or click the link under OAuth connection to create a new connection.

Create a New OAuth Connection

Log in to your Facebook account in the popup window and grant access to the Treasure Data app.

You will be redirected back to the TD Console. Repeat the first step (Create a New Authentication) and choose your new OAuth connection.

Name your new authentication. Select Done.

Transfer Your Facebook Insights Data to Treasure Data

In Authentications, configure the New Source.

You can name the Source in this dialog by editing the Data Transfer Name.

  1. Name the Source in the Data Transfer Name field.
  2. Select Next. The Source Table dialog opens.

Source Table

In the Source Table, edit the parameters and select Next.

| Parameter | Description |
| --- | --- |
| Data Type | Supported data types: Page, Post, Video, Conversation Messages, Conversation Participants, Comments |
| Folder(s) | Conversation folders |

For the data type Video, the only period supported is Lifetime.

Data Settings

In this dialog, you can edit data settings or skip this step.

  1. Edit the Data Settings parameters.
  2. Select Next.

| Parameter | Description |
| --- | --- |
| Retrieve Video insights back to | Due to Facebook-specified data limits, a job may fail to retrieve all past data. The default is to import the last three months of Video Insights data. Increase this value to import more data. |
| Skip Error POST | Default true. Skip errors that occur while importing Post Insights. |
| Retry Limit | The number of retries before the connector stops trying to connect and retrieve data. |
| Retry initial wait in millis | Interval before the first retry when a recoverable error occurs (in milliseconds). |
| Max retry wait in millis | Maximum time in milliseconds between retry attempts. |
| HTTP connect timeout in millis | HTTP connection timeout in milliseconds. |
| HTTP idle timeout in millis | HTTP idle timeout in milliseconds. |
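
If you later configure the same source from the command line, these Data Settings map onto the connector's configuration keys, which are documented in the option table further down this page. A sketch showing the documented default values (adjust as needed):

```yaml
in:
  last_in_months: 3               # Retrieve Video insights back to (months)
  skip_error_post: true           # Skip Error POST
  retry_limit: 7                  # Retry Limit
  retry_initial_wait_millis: 500  # Retry initial wait in millis
  max_retry_wait_millis: 300000   # Max retry wait in millis
  connect_timeout_millis: 180000  # HTTP connect timeout in millis
  idle_timeout_millis: 300000     # HTTP idle timeout in millis
```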

Data Preview

You can see a preview of your data before running the import by selecting Generate Preview. Data preview is optional and you can safely skip to the next page of the dialog if you choose to.

  1. Select Next. The Data Preview page opens.
  2. If you want to preview your data, select Generate Preview.
  3. Verify the data.

Data Placement

For data placement, select the target database and table where you want your data placed and indicate how often the import should run.

  1. Select Next. Under Storage, create a new database or select an existing one, and create a new table or select an existing one, for where you want to place the imported data.

  2. Select a Database > Select an existing or Create New Database.

  3. Optionally, type a database name.

  4. Select a Table > Select an existing or Create New Table.

  5. Optionally, type a table name.

  6. Choose the method for importing the data.

    • Append (default): Data import results are appended to the table. If the table does not exist, it is created.
    • Always Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.
    • Replace on New Data: Replaces the entire content of an existing table with the result output only when there is new data.
  7. Select the Timestamp-based Partition Key column. If you want a partition key seed other than the default key, specify a long or timestamp column as the partitioning time. By default, the upload_time column generated by the add_time filter is used.

  8. Select the Timezone for your data storage.

  9. Under Schedule, you can choose when and how often you want to run this query.

Run once

  1. Select Off.
  2. Select Scheduling Timezone.
  3. Select Create & Run Now.

Repeat Regularly

  1. Select On.
  2. Select the Schedule. The UI provides these four options: @hourly, @daily, and @monthly, or custom cron.
  3. You can also select Delay Transfer and add a delay of execution time.
  4. Select Scheduling Timezone.
  5. Select Create & Run Now.

After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.

Use Command-Line to Create Your Facebook Connection

You can use the TD Toolbelt from the command line to configure your connection.

Install the Treasure Data Toolbelt

Open a terminal and install the newest TD Toolbelt by following the installation instructions for your operating system.

Obtain a Facebook Token

Facebook provides three types of tokens. You will need the Page Access Token. We recommend that you select the never-expiring Page Access Token.

To obtain the never-expiring Page Access Token, follow the instructions here: https://www.rocketmarketinginc.com/blog/get-never-expiring-facebook-page-access-token/

Prepare a Configuration File (config.yml)

Using a text editor, create a file called config.yml. Copy and paste the following information, replacing the placeholder text with your Facebook connector info.

The in section specifies what comes into the connector from Facebook, and the out section specifies what the connector sends to the database in Treasure Data. For more details on available out modes, see Appendix.

in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: page
  incremental: true
  select_all_metrics: true
  since: 2017-01-01
  until: 2017-01-31
out:
  mode: append

Configuration keys and descriptions are as follows:

| Option name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| access_token | Facebook Page Access Token. | string | yes | |
| page_id | Facebook Page ID. See Page ID below. | string | yes | |
| data_type | page, post, video, conversation_message, conversation_participant, or comment. | string | optional | page |
| select_all_metrics | Import all supported insight metrics for the current data type, so you don't need to set metric_presets or metrics. Applicable for Page and Post. See Available Metrics. | bool | optional | |
| metric_presets | Predefined category of metrics or group of related metrics. See Supported Preset Metrics. | array | optional | |
| metrics | Facebook Graph insight metrics; specify as many metrics as you need. This option overrides metric_presets if both are specified. See Supported Metrics. | array | optional | |
| since | Lower bound of the time range to consider. Supported formats: yyyy-MM-dd or Unix time, e.g. 1584697547. | string | optional | |
| until | Upper bound of the time range to consider. Supported formats: yyyy-MM-dd or Unix time, e.g. 1584697547. | string | optional | |
| period | The aggregation period. See Supported Periods. | enum | optional | |
| date_preset | Preset date range, like ‘lastweek’ or ‘yesterday’. The data transfer request fails if since or until is specified together with date_preset. See Supported Date Presets. | enum | optional | |
| incremental | Set to true to generate config_diff for use with embulk run -c config.diff. | bool | optional | false |
| last_in_months | Retrieve Video insights back this many months. A range of more than three months may cause the job to fail due to Facebook API limitations. | integer | optional | 3 (months) |
| skip_error_post | Skip errors when importing Post insights. | bool | optional | true |
| retry_limit | Number of error retries before the connector gives up. | integer | optional | 7 |
| retry_initial_wait_millis | Initial wait in milliseconds for exponential backoff. | integer | optional | 500 (0.5 seconds) |
| max_retry_wait_millis | Maximum wait in milliseconds for each retry. | integer | optional | 300000 (5 minutes) |
| connect_timeout_millis | HTTP connect timeout in milliseconds. | integer | optional | 180000 (3 minutes) |
| idle_timeout_millis | HTTP idle timeout in milliseconds. | integer | optional | 300000 (5 minutes) |
| conversation_folders | Conversation folders: inbox, page_done, other, pending, and spam. E.g. [{"value":"inbox"}] | array | optional | all folders |

Example of config.yml with incremental and the Page data type

in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: page
  page_metric_presets:
  - value: page_impressions
  - value: page_cta_clicks
  - value: page_user_demographics
  - value: page_views
  - value: page_engagement
  incremental: true
  since: 2017-01-01
  until: 2017-01-31
out:
  mode: append

Post data type

in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: post
  post_metric_presets:
  - value: page_post_impressions
  - value: page_post_engagement
  - value: page_post_reactions
  - value: page_video_posts
  since: 2017-01-01
  until: 2017-01-31
out:
  mode: append

Video data type

in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: video
  last_in_months: 3
out:
  mode: append
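
Conversation Messages data type

The conversation data types accept the conversation_folders option described in the table above. A sketch for Conversation Messages, assuming you only need the inbox and spam folders (the folder list is illustrative; adjust it to your needs):

```yaml
in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: conversation_message
  conversation_folders:
  - value: inbox
  - value: spam
out:
  mode: append
```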

Preview the Data to be Imported (Optional)

You can preview data to be imported using the command td connector:preview.

td connector:preview config.yml

Execute Load Job

Before you execute the load job, you must specify the database and table where you want to store the data.

You use td connector:issue to execute the job. The following are required: the Data Connector configuration file, and the database and table where the data will be stored.

td connector:issue config.yml --database td_sample_db --table td_sample_table

We recommend specifying the --time-column option because Treasure Data’s storage is partitioned by time. You can also use the --time-column option to override auto-generated time values by specifying end_time as the time column (only applicable when data_type is page). Data is accumulated daily, and end_time is the end of the day in the timezone of your Facebook Page, converted to UTC.

If your data doesn’t have a time column, you can add one using the add_time filter option. See details at add_time Filter Plugin for Integrations.
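
For reference, a minimal add_time filter sketch that stamps each record with its upload time (the key names follow the add_time filter plugin; verify them against its documentation):

```yaml
filters:
- type: add_time
  to_column:
    name: time        # name of the generated time column
    type: timestamp
  from_value:
    mode: upload_time # use the job's upload time as the value
```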

Finally, submit the load job. Depending on the data size, it may take a couple of hours. You must specify the database and table where the data is stored.

td connector:issue config.yml --database td_sample_db --table td_sample_table --time-column end_time

Scheduled Execution

You can schedule periodic data connector execution for incremental Facebook Insights data. We configure our scheduler carefully to ensure high availability. This feature eliminates the need for a cron daemon in your local data center.

For the scheduled import, the Data Connector for Facebook Page Insights imports all of your Facebook Page Insights data on the first run.

On the second and subsequent runs, the connector imports only data that is newer than the last load.

Create the Schedule

A new schedule can be created using the td connector:create command. The following are required: the schedule's name, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file.

$ td connector:create \
    daily_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    config.yml

The cron parameter also accepts three special options: @hourly, @daily, and @monthly. By default, the schedule is set in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. Note that the --timezone option supports only extended timezone formats like 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations like PST and CST are not supported and may lead to unexpected schedules.

FAQ

Q: Why were my scheduled jobs categorized as "SUCCESS" but did not bring in new data?

2018-05-06 13:00:18.627 +0000 [WARN] (0047:task-0043): Time range does not reach, abort and will retry later. Start Date: '1525330800', End Date: '1528095600', Current Date: '2018-05-06'

This means either Start Date or End Date has exceeded the current date, for example, because it is specified with a date in the future. Such warnings typically occur when the cron interval is shorter than the fetching time range, for example, a daily job configured to pull monthly data.

Q: How can I set up daily jobs to pull in new data each time?

You need an initial load (or multiple loads, due to the limitation of a 3-month time range for each load). Let’s say today is 2018-05-09, and you need to load data since 2018-01-01:

| Jobs | Start Date | End Date |
| --- | --- | --- |
| First job (one-time) | 2018-01-01 | 2018-04-01 |
| Second job (one-time) | 2018-04-01 | 2018-05-08 |
| [Incremental] Daily job | 2018-05-08 | 2018-05-09 |
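
Each one-time job above is an ordinary load with explicit since and until bounds, while the daily job adds incremental: true. As a sketch, the first one-time job's config.yml might look like this (dates taken from the table; the metrics setting is illustrative):

```yaml
in:
  type: "facebook_page_insights"
  access_token: "[your Facebook Page token]"
  page_id: [your Facebook Page ID]
  data_type: page
  select_all_metrics: true
  since: 2018-01-01   # Start Date of the first one-time job
  until: 2018-04-01   # End Date (3-month limit per load)
out:
  mode: append
```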

Q: Why do I suddenly have so many Posts Insights data?

Starting from version v0.2.0, you get Insights data for all Posts from the very first post of the Page up to the End Date value. With this improvement, the Start Date to End Date range applies to the Insights data of all available Posts. In version v0.1.16 and earlier, you only got Insights data from the Start Date to the End Date for Posts created within that date range.

Page ID

Instead of using the Page's username when creating the connector, you can use the Page ID. To find the Page ID, on your Facebook Page, select the About menu and scroll down to the Page ID.