Instagram Business enables you to obtain insights about your Instagram media objects and your user account activities. You can use Treasure Data’s connector to bring Instagram insight data into Treasure Data for integration with your other data resources on the Treasure Data platform.


Prerequisites

  • Basic knowledge of Treasure Data, including the TD toolbelt

  • API level knowledge of Facebook Graph API, Insights API, and Instagram

  • Access permission to a Facebook Page. Instagram must be connected to the Page.

Limitations

Posts and metrics generated before the Instagram account was converted to a business account are not available for request. Only metrics and posts created after the account became a business account are available through the Insights API.

About Incremental Loading

In the following example, let’s use a 9-day range between the start and end date:

Start Date: 2017-10-01
End Date: 2017-10-11

Each job covers a time range of the same length, determined by the period between the start and end dates. Each transfer of metrics begins where the previous job finished, until the period extends past the current date (which is the default end date). Further transfers are delayed until a complete period is available, at which point the job executes and then pauses until the next period is available.

1st run: Starting end_time: 2017-10-01 07:00:00 Ending end_time: 2017-10-10 07:00:00

2nd run: Starting end_time: 2017-10-11 07:00:00 Ending end_time: 2017-10-20 07:00:00

3rd run: Starting end_time: 2017-10-21 07:00:00 Ending end_time: 2017-10-30 07:00:00

About Date Range Import

  • Start Date is inclusive and End Date is exclusive.

    • If not specified, the end date defaults to the current time in Instagram Business Account Local Time.

    • If not specified, the start date defaults to 2 days before the end date. The defaults are set by the Facebook API.

Data is aggregated at the end of each day, so the insights of 2017-01-01 result in end_time = 2017-01-02 00:00:00.

If you use incremental loading and scheduled jobs, the time range for the next iteration (run) is calculated based on the period between the initial start and end dates, and the imported data will not have gaps.
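
For example, a minimal sketch using the since and until keys from the seed.yml shown later in this article (the dates are illustrative): importing with since: 2017-01-01 and until: 2017-01-03 retrieves the insights of 2017-01-01 and 2017-01-02 only.

in:
  type: instagram_insight
  ...
  since: 2017-01-01   # inclusive; insights of this day arrive with end_time = 2017-01-02 00:00:00
  until: 2017-01-03   # exclusive; insights of 2017-01-02 are the last imported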

Supported Metrics and Preset Metrics

Metrics

User type

Incremental

- email_contacts-day
- follower_count-day
- get_directions_clicks-day
- impressions-day
- impressions-days_28
- impressions-week
- online_followers-lifetime
- phone_call_clicks-day
- profile_views-day
- reach-day
- reach-days_28
- reach-week
- text_message_clicks-day
- website_clicks-day

Non-incremental

- audience_city-lifetime
- audience_country-lifetime
- audience_gender_age-lifetime
- audience_locale-lifetime
- email_contacts-day
- follower_count-day
- get_directions_clicks-day
- impressions-day
- impressions-days_28
- impressions-week
- online_followers-lifetime
- phone_call_clicks-day
- profile_views-day
- reach-day
- reach-days_28
- reach-week
- text_message_clicks-day
- website_clicks-day

Media type

- carousel_album_engagement-lifetime
- carousel_album_impressions-lifetime
- carousel_album_reach-lifetime
- carousel_album_saved-lifetime
- carousel_album_video_views-lifetime
- engagement-lifetime
- exits-lifetime
- impressions-lifetime
- reach-lifetime
- replies-lifetime
- saved-lifetime
- taps_back-lifetime
- taps_forward-lifetime
- video_views-lifetime

Preset Metrics

User type

Incremental

- account_reach_and_impression_week
- account_reach_and_impression_days_28
- account_day_metrics

Non-incremental

- account_reach_and_impression_week
- account_reach_and_impression_days_28
- account_day_metrics
- account_audience

Media type

- media_all
- story_metrics
- carousel_album_metrics
- photo_and_videos_metrics

Obtain the Facebook Page ID

Enter the Facebook page ID that is linked to the Instagram Business account. When a user creates or converts an Instagram account to a business account, the user is required to link the Instagram Account to a Facebook page.
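
If you do not know the Page ID, one way to look it up (a sketch, not part of the connector itself) is the Graph API me/accounts edge, which lists the Pages you manage together with their IDs; the API version and token below are placeholders:

$ curl "https://graph.facebook.com/v12.0/me/accounts?access_token=YOUR_USER_ACCESS_TOKEN"

Each entry in the returned data array contains the Page's id and name.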

Extended User Access Token

A Facebook User Access token is, by default, a short-lived access token.

Obtain a long-lived Access token to run a scheduled job in Treasure Data Platform:

  • Follow the steps in this article to obtain a long-lived User Access token via the Facebook tool.

  • Follow these steps to obtain a long-lived token using the Facebook API.


  • If you run the job using the TD Console, the connector exchanges the token for a long-lived access token for you.

  • The following permissions (scopes) are required:

    • public_profile

    • email

    • manage_pages

    • pages_show_list

    • instagram_basic

    • instagram_manage_insights

    • instagram_manage_comments

    • ads_management
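
For reference, the exchange itself is a single Graph API call; a sketch, with the app ID, app secret, short-lived token, and API version as placeholders:

$ curl "https://graph.facebook.com/v12.0/oauth/access_token?grant_type=fb_exchange_token&client_id=YOUR_APP_ID&client_secret=YOUR_APP_SECRET&fb_exchange_token=SHORT_LIVED_TOKEN"

The access_token field of the JSON response is the long-lived token to use in the connector configuration.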

Use the TD Console to Create Your Connection

Create a New Authentication

When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and then specify the source information.

  1. Open TD Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Search and select Instagram User & Media Insights.


  4. Select an existing OAuth connection for Instagram Insights, or select the link under OAuth connection to create a new connection.

Create a new OAuth connection

Instagram OAuth uses Facebook OAuth, so you are taken to the Facebook OAuth page.

  1. Log into your Facebook account:


  2. Grant access to the Treasure Data app by selecting the ‘Continue as <your name>’ button.


  3. You will be redirected back to the Catalog. Repeat the first step (Create a new authentication) and choose your new OAuth connection.


  4. Name your new authentication. Select Done.


Transfer Your Instagram Account Data to Treasure Data

After creating the authenticated connection, you are automatically taken to Authentications.

  1. Search for the connection you created. 

  2. Select New Source.


Connection

  1. The Connection page opens.

  2. Type a name for your Source in the Data Transfer field.


Source Table

  1. Select Next. The Source Table page opens.

  2. Edit the following parameters in the Source Table.

  • Facebook Page ID: Required. The Facebook page ID that is linked to the Instagram Business account.

  • Data type: Required. Select one of the following data types:

    • User: Insights are metrics collected about your Instagram Business account, such as impressions, follower_count*, website_clicks, and audience_locale.

    • Media: Insights are metrics collected from content posted for your Instagram Business account, such as engagement, impressions, reach, and saved.

    • Comments: Represents an organic Instagram comment.

    • Media List: Represents an Instagram photo, video, story, or album.

    • Tags: Represents a collection of IG Media objects in which an IG User has been tagged by another Instagram user.

  • Start Date & End Date: Required for the User data type. The start date and end date must follow the format YYYY-MM-DD. Example: 2017-11-21

    • Start Date is inclusive and End Date is exclusive.

    • If not specified, the end date defaults to the current time in Instagram Business Account local time.

    • If not specified, the start date defaults to 2 days before the end date. The defaults are set by the Facebook API.

    Data is aggregated at the end of each day, so the insights of 2017-01-01 result in end_time = 2017-01-02 00:00:00.

Use Individual Metrics

Select if you want to choose the metrics data to import. The dropdown lists all individual metrics for the selected data type. You can select multiple metrics.

  • Media metrics support only [period: lifetime].

  • User metrics support day, week, days_28, and lifetime.
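
In a configuration file, this selection corresponds to setting use_individual_metrics and listing the metrics, as in the seed.yml shown later in this article. A minimal sketch for two incremental user metrics:

in:
  type: instagram_insight
  data_type: user
  incremental: true
  use_individual_metrics: true
  incremental_user_metrics:
    - value: impressions-day   # [period: day]
    - value: reach-week        # [period: week]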

Preset User Metrics

The default is Week metrics. You can select from the following preset user metrics:

  • Week metrics [period: week]

  • Days 28 metrics [period: days_28]

  • Day metrics [period: day]

  • Lifetime metrics [period: lifetime], does not support incremental import
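
In a configuration file, a preset is chosen with the incremental_user_preset_metric key (see the configuration table later in this article). A sketch, assuming the Week metrics preset corresponds to the account_reach_and_impression_week value listed under Preset Metrics above:

in:
  type: instagram_insight
  data_type: user
  incremental: true
  use_individual_metrics: false
  incremental_user_preset_metric: account_reach_and_impression_week  # assumed value for "Week metrics"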

Preset Media metrics

The default is All media metrics. You can select from the following preset media metrics:

  • All media metrics. Imports metrics for all content in the Instagram account.

  • Story metrics. Imports metrics for Instagram Stories.

  • Carousel metrics. Imports metrics for Instagram Carousels.

  • Photo and video metrics. Imports metrics only for Instagram images and videos*.
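
In a configuration file, a media preset is chosen with the media_preset_metric key. A sketch, assuming the Story metrics preset corresponds to the story_metrics value listed under Preset Metrics above:

in:
  type: instagram_insight
  data_type: media
  media_preset_metric: story_metrics  # assumed value for "Story metrics"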

Import media metrics

If a metric is supported by multiple media object types (for example, IMAGE, VIDEO, and STORY), then the metric values of all of those media objects are imported.

Import user metrics

User metrics support the day, week, days_28, and lifetime periods.

Incremental Loading

You can schedule a job to run iteratively. The time range of each iteration (run) is calculated from the Start Date and End Date values, and new values are imported each time with no gaps or duplication.

In the following example, let’s use a 9-day range between the start and end date:

Start Date: 2017-10-01
End Date: 2017-10-11

Each job covers a time range of the same length, determined by the period between the start and end dates. Each transfer of metrics begins where the previous job finished, until the period extends past the current date (which is the default end date). Further transfers are delayed until a complete period is available, at which point the job executes and then pauses until the next period is available.

1st run: Starting end_time: 2017-10-01 07:00:00 Ending end_time: 2017-10-10 07:00:00

2nd run: Starting end_time: 2017-10-11 07:00:00 Ending end_time: 2017-10-20 07:00:00

3rd run: Starting end_time: 2017-10-21 07:00:00 Ending end_time: 2017-10-30 07:00:00


*As of May 9, 2021, according to the Facebook API changelog, the IG follower_count metric returns a maximum of 30 days of data instead of 2 years. User follower_count values now align more closely with their corresponding values displayed in the Instagram app. Therefore, when the user follower_count metric is selected individually, the query time MUST be within the last 30 days. When importing the entire User object, if the query time is older than the last 30 days (excluding the current day), the user follower_count metric is returned as an empty value.

**Except for the online_followers metric, lifetime metrics require that the Incremental Loading checkbox is unchecked and the Start Date and End Date fields are empty.
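
Taken together, a non-incremental lifetime import would look roughly like the following sketch (the metric names come from the non-incremental list above; the array shape mirrors the seed.yml shown later in this article):

in:
  type: instagram_insight
  data_type: user
  incremental: false            # lifetime metrics require non-incremental loading
  use_individual_metrics: true
  non_incremental_user_metrics:
    - value: audience_city-lifetime
    - value: audience_gender_age-lifetime
  # leave since/until unset, matching the empty Start Date and End Date fields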

Data Settings

Data Settings allows you to customize the transfer.

  1. Select Next.
    The Data Settings page opens.

  2. Optionally, edit the data settings or skip this page of the dialog.

Data Preview 

You can see a preview of your data before running the import by selecting Generate Preview. Data preview is optional and you can safely skip to the next page of the dialog if you choose to. 

  1. Select Next

  2. The Data Preview page opens. 

  3. If you want to preview your data, select Generate Preview.

  4. Verify the correct data is showing.


Data Placement

Specify where your data is placed and schedule how often to run this import.

  1.  Select Next.


Under Storage, create a new database and table or select an existing database and table where you want to place the imported data.

  1. Select a Database > Select an existing or Create New Database.

  2. Select a Table > Select an existing or Create New Table.

  3. Choose the Append or Replace method for importing the data.

    • Append (default): Data import results are appended to the table.
      If the table does not exist, it is created.

    • Replace: Replaces the entire content of an existing table with the result output of the query.
      If the table does not exist, a new table is created.

  4. Select the Timestamp-based Partition Key column.
    If you want to set a different partition key seed than the default key, you can specify a long or timestamp column as the partitioning time. By default, the time column uses upload_time with the add_time filter.

  5. Select the Timezone for your data storage.

 Under Schedule, you can choose when and how often you want to run this query.

  •  Run once:

    1. Select Off.

    2. Select Scheduling Timezone.

    3. Select Create & Run Now.

  • Repeat the query:

    1. Select On.

    2. Select the Schedule. The UI provides four options: @hourly, @daily, @monthly, or custom cron.

    3. You can also select Delay Transfer and add a delay of execution time.

    4. Select Scheduling Timezone.

    5. Select Create & Run Now.

 After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.

Use the Command Line to Create a Connection

Install the ‘td’ Command

Install the newest Treasure Data Toolbelt.

Create Seed Config File (seed.yml) 

in:
  access_token: EAAZAB0rX...EOdIc5YqAZDZD
  type: instagram_insight
  facebook_page_name: 172411252833693
  data_type: user
  incremental: true
  since: 2020-03-20
  until: 2020-04-10
  use_individual_metrics: true
  incremental_user_metrics:
    - value: email_contacts-day
    - value: impressions-days_28
out:
  mode: append

Configuration keys and descriptions are as follows:

  • access_token (string, required): Facebook long-lived User Access token.

  • facebook_page_name (string, required): Facebook page ID that is linked to the Instagram Business account.

  • data_type (string, required): user, media, comments, media_list, or tags.

  • use_individual_metrics (boolean, optional): Option to manually choose the metrics data to import.

  • incremental_user_preset_metric (string, optional): User preset metric to import when incremental loading is set to true.

  • incremental_user_metrics (array, optional): User metrics to import when incremental loading is set to true.

  • non_incremental_user_preset_metric (string, optional): User preset metric to import when incremental loading is set to false.

  • non_incremental_user_metrics (array, optional): User metrics to import when incremental loading is set to false.

  • media_metrics (array, optional): Individual media metrics to be transferred.

  • media_preset_metric (string, optional): Media preset metric to be transferred.

  • incremental (boolean, optional): When run repeatedly, only the data since the last import is collected.

  • since (string, optional): Only import data since this date; see About Date Range Import.

  • until (string, optional): Only import data until this date.

  • throttle_wait_in_millis (int, optional): When API limits are reached, Facebook throttles (blocks) API calls. You must wait for the specified amount of time before calling the API again. Default: 3600000.

  • maximum_retries (int, optional): Maximum number of retries for each API call. Default: 3.

  • initial_retry_interval_millis (int, optional): Wait time before the first retry. Default: 20000.

  • maximum_retries_interval_millis (int, optional): Maximum time between retries. Default: 120000.

  • connect_timeout_in_millis (int, optional): Amount of time before the connection times out when making API calls. Default: 30000.

  • idle_timeout_in_millis (int, optional): Amount of time that a connection can stay idle in the pool. Default: 60000.
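
For example, a sketch that loosens the retry and throttling behavior with the keys above (the values are illustrative, not recommendations):

in:
  type: instagram_insight
  ...
  maximum_retries: 5                       # retry each failed API call up to 5 times
  initial_retry_interval_millis: 30000     # wait 30 seconds before the first retry
  maximum_retries_interval_millis: 180000  # cap the wait between retries at 3 minutes
  throttle_wait_in_millis: 1800000         # wait 30 minutes when throttled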

For more details on available out modes, see Modes for the Out Plugin below.

Guess Fields (Generate load.yml)

Use connector:guess. This command automatically reads the target data and intelligently guesses the missing parts.

$ td connector:guess seed.yml -o load.yml

If you open the load.yml file, you will see the guessed configuration:

---
in:
  access_token: EAAZA.....dIc5YqAZDZD
  type: instagram_insight
  facebook_page_name: '172411252833693'
  data_type: user
  incremental: true
  since: '2020-03-20'
  until: '2020-04-10'
  use_individual_metrics: true
  incremental_user_metrics:
  - {value: email_contacts-day}
  - {value: impressions-days_28}
out: {mode: append}
exec: {}
filters:
- from_value: {mode: upload_time}
  to_column: {name: time}
  type: add_time

Then you can preview the data by using the preview command.

$ td connector:preview load.yml
+---------------------------+-------------------------+--------------------------+-------------------------------+
| end_time:timestamp        | email_contacts_day:json | impressions_days_28:json | time:timestamp                |
+---------------------------+-------------------------+--------------------------+-------------------------------+
| "2020-03-22 07:00:00 UTC" | "0"                     | "68314"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-28 07:00:00 UTC" | "0"                     | "54857"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-30 07:00:00 UTC" | "0"                     | "54941"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-04 07:00:00 UTC" | "0"                     | "48070"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-05 07:00:00 UTC" | "0"                     | "45857"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-21 07:00:00 UTC" | "0"                     | "65238"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-09 07:00:00 UTC" | "0"                     | "38592"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-06 07:00:00 UTC" | "0"                     | "43487"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-08 07:00:00 UTC" | "0"                     | "39955"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-23 07:00:00 UTC" | "0"                     | "69090"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-07 07:00:00 UTC" | "0"                     | "41208"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-04-01 07:00:00 UTC" | "0"                     | "55243"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-24 07:00:00 UTC" | "0"                     | "69152"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-20 07:00:00 UTC" | "0"                     | "63483"                  | "2020-04-12 04:58:06.649 UTC" |
| "2020-03-25 07:00:00 UTC" | "0"                     | "67868"                  | "2020-04-12 04:58:06.649 UTC" |
+---------------------------+-------------------------+--------------------------+-------------------------------+

Execute Load Job

Submit the load job. It may take a couple of hours, depending on the data size. You must specify the database and table where the data will be stored.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table

The preceding command assumes that you have already created the database (td_sample_db) and table (td_sample_table). If the database or the table does not exist in TD, this command will not succeed. Create the database and table manually, or use the --auto-create-table option with the td connector:issue command to automatically create the database and table:

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at --auto-create-table 

You can assign a time-format column to the partitioning key with the --time-column option.

Scheduled Execution

You can schedule periodic data connector execution. We carefully configure our scheduler to ensure high availability. By using this feature, you no longer need a cron daemon in your local data center.

Create the Schedule

A new schedule can be created by using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the data connector configuration file are required.

$ td connector:create \
    daily_users_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml 

The `cron` parameter also accepts three special options: `@hourly`, `@daily`, and `@monthly`.
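
For example, to run the same import once a day:

$ td connector:create \
    daily_users_import \
    "@daily" \
    td_sample_db \
    td_sample_table \
    load.yml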

Incremental Scheduling

You can load records incrementally by setting the `incremental` option to true.

in:
 type: instagram_insight
 ...
 incremental: true
out:
 mode: append

If you’re using scheduled execution, the connector automatically saves the last import time as the time_created value and holds it internally. It is then used at the next scheduled execution.

in:
  type: instagram_insight
  ...
out:
  ...

Config Diff
---
in:
  time_created: '2020-02-02T15:46:25Z'

List the Schedules

You can see the list of scheduled entries by using td connector:list.

$ td connector:list
+---------------------------+--------------+----------+-------+--------------+-----------------+---------------------------------+
| Name                      | Cron         | Timezone | Delay | Database     | Table           | Config                          |
+---------------------------+--------------+----------+-------+--------------+-----------------+---------------------------------+
| daily_users_import        | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"instagram_insight",... } |
+---------------------------+--------------+----------+-------+--------------+-----------------+---------------------------------+

Show the Setting and History of Schedules

td connector:show shows the execution setting of a schedule entry.

$ td connector:show daily_users_import
Name     : daily_users_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
// Displayed load.yml configuration.

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.
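
For example:

$ td connector:history daily_users_import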

Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_users_import 

Modes for the Out Plugin

You can specify the file import mode in the out section of the load.yml file.

The out: section controls how data is imported into a Treasure Data table.
For example, you may choose to append data or replace data in an existing table in Treasure Data.

  • Append: Records are appended to the target table.

    in:
      ...
    out:
      mode: append

  • Always Replace: Replaces data in the target table. Any manual schema changes made to the target table remain intact.

    in:
      ...
    out:
      mode: replace

  • Replace on new data: Replaces data in the target table only when there is new data to import.

    in:
      ...
    out:
      mode: replace_on_new_data
