Streaming Twitter Data into Treasure Data from Node.js

Table of Contents

Introduction

Node.js is an increasingly popular server-side implementation of JavaScript. In this article, we show how to stream Twitter Data into Treasure Data from Node.js via td-agent.

Untitled-3
We have another article that covers using Treasure Data from Node.js in general.

Prerequisites

Configuring td-agent

Having covered the prerequisites, you should already have the td-command (td) installed and set up on your machine (You can download the td-command from the toolbelt page).

We will now configure td-agent to listen to port 24224 (This port number is arbitrary, but the sample code in this article assumes this port number). Edit the td-agent configuration file (found in ‘/etc/td-agent/td-agent.conf’ by default) as shown below:

<match td.*.*>
  apikey YOUR_API_KEY
  ...
<source>
  type forward
  port 24224
  ...

You can look up YOUR_API_KEY from the console.

Getting ntwitter + fluent-logger-node

You will need two Node.js libraries: ntwitter to grab tweets from Twitter’s Stream API and fluent-logger-node to send data to td-agent/fluentd.

Here is what package.json should look like:

{
  "name": "sample-app",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "ntwitter": "~0.5.0",
    "fluent-logger": "~0.1.0"
  }
}

All you need to do now is run npm install.

Streaming Tweets into Treasure Data

The code shown below searches for the keyword javascript on Twitter, grabs the matching tweets, and streams them onto Treasure Data.

/*jslint indent: 4*/
/*jslint node: true */
'use strict';

var Twitter = require('ntwitter'),
    logger = require('fluent-logger');

// Configure the logger to post data to localhost:24224
// The "td.test_db" tag tells td-agent to store data in a database named "test_db".
logger.configure('td.test_db', {host: 'localhost', port: 24224});

var twit = new Twitter({
    consumer_key: 'XXX',
    consumer_secret: 'XXX',
    access_token_key: 'XXX',
    access_token_secret: 'XXX',
});

// Tracking the keyword 'javascript'
twit.stream('statuses/filter', {'track': 'javascript'}, function (stream) {
    stream.on('data', function (data) {
            // Sending the data to a table named "javascript"
        logger.emit('javascript', data);
    });
});

For consumer_key/consumer_secret/access_token_key/access_token_secret, please use your own values.

That’s it! You’ll start seeing data populated on Treasure Data soon. You can view the data via the command-line (ex: td tables) or a browser console.

The example query below returns the top 20 most retweeted Tweets:

$ td query -w -d test_db '
    SELECT
        get_json_object(v["user"], "$.screen_name"),
        v["text"],
        v["retweet_count"] AS retweet_count
    FROM javascript
    ORDER BY retweet_count DESC
    LIMIT 20'

Next Steps

The Node.js article explains how to use Node.js with Treasure Data in general. It also includes tips on how to set up Treasure Data for high availability.

Acknowledgement

We would like to thank @cou929 for writing the blog post that inspired this article (for instance, the Node.js code snippet was taken from his blog entry with some additional comments).


Last modified: Feb 24 2017 09:27:52 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.