Server-Side Agent with Python Apps

Treasure Data provides a server-side agent called Treasure Agent (td-agent) to collect server-side logs and events. This article explains, in four steps, how to stream data from Python applications into Treasure Data through Treasure Agent.

Prerequisites

  • Basic knowledge of Python.
  • Basic knowledge of Treasure Data, including the toolbelt.
  • Python 2.6 or higher (for local testing).

What is Treasure Agent?

First, Treasure Agent (td-agent) needs to be installed on your application servers. Treasure Agent is an agent program that runs on your application servers and uploads application logs to the cloud.



The fluent-logger-python library enables Python applications to post records to their local td-agent. td-agent in turn uploads the data to the cloud every 5 minutes. Because the daemon runs on a local node, the logging latency is negligible.
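The buffering idea described above can be illustrated with a toy model. This is only a sketch of the concept, not td-agent's actual implementation: the class name LocalBuffer, the upload callable, and the injectable clock are all hypothetical names introduced for illustration.

```python
import time

FLUSH_INTERVAL_SECONDS = 5 * 60  # the article states that uploads happen every 5 minutes


class LocalBuffer:
    """Toy model of an agent that buffers records locally and uploads them in batches."""

    def __init__(self, upload, interval=FLUSH_INTERVAL_SECONDS, clock=time.time):
        self._upload = upload      # hypothetical callable that receives a list of records
        self._interval = interval  # seconds between uploads
        self._clock = clock        # injectable so the behavior can be tested without waiting
        self._records = []
        self._last_flush = clock()

    def post(self, record):
        # Posting only appends to local memory, so the caller does not wait on the network.
        self._records.append(record)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        # Hand the whole batch to the uploader and start a new interval.
        if self._records:
            self._upload(self._records)
            self._records = []
        self._last_flush = self._clock()
```

In this model, post() returns immediately and the slow upload happens at most once per interval, which is why the application sees very little logging latency.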

How to install Treasure Agent?

This video demonstrates how to install Treasure Agent in 3 minutes.

Step 1: Installing Treasure Agent

To install Treasure Agent (td-agent), please execute the command below that matches your environment. The agent is installed automatically using the package format of each platform (rpm, deb, or dmg).

RHEL/CentOS 5,6,7

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

Ubuntu & Debian

# 14.04 Trusty (64bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh
# 12.04 Precise
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent2.sh | sh
# 10.04 Lucid
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-lucid-td-agent2.sh | sh

# Debian Squeeze (64bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-squeeze-td-agent2.sh | sh
# Debian Wheezy (64bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-wheezy-td-agent2.sh | sh

Amazon Linux

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

MacOS X 10.11+

$ open 'https://packages.treasuredata.com/2/macosx/td-agent-2.3.0-0.dmg'
Note: MacOS X 10.11.1 (El Capitan) introduced some security changes, and we are still testing the changes we made to td-agent for this OS version. For now, once td-agent is installed, please edit the /Library/LaunchDaemons/td-agent.plist file and change /usr/sbin/td-agent to /opt/td-agent/usr/sbin/td-agent.

Windows Server 2012+

Windows installation requires multiple steps. Please refer to this documentation.



Opscode Chef (repository)

$ echo 'cookbook "td-agent"' >> Berksfile
$ berks install

AWS Elastic Beanstalk is also supported. Windows is currently NOT supported.

Step 2: Modifying /etc/td-agent/td-agent.conf

Next, please specify your API key by setting the apikey option. You can view your API key from the console. Note: replace YOUR_API_KEY with your actual API key string.

# Treasure Data Input and Output
<source>
  type forward
  port 24224
</source>

<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
</match>
Using a [write-only API key](access-control#rest-apis-access) is recommended.

Please restart your agent once these lines are in place.

# Linux
$ sudo /etc/init.d/td-agent restart

# MacOS X
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist

td-agent will now accept data on port 24224, buffer it (/var/log/td-agent/buffer/td), and automatically upload it to the cloud.
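Before moving on, it can help to check that something is actually listening on port 24224. The following is a minimal sketch, assuming Python 3 and a td-agent on the same host. The helper name agent_is_listening is hypothetical, and a successful connection only shows that the port is open, not that the API key is valid.

```python
import socket


def agent_is_listening(host="localhost", port=24224, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened, False otherwise."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling agent_is_listening() with no arguments checks localhost:24224, the host and port used in the configuration above.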

Step 3: Using fluent-logger-python

First, install the fluent-logger library via pip.

$ pip install fluent-logger

Next, initialize and post the records as follows.

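# test.py: post one record to the local td-agent
from fluent import sender
from fluent import event

# Initialize: 'td.test_db' is the tag prefix, so records go to the database test_db
sender.setup('td.test_db', host='localhost', port=24224)

# Post a record labeled 'follow'; it is stored in the table named follow
event.Event('follow', {
  'from': 'userA',
  'to':   'userB'
})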
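To see why this record is picked up by the <match td.*.*> section from Step 2, the sketch below composes the tag and checks it against the pattern. It assumes that the library joins the prefix passed to sender.setup() and the label passed to event.Event() with a dot, and that '*' in a match pattern stands for exactly one dot-separated tag part. The helper names build_tag and tag_matches are hypothetical and are not part of fluent-logger.

```python
def build_tag(prefix, label):
    """Join the prefix given to sender.setup() with the label given to event.Event()."""
    return prefix + "." + label


def tag_matches(pattern, tag):
    """Check a tag against a pattern in which '*' stands for exactly one tag part."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if len(pattern_parts) != len(tag_parts):
        return False
    # Each pattern part must be a wildcard or equal to the corresponding tag part.
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))
```

Under these assumptions, build_tag('td.test_db', 'follow') gives 'td.test_db.follow', which has three parts and therefore matches td.*.*, while a two-part tag such as 'td.test_db' would not.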

Step 4: Confirm the Import

First, please save the program from Step 3 as test.py and execute it.

$ python test.py

Sending a SIGUSR1 signal will flush td-agent’s buffer; upload will start immediately.

# Linux
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`

# MacOS X
$ sudo kill -USR1 `sudo launchctl list | grep td-agent | cut -f 1`
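The Linux command above can also be issued from Python, for example in a test script. This is a minimal sketch, assuming the pid file path shown above and a user with permission to signal the td-agent process. The function names are hypothetical, and the kill parameter exists only so the logic can be exercised without a running agent.

```python
import os
import signal


def read_pid(pid_file="/var/run/td-agent/td-agent.pid"):
    """Read the process ID stored in a pid file."""
    with open(pid_file) as f:
        return int(f.read().strip())


def flush_agent_buffer(pid_file="/var/run/td-agent/td-agent.pid", kill=os.kill):
    """Send SIGUSR1 to the process named in pid_file and return its PID."""
    pid = read_pid(pid_file)
    # SIGUSR1 is the signal the article describes for flushing td-agent's buffer.
    kill(pid, signal.SIGUSR1)
    return pid
```

Calling flush_agent_buffer() with no arguments mirrors the Linux shell command; it raises an error if the pid file is missing or the process cannot be signaled.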

To confirm that your data has been uploaded successfully, issue the td tables command as shown below.

$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| test_db    | follow     | log  | 1         |
+------------+------------+------+-----------+
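The listing above shows the record in database test_db and table follow, which corresponds to the second and third parts of the tag td.test_db.follow. The sketch below spells out that mapping. It assumes tags always have the form td.<database>.<table>, as implied by the td.*.* match pattern, and the helper name split_td_tag is hypothetical.

```python
def split_td_tag(tag):
    """Split a 'td.<database>.<table>' tag into a (database, table) pair."""
    parts = tag.split(".")
    # Require exactly three non-empty parts, the first of which is the 'td' prefix.
    if len(parts) != 3 or parts[0] != "td" or not all(parts):
        raise ValueError("expected a tag of the form td.<database>.<table>: %r" % (tag,))
    return parts[1], parts[2]
```

If the table you expect does not appear in the td tables output, comparing the result of this mapping with the prefix and label used in your application is a quick first check.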

Production Deployments

Use gunicorn, tornado or mod_wsgi

We recommend that you use gunicorn, tornado or mod_wsgi. Other setups have not been fully validated.

High-Availability Configurations of td-agent

For high-traffic websites (more than 5 application nodes), we recommend using a high availability configuration of td-agent. This will improve data transfer reliability and query performance.

Monitoring td-agent

Monitoring td-agent itself is also important. Please refer to this document for general monitoring methods for td-agent.

td-agent is fully open-sourced under the fluentd project.

Next Steps

We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.


Last modified: Oct 03 2016 10:50:26 UTC
