Overview of Server-Side Agent (td-agent)

Treasure Data provides a server-side agent called Treasure Agent (td-agent) to collect server-side logs and events. This article explains how to continuously import data using td-agent.

Prerequisites

  • Basic knowledge of Treasure Data. The Quickstart Guide is a good place to start.

Logs Are Streams, Not Files

Logs are usually rotated hourly or daily, based on time or size. This quickly produces many large log files that must be batch imported for further analysis, which is an outdated approach. Logs are better treated as continuously generated streams than as files.

"Server daemons (such as PostgreSQL or Nginx) and applications (such as a Rails or Django app) sometimes offer a configuration parameter for a path to the program’s logfile. This can lead us to think of logs as files. But a better conceptual model is to treat logs as time-ordered streams..." - Adam Wiggins, Heroku co-founder, "Logs Are Streams, Not Files"

td-agent, a data collection daemon, is used to import data continuously to Treasure Data. Although bulk-import is supported, we recommend importing your data continuously via td-agent.

What is Treasure Agent?

td-agent is a data collection daemon. It collects logs from various data sources and uploads them to Treasure Data.

How to install Treasure Agent?

To install Treasure Agent (td-agent), run one of the commands below for your environment. The agent is installed automatically through each platform's package format (rpm, deb, or dmg).

MacOS X 10.11+

$ open 'https://packages.treasuredata.com/2/macosx/td-agent-2.3.0-0.dmg'
NOTE: Mac OS X 10.11.1 (El Capitan) introduced some security changes, and we are still testing the changes we made to td-agent for this OS version. For now, once td-agent is installed, please edit the /Library/LaunchDaemons/td-agent.plist file to change /usr/sbin/td-agent to /opt/td-agent/usr/sbin/td-agent.

RHEL/CentOS 5,6,7

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

Ubuntu & Debian

# 16.04 Xenial (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent2.sh | sh
# 14.04 Trusty
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh
# 12.04 Precise
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent2.sh | sh

# Debian Jessie (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-jessie-td-agent2.sh | sh
# Debian Squeeze (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-squeeze-td-agent2.sh | sh

Amazon Linux

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

Opscode Chef (repository)

$ echo 'cookbook "td-agent"' >> Berksfile
$ berks install

AWS Elastic Beanstalk is also supported. Windows is currently NOT supported.

Set up td-agent

After installing td-agent, you can modify its config file, which is located at /etc/td-agent/td-agent.conf.

The config file comes with some sample settings, including the lines below. Specify your API key by setting the apikey option. You can view your API key from the console.

# HTTP input
<source>
  type http
  port 8888
</source>

# Treasure Data output
<match td.*.*>
  type tdlog
  endpoint api-import.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
  num_threads 8
</match>
NOTE: Replace YOUR_API_KEY with your actual API key string, which you can retrieve from the console. Using a write-only key is recommended.
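Before restarting, it can help to confirm that the placeholder has actually been replaced. The following is a minimal sketch, not something shipped with td-agent: the function name td_conf_has_placeholder is hypothetical, and it assumes the apikey option sits on its own line as in the sample config above.

```shell
# Hypothetical helper (not part of td-agent): succeeds if the config
# still contains the sample placeholder instead of a real API key.
td_conf_has_placeholder() {
  conf="${1:-/etc/td-agent/td-agent.conf}"
  # Match an "apikey" line whose value is still the literal placeholder.
  grep -Eq '^[[:space:]]*apikey[[:space:]]+YOUR_API_KEY[[:space:]]*$' "$conf"
}

# Usage: warn before restarting the service.
# td_conf_has_placeholder && echo "apikey is still YOUR_API_KEY" >&2
```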

Now please restart the td-agent service.

# Linux
$ sudo /etc/init.d/td-agent restart

# MacOS X
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist

Confirm Data Upload

You can add logs in JSON format using HTTP.

$ curl -X POST -d 'json={"action":"login","user":2}' \
  http://localhost:8888/td.testdb.www_access
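The URL path in the request above is the event tag. With the sample match pattern td.*.*, the tag appears to follow the form td.<database>.<table>, which is consistent with the td tables output shown later in this article. A small wrapper can keep that convention in one place. This is only a sketch: the function names and the TD_AGENT_HTTP variable are hypothetical, and it assumes the HTTP input listens on localhost:8888 as in the sample config.

```shell
# Hypothetical helpers; the names are illustrative, not part of td-agent.

# Build the HTTP input URL for a database/table pair (tag: td.<db>.<table>).
# TD_AGENT_HTTP is an assumed override for the host:port of the HTTP input.
td_event_url() {
  printf 'http://%s/td.%s.%s\n' "${TD_AGENT_HTTP:-localhost:8888}" "$1" "$2"
}

# POST one JSON record to the local td-agent HTTP input.
# --data-urlencode keeps characters such as "&" or "+" in the JSON intact.
td_post_event() {
  curl -sS -X POST --data-urlencode "json=$3" "$(td_event_url "$1" "$2")"
}

# Usage:
# td_post_event testdb www_access '{"action":"login","user":2}'
```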

td-agent uploads buffered logs every 5 minutes. You can force td-agent to flush the buffered logs to the cloud immediately by sending it a SIGUSR1 signal.

# Linux
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`

# MacOS X
$ sudo kill -USR1 `sudo launchctl list | grep td-agent | cut -f 1`
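If the pidfile is missing (for example, because the daemon is not running), the Linux command above fails with a fairly opaque error. A guarded variant might look like the sketch below; the function name td_agent_flush is hypothetical, and the default path is the Linux pidfile location used above.

```shell
# Hypothetical helper: ask td-agent to flush its buffers via SIGUSR1.
# Assumes the Linux pidfile location unless another path is given.
td_agent_flush() {
  pidfile="${1:-/var/run/td-agent/td-agent.pid}"
  if [ ! -r "$pidfile" ]; then
    echo "pidfile not readable: $pidfile (is td-agent running?)" >&2
    return 1
  fi
  # Send SIGUSR1 to the PID recorded in the pidfile.
  kill -USR1 "$(cat "$pidfile")"
}

# Usage (may require sudo, depending on who owns the process):
# td_agent_flush
```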

Finally, please visit the Databases page in the web console to check that the data was imported successfully. From the CLI, you can check by issuing the td tables command:

$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| testdb     | www_access | log  | 1         |
+------------+------------+------+-----------+

If you run into any issues, the td-agent log (/var/log/td-agent/td-agent.log) is a good place to start your investigation.
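As a starting point, warnings and errors can be filtered out of the log. This sketch assumes Fluentd-style log lines, where each line carries a bracketed level such as [warn] or [error]. The function name is hypothetical, and the default path is the one passed via --log in the process listing later in this article.

```shell
# Hypothetical helper: print the most recent warning/error lines of a log.
# Assumes Fluentd-style lines, e.g. "2016-10-03 10:42:15 +0000 [warn]: ...".
td_agent_problems() {
  log="${1:-/var/log/td-agent/td-agent.log}"
  count="${2:-20}"
  # Keep only warn/error/fatal lines, then show the last $count of them.
  grep -E '\[(warn|error|fatal)\]' "$log" | tail -n "$count"
}

# Usage: show the last 5 problem lines from the default log.
# td_agent_problems /var/log/td-agent/td-agent.log 5
```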

Agent Overhead

In terms of resource consumption, td-agent uses roughly:

  • Resident memory (actual RAM used): 50MB
  • CPU: less than 2% on average, plus your workload
  • Disk:
    • Linux: 120MB, plus your file buffer (configurable)
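
The memory figure can be spot-checked on a running host. The sketch below sums the resident set size that ps reports for the PIDs it is given. The function name rss_mb is hypothetical, and it assumes a ps that supports -o rss= and reports kilobytes, as on Linux and macOS.

```shell
# Hypothetical helper: total resident memory, in whole MB, of the given PIDs.
rss_mb() {
  total_kb=0
  for pid in "$@"; do
    # "rss=" prints the resident set size in KB without a header line.
    kb=$(ps -o rss= -p "$pid" | tr -d ' ')
    # A PID that no longer exists yields an empty value and counts as 0.
    total_kb=$((total_kb + ${kb:-0}))
  done
  echo $((total_kb / 1024))
}

# Usage (Linux; pgrep -f matches against the full command line):
# rss_mb $(pgrep -f /usr/sbin/td-agent)
```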

If you think td-agent is slow, check out “5 Tips to Optimize Fluentd Performance”.

Files Installed by the Packages

The files shown below are installed by the rpm or debian packages.

| Resource | Location | Notes |
|---|---|---|
| Config Directory | /etc/td-agent/ | |
| Config File | /etc/td-agent/td-agent.conf | This config will be picked up by the startup script. |
| Startup Script | /etc/init.d/td-agent | |
| Log Directory | /var/log/td-agent/ | |
| Plugin Directory | /etc/td-agent/plugin/ | Your custom plugins go here. |
| Ruby Interpreter | /opt/td-agent/embedded/bin/ruby | Ruby v2.1 is bundled with the package. |
| Rubygems | /usr/sbin/td-agent-gem | Bundled rubygems to install fluentd plugins. For example: `/usr/sbin/td-agent-gem install fluent-plugin-mongo` |
| jemalloc | /opt/td-agent/embedded/lib/libjemalloc.so | jemalloc is bundled to avoid memory fragmentation. It is loaded by default in the startup script. |

Supervision, Privileges and Network Ports

When td-agent starts, it launches two processes: master and slave. The master process manages the life cycle of the slave process, and the slave process handles the actual log collection.

$ ps w -C ruby -C td-agent --no-heading
32342 ?        Sl     0:00 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
32345 ?        Sl     0:01 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log

Both processes run as the td-agent user under the td-agent group, and all forked subprocesses run as the same user and group. This applies to any system call initiated by td-agent as well. The agent configuration resides at /etc/td-agent/td-agent.conf. All configuration files must be readable by the td-agent user.

The following ports are open depending on your input.

  • in_tail: nothing
  • in_forward: tcp/24224, udp/24224
  • in_unix: /var/run/td-agent/td-agent.sock

To upload to Treasure Data, you need to allow outbound connections to *.treasuredata.com on tcp/80 (http) and tcp/443 (https).

Debugging

If you are having issues, please add the following line to /etc/default/td-agent to enable verbose logging:

DAEMON_ARGS=-vv

After that, please restart the daemon. You can now find more verbose logs in /var/log/td-agent/td-agent.log.

What’s Next?

Next, modify your existing applications to post data to Treasure Data. The articles below explain the process (with sample code) for various languages, frameworks, and middleware.

Languages and Frameworks

Supported Languages

  • Ruby or Rails
  • Java
  • Perl
  • Python
  • PHP
  • Scala
  • Node.js

Middleware

High-Availability Configurations and Monitoring

For high-traffic websites, we recommend using a high availability configuration for td-agent. Monitoring the daemon is also important.

NOTE: td-agent is fully open-sourced under the fluentd project.


Last modified: Oct 03 2016 10:42:15 UTC
