Tailing CSV/TSV/LTSV Logs

This article explains how to use td-agent to tail CSV, TSV, and LTSV formatted logs and continuously import them into the cloud.


Installing td-agent

‘td-agent’ needs to be installed on your application servers. td-agent is a daemon program dedicated to the streaming upload of any kind of time-series data. td-agent is developed and maintained by Treasure Data, Inc.



To set up td-agent, please refer to the following articles; we provide deb/rpm packages for Linux systems.

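+-----------------------+----------------------------------------------+
| If you have...        | Please refer to...                           |
+-----------------------+----------------------------------------------+
| MacOS X               | Installing td-agent on MacOS X               |
| Ubuntu System         | Installing td-agent for Debian and Ubuntu    |
| RHEL / CentOS System  | Installing td-agent for Redhat and CentOS    |
| AWS Elastic Beanstalk | Installing td-agent on AWS Elastic Beanstalk |
+-----------------------+----------------------------------------------+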
td-agent is fully open source and is part of the Fluentd project; it extends Fluentd with custom plugins for Treasure Data.

Modifying /etc/td-agent/td-agent.conf

Next, specify your authentication key by setting the apikey option in your td-agent.conf file. You can view your API key from the console.

Note: Replace YOUR_API_KEY with your actual API key string.

# Tailing the CSV formatted Logs
<source>
  type tail
  format csv
  path /path/to/log/foo.csv
  pos_file /var/log/td-agent/foo.pos
  tag td.production.foo

  keys key1,key2,key3
  time_key key3
</source>

# Tailing the TSV formatted Logs
<source>
  type tail
  format tsv
  path /path/to/log/bar.tsv
  pos_file /var/log/td-agent/bar.pos
  tag td.production.bar

  keys key1,key2,key3
  time_key key3
</source>

# Tailing the LTSV formatted Logs
<source>
  type tail
  format ltsv
  path /path/to/log/buz.ltsv
  pos_file /var/log/td-agent/buz.pos
  tag td.production.buz

  time_key time_field_name
</source>

# Treasure Data Output
<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
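To make the keys and time_key options more concrete, the Python sketch below shows how a single line in each format could be turned into a record: CSV and TSV values are paired with the names listed in keys, LTSV lines carry their own labels, and the field named by time_key supplies the event time. This is an illustration under our own assumptions, not td-agent's parser: the function names are hypothetical, CSV quoting follows Python's csv module, the time value is left as an unparsed string, and whether td-agent keeps the time field inside the stored record may depend on its version and settings.

```python
import csv

def _zip_checked(keys, fields):
    # Reject lines whose field count does not match `keys` instead of silently truncating.
    if len(fields) != len(keys):
        raise ValueError(f"expected {len(keys)} fields, got {len(fields)}")
    return dict(zip(keys, fields))

def parse_csv(line, keys):
    """Pair the comma-separated values of one line with the names in `keys`."""
    # Quoting rules follow Python's csv module (an assumption for this sketch).
    fields = next(csv.reader([line.rstrip("\r\n")]))
    return _zip_checked(keys, fields)

def parse_tsv(line, keys):
    """Pair the tab-separated values of one line with the names in `keys`."""
    fields = line.rstrip("\r\n").split("\t")
    return _zip_checked(keys, fields)

def parse_ltsv(line):
    """Split 'label:value<TAB>label:value...' into a dict; labels come from the line itself."""
    record = {}
    for item in line.rstrip("\r\n").split("\t"):
        # partition splits at the first colon only, so values may contain colons.
        label, sep, value = item.partition(":")
        if sep:  # ignore items that have no 'label:' prefix
            record[label] = value
    return record

def split_time(record, time_key):
    """Return (time string, remaining fields) using the field named by `time_key`."""
    rest = dict(record)  # copy so the caller's dict is not modified
    return rest.pop(time_key), rest
```

For example, with keys key1,key2,key3 and time_key key3, the CSV line a,"b,c",2015-08-03 00:01:48 yields the time 2015-08-03 00:01:48 and the fields key1=a and key2=b,c.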

Please restart your agent once these lines are in place.

$ sudo /etc/init.d/td-agent restart

td-agent will now tail the files, buffer the records (/var/log/td-agent/buffer/td), and automatically upload them into the cloud.

Confirming Data Import

Sending a SIGUSR1 signal will flush td-agent’s buffer; the upload will start immediately.

# append new records to the logs
$ ...

# flush the buffer
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`
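For scripted checks, the same flush can be triggered from Python. This is a minimal sketch that assumes the PID file path used in the kill command above; the function names are hypothetical, and the script needs permission to signal the td-agent process (for example, by running it with sudo).

```python
import os
import signal

# Path taken from the kill command above; adjust it if your installation differs.
PID_FILE = "/var/run/td-agent/td-agent.pid"

def parse_pid(text):
    """Convert the contents of a PID file into an int, ignoring surrounding whitespace."""
    return int(text.strip())

def flush_buffer(pid_file=PID_FILE):
    """Send SIGUSR1 to td-agent, like `kill -USR1 $(cat pid_file)`, and return the PID."""
    with open(pid_file) as f:
        pid = parse_pid(f.read())
    os.kill(pid, signal.SIGUSR1)
    return pid
```

Calling flush_buffer() after appending records has the same effect as the kill command. Note that signal.SIGUSR1 is only available on Unix-like systems.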

To confirm that your data has been uploaded successfully, issue the td tables command as shown below.

$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| production | foo        | log  | 1         |
| production | bar        | log  | 3         |
| production | buz        | log  | 5         |
+------------+------------+------+-----------+
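In the output above, each table's database and name match the second and third parts of the tags set in the source sections (td.production.foo becomes production / foo). The sketch below models that correspondence, along with how the <match td.*.*> pattern selects those tags. It assumes that `*` matches exactly one dot-separated tag part, ignores other Fluentd pattern forms such as `**`, and uses hypothetical function names.

```python
def tag_matches(pattern, tag):
    """True if `tag` matches a <match> pattern in which '*' stands for exactly one tag part."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))

def tag_to_table(tag):
    """Map 'td.<database>.<table>' to (database, table), as the example output suggests."""
    parts = tag.split(".")
    if len(parts) != 3 or parts[0] != "td":
        raise ValueError(f"not a td.<database>.<table> tag: {tag}")
    return parts[1], parts[2]
```

Under this model, a tag with only two parts, such as td.production, would not be picked up by td.*.*. That is one reason to keep all three parts in each tag.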

Please check /var/log/td-agent/td-agent.log if it’s not working correctly. The td-agent user (td-agent:td-agent) needs permission to read the log files.
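One way to check the permission requirement is to test readability as the td-agent user. The sketch below is a hypothetical helper (the script name check_logs.py is ours) that reports which of the given paths the current user cannot read. Run it with sudo -u td-agent so the check uses td-agent's identity.

```python
import os
import sys

def unreadable(paths):
    """Return the paths that the current (real) user cannot open for reading."""
    # os.access checks with the real UID/GID, which is td-agent's under `sudo -u td-agent`.
    # Missing files, and directories that cannot be traversed, are reported as well.
    return [p for p in paths if not os.access(p, os.R_OK)]

if __name__ == "__main__":
    # Hypothetical usage: sudo -u td-agent python3 check_logs.py /path/to/log/foo.csv ...
    for path in unreadable(sys.argv[1:]):
        print(f"not readable: {path}")
```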

td-agent handles log rotation. It records the last read position of each log in the file set by pos_file, so each line is read exactly once even if the td-agent process goes down. However, because this position is kept in a file, the "exactly once" guarantee breaks down if that file becomes corrupted.
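The position-tracking idea can be modeled in a few lines of Python. This is a simplified sketch, not td-agent's implementation. The pos file here uses our own "<inode> <offset>" format, the class and method names are hypothetical, and a file the sketch has not seen before is read from the beginning. On rotation, the sketch simply starts the new file at offset 0 without draining whatever was left unread in the old one.

```python
import os

class PositionTailer:
    """Simplified model of tailing with a position file.

    The pos file holds "<inode> <offset>" (our own format, not td-agent's).
    """

    def __init__(self, log_path, pos_path):
        self.log_path = log_path
        self.pos_path = pos_path

    def _load(self):
        # Return the saved (inode, offset), or (None, 0) if there is no usable pos file.
        try:
            with open(self.pos_path) as f:
                inode, offset = f.read().split()
            return int(inode), int(offset)
        except (FileNotFoundError, ValueError):
            return None, 0

    def _save(self, inode, offset):
        # Write to a temporary file, then rename it over the pos file in one step.
        tmp_path = self.pos_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(f"{inode} {offset}\n")
        os.replace(tmp_path, self.pos_path)

    def poll(self, emit):
        """Pass each complete line appended since the saved offset to `emit`; return the count."""
        st = os.stat(self.log_path)
        inode, offset = self._load()
        if inode != st.st_ino or offset > st.st_size:
            # Unknown or replaced file (rotation), or truncation: start from the top.
            offset = 0
        with open(self.log_path, "rb") as f:
            f.seek(offset)
            data = f.read()
        end = data.rfind(b"\n") + 1  # leave a trailing partial line for the next poll
        lines = data[:end].decode("utf-8").splitlines()
        for line in lines:
            emit(line)
        # The new offset is saved only after every line has been passed to `emit`.
        self._save(st.st_ino, offset + end)
        return len(lines)
```

Because the offset lives in a file, a new PositionTailer created after a restart resumes at the first line it has not yet handed off. If the pos file becomes unreadable, this sketch falls back to offset 0 and re-reads the whole log, which is one concrete way a corrupted position file can break the exactly-once behavior described above.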

Next Steps

We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.


Last modified: Aug 03 2015 00:01:48 UTC
