Tailing CSV/TSV/LTSV Logs
This article explains how to tail CSV-, TSV-, and LTSV-formatted logs with td-agent so that access logs are continuously imported into the cloud.
td-agent must be installed on your application servers. td-agent is a daemon dedicated to streaming uploads of any kind of time-series data. It is developed and maintained by Treasure Data, Inc.
To set up td-agent, please refer to the following articles; we provide deb/rpm packages for Linux systems.
|If you have...|Please refer to...|
|MacOS X|Installing td-agent on MacOS X|
|Ubuntu System|Installing td-agent for Debian and Ubuntu|
|RHEL / CentOS System|Installing td-agent for Redhat and CentOS|
|AWS Elastic Beanstalk|Installing td-agent on AWS Elastic Beanstalk|
Note: td-agent is fully open-sourced under the fluentd project. td-agent extends fluentd with custom plugins for Treasure Data.
Next, please specify your authentication key by setting the apikey option in your td-agent.conf file. You can view your API key from the console.
Note: YOUR_API_KEY should be your actual apikey string.
# Tailing the CSV formatted Logs
<source>
  type tail
  format csv
  path /path/to/log/foo.csv
  pos_file /var/log/td-agent/foo.pos
  tag td.production.foo
  keys key1, key2, key3
  time_key key3
</source>

# Tailing the TSV formatted Logs
<source>
  type tail
  format tsv
  path /path/to/log/bar.tsv
  pos_file /var/log/td-agent/bar.pos
  tag td.production.bar
  keys key1, key2, key3
  time_key key3
</source>

# Tailing the LTSV formatted Logs
<source>
  type tail
  format ltsv
  path /path/to/log/buz.ltsv
  pos_file /var/log/td-agent/buz.pos
  tag td.production.buz
  time_key time_field_name
</source>

# Treasure Data Input and Output
<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
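To make the mapping between log lines and the keys setting concrete, the following sketch writes one sample record in each of the three formats. Everything in it is a hypothetical illustration: the temporary directory, the field values (alice, login, and so on), the LTSV labels user and action, and the timestamp layout are assumptions, not values taken from a real deployment. Check that the timestamp layout matches what your agent is configured to parse.

```shell
# Hypothetical sample records for the three source definitions.
# The directory, field values, and timestamp layout are illustrative assumptions.
log_dir=$(mktemp -d)
ts='2015-08-03 00:01:48'

# CSV: columns map to key1, key2, key3 in order; key3 carries the time.
printf '%s,%s,%s\n' 'alice' 'login' "$ts" >> "$log_dir/foo.csv"

# TSV: same positional mapping, but tab-separated.
printf '%s\t%s\t%s\n' 'bob' 'logout' "$ts" >> "$log_dir/bar.tsv"

# LTSV: every field is written as label:value, so no keys list is needed.
# The time field uses the label named by time_key (time_field_name).
printf 'user:%s\taction:%s\ttime_field_name:%s\n' 'carol' 'login' "$ts" >> "$log_dir/buz.ltsv"

# Show what was written.
cat "$log_dir/foo.csv" "$log_dir/bar.tsv" "$log_dir/buz.ltsv"
```

In a real setup, the application would append such lines to the files named by each path option instead of to a temporary directory.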
Please restart your agent once these lines are in place.
$ sudo /etc/init.d/td-agent restart
td-agent will now tail the file, buffer it (/var/log/td-agent/buffer/td), and automatically upload it into the cloud.
Confirming Data Import
Sending a SIGUSR1 signal will flush td-agent’s buffer; upload will start immediately.
# append new records to the logs
$ ...
# flush the buffer
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`
To confirm that your data has been uploaded successfully, issue the td tables command as shown below.
$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| production | foo        | log  | 1         |
| production | bar        | log  | 3         |
| production | buz        | log  | 5         |
+------------+------------+------+-----------+
Please check /var/log/td-agent.log if it’s not working correctly.
The td-agent:td-agent user and group need permission to read the logs.
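A quick way to rule out permission problems is to test readability from the shell. The helper below is a minimal sketch; check_readable is a hypothetical name, and it only reports what the user running it can read, so it needs to be run as the agent's user to reflect the agent's view.

```shell
# Report whether each given log file is readable by the current user.
# check_readable is a hypothetical helper, not part of td-agent.
check_readable() {
  status=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "OK        $f"
    else
      echo "NO ACCESS $f"
      status=1
    fi
  done
  # Non-zero when at least one file could not be read.
  return "$status"
}
```

For a one-off check, and assuming sudo is available and the agent runs as a user named td-agent, something like `sudo -u td-agent test -r /path/to/log/foo.csv && echo readable` performs the same test for a single file.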
Note: td-agent handles log-rotation. td-agent keeps a record of the last position of the log, ensuring that each line is read exactly once even if the td-agent process goes down. However, since the information is kept in a file, the "exactly once" guarantee breaks down if the file becomes corrupted.
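The idea behind position tracking can be sketched in a few lines of shell. This is a deliberately simplified illustration, not td-agent's actual implementation or pos_file format: it stores a single byte offset in a file and ignores log rotation. The names log, pos, and read_new_bytes are hypothetical, and both files are temporary stand-ins.

```shell
# Simplified sketch of offset-based tailing; NOT td-agent's actual implementation.
# log and pos are temporary stand-ins for the tailed file and its pos_file.
log=$(mktemp)
pos=$(mktemp)
echo 0 > "$pos"

read_new_bytes() {
  offset=$(cat "$pos")
  size=$(($(wc -c < "$log")))
  if [ "$size" -gt "$offset" ]; then
    # Emit only the bytes between the saved offset and the current end of file.
    tail -c +"$((offset + 1))" "$log" | head -c "$((size - offset))"
    # Persist the new offset so a restarted reader resumes from here.
    echo "$size" > "$pos"
  fi
}

echo 'first record' >> "$log"
read_new_bytes    # prints the first record
echo 'second record' >> "$log"
read_new_bytes    # prints only the second record
```

Because the offset lives in a file, a reader that restarts picks up where it left off. If that file is lost or reset to 0, the sketch re-reads the whole log, which is the kind of duplicate delivery the note above warns about.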
We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.
Last modified: Aug 03 2015 00:01:48 UTC