Tailing Existing Log Files

td-agent can “tail” log files like the UNIX tail command, then import the results into the cloud.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • Basic knowledge of td-agent.

Tailing JSON-based Logs

This example uses the tail input plugin with the configuration file below. It assumes that each line of the log is a single well-formed JSON object; a record must not span multiple lines.

This feature is supported in td-agent v1.1.5.1 and higher.

<source>
  type tail
  path /path/to/the/file
  tag td.test_db.test_table
  format json
  pos_file /var/log/td-agent/test_db_test_table.pos
</source>

<match td.*.*>
  type tdlog
  apikey ...
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

Every time a new line is appended to the log file, td-agent parses the line and adds it to its buffer. td-agent uploads the buffered data into the cloud every 5 minutes; to upload the data immediately, send a SIGUSR1 signal to the td-agent process. Here is a sample log file.

{"a":"b", "c":"d"}
{"a":"b", "c":"d", "e":1}
{"a":"b", "c":"d", "e":1, "f":2.0}
{"a":"b", "c":"d"}
{"a":"b", "c":"d", "e":1}
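
Because the json format expects every line to be one JSON object, it can help to check a log file before pointing td-agent at it. The following is a minimal sketch of such a check; invalid_json_lines is a hypothetical helper written for this article, not part of td-agent.

```ruby
require 'json'

# Hypothetical helper (not part of td-agent): returns the 1-based numbers
# of the lines in io that are not a single well-formed JSON object.
def invalid_json_lines(io)
  bad = []
  io.each_line.with_index(1) do |line, lineno|
    begin
      record = JSON.parse(line)
      # Arrays, numbers, and strings are valid JSON but not records.
      bad << lineno unless record.is_a?(Hash)
    rescue JSON::ParserError
      bad << lineno
    end
  end
  bad
end

# Example usage (the path is a placeholder):
#   File.open('/path/to/the/file') { |f| p invalid_json_lines(f) }
```

An empty result means every line parsed as a JSON object. Blank lines are reported as invalid in this sketch.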

Issue the commands below to confirm that everything is configured correctly.

# append new entries
$ tail -n 3 /path/to/the/file > sample.txt # take the last three lines of the log...
$ cat sample.txt >> /path/to/the/file      # and append them to the same log file to trigger the tail plugin.

# flush the buffer
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`

# confirm the upload
$ td tables test_db

td-agent handles log rotation. It also keeps a record of the last position it read in the log (the pos_file), ensuring that each line is read exactly once even if the td-agent process goes down. However, since this information is kept in a file, the "exactly once" guarantee breaks down if the file becomes corrupted.
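
To illustrate the idea of position tracking, here is a simplified sketch of reading only the new lines of a file by persisting a byte offset. It is not td-agent's actual implementation: read_new_lines and its arguments are hypothetical, and real log rotation needs more than the size check used here.

```ruby
# Simplified illustration of position tracking; NOT td-agent's implementation.
# read_new_lines, log_path, and pos_path are hypothetical names.
def read_new_lines(log_path, pos_path)
  # Resume from the saved byte offset, or from the start if none is recorded.
  offset = File.exist?(pos_path) ? File.read(pos_path).to_i : 0
  lines = []
  File.open(log_path, 'r') do |f|
    # A file shorter than the saved offset was probably truncated or rotated.
    offset = 0 if f.size < offset
    f.seek(offset)
    f.each_line do |line|
      break unless line.end_with?("\n") # leave a partially written line for later
      lines << line.chomp
      offset += line.bytesize
    end
  end
  # Persist the offset so that a restarted process continues from here.
  File.write(pos_path, offset.to_s)
  lines
end
```

Calling the function twice without new data returns an empty array the second time, which is the behavior the position file is meant to provide. If the position file is lost or holds a wrong offset, lines are re-read or skipped.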

Tailing Custom-Formatted Logs

If your logs are in a custom format, you will need to write a custom parser (instructions). Once you have written the parser, please put the file into your /etc/td-agent/plugins/ directory.

We provide two example parsers: “URL-param style key-value pairs” and “ASCII character-delimited format”. Both formats are fairly common among our users.

# URL-param style key-value pairs
last_name=smith&first_name=brian&age=22&state=CA

# ASCII character delimited format. In this case, the delimiter is '|'.
# There is usually a separate file that annotates the column names
smith|brian|22|CA

Tailing existing logs is by far the easiest way to get started with Treasure Data. We recommend logging everything as JSON.
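
For the two sample formats above, the line-to-record conversion that a custom parser performs could look like the sketch below. Both functions are hypothetical helpers, not the example parsers themselves and not the plugin API; the column names passed to parse_delimited are an assumption standing in for the separate annotation file.

```ruby
require 'uri'

# Hypothetical helper: turns "k1=v1&k2=v2" into {"k1"=>"v1", "k2"=>"v2"}.
def parse_url_params(line)
  # URI.decode_www_form returns [[key, value], ...] and decodes %XX and '+'.
  Hash[URI.decode_www_form(line.chomp)]
end

# Hypothetical helper: pairs delimited values with caller-supplied column names.
def parse_delimited(line, columns, delimiter = '|')
  values = line.chomp.split(delimiter, -1) # -1 keeps trailing empty fields
  Hash[columns.zip(values)]
end

# Example usage (column names are assumed for illustration):
#   parse_url_params("last_name=smith&first_name=brian&age=22&state=CA")
#   parse_delimited("smith|brian|22|CA", %w[last_name first_name age state])
```

All values come back as strings in this sketch; converting fields such as age to integers would be a separate step.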

Filtering Out the Records

If you need to filter logs (for example, dropping impressions and keeping only clicks), the exec_filter plugin is useful. This plugin launches an external script, writes each record to the script's STDIN, reads the script's STDOUT, and emits whatever the script outputs.

Here’s an example configuration.

<source>
  type tail
  path /path/to/the/file1
  tag filter
  format json
  pos_file /var/log/td-agent/file1.pos
</source>

<match filter>
  type exec_filter
  command /usr/lib64/fluent/ruby/bin/ruby /etc/td-agent/filter.rb
  in_format json  # takes a JSON string from STDIN
  out_format json # generates a JSON string to STDOUT
  tag_key tag     # The key for tags is "tag".
  time_key time   # The key for timestamps is "time".
</match>

<match td.*.*>
  type tdlog
  apikey ...
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

/etc/td-agent/filter.rb is the filter script (shown below). It filters out all the lines where the field “field0” is equal to “certain_value” and silently skips lines that are not valid JSON. Load errors (for example, a missing json library) are recorded in /var/log/td-agent/filter.rb.log.

open('/var/log/td-agent/filter.rb.log', 'a') { |f|
  f.puts "-- begin --"
  begin
    require 'json'
    STDOUT.sync = true
    while line = STDIN.gets
      # parse
      begin
        h = JSON.parse line
      rescue => e
        next # broken line
      end
      # filter: drop records whose "field0" equals "certain_value"
      next if h["field0"] == "certain_value"
      # emit
      h['tag'] = 'td.testdb.test_table'
      puts h.to_json
    end
  rescue LoadError => e
    f.puts e.to_s
  end
}
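
Before wiring the script into td-agent, it may be convenient to run the same logic against in-memory input. The sketch below restates the filter loop as a function for that purpose; filter_lines is a hypothetical name, and the sample records are made up for illustration.

```ruby
require 'json'
require 'stringio'

# Hypothetical restatement of the filter loop so it can be exercised locally.
def filter_lines(input, output)
  while (line = input.gets)
    begin
      h = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end
    next if h['field0'] == 'certain_value' # drop the unwanted records
    h['tag'] = 'td.testdb.test_table'      # same tag as the script above
    output.puts h.to_json
  end
end

# Made-up sample input: one record to drop, one to keep, one broken line.
sample = StringIO.new(%({"field0":"certain_value"}\n{"field0":"other"}\nnot json\n))
result = StringIO.new
filter_lines(sample, result)
puts result.string
```

Only the record whose field0 is "other" should be printed, with the tag key added.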

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels. Live chat with our staff also works well.