Tailing Custom-Formatted Logs

If your logs are in a custom format, you will need to write a custom parser (instructions). Once you have written the parser, place the file in your /etc/td-agent/plugins/ directory.

We provide two example parsers: “URL-param style key-value pairs” and “ASCII character delimited format”. Both formats are fairly common among our users.

# URL-param style key-value pairs

# ASCII character delimited format. In this case, the delimiter is '|'.
# There is usually a separate file that annotates the column names
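
As a rough illustration of the two formats named above, the following plain-Ruby sketch shows how a single line of each kind could be turned into a record. It is not a td-agent parser plugin and does not use the plugin API; the helper names parse_url_params and parse_delimited, the sample lines, and the column names are all hypothetical and exist only for this example.

```ruby
require 'uri'

# Hypothetical helper: parse a URL-param style line such as
# "user=alice&action=click" into a Hash of string keys and values.
def parse_url_params(line)
  URI.decode_www_form(line.chomp).to_h
end

# Hypothetical helper: split a delimited line and pair each value with
# a column name. The column names would normally come from the separate
# file that annotates the columns.
def parse_delimited(line, columns, delimiter = '|')
  values = line.chomp.split(delimiter, -1) # -1 keeps trailing empty fields
  columns.zip(values).to_h
end

puts parse_url_params("user=alice&action=click&item=42").inspect
puts parse_delimited("alice|click|42", %w[user action item]).inspect
```

Both helpers return a Hash, which is the shape a JSON log line would already have. A real parser plugin would wrap logic like this in whatever interface the parser instructions describe.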

Tailing existing logs is by far the easiest way to get started with Treasure Data. We recommend logging everything as JSON. Here's why.

Filtering Out the Records

If you need to filter logs (e.g., dropping impressions and keeping only clicks), the exec_filter plugin is useful. This plugin launches an external script, passes each record to the script on STDIN, and reads the filtered records back from the script's STDOUT.

Here’s an example configuration.

<source>
  type tail
  path /path/to/the/file1
  tag filter
  format json
  pos_file /var/log/td-agent/file1.pos
</source>

<match filter>
  type exec_filter
  command /usr/lib64/fluent/ruby/bin/ruby /etc/td-agent/filter.rb
  in_format json  # takes a JSON string from STDIN
  out_format json # generates a JSON string to STDOUT
  tag_key tag     # The key for tags is "tag".
  time_key time   # The key for timestamps is "time".
</match>

<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey ...
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

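/etc/td-agent/filter.rb is the filter script (shown below). It filters out all the lines where the field “field0” is equal to “certain_value”, and sets the tag of every remaining record to td.testdb.test_table so that the record is picked up by the td.*.* match block. Errors are recorded in /var/log/td-agent/filter.rb.log.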

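open('/var/log/td-agent/filter.rb.log', 'a') { |f|
  f.puts "-- begin --"
  begin
    require 'json'
    STDOUT.sync = true # flush each record as soon as it is written
    while line = STDIN.gets
      # parse
      begin
        h = JSON.parse line
      rescue => e
        next # broken line
      end
      # filter: drop records whose field0 equals "certain_value"
      next if h["field0"] == "certain_value"
      # emit, retagged so that the record matches <match td.*.*>
      h['tag'] = 'td.testdb.test_table'
      puts h.to_json
    end
  rescue LoadError => e
    f.puts e.to_s
  end
}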
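
To check the filtering rule without running td-agent, the per-line logic can be pulled into a small function and tried on a few sample lines. This is only a sketch under assumptions: the function name filter_line and the sample records are hypothetical, and the extra Hash check is an addition that the script above does not have.

```ruby
require 'json'

# Hypothetical helper mirroring the per-line logic of filter.rb.
# Returns the JSON string to emit, or nil when the line is dropped.
def filter_line(line)
  h = JSON.parse(line)
  return nil unless h.is_a?(Hash)               # assumption: skip non-object JSON
  return nil if h['field0'] == 'certain_value'  # filter
  h['tag'] = 'td.testdb.test_table'             # retag for <match td.*.*>
  h.to_json
rescue JSON::ParserError
  nil # broken line
end

samples = [
  '{"field0":"certain_value","n":1}', # dropped by the filter
  '{"field0":"other","n":2}',         # kept and retagged
  'not json'                          # dropped as a broken line
]

samples.each do |line|
  out = filter_line(line)
  puts(out ? "emit #{out}" : "drop #{line}")
end
```

Only the second sample is emitted, now carrying the tag key that exec_filter reads through tag_key.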

Last modified: Aug 03 2015 00:01:48 UTC