Logs Import Using CSV, TSV, and LTSV Formats

td-agent was discontinued in December 2023 and has been replaced by fluent-package. The fluent-package is the official successor maintained by the Cloud Native Computing Foundation.

You can use Fluentd to continuously import CSV, TSV, and LTSV formatted logs, such as access logs, into the cloud.

Fluentd handles log rotation. It records the last read position of each log in a position file, so each line is read exactly once even if the Fluentd process goes down. However, because this information is kept in a file, the "exactly once" guarantee breaks down if that file becomes corrupted.
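
As a rough illustration of the position tracking described above, the following shell sketch stores a byte offset in a side file and prints only what was appended since the last call. The function name read_new_lines and the single-number file layout are assumptions made for this example; Fluentd's actual position file format and its rotation handling are different and are not reproduced here.

```shell
# Hypothetical helper, not part of Fluentd:
# print the bytes appended to LOG_FILE since the offset saved in POS_FILE,
# then save the new offset.
read_new_lines() {
  log_file=$1
  pos_file=$2
  offset=0
  if [ -f "$pos_file" ]; then
    offset=$(cat "$pos_file")
  fi
  # Snapshot the size first so the saved offset matches what is printed.
  size=$(( $(wc -c < "$log_file") ))
  # tail -c +N counts from 1, so start one byte past the saved offset.
  head -c "$size" "$log_file" | tail -c +"$((offset + 1))"
  printf '%s\n' "$size" > "$pos_file"
}
```

Calling read_new_lines a second time without new data prints nothing. If the offset file is lost, the whole log is printed again, which is the kind of failure the "exactly once" caveat refers to.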

Prerequisites

  • Basic knowledge of Fluentd and its installation.

Installing Fluentd

Fluentd (fluent-package) Install Options

For migration guidance from td-agent, see the Fluentd Installation Guide.

To install fluent-package, run one of the following commands based on your environment.

RHEL/CentOS/Rocky Linux

# fluent-package 6 LTS (recommended)
curl -fsSL https://fluentd.cdn.cncf.io/sh/install-redhat-fluent-package6-lts.sh | sh

Ubuntu

# Ubuntu 24.04 Noble - fluent-package 6 LTS
curl -fsSL https://fluentd.cdn.cncf.io/sh/install-ubuntu-noble-fluent-package6-lts.sh | sh

# Ubuntu 22.04 Jammy - fluent-package 6 LTS
curl -fsSL https://fluentd.cdn.cncf.io/sh/install-ubuntu-jammy-fluent-package6-lts.sh | sh

Debian

# Debian Bookworm - fluent-package 6 LTS
curl -fsSL https://fluentd.cdn.cncf.io/sh/install-debian-bookworm-fluent-package6-lts.sh | sh

Amazon Linux

# Amazon Linux 2023 - fluent-package 6 LTS
curl -fsSL https://fluentd.cdn.cncf.io/sh/install-amazon2023-fluent-package6-lts.sh | sh

Windows

Download the MSI installer from:

After installation:

  1. Edit the configuration file at C:\opt\fluent\etc\fluent\fluentd.conf.
  2. Start the service with net start fluentdwinsvc or via the Services administrative tool.

macOS

fluent-package for macOS is planned to be available via Homebrew. For current installation options, see Fluentd Installation Guide.

Starting the Service

After installation, start and verify the Fluentd service.

Linux

sudo systemctl start fluentd.service
sudo systemctl status fluentd.service

The configuration file is located at /etc/fluent/fluentd.conf.

Windows

net start fluentdwinsvc

The configuration file is located at C:\opt\fluent\etc\fluent\fluentd.conf.

macOS (gem installation)

fluentd -c /path/to/fluentd.conf

For more details, see the Fluentd Documentation.

Modifying fluentd.conf

Specify your authentication key by setting the apikey option. You can view your API key from the TD Console.

Edit /etc/fluent/fluentd.conf (for fluent-package) and replace YOUR_API_KEY with your API key string.

# Tailing the CSV formatted Logs
<source>
  @type tail
  <parse>
    @type csv
    keys key1, key2, key3
    time_key key3
  </parse>
  path /path/to/log/foo.csv
  pos_file /var/log/fluent/foo.pos
  tag td.production.foo
</source>

# Tailing the TSV formatted Logs
<source>
  @type tail
  <parse>
    @type tsv
    keys key1, key2, key3
    time_key key3
  </parse>
  path /path/to/log/bar.tsv
  pos_file /var/log/fluent/bar.pos
  tag td.production.bar
</source>

# Tailing the LTSV formatted Logs
<source>
  @type tail
  <parse>
    @type ltsv
    time_key time_field_name
  </parse>
  path /path/to/log/buz.ltsv
  pos_file /var/log/fluent/buz.pos
  tag td.production.buz
</source>

# Treasure Data Output
<match td.*.*>
  @type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  use_ssl true
  <buffer>
    @type file
    path /var/log/fluent/buffer/td
  </buffer>
</match>
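
To make the three <source> blocks concrete, the sketch below builds one record in each format. The function names and the sample values are hypothetical; only the field layout follows the configuration (three positional fields named by keys for CSV and TSV, label:value pairs for LTSV). The timestamp text must match the parser's time settings, which this example does not attempt to cover.

```shell
# Hypothetical helpers that print one record per call.

# CSV: three positional fields; the config names them key1, key2, key3.
# Values containing commas or quotes would need CSV quoting (not handled here).
csv_line() {
  printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# TSV: the same three fields, separated by tabs.
tsv_line() {
  printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# LTSV: each argument is a label:value pair; pairs are joined with tabs.
ltsv_line() {
  line=$1
  shift
  for pair in "$@"; do
    line=$(printf '%s\t%s' "$line" "$pair")
  done
  printf '%s\n' "$line"
}

# Example use (placeholders, not real data):
#   csv_line value1 value2 "$timestamp" >> /path/to/log/foo.csv
#   ltsv_line host:web1 time_field_name:"$timestamp" >> /path/to/log/buz.ltsv
```

In the CSV and TSV cases the third field is the one the configuration treats as the event time (time_key key3); in the LTSV case it is whichever pair carries the label time_field_name.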

Restart the Fluentd service once the configuration is in place.

sudo systemctl restart fluentd.service

Fluentd tails the files, buffers the records in /var/log/fluent/buffer/td, and automatically uploads them to the cloud.

Confirming Data Import

Sending a SIGUSR1 signal flushes Fluentd's buffer; upload starts immediately.

# append new records to the logs
$ ...

# flush the buffer
$ kill -USR1 $(cat /var/run/fluent/fluentd.pid)

To confirm that your data was uploaded successfully, run the td tables command as follows.

$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| production | foo        | log  | 1         |
| production | bar        | log  | 3         |
| production | buz        | log  | 5         |
+------------+------------+------+-----------+
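
The Database and Table columns in this listing line up with the tags set in the <source> blocks: a tag of the form td.<database>.<table> is routed to that database and table. A small shell sketch of the split, using a hypothetical function name:

```shell
# Hypothetical helper: split a tag of the form td.<database>.<table>
# and print "<database> <table>".
tag_to_db_table() {
  tag=$1
  rest=${tag#td.}        # drop the leading "td."
  database=${rest%%.*}   # text before the next dot
  table=${rest#*.}       # text after that dot
  printf '%s %s\n' "$database" "$table"
}
```

For example, tag_to_db_table td.production.foo prints "production foo", matching the first row of the listing.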

If the import is not working correctly, check /var/log/fluent/fluentd.log. The fluentd user (fluentd:fluentd) must have permission to read the log files being tailed.
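
One quick way to rule out a permissions problem is to test whether each tailed file is readable. The sketch below uses a hypothetical function, unreadable_paths, that prints any path the current user cannot read; it assumes you run it as the account Fluentd runs under.

```shell
# Hypothetical helper: print every argument that the current user
# cannot read (missing files are reported as well).
unreadable_paths() {
  for path in "$@"; do
    [ -r "$path" ] || printf '%s\n' "$path"
  done
}

# Example use (paths taken from the configuration in this guide):
#   unreadable_paths /path/to/log/foo.csv /path/to/log/bar.tsv /path/to/log/buz.ltsv
```

No output means every listed file is readable by the account that ran the check.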

Next Steps

We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.