Continuous Data Import from Logstash

Logstash is open source software for log management, widely known and used as part of the ELK stack.

There’s a great collection of plugins for Logstash to collect, filter, and store data from many sources and to many destinations. This article describes how to ingest data from Logstash into Treasure Data Service.

How to install / use the plugin

If you’re already using Logstash, you can install the logstash-output-treasure_data plugin very easily.

$ cd /path/of/logstash
$ bin/plugin install logstash-output-treasure_data
Validating logstash-output-treasure_data
Installing logstash-output-treasure_data
Installation successful

Next, configure Logstash to send data to Treasure Data. The configuration requires the names of the database and the table into which data will be inserted. You can retrieve your API key from the Treasure Data Console; using a write-only key is recommended.

input {
  # ...
}
output {
  treasure_data {
    apikey   => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    database => "dbname"
    table    => "tablename"
    endpoint => "api-import.treasuredata.com"
  }
}

Then, launch Logstash with that configuration file.

$ bin/logstash -f your.conf

You’ll see rows appear on the Treasure Data console. The log message text is stored in the message column, and some additional columns will also exist (e.g. time, host, and version).



Specifications / Configurations

Currently, this plugin stores all logs into a single table. If you want to insert data into two or more tables, use two or more treasure_data sections in your configuration file.
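
For example, a configuration along these lines routes events into two different tables. This is only a sketch: the type field used for routing and the table names access_logs and error_logs are illustrative, not defaults of the plugin.

output {
  if [type] == "access" {
    treasure_data {
      apikey   => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      database => "dbname"
      table    => "access_logs"   # hypothetical table name
    }
  } else {
    treasure_data {
      apikey   => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      database => "dbname"
      table    => "error_logs"    # hypothetical table name
    }
  }
}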

This plugin has the configuration options listed below. The default values will work in almost all cases, but some of these options may help under unstable network environments (see the sketch after this list).

  • apikey (required)
  • database (required)
  • table (required)
  • auto_create_table [true]: the plugin will create the table if it doesn’t exist
  • endpoint [api.treasuredata.com]
  • use_ssl [true]
  • http_proxy [none]
  • connect_timeout [60s]
  • read_timeout [600s]
  • send_timeout [600s]
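
For example, when working under an unstable network or behind an HTTP proxy, a configuration along these lines might help. This is only a sketch: the proxy host is hypothetical and the timeout values (in seconds) are illustrative; check the plugin documentation for the exact value format.

output {
  treasure_data {
    apikey          => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    database        => "dbname"
    table           => "tablename"
    endpoint        => "api-import.treasuredata.com"
    use_ssl         => true
    http_proxy      => "http://proxy.example.com:8080"  # hypothetical proxy host
    connect_timeout => 120                               # illustrative, default is 60s
    read_timeout    => 600
    send_timeout    => 600
  }
}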

Please note that this plugin buffers data in memory for up to 5 minutes. Buffered data will be lost if the Logstash process crashes. To avoid this problem, we recommend using Treasure Agent; please check the section below.

Combination with Treasure Agent

For now, logstash-output-treasure_data has very limited features, especially around buffering, destination table flexibility, and performance.

There is another option: use Treasure Agent for more flexible and higher-performance data transfer. You can use logstash-output-fluentd to do this.

[host a]
  -> (logstash-output-fluentd) -+
[host b]                        |
  -> (logstash-output-fluentd) -+- [Treasure Agent] -> [Treasure Data]
[host c]                        |
  -> (logstash-output-fluentd) -+

Multiple Logstash instances can be configured to send their logs to a td-agent node, and that td-agent stores all of the data into Treasure Data.

# Configuration for Logstash
input {
  # ...
}
output {
  fluentd {
    host => "your.host.name.example.com"
    port => 24224 # default
    tag  => "td.database.tablename"
  }
}
# Configuration for Fluentd
<source>
  @type forward
  port 24224
</source>
<match td.*.*>
  type tdlog
  endpoint api-import.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
  num_threads 8
</match>

The Fluentd tdlog plugin can store data into many database/table combinations by parsing tags of the form td.dbname.tablename, so you can configure any database/table pair via the tag option in your Logstash configuration files.
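
For example, a Logstash configuration along these lines sends different event types to different databases and tables through the same td-agent node. This is only a sketch: the type field used for routing and the weblogs/access and applogs/error database/table names are illustrative.

# Configuration for Logstash: route event types to separate tables
output {
  if [type] == "access" {
    fluentd {
      host => "your.host.name.example.com"
      port => 24224
      tag  => "td.weblogs.access"   # database "weblogs", table "access" (hypothetical)
    }
  } else {
    fluentd {
      host => "your.host.name.example.com"
      port => 24224
      tag  => "td.applogs.error"    # database "applogs", table "error" (hypothetical)
    }
  }
}

The <match td.*.*> section in the Fluentd configuration above matches both tags, so no changes are needed on the td-agent side.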
