Writing Job Results into Elasticsearch

This article explains how to write job results directly to your Elasticsearch instance.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • Data imported into Treasure Data that you wish to export into Elasticsearch.
  • A working knowledge of SQL, Hive, or Presto.
  • A working Elasticsearch instance, version 2.0 or later.

Familiarity with the following Elasticsearch hierarchy is also helpful:

  • Cluster: A collection of one or more servers (nodes) that collectively holds and provides search and indexing functionality for your entire dataset.
  • Node: A single server that is part of (or all of) your cluster.
  • Index: This is analogous to a database. An index is a collection of documents with somewhat similar characteristics.
  • Type: This is analogous to a table. One or more types can be defined within an index. A type is a logical category or partition of your index.
  • ID: A column whose value identifies each row/record. In Elasticsearch result export, this setting is optional.
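
To make the hierarchy concrete, the following is a minimal Python sketch of how an index, a type, and an ID combine into the REST path of a single document on Elasticsearch versions that still use types. The function name `document_url` and the node address `localhost:9200` are assumptions made for illustration; they are not part of Treasure Data or Elasticsearch.

```python
from urllib.parse import quote


def document_url(node, index, doc_type, doc_id=None):
    """Build the URL of a document, or of a type when doc_id is None.

    Hypothetical helper for illustration only. It relies on the
    /{index}/{type}/{id} path layout of type-based Elasticsearch versions.
    """
    # Percent-encode each path segment so unusual names stay valid in a URL.
    parts = [quote(index, safe=""), quote(doc_type, safe="")]
    if doc_id is not None:
        parts.append(quote(str(doc_id), safe=""))
    return "http://" + node + "/" + "/".join(parts)


# Assumed node address; index and type names are only examples.
print(document_url("localhost:9200", "my_index", "my_type", "row-1"))
```

When the ID setting is left empty in the result export, there is no ID column to supply the last path segment, which is why the sketch treats `doc_id` as optional.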

Basic Usage



In the Treasure Data console, open the query editor and enter your query.

Next, click Add for Result Export and select Elasticsearch. Fill out all the information below.



  • Mode: select either insert or replace
  • Nodes: comma-separated list of nodes
  • Cluster Name: the name of the cluster
  • Index: the name of the index
  • Type: the name of the type
  • ID: (optional) the name of the ID column

Once you execute your query, the Treasure Data query result is automatically exported to Elasticsearch. Currently, this works without authentication, although support for plugins like “shield” and for “basic authentication” is under consideration.
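
As a rough illustration of the export settings, the sketch below models the form fields in Python and checks them before use. The class `ExportSettings` and its field names are hypothetical and do not correspond to any Treasure Data API; the only facts taken from this article are the two modes, the comma-separated node list, and the optional ID.

```python
from dataclasses import dataclass
from typing import List, Optional

# The two modes offered by the Mode field.
VALID_MODES = ("insert", "replace")


@dataclass
class ExportSettings:
    """Hypothetical container mirroring the result export form fields."""
    mode: str
    nodes: str                      # comma-separated, as typed into the form
    cluster_name: str
    index: str
    type: str
    id_column: Optional[str] = None  # optional, as in the form

    def node_list(self) -> List[str]:
        # Split the comma-separated value and drop empty entries.
        return [n.strip() for n in self.nodes.split(",") if n.strip()]

    def validate(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError("mode must be one of %s" % (VALID_MODES,))
        if not self.node_list():
            raise ValueError("at least one node is required")
        for name in ("cluster_name", "index", "type"):
            if not getattr(self, name):
                raise ValueError(name + " must not be empty")


# Example values are assumptions, not defaults of the service.
settings = ExportSettings(
    mode="insert",
    nodes="node1.example.com, node2.example.com",
    cluster_name="my_cluster",
    index="my_index",
    type="my_type",
)
settings.validate()
print(settings.node_list())
```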

Querying your results from Elasticsearch

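You can sanity-check the data in your Elasticsearch index with a simple query. Assuming your Elasticsearch instance is reachable at ‘52.9.86.225:9200’, the following command searches all indices and writes the response to a file. By default, a search returns only the first 10 matching documents, so the file holds a sample rather than the full dataset: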

$ curl -XGET -i 'http://52.9.86.225:9200/*/_search' > dump.txt

Because of the -i option, the file contains the HTTP response headers followed by a JSON body. The body holds the matching documents, with field names and values corresponding to the data you previously exported. An example of what an Elasticsearch query may output is shown below.

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 2283

{"took":4,"timed_out":false,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":100024,"max_score":1.0,"hits":[{"_index":"embulk_20160205-141457","_type":"embulk_type","_id":"AVKxyShGu46fqokIoDTf","_score":1...
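
A response captured with the -i option can be inspected programmatically. The following Python sketch separates the HTTP headers from the JSON body and reads the hit count. The function `parse_dump` and the shortened `sample` response are made up for illustration; in practice you would read the text from dump.txt instead.

```python
import json


def parse_dump(text):
    """Split a `curl -i` capture into (status_line, headers, body).

    Hypothetical helper: assumes a single response whose headers are
    separated from the JSON body by one blank line.
    """
    normalized = text.replace("\r\n", "\n")
    head, _, body = normalized.partition("\n\n")
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, json.loads(body)


# Made-up response in the same shape as the example above.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json; charset=UTF-8\r\n"
    "\r\n"
    '{"took":4,"timed_out":false,"hits":{"total":2,"hits":['
    '{"_index":"my_index","_type":"my_type","_id":"a","_source":{"col":1}},'
    '{"_index":"my_index","_type":"my_type","_id":"b","_source":{"col":2}}]}}'
)

status, headers, body = parse_dump(sample)
print(status)
# "total" counts all matches; the "hits" array holds only the returned page.
print(body["hits"]["total"], len(body["hits"]["hits"]))
```

Comparing `hits.total` with the row count of the Treasure Data query result is one quick way to confirm that the export completed.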

For more information, please check the Elasticsearch documentation.


Last modified: Feb 24 2017 09:27:52 UTC
