Writing Job Results into Elastic Cloud

This article explains how to write job results directly to your Elastic Cloud instance. This result output replaces the Elasticsearch result output after Jun 30, 2017.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • Data imported into Treasure Data that you want to export to Elastic Cloud.
  • A working knowledge of SQL, Hive, or Presto.
  • A working Elastic Cloud instance. Version 2.0 or greater is recommended.
  • Alternatively, you can use this result output with your own Elasticsearch instance running in your environment.

Familiarity with the following Elastic Cloud hierarchy is also helpful:

  • Cluster: A collection of one or more servers (nodes) that collectively holds and provides search and indexing functionality for your entire dataset.
  • Node: A single server that is part of (or all of) your cluster.
  • Index: This is analogous to a database. An index is a collection of documents with somewhat similar characteristics.
  • Type: This is analogous to a table. One or more types are defined within an index. A type is a logical category or partition of your index.
  • ID: A column whose value identifies each row/record. In Elastic Cloud result export, this setting is optional.
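
To make the hierarchy concrete, the following minimal Python sketch shows how an index, a type, and an optional ID combine into the address of a single document in Elasticsearch versions that use types (such as the 2.x releases recommended above). The function name document_path and the sample values are hypothetical and used only for illustration.

```python
from urllib.parse import quote


def document_path(index, doc_type, doc_id=None):
    """Return the URL path of a type, or of one document when doc_id is given.

    Illustrative helper (hypothetical name), following the
    /{index}/{type}/{id} layout of Elasticsearch versions that use types.
    """
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    # Percent-encode each segment so special characters do not break the path.
    return "/" + "/".join(quote(part, safe="") for part in parts)


# Hypothetical names, for illustration only.
print(document_path("my_index", "my_type", "user_1"))  # /my_index/my_type/user_1
```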

Basic Usage



Open the Treasure Data console, go to the query editor, and enter your query.

Next, click Add for Result Export and select Elastic Cloud. Fill in the following fields:



  • Nodes: a comma-separated list of nodes
  • Use SSL?: whether to connect over SSL
  • Auth Method: select either Basic or None
  • Username: the username for basic authentication
  • Password: the password for the above user
  • Mode: select either insert or replace
  • Index: the name of the index
  • Type: the name of the type
  • ID: (optional) the name of the ID column

When you execute your query, the Treasure Data query result is automatically written to Elastic Cloud. Currently, this result output supports basic authentication, including the “Security” (formerly “Shield”) feature of Elastic Cloud. It does not support the LDAP and Active Directory authentication that “Security” also provides.
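
As a rough illustration of how the Index, Type, and ID settings relate to stored documents, the sketch below builds a request body in the newline-delimited format of the Elasticsearch bulk API for versions that use types. It is not a description of how Treasure Data writes results internally; the function name bulk_body, the sample rows, and the index, type, and column names are all hypothetical.

```python
import json


def bulk_body(rows, index, doc_type, id_column=None):
    """Build an Elasticsearch bulk request body (newline-delimited JSON).

    Each row becomes two lines: an "index" action with its metadata,
    followed by the document itself. If id_column is given, that column's
    value is used as the document ID; otherwise Elasticsearch assigns one.
    """
    lines = []
    for row in rows:
        meta = {"_index": index, "_type": doc_type}
        if id_column is not None:
            meta["_id"] = row[id_column]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(row))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical query result rows, for illustration only.
rows = [
    {"user_id": "u1", "visits": 3},
    {"user_id": "u2", "visits": 7},
]
print(bulk_body(rows, "my_index", "my_type", id_column="user_id"))
```

When the ID column is omitted, the action lines carry no "_id" field, which matches the optional nature of the ID setting described above.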

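Querying your results from your Elastic Cloud instance

You can sanity-check the data in your Elasticsearch index with a simple query. Assuming the host and port of your Elastic Cloud instance are ‘example.com:9200’, the following command searches all indices and writes the response to a file. Note that, by default, an Elasticsearch search returns only the first 10 matching documents along with the total hit count: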

$ curl -XGET -i 'http://example.com:9200/*/_search' --user <username>:<password> > dump.txt

Because of the -i option, the file starts with the HTTP response headers, followed by a JSON body containing the column names, column types, and content of the data you previously exported. An example of what an Elasticsearch query might output is as follows:

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 2283

{"took":4,"timed_out":false,"_shards":{"total":15,"successful":15,"failed":0},"hits":{"total":100024,"max_score":1.0,"hits":[{"_index":"embulk_20160205-141457","_type":"embulk_type","_id":"AVKxyShGu46fqokIoDTf","_score":1...
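
To check a dump like this programmatically, a minimal Python sketch along the following lines could separate the HTTP headers from the JSON body and report the hit counts. The function name summarize_dump and the small sample response are hypothetical; the sketch assumes the response shape shown above, where hits.total is a plain integer.

```python
import json


def summarize_dump(text):
    """Summarize a response saved by `curl -i` (headers, blank line, JSON body)."""
    # HTTP uses CRLF line endings; normalize so the blank line is easy to find.
    normalized = text.replace("\r\n", "\n")
    _headers, _, body = normalized.partition("\n\n")
    result = json.loads(body)
    hits = result["hits"]
    return {
        "total": hits["total"],            # documents matching the query
        "returned": len(hits["hits"]),     # documents included in this response
        "indices": sorted({hit["_index"] for hit in hits["hits"]}),
    }


# A small made-up response in the same shape as the example above.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json; charset=UTF-8\r\n"
    "\r\n"
    '{"took":1,"timed_out":false,"hits":{"total":2,"max_score":1.0,"hits":['
    '{"_index":"sample_index","_type":"sample_type","_id":"a","_score":1.0,"_source":{"n":1}},'
    '{"_index":"sample_index","_type":"sample_type","_id":"b","_score":1.0,"_source":{"n":2}}]}}'
)
print(summarize_dump(sample))
```

To use it on the file produced by the curl command, read dump.txt into a string and pass it to the function.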

For more information, see the Elastic Cloud documentation.

FAQ for Elastic Cloud Result Output

Q: I can’t connect to Elastic Cloud

  • The most frequent cause is using the wrong port. Elastic Cloud result output uses TCP/9200 by default, but Elastic Cloud provides a different port for every user.
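
One quick way to rule out a wrong port is to test whether a TCP connection can be opened at all. The following is a minimal sketch using Python's standard socket module; the function name can_connect is hypothetical, and the host and port in the usage comment are placeholders for your own values.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder host and port, replace with your own):
#   can_connect("example.com", 9200)
```

A True result only shows that the port is reachable; it does not verify SSL settings or credentials.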

Q: I sometimes get a Timeout Exception

  • Try increasing Bulk actions and Bulk size. This increases the number of records in each insert request and reduces the number of HTTP requests. If that doesn’t help, consider upgrading your instance specs.
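
The effect on the number of HTTP requests can be estimated with simple arithmetic. The sketch below assumes that Bulk actions is the maximum number of records sent per request and ignores the Bulk size limit; the function name request_count and the values 1000 and 5000 are hypothetical, while 100024 is the hit total from the example response above.

```python
def request_count(total_records, bulk_actions):
    """Estimate how many bulk requests are needed to send total_records."""
    if bulk_actions < 1:
        raise ValueError("bulk_actions must be at least 1")
    # Ceiling division: a final, partially filled request still counts.
    return -(-total_records // bulk_actions)


print(request_count(100024, 1000))  # 101
print(request_count(100024, 5000))  # 21
```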

Q: Do you support Elastic Cloud authentication?

  • Yes. Elastic Cloud Result Output supports “Security” (formerly “Shield”) for Elastic Cloud. LDAP and other authentication methods are not supported.

Last modified: Jan 17 2018 19:25:24 UTC