Avro Parser Function
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code

The Avro parser plugin for Treasure Data's integrations parses files containing Avro binary data. The following Treasure Data integrations support the Avro parser:

The Treasure Data Avro parser supports the following compression codecs:

Deflate
Snappy

The parser is available for use from either the Treasure Data Console or the Treasure Data CLI.

Using the Avro Parser Function from the Treasure Data Console

To use the Avro parser function in the Treasure Data console you will need to create an authentication for one of the supported integrations. Afterwards, you can use that authentication to create a source. In the process of creating the source, you will have the opportunity to either adjust or preview the data in the Avro file.

In step 3 of the Create Source interface, Data Settings, the TD Console should automatically select the Avro parser. If it does not, you can manually select it from the Parser > Type drop-down menu.

You also have the option to use the delete icon to prevent specific rows from being imported.

In step 5 of the Create Source interface, Data Preview, the TD Console displays a preview of the data.

Using the Avro Parser Function from the Treasure Data CLI

Using the Treasure Data CLI , you can import Avro data from the command line using the td connector command. This command takes a configuration file in YAML format as input. For example, using the sample file, users.avro, you could create a file named importAvroFTP.yml to import the sample file from an FTP site. The configuration file might look something like this:

in:
  type: ftp
  host: 10.100.100.10
  port: 21
  user: user1
  password: "password123"
  path_prefix: /misc/users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out:
  mode: append

Manually Defining the Columns for the Avro Import

The columns section of the file is where you can define the schema of the Avro file by specifying key value pairs for name, type, and (in the case of a timestamp) format.

Key Value Pairs for Column Array

Column	Description
`name`	Name of the column.
`type`	Type of the column: `boolean` — true or false `long` — 64-bit signed integers `double` — 64-bit floating-point numbers `string` `timestamp` — date and time with nanosecond precision `json`
`format`	Only valid when the column type is `timestamp`.

Here are examples of how columns can be defined:

- {name: first_name, type: string}
- {name: favorite_number, type: long}
- {name: last_access, type: timestamp, format: '%Y-%m-%d %H:%M:%S.%N'}

After you have set up your configuration file, use the td connector command to perform the import. Here is an example of how that might look.

The following example assumes that the users.avro file is available on your FTP site. Additionally, for the manual import process to work, a time column is added to the importAvroFTP.yml file.

importAvroFTP.yml

in:
  type: ftp
  host: 10.100.100.10
  port: 21
  user: user1
  password: "password123"
  path_prefix: /misc/users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
    - {name: time, type: timestamp}
out:
  mode: append

td connector:preview importAvroFTP.yml

td connector:issue importAvroFTP.yml \
--database wt_avro_db \
--table wt_avro_table --auto-create-table

To see the contents of the table, you can use a TD Query command. Because the output of the query command is verbose, much of the response has been removed from the following example:

td query -d wt_avro_db -T presto -w 'select * from wt_avro_table'

Allowing the Parser to Guess the Columns for the Avro Import

You also have the option to let the parser make a best guess at the columns and data types being used in any particular file. The parser makes it "guesses" based on the conversion table shown here.

Default Type Conversions from Avro to TD

Avro Type	TD Data type
String	String
Bytes	String
Fixed	String
Enum	String
Null	String
Int	Long
Long	Long
Float	Double
Double	Double
Boolean	Boolean
Map	JSON
Array	JSON
Record	JSON

Example Configurations for Importing Avro Data from S3

importAvroS3.yml

in:
  type: s3
  access_key_id: <access key>
  secret_access_key: <secret access key>
  bucket: <bucket name>
  path_prefix: users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out: {mode: append}
exec: {}

Example of Running td connector:guess on an Avro File Hosted on S3

$ td connector:guess config.yml -o load.yml

---
in:
  type: s3
  access_key_id: <access key>
  secret_access_key: <secret access key>
  bucket: <bucket name>
  path_prefix: users.avro
  parser:
    charset: UTF-8
    newline: CR
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out: {}
exec: {}
filters:
- from_value: {mode: upload_time}
  to_column: {name: time}
  type: add_time

Avro Parser FunctionCopyCopy for LLMCopy page as Markdown for LLMsView as MarkdownOpen this page as MarkdownOpen in ChatGPTGet insights from ChatGPTOpen in ClaudeGet insights from ClaudeConnect to CursorInstall MCP server on CursorConnect to VS CodeInstall MCP server on VS Code

Using the Avro Parser Function from the Treasure Data Console

Using the Avro Parser Function from the Treasure Data CLI

Example Configurations for Importing Avro Data from S3

Was this helpful?

Avro Parser Function
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code