# Avro Parser Function

The Avro parser plugin for Treasure Data's integrations parses files containing Avro binary data.

The following Treasure Data integrations support the Avro parser:

* [SFTP (version 1)](/int/sftp-server-import-integration)
* [Amazon S3 v2](/int/amazon-s3-import-integration-v2)
* [Box](/int/box-import-integration)
* [OneDrive](/int/onedrive-import-integration)
* [Microsoft Azure](/int/microsoft-azure-blob-storage-import-integration)
* [FTP](/int/ftp-import-integration-cli)

The Treasure Data Avro parser supports the following compression codecs:

* Deflate
* Snappy

The parser is available for use from either the Treasure Data Console or the Treasure Data CLI.

## Using the Avro Parser Function from the Treasure Data Console

To use the Avro parser function in the Treasure Data Console, you first need to create an authentication for one of the supported integrations. You can then use that authentication to create a source. While creating the source, you have the opportunity to adjust or preview the data in the Avro file.

In step 3 of the Create Source interface, Data Settings, the TD Console should automatically select the Avro parser. If it does not, you can select it manually from the **Parser > Type** drop-down menu.

![](/assets/avrodatasettingsparserselect.743711139ccddef7220961a76e18208cf96ec42b79d0edf6b3deebc878c5170a.ef656343.png)

You also have the option to use the delete icon to prevent specific rows from being imported.

![](/assets/avrodatasettingsrowdeleteoption.6bc13a70332f17823e6358171b20ce3be3b06468eca93a021e95c02beb295bee.ef656343.png)

In step 5 of the Create Source interface, Data Preview, the TD Console displays a preview of the data.

![](/assets/avrodatapreview.88dc4d96af05add664219cff4c3fcea81767d32205df4f397d724e970fdb8d40.ef656343.png)

## Using the Avro Parser Function from the Treasure Data CLI

Using the Treasure Data CLI, you can import Avro data from the command line with the `td connector` command. This command takes a configuration file in YAML format as input.

For example, using the sample file [users.avro](/assets/users.a3d856bb5714a03c57f554c9c23db539fa6dd108ed36efb8748bd6dab90a8405.08e53790.avro), you could create a file named `importAvroFTP.yml` to import the sample file from an FTP site. The configuration file might look something like this:

```yaml
in:
  type: ftp
  host: 10.100.100.10
  port: 21
  user: user1
  password: "password123"
  path_prefix: /misc/users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out:
  mode: append
```

**Manually Defining the Columns for the Avro Import**

The **`columns`** section of the file is where you define the schema of the Avro file by specifying key-value pairs for `name`, `type`, and (in the case of a timestamp) `format`.

**Key Value Pairs for Column Array**

| Column | Description |
| --- | --- |
| `name` | Name of the column. |
| `type` | Type of the column: `boolean` (true or false), `long` (64-bit signed integers), `double` (64-bit floating-point numbers), `string`, `timestamp` (date and time with nanosecond precision), or `json`. |
| `format` | Only valid when the column type is `timestamp`. Specifies the time format of the column, for example `'%Y-%m-%d %H:%M:%S.%N'`. |
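If you are not sure which fields a file contains, one way to check before writing the `columns` array is to print the schema that is embedded in the Avro file itself. The sketch below uses the Apache Avro `avro-tools` utility, which is not part of Treasure Data and is shown here only as one possible way to inspect a file; the jar file name and version are illustrative.

```bash
# Print the writer schema embedded in users.avro so that each field can be
# mapped to a name/type pair in the columns array.
# avro-tools is the Apache Avro command-line utility (jar name is illustrative).
java -jar avro-tools-1.11.1.jar getschema users.avro
```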
Here are examples of how columns can be defined:

```yaml
- {name: first_name, type: string}
- {name: favorite_number, type: long}
- {name: last_access, type: timestamp, format: '%Y-%m-%d %H:%M:%S.%N'}
```

After you have set up your configuration file, use the `td connector` command to perform the import. Here is an example of how that might look. The following example assumes that the [users.avro](/assets/users.a3d856bb5714a03c57f554c9c23db539fa6dd108ed36efb8748bd6dab90a8405.08e53790.avro) file is available on your FTP site. Additionally, for the manual import process to work, a `time` column is added to the `importAvroFTP.yml` file.

```yaml
# importAvroFTP.yml
in:
  type: ftp
  host: 10.100.100.10
  port: 21
  user: user1
  password: "password123"
  path_prefix: /misc/users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
    - {name: time, type: timestamp}
out:
  mode: append
```

```bash
td connector:preview importAvroFTP.yml
```

```bash
td connector:issue importAvroFTP.yml \
  --database wt_avro_db \
  --table wt_avro_table \
  --auto-create-table
```

To see the contents of the table, you can use a `td query` command. Because the output of the query command is verbose, the response is not shown in the following example:

```bash
td query -d wt_avro_db -T presto -w 'select * from wt_avro_table'
```

**Allowing the Parser to Guess the Columns for the Avro Import**

You also have the option to let the parser make a best guess at the columns and data types used in any particular file. The parser makes its guesses based on the conversion table shown here.

**Default Type Conversions from Avro to TD**

| **Avro Type** | **TD Data Type** |
| --- | --- |
| String | String |
| Bytes | String |
| Fixed | String |
| Enum | String |
| Null | String |
| Int | Long |
| Long | Long |
| Float | Double |
| Double | Double |
| Boolean | Boolean |
| Map | JSON |
| Array | JSON |
| Record | JSON |

## Example Configurations for Importing Avro Data from S3

```yaml
# importAvroS3.yml
in:
  type: s3
  access_key_id:
  secret_access_key:
  bucket:
  path_prefix: users.avro
  parser:
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out: {mode: append}
exec: {}
```

**Example of Running td connector:guess on an Avro File Hosted on S3**

```bash
$ td connector:guess config.yml -o load.yml
```

```yaml
---
in:
  type: s3
  access_key_id:
  secret_access_key:
  bucket:
  path_prefix: users.avro
  parser:
    charset: UTF-8
    newline: CR
    type: avro
    columns:
    - {name: name, type: string}
    - {name: favorite_number, type: long}
    - {name: favorite_color, type: string}
out: {}
exec: {}
filters:
- from_value: {mode: upload_time}
  to_column: {name: time}
  type: add_time
```
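From here, loading the guessed configuration follows the same pattern as the FTP example earlier in this article. A minimal sketch, assuming the generated file is named `load.yml` and reusing the placeholder database and table names from above:

```bash
# Preview the rows that would be imported using the guessed configuration.
td connector:preview load.yml

# Run the import, creating the destination table if it does not exist yet.
# wt_avro_db and wt_avro_table are placeholder names.
td connector:issue load.yml \
  --database wt_avro_db \
  --table wt_avro_table \
  --auto-create-table
```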