query string parser plugin for Data Connector

The query string parser plugin for Data Connector transform query string data as key=value&key2=value2 line to {key: "value", key2: "value2"}. It takes 6 options as explained below. Detail is here.

Option Description
strip_quote If you have quoted lines file such as “foo=FOO&bar=BAR”, should be true for strip their quotes. (bool, default: true)
strip_whitespace Strip whitespace before parsing lines for any indented line parse correctly such as ‘ foo=FOO’. (bool, default: true)
capture Capture valuable text from each line using Regexp. Matched first pattern (a.k.a $1) will be used. See also partial-config.yml (string, default: nil)
charset Character encoding (eg. ISO-8859-1, UTF-8)
newline Newline character (CRLF, LF or CR)
columns Columns (see below)

The columns option declares the list of columns.

Column Description
name Name of the column
type Type of the column (see below).

long: 64-bit signed integers
timestamp: Date and time with nano-seconds precision
double: 64-bit floating point numbers
string: Strings

Table of Contents

Configuration

Set the parser section like an example:

in:
...
  parser:
    strip_quote: true
    strip_whitespace: true
    charset: UTF-8
    newline: CRLF
    type: query_string
    columns:
      - {name: id, type: long}
      - {name: name, type: string}
      - {name: price, type: double}
      - {name: datetime, type: timestamp}
out:
...

Use Case Example: Simple case

Example of Source data

"user_id=42&some_param=ABC&price=100.95&datetime=2014-05-14_10:11:25"
"user_id=43&some_param=EFG&price=200.04&datetime=2015-05-15_11:11:25"
"user_id=44&some_param=XYZ&price=300.34&datetime=2016-05-16_12:11:25"
"user_id=44&some_param=XYZ&price=531.24&datetime=2017-05-17_23:11:25"

Example of load.yml

in:
...
parser:
  strip_quote: true
  strip_whitespace: true
  charset: UTF-8
  newline: CRLF
  type: query_string
  columns:
    - {name: user_id, type: long}
    - {name: some_param, type: string}
    - {name: price, type: double}
    - {name: datetime, type: timestamp}
filters: []
out: {mode: append}
exec: {}

You can show parsed query string data by preview command.

$ td connector:preview load.yml
+--------------+-------------------+--------------+---------------------------+
| user_id:long | some_param:string | price:double | datetime:timestamp        |
+--------------+-------------------+--------------+---------------------------+
| 42           | "ABC"             | 100.95       | "2014-05-14 10:11:25 UTC" |
| 43           | "EFG"             | 200.04       | "2015-05-15 11:11:25 UTC" |
| 44           | "XYZ"             | 300.34       | "2016-05-16 12:11:25 UTC" |
| 44           | "XYZ"             | 531.24       | "2017-05-17 23:11:25 UTC" |
+--------------+-------------------+--------------+---------------------------+

If your data doesn’t have a time column you may add it using add_time filter option. More details at add_time filter plugin

$ td connector:issue load.yml --database <database name> --table <table name>  --time-column timestamp --auto-create-table

Use Case Example: Using “capture” option

Source data is.

site A: "user_id=42&some_param=ABC&price=100.95&datetime=2014-05-14_10:11:25"
site B: "user_id=43&some_param=EFG&price=200.04&datetime=2015-05-15_11:11:25"
site C: "user_id=44&some_param=XYZ&price=300.34&datetime=2016-05-16_12:11:25"
site A: "customer_id=44&some_param=XYZ&price=531.24&datetime=2017-05-17_23:11:25"

If you use following load.yml for it.

in:
...
parser:
  strip_quote: true
  strip_whitespace: true
  capture: 'site A: ("u.*")'
  charset: UTF-8
  newline: CRLF
  type: query_string
  columns:
    - {name: user_id, type: long}
    - {name: some_param, type: string}
    - {name: price, type: double}
    - {name: datetime, type: timestamp}
filters: []
out: {mode: append}
exec: {}

The capture option matches first row only, so result is below.

$ td connector:preview load.yml
+--------------+-------------------+--------------+---------------------------+
| user_id:long | some_param:string | price:double | datetime:timestamp        |
+--------------+-------------------+--------------+---------------------------+
| 42           | "ABC"             | 100.95       | "2014-05-14 10:11:25 UTC" |
+--------------+-------------------+--------------+---------------------------+
  • “site B” and “site C” don’t match “site A: ” part.
  • Second “site A” row doesn’t match “("u.*”)“ part because query string started from c (customer_id).

Last modified: Feb 22 2017 23:40:42 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.