Deleting rows from a Table

This article explains how to delete specific rows from a Treasure Data table.

Table of Contents

Prerequistes

  • Basic knowledge of Treasure Data, including the toolbelt
  • Understanding of Presto or Hive

Overview

Currently, it’s not possible to perform row level deletes in a Treasure Data table. Please consider this article as a work-around if it’s required to perform row level deletions.

If you would like to delete rows imported to table within a time range(based on time column), consider using partial delete feature.

Listed below are two methods:

Using Presto/Hive: Efficient for small tables as it will reload full table

Alternate Method: Efficient for huge tables. Here we take superset of rows which would also include rows to be deleted filtered based on time column to process it further.

Using Presto/Hive

Step 1: Create a temporary table with required data from your original table.

Using Presto:

CREATE TABLE tmp AS SELECT * from original_table where col1 is not null 

Using Hive:

Untitled-3
Currently, TD Hive doesn't support CREATE TABLE. Please create table manually using console or td CLI.

Create a table using td command td table:create or using console named tmp

INSERT INTO TABLE tmp select * from original_table where col1 is not null

This query, for example would remove all the rows with col1 as nulls.

Step 2: Use td CLI to swap table names of tmp and original_table

Using td CLI’s, table swap command you can interchange table names. Please see here for more details or type td help table:swap at command prompt.

td table:swap tmp original_table

Step 3: Delete tmp table

After ensuring original_table has required data, you may proceed to delete the tmp table using Console or td CLI

Alternate method using partial delete and CREATE TABLE/INSERT INTO

Step 1: Load data from your original table to a temporary table within a time range.

Using Presto or Hive, load data from your original table to a temporary table within a time range you want to delete specific rows.

Using Presto:

CREATE TABLE tmp AS SELECT * from original_table where td_time_range(time, '2016-06-10 12:00:00', '2016-06-10 2:00:00')  

Using Hive:

Untitled-3
Currently, TD Hive doesn't support CREATE TABLE. Please create table manually using console or td CLI.

Create a table using td command td table:create or using console named tmp

INSERT INTO TABLE tmp select * from original_table where td_time_range(time, '2016-06-10 12:00:00', '2016-06-10 2:00:00')

Step 2: Partial delete

Now, using td CLI partial delete command delete data from your original table in the same timerange.

td table:partial_delete example_db original_table --from '2016-06-10 12:00:00' --to '2016-06-10 2:00:00'

Step 3: Now using Hive or Presto, re-load required data into your original table

Using Presto:

INSERT INTO original_table select * from tmp where col1 is not null

Using Hive:

INSERT INTO TABLE original_table select * from tmp where col1 is not null

After ensuring original_table has required data, you may proceed to delete the tmp table using Console or td CLI


Last modified: Mar 29 2017 03:58:34 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.