Visit our new documentation site! This documentation page is no longer updated.

Deleting rows from a Table

This article explains how to delete specific rows from a Treasure Data table.

Table of Contents

Prerequistes

  • Basic knowledge of Treasure Data, including the toolbelt
  • Understanding of Presto or Hive

Overview

Currently, it’s not possible to perform row level deletes in a Treasure Data table. Please consider this article as a work-around if it’s required to perform row level deletions.

If you would like to delete rows imported to table within a time range(based on time column), consider using partial delete feature.

Listed below are three methods:

Presto DELETE)

Using Presto/Hive: Efficient for small tables as it will reload full table

Alternate Method: Efficient for huge tables. Here we take superset of rows which would also include rows to be deleted filtered based on time column to process it further.

Using Presto DELETE Statements

Presto Delete allows you to issues DELETE statement queries against any table in Treasure Data, with the following syntax:

DELETE FROM table_name [ WHERE condition ]

Please read the full DELETE documentation for more details & limitations.

Using Presto/Hive

Step 1: Create a temporary table with required data from your original table.

Using Presto:

CREATE TABLE tmp AS SELECT * from original_table where col1 is not null 

Using Hive:

Untitled-3
Currently, TD Hive doesn't support CREATE TABLE. Please create table manually using console or td CLI.

Create a table using td command td table:create or using console named tmp

INSERT INTO TABLE tmp select * from original_table where col1 is not null

This query, for example would remove all the rows with col1 as nulls.

Step 2: Use td CLI to swap table names of tmp and original_table

Using td CLI’s, table swap command you can interchange table names. Please see here for more details or type td help table:swap at command prompt.

td table:swap tmp original_table

Step 3: Delete tmp table

After ensuring original_table has required data, you may proceed to delete the tmp table using Console or td CLI

Alternate method using partial delete and CREATE TABLE/INSERT INTO

Step 1: Load data from your original table to a temporary table within a time range.

Using Presto or Hive, load data from your original table to a temporary table within a time range you want to delete specific rows.

Using Presto:

CREATE TABLE tmp AS SELECT * from original_table where td_time_range(time, '2016-06-10 12:00:00', '2016-06-10 14:00:00')  

Using Hive:

Untitled-3
Currently, TD Hive doesn't support CREATE TABLE. Please create table manually using console or td CLI.

Create a table using td command td table:create or using console named tmp

INSERT INTO TABLE tmp select * from original_table where td_time_range(time, '2016-06-10 12:00:00', '2016-06-10 14:00:00')

Step 2: Partial delete

Now, using td CLI partial delete command delete data from your original table in the same timerange.

td table:partial_delete example_db original_table --from 1465560000 --to 1465567200

Step 3: Now using Hive or Presto, re-load required data into your original table

Using Presto:

INSERT INTO original_table select * from tmp where col1 is not null

Using Hive:

INSERT INTO TABLE original_table select * from tmp where col1 is not null

After ensuring original_table has required data, you may proceed to delete the tmp table using Console or td CLI


Last modified: Jun 30 2017 07:16:04 UTC

If this article is incorrect or outdated, or omits critical information, let us know. For all other issues, access our support channels.