Skip to content
Last updated

DELETE Statement Syntax

A DELETE statement is used to delete rows from a table.

DELETE statements are helpful for deleting unnecessary rows like bad data, duplicated data, outdated data, etc.

DELETE statement is supported only by Trino query engine.

Syntax

DELETE FROM <table_name> WHERE <condition>

Where:

  • <table_name> is the name of the table for deletion
  • <condition> is the clause to specify a search condition

DELETE statement without WHERE clause isn't allowed to reduce the occurrence of unintentional data deletion.

Examples

Delete all line items shipped by air:

DELETE FROM lineitem WHERE shipmode = 'AIR'

Delete all event logs from a July 1st, 2017 UTC:

DELETE FROM lineitem WHERE TD_TIME_RANGE(time, '2017-07-01','2017-07-02')

Limitations

No DELETE Without a WHERE Clause

To reduce the occurrence of unintentional data deletion, DELETE statements require you to include a WHERE clause.

Error sample:

Does not support DELETE statement without WHERE condition

DELETE against Data Tank

DELETE statements cannot be issued against data stored in Data Tank. Connect directly to the PostgreSQL Data Tank to issue DELETE statements on data stored there.

Limit number of partitions to delete at once

A single DELETE query can delete up to 500k partitions. A DELETE query fails when it exceeds the limit.

Reduce affected rows of DELETE statement

When a DELETE query fails due to the limit exceeded error or timeout error, you should rewrite the DELETE statement to reduce affected rows. The most effective way to do this is to include a WHERE clause that limits the time range for the DELETE with TD_TIME_RANGE or TD_INTERVAL. For example, if you have a year's worth of game history data, instead of

DELETE FROM game_history WHERE player_id=1000;

try

DELETE FROM game_history
  WHERE player_id=1000
  AND TD_TIME_RANGE(time, '2018-01-01', '2018-02-01','PDT')

and then delete more time ranges until all rows are deleted.

DELETE statement Timeout Error

A DELETE query can take several hours when affected rows of the DELETE statement are very large. The query may time out and the job fail in this case. If the job fails by a timeout error, you should rewrite the DELETE statement to affect fewer rows. Refer to Reduce affected rows of DELETE statement.

Concurrent DELETE statements on the same table may conflict

When multiple DELETE statements are executed concurrently on the same table, the execution time may be longer or the query may fail. It is because of the conflict of underlying partition file deletion. The general guideline is to avoid concurrent DELETE statement execution since the possibility of conflict is unpredictable. The possibility depends on various factors, like the condition of the DELETE statement, the data volume and the data distribution of the table.

When a DELETE statement fails due to conflict, please retry the query.

DELETE Statement Resource Consumption

DELETE queries issue jobs and consume resources. Because of the storage and indexing methods used in Treasure Data, a DELETE query can be resource intensive.

To prevent a DELETE query from performing a full table scan, you should use:

  • a time expression, like TD_TIME_RANGE or TD_INTERVAL, if the table uses default time-based partitioning
  • an equality predicate on all partition keys, if the table uses user-defined partitioning

For example, if your table is partitioned by time, and you are trying to delete a set of users that were created in June 2018, include the time restriction even if that's not a critical parameter for your DELETE statement.

For example, you should:

DELETE FROM logtable WHERE userid IN (1234, 1235, 1236) AND TD_TIME_RANGE(time, '2017-06-01','2017-07-01')

Including the time restriction significantly improves the speed of your query, and reduces the resources this query consumes.

Undo Delete Statements

If you want to rollback an executed delete statement and recover deleted data, you must contact Treasure Data technical support and provide the jobID of the job that contained the executed DELETE statement. In many cases, you can undo a delete statement. However, certain situations prevent the rollback of DELETE statements.

Situations that prevent rollback include:

  • When new partitions have been deleted by another DELETE statement.
  • When original partitions have been discarded, which can happen after a retention period.
  • When a table is dropped and a new table with the same name is created.
  • When new partitions have been further modified or replaced, which can happen as a result of internal storage maintenance processes.