Learn the cause and how to resolve the following error message.
Hive processing will at times move data from S3 into local storage or HDFS storage on the Hadoop cluster nodes. Hive jobs fail when disk space is exhausted, either because one of the Hadoop nodes runs out of local disk space or because a job reaches its overall storage limit on HDFS.
In the past, under these conditions, a Treasure Data operator would be alerted to manually kill the job, and support would manually notify the customer and follow up. We are changing our handling of these situations to be more predictable, more automated, and more consistent with other job failures.
Under the new behavior, such jobs fail, and the job Output Log contains a diagnostic message describing the failure.
If the disk space is full on one Hadoop worker, the output log contains:
If the query exceeds the limit of HDFS storage, the output log contains:
To reduce disk usage by Hadoop jobs, limit the amount of input data, such as by applying TD_TIME_RANGE to restrict the time period of any subqueries that scan data, or consider applying more restrictive conditions to JOINs. See Performance Tuning for Hive for more suggestions.
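As an illustration (the table and column names below are hypothetical, not from the original article), a TD_TIME_RANGE predicate inside the subquery limits how much data the job scans and stages on cluster storage:

```sql
-- Restrict the subquery to one week of data before joining,
-- instead of scanning the full events table.
SELECT a.user_id, COUNT(*) AS event_count
FROM (
  SELECT user_id
  FROM events
  WHERE TD_TIME_RANGE(time, '2023-01-01', '2023-01-08', 'UTC')
) a
JOIN users b ON a.user_id = b.user_id
GROUP BY a.user_id
```

Narrowing the time range in the innermost subquery matters most, because that is where the bulk of the data is read and spilled to disk.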