Learn the cause of the following error messages and how to resolve them.

Code Block
Diagnostic Messages for this Task:
Error: Task exceeded the limits: 
    org.apache.hadoop.mapred.Task$TaskReporter$TaskLimitException: 
    too much data in local scratch dir=/mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/1/appcache/application_1522879910596_701143. 
    current size is 322234851654 the limit is 322122547200
Error: java.lang.RuntimeException: 
   org.apache.hadoop.hive.ql.metadata.HiveException:
   org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: 
   The DiskSpace quota of /mnt/hive/hive-1/0 is exceeded: 
   quota = 8246337208320 B = 7.50 TB but diskspace consumed = 8246728346553 B = 7.50 TB

...

Description

Hive processing will at times move data from S3 into local storage or HDFS storage on the Hadoop cluster nodes. Hive jobs fail when disk space is exhausted on the Hadoop cluster nodes, either because one of the Hadoop nodes runs out of local space or because a job reaches its overall storage quota on HDFS.

...

If the disk space is full on one Hadoop worker, the output log contains:

Code Block
Diagnostic Messages for this Task:
Error: Task exceeded the limits: 
    org.apache.hadoop.mapred.Task$TaskReporter$TaskLimitException: 
    too much data in local scratch dir=/mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/1/appcache/application_1522879910596_701143. 
    current size is 322234851654 the limit is 322122547200

...

If the query exceeds the limit of HDFS storage, the output log contains:

Code Block
Error: java.lang.RuntimeException: 
   org.apache.hadoop.hive.ql.metadata.HiveException:
   org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: 
   The DiskSpace quota of /mnt/hive/hive-1/0 is exceeded: 
   quota = 8246337208320 B = 7.50 TB but diskspace consumed = 8246728346553 B = 7.50 TB

...

Resolution

To reduce disk usage by Hadoop jobs, limit the amount of input data. For example, apply TD_TIME_RANGE to restrict the time period of any subqueries that scan data, or apply more restrictive conditions to JOINs. See Performance Tuning for Hive for more suggestions.
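
As a sketch of this approach, the query below filters both sides of a join with TD_TIME_RANGE before joining, so each stage scans a bounded time range instead of the full tables. The table and column names (access_logs, purchases, time, user_id) and the date range are illustrative, not from the original article:

```sql
-- Hypothetical example: restrict each subquery's scan to one day of data
-- so intermediate results stay within local scratch space and the HDFS quota.
SELECT a.user_id, COUNT(1) AS purchase_count
FROM (
  SELECT user_id
  FROM access_logs
  WHERE TD_TIME_RANGE(time, '2018-04-01', '2018-04-02', 'JST')  -- bounded scan
) a
JOIN (
  SELECT user_id
  FROM purchases
  WHERE TD_TIME_RANGE(time, '2018-04-01', '2018-04-02', 'JST')  -- bounded scan
) p
ON a.user_id = p.user_id  -- restrictive equi-join on pre-filtered inputs
GROUP BY a.user_id
```

Filtering inside each subquery, rather than after the join, reduces the intermediate data that Hive spills to the local scratch directories and HDFS during the shuffle and join stages.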