Saturday, December 1, 2012

Hive query shows ERROR "too many counters"

A Hive job can fail with the odd "Too many counters:" exception, like this:

Ended Job = job_xxxxxx with exception 'org.apache.hadoop.mapreduce.counters.LimitExceededException(Too many counters: 201 max=200)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Intercepting System.exit(1)

This happens when many operators are used in a query (Hive Operators). Hive creates 4 counters per operator, up to a maximum of 1000, plus a few additional counters for things like file read/write, partitions and tables. Hence the number of counters required depends on the query.
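For example, a query plan containing 50 operators already needs 50 * 4 = 200 operator counters, which on its own exhausts the limit of 200 shown in the error above; the additional framework counters then push the job over the edge.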

To avoid this exception, set "mapreduce.job.counters.max" in mapred-site.xml to a value above 1000. Hive will fail once it hits the 1000-counter mark, while other MR jobs normally stay well below that limit. A value around 1120 should be a good choice.
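As a minimal sketch, the corresponding mapred-site.xml entry could look like this (the value 1120 follows the recommendation above; depending on the Hadoop version, the setting may need to be visible on both the client and the cluster side):

  <property>
    <!-- raise the per-job counter limit above Hive's possible ~1000+ counters -->
    <name>mapreduce.job.counters.max</name>
    <value>1120</value>
  </property>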

Using "EXPLAIN EXTENDED" and "grep -ri operators | wc -l" print out the used numbers of operators. Use this value to tweak the MR settings carefully.