Thursday, September 29, 2016

Hadoop Format Speed Tests: Parquet and ORC, With and Without Compression

For Hadoop/HDFS, which format is faster?

ORC vs RCFile


According to a post on the Hortonworks site, ORC files are vastly superior in both compression and performance to plain text Hive tables and RCFile tables. For compression, ORC files are listed as 78% smaller than plain text files. For performance, ORC files support predicate pushdown and improved indexing, which can result in a 44x improvement. For Hive, then, ORC files are likely to gain in popularity. (You can read the post here: ORC File in HDP 2: Better Compression, Better Performance.)
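To give a feel for why predicate pushdown helps, here is a toy sketch in Python. It is not the ORC reader or any Hive API; the `Stripe` class and `scan_with_pushdown` function are hypothetical names made up for this illustration. The idea it shows is that ORC keeps min/max statistics for chunks of data, so a reader can skip chunks whose value range cannot match the filter.

```python
# Toy illustration only: NOT the ORC reader. All names are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """A chunk of column values plus the min/max statistics stored with it."""
    min_value: int
    max_value: int
    values: List[int]


def scan_with_pushdown(stripes: List[Stripe], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # Skip the stripe when its value range cannot overlap [low, high].
        if stripe.max_value < low or stripe.min_value > high:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if low <= v <= high)
    return matches, stripes_read


stripes = [
    Stripe(1, 9, [1, 5, 9]),
    Stripe(20, 30, [20, 25, 30]),
    Stripe(41, 50, [41, 45, 50]),
]
matches, stripes_read = scan_with_pushdown(stripes, 22, 28)
print(matches, stripes_read)
```

In this made-up data only the middle stripe is read; the other two are skipped based on their statistics alone, which is where the savings come from on large tables.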


Parquet vs ORC


On Stack Overflow, contributor Rahul posted an extensive list of results from tests he ran comparing ORC and Parquet with different compression settings. You can find the full results here: http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy

Below are the results that were posted by Rahul:

       
Table A - Text File Format - 2.5 GB

Table B - ORC - 652 MB

Table C - ORC with Snappy - 802 MB

Table D - Parquet - 1.9 GB

Parquet was worst as far as compression for my table is concerned.
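To put those sizes on a common footing, the short Python sketch below works out how much smaller each format is than the text table. The sizes are Rahul's; the conversion of 1 GB to 1024 MB is my assumption (using 1000 shifts the results by less than a percentage point), and the helper name `percent_smaller` is just for illustration.

```python
# Table sizes from the quoted results, converted to MB.
# Assumption: 1 GB = 1024 MB (the original post does not say which unit it means).
sizes_mb = {
    "Text": 2.5 * 1024,
    "ORC": 652,
    "ORC with Snappy": 802,
    "Parquet": 1.9 * 1024,
}


def percent_smaller(size_mb: float, baseline_mb: float) -> float:
    """Percentage by which size_mb is smaller than baseline_mb."""
    return (1 - size_mb / baseline_mb) * 100


baseline = sizes_mb["Text"]
reductions = {
    name: round(percent_smaller(size, baseline), 1)
    for name, size in sizes_mb.items()
    if name != "Text"
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% smaller than text")
```

Under that assumption, plain ORC comes out roughly three quarters smaller than the text table, while Parquet is only about a quarter smaller.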

My tests with the above tables yielded the following results.

Row count operation

Text Format Cumulative CPU - 123.33 sec

Parquet Format Cumulative CPU - 204.92 sec

ORC Format Cumulative CPU - 119.99 sec

ORC with SNAPPY Cumulative CPU - 107.05 sec

Sum of a column operation

Text Format Cumulative CPU - 127.85 sec

Parquet Format Cumulative CPU - 255.2 sec

ORC Format Cumulative CPU - 120.48 sec

ORC with SNAPPY Cumulative CPU - 98.27 sec

Average of a column operation

Text Format Cumulative CPU - 128.79 sec

Parquet Format Cumulative CPU - 211.73 sec

ORC Format Cumulative CPU - 165.5 sec

ORC with SNAPPY Cumulative CPU - 135.45 sec

Selecting 4 columns from a given range using where clause

Text Format Cumulative CPU - 72.48 sec

Parquet Format Cumulative CPU - 136.4 sec

ORC Format Cumulative CPU - 96.63 sec

ORC with SNAPPY Cumulative CPU - 82.05 sec 
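The timings are easier to compare when each one is expressed relative to the text format. Here is a small Python sketch that does this with the numbers quoted above; the dictionary layout, the shortened operation labels, and the function name `relative_to_text` are my own choices for illustration, not part of Rahul's post.

```python
# Cumulative CPU seconds from the quoted results, keyed by operation and format.
cpu_seconds = {
    "Row count": {"Text": 123.33, "Parquet": 204.92, "ORC": 119.99, "ORC with Snappy": 107.05},
    "Sum": {"Text": 127.85, "Parquet": 255.2, "ORC": 120.48, "ORC with Snappy": 98.27},
    "Average": {"Text": 128.79, "Parquet": 211.73, "ORC": 165.5, "ORC with Snappy": 135.45},
    "Select 4 columns": {"Text": 72.48, "Parquet": 136.4, "ORC": 96.63, "ORC with Snappy": 82.05},
}


def relative_to_text(timings: dict) -> dict:
    """CPU time of each format as a multiple of the text format's CPU time."""
    return {fmt: round(sec / timings["Text"], 2) for fmt, sec in timings.items()}


relative = {op: relative_to_text(t) for op, t in cpu_seconds.items()}
fastest = {op: min(t, key=t.get) for op, t in cpu_seconds.items()}
slowest = {op: max(t, key=t.get) for op, t in cpu_seconds.items()}

for op in cpu_seconds:
    print(f"{op}: fastest = {fastest[op]}, slowest = {slowest[op]}, relative = {relative[op]}")
```

On these numbers, ORC with Snappy uses the least CPU for the row count and the sum, plain text uses the least for the average and the filtered select, and Parquet uses the most in all four tests. Keep in mind these are cumulative CPU figures from one table on one cluster, not wall-clock times.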

       
 
