PARQUET_FILE_SIZE Query Option
Specifies the maximum size of each Parquet data file produced by Impala INSERT statements.
Syntax:
Specify the size in bytes, or with a trailing m or g character to indicate megabytes or gigabytes. For example:
-- 128 megabytes.
set PARQUET_FILE_SIZE=134217728;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 512 megabytes.
set PARQUET_FILE_SIZE=512m;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 1 gigabyte.
set PARQUET_FILE_SIZE=1g;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;
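To make the unit suffixes concrete, the following is a minimal Python sketch of how a size value maps to a byte count. The helper name parquet_file_size_bytes is hypothetical and is not part of Impala. It handles only the forms described here (plain bytes, trailing m, trailing g) and assumes binary units, which matches the 134217728 = 128 megabytes example.

```python
# Hypothetical helper, not part of Impala: converts a PARQUET_FILE_SIZE
# value (plain bytes, or a number with a trailing m or g) to bytes.
# Assumption: binary units, consistent with 134217728 bytes = 128 megabytes.
UNIT_MULTIPLIERS = {"m": 1024 ** 2, "g": 1024 ** 3}


def parquet_file_size_bytes(spec: str) -> int:
    spec = spec.strip()
    if not spec:
        raise ValueError("empty size specifier")
    suffix = spec[-1]
    if suffix in UNIT_MULTIPLIERS:
        # Numeric part followed by a unit character.
        return int(spec[:-1]) * UNIT_MULTIPLIERS[suffix]
    # No recognized suffix: treat the whole value as a byte count.
    return int(spec)


print(parquet_file_size_bytes("134217728"))  # 134217728
print(parquet_file_size_bytes("512m"))       # 536870912
print(parquet_file_size_bytes("1g"))         # 1073741824
```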
Usage notes:
With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE value, resulting in more HDFS blocks that can be processed by different nodes.
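As a rough illustration of the trade-off, the sketch below estimates how many data files a given volume of Parquet data would be split into at different file sizes. The function estimate_file_count is a hypothetical name, and the calculation is only a lower-bound approximation: the actual number of files also depends on how many nodes write data, on partitioning, and on encoding and compression.

```python
# Hypothetical estimate, not Impala's actual file-splitting logic:
# the minimum number of files needed to hold total_bytes when each
# file holds at most file_size_bytes.
def estimate_file_count(total_bytes: int, file_size_bytes: int) -> int:
    if total_bytes < 0 or file_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and file size positive")
    # Ceiling division without floating point.
    return -(-total_bytes // file_size_bytes)


MB = 1024 ** 2
GB = 1024 ** 3

# 1 GB of Parquet data at the 256 MB default target versus a 64 MB setting.
print(estimate_file_count(1 * GB, 256 * MB))  # 4
print(estimate_file_count(1 * GB, 64 * MB))   # 16
```

With the smaller setting, the same data is spread over more files, giving more units of work that can be distributed across nodes.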
Type: numeric, with optional unit specifier
Currently, the maximum value for this setting is 1 gigabyte (1g). Setting a value higher than 1 gigabyte could result in errors during an INSERT operation.
Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
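The default and the maximum can be combined into a simple client-side check before issuing the SET statement. This is a sketch under stated assumptions, not Impala's own validation: the function name effective_target_size is hypothetical, and it only encodes the two rules given here (0 means a 256 MB target, and values above 1 gigabyte should be avoided).

```python
# Hypothetical pre-check, not part of Impala. Encodes two documented rules:
#   * 0 selects the default target size of 256 MB
#   * values above 1 gigabyte could cause INSERT errors, so reject them
DEFAULT_TARGET_BYTES = 256 * 1024 ** 2
MAX_SETTING_BYTES = 1024 ** 3


def effective_target_size(setting_bytes: int) -> int:
    if setting_bytes < 0:
        raise ValueError("PARQUET_FILE_SIZE cannot be negative")
    if setting_bytes > MAX_SETTING_BYTES:
        raise ValueError("PARQUET_FILE_SIZE above 1g could cause INSERT errors")
    if setting_bytes == 0:
        return DEFAULT_TARGET_BYTES
    return setting_bytes


print(effective_target_size(0))              # 268435456
print(effective_target_size(128 * 1024 ** 2))  # 134217728
```

The returned value is a target only; as noted in the default description, files might be larger for very wide tables.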
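Isilon considerations:
Because EMC Isilon storage devices use a global value for the block size rather than a configurable value for each file, the PARQUET_FILE_SIZE query option has no effect when Impala inserts data into a table or partition residing on Isilon storage. Use the isi command to set the default block size globally on the Isilon device. For example, to set the default block size to 256 MB:
isi hdfs settings modify --default-block-size=256MB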
For information about the Parquet file format, and how the number and size of data files affects query performance, see Using the Parquet File Format with Impala Tables.
©2016 Cloudera, Inc. All rights reserved