Object storage file formats
Object storage connectors support one or more file formats specified by the underlying data source.
ORC format configuration properties
The following properties configure how supported object storage connectors read and write ORC files:
| Property Name | Description | Default |
|---|---|---|
| | Sets the default time zone for legacy ORC files that did not declare a time zone. | JVM default |
| | Access ORC columns by name. By default, columns in ORC files are accessed by their ordinal position in the Hive table definition. The equivalent catalog session property is | |
| | Enable bloom filters for predicate pushdown. | |
| | Allow reads on ORC files with short zone ID in the stripe footer. | |
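
As a rough illustration, the ORC settings described above might appear in a catalog properties file as in the fragment below. The property names cannot be recovered from the table itself; the ones shown are assumed from Trino's Hive connector and should be checked against the reference for the connector in use. The values are examples only, not recommendations.

```properties
# Fragment of a hypothetical catalog file; other required settings are omitted.
connector.name=hive

# Assumed name: time zone applied to legacy ORC files that declare none.
hive.orc.time-zone=UTC

# Assumed name: match ORC columns by name instead of by ordinal position.
hive.orc.use-column-names=true

# Assumed name: use ORC bloom filters for predicate pushdown.
hive.orc.bloom-filters.enabled=true
```

Accessing columns by name is mainly useful when the column order in the files can differ from the order in the Hive table definition.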
Parquet format configuration properties
The following properties configure how supported object storage connectors read and write Parquet files:
| Property Name | Description | Default |
|---|---|---|
| | Adjusts timestamp values to a specific time zone. For Hive 3.1+, set this to UTC. | JVM default |
| | Access Parquet columns by name by default. Set this property to | |
| | Percentage of Parquet files to validate after write by re-reading the whole file. The equivalent catalog session property is | |
| | Maximum size of pages written by the Parquet writer. | |
| | Maximum number of values per page written by the Parquet writer. | |
| | Maximum size of row groups written by the Parquet writer. | |
| | Maximum number of rows processed by the Parquet writer in a batch. | |
| | Whether bloom filters are used for predicate pushdown when reading Parquet files. Set this property to | |
| | Skip reading Parquet pages by using Parquet column indices. The equivalent catalog session property is | |
| | Ignore statistics from Parquet to allow querying files with corrupted or incorrect statistics. The equivalent catalog session property is | |
| | Sets the maximum number of rows read in a batch. The equivalent catalog session property is named | |
| | Data size below which a Parquet file is read entirely. The equivalent catalog session property is named | |
| | Enable use of the Java Vector API (SIMD) for faster decoding of Parquet files. The equivalent catalog session property is | |
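
A comparable sketch for the Parquet settings follows. As before, the property names are an assumption taken from Trino's Hive connector rather than from the table above, and the values are illustrative only.

```properties
# Fragment of a hypothetical catalog file; other required settings are omitted.

# Assumed name: adjust timestamps to UTC, as advised above for Hive 3.1+.
hive.parquet.time-zone=UTC

# Assumed names: upper bounds on page and row group size for the writer.
parquet.writer.page-size=1MB
parquet.writer.block-size=128MB

# Assumed name: percentage of written files to re-read for validation.
parquet.writer.validation-percentage=5

# Assumed name: ignore Parquet statistics when files carry corrupted ones.
parquet.ignore-statistics=true
```

Ignoring statistics trades some read performance for the ability to query files whose statistics are known to be unreliable, so it is probably best enabled only when such files are present.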