Tuning S3 performance
You can use the S3a protocol to store data for Db2® Big SQL tables in an object store. With object storage, network performance is critical.
If you plan to use Db2 Big SQL with object storage, try to collocate your cluster and the object store in the same data center to minimize network latency and maximize network throughput.
- When Db2 Big SQL table data is on HDFS, the ORC, Parquet, or text file format is recommended.
- When Db2 Big SQL table data is in an object store, use a compact file format such as ORC or Parquet to minimize network traffic.
Be sure to review the Db2 Big SQL S3a connector settings and tune them for your environment. The following table provides some examples:
S3a setting | Default value | Tuned value | Notes |
---|---|---|---|
fs.s3a.buffer.dir | ${hadoop.tmp.dir}/s3a | For example, suppose that HDFS uses N disks: /data1 to /dataN. Set fs.s3a.buffer.dir to /data1/tmp/s3a,/data2/tmp/s3a,...,/dataN/tmp/s3a. | Spread the temporary files across the available data disks that are used for HDFS. |
fs.s3a.connection.maximum | 15 | 250 | Tune fs.s3a.connection.maximum and fs.s3a.threads.core together. |
fs.s3a.threads.core | 15 | 250 | The value of fs.s3a.threads.max is 256. |
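
As an illustration of the tuned values in the table, the following Python sketch builds the comma-separated fs.s3a.buffer.dir value for a given number of disks and prints the three settings as standard Hadoop `<property>` entries. The helper names (`s3a_buffer_dirs`, `render_properties`), the disk count of 4, and the /dataN mount-point naming are assumptions made for this example. Where these properties are set (for example, core-site.xml or a cluster manager's configuration page) depends on your installation.

```python
from xml.sax.saxutils import escape


def s3a_buffer_dirs(num_disks, prefix="/data", suffix="/tmp/s3a"):
    """Return a comma-separated list such as /data1/tmp/s3a,/data2/tmp/s3a.

    Assumes the HDFS data disks are mounted at /data1 .. /dataN, as in the
    table's example; adjust prefix and suffix for your own layout.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return ",".join(f"{prefix}{i}{suffix}" for i in range(1, num_disks + 1))


def render_properties(settings):
    """Render a dict of settings as Hadoop-style <property> XML entries."""
    lines = []
    for name, value in settings.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(str(value))}</value>")
        lines.append("</property>")
    return "\n".join(lines)


# Example values taken from the table; the disk count (4) is hypothetical.
tuned = {
    "fs.s3a.buffer.dir": s3a_buffer_dirs(4),
    "fs.s3a.connection.maximum": 250,
    "fs.s3a.threads.core": 250,
}

print(render_properties(tuned))
```

Generating the buffer directory list from the disk count keeps the value consistent when disks are added or removed. Treat 250 as a starting point taken from the table, not a value that suits every environment.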