Block size
Choose the file system block size based on the projected workload of the file system and the type of storage that it uses.
General
In a file system, a block is the largest contiguous amount of disk space that can be allocated to a file and also the largest amount of data that can be transferred in a single I/O operation. The block size determines the maximum size of a read request or write request that a file system sends to the I/O device driver. Blocks are composed of an integral number of subblocks, which are the smallest unit of contiguous disk space that can be allocated to a file. Files larger than one block are stored in some number of full blocks plus any subblocks that might be required after the last block to hold the remaining data. Files smaller than one block size are stored in one or more subblocks.
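The allocation rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `allocated_bytes` is hypothetical, the 4 MiB block and 8 KiB subblock are the example sizes used in this topic, and the sketch ignores metadata, replication, and any other allocation details.

```python
KIB = 1024
MIB = 1024 * KIB

def allocated_bytes(file_size, block_size, subblock_size):
    """Disk space taken by file data: full blocks plus trailing subblocks."""
    # Whole blocks first; the remainder is what is left after the last full block.
    full_blocks, remainder = divmod(file_size, block_size)
    # The remainder is rounded up to a whole number of subblocks (ceiling division).
    tail_subblocks = -(-remainder // subblock_size)
    return full_blocks * block_size + tail_subblocks * subblock_size

# A file slightly larger than one block: one full block plus two subblocks.
print(allocated_bytes(4 * MIB + 10 * KIB, 4 * MIB, 8 * KIB) == 4 * MIB + 16 * KIB)
# A 1-byte file still occupies one whole subblock.
print(allocated_bytes(1, 4 * MIB, 8 * KIB) == 8 * KIB)
```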
- The block size, subblock size, and number of subblocks per block of a file system are set when the file system is created and cannot be changed later.
- All the data blocks in a file system have the same block size and the same subblock size. Data blocks and subblocks in the system storage pool and those in user storage pools have the same sizes. An example of a valid block size and subblock size is a 4 MiB block with an 8 KiB subblock.
- All the metadata blocks in a file system have the same block size and the same subblock size.
The metadata blocks and subblocks are set to the same sizes as data blocks and subblocks, unless the --metadata-block-size parameter is specified.
Note: The --metadata-block-size parameter, which specifies a metadata block size different from the data block size, is being deprecated. For file systems with file system format 5.0.0 or later, this option is no longer needed for performance improvements, and it will be removed in a future release.
- If the system storage pool contains only metadataOnly NSDs, the metadata block can be set to a different size than the data block size with the --metadata-block-size parameter.
Note: This setting can change the data subblock size and the number of subblocks in a data block if the block size (-B parameter) differs from the --metadata-block-size value. For an example, see Scenario 3 later in this list.
- The data blocks and metadata blocks must have the same number of subblocks, even when the data block size and the metadata block size are different. See Scenario 3 in the next bullet.
- The number of subblocks per block is derived from the smallest block size of any storage pool in the file system, including the system metadata pool.
Note: For a table of the valid block sizes and subblock sizes, see Table 1 in mmcrfs command.
Consider the following example scenarios:
- Scenario 1: The file system is composed of a single system storage pool with all the NSD usage configured as dataAndMetadata. The file system block size is set with the -B parameter to 16 MiB. As a result, the block size for both metadata and data blocks is 16 MiB. The metadata and data subblock size is 16 KiB.
- Scenario 2: The file system is composed of multiple storage pools with system storage pool NSD usage configured as metadataOnly and user storage pool NSD usage configured as dataOnly. The file system block size is set (-B parameter) to 16 MiB. The --metadata-block-size is also set to 16 MiB. As a result, the metadata and data block size is 16 MiB. The metadata and data subblock size is 16 KiB.
- Scenario 3: The file system is composed of multiple storage pools with the system storage pool NSD usage configured as metadataOnly and the user storage pool NSD usage configured as dataOnly. The file system block size is set (-B parameter) to 16 MiB, which has a subblock size of 16 KiB, but the --metadata-block-size is set to 1 MiB, which has a subblock size of 8 KiB. The number of subblocks per block must be the same across all the pools of a file system, and it is calculated from the storage pool with the smallest block size. In this case, the system pool has the smallest block size (1 MiB). The number of subblocks per block in the system storage pool is 128 (1 MiB block size / 8 KiB subblock size = 128 subblocks per block). The other storage pools inherit the 128-subblocks-per-block setting, and their subblock size is recalculated based on 128 subblocks per block. In this case, the subblock size of the user storage pool is recalculated as 128 KiB (16 MiB / 128 subblocks per block = 128 KiB subblock size).
- The block size cannot exceed the value of the cluster attribute maxblocksize, which can be set by the mmchconfig command.
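The subblock arithmetic in Scenario 3 can be reproduced with a short Python sketch. The function name `effective_subblocks` and the pool names are hypothetical, and the block and subblock sizes passed in are the ones quoted in the scenario. For other block sizes, take the subblock size from Table 1 in mmcrfs command.

```python
KIB = 1024
MIB = 1024 * KIB

def effective_subblocks(pools):
    """pools maps a pool name to (block_size, subblock_size_for_that_block_size).

    Returns the subblocks-per-block count shared by all pools and the
    resulting subblock size of each pool, all in bytes.
    """
    # The pool with the smallest block size fixes the subblocks-per-block count.
    smallest_block, smallest_subblock = min(pools.values(), key=lambda p: p[0])
    subblocks_per_block = smallest_block // smallest_subblock
    # Every pool inherits that count; its subblock size is block size / count.
    sizes = {name: block // subblocks_per_block
             for name, (block, _) in pools.items()}
    return subblocks_per_block, sizes

count, sizes = effective_subblocks({
    "system": (1 * MIB, 8 * KIB),    # metadataOnly pool, --metadata-block-size 1 MiB
    "data": (16 * MIB, 16 * KIB),    # dataOnly pool, -B 16 MiB
})
print(count)                  # 128 subblocks per block
print(sizes["data"] // KIB)   # 128 (KiB) data subblock size
```

In Scenario 2, where both block sizes are 16 MiB, the same function returns 1024 subblocks per block and leaves the 16 KiB subblock size unchanged.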
Test actual performance with different block sizes
The ideal file system block size can be determined by running performance tests with different file system block sizes using actual workloads or representative benchmarks that match the file sizes that you expect to use in production.
Factors that can affect performance
For more performance information, see the IBM Storage Scale white papers in the Techdocs Library (www.ibm.com/support/techdocs/atsmastr.nsf/Web/WhitePapers).
- RAID stripe size
- The RAID stripe size is the size of the sequential block of data that a disk array writes to or reads from each storage volume (the block device corresponding to an NSD). For better performance, set the file system block size either to the RAID stripe size or to a multiple of it. If the block size is neither equal to nor a multiple of the RAID stripe size, file system performance can be severely degraded, especially for write requests, because of the increase in read-modify-write operations in the underlying hardware RAID controllers.
Note: The block size for IBM Storage Scale RAID that is implemented with vdisk is specifically designed for optimal behavior. For IBM Storage Scale RAID, the block size must be equal to the vdisk track size. For more information, see the IBM Storage Scale RAID documentation.
- File system size
- For file systems larger than 100 TiB, it is a good idea to set the block size to at least 256 KiB. The default block size in IBM Storage Scale is 4 MiB. Generally, larger block sizes provide better performance.
- Large block size and page pool
- For block sizes larger than the default size of 4 MiB, it is a good idea to increase the page pool size in proportion to the block size. The reason is that the efficiency of internal optimizations that rely on caching file data in the GPFS page pool depends more on the number of blocks that are cached than on the amount of data that is cached. With a fixed page pool size, a larger block size results in fewer cached blocks.
- Variation in file size
- For a file system that contains files of many different sizes, a larger block size, 4 MiB or greater, delivers better overall performance than a smaller one. It is true that with a larger block size some space is wasted when a small file is written into a large subblock, because the unused space in the subblock cannot hold data from another file unless the block is freed. However, the amount of waste in the general case is likely to be insignificant overall, because the smaller files occupy a smaller percentage of the storage space in the file system than the larger files do (files on the order of GiBs).
- Application I/O patterns
- The effect of block size on file system performance greatly depends on the application I/O pattern:
- A larger block size is often beneficial for large sequential read and write workloads.
- A smaller block size can offer better performance for applications that do small random writes to sparse files or small random writes to large files that are subject to frequent snapshots.
- Metadata performance
- The choice of block size affects the performance of certain metadata operations, in particular, block allocation performance. The IBM Storage Scale block allocation map is stored in blocks, similar to regular files. When the block size is small:
- More blocks are required to store the same amount of data, which results in more work to allocate those blocks.
- One block of allocation map data contains less information.
- Metadata-only system pool
- The --metadata-block-size option on the mmcrfs command allows a different block size to be specified for the system storage pool, provided its usage is set to metadataOnly. Valid values are the same as the ones that are listed for the -B parameter.
Note: Setting the metadata block size to a different value than the data block size can have the effect of changing the data subblock size and the number of subblocks per data block. For more information, see Scenario 3 earlier in this help topic.
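Two of the factors above, RAID stripe alignment and page pool sizing, reduce to simple arithmetic that can be checked before a file system is created. The following Python sketch is one possible reading of that guidance, not an IBM-provided tool: the function names are hypothetical, and scaling the page pool linearly from the 4 MiB default is an assumption about what "in proportion to the block size" means.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
DEFAULT_BLOCK_SIZE = 4 * MIB  # default block size stated in this topic

def is_stripe_aligned(block_size, raid_stripe_size):
    """True when the block size equals or is a multiple of the RAID stripe size."""
    return block_size % raid_stripe_size == 0

def scaled_page_pool(page_pool_size, block_size):
    """Scale a page pool that was sized for the default block size.

    Assumption: linear scaling, so the number of blocks that fit in the
    page pool stays the same as with the default block size.
    """
    if block_size <= DEFAULT_BLOCK_SIZE:
        return page_pool_size
    return page_pool_size * block_size // DEFAULT_BLOCK_SIZE

print(is_stripe_aligned(4 * MIB, 1 * MIB))          # True: 4 MiB is a multiple of 1 MiB
print(is_stripe_aligned(4 * MIB, 3 * MIB))          # False: risk of read-modify-write
print(scaled_page_pool(1 * GIB, 16 * MIB) // GIB)   # 4 (GiB)
```

These checks only narrow the candidate block sizes. Testing actual performance with representative workloads remains the way to choose among them.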