mmcrfs command

Creates a GPFS file system.

Synopsis

mmcrfs Device {"DiskDesc[;DiskDesc...]" | -F StanzaFile}
       [-A {yes | no | automount}] [-B BlockSize] [-D {posix | nfs4}]
       [-E {yes | no}] [-i InodeSize] [-j {cluster | scatter}]
       [-k {posix | nfs4 | all}] [-K {no | whenpossible | always}]
       [-L LogFileSize] [-m DefaultMetadataReplicas]
       [-M MaxMetadataReplicas] [-n NumNodes] [-Q {yes | no}]
       [-p afmAttribute=Value[,afmAttribute=Value...]...]
       [-r DefaultDataReplicas] [-R MaxDataReplicas]
       [-S {yes | no | relatime}] [-T MountPoint] [-t DriveLetter]
       [-v {yes | no}] [-z {yes | no}] [--filesetdf | --nofilesetdf]
       [--flush-on-close | --noflush-on-close]
       [--auto-inode-limit | --noauto-inode-limit]
       [--inode-limit MaxNumInodes[:NumInodesToPreallocate]]
       [--log-replicas LogReplicas] [--metadata-block-size MetadataBlockSize]
       [--mount-priority Priority] [--nfs4-owner-write-acl {yes | no}] 
       [--perfileset-quota | --noperfileset-quota] 
       [--version VersionString] [--write-cache-threshold HAWCThreshold]
       

Availability

Available on all IBM Storage Scale editions.

Description

Use the mmcrfs command to create a GPFS file system. The first parameter must be Device and it must be followed by either DiskDescList or -F StanzaFile. You can mount a maximum of 256 file systems in an IBM Storage Scale cluster at any one time, including remote file systems.

The performance of a file system is affected by the values that you set for block size, replication, and the maximum number of files (number of inodes).

For more information about block size, see the descriptions in this help topic of the -B BlockSize parameter and the --metadata-block-size parameter. The following list includes some general facts from those descriptions:
  • The block size, subblock size, and number of subblocks per block of a file system are set when the file system is created and cannot be changed later.
  • All the data blocks in a file system have the same block size and the same subblock size. The data blocks and subblocks in the system storage pool and those in user storage pools have the same sizes. An example of a valid block size and subblock size is a 4 MiB block with an 8 KiB subblock.
  • All the metadata blocks in a file system have the same block size and the same subblock size. The metadata blocks and subblocks are set to the same sizes as data blocks and subblocks, unless the --metadata-block-size parameter is specified.
  • If the system storage pool contains only metadataOnly NSDs, the metadata block size can be set to a different value than the data block size with the --metadata-block-size parameter.
    Note: This setting can result in a change in the data subblock size and in the number of subblocks in a data block. For an example, see the subsection "Subblocks" in the description of the -B parameter later in this help topic.
  • The data blocks and metadata blocks must have the same number of subblocks, even when the data block size and the metadata block size are different.
  • The number of subblocks per block is derived from the smallest block size of any storage pool in the file system, including the system metadata pool.
  • The block size cannot exceed the value of the cluster attribute maxblocksize, which can be set by the mmchconfig command.
For more information, see Block size.
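
For example, a minimal sketch of checking the cluster-wide maxblocksize value and then creating a file system with a 4 MiB block size might look like the following. The device name fs1, the stanza file /tmp/nsd.stanza, and the mount point are placeholders for this illustration, not values defined in this topic:

# mmlsconfig maxblocksize
# mmcrfs fs1 -F /tmp/nsd.stanza -B 4M -T /gpfs/fs1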

For more information about replication factors, see the descriptions of the -m, -M, -r, and -R parameters in this help topic.

For information about the maximum number of files (number of inodes), see the description of the --inode-limit parameter later in this help topic.

Results

Upon successful completion of the mmcrfs command, these tasks are completed on all the nodes of the cluster:
  • The mount point directory is created.
  • The file system is formatted.
In GPFS v3.4 and earlier, disk information for the mmcrfs command was specified with disk descriptors, which have the following format. The second, third, and sixth fields are reserved:
DiskName:::DiskUsage:FailureGroup::StoragePool:
For compatibility with earlier versions, the mmcrfs command still accepts the traditional disk descriptors, but their use is deprecated.

Parameters

Device
The device name of the file system to be created.

File system names need not be fully qualified. fs0 is as acceptable as /dev/fs0. However, file system names must be unique within a GPFS cluster. Do not specify an existing entry in /dev.

The device name must be the first parameter.

"DiskDesc[;DiskDesc...]"
A descriptor for each disk to be included. Each descriptor is separated by a semicolon (;). The entire list must be enclosed in quotation marks (' or "). The use of disk descriptors is discouraged.

-F StanzaFile
Specifies a file that contains the NSD stanzas and pool stanzas for the disks that are to be added to the file system. NSD stanzas have the following format:
%nsd: 
  nsd=NsdName
  usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
  failureGroup=FailureGroup
  pool=StoragePool
  servers=ServerList
  device=DiskName
  thinDiskType={no | nvme | scsi | auto}

Where:

nsd=NsdName
Specifies the name of an NSD that was previously created by the mmcrnsd command. For a list of available disks, issue the mmlsnsd -F command. This clause is mandatory for the mmcrfs command.

usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
Specifies the type of data that is to be stored on the disk.
Note: For more information about the system storage pool, user storage pools, and the optional metadata system pool, see Internal storage pools.
dataAndMetadata
Indicates that the disk contains both data and metadata. This data type is the default for disks in the system pool.
dataOnly
Indicates that the disk contains data and does not contain metadata. This data type is the default for disks in storage pools other than the system pool.
metadataOnly
Indicates that the disk contains metadata and does not contain data.
descOnly
Indicates that the disk contains no data and no file metadata. IBM Storage Scale uses this type of disk primarily to keep a copy of the file system descriptor. It can also be used as a third failure group in certain disaster recovery configurations. For more information, see Synchronous mirroring with GPFS replication.

failureGroup=FailureGroup
Identifies the failure group to which the disk belongs. A failure group identifier can be a simple integer or a topology vector that consists of up to three comma-separated integers. The default is -1, which indicates that the disk has no point of failure in common with any other disk.

GPFS uses this information during data and metadata placement to ensure that no two replicas of the same block can become unavailable due to a single failure. All disks that are attached to the same NSD server or adapter must be placed in the same failure group.

If the file system is configured with data replication, all storage pools must have two failure groups to maintain proper protection of the data. Similarly, if metadata replication is in effect, the system storage pool must have two failure groups.

Disks that belong to storage pools in which write affinity is enabled can use topology vectors to identify failure domains in a shared-nothing cluster. Disks that belong to traditional storage pools must use simple integers to specify the failure group.

pool=StoragePool
Specifies the storage pool to which the disk is to be assigned. The default is the system storage pool. To specify the system storage pool explicitly, type system:
pool=system
Only the system storage pool can contain metadataOnly, dataAndMetadata, or descOnly disks. Disks in other storage pools must be dataOnly.

servers=ServerList
A comma-separated list of NSD server nodes. This clause is ignored by the mmcrfs command.

device=DiskName
The block device name of the underlying disk device. This clause is ignored by the mmcrfs command.

thinDiskType={no | nvme | scsi | auto}
Specifies the space reclaim disk type:
Note: By default the system pool cannot contain both regular disks and thin provisioned disks. If you want to include both types of disk in the system pool, contact IBM® Service for more information.
no
The disk does not support space reclaim. This value is the default.
nvme
The disk is a TRIM capable NVMe device that supports the mmreclaimspace command.
scsi
The disk is a thin provisioned SCSI disk that supports the mmreclaimspace command.
auto
The type of the disk is either nvme or scsi. IBM Storage Scale tries to detect the actual disk type automatically. To avoid problems, replace auto with the correct disk type, nvme or scsi, as soon as you can.
Note: The space reclaim auto-detection is enhanced in IBM Storage Scale 5.0.5. Use the auto keyword after you upgrade the cluster to IBM Storage Scale 5.0.5 or later.
For more information, see IBM Storage Scale with data reduction storage devices.
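
For example, an NSD stanza for a metadata-only disk in the system pool might look like the following sketch. The NSD name and failure group are illustrative values only, and the servers and device clauses are omitted because the mmcrfs command ignores them:

%nsd:
  nsd=nsd1
  usage=metadataOnly
  failureGroup=1
  pool=system
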
Pool stanzas have the following format:
%pool: 
  pool=StoragePoolName
  blockSize=BlockSize
  usage={dataOnly | metadataOnly | dataAndMetadata}
  layoutMap={scatter | cluster}
  allowWriteAffinity={yes | no}
  writeAffinityDepth={0 | 1 | 2}
  blockGroupFactor=BlockGroupFactor

Where:

pool=StoragePoolName
Is the name of a storage pool.
blockSize=BlockSize
Specifies the block size of the disks in the storage pool.
usage={dataOnly | metadataOnly | dataAndMetadata}
Specifies the type of data to be stored in the storage pool:
dataAndMetadata
Indicates that the disks in the storage pool contain both data and metadata. This is the default for disks in the system pool.
dataOnly
Indicates that the disks contain data and do not contain metadata. This is the default for disks in storage pools other than the system pool.
metadataOnly
Indicates that the disks contain metadata and do not contain data.
layoutMap={scatter | cluster}
Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round-robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly.

The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system's free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks.

The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks.

The block allocation map type cannot be changed after the storage pool has been created.

allowWriteAffinity={yes | no}
Indicates whether the File Placement Optimizer (FPO) feature is to be enabled for the storage pool. For more information on FPO, see File Placement Optimizer.
writeAffinityDepth={0 | 1 | 2}
Specifies the allocation policy to be used by the node writing the data.

A write affinity depth of 0 indicates that each replica is to be striped across the disks in a cyclical fashion with the restriction that no two disks are in the same failure group. By default, the unit of striping is a block; however, if the block group factor is specified in order to exploit chunks, the unit of striping is a chunk.

A write affinity depth of 1 indicates that the first copy is written to the writer node. The second copy is written to a different rack. The third copy is written to the same rack as the second copy, but on a different half (which can be composed of several nodes).

A write affinity depth of 2 indicates that the first copy is written to the writer node. The second copy is written to the same rack as the first copy, but on a different half (which can be composed of several nodes). The target node is determined by a hash value on the fileset ID of the file, or it is chosen randomly if the file does not belong to any fileset. The third copy is striped across the disks in a cyclical fashion with the restriction that no two disks are in the same failure group. The following conditions must be met while using a write affinity depth of 2 to get evenly allocated space in all disks:
  1. The configuration in disk number, disk size, and node number for each rack must be similar.
  2. The number of nodes must be the same in the bottom half and the top half of each rack.

This behavior can be altered on an individual file basis by using the --write-affinity-failure-group option of the mmchattr command.

This parameter is ignored if write affinity is disabled for the storage pool.

blockGroupFactor=BlockGroupFactor
Specifies how many file system blocks are laid out sequentially on disk to behave like a single large block. This option works only if allowWriteAffinity=yes is set for the data pool. It applies only to newly written data blocks; previously existing data blocks are not migrated. For more information, see File Placement Optimizer.
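
As an illustration, a stanza file that is passed with -F might combine pool and NSD stanzas as in the following sketch. The pool name, NSD names, failure groups, and FPO attribute values are placeholders chosen for this example only; the blockSize clause is omitted so that the pool uses the file system data block size:

%pool:
  pool=fpodata
  usage=dataOnly
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=1
  blockGroupFactor=128

%nsd:
  nsd=nsd1
  usage=dataAndMetadata
  failureGroup=1
  pool=system

%nsd:
  nsd=nsd2
  usage=dataOnly
  failureGroup=2,0,1
  pool=fpodata
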
-A {yes | no | automount}
Indicates when the file system is to be mounted:
yes
When the GPFS daemon starts. This is the default.
no
The file system is mounted manually.
automount
On non-Windows nodes, when the file system is first accessed. On Windows nodes, when the GPFS daemon starts.
Note: IBM Storage Protect for Space Management does not support file systems with the -A option set to automount.
-B BlockSize
Specifies the size of data blocks in the file system. By default this parameter sets the block size and subblock size for all the data blocks and metadata blocks in the file system. This statement applies to all the data blocks and metadata blocks in the system storage pool and all the data blocks in user storage pools.
Note: You can set metadata blocks to a different size than data blocks by using the --metadata-block-size parameter; this setting can change the size and number of data subblocks. However, using different data and metadata block sizes is not recommended. For more information, see the description of the --metadata-block-size parameter later in this help topic.
Specify the value of the -B parameter with the character K or M. For example, to set the block size to 4 MiB with an 8 KiB subblock, type "-B 4M". The following table shows the supported block sizes with their subblock size:
Table 1. Supported block sizes with subblock size
Supported block sizes with subblock size
64 KiB block with a 2 KiB subblock
128 KiB block with a 4 KiB subblock
256 KiB, 512 KiB, 1 MiB, 2 MiB, or 4 MiB block with an 8 KiB subblock
8 MiB or 16 MiB block with a 16 KiB subblock
Attention:
  • A data block size of 4 MiB provides good sequential performance, makes efficient use of disk space, and provides good performance for small files. It works well for the widest variety of workloads.
  • For information about suggested block sizes for different types of I/O and for different workloads and configuration types, see Block size.

Default block size
If the -B parameter is not specified, then the data blocks and the metadata blocks in the file system are set to a default block size and subblock size. If the format version of the file system is 5.0.0 or greater (and the value of the maxblocksize cluster attribute is 4 MiB or greater), then the default block size is 4 MiB with an 8 KiB subblock.
The default block size depends on the file system format version of the new file system, which is determined by the specified value or the default value of the --version parameter. For more information, see the description of the --version parameter later in this help topic.
  • If the format version of the new file system is 5.0.0 or greater (available in IBM Storage Scale 5.0.x or later) then the following settings apply:
    • The default block size is 4 MiB or the value of the maxblocksize attribute of the cluster, whichever is smaller. For more information about the maxblocksize attribute, see mmchconfig command.
      Note: If maxblocksize is less than 256 KiB, then you must explicitly set -B BlockSize to 64 KiB or 128 KiB.
    • The subblock size is set from the appropriate row of Table 1.
  • If the format version of the file system is earlier than 5.0.0 (earlier than IBM Storage Scale 5.0.0) then the following settings apply:
    • The default block size is 256 KiB.
    • The subblock size is 1/32 of the block size.

Subblocks
By default the data blocks and metadata blocks in a file system are set to the same block size and the same subblock size. The block size and subblock size are determined either by a setting from Table 1 (if -B is specified) or by the default sizes (if -B is not specified).

IBM Storage Scale 5.0.0 introduced variable subblock sizes, which make space allocation for smaller files more efficient with larger block sizes. With variable subblock sizes, a single number of subblocks per block is defined for the file system, and that number is derived from the smaller of the data block size and the metadata block size. Because only one such definition exists per file system, selecting a smaller metadata block size has the unintended side effect of increasing the subblock size for data blocks. Although setting metadata blocks to a different size than data blocks with the --metadata-block-size parameter is supported, it is not recommended. The --metadata-block-size option is deprecated and will be removed in a future release.

If you still decide to use the --metadata-block-size parameter to change the metadata block size, the following steps are taken automatically to determine the subblock sizes for data blocks and metadata blocks. Otherwise, you can ignore these steps:
  1. Determine the number of subblocks. This step is necessary because data blocks and metadata blocks must have the same number of subblocks:
    1. Choose the block type with the smaller block size (usually the metadata block).
    2. Set the subblock size from the appropriate row in Table 1.
    3. Find the number of subblocks by dividing the block size by the subblock size. This value will be the number of subblocks for both data blocks and metadata blocks.
    For example, suppose that initially the block sizes are set to 16 MiB for data blocks and 1 MiB for metadata blocks. The smaller block size is 1 MiB for metadata blocks. From Table 1, the subblock size for a block size of 1 MiB is 8 KiB. Therefore the number of subblocks is (1 MiB / 8 KiB) or 128 subblocks. Thus the following settings are determined:
    • For metadata blocks, from the table, the block size is 1 MiB and the metadata subblock size is 8 KiB.
    • Both data blocks and metadata blocks must have 128 subblocks.
  2. Determine the subblock size for the other block type (usually the data block) by dividing the block size by the number of subblocks from Step 1. Continuing the example from Step 1, a data block must have 128 subblocks. Therefore the subblock size for data blocks is (16 MiB / 128) or 128 KiB. Note that this is different from the standard subblock size of 16 KiB for a 16 MiB block.
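
As a hedged sketch of the invocation behind this example (the device name and stanza file are placeholders), the command would resemble the following and would result in 16 MiB data blocks with 128 KiB data subblocks and 1 MiB metadata blocks with 8 KiB metadata subblocks:

# mmcrfs fs1 -F /tmp/nsd.stanza -B 16M --metadata-block-size 1M
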
-D {nfs4 | posix}
Specifies whether a deny-write open lock blocks write operations, as it is required to do by NFS V4. File systems supporting NFS V4 must have -D nfs4 set. The option -D posix allows NFS writes even in the presence of a deny-write open lock. If you intend to export the file system using NFS V4 or Samba, you must use -D nfs4. For NFS V3 (or if the file system is not NFS exported at all) use -D posix. The default is -D nfs4.
-E {yes | no}
Specifies whether to report exact mtime values (-E yes) or to update the mtime value periodically (-E no). To display exact modification times for a file system, specify -E yes, which is the default.
-i InodeSize
Specifies the byte size of inodes. Supported inode sizes are 512, 1024, and 4096 bytes. The default is 4096.
-j {cluster | scatter}
Specifies the default block allocation map type to be used if layoutMap is not specified for a given storage pool.
-k {posix | nfs4 | all}
Specifies the type of authorization supported by the file system:
posix
Traditional GPFS ACLs only (NFS V4 and Windows ACLs are not allowed). Authorization controls are unchanged from earlier releases.
nfs4
Support for NFS V4 and Windows ACLs only. Users are not allowed to assign traditional GPFS ACLs to any file system objects (directories and individual files).
all
Any supported ACL type is permitted. This includes traditional GPFS (posix) and NFS V4 and Windows ACLs (nfs4).

The administrator is allowing a mixture of ACL types. For example, fileA might have a posix ACL, while fileB in the same file system may have an NFS V4 ACL, implying different access characteristics for each file depending on the ACL type that is currently assigned. The default is -k all.

Avoid specifying nfs4 or all unless files are to be exported to NFS V4 or Samba clients, or the file system is mounted on Windows. NFS V4 and Windows ACLs affect file attributes (mode) and have access and authorization characteristics that are different from traditional GPFS ACLs.

-K {no | whenpossible | always}
Specifies whether strict replication is to be enforced:
no
Indicates that strict replication is not enforced. GPFS tries to create the needed number of replicas, but still returns EOK if it can allocate at least one replica.
whenpossible
Indicates that strict replication is enforced provided the disk configuration allows it. If the number of failure groups is insufficient, strict replication is not enforced. This is the default value.
always
Indicates that strict replication is enforced.

For more information, see the topic "Strict replication" in the IBM Storage Scale: Problem Determination Guide.

-L LogFileSize
Specifies the size of the internal log files. The LogFileSize must be a multiple of the metadata block size. The default log file size is 32 MiB in most cases. However, if the data block size (parameter -B) is less than 512 KiB or if the metadata block size (parameter --metadata-block-size) is less than 256 KiB, then the default log file size is either 4 MiB or the metadata block size, whichever is greater. The minimum size is 256 KiB and the maximum size is 1024 MiB. Specify this value with the K or M character, for example: 8M.

The default log size works well in most cases. An increased log file size is useful when the highly available write cache feature (parameter --write-cache-threshold) is enabled.

-m DefaultMetadataReplicas
Specifies the default number of copies of inodes, directories, and indirect blocks for a file. Valid values are 1, 2, and 3. This value cannot be greater than the value of MaxMetadataReplicas. The default is 1.
-M MaxMetadataReplicas
Specifies the default maximum number of copies of inodes, directories, and indirect blocks for a file. Valid values are 1, 2, and 3. This value cannot be less than the value of DefaultMetadataReplicas. The default is 2.
-n NumNodes
The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created, but the change does not alter existing data structures; only data structures that are created later, such as those for a new storage pool, are affected by the new value.

When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (for more information, see GPFS architecture). If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, specify a number larger than the default.

-Q {yes | no}
Activates quotas automatically when the file system is mounted. The default is -Q no. Issue the mmdefedquota command to establish default quota values. Issue the mmedquota command to establish explicit quota values.
To activate GPFS quota management after the file system has been created:
  1. Mount the file system.
  2. To establish default quotas:
    1. Issue the mmdefedquota command to establish default quota values.
    2. Issue the mmdefquotaon command to activate default quotas.
  3. To activate explicit quotas:
    1. Issue the mmedquota command to establish explicit quota values.
    2. Issue the mmquotaon command to activate quota enforcement.
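
For example, assuming a file system named fs1 and a user jdoe (both placeholder names), the sequence might look like the following sketch; see the documentation of each quota command for its full syntax:

# mmmount fs1 -a
# mmdefedquota -u fs1
# mmdefquotaon fs1
# mmedquota -u jdoe
# mmquotaon fs1
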
-r DefaultDataReplicas
Specifies the default number of copies of each data block for a file. Valid values are 1, 2, and 3. This value cannot be greater than the value of MaxDataReplicas. The default is 1.
-R MaxDataReplicas
Specifies the default maximum number of copies of data blocks for a file. Valid values are 1, 2, and 3. This value cannot be less than the value of DefaultDataReplicas. The default is 2.
-S {yes | no | relatime}
Controls how the file attribute atime is updated.
Note: The attribute atime is updated locally in memory, but the value is not visible to other nodes until after the file is closed. To get an accurate value of atime, an application must call subroutine gpfs_stat or gpfs_fstat.
yes
The atime attribute is not updated. The subroutines gpfs_stat and gpfs_fstat return the time that the file system was last mounted with relatime=no. For more information, see the topics mmmount command with the -o parameter and Mount options specific to IBM Storage Scale.
no
The atime attribute is updated whenever the file is read. This option is the default if the minimum release level (minReleaseLevel) of the cluster is less than 5.0.0 when the file system is created.
relatime
The atime attribute is updated whenever the file is read, but only if one of the following conditions is true:
  • The current file access time (atime) is earlier than the file modification time (mtime).
  • The current file access time (atime) is greater than the atimeDeferredSeconds attribute. For more information, see mmchconfig command.
This setting is the default if the minimum release level (minReleaseLevel) of the cluster is 5.0.0 or greater when the file system is created.
For more information, see atime values.
-T MountPoint
Specifies the mount point directory of the GPFS file system. If it is not specified, the mount point is set to DefaultMountDir/Device. The default value for DefaultMountDir is /gpfs, but it can be changed with the mmchconfig command.
-t DriveLetter
Specifies the drive letter to use when the file system is mounted on Windows.
-v {yes | no}
Verifies that specified disks do not belong to an existing file system. The default is -v yes. Specify -v no only when you want to reuse disks that are no longer needed for an existing file system. If the command is interrupted for any reason, use -v no on the next invocation of the command.
Important: Using -v no on a disk that already belongs to a file system will corrupt that file system. This will not be noticed until the next time that file system is mounted.
-z {yes | no}
Enable or disable DMAPI on the file system. Turning this option on requires an external data management application such as IBM Storage Protect hierarchical storage management (HSM) before the file system can be mounted. The default is -z no.

For further information regarding DMAPI for GPFS, see GPFS-specific DMAPI events.

--auto-inode-limit
Automatically increases the maximum number of inodes per inode space in the file system. If this option is enabled, the current value that is defined for MaxNumInodes is not used as a limit when the preallocated inodes are expanded on demand. After expansion, if the new number of preallocated inodes is larger than the current value that is defined in MaxNumInodes, the maximum number of inodes is increased to match the new number of preallocated inodes.
Note: Both the MaxNumInodes and the NumInodesToPreallocate variables are defined for the --inode-limit option.

The --auto-inode-limit option is available only in IBM Storage Scale 5.1.4 with format level 28.00 or later.

--noauto-inode-limit
The maximum number of inodes cannot be expanded on demand. This is the default.
--filesetdf
Specifies that, for a fileset other than the root fileset, the numbers that are reported by the df command are based on the quotas for the specific fileset or on the capacity and usage limits at the independent fileset level, rather than on the entire file system. The df command reports either the quota limit and usage or the inode space capacity and usage for the fileset, not for the total file system. This option affects the df command behavior only on Linux® nodes.
The df command reports quota limit and quota usage if quota is enabled for the fileset. If quota is disabled and filesetdf is enabled in IBM Storage Scale 5.1.1 or later with file system version 5.1.1 or later, then the df command reports inode space capacity and inode usage at the independent fileset level. However, the df command reports the block space at the file system level because the block space is shared with the whole file system.
Note: In IBM Storage Scale 5.1.3 or later with the file system version 5.1.1 or later, if quota is enabled but the limits are not defined then the df command reports inode space capacity and inode space usage at the independent fileset level.
--nofilesetdf
Specifies that the numbers reported by the df command are not based on the fileset level. The df command returns the numbers for the entire file system. This is the default.
--flush-on-close | --noflush-on-close
Specifies whether the automatic flushing of disk buffers is enabled when closing files that were opened for write operations on the device. The minimum release level of the cluster must be 5.1.3 or later and the file system format version must be at 5.1.3.0 (27.00) or later to enable this feature.

The automatic flushing of disk buffers is disabled by default.

Note: Enabling the --flush-on-close feature might impact the performance of workloads that are running on the file system for which it is enabled.
--inode-limit MaxNumInodes[:NumInodesToPreallocate]
Specifies the maximum number of files in the file system.

In a file system that does parallel file creates, the number of free inodes must be greater than 5% of the total number of inodes. If not, the performance of the file system can be degraded. To increase the number of inodes, issue the mmchfs command.

The parameter NumInodesToPreallocate specifies the number of inodes that the system immediately preallocates. If you do not specify a value for NumInodesToPreallocate, GPFS dynamically allocates inodes as needed.

You can specify the MaxNumInodes and NumInodesToPreallocate values with a suffix, for example 100K or 2M. Note that in order to optimize file system operations, the number of inodes that are actually created might be greater than the specified value.

Note: Preallocated inodes created using the mmcrfs command are allocated only to the root fileset, and these inodes cannot be deleted or moved to another independent fileset. It is recommended to avoid preallocating too many inodes because there can be both performance and memory allocation costs associated with such preallocations. In most cases, there is no need to preallocate inodes because GPFS dynamically allocates inodes as needed.
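
For example, the following sketch creates a file system with a maximum of 2 million inodes, of which 500,000 are preallocated; the device name and stanza file are placeholders:

# mmcrfs fs1 -F /tmp/nsd.stanza --inode-limit 2M:500K
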
--log-replicas LogReplicas
Specifies the number of recovery log replicas. Valid values are 1, 2, 3, or DEFAULT. If not specified, or if DEFAULT is specified, the number of log replicas is the same as the number of metadata replicas currently in effect for the file system.

This option is applicable only if the recovery log is stored in the system.log storage pool. For more information about the system.log storage pool, see The system.log storage pool.

--metadata-block-size MetadataBlockSize
Sets the metadata block size.

By default the data blocks and metadata blocks in a file system are set to the same block size and the same subblock size. For more information about these settings see the description of the -B BlockSize parameter earlier in this help topic.

Note: It is recommended to use the same block size for data and metadata blocks. For more information about the reason for using the same block size for data and metadata, see the "Subblocks" subsection earlier in this help topic.

However, you can use the --metadata-block-size parameter to specify a metadata block size that is different from the data block size. This option is deprecated because it is no longer required for performance improvements in file systems with file system format 5.0.0 or later. The parameter will be removed in a future release.

If you need to set metadata blocks to a different size than the data blocks, you must define a metadata-only system pool and specify the --metadata-block-size parameter when you issue the mmcrfs command. Follow these steps:
  1. Define a pool stanza for a metadata-only system pool. Include the following settings:
    • Set pool to a valid pool name.
    • Do not set a value for blockSize. The metadata block size is set in the --metadata-block-size parameter of the mmcrfs command.
    • Set usage to metadataOnly.
  2. Define an NSD stanza that includes the following settings:
    • Set usage to metadataOnly.
    • Set pool to the name of the pool stanza that you defined in Step 1.
  3. Include the NSD stanza and the pool stanza in the stanza file that you will pass to the mmcrfs command.
  4. When you run the mmcrfs command, include the --metadata-block-size parameter and specify a valid block size from Table 1.
Note: It is recommended to use the same value for the -B and --metadata-block-size parameters. When metadata blocks are set to a different size than data blocks, the subblock sizes are ultimately determined by an automatic sequence of steps in the mmcrfs command processing. For more information, see the "Subblocks" subtopic in the description of the -B BlockSize parameter earlier in this help topic.
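
For example, a stanza file that follows these steps might look like the following sketch, in which the NSD names, failure groups, and data pool name are placeholders and the data pool is defined implicitly through the NSD stanza:

%pool:
  pool=system
  usage=metadataOnly

%nsd:
  nsd=nsd1
  usage=metadataOnly
  failureGroup=1
  pool=system

%nsd:
  nsd=nsd2
  usage=dataOnly
  failureGroup=2
  pool=datapool

A matching invocation, again with placeholder names, might set a 4 MiB data block size and a 1 MiB metadata block size:

# mmcrfs fs1 -F /tmp/nsd.stanza -B 4M --metadata-block-size 1M -T /gpfs/fs1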

--perfileset-quota
Sets the scope of user and group quota limit checks to the individual fileset level (rather than the entire file system).
--noperfileset-quota
Sets the scope of user and group quota limit checks to the entire file system (rather than per individual fileset). This is the default.
--mount-priority Priority
Controls the order in which the individual file systems are mounted at daemon startup or when one of the all keywords is specified on the mmmount command.

File systems with higher Priority numbers are mounted after file systems with lower numbers. File systems that do not have mount priorities are mounted last. A value of zero indicates no priority. This is the default.

--nfs4-owner-write-acl {yes | no}
Specifies whether object owners are given implicit NFSv4 WRITE_ACL permission.

A value of yes specifies that object owners are given implicit WRITE_ACL permission.

A value of no specifies that object owners are NOT given implicit WRITE_ACL permission. The default value is yes.
Note:

The minimum release level of the cluster must be 5.1.5 or later and the file system format version must be at 5.1.5 (29.00) or later to enable this feature.

When the option is set to no, copying files and directories using the cp command with owner's privileges from an NFS client may fail with the error E_PERM as a consequence of SETATTR operation failure. For applications that expect cp to return a success status, consider using the ignore_mode_change=true option for NFS Ganesha exports so SETATTR and, consequently, cp return a success status.

On AIX versions below 7300-02-01-2346, when the option is set to no and the owner does not have WRITE_ACL permission for a directory, copying that directory by using the cp -r or cp -R option with the owner's privileges returns the error E_PERM. The error is returned because of the underlying chmod call.

--version VersionString
Specifies the file system format version of the new file system, such as 4.2.3.0. A file system format version is associated with a file system format number (for example, 17.0) that determines the features that are enabled in the new file system. For more information about these values, see File system format changes between versions of IBM Storage Scale.

If you do not specify this parameter, the file system format version of the new file system defaults to the version of IBM Storage Scale that is installed on the node where you issue the command. For example, if IBM Storage Scale 4.2.3 is installed on the node where you issue the command, then the default file system format version for the new file system is 4.2.3.0.

Whether you specify the file system format version or let it assume the default value, the file system format version must be in the range 4.1.1.0 - mRL, where mRL is the minimum release level of the cluster (minReleaseLevel).

The file system format version also affects the default value of the -B BlockSize parameter. For more information, see the description of that parameter earlier in this help topic.

Important:
  • A remote node with an installed product version of IBM Storage Scale (for example, 4.2.3) that is less than the file system format version of the new file system (such as 5.0.0) will not be able to access the file system.
  • Windows nodes can mount only file systems with a file system format version greater than or equal to 3.2.1.5.
  • If you do not specify this parameter and the installed product version of the node where you issue the command is greater than the minimum release level (minReleaseLevel) of the cluster, then the command returns with an error message and prompts you to upgrade the minimum release level. To avoid this result, specify a file system format version with the --version parameter.
  • In many contexts you might want to let the file system format version assume its default value. However, specifying an explicit file system format version can be useful or necessary in the following contexts:
    • When nodes in the cluster are running different versions of IBM Storage Scale.
    • When you want to make the file system available to remote clusters in which nodes are running an earlier version of IBM Storage Scale.
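
For example, if remote clusters that must mount the file system still run IBM Storage Scale 4.2.3, you might create the file system at that format level. The device name and stanza file in the following sketch are placeholders:

# mmcrfs fs1 -F /tmp/nsd.stanza --version 4.2.3.0
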
--profile ProfileName
Specifies a predefined profile of attributes to be applied. System-defined profiles are located in /usr/lpp/mmfs/profiles/. All the file system attributes listed under a file system stanza are changed as a result of this command. The following system-defined profile names are accepted:
  • gpfsProtocolDefaults
  • gpfsProtocolRandomIO

The file system attributes are applied at file system creation. If there is a current profile in place on the system (use mmlsconfig profile to check), then the file system is created with the attributes and values listed in the profile's file system stanza. The default is to use whatever attributes and values are associated with the current profile setting.

Furthermore, all file system attributes from an installed profile file can be bypassed with --profile=userDefinedProfile, where userDefinedProfile is a profile file that the user has installed in /var/mmfs/etc/.
Note: The user provided profile file name must be in lowercase.
User-defined profiles consist of the following stanzas:

%cluster:
[CommaSeparatedNodesOrNodeClasses:]ClusterConfigurationAttribute=Value
...
%filesystem:
FilesystemConfigurationAttribute=Value
...

A sample file can be found in /usr/lpp/mmfs/samples/sample.profile. See the mmchconfig command for a detailed description of the different configuration parameters.

User-defined profiles should be used only by experienced administrators. When in doubt, use the mmchconfig command instead.
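
For example, the following sketch applies one of the system-defined profiles at file system creation; the device name and stanza file are placeholders:

# mmcrfs fs1 -F /tmp/nsd.stanza --profile gpfsProtocolDefaults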

--write-cache-threshold HAWCThreshold
Specifies the maximum length (in bytes) of write requests that will be initially buffered in the highly-available write cache before being written back to primary storage. Only synchronous write requests are guaranteed to be buffered in this fashion.

A value of 0 disables this feature. 64K is the maximum supported value. Specify in multiples of 4K.

This feature can be enabled or disabled at any time (the file system does not need to be unmounted). For more information about this feature, see Highly available write cache (HAWC).
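
For example, the following sketch enables HAWC with the maximum supported threshold and, as suggested in the -L description earlier in this topic, a larger log file size; the device name and stanza file are placeholders:

# mmcrfs fs1 -F /tmp/nsd.stanza --write-cache-threshold 64K -L 128M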

-p afmAttribute
Specifies the AFM parameters that are to be set on the file system for file system-level migration by using AFM. Setting AFM parameters while you create a new file system allows migration of data from a source file system to the newly created IBM Storage Scale file system by using the AFM migration method. This method does not require the creation of any AFM mode fileset. Instead, AFM is enabled on the "root" fileset of the file system. The supported AFM modes that can be enabled on the file system are the AFM LU mode and the AFM RO mode. Conversion of the RO mode to the LU mode is permitted. After the migration is completed, you can disable the AFM relationship by using the mmchfileset Device root -p afmTarget=disable command and later use this as a regular file system. After the AFM relationship is disabled, it cannot be enabled again. For more information, see Migration from the legacy hardware by using AFM.
AFM supports the following parameters for file system-level migration:
afmTarget
Identifies the home that is associated with the cache. The home is specified in either of the following forms:
nfs://{Host|Map}/Source_Path
Where:
nfs://
Specifies the transport protocol.
Source_Path
Specifies the export path.
Host
Specifies the server domain name system (DNS) name or IP address.
Map
Specifies the export map name. For more information about mapping, see Parallel data transfers.
The afmTarget parameter examples are as follows:
  1. Use the NFS protocol without mapping.
    # mmcrfs Device ... -p afmTarget=nfs://<Host|IP>/Source_Path,afmMode=ro
  2. Use the NFS protocol with mapping.
    # mmcrfs Device ... -p afmTarget=nfs://<map1>/Source_Path,afmMode=ro
  3. Define the afmTarget parameter to associate a cloud object bucket with a file system by using the manual updates mode for the file system replication.
    afmTarget=https://s3.us-east.cloud-object-storage.appdomain.cloud:443/bucket1
    https
    Is a protocol.
    s3.us-east.cloud-object-storage.appdomain.cloud:443
    Is an endpoint.
    bucket1
    Is a bucket name. This bucket is created before creating a file system for replication.
afmMode
Specifies the AFM fileset mode. Valid values are as follows:
read-only | ro
Specifies the read-only mode. You can fetch data into the ro mode fileset for read-only purposes.
local-updates | lu
Specifies the local-updates mode. You can fetch data into the lu-mode fileset and update it locally. The modified data will not be synchronized to the home and stays local.

Conversion of the ro mode to the lu mode is supported for file system-level migration. For more information, see Caching modes.

manual-updates | mu
The manual updates (MU) mode supports manual replication of the files or objects by using ILM policies or user provided object list. The MU mode is supported on AFM to cloud object storage backends for the fileset or file system level replication.
The MU mode fileset provides the flexibility to upload and download files or objects to and from cloud object storage after you finalize the set of objects to upload or download for replication by using the file system or fileset. Unlike other AFM to cloud object storage objectfs fileset modes, MU mode depends on manual intervention from administrators to upload and download the data to be in sync. As an administrator you can also automate upload and download by using ILM policies to search specific files or objects to upload or download.
To disable the AFM relationship from the file system, complete the following steps:
  1. Unmount the file system on all cluster nodes.
  2. Disable the AFM relationship by issuing the following command:
    # mmchfileset fs1 root -p afmMode=disable
    A sample output is as follows:
    Warning! Once disabled, AFM cannot be re-enabled on this fileset. Do you wish to continue? (yes/no) yes
    
    Warning! Fileset should be verified for uncached files and orphans. If already verified, then skip this step. Do you wish to verify same? (yes/no) no
    
    Fileset root changed.
afmDirLookupRefreshInterval

Controls the frequency of data revalidations that are triggered by lookup operations such as ls or stat (specified in seconds). When a lookup operation is performed on a directory, if the specified time passed, AFM sends a message to the home cluster to find out whether the metadata of that directory is modified since the last time it was checked. If the time interval did not pass, AFM does not check the home cluster for updates to the metadata.

Valid values are 0 – 2147483647. The default is 60. If data in the home cluster changes frequently, a value of 0 is recommended.

afmDirOpenRefreshInterval
Controls the frequency of data revalidations that are triggered by such I/O operations as read or write (specified in seconds). After a directory is cached, open requests that are resulting from I/O operations on that object are directed to the cached directory until the specified amount of time has passed. After the specified time passed, the open request is directed to a gateway node rather than to the cached directory.

Valid values are 0 - 2147483647. The default is 60. Set a lower value for a higher level of consistency.

afmFileLookupRefreshInterval
Controls the frequency of data revalidations that are triggered by lookup operations such as ls or stat (specified in seconds). When a lookup operation is performed on a file, if the specified time passed, AFM sends a message to the home cluster to find out whether the metadata of the file is modified since the last time it was checked. If the time interval did not pass, AFM does not check the home cluster for updates to the metadata.

Valid values are 0 – 2147483647. The default is 30. If data in the home cluster changes frequently, a value of 0 is recommended.

afmFileOpenRefreshInterval
Controls the frequency of data revalidations that are triggered by I/O operations such as read or write (specified in seconds). After a file is cached, open requests from I/O operations on that object are directed to the cached file until the specified time passed. After the specified time passed, the open request is directed to a gateway node rather than to the cached file.

Valid values are 0 – 2147483647. The default is 30. Set a lower value for a higher level of consistency.

afmParallelReadChunkSize
Defines the minimum chunk size of the read that needs to be distributed among the gateway nodes during parallel reads. Values are interpreted in bytes. The default value of this parameter is 128 MiB, and the valid range of values is 0 – 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level by using the mmcrfileset or mmchfileset commands.
afmParallelReadThreshold
Defines the threshold beyond which parallel reads become effective. Reads are split into chunks when file size exceeds this threshold value. Values are interpreted in MiB. The default value is 1024 MiB. The valid range of values is 0 – 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level by using mmcrfileset or mmchfileset commands.
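
For example, the following sketch creates a file system whose root fileset caches an NFS export in read-only mode and revalidates file metadata on every lookup; the device name, stanza file, host name, and export path are placeholders:

# mmcrfs fs1 -F /tmp/nsd.stanza -T /gpfs/fs1 -p afmTarget=nfs://homehost/export/data,afmMode=ro,afmFileLookupRefreshInterval=0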

Exit status

0
Successful completion.
nonzero
A failure has occurred.

Security

You must have root authority to run the mmcrfs command.

The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.

Examples

This example shows how to create a file system named gpfs1 by using three disks, with a block size of 512 KiB, allowing metadata and data replication of two, turning quotas on, and creating /gpfs1 as the mount point. The NSD stanzas that describe the three disks are assumed to be in the file /tmp/freedisks. To complete this task, issue the following command:
# mmcrfs gpfs1 -F /tmp/freedisks -B 512K -m 2 -r 2 -Q yes -T /gpfs1
A sample output is as follows:
The following disks of gpfs1 will be formatted on node c21f1n13:
    hd2n97: size 1951449088 KB
    hd3n97: size 1951449088 KB
    hd4n97: size 1951449088 KB
Formatting file system ...
Disks up to size 16 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
  19 % complete on Tue Feb 28 18:03:20 2012
  42 % complete on Tue Feb 28 18:03:25 2012
  62 % complete on Tue Feb 28 18:03:30 2012
  79 % complete on Tue Feb 28 18:03:35 2012
  96 % complete on Tue Feb 28 18:03:40 2012
 100 % complete on Tue Feb 28 18:03:41 2012
Completed creation of file system /dev/gpfs1.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

See also

Location

/usr/lpp/mmfs/bin