Transitioning data to Amazon S3 cloud service

You can transition data to a remote cloud service as part of the lifecycle configuration using storage classes to reduce cost and improve manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon S3.

Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data is transitioned. Cloud tier storage classes do not need a data pool and are defined in terms of the zonegroup placement targets.
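Before adding a cloud tier, you can review the placement targets and storage classes that are already defined by dumping the zonegroup configuration, for example (the zonegroup name default is an assumption):

    radosgw-admin zonegroup get --rgw-zonegroup=default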

Prerequisites

  • An IBM Storage Ceph cluster with the Ceph Object Gateway installed.

  • User credentials for the remote cloud service, Amazon S3.

  • Target path created on Amazon S3.

  • s3cmd installed on the bootstrapped node.

  • The AWS CLI configured locally to download data.

Procedure

  1. Create a user with access key and secret key:

    Syntax

    radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]

    Example

    [ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
    {
        "user_id": "test-user",
        "display_name": "test-user",
        "email": "",
        "suspended": 0,
        "max_buckets": 1000,
        "subusers": [],
        "keys": [
            {
                "user": "test-user",
                "access_key": "a21e86bce636c3aa1",
                "secret_key": "cf764951f1fdde5e"
            }
        ],
        "swift_keys": [],
        "caps": [],
        "op_mask": "read, write, delete",
        "default_placement": "",
        "default_storage_class": "",
        "placement_tags": [],
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "temp_url_keys": [],
        "type": "rgw",
        "mfa_ids": []
    }
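    If you need to confirm the user and its generated keys later, the same details can be displayed with the radosgw-admin user info command, for example:

    [ceph: root@host01 /]# radosgw-admin user info --uid=test-user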
  2. On the bootstrapped node, add a storage class with the cloud-s3 tier type:

    Note: Once a storage class is created with the --tier-type=cloud-s3 option, it cannot be modified to any other storage class type later.

    Syntax

    radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
                                --placement-id=PLACEMENT_ID \
                                --storage-class=STORAGE_CLASS_NAME \
                                --tier-type=cloud-s3

    Example

    [ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
                                                     --placement-id=default-placement \
                                                     --storage-class=CLOUDTIER \
                                                     --tier-type=cloud-s3
    [
        {
            "key": "default-placement",
            "val": {
                "name": "default-placement",
                "tags": [],
                "storage_classes": [
                    "CLOUDTIER",
                    "STANDARD"
                ],
                "tier_targets": [
                    {
                        "key": "CLOUDTIER",
                        "val": {
                            "tier_type": "cloud-s3",
                            "storage_class": "CLOUDTIER",
                            "retain_head_object": "false",
                            "s3": {
                                "endpoint": "",
                                "access_key": "",
                                "secret": "",
                                "host_style": "path",
                                "target_storage_class": "",
                                "target_path": "",
                                "acl_mappings": [],
                                "multipart_sync_threshold": 33554432,
                                "multipart_min_part_size": 33554432
                            }
                        }
                    }
                ]
            }
        }
    ]
  3. Update the storage class with the remote cloud S3 tier configuration.

    Note: If the cluster is part of a multi-site setup, run radosgw-admin period update --commit so that the zonegroup changes are propagated to all the zones, as shown below.
    Note: Make sure access_key and secret do not start with a digit.
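    For example, the commit referenced in the note above can be run from the cephadm shell after the zonegroup changes:

    [ceph: root@host01 /]# radosgw-admin period update --commit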

    Mandatory parameters are:

    • access_key is the remote cloud S3 access key used for a specific connection.

    • secret is the secret key for the remote cloud S3 service.

    • endpoint is the URL of the remote cloud S3 service endpoint.

    • region (for AWS) is the remote cloud S3 service region name.

    Optional parameters are:

    • target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target_path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.

    • target_storage_class defines the target storage class to which the object transitions. If not specified, the object is transitioned to STANDARD storage class.

    • retain_head_object, if true, retains the metadata of the object transitioned to cloud. If false (default), the object is deleted post transition. This option is ignored for current versioned objects.

    • multipart_sync_threshold specifies that objects this size or larger are transitioned to the cloud using multipart upload.

    • multipart_min_part_size specifies the minimum part size to use when transitioning objects using multipart upload.

      Syntax

      radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
                                               --placement-id PLACEMENT_ID \
                                               --storage-class STORAGE_CLASS_NAME \
                                               --tier-config=endpoint=AWS_ENDPOINT_URL,\
                                               access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
                                               target_path="TARGET_BUCKET_ON_AWS",\
                                               multipart_sync_threshold=44432,\
                                               multipart_min_part_size=44432,\
                                               retain_head_object=true,\
                                               region=REGION_NAME

      Example

      [ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                                                      --placement-id default-placement \
                                                                      --storage-class CLOUDTIER \
                                                                      --tier-config=endpoint=http://10.0.210.010:8080,\
                                                                      access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
                                                                      target_path="dfqe-bucket-01",\
                                                                      multipart_sync_threshold=44432,\
                                                                      multipart_min_part_size=44432,\
                                                                      retain_head_object=true,\
                                                                      region=us-east-1
      
      [
          {
              "key": "default-placement",
              "val": {
                  "name": "default-placement",
                  "tags": [],
                  "storage_classes": [
                      "CLOUDTIER",
                      "STANDARD",
                      "cold.test",
                      "hot.test"
                  ],
                  "tier_targets": [
                      {
                          "key": "CLOUDTIER",
                          "val": {
                              "tier_type": "cloud-s3",
                              "storage_class": "CLOUDTIER",
                              "retain_head_object": "true",
                              "s3": {
                                  "endpoint": "http://10.0.210.010:8080",
                                  "access_key": "a21e86bce636c3aa2",
                                  "secret": "cf764951f1fdde5f",
                                  "region": "",
                                  "host_style": "path",
                                  "target_storage_class": "",
                                  "target_path": "dfqe-bucket-01",
                                  "acl_mappings": [],
                                  "multipart_sync_threshold": 44432,
                                  "multipart_min_part_size": 44432
                              }
                          }
                      }
                  ]
              }
          }
      ]
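      To double-check the tier configuration stored in the zonegroup, you can list the placement targets and inspect the tier_targets section (a quick sanity check using the example zonegroup above):

      [ceph: root@host01 /]# radosgw-admin zonegroup placement list --rgw-zonegroup=default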
  4. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1
    
    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
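    To confirm that the gateway daemons have come back up after the restart, you can list them with the orchestrator, for example by filtering on the rgw daemon type:

    [ceph: root@host01 /]# ceph orch ps --daemon-type rgw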
  5. Exit the cephadm shell and, as the root user, configure Amazon S3 on the bootstrapped node:

    Example

    [root@host01 ~]# s3cmd --configure
    
    Enter new values or accept defaults in brackets with Enter.
    Refer to user manual for detailed description of all options.
    
    Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
    Access Key: a21e86bce636c3aa2
    Secret Key: cf764951f1fdde5f
    Default Region [US]:
    
    Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
    S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80
    
    Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
    if the target S3 system supports dns based buckets.
    DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80
    
    Encryption password is used to protect your files from reading
    by unauthorized persons while in transfer to S3
    Encryption password:
    Path to GPG program [/usr/bin/gpg]:
    
    When using secure HTTPS protocol all communication with Amazon S3
    servers is protected from 3rd party eavesdropping. This method is
    slower than plain HTTP, and can only be proxied with Python 2.7 or newer
    Use HTTPS protocol [Yes]: No
    
    On some networks all internet access must go through a HTTP proxy.
    Try setting it here if you can't connect to S3 directly
    HTTP Proxy server name:
    
    New settings:
      Access Key: a21e86bce636c3aa2
      Secret Key: cf764951f1fdde5f
      Default Region: US
      S3 Endpoint: 10.0.210.78:80
      DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
      Encryption password:
      Path to GPG program: /usr/bin/gpg
      Use HTTPS protocol: False
      HTTP Proxy server name:
      HTTP Proxy server port: 0
    
    Test access with supplied credentials? [Y/n] Y
    Please wait, attempting to list all buckets...
    Success. Your access key and secret key worked fine :-)
    
    Now verifying that encryption works...
    Not configured. Never mind.
    
    Save settings? [y/N] y
    Configuration saved to '/root/.s3cfg'
  6. Create the S3 bucket.

    Syntax

    s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd mb s3://awstestbucket
    Bucket 's3://awstestbucket/' created
  7. Create a file, add the data to it, and move it to the S3 service.

    Syntax

    s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3

    Example

    [root@host01 ~]# s3cmd put test.txt s3://awstestbucket
    
    upload: 'test.txt' -> 's3://awstestbucket/test.txt'  [1 of 1]
     21 of 21   100% in    1s    16.75 B/s  done
  8. Create the lifecycle configuration transition policy.

    Syntax

    <LifecycleConfiguration>
      <Rule>
        <ID>RULE_NAME</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>DAYS</Days>
          <StorageClass>STORAGE_CLASS_NAME</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>

    Example

    [root@host01 ~]# cat lc_cloud.xml
    <LifecycleConfiguration>
      <Rule>
        <ID>Archive all objects</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>2</Days>
          <StorageClass>CLOUDTIER</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>
  9. Set the lifecycle configuration transition policy:

    Syntax

    s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
    
    s3://awstestbucket/: Lifecycle Policy updated
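    You can read the policy back to confirm that it was applied, for example:

    [root@host01 ~]# s3cmd getlifecycle s3://awstestbucket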
  10. Log in to the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell
  11. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1
    
    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'

Verification

  1. On the source cluster, verify that the data has moved to S3 with the radosgw-admin lc list command:

    Example

    [ceph: root@host01 /]# radosgw-admin lc list
    [
        {
            "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
            "started": "Mon, 26 Sep 2022 18:32:07 GMT",
            "status": "COMPLETE"
        }
    ]
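    If the status is not yet COMPLETE, lifecycle processing can be triggered manually instead of waiting for the scheduled run, for example:

    [ceph: root@host01 /]# radosgw-admin lc process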
  2. Verify the object transition at the cloud endpoint:

    Example

    [root@client ~]$ radosgw-admin bucket list
    [
        "awstestbucket"
    ]
  3. List the objects in the bucket:

    Example

    [root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint-url=http://10.0.209.002:8080
    {
        "Contents": [
            {
                "Key": "awstestbucket/test",
                "LastModified": "2022-08-25T16:14:23.118Z",
                "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
                "Size": 29,
                "StorageClass": "STANDARD",
                "Owner": {
                    "DisplayName": "test-user",
                    "ID": "test-user"
                }
            }
        ]
    }
  4. List the contents of the S3 bucket:

    Example

    [root@host01 ~]# s3cmd ls s3://awstestbucket
    2022-08-25 09:57            0  s3://awstestbucket/test.txt
  5. Check the information of the file:

    Example

    [root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
    s3://awstestbucket/test.txt (object):
       File size: 0
       Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
       MIME type: text/plain
       Storage:   CLOUDTIER
       MD5 sum:   991d2528bb41bb839d1a9ed74b710794
       SSE:       none
       Policy:    none
       CORS:      none
       ACL:       test-user: FULL_CONTROL
       x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root
  6. Download data locally from Amazon S3:

    1. Configure AWS:

      Example

      [client@client01 ~]$ aws configure
      
      AWS Access Key ID [****************6VVP]:
      AWS Secret Access Key [****************pXqy]:
      Default region name [us-east-1]:
      Default output format [json]:
    2. List the contents of the AWS bucket:

      Example

      [client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
      PRE awstestbucket/
    3. Download data from S3:

      Example

      [client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
      
      download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
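      Optionally, verify the integrity of the downloaded file by comparing its checksum with the MD5 sum reported by s3cmd info earlier, for example:

      [client@client01 ~]$ md5sum test.txt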