Transitioning data to Amazon S3 cloud service

You can transition data to a remote cloud service as part of the lifecycle configuration using storage classes to reduce cost and improve manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon S3.

Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data is transitioned. Cloud tier storage classes do not need a data pool and are defined in terms of the zonegroup placement targets.
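Before adding a cloud tier, you can review the placement targets and storage classes that are already defined by dumping the zonegroup configuration, for example (the zonegroup name default is an assumption):

    radosgw-admin zonegroup get --rgw-zonegroup=default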

Prerequisites

  • An IBM Storage Ceph cluster with the Ceph Object Gateway installed.

  • User credentials for the remote cloud service, Amazon S3.

  • Target path created on Amazon S3.

  • s3cmd installed on the bootstrapped node.

  • The AWS CLI configured locally to download data.

Procedure

  1. Create a user with access key and secret key:

    Syntax

    radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]

    Example

    [ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
    {
        "user_id": "test-user",
        "display_name": "test-user",
        "email": "",
        "suspended": 0,
        "max_buckets": 1000,
        "subusers": [],
        "keys": [
            {
                "user": "test-user",
                "access_key": "a21e86bce636c3aa1",
                "secret_key": "cf764951f1fdde5e"
            }
        ],
        "swift_keys": [],
        "caps": [],
        "op_mask": "read, write, delete",
        "default_placement": "",
        "default_storage_class": "",
        "placement_tags": [],
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "temp_url_keys": [],
        "type": "rgw",
        "mfa_ids": []
    }
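    If you need to confirm the user and its generated keys later, the same details can be displayed with the radosgw-admin user info command, for example:

    [ceph: root@host01 /]# radosgw-admin user info --uid=test-user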
  2. On the bootstrapped node, add a storage class with the cloud-s3 tier type:

    Note: Once a storage class is created with the --tier-type=cloud-s3 option, it cannot be modified to any other storage class type later.

    Syntax

    radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
                                --placement-id=PLACEMENT_ID \
                                --storage-class=STORAGE_CLASS_NAME \
                                --tier-type=cloud-s3

    Example

    [ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
                                                     --placement-id=default-placement \
                                                     --storage-class=CLOUDTIER \
                                                     --tier-type=cloud-s3
    [
        {
            "key": "default-placement",
            "val": {
                "name": "default-placement",
                "tags": [],
                "storage_classes": [
                    "CLOUDTIER",
                    "STANDARD"
                ],
                "tier_targets": [
                    {
                        "key": "CLOUDTIER",
                        "val": {
                            "tier_type": "cloud-s3",
                            "storage_class": "CLOUDTIER",
                            "retain_head_object": "false",
                            "s3": {
                                "endpoint": "",
                                "access_key": "",
                                "secret": "",
                                "host_style": "path",
                                "target_storage_class": "",
                                "target_path": "",
                                "acl_mappings": [],
                                "multipart_sync_threshold": 33554432,
                                "multipart_min_part_size": 33554432
                            }
                        }
                    }
                ]
            }
        }
    ]
  3. Update the storage class with the remote cloud S3 tier configuration.

    Note: If the cluster is part of a multi-site setup, run radosgw-admin period update --commit so that the zonegroup changes are propagated to all the zones, as shown below.
    Note: Make sure access_key and secret do not start with a digit.
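    For example, the commit referenced in the note above can be run from the cephadm shell after the zonegroup changes:

    [ceph: root@host01 /]# radosgw-admin period update --commit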

    Mandatory parameters are:

    • access_key is the remote cloud S3 access key used for a specific connection.

    • secret is the secret key for the remote cloud S3 service.

    • endpoint is the URL of the remote cloud S3 service endpoint.

    • region (for AWS) is the remote cloud S3 service region name.

    Optional parameters are:

    • target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target_path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.

    • target_storage_class defines the target storage class to which the object transitions. If not specified, the object is transitioned to STANDARD storage class.

    • retain_head_object, if true, retains the metadata of the object transitioned to cloud. If false (default), the object is deleted post transition. This option is ignored for current versioned objects.

    • multipart_sync_threshold specifies that objects this size or larger are transitioned to the cloud using multipart upload.

    • multipart_min_part_size specifies the minimum part size to use when transitioning objects using multipart upload.

      Syntax

      radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
                                               --placement-id PLACEMENT_ID \
                                               --storage-class STORAGE_CLASS_NAME \
                                               --tier-config=endpoint=AWS_ENDPOINT_URL,\
                                               access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
                                               target_path="TARGET_BUCKET_ON_AWS",\
                                               multipart_sync_threshold=44432,\
                                               multipart_min_part_size=44432,\
                                               retain_head_object=true,\
                                               region=REGION_NAME

      Example

      [ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                                                      --placement-id default-placement \
                                                                      --storage-class CLOUDTIER \
                                                                      --tier-config=endpoint=http://10.0.210.010:8080,\
                                                                      access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
                                                                      target_path="dfqe-bucket-01",\
                                                                      multipart_sync_threshold=44432,\
                                                                      multipart_min_part_size=44432,\
                                                                      retain_head_object=true,\
                                                                      region=us-east-1
      
      [
          {
              "key": "default-placement",
              "val": {
                  "name": "default-placement",
                  "tags": [],
                  "storage_classes": [
                      "CLOUDTIER",
                      "STANDARD",
                      "cold.test",
                      "hot.test"
                  ],
                  "tier_targets": [
                      {
                          "key": "CLOUDTIER",
                          "val": {
                              "tier_type": "cloud-s3",
                              "storage_class": "CLOUDTIER",
                              "retain_head_object": "true",
                              "s3": {
                                  "endpoint": "http://10.0.210.010:8080",
                                  "access_key": "a21e86bce636c3aa2",
                                  "secret": "cf764951f1fdde5f",
                                  "region": "",
                                  "host_style": "path",
                                  "target_storage_class": "",
                                  "target_path": "dfqe-bucket-01",
                                  "acl_mappings": [],
                                  "multipart_sync_threshold": 44432,
                                  "multipart_min_part_size": 44432
                              }
                          }
                      }
                  ]
              }
          }
      ]
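      To double-check the tier configuration stored in the zonegroup, you can list the placement targets and inspect the tier_targets section (a quick sanity check using the example zonegroup above):

      [ceph: root@host01 /]# radosgw-admin zonegroup placement list --rgw-zonegroup=default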
  4. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1
    
    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
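    To confirm that the gateway daemons have come back up after the restart, you can list them with the orchestrator, for example by filtering on the rgw daemon type:

    [ceph: root@host01 /]# ceph orch ps --daemon-type rgw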
  5. Exit the cephadm shell and, as the root user, configure Amazon S3 on the bootstrapped node:

    Example

    [root@host01 ~]# s3cmd --configure
    
    Enter new values or accept defaults in brackets with Enter.
    Refer to user manual for detailed description of all options.
    
    Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
    Access Key: a21e86bce636c3aa2
    Secret Key: cf764951f1fdde5f
    Default Region [US]:
    
    Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
    S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80
    
    Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
    if the target S3 system supports dns based buckets.
    DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80
    
    Encryption password is used to protect your files from reading
    by unauthorized persons while in transfer to S3
    Encryption password:
    Path to GPG program [/usr/bin/gpg]:
    
    When using secure HTTPS protocol all communication with Amazon S3
    servers is protected from 3rd party eavesdropping. This method is
    slower than plain HTTP, and can only be proxied with Python 2.7 or newer
    Use HTTPS protocol [Yes]: No
    
    On some networks all internet access must go through a HTTP proxy.
    Try setting it here if you can't connect to S3 directly
    HTTP Proxy server name:
    
    New settings:
      Access Key: a21e86bce636c3aa2
      Secret Key: cf764951f1fdde5f
      Default Region: US
      S3 Endpoint: 10.0.210.78:80
      DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
      Encryption password:
      Path to GPG program: /usr/bin/gpg
      Use HTTPS protocol: False
      HTTP Proxy server name:
      HTTP Proxy server port: 0
    
    Test access with supplied credentials? [Y/n] Y
    Please wait, attempting to list all buckets...
    Success. Your access key and secret key worked fine :-)
    
    Now verifying that encryption works...
    Not configured. Never mind.
    
    Save settings? [y/N] y
    Configuration saved to '/root/.s3cfg'
  6. Create the S3 bucket.

    Syntax

    s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd mb s3://awstestbucket
    Bucket 's3://awstestbucket/' created
  7. Create a file, add the data to it, and move it to the S3 service.

    Syntax

    s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3

    Example

    [root@host01 ~]# s3cmd put test.txt s3://awstestbucket
    
    upload: 'test.txt' -> 's3://awstestbucket/test.txt'  [1 of 1]
     21 of 21   100% in    1s    16.75 B/s  done
  8. Create the lifecycle configuration transition policy.

    Syntax

    <LifecycleConfiguration>
      <Rule>
        <ID>RULE_NAME</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>DAYS</Days>
          <StorageClass>STORAGE_CLASS_NAME</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>

    Example

    [root@host01 ~]# cat lc_cloud.xml
    <LifecycleConfiguration>
      <Rule>
        <ID>Archive all objects</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>2</Days>
          <StorageClass>CLOUDTIER</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>
  9. Set the lifecycle configuration transition policy:

    Syntax

    s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
    
    s3://awstestbucket/: Lifecycle Policy updated
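    You can read the policy back to confirm that it was applied, for example:

    [root@host01 ~]# s3cmd getlifecycle s3://awstestbucket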
  10. Log in to the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell
  11. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1
    
    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'

Verification

  1. On the source cluster, verify that the data has moved to S3 with the radosgw-admin lc list command:

    Example

    [ceph: root@host01 /]# radosgw-admin lc list
    [
        {
            "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
            "started": "Mon, 26 Sep 2022 18:32:07 GMT",
            "status": "COMPLETE"
        }
    ]
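    If the status is not yet COMPLETE, lifecycle processing can be triggered manually instead of waiting for the scheduled run, for example:

    [ceph: root@host01 /]# radosgw-admin lc process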
  2. Verify the object transition at the cloud endpoint:

    Example

    [root@client ~]$ radosgw-admin bucket list
    [
        "awstestbucket"
    ]
  3. List the objects in the bucket:

    Example

    [root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint-url=http://10.0.209.002:8080
    {
        "Contents": [
            {
                "Key": "awstestbucket/test",
                "LastModified": "2022-08-25T16:14:23.118Z",
                "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
                "Size": 29,
                "StorageClass": "STANDARD",
                "Owner": {
                    "DisplayName": "test-user",
                    "ID": "test-user"
                }
            }
        ]
    }
  4. List the contents of the S3 bucket:

    Example

    [root@host01 ~]# s3cmd ls s3://awstestbucket
    2022-08-25 09:57            0  s3://awstestbucket/test.txt
  5. Check the information of the file:

    Example

    [root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
    s3://awstestbucket/test.txt (object):
       File size: 0
       Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
       MIME type: text/plain
       Storage:   CLOUDTIER
       MD5 sum:   991d2528bb41bb839d1a9ed74b710794
       SSE:       none
       Policy:    none
       CORS:      none
       ACL:       test-user: FULL_CONTROL
       x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root
  6. Download data locally from Amazon S3:

    1. Configure AWS:

      Example

      [client@client01 ~]$ aws configure
      
      AWS Access Key ID [****************6VVP]:
      AWS Secret Access Key [****************pXqy]:
      Default region name [us-east-1]:
      Default output format [json]:
    2. List the contents of the AWS bucket:

      Example

      [client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
      PRE awstestbucket/
    3. Download data from S3:

      Example

      [client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
      
      download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
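      Optionally, verify the integrity of the downloaded file by comparing its checksum with the MD5 sum reported by s3cmd info earlier, for example:

      [client@client01 ~]$ md5sum test.txt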