Transitioning data to Amazon S3 cloud service
You can transition data to a remote cloud service as part of the lifecycle configuration by using storage classes, which reduces cost and improves manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon S3.

Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data is transitioned. These storage classes do not need a data pool and are defined in terms of the zonegroup placement targets.
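Before you add a new storage class, you can review the placement targets and storage classes that are already defined in the zonegroup by dumping the zonegroup configuration. This is a minimal sketch; the placement_targets section of the JSON output lists the existing targets:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup get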
Prerequisites
- An IBM Storage Ceph cluster with the Ceph Object Gateway installed.
- User credentials for the remote cloud service, Amazon S3.
- Target path created on Amazon S3.
- s3cmd installed on the bootstrapped node.
- Amazon AWS configured locally to download data.
Procedure
- Create a user with an access key and a secret key:

Syntax

radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]

Example

[ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
{
    "user_id": "test-user",
    "display_name": "test-user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "test-user",
            "access_key": "a21e86bce636c3aa1",
            "secret_key": "cf764951f1fdde5e"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
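If you need to review the user or retrieve the generated keys later, you can display the user information; for example:

Example

[ceph: root@host01 /]# radosgw-admin user info --uid=test-user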
- On the bootstrapped node, add a storage class with the tier type cloud-s3:

Note: Once a storage class is created with the --tier-type=cloud-s3 option, it cannot later be modified to any other storage class type.

Syntax

radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
  --placement-id=PLACEMENT_ID \
  --storage-class=STORAGE_CLASS_NAME \
  --tier-type=cloud-s3

Example

[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
  --placement-id=default-placement \
  --storage-class=CLOUDTIER \
  --tier-type=cloud-s3
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "false",
                        "s3": {
                            "endpoint": "",
                            "access_key": "",
                            "secret": "",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 33554432,
                            "multipart_min_part_size": 33554432
                        }
                    }
                }
            ]
        }
    }
]
- Update the storage class:

Note: If the cluster is part of a multi-site setup, run period update --commit so that the zonegroup changes are propagated to all the zones in the multi-site, as shown after this step.

Note: Make sure the access_key and secret do not start with a digit.

Mandatory parameters are:

- access_key is the remote cloud S3 access key used for the specific connection.
- secret is the secret key for the remote cloud S3 service.
- endpoint is the URL of the remote cloud S3 service endpoint.
- region (for AWS) is the remote cloud S3 service region name.

Optional parameters are:

- target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.
- target_storage_class defines the target storage class to which the object transitions. If not specified, the object is transitioned to the STANDARD storage class.
- retain_head_object, if true, retains the metadata of the object transitioned to the cloud. If false (default), the object is deleted after the transition. This option is ignored for current versioned objects.
- multipart_sync_threshold specifies that objects of this size or larger are transitioned to the cloud by using multipart upload.
- multipart_min_part_size specifies the minimum part size to use when transitioning objects by using multipart upload.

Syntax

radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
  --placement-id PLACEMENT_ID \
  --storage-class STORAGE_CLASS_NAME \
  --tier-config=endpoint=AWS_ENDPOINT_URL,\
access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
target_path="TARGET_BUCKET_ON_AWS",\
multipart_sync_threshold=44432,\
multipart_min_part_size=44432,\
retain_head_object=true,\
region=REGION_NAME

Example

[ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default --placement-id default-placement \
  --storage-class CLOUDTIER \
  --tier-config=endpoint=http://10.0.210.010:8080,\
access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
target_path="dfqe-bucket-01",\
multipart_sync_threshold=44432,\
multipart_min_part_size=44432,\
retain_head_object=true,\
region=us-east-1
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD",
                "cold.test",
                "hot.test"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "true",
                        "s3": {
                            "endpoint": "http://10.0.210.010:8080",
                            "access_key": "a21e86bce636c3aa2",
                            "secret": "cf764951f1fdde5f",
                            "region": "us-east-1",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "dfqe-bucket-01",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 44432,
                            "multipart_min_part_size": 44432
                        }
                    }
                }
            ]
        }
    }
]
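If the cluster is part of a multi-site setup, commit the updated period as described in the note above so that the zonegroup changes are propagated to all the zones:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit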
- Restart the Ceph Object Gateway:

Syntax

ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
- Exit the shell and, as the root user, configure Amazon S3 on your bootstrapped node:

Example

[root@host01 ~]# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: a21e86bce636c3aa2
Secret Key: cf764951f1fdde5f
Default Region [US]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: a21e86bce636c3aa2
  Secret Key: cf764951f1fdde5f
  Default Region: US
  S3 Endpoint: 10.0.210.78:80
  DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
- Create the S3 bucket:

Syntax

s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3

Example

[root@host01 ~]# s3cmd mb s3://awstestbucket
Bucket 's3://awstestbucket/' created
- Create a file, add your data to it, and move it to the S3 service:

Syntax

s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3

Example

[root@host01 ~]# s3cmd put test.txt s3://awstestbucket
upload: 'test.txt' -> 's3://awstestbucket/test.txt' [1 of 1]
 21 of 21   100% in    1s    16.75 B/s  done
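The example assumes that a test.txt file already exists in the working directory. A minimal, hypothetical way to create one with sample content:

Example

[root@host01 ~]# echo "test data" > test.txt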
- Create the lifecycle configuration transition policy:

Syntax

<LifecycleConfiguration>
  <Rule>
    <ID>RULE_NAME</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>DAYS</Days>
      <StorageClass>STORAGE_CLASS_NAME</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>

Example

[root@host01 ~]# cat lc_cloud.xml
<LifecycleConfiguration>
  <Rule>
    <ID>Archive all objects</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>2</Days>
      <StorageClass>CLOUDTIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
- Set the lifecycle configuration transition policy:

Syntax

s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3

Example

[root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
s3://awstestbucket/: Lifecycle Policy updated
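To confirm that the policy is in place, you can read it back from the bucket; a minimal sketch:

Example

[root@host01 ~]# s3cmd getlifecycle s3://awstestbucket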
- Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell
- Restart the Ceph Object Gateway:

Syntax

ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
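Lifecycle transitions are applied on the lifecycle processing schedule, so the transition can take time. In a test environment, you can start lifecycle processing manually instead of waiting:

Example

[ceph: root@host01 /]# radosgw-admin lc process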
Verification
- On the source cluster, verify whether the data has moved to S3 with the radosgw-admin lc list command:

Example

[ceph: root@host01 /]# radosgw-admin lc list
[
    {
        "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
        "started": "Mon, 26 Sep 2022 18:32:07 GMT",
        "status": "COMPLETE"
    }
]
- Verify the object transition at the cloud endpoint:

Example

[root@client ~]$ radosgw-admin bucket list
[
    "awstestbucket"
]
- List the objects in the bucket:

Example

[root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint=http://10.0.209.002:8080
{
    "Contents": [
        {
            "Key": "awstestbucket/test",
            "LastModified": "2022-08-25T16:14:23.118Z",
            "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
            "Size": 29,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "test-user",
                "ID": "test-user"
            }
        }
    ]
}
- List the contents of the S3 bucket:

Example

[root@host01 ~]# s3cmd ls s3://awstestbucket
2022-08-25 09:57            0  s3://awstestbucket/test.txt
- Check the information of the file:

Example

[root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
s3://awstestbucket/test.txt (object):
   File size: 0
   Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
   MIME type: text/plain
   Storage:   CLOUDTIER
   MD5 sum:   991d2528bb41bb839d1a9ed74b710794
   SSE:       none
   Policy:    none
   CORS:      none
   ACL:       test-user: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root
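The file size is shown as 0 because, with retain_head_object set to true, only the head object with the metadata remains on the source cluster; the Storage field confirms the CLOUDTIER storage class. Reading the transitioned object back through the Ceph Object Gateway is expected to fail with an InvalidObjectState error; a hypothetical sketch of that behavior:

Example

[root@host01 ~]# s3cmd get s3://awstestbucket/test.txt
ERROR: S3 error: 403 (InvalidObjectState)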
- Download data locally from Amazon S3:

  - Configure AWS:

    Example

    [client@client01 ~]$ aws configure
    AWS Access Key ID [****************6VVP]:
    AWS Secret Access Key [****************pXqy]:
    Default region name [us-east-1]:
    Default output format [json]:

  - List the contents of the AWS bucket:

    Example

    [client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
                               PRE awstestbucket/

  - Download data from S3:

    Example

    [client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
    download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
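To verify the round trip, you can inspect the contents of the downloaded file; a trivial sketch:

Example

[client@client01 ~]$ cat test.txt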