Synchronizing with AWS S3 Storage

Sync can synchronize files when the source or destination is AWS S3 Cloud Object Storage. Each endpoint (HSTS) of the async session must be configured to support Sync, and the async command must include certain file system-related options.

About this task

Capabilities:

  • Noncontinuous PUSH, PULL, and BIDI synchronization between a local disk and AWS S3, and between S3 buckets.
  • Continuous PULL and BIDI synchronization when S3 is the content source, which requires the --scan-interval option.

Requirements:

  • An IBM Aspera On-Demand instance in AWS S3, or HSTS for Linux or Windows installed on a virtual machine instance in AWS with Trapd enabled. For instructions on setting up an HSTS in the cloud, see Server Setup in Amazon EC2/Amazon S3.
  • The S3 instance must have an on-demand entitlement and an Aspera Sync-enabled license.
  • The async binary must be installed on both the source and destination servers.
  • Configure the S3 instance, or both S3 endpoints if you are running an S3-to-S3 synchronization, as described in the following steps.

Procedure

  1. SSH into your instance as ec2-user by running the following command.
    The command is for Linux but also works on Mac. Windows users must use an SSH tool, such as PuTTY.
    # ssh -i identity_file -p 33001 ec2-user@ec2_host_ip
  2. Elevate to root privileges by running the following command:
    # su -
  3. Set an S3 docroot for the system account user that is used to run async.
    # asconfigurator -x "set_user_data;user_name,username;absolute,s3://s3.amazonaws.com/bucketname"

    If you are not using IAM roles, then you must also specify the S3 credentials in your docroot:

    s3://access_id:secret_key@s3.amazonaws.com/my_bucket

    Setting the docroot for the system user makes the account an Aspera transfer user.

  4. Set database and log directories for async.
    These directories must be located in /mnt/ephemeral/data. The /mnt/ephemeral/ directory is no-cost ephemeral storage that is associated with your instance. Create a directory that is named for the transfer user, and give the transfer user write access to it.
    For example, if the transfer user is ec2_user, run the following commands to create the directory /mnt/ephemeral/data/ec2_user, create the database and log subdirectories, give ec2_user write access, and set the directories as the location for the database and logs:
    # mkdir -p /mnt/ephemeral/data
    # mkdir /mnt/ephemeral/data/ec2_user
    # mkdir /mnt/ephemeral/data/ec2_user/db
    # mkdir /mnt/ephemeral/data/ec2_user/log
    # chown -R ec2_user /mnt/ephemeral/data/ec2_user
    # asconfigurator -x "set_node_data;async_db_dir,/mnt/ephemeral/data/ec2_user/db"
    # asconfigurator -x "set_node_data;async_log_dir,/mnt/ephemeral/data/ec2_user/log"
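
Steps 3 and 4 can be sketched as a small dry-run helper that prints the commands for a given transfer user instead of running them, so the output can be reviewed before it is run as root. The function names docroot_command and async_dir_commands are hypothetical and are not part of Sync or asconfigurator. The sketch assumes that IAM roles are in use (no credentials in the docroot) and uses only the asconfigurator settings shown in the preceding steps.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Sync): print, but do not run, the
# configuration commands from steps 3 and 4 for one transfer user.

# Step 3: S3 docroot for the transfer user.
# $1 = transfer user, $2 = bucket name. Assumes IAM roles, so the
# docroot URL carries no access ID or secret key.
docroot_command() {
    printf 'asconfigurator -x "set_user_data;user_name,%s;absolute,s3://s3.amazonaws.com/%s"\n' "$1" "$2"
}

# Step 4: database and log directories under /mnt/ephemeral/data.
# $1 = transfer user.
async_dir_commands() {
    base="/mnt/ephemeral/data/$1"
    # mkdir -p creates the parent directories as needed.
    printf 'mkdir -p %s/db %s/log\n' "$base" "$base"
    printf 'chown -R %s %s\n' "$1" "$base"
    printf 'asconfigurator -x "set_node_data;async_db_dir,%s/db"\n' "$base"
    printf 'asconfigurator -x "set_node_data;async_log_dir,%s/log"\n' "$base"
}

# Example: print the commands for transfer user ec2_user and bucket bucketname.
docroot_command ec2_user bucketname
async_dir_commands ec2_user
```

Printing the commands rather than running them keeps the sketch safe to try on any host. On the instance itself, the printed lines can be run from the root shell after they are checked.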

Examples of Sync to or from S3

About this task

Note: If the client is on the cloud storage host, the following options are required:
  • The log directory and local database directory must be specified by using the -L and -b options.
  • The --apply-local-docroot option must be used to transfer content into the object storage, rather than the local disk.

The following examples include the optional arguments --transfer-threads, --local-fs-threads, and --remote-fs-threads, which improve performance when one or both endpoints are in cloud storage.

One-time push from local disk to S3:

A one-time (noncontinuous) push from a local disk to an S3 bucket that authenticates with SSH keys, where ec2_user is the transfer user. For more information about using SSH keys, see Creating SSH keys.

# async -N sync-to-s3 -d /data/data-2017-01 -r ec2_user@192.0.4.24:/data -i /bobcat/.ssh/private_key -K push -B /mnt/ephemeral/data/db --transfer-threads=8 --remote-fs-threads=16

One-time bidi from S3 to local disk:

A one-time bidirectional sync that is run from the S3 client to a local disk:

# async -L /mnt/ephemeral/data/log --apply-local-docroot -N bidi_london -d /data -r bear@192.0.12.442:/data -K bidi -b /mnt/ephemeral/data/db -B /async/log --transfer-threads=8 --local-fs-threads=16

One-time pull from S3 to S3:

A one-time pull by ec2_user from s3host to /data/2017 in the client S3 storage:

# async -L /mnt/ephemeral/data/log --apply-local-docroot -N s3sync -d /data/2017 -r ec2_user@s3host:/data/2017-01 -K pull -b /tmp --transfer-threads=8 --local-fs-threads=16 --remote-fs-threads=16
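
The options that are required when the client is on the cloud storage host (-L, -b, and --apply-local-docroot) can be kept together in a wrapper so that none is forgotten. The following is a minimal sketch, not an Aspera tool: the function name s3_client_async_cmd is hypothetical, the log and database paths under /mnt/ephemeral/data are assumed from the procedure, and the thread counts are copied from the examples rather than tuned. It only prints the command line and uses only options that appear in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sync): print an async command line for a
# client that runs on the cloud storage host.
# $1 = session name, $2 = direction (push, pull, or bidi),
# $3 = local directory, $4 = remote endpoint as user@host:/path
s3_client_async_cmd() {
    state="/mnt/ephemeral/data"   # assumed location for local logs and database
    printf 'async -L %s/log --apply-local-docroot -N %s -d %s -r %s -K %s -b %s/db --transfer-threads=8 --local-fs-threads=16\n' \
        "$state" "$1" "$3" "$4" "$2" "$state"
}

# Example: a one-time pull into the client S3 storage.
s3_client_async_cmd s3sync pull /data/2017 ec2_user@s3host:/data/2017-01
```

If the remote endpoint is also in cloud storage, --remote-fs-threads can be appended to the printed command, as in the S3-to-S3 example.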