
Getting Data from AWS

We store our data in AWS S3 buckets, one for G002 and one for G003. The buckets are HIPAA compliant and encrypted, so you need to obtain credentials before you can access them.

Obtaining AWS credentials

To get AWS credentials, email Troy to request access.

There are two scenarios for access.

  1. You already have an AWS key. In that case, email us your IAM ARN and we will add you to the IAM group.

  2. You don't have an AWS account and need to be added to the SchiefLab group. In that case, you will receive an email with login instructions, your OTP (one-time password) and your security credentials. The security credentials are your AWS access key and secret key, which you use to access the data programmatically.
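For the first scenario, one way to look up your IAM ARN is sketched below. It assumes the AWS CLI is already installed and configured with your existing key; `get_iam_arn` is a hypothetical helper name, not part of the CLI.

```shell
# Print the ARN of the identity behind the currently configured credentials.
# Assumption: the AWS CLI is installed and your existing key is configured.
get_iam_arn() {
    aws sts get-caller-identity --query Arn --output text
}
```

Running `get_iam_arn` prints a single ARN string that you can paste into your email.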

AWS G003

G003 data is stored in an S3 bucket. The bucket's S3 URI is s3://iavig003sabucket/g003/. The data is organized into sequencing, sorting and output directories. To download it, you need the AWS CLI.

To get AWS CLI, follow the instructions here.

Once you get the AWS CLI, you need to configure it. Follow the instructions here.

$ aws configure
AWS Access Key ID [None]: Access key in email credentials
AWS Secret Access Key [None]: Secret key in email credentials
Default region name [None]: us-west-2
Default output format [None]: json
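
If you already use the default profile for another AWS account, the same settings can probably be stored under a named profile instead. This is a minimal sketch, not the lab's documented setup: `configure_profile` and the profile name are hypothetical, and the region and output format are copied from the prompts above.

```shell
# Store the emailed credentials under a named profile rather than the default.
# Arguments: $1 = profile name (hypothetical), $2 = access key, $3 = secret key.
configure_profile() {
    aws configure set aws_access_key_id "$2" --profile "$1"
    aws configure set aws_secret_access_key "$3" --profile "$1"
    # Region and output format mirror the interactive answers shown above.
    aws configure set region us-west-2 --profile "$1"
    aws configure set output json --profile "$1"
}
```

After calling it, for example `configure_profile schieflab "$ACCESS_KEY" "$SECRET_KEY"`, setting `export AWS_PROFILE=schieflab` makes later `aws` commands in that shell use the profile.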

Once the CLI is configured, check that you have access to the S3 bucket.

$ aws s3 ls s3://iavig003sabucket/ --region af-south-1
PRE g003/
PRE raw_sequences

G003

You have access to the bucket if you are shown the PRE g003/ output.
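
The check above can also be scripted. The sketch below wraps the same listing command and looks for the `PRE g003/` line; `check_g003_access` is a hypothetical helper name.

```shell
# Succeed (exit status 0) only if the bucket listing shows the g003/ prefix.
# Bucket name and region are the ones used in the listing command above.
check_g003_access() {
    aws s3 ls s3://iavig003sabucket/ --region af-south-1 | grep -q 'PRE g003/'
}
```

For example, `check_g003_access && echo "access OK"` prints the message only when the prefix is visible.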

AWS S3 Copy Specific Files from AWS

The commands below copy only certain parts of the bucket. The first one gets all sorting files:

$ aws s3 cp --recursive  s3://iavig003sabucket/g003/sorting/ buckets/g003/sorting
---> 100%

Get all sequencing files

$ aws s3 cp --recursive  s3://iavig003sabucket/g003/sequencing/ buckets/g003/sequencing
---> 100%

Get the sequencing and sorting directories, excluding large BCL and FASTQ files. The patterns are quoted so that your shell passes them to the AWS CLI unexpanded.

$ aws s3 cp --recursive s3://iavig003sabucket/g003/ ./g003/sequencing --exclude "*working_directory/*" --exclude "*.fastq.gz" --exclude "*.tif" --exclude "*.cbcl" --exclude "*.imf1" --exclude "*.filter" --exclude "*.bin" --exclude "*Logs/*" --exclude "*_stdout" --exclude "*_stderr" --exclude "*Autofocus/*" --exclude "*Intensities/*"
---> 100%

$ aws s3 cp --recursive  s3://iavig003sabucket/g003/output/ buckets/g003/output
---> 100%
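
Before starting a large filtered copy, it can help to preview what would be transferred. The sketch below uses the CLI's `--dryrun` flag with a shortened subset of the exclude patterns; `preview_g003_copy` is a hypothetical helper name and the destination is whatever path you pass in.

```shell
# Show which files a filtered copy would transfer, without downloading anything.
# Argument: $1 = local destination directory.
# Only a few of the exclude patterns are listed here, for brevity.
preview_g003_copy() {
    aws s3 cp --recursive --dryrun s3://iavig003sabucket/g003/ "$1" \
        --exclude "*working_directory/*" \
        --exclude "*.fastq.gz" \
        --exclude "*.cbcl" \
        --exclude "*.tif"
}
```

Calling `preview_g003_copy ./g003` lists the planned transfers; dropping `--dryrun` from the command performs the copy.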

AWS S3 Copy Local Files to AWS

Copy a directory to AWS. This will copy the entire directory and all subdirectories.

$ aws s3 cp --recursive  221118_VH00124_107_AAAW2V3HV  s3://iavig003sabucket/raw_sequences/
---> 100%

221118_VH00124_107_AAAW2V3HV

This is the flow cell directory from the sequencer. If you are an end user, use s3://iavig003sabucket/raw_sequences as the destination.

AWS S3 Sync All Files from AWS

Sync the entire bucket to a local directory:

$ aws s3 sync --delete  s3://iavig003sabucket/ buckets/ --region af-south-1
---> 100%

Warning: Large File

The entire bucket will likely be over 2 TB.

--delete

The --delete flag will delete any files in the destination that are not in the source. This is useful for keeping the destination in sync with the source. If you don't want to delete files, remove the --delete flag.
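
Because `--delete` removes local files, a dry run is a cautious first step. The sketch below filters the dry-run output for deletions only. `preview_sync_deletions` is a hypothetical helper name, and the `(dryrun) delete:` prefix is an assumption about the CLI's output format that you should confirm on your installed version.

```shell
# List the local files that a sync with --delete would remove, changing nothing.
# Argument: $1 = local destination directory (e.g. buckets/).
# Assumption: dry-run output marks removals with lines starting "(dryrun) delete:".
preview_sync_deletions() {
    aws s3 sync --delete --dryrun s3://iavig003sabucket/ "$1" --region af-south-1 \
        | grep '^(dryrun) delete:'
}
```

If `preview_sync_deletions buckets/` prints nothing, the sync would not delete any local files.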

Sync

The sync command can also be used on specific file subsets, e.g. sorting.
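
A subset sync might look like the sketch below, which mirrors the local layout used by the copy commands earlier (buckets/g003/...). `sync_g003_subset` is a hypothetical helper name.

```shell
# Sync one subset of g003 (sorting, sequencing or output) to buckets/g003/<subset>.
# Argument: $1 = subset name.
sync_g003_subset() {
    aws s3 sync "s3://iavig003sabucket/g003/$1/" "buckets/g003/$1" --region af-south-1
}
```

For example, `sync_g003_subset sorting` fetches only new or changed sorting files. It omits `--delete`, so nothing is removed locally.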

s3://iavig003sabucket/g003

The g003 subpath holds the main G003 data. The top level of the bucket may also contain other app-related files.