Getting Data from AWS¶
We store our data in AWS S3 buckets. There are two buckets, one for G002 and one for G003. To access the buckets, you need to obtain credentials. The buckets are HIPAA compliant and encrypted.
Obtaining AWS credentials¶
To get AWS credentials, email Troy.
There are two scenarios for access.
- You already have an AWS key. In that case, email us your IAM ARN and we will add you to the IAM group.
- You don't have an AWS account and need to be added to the SchiefLab group. In that case, you will receive an email with login instructions, your OTP (one-time password), and your security credentials. The security credentials are your AWS access key and secret key, which are used to access the data programmatically.
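For the first scenario, the IAM ARN can be looked up from the command line. The following is a minimal sketch, assuming the AWS CLI is already installed and configured with your existing key; the function name `show_iam_arn` is hypothetical and only wraps the standard `aws sts get-caller-identity` call.

```shell
# Print the ARN of the identity behind the currently configured credentials.
# `show_iam_arn` is a hypothetical helper name, not part of the AWS CLI.
show_iam_arn() {
  aws sts get-caller-identity --query Arn --output text
}
```

Running `show_iam_arn` prints a single line starting with `arn:aws:iam::`, which is the value to include in the email.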
AWS G003¶
To get G003 data, we use AWS buckets. The bucket S3 URI is s3://iavig003sabucket/g003/. The data is organized by sequencing, sorting, and output. To get the data, you need the AWS CLI.
To get AWS CLI, follow the instructions here.
Once you get the AWS CLI, you need to configure it. Follow the instructions here.
$ aws configure
AWS Access Key ID [None]: Access key in email credentials
AWS Secret Access Key [None]: Secret key in email credentials
Default region name [None]: us-west-2
Default output format [None]: json
Once you are configured, we can check if you have access to the S3 bucket.
$ aws s3 ls s3://iavig003sabucket/ --region af-south-1
PRE g003/
PRE raw_sequences
G003
You have access to the bucket if the output shows PRE g003/.
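The same check can be scripted. This is a minimal sketch that wraps the listing command above and looks for the `PRE g003/` line; the function name `has_g003_access` is hypothetical.

```shell
# Succeed (exit status 0) if the bucket listing contains the g003/ prefix,
# fail (non-zero) otherwise. `has_g003_access` is a hypothetical helper name.
has_g003_access() {
  aws s3 ls s3://iavig003sabucket/ --region af-south-1 | grep -q 'PRE g003/'
}

# Example use:
#   has_g003_access && echo "access OK" || echo "no access"
```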
AWS S3 Copy Specific Files from AWS¶
The commands below copy only specific parts of the bucket.
Get all sorting files
$ aws s3 cp --recursive s3://iavig003sabucket/g003/sorting/ buckets/g003/sorting
---> 100%
Get all sequencing files
$ aws s3 cp --recursive s3://iavig003sabucket/g003/sequencing/ buckets/g003/sequencing
---> 100%
Get the sequencing and sorting directory, excluding large bcl and fastq files. The exclude patterns are quoted so the shell does not expand them before the AWS CLI sees them.
$ aws s3 cp --recursive s3://iavig003sabucket/g003/ ./g003/sequencing --exclude "*working_directory/*" --exclude "*.fastq.gz" --exclude "*.tif" --exclude "*.cbcl" --exclude "*.imf1" --exclude "*.filter" --exclude "*.bin" --exclude "*Logs/*" --exclude "*_stdout" --exclude "*_stderr" --exclude "*Autofocus/*" --exclude "*Intensities/*"
---> 100%
Get all output files
$ aws s3 cp --recursive s3://iavig003sabucket/g003/output/ buckets/g003/output
---> 100%
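Before starting a large copy, it can help to preview what would be transferred. The sketch below uses the standard `--dryrun` option of `aws s3 cp`, which lists the planned operations without downloading anything. The function name `preview_g003_copy` and the choice of two exclude patterns are illustrative assumptions.

```shell
# Preview a recursive copy of one g003 subdirectory without downloading anything.
# `preview_g003_copy` is a hypothetical helper; pass sorting, sequencing, or output.
preview_g003_copy() {
  local subdir="$1"
  aws s3 cp --recursive --dryrun \
    "s3://iavig003sabucket/g003/${subdir}/" "buckets/g003/${subdir}" \
    --exclude "*.fastq.gz" --exclude "*.cbcl"
}
```

For example, `preview_g003_copy sorting` prints one `(dryrun)` line per file that the real copy would fetch.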
AWS S3 Copy Local Files to AWS¶
Copy a directory to AWS. This will copy the entire directory and all subdirectories.
$ aws s3 cp --recursive 221118_VH00124_107_AAAW2V3HV s3://iavig003sabucket/raw_sequences/
---> 100%
221118_VH00124_107_AAAW2V3HV
This is the flow cell directory from the sequencer. If you are an end user, upload to s3://iavig003sabucket/raw_sequences.
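Note that a recursive `aws s3 cp` of a local directory places the directory's contents directly under the destination prefix, not under a subfolder named after the directory. If each run should live under its own prefix (an assumption about the desired layout, not something stated here), a sketch like the following appends the flow cell directory name to the destination. The function name `upload_flow_cell` is hypothetical.

```shell
# Upload a flow cell directory so that it lands under raw_sequences/<run name>/.
# `upload_flow_cell` is a hypothetical helper; the per-run prefix is an assumption.
upload_flow_cell() {
  local run_dir="${1%/}"               # strip a trailing slash, if any
  local run_name
  run_name="$(basename "$run_dir")"    # e.g. 221118_VH00124_107_AAAW2V3HV
  aws s3 cp --recursive "$run_dir" "s3://iavig003sabucket/raw_sequences/${run_name}/"
}
```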
AWS S3 Sync All Files from AWS¶
Sync the entire bucket:
$ aws s3 sync --delete s3://iavig003sabucket/ buckets/ --region af-south-1
---> 100%
Warning: Large File
The entire bucket will likely be over 2 TB.
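To see how much data a sync would involve, the total size can be queried first. This sketch relies on the standard `--summarize` and `--human-readable` options of `aws s3 ls`; the function name `g003_total_size` is hypothetical. Listing every object in a large bucket can itself take a while.

```shell
# Print the "Total Size" summary line for the bucket, or for a prefix inside it.
# `g003_total_size` is a hypothetical helper; the optional argument is a prefix
# such as g003/sorting/.
g003_total_size() {
  aws s3 ls --summarize --human-readable --recursive \
    "s3://iavig003sabucket/${1:-}" --region af-south-1 | grep 'Total Size:'
}
```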
--delete
The --delete flag deletes any files in the destination that are not in the source. This is useful for keeping the destination in sync with the source. If you don't want to delete files, remove the --delete flag.
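Because `--delete` removes local files, it may be worth previewing the deletions first. The sketch below adds the standard `--dryrun` option and keeps only the lines reporting deletions, assuming the CLI's usual `(dryrun) delete:` output prefix. The function name `preview_sync_deletes` is hypothetical.

```shell
# List the files a full sync with --delete would remove, without changing anything.
# `preview_sync_deletes` is a hypothetical helper. It returns non-zero when
# nothing would be deleted, because grep finds no matching lines.
preview_sync_deletes() {
  aws s3 sync --delete --dryrun s3://iavig003sabucket/ buckets/ --region af-south-1 \
    | grep '^(dryrun) delete:'
}
```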
Sync
The sync command can also be used on specific file subsets, e.g. sorting.
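A subset sync might look like the following sketch, which mirrors the paths used in the copy commands earlier and omits `--delete` so that no local files are removed. The function name `sync_g003_subset` is hypothetical.

```shell
# Sync one g003 subdirectory (sorting, sequencing, or output) to the local copy.
# `sync_g003_subset` is a hypothetical helper name.
sync_g003_subset() {
  local subdir="$1"
  aws s3 sync "s3://iavig003sabucket/g003/${subdir}/" "buckets/g003/${subdir}" \
    --region af-south-1
}
```

For example, `sync_g003_subset sorting` fetches only new or changed files under the sorting directory.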
s3://iavig003sabucket/g003
The subpath g003 is the main G003 data directory within the bucket. The top level may also contain other app-related files.