Summary¶
AWS S3 buckets contain a stable version of the data. The data is organized by sequencing, sorting and output. To get the data using Simple Setup, you need the AWS CLI.
If you are a developer and would like the most recent version of the data with syncing functionality, please see Advanced Setup.
Simple Setup¶
Getting Data from AWS¶
We store our data on AWS S3 Buckets. We have two buckets, one for G002 and one for G003. To get access to the buckets, you need to obtain credentials. Buckets are HIPAA compliant and are encrypted.
Obtaining AWS credentials¶
To get AWS credentials, email Troy to request access.
There are two scenarios for access.
- You already have an AWS key. In that case, email us your IAM ARN and we will add you to the IAM group.
- You don't have an AWS account and need to be added to the SchiefLab group. In that case, you will receive an email with login instructions, your OTP (one-time password) and your security credentials. The security credentials are your AWS access key and secret key, which are used to access the data programmatically.
AWS G002¶
To get G002 data, we use AWS buckets. The bucket S3 URI is s3://iavig002westbucket/g002/. The data is organized by sequencing, sorting and output. To get the data, you need the AWS CLI.
To get AWS CLI, follow the instructions here.
Once you get the AWS CLI, you need to configure it. Follow the instructions here.
$ aws configure
AWS Access Key ID [None]: Access key in email credentials
AWS Secret Access Key [None]: Secret key in email credentials
Default region name [None]: us-west-2
Default output format [None]: json
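As an alternative to the interactive prompt, the AWS CLI also reads its settings from environment variables. The following is a minimal sketch, assuming you replace the placeholder values with the access key and secret key from your credentials email; the placeholders themselves are illustrative and not real keys.

```shell
# Placeholder values (assumptions): substitute the keys from your credentials email.
export AWS_ACCESS_KEY_ID="REPLACE_WITH_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="REPLACE_WITH_SECRET_ACCESS_KEY"
# Same region and output format as in the interactive example above.
export AWS_DEFAULT_REGION="us-west-2"
export AWS_DEFAULT_OUTPUT="json"
```

These variables only last for the current shell session, whereas aws configure writes the values to files under ~/.aws/ so that they persist.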
Once you are configured, we can check if you have access to the S3 bucket.
$ aws s3 ls s3://iavig002westbucket/
PRE g002/
G002
You have access to the bucket if you are shown the PRE g002/ output.
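If you want the same check in a script, it can be wrapped in a small function that reports the result in words. This is only a sketch: check_bucket_access is a hypothetical helper name, not part of the AWS CLI, and it relies solely on the exit status of aws s3 ls.

```shell
# check_bucket_access BUCKET_URI
# Hypothetical helper (not part of the AWS CLI) wrapping the `aws s3 ls` check.
check_bucket_access() {
  bucket_uri="$1"
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    return 2
  fi
  # A non-zero exit status means the listing failed
  # (missing permissions, bad credentials, or no network).
  if aws s3 ls "$bucket_uri" >/dev/null 2>&1; then
    echo "access OK: $bucket_uri"
  else
    echo "no access: $bucket_uri" >&2
    return 1
  fi
}
```

For example, check_bucket_access s3://iavig002westbucket/ should print an "access OK" line once your credentials have been added to the group.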
AWS S3 Copy Specific Files¶
The commands below copy only specific components of the bucket.
Get all sorting files
$ aws s3 cp --recursive s3://iavig002westbucket/g002/G002/sorting/ buckets/g002/G002/sorting
---> 100%
Get all sequencing files
$ aws s3 cp --recursive s3://iavig002westbucket/g002/G002/sequencing/ buckets/g002/G002/sequencing
---> 100%
Get the sequencing directory, excluding large BCL and FASTQ files (the patterns are quoted so that the shell does not expand them)
$ aws s3 cp --recursive s3://iavig002westbucket/g002/G002/sequencing ./g002/G002/sequencing --exclude "*working_directory/*" --exclude "*.fastq.gz" --exclude "*.tif" --exclude "*.cbcl" --exclude "*.imf1" --exclude "*.filter" --exclude "*.bin" --exclude "*Logs/*" --exclude "*_stdout" --exclude "*_stderr" --exclude "*Autofocus/*" --exclude "*Intensities/*"
---> 100%
Get all output files
$ aws s3 cp --recursive s3://iavig002westbucket/g002/G002/output/ buckets/g002/G002/output
---> 100%
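The full-directory copies above follow one pattern per component (sorting, sequencing, output), so they can be wrapped in a small function. This is a sketch rather than part of the project: copy_component and its optional destination-root argument are hypothetical names. The --dryrun flag is a standard AWS CLI option that prints the operations without transferring anything.

```shell
# copy_component COMPONENT [DEST_ROOT]
# Hypothetical wrapper around the `aws s3 cp --recursive` commands above.
copy_component() {
  component="$1"              # one of: sorting, sequencing, output
  dest_root="${2:-buckets}"   # local destination root (assumed default)
  case "$component" in
    sorting|sequencing|output) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  # --dryrun only lists what would be copied; drop it to download for real.
  aws s3 cp --recursive --dryrun \
    "s3://iavig002westbucket/g002/G002/${component}/" \
    "${dest_root}/g002/G002/${component}"
}
```

Running copy_component sorting previews the sorting download. Once the listing looks right, remove --dryrun from the function to perform the copy.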
AWS S3 Sync All Files¶
$ aws s3 sync --delete s3://iavig002westbucket/ buckets/
---> 100%
Warning: Large File
The entire bucket will likely be over 2 TB.
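Given the size, it may be worth checking free disk space before starting. The following is a rough sketch, assuming a POSIX df is available. The name has_free_space is a hypothetical helper, and the 2048 GB figure in the usage note only illustrates the 2 TB estimate.

```shell
# has_free_space DIR REQUIRED_GB
# Hypothetical helper: returns 0 if the filesystem holding DIR
# has at least REQUIRED_GB gigabytes available.
has_free_space() {
  dir="$1"
  required_gb="$2"
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  required_kb=$((required_gb * 1024 * 1024))
  [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, has_free_space . 2048 || echo "less than 2 TB free here" prints a warning when the current directory's filesystem is too small. The directory passed in must already exist.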
--delete
The --delete flag will delete any files in the destination that are not in the source. This is useful for keeping the destination in sync with the source. If you don't want to delete files, remove the --delete flag.
Sync
The sync command can also be used on specific file subsets, e.g. sorting.
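To sync only one subset, point both the source and the destination at that prefix. The sketch below always adds --dryrun, so nothing is transferred or deleted until you remove it. The name sync_subset is a hypothetical wrapper, and the local buckets/ layout is assumed to follow the copy examples above.

```shell
# sync_subset SUBSET [EXTRA_AWS_ARGS...]
# Hypothetical wrapper around `aws s3 sync` for a single subset.
sync_subset() {
  if [ "$#" -lt 1 ]; then
    echo "usage: sync_subset SUBSET [EXTRA_AWS_ARGS...]" >&2
    return 1
  fi
  subset="$1"   # e.g. sorting
  shift         # remaining arguments (e.g. --delete) are passed through
  # --dryrun previews the transfer; remove it to perform the sync.
  aws s3 sync --dryrun "$@" \
    "s3://iavig002westbucket/g002/G002/${subset}/" \
    "buckets/g002/G002/${subset}"
}
```

Running sync_subset sorting --delete previews the sync of the sorting data, including any local files that --delete would remove.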
Advanced Setup¶
For those users that need to mount Box or Globus, we have two sets of instructions below. The first is for a quick setup if you are using an AWS instance; otherwise, we recommend using the second set of instructions for a local machine.
- The 2 Available Options:
- AWS Instance Quick Setup
- Local Machine Setup
AWS Instance Setup¶
For this to work, all of the following must be true:
- You have a running AWS instance with a home directory for a user with sudo privileges.
- You have been invited to the G002 Box.
- You have been invited to the G002 Globus collection.
- G00x is installed on your AWS instance. If not, follow the instructions here.
Box Auto Sync¶
This could take 3+ hours, but it will be running in the background using systemd.
$ g00x g002 box setup
## Click the link and sign into your Box account when prompted
To check on the status of the sync, run the following:
$ g00x g002 box status
Globus Auto Sync¶
This could take 1+ hours, but it will be running in the background using systemd.
$ g00x g002 globus setup
## Click the link and sign in to your Globus account when prompted
To check on the status of the sync, run the following:
$ g00x g002 globus status
You are done! Please skip the rest of this page regarding local machine setup.
Local Machine Setup¶
DO NOT DO THIS IF YOU ALREADY COMPLETED THE AWS QUICK SETUP
Mounting Box¶
Rclone¶
To mount Box, we will use a utility called Rclone. Rclone is a command line utility that can be used to mount cloud storage. To install Rclone, follow the instructions here.
Once you install, you can run the following.
$ rclone config
It will ask you a few questions. I used the following answers.
n) new remote
name > box
storage > 6
client_id > (leave blank)
client_secret > (leave blank)
box_config_file > (leave blank)
box_sub_type > 1
Edit advanced config? (y/n) > n
Use auto config? (y/n) > <see below>
Your answer to "Use auto config" depends on whether you are on a machine with a browser (like your personal computer). If you are on a server without access to a browser, hit n and you will be given a link to copy and paste into a browser.
If you hit n, head over to a machine that has a browser and rclone installed (e.g. brew install rclone) and run:
$ rclone authorize box
That will take you to a login page. Just log in and hit authorise. Come back to the command line console, and you will see a code that looks like the following.
#Paste this code into the rclone configuration:
>result {"access_token":"LGOqxaTWf2Tzc6Na0","token_type":"bearer","refresh_token":"t8DBslxjlZ7JXiRBk5rqv1cVFN6O5BUcFEzzzFS2","expiry":"2022-12-19T12:32:17.317533-06:00"}
You now have Box set up to mount! Check ~/.config/rclone/rclone.conf to confirm that the configuration and the token are in there.
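Besides reading the config file, you can ask rclone itself whether the remote exists. The following is a minimal sketch: remote_configured is a hypothetical helper name, and box is the remote name chosen in the answers above.

```shell
# remote_configured NAME
# Hypothetical helper: succeeds if rclone has a remote called NAME.
remote_configured() {
  name="$1"
  # `rclone listremotes` prints one configured remote per line, e.g. "box:".
  rclone listremotes | grep -qx "${name}:"
}
```

For example, remote_configured box && rclone lsd box: lists the top-level Box folders if the remote is configured and the token is still valid.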
Mounting Box¶
To mount Box, use the following commands.
You can't mount this on FSx, so put the mount point under /mnt/ (creating a directory there usually requires sudo)
$ sudo mkdir /mnt/box
Change ownership (replace jwillis with your own user and group)
$ sudo chown -R jwillis:jwillis /mnt/box
Mount with rclone, using the remote name you chose in rclone config (box in the example above)
$ rclone mount --daemon box: /mnt/box
Now Box will be mounted to /mnt/box
Path must exist
The mount point (here /mnt/box) must exist before you mount. If it doesn't, you will get an error.
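To fail early with a clearer message, the existence check can be done before calling rclone. This is a sketch under the assumption that you mount with the same rclone mount --daemon command as above; mount_remote is a hypothetical wrapper name.

```shell
# mount_remote REMOTE MOUNT_POINT
# Hypothetical wrapper around the `rclone mount --daemon` command above.
mount_remote() {
  remote="$1"        # rclone remote name, including the trailing colon
  mount_point="$2"   # existing local directory, e.g. /mnt/box
  if [ ! -d "$mount_point" ]; then
    echo "mount point does not exist: $mount_point" >&2
    return 1
  fi
  rclone mount --daemon "$remote" "$mount_point"
}
```

Call it with the remote name from rclone config followed by a colon and the mount point, for example mount_remote box: /mnt/box. On Linux, the rclone documentation describes unmounting with fusermount -u followed by the mount point.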
Mounting Globus¶
To use Globus, we need Globus Connect Personal. You can find the complete installation instructions here.
Once you install, you can run the following.
$ /path/to/globus/install/globusconnectpersonal -start &