Analysis¶
The analysis of the data can be done after the pipeline.
Getting an analysis report¶
The analysis report is the combined sequencing and flow cytometry data. It has all the counts and frequencies of the flow data and the frequencies of the VRC01 class among the sequences. It will then combine those frequencies to give a final frequency of VRC01 among some cell type phenotypes.
The following will produce an analysis report and combine data. It will output a flow
$ g00x g003 analysis report -s g003/G003/output/final_df.feather -f g003/G003/output/flow_output.feather -o g003/G003/output/flow_and_sequencing
from g00x.data import Data
from g00x.analysis.report import combine_seq_and_flow
data = Data()
sequencing_dataframe_path = "g003/G003/output/final_df.feather"
flow_dataframe_path = "g003/G003/output/flow_output.feather"
# input the sequences to feather
sequencing_dataframe = pd.read_feather(sequencing_dataframe_path)
flow_dataframe = pd.read_feather(flow_dataframe_path)
# generate three different dataframes
seq_and_flow_df, seq_and_flow_df_long_name, seq_and_flow_df_long_form = combine_seq_and_flow(
data, sequencing_dataframe, flow_dataframe
)
This will output a flow_and_sequencing.feather
, flow_and_sequencing_long_names.feather
, and flow_and_sequencing_long_form.feather
.
The long form is the long form dataframe:
run_purpose | pubID | ptid | group | weeks | visit_id | probe_set | sample_type | value_type | value | short_name | long_name | pbmc_gate_expression | calculation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
KWTRPG003 | G003-243 | G003-243 | 10 | -5 | V91 | eODGT8 | PBMC | count | 102.0 | IgD+/KO-/Antigen++ B cells | IgD+/KO-/Antigen++ B cells | P14 | |
KWTRPG003 | G003-243 | G003-243 | 10 | 8 | V201 | eODGT8 | PBMC | count | 2163368.0 | IgD+ B cells | IgD+ B cells | P6 | |
The long names are pivoted:
Unnamed: 0 | run_purpose | pubID | ptid | group | weeks | visit_id | probe_set | sample_type | B cells | Dump- | IgD+ B cells | IgD+/Antigen++ B cells | IgD+/Antigen++/KO- B cells | IgD+/KO- B Cells | IgD+/KO-/Antigen++ B cells | IgD- B cells | IgD-/Antigen++ B cells | IgD-/Antigen++/KO- B cells | IgD-/IgG+/KO- | IgD-/IgG-/IgM+/Antigen++ B cells | IgD-/IgG-/IgM+/KO-/Antigen++ B cells | IgD-/IgM+/IgG- B cells | IgD-/IgM-/IgG+/Antigen++ B cells | IgD-/IgM-/IgG+/Antigen++/KO- B cells | IgD-/KO-/Antigen++ (sorted) B cells | IgD-IgM-IgG+ B cells | IgD-KO- B cells | IgG-/IgM+/KO- B cells | IgG-/IgM+/KO-/Antigen++ B cells | IgM-/IgG+/KO-/Antigen++ B cells | IgM-/IgG- B cells | IgM-/IgG-/Antigen-- B cells | IgM-/IgG-/KO- B cells | IgM-/IgG-/KO-/Antigen-- B cells | IgM-IgG-/Antigen--/KO- B cells | Lymphocytes | Number of IGD- sequences that are VRC01-class | Number of IGHA sequences that are VRC01-class | Number of IGHA1*01 sequences that are VRC01-class | Number of IGHA2*01 sequences that are VRC01-class | Number of IGHD sequences that are VRC01-class | Number of IGHD*02 sequences that are VRC01-class | Number of IGHG sequences that are VRC01-class | Number of IGHG1*01 sequences that are VRC01-class | Number of IGHG2*01 sequences that are VRC01-class | Number of IGHG3*01 sequences that are VRC01-class | Number of IGHG4*01 sequences that are VRC01-class | Number of IGHM sequences that are VRC01-class | Number of IGHM*01 sequences that are VRC01-class | Number of IGHV1-2*02 sequences that are VRC01-class | Number of IGHV1-2*04 sequences that are VRC01-class | Number of IGHV1-2*05 sequences that are VRC01-class | Number of IGHV1-2*06 sequences that are VRC01-class | Number of sequences | Number of undefined-allele sequences that are VRC01-class | Percent IgA^{+}KO^- among Ag^{--} | Percent IgD^{-}KO^{-} among Ag^{++} | Percent IgG^{+}KO^- among Ag^{++} | Percent IgM{+}KO^- among Ag^{++} | Percent antigen-specific (IgD-GT8^{++}) among IgG^{+} | Percent antigen-specific among IgD^- | Percent antigen-specific among IgG^{+} | Percent antigen-specific among IgM | Percent epitope-specific (CD4bs-specific) among IgG^{+} | Percent epitope-specific (KO^-Ag^{++}) among IgD^- | Percent epitope-specific (KO^-Ag^{++}) among IgG^{+} | Percent epitope-specific (KO^-Ag^{++}) among IgM | Percent of IGD- sequences that are VRC01-class | Percent of IGHA sequences that are VRC01-class | Percent of IGHA1*01 sequences that are VRC01-class | Percent of IGHA2*01 sequences that are VRC01-class | Percent of IGHD sequences that are VRC01-class | Percent of IGHD*02 sequences that are VRC01-class | Percent of IGHG sequences that are VRC01-class | Percent of IGHG1*01 sequences that are VRC01-class | Percent of IGHG2*01 sequences that are VRC01-class | Percent of IGHG3*01 sequences that are VRC01-class | Percent of IGHG4*01 sequences that are VRC01-class | Percent of IGHM sequences that are VRC01-class | Percent of IGHM*01 sequences that are VRC01-class | Percent of IGHV1-2*02 sequences that are VRC01-class | Percent of IGHV1-2*04 sequences that are VRC01-class | Percent of IGHV1-2*05 sequences that are VRC01-class | Percent of IGHV1-2*06 sequences that are VRC01-class | Percent of VRC01-class sequences among IgA | Percent of VRC01-class sequences among IgD- | Percent of VRC01-class sequences among IgG | Percent of VRC01-class sequences among IgM | Percent of undefined-allele sequences that are VRC01-class | Singlets | num_not_vrc01_class | num_vrc01_class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
58 | KWTRPG003 | G003-630 | G003-630 | 10 | 16 | V257 | eODGT8 | PBMC | 2.85673e+06 | 2.92277e+06 | 1.60911e+06 | 199 | 165 | 1.60499e+06 | 167 | 1.24337e+06 | 4293 | 1617 | 763753 | 13 | 11 | 124732 | 4022 | 1470 | 1599 | 767771 | 1.23859e+06 | 124691 | 11 | 1471 | 328433 | 250 | 328199 | 112 | 112 | 3.46313e+06 | 57 | 11 | 11 | 0 | 0 | 0 | 45 | 38 | 2 | 2 | 3 | 0 | 0 | 57 | 0 | 0 | 0 | 309 | 1 | 44.8 | 37.666 | 36.549 | 0.00882181 | 0.559151 | 0.34527 | 0.523854 | 0.0104223 | 0.208265 | 0.128602 | 0.191594 | 0.00881891 | 18.4466 | 40.7407 | 40.7407 | 0 | 0 | 0 | 17.5781 | 19.0955 | 10.5263 | 8.69565 | 20 | 0 | 0 | 71.25 | 0 | 0 | 0 | nan | 0.0237226 | 0.0336786 | 0 | 7.69231 | 3.26462e+06 | 252 | 57 |
83 | KWTRPG003 | G003-799 | G003-799 | 6 | 21 | V292 | eODGT8 | PBMC | 1.24465e+07 | 1.27558e+07 | 7.06433e+06 | 1401 | 1229 | 7.0634e+06 | 1231 | 5.35685e+06 | 15885 | 9990 | 3.16765e+06 | 47 | 42 | 531550 | 14962 | 9157 | 9930 | 3.18366e+06 | 5.34015e+06 | 531499 | 42 | 9169 | 1.63602e+06 | 862 | 1.6356e+06 | 710 | 708 | 1.44308e+07 | 404 | 45 | 45 | 0 | 0 | 0 | 358 | 348 | 2 | 5 | 3 | 1 | 1 | 404 | 0 | 0 | 0 | 519 | 0 | 82.1346 | 62.8895 | 61.2017 | 0.00790218 | 0.498953 | 0.296536 | 0.469962 | 0.00884207 | 0.311905 | 0.18537 | 0.288001 | 0.00790142 | 77.842 | 86.5385 | 90 | 0 | 0 | 0 | 78.1659 | 79.2711 | 40 | 50 | 75 | 12.5 | 12.5 | 88.4026 | 0 | 0 | 0 | nan | 0.144296 | 0.225119 | 0.000987678 | 0 | 1.40101e+07 | 115 | 404 |
The short names are also pivoted:
run_purpose | pubID | ptid | group | weeks | visit_id | probe_set | sample_type | B cells | Dump- | IgD+ B cells | IgD+/Antigen++ B cells | IgD+/Antigen++/KO- B cells | IgD+/KO- B Cells | IgD+/KO-/Antigen++ B cells | IgD- B cells | IgD-/Antigen++ B cells | IgD-/Antigen++/KO- B cells | IgD-/IgG+/KO- | IgD-/IgG-/IgM+/Antigen++ B cells | IgD-/IgG-/IgM+/KO-/Antigen++ B cells | IgD-/IgM+/IgG- B cells | IgD-/IgM-/IgG+/Antigen++ B cells | IgD-/IgM-/IgG+/Antigen++/KO- B cells | IgD-/KO-/Antigen++ (sorted) B cells | IgD-IgM-IgG+ B cells | IgD-KO- B cells | IgG-/IgM+/KO- B cells | IgG-/IgM+/KO-/Antigen++ B cells | IgM-/IgG+/KO-/Antigen++ B cells | IgM-/IgG- B cells | IgM-/IgG-/Antigen-- B cells | IgM-/IgG-/KO- B cells | IgM-/IgG-/KO-/Antigen-- B cells | IgM-IgG-/Antigen--/KO- B cells | Lymphocytes | Singlets | num_IGHA1*01_vrc01_class_sequences | num_IGHA2*01_vrc01_class_sequences | num_IGHA_vrc01_class_sequences | num_IGHD*02_vrc01_class_sequences | num_IGHD_vrc01_class_sequences | num_IGHG1*01_vrc01_class_sequences | num_IGHG2*01_vrc01_class_sequences | num_IGHG3*01_vrc01_class_sequences | num_IGHG4*01_vrc01_class_sequences | num_IGHG_vrc01_class_sequences | num_IGHM*01_vrc01_class_sequences | num_IGHM_vrc01_class_sequences | num_IGHV1-2*02_vrc01_class_sequences | num_IGHV1-2*04_vrc01_class_sequences | num_IGHV1-2*05_vrc01_class_sequences | num_IGHV1-2*06_vrc01_class_sequences | num_igdneg_vrc01_class_sequences | num_sequences | num_undefined-allele_vrc01_class_sequences | percent_IGHA1*01_vrc01_class_sequences | percent_IGHA2*01_vrc01_class_sequences | percent_IGHA_vrc01_class_sequences | percent_IGHD*02_vrc01_class_sequences | percent_IGHD_vrc01_class_sequences | percent_IGHG1*01_vrc01_class_sequences | percent_IGHG2*01_vrc01_class_sequences | percent_IGHG3*01_vrc01_class_sequences | percent_IGHG4*01_vrc01_class_sequences | percent_IGHG_vrc01_class_sequences | percent_IGHM*01_vrc01_class_sequences | percent_IGHM_vrc01_class_sequences | percent_IGHV1-2*02_vrc01_class_sequences | percent_IGHV1-2*04_vrc01_class_sequences | percent_IGHV1-2*05_vrc01_class_sequences | percent_IGHV1-2*06_vrc01_class_sequences | percent_ag_among_igd_neg | percent_ag_among_igg | percent_ag_among_igm | percent_cd4bs_among_igg | percent_ep_among_igd_neg | percent_ep_among_igg | percent_ep_among_igm | percent_gt8++_among_igg | percent_igako_among_ag | percent_igdneg_vrc01_class_sequences | percent_iggko_among_ag | percent_igmko_among_ag | percent_ko_among_ag_igd_neg | percent_undefined-allele_vrc01_class_sequences | percent_vrc01_among_iga | percent_vrc01_among_igd_neg | percent_vrc01_among_igg | percent_vrc01_among_igm | num_not_vrc01_class | num_vrc01_class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
KWTRPG003 | G003-630 | G003-630 | 10 | 10 | V215 | eODGT8 | PBMC | 3494736.0 | 3547169.0 | 1717301.0 | 586.0 | 427.0 | 1712345.0 | 430.0 | 1772133.0 | 7731.0 | 1924.0 | 1019144.0 | 483.0 | 143.0 | 262713.0 | 6813.0 | 1592.0 | 1889.0 | 1028967.0 | 1760743.0 | 262022.0 | 143.0 | 1599.0 | 465573.0 | 392.0 | 465126.0 | 123.0 | 123.0 | 4034868.0 | 3887505.0 | 10.0 | 0.0 | 10.0 | 0.0 | 0.0 | 23.0 | 1.0 | 2.0 | 1.0 | 27.0 | 0.0 | 0.0 | 37.0 | 0.0 | 0.0 | 0.0 | 37.0 | 305.0 | 0.0 | 23.25581395348837 | 0.0 | 22.22222222222222 | 0.0 | 0.0 | 11.330049261083744 | 5.555555555555555 | 11.11111111111111 | 10.0 | 10.843373493975903 | 0.0 | 0.0 | 60.65573770491803 | 0.0 | 0.0 | 0.0 | 0.43625393805092505 | 0.6621203595450583 | 0.1838508181932375 | 0.18358217513292457 | 0.10659470818499515 | 0.1553985696334285 | 0.05443202277770799 | 0.751336048677946 | 31.377551020408163 | 12.131147540983607 | 23.367092323499193 | 0.05457556999030616 | 24.886819298926397 | 0.0 | 0.01293116132080269 | 0.016850447309648874 | 0.0 | 268.0 | 37.0 | |
KWTRPG003 | G003-799 | G003-799 | 6 | 21 | V292 | eODGT8 | PBMC | 12446531.0 | 12755837.0 | 7064332.0 | 1401.0 | 1229.0 | 7063404.0 | 1231.0 | 5356848.0 | 15885.0 | 9990.0 | 3167650.0 | 47.0 | 42.0 | 531550.0 | 14962.0 | 9157.0 | 9930.0 | 3183665.0 | 5340149.0 | 531499.0 | 42.0 | 9169.0 | 1636022.0 | 862.0 | 1635598.0 | 710.0 | 708.0 | 14430849.0 | 14010068.0 | 45.0 | 0.0 | 45.0 | 0.0 | 0.0 | 348.0 | 2.0 | 5.0 | 3.0 | 358.0 | 1.0 | 1.0 | 404.0 | 0.0 | 0.0 | 0.0 | 0.0 | 404.0 | 519.0 | 90.0 | 0.0 | 86.53846153846155 | 0.0 | 0.0 | 79.27107061503416 | 40.0 | 50.0 | 75.0 | 78.16593886462883 | 12.5 | 12.5 | 88.40262582056893 | 0.0 | 0.0 | 0.0 | 0.2965363213591276 | 0.469961506628367 | 0.00884206565704073 | 0.3119046759002596 | 0.18537020277596078 | 0.28800140718323064 | 0.007901420374376822 | 0.4989532504205059 | 82.13457076566125 | 77.84200385356455 | 61.20171100120305 | 0.007902178555368872 | 62.889518413597735 | 0.0 | 0.1442958803882238 | 0.2251190038681148 | 0.0009876775467971028 | 115.0 | 404.0 |
Count¶
Easily count the amount of samples we have.
$ g00x g003 analysis count -f g003/G003/output/flow_output.feather -o count
This will output: