Metagen Overview
When working with your own data you should never follow any pipeline blindly. There can be critical differences based on your data.
*This is generic; specific workflows can vary on the order of steps here and how they are done.
sequencing facility → fastq files → demultiplex (split samples by barcodes) → quality filter/trim (remove adapters/primers) → fasta/q files

Demultiplexing might be done by the sequencing facility.
Quality assessment: fastqc/multiqc

Example fastq record:
@HISEQ2500:282:1:1101:1220:1944 1
ATCGGATCG...
+
<G.<G<AGGII...

Example fasta record:
>HISEQ2500:282:1:1101:1220:1944 1
ATCGGATCG...

Some tools (demultiplex):
• sabre
• fastx_demux (usearch/vsearch)
• idemp
• fastx barcode splitter (fastx-toolkit)

Some tools (quality filter/trim):
• trimmomatic
• bbduk.sh (bbtools suite of tools)
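To make the demultiplex and quality-trim steps concrete, here is a minimal Python sketch of what such tools do conceptually. It is not how sabre, trimmomatic, or bbduk.sh are implemented. The sample names, barcodes, and reads are hypothetical, and the sketch assumes exact-match 5' barcodes, Phred+33 quality encoding, and a simple "trim low-quality bases from the 3' end" rule. Real tools also handle barcode mismatches, paired reads, adapters, and sliding-window trimming.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def demultiplex(records, barcodes):
    """Bin reads by an exact-match 5' barcode and strip the barcode."""
    bins = defaultdict(list)
    for header, seq, qual in records:
        for sample, barcode in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((header, seq[n:], qual[n:]))
                break
        else:
            bins["unassigned"].append((header, seq, qual))
    return bins


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def to_fasta(header, seq):
    """Format one record as FASTA (quality scores are discarded)."""
    return f">{header}\n{seq}\n"


# Hypothetical input: three reads, two of which carry a known barcode.
fastq_text = (
    "@read1\nACGTTTGGCCAA\n+\nIIIIIIIIII##\n"
    "@read2\nTGCAGGGTTTAA\n+\nIIIIIIIIIIII\n"
    "@read3\nGGGGCCCCAAAA\n+\nIIIIIIIIIIII\n"
)
barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}  # hypothetical barcodes

bins = demultiplex(read_fastq(StringIO(fastq_text)), barcodes)
for sample, reads in bins.items():
    for header, seq, qual in reads:
        trimmed_seq, _ = trim_3prime(seq, qual)
        print(sample, to_fasta(header, trimmed_seq), end="")
```

Whatever tools are used, it is worth checking the output with something like fastqc/multiqc before and after trimming rather than trusting the defaults.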
Pangenomics
[Figure: example pangenome. Recoverable labels: 31 genomes; 84,784 total genes; 14,036 GCs; Core (1,106 GCs; 35,140 genes); Accessory; Unique (7,986 GCs; 8,181 genes); 1,002 1:1 orthologs (354,229 AA); gene presence/absence; environmental distributions (% recruited of total reads).]

Some tools:
• … pangenomic workflow for identifying orthologs via OrthoMCL)
• PanOCT (identifies orthologs utilizing synteny information)
• StrainPhlAn/PanPhlAn (tools for strain-level analyses)
• MUSCLE (alignment software)
• FastTree (very fast, pseudo-maximum likelihood tree builder)
• RAxML (maximum likelihood tree builder)
• Mauve (whole-genome alignment)
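The core, accessory, and unique categories used in pangenomics can be illustrated with a small Python sketch over a gene-cluster presence/absence table. The genome and gene-cluster names are hypothetical, and the definitions are an assumption: core means present in every genome, unique means present in exactly one, and accessory is everything in between. Individual tools and studies may use looser thresholds.

```python
def classify_gene_clusters(presence):
    """Label each gene cluster as core, accessory, or unique.

    presence maps a gene-cluster name to the set of genomes that contain it.
    Assumed definitions: core = in all genomes, unique = in exactly one.
    """
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)
    labels = {}
    for cluster, members in presence.items():
        if len(members) == n_genomes:
            labels[cluster] = "core"
        elif len(members) == 1:
            labels[cluster] = "unique"
        else:
            labels[cluster] = "accessory"
    return labels


# Hypothetical presence/absence data for three genomes.
presence = {
    "GC_0001": {"genomeA", "genomeB", "genomeC"},
    "GC_0002": {"genomeA", "genomeB"},
    "GC_0003": {"genomeC"},
}

labels = classify_gene_clusters(presence)
for cluster, label in sorted(labels.items()):
    print(cluster, label)
```

In practice the presence/absence table comes from an ortholog-clustering step, so the labels are only as reliable as the clustering that produced them.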
astrobiomike.github.io