
Metagen Overview

This document provides an overview of a generic metagenomics workflow. Key steps include demultiplexing samples, quality filtering reads, and then either performing read-based analysis without assembly using tools like metaphlan2, or assembling reads into contigs using tools like Megahit, mapping reads back to the assembly, and recovering genomes. The document notes that specific workflows may vary in the order and details of steps.

Overview of generic* metagenomics workflow

When working with your own data you should never follow any pipeline blindly. There can be critical differences based on your data.

*This is generic; specific workflows can vary in the order of steps here and how they are done.
sequencing facility -> fastq files -> demultiplex -> quality filter/trim -> fasta/q files

Example fastq record:
@HISEQ2500:282:1:1101:1220:1944 1
ATCGGATCG...
+
<G.<G<AGGII...

Example fasta record:
>HISEQ2500:282:1:1101:1220:1944 1
ATCGGATCG...

demultiplex (split samples by barcodes; might be done by sequencing facility)
Some tools:
• sabre
• fastx_demux (usearch/vsearch)
• idemp
• fastx barcode splitter (fastx-toolkit)

quality filter/trim (remove adapters/primers)
fastqc/multiqc
Some tools:
• trimmomatic
• bbduk.sh (bbtools suite of tools)
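The fastq record layout and the idea of splitting samples by barcodes can be sketched in a few lines of Python. This is a minimal illustration only, assuming in-line barcodes at the start of each read and exact matching; the function names, barcodes, and reads are hypothetical, and the dedicated tools listed above handle mismatches, paired reads, and index reads that this sketch ignores.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def demultiplex(records, barcodes):
    """Group reads by an exact barcode match at the read start; strip the barcode."""
    by_sample = defaultdict(list)
    for name, seq, qual in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                by_sample[sample].append((name, seq[n:], qual[n:]))
                break
        else:
            # No barcode matched: keep the read, but set it aside.
            by_sample["unassigned"].append((name, seq, qual))
    return dict(by_sample)


def to_fasta(records):
    """Drop qualities and format records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


# Hypothetical reads and barcodes, for illustration only.
EXAMPLE_FASTQ = (
    "@read1 1\nACGTTTGGCA\n+\nIIIIIIIIII\n"
    "@read2 1\nTGCAGGCCAT\n+\nIIIIIIIIII\n"
    "@read3 1\nGGGGGGCCAT\n+\nIIIIIIIIII\n"
)
BARCODES = {"ACGT": "Sample_A", "TGCA": "Sample_B"}

samples = demultiplex(read_fastq(StringIO(EXAMPLE_FASTQ)), BARCODES)
print(to_fasta(samples["Sample_A"]))
```

Reads whose start matches no barcode are kept under "unassigned" rather than discarded, so the fraction of unassigned reads can be checked before moving on.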

no-assembly path

read-based analysis
Some tools:
• TIPP/SEPP
• metaphlan2
• humann2
• sourmash
• kraken

Count Table
        Sample_A   Sample_B   ...
obj_1   0          428        ...
obj_2   306        323        ...
obj_3   217        1          ...
...     ...        ...        ...

Analysis
Some tools:
• phyloseq
• SpiecEasi
• Breakaway
• MaAsLin
• DivNet
• DESeq2
• CORNCOB

assembly path

consider testing assemblies with and w/o digital normalization
Some tools:
• bbnorm
• diginorm

(co)-assembly
Some assemblers and tools:
• Megahit (assembler)
• SPAdes (assembler)
• idba-ud (assembler)
• MetAMOS (assembler and analysis pipeline)
• MetaCompass (reference-guided)
• MetagenomeScope (visualize assembly graphs)
MetaQUAST is a great tool for comparing assemblies

Generate coverage information (mapping)
map individual sample reads to (co)-assembly
Some tools:
• bowtie2
• bwa

Gene calling
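Gene calling means locating candidate protein-coding regions on assembled contigs. As a rough illustration only, the sketch below scans both strands of a contig for ATG-to-stop open reading frames. The function names, the `min_len` threshold, and the example contig are hypothetical, and this is a deliberately naive stand-in: dedicated gene callers do considerably more than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Return (strand, start, end) for ATG...stop ORFs of at least min_len nt.

    Coordinates are 0-based and end-exclusive on the scanned strand, so
    '-' strand coordinates refer to the reverse-complemented sequence.
    ORFs that run off the end of the contig without a stop are skipped.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Hypothetical contig: 2 nt of padding, ATG, ten GCT codons, TAA, 2 nt of padding.
contig = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(contig))  # [('+', 2, 38)]
```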
Functional/taxonomic profiling
Some tools:
• prodigal (identifies open reading frames)
• prokka (runs prodigal and performs annotations)
• GHOSTKOALA (web-hosted KEGG annotations)
• BLAST (protein nr db/refseq/COGs)

Recovering genomes from metagenomes

A note on MAGs:
MAGs (metagenome-assembled genomes) are not the same thing as isolate genomes. They are composite representative genomes of closely related genomic lineages.

Some tools:
• anvi'o (interactive manual curation of bins; and much more)
• CONCOCT (kmer-based and coverage-based binning; also incorporated in anvi'o)
• COCACOLA (kmer-based, coverage-based, and incorporates paired-read linkage of contigs)
• MetaBAT2 (kmer-based and coverage-based binning tool)
• BinSanity (primarily coverage-based, optional second round kmer-based binning tool)
• checkm (genome-level taxonomy; and much more)
• DASTool (a tool for evaluating bins recovered by different methods)
• DESMAN (tool aimed at resolving strains)

Some common genomics stuff:
Phylogenomics
Comparative genomics
Pangenomics
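Several of the binning tools listed above are described as kmer-based. One way to picture that signal is a tetranucleotide frequency profile per contig, with contigs of similar composition landing close together. The sketch below is an assumption-laden illustration, not the method of any listed tool: the function names and contigs are hypothetical, reverse-complement k-mers are not merged, and no coverage information is used.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalized frequencies for all 4**k A/C/G/T k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other characters (e.g. N) are left out of the total.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles with the same keys."""
    return sum((p[km] - q[km]) ** 2 for km in p) ** 0.5


# Hypothetical contigs, for illustration only.
contig_a = "ACGTACGTACGTACGTACGT"
contig_b = "ACGTACGTACGAACGTACGT"
contig_c = "GGGGCCCCGGGGCCCCGGGG"

pa, pb, pc = (kmer_profile(c) for c in (contig_a, contig_b, contig_c))
print(profile_distance(pa, pb) < profile_distance(pa, pc))
```

Short contigs give noisy profiles, which is one reason composition is usually combined with coverage across samples.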
Some tools (phylogenomics, comparative genomics, pangenomics):
• anvi'o (integrated HMMs for common single-copy gene sets; integrated pangenomic workflow for identifying orthologs via OrthoMCL)
• PanOCT (identifies orthologs utilizing synteny information)
• StrainPhlAn/PanPhlAn (tools for strain-level analyses)
• MUSCLE (alignment software)
• FastTree (very fast, pseudo-maximum likelihood tree builder)
• RAxML (maximum likelihood tree builder)
• Mauve (whole-genome alignment)

[Figure: example phylogenomic and pangenomic analysis of 31 genomes, panels A-D. Recoverable labels: clade; genome size (Mb); GC %; proportion of genome detected; overall relative abundance; environmental distributions (% recruited of sample reads); 1,002 1:1 orthologs (354,229 AA); 14,036 GCs and 84,784 total genes, split into core, accessory, and unique; gene presence/absence.]
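The core, accessory, and unique labels in the pangenome figure can be illustrated with a presence/absence table of gene clusters. The sketch below assumes one common convention (core means present in every genome, unique means present in exactly one, accessory is everything in between); the function name, cluster ids, and genome names are hypothetical, and pangenomics tools may apply different thresholds.

```python
def classify_gene_clusters(presence, genomes):
    """Label each gene cluster as core, accessory, unique, or absent.

    presence maps a cluster id to the set of genomes that contain it.
    """
    genome_set = set(genomes)
    labels = {}
    for cluster, members in presence.items():
        hits = len(set(members) & genome_set)
        if hits == 0:
            labels[cluster] = "absent"
        elif hits == len(genome_set):
            labels[cluster] = "core"
        elif hits == 1:
            labels[cluster] = "unique"
        else:
            labels[cluster] = "accessory"
    return labels


# Hypothetical presence/absence data, for illustration only.
EXAMPLE_GENOMES = ["genome_1", "genome_2", "genome_3"]
EXAMPLE_PRESENCE = {
    "GC_1": {"genome_1", "genome_2", "genome_3"},
    "GC_2": {"genome_1", "genome_3"},
    "GC_3": {"genome_2"},
}

print(classify_gene_clusters(EXAMPLE_PRESENCE, EXAMPLE_GENOMES))
```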
astrobiomike.github.io
