PC#1 Exercises Introduction To NCBI 2020-Solved

The document provides an introduction to the NCBI and its databases, including how to explore the Entrez system and perform searches related to cancer and human sequences. It details the number of records found in various databases for the terms 'cancer' and 'human', as well as the significance of using specific search expressions. Additionally, it covers the use of Batch Entrez for retrieving sequences related to Homo sapiens and includes instructions for downloading data using command line tools.

Uploaded by

marti.diez

Practical Session #1: Introduction to NCBI and Entrez databases.

I. Explore the National Center for Biotechnology Information (NCBI) website and become familiar
with its layout and its major databases.

https://www.ncbi.nlm.nih.gov/

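What is NCBI? How many databases does NCBI host?

NCBI is part of the U.S. National Library of Medicine (NLM) at the National Institutes of Health (NIH). At the time of writing it hosts 59 databases, as listed at https://www.ncbi.nlm.nih.gov/guide/all/#databases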

Which of the following NCBI databases could be considered primary databases? Protein,
Nucleotide, CDD, PubMed, Gene, Genomes, RefSeq, BioProject.

Protein, Nucleotide, PubMed and BioProject. PubMed is a special case: it does not contain
experimental data, but its records are stored "as is", with no post-processing. The remaining
databases (CDD, Gene, Genomes, RefSeq) are built by curating or computing on primary submissions.

II. The Entrez system

Perform a Global Query at NCBI through Entrez using the expression (all[Filter]). Which
database contains the largest number of records?

https://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

https://www.ncbi.nlm.nih.gov/genbank/statistics/

https://www.ncbi.nlm.nih.gov/search/all/?term=all[Filter]

Now we want to find all the information at NCBI related to the group of human diseases
known as cancer. Type the word “cancer” in the search box on the NCBI homepage
and run the search (Global Query). Note that the query is interpreted differently in different
databases. The counts below were obtained at the time of writing and grow as the databases are updated.
How many scientific papers (PubMed records) contain this word?
4,180,400 (https://www.ncbi.nlm.nih.gov/pubmed/?term=cancer)

How many nucleotide sequences?

10,438,437 (https://www.ncbi.nlm.nih.gov/nuccore/?term=cancer)

How many cancer-related functional genomics studies have been stored at NCBI?
28,765 (https://www.ncbi.nlm.nih.gov/bioproject/?term=cancer)
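These counts can also be obtained from the command line through the E-utilities ESearch service. When called with rettype=count, ESearch returns only the number of matching records. Below is a minimal POSIX sh sketch that only prints the request URLs. The helper name esearch_count_url is our own, and the term is inserted without URL encoding, so it should be a single plain word.

```shell
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the ESearch URL that returns only the number of hits
# (rettype=count) for term $2 in Entrez database $1.
# The term is not URL-encoded: use a single plain word such as "cancer".
esearch_count_url() {
  printf '%s/esearch.fcgi?db=%s&term=%s&rettype=count\n' "$EUTILS" "$1" "$2"
}

# One URL per database queried above.
for db in pubmed nuccore bioproject; do
  esearch_count_url "$db" cancer
done
```

To fetch one of them, run wget -qO- "$(esearch_count_url nuccore cancer)". This prints a short XML document whose Count element holds the current number of records. That number will not match the counts above once the database has grown.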

Why does the Taxonomy database return only one record? (For discussion in class)
Because Cancer is also the name of a genus of crabs (crustaceans) in the family Cancridae:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=6754&lvl=3&lin=f&keep=1&srchmode=1&unlock
Perform a new Global Query, this time using the word “human”.

How many entries (records) have been obtained for the different databases?
https://www.ncbi.nlm.nih.gov/search/all/?term=human

Would we get the same results if we perform a Global Query using the search expression
(homo sapiens),
https://www.ncbi.nlm.nih.gov/search/all/?term=homo%20sapiens

(human[organism]),
https://www.ncbi.nlm.nih.gov/search/all/?term=human%5Borganism%5D

or (homo sapiens[organism])?
https://www.ncbi.nlm.nih.gov/search/all/?term=homo+sapiens%5Borganism%5D

Why?
In some databases “human” is treated as a synonym of “homo sapiens”, but not in all of them
(e.g. the PubChem databases).

When [organism] is added, the term is searched only in the organism field of each database.
Some databases use a controlled vocabulary for this field, others do not.

Beware that the count shown on the Global Query results page does not necessarily match the
number of results in the database itself (when we click the link), e.g. Taxonomy.
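The Global Query URLs above show how a field tag is written into a URL: a space becomes + (or %20), and the square brackets become %5B and %5D. Below is a minimal sketch of that encoding. It handles only these three characters; a full URL encoder would also escape characters such as &, # and %. The helper names encode_term and global_query_url are hypothetical.

```shell
# Encode the characters that appear in these Entrez queries:
# space -> +, [ -> %5B, ] -> %5D. Other characters are left untouched.
encode_term() {
  printf '%s' "$1" | sed -e 's/ /+/g' -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

# Build an NCBI Global Query URL for a search expression.
global_query_url() {
  printf 'https://www.ncbi.nlm.nih.gov/search/all/?term=%s\n' "$(encode_term "$1")"
}

global_query_url 'human[organism]'
global_query_url 'homo sapiens[organism]'
```

The two calls reproduce the human[organism] and homo sapiens[organism] URLs listed in the question.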

How is the expression [organism] interpreted by each database?

It depends on whether the database recognizes [organism] as one of its search fields.

If you are interested in studying human cancer, which of the following strategies would
produce the most useful set of results in a Global Query at NCBI?

cancer AND human

cancer[organism] AND human

cancer AND human[organism]

cancer OR human[organism]
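cancer AND human[organism] is the most useful, because it keeps only human records that mention cancer.
cancer AND human searches both terms in any field. It therefore also returns records such as
https://www.ncbi.nlm.nih.gov/gene/39645575, a gene from a Klebsiella pneumoniae strain isolated
from a cancer patient. cancer[organism] AND human looks for the crab genus Cancer. cancer OR
human[organism] returns every cancer record plus every human record.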
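Entrez evaluates Boolean operators on sets of records. AND keeps the records found by both terms, OR merges both result sets, and NOT removes the second set from the first. This is why cancer OR human[organism] returns far more records than either AND strategy. Below is a small sketch of these set operations using standard Unix tools. It runs on two made-up UID lists, not real search results, and the function names are hypothetical.

```shell
# Entrez Boolean operators as set operations on sorted UID lists.
# comm requires both files to be sorted with the same collation as sort.
bool_and() { comm -12 "$1" "$2"; }   # UIDs present in both files
bool_or()  { sort -u "$1" "$2"; }    # UIDs in either file, without duplicates
bool_not() { comm -23 "$1" "$2"; }   # UIDs in file 1 but not in file 2

# Hypothetical UID lists standing in for two result sets (not real data).
tmp=$(mktemp -d)
printf '%s\n' 101 102 103 104 | sort > "$tmp/cancer"
printf '%s\n' 103 104 105 | sort > "$tmp/human"

bool_and "$tmp/cancer" "$tmp/human"   # 103 and 104
bool_or  "$tmp/cancer" "$tmp/human"   # 101 to 105
bool_not "$tmp/cancer" "$tmp/human"   # 101 and 102
```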

III. At NCBI each record is assigned a UID (unique integer identifier) for internal tracking. In
the sequence databases, records also carry an accession number (with a version suffix such as .1), the stable public identifier used in the questions below.

Which NCBI database does each of the following identifiers belong to?

CM000253.1 GenBank Nucleotide

NG_011877.1 RefSeq Nucleotide

SRX4644664 SRA

NP_002266.2 RefSeq Protein

CP027442.1 GenBank Nucleotide (take care with the genome entry)

PRJNA490405 BioProject

CAB37359.1 GenBank Protein

ADE87724.1 GenBank Protein
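The answers above follow NCBI's accession prefix conventions:
RefSeq accessions have two letters and an underscore (NC_, NG_, NM_, ... for nucleotides; NP_, XP_, ... for proteins).
GenBank nucleotide accessions have one letter plus five digits, or two letters plus six digits.
GenBank protein accessions have three letters plus five (or seven) digits.
SRA identifiers start with SR, ER or DR followed by a type letter (X for experiments).
BioProject identifiers start with PRJ.

Below is a rough POSIX sh heuristic that encodes only those patterns. The name classify_accession is hypothetical. It does not handle whole-genome-shotgun or other longer formats, and a prefix alone cannot tell a genome assembly record from an ordinary nucleotide entry.

```shell
# Guess the NCBI database an identifier belongs to from its prefix.
# Order matters: SRX4644664 would also match "3 letters + digits".
classify_accession() {
  case "$1" in
    N[CGMRTWZ]_*|X[MR]_*)  echo "RefSeq Nucleotide" ;;
    [ANWXY]P_*)            echo "RefSeq Protein" ;;
    PRJ*)                  echo "BioProject" ;;
    [DES]R[APRSX][0-9]*)   echo "SRA" ;;   # submission/project/run/sample/experiment
    [A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]*)
                           echo "GenBank Protein" ;;
    [A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9]*|[A-Z][0-9][0-9][0-9][0-9][0-9]*)
                           echo "GenBank Nucleotide" ;;
    *)                     echo "unknown" ;;
  esac
}

for id in CM000253.1 NG_011877.1 SRX4644664 NP_002266.2 \
          CP027442.1 PRJNA490405 CAB37359.1 ADE87724.1; do
  printf '%s\t%s\n' "$id" "$(classify_accession "$id")"
done
```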

IV. Open the NCBI entry with accession number NG_011877 and become familiar with its format
and the fields used to store sequence information. The entry opens in GenBank flat file format.

What does this entry represent? Does it provide cross-references (links) to other databases?
From which organism was this sequence obtained? What is the UID of this organism in the
Taxonomy database? What does the underscore “_” in the accession number stand for? Display
the entry in FASTA format. What happens?
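Both display formats can also be requested directly from the E-utilities EFetch service by changing the rettype parameter: gb for the GenBank flat file and fasta for FASTA. Below is a sketch that only builds the URLs. The helper name efetch_url is hypothetical.

```shell
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print an EFetch URL for one record.
# $1 = Entrez database, $2 = accession, $3 = rettype (gb or fasta here).
efetch_url() {
  printf '%s/efetch.fcgi?db=%s&id=%s&rettype=%s&retmode=text\n' \
    "$EUTILS" "$1" "$2" "$3"
}

efetch_url nuccore NG_011877 gb      # GenBank flat file
efetch_url nuccore NG_011877 fasta   # FASTA: header line plus sequence only
```

To compare the two formats, download each one, e.g. wget -qO NG_011877.gb "$(efetch_url nuccore NG_011877 gb)", and open both files side by side.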

V. In the Taxonomy database explore all the information related to the organism Homo
sapiens.

Look at the lineage for this taxon. What order do humans belong to? Primates.
What is the txid for this mammalian order? 9443

How many human protein sequences are there at NCBI today? 1,421,783 (at the time of writing)
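Taxonomy-restricted counts like this one can be requested through ESearch with a txid and an organism tag. Organism:exp includes all descendant taxa, which is useful for the order Primates (txid9443). Organism:noexp restricts the search to the taxon itself. Below is a sketch that builds the count URLs. The sed-based encoding handles only spaces and brackets, and the helper names are hypothetical.

```shell
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Encode spaces and square brackets for use in a query string.
encode_term() {
  printf '%s' "$1" | sed -e 's/ /+/g' -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

# Print the ESearch count URL for records of taxon id $2 in database $1.
# $3 is the organism tag: Organism:exp (taxon and all descendants)
# or Organism:noexp (the taxon itself only).
taxon_count_url() {
  term="txid$2[$3]"
  printf '%s/esearch.fcgi?db=%s&term=%s&rettype=count\n' \
    "$EUTILS" "$1" "$(encode_term "$term")"
}

taxon_count_url protein 9606 Organism:noexp   # human proteins
taxon_count_url protein 9443 Organism:exp     # proteins from all primates
```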

VI. Advanced searches

With which of these strategies will you find all the human sequences stored in the nucleotide
database at the NCBI?

A. txid9606[Primary organism]

B. homo sapiens[Primary Organism]

C. homo sapiens[porgn]
D. human[porgn]

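All four are synonyms. Here the controlled vocabulary works: txid9606, homo sapiens and human all map to the same taxon, and [porgn] is the abbreviation of [Primary Organism].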
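One way to check that the four expressions are synonyms is to save the ESearch response for each of them (for example with rettype=count on db=nuccore) and compare the total hit counts. Below is a sketch of the comparison step. The XML strings in the example are made-up placeholders, not real NCBI counts, and the helper names are hypothetical. It relies on grep -o, which GNU and BSD grep provide.

```shell
# Print the first <Count> value in an ESearch XML response file.
# The first one is the total; later <Count> elements, when present,
# describe individual terms of the translated query.
first_count() {
  grep -o '<Count>[0-9]*</Count>' "$1" | head -n 1 | tr -dc '0-9'
}

# Succeed only if every file reports the same, non-empty total count.
same_counts() {
  ref=$(first_count "$1")
  [ -n "$ref" ] || return 1
  for f in "$@"; do
    [ "$(first_count "$f")" = "$ref" ] || return 1
  done
}

# Example with placeholder responses (not real search results).
tmp=$(mktemp -d)
printf '<eSearchResult><Count>42</Count></eSearchResult>\n' > "$tmp/A.xml"
printf '<eSearchResult><Count>42</Count></eSearchResult>\n' > "$tmp/B.xml"
printf '<eSearchResult><Count>7</Count></eSearchResult>\n'  > "$tmp/C.xml"
same_counts "$tmp/A.xml" "$tmp/B.xml" && echo "A and B agree"
same_counts "$tmp/A.xml" "$tmp/C.xml" || echo "A and C differ"
```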

VII. Batch Entrez

Batch Entrez lets you upload a file of GIs or accession numbers from the Nucleotide or Protein
databases, or a list of record identifiers from other Entrez databases, and automatically
retrieves the corresponding records.

In this exercise we will retrieve from NCBI all sequences related to Homo sapiens tumor
protein p53 (TP53) published in the paper with PubMed Central ID PMC3675194. The flat text
file PMC3675194-List_IDs.txt lists the accession numbers referenced in this paper.

1. Save the text file locally on your computer.

2. Open Batch Entrez.

https://www.ncbi.nlm.nih.gov/sites/batchentrez

3. Select the database against which the list of accessions will be queried.
4. Use the “Browse” button to select the file containing the list of identifiers from
your system directory.
5. Press the “Retrieve” button to see a list of record summaries, then retrieve the records.
6. Optionally, select a format in which to display the data for viewing and/or saving.
Select “Send to file” to save the file.
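The upload-and-retrieve step has a command-line counterpart. EFetch accepts several comma-separated identifiers in its id parameter, so the whole list can be fetched in one request. Below is a sketch that assumes the identifier file has one accession per line, possibly with Windows CRLF line endings. The helpers id_list and batch_efetch_url are hypothetical. For much longer lists, the E-utilities documentation recommends sending the identifiers by HTTP POST instead of in a long URL.

```shell
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Join the accessions in file $1 into one comma-separated list,
# dropping carriage returns (CRLF files) and blank lines.
id_list() {
  tr -d '\r' < "$1" | grep -v '^$' | paste -s -d , -
}

# Print one EFetch URL that returns every record of file $1 as FASTA.
batch_efetch_url() {
  printf '%s/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' \
    "$EUTILS" "$(id_list "$1")"
}
```

Usage: wget -O TP53.fasta "$(batch_efetch_url PMC3675194-List_IDs.txt)"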

How many records are on the list? 79

Which database do the entries belong to? Nucleotide

Do all entries represent human sequences? Yes

grep "Homo sapiens" nuccore_result.txt |less -NS

Do all entries represent mRNA sequences? Yes

grep " bp " nuccore_result.txt |grep "mRNA"|less

Do all sequences belong to the same human subject? No

Do all sequences have the same length? No

grep " bp " nuccore_result.txt |sort -n |less
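The grep checks above can be condensed into counts. Below is a sketch using awk. Like the grep commands, it assumes that each record in nuccore_result.txt has a line containing the length as a number followed by “bp”, possibly with thousands separators. The helper names are hypothetical, and the sample lines in the example are invented to mimic that layout only.

```shell
# Print the length of every record: the number just before the word "bp".
lengths() {
  awk '{ for (i = 1; i < NF; i++) if ($(i + 1) == "bp") { n = $i; gsub(",", "", n); print n } }' "$1"
}

# Print: number of records, shortest length, longest length.
length_summary() {
  lengths "$1" | sort -n | awk 'NR == 1 { min = $1 } { max = $1 } END { print NR, min, max }'
}

# Succeed if every " bp " line also mentions mRNA.
all_mrna() {
  [ "$(grep -c ' bp ' "$1")" = "$(grep ' bp ' "$1" | grep -c 'mRNA')" ]
}

# Invented sample in the assumed layout (not real records).
tmp=$(mktemp -d)
printf '%s\n' '1. record A' '1,200 bp linear mRNA' \
              '2. record B' '950 bp linear mRNA' \
              '3. record C' '1,010 bp linear mRNA' > "$tmp/sample.txt"
length_summary "$tmp/sample.txt"   # 3 950 1200
```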

If Batch Entrez does not work, you can obtain the same records with this link:
https://www.ncbi.nlm.nih.gov/nuccore/?term=KC820708:KC820786[pacc]
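The [pacc] (primary accession) range query above spans consecutive accession numbers: 820786 - 820708 + 1 = 79, which matches the number of records on the list. Below is a sketch that expands such a range into one accession per line, for example to rebuild the identifier file. The helper expand_range is hypothetical. It assumes both ends share the same letter prefix and number of digits.

```shell
# Expand an accession range such as KC820708..KC820786 into one
# accession per line, keeping the prefix and zero-padded digit width.
expand_range() {
  prefix=$(printf '%s' "$1" | sed 's/[0-9]*$//')
  first=${1#"$prefix"}
  last=${2#"$prefix"}
  # awk does the arithmetic, so leading zeros are not read as octal.
  awk -v p="$prefix" -v a="$first" -v b="$last" -v w="${#first}" \
    'BEGIN { fmt = "%s%0" w "d\n"; for (i = a + 0; i <= b + 0; i++) printf fmt, p, i }'
}

expand_range KC820708 KC820786 | wc -l   # 79
```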
You can download all these sequences as FASTA files with the E-utilities under Linux, macOS,
Cygwin, etc.:

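1) Download the PMC3675194-List_IDs.txt file.

2) Go to the download directory in a command-line shell (e.g. bash).
3) Execute the following two commands:
a. dos2unix PMC3675194-List_IDs.txt
b. cat PMC3675194-List_IDs.txt | xargs -tI% wget -O %.fasta "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%&rettype=fasta&retmode=text"

Note that command “b” is a single line that starts with “cat” and ends with the closing quote
after “text”. The quotes stop the shell from treating each “&” as a background operator.
Command “a” is important: it converts Windows (CRLF) line endings to Unix (LF) ones. Without it,
every identifier would carry a trailing carriage return into the efetch URL.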
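The same download can be written as a shell loop that strips the Windows line endings itself, so dos2unix is not needed, and that pauses between requests. NCBI asks E-utilities users without an API key to send no more than three requests per second. The one-second pause below is a conservative choice of ours, not an official value. The helper names clean_ids, fasta_url and download_all are hypothetical, and only the URL and ID-cleaning steps can be checked without network access.

```shell
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the accessions in file $1, without carriage returns or blank lines.
clean_ids() {
  tr -d '\r' < "$1" | grep -v '^[[:space:]]*$'
}

# Print the EFetch URL returning accession $1 from nuccore as FASTA.
fasta_url() {
  printf '%s/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$EUTILS" "$1"
}

# Download every accession in file $1 to <accession>.fasta, one at a time.
download_all() {
  clean_ids "$1" | while IFS= read -r id; do
    wget -q -O "$id.fasta" "$(fasta_url "$id")" || echo "failed: $id" >&2
    sleep 1   # stay well below 3 requests per second
  done
}
```

Usage: download_all PMC3675194-List_IDs.txt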
