BAD601 Module 3 PDF
MongoDB is:
1. Cross-platform.
2. Open source.
3. Non-relational.
4. Distributed.
5. NoSQL.
A few of the major challenges with traditional RDBMSs are dealing with large
volumes of data, handling a rich variety of data (particularly unstructured
data), and meeting the scale needs of enterprise data.
What is needed is a database that can scale out (scale horizontally) to meet
these requirements, is flexible with respect to schema, is fault tolerant, is
consistent and partition tolerant, and can be easily distributed over many
nodes in a cluster.
3. Auto sharding
Big Data Analytics-BAD601-Module 3
4. Document oriented
5. High performance
7. Replication
8. Easy scalability
9. High availability
"FirstName": "John",
"LastName": "Mathews",
},
"FirstName": "Andrews",
"LastName": "Symmonds",
},
"FirstName": "Mable",
"LastName": "Mathews",
JSON is very expressive: it makes it easy to store and retrieve documents in
their real (nested) form. The binary form of JSON is BSON, which is an open
standard. BSON is often compact, although it is not always smaller than the
equivalent JSON text, because it also stores type and length information. BSON
has another advantage: it is much easier and quicker to convert BSON to a
programming language's native data format.
MongoDB drivers are available for a number of programming languages, such as C,
C++, Ruby, PHP, Python and C#, and each works slightly differently. Using a
common binary format lets each driver build its language's native data
structures quickly, without first having to parse JSON text.
3.2.2 Creating or Generating a Unique Key
• Each document in a collection must have a unique identifier.
• It is stored in the _id key.
• It is similar to the primary key in relational databases.
• If a document is inserted without an _id, MongoDB generates one automatically
(an ObjectId).
• This facilitates searching for documents by their unique identifier.
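Default _id values are ObjectIds: 12 bytes made of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value and a 3-byte incrementing counter, normally displayed as 24 hexadecimal characters. The plain JavaScript sketch below assembles and decodes that layout by hand to show why such identifiers are unique and roughly ordered by creation time. The helper names (makeObjectIdHex, timestampOf) are hypothetical; this is not the driver's implementation.

```javascript
// Hypothetical helpers illustrating the 12-byte ObjectId layout (not a MongoDB API).
function toHex(value, bytes) {
  // Zero-padded, big-endian hex string occupying `bytes` bytes.
  return value.toString(16).padStart(bytes * 2, "0");
}

function makeObjectIdHex(seconds, randomHex10, counter) {
  if (!/^[0-9a-f]{10}$/.test(randomHex10)) {
    throw new Error("random part must be 5 bytes (10 hex digits)");
  }
  // 4-byte timestamp + 5-byte random value + 3-byte counter = 24 hex digits.
  return toHex(seconds >>> 0, 4) + randomHex10 + toHex(counter & 0xffffff, 3);
}

function timestampOf(objectIdHex) {
  // The first 8 hex digits hold the creation time in seconds.
  return parseInt(objectIdHex.slice(0, 8), 16);
}

const id = makeObjectIdHex(1700000000, "a1b2c3d4e5", 1);
console.log(id); // 6553f100a1b2c3d4e5000001
console.log(new Date(timestampOf(id) * 1000).toISOString());
```

Because the timestamp occupies the leading bytes, sorting ObjectIds as hex strings orders them approximately by creation second.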
3.2.2.1 Database
3.2.2.2 Collection
A collection is analogous to a table in an RDBMS. A collection is created on
demand: it gets created the first time you attempt to save a document that
references it. A collection exists within a single database and holds several
MongoDB documents. A collection does not enforce a schema, so documents within a
collection can have different fields. Even if the documents within a collection
have the same fields, the order of the fields can differ.
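The two properties just described, on-demand creation and the absence of an enforced schema, can be sketched with an in-memory model. TinyDatabase and its methods are hypothetical stand-ins written for illustration; on a real server the same effect comes from simply inserting into a collection that does not exist yet.

```javascript
// Hypothetical in-memory model (not MongoDB code): collections appear on the
// first insert, and documents in one collection need not share fields or order.
class TinyDatabase {
  constructor(name) {
    this.name = name;
    this.collections = new Map(); // collection name -> array of documents
  }

  insert(collectionName, doc) {
    // Create the collection on demand, the first time a write references it.
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    this.collections.get(collectionName).push({ ...doc });
  }

  showCollections() {
    return [...this.collections.keys()];
  }
}

const db = new TinyDatabase("myDB");
console.log(db.showCollections()); // []
db.insert("Students", { _id: 1, StudName: "Aryan David", Grade: "VII" });
db.insert("Students", { Grade: "VIII", _id: 2, Hobbies: "Chess" }); // different fields and order
console.log(db.showCollections()); // [ 'Students' ]
```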
3.2.2.3 Document
A document is analogous to a row/record/tuple in an RDBMS table. A document has a
dynamic schema: documents in the same collection need not have the same set of
fields/key-value pairs.
The figure below shows a collection named "students" containing three
documents.
3.2.6 Sharding
Sharding is akin to horizontal scaling. It means that a large dataset is divided
and distributed over multiple servers, or shards. Each shard is an independent
database, and collectively the shards constitute a single logical database.
The prime advantages of sharding are as follows:
1. Sharding reduces the amount of data that each shard needs to store and
manage. For example, if the dataset were 1 TB in size and we distributed it over
four shards, each shard would house just 256 GB of data. As the cluster grows
(more shards for the same data), the amount of data that each shard stores and
manages decreases.
2. Sharding reduces the number of operations that each shard handles. For
example, when we insert data, the application needs to access only the shard
that houses that data.
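The arithmetic and routing described above can be sketched in plain JavaScript. The sketch assumes 1 TB = 1024 GB (which gives the 256 GB figure), equally sized shards and a toy FNV-1a hash for choosing a shard. MongoDB's own hashed sharding uses a different hash function and distributes chunk ranges, so treat shardFor() and routeInserts() as hypothetical.

```javascript
// Sketch only: equal shares per shard, and a toy hash standing in for
// MongoDB's real shard-key routing.
function perShardSize(totalGB, shardCount) {
  return totalGB / shardCount;
}

function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept as unsigned 32-bit
  }
  return hash;
}

function shardFor(shardKey, shardCount) {
  return fnv1a(String(shardKey)) % shardCount;
}

// Each insert is routed to exactly one shard, chosen from its shard key.
function routeInserts(docs, keyField, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc[keyField], shardCount)].push(doc);
  }
  return shards;
}

console.log(perShardSize(1024, 4)); // 256
console.log(perShardSize(1024, 8)); // 128: more shards, less data on each
```

The routing step is what gives the second advantage: an insert computes one shard from the key and touches only that shard.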
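memory. The fewer reads and writes we perform against the disk, the better the
performance. This is one reason MongoDB can be faster than competitors that
write to disk almost immediately. However, there is a trade-off: data that is
still only in memory is not yet safely on disk, so (at least with the default
settings of older releases) MongoDB makes no guarantee that an acknowledged
write has been stored durably. Where that guarantee matters, a write concern
such as { j: true } asks MongoDB to acknowledge the write only after it has
reached the on-disk journal.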
RDBMS Terms | MongoDB Equivalent | Description
SQL Query (SELECT, WHERE, etc.) | Find Query (db.collection.find()) | MongoDB uses a JavaScript-like query syntax instead of SQL.
User Roles & Permissions | Role-Based Access Control (RBAC) | MongoDB uses RBAC for fine-grained security.
Authentication (LDAP, Kerberos, etc.) | Authentication with LDAP, Kerberos, SCRAM | MongoDB supports authentication methods like LDAP, Kerberos, and SCRAM-SHA.
Creating a Database
Syntax:
use DATABASE_Name
Example:
To create a database named myDB, use:
use myDB
Output:
switched to db myDB
db
Output:
myDB
show dbs
Output (example):
admin (empty)
local 0.078GB
test 0.078GB
• The newly created database (e.g., myDB) does not appear in the list from
show dbs until it contains at least one document.
• The default database in MongoDB is test.
Confirmation Message
1. String
2. Integer
Ex: { "age": 25 }
3. Double
4. Boolean
5. Array
7. ObjectId
8. Date
9. Null
Similar to JavaScript, but allows defining a scope (variables) for the script.
14. Timestamp
15. Decimal128
test
admin (empty)
local 0.078GB
myDB1 0.078GB
switched to db myDB1
To display the list of collections (tables) in the current database:
show collections
Example Output:
system.indexes
system.js
2.6.1
1. StudRollNo
2. StudName
3. Grade
4. Hobbies
5. DOJ
Before we get into the details of CRUD operations in MongoDB, let us look at
how the statements are written in RDBMS and MongoDB.
Creating a Collection
Objective: Create a collection named "Person".
Step 1 – View existing collections:
show collections
Example output:
Students
food
system.indexes
system.js
Step 2 – Create the new collection:
db.createCollection("Person")
Output:
{ "ok" : 1 }
Outcome – View updated collections:
show collections
Example output after creation:
Person
Students
food
system.indexes
system.js
Dropping a Collection
Objective: Drop the collection named "food".
Step 1 – Check current collections:
show collections
Example output:
Person
Students
food
system.indexes
system.js
Step 2 – Drop the collection:
db.food.drop()
Output:
true
Outcome – View updated collections:
show collections
Example output after dropping:
Person
Students
system.indexes
system.js
Objective: Insert Aryan David only if not already in the collection. If present,
update his hobbies.
Save document:
db.Students.save({StudName:"Vamsi Bapat", Grade:"VII"})
Check final documents:
db.Students.find().pretty()
3.5.2 save() method
If the document has no _id, or no document with the specified _id exists,
save() inserts it as a new document. If a document with that _id exists, save()
replaces the existing document entirely.
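The insert-or-replace rule can be modelled with a plain array standing in for the collection. saveDoc() below is a hypothetical sketch: a counter stands in for ObjectId generation and the returned counts are simplified. Note also that save() is deprecated in newer MongoDB releases, where insertOne() or replaceOne() with { upsert: true } are the suggested replacements.

```javascript
// Hypothetical model of save() on an in-memory "collection" (an array).
let nextGeneratedId = 100; // stands in for an auto-generated ObjectId

function saveDoc(collection, doc) {
  if (doc._id === undefined) {
    // No _id supplied: generate one and insert.
    collection.push({ _id: nextGeneratedId++, ...doc });
    return { nInserted: 1 };
  }
  const index = collection.findIndex((d) => d._id === doc._id);
  if (index === -1) {
    // _id supplied but not present yet: insert.
    collection.push({ ...doc });
    return { nInserted: 1 };
  }
  // _id already present: replace the whole document (fields not in `doc` are lost).
  collection[index] = { ...doc };
  return { nMatched: 1, nModified: 1 }; // simplified: always reports a modification
}

const students = [];
saveDoc(students, { StudName: "Vamsi Bapat", Grade: "VII" });         // inserted, _id generated
saveDoc(students, { _id: 4, StudName: "Hersch Gibbs", Grade: "VII" }); // inserted with _id 4
saveDoc(students, { _id: 4, Grade: "VIII" });                         // replaces _id 4
console.log(students);
```

The third call shows the main pitfall of save(): it replaces rather than merges, so StudName disappears from the document with _id 4.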
Objective:
Insert the document of "Hersch Gibbs" into the Students collection using the
update() method with the upsert option.
Step 1: Check existing documents in the "Students" collection
Shows the existing documents with their _id, StudName, Grade, and Hobbies.
Step 2: Use update with upsert: false
db.Students.update(
{_id:4, StudName:"Hersch Gibbs", Grade:"VII"},
{$set: {Hobbies: "Graffiti"}},
{upsert: false}
);
• No document is inserted: no document matches the query, and upsert: false
does not allow one to be created.
• Result shows nMatched: 0 and nUpserted: 0, meaning nothing was updated or
inserted.
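Step 3: Use update with upsert: true
db.Students.update(
{_id:4, StudName:"Hersch Gibbs", Grade:"VII"},
{$set: {Hobbies: "Graffiti"}},
{upsert: true}
);
• No document matches, so a new document is inserted, built from the query's
equality fields plus the $set field.
• Result shows nUpserted: 1 meaning one document was inserted.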
Act:
Add the new field "Location" with the value "Newark":
db.Students.update(
{_id:4},
{$set: {Location: "Newark"}}
);
Output shows:
{
"nMatched": 1,
"nUpserted": 0,
"nModified": 1
}
Outcome:
Confirm the new field has been added:
db.Students.find({_id:4}).pretty();
Output:
{
"_id": 4,
"Grade": "VII",
"StudName": "Hersch Gibbs",
"Hobbies": "Graffiti",
"Location": "Newark"
}
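The update() calls in this walkthrough (upsert: false, upsert: true, and a plain $set on _id:4) can be replayed against an in-memory array. simulateUpdate() below is hypothetical and handles only equality queries and $set, but it follows the same rules: at most one document is updated, nModified stays 0 when the value is already set, and on an upsert the new document is built from the query's equality fields plus the $set fields.

```javascript
// Hypothetical model of update(query, { $set: fields }, { upsert }) for
// equality-only queries. Like update() without { multi: true }, it touches at
// most one document.
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

function simulateUpdate(collection, query, update, options = {}) {
  const doc = collection.find((d) => matches(d, query));
  if (doc) {
    let modified = 0;
    for (const [field, value] of Object.entries(update.$set)) {
      if (doc[field] !== value) {
        doc[field] = value;
        modified = 1;
      }
    }
    return { nMatched: 1, nUpserted: 0, nModified: modified };
  }
  if (options.upsert) {
    // Upsert: build the new document from the query's equality fields plus $set.
    collection.push({ ...query, ...update.$set });
    return { nMatched: 0, nUpserted: 1, nModified: 0 };
  }
  return { nMatched: 0, nUpserted: 0, nModified: 0 }; // nothing matched, nothing inserted
}

const students = [];
const hersch = { _id: 4, StudName: "Hersch Gibbs", Grade: "VII" };
console.log(simulateUpdate(students, hersch, { $set: { Hobbies: "Graffiti" } }, { upsert: false }));
console.log(simulateUpdate(students, hersch, { $set: { Hobbies: "Graffiti" } }, { upsert: true }));
console.log(simulateUpdate(students, { _id: 4 }, { $set: { Location: "Newark" } }));
console.log(students[0]);
```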
Objective:
To remove the field "Location" with the value "Newark" from a document with
_id: 4 in the Students collection.
Input:
db.Students.find({_id:4}).pretty()
Output:
{
"_id": 4,
"Grade": "VII",
"StudName": "Hersch Gibbs",
"Hobbies": "Graffiti",
"Location": "Newark"
}
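Act:
db.Students.update({_id:4}, {$unset: {Location: "Newark"}});
This uses the $unset operator, which deletes the field from the document (the
value given inside $unset does not affect the operation).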
Output:
Outcome:
db.Students.find({_id:4}).pretty()
Result:
{
"_id": 4,
"Grade": "VII",
"StudName": "Hersch Gibbs",
"Hobbies": "Graffiti"
}
• Adds a new field Sports with the value "Football" to the document with
_id: 4.
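db.Students.update(
{Grade: "VII"},
{$set: {Sports: "Cricket"}},
{multi: true}
)
• Sets Sports to "Cricket" in every document whose Grade is "VII"; without
{multi: true}, only the first matching document would be updated.
db.Students.replaceOne(
{_id: 4},
{
_id: 4,
Grade: "X",
StudName: "Hersch Gibbs",
Hobbies: "Graffiti",
Sports: "Football"
}
)
• Replaces the whole document with _id 4; any field not listed in the
replacement document is removed.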
db.Students.update(
{_id: 5},
{$set: {StudName: "Paul Adams", Grade: "VIII"}},
{upsert: true}
)
Commands:
db.Students.find().sort({Grade: 1}) // Ascending
db.Students.find().sort({Grade: -1}) // Descending
• Equivalent in SQL:
SELECT * FROM Students ORDER BY Grade ASC;
SELECT * FROM Students ORDER BY Grade DESC;
Limiting Results – limit() Method
Objective:
Retrieve only a certain number of documents.
Command:
db.Students.find().limit(3)
• Returns the first 3 documents from the collection.
• Equivalent in SQL:
SELECT * FROM Students LIMIT 3;
Act: Find the document wherein the "StudName" has value "Aryan David".
db.Students.find({StudName:"Aryan David"});
Outcome:
RDBMS equivalent:
Select *
From Students
Where StudName like 'Aryan David';
Objective: To display only the StudName from all the documents of the
Students collection. The identifier "_id" should be suppressed and NOT
displayed.
Act:
db.Students.find({}, {StudName: 1,_id:0});
Outcome:
RDBMS equivalent:
Select StudName
From Students;
Objective: To display only the StudName and Grade from all the documents of
the Students collection. The identifier _id should be suppressed and NOT
displayed.
Act:
db.Students.find({}, {StudName:1,Grade: 1,_id:0});
Outcome:
RDBMS equivalent:
Select StudName, Grade
From Students;
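Projection can be sketched as a function that copies only the requested fields. The project() helper below is hypothetical and covers only inclusion projections ({ field: 1 }) plus the _id: 0 exception; exclusion projections such as { Hobbies: 0 } are not modelled. The grades and hobbies in the sample are made-up values.

```javascript
// Hypothetical inclusion projection: keep the listed fields, and keep _id
// unless the projection explicitly sets _id: 0 (mirrors find({}, projection)).
function project(doc, projection) {
  const result = {};
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Made-up sample documents for illustration.
const students = [
  { _id: 1, StudName: "Aryan David", Grade: "VII", Hobbies: "Chess" },
  { _id: 2, StudName: "Vamsi Bapat", Grade: "VI", Hobbies: "Skating" },
];

// Like db.Students.find({}, {StudName: 1, _id: 0})
console.log(students.map((d) => project(d, { StudName: 1, _id: 0 })));
// Like db.Students.find({_id: 1}, {StudName: 1, Grade: 1})
console.log(students.filter((d) => d._id === 1).map((d) => project(d, { StudName: 1, Grade: 1 })));
```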
Objective: To display the StudName and Grade as well as the identifier _id from
the document of the Students collection whose _id is 1.
Act:
db.Students.find({_id:1},{StudName:1,Grade:1});
Outcome:
RDBMS equivalent:
Select StudRollNo, StudName, Grade
From Students
Where StudRollNo = '1';
Objective: To display the StudName and Grade from the document of the
Students collection whose _id is 1. The _id field should NOT be
displayed.
Act:
db.Students.find({_id:1}, {StudName:1,Grade:1,_id:0});
Outcome:
RDBMS equivalent:
Select StudName, Grade
From Students
Where StudRollNo like '1';
RDBMS Equivalent:
Select *
From Students
Where Grade like 'VII';
Objective: To find those documents where the Grade is NOT set to 'VII'.
Act:
db.Students.find({Grade: {$ne: 'VII'}}).pretty();
RDBMS Equivalent:
Select *
From Students
Where Grade <> 'VII';
Objective: To find those documents from the Students collection where the
Hobbies is set to either 'Chess' or is set to 'Skating'.
Act:
db.Students.find ({Hobbies :{ $in: ['Chess', 'Skating']}}).pretty ();
Outcome:
RDBMS Equivalent:
Select *
From Students
Where Hobbies in ('Chess', 'Skating');
Objective: To find those documents from the Students collection where
Hobbies is set neither to 'Chess' nor to 'Skating'.
Act:
db.Students.find({Hobbies :{ $nin: ['Chess','Skating']}}).pretty ();
Outcome:
RDBMS Equivalent:
Select *
From Students
Where Hobbies not in ('Chess', 'Skating');
Objective: To find those documents from the Students collection where the
Hobbies is set to 'Graffiti' and the StudName is set to 'Hersch Gibbs' (AND
condition).
Act:
db.Students.find({Hobbies:'Graffiti', StudName: 'Hersch Gibbs'}).pretty();
Outcome:
RDBMS Equivalent:
Select *
From Students
Where Hobbies like 'Graffiti' and StudName like 'Hersch Gibbs';
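The filters used so far ($ne, $in, $nin and several fields combined with AND) can be modelled with a small matcher. find() below is a hypothetical sketch for scalar fields only; it reproduces one detail that SQL users often miss: $ne and $nin also match documents in which the field does not exist at all. The Hobbies of the first two students are made-up sample values.

```javascript
// Hypothetical matcher for a small subset of MongoDB query syntax on scalar
// fields: plain equality, $ne, $in and $nin. Fields in one filter are ANDed.
function matchCondition(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition; // plain equality
  }
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$ne":
        return value !== arg; // also true when the field is missing
      case "$in":
        return arg.includes(value);
      case "$nin":
        return !arg.includes(value); // also true when the field is missing
      default:
        throw new Error(`unsupported operator ${op}`);
    }
  });
}

function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, cond]) => matchCondition(doc[field], cond))
  );
}

const students = [
  { _id: 1, StudName: "Aryan David", Grade: "VII", Hobbies: "Chess" },
  { _id: 2, StudName: "Vamsi Bapat", Grade: "VI", Hobbies: "Skating" },
  { _id: 4, StudName: "Hersch Gibbs", Grade: "VII", Hobbies: "Graffiti" },
  { _id: 5, StudName: "Paul Adams", Grade: "VIII" }, // no Hobbies field
];
const ids = (docs) => docs.map((d) => d._id);

console.log(ids(find(students, { Hobbies: { $in: ["Chess", "Skating"] } })));  // [ 1, 2 ]
console.log(ids(find(students, { Hobbies: { $nin: ["Chess", "Skating"] } }))); // [ 4, 5 ]
console.log(ids(find(students, { Hobbies: "Graffiti", StudName: "Hersch Gibbs" }))); // [ 4 ]
```

The SQL NOT IN shown above would not return Paul Adams, whose Hobbies is NULL, whereas $nin does return the document that lacks the field.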
RDBMS Equivalent:
Select *
From Students
Where StudName like 'M%';
RDBMS Equivalent:
Select *
From Students
Where StudName like '%s';
RDBMS Equivalent:
Select *
From Students
Where StudName like '%e%';
Objective: To find documents from the Students collection where the
StudName ends in "a".
Act:
db.Students.find({StudName: {$regex:"a$"}}).pretty();
Outcome:
RDBMS Equivalent:
Select *
From Students
Where StudName like '%a';
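The LIKE patterns above map onto $regex sources in a regular way: a leading % removes the ^ anchor, a trailing % removes the $ anchor, an inner % becomes .* and _ becomes a single-character wildcard. likeToRegexSource() below is a hypothetical helper written to show that mapping. One difference it does not hide: $regex is case-sensitive unless the "i" option is given, while the case sensitivity of LIKE depends on the database.

```javascript
// Hypothetical helper: translate a SQL LIKE pattern into the $regex source
// used above ('%' = any run of characters, '_' = exactly one character).
function likeToRegexSource(pattern) {
  const core = pattern.replace(/^%+/, "").replace(/%+$/, "");
  let body = "";
  for (const ch of core) {
    if (ch === "%") body += ".*";
    else if (ch === "_") body += ".";
    else body += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // match literally
  }
  const start = pattern.startsWith("%") ? "" : "^"; // leading % -> unanchored start
  const end = pattern.endsWith("%") ? "" : "$";     // trailing % -> unanchored end
  return start + body + end;
}

const names = ["Aryan David", "Vamsi Bapat", "Hersch Gibbs", "Paul Adams", "Mable Mathews"];
const like = (pattern) => names.filter((n) => new RegExp(likeToRegexSource(pattern)).test(n));

console.log(likeToRegexSource("M%"), like("M%"));   // ^M [ 'Mable Mathews' ]
console.log(likeToRegexSource("%e%"), like("%e%")); // e [ 'Hersch Gibbs', 'Mable Mathews' ]
console.log(likeToRegexSource("%a"), like("%a"));   // a$ []
```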
RDBMS Equivalent:
Select *
From Students
Where StudName like 'M%';
RDBMS Equivalent:
Select *
From Students
Where Grade like 'VII' and rownum <4;
RDBMS Equivalent:
Select *
From Students
Order by StudName asc;
Objective: To sort the documents from the Students collection in the
descending order of StudName.
Act:
db.Students.find().sort({StudName:-1}).pretty();
Outcome:
RDBMS Equivalent:
Select *
From Students
Order by StudName desc;
Objective: To sort the documents from the Students collection first on Grade
in ascending order and then on Hobbies in descending order.
Act:
db.Students.find().sort({Grade:1, Hobbies:-1}).pretty();
Outcome:
RDBMS Equivalent:
Select *
From Students
Order by Grade asc, hobbies desc;
Objective: To sort the documents from the Students collection first on Grade
in ascending order and then on Hobbies in ascending order.
Act:
db.Students.find().sort({Grade:1, Hobbies:1}).pretty();
Outcome:
RDBMS Equivalent:
Select *
From Students
Order by Grade asc, Hobbies asc;
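A multi-key sort such as sort({Grade: 1, Hobbies: -1}) compares on the first key and consults the next key only to break ties. makeComparator() below is a hypothetical helper that builds such a comparator for Array.prototype.sort. It compares values with < and >, which is adequate for the strings used here but ignores MongoDB's ordering rules across different BSON types. Note that Grade values sort as strings, so "IX" would come before "VII". The sample documents are made up.

```javascript
// Hypothetical comparator mimicking sort({ field1: 1 or -1, field2: 1 or -1, ... }).
function makeComparator(sortSpec) {
  const keys = Object.entries(sortSpec); // key order in the spec = sort priority
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction; // direction 1: a first; -1: b first
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key: keep the original relative order (stable sort)
  };
}

// Made-up sample documents.
const students = [
  { _id: 1, Grade: "VII", Hobbies: "Chess" },
  { _id: 2, Grade: "VI", Hobbies: "Skating" },
  { _id: 3, Grade: "VII", Hobbies: "Graffiti" },
  { _id: 4, Grade: "VI", Hobbies: "Baseball" },
];
const ids = (docs) => docs.map((d) => d._id);

console.log(ids([...students].sort(makeComparator({ Grade: 1, Hobbies: -1 })))); // [ 2, 4, 3, 1 ]
console.log(ids([...students].sort(makeComparator({ Grade: 1, Hobbies: 1 }))));  // [ 4, 2, 1, 3 ]
```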
RDBMS Equivalent:
Select StudRollNo, StudName, Grade, Hobbies
From (Select StudRollNo, StudName, Grade, Hobbies, RowNum as
TheRowNum From Students)
Where TheRowNum > 2;
Objective: To sort the documents from the Students collection and skip the
first document from the output.
Act:
db.Students.find().skip(1).pretty().sort({StudName:1});
Outcome:
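RDBMS Equivalent:
Select StudRollNo, StudName, Grade, Hobbies
From (Select StudRollNo, StudName, Grade, Hobbies, RowNum as
TheRowNum From (Select * From Students Order by StudName))
Where TheRowNum > 1
Order by StudName;
(The innermost query sorts the rows before ROWNUM is assigned, because MongoDB
applies sort() before skip() regardless of the order in which they are chained.)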
Objective: To retrieve the third, fourth, and fifth document from the Students
collection.
Act:
db.Students.find().pretty().skip(2).limit(3);
Outcome:
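The chained cursor calls can be modelled with a small class. Cursor and page() below are hypothetical; the point they illustrate is that MongoDB applies sort() first, then skip(), then limit(), whatever order the methods are chained in, which is why find().skip(1).pretty().sort({StudName:1}) still skips the first document of the sorted output. Skipping (page - 1) * pageSize documents and limiting to pageSize is the usual way to page through results.

```javascript
// Hypothetical cursor model: however sort(), skip() and limit() are chained,
// they are applied in the fixed order sort -> skip -> limit.
class Cursor {
  constructor(docs) {
    this.docs = docs;
    this.compare = null;
    this.skipCount = 0;
    this.limitCount = 0; // 0 means "no limit", as with limit(0) in MongoDB
  }
  sort(compare) { this.compare = compare; return this; }
  skip(n) { this.skipCount = n; return this; }
  limit(n) { this.limitCount = n; return this; }
  toArray() {
    let out = [...this.docs];
    if (this.compare) out.sort(this.compare);
    out = out.slice(this.skipCount);
    return this.limitCount > 0 ? out.slice(0, this.limitCount) : out;
  }
}

// Hypothetical paging helper: page numbers start at 1.
function page(docs, pageNumber, pageSize) {
  return new Cursor(docs).skip((pageNumber - 1) * pageSize).limit(pageSize).toArray();
}

const docs = [1, 2, 3, 4, 5, 6, 7].map((n) => ({ _id: n }));
console.log(new Cursor(docs).skip(2).limit(3).toArray()); // the 3rd, 4th and 5th documents

const byName = (a, b) => (a.StudName < b.StudName ? -1 : a.StudName > b.StudName ? 1 : 0);
const named = [{ StudName: "Vamsi Bapat" }, { StudName: "Aryan David" }, { StudName: "Hersch Gibbs" }];
console.log(new Cursor(named).skip(1).sort(byName).toArray()); // Hersch Gibbs, Vamsi Bapat
```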
3.5.8 Arrays
Objective: To create a collection by the name "food" and then insert documents
into the "food" collection. Each document should have a "fruits" array.
Act:
db.food.insert({_id:1,fruits:[ 'banana','apple', 'cherry' ] })
db.food.insert({_id:2,fruits:[ 'orange','butterfruit','mango' ]})
db.food.insert({_id:3,fruits:[ 'pineapple', 'strawberry','grapes']});
db.food.insert({_id:4,fruits:[ 'banana', 'strawberry','grapes']});
db.food.insert({_id:5,fruits: [ 'orange','grapes']});
Objective: To find those documents from the "food" collection whose "fruits"
array consists of exactly "banana", "apple" and "cherry", in that order (an exact
array match also compares the order of the elements).
Act:
db.food.find({fruits: ['banana','apple', 'cherry']}).pretty()
Outcome:
Objective: To find those documents from the "food" collection which has the
"fruits" array having "banana", as an element.
Act:
db.food.find({fruits:'banana'})
Outcome:
Objective: To find those documents from the "food" collection which have the
"fruits" array having "grapes" in the first index position. The index position
begins at 0.
Act:
db.food.find({'fruits.1':'grapes'})
Outcome:
Objective: To find those documents from the "food" collection where "grapes" is
present in the 2nd index position of the "fruits" array.
Act:
db.food.find({'fruits.2':'grapes'})
Outcome:
Objective: To find those documents from the "food" collection where the size of
the array is two. The size implies that the array holds only 2 values.
Act:
db.food.find({"fruits":{$size:2}})
Outcome:
Objective: To find those documents from the "food" collection where the size of
the array is three. The size implies that the array holds only 3 values.
Act:
db.food.find({"fruits":{$size:3}})
Outcome:
Objective: To find the document with _id: 1 from the "food" collection and
display the first two elements from the array "fruits".
Act:
db.food.find({_id:1},{"fruits":{$slice:2}})
Outcome:
Objective: To find all documents from the "food" collection which have
elements "orange" and "grapes" in the array "fruits".
Act:
db.food.find({fruits: {$all: ["orange", "grapes"]}}).pretty();
Outcome:
Objective: To find those documents from the "food" collection which have the
element "orange" in the 0th index position in the array "fruits".
Act:
db.food.find({ "fruits.0" : "orange" }).pretty();
Outcome:
Objective: To find the document with _id: 1 from the "food" collection and
display two elements from the array "fruits", starting with the element at 0th
index position.
Act:
db.food.find({_id:1},{"fruits": {$slice: [0,2]}})
Outcome:
Objective: To find the document with _id: 1 from the "food" collection and
display two elements from the array "fruits", starting with the element at 1st
index position.
Act:
db.food.find({_id:1},{"fruits": {$slice:[1,2]}})
Outcome:
Objective: To find the document with _id: 1 from the "food" collection and
display three elements from the array "fruits", starting with the element at 2nd
index position. Since there are only 3 elements in the array "fruits" of the
document with _id:1, it displays only one element, the element at the 2nd index
position, that is, "cherry".
Act:
db.food.find({_id:1},{"fruits": {$slice: [2,3]}})
Outcome:
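The $slice projections above take two forms: a single number n returns the first n elements (or the last n when n is negative), and [skip, limit] skips that many elements and then returns at most limit of them. applySlice() below is a hypothetical plain-JavaScript model of those rules, checked against the fruits array of the document with _id: 1.

```javascript
// Hypothetical model of the $slice projection on a plain array.
function applySlice(array, spec) {
  if (typeof spec === "number") {
    // $slice: n -> first n elements; $slice: -n -> last n elements.
    return spec >= 0 ? array.slice(0, spec) : array.slice(spec);
  }
  const [skip, limit] = spec;
  // A negative skip counts from the end; if it overshoots, start at index 0.
  const start = skip >= 0 ? skip : Math.max(array.length + skip, 0);
  return array.slice(start, start + limit);
}

const fruits = ["banana", "apple", "cherry"]; // fruits of the document with _id: 1
console.log(applySlice(fruits, 2));      // [ 'banana', 'apple' ]
console.log(applySlice(fruits, [0, 2])); // [ 'banana', 'apple' ]
console.log(applySlice(fruits, [1, 2])); // [ 'apple', 'cherry' ]
console.log(applySlice(fruits, [2, 3])); // [ 'cherry' ] (only one element is left)
console.log(applySlice(fruits, -1));     // [ 'cherry' ]
```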
Objective: To update the document with "_id:4" and replace the element
present in the 1st index position of the "fruits" array with "apple".
Act:
db.food.update({_id:4}, {$set:{'fruits.1': 'apple'}})
Objective: To update the document with "_id:1" and replace the element
"apple" of the "fruits" array with "An apple".
Act:
db.food.update({_id:1, 'fruits':'apple'}, {$set: {'fruits.$': 'An apple' }})
Objective: To update the document with "_id:2" and push a new document of
key-value pairs (the fruit prices) into a "price" array; $push creates the
array if the field does not exist yet.
Act:
db.food.update({_id:2},{$push:{price:{orange:60,butterfruit:200,mango: 120}}})
Objective: To update the document with "_id:3" by removing two elements from
the array "fruits". The elements removed are "pineapple" and "grapes".
The document with "_id:3" before the update is
Act:
db.food.update({_id:3},{$pullAll:{fruits: [ 'pineapple','grapes' ]}});
Act:
db.food.update({fruits:'banana'}, {$pull:{fruits:'banana'}})
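The array update operators in this subsection can be replayed on plain JavaScript objects. The helper functions below are hypothetical, one per operator, and model only the simple forms used here ($pull with a plain value, not with a condition). The last step also reflects that update() without { multi: true } changes only the first matching document, so the $pull above removes "banana" from one document only; the sketch assumes that "first" means insertion order.

```javascript
// Hypothetical in-memory versions of the array update operators used above.
// Each function mutates the given document in place.
function setAtIndex(doc, field, index, value) {          // { $set: { "fruits.1": value } }
  doc[field][index] = value;
}

function setFirstMatch(doc, field, oldValue, newValue) { // { $set: { "fruits.$": newValue } }
  const i = doc[field].indexOf(oldValue); // "$" = first element matched by the query
  if (i !== -1) doc[field][i] = newValue;
}

function push(doc, field, value) {                       // { $push: { field: value } }
  if (doc[field] === undefined) doc[field] = [];         // a missing array field is created
  doc[field].push(value);
}

function pullAll(doc, field, values) {                   // { $pullAll: { field: [v1, v2] } }
  doc[field] = doc[field].filter((v) => !values.includes(v));
}

function pull(doc, field, value) {                       // { $pull: { field: value } }
  doc[field] = doc[field].filter((v) => v !== value);
}

// The five "food" documents inserted earlier.
const food = [
  { _id: 1, fruits: ["banana", "apple", "cherry"] },
  { _id: 2, fruits: ["orange", "butterfruit", "mango"] },
  { _id: 3, fruits: ["pineapple", "strawberry", "grapes"] },
  { _id: 4, fruits: ["banana", "strawberry", "grapes"] },
  { _id: 5, fruits: ["orange", "grapes"] },
];

setAtIndex(food[3], "fruits", 1, "apple");
setFirstMatch(food[0], "fruits", "apple", "An apple");
push(food[1], "price", { orange: 60, butterfruit: 200, mango: 120 });
pullAll(food[2], "fruits", ["pineapple", "grapes"]);
// update() without multi: only the first document containing "banana" changes.
pull(food.find((d) => d.fruits.includes("banana")), "fruits", "banana");
console.log(JSON.stringify(food));
```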
and display that group where the "TotAccBal" column has a value greater than
1200.
Let us start off by creating the collection "Customers" with the above displayed
four documents:
db.Customers.insert([{CustID:"C123",AccBal:500,AccType:"S"},
{CustID:"C123", AccBal: 900, AccType:"S"},
{CustID:"C111", AccBal: 1200, AccType:"S"},
{CustID:"C123", AccBal: 1500, AccType:"C"}]);
To confirm the presence of four documents in the "Customers" collection, use
the below syntax:
db.Customers.find().pretty();
To group on "CustID" and compute the sum of "AccBal", use:
db.Customers.aggregate([{$group:{_id:"$CustID",TotAccBal:{$sum:"$AccBal"}}}]);
In order to first filter on "AccType:S", then group on "CustID" and compute the
sum of "AccBal", use:
db.Customers.aggregate([{$match:{AccType:"S"}},
{$group:{_id:"$CustID",TotAccBal:{$sum:"$AccBal"}}}]);
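The stages discussed here, a $match on AccType, a $group with $sum, and a final filter keeping totals above 1200 (in the shell, a further stage such as { $match: { TotAccBal: { $gt: 1200 } } }), can be replayed in plain JavaScript on the four Customers documents. groupSum() is a hypothetical helper, and group order here follows first appearance, whereas MongoDB does not guarantee the order of $group output. The totals follow directly from the data: C123 sums to 2900 over all accounts but to 1400 over savings accounts only.

```javascript
// The four Customers documents from the text.
const customers = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" },
];

// Hypothetical stand-in for { $group: { _id: "$CustID", TotAccBal: { $sum: "$AccBal" } } }
function groupSum(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) ?? 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, TotAccBal: total }));
}

const allTotals = groupSum(customers, "CustID", "AccBal");
// { $match: { AccType: "S" } } before grouping
const savingsTotals = groupSum(customers.filter((c) => c.AccType === "S"), "CustID", "AccBal");
// { $match: { TotAccBal: { $gt: 1200 } } } after grouping
const bigSavers = savingsTotals.filter((g) => g.TotAccBal > 1200);

console.log(allTotals);     // C123: 2900, C111: 1200
console.log(savingsTotals); // C123: 1400, C111: 1200
console.log(bigSavers);     // only C123 (1400)
```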
Given below is the syntax that we will use to accomplish the objective.
db.Customers.mapReduce(
function() { emit(this.CustID, this.AccBal); }, // map
function(key, values) { return Array.sum(values); }, // reduce
{
query: { AccType: "S" }, // query: process only savings accounts
out: "Customer_Totals" // output: collection that receives the results
}
)
Map Function
var map = function() {
emit(this.CustID, this.AccBal);
}
Reduce Function
var reduce = function (key, values) { return Array.sum(values); }
To execute the query:
db.Customers.mapReduce(map, reduce, {out: "Customer_Totals", query: {AccType: "S"}});
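The map and reduce phases can be re-enacted outside MongoDB to see exactly what ends up in Customer_Totals, whose documents have the shape { _id: key, value: result }. runMapReduce() below is a hypothetical sketch: emit is passed to the map function as a parameter instead of being a global, Array.sum (a mongo shell helper) is replaced by Array.prototype.reduce, and, as in MongoDB, reduce is not called for a key that has only one value. Map-reduce has been deprecated since MongoDB 5.0 in favour of the aggregation pipeline, which computes the same totals.

```javascript
// Hypothetical re-enactment of mapReduce(map, reduce, { query, out }).
function runMapReduce(docs, map, reduce, query = {}) {
  const emitted = new Map(); // key -> list of values passed to emit()
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  const matchesQuery = (doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value);

  // Map phase: `this` is the current document, as in MongoDB's map function.
  for (const doc of docs.filter(matchesQuery)) map.call(doc, emit);

  // Reduce phase: one call per key; a key with a single value is not reduced.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const customers = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" },
];

const map = function (emit) { emit(this.CustID, this.AccBal); };
const reduce = (key, values) => values.reduce((sum, v) => sum + v, 0);

console.log(runMapReduce(customers, map, reduce, { AccType: "S" })); // C123 -> 1400, C111 -> 1200
```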
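Act:
db.system.js.insert({_id:"factorial",
value:function(n)
{
if (n <= 1) // base case; also stops the recursion for 0
return 1;
else
return n * factorial(n-1);
}
}
);
To execute the function "factorial", use the eval() method (db.eval() was
deprecated in MongoDB 3.0 and removed in 4.2, so these calls work only on older
servers):
db.eval("factorial(3)");
db.eval("factorial(5)");
db.eval("factorial(1)");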
});
Verification:
To confirm the presence of all 26 documents:
db.alphabets.find()
3.5.13 Indexes
Sample Data:
Collection:books
Contains 5 documents with fields like:
• _id, Category, Bookname, Author, Qty, Price, Pages
Example Categories:
• Machine Learning
• Web Mining
• Python
• Visualization
Creating an Index
To create an index on the Category field in the books collection:
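db.books.ensureIndex({"Category": 1});
(ensureIndex() is deprecated since MongoDB 3.0; newer versions use
db.books.createIndex({"Category": 1}) instead.)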
Checking Index Status
1. To check index stats:
db.books.stats();
Shows count, storage size, index count, index sizes, etc.
Example:
"indexes" : 2,
"indexSizes" : {
"_id_" : 8176,
"Category_1" : 8176
}
2. To list all indexes:
db.books.getIndexes();
• Shows keys and names of all indexes.
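getIndexes() lists the new index under the name "Category_1": by default MongoDB names an index by joining each field with its direction using underscores. The sketch below reproduces that naming rule and, separately, models a single-field index as a sorted array of [key, _id] pairs searched by binary search. This is a deliberate simplification (MongoDB's indexes are B-trees) meant only to show why an indexed lookup avoids scanning every document; the helpers and the sample _id values are hypothetical.

```javascript
// Default index name: each field joined with its direction by underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

// Simplified single-field index: [key, _id] pairs kept sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc) => [doc[field], doc._id])
    .sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));
}

// Binary search for the first entry >= key, then collect all equal keys.
function lookup(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < index.length && index[i][0] === key; i++) ids.push(index[i][1]);
  return ids;
}

// Hypothetical sample: five books using the categories listed above.
const books = [
  { _id: 1, Category: "Machine Learning" },
  { _id: 2, Category: "Web Mining" },
  { _id: 3, Category: "Python" },
  { _id: 4, Category: "Visualization" },
  { _id: 5, Category: "Machine Learning" },
];

const categoryIndex = buildIndex(books, "Category");
console.log(defaultIndexName({ Category: 1 }));         // Category_1
console.log(lookup(categoryIndex, "Machine Learning")); // [ 1, 5 ]
```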
3.5.14 mongoimport
Purpose:
The mongoimport command is used to import data into MongoDB from:
• CSV (Comma-Separated Values)
• TSV (Tab-Separated Values)
• JSON (JavaScript Object Notation)
Objective:
Import a CSV file named sample.txt located in the D: drive into the MongoDB
collection SampleJSON within the test database.
Contents of sample.txt:
_id,FName,LName
1,Samuel,Jones
2,Virat,Kumar
3,Raul,"A Simpson"
4,,"Andrew Simon"
Command to Import CSV File:
Run the following command in the command prompt:
mongoimport --db test --collection SampleJSON --type csv --headerline --file d:\sample.txt
• --db test → Target database
• --collection SampleJSON → Target collection
• --type csv → Input file type
• --headerline → Use the first line of the CSV file as field names
• --file → Path to the input file
Successful Output Message:
connected to: 127.0.0.1
imported 4 objects
Verifying the Import in Mongo Shell:
Steps:
1. Start the Mongo shell.
2. Switch to the test database:
use test
3. View collections:
show collections
4. Query the data:
db.SampleJSON.find().pretty();
3.5.15 mongoexport
Purpose:
The mongoexport command is used at the command prompt to export
MongoDB JSON documents into:
• CSV (Comma-Separated Values),
• TSV (Tab-Separated Values), or
• JSON (JavaScript Object Notation) formats.
Objective:
Export data from the Customers collection in the test database into a CSV file
named Output.txt in the D: drive.
Sample Data in MongoDB (Customers Collection):
{ "_id": ObjectId("..."), "CustID": "C123", "AccBal": 500, "AccType": "S" }
{ "_id": ObjectId("..."), "CustID": "C123", "AccBal": 900, "AccType": "S" }
{ "_id": ObjectId("..."), "CustID": "C111", "AccBal": 1200, "AccType": "S" }
{ "_id": ObjectId("..."), "CustID": "C123", "AccBal": 1500, "AccType": "C" }
Steps to Export the Data:
Step 1: Create fields.txt file
This file should contain the field names exactly as they appear in the
MongoDB collection, one per line:
CustID
AccBal
AccType
Important: Field names are case-sensitive. Only one field name should be
placed per line.
Step 2: Run the Export Command
mongoexport --db test --collection Customers --csv --fieldFile d:\fields.txt --out d:\output.txt
• --db test → Specifies the database.
• --collection Customers → Target collection.
• --csv → Specifies the output format (newer versions use --type=csv instead).
• --fieldFile → Points to the list of fields to include.
• --out → Output file location.
Expected Command Line Output:
connected to: 127.0.0.1
exported 4 records
Final Output File (Output.txt in D: Drive):
CustID,AccBal,AccType
"C123",500.0,"S"
"C123",900.0,"S"
"C111",1200.0,"S"
"C123",1500.0,"C"
seq: 0
}
)
• _id is a custom name (e.g., "empid") used to identify the sequence.
• seq is initialized to 0.
Step 2: Create a JavaScript Function getnextseq
This function will find and increment the sequence value using
findAndModify().
function getnextseq(name) {
var ret = db.usercounters.findAndModify({
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
});
return ret.seq;
}
• findAndModify() atomically finds the document and increments seq by 1.
• new: true returns the modified document after the update.
• Returns the incremented seq value.
Step 3: Use getnextseq() While Inserting New Documents
Use the getnextseq() function when inserting into a collection (e.g., users) to
auto-assign a unique _id:
db.users.insert(
{
_id: getnextseq("empid"),
Name: "sarah jane"
}
)
• The _id will now have an auto-incremented value based on the "empid"
sequence in usercounters.
Benefits:
• Ensures unique and sequential _id values.
• Useful in applications needing custom ID schemes (e.g., employee
numbers, customer IDs).
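The counter pattern above can be sketched outside MongoDB to check the numbering logic. The Map below stands in for the usercounters collection, and the sketch's getnextseq() mirrors the shell function. On a real server it is findAndModify() that makes the read-and-increment atomic when several clients insert concurrently, which a single-threaded sketch cannot demonstrate. Unlike the shell version (where findAndModify() returns null for a missing counter and ret.seq then fails), the sketch throws an explicit error. The second user name is hypothetical.

```javascript
// Plain JavaScript stand-in for the usercounters collection: { _id: "empid", seq: 0 }.
const usercounters = new Map([["empid", 0]]);

function getnextseq(name) {
  if (!usercounters.has(name)) throw new Error(`no counter named ${name}`);
  const next = usercounters.get(name) + 1; // $inc: { seq: 1 }
  usercounters.set(name, next);
  return next; // new: true -> return the value after the increment
}

const users = [];
users.push({ _id: getnextseq("empid"), Name: "sarah jane" });
users.push({ _id: getnextseq("empid"), Name: "hypothetical second user" });
console.log(users); // _id 1 and _id 2
```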
*****END*****