This README is a practical, beginner-to-intermediate guide to MongoDB.
MongoDB is a NoSQL document database. Instead of rows and columns, it stores data as flexible JSON-like documents (BSON under the hood).
- Database: Top-level container (like a project namespace).
- Collection: Group of related documents (like a table).
- Document: A single record (like a row), represented as key-value pairs.
Example document:

```json
{
  "_id": "64ef...",
  "name": "Pritam",
  "email": "pritam@example.com",
  "age": 27,
  "skills": ["javascript", "mongodb"],
  "address": {
    "city": "Kolkata",
    "country": "India"
  }
}
```

- Flexible schema: Documents in the same collection can vary.
- Fast development: JSON-like model maps naturally to app objects.
- Rich query language and aggregation framework.
- Built for scale with replication and sharding.
- BSON: Binary JSON format used internally by MongoDB.
- `_id`: Unique primary key for every document (auto-created if not provided).
- ObjectId: Default `_id` type.
- Index: Data structure that speeds reads.
- Replica Set: Group of nodes for high availability.
- Shard: Horizontal partition of data for scale.
- Aggregation Pipeline: Stage-based data processing.
- Database (SQL) -> Database (MongoDB)
- Table -> Collection
- Row -> Document
- Column -> Field
- JOIN -> `$lookup` (or denormalized design)
- GROUP BY -> `$group`
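To make the JOIN and GROUP BY mappings concrete, here is a small plain-JavaScript sketch of what `$lookup` and `$group` do conceptually. The in-memory arrays and field names are made up for illustration; real pipelines run server-side.

```javascript
// Hypothetical in-memory data standing in for two collections.
const users = [
  { _id: "u1", name: "Asha" },
  { _id: "u2", name: "Ravi" }
];
const orders = [
  { _id: "o1", userId: "u1", total: 500 },
  { _id: "o2", userId: "u1", total: 900 },
  { _id: "o3", userId: "u2", total: 300 }
];

// What $lookup does conceptually: for each user, attach matching orders.
const joined = users.map((u) => ({
  ...u,
  orders: orders.filter((o) => o.userId === u._id)
}));

// What $group with $sum does conceptually: total per userId (like GROUP BY).
const totals = {};
for (const o of orders) {
  totals[o.userId] = (totals[o.userId] || 0) + o.total;
}

console.log(joined[0].orders.length); // 2
console.log(totals.u1); // 1400
```

Unlike a SQL JOIN, `$lookup` attaches the matches as an embedded array rather than producing one flat row per match.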
If you are using Mongoose, install both Node dependencies and a MongoDB server (local or Atlas).
Install Node.js LTS (includes npm).
Check:

```bash
node -v
pnpm -v
```

Initialize a project and add dependencies:

```bash
pnpm init
pnpm add mongoose dotenv
```

Use MongoDB Atlas, the official cloud-hosted MongoDB service. It has a free tier and requires no local installation.
- Go to https://www.mongodb.com/cloud/atlas and sign up.
- Create a free M0 cluster (free forever, no credit card needed).
- Under Database Access, create a database user with a username and password.
- Under Network Access, add your current IP address (or `0.0.0.0/0` for development).
- Go to Clusters → Connect → Drivers, select Node.js, and copy the connection string.
The connection string looks like:

```
mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/<dbname>?retryWrites=true&w=majority
```

Replace `<username>`, `<password>`, and `<dbname>` with your values.
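If your password contains special characters (`@`, `/`, `:` and similar), it must be percent-encoded in the URI or the driver will misparse it. A minimal sketch of assembling a URI safely; the credentials and host below are placeholders, not real values:

```javascript
// Placeholder credentials for illustration only.
const user = "appUser";
const password = "p@ss/word!"; // contains characters that must be escaped
const host = "cluster0.xxxxx.mongodb.net";
const dbName = "school";

// encodeURIComponent percent-encodes reserved characters like @ and /.
const uri =
  `mongodb+srv://${encodeURIComponent(user)}:${encodeURIComponent(password)}` +
  `@${host}/${dbName}?retryWrites=true&w=majority`;

console.log(uri.includes("p%40ss%2Fword!")); // true
```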
Create `.env`:

```
MONGODB_URI=mongodb://127.0.0.1:27017/school
```

For Atlas, use your Atlas URI in `MONGODB_URI`.
Create `index.js`:

```js
import "dotenv/config";
import mongoose from "mongoose";

async function start() {
  try {
    await mongoose.connect(process.env.MONGODB_URI);
    console.log("MongoDB connected with Mongoose");
    await mongoose.connection.close();
  } catch (error) {
    console.error("Connection failed:", error.message);
    process.exit(1);
  }
}

start();
```

Set the package type in `package.json` (required for `import` syntax):

```json
{
  "type": "module"
}
```

Run:

```bash
node index.js
```

If it prints `MongoDB connected with Mongoose`, the installation is correct.
Assume you already created a `Student` model.

Insert:

```js
await Student.create({ name: "Asha", age: 20, course: "CS", marks: 89 });

await Student.insertMany([
  { name: "Ravi", age: 21, course: "Math", marks: 76 },
  { name: "Meera", age: 19, course: "CS", marks: 92 }
]);
```

Read:

```js
await Student.find();
await Student.find({ course: "CS" });
await Student.find({ marks: { $gte: 80 } });
await Student.findOne({ name: "Asha" });
```

Projection (pick fields):

```js
await Student.find({ course: "CS" }).select("name marks -_id");
```

Update:

```js
await Student.updateOne({ name: "Asha" }, { $set: { marks: 91 } });
await Student.updateMany({ course: "CS" }, { $inc: { marks: 2 } });
```

Delete:

```js
await Student.deleteOne({ name: "Ravi" });
await Student.deleteMany({ marks: { $lt: 40 } });
```

Comparison operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`
Example (Mongoose):
```js
await Student.find({ marks: { $gte: 80, $lte: 95 } });
```

Logical operators: `$and`, `$or`, `$not`, `$nor`
Example (Mongoose):
```js
await Student.find({
  $or: [{ course: "CS" }, { marks: { $gt: 90 } }]
});
```

Element and array operators: `$exists`, `$type`, `$all`, `$elemMatch`, `$size`
Example (Mongoose):
```js
await User.find({ skills: { $all: ["mongodb", "nodejs"] } });
```

Sorting and pagination:

```js
await Student.find().sort({ marks: -1 }).limit(5).skip(0);
```

- `sort`: `1` ascending, `-1` descending
- `limit`: max docs returned
- `skip`: pagination offset
```js
// Case-insensitive search
await Student.find({ name: { $regex: "^ash", $options: "i" } });

// Count documents
const totalCS = await Student.countDocuments({ course: "CS" });

// Exists / missing field
await Student.find({ phone: { $exists: false } });

// Lean query (faster read, plain JS objects)
const rows = await Student.find({ marks: { $gte: 80 } }).lean();

// Pagination helper
const page = 1;
const pageSize = 10;
const items = await Student.find()
  .sort({ createdAt: -1 })
  .skip((page - 1) * pageSize)
  .limit(pageSize);
```

Without indexes, MongoDB scans many documents.
Create indexes:

```js
studentSchema.index({ email: 1 }, { unique: true });
studentSchema.index({ course: 1, marks: -1 });
```

Check indexes:

```js
await Student.collection.getIndexes();
```

Inspect the query plan:

```js
await Student.find({ course: "CS" }).explain("executionStats");
```

Aggregation transforms and analyzes data in stages.
Common stages:
- `$match`: filter
- `$group`: aggregate
- `$project`: reshape
- `$sort`: sort
- `$limit`: limit
- `$lookup`: join-like
- `$unwind`: flatten arrays
Example: average marks by course.
```js
await Student.aggregate([
  { $match: { marks: { $gte: 50 } } },
  {
    $group: {
      _id: "$course",
      avgMarks: { $avg: "$marks" },
      count: { $sum: 1 }
    }
  },
  { $sort: { avgMarks: -1 } }
]);
```

MongoDB's schema is flexible, but design still matters.
- Embed when related data is read together and bounded in size.
- Reference when relation is large, shared, or updated separately.
Embedded example:

```json
{
  "name": "Order-1001",
  "customer": { "id": 1, "name": "Asha" },
  "items": [
    { "sku": "P1", "qty": 2, "price": 500 },
    { "sku": "P2", "qty": 1, "price": 900 }
  ]
}
```

Referenced example:

```json
{
  "orderId": "1001",
  "customerId": "c001",
  "itemIds": ["i1", "i2"]
}
```

- Model around query patterns first.
- Keep documents under 16 MB.
- Avoid unbounded arrays in hot documents.
- Add validation rules where possible.
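A small sketch of guarding a hot document against unbounded array growth. The helper, field names, and budgets here are hypothetical; JSON byte length is only a rough proxy for BSON size (BSON adds type and length metadata), so treat this as a sanity check, not an exact measurement:

```javascript
// Hypothetical guard: refuse to embed more items once a document nears limits.
const MAX_EMBEDDED_ITEMS = 1000; // illustrative app-level cap

function canEmbedMore(doc) {
  // Approximate document size in bytes via its JSON encoding.
  const approxBytes = Buffer.byteLength(JSON.stringify(doc), "utf8");
  const underSizeBudget = approxBytes < 15 * 1024 * 1024; // stay below the 16 MB cap
  const underItemBudget = (doc.items?.length ?? 0) < MAX_EMBEDDED_ITEMS;
  return underSizeBudget && underItemBudget;
}

const order = { name: "Order-1001", items: [{ sku: "P1", qty: 2 }] };
console.log(canEmbedMore(order)); // true

const bloated = { name: "Order-hot", items: new Array(1000).fill({ sku: "x" }) };
console.log(canEmbedMore(bloated)); // false
```

When a document hits such a budget, the usual fix is to move the array into its own collection and reference the parent.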
With Mongoose, enforce structure directly in the schema definition.
```js
import mongoose from "mongoose";

const studentSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true },
  age: { type: Number, min: 0 }
});

const Student = mongoose.model("Student", studentSchema);
```

MongoDB supports multi-document ACID transactions (on replica sets and sharded clusters).
Use transactions when one operation depends on another and all must succeed together.
- Replica set has primary and secondary nodes.
- Writes go to primary; secondaries replicate data.
- On primary failure, automatic election picks new primary.
Benefits:
- High availability
- Fault tolerance
- Read scaling (read preferences)
Sharding splits data across multiple servers by a shard key.
- Needed when one server cannot handle data size or throughput.
- Choose shard key carefully to avoid hot shards.
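The routing idea behind a hashed shard key can be sketched in plain JavaScript: a deterministic hash of the key decides which shard owns a document, so the same key always lands on the same shard. The hash function and shard count below are illustrative; MongoDB uses its own internal hash and range-based chunk assignment:

```javascript
const NUM_SHARDS = 3; // illustrative cluster size

// Simple rolling hash for demonstration (not MongoDB's actual hash).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

function shardFor(key) {
  return hashKey(key) % NUM_SHARDS;
}

// Deterministic: the same key always routes to the same shard.
console.log(shardFor("user-42") === shardFor("user-42")); // true
console.log(shardFor("user-42") >= 0 && shardFor("user-42") < NUM_SHARDS); // true
```

Hashing spreads monotonically increasing keys (like timestamps) across shards, which is one way to avoid the hot-shard problem mentioned above.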
- Enable authentication and role-based access control.
- Use TLS for encryption in transit.
- Use encryption at rest where possible.
- Restrict network access with firewall/VPC.
- Never expose an unauthenticated database publicly.
Common tools:
- `mongodump` for logical backup.
- `mongorestore` for restore.
- Atlas offers managed snapshots and point-in-time recovery options.
- MongoDB Compass: GUI for browsing data and running queries.
- mongosh: Command line shell.
- MongoDB Atlas: Managed cloud MongoDB.
Popular drivers:
- Node.js (`mongodb`, `mongoose`)
- Python (`pymongo`)
- Java (`mongodb-driver-sync`)
If you will use Mongoose with Node.js, start like this.
Install:

```bash
pnpm add mongoose
```

Connect:

```js
import mongoose from "mongoose";

const uri = process.env.MONGODB_URI || "mongodb://127.0.0.1:27017/school";
await mongoose.connect(uri);
console.log("MongoDB connected");
```

Define a model (`models/student.model.js`):

```js
import mongoose from "mongoose";

const studentSchema = new mongoose.Schema(
  {
    name: { type: String, required: true, trim: true },
    email: { type: String, required: true, unique: true, lowercase: true },
    marks: { type: Number, min: 0, max: 100, default: 0 },
    course: { type: String, index: true }
  },
  { timestamps: true }
);

const Student = mongoose.model("Student", studentSchema);
export default Student;
```

CRUD with the model:

```js
import Student from "./models/student.model.js";

// Create
const asha = await Student.create({
  name: "Asha",
  email: "asha@example.com",
  marks: 88,
  course: "CS"
});

// Read
const topStudents = await Student.find({ marks: { $gte: 80 } })
  .select("name marks course -_id")
  .sort({ marks: -1 })
  .limit(5);

// Update
await Student.updateOne({ email: "asha@example.com" }, { $inc: { marks: 2 } });

// Delete
await Student.deleteOne({ email: "asha@example.com" });
```

References with `populate`:

```js
import mongoose from "mongoose";

const userSchema = new mongoose.Schema({
  name: String
});

const postSchema = new mongoose.Schema({
  title: String,
  author: { type: mongoose.Schema.Types.ObjectId, ref: "User", required: true }
});

const User = mongoose.model("User", userSchema);
const Post = mongoose.model("Post", postSchema);

const user = await User.create({ name: "Pritam" });
await Post.create({ title: "MongoDB Basics", author: user._id });

const posts = await Post.find().populate("author", "name");
```

Transactions:

```js
const session = await mongoose.startSession();
try {
  await session.withTransaction(async () => {
    await Student.create([{ name: "Meera", email: "meera@example.com" }], { session });
    await Student.updateOne({ email: "asha@example.com" }, { $inc: { marks: 1 } }, { session });
  });
} finally {
  await session.endSession();
}
```

Note: Transactions require a replica set or an Atlas cluster.

Middleware hook:

```js
studentSchema.pre("save", function (next) {
  this.name = this.name.trim();
  next();
});
```

Use Mongoose validation for app-level checks and MongoDB indexes for uniqueness/performance.
- Create indexes for real query patterns.
- Use projections to reduce payload.
- Avoid frequent full collection scans.
- Keep schema consistent even if flexible.
- Use aggregation for server-side data processing.
- Monitor performance with `explain()` and metrics.
- Plan for backup, security, and scaling from day one.
- Keep Mongoose schemas strict and explicit for predictable data.
- Handle DB connection and query errors with centralized middleware.
- Treating MongoDB exactly like SQL without rethinking model.
- Overusing references when embedding is better.
- Missing indexes on filter/sort fields.
- Storing huge unbounded arrays in one document.
- Ignoring validation and security in development.
```js
import mongoose from "mongoose";

const productSchema = new mongoose.Schema({
  name: String,
  category: String,
  price: Number,
  stock: Number
});

const Product = mongoose.model("Product", productSchema);

await Product.insertMany([
  { name: "Laptop", category: "Electronics", price: 70000, stock: 12 },
  { name: "Mouse", category: "Electronics", price: 800, stock: 100 },
  { name: "Book", category: "Education", price: 450, stock: 50 }
]);

await Product.find({ category: "Electronics" });
await Product.updateOne({ name: "Mouse" }, { $inc: { stock: -1 } });

await Product.aggregate([
  { $group: { _id: "$category", totalStock: { $sum: "$stock" } } }
]);
```

MongoDB uses WiredTiger as its default storage engine (since v3.2).
- Data is stored in compressed form on disk inside `.wt` files.
- WiredTiger uses a B-tree data structure to organize documents and indexes on disk.
- Compression is applied per block (snappy by default; zlib and zstd are available), reducing disk usage significantly.
When you insert a document, MongoDB serializes it to BSON (Binary JSON) before writing it to disk.
- BSON is a binary-encoded format that adds type information and length prefixes to each field.
- This makes field traversal and size calculation fast without parsing the entire document.
- Each document lives inside a collection's data file and is identified by its `_id`.
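The point about length prefixes can be shown with a tiny encoding sketch: when each value records its byte length up front, a reader can skip a field without decoding it. This imitates the idea only; the real BSON layout (type tags, total-document length, per-type encodings) is more involved:

```javascript
// Toy length-prefixed field: NUL-terminated name, 4-byte length, then value.
function encodeField(name, value) {
  const nameBuf = Buffer.from(name + "\0", "utf8");
  const valueBuf = Buffer.from(value, "utf8");
  const len = Buffer.alloc(4);
  len.writeInt32LE(valueBuf.length, 0);
  return Buffer.concat([nameBuf, len, valueBuf]);
}

// Skip one field: scan past the name, read the length, jump over the value.
function skipField(buf, offset) {
  let i = offset;
  while (buf[i] !== 0) i++;            // end of field name
  const valueLen = buf.readInt32LE(i + 1);
  return i + 1 + 4 + valueLen;         // offset of the next field
}

const doc = Buffer.concat([
  encodeField("name", "Pritam"),
  encodeField("city", "Kolkata")
]);

// Jump straight past the first field without ever decoding "Pritam".
const next = skipField(doc, 0);
console.log(doc.subarray(next, next + 4).toString()); // "city"
```

This is why a query that only touches one field does not need to parse every other field in the document.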
MongoDB guarantees durability through a write-ahead log.
- Every write is first appended to the journal (log files under the `journal/` directory) before being applied to the data files.
- If the server crashes mid-write, MongoDB replays the journal on restart to recover to a consistent state.
- WiredTiger flushes the journal to disk every 100 ms by default (`storage.journal.commitIntervalMs`).
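The append-then-replay mechanism can be sketched in a few lines. This is illustrative only: the real journal is binary, block-based, and flushed on an interval, but the recovery logic is the same shape:

```javascript
// Toy write-ahead log: append to the log first, then apply to data.
const journal = [];
let data = {};

function write(key, value) {
  journal.push({ key, value }); // 1. durable log entry first
  data[key] = value;            // 2. then the data structure
}

write("a", 1);
write("b", 2);

// Simulate a crash that loses in-memory/data state but not the log.
data = {};

// Recovery: replay the journal in order to rebuild a consistent state.
for (const entry of journal) {
  data[entry.key] = entry.value;
}

console.log(data); // { a: 1, b: 2 }
```

Because the log entry always lands before the data write, any write that made it to the journal survives the crash, and any write that did not is cleanly absent.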
Every index in MongoDB is stored as a B-tree on disk.
- The leaf nodes of the B-tree hold the indexed field value and a pointer (RecordId) to the actual document.
- For a compound index `{ course: 1, marks: -1 }`, the B-tree is sorted first by `course` ascending, then by `marks` descending within each course.
- When a query matches the index prefix, MongoDB traverses the B-tree instead of scanning the full collection, which is why indexes matter so much.
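The key order of that compound index can be reproduced with a plain comparator, which is a handy way to reason about which queries an index can serve without a sort stage (sketch; index entries also carry a RecordId, omitted here):

```javascript
// Entries as they would be ordered in a { course: 1, marks: -1 } index.
const entries = [
  { course: "CS", marks: 92 },
  { course: "Math", marks: 76 },
  { course: "CS", marks: 89 }
];

entries.sort((a, b) => {
  if (a.course !== b.course) return a.course < b.course ? -1 : 1; // course ascending
  return b.marks - a.marks;                                       // marks descending
});

console.log(entries.map((e) => `${e.course}:${e.marks}`));
// [ 'CS:92', 'CS:89', 'Math:76' ]
```

A query like `find({ course: "CS" }).sort({ marks: -1 })` can stream results straight out of this ordering with no extra sort.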
WiredTiger uses document-level (optimistic) concurrency control, not collection-level locks.
- Multiple writes to different documents in the same collection can proceed concurrently.
- Uses MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot of data without blocking writers.
- Conflicts on the same document are serialized.
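The snapshot behavior of MVCC can be sketched with a copy-on-read view: a reader's snapshot is frozen when its operation starts, so a concurrent write does not change what it sees. Illustrative only; WiredTiger tracks versions per update internally rather than copying whole documents:

```javascript
// Live store, standing in for the collection.
const store = new Map([["doc1", { marks: 80 }]]);

// Take a snapshot: the reader's view is frozen at this moment.
function snapshot() {
  return new Map([...store].map(([k, v]) => [k, { ...v }]));
}

const readerView = snapshot();

// A writer updates the live store after the snapshot was taken.
store.set("doc1", { marks: 91 });

console.log(readerView.get("doc1").marks); // 80 (reader unaffected, no blocking)
console.log(store.get("doc1").marks);      // 91 (writer sees its own update)
```

This is why readers do not block writers: each side works against its own consistent version of the data.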
WiredTiger maintains an in-memory cache (default: 50% of RAM − 1 GB).
- Frequently accessed pages (B-tree nodes, documents, index entries) are kept in the cache.
- Dirty pages (modified but not yet flushed) are written to disk during checkpoints (every 60 seconds by default).
- This is why the first query after startup is slower — the cache is cold.
When you run a query, MongoDB's query planner picks an execution strategy:
- Candidate plan selection: The planner considers all applicable indexes and generates candidate plans.
- Trial run: Candidate plans are run concurrently for a small number of documents to measure performance.
- Winning plan: The fastest plan is selected and cached for that query shape.
- Execution: The winning plan is carried out, either as an index scan (`IXSCAN`) or a collection scan (`COLLSCAN`).
You can inspect this with:

```js
await Student.find({ course: "CS" }).explain("executionStats");
```

Key fields to look at in the output:

- `winningPlan.stage`: `IXSCAN` (good) vs `COLLSCAN` (potentially slow)
- `totalDocsExamined`: how many documents were scanned
- `totalKeysExamined`: how many index entries were scanned
- `executionTimeMillis`: how long the query took
```
Client write
  ↓
MongoDB driver (BSON serialization)
  ↓
mongod process receives operation
  ↓
WiredTiger appends to journal (WAL)
  ↓
Document written / updated in WiredTiger cache
  ↓
B-tree indexes updated in cache
  ↓
Background checkpoint flushes dirty pages to .wt data files
```
- Advanced indexing (TTL, text, wildcard, partial indexes)
- Aggregation deep dive (`$facet`, `$bucket`, `$setWindowFields`)
- Atlas monitoring and performance tuning
- Change streams and event-driven architecture
- Schema versioning and migration strategies