sql
SQL (Structured Query Language) is a special language used to talk to databases. Think of it like a
universal language for organizing, managing, and retrieving information stored in a structured way.
Main Applications:
Creating and managing databases: You use SQL to set up the structure of your database,
create tables, and define how data should be stored.
Storing and retrieving data: This is its primary job. You use SQL to put new information into the
database (like adding a new customer), change existing information (like updating an address),
delete information (like removing an old record), and most importantly, get specific information out
of the database (like listing all customers from a particular city).
Data analysis and reporting: Businesses use SQL to pull data for reports, analyze trends, and
make informed decisions.
Web applications: Most websites that deal with user data (like e-commerce sites, social media
platforms) use SQL databases in the background to store user profiles, posts, product
information, etc.
Think of SQL (relational) databases as a highly organized library with strict rules. Every book has a specific place
(shelf, row), and every piece of information about a book (title, author, ISBN) has a designated
spot.
Structure: They use tables with fixed columns and rows. You define the structure (schema)
before you put any data in.
Data Relationships: Data in different tables can be linked together using common fields (like
linking a "Books" table to an "Authors" table).
Scalability: Traditionally, they scale vertically (meaning you make the existing server more
powerful). Scaling horizontally (adding more servers) can be more complex.
ACID Properties: They generally guarantee ACID properties (explained later), which means data
is very reliable and consistent.
Examples: MySQL, PostgreSQL, Oracle, SQL Server.
When to use: When you need strong data consistency, complex relationships between data, and
a clear, predictable structure (e.g., banking systems, e-commerce transactions).
Think of NoSQL databases as a very flexible filing system. You can throw in different types of documents,
and they don't all have to follow the exact same format.
Structure: They are more flexible and don't require a fixed schema. Data can be stored in various
formats (key-value pairs, documents, graphs, etc.).
Data Relationships: Relationships are typically less rigid or handled differently.
Scalability: They are designed to scale horizontally (adding more servers) easily, making them
good for very large amounts of data.
ACID Properties: They often relax some ACID properties to achieve better performance and
scalability, especially for distributed systems.
Examples: MongoDB (document), Cassandra (column-family), Redis (key-value), Neo4j (graph).
When to use: When you have large amounts of unstructured or semi-structured data, need high
scalability and flexibility, or when the data relationships are less defined (e.g., social media feeds,
real-time analytics, IoT data).
Joins are used to combine rows from two or more tables based on a related column between them.
Employees Table:
Departments Table:
DepartmentID  DepartmentName
101           Sales
102           Marketing
103           HR
1. INNER JOIN:
What it does: Returns only the rows where there's a match in both tables based on the join
condition. It's like finding the intersection of two sets.
Example: Find employees who are in a department.
Result:
Name
Alice
Bob
Charlie
2. LEFT JOIN:
What it does: Returns all rows from the left table, and the matching rows from the right table. If
there's no match in the right table, it will show NULL for the right table's columns.
Example: Get all employees and their department if they have one.
Result:
Name
Alice
Bob
Charlie
David
3. RIGHT JOIN:
What it does: Returns all rows from the right table, and the matching rows from the left table. If
there's no match in the left table, it will show NULL for the left table's columns.
Example: Get all departments and the employees in them (even if a department has no
employees).
Result:
Name
Alice
Bob
Charlie
NULL
4. FULL JOIN:
What it does: Returns all rows from both tables, matched where the join condition holds. Where a
row has no match, the columns of the other table are shown as NULL. It's like combining the
results of a LEFT JOIN and a RIGHT JOIN.
Example: Get all employees and all departments, matching them where possible.
Result:
Name
Alice
Bob
Charlie
David
NULL
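The four join types can be tried side by side with a small sketch. It uses Python's built-in sqlite3 module; the Employees rows (in particular which department each person belongs to) are assumed for illustration and were chosen to match the results listed above. Older SQLite builds do not support RIGHT JOIN or FULL JOIN, so the sketch emulates them: a RIGHT JOIN is a LEFT JOIN with the tables swapped, and a FULL JOIN is the UNION of both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (101, 'Sales'), (102, 'Marketing'), (103, 'HR');
    -- Hypothetical rows: the department assignments are assumed for illustration.
    INSERT INTO Employees VALUES
        (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Charlie', 101), (4, 'David', NULL);
""")

# INNER JOIN: only employees with a matching department.
inner_rows = conn.execute("""
    SELECT E.Name, D.DepartmentName
    FROM Employees E
    INNER JOIN Departments D ON E.DepartmentID = D.DepartmentID
    ORDER BY E.EmployeeID
""").fetchall()

# LEFT JOIN: every employee, with NULL where no department matches.
left_rows = conn.execute("""
    SELECT E.Name, D.DepartmentName
    FROM Employees E
    LEFT JOIN Departments D ON E.DepartmentID = D.DepartmentID
    ORDER BY E.EmployeeID
""").fetchall()

# RIGHT JOIN emulated by swapping the tables of a LEFT JOIN:
# every department, with NULL where no employee matches.
right_rows = conn.execute("""
    SELECT E.Name, D.DepartmentName
    FROM Departments D
    LEFT JOIN Employees E ON E.DepartmentID = D.DepartmentID
    ORDER BY D.DepartmentID, E.EmployeeID
""").fetchall()

# FULL JOIN emulated as the UNION of both directions.
full_rows = conn.execute("""
    SELECT E.Name, D.DepartmentName
    FROM Employees E
    LEFT JOIN Departments D ON E.DepartmentID = D.DepartmentID
    UNION
    SELECT E.Name, D.DepartmentName
    FROM Departments D
    LEFT JOIN Employees E ON E.DepartmentID = D.DepartmentID
""").fetchall()

for label, rows in [("INNER", inner_rows), ("LEFT", left_rows),
                    ("RIGHT", right_rows), ("FULL", full_rows)]:
    print(label, rows)
```

Note that the UNION-based emulation also collapses rows that are genuinely identical, so it is only a sketch of FULL JOIN behavior, not a drop-in replacement.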
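A subquery (also known as an inner query or nested query) is a query embedded inside another SQL
query. For a non-correlated subquery, the inner query runs first and its result is then used by the
outer query. A correlated subquery refers to the outer query and is logically evaluated once per outer row.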
Example: Find the names of employees who earn more than the average salary.
4 David 80000
SELECT Name
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);
Explanation:
1. The inner query (SELECT AVG(Salary) FROM Employees) calculates the average salary of all
employees (e.g., 66250).
2. The outer query then uses this average value to filter employees whose Salary is greater than
that average.
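To see the two steps in action, here is a minimal sketch using Python's built-in sqlite3 module. Only David's salary (80000) and the average (66250) come from the text above; the other three salaries are assumed values picked so that the average works out to 66250.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
    -- Salaries for Alice, Bob and Charlie are hypothetical.
    INSERT INTO Employees VALUES
        (1, 'Alice', 60000), (2, 'Bob', 75000), (3, 'Charlie', 50000), (4, 'David', 80000);
""")

# Step 1: the inner query on its own.
average_salary = conn.execute("SELECT AVG(Salary) FROM Employees").fetchone()[0]

# Step 2: the full statement, with the inner query nested in the WHERE clause.
above_average = [row[0] for row in conn.execute("""
    SELECT Name
    FROM Employees
    WHERE Salary > (SELECT AVG(Salary) FROM Employees)
    ORDER BY EmployeeID
""")]

print(average_salary)
print(above_average)
```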
Optimizing SQL queries means making them run faster and use fewer resources. Here are some key
ways:
Use Indexes: Create indexes on columns frequently used in WHERE clauses, JOIN conditions,
ORDER BY , and GROUP BY . Indexes are like a book's index – they help the database find data
quickly without scanning the whole table.
Avoid SELECT * : Only select the columns you actually need. SELECT * fetches unnecessary
data, increasing network traffic and processing time.
Filter Early (Use WHERE clause effectively): Apply WHERE clauses as early as possible to
reduce the number of rows processed by subsequent operations.
Optimize JOIN clauses: Ensure JOIN conditions are correctly indexed. Use the most efficient
JOIN type for your specific needs.
Avoid OR in WHERE (if possible): Multiple OR conditions can sometimes prevent index usage.
Consider UNION ALL or IN clause as alternatives.
Use EXPLAIN (or EXPLAIN PLAN ): This is a powerful tool that shows you how the database
executes your query. It helps you identify bottlenecks and areas for improvement.
Limit Results ( LIMIT / TOP ): If you only need a subset of results, use LIMIT (MySQL,
PostgreSQL) or TOP (SQL Server) to stop processing once enough rows are found.
Denormalization (carefully): Sometimes, for read-heavy applications, you might intentionally
duplicate some data across tables to avoid complex joins, but this can introduce data redundancy.
Avoid HAVING when WHERE can be used: WHERE filters rows before grouping, while HAVING
filters after grouping. Filtering early with WHERE is generally more efficient.
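Use TRUNCATE instead of DELETE (for full table deletion): TRUNCATE is faster as it deallocates
data pages, while DELETE logs each row deletion. (Be careful: in MySQL and Oracle, TRUNCATE
commits implicitly and cannot be rolled back; in PostgreSQL and SQL Server it can be rolled back
only while the enclosing transaction is still open.)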
Review your database design: A well-designed database schema with proper normalization (or
calculated denormalization) can significantly impact query performance.
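The effect of an index can be checked with the database's plan output. The sketch below uses SQLite through Python's sqlite3 module, where the command is EXPLAIN QUERY PLAN; the table contents and the index name idx_employees_department are made up for illustration, and the exact plan wording differs between SQLite versions and between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER)"
)
# Hypothetical data: 1000 employees spread over five departments.
conn.executemany(
    "INSERT INTO Employees (Name, DepartmentID) VALUES (?, ?)",
    [(f"emp{i}", 100 + i % 5) for i in range(1000)],
)

# Select only the needed column and filter in the WHERE clause.
query = "SELECT Name FROM Employees WHERE DepartmentID = 101"

def plan(sql):
    # The last column of each plan row is the human-readable description of the step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_employees_department ON Employees (DepartmentID)")
plan_after = plan(query)   # expected to mention the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans before and after a change is usually more reliable than guessing which rewrite is faster.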
ACID is an acronym representing four key properties that guarantee reliable and consistent database
transactions, especially in systems where multiple operations happen simultaneously.
A - Atomicity:
Meaning: All or nothing. A transaction is treated as a single, indivisible unit. If any part of the
transaction fails, the entire transaction is rolled back, and the database remains in its state
before the transaction began.
Example: Transferring money from account A to account B. If debiting A succeeds but
crediting B fails, the entire transaction is undone, and money is not debited from A.
C - Consistency:
Meaning: A transaction brings the database from one valid state to another valid state. It
must obey all defined rules, constraints, and triggers.
Example: If a rule says an Age column cannot be negative, a transaction trying to insert -5
into Age will be rejected, maintaining consistency.
I - Isolation:
Meaning: Concurrent transactions do not interfere with each other's data. At the strictest
isolation level (SERIALIZABLE), the outcome is the same as if the transactions had run one
after another; weaker levels such as READ COMMITTED trade some of that protection for
better concurrency.
Example: Two people try to book the last seat on a flight simultaneously. Isolation ensures
that only one person successfully books the seat, and the other sees it as unavailable.
D - Durability:
Meaning: Once a transaction is committed, its changes are permanent and will survive
system failures (like power outages or crashes).
Example: After you make an online purchase and the transaction is confirmed, even if the
website server crashes immediately after, your order details are safely recorded and won't be
lost.
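Atomicity and consistency can be observed together in a small sketch of the money transfer example. It uses Python's sqlite3 module; the accounts table, the balances, and the transfer helper are assumptions for illustration. A CHECK constraint forbids negative balances, and the transaction is rolled back as a whole when the debit violates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; returns False if the transfer was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(conn, "A", "B", 30)   # fits within A's balance
ok_large = transfer(conn, "A", "B", 500)  # debit would make A negative
balances = dict(conn.execute("SELECT id, balance FROM accounts"))

print(ok_small, ok_large, balances)
```

In the failed transfer the credit to B has already been executed when the debit fails, yet B's balance is unchanged afterwards because the whole transaction is undone.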
What is the difference between clustered and non-clustered indexes?
Clustered Index:
Analogy: Imagine the phone book itself, sorted by last name. The actual addresses and phone
numbers are physically stored in the order of the last names.
Physical Order: A clustered index determines the physical order in which the data rows are
stored on the disk.
One per table: A table can have only one clustered index because data can only be physically
stored in one order.
Primary Key: In SQL Server and MySQL (InnoDB), the Primary Key of a table creates a clustered index by default.
Good for: Range searches (e.g., "find all people whose last name starts with 'S'"), ORDER BY
clauses.
Performance: Generally faster for retrieving data because the data is already sorted.
Non-Clustered Index:
Analogy: Imagine a separate index at the back of the phone book listing all the phone numbers,
with page numbers next to them where you can find the full entry. The phone book itself is still
sorted by last name.
Logical Order: A non-clustered index creates a separate structure that contains the indexed
column(s) and pointers (references) to the actual data rows in the table. The data itself is not
reordered.
Multiple per table: A table can have many non-clustered indexes.
Good for: Searching specific values, often used for columns frequently used in WHERE clauses
that are not the primary key.
Performance: Can be slower than clustered indexes for retrieving data because the database
has to go to the index first, then follow the pointer to the actual data.
GROUP BY Clause:
What it does: The GROUP BY clause is used to arrange identical data into groups. It's often used
with aggregate functions (like SUM() , COUNT() , AVG() , MIN() , MAX() ) to perform calculations
on each group.
When to use: When you want to summarize data for categories.
Example: Calculate the total amount spent by each customer.
Result:
CustomerID
101
102
103
HAVING Clause:
What it does: The HAVING clause is used to filter groups based on a specified condition. It's
similar to WHERE , but WHERE filters individual rows before grouping, while HAVING filters groups
after grouping. You cannot use aggregate functions directly in a WHERE clause.
When to use: When you want to filter results based on the aggregated values.
Example: Find customers who have spent a total of more than 200.
Result:
CustomerID
101
102
Logical order in which the clauses are evaluated:
1. FROM
2. WHERE (filters individual rows)
3. GROUP BY (groups the filtered rows)
4. HAVING (filters the groups)
5. SELECT (selects the final columns/aggregations)
6. ORDER BY
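The difference between WHERE and HAVING, and the evaluation order listed above, can be sketched with Python's sqlite3 module. The Orders table and its amounts are assumed for illustration; they were chosen so that customers 101 and 102 each total more than 200, as in the results above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount INTEGER);
    -- Hypothetical order amounts.
    INSERT INTO Orders (CustomerID, Amount) VALUES
        (101, 150), (101, 100), (102, 300), (103, 50);
""")

# GROUP BY: one summary row per customer.
totals = conn.execute("""
    SELECT CustomerID, SUM(Amount) AS TotalSpent
    FROM Orders
    GROUP BY CustomerID
    ORDER BY CustomerID
""").fetchall()

# HAVING: keep only the groups whose total exceeds 200.
big_spenders = conn.execute("""
    SELECT CustomerID, SUM(Amount) AS TotalSpent
    FROM Orders
    GROUP BY CustomerID
    HAVING SUM(Amount) > 200
    ORDER BY CustomerID
""").fetchall()

# WHERE runs before grouping: orders of 100 or less are dropped first,
# so customer 101 is left with a single order of 150 and fails the HAVING test.
large_orders_only = conn.execute("""
    SELECT CustomerID, SUM(Amount) AS TotalSpent
    FROM Orders
    WHERE Amount > 100
    GROUP BY CustomerID
    HAVING SUM(Amount) > 200
    ORDER BY CustomerID
""").fetchall()

print(totals)
print(big_spenders)
print(large_orders_only)
```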
A self-join is a join operation where a table is joined with itself. This means you treat the same table
as if it were two separate tables, giving them different aliases.
When to use:
Finding hierarchical data: (e.g., employees and their managers, where both are in the same
Employees table).
Comparing values within the same column: (e.g., finding employees hired on the same date).
Finding duplicates or related records: (e.g., finding customers with the same address).
Example: Find employees and their managers, assuming both are in an Employees table with
EmployeeID , Name , and ManagerID .
SELECT
    E.Name AS EmployeeName,
    M.Name AS ManagerName
FROM
    Employees E
LEFT JOIN
    Employees M ON E.ManagerID = M.EmployeeID;
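Explanation:
E stands for the employee row and M for the manager row of the same Employees table. The LEFT JOIN
keeps employees who have no manager (ManagerID is NULL); their ManagerName is shown as NULL.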
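The self-join query can be run against a small sample to see how the two aliases behave. This sketch uses Python's sqlite3 module; the four employees and their reporting lines are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER);
    -- Hypothetical hierarchy: Alice has no manager, Bob and Charlie report to Alice,
    -- David reports to Bob.
    INSERT INTO Employees VALUES
        (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1), (4, 'David', 2);
""")

# E is the employee side of the join, M is the manager side of the same table.
pairs = conn.execute("""
    SELECT
        E.Name AS EmployeeName,
        M.Name AS ManagerName
    FROM
        Employees E
    LEFT JOIN
        Employees M ON E.ManagerID = M.EmployeeID
    ORDER BY
        E.EmployeeID
""").fetchall()

for employee, manager in pairs:
    print(employee, "->", manager)
```

With an INNER JOIN instead of the LEFT JOIN, Alice would disappear from the result because she has no manager row to match.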
Both UNION and UNION ALL are used to combine the result sets of two or more SELECT statements
into a single result set. The key differences lie in how they handle duplicate rows and their
performance.
UNION:
What it does: Combines the result sets and then removes duplicate rows. It acts like a
DISTINCT operation on the combined result.
Performance: Generally slower than UNION ALL because it needs an extra step (typically
sorting or hashing) to identify and remove duplicates.
When to use: When you need a unique list of combined results.
Example:
Table A :
Value
1
2
3
Table B :
Value
2
3
4
Result: 1, 2, 3, 4 (the values 2 and 3 occur in both tables but appear only once).
UNION ALL:
What it does: Combines the result sets and includes all rows, including duplicates. It simply
appends the results of one query to the end of another.
Performance: Generally faster than UNION because it doesn't need to perform any duplicate
checking or sorting.
When to use: When you need all records from both queries, even if there are duplicates, and
performance is a priority.
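Example: With the same tables A and B, UNION ALL returns six rows: 1, 2, 3, 2, 3, 4 (the order
of rows is not guaranteed without ORDER BY).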
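Both operators can be compared on tables A and B from the example above. A minimal sketch with Python's sqlite3 module; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Value INTEGER);
    CREATE TABLE B (Value INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2), (3), (4);
""")

# UNION removes the duplicate values 2 and 3.
union_rows = [row[0] for row in conn.execute(
    "SELECT Value FROM A UNION SELECT Value FROM B ORDER BY Value"
)]

# UNION ALL keeps every row from both queries.
union_all_rows = [row[0] for row in conn.execute(
    "SELECT Value FROM A UNION ALL SELECT Value FROM B ORDER BY Value"
)]

print(union_rows)
print(union_all_rows)
```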
How would you update a value in one table based on a value in another table?
You can do this using a JOIN within an UPDATE statement. The exact syntax can vary slightly
depending on the database system (e.g., MySQL, SQL Server, PostgreSQL).
Products Table:
Categories Table:
We want to apply the DiscountPercentage from the Categories table to the Price in the
Products table for all matching categories.
-- SQL Server syntax
UPDATE P
SET P.Price = P.Price * (1 - C.DiscountPercentage)
FROM Products P
JOIN Categories C ON P.CategoryID = C.CategoryID;
-- The JOIN already limits the update to matching categories, so no WHERE clause is
-- required; add one only for additional filtering.
-- MySQL syntax
UPDATE Products P
JOIN Categories C ON P.CategoryID = C.CategoryID
SET P.Price = P.Price * (1 - C.DiscountPercentage);
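Where neither join syntax is available, a correlated subquery is a portable alternative. The sketch below runs it on SQLite through Python's sqlite3 module. The sample rows are assumed for illustration, and DiscountPercentage is assumed to be stored as a fraction (0.10 for 10%), which is what the formula above implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, DiscountPercentage REAL);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, CategoryID INTEGER, Price REAL);
    -- Hypothetical data; product 3 belongs to a category with no Categories row.
    INSERT INTO Categories VALUES (1, 0.10), (2, 0.25);
    INSERT INTO Products VALUES (1, 1, 100.0), (2, 2, 200.0), (3, 3, 50.0);
""")

with conn:  # run the update in one transaction
    conn.execute("""
        UPDATE Products
        SET Price = Price * (1 - (
            SELECT C.DiscountPercentage
            FROM Categories C
            WHERE C.CategoryID = Products.CategoryID
        ))
        -- Without this filter, products with no matching category
        -- would have their Price set to NULL.
        WHERE EXISTS (
            SELECT 1 FROM Categories C WHERE C.CategoryID = Products.CategoryID
        )
    """)

prices = dict(conn.execute("SELECT ProductID, Price FROM Products"))
print(prices)
```

The EXISTS filter plays the role that the JOIN plays in the SQL Server and MySQL versions: rows without a matching category are left untouched.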
What is a Common Table Expression (CTE)? When would you use it?
A Common Table Expression (CTE) is a temporary, named result set that you can reference within
a single SQL statement (like SELECT , INSERT , UPDATE , DELETE ). Think of it as a temporary,
reusable "view" that only exists for the duration of that one query.
Syntax:
WITH cte_name AS (
    SELECT ...
)
SELECT ... FROM cte_name;
When to use:
Improving Readability and Modularity: Break down complex queries into smaller, more
manageable, and readable logical steps.
Referencing the same result set multiple times: Instead of writing the same subquery multiple
times, you define it once as a CTE and reuse it.
Recursive Queries: CTEs are essential for performing recursive operations, such as traversing
organizational hierarchies or bill-of-materials structures.
Simplifying Complex Joins/Aggregations: Prepare intermediate results before performing final
joins or aggregations.
Example: Find employees who have a salary higher than the average salary of their department.
Let's assume an Employees table with EmployeeID , Name , Department , and Salary .
WITH DepartmentAverageSalary AS (
    SELECT
        Department,
        AVG(Salary) AS AvgDeptSalary
    FROM
        Employees
    GROUP BY
        Department
)
SELECT
    E.Name,
    E.Department,
    E.Salary,
    DAS.AvgDeptSalary
FROM
    Employees E
JOIN
    DepartmentAverageSalary DAS ON E.Department = DAS.Department
WHERE
    E.Salary > DAS.AvgDeptSalary;
Explanation:
1. The DepartmentAverageSalary CTE calculates the average salary for each department.
2. The main SELECT statement then joins the Employees table with this CTE to compare each
employee's salary against their department's average.
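The recursive use of CTEs mentioned above can be sketched as follows, using Python's sqlite3 module. The Employees table with a ManagerID column, its sample rows, and the CTE name OrgChart are assumptions for illustration. The CTE starts from the employee without a manager and repeatedly adds direct reports, tracking the depth in the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER);
    -- Hypothetical hierarchy: Alice at the top, David two levels below her.
    INSERT INTO Employees VALUES
        (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1), (4, 'David', 2);
""")

org_chart = conn.execute("""
    WITH RECURSIVE OrgChart (EmployeeID, Name, Level) AS (
        -- Anchor member: employees with no manager form level 0.
        SELECT EmployeeID, Name, 0
        FROM Employees
        WHERE ManagerID IS NULL
        UNION ALL
        -- Recursive member: add the direct reports of rows found so far.
        SELECT E.EmployeeID, E.Name, O.Level + 1
        FROM Employees E
        JOIN OrgChart O ON E.ManagerID = O.EmployeeID
    )
    SELECT Name, Level
    FROM OrgChart
    ORDER BY Level, EmployeeID
""").fetchall()

for name, level in org_chart:
    print("  " * level + name)
```

SQLite, PostgreSQL and MySQL 8 use the WITH RECURSIVE keyword; SQL Server writes the same query with plain WITH.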
Data integrity ensures that the data in your database is accurate, consistent, and reliable. Here's how
it's achieved in SQL:
Primary Keys (PK): Uniquely identifies each record in a table. Enforces entity integrity (no
duplicate rows, no NULL primary keys).
Foreign Keys (FK): Establishes a link between data in two tables. Enforces referential integrity
(ensures relationships between tables are valid, preventing "orphan" records).
Unique Constraints: Ensures that all values in a column (or set of columns) are unique. Similar
to a primary key but you can have multiple unique constraints per table, and they can allow
NULLs (unless specified NOT NULL ).
NOT NULL Constraints: Ensures that a column cannot contain NULL values. Enforces that a
particular piece of information must always be present.
CHECK Constraints: Defines a rule that values in a column must satisfy. (e.g., Age > 0 , Salary
>= 1000 ). Enforces domain integrity.
Default Constraints: Specifies a default value for a column if no value is explicitly provided
during an INSERT .
Data Types: Using appropriate data types (e.g., INT for numbers, VARCHAR for text, DATE for
dates) restricts the kind of data that can be stored in a column, maintaining domain integrity.
Transactions (ACID Properties): As discussed earlier, ACID properties ensure that database
operations are reliable and consistent, even in the event of failures.
Triggers: Special stored procedures that automatically execute when a specific event occurs on a
table (e.g., INSERT , UPDATE , DELETE ). They can be used to enforce complex business rules that
cannot be handled by simple constraints.
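Several of the constraints above can be exercised in one sketch with Python's sqlite3 module. The table definitions, the Status column, and the rejected helper are assumptions for illustration. Each statement that breaks a rule raises an integrity error and leaves the data unchanged. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; most other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE Departments (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL,
        Age          INTEGER CHECK (Age > 0),
        Status       TEXT NOT NULL DEFAULT 'active',
        DepartmentID INTEGER REFERENCES Departments (DepartmentID)
    );
    INSERT INTO Departments VALUES (101, 'Sales');
    INSERT INTO Employees (EmployeeID, Name, Age, DepartmentID) VALUES (1, 'Alice', 30, 101);
""")

def rejected(sql):
    """Return True if the statement violates a constraint (and is rolled back)."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

violations = {
    "duplicate primary key": rejected(
        "INSERT INTO Employees (EmployeeID, Name, Age) VALUES (1, 'Bob', 40)"),
    "orphan foreign key": rejected(
        "INSERT INTO Employees (Name, Age, DepartmentID) VALUES ('Bob', 40, 999)"),
    "duplicate unique value": rejected(
        "INSERT INTO Departments VALUES (102, 'Sales')"),
    "missing NOT NULL value": rejected(
        "INSERT INTO Employees (Name, Age) VALUES (NULL, 40)"),
    "failed CHECK": rejected(
        "INSERT INTO Employees (Name, Age) VALUES ('Bob', -5)"),
}

# The DEFAULT constraint filled in Status for Alice, who was inserted without one.
default_status = conn.execute(
    "SELECT Status FROM Employees WHERE EmployeeID = 1"
).fetchone()[0]
employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

print(violations)
print(default_status, employee_count)
```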
Write a SQL query to fetch records with specific conditions (e.g., customers with invoices above a certain
amount, employees with salary above average).
Explanation:
Alternative (using a subquery with EXISTS for potentially better performance on large
datasets):
SELECT C.CustomerName
FROM Customers C
WHERE EXISTS (
    SELECT 1
    FROM Invoices I
    WHERE I.CustomerID = C.CustomerID AND I.InvoiceAmount > 500
);
Explanation (employees with a salary above average):
The subquery (SELECT AVG(Salary) FROM Employees) calculates the overall average salary.
The outer query then selects employees whose Salary is greater than this calculated average.
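For the customer example, the following sketch compares a plain JOIN with the EXISTS form, using Python's sqlite3 module. The customer names and invoice amounts are made up for illustration. A customer with several invoices above 500 appears once per matching invoice in the JOIN result, while EXISTS returns each customer at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Invoices (
        InvoiceID INTEGER PRIMARY KEY, CustomerID INTEGER, InvoiceAmount INTEGER
    );
    -- Hypothetical data: Acme has two invoices above 500, Globex has none.
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO Invoices (CustomerID, InvoiceAmount) VALUES
        (1, 700), (1, 900), (2, 300), (3, 600);
""")

# JOIN: one output row per matching invoice, so Acme is listed twice.
join_rows = [row[0] for row in conn.execute("""
    SELECT C.CustomerName
    FROM Customers C
    JOIN Invoices I ON I.CustomerID = C.CustomerID
    WHERE I.InvoiceAmount > 500
    ORDER BY C.CustomerID
""")]

# EXISTS: one output row per customer that has at least one matching invoice.
exists_rows = [row[0] for row in conn.execute("""
    SELECT C.CustomerName
    FROM Customers C
    WHERE EXISTS (
        SELECT 1
        FROM Invoices I
        WHERE I.CustomerID = C.CustomerID AND I.InvoiceAmount > 500
    )
    ORDER BY C.CustomerID
""")]

print(join_rows)
print(exists_rows)
```

Adding DISTINCT to the JOIN version gives the same list as EXISTS, at the cost of an extra de-duplication step.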