MAD 1 - Week 7 Parampreet Singh
Backend Systems
Types of Storage elements
On-chip registers: 10s to 100s of bytes
SRAM (cache): 0.1 MB to 1 MB
DRAM (main memory): 10s to 100s of GB
SSD (solid state drive) - Flash: 100GB - 1TB
Magnetic disk (hard drive): 1TB,...
There are various other types of storage elements like optical, magnetic tape, holographic, etc...
Note: The numbers above are just examples; there is no fixed limit.
Storage Parameters
Latency
It refers to the time it takes for a request to reach the storage element and for the data to come back, i.e.
the delay between the initiation of an action and the response or result.
It is measured in units of time, ranging from nanoseconds (registers, SRAM) to milliseconds (hard
drives).
Lower latency is desirable, especially in real-time applications like online gaming, video conferencing,
and financial transactions, where quick responses are crucial.
Register < SRAM < DRAM < SSD < HDD
Throughput
It is the amount of data that can be transferred from one place to another in a given amount of
time.
It is often measured in megabytes per second (MB/s) or gigabytes per second (GB/s) for storage devices, and in
megabits per second (Mbps) or gigabits per second (Gbps) for network links.
Higher throughput is desirable, especially in applications like video streaming, where a large amount of
data is transferred continuously.
DRAM > SSD > HDD
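Latency and throughput combine when reading a block of data: roughly, total time = latency + size / throughput. A minimal sketch of that arithmetic follows. The device figures in DEVICES are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def transfer_time_seconds(size_bytes, latency_s, throughput_bytes_per_s):
    """Approximate time to fetch a block: fixed delay plus streaming time."""
    return latency_s + size_bytes / throughput_bytes_per_s


# Assumed, order-of-magnitude figures chosen only for illustration.
DEVICES = {
    "DRAM": {"latency_s": 100e-9, "throughput": 10e9},
    "SSD": {"latency_s": 100e-6, "throughput": 500e6},
    "HDD": {"latency_s": 10e-3, "throughput": 100e6},
}


def compare(size_bytes):
    """Return the estimated read time per device for a block of the given size."""
    return {
        name: transfer_time_seconds(size_bytes, d["latency_s"], d["throughput"])
        for name, d in DEVICES.items()
    }


if __name__ == "__main__":
    for name, seconds in compare(4096).items():
        print(f"{name}: {seconds * 1e6:.1f} microseconds for a 4 KiB read")
```

For small reads the latency term dominates; for large sequential reads the throughput term dominates.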
Density
It is the amount of data that can be stored in a given amount of physical space.
Higher density often leads to more efficient resource utilization, reduced physical footprint and
increased scalability.
HDD > SSD > DRAM > SRAM > Registers
Computer Architecture
param302.bio.link
The memory hierarchy deals with how a computer system organizes its various memory components.
This hierarchy consists of different memory levels, each with distinct properties in terms of speed, capacity, and
cost.
The goal of the memory hierarchy is to effectively handle data access and storage, ultimately improving the
overall performance of the computer system.
Memory hierarchy organizes memory into multiple levels based on proximity to the CPU and performance
characteristics.
Levels include blazing-fast CPU Registers and on-chip L1, L2, and L3 Caches for quick data access.
Main Memory (RAM) is larger but slower, while Secondary Storage (HDD/SSD) offers high-capacity but
even slower storage.
The hierarchy optimizes data access by moving frequently used data closer to the CPU, enhancing overall
performance.
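The idea of keeping frequently used data close to the CPU can be sketched with a tiny least-recently-used (LRU) cache in front of a slower store. This is only a software analogy under stated assumptions: the class name TinyCache, the capacity of 2, and the dictionary standing in for main memory are all hypothetical.

```python
from collections import OrderedDict


class TinyCache:
    """A small LRU cache sitting in front of a slower backing store."""

    def __init__(self, backing_store, capacity=2):
        self.backing_store = backing_store
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.misses += 1
        value = self.backing_store[address]  # slow path: go to the larger memory
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


def run_demo():
    memory = {addr: addr * 10 for addr in range(8)}  # stand-in for main memory
    cache = TinyCache(memory, capacity=2)
    for addr in [1, 1, 1, 2, 1, 3, 1]:
        cache.read(addr)
    return cache.hits, cache.misses
```

In run_demo the repeatedly used address 1 stays in the cache, so most reads of it avoid the slow path.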
Cold Storage
Cold storage is a computer system designed for retaining inactive data, such as information required for
regulatory compliance, at low cost and high efficiency.
Examples: Amazon Glacier, Google Coldline, Microsoft Azure Cool Blob Storage, etc...
Developers must be aware of the available choices and of what kind of database suits a given application.
Notation
Tabular databases
Tables with rows and columns
To search quickly on some columns, we can create an INDEX on those columns.
Indexes are typically stored in a tree structure, so searching is fast.
Example: MySQL uses B-Tree and Hash indexes.
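A minimal sketch of creating an index and checking that the database uses it. It assumes SQLite (through Python's built-in sqlite3 module) instead of MySQL so that it runs without a server; the table, the sample rows and the index name idx_users_name are hypothetical.

```python
import sqlite3


def plan_for_name_lookup(with_index):
    """Return SQLite's query plan text for a lookup on the name column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users (name, age) VALUES (?, ?)",
        [("Alice", 25), ("Bob", 31), ("Carol", 42)],
    )
    if with_index:
        conn.execute("CREATE INDEX idx_users_name ON users (name)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name = ?", ("Alice",)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without index:", plan_for_name_lookup(False))
    print("with index:   ", plan_for_name_lookup(True))
```

Without the index the plan is a full table scan; with it, the plan mentions the index by name.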
B-Tree
A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN
operators.
The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start
with a wildcard character.
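For example, the following SELECT statements use indexes:
SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second
statement, only rows with 'Pat' <= key_col < 'Pau' are considered.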
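The bounds quoted above (Patrick <= key_col < Patricl) come from treating a constant prefix as a range over sorted keys: the upper bound is the prefix with its last character incremented. A rough sketch of the same idea, using a sorted Python list and binary search as a stand-in for a B-tree; the sample names and the function prefix_range are hypothetical, and a non-empty prefix is assumed.

```python
from bisect import bisect_left


def prefix_range(sorted_keys, prefix):
    """Return the keys k with prefix <= k < upper, found by binary search."""
    # Upper bound: same prefix with its last character bumped by one,
    # e.g. "Patrick" -> "Patricl", "Pat" -> "Pau".
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


keys = sorted(
    ["Pamela", "Pat", "Patricia", "Patrick", "Patrickson", "Paul", "Peter"]
)

if __name__ == "__main__":
    print(prefix_range(keys, "Patrick"))
    print(prefix_range(keys, "Pat"))
```

A pattern that starts with a wildcard gives no such lower and upper bound, which is why the index cannot help in that case.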
For more, check here
Hash Index
Hash indexes have somewhat different characteristics from those just discussed:
They are used only for equality comparisons that use the = or <=> operators (but are very fast). They are
not used for comparison operators such as < that find a range of values. Systems that rely on this type of
single-value lookup are known as “key-value stores”; to use MySQL for such applications, use hash indexes
wherever possible.
The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be
used to search for the next entry in order.)
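The difference between the two index types can be imitated in plain Python: a dict behaves like a hash index (fast equality lookup, no ordering), while a sorted list with binary search behaves like an ordered index (range queries). This is an analogy only, not how MySQL implements either index; all names and data below are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hash-style structure: direct lookup by exact key.
ages_by_name = {"Alice": 25, "Bob": 31, "Carol": 42, "Dave": 19}

# Ordered structure: keys kept sorted so ranges are contiguous.
sorted_names = sorted(ages_by_name)


def lookup_equal(name):
    """Equality lookup: a single hash probe."""
    return ages_by_name.get(name)


def range_scan_hash(lo, hi):
    """Range query on a hash table: every key must be examined."""
    return sorted(k for k in ages_by_name if lo <= k <= hi)


def range_scan_sorted(lo, hi):
    """Range query on sorted keys: two binary searches and a slice."""
    start = bisect_left(sorted_names, lo)
    end = bisect_right(sorted_names, hi)
    return sorted_names[start:end]
```

Both range functions return the same keys, but only the sorted version avoids touching every entry, and only it can return results already in order.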
For more, check here
Query Optimization
Query optimization is database specific.
MySQL, SQLite, PostgreSQL, etc... have different query optimization techniques.
MySQL
SQLite
PostgreSQL
SQL vs NoSQL
Parameter | SQL (using MySQL) | NoSQL (using MongoDB)
Data Model | Follows fixed schema with tables and columns. | Adapts to flexible schema (key-value, document, column, graph).
Query Language | Uses standardized SQL for querying (SELECT, INSERT, etc.). | Has query languages tailored to specific data models.
Examples | MySQL (using Python and SQLite) | MongoDB (using Python and MongoDB)
-- Insert a new row into the 'users' table with the name 'Alice' and age 25
INSERT INTO users (name, age) VALUES ('Alice', 25);
-- Update the age to 26 for the user with the name 'Alice' in the 'users' table
UPDATE users SET age = 26 WHERE name = 'Alice';
-- Delete all rows from the 'users' table where the age is greater than 30
DELETE FROM users WHERE age > 30;
// Insert a new document with the name 'Alice' and age 25 into the 'users' collection
db.users.insertOne({ name: 'Alice', age: 25 });
// Update the age to 26 for the document with the name 'Alice' in the 'users' collection
db.users.updateOne({ name: 'Alice' }, { $set: { age: 26 } });
// Delete all documents from the 'users' collection where the age is greater than 30
db.users.deleteMany({ age: { $gt: 30 } });
Alternate ways to store data:
Key-Value
Key-values are stored in a hash table or search trees.
Example: Redis, DynamoDB, Python dictionary, C++ std::map (an ordered map), etc...
Very efficient key lookup, but hash-based stores are not good for range queries.
Often used for caching, session management, etc...
Column Stores
Traditional databases store data in rows, but column stores store data in columns.
Example: Cassandra, HBase, etc...
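The row versus column layout can be sketched with plain Python structures. This only illustrates the layout idea, not how Cassandra or HBase store data internally; the sample records and function names are hypothetical.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"name": "Alice", "age": 25, "city": "Pune"},
    {"name": "Bob", "age": 31, "city": "Delhi"},
    {"name": "Carol", "age": 40, "city": "Chennai"},
]


def to_columns(records):
    """Convert row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def average_age_row_store(records):
    # Every record is visited, although only one field is needed.
    return sum(record["age"] for record in records) / len(records)


def average_age_column_store(columns):
    # Only the "age" column is read; the other columns are never touched.
    ages = columns["age"]
    return sum(ages) / len(ages)
```

Aggregating one field over many records reads far less data in the column layout, which is why column stores suit analytical queries.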
Graph
Graph databases store data in nodes and edges.
Information is carried by the nodes, the edges between them, edge weights, node degrees, etc...
Path finding is more important than simple search.
Example: Neo4j, OrientDB, etc...
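A minimal sketch of path finding over nodes and edges, using an adjacency dictionary and breadth-first search. It is not the query interface of Neo4j or OrientDB; the graph data and the function shortest_path are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is not reachable from start


# Directed edges: who follows whom (made-up data).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}
```

Here shortest_path(follows, "alice", "erin") walks the edges rather than scanning a table, which is the kind of query graph databases are built for.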
ACID
ACID stands for Atomicity, Consistency, Isolation, Durability.
It is a set of properties of database transactions that keep data valid even when errors or failures occur.
Read more about it here (page no. )
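Atomicity, the "A" in ACID, can be demonstrated with a small sketch: two updates run inside one transaction, and if the second fails the first is rolled back. SQLite via Python's sqlite3 module is assumed here; the accounts table, its CHECK constraint and the transfer function are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src in one transaction: both apply, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed (overdraft); the credit was rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the debit would make a balance negative, the CHECK constraint rejects it and the earlier credit is undone, so the table never shows a half-finished transfer.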
L7.5: Scaling
Benefits | Can improve availability, performance, and scalability. | Can improve reliability, fault tolerance, and disaster recovery.
ACID vs BASE
Feature | ACID | BASE
Definition | ACID stands for Atomicity, Consistency, Isolation, and Durability. | BASE stands for Basically Available, Soft State, Eventual Consistency.
Guarantees | ACID guarantees that transactions are always atomic, consistent, isolated, and durable. | BASE guarantees that data is eventually consistent, even if there are temporary inconsistencies.
Benefits | ACID provides strong guarantees for data integrity. | BASE provides good availability and scalability.
Scale-up vs Scale-out
Approach | Scale-Up | Scale-Out
Resource Focus | Vertical growth | Horizontal growth
Code to look up a user in a database:
# Vulnerable on purpose: user input is concatenated straight into the SQL string
name = request.form.get('username')
password = request.form.get('password')
query = "SELECT * FROM users WHERE name = '" + name + "' AND password = '" + password + "'"
db.execute(query)
If the user enters a normal username and password, the final query is valid and returns that user's data.
But if the user enters malicious input, such as ' OR ''=' in both fields, the query becomes:
SELECT * FROM users WHERE name = '' OR ''='' AND password = '' OR ''='';
Because ''='' is always true, this query returns every row in the users table, which is not what we want.
This is called SQL Injection.
An attacker can go further and inject additional statements, for example:
SELECT * FROM users WHERE name = 'x'; DROP TABLE users; AND password = 'x';
A real life SQL Injection performed on CCTV cameras on parking lots in Australia
Problem:
Parameters are not sanitized; they are used as-is.
The query is not checked for any malicious code.
Validation must be done on the server just before the database query, even if we have validation on the
frontend, because the user can bypass frontend checks.
Buffer Overflow
Buffer overflow is a vulnerability in the low-level code of a program.
It occurs when a program tries to store more data in a buffer than it was intended to hold.
This extra data overflows into adjacent memory space, overwriting the data held in that space.
This can cause the program to crash, and can be exploited by attackers to execute malicious code.
Input Overflow
Input overflow is a vulnerability in the high-level code of a program.
It occurs when a program tries to store more data in a variable than it was intended to hold.
There are many other vulnerabilities as well, the main point is how to prevent them.
Solutions
HTTPS
HTTPS is a protocol that encrypts the data sent between a client and a server.
HTTPS is implemented with Transport Layer Security (TLS), the successor to the now-deprecated Secure
Sockets Layer (SSL).
Server certificates are used to verify the identity of a server.
However:
Only secures the link for data transfer, not the data itself.
Doesn't perform any validation or safety checks on the data.
Negative impact on "caching" of resources like static files.
Some overhead on the server and client.
Data encryption
This involves encrypting sensitive data, such as passwords and credit card numbers, before it is stored or
transmitted over a network.
This helps to protect the data from being intercepted by attackers.
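For stored passwords specifically, the usual technique is salted one-way hashing and not reversible encryption, so the sketch below swaps in that technique: PBKDF2 from Python's hashlib. The iteration count and the function names are assumptions chosen for illustration, not recommendations from these notes.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and the digest would be stored; the original password cannot be read back from them, which limits the damage if the database leaks.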
To prevent SQL injection, use parameterized queries or prepared statements. Both of these methods use
placeholders to represent the values that will be inserted into the query.
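A minimal sketch of placeholders in practice, contrasting string concatenation with a parameterized query. It assumes SQLite via Python's sqlite3 module, whose placeholder is ?; the table contents and the function names are hypothetical, and plain-text passwords appear only to keep the demo short.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")]
    )
    return conn


def login_unsafe(conn, name, password):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = (
        "SELECT * FROM users WHERE name = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()


def login_safe(conn, name, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ? AND password = ?",
        (name, password),
    ).fetchall()


payload = "' OR ''='"
```

With payload in both fields, login_unsafe returns every user while login_safe returns nothing, because the quote characters are compared as literal text.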
Input Validation
Implement strict input validation on the server-side to validate and sanitize user inputs, preventing malicious
code injection and data manipulation.
Example:
from flask import Flask, request

app = Flask(__name__)

@app.route('/login', methods=['POST'])
def login():
    username = request.form.get('username')
    password = request.form.get('password')
    # Validate input: reject missing or empty fields
    if not username or not password:
        return 'Invalid input', 400
    # Authenticate user (hard-coded credentials, for illustration only)
    if username == 'admin' and password == 'password':
        return 'Login successful', 200
    else:
        return 'Invalid credentials', 401
Screencast
7.1 Logging
7.2 Debugging