
6.1800 Master Notes Doc

Isabella Zhu
5 May 2025

§1 Lecture 1: Modularity, Abstraction, Systems


Complexity makes building systems hard: it limits what we can build and reason about. We mitigate complexity via design principles like modularity and abstraction.

A client/server model helps us enforce modularity: the two modules reside on different machines and communicate with RPCs.

A remote procedure call differs from an ordinary procedure call because the caller and the procedure are on different machines. This introduces the problem of network/server failures.

When designing a system, we also care about scalability, security, performance, and fault-tolerance/reliability.

§2 Lecture 2: Naming
Names are used to allow modules to interact. They let us achieve modularity by providing
communication and organization.

Components of a naming scheme are

1. The set of all possible names.

2. The set of all possible values.

3. A look-up algorithm to translate a name into a value.

§2.1 DNS
In DNS, the names are hostnames (e.g. eecs.mit.edu) and the values are IP addresses
(e.g. 18.25.0.23).

DNS is organized as a tree hierarchy. The root nameserver looks up the IP of the next nameserver down, and the lookup keeps propagating down the tree until we reach the server that knows the answer, as in the sketch below.
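To make the look-up algorithm concrete, here is a minimal sketch of hierarchical resolution in Python; the zone data and server names are made up for illustration.

# Hypothetical zone data: each "nameserver" maps a suffix to either
# the next nameserver down the tree or a final IP address.
ROOT = {"edu": "ns.edu"}
SERVERS = {
    "ns.edu": {"mit.edu": "ns.mit.edu"},
    "ns.mit.edu": {"eecs.mit.edu": "18.25.0.23"},
}

def resolve(hostname):
    # Walk down the nameserver hierarchy until a value (an IP) is found.
    table = ROOT
    while True:
        # Find an entry whose name is a suffix of the hostname.
        match = next(v for k, v in table.items() if hostname.endswith(k))
        if match in SERVERS:      # another nameserver: continue down the tree
            table = SERVERS[match]
        else:                     # a leaf value: the IP address
            return match

print(resolve("eecs.mit.edu"))    # -> 18.25.0.23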


§3 Lecture 3: Virtual Memory


Virtualization is how we put several modules on the same machine. It ensures that programs can't refer to each other's memory. Every program appears to have access to a full 32-bit address space, but they are actually sharing it with other programs.

The memory management unit (MMU) needs to translate virtual memory ad-
dresses into physical ones. Memory (RAM) is temporary and used for quick access,
while storage is permanent and used for long-term data retention.

An OS uses page tables to virtualize memory to save space. It’s inefficient to index by
full virtual address.

Translation occurs by getting the virtual page number from the top 20 bits, looking
that up in a page table to get the physical page number, and then adding the offset
(bottom 12 bits).
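As a concrete sketch of this translation (assuming the 32-bit layout above, with a 20-bit virtual page number and 12-bit offset, and a hypothetical page table given as a Python dict):

PAGE_SIZE = 4096                   # 2^12 bytes: the offset is the bottom 12 bits

# Hypothetical page table: virtual page number -> physical page number.
page_table = {0x12345: 0x00042}

def translate(vaddr):
    vpn = vaddr >> 12              # top 20 bits: virtual page number
    offset = vaddr & 0xFFF         # bottom 12 bits: offset within the page
    ppn = page_table[vpn]          # a miss here would be a page fault
    return (ppn << 12) | offset

assert translate(0x12345ABC) == 0x00042ABC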

If there is not enough memory to store all programs’ instructions, page table entries
contain additional bits that help us deal with this problem.

The kernel’s job is to manage page faults and other exceptions.

Other bits in PTEs are:

• P (present) bit: is the page currently in memory? If not, an access triggers an exception, which the kernel handles.

• U/K bit: can this page be accessed in user mode, or only in kernel mode?

• R/W bit: is the program allowed to write to this address?
A virtual address is translated to a physical address using multilevel page tables by traversing down the tree of tables.

A multilevel page table saves space compared to a flat page table because tables for unused regions of the address space never need to be allocated, at the cost of more table lookups and more exceptions.

§4 Lecture 4: Bounded Buffers and Locks


A bounded buffer is a buffer that stores (up to) N messages; programs can send and receive messages via this buffer. An operating system uses it for inter-program communication.

A race condition is when two parties try to take the same action and one ends up overwriting the other (e.g., two senders both read the same bb.in, and one's message overwrites the other's). This will happen in the basic version of send and receive.
send(bb, message):
    while True:
        if bb.in - bb.out < N:
            bb.buf[bb.in mod N] <- message
            bb.in <- bb.in + 1
            return


receive(bb):
    while True:
        if bb.out < bb.in:
            message <- bb.buf[bb.out mod N]
            bb.out <- bb.out + 1
            return message

A lock allows only one CPU to be inside a piece of code at a time. Programs can acquire
and release a lock.

Deadlock refers to when two programs are waiting on each other, and neither can make progress until the other one does. In the code below, this is avoided by releasing and re-acquiring the lock inside the waiting loop.

The final send code is


send(bb, message):
    acquire(bb.lock)
    while bb.in - bb.out >= N:
        release(bb.lock)    // to prevent deadlock
        acquire(bb.lock)
    bb.buf[bb.in mod N] <- message
    bb.in <- bb.in + 1
    release(bb.lock)
    return

The final receive code is


receive(bb):
    acquire(bb.lock)
    while bb.out >= bb.in:
        release(bb.lock)
        acquire(bb.lock)
    message <- bb.buf[bb.out mod N]
    bb.out <- bb.out + 1
    release(bb.lock)
    return message

We have to assume acquire and release are atomic actions, which means they can't be interrupted partway through. Making larger actions atomic is a tradeoff against performance, since it reduces concurrency.

§5 Lecture 5: Threads
A thread is a virtual processor that can suspend and resume. Threads allow multiple
programs to share a CPU.

Suspending a thread means pausing it and allowing another thread to run. This is
done with the keyword yield. Resuming a thread means the thread unpauses.

Condition variables let threads wait for events ("conditions") and be notified when those events occur, instead of busy-waiting.
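As an illustrative sketch (using Python's threading module rather than the lecture's kernel-level primitives), a condition variable lets a consumer sleep until a producer signals that a message is available:

import threading

cond = threading.Condition()   # a lock plus a condition variable
queue = []

def consumer():
    with cond:
        while not queue:       # re-check the condition after every wakeup
            cond.wait()        # atomically releases the lock and sleeps
        print("got", queue.pop(0))

def producer():
    with cond:
        queue.append("hello")
        cond.notify()          # wake one waiting thread

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()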


The difference between yield and yield_wait is that yield_wait doesn’t acquire
and release t_lock since wait does this already.

Preemption means forcibly interrupting threads (e.g., with timer interrupts). It ensures that wait and yield are actually called, even in threads that never call them voluntarily.

§6 Lecture 6: OS Structure, Virtual Machines


The role of a virtual machine monitor (VMM) is to virtualize the physical hardware
for the guest OSes.

The VMM will intercept ("trap") when a guest OS executes a privileged instruction, and then the VMM will emulate the instruction.

The VMM handles virtual memory for guest OSes by making two page tables:

1. guest OS page table: guest virtual to guest physical

2. VMM page table: guest physical to host physical

which can be combined to form a host page table mapping guest virtual to host physical
addresses.
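Conceptually, this combination is just composition of the two mappings; a toy sketch with hypothetical page numbers:

# Hypothetical page tables, mapping page numbers at each level.
guest_pt = {0x100: 0x7}    # guest virtual page -> guest physical page
vmm_pt   = {0x7: 0x9F}     # guest physical page -> host physical page

# The host page table installed in hardware maps guest virtual pages
# directly to host physical pages.
host_pt = {gv: vmm_pt[gp] for gv, gp in guest_pt.items()}
assert host_pt == {0x100: 0x9F}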

Guest OS page tables are marked as read-only memory so that modifications to these page tables also trigger exceptions (and thus allow the VMM to update the other tables).

The VMM deals with the U/K bit for guest OSes by making guest OSes run in user mode. The VMM replaces problematic instructions with ones it can trap and emulate. Modern architectures also provide a special operating mode for VMMs, in addition to user mode and kernel mode.

A monolithic kernel is one with no modularity within the kernel itself. A microkernel enforces modularity by putting kernel subsystems in user programs.

Many OSes are monolithic kernels because of performance: crossing module boundaries is expensive.

§7 Lecture 7: OS Performance
A performance bottleneck is the component that constrains a system's overall performance.

It's helpful to have a model of the system when thinking about performance. Common performance metrics are

• latency: how long does it take to complete a single request?

• throughput: how many requests per unit of time?

• utilization: what fraction of resources are being utilized?


The general approach to improving performance is to measure our systems to find a bottleneck, and then relax the bottleneck with general techniques such as caching, parallelism, etc.

The disk is often the main bottleneck in reading/writing stored data.

For HDDs (common in datacenters), read/writes are slow but can be improved by
not doing random access. Seeking takes the longest time.
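As a back-of-the-envelope sketch (the numbers below are assumed, typical-magnitude values, not ones from lecture): seek and rotation dominate small random reads, which is why avoiding random access helps so much.

SEEK_MS     = 5.0     # assumed average seek time
ROTATE_MS   = 4.0     # assumed average rotational delay (7200 RPM is ~4.2 ms)
TRANSFER_MB = 200.0   # assumed sequential transfer rate, MB/s

def read_time_ms(size_mb, random_access):
    transfer = size_mb / TRANSFER_MB * 1000
    overhead = (SEEK_MS + ROTATE_MS) if random_access else 0
    return overhead + transfer

# A 4 KB random read is almost all seek + rotation overhead...
print(read_time_ms(0.004, random_access=True))    # ~9.02 ms
# ...while a 4 MB sequential read amortizes that overhead away.
print(read_time_ms(4, random_access=False))       # ~20 ms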

For SSDs, read/writes are faster because SSDs don’t involve moving parts. However, the
SSD controller is careful about how it writes new data and makes changes to existing data.

Batching reads on HDDs improves performance because consecutive sectors can be read without additional seeks.

A database management system (DBMS) is better than a filesystem at predicting what the next query will be, so it's in a good position to exploit block-level control over loading or evicting data to memory.

§8 Lecture 8: Intro to Networking and Layering


Modules on separate machines communicate through a network.
Definition 8.1. A point-to-point link is when the source talks to a directly-connected
destination.
Definition 8.2. A switch helps forward data to destinations that are far away.
A set of links and switches forms a network.

§8.1 Four Layers


There are four layers in our network model:

1. Link: communication between two directly-connected nodes.

2. Network: naming, addressing, routing

3. Transport: sharing the network, reliability (or not)

4. Application: the things generating the traffic

A layered model is useful because we can swap out protocols at one layer without much
change to protocols at other layers.

§9 Lecture 9: Network Layer - Routing


The goal of a routing protocol is to allow each switch to know, for every node dst in
the network, a minimum-cost route to dst.


§9.1 Distributed Routing Protocols


In a distributed routing protocol, nodes build their own routing tables, instead of
being given tables by a centralized authority. The three main steps of a distributed
routing protocol are

1. Nodes learn about their neighbors via the HELLO protocol.

2. Nodes learn about other reachable nodes via advertisements.

3. Nodes determine the min-cost routes.

We use distributed routing protocols because these steps can happen periodically, which
allows the routing protocol to detect and respond to failures, and adapt to other changes
in the network.

§9.2 Link State Routing


Link-state routing spreads full topology information so that nodes can run a shortest-path algorithm. A node's advertisement contains its link costs to each of its neighbors. If node A sends out an advertisement, then all other nodes get the advertisement (via flooding).

Nodes keep track of which advertisements they've forwarded so they don't re-forward them. Nodes use Dijkstra's algorithm to integrate advertisements. Each node keeps track of a table with three columns: dst, route, and cost.
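A minimal sketch of building that table with Dijkstra's algorithm (the topology and costs are made up; routes are identified by their first hop):

import heapq

# Hypothetical topology learned from flooded advertisements.
links = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def dijkstra(src):
    # Returns {dst: (cost, first_hop)} -- the three-column routing table.
    table, frontier = {}, [(0, src, None)]
    while frontier:
        cost, node, first_hop = heapq.heappop(frontier)
        if node in table:
            continue
        table[node] = (cost, first_hop)
        for nbr, c in links[node].items():
            heapq.heappush(frontier, (cost + c, nbr, first_hop or nbr))
    return table

print(dijkstra("A"))   # e.g. route to D: cost 4 via first hop B (A-B-C-D)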

Link state routing has its pros and cons:

• Pros: Flooding makes link-state routing very resilient to failure.

• Cons: There is a lot of overhead associated with flooding.

§9.3 Distance Vector Routing


Distance-vector routing disseminates information about the current minimum costs to each node, rather than the complete topology. A node's advertisement contains its current costs to every node it's aware of. If node A sends out an advertisement, then only its neighbors will receive it.

Nodes use the Bellman-Ford algorithm to integrate advertisements: when a node hears neighbor n advertise cost c to dst, it updates its route to dst if the cost of its link to n plus c beats its current cost, as in the sketch below.
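A minimal sketch of that update rule at one node (link costs and table contents are made up):

# Hypothetical state at one node: the cost of each direct link, and the
# current routing table mapping dst -> (cost, next_hop).
link_cost = {"B": 1, "C": 4}
table = {"B": (1, "B"), "C": (3, "B")}

def integrate(neighbor, advertisement):
    # Bellman-Ford update on receiving {dst: cost} from a neighbor.
    for dst, c in advertisement.items():
        new_cost = link_cost[neighbor] + c
        if dst not in table or new_cost < table[dst][0]:
            table[dst] = (new_cost, neighbor)

integrate("B", {"C": 2, "D": 6})
print(table)   # C stays at cost 3 via B; D is added at cost 7 via B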
Distance-vector routing has its pros and cons:

• Pros: Less overhead.

• Cons: Failures are hard to handle, and outcomes can depend on the timing of advertisements.

The order in which advertisements are received by nodes matters. When there is a failure, costs can "count to infinity." The workaround for this is the split-horizon strategy, which doesn't send advertisements about a route back to the node providing that route, but this doesn't always work.


§10 Lecture 10: BGP (Border Gateway Protocol)


Definition 10.1. An autonomous system (AS) is a collection of IP networks and
routers under the control of a single organization that presents a common routing policy
to the internet.
There are three components that contribute to scalable routing on the internet.

1. Hierarchy of routing: route between ASes, and then within an AS.

2. Path-vector routing: similar to distance-vector, but advertisements include the path, to allow nodes to detect and avoid routing loops.

3. Topological addressing: assign addresses in contiguous blocks to make advertisements smaller.

Definition 10.2. Policy routing is where packets are forwarded based on specific
policies set by network administrator, not just shortest path.
Different AS relationships include

• Customer-provider: the customer pays the provider for transit.

• Peering: allows free mutual access to each other's customers (as long as the amount of traffic is approximately equal in each direction).

All the top-tier ISPs peer with one another to allow for global connectivity.

§10.1 Export and Import Policies


Export policies determine which routes are advertised to whom. They are shaped by the different AS relationships.

Providers tell all neighbors about their customers and tell their customers about all
neighbors.

Peers will tell each other about their customers.

ASes will set their own import policies. If an AS hears about multiple routes to
a destination, it will prefer to use its customers first, then peers, then providers.

§10.2 BGP
BGP (border gateway protocol) as a distributed routing protocol:

1. Nodes learn about their neighbors via the HELLO protocol. Nodes send "KEEPALIVE" messages to their neighbors once every sixty seconds.

2. Nodes learn about other reachable nodes via advertisements. Advertisements differ
based on AS relationships (customer/provider, peer).

3. Nodes determine the min-cost routes. Nodes choose which routes to use based on
AS relationship and other properties.


BGP is an application-layer protocol, even though it deals with routing. It runs on top of
TCP which provides reliable transport.

This lets BGP handle failures differently than link-state and distance-vector routing.

BGP scales to the Internet, but the size of routing tables and route instability both cause scaling issues. BGP is also not secure.

§11 Lecture 11: Transport Layer - TCP


A reliable transport protocol delivers each byte of data exactly once and in order to the receiving application, over an unreliable network.

§11.1 Reliable Transport Basics


A sequence number is used to order the packets. An acknowledgement (ACK)
is used to confirm that a packet has been received. An ACK with sequence number k
indicates that the receiver has received all packets up to and including k. The sender
is allowed to have W outstanding packets at once, but no more. This is known as a
sliding-window protocol.

A TCP sender uses a timeout to infer that a packet has been lost; it resends the packet after the timeout expires. A spurious retransmission is when the sender retransmits a packet that had already been ACKed.

§11.2 Congestion Control Basics


The goals of congestion control are to control the source rates to achieve efficiency and fairness.

• Efficiency: minimize drops, minimize delay, maximize bottleneck utilization.

• Fairness: under infinite offered load, split bandwidth evenly among all sources sharing a bottleneck.

The window W refers to the max number of outstanding packets that the sender can have. We control W using AIMD.

TCP's congestion control is AIMD (additive increase, multiplicative decrease): every RTT, if there is no loss, W = W + 1; otherwise W = W/2. The idea behind this algorithm is that AIMD makes the source rates R1 and R2 oscillate around the efficient, fair fixed point (see slide deck 11).
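A minimal sketch of the AIMD window update, one step per RTT (the values are illustrative):

def aimd_step(w, loss):
    return max(1.0, w / 2) if loss else w + 1

w = 10.0
for rtt in range(5):
    loss = (rtt == 2)        # pretend a loss is detected on RTT 2
    w = aimd_step(w, loss)
    print(rtt, w)            # W goes 11, 12, 6, 7, 8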

§11.3 Additional Congestion Control Mechanisms


Slow start: at the start of a connection, double W every RTT. This is because the sender doesn't yet know the available bandwidth or the congestion state of the network.

Fast retransmit/fast recovery means retransmitting packet k + 1 as soon as four ACKs with sequence number k are received (i.e., three duplicate ACKs).


Retransmission due to timeout happens when there is significant loss, as compared to retransmission due to fast recovery. Senders are conservative and will drop the window down to 1 when this happens.

An issue with TCP that will be addressed next lecture is that it doesn't react to congestion until after it's a problem; we want to get senders to react before queues are full.

§12 Lecture 12: In-Network Resource Management


This lecture is about resource management inside the network (at the switches), rather than at the end hosts.

§12.1 Queue Management


Definition 12.1. DropTail queueing is dropping packets only when the queue is full. This is simple but leads to high delays and synchronizes flows.

A better alternative (RED + ECN) is to drop (or, with ECN, mark) packets before the queue is full, with increasing probability as the queue grows. This prevents queue lengths from oscillating, decreases delay, and keeps flows from synchronizing, but it is more complex and it's hard to pick the parameters.

§12.2 Scheduling
Delay-based scheduling is when we put latency-sensitive traffic in its own queue and serve that queue first. This doesn't prevent latency-sensitive traffic from "starving out" the other traffic.

Bandwidth-based scheduling is when we allocate specific amounts of bandwidth to each queue. Deficit round-robin does this as follows:

in each round:
    for each queue q:
        q.credit += q.quantum
        while q.credit >= size of next packet p:
            q.credit -= size of p
            send p

Round-robin scheduling can't handle variable packet sizes and doesn't allow us to weight traffic differently. Deficit round-robin scheduling handles variable packet sizes (even within the same queue) with near-perfect fairness and low packet-processing overhead.

§13 Lecture 13: Application Layer


In a client/server model, the server provides content and clients request and consume it. The main downside is that it doesn't scale well.

§13.1 P2P Networks


A P2P network (BitTorrent, for example) delivers content as follows:


1. Download .torrent file from known website.

2. Contact tracker for list of peers.

3. Communicate with (some) peers to download and upload blocks.

The BitTorrent P2P network distributes large files efficiently by breaking them into
smaller pieces and sharing them across multiple peers.

The .torrent file contains the file name, the file size, information about the "blocks" of the file, and the tracker URL.

The tracker contains a list of peers.

Peers are incentivized to upload data because peers prioritize uploading to those who
also upload back.

§13.2 CDNs
CDNs work by

1. Geographically distribute the servers.

2. Replicate a particular piece of content p on some of them.

3. When a client requests p, direct them to the "best" server that has a copy of p.

A CDN owner (like Akamai) might take geographical proximity, RTT, bandwidth, and throughput into account when deciding which server is "best" for a particular client.

§14 Lecture 14: Datacenters and Clouds


The cloud refers to a network of remote servers that store, manage, and process data
over the internet.

§14.1 Physical Infrastructure of Datacenter


A rack is a stack of multiple physical machines.

The network topology provides communication between racks. One example is a Clos topology, which looks like many copies of trees (to provide redundancy).

We route using multi-path routing, which can load-balance across paths, but we
need to be careful about how we divide traffic across the paths (since this makes conges-
tion control more difficult). The centralized controller in a datacenter is responsible
for managing and optimizing compute, storage, and network resources to ensure efficient
operations.

Compared to the Internet, datacenter networks are under the control of a single admin
entity, so we have a higher level of control.


§15 Lecture 15: Reliability


Reliability is how our systems deal with failures. We build reliable systems by:

1. Identifying all possible faults and deciding which ones we're going to handle.

2. Detecting/containing faults.

3. Handling faults (recovering).

Some metrics used to measure failure are annualized failure rate (AFR) and mean time between failures (MTBF).

§15.1 RAID
RAID stands for redundant array of independent disks. Different types of RAID.
• RAID 1: mirroring. Pro is that can recover from single-disk failure, con is requires
2N disks.
• RAID 4: dedicated parity disk (which is the XOR of all the previous disks).
Pros are can recover from single disk failure, requires N + 1 disks (instead of N ).
Performance benefits if you stripe a single file across multiple data disks. Con is
that all writes go to the parity disk (so the disk gets worn out faster).
• RAID 5: instead of writing on one disk, distribute the parities (for example, sector
i’s write can be on disk i (mod N + 1)).
Our main tool for improving reliability is redundancy. RAID 5 protects against
single-disk failures while maintaining good performance.
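A small sketch of the parity idea (pure Python, byte-wise XOR; the disk contents are made up): the parity block is the XOR of the data blocks, so any single lost block can be reconstructed by XORing the survivors with the parity.

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

disks = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]   # N data disks
parity = xor_blocks(disks)                        # the dedicated parity

# Disk 1 fails: reconstruct its contents from the survivors plus parity.
recovered = xor_blocks([disks[0], disks[2], parity])
assert recovered == disks[1]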

§16 Lecture 16: Atomicity, Isolation, Transactions


An action is atomic if it happens completely or not at all. It's easier to reason about failures if we assume some actions are atomic. One example is the bank account update on the slides; the takeaway is that it's easiest to make rename atomic.

To do this, we write all updates to a shadow copy of the file (temp), whose writes don't need to be atomic, and then atomically rename temp over the original.

Isolation refers to how and when the effects of one action are visible to another.

§17 Lecture 17: Logging


For atomicity, we now turn to logging, which is going to provide us with better performance at the cost of some added complexity.

We keep a log of updates and commits, so that when we call the read method, we look through the log to get the result. We will only read a value if a commit exists: if the program crashes halfway through, there is no commit, so the value will not be read.

• Writes contain the old and new value of a variable. Each write is a small append to the end of the log.

• To read a variable x, the system scans backwards through the log to find x's last committed value.

• The commit point for a transaction is writing the COMMIT record.
The problem is that reads can be very slow. To fix this, we add cell storage on disk, which stores the current values of A and B. We also add a recover operation, which undoes any writes that don't have an associated COMMIT. The issue is now that writes become slow, because each write has to go to the log first and then to cell storage.

Instead, we put a cache in front of the disk. When we write, we set the cache value. When we read, we attempt to read from the cache, otherwise we read from disk. We also have an operation called flush, which is called occasionally to update the disk values to reflect the cache values.

To improve performance for recovery, we can write checkpoints and truncate the log.
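A minimal in-memory sketch of this undo-style recovery (the record formats are invented for illustration; a real system also handles cell storage, caching, and checkpoints):

# Each log record is ("WRITE", txn, var, old, new) or ("COMMIT", txn).
log = [
    ("WRITE", 1, "A", 100, 90),
    ("WRITE", 1, "B", 50, 60),
    ("COMMIT", 1),
    ("WRITE", 2, "A", 90, 0),    # txn 2 crashed before committing
]

cell = {"A": 0, "B": 60}         # cell storage, possibly ahead of the log

def recover():
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    for rec in reversed(log):    # undo uncommitted writes, newest first
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, txn, var, old, new = rec
            cell[var] = old      # restore the old value from the log

recover()
print(cell)                      # {'A': 90, 'B': 60}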

§18 Lecture 18: Isolation


Isolation is provided by two-phase locking. Our goal is to run transactions T1, T2, ..., Tn concurrently and have it appear as if they ran sequentially.

This implies that some interleavings will not work, since some interleavings may produce results that aren't achievable through any sequential run.

However, we may want to enforce even stricter conditions. For example, even if we end up with the same result, what if the reads and writes occur in an order that would not be possible in any sequential execution? Is this still OK? The answer is that it depends, as there are different notions of serializability.

§18.1 Conflicting Schedules


Two operations conflict if they operate on the same object and at least one of them is a
write. In any schedule, two conflicting operations A and B will have an order: either A is
executed before B or vice versa. This is called the order of the conflict in that schedule.
If T1 and T2 are executed serially, then in the ordering of the conflicts we see either all of T1's operations occurring first or all of T2's operations occurring first.

Even if we interleave, the order of conflicts may or may not match one of the sequential runs. A schedule is conflict serializable if the order of all of its conflicts is the same as the order of the conflicts in some sequential schedule.

§18.2 Conflict Graphs


Express the order of conflicts more succinctly with a conflict graph: there is an edge from Ti to Tj iff Ti and Tj have a conflict between them and the first step in the conflict occurs in Ti. A conflict-serializable conflict graph might look like

T2 → T1

while a conflict-nonserializable conflict graph might look like

T2 ↔ T1


A schedule is conflict serializable iff it has an acyclic conflict graph.

§18.3 Two-Phase Locking (2PL)


The purpose of two-phase locking is to produce a schedule that is conflict serializable
without searching through all possible schedules.
1. Each shared variable has a lock.
2. Before any operation on a variable, the transaction must acquire the corresponding
lock.
3. After a transaction releases a lock, it may not acquire any other locks.
Note that we still have options for where we put the acquires and releases, as long as all releases happen after all acquires and every operation on a variable happens while holding its lock.

One issue here is that 2PL can result in deadlock. One solution is a global ordering on locks. A better solution is to take advantage of atomicity and abort one of the transactions: since the log records each transaction's writes, an uncommitted transaction can be rolled back and retried later.

We can use reader/writer locks.


1. Each shared variable now has two locks: one for reading and one for writing.
2. Before any operation on a variable, the transaction must acquire the appropriate
lock.
3. Multiple transactions can hold reader locks for the same variable at once; a transaction can only hold a writer lock for a variable if there are no other locks held for that variable.
4. After a transaction releases a lock, it may not acquire any other locks.
This allows for better performance since reads can now happen concurrently.

§19 Lecture 19: Distributed Transactions


We want atomicity across machines. For example, suppose we want to run transfer(A,Z,amount),
but A and Z are on different servers.

One possible issue is if one server commits but the other doesn’t. Our goal is to
develop a protocol that can provide multi-site atomicity in the face of all sorts of
failures.

§19.1 Two-Phase Commit


In two-phase commit, nodes agree that they're ready to commit before committing. This handles worker failures that happen before or during the prepare phase.

If a worker fails during the commit phase, we cannot abort the transaction; workers must be able to recover into a prepared state and then commit. Workers write PREPARE records once prepared. The recovery process (reading through the log) will indicate which transactions are prepared but not committed.
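A minimal coordinator-side sketch of the protocol's control flow (the worker objects and method names here are hypothetical; messaging and retries are elided):

def two_phase_commit(coordinator, workers):
    # Phase 1: prepare. Each worker writes a PREPARE record to its log
    # before voting yes.
    votes = [w.prepare() for w in workers]

    # Phase 2: commit or abort, depending on the votes.
    if all(votes):
        coordinator.log("COMMIT")   # the commit point for the transaction
        for w in workers:
            w.commit()              # must be retried until it succeeds
        return "committed"
    else:
        coordinator.log("ABORT")
        for w in workers:
            w.abort()
        return "aborted"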


§19.2 Prepare Phase


Why do we have a prepare phase? It serves the following purposes:
1. Gives servers chance to abort the transaction even if they haven’t failed entirely.
2. Allows for recovery from failure during commit phase.

§19.3 Coordinator Failures


If a worker or the coordinator fails during the prepare phase, the coordinator can safely abort the transaction and will send explicit abort messages to live workers.

If the coordinator fails during the commit phase, machines must commit the transaction during recovery. One performance issue is that if the coordinator fails during the prepare phase, it blocks the transaction from progressing.

§19.4 Remaining Issues


When some workers fail, some of the data becomes completely unavailable. The solution is to replicate data. However, we then need to keep multiple copies of the same data consistent, which is hard.

§20 Replicated State Machines


To increase availability, try replicating data on two servers. An issue is that the order in which messages arrive can cause replicas to become inconsistent.

Instead, we can make one replica the primary replica and have a coordinator in place to help manage failures. Clients communicate only with the coordinator (not the replicas). The coordinator sends requests to the primary server. The primary ACKs the coordinator only after it's sure that the backup has all updates. If the primary fails, C (the coordinator) switches to the backup.

Let's introduce the concept of a network partition: machines on the same side of the partition can communicate with each other, but not across it. If two different replicas both think that they are the primary, data can become inconsistent.

To fix this, we introduce the view server to determine which replica is primary, in hopes
that we can deal with network partitions. The view server keeps a table that maintains
a sequence of views. The view server alerts primary/backups about their roles.

If a machine is primary in view n, it must have been primary or backup in view n − 1 (excluding view 1). Otherwise, if S1 was the original primary and is now down (but still believes it's the primary), it won't ask the new primary for an ACK, so it would still be able to push changes, resulting in inconsistency.

§21 Authentication
We’re starting our last section on security (i.e. how our system copes in the face of
targeted attacks). For example,


• Policy: provide authentication for users.

• Threat model: adversary has access to the entire stored table.


The issue here is that the adversary can just read passwords directly.

§21.1 Hash Functions


A hash function H takes an input string and outputs a fixed-length string.

• H is deterministic: if x1 = x2, then H(x1) = H(x2).

• H is collision-resistant: if x1 ≠ x2, then the probability that H(x1) = H(x2) is essentially zero.

• H is one-way: given x, it's easy to compute H(x), but given H(x) it's hard to recover x.

If we store the hash of the password instead of the password itself, this is now safer.

However, we have another problem: the adversary can easily pre-compute hashes for a
lot of passwords. So, we introduce a special type of hash function.

§21.2 Slow Hash Functions


A slow hash function H is a hash function that deliberately takes a long time to compute. This ensures that, in a given amount of time, an adversary can try at most a few candidate passwords.

Even so, the attacker is incentivized to concentrate on the most common passwords. One idea to remedy this is to add randomness.

§21.3 Randomness
We will associate a random value (a salt) with each user. These salts are stored in
plaintext and are not a secret.

Instead of storing a hash of the password, we will concatenate the password and the salt
and store the hash of that string.

The adversary can no longer precompute a single table of hashes, because they would have to precompute for every possible salt.
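A sketch using Python's standard library (hashlib.scrypt is a deliberately slow, salted hash; it requires OpenSSL support, and the parameters here are illustrative rather than a recommendation):

import hashlib, hmac, os

def hash_password(password: bytes, salt: bytes) -> bytes:
    # scrypt is deliberately memory- and CPU-hard (a "slow hash").
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)

# Store (salt, hash) per user; the salt is not a secret.
salt = os.urandom(16)
stored = hash_password(b"hunter2", salt)

def check(password: bytes) -> bool:
    return hmac.compare_digest(stored, hash_password(password, salt))

print(check(b"hunter2"), check(b"wrong"))   # True False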

§22 Low-Level Attacks


In a program stack, each function should be able to access its own local variables. After
a function returns, the next line of the calling function should execute.

To return to main() after function() ends, we use the BP (base pointer) to locate the start of the current stack frame. The saved values of BP and IP (instruction pointer) are located at a fixed offset from it, so we can restore BP and IP and continue in the caller.

The adversary's goal is to input a string that overflows a buffer and overwrites adjacent stack memory (such as the modified variable in the lecture example, or the saved BP/IP), hijacking the control flow.


§22.1 Compilers
Compilers take source code as input and output machine code.

§23 Secure Channels


We're going to look at adversaries that observe data on the network. Some packet data can reveal what you're doing even if the packet headers are difficult to interpret.

Our policy is to provide confidentiality and integrity. We encrypt and decrypt each message using a key. The issue is that the adversary can still tamper with packets.

Instead, we send over the ciphertext and a token, where MAC(key, message) = token. An adversary can't produce the token without knowing the key, and can't recover the message even given the token and key. This solves the issue of integrity.

The remaining problem is that an adversary can intercept a message and resend it at a later time (a replay). The solution is to use sequence numbers.
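A sketch of the integrity half with Python's hmac module (the key and framing are invented for illustration, and encryption is elided); including a per-message sequence number in the MAC'd bytes is what defeats replay:

import hashlib, hmac

key = b"shared-secret-key"   # assumed to have been established already

def protect(seq: int, message: bytes):
    data = seq.to_bytes(8, "big") + message
    token = hmac.new(key, data, hashlib.sha256).digest()
    return seq, message, token

def verify(seq, message, token, expected_seq):
    data = seq.to_bytes(8, "big") + message
    ok = hmac.compare_digest(token, hmac.new(key, data, hashlib.sha256).digest())
    return ok and seq == expected_seq   # a replayed packet has a stale seq

pkt = protect(1, b"pay $100")
print(verify(*pkt, expected_seq=1))     # True
print(verify(*pkt, expected_seq=2))     # False: replay detected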

§24 Tor
Today we're still looking at adversaries observing data on the network. Symmetric-key cryptography is when the same key is used to encrypt and decrypt. This means that Alice and Bob share the same key, which is difficult: how do we even share the key in the first place?

Public-key cryptography is when a message to x is encrypted with x's public key; only x's secret key can decrypt the message.

However, we also want to provide anonymity, i.e., it's a problem if the packet header exposes to the adversary that A is communicating with S.

One solution to this is to have a proxy P: the header says "from A to P", and then the proxy rewrites it to "from P to S".

However, we have a new problem: no entity in the network should receive a packet from
A and send it directly to S. No entity in the network should keep state that links A to S.

Solution is to have a chain of proxies A → P1 → P2 → P3 → S.

However, what if the adversary has multiple vantage points and can observe the same data traveling from A to S? Then the data must not appear the same across hops.

The solution to this is onion routing, which adds layers of encryption that the proxies strip off one by one. The setup chains "skipping one in the middle": each proxy only learns its immediate neighbors, i.e., P1 sees A → P2, P2 sees P1 → P3, and P3 sees P2 → S.
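A toy sketch of the layering idea using the third-party cryptography package (symmetric Fernet keys stand in for the per-hop keys; real Tor's circuit setup and formats differ):

from cryptography.fernet import Fernet

# One symmetric key per proxy in the circuit (assume already exchanged).
keys = [Fernet.generate_key() for _ in range(3)]   # P1, P2, P3

def wrap(message: bytes) -> bytes:
    # A encrypts for P3 first, then P2, then P1, so the outermost
    # layer is the one the first proxy can strip.
    for key in reversed(keys):
        message = Fernet(key).encrypt(message)
    return message

onion = wrap(b"request from A to S")
for key in keys:                  # each proxy strips exactly one layer
    onion = Fernet(key).decrypt(onion)
print(onion)                      # b'request from A to S'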

In practice, Tor uses public-key cryptography to securely exchange symmetric keys between A and each node in the circuit, and the layers of encryption use those symmetric keys, which is what allows traffic to travel in both directions.
