Unit-01
Parallel computing involves multiple processors working together at the same time to solve a
problem. It breaks large problems into smaller, independent parts that are processed
simultaneously. After processing, the results are combined to complete the task.
1. Bit-level parallelism:
○ Reduces the number of instructions needed by increasing the size of data the
processor can handle in one step.
2. Instruction-level parallelism:
○ Hardware-based: The processor decides which instructions to run in parallel
during runtime.
○ Software-based: The compiler (software) decides which instructions to run in
parallel.
3. Task parallelism:
○ Runs different tasks at the same time, on the same or different data, using multiple processors (see the sketch after this list).
4. Superword-level parallelism:
○ A form of vectorization in which similar, independent instructions are grouped into a single wide (SIMD) operation to perform the work faster.
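As a rough, illustrative sketch of the task-parallelism idea above (plain Python using only the standard library; the two task functions are invented for the example), two different tasks run concurrently on the same data and their results are then combined:

```python
# Sketch of task parallelism: two *different* tasks run concurrently
# on the same data set, then their results are combined.
from concurrent.futures import ProcessPoolExecutor

def compute_sum(data):          # task 1 (hypothetical)
    return sum(data)

def compute_maximum(data):      # task 2 (hypothetical)
    return max(data)

if __name__ == "__main__":
    data = list(range(1_000_000))
    with ProcessPoolExecutor(max_workers=2) as pool:
        sum_future = pool.submit(compute_sum, data)      # runs in one worker process
        max_future = pool.submit(compute_maximum, data)  # runs in another worker process
        # Combine the partial results once both tasks finish.
        print(sum_future.result(), max_future.result())
```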
A parallel computer is a system with multiple processors working together to solve a problem.
● Examples:
○ Supercomputers with hundreds or thousands of processors.
○ Workstations with multiple processors.
○ Embedded systems.
● How It Works:
○ Processors are connected and exchange data.
○ They work together on the same problem.
● Parallel Systems:
○ Processors are close together and solve one problem jointly.
● Distributed Systems:
○ Processors are spread across a large area and work on separate tasks.
○ Main goals:
■ Use all available resources.
■ Share information over a network.
Parallel computers can be grouped based on various features like the type of processors, how
they connect with each other, how they control operations, and how they handle input/output
tasks.
The CPU is the part of the computer that executes program instructions. It is sometimes called
the microprocessor. The CPU has three main components:
● ALU (Arithmetic and Logic Unit): Does calculations and logical operations.
● CU (Control Unit): Controls the ALU, memory, and input/output devices. It makes sure
everything in the computer works together.
● Registers: Fast storage areas in the CPU where data is kept temporarily before being
processed.
ALU:
The ALU performs arithmetic operations (like addition and subtraction) and logical operations (like AND, OR, NOT).
Control Unit:
The Control Unit directs the operations of the ALU, memory, and input/output devices. It tells these components how to respond to the program instructions read from memory, and it provides the timing and control signals that the other parts of the computer need to work together.
Buses
Buses are pathways that carry data between the CPU, memory, and input/output devices.
● Address Bus: Carries memory addresses from the processor to memory.
● Data Bus: Carries data between the processor, memory, and input/output devices.
● Control Bus: Carries control signals from the CPU and status signals to coordinate the
computer’s activities.
Memory Unit
The Memory Unit consists of RAM (Random Access Memory), which is fast and directly
accessed by the CPU. It is used to store data temporarily while the CPU works with it. RAM is
divided into sections, each with a unique address. Data from permanent storage (like a hard
drive) is loaded into RAM so that the CPU can work faster.
Flynn’s Classical Taxonomy
1. SISD (Single Instruction stream, Single Data stream): A single instruction is executed
on a single data stream (like a traditional computer).
2. SIMD (Single Instruction stream, Multiple Data stream): The same instruction is
applied to multiple data streams at the same time (useful for tasks like image
processing).
3. MISD (Multiple Instruction stream, Single Data stream): Multiple instructions are executed, but only one data stream is used.
○ This setup is rare and not commonly used in modern computing.
○ It would involve different instructions working on the same data at different times, but this setup isn't efficient in practice.
4. MIMD (Multiple Instruction stream, Multiple Data stream): Multiple instructions are executed at the same time, each on a different data stream.
○ This architecture is widely used in parallel computing and is common in modern supercomputers and multi-core processors.
● SIMD means one instruction is applied to multiple pieces of data at the same time.
● SIMD can be executed using techniques like pipelining or parallelism (multiple units
processing data simultaneously).
● Flynn divided SIMD into three types:
1. Array Processor:
○ All processing units receive the same instruction.
○ Each unit has its own memory and registers.
○ Modern version: SIMT (Single Instruction, Multiple Threads).
2. Pipelined Processor:
○ All units receive the same instruction.
○ They process pieces of data sequentially from a central resource (like memory or
registers).
○ Packed SIMD: A type where each unit processes a piece of data and writes it
back to memory.
3. Associative Processor:
○ All units receive the same instruction, but each unit makes its own decision
based on local data whether to execute or skip the instruction.
○ Modern name: Predicated (or Masked) SIMD.
○ Example: GPUs today use features from more than one of these types (SIMT
and Associative processing).
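A software analogy can make the SIMD and predicated (masked) SIMD ideas concrete. The sketch below assumes NumPy is available; the arrays and the mask are purely illustrative, and real masked SIMD happens in hardware lanes rather than library calls:

```python
# Software analogy to SIMD and masked (predicated) SIMD using NumPy arrays.
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Plain SIMD idea: one instruction ("add") applied to many data elements at once.
c = a + b                       # element-wise addition over the whole array

# Predicated / masked SIMD idea: every "lane" receives the same instruction,
# but a per-element mask decides whether the result is kept or the old value stays.
mask = a > 2.0                  # lanes where the predicate holds
d = np.where(mask, a + b, a)    # keep a + b only where mask is True

print(c)   # [11. 22. 33. 44.]
print(d)   # [ 1.  2. 33. 44.]
```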
● CPU: A modern CPU may have multiple cores, each with its own instruction stream.
Cores may be organized in sockets, and there is usually memory sharing across
sockets.
● Node: A node is a standalone computer or unit with multiple CPUs/cores, memory, and
network interfaces. Nodes are often networked together in supercomputers.
● Task: A task is a unit of computational work, often in the form of a program or set of
instructions. Parallel programs involve multiple tasks running on multiple processors.
● Pipelining: Dividing a task into steps processed by different units, similar to an
assembly line. Inputs flow through each step, creating parallelism.
● Shared Memory: All processors have direct access to common physical memory, and
parallel tasks can directly access and modify memory locations.
● Symmetric Multi-Processor (SMP): A shared memory architecture where multiple
processors have equal access to all resources, like memory and disk.
● Distributed Memory: Each processor has local memory, and tasks can only access
their local memory. Communication is needed to access memory on other machines.
● Communications: Parallel tasks often need to exchange data. This can be done via
shared memory or through network communication.
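To make the Communications item above concrete, here is a minimal sketch (standard-library Python; the producer/consumer split is invented for the example) in which two tasks that share no variables exchange data through a message queue:

```python
# Sketch of explicit communication between parallel tasks that share no memory:
# a producer sends partial results to a consumer through a message queue.
from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(i * i)        # send a partial result as a message
    queue.put(None)             # sentinel: signal that no more data is coming

def consumer(queue):
    total = 0
    while True:
        item = queue.get()      # receive a message (blocks until one arrives)
        if item is None:
            break
        total += item
    print("combined result:", total)

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()
```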
Synchronization
● Definition: The coordination of parallel tasks in real time, very often tied to communication (e.g., barriers and locks). It usually means at least one task has to wait, which can increase a program's wall-clock time.
Computational Granularity
● Definition: A qualitative measure of the ratio of computation to communication. Coarse granularity means large amounts of computation between communication events; fine granularity means relatively little.
Observed Speedup
● Definition: The wall-clock time of serial execution divided by the wall-clock time of parallel execution; one of the simplest indicators of a parallel program's performance.
Parallel Overhead
● Definition: The additional execution time required for managing parallel tasks, which is
not related to the useful computation.
● Includes:
○ Task start-up time
○ Synchronization
○ Data communication
○ Software overhead from libraries, operating systems, etc.
○ Task termination time
Massively Parallel
● Definition: Describes hardware comprising a parallel system with many processing elements; in current systems, "many" can mean hundreds of thousands of cores or more.
Embarrassingly Parallel
● Definition: A class of parallel tasks that are highly independent and require little to no
coordination between tasks.
● Examples: Simultaneously solving similar independent tasks like data processing or
simulations.
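A minimal sketch of an embarrassingly parallel workload (standard-library Python; simulate is a made-up unit of work): each input is processed completely independently, so no coordination is needed beyond collecting the results:

```python
# Embarrassingly parallel sketch: independent inputs, no communication between tasks.
from multiprocessing import Pool

def simulate(seed):                 # hypothetical independent unit of work
    value = seed
    for _ in range(100_000):
        value = (value * 1103515245 + 12345) % (2**31)   # toy pseudo-random walk
    return value

if __name__ == "__main__":
    inputs = range(8)
    with Pool() as pool:
        results = pool.map(simulate, inputs)   # each input handled by some worker
    print(results)
```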
Scalability
● Definition: The ability of a parallel system (hardware and/or software) to show a proportionate increase in speedup as more processors or resources are added.
Why Use Parallel Computing?
● Primary Goal: To increase computation power and solve problems faster by distributing
work across multiple processors.
● Advantages:
○ Reduces time and cost by enabling simultaneous task processing.
○ Solves larger problems that serial computing cannot handle.
○ Efficient use of non-local resources (e.g., cloud resources or the Internet).
○ Improves overall hardware utilization, reducing waste in computing power.
○ Essential for managing large datasets and real-time dynamic simulations.
● Memory Access:
○ Accessing local memory is cheaper than accessing remote memory (from
different nodes).
○ Locality: Frequent access to local data is crucial for efficient parallel software.
○ Impact of Locality: The ratio of remote to local access costs can vary, with
remote access being up to 1000 times more expensive in some cases.
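The same locality principle also shows up at the cache level inside a single node. In the sketch below (assuming NumPy is available; the matrix size is arbitrary), both loops do the same arithmetic, but the contiguous row-wise traversal typically runs noticeably faster than the strided column-wise one, purely because of memory locality:

```python
# Locality sketch: identical amount of arithmetic, different memory access patterns.
import time
import numpy as np

a = np.random.rand(4000, 4000)      # stored in row-major (C) order

start = time.perf_counter()
total = 0.0
for i in range(a.shape[0]):
    total += a[i, :].sum()          # rows are contiguous in memory: good locality
row_time = time.perf_counter() - start

start = time.perf_counter()
total = 0.0
for j in range(a.shape[1]):
    total += a[:, j].sum()          # columns are strided: poor locality
col_time = time.perf_counter() - start

print(f"row-wise: {row_time:.3f}s  column-wise: {col_time:.3f}s")
```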
Distributed Systems
Key Features:
● The size of distributed systems can range from a few devices to millions of computers
spread across different locations.
● The network connecting these devices can be wired or wireless.
● These systems are dynamic, meaning computers can join or leave at any time, affecting
performance.
Middleware:
● Middleware is software that sits above the operating systems of the nodes in a distributed system.
● It helps manage resources, enabling efficient sharing across the network and offering
services like:
○ Communication (e.g., Remote Procedure Call - RPC)
○ Security
○ Accounting
○ Failure recovery
● Middleware simplifies development by providing common services so developers don't
need to recreate them.
Cluster Computing:
● A group of similar computers (e.g., workstations or PCs) connected by a high-speed
local network, all running the same operating system.
Grid Computing:
● A system where resources from different organizations work together to form a virtual
organization, allowing collaboration across institutions.
● It uses a multi-layer architecture:
1. Fabric Layer: Interfaces with local resources.
2. Connectivity Layer: Supports communication between resources.
3. Resource Layer: Manages individual resources.
4. Collective Layer: Manages access to multiple resources.
5. Application Layer: Includes the applications that use the grid environment.
Cloud Computing:
IBM's Definition:
● A cloud is a pool of virtualized computer resources that can run various tasks, from
backend jobs to interactive applications.
Cloud Layers:
1. Hardware Layer:
○ This layer includes the physical resources like processors, routers, power, and
cooling systems, typically managed in data centers. Users generally do not
interact directly with this layer.
2. Infrastructure Layer:
○ The backbone of cloud computing, providing virtualized storage and computing
resources. It involves managing virtual servers and storage devices.
3. Platform Layer:
○ This layer offers tools for developing and deploying applications in the cloud.
Developers use vendor-specific APIs to upload and execute their applications on
the cloud.
4. Application Layer:
○ This is where the actual applications run, such as word processors, spreadsheets, and presentation software; these applications are accessible to users for further customization.
Cloud Service Models:
1. Infrastructure-as-a-Service (IaaS):
○ Provides hardware and infrastructure like virtual servers and storage.
2. Platform-as-a-Service (PaaS):
○ Offers a platform for developers to build and deploy applications.
3. Software-as-a-Service (SaaS):
○ Provides access to software applications hosted on the cloud, like email or office
applications.
● Distributed systems often arise in organizations with networked applications that need to
work together, but where integration between these applications is difficult.
● Client-Server Model:
○ A server runs a networked application (like a database) and makes it available to
remote clients. Clients send requests, and the server processes them.
● Distributed Transactions:
○ Clients can bundle multiple requests into a single larger request to be executed
as a distributed transaction. Either all requests succeed or none of them do.
Transaction Primitives:
● BEGIN_TRANSACTION: Marks the start of a transaction.
● END_TRANSACTION: Terminates the transaction and tries to commit it.
● ABORT_TRANSACTION: Kills the transaction and restores any old values.
● READ: Reads data from a file, table, or other store.
● WRITE: Writes data to a file, table, or other store.
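The all-or-nothing behaviour behind these primitives can be sketched in a few lines of Python. This is a purely local toy, not a real distributed transaction, and the class and method names are invented for illustration:

```python
# Toy sketch of all-or-nothing (atomic) transaction semantics on a key-value store.
class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}               # writes are buffered until commit

    def write(self, key, value):
        self.pending[key] = value       # WRITE: staged, not yet visible

    def commit(self):
        self.store.update(self.pending) # END_TRANSACTION: apply all writes at once
        self.pending.clear()

    def abort(self):
        self.pending.clear()            # ABORT_TRANSACTION: discard all staged writes

store = {"balance_a": 100, "balance_b": 50}
tx = ToyTransaction(store)              # BEGIN_TRANSACTION
tx.write("balance_a", 70)
tx.write("balance_b", 80)
tx.abort()                              # nothing happened: store is unchanged
print(store)                            # {'balance_a': 100, 'balance_b': 50}

tx.write("balance_a", 70)
tx.write("balance_b", 80)
tx.commit()                             # both updates applied together
print(store)                            # {'balance_a': 70, 'balance_b': 80}
```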
1. Communication Models:
○ With the decoupling of applications from their databases, inter-application
communication became crucial. This communication is managed through
several middleware technologies.
2. Middleware Communication Types:
○ Remote Procedure Call (RPC): Allows an application component to call another
application component remotely, as if it were a local procedure.
■ Drawback: Requires both the caller and callee to be running at the same
time.
○ Remote Method Invocation (RMI): Similar to RPC, but operates on objects
instead of procedures.
■ Drawback: Tight coupling between the caller and callee.
○ Message-Oriented Middleware (MOM): Enables asynchronous communication.
Applications send messages to predefined logical contact points, and the system
ensures that messages reach their intended recipients. This is part of
publish-subscribe systems.
3. Methods of Application Integration:
○ File Transfer: One application produces a file that another application reads.
■ Challenges: Agreement on file format, file management, and handling
updates.
○ Shared Database: All applications access the same database.
■ Challenges: Designing a common schema and performance bottlenecks.
○ Remote Procedure Call (RPC): Allows one application to invoke a procedure on
another without direct access.
■ Challenges: Requires both applications to be running at the same time.
○ Messaging: Ensures that requests and responses are delivered asynchronously,
even if the systems are temporarily unavailable.
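As a deliberately minimal illustration of the RPC style, Python's standard library includes a simple XML-RPC server and client; the procedure name and port below are arbitrary choices for the example:

```python
# rpc_server.py -- expose a procedure that remote clients can call as if it were local.
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):                      # ordinary local procedure
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
print("RPC server listening on port 8000 ...")
server.serve_forever()              # note: both caller and callee must be running
```

```python
# rpc_client.py -- call the remote procedure through a proxy object.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))              # looks like a local call, executes on the server
```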
Pervasive Systems
1. Pervasive Systems:
○ Pervasive systems emerge with the spread of mobile and embedded computing devices. These systems aim to blend seamlessly into the environment, often requiring no direct interaction from the user.
○ Devices are small, battery-powered, mobile, and often connect wirelessly to the
network, forming part of the Internet of Things (IoT).
2. Key Requirements for Pervasive Systems:
○ Context Awareness: Devices must be aware of environmental changes, such as
network availability, and adjust their behavior accordingly.
○ Ad-Hoc Composition: Devices should be configurable, either by the user or
automatically, to form a useful suite of applications.
○ Sharing: Devices should easily share and access information, adapting to
intermittent and changing connectivity.
3. Types of Pervasive Systems:
○ Ubiquitous Computing Systems: Devices are networked and continuously
present in the environment. These systems are designed to be transparent, with
minimal user interaction, and must be context-aware and autonomous.
○ Mobile Computing Systems: These systems rely on mobile devices like
smartphones and tablets that use wireless communication for network
connectivity.
○ Sensor Networks: These networks are composed of devices that collect and
share data through sensors.
In mobile computing, the location of a device is assumed to change over time. A changing location affects many issues and has a profound effect on communication.
Memory Hierarchies
Memory in parallel computers is organized in layers or levels to balance speed and storage.
These layers can be categorized as follows:
1. Primary Memory
○ CPU Registers: Small, very fast memory locations within the CPU used for
immediate data manipulation.
○ Cache Memory: A smaller, faster memory that stores copies of frequently
accessed data from the main memory to speed up access.
○ Physical/Main Memory: The main memory (RAM) where active programs and
data are stored. It’s slower than cache but has larger capacity.
2. Secondary/Auxiliary Memory
○ Solid State Memory: Fast, non-volatile storage (e.g., SSDs).
○ Magnetic Memory: Traditional, slower storage (e.g., hard drives).
There are three main types of memory architectures used in parallel computing systems:
1. Shared Memory
2. Distributed Memory
3. Hybrid Distributed-Shared Memory
Shared Memory
General Characteristics:
● In shared memory systems, all processors can access a single global memory address
space, meaning any changes made to memory by one processor are visible to all other
processors.
● These systems allow multiple processors to work independently while sharing the same
memory resources.
Key Types:
● UMA (Uniform Memory Access): All processors share a single memory with equal access time; typical of SMP machines.
● NUMA (Non-Uniform Memory Access):
○ Memory access time depends on where the memory is located relative to the processor.
○ Memory is distributed across the processors as local memories.
○ Together, these local memories form a single global address space that every processor can access.
○ Often built by linking two or more SMPs (Symmetric Multiprocessors), so that one SMP can access the memory of another.
○ Memory access across the link is slower than local access.
○ When cache coherency is maintained, the system is called CC-NUMA (Cache Coherent NUMA).
● COMA (Cache-Only Memory Architecture): A special NUMA model in which all the distributed memory is treated as cache.
Disadvantages:
1. Scalability Issues: Adding more CPUs increases memory traffic and makes cache coherence more challenging to manage.
2. Synchronization Challenges: Programmers must ensure correct access to shared memory to avoid conflicts.
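A small sketch of the shared-memory model and its synchronization challenge (standard-library Python; the counter workload is illustrative): several processes update one shared value, and a lock is needed so that concurrent read-modify-write updates are not lost:

```python
# Shared-memory sketch: multiple processes update a single shared counter.
# Without the lock, concurrent read-modify-write updates could be lost.
from multiprocessing import Process, Value, Lock

def worker(counter, lock, iterations):
    for _ in range(iterations):
        with lock:                      # synchronization: one updater at a time
            counter.value += 1          # read-modify-write on shared memory

if __name__ == "__main__":
    counter = Value("i", 0)             # integer living in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 10_000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)                # 40000, thanks to the lock
```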
Distributed Memory