homework 9 networking
Uploaded by nani chkhenkeli

Exercise 9

1. Computing forces

Celestial mechanics:

f(i, j) = G * m_i * m_j / |r_i - r_j|^2
Cost: the Euclidean distance between the two bodies dominates the per-pair cost. For a fixed dimension (e.g. 3D), the subtractions, squarings, and the single square root are all constant-time operations, and the constant factors G and the body masses add only negligible work. The overall cost of one force evaluation is therefore O(1).

Time: the brute-force method has time complexity O(n^2), because each body must interact with every other body: n*(n-1) evaluations. Exploiting the symmetry f(i, j) = -f(j, i) halves this to n*(n-1)/2, but the constant factor 1/2 does not change the asymptotic O(n^2) bound (and for large n it becomes insignificant).
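As an illustrative sketch (the 2D setting and all names are my own assumptions, not part of the exercise), the symmetric brute-force evaluation looks like this in Python:

```python
import math

G = 6.674e-11  # gravitational constant (SI units)

def brute_force(masses, positions):
    """O(n^2) pairwise gravitational forces in 2D, using the symmetry
    f(i, j) = -f(j, i) to halve the number of force evaluations."""
    n = len(masses)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                 # n*(n-1)/2 pairs
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dist = math.sqrt(dx * dx + dy * dy)   # one sqrt per pair: O(1)
            f = G * masses[i] * masses[j] / dist ** 2
            fx, fy = f * dx / dist, f * dy / dist
            forces[i][0] += fx
            forces[i][1] += fy
            forces[j][0] -= fx                    # Newton's third law
            forces[j][1] -= fy
    return forces
```

The inner loop starting at i + 1 is exactly the n*(n-1)/2 saving discussed above.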

Molecular dynamics:

Cost: similar per-pair structure to celestial mechanics, but with a higher constant factor because evaluating the interaction energy function is more expensive.

Time: the brute-force method again has time complexity O(n^2); systems with localized (short-range) interactions can be reduced to O(n), e.g. by only evaluating pairs within a cutoff radius.
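One standard way to exploit such locality is a cell list. The sketch below (pure Python, my own illustrative helper, not part of the exercise) bins particles into cells whose side equals the cutoff radius, so each particle is compared only with particles in its own and adjacent cells; with bounded particle density this gives O(n) instead of O(n^2):

```python
import math

def neighbor_pairs(positions, cutoff):
    """Cell-list neighbor search: return all index pairs (i, j), i < j,
    whose Euclidean distance is at most `cutoff`, in O(n) for bounded
    density instead of the O(n^2) all-pairs scan."""
    cells = {}
    for idx, (x, y) in enumerate(positions):
        cells.setdefault((int(x // cutoff), int(y // cutoff)), []).append(idx)
    pairs = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), []):
                    for i in members:
                        # i < j deduplicates: each unordered pair is seen once
                        if i < j and math.dist(positions[i], positions[j]) <= cutoff:
                            pairs.append((i, j))
    return sorted(pairs)
```

Particles in non-adjacent cells are never compared at all, which is where the asymptotic saving comes from.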

2. Heat dissipation

1) Serial time: t1(n) = O(n)

Parallel time: tp(n) = 2n/p + O(√(n/p))

Number of processors: p

Speedup: S = t1(n)/tp(n) = p/2 + O(1)

Efficiency: E = S/p = 1/2 + O(1/p)

2) t_stripe = (n²/p) * t_serial, where t_serial is the time for simulating a single grid point. Communication: O(n) per step (exchanging boundary rows of length n).

Total parallel runtime: t_parallelStripe = t_stripe + O(n)

Speedup: S = t_sequential / t_parallelStripe = n² * t_serial / ((n²/p) * t_serial + O(n)) = p*n / (n + O(1)) = p for large n.

Efficiency (squares): E = S/p = 1; the efficiency for stripes is asymptotically the same.
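Plugging numbers into this model (t_serial and the communication coefficient below are assumed, illustrative values, not given in the exercise) shows the stripe speedup approaching p as n grows:

```python
def stripe_speedup(n, p, t_serial=1.0, c_comm=1.0):
    """Speedup of stripe-partitioning an n x n heat grid over p processors.

    Model (matching the analysis above):
      sequential time = n^2 * t_serial
      parallel time   = (n^2 / p) * t_serial + c_comm * n
    """
    t_seq = n * n * t_serial
    t_par = (n * n / p) * t_serial + c_comm * n
    return t_seq / t_par

# As n grows with p fixed, the speedup approaches p:
for n in (16, 256, 4096):
    print(n, round(stripe_speedup(n, p=8), 3))
```

The communication term O(n) grows more slowly than the compute term n²/p, so its relative cost vanishes for large n.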
3. Matrix multiplication

Runtime: O(n^3).

Let a be the constant communication (serial) overhead and n the problem size.

Parallelizable part: O(n^3) - a.


Ideal speedup (Amdahl's law), writing f = (parallelizable part) / (total time):

S = 1 / ((1 - f) + f * (1/p))

Here f = (O(n^3) - a) / O(n^3) = 1 - a/O(n^3), so

S = 1 / (a/O(n^3) + (1/p) * (1 - a/O(n^3))) = O(n^3) * p / (O(n^3) + a * (p - 1))

As n grows, the serial fraction a/O(n^3) vanishes and S approaches p.

Each processor stores a portion of the input matrices A and B: if each processor stores n/p rows of A and n/p columns of B, the total number of stored elements per processor is 2n²/p + n.
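The speedup formula above can be checked numerically. In this sketch the concrete values of n, a, and p are my own illustrative choices:

```python
def amdahl_speedup(total, serial, p):
    """Amdahl's law: speedup on p processors when `serial` units of the
    `total` work cannot be parallelized."""
    f = (total - serial) / total          # parallelizable fraction
    return 1.0 / ((1.0 - f) + f / p)

# Illustrative numbers (assumed, not from the exercise): total work n^3
# with n = 100 and a fixed communication overhead a = 10_000.
n, a, p = 100, 10_000, 64
print(amdahl_speedup(n**3, a, p))      # below the ideal p = 64
```

Setting the serial overhead to zero recovers the ideal speedup S = p, matching the closed form p*n^3 / (n^3 + a*(p - 1)).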

4. Matrix multiplication

Each processor (i_p, j_p) holds its initial inputs, the submatrices A(i_p, j_p) and B(i_p, j_p), and a zero-initialized block of C.

Exchange blocks:

In each of √p communication rounds k, 0 ≤ k ≤ √p - 1, processor (i_p, j_p) receives block B(j_p, (j_p + k) % √p) from its owner, processor (j_p, (j_p + k) % √p). After √p rounds it holds the entire block row j_p of B.

Local computation:

The processor multiplies its submatrix A(i_p, j_p) with each received block B(j_p, l); each product A(i_p, j_p) * B(j_p, l) is one partial contribution to the output block C(i_p, l).

Communication for gathering:

In each of √p communication rounds k, 0 ≤ k ≤ √p - 1, processor (i_p, j_p) sends its partial block for C(i_p, (j_p + k) % √p) to processor (i_p, (j_p + k) % √p) and receives a partial block of C(i_p, j_p) from processor (i_p, (j_p - k + √p) % √p).

Local accumulation:

After √p communication rounds, each processor has received all √p partial blocks for C(i_p, j_p) and sums them into the final submatrix C(i_p, j_p).

Pseudocode would look like the following:

FUNCTION parallelMult(A, B, n, p):
    # Require: p is a perfect square and q = sqrt(p) divides n
    q = sqrt(p)

    # Decompose matrices A and B into q x q grids of (n/q) x (n/q) blocks
    blocksA = decomposeMatrix(A, n, q)
    blocksB = decomposeMatrix(B, n, q)

    # Initialize result blocks to zero
    blocksC = initializeMatrix(n, q)

    # Executed by every processor (i, j) in parallel
    FOR k = 0 TO q - 1:
        # Round k: receive block (j, (j + k) % q) of B from its owner
        j_recv = (j + k) % q
        Bblock = blocksB[j][j_recv]

        # Multiply and accumulate into the block of C owned by (i, j_recv)
        computeProduct(blocksA[i][j], Bblock, blocksC[i][j_recv])

    # Gather results
    RETURN gatherResults(blocksC)
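As a sanity check, the block algorithm above can be simulated sequentially. This is a minimal sketch in pure Python (the processor grid is replaced by loops, and the decompose/accumulate helpers from the pseudocode are inlined):

```python
def simulated_parallel_mult(A, B, q):
    """Sequentially simulate the q x q block algorithm above: 'processor'
    (i, j) multiplies its block of A with the blocks of block row j of B
    and accumulates each partial product into the owning block of C.
    A, B are n x n lists of lists; q is the processor-grid side length."""
    n = len(A)
    assert n % q == 0, "sqrt(p) must divide n"
    s = n // q                                    # block side length
    C = [[0.0] * n for _ in range(n)]
    for i in range(q):                            # processor row i
        for j in range(q):                        # processor column j
            for k in range(q):                    # communication round k
                l = (j + k) % q                   # received B-block (j, l)
                # partial product A(i, j) * B(j, l) -> C(i, l)
                for r in range(i * s, (i + 1) * s):
                    for c in range(l * s, (l + 1) * s):
                        C[r][c] += sum(A[r][m] * B[m][c]
                                       for m in range(j * s, (j + 1) * s))
    return C
```

Since every (j, l) pair occurs exactly once over the rounds k, each output block receives all √p partial contributions, and the result equals the ordinary product A * B.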
Exercise 5: routing for a grid

1.

(0,0) -- (1,0) -- (2,0) -- (3,0)

| | | |

(0,1) -- (1,1) -- (2,1) -- (3,1)

| | | |

(0,2) -- (1,2) -- (2,2) -- (3,2)

| | | |

(0,3) -- (1,3) -- (2,3) -- (3,3)

2.

(0,0) -> 0 (0,1) -> 1 (0,2) -> 2 (0,3) -> 3

(1,0) -> 4 (1,1) -> 5 (1,2) -> 6 (1,3) -> 7

(2,0) -> 8 (2,1) -> 9 (2,2) -> 10 (2,3) -> 11

(3,0) -> 12 (3,1) -> 13 (3,2) -> 14 (3,3) -> 15
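The numbering above is the row-major mapping (row, col) -> 4*row + col; a small Python helper (the function names are my own) makes it explicit:

```python
def node_id(row, col, width=4):
    """Row-major numbering of a width x width grid: (row, col) -> row*width + col."""
    return row * width + col

def coords(nid, width=4):
    """Inverse mapping: node id -> (row, col)."""
    return divmod(nid, width)
```

For the 4 x 4 grid this reproduces the table, e.g. (3, 2) -> 14.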

3. w=(0,1,4,5,8,9,12,13)

4. w=(0,1,4,5,8,9,12,13,3,7,11,15,2,6,10,14)

5.

The westward edges used by (0, 1, 4, 5, 8, 9, 12, 13) form a cycle, so these nodes do not conflict; the eastward edges used by (3, 7, 11, 15, 2, 6, 10, 14) likewise connect their nodes in a way that does not create conflicts.

6.
Route left: (0, 1, 4, 5, 8, 9, 12, 13)

Route right: (3, 7, 11, 15, 2, 6, 10, 14)

Exercise 6:

Case 1: cy(0) + th < e(0) - ts

The sender transmits bit x for a time greater than the clock skew ts. During the interval [e(0) - ts, e(0)], x is guaranteed to be stable on the bus before the receiver's first clock edge in cycle i: cy(k) - R_cy(k) < e(0) - ts for all k ∈ [0, 6].

Case 2: cy(0) + th ≥ e(0) - ts

The receiver might miss x in cycle i because of the clock alignment: cy(i) - R_cy(i) ≥ 0 (this applies only to cycle i). The stable sender transmission and the subsequent cycles guarantee correct sampling: cy(i + k) - R_cy(i + k) < e(0) - ts for k ∈ [1, 6].

In both cases, the receiver samples the correct x for at least 7 consecutive cycles, starting from cycle β = 0 (Case 1) or β = 0 or β = 1 (Case 2).
