Commit a78e120

Moved some things around (#475)
1 parent b0f23a7 commit a78e120

1 file changed

pgml-docs/docs/blog/scaling-postgresml-to-one-million-requests-per-second.md

Lines changed: 24 additions & 26 deletions
@@ -80,7 +80,7 @@ There are many poolers available presently, the most notable being PgBouncer, wh
 - Failover in case a read replica is broken
 - Sharding (this feature is still being developed)

-In this benchmark, we used its load balancing feature to evenly distribute our XGBoost predictions across our Postgres replicas.
+In this benchmark, we used its load balancing feature to evenly distribute XGBoost predictions across our Postgres replicas.


 #### Postgres Replicas
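
For context, the workload being load balanced here is ordinary PostgresML inference queries: every client sends the same kind of statement to the pooler, which forwards it to one of the replicas. A minimal sketch, assuming the two-argument `pgml.predict(project, features)` form; the project name and feature vector below are illustrative placeholders:

```sql
-- Illustrative only: project name and feature values are placeholders.
-- Clients connect to the pooler's single host/port; the pooler chooses
-- which replica actually runs the query.
SELECT pgml.predict(
    'example_project',               -- hypothetical project name
    ARRAY[2.0, 1.0, 14.0, 510.0]     -- hypothetical feature vector
) AS prediction;
```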
@@ -90,9 +90,11 @@ Scaling Postgres reads is a solved problem. If more read queries are coming in,
 
 ## Results
 
-We ran over a 100 different benchmarks, by changing the number of clients, poolers, replicas, and XGBoost predictions we requested. Our raw data is available [below](#methodology).
+We ran over 100 different benchmarks, changing the number of clients, poolers, replicas, and XGBoost predictions we requested. The benchmarks were meant to test the limits of each configuration and what remediations were needed in each scenario. Our raw data is available [below](#methodology).
 
-### Summary
+One of the tests we ran used 1,000 clients connected to 1, 2, and 5 replicas. The results were exactly what we expected.
+
+### Linear Scaling
 
 <center>
 <iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=2028066210&amp;format=interactive"></iframe>
@@ -106,32 +108,16 @@ Both latency and throughput, the standard measurements of system performance, sc
 
 Our architecture shares nothing and requires no synchronization. The replicas don't talk to each other and the poolers don't either. Every component has the knowledge it needs (through configuration) to do its job, and they do it well.
 
-### Scaling to Clients
-
-<center>
-<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1396205706&amp;format=interactive"></iframe>
-</center>
-
-<center>
-<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1300503904&amp;format=interactive"></iframe>
-</center>
-
-As the demand on PostgresML increases, the system gracefully handles the load. If the number of replicas stays the same, latency slowly increases, all the while remaining well below acceptable ranges. Throughput holds as well, as increasing number of clients evenly split available resources.
-
-If we increase the number of replicas, latency decreases and throughput increases and holds as the number of clients increases in parallel. We get the best result with 5 replicas, but this number is variable and can be changed as needs for latency compete with cost.
-
-!!! success
-The most impressive result is serving close to a million predictions with an average latency of less than 1ms. You might notice though that `950160.7` isn't quite one million, and that's true.
+The most impressive result is serving close to a million predictions with an average latency of less than 1ms. You might notice, though, that `950160.7` isn't quite one million, and that's true. We couldn't reach one million with 1,000 clients, so we increased to 2,000 and got our magic number: **1,021,692.7 req/sec**, with an average latency of **1.7ms**.
 
-We couldn't reach one million with 1000 clients, so we increased to 2000 and got our magic number: **1,021,692.7 req/sec**, with an average latency of **1.7ms**.
 
 ### Batching Predictions
 
-Batching is a proven method to optimize performance. If you need to get several data points, batch the requests into one query, and it should run faster than making individual requests.
+Batching is a proven method to optimize performance. If you need several data points, batch the requests into one query; it will run faster than making individual requests.
 
 We should precede this result by stating that PostgresML does not yet have a batch prediction API as such. Our `pgml.predict()` function can predict multiple points, but we haven't implemented a query pattern to pass multiple Postgres rows to that function at the same time. Once we do, based on our tests, we should see a quadratic increase in performance.
 
-Regardless of that limitation, we still managed to get better results by batching queries together since Postgres needed to do less query parsing and data fetching, and we saved on network round trip as well.
+Regardless of that limitation, we still managed to get better results by batching queries together, since Postgres needed to do less query parsing and data fetching, and we saved on network round trip time as well.
 
 <center>
 <iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1506211879&amp;format=interactive"></iframe>
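
To make the batching idea concrete, here is a hedged sketch of the query pattern, assuming the same `pgml.predict(project, features)` form as above; the project name and feature vectors are again placeholders, and the `::REAL[]` casts are only a precaution for the array literals. The point is that one statement replaces several round trips, while the model is still invoked once per row:

```sql
-- Unbatched: two statements, two network round trips, two parses.
SELECT pgml.predict('example_project', ARRAY[2.0, 1.0, 14.0, 510.0]::REAL[]);
SELECT pgml.predict('example_project', ARRAY[3.0, 0.0, 9.0, 1210.0]::REAL[]);

-- Batched: one statement carries both feature vectors.
-- pgml.predict() still runs once per row; only parsing, data fetching,
-- and the network round trip are shared.
SELECT pgml.predict('example_project', f.features) AS prediction
FROM (
    VALUES
        (ARRAY[2.0, 1.0, 14.0, 510.0]::REAL[]),
        (ARRAY[3.0, 0.0, 9.0, 1210.0]::REAL[])
) AS f(features);
```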
@@ -141,14 +127,21 @@ Regardless of that limitation, we still managed to get better results by batchin
 <iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1488435965&amp;format=interactive"></iframe>
 </center>
 
-If batching did not work at all, we would see a linear increase in latency and a linear decrease in throughput. That did not happen, as we saw a good increase in throughput and a sublinear increase in latency. A modest success, but a success nonetheless.
-
+If batching did not work at all, we would see a linear increase in latency and a linear decrease in throughput. That did not happen; instead, we got a good increase in throughput and a sublinear increase in latency. A modest success, but a success nonetheless.
 
 ### Graceful Degradation and Queuing
 
+<center>
+<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1396205706&amp;format=interactive"></iframe>
+</center>
+
+<center>
+<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1300503904&amp;format=interactive"></iframe>
+</center>
+
 All systems, at some point in their lifetime, will come under more load than they were designed for; what happens then is an important feature (or bug) of their design. Horizontal scaling is never immediate: it takes a bit of time to spin up additional hardware to handle the load. It can take a second, or a minute, depending on availability, but in both cases, existing resources need to serve traffic the best way they can.
 
-We were hoping to test PostgresML to its breaking point, but we couldn't quite get there. As load (number of clients) increased beyond provisioned capacity, the only thing we saw was a gradual increase in latency. Throughput remained roughly the same. This gradual latency increase was caused by simple queuing: the replicas couldn't serve requests concurrently so they had to wait, patiently, in the poolers.
+We were hoping to test PostgresML to its breaking point, but we couldn't quite get there. As load (number of clients) increased beyond provisioned capacity, the only thing we saw was a gradual increase in latency. Throughput remained roughly the same. This gradual latency increase was caused by simple queuing: the replicas couldn't serve requests concurrently, so the requests had to patiently wait in the poolers.
 
 <center>
 ![Queuing](/images/illustrations/queueing.svg) <br />
@@ -159,11 +152,16 @@ Among many others, this is a very important feature of any proxy: it's a FIFO qu
 
 Queueing overall is not desirable, but it's a feature, not a bug. While autoscaling spins up an additional replica, the app continues to work, although a few milliseconds slower, which is a good trade off for not overspending on hardware.
 
+As the demand on PostgresML increases, the system gracefully handles the load. If the number of replicas stays the same, latency slowly increases, all the while remaining well below acceptable ranges. Throughput holds as well, since an increasing number of clients evenly splits the available resources.
+
+If we increase the number of replicas, latency decreases and throughput increases, eventually stabilizing as the number of clients increases in parallel. We get the best result with 5 replicas, but this number is variable and can be changed as needs for latency compete with cost.
+
+
 ## What's Next
 
 Horizontal scaling and high availability are fascinating topics in software engineering. After doing this benchmark, we had no more doubts about our chosen architecture. Needing to serve 1 million predictions per second is rare, but having the ability to do that, and more if desired, is an important aspect for any new system.
 
-The next challenge for us is to scale writes horizontally. In the database world, this means sharding the database into multiple separate machines using a hashing function, and automatically routing both reads and writes to the right shards. There are many possible solutions on the market for this already, e.g. Citus and Foreign Data Wrappers, but none are as horizontally scalable to our liking, although we will incorporate them into our architecture until we build the one we really want.
+The next challenge for us is to scale writes horizontally. In the database world, this means sharding the database across multiple separate machines using a hashing function, and automatically routing both reads and writes to the right shards. There are many possible solutions on the market for this already, e.g. Citus and Foreign Data Wrappers, but none are as horizontally scalable as we'd like, although we will incorporate them into our architecture until we build the one we really want.
 
 For that purpose, we're building our own open source [Postgres proxy](https://github.com/levkk/pgcat/) which we discussed earlier in the article. As we progress further in our journey, we'll be adding more features and performance improvements.
 