
Commit 14dee55

2 parents: 0cec1f0 + 6b8e8fe


5 files changed: +405 -60 lines changed


README.md

Lines changed: 181 additions & 26 deletions

Removed:

# postgres_cluster

[![Build Status](https://travis-ci.org/postgrespro/postgres_cluster.svg?branch=master)](https://travis-ci.org/postgrespro/postgres_cluster)

Various experiments with PostgreSQL clustering performed at PostgresPro.

This is a mirror of the postgres repo with several changes to the core and a few extra extensions.

## Core changes:

* Transaction manager interface (eXtensible Transaction Manager, xtm). Generic interface to plug distributed transaction engines. More info on the [postgres wiki](https://wiki.postgresql.org/wiki/DTM) and in [the email thread](http://www.postgresql.org/message-id/flat/F2766B97-555D-424F-B29F-E0CA0F6D1D74@postgrespro.ru).

## New extensions:

The following table describes the features and the way they are implemented in our four main extensions:

| |commit timestamps |snapshot sharing |
|---------------------------:|:----------------------------:|:----------------------------------:|
|**distributed transactions**|[`pg_tsdtm`](contrib/pg_tsdtm)|[`pg_dtm`](contrib/pg_dtm) |
|**multimaster replication** |[`mmts`](contrib/mmts) |[`multimaster`](contrib/multimaster)|

### [`mmts`](contrib/mmts)
An implementation of synchronous **multi-master replication** based on **commit timestamps**.

### [`multimaster`](contrib/multimaster)
An implementation of synchronous **multi-master replication** based on **snapshot sharing**.

### [`pg_dtm`](contrib/pg_dtm)
An implementation of **distributed transaction** management based on **snapshot sharing**.

### [`pg_tsdtm`](contrib/pg_tsdtm)
An implementation of **distributed transaction** management based on **commit timestamps**.

### [`arbiter`](contrib/arbiter)
A distributed transaction management daemon.
Used by `pg_dtm` and `multimaster`.

### [`raftable`](contrib/raftable)
A key-value table replicated over the Raft protocol.
Used by `mmts`.

Added:

# `PostgreSQL multi-master`

Multi-master is an extension and a set of patches to Postgres that turn it into a synchronous shared-nothing cluster, providing OLTP scalability and high availability with automatic disaster recovery.

## Overview

Multi-master replicates the same database to all nodes in the cluster and allows writes on every node. Transaction isolation is enforced cluster-wide, so in case of concurrent updates on different nodes the database applies the same conflict resolution rules (MVCC with the repeatable read isolation level) that a single node applies to concurrent backends, and it always stays in a consistent state. Any writing transaction writes to all nodes, which increases commit latency by roughly the round-trip time between nodes needed for synchronization. Read-only transactions and queries are executed locally without measurable overhead. The replication mechanism itself is based on logical decoding and an earlier version of the pglogical extension provided to the community by the 2ndQuadrant team.

Several changes were made in the postgres core to implement this functionality:

* Transaction manager API (eXtensible Transaction Manager, xtm). A generic interface to plug in distributed transaction engines. More info on the [postgres wiki](https://wiki.postgresql.org/wiki/DTM) and in [the email thread](http://www.postgresql.org/message-id/flat/F2766B97-555D-424F-B29F-E0CA0F6D1D74@postgrespro.ru).
* Distributed deadlock detection API.
* Logical decoding of transactions.

A cluster of N nodes can continue to work as long as a majority of the initial nodes are alive and reachable by the other nodes; for example, a 3-node cluster stays available with one node down, and a 5-node cluster with two nodes down. This is achieved by using a three-phase commit protocol and heartbeats for failure discovery. A node that is brought back into the cluster can be fast-forwarded to the actual state automatically, as long as the transaction log still exists from the time when the node was excluded from the cluster (this depends on the checkpoint configuration in postgres).

Read more about the internals on the [Architecture](/Architechture) page.
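
To illustrate the conflict-resolution behavior described above, here is a rough, hypothetical sketch: `node1`, `node2`, the database `mydb`, and the table `t` are placeholders, and the outcome depends on timing. With concurrent updates of the same row on different nodes, one of the two transactions may be aborted with a serialization failure at commit and has to be retried by the client:

```sh
# both updates target the same row on different nodes at (roughly) the same time
psql -h node1 -d mydb -c "UPDATE t SET v = v + 1 WHERE id = 1;" &
psql -h node2 -d mydb -c "UPDATE t SET v = v + 1 WHERE id = 1;" &
wait
# one of the psql calls may report a serialization failure; re-run it to retry
```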

## Features

* Cluster-wide transaction isolation
* Synchronous logical replication
* DDL replication
* Distributed sequences
* Fault tolerance
* Automatic node recovery

## Limitations

* Commit latency.
The current implementation of logical replication sends data to subscriber nodes only after the local commit, so in the case of a write-heavy transaction the user waits for the transaction to be processed twice: on the local node and on all the other nodes (simultaneously). We plan to address this issue in the future.

* DDL replication.
While data is replicated at the logical level, DDL is replicated by statement, performing a distributed commit with the same statement. Some complex DDL scenarios, including stored procedures and temp tables, do not work properly yet. We are currently working on full compatibility with ordinary postgres; right now we pass 141 of 164 postgres regression tests.

* Isolation level.
Multimaster currently supports only the _repeatable read_ isolation level. This is stricter than the default _read committed_, but it also increases the probability of serialization failures at commit time. The _serializable_ level is not supported yet.

* One database per cluster.

## Installation

(Existing db?)

Multi-master consists of a patched version of postgres and the `mmts` extension, which provides most of the functionality but doesn't require changes to the postgres core. To run multimaster, install postgres and several extensions on all nodes of the cluster.

### Sources

Ensure that the following prerequisites are installed.

Debian-based Linux:

```sh
apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
```

Red Hat-based Linux:

```sh
yum groupinstall 'Development Tools'
yum install git automake libtool bison flex readline-devel
```

After that, everything is ready to install postgres along with the extensions:

```sh
git clone https://github.com/postgrespro/postgres_cluster.git
cd postgres_cluster
./configure && make && make -j 4 install
cd ./contrib/raftable && make install
cd ../../contrib/mmts && make install
```
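
The steps above only install the binaries. As a rough sketch (the data directory path and log file name below are placeholders, and the postgres binaries built above are assumed to be on `PATH`), each node also needs its own initialized data directory before applying the configuration described below:

```sh
initdb -D /path/to/node-data                      # create a data directory on this node
pg_ctl -D /path/to/node-data -l node.log start    # start the instance
```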

### Docker

The directory contrib/mmts also includes a Dockerfile that is capable of building multi-master and starting a 3-node cluster.

```sh
cd contrib/mmts
docker-compose build
docker-compose up
```
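
Once the containers are up, a quick sanity check (the service names and ports are defined by the bundled docker-compose file, which is not reproduced here):

```sh
docker-compose ps                 # all three nodes should be listed as Up
docker-compose logs --tail=20     # look for replication or startup errors
```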

### PgPro packages

Once things become more stable, we will release prebuilt packages for the major platforms.

## Configuration

1. Add these required options to the `postgresql.conf` of each instance in the cluster.

```sh
max_prepared_transactions = 200   # should be > 0, because all
                                  # transactions are implicitly two-phase
max_connections = 200
max_worker_processes = 100        # at least (2 * n + p + 1)
                                  # this figure is calculated as:
                                  #   1 raftable worker
                                  #   n-1 receiver
                                  #   n-1 sender
                                  #   1 mtm-sender
                                  #   1 mtm-receiver
                                  #   p workers in the pool
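                                  #   e.g. for n = 3 nodes and p = 10 pool workers:
                                  #   2*3 + 10 + 1 = 17 background workers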
max_parallel_degree = 0
wal_level = logical               # multimaster is built on top of
                                  # logical replication and will not work otherwise
max_wal_senders = 10              # at least the number of nodes
wal_sender_timeout = 0
default_transaction_isolation = 'repeatable read'
max_replication_slots = 10        # at least the number of nodes
shared_preload_libraries = 'raftable,multimaster'
multimaster.workers = 10
multimaster.queue_size = 10485760 # 10 MB
multimaster.node_id = 1           # the 1-based index of the node in the cluster
multimaster.conn_strings = 'dbname=... host=....0.0.1 port=... raftport=..., ...'
                                  # comma-separated list of connection strings
multimaster.use_raftable = true
multimaster.heartbeat_recv_timeout = 1000
multimaster.heartbeat_send_timeout = 250
multimaster.ignore_tables_without_pk = true
multimaster.twopc_min_timeout = 2000
```

1. Allow replication in `pg_hba.conf`.
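
A hypothetical example entry set (the address range and the `trust` method are placeholders; pick the database, user, network, and authentication method that match your setup):

```sh
# TYPE  DATABASE     USER  ADDRESS       METHOD
host    all          all   10.0.0.0/8    trust
host    replication  all   10.0.0.0/8    trust
```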

(link to full doc on config params)

## Management

Run `create extension mmts;` to gain access to these functions (see the example after the list):

* `mtm.get_nodes_state()` -- show the status of the nodes in the cluster
* `mtm.get_cluster_state()` -- show the status of the whole cluster
* `mtm.get_cluster_info()` -- print some debug info
* `mtm.make_table_local(relation regclass)` -- stop replication for a given table
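
A minimal usage sketch from the shell (the database name `mydb` is a placeholder):

```sh
psql -d mydb -c "CREATE EXTENSION mmts;"
psql -d mydb -c "SELECT * FROM mtm.get_nodes_state();"
psql -d mydb -c "SELECT * FROM mtm.get_cluster_state();"
```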

(link to full doc on functions)

## Tests

### Performance

(Show TPC-C here on 3 nodes)

### Fault tolerance

(Link to test/failure matrix)

### Postgres compatibility

* Regression: 141 of 164
* Isolation: n/a

To run the tests:

* `make -C contrib/mmts check` to run TAP-tests.
* `make -C contrib/mmts xcheck` to run blockade tests. The blockade tests require `docker`, `blockade`, and some other packages installed; see [requirements.txt](tests2/requirements.txt) for the list. You might also need superuser privileges to run these tests successfully.

docs:

## Architecture

## Configuration params

## Management functions

contrib/mmts/README.md

Lines changed: 148 additions & 6 deletions

Removed:

# `mmts`

An implementation of synchronous **multi-master replication** based on **commit timestamps**.

## Usage

1. Install `contrib/raftable` and `contrib/mmts` on each instance.

## Status functions

## Testing

Added:

The added lines are identical to those added to README.md above (overview, features, limitations, installation, configuration, management, and tests), merged around the existing `postgresql.conf`/`pg_hba.conf` configuration steps and the `mtm.*` function list of the old `mmts` README.