Synchronous replication + pgPool: not all transactions immediately visible on standby

Discussion:

Thomas Kellerer

2014-09-24 08:08:59 UTC

Hello,

we have a setup with Postgres 9.3.4 running on Ubuntu (don't know the exact version) using streaming replication with a hot standby and pgPool 3.3.3 as a loadbalancer in front of the two Postgres servers.

While running automated tests we noticed that despite the fact that replication is set to synchronous not all committed transactions are immediately visible on the standby.

The tests are Java programs using JPA (EclipseLink) to access the database.

The basic pattern is something like this:

* tests opens a new JPA sessions
* updates a table
* commits the transaction
* closes the JPA session
* opens a new JPA session (this is where it gets transferred to the slave)
* selects the modified data to verify everything
* closes the JPA session

If the commit and the following select are issued too quickly, the select doesn't see the changes.

If we either connect directly to the master to run the tests, or artificially sleep inside the tests (e.g. 100ms) then we can see the results of a previous transaction without problems. All connections use the default isolation level (read committed).

We enabled statement logging on the master and the server, and these are the relevant parts

Log on the on the master:

2014-09-24 09:13:24.774 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:24.797 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:24.869 CEST LOG: Execute <unnamed>: UPDATE xxxx SET STATUS = $1 WHERE some_col IN (.....)
2014-09-24 09:13:24.869 CEST DETAIL: Parameter: $1 = '2', ........ $2001 = '1999'
2014-09-24 09:13:25.060 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.120 CEST LOG: Anweisung: DISCARD ALL

Log on the slave:

2014-09-24 09:13:25.125 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:25.131 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:25.133 CEST LOG: Execute <unnamed>: SELECT ... FROM xxxx WHERE ... AND (STATUS = $3) LIMIT $4 OFFSET $5
2014-09-24 09:13:25.133 CEST DETAIL: Parameter: $1 = '1000426553', $2 = '2014-09-24 09:14:18.114', $3 = '2', $4 = '2100', $5 = '0'
2014-09-24 09:13:25.137 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.138 CEST LOG: Anweisung: DISCARD ALL

So the new session on the slave was initiated 65ms after the commit on the master was done.
But still the SELECT didn't return anything (the where clause includes the new values for the STATUS column updated in the previous transaction).

The IN clause of the UPDATE statement contains 2000 values. If we reduce the number of updated rows (e.g. to 20) things are working fine.
Everything sounds as if the replication is configured to be asynchronous, although it isn't

Here are some of the configuration settings that I can imagine would be important:

postgresql.conf (from the master)

wal_level = hot_standby
checkpoint_segments = 16
checkpoint_completion_target = 0.9
max_wal_senders = 5
wal_keep_segments = 50
synchronous_standby_names = 'test_slave'

The following entries are unchanged from a default configuration (they are still commented out)

#fsync = on
#synchronous_commit = on
#wal_sync_method = fsync
#full_page_writes = on
#wal_buffers = -1
#wal_writer_delay = 200ms
#commit_delay = 0
#commit_siblings = 5

The slave has

hot_standby = on
hot_standby_feedback = on

pgpool.conf

num_init_children = 400
max_pool = 2
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'

replication_mode = off
replicate_select = off
insert_lock = on

load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
parallel_mode = off
use_watchdog = off
memory_cache_enabled = off

Our assumption is that it has something to do with the hot standby and/or the pgPool configuration.
But we are unsure where exactly the problem is.

This is our first time using pgPool (this is a pre-configured system from our customer) and we are unsure on where to look further.

Regards
Thomas

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tatsuo Ishii

2014-09-24 23:40:52 UTC

Permalink

I think your problem is not relevant to pgpool-II.

PostgreSQL's "synchronous" replication is actually not synchronous
(it's confusing but the naming was developer's decision). Primary
server sends the committed transaction's WAL record to standby and
wait for it is written to the standby's WAL file (and synched to the
disk if synchronous_commit = on). Then report to the client "the
transaction has been committed". That means if you send query on the
just committed row to the standby, it may returns an old row because
WAL record may replay yet.

If you dislike the PostgreSQL's behavior, you may want to try
pgpool-II's "native replication mode" (set replication_mode = on and
master_slave_mode = off). In the mode, pgpool-II does not return
response to the client until all PostgreSQL returns a commit
response. Thus right after the commit, querying to any PostgreSQL
should return committed row immediately.

Note that there's a small updating delay between PostgreSQL A and B.
So there's small window where data on A and B are differ.

Postgres-XC (or Postgres-XL which is a fork of Postgres-XC) overcomes
the issue by using "global transaction management" technique.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Post by Thomas Kellerer
Hello,
we have a setup with Postgres 9.3.4 running on Ubuntu (don't know the exact version) using streaming replication with a hot standby and pgPool 3.3.3 as a loadbalancer in front of the two Postgres servers.
While running automated tests we noticed that despite the fact that replication is set to synchronous not all committed transactions are immediately visible on the standby.
The tests are Java programs using JPA (EclipseLink) to access the database.
* tests opens a new JPA sessions
* updates a table
* commits the transaction
* closes the JPA session
* opens a new JPA session (this is where it gets transferred to the slave)
* selects the modified data to verify everything
* closes the JPA session
If the commit and the following select are issued too quickly, the select doesn't see the changes.
If we either connect directly to the master to run the tests, or artificially sleep inside the tests (e.g. 100ms) then we can see the results of a previous transaction without problems. All connections use the default isolation level (read committed).
We enabled statement logging on the master and the server, and these are the relevant parts
2014-09-24 09:13:24.774 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:24.797 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:24.869 CEST LOG: Execute <unnamed>: UPDATE xxxx SET STATUS = $1 WHERE some_col IN (.....)
2014-09-24 09:13:24.869 CEST DETAIL: Parameter: $1 = '2', ........ $2001 = '1999'
2014-09-24 09:13:25.060 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.120 CEST LOG: Anweisung: DISCARD ALL
2014-09-24 09:13:25.125 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:25.131 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:25.133 CEST LOG: Execute <unnamed>: SELECT ... FROM xxxx WHERE ... AND (STATUS = $3) LIMIT $4 OFFSET $5
2014-09-24 09:13:25.133 CEST DETAIL: Parameter: $1 = '1000426553', $2 = '2014-09-24 09:14:18.114', $3 = '2', $4 = '2100', $5 = '0'
2014-09-24 09:13:25.137 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.138 CEST LOG: Anweisung: DISCARD ALL
So the new session on the slave was initiated 65ms after the commit on the master was done.
But still the SELECT didn't return anything (the where clause includes the new values for the STATUS column updated in the previous transaction).
The IN clause of the UPDATE statement contains 2000 values. If we reduce the number of updated rows (e.g. to 20) things are working fine.
Everything sounds as if the replication is configured to be asynchronous, although it isn't
postgresql.conf (from the master)
wal_level = hot_standby
checkpoint_segments = 16
checkpoint_completion_target = 0.9
max_wal_senders = 5
wal_keep_segments = 50
synchronous_standby_names = 'test_slave'
The following entries are unchanged from a default configuration (they are still commented out)
#fsync = on
#synchronous_commit = on
#wal_sync_method = fsync
#full_page_writes = on
#wal_buffers = -1
#wal_writer_delay = 200ms
#commit_delay = 0
#commit_siblings = 5
The slave has
hot_standby = on
hot_standby_feedback = on
pgpool.conf
num_init_children = 400
max_pool = 2
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
parallel_mode = off
use_watchdog = off
memory_cache_enabled = off
Our assumption is that it has something to do with the hot standby and/or the pgPool configuration.
But we are unsure where exactly the problem is.
This is our first time using pgPool (this is a pre-configured system from our customer) and we are unsure on where to look further.
Regards
Thomas
--
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Michael Paquier

2014-09-25 12:10:26 UTC

Permalink

Post by Tatsuo Ishii
Postgres-XC (or Postgres-XL which is a fork of Postgres-XC) overcomes
the issue by using "global transaction management" technique.

... At the cost of reducing data availability for sharded tables, and
increasing write load for replicated tables, both things not that cool
for data warehouse applications, better for OLTP loads.
Regards,

--
Michael
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Thomas Kellerer

2014-09-25 14:24:46 UTC

Permalink

Post by Tatsuo Ishii
I think your problem is not relevant to pgpool-II.
PostgreSQL's "synchronous" replication is actually not synchronous
(it's confusing but the naming was developer's decision). Primary
server sends the committed transaction's WAL record to standby and
wait for it is written to the standby's WAL file (and synched to the
disk if synchronous_commit = on). Then report to the client "the
transaction has been committed". That means if you send query on the
just committed row to the standby, it may returns an old row because
WAL record may replay yet.

Thanks for the insight. I wasn't aware of that.
I assumed that if the slave said "transaction applied" this would also be visible "on the SQL level".

Post by Tatsuo Ishii
If you dislike the PostgreSQL's behavior, you may want to try
pgpool-II's "native replication mode" (set replication_mode = on and
master_slave_mode = off). In the mode, pgpool-II does not return
response to the client until all PostgreSQL returns a commit
response. Thus right after the commit, querying to any PostgreSQL
should return committed row immediately.

OK, we will try that out.

The documentation could a bit more specific on what exactly "replication mode" means.
It seems that this only influences the way pgPool distributes queries, it does not actually turn on any kind of replication, right?

Post by Tatsuo Ishii
From the flow chart[1] it also seems that this will only properly distribute read-only queries if we turn off auto-commit.

Did I understand that correctly?

Post by Tatsuo Ishii
Postgres-XC (or Postgres-XL which is a fork of Postgres-XC) overcomes
the issue by using "global transaction management" technique.

I know of those two options, but those are currently not on our roadmap
(although I'd really like to play around with them at some time).

Regards
Thomas

[1] http://www.pgpool.net/docs/latest/where_to_send_queries.pdf

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tatsuo Ishii

2014-09-25 15:07:11 UTC

Permalink

Post by Thomas Kellerer

OK, we will try that out.
The documentation could a bit more specific on what exactly "replication mode" means.
It seems that this only influences the way pgPool distributes queries, it does not actually turn on any kind of replication, right?

With replication mode pgpool-II implicitely sends all write queries to
all PostgreSQL servers. The idea is, "sending identical query will
result in same result". Thus some queries having oid, xid, random() or
any object which results in different result among PostgreSQL servers
will bring different data in replication mode. So you should be very
carefull if you want to use such queries. Queries including time/data
datatypes, functions (for example now()) are rewritten by pgpool-II so
that it uses local time at pgpool-II to avoid the problem.

With the mode, PostgreSQL's streaming replication mode should be turn
off by the way.

Post by Thomas Kellerer

Post by Tatsuo Ishii
From the flow chart[1] it also seems that this will only properly distribute read-only queries if we turn off auto-commit.

Did I understand that correctly?

Yes, the chart only explains read queries. Write queries are sent to
all PostgreSQL servers as stated above.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Kevin Grittner

2014-09-25 14:47:56 UTC

Permalink

Post by Tatsuo Ishii
I think your problem is not relevant to pgpool-II.

Agreed.

Post by Tatsuo Ishii
PostgreSQL's "synchronous" replication is actually not
synchronous

Well, that statement is a bit misleading. What is synchronous with
the COMMIT request is that data is persisted on at least two
targets before the COMMIT request returns an indication of success.
It guarantees that much (which some people complain about because
if there is only one synchronous replication target the commit
request hangs indefinitely if it, or communications to it, goes
down) and no more (because some people expect that it is not just
about durability, but also about visibility). There have been many
discussions about allowing configuration of broader or less strict
guarantees, but for now, you have just the one option.

Post by Tatsuo Ishii
(it's confusing but the naming was developer's decision).

There was much discussion at the time, and this was the consensus
for an initial implementation.

Post by Tatsuo Ishii
Primary server sends the committed transaction's WAL record to
standby and wait for it is written to the standby's WAL file (and
synched to the disk if synchronous_commit = on). Then report to
the client "the transaction has been committed". That means if
you send query on the just committed row to the standby, it may
returns an old row because WAL record may replay yet.

Right.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tatsuo Ishii

2014-09-25 15:22:53 UTC

Permalink

Post by Kevin Grittner

Post by Tatsuo Ishii
PostgreSQL's "synchronous" replication is actually not
synchronous

Post by Tatsuo Ishii
(it's confusing but the naming was developer's decision).

There was much discussion at the time, and this was the consensus
for an initial implementation.

I know what PostgreSQL's synchronous replication does. But, as you
saw, still many users expect "synchronous replication" will do
"visibility synchronous". I'm a little bit tired of making this kind
of explanation to users but that's not users fault, I think. Maybe
"crash safe replication" or some such was more appropriate term, but
of course this is just a hindsight.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general