Discussion:
Optimal checkpoint setting
pinker
2014-10-09 10:52:45 UTC
Hello All,
I have a brand new machine to tune:
x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB
SAS + SSD (Easy Tier) in RAID 10

What's particularly important now is to choose the optimal configuration for
write operations. We are discussing the checkpoint_segments and
checkpoint_timeout parameters. A test based on pgreplay has shown that the
largest amount of data is written when checkpoint_segments is set to 10 000
and checkpoint_timeout to 30 min, but I'm worried about the amount of time
needed for crash recovery. On the other hand, it's a mighty machine :) and I
don't want to go back to 30 segments.
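
For scale, if I read the 9.x docs right, pg_xlog can grow to roughly
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of
16MB each, so with the default checkpoint_completion_target of 0.5:

  # back-of-the-envelope pg_xlog ceiling at checkpoint_segments = 10000
  # (2 + 0.5) * 10000 + 1 = 25001 segments of 16MB
  echo $(( (25000 + 1) * 16 ))   # ~400016 MB, i.e. roughly 390 GB of WAL on disk

and a worst-case crash recovery would have to replay a sizeable fraction of
that.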

I'd appreciate any helpful remark on that.

The rest of parameters from the test:
shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=500
work_mem=256MB
maintenance_work_mem=128MB
effective_io_concurrency=50
synchronous_commit=local
wal_buffers=512kB
wal_writer_delay=1ms
commit_delay=100
commit_siblings=1000
random_page_cost=1.0
cpu_tuple_cost=2.0
effective_cache_size=500GB
geqo_threshold=20
geqo_effort=10
geqo_selection_bias=2.0
join_collapse_limit=16










Jeff Janes
2014-10-09 16:55:03 UTC
Post by pinker
Hello All,
x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB
SAS + SSD (Easy Tier) in RAID 10
What's particularly important now is to choose the optimal configuration for
write operations. We are discussing the checkpoint_segments and
checkpoint_timeout parameters. A test based on pgreplay has shown that the
largest amount of data is written when checkpoint_segments is set to 10 000
and checkpoint_timeout to 30 min, but I'm worried about the amount of time
needed for crash recovery.
Since you already have pgreplay running, kill -9 some backend (my favorite
victim is the bgwriter) in the middle of a pgreplay run, and see how long it
takes to recover.
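
Something along these lines (a sketch; the process title below assumes a 9.x
ps listing, and the log path depends on your logging setup):

  # crash the cluster mid-replay by killing the background writer
  kill -9 $(pgrep -f 'postgres: writer process')
  # the postmaster restarts and runs crash recovery; its duration is
  # bracketed in the server log by lines like:
  #   LOG:  redo starts at 0/51A2C68
  #   LOG:  redo done at 0/9E4F720
  grep -E 'redo (starts|done) at' $PGDATA/pg_log/*.log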

You might want to try it with and without clobbering the FS cache, or
simply rebooting the whole machine, depending on what kind of crash you
think is more likely.
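
On Linux, clobbering the FS cache without a full reboot is just (as root):

  sync
  echo 3 > /proc/sys/vm/drop_caches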

Recovering into a cold cache can be painfully slow. If your database
mostly fits in memory, you can speed it up by using something (like "tar
-cf - pgdata | wc -c") to read the entire pg data directory in sequential
fashion and hopefully cache it. If you find recovery too slow, you might
want to try this trick (and put it in your init scripts) rather than
lowering checkpoint_segments.
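
For the init-script variant, something like this at the end of the start
action would do (path is illustrative; it runs in the background so startup
isn't delayed):

  nohup sh -c 'tar -cf - /var/lib/pgsql/data | wc -c' >/dev/null 2>&1 &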

Cheers,

Jeff
Dennis
2014-10-09 20:13:12 UTC
Hi, a little off the general topic, but I am just wondering if the "tar -cf - pgdata | wc -c" trick can be used as a general trick to pre-warm the cache?

Thanks.
John R Pierce
2014-10-09 20:40:12 UTC
Post by Dennis
Hi, a little off the general topic, but I am just wondering if the "tar
-cf - pgdata | wc -c" trick can be used as a general trick to pre-warm
the cache?
I wouldn't do this if your pgdata is larger than about 50% of your
physical ram.
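
A quick sanity check before trying it (paths illustrative):

  du -sh /var/lib/pgsql/data   # size of pgdata
  free -g                      # physical RAM actually available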
--
john r pierce 37N 122W
somewhere on the middle of the left coast
Kevin Grittner
2014-10-09 20:46:26 UTC
Post by John R Pierce
Post by Dennis
Hi, a little off the general topic, but I am just wondering if the "tar
-cf - pgdata | wc -c" trick can be used as a general trick to pre-warm
the cache?
I wouldn't do this if your pgdata is larger than about 50% of your
physical ram.
Nor would I do this if pgdata is anywhere near the size of a memory
segment on a NUMA machine. I have seen cache pre-warming through a
single process on NUMA absolutely kill performance.
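
If you do want to pre-warm on NUMA hardware, interleaving the warm-up
process's allocations across nodes might help, though I haven't benchmarked
this exact incantation:

  numactl --interleave=all sh -c 'tar -cf - /var/lib/pgsql/data | wc -c'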

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pinker
2014-10-14 10:30:57 UTC
Thank you for your answer, but I don't have access to this server any more
and have to just interpret and pick parameters from the test results; the
tar trick, though, is something I'm definitely going to try in the next
test.

So, since I don't have a chance to do any more tests, should I change the
checkpoint_segments parameter?
My colleagues prefer the old settings, shown below, for maintenance reasons,
but I would still like to convince them to use a much higher value. 30
segments for a machine like this seems too humble.

checkpoint_segments = 30
checkpoint_timeout = 8min

The rest of config looks like this:
shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=1000
work_mem=384MB
maintenance_work_mem=10240MB
effective_io_concurrency=1
synchronous_commit=on
wal_buffers=16MB
wal_writer_delay=200ms
commit_delay=0
commit_siblings=5
random_page_cost=1.0
cpu_tuple_cost=0.01
effective_cache_size=450GB
geqo_threshold=12
geqo_effort=5
geqo_selection_bias=2.0
join_collapse_limit=8

Any ideas about the rest of the config? Maybe the parameters connected with
write operations?
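
If it helps the discussion: one thing that can be checked in production with
the 30-segment setting (assuming access to run a query there) is whether
checkpoints are being forced by segment exhaustion rather than by the
timeout:

  psql -c 'SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;'
  # if checkpoints_req dominates, checkpoint_segments really is too low;
  # with log_checkpoints = on the server log also shows write volume and timing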





