Discussion:
Question about Vacuum and Replication failures in 9.3.5
Joel Avni
2014-09-22 20:42:38 UTC
Permalink
I noticed that tables on my master PostgreSQL server were growing, and running vacuum full analyze on them actually made them even bigger.

At the same time, a slave PostgreSQL server had fallen behind in trying to replicate, and was stuck in constantly looping over ‘started streaming WAL from primary at…’ and ‘requested WAL segment …. has already been removed’. Once I stopped running the slave instance, I was able to manually vacuum the tables, and appears that auto vacuum is now able to vacuum as well. One table (for instance) dropped from 10Gb down to 330Mb after this operation. I don’t see anything about auto vacuum not able to acquire locks while the slave wasn’t able to replicate. I am unclear why a slave trying to continue streaming would block the auto vacuum, or is something else at play?

I did check, and no base backups were in progress at the time this occurred.

Thank you,
Joel Avni
Adrian Klaver
2014-09-22 23:04:39 UTC
Permalink
Post by Joel Avni
I noticed that tables on my master PostgreSQL server were growing, and
running vacuum full analyze on them actually made them even bigger.
First what version of Postgres are you using?

Second VACUUM FULL is usually not recommended for the reason you found
out and which is documented here:

http://www.postgresql.org/docs/9.3/interactive/sql-vacuum.html

FULL

Selects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires extra
disk space, since it writes a new copy of the table and doesn't release
the old copy until the operation is complete. Usually this should only
be used when a significant amount of space needs to be reclaimed from
within the table.
"
Post by Joel Avni
At the same time, a slave PostgreSQL server had fallen behind in trying
to replicate, and was stuck in constantly looping over ‘started
streaming WAL from primary at…’ and ‘requested WAL segment …. has
already been removed’. Once I stopped running the slave instance, I was
able to manually vacuum the tables, and appears that auto vacuum is now
able to vacuum as well. One table (for instance) dropped from 10Gb down
to 330Mb after this operation. I don’t see anything about auto vacuum
not able to acquire locks while the slave wasn’t able to replicate. I
am unclear why a slave trying to continue streaming would block the auto
vacuum, or is something else at play?
My guess related to the locks your VACUUM FULL was taking, though it
would require more information on what all the various parts where doing
over the time frame.
Post by Joel Avni
I did check, and no base backups were in progress at the time this occurred.
Thank you,
Joel Avni
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Joel Avni
2014-09-23 05:04:56 UTC
Permalink
It 9.3.5 and I did the manual vacuum to try to see where the problem might
be.
Post by Adrian Klaver
Post by Joel Avni
I noticed that tables on my master PostgreSQL server were growing, and
running vacuum full analyze on them actually made them even bigger.
First what version of Postgres are you using?
Second VACUUM FULL is usually not recommended for the reason you found
http://www.postgresql.org/docs/9.3/interactive/sql-vacuum.html
FULL
Selects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires extra
disk space, since it writes a new copy of the table and doesn't release
the old copy until the operation is complete. Usually this should only
be used when a significant amount of space needs to be reclaimed from
within the table.
"
Post by Joel Avni
At the same time, a slave PostgreSQL server had fallen behind in trying
to replicate, and was stuck in constantly looping over Œstarted
streaming WAL from primary atŠ¹ and Œrequested WAL segment Š. has
already been removed¹. Once I stopped running the slave instance, I was
able to manually vacuum the tables, and appears that auto vacuum is now
able to vacuum as well. One table (for instance) dropped from 10Gb down
to 330Mb after this operation. I don¹t see anything about auto vacuum
not able to acquire locks while the slave wasn¹t able to replicate. I
am unclear why a slave trying to continue streaming would block the auto
vacuum, or is something else at play?
My guess related to the locks your VACUUM FULL was taking, though it
would require more information on what all the various parts where doing
over the time frame.
Post by Joel Avni
I did check, and no base backups were in progress at the time this occurred.
Thank you,
Joel Avni
--
Adrian Klaver
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Joel Avni
2014-09-23 05:21:38 UTC
Permalink
Its version 9.3.5, whats interesting the that the table grew in size after
the vacuum full, which I did to try to see why the auto vacuum wasn¹t
working.
However, after I stopped the PostgreSQL slave instance, then vacuum full
did result in a much much smaller size, as expected. So it appears to be
that there must be some interaction between a slave that trying to do
streaming replication but failing, because the requests WALs have been
cycled out and vacuuming on the master. I am not entirely sure that¹s the
case, but I think observed it twice. Is it the master can¹t clean up
tuples that might be visible at the slave¹s last replayed transaction? I
didn¹t think the master was aware of the slave¹s state, and why locks
can¹t be coordinated between the master and slave.
Post by Adrian Klaver
Post by Joel Avni
I noticed that tables on my master PostgreSQL server were growing, and
running vacuum full analyze on them actually made them even bigger.
First what version of Postgres are you using?
Second VACUUM FULL is usually not recommended for the reason you found
http://www.postgresql.org/docs/9.3/interactive/sql-vacuum.html
FULL
Selects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires extra
disk space, since it writes a new copy of the table and doesn't release
the old copy until the operation is complete. Usually this should only
be used when a significant amount of space needs to be reclaimed from
within the table.
"
Post by Joel Avni
At the same time, a slave PostgreSQL server had fallen behind in trying
to replicate, and was stuck in constantly looping over Œstarted
streaming WAL from primary atŠ¹ and Œrequested WAL segment Š. has
already been removed¹. Once I stopped running the slave instance, I was
able to manually vacuum the tables, and appears that auto vacuum is now
able to vacuum as well. One table (for instance) dropped from 10Gb down
to 330Mb after this operation. I don¹t see anything about auto vacuum
not able to acquire locks while the slave wasn¹t able to replicate. I
am unclear why a slave trying to continue streaming would block the auto
vacuum, or is something else at play?
My guess related to the locks your VACUUM FULL was taking, though it
would require more information on what all the various parts where doing
over the time frame.
Post by Joel Avni
I did check, and no base backups were in progress at the time this occurred.
Thank you,
Joel Avni
--
Adrian Klaver
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Adrian Klaver
2014-09-23 14:22:37 UTC
Permalink
Post by Joel Avni
Its version 9.3.5, whats interesting the that the table grew in size after
the vacuum full, which I did to try to see why the auto vacuum wasn¹t
working.
Please do not top post, it makes it difficult to follow the thread.
Post by Joel Avni
However, after I stopped the PostgreSQL slave instance, then vacuum full
did result in a much much smaller size, as expected. So it appears to be
that there must be some interaction between a slave that trying to do
streaming replication but failing, because the requests WALs have been
cycled out and vacuuming on the master. I am not entirely sure that¹s the
case, but I think observed it twice. Is it the master can¹t clean up
tuples that might be visible at the slave¹s last replayed transaction? I
didn¹t think the master was aware of the slave¹s state, and why locks
can¹t be coordinated between the master and slave.
As far as I know the master does not care about the slave(unless you are
using synchronous replication). It generates and recycles WALs and it up
to the slave to keep up. Best guess is that whatever you where doing was
generating and recycling WALs faster then the slave was able to process
them.

For more information on the care and feeding of WALs see below:

Configuration:
http://www.postgresql.org/docs/9.3/interactive/runtime-config-wal.html

Explanation:
http://www.postgresql.org/docs/9.3/interactive/wal.html

in particular:

http://www.postgresql.org/docs/9.3/interactive/wal-configuration.html

Best guess is you have wal_keep_segments set too low:

http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html

Another option is to set up a WAL archive so the slave could fetch from
there also. See the Configuration link above.
Post by Joel Avni
Post by Adrian Klaver
Post by Joel Avni
I noticed that tables on my master PostgreSQL server were growing, and
running vacuum full analyze on them actually made them even bigger.
First what version of Postgres are you using?
Second VACUUM FULL is usually not recommended for the reason you found
http://www.postgresql.org/docs/9.3/interactive/sql-vacuum.html
FULL
Selects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires extra
disk space, since it writes a new copy of the table and doesn't release
the old copy until the operation is complete. Usually this should only
be used when a significant amount of space needs to be reclaimed from
within the table.
"
Post by Joel Avni
At the same time, a slave PostgreSQL server had fallen behind in trying
to replicate, and was stuck in constantly looping over Œstarted
streaming WAL from primary atŠ¹ and Œrequested WAL segment Š. has
already been removed¹. Once I stopped running the slave instance, I was
able to manually vacuum the tables, and appears that auto vacuum is now
able to vacuum as well. One table (for instance) dropped from 10Gb down
to 330Mb after this operation. I don¹t see anything about auto vacuum
not able to acquire locks while the slave wasn¹t able to replicate. I
am unclear why a slave trying to continue streaming would block the auto
vacuum, or is something else at play?
My guess related to the locks your VACUUM FULL was taking, though it
would require more information on what all the various parts where doing
over the time frame.
Post by Joel Avni
I did check, and no base backups were in progress at the time this occurred.
Thank you,
Joel Avni
--
Adrian Klaver
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Loading...