Discussion:
PG_DUMP very slow because of STDOUT ??
Andras Fabian
2010-07-09 11:48:15 UTC
Permalink
Hi,

I am new to the mailing list (as a writer ... I have been reading for a long time), but we have been using PostgreSQL in our company for quite some time now.
Lately, we rolled out some new servers with vastly improved hardware (an 8-disk RAID 1+0 instead of the old 2-disk RAID 1, etc.), a new OS (Ubuntu 10.04 Server instead of Debian Lenny) and a newer PostgreSQL version (8.4.4 instead of 8.3.8). And everything seemed to be much better - as expected - until I saw the first full backup run-times.
For example:

- One of our old machines needed approximately 2 ½ hours to make a dump (with "-Fc --compress=9") of our approximately 100 GByte DB.

- One of our new machines, with a copy of the same 100 GByte DB, needed 6 ½ to 7 hours to dump it.

A first check of our MRTG monitoring showed that during the backup the machine was not saturated in any way (IO wait was below 5%, CPUs were almost idling), so you got the feeling the machine was sleeping ... But of course it did do the backup, only very slowly.

Now I investigated further. I did some detailed monitoring with collectl (http://collectl.sourceforge.net/) and saw that the machine was often doing some slow reading and every 20-30 seconds wrote a few MBytes to the backup disk. So almost everything suggests that we are very, very far from any hardware limitation. I also checked some file writes/reads on the RAID, and could confirm that a plain dd of a multi-GByte file only took a snap.

OK, then I checked what pg_dump was doing. In the process list of PostgreSQL (viewed via pgAdmin) I could see how pg_dump works (I hadn't cared much about it until now): it does a COPY to STDOUT. Now I wanted to narrow down the problem, and tried to reproduce that COPY statement on one of our 8 GByte tables (that table only has INT, BIGINT and BOOLEAN fields, so no potential BLOB problem!). And now comes the interesting part.

- When I did "COPY public.bigtable (column1, column2, column3) TO '/var/tmp/test.dump';" it is FAST. It writes the 3 GByte dump file in about 2 ½ minutes (as expected with this hardware)!

- When I did "time sudo -u postgres /var/software/pgsql1/bin/psql -h /tmp/pgsql1 -p 4306 -d database_name -f copy-test.sql > /var/tmp/test.dump" (and copy-test.sql contains: "COPY public.cachereleaseartist (releasebyterritoryid, artistid, isdisplayartist, artistroleclusterid, artistroletypeid, iscontainedinrelease, iscontainedinresource) TO STDOUT;") ... I couldn't wait until it ended. After 20 minutes the test.dump file was merely at 1 GByte (far from the target size of 3 GByte).
I also monitored both statements via collectl, and could confirm that the direct COPY to file produced some reasonable IO activity, while the COPY via STDOUT was idling like mad, with occasional bursts before falling asleep again. This would also make clear why pg_dump is so dog slow: if it gets its data via STDOUT at that speed, then it falls asleep too ...

And to make things even worse: I did the same test on our old hardware too. And BOTH the direct COPY to file AND the COPY via STDOUT were equally fast (they took 5 ½ minutes ... but OK, that's old iron compared to the new machine)! No difference between file and STDOUT (as expected)!

Now I ask, what's going on here? Why is COPY via STDOUT so much slower on our new machine?

- Has anything changed between PostgreSQL 8.3.8 and PostgreSQL 8.4.4 in this respect? Maybe some STDOUT buffer flushing in COPY? Buffer handling? Buffer sizes?

- Or is the OS to blame? As I said, we changed from Debian Lenny (kernel 2.6.26-2) to Ubuntu 10.04 Server (kernel 2.6.32-22), both 64-bit. And if so, would there be ways for PostgreSQL to adapt to the new OS's behavior regarding STDOUT (if there is such a thing)?

And for the Hardware:

- The old machine was an 8-core with 16 GByte RAM, BUT only one software RAID 1 with two disks for everything (I know, that was a little bit meager).

- The new machine has 8 cores (16 with hyper-threading), 24 GByte RAM, a RAID 1+0 with 8 disks for data only, and XLOG/backup/OS on a second RAID 1 with 2 disks, on a hardware RAID controller with battery-backed cache (so, obviously, this machine should be faster than the old one).

Can someone shed some light on this STDOUT madness? Or give me some directions/hints in which I could further research?

Thank you very much!

Andras Fabian
Atrada Trading Network AG
Tom Lane
2010-07-09 13:38:47 UTC
Permalink
Andras Fabian <***@atrada.net> writes:
> Now I ask, what's going on here? Why is COPY via STDOUT so much slower on our new machine?

Something weird about the network stack on the new machine, maybe.
Have you compared the transfer speeds for Unix-socket and TCP connections?

On a Red Hat box I would try using oprofile to see where the bottleneck
is ... don't know if that's available for Ubuntu.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Andras Fabian
2010-07-09 13:47:29 UTC
Permalink
> Something weird about the network stack on the new machine, maybe.
> Have you compared the transfer speeds for Unix-socket and TCP connections?

Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?
And doing some profiling came to my mind too, but I haven't had much experience in that field so far (at least not with basic OS-related stuff). I might check out oprofile (yes, it seems to exist for Ubuntu) next week (now it's time to leave for the weekend :-)
And maybe others will have some more ideas until then ... (or have experienced this or similar weirdness too).

Andras Fabian

Tom Lane
2010-07-09 14:08:54 UTC
Permalink
Andras Fabian <***@atrada.net> writes:
>> Something weird about the network stack on the new machine, maybe.
>> Have you compared the transfer speeds for Unix-socket and TCP connections?

> Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?

Well, COPY TO STDOUT really means "send the data across the network
connection to the client". It sounds to me like your problem is slow
data transfer between the database backend and pg_dump (or psql).
Not sure what could be causing it, unless perhaps you've got a kernel
packet filter that's performing an unreasonable amount of processing
per packet.

regards, tom lane

Craig Ringer
2010-07-09 20:11:29 UTC
Permalink
On 09/07/10 22:08, Tom Lane wrote:
> Andras Fabian <***@atrada.net> writes:
>>> Something weird about the network stack on the new machine, maybe.
>>> Have you compared the transfer speeds for Unix-socket and TCP connections?
>
>> Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?
>
> Well, COPY TO STDOUT really means "send the data across the network
> connection to the client". It sounds to me like your problem is slow
> data transfer between the database backend and pg_dump (or psql).
> Not sure what could be causing it, unless perhaps you've got a kernel
> packet filter that's performing an unreasonable amount of processing
> per packet.

... and that should be showing up as high "system" CPU time, or it seems
to on my machines with lots of tcp/ip traffic and heavy netfilter rules.
If they're using unix sockets packet filtering shouldn't come into play
anyway.

Andras: Are you using UNIX sockets or TCP/IP to connect to PostgreSQL?
If you specify any host to connect to, even "localhost", you're using
TCP/IP. If you don't specify a host, you're using unix sockets.

Whichever you are doing, try the other one and see if performance is
different.
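
A minimal sketch of that comparison, assuming a POSIX shell and reusing the thread's placeholder names ("database_name", copy-test.sql); adjust port and host to your installation (the thread used -p 4306):

```shell
#!/bin/sh
# Hedged sketch: run the same COPY script over both connection types and
# compare the timings. "database_name" and copy-test.sql are placeholders
# from the thread, not real objects on your system.
run_copy() {  # $1 = connection flags; an empty string means Unix-domain socket
    # shellcheck disable=SC2086
    time psql $1 -d database_name -f copy-test.sql > /dev/null
}

if command -v psql >/dev/null 2>&1 && [ -f copy-test.sql ]; then
    run_copy ""               # Unix-domain socket (no host given)
    run_copy "-h localhost"   # TCP/IP over loopback
else
    echo "psql or copy-test.sql not found; shown for illustration only"
fi
```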

Idea: One notable difference between COPY to file and 'copy to stdout"
to a receiving process over the network is that the latter involves the
interaction of two processes. When a kernel change is involved and
you're seeing low performance with lots of apparent idleness, that
immediately makes me think "process scheduler".

Of course, that might be completely off-track, but I've seen weird
performance issues caused by scheduler changes before.

--
Craig Ringer

Andras Fabian
2010-07-09 20:25:49 UTC
Permalink
Hi guys,

well, definitely thanks for these first ideas. They at least point in a direction which I hadn't considered before (to be honest, I didn't even think of associating STDOUT with socket stuff).
With my experiments, I think I have essentially tried both socket types. Once I connected normally via pgAdmin to my server (this would definitely qualify as TCP/IP, I think :-), and I also ran it locally on the machine via "psql" (see the example in my original post), so in that case it must have been Unix sockets. And both cases behaved similarly when issuing the COPY TO STDOUT (i.e.: sloooowly).
And as for the difference between COPY to file and COPY TO STDOUT ... when I did the same experiment on the old machine/kernel, there was almost no difference in execution time (almost identical, and fast as expected) - that test was done from "psql" on the command line ... so Unix sockets.
So, of course, there might be something with our new kernel. But then many others could have this "problem" too! Because we are using the stock kernel of the latest Ubuntu 10.04 LTS 64-bit server (nothing self-compiled, just as it comes from Canonical). Maybe someone with an Ubuntu 10.04 (server!) machine and PostgreSQL on it could try to reproduce it too.
By the way, how could I experiment with the scheduler? I remember there were some magic tricks via the /proc file system or something like that to "tweak" it on the fly ...
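
For reference, the scheduler knobs exposed under /proc can be listed like this (a sketch; which sched_* files exist varies by kernel version, so any particular name, e.g. sched_latency_ns, is an assumption):

```shell
#!/bin/sh
# Sketch: list the scheduler tunables the running kernel exposes under
# /proc/sys/kernel/. Read-only; writing to these files changes scheduler
# behavior on the fly and should be done with care.
show_sched_tunables() {
    found=0
    for f in /proc/sys/kernel/sched_*; do
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no sched_* tunables visible on this kernel"
}
show_sched_tunables
```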

Andras Fabian

Andras Fabian
2010-07-12 08:45:09 UTC
Permalink
Hi Tom (or others),

are there some recommended settings/ways to use oprofile in a situation like this? I got it working and have seen a first profile report, but then managed to completely freeze the server on a second try with different oprofile settings (the next tests will go against the newly installed - and identical - new servers).

Andras Fabian

Andras Fabian
2010-07-12 13:03:49 UTC
Permalink
This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to anyway. After getting PG running, I also set up the same test scenario as on our problematic servers and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.
Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of ps (first with the '-fax' parameter, then '-axl'). Now I have found something interesting:
- all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
- on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
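
That kind of state sampling can be scripted, e.g. (a sketch; PID 2819 below is the hypothetical COPY backend from the thread):

```shell
#!/bin/sh
# Sketch: sample a process's state repeatedly and tally the results, to see
# how often it sits in "D" (uninterruptible sleep) versus "R" (runnable).
sample_states() {  # $1 = pid, $2 = number of samples, $3 = seconds between samples
    i=0
    while [ "$i" -lt "$2" ]; do
        ps -o stat= -p "$1"
        sleep "$3"
        i=$((i + 1))
    done | sort | uniq -c
}

# Hypothetical usage against the COPY backend's PID from the thread:
# sample_states 2819 10 1
```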

Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my first reads on Google show (and although it points to IO, there is seemingly only very little IO, and IO wait is minimal too). I have also done "-axl" with ps, which brings up the following line for our process:
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 5551 2819 4201 20 0 5941068 201192 conges Ds ? 2:05 postgres: postgres musicload_cache [local] COPY

Now, as far as I understood from my Google searches, the WCHAN column shows where in the kernel my process is hanging. Here it says "conges". Now, can somebody tell me what "conges" means? Or do I have other options to get more info out of the system (maybe without oprofile - it has already burned my hand :-)?

And yes, I now see a reboot as a possible "fix", but that would not ensure that the problem will not resurface. So, for the time being, I will leave my second production server as is ... so I can further narrow down the potential causes of this strange STDOUT slowdown (especially if someone has a tip for me :-)

Andras Fabian

(in the meantime my "slow" server finished the COPY ... it took 46 minutes instead of the 3 minutes on the fast machines ... a slowdown of factor 15).




Greg Smith
2010-07-12 18:54:53 UTC
Permalink
Andras Fabian wrote:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
>

I've run into significant PostgreSQL performance regressions due to
issues with the Linux scheduler before, specifically when running a
single really intensive client program. You might be seeing something
similar here. I wrote a reference-link-heavy blog entry about that at
http://notemagnet.blogspot.com/2008/05/pgbench-suffering-with-linux-2623-2626.html
which you might find useful; one of the batch scheduler tweaks alluded
to there might improve things. Regressions in newer kernels are the
norm rather than the exception, and given the general lack of quality
control in Ubuntu 10.04 I have avoided any performance testing of it
yet. I was going to give it six months after release before I even
thought about that, in hopes more bugs are squashed, but I'm not
optimistic about this distribution for server use at all right now.

There's more information about using oprofile at
http://wiki.postgresql.org/wiki/Profiling_with_OProfile that might help
you dig into the underlying spot it's stuck at.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
***@2ndQuadrant.com www.2ndQuadrant.us


Scott Marlowe
2010-07-13 01:43:20 UTC
Permalink
On Mon, Jul 12, 2010 at 7:03 AM, Andras Fabian <***@atrada.net> wrote:
> This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to anyway. After getting PG running, I also set up the same test scenario as on our problematic servers and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.

I'm gonna take a scientific wild-assed guess that your machine was
rebuilding RAID arrays when you started out, and you had massive IO
contention underneath the OS level, resulting in such a slowdown.
Note that you mentioned ~5% IO wait. That's actually fairly high if
you've got 8 to 16 cores or something like that. It's much better to
use iostat -xd 60 or something like that and look for IO utilization
at the end of the lines.

Again, just a guess.

Andras Fabian
2010-07-13 06:31:18 UTC
Permalink
Hi Scott,

Although I can't guarantee 100% that there was no RAID rebuild at some point, I am almost sure that wasn't the case. Two machines - the ones which were already in production - exhibited this problem, and both of them had already been up for some weeks. And the reboot rather "fixed" one of them instead of making it worse (as your theory would suggest): the problem "disappeared" (but I don't know for how long). Now only one of the production machines has the issue ... the one which wasn't rebooted. Strange, strange. Nevertheless, thank you for your idea ... this is exactly the way I try to approach the problem: by making theories and trying to prove or disprove them :-)
Now I will try to further investigate along the tips from Craig and Greg.

Andras Fabian

Scott Marlowe
2010-07-13 07:45:12 UTC
Permalink
On Tue, Jul 13, 2010 at 12:31 AM, Andras Fabian <***@atrada.net> wrote:
> Hi Scott,
>
> Although I can't guarantee 100% that there was no RAID rebuild at some point, I am almost sure that wasn't the case. Two machines - the ones which were already in production - exhibited this problem, and both of them had already been up for some weeks. And the reboot rather "fixed" one of them instead of making it worse (as your theory would suggest): the problem "disappeared" (but I don't know for how long). Now only one of the production machines has the issue ... the one which wasn't rebooted. Strange, strange. Nevertheless, thank you for your idea ... this is exactly the way I try to approach the problem: by making theories and trying to prove or disprove them :-)
> Now I will try to further investigate along the tips from Craig and Greg.

Was there maybe a kernel update that hadn't been applied by reboot?

Andras Fabian
2010-07-13 07:53:07 UTC
Permalink
Hi Scott,

No, we didn't have a kernel update (it is still the stock Ubuntu 10.04 Server kernel ... 2.6.32.2). And in the meantime - this morning - I have discovered that the rebooted server is slowing down again! It is not at the level of the non-rebooted server (about 45 minutes for the 3 GByte file) ... it "only" needs 22 minutes, but that is already quite a bit away from the optimum of 3 minutes (or less). So, definitely, something is "deteriorating" in the system ...

And I also did some readings with iostat -xd 5 ... The target drive to which the STDOUT output is directed is below 1% utilization (mostly around 0.2-0.4%), with rare peaks around 2-3% when it does a little bit more. And this is maybe one of the more interesting observations: it seems to periodically "flush" a bit more out, just to fall asleep again (with minimal write activity). The drive from which the reads come (the one with PG's data files ... the 8-disk RAID 10) has a little bit more activity (utilization 6-8%), but that data is also in concurrent use by some apps reading from the DB (just normal traffic on the DB).

Andras Fabian

Craig Ringer
2010-07-13 09:05:02 UTC
Permalink
On 13/07/2010 3:53 PM, Andras Fabian wrote:
> Hi Scott,
>
> No, we didn't have a kernel update (it is still the stock Ubuntu 10.04 Server kernel ... 2.6.32.2). And in the meantime - this morning - I have discovered that the rebooted server is slowing down again! It is not at the level of the non-rebooted server (about 45 minutes for the 3 GByte file) ... it "only" needs 22 minutes, but that is already quite a bit away from the optimum of 3 minutes (or less). So, definitely, something is "deteriorating" in the system ...
>
> And I also did some readings with iostat -xd 5 ... The target drive to which the STDOUT output is directed is below 1% utilization (mostly around 0.2-0.4%), with rare peaks around 2-3% when it does a little bit more. And this is maybe one of the more interesting observations: it seems to periodically "flush" a bit more out, just to fall asleep again (with minimal write activity). The drive from which the reads come (the one with PG's data files ... the 8-disk RAID 10) has a little bit more activity (utilization 6-8%), but that data is also in concurrent use by some apps reading from the DB (just normal traffic on the DB).

I don't think it'll be particularly helpful in this case, but you never
know, so: another information-collecting tool is "blktrace". This can
let you observe in detail exactly what's being done on a given block
device. It's helpful when you have weird I/O patterns and can't figure
out why, as it'll often reveal some process continually poking away at
some file it doesn't need to, thrashing away on a disk-backed mmap()ed
tempfile, or the like.

--
Craig Ringer

Craig Ringer
2010-07-13 03:03:23 UTC
Permalink
On 12/07/10 21:03, Andras Fabian wrote:
> This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to anyway. After getting PG running, I also set up the same test scenario as on our problematic servers and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.
> Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of ps (first with the '-fax' parameter, then '-axl'). Now I have found something interesting:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
>
> Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my first reads on Google show (and although it points to IO, there is seemingly only very little IO, and IO wait is minimal too). I have also done "-axl" with ps, which brings up the following line for our process:
> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 1 5551 2819 4201 20 0 5941068 201192 conges Ds ? 2:05 postgres: postgres musicload_cache [local] COPY


Your wchan column isn't wide enough to show the full function name, but
I'd say it's related to some form of throttling or congestion control.

Get a wider view of that column to find out what the full function name
is. Grepping the kernel source for it can then tell you a lot about
where in the kernel it is and what might be going on.

Try:

ps ax -O wchan:40

to get a decently wide view of that col.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Andras Fabian
2010-07-13 08:05:26 UTC
Permalink
Craig, thanks for that ps tip (you think you have used ps for such a long time, and it still has some new tricks available).

And here is the more readable line:

26390 congestion_wait D ? 00:00:26 postgres: postgres musicload_cache [local] COPY

So the kernel function it is always idling in seems to be congestion_wait ... I am already looking around on Google and in the kernel source (for 2.6.32.2):
http://lxr.linux.no/#linux+v2.6.32.2/mm/backing-dev.c#L762
/**
* congestion_wait - wait for a backing_dev to become uncongested
* @sync: SYNC or ASYNC IO
* @timeout: timeout in jiffies
*
* Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
* write congestion. If no backing_devs are congested then just wait for the
* next write to be completed.
*/
So, obviously, for some reason we are waiting too much for a backing_dev ... whichever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (I have seen disk utilization on the drive to which I write below 1%).

So, the question seems to be why and where this "idling" happens.
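One way to keep an eye on this idling is a small sampling loop (a sketch: pass the COPY backend's PID as the first argument; it defaults to the current shell only so the snippet is runnable anywhere):

```shell
# Sample the scheduler state (STAT) and the full wait-channel symbol of one
# process once a second. A process that is almost always in state "D" with
# wchan=congestion_wait shows exactly the pattern discussed above.
PID=${1:-$$}
for i in 1 2 3 4 5; do
    ps -o pid=,stat=,wchan:40= -p "$PID"
    sleep 1
done
```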

Just as a test, I have tried a very simple piping example (which should go through STDOUT too ... or am I wrong?).
- "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on source drive (sdb)
- "time cat file_10GB > /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)
This takes only a bit over 2 minutes to complete ... and although one sees congestion_wait in PS from time to time, it is not dominant, and there are other EXT4- or IO-related functions shown. ... But maybe this mini-experiment is lacking some important steps that are in the path of a COPY-to-STDOUT from PostgreSQL.

Andras Fabian



Craig Ringer
2010-07-13 09:00:53 UTC
On 13/07/2010 4:05 PM, Andras Fabian wrote:
> Craig, thanks for that PS tip (you think, you have used PS for such a long time, but it still has some new tricks available).

> So, obviously, for some reason we are waiting too much for a backind_device ... which ever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (have seen disk utilization on the drive to which I write below 1%).

A quick search suggests that most calls into congestion_wait are in the
memory management subsystem, and are involved in situations where the
system is struggling for memory. However, that includes memory consumed
by I/O buffers, writeback for I/O, etc, so it'd also be consistent with
very slow I/O causing write throttling as the system tried to write
already buffered data to disk.

Most other calls are in file system drivers.

At this point I'd be taking a closer look at "vmstat 1" and "iostat 1"
output, plus "top", to see if any interesting clues about system-wide
issues turned up.

I'd also be trying to perform each step of the problem operation in
isolation as much as possible, so as to see if I could find out what
particular part was causing the slowdown. Comparing "\copy" to "COPY ...
TO STDOUT", invoking "COPY ... TO STDOUT" with a standalone backend
writing output to an on disk file and to /dev/null, etc.

> So, the question seems to be, why and where this "idling happens".

You can potentially find out more by getting a trace of the kernel
function call stack for the backend process. The kernel call stack of a
process at any given time can be obtained by reading /proc/$pid/stack .

This will tell you not only what call it's waiting in in the kernel, but
what function(s) called it, and even the code offset within each function.
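For example, a rough way to grab it (the PID 26390 is just the example from earlier in the thread; reading another process's kernel stack usually needs root):

```shell
# Dump the kernel call stack of a backend process. Find the PID with
# "ps ax | grep COPY" first; /proc/<pid>/stack is usually readable by
# root only, so fall back to a note when permission is denied.
PID=${1:-26390}
cat /proc/"$PID"/stack 2>/dev/null \
    || echo "cannot read /proc/$PID/stack (wrong PID, or not root)"
```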

> Just as a test, I have tried a very simple piping example (which should go trough STDOUT too ... or am I wrong).
> - "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on source drive (sdb)
> - "time cat file_10GB> /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)

Isn't your issue suspected to be with network transfers over unix
sockets and/or tcp/ip, rather than with pipes?

Try "socat" if you want to test unix socket performance and/or tcp/ip
socket performance. It's an amazing sysadmin/network swiss army knife.
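A rough unix-socket throughput check with socat could look like this (a sketch; the socket path is arbitrary, and the result only says whether the socket machinery itself is healthy):

```shell
# Push 256 MB of zeros through a unix-domain socket: one socat drains the
# socket into /dev/null, a second one feeds it from a pipe. On a healthy
# box this should run at hundreds of MB/s; wrap the sender in "time" to see.
SOCK=/var/tmp/throughput_test.sock
if command -v socat >/dev/null 2>&1; then
    rm -f "$SOCK"
    socat -u UNIX-LISTEN:"$SOCK" /dev/null &   # receiver: socket -> /dev/null
    sleep 1                                    # give the listener time to bind
    dd if=/dev/zero bs=1M count=256 2>/dev/null \
        | socat -u STDIN UNIX-CONNECT:"$SOCK"  # sender: pipe -> socket
    wait
    rm -f "$SOCK"
else
    echo "socat not installed"
fi
```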

--
Craig Ringer

Andras Fabian
2010-07-13 09:18:47 UTC
Hi Craig,

Yes, a first look at /proc/pid/stack shows something that smells like memory management ... at least up to the point where congestion_wait is called.
------------------------------------------------------------------
[<ffffffff8110d750>] congestion_wait+0x70/0x90
[<ffffffff81101cb7>] shrink_inactive_list+0x667/0x7e0
[<ffffffff81101ec1>] shrink_list+0x91/0xf0
[<ffffffff811020b7>] shrink_zone+0x197/0x240
[<ffffffff81102886>] __zone_reclaim+0x146/0x260
[<ffffffff81102ab7>] zone_reclaim+0x117/0x150
[<ffffffff810f8fd4>] get_page_from_freelist+0x544/0x6c0
[<ffffffff810f98c9>] __alloc_pages_nodemask+0xd9/0x180
[<ffffffff81131822>] kmalloc_large_node+0x62/0xb0
[<ffffffff81135df9>] __kmalloc_node_track_caller+0x109/0x160
[<ffffffff814665f0>] __alloc_skb+0x80/0x190
[<ffffffff81462504>] sock_alloc_send_pskb+0x1c4/0x320
[<ffffffff81462675>] sock_alloc_send_skb+0x15/0x20
[<ffffffff814f54d5>] unix_stream_sendmsg+0x275/0x3e0
[<ffffffff8145f5ab>] sock_sendmsg+0x10b/0x140
[<ffffffff8145f765>] sys_sendto+0x125/0x180
[<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
------------------------------------------------------------------

I also looked at "whats happening" with strace, where you see traffic like:
------------------------------------------------------------------
....
[ 7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0\0378241003\t974369\tt\t\\N\t10"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d925b2] sendto(8, "0\tt\tt\nd\0\0\0\0378241003\t830278\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0 8241006\t1114684\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
....
------------------------------------------------------------------

But still no definitive clue about the reasons. What is also quite interesting is that when I start my COPY-to-STDOUT experiment, it is running quite fast in the beginning. Sometimes up to 400 MBytes, sometimes up to 1.4 GBytes (I couldn't find anything this correlates with) ... and then, suddenly, it begins to stall. From there on, it only advances slowly, with all the congestion_wait going on ... Hmm, maybe it really has something to do with the state of the memory ... (this would go well with the fact that a freshly rebooted server does not have the problem in the beginning).

Ah, well, and here is what speaks against the file system / drive argument: in my initial tests, where I did a COPY-to-FILE, I never had problems (only COPY-to-STDOUT gives them).

Well, I will try to gather some more information with your other hints (quite a lot of stuff - and one learns a lot of new tricks in the process :-) ...

Andras Fabian



Craig Ringer
2010-07-13 10:17:12 UTC
On 13/07/10 17:18, Andras Fabian wrote:

> But still no definitive clue about the reasons. What is also quite interesting is, that when I start my COPY-to-STDOUT experiment, it is running quite fast in the beginning. Sometimes up to 400 Mbytes, sometimes up to 1.4 GBytes (I didn't find a real reason which I could be correlated to this) ... and then, suddenly it begins to stall. From there on, it only advances slowly with all the congestion_wait going on ... Hmm, maybe it has really something to do with the state of the memory ... (this would go well with the fact, that a freshly rebooted server is not having the problem in the beginning).

Is there any chance something is buffering the output of the COPY ... TO
STDOUT? Whether it's a pipe/socket buffer that grows to a certain limit,
then begins to be limited by the ability of the receiver to empty it, or
whether it's the receiver buffering the whole COPY in memory before
writing it out, I don't know.

Do you see anything growing in memory use on the system while the COPY
is in progress? Any of the memory use stats in "free -m" changing?
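e.g. a crude logging loop alongside the COPY (a sketch; "watch -n2 free -m" does the same interactively):

```shell
# Print a timestamped memory summary every 2 seconds; diffing successive
# samples shows whether "free" is draining into page cache or into a process.
for i in 1 2 3; do
    date
    free -m 2>/dev/null || echo "free not available"
    sleep 2
done
```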

--
Craig Ringer

Andras Fabian
2010-07-13 10:26:28 UTC
Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see in "free -m" how the "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
Now I "only" need to find out what is running out, why, and how I can prevent that. Could there be some extremely big STDOUT buffering in play?

I will need to do some more investigation. And thanks for your very good pointers which help me to narrow the focus down!

Andras Fabian

Craig Ringer
2010-07-13 10:51:18 UTC
On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well ,the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then, I could monitor via "ll -h" how the file nicely growed (obviously no congestion), and in parallel see, how "free -m" the "free" memory went down. Then, it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing went out of some memory resources.
> Now I "only" who and why is running out, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play ????

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

Andras Fabian
2010-07-13 10:57:03 UTC
OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about Unix sockets, as I am only connecting locally to the server (no real networking involved). Are there any ideas where such a Unix socket could impose such extreme buffering? And can/could I control its behavior at all? (Or would it be a thing which can only be controlled from C code ... which would fall back to PostgreSQL as the initiator.)
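As a first step, the kernel-wide socket buffer defaults can at least be inspected (a sketch; as far as I know, AF_UNIX stream sockets take their send-buffer size from net.core.wmem_default unless the program overrides it with SO_SNDBUF, but whether these limits are what actually throttles things here is exactly the open question):

```shell
# Show the system-wide default and maximum socket buffer sizes, in bytes.
for f in wmem_default wmem_max rmem_default rmem_max; do
    printf '%s = ' "$f"
    cat /proc/sys/net/core/"$f" 2>/dev/null || echo "unavailable"
done
```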

Andras Fabian

Craig Ringer
2010-07-13 12:11:26 UTC
On 13/07/10 18:57, Andras Fabian wrote:
> OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about Unix sockets, as I am only connecting locally to the server (not real networking involved). Are there any ideas, where such a Unix Socket could impose such extreme buffering ??? And can/could I control its behavior at all?? (or would it be a thing, which can only be controlled from C-Code ... which would fall back to PostgreSQL as the initiator).

Confirm your issue is actually the socket between client and server first.

Are you sure the client isn't buffering the data? Or a pager like
"less"? Or terminal scrollback?

Run "top" and sort by memory use by pressing Shift-M. Run your query.
Look at the output and see if anything grows lots.

You really need to start looking at if this is memory consumption, and
what's using it. Don't get fixated on one answer until you've excluded
other possibilities.

--
Craig Ringer

Andras Fabian
2010-07-13 12:45:55 UTC
Hi Craig,

Now I did the test with top too. Free RAM is around 900. And there don't seem to be other processes eating memory away (looked at it with top/htop). The other procs having more RAM did have it before (mostly some postgres processes), and don't grow it at an exorbitant rate. One could see with free that most of the time, when memory is eaten away, it goes directly to cache. I also observe that the lower "free" is, the slower COPY-to-STDOUT becomes ... and, linearly, the number of times one can see congestion_wait in PS grows. So, the more "free" available (even when it is not eaten away anymore ...), the more responsive COPY-to-STDOUT.

Well, I have now also rerun a few times my plain COPY-to-FILE test. That one seems to behave similarly with the RAM (when monitored via "free -m") ... but it never gets into congestion. So, maybe, just maybe ... could this be some kind of race condition? I mean, the file cache going for RAM (or the memory mapped file ???) and the SOCKET going for some memory too, and then they somehow overwhelm the memory manager (which shouldn't happen). ... Well, this might be a very wild speculation, but still.

I now also looked at the stack before the congestion_wait. There are some "interesting" calls leading to it, like "shrink_list" and "zone_reclaim" ... after the call of "get_page_from_freelist". Sounds like some heavy memory juggling going on (to get new free pages) ... whatever the reason for this is.

Andras Fabian


Andras Fabian
2010-07-13 14:16:23 UTC
I think I have found the solution. Yes, I now can get consistently high throughput with COPY-to-STDOUT, even if free -m only shows me 82 MBytes (so no, this solution is not cleaning the cache). Always around 2 3/4 minutes.

I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and of course found some differences, some new settings etc. (of course, a lot can change between 2.6.26 and 2.6.32). And there was one which stood out from the rest, because its name reminded me of some functions which I have always seen in the kernel stack while having congestion_wait:

- zone_reclaim_mode
(yes, in the kernel stack there was always also a call to "zone_reclaim").

Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1" ... What this all means is briefly described here:

http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html

Then I thought, let's give it a try. And I set it to "0" on the new server too ... and voila, COPY-to-STDOUT is running at high speed. I can even switch back and forth between 0 and 1 and watch how congestion_wait comes back or disappears.

Now, someone with deep kernel know-how could perhaps explain to me in detail what exactly is at odds here.

But for me it is now obvious that I will put "zone_reclaim_mode = 0" into my startup settings.
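For reference, applying and persisting that looks roughly like this (needs root; the key only exists on kernels built with NUMA support):

```shell
# Apply immediately (lost on reboot):
sysctl -w vm.zone_reclaim_mode=0
# ... or equivalently:
# echo 0 > /proc/sys/vm/zone_reclaim_mode

# Persist across reboots:
echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf

# Verify the current value:
cat /proc/sys/vm/zone_reclaim_mode
```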

And tomorrow I will see how my nightly backup runs with this setting.

Andras Fabian




Craig Ringer
2010-07-14 03:10:37 UTC
On 13/07/10 22:16, Andras Fabian wrote:
> I think I have found the solution. Yes, I now can get constantly high throughput with COPY-to-STDOUT, even if free -m only shows me 82 Mbytes (so no, this solution is not cleaning the cache). Always around 2 3/4 minutes.
>
> I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and of course found some differences, some new settings etc. (of course, lot of changes can happen between 2.6.26 and 2.6.32). And there was one, which stood out from the mass, because its name reminded me of some functions which I have always seen in the kernel stack while having congestion_wait:
>
> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").
>
> Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1" ... What these all means is shortly described here:
>
> http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html
>
> Then I though, lets give it a try. Nd I set it to "0" on the new server too ... and voila, it is running at high speed in COPY-to-STDOUT. I can even switch back and forth between 0 and 1 and see, how congestion_wait comes back or disappears.
>
> Now, someone with big kernel know-how could try to describe me in detail, what exactly could be at odds here.
>
> But for me it is now obvious, that I will put in my start up settings a change of "zone_reclaim_mode = 0".
>
> And tomorrow I will see how my nightly backup runs with this setting.

It sounds like it's time for a post to the Linux Kernel Mailing List,
and/or a Launchpad bug against the Ubuntu kernel.

Make sure to have your asbestos underwear on if posting to LKML ;-)

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Scott Marlowe
2010-07-14 03:32:37 UTC
On Tue, Jul 13, 2010 at 9:10 PM, Craig Ringer
<***@postnewspapers.com.au> wrote:
>> And tomorrow I will see how my nightly backup runs with this setting.
>
> It sounds like it's time for a post to the Linux Kernel Mailing List,
> and/or a Launchpad bug against the Ubuntu kernel.
>
> Make sure to have your asbestos undewear on if posting to LKML ;-)

Note that I am testing a new 48 core machine with gobs of ram under
ubuntu 10.04, so I'll do some testing to see if I get the same kind of
issue, especially once I subscribe one of these machines to the slony
farm and get some action going on it to fill its buffers.

Note that the page posted earlier that described zone_reclaim_mode
said that when the memory is mostly used for file caching it's a good
idea to set it to 0. Even though they said that they were talking
about file servers, I'd assume that applies to pgsql as well, since
it's normal to use most of the extra memory as file system cache.

Andras Fabian
2010-07-14 06:39:39 UTC
Hi guys,

I just wanted to tell you that the original problem - namely that backups (with pg_dump) were much slower on the new hardware - IS now solved. After setting "zone_reclaim_mode = 0", my backups last night were fast as expected ... for my 100 GByte DB it now took only 1 hour (instead of 6 1/2 with "zone_reclaim_mode = 1").
And again, thank you all for the good tips (especially about the tools to analyze the issue - I have learned a lot in the process) which helped me to pin the problem down!

I will consider posting it to LKML (as time permits) or to Ubuntu ...

By the way, would it maybe make sense to put this information into the PostgreSQL documentation ... as a kind of cautionary warning? It could make life easier for some admins in the future (especially if Ubuntu or other distributions ship different settings here).

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Scott Marlowe [mailto:***@gmail.com]
Gesendet: Mittwoch, 14. Juli 2010 05:33
An: Craig Ringer
Cc: Andras Fabian; pgsql-***@postgresql.org
Betreff: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

On Tue, Jul 13, 2010 at 9:10 PM, Craig Ringer
<***@postnewspapers.com.au> wrote:
>> And tomorrow I will see how my nightly backup runs with this setting.
>
> It sounds like it's time for a post to the Linux Kernel Mailing List,
> and/or a Launchpad bug against the Ubuntu kernel.
>
> Make sure to have your asbestos undewear on if posting to LKML ;-)

Note that I am testing a new 48-core machine with gobs of RAM under
Ubuntu 10.04, so I'll do some testing to see if I get the same kind of
issue, especially once I subscribe one of these machines to the Slony
farm and get some action going on it to fill its buffers.

Note that the page posted earlier that described zone_reclaim_mode
said that when the memory is mostly used for file caching it's a good
idea to set it to 0. Even though they said that they were talking
about file servers, I'd assume that applies to pgsql as well, since
it's normal to use most of the extra memory as file system cache.

Andras Fabian
2010-07-14 07:18:49 UTC
Permalink
Well, I have searched a bit more about the zone_reclaim_mode thing, and obviously I am not the first one having problems with it. It seems some logic was introduced into the kernel which decided that our machine must be a NUMA architecture (which it is not), and as a consequence switched zone_reclaim_mode to 1. But now some people have discovered that this makes no sense on some modern processors, which - although they look like NUMA to software - are not really NUMA. And thus they want to go back to setting zone_reclaim_mode = 0 by default.

Here is a lengthy discussion about the topic (so no need for me to start it again at LKML - and obviously it is not Ubuntu-specific :-)

http://lkml.org/lkml/2009/5/12/586

(and yes, our hardware has Core i7 CPUs, which seem to make many kernels think the machine is NUMA - this comes up often in the LKML thread).
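[A hedged sketch of how to check whether a given box is affected, by asking the kernel what it decided. These are the standard Linux /proc and /sys paths; they may be absent in containers or on kernels built without NUMA support:]

```shell
#!/bin/sh
# Show the current zone_reclaim_mode, if this kernel exposes it.
f=/proc/sys/vm/zone_reclaim_mode
if [ -r "$f" ]; then
    echo "zone_reclaim_mode = $(cat "$f")"
else
    echo "zone_reclaim_mode not exposed by this kernel"
fi

# Count the NUMA nodes the kernel has detected; more than one node
# means the kernel is treating the machine as NUMA.
nodes=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l)
echo "NUMA nodes detected: $nodes"
```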

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:***@postnewspapers.com.au]
Gesendet: Mittwoch, 14. Juli 2010 05:11
An: Andras Fabian
Cc: pgsql-***@postgresql.org
Betreff: Re: AW: AW: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/10 22:16, Andras Fabian wrote:
> I think I have found the solution. Yes, I can now get consistently high throughput with COPY-to-STDOUT, even if free -m only shows me 82 MBytes (so no, this solution is not the cache-cleaning one). Always around 2 3/4 minutes.
>
> I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and of course found some differences, some new settings etc. (of course, lot of changes can happen between 2.6.26 and 2.6.32). And there was one, which stood out from the mass, because its name reminded me of some functions which I have always seen in the kernel stack while having congestion_wait:
>
> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").
>
> Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1" ... What these all means is shortly described here:
>
> http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html
>
> Then I thought, let's give it a try. And I set it to "0" on the new server too ... and voila, it is running at high speed in COPY-to-STDOUT. I can even switch back and forth between 0 and 1 and watch congestion_wait come back or disappear.
>
> Now, someone with big kernel know-how could perhaps describe to me in detail what exactly is at odds here.
>
> But for me it is now obvious, that I will put in my start up settings a change of "zone_reclaim_mode = 0".
>
> And tomorrow I will see how my nightly backup runs with this setting.

It sounds like it's time for a post to the Linux Kernel Mailing List,
and/or a Launchpad bug against the Ubuntu kernel.

Make sure to have your asbestos underwear on if posting to LKML ;-)

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Greg Smith
2010-07-14 13:10:14 UTC
Permalink
Andras Fabian wrote:
> Here is a lengthy discussion about the topic (so no need for me to start it again at LKML - and obviously it is not Ubuntu specific :-)
> http://lkml.org/lkml/2009/5/12/586

Once there's some sort of resolution there, you might want to create an
Ubuntu bug report anyway and push for the fix to be backported to the
LTS version you're running. The response you get to that should be an
interesting study in how Ubuntu treats bug fixes for your company.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
***@2ndQuadrant.com www.2ndQuadrant.us


Scott Marlowe
2010-07-15 05:18:35 UTC
Permalink
On Tue, Jul 13, 2010 at 8:16 AM, Andras Fabian <***@atrada.net> wrote:
> I think I have found the solution.


> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").

Thanks so much for this. I too just got bit by the zone_reclaim_mode
slowing my file transfers to a crawl. I can't believe that option is
enabled by default. It appears to be a fabulously capable foot-gun.

Andras Fabian
2010-07-13 11:29:02 UTC
Permalink
Now I have found an unorthodox way to make a slow machine run COPY-to-STDOUT fast. I empty the server's cache memory, which makes "free" in "free -m" jump up to 14 GBytes (well, I just noticed that most of the memory on the server is in "cache" ... up to 22 GBytes). I just entered:

" sync;echo 3 > /proc/sys/vm/drop_caches"

Running the COPY-to-STDOUT test after this immediately went through in a snap (2 1/2 minutes). I also see that something related to the file is "nicely" mapped into cache memory, because as soon as I delete the file (with "rm"), that immediately frees up 3 GBytes of the cache.

This seems to prove that a memory issue is/was behind the slowdown. But still the question remains: why and how can this happen? I mean, at some point the memory manager must have made a very wrong decision, if this is the result of its "normal" work ... And how writing through the socket affects this, I don't understand (because I still see the case where a normal COPY-to-FILE didn't slow down at the same time the COPY-to-STDOUT was crawling).

So, potentially, maybe ... as a quick fix I could clean the caches in the backup script that starts each night. But is this a safe idea at all? Or could there be adverse implications (yes, of course, some queries that got their data from the file cache before would now need to repopulate it)?

Or is there a way to influence the Linux memory manager so that it behaves a bit more conservatively (or just differently, in a positive way)?
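[A defensive sketch of that quick fix, under the assumption that the nightly script runs as root just before pg_dump. The pg_dump line and its target path are hypothetical placeholders. Dropping caches loses no data - sync flushes dirty pages first - but warm caches have to be repopulated from disk afterwards:]

```shell
#!/bin/sh
# Optionally drop the page cache before the nightly dump.
drop_caches() {
    sync                                # flush dirty pages to disk first
    if [ -w /proc/sys/vm/drop_caches ]; then
        # Free pagecache, dentries, and inodes.
        echo 3 > /proc/sys/vm/drop_caches 2>/dev/null \
            || echo "drop_caches write failed" >&2
    else
        echo "not root, skipping drop_caches" >&2
    fi
    return 0
}

drop_caches
# pg_dump -Fc -Z9 -f /backup/nightly.dump mydb   # hypothetical path and DB name
```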

Andras Fabian


-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:***@postnewspapers.com.au]
Gesendet: Dienstag, 13. Juli 2010 12:51
An: Andras Fabian
Cc: pgsql-***@postgresql.org
Betreff: Re: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see with "free -m" how the "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
> Now I "only" need to find out who and why is running out, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play ????

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

Steve Clark
2010-07-13 11:36:36 UTC
Permalink
On 07/13/2010 07:29 AM, Andras Fabian wrote:
> Now I have found an unorthodox way to make a slow machine run COPY-to-STDOUT fast. I empty the server's cache memory, which makes "free" in "free -m" jump up to 14 GBytes (well, I just noticed that most of the memory on the server is in "cache" ... up to 22 GBytes). I just entered:
>
> "sync; echo 3 > /proc/sys/vm/drop_caches"
>
> Running the COPY-to-STDOUT test after this immediately went through in a snap (2 1/2 minutes). I also see that something related to the file is "nicely" mapped into cache memory, because as soon as I delete the file (with "rm"), that immediately frees up 3 GBytes of the cache.
>
> This seems to prove that a memory issue is/was behind the slowdown. But still the question remains: why and how can this happen? I mean, at some point the memory manager must have made a very wrong decision, if this is the result of its "normal" work ... And how writing through the socket affects this, I don't understand (because I still see the case where a normal COPY-to-FILE didn't slow down at the same time the COPY-to-STDOUT was crawling).
>
> So, potentially, maybe ... as a quick fix I could clean the caches in the backup script that starts each night. But is this a safe idea at all? Or could there be adverse implications (yes, of course, some queries that got their data from the file cache before would now need to repopulate it)?
>
> Or is there a way to influence the Linux memory manager so that it behaves a bit more conservatively (or just differently, in a positive way)?
>
> Andras Fabian
>
>
> -----Ursprüngliche Nachricht-----
> Von: Craig Ringer [mailto:***@postnewspapers.com.au]
> Gesendet: Dienstag, 13. Juli 2010 12:51
> An: Andras Fabian
> Cc: pgsql-***@postgresql.org
> Betreff: Re: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??
>
> On 13/07/2010 6:26 PM, Andras Fabian wrote:
>> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see with "free -m" how the "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
>> Now I "only" need to find out who and why is running out, and how I can prevent that.
>
> > Could there be some extremely big STDOUT buffering in play ????
>
> Remember, "STDOUT" is misleading. The data is sent down the network
> socket between the postgres backend and the client connected to that
> backend. There is no actual stdio involved at all.
>
> Imagine that the backend's stdout is redirected down the network socket
> to the client, so when it sends to "stdout" it's just going to the
> client. Any buffering you are interested in is in the unix or tcp/ip
> socket (depending on how you're connecting), in the client, and in the
> client's output to file/disk/whatever.
>
> --
> Craig Ringer
>
Have you posted this problem/issue to the Linux kernel mailing list? You may get some rude responses about
the posting, but someone will give you a tip about what is causing the problem.

I know there are some sysctls that affect memory management. I use the following two, based on a recommendation
of Linus, to help interactivity for a desktop user.
/sbin/sysctl vm.dirty_background_ratio=3
/sbin/sysctl vm.dirty_ratio=5
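[If those help, they can be pinned across reboots as well; a sketch of the corresponding /etc/sysctl.conf fragment (the values are Steve's, the file placement is the usual Linux convention):]

```shell
# /etc/sysctl.conf fragment -- start background writeback earlier and
# throttle writers sooner, at the cost of more frequent small writes:
vm.dirty_background_ratio = 3
vm.dirty_ratio = 5
```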




--
Stephen Clark
NetWolves
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: ***@netwolves.com
www.netwolves.com


Andras Fabian
2010-07-13 11:35:19 UTC
Permalink
I have just rechecked one of our old-generation machines, which never had/has this problem (there the backup of a 100 GB database - to a 10 GByte dump - still goes through in about 2 hours). They seem to have this high caching ratio too (one of the machines says it has 15 GByte in cache out of 16) ... but still, they manage to gracefully give back RAM to whoever needs it (i.e. the backup process with its COPY-to-STDOUT, or others). Well, the difference is of course 6 kernel versions (old machine 2.6.26 ... new machine 2.6.32) ... or who knows what kernel settings ...

Andras Fabian


-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:***@postnewspapers.com.au]
Gesendet: Dienstag, 13. Juli 2010 12:51
An: Andras Fabian
Cc: pgsql-***@postgresql.org
Betreff: Re: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see with "free -m" how the "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
> Now I "only" need to find out who and why is running out, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play ????

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

Andras Fabian
2010-07-13 12:07:21 UTC
Permalink
Hmm, I continued some testing. Now, strangely, the congestion_wait occurs even though free -m shows me about 1500 MBytes (before that I tried to "fill up" the cache by doing some plain "cat from_here > to_there" ... which pushed free down to 1500). But, interestingly, the value also doesn't get lower. So it stays around 1500 ... seemingly the same scenario as before, when it stopped around 120-130 ... So, obviously still some struggling for memory, but not as hard as when it is completely down. Now the file grows quite steadily ... and congestion_wait is not always present in PS ... (finished after 6 1/2 minutes instead of the optimum of around 2 1/2).

Something very fishy about the memory manager ....

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Andras Fabian
Gesendet: Dienstag, 13. Juli 2010 13:35
An: 'Craig Ringer'
Cc: pgsql-***@postgresql.org
Betreff: AW: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

I have just rechecked one of our old-generation machines, which never had/has this problem (there the backup of a 100 GB database - to a 10 GByte dump - still goes through in about 2 hours). They seem to have this high caching ratio too (one of the machines says it has 15 GByte in cache out of 16) ... but still, they manage to gracefully give back RAM to whoever needs it (i.e. the backup process with its COPY-to-STDOUT, or others). Well, the difference is of course 6 kernel versions (old machine 2.6.26 ... new machine 2.6.32) ... or who knows what kernel settings ...

Andras Fabian


-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:***@postnewspapers.com.au]
Gesendet: Dienstag, 13. Juli 2010 12:51
An: Andras Fabian
Cc: pgsql-***@postgresql.org
Betreff: Re: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see with "free -m" how the "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
> Now I "only" need to find out who and why is running out, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play ????

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

A***@globaldatapoint.com
2010-07-13 12:08:59 UTC
Permalink
Hi All,

We are a data-based company and are migrating from Oracle to PostgreSQL; I am doing a POC for this. We have a business requirement to send the data in XML files to our clients. The XML file size is around 700MB and growing.

I have been able to generate sample XML files and store them (still to test with full files).

However I have the following problems

1) It took 35 minutes to store 1 GB of data in the TEXT column
(create table test_text_nearly_1_GB as select repeat('1', 1024*1024*1023)::text as nearly_1_GB)

2) It took 30 minutes to write the same to the OS using the COPY command
Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'

However, when I tested writing a table of size 3GB with 150000 rows, it wrote to the OS in 20 minutes.

I am looking for a solution to
1) generate and write the 1GB file to the OS in less than 10 minutes
2) write the results of a query to a file on the OS
3) handle the case when the file size becomes > 1GB

Please help.


Regards,
Atul Goel
This e-mail is confidential and should not be used by anyone who is not the original intended recipient. Global DataPoint Limited does not accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Global DataPoint Limited. No contracts may be concluded on behalf of Global DataPoint Limited by means of e-mail communication. Global DataPoint Limited Registered in England and Wales with registered number 3739752 Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB

Craig Ringer
2010-07-13 12:26:10 UTC
Permalink
It's helpful if you can avoid replying to an existing post when starting a new
thread. Your mail client puts "In-reply-to" headers in the message, which
confuses threaded mail clients.

I've replied to this message properly in a new thread to avoid confusing
things too much.

--
Craig Ringer

A***@globaldatapoint.com
2010-07-13 12:43:02 UTC
Permalink
Sure I will take care.

Regards,
Atul Goel


-----Original Message-----
From: Craig Ringer [mailto:***@postnewspapers.com.au]
Sent: 13 July 2010 13:26
To: Atul Goel
Cc: pgsql-***@postgresql.org
Subject: Re: [GENERAL] Writing XML files to Operating System

It's helpful if you can avoid replying to an existing post when starting a new
thread. Your mail client puts "In-reply-to" headers in the message, which
confuses threaded mail clients.

I've replied to this message properly in a new thread to avoid confusing
things too much.

--
Craig Ringer

Martin Gainty
2010-07-18 14:39:31 UTC
Permalink
Good Morning



Admittedly Oracle comes prepackaged with about 100 packaged goodies ... reading and writing XML is the most useful.


I experienced similar headaches when moving an Oracle DB to MySQL and discovered my 100 or so XML output scripts
could not be ported out of Oracle ... aggh!

The only solution I can think of: you're going to have to back all the XML routines out of the Oracle DB,

then create an XSD XML Schema Document (Apache Axis contains about 12 of these you can use as your template),
then persist the individual simple and complex entities from the XSD structure to entity beans.

Ultimately you will implement your persistence with an available J2EE server such as WebLogic, Glassfish or Tomcat (with chili beans).

When the webapp has finished processing you will want to persist your entity beans to the database and output the XML results to disk with FOP ... or, in my case, write to disk and transmit to p/l.


Looks like you've just added 12 line-items to the globaldatapoint bill!


Martin Gainty
______________________________________________





> From: ***@globaldatapoint.com
> To: pgsql-***@postgresql.org
> Subject: [GENERAL] Writing XML files to Operating System
> Date: Tue, 13 Jul 2010 12:08:59 +0000
>
> Hi All,
>
> We are a data based company and are migrating from Oracle to Postgresql. For this purpose I am doing POC for the same. We have a business requirement to send the Data in XML files to our clients. The file size of XMLs is around 700MB and is growing.
>
> I have been able to generate sample XML files and store the same(Still to test for full files).
>
> However I have the following problems
>
> 1) It took 35 minutes to store 1 GB data in the TEXT column
> (create table test_text_nearly_1_GB as select repeat('1', 1024*1024*1023):: text as nearly_1_GB)
>
> 2) It took 30 Minutes to write the same to OS using COPY command
> Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'
>
> However when I tested to write a table of size 3GB with 150000 rows it wrote to OS in 20 minutes.
>
> I am looking for a solution to
> 1) generate and write the 1GB file in OS in less than 10 minutes.
> 2) write the file to OS the results of a query
> 3) Find a solution when the file size become > 1GB
>
> Please help.
>
>
> Regards,
> Atul Goel
> This e-mail is confidential and should not be used by anyone who is not the original intended recipient. Global DataPoint Limited does not accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Global DataPoint Limited. No contracts may be concluded on behalf of Global DataPoint Limited by means of e-mail communication. Global DataPoint Limited Registered in England and Wales with registered number 3739752 Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB
>
> --
> Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Andras Fabian
2010-07-13 10:20:21 UTC
Permalink
I had another observation before your last mail:

I have compared the /proc/pid/sched stats of a normal and a slow machine. And there were two counters that really stuck out:
- se.sleep_start
- se.block_start
On a normal machine, both counters remained at 0 all the time while doing COPY-to-STDOUT. On a "slow" machine, those counters - but always only one of them at a time - showed some really big numbers while the process was stuck in congestion_wait.

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:***@postnewspapers.com.au]
Gesendet: Dienstag, 13. Juli 2010 11:01
An: Andras Fabian
Cc: Tom Lane; pgsql-***@postgresql.org
Betreff: Re: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 4:05 PM, Andras Fabian wrote:
> Craig, thanks for that PS tip (you think you have used PS for such a long time, but it still has some new tricks available).

> So, obviously, for some reason we are waiting too much on some backing device ... whichever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (I have seen disk utilization on the drive to which I write below 1%).

A quick search suggests that most calls into congestion_wait are in the
memory management subsystem, and are involved in situations where the
system is struggling for memory. However, that includes memory consumed
by I/O buffers, writeback for I/O, etc, so it'd also be consistent with
very slow I/O causing write throttling as the system tried to write
already buffered data to disk.

Most other calls are in file system drivers.

At this point I'd be taking a closer look at "vmstat 1" and "iostat 1"
output, plus "top", to see if any interesting clues about system-wide
issues turned up.

I'd also be trying to perform each step of the problem operation in
isolation as much as possible, so as to see if I could find out what
particular part was causing the slowdown. Comparing "\copy" to "COPY ...
TO STDOUT", invoking "COPY ... TO STDOUT" with a standalone backend
writing output to an on disk file and to /dev/null, etc.

> So, the question seems to be why and where this "idling" happens.

You can potentially find out more by getting a trace of the kernel
function call stack for the backend process. The kernel call stack of a
process at any given time can be obtained by reading /proc/$pid/stack .

This will tell you not only what call it's waiting in in the kernel, but
what function(s) called it, and even the code offset within each function.
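As a concrete sketch of that suggestion ($PID is whatever the dumping backend's process id is, e.g. taken from pg_stat_activity; reading another process's kernel stack usually requires root):

```shell
#!/bin/sh
# Sample the kernel-side call stack of one process a few times, so the
# function the process keeps sleeping in (e.g. congestion_wait) shows up.
PID=${1:-$$}                      # default to this shell if no pid given
for i in 1 2 3; do
    cat "/proc/$PID/stack" 2>/dev/null \
        || echo "cannot read /proc/$PID/stack (need root?)"
    echo "---"
    sleep 1
done
```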

> Just as a test, I have tried a very simple piping example (which should go trough STDOUT too ... or am I wrong).
> - "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on source drive (sdb)
> - "time cat file_10GB> /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)

Isn't your issue suspected to be with network transfers over unix
sockets and/or tcp/ip, rather than with pipes?

Try "socat" if you want to test unix socket performance and/or tcp/ip
socket performance. It's an amazing sysadmin/network swiss army knife.

--
Craig Ringer

Greg Smith
2010-07-13 14:29:03 UTC
Permalink
Andras Fabian wrote:
> So the kernel function it is always idling on seems to be congestion_wait ...

Ugh, not that thing again. See
http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
code has cost me weeks worth of "why isn't the kernel writing things the
way I asked it?" trouble in the past. I know the kernel developers have
been fiddling with pdflush again recently, they might have introduced
yet another bug into how it handles heavy write volume. You can reduce
dirty_ratio and dirty_background_ratio to try and improve things, but
the congestion code will thwart any attempt to make them really low.

You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
that lines up with the slow periods; example of what bad output looks
like at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
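A minimal way to do that monitoring (sampling just a few times here; in practice one would loop for the duration of the dump and correlate the numbers with the stalls):

```shell
#!/bin/sh
# Print the kernel's current dirty and writeback totals from
# /proc/meminfo, sampled once per second.
for i in 1 2 3; do
    awk '/^(Dirty|Writeback):/ { printf "%s %s %s  ", $1, $2, $3 }
         END { print "" }' /proc/meminfo
    sleep 1
done
```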

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
***@2ndQuadrant.com www.2ndQuadrant.us


Andras Fabian
2010-07-13 14:35:16 UTC
Permalink
Hi Greg,

hmmm, that's true. Those settings, for example, were much higher too (on the Ubuntu server) than on our old machine.
The new machine has:
- dirty_ratio = 20 (old has 10)
- dirty_background_ratio = 10 (old has 5)

But obviously setting vm.zone_reclaim_mode=0 "fixes" the problem too (it was "1" on the new machine and "0" on the old). See my latest post to Craig.

I hope using vm.zone_reclaim_mode=0 doesn't have other dire consequences :-)

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Greg Smith [mailto:***@2ndquadrant.com]
Gesendet: Dienstag, 13. Juli 2010 16:29
An: Andras Fabian
Cc: Craig Ringer; Tom Lane; pgsql-***@postgresql.org
Betreff: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

Andras Fabian wrote:
> So the kernel function it is always idling on seems to be congestion_wait ...

Ugh, not that thing again. See
http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
code has cost me weeks worth of "why isn't the kernel writing things the
way I asked it?" trouble in the past. I know the kernel developers have
been fiddling with pdflush again recently, they might have introduced
yet another bug into how it handles heavy write volume. You can reduce
dirty_ratio and dirty_background_ratio to try and improve things, but
the congestion code will thwart any attempt to make them really low.

You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
that lines up with the slow periods; example of what bad output looks
like at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
***@2ndQuadrant.com www.2ndQuadrant.us


Steve Clark
2010-07-13 14:40:05 UTC
On 07/13/2010 10:35 AM, Andras Fabian wrote:
> Hi Greg,
>
> hmmm, thats true. Thos settings for example were much higher too (on the Ubuntu server), than on our old machine.
> New machine has:
> - dirty_ratio = 20 (old has 10)
> - dirty_background_ratio = 10 (old has 5)
>
> But obviously setting vm.zone_reclaim_mode=0 "fixes" the problem to (which was "1" on new machine and "0" on old). See my latest post to Craig.
>
> I hope using vm.zone_reclaim_mode=0 doesn't have other dire consequences :-)
It looks to me that vm.zone_reclaim_mode value is related to NUMA machines that
have "local" memory per node and shouldn't be used at all in your environment.

>
> Andras Fabian
>
> -----Ursprüngliche Nachricht-----
> Von: Greg Smith [mailto:***@2ndquadrant.com]
> Gesendet: Dienstag, 13. Juli 2010 16:29
> An: Andras Fabian
> Cc: Craig Ringer; Tom Lane; pgsql-***@postgresql.org
> Betreff: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??
>
> Andras Fabian wrote:
>> So the kernel function it is always idling on seems to be congestion_wait ...
>
> Ugh, not that thing again. See
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
> code has cost me weeks worth of "why isn't the kernel writing things the
> way I asked it?" trouble in the past. I know the kernel developers have
> been fiddling with pdflush again recently, they might have introduced
> yet another bug into how it handles heavy write volume. You can reduce
> dirty_ratio and dirty_background_ratio to try and improve things, but
> the congestion code will thwart any attempt to make them really low.
>
> You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
> that lines up with the slow periods; example of what bad output looks
> like at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
>


--
Stephen Clark
NetWolves
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: ***@netwolves.com
www.netwolves.com


Craig Ringer
2010-07-13 12:52:37 UTC
[ Er, oops. I mucked up replying on a new thread - ugh! Sorry, should've
left well alone. ]

Atul Goel wrote:

> > We are a data based company and are migrating from Oracle to
> > Postgresql. For this purpose I am doing POC for the same. We have a
> > business requirement to send the Data in XML files to our clients.
> > The file size of XMLs is around 700MB and is growing.
> >
> > I have been able to generate sample XML files and store the
> > same(Still to test for full files).
> >
> > However I have the following problems
> >
> > 1) It took 35 minutes to store 1 GB data in the TEXT column
> > (create table test_text_nearly_1_GB as select repeat('1',
> > 1024*1024*1023):: text as nearly_1_GB)

That's a fairly inefficient way to generate the data to insert. You'll
probably find that much of the time was spent in repeat() not anywhere
interesting to your query.

I'd be more interested in how long COPY ... FROM took, personally.

> > 2) It took 30 Minutes to write the same to OS using COPY command
> > Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'

That's a write rate of 34MB/min, or half a meg a second. Not pretty.
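The arithmetic behind that figure, spelled out (assuming roughly 1 GB written in 30 minutes):

```shell
# 1 GB in 30 minutes:
echo "$((1024 / 30)) MB/min"                      # integer megabytes per minute
awk 'BEGIN { printf "%.2f MB/s\n", 1024/30/60 }'  # same rate per second
```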

Where's the load during the COPY? Mostly CPU? Or mostly disk I/O?

Are you writing the output to the same disk the database is on? (Not
that it should make this much difference).

> > 3) Find a solution when the file size become > 1GB
That's going to be interesting.

Do you really need to store whole XML documents this size in the
database, rather than broken up into structures that can be worked with
usefully in the database? If so, PostgreSQL might not be your best choice.

--
Craig Ringer

A***@globaldatapoint.com
2010-07-13 12:54:54 UTC
> Atul Goel wrote:
>
> > > We are a data based company and are migrating from Oracle to
> > > Postgresql. For this purpose I am doing POC for the same. We have a
> > > business requirement to send the Data in XML files to our clients.
> > > The file size of XMLs is around 700MB and is growing.
> > >
> > > I have been able to generate sample XML files and store the
> > > same (still to test for full files).
> > >
> > > However I have the following problems
> > >
> > > 1) It took 35 minutes to store 1 GB data in the TEXT column (create
> > > table test_text_nearly_1_GB as select repeat('1',
> > > 1024*1024*1023):: text as nearly_1_GB)
>
> That's a fairly inefficient way to generate the data to insert. You'll probably find that much of the time was spent in repeat(), not anywhere interesting to your query.
>
> I'd be more interested in how long COPY ... FROM took, personally.
>
> > > 2) It took 30 Minutes to write the same to OS using COPY command
> > > Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'
>
> That's a write rate of 34MB/min, or half a meg a second. Not pretty.
>
> Where's the load during the COPY? Mostly CPU? Or mostly disk I/O?
>
> Are you writing the output to the same disk the database is on? (Not that it should make this much difference).
>
> > > 3) Find a solution when the file size become > 1GB
>
> That's going to be interesting.
>
> Do you really need to store whole XML documents this size in the database, rather than broken up into structures that can be worked with usefully in the database? If so, PostgreSQL might not be your best choice.
>
> --
> Craig Ringer

Hi All,

I am posting this request again with the correct subject.

We are a data based company and are migrating from Oracle to Postgresql. For this purpose I am doing a POC. We have a business requirement to send the data in XML files to our clients. The file size of the XMLs is around 700MB and is growing.

I have been able to generate sample XML files and store the same (still to test for full files).

However, I have the following problems:

1) It took 35 minutes to store 1 GB of data in the TEXT column (create table test_text_nearly_1_GB as select repeat('1', 1024*1024*1023)::text as nearly_1_GB)

2) It took 30 minutes to write the same to the OS using the COPY command: Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'

However, when I tested writing a table of size 3GB with 150000 rows, it wrote to the OS in 20 minutes.

I am looking for a solution to:

1) generate and write the 1GB file to the OS in less than 10 minutes.

2) write the results of a query to a file on the OS.

3) handle the case when the file size becomes > 1GB.

Please help.
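One hedged sketch for the >1GB case: store the XML as ordered row-sized chunks and split the COPY stream client-side, so no single value or output file has to cross the 1 GB limit. The table and column names below (xml_chunks, chunk, seq) are illustrative assumptions, not from this thread:

```shell
# Real usage would stream the chunks out of PostgreSQL and split them:
#   psql -d mydb -Atc "COPY (SELECT chunk FROM xml_chunks ORDER BY seq) TO STDOUT" \
#     | split -b 900m - export_part_
#
# The same pipeline shape, demonstrated with synthetic data in place of psql
# (5000 lines of 12 bytes = 60000 bytes, split into 20000-byte pieces):
printf 'row-of-data\n%.0s' $(seq 1 5000) | split -b 20000 - part_
ls part_*
```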


Atul Goel
SENIOR DEVELOPER

Global DataPoint
Middlesex House, 34-42 Cleveland Street
London W1T 4LB, UK
T: +44 (0)20 7079 4827
M: +44 (0)7846765098
www.globaldatapoint.com<http://www.globaldatapoint.com/>

This e-mail is confidential and should not be used by anyone who is not the original intended recipient. Global DataPoint Limited does not accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Global DataPoint Limited. No contracts may be concluded on behalf of Global DataPoint Limited by means of e-mail communication. Global DataPoint Limited Registered in England and Wales with registered number 3739752 Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB