Discussion:
free RAM not being used for page cache
Kevin Goess
2014-07-30 17:51:14 UTC
A couple months ago we upgraded the RAM on our database servers from 48GB
to 64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.

http://s76.photobucket.com/user/kgoesspb/media/db1-mem-historical.png.html

Is that expected? Is there a setting we need to tune for that? We have
400GB of databases on this box, so I know it's not all fitting in that
49.89GB.
Merlin Moncure
2014-07-30 18:49:32 UTC
A couple months ago we upgraded the RAM on our database servers from 48GB to
64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.
http://s76.photobucket.com/user/kgoesspb/media/db1-mem-historical.png.html
Is that expected? Is there a setting we need to tune for that? We have
400GB of databases on this box, so I know it's not all fitting in that
49.89GB.
could be a numa issue. Take a look at:
http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
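A quick check, for what it's worth; 0 means zone reclaim is already off.
The last line is only a sketch of the interleaving workaround, with a
placeholder data directory:

$ cat /proc/sys/vm/zone_reclaim_mode
$ sysctl -w vm.zone_reclaim_mode=0    # as root; turns zone reclaim off
$ numactl --interleave=all pg_ctl -D /your/data/dir start    # placeholder path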

merlin
Kevin Goess
2014-07-30 18:57:43 UTC
Good suggestion, but nope, that ain't it:

$ cat /proc/sys/vm/zone_reclaim_mode
0
Post by Merlin Moncure
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from 48GB to
64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.
http://s76.photobucket.com/user/kgoesspb/media/db1-mem-historical.png.html
Is that expected? Is there a setting we need to tune for that? We have
400GB of databases on this box, so I know it's not all fitting in that
49.89GB.
could be a numa issue. Take a look at:
http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
merlin
--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
***@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing
Scott Marlowe
2014-07-30 19:07:12 UTC
Post by Kevin Goess
Post by Merlin Moncure
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from 48GB to
64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.
http://s76.photobucket.com/user/kgoesspb/media/db1-mem-historical.png.html
Is that expected? Is there a setting we need to tune for that? We have
400GB of databases on this box, so I know it's not all fitting in that
49.89GB.
could be a numa issue. Take a look at:
http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
merlin
Good suggestion, but nope, that ain't it:
$ cat /proc/sys/vm/zone_reclaim_mode
0
Could it just be that your dataset isn't any bigger than what's being used?
--
To understand recursion, one must first understand recursion.
Kevin Grittner
2014-07-30 19:05:18 UTC
Post by Merlin Moncure
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from 48GB to
64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.
could be a numa issue.
I was thinking the same thing.

The other thought was that it could be a usage pattern and/or
monitoring issue.  When there are transient requests for large
amounts of memory, the OS will discard page cache to satisfy them
(e.g., work_mem or maintenance_work_mem allocations).  If the *active*
portion of the database is not as big as RAM, it might not refill
right away.  This could be compounded on your monitoring graphs if
they summarize by taking the *average* RAM usage for an interval
rather than the *maximum* usage for that interval.  Intermittent
spikes in usage could make it look like the RAM is unused if you
are averaging; personally, I would prefer to use maximum for a
metric like this.  Many monitoring systems allow you to choose.
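For instance, the settings behind those transient allocations are easy
to inspect (illustrative invocation; point psql at your own database):

$ psql -c "SHOW work_mem"
$ psql -c "SHOW maintenance_work_mem"
$ psql -c "SHOW max_connections"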

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Scott Marlowe
2014-07-30 19:21:28 UTC
Post by Kevin Grittner
Post by Merlin Moncure
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from 48GB to
64GB. Immediately afterwards the new RAM was being used for page cache,
which is what we want, but that seems to have dropped off over time, and
there's currently actually like 12GB of totally unused RAM.
could be a numa issue.
I was thinking the same thing.
The other thought was that it could be a usage pattern and/or
monitoring issue. When there are transient requests for large
amounts of memory, it will discard cache to satisfy those (e.g.,
work_mem or maintenance_work_mem allocations). If the *active*
portion of the database is not as big as RAM, it might not refill
right away. This could be compounded on your monitoring graphs if
they summarize by taking the *average* RAM usage for an interval
rather than the *maximum* usage for that interval. Intermittent
spikes in usage could make it look like the RAM is unused if you
are averaging; personally, I would prefer to use maximum for a
metric like this. Many monitoring systems allow you to choose.
In fact, looking at the png he attached, I'd bet they cranked up
work_mem and / or connections sometime around the end of January and
that's what we're seeing here. More memory used for sorts etc, less
left for caching.
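Rough arithmetic with purely illustrative numbers: at work_mem = 64MB,
200 concurrent sorts would pin 200 * 64MB, about 12.5GB, which is
roughly the size of the gap in that graph.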
--
To understand recursion, one must first understand recursion.
Shaun Thomas
2014-08-05 15:27:17 UTC
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from
48GB to 64GB. Immediately afterwards the new RAM was being used for
page cache, which is what we want, but that seems to have dropped off
over time, and there's currently actually like 12GB of totally unused RAM.
What version of the Linux kernel are you using? We had exactly this
problem when we were on 3.2. We've since moved to 3.8 and that solved
this issue, along with a few others.

If you're having the same problem, this is not a NUMA issue or in any
way related to zone_reclaim_mode. The memory page aging algorithm in
pre-3.7 kernels is simply broken, judging by the traffic on the Linux
Kernel Mailing List (LKML).

I hate to keep beating this drum, but anyone using 3.2 (the default for a
few Linux distributions) needs to stop using it; it's hideously broken.
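Checking which kernel you're running is a one-liner:

$ uname -r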
--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
***@optionshouse.com

Kevin Goess
2014-09-04 00:17:36 UTC
Post by Shaun Thomas
Post by Kevin Goess
A couple months ago we upgraded the RAM on our database servers from
48GB to 64GB. Immediately afterwards the new RAM was being used for
page cache, which is what we want, but that seems to have dropped off
over time, and there's currently actually like 12GB of totally unused RAM.
What version of the Linux kernel are you using? We had exactly this
problem when we were on 3.2. We've since moved to 3.8 and that solved this
issue, along with a few others.
Debian squeeze, still on 2.6.32.
Post by Shaun Thomas
If you're having the same problem, this is not a NUMA issue or in any way
related to zone_reclaim_mode. The memory page aging algorithm in pre 3.7 is
simply broken, judging by the traffic on the Linux Kernel Mailing List
(LKML).
Darn, really? I just learned about the "mysql swap insanity" problem and
noticed that all the free memory is concentrated on one of the two nodes.

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6
node 0 size: 32768 MB
node 0 free: 9105 MB
node 1 cpus: 1 3 5 7
node 1 size: 32755 MB
node 1 free: 259 MB

$ free
             total       used       free     shared    buffers     cached
Mem:      66099280   56565804    9533476          0      11548   51788624

I haven't been able to get any traction on what that means yet though.
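In case it helps anyone dig into this, I believe the per-node breakdown
is also visible under /sys, and numastat summarizes per-node allocation
stats (path and tool per the numactl docs; not verified on this exact
kernel):

$ cat /sys/devices/system/node/node0/meminfo
$ numastat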
--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
***@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing
Shaun Thomas
2014-09-04 14:44:48 UTC
Post by Kevin Goess
Debian squeeze, still on 2.6.32.
Interesting. Unfortunately that kernel suffers from the newer task
scheduler they added to 3.2, and I doubt many of the fixes have been
back-ported. I don't know if that affects the memory handling, but it might.
Post by Kevin Goess
Darn, really? I just learned about the "mysql swap insanity" problem and
noticed that all the free memory is concentrated on one of the two nodes.
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6
node 0 size: 32768 MB
node 0 free: 9105 MB
node 1 cpus: 1 3 5 7
node 1 size: 32755 MB
node 1 free: 259 MB
And that's the kind of behavior we were seeing until we upgraded to 3.8.
An 8GB gap between your nodes is definitely bad, but it's not the same
thing they described in the MySQL swap insanity posts. MySQL has a much
bigger internal cache than we do, so it expects a good proportion of system
memory. It's not uncommon for dedicated MySQL systems to have more than
75% of system memory dedicated to database use. Without NUMA
interleaving, that's a recipe for a broken system.
Post by Kevin Goess
$ free
             total       used       free     shared    buffers     cached
Mem:      66099280   56565804    9533476          0      11548   51788624
And again, this is what we started seeing with 3.2 when we upgraded
initially. Unfortunately it looks like at least one of the bad memory
aging patches got backported to the kernel you're using. If everything
were working properly, that excess 9GB would be in your cache.

Check /proc/meminfo for a better breakdown of how the memory is being
used. This should work:

grep -A1 Active /proc/meminfo

I suspect your inactive file cache is larger than the active set,
suggesting an overly aggressive memory manager.
--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
***@optionshouse.com

Kevin Goess
2014-09-04 18:08:28 UTC
This is a super-interesting topic, thanks for all the info.
Post by Shaun Thomas
Check /proc/meminfo for a better breakdown of how the memory is being
used. This should work:
grep -A1 Active /proc/meminfo
I suspect your inactive file cache is larger than the active set,
suggesting an overly aggressive memory manager.
$ grep -A1 Active /proc/meminfo
Active:          34393512 kB
Inactive:        20765832 kB
Active(anon):    13761028 kB
Inactive(anon):    890688 kB
Active(file):    20632484 kB
Inactive(file):  19875144 kB

The inactive set isn't larger than the active set; they're about even. But
I'm still reading that as the memory manager being aggressive in marking
pages as inactive. Is that what it says to you too?
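A simple way to watch this over time (not that it explains the why) is
to poll vmstat and keep an eye on the cache column:

$ vmstat 5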

Interestingly, I just looked at the memory graph for our standby backup
database. It *normally* uses all the available RAM as page cache, which is
what I'd expect to see, but when it was the active database for a time in
April and May, the page cache size was reduced by about the same margin. So
it's the act of running an active postgres instance that causes the
phenomenon.

[image: memory graph for the standby database; link lost in the archive]
--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
***@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing