m***@kset.org
2014-10-18 08:46:10 UTC
Hi,
Yesterday I was trying to calculate by hand the row number estimates using
the examples from
http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html
It is usually correct, except when you try to get an estimate near the
first (or last) histogram bound. Let me demonstrate with an example:
create table a(
id serial not null
);
insert into a(id) (select i from generate_series(1,1000000) i);
analyse a;
select histogram_bounds
from pg_stats
where tablename = 'a';
-- "{38,9399,19852,...}"
explain analyse
select * from a where id <= 2000;
-- Expected 2095.929, got 2096. This is OK.
explain analyse
select * from a where id <= 50;
-- Expected 12.819, got 100.
explain analyse
select * from a where id <= 10;
-- Expected -29.911, or better 1, got 100. The estimate should not be < 0,
so I was expecting at least 1.
It seems that when the selectivity calculated using the histogram is lower
than 0.0001 or higher than 0.9999 it is capped to this values. Looking at
the code in selfuncs.c it seems that is doing just that.
The thing that intrigued me was what happened when I added an index
(primary key) to the column.
alter table a
add primary key (id);
explain analyse
select * from a where id <= 2000;
-- expected 2096, got 2127
explain analyse
select * from a where id <= 50;
-- expected 100, got 52
explain analyse
select * from a where id <= 10;
-- expected 100, got 10
As no statistics are collected for the index (as stated in the
pg_statistic documentation "No entry is made for an ordinary non-expression
index column, however, since it would be redundant with the entry for the
underlying table column.") it is mystery to me how these estimates are
calculated. I tried to look at the code (my guess was that I could find it
in btcostestimate) but I wasn't able to figure it out.
Can somebody explain how are the estimates calculated when there is an
index involved?
Regards,
Mladen Marinović
Yesterday I was trying to calculate by hand the row number estimates using
the examples from
http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html
It is usually correct, except when you try to get an estimate near the
first (or last) histogram bound. Let me demonstrate with an example:
create table a(
id serial not null
);
insert into a(id) (select i from generate_series(1,1000000) i);
analyse a;
select histogram_bounds
from pg_stats
where tablename = 'a';
-- "{38,9399,19852,...}"
explain analyse
select * from a where id <= 2000;
-- Expected 2095.929, got 2096. This is OK.
explain analyse
select * from a where id <= 50;
-- Expected 12.819, got 100.
explain analyse
select * from a where id <= 10;
-- Expected -29.911, or better 1, got 100. The estimate should not be < 0,
so I was expecting at least 1.
It seems that when the selectivity calculated using the histogram is lower
than 0.0001 or higher than 0.9999 it is capped to this values. Looking at
the code in selfuncs.c it seems that is doing just that.
The thing that intrigued me was what happened when I added an index
(primary key) to the column.
alter table a
add primary key (id);
explain analyse
select * from a where id <= 2000;
-- expected 2096, got 2127
explain analyse
select * from a where id <= 50;
-- expected 100, got 52
explain analyse
select * from a where id <= 10;
-- expected 100, got 10
As no statistics are collected for the index (as stated in the
pg_statistic documentation "No entry is made for an ordinary non-expression
index column, however, since it would be redundant with the entry for the
underlying table column.") it is mystery to me how these estimates are
calculated. I tried to look at the code (my guess was that I could find it
in btcostestimate) but I wasn't able to figure it out.
Can somebody explain how are the estimates calculated when there is an
index involved?
Regards,
Mladen Marinović
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general