Discussion:
Crash in 9.4 Beta when partially collapsing left outer joins
l***@benjamindsmith.com
2014-09-08 21:36:01 UTC
Permalink
I think this is the first time I've ever reported a PG crash, which is notable
since I've been using PG for over 10 years. ;)

Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible
crash when running a query with a left outer join, partially collapsed.

TRAP: FailedAssertion("!(!restriction_is_or_clause((RestrictInfo *) orarg))",
File: "indxpath.c", Line: 1213)
< 2014-09-08 14:21:33.179 PDT >LOG: server process (PID 19957) was terminated
by signal 6: Aborted
< 2014-09-08 14:21:33.179 PDT >DETAIL: Failed process was running: SELECT
students.id
FROM students
LEFT OUTER JOIN enrollments ON
(
enrollments.students_id = students.id
)
WHERE
(
students.id = 5008
OR
(
(
students.birthcity = 'Chico'
OR students.birthcity IS NULL
)
AND enrollments.start < 20141219
)
);
< 2014-09-08 14:21:33.179 PDT >LOG: terminating any other active server
processes
< 2014-09-08 14:21:33.179 PDT >WARNING: terminating connection because of
crash of another server process
< 2014-09-08 14:21:33.179 PDT >DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.


Here's the smallest query I could find that it would crash on. Run on a blank
database, the following will reliably crash postgres:

CREATE TABLE students
(id SERIAL PRIMARY KEY,
birthcity VARCHAR DEFAULT NULL);

CREATE TABLE enrollments
(students_id INTEGER NOT NULL REFERENCES students(id),
start INTEGER);

SELECT
students.id
FROM students
LEFT OUTER JOIN enrollments ON
(
enrollments.students_id = students.id
)
WHERE
(
students.id = 5008
OR
(
(
students.birthcity = 'Chico'
OR students.birthcity IS NULL
)
AND enrollments.start < 20141219
)
);

-----------------------
Other environment stuff:

[***@db1 pgsql]# rpm -qa | grep postg
postgresql94-libs-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-server-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-devel-9.4beta2-1PGDG.rhel6.x86_64
postgresql92-libs-9.2.9-1PGDG.rhel6.x86_64
postgresql94-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-contrib-9.4beta2-1PGDG.rhel6.x86_64

[***@db1 pgsql]# uname -a
Linux db1.schoolpathways.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31
17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


What other information should I provide? We have the machine available if
necessary.

Benjamin Smith
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tom Lane
2014-09-09 03:58:08 UTC
Permalink
Post by l***@benjamindsmith.com
Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible
crash when running a query with a left outer join, partially collapsed.
The test case crashes as described for me. Will take a look tomorrow.
Thanks for the report!

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Michael Paquier
2014-09-09 05:43:50 UTC
Permalink
Post by l***@benjamindsmith.com
What other information should I provide? We have the machine available if
necessary.
This can be reproduced without especially LEFT OUTER JOIN, and system
crashes as long as index path is taken in planner, and that WHERE
clause uses a given combination of OR and AND like the one in the
query given. Here is a more simple example:
create table aa (a int);
create index aai on aa(a);
select a1.a from aa a1, aa a2 where a1.a = 0 or (a1.a = 0 or a1.a = 1)
and a2.a = 0;
Some bisecting is showing as well that the commit at the origin of the
regression is f343a88.
Regards,
--
Michael
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Michael Paquier
2014-09-09 07:50:40 UTC
Permalink
On Tue, Sep 9, 2014 at 2:43 PM, Michael Paquier
Post by Michael Paquier
Some bisecting is showing as well that the commit at the origin of the
regression is f343a88.
The failure is caused by an assertion not happy since this commit:
frame #4: 0x0000000101d20670
postgres`generate_bitmap_or_paths(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58, clauses=0x00007fd61d010200,
other_clauses=0x0000000000000000) + 480 at indxpath.c:1213
frame #5: 0x0000000101d1fc37
postgres`create_index_paths(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58) + 1255 at indxpath.c:314
frame #6: 0x0000000101d1146b
postgres`set_plain_rel_pathlist(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58, rte=0x00007fd61c033c88) + 75 at allpaths.c:397

While reading the code of this commit, I noticed that
extract_or_clause has added some logic for nested OR clauses: it
extracts their content and adds them directly to the list of
subclauses that are then used by generate_bitmap_or_paths, triggering
the assertion failure reported by the trace above.
The logic for nested OR is correct by reading it, hence why not simply
removing the assertion failing? The attached patch 1 does so.

Another approach would consist in removing the nested OR part and keep
the old assertion logic, like in the patch 2 attached, but this seems
like a no-go as f343a88 has actually improved nested OR tracking.
Thoughts?
Note: I added as well a regression tests in patch 1 as this is IMO the
correct approach, if that's considered as correct of course :)
--
Michael
Tom Lane
2014-09-09 13:25:07 UTC
Permalink
Post by Michael Paquier
The logic for nested OR is correct by reading it, hence why not simply
removing the assertion failing? The attached patch 1 does so.
The reason for the assert is that there should never be an OR directly
underneath an OR in the planner after eval_const_expressions has flattened
such cases. Evidently commit f343a88 failed to preserve AND/OR flatness
in some cases :-(. That code should be taught to do so, rather than
lobotomizing this assertion. Lack of flatness causes optimization
inefficiencies, which is why we don't want to just allow it.

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Michael Paquier
2014-09-10 00:19:56 UTC
Permalink
Post by Tom Lane
Post by Michael Paquier
The logic for nested OR is correct by reading it, hence why not simply
removing the assertion failing? The attached patch 1 does so.
The reason for the assert is that there should never be an OR directly
underneath an OR in the planner after eval_const_expressions has flattened
such cases. Evidently commit f343a88 failed to preserve AND/OR flatness
in some cases :-(. That code should be taught to do so, rather than
lobotomizing this assertion. Lack of flatness causes optimization
inefficiencies, which is why we don't want to just allow it.
Ah, OK, I just saw your commit. so the trick is to add the arguments
of subclause in case of an OR clause found to have a correct
flattening here... Thanks!
--
Michael
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tom Lane
2014-09-10 03:07:08 UTC
Permalink
Post by Michael Paquier
Post by Tom Lane
The reason for the assert is that there should never be an OR directly
underneath an OR in the planner after eval_const_expressions has flattened
such cases. Evidently commit f343a88 failed to preserve AND/OR flatness
in some cases :-(. That code should be taught to do so, rather than
lobotomizing this assertion. Lack of flatness causes optimization
inefficiencies, which is why we don't want to just allow it.
Ah, OK, I just saw your commit. so the trick is to add the arguments
of subclause in case of an OR clause found to have a correct
flattening here... Thanks!
Right. If you look again at that code in orclauses.c, you'll notice that
it is itself assuming AND/OR flatness in its input. We could discard
that assumption, but it would just mean moving complexity from the places
that currently have to preserve flatness to other places. For instance,
right now we suppose that all "top level" WHERE clauses are in the
top-level AND list ... if we had to check for sub-AND clauses and recurse
into those, it would make life complicated in numerous places.

I do wonder, having seen this bug, if there's someplace we could add
assertions to check for AND/OR flatness that'd be more certainly hit
by a violation.

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Loading...