Why can't I select un-grouped columns when grouping by a (non-primary) unique key?

Discussion:

Daniel Lenski

2014-09-24 16:04:21 UTC

If I include the primary key of a table in my GROUP BY clause, PG 9.3
allows me to refer to other columns of that table without explicit GROUP BY:

CREATE TABLE A (id SERIAL PRIMARY KEY, name TEXT UNIQUE NOT NULL,
document JSON);

-- this works fine
SELECT A.document
FROM A
GROUP BY A.primary_key

Why doesn't the same thing work with a non-NULL unique constraint?

-- ERROR: column "A.document" must appear in the GROUP BY clause or be
used in an aggregate function
SELECT A.document
FROM A
GROUP BY A.name

I got thinking about this distinction because I wrote some very ugly SQL in
a few cases, to get around the lack of JSON comparison operators in PG 9.3,
before I discovered that it would work if I used the PRIMARY KEY instead:

-- this works but it's ugly
SELECT A.document::text::json
FROM table
GROUP BY A.non_null_unique_key, A.document::text

The manual refers to this situation (
http://www.postgresql.org/docs/9.3/static/sql-select.html#SQL-GROUPBY), but
I don't understand whether there's a specific reason to distinguish primary
keys from non-NULL unique constraints.

Thanks,
Dan Lenski

Alberto Cabello Sánchez

2014-09-24 17:37:10 UTC

Permalink

On Wed, 24 Sep 2014 09:04:21 -0700

Post by Daniel Lenski
If I include the primary key of a table in my GROUP BY clause, PG 9.3
Why doesn't the same thing work with a non-NULL unique constraint?

At first sight, primary key means no grouping at all, as there are no
duplicated A.primary_key values:

SELECT A.document
FROM A
GROUP BY A.primary_key

is the same as

SELECT A.document
FROM A

--
Alberto Cabello Sánchez
<***@unex.es>
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Geoff Montee

2014-09-24 17:46:48 UTC

Permalink

Post by Alberto Cabello SÃ¡nchez
On Wed, 24 Sep 2014 09:04:21 -0700

Post by Daniel Lenski
If I include the primary key of a table in my GROUP BY clause, PG 9.3
Why doesn't the same thing work with a non-NULL unique constraint?

At first sight, primary key means no grouping at all, as there are no
SELECT A.document
FROM A
GROUP BY A.primary_key
is the same as
SELECT A.document
FROM A

I believe this blog post contains better examples of the feature he's
referring to:

http://www.depesz.com/2010/08/08/waiting-for-9-1-recognize-functional-dependency-on-primary-keys/

For example:

SELECT
p.id,
p.firstname,
p.lastname,
count(*)
FROM
people p
JOIN visits v on p.id = v.person_id
GROUP BY p.id;

Geoff

Daniel Lenski

2014-09-24 18:06:21 UTC

Permalink

Post by Geoff Montee
I believe this blog post contains better examples of the feature he's
http://www.depesz.com/2010/08/08/waiting-for-9-1-recognize-functional-dependency-on-primary-keys/
SELECT
p.id,
p.firstname,
p.lastname,
count(*)
FROM
people p
JOIN visits v on p.id = v.person_id
GROUP BY p.id;

Geoff, that's exactly the feature I'm referring to. I see that the
inclusion of UNIQUE NOT NULL constraints was recognized as a logical
next step when this feature was introduced.

Now that I understand PG's current behavior, it doesn't seem like a
huge limitation... but I'm curious about what is preventing the UNIQUE
NOT NULL constraints from being allowed as well. Is there something
different about the internal representation of UNIQUE NOT NULL
constraints compared to PRIMARY KEY constraints?

Thanks,
Dan

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

2014-09-24 18:19:10 UTC

Permalink

Post by Daniel Lenski
Now that I understand PG's current behavior, it doesn't seem like a
huge limitation... but I'm curious about what is preventing the UNIQUE
NOT NULL constraints from being allowed as well. Is there something
different about the internal representation of UNIQUE NOT NULL
constraints compared to PRIMARY KEY constraints?

The comments for check_functional_grouping() explain where the holdup is:

* Determine whether a relation can be proven functionally dependent on
* a set of grouping columns. If so, return TRUE and add the pg_constraint
* OIDs of the constraints needed for the proof to the *constraintDeps list.
*
* grouping_columns is a list of grouping expressions, in which columns of
* the rel of interest are Vars with the indicated varno/varlevelsup.
*
* Currently we only check to see if the rel has a primary key that is a
* subset of the grouping_columns. We could also use plain unique constraints
* if all their columns are known not null, but there's a problem: we need
* to be able to represent the not-null-ness as part of the constraints added
* to *constraintDeps. FIXME whenever not-null constraints get represented
* in pg_constraint.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Daniel Lenski

2014-09-24 18:17:20 UTC

Permalink

On Wed, Sep 24, 2014 at 10:37 AM, Alberto Cabello Sánchez

Post by Alberto Cabello SÃ¡nchez
At first sight, primary key means no grouping at all, as there are no
SELECT A.document
FROM A
GROUP BY A.primary_key
is the same as
SELECT A.document
FROM A

Yes, my example is oversimplified, because GROUP BY primary_key has no
semantic effect on a single-table query, although it still illustrates
the different behavior for PRIMARY KEY vs. UNIQUE NOT NULL, which
should be logically equivalent.

The examples that Geoff Montee gave are better because (GROUP BY
primary_key) does change the meaning of the query when there's a join
to another table.

Dan

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general