Discussion:
Question about row_number() ordering semantics
Fred Jonsson
2014-09-23 20:46:56 UTC
Permalink
Hey everyone,

As I was playing around with `row_number()`s for cursor-based pagination, I
came across some ordering behavior that I didn't expect.

In particular, when I order in a way where multiple rows compete for the
same position in the result set (i.e., rows that are equivalent in terms of
the requested order), it appears that the `row_number()` column may be out
of order.

In the case of the following example query, the ordering in the OVER clause
is the opposite of the ordering of the rows, so I expected that the row
numbers would effectively be reversed. However, the example shows that the
for the first three values, the `row_number` rows are instead non-ordered.

=> SELECT
-> number,
-> row_number() OVER (ORDER BY number DESC) AS row_number
-> FROM generate_series(0, 20) number
-> INNER JOIN generate_series(2, 4) divisor
-> ON mod(number, divisor) = 0
-> ORDER BY number ASC;

number | row_number
--------+------------
0 | 22
0 | 24
0 | 23
2 | 21
3 | 20
4 | 18
4 | 19
6 | 17
6 | 16
8 | 15
8 | 14
9 | 13
10 | 12
12 | 11
12 | 10
12 | 9
14 | 8
15 | 7
16 | 6
16 | 5
18 | 4
18 | 3
20 | 2
20 | 1
(24 rows)

Curiously, if I set the row_number column to be `OVER (ORDER BY number
ASC)`, the column is returned in order.

Is this behavior to be expected, and is it safe to assume that the
out-of-order row numbers can only exist within the set of rows that have
the exact same ordering? In other words, can I expect that row numbers
being out of order would never happen between two rows that do not have
ordinal equivalence?

Best,
Fred

(If so, in these scenarios I could safely order by row_number as secondary
ordering without losing the correct primary sorting order. Of course, for
cursor-based pagination this will lead to some unpredictability in terms of
where new rows might appear in the result set. I'll probably choose to
rather impose more strict ordering based on other constant but variable
columns such as a creation timestamp.)
Tom Lane
2014-09-24 16:27:34 UTC
Permalink
Post by Fred Jonsson
As I was playing around with `row_number()`s for cursor-based pagination, I
came across some ordering behavior that I didn't expect.
In particular, when I order in a way where multiple rows compete for the
same position in the result set (i.e., rows that are equivalent in terms of
the requested order), it appears that the `row_number()` column may be out
of order.
If you remove the outer ORDER BY then you'll get results that make sense.
The problem is that the outer ORDER BY re-sorts the rows after the window
function has been applied, and it is not required to do anything
deterministic with rows having equal sort keys. IIRC, Postgres will
typically use a quicksort for small numbers of rows, and that doesn't
promise anything about the order in which equal keys are emitted.
Post by Fred Jonsson
Curiously, if I set the row_number column to be `OVER (ORDER BY number
ASC)`, the column is returned in order.
Yeah --- the planner is smart enough to not sort by the same condition
twice. But "number DESC" and "number ASC" are not the same sort
condition.

In short, if you want predictable results you need a unique sort key.
I think "ORDER BY number ASC, row_number" (or maybe row_number DESC
is what you want) is the most convenient solution for this example.

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Continue reading on narkive:
Loading...