Discussion:
COPY data into a table with a SERIAL column?
Steve Wampler
2014-10-16 16:33:02 UTC
Permalink
Hi,

This is with Postgresql 9.3.5.

I'm looking at using a COPY command (via jdbc) to do bulk inserts into a table that
includes a BIGSERIAL column. Is there a way to mark the data in that
column so it gets assigned a new value on entry - akin to the use of 'default'
in an INSERT? Some of the rows have values for the serial column, others
don't.

Or is the only way to use COPY for this task:

COPY table_name (columnnameA, columnnameB, columnnameD) FROM source;

where the serial column name is omitted? This wouldn't preserve the values
for the serial column on rows that have one already.

Thanks!
Steve
--
Steve Wampler -- ***@noao.edu
The gods that smiled on your birth are now laughing out loud.

--
This mail was scanned by a trial version of BitDefender
For more information please visit http://www.bitdefender.com/
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Rob Sargent
2014-10-16 16:42:51 UTC
Permalink
Post by Steve Wampler
Hi,
This is with Postgresql 9.3.5.
I'm looking at using a COPY command (via jdbc) to do bulk inserts into a table that
includes a BIGSERIAL column. Is there a way to mark the data in that
column so it gets assigned a new value on entry - akin to the use of 'default'
in an INSERT? Some of the rows have values for the serial column, others
don't.
COPY table_name (columnnameA, columnnameB, columnnameD) FROM source;
where the serial column name is omitted? This wouldn't preserve the values
for the serial column on rows that have one already.
Thanks!
Steve
Doesn't this guarantee collision at some point?

I might add a column to the target table which would contain the
"foreign" serial id and give all records the "local" serial. Update
local to foreign iff safe and desired.
Steve Wampler
2014-10-16 17:04:03 UTC
Permalink
Post by Rob Sargent
Post by Steve Wampler
This is with Postgresql 9.3.5.
I'm looking at using a COPY command (via jdbc) to do bulk inserts into a table that
includes a BIGSERIAL column. Is there a way to mark the data in that
column so it gets assigned a new value on entry - akin to the use of 'default'
in an INSERT? Some of the rows have values for the serial column, others
don't.
COPY table_name (columnnameA, columnnameB, columnnameD) FROM source;
where the serial column name is omitted? This wouldn't preserve the values
for the serial column on rows that have one already.
Doesn't this guarantee collision at some point?
Depends - without the UNIQUE tag on that column it shouldn't matter.
Or, with a bigserial there's a lot of room to play with. The rows with existing
serial values might all have negative values for that column, for example.
Post by Rob Sargent
I might add a column to the target table which would contain the "foreign" serial id and give all records the "local"
serial. Update local to foreign iff safe and desired.
I don't think this addresses the problem of having entry rows with no serial column in them.

Let me generalize the problem a bit: How can I specify that the default value of a column
is to be used with a COPY command when some rows have values for that column and
some don't?
--
Steve Wampler -- ***@noao.edu
The gods that smiled on your birth are now laughing out loud.

--
This mail was scanned by a trial version of BitDefender
For more information please visit http://www.bitdefender.com/
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
David G Johnston
2014-10-16 17:38:15 UTC
Permalink
Post by Steve Wampler
Let me generalize the problem a bit: How can I specify that the default value of a column
is to be used with a COPY command when some rows have values for that column and
some don't?
If you provide a value for a column, including NULL, the default expression
is not evaluated.

COPY is dumb but fast. If you need logic you need to add it yourself.
Either before the copy or copy into a temporary UNLOGGED table and write
smart SQL to migrate from that to the live table.

You can also put smarts into a trigger.

Personally I would generally stage all the data then write two INSERT INTO
... SELECT statements; one for the known values and one where you omit the
column and let the system use the default.

David J.





--
View this message in context: http://postgresql.1045698.n5.nabble.com/COPY-data-into-a-table-with-a-SERIAL-column-tp5823278p5823291.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Rob Sargent
2014-10-16 17:44:47 UTC
Permalink
Post by David G Johnston
Post by Steve Wampler
Let me generalize the problem a bit: How can I specify that the default
value of a column
is to be used with a COPY command when some rows have values for that column and
some don't?
If you provide a value for a column, including NULL, the default expression
is not evaluated.
COPY is dumb but fast. If you need logic you need to add it yourself.
Either before the copy or copy into a temporary UNLOGGED table and write
smart SQL to migrate from that to the live table.
You can also put smarts into a trigger.
Personally I would generally stage all the data then write two INSERT INTO
... SELECT statements; one for the known values and one where you omit the
column and let the system use the default.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/COPY-data-into-a-table-with-a-SERIAL-column-tp5823278p5823291.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Yeah, part three of my test proves his point:

postgres=# insert into t (id, name) values(null, 'rjs');
ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, rjs).
Steve Wampler
2014-10-16 18:17:45 UTC
Permalink
Post by David G Johnston
COPY is dumb but fast. If you need logic you need to add it yourself.
Either before the copy or copy into a temporary UNLOGGED table and write
smart SQL to migrate from that to the live table.
You can also put smarts into a trigger.
Never thought about a trigger on a COPY before. I'll look into that and
see what the hit is.

I was kinda hoping there was the equivalent of \N for indicating the use
of a default value instead of a null value, but I accept that such a thing
might be too expensive for COPY's goal in life.

Maybe the best approach is to switch to a batched insert, which might be
fast enough for my needs.

Thanks for the suggestions!
--
Steve Wampler -- ***@noao.edu
The gods that smiled on your birth are now laughing out loud.

--
This mail was scanned by a trial version of BitDefender
For more information please visit http://www.bitdefender.com/
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Adrian Klaver
2014-10-16 20:22:54 UTC
Permalink
Post by Steve Wampler
Post by David G Johnston
COPY is dumb but fast. If you need logic you need to add it yourself.
Either before the copy or copy into a temporary UNLOGGED table and write
smart SQL to migrate from that to the live table.
You can also put smarts into a trigger.
Never thought about a trigger on a COPY before. I'll look into that and
see what the hit is.
I was kinda hoping there was the equivalent of \N for indicating the use
of a default value instead of a null value, but I accept that such a thing
might be too expensive for COPY's goal in life.
Maybe the best approach is to switch to a batched insert, which might be
fast enough for my needs.
Might want to take a look at pg_bulkload:

http://pgbulkload.projects.pgfoundry.org/pg_bulkload.html

in particular its FILTER function:

http://pgbulkload.projects.pgfoundry.org/pg_bulkload.html#filter
Post by Steve Wampler
Thanks for the suggestions!
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Rob Sargent
2014-10-16 17:42:45 UTC
Permalink
Post by Steve Wampler
Post by Rob Sargent
Post by Steve Wampler
This is with Postgresql 9.3.5.
I'm looking at using a COPY command (via jdbc) to do bulk inserts into a table that
includes a BIGSERIAL column. Is there a way to mark the data in that
column so it gets assigned a new value on entry - akin to the use of 'default'
in an INSERT? Some of the rows have values for the serial column, others
don't.
COPY table_name (columnnameA, columnnameB, columnnameD) FROM source;
where the serial column name is omitted? This wouldn't preserve the values
for the serial column on rows that have one already.
Doesn't this guarantee collision at some point?
Depends - without the UNIQUE tag on that column it shouldn't matter.
Or, with a bigserial there's a lot of room to play with. The rows with existing
serial values might all have negative values for that column, for example.
Post by Rob Sargent
I might add a column to the target table which would contain the
"foreign" serial id and give all records the "local"
serial. Update local to foreign iff safe and desired.
I don't think this addresses the problem of having entry rows with no
serial column in them.
No data in the column (null) or no column at all?

I appreciate the vastness of bigserial but I think it starts at 1. Are
negative numbers even allowed? To clarify my suggestion: all incoming
records would get a new "local" big serial and those incoming records
WITH a value would set the "foreign" bigserial though that column would
have to be typed as bigint nullable (this would allow negative values).
That said, according to my test, the supplied bigserial value would get
insert as supplied if not null (without the extra column I suggested)
My test

postgres=# create table t (id bigserial, name text);
CREATE TABLE
postgres=# insert into t values('rjs');
ERROR: invalid input syntax for integer: "rjs"
LINE 1: insert into t values('rjs');
^
postgres=#
postgres=# insert into t (name) values('rjs');
INSERT 0 1
postgres=# select * from t;
id | name
----+------
1 | rjs
(1 row)

postgres=# insert into t (id, name) values(777, 'rjs');
INSERT 0 1
postgres=# select * from t;
id | name
-----+------
1 | rjs
777 | rjs
(2 rows)
Post by Steve Wampler
Let me generalize the problem a bit: How can I specify that the default value of a column
is to be used with a COPY command when some rows have values for that column and
some don't?
David G Johnston
2014-10-16 17:52:38 UTC
Permalink
On Thu, Oct 16, 2014 at 11:44 AM, lup [via PostgreSQL] <
I appreciate the vastness of bigserial but I think it starts at 1. Are
negative numbers even allowed?
​http://www.postgresql.org/docs/9.3/static/sql-createsequence.html

A DEFAULT sequence starts at one but it is able to generate any biginteger
value.​ Regardless, the value generated by the sequence and the allowed
values for the target column are distinct - which is why a sequence
attached to a normal integer will start throwing "value out of bounds"
errors before it runs out of values.

Therefore, by default if one is able to live with disallowing half of the
bigint range for auto-generation using the negative half of the range for
manual assignment is a quick-and-simple solution to the problem.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/COPY-data-into-a-table-with-a-SERIAL-column-tp5823278p5823296.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Rob Sargent
2014-10-16 17:58:23 UTC
Permalink
On Thu, Oct 16, 2014 at 11:44 AM, lup [via PostgreSQL] <[hidden email]
I appreciate the vastness of bigserial but I think it starts at
1. Are negative numbers even allowed?
​http://www.postgresql.org/docs/9.3/static/sql-createsequence.html
A DEFAULT sequence starts at one but it is able to generate any
biginteger value.​ Regardless, the value generated by the sequence
and the allowed values for the target column are distinct - which is
why a sequence attached to a normal integer will start throwing "value
out of bounds" errors before it runs out of values.
Therefore, by default if one is able to live with disallowing half of
the bigint range for auto-generation using the negative half of the
range for manual assignment is a quick-and-simple solution to the problem.
David J.
------------------------------------------------------------------------
View this message in context: Re: COPY data into a table with a SERIAL
column?
<http://postgresql.1045698.n5.nabble.com/COPY-data-into-a-table-with-a-SERIAL-column-tp5823278p5823296.html>
Sent from the PostgreSQL - general mailing list archive
<http://postgresql.1045698.n5.nabble.com/PostgreSQL-general-f1843780.html>
at Nabble.com.
As proven by part 4 :)
postgres=# insert into t (id, name) values(-777, 'rjs');
INSERT 0 1
postgres=# select * from t;
id | name
------+------
1 | rjs
777 | rjs
-777 | rjs
(3 rows)
Loading...