Discussion:
pg_dump on local Windows, pg_restore on Linux?
Phoenix Kiula
2007-08-15 17:26:11 UTC
Permalink
Couple of questions with porting:

1. I have been playing around with my databases locally on Win XP so
as not to hurt our website traffic. Now I would like to move the
database to a Linux CentOS server. Can I use pg_dump on Windows and
pg_restore it on Linux? If so, any tips on what I should keep in mind
(e.g., manual seems to suggest that pg_restore prefers tar gzipped
format...but I'm not sure if Windows can generate this?)

2. I would like my database to be UTF-8. I issue the command

CREATE DATABASE mydb OWNER me ENCODING 'utf8';

Should I add anything else to it, such as collation? I did not find
any option for that in here:
http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html

3. Also, just to confirm, if I have utf-8 database, then all tables in
it should be able to take utf-8 data? I would just like these tables
to take whatever I send to them. No error checking or encoding
checking. Can I disable the painful error that PG keeps throwing if
even a single erroneous byte shows up? I'd rather have 'garbage data'
than not go through with the query.

Thanks for any tips!

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Scott Marlowe
2007-08-15 17:42:28 UTC
Permalink
Post by Phoenix Kiula
1. I have been playing around with my databases locally on Win XP so
as not to hurt our website traffic. Now I would like to move the
database to a Linux CentOS server. Can I use pg_dump on Windows and
pg_restore it on Linux? If so, any tips on what I should keep in mind
(e.g., manual seems to suggest that pg_restore prefers tar gzipped
format...but I'm not sure if Windows can generate this?)
2. I would like my database to be UTF-8. I issue the command
CREATE DATABASE mydb OWNER me ENCODING 'utf8';
Should I add anything else to it, such as collation? I did not find
http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html
3. Also, just to confirm, if I have utf-8 database, then all tables in
it should be able to take utf-8 data? I would just like these tables
to take whatever I send to them. No error checking or encoding
checking. Can I disable the painful error that PG keeps throwing if
even a single erroneous byte shows up? I'd rather have 'garbage data'
than not go through with the query.
Requirements 2 and 3 are exclusive. Either you want your database to
be UTF-8, which means that invalid byte sequences should be rejected
because they AREN'T utf-8, or you want your database to swallow
whatever you throw at it, in which case, that's not UTF-8.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Phoenix Kiula
2007-08-15 17:53:11 UTC
Permalink
Post by Scott Marlowe
Post by Phoenix Kiula
1. I have been playing around with my databases locally on Win XP so
as not to hurt our website traffic. Now I would like to move the
database to a Linux CentOS server. Can I use pg_dump on Windows and
pg_restore it on Linux? If so, any tips on what I should keep in mind
(e.g., manual seems to suggest that pg_restore prefers tar gzipped
format...but I'm not sure if Windows can generate this?)
2. I would like my database to be UTF-8. I issue the command
CREATE DATABASE mydb OWNER me ENCODING 'utf8';
Should I add anything else to it, such as collation? I did not find
http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html
3. Also, just to confirm, if I have utf-8 database, then all tables in
it should be able to take utf-8 data? I would just like these tables
to take whatever I send to them. No error checking or encoding
checking. Can I disable the painful error that PG keeps throwing if
even a single erroneous byte shows up? I'd rather have 'garbage data'
than not go through with the query.
Requirements 2 and 3 are exclusive. Either you want your database to
be UTF-8, which means that invalid byte sequences should be rejected
because they AREN'T utf-8, or you want your database to swallow
whatever you throw at it, in which case, that's not UTF-8.
Thanks. Is there an encoding that is so flexible that it will silently
accept whatever I send to it without throwing an exception?

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
Tom Lane
2007-08-15 18:08:41 UTC
Permalink
Post by Phoenix Kiula
Thanks. Is there an encoding that is so flexible that it will silently
accept whatever I send to it without throwing an exception?
SQL_ASCII does that. Whether it's a good idea to use it is
questionable. One thing to think about is that you will be unable to
provide translation to and from different client encodings --- the
database will always just regurgitate the bytes it was given, since it
doesn't really know what encoding they are in.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Leif B. Kristensen
2007-08-15 17:53:45 UTC
Permalink
Post by Phoenix Kiula
1. I have been playing around with my databases locally on Win XP so
as not to hurt our website traffic. Now I would like to move the
database to a Linux CentOS server. Can I use pg_dump on Windows and
pg_restore it on Linux? If so, any tips on what I should keep in mind
(e.g., manual seems to suggest that pg_restore prefers tar gzipped
format...but I'm not sure if Windows can generate this?)
This is how I import my output from pg_dump:

createdb --encoding=UNICODE $DB
psql -U postgres -d $DB -f $INFILE > restore.log 2>&1

This way, you just feed psql with an uncompressed dump file. I've never
used pg_restore.
Post by Phoenix Kiula
2. I would like my database to be UTF-8. I issue the command
CREATE DATABASE mydb OWNER me ENCODING 'utf8';
Should I add anything else to it, such as collation? I did not find
http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html
At least on a *nix system, collation is based on the value of the LC_ALL
environment variable at dbinit time. There's nothing you can do about
it in a live database. IMO that's a little awkward, and is what finally
made me change the global from ISO-8859-1 to UTF-8 on my three Gentoo
Linux machines.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Phoenix Kiula
2007-08-15 18:04:08 UTC
Permalink
Post by Leif B. Kristensen
At least on a *nix system, collation is based on the value of the LC_ALL
environment variable at dbinit time. There's nothing you can do about
it in a live database. IMO that's a little awkward, and is what finally
made me change the global from ISO-8859-1 to UTF-8 on my three Gentoo
Linux machines.
This is great info, thanks. Could you let me know how I could change
the global values of "LC_ALL"? I am on Linux too, just CentOS, but I
suppose it should be the same or similar?

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Leif B. Kristensen
2007-08-15 18:31:45 UTC
Permalink
Post by Phoenix Kiula
Post by Leif B. Kristensen
At least on a *nix system, collation is based on the value of the
LC_ALL environment variable at dbinit time. There's nothing you can
do about it in a live database. IMO that's a little awkward, and is
what finally made me change the global from ISO-8859-1 to UTF-8 on
my three Gentoo Linux machines.
This is great info, thanks. Could you let me know how I could change
the global values of "LC_ALL"? I am on Linux too, just CentOS, but I
suppose it should be the same or similar?
I don't have the foggiest idea how it's done in CentOS. In Gentoo, you
just edit the contents of /etc/env.d/02locale:

balapapa ~ # cat /etc/env.d/02locale
LC_ALL="nb_NO.UTF-8"
LANG=""
LC_CTYPE="nb_NO.UTF.8"
LC_NUMERIC="nb_NO.UTF.8"
LC_TIME="nb_NO.UTF.8"
LC_COLLATE="nb_NO.UTF.8"
LC_MONETARY="nb_NO.UTF.8"
LC_PAPER="nb_NO.UTF.8"
LC_NAME="nb_NO.UTF.8"
LC_ADDRESS="nb_NO.UTF.8"
LC_TELEPHONE="nb_NO.UTF.8"
LC_MEASUREMENT="nb_NO.UTF.8"
LC_IDENTIFICATION="nb_NO.UTF.8"

To update the environment settings globally, the Gentoo method is to
issue the command

"env-update && source /etc/profile"

from root. But you should really Read The Fine Manual about this. If you
for instance have filenames with non-ASCII characters in them, prepare
yourself for some interesting challenges.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
James B. Byrne
2007-08-15 21:38:04 UTC
Permalink
Date: Wed, 15 Aug 2007 20:31:45 +0200
From: "Leif B. Kristensen" <***@solumslekt.org>
To: pgsql-***@postgresql.org
Subject: Re: pg_dump on local Windows, pg_restore on Linux?
Post by Phoenix Kiula
This is great info, thanks. Could you let me know how I could change
the global values of "LC_ALL"? I am on Linux too, just CentOS, but I
suppose it should be the same or similar?
See: man local and man 1p locale and man 1p localedef and /etc/sysconfig/i18n

Do NOT manually set LC_ALL on a CentOS system unless you know exactly what
you are doing and what programs will be impacted. If set then LC_ALL
overrides all of the individual LC_ settings.


Regards,
--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:***@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Phoenix Kiula
2007-08-16 01:15:44 UTC
Permalink
Post by James B. Byrne
Date: Wed, 15 Aug 2007 20:31:45 +0200
Subject: Re: pg_dump on local Windows, pg_restore on Linux?
Post by Phoenix Kiula
This is great info, thanks. Could you let me know how I could change
the global values of "LC_ALL"? I am on Linux too, just CentOS, but I
suppose it should be the same or similar?
See: man local and man 1p locale and man 1p localedef and /etc/sysconfig/i18n
Do NOT manually set LC_ALL on a CentOS system unless you know exactly what
you are doing and what programs will be impacted. If set then LC_ALL
overrides all of the individual LC_ settings.
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


Is this enough to run a pgsql database that is supposed to house utf-8
content? Or should I somehow make sure LC_ALL carries the value of
'en_US.UTF-8' as well? If not manually, how would I do it, and how can
I trace what other programs may be impacted. It sounds like an
impossible task given that many applications/servers run on this
machine, so I am hoping that either there is a workaround for
postgresql, or that postgresql will work with just the above
parameters as they are.

TIA!

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
James B. Byrne
2007-08-16 13:34:13 UTC
Permalink
Post by Phoenix Kiula
Post by Leif B. Kristensen
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Is this enough to run a pgsql database that is supposed to house utf-8
content?
Yes, those settings are entirely appropriate for CentOS where the system
language is U.S. English. You do not need to override them by setting
LC_ALL and would gain nothing thereby.

The locale preface settings (en_US) determine how internationalized
programs display information to users, whether by text presentation format
masks or via character collation orders. The important thing for
PostgreSQL is that any application host system that generates data stored
by a UTF-8 database instance have its character encoding be UTF-8 or
provide a means to convert it before submitting it to the DBMS. Otherwise
you will get encoding errors when attempting to write data that otherwise
appears to the user as perfectly sensible text.


Sincerely,
--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:***@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Loading...