Discussion:
a couple questions about convert()
s***@frii.com
2014-09-18 22:35:34 UTC
Permalink
Hello,

In a postgresql-9.3.1 database with UTF8 encoding I can do:

select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello

But when I explicitly give the "to" encoding:

select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f

Why does that second one give different results from the first?

Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tom Lane
2014-09-19 14:50:12 UTC
Permalink
Post by s***@frii.com
select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello
select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f
Why does that second one give different results from the first?
convert_from() produces a result of type text. But convert() returns
bytea, because its output is not necessarily valid in the current
database encoding. It's the same bytes, but it prints differently.
(You could ameliorate the unreadability by changing the bytea_output
setting.)
Post by s***@frii.com
Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?
Well, no, it'll be whatever the conversion function says it is;
and encoding conversion functions are replaceable through DDL.

I recall some discussion to the effect that that was silly and
we should rip out CREATE CONVERSION and friends in favor of
hard-wired conversion rules. Nothing's been done about it though.

regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
s***@frii.com
2014-09-22 04:16:06 UTC
Permalink
Post by Tom Lane
Post by s***@frii.com
select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello
select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f
Why does that second one give different results from the first?
convert_from() produces a result of type text. But convert() returns
bytea, because its output is not necessarily valid in the current
database encoding. It's the same bytes, but it prints differently.
(You could ameliorate the unreadability by changing the bytea_output
setting.)
I my case I routinely deal with multi-byte utf-8 characters so the
"escape" format doesn't help much.
Post by Tom Lane
Post by s***@frii.com
Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?
Well, no, it'll be whatever the conversion function says it is;
and encoding conversion functions are replaceable through DDL.
I recall some discussion to the effect that that was silly and
we should rip out CREATE CONVERSION and friends in favor of
hard-wired conversion rules. Nothing's been done about it though.
Well, if it matters, count me as +1.

Thanks very much for the explanation, things are clearer now. :-)
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Continue reading on narkive:
Loading...