Discussion:
JSONB spaces in text presentation
Ilya I. Ashchepkov
2014-09-24 05:38:05 UTC
Permalink
Hi.

Is spaces is nessesary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.

I'm developing web application, and want to get json-string from pg and
send it to browser without repacking.
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
Ilya I. Ashchepkov
2014-09-24 07:23:08 UTC
Permalink
Hi.

Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.

I'm developing web application, and want to get json-string from pg and
send it to browser without repacking.
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
John R Pierce
2014-09-24 07:30:09 UTC
Permalink
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ilya I. Ashchepkov
2014-09-24 07:44:25 UTC
Permalink
I'm sorry about sending email several times. I haven't understand, was it
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
"runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
"gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"altitude": 143, "latitude": 55.127888997395836, "longitude":
80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}

Whitespacis percents in this record:
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342

Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text, text
' '), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
Post by John R Pierce
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
http://www.postgresql.org/mailpref/pgsql-general
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
Merlin Moncure
2014-09-24 13:44:52 UTC
Permalink
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was it
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
"runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text, text '
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Seref Arikan
2014-09-24 14:22:00 UTC
Permalink
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref
Post by Ilya I. Ashchepkov
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was it
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
1,
Post by Ilya I. Ashchepkov
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"terminal",
Post by Ilya I. Ashchepkov
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text '
Post by Ilya I. Ashchepkov
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...
merlin
--
http://www.postgresql.org/mailpref/pgsql-general
Adrian Klaver
2014-09-24 14:34:44 UTC
Permalink
Post by Seref Arikan
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong
to expect binary encoding to decrease size and saw that I'm not alone
(which still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.
The "much larger" part is a bit scary. Is this documented somewhere?
Best regards
Seref
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ilya I. Ashchepkov
2014-09-24 16:28:44 UTC
Permalink
With the same data:

# create cast (jsonb as bytea) without function;
# select
sum(length(data::text))::float/sum(octet_length((data::jsonb)::bytea)) from
data.packets;
?column?
-------------------
0.630663654967513

and 0.554666142734544 without spaces
Post by Seref Arikan
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.
The "much larger" part is a bit scary. Is this documented somewhere?
Best regards
Seref
Post by Ilya I. Ashchepkov
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was
it
Post by Ilya I. Ashchepkov
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
74,
1,
Post by Ilya I. Ashchepkov
"gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"terminal",
Post by Ilya I. Ashchepkov
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text '
Post by Ilya I. Ashchepkov
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...
merlin
--
http://www.postgresql.org/mailpref/pgsql-general
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
Oleg Bartunov
2014-09-24 19:07:55 UTC
Permalink
Check slides 17-20 of
http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to
understand, what 'binary format' means. The slides describes binary storage
for nested hstore, not jsonb, but you'll get the idea.
Post by Seref Arikan
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.
The "much larger" part is a bit scary. Is this documented somewhere?
Best regards
Seref
Post by Ilya I. Ashchepkov
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was
it
Post by Ilya I. Ashchepkov
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
74,
1,
Post by Ilya I. Ashchepkov
"gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"terminal",
Post by Ilya I. Ashchepkov
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text '
Post by Ilya I. Ashchepkov
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...
merlin
--
http://www.postgresql.org/mailpref/pgsql-general
Seref Arikan
2014-09-24 20:41:39 UTC
Permalink
Thanks Oleg, I'll check the slides.
Post by Oleg Bartunov
Check slides 17-20 of
http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to
understand, what 'binary format' means. The slides describes binary storage
for nested hstore, not jsonb, but you'll get the idea.
Post by Seref Arikan
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.
The "much larger" part is a bit scary. Is this documented somewhere?
Best regards
Seref
Post by Ilya I. Ashchepkov
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was
it
Post by Ilya I. Ashchepkov
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
74,
Post by Ilya I. Ashchepkov
"runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
"gps": 1,
Post by Ilya I. Ashchepkov
"gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
0,
Post by Ilya I. Ashchepkov
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"terminal",
Post by Ilya I. Ashchepkov
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text '
Post by Ilya I. Ashchepkov
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...
merlin
--
http://www.postgresql.org/mailpref/pgsql-general
John R Pierce
2014-09-24 19:11:43 UTC
Permalink
Post by Seref Arikan
This is interesting. Most binary encoding methods I use produce
smaller files than the text files for the same content.
'1' vs INTEGER 1 ... 1 byte vs 4 bytes.

now add metadata necessary to represent the original json structure.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ilya I. Ashchepkov
2014-09-24 16:23:48 UTC
Permalink
IMHO, prettification is useful only for debugging.
It would be nice to have a session variable for the debug output with
spaces, new lines and indentation.
Post by Ilya I. Ashchepkov
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was it
sent by gmail or not.
Post by John R Pierce
Post by Ilya I. Ashchepkov
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
1,
Post by Ilya I. Ashchepkov
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"terminal",
Post by Ilya I. Ashchepkov
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text '
Post by Ilya I. Ashchepkov
'), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
For jsonb (unlike json), data is not actually stored as json but in a
binary format. It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster. The spaces you see are generated when the jsonb type is
converted to text for output. I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
On the flip side, a more verbose prettification would be pretty nice
too. I wonder if a hypothetical GUC is the best way to control this
behavior...
merlin
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
Adrian Klaver
2014-09-24 13:51:03 UTC
Permalink
Post by Ilya I. Ashchepkov
I'm sorry about sending email several times. I haven't understand, was
it sent by gmail or not.
Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.
can you show us an example of this?
One record
# select data from events.data limit 1;
74, "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
256, "digital": {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0},
{"course": 265, "altitude": 143, "latitude": 55.127888997395836,
"longitude": 80.8046142578125}, "protocol": 4, "coldstart": 1,
"timesource": "terminal", "receiver_on": 1, "external_power": 28.07,
"internal_power": 4.19}
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
?column?
---------------------
12.3417721518987342
Whitespace in test data
# select count(*),avg(array_length(regexp_split_to_array(data::text,
text ' '), 1)*100./length(data::text)) from events.data ;
count | avg
--------+---------------------
242222 | 12.3649234646118312
The only thing I can of is to use json not jsonb. Modified example taken
from docs:

http://www.postgresql.org/docs/9.4/static/datatype-json.html

test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::jsonb;
jsonb
--------------------------------------------------
{"bar": "baz", "active": false, "balance": 7.77}



test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::json;
json
---------------------------------------------
{"bar":"baz","balance":7.77,"active":false}


json will return exactly what was put in.
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ilya I. Ashchepkov
2014-09-24 07:25:42 UTC
Permalink
Hi.

Is spaces is nessesary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.

I'm developing web application, and want to get json-string from pg and
send it to browser without repacking.
--
С уважеМОеЌ,
АщепкПв Илья ***@gmail.com
Loading...