Discussion:
psql and tab-delimited output
Abelard Hoffman
2014-09-06 07:32:58 UTC
Permalink
Hi.

Traditionally, to generate a TSV report, I've simply invoked psql with:
--no-align --field-separator '\t' --pset footer=off

That works in most cases, except when your column values contain tabs
themselves.

I know that COPY() will escape tabs (as \t), and we can use that from psql
with the \copy command, but that does not include a header row of the
column names.

So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?


I also considered handling the escaping myself within the SELECT, and then
sticking with the first approach above.

Suggestions?

Thanks.
Thomas Kellerer
2014-09-06 09:32:11 UTC
Permalink
Post by Abelard Hoffman
Traditionally, to generate a TSV report, I've simply invoked psql
with: --no-align --field-separator '\t' --pset footer=off
That works in most cases, except when your column values contain tabs
themselves.
I know that COPY() will escape tabs (as \t), and we can use that from
psql with the \copy command, but that does not include a header row
of the column names.
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of
column names?
Do you have to do that using psql or can you use a different tool?

The tool I maintain - SQL Workbench/J - has a command to export tables (or queries) to a wide range of export formats.

WbExport can escape of special characters by using \t, \n etc notation for "control" characters

http://sql-workbench.net/manual/command-export.html

especially

http://sql-workbench.net/manual/command-export.html#text-escape-switch

This can all be done from a shell/batch script if needed.

The tool needs a Java runtime and the Postgres JDBC driver.

Thomas
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Adrian Klaver
2014-09-06 14:28:53 UTC
Permalink
Post by Abelard Hoffman
Hi.
--no-align --field-separator '\t' --pset footer=off
That works in most cases, except when your column values contain tabs
themselves.
I know that COPY() will escape tabs (as \t), and we can use that from
psql with the \copy command, but that does not include a header row of
the column names.
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?
create table tsv_test (id int, fld_1 varchar);

insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');

\copy tsv_test to 'data.tsv' with csv header delimiter ' ';

***@panda:~> cat data.tsv
id fld_1
1 "test value"
2 "test value"
3 "test value"
Post by Abelard Hoffman
I also considered handling the escaping myself within the SELECT, and
then sticking with the first approach above.
Suggestions?
Thanks.
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Abelard Hoffman
2014-09-06 17:34:01 UTC
Permalink
[snip]
Post by Abelard Hoffman
So, my question is, what's the simplest way to generate tab-escaped
Post by Abelard Hoffman
TSV-formatted reports with the first line containing the list of column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter ' ';
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on /\t/
to get the fields and then unescape the values. At that point it's probably
simpler to just switch to standard CSV.

Using your example, the output I'd prefer is:

id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue

I looked at the options for COPY's CSV format, but I don't see a way to
disable quoting but still have escaping.

This works, although it's not exactly simple:

DROP TABLE IF EXISTS tsv_test;

CREATE TABLE tsv_test (id int, fld_1 varchar);

INSERT INTO tsv_test VALUES (1, 'test value');
INSERT INTO tsv_test VALUES (2, 'test value');
INSERT INTO tsv_test VALUES (3, 'test value');

SELECT * FROM tsv_test WHERE FALSE; -- to generate header row
COPY tsv_test TO STDOUT;

And then run that through psql with the --no-align --field-separator '\t'
--pset footer=off options.
With that, I'd probably generate the report into a temp table, and then run
the above to actually export that table as TSV.

@Thomas, yes, I was hoping to stick with just psql, but I'll look at other
tools if necessary.

Any other ideas?

Thanks.
Adrian Klaver
2014-09-06 18:43:56 UTC
Permalink
Post by Abelard Hoffman
[snip]
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter ' ';
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on
/\t/ to get the fields and then unescape the values. At that point it's
probably simpler to just switch to standard CSV.
id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue
I guess it depends on what you are using.

In Python:

In [30]: with open('data.tsv', 'rb') as c_file:
c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"')
for row in c_reader:
print row
....:
['id', 'fld_1']
['1', 'test\tvalue']
['2', 'test\tvalue']
['3', 'test\tvalue']


In [33]: with open('data.tsv', 'rb') as c_file:
c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"')
for row in c_reader:
print row[1]
....:
fld_1
test value
test value
test value

The Postgres docs have a good note on the CSV format:

http://www.postgresql.org/docs/9.3/interactive/sql-copy.html

"Note: Many programs produce strange and occasionally perverse CSV
files, so the file format is more a convention than a standard. Thus you
might encounter some files that cannot be imported using this mechanism,
and COPY might produce files that other programs cannot process."


So it always an adventure:)
Post by Abelard Hoffman
I looked at the options for COPY's CSV format, but I don't see a way to
disable quoting but still have escaping.
DROP TABLE IF EXISTS tsv_test;
CREATE TABLE tsv_test (id int, fld_1 varchar);
INSERT INTO tsv_test VALUES (1, 'test value');
INSERT INTO tsv_test VALUES (2, 'test value');
INSERT INTO tsv_test VALUES (3, 'test value');
SELECT * FROM tsv_test WHERE FALSE; -- to generate header row
COPY tsv_test TO STDOUT;
And then run that through psql with the --no-align --field-separator
'\t' --pset footer=off options.
With that, I'd probably generate the report into a temp table, and then
run the above to actually export that table as TSV.
@Thomas, yes, I was hoping to stick with just psql, but I'll look at
other tools if necessary.
Any other ideas?
Thanks.
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Abelard Hoffman
2014-09-07 08:45:56 UTC
Permalink
Post by Adrian Klaver
Post by Abelard Hoffman
[snip]
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter ' ';
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on
/\t/ to get the fields and then unescape the values. At that point it's
probably simpler to just switch to standard CSV.
id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue
I guess it depends on what you are using.
[snip]

Yeah, I can parse CSV easily enough. My boss wants TSV though (I could
parse CSV and split out TSV, of course). Even then, having to take the
report query (which can be big), strip all the newlines and insert it into
a \copy command is kind of a PITA.

I wrote this function, which does what I want. I can call this on each
column that might include tabs and then safely generate TSV using just psql:

CREATE OR REPLACE FUNCTION public.tsv_escape(text) RETURNS text AS $$
my %replace = (
"\t" => 't',
"\r" => 'r',
"\n" => 'n',
'\\' => '\\',
);

$_[0] =~ s/(.)/exists $replace{ $1 } ? "\\$replace{$1}" : $1/gse;
return $_[0];
$$ LANGUAGE 'plperl';

Anyone see a way to rewrite that as an 'sql' function rather than plperl?

I also took a look at the psql source. It doesn't look like it would be
very hard to add some backslash escaping logic as an option. Am I the only
one that would use that? For reports, everyone else mostly uses other
tools? I'd like to stay away from GUI-tools, if possible.

Thanks.
Alban Hertroys
2014-09-07 11:18:37 UTC
Permalink
For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.
For reporting, usually you use the data in the database directly.

A TSV or CSV file is not a report, it’s at best a data source for your report. Going through an intermediate format is not a particularly effective approach to create reports, but if you have to (for example because you aren’t _allowed_ access to the database), generally preferred formats seem to be CSV, XML or JSON; as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares that it’s TSV and not, for example, CSV?

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
David G Johnston
2014-09-07 12:44:07 UTC
Permalink
On 07 Sep 2014, at 10:45, Abelard Hoffman <
For reports, everyone else mostly uses other tools? I'd like to stay away
from GUI-tools, if possible.
For reporting, usually you use the data in the database directly.
A TSV or CSV file is not a report, it’s at best a data source for your
report. Going through an intermediate format is not a particularly
effective approach to create reports, but if you have to (for example
because you aren’t _allowed_ access to the database), generally preferred
formats seem to be CSV, XML or JSON; as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares that
it’s TSV and not, for example, CSV?
TSV is generally chosen as an interchange format because data rarely has
tabs...if your does that advantage goes away.

Same question regarding the report aspect. If the boss wants a human
readable text format report I'd suggest using the format function and output
the the resultant single column as-is to the screen. Fixed-width columns
are better than any kind of delimited format in that use case.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/psql-and-tab-delimited-output-tp5818019p5818097.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Abelard Hoffman
2014-09-08 18:45:37 UTC
Permalink
Hi Alban.
Post by Abelard Hoffman
For reports, everyone else mostly uses other tools? I'd like to stay
away from GUI-tools, if possible.
For reporting, usually you use the data in the database directly.
A TSV or CSV file is not a report, it’s at best a data source for your
report. Going through an intermediate format is not a particularly
effective approach to create reports, but if you have to (for example
because you aren’t _allowed_ access to the database), generally preferred
formats seem to be CSV, XML or JSON; as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares that
it’s TSV and not, for example, CSV?
Could you expand on that a bit? What sort of tools does management use to
generate reports from the database directly?
You're meaning a database warehouse? We just have an OLTP db, so we've
always generated reports periodically through cron jobs.

Or maybe "reports" is the wrong word. We generate a bunch of db stats which
can then be used however they want (pulled into Excel, etc.).
But would definitely be interested in learning about other approaches.

And yes, I'm sure we could convert everything over to CSV. Just an issue of
inertia.

Thanks.
Adrian Klaver
2014-09-09 14:29:40 UTC
Permalink
Post by Abelard Hoffman
Hi Alban.
For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.
For reporting, usually you use the data in the database directly.
A TSV or CSV file is not a report, it’s at best a data source for
your report. Going through an intermediate format is not a
particularly effective approach to create reports, but if you have
to (for example because you aren’t _allowed_ access to the
database), generally preferred formats seem to be CSV, XML or JSON;
as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares
that it’s TSV and not, for example, CSV?
Could you expand on that a bit? What sort of tools does management use
to generate reports from the database directly?
You're meaning a database warehouse? We just have an OLTP db, so we've
always generated reports periodically through cron jobs.
Or maybe "reports" is the wrong word. We generate a bunch of db stats
which can then be used however they want (pulled into Excel, etc.).
But would definitely be interested in learning about other approaches.
Ways I have done it:

1) In Excel use the data tools to run the query and return data directly
to the spreadsheet. This assumes a version of Excel that has the data tools.

2) I use Python, so use psycopg2 to pull data from the database and then
xlwt to write out a spreadsheet or reportlab to create a pdf or the csv
module to create whatever flavor of CSV.
Post by Abelard Hoffman
And yes, I'm sure we could convert everything over to CSV. Just an issue
of inertia.
Thanks.
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Adrian Klaver
2014-09-07 16:25:48 UTC
Permalink
On Sat, Sep 6, 2014 at 11:43 AM, Adrian Klaver
On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver
[snip]
So, my question is, what's the simplest way to generate
tab-escaped
TSV-formatted reports with the first line containing
the list of
column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter '
';
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on
/\t/ to get the fields and then unescape the values. At that point it's
probably simpler to just switch to standard CSV.
id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue
I guess it depends on what you are using.
[snip]
Yeah, I can parse CSV easily enough. My boss wants TSV though (I could
parse CSV and split out TSV, of course). Even then, having to take the
report query (which can be big), strip all the newlines and insert it
into a \copy command is kind of a PITA.
I see CSV as a generic term that covers all separated value formats, so
TSV is just a variation. I am not sure exactly what the end point of all
this, so I am probably not going to be able to offer much more. One
heads up, in 9.3+ you have the option in \copy(COPY) to use an external
program to copy TO or FROM
I also took a look at the psql source. It doesn't look like it would be
very hard to add some backslash escaping logic as an option. Am I the
only one that would use that? For reports, everyone else mostly uses
other tools? I'd like to stay away from GUI-tools, if possible.
Not sure what you are asking for here. Something like this?:

http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html

4.1.2.2. String Constants with C-style Escapes

test=> select E'a\tb';
?column?
-----------
a b
(1 row)
Thanks.
--
Adrian Klaver
***@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Jeff Janes
2014-09-07 19:28:01 UTC
Permalink
Post by Abelard Hoffman
Hi.
--no-align --field-separator '\t' --pset footer=off
That works in most cases, except when your column values contain tabs
themselves.
I know that COPY() will escape tabs (as \t), and we can use that from psql
with the \copy command, but that does not include a header row of the
column names.
Which is a shame. \copy really should allow HEADER in the default format,
not just CSV format.

And it on the to-do list, just hasn't be done yet:

https://wiki.postgresql.org/wiki/Todo#COPY

It seems like it should be fairly easy to do.
Post by Abelard Hoffman
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?
I just assume that none of the column names need escaping, and so select
and join them on tabs. At one point I had a perl script that would do this
for me, e.g. given a query, it would execute it once with a 'and 1=0' at
the end (obviously can't be done legally/efficiently/safely with all
queries) to get the column names, then again in a \COPY to get the data,
but I seem to have misplaced it.

It worked well as long as you understood it was a dirty hack and so had the
limitations of one.


Cheers,

Jeff
Abelard Hoffman
2014-09-08 18:39:43 UTC
Permalink
[snip]
Post by Abelard Hoffman
I know that COPY() will escape tabs (as \t), and we can use that from psql
Post by Abelard Hoffman
with the \copy command, but that does not include a header row of the
column names.
Which is a shame. \copy really should allow HEADER in the default format,
not just CSV format.
https://wiki.postgresql.org/wiki/Todo#COPY
It seems like it should be fairly easy to do.
Ah, excellent. That would solve most of my issue.
Post by Abelard Hoffman
So, my question is, what's the simplest way to generate tab-escaped
Post by Abelard Hoffman
TSV-formatted reports with the first line containing the list of column
names?
I just assume that none of the column names need escaping, and so select
and join them on tabs. At one point I had a perl script that would do this
for me, e.g. given a query, it would execute it once with a 'and 1=0' at
the end (obviously can't be done legally/efficiently/safely with all
queries) to get the column names, then again in a \COPY to get the data,
but I seem to have misplaced it.
It worked well as long as you understood it was a dirty hack and so had
the limitations of one.
Yes, that's exactly what I need, although I'm loathe to run the queries
twice just to get the headers.

Thanks to everyone for their comments and suggestions.

As additional background, I have a dozen or so "reports" that exist as
plain text files, and they're just run by cron with psql and mailed to
those who are interested in them. Historically, all of them have been TSV.
It's worked fine since all of the data has been computed (e.g., sales
counts, etc.). But with a recent report, we're including a customer
feedback comment, which obviously can have tabs and newlines, etc.
Post by Abelard Hoffman
I also took a look at the psql source. It doesn't look like it would be
Post by Abelard Hoffman
very hard to add some backslash escaping logic as an option. Am I the
only one that would use that? For reports, everyone else mostly uses
other tools? I'd like to stay away from GUI-tools, if possible.
http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html
I was meaning, it seems like it would be useful to have psql be able to
behave like COPY but without having to call \copy.
So IOW, something like:

psql --no-align --field-separator '\t' --field-escape

Where the --field-escape option would cause the data to be escaped exactly
like COPY does. Having the HEADER option to COPY's default format will
largely solve this though. Then it's just a matter of slurping in the
report files, making sure all newlines are removed, and passing them to the
appropriate \copy command.
Melvin Davidson
2014-09-08 00:17:52 UTC
Permalink
id fld_1
1 test\tvalue
2 test\tvalue
3 >test\tvalue
*Does this work for you?copy (SELECT id, replace(fld_1, ' ', '\t') FROM
tsv_test) to stdout with csv header delimiter ' ';query returned copy
data:id replace1 test\tvalue2 test\tvalue3 test\tvalue*
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Abelard Hoffman
2014-09-08 21:46:16 UTC
Permalink
Post by Melvin Davidson
id fld_1
1 test\tvalue
2 test\tvalue
3 >test\tvalue
*Does this work for you?copy (SELECT id, replace(fld_1, ' ', '\t') FROM
tsv_test) to stdout with csv header delimiter ' ';*
Thank you, Melvin. Yes, I think that's what I'll do for the time-being,
until COPY's TSV format supports HEADER.
Loading...