Discussion:
Beyond the 1600-column limit on Windows
Evandro's mailing lists (Please, don't send personal messages to this address)
2005-11-08 11:58:27 UTC
Hi guys,
I would like to know if it is possible to have more than 1600 columns on
Windows without recompiling Postgres.
Regards
-Evandro

--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Richard Huxton
2005-11-08 12:10:27 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600 columns on
Windows without recompiling Postgres.
I don't think so. Are you sure you need more than 1600 columns? That's
many more than I've ever wanted or needed.

If you can share some details of the problem you are trying to solve,
perhaps someone can see a different solution for you.
--
Richard Huxton
Archonet Ltd

Tino Wildenhain
2005-11-08 12:50:54 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600 columns on
Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
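
For illustration only, a minimal sketch of such an n:m layout (all table
and column names here are invented, not taken from any actual schema):

CREATE TABLE observation (
    id SERIAL PRIMARY KEY
);

CREATE TABLE variable (
    id   SERIAL PRIMARY KEY,
    name TEXT UNIQUE          -- e.g. 'beak-colour'
);

-- One row per (observation, variable) pair instead of one column per
-- variable, so the number of variables is no longer capped at 1600.
CREATE TABLE observation_value (
    observation_id INT REFERENCES observation (id),
    variable_id    INT REFERENCES variable (id),
    value          TEXT,
    PRIMARY KEY (observation_id, variable_id)
);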

Evandro's mailing lists (Please, don't send personal messages to this address)
2005-11-08 13:14:03 UTC
I'm doing a PhD in data mining and I need more than 1600 columns. I got an
error message saying that I cannot use more than 1600 columns.
It is happening because I have to convert categorical values to binary,
creating new columns. Do you know if Oracle can handle it?
--
Evandro M Leite Jr.
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Mobile 079 068 70740 Office 023 8055 3644 Home 023 8055 9160
Post by Richard Huxton
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600 columns on
Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Richard Huxton
2005-11-08 13:48:26 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
I'm doing a PhD in data mining and I need more than 1600 columns. I got an
error message saying that I cannot use more than 1600 columns.
It is happening because I have to convert categorical values to binary,
creating new columns.
Perhaps you don't want a relational database at all if you are
stretching it to match your client application in this way. Do I have it
right that you have something like

Table: bird_sighting_facts (bird, category, value)
  1 | wingspan     | 120mm
  2 | beak-colour  | red
  3 | chest-colour | blue
...

And are converting it into:
expanded_bird_facts (bird, cat_wingspan, cat_beak_colour,
cat_chest_colour, ...)

In which case, since you'll almost certainly be throwing away any
relational integrity you had in the first case, I'd just throw a
lightweight wrapper around some dbfile files or similar.
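
For what it's worth, the expanded shape can also be derived on the fly
with conditional aggregates instead of being stored; a rough sketch,
using the invented names above:

SELECT bird,
       MAX(CASE WHEN category = 'wingspan'     THEN value END) AS cat_wingspan,
       MAX(CASE WHEN category = 'beak-colour'  THEN value END) AS cat_beak_colour,
       MAX(CASE WHEN category = 'chest-colour' THEN value END) AS cat_chest_colour
  FROM bird_sighting_facts
 GROUP BY bird;

Each CASE picks out one category per bird and MAX() collapses the NULLs
away, so the wide table never has to exist on disk.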

--
Richard Huxton
Archonet Ltd

Tom Lane
2005-11-08 14:56:28 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
I'm doing a PhD in data mining and I need more than 1600 columns.
I don't think so --- consider redesigning your data model instead.
For instance, maybe you could combine similar columns into an array.
Or split the table into an m:n linking structure. Even coming close
to that implementation limit suggests bad SQL design; if we thought
it was a realistic problem we would have increased it long ago...
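
A minimal sketch of the array idea, assuming numeric features (the names
are hypothetical, not anyone's actual schema):

CREATE TABLE case_features (
    case_id  SERIAL PRIMARY KEY,
    features FLOAT8[]   -- features[1] .. features[n]; n is not limited
                        -- to 1600 the way real columns are
);

INSERT INTO case_features (features) VALUES (ARRAY[1.0, 0.0, 0.0]);

SELECT case_id FROM case_features WHERE features[2] = 0.0;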

regards, tom lane

Tino Wildenhain
2005-11-08 15:43:07 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
I'm doing a PhD in data mining and I need more than 1600 columns. I
got an error message saying that I cannot use more than 1600 columns.
It is happening because I have to convert categorical values to
binary, creating new columns. Do you know if Oracle can handle it?
Pardon, but as a PhD student you should be able to do sensible database
design. Even if you could have more than 1600 columns, you could not
expect very good performance with them (on nearly any database).

I'd strongly recommend replanning your table layout. You can get help
here if you provide more information on your plans.


Randal L. Schwartz
2005-11-08 13:30:07 UTC
[I would have replied to your personal address, but I'm not about
to copy it from a footer.]

Evandro's> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
Evandro's> error message saying that I cannot use more than 1600 columns.
Evandro's> It is happening because I have to convert categorical values to binary,
Evandro's> creating new columns. Do you know if Oracle can handle it?

/me boggles

You are doing a PhD in data mining, and you have a table that needs
more than 1600 columns?

/me gasps

What are they *teaching* these days?

If you have a design that has more than 20 or so columns, you're
probably already not normalizing properly. There just aren't *that*
many attributes of an object before you should start factoring parts of
it out, even if it means creating some 1-1 tables.

In programming, if I ever see someone name a sequence of variables,
like "thing1" and "thing2", I know there's going to be trouble ahead,
because that should have been a different data structure. Similarly,
I bet some of your columns are "foo1" and "foo2". Signs of brokenness
in the design.

Or do you really have 1600 *different* attributes, none of which have
a number in their name? That requires a serious amount of
creativity. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<***@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Evandro's mailing lists (Please, don't send personal messages to this address)
2005-11-08 17:01:32 UTC
Sorry,
It has nothing to do with normalisation. It is a program for scientific
applications.
Data values are broken into columns to allow multiple linear regression and
multivariate regression tree computations.
Even SPSS, the most well-known statistics software, uses the same approach and
data structure that my software uses.
Probably I should use another data structure, but it would not be as efficient
and practical as the one I use now.
Many thanks
-Evandro
Post by Randal L. Schwartz
"Evandro's" == Evandro's mailing lists (Please, don't send personal
[I would have replied to your personal address, but I'm not about
to copy it from a footer.]
Evandro's> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
Evandro's> error message saying that I cannot use more than 1600 columns.
Evandro's> It is happening because I have to convert categorical values to binary,
Evandro's> creating new columns. Do you know if Oracle can handle it?
/me boggles
You are doing a PhD in data mining, and you have a table that needs
more than 1600 columns?
/me gasps
What are they *teaching* these days?
If you have a design that has more than 20 or so columns, you're
probably already not normalizing properly. There just aren't *that*
many attributes of an object before you should start factoring parts of
it out, even if it means creating some 1-1 tables.
In programming, if I ever see someone name a sequence of variables,
like "thing1" and "thing2", I know there's going to be trouble ahead,
because that should have been a different data structure. Similarly,
I bet some of your columns are "foo1" and "foo2". Signs of brokenness
in the design.
Or do you really have 1600 *different* attributes, none of which have
a number in their name? That requires a serious amount of
creativity. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Richard Huxton
2005-11-08 17:40:14 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Sorry,
It has nothing to do with normalisation. It is a program for scientific
applications.
It has everything to do with normalisation. You appear to be pushing
application presentation issues into the structure of your database. If
SQL allowed you, this would break 1NF.
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Data values are broken into columns to allow multiple linear regression and
multivariate regression tree computations.
Sounds like you want an array then (or perhaps several arrays).
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Even SPSS, the most well-known statistics software, uses the same approach and
data structure that my software uses.
Ah - and they've made a good choice?
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Probably I should use another data structure, but it would not be as efficient
and practical as the one I use now.
The structure you use inside your application and the data definition
used by the database are two separate things. You are presumably doing
some transformation of the data on fetching it anyway - I'd switch
between rows and columns at that point.

--
Richard Huxton
Archonet Ltd

John D. Burger
2005-11-08 19:14:58 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
It has nothing to do with normalisation. It is a program for
scientific applications.
Data values are broken into columns to allow multiple linear regression
and multivariate regression tree computations.
Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
        42  True       False       False
        43  False      True        False
        44  False      False       True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.
Even SPSS, the most well-known statistics software, uses the same approach
and data structure that my software uses.
Probably I should use another data structure, but it would not be as
efficient and practical as the one I use now.
The point is that, if you want to use Postgres, this is not in fact
efficient and practical. In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
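
For instance, a sparse layout might store only the combinations that are
true, with absence meaning False (hypothetical names again):

CREATE TABLE instance_feature (
    instance_id INT  NOT NULL,
    feature     TEXT NOT NULL,   -- e.g. 'color=red', 'height=72'
    PRIMARY KEY (instance_id, feature)
);

INSERT INTO instance_feature VALUES (42, 'color=red');

-- All the features that hold for instance 42; everything absent is False.
SELECT feature FROM instance_feature WHERE instance_id = 42;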

- John D. Burger
MITRE

Jim C. Nasby
2005-11-09 00:18:36 UTC
Post by John D. Burger
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
It has nothing to do with normalisation. It is a program for
scientific applications.
Data values are broken into columns to allow multiple linear regression
and multivariate regression tree computations.
Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
        42  True       False       False
        43  False      True        False
        44  False      False       True
...
This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.
Even SPSS, the most well-known statistics software, uses the same approach
and data structure that my software uses.
Probably I should use another data structure, but it would not be as
efficient and practical as the one I use now.
The point is that, if you want to use Postgres, this is not in fact
efficient and practical. In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
s/Postgres/just about any database/

BTW, even if you're doing logic in the database that doesn't mean you
have to stick with the way you're representing things. There are ways to
get the same info via conventional SQL that don't involve building a
huge crosstab.
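
For example, assuming a one-row-per-true-feature table along the lines of
the instance_feature sketch earlier in the thread, a co-occurrence count
needs no crosstab at all:

SELECT count(*)
  FROM instance_feature a
  JOIN instance_feature b USING (instance_id)
 WHERE a.feature = 'color=red'
   AND b.feature = 'height=72';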

Something interesting is that the data structure presented here looks a
hell of a lot like a bitmap index, something new in 8.1 (well, at least
bitmap index scans).
--
Jim C. Nasby, Sr. Engineering Consultant ***@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Evandro's mailing lists (Please, don't send personal messages to this address)
2005-11-09 15:51:45 UTC
Yes, it is exactly that. I will follow your advice and create an abstraction
layer for the data access that will return the sparse dataset using the
standard dataset as input.
There is just one thing I disagree with: you said that the performance is not
good, and that is right. However, it is practical! Nothing is easier and more
practical than keeping the sparse representation inside the database for my
application.
Post by John D. Burger
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
It has nothing to do with normalisation. It is a program for
scientific applications.
Data values are broken into columns to allow multiple linear regression
and multivariate regression tree computations.
Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
        42  True       False       False
        43  False      True        False
        44  False      False       True
...
This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Even SPSS, the most well-known statistics software, uses the same approach
and data structure that my software uses.
Probably I should use another data structure, but it would not be as
efficient and practical as the one I use now.
The point is that, if you want to use Postgres, this is not in fact
efficient and practical. In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
- John D. Burger
MITRE
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Tino Wildenhain
2005-11-09 16:22:26 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Yes, it is exactly that. I will follow your advice and create an
abstraction layer for the data access that will return the sparse
dataset using the standard dataset as input.
There is just one thing I disagree with: you said that the performance is
not good, and that is right. However, it is practical! Nothing is easier
and more practical than keeping the sparse representation inside the
database for my application.
I bet even your application would profit from not handling sparse data.
You could just "not" insert them into your tree instead of having
to jump over empty elements.

And there is always a way to lazily abstract the data to some
frontend (while I doubt anybody can actually scroll wide enough on
a screen to see all 1600 columns ;-)


Chris Travers
2005-11-23 20:15:52 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Yes, it is exactly that. I will follow your advice and create an
abstraction layer for the data access that will return the sparse
dataset using the standard dataset as input.
There is just one thing I disagree with: you said that the performance is
not good, and that is right. However, it is practical! Nothing is easier
and more practical than keeping the sparse representation inside the
database for my application.
I think that you misunderstand the problems that come from doing it that
way.

The basic issue is this: the point of creating a system using an RDBMS
is to separate your data from your application. This allows for various
methods of data mining later. If you are presenting information in your
DB this way, you are breaking that assumption and so you get very little
(if any) benefit from using PostgreSQL instead of something like BDB.
Database design is usually about concise and unambiguous representation
of data, and normalization is a part of this. This allows various
applications or ad hoc queries to be able to draw against the data in
meaningful ways.

The traditional way of representing sparse data is to use a join.

CREATE TABLE bird (
    id          SERIAL PRIMARY KEY,
    description TEXT,
    bname       TEXT
);

CREATE TABLE bird_color (
    bird_id  INT REFERENCES bird (id),
    color    TEXT,
    is_color BOOL,
    UNIQUE (bird_id, color)
);

Now you can:

SELECT bname FROM bird
 WHERE (SELECT is_color FROM bird_color
         WHERE color = 'red' AND bird_id = bird.id);

The point is not that this makes it easier or harder from the standpoint
of your application, but that it makes it easier to build secondary
apps against the same data set, and that it avoids various ambiguities
that could result from secondary data entry/analysis apps.

People have suggested using an array, and that would be possible as well
(undecided whether this breaks first normal form), but this will
certainly cause more headaches with secondary apps. Please understand
that PostgreSQL is designed around a world where these secondary apps
inevitably get built, while a lot of commercial, off-the-shelf software
assumes that they won't be. This is why many of us question the DB
design of these apps.

I hope that this helps explain some of the why's of this thread.

Best Wishes,
Chris Travers
Metatron Technology Consulting

Tino Wildenhain
2005-11-08 15:45:12 UTC
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600
columns on Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting anything,
but 1600 seems skyrocket-high enough (read: unlimited :-)


Scott Marlowe
2005-11-08 16:06:17 UTC
Post by Tino Wildenhain
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600
columns on Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting anything,
but 1600 seems skyrocket-high enough (read: unlimited :-)
I'd have to vote with Tino here. Why worry about an arbitrary limit you
should never really be approaching anyway? If a table has more than
several dozen columns, you've likely missed some important step of
normalization. Once you near 100 columns, something is usually horribly
wrong. I cannot imagine having a table that actually needed 1600 or
more columns.

And, Evandro, nothing is free. If someone went to the trouble of
removing the limit of 1600, we'd probably pay in some other way, most
likely with poor performance. There are other, far more important
features to work on, I'd think.

Alex Stapleton
2005-11-08 16:42:26 UTC
Post by Scott Marlowe
Post by Tino Wildenhain
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600
columns on Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting anything,
but 1600 seems skyrocket-high enough (read: unlimited :-)
I'd have to vote with Tino here. Why worry about an arbitrary limit you
should never really be approaching anyway? If a table has more than
several dozen columns, you've likely missed some important step of
normalization. Once you near 100 columns, something is usually horribly
wrong. I cannot imagine having a table that actually needed 1600 or
more columns.
And, Evandro, nothing is free. If someone went to the trouble of
removing the limit of 1600, we'd probably pay in some other way, most
likely with poor performance. There are other, far more important
features to work on, I'd think.
Oh wait, PG is written in C, isn't it? I guess fixed-size things are a
bit easier to deal with. Pardon me then :)

Bruno Wolff III
2005-11-08 17:44:25 UTC
Post by Tino Wildenhain
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting anything,
but 1600 seems skyrocket-high enough (read: unlimited :-)
It is probably what fits in a single block.

Tom Lane
2005-11-08 16:07:05 UTC
Post by Tino Wildenhain
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting anything,
but 1600 seems skyrocket-high enough (read: unlimited :-)
The rationale is laid out in excruciating detail in src/include/access/htup.h:

/*
* MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
* The key limit on this value is that the size of the fixed overhead for
* a tuple, plus the size of the null-values bitmap (at 1 bit per column),
* plus MAXALIGN alignment, must fit into t_hoff which is uint8. On most
* machines the upper limit without making t_hoff wider would be a little
* over 1700. We use round numbers here and for MaxHeapAttributeNumber
* so that alterations in HeapTupleHeaderData layout won't change the
* supported max number of columns.
*/
#define MaxTupleAttributeNumber 1664 /* 8 * 208 */

/*----------
* MaxHeapAttributeNumber limits the number of (user) columns in a table.
* This should be somewhat less than MaxTupleAttributeNumber. It must be
* at least one less, else we will fail to do UPDATEs on a maximal-width
* table (because UPDATE has to form working tuples that include CTID).
* In practice we want some additional daylight so that we can gracefully
* support operations that add hidden "resjunk" columns, for example
* SELECT * FROM wide_table ORDER BY foo, bar, baz.
* In any case, depending on column data types you will likely be running
* into the disk-block-based limit on overall tuple size if you have more
* than a thousand or so columns. TOAST won't help.
*----------
*/
#define MaxHeapAttributeNumber 1600 /* 8 * 200 */

regards, tom lane

Alex Stapleton
2005-11-08 13:31:32 UTC
Post by Tino Wildenhain
Post by Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,
I would like to know if it is possible to have more than 1600
columns on Windows without recompiling Postgres.
I would like to know who on earth needs 1600 columns, or even more?
Hint: you can have practically unlimited rows in your n:m table :-)
Well, this screams random arbitrary limit to me. Why does this limit
exist? Whatever happened to the holy 0,1,infinity triumvirate?
