Extending SQL in C using VARIABLE length type

Discussion:

Carsten Kropf

2010-02-10 08:32:58 UTC

Hello everybody,
I am quite a novice in using the extension features of the PostgreSQL database. Actually, I have to do this for work at the university. At the moment, I am trying around a little bit with creating my own types using shared objects, written in C. The usage of static types with fixed length was actually no problem for me, so I proceeded to variable length types.
I created an n-dimensional point structure called "PointND" that contains a field of float8 values of dynamic length. I also put in a int4/int32 field for the length specification, as required by the documentation. So the structure looks like the following:
struct PointND
{
int32 dimensions;
float8 coordinates[1];
};
I hope, that at least this layout is as it is required. Together with this type, I also provide the required in and out functions to convert the structure to the internal/external representation properly.
The in/out functions work properly when giving the following statement (I also tried it using a debugger):
select '(4,5,6)'::pointnd;
pointnd
--------------------------------
(4.000000, 5.000000, 6.000000)
(1 row)

So it seems, that at least these functions do what they are supposed to.
The problem I have is that if I now create a table that should store entries of the type pointnd, it won't store them actually. If I do an insert like the following:
insert into test (point) values ('(5,16,6)'::pointnd);
INSERT 0 1

I get the feedback that one new row has been created. Actually this row has been created and the in function is also called (I also checked this using the debugger). Now, I would have expected something like the following, when querying the table:
select * from test;
point
--------------------------------
(5.000000, 16.000000, 6.000000)

But, actually I get the following:
select * from test;
point
--------------------------------
(0.000000, 0.000000, 0.000000)

The SQL-Script used to create the type can be seen here:
CREATE TYPE pointnd(
INTERNALLENGTH = VARIABLE,
ALIGNMENT=DOUBLE,
INPUT=pointnd_in,
OUTPUT=pointnd_out,
RECEIVE=pointnd_recv,
SEND=pointnd_send,
STORAGE=PLAIN
);

I played around with the parameters a little bit, but still don't know where this behaviour comes from. Actually, I was thinking that I conform to the requirements given by Postgres after having read the documentation. Storage type set to another method (like MAIN) will result in a segmentation fault, though.
I would be very glad, if somebody could provide me some help to this issue because I could proceed with my "actual" work, after that.
Thank you in advance

Best regards

Carsten Kropf

Yeb Havinga

2010-02-10 10:39:24 UTC

Permalink

Post by Carsten Kropf
The usage of static types with fixed length was actually no problem
for me, so I proceeded to variable length types.
I created an n-dimensional point structure called "PointND" that
contains a field of float8 values of dynamic length. I also put in a
int4/int32 field for the length specification, as required by the
struct PointND
{
int32 dimensions;
float8 coordinates[1];
};

The structure should begin with a int32 vl_len_ header. At creation /
palloc time the size must be set with the SET_VARSIZE macro, and the
size can be queried with the VARSIZE_ * macros -> doxygen.postgresql.org
is your friend here. Take a look at e.g. contrib/cube for examples.

regards,
Yeb Havinga

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Carsten Kropf

2010-02-10 10:51:36 UTC

Permalink

Actually, I thought, I did this using the int32 variable called "dimension" which should be exactly this field. Unfortunately, it seems, that something is wrong here. I'll look inside the code of cube to determine the things I'm doing wrong, currently. Thanks so far for your advice.
My in-method is the following:
Datum pointnd_in(PG_FUNCTION_ARGS)
{
PointND * in = NULL;
// access the format string
char * in_string = PG_GETARG_CSTRING(0);
char * save = in_string;
// 1024 bytes should be sufficient for one coordinate
char curr_buffer[1024];
bool corrupted = false;
float8 * coordinates = NULL;
unsigned int dimensions = 0, i = 0;

coordinates = (float8 *) palloc(sizeof(float8));

// allocate the memory
// read bytewise and count the ',' in order to determine the amount of dimensions, after that: parse the point
in_string = find_char(in_string, '(');
// next sign after opening bracket
++in_string;
// read as long, as no closing bracket has been found
for (dimensions = 0; *in_string != ')' && !corrupted; ++dimensions)
{
// clear the memory
memset(curr_buffer, 0, 1024);
for (i = 0; *in_string != ',' && *in_string != 0 && *in_string != ')'; ++in_string, ++i)
{
if ((*in_string < '0' || *in_string > '9') && *in_string != '.')
{
corrupted = true;
break;
}
// copy current sign
curr_buffer[i] = *in_string;
}

// something has happened here (no valid number)
if (corrupted)
{
break;
}
coordinates = (float8 *) repalloc(coordinates, (dimensions + 1) * sizeof(float8));
//sscanf(curr_buffer, "%f", &coordinates[dimensions]);
coordinates[dimensions] = strtod(curr_buffer, NULL);

// if we have a comma here, skip it
if (*in_string == ',')
{
++in_string;
}
if (*in_string == ' ')
{
// skip space
++in_string;
}
}

// something lead to a corruption of the point
if (corrupted)
{
ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("invalid representation of a point: %s, has to look like \"(<coord1>,<coord2>,...)\"", save)));
}
else
{
in = (PointND *) palloc(sizeof(float8) * dimensions + VARHDRSZ);
SET_VARSIZE(in, dimensions);
// copy the coordinates to the data area
memcpy((void *) VARDATA(in), (void *) coordinates, (dimensions) * sizeof(float8));
}

PG_RETURN_POINTER(in);
}

So, probably in here, I am doing something terribly wrong and you could correct me at this point. As I already said, I thought that my int32 dimensions variable would represent exactly this vl_len_header field. The remaining stuff is OK then? That means the creation script and information about storage alignment and layout or do you detect addititonal errors?
I know, that this code might not be the best, but I am still in the testing phase of how to achieve sth using Postgres with C.
Thanks so far for your advice
Best regards
Carsten Kropf

Post by Carsten Kropf
The usage of static types with fixed length was actually no problem for me, so I proceeded to variable length types.
struct PointND
{
int32 dimensions;
float8 coordinates[1];
};

The structure should begin with a int32 vl_len_ header. At creation / palloc time the size must be set with the SET_VARSIZE macro, and the size can be queried with the VARSIZE_ * macros -> doxygen.postgresql.org is your friend here. Take a look at e.g. contrib/cube for examples.
regards,
Yeb Havinga

Yeb Havinga

2010-02-10 11:04:09 UTC

Permalink

Post by Carsten Kropf
Actually, I thought, I did this using the int32 variable called
"dimension" which should be exactly this field.

yes.

Post by Carsten Kropf
in = (PointND *) palloc(sizeof(float8) * dimensions + VARHDRSZ);
SET_VARSIZE(in, dimensions);

What about

len = sizeof(float8) * dimensions + VARHDRSZ;
in = (PointND *) palloc0(len);
SET_VARSIZE(in, len);

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Carsten Kropf

2010-02-10 11:09:18 UTC

Permalink

Oh, I see, does the VARSIZE length field have to be the total number of bytes occupied (including VARHDRSZ and the size of the structure) or only the size that is used by "my" datatype? Then it would become pretty much obvious, why this is not supposed to work.
I'll try it out then.

regards
Carsten Kropf

Post by Yeb Havinga

Post by Carsten Kropf
Actually, I thought, I did this using the int32 variable called "dimension" which should be exactly this field.

yes.

Post by Carsten Kropf
in = (PointND *) palloc(sizeof(float8) * dimensions + VARHDRSZ);
SET_VARSIZE(in, dimensions);

What about
len = sizeof(float8) * dimensions + VARHDRSZ;
in = (PointND *) palloc0(len);
SET_VARSIZE(in, len);
--
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Yeb Havinga

2010-02-10 11:20:46 UTC

Permalink

Post by Carsten Kropf
Oh, I see, does the VARSIZE length field have to be the total number of bytes occupied (including VARHDRSZ and the size of the structure) or only the size that is used by "my" datatype?

Yes

Post by Carsten Kropf
Then it would become pretty much obvious, why this is not supposed to work.
I'll try it out then.

My €0,02: rename the dimensions to vl_len_ to avoid confusion and get
compiler errors where you now use 'dimension'. Add a macro that converts
a pointnd structure to dimension int by taking the VARSIZE_ANY_EXHDR /
sizeof(float8) and use it where dimension is used now. Or if your
database is small you could keep dimension in the structure.

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Carsten Kropf

2010-02-10 12:17:37 UTC

Permalink

Thanks for the hint according to the cube, this was actually exactly what I have been looking for. I wanted to do something similar like the cube but I didn't think that it would be implemented in multiple dimension. I just thought, the cube were a 3-d construct, but as I see in the sources, it is, in fact, n-dimensional. So my problems are solved here. Btw I could manage to get my own point to be saved in the database using your hints, thanks for this.

regards
Carsten Kropf

Yes

Post by Carsten Kropf
Then it would become pretty much obvious, why this is not supposed to work.
I'll try it out then.

My €0,02: rename the dimensions to vl_len_ to avoid confusion and get compiler errors where you now use 'dimension'. Add a macro that converts a pointnd structure to dimension int by taking the VARSIZE_ANY_EXHDR / sizeof(float8) and use it where dimension is used now. Or if your database is small you could keep dimension in the structure.
--
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Carsten Kropf

2010-02-11 07:54:12 UTC

Permalink

Thanks a lot so far. I adopted my structures and am now storing two fields (v_len_ and dimensions) and the storage is now working properly. If I now would try to combine two of these points to a range (like cube) including an upper and a lower bound n-dimensional point structure, I don't get the point how to achieve this.
I tried around a little bit and ended up with a structure like the following:
/**
* basic structure definition for a range containing an upper and a lower point (in multiple dimensions)
*/
struct Range
{
int32 v_len_;
/**
* the upper limit in each dimension
*/
struct PointND * upper;
/**
* the lower limit in each dimension
*/
struct PointND * lower;
};
However, the problem is again, how to put this range into a table. Actually, I don't know exactly, how to do this, I tried the following:
len = VARSIZE(upper) + VARSIZE(lower) + VARHDRSZ + 2 * sizeof(struct Point *);
result = (Range *) palloc0(len);
// result->upper = upper;
// result->lower = lower;
memcpy((void *) result->upper, (void *) upper, VARSIZE(upper));
memcpy((void *) result->lower, (void *) lower, VARSIZE(lower));

// set the var size
SET_VARSIZE(result, len);
But this didn't do the trick. I did not yet find sth in the code of postgres, how to build such a combined type, unfortunately (or I did not look at the right places until now). How would one do this?

Thanks in advance
regards
Carsten Kropf

Yes

Post by Carsten Kropf
Then it would become pretty much obvious, why this is not supposed to work.
I'll try it out then.

My 0,02: rename the dimensions to vl_len_ to avoid confusion and get compiler errors where you now use 'dimension'. Add a macro that converts a pointnd structure to dimension int by taking the VARSIZE_ANY_EXHDR / sizeof(float8) and use it where dimension is used now. Or if your database is small you could keep dimension in the structure.
--
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

2010-02-11 15:39:04 UTC

Permalink

Post by Carsten Kropf
Thanks a lot so far. I adopted my structures and am now storing two fields (v_len_ and dimensions) and the storage is now working properly. If I now would try to combine two of these points to a range (like cube) including an upper and a lower bound n-dimensional point structure, I don't get the point how to achieve this.

Well, you can either make that a separate data type with its own
specialized functions for extracting the points, or you can make it into
an array, or possibly a composite type (record). The example code seems
to be headed towards the first of these but maybe you should consider
the alternatives.

The main problem with your example code seems to be that it's confusing
a struct with a pointer to a struct. If you made the struct members
be "struct PointND" rather than pointer to same, it would probably
work, as long as points aren't actually variable-length. Otherwise
you'd need to deal with the fact that "lower" isn't really at a fixed
offset in the larger struct.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Carsten Kropf

2010-02-11 16:38:21 UTC

Permalink

Thanks for this hint, I already got it to work in the meantime.
My approach now (based on the fact, that PointND is indeed a variable length type) is to have the following structure:
struct Range
{
int vl_len_;
struct PointND limits[1];
};
whereas now vl_len_ stores the total size of the structure (including the size of the limiting points) and the points themselves keep the layout contained in one of my previous mails.
I programmed some macros to access the lower point (limits[0]) and the upper point (limits[1]), respectively.
This approach works also when putting it into a table, whereas it will not be like this, if I will be doing the whole thing for productive implementation. Each of the two points contains a vl_len_ field and a dimension field whereas the dimensions have to be the same for each point. So, according the storage of the range structure, this would yield an overhead of 2 * vl_len_ and once the dimension field each of which is int32. So, in total, we would have an overhead of 12 Bytes for each Range stored somewhere compared to a Range that has just one-dimensional offsets to show the upper/lower bound, as seen in cube.
regards
Carsten Kropf

Post by Tom Lane

Well, you can either make that a separate data type with its own
specialized functions for extracting the points, or you can make it into
an array, or possibly a composite type (record). The example code seems
to be headed towards the first of these but maybe you should consider
the alternatives.
The main problem with your example code seems to be that it's confusing
a struct with a pointer to a struct. If you made the struct members
be "struct PointND" rather than pointer to same, it would probably
work, as long as points aren't actually variable-length. Otherwise
you'd need to deal with the fact that "lower" isn't really at a fixed
offset in the larger struct.
regards, tom lane
--
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Continue reading on narkive:

Search results for 'Extending SQL in C using VARIABLE length type' (Questions and Answers)

replies

can i get question answer of asp.net ?

started 2006-10-11 00:02:47 UTC

software

replies

acessing cached pages?

started 2006-07-07 05:56:14 UTC

computers & internet

replies

object oriented DBMS?