Discussion:
Postgres as key/value store
snacktime
2014-09-27 23:48:06 UTC
I'm looking for some feedback on the design I'm using for a basic
key/value store on Postgres.

Just some quick background: this design is for large-scale games that
can reach 10K writes per second or more. The storage will sit behind a
distributed memory cache built on top of Akka, with a write-behind
caching mechanism to cut down on the number of writes when the same key
is updated many times in a short period, which is common in a lot of
multiplayer games.

I have been using Couchbase, but this is an open source project, and
Couchbase is for all intents and purposes a commercial product, which
is problematic. I will still support Couchbase, but I don't want to
have to tell people that if you really want to scale, Couchbase is the
only option.

The schema is that a key is a string and the value is a string or
binary. I am actually storing protocol buffer messages, but the
library gives me the ability to serialize to native protobuf or to
JSON. JSON is useful at times, especially for debugging.

This is my current schema:

CREATE TABLE entities
(
    id character varying(128) NOT NULL,
    value bytea,
    datatype smallint,
    CONSTRAINT entities_pkey PRIMARY KEY (id)
);

CREATE OR REPLACE RULE entities_merge AS
ON INSERT TO entities
WHERE (EXISTS (SELECT 1
               FROM entities entities_1
               WHERE entities_1.id::text = new.id::text))
DO INSTEAD
    UPDATE entities
       SET value = new.value, datatype = new.datatype
     WHERE entities.id::text = new.id::text;
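
With the rule in place, callers just issue plain INSERTs, and a repeat
write to the same key becomes an update. A quick illustration (the key
and values here are made up):

-- The first write inserts the row; the second trips the rule and
-- updates in place instead of inserting a duplicate.
INSERT INTO entities (id, value, datatype)
    VALUES ('players:1001', '\x0801'::bytea, 1);
INSERT INTO entities (id, value, datatype)
    VALUES ('players:1001', '\x0802'::bytea, 1);

SELECT id, datatype FROM entities WHERE id = 'players:1001';
-- one row, holding the latest value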

Additional functionality I want is basic fuzzy searches by key.
Currently I'm using a left-anchored LIKE query. This works well because
keys are prefixed with a scope, a delimiter, and then the actual key
for the data. These fuzzy searches would never be used in game logic;
they would be admin-only queries for things like obtaining a list of
players, so they should be infrequent.
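For reference, the admin lookup is shaped like this (the scope name is
made up). One thing to watch: under a non-C locale the primary key
index won't serve LIKE, so a separate varchar_pattern_ops index may be
needed:

-- Supports left-anchored LIKE when the database locale is not C.
CREATE INDEX entities_id_prefix ON entities (id varchar_pattern_ops);

-- Admin-only query: every key in the (hypothetical) "players" scope.
SELECT id FROM entities WHERE id LIKE 'players:%';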

The scope of the query ability will not expand in the future. I support
multiple backends for the key/value storage so I'm working with the lowest
common denominator. Plus I have a different approach for data that you
need to do complex queries on (regular tables and an ORM).

Chris
Gavin Flower
2014-09-28 00:33:53 UTC
Post by snacktime
I'm looking for some feedback on the design I'm using for a basic
key/value store on Postgres.
[...]
CREATE TABLE entities
(
    id character varying(128) NOT NULL,
    value bytea,
    datatype smallint,
    CONSTRAINT entities_pkey PRIMARY KEY (id)
);
[...]
Chris
Note:
I suspect that the suggestions below will probably NOT improve
performance, and they may not be appropriate for your use case.
However, they may facilitate a wider range of queries, and might be
easier to understand.

Note the comment about using 'PRIMARY KEY' in
http://www.postgresql.org/docs/9.2/static/sql-createtable.html

[...]
The primary key constraint specifies that a column or columns of a
table can contain only unique (non-duplicate), nonnull values.
Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT
NULL, but identifying a set of columns as primary key also provides
metadata about the design of the schema, as a primary key implies
that other tables can rely on this set of columns as a unique
identifier for rows.
[...]


My first thought was to simplify the table create, though I think the
length check on the id is best done in the software updating the
database:

CREATE TABLE entities
(
    id text PRIMARY KEY,
    value bytea,
    datatype smallint,
    CONSTRAINT id_too_long CHECK (length(id) <= 128)
);

Then I noticed that your id is actually a compound key, and would
probably be better modelled as:

CREATE TABLE entities
(
    scope text,
    key text,
    value bytea,
    datatype smallint,
    CONSTRAINT entities_pkey PRIMARY KEY (scope, key)
);
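
With the compound key, the admin "fuzzy" lookup becomes a plain
equality on scope (plus, if needed, a prefix match on key), e.g.:

-- All keys in one scope; this is served by the leading column of the
-- primary key index.
SELECT scope, key FROM entities WHERE scope = 'players';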

I suspect that making 'datatype' an 'int' would improve performance, but
only by a negligible amount!


Cheers,
Gavin
Thomas Kellerer
2014-09-28 08:44:56 UTC
Post by snacktime
I'm looking for some feedback on the design I'm using for a basic
key/value store on Postgres.
Are you aware of Postgres' "native" key/value store: hstore?
http://www.postgresql.org/docs/current/static/hstore.html

Or Postgres JSON support?

Especially the new JSONB in the upcoming 9.4 release, which makes
indexing JSON documents much easier and faster.

I would expect both solutions to be much faster than your entity-attribute-value design.
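
For illustration, a minimal sketch of the json variant (table and key
names are made up; with 9.4's jsonb the column type changes and a GIN
index becomes possible):

-- Store the document as json instead of bytea.
CREATE TABLE entities_json
(
    id text PRIMARY KEY,
    value json
);

INSERT INTO entities_json
    VALUES ('players:1001', '{"name": "alice", "level": 7}');

-- Pull a single field out of the stored document (json operators
-- are available in 9.3+):
SELECT value ->> 'name' FROM entities_json WHERE id = 'players:1001';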

Thomas
Jonathan Vanasco
2014-09-29 23:20:08 UTC
Post by snacktime
The schema is that a key is a string and the value is a string or
binary. I am actually storing protocol buffer messages, but the
library gives me the ability to serialize to native protobuf or to
JSON. JSON is useful at times, especially for debugging.
I don't know if this will apply to you, but I saw significant speed
improvements on Postgres key searches by using substring indexes.

If your keys are just random hashes, this should work well for you.
If your keys are human-readable, it's probably not going to work as
well, since the distribution of prefix characters will likely be too
uniform.

But the general idea is twofold:

1. create an additional expression index on a prefix of the key field:

   CREATE INDEX _entities_id__subst_7 ON entities (substr(id, 1, 7));

2. update your SELECTs to search on both the full string AND the
   substring:

   - WHERE id = :id
   + WHERE (id = :id) AND (substr(id, 1, 7) = substr(:id, 1, 7))

By adding in the substring comparison, the planner will (usually)
optimize the select by doing a first pass on the substring index, then
searching that limited set for the rest of the matching criteria.
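
Put together, with the bind parameter spelled out as a literal (key
format as in the original post, so the key itself is made up):

-- The rewritten lookup, using the index from step 1 above.
SELECT value, datatype
  FROM entities
 WHERE id = 'players:1001'
   AND substr(id, 1, 7) = substr('players:1001', 1, 7);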

On a table with 4MM+ records, introducing a substring index/query
improved my searches by a few orders of magnitude.

Before trying this indexing strategy, we were actively looking to
migrate this particular query service off of Postgres -- it was such a
bottleneck and was not scalable. Now there is no reason to leave in the
foreseeable future.
Post by snacktime
This works well because keys are prefixed with a scope, a delimiter,
and then the actual key for the data.
If you're able to standardize the scope out, an index on
(scope, substr(key, 1, 7)) might work well.
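
A sketch of that index, assuming the two-column (scope, key) layout
suggested earlier in the thread:

-- Composite expression index: equality on scope, prefix match on key.
CREATE INDEX entities_scope_key_prefix
    ON entities (scope, substr(key, 1, 7));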

I only used (1, 7) as my substr arguments because that was an optimal
speed/space mix on my dataset. Depending on yours, a shorter or longer
prefix might be more appropriate.