
citusdata / cstore_fdw

1.7K stars · 118 watchers · 170 forks · 821 KB

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.

License: Apache License 2.0

Languages: C 95.41%, PLpgSQL 3.70%, Makefile 0.90%
Topics: postgresql, compression, columnar-store, columnar-storage

cstore_fdw's People

Contributors

begriffs, clairegiordano, craigkerstiens, domoritz, jasonmp85, jeffwidman, kostiantyn-nemchenko, marcocitus, marcoslot, martin-loetzsch, mtuncer, onurctirtir, orgrim, pykello, samay-sharma, serprex


cstore_fdw's Issues

select: stripe footer column count and table column count don't match

Hello,

I set up the TPC-H benchmark database in cstore_fdw and inserted the data. So far, no errors. But when I run a simple select like

select * from region;

or

select count(*) from region;

I get the following error:

ERROR: stripe footer column count and table column count don't match
STATEMENT: select * from region;
ERROR: stripe footer column count and table column count don't match

What am I missing?

Kind regards
Nasadows

fail to execute truncate and copy with column list

When I used Kettle's bulk load to copy data into a cstore table, it failed. The SQL sent by Kettle's bulk load looks like the following.

postgres=# truncate cstb1;
ERROR: "cstb1" is not a table

postgres=# copy cstb1(id,name) from stdin with csv;
ERROR: copy column list is not supported

Could these two statements be supported in the future?
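
In the meantime, a possible workaround (a sketch, assuming a cstore_fdw version that supports INSERT INTO ... SELECT; the column types are assumed) is to stage the rows in a regular table, where a column list is allowed:

    -- Stage the incoming rows in a plain table; COPY with a column list works here.
    CREATE TEMPORARY TABLE cstb1_stage (id integer, name text);
    COPY cstb1_stage (id, name) FROM STDIN WITH CSV;

    -- Bulk-move the staged rows into the cstore foreign table, then clean up.
    INSERT INTO cstb1 SELECT id, name FROM cstb1_stage;
    DROP TABLE cstb1_stage;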

./cstore.pb-c.h:7:10: fatal error: 'protobuf-c/protobuf-c.h' file not found

Hi, I'm trying to install cstore_fdw and getting the following error:

gcc -mmacosx-version-min=10.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv --std=c99 -I. -I. -I/Applications/Postgres93.app/Contents/MacOS/include/postgresql/server -I/Applications/Postgres93.app/Contents/MacOS/include/postgresql/internal -I/Applications/Postgres93.app/Contents/MacOS/include/libxml2 -I/Applications/Postgres93.app/Contents/MacOS/include -c -o cstore.pb-c.o cstore.pb-c.c
In file included from cstore.pb-c.c:9:
./cstore.pb-c.h:7:10: fatal error: 'protobuf-c/protobuf-c.h' file not found

#include <protobuf-c/protobuf-c.h>

     ^

1 error generated.
make: *** [cstore.pb-c.o] Error 1

I am attempting this on my MBP (OS X) laptop. I have installed Postgres using Postgres.app (version 9.3.1.0 (18)).

I installed the protobuf libraries, and they seem to have installed without any errors:

brew info protobuf-c
protobuf-c: stable 1.0.2
https://github.com/protobuf-c/protobuf-c
/usr/local/Cellar/protobuf-c/1.0.2 (10 files, 304K) *
Built from source
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/protobuf-c.rb
==> Dependencies
Build: pkg-config ✔
Required: protobuf ✔

I am running OS X Yosemite (10.10)

Install on Mac: doesn't create the cstore file, and can't insert or analyze

The table is created and I can see it in pgAdmin. When I run ANALYZE, it gives me the following error:

ERROR: could not stat file "/Users/tim/var/cstore/white_box_development_probes_scstore.cstore": No such file or directory

The directory is there but there is no file. Should the file be created when I create the table?

OSX 10.9: ld: library not found for -lprotobuf-c

I'm getting this error when trying to make install cstore_fdw on OSX 10.9.2.

clang -I/usr/local/Cellar/ossp-uuid/1.6.2/include -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv  -bundle -multiply_defined suppress -o cstore_fdw.so cstore.pb-c.o cstore_fdw.o cstore_writer.o cstore_reader.o cstore_metadata_serialization.o -L/usr/local/Cellar/postgresql/9.3.4/lib -L/usr/local/Cellar/ossp-uuid/1.6.2/lib  -Wl,-dead_strip_dylibs   -lprotobuf-c -bundle_loader /usr/local/Cellar/postgresql/9.3.4/bin/postgres
ld: library not found for -lprotobuf-c
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Setup:

  • brew install protobuf-c
  • PostGres 9.3.4 installed with brew
  • PATH: /usr/local/Cellar/postgresql/9.3.4/bin/ /usr/bin /bin /usr/sbin /sbin /usr/local/bin

I have tried
sudo ln -s /usr/local/Cellar/protobuf-c/0.15/lib/libprotobuf-c.0.dylib /usr/lib/libprotobuf-c.dylib
but that doesn't help.

I have also tried adding the protobuf-c directory to both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH, but that doesn't seem to work either.

Help!

High-concurrency (>8) queries crash PostgreSQL (cstore_fdw)

I ran a test with 8 threads and encountered a server crash. With 4 threads it is okay.

See the log below:

postgres=# LOG: server process (PID 29409) was terminated by signal 9: Killed
DETAIL: Failed process was running: select * from jaelog limit 1 offset $1
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Possible bug in LoadStripeSkipList()

LoadStripeSkipList(FILE *tableFile, StripeMetadata *stripeMetadata,
                   StripeFooter *stripeFooter, uint32 columnCount,
                   Form_pg_attribute *attributeFormArray)
{
    ...
    /* table contains additional columns added after this stripe is created */
    for (columnIndex = stripeColumnCount; columnIndex < columnCount; columnIndex++)
    {
        ColumnBlockSkipNode *columnSkipList = NULL;
        uint32 blockIndex = 0;

        /* create empty ColumnBlockSkipNode for missing columns */
        columnSkipList = palloc0(stripeBlockCount * sizeof(ColumnBlockSkipNode));

        for (blockIndex = 0; blockIndex < stripeBlockCount; blockIndex++)
        {
            columnSkipList->rowCount = 0;
            columnSkipList->hasMinMax = false;
            columnSkipList->minimumValue = 0;
            columnSkipList->maximumValue = 0;
            columnSkipList->existsBlockOffset = 0;
            columnSkipList->valueBlockOffset = 0;
            columnSkipList->existsLength = 0;
            columnSkipList->valueLength = 0;
            columnSkipList->valueCompressionType = COMPRESSION_NONE;
        }
        blockSkipNodeArray[columnIndex] = columnSkipList;
    }
    ...
}

The inner loop appears to be the issue: every assignment dereferences columnSkipList directly instead of indexing columnSkipList[blockIndex], so only the first skip node is ever written. Since palloc0 already zeroes the array, the loop is largely redundant, and the COMPRESSION_NONE assignment reaches only the first element.

DROP FOREIGN TABLE doesn't fully delete the table's data

So I create a simple table:

CREATE FOREIGN TABLE customer_reviews
(
    customer_id TEXT
)
SERVER cstore_server
OPTIONS(filename '/usr/local/opt/postgresql/cstore/customer_reviews.cstore', compression 'pglz');

Then I insert 2 rows using COPY.
Count gives me:

select count(*) from customer_reviews;
 count 
-------
     2
(1 row)

and finally do

DROP FOREIGN TABLE customer_reviews;

/usr/local/opt/postgresql/cstore/customer_reviews.cstore doesn't get deleted.

And now if I re-create the table and do select count(*) (before inserting any data), I get 2 as a result.
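
A hedged stopgap until this is fixed: remove the leftover .cstore file (and any companion .footer file) at the OS level before re-creating the table, or point the re-created table at a fresh path so the stale rows cannot resurface (the path below is hypothetical):

    CREATE FOREIGN TABLE customer_reviews
    (
        customer_id TEXT
    )
    SERVER cstore_server
    OPTIONS(filename '/usr/local/opt/postgresql/cstore/customer_reviews_v2.cstore', compression 'pglz');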

Altering cstore table column data type might fail

Create a cstore table and load a row into it:

create foreign table foo (a int) server cstore_server;
insert into foo select 1;
select a from foo;

We get

 a 

---
 1
(1 row)

Now change the column type, this time we are lucky:

alter table foo alter column a type date;
select a from foo;

And we get

     a      
------------
 2000-01-02
(1 row)

Change the type again, this time we are not lucky:

alter table foo alter column a type varchar(2);
select a from foo;

And we get this

ERROR:  could not open relation with OID 0

And one time, I even got

ERROR:  invalid memory alloc request size 18446744072304459780

I guess it would be fine, as a first step, to prohibit altering a column's type to an incompatible one.
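
In the meantime, a safer migration path is to rebuild the table with an explicit conversion instead of altering in place (a sketch using the names above; ALTER on a foreign table appears to change only the catalog, not the stored bytes, which would explain the misbehaving reads):

    -- Rebuild with the new type, converting each value explicitly.
    CREATE FOREIGN TABLE foo_new (a date) SERVER cstore_server;
    INSERT INTO foo_new SELECT date '2000-01-01' + a FROM foo;  -- int-to-date needs an explicit expression
    DROP FOREIGN TABLE foo;
    ALTER FOREIGN TABLE foo_new RENAME TO foo;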

Thanks!

Alternatives to empty a table?

I understand that DELETE and TRUNCATE are not implemented, so I'd like to know if there's any way to empty a table.

I could do a DROP and then a CREATE, but that's not very elegant.
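
For what it's worth, wrapping the drop-and-recreate in a transaction at least hides the gap from concurrent sessions (a minimal sketch with a hypothetical definition; note the separate report above that the underlying .cstore file may survive the DROP and need removing separately):

    BEGIN;
    DROP FOREIGN TABLE customer_reviews;
    CREATE FOREIGN TABLE customer_reviews (customer_id TEXT)
    SERVER cstore_server
    OPTIONS(compression 'pglz');
    COMMIT;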

Thanks

"cache lookup failed for type 0" when COPYing into existing cstore

copy se_results from '/home/josh/genesis_2010.csv' with csv header;
ERROR: cache lookup failed for type 0

Relevant facts:

  • I previously dropped one column from the cstore table (the last column)
  • the table has 179 columns, of which 176 are NUMERIC
  • the table already has 110m rows in it, I'm looking to copy in another 10m

I've tried this a number of ways, but there appears to be no way to copy into this cstore foreign table.
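
The previously dropped column is a plausible trigger: for dropped columns, pg_attribute keeps the slot with atttypid set to 0, which matches the "type 0" in the error. A way to confirm the catalog state (standard PostgreSQL, using the table name from the report):

    SELECT attnum, attname, atttypid, attisdropped
    FROM pg_attribute
    WHERE attrelid = 'se_results'::regclass AND attnum > 0;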

If cstore_fdw can't access a file, it should error out on CREATE

phc=# create foreign table c_statements (
session_id text, session_line_num bigint, log_time timestamptz
,command_tag text, duration float) server cstore_server options (filename '/data/ssd/9.3/data/base/c_statements.cstore', compression 'pglz' );
CREATE FOREIGN TABLE
phc=# analyze c_statements;
ERROR: could not stat file "/data/ssd/9.3/data/base/c_statements.cstore": No such file or directory

Right now, if there's a problem accessing the specified file location, CREATE FOREIGN TABLE happily returns success, and you only see an error the first time you try to do something with the cstore table. It should error out at CREATE time.

Insert support

Are you planning to support INSERT commands any time soon?

thanks

Support for table copies

Supporting incremental inserts, updates, and deletes will take quite a bit of work. However, another, less difficult enhancement would make cstore_fdw 300% more useful right now: support for bulk inserts, specifically:

 INSERT INTO cstore_table SELECT columns FROM regular_table;

As a shortcut, you could internally route this through the existing COPY code on the receiving end. It would be deadly slow for single-row inserts, but then people doing single-row inserts into a cstore are mistaken anyway.

MacOS X - FATAL: could not access file “‘cstore_fdw’”: No such file or directory

On Mac OS X I've installed the cstore_fdw extension for PostgreSQL 9.3.5, and it looks like there was no error in the process ("/usr/local/pgsql/bin/" is an incorrect path, but the files were copied where they should be, as pg_config is symlinked in the $PATH):

XXX:cstore_fdw kjedrzejewski$ sudo PATH=/usr/local/pgsql/bin/:$PATH make install
/bin/sh /usr/local/Cellar/postgresql/9.3.5_1/lib/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.3.5_1/lib'
/bin/sh /usr/local/Cellar/postgresql/9.3.5_1/lib/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.3.5_1/share/postgresql/extension'
/bin/sh /usr/local/Cellar/postgresql/9.3.5_1/lib/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.3.5_1/share/postgresql/extension'
/usr/bin/install -c -m 755  cstore_fdw.so '/usr/local/Cellar/postgresql/9.3.5_1/lib/cstore_fdw.so'
/usr/bin/install -c -m 644 ./cstore_fdw.control '/usr/local/Cellar/postgresql/9.3.5_1/share/postgresql/extension/'
/usr/bin/install -c -m 644 ./cstore_fdw--1.3.sql ./cstore_fdw--1.2--1.3.sql ./cstore_fdw--1.1--1.2.sql ./cstore_fdw--1.0--1.1.sql  '/usr/local/Cellar/postgresql/9.3.5_1/share/postgresql/extension/'
XXX:cstore_fdw kjedrzejewski$ 

However, when I try to start Postgres, the extension cannot be loaded:

XXX:cstore_fdw kjedrzejewski$ pg_ctl -D /usr/local/var/postgres start
server starting
XXX:cstore_fdw kjedrzejewski$ FATAL:  could not access file "‘cstore_fdw’": No such file or directory
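
The curly quotes inside the error ("‘cstore_fdw’") suggest that smart quotes were pasted into postgresql.conf, so Postgres is looking for a library literally named ‘cstore_fdw’. Assuming the standard preload setup from the README, the line should use plain ASCII single quotes:

    # postgresql.conf
    shared_preload_libraries = 'cstore_fdw'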

Is there any automatic replication available?

I want to know whether any automatic replication is available. Currently we are copying data from a CSV file, right? I just want to know whether there is any option for keeping data in CitusDB updated from my PostgreSQL database.

OOM death on SQL COPY

I just spent a couple days figuring out why my dataset kept failing on SQL COPY like this:

cstore_test=# COPY airlines_complete_local FROM '/home/vagrant/airlines_cleaned.csv' WITH (FORMAT csv, HEADER true);
The connection to the server was lost. Attempting reset: Failed.
!> 
!> 

Whenever that would happen, I'd end up with a non-empty cstore file created, but no .footer file, so the table wasn't queryable.

cstore_test=# select count(*) from airlines_cleaned;
ERROR:  could not open file "/var/lib/postgresql/9.3/main/airlines_cleaned.orc.footer" for reading: No such file or directory
HINT:  Try copying in data to the table.

I eventually tracked it down to the OOM killer. It looks like the Postgres subprocess doing the import got zapped.

I was testing in a Vagrant box with 1GB of RAM.

The dataset is a CSV with 660 columns and 245269 rows. It's mostly small integers, with a few text columns. The CSV file size is 251MB (uncompressed). I can't post it publicly but could send privately if you'd like to test.

I've successfully loaded in much larger datasets, but not with so many columns.

Ideally, this would just work. But if there are fundamental constraints on how much memory it takes to import a wide table, it'd be nice to get some kind of memory-usage warning from the extension itself instead of just the cryptic loss of connection you get when the OOM killer kicks in. I probably won't be the last person to run into this, so if we can save others the debugging time, that'd be nice.
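
One mitigation worth trying (a sketch, not a confirmed fix: it assumes cstore_fdw's documented stripe_row_count and block_row_count table options reduce write-side buffering proportionally; the narrow column list is hypothetical):

    CREATE FOREIGN TABLE airlines_small_stripes (carrier text, flights integer)
    SERVER cstore_server
    OPTIONS(compression 'pglz',
            stripe_row_count '50000',   -- down from the 150000 default
            block_row_count '5000');    -- down from the 10000 default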

1.2 build error

I'm seeing this on my Fedora 21 box:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_ADJ=0 -fpic --std=c99 -I. -I./ -I/usr/pgsql-9.4/include/server -I/usr/pgsql-9.4/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -c -o cstore_metadata_serialization.o cstore_metadata_serialization.c
cstore_metadata_serialization.c:19:25: fatal error: cstore.pb-c.h: No such file or directory
#include "cstore.pb-c.h"
^
compilation terminated.
: recipe for target 'cstore_metadata_serialization.o' failed
make[1]: *** [cstore_metadata_serialization.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/home/devrim/Documents/Devrim/Projects/repo/pgrpms/rpm/redhat/9.4/cstore_fdw/F-21/cstore_fdw-1.2'

Regards, Devrim

cstore_fdw should support tablespaces

After Enhancement #16 is implemented, we should also implement support for a TABLESPACE option for cstore tables, which would drop them in the base/oid directory on the specified tablespace.

Hash-based indexes

I'm not familiar enough with FDWs to know for sure whether this could work with the query planner, but I think it would be beneficial to use hash-based indexes. In this sense, you could get away with not storing the indexed columns as columns, but instead store the hash in the table data, similar to how you store skip lists. I think this would be extremely beneficial for data size and, implicitly, for performance. I believe this is the strategy that kdb+ uses for its database (which is pretty ubiquitous in HFT).

Concurrency issues with "INSERT INTO ... SELECT ..."

Some users have observed that sometimes they get corrupt footer files when they try to concurrently use "INSERT INTO ... SELECT ...".

The related thread is: https://groups.google.com/forum/#!topic/cstore-users/THYfaSgPUHc

The scenarios in which this can happen are:

  • Concurrent INSERT and INSERT,
  • Concurrent INSERT and COPY.

We should already be protecting two concurrent COPY commands from each other by acquiring the proper locks, but we need to test this.

Before taking action, we should make sure that the footer corruption is indeed happening because of concurrency issues. The footer file is usually small, so the window for writing to it should be very short, and the chance of two concurrent bulk loads writing to it at the same time shouldn't be high. Irrespective of this, it seems that INSERT has some concurrency problems.
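
Until proper locking lands, a user-side stopgap is to serialize the writers explicitly (a sketch; pg_advisory_xact_lock is standard PostgreSQL, and the table names are hypothetical):

    BEGIN;
    -- Every writer takes the same advisory lock, so bulk loads run one at a time.
    SELECT pg_advisory_xact_lock('cstore_table'::regclass::oid::bigint);
    INSERT INTO cstore_table SELECT * FROM staging_table;
    COMMIT;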

fatal error: 'access/htup_details.h' file not found

I'm having trouble building on Mac OS X 10.8.5 with Xcode 5.1.

One thing I had to do was symlink /Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain.

sudo ln -s /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain /Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain

That got the build started, but then it aborts with the following error:

> PATH=/usr/local/bin/pg_config:$PATH make
/Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain/usr/bin/cc -arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv  --std=c99 -I. -I. -I/usr/include/postgresql/server -I/usr/include/postgresql/internal -I/usr/include/libxml2   -c -o cstore.pb-c.o cstore.pb-c.c
/Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain/usr/bin/cc -arch x86_64 -pipe -Os -g -Wall -Wno-deprecated-declarations -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv  --std=c99 -I. -I. -I/usr/include/postgresql/server -I/usr/include/postgresql/internal -I/usr/include/libxml2   -c -o cstore_fdw.o cstore_fdw.c
cstore_fdw.c:23:10: fatal error: 'access/htup_details.h' file not found
#include "access/htup_details.h"
         ^
1 error generated.
make: *** [cstore_fdw.o] Error 1

cstore will not compile against 9.5

It is not currently possible to compile cstore against Postgres 9.5.

make_foreignscan requires new arguments:

cstore_fdw.c: In function 'CStoreGetForeignPlan':
cstore_fdw.c:1178:16: error: too few arguments to function 'make_foreignscan'
foreignScan = make_foreignscan(targetList, scanClauses, baserel->relid,
^
In file included from cstore_fdw.c:40:0:
/usr/local/pgsql/include/server/optimizer/planmain.h:46:21: note: declared here
extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
^
: recipe for target 'cstore_fdw.o' failed
make: *** [cstore_fdw.o] Error 1

cstore incompatible to 9.5 RC1

In the current feature_9.5_compatibility branch, the following error appears:

make install
clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 --std=c99 -I. -I./ -I/usr/local/Cellar/postgresql/9.5rc1_2/include/server -I/usr/local/Cellar/postgresql/9.5rc1_2/include/internal -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2 -c -o cstore.pb-c.o cstore.pb-c.c
clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 --std=c99 -I. -I./ -I/usr/local/Cellar/postgresql/9.5rc1_2/include/server -I/usr/local/Cellar/postgresql/9.5rc1_2/include/internal -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2 -c -o cstore_fdw.o cstore_fdw.c

cstore_fdw.c:748:29: warning: incompatible pointer types assigning to 'GetForeignPlan_function' (aka 'ForeignScan *(*)(PlannerInfo *, RelOptInfo *, Oid, ForeignPath *, List *, List *, Plan *)') from 'ForeignScan *(PlannerInfo *, RelOptInfo *, Oid, ForeignPath *, List *, List *)' [-Wincompatible-pointer-types]
fdwRoutine->GetForeignPlan = CStoreGetForeignPlan;
^ ~~~~~~~~~~~~~~~~~~~~
cstore_fdw.c:1139:20: error: too few arguments to function call, expected 9, have 8
NIL); /* no fdw... */
^
/usr/local/Cellar/postgresql/9.5rc1_2/include/server/optimizer/pathnode.h:82:1: note: 'create_foreignscan_path' declared here
extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
^
cstore_fdw.c:1180:40: error: too few arguments to function call, expected 8, have 7
foreignPrivateList, NIL, NIL);
^
/usr/local/Cellar/postgresql/9.5rc1_2/include/server/optimizer/planmain.h:46:1: note: 'make_foreignscan' declared here
extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
^
1 warning and 2 errors generated.
make: *** [cstore_fdw.o] Error 1

Implement fast skiplist-based aggregation

This is a major enhancement and will take some time to develop.

Currently cstore just returns individual tuples and relies on PostgreSQL for aggregation. This means we're not getting one of the primary benefits of column stores, which is fast single-column aggregation based on the skip-list indexes.

This would probably involve implementing an executor hook. It will be complicated to figure out when we can use the hook (i.e. single-column, known aggregation) and when to kick the query back to Postgres (e.g. multicolumn aggregation, custom aggregates, non-btree aggregation, etc.). Ideally, we would also add an API for supporting custom aggregates that can be built from the COUNT/MIN/MAX in the skip lists.

Data management function support (Bug in CitusDB)

UDF master_apply_delete_command does not appear to support FDW tables:

# SELECT master_apply_delete_command('DELETE FROM customer_reviews_cstore WHERE review_date  < ''1990-01-01''');
ERROR:  relation "customer_reviews_cstore" is not an ordinary table

unused code in CreateEmptyStripeBuffers() should be removed

Hi,

The CreateEmptyStripeBuffers() function in cstore_writer.c contains the following code snippet:

ColumnBlockData *blockDataArray = palloc0(columnCount * sizeof(ColumnBlockData));

ColumnBlockData *blockData = palloc0(sizeof(ColumnBlockData));
bool *existsArray = palloc0(blockRowCount * sizeof(bool));
Datum *valueArray = palloc0(blockRowCount * sizeof(Datum));

blockData->existsArray = existsArray;
blockData->valueArray = valueArray;
blockData->valueBuffer = NULL;
blockDataArray[columnIndex] = blockData;

What is the significance of the above code block? The blockDataArray that is created here is not assigned anywhere in this function. Am I missing something?

Thanks,
Harsha

CitusDB STAGE fails on distributed table without filename option

This bug is for the "develop" branch of cstore_fdw.

I created a stock CitusDB single-node cluster on Ubuntu 12.04 according to the instructions at http://www.citusdata.com/docs/single-node-cluster.

I compiled and installed a fresh version of cstore_fdw from the develop branch and verified that it loaded and functioned correctly when not using the DISTRIBUTE BY clause in table creation. I was able to successfully use \COPY in that case on each of the servers.

I tried creating a distributed foreign cstore table with the following command:

CREATE FOREIGN TABLE customer_reviews_cstore
(
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date)
SERVER cstore_server
OPTIONS(compression 'pglz')

That worked fine; the problem occurred during \STAGE, which produced the following:

postgres=# \STAGE customer_reviews_cstore FROM '/home/ubuntu/customer_reviews_1998.csv' WITH CSV;
NOTICE:  extension "cstore_fdw" already exists, skipping
remote command "SELECT * FROM worker_apply_shard_ddl_command  ($1::int8, $2::text)" failed with ERROR:  option "filename" not found

Any help as to what I'm doing wrong, or whether this is some other kind of error, would be appreciated.

Performance issues

Hello,

We are currently testing cstore_fdw 1.3, but we have come across serious performance issues in several cases.

To demonstrate the problem, we created a testing data set as follows:

postgres=> create foreign table "test_table" (somedata integer) server cstore_server;
CREATE FOREIGN TABLE
Time: 22.470 ms

postgres=> insert into "test_table" select generate_series(1,10000000);
INSERT 0 10000000

Then a simple query to test the table:

SELECT sum(somedata)
FROM   "test_table"
WHERE  somedata between 100 and 8000;
   sum    
----------
 31999050
(1 row)

Time: 26.560 ms

Postgres' query plan from EXPLAIN ANALYZE shows that only a few rows were filtered. Our understanding is that cstore_fdw was able to skip many blocks entirely:

                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=155165.00..155165.01 rows=1 width=4) (actual time=25.134..25.134 rows=1 loops=1)
   ->  Foreign Scan on test_table  (cost=0.00..155040.00 rows=50000 width=4) (actual time=0.578..23.509 rows=7901 loops=1)
         Filter: ((somedata >= 100) AND (somedata <= 8000))
         Rows Removed by Filter: 2099
         CStore File: /var/lib/postgresql/9.4/main/cstore_fdw/12150/16528
         CStore File Size: 41283601

However, when doing the following equivalent query, performance gets much worse:

WITH test AS (
    SELECT   sum(somedata)
    FROM     "test_table"
    WHERE    somedata between 100 and 8000
)
SELECT * FROM test;
   sum    
----------
 31999050
(1 row)

Time: 771.661 ms

The planner seems to indicate a full scan is being used (9992099 rows filtered):

                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on test  (cost=155165.01..155165.03 rows=1 width=8) (actual time=871.111..871.112 rows=1 loops=1)
   CTE test
     ->  Aggregate  (cost=155165.00..155165.01 rows=1 width=4) (actual time=871.107..871.107 rows=1 loops=1)
           ->  Foreign Scan on test_table  (cost=0.00..155040.00 rows=50000 width=4) (actual time=1.135..869.455 rows=7901 loops=1)
                 Filter: ((somedata >= 100) AND (somedata <= 8000))
                 Rows Removed by Filter: 9992099
                 CStore File: /var/lib/postgresql/9.4/main/cstore_fdw/12150/16528
                 CStore File Size: 41283601

A similar issue exists when doing a GROUP BY or an ORDER BY:

SELECT *
FROM  (
    SELECT   somedata 
    FROM     "test_table"
    WHERE    somedata between 100 and 8000
    ORDER BY somedata
) x(somedata);
Time: 823.139 ms

Doing the equivalent query this way is faster:

SELECT somedata
FROM  (
    SELECT   somedata
    FROM     "test_table"
    WHERE    somedata between 100 and 8000
    ) x(somedata)
ORDER BY somedata;
Time: 34.104 ms

                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=158942.41..159067.41 rows=50000 width=4) (actual time=7.317..7.604 rows=7901 loops=1)
   Sort Key: test_table.somedata
   Sort Method: quicksort  Memory: 563kB
   ->  Foreign Scan on test_table  (cost=0.00..155040.00 rows=50000 width=4) (actual time=0.177..6.510 rows=7901 loops=1)
         Filter: ((somedata >= 100) AND (somedata <= 8000))
         Rows Removed by Filter: 2099
         CStore File: /var/lib/postgresql/9.4/main/cstore_fdw/12150/16528
         CStore File Size: 41283601

However, when using the above (faster) query as a subselect, things get slower again, with a query plan that also indicates a full scan occurred:

SELECT *
FROM (
    SELECT somedata
    FROM  (
        SELECT   somedata
        FROM     "test_table"
        WHERE    somedata between 100 and 8000
    ) x(somedata)
    ORDER BY somedata
) y(somedata);
Time: 858.250 ms

                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=158942.41..159067.41 rows=50000 width=4) (actual time=879.086..879.306 rows=7901 loops=1)
   Sort Key: test_table.somedata
   Sort Method: quicksort  Memory: 563kB
   ->  Foreign Scan on test_table  (cost=0.00..155040.00 rows=50000 width=4) (actual time=1.142..876.050 rows=7901 loops=1)
         Filter: ((somedata >= 100) AND (somedata <= 8000))
         Rows Removed by Filter: 9992099
         CStore File: /var/lib/postgresql/9.4/main/cstore_fdw/12150/16528
         CStore File Size: 41283601
 Planning time: 3.950 ms
 Execution time: 879.659 ms

Concurrent Drop Table & Truncate Table commands during COPY

It seems we have a minor issue with concurrently dropping and truncating tables.

Steps to reproduce:

  • Follow the instructions in the README to create the customer_reviews table.
  • Run a long-running \COPY command.
    • While \COPY is in progress, issue 2 TRUNCATE TABLE customer_reviews and 2 DROP TABLE customer_reviews commands on 4 different psql consoles (or use pgbench).
    • See the following in the outputs of some of the commands:
postgres=# truncate TABLE customer_reviews;
ERROR:  deadlock detected
DETAIL:  Process 11881 waits for AccessExclusiveLock on relation 32405 of database 12641; blocked by process 11868.
Process 11868 waits for AccessExclusiveLock on relation 32405 of database 12641; blocked by process 11881.

HINT:  See server log for query details.
postgres=# 

The same steps do not lead to deadlocks on a regular postgres table.

(Note: We first hit this issue in #67)
(Note: Concurrent TRUNCATEs alone, or concurrent DROPs alone, do not lead to any deadlocks in either cstore_fdw or regular postgres tables.)

Installation on Windows 7

Hello,

Did anybody try to get cstore_fdw running on Windows 7? When I try to run the makefile, I get this error:

process_begin: CreateProcess(NULL, pg_config --pgxs, ...) failed.
makefile:42: *** PostgreSQL 9.3 or 9.4 is required to compile this extension. Schluss.

But I have PostgreSQL 9.3 installed, the newest version available for Windows. Is cstore_fdw not yet compatible with Windows, or am I missing something?

Thanks,
Nasadows

Having trouble installing on OS X

After getting protobuf-c via MacPorts and trying to run make, I get this error message:

clang: warning: no such sysroot directory: '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk'
clang: warning: no such sysroot directory: '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk'
clang: warning: argument unused during compilation: '-headerpad_max_install_names'
In file included from cstore.pb-c.c:8:
./cstore.pb-c.h:6:10: fatal error: 'google/protobuf-c/protobuf-c.h' file not found
#include <google/protobuf-c/protobuf-c.h>
         ^
1 error generated.
make: *** [cstore.pb-c.o] Error 1

I'm new to OS X, so it may also be my fault. Help would be great.

Make postgres version requirement explicit

It's not clear from the docs here which Postgres version is required. I see from the Travis file that this package only appears to be tested against Postgres 9.3. I attempted to build and compile this on RHEL 4.4.7 with Postgres 9.2.7 but ran into the following compilation error:

cstore_fdw.c:23:33: error: access/htup_details.h: No such file or directory

A little googling reveals that this header file was a recent addition in 9.3: dimitri/pgextwlist#12

I suggest either updating the makefile and code to work against 9.2.x or explicitly stating in the instructions that 9.3 is a hard requirement.

INSERT concurrency causes deadlock

Using cstore_fdw to benchmark CitusDB and the cstore_fdw extension. During the INSERT phase of the benchmark, a CTE INSERT of the following form is performed:

WITH temp_table(id,project_id,time,value,count) AS (VALUES (...),(...)) INSERT INTO benchmark SELECT * FROM temp_table

The rationale for using this awkward form of insertion is cstore_fdw's lack of support for bulk inserts of the form INSERT INTO table VALUES (tuple_1),(tuple_2),(tuple_3).

When only one thread performs statements of this type, the server accepts INSERTs as expected. When multiple threads perform the same operation, the following error is printed in the CitusDB error log for all threads except the initial one:

ERROR:  deadlock detected
DETAIL:  Process 2902 waits for ExclusiveLock on relation 16714 of database 16493; blocked by process 2900.
    Process 2900 waits for ExclusiveLock on relation 16714 of database 16493; blocked by process 2902.

Details regarding the machine where the error occurred:

[ec2-user@someip ~]$ /opt/citusdb/4.0/bin/psql --version
psql (PostgreSQL) 9.4.0
[ec2-user@someip ~]$ uname -a
Linux ip-someip 3.14.35-28.38.amzn1.x86_64 #1 SMP Wed Mar 11 22:50:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[ec2-user@someip ~]$ cat /etc/system-release
Amazon Linux AMI release 2015.03
[ec2-user@someip ~]$ /opt/citusdb/4.0/bin/psql benchmark -c "\\d benchmark"
                 Foreign table "public.benchmark"
   Column   |       Type       |     Modifiers      | FDW Options 
------------+------------------+--------------------+-------------
 id         | integer          | not null           | 
 project_id | integer          | not null           | 
 time       | integer          | not null default 0 | 
 value      | double precision | not null default 0 | 
 count      | integer          | not null default 0 | 
Server: cstore_server
FDW Options: (compression 'pglz')
[ec2-user@someip cstore_fdw]$ git rev-parse HEAD
8943b85ae21fcccc9941192e45282aaa0e3e0755

Machine is an ec2-instance, c3.2xlarge.

Support automatically determining filename

When creating a new cstore table, a lot of people won't really care to pick a location for the file to live in. It would be nice if you could pass 'auto' or a similar magic value to the filename option, or omit the filename option entirely, and let the extension pick a filename for you based on data_directory and the table name.
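
(Later reports in this list suggest newer versions do this when the filename option is simply omitted; in the "Performance issues" report below, the table is created without a filename and EXPLAIN shows the data stored under the data directory, e.g. /var/lib/postgresql/9.4/main/cstore_fdw/12150/16528.)

    -- Apparently supported in later releases: no filename option given.
    CREATE FOREIGN TABLE test_table (somedata integer) SERVER cstore_server;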

Inheritance support

For time series data, one approach in Postgres to manage data warehouse space is to create a set of inherited tables constrained by a date/time interval, and drop or move old partitions when no longer needed in the active working set.

This would also be a way to manage data size in cstore_fdw given its append-only approach.

Is inheritance on the road map?

Things to check if no performance improvement is seen?

I have loaded a cstore_fdw table with about 10 columns of mostly int/date/varchar datatypes and am running some fairly basic queries against it.
(The machine has very fast hardware: SSDs, 128 GB RAM, 40 cores at 3 GHz.)

The table has about 16m rows in it, and I ran ANALYZE _cstore after populating it with INSERT INTO ... SELECT.

SELECT zip, COUNT(*) FROM _cstore group by 1; --16 seconds
SELECT state, COUNT(DISTINCT phone1) FROM _cstore group by 1; --101 seconds
SELECT insertdate, COUNT(*) FROM _cstore WHERE email_domain = 'gmail.com' group by 1; --3 seconds

I kind of had the understanding (coming from the SQL Server columnstore world) that most queries against cstore tables would be nearly instant.

Are my queries not good judges of speed, or could I have missed a setting somewhere that might explain why these queries run for so long (except the third one)?
