sqlalchemy-redshift / sqlalchemy-redshift
Amazon Redshift SQLAlchemy Dialect
Home Page: https://sqlalchemy-redshift.readthedocs.org/en/latest/
License: MIT License
class Example(Model):
    id = Column(types.Integer(), primary_key=True, info={'encode': 'lzo'})
    name = Column(types.Unicode(255), info={'encode': 'lzo'})
I wasn't sure if this was an alembic bug or setting, but given that things like identity columns and column encodings are especially important for Redshift tables, I would have expected them to be included in the generated migrations.
Sorry, I should've tested this earlier as soon as you finished working on #64 , but I kept using 0.1.2 and didn't check if there were any more issues.
The issue now is with tables that have foreign key constraints. I've been able to reproduce the problem with the following 2 tables:
create table ref.foo(id integer identity(1, 1) not null unique);
create table ref.bar(
    foo_id integer not null,
    foreign key(foo_id) references ref.foo(id))
And, when running .reflect(only=['bar'], schema='ref'), I get:
AttributeError: 'NoneType' object has no attribute 'group'
This is the traceback:
File "/home/dario/Projects/l2/Hydra/database/hydra_database/__init__.py", line 81, in cached_metadatas
redshiftmeta.reflect(only=tablenames, schema=schema)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 3647, in reflect
Table(name, self, **reflect_opts)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 416, in __new__
metadata._remove_table(name, schema)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 411, in __new__
table._init(name, metadata, *args, **kw)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 484, in _init
self._autoload(metadata, autoload_with, include_columns)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 496, in _autoload
self, include_columns, exclude_columns
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1477, in run_callable
return callable_(self, *args, **kwargs)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 364, in reflecttable
return insp.reflecttable(table, include_columns, exclude_columns)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 578, in reflecttable
exclude_columns, reflection_options)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 666, in _reflect_fk
table_name, schema, **table.dialect_kwargs)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 447, in get_foreign_keys
**kw)
File "<string>", line 2, in get_foreign_keys
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
ret = fn(self, con, *args, **kw)
File "/home/dario/.local/share/virtualenvs/hydra_worker/local/lib/python2.7/site-packages/sqlalchemy_redshift/dialect.py", line 396, in get_foreign_keys
referred_column = m.group('referred_column')
AttributeError: 'NoneType' object has no attribute 'group'
The last release was in April, and there have been a whole mass of patches made since then.
As discussed in #101, reflection on Redshift can be time-consuming because the per-query overhead is high and SQLAlchemy issues new queries for every table you need to reflect.
I'd love to find a way to "freeze" the schema cache and have it persist while you reflect multiple tables. We already do bulk queries to get info about all tables on reflection, but that info gets thrown away after each table is reflected.
One possibility is to introduce a RedshiftDialect._global_reflection_cache member that would normally be None, and modify the reflection.cache decorator to check for it. We'd need to be very intentional about how this gets used.
It might be nice to wrap it up in a context manager. You could write something like:
with RedshiftDialect.caching_schema():
    # reflect a bunch of tables here, utilizing the persistent cache

# Reflection is back to normal now that we've left the caching_schema context
I've never used alembic. I wonder if there's some way we could hook this into alembic so that migrations could be faster.
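A minimal sketch of what such a context manager could look like. RedshiftDialectSketch and its members are hypothetical stand-ins for illustration, not the real dialect; the real implementation would also need the reflection.cache decorator changes described above:

```python
from contextlib import contextmanager


class RedshiftDialectSketch:
    """Hypothetical stand-in for RedshiftDialect, for illustration only."""

    # Normally None; reflection queries would consult it when set.
    _global_reflection_cache = None

    @classmethod
    @contextmanager
    def caching_schema(cls):
        # Install a persistent cache for the duration of the block.
        cls._global_reflection_cache = {}
        try:
            yield cls._global_reflection_cache
        finally:
            # Leaving the block restores normal, uncached reflection.
            cls._global_reflection_cache = None
```

Reflecting many tables inside one `with RedshiftDialectSketch.caching_schema():` block would then reuse the bulk query results instead of discarding them after each table.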
I'm trying to execute a basic query and pass a tuple parameter into it, like:
engine = create_engine(app.config['REDSHIFT_DATABASE_URI'], echo=True)
conn = engine.connect()
s = text("select distinct t1.hh_id, t2.mri_id from msa t1 left outer join (select hh_id, mri_id from msa where mri_id in :code_param) t2 on t1.hh_id = t2.hh_id")
mriset = conn.execute(s, code_param = tuple(codes)).fetchall()
Here is an error:
2015-11-11 22:49:37,373 INFO sqlalchemy.engine.base.Engine select distinct t1.hh_id, t2.mri_id from msa t1 left outer join (select hh_id, mri_id from msa where mri_id in %(code_param)s) t2 on t1.hh_id = t2.hh_id
2015-11-11 22:49:37,374 INFO sqlalchemy.engine.base.Engine {'code_param': (-207426274, -205158201, -174737541)}
2015-11-11 22:49:37,681 INFO sqlalchemy.engine.base.Engine ROLLBACK
It works fine with a Postgres DB, though.
Do you have any clue why?
We use BZIP2 compression on our S3 files; it doesn't seem possible to specify this with the CopyCommand.
http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-bzip2
>>> import sqlalchemy as sa
>>> from redshift_sqlalchemy.dialect import CopyCommand
>>> table = sa.table("foo")
>>> CopyCommand(table, "s3://bucket/data/", "XXXYYYZZZXXXYYYZZZ11", "XXXYYYZZZXXXYYYZZZ11XXXYYYZZZXXXYYYZZZ11", format="BZIP2")
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/virtualenv/local/lib/python2.7/site-packages/sqlalchemy_redshift/commands.py", line 362, in __init__
self.formats)
ValueError: "format" parameter must be one of ['CSV', 'JSON', 'AVRO', None]
>>> CopyCommand(table, "s3://bucket/data/", "XXXYYYZZZXXXYYYZZZ11", "XXXYYYZZZXXXYYYZZZ11XXXYYYZZZXXXYYYZZZ11", compression="BZIP2")
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/virtualenv/local/lib/python2.7/site-packages/sqlalchemy_redshift/commands.py", line 368, in __init__
self.compression_types
ValueError: "compression" parameter must be one of ['GZIP', 'LZOP']
Redshift's delete statement varies slightly from Postgresql's. See here for documentation.
Basic delete statements have the same syntax.
For instance, the following is valid SQL in both dialects:
DELETE FROM customer_table WHERE customer_table.id > 1000
However, while the following is a valid statement in Postgresql:
DELETE FROM customer_table
WHERE customer_table.id = order_table.customer_id
AND order_table.id < 100
It needs to be written for Redshift as:
DELETE FROM customer_table
USING order_table
WHERE customer_table.id = order_table.customer_id
AND order_table.id < 100
SQLAlchemy should be able to build the resultant query above with the following Python snippet:
from sqlalchemy import delete, Table, Column, Integer, MetaData
from redshift_sqlalchemy import RedshiftDialect

meta = MetaData()
customer = Table('customer_table', meta, Column('id', Integer, primary_key=True))
order = Table('order_table', meta, Column('id', Integer, primary_key=True), Column('customer_id', Integer))

del_stmt = delete(customer).where(customer.c.id == order.c.customer_id).where(order.c.id < 100)
print(del_stmt.compile(dialect=RedshiftDialect()))
The dialect needs to specify that sequences are not supported. Otherwise sqlalchemy tries to reference a nonexistent sequence and you get an error such as the following:
(sqlalchemy.exc.ProgrammingError) (psycopg2.ProgrammingError) relation "tablename_model_id_seq" does not exist [SQL: 'select nextval(\'"tablename_model_id_seq"\')'] [SQL: u'INSERT INTO tablename (model_id, model_name) VALUES (%(model_id)s, %(model_name)s,)']
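As a rough illustration of the kind of one-line change being suggested: SQLAlchemy dialects advertise sequence support through a supports_sequences class attribute. RedshiftDialectSketch below is a hypothetical sketch, not the real dialect class:

```python
from sqlalchemy.dialects.postgresql.base import PGDialect


class RedshiftDialectSketch(PGDialect):
    """Hypothetical sketch, not the real dialect class."""

    # With sequences marked unsupported, SQLAlchemy should no longer try
    # to run select nextval('tablename_model_id_seq') before an INSERT.
    supports_sequences = False
```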
After Amazon issued a new SSL root certificate for Redshift last week, I started getting connection failures with "sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL error: certificate verify failed".
I followed the instructions at http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-transitioning-to-acm-certs.html and copied the new https://s3.amazonaws.com/redshift-downloads/redshift-ca-bundle.crt to ~/.postgresql/root.crt , which is the default location according to postgresql docs.
However, for the client to actually check the certificate, I had to provide the file path:
redshift+psycopg2://username:[email protected]:5439/mydb?sslmode=verify-ca&sslrootcert=/home/mydir/.postgresql/root.crt
I guess this is because the module provides its own root certificate file, which is now out of date:
def create_connect_args(self, *args, **kwargs):
"""
Build DB-API compatible connection arguments.
Overrides interface
:meth:`~sqlalchemy.engine.interfaces.Dialect.create_connect_args`.
"""
default_args = {
'sslmode': 'verify-full',
'sslrootcert': pkg_resources.resource_filename(
__name__,
'redshift-ca-bundle.crt'
),
}
cargs, cparams = super(RedshiftDialect, self).create_connect_args(
*args, **kwargs
)
default_args.update(cparams)
return cargs, default_args
Hey,
I just wanted to give it a spin, but I get a NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:psycopg2. I'm running Python 3.6.1 and installed sqlalchemy 1.1.9 and sqlalchemy-redshift 0.6.0.
From there I just followed the docs and used my custom connection string to create the engine. However, I'm not sure if this is actually a bug in this project.
Hello! The impetus for the investigation that led to issue #101 is a class of errors that we've been seeing that seem to be due to concurrent access to pg_catalog tables. We have a test which makes sure that our alembic migrations are up to date. This test necessarily does a lot of reflection. We've been seeing this test fail intermittently with errors like these:
1
E sqlalchemy.exc.InternalError: (psycopg2.InternalError) could not find tuple for constraint 839833
E [SQL: '\n SELECT\n n.nspname as "schema",\n c.relname as "table_name",\n t.contype,\n t.conname,\n t.conkey,\n a.attnum,\n a.attname,\n pg_catalog.pg_get_constraintdef(t.oid, true) as condef,\n n.oid as "schema_oid",\n c.oid as "rel_oid"\n FROM pg_catalog.pg_class c\n LEFT JOIN pg_catalog.pg_namespace n\n ON n.oid = c.relnamespace\n JOIN pg_catalog.pg_constraint t\n ON t.conrelid = c.oid\n JOIN pg_catalog.pg_attribute a\n ON t.conrelid = a.attrelid AND a.attnum = ANY(t.conkey)\n WHERE n.nspname !~ \'^pg_\'\n ORDER BY n.nspname, c.relname\n ']
2
E sqlalchemy.exc.InternalError: (psycopg2.InternalError) cache lookup failed for relation 870981
E [SQL: '\n SELECT\n n.nspname as "schema",\n c.relname as "table_name",\n d.column as "name",\n encoding as "encode",\n type, distkey, sortkey, "notnull", adsrc, attnum,\n pg_catalog.format_type(att.atttypid, att.atttypmod),\n pg_catalog.pg_get_expr(ad.adbin, ad.adrelid) AS DEFAULT,\n n.oid as "schema_oid",\n c.oid as "table_oid"\n FROM pg_catalog.pg_class c\n LEFT JOIN pg_catalog.pg_namespace n\n ON n.oid = c.relnamespace\n JOIN pg_catalog.pg_table_def d\n ON (d.schemaname, d.tablename) = (n.nspname, c.relname)\n JOIN pg_catalog.pg_attribute att\n ON (att.attrelid, att.attname) = (c.oid, d.column)\n LEFT JOIN pg_catalog.pg_attrdef ad\n ON (att.attrelid, att.attnum) = (ad.adrelid, ad.adnum)\n WHERE n.nspname !~ \'^pg_\'\n ORDER BY n.nspname, c.relname, att.attnum\n ']
3
E sqlalchemy.exc.InternalError: (psycopg2.InternalError) cache lookup failed for relation 731237
E [SQL: "\n SELECT c.oid\n FROM pg_catalog.pg_class c\n LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace\n WHERE (pg_catalog.pg_table_is_visible(c.oid))\n AND c.relname = %(table_name)s AND c.relkind in ('r', 'v', 'm', 'f')\n "] [parameters: {'table_name': 'ref_cm_icd9'}]
We think that these errors happen when one of the other schemas in our database is being dropped while one of these queries is being run. Since these queries grab information about the entire database, they are not scoped to schemas at all. We have also found this page in the redshift docs: http://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html which seems to be saying that this might be expected behavior.
Error number 1 is the one that we've seen the most. We think this is because of issue #101 - because the constraint query was being run many times, there was more of a chance that it would be run at the wrong time. However, we just recently saw error number 2, which is the query that this dialect uses for getting column info. Error number 3 is actually a query from the regular postgres dialect, so this issue isn't specific to this dialect. We think that these sorts of errors happen specifically when functions are involved - some data is fetched, a function is run on some fetched id, and if the entity associated with that id disappears between those two steps, an error occurs.
I'm creating this issue more to ask if you guys have encountered something like this before and whether you've found any resolution. However, it may be useful for our purposes to restrict the _get_all_* queries so that they only query information about the requested schema. Since checking the schema name would not involve any functions, we think it would resolve our issue. I don't know whether that would be a good change to make upstream as well.
I am trying to connect to Redshift from my Python code. My pip installed:
psycopg2==2.6.1
redshift-sqlalchemy==0.4.1
SQLAlchemy==1.0.9
and my virtual machine(ubuntu) has:
libpq-dev
python-psycopg2
But I am still getting
engine = create_engine('redshift+psycopg2://{}:{}@{}'.format(username, password, url))
File "/opt/project/env/local/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 386, in create_engine
return strategy.create(*args, **kwargs)
File "/opt/project/env/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 51, in create
entrypoint = u._get_entrypoint()
File "/opt/project/env/local/lib/python2.7/site-packages/sqlalchemy/engine/url.py", line 131, in _get_entrypoint
cls = registry.load(name)
File "/opt/project/env/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 205, in load
(self.group, name))
NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:redshift.psycopg2
I tried SQLAlchemy version 0.8.0, but it doesn't work either.
With the same config, I am able to run from my laptop (Mac), but on Linux I guess some packages are still missing? Any suggestion will be appreciated, thanks!
With Spectrum, Amazon has added support for querying external data to Redshift. There are a few new features that make this work. It would be nice to support these for migrations/ddl and querying:
DDL:
Querying:
For most PRs, we are requiring tests, documentation, and CHANGES.rst notes. Sometimes we ask for these upfront, but sometimes we forget until later in the review, and it delays merging good changes.
We should provide a CONTRIBUTING.md both as a checklist for those doing review to make sure they aren't forgetting things, and for contributors so that they don't feel like requirements are popping up out of nowhere.
cc @graingert
(psycopg2.OperationalError) server certificate for "ec2-[IP address].compute-1.amazonaws.com" does not match host name "[Hostname].redshift.amazonaws.com"
The same connection URI was fine until last Friday. I've checked that both hostnames correspond to the same IP address. If I set 'sslmode': 'prefer' in dialect.py, the connection is successful.
According to https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html,
SSL support in Amazon Redshift is strictly for encrypting the connection between your client and your cluster; it should not be relied on for authenticating the server. To authenticate the server, install the public key (.pem file) for the SSL certificate on your client and use the key to connect to your clusters.
Perhaps sslmode: require should be the default?
Had a discussion with @graingert that he's no longer using Redshift, so likely won't be able to be active in owning and maintaining this project.
I'm still using Redshift and can be involved, but probably can't give the project the attention it deserves without help. I propose we advertise for a new maintainer.
A proposed path forward:
If we get someone interested in being maintainer, then we'd proceed with making sure they also have rights on pypi, and we can update metadata in setup.py as appropriate.
How does the above sound, @graingert ?
I have a working copy command that works in SQL Workbench/J. Mirroring it, I created a CopyCommand object. What do I do next to run the copy command against Redshift?
I see that there's a sqlalchemy_redshift.commands.visit_copy_command(element, compiler, **kw) method. It looks like I'll have element = my_copy_command_object; is there more documentation on the compiler argument?
It'd be super if there's an end-to-end example somewhere, thanks a lot.
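For what it's worth, you normally never call visit_copy_command yourself: CopyCommand is a SQLAlchemy Executable, so you hand the object to conn.execute() and SQLAlchemy's compilation machinery supplies the compiler argument. The toy construct below (ToyCommand and its visitor are hypothetical, modeled on the documented @compiles pattern that CopyCommand also uses) shows the mechanics end to end against a throwaway SQLite engine:

```python
import sqlalchemy as sa
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement, Executable


class ToyCommand(Executable, ClauseElement):
    """Hypothetical custom statement, wired up like CopyCommand."""
    inherit_cache = False

    def __init__(self, table_name):
        self.table_name = table_name


@compiles(ToyCommand)
def visit_toy_command(element, compiler, **kw):
    # 'compiler' is the dialect's SQLCompiler; SQLAlchemy passes it in
    # automatically when the statement is executed or compiled.
    return "ANALYZE %s" % element.table_name


engine = sa.create_engine("sqlite://")  # stand-in engine for the demo
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE my_table (id INTEGER)"))
    conn.execute(ToyCommand("my_table"))  # compiled via visit_toy_command
```

So for the real thing, the same shape should apply: build the CopyCommand object, then run `conn.execute(my_copy_command_object)` on a Redshift connection.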
In RedshiftDialect, there are three methods, _get_all_relation_info, _get_all_column_info, and _get_all_constraint_info, which query information about the whole cluster. These methods are called by several other methods which pass their info_cache argument down to one of the above methods. These methods, in turn, are called by other methods which, in most cases, pass down all of their keyword arguments. The one exception is _get_redshift_constraints: this method is never called with the keyword arguments of its caller, and therefore never gets the info_cache object that was passed in by SQLAlchemy. The result is that constraint information is never cached. When running something like alembic, which does a lot of reflection, this can slow things down considerably because the dialect winds up querying information about the entire database every time alembic asks for anything.
It'd be swell to have some documentation that, for now, could consist of answers to questions like "How is this different from the postgresql dialect?" rtfd.org + sphinx would be my tool of choice here, but others may have thoughts on this.
Hi there,
It looks like SQLAlchemy has changed a few method signatures in version >=1.2. The following statement works on SQLAlchemy <1.2 but not on SQLAlchemy >= 1.2:
table = sa.Table(table_name, meta, autoload=True, postgresql_ignore_search_path=True)
I get the following traceback with SQLAlchemy version 1.2.5:
File "/home/colin/work/spectrify/spectrify/utils/schema.py", line 54, in get_table_schema
table = sa.Table(table_name, meta, **table_kwargs)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 456, in __new__
metadata._remove_table(name, schema)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 451, in __new__
table._init(name, metadata, *args, **kw)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 533, in _init
include_columns, _extend_on=_extend_on)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 559, in _autoload
_extend_on=_extend_on
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2056, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1545, in run_callable
return callable_(self, *args, **kwargs)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 389, in reflecttable
table, include_columns, exclude_columns, **opts)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 618, in reflecttable
table_name, schema, **table.dialect_kwargs):
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 369, in get_columns
**kw)
File "<string>", line 2, in get_columns
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
ret = fn(self, con, *args, **kw)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy_redshift/dialect.py", line 404, in get_columns
enums=[], schema=col.schema, encode=col.encode)
File "/home/colin/.virtualenvs/spectrify/local/lib/python2.7/site-packages/sqlalchemy_redshift/dialect.py", line 604, in _get_column_info
**kw
TypeError: _get_column_info() takes exactly 9 arguments (8 given)
I tracked the issue down to the following commit: zzzeek/sqlalchemy@fadb8d6#diff-d33159d80d3deef1d5bdcd057dcc3d6bR2443
We're running tests against Redshift now and have AWS access keys available to Travis, but the cluster is currently running full-time.
@graingert - you mentioned that you had a good idea of how to hook a script in to spin up the cluster for tests and then spin it back down. Are there any blockers at this point to make that happen?
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_LIBRARY.html
Would be nice for this to be supported in sqlalchemy-redshift.
Can't currently justify the cost of a redshift database just to run travis tests.
I want to copy some files from S3 in a different region: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-source-s3.html#copy-region
Redshift is transitioning to ACM certificates. http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-transitioning-to-acm-certs.html
Does the cert in this project need to be updated to support this transition? https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/redshift-ssl-ca-cert.pem
Right now, when reflection fails, it's usually with a KeyError, but the other sqlalchemy dialects raise sa.exc.NoSuch<object>Error. I have code that depends on this exception being raised, which is how I found this issue.
I've got a PR in the works.
There are some details we'll need to deal with when this project moves to the https://github.com/sqlalchemy-redshift org.
sqlalchemy-redshift (it's currently inverted)
I wanted to make sure to document thoughts as I have them. Other things we'll need to keep in mind?
Pretty long traceback due to us wrapping alembic within a click-based command:
Traceback (most recent call last):
File "/home/gsliwinski/.virtualenvs/x/bin/dw", line 11, in <module>
load_entry_point('dw', 'console_scripts', 'dw')()
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/gsliwinski/dw/src/dw/scripts/decorators.py", line 11, in call_with_app_context
return cmd(*args, **kwargs)
File "/home/gsliwinski/dw/src/dw/scripts/alembic_migrations.py", line 137, in autogenerate
alembic_ctx = command.revision(alembic_config(click_ctx.obj['DB']), autogenerate=True, message=message)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/command.py", line 176, in revision
script_directory.run_env()
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/script/base.py", line 425, in run_env
util.load_python_file(self.dir, 'env.py')
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
module = load_module_py(module_id, path)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/util/compat.py", line 75, in load_module_py
mod = imp.load_source(module_id, path, fp)
File "alembic/env.py", line 88, in <module>
run_migrations_online()
File "alembic/env.py", line 82, in run_migrations_online
context.run_migrations()
File "<string>", line 8, in run_migrations
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/runtime/environment.py", line 836, in run_migrations
self.get_context().run_migrations(**kw)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
for step in self._migrations_fn(heads, self):
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/command.py", line 156, in retrieve_migrations
revision_context.run_autogenerate(rev, context)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/autogenerate/api.py", line 415, in run_autogenerate
self._run_environment(rev, migration_context, True)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/autogenerate/api.py", line 451, in _run_environment
autogen_context, migration_script)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/autogenerate/compare.py", line 22, in _populate_migration_script
_produce_net_changes(autogen_context, upgrade_ops)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/alembic/autogenerate/compare.py", line 38, in _produce_net_changes
schemas = set(inspector.get_schema_names())
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 158, in get_schema_names
info_cache=self.info_cache)
File "<string>", line 2, in get_schema_names
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
ret = fn(self, con, *args, **kw)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py", line 2349, in get_schema_names
).columns(nspname=sqltypes.Unicode))
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
compiled_sql, distilled_params
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
context)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/home/gsliwinski/.virtualenvs/x/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
TypeError: 'dict' object does not support indexing
I'm trying to reproduce an example from the RS documentation using:
python 3.4
sqlalchemy-redshift==0.3.1
Here is an example from RedShift doc (http://goo.gl/V2PuZh):
begin;
[waits]
copy listing from ;
end;
But it throws an exception:
ERROR - [RedShift] Programming Error: (psycopg2.ProgrammingError) syntax error at or near "["
LINE 1: begin; [waits] copy my_schema.my_table (userid, flag,...
^
[SQL: "begin; [waits] copy my_schema.my_table (userid, flag, campid, condition, ts) from 's3://mybucket/mydata' credentials 'aws_access_key_id=<my_aws_secret_key>;aws_secret_access_key=<my_aws_secret_access_key>' delimiter ','; end;"]
It works fine without the [waits] operator. Is there any workaround so I can use [waits]?
This breaks because of what was added in sqlalchemy-redshift/sqlalchemy_redshift/dialect.py, lines 406 to 409 in 92852b7: when it's called from get_columns(), comment is missing from there.
I have encountered table names that look like schema.table.name, and reflection is breaking because SQL_IDENTIFER_RE.findall(key) returns a list of length 3 in dialect._get_schema_and_relation.
Hi all,
In our database, we have a relation named quote sitting inside a schema called platform.
Because the word quote is a Redshift keyword, the string representation of a dialect.RelationKey corresponding to this relation is platform."quote". This is all good.
The class RelationKey has an unquote method, which returns a string representation of it with any "" quotes removed. Unfortunately, this method correctly removes the quotes only when the schema is None. In our case, the code is used with a non-None schema (equal to platform). What we see is that unquote erroneously does not remove the quotes, as it triggers only when str(RelationKey) both starts and ends with quotes.
I think a more comprehensive behaviour for the unquote method would be to apply unquoting to both the schema name AND the relation name, so that unquoted(platform."quote") -> platform.quote.
This bug prevents sqlalchemy from reflecting our platform schema, ending with:
File "/home/vlasisva/.local/lib/python3.5/site-packages/sqlalchemy_redshift/dialect.py", line 630, in _get_redshift_relation
raise sa.exc.NoSuchTableError(key)
sqlalchemy.exc.NoSuchTableError: platform."quote"
Happy to elaborate, if above is not clear.
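The proposed behaviour can be sketched as a small helper. unquote_key is a hypothetical standalone function, not the real RelationKey.unquote method:

```python
def unquote_key(key):
    """Strip surrounding double quotes from each dot-separated part of a
    schema-qualified relation name, instead of only from the whole string,
    e.g. 'platform."quote"' -> 'platform.quote'."""
    def strip_quotes(part):
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            return part[1:-1]
        return part

    return ".".join(strip_quotes(part) for part in key.split("."))
```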
It looks like all the action is happening in this fork rather than the original @binarydud repo. I'm working on a PR that fills out the reflection capabilities. Should I propose that here?
sqlalchemy-redshift does not appear to support authenticating with an AWS token/secret key. I did find something in commands.py that looks like it might, but there are no docs describing how to supply the credentials if it does exist.
begin;
DECLARE curname cursor FOR
SELECT * FROM schema.table;
fetch forward 10 from curname;
fetch next from curname;
close curname;
commit;
or
command_text = """
SELECT * FROM schema.table;
"""
conn = psycopg2.connect(host="HOST", dbname="dbname", password="****", user="user", port="5439")
cursor2 = conn.cursor('curname')
cursor2.itersize = 100
cursor2.execute(command_text)
for item in cursor2:
print(item)
This method executes, but it still allocates the whole result set on the server side, and it runs for about 10 minutes. In some cases it still has memory-overflow problems, but this time on the server side.
package connection;

import java.sql.*;
import java.util.Properties;

public class Redshift {
    // Redshift driver: "jdbc:redshift://host:5439/dev";
    static final String dbURL = "jdbc:redshift://host:5439/dbname";
    static final String MasterUsername = "user";
    static final String MasterUserPassword = "****";

    public static void main(String[] args) {
        Connection conn = null;
        Statement stmt = null;
        try {
            Class.forName("com.amazon.redshift.jdbc41.Driver");
            // Open a connection and define properties.
            System.out.println("Connecting to database...");
            Properties props = new Properties();
            // Uncomment the following line if using a keystore.
            // props.setProperty("ssl", "true");
            props.setProperty("user", MasterUsername);
            props.setProperty("password", MasterUserPassword);
            conn = DriverManager.getConnection(dbURL, props);
            // Try a simple query.
            System.out.println("Listing system tables...");
            stmt = conn.createStatement();
            String sql = "SELECT * FROM schema.table;";
            ResultSet rs = stmt.executeQuery(sql);
            // Get the data from the result set.
            while (rs.next()) {
                // Retrieve two columns.
                String catalog = rs.getString("list_entry_id");
                String name = rs.getString("source_code");
                // Display values.
                System.out.print("Catalog: " + catalog);
                System.out.println(", Name: " + name);
            }
            rs.close();
            stmt.close();
            conn.close();
        } catch (Exception ex) {
            // For convenience, handle all errors here.
            ex.printStackTrace();
        } finally {
            // Finally block to close resources.
            try {
                if (stmt != null)
                    stmt.close();
            } catch (Exception ex) {
                // nothing we can do
            }
            try {
                if (conn != null)
                    conn.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
        System.out.println("Finished connectivity test.");
    }
}
As I mentioned a couple of weeks ago, I hit another regression; here's a minimal schema that reproduces it:
create table foo.domain(
id integer identity(1,1) not null unique);
create table foo.bar(
domain_id integer not null,
foreign key(domain_id) references foo.domain(id))
(I almost got it working last time, but then life intervened and I haven't touched this code since :( )
This is quite weird: redshiftmeta.reflect(only=['domain'], schema='foo') works just fine, but redshiftmeta.reflect(only=['bar'], schema='foo') fails with
*** NoSuchTableError: foo."domain"
Also, the same schema with the table renamed to foo.domain2 works just fine... so I'd guess there's some special-cased/hardcoded handling for relations named domain, though grepping briefly through the code I didn't find anything suspicious.
I got an error trying to create a table that had a column called "tag". The list of reserved words is here: http://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html
I fixed it in a very dirty way:
import sqlalchemy.dialects.postgresql.base as ps
REDSHIFT_RESERVED_WORDS = set("""aes128,aes256,all,allowoverwrite,analyse,
analyze,and,any,array,as,asc,authorization,backup,between,binary,blanksasnull,
both,bytedict,bzip2,case,cast,check,collate,column,constraint,create,
credentials,cross,current_date,current_time,current_timestamp,current_user,
current_user_id,default,deferrable,deflate,defrag,delta,delta32k,desc,disable,
distinct,do,else,emptyasnull,enable,encode,encrypt,encryption,end,except,
explicit,false,for,foreign,freeze,from,full,globaldict256,globaldict64k,grant,
group,gzip,having,identity,ignore,ilike,in,initially,inner,intersect,into,is,
isnull,join,leading,left,like,limit,localtime,localtimestamp,lun,luns,lzo,lzop,
minus,mostly13,mostly32,mostly8,natural,new,not,notnull,null,nulls,off,offline,
offset,oid,old,on,only,open,or,order,outer,overlaps,parallel,partition,percent,
permissions,placing,primary,raw,readratio,recover,references,respect,rejectlog,
resort,restore,right,select,session_user,similar,some,sysdate,system,table,tag,
tdes,text255,text32k,then,timestamp,to,top,trailing,true,truncatecolumns,union,
unique,user,using,verbose,wallet,when,where,with,without""".replace("\n", "").split(","))
ps.PGIdentifierPreparer.reserved_words = ps.PGIdentifierPreparer.reserved_words.union(REDSHIFT_RESERVED_WORDS)
Would be good if the redshift dialect would support this out of the box.
Hi, I'm using the latest version of the repo (commit 7691341) with SQLAlchemy 1.0.12. I'm noticing some strange behavior when I define a model with an identity column. Here's an example:
import os
import sqlalchemy as sa
from sqlalchemy.ext import declarative
Base = declarative.declarative_base()
class IdentityExample(Base):
    __tablename__ = 'identity_example'
    id = sa.Column(sa.Integer, primary_key=True, info={'identity': (0, 1)})
    name = sa.Column(sa.String)
engine = sa.create_engine(os.environ['REDSHIFT_URL'])
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
RedshiftSession = sa.orm.sessionmaker()
session = RedshiftSession(bind=engine)
example_1 = IdentityExample(name='Example #1')
session.add(example_1)
print("EXAMPLE 1 BEFORE: ", example_1.id, example_1.name)
session.commit()
print("EXAMPLE 1 AFTER: ", example_1.id, example_1.name)
example_2 = IdentityExample(name='Example #2')
session.add(example_2)
print("EXAMPLE 2 BEFORE: ", example_2.id, example_2.name)
session.commit()
print("EXAMPLE 2 AFTER: ", example_2.id, example_2.name)
I'd expect to see output like this:
EXAMPLE 1 BEFORE: None Example #1
EXAMPLE 1 AFTER: 0 Example #1
EXAMPLE 2 BEFORE: None Example #2
EXAMPLE 2 AFTER: 1 Example #2
Instead I see this:
EXAMPLE 1 BEFORE: None Example #1
EXAMPLE 1 AFTER: 0 Example #1
EXAMPLE 2 BEFORE: None Example #2
EXAMPLE 2 AFTER: 0 Example #1
Am I making some mistake, or is this a bug? Thanks!
When I try to connect using redshift+psycopg2, I end up with this error:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) sslmode value "verify-full" invalid when SSL support is not compiled in
What do I need to do to actually connect to my redshift instance? I'm able to connect fine using postgresql+psycopg2
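As a stopgap sketch (not an official fix): SQLAlchemy's create_engine accepts connect_args, which are passed straight through to psycopg2.connect(), so the dialect's strict default sslmode of 'verify-full' can be overridden where the local libpq has no SSL support. Only do this on networks where an unencrypted connection is acceptable.

```python
def connect_kwargs(require_ssl=True):
    """Keyword overrides for psycopg2.connect() (sketch).

    'prefer' uses SSL when the client supports it and falls back to
    plaintext otherwise; 'verify-full' is the dialect's strict default.
    """
    return {'sslmode': 'verify-full' if require_ssl else 'prefer'}

# Usage (hypothetical URL):
# engine = sa.create_engine('redshift+psycopg2://user:pass@host:5439/db',
#                           connect_args=connect_kwargs(require_ssl=False))
```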
Also do not depend on psycopg2
# SQLAlchemy==1.0.10, sqlalchemy-redshift==0.4.0, psycopg2==2.6
import sqlalchemy as sa
from sqlalchemy.schema import CreateTable
engine = sa.create_engine('redshift+psycopg2://example')
user = sa.Table(
'user',
sa.MetaData(),
# http://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.Column.params.autoincrement
sa.Column('id', sa.Integer, primary_key=True, autoincrement=True),
)
col = user.columns.get('id')
print(col.expression.autoincrement)
# >>> True
print(CreateTable(user).compile(engine))
# CREATE TABLE user (
# id INTEGER NOT NULL,
# PRIMARY KEY (id)
# )
The ddl_compiler.get_column_default_string() method returns None. I'm unable to figure out whether this is just a gap in the documentation with an easy solution, or a deeper problem with the DDL compiler for Redshift.
Thanks in advance!
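One workaround sketch, based on the info={'identity': (seed, step)} convention that appears in the identity_example model earlier in this dump: declare the identity explicitly on the column rather than relying on autoincrement=True (whether the dialect then renders IDENTITY(1,1) in the DDL is an assumption, not verified here).

```python
# Sketch: declare the identity via Column.info, the convention the
# Redshift dialect reads, instead of autoincrement=True.
import sqlalchemy as sa

user = sa.Table(
    'user',
    sa.MetaData(),
    sa.Column('id', sa.Integer, primary_key=True,
              info={'identity': (1, 1)}),
)
```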
Support alternative drivers, e.g. pg8000 or psycopg2cffi; also, do not depend on psycopg2.
As discovered in #130 (comment) it looks like there's a change in Redshift behavior that's causing tests to fail in master.
We may simply need to update tests to match the new behavior.
Parameters for where clauses are all properly escaped, but adding a .limit() to the SQLAlchemy expression causes an error, because its parameter is passed in unescaped (as %param1).
@graingert - What would you think about hosting the bigcrunch repo on GitHub under the sqlalchemy-redshift org? It would make it easier to browse the code, and it would let us make PRs.
Another thought on bigcrunch: I'd be for upping the timeout. It looks like it's supposed to shut down one hour after the last test session is created, and I've been burned by that a few times; it means waiting for another whole round of shutdown and spin-up before I can run more tests. Maybe let it keep running for four hours?
The UNLOAD command now has a very useful MAXFILESIZE option to control the maximum size of the produced files. The UnloadFromSelect construct in the dialect should support it.
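To illustrate the SQL such support would need to emit (max_file_size_mb is a hypothetical parameter, not part of the current UnloadFromSelect API, and the quoting here is naive, for illustration only):

```python
def unload_sql(query, s3_path, iam_role, max_file_size_mb=None):
    """Build an UNLOAD statement with an optional MAXFILESIZE clause."""
    sql = ("UNLOAD ('{q}') TO '{p}' IAM_ROLE '{r}'"
           .format(q=query.replace("'", "''"), p=s3_path, r=iam_role))
    if max_file_size_mb is not None:
        # Redshift accepts MAXFILESIZE [AS] size [MB | GB]
        sql += " MAXFILESIZE {0} MB".format(max_file_size_mb)
    return sql
```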
Getting an error when trying to create a table with a name longer than 63 characters, even though the Redshift docs state that 127 characters is the limit:
Probably only @graingert has full admin access to this repo. We should set the URL for this project to our readthedocs URL, so that it displays at the top of the GitHub page, right next to the project description.
We use role-based credentials (http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html#copy-usage_notes-access-role-based) instead of an access key and secret key. However, it appears that UnloadFromSelect and CopyCommand do not support them.
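For reference, role-based COPY/UNLOAD authentication passes an IAM role ARN in the credentials clause; aws_iam_role=... is the format from the AWS docs linked above. A small helper sketch:

```python
def role_credentials(account_id, role_name):
    """Build the CREDENTIALS string for role-based COPY/UNLOAD."""
    return ("aws_iam_role=arn:aws:iam::{0}:role/{1}"
            .format(account_id, role_name))
```

Supporting this in the dialect would presumably mean accepting such a string (or the ARN directly) as an alternative to access_key_id/secret_access_key.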
I'm new to SQLAlchemy, so apologies if this is obvious.
I'm not able to reflect the non-default schemas.
When I run:
meta = sa.MetaData(schema="booker")
meta.reflect(bind=engine)
I get
OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[SQL: '
SELECT
c.relkind,
n.oid as "schema_oid",
n.nspname as "schema",
c.oid as "rel_oid",
c.relname,
CASE c.reldiststyle
WHEN 0 THEN \'EVEN\' WHEN 1 THEN \'KEY\' WHEN 8 THEN \'ALL\' END
AS "diststyle",
c.relowner AS "owner_id",
u.usename AS "owner_name",
TRIM(TRAILING \';\' FROM pg_catalog.pg_get_viewdef(c.oid, true))
AS "view_definition",
pg_catalog.array_to_string(c.relacl, \'
\') AS "privileges"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner
WHERE c.relkind IN (\'r\', \'v\', \'m\', \'S\', \'f\')
AND n.nspname !~ \'^pg_\'
ORDER BY c.relkind, n.oid, n.nspname;
']
P.S. This seems similar to apache/superset#217