GithubHelp home page GithubHelp logo

Comments (8)

BurntSushi avatar BurntSushi commented on July 20, 2024

The business reason for altering data types is that Rails Active Record Migrations can't recognize user-defined types in PostgreSQL, so won't know to interpret start_field and end_field as integers. I'm making a bunch of other table and column alters too, and have taken care to write changes to revert to the original schema so that nfldb-update still works.

This sounds like a great project! Thank you for working on it. Bringing nfldb into the Rails world seems like a great thing.

With respect to your problem... The issue is that the field_pos type is actually a "row" type with a single column. You can see the different with these two select queries:

This selects the full "row": (notice the parenthesis, you could also think of start_field as a tuple of a single value)

SELECT start_field FROM drive LIMIT 1;
 start_field 
-------------
 (-30)
(1 row)

And you can access the individual components of the row or tuple:

SELECT (start_field).pos FROM drive LIMIT 1;
 start_field 
-------------
 -30
(1 row)

And now you can cast:

SELECT (start_field).pos::int FROM drive LIMIT 1;
 start_field 
-------------
 -30
(1 row)

The issue is that you were trying to cast a tuple to an integer directly. This problem would have been more clear if it weren't a 1-tuple, but creating custom types is useful for automatic serialization with Python's PostgreSQL driver, psycopg2.

from nfldb.

albertlyu avatar albertlyu commented on July 20, 2024

Hi -- thanks, that was a very helpful explanation! It makes sense now why some fields are encapsulated by parentheses -- they're tuples with only one element! start_time and end_time are more obviously 2-tuples of data type game_time as (phase, elapsed), and so on (e.g. (Q1,0)).

So I can cast (start_field).pos directly to an integer, but I am now getting a different error when altering the column using this method. Do you think you can explain the following?

ALTER TABLE drive ALTER COLUMN start_field SET DATA TYPE int USING (start_field).pos::int;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field SET DATA TYPE int USING (...
                                                             ^

My guess is that this ALTER statement is trying to alter the column (a tuple) and change the data type by using an individual component of the tuple, instead of the tuple itself. But all I want is the component of the tuple, not the 'row' itself (composite type is to row/record as data type is to field?).

Perhaps I'm not understanding how USING is worked when changing the data type of a column. Curiously, the following also return the same error message, though they may be in effect executing the same command:

ALTER TABLE drive ALTER COLUMN start_field SET DATA TYPE int USING (start_field).pos;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field SET DATA TYPE int USING (...
                                                             ^
ALTER TABLE drive ALTER COLUMN start_field TYPE int USING (start_field).pos;

ERROR:  column notation .pos applied to type integer, which is not a composite type
LINE 1: ...BLE drive ALTER COLUMN start_field TYPE int USING (start_fie...
                                                             ^

Any thoughts?

from nfldb.

BurntSushi avatar BurntSushi commented on July 20, 2024

I'm stumped. Can you brute force it instead?

nfldb_test=# alter table drive add column start_field_scalar field_offset;
ALTER TABLE
nfldb_test=# update drive set start_field_scalar = (start_field).pos;
UPDATE 39368
nfldb_test=# alter table drive drop column start_field;
ALTER TABLE
nfldb_test=# alter table drive rename column start_field_scalar to start_field;
ALTER TABLE

Resulting table:

nfldb_test=# \d drive
               Table "public.drive"
    Column     |         Type         | Modifiers 
---------------+----------------------+-----------
 gsis_id       | gameid               | not null
 drive_id      | usmallint            | not null
 start_time    | game_time            | not null
 end_field     | field_pos            | 
 end_time      | game_time            | not null
 pos_team      | character varying(3) | not null
 pos_time      | pos_period           | 
 first_downs   | usmallint            | not null
 result        | text                 | 
 penalty_yards | smallint             | not null
 yards_gained  | smallint             | not null
 play_count    | usmallint            | not null
 time_inserted | utctime              | not null
 time_updated  | utctime              | not null
 start_field   | field_offset         | 

from nfldb.

albertlyu avatar albertlyu commented on July 20, 2024

Sorry for the late response, but I will resort to adding/dropping columns if I can't figure out another way. Feel free to close this if you want to keep the issue list with recent or high priority issues.

from nfldb.

albertlyu avatar albertlyu commented on July 20, 2024

Thanks again for the help!

from nfldb.

BurntSushi avatar BurntSushi commented on July 20, 2024

No worries, I'm just doing some tidying. Feel free to close this when you reach a resolution.

I noodled on this a little more, and I speculate that it may be impossible to do with a single ALTER TABLE command. I think the problem is that ALTER COLUMN will only let you change the type if the in memory representation is the same. I would have thought that a tuple with a single field would have the same representation as a single field itself, but perhaps that was a bad assumption.

In any case, I found it difficult to get proper documentation on the matter. The ALTER TABLE docs are broad without much depth.

from nfldb.

albertlyu avatar albertlyu commented on July 20, 2024

So is it possible to do this with multiple ALTER TABLE commands without dropping the columns? Like altering the type of the field or the tuple in subsequent ALTER TABLE commands? The way I ideally would like to set up my ALTER statements would be each table per ALTER statement with ALTER COLUMNS one by one, but there does not seem to be a straightforward way to go about that. Maybe this is not worth the trouble to find a 'clean way' of alterring the data types from tuples. Hmm.

from nfldb.

BurntSushi avatar BurntSushi commented on July 20, 2024

If my speculation about ALTER COLUMN not being allowed to change the in memory representation of a column is correct (and (value,) has a different representation than value), then no, it cannot be done with just ALTER commands.

I emphasize that this is speculation. Haven't really had time to dig deeper.

The method I suggested above is a bit brutish, but if it's contained inside a single transaction, then it's guaranteed that you won't corrupt your database by being left in an intermediate state. (You should be thankful that you aren't dealing with SQLite. It doesn't even have an ALTER TABLE command in the first place! You have to create a temporary table, copy data to it, drop existing table, create new table with schema changes, copy old data into new table with whatever transformations are necessary and then drop the temporary table. Blech.)

If you're writing migrations and it's doing transactions correctly, then there really isn't a major downside to just using my approach. In fact, it's probably pretty close to what happens under the hood if the ALTER TABLE command worked.

from nfldb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.