odino / docsql

Import Google Docs' spreadsheets into a MySQL table

License: MIT License

Makefile 8.28% Go 89.12% Dockerfile 2.60%
golang google-docs mysql

docsql's Introduction

docsql

A tool to import spreadsheets hosted on Google Docs to a MySQL table.

Usage

Grab a binary from the releases page and start having some fun:

$ docsql \
--doc "https://docs.google.com/spreadsheets/d/1vyVxaYgfZ2Tka7reg4whg99kRlWqpg6cKvEa1QFArZI/export?format=tsv" \
--table my_sample \
--connection "root:@tcp(localhost:3308)/test?charset=utf8&allowAllFiles=true"

2018/02/12 23:20:31 Downloading https://docs.google.com/spreadsheets/d/1vyVxaYgfZ2Tka7reg4whg99kRlWqpg6cKvEa1QFArZI/export?format=tsv ...
2018/02/12 23:20:32 Doc downloaded in my_sample_1518463231621589126.csv
2018/02/12 23:20:32 Connecting to MySQL...
2018/02/12 23:20:32 Creating table 'my_sample_1518463231621589126'...
2018/02/12 23:20:32 Connecting to MySQL...
2018/02/12 23:20:32 Loading data into 'my_sample_1518463231621589126'...
2018/02/12 23:20:32 Connecting to MySQL...
2018/02/12 23:20:32 Swapping 'my_sample' with 'my_sample_1518463231621589126'
2018/02/12 23:20:32 Connecting to MySQL...
2018/02/12 23:20:32 Creating table 'my_sample'...
2018/02/12 23:20:32 Connecting to MySQL...
2018/02/12 23:20:32 Clearing old tables...
2018/02/12 23:20:32 All done

Advanced

Spreadsheet

Your spreadsheet will need to be shared publicly (anyone with the link can access), and the URL you need to feed to docsql takes the form of https://docs.google.com/spreadsheets/d/$DOCID/export?format=tsv where $DOCID is the unique ID of the Google Doc.

By default, docsql will download the first sheet in the doc, but if you need to import other sheets you can simply append the gid of the sheet at the end of the URL (https://docs.google.com/spreadsheets/d/$DOCID/export?format=tsv&gid=$GID).

Please note that the export format must be tsv because, well, it's just easier to handle than csv.
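
As an illustration of the URL format described above, here is a minimal Go sketch that assembles the export URL from a document ID and an optional gid. The function name exportURL is made up for this example and is not part of docsql.

```go
package main

import (
	"fmt"
	"net/url"
)

// exportURL (hypothetical helper) builds the TSV export URL for a publicly
// shared spreadsheet. An empty gid leaves the parameter out, which exports
// the first sheet.
func exportURL(docID, gid string) string {
	q := url.Values{}
	q.Set("format", "tsv")
	if gid != "" {
		q.Set("gid", gid)
	}
	u := url.URL{
		Scheme:   "https",
		Host:     "docs.google.com",
		Path:     "/spreadsheets/d/" + docID + "/export",
		RawQuery: q.Encode(), // keys are emitted in sorted order: format, gid
	}
	return u.String()
}

func main() {
	fmt.Println(exportURL("DOCID", ""))
	fmt.Println(exportURL("DOCID", "1234"))
}
```

The resulting string is what you would pass to the --doc flag.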

MySQL

Instead of passing the connection string to MySQL as a flag, you can export it as an environment variable. This makes sure you don't leave credentials on the CLI:

$ export CONNECTION=...

$ docsql \
--doc "https://docs.google.com/spreadsheets/d/1vyVxaYgfZ2Tka7reg4whg99kRlWqpg6cKvEa1QFArZI/export?format=tsv" \
--table my_sample  

2018/02/12 23:27:30 Downloading https://docs.google.com/spreadsheets/d/1vyVxaYgfZ2Tka7reg4whg99kRlWqpg6cKvEa1QFArZI/export?format=tsv ...
2018/02/12 23:27:33 Doc downloaded in my_sample_1518463650997899367.csv
2018/02/12 23:27:33 Connecting to MySQL...
2018/02/12 23:27:33 Creating table 'my_sample_1518463650997899367'...
2018/02/12 23:27:33 Connecting to MySQL...
2018/02/12 23:27:33 Loading data into 'my_sample_1518463650997899367'...
2018/02/12 23:27:33 Connecting to MySQL...
2018/02/12 23:27:33 Swapping 'my_sample' with 'my_sample_1518463650997899367'
2018/02/12 23:27:33 Connecting to MySQL...
2018/02/12 23:27:33 Creating table 'my_sample'...
2018/02/12 23:27:33 Connecting to MySQL...
2018/02/12 23:27:33 Clearing old tables...
2018/02/12 23:27:33 All done

Be aware that LOAD DATA LOCAL INFILE must be available on the MySQL server, and you will need to end your connection string with allowAllFiles=true so that the Go MySQL driver is allowed to process local files.
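
If you build the connection string programmatically, a small helper can make sure the parameter is present. The sketch below is an assumption-laden illustration, not docsql code: withAllowAllFiles is a hypothetical name, and it only handles DSNs shaped like the examples above (user:pass@tcp(host:port)/db?params). It does not rewrite an existing allowAllFiles=false.

```go
package main

import (
	"fmt"
	"strings"
)

// withAllowAllFiles (hypothetical helper) returns dsn with allowAllFiles=true
// among its parameters, appending it only when it is missing.
func withAllowAllFiles(dsn string) string {
	const param = "allowAllFiles=true"

	// Parameters start at the first '?' after the last '/', which is the
	// slash that separates the address from the database name.
	slash := strings.LastIndex(dsn, "/")
	if slash < 0 {
		return dsn // not a DSN shape this sketch understands; leave untouched
	}
	q := strings.Index(dsn[slash:], "?")
	if q < 0 {
		return dsn + "?" + param
	}
	params := dsn[slash+q+1:]
	if params == "" {
		return dsn + param
	}
	for _, p := range strings.Split(params, "&") {
		if p == param {
			return dsn
		}
	}
	return dsn + "&" + param
}

func main() {
	fmt.Println(withAllowAllFiles("root:@tcp(localhost:3308)/test?charset=utf8"))
	fmt.Println(withAllowAllFiles("root:@tcp(localhost:3308)/test"))
}
```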

Keeping old tables

docsql is (probably) meant to run as a cron job, or every time you update your spreadsheet. Whenever it runs, it nukes the previous version of the output table and imports the new contents of the spreadsheet.

You can customize how many (old) tables to keep with the --keep flag. For example, docsql ... --keep 5 will keep 5 versions of the old table in MySQL:

mysql> SHOW TABLES;
+---------------------------------------+
| Tables_in_test                        |
+---------------------------------------+
| my_sample                             |
| my_sample_1518463163413558194_archive |
| my_sample_1518463168405819860_archive |
| my_sample_1518463173716215291_archive |
| my_sample_1518463231621589126_archive |
| my_sample_1518463650997899367_archive |
+---------------------------------------+
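
To make the retention rule concrete, here is a rough Go sketch of how one could decide which archive tables fall outside the newest N. It assumes the naming pattern visible in the listing above (table name, underscore, numeric timestamp, then _archive). The function archivesToDrop is hypothetical and does not reflect how docsql implements --keep internally.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// archivesToDrop (hypothetical helper) returns the archive tables of base
// that are older than the newest keep versions, oldest first.
func archivesToDrop(tables []string, base string, keep int) []string {
	type archive struct {
		name string
		ts   int64
	}
	var found []archive
	for _, t := range tables {
		if !strings.HasPrefix(t, base+"_") || !strings.HasSuffix(t, "_archive") {
			continue
		}
		middle := strings.TrimSuffix(strings.TrimPrefix(t, base+"_"), "_archive")
		ts, err := strconv.ParseInt(middle, 10, 64)
		if err != nil {
			continue // not a timestamped archive of this table
		}
		found = append(found, archive{name: t, ts: ts})
	}
	// Oldest first, so everything before the last keep entries gets dropped.
	sort.Slice(found, func(i, j int) bool { return found[i].ts < found[j].ts })
	if keep < 0 {
		keep = 0
	}
	if len(found) <= keep {
		return nil
	}
	var drop []string
	for _, a := range found[:len(found)-keep] {
		drop = append(drop, a.name)
	}
	return drop
}

func main() {
	tables := []string{
		"my_sample",
		"my_sample_1518463163413558194_archive",
		"my_sample_1518463168405819860_archive",
		"my_sample_1518463173716215291_archive",
	}
	fmt.Println(archivesToDrop(tables, "my_sample", 2))
}
```

Each returned name could then be passed to a DROP TABLE statement.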

Table structure

docsql will make a few opinionated assumptions for you:

  • all fields in the table are VARCHAR(255)
  • it creates a docsql_id field used as a primary key
  • it adds a docsql_created_at field with the timestamp when the rows were loaded into the table
  • it sanitizes column names (taken from the spreadsheet) by filtering out non-alphanumeric characters

There are plans to make all of these configurable in the future through flags. PRs are more than welcome!
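
The assumptions listed above can be sketched in Go as follows. This is only an illustration: sanitizeColumn and createTableSQL are hypothetical names, and the exact column types chosen for docsql_id and docsql_created_at are guesses rather than the statement docsql really issues. Headers that sanitize to an empty string are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches every run of characters that is not a letter or a digit.
var nonAlnum = regexp.MustCompile(`[^A-Za-z0-9]+`)

// sanitizeColumn strips non-alphanumeric characters from a spreadsheet header.
func sanitizeColumn(name string) string {
	return nonAlnum.ReplaceAllString(name, "")
}

// createTableSQL builds a CREATE TABLE statement with one VARCHAR(255) column
// per header, plus the docsql_id and docsql_created_at fields.
// The types of those two extra fields are assumptions.
func createTableSQL(table string, headers []string) string {
	cols := []string{"`docsql_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY"}
	for _, h := range headers {
		cols = append(cols, "`"+sanitizeColumn(h)+"` VARCHAR(255)")
	}
	cols = append(cols, "`docsql_created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP")
	return "CREATE TABLE `" + table + "` (\n  " + strings.Join(cols, ",\n  ") + "\n)"
}

func main() {
	fmt.Println(createTableSQL("my_sample", []string{"First Name", "e-mail (work)"}))
}
```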

Other stuff?

It might be a good idea to run docsql --help to have a look at what's available.

Contributing

docsql is being developed through Docker because, well, I don't always have the Go toolchain with me!

Anyhow, it should be fairly straightforward to get running:

  • make build_docker will build the Docker container used for development
  • make test ARGS="go run main.go -d $YOUR_TEST_DOC -t $TABLE -c $MYSQL_CONNECTION_STRING" will build and run docsql on the fly
  • make release will generate a release binary (under builds/)

Feel free to rant or, even better, fix some of my crappy code through a pull request!

Tests

Ideas

  • if a column ends in :index it should be indexed
  • ability to alter the CREATE TABLE via flags
  • abort if some basic checks don't pass (i.e. a minimum number of rows, in case someone nukes the doc by mistake)
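
The last idea could look roughly like the Go sketch below, which counts the data rows of a downloaded TSV and reports an error when there are too few. Nothing here exists in docsql today: checkMinRows and its parameters are hypothetical, and treating the first line as a header is an assumption.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// checkMinRows (hypothetical helper) parses TSV content and returns an error
// when it holds fewer than minRows data rows. The first line is assumed to be
// the header and is not counted.
func checkMinRows(tsv string, minRows int) error {
	r := csv.NewReader(strings.NewReader(tsv))
	r.Comma = '\t'
	r.LazyQuotes = true    // tolerate stray quotes inside cells
	r.FieldsPerRecord = -1 // rows may have differing numbers of fields
	records, err := r.ReadAll()
	if err != nil {
		return err
	}
	dataRows := len(records) - 1
	if dataRows < 0 {
		dataRows = 0
	}
	if dataRows < minRows {
		return fmt.Errorf("only %d data rows, expected at least %d", dataRows, minRows)
	}
	return nil
}

func main() {
	doc := "name\tage\nann\t30\nbob\t41\n"
	fmt.Println(checkMinRows(doc, 2))
	fmt.Println(checkMinRows(doc, 10))
}
```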

docsql's People

Contributors

nerac, odino

docsql's Issues

Some tsv files are not accepted in the tool

This tool has been extremely helpful for my job, thank you very much for sharing it publicly. Unfortunately, I have started having an issue with it. Even though I use the correct link to download the table, with this URL format:
https://docs.google.com/spreadsheets/d/<TABLE_ID>/export?format=tsv&gid=<SHEET_GID>
I get the message
Unrecoverable error: The file we downloaded has content type '[text/html; charset=UTF-8]', while we expected 'text/tab-separated-values'. Are you sure you entered the right URL?

This URL format works for most of the tables but not for some.

When I comment out the part which checks for the table format (tsv), I get this parsing error, which might help in figuring out the problem we're facing.

Unrecoverable error: parse error on line 3, column 11: bare " in non-quoted-field

I checked the table and it doesn't have any character (like ", \t, ',') that might have caused the parsing error.

Do you have any idea why I might be facing this problem?
I think it might be caused by a redirected link. If a download link is redirected, the file downloaded with curl will be the HTML page that contains the redirect link. Is there a way to follow redirections and download the very last file that is reached?
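
For context on the question above: Go's default net/http client already follows redirects (up to 10), whereas curl needs -L to do so. A text/html response from this URL often means the sheet is not shared publicly, so the final page is an HTML sign-in page rather than the TSV. The sketch below shows one way to validate a Content-Type header value in Go. It is an illustration only: checkContentType is a hypothetical name and this is not necessarily how docsql performs its check.

```go
package main

import (
	"fmt"
	"mime"
)

// wantType is the media type expected from a TSV export.
const wantType = "text/tab-separated-values"

// checkContentType (hypothetical helper) parses a Content-Type header value
// and reports an error unless its media type is text/tab-separated-values.
// Parameters such as charset are ignored.
func checkContentType(header string) error {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return fmt.Errorf("unreadable Content-Type %q: %v", header, err)
	}
	if mediaType != wantType {
		return fmt.Errorf("got content type %q, expected %q", mediaType, wantType)
	}
	return nil
}

func main() {
	fmt.Println(checkContentType("text/tab-separated-values; charset=utf-8"))
	fmt.Println(checkContentType("text/html; charset=UTF-8"))
}
```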

Archive tables never purged

Hi there, first of all - thank you so much for developing this software, it was exactly what I was looking for, and works like a charm.

I can't seem to get the archived tables to be deleted. I've experimented with different settings for -k (--keep) but none are deleted, even though command-line output seems to claim it is doing so:

2019/03/20 16:06:05 Connecting to MySQL...
2019/03/20 16:06:05 Clearing old tables...
2019/03/20 16:06:05 All done

The user has DROP privileges.

Unable to DL sheet

Keep getting the error Unrecoverable error: The file we downloaded has content type '[text/html; charset=UTF-8]', while we expected 'text/tab-separated-values'. Are you sure you entered the right URL?

This was working fine last week. Doc URL ends with /export?format=tsv
