
dogsheep / google-takeout-to-sqlite

Save data from Google Takeout to a SQLite database

License: Apache License 2.0

Python 100.00%
google sqlite datasette dogsheep datasette-io datasette-tool

google-takeout-to-sqlite's Introduction

Dogsheep

Dogsheep is a collection of tools for personal analytics using SQLite and Datasette.

Big internet companies know a lot about us. By exporting that data back out of them we can see what they know and maybe learn something interesting about ourselves.

Read more about Dogsheep on my blog: simonwillison.net/tags/dogsheep

Watch Personal Data Warehouses: Reclaiming Your Data for a demo of Dogsheep in action.

Dogsheep tools

These tools, maintained by the Dogsheep project, let you export your data into a SQLite database for further analysis.

Tools by other developers

These tools help bring the Dogsheep philosophy to life.

google-takeout-to-sqlite's People

Contributors

simonw


google-takeout-to-sqlite's Issues

sqlite3.OperationalError: no such table: main.my_activity

Hello,
When I run the command google-takeout-to-sqlite my-activity db.db takeout-20220203T174446Z-001.zip, I get this error:

Traceback (most recent call last):
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\julie\AppData\Local\Programs\Python\Python39-32\Scripts\google-takeout-to-sqlite.exe\__main__.py", line 7, in <module>
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\google_takeout_to_sqlite\cli.py", line 31, in my_activity
    utils.save_my_activity(db, zf)
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\google_takeout_to_sqlite\utils.py", line 19, in save_my_activity
    db["my_activity"].create_index(["time"])
  File "c:\users\julie\appdata\local\programs\python\python39-32\lib\site-packages\sqlite_utils\db.py", line 629, in create_index
    self.db.conn.execute(sql)
sqlite3.OperationalError: no such table: main.my_activity

Thank you for your help.
Sorry for my bad English.
EDIT: I used the JSON export format.
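The error occurs because when no JSON activity files are found in the archive, no rows are inserted, the my_activity table is never created, and create_index then fails. A minimal defensive sketch of the idea, using the stdlib sqlite3 module rather than the tool's actual sqlite_utils code (function name is hypothetical):

```python
import sqlite3


def create_time_index_if_table_exists(conn):
    # If the import found no JSON files, the table was never created;
    # check sqlite_master first to avoid "no such table: main.my_activity".
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'my_activity'"
    ).fetchone()
    if exists:
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_my_activity_time ON my_activity(time)"
        )
    return bool(exists)
```

This would turn a crash into a no-op when the archive yields no importable data.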

Add more details on how to request data from google takeout correctly.

The default is to download everything, which can produce an enormous archive when only two types of data are actually needed for now:

  • My Activity
  • Location History

In addition, unless you specify that "My Activity" should be exported in JSON format, the default is HTML. This causes the

google-takeout-to-sqlite my-activity takeout.db takeout.zip

command to fail, because the archive then contains HTML files rather than JSON files.

Thanks
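To check up front whether an archive will work, one could list the extensions of its My Activity entries. A quick stdlib sketch, assuming the typical Takeout zip layout (the function name is illustrative, not part of the tool):

```python
import zipfile


def my_activity_formats(zip_path):
    """Return the file extensions of My Activity entries in a Takeout zip.

    If this returns {'html'} rather than {'json'}, the export needs to be
    re-requested with JSON selected as the format for My Activity.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return {
            name.rsplit(".", 1)[-1].lower()
            for name in zf.namelist()
            if "My Activity" in name and "." in name
        }
```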

KeyError: 'accuracy' when processing Location History

I'm new to both the dogsheep tools and datasette but have been experimenting a bit the last few days and these are really cool tools!

I encountered a problem running my Google location history through this tool running the latest release in a docker container:

Traceback (most recent call last):
  File "/usr/local/bin/google-takeout-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/cli.py", line 49, in my_activity
    utils.save_location_history(db, zf)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py", line 27, in save_location_history
    db["location_history"].upsert_all(
  File "/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py", line 1105, in upsert_all
    return self.insert_all(
  File "/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py", line 990, in insert_all
    chunk = list(chunk)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py", line 33, in <genexpr>
    "accuracy": row["accuracy"],
KeyError: 'accuracy'

It looks like the tool assumes the accuracy key will be in every location history entry.

My first attempt at a local patch was to access the accuracy key with .get instead, hoping to make the column nullable, though I wasn't sure how sqlite_utils would handle that. The import then succeeded, so I was going to propose that change as a patch. However, when I updated the existing test to include an entry with a missing accuracy key, I noticed the expected type of the field changed to a string in the test (and, from a quick scan of the sqlite_utils code, probably TEXT in the database). Given that change in column type, opening an issue before proposing a fix seemed warranted. It appears the schema would need to be specified explicitly to get a nullable integer column.

Now that I've done a successful import run using my initial fix of calling .get on the row dict, I can see with datasette that I only have 7 data points (out of ~250k) that have a null accuracy column. They are all from 2011-2012 in an import that includes points spanning ~2010-2016 so perhaps another approach might be to filter those entries out during import if it really is that infrequent?

I'm happy to provide a PR for a fix but figured I'd ask about which direction is preferred first.
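The .get-based approach described above can be sketched as a small row transform. Field names are taken from the traceback; this is the reporter's proposed workaround, not a merged fix:

```python
def location_row(row):
    # Use .get for "accuracy" so entries without it produce None (stored
    # as NULL) instead of raising KeyError. As noted above, the inferred
    # column type may change unless the schema is declared explicitly.
    return {
        "latitude": row["latitudeE7"] / 1e7,
        "longitude": row["longitudeE7"] / 1e7,
        "accuracy": row.get("accuracy"),
    }
```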

location history changes

Not sure if each download is unique, but I had to change a few things to work with the Takeout zip I created on 2023-01-25:

  • the filename changed from "Location History.json" to "Records.json"

  • "timestampMs" is no longer present; "timestamp" is roughly an ISO 8601 timestamp

import datetime
import hashlib
import json


def get_timestamp_ms(raw_timestamp):
    # "timestamp" values may or may not include fractional seconds, so try
    # both formats (note: .timestamp() actually returns seconds, not ms)
    try:
        return datetime.datetime.strptime(raw_timestamp, "%Y-%m-%dT%H:%M:%SZ").timestamp()
    except ValueError:
        return datetime.datetime.strptime(raw_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").timestamp()

def save_location_history(db, zf):
    location_history = json.load(
        zf.open("Takeout/Location History/Records.json")
    )
    db["location_history"].upsert_all(
        (
            {
                "id": id_for_location_history(row),
                "latitude": row["latitudeE7"] / 1e7,
                "longitude": row["longitudeE7"] / 1e7,
                "accuracy": row["accuracy"],
                "timestampMs": get_timestamp_ms(row["timestamp"]),
                "when": row["timestamp"],
            }
            for row in location_history["locations"]
        ),
        pk="id",
    )


def id_for_location_history(row):
    # We want an ID that is unique but can be sorted by in
    # date order - so we use the isoformat date + the first
    # 6 characters of a hash of the JSON
    first_six = hashlib.sha1(
        json.dumps(row, separators=(",", ":"), sort_keys=True).encode("utf8")
    ).hexdigest()[:6]
    return "{}-{}".format(
        row['timestamp'],
        first_six,
    )

Example locations from my archive:

{
    "latitudeE7": 427220206,
    "longitudeE7": -923423972,
    "accuracy": 10,
    "deviceTag": -1312429967,
    "deviceDesignation": "PRIMARY",
    "timestamp": "2019-01-08T23:31:50.867Z"
  }
{
    "latitudeE7": 427011317,
    "longitudeE7": -923448300,
    "accuracy": 5,
    "deviceTag": -1312429967,
    "deviceDesignation": "PRIMARY",
    "timestamp": "2019-01-08T23:33:53Z"
  }

sqlite-utils error on takeout import

$ google-takeout-to-sqlite my-activity takeout.db /path/to/zip
...
sqlite3.OperationalError: no such table: main.my_activity

There is no table creation step in utils.py, unlike other importers such as github-to-sqlite.

Additionally, this package and hackernews-to-sqlite have a sqlite-utils dependency that conflicts with datasette and dogsheep-beta.
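A sketch of the kind of up-front table creation the issue asks for, written against the stdlib sqlite3 module (column names are assumed for illustration; the importer's real schema may differ):

```python
import sqlite3


def ensure_my_activity_table(conn):
    # Creating the table and index up front means an empty import no
    # longer raises "no such table: main.my_activity".
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS my_activity (
            id TEXT PRIMARY KEY,
            time TEXT,
            header TEXT,
            title TEXT
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_my_activity_time ON my_activity(time)"
    )
```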

Feature Request: Gmail

From Takeout, I exported only my Gmail account. Ideally I could parse this into SQLite via this tool.
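Gmail arrives from Takeout as an mbox file, so a rough sketch of what such a subcommand could do is to walk the mbox with the stdlib mailbox module and store one row per message. All names here (function, table, columns) are hypothetical, not part of the tool:

```python
import mailbox
import sqlite3


def import_mbox(mbox_path, db_path):
    # Read a Takeout Gmail mbox export and store basic headers,
    # one row per message, keyed on Message-ID.
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mbox_emails "
        "(message_id TEXT PRIMARY KEY, subject TEXT, sender TEXT, date TEXT)"
    )
    for msg in mailbox.mbox(mbox_path):
        db.execute(
            "INSERT OR REPLACE INTO mbox_emails VALUES (?, ?, ?, ?)",
            (msg.get("Message-ID"), msg.get("Subject"), msg.get("From"), msg.get("Date")),
        )
    db.commit()
    db.close()
```

A real importer would also need to handle message bodies, encodings, and attachments, which is where most of the work lies.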
