klieret / ankipandas

Analyze and manipulate your Anki collection using pandas! 🌠 🐼

Home Page: https://ankipandas.rtfd.io/

License: MIT License

Python 100.00%
anki pandas pandas-dataframes anki-addon anki21 spaced-repetition

ankipandas's Introduction

Analyze and manipulate your Anki collection using pandas!


๐Ÿ“ Description

Note This package needs a new maintainer, as I currently do not have enough time to continue development of this package. Writing modifications back into the Anki database is currently disabled, in particular because of issue #137. Please reach out to me if you are interested in getting involved!

Anki is one of the most popular flashcard systems for spaced repetition learning, and pandas is the most popular Python package for data analysis and manipulation. So what could be better than bringing both together?

With AnkiPandas you can use pandas to easily analyze or manipulate your Anki flashcards.

Features:

  • Select: Easily select arbitrary subsets of your cards, notes or reviews using pandas (one of many introductions, official documentation)
  • Visualize: Use pandas' powerful built-in tools or switch to the even more versatile seaborn (statistical analysis) or matplotlib libraries
  • Manipulate: Apply fast bulk operations to the table (e.g. add tags, change decks, set field contents, suspend cards, ...) or iterate over the table and perform these manipulations step by step. ⚠️ This functionality is currently disabled until #137 has been resolved! ⚠️
  • Import and Export: Pandas can export to (and import from) csv, MS Excel, HTML, JSON, ... (io documentation)

Pros:

  • Easy installation: Install via python package manager (independent of your Anki installation)
  • Simple: Just one line of code to get started
  • Convenient: Bring together information about cards, notes, models, decks and more in just one table!
  • Fully documented: Documentation on readthedocs
  • Well tested: More than 100 unit tests to keep everything in check

Alternatives: If your main goal is to add new cards, models and more, you can also take a look at the genanki project.

📦 Installation

AnkiPandas is available as a PyPI package and can be installed or upgraded with the Python package manager:

pip3 install --user --upgrade ankipandas

Development installation

For the latest development version you can also work from a cloned version of this repository:

git clone https://github.com/klieret/ankipandas/
cd ankipandas
pip3 install --user --upgrade --editable .

If you want to help develop this package further, please also install the pre-commit hooks and use gitmoji:

pre-commit install
gitmoji -i

🔥 Let's get started!

Starting up is as easy as this:

from ankipandas import Collection

col = Collection()

col.notes will then be a dataframe containing all notes, with additional methods that make many things easy. Similarly, you can access cards or reviews using col.cards or col.revs.

If called without any argument, Collection() tries to find your Anki database by itself. However, this might take some time. To speed this up, simply supply (part of) the path to the database and (if you have more than one user) your Anki user name, e.g. Collection(".local/share/Anki2/", user="User 1") on many Linux installations.
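
To illustrate why a search path and a user name narrow things down, here is a minimal sketch of such a lookup using only the standard library. The helper name find_collections is hypothetical and this is not how AnkiPandas itself implements the search; it only assumes that each profile keeps a collection.anki2 file in a folder named after the user.

```python
from pathlib import Path
from typing import List, Optional


def find_collections(search_root: Path, user: Optional[str] = None) -> List[Path]:
    """Return all collection.anki2 files below search_root (hypothetical helper).

    If user is given, keep only databases whose parent folder is named
    after that user.
    """
    hits = sorted(search_root.rglob("collection.anki2"))
    if user is not None:
        hits = [path for path in hits if path.parent.name == user]
    return hits
```

A narrower search_root means fewer directories to walk, which is why passing (part of) the path speeds up startup.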

To get information about the interpretation of each column, use print(col.notes.help_cols()).

Take a look at the documentation to find out more about the available methods!

Some basic examples:

📈 Analysis

More examples: Analysis documentation, projects that use AnkiPandas.

Show a histogram of the number of reviews (repetitions) of each card for all decks:

col.cards.hist(column="creps", by="cdeck")

Show the number of leeches per deck as a pie chart:

cards = col.cards.merge_notes()
selection = cards[cards.has_tag("leech")]
selection["cdeck"].value_counts().plot.pie()

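Find all notes of model MnemoticModel with an empty Mnemotic field:

notes = col.notes.fields_as_columns()
# A quoted 'Mnemotic' would be a plain string literal, so the expanded column is
# referenced by name here (assuming the default "nfld_" prefix and the "nmodel" column).
notes.query("nmodel == 'MnemoticModel' and nfld_Mnemotic == ''")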

๐Ÿ› ๏ธ Manipulations

Warning Writing the database has currently been disabled until #137 has been resolved. Help is much appreciated!

Warning Please be careful and test this well! Ankipandas will create a backup of your database before writing, so you can always restore the previous state. Please make sure that everything is working before continuing to use Anki normally!
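
If you want an extra safety net of your own, a timestamped copy of the database file can be made before any write. This is a minimal sketch with a hypothetical helper name (backup_database) and a made-up naming scheme; it is independent of the backup that AnkiPandas creates itself.

```python
import shutil
import time
from pathlib import Path


def backup_database(db_path: Path, backup_dir: Path) -> Path:
    """Copy db_path into backup_dir under a timestamped name (hypothetical helper)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d-%H.%M.%S")
    target = backup_dir / f"backup-{stamp}-{db_path.name}"
    shutil.copy2(db_path, target)  # copy2 also preserves the modification time
    return target
```

Make the copy while Anki is closed, so that the file is not being modified during the copy.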

Add the difficult-japanese and marked tags to all notes that contain the tags Japanese and leech:

notes = col.notes
selection = notes[notes.has_tags(["Japanese", "leech"])]
selection = selection.add_tag(["difficult-japanese", "marked"])
col.notes.update(selection)
col.write(modify=True)  # Overwrites your database after creating a backup!

Set the language field to English for all notes of model LanguageModel that are tagged with English:

notes = col.notes
selection = notes[notes.has_tag(["English"])].query("nmodel=='LanguageModel'").copy()
selection.fields_as_columns(inplace=True)
# Expanded fields carry a prefix (assumed here to be the default "nfld_")
selection["nfld_language"] = "English"
col.notes.update(selection)
col.write(modify=True)

Move all cards tagged leech to the deck Leeches Only:

cards = col.cards
selection = cards[cards.has_tag("leech")].copy()  # copy avoids pandas' SettingWithCopyWarning
selection["cdeck"] = "Leeches Only"
col.cards.update(selection)
col.write(modify=True)

๐Ÿž Troubleshooting

See the troubleshooting section in the documentation.

💖 Contributing

Your help is greatly appreciated! Suggestions, bug reports and feature requests are best opened as GitHub issues. You could also discuss them first in the Gitter community. If you want to code something yourself, you are very welcome to submit a pull request!

Bug reports and pull requests are credited with the help of the allcontributors bot.

📃 License & Disclaimer

This software is licensed under the MIT license and (despite best testing efforts) comes without any warranty. The logo is inspired by the Anki logo and the logo of the pandas package. This library and its author(s) are not affiliated/associated with the main Anki or pandas project in any way.

✨ Contributors

Thanks goes to these wonderful people (emoji key):

  • Blocked 🐛
  • CalculusAce 🐛
  • Francis Tseng 🐛 💻
  • Keith Hughitt 🐛
  • Miroslav Šedivý ⚠️ 💻
  • Nicholas Bollweg 💻
  • Thomas Brownback 🐛
  • eshrh 📖
  • exc4l 🐛 💻
  • p4nix 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!


ankipandas's Issues

Mention Notebooks in Docs

The examples provided don't work out of the box when run from the command line. They do work when run as part of a notebook, which makes me suspect that the authors assume users will be working in notebooks.

If this is the case it should be noted in the documentation.

Update Cause Duplicates

"col.notes.update()" seems to duplicate cards when editing values in "nflds", e.g. two identical "Card 1" entries from the same note (nid).

More options list_decks and list_models

  • list_decks(min_cards=100) to display only decks with at least 100 cards
  • list_models(with_field="Expression") to list all models that contain the expression field
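
A minimal sketch of what these two options could do, written against plain Python data rather than the AnkiPandas API. The function signatures and input shapes (one deck name per card, and a mapping from model name to field names) are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Sequence


def list_decks(card_decks: Iterable[str], min_cards: int = 0) -> List[str]:
    """Return deck names that hold at least min_cards cards.

    card_decks contains one deck name per card (hypothetical input shape).
    """
    counts = Counter(card_decks)
    return sorted(deck for deck, n in counts.items() if n >= min_cards)


def list_models(
    model_fields: Mapping[str, Sequence[str]], with_field: Optional[str] = None
) -> List[str]:
    """Return model names, optionally only those that contain with_field."""
    return sorted(
        name
        for name, fields in model_fields.items()
        if with_field is None or with_field in fields
    )
```

Keeping both filters optional means that the existing no-argument calls would behave as before.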

Adding new field to existing note

Hi!

First of all, thank you so much for this excellent library! It saves me a lot of time when working with the Anki database.

I just want to ask whether there is any way to add a new field to a note. Do I need to edit the note model in Anki first, or does ankipandas also expose the note model for editing?

Thank you so much!

Process for Adding Cards to a Deck with ankipandas

I have been following the documentation for creating cards with ankipandas, but I feel like I am misunderstanding the process a bit. I have successfully been able to create notes, to spawn new nid's that can be used for the cards, but when I try to use those nids to add cards for those notes to a specific deck, I always get the following error:

ValueError: The following note IDs (nid) can't be found in the notes table: 1605035990398, 1605036027992. Perhaps you didn't call notes.write() to write them back into the database?

Note that the 2 nids in the error are the nids that I added with the add_notes function. While I really appreciate the hint about notes.write, it is a tad confusing to understand exactly what that means in this context. Do the new notes need to be written to the database before new cards can be added to the card table? If that is the case, that isn't very intuitive, as I'd expect that you could create notes and cards and then collectively write them back to the database together, but perhaps there's something I'm missing. It would be tremendously helpful to see a full example of adding a simple "Basic" note/card to a deck.

Thanks again for this module and really hope I can get some assistance with this.

Update:

I made a little more progress. I was able to successfully write the notes to the database and get over that error I was getting, but now I am running into the following error when trying to add cards to a deck:

OverflowError: Python int too large to convert to C long

The error occurs in this segment of code in add_cards function:

    add = add.astype(
        {
            key: value
            for key, value in _columns.dtype_casts_all.items()  # <-- error is raised here
            if key in self.columns
        }
    )

It appears dtype_casts_all is throwing the error.
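
The message suggests that an integer was cast to a C long that is too narrow for it. On platforms where a C long has 32 bits (64-bit Windows is the usual example), IDs of this size do not fit. Whether that is the cause in this report is an assumption; the sketch below only checks the arithmetic for one of the note IDs quoted above, using a hypothetical helper.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed integer with the given width."""
    limit = 2 ** (bits - 1)
    return -limit <= value < limit


nid = 1605035990398  # one of the note IDs from the error message above

print(fits_signed(nid, 32))  # False: larger than 2**31 - 1
print(fits_signed(nid, 64))  # True
```

If this is indeed the cause, casting to an explicitly 64-bit integer type instead of the platform default would be the usual remedy.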

notes.has_tag() not working Unknown value of _df_format: None

MWE:

from ankipandas import Collection
col = Collection(user='User 1')
notes = col.cards.merge_notes()
notes.has_tag('leech')

results in:

~\Anaconda3\lib\site-packages\ankipandas\ankidf.py in _check_df_format(self)
    185             pass
    186         else:
--> 187             raise ValueError(
    188                 "Unknown value of _df_format: {}".format(self._df_format)
    189             )

ValueError: Unknown value of _df_format: None

pandas version 1.1.3.
Anki version 2.1.22

Is this reproducible? If not, is there anything I can provide you with?

Common database locations on Windows

Where does Anki usually store its database under Windows?

This would be good to know in order to make it easy for AnkiPandas to find the database automatically.
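
As a starting point, here is a sketch of per-platform candidate folders. The Linux and macOS locations appear elsewhere on this page; the Windows location (%APPDATA%\Anki2) is an assumption based on the Anki manual and should be verified on a real installation. The function name is hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import List


def candidate_anki_dirs(platform: str = sys.platform) -> List[Path]:
    """Return folders in which an Anki2 data directory is commonly expected."""
    home = Path.home()
    if platform.startswith("win"):
        # Assumption: recent Anki versions use the roaming AppData folder.
        appdata = os.environ.get("APPDATA")
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return [base / "Anki2"]
    if platform == "darwin":
        return [home / "Library" / "Application Support" / "Anki2"]
    # Linux and other Unix-like systems
    return [home / ".local" / "share" / "Anki2"]
```

Searching these folders first, before falling back to a broader search, would keep the automatic lookup fast in the common case.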

set_info()

The info table isn't encoded and set properly yet.

Restore from backup

Have a method Collection.restore() that restores the db from the last backup
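
A minimal sketch of what such a restore could look like with the standard library. The helper name and the rule "most recently modified file in the backup folder" are assumptions; the actual backup layout used by AnkiPandas is not described here.

```python
import shutil
from pathlib import Path


def restore_latest_backup(db_path: Path, backup_dir: Path) -> Path:
    """Overwrite db_path with the most recently modified file in backup_dir."""
    backups = [path for path in backup_dir.iterdir() if path.is_file()]
    if not backups:
        raise FileNotFoundError(f"No backups found in {backup_dir}")
    latest = max(backups, key=lambda path: path.stat().st_mtime)
    shutil.copy2(latest, db_path)
    return latest
```

Returning the path of the backup that was used lets the caller report which state was restored.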

Allow to create new decks

Thanks for this project.

I'm looking for a way to generate test decks for other add-ons.

So, I'd like to be able to programmatically build decks that have a fake review history. So they'd contain a mix of cards in review, lapsed, learning, and relearning, with different histories of reviews. Then I'd test other add-ons against those decks, to confirm those add-ons modified ease and ivls correctly.

I'm checking out genanki as well, maybe that's a better fit, not sure.

But if making up cards with review histories is something you think ankipandas could take on, would love to see an example or two in the Usages section of how to create a deck from scratch (if that's a thing), and how to edit card review histories (if that's a thing).

Not sure if this is a documentation request, or if it's completely out of scope for the project, sorry!

Warnings: Setting value on slice

when doing selection.fields_as_columns(inplace=True):

../../ankipandas/ankidf.py:542: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[prefix + field] = ""
/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py:3697: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)

Interface changes and major refactoring

Duplicate information

Options:

  1. [CURRENT] Allow having duplicate information as columns in DataFrame and consequently have methods to check consistency and both way syncing: Method flds_vs_columns (perhaps internal), flds_to_columns (creates columns if they do not exist, prebooks back conversion), columns_to_flds.

    Pros:

    • Explicit

    Cons:

    • Perhaps that there's more than one place to edit? But Sync makes it obvious again.
  2. Only allow to toggle between corresponding representations fields_as_columns, fields_as_list (default)
    Pros:

    • No need to check
    • Always clear to the user where he can change something
  3. Settle for an easy-to-use representation and hide the internal one. Impossible for flds (too many columns added, too costly in computation time, impractical when just copying). Definitely for tags. But for did/dname:

    Pros:

    • Most clutter-free option. User also never needs both, because he would have to overwrite one option anyhow
    • Nobody wants IDs
    • Can still get IDs with a method (if needed at all).
    • Can rename dname > deck, mname > model

    Cons:

    • Fragile?
    • Slightly less memory efficient

Resolutions:

  • dname/mname/did/mid: 3
  • Flds: Definitely never show the \x1f-joined string to the user, but both the joined and the expanded form are needed => 2
  • Tags: 3
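
The joined and expanded forms of the fields mentioned in the resolutions can be converted into each other with a plain split and join on the \x1f separator. A minimal sketch with hypothetical helper names:

```python
from typing import List, Sequence

FIELD_SEPARATOR = "\x1f"  # unit separator used in the joined flds string


def split_fields(flds: str) -> List[str]:
    """Expand a joined flds string into a list of field contents."""
    return flds.split(FIELD_SEPARATOR)


def join_fields(fields: Sequence[str]) -> str:
    """Join field contents back into the stored flds string."""
    return FIELD_SEPARATOR.join(fields)
```

Empty fields survive the round trip, because splitting on the separator keeps empty strings in their positions.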

Clashing column names

Options:

  1. [CURRENT] Append c or n to only these fields that clash when being merged from cards or notes.

    Cons:
    Inconsistent

  2. Append c or n to all fields that are being merged from cards or notes.

  3. Append r, c and n to all fields from revlog, cards, notes

    Pros:

    • Absolutely consistent, makes help table much easier
    • No more need for AnkiPandas._nid_column etc.
    • User finally understands different id columns.
  4. Up to user
    Cons:

    • Fragile

Resolutions:

  • 3
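
Option 3 amounts to a fixed per-table prefix for every column. The sketch below shows the renaming on plain column names; the helper and the mapping are written for illustration and are not taken from the AnkiPandas source.

```python
from typing import Dict, Iterable

# One prefix letter per source table, as proposed in option 3
TABLE_PREFIXES = {"revlog": "r", "cards": "c", "notes": "n"}


def prefix_columns(table: str, columns: Iterable[str]) -> Dict[str, str]:
    """Map each original column name to its prefixed name."""
    prefix = TABLE_PREFIXES[table]
    return {column: prefix + column for column in columns}
```

With this scheme, the id column of each table becomes rid, cid or nid, so the three ID columns can no longer clash after a merge.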

Help

Resolutions:

  • Rename help -> help_cols
  • Add help method that prints general help

IDs are strings

There should be a warning notice that all IDs are currently treated as strings.

This is due to problems to encode NaNs that can appear when merging IDs from another table.

Decodify card state columns?

Resolve codes such as "-3=sched buried, -2=user buried, -1=suspended, 0=new, 1=learning, 2=due (as for type), 3=in learning", etc. into readable values.
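
A minimal sketch of such a decoding step, using exactly the codes listed above. The names QUEUE_LABELS and decode_queue are hypothetical, and the list above ends in "etc.", so the mapping may be incomplete.

```python
QUEUE_LABELS = {
    -3: "sched buried",
    -2: "user buried",
    -1: "suspended",
    0: "new",
    1: "learning",
    2: "due",
    3: "in learning",
}


def decode_queue(code: int) -> str:
    """Translate a numeric card state into a label, keeping unknown codes visible."""
    return QUEUE_LABELS.get(code, f"unknown ({code})")
```

For a whole column, a mapping like this can be applied with pandas' Series.map.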

JSONDecodeError

I am not very experienced with Python, but I tried getting a basic example to work and it fails:

INFO: Searching for database. This might take some time. You can speed this up by specifying a search path or directly entering the path to your database.
WARNING: The search will stop at the first hit, so please verify that the result is correct (for example in case there might be morethan one Anki installations)
INFO: Loaded db from /Users/XXXX/Library/Application Support/Anki2/XXXX/collection.anki2
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    cards = col.cards.merge_notes()
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 99, in cards
    return self._get_item("cards")
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 83, in _get_item
    r = self._get_original_item(item).copy(True)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 76, in _get_original_item
    r = AnkiDataFrame.init_with_table(self, item)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 103, in init_with_table
    new._get_table(col, table, empty=empty)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 98, in _get_table
    self.normalize(inplace=True)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 1025, in normalize
    self["cdeck"] = self["did"].map(raw.get_did2deck(self.db))
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 294, in get_did2deck
    dinfo = get_deck_info(db)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 281, in get_deck_info
    return get_info(db)["decks"]
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 120, in get_info
    ret[col] = json.loads(val)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The code is:

from ankipandas import Collection
col = Collection()

cards = col.cards.merge_notes()
selection = cards[cards.has_tag("leech")]
selection["cdeck"].value_counts().plot.pie()

Is there anything I am doing wrong? I tested both on Anki 2.1.26 and .28alpha3, running on macOS. Python version 3.7.
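
For context, this exact message is what json.loads produces for an empty string. Whether an empty column in the database is the cause here is an assumption. The sketch below reproduces the message and shows a defensive loader with a hypothetical name.

```python
import json


def load_json_column(raw, default=None):
    """Decode a JSON text column, treating empty content as 'no data'."""
    if raw is None or not raw.strip():
        return {} if default is None else default
    return json.loads(raw)


try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)
```

A loader like this avoids the crash but also hides the missing data, so it would be worth logging which column was empty.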

aquery method

Easy to use method that can perform most selections.

With options:

  • inplace
  • deck
  • model
  • has_field
  • has_tag (What to do if users want to select against one tag, or select multiple tags etc.?)
  • was_added
  • was_modified

??

  • queue: learning/new/relearn/...
  • suspended/.../.../...

or should we have is_... methods instead?

Maybe?

  • is_leech
  • is_marked

Testing on Windows: TemporaryDirectory can't be cleaned

If you run pytest locally on Windows, there are weird PermissionErrors: Some other process seems to be still running that is using a file from the TemporaryDirectory, hence it can't be removed.

@exc4l do you see this issue when running locally under Windows as well?
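
One common reason for this on Windows is an SQLite connection that is still open when the directory is removed, since Windows does not delete files that are in use. That this is the cause here is an assumption. A minimal sketch of a test-style pattern that closes the connection first:

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "collection.anki2"
    # closing() guarantees conn.close(); "with conn:" alone only manages the transaction.
    with closing(sqlite3.connect(str(db_path))) as conn:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY)")
        conn.commit()
    db_existed = db_path.exists()

# The temporary directory and the database file are removed at this point.
```

The same idea applies to any object that keeps a database handle: it needs to be closed before the temporary directory is cleaned up.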

fields_as_columns and update

col.notes.update(other_notes) won't work if fields_as_columns() was called on other_notes but not on col.notes.

AttributeError with ankipandas.Collection().notes

Was just kicking the tires, col.revs and col.cards worked as expected, but ran into an issue when calling col.notes:

(with col = ankipandas.Collection(), which it seemed to find fine.)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 90, in notes
    return self._get_item("notes")
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 83, in _get_item
    r = self._get_original_item(item).copy(True)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 76, in _get_original_item
    r = AnkiDataFrame.init_with_table(self, item)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 103, in init_with_table
    new._get_table(col, table, empty=empty)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 98, in _get_table
    self.normalize(inplace=True)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 1028, in normalize
    self["nmodel"] = self["mid"].map(raw.get_mid2model(self.db))
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/raw.py", line 347, in get_mid2model
    minfo = get_model_info(db)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/raw.py", line 334, in get_model_info
    return {int(key): value for key, value in get_info(db)["models"].items()}
AttributeError: 'str' object has no attribute 'items'

Writing to collection destroys tags

MWE:

col = Collection(user='testing')
notes = col.notes
selection = notes[notes.has_tag("leech")]
#selection = selection.add_tag(["marked"])
col.notes.update(selection)
col.write(modify=True)

col.summarize_changes() for this:

======== notes ========
Total rows: 10347
Compared to original version:
Modified rows: 0
Added rows: 0
Deleted rows: 0

This doesn't result in an immediate error but after opening anki the card browser is unable to find notes/cards by tag.

In the testing profile, I do have 32 notes tagged with 'leech'.
After writing when searching by the leech tag it shows no notes.
The notes are still there and if I search for them manually I can see that they indeed do have the leech tag.
If I, after now manually selecting them, do a search again for the leech tag, they do appear but also only the ones I selected manually.

When using the .add_tag() function the new tags do show up in the notes but it's the same with the leech tag. The card browser is only able to find the new tag after I manually selected the cards.

This isn't fixed by anki's "check database" function.

As mentioned before I'm using an older anki version. If you aren't able to reproduce this let me know and I will see that I try the same with the newest anki version.
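
One possible explanation, stated here as an assumption and not confirmed in this report: Anki stores the tags of a note as a single space-separated string with a leading and a trailing space, and tag searches match whole words with a pattern like '% leech %'. A tag string written back without that padding would then be invisible to the search while still being displayed on the note. The sketch below demonstrates the effect with an in-memory SQLite table; the helper join_tags is hypothetical.

```python
import sqlite3


def join_tags(tags):
    """Join tags into a space-padded string, e.g. ' leech marked '."""
    return (" " + " ".join(tags) + " ") if tags else ""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, tags TEXT)")
conn.execute("INSERT INTO notes VALUES (1, ?)", (join_tags(["leech", "marked"]),))
conn.execute("INSERT INTO notes VALUES (2, ?)", ("leech marked",))  # written without padding

rows = conn.execute(
    "SELECT id FROM notes WHERE tags LIKE '% leech %' ORDER BY id"
).fetchall()
found = [row[0] for row in rows]
conn.close()

print(found)  # [1]
```

Only the padded row is found, which would match the symptom described above: the tag is present on the note, but the tag search misses it.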

Furthermore, as you might see, I'm doing the selection a bit differently than your example in the readme. This example:

selection = col.notes.has_tags(["Japanese", "leech"])
selection = selection.add_tag(["difficult-japanese", "marked"])
col.notes.update(selection)
col.write(modify=True)  # Overwrites your database after creating a backup!

results in:

----> 2 selection = selection.add_tag(["difficult-japanese", "marked"])

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'add_tag'

Could the first part be related to #32? In my case no duplicates seem to be created though.

AttributeError: 'NoneType' object has no attribute 'col'

As reported by @CalculusAce in #42

In the previous version, I was able to merge the notes table into the cards table after expanding the fields in the notes table with fields_as_columns. In the current build, I am unable to merge the field columns into the cards table, which is critical for an app I am developing. Here is the code I was using successfully in the previous version that no longer works in the current build.

collect = Collection()
cards = collect.cards
notes = collect.notes
notes.fields_as_columns(inplace=True)
field_cols = [col for col in notes.columns if 'nfld' in col]
cards.merge_notes(inplace=True,columns=field_cols+['nmodel','ntags','nid'])

I get the following error after upgrading to the latest version of ankipandas that you released this morning:

"ankidf.py", line 459, in merge_notes
    setattr(ret, md, getattr(self, md))

AttributeError: 'NoneType' object has no attribute 'col'
