klieret / ankipandas

Analyze and manipulate your Anki collection using pandas! 🌠🐼

Home Page: https://ankipandas.rtfd.io/

License: MIT License

Python 100.00%

anki pandas pandas-dataframes anki-addon anki21 spaced-repetition

ankipandas's Introduction

Analyze and manipulate your Anki collection using pandas!

📝 Description

Note: This package needs a new maintainer, as I currently do not have enough time to continue developing it. Writing modifications back into the Anki database is currently disabled, in particular because of issue #137. Please reach out to me if you are interested in getting involved!

Anki is one of the most popular flashcard systems for spaced repetition learning, and pandas is the most popular Python package for data analysis and manipulation. So what could be better than to bring both together?

With AnkiPandas you can use pandas to easily analyze or manipulate your Anki flashcards.

Features:

Select: Easily select arbitrary subsets of your cards, notes or reviews using pandas (one of many introductions, official documentation)
Visualize: Use pandas' powerful built-in tools or switch to the even more versatile seaborn (statistical analysis) or matplotlib libraries
Manipulate: Apply fast bulk operations to the table (e.g. add tags, change decks, set field contents, suspend cards, ...) or iterate over the table and perform these manipulations step by step. ⚠️ This functionality is currently disabled until #137 has been resolved! ⚠️
Import and Export: Pandas can export to (and import from) csv, MS Excel, HTML, JSON, ... (io documentation)
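As a minimal sketch of an export/import round trip, the block below uses a plain pandas DataFrame as a stand-in for an AnkiPandas table (assuming the AnkiPandas tables, being pandas DataFrame subclasses, support the usual pandas I/O methods; all values are invented). It writes to an in-memory CSV buffer instead of a file:

```python
import io

import pandas as pd

# Stand-in for an AnkiPandas cards table (invented values).
cards = pd.DataFrame(
    {
        "cid": [1605035990401, 1605036027995],
        "cdeck": ["Japanese", "Leeches Only"],
        "creps": [12, 3],
    }
)

# Export: write CSV into an in-memory buffer (a file path works the same way).
buffer = io.StringIO()
cards.to_csv(buffer, index=False)

# Import: rewind the buffer and read the CSV back into a DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Passing a file name such as "cards.csv" instead of the buffer writes a real file; to_excel, to_html and to_json follow the same pattern.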

Pros:

Easy installation: Install via python package manager (independent of your Anki installation)
Simple: Just one line of code to get started
Convenient: Bring together information about cards, notes, models, decks and more in just one table!
Fully documented: Documentation on readthedocs
Well tested: More than 100 unit tests to keep everything in check

Alternatives: If your main goal is to add new cards, models and more, you can also take a look at the genanki project.

📦 Installation

AnkiPandas is available as a PyPI package and can be installed or upgraded with the Python package manager:

pip3 install --user --upgrade ankipandas

Development installation

For the latest development version you can also work from a cloned version of this repository:

git clone https://github.com/klieret/ankipandas/
cd ankipandas
pip3 install --user --upgrade --editable .

If you want to help develop this package further, please also install the pre-commit hooks and use gitmoji:

pre-commit install
gitmoji -i

🔥 Let's get started!

Starting up is as easy as this:

from ankipandas import Collection

col = Collection()

col.notes will then be a dataframe containing all notes, with additional methods that make many things easy. Similarly, you can access cards or reviews using col.cards or col.revs.

If called without any arguments, Collection() tries to find your Anki database by itself. However, this might take some time. To speed things up, simply supply (part of) the path to the database and (if you have more than one user) your Anki user name, e.g. Collection(".local/share/Anki2/", user="User 1") on many Linux installations.

To get information about the interpretation of each column, use print(col.notes.help_cols()).

Take a look at the documentation to find out more about the available methods!

Some basic examples:

📈 Analysis

More examples: Analysis documentation, projects that use AnkiPandas.

Show a histogram of the number of reviews (repetitions) of each card for all decks:

col.cards.hist(column="creps", by="cdeck")

Show the number of leeches per deck as pie chart:

cards = col.cards.merge_notes()
selection = cards[cards.has_tag("leech")]
selection["cdeck"].value_counts().plot.pie()
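In a Jupyter notebook the chart appears automatically, but when the same lines run as a plain Python script nothing is shown unless matplotlib is asked to display the figure. A minimal sketch, using a hand-made Series as a stand-in for selection["cdeck"] (the deck names are invented):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for selection["cdeck"]: one deck name per leech card (invented).
decks = pd.Series(
    ["Japanese", "Japanese", "French", "Japanese", "French", "Geography"],
    name="cdeck",
)

ax = decks.value_counts().plot.pie(autopct="%1.0f%%")
ax.set_ylabel("")  # hide the default axis label
ax.set_title("Leeches per deck")

# Outside notebooks, open the figure window explicitly.
plt.show()
```

On machines without a display, ax.figure.savefig("leeches.png") writes the chart to a file instead.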

Find all notes of model MnemoticModel with empty Mnemotic field:

notes = col.notes.fields_as_columns()
notes.query("nmodel == 'MnemoticModel' and nfld_Mnemotic == ''")
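Inside DataFrame.query(), column names are written without quotes; a quoted name such as 'Mnemotic' is a string literal, so comparing it with '' is always false. The sketch below uses a stand-in DataFrame and assumes AnkiPandas names the model column nmodel and prefixes expanded fields with nfld_ (as in the nfld columns mentioned in the issues further down):

```python
import pandas as pd

# Stand-in for col.notes.fields_as_columns() (column names assumed).
notes = pd.DataFrame(
    {
        "nid": [1, 2, 3],
        "nmodel": ["MnemoticModel", "MnemoticModel", "Basic"],
        "nfld_Mnemotic": ["", "a story", ""],
    }
)

# Bare column names refer to columns; quoted text is a string literal.
empty = notes.query("nmodel == 'MnemoticModel' and nfld_Mnemotic == ''")

# The same selection written as a boolean mask instead of query():
mask = (notes["nmodel"] == "MnemoticModel") & (notes["nfld_Mnemotic"] == "")
```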

🛠️ Manipulations

Warning: Writing to the database is currently disabled until #137 has been resolved. Help is much appreciated!

Warning: Please be careful and test this well! AnkiPandas will create a backup of your database before writing, so you can always restore the previous state. Please make sure that everything works before continuing to use Anki normally!

Add the difficult-japanese and marked tags to all notes that contain both the tags Japanese and leech:

notes = col.notes
selection = notes[notes.has_tags(["Japanese", "leech"])]
selection = selection.add_tag(["difficult-japanese", "marked"])
col.notes.update(selection)
col.write(modify=True)  # Overwrites your database after creating a backup!

Set the language field to English for all notes of model LanguageModel that are tagged with English:

notes = col.notes
selection = notes[notes.has_tag(["English"])].query("nmodel == 'LanguageModel'").copy()
selection = selection.fields_as_columns()
selection["nfld_language"] = "English"
selection = selection.fields_as_list()  # col.notes keeps fields as a list
col.notes.update(selection)
col.write(modify=True)

Move all cards tagged leech to the deck Leeches Only:

cards = col.cards
selection = cards[cards.has_tag("leech")].copy()
selection["cdeck"] = "Leeches Only"
col.cards.update(selection)
col.write(modify=True)

🐞 Troubleshooting

See the troubleshooting section in the documentation.

💖 Contributing

Your help is greatly appreciated! Suggestions, bug reports and feature requests are best opened as GitHub issues. You could also first discuss them in the Gitter community. If you want to code something yourself, you are very welcome to submit a pull request!

Bug reports and pull requests are credited with the help of the all-contributors bot.

📃 License & Disclaimer

This software is licensed under the MIT license and (despite best testing efforts) comes without any warranty. The logo is inspired by the Anki logo (license) and the logo of the pandas package (license). This library and its author(s) are not affiliated/associated with the main Anki or pandas project in any way.

✨ Contributors

Thanks goes to these wonderful people (emoji key):

Blocked 🐛, CalculusAce 🐛, Francis Tseng 🐛 💻, Keith Hughitt 🐛, Miroslav Šedivý ⚠️ 💻, Nicholas Bollweg 💻, Thomas Brownback 🐛, eshrh 📖, exc4l 🐛 💻, p4nix 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

ankipandas's People

khonkhortisan andrewsanchez afinney1 dexdex1515 bollwyvl eshrh artsr dolfino kartik-hegde simbaninja917 eli1797 15921483570 lgtm-migrator frnsys evelf eumiro

ankipandas's Issues

Mention Notebooks in Docs

The examples provided don't work out of the box when run from the command line. They do work when run as part of a notebook, which makes me suspect that the authors strongly assume users will be working in notebooks.

If this is the case it should be noted in the documentation.

More feedback wrt changes and merging procedures before (over)writing database

move to github actions from travis

Warn if backup files take up x space. Count number of backups.

Add github issue template

Add a date column to the rev table

The rid column is equal to the timestamp (in milliseconds since the Unix epoch) of when the review was completed.
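A minimal sketch of such a date column, assuming (as stated above) that rid holds the completion time in milliseconds since the Unix epoch, and that IDs may arrive as strings; the column name rdate is hypothetical:

```python
import pandas as pd

# Stand-in for col.revs; IDs kept as strings here (see "IDs are strings").
revs = pd.DataFrame({"rid": ["1605035990398", "1605036027992"], "cid": ["1", "2"]})

# rid = completion time in milliseconds since the epoch; result is naive UTC.
revs["rdate"] = pd.to_datetime(revs["rid"].astype("int64"), unit="ms")
```

To show local times, .dt.tz_localize("UTC").dt.tz_convert(...) can be chained onto the new column.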

Sync model names <> model IDs

Update Causes Duplicates

The "col.notes.update()" call seems to duplicate cards when editing values in "nflds", e.g. two identical "Card 1" entries from the same note (nid).

odid -> odeck

import/export examples

Make ``nid``, ``cid``, ``rid`` index

More options list_decks and list_models

list_decks(min_cards=100) to display only decks with at least 100 cards
list_models(with_field="Expression") to list all models that contain the expression field
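Until such options exist, a deck filter like list_decks(min_cards=100) can be approximated with plain pandas. The helper name decks_with_min_cards is hypothetical, and the sketch assumes the deck name is stored in the cdeck column as in the README examples:

```python
from typing import List

import pandas as pd

# Stand-in for col.cards: one row per card, deck name in "cdeck".
cards = pd.DataFrame({"cdeck": ["Japanese"] * 3 + ["French"] + ["Geography"] * 2})


def decks_with_min_cards(cards: pd.DataFrame, min_cards: int) -> List[str]:
    """Hypothetical helper: names of decks with at least `min_cards` cards."""
    counts = cards["cdeck"].value_counts()
    return sorted(counts[counts >= min_cards].index)
```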

Allow adding of new notes/cards/revs

IDs

new_row
new_rows

GUID

Add guid field in ADF.raw(), see #8

Constructor

Allow to initialize empty dataframe, not bound to database.

Adding new field to existing note

Hi!

First of all, thank you so much for this excellent library! It saves me a lot of time working with the Anki database.

I just want to ask if there is any way to add a new field to a note. Do I need to edit the note model in Anki first, or does ankipandas also expose the note model for editing?

Thank you so much!

Process for Adding Cards to a Deck with ankipandas

I have been following the documentation for creating cards with ankipandas, but I feel like I am misunderstanding the process a bit. I have successfully been able to create notes, to spawn new nid's that can be used for the cards, but when I try to use those nids to add cards for those notes to a specific deck, I always get the following error:

ValueError: The following note IDs (nid) can't be found in the notes table: 1605035990398, 1605036027992. Perhaps you didn't call notes.write() to write them back into the database?

Note that the 2 nids in the error are the nids that I added with the add_notes function. While I really appreciate the hint about notes.write, it is a tad confusing to understand exactly what that means in this context. Do the new notes need to be written to the database before new cards can be added to the card table? If that is the case, that isn't very intuitive, as I'd expect that you could create notes and cards and then collectively write them back to the database together, but perhaps there's something I'm missing. It would be tremendously helpful to see a full example of adding a simple "Basic" note/card to a deck.

Thanks again for this module and really hope I can get some assistance with this.

Update:

I made a little more progress. I was able to successfully write the notes to the database and get over that error I was getting, but now I am running into the following error when trying to add cards to a deck:

OverflowError: Python int too large to convert to C long

The error occurs in this segment of code in add_cards function:

    add = add.astype(
        {
            key: value
            for key, value in _columns.dtype_casts_all.items()  # <-- error raised here
            if key in self.columns
        }
    )

It appears dtype_casts_all is throwing the error.
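One plausible cause (not confirmed in this thread) is that Anki IDs are millisecond timestamps that exceed the 32-bit range, while C long is only 32 bits wide on Windows, so a cast to the platform's default integer type can overflow. The sketch below only shows the size check and an explicit 64-bit cast; it is a hypothetical workaround, not the AnkiPandas code:

```python
import numpy as np
import pandas as pd

# A note ID from the error message above: a millisecond timestamp.
nid = 1605035990398

# Larger than the largest 32-bit integer (2147483647), but well within int64.
fits_int32 = nid <= np.iinfo(np.int32).max
fits_int64 = nid <= np.iinfo(np.int64).max

# Requesting "int64" explicitly avoids depending on the platform's
# default integer size.
ids = pd.Series([str(nid), "1605036027992"]).astype("int64")
```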

Make sure any modified cards are marked for updating and other automatic fields

Dress rehearsal

notes.has_tag() not working Unknown value of _df_format: None

MWE:

from ankipandas import Collection
col = Collection(user='User 1')
notes = col.cards.merge_notes()
notes.has_tag('leech')

results in:

~\Anaconda3\lib\site-packages\ankipandas\ankidf.py in _check_df_format(self)
    185             pass
    186         else:
--> 187             raise ValueError(
    188                 "Unknown value of _df_format: {}".format(self._df_format)
    189             )

ValueError: Unknown value of _df_format: None

pandas version 1.1.3.
Anki version 2.1.22

Is this reproducible? If not, is there anything I can provide you with?

Document issue with write out required when adding new cards together with new notes

Merging: Do not use columns but .id properties

Update csum field when changing note fields.

Rather: Directly remove it from columns

has_tags example is wrong

As reported by @exc4l in #50

Common database locations on Windows

Where does Anki usually store its database under Windows?

This would be good to know in order to make it easy for AnkiPandas to find the database automatically.
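As a starting point, a sketch of where Anki 2.1 commonly keeps its data on each platform. The Windows entry assumes the usual %APPDATA%\Anki2 location, the macOS path matches the one in the JSONDecodeError log below, and the Linux path matches the README's .local/share/Anki2/ example; the helper names are hypothetical and the paths should be verified on a real installation:

```python
import os
import platform
from pathlib import Path
from typing import List, Optional


def default_anki_base(system: str, home: Path, appdata: Optional[str] = None) -> Path:
    """Hypothetical helper: assumed default Anki2 base folder for a platform.

    `system` is a value as returned by platform.system().
    """
    if system == "Windows":
        # %APPDATA% normally points to <home>/AppData/Roaming
        roaming = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return roaming / "Anki2"
    if system == "Darwin":  # macOS
        return home / "Library" / "Application Support" / "Anki2"
    # Linux and other Unix-like systems
    return home / ".local" / "share" / "Anki2"


def find_collections(base: Path) -> List[Path]:
    """List collection.anki2 files, one per profile folder (e.g. "User 1")."""
    return sorted(base.glob("*/collection.anki2"))


if __name__ == "__main__":
    base = default_anki_base(platform.system(), Path.home(), os.environ.get("APPDATA"))
    for path in find_collections(base):
        print(path)
```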

set_info()

info table isn't encoded and set properly yet

Separate tag field

Tags as list, rather than as space separated string.
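A minimal sketch of the list representation, using a stand-in notes table where ntags still holds a space-separated string (leading and trailing spaces included to show they are handled). The helpers has_any_tag and has_all_tags are hypothetical; they only mirror the "any of these tags" versus "all of these tags" distinction:

```python
import pandas as pd

# Stand-in notes table with tags as one space-separated string per note.
notes = pd.DataFrame({"nid": [1, 2, 3], "ntags": [" leech Japanese ", " Japanese ", ""]})

# str.split() without arguments splits on runs of whitespace and ignores
# leading/trailing blanks, so "" becomes [].
notes["ntags"] = notes["ntags"].str.split()


def has_any_tag(tags: pd.Series, wanted) -> pd.Series:
    """Hypothetical helper: True where a note has at least one of `wanted`."""
    return tags.apply(lambda t: any(tag in t for tag in wanted))


def has_all_tags(tags: pd.Series, wanted) -> pd.Series:
    """Hypothetical helper: True where a note has every tag in `wanted`."""
    return tags.apply(lambda t: all(tag in t for tag in wanted))


# Converting back to a space-separated string for writing.
as_string = notes["ntags"].apply(" ".join)
```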

Find backup directory and backup database

selection["cdeck"].value_counts().plot.pie().show()?

How do I get it to display a pie chart? Do I need to install more than just ankipandas and matplotlib? Do I need to switch to the py2 or py3 version of anything? I'm on windows.

Convenience methods to filter for deck name, model name, etc.

Restore from backup

Have a method Collection.restore() that restores the db from the last backup

Allow to create new decks

Thanks for this project.

I'm looking for a way to generate test decks for other add-ons.

So, I'd like to be able to programmatically build decks that have a fake review history. So they'd contain a mix of cards in review, lapsed, learning, and relearning, with different histories of reviews. Then I'd test other add-ons against those decks, to confirm those add-ons modified ease and ivls correctly.

I'm checking out genanki as well, maybe that's a better fit, not sure.

But if making up cards with review histories is something you think ankipandas could take on, would love to see an example or two in the Usages section of how to create a deck from scratch (if that's a thing), and how to edit card review histories (if that's a thing).

Not sure if this is a documentation request, or if it's completely out of scope for the project, sorry!

Warnings: Setting value on slice

when doing selection.fields_as_columns(inplace=True):

../../ankipandas/ankidf.py:542: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[prefix + field] = ""
/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py:3697: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)
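These warnings come from assigning to a DataFrame that pandas considers a possible view of another one. A minimal sketch of the usual remedy, taking an explicit .copy() of the selection before modifying it; the stand-in table and the helper move_to_deck are hypothetical:

```python
import pandas as pd

# Stand-in for col.cards (invented values).
cards = pd.DataFrame(
    {"cid": [1, 2, 3], "cdeck": ["Japanese", "French", "Japanese"], "creps": [9, 0, 12]}
)


def move_to_deck(cards: pd.DataFrame, mask: pd.Series, deck: str) -> pd.DataFrame:
    """Hypothetical helper: return the selected rows with a new deck name."""
    # .copy() turns the filtered rows into an independent DataFrame, so the
    # assignment below neither warns nor touches the original `cards`.
    selection = cards[mask].copy()
    selection["cdeck"] = deck
    return selection
```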

was_modified & friends break if new columns were added

Interface changes and major refactoring

Duplicate information

Options:

1. [CURRENT] Allow having duplicate information as columns in the DataFrame and consequently have methods to check consistency and to sync in both directions: flds_vs_columns (perhaps internal), flds_to_columns (creates columns if they don't exist, prebooks back conversion), columns_to_flds.

   Pros:
   - Explicit
   Cons:
   - Perhaps that there's more than one place to edit? But syncing makes it obvious again.
2. Only allow toggling between the corresponding representations fields_as_columns and fields_as_list (default).
   Pros:
   - No need to check
   - Always clear to the user where they can change something
3. Settle for the easy-to-use representation and hide the internal one. Impossible for flds (too many columns added, too costly in computation time, impractical when just copying). Definitely for tags. But for did/dname:

   Pros:
   - Most clutter-free option. The user also never needs both, because they would have to overwrite one option anyhow
   - Nobody wants IDs
   - Can still get IDs with a method (if needed at all).
   - Can rename dname > deck, mname > model
   Cons:
   - Fragile?
   - Slightly less memory efficient

Resolutions:

dname/mname/did/mid: 3
Flds: Definitely never show the \x1f-joined string to the user, but need both joined and expanded form => 2
Tags: 3

Clashing column names

Options:

1. [CURRENT] Append c or n only to those fields that clash when being merged from cards or notes.

   Cons:
   - Inconsistent
2. Append c or n to all fields that are being merged from cards or notes.
3. Append r, c and n to all fields from revlog, cards, notes.

   Pros:
   - Absolutely consistent, makes help table much easier
   - No more need for AnkiPandas._nid_column etc.
   - User finally understands different id columns.
4. Up to user
   Cons:
   - Fragile

Resolutions:

Help

Resolutions:

Rename help -> help_cols
Add help method that prints general help

IDs are strings

There should be a warning notice that all IDs are currently treated as strings.

This is due to problems encoding the NaNs that can appear when merging IDs from another table.
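The underlying pandas behaviour can be reproduced with a left merge in which one row has no match: the integer ID column is upcast to float64 to hold NaN. A sketch, with invented stand-in tables, of pandas' nullable Int64 dtype as one possible alternative to strings:

```python
import pandas as pd

# Stand-in tables: the second review belongs to a card (cid 99) that no longer exists.
revs = pd.DataFrame({"rid": [1, 2], "cid": [10, 99]})
cards = pd.DataFrame({"cid": [10], "nid": [1605035990398]})

merged = revs.merge(cards, on="cid", how="left")
# The missing match is filled with NaN, which forces the int64 column to float64.
nid_dtype_after_merge = merged["nid"].dtype

# The nullable integer dtype "Int64" stores missing values as <NA>
# while keeping the IDs as integers.
merged["nid"] = merged["nid"].astype("Int64")
```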

has_tag convenience method

Decodify card state columns?

Resolve the codes into names: -3=sched buried, -2=user buried, -1=suspended, 0=new, 1=learning, 2=due (as for type), 3=in learning, etc.
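A minimal sketch of that decoding with Series.map(), using exactly the codes listed above; the column names cqueue and cqueue_name are assumptions, and the readable names are copied from the list rather than from Anki's source:

```python
import pandas as pd

# Mapping taken from the list above.
QUEUE_NAMES = {
    -3: "sched buried",
    -2: "user buried",
    -1: "suspended",
    0: "new",
    1: "learning",
    2: "due",
    3: "in learning",
}

# Stand-in for col.cards (column name "cqueue" assumed).
cards = pd.DataFrame({"cid": [1, 2, 3, 4], "cqueue": [0, 2, -1, 3]})

# Codes missing from the mapping become NaN, so unexpected values stand out.
cards["cqueue_name"] = cards["cqueue"].map(QUEUE_NAMES)
```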

JSONDecodeError

I am not very experienced with Python, but I tried getting a basic example to work and it fails:

INFO: Searching for database. This might take some time. You can speed this up by specifying a search path or directly entering the path to your database.
WARNING: The search will stop at the first hit, so please verify that the result is correct (for example in case there might be morethan one Anki installations)
INFO: Loaded db from /Users/XXXX/Library/Application Support/Anki2/XXXX/collection.anki2
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    cards = col.cards.merge_notes()
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 99, in cards
    return self._get_item("cards")
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 83, in _get_item
    r = self._get_original_item(item).copy(True)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/collection.py", line 76, in _get_original_item
    r = AnkiDataFrame.init_with_table(self, item)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 103, in init_with_table
    new._get_table(col, table, empty=empty)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 98, in _get_table
    self.normalize(inplace=True)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/ankidf.py", line 1025, in normalize
    self["cdeck"] = self["did"].map(raw.get_did2deck(self.db))
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 294, in get_did2deck
    dinfo = get_deck_info(db)
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 281, in get_deck_info
    return get_info(db)["decks"]
  File "/usr/local/lib/python3.7/site-packages/ankipandas/raw.py", line 120, in get_info
    ret[col] = json.loads(val)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The code is:

from ankipandas import Collection
col = Collection()

cards = col.cards.merge_notes()
selection = cards[cards.has_tag("leech")]
selection["cdeck"].value_counts().plot.pie()

Is there anything I am doing wrong? I tested both on Anki 2.1.26 and .28alpha3, running on macOS. Python version 3.7.

aquery method

Easy to use method that can perform most selections.

With options:

queue: learning/new/relearn/...
suspended/.../.../...

or should we have is_... methods instead?

Maybe?

is_leech
is_marked

Testing on Windows: TemporaryDirectory can't be cleaned

If you run pytest locally on Windows, there are weird PermissionErrors: Some other process seems to be still running that is using a file from the TemporaryDirectory, hence it can't be removed.

@exc4l do you see this issue when running locally under Windows as well?

Access to card templates?

Do not overwrite table if no modifications detected.

fields_as_columns and update

col.notes.update(other_notes) won't work if we had called fields_as_columns() on other_notes but not on col.notes.
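The mismatch is between two representations of the same fields: one column per field after fields_as_columns(), versus one list per note. A hypothetical sketch of folding expanded columns back into a list column (assuming the nfld_ prefix and that field names are passed in the model's field order), plus the \x1f-joined string Anki stores in the database:

```python
import pandas as pd

PREFIX = "nfld_"  # assumed prefix that fields_as_columns() gives expanded fields

# Stand-in for notes after fields_as_columns(): one column per field.
expanded = pd.DataFrame(
    {"nid": [1, 2], "nfld_Front": ["cat", "dog"], "nfld_Back": ["Katze", "Hund"]}
)


def fields_to_list(df: pd.DataFrame, field_names) -> pd.DataFrame:
    """Hypothetical helper: fold nfld_* columns back into one list column "nflds".

    field_names must be given in the model's field order.
    """
    cols = [PREFIX + name for name in field_names]
    out = df.drop(columns=cols)
    out["nflds"] = pd.Series(df[cols].values.tolist(), index=df.index)
    return out


as_list = fields_to_list(expanded, ["Front", "Back"])
# In the database, a note's fields are one string joined by the unit separator.
as_db_string = as_list["nflds"].apply("\x1f".join)
```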

Drop unused columns by default

AttributeError with ankipandas.Collection().notes

Was just kicking the tires, col.revs and col.cards worked as expected, but ran into an issue when calling col.notes:

(with col = ankipandas.Collection(), which it seemed to find fine.)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 90, in notes
    return self._get_item("notes")
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 83, in _get_item
    r = self._get_original_item(item).copy(True)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/collection.py", line 76, in _get_original_item
    r = AnkiDataFrame.init_with_table(self, item)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 103, in init_with_table
    new._get_table(col, table, empty=empty)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 98, in _get_table
    self.normalize(inplace=True)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/ankidf.py", line 1028, in normalize
    self["nmodel"] = self["mid"].map(raw.get_mid2model(self.db))
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/raw.py", line 347, in get_mid2model
    minfo = get_model_info(db)
  File "/home/thomas/.local/lib/python3.8/site-packages/ankipandas/raw.py", line 334, in get_model_info
    return {int(key): value for key, value in get_info(db)["models"].items()}
AttributeError: 'str' object has no attribute 'items'

Writing to collection destroys tags

MWE:

col = Collection(user='testing')
notes = col.notes
selection = notes[notes.has_tag("leech")]
#selection = selection.add_tag(["marked"])
col.notes.update(selection)
col.write(modify=True)

col.summarize_changes() for this:

======== notes ========
Total rows: 10347
Compared to original version:
Modified rows: 0
Added rows: 0
Deleted rows: 0

This doesn't result in an immediate error but after opening anki the card browser is unable to find notes/cards by tag.

In the testing profile, I do have 32 notes tagged with 'leech'.
After writing when searching by the leech tag it shows no notes.
The notes are still there and if I search for them manually I can see that they indeed do have the leech tag.
If I, after now manually selecting them, do a search again for the leech tag, they do appear but also only the ones I selected manually.

When using the .add_tag() function the new tags do show up in the notes but it's the same with the leech tag. The card browser is only able to find the new tag after I manually selected the cards.

This isn't fixed by anki's "check database" function.

As mentioned before I'm using an older anki version. If you aren't able to reproduce this let me know and I will see that I try the same with the newest anki version.

Furthermore, as you might see I'm doing the selection a bit different than your example in the readme. This example:

selection = col.notes.has_tags(["Japanese", "leech"])
selection = selection.add_tag(["difficult-japanese", "marked"])
col.notes.update(selection)
col.write(modify=True)  # Overwrites your database after creating a backup!

results in:

----> 2 selection = selection.add_tag(["difficult-japanese", "marked"])

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'add_tag'

Could the first part be relating to #32 ? In my case no duplicates seem to be created though.

Mark inplace operations as deprecated

The inplace syntax causes a lot of bugs and doubles the amount of testing, e.g. #51.

Also the not-inplace operations don't do a copy either at the moment, so it's currently 100% equivalent.

Rename nmodel > model, cdeck > deck

set db info mod before write

Add attribute setters (because of collisions with columns)

AttributeError: 'NoneType' object has no attribute 'col'

As reported by @CalculusAce in #42

In the previous version, I was able to merge the notes table into the cards table after expanding the fields in the notes table with fields_as_columns. In the current build, I am now unable to merge the field columns into the cards table, which is critical for an app I am developing. Here is the code I was using successfully in the previous version that no longer works in the current build.

collect = Collection()
cards = collect.cards
notes = collect.notes
notes.fields_as_columns(inplace=True)
field_cols = [col for col in notes.columns if 'nfld' in col]
cards.merge_notes(inplace=True,columns=field_cols+['nmodel','ntags','nid'])

I get the following error after upgrading to the latest version of ankipandas that you released this morning:

"ankidf.py", line 459, in merge_notes
    setattr(ret, md, getattr(self, md))

AttributeError: 'NoneType' object has no attribute 'col'
