
mdd-repo / anki-card-scraping-tools


A collection of language-learning Anki cards and the Python scripts used for generating them.

License: Creative Commons Zero v1.0 Universal

Python 100.00%
anki anki-cards anki-deck anki-flashcards language-learning native-american mvskoke farsi persian


Anki Card Scraping Tools

A collection of Python scripts used for generating Anki cards through web scraping and file scraping.
To download a file, click its link and then press Ctrl + Shift + S; key files are linked below.

I love language learning and using Anki.
Now, ideally, I would just type entire dictionaries [1] into the card creator by hand. Unfortunately, life is short and I do not have time to do that for everything. So I automate the process in Python whenever I run into a resource I need to scrape for vocabulary.


Wiktionary: Persian lemmas (Persian / Farsi Anki Cards)

This card deck was last scraped on April 30th, 2024.
I cannot fully explain what possessed me to do this, beyond "I will use it."
Will anyone else? Maybe. Maybe not. I still wanted to do it.

This web-scraped Anki deck pack includes the entire Wiktionary database of words and phrases in Farsi/Persian, an incredibly hefty 13,383 cards. The cards are sorted by part of speech, just as the Persian lemmas page divides them. From there, the script picks out the word, pronunciation, definitions, and etymology; redundant entries are passed over in favor of those with definitions. Due to the sheer quantity, expect minor formatting errors. Due to the nature of Wikipedia/Wiktionary, be critical.
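As a rough illustration of the "redundant entries are passed over in favor of those with definitions" step, here is a minimal sketch. It is not the script from this repository: the function name `prefer_defined`, the dictionary keys, and the sample rows are all made up for the example, and it assumes entries have already been scraped into plain dictionaries.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Dict[str, str]


def prefer_defined(entries: Iterable[Entry]) -> List[Entry]:
    """Keep one entry per (part of speech, word), preferring entries with definitions.

    Hypothetical helper: the field names below are assumptions, not the
    repository's actual column names.
    """
    best: Dict[Tuple[str, str], Entry] = {}
    for entry in entries:
        key = (entry["part_of_speech"], entry["word"])
        current = best.get(key)
        has_definition = bool(entry["definitions"].strip())
        # Store the first entry seen; replace it only when the stored one
        # has no definition and the new one does.
        if current is None or (not current["definitions"].strip() and has_definition):
            best[key] = entry
    # Sort by part of speech, then by word, mirroring the per-category layout.
    return sorted(best.values(), key=lambda e: (e["part_of_speech"], e["word"]))


# Placeholder rows for illustration only (not taken from the deck).
sample = [
    {"part_of_speech": "nouns", "word": "word-a", "pronunciation": "", "definitions": "", "etymology": ""},
    {"part_of_speech": "nouns", "word": "word-a", "pronunciation": "", "definitions": "first meaning", "etymology": ""},
    {"part_of_speech": "adjectives", "word": "word-b", "pronunciation": "", "definitions": "second meaning", "etymology": ""},
]
cards = prefer_defined(sample)
```

The duplicate `word-a` row without a definition is dropped, and the remaining cards come back grouped by part of speech.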

Included: the Python code, the spreadsheets, and the Anki deck pack.
Categories: adjectives, adverbs, conjunctions, determiners, interjections, morphemes, multi-word terms, nouns, numerals, particles, phrases, prepositions, pronouns, verbs.

I have taken the liberty of removing any racial and ethnic slurs that I could find; this process may not have been perfect. Words that indicate sexual behavior and contact remain in the pack from a standpoint of linguistic knowledge. Please be aware of this when using the cards.

Available on AnkiWeb #1446159529.


Muskogee / Mvskoke Language Web Dictionary (Mvskoke Anki Cards)

This card deck was last scraped from the April 24th, 2024 update.
The wonderful teachers at Mvskoke Opunvkv have included an online edition of the 2000 "A Dictionary of Creek / Muskogee" reference book compiled by Jack B. Martin and Margaret McKane Mauldin, viewable here. [2]

Included: the Python code, the spreadsheet, and the Anki deck made from the current release.
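For anyone who wants to go from spreadsheet rows to something Anki can import, a minimal sketch follows. It assumes the rows are already in memory as lists of strings and relies on Anki's text import accepting tab-separated files; the function name `rows_to_anki_tsv` and the placeholder rows are hypothetical and not part of this repository.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_anki_tsv(rows: Iterable[Sequence[str]]) -> str:
    """Serialize rows as tab-separated text, one note per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Collapse embedded tabs and newlines to single spaces so that
        # each note stays on one line with a fixed number of fields.
        writer.writerow([" ".join(field.split()) for field in row])
    return buffer.getvalue()


# Placeholder rows for illustration only (not taken from the dictionary).
tsv_text = rows_to_anki_tsv([
    ["headword-1", "definition one"],
    ["headword-2", "definition\ntwo"],
])
```

Writing `tsv_text` to a `.txt` file with UTF-8 encoding should give a file that can be loaded through Anki's File > Import dialog, with the first column mapped to the front of the card and the second to the back.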

Available on AnkiWeb #2044931447.

Footnotes

  1. Recommendation for the whole-dictionary method: After importing into Anki, click on the deck, then Browse, then shift-click the first and last cards to select everything. Right click, and Toggle Suspend. Go section by section, or by words as you learn them individually, and un-Toggle Suspend, so as to not be overwhelmed. You can also use this to construct your own frequency lists, particularly for languages that do not have readily available lists to study.

  2. This web edition is still in its drafting stages. According to the roadmap, there will be several more rounds of community review before its final version is made public.
