neocl / jamdict

Python 3 library for manipulating Jim Breen's JMdict, KanjiDic2, JMnedict and kanji-radical mappings

License: MIT License

Python 99.84% Shell 0.16%

python japanese dictionary japanese-dictionary python-library japanese-language japanese-study jmdict kanjidic2 kanji

jamdict's People

Contributors

reem-codes bernhardvalenti letuananh alt-romes tslater ciccone1978 killawords edwardcoventry swannedlakee hoanghungict tristcoil

jamdict's Issues

Searching POS in parameters doesn't search for all possible POS

I noticed when using this solution for finding POS that we discussed in #22 :

# find the idseq of every lexical entry (i.e. word) that has at least 1 sense with pos = expressions (phrases, clauses, etc.)
with jam.jmdict.ctx() as ctx:
    # query all word's idseqs
    rows = ctx.select(
        query="SELECT DISTINCT idseq FROM Sense WHERE ID IN (SELECT sid FROM pos WHERE text = ?)",
        params=("expressions (phrases, clauses, etc.)",))
    for row in rows:
        # reuse database connection with ctx=ctx for better performance
        word = jam.jmdict.get_entry(idseq=row['idseq'], ctx=ctx)
        ruler.add_patterns([{"label": "EXPRESSION", "pattern": x.text} for x in word.kanji_forms])
        ruler.add_patterns([{"label": "EXPRESSION", "pattern": x.text} for x in word.kana_forms])
        print("Working on expressions...")

that some expressions which have 'expressions (phrases, clauses, etc.)' as a secondary POS rather than the primary one do not seem to be caught by this search. Is this a bug, or an intended feature?

Additionally, it seems that the original JMDict does not use this scheme to refer to expressions. Instead the term used is 'exp'. Am I mistaken here?

Thank you
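A minimal sketch for checking the behaviour described above. It uses a toy in-memory schema that only mimics the `Sense` and `pos` columns named in the query; the real jamdict database layout may differ, and the rows are invented for illustration. In this toy table the query matches an entry regardless of where the POS sits in the list, so listing the distinct POS strings actually stored is one way to see whether a missed entry is tagged with a different string.

```python
import sqlite3

# Toy schema: only the columns used by the query in the issue (an assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE Sense (ID INTEGER PRIMARY KEY, idseq INTEGER);
    CREATE TABLE pos (sid INTEGER, text TEXT);
    -- entry 1: one sense tagged as a noun first and as an expression second
    INSERT INTO Sense (ID, idseq) VALUES (10, 1);
    INSERT INTO pos (sid, text) VALUES (10, 'noun (common) (futsuumeishi)');
    INSERT INTO pos (sid, text) VALUES (10, 'expressions (phrases, clauses, etc.)');
    -- entry 2: one sense tagged as a noun only
    INSERT INTO Sense (ID, idseq) VALUES (20, 2);
    INSERT INTO pos (sid, text) VALUES (20, 'noun (common) (futsuumeishi)');
""")

EXP = "expressions (phrases, clauses, etc.)"
rows = conn.execute(
    "SELECT DISTINCT idseq FROM Sense WHERE ID IN (SELECT sid FROM pos WHERE text = ?)",
    (EXP,),
).fetchall()
matched = [row["idseq"] for row in rows]

# List the POS strings that are really stored, to compare against the parameter.
stored = [row["text"] for row in conn.execute("SELECT DISTINCT text FROM pos ORDER BY text")]

print(matched)
print(stored)
conn.close()
```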

[Feature Request] In-memory database

Hi there, thanks for making this library! I was wondering if it's possible to add an option for the database to be forcibly created in memory:

class ExecutionContext(object):
    # ...
    def __init__(self, path, schema, auto_commit=True):
        source = sqlite3.connect(str(path))
        self.conn = sqlite3.connect(':memory:')
        source.backup(self.conn)
        # ...

I added this snippet to puchikarui.py and it sped up lookups by about 30-40% (7.7 seconds down to ~4 seconds). Of course, ideally the database would be kept outside of the context construction. I currently have context reuse enabled.
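The same idea can be tried outside the library with only the standard `sqlite3` module. This is a sketch, not the puchikarui implementation: `load_into_memory` is a hypothetical helper name, and the demo table is made up. `Connection.backup` needs Python 3.7 or later.

```python
import os
import sqlite3
import tempfile


def load_into_memory(path):
    """Copy an on-disk SQLite database into a new in-memory connection."""
    source = sqlite3.connect(str(path))
    try:
        memory = sqlite3.connect(":memory:")
        source.backup(memory)  # copies every page of the source database
    finally:
        source.close()  # the in-memory copy no longer needs the file
    return memory


with tempfile.TemporaryDirectory() as tmpdir:
    # Build a small throwaway database file to copy from.
    db_path = os.path.join(tmpdir, "demo.db")
    disk = sqlite3.connect(db_path)
    disk.execute("CREATE TABLE entry (idseq INTEGER, kana TEXT)")
    disk.execute("INSERT INTO entry VALUES (?, ?)", (1132270, "ムーン"))
    disk.commit()
    disk.close()

    mem = load_into_memory(db_path)

# The file is gone at this point; the query is served from memory.
kana = mem.execute("SELECT kana FROM entry WHERE idseq = ?", (1132270,)).fetchone()[0]
print(kana)
mem.close()
```

Keeping the returned connection alive for the whole session is what avoids repeating the copy on every lookup.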

Add iter search for big queries

Related to issue #22

Trim down dependencies

Make lxml optional (most people don't parse jamdict XML files but use prebuilt SQLite DB file)
May use embedded puchikarui and keep it up to date instead of loose linking
Review chirptext dependency
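One common way to make lxml optional is a guarded import with a standard-library fallback. The sketch below assumes the code only needs the subset of the ElementTree API that both modules share; the `<entry>` snippet is invented sample data, not a real JMdict record.

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library.
try:
    from lxml import etree
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree
    HAVE_LXML = False

# fromstring() and find() behave the same way in both modules for this input.
root = etree.fromstring("<entry><keb>日本語</keb></entry>")
keb = root.find("keb").text
print(HAVE_LXML, keb)
```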

Can't install jamdict-data on Windows

Hello.

I tried to install jamdict-data on Windows but I couldn't. My details: Windows 11, Python 3.11.4, PowerShell 7.3.6.

Seems to be a Windows-specific thing: [WinError 32] The process cannot access the file because it is being used by another process

Full terminal output:

PS C:\Users\User> python.exe -m pip install jamdict-data
Collecting jamdict-data
  Downloading jamdict_data-1.5.tar.gz (53.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.9/53.9 MB 9.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      running dist_info
      creating C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info
      writing C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\PKG-INFO
      writing dependency_links to C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\dependency_links.txt
      writing top-level names to C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\top_level.txt
      writing manifest file 'C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\SOURCES.txt'
      reading manifest file 'C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file 'C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data.egg-info\SOURCES.txt'
      creating 'C:\Users\User\AppData\Local\Temp\pip-modern-metadata-_2sae9a0\jamdict_data-1.5.dist-info'
      Unpacking database from C:\Users\User\AppData\Local\Temp\pip-install-6nyf8b0w\jamdict-data_51d99a1c3c554a3a9b8858235b75d3ac\jamdict_data\jamdict.db.xz to C:\Users\User\AppData\Local\Temp\pip-install-6nyf8b0w\jamdict-data_51d99a1c3c554a3a9b8858235b75d3ac\jamdict_data\jamdict.db
      error: [WinError 32] The process cannot access the file because it is being used by another process: 'jamdict_data/jamdict.db.xz'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Thank you very much for creating jamdict, by the way.
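On Windows, WinError 32 is raised when a file is deleted or renamed while some handle to it is still open. Whether that is what happens in the jamdict-data build step is an assumption; the log above only shows the error occurring after the unpacking message. A minimal sketch of unpacking an `.xz` file so that every handle is closed before the source is touched (`unpack_xz` and the file names are hypothetical):

```python
import lzma
import os
import shutil
import tempfile


def unpack_xz(src, dest, remove_source=False):
    """Decompress src (.xz) into dest, closing both files before any removal."""
    with lzma.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Both handles are closed here, so removing the source is safe on Windows too.
    if remove_source:
        os.remove(src)


with tempfile.TemporaryDirectory() as tmpdir:
    packed = os.path.join(tmpdir, "demo.db.xz")
    unpacked = os.path.join(tmpdir, "demo.db")

    # Create a small compressed file to stand in for the real archive.
    with lzma.open(packed, "wb") as f:
        f.write(b"demo payload")

    unpack_xz(packed, unpacked, remove_source=True)

    with open(unpacked, "rb") as f:
        payload = f.read()
    source_still_exists = os.path.exists(packed)

print(payload, source_still_exists)
```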

[Feature Request] Add Tatoeba

Add Tanaka Corpus (Tatoeba) (http://ftp.monash.edu/pub/nihongo/#oth_fil), like jisho.org. http://ftp.monash.edu/pub/nihongo/examples.utf.gz

Enhance setup.py

Don't import jamdict in setup.py

Add POS filter to Jamdict.lookup()

Related to issue #22

Add str() to gloss

`sqlite3.OperationalError: database is locked` when importing data (Windows)

$ python -m jamdict import
Jamdict 0.1a11.post2
Python library for using Japanese dictionaries and resources (Jim Breen's JMdict, KanjiDic2, KRADFILE, JMnedict)

Basic configuration
------------------------------------------------------------
JAMDICT_HOME: ~/.jamdict [OK]
jamdict-data: version 1.5.1a2 [OK]
Config file : Not available.
     Run `python3 -m jamdict config` to create configuration file if needed.

Data files
------------------------------------------------------------
Jamdict DB location: ~/extern/jamdict_data/jamdict_data/jamdict.db - [NOT FOUND]
JMDict XML file    : ~/.jamdict/data/JMdict_e.gz - [OK]
KanjiDic2 XML file : ~/.jamdict/data/kanjidic2.xml.gz - [OK]
JMnedict XML file  : ~/.jamdict/data/JMnedict.xml.gz - [OK]

Others
------------------------------------------------------------
puchikarui: version 0.1
chirptext : version 0.1.2
lxml      : False
Importing data to: ~/extern/jamdict_data/jamdict_data/jamdict.db
Started - [Creating Jamdict SQLite database. This process may take very long time ...]
WARNING:puchikarui.puchikarui:DB does not exist at ~/extern/jamdict_data/jamdict_data/jamdict.db. Setup is required.
WARNING:jamdict.util:Building Kanjidic2 DB using a different DB context None vs ~/extern/jamdict_data/jamdict_data/jamdict.db
ERROR:puchikarui.puchikarui:Query failed: q=INSERT INTO character (ID,literal,stroke_count,grade,freq,jlpt) VALUES (?,?,?,?,?,?) , p=(None, '亜', 7, '8', '1509', '1')
Traceback (most recent call last):
  File "~/.venv/Lib/site-packages/puchikarui/puchikarui.py", line 469, in execute
    _r = self.cur.execute(query, params)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: database is locked
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "~/.venv/Lib/site-packages/jamdict/__main__.py", line 2, in <module>
    tools.main()
  File "~/.venv/Lib/site-packages/jamdict/tools.py", line 294, in main
    app.run()
  File "~/.venv/Lib/site-packages/chirptext/cli.py", line 135, in run
    args.func(self, args)
  File "~/.venv/Lib/site-packages/jamdict/tools.py", line 83, in import_data
    jam.import_data()
  File "~/.venv/Lib/site-packages/jamdict/util.py", line 502, in import_data
    self.kd2.insert_chars(self.kd2_xml, ctx=kd_ctx)
  File "~/.venv/Lib/site-packages/jamdict/kanjidic2_sqlite.py", line 124, in insert_chars
    self.insert_char(c, ctx=ctx)
  File "~/.venv/Lib/site-packages/jamdict/kanjidic2_sqlite.py", line 132, in insert_char
    c.ID = ctx.character.save(c)
           ^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/Lib/site-packages/puchikarui/puchikarui.py", line 366, in save
    return self._context.insert_object(self._table, obj, columns, self._table._field_map)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/Lib/site-packages/puchikarui/puchikarui.py", line 439, in insert_object
    self.insert_record(table, values, columns)
  File "~/.venv/Lib/site-packages/puchikarui/puchikarui.py", line 414, in insert_record
    self.execute(query, values)
  File "~/.venv/Lib/site-packages/puchikarui/puchikarui.py", line 469, in execute
    _r = self.cur.execute(query, params)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: database is locked
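The warning about a "different DB context" just before the failure suggests, though the log does not prove it, that two connections were writing to the same file. The sketch below reproduces that general situation with plain `sqlite3`: a second connection cannot write while the first holds an uncommitted write, whereas reusing the first connection works. The table and values are illustrative only.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.db")

    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE kanji (ID INTEGER PRIMARY KEY, literal TEXT)")
    writer.commit()

    # This insert opens a transaction that is left uncommitted on purpose,
    # so the first connection keeps holding the write lock.
    writer.execute("INSERT INTO kanji (literal) VALUES (?)", ("亜",))

    # A second connection to the same file; timeout=0 means fail immediately
    # instead of waiting for the lock to be released.
    other = sqlite3.connect(path, timeout=0)
    try:
        other.execute("INSERT INTO kanji (literal) VALUES (?)", ("唖",))
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    finally:
        other.close()

    # Reusing the connection that already holds the lock succeeds.
    writer.execute("INSERT INTO kanji (literal) VALUES (?)", ("唖",))
    writer.commit()
    count = writer.execute("SELECT COUNT(*) FROM kanji").fetchone()[0]
    writer.close()

print(error, count)
```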

Better lookup results

Allow strict_lookup (no additional characters, only the ones in the query)
add str() and repr() to result objects

how to reset the sqlite 3 database

There was an error while importing the file from gdrive,

which means lookups now don't find any characters.

May I know where the database is stored, and how I can reset or delete it?
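The `python3 -m jamdict info` output quoted in another issue on this page prints a "Jamdict DB location" line. Assuming that path is the file in question, one way to reset it is to delete the file and rebuild it with the import command mentioned in those issues. The helper below is a hypothetical sketch, not part of jamdict; it also removes SQLite's side files if they exist.

```python
import tempfile
from pathlib import Path


def remove_database(db_path):
    """Delete a SQLite file and any leftover side files; return the names removed."""
    db_path = Path(db_path).expanduser()
    candidates = [db_path] + [Path(str(db_path) + suffix) for suffix in ("-journal", "-wal", "-shm")]
    removed = []
    for candidate in candidates:
        if candidate.exists():
            candidate.unlink()
            removed.append(candidate.name)
    return removed


# Demo on a throwaway file; replace the path with the one reported on your machine.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_db = Path(tmpdir) / "demo.db"
    demo_db.write_bytes(b"")
    removed = remove_database(demo_db)
    still_there = demo_db.exists()

print(removed, still_there)
```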

How to search for part of speech

Hi, how can I search by part of speech?

Can't build database file since latest release

I can't build the database since the latest (I think) release. Before I just did python3 -m jamdict.tools import and it worked.

Now python3 -m jamdict.tools import or python3 -m jamdict import give me this:

Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/data/com.termux/files/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/jamdict/__main__.py", line 2, in <module>
    tools.main()
  File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/jamdict/tools.py", line 295, in main
    app.run()
  File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/chirptext/cli.py", line 135, in run
    args.func(self, args)
  File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/jamdict/tools.py", line 70, in import_data
    db_loc = os.path.abspath(os.path.expanduser(args.jdb))
  File "/data/data/com.termux/files/usr/lib/python3.9/posixpath.py", line 231, in expanduser
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Everything seems fine in python3 -m jamdict info:

Jamdict 0.1a11.post1
Python library for using Japanese dictionaries and resources (Jim Breen's JMdict, KanjiDic2, KRADFILE, JMnedict)

Basic configuration
------------------------------------------------------------
JAMDICT_HOME: /data/data/com.termux/files/home/.jamdict [OK]
jamdict-data: Not installed
Config file : /data/data/com.termux/files/home/.jamdict/config.json

Data files
------------------------------------------------------------
Jamdict DB location: /storage/emulated/0/Documents/Dictionaries/jamdict.db - [OK]
JMDict XML file : /storage/emulated/0/Documents/Dictionaries/JMdict_e.gz - [OK]
KanjiDic2 XML file : /storage/emulated/0/Documents/Dictionaries/kanjidic2.xml.gz - [OK]
JMnedict XML file : /storage/emulated/0/Documents/Dictionaries/JMnedict.xml.gz - [OK]

Jamdict database metadata
------------------------------------------------------------
jmdict.version: 1.08
jmdict.url: http://www.csse.monash.edu.au/~jwb/edict.html
generator: jamdict
generator_version: 0.1a9
generator_url: https://github.com/neocl/jamdict
jmnedict.version: 1.08
jmnedict.url: https://www.edrdg.org/enamdict/enamdict_doc.html
jmnedict.date: 2020-05-29
kanjidic2.version: 1.6
kanjidic2.url: https://www.edrdg.org/wiki/index.php/KANJIDIC_Project
kanjidic2.date: April 2008

Others
------------------------------------------------------------
puchikarui: version 0.2a2
chirptext : version 0.1.2
lxml : True

My config.json looks like this:
{
  "JAMDICT_HOME": "/data/data/com.termux/files/home/.jamdict",
  "JAMDICT_DATA": "{JAMDICT_HOME}/data",
  "JAMDICT_DB": "/storage/emulated/0/Documents/Dictionaries/jamdict.db",
  "JMDICT_XML": "/storage/emulated/0/Documents/Dictionaries/JMdict_e.gz",
  "JMNEDICT_XML": "/storage/emulated/0/Documents/Dictionaries/JMnedict.xml.gz",
  "KD2_XML": "/storage/emulated/0/Documents/Dictionaries/kanjidic2.xml.gz",
  "KRADFILE": "/storage/emulated/0/Documents/Dictionaries/kradfile-u.gz"
}

This is Termux on Android, if that matters. Also I can't install jamdict-data from pip, it fails and asks me to install wheel, which doesn't solve the problem -- but that's another matter.
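The traceback ends in `os.path.expanduser(args.jdb)` receiving `None`, which is what happens when an optional command-line value is absent and nothing fills it in. The sketch below reproduces that failure and shows one possible guard that falls back to a configured path. `resolve_db_path` is a hypothetical helper and the `--jdb` option is only inferred from the `args.jdb` attribute in the traceback; neither is presented as jamdict's actual code.

```python
import argparse
import os


def resolve_db_path(cli_value, configured_value):
    """Prefer the command-line value, fall back to the configured one."""
    candidate = cli_value if cli_value is not None else configured_value
    if candidate is None:
        raise ValueError("No database path given on the command line or in the configuration")
    return os.path.abspath(os.path.expanduser(candidate))


parser = argparse.ArgumentParser()
parser.add_argument("--jdb", default=None)  # option name assumed from args.jdb
args = parser.parse_args([])  # simulate running without the option

# Passing None straight to expanduser reproduces the reported error type.
try:
    os.path.expanduser(args.jdb)
    failure = None
except TypeError as exc:
    failure = type(exc).__name__

# With the guard, the path from the configuration is used instead.
db_loc = resolve_db_path(args.jdb, "/storage/emulated/0/Documents/Dictionaries/jamdict.db")
print(failure, db_loc)
```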

use AppConfig to config jamdict

Simplify API:

People don't really use XML files for lookups, so the default option to create a jamdict will most likely be the DB.
jmdict, kanjidic and multikrad most likely will be in a single database.
Add util functions:
- read()
- parse() from xml file(s) to db file(s)

Accessing reading breakdown of a vocabulary term from JMdict

For example, given the entry for 日本語 I'd like to not only get the reading, にほんご, but also which parts of the reading are associated with which kanji, e.g. 日→に, 本→ほん, and 語→ご. This would make rendering furigana from the database much easier. Is this possible? Thanks!
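The issue does not say whether jamdict provides this breakdown. As an illustration of the alignment being asked for, here is a small backtracking sketch that splits a reading across the characters of a word, given candidate readings per kanji. The `CANDIDATES` table is hand-written for this example and is not taken from JMdict or KanjiDic2; `split_reading` is a hypothetical helper.

```python
def split_reading(word, reading, candidates):
    """Return [(char, reading_part), ...] or None if no split fits."""
    if not word:
        # Success only if the whole reading has been consumed as well.
        return [] if not reading else None
    head, rest = word[0], word[1:]
    # Characters without an entry (e.g. kana) are expected to read as themselves.
    options = candidates.get(head, [head])
    for part in options:
        if reading.startswith(part):
            tail = split_reading(rest, reading[len(part):], candidates)
            if tail is not None:
                return [(head, part)] + tail
    return None


# Illustrative candidate readings only.
CANDIDATES = {
    "日": ["にち", "に", "ひ"],
    "本": ["ほん", "もと"],
    "語": ["ご", "かた"],
}

pairs = split_reading("日本語", "にほんご", CANDIDATES)
print(pairs)
```

A candidate table like this cannot handle readings that belong to the whole word rather than to individual kanji, so a real solution would need a fallback that attaches the full reading to the full word.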

Some words not searchable in dictionary

I am finding that a word that should be in the dictionary, for example 大事にする（だいじにする）, is not showing up. I'm not sure why this is; it can be seen in the JMDictDB here

Split JamdictXML from Jamdict code base

End users should not use JamdictXML anyway (it's only useful for database building and testing)

Customizing JAMDICT_HOME / JAMDICT_DATA

I'm using jamdict for an educational game and I would like to install jamdict's data in a custom folder.
After looking at jamdict.config, I've tried setting the environment variables JAMDICT_HOME and JAMDICT_DATA, but this seems to have no effect.
Is there a proper way to do this?
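Another issue on this page shows a `config.json` with `JAMDICT_HOME`, `JAMDICT_DATA` and `JAMDICT_DB` keys. Assuming jamdict reads such a file, a custom data folder could be described by generating one. The sketch below only writes the JSON; where jamdict looks for the file and whether it honours every key are not confirmed here, and `write_config` and the paths are hypothetical.

```python
import json
import os
import tempfile


def write_config(config_path, home, db_path):
    """Write a config file using the key names seen in the config.json quoted on this page."""
    config = {
        "JAMDICT_HOME": home,
        "JAMDICT_DATA": "{JAMDICT_HOME}/data",
        "JAMDICT_DB": db_path,
    }
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return config


with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in for the game's own data folder.
    home = os.path.join(tmpdir, "game_data", "jamdict")
    os.makedirs(home)
    config_path = os.path.join(home, "config.json")
    write_config(config_path, home, os.path.join(home, "jamdict.db"))

    # Read the file back to confirm what was written.
    with open(config_path, encoding="utf-8") as f:
        loaded = json.load(f)

print(sorted(loaded))
```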

========================================
Found entries
========================================
Entry: 1132270 | Kj:   | Kn: ムーン
--------------------
1. moon ((noun (common) (futsuumeishi)))

========================================
Found characters
========================================
