aaronhktan / jyut-dict

A free, open-source, offline Cantonese Dictionary for Windows, Mac, and Linux. Qt, SQLite. C++ and Python.

Home Page: https://jyutdictionary.com

License: MIT License

Python 21.62% C++ 76.28% Shell 0.26% Makefile 0.01% Objective-C++ 0.09% Batchfile 0.11% Qt Script 0.05% C 0.06% CMake 1.51%
cantonese cantonese-dictionary cantonese-language offline desktop qt cc-canto cedict cfdict handedict macos windows ubuntu cross-platform pinyin jyutping tatoeba dictionary language words-hk

jyut-dict's Introduction

Jyut Dictionary - A free, open-source, offline Cantonese dictionary

/jyːt ˈdɪkʃənɛɹi/

Look up words from multiple dictionaries in Cantonese or Mandarin, with Traditional Chinese, Simplified Chinese, Jyutping, Pinyin, and English input.

Available for macOS, Windows, and Ubuntu.

Features

Vast number of entries.

Jyut Dictionary gives you access to CEDICT, CC-CANTO, words.hk 粵典, the Unihan database, 開放詞典, and Chinese sentences from Tatoeba, all completely offline. Plus, you can download and add more dictionaries to the program. That's over 200,000 entries and 100,000 sentences to search from!

Search quickly.

Results appear in a list as you type, so it's faster and easier to find what you're looking for. Plus, your search history is saved if you want to go back to a word you've looked up before.

Search with your preferred input method.

Jyut Dictionary supports entry with Traditional Chinese, Simplified Chinese, Jyutping, Pinyin, and English.

Localized.

Use the dictionary in English, French, Simplified and Traditional Cantonese, or Simplified and Traditional Chinese.

Customizable.

Prefer to see only Traditional Chinese? Maybe hide Pinyin? Change the colours of the words or disable them altogether? Do that with a plethora of settings options!

Project structure

The project contains two subdirectories under src: dictionaries, and jyut-dict.

dictionaries

This folder contains several Python3 scripts that convert the various online Cantonese/Written Chinese dictionaries into the dictionary format used by Jyut Dictionary. However, for copyright reasons, data for these sources may not be included in this repository. Selected sources include:

jyut-dict

This folder contains the source code for the program, and a Qt Creator project file. Files are divided into several subdirectories:

  • components: UI components, such as the list view or search bar.
  • dialogs: dialogs, such as the "update available" notification dialog.
  • logic: definitions for search and entry classes, as well as other backend logic.
  • platform: platform-specific files, such as Info.plist for the macOS application bundle.
  • resources: databases, icons, and images.
  • windows: UI windows of the program, such as the main window.

Build and run

This project requires Qt 5.15.

Before building the application, you must provide a dictionary database. Download one from the website, or build the dictionary database using parse-set.py (for CEDICT + CC-CANTO) or parse-individual.py (for CFDICT/HanDeDict).

Place the database, named dict.db, in src/jyut-dict/resources/db/

macOS, Windows: Craft + Qt Creator

  1. Install Craft.
  2. Install various dependencies using Craft:
craft libs/qt/qtmultimedia
craft libs/qt/qtspeech
craft karchive
  3. Set up Qt Creator with a kit from Craft, following instructions here.
  4. Open CMakeLists.txt in Qt Creator, and define CMAKE_CXX_FLAGS as -DPORTABLE -DDEBUG in the CMake configuration if you would like to isolate your debug build from any system files.
  5. Compile and run!

Ubuntu: Manual Git clone + Qt Creator

This guide assumes you have already installed Qt 5.15.2 using the online installer to ~/Qt, with the Qt Multimedia and Qt Speech plugins.

  1. Clone KArchive and check out tags/v5.114.0. This is the last version of KArchive that is compatible with Qt 5.
git clone https://github.com/KDE/karchive.git
cd karchive
git checkout tags/v5.114.0
  2. Build according to KArchive instructions:
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=~/Qt/5.15.2/gcc_64
make
sudo make install
  3. Open CMakeLists.txt in Qt Creator, and define CMAKE_CXX_FLAGS as -DPORTABLE -DDEBUG in the CMake configuration if you would like to isolate your debug build from any system files.
  4. Compile and run!

Packaging for release

See the READMEs for each platform under src/jyut-dict/platform/<platform> for instructions on packaging the built executable.

On the roadmap

See the Github issues and projects for more information!

jyut-dict's People

Contributors

aaronhktan

jyut-dict's Issues

Create dark mode for Ubuntu

Both Pop!_OS and Ubuntu use a dark mode by default. It would be nice to have a dark mode to better fit in on these platforms.

More font options

Allow user to select which font the Chinese words appear in. Good candidates might be Noto Sans/Serif CJK variants.

This would probably also help for bugs like the Catalina bug where the San Francisco CJK variant is broken if this project is compiled on Mojave.

Set the minimum height of the language pill

On Windows with magnification, the border radius of the pill indicating the language of a sentence fails to draw (and instead draws a rectangle).

My suspicion is that this is due to the height not being large enough — when the border radius in the stylesheet exceeds half the height of the item, Qt draws solid corners instead. A potential fix for this would be to set the minimum height of that element.

Add data from ABC Cantonese-English dictionary

Describe the source
"The ABC Cantonese-English Comprehensive Dictionary ... comprises about 15,000 lexical entries that are unique to the colloquial Cantonese language as it is spoken and written in Hong Kong today."

Reasoning
It's a really good additional source of Cantonese expressions and phrases, and it often has etymological explanations absent from other sources. Robert Bauer is a respected Cantonese linguist who has done a lot of work in this field.

Licensing or cost
This dictionary costs $99, and can be licensed by developers from Wenlin for personal use. See here for details.

Incorrect label for tone colours

Good evening,

On the settings screen, we can set the colours of characters according to their Cantonese or Mandarin tones. There are two rows of tones: one for Cantonese (with 6 tones, I assume it is Jyutping) and another just below it with 4 tones. However, both rows are labelled « Couleurs de tons en jyutping: ». Should the label for the second row instead be « Couleurs de tons en pinyin: »?

Add issue template for dictionary sources

Is your feature request related to a problem? Please describe.
There are additional Cantonese dictionaries I would like to add to the dictionary program. I would also like to track the work done to add these dictionaries via Github issues. As such, having a standardized interface for suggesting sources via Github issues would be useful.

Describe the solution you'd like
Add additional templates to the Github issues template folder for different dictionary sources.

Additional context
N/A

Allow user to favourite words

Keep a list of favourites so a user can refer back to them. These words would preferably be viewable in a separate window, as well as showing up as favourited when viewing the search result.

Colour characters in by Jyutping/Pinyin tones

Chinese dictionary programs usually colour characters by their tones.

See here for details on Pinyin tone colours: https://en.wikipedia.org/wiki/Pinyin#Using_tone_colors
See cantonese.org for Pleco's display of Jyutping tone colours.

ACs:

  • Entry list delegate should colour in characters based on their jyutping/pinyin tones.
  • Entry header widget should colour in characters based on their jyutping/pinyin tones.
  • Likely only support one for now (hard code it), but open up for support for others in the future.

Nice to have:

  • Preferably be able to switch between the two, possibly through implementation of a separate preferences class.
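
A minimal sketch of the colouring step described in the ACs above, written in Python rather than the application's C++. It pairs each character with its romanised syllable and wraps it in HTML-style colour markup. The function name `colour_by_tone`, the palette values, and the markup format are assumptions for illustration, not the project's actual implementation.

```python
from typing import Dict, Optional

# Hypothetical palette: Jyutping tone number -> colour. Not the project's real colours.
JYUTPING_TONE_COLOURS: Dict[Optional[int], str] = {
    1: "#00bcd4", 2: "#7cb342", 3: "#657ff1",
    4: "#c2185b", 5: "#068900", 6: "#7651d0",
}
DEFAULT_COLOUR = "#000000"  # used when a syllable has no tone digit


def colour_by_tone(characters: str, romanisation: str,
                   palette: Dict[Optional[int], str]) -> str:
    """Wrap each character in colour markup chosen by its syllable's tone digit.

    Assumes one space-separated syllable per character, e.g. "gwong2 dung1".
    """
    syllables = romanisation.split()
    pieces = []
    for index, character in enumerate(characters):
        syllable = syllables[index] if index < len(syllables) else ""
        has_tone = bool(syllable) and syllable[-1].isdigit()
        tone = int(syllable[-1]) if has_tone else None
        colour = palette.get(tone, DEFAULT_COLOUR)
        pieces.append(f'<font color="{colour}">{character}</font>')
    return "".join(pieces)


print(colour_by_tone("廣東", "gwong2 dung1", JYUTPING_TONE_COLOURS))
```

Passing the palette as a parameter keeps the door open for a Pinyin palette later, which is the "nice to have" above.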

Fix some concurrency issues

In sqlsearch.cpp and sqluserdatautils, the list of observers is not protected by any mutual exclusion. This is potentially dangerous (although surprisingly hasn't blown up yet).
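
The general fix can be sketched as follows, in Python for brevity rather than the project's C++. The class and method names are hypothetical. The idea is to guard every access to the observer list with a lock, and to notify from a snapshot so that callbacks run outside the lock.

```python
import threading
from typing import Callable, List


class SearchSubject:
    """Hypothetical subject whose observer list is protected by a lock."""

    def __init__(self) -> None:
        self._observers: List[Callable[[list], None]] = []
        self._lock = threading.Lock()

    def register(self, observer: Callable[[list], None]) -> None:
        with self._lock:
            if observer not in self._observers:
                self._observers.append(observer)

    def deregister(self, observer: Callable[[list], None]) -> None:
        with self._lock:
            if observer in self._observers:
                self._observers.remove(observer)

    def notify(self, results: list) -> None:
        # Copy under the lock, then call observers without holding it,
        # so an observer may register or deregister without deadlocking.
        with self._lock:
            snapshot = list(self._observers)
        for observer in snapshot:
            observer(results)


received = []
subject = SearchSubject()
subject.register(received.append)
subject.notify(["entry"])
print(received)
```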

Revise words.hk parsing script

In the discussion of Issue #29, it was brought to my attention that words.hk provides a downloadable, parseable file of all the entries in the dictionary at this link (Thanks, @hnfong!).

The old implementation scrapes all the webpages and then parses each of the downloaded files; this is brittle (prone to breaking if the structure of the webpage changes, as it has twice since mid-2020 when the initial scrape was done), and might not be the best for words.hk's servers.

This new implementation should download the provided file from that link, and then parse the downloaded file into the Jyut Dictionary database format.

Use speech modules on various platforms to pronounce words.

As a user, I want to be able to hear what the words sound like in Cantonese and Mandarin. This could be useful if I'm not too familiar with Jyutping/Pinyin, or if I just want to hear how the word is actually pronounced.

Qt Speech should make this not too difficult. It hooks into macOS' Siri voice, but I'm not sure about Windows or Ubuntu.

Searching for ü in pinyin doesn't work

When searching for the pinyin representation of 女, represented as nǚ in pretty pinyin or nu:3 in the database, the search does not work because query strings are improperly escaped.

Solution:

  • Surround search terms with double quotes, to make it a SQLite string in the query.
  • Turn v and ü into u: to search the table.
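
The two steps above might look like this in Python. This is a sketch only: the function names are made up, tone-marked input such as nǚ is not handled, and how the quoted term is embedded in the real query depends on the project's SQL, which is not shown here.

```python
def normalize_pinyin_query(term: str) -> str:
    """Rewrite both "ü" and "v" to the "u:" form used in the database."""
    return term.replace("ü", "u:").replace("v", "u:")


def quote_search_term(term: str) -> str:
    """Surround the term with double quotes, doubling any embedded quotes."""
    return '"' + term.replace('"', '""') + '"'


for raw in ("nv3", "nü3", "nu:3"):
    print(quote_search_term(normalize_pinyin_query(raw)))
```

All three spellings end up as the same quoted term, so the colon in nu:3 is treated as part of the string.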

Add data from ABC Chinese-English Dictionary

Describe the source
The ABC Chinese-English dictionary is a comprehensive Chinese-language dictionary with over 200,000 entries.

Reasoning
It has a wide range of words that are (1) not covered by CC-CEDICT, or (2) provide additional English translations for entries that are in MoEDict/兩岸詞典.

Licensing or cost
This source is available to developers for personal use for a cost of $99. See here for details.

Closing the application takes a long time

For some reason, in Big Sur and Ubuntu, closing the application sometimes hangs.

I suspect this is caused by the network request that ends the session in analytics. The analytics system should probably be removed anyway, since it hasn't been used.

Windows typing sometimes freezes up

When typing quickly in the search bar, the Windows version sometimes freezes up for a split second.

This is probably because Qt takes a long time to apply stylesheets and lay out the widgets that render sentences. Possible solution to investigate: wait some time before rendering sentences, so that rendering doesn't interfere with typing.

Fix aggressive merging of CC-CANTO words into CC-CEDICT

There are several words that are 多音字 in Cantonese but not in Mandarin; the current script merges them all into the single Mandarin word.

See "敦" for example: CC-CANTO has many different pronunciations in Cantonese for the word (deoi1, deoi3, deoi6, deon1, deon6, diu1, tyun4), but has only dun1 for Mandarin.

These should be represented as separate entries in the database.

De-duplicate and clean up code

Several issues:

  • sqluserdatautils and sqlsearch contain large amounts of exactly the same code for parsing an entry from a SQL query - could this be re-factored into a separate class or static functions?
  • sourcesentence and entry contain large amounts of exactly the same code for creating pretty pinyin versions/exploding phonetic/getting pinyin numbers/getting jyutping numbers/colouring in words. This should be re-factored into static namespaced functions.

Add data from words.hk

words.hk (https://words.hk/) is a great Cantonese-Cantonese and Cantonese-English dictionary. Of particular value are the Cantonese definitions, as well as the example sentences.

words.hk does not make their data available as a download, but the majority of their data is licensed under an open data license that permits use as long as proper credit is given and the use is non-commercial. Scraping the website is also not expressly forbidden, so that may be what needs to be done.

Add a function in Entry to get prettified Pinyin

Currently, the data supplied by CEDICT gives pinyin as pronunciation followed by tone number. Most people are used to seeing pinyin with accent marks.

A simple algorithm to convert to appropriate pinyin is given here: http://www.pinyin.info/rules/where.html

ACs:

  • Entry should be able to use getPhonetic() with a new flag in the EntryPhoneticOptions enum to get the new pretty pinyin.
  • Pinyin should be correct.

Nice to have:

  • Modify the entry header widget and entry delegate to display the pretty pinyin instead of raw pinyin.
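
A rough Python sketch of the tone-mark placement rules from the pinyin.info page linked above: "a" or "e" takes the mark, "ou" marks the "o", and otherwise the last vowel is marked. The function names are hypothetical and this is not the project's `getPhonetic()` implementation. It assumes lowercase vowels and a trailing tone digit, and treats tone 5 as neutral.

```python
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def prettify_syllable(syllable: str) -> str:
    """Convert one numbered syllable (e.g. "nu:3") to accented pinyin."""
    syllable = syllable.replace("u:", "ü").replace("v", "ü")
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone digit: nothing to convert
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone: drop the digit, add no mark
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        vowels = [i for i, char in enumerate(base) if char in TONE_MARKS]
        if not vowels:
            return base  # limitation: capitalised or missing vowels are left bare
        index = vowels[-1]
    return base[:index] + TONE_MARKS[base[index]][tone - 1] + base[index + 1:]


def prettify_pinyin(pinyin: str) -> str:
    """Convert a space-separated pinyin string syllable by syllable."""
    return " ".join(prettify_syllable(part) for part in pinyin.split())


print(prettify_pinyin("xi4 lu4 nv3"))
```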

Add Dark Theme on Windows

Like macOS, Windows 10+ has support for both light and dark themes. The dark theme already in the repo looks nice, and it would be helpful to have it on Windows.

Move search to its own thread

Currently, the entire program runs in one thread. Running queries against a large database takes time, and because the UI freezes up while a query runs, this detracts from the UX. Possible solution: move database querying to its own thread.

ACs:

  • Search should work as it does currently (auto-populating list, results as the user types)
  • UI should not freeze when quickly entering or deleting characters from the searchbar.

Use standard paths for storing data

QStandardPaths, paired with QDir::mkpath, should allow us to store stuff in the correct, OS-dependent location. The benefits would be: preferences that do not get deleted when user deletes app, and database in a location other than the program bundle (which would reduce the apparent size of the program).

ACs:

  • QStandardPaths should work on both macOS and Windows.
  • QStandardPaths::AppConfigLocation should be used for persistent configuration settings.
  • QStandardPaths::AppDataLocation should be used for storage of the database.

Update to build on Qt 5.15

Qt 5.15 is the most recent LTS version of Qt. It should be possible to build this application using that version.

i18n

I would like to use this app in French. In addition, there will eventually be a database for a Chinese/Cantonese -> French dictionary (CFDICT, CC-FCANTO), so this would be ideal for new users.

V's in Pinyin should be converted to ü

Even though the CC-CEDICT standard states that "ü" should be represented with "u:", CC-CANTO has some irregularities and sometimes uses "v" to represent "ü". For an example see "細路女": xi4 lu4 nv3.

Support changing colour for tones

Currently, the dictionary colours tones based on Cantonese pronunciation, falling back to Mandarin if no Cantonese pronunciation is available.

We should be able to switch between which language is preferred, as well as customize the colours used for each language.

ACs:

  • Preference pane or modal window to switch colours
  • Ability to set which language to colour tones

Support updating database independently of the application

Currently, the database is a one and done deal - it comes with the application.

This would move the database to something that can be downloaded separately.

ACs:

  • Be able to download a new database and use it as source for searches.

Rename pull request template file

Describe the bug
The pull request template file should be called pull_request_template.md, and the folder PULL_REQUEST_TEMPLATE. Currently, the pull request template does not show up in Github's PR interface.

Pinyin/Jyutping segmentation

As a user, when searching using pinyin or jyutping, I want to be able to find results without putting spaces between characters.

For example:

  • "mgoisaai" should segment to "m" "goi" "saai"
  • "do1zesaai" should segment to "do1" "ze" "saai"
  • "beijinglu" should segment to "bei" "jing" "lu"
  • "guangzhou1hua" should segment to "guang" "zhou1" "hua"
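
One possible approach, sketched in Python: try the longest known syllable first, allow an optional tone digit after it, and backtrack if the remainder cannot be segmented. The syllable sets below are tiny hypothetical samples chosen to cover the examples above; a real implementation would need the full Jyutping and Pinyin syllable inventories.

```python
from typing import List, Optional, Set

# Tiny illustrative samples, not complete syllable inventories.
JYUTPING_SYLLABLES: Set[str] = {"m", "goi", "saai", "do", "ze"}
PINYIN_SYLLABLES: Set[str] = {"bei", "jin", "jing", "lu", "gu", "guang", "zhou", "hua"}


def segment(text: str, syllables: Set[str]) -> Optional[List[str]]:
    """Split text into known syllables, or return None if it cannot be split."""
    if not text:
        return []
    longest = max(len(syllable) for syllable in syllables)
    for length in range(min(longest, len(text)), 0, -1):
        if text[:length] not in syllables:
            continue
        end = length
        # Keep an optional tone digit attached to the syllable it follows.
        if end < len(text) and text[end] in "123456":
            end += 1
        rest = segment(text[end:], syllables)
        if rest is not None:
            return [text[:end]] + rest
    return None


print(segment("mgoisaai", JYUTPING_SYLLABLES))
print(segment("guangzhou1hua", PINYIN_SYLLABLES))
```

Backtracking matters for inputs where a greedy first match leaves an unsegmentable remainder; with the full syllable inventories such cases become more common.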

Copying and pasting broken in definition

To reproduce: search for an entry, view it, try copying a definition from an entry, then paste into the search box. Nothing will end up pasting.

This is caused by QApplication::postEvent() causing an infinite loop when sending the Ctrl+C combo to the QLabel focus widget.

Create a dictionary patching system

As dictionaries get updated, it would be nice to be able to install those updates without removing and re-adding the new version. This is particularly relevant for CC-CEDICT, which has updates very often.

Add "related words" section

It would be nice to have a footer with all the related words, preferably broken up into several sections.

  • Other words containing this word at the beginning
  • Other words containing this word in the middle
  • Other words containing this word at the end
  • Characters/words in this word

This could be done with a SELECT EXISTS on the sqlite database.
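
As a sketch of that idea, the Python snippet below runs SELECT EXISTS with LIKE patterns against an in-memory SQLite database to decide which sections have anything to show. The `entries` table, its `traditional` column, and the helper name are assumptions for illustration; the real database schema is not described here.

```python
import sqlite3

# Hypothetical single-column table standing in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (traditional TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [("廣東",), ("廣東話",), ("講廣東話",), ("老廣東",)],
)

# "_" matches exactly one character and "%" matches any run of characters,
# so each pattern requires at least one extra character on the relevant side.
PATTERNS = {
    "beginning": "{word}_%",
    "middle": "_%{word}_%",
    "end": "_%{word}",
}


def has_related(connection: sqlite3.Connection, word: str, position: str) -> bool:
    """Return True if some other entry contains word at the given position.

    Note: a word containing "%" or "_" would need escaping first.
    """
    pattern = PATTERNS[position].format(word=word)
    row = connection.execute(
        "SELECT EXISTS (SELECT 1 FROM entries WHERE traditional LIKE ?)",
        (pattern,),
    ).fetchone()
    return bool(row[0])


for position in PATTERNS:
    print(position, has_related(conn, "廣東", position))
```

EXISTS only answers whether a section is non-empty; listing the related words themselves would use the same LIKE patterns in a plain SELECT.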

Migrate to Qt Installer Framework

Instead of relying on Advanced Installer, I should migrate to the Qt Installer Framework. It would allow for a single file with many translations, instead of having to provide separate versions for each language version.

Support updating/notifying users of an update available to the program

The program should pop up a dialog informing users that an updated version of the application is available. This could be accomplished by checking Github Releases, or some other yet-to-be-defined method.

ACs:

  • When a new update is published, alert users with a dialog and offer to download it for them.

Change mouse cursor to pointing hand and highlight buttons in detail view

Is your feature request related to a problem? Please describe.
I've received feedback from multiple people that the speaker icon (for pronouncing words) and the save/share buttons do not appear easily clickable enough.

Describe the solution you'd like
When hovering over the speaker icon, the mouse cursor should change to a pointing hand.
When hovering over the save/share buttons, the buttons should change colour (and highlight).

Bug when opening entries with dimensions larger than scroll area

  • To reproduce: look up 點, click on entry, and there will be a big space at the bottom. This happens only when the first entry clicked on is larger than the scroll area.
  • Entries affected: 行, 點; entries not affected: 白. (Does this mean that only entries with definitions from more than one source are affected?)
  • It seems like this might be related to how the sizeHint() of the _scrollAreaWidget is determined, since the sizeHint() and actual size() both report the size with the extra space at the bottom. It might also be related to how the DefinitionWidget generates a size hint; seems like it might sometimes be too large?
