luteorg / lute-v3

LUTE = Learning Using Texts: learn languages through reading. Python/Flask.

License: MIT License

Python 61.82% CSS 7.00% JavaScript 7.73% HTML 15.23% Gherkin 7.37% Shell 0.72% Dockerfile 0.12%

lute-v3's Introduction

Lute v3

This repo contains the source code for Lute (Learning Using Texts) v3, a Python/Flask tool for learning foreign languages through reading.

To learn more about Lute v3, or to install it for your own use and study, please see the Lute v3 manual.

Getting Started

Users

See the Lute v3 manual. Hop onto the Discord too.

Developing

For more information on building and developing, please see Development.

Contributing

If you'd like to contribute code to Lute (hooray!), check out the Contribution Guidelines. And with every repo star, an angel gets its wings.

License

Lute uses the MIT license: LICENSE

lute-v3's Issues

Only set translation for parent terms?

When creating a new Term and adding a new Parent, currently both get the same translation. E.g. from the Tutorial, click on "dogs" and create a new Term with new parent "dog", translation "woof." When saved, "dogs: woof" and "dog: woof" are both created -- but that's kind of redundant.

Would it be better to save the translation with the parent only?

I've created the branch set_translation_for_parent_only, which implements this, but it makes Lute look like it loses data. For example, create a new term for "dogs":

On save, the "dogs" and "dog" term hovers look good; however, when I click on "dogs" again, I see the following:

This looks like some data has been lost.

Add "words read" statistics

words read today
words read cumulative

Todo:

add texts.TxWordCount; it should be updated whenever the page text is updated
the user might read the same page several times, and rereads should be included in the word count ... perhaps log reading in a separate table? Nope, it's good enough to keep a single running count.

Support non-consecutive multi-word terms

Is your feature request related to a problem? Please describe.

Germanic languages like German or Dutch have words that span several words that can be separated, in particular verbs.

For example: "Ich lade dich zu meiner Party ein." means "I invite you to my party." The verb is "einladen", but in the sentence, "lade" and "ein" are separate (this is not only common, it is mandatory per the grammar). These kinds of verbs are very common.

Describe the solution you'd like

The current multi-word selection doesn't work for this, and shift+clicking on words is already used for bulk selection. Maybe alt+clicking or some other combination could work.

Describe alternatives you've considered

I don't think there are any, or at least I can't imagine one. The possible solution above might work for adding terms, but to be honest I have no idea how it would work for showing the information back. The main reason for this feature request is in case I'm missing something obvious that would work as a solution (short of trying to parse natural grammars).
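A minimal sketch of how matching might work, assuming a term such as "lade ... ein" is stored as its separate parts and that sentences are already tokenized: look for the parts in order within one sentence, allowing other words in between. `find_split_term` is a hypothetical helper, not existing Lute code.

```python
def find_split_term(tokens, parts):
    """Return the token indices where `parts` appear in order (gaps allowed),
    or None if the sentence doesn't contain all of them.

    Matching is case-insensitive and greedy: each part matches its first
    occurrence after the previous part.
    """
    wanted = [p.casefold() for p in parts]
    found = []
    for idx, tok in enumerate(tokens):
        if len(found) < len(wanted) and tok.casefold() == wanted[len(found)]:
            found.append(idx)
    return found if len(found) == len(wanted) else None


tokens = "Ich lade dich zu meiner Party ein .".split()
print(find_split_term(tokens, ["lade", "ein"]))  # [1, 6]
```

Greedy matching like this can pair the wrong words when a part appears twice, and parts such as "ein" are common on their own, so limiting the search to a single sentence matters. Showing the saved term back on both highlighted words is still the open question.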

Feature request: Add option to remove image from word

It happened to me that I chose an image for a word, and then realized that it would be very difficult to find an image that represented that particular word.

It would be great to be able to click on the selected image again to remove it from the word.

Add "book package" export and import

It would be nice to be able to export individual books into files that someone else can load into their Lute. The file would include all the words/definitions, the audio, the bookmarks, everything. I think a feature like that would really help form a Lute community. I know that I'd love to share my texts with other Czech learners.

This would be something like an "anki package" zip file. Work involved:

define export, import format
export file versioning, so that importers can deal with old formats
export book and audio only, no terms?
export book and terms?
handle term images
overwrite existing terms on import, or ignore?
book with same title allowed or not?
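To make the work items above concrete, here is a rough sketch of one possible package layout: a zip holding a `manifest.json` with a format version, the page texts and terms as JSON, and optional audio. The file names, the JSON layout and `FORMAT_VERSION` are assumptions, and term images, bookmarks and the overwrite/duplicate-title questions are not handled.

```python
import io
import json
import zipfile

FORMAT_VERSION = 1  # hypothetical; bump whenever the package layout changes


def export_book_package(title, pages, terms, audio=None):
    """Bundle a book into zip bytes: a manifest, the page texts, the terms,
    and optionally the audio file."""
    manifest = {
        "format_version": FORMAT_VERSION,
        "title": title,
        "has_audio": audio is not None,
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, ensure_ascii=False))
        zf.writestr("pages.json", json.dumps(pages, ensure_ascii=False))
        zf.writestr("terms.json", json.dumps(terms, ensure_ascii=False))
        if audio is not None:
            zf.writestr("audio.mp3", audio)
    return buf.getvalue()


def import_book_package(data):
    """Read a package made by export_book_package, checking the version first."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        version = manifest.get("format_version")
        if version != FORMAT_VERSION:
            # Older formats would be upgraded here rather than rejected.
            raise ValueError(f"Unsupported book package version: {version}")
        return {
            "title": manifest["title"],
            "pages": json.loads(zf.read("pages.json")),
            "terms": json.loads(zf.read("terms.json")),
            "audio": zf.read("audio.mp3") if manifest["has_audio"] else None,
        }
```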

Integrate Golden dict app with Lute

Is your feature request related to a problem? Please describe.
Golden dict is an app that can hold many dictionaries. It might be good to integrate it with Lute.

Describe the solution you'd like

I have an idea: Golden dict could be opened as a webpage, e.g. on localhost, so that I could use its link the same way as DeepL or Google Translate. I'm not sure it would work!

Add "term count" statistics

Terms created today/cumulative -- when reading, sometimes my reading is creating too many status = 1 terms in one day, too much to bite off. If I create too much new stuff, there's not enough time to digest everything.

"Autopopulate" button for the parent term

In most languages, the parent is visibly related to the child, differing by only a few letters. It'd be nice not to have to copy the whole word from below, but to just press a button and then type the end of the word.

I'm not sure how this would be done for most languages ... this feels extremely tough.

Add "page break" markers (e.g., "---") to text to force page breaks during book creation

Currently pages break by tokens. Sometimes it would be nice to break chapters or sections forcibly.

e.g., creating a book with text

Hello.
---
Goodbye.

creates a book with two pages: "Hello." and "Goodbye." The page break marker doesn't change the max words per page; it works alongside it.

Test cases:

text with and without breaks
no blank pages should be created, e.g., two separator lines right next to each other shouldn't create an empty page
no blank lines at top or bottom of page when split
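One way the splitting could behave, sketched under the assumption that the marker must sit on a line of its own and that the word limit is applied between lines (Lute's real splitter works on parsed tokens, so this is only an approximation). `split_into_pages` is a hypothetical name; the final filter covers the "no blank pages" and "no blank lines at top or bottom" test cases.

```python
import re


def split_into_pages(text, max_words=250, marker="---"):
    """Split text into pages, forcing a break at every line that contains only
    `marker`, and otherwise breaking between lines once max_words is reached."""
    # [ \t] rather than \s, so the pattern never spans multiple lines.
    separator = re.compile(rf"^[ \t]*{re.escape(marker)}[ \t]*$", re.MULTILINE)
    pages = []
    for section in separator.split(text):
        current, count = [], 0
        for line in section.strip().split("\n"):
            n = len(line.split())
            if current and count + n > max_words:
                pages.append("\n".join(current).strip())
                current, count = [], 0
            current.append(line)
            count += n
        pages.append("\n".join(current).strip())
    # Adjacent markers or leftover blank lines would otherwise give empty pages.
    return [p for p in pages if p]
```

For example, `split_into_pages("Hello.\n---\nGoodbye.")` gives `["Hello.", "Goodbye."]`, and two markers in a row still give only two pages.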

Add "mark page done, but don't set all to known" check on final page

Per https://discord.com/channels/1074759089051160647/1074759090338812015/1185063655977529404

On the last page, there's only a green checkmark, but not a ">" to mark page as done. Need something similar on last page.

Completed books checkmark on left, line up titles

Per https://discord.com/channels/1074759089051160647/1181103335265288202/1184966117941313536

a.completed_book:before {
  content: url('/static/icn/tick.png');
  margin-right: 5px;
}

but then also have the book titles line up. Easiest/best just to put a new unnamed column in the datatables output.

Reset db settings to docker-specific values when using Docker (and check start-up time)

It's not a big deal, but right now you need to edit the database file to migrate from a Python install's configuration to a Docker configuration.

This configuration should be converted automatically, or it should be possible to edit all settings (like "backup directory").

Add Kobo dictionary support (requires issue 5 to be done)

This is a good idea, simple offline-style dict.

Import new language option

Is your feature request related to a problem? Please describe.

Not all languages are supported and the regex/link stuff isn't possible for "non-techies."

Describe the solution you'd like

When adding a new language, there's the option to "load from predefined." It would be a great "stepping stone" if there was a simple text file that could be made and shared to "import" languages according to the settings that work well for another user.

This could also allow for more "default" languages to be supported if they're just a small text file that can be downloaded and added to future releases.

Additional context

This is how a file might look:
czechlanguagelute.txt

Allow navigate to arbitrary page in book

Either a slider, text box, or select box. When reading, I sometimes want to jump to the first page, or the last, or somewhere arbitrary. I shouldn't have to go through the book page by page.

Prerequisite for this: #86

Support for Korean parsing

Is your feature request related to a problem? Please describe.

I'd like for there to be a way to parse Korean texts as I'm learning Korean.

Describe the solution you'd like

Implement a Korean parser based on MeCab-Ko.

Describe alternatives you've considered

I tried to use MeCab to parse a Korean text, but it didn't work, even though MeCab and MeCab-Ko seem to have similarities based on my online research.

(I was using \p{Hangul} as the Regex for character matching, but I'm not sure if that's correct either so that could have been the issue.)
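Whether `\p{Hangul}` works depends on the regex engine: Python's standard `re` module does not support `\p{...}` property classes. A sketch of an equivalent character class built from explicit Unicode ranges (the Hangul Jamo, Hangul Compatibility Jamo, and precomposed Hangul Syllables blocks); `HANGUL_WORD` is a hypothetical name.

```python
import re

# Hangul Jamo, Hangul Compatibility Jamo, and precomposed Hangul Syllables.
HANGUL_WORD = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]+")

print(HANGUL_WORD.findall("나는 학생입니다."))  # ['나는', '학생입니다']
```

This only finds runs of Hangul; it does not separate particles such as 는 from the word they attach to, which is the kind of analysis MeCab-Ko would add.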

[Classical Chinese] preload default Dictionary 1 - clicking a word redirects to dictionary 1 entry page

Description
When using https://ctext.org/dictionary.pl?if=en&char=### as dictionary 1, clicking a word in the text redirects the whole page to the ctext.org entry page.

To Reproduce

Steps to reproduce the behavior, e.g.:

Create a new language using the Classical Chinese preload settings
Import a Chinese-language text and click on any highlighted word
The page is redirected to the ctext.org definition page

This issue does not come up with other dictionaries such as:
https://www.archchinese.com/chinese_english_dictionary.html?find=###

Extra software info, if not already included in the Description:

OS: Both Mac and Windows
setup: docker
Version: 2.1.3 (@latest version)

Feature request: Bulk set term status

Just like bulk set parent, but status :)

Export terms

Is it possible to add ability to export terms as a CSV or TXT file format? It would be awesome if we could export filtered terms, not all of them.

Feature request: Add created date to Books section

Would it be possible to add a column in the Books section with the book creation date and an option to sort based on it (or maybe a simple ordinal number)?

I can use tags to mark records as newest (or a special naming convention), but that is not as convenient.

Add Audio controls to play text mp3.

Store audio in same parent folder as where images are stored, perhaps ... or stream from URI?

Requirements:

page content to be ajaxed in, not URL-per-page.
store or stream audio file
audio player and controls

Import subtitle file, add auto-bookmarks

From Discord:

It'd be great if you could upload an .srt or other subtitle file with an audio file and have it convert it to a txt for reading, but also add bookmarks for the different pages (or even for all the sentences and next to them, there's a little button that, basically, says "skip to this sentence in the audio"). This would make Lute amazing to use with anything audio based alongside Whisper getting better and better.

Challenges I can see with this request:

bookmarks aren't associated with pages or sentences, so there will be many many bookmarks in the audio timeline and no clear way to jump to the text
during parsing, the timestamp data isn't stored; it's just another token. This could potentially be worked around by having the base parser do a preliminary pass to get the timestamps, and then calling the actual parsers for each section between stamps. This is a big change from the current method, but is perhaps doable. I'm not sure of the payoff, and I'd need to work on a new language to fully understand the ins and outs.
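The first half of the request (subtitles to text plus bookmarks) could start from something like the sketch below, which reads an .srt file into (start time, text) cues. `parse_srt` is hypothetical, and it assumes the common SRT layout of a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then one or more text lines, with blank lines between cues. Mapping the cues to Lute pages or sentences, the hard part described above, is not addressed.

```python
import re

# "00:00:01,500 --> 00:00:03,000" (SRT uses a comma before the milliseconds).
CUE_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def parse_srt(srt_text):
    """Return (start_seconds, text) for each subtitle cue, in file order."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        for i, line in enumerate(lines):
            match = CUE_TIME.match(line)
            if match:
                h, m, s, ms = (int(g) for g in match.groups())
                text = " ".join(l for l in lines[i + 1:] if l)
                cues.append((h * 3600 + m * 60 + s + ms / 1000, text))
                break
    return cues


srt = """1
00:00:01,500 --> 00:00:03,000
Hello there.

2
00:00:04,000 --> 00:00:06,250
General
Kenobi.
"""
cues = parse_srt(srt)
book_text = "\n".join(text for _, text in cues)  # text for the new book
bookmarks = [start for start, _ in cues]         # one audio bookmark per cue
```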

Remove excess languages table fields

Unused fields:

LgRemoveSpaces
LgSplitEachChar

Possibly ignore word accents when saving terms in DB

Notes from a slack chat:

I will attempt to articulate what I think the deal breaker could potentially be without getting into the weeds of how Ancient Greek actually works. You may have noticed that the words contain accented characters. There are various features of the language that cause those accents to change without changing anything about the meaning of the word. For example, I would have to define γὰρ twice because it can appear as either γὰρ OR γάρ. That's one of the most common words in the language meaning something like "for" or "since" or "because". Now, it only comes in those two flavors but I think you could see how quickly it would become tedious to define words over and over just because of diacritics.

With respect to this question of accents, I have noticed that Chrome is accent-agnostic when it does its "find in page" search. If I search γὰρ, it will highlight γάρ and even γαρ. Perhaps Lute could have a language-specific option to ignore accents as well?

Let's continue to use the example of γὰρ, when I am reading the text, I would still see the orthography displayed as the author intended but, behind the scenes in the database, as far as Lute is concerned, γὰρ, γάρ, and γαρ share the same entry.

I know the original LWT let you do character substitutions but it actually just hotswapped one character for another and that fact was reflected in the actual text that you are reading. Basically it would see the character set as consisting of only 24 characters (not accounting for uppercase). The unaccented Greek alphabet.

My thoughts:

Rendered TextTokens (i.e., words shown in the reading pane) would include the accents, but Terms (stored in the db) would be without accents, and the rendered TextTokens would be associated to Terms w/o accents.

No idea at the moment if this would be tough or not!
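A sketch of the "ignore accents" lookup, assuming it is an opt-in language setting: decompose each word (Unicode NFD), drop the combining marks, and use the result as the database key, while the rendered TextTokens keep their original spelling. `strip_accents` and `term_key` are hypothetical names.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents, breathings, etc.) and recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


def term_key(text):
    """Key used to look up a Term in the db: lowercase and accent-free."""
    return strip_accents(text).lower()


# The reading pane keeps showing the original text; only the lookup is folded.
print({term_key(w) for w in ["γὰρ", "γάρ", "γαρ"]})  # {'γαρ'}
```

Stripping every combining mark is aggressive: in languages where those marks distinguish different words, it would merge terms that should stay separate, which is why this would be a per-language choice rather than a global one.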

Follow standards for javascript "data-" attribute names

Currently, lute/templates/read/textitem.html has the following:

      tid="{{ item.text_id }}"
      lid="{{ item.lang_id }}"
      paraid="{{ item.para_id }}"
      seid="{{ item.se_id }}"
      data_text="{{ item.text }}"
      data_status_class="{{ item.status_class }}"
      data_order="{{ item.order }}"
{% if item.wo_id is not none %}
      data_wid="{{ item.wo_id }}"

This doesn't follow JavaScript standards, e.g., as outlined at https://dev.to/dev-harbiola/custom-data-attributes-in-html-a-guide-to-data--373.

These could be changed as follows:

tid => data-tid (or data-text-id)
lid => data-lid (or data-lang-id)
paraid => data-para-id
seid => data-se-id (or data-sentence-id)
data_status_class => data-status-class
data_order => data-order
data_wid => data-wid (or data-word-id)

I believe that these are only referenced in lute/static/js/lute.js:

(.venv) MacBook-Pro:lute-v3 jeff$ for t in tid lid paraid seid data_text data_status_class data_order data_wid; do
>   echo ------------------------------------
>   echo $t
>   inv search $t | grep lute.js    # limit search to only lute.js
> done
------------------------------------
tid
lute/static/js/lute.js:function prepareTextInteractions(textid) {
------------------------------------
lid
lute/static/js/lute.js:  elid = parseInt(el.attr('data_wid'));
lute/static/js/lute.js:    url: `/read/termpopup/${elid}`,
lute/static/js/lute.js:  const lid = parseInt(el.attr('lid'));
lute/static/js/lute.js:  const url = `/read/termform/${lid}/${sendtext}?${extras}`;
lute/static/js/lute.js:  const langid = firstel.attr('lid');
------------------------------------
paraid
lute/static/js/lute.js:    attr_name = 'paraid';
lute/static/js/lute.js:    attr_value = w.attr('paraid');
------------------------------------
seid
lute/static/js/lute.js:  let attr_name = 'seid';
lute/static/js/lute.js:  let attr_value = w.attr('seid');
------------------------------------
data_text
lute/static/js/lute.js:  let text = extra_args.textparts ?? [ el.attr('data_text') ];
------------------------------------
data_status_class
lute/static/js/lute.js: * Terms have data_status_class attribute.  If highlights should be shown,
lute/static/js/lute.js:/** Add the data_status_class to the term's classes. */
lute/static/js/lute.js:  el.addClass(el.attr("data_status_class"));
lute/static/js/lute.js:    el.removeClass(el.attr("data_status_class"));
lute/static/js/lute.js:    const st = nextword.attr('data_status_class');
lute/static/js/lute.js:  let update_data_status_class = function (e) {
lute/static/js/lute.js:        .attr('data_status_class',`${newClass}`);
lute/static/js/lute.js:  $('span.kwordmarked').each(update_data_status_class);
lute/static/js/lute.js:  $('span.wordhover').each(update_data_status_class);
------------------------------------
data_order
lute/static/js/lute.js:let save_curr_data_order = function(el) {
lute/static/js/lute.js:  LUTE_CURR_TERM_DATA_ORDER = parseInt(el.attr('data_order'));
lute/static/js/lute.js:  save_curr_data_order($(this));
lute/static/js/lute.js:  save_curr_data_order($(this));
lute/static/js/lute.js:  const first = parseInt(start_el.attr('data_order'))
lute/static/js/lute.js:  const last = parseInt(end_el.attr('data_order'));
lute/static/js/lute.js:    const ord = $(this).attr("data_order");
lute/static/js/lute.js:  save_curr_data_order(el);
lute/static/js/lute.js:    return $(a).attr('data_order') - $(b).attr('data_order');
lute/static/js/lute.js:  const i = words.toArray().findIndex(x => parseInt(x.getAttribute('data_order')) === LUTE_CURR_TERM_DATA_ORDER);
lute/static/js/lute.js:  save_curr_data_order(curr);
------------------------------------
data_wid
lute/static/js/lute.js:  elid = parseInt(el.attr('data_wid'));
(.venv) MacBook-Pro:lute-v3 jeff$

I don't know if this work is worth it ... following standards is good, but not critical. Is this make-work only?
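If the change is made, most of it is mechanical. A rough sketch of a one-off rename helper, assuming the first option from each line of the list above (`data_text` isn't in that list, so `data-text` is an assumed choice). It only rewrites `name="..."` attributes in the template and `.attr('...')`/`.getAttribute('...')` lookups in lute.js; anything else (selectors, unrelated grep hits such as `prepareTextInteractions(textid)`) would still need a manual check. All names in it are hypothetical.

```python
import re

# Old name -> data-* name. "data-text" is an assumption (not in the list above).
RENAMES = {
    "tid": "data-tid",
    "lid": "data-lid",
    "paraid": "data-para-id",
    "seid": "data-se-id",
    "data_text": "data-text",
    "data_status_class": "data-status-class",
    "data_order": "data-order",
    "data_wid": "data-wid",
}

# Longest names first so the longer alternatives are tried before shorter ones.
_NAMES = "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))

# Template attributes: a whole attribute name immediately followed by "=".
# The lookbehind stops matches inside other names, e.g. "lid" in "valid=".
_HTML_ATTR = re.compile(rf"(?<![\w-])({_NAMES})(?==)")

# lute.js lookups such as .attr('lid') or .getAttribute('data_order').
_JS_ATTR = re.compile(rf"""((?:\.attr|\.getAttribute)\(\s*)(['"])({_NAMES})\2""")


def rename_template_attrs(source):
    return _HTML_ATTR.sub(lambda m: RENAMES[m.group(1)], source)


def rename_js_lookups(source):
    return _JS_ATTR.sub(
        lambda m: m.group(1) + m.group(2) + RENAMES[m.group(3)] + m.group(2), source
    )
```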

Add opinionated Anki export

(I've revised this issue based on new thoughts)

Summary

I wrote Lute based off of LWT, but dropped the SRS feature of LWT: the code was brutal, and for the initial MVP (minimal viable product) release of Lute I didn't feel that it was a necessary feature. I still don't :-) for a few reasons:

A brute-force approach of "just test everything" isn't the best. In some cases like verb inflections, I don't need to test every permutation -- perhaps I should only see the parent, and a few child examples. Also, there are many words that I've only seen once so far in my reading, and may never see again. I think I should be able to select the terms I want/need to test.
I question whether Anki testing falls within the primary use-case of Lute, which is just to get you reading, and to hopefully encourage you to keep reading. I read a lot with Lute, and would vastly prefer to focus on reading, rather than testing.
Testing by seeing sentences I've read, and then regurgitating those (or similar), isn't very fun for me!

Even Steve Kaufmann of LingQ doesn't really recommend using their testing feature, probably for the same reasons as I have above. :-) (He does recommend their "sentence mode" for building sentences, I believe.)

Exporting terms to a CSV, and images to somewhere else, may be trickier than needed, so I'll go with AnkiConnect for the first iteration of this.

This will be an opinionated export: it will assume certain note types, deck name, field names, etc.

Design/UX notes

The following are some rough ideas only. I'd need to try implementations to really get a handle on the UX.

Config

Some people don't use Anki, so they shouldn't see it if they don't want it. The default should be to see it.
add a setting for the AnkiConnect address and port; the default should be AnkiConnect's own default (localhost:8765)

Exports from term listing

The term listing has a checkbox. Users could select the terms they want to export, and then click an "export to anki" button.

Ankiconnect supports exporting images, see FooSoft/anki-connect#158.

Lute doesn't store sample sentences for terms, but it does have a reference lookup that could get the latest sentence for any term. The sentences table is loaded when a page is opened for reading, so even if the page isn't marked as read, something should be in the table. The export needs to be updated to include sentences from unread pages.

Fields to export:

term id (can change if people delete terms)
language id
term
audio (left blank, as lute doesn't have audio, but people might add it later)
language
image
translation
parent term
tags
sample sentence as "manual cloze" (with term replaced by ____)
sample sentence with cloze removed

failed exports

model doesn't exist
deck name doesn't exist
ankiconnect isn't on
misc errors?
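A rough sketch of the AnkiConnect side, using its `addNote` action (request version 6) and its default address of `http://127.0.0.1:8765`. The deck name, note type name and field names are placeholders for the opinionated note type, the term dict layout is assumed, Lute tags go into Anki's note tags, and the image field is left out. `make_cloze` blanks whole-word matches of the term, case-insensitively, to produce the "manual cloze" sentence.

```python
import json
import re
import urllib.request

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def make_cloze(sentence, term, blank="____"):
    """Replace whole-word occurrences of term (case-insensitive) with a blank."""
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    return pattern.sub(blank, sentence)


def add_note_request(term, deck="Lute", model="Lute term"):
    """Build an addNote request. Deck, model and field names are placeholders."""
    fields = {
        "TermId": str(term["id"]),
        "LanguageId": str(term["language_id"]),
        "Term": term["text"],
        "Audio": "",  # left blank for now
        "Language": term["language"],
        "Translation": term.get("translation", ""),
        "Parent": term.get("parent", ""),
        "Cloze": make_cloze(term["sentence"], term["text"]),
        "Sentence": term["sentence"],
    }
    note = {"deckName": deck, "modelName": model, "fields": fields,
            "tags": term.get("tags", [])}
    return {"action": "addNote", "version": 6, "params": {"note": note}}


def send_to_anki(request, url=ANKICONNECT_URL):
    """POST the request and return AnkiConnect's result (the new note id)."""
    body = json.dumps(request).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.loads(resp.read())
    if reply.get("error"):
        # Problems such as a missing deck or model are reported here.
        raise RuntimeError(reply["error"])
    return reply["result"]
```

The failure cases listed above would surface roughly like this: a missing model or deck should come back in the reply's `error` field, and AnkiConnect not running shows up as a `urllib.error.URLError` from `urlopen`.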

Anki note/card templates

I'll put some kind of pre-designed note type in a public place so people can access it ... that's probably the easiest thing to do. Creating a single-card shared deck on AnkiWeb would be easiest. AnkiConnect apparently does let you create models using the API, I'm not sure how tough that would be. https://foosoft.net/projects/anki-connect/index.html#model-actions

It would be nice for Anki cards to be able to link back to Lute, if Lute's running, so that people can see the term and its sample sentences again.

Improve backup visibility

I want to know about my backups:

show listing of files
show size of backups
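A small sketch of the listing, assuming the backups are plain files in a single backup folder: `list_backups` returns each file's name and size, newest first, and `human_size` formats the sizes for display. Both names are hypothetical.

```python
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count as B/KB/MB/GB with one decimal place."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024


def list_backups(backup_dir):
    """Return (file name, size in bytes) for each backup file, newest first."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size) for p in files]
```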

Add book "chapter" markings and table of contents

It would be very nice to have a chapter marking or similar, and then have a table of contents or similar, showing the current chapter number at the top of the page perhaps.

When reading long texts, I sometimes want to know how many pages until I reach the end of the chapter.

Add audio files for word pronunciation (TTS)

Audio files could be stored in user's data folder, and files could be found by md5 of term, e.g.

Big effort required:

web service calls to some uri to get a file
support different endpoints? Polly, Azure, Forvo, etc.
API tokens in settings for the different services
selecting the service and voice you want to call
how to store the file, naming etc
allow only one sample, or multiple, for any given word?
include audio in any anki export
play on mouse over setting? or just mouse over the speaker?

Lots of things to do here.
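A sketch of the "md5 of term" naming idea, assuming an `audio` subfolder in the user's data folder, `.mp3` files, and one file per term. Including the language id in the hash, and lowercasing the term, are additions of this sketch rather than decisions from the issue; `term_audio_path` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def term_audio_path(data_dir, language_id, term_text, extension=".mp3"):
    """Where a term's pronunciation file would live: <data_dir>/audio/<md5><ext>."""
    # The language id keeps identical spellings in different languages apart;
    # lowercasing makes "Hund" and "hund" share one file.
    key = f"{language_id}:{term_text.lower()}".encode("utf-8")
    return Path(data_dir) / "audio" / (hashlib.md5(key).hexdigest() + extension)
```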

Add import .epub files

Lute can import .txt files, would be nice to also support importing .epub.

There are python libraries for importing .epubs, eg: https://andrew-muller.medium.com/getting-text-from-epub-files-in-python-fbfe5df5c2da

Don't know if that's the best one.

Code outline

The code in develop has things in place for the epub import to be implemented:

/lute/book/routes.py method _get_file_content(filefielddata) has a check for the filename extension, and calls the service.py for epub parsing
/lute/book/service.py has a stub method get_epub_content(epub_file_field_data) to be implemented.
/tests/acceptance/book.feature has a commented-out epub import test. The implementation should add a short sample epub file to tests/acceptance/sample_files/.

Todo items

The code has a few comments with "todo epub:", where things should be updated:

$ inv todos | grep -i epub
Group: epub
  ./pyproject.toml                                  :  # TODO epub: add epub parsing library to dependencies
  ./tests/acceptance/book.feature                   :  # TODO epub: add an epub file to sample_files, activate this test.
  ./lute/book/service.py                            :  raise ValueError("TODO epub: to be implemented.")
  ./lute/book/forms.py                              :  # TODO epub: add epub to the list, change prompt.
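For reference, EPUB text can also be extracted with only the standard library, which avoids choosing a third-party package. A rough sketch of what could back the `get_epub_content` stub (the name `epub_to_text` is hypothetical, and it assumes a seekable file-like object and UTF-8 XHTML content): read `META-INF/container.xml` to find the package file, use the package's manifest and spine to get the chapters in reading order, and strip the markup.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class _TextExtractor(HTMLParser):
    """Collect visible text from XHTML, turning block elements into line breaks."""

    SKIP = {"head", "script", "style"}
    BLOCKS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def epub_to_text(fileobj):
    """Return the book's text, chapters in spine order, separated by blank lines."""
    with zipfile.ZipFile(fileobj) as zf:
        # META-INF/container.xml points at the package (.opf) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)
        # The manifest maps ids to files; the spine lists ids in reading order.
        hrefs = {item.get("id"): item.get("href") for item in opf.find("opf:manifest", NS)}
        chapters = []
        for itemref in opf.find("opf:spine", NS):
            href = unquote(hrefs[itemref.get("idref")])
            parser = _TextExtractor()
            parser.feed(zf.read(posixpath.normpath(posixpath.join(base, href))).decode("utf-8"))
            parser.close()
            lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
            chapter = "\n".join(line for line in lines if line)
            if chapter:
                chapters.append(chapter)
    return "\n\n".join(chapters)
```

It ignores images, footnotes and other layout details, which seems acceptable for a reading-focused import.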

Keep home screen book filter state on page refresh

The home screen filters are cleared on every page refresh. It should save state, like the Term listing does.

Add "term export"

Make it easy to export terms. This would let users share data, people could group up to make data mappings, etc.

Initial solution: just export everything into a CSV. :-) Good enough for now.

Possible long-term solution: somehow combine this with the "filters" in the Term Listing page, so that once a filter is applied, only those terms would be exported.
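A sketch of the initial "export everything into a CSV" solution, with an optional filter hook for the longer-term idea. The column names and the term dict layout are assumptions, `export_terms_csv` is a hypothetical name, and the `keep` function stands in for whatever filter the Term Listing has applied.

```python
import csv
import io

FIELDS = ["language", "term", "translation", "parent", "status", "tags"]


def export_terms_csv(terms, keep=lambda term: True):
    """Write the terms that pass `keep` to CSV text (all terms by default)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for term in terms:
        if keep(term):
            # Tags are flattened into one comma-separated cell.
            writer.writerow(dict(term, tags=", ".join(term.get("tags", []))))
    return buf.getvalue()
```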

"Easy exporting and syncing of "parent" database (users learning the same lang could crowdsource)" - "crowdsourcing" to me implies some kind of central place to store definitions, choose the best, filter out trash etc -- that's a different beast.

Hotkey status update should update the form if it's displayed

You can't use the hotkey AND set the parent, since the hotkey doesn't update the form on the right-hand side, so saving the form would "reset" the word status.

I hit the hotkey to set status to 3, but the form still has it as 1, so saving it will override.

Text imports mix up lines

Description

I've noticed that some lines (not many, but some) are out of order compared with the original text file that was imported. I noticed this while listening along with an audiobook: certain passages would be skipped and then come back a few moments later. It's not an issue with the text file (see attachments).

To Reproduce

Steps to reproduce the behavior, e.g.:

Import using a text file
Notice that some lines are out of order

Screenshots

This is the text I confirmed with. Harry Potter a Fenixuv rad - J. K. Rowling.txt. An example of a line that is out of place is S mou drahou starou matičkou, ano which is on line 1889.

Here it is in Lute out of order:

Extra software info, if not already included in the Description:

OS (e.g., iOS, windows): Pop!_OS
Browser (e.g., chrome, safari): Chromium
How you've installed Lute (Docker, python, source): Python (v3)
Version: 3.0.0b11

Investigate internationalization

The Grinberg article appears to be the best resource.

References:

Mass term adding

Is your feature request related to a problem? Please describe.

I sometimes have some dead time in my day when I'd like to just "add more words to my dictionary." I don't really want to read, I just want to mindlessly add more terms and definitions so that when I come across them in reading, it's more seamless.

Describe the solution you'd like

To be able to enter a special view of a book where it only has each new/unknown word ONCE, in order of frequency. Then I can just go through them.

It'd also be cool to have the same, but in alphabetical order so that you might find word families and be able to "kill a bunch of birds with one copy and paste." (Add the head word, then copy it and paste it as a parent in the next 5-6 in the list)
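The frequency-ordered view could be computed along these lines. A minimal sketch, assuming a space-delimited language (words are runs of `\w` characters) and a set of already-known lowercase terms; `unknown_words` is hypothetical. Passing `alphabetical=True` gives the word-family ordering from the second paragraph.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")  # assumption: words are runs of word characters


def unknown_words(text, known, alphabetical=False):
    """Each word not in `known` exactly once, most frequent first
    (ties broken alphabetically), or purely alphabetical on request."""
    counts = Counter(word.lower() for word in WORD.findall(text))
    new = [word for word in counts if word not in known]
    if alphabetical:
        return sorted(new)
    return sorted(new, key=lambda word: (-counts[word], word))
```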

Add support for custom fonts

Discord discussion notes with user "Jiggle":

before I was editing the original css files (styles, styles-compact) and was storing the font files in the same folder as these css files
but unfortunately the custom_styles is not a static file as I understand

css file edits:

@font-face {
  font-family: "MYFONT";
  src: url("Rubik-Regular.woff") format("woff");
  font-style: normal;
  font-weight: normal;
}

if the font is in the same folder as the css file, it works

Notes:

The "custom styles" are actually a flask route. I'm not sure where the files would need to be stored for them to be available.
For docker containers, this would have to be a mounted directory; could potentially just store them in the "data" folder for Docker. Maybe should just store them there for pip users too ... would require a custom config.yml file, no good way to work around that, I think.

Investigate use of spaCy or NLTK for parsing

This is a complicated issue

spaCy and the Stanford Stanza project are very good parsing libraries; it would be nice to use something like that instead of (hacky?) regex solutions. Unfortunately spaCy is very slow, so things would need to change quite a lot to make it usable within Lute.

Currently Lute parses very frequently:

when a new book is created
when a page is open for reading
when a new term is created

To get around the frequent parsing (for reading), Lute could:

parse the book once, and store the tokens and token boundaries (zero-width strings) in the texts.TxText field. Then, when reading, everything is already parsed
if creating terms from the reading pane, the zero-width strings (spans with spaces) could be sent to the route that creates the terms. No extra parsing would be needed
creating terms from the term index page would still require parsing ... that would be slow. The parsing library could be loaded one-time only and then kept available for any runtime session
I'm not sure how the spaCy dictionaries would be loaded, especially for Docker. They'd have to be ... mounted somehow, in the user data folder.
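The "parse once, store the tokens" idea might look like this sketch: tokens are joined with a zero-width space (U+200B) before being stored, split back apart when a page is opened, and the parser itself is created once per language and cached for the session with `functools.lru_cache`. The function names are hypothetical, the stand-in parser is a simple regex rather than spaCy, and the scheme assumes the source text never contains U+200B itself.

```python
import re
from functools import lru_cache

ZWS = "\u200b"  # zero-width space: marks token boundaries in the stored text


def store_tokens(tokens):
    """Join parsed tokens into one string for texts.TxText."""
    return ZWS.join(tokens)


def load_tokens(stored):
    """Recover the tokens when a page is opened, without re-parsing."""
    return stored.split(ZWS) if stored else []


@lru_cache(maxsize=None)
def get_parser(language_code):
    """Build the parser once per language and reuse it for the whole session.

    A real version would do the expensive model loading here; this stand-in
    splits words, whitespace and punctuation so the example runs.
    """
    token_re = re.compile(r"\w+|\s+|[^\w\s]")
    return token_re.findall
```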

Add user pref toggle to not pause audio on click

Currently, when the term form is displayed while reading, it sends an event to pause the audio. Some users want to be able to disable that, i.e. audio continues even when the form is opened.

[Feature] Ability to delete a term from reading screen.

It is occasionally needed to be able to remove a term while staying on the reading page.

Ajax in page content, instead of loading read/<bookid>/page/<pagenum>

Prerequisite (?) for audio support that doesn't break when moving to new pages; this would also allow responsive paging.

Up/down status arrow shouldn't scroll the page

Pressing the up arrow scrolls up on the page (and now with the audioplayer, there's less vertical space, so it's more common)

Add "sentence notes" for term references

I think there are many use cases for this:

There are often cases where a term's usage, or special meaning, is only rarely given, and it's good to keep track of those specially
Some sentences may show special grammar, interesting constructions
Useful place to record questions etc -- like a running notebook of stuff
EDIT: could also use these for sentence translations as well

These notes could be tagged by category, or by term, say, and when looking at a term, the associated notes would be returned too.

Sentences are "value objects": the id could be tracked by an md5 (or similar) of the sentence text, including the language id as part of the md5. The md5 should be case-insensitive too.
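A sketch of that id scheme, with one extra assumption: runs of whitespace are collapsed before hashing, so that reflowed text keeps the same id. `sentence_id` is a hypothetical name.

```python
import hashlib


def sentence_id(language_id, sentence):
    """Stable id for a sentence: md5 of the language id plus the case-folded
    sentence text, with runs of whitespace collapsed."""
    normalized = " ".join(sentence.split()).casefold()
    return hashlib.md5(f"{language_id}|{normalized}".encode("utf-8")).hexdigest()
```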

Add control (slider or dropdown?) to increase/decrease text size on the reading screen.

Currently users set font size through custom settings, but a nicer method would be a slider or dropdown.

The font size should be applied to all span.textitem elements on the web page.
On moving to the next/previous page, the font size should stay at the user's setting, so store it in localStorage or similar.
I'm not sure if the font-size should be set in px, em, or rem. From my reading, rem is the way to go. I guess that the rem could go between, what, 25% up to 500%? No idea what range makes sense.

For the first pass implementation, don't bother storing this in the db settings table (i.e. where the custom css is stored). That would require a web service call to set the value, another to reload at launch, etc, a bunch of code for little value. If it's easy enough to adjust, it should suffice.

Example: https://codepen.io/p-mohamed-elsawy/pen/bJGgaZ

Improve image search and save

see if we can add a text box to refine image search results; sometimes the default search images aren't that great.
is it possible to search by the parent term instead of the child? e.g., if "dogs" has parent "dog", then the image search should be done with "dog".

ChatGPT Integration

Is your feature request related to a problem? Please describe.

Many small issues with texts or dictionaries are easily solved by ChatGPT and a few prompts, but copying and pasting between Lute and ChatGPT can be cumbersome and slow. It would be great to have them integrated a little.

Describe the solution you'd like

I would like to see a few ChatGPT features.

Add ChatGPT similar to a dictionary source. When you click on a word/sentence, you have a button to send that variable (plus a predefined prompt) to ChatGPT and then receive a response. It would be great for looking up words that don't appear in your dictionary or for getting an explanation for something. ChatGPT can also provide cultural context or come up with mnemonics. There are tons of possibilities if it's configurable in the language settings.
Allow ChatGPT to reformat pages. Since Lute only works on text, some of the pages can be imported weird, and it'd be really handy to have a button that just sends that text to ChatGPT, tells it to reformat it to better fit Lute, then to replace that page with the new response. There'd need to be some prompt engineering, but it would be very useful. I've personally been coming across lots of typos in the webnovels that I'm reading which ChatGPT would solve in an instant.

Additional context

Here're some example prompts that I've been using:

For defining stubborn words:

Help me translate this word as it doesn't appear in dictionaries.

The word is: olizovala

Format the response like this, but replace the capitalized words with the correct information:

WORD
UNCONJUGATED, UNDECLINED DICTIONARY FORM OF THE WORD
PART OF SPEECH

TRANSLATION IN ENGLISH

OTHER MEANING (ONLY IF APPLICABLE)

SHORT EXAMPLE SENTENCE USING THE WORD

VERY SHORT EXPLANATION OF THE SIGNIFICANCE OF THE WORD USING SIMPLE ENGLISH
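One way to make prompts like this configurable per language, similar to a dictionary URL, is to store each one as a template with a placeholder for the clicked word. Below is a minimal sketch, assuming a hypothetical `$term` placeholder and Python's standard `string.Template`. The call to the ChatGPT API itself is deliberately left out.

```python
from string import Template


def build_prompt(template: str, term: str) -> str:
    """Substitute the clicked term into a configurable prompt template.

    string.Template uses $-placeholders, so capitalized field names, "#",
    or braces in the prompt are left alone; a literal "$" must be written "$$".
    """
    return Template(template).substitute(term=term)


# Hypothetical per-language setting, abbreviated from the prompt above.
DEFINE_WORD = (
    "Help me translate this word as it doesn't appear in dictionaries.\n\n"
    "The word is: $term\n\n"
    "Format the response like this, but replace the capitalized words "
    "with the correct information:\n\n"
    "WORD\n"
    "UNCONJUGATED, UNDECLINED DICTIONARY FORM OF THE WORD\n"
    "PART OF SPEECH"
)

prompt = build_prompt(DEFINE_WORD, "olizovala")
print(prompt)
```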

For reformatting:

The following passage has a few typos and formatting issues. Please rewrite the passage exactly the same, but fix any typos and reformat it to be more readable. Keep all the "artistic" choices made by the author.

Here's the passage:

Hotkey arrow up and down to increase/decrease status number

The left/right arrow hotkeys move between terms, so up/down could change the status.

The change would need to be relative to the current status. If multiple terms are selected, it could just start from the lowest status. The order would be 1-2-3-4-5-WellKnown, skipping "Ignored".

The function to update is in lute.js, handle_keydown -- at least, that is how I was intending to do it. If there is a better option, LMK.
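The status-stepping rule can be sketched as below. This is Python only to show the logic, since the real handler would live in `handle_keydown` in lute.js. The numeric codes are an assumption (1-5 for learning levels, 99 for Well Known, 98 for Ignored, as in LWT-derived schemas), and the `shift_status` name is hypothetical. Leaving an all-Ignored selection unchanged is also a design choice, not something the issue specifies.

```python
# Assumed status codes: 1-5 = learning levels, 99 = Well Known, 98 = Ignored.
# Ignored is left out of the cycle so the arrow keys never land on it.
STATUS_CYCLE = [1, 2, 3, 4, 5, 99]


def shift_status(selected_statuses, step):
    """Return the new status after an arrow press, or None for "no change".

    step is +1 for arrow up and -1 for arrow down. With several terms
    selected, start from the lowest status in the cycle. The result is
    clamped at both ends (1 and Well Known).
    """
    in_cycle = [s for s in selected_statuses if s in STATUS_CYCLE]
    if not in_cycle:
        # e.g. only Ignored terms are selected: leave them alone.
        return None
    idx = STATUS_CYCLE.index(min(in_cycle)) + step
    idx = max(0, min(len(STATUS_CYCLE) - 1, idx))
    return STATUS_CYCLE[idx]


print(shift_status([3, 5], +1))  # 4
```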

Increase acceptance test coverage (master list)

Commit ce50ee5112f27 added a basic acceptance (browser-level) test of Lute using Panther: reading a text, and creating Terms and multi-word Terms.

Per https://github.com/jzohrab/lute/blob/develop/tests/acceptance/README.md#tests-to-write, there are a bunch of tests still to write; if extensive work is done on any section of Lute, some of these acceptance tests might be useful.

- Languages
  - List languages
  - ~~Create new lang~~
- Create text
  - ~~from textbox~~
  - ~~from file~~
  - ~~import web page~~
- Texts
  - ~~archive text~~
  - ~~view archive~~
  - ~~unarchive text~~
  - delete text
- Terms
  - ~~list terms~~
  - search for terms
  - ~~create new term from main form~~
  - bulk map parents from listing
- Term Tags
  - ~~list all~~
  - ~~create~~
  - ~~delete~~
- Reading
  - Update refreshes multiple terms
  - ~~Update on one page updates other books~~
  - hotkeys (done?)
- Parent term mapping
  - export book file
  - export language file
  - import mapping file
- Backups
  - backup setting defaults
  - set backup settings
  - create a backup (done, just need to verify file)
- ~~Version and software info~~

Change how dictionaries are defined and used

Copying over notes from jzohrab/lute#21.

Currently, Lute stores "dictionary 1" and "dictionary 2" URLs in the Language table, with placeholders for term substitution. This creates a few limitations:

- it uses a weird "*" character to designate a pop-up dictionary
- it's restricted to HTML dictionaries, with no easy way to handle JSON, plug-ins, or other types of dictionaries
- it's limited to 2 dictionaries per language

It may be worth making dictionaries first-class entities, e.g. with a brand-new user form like this:

| Field | Notes |
| --- | --- |
| dictionary URL | textbox, the URL with "###" placeholders. Better yet, change the placeholder to "[LUTETERM]" or similar, since "#" is a valid URL character (e.g., looking up "https://en.m.wiktionary.org/wiki/essere#Italian" would use a URL like "https://en.m.wiktionary.org/wiki/[LUTETERM]#Italian") |
| opens in pop-up? | checkbox |
| encoding | dropdown or textbox |
| returns | dropdown (HTML by default, or JSON; the "json" option is there because some languages seem to have dictionaries available only via a JSON API) |
| active | checkbox. Some dictionaries are more useful than others at times; e.g., when offline, any online dictionaries are useless, so I could deactivate them and use only an offline Kobo dictionary or similar. |

These would be stored in a new dictionaries table linked to the Languages table. The first-draft UI could be a dedicated screen for defining dictionaries, which would be easiest (it's possible to create child subforms, but I haven't done that yet in Symfony :-) ).

One dictionary would have to be marked as primary. A language could define one or more dictionaries.
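A minimal sketch of the proposed dictionary entity and URL substitution, using the "[LUTETERM]" placeholder suggested above. The `Dictionary` class, its field names, and `dictionaries_for` are hypothetical illustrations of the form fields, not Lute's schema. The term is percent-encoded in the dictionary's configured encoding with `urllib.parse.quote`, and `safe=""` ensures a "/" or "#" inside the term can't break the URL.

```python
from dataclasses import dataclass
from urllib.parse import quote

PLACEHOLDER = "[LUTETERM]"


@dataclass
class Dictionary:
    """One row of the proposed dictionaries table (hypothetical fields)."""
    language_id: int
    url: str                     # contains the [LUTETERM] placeholder
    opens_in_popup: bool = False
    encoding: str = "utf-8"
    returns: str = "html"        # "html" or "json"
    active: bool = True
    is_primary: bool = False

    def lookup_url(self, term: str) -> str:
        # Percent-encode the term in this dictionary's encoding, including
        # "/" and "#", then drop it into the placeholder.
        encoded = quote(term, safe="", encoding=self.encoding)
        return self.url.replace(PLACEHOLDER, encoded)


def dictionaries_for(language_id, all_dicts):
    """Active dictionaries for one language, with the primary one first."""
    dicts = [d for d in all_dicts if d.language_id == language_id and d.active]
    # sorted() is stable; False (primary) sorts before True (non-primary).
    return sorted(dicts, key=lambda d: not d.is_primary)


wikt = Dictionary(1, "https://en.m.wiktionary.org/wiki/[LUTETERM]#Italian",
                  is_primary=True)
print(wikt.lookup_url("essere"))
```

Keeping "active" and "primary" as plain flags means the offline case described above is just a filter; no dictionary rows need to be deleted.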

Add book text search

If I know a book contains a term, I'd like to search for it and get a list of the pages where it appears.
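A minimal sketch of the search itself, assuming the book's pages are available as a list of plain-text strings in reading order. `find_pages` is a hypothetical name, and in Lute the texts would come from the book's stored pages. It uses case-insensitive substring matching, which also works for languages written without spaces, at the cost of matching inside longer words ("dog" also finds "dogs").

```python
def find_pages(pages, term):
    """Return the 1-based numbers of the pages whose text contains term.

    pages: list of page texts in book order (hypothetical input).
    Matching is case-insensitive (str.casefold) plain substring matching.
    A blank search term returns no pages rather than every page.
    """
    needle = term.strip().casefold()
    if not needle:
        return []
    return [n for n, text in enumerate(pages, start=1)
            if needle in text.casefold()]


book = ["The dog barked.", "No pets here.", "Dogs everywhere."]
print(find_pages(book, "dog"))  # [1, 3]
```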
