jzohrab / lute
DEPRECATED: LUTE (Learning Using Texts) is a self-hosted web app for learning language through reading, based on Learning with Texts (LWT)
License: The Unlicense
The current method of defining all terms can be streamlined by some form of "lemmatization", i.e., finding root terms of words.
Currently, Lute treats every word as different: eg, "blancas" and "blancos" are different, though both have the same parent term "blanco", as are "escribo" and "escribieron", though both are forms of the verb "escribir." When I first started out, I didn't mind having to manually make all of these mappings, but as I progress, I feel that's a hassle. I often want to have the parent images available for the child terms, just for my own enjoyment.
It would be nice to have an "auto-lemmatize" feature that can take a given text or book, and automatically map terms to existing parents.
Currently, the only functionality around parent terms, but a significant one in my experience, is the ability to see a bunch of sentences for a term when looking at the references. Eg. for me, the term "albergado" is linked to the parent term "albergar", and when I click on the "sentences" link of "albergado" I get an extensive list of sentences with albergar, albergaba, albergó, albergado, etc etc, which is great b/c I can see the term in my readings. In the future, I can also see this being useful for something like "create Anki cards for only parent terms, with examples of child terms" etc..
This iteration would be good enough for me, at present!
This only finds lemmas that differ from the original term.
import stanza
import spacy_stanza
# Download the stanza model if necessary
# print("downloading model ...");
# stanza.download("es")
# Initialize the pipeline
nlp = spacy_stanza.load_pipeline("es")
text = """
Los acomodé contra las paredes, pensando en la comodidad y no en la estética.
"""
# with nlp.select_pipes(enable=['tok2vec', 'tagger', 'attribute_ruler', 'lemmatizer']):
doc = nlp(text)
# for token in doc:
# print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
# print(doc.ents)
lemmatized = [token for token in doc if token.text != token.lemma_]
for token in lemmatized:
    print(token.text, token.lemma_)
Run with python3 -W ignore ex.py (when all dependencies are installed in a Python venv).
Output:
Los él
acomodé acomodar
las el
paredes pared
pensando pensar
la el
la el
The lemmatizing code takes a while to load due to the extensive data, but that's ok. If people run the process outside of Lute, they'll understand the processing needs. And this is a first-pass idea anyway.
This data could be loaded into a file and then passed back to Lute for magic processing.
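As a rough sketch of that hand-off, the (term, lemma) pairs could be written out as tab-separated lines. The "parent [tab] child" layout follows the parent mapping file described elsewhere in these issues; the function name `write_parent_mapping` and the lowercasing step are assumptions made for illustration, not part of Lute.

```python
import csv
import io

def write_parent_mapping(pairs, out):
    """Write unique (child, lemma) pairs as 'parent<TAB>child' lines.

    Hypothetical helper: skips pairs where the lemma equals the term,
    and skips exact repeats.
    """
    seen = set()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for child, parent in pairs:
        key = (parent.lower(), child.lower())
        if key[0] == key[1] or key in seen:
            continue
        seen.add(key)
        writer.writerow(key)

# Pairs as they might come out of the lemmatizer loop above.
pairs = [("acomodé", "acomodar"), ("paredes", "pared"), ("la", "el"), ("la", "el")]
buf = io.StringIO()
write_parent_mapping(pairs, buf)
print(buf.getvalue())
```

In practice `out` would be a real file opened with `encoding="utf-8"` and `newline=""`, rather than an in-memory buffer.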
Obviously, having Lute manage this would be great, but it implies a full installation of some form of Python and spaCy or similar. This could be done with Docker containers too, managed by compose.
I don't think this would need a constantly running server for the lemma process, it could just run a "docker command" style microcontainer that just processes some input (list of terms) and returns the mapping.
However, possibly in the future it would be nice to do the lemmatization on-the-fly, which would need some kind of REST API server running. This might require a bunch of config though, to get the corpus(es) necessary for users with their specific languages.
I think creating a registry in docker hub would make it easier to install and update the app as new versions are released.
What do you guys think about it? Is it hard to manage?
This button is so confusing and easy to press by accident. How can I remove it?
This feels like a nice idea, less clicky-clicky.
In addition, at least one user (quopquaoi in Discord) reported an issue: when clicking a term, the language dictionary (Jisho, Japanese dict) was "pulling focus", so the hotkey was actually getting sent to Jisho, instead of being handled by Lute. If hotkeys worked on hover, that wouldn't be a problem, because Lute would still be running the show (would have focus).
Possible implementation:
UX issues to sort out:
Description
See title :-)
To Reproduce
Lute v1.1.4
File attached, language = Chinese.
Stats are currently calculated in a way that doesn't work for character-based languages, such as Chinese. For example, take this single-page text, with completely garbage terms created:
Even though the terms are trash, they cover 100% of the text, so you'd expect the % to be pretty high ... but the index page shows 0% known:
Obviously not right.
Currently Lute downcases all clicked words. e.g., clicking on "Futter" opens a form with the term "futter". German uses caps for nouns, so it would be good if the initial caps could be preserved for German nouns, e.g.:
The above is from branch spike_preserve_caps, pushed to this repo.
This feature of "preserving case" is really only needed for German, because that's the only lang that uses capitalization to indicate something special in the sentence ... but then again, in other languages, caps could be used for proper names etc like "Jesus Christ" or "Mexico".
To-do:
No other language needs case preservation -- and preserving case could be annoying (???? no idea). So, we might want a "preserve case" flag on the language -- annoying. Optionally, we could let users downcase the word somehow on term creation.
Is your feature request related to a problem? Please describe.
I believe that the current behavior discourages ever setting a word to well-known. My stated reasoning is as follows: A user may have intimate familiarity with the more common forms of a word and consequently wish to remove the colored highlight of learning statuses 1-5 but still desire to have quick reference to the parent word. The current behavior requires the user to completely open the entry for a word in order to access its definition, parent word, and tags when a mere hover should suffice.
Take this use case as an example; in Latin, in order to be able to conjugate a given verb into every tense, voice, and mood, the learner is required to have mastered the verb's so called "principal parts". The more frequent a learner's exposure to the principal parts, the easier their ultimate acquisition will be. The parent word box is an excellent place to put not only the principal parts but also a short definition of the headword for quick reference.
Describe the solution you'd like
Ideally, Lute would retain the identical pop up on hover behavior for words set to "well-known" as it does for learning statuses 1-5.
With docker pre-built images pushed to docker hub, users with docker would be able to start using Lute with just a few clicks:
$ mkdir lute
$ cd lute
$ # cp the .env file and compose file to this dir, editing them if needed
$ docker compose up
I don't feel any code changes are needed. V2 (soon to be merged into develop) already has the various code changes in place needed for the image to work well. ... But there may be other requirements as well for this to work on all client machines that I'm not aware of!!
Ref different builds for different architectures (https://docs.docker.com/build/building/multi-platform)
Sometimes words have more than one parent, depending on context.
Spanish example: in "Él se siente mejor," the verb is "sentirse", to feel. But in the below extract, it's "sentarse", to sit down:
Yo hice ademán de tomar asiento. —¿Quién le ha dicho que se siente? —murmuró don Basilio
Czech example (from Mycheze in Discord):
hoře is a declension of hora, the regular form of hoře, and a conjugation of hořet. Two nouns and a verb all use the same "word." And their meanings aren't even close. The first one is mountain, the second is sorrow or grief, and the verb is to feel love.
Things to consider
For users, I think using "tags" as the parents would be easiest, allowing spaces in the tags, and making an ajax call to get the current list of words. I like the way that Lute currently shows the parent's definition in the dropdown:
I'm not sure how to replicate that with tags.
Sometimes texts have two spaces, which then causes problems with multi-word terms not matching. Eg, "llevar [ ] [ ] a" will not match with "llevar [ ] a". Just regex replace space-space on import text.
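A minimal sketch of that clean-up step, written in Python for illustration (Lute itself is PHP, and `collapse_spaces` is a hypothetical name). It only touches runs of plain spaces, so line breaks in the imported text are left alone.

```python
import re

def collapse_spaces(text):
    """Replace every run of two or more plain spaces with a single space."""
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("llevar  a  cabo"))
```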
As at commit:
$ git log -n 1
commit 4b587946918bd8ee71e363c0e6b79e02a98be4ca (HEAD -> develop, origin/develop)
Author: Jeff Zohrab <[email protected]>
Date: Thu Mar 2 13:17:14 2023 -0600
Remove unused code.
Weird, only one record in the db:
mysql> select * from textitems2 where ti2txid = 359 and ti2text like '%resq%';
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
| Ti2WoID | Ti2LgID | Ti2TxID | Ti2SeID | Ti2Order | Ti2WordCount | Ti2Text | Ti2TextLC |
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
| 111581 | 1 | 359 | 94730 | 834 | 1 | resquebrajado | resquebrajado |
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
1 row in set (0.14 sec)
I find the current front page not very useful, I'm always immediately leaving it to either read, or to go to the book list. When I open Lute, the only thing I'm interested in doing is reading, not seeing the current list of links, and when I'm done reading something, the only thing I'm interested in is creating or starting the next thing to read.
For new users, the existing list of links is good!
Perhaps the page could look like this, once some books have been defined:
The links at the bottom of the page could be rearranged to save real estate ... actually, for users that have already defined books, some of those links wouldn't even be needed, as the list of books is already there.
The book listing sort order should initially be something like:
Sample text showing issue:
1 とばされた家
ドロシーは、ヘンリーおじさんとエムおばさんと、三人でくらしていました。家は小さくて、部屋は一つだけですが、地下室がありました。三人が住むカンザスでは、よくたつまきが起こるのです。地下室ににげこめば、たつまきから身を守ることができました。
牛の世話をしたり、畑をたがやしたり、おじさんもおばさんも、いそがしくはたらいています。二人とも、ドロシーと遊んだり話したりするひまはありません。ドロシーはいつもひとりでした。家のまわりは草原で、友だちもいません。
「ぼくがいるよ、ワンワン」というように、ドロシーにとびついて走りまわるのは、小さくて黒い犬のトトです。ドロシーはトトが大すきで、トトもドロシーが大すきでした。朝からばんまでいっしょで、ドロシーがベッドに入ると、トトももぐりこんでくるのです。
「たつまきが近づいているぞ。」
ある日、空を見たおじさんがいいました。空は、どこまでも暗いはい色です。
「牛たちのようすを見てやらなければ。」
おじさんは牛小屋に走っていき、おばさんの声がひびきました。
「ドロシー、急いで地下室に入りなさい。」
おばさんは地下へ下りていきます。
Hover over "ドロシー", Shift + c, and the whole page is highlighted. Cry bitter tears.
Currently Lute uses MySQL, which is potentially a bit heavyweight. Other tools like VocabSieve and Anki use SQLite and it works fine for them.
If Lute used SQLite, users wouldn't have to install and configure MySQL. Install would be simplified to the following:
They should be able to use the built-in PHP web server from the public folder with $ php -S localhost:8000. The project could be initially configured to run the db file from an internal folder and file (saved within the project directory). Then users could modify the .env.local file, with clear instructions on a wiki page. It should suffice, performance-wise.
(Of course, if they want, they can install the Symfony CLI (https://symfony.com/download) for a lightweight web server, and run it from the app folder with $ symfony server:start, or set up MAMP or XAMPP or whatever, but those would be overkill).
Branch sqlite_work_in_progress, pushed to this repo, has a few things done already. The unit tests all pass. The code appears to work, but I haven't tested it with any real volume of data, so I don't yet know if it's performant.
dev:find to find them
Can keep creating baselines as needed
**** reset
rm var/data/test.sqlite
# using https://github.com/techouse/mysql-to-sqlite3
mysql2sqlite -f var/data/test.sqlite -d test_lute -u root --mysql-password root -W
**** fix col types so that conversion to sqlite sets up primary key as autoincrement
alter table languages modify column LgID INTEGER NOT NULL AUTO_INCREMENT;
alter table books modify column BkID INTEGER NOT NULL AUTO_INCREMENT;
alter table booktags modify column BtBkID INTEGER NOT NULL;
alter table booktags modify column BtT2ID INTEGER NOT NULL;
alter table sentences modify column SeID INTEGER NOT NULL AUTO_INCREMENT;
alter table statuses modify column StID INTEGER NOT NULL;
alter table tags modify column TgID INTEGER NOT NULL AUTO_INCREMENT;
alter table tags2 modify column T2ID INTEGER NOT NULL AUTO_INCREMENT;
alter table texts modify column TxID INTEGER NOT NULL AUTO_INCREMENT;
alter table texttags modify column TtTxID INTEGER NOT NULL;
alter table texttags modify column TtT2ID INTEGER NOT NULL;
alter table texttokens modify column TokTxID INTEGER NOT NULL;
alter table wordimages modify column WiID INTEGER NOT NULL AUTO_INCREMENT;
alter table wordparents modify column WpWoID INTEGER NOT NULL;
alter table wordparents modify column WpParentWoID INTEGER NOT NULL;
alter table words modify column WoID INTEGER NOT NULL AUTO_INCREMENT;
alter table words modify column WoLgID INTEGER NOT NULL;
alter table wordtags modify column WtWoID INTEGER NOT NULL;
alter table wordtags modify column WtTgID INTEGER NOT NULL;
**** fix trigger -- committed in rep. migrations
**** weird 'like' vs '=' issue in sqlite
Ref https://stackoverflow.com/questions/26719948/sqlite-why-select-like-works-and-equals-does-not
in SpaceDelimitedParser_IntTest.php
**** TODO db filename in settings
**** fix migration thing
***** commit a baseline _empty_ db to the repo
***** migration helper thing should copy the baseline
With attached file (language = Mandarin)
Notes from discord chat:
root@63dce09ff1c7:/# mysqldump --version
mysqldump Ver 10.19 Distrib 10.7.8-MariaDB, for debian-linux-gnu (aarch64)
Lute v1.1.7
Dump in terminal works:
# mysqldump -u root --password=root lute > /lute/backup.sql
root@63dce09ff1c7:/# cd lute
root@63dce09ff1c7:/lute# ls
backup.sql
Is your feature request related to a problem? Please describe.
For people who want to have a simple login feature.
! important ! It only stops casual users from messing up your database, because the password is stored in plaintext.
Maybe someone can help me set/hash the password. For now, it's fine for me.
Describe the solution you'd like
Just use HTTP Basic Authentication
.env file and change USERNAME as well as PASSWORD, both default values are lute
# For login Lute
# You cannot use log out with the HTTP basic authenticator.
# Even if you log out from Symfony, your browser "remembers" your
#credentials and will send them on every request.
# -------------------
LUTE_USER_USERNAME=lute
LUTE_USER_PASSWORD=lute
./config/packages/security.yaml as below:
security:
    # https://symfony.com/doc/current/security.html#registering-the-user-hashing-passwords
    password_hashers:
        # Uncomment below 1 line to restore original setting
        # Symfony\Component\Security\Core\User\PasswordAuthenticatedUserInterface: 'auto'
        # Uncomment below 1 line to use login feature (Http Basic Access)
        Symfony\Component\Security\Core\User\InMemoryUser: plaintext
    # https://symfony.com/doc/current/security.html#loading-the-user-the-user-provider
    providers:
        # Uncomment below 1 line to restore original setting
        # users_in_memory: { memory: null }
        # Uncomment below 4 lines to use login feature (Http Basic Access)
        users_in_memory:
            memory:
                users:
                    '%env(LUTE_USER_USERNAME)%': {password: '%env(LUTE_USER_PASSWORD)%', roles: ['ROLE_USER']}
    firewalls:
        dev:
            pattern: ^/(_(profiler|wdt)|css|images|js)/
            security: false
        # TURNING OFF SECURITY FOR PROD.
        # Yes, this looks bad, but Lute is designed to run locally only.
        # There are _no security checks_.
        # Uncomment below 3 lines to restore original setting
        # prod:
        #     pattern: ^/
        #     security: false
        # Uncomment below 4 lines to use login feature (Http Basic Access)
        main:
            lazy: true
            provider: users_in_memory
            http_basic:
                realm: Secured Area
            # activate different ways to authenticate
            # https://symfony.com/doc/current/security.html#the-firewall
            # https://symfony.com/doc/current/security/impersonating_user.html
            # switch_user: true
    # Easy way to control access for large sections of your site
    # Note: Only the *first* access control that matches will be used
    access_control:
        # Uncomment below 1 line to use login feature (Http Basic Access)
        - { path: ^/, roles: ROLE_USER }
        # - { path: ^/admin, roles: ROLE_ADMIN }
        # - { path: ^/profile, roles: ROLE_USER }
when@test:
    security:
        password_hashers:
            # By default, password hashers are resource intensive and take time. This is
            # important to generate secure password hashes. In tests however, secure hashes
            # are not important, waste resources and increase test times. The following
            # reduces the work factor to the lowest possible values.
            Symfony\Component\Security\Core\User\PasswordAuthenticatedUserInterface:
                algorithm: auto
                cost: 4 # Lowest possible value for bcrypt
                time_cost: 3 # Lowest possible value for argon
                memory_cost: 10 # Lowest possible value for argon
There is no log out button, so you may want to use an incognito window each time you log in.
For Docker users, if you want to change the user/password after running docker:
4.1 run docker compose stop
4.2 amend the user/password you want in .env
4.3 run docker compose up
Additional context
Both YeYueMX and King-Awgwa have reported this. I thought it was working, but maybe not!
Description
Brief description of bug. Include copy-paste of error message details, or table of error data, if available.
To Reproduce
Steps to reproduce the behavior, e.g.:
Screenshots
If it will be helpful, add screenshots.
Extra software info, if not already included in the Description:
Currently, dictionary URLs contain arbitrary placeholders, which has a few consequences:
I formatted everything as proper URLs in my fork of LWT. As it works better than the previous system, I would like to apply a similar system here. Let me detail it.
As far as I have thought it through, the best system is to replace the arbitrary codes with instructions preceded by a clear prefix like "lute_". For instance:
### by lute_term
.* by an argument lute_popup=1
I don't think this part should go into the dictionary URL, as it makes the URL longer and can be confusing for users. A "display in pop-up" field, with a database counterpart, could be better, but it is slightly more difficult to implement.
Backward compatibility with the "placeholder" system was not too hard to achieve, and it shouldn't be harder to achieve here. I would also like to add compatibility with my own placeholders (prefixed by lwt_) if you don't have any objection to it.
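To make the proposal concrete, here is a small sketch of how such a template might be expanded while still accepting the old `###` placeholder. It is written in Python for illustration only. The function name `build_dictionary_url`, the `lwt_term` placeholder name, and the reading of a leading `*` as "open in pop-up" are all assumptions, not confirmed Lute or LWT behaviour.

```python
from urllib.parse import quote

def build_dictionary_url(template, term):
    """Expand a dictionary URL template; return (url, open_in_popup)."""
    popup = False
    if template.startswith("*"):
        # Assumed legacy marker meaning "open in a pop-up window".
        popup = True
        template = template[1:]
    if "lute_popup=1" in template:
        popup = True
    encoded = quote(term, safe="")
    # New-style placeholder first, then the assumed LWT-fork name, then legacy "###".
    for placeholder in ("lute_term", "lwt_term", "###"):
        template = template.replace(placeholder, encoded)
    return template, popup

print(build_dictionary_url("https://en.wiktionary.org/wiki/###", "hoře"))
print(build_dictionary_url("*https://www.deepl.com/translator#el/en/lute_term", "a b"))
```

This sketch leaves a `lute_popup=1` argument in the URL that is sent to the dictionary; a real implementation might prefer to strip it, or to store the pop-up flag in a separate field as suggested above.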
If it makes sense to you, I can work on it and make a nice PR.
For people coming to Lute from other systems, or for some learning materials, it would be nice to have a bulk CSV import of terms. This could also be used to help bulk translations for imported materials (even though I don't use Lute that way myself).
Is your feature request related to a problem? Please describe.
Go back to previous page after edit/update terms.
Describe the solution you'd like
eg. You have 20 pages of terms, you go to page 11, edit and save one of those terms, and it should stay on page 11.
Describe alternatives you've considered
Use command
+click to pop up a new window on macOS.
Additional context
I use phpMyAdmin instead for the moment.
I think the docs are a bit screwy, more than one person has gotten lost in a few places. Can simplify (at least for Docker).
Feedback from MyCheze in discord: "Many instructions say "Go to this page and do what it says" and then the page was like 1 instruction. It could have all been on the first page to streamline the process. ... I needed to edit the .env file to say BACKUP_HOST_DIR since it originally says BACKUP_DIR and Docker would fail. That was the hardest thing to find."
So, fix the wiki, and then the main page README.
Currently, the DB doesn't really enforce referential integrity, which could result in weird behaviour. Better get that under control.
e.g., the "words" table has PK "WoID", which is referenced in various tables. Currently, when hacking at the db with straight SQL, it appears that deletes from the "words" table aren't cascaded to child tables, which is bad -- eg. a wordparents record may refer to something that has been deleted, or, worse, was deleted and then replaced with something new.
Deletes in parent tables should cascade to child tables. I believe that the Doctrine model is correctly removing things from dependent tables when parents are removed (unit tests are covering that), but it is better to be safe than sorry.
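One way to see whether this has already caused damage is to look for orphaned rows. The sketch below uses Python's built-in sqlite3 module against a deliberately simplified schema (only the key columns named in this issue, plus an assumed `WoText` column); `find_orphan_wordparents` is a hypothetical helper, not existing Lute code.

```python
import sqlite3

def find_orphan_wordparents(conn):
    """Return wordparents rows whose child or parent word no longer exists."""
    sql = """
        SELECT wp.WpWoID, wp.WpParentWoID
        FROM wordparents wp
        LEFT JOIN words c ON c.WoID = wp.WpWoID
        LEFT JOIN words p ON p.WoID = wp.WpParentWoID
        WHERE c.WoID IS NULL OR p.WoID IS NULL
        ORDER BY wp.WpWoID
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (WoID INTEGER PRIMARY KEY, WoText TEXT);
    CREATE TABLE wordparents (WpWoID INTEGER NOT NULL, WpParentWoID INTEGER NOT NULL);
    INSERT INTO words VALUES (1, 'gato'), (2, 'gatos');
    INSERT INTO wordparents VALUES (2, 1);
    DELETE FROM words WHERE WoID = 1;  -- no cascade, so the mapping is left behind
""")
print(find_orphan_wordparents(conn))
```

The same LEFT JOIN pattern could be repeated for the other child tables (wordimages, wordtags, and so on) before any constraints are tightened.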
Todo:
List of FKs to fix -- there may be others, but this is a good start.
CREATE TABLE IF NOT EXISTS "books" (
FOREIGN KEY("BkLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "bookstats" (
FOREIGN KEY("BkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "booktags" (
FOREIGN KEY("BtT2ID") REFERENCES "tags2" ("T2ID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("BtBkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "sentences" (
FOREIGN KEY("SeTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("SeLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "texts" (
FOREIGN KEY("TxBkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("TxLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "texttags" (
FOREIGN KEY("TtTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("TtT2ID") REFERENCES "tags2" ("T2ID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "texttokens" (
FOREIGN KEY("TokTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "wordimages" (
FOREIGN KEY("WiWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "wordparents" (
FOREIGN KEY("WpParentWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("WpWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "wordtags" (
FOREIGN KEY("WtWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION,
FOREIGN KEY("WtTgID") REFERENCES "tags" ("TgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE wordflashmessages (
FOREIGN KEY("WfWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS "words" (
FOREIGN KEY("WoLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);
With SQLite, changing these constraints means following a painful process: https://www.sqlite.org/lang_altertable.html#otheralter
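For a single table, that process boils down to "create a new table with the desired constraint, copy the rows, drop the old table, rename". The sketch below shows this for wordimages using Python's sqlite3 module. The column list is an assumption (only WiID and WiWoID appear in this issue; `WiSource` is a guess for illustration), and `rebuild_wordimages` is a hypothetical helper. A real migration would also need to recreate any indexes and triggers on the table.

```python
import sqlite3

REBUILD_WORDIMAGES = """
BEGIN;
CREATE TABLE wordimages_new (
    WiID INTEGER PRIMARY KEY AUTOINCREMENT,
    WiWoID INTEGER NOT NULL,
    WiSource TEXT,
    FOREIGN KEY (WiWoID) REFERENCES words (WoID) ON UPDATE NO ACTION ON DELETE CASCADE
);
INSERT INTO wordimages_new SELECT WiID, WiWoID, WiSource FROM wordimages;
DROP TABLE wordimages;
ALTER TABLE wordimages_new RENAME TO wordimages;
COMMIT;
"""

def rebuild_wordimages(conn):
    """Recreate wordimages so that deleting a word also deletes its images."""
    conn.execute("PRAGMA foreign_keys = OFF")   # must be set outside a transaction
    conn.executescript(REBUILD_WORDIMAGES)
    conn.execute("PRAGMA foreign_keys = ON")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (WoID INTEGER PRIMARY KEY, WoText TEXT);
    CREATE TABLE wordimages (
        WiID INTEGER PRIMARY KEY AUTOINCREMENT,
        WiWoID INTEGER NOT NULL,
        WiSource TEXT,
        FOREIGN KEY (WiWoID) REFERENCES words (WoID) ON UPDATE NO ACTION ON DELETE NO ACTION
    );
    INSERT INTO words VALUES (1, 'gato');
    INSERT INTO wordimages (WiWoID, WiSource) VALUES (1, 'gato.jpeg');
""")
rebuild_wordimages(conn)
conn.execute("DELETE FROM words WHERE WoID = 1")
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM wordimages").fetchone()[0]
print(remaining)
```

Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection, so the application would need to set it on every connection for the cascade to take effect.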
Description
Multi-words display error.
To Reproduce
Steps to reproduce the behavior, e.g.:
The first was a new U.S. Department of Energy (DOE) report, which has not been made public.
DOE, input Department of Energy as parent then save it
Solution
Terms
Department of Energy)
Department of Energy)
Update
Extra software info, if not already included in the Description:
Is your feature request related to a problem? Please describe.
In the V2 beta version, there is currently only one backup file available, and the manual files are also rolled by the automatic backup.
Describe the solution you'd like
It would be helpful to create backup files with timestamps and keep all manual backups because there are instances where people prefer manual backups. It would also be great if those backup files could be managed in Lute.
Additionally, we might consider keeping 2-3 auto backup files, and include one file from the end of the previous month.
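The retention rule suggested here could look roughly like the following. This is only a sketch in Python: `backups_to_keep` is a hypothetical helper that works on backup timestamps (for example, parsed from timestamped file names), and the default of three recent files is taken from the "2-3 auto backup files" suggestion above.

```python
from datetime import datetime

def backups_to_keep(timestamps, now, keep=3):
    """Pick the newest `keep` backups plus the last one from the previous month."""
    ordered = sorted(timestamps, reverse=True)
    keepers = set(ordered[:keep])
    if now.month > 1:
        prev_year, prev_month = now.year, now.month - 1
    else:
        prev_year, prev_month = now.year - 1, 12
    previous = [t for t in ordered if (t.year, t.month) == (prev_year, prev_month)]
    if previous:
        keepers.add(previous[0])  # newest backup of the previous month
    return sorted(keepers)

stamps = [
    datetime(2023, 5, 15), datetime(2023, 5, 31),
    datetime(2023, 6, 20), datetime(2023, 6, 23),
    datetime(2023, 6, 24), datetime(2023, 6, 25),
]
kept = backups_to_keep(stamps, now=datetime(2023, 6, 25))
print([t.strftime("%Y-%m-%d") for t in kept])
```

Manual backups would simply be excluded from the list passed in, so they are never pruned.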
Describe alternatives you've considered
Add timestamps feature and move backup files manually.
Additional context
e.g., I have my language exceptions for spanish set up so that "A." is highlighted as an "unknown term" but clicking on that gives this:
The reason for this is that Symfony doesn't like URLs with dots in them. ... similar to the JPEG issue noted in src/entity/Term.php (I think).
Minor issue, but still needs fixing.
Currently, that's not possible.
Allow changes:
I'm not sure what happens if the parent mapping file contains dups, it doesn't look like it will handle it well.
eg
cat cats
cat cats
Also, the code may have problems if the same term is mapped to different parents -- eg a fake example:
parent somechild
child somechild
should only import the mapping once. Add a check and test.
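A possible shape for that check, sketched in Python rather than Lute's PHP. `clean_parent_mappings` is a hypothetical name, and the "first mapping wins, later conflicts are reported" policy is an assumption; the issue does not say how conflicts should be resolved.

```python
def clean_parent_mappings(lines):
    """Parse 'parent<TAB>child' lines; drop exact duplicates, report conflicts."""
    mappings = {}    # child -> parent (first mapping seen wins)
    conflicts = []   # (child, kept_parent, rejected_parent)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not all(p.strip() for p in parts):
            continue  # skip blank or malformed lines
        parent, child = (p.strip().lower() for p in parts)
        if child not in mappings:
            mappings[child] = parent
        elif mappings[child] != parent:
            conflicts.append((child, mappings[child], parent))
    return mappings, conflicts

lines = ["cat\tcats", "cat\tcats", "parent\tsomechild", "child\tsomechild"]
mappings, conflicts = clean_parent_mappings(lines)
print(mappings)
print(conflicts)
```

The duplicate "cat cats" line is imported once, and the second parent for "somechild" is surfaced to the caller instead of silently overwriting the first.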
Currently, "parent term mapping" is handled at "Import Parent Term mapping", and "term import" at "Import Terms". From a user's perspective, there's no real need for a difference, they're both just file imports. The parent term mapping takes a "parent [tab] child" mapping file, and the term import takes a CSV with several columns.
The Term Import could take a CSV with variable number of fields. Minimum required fields: language, term. Optional fields would be the rest. So, a parent mapping file could contain something like
language,term,parent
Spanish,gatos,gato
and the full import could have the whole thing.
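A reader for such a variable-width CSV might look like this sketch, using Python's csv module. The required column names come from the description above; `read_term_rows` and the choice to drop empty optional fields are assumptions for illustration.

```python
import csv
import io

REQUIRED = {"language", "term"}

def read_term_rows(csv_text):
    """Parse a term CSV with a header row; only 'language' and 'term' are required."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Keep only named, non-empty fields so optional columns can be left blank.
        rows.append({k: v.strip() for k, v in row.items() if k and v and v.strip()})
    return rows

print(read_term_rows("language,term,parent\nSpanish,gatos,gato\n"))
```

With this shape, a parent mapping file is just a term import that happens to fill in the `parent` column and nothing else.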
Terms that partially overlap are both displayed. For example, suppose you defined terms "apple ball" and "ball cat". Given the imported text "apple ball cat dog", Lute will show this as "[apple ball][ball cat][dog]". The word "ball" is shown twice, because Lute cannot decide which term should really be shown ... only you know that.
Now, I know that this looks off, but it was the best solution I could come up with! For me studying Spanish, this has only occurred a few times while reading ... e.g. I have the terms "llegar a", and "a ver", which are both common constructs, and very occasionally while reading this has been rendered as "[llegar a][a ver]". It has not been enough of a bother for me to come up with an alternate solution -- after a cursory think, I believe that a good solution to this could be quite complicated, but I'd have to spend time investigating to be sure.
Issue reported by user @alguien in Discord, for the sentence "开始新生活吧,好吗?" (Note that "生" is rendered twice):
Possible solutions, sketches only:
1. "Mouseover reveal overlap" ... maybe something like ... "show the first term completely, and show the second one in such a way that the user knows that it's partially overlapped by the first; and on mouseover of the second term show it fully, and hide part of the first term." Tricky!
2. "Mouseover popup shows full" - "show the un-overlapped portion of the second term, but on mouseover the pop-up shows the full term".
Neither of these solutions changes the fact that Lute is only showing the first term fully, and that perhaps it's really the second term that's the right one in context, but at least you wouldn't see weird repeats. Solution 2 is easier, with fewer moving parts.
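Solution 2 can be sketched as follows, in Python for illustration. The data shapes are assumptions: `tokens` is the parsed text and `matches` is a list of (start, length) spans of known terms. `render_overlaps` is a hypothetical name and does not reflect Lute's actual rendering code.

```python
def render_overlaps(tokens, matches, sep=" "):
    """Return (displayed_text, hover_text) chunks.

    The first term is shown in full; a later overlapping term shows only its
    un-overlapped tail, while its hover text carries the full term.
    """
    chunks = []
    pos = 0  # index of the first token not yet displayed
    for start, length in sorted(matches):
        end = start + length
        if end <= pos:
            continue  # completely hidden behind an earlier term
        for tok in tokens[pos:start]:
            chunks.append((tok, None))  # plain tokens between terms
        shown = sep.join(tokens[max(start, pos):end])
        full = sep.join(tokens[start:end])
        chunks.append((shown, full))
        pos = end
    for tok in tokens[pos:]:
        chunks.append((tok, None))
    return chunks

tokens = ["apple", "ball", "cat", "dog"]
matches = [(0, 2), (1, 2)]  # "apple ball" and "ball cat"
print(render_overlaps(tokens, matches))
```

For character-based languages such as Chinese, `sep=""` would join the tokens without spaces.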
Comment from user @alguien in Discord:
I see. For Spanish the system as it is sounds like a decent compromise, but in Chinese I don't think it's as good a solution, because sometimes the overall meaning will get through (不知知道) but other times the combination means something else entirely and the meaning of the text will get distorted (新生生活), so it's far from ideal. It's possible that this happens less with material not aimed at beginners, but it's still going to happen from time to time regardless. If it's not so much a coding error as an unintended consequence of the algorithm, I understand it will be problematic to fix just for one language, so I won't expect a fix anytime soon. I think that having this give priority to the first term would be okay here; I'm not sure about other languages with similar issues, but I think that'd parse well with Chinese.
3. "Unhighlighted text mode" - Another possible solution, but a big change for the UI/user experience: when reading, add a "render white page" mode or something in which Lute doesn't show the terms as color-coded "chunks" on a page, and just shows the text pretty much as-is (white page), but for each character/word, on hover, pops up every possible phrase that it's a part of. Then, on un-check of "render white page", all of the color-coded terms show up.
Description
The display of Japanese numerical words is incorrect.
To Reproduce
Steps to reproduce the behavior, e.g.:
Create New Text
言葉一つで傷つくような
ヤワな私を捧げたい今
二度と訪れない季節が
Save
Screenshots
I checked with MeCab in terminal, it can recognize numerical words very well.
Extra software info, if not already included in the Description:
Just wondering if anyone could help me run this project with Docker.
Steps that I took:
docker compose build (as per instructions)
c:\code\lute>docker compose build
time="2023-06-25T10:54:29+12:00" level=warning msg="The \"BACKUP_HOST_DIR\" variable is not set. Defaulting to a blank string."
1 error(s) decoding:
error decoding 'volumes[1]': invalid spec: :/lute/backup: empty section between colons
Description
Reading, and accidentally set the (new) parent for the term "cordillera" as "cordillera", resulting in error.
src/domain/dictionary->add should check if the parent is the term, and prevent setting the term as its own parent!
Add a unit test to that effect.
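The check and its test could look something like this sketch. It is in Python purely for illustration (the real code lives in Lute's PHP domain layer), and `validate_parent` plus the case-insensitive comparison are assumptions.

```python
def validate_parent(term_text, parent_text):
    """Return the cleaned parent text, or None if no parent was given.

    Raises ValueError if the parent is the term itself (ignoring case and
    surrounding whitespace).
    """
    if parent_text is None or parent_text.strip() == "":
        return None
    parent = parent_text.strip()
    if parent.lower() == term_text.strip().lower():
        raise ValueError("a term cannot be its own parent")
    return parent

def test_term_cannot_be_own_parent():
    try:
        validate_parent("cordillera", "Cordillera ")
    except ValueError:
        return True
    return False

print(test_term_cannot_be_own_parent())
```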
Invalid timestamp
Hi Jeff! I'm trying to install Lute with a LAMP server, but I ran into a fatal error. Basically, the app crashes while displaying: File 20221221_233742_add_textstatscache_timestamps.sql exception: Invalid default value for 'UpdatedDate' Quitting.
The file is located at ./db/migrations/20221221_233742_add_textstatscache_timestamps.sql. I have tried multiple different timestamps, such as 0000-00-00 00:00:00 or 0, without any change. The annoying thing is that it also makes the tests crash.
EDIT: skipping this instruction was enough; as it is a fresh DB install, I don't need the migration.
I have a few books loaded that I haven't read yet, or have pages remaining I haven't read yet. Sometimes when I ask for a term to show its sentences, I'm shown sentences that are well in advance in the book I'm reading, so I really don't know what the context is (I can sometimes guess for texts I've read in the past).
Outline of code changes:
Currently, user browsers may cache public/js/lute.js, so when people update they don't necessarily get the latest code. Massively annoying for all.
The release process could easily do something like the following, as a very hacky but workable workaround:
lute.js to lute_<somedatetime>.js
lute.js to match the screwy filename
This is all terribly hacky, of course, and the right thing to do would be something that I know nothing about, like WebPack. That belongs in a separate ticket; in the meantime, hacky hack above will do.
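A release script along those lines might be sketched as below, in Python for illustration. Both helper names are hypothetical, and the timestamp format is just one reasonable choice for `<somedatetime>`.

```python
import os
from datetime import datetime

def versioned_name(filename, now=None):
    """Return e.g. 'lute_20230625_105429.js' for 'lute.js'."""
    now = now or datetime.now()
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{now:%Y%m%d_%H%M%S}{ext}"

def update_references(markup, old_name, new_name):
    """Point every reference to the old file name at the new one."""
    return markup.replace(old_name, new_name)

new_name = versioned_name("lute.js", datetime(2023, 6, 25, 10, 54, 29))
page = '<script src="/js/lute.js"></script>'
print(new_name)
print(update_references(page, "lute.js", new_name))
```

The actual file rename (for example with `os.rename`) and the list of templates to rewrite are left out, since they depend on the project layout.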
Hello all, if you're reading this, perhaps you have Docker experience and can help Dockerize Lute.
I've had a few people ask about having Lute Dockerized. It's been a loooong time since I've hacked on Docker, and I don't have the time or energy to spend fiddling with it. Perhaps someone is looking for a Docker project to try out, and would be willing to contribute.
Below is some info, LMK if you need more. This is a free project, I'm not looking to make money from it, and so I can't pay anyone. Hopefully this doesn't make you feel you're being taken advantage of ... (as a dev, I used to always feel that certain types of ppl were leeching off me) ... my limited dev time is better spent working on Lute itself.
Cheers and best wishes!
Jeff
Regex chars:
a-zA-ZÀ-ÖØ-öø-ȳͰ-Ͽἀ-ῼ
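As a quick sanity check of that character class, the sketch below uses it in a Python regular expression to pull word tokens out of a short string. The NFC normalization step is an assumption added so that accented letters typed as base letter plus combining mark still fall inside the ranges; `split_words` is a hypothetical helper.

```python
import re
import unicodedata

# Character class copied from the "Regex chars" note above.
GREEK_WORD = re.compile(r"[a-zA-ZÀ-ÖØ-öø-ȳͰ-Ͽἀ-ῼ]+")

def split_words(text):
    """Return the word tokens matched by the character class, after NFC normalization."""
    return GREEK_WORD.findall(unicodedata.normalize("NFC", text))

print(split_words("Καλημέρα, κόσμε!"))
```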
Sample story: https://www.greek-language.gr/certification/dbs/teachers/show.html?id=5
Figure out the dictionary
from Cynthios in slack:
Modern Greek
https://en.wiktionary.org/wiki/###
https://www.wordreference.com/gren/###
*https://www.deepl.com/translator#el/en/###
Currently, Lute uses the ratings from LWT (unknown, 1 to 5, then Well Known or Ignore). That's 7 choices, which I think is way too many.
Personally, when I'm reading, I'm really only thinking like this:
The current statuses are as follows:
(StID, StAbbreviation, StText)
(0, '?', 'Unknown'),
(1, '1', 'New (1)'),
(2, '2', 'New (2)'),
(3, '3', 'Learning (3)'),
(4, '4', 'Learning (4)'),
(5, '5', 'Learned'),
(99, 'WKn', 'Well Known'),
(98, 'Ign', 'Ignored');
I think these should be mapped as follows:
Changes needed:
There may be other places too. I think the numbers are hardcoded in a few places, but there are some predefined constants in src/Entity/Status.php.
NOTE: I don't know why I was so hung up on this at the time ... I could just choose to ignore the values I don't use :-)
maybe helps people using other tools.
Option: somehow toggle term selection so that people can click-drag-highlight to select terms. Currently click-drag creates multiword terms, could do something differently ??
While reading today, I had a word showing up as "known", but I couldn't find a reference for it (when I clicked "sentences"). Turns out a book had been archived, and the sentences had been removed. This is probably a relic of prior code that used to wipe sentences.
This will likely be an expensive operation (will have to re-parse any texts that don't have sentences), so maybe have a special section for one-time jobs, and only show this if there are texts that don't have sentences.
Will this blow up the size of the DB?
Description
RTL display is broken, though it's not completely LTR either. I'll provide Arabic text.
Without diacritics (which represent short vowels), words are shown correctly but in the wrong order (LTR instead of RTL). When I add the diacritics, the letters mostly stop connecting.
Another point: titles show completely fine, with the diacritics and in the correct order.
Reproduce
This is the file if you wanna try it on your system.
Edit: text from the file ... SORT OF ... even copying and pasting it here makes it behave strangely. The exclamation marks belong on the left of the first two lines. :-)
خالِد: صَبَاحُ الخَيرِ!
خُلُود: صَبَاحُ النُّور!
خالِد: كَيفَ حَالُكِ؟
خُلُود: بِخَيرٍ، وَأَنْتَ؟
خَالِد: فِي أَحْسَنِ حالٍ. ما اسمُكِ؟
خُلُود: اسمي خُلُودُ، وأَنتَ، ما اسمُكَ؟
خَالِد: أَنا خَالِد، تَشَرَّفْتُ بِلِقَائِكِ.
خُلُود: الشَّرَفُ لِي.
Screenshot of the text file:
Extra software info, if not already included in the Description:
I believe that @99MengXin already has Lute running with Docker on the Rasp Pi - in the Discord "install" channel ecurp_forp is having massive trouble getting started ... and I'm not sure what's wrong.
I have no idea what the issue is, and can't debug it or suggest changes, so maybe try setting up a Vagrant Box with a Rasp Pi OS, put Docker in there, install Lute in there, run it ... and watch the world explode. (Try using Vagrant box w/ Docker inside, and run Lute within Docker within Vagrant)
and the rest should be ... pretty straightforward. :-/ Not.
Is your feature request related to a problem? Please describe.
I suggest giving users the option to choose whether or not to use the book feature when they import text, e.g. lyrics.
Describe the solution you'd like
The default setting could be to not use the book feature, and users can click on ⚡️ to activate it when they decide to do so.
Describe alternatives you've considered
When importing a TXT file, the input is treated as a book. When importing via copy and paste in the Text column, the input is treated as a single page, but the ⚡️ option is kept.
Additional context
There is a dirty way currently: create a one-line text first, then update the text. You will see the whole text, plus the ⚡ for the book feature if you want to use it in the future.