
torik42 / yalafi

Yet another LaTeX filter

License: GNU General Public License v3.0

Python 93.99% TeX 0.47% HTML 0.24% Shell 0.61% Vim Script 3.24% sed 1.45%
latex filter languagetool python-3 parser html-report vim emacs

yalafi's People

Contributors

blipp, juliangoeltz, matze-dd, mfbehrens99, mstmob, symphorien, torik42

yalafi's Issues

Add submodule for \documentclass

We should add a submodule yalafi.documentclasses, together with options --dcls and --documentclass for yalafi and yalafi.shell, respectively.

Provide output option for single-line messages

There seems to be a Vim issue with handling of multi-byte characters, if Vim's errorformat is used to parse multi-line messages. See here for the bug description.
This causes problems with Vim plugins, e.g. the compiler vlty for vimtex, if the LaTeX text contains many multi-byte characters, as in Russian texts with cyrillic letters. See here for an example.

Even if the Vim issue is fixed, it will take some time for the fix to be incorporated into Linux distributions. We therefore should provide a single-line format for option --output, for instance

  • --output sl-1: include file name, line, column, LT's error message text
  • --output sl-2: as sl-1, but append LT's replacement suggestions

This option should be used in editors/vlty.py and editors/ltyc.py.
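A minimal sketch of how such a single-line message could be assembled. The function name, the separator layout and the 'Suggestions:' label are assumptions for illustration, not an agreed format.

```python
def format_single_line(filename, line, column, message, suggestions=None):
    """Format one proofreader match as a single line (hypothetical layout)."""
    # Collapse embedded line breaks so that the result really is one line.
    text = ' '.join(message.split())
    out = f'{filename}:{line}:{column}: {text}'
    if suggestions:
        # Variant 'sl-2': append the replacement suggestions.
        out += ' Suggestions: ' + ', '.join(suggestions)
    return out
```

A line of this shape could be matched by a single errorformat pattern such as `%f:%l:%c: %m`.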

Support multi-language documents

A way with -- hopefully -- only moderate intrusion could be as follows.

YaLafi core

  • Macros for language change, such as \selectlanguage, leave special tokens that indicate the switch of language in the expanded token list.
  • The parser switches language-dependent settings, such as maths replacements, when it sees such a special token.
  • Currently, the function utils.get_txt_pos() transforms the expanded token list into a string and a character position map at the very end. This would be changed so that it produces separate strings and position maps for each detected language, using the special tokens from above.
  • Altogether, this should even allow nested language changes and changes, e.g., inside of \text in equations.
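The splitting step could look roughly as follows. This is only a sketch: the marker class LangSwitch and the function split_by_language() are hypothetical, tokens are plain strings here, and the position map records token indices rather than the character positions the real filter tracks.

```python
from dataclasses import dataclass


@dataclass
class LangSwitch:
    """Hypothetical marker token: the following text belongs to 'lang'."""
    lang: str


def split_by_language(tokens, initial_lang):
    """Return {lang: (text, posmap)}; posmap[i] is the index of the
    token that produced character i of text."""
    parts = {}
    lang = initial_lang
    for index, tok in enumerate(tokens):
        if isinstance(tok, LangSwitch):
            lang = tok.lang          # switch language, emit no text
            continue
        chunks, posmap = parts.setdefault(lang, ([], []))
        chunks.append(tok)
        posmap.extend([index] * len(tok))
    return {k: (''.join(chunks), posmap)
            for k, (chunks, posmap) in parts.items()}
```

Nested language changes would work with this scheme, provided the macro handlers emit a second marker that switches back to the surrounding language.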

Yalafi.shell

  • It sends multiple requests to LanguageTool.
  • As currently, messages are sorted according to their occurrence in the LaTeX text.

We should implement some heuristic that decides whether a language change by \foreignlanguage really breaks the text flow of the surrounding language, or whether it is rather short. In the latter case, it should be substituted by a placeholder in the text flow of the surrounding language (as is done with inline formulas), continuing its sentence / paragraph.

There will remain at least one bug. The scanner currently is initialised with a language code that, for instance on 'de', detects "' as a special token in German texts. Since the scanner first scans the whole LaTeX text, this won't be changed inside of \foreignlanguage{english}{"'}.

Improve error recovery on missing braces / brackets

In the following snippet, the closing ] is missing.

\section[1}{Title}
This is a a text.

Currently, this correctly produces an error indication from the LaTeX filter. However, the text after [ is skipped, as it is consumed when trying to find the closing ].
This unnecessarily hides mistakes, such as 'a a' here.
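One possible heuristic is to give up the search for ] early when the input looks broken. The sketch below works on a plain string instead of the token list, and the function name and stop conditions (an unbalanced closing brace or a paragraph break) are assumptions.

```python
def find_closing_bracket(text, start):
    """Return the index of the ']' that closes text[start] == '[',
    or -1 if the search should be abandoned."""
    braces = 0
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == '{':
            braces += 1
        elif ch == '}':
            braces -= 1
            if braces < 0:
                return -1        # unbalanced '}': bracket is never closed
        elif ch == ']' and braces == 0:
            return i
        elif text.startswith('\n\n', i):
            return -1            # do not search beyond the paragraph
    return -1
```

On a result of -1, the caller could report the error and resume parsing directly after the opening bracket, so that the following text still reaches the proofreader. Escaped braces like \{ are ignored in this sketch.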

"Brute force" modification of LaTeX code

In a one-file document, the LaTeX filter sees the preamble that often contains macros not properly handled "out-of-the-box". Currently, the LaTeX text can only be modified with special macros \LTadd, \LTskip, \LTalter.
Placing the critical parts of the preamble in an \LTskip{...} may fail, as this changes the environment seen by the TeX system.

  • Add special comments like %%LT_SKIP_BEGIN%% and %%LT_SKIP_END%% that cause the filter to skip all input between these marks.
  • Add a section to README.md that collects all available means to manipulate the LaTeX and the plain text.
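The skip comments could be handled in a preprocessing pass before scanning. A minimal sketch, assuming that the marks stand alone on their lines; the function name is hypothetical. Skipped lines are replaced by empty comment lines, so that line numbers stay valid and no paragraph break is introduced.

```python
SKIP_BEGIN = '%%LT_SKIP_BEGIN%%'
SKIP_END = '%%LT_SKIP_END%%'


def blank_skipped_regions(latex):
    """Replace the marks and everything between them by '%' lines."""
    out = []
    skipping = False
    for line in latex.split('\n'):
        stripped = line.strip()
        if stripped == SKIP_BEGIN:
            skipping = True
            out.append('%')
        elif stripped == SKIP_END:
            skipping = False
            out.append('%')
        else:
            out.append('%' if skipping else line)
    return '\n'.join(out)
```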

Provide option for simpler parsing of displayed equations

The current parsing of displayed equations assumes a certain style of writing formulas that may be inappropriate. On the other hand, at least the inclusion of a placeholder like 'V-V-V', together with trailing punctuation extracted from the formula, does help the proofreading program. (Even if this omits text parts included, e.g., with \text.)

We could provide options for yalafi and yalafi.shell that switch to this simpler mode for all equation environments, which have been registered with EquEnv() in the fully detailed mode. The replacement like 'V-V-V' should be taken from a rotating collection.
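A rough sketch of the simple mode, operating on the raw body of an equation as a string. The function name and the punctuation set are assumptions; a real implementation would work on tokens and would have to look through trailing material such as nested \end{...}.

```python
import itertools
import re


def replace_equation(body, placeholders):
    """Replace a displayed equation by the next placeholder, keeping
    punctuation found at the very end of the equation body."""
    match = re.search(r'([.,;:!?])\s*$', body)
    punctuation = match.group(1) if match else ''
    return next(placeholders) + punctuation


# Rotating collection of replacements (example values).
placeholders = itertools.cycle(['U-U-U', 'V-V-V', 'W-W-W'])
```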

See the discussion in Issue #83.

Emacs kills its own LT server

If one uses Emacs via the call of script yalafi-emacs, then a newly started LT server is stopped immediately at the end of a language check. Therefore, usage of an LT server rather decreases speed, as starting the server is more expensive than starting the command-line LT tool. (This does not happen with Vim and script yalafi-grammarous: subsequent checks are really faster with a server.)

Probably, the new LT server process has to be "detached" somehow from the sub-process that is started by Emacs to perform a language check.
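With Python's subprocess module, detaching might be achieved by starting the server in a new session. Whether this is sufficient for the Emacs case has not been verified; the function name is hypothetical.

```python
import subprocess
import sys


def start_detached(command):
    """Start 'command' (a list of strings) in its own session, with
    standard streams disconnected from the calling process."""
    kwargs = dict(stdin=subprocess.DEVNULL,
                  stdout=subprocess.DEVNULL,
                  stderr=subprocess.DEVNULL)
    if sys.platform == 'win32':
        kwargs['creationflags'] = (subprocess.DETACHED_PROCESS
                                   | subprocess.CREATE_NEW_PROCESS_GROUP)
    else:
        # POSIX: the child calls setsid() and thus leaves the process
        # group of the sub-process started by the editor.
        kwargs['start_new_session'] = True
    return subprocess.Popen(command, **kwargs)
```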

Macro(): problem with parameter 'extract'

Method Parser.expand_arguments() has to be revised. In case a macro has set parameter extract, the behaviour may be incorrect, if repl is a function or if opts is not empty.

Multi-language documents: scanner is fixed for initial language

For instance, in an English document (--language en-GB), the input

\foreignlanguage{german}{"'}

produces

"'

Instead, the German right double quotes should result.

It is unlikely that we can fix that in the near future. For a real solution, scanner and parser would have to be reorganised to work in a pipelined way. Currently, the scanner first reads the whole input text and produces a token list, on which the parser then operates.

EDIT. A probably simpler solution:

  • Do not detect "' etc. in the scanner.
  • Let the parser process " as 'active character' in German text parts, as in TeX.

Use LanguageTool installed by package manager

On some systems, LanguageTool can be installed via a package manager. This places an executable Bash script languagetool in the standard search path that dispatches to the different software components. Compare, for instance, Issue #19.
At least under Arch Linux, the script also can start an LT server (option --http).

It seems reasonable to somewhat redefine option --lt-command ... for this case. We simply would ignore an option value in --lt-directory ... (just set it to current directory), and call languagetool --http ... in case a local server has to be used (--server my).

This then also could be integrated as option into the Vim interface scripts.

positional argument of figure environment

Hi,

let me thank you at first for this great project and especially the good vim and languagetool integration.

I've noticed that yalafi leaves the positional argument of the figure environment in.

Input:

\begin{figure}[h]
    \centering
    \includegraphics[width=0.7\linewidth]{./image.png}
\end{figure}

Output: [h]

I'm using Yalafi version 1.1.1 (from pip) with the Languagetool Plugin in vim.

Best Regards
Max

Yalafi.shell with HTTP server: wrong diagnostics on unknown language

Assume that yalafi.shell is used with '--server my', and an unknown language like '--language en-XX' is specified. Then after several attempts, the tool concludes that it cannot contact or start the LT server, since the request with the given language is not successful.

Furthermore, on a 32-bit system with unsupported German for higher LT versions, yalafi.shell terminates in a way it shouldn't.

When sending the HTTP request, we have to examine the reason for a failure more closely.
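If the request is sent with urllib, the exception type already separates both cases. The sketch assumes that the server answers an unknown language with an HTTP error status; the function name and the returned labels are hypothetical.

```python
import urllib.error


def classify_failure(error):
    """Distinguish 'server replied with an error' from 'no server'."""
    if isinstance(error, urllib.error.HTTPError):
        # A server is running, but it rejected the request,
        # e.g. due to an unknown language code.
        return 'server-replied-error'
    if isinstance(error, OSError):
        # URLError and connection errors: nobody is listening.
        return 'server-unreachable'
    return 'unknown'
```

Only in the 'server-unreachable' case would it make sense to try starting a local server.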

Server with LanguageTool's interface?

It might be practical to have a small server on top of yalafi.shell that pretends to be LT's server, but additionally performs LaTeX filtering and position mapping.

Add vowel detection for text replacements

We use simple rotating replacements for maths material, as 'C-C-C'. Similarly, short foreign-language inclusions will be substituted this way in the surrounding text flow. This may cause false positives in English documents, if the replaced text part starts with a vowel, and is preceded by the article 'an'.

When the substituted text starts with a vowel, we should use a replacement starting with a vowel.
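A sketch with two rotating collections. The class name and the placeholder values are assumptions, and the check only looks at spelling, which is an approximation ('hour' or 'user' would be misjudged).

```python
import itertools


class Placeholders:
    """Rotating replacements, chosen by the first letter of the
    replaced text."""

    def __init__(self,
                 consonant=('B-B-B', 'C-C-C', 'D-D-D'),
                 vowel=('A-A-A', 'E-E-E', 'I-I-I')):
        self._consonant = itertools.cycle(consonant)
        self._vowel = itertools.cycle(vowel)

    def next_for(self, replaced_text):
        first = replaced_text.lstrip()[:1].lower()
        if first and first in 'aeiou':
            return next(self._vowel)
        return next(self._consonant)
```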

Provide byte offsets for certain Vim plugins

Vim-grammarous and ALE internally use Vim function matchaddpos() for error highlighting. This function expects byte, rather than character offsets for column numbers.
If character offsets are given, then highlighting may be shifted, see Issue #89@vim-grammarous.

  • vim-grammarous: add output option xml-b to yalafi.shell (vim-LanguageTool expects character offsets)
  • ALE: account for that in the planned JSON interface

EDIT: in ALE, one can simply set 'vcol': 1 in the linter component.
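The conversion itself is small. A sketch, assuming 1-based columns and a UTF-8 encoded buffer; the function name is hypothetical.

```python
def char_col_to_byte_col(line, char_col, encoding='utf-8'):
    """Convert a 1-based character column into a 1-based byte column."""
    return len(line[:char_col - 1].encode(encoding)) + 1
```

For the line 'Это тест', character column 5 corresponds to byte column 8, since each Cyrillic letter occupies two bytes in UTF-8.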

Some tests are not run

This seems to depend on the test function name. For instance, we had to change test_macros_latex() to test_macros_latex_builtins() in file tests/test_packages/test_latex_builtins.py.

Need to check that.

Macro to read definitions for YaLafi

It would be practical to have a macro like

\LTmacros{defs.tex}

It should expand to nothing, but read the LaTeX text in the given file and append its macro definitions to the current list.
At first sight, this might be possible with a "handler function" in yalafi/handlers.py.
A problem could be, however, character position tracking.

AttributeError: 'NoneType' object has no attribute 'lin' on tex-file

I run vlty via Vim plus vimtex on a tex-file with a quite complex template. As a result, I get this error:

AttributeError: 'NoneType' object has no attribute 'lin'

Run command

python3 -m yalafi.shell --lt-command languagetool --language ru \
        --disable "WHITESPACE_RULE" --enable "" \
        --disablecategories "" --enablecategories "" \
        --documentclass "" \
        --packages "amsbsy,keyval,fontenc,enumerate,eufrak,calc,inputenc,url,amsopn,babel,bm,amssymb,epstopdf-base,trig,amsmath,revsymb,amsthm,array,longtab le,caption2,natbib,amstext,mathtext,graphics,amsgen,graphicx,ifthen,caption3,extsizes,amsfonts" \
        --encoding cp866 \
        my_article.tex

Output:

=== my_article.tex
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amsbsy'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.keyval'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.fontenc'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.enumerate'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.eufrak'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.calc'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.inputenc'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.url'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amsopn'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.babel'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.bm'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amssymb'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.epstopdf_base'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.trig'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.revsymb'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.array'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.longtab_le'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.caption2'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.natbib'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amstext'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.mathtext'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.graphics'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amsgen'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.ifthen'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.caption3'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.extsizes'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.amsfonts'
*** yalafi.shell: warning:
*** could not load module 'yalafi.documentclasses.revtex4'
*** yalafi.shell: warning:
*** could not load module 'yalafi.packages.enumerate'
Expected text language: Russian
Working on STDIN...
=== my_article.tex ===
1.) Line 18, column 20, Rule ID: UPPERCASE_SENTENCE_START
Message: Это предложение не начинается с заглавной буквы
Suggestion: Maik
 maik   Гипотеза Лемма    russian  О конечности чи...
 ^^^^

=== my_article.tex ===
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.8/site-packages/yalafi/shell/__main__.py", line 3, in <module>
    from . import shell
  File "/usr/lib/python3.8/site-packages/yalafi/shell/shell.py", line 351, in <module>
    gentext.generate_text_report(proofreader.run_proofreader, sys.stdout)
  File "/usr/lib/python3.8/site-packages/yalafi/shell/gentext.py", line 93, in generate_text_report
    output_text_report(tex, plain, charmap, matches, file, out)
  File "/usr/lib/python3.8/site-packages/yalafi/shell/gentext.py", line 48, in output_text_report
    s = (str(nr) + '.) Line ' + str(lc.lin) + ', column ' + str(lc.col)
AttributeError: 'NoneType' object has no attribute 'lin'

Options from --lt-options ignored on --as-server

On --as-server, all entries in --lt-options are ignored.
For example, this is wrong, if languagetool-commandline is internally used (no --server given), and something like --languagemodel should be passed to LT.

yalafi.shell - internal error: error reading JSON output from proofreader

This problem may occur with LanguageTool 5.0 and 5.1; it has been fixed with the daily snapshot from 2020/10/26 (see here). Versions 4.9.1 and below work well.

Apparently, only German texts are affected.

The problem is provoked with an input like

auf$\Omega$

The plain text sent to LanguageTool is aufC-C-C, and LanguageTool replies with an invalid JSON message.

This is due to the issue here. It has been fixed with the daily snapshot 2020-10-26.

Add setup.py to the repository.

setup.py should be in the repository, so that one can install yalafi using pip directly from github with pip install --user git+https://github.com/matze-dd/YaLafi.git@master.

Error with Python2: unqualified exec is not allowed in function ...

When using yalafi-grammarous, I had to change python to python3, as python is Python 2 on my system. Otherwise, I get the error message:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/huecking/.config/nvim/plugged/YaLafi/yalafi/shell/__main__.py", line 3, in <module>
    from . import shell
  File "yalafi/shell/shell.py", line 124, in <module>
    from yalafi import tex2txt
  File "yalafi/tex2txt.py", line 29, in <module>
    from . import parameters, parser, utils
  File "yalafi/parameters.py", line 23, in <module>
    from .defs import Environ, EquEnv, Macro
  File "yalafi/defs.py", line 19, in <module>
    from . import utils
  File "yalafi/utils.py", line 119
    exec('import ' + mod)
SyntaxError: unqualified exec is not allowed in function 'get_module_handler' because it contains a nested function with free variables

Maybe this could be changed in the code? Otherwise, this issue may just help people to know what they have to change. Maybe it could be mentioned in the docs that it works only with python3.
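The particular syntax error could be avoided by importing with importlib instead of exec(). This is only a sketch of that single call; the function name is hypothetical, and other parts of the code may still require Python 3.

```python
import importlib


def load_module(name):
    """Import a module given its dotted name, without using exec()."""
    return importlib.import_module(name)
```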

Add CLI options for LT

For simpler configuration of editor interfaces, we should add to yalafi.shell options --enable, --disablecategories, --enablecategories.

Harmonise hack for error localisation in simple macro?

For an input like

\newcommand{\xxx}{XXERR}
\xxx

the spelling error is related to '\xxx' in an HTML report, due to a small hack already implemented in Tex2txt/shell.py.
For the other formats, only the single leading backslash of '\xxx' is given as error location.
It would be nice to activate the HTML-report hack in other cases, except for the plain-text report.

Provide option to change ltcommand

In yalafi/shell/shell.py, the LanguageTool command is hardcoded, which works well if you directly download LanguageTool. I installed it via my package manager, which gives me a binary instead of a JAR file in /usr/bin, and other files are located in /usr/share/languagetool/. In order to get it working, I had to make the following change in yalafi/shell/shell.py

# ltcommand = 'java -jar languagetool-commandline.jar --json --encoding utf-8'
ltcommand = 'languagetool --json --encoding utf-8'

I was then able to generate a report by using the following command

python -m yalafi.shell --lt-directory /usr/share/languagetool/ --output html draft.tex > draft.html

It would be nice to provide a command line option to change ltcommand as well.
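Such an option could be added with argparse. A sketch, assuming the option is called --lt-command and that the trailing LT options stay fixed; the function name and the default value are illustrative.

```python
import argparse
import shlex

DEFAULT_LT_COMMAND = 'java -jar languagetool-commandline.jar'


def build_lt_command(argv):
    """Return the LT command line as an argument list."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--lt-command', default=DEFAULT_LT_COMMAND)
    args = parser.parse_args(argv)
    # Split the base command like a shell would, then append fixed options.
    return shlex.split(args.lt_command) + ['--json', '--encoding', 'utf-8']
```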

Avoid unnecessary test requests to LT server

When using yalafi.shell for an editor interface, then a local LT server is used on option --server my. In this case, for each invocation of yalafi.shell by the editor, we first send a small test request in order to check for a running server.
Instead, we should send the real request immediately, and try to start the local server only afterwards, if nobody is responding.
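The control flow could be sketched as follows. Both callables are placeholders for the actual request and server start-up code, and the retry parameters are assumptions.

```python
import time


def request_with_lazy_start(send_request, start_server,
                            retries=5, delay=1.0):
    """Send the real request first; start the server only on failure."""
    try:
        return send_request()
    except OSError:
        # Nobody is responding: now try to start the local server.
        start_server()
    for attempt in range(retries):
        try:
            return send_request()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)    # give the server time to come up
```

In the common case of an already running server, this needs exactly one request per invocation.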

Integration via Docker?

For Vim and Emacs usage, putting all the components together might be rather complex.
We should check whether integrating things with Docker makes life easier.

Extension mechanism for standard LaTeX packages

Currently, only a subset of commonly used LaTeX macros and environments is recognised "out-of-the-box". In order to ease application, we could add a submodule, say yalafi.packages, that contains further Python files with definitions like in example [definitions.py]. For a LaTeX package, the corresponding file would provide initial versions for important macros and environments.

These "subsubmodules" could be activated via command-line option for yalafi and yalafi.shell, for instance ... --packages amsmath,hyperref.

EDIT
Additionally, one could provide an option --root-document file for yalafi.shell. The script would read that file and extract package information from \documentclass and \usepackage.

In both cases, it is important to ensure a proper evaluation order. First, macros and environments from yalafi/parameters.py, then from package extensions, finally from user-declarations given by --define and --python-defs. This has to be independent of declaration method, LaTeX or Python code.
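The intended precedence can be illustrated with plain dictionaries that map macro names to definitions. This is a simplification; the function name and the data layout are assumptions, not the structures used in yalafi/parameters.py.

```python
def merge_definitions(builtin, packages, user):
    """Merge definitions so that later sources override earlier ones:
    built-ins, then package extensions in order, then user declarations."""
    merged = {}
    for source in (builtin, *packages, user):
        merged.update(source)
    return merged
```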

lt_command not working with vimtex

Hi, thanks for this great tool!

I noticed the recent PR #62 and issue #60, allowing users to specify an alternative languagetool command. I replaced the original vlty.vim in vimtex with the one that comes with YaLafi to try it. Based on the README, I commented out g:vimtex_grammar_vlty.lt_directory and only specified g:vimtex_grammar_vlty.lt_command. However, vimtex then complained that the lt_directory path is not valid.

I'm not familiar with vim script, but I think it has something to do with the checking code here in vlty.vim?

https://github.com/matze-dd/YaLafi/blob/f932ce9f78524d7b5ba7b6b55eea7370297a0600/editors/vlty.vim#L39-L43

bibitem isn't processed as expected

Suppose we have bibliography in our tex-file:

\begin{thebibliography}{99}
  \bibitem{paper_name}
\end{thebibliography}

yalafi will process paper_name as a sentence, not as a technical label. So one sees an error on that line.

Glossaries commands support

Could the glossaries / glossaries-extra packages commands be supported?
e.g. \acr, \gls, \glspl etc commands

I know you've written some documentation on how to extend and support different packages, but I haven't got the time currently to do it myself (though hopefully, I'll revisit this issue at some time). Thanks for YaLafi, it's really great!

Incorrect filtering of \\[length]

The hard linebreak \\[length] with vertical skip is not removed. Tex2txt did that correctly :-(.

EDIT: The first fix version removes a linebreak after \\, if no [...] is following. We try to avoid that.
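The intended behaviour can be illustrated with a regular expression, although the actual filter works on tokens. A sketch, with a hypothetical function name: white space after \\ is only consumed if a bracketed length really follows.

```python
import re

# '\\', optionally followed by white space and '[...]'. Without a
# following '[', the white space (including a line break) is kept.
_LINEBREAK = re.compile(r'\\\\(?:\s*\[[^\]]*\])?')


def remove_hard_linebreaks(text):
    """Remove \\\\ and \\\\[length], keeping the source line breaks."""
    return _LINEBREAK.sub('', text)
```

The starred form \\* and paragraph breaks between \\ and [ are not treated in this sketch.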

Unexpected WORD_REPEAT_RULE on big multiline formulas with aligned

I have a LaTeX file with math formulas. For example, I have this equation:

\begin{equation}
  \begin{aligned}
    f_{13}(x, z) = & x^{3}+ \frac{1}{12} \left(-24 z^{4} + 72 z^{3} - 70 z^{2} + 112 z - 76\right) x^{2} + \\
 & +\frac{1}{11} \left(2877 z^{4} - 9184 z^{3} + 13080 z^{2} - 23436 z + 24318\right) x + \\
 & +\frac{1}{4} \left( 10224 z^{4} - 31451 z^{3} + 46509 z^{2} - 83811 z + 80129 \right).
  \end{aligned}
\end{equation}

and yalafi yields WORD_REPEAT_RULE on this line:

=== my_file.tex ===
28.) Line 547, column 5, Rule ID: WORD_REPEAT_RULE
Message: Возможная опечатка: повтор слова
Suggestion: U-U-U
...ветствует B-B-B, и этим значениям отвечает   U-U-U U-U-U     plus V-V-V     plus W-W-W.  Разложение э...
                                                ^^^^^^^^^^^

Error highlighting with vim-LanguageTool may fail

For highlighting of errors in the source text buffer, vim-LanguageTool does not directly use the error location returned by the proofreader. Instead, it tries to identify the reported problematic text part at the line given by the proofreader (application of Vim's matchadd()). This fails for

\newcommand\books{books}
A \books{} are interesting.

The mistake 'A books' is correctly reported for line 2, but it cannot be highlighted in the source text buffer.

Support for macros like \newtheorem

With a minor reorganisation of method Parser.init_package(), we could provide an interface for handler functions of LaTeX macros. For instance, it would allow them to add an environment type (possibly with its own handler function).
This way, we could easily implement something like \newtheorem.

Harmonise treatment of maths macros

  • Some macros like \quad are declared twice in yalafi/parameters.py, both for text and math mode. As the maths parser expands "normal" macros before simplification of maths material, a single definition in text mode should suffice, e.g., \newcommand{\quad}{\;}.
  • Some macros like \medspace from package amsmath should only be loaded by a package extension, compare Issue #28.
  • The same applies to \text in Parameters.math_text_macros as well as to equation and theorem environments.

Better treatment of errors for editor plug-in

When using the script for an editor plug-in, we should not harshly terminate if a LaTeX error occurs (like missing end of an equation).
Instead, we could try to recover somehow and include a descriptive message (containing a bold spelling error) in the text sent to the proofreader. This would generate a "normal" proofreader message that is hopefully placed at the right position in the text.
