distributedproofreaders / ppwb: Post Processor's Workbench
License: GNU General Public License v3.0
Please add this option, with an edit box to enter CSS text to pass.
The pptools site provides an example.
It's common DP practice to represent table borders with | in the text version of a book. In HTML, borders can be created with CSS, so the pipe characters are usually removed. When comparing the two versions with ppcomp, this sometimes leads to hundreds of lines of diffs to look through, and finding any genuine diffs that need fixing is a needle-in-a-haystack level of difficulty.
The example that broke me today: https://www.pgdp.net/d/ppwb/r63aebbc4c881a/result.html
There are at least two genuine diffs in that report. Good luck. ;)
So: I'd really like to be able to run ppcomp with an option to ignore the pipe character, to make the output on table-heavy books like this one much shorter and much easier to review.
(Bonus points for also excluding sequences that look like --+----+-- or ==+====+==, but those occur much less frequently and aren't so much trouble to scroll past.)
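A minimal sketch of what such an option might do, assuming the plain text is normalized before it is diffed against the HTML. The function name strip_table_borders and the border pattern are hypothetical and are not taken from ppcomp:

```python
import re

# Hypothetical pattern: a line built only from border characters that
# contains at least one "+", e.g. --+----+-- or ==+====+==
BORDER_LINE = re.compile(r"^\s*[-=+|]*\+[-=+|]*\s*$")


def strip_table_borders(text: str) -> str:
    """Drop horizontal rule lines and remove pipe characters elsewhere."""
    kept = []
    for line in text.splitlines():
        if BORDER_LINE.match(line):
            continue  # horizontal table rule: skip the whole line
        # Replace pipes with spaces, then collapse runs of whitespace
        line = re.sub(r"\s+", " ", line.replace("|", " ")).strip()
        kept.append(line)
    return "\n".join(kept)


sample = "| Name | Qty |\n--+----+--\n| Nails | 12 |\n==+====+=="
print(strip_table_borders(sample))
```

Collapsing whitespace after removing the pipes is a design choice made here so that cell padding does not produce diffs of its own; a real option might prefer to leave spacing alone.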
The attached html and text files cause the following error log:
Traceback (most recent call last):
  File "./bin/comp_pp.py", line 1621, in <module>
    main()
  File "./bin/comp_pp.py", line 1616, in main
    _, html_content, fn1, fn2 = x.do_process()
  File "./bin/comp_pp.py", line 1338, in do_process
    f.load(fname)
  File "./bin/comp_pp.py", line 694, in load
    self.myfile.load_xhtml(filename, relax=True)
  File "./bin/comp_pp.py", line 223, in load_xhtml
    os.path.basename(name))
SyntaxError: Parsing errors in document: lettersfromandoldtimesalesman.html
The HTML file validates OK at W3C.
The old standalone ppcomp had a way of recognizing that square brackets around footnote anchors in plain text should not be reported (when there aren't corresponding brackets around superscripted anchors in HTML). Can an option for that be added to this version?
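As a rough illustration of the requested behaviour, the brackets could be removed from the text side before comparison. This is only a sketch: the function name unbracket_footnote_anchors and the assumption that anchors are numbers or single capital letters are mine, not taken from either version of ppcomp:

```python
import re

# Hypothetical pattern: a bracketed footnote anchor such as [1], [23] or [A].
# Longer bracketed text such as [Illustration] is left untouched.
ANCHOR = re.compile(r"\[(\d+|[A-Z])\]")


def unbracket_footnote_anchors(text: str) -> str:
    """Rewrite [12] as 12 so it matches an unbracketed HTML anchor."""
    return ANCHOR.sub(r"\1", text)


print(unbracket_footnote_anchors("as noted[12] in the plate [Illustration]"))
```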
Not sure where the problem is; the current ppcomp.py on GitHub is correct, and the PHP code looks good to me. If I locally change a </p> to a </div> and redirect output to a file, I get:
((5731, 44), 'end-tag-too-early', {'name': 'div'})
But on the web site I just get:
`Whoops! Something went wrong and no output was generated. The error message was
For more assistance, ask in the discussion topic and include this identifier: r6268af6feae7d`.
Request
Change the pptext validation to allow no language to be selected as long as spellcheck is also not selected. Bonus: allow pptext to dynamically load the list of available dictionary languages and populate the checkboxes.
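The requested rule can be sketched as follows. This is only an illustration of the logic in Python; the function name, parameters, and error text are hypothetical, and the Workbench's actual form validation may live elsewhere (for example in its PHP front end):

```python
def validate_pptext_options(languages: list[str], spellcheck: bool) -> list[str]:
    """Return a list of validation errors; an empty list means the form is OK."""
    errors = []
    # A language is only required when spellcheck is actually requested.
    if spellcheck and not languages:
        errors.append("Select at least one wordlist language to run spellcheck.")
    return errors


# No language and no spellcheck: accepted under the requested rule.
print(validate_pptext_options([], spellcheck=False))
# Spellcheck without a language: still rejected.
print(validate_pptext_options([], spellcheck=True))
```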
Discussion
The following discussion was pulled from #9
[U]nder "Select wordlist language(s)", is the implication that pptext should not be run if a text is in a language not listed? One of my test projects is in Dutch, which is not listed. Even if spellcheck is not possible because of a missing dictionary, I would think that some of the other checks (excluding jeebies) would be useful. Could an "Other non-English" option be added that would disable spellcheck unless a good words file is provided?
I think it's fine to run pptext if it's not a language on the list. Perhaps a user would not want to tick the "run spellcheck" box in some cases. And it's not really right to me to have a list anyway because there are perfectly good dictionaries loaded that do not have a checkbox. Wouldn't surprise me if Dutch were actually available.
There is an issue with the spell check for Portuguese, at a minimum, and possibly other non-English languages. When you run it with Portuguese checked as your only language, the spell check only returns words starting with 'a'. If you also add English, it will spell check the other letters. Laura Natal reported on the DP forums that doing this with Spanish as a secondary language will also resolve the issue, although I have not personally tried that.
There are some more details here from my original question about it and her replies: https://www.pgdp.net/phpBB3/viewtopic.php?p=1293475#p1293475
Since I'm working on another book today and realized it still wasn't fixed, I figured I'd log an official ticket, including the files I'm using right now.
good_words.txt
projectID62c58d2177033.txt
I can provide some more sample files if needed.
It is working from the command line, but not on either the test site or the main site. I am also selecting "add [Illustration ] tag" and "Ignore case".
My command line: "python3 ppcomp.py ../tests/tower.htm ../tests/tower.txt --ignore-case --css-add-sidenote --css-add-illustration >../tests/compare.html"
tower.zip
Why are paragraphs which end in a period, question mark, or exclamation mark followed by Right Single Quotation Mark considered unexpected?
For example these 3 on my current output:
...“‘Well, youngster, what are you looking for here?’
...got yourself into a scrape with your meddlesome disposition.’
...“‘I am no beggar!’
Not an urgent issue.
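For reference, an ending check that accepts these cases could look like the sketch below. The pattern and the name ends_as_expected are assumptions for illustration and do not reflect how pptext currently decides what counts as unexpected:

```python
import re

# Hypothetical rule: sentence-ending punctuation, optionally followed by
# one or more closing quotation marks (curly or straight).
PARA_END = re.compile(r"[.?!][’”'\"]*$")


def ends_as_expected(paragraph: str) -> bool:
    """True if the paragraph ends in . ? or ! plus optional closing quotes."""
    return bool(PARA_END.search(paragraph.rstrip()))


for para in ("“‘I am no beggar!’", "with your meddlesome disposition.’", "and then he said,"):
    print(ends_as_expected(para), para)
```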
The option "Type of text cleaning:" can be removed; there is no code to support it.
I'm working on a major upgrade to ppcomp to handle HTML5 & other changes, and noticed this.
When the "Extract and process footnotes separately" option is selected, the Workbench's ppcomp returns this error message:
`Whoops! Something went wrong and no output was generated. The error message was
For more assistance, ask in the discussion topic and include this identifier: r625483e56d486`
(There is no error message.) The attached files will demonstrate this when these options are selected:
Ignore case when comparing
Extract and process footnotes separately
Suppress "[Illustration:" marks