Comments (6)
The "title" issue and the false positive for German are probably distinct issues.
The title that displays in
http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller%20letters/1850-9/1858/58-10-07a-final.xml
is disconcerting, as it is counterintuitive to have to click on [...] to open the file.
There are two others with the same feature in 1858 files:
http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller%20letters/1850-9/1858/58-11-00-final.xml
and
http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller%20letters/1850-9/1858/58-12-00a-final.xml
but I can't pick up a common feature of those files in Word.
Eight 1858 files coded as German do contain German text, and they behave as expected.
The three identified above are the only files in 1858 where this issue arises.
I did spot checks:
1840s folders: no such cases when German is selected, nor when English is selected.
1868 folder: one case of the problem when faceted as English, in a file that is in English,
http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller%20letters/1860-9/1868/68-12-06-draft.xml
BUT the same file appears with the same "title" when I select the German facet!!
So there is something odd about the recognition of languages also.
from vmcp-upconversion.
Resaving did not eliminate the "title" as [...]
The four files all report as being both English and German.
I do not understand this behaviour.
Thanks for your detective work here, @LucasHorseshoeBend; based on your analysis I have got to the bottom of the missing title and the language mis-classification, and they are indeed related, as you surmised.
The immediate cause of the problem in 58-10-07a-final is that the letter contains an empty trailing paragraph which is styled with the style t-letter.
The chain of events leading from that empty paragraph to the text being classified as German and also having a missing title is this:
- The paragraphs whose style names begin with t- are assumed to be translations (into English), and the remainder of the letter is, by elimination, assumed to be in German. So that single blank t-letter para is what causes all the text of the letter to be tagged as German.
- Because XTF irritatingly requires every TEI document to have a title, which these letters don't generally have, my pipeline has a step which adds a heading which it derives from the opening lines of the letter (an "incipit") suffixed with an ellipsis. In order to give the documents English language titles, the incipit includes only text which has been tagged as English, but in this case the only English language paragraph is that empty t-letter paragraph, so the incipit also ends up empty.
I think that I can fix the whole thing by a small tweak to the translation-recognition step so it ignores paragraphs which are styled as a translation if they are empty.
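For anyone following along, here is a rough Python sketch of the logic described above. It is not the pipeline's actual code: the names Para, tag_languages and incipit, the ignore_empty switch, the "[...]" suffix (guessed from the title shown in XTF) and the sample letter text are all invented for illustration.

```python
from dataclasses import dataclass

TRANSLATION_PREFIX = "t-"  # style-name prefix for translation paragraphs (from the thread)
ELLIPSIS = "[...]"         # assumed suffix, based on the title displayed in XTF


@dataclass
class Para:
    style: str
    text: str


def tag_languages(paras, ignore_empty=True):
    """Tag each paragraph 'en' or 'de' using the style-name heuristic.

    With ignore_empty=False an empty t- paragraph still counts as a
    translation, which reproduces the behaviour reported in this issue.
    """
    def is_translation(p):
        if not p.style.startswith(TRANSLATION_PREFIX):
            return False
        return bool(p.text.strip()) or not ignore_empty

    has_translation = any(is_translation(p) for p in paras)
    tagged = []
    for p in paras:
        if is_translation(p):
            lang = "en"   # translation into English
        elif has_translation:
            lang = "de"   # by elimination, the original is German
        else:
            lang = "en"   # no translation present, so the letter is English
        tagged.append((lang, p))
    return tagged


def incipit(tagged, max_words=4):
    """Build a title from the opening English words, suffixed with an ellipsis."""
    words = []
    for lang, p in tagged:
        if lang == "en":
            words.extend(p.text.split())
    return (" ".join(words[:max_words]) + " " + ELLIPSIS).strip()


# Invented sample: an English letter with an empty trailing t-letter paragraph.
letter = [
    Para("letter", "Dear Sir, thank you for the seeds."),
    Para("t-letter", ""),
]

buggy = tag_languages(letter, ignore_empty=False)
fixed = tag_languages(letter)
print(incipit(buggy))  # title is only the ellipsis
print(incipit(fixed))  # title starts with the opening words
```

With ignore_empty=False the sample letter is tagged as German apart from the empty paragraph, so the file reports both languages and the title collapses to the bare ellipsis. With the empty paragraph ignored, the whole letter is tagged English and the incipit is populated.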
Cheers @LucasHorseshoeBend; since it was a trivial tweak I just went ahead and did it.