Comments (11)
from vmcp-upconversion.
The update is scheduled to start at midnight, GMT, and would typically be finished about 22 minutes later, or a bit more, depending on how many documents had been updated that day. So there's plenty of scope for updating more frequently than daily.
The slowest steps are the Dropbox sync, which takes a second or so per document, and then converting each document from Word format to OpenDocument format, which can take up to several seconds for some documents. So I've optimised this part so that only the Word documents that have actually been edited since the last update are converted. This part should be completed in a few minutes if only a few dozen documents have changed.
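To illustrate the "only convert what has been edited" idea, here is a minimal sketch. It assumes the file modification time is what marks a document as edited, and that the time of the last update is stored somewhere as an epoch timestamp. The function name `changed_word_docs` and the `.doc`/`.docx` suffix filter are my own illustrative choices, not necessarily how the actual conversion decides.

```python
from pathlib import Path


def changed_word_docs(root: Path, last_update: float) -> list[Path]:
    """Return Word documents under `root` modified after `last_update`.

    `last_update` is an epoch timestamp (seconds). Using mtime as the
    "edited" signal is an assumption made for this sketch.
    """
    suffixes = {".doc", ".docx"}  # assumed Word extensions
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in suffixes
        and p.stat().st_mtime > last_update
    )
```

Only the paths returned would then be passed to the slow Word to OpenDocument step; everything else keeps its existing OpenDocument file.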
The conversion from OpenDocument to TEI is relatively quick; it gets through about 18 documents per second, so it takes about 15 minutes to do the lot. And rebuilding XTF's index takes only 3:41 (about 70 documents / second). So for these steps I haven't bothered to track which documents need processing, and I'm just doing the entire corpus. This also means I can change the OpenDocument to TEI conversion process, too, and know that it will be applied to all the documents, not just ones that have been edited.
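As a quick sanity check on those figures, the two rates can be multiplied out to see what corpus size each implies. This is only arithmetic on the rounded numbers quoted above; the helper name `implied_documents` is made up for the example.

```python
def implied_documents(docs_per_second: float, seconds: float) -> int:
    """Number of documents processed at a steady rate over a duration."""
    return round(docs_per_second * seconds)


# OpenDocument to TEI: about 18 documents/second for about 15 minutes.
tei_estimate = implied_documents(18, 15 * 60)

# XTF indexing: about 70 documents/second for 3 minutes 41 seconds.
index_estimate = implied_documents(70, 3 * 60 + 41)

print(tei_estimate, index_estimate)  # 16200 15470
```

Both estimates land at roughly 15 to 16 thousand documents, so the quoted rates and durations are consistent with each other to within rounding.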
from vmcp-upconversion.
An update at 1800 GMT would be the only other useful time at the moment: I always (or nearly always) break from about 1715 until 2100, so an update at that time would allow me to check the effect of an afternoon's work that evening.
from vmcp-upconversion.
Another conversion is now scheduled for 18:00 UTC.
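For anyone wanting to work out when the next run is due, here is a small sketch of the twice-daily schedule (midnight and 18:00 UTC). It does not reflect how the job is actually scheduled on the server, which this thread does not say; `RUN_TIMES` and `next_run` are names invented for the example.

```python
from datetime import datetime, time, timedelta, timezone

# Scheduled start times in UTC, as described in this thread.
RUN_TIMES = (time(0, 0), time(18, 0))


def next_run(now: datetime) -> datetime:
    """Return the first scheduled start strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for t in RUN_TIMES:
            candidate = datetime.combine(day, t, tzinfo=timezone.utc)
            if candidate > now:
                return candidate
    raise AssertionError("a run always exists within the next day")
```

For example, work saved at 17:15 UTC would be picked up by the 18:00 run the same day, while work saved at 18:30 waits until midnight.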
from vmcp-upconversion.
Noted, thanks.
There is no button showing to let me re-open the issue.
from vmcp-upconversion.
Haha yes, I thought that might be the case, because it was a pretty obvious button. Perhaps this is because you weren't officially a "collaborator" on the repository. I've invited you; let's see if that makes a difference. I thought I already had invited you, but now I realise I'd cancelled the invite, thinking that it wasn't necessary, and wanting to avoid complications. But being able to close and reopen issues is pretty important I think.
from vmcp-upconversion.
I looked at a couple of the 17 files I changed today. Interestingly, both the current version and a previous version are there in XTF.
Compare
http://vmcp.conaltuohy.com/xtf/view?docId=tei/1880-9/1885/85-12-03a-final.xml
and
http://vmcp.conaltuohy.com/xtf/view?docId=tei/1880-9/1885/85-12-03a-proofed.xml
I found this because I just happened to search for the file in the total corpus, not just in final as I usually do.
Does your algorithm detect deleted files?
I think this is what has happened: when Rod updates a file he does not just amend it and change the file name; he reloads the updated file with the suffix "final" and then deletes the earlier version. I then check that the styling and so on is correct by looking at the final version (and sometimes delete the previous version if he has not done so). If your algorithm does not detect the deletion, we will quickly see an escalation in the apparent number of files in the corpus! I had not looked carefully at that, but I think it has happened.
I can go back and explore the problem files (my guess is that there are approaching 200), but if I am correct about the explanation, that will not help, as I won't be able to delete them when I find them.
Advice?
from vmcp-upconversion.
You are correct. I will need to deal with deletions.
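One way to deal with deletions is to compare the generated TEI files against the current source documents and treat any TEI file with no matching source as stale. The sketch below assumes the TEI tree mirrors the source folder structure and that each TEI file has the same name as its Word source apart from the extension; both are assumptions, and `orphaned_tei` is a hypothetical helper, not the code actually used here.

```python
from pathlib import Path


def orphaned_tei(source_root: Path, tei_root: Path) -> list[Path]:
    """Return TEI files under `tei_root` with no matching Word source.

    Matching is by relative path with the extension removed, which
    assumes the TEI tree mirrors the source tree (an assumption).
    """
    live = {
        p.relative_to(source_root).with_suffix("")
        for p in source_root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".doc", ".docx"}
    }
    return sorted(
        t
        for t in tei_root.rglob("*.xml")
        if t.relative_to(tei_root).with_suffix("") not in live
    )
```

In the case reported above, a leftover "-proofed" TEI file would be returned once its Word source had been deleted, and could then be removed before XTF's index is rebuilt.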
from vmcp-upconversion.
I believe I've got deletions sorted now. Please reopen this issue if you spot any problems
from vmcp-upconversion.
Was there an update on Jan 18 at 18:00 UTC?
Changes to files where I was fixing layouts had not shown up by 22:00.
from vmcp-upconversion.
It seems the update did take place at 18:00 hours; here is a listing of the folder of TEI files which XTF is serving up:
ubuntu@vmcp:~$ cat /etc/timezone
Etc/UTC
ubuntu@vmcp:~$ ls -l /usr/src/xtf/data/tei/
total 48
drwxr-xr-x 4 root root 4096 Jan 18 18:03 1840-9
drwxr-xr-x 12 root root 4096 Jan 18 18:04 1850-9
drwxr-xr-x 12 root root 4096 Jan 18 18:08 1860-9
drwxr-xr-x 12 root root 4096 Jan 18 18:11 1870-9
drwxr-xr-x 12 root root 4096 Jan 18 18:15 1880-9
drwxr-xr-x 10 root root 4096 Jan 18 18:17 1890-6
drwxr-xr-x 2 root root 4096 Jan 18 18:17 Dallachy notes
drwxr-xr-x 2 root root 4096 Jan 18 18:17 Envelopes
drwxr-xr-x 7 root root 4096 Jan 18 18:19 inscriptions
drwxr-xr-x 11 root root 4096 Jan 18 18:19 Mentions
drwxr-xr-x 2 root root 4096 Jan 18 18:19 Misc
drwxr-xr-x 2 root root 4096 Jan 18 18:19 no date letters
So the updates do seem to be happening, even if only up to the point where the TEI is generated. However, the remainder of the processing is, I think, unlikely to fail. When I last ran the conversion process manually and watched the messages generated while it ran, it completed without a problem. Until now I haven't had a log of the messages generated when the conversion runs automatically on a schedule, but I have now tweaked the scheduled conversion process so that it keeps a log; this will make it easy to review the last update process: http://vmcp.conaltuohy.com/conversion-log.txt
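For illustration, keeping a log of a scheduled run can be as simple as wrapping the command and writing its output to a file. This is a generic sketch, not the tweak actually made on the server; `run_with_log` and its arguments are hypothetical, and the log is overwritten each time so that it always describes the last run.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_with_log(command: list[str], log_path: Path) -> int:
    """Run `command`, saving its combined output and exit status.

    The log file is replaced on each call, so it reflects only the
    most recent run.
    """
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
    )
    with log_path.open("w", encoding="utf-8") as log:
        log.write(f"started {started}\n")
        log.write(result.stdout)
        log.write(f"exit status {result.returncode}\n")
    return result.returncode
```

A non-zero exit status in the last line of the log would then show at a glance that a run failed somewhere after the TEI was generated.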
In the meantime, I'm going to close this issue again, and ask you to open a separate issue with specific details of what you are seeing (or not seeing). If it turns out the automatic updates are failing (at some point after generating the TEI), I'll re-open this issue or create a new one as appropriate.
Cheers!
from vmcp-upconversion.