
Daily updates (vmcp-upconversion issue, closed)

conal-tuohy commented on June 4, 2024
Daily updates

from vmcp-upconversion.

Comments (11)

LucasHorseshoeBend commented on June 4, 2024

Conal-Tuohy commented on June 4, 2024

The update is scheduled to start at midnight, GMT, and would typically be finished about 22 minutes later, or a bit more, depending on how many documents had been updated that day. So there's plenty of scope for updating more frequently than daily.

The slowest steps are the Dropbox sync, which takes a second or so per document, and the conversion of each document from Word format to OpenDocument format, which can take several seconds for some documents. So I've optimised this part so that only the Word documents that have actually been edited since the last update are converted. This part should be completed in a few minutes if only a few dozen documents have changed.
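A minimal sketch of that "only convert what has changed" check, assuming it can be done by comparing file modification times. The directory layout, the `.docx` and `.odt` extensions, and the function name `stale_sources` are hypothetical; the thread does not show how the real pipeline tracks edits.

```python
from pathlib import Path


def stale_sources(src_dir: Path, out_dir: Path,
                  src_ext: str = ".docx", out_ext: str = ".odt") -> list[Path]:
    """Return source files whose converted copy is missing or older than the source.

    Hypothetical helper: assumes the converted files mirror the source tree,
    with only the extension changed.
    """
    stale = []
    for src in sorted(src_dir.rglob(f"*{src_ext}")):
        out = out_dir / src.relative_to(src_dir).with_suffix(out_ext)
        # Convert when there is no output yet, or the source was edited
        # after the output was last written.
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

Only the paths returned here would need to go through the slow Word to OpenDocument step; everything else can be left alone.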

The conversion from OpenDocument to TEI is relatively quick; it gets through about 18 documents per second; about 15 minutes to do the lot. And rebuilding XTF's index takes only 3:41 (about 70 documents / second). So for these steps I haven't bothered to track which documents need processing, and I'm just doing the entire corpus. This also means I can change the OpenDocument to TEI conversion process, too, and know that it will be applied to all the documents, not just ones that've been edited.
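As a rough cross-check of those figures (all of them rounded, so this is only indicative), the two quoted rates imply a similar corpus size, and together the two steps account for most of the 22 minutes or so mentioned earlier. The variable names are just for illustration; the actual corpus size is not stated in this thread.

```python
# Figures quoted above, all approximate.
tei_rate = 18                 # documents per second, OpenDocument -> TEI
tei_seconds = 15 * 60         # "about 15 minutes to do the lot"
index_rate = 70               # documents per second, XTF indexing
index_seconds = 3 * 60 + 41   # 3:41

# Corpus size implied by each step.
docs_from_tei = tei_rate * tei_seconds
docs_from_index = index_rate * index_seconds
print(docs_from_tei)    # 16200
print(docs_from_index)  # 15470

# Minutes spent in the two whole-corpus steps combined.
print(round((tei_seconds + index_seconds) / 60, 1))  # 18.7
```

So both estimates point to a corpus somewhere around 15 to 16 thousand documents, with the remaining few minutes of a run going to the Dropbox sync and the Word conversions.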

LucasHorseshoeBend commented on June 4, 2024

An update at 1800 GMT would be the only other useful time at the moment: I always (or nearly always) break from about 1715 until 2100, so an update at that time would allow me to check the effect of an afternoon's work that evening.

Conal-Tuohy commented on June 4, 2024

Another conversion is now scheduled for 18:00 UTC.

LucasHorseshoeBend commented on June 4, 2024

Noted, thanks.
There is no button showing to let me re-open the issue.

Conal-Tuohy commented on June 4, 2024

Haha yes, I thought that might be the case, because it was a pretty obvious button. Perhaps this is because you weren't officially a "collaborator" on the repository. I've invited you; let's see if that makes a difference. I thought I already had invited you, but now I realise I'd cancelled the invite, thinking that it wasn't necessary, and wanting to avoid complications. But being able to close and reopen issues is pretty important I think.

LucasHorseshoeBend commented on June 4, 2024

I looked at a couple of the 17 files I changed today. Interestingly, both the current version and a previous version are there in XTF.
Compare
http://vmcp.conaltuohy.com/xtf/view?docId=tei/1880-9/1885/85-12-03a-final.xml
and
http://vmcp.conaltuohy.com/xtf/view?docId=tei/1880-9/1885/85-12-03a-proofed.xml

I found this because I just happened to search for the file in the total corpus, not just in final as I usually do.

Does your algorithm detect deleted files?

I think that this is what has happened: when Rod updates a file he does not just amend it and change the file name; he reloads the updated file with the suffix "final" and then deletes the earlier version. I then check that the styling and so on is correct by looking at the final version (and sometimes delete the previous version if he has not done so). If your algorithm does not detect the deletion, we will quickly see an escalation in the apparent number of files in the corpus! I had not looked carefully at that, but I think it has happened.

I can go back and explore the problem files (my guess is that there are approaching 200), but if I am correct about the explanation, that will not help, as I won't be able to delete them when I find them.
Advice?

Conal-Tuohy commented on June 4, 2024

You are correct. I will need to deal with deletions.
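One way deletions could be handled, sketched here under the assumption that each generated TEI file mirrors the path of its Word source with only the extension changed. The function name `orphaned_outputs` and the extensions are hypothetical, and this is not necessarily how the conversion pipeline ends up doing it.

```python
from pathlib import Path


def orphaned_outputs(src_dir: Path, out_dir: Path,
                     src_ext: str = ".docx", out_ext: str = ".xml") -> list[Path]:
    """Return generated files whose source document no longer exists.

    Hypothetical helper: compares relative paths with the extension
    stripped, so "1885/85-12-03a-final.docx" matches "1885/85-12-03a-final.xml".
    """
    sources = {
        p.relative_to(src_dir).with_suffix("")
        for p in src_dir.rglob(f"*{src_ext}")
    }
    return sorted(
        p for p in out_dir.rglob(f"*{out_ext}")
        if p.relative_to(out_dir).with_suffix("") not in sources
    )
```

In the case described above, once the "proofed" Word file has been deleted from the source folder, its leftover TEI file would be reported here and could be removed before XTF's index is rebuilt.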

Conal-Tuohy commented on June 4, 2024

I believe I've got deletions sorted now. Please reopen this issue if you spot any problems.

LucasHorseshoeBend commented on June 4, 2024

Was there an update on Jan 18 at 18:00 UTC?
Changes to files where I was fixing layouts had not shown up at 22:00.

Conal-Tuohy commented on June 4, 2024

It seems the update did take place at 18:00 hours; here is a listing of the folder of TEI files which XTF is serving up:

ubuntu@vmcp:~$ cat /etc/timezone
Etc/UTC
ubuntu@vmcp:~$ ls -l /usr/src/xtf/data/tei/
total 48
drwxr-xr-x  4 root root 4096 Jan 18 18:03 1840-9
drwxr-xr-x 12 root root 4096 Jan 18 18:04 1850-9
drwxr-xr-x 12 root root 4096 Jan 18 18:08 1860-9
drwxr-xr-x 12 root root 4096 Jan 18 18:11 1870-9
drwxr-xr-x 12 root root 4096 Jan 18 18:15 1880-9
drwxr-xr-x 10 root root 4096 Jan 18 18:17 1890-6
drwxr-xr-x  2 root root 4096 Jan 18 18:17 Dallachy notes
drwxr-xr-x  2 root root 4096 Jan 18 18:17 Envelopes
drwxr-xr-x  7 root root 4096 Jan 18 18:19 inscriptions
drwxr-xr-x 11 root root 4096 Jan 18 18:19 Mentions
drwxr-xr-x  2 root root 4096 Jan 18 18:19 Misc
drwxr-xr-x  2 root root 4096 Jan 18 18:19 no date letters

So the updates do seem to be happening, at least up to the point where the TEI is generated. The remainder of the processing is, I think, unlikely to fail: when I last ran the conversion process manually and watched the messages it generated, it ran without a problem. Until now I haven't had a log of the messages generated when the conversion runs automatically on a schedule, but I have now tweaked the scheduled conversion process so that it keeps a log. This will make it easy to review the last update process: http://vmcp.conaltuohy.com/conversion-log.txt
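For illustration only, a wrapper along these lines would be one way to capture the messages from a scheduled run. The function name `run_and_log` and the log layout are hypothetical and say nothing about how the real scheduled job is set up.

```python
import subprocess
from datetime import datetime, timezone


def run_and_log(command: list[str], log_path: str) -> int:
    """Run a command and write its combined output to a log file.

    Hypothetical helper: the log is overwritten on each run, so it always
    describes the most recent update.
    """
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave error messages with normal output
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(f"started {started}\n")
        log.write(result.stdout)
        log.write(f"exit status {result.returncode}\n")
    return result.returncode
```

Recording the exit status alongside the messages makes it possible to tell at a glance whether a run got past TEI generation and finished cleanly.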

In the meantime, I'm going to close this issue again, and ask you to open a separate issue with specific details of what you are seeing (or not seeing). If it turns out the automatic updates are failing (at some point after generating the TEI), I'll re-open this issue or create a new one as appropriate.

Cheers!
