GithubHelp home page GithubHelp logo

Comments (6)

jesusbagpuss avatar jesusbagpuss commented on August 17, 2024

@aalbinclark does this relate to the filesize in any way? The example is a thesis, which could be a large document.
It would be useful to get the corresponding lines from the EPrints/Apache logs to see whether the EPrints server is returning an error response, or if e.g. the transfer is timing-out.

[Edit: in the logs provided, there is a 'sizelong=64496457' - which looks like 64M. Not that big in the grand scheme of things, but possibly big enough to cause timeouts]

from eprints3.4.

drn05r avatar drn05r commented on August 17, 2024

@aalbinclark EPrints Apache error log messages would be really useful. (If you have separate error logs for HTTP and HTTPS be sure to check both). Also, it would be useful if you could confirm whether you have only just setup Pure to deposit to EPrints or whether the issue has only occurred since upgrading either Pure or EPrints. If the issue has only occurred since an upgrade to one of the systems, please let us know what version you were running prior to the upgrade.

from eprints3.4.

aalbinclark avatar aalbinclark commented on August 17, 2024

EPrints has been running for over fifteen years and has been updated via the Pure Deposit API for many years. We installed a new instance of EPrints (3.4.5) mid-November 2023.

Pure EPrints Issue?
5.27.1-2 3.4.2 No
5.27.3 3.4.2 No
5.27.3 3.4.5 Yes
5.28.2 3.4.5 Yes

I've compiled a fuller set of logs for another eprint https://eprints.lancs.ac.uk/id/eprint/217096/. File size is only 6MB.

Pure audit logs

EPrints Apache ssl_access_log

148.88.22.127 - - [27/Mar/2024:15:40:05 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [28/Mar/2024:01:20:24 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [28/Mar/2024:01:20:24 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [28/Mar/2024:01:20:24 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [29/Mar/2024:01:46:01 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [29/Mar/2024:01:46:01 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [29/Mar/2024:01:46:02 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [30/Mar/2024:01:21:56 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [30/Mar/2024:01:21:56 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [30/Mar/2024:01:21:57 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [31/Mar/2024:01:34:13 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [31/Mar/2024:01:34:13 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [31/Mar/2024:01:34:13 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [02/Apr/2024:01:03:37 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [02/Apr/2024:01:03:37 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [02/Apr/2024:01:03:37 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [03/Apr/2024:00:36:24 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [03/Apr/2024:00:36:24 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [03/Apr/2024:00:36:25 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [04/Apr/2024:00:27:32 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [04/Apr/2024:00:27:32 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [04/Apr/2024:00:27:33 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [05/Apr/2024:00:33:33 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [05/Apr/2024:00:33:33 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [05/Apr/2024:00:33:34 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [06/Apr/2024:00:18:10 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [06/Apr/2024:00:18:10 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [06/Apr/2024:00:18:10 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [07/Apr/2024:00:02:35 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [07/Apr/2024:00:02:35 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [07/Apr/2024:00:02:36 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [07/Apr/2024:23:57:35 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [07/Apr/2024:23:57:35 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [07/Apr/2024:23:57:36 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 401 -
148.88.22.127 - - [07/Apr/2024:23:57:36 +0000] "POST /pda/handler/eprint/217096/documents/2024DolmorMScRes.pdf?license=creative_commons_attribution_noncommercial_4_0_international_license&visibility=public&description=2024DolmorMScRes&documentVersion=published HTTP/1.1" 200 92
10.56.49.214 - - [09/Apr/2024:09:39:45 +0000] "GET /id/eprint/217096/12/2024DolmorMScRes.pdf HTTP/1.1" 200 6052873
148.88.22.127 - - [09/Apr/2024:09:40:38 +0000] "GET /pda/handler/eprint/217096/documents/124480 HTTP/1.1" 401 -
148.88.22.127 - - [09/Apr/2024:09:40:38 +0000] "GET /pda/handler/eprint/217096/documents/124480 HTTP/1.1" 200 6052873
148.88.22.127 - - [09/Apr/2024:09:45:05 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 401 -
148.88.22.127 - - [09/Apr/2024:09:45:05 +0000] "PUT /pda/handler/eprint/217096 HTTP/1.1" 200 129
148.88.22.127 - - [09/Apr/2024:09:45:06 +0000] "PUT /pda/handler/eprint/217096/documents/124480 HTTP/1.1" 401 -
148.88.22.127 - - [09/Apr/2024:09:45:06 +0000] "PUT /pda/handler/eprint/217096/documents/124480 HTTP/1.1" 200 127

from eprints3.4.

jesusbagpuss avatar jesusbagpuss commented on August 17, 2024

@aalbinclark - for the Pure connector, I normally expect to see a 401 (unauthorised) followed by a 200 response.
For the failing POST requests, EPrints responds with the normal 401, but there isn't a subsequent request (with the required auth, that would then get the 200 response).

This could mean:

  • Pure isn't making the follow-up request (maybe didn't get the 401 response or couldn't formulate the follow-up request)
  • something is blocking the follow-up request (a WAF (Web Application Firewall) perhaps?)
  • something is so broken that Apache isn't logging the request - or it's getting logged elsewhere (in a multi-vhost environment, this could be possibly)
  • (maybe other things)

The POST on 7th April did have the follow-up request - which got a 200 response, and looks like it was successful. The GET on the 9th resulted in a 6M response.

What changed between the 6th and 7th - either in EPrints, Pure, or channels between them?

One thing from the logs is the timing of the deposit - shortly before midnight, to shortly after midnight. Do any jobs start on the EPrints server at midnight that might impact things e.g. database or storage backups?


In case it's useful, I've got a modified version of the PDA modules that allows you keep the raw files as sent across from Pure - both the EPrints XML and the Document payloads.

Search this gist:
https://gist.github.com/jesusbagpuss/757138e884eba03a9ad30d7a1df52bc6
for 'pda_keep_tmp_files' to see what's needed.
The files will be in /tmp/ (possibly under a systemd...httpd directory) with a filename that starts pdahandler.

It might be useful to see exactly what's being sent. If that all looks normal, then the files can be used to put together some tests using e.g. Postman.

Happy to describe the modified PDA code in more detail if you want. Will - drop me a line [email protected] if you want.

from eprints3.4.

drn05r avatar drn05r commented on August 17, 2024

@aalbinclark It is important to appreciate that there are three components here not just two:

  1. Pure
  2. EPrints
  3. PDA Handler (a.k.a. Pure Connector)

The PDA Handler was developed by Pure rather than using EPrints REST API (like I think @jesusbagpuss's implementation uses). This means in it is not officially supported by EPrints. It was only quite recently that I agreed with Pure that we could make a reference version of the PDA Handler available on EPrints GitHub Organisation.

As @jesusbagpuss has already said you would expect 401s followed by 200s so at an HTTP interface level everything seems to be working as expected. The issue looks like it might lie with how the PDA Handler calls whatever functions it uses to store a file to disk. I am not aware of any changes off hand that would have affected calls to EPrints functions that store files to disk, especially not any which would cause an intermittent issue like this. As @jesusbagpuss previously queried, I could believe an issue were smaller files uploading successfully but larger ones always failing but the same file uploading sometimes and not others feels unlikely to be a coding issue and more likely an enviromental one.

I would advise that you compare the reference version of the PDA Handler against what you are currently running on your repository to see if there are any differences. This will at least allow us to rule out any bug that may have been fixed. Unfortunately the latest version I am aware of that uses this reference version of the PDA Handler is only on version 3.4.3. To my knowledge there has not been any intermittent issues with zero byte files for this version.

It would also be useful that you could confirm that the only thing that has changed between the tests you did with EPrints 3.4.2 and EPrints 3.4.5. Did you tests run on the same version of operating system with similar configuration (e.g. SELinux state)? Were the underlying applications (e.g. Apache, mod_perl, MySQL, etc.) the same versions and running much the same configuration?

Finally, although the access logs look at expected, some warnings may still be being recorded in the Apache error logs, please can you check if there is you can find anything.

P.S. Just looking at the access log more closely, it looks like PUT commands always have a 401 followed by a 200 but the POST commands to upload a file almost always have a 401 but then no further 200, except at 07/Apr/2024:23:57:36. From what I have experienced from how Pure uses the PDA Handler, it never send credentials until it is explicitly told to do so with a 401 response. So what is odd here is what changes Pure's behaviour at 07/Apr/2024:23:57:36. I cannot think of any reason for this, unless something has changed about EPrint record 217096. It may be worth checking the history tab for this eprint to see if there are any changes between 07/Apr/2024:00:02:36 and 07/Apr/2024:23:57:36. Alternatively maybe something about the equivalent record in Pure has changed which means it is will to try uploading the file with credentials, when before it was not.

from eprints3.4.

drn05r avatar drn05r commented on August 17, 2024

Closing as this is not a development task or a bug directly related to the EPrints codebase (i.e. it involves third-party application/plugins)

from eprints3.4.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.