eprints / eprints3.4

EPrints 3.4 core and releases

Home Page: http://www.eprints.org/uk/index.php/eprints-3-4/

License: GNU Lesser General Public License v3.0

Makefile 0.02% Perl 96.62% Shell 0.05% M4 0.08% JavaScript 0.97% CSS 1.40% XSLT 0.85%

eprints3.4's Introduction

NAME

Join the chat at https://gitter.im/eprints/eprints

GNU EPrints README

SYNOPSIS

Installation guide: https://wiki.eprints.org/w/Installation.

For more information see https://www.eprints.org/software/.

NOTE:

DESCRIPTION

EPrints is a document management system aimed at Higher Educational institutions and individuals. EPrints has been used to provide Open Access to research results, sharing of educational resources and providing portfolios of work. It has a flexible metadata and workflow model to support varied business needs as well as interaction with other Web and institutional systems.

CONTACT

For support options please see https://www.eprints.org/software/.

Enquiries that cannot be made via our public mailing list (e.g. security concerns) may be sent to [email protected].

EPrints can be contacted in the real world at

EPrints Services
Electronics and Computer Science
Faculty of Engineering and Physical Sciences
University of Southampton
Southampton, SO17 1BJ
United Kingdom

COPYRIGHT

Copyright 2000-2022 University of Southampton.

EPrints is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

EPrints is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with EPrints. If not, see https://www.gnu.org/licenses/.

eprints3.4's People

Contributors

alex-ball, aofc, cook879, dgc, drn05r, fatchild, gobfrey, jb4, jesusbagpuss, mpbraendle, pfiffikus, tgoeg, wfyson


eprints3.4's Issues

Citation caching breaks the Bazaar

As EPM citations are generated without setting a non-zero objectid, the caching causes all Bazaar entries to look the same. There is no real need to cache EPM citations, so they should be explicitly excluded.

Review use of XML libraries

Historically, EPrints has supported multiple different XML libraries. It seems as though XML::LibXML has now pretty much won out. Consider whether the codebase can be simplified to support only XML::LibXML, and whether the copy shipped in the codebase can be removed in favour of the appropriate distributed version of this library.

Bug in Tokenizer

See eprints/eprints#514

Basically there is a typo where the $utext rather than the $text variable should be used in two places in the function EPrints::Index::Tokenizer::apply_mapping.

tools/epm command unavailable on Ubuntu 18.04

After installing eprints 3.4 using the ubuntu package there is nothing available at tools/epm to install any packages from the bazaar with.

Has this been changed in 3.4 over 3.3?

Load issue in EPrints::Sword::Utils

On occasion EPrints::Sword::Utils has a load order issue and EPrints::Apache::Sword is not loaded when it is needed. To fix this, a use line for EPrints::Apache::Sword needs to be added to perl_lib/EPrints/Sword/Utils.pm.

null phrase not used when Date field value is undefined

If a year browse view has items that have no date set, then rather than there being a clickable link like "Not Specified" there is just the number of items, with no way of accessing them. This is due to a get_value_label method for EPrints::MetaField::Date that does not take into consideration the value being empty, which is considered in the superclass's equivalent method. This test, and the subsequent use of the null phrase, needs to be ported to the EPrints::MetaField::Date get_value_label method.

EPM sources broken

The default config that deals with EPM sources is mis-matched:
In the config file, $c->{epm_sources} is used:

# EPM Configuration File
$c->{epm_sources} = [] if !defined $c->{epm_sources};
# Define the EPM sources
push @{$c->{epm_sources}}, {
name => "EPrints Bazaar",
base_url => "https://bazaar.eprints.org",
};

Whereas in the Bazaar screen, $c->{epm}->{sources} is used:

my $sources = $repo->config( "epm", "sources" );
$sources = [
{ name => "EPrints Bazaar", base_url => "https://bazaar.eprints.org/" }
] if !defined $sources;

Both deal with the case that there are no sources by adding the default bazaar.eprints.org URL.

Not sure whether $c->{epm_sources} or $c->{epm}->{sources} is the more preferable fix..?
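Whichever key is chosen, one way to stay compatible with existing configurations is to read both. The following is only a sketch: resolve_epm_sources is a hypothetical helper, and $c is modelled as a plain hash reference rather than the real repository config object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): return the EPM sources from
# whichever config key is set. $c is a plain hash ref standing in for the
# repository configuration.
sub resolve_epm_sources
{
    my( $c ) = @_;

    my $sources;

    # Nested key, as read by the Bazaar screen.
    $sources = $c->{epm}->{sources} if ref( $c->{epm} ) eq "HASH";

    # Flat key, as written by the default config file.
    $sources = $c->{epm_sources} if !defined $sources;

    # Neither key is set: fall back to the default Bazaar.
    $sources = [
        { name => "EPrints Bazaar", base_url => "https://bazaar.eprints.org/" }
    ] if !defined $sources;

    return $sources;
}
```

The nested key is checked first here only as an arbitrary choice; the order could equally be reversed.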

Enabling HTTPS everywhere breaks OAI-PMH

HTTPS everywhere can be enabled by setting only $c->{securehost} and not $c->{host} in the configuration, but this can break OAI-PMH: if the archive_id is not set explicitly, the $c->{host} config variable is used instead. This needs to be modified to fall back to $c->{securehost} if $c->{host} is not set, which is the case when the repository is configured for HTTPS everywhere.
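A minimal sketch of that fallback, assuming the configuration is a plain hash reference. oai_host is a hypothetical name, and how the real OAI-PMH code derives the archive_id from the host is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): pick the hostname to use when
# no archive_id is configured explicitly. $c is a plain hash ref.
sub oai_host
{
    my( $c ) = @_;

    # Use the plain HTTP host when it is configured.
    return $c->{host} if defined $c->{host} && length $c->{host};

    # HTTPS everywhere: only securehost is set, so fall back to it.
    return $c->{securehost};
}
```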

Create proper Keywords MetaField

EPrints has a keywords field, but under the hood it is just a Longtext field and therefore searching on keywords is not very reliable. In particular, keywords that don't conform to normal words (e.g. covid-19) or that are multiple-word phrases (e.g. "fast forward") would not be looked up in a useful way. If you set the search to ALL match this probably would not be too bad, but ANY match would find potentially tens of times more items than what is really being looked for.

Compound MetaField when multiple shows loads of 'UNSPECIFIED's

If you have a Compound field it may have loads of rows, but even if no row has a particular sub-field set (e.g. creators_id), it will still show UNSPECIFIED for each line.

It is possible to at least get this to display empty table cells rather than UNSPECIFIED if the render_quiet MetaField attribute is used. However, it would be useful to extend this to not display the column / sub-field at all if every row will be empty for it.
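The proposed extension amounts to working out, before rendering, which sub-fields have a value in at least one row. A rough sketch follows, with rows modelled as plain hash references. non_empty_columns is a hypothetical name and is not tied to the real Compound rendering code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): return only those sub-field
# names that have a non-empty value in at least one row.
sub non_empty_columns
{
    my( $rows, @columns ) = @_;

    return grep {
        my $col = $_;
        # Keep the column if any row has a defined, non-empty value for it.
        scalar grep { defined $_->{$col} && $_->{$col} ne "" } @{$rows};
    } @columns;
}
```

A renderer could then loop over the returned names only, so a column such as creators_id disappears when no row sets it.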

some item editing could _not_ be cancelled

Reordering the creators or changing to the files stage, for example, causes the current state to be saved immediately, so it cannot be undone by cancelling the modification.

If this behaviour is due to the implementation and is hard to prevent, then the user should be informed after hitting the cancel button!

´send_alerts <repoID> weekly´ causes several 'Use of uninitialized value ...'

When executing aforementioned command I receive
Use of uninitialized value in string eq at /usr/share/eprints/bin/../perl_lib/EPrints/Search/Condition/Control.pm line 48.
Use of uninitialized value in string eq at /usr/share/eprints/bin/../perl_lib/EPrints/Search/Condition/Or.pm line 50.
Use of uninitialized value in string eq at /usr/share/eprints/bin/../perl_lib/EPrints/Search/Condition/Or.pm line 53.
Use of uninitialized value in string eq at /usr/share/eprints/bin/../perl_lib/EPrints/Search/Condition/And.pm line 54.
Use of uninitialized value in string eq at /usr/share/eprints/bin/../perl_lib/EPrints/Search/Condition/And.pm line 57.
Fortunately the alerts are sent nevertheless :-)

Bazaar inaccessible behind HTTP proxy

Running EPrints 3.4 with Perl 5.22 behind a HTTP proxy (and using HTTPS), I find that on the Admin::EPM screen, the list of available packages fails to load.

Adding the following lines to ingredients/bazaar/plugins/EPrints/EPM/Source.pm solves that problem:

use strict;
use LWP::Protocol::https; # <-- Add this line.
#...
sub accolades
{
        my( $_ua, $_base_url ) = @_;
 
        my $ua = ( $_ua ) ? $_ua : LWP::UserAgent->new;
        $ua->env_proxy;  # <-- Add this line.

I have not put this as a pull request as the actual code will need to be more carefully changed (e.g. dependent on Perl version) and I can't guarantee this is a complete fix as there may be related problems I haven't hit yet.

Event Queue - Primary Key violation

Reported on the Tech-list: http://threader.ecs.soton.ac.uk/lists/eprints_tech/thread-23483.html

ERROR:  duplicate key value violates unique constraint "event_queue_pkey"
DETAIL:  Key (eventqueueid)=(e9210339587b3507d2bdafc95a238a39) already exists. at /opt/eprints3/perl_lib/EPrints/Database.pm line 1387.
        EPrints::Database::add_record('EPrints::Database::Pg=HASH(0x55aa5922a478)', 'EPrints::DataSet=HASH(0x55aa57023030)', 'HASH(0x55aa592f77c8)') called at /opt/eprints3/perl_lib/EPrints/DataObj.pm line 294

My interpretation of this error is:
The 'eventqueueid' is generated from a hash of the parameters that will be used in the index event:
https://github.com/eprints/eprints3.4/blob/v3.4.0/perl_lib/EPrints/DataObj/EventQueue.pm#L88-L110

This means that if exactly the same event (e.g. generate RDF for EPrint X) is created twice, the eventqueueid will be the same, and the new task should not get added to the index queue.

The EPrints::DataObj::EventQueue::create_unique code calls EPrints::DataObj create_from_data (https://github.com/eprints/eprints3.4/blob/v3.4.0/perl_lib/EPrints/DataObj.pm#L221-L353) which then tries to add the data to the database: https://github.com/eprints/eprints3.4/blob/v3.4.0/perl_lib/EPrints/DataObj.pm#L293-L295 using EPrints::Database::add_record.

This code is working correctly:
https://github.com/eprints/eprints3.4/blob/v3.4.0/perl_lib/EPrints/Database.pm#L1387-L1388
An error is raised when a duplicate ID is used.
This will return '0' to the calling code, which in turn returns undef to the EventQueue call.

The question here is: which module should be responsible for checking whether the eventqueueid already exists before trying to insert it.
Most other modules use auto-increment keys, so this isn't an issue.

We could add a dataset search for the MD5 in EPrints::DataObj::EventQueue::create_unique before trying to re-create the event?
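The idea of checking for the ID before inserting can be sketched as below. This is not the EPrints implementation: an in-memory hash stands in for the event_queue table, and both the way the parameters are joined before hashing and the sub names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Stand-in for the event_queue table, keyed by eventqueueid.
my %queue;

# Derive a deterministic ID from the event parameters (assumed scheme).
sub event_id
{
    my( @params ) = @_;
    return md5_hex( join( "\0", @params ) );
}

# Add the event only if an identical one is not already queued.
# Returns the ID on insert, or nothing when it is a duplicate.
sub create_unique
{
    my( @params ) = @_;

    my $id = event_id( @params );

    # Look up first, so the insert can never hit the primary key constraint.
    return if exists $queue{$id};

    $queue{$id} = [ @params ];
    return $id;
}
```

With a real database, a lookup followed by an insert can still race between two processes, so the insert error would need to be tolerated as well.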

site_lib plugins are read in additionally

the (sometimes additional) reading of extending plugins in

# # /site_lib/ extensions plugins - we want those enabled by default
$dir = $repository->config( "base_path" )."/site_lib/plugins";
$self->_load_dir( \%SYSTEM_PLUGINS, $repository, $dir );
if( $use_xslt )
{
$self->_load_xslt_dir( \%SYSTEM_PLUGINS, $repository, $dir );
}
is inconsistent with the introduction of the flavour ranking (read in afterwards), i.e. the new filesystem structure without explicitly mentioned ./site_lib.

Should the aforementioned lines be removed?

Edit page phrases appear on static pages

homepage and /eprints/ pages amongst other static pages display the "Edit page phrases" link in the admin menu when these are not editable and clicking on the link causes an internal server error.

file type recognition depends on correct filename extension

Files whose extension isn't spelled in lower case won't be recognized correctly; e.g. EXAMPLE.PDF has to be specified as text file by hand ...
I fear the inspection of the file header is overkill, but the capitalization should be ignored!?
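Ignoring capitalization could be as small as lower-casing the extension before the lookup. The sketch below is illustrative only: the mapping table and guess_mime_type are hypothetical and do not reflect how EPrints actually stores its file type rules.

```perl
use strict;
use warnings;

# Illustrative extension table; the real mapping lives in EPrints config.
my %EXT_TO_MIME = (
    pdf  => "application/pdf",
    txt  => "text/plain",
    html => "text/html",
);

# Hypothetical helper: guess a MIME type from the filename extension,
# ignoring the case of the extension.
sub guess_mime_type
{
    my( $filename ) = @_;

    # Capture everything after the last dot of the final path component.
    my( $ext ) = $filename =~ /\.([^.\/]+)$/;
    return if !defined $ext;

    return $EXT_TO_MIME{ lc( $ext ) };
}
```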

'toform' rendered unusable by EPrints::MetaField::Itemref->render_single_value

In the manual page on EPrints Metadata Fields, there is this suggestion:

toform: This function is allowed to modify the current value which appears in the form. For example, if your database stores userids in a field, but you want to allow people to edit them as usernames, then this function can be used to take the current value (a userid) and return the associated username.

The trouble is, if you have this in place for an itemref field (like, say, user) that appears in a workflow, the row is rendered as both the value and a citation rendered from the value.

Normally for, say, user ID 1234, username jd46:

[  1234  ]  John Doe

With toform in place:

[  jd46  ]  User jd46 not found.

The trouble arises because when render_single_value tries to get the object, it is using the value output by the toform function (username), not the value from the database (userid).

For this to work properly, presumably EPrints::MetaField::Itemref should override EPrints::MetaField->render_input_field to somehow preserve the unconverted value (as an extra parameter at the end?), or else convert it back again using the fromform function if it is defined.

Views with similar label are generated several times

The grep statement in

next VIEW if( @view_opts && !grep( /$view->{id}/, @view_opts ) );

matches too fuzzily. As a consequence, views with similar labels are generated several times, hurting performance of the view generation.

Example: Two configured views, one named "a", the other "ab".
Because "a" matches "ab", it will be generated also when generate_views {repo} --view ab is called.
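One possible fix is to compare the IDs as whole strings instead of using the view ID as a regular expression. The sketch below pulls that test out into a standalone sub. view_selected is a hypothetical name, and the surrounding generate_views loop is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a view should be generated, given the
# IDs passed with --view (an empty list means "all views").
sub view_selected
{
    my( $view_id, @view_opts ) = @_;

    # No --view options given: generate every view.
    return 1 if !@view_opts;

    # String equality instead of a regex, so "a" no longer matches "ab".
    return scalar( grep { $_ eq $view_id } @view_opts ) ? 1 : 0;
}
```

An anchored and quoted pattern such as /^\Q$view->{id}\E$/ would behave the same way, though plain eq is easier to read.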

workflow list of unspecified fields isn't sorted

The list of core workflow fields is sorted neither by variable name (unwanted) nor by phrase name (recommended), so the link for editing a still-empty field is hard to find. The ordering seems to depend on the database schema, i.e. table definition ...
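Sorting by phrase name could look roughly like this, assuming the rendered labels are available as a hash from field name to phrase text. sort_fields_by_phrase and the sample labels used to exercise it are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: order field names by their human-readable phrase,
# case-insensitively. Fields without a phrase sort by their own name.
sub sort_fields_by_phrase
{
    my( $phrases, @field_names ) = @_;

    return sort {
        lc( $phrases->{$a} // $a ) cmp lc( $phrases->{$b} // $b )
    } @field_names;
}
```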

Provide support for custom handlers

Save having to hack EPrints::Repository and EPrints::Apache::Rewrite to shoehorn in additional handlers (e.g. PURE's PDA handler), there should be a block that can load custom handlers that can be defined in archive configuration.

citelink does not play nice with citation caching

citelink behaves differently depending on whether the citation is used on an abstract/summary page or somewhere else. Ideally there would be a way for the citation caching to tell and generate two citation cache entries. However, probably the most straightforward way is to have a default_for_summary_page.xml so this does not interfere with the default citation used elsewhere.

RequestCopy hardcoded to use EPrint title in email subject line

EPrints::Plugin::Screen::Public::RequestCopy is hardcoded to use the title of the EPrint in the subject line of request a copy emails. Most of the time this is fine, but some repositories may either not use the title field at all or only use it in some cases. It would also be useful to customise this so that, for example, you could have both the title and the EPrint ID.

I propose changing the code so it will default to using the title but allows you to define your own function for generating the EPrint descriptor string used in request a copy emails.
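The proposal can be pictured as an optional callback in the configuration. In this sketch the config key request_copy_descriptor is a made-up name, and both the config and the EPrint are plain hash references rather than real EPrints objects.

```perl
use strict;
use warnings;

# Hypothetical helper: build the EPrint descriptor used in the email subject.
# Uses a repository-defined callback if one is configured, else the title.
sub request_copy_descriptor
{
    my( $c, $eprint ) = @_;

    # Hypothetical config key holding a code reference.
    my $fn = $c->{request_copy_descriptor};
    return $fn->( $eprint ) if ref( $fn ) eq "CODE";

    # Default behaviour: the title, as today.
    return $eprint->{title};
}
```

A repository wanting the title plus the ID would then set the key to a sub that returns, say, the title followed by the eprintid in brackets.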

Ensure table names for other database are not returned

The DatabaseSchema page was returning cache tables that were not in the current repository's database. It turned out that they were in another repository's database, which the database (e.g. MySQL) user had permission to view. In this sense it is not a security issue, but as other calls to get_tables could lead to tables being modified or updated, it could lead to issues (only for multiple archive repositories) and at the very least to confusion in scenarios like the cache table listing in DatabaseSchema.

on 'manage records' screen some undefined phrases arose

error log:
Undefined phrase: eprint:workflow:stage:core:title (en) at line 168 in /usr/share/eprints/perl_lib/EPrints/Workflow/Stage.pm
Undefined phrase: eprint:workflow:stage:files:title (en) at line 168 in /usr/share/eprints/perl_lib/EPrints/Workflow/Stage.pm
Undefined phrase: eprint:workflow:stage:subjects:title (en) at line 168 in /usr/share/eprints/perl_lib/EPrints/Workflow/Stage.pm
Undefined phrase: eprint:workflow:stage:type:title (en) at line 168 in /usr/share/eprints/perl_lib/EPrints/Workflow/Stage.pm
Undefined phrase: file_fieldname_copies (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: file_fieldname_copies_pluginid (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: file_fieldname_copies_sourceid (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: file_fieldname_data (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: file_fieldname_url (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: subject_fieldname_sortvalue (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: subject_fieldname_sortvalue_lang (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: subject_fieldname_sortvalue_sortvalue (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: user_fieldname_pin (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm
Undefined phrase: user_fieldname_pinsettime (en) at line 423 in /usr/share/eprints/perl_lib/EPrints/MetaField.pm

Chrome complains EPrints is "not fully secure"

Chrome complains that EPrints is not fully secure as the template has a search form that uses HTTP even when the page is HTTPS. This is due to it using the config option http_cgiurl when it would be better to use rel_cgipath.

It would similarly be a good idea to remove all use of http_cgiurl and http_url from templates and use rel_cgipath and rel_path.

upper case letters in user's name could cause trouble

At login the case does not matter, as long there is no user with the 'same' (apart from case) username (… because the password mapping fails in this case?!). The admin's searches use a case-sensitive username.

I would prefer lower case usernames, but could also imagine a correction of login's behaviour.

bad link in batch edit listing

The link of the partial listing points to /id/eprint/<eprintid>, which is not valid for all states.

I think the link should also point to /cgi/users/home?screen=EPrint::View&eprintid=<eprintid> as usual for a result listing.

Request a copy reports 404 not found when form submitted.

When submitting the form (cgi/request_doc?docid=...) for request a copy, you get a 404 rather than a successful request message. This is due to the http_root rather than the http_cgiroot config option being used to set the POST URL for the form.

Remove preparing_static_page repository/session variable

EPrints behaves differently when generating static pages through use of the preparing_static_page repository/session variable. There seems to be no good reason to do this, and in fact it creates problems such as mixed content warnings on abstract pages, where file icons use HTTP but the user has requested the abstract page over HTTPS.

As preparing_static_page has been in the codebase so long (pre 3.3), maybe there are still some historical uses that are unclear, so after removal this needs to be thoroughly tested before 3.4.2 release.

Handle various special characters that lead to question marks in reimported BibTeX

There are various special characters that can appear in titles and other fields exported by BibTeX that do not get properly encoded and therefore are converted into question marks. Although TeX::Encode::BibTeX may improve in the future, for now it would be best if the BibTeX export replaces these special characters with reasonable substitutes.

Malicious uploaded HTML files could cause problems

From Chris Gutteridge:

Bad person submits an EPrint with an HTML document. The HTML contains javascript.

Editor looks at the HTML document to decide if they should approve it. At that point the editor is authenticated to the site with a cookie. The javascript runs on the editor's browser and requests one or more alternate pages from the site which are not visible to the public. Let's say a sensitive document. It is making the request using the editor's credentials as those js queries still have the cookie set. The javascript then sends that data to their evil master... e.g. requests a bunch of URLs on evil.com's server, passing the data. I'm not sure JS can make post requests to other sites but I bet it can if the target site allows XSS.
