seerlabs / citeseerx

CiteSeerX public repository

License: Other

Shell 0.06% Perl 3.19% Python 4.74% CSS 0.66% HTML 52.02% XSLT 0.41% Java 29.23% JavaScript 9.59% Perl 6 0.04% TSQL 0.07%

citeseerx's Issues

number of file descriptors increases beyond limit

This happens on one of the web servers (03). I first got a "server down" message from the monitor. I then looked at the localhost file and saw lots of errors like "too many files opened". Basically, the system could not even open a sitemap file and respond to the client. I then checked "ulimit -Hn" and the limit was 8192. Using "lsof" I saw more than 10,000 rows, most of them like "web server -> repo server [CLOSE_WAIT]". This CLOSE_WAIT state is very annoying and has been directly causing the web server outages. However, as far as I know, there is no way to clean up these sockets except to restart the service or reboot. In this case I had to restart Tomcat, but I wonder whether this is a software or a hardware issue. The number of file descriptors is below the limit (also 8192) on the other web servers.
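
A socket in CLOSE_WAIT means the remote side has closed the connection but the local process has not yet closed its own end, so a steadily growing count usually points at the application rather than the hardware. The following is a minimal sketch for tallying connection states to confirm which one dominates. It assumes `lsof -i` style output where the TCP state appears in parentheses at the end of each line; the sample lines, host names, and PID are made up for illustration.

```python
import re
from collections import Counter

# Matches a trailing TCP state such as "(CLOSE_WAIT)" or "(ESTABLISHED)".
_STATE = re.compile(r"\((?P<state>[A-Z_0-9]+)\)\s*$")


def count_tcp_states(lsof_output: str) -> Counter:
    """Tally TCP connection states found at the end of lsof-style lines."""
    counts = Counter()
    for line in lsof_output.splitlines():
        match = _STATE.search(line)
        if match:
            counts[match.group("state")] += 1
    return counts


# Hypothetical sample; real input might come from `lsof -i -a -p <tomcat pid>`.
sample = """\
java 1234 tomcat 101u IPv4 111 0t0 TCP web03:40001->repo01:http (CLOSE_WAIT)
java 1234 tomcat 102u IPv4 112 0t0 TCP web03:40002->repo01:http (CLOSE_WAIT)
java 1234 tomcat 103u IPv4 113 0t0 TCP web03:40003->repo01:http (ESTABLISHED)
java 1234 tomcat 104u IPv4 114 0t0 TCP *:http-alt (LISTEN)
"""
print(count_tcp_states(sample).most_common())
```

If CLOSE_WAIT dominates and keeps growing between samples, the likely place to look is code that opens connections to the repo server without closing them on every path.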

Active Repo

Which is the active repo: GitHub or SourceForge? The .pdf says sf.net, but judging by the commit log, GitHub seems to be the one.

Link to submit an author's homepage is broken

Summary

When viewing an author who does not have a homepage listed the "Submit a homepage" link points to a "localhost" address.

Steps to Reproduce

Visit an author who does not have a homepage listed. Example: http://citeseerx.ist.psu.edu/viewauth/summary?aid=686603
Attempt to add a homepage by clicking "Not found. Submit a homepage"
User is directed to http://localhost/?aid=686603 , an invalid link.

Only weak and insecure TLS ciphers are supported by the server

According to https://www.ssllabs.com/ssltest/analyze.html?d=citeseerx.ist.psu.edu all supported ciphers are either weak or insecure.

Some libraries and runtimes deliberately don't support such ciphers, so it is not possible to access the site via HTTPS using those libraries and runtimes. E.g., it is not possible with rustls and Deno (denoland/deno#10447).
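
To see which cipher suites a given client is willing to offer, and so whether it overlaps with the server's list from the SSL Labs report, something like the following can help. This is a sketch using Python's standard `ssl` module only; it lists what the local Python/OpenSSL build enables by default and does not contact the server.

```python
import ssl


def client_cipher_names() -> list:
    """Return the names of the cipher suites this Python build would offer."""
    context = ssl.create_default_context()
    return sorted(cipher["name"] for cipher in context.get_ciphers())


for name in client_cipher_names():
    print(name)
```

If none of the names printed here appear among the server's supported suites, the TLS handshake from that client cannot succeed.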

Hello!!! I'm Kate Balls

What are common traps for aspiring writers?

Mysql installation

I want to install MySQL as described in the document. I installed MySQL from the generic Linux tar.gz file (I am using CentOS), following the tutorial here: https://dev.mysql.com/doc/refman/5.0/en/binary-installation.html. I have already changed etc/my.cnf as described in the CXM document, but when I run "service mysqld start", I get the error message "Redirecting to /bin/systemctl start mysqld.service. Failed to issue method call : Unit mysqld.service failed to load : No such file or directory".

I read here: https://ask.fedoraproject.org/en/question/43459/how-to-start-mysql-mysql-isnt-starting/ that MySQL was replaced by MariaDB. So what do I need to do? Should I switch to MariaDB, or is there any other suggestion?

Sorry for my English.

Thanks in advance

Error 502s using direct links or the search form

Hi all,

I'm currently getting "502 Bad Gateways" attempting to look up resources on CiteSeerX. It seems to be somewhat intermittent, but mostly not working. In the act of troubleshooting, I tried both Firefox and a Chrome-based browser, in Incognito / Private Browsing mode, with or without "Enhanced Tracking Protection," my adblocker, and various security settings enabled. I think it's a site problem.

To the casual visitor, the "loading" animation just pulses forever with no results. But some clues arise if you pop open the dev console in the browser:

For example: this direct link https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.8740 (via https://en.wikipedia.org/wiki/Trusted_timestamping), or use of the search form on the homepage with search terms from the same paper both result in Error 502s from https://citeseerx.ist.psu.edu/api/getSummary?paper_id=10.1.1.46.8740, https://citeseerx.ist.psu.edu/api/search, or https://citeseerx.ist.psu.edu/api/aggregate.

I get that it's after close-of-business in the Eastern US, on a Friday, but I figured you'd like to know, being a widely-depended upon resource and all.

Cheers and good luck!

Couldn't build CiteSeerX .war file with Ant

Hi,
I ran into a problem when trying to build the CiteSeerX .war file with Ant.
It looks like Ant couldn't find the servlet-api.jar library (since the errors are located on ServletResponse). How can I fix this? Did I do something wrong?

Here is my environment:
Java version "1.8.0_25"
Tomcat 6
Ant 1.9.4

Here is the output of tomcat/bin/startup.sh:
Using CATALINA_BASE: /usr/share/tomcat6
Using CATALINA_HOME: /usr/share/tomcat6
Using CATALINA_TMPDIR: /usr/share/tomcat6/temp
Using JRE_HOME: /usr/local/sbin/jdk1.8.0_25/
Using CLASSPATH: /usr/share/tomcat6/bin/bootstrap.jar

Here is the error:
root@abraham-Parallels-Virtual-Platform:~/divusi/CiteSeerX# ant
Buildfile: /home/abraham/divusi/CiteSeerX/build.xml

init:

compile:
[javac] Compiling 7 source files to /home/abraham/divusi/CiteSeerX/build
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/CharsetFilter.java:42: error: cannot find symbol
[javac] response.setCharacterEncoding(encoding);
[javac] ^
[javac] symbol: method setCharacterEncoding(String)
[javac] location: variable response of type ServletResponse
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:102: error: cannot find symbol
[javac] builder.append(request.getLocalName());
[javac] ^
[javac] symbol: method getLocalName()
[javac] location: variable request of type HttpServletRequest
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:103: error: cannot find symbol
[javac] int port = request.getLocalPort();
[javac] ^
[javac] symbol: method getLocalPort()
[javac] location: variable request of type HttpServletRequest
[javac] Note: /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/HeaderFilter.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 3 errors

Installation Solr 5.1

Hi. I want to install CiteSeerX. Your document uses Solr 1.6, but I want to use Solr 5.2.1. I found in https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Tomcat that Solr no longer supports deployment as a WAR file. And when I installed Solr, it crashed because the port was already in use by Tomcat. How do I solve this problem?

Do you have a solution for installing CiteSeerX with Solr 5.2 or higher?

Thanks in advance

Crawl history does not reflect real number of papers ingested in CiteSeer

On the crawl web site, the crawl history page shows the proportion of documents that are "In System", "Crawled" and "Fail to Convert". However, "In System" only means that a document has been extracted, not necessarily that it has been ingested; documents may still be on the waiting list. Because of the significant speed difference between ingestion and extraction, the waiting list can be long. Therefore, we need a fourth parameter reflecting the real number of papers ingested. This can be done in three steps:
(1) add a new flag in the "state" field in the citeseerx_crawl.main_crawl_document table to indicate ingested papers;
(2) update view.py, adding "ingested_count" and calculating it in some way (either dynamically from the production database, or from the crawling, or from a database dump);
(3) update the template, adding "ingested_count" to the displayed graph.
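
A rough sketch of step (2), computing per-state counts including the proposed ingested count, might look like the following. The numeric state codes and label names here are hypothetical placeholders, not the actual values stored in main_crawl_document, and the input is a plain list of state values rather than a real database query.

```python
from collections import Counter

# Hypothetical state codes; the real values in main_crawl_document may differ.
STATE_LABELS = {
    0: "crawled",
    1: "in_system",
    -1: "fail_to_convert",
    2: "ingested",  # the new flag proposed in step (1)
}


def summarize_states(states):
    """Count documents per state label, reporting zero for absent labels."""
    counts = Counter(STATE_LABELS[s] for s in states if s in STATE_LABELS)
    return {label: counts.get(label, 0) for label in STATE_LABELS.values()}


# Example with made-up state values for seven documents.
summary = summarize_states([0, 1, 1, 2, -1, 2, 2])
print(summary["ingested"])
```

The resulting dictionary could then be passed to the template so that the ingested count is plotted alongside the existing three series.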

Sort by recency vs sort by year (descending)

What is the difference between "sort by recency" and "sort by year (descending)"?

Escaping Special Characters

We should probably escape special characters in the query string.
For example, this query returns an error
http://citeseerx.ist.psu.edu/search?q=CollabSeer:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc

but this one is correct
http://citeseerx.ist.psu.edu/search?q=CollabSeer\:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc

Based on the reference here http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
we should escape the following special characters.

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
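
A minimal sketch of such escaping, assuming the special-character list from the Lucene 2.9.4 query syntax page (which also includes `+`) and a hypothetical helper name `escape_lucene_query`, could be:

```python
import re

# Single-character operators plus the two-character operators && and ||.
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\])')


def escape_lucene_query(text: str) -> str:
    """Prefix every Lucene special character or operator with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene_query("CollabSeer: A Search Engine for Collaboration Discovery"))
```

This would need to run on the raw user input before it is handed to the query parser. Note that it also escapes characters a user may have typed deliberately as operators (for example quotes for a phrase search), so whether to escape everything or only a subset such as `:` is a design decision.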

Running Crawler and Ingestion Service

Does the .pdf explain how to set up and run the crawler and ingestion service?

Is it possible for the crawler and ingestion service to run automatically when a link is submitted?

The project has a crawler folder, and when I build CiteSeerX, the dist folder contains some services. What am I supposed to do with them?

Is there a way to convert old CiteSeerX links to new?

At some point, it seems that the links have changed, and the older links now 404. Information and discussion can be found on Math.Stackexchange's Meta site.

An example: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.737 is now broken, and should be replaced with https://citeseerx.ist.psu.edu/pdf/d1d02576a8325e4b089d6549cc93c6984176038f. It was possible to find this because we knew beforehand the title of the paper; is there a way to do this, only knowing the old link?

maybe we should move the sitemap files to their own directory?

Are there plans for an API

Being able to query CiteSeerX directly, instead of resorting to tricks such as https://stackoverflow.com/questions/14085383/citeseerx-search-api, would be very nice.

No download link exists against my paper. Request for adding

Dear Sir,
I have a paper titled "Rare Intelligent Life on Universe" on CiteSeerX. Thank you for deleting the obsolete download link against that paper as per my request. But since the currently working download link has not been added, it is showing "The document with DOI "10.1.1.294.8283" has been removed" and no download link exists against my paper. Below are currently working links from which CiteSeerX can download my paper. I earnestly request you to add any of the working links below against my paper in CiteSeerX.
i) http://www.elixirpublishers.com/articles/1351335076_50%20(2012)%2010465.pdf
ii) http://www.academia.edu/3302269/Rare_Intelligent_Life_on_Universe
iii) http://vixra.org/pdf/1209.0098v2.pdf
Thanking You,
Yours sincerely,
Arnab Shome

Login CSS problem (requesting an insecure stylesheet and scripts) - https://citeseerx.ist.psu.edu/myciteseer/login

Chrome Version 61.0.3163.100 (Official Build) (64-bit) :-

Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-1.4.2.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-ui-1.8.custom.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery.idTabs.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/metacart.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/topnav.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/citeseerx.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/correctionutils.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/checkboxes.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/ga.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/s2button.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure favicon 'http://citeseerx.ist.psu.edu/favicon.ico'. This content should also be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.
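
The blocked requests above all come from absolute `http://` URLs on the same host. A small sketch for locating such references in a page's HTML and rewriting them to `https://` is shown below; the function names are hypothetical, the regular expression is deliberately coarse (it only looks at `script`, `link`, and `img` tags), and the sample markup is made up rather than taken from the actual login page.

```python
import re

# Finds http:// URLs in src/href attributes of script, link, and img tags.
_INSECURE = re.compile(
    r"""<(?:script|link|img)\b[^>]*?\b(?:src|href)\s*=\s*["'](http://[^"']+)["']""",
    re.IGNORECASE,
)


def find_insecure_references(html: str) -> list:
    """Return the http:// subresource URLs referenced by the page."""
    return _INSECURE.findall(html)


def upgrade_to_https(html: str, host: str) -> str:
    """Rewrite http:// URLs for the given host to https://."""
    return html.replace("http://" + host + "/", "https://" + host + "/")


# Made-up sample markup using two of the asset paths from the log above.
sample = (
    '<link rel="stylesheet" href="http://citeseerx.ist.psu.edu/css/main.css">\n'
    '<script src="http://citeseerx.ist.psu.edu/js/topnav.js"></script>\n'
)
print(find_insecure_references(sample))
print(upgrade_to_https(sample, "citeseerx.ist.psu.edu"))
```

In the templates themselves, the same effect is usually achieved by emitting host-relative paths such as `/css/main.css`, which inherit the scheme of the page.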

Firefox 57.0b8 (64-bit) :-
Login page looks like there is no CSS loaded.

IE 11 :-
Login page has content over other content.
"Only Secure Content is displayed" - "Show all Content" button

Edge 40.15063.674.0 :-
Login page has content over other content.

Kind regards,

Unicode support in searches

When searching for, e.g., works referencing Kazakçi, Kazakci will bring up the appropriate results, but Kazakçi, the author's actual name, is interpreted as KazakÃ§i.
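
The garbled form is what results when the UTF-8 bytes of the query are decoded as Latin-1 (ISO-8859-1) somewhere along the way. The following short sketch reproduces the symptom and shows the reverse repair; where exactly the wrong decoding happens in CiteSeerX (request parsing, the servlet container, or elsewhere) is not known from this report.

```python
def mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the misreading, assuming every character fits in Latin-1."""
    return text.encode("latin-1").decode("utf-8")


name = "Kazakçi"
garbled = mojibake(name)
print(garbled)          # KazakÃ§i
print(repair(garbled))  # Kazakçi
```

The lasting fix is to decode request parameters as UTF-8 in the first place rather than repairing strings afterwards.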

Solr Version

Your .pdf still requires Solr 1.6. Is that still valid?
*It is quite an old Solr version.
