
seerlabs / citeseerx


CiteSeerX public repository

License: Other

Shell 0.06% Perl 3.19% Python 4.74% CSS 0.66% HTML 52.02% XSLT 0.41% Java 29.23% JavaScript 9.59% Perl 6 0.04% TSQL 0.07%

citeseerx's People

Contributors

bharathgit956, dwj300, fanchyna, kylemarkwilliams, pychuang, sagnik, shauryr


citeseerx's Issues

number of file descriptors increases beyond limit

This happens on one of the web servers (03). I first got a "server down" message from the monitor. I then looked at the localhost file and saw lots of errors like "too many files opened". Basically, the system cannot even open a sitemap file and respond to the client. I then checked "ulimit -Hn" and the number was 8192. Using "lsof", I saw more than 10,000 rows, most of them like "web server -> repo server [CLOSE_WAIT]". This CLOSE_WAIT status is very annoying and has been directly causing the web server to go down. However, as far as I know, there's no way to clean up these sockets except to restart the service or reboot. In this situation, I had to restart Tomcat, but I wonder whether this is a software or hardware issue. On the other web servers, the numbers of file descriptors are below the limit (also 8192).
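
As a rough illustration of the diagnosis described above, the sketch below tallies TCP connection states from `lsof`-style output. A socket in CLOSE_WAIT means the remote side has closed the connection while the local process has not yet closed its end, so a large count usually points at the application rather than the hardware. The function name `count_tcp_states` and the sample lines are made up for illustration; real `lsof` output varies by version, so the parser assumes only that the state appears in parentheses at the end of a line.

```python
import re
from collections import Counter

# Matches a trailing TCP state such as "(CLOSE_WAIT)" at the end of a line.
_STATE_RE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_tcp_states(lsof_output: str) -> Counter:
    """Count how many lines end in each TCP state."""
    counts = Counter()
    for line in lsof_output.splitlines():
        match = _STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample, not taken from the affected server.
SAMPLE = """\
java 1234 tomcat 101u IPv4 0t0 TCP web03:51234->repo:http (CLOSE_WAIT)
java 1234 tomcat 102u IPv4 0t0 TCP web03:51235->repo:http (CLOSE_WAIT)
java 1234 tomcat 103u IPv4 0t0 TCP web03:51236->repo:http (ESTABLISHED)
java 1234 tomcat 104u IPv4 0t0 TCP *:http-alt (LISTEN)
"""

if __name__ == "__main__":
    # Print states from most to least frequent.
    print(count_tcp_states(SAMPLE).most_common())
```

Running the same tally periodically would show whether the CLOSE_WAIT count grows steadily toward the descriptor limit.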

Active Repo

Which is the active repo, GitHub or SourceForge? The .pdf says sf.net, but judging by the commit log, GitHub seems to be the one.

Mysql installation

Hi

I want to install MySQL as described in the document. I installed MySQL from the tar.gz file for generic Linux (I am using CentOS), following the tutorial here: https://dev.mysql.com/doc/refman/5.0/en/binary-installation.html. I have already changed etc/my.cnf as in the CXM document, but when I call "service mysqld start", I get the error message "Redirecting to /bin/systemctl start mysqld.service. Failed to issue method call : Unit mysqld.service failed to load : No such file or directory".

I read here: https://ask.fedoraproject.org/en/question/43459/how-to-start-mysql-mysql-isnt-starting/ that MySQL was replaced by MariaDB. So what do I need to do? Should I switch to MariaDB, or is there any other suggestion?

Sorry for my English.

Thanks in advance

Error 502s using direct links or the search form

Hi all,

I'm currently getting "502 Bad Gateways" attempting to look up resources on CiteSeerX. It seems to be somewhat intermittent, but mostly not working. In the act of troubleshooting, I tried both Firefox and a Chrome-based browser, in Incognito / Private Browsing mode, with or without "Enhanced Tracking Protection," my adblocker, and various security settings enabled. I think it's a site problem.

To the casual visitor, the "loading" animation just pulses forever with no results. But some clues arise if you pop open the dev console in the browser:

[Screenshot: one of the Error 502s in the Chromium dev console, in Incognito mode (to ensure no extensions are interfering)]

For example: this direct link https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.8740 (via https://en.wikipedia.org/wiki/Trusted_timestamping), or use of the search form on the homepage with search terms from the same paper both result in Error 502s from https://citeseerx.ist.psu.edu/api/getSummary?paper_id=10.1.1.46.8740, https://citeseerx.ist.psu.edu/api/search, or https://citeseerx.ist.psu.edu/api/aggregate.

I get that it's after close-of-business in the Eastern US, on a Friday, but I figured you'd like to know, being a widely-depended upon resource and all.

Cheers and good luck!

Couldn't build CiteSeerX .war file with Ant

Hi,
I ran into a problem when trying to build the CiteSeerX .war file with Ant.
It looks like Ant couldn't find the servlet-api.jar library (since the errors are located on ServletResponse). How can I fix this? Did I do something wrong?

Here is my environment:
Java version "1.8.0_25"
Tomcat 6
Ant 1.9.4

Here is the output of tomcat/bin/startup.sh:
Using CATALINA_BASE: /usr/share/tomcat6
Using CATALINA_HOME: /usr/share/tomcat6
Using CATALINA_TMPDIR: /usr/share/tomcat6/temp
Using JRE_HOME: /usr/local/sbin/jdk1.8.0_25/
Using CLASSPATH: /usr/share/tomcat6/bin/bootstrap.jar

Here is the error:
root@abraham-Parallels-Virtual-Platform:~/divusi/CiteSeerX# ant
Buildfile: /home/abraham/divusi/CiteSeerX/build.xml

init:

compile:
[javac] Compiling 7 source files to /home/abraham/divusi/CiteSeerX/build
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/CharsetFilter.java:42: error: cannot find symbol
[javac] response.setCharacterEncoding(encoding);
[javac] ^
[javac] symbol: method setCharacterEncoding(String)
[javac] location: variable response of type ServletResponse
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:102: error: cannot find symbol
[javac] builder.append(request.getLocalName());
[javac] ^
[javac] symbol: method getLocalName()
[javac] location: variable request of type HttpServletRequest
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:103: error: cannot find symbol
[javac] int port = request.getLocalPort();
[javac] ^
[javac] symbol: method getLocalPort()
[javac] location: variable request of type HttpServletRequest
[javac] Note: /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/HeaderFilter.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 3 errors

Installation Solr 5.1

Hi. I want to install CiteSeerX. Your document uses Solr 1.6, but I want to use Solr 5.2.1. I then found information at https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Tomcat saying that Solr does not support deployment as a WAR file. And when I installed Solr, it crashed because Tomcat was already listening on the port. How do I solve this problem?

Do you have a solution for installing CiteSeerX with Solr 5.2 or higher?

Thanks in advance

Crawl history does not reflect real number of papers ingested in CiteSeer

On the crawl web site, the crawl history page shows the proportion of documents "In System", "Crawled" and "Fail to Convert". However, "In System" only means that documents have been extracted, not necessarily that they have been ingested; documents may still be in the waiting list. Because of the significant speed difference between ingestion and extraction, the waiting list can be long. Therefore, we need a fourth parameter reflecting the real number of papers ingested. This can be done in three steps:
(1) add a new flag in the "state" field in the citeseerx_crawl.main_crawl_document table to indicate ingested papers;
(2) update view.py, adding "ingested_count" and calculating it in some way (either dynamically from the production database, or from the crawling, or from a database dump);
(3) update the template, adding "ingested_count" to the displayed graph.
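
The counting in step (2) could look roughly like the sketch below. It is plain Python rather than the project's actual view.py, and the state codes, the function name `crawl_summary`, and every dictionary key other than "ingested_count" are hypothetical; the issue does not say which values the "state" field currently uses.

```python
from collections import Counter

# Hypothetical state codes; the real values stored in
# citeseerx_crawl.main_crawl_document.state are not given in the issue.
STATE_FAILED = -1    # "Fail to Convert"
STATE_CRAWLED = 0    # "Crawled"
STATE_IN_SYSTEM = 1  # "In System" (extracted, possibly still waiting)
STATE_INGESTED = 2   # proposed new flag for ingested papers


def crawl_summary(states):
    """Summarise an iterable of per-document state codes for the history page."""
    counts = Counter(states)
    total = sum(counts.values())

    def share(code):
        # Guard against an empty table to avoid dividing by zero.
        return counts[code] / total if total else 0.0

    return {
        "total": total,
        "crawled_count": counts[STATE_CRAWLED],
        "in_system_count": counts[STATE_IN_SYSTEM],
        "failed_count": counts[STATE_FAILED],
        "ingested_count": counts[STATE_INGESTED],
        "ingested_share": share(STATE_INGESTED),
    }


if __name__ == "__main__":
    print(crawl_summary([0, 1, 1, 2, -1, 2, 2, 1]))
```

A template could then plot "ingested_count" next to the three existing series.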

Escaping Special Characters

We should probably escape special characters in the query string.
For example, this query returns an error:
http://citeseerx.ist.psu.edu/search?q=CollabSeer:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc

but this one works:
http://citeseerx.ist.psu.edu/search?q=CollabSeer\:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc

Based on the reference here http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
we should escape the following special characters.

    + - && || ! ( ) { } [ ] ^ " ~ * ? : \
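
A minimal sketch of such escaping is shown below. It is not the CiteSeerX implementation: the function name `escape_lucene` is hypothetical, and the character set follows the Lucene query parser syntax page referenced above, which also lists `+` and `-`. Each special character, and each of the two-character operators `&&` and `||`, is prefixed with a single backslash.

```python
import re

# Special characters from the Lucene query parser syntax page.
# "&&" and "||" are matched as units so they receive a single backslash.
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\])')


def escape_lucene(query: str) -> str:
    """Prefix every Lucene special character in the query with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", query)


if __name__ == "__main__":
    print(escape_lucene("CollabSeer: A Search Engine for Collaboration Discovery"))
```

Escaping everything also disables intentional query syntax such as quoted phrases, so a real fix would need to decide whether user input is treated as plain text or as a Lucene query.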

Running Crawler and Ingestion Service

Does the .pdf explain how to set up and run the crawler and ingestion service?

Is it possible for the crawler and ingestion service to run automatically when a link is submitted?

The project has a crawler folder, and when I build CiteSeerX, some services appear in the dist folder. What am I supposed to do with them?

Is there a way to convert old CiteSeerX links to new?

At some point, it seems that the links have changed, and the older links now 404. Information and discussion can be found on Math.Stackexchange's Meta site.

An example: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.737 is now broken, and should be replaced with https://citeseerx.ist.psu.edu/pdf/d1d02576a8325e4b089d6549cc93c6984176038f. It was possible to find this because we knew beforehand the title of the paper; is there a way to do this, only knowing the old link?

No download link exists against my paper. Request for adding

Dear Sir,
I have a paper titled "Rare Intelligent Life on Universe" on CiteSeerX. Thank you for deleting the obsolete download link for that paper as per my request. However, since the currently working download link has not been added, the page shows "The document with DOI "10.1.1.294.8283" has been removed" and no download link exists for my paper. The links below currently work, and CiteSeerX can download my paper from any of them. I earnestly request you to add any of these working links to my paper's entry in CiteSeerX.
i) http://www.elixirpublishers.com/articles/1351335076_50%20(2012)%2010465.pdf
ii) http://www.academia.edu/3302269/Rare_Intelligent_Life_on_Universe
iii) http://vixra.org/pdf/1209.0098v2.pdf
Thanking You,
Yours sincerely,
Arnab Shome

Login CSS problem (requesting an insecure stylesheet and scripts) - https://citeseerx.ist.psu.edu/myciteseer/login

Login CSS problem https://citeseerx.ist.psu.edu/myciteseer/login

Chrome Version 61.0.3163.100 (Official Build) (64-bit) :-

Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-1.4.2.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-ui-1.8.custom.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery.idTabs.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/metacart.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/topnav.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/citeseerx.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/correctionutils.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/checkboxes.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/ga.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/s2button.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure favicon 'http://citeseerx.ist.psu.edu/favicon.ico'. This content should also be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.

Firefox 57.0b8 (64-bit) :-
Login page looks like there is no CSS loaded.

IE 11 :-
Login page has content over other content.
"Only Secure Content is displayed" - "Show all Content" button

Edge 40.15063.674.0 :-
Login page has content over other content.
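
The console messages above all concern assets requested over http:// from the site's own host. As a hedged sketch of one possible remedy, the function below rewrites such same-host references in an HTML string to https://. The function name `upgrade_same_host_urls` and the sample markup are hypothetical, since the actual CiteSeerX templates are not shown in this report; editing the templates to use relative or https URLs directly would have the same effect.

```python
import re


def upgrade_same_host_urls(html: str, host: str) -> str:
    """Rewrite http://<host>... references to https://<host>...

    Only URLs on the given host are touched; other hosts are left alone.
    """
    # The lookahead requires a path separator or closing quote after the host,
    # so "http://host.example.org" is not rewritten for host "host.example".
    pattern = re.compile(r"http://" + re.escape(host) + r"(?=[/\"'])")
    return pattern.sub("https://" + host, html)


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="http://citeseerx.ist.psu.edu/css/main.css">'
    print(upgrade_same_host_urls(page, "citeseerx.ist.psu.edu"))
```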

CiteSeerX Data

Dear all,

I am a PhD student and researcher in Brazil working on expertise retrieval and comparison. For my studies and proposal tests, I would like access to the CiteSeerX data through rsync, as described on the site.

Kind regards,

Unicode support in searches

When searching for, e.g., works referencing Kazakçi, Kazakci will bring up the appropriate results, but Kazakçi, the author's actual name, is interpreted as Kazakçi.
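
The report does not say where the name gets mangled. One common cause of this kind of symptom, assumed here purely for illustration, is UTF-8 bytes being decoded as Latin-1 somewhere in the request path. The sketch below reproduces that effect and also shows accent folding, which would explain why the unaccented spelling finds the results. Both function names are hypothetical.

```python
import unicodedata


def mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (an assumed failure mode)."""
    return text.encode("utf-8").decode("latin-1")


def fold_accents(text: str) -> str:
    """Strip combining marks so accented letters match their base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    name = "Kazak\u00e7i"  # the author's name with c-cedilla
    print(mojibake(name))      # two Latin-1 characters replace the c-cedilla
    print(fold_accents(name))  # unaccented spelling
```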

Solr Version

Your .pdf still requires Solr 1.6. Is that still valid?
*That is quite an old Solr version.
