seerlabs / citeseerx
CiteSeerX public repository
License: Other
This happens on one of the web servers (03). I first got a "server down" message from the monitor. I then looked at the localhost file and saw many errors like "too many files opened". Basically, the system cannot even open a sitemap file and respond to the client. I then checked "ulimit -Hn" and the limit was 8192. Using "lsof" I see more than 10,000 rows, and most of them look like "web server -> repo server [CLOSE_WAIT]". This CLOSE_WAIT state is very annoying and has been directly causing the web server to go down. However, as far as I know, there is no way to clean up the sockets except to restart the service or reboot. In this situation, I had to restart Tomcat, but I wonder whether this is a software or a hardware issue. The number of open file descriptors is below the limit (also 8192) on the other web servers.
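A socket sits in CLOSE_WAIT when the remote side (here the repo server) has closed the connection but the local process has not yet closed its own end, so a growing count usually points at the application leaking connections rather than at hardware. A minimal sketch for tallying connection states from saved `lsof -i` style output, so the count can be tracked over time; the function name and the sample lines are illustrative assumptions, not part of CiteSeerX:

```python
import re
from collections import Counter

# Matches the trailing TCP state that lsof prints in parentheses, e.g. "(CLOSE_WAIT)".
STATE_RE = re.compile(r"\((?P<state>[A-Z_0-9]+)\)\s*$")

def count_tcp_states(lsof_lines):
    """Count TCP connection states in lines of `lsof -i` style output."""
    counts = Counter()
    for line in lsof_lines:
        match = STATE_RE.search(line)
        if match:
            counts[match.group("state")] += 1
    return counts

# Hypothetical sample lines; hosts and PIDs are made up for illustration.
sample = [
    "java 1234 tomcat 101u IPv4 0t0 TCP web03:41234->repo01:http (CLOSE_WAIT)",
    "java 1234 tomcat 102u IPv4 0t0 TCP web03:41235->repo01:http (CLOSE_WAIT)",
    "java 1234 tomcat 103u IPv4 0t0 TCP web03:41236->repo01:http (ESTABLISHED)",
    "java 1234 tomcat 104u REG /var/log/tomcat/localhost.log",
]
print(count_tcp_states(sample))
```

Lines without a trailing state, such as regular files, are ignored, so the same function can be fed the full `lsof` output of the Tomcat process.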
Which is the active repo, GitHub or SourceForge? The .pdf says sf.net, but judging by the commit log, GitHub seems to be the one.
When viewing an author who does not have a homepage listed, the "Submit a homepage" link points to a "localhost" address.
According to https://www.ssllabs.com/ssltest/analyze.html?d=citeseerx.ist.psu.edu all supported ciphers are either weak or insecure:
Some libraries and runtimes deliberately don't support such ciphers, so it is not possible to access the site via HTTPS using those libraries and runtimes. E.g., it is not possible with rustls and Deno (denoland/deno#10447).
Hi
I want to install MySQL as described in the document. I installed MySQL from the generic Linux tar.gz file (I am using CentOS), following the tutorial here: https://dev.mysql.com/doc/refman/5.0/en/binary-installation.html. I already changed etc/my.cnf as described in the CXM document, but when I call "service mysqld start", I get the error message "Redirecting to /bin/systemctl start mysqld.service. Failed to issue method call : Unit mysqld.service failed to load : No such file or directory".
I read here: https://ask.fedoraproject.org/en/question/43459/how-to-start-mysql-mysql-isnt-starting/ that MySQL was replaced by MariaDB. So what do I need to do? Should I switch to MariaDB, or is there any other suggestion?
Sorry for my English.
Thanks in advance
Hi all,
I'm currently getting "502 Bad Gateways" attempting to look up resources on CiteSeerX. It seems to be somewhat intermittent, but mostly not working. In the act of troubleshooting, I tried both Firefox and a Chrome-based browser, in Incognito / Private Browsing mode, with or without "Enhanced Tracking Protection," my adblocker, and various security settings enabled. I think it's a site problem.
To the casual visitor, the "loading" animation just pulses forever with no results. But some clues arise if you pop open the dev console in the browser:
For example: this direct link https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.8740 (via https://en.wikipedia.org/wiki/Trusted_timestamping), or use of the search form on the homepage with search terms from the same paper both result in Error 502s from https://citeseerx.ist.psu.edu/api/getSummary?paper_id=10.1.1.46.8740, https://citeseerx.ist.psu.edu/api/search, or https://citeseerx.ist.psu.edu/api/aggregate.
I get that it's after close-of-business in the Eastern US, on a Friday, but I figured you'd like to know, being a widely-depended upon resource and all.
Cheers and good luck!
Hi,
I have a problem when trying to build the CiteSeerX .war file with Ant.
It looks like Ant couldn't find the servlet-api.jar library (since the first error is on ServletResponse). How can I fix this? Did I do something wrong?
Here is my environment:
Java version "1.8.0_25"
Tomcat 6
Ant 1.9.4
Here is the output of tomcat/bin/startup.sh:
Using CATALINA_BASE: /usr/share/tomcat6
Using CATALINA_HOME: /usr/share/tomcat6
Using CATALINA_TMPDIR: /usr/share/tomcat6/temp
Using JRE_HOME: /usr/local/sbin/jdk1.8.0_25/
Using CLASSPATH: /usr/share/tomcat6/bin/bootstrap.jar
Here is the error:
root@abraham-Parallels-Virtual-Platform:~/divusi/CiteSeerX# ant
Buildfile: /home/abraham/divusi/CiteSeerX/build.xml
init:
compile:
[javac] Compiling 7 source files to /home/abraham/divusi/CiteSeerX/build
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/CharsetFilter.java:42: error: cannot find symbol
[javac] response.setCharacterEncoding(encoding);
[javac] ^
[javac] symbol: method setCharacterEncoding(String)
[javac] location: variable response of type ServletResponse
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:102: error: cannot find symbol
[javac] builder.append(request.getLocalName());
[javac] ^
[javac] symbol: method getLocalName()
[javac] location: variable request of type HttpServletRequest
[javac] /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/RedirectUtils.java:103: error: cannot find symbol
[javac] int port = request.getLocalPort();
[javac] ^
[javac] symbol: method getLocalPort()
[javac] location: variable request of type HttpServletRequest
[javac] Note: /home/abraham/divusi/CiteSeerX/src/java/edu/psu/citeseerx/webutils/HeaderFilter.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 3 errors
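`ServletResponse.setCharacterEncoding(String)`, `getLocalName()` and `getLocalPort()` were all introduced in Servlet 2.4, so these three errors suggest that javac is picking up an older servlet API jar on the build classpath rather than finding none at all. A rough way to check which specification version a jar declares is to read its manifest. The sketch below assumes the jar carries a `Specification-Version` entry (not every jar does), and the helper name is hypothetical:

```python
import io
import zipfile

def jar_spec_version(jar_file):
    """Return the Specification-Version from a jar's manifest, or None if absent.

    `jar_file` may be a path or a file-like object.
    """
    with zipfile.ZipFile(jar_file) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
        except KeyError:
            # The archive has no manifest at all.
            return None
    for line in manifest.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Specification-Version":
            return value.strip()
    return None

# Build a tiny in-memory jar to demonstrate; a real check would pass the path
# of the servlet-api.jar that the Ant build actually puts on its classpath.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF",
                 "Manifest-Version: 1.0\nSpecification-Version: 2.3\n")
demo.seek(0)
print(jar_spec_version(demo))
```

If the reported version is below 2.4, pointing the build at the servlet-api.jar shipped with Tomcat 6 would be the first thing to try.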
Hi. I want to install CiteSeerX. Your document uses Solr 1.6, but I want to use Solr 5.2.1. I found information at https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Tomcat that Solr no longer supports deployment as a WAR file. And when I installed Solr, it crashed because its port was already in use by Tomcat. How do I solve this problem?
Do you have a solution for installing CiteSeerX with Solr 5.2 or higher?
Thanks in advance
On the crawl web site, the crawl history page shows the proportion of documents "In System", "Crawled" and "Fail to Convert". However, "In System" only means that the documents have been extracted, not necessarily that they have been ingested; documents may still be in the waiting list. Because of the significant speed difference between ingestion and extraction, the waiting list can be long. Therefore, we need a fourth parameter reflecting the real number of papers ingested. This can be done in three steps:
(1) add a new flag to the "state" field in the citeseerx_crawl.main_crawl_document table to indicate ingested papers;
(2) update view.py, adding "ingested_count" and calculating it in some way (either dynamically from the production database, from the crawling, or from a database dump);
(3) update the template, adding "ingested_count" to the displayed graph.
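The calculation in step (2) could be sketched roughly as follows. This is only an illustration: the function name, every dictionary key other than "ingested_count", and the choice to report percentages of the crawled total are assumptions, not the actual contents of view.py.

```python
def crawl_history_stats(crawled, in_system, failed, ingested):
    """Return counts plus percentages of the crawled total for the history graph.

    `in_system` counts extracted documents; `ingested` counts the subset that
    has actually been ingested (the proposed fourth parameter).
    """
    if ingested > in_system:
        raise ValueError("ingested documents cannot exceed extracted documents")

    def pct(part):
        # Guard against an empty crawl to avoid dividing by zero.
        return round(100.0 * part / crawled, 1) if crawled else 0.0

    return {
        "crawled_count": crawled,
        "in_system_count": in_system,
        "failed_count": failed,
        "ingested_count": ingested,
        "waiting_count": in_system - ingested,  # extracted but not yet ingested
        "in_system_pct": pct(in_system),
        "failed_pct": pct(failed),
        "ingested_pct": pct(ingested),
    }

stats = crawl_history_stats(crawled=1000, in_system=600, failed=150, ingested=450)
print(stats["ingested_count"], stats["waiting_count"], stats["ingested_pct"])
```

Reporting the waiting count alongside the ingested count makes the backlog between extraction and ingestion visible directly in the graph.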
We should probably escape special characters in the query string.
For example, this query returns an error:
http://citeseerx.ist.psu.edu/search?q=CollabSeer:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc
but this one works:
http://citeseerx.ist.psu.edu/search?q=CollabSeer\:+A+Search+Engine+for+Collaboration+Discovery&submit=Search&fs=1&sort=rlv&t=doc
Based on the reference here http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
we should escape the following special characters.
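The query parser syntax page linked above lists `+ - && || ! ( ) { } [ ] ^ " ~ * ? : \` as special characters, each escaped by a preceding backslash. A minimal sketch of such escaping follows; it is not the CiteSeerX implementation, and the function name is made up for illustration:

```python
import re

# Single-character specials from the Lucene query parser syntax, plus the
# two-character operators && and ||.
_LUCENE_SPECIALS = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\])')

def escape_lucene_query(text):
    """Prefix every Lucene special character (or operator) with a backslash."""
    return _LUCENE_SPECIALS.sub(r"\\\1", text)

print(escape_lucene_query("CollabSeer: A Search Engine for Collaboration Discovery"))
```

Escaping every special character also disables deliberate operators such as quoted phrases or wildcards, so a real fix might escape only a subset, for instance the colon when the text before it is not a known field name.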
Does the .pdf explain how to set up and run the crawler and ingestion service?
Is it possible for the crawler and ingestion service to run automatically when a link is submitted?
The project contains a crawler folder, and when I build CiteSeerX, the dist folder contains some services. What am I supposed to do with them?
At some point, it seems that the links have changed, and the older links now 404. Information and discussion can be found on Math.Stackexchange's Meta site.
An example: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.737 is now broken, and should be replaced with https://citeseerx.ist.psu.edu/pdf/d1d02576a8325e4b089d6549cc93c6984176038f. It was possible to find this because we knew the title of the paper beforehand; is there a way to do this knowing only the old link?
Being able to query CiteSeerX directly, instead of resorting to tricks such as https://stackoverflow.com/questions/14085383/citeseerx-search-api, would be very nice.
Dear Sir,
I have a paper titled "Rare Intelligent Life on Universe" on CiteSeerX. Thank you for deleting the obsolete download link for that paper as per my request. But since the currently working download link has not been added, the page shows "The document with DOI "10.1.1.294.8283" has been removed" and no download link exists for my paper. The links below currently work, and CiteSeerX can download my paper from them. I earnestly request you to add any of the working links below to my paper in CiteSeerX.
i) http://www.elixirpublishers.com/articles/1351335076_50%20(2012)%2010465.pdf
ii) http://www.academia.edu/3302269/Rare_Intelligent_Life_on_Universe
iii) http://vixra.org/pdf/1209.0098v2.pdf
Thanking You,
Yours sincerely,
Arnab Shome
Login CSS problem https://citeseerx.ist.psu.edu/myciteseer/login
Chrome Version 61.0.3163.100 (Official Build) (64-bit) :-
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-1.4.2.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery-ui-1.8.custom.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/jquery.idTabs.min.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/metacart.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/topnav.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/citeseerx.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/correctionutils.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/checkboxes.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/ga.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure script 'http://citeseerx.ist.psu.edu/js/s2button.js'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure favicon 'http://citeseerx.ist.psu.edu/favicon.ico'. This content should also be served over HTTPS.
Mixed Content: The page at 'https://citeseerx.ist.psu.edu/myciteseer/login' was loaded over HTTPS, but requested an insecure stylesheet 'http://citeseerx.ist.psu.edu/css/main.css'. This request has been blocked; the content must be served over HTTPS.
Firefox 57.0b8 (64-bit) :-
Login page looks like there is no CSS loaded.
IE 11 :-
Login page has content over other content.
"Only Secure Content is displayed" - "Show all Content" button
Edge 40.15063.674.0 :-
Login page has content over other content.
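The console messages above all concern assets requested over `http://` from a page served over HTTPS. One common remedy is to reference such assets with `https://` (or root-relative) URLs in the page templates. The sketch below illustrates the rewrite on an HTML string; the function name and the regex-based approach are assumptions for illustration, and the real fix would belong in the site's templates or configuration.

```python
import re

def upgrade_asset_urls(html, host="citeseerx.ist.psu.edu"):
    """Rewrite http:// URLs for `host` in src/href attributes to https://."""
    pattern = re.compile(
        r'(?P<attr>\b(?:src|href)\s*=\s*["\'])http://' + re.escape(host),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group("attr") + "https://" + host, html)

page = (
    '<link rel="stylesheet" href="http://citeseerx.ist.psu.edu/css/main.css">'
    '<script src="http://citeseerx.ist.psu.edu/js/topnav.js"></script>'
)
print(upgrade_asset_urls(page))
```

Only URLs on the named host are touched, so links to third-party sites that may not support HTTPS are left unchanged.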
Certificates expired
Dear all,
I am a PhD student and researcher in Brazil working on expertise retrieval and comparison. For my studies and to test my proposal, I would like access to the CiteSeerX data through rsync, as described on the site.
Kind regards,
When searching for, e.g., works referencing Kazakçi, "Kazakci" will bring up the appropriate results, but "Kazakçi", the author's actual name, is interpreted as "Kazakçi".
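One plausible cause, offered only as an assumption since the report does not show the server side, is that the UTF-8 bytes of the query are decoded with a single-byte charset such as ISO-8859-1 somewhere along the way. The sketch below reproduces that kind of corruption and also shows accent folding, which would let "Kazakci" and "Kazakçi" match the same records; both helper names are hypothetical.

```python
import unicodedata

def mojibake(text):
    """Simulate UTF-8 bytes being wrongly decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")

def fold_accents(text):
    """Strip combining marks so accented letters compare equal to plain ones."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

name = "Kazakçi"
print(mojibake(name))      # the garbled form a mis-decoding server would see
print(fold_accents(name))  # the ASCII-folded form usable as a search key
```

Applying the same folding to both the indexed author names and the incoming query is one way to make the two spellings equivalent, provided the query is decoded as UTF-8 first.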
Your .pdf still requires Solr 1.6. Is that still valid? That is quite an old Solr version.