arXiv (open issue in archives)

ipfs-inactive avatar ipfs-inactive commented on August 19, 2024
arXiv

from archives.

Comments (29)

jbenet avatar jbenet commented on August 19, 2024

👍 👍 👍 👍 @davidar on mirroring the CC part of arXiv on IPFS.

rht avatar rht commented on August 19, 2024

I don't have enough network and storage bandwidth to do this myself, but I'd like to cat the source archives at https://gateway.ipfs.io/ipfs/QmPFiQcfUPr9DYTJ7MPzY9ixxWkfK1oqQQEmQHS1zeD5PP and then process them to find the most-used operators (1-grams) and variable names.

e.g. with 50 sources (from 1 field), listed as token, count, and count relative to "=":

1. { 15425 590.54%
2. _ 9059 346.82%
3. ^ 7225 276.61%
4. - 4081 156.24%
5. + 3019 115.58%
6. = 2612 100.00%
7. \frac 1757 67.27%
8. \mu 1485 56.85%
9. \partial 1295 49.58%
10. \bar 1290 49.39%

The aim is to see whether the brevity of a notation goes along with how often it is used.
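A minimal sketch of how such a 1-gram count could be produced is shown below. It is not the script linked later in this thread; the tokenizer regex and the function names `count_tokens` and `ranked` are assumptions made for illustration. The percentage column is computed relative to the count of "=", which matches the table above (for example 15425 / 2612 ≈ 590.54%).

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the thread's script):
# a control word (\frac), a control symbol (\{), or any single
# character that is neither whitespace nor alphanumeric.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|[^\sA-Za-z0-9]")


def count_tokens(sources):
    """Count token 1-grams over an iterable of LaTeX source strings."""
    counts = Counter()
    for text in sources:
        counts.update(TOKEN_RE.findall(text))
    return counts


def ranked(counts, baseline="="):
    """Return (token, count, percent of the baseline count), most common first."""
    base = counts[baseline]
    rows = []
    for token, n in counts.most_common():
        pct = 100.0 * n / base if base else float("nan")
        rows.append((token, n, pct))
    return rows


sample = [r"\frac{\partial u}{\partial t} = \mu u", r"x^2 + y^2 = r^2"]
for rank, (token, n, pct) in enumerate(ranked(count_tokens(sample))[:10], 1):
    print(f"{rank}. {token} {n} {pct:.2f}%")
```

Plain letters and digits are skipped here, so single-letter variable names would need a separate pass.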

davidar avatar davidar commented on August 19, 2024

@rht what I'd really love to do is http://latexsearch.com

But more generally, running computations on large distributed datasets is an interesting problem. IPMR (map reduce) anyone? It would be cool if you could pay nodes (bitcoin?) to run some sandboxed code over their local blockstore and publish the results back to the ipfs network. @jbenet have you had any thoughts on this topic?
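As a rough illustration of the map-reduce idea, the sketch below simulates nodes as an in-memory dictionary. It does not use any IPFS API, and it leaves out sandboxing, payment, and publishing results; the names `map_node`, `reduce_counts`, and `simulated_nodes` are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_node(local_docs):
    """Map step: runs on one node, over the documents that node holds locally."""
    counts = Counter()
    for doc in local_docs:
        counts.update(doc.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge per-node partial counts into a single Counter."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Simulated nodes (an assumption): node name -> documents stored on that node.
simulated_nodes = {
    "node-a": [r"\alpha + \beta", r"\alpha"],
    "node-b": [r"\beta \beta"],
}
total = reduce_counts(map_node(docs) for docs in simulated_nodes.values())
print(total.most_common(3))
```

Only the small per-node `Counter` objects would need to travel over the network, not the documents themselves, which is the appeal of running the map step next to the data.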

davidar avatar davidar commented on August 19, 2024

@rht Also, if you could give me the script to do this I'd be happy to run it on pollux

rht avatar rht commented on August 19, 2024

(On latexsearch: it is too crude. It's like searching through LaTeX "bytecodes" instead of the formulas they represent.)

I will put the script here after I have modified it to walk through several files.
Currently it is https://ipfs.io/ipfs/QmQtLp5BxYVgo6miYz3EUAaoP32eqUcW6G8ywsiMxGB5vT

davidar avatar davidar commented on August 19, 2024

@rht I guess my interest is more academic, as in, has anybody written anything about this expression I've just derived?

Also, it would be really cool to finalise this issue soon (and publicise it a little bit). @whyrusleeping what's the best way to proceed with adding lots of small files at the moment? Edit: use the tar adder? Are you able to do something like `ipfs cat /ipfs/hash/archive.tar/foo/bar.txt`?

Edit: and also do cool stuff with it to showcase the benefits of publishing with a Creative Commons license. CC-BY-SA is really the embodiment of the scientific social contract.

rht avatar rht commented on August 19, 2024

But it would be better if there were specifiers, e.g. whether the expression is vectorized, approximate or exact, etc.

https://ipfs.io/ipfs/QmcH6av29y7fXBqifmEMU54vVuMYjpBFBdcQsDkWZK8Hbi should work with `python parselatex.py dirname`.
I also wanted to know how long it takes to traverse the entire CC arXiv.
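For timing a traversal, a sketch along the following lines could be used. The contents of parselatex.py are not shown in this thread, so this is not that script; the `.tex` filter and the names `iter_tex_files` and `traverse` are assumptions.

```python
import os
import time


def iter_tex_files(root):
    """Yield paths of .tex files under root (the extension filter is an assumption)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".tex"):
                yield os.path.join(dirpath, name)


def traverse(root):
    """Read every .tex file under root; return (file count, bytes read, seconds)."""
    start = time.perf_counter()
    n_files = 0
    n_bytes = 0
    for path in iter_tex_files(root):
        with open(path, "rb") as fh:
            n_bytes += len(fh.read())
        n_files += 1
    return n_files, n_bytes, time.perf_counter() - start
```

Calling `traverse("dirname")` reports wall-clock time only, so it corresponds to the `real` figure of the shell's `time` output rather than `user` or `sys`.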

davidar avatar davidar commented on August 19, 2024

@rht definitely, and that's exactly why an open source version of LaTeX search would be better ☺

davidar avatar davidar commented on August 19, 2024

@rht https://ipfs.io/ipfs/QmeFkmYSPhn33hzEHxxqtvDNxHorW93iZFFMgZDQKrfeH4

real 6m24.104s
user 3m34.448s
sys 1m14.448s

jbenet avatar jbenet commented on August 19, 2024

We should get @rht access on pollux -- @rht what's your ssh pubkey?

On Tue, Sep 15, 2015 at 6:13 AM, David A Roberts wrote:

> @rht https://ipfs.io/ipfs/QmeFkmYSPhn33hzEHxxqtvDNxHorW93iZFFMgZDQKrfeH4
>
> real 6m24.104s
> user 3m34.448s
> sys 1m14.448s

rht avatar rht commented on August 19, 2024

That's too fast!

This would have taken ages if I were to do it by myself (or if anyone had done this before).

It appears that

  • \sum / \int (map-reduce of +) is the most used operator (\partial doesn't count because it appears 3 or 4 times in a line). This could have been matmul, but unfortunately matrix multiplication is implicit in LaTeX.
  • \alpha is actually being used more than \mu.
  • \rangle happens more often than \psi (by a small margin).
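One way to discount tokens such as \partial that repeat several times within a line is to count the number of lines each token appears on, rather than its total occurrences. The sketch below shows that idea; it is an assumption about how such a correction could be made, not necessarily how the figures above were obtained, and `line_frequency` is a hypothetical name.

```python
import re
from collections import Counter

# Matches LaTeX control words such as \partial or \sum.
CONTROL_WORD_RE = re.compile(r"\\[A-Za-z]+")


def line_frequency(text):
    """Count, for each control word, the number of lines it appears on."""
    counts = Counter()
    for line in text.splitlines():
        # set() collapses repeats within one line to a single hit.
        counts.update(set(CONTROL_WORD_RE.findall(line)))
    return counts


sample = "\\partial_t u + \\partial_x u = \\sum_i a_i\n\\sum_j b_j"
print(line_frequency(sample))
```

In the sample, \partial occurs twice but only on one line, while \sum occurs on two lines, so \sum ranks higher under this measure.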

rht avatar rht commented on August 19, 2024

@jbenet https://api.github.com/users/rht/keys, the one with id 12491781, as @lgierth suggested.

whyrusleeping avatar whyrusleeping commented on August 19, 2024

@davidar the IPMR thing sounds like it could be accomplished with ethereum and ipfs.

whyrusleeping avatar whyrusleeping commented on August 19, 2024

> @whyrusleeping what's the best way to proceed with adding lots of small files at the moment? Edit: use the tar adder? Are you able to do something like ipfs cat /ipfs/hash/archive.tar/foo/bar.txt?

The most reliable way is going to be to add the content offline. Even the tar add command will hit the file descriptor issue :/

 avatar commented on August 19, 2024

> We should get @rht access on pollux

Will take care of it this sprint: ipfs/infra#87

jbenet avatar jbenet commented on August 19, 2024

> it would be cool if you could pay nodes (bitcoin?) to run some sandboxed code over their local blockstore and publish the results back to the ipfs network. @jbenet have you had any thoughts on this topic?

that's like phase 3 of the IPPlan. (we're in phase 1). we could run some experiments now with this. it shouldn't be too hard.

davidar avatar davidar commented on August 19, 2024

> The most reliable way is going to be to add the content offline. Even the tar add command will hit the file descriptor issue :/

@whyrusleeping I am, still too slow :(

Re ethereum, can you elaborate? I've played with it a little bit, and it seems incapable of doing any kind of interesting computation before heat death of the universe...

@jbenet cool! What's phase 2?

davidar avatar davidar commented on August 19, 2024

There's still a little bit of work that needs to be done, but I'm going to go ahead and call this the first complete (pdf+src+metadata) release of the arXiv archive:

https://ipfs.io/ipfs/QmfXH9XtP7xmoTH8WAp4HNSduqWMwLTH8B8TvbTkdgzNAa

🎈 🎆

jbenet avatar jbenet commented on August 19, 2024

🌟 🌠 🎈 👍

NDuma avatar NDuma commented on August 19, 2024

Link to GitHub via GitXiv.com

davidar avatar davidar commented on August 19, 2024

@NDuma GitXiv looks really cool, thanks! cc @samim23 @mekarpeles

rht avatar rht commented on August 19, 2024

@davidar Request for a reproducible build script. The spec for the package metadata (which includes the build process) should be achievable at the 25 GB scale (roughly one server node), regardless of the complications of the multi-node case.

davidar avatar davidar commented on August 19, 2024

@rht I'm not sure there is a script (yet), it was mostly a (semi-)manual process... :/

rht avatar rht commented on August 19, 2024

I see. I have re-created the scholarpedia.org archive in a local branch; at ~1 GB it fits on a single node. I wasn't able to find the example _Metadata.json from @eminence, since it looks like the content has been garbage-collected.

leerspace avatar leerspace commented on August 19, 2024

The hash for this on archives.ipfs.io (/ipfs/QmfXH9XtP7xmoTH8WAp4HNSduqWMwLTH8B8TvbTkdgzNAa) seems to have become unavailable. If anyone still has a copy I can pin it to my node which I plan to keep running indefinitely.

whyrusleeping avatar whyrusleeping commented on August 19, 2024

I may have a copy on my NAS box, which is currently in a storage unit (recently moved). If nobody comes forth with another copy in the next couple of months, I'll venture out to get it and plug it into a fat pipe.

razum2um avatar razum2um commented on August 19, 2024

Seems like QmfXH9XtP7xmoTH8WAp4HNSduqWMwLTH8B8TvbTkdgzNAa is still unavailable, isn't it?

Stebalien avatar Stebalien commented on August 19, 2024

I've managed to find a piece of it. I'm going to see if I can find any more.

razum2um avatar razum2um commented on August 19, 2024

@Stebalien 18.91 MiB? yep, got it and nothing more for days
