dgl / cpangrep
Search code on CPAN with Regexps. No longer maintained, I suggest using http://grep.metacpan.org/ instead.
Is there a command line tool (perhaps cpangrep or cpang) for this wonderful service yet? I haven't found any existing tool. If that's the case, I'll have to build one.
Currently leaking Redis connections on each search, which is somewhat bad. Track this down.
The documentation at http://grep.cpan.me/about says:
For example -dist=perl to exclude perl, file:.xs to search only XS files or -file:"ppport.h" to exclude ppport.h.
However, that doesn't work:
http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport%5C.h%22
returns lots of ppport.h results.
The correct syntax seems to be
SvPV_const -file:"ppport.h"
which gives the correct results:
http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport.h%22
not containing ppport.h results.
It looks like either file names are not treated as regular expressions or, if they are supposed to be regexes, this may be a more serious issue.
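To make the difference between the two URLs above easier to see, here is a small Python sketch that decodes the `q` parameter of each. It only uses the standard library; the helper name `decoded_query` is made up for this illustration and is not part of cpangrep.

```python
from urllib.parse import urlsplit, parse_qs


def decoded_query(url: str) -> str:
    """Return the decoded value of the `q` parameter of a search URL."""
    return parse_qs(urlsplit(url).query)["q"][0]


# The two URLs quoted in the report above.
failing = decoded_query("http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport%5C.h%22")
working = decoded_query("http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport.h%22")

print(failing)  # query that still returned ppport.h results
print(working)  # query that excluded them
```

The only difference between the decoded queries is the backslash before the dot, which is consistent with the file filter not being applied as a regular expression.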
Add the "Fork me on GitHub" ribbon on the home page.
See https://github.com/blog/273-github-ribbons
Currently the indexer picks up whatever is in 02packages.details.gz from CPAN. For perl this can be a rather arbitrary version of perl (I did some research and asked Andreas about this, but haven't heard back yet).
Work out a way of pulling the latest stable release (apparently stored in some JSON somewhere?) and indexing that rather than what the CPAN index gives us.
Searching here:
http://grep.cpan.me/?q=-%3Eto_json
I noticed that the Unicode bytes in this file:
https://metacpan.org/source/SJDY/Mojo-Webqq-1.6.0/doc/Webqq.pod#L357
are being doubly encoded.
use Mojo::JSON qw(encode_json);
my $json_hash = $msg->to_json_hash(); #获�到�过utf8 decode的hash引用
This is due to printing bytes not marked as "utf8" to a filehandle set to :encoding(utf8):
no utf8;
my $string = 'my $json_hash = $msg->to_json_hash(); #获取到经过utf8 decode的hash引用';
binmode STDOUT, ":encoding(UTF-8)";
print $string;
# Gives the same output as above.
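The same effect can be reproduced outside Perl. The following Python sketch is an analogy under the assumption that the bytes are treated as Latin-1 characters before being encoded again; it is not the code the site runs. The sample characters are taken from the comment quoted above.

```python
text = "获取到经过"  # sample characters from the comment above

# Correct output: encode the characters to UTF-8 exactly once.
once = text.encode("utf-8")

# Double encoding: the UTF-8 bytes are misread as Latin-1 characters
# and then encoded to UTF-8 a second time.
twice = once.decode("latin-1").encode("utf-8")

print(len(once), len(twice))

# Reversing the second step recovers the original text.
recovered = twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(recovered == text)
```

Every byte of the first encoding is above 0x7F for these characters, so each one grows to two bytes in the second pass.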
Thanks again for the grep.cpan.me service. It's very useful.
Stuff like ppport.h is everywhere, somehow dedupe it.
On this page:
http://grep.cpan.me/?q=meep&page=2
It says:
26 to 27 of 112 results
The 112 implies to me that there should be 5 pages to go through; however, it only offers two pages.
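The expected page count can be checked with a short calculation. This Python sketch assumes a page size of 25, which is inferred from page 2 starting at result 26 and is not confirmed by the source.

```python
import math

PER_PAGE = 25  # assumption: inferred from page 2 starting at result 26


def page_count(total_results: int, per_page: int = PER_PAGE) -> int:
    """Number of pages needed to show total_results items."""
    return math.ceil(total_results / per_page)


print(page_count(112))  # pages expected for the reported total
print(page_count(27))   # pages needed if only 27 items were available
```

Two pages is what this formula gives for 27 items, so one possible explanation is that the total and the pager are counting different things.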
The check for whether the '/' keypress event came from the search box is using the non-standard e.srcElement. It should use the standard e.target instead (or as well).
https://developer.mozilla.org/en-US/docs/Web/API/Event/srcElement
https://developer.mozilla.org/en-US/docs/Web/API/Event/target
This incantation properly finds the namespace::clean Changelog
Yet this does not include n::c in the results
The only difference is ...\d vs ...0
Perhaps an ::RE2 issue...?
This is repeatable:
http://grep.cpan.me/?q=%28raw|embed|render|f.cloud%29.github.com&page=2
gives "Internal Server Error"
However, pages 1 and 3 work fine.
The following command causes a crash:
http://grep.cpan.me/?q=.gitignore%20dist%3AApp.%2A&page=2
share/html is a mess, use a template for it.
With this search http://grep.cpan.me/?q=MANIFEST[^.]+file%3A.t+dist%3DIO-Socket-SSL+file%3Dt%2Fcore.t
I get:
local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while () {[local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while () {L, LSQL::Translator
This file contains message digests of all files listed in MANIFEST,
signed via the Module::Signature module, version 0.27.SHA1 1adcd25e8be40a3d228f13067e2d17f44f306eeb Changes
SHA1 4683a6415d0a7c05e2592d376c068fc848871f09 MANIFEST
SHA1 5b662aa37d4dc77e37257eb5565a89bf7cde03cf MANIFEST.SKIP
SHA1 a27e5fab21a1952c32a452cd638aa5767fa
But the extracts above do not match the content of t/core.t as I see it on MetaCPAN.org or inside the archive.
Is it an indexing issue, or is there a MITM (that would not be so surprising for such a critical module as IO::Socket::SSL)?
It would be helpful to have direct access to CPAN grep from my browser's search box.
This can be done by adding an OpenSearch description of the search URL.
See:
I think it'd be useful to be able to provide patterns that are matched only against the author / dist name, to filter results down. For instance, you could look for your own name, but not in your own modules.
If I get sufficient tuits I'll fork the repo, try to get a dev version of this up and running, and submit a pull request, but just thought I'd raise it as a "wishlist" issue item to share the idea at least.
It appears that there is no deterministic order given to the results, presently. It would be helpful if they were sorted by release date (descending), as the most recently-released distributions tend to be the most relevant for most searches.
see also #32
and if the regex isn't supplied, just return dists which contain the named file
Type £ into the search box, watch the mess that ensues.
Ideally want charset guessing somehow, but also want to keep it possible to search using raw bytes, which is obviously somewhat impossible.
I think best would be raw bytes along with a warning if the search string contains anything >\x{bf} because the latin1/utf8 encoding will not be consistent.
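A minimal Python sketch of the warning described above, taking the \x{bf} threshold from the text as given. The function name `should_warn` and the sample queries are hypothetical, and this is not how the site currently behaves.

```python
WARN_ABOVE = 0xBF  # threshold taken from the text above


def should_warn(query: str, threshold: int = WARN_ABOVE) -> bool:
    """True if any character in the query is above the threshold."""
    return any(ord(ch) > threshold for ch in query)


pound = "\u00a3"  # the pound sign mentioned above

# The same character is one byte in Latin-1 and two bytes in UTF-8.
print(pound.encode("latin-1"), pound.encode("utf-8"))

print(should_warn("price in " + pound))
print(should_warn("caf\u00e9"))
```

With the threshold exactly as stated, the pound sign (U+00A3) falls below it and would not trigger the warning, so the threshold is kept as a parameter in case a lower cut-off is wanted.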
e.g. Hover over dist name for abstract, other nice touches.
The OpenSearch plugin at http://s.cpan.me/opensearch.xml is currently broken: it is not valid UTF-8 due to an invalid byte on line 15, just before "search for assignments".
It would be useful if the matcher survived redis restarts.
% grep -r '#.*Tie::Redis' lib
lib/WWW/CPANGrep/Index/Worker.pm: # (XXX: Probably should make Tie::Redis handle this somehow).
lib/WWW/CPANGrep/Index/Worker.pm: # Tie::Redis currently won't autovivify :(
lib/WWW/CPANGrep/Slabs.pm:use JSON; # XXX: Implement serialisation in Tie::Redis to avoid this
lib/WWW/CPANGrep/Slabs.pm: # Tie::Redis won't autovivify yet :(
This would include:
The MIME type for the OpenSearch plugin at http://s.cpan.me/opensearch.xml is incorrect.
From the 1.1 specification draft:
OpenSearch description documents are referred to via the following type:
`application/opensearchdescription+xml`
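For illustration, here is a Python sketch that builds a small OpenSearch 1.1 description document and keeps the MIME type quoted above next to it. The short name and description strings are placeholders, the URL template is assumed from the search URLs used elsewhere on this page, and none of this is the project's actual opensearch.xml.

```python
import xml.etree.ElementTree as ET

NS = "http://a9.com/-/spec/opensearch/1.1/"
MIME_TYPE = "application/opensearchdescription+xml"  # type quoted from the spec


def build_description(short_name: str, description: str, template: str) -> bytes:
    """Serialise a minimal OpenSearch description document as UTF-8 bytes."""
    ET.register_namespace("", NS)  # emit the namespace as the default xmlns
    root = ET.Element(f"{{{NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{NS}}}Description").text = description
    ET.SubElement(root, f"{{{NS}}}Url", {"type": "text/html", "template": template})
    ET.SubElement(root, f"{{{NS}}}InputEncoding").text = "UTF-8"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Placeholder values; only the template mirrors the site's search URL shape.
doc = build_description(
    "CPAN grep",
    "Search code on CPAN with regexps",
    "http://grep.cpan.me/?q={searchTerms}",
)
print(doc.decode("utf-8"))
print("Serve with Content-Type:", MIME_TYPE)
```

Generating the file through an XML serialiser, rather than editing it by hand, would also avoid the invalid-byte problem reported for the current plugin.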
I know you were talking about already having some form of an API out there. How complete is that? I had an idea for a Dist::Zilla module that would search for plugins of a certain type, but that would involve searching roles like:
http://grep.cpan.me/?q=^with[\s\%27\%22]%2BDist%3A%3AZilla%3A%3ARole%3A%3AReleaser
It would be nice to be able to grep just within a single author's releases.
The line numbers always come out as 1:
http://grep.cpan.me/?q=nice%2C+rice%2C+lice
https://metacpan.org/source/BKB/Text-Fuzzy-0.22/examples/list-context.pl#L1
http://grep.cpan.me/?q=Gotta+get+down+on+Friday
https://metacpan.org/source/BKB/JSON-Parse-0.38/lib/JSON/Parse.pod#L1
Every other example I have tried gives a line number of 1.
The line number seems to be printed here:
Line 169 in 4fc4f0f
using a value which is set here:
cpangrep/lib/WWW/CPANGrep/Matcher.pm
Line 170 in 4fc4f0f
from calculations here:
cpangrep/lib/WWW/CPANGrep/Matcher.pm
Line 162 in 4fc4f0f
Without running the code I cannot see what's wrong in the above, but it is always coming out as 1.
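For comparison, a common way to turn a match offset into a line number is to count the newlines that precede it. The Python sketch below shows that technique with made-up sample text; it is not the code in Matcher.pm, and `line_number` is a hypothetical helper.

```python
def line_number(text: str, offset: int) -> int:
    """1-based line number of the character at `offset` in `text`."""
    return text.count("\n", 0, offset) + 1


sample = "first line\nsecond line\nthird line with needle\n"
match_offset = sample.index("needle")

print(line_number(sample, match_offset))
print(line_number(sample, 0))
```

If the offset passed in were always 0, or were relative to the snippet rather than the file, a calculation like this would report line 1 every time, which matches the symptom described.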
By the way, thanks for the useful grep.cpan.me service.
attr:[{section}/]attr.in.dot.format: (MetaCPAN property filter; section = release, file; release is default if not there)
Leftover from conversations within Issue #1.
It would be useful to have the date of the release of the distribution displayed just after the distribution name.
Preferred format: ISO 8601 (YYYY-mm-dd)
Searching for something like:
http://grep.cpan.me/?q=dist%3Aperl+file%3Dgv.c+GV+%5C*
Will show snippets that overlap. (Might make sense to fix this along with line numbers because that will make this even more obvious when it happens).
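One way to avoid overlapping snippets is to merge the line ranges before rendering them. This is a Python sketch of interval merging with hypothetical `(start, end)` line ranges; how cpangrep actually represents snippets is not shown in this report.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching (start, end) line ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical snippet ranges around three matches in one file.
snippets = [(10, 20), (15, 25), (40, 50)]
print(merge_ranges(snippets))
```

Each merged range can then be shown once, with every match inside it highlighted.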
I searched on this: http://grep.cpan.me/?q=JSON%3A%3AMaybeXS++author%3Aether
The "4 more files" link for Dist-Zilla-Plugin-OptionalFeature is: http://grep.cpan.me/?q=JSON%3A%3AMaybeXS%20%20author%3Aether+dist=Dist-Zilla-Plugin-OptionalFeature
but clicking on this link yields no results at all!
When looking at http://grep.cpan.me/?q=XMLTV the paths of files in the xmltv-0.5.33 distribution are shown as xmltv-0.5.33/./xmltv-0.5.33/
[...]
$1
\$1
\\$1
\\\$1
'$1'
$$1
does not seem to work
an example: http://grep.cpan.me/?q=file%3AMETA.json+woobling.org
The result at https://metacpan.org/source/ETHER/MooseX-Storage-0.48/META.json#L1 is a utf8-encoded file, but the non-ascii characters are not rendered properly on the webpage. (The page, however, does claim to be charset=utf-8.)
The results count at the top right now counts distributions (see issue #11); however, in some cases a plain grep -c is closer to what is wanted. It would be useful to have a count: operator.
Maybe two forms:
Count number of matches in each file.
Count number of matches in each distribution.
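The two forms could be computed together in one pass. The Python sketch below uses the standard `re` module as a stand-in for the site's regex engine, and the distribution names, paths and file contents are invented for the example.

```python
import re
from collections import Counter


def count_matches(pattern, files):
    """Count regex matches per file and per distribution.

    `files` is an iterable of (dist, path, text) tuples.
    """
    regex = re.compile(pattern)
    per_file = Counter()
    per_dist = Counter()
    for dist, path, text in files:
        n = sum(1 for _ in regex.finditer(text))
        if n:
            per_file[(dist, path)] += n
            per_dist[dist] += n
    return per_file, per_dist


# Invented sample data.
files = [
    ("Foo-1.0", "lib/Foo.pm", "use JSON;\nuse JSON::PP;\n"),
    ("Foo-1.0", "t/a.t", "use JSON;\n"),
    ("Bar-2.0", "lib/Bar.pm", "nothing here\n"),
]
per_file, per_dist = count_matches(r"JSON", files)
print(per_file)
print(per_dist)
```

Note that this counts individual matches, whereas `grep -c` counts matching lines; the two differ when a line contains more than one match.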
Currently rather ad-hoc.
This one is most odd:
http://cpangrep3.default.dgl.uk0.bigv.io/?q=HINT_BLOCK_SCOPE
http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3Aperl+HINT_BLOCK_SCOPE
A simple dist:perl should have matched the "DROLSKY/perl-5.15.6" distro that was referenced in the first query. Yet the second result doesn't make sense. Also, the "#define HINT_BLOCK_SCOPE" is not captured in the top part of the first query, even though these match:
http://cpangrep3.default.dgl.uk0.bigv.io/?q=\%23define+HINT_BLOCK_SCOPE
http://grep.cpan.me/?q=\%23define+HINT_BLOCK_SCOPE
Shouldn't all results from one distro be grouped together?