dgl / cpangrep

Search code on CPAN with Regexps. No longer maintained, I suggest using http://grep.metacpan.org/ instead.

Perl 67.92% Shell 1.86% Perl 6 1.10% CSS 4.46% HTML 24.65%

cpangrep's People

Contributors

Stargazers

Watchers

Forkers

sineswiper tsibley njohnston mishin haarg mark-5 perlprojectrepos cpansprout dur-randir metacpan

cpangrep's Issues

Command line tool

Is there a command line tool — perhaps cpangrep or cpang — for this wonderful service yet? I haven't found any existing tool. If that's the case, I'll have to build one.

Fix Redis connection leaks

The service currently leaks a Redis connection on each search, which is somewhat bad. Track this down.

Documentation typo? "\." should be "."

In the documentation, http://grep.cpan.me/about

For example -dist=perl to exclude perl, file:\.xs to search only XS files or -file:"ppport\.h" to exclude ppport.h.

However, that doesn't work:

http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport%5C.h%22

returns lots of ppport.h results.

The correct syntax seems to be

SvPV_const -file:"ppport.h"

which gives the correct results:

http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport.h%22

not containing ppport.h results.

It looks like either file names are not regular expressions or possibly this may be a more serious issue if file names are supposed to be regexes.
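The two possibilities raised above can be told apart with a small experiment. The following is a minimal Python sketch, not cpangrep's actual code: `file_filter_matches` is a hypothetical helper that applies a file filter either as a regular expression or as a literal substring.

```python
import re


def file_filter_matches(pattern: str, path: str, as_regex: bool) -> bool:
    """Hypothetical filter: apply `pattern` as a regex or as a literal substring."""
    if as_regex:
        return re.search(pattern, path) is not None
    return pattern in path


path = "Foo-Bar-1.0/ppport.h"

# Regex semantics: both spellings match the file, so either would exclude it.
print(file_filter_matches(r"ppport\.h", path, as_regex=True))   # True
print(file_filter_matches("ppport.h", path, as_regex=True))     # True

# Literal semantics: the backslash is taken literally, so the escaped
# form never matches and ppport.h results would still be shown.
print(file_filter_matches(r"ppport\.h", path, as_regex=False))  # False
print(file_filter_matches("ppport.h", path, as_regex=False))    # True
```

Under these assumptions, the observed behaviour (the escaped form fails, the unescaped form works) is what literal matching would produce.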

Add "Fork me on GitHub" ribbon

Add the "Fork me on GitHub" ribbon on the home page.
See https://github.com/blog/273-github-ribbons

Currently the indexer picks up whatever is in 02packages.details.gz from CPAN. For perl this can be a rather arbitrary version of perl (I did some research and asked Andreas about this, but haven't heard back yet).

Work out a way of pulling the latest stable release (apparently stored in some JSON somewhere?) and indexing that rather than what the CPAN index gives us.

Unicode bytes are being doubly encoded

Searching here:

http://grep.cpan.me/?q=-%3Eto_json

I noticed that the Unicode bytes in this file:

https://metacpan.org/source/SJDY/Mojo-Webqq-1.6.0/doc/Webqq.pod#L357

are being doubly encoded.

use Mojo::JSON qw(encode_json);
my $json_hash = $msg->to_json_hash();    #èŽ·å�–åˆ°ç»�è¿‡utf8 decodeçš„hashå¼•ç”¨

This is due to printing bytes not marked as "utf8" to a filehandle set to :encoding(utf8):

no utf8;
my $string = 'my $json_hash = $msg->to_json_hash();    #获取到经过utf8 decode的hash引用';
binmode STDOUT, ":encoding(UTF-8)";
print $string;
# Gives the same output as above.
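The same effect can be reproduced outside Perl. This is a minimal Python sketch of the mechanism described above, with hypothetical helper names: UTF-8 bytes are misread as Latin-1 characters (which is how Perl treats a byte string without the utf8 flag) and then encoded as UTF-8 a second time.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread the bytes as Latin-1, then encode again."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("latin-1").encode("utf-8")


def repair(double_encoded: bytes) -> str:
    """Reverse one round of double encoding."""
    return double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")


original = "获取"
broken = double_encode(original)

print(len(original.encode("utf-8")))  # 6 bytes when encoded once
print(len(broken))                    # 12 bytes after the second encoding
print(repair(broken) == original)     # True
```

Because the damage is reversible in this simple case, the fix on the indexing side would be to decode the file bytes once before writing them to an encoding-aware handle, or to write raw bytes to a raw handle.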

Thanks again for the grep.cpan.me service. It's very useful.

Dedupe files based on checksum

Stuff like ppport.h is everywhere, somehow dedupe it.
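One possible approach, sketched here in Python rather than as the indexer's real code, is to group files by a content digest and index only one representative per group. The function name and the in-memory `files` mapping are assumptions made for illustration.

```python
import hashlib


def group_by_checksum(files: dict) -> dict:
    """Map SHA-256 hex digest -> list of paths whose contents are identical."""
    groups = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(path)
    return groups


# Hypothetical sample data: two dists shipping the same ppport.h.
files = {
    "Dist-A-1.0/ppport.h": b"/* ppport.h */\n",
    "Dist-B-2.3/ppport.h": b"/* ppport.h */\n",
    "Dist-B-2.3/lib/B.pm": b"package B;\n1;\n",
}

for digest, paths in group_by_checksum(files).items():
    # Index paths[0]; the rest could be recorded as duplicates of it.
    print(digest[:12], paths)
```

Whether duplicates should be hidden entirely or collapsed into a "also found in N other dists" note is a presentation choice this sketch leaves open.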

either the total result count is wrong, or it doesn't allow paging through all of them anymore

On this page:

http://grep.cpan.me/?q=meep&page=2

It says:

26 to 27 of 112 results

The 112 implies to me that there should be 5 pages to go through, however it only offers two pages.
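The arithmetic behind that expectation can be checked with a short Python sketch. The page size of 25 is an assumption inferred from page 2 starting at result 26; the helper names are hypothetical.

```python
import math


def page_count(total_results: int, per_page: int) -> int:
    """Number of pages needed to show every result."""
    return math.ceil(total_results / per_page)


def page_bounds(page: int, per_page: int, total_results: int) -> tuple:
    """1-based (first, last) result numbers shown on a given page."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total_results)
    return first, last


PER_PAGE = 25  # assumption: inferred from "26 to 27" on page 2

print(page_count(112, PER_PAGE))      # 5
print(page_count(27, PER_PAGE))       # 2
print(page_bounds(2, PER_PAGE, 27))   # (26, 27)
```

Under that assumption, "26 to 27" with only two pages is what a pageable total of 27 would give, so the 112 and the pager appear to be counting different things.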

Cannot type slash in the search box on the results page

The check for whether the '/' keypress event came from the search box is using the non-standard e.srcElement. It should use the standard e.target instead (or as well).

https://developer.mozilla.org/en-US/docs/Web/API/Event/srcElement
https://developer.mozilla.org/en-US/docs/Web/API/Event/target

Bizarre match problem

This incantation properly finds the namespace::clean Changelog

Yet this does not include n::c in the results

The only difference is ...\d vs ...0

Perhaps an ::RE2 issue...?

internal server error

This is repeatable:

http://grep.cpan.me/?q=%28raw|embed|render|f.cloud%29.github.com&page=2

gives "Internal Server Error"

However, pages 1 and 3 work fine.

This command causes a crash

The following command causes a crash:

http://grep.cpan.me/?q=.gitignore%20dist%3AApp.%2A&page=2

Use base HTML template

share/html is a mess, use a template for it.

File content in result does not match the real file

With this search http://grep.cpan.me/?q=MANIFEST[^.]+file%3A.t+dist%3DIO-Socket-SSL+file%3Dt%2Fcore.t
I get:

IO-Socket-SSL-2.012/t/core.t

local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while (<MANIFH>) {[

local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while (<MANIFH>) {

L, LSQL::Translator
This file contains message digests of all files listed in MANIFEST,
signed via the Module::Signature module, version 0.27.

SHA1 1adcd25e8be40a3d228f13067e2d17f44f306eeb Changes
SHA1 4683a6415d0a7c05e2592d376c068fc848871f09 MANIFEST
SHA1 5b662aa37d4dc77e37257eb5565a89bf7cde03cf MANIFEST.SKIP
SHA1 a27e5fab21a1952c32a452cd638aa5767fa

SULLR/IO-Socket-SSL-2.012

But the extracts above do not match the content of t/core.t as I see it on MetaCPAN.org or inside the archive.

Is it an indexing issue, or is there a MITM (that would not be so surprising for such a critical module as IO::Socket::SSL)?

Add a plugin for browser's search boxes (OpenSearch)

It would be helpful to have direct access to CPAN grep from my browser's search box.
This can be done by adding an OpenSearch description of the search URL.

See:

Filter by author / dist name

I think it'd be useful to be able to provide patterns which should be matched against only the author / dist name to filter results down - for instance, to look for your own name, but not in your own modules, for a random example usage.

If I get sufficient tuits I'll fork the repo, try to get a dev version of this up and running, and submit a pull request, but just thought I'd raise it as a "wishlist" issue item to share the idea at least.

sort results by release date (descending)

It appears that there is no deterministic order given to the results, presently. It would be helpful if they were sorted by release date (descending), as the most recently-released distributions tend to be the most relevant for most searches.

Make the regex optional for -file=... queries

and if the regex isn't supplied, just return dists which contain the named file

Handle 404s properly

http://grep.cpan.me/favicon.png

gives a text output saying

ARRAY(0x16f9870)

Fix UTF-8 handling

Type £ into the search box, watch the mess that ensues.

Ideally want charset guessing somehow, but also want to keep it possible to search using raw bytes, which is obviously somewhat impossible.

I think best would be raw bytes along with a warning if the search string contains anything >\x{bf} because the latin1/utf8 encoding will not be consistent.

Use metacpan API to adorn web interface

e.g. Hover over dist name for abstract, other nice touches.

OpenSearch plugin is broken (encoding issue)

The OpenSearch plugin at http://s.cpan.me/opensearch.xml is currently broken: it is not valid UTF-8 because of an invalid byte on line 15, just before "search for assignments".

cpangrep-matcher stops when redis is restarted

It would be useful if the matcher survived redis restarts.

Fix up usage of Tie::Redis

% grep -r '#.*Tie::Redis' lib
lib/WWW/CPANGrep/Index/Worker.pm: # (XXX: Probably should make Tie::Redis handle this somehow).
lib/WWW/CPANGrep/Index/Worker.pm: # Tie::Redis currently won't autovivify :(
lib/WWW/CPANGrep/Slabs.pm:use JSON; # XXX: Implement serialisation in Tie::Redis to avoid this
lib/WWW/CPANGrep/Slabs.pm: # Tie::Redis won't autovivify yet :(

In need of docs for new syntax

This would include:

author:
dist:
file:
- prefix for negative search
reminder of (?msi:...) syntax

Incorrect MIME type for OpenSearch plugin

The MIME type for the OpenSearch plugin at http://s.cpan.me/opensearch.xml is incorrect.

From the 1.1 specification draft:

OpenSearch description documents are referred to via the following type:
 `application/opensearchdescription+xml`

API?

I know you were talking about already having some form of an API out there. How complete is that? I had an idea for a Dist::Zilla module that would search for plugins of a certain type, but that would involve searching roles like:

http://grep.cpan.me/?q=^with[\s\%27\%22]%2BDist%3A%3AZilla%3A%3ARole%3A%3AReleaser

Feature request: search by author

It would be nice to be able to grep just within a single author's releases.

Line numbers are always 1

The line numbers always come out as 1:

http://grep.cpan.me/?q=nice%2C+rice%2C+lice
https://metacpan.org/source/BKB/Text-Fuzzy-0.22/examples/list-context.pl#L1

http://grep.cpan.me/?q=Gotta+get+down+on+Friday
https://metacpan.org/source/BKB/JSON-Parse-0.38/lib/JSON/Parse.pod#L1

Every other example I have tried gives a line number of 1.

The line number seems to be printed here:

cpangrep/lib/WWW/CPANGrep.pm

Line 169 in 4fc4f0f

 $_->select('.excerpt-link')->set_attribute(href => "$source#L$excerpt->{line}[0]"); 

using a value which is set here:

cpangrep/lib/WWW/CPANGrep/Matcher.pm

Line 170 in 4fc4f0f

line => [$start, $end],

from calculations here:

cpangrep/lib/WWW/CPANGrep/Matcher.pm

Line 162 in 4fc4f0f

 my $start = 1 + substr($pm, $indexed_file_offset, $match->[0] - $indexed_file_offset) =~ tr/\n//; 

Without running the code I cannot see what's wrong in the above, but it is always coming out as 1.
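For reference, the intended calculation (one plus the number of newlines between the start of the indexed file and the start of the match) can be sketched in Python as follows. This is an illustration of the technique, not a diagnosis of the Perl code; `line_of_offset` and the sample data are hypothetical.

```python
def line_of_offset(slab: str, file_start: int, match_start: int) -> int:
    """1-based line number of `match_start` within the file at `file_start`."""
    return 1 + slab.count("\n", file_start, match_start)


# Hypothetical slab: two files concatenated, the second starting at offset 4.
slab = "a\nb\n" + "x\ny\nmatch here\n"
file_start = 4
match_start = slab.index("match")

print(line_of_offset(slab, file_start, match_start))  # 3

# If the counted span is empty, the result is always 1.
print(line_of_offset(slab, file_start, file_start))   # 1
```

A result that is always 1 therefore means no newlines are being counted at all, which narrows the search to the offsets passed in rather than the counting itself.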

By the way, thanks for the useful grep.cpan.me service.

attr: MetaCPAN property filter

attr:[{section}/]attr.in.dot.format:  (MetaCPAN property filter; section = release, file; release
    is default if not there)

Leftover from conversations within Issue #1.

Add release date in output

It would be useful to have the date of the release of the distribution displayed just after the distribution name.
Preferred format: ISO 8601 (YYYY-mm-dd)

Dedupe overlapping matches

Searching for something like:
http://grep.cpan.me/?q=dist%3Aperl+file%3Dgv.c+GV+%5C*

Will show snippets that overlap. (Might make sense to fix this along with line numbers because that will make this even more obvious when it happens).
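A common way to handle this is to merge overlapping line ranges before rendering snippets. The following Python sketch assumes each snippet is an inclusive (start_line, end_line) pair; that representation and the function name are assumptions, not taken from the cpangrep source.

```python
def merge_overlapping(ranges: list) -> list:
    """Merge inclusive (start, end) line ranges that overlap."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous snippet: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


snippets = [(12, 18), (10, 14), (30, 34)]
print(merge_overlapping(snippets))  # [(10, 18), (30, 34)]
```

Ranges that merely touch, such as (10, 14) and (15, 18), are kept separate here. Changing the comparison to `start <= merged[-1][1] + 1` would join those as well.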

subsearch link with author, dist does not lead to anything

I searched on this: http://grep.cpan.me/?q=JSON%3A%3AMaybeXS++author%3Aether

The "4 more files" link for Dist-Zilla-Plugin-OptionalFeature is: http://grep.cpan.me/?q=JSON%3A%3AMaybeXS%20%20author%3Aether+dist=Dist-Zilla-Plugin-OptionalFeature

but clicking on this link yields no results at all!

xmltv-0.5.33 is not properly indexed

When looking at http://grep.cpan.me/?q=XMLTV the paths of files in the xmltv-0.5.33 distribution are shown as xmltv-0.5.33/./xmltv-0.5.33/[...]

how to grep for $1 ?

$1 \$1 \\$1 \\\$1 '$1' $$1 does not seem to work

unicode characters not correctly rendered on result pages

an example: http://grep.cpan.me/?q=file%3AMETA.json+woobling.org

The result at https://metacpan.org/source/ETHER/MooseX-Storage-0.48/META.json#L1 is a utf8-encoded file, but the non-ascii characters are not rendered properly on the webpage. (The page, however, does claim to be charset=utf-8.)

count: operator

The results count at the top right now counts distributions (see issue #11). In some cases, however, a plain grep -c is closer to what is wanted, so a count: operator would be useful.

Maybe two forms:

count:files

Count number of matches in each file.

count:dist

Count number of matches in each distribution.

Cleanup config file handling

Currently rather ad-hoc.

Block combining and a bug with dist:

This one is most odd:

http://cpangrep3.default.dgl.uk0.bigv.io/?q=HINT_BLOCK_SCOPE
http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3Aperl+HINT_BLOCK_SCOPE

A simple dist:perl should have matched the "DROLSKY/perl-5.15.6" distro that was referenced in the first query. Yet, the second result doesn't make sense. Also, the "#define HINT_BLOCK_SCOPE" is not captured in the top part of the first query, even though these match:

http://cpangrep3.default.dgl.uk0.bigv.io/?q=\%23define+HINT_BLOCK_SCOPE
http://grep.cpan.me/?q=\%23define+HINT_BLOCK_SCOPE

Shouldn't all results from one distro be grouped together?
