
dgl / cpangrep


Search code on CPAN with Regexps. No longer maintained, I suggest using http://grep.metacpan.org/ instead.

Perl 67.92% Shell 1.86% Perl 6 1.10% CSS 4.46% HTML 24.65%

cpangrep's People

Contributors

benkasminbullock, dgl, njohnston, oalders, seveas, sineswiper, tsibley

cpangrep's Issues

Command line tool

Is there a command line tool — perhaps cpangrep or cpang — for this wonderful service yet? I haven't found any existing tool. If that's the case, I'll have to build one.

Documentation typo? "\." should be "."

The documentation at http://grep.cpan.me/about says:

For example -dist=perl to exclude perl, file:\.xs to search only XS files or -file:"ppport\.h" to exclude ppport.h.

However, that doesn't work:

http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport%5C.h%22

returns lots of ppport.h results.

The correct syntax seems to be

SvPV_const -file:"ppport.h"

which gives the correct results:

http://grep.cpan.me/?q=SvPV_const+-file%3A%22ppport.h%22

not containing ppport.h results.

Either file names are not treated as regular expressions, or, if they are supposed to be regexes, this may point to a more serious bug.
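
One way to tell the two hypotheses apart is to compare how the documented value behaves as a regex and as a literal substring. This is a minimal sketch in plain Perl, not cpangrep's actual matching code; the variable names are illustrative.

```perl
use strict;
use warnings;

my $filename = 'ppport.h';
my $pattern  = 'ppport\.h';   # the escaped form used in the failing query

# If the filter were applied as a regex, the escaped dot would match.
my $as_regex = $filename =~ /$pattern/ ? 1 : 0;

# If the filter were applied as a literal substring, the backslash would be
# searched for verbatim and would never be found in a real file name.
my $as_literal = index($filename, $pattern) >= 0 ? 1 : 0;

print "regex match: $as_regex, literal match: $as_literal\n";
```

If the service compared the value literally, the escaped form would never match, which would be consistent with the ppport.h results not being excluded.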

Fix indexing of perl itself

Currently the indexer picks up whatever is in 02packages.details.txt.gz from CPAN. For perl this can be a rather arbitrary version of perl (I did some research and asked Andreas about this, but haven't heard back yet).

Work out a way of pulling the latest stable release (apparently stored in some JSON somewhere?) and indexing that rather than what the CPAN index gives us.

Unicode bytes are being doubly encoded

Searching here:

http://grep.cpan.me/?q=-%3Eto_json

I noticed that the Unicode bytes in this file:

https://metacpan.org/source/SJDY/Mojo-Webqq-1.6.0/doc/Webqq.pod#L357

are being doubly encoded.

use Mojo::JSON qw(encode_json);
my $json_hash = $msg->to_json_hash();    #获�到�过utf8 decode的hash引用

This is due to printing bytes not marked as "utf8" to a filehandle set to :encoding(utf8):

no utf8;
my $string = 'my $json_hash = $msg->to_json_hash();    #获取到经过utf8 decode的hash引用';
binmode STDOUT, ":encoding(UTF-8)";
print $string;
# Gives the same output as above.
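
A common remedy for this kind of double encoding is to decode the bytes into characters before they reach the encoding layer. The sketch below uses the core Encode module and compares only string lengths, so it does not depend on the terminal. Whether cpangrep should decode at read time or print raw bytes instead is an open choice that this report does not settle.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for the first two characters of the comment above,
# as they would be read from a file without a decoding layer.
my $bytes = "\xe8\x8e\xb7\xe5\x8f\x96";

# An :encoding(UTF-8) layer treats each undecoded byte as a Latin-1
# character and encodes it again, which is what encode() does here.
my $double = encode('UTF-8', $bytes);

# Decoding first yields characters, which are then encoded exactly once.
my $chars  = decode('UTF-8', $bytes);
my $single = encode('UTF-8', $chars);

printf "in: %d bytes, double-encoded: %d bytes, decoded first: %d bytes\n",
    length $bytes, length $double, length $single;
```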

Thanks again for the grep.cpan.me service. It's very useful.

File content in result does not match the real file

With this search http://grep.cpan.me/?q=MANIFEST[^.]+file%3A.t+dist%3DIO-Socket-SSL+file%3Dt%2Fcore.t
I get:

IO-Socket-SSL-2.012/t/core.t

local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while () {[

local *MANIFH;
open MANIFH, "MANIFEST" or die "No MANIFEST?!: $!";
while () {

L, LSQL::Translator
This file contains message digests of all files listed in MANIFEST,
signed via the Module::Signature module, version 0.27.

SHA1 1adcd25e8be40a3d228f13067e2d17f44f306eeb Changes
SHA1 4683a6415d0a7c05e2592d376c068fc848871f09 MANIFEST
SHA1 5b662aa37d4dc77e37257eb5565a89bf7cde03cf MANIFEST.SKIP
SHA1 a27e5fab21a1952c32a452cd638aa5767fa

SULLR/IO-Socket-SSL-2.012

But the extracts above do not match the content of t/core.t as I see it on MetaCPAN.org or inside the archive.

Is it an indexing issue, or is there a MITM (that would not be so surprising for such a critical module as IO::Socket::SSL)?

Filter by author / dist name

I think it'd be useful to be able to provide patterns which are matched against only the author / dist name to filter results down. For example, you could look for your own name, but not in your own modules.

If I get sufficient tuits I'll fork the repo, try to get a dev version of this up and running, and submit a pull request, but just thought I'd raise it as a "wishlist" issue item to share the idea at least.
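
The requested filter could look roughly like the sketch below. The record fields (author, dist, file) and the option names (author, not_author, dist) are assumptions made for illustration and are not taken from the cpangrep code.

```perl
use strict;
use warnings;

# Hypothetical result records; the field names are assumptions.
my @results = (
    { author => 'ALICE', dist => 'Foo-Bar-1.0',   file => 'lib/Foo/Bar.pm' },
    { author => 'BOB',   dist => 'Baz-0.3',       file => 'README' },
    { author => 'CAROL', dist => 'Foo-Utils-2.1', file => 'Changes' },
);

# Keep results whose author and dist name satisfy the optional patterns.
sub filter_results {
    my ($results, %opt) = @_;
    return grep {
        my $r = $_;
        (!$opt{author}     || $r->{author} =~ $opt{author})
     && (!$opt{not_author} || $r->{author} !~ $opt{not_author})
     && (!$opt{dist}       || $r->{dist}   =~ $opt{dist})
    } @$results;
}

# The use case from the request: matches outside one author's own modules.
my @others = filter_results(\@results, not_author => qr/^ALICE$/);
print join(', ', map { $_->{dist} } @others), "\n";
```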

sort results by release date (descending)

It appears that the results are currently returned in no deterministic order. It would be helpful if they were sorted by release date (descending), as the most recently released distributions tend to be the most relevant for most searches.

see also #32
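
A minimal sketch of the requested ordering, assuming each result carries a release date as an ISO 8601 string (the released and dist field names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical results; ISO 8601 dates sort correctly as plain strings.
my @results = (
    { dist => 'Foo-1.0', released => '2011-05-02' },
    { dist => 'Bar-0.2', released => '2013-01-15' },
    { dist => 'Baz-3.1', released => '2011-05-02' },
);

# Newest first; ties are broken by dist name so the order is deterministic.
my @sorted = sort {
    $b->{released} cmp $a->{released}
        or $a->{dist} cmp $b->{dist}
} @results;

print "$_->{released} $_->{dist}\n" for @sorted;
```

The tie-break on the dist name is what makes the order stable across runs when two distributions share a release date.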

Fix UTF-8 handling

Type £ into the search box, watch the mess that ensues.

Ideally we want charset guessing somehow, but we also want to keep it possible to search using raw bytes; doing both is obviously somewhat impossible.

I think the best option would be raw bytes, along with a warning if the search string contains anything >\x{bf}, because the latin1/utf8 encoding will not be consistent.
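
The proposed warning check could be sketched as follows. The function name is hypothetical, and the \x{bf} threshold is taken directly from the suggestion above rather than from any existing code.

```perl
use strict;
use warnings;

# Returns 1 if the raw query bytes contain anything above \x{bf}.
sub needs_encoding_warning {
    my ($raw_query) = @_;
    return $raw_query =~ /[\xc0-\xff]/ ? 1 : 0;
}

# The pound sign as UTF-8 bytes (\xc2\xa3) and as a Latin-1 byte (\xa3).
printf "utf8: %d, latin1: %d, ascii: %d\n",
    needs_encoding_warning("\xc2\xa3"),
    needs_encoding_warning("\xa3"),
    needs_encoding_warning("abc");
```

With this threshold the UTF-8 form of the pound sign triggers the warning, while the single Latin-1 byte \xa3 does not, so the cut-off may need revisiting if Latin-1 input should also be flagged.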

Fix up usage of Tie::Redis

% grep -r '#.*Tie::Redis' lib
lib/WWW/CPANGrep/Index/Worker.pm: # (XXX: Probably should make Tie::Redis handle this somehow).
lib/WWW/CPANGrep/Index/Worker.pm: # Tie::Redis currently won't autovivify :(
lib/WWW/CPANGrep/Slabs.pm:use JSON; # XXX: Implement serialisation in Tie::Redis to avoid this
lib/WWW/CPANGrep/Slabs.pm: # Tie::Redis won't autovivify yet :(

Line numbers are always 1

The line numbers always come out as 1:

http://grep.cpan.me/?q=nice%2C+rice%2C+lice
https://metacpan.org/source/BKB/Text-Fuzzy-0.22/examples/list-context.pl#L1

http://grep.cpan.me/?q=Gotta+get+down+on+Friday
https://metacpan.org/source/BKB/JSON-Parse-0.38/lib/JSON/Parse.pod#L1

Every other example I have tried gives a line number of 1.

The line number seems to be printed here:

$_->select('.excerpt-link')->set_attribute(href => "$source#L$excerpt->{line}[0]");

using a value which is set here:

line => [$start, $end],

from calculations here:

my $start = 1 + substr($pm, $indexed_file_offset, $match->[0] - $indexed_file_offset) =~ tr/\n//;

Without running the code I cannot see what's wrong in the above, but it is always coming out as 1.
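
To check the expression in isolation, here is a small self-contained sketch that uses made-up data in place of the real index. The string and offsets are hypothetical; only the last expression follows the line quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the indexed data: two files stored back to back.
my $first_file = "first file\nline two\n";
my $pm = $first_file . "alpha\nbeta\ngamma match\n";

my $indexed_file_offset = length $first_file;   # where the second file starts
my $match_start = index($pm, 'match');          # stands in for $match->[0]

# Same shape as the quoted code: count the newlines between the start of
# the file and the start of the match, then add one.
my $start = 1 + substr($pm, $indexed_file_offset,
                       $match_start - $indexed_file_offset) =~ tr/\n//;

print "line $start\n";
```

With these inputs the expression gives line 3, so it counts correctly when the offsets are right. It yields 1 whenever the extracted substring contains no newline, which suggests looking at the values of the two offsets rather than at the counting itself.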

By the way, thanks for the useful grep.cpan.me service.

attr: MetaCPAN property filter

attr:[{section}/]attr.in.dot.format:  (MetaCPAN property filter; section = release, file; release
    is default if not there)

Leftover from conversations within Issue #1.

Add release date in output

It would be useful to have the release date of the distribution displayed just after the distribution name.
Preferred format: ISO 8601 (YYYY-MM-DD)
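
If the release time is available as an epoch timestamp (an assumption, since the issue does not say how dates are stored), the requested format can be produced with the core POSIX module. The iso_date helper is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch timestamp as an ISO 8601 calendar date in UTC.
sub iso_date {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d', gmtime($epoch));
}

print iso_date(0), "\n";   # prints 1970-01-01
```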

count: operator

The results count at the top right now counts distributions (see issue #11). In some cases, however, something closer to a plain grep -c is wanted, so it would be useful to have a count: operator.

Maybe two forms:

  • count:files

Count number of matches in each file.

  • count:dist

Count number of matches in each distribution.
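
Both forms amount to tallying matches under a different key. A minimal sketch, assuming each match is a record with hypothetical dist and file fields:

```perl
use strict;
use warnings;

# Hypothetical match records; the field names are assumptions.
my @matches = (
    { dist => 'Foo-1.0', file => 'lib/Foo.pm' },
    { dist => 'Foo-1.0', file => 'lib/Foo.pm' },
    { dist => 'Foo-1.0', file => 't/basic.t' },
    { dist => 'Bar-0.2', file => 'Bar.xs' },
);

my (%per_file, %per_dist);
for my $m (@matches) {
    $per_file{"$m->{dist}/$m->{file}"}++;   # count:files
    $per_dist{ $m->{dist} }++;              # count:dist
}

print "$_: $per_file{$_}\n" for sort keys %per_file;
print "$_: $per_dist{$_}\n" for sort keys %per_dist;
```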

Block combining and a bug with dist:

This one is most odd:

http://cpangrep3.default.dgl.uk0.bigv.io/?q=HINT_BLOCK_SCOPE
http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3Aperl+HINT_BLOCK_SCOPE

A simple dist:perl should have matched the "DROLSKY/perl-5.15.6" distro that appeared in the results of the first query, yet the results of the second query don't make sense. Also, the "#define HINT_BLOCK_SCOPE" line is not captured in the top part of the first query's results, even though these queries match it:

http://cpangrep3.default.dgl.uk0.bigv.io/?q=\%23define+HINT_BLOCK_SCOPE
http://grep.cpan.me/?q=\%23define+HINT_BLOCK_SCOPE

Shouldn't all results from one distro be grouped together?
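
Grouping of this kind can be sketched as below. The records and field names are hypothetical and say nothing about how cpangrep combines blocks internally.

```perl
use strict;
use warnings;

# Hypothetical matches in the order a search might return them.
my @matches = (
    { dist => 'A-1.0', file => 'a.h' },
    { dist => 'B-2.0', file => 'b.xs' },
    { dist => 'A-1.0', file => 'a.c' },
);

# Group by distribution while remembering the first-seen order.
my (%by_dist, @order);
for my $m (@matches) {
    push @order, $m->{dist} unless exists $by_dist{ $m->{dist} };
    push @{ $by_dist{ $m->{dist} } }, $m->{file};
}

print "$_: @{ $by_dist{$_} }\n" for @order;
```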
