junkblocker / codesearch

This project forked from google/codesearch

42 stars, 12 forks, 160 KB

Fork of Google codesearch with more options

License: BSD 3-Clause "New" or "Revised" License

Go 96.90% Shell 1.12% Makefile 1.98%
grep indexed indexer indexing regex regexp regular-expression search

codesearch's People

Contributors

dgryski, junkblocker, makuto, pmezard, rsc


codesearch's Issues

Multiple code repo search

From @suntong on January 18, 2017 19:18

Expanding on the idea from #9, I think the best solution is to separate different code repos into different indexes; after all, developers who work on only a single project are rare -- I personally need to search code across many different repos.

I propose adding a -name option to both the index and search programs to keep the indexes separate.

Moreover, this doesn't apply only to separate repos: even within a single repo, -name could divide files into whatever logical groups you like, which is handy when you constantly need to exclude a section.

Copied from original issue: junkblocker/codesearch-pre-github#12

Skipping long lines

First, I'd like to thank you for the work you have put into expanding codesearch.

One question, though: What's the reason for skipping the entire file if a long line is encountered, instead of just ignoring the line?

Parallel indexing?

Would it be possible to parallelise the indexing process, or at least parts of it, to improve the overall speed?

Running this over a 6.4 GB repository with 275,000 files on Windows, the process is bottlenecked on neither CPU nor disk IO, yet it takes over an hour. Running two index commands on two repos on the same NVMe SSD in parallel results in disk IO of around 20% and barely taxes one core. Memory usage is only around 400 MB per process.

I suspect that sequentially opening each file, reading and processing its contents, storing the results, and only then moving on to the next file severely limits throughput when there are many thousands of small files.

I don't know enough Go to implement this myself, unfortunately. Is this something you could potentially investigate?
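Since each file's processing is independent, a worker pool could overlap the read/process steps. A sketch with a toy trigram extractor standing in for the real per-file indexing work (not the fork's actual code):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// trigrams extracts the unique 3-byte substrings of one file's
// contents -- a stand-in for the real per-file indexing work.
func trigrams(data string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+3 <= len(data); i++ {
		set[data[i:i+3]] = true
	}
	return set
}

// indexAll fans the per-file work out to one worker per CPU so that
// many small files are processed concurrently, then merges results
// in a single goroutine so no locking is needed.
func indexAll(files []string) map[string]bool {
	jobs := make(chan string)
	results := make(chan map[string]bool)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				results <- trigrams(f)
			}
		}()
	}
	go func() {
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
	}()
	go func() { wg.Wait(); close(results) }()

	merged := make(map[string]bool)
	for set := range results {
		for t := range set {
			merged[t] = true
		}
	}
	return merged
}

func main() {
	fmt.Println(len(indexAll([]string{"abcd", "bcde"})))
}
```

The single-merger design sidesteps contention on the shared index; the real bottleneck would then shift to how the on-disk index merge is structured.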

Incremental indexing

From @cloudspeech on October 29, 2015 14:10

Great to discover today that there's an actively worked-on fork of codesearch!

I am using codesearch already, and noticed that with lots of files reindexing is slow.

It would be great if one could tell the indexer to (re-)index a few files only and merge that efficiently into the existing index. Cursory inspection of the code tells me this should be doable.

A strong plus would be the ability to read file names -- one per line -- from a (named or regular) pipe, or else a regular file, and index each one as soon as a new line becomes available.

Maybe an option --reindex-using <pipeOrFile>?

Copied from original issue: junkblocker/codesearch-pre-github#8

cindex: skip binary files

I think that would be a great feature.

Often you need to search through a lot of files to find out why some obsolete code doesn't work. Currently you have to run shell scripts just to build the file list for indexing.
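A simple heuristic cindex could apply is the one git and GNU grep use: treat a file as binary if its first few kilobytes contain a NUL byte. A sketch (the 8 kB window size is an assumption):

```go
package main

import (
	"bytes"
	"fmt"
)

// looksBinary reports whether content is probably binary, using the
// common heuristic of a NUL byte in the first 8 kB. An indexer would
// run this on each file's leading bytes and skip files that match.
func looksBinary(content []byte) bool {
	n := len(content)
	if n > 8192 {
		n = 8192
	}
	return bytes.IndexByte(content[:n], 0) >= 0
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0, 0}))
}
```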

Fix Windows not being able to delete temporary files

Hi,
The Windows version of codesearch is broken because it doesn't properly close any of its file handles before it tries to remove the files.

My fork has a change that fixes this. The fork has diverged (I added a tempDir argument so I could control where the temp files go), but someone should be able to port the change over without too much trouble.

Figure out why go get fetches upstream codesearch instead of this fork

Installing codesearch (my codesearch) (github.com/junkblocker/codesearch/cmd/...)
go: found github.com/junkblocker/codesearch/cmd/... in github.com/junkblocker/codesearch v1.1.0
go: finding module for package github.com/google/codesearch/index
go: finding module for package github.com/google/codesearch/regexp
go: found github.com/google/codesearch/regexp in github.com/google/codesearch v1.2.0
SUCCESS.

Somehow google/codesearch/regexp is being considered v1.2.0 and fetched?

Does it depend on GOPROXY being used?
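This wouldn't depend on GOPROXY. The log above shows go resolving github.com/google/codesearch/index and github.com/google/codesearch/regexp as separate packages, which suggests some source files still import those packages by their upstream path, so the module system fetches upstream to satisfy the imports. Until the import paths are rewritten to the fork's own module path, a consumer could pin the fork with a replace directive in their go.mod (a sketch; versions taken from the log above):

```
require github.com/google/codesearch v1.2.0

replace github.com/google/codesearch => github.com/junkblocker/codesearch v1.1.0
```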
