
naetimus / bathyscaphe


This project forked from creekorful/bathyscaphe


Fast, highly configurable, cloud native dark web crawler.

Home Page: https://blog.creekorful.com/building-fast-modern-web-crawler/

License: GNU General Public License v3.0

Languages: Go 96.81%, Python 1.98%, Shell 1.21%

bathyscaphe's Introduction

Bathyscaphe dark web crawler


Bathyscaphe is a fast, highly configurable, cloud-native dark web crawler written in Go.

How to start the crawler

To start the crawler, run the following command:

$ ./scripts/docker/start.sh

and wait for all containers to start.

Notes

  • You can start the crawler in detached mode by passing --detach to start.sh.
  • Ensure you have at least 3 GB of memory available, as the Elasticsearch stack alone requires 2 GB.

How to initiate crawling

Use the RabbitMQ dashboard, available at localhost:15003, to publish a new JSON object to the crawlingQueue.

The object should look like this:

{
  "url": "https://facebookcorewwwi.onion"
}
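
As an alternative to clicking through the dashboard, the same message can likely be published from the command line. The sketch below is not part of the project's scripts. It assumes that the port 15003 dashboard is the standard RabbitMQ management plugin (which exposes an HTTP publish endpoint), that the default guest/guest credentials and the default "/" virtual host are in use, and that crawlingQueue is reachable through the default exchange with the queue name as routing key. The function names build_publish_body and publish_url are hypothetical.

```shell
#!/bin/sh
# Sketch only: publish a crawl request through the RabbitMQ management HTTP API.
# Assumptions (not confirmed by the project): management plugin on port 15003,
# guest/guest credentials, default "/" vhost, default exchange routing to crawlingQueue.

# Print the request body expected by the management API publish endpoint.
# The message itself is the JSON object {"url": ...}, embedded as an escaped string.
# Assumes the URL contains no double quotes or backslashes.
build_publish_body() {
  printf '{"properties":{},"routing_key":"crawlingQueue","payload":"{\\"url\\":\\"%s\\"}","payload_encoding":"string"}' "$1"
}

# POST the body to the default exchange ("amq.default") of the "/" vhost (%2F).
publish_url() {
  curl -sS \
    -u "${RABBITMQ_USER:-guest}:${RABBITMQ_PASS:-guest}" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "$(build_publish_body "$1")" \
    "http://localhost:15003/api/exchanges/%2F/amq.default/publish"
}
```

Once the containers are running, calling publish_url "https://facebookcorewwwi.onion" would submit the same object as shown above. If the credentials, virtual host, or exchange differ in your setup, adjust them accordingly.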

How to speed up crawling

To speed up crawling, you can scale the number of crawler component instances. Do this by issuing the following command after the crawler has started:

$ ./scripts/docker/start.sh -d --scale crawler=5

This sets the number of crawler instances to 5.

How to view results

You can view results in the Kibana dashboard available at http://localhost:15004. You will need to create an index pattern named 'resources'; when prompted for the time field, choose 'time'.

How to hack the crawler

If you've made a change to one of the crawler components and wish to use the updated version when running start.sh, issue the following command:

$ goreleaser --snapshot --skip-publish --rm-dist

This rebuilds all images using your local changes. After that, run start.sh again to have the updated version running.

Architecture

The architecture details are available here.

bathyscaphe's People

Contributors

creekorful, ffroztt, gaganbhat, smithalc
