web-crawler-cli's Introduction

Web Crawler CLI

Web Crawler CLI is a simple command line interface implementing the Frog Front Web Crawler Library.

Building

The project is built using Gradle. Once Gradle is installed, build the project with the following command.

$> gradle
...
BUILD SUCCESSFUL in 17s
6 actionable tasks: 6 executed

Running

The application can be run from the command line in two ways.

The first is to run the project through Gradle, which uses its internal JavaExec task. It is invoked as follows.

$> gradle run --args='-f output.txt https://sample.com/'

Building report for https://sample.com/
processing of https://sample.com/ took 9 seconds
output file located at -> output.txt

The second is to invoke it with the java -jar command. The initial build invokes Shadow Jar, which creates a fat jar.

$> java -jar build/libs/web-crawler-cli-{version}.jar -f output.txt https://sample.com/

Building report for https://sample.com/
processing of https://sample.com/ took 9 seconds
output file located at -> output.txt

Running in Docker

If you don't want to build locally from source, you can use the runnable Docker image. Docker must be installed first.

The following command runs the latest Docker image and writes the output to your current directory.

$> docker run -e "crawl_url=https://sample.com/" -v $(pwd):/app/out cuzz22000/web-crawler-cli

Building Docker

The following is the build command for the Docker image. It installs the latest .jar file located in your build/libs directory. Substitute your own Docker repository name.

$> docker build -t ${your_repo}/web-crawler-cli:latest .

Future Plans

  • Implement a more robust CLI. Currently the arguments have to be given in a fixed order. Not nice!
  • Dockerize!! Running the application from a Docker container and having it available via Docker Hub would be pretty cool.
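
As a rough illustration of the first item, here is a minimal sketch of order-independent argument handling using only the Java standard library. It is not the project's actual code: the class name CrawlArgs, its fields, and the default output file name are hypothetical and chosen only for this example. The -f flag and the single URL argument are taken from the run commands shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of web-crawler-cli:
// accepts "-f <file>" and one URL in any order.
final class CrawlArgs {
    final String outputFile; // value given to -f, or the assumed default
    final String url;        // the single positional argument

    private CrawlArgs(String outputFile, String url) {
        this.outputFile = outputFile;
        this.url = url;
    }

    static CrawlArgs parse(String[] args) {
        String outputFile = "output.txt"; // assumed default, for illustration only
        List<String> positionals = new ArrayList<>();

        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-f")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-f requires a file name");
                }
                outputFile = args[++i]; // consume the option's value
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("unknown option: " + arg);
            } else {
                positionals.add(arg); // anything else is treated as a URL candidate
            }
        }

        if (positionals.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly one URL, got " + positionals.size());
        }
        return new CrawlArgs(outputFile, positionals.get(0));
    }

    public static void main(String[] args) {
        CrawlArgs parsed = parse(args);
        System.out.println("url=" + parsed.url + " output=" + parsed.outputFile);
    }
}
```

With this approach, "-f output.txt https://sample.com/" and "https://sample.com/ -f output.txt" parse to the same result. A dedicated library such as Apache Commons CLI would add help text and long options on top of the same idea.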

web-crawler-cli's People

Contributors

cuzz22000

web-crawler-cli's Issues

Gradle 6.x

Update the build to play nicely with Gradle 6.x.

New CLI

The current CLI is pretty lame, with limited functionality.

Implement a more robust CLI using a library such as Apache Commons CLI.
