lil-scraper is a small CLI tool for quickly scraping short snippets of text data from multiple HTTP sources.
The quickest way to install lil-scraper is with the provided install.sh script:
curl -LSfs https://walterbm.github.io/lil-scraper/install.sh | sh -s
Alternatively, you can download the most recent binary from the GitHub release artifacts.
Feed lil-scraper a list of URLs (one URL per line) on stdin and pass a regular expression pattern with a capture group to search for:
cat urls.txt | lil-scraper --pattern '<i lang="es">([^<]+)</i>'
This is roughly equivalent to the following xargs command, although lil-scraper will run significantly faster thanks to the tokio async runtime:
cat urls.txt | xargs -P 0 curl | grep -ioE '<i lang="es">([^<]+)</i>'
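One way the two commands differ: grep -o prints the whole match, tags included, while a capture group isolates only the text inside the parentheses. The sketch below illustrates what the capture group in the pattern above selects. It is a hand-written, dependency-free stand-in for that one pattern, not how lil-scraper is implemented (the tool takes a real Rust regular expression); the function name capture_between and the sample HTML are made up for illustration.

```rust
/// Hypothetical helper: returns every non-empty run of text that sits
/// between `open` and `close` and contains no `<`, which is what the
/// capture group in `<i lang="es">([^<]+)</i>` selects.
fn capture_between<'a>(body: &'a str, open: &str, close: &str) -> Vec<&'a str> {
    let mut found = Vec::new();
    if open.is_empty() {
        return found; // nothing sensible to search for
    }
    let mut rest = body;
    while let Some(start) = rest.find(open) {
        let after_open = &rest[start + open.len()..];
        // `[^<]+` stops at the first `<`; without one, no closing tag can follow.
        let Some(end) = after_open.find('<') else { break };
        let inner = &after_open[..end];
        if !inner.is_empty() && after_open[end..].starts_with(close) {
            found.push(inner);
            rest = &after_open[end + close.len()..];
        } else {
            // Not a match here; keep scanning after this opening tag.
            rest = after_open;
        }
    }
    found
}

fn main() {
    // Sample response body (made up for this example).
    let body = r#"<p><i lang="es">hola</i> y <i lang="es">mundo</i></p>"#;
    for text in capture_between(body, r#"<i lang="es">"#, "</i>") {
        println!("{text}");
    }
}
```

Run against the sample body, this yields only the inner text of each element, whereas the grep -o pipeline would also print the surrounding tags.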
lil-scraper can also search and match on response headers. Headers are normalized to match the curl standard (i.e. lowercased and prefixed with "< "). For example:
cat urls.txt | lil-scraper --pattern '< last-modified: (.+)'
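To make the normalization concrete, here is a minimal sketch of the transformation described above, assuming each header is rendered as a single line of the form "< name: value" with only the name lowercased. The function name normalize_header and the sample header value are illustrative and are not taken from the lil-scraper source.

```rust
/// Hypothetical helper: renders a response header as "< name: value",
/// lowercasing the header name as the README describes.
fn normalize_header(name: &str, value: &str) -> String {
    format!("< {}: {}", name.to_ascii_lowercase(), value.trim())
}

fn main() {
    // Sample header (made up for this example).
    let line = normalize_header("Last-Modified", "Wed, 21 Oct 2015 07:28:00 GMT");
    println!("{line}");

    // A pattern such as '< last-modified: (.+)' would capture everything
    // after the prefix; strip_prefix stands in for the regex here.
    if let Some(captured) = line.strip_prefix("< last-modified: ") {
        println!("{captured}");
    }
}
```

Because the name is lowercased before matching, a pattern should spell header names in lowercase, as in the example command above.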
A Rust regular expression with a capture group which will be used to search and extract text from the HTTP responses.
The amount of time to wait for an HTTP response before disconnecting.
Run the test suite with:
cargo test
Distributed under the MIT License. See LICENSE for more information.