
jeffkala / pyurlcheck


pyurlcheck can be used to scan through all of a project's documents and validate that any `public` facing URLs are still reachable.

Python 98.65% Dockerfile 1.35%

pyurlcheck's Introduction

PYURLCHECK

Project is currently a WIP.

pyurlcheck can be used to scan through all of a project's documents and validate that any public-facing URLs are still reachable.

Why?

Anyone who browses code documentation online can see that URLs in documentation are not always kept up to date. Repeatedly running into 404 Not Found errors is frustrating for users who are trying to learn how to use a product or tool.

Examples

Running the tool against a single file.

python cli.py examples/example1.md                          
examples/example1.md:8  URL Issue: https://www.ansible.com/jeff

Running the tool against a directory. All files in the directory will be checked.

python cli.py examples/           
examples/example2.md:6  URL Issue: https://www.ansible.com/jeff
examples/example1.md:8  URL Issue: https://www.ansible.com/fake

Alternatively, you can replace `python cli.py` with `pyurlcheck` on the command line.

▶ pyurlcheck pyurlcheck/examples/
pyurlcheck/examples/example3.txt:4      URL Issue: https://www.ansible.com/jeff
pyurlcheck/examples/example2.md:7       URL Issue: https://www.ansible.com/jeff
pyurlcheck/examples/example3.md:3       URL Issue: https://www.ansible.com/jeff
pyurlcheck/examples/example4.rst:22     URL Issue: http://google.com/france
pyurlcheck/examples/example4.rst:23     URL Issue: http://google.com/japan
pyurlcheck/examples/example1.md:9       URL Issue: https://www.ansible.com/jeff

File extensions are currently not checked; therefore all files in a directory that is passed in will be validated.

Installation

pip install pyurlcheck

pyurlcheck's People

Contributors

jeffkala, jvanderaa

Forkers

jvanderaa

pyurlcheck's Issues

Error On Link Name With "." in it

### Zigbits Network Design

Main Page: [Zigbits.tech](https://zigbits.tech/)

Fails with the message:

requests.exceptions.MissingSchema: Invalid URL 'Zigbits.tech': No schema supplied. Perhaps you meant http://Zigbits.tech?
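
One way to avoid this crash is to skip any candidate string that lacks an http or https scheme before a request is made. The sketch below assumes candidates are filtered that way; the helper name `has_http_scheme` is hypothetical and not part of pyurlcheck.

```python
from urllib.parse import urlsplit


def has_http_scheme(candidate: str) -> bool:
    """Return True only for strings that parse as absolute http(s) URLs."""
    parts = urlsplit(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical candidates: the link text and link target from the example above.
candidates = ["Zigbits.tech", "https://zigbits.tech/"]
checkable = [c for c in candidates if has_http_scheme(c)]
print(checkable)
```

With this filter the link text `Zigbits.tech` would never reach the HTTP client, while the real target would still be checked.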

URL Issue - Manual Test Fine

I re-ran the tests and I'm getting a couple of failures. One fails consistently and the other was a one-time issue.

This showed up one time and not others. Maybe a retry issue?

[ASA for Vagrant](https://techbloc.net/archives/2360)  

This link is showing up consistently, yet works when I go to it manually.

[GNS3 DMVPN Intro](http://resources.intenseschool.com/gns3-lab-introduction-to-dmvpn/) 
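
If the one-time failure was transient, a small retry loop might absorb it. This is a minimal sketch, not pyurlcheck's implementation: `check_with_retries` and the injected `fetch` callable (which takes a URL and returns a status code) are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def check_with_retries(
    fetch: Callable[[str], int],
    url: str,
    attempts: int = 3,
    delay: float = 0.0,
) -> bool:
    """Return True if any attempt yields a status code below 400."""
    for attempt in range(attempts):
        try:
            if fetch(url) < 400:
                return True
        except OSError:
            # Network-level failure (ConnectionError is an OSError subclass);
            # count it as a failed attempt and try again.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Hypothetical flaky fetcher: fails once, then succeeds.
calls = {"count": 0}


def flaky_fetch(url: str) -> int:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return 200


print(check_with_retries(flaky_fetch, "https://techbloc.net/archives/2360"))
```

Retries would not explain the link that fails consistently; that case likely needs a separate look at how the request is made.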

Feature Request: Display Warning on Redirects

I looked through the code to see what happens on a redirect. I believe it would be worthwhile to display a warning when a 3xx redirect occurs. The content is still being displayed, but the warning would prompt the link to be updated to its proper target.
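
A rough sketch of how the three outcomes could be told apart, assuming the checker can see the status code of the first response. `classify_status` and the labels it returns are hypothetical. Note that requests follows redirects by default, so the checker would need to inspect `response.history` or pass `allow_redirects=False` to observe a 3xx at all.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to 'ok', 'warning' (3xx redirect), or 'issue'."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "warning"
    return "issue"


# Illustrative status codes only, not output from a real run.
for code in (200, 301, 404):
    print(code, classify_status(code))
```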

Sys.Exit() Should Return > 0 On Fail

To run the check and report back to a CI tool, sys.exit() should return an integer greater than 0 when a URL doesn't respond: likely 1, or the number of sites that failed.
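
A minimal sketch of the exit-code logic, assuming the CLI ends up with a list of failed URLs. `exit_code_for` is a hypothetical helper, not an existing pyurlcheck function.

```python
from typing import Sequence


def exit_code_for(failed_urls: Sequence[str]) -> int:
    """Return 1 if any URL failed, otherwise 0."""
    return 1 if failed_urls else 0


# In the CLI entry point this would be used as (import sys first):
#     sys.exit(exit_code_for(failed_urls))
print(exit_code_for(["https://www.ansible.com/jeff"]))
print(exit_code_for([]))
```

Returning a fixed 1 rather than the failure count is the safer of the two options: on POSIX systems the exit status is truncated to 8 bits, so exactly 256 failures would be reported as 0.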

Nested directory lookups fail

▶ pyurlcheck ../nautobot/
Traceback (most recent call last):
  File "/Users/jeffkala/Library/Caches/pypoetry/virtualenvs/pyurlcheck-jbPNAqXn-py3.9/bin/pyurlcheck", line 5, in <module>
    main()
  File "/Users/jeffkala/Library/Caches/pypoetry/virtualenvs/pyurlcheck-jbPNAqXn-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jeffkala/Library/Caches/pypoetry/virtualenvs/pyurlcheck-jbPNAqXn-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/jeffkala/Library/Caches/pypoetry/virtualenvs/pyurlcheck-jbPNAqXn-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jeffkala/Library/Caches/pypoetry/virtualenvs/pyurlcheck-jbPNAqXn-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/jeffkala/Documents/github-clones/pyurlcheck/pyurlcheck/cli.py", line 20, in main
    files_urls = FindUrls(input_data).find_urls()
  File "/Users/jeffkala/Documents/github-clones/pyurlcheck/pyurlcheck/find.py", line 54, in find_urls
    results.update(_parse_file(f"{self.search}{file}"))
  File "/Users/jeffkala/Documents/github-clones/pyurlcheck/pyurlcheck/find.py", line 29, in _parse_file
    file_data = _read_in_file(filepath)
  File "/Users/jeffkala/Documents/github-clones/pyurlcheck/pyurlcheck/find.py", line 9, in _read_in_file
    return Path(file_to_read).read_text().splitlines()
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pathlib.py", line 1255, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
IsADirectoryError: [Errno 21] Is a directory: '../nautobot/docker'
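
The traceback shows a subdirectory (`docker`) being read as if it were a file. One possible fix, sketched here with a hypothetical `collect_files` helper, is to walk the tree recursively and keep only regular files. This is not the project's actual code.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_files(search: str) -> List[Path]:
    """Return every regular file under `search`, descending into subdirectories."""
    root = Path(search)
    if root.is_file():
        return [root]
    # rglob("*") yields directories too, so filter them out before reading.
    return sorted(p for p in root.rglob("*") if p.is_file())


# Demonstration on a throwaway tree that mimics the failing layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "docker").mkdir()
    (Path(tmp) / "docker" / "README.md").write_text("https://example.com\n")
    (Path(tmp) / "index.md").write_text("no links here\n")
    for path in collect_files(tmp):
        print(path.relative_to(tmp))
```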

Fine Tune Regex

Currently the regex can catch non-URLs when the text contains dotted notation.
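
One possible tightening is to require an explicit http or https scheme, so that dotted names without a scheme are never matched. The pattern and the `extract_urls` helper below are assumptions for illustration, not the regex pyurlcheck uses.

```python
import re
from typing import List

# Match http(s) URLs up to whitespace or a character that usually closes a link.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(line: str) -> List[str]:
    """Return scheme-qualified URLs in a line, minus trailing punctuation."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(line)]


print(extract_urls("Main Page: [Zigbits.tech](https://zigbits.tech/)"))
print(extract_urls("Call os.path.join to build the path."))
```

The trade-off is that bare domains written without a scheme would no longer be checked.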
