codeinthehole / purl

A simple, immutable URL class with a clean API for interrogation and manipulation.

Home Page: http://codeinthehole.com/writing/purl-immutable-url-objects-for-python/

License: MIT License

purl - A simple Python URL class

A simple, immutable URL class with a clean API for interrogation and manipulation. Supports Python 2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8 and PyPy.

Also supports template URLs as per RFC 6570.

Docs

http://purl.readthedocs.org/en/latest/

Install

From PyPI (stable):

$ pip install purl

From Github (unstable):

$ pip install git+https://github.com/codeinthehole/purl.git#egg=purl

Use

Construct:

>>> from purl import URL

# String constructor
>>> from_str = URL('https://www.google.com/search?q=testing')

# Keyword constructor
>>> from_kwargs = URL(scheme='https', host='www.google.com', path='/search', query='q=testing')

# Combine
>>> from_combo = URL('https://www.google.com').path('search').query_param('q', 'testing')

URL objects are immutable - all mutator methods return a new instance.
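To illustrate what "all mutator methods return a new instance" means in practice, here is a minimal sketch built on a frozen dataclass. `ImmutableURL`, `with_path` and `with_query_param` are hypothetical names used only for illustration; this is not how purl itself is implemented.

```python
from dataclasses import dataclass, replace
from urllib.parse import parse_qsl, urlencode


@dataclass(frozen=True)
class ImmutableURL:
    """Hypothetical immutable URL value object (not purl's implementation)."""

    scheme: str = ""
    host: str = ""
    path: str = ""
    query: str = ""

    def with_path(self, path):
        # replace() builds a new instance; self is never modified
        return replace(self, path="/" + path.lstrip("/"))

    def with_query_param(self, key, value):
        # Drop any existing values for this key, then set the new one
        params = [(k, v) for k, v in parse_qsl(self.query) if k != key]
        params.append((key, value))
        return replace(self, query=urlencode(params))


base = ImmutableURL(scheme="https", host="www.google.com")
combo = base.with_path("search").with_query_param("q", "testing")
```

Because the dataclass is frozen, assigning to a field raises `FrozenInstanceError`, so a chain of calls like the one above can never alter `base`.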

Interrogate:

>>> u = URL('https://www.google.com/search?q=testing')
>>> u.scheme()
'https'
>>> u.host()
'www.google.com'
>>> u.domain()
'www.google.com'
>>> u.username()
>>> u.password()
>>> u.netloc()
'www.google.com'
>>> u.port()
>>> u.path()
'/search'
>>> u.query()
'q=testing'
>>> u.fragment()
''
>>> u.path_segment(0)
'search'
>>> u.path_segments()
('search',)
>>> u.query_param('q')
'testing'
>>> u.query_param('q', as_list=True)
['testing']
>>> u.query_param('lang', default='GB')
'GB'
>>> u.query_params()
{'q': ['testing']}
>>> u.has_query_param('q')
True
>>> u.has_query_params(('q', 'r'))
False
>>> u.subdomains()
['www', 'google', 'com']
>>> u.subdomain(0)
'www'
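For comparison, the standard library can split the same URL into most of these pieces. This is only a cross-check of the values above; purl's own parsing may differ in edge cases.

```python
from urllib.parse import parse_qs, urlsplit

# Standard-library comparison only; not necessarily how purl parses URLs.
parts = urlsplit("https://www.google.com/search?q=testing")

params = parse_qs(parts.query)                            # values are always lists
segments = tuple(s for s in parts.path.split("/") if s)  # non-empty path pieces
labels = parts.hostname.split(".")                        # dot-separated host labels
```

One visible difference: `urlsplit` exposes the parts as attributes, whereas purl uses methods so that the same name can double as a mutator.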

Note that each accessor method is overloaded to be a mutator method too, similar to the jQuery API. For example:

>>> u = URL.from_string('https://github.com/codeinthehole')

# Access
>>> u.path_segment(0)
'codeinthehole'

# Mutate (creates a new instance)
>>> new_url = u.path_segment(0, 'tangentlabs')
>>> new_url is u
False
>>> new_url.path_segment(0)
'tangentlabs'

Hence, you can build a URL up in steps:

>>> u = URL().scheme('http').domain('www.example.com').path('/some/path').query_param('q', 'search term')
>>> u.as_string()
'http://www.example.com/some/path?q=search+term'

Along with the above overloaded methods, there is also an add_path_segment method for appending a segment to the end of the current path:

>>> new_url = u.add_path_segment('here')
>>> new_url.as_string()
'http://www.example.com/some/path/here?q=search+term'
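The same step-by-step build can be approximated with the standard library. The `add_path_segment` function below is a hypothetical helper, not purl's method; it percent-encodes the new segment (including any `/`) so it stays a single segment.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def add_path_segment(url, segment):
    """Hypothetical helper: append one quoted segment to the end of the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/" + quote(segment, safe="")
    return urlunsplit(parts._replace(path=path))


# Build the same URL as the purl example, one component at a time.
built = urlunsplit(("http", "www.example.com", "/some/path",
                    urlencode({"q": "search term"}), ""))
extended = add_path_segment(built, "here")
```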

A couple of other things:

  • Since the URL class is immutable, it can be used as a key in a dictionary
  • It can be pickled and restored
  • It supports equality operations
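All three properties come for free with a tuple-based value type. The sketch below uses a hypothetical `URLTuple` namedtuple; the `_URLTuple(...)` reprs quoted in the issues below suggest purl uses something similar internally, but treat that as an inference rather than documented behaviour.

```python
import pickle
from collections import namedtuple

# Hypothetical tuple-backed URL type, for illustration only.
URLTuple = namedtuple("URLTuple", "scheme host path query fragment")

a = URLTuple("https", "www.google.com", "/search", "q=testing", "")
b = URLTuple("https", "www.google.com", "/search", "q=testing", "")

visits = {a: 1}    # hashable, so it works as a dict key
visits[b] += 1     # equal tuples hash equally, so b finds a's entry
restored = pickle.loads(pickle.dumps(a))  # round-trips through pickle
```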

URL templates can be used either via a Template class:

>>> from purl import Template
>>> tpl = Template("http://example.com{/list*}")
>>> url = tpl.expand({'list': ['red', 'green', 'blue']})
>>> url.as_string()
'http://example.com/red/green/blue'

or the expand function:

>>> from purl import expand
>>> expand(u"{/list*}", {'list': ['red', 'green', 'blue']})
'/red/green/blue'
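To show what the `{/list*}` expression is doing, here is a deliberately small sketch covering only two RFC 6570 forms: simple `{var}` and path-segment `{/var}` / `{/var*}`. `expand_subset` is a hypothetical name and this is not purl's `expand()`; every other operator is out of scope.

```python
import re
from urllib.parse import quote

# Matches "{var}", "{/var}" and "{/var*}" only.
_EXPR = re.compile(r"\{(/?)(\w+)(\*?)\}")


def expand_subset(template, variables):
    """Hypothetical sketch of two RFC 6570 expression types (not purl's expand)."""

    def substitute(match):
        op, name, explode = match.groups()
        value = variables.get(name)
        if isinstance(value, (list, tuple)):
            values = list(value)
        elif value is None:
            values = []
        else:
            values = [value]
        if not values:
            return ""  # undefined variables (and empty lists) expand to nothing
        # Percent-encode everything outside the unreserved character set
        encoded = [quote(str(v), safe="") for v in values]
        if op == "/":
            # {/list*} -> /a/b/c    {/list} -> /a,b,c
            return "/" + ("/" if explode else ",").join(encoded)
        return ",".join(encoded)

    return _EXPR.sub(substitute, template)
```

The `*` (explode) modifier is what turns each list item into its own path segment; without it the items are comma-joined into one segment.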

A wide variety of expansions are possible - refer to the RFC for more details.

Changelog

v1.6 - 2021-05-15

  • Use pytest instead of nose.
  • Fix warning around regex string.

v1.5 - 2019-03-10

  • Allow @ in passwords.

v1.4 - 2018-03-11

  • Allow usernames and passwords to be removed from URLs.

v1.3.1

  • Ensure paths always have a leading slash.

v1.3

  • Allow absolute URLs to be converted into relative.

v1.2

  • Support password-less URLs.
  • Allow slashes to be passed as path segments.

v1.1

  • Support setting username and password via mutator methods

v1.0.3

  • Handle some unicode compatibility edge-cases

v1.0.2

  • Fix template expansion bug with no matching variables being passed in. This ensures purl.Template works correctly with the URLs returned from the Github API.

v1.0.1

  • Fix bug with special characters in paths not being escaped.

v1.0

  • Slight tidy up. Document support for PyPy and Python 3.4.

v0.8

  • Support for RFC 6570 URI templates

v0.7

  • All internal strings are unicode.
  • Support for unicode chars in path, fragment, query, auth added.

v0.6

  • Added append_query_param method
  • Added remove_query_param method

v0.5

  • Added support for Python 3.2/3.3 (thanks @pmcnr and @mitchellrj)

v0.4.1

  • Added API docs
  • Added to readthedocs.org

v0.4

  • Modified constructor to accept full URL string as first arg
  • Added add_path_segment method

v0.3.2

  • Fixed bug with port number in string when using from_string constructor

v0.3.1

  • Fixed bug with passing lists to query param setter methods

v0.3

  • Added support for comparison and equality
  • Added support for pickling
  • Added __slots__ so instances can be used as keys within dictionaries

Contribute

Clone the repository, create a virtualenv, then install purl and the packages required for testing:

$ git clone git@github.com:codeinthehole/purl.git
$ cd purl
$ mkvirtualenv purl  # requires virtualenvwrapper
(purl) $ make

Ensure tests pass using:

(purl) $ pytest

or:

$ tox

purl's Issues

Cannot create URL with '@' symbol in password

It isn't currently possible to create a URL when there is an @ character in the password.

Using the '@' character directly:
>>> purl.URL("mysql://foo:b@r@localhost")
_URLTuple(host='r', username='foo', password='b', scheme='mysql', port=None, path='', query='', fragment='')

URL encoding it first:
>>> purl.URL("mysql://foo:b%40r@localhost")
_URLTuple(host='localhost', username='foo', password='b%2540r', scheme='mysql', port=None, path='', query='', fragment='')

A potential solution would be to split on '@' from the right when extracting credentials from the URL.
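A minimal sketch of that suggestion, using a hypothetical `split_userinfo` helper: split the netloc on the last `@`, then split the credentials on the first `:`. Python's own `urllib.parse.urlsplit` takes the same right-hand split when computing its `username`, `password` and `hostname` attributes.

```python
def split_userinfo(netloc):
    """Hypothetical helper: separate 'user:password@host' on the LAST '@'."""
    userinfo, sep, hostport = netloc.rpartition("@")
    if not sep:
        return None, None, netloc  # no credentials present
    username, colon, password = userinfo.partition(":")
    return username, (password if colon else None), hostport
```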

1.5: pytest is failing

I'm trying to package your module as an RPM package, so I'm using the typical build, install and test cycle used when building packages from a non-root account:

  • "setup.py build"
  • "setup.py install --root </install/prefix>"
  • "pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>"

May I ask for help? A few units are failing:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra --import-mode=importlib
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/tkloczko/rpmbuild/BUILD/purl-1.5/.hypothesis/examples')
rootdir: /home/tkloczko/rpmbuild/BUILD/purl-1.5, configfile: pytest.ini
plugins: forked-1.3.0, shutil-1.7.0, virtualenv-1.7.0, expect-1.1.0, flake8-1.0.7, timeout-1.4.2, betamax-0.8.1, freezegun-0.4.2, case-1.5.3, aspectlib-1.5.2, toolbox-0.5, rerunfailures-9.1.1, requests-mock-1.9.3, cov-2.12.1, pyfakefs-4.5.0, flaky-3.7.0, benchmark-3.4.1, xdist-2.3.0, pylama-7.7.1, datadir-1.3.1, regressions-2.2.0, cases-3.6.3, xprocess-0.18.1, black-0.3.12, checkdocs-2.7.1, anyio-3.3.0, Faker-8.11.0, asyncio-0.15.1, trio-0.7.0, httpbin-1.0.0, subtests-0.5.0, isort-2.0.0, hypothesis-6.14.6, mock-3.6.1, profiling-1.7.0
collected 171 items / 2 errors / 169 selected

================================================================================== ERRORS ==================================================================================
____________________________________________________________________ ERROR collecting purl/template.py _____________________________________________________________________
/usr/lib/python3.8/site-packages/_pytest/runner.py:311: in from_call
    result: Optional[TResult] = func()
/usr/lib/python3.8/site-packages/_pytest/runner.py:341: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/usr/lib/python3.8/site-packages/_pytest/doctest.py:532: in collect
    module = import_path(self.fspath)
/usr/lib/python3.8/site-packages/_pytest/pathlib.py:544: in import_path
    raise ImportPathMismatchError(module_name, module_file, path)
E   _pytest.pathlib.ImportPathMismatchError: ('purl.template', '/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib/python3.8/site-packages/purl/template.py', PosixPath('/home/tkloczko/rpmbuild/BUILD/purl-1.5/purl/template.py'))
_______________________________________________________________________ ERROR collecting purl/url.py _______________________________________________________________________
/usr/lib/python3.8/site-packages/_pytest/runner.py:311: in from_call
    result: Optional[TResult] = func()
/usr/lib/python3.8/site-packages/_pytest/runner.py:341: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/usr/lib/python3.8/site-packages/_pytest/doctest.py:532: in collect
    module = import_path(self.fspath)
/usr/lib/python3.8/site-packages/_pytest/pathlib.py:544: in import_path
    raise ImportPathMismatchError(module_name, module_file, path)
E   _pytest.pathlib.ImportPathMismatchError: ('purl.url', '/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib/python3.8/site-packages/purl/url.py', PosixPath('/home/tkloczko/rpmbuild/BUILD/purl-1.5/purl/url.py'))
========================================================================= short test summary info ==========================================================================
ERROR purl/template.py - _pytest.pathlib.ImportPathMismatchError: ('purl.template', '/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib/python3.8/si...
ERROR purl/url.py - _pytest.pathlib.ImportPathMismatchError: ('purl.url', '/home/tkloczko/rpmbuild/BUILDROOT/python-purl-1.5-3.fc35.x86_64/usr/lib/python3.8/site-package...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================ 2 errors in 0.59s =============================================================================
pytest-xprocess reminder::Be sure to terminate the started process by running 'pytest --xkill' if you have not explicitly done so in your fixture with 'xprocess.getinfo(<process_name>).terminate()'.

Pre-quoted path segments

Currently, you can't set a pre-quoted path segment. This makes it impossible to set a path containing a quoted /: a literal / is treated as a segment separator rather than part of the segment, so it won't be quoted, and passing %2F instead returns %252F because the % itself gets quoted.
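The double-quoting described above is easy to reproduce with `urllib.parse.quote`. The `quote_segment` function is a hypothetical illustration of an opt-out flag, not purl's API or the linked branch's implementation.

```python
from urllib.parse import quote

segment = "a/b"
once = quote(segment, safe="")   # 'a%2Fb'  : the slash stays inside the segment
twice = quote(once, safe="")     # 'a%252Fb': the '%' itself gets quoted again


def quote_segment(segment, pre_quoted=False):
    """Hypothetical helper: only quote segments that are not already quoted."""
    return segment if pre_quoted else quote(segment, safe="")
```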

I have a branch, with tests, but the rabbit hole goes deeper when you try to do things like:

URL.add_path_segment(quote=True).add_path_segment(quote=False)

I don't have time to fix this case right now (simply not using purl is easier), so I thought I'd raise the issue so you're aware of it, and point you at my WIP branch in case you'd like to fix that last case. Tests are included, so you can see the issue too:

RickyCook#1

passwordless URL support in string representation

Hi,

Today I discovered some rather weird behaviour; please have a look at the snippet below:

>>> from purl import URL
>>> url = URL(scheme='postgres', username='user', host='127.0.0.1', port='5432', path='/db_name')
>>> url.as_string()
u'postgres://127.0.0.1:5432/db_name'        # Username disappeared 
>>> url = URL(scheme='postgres', username='user', password='pass', host='127.0.0.1', port='5432', path='/db_name')
>>> url.as_string()
u'postgres://user:pass@127.0.0.1:5432/db_name'        # In combination with a password it works

Of course, it's being stored properly:

>>> url = URL(scheme='postgres', username='user', host='127.0.0.1', port='5432', path='/db_name')
>>> url
_URLTuple(host='127.0.0.1', username='user', password=None, scheme='postgres', port='5432', path='/db_name', query=None, fragment=None)
>>> url.username
<bound method URL.username of _URLTuple(host='127.0.0.1', username='user', password=None, scheme='postgres', port='5432', path='/db_name', query=None, fragment=None)>
>>> url.username()
u'user'

I think that passwordless URLs should be supported since, for instance, the following Postgres URL is perfectly valid: psql "postgres://[email protected]:5432/template1"

It should be relatively easy to fix, I'll try to submit a patch this week, but what do you think about this?
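A minimal sketch of the serialization logic the report asks for, using a hypothetical `build_netloc` helper: emit the username whenever it is set, and add `:password` only when a password exists. Percent-encoding of special characters in the credentials is left out for brevity.

```python
def build_netloc(host, port=None, username=None, password=None):
    """Hypothetical helper: keep the username even when there is no password."""
    netloc = host
    if port is not None:
        netloc = f"{netloc}:{port}"
    if username:
        userinfo = username if password is None else f"{username}:{password}"
        netloc = f"{userinfo}@{netloc}"
    return netloc
```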

Custom scheme is not respected in as_string

ipython
Python 3.7.3 (default, Apr 12 2019, 14:40:22)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from purl import URL

In [2]: url = URL("foobar:///my_path")

In [3]: url.scheme()
Out[3]: 'foobar'

In [4]: url.as_string()
Out[4]: '/my_path'

I expect Out[4] to be "foobar:///my_path"
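The parse side already works: `urlsplit` keeps `foobar` as the scheme with an empty netloc, so the problem is in serialization. The `as_string` function below is a hypothetical serializer that always writes `//` after a scheme, which restores the expected output.

```python
from urllib.parse import urlsplit


def as_string(scheme, netloc, path, query="", fragment=""):
    """Hypothetical serializer that keeps '//' whenever a scheme is present."""
    url = path
    if scheme:
        url = f"{scheme}://{netloc}{path}"
    elif netloc:
        url = f"//{netloc}{path}"
    if query:
        url += "?" + query
    if fragment:
        url += "#" + fragment
    return url


parts = urlsplit("foobar:///my_path")
```

The trade-off: always writing `//` would be wrong for schemes such as `mailto:` that have no authority section, so a real fix would probably need to remember whether `//` appeared in the input.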

`query_params()` does not accept lists as query values

The .query_params() method does not parse lists as multiple query parameters. For example, {'q': ['testing']} is parsed as equivalent to {'q': "['testing']"}.

>>> from purl import URL
>>> q = URL.from_string('?q=testing')
>>> u = URL.from_string('http://www.google.com')
>>> v = u.query_params(q.query_params())
>>> print(v.query_params())
{'q': ["['testing']"]}
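The same symptom appears with the standard library's `urlencode` when `doseq` is left at its default: each value is passed through `str()`, so a list becomes one literal string. Passing `doseq=True` emits one `key=value` pair per list item. Whether purl's fix should work the same way is an assumption.

```python
from urllib.parse import parse_qs, urlencode

params = {"q": ["testing"]}
naive = urlencode(params)                # list is str()-ed: 'q=%5B%27testing%27%5D'
correct = urlencode(params, doseq=True)  # one pair per list item: 'q=testing'
```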

Cannot be installed using buildout

When I try to install purl 1.2 using buildout I get the following error:

Traceback (most recent call last):
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 154, in save_modules
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 195, in setup_context
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 239, in run_setup
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 269, in run
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 238, in runner
  File "/Users/denis/.buildout/eggs/setuptools-19.4-py3.5.egg/setuptools/sandbox.py", line 46, in _execfile
  File "/var/folders/mm/5rlp4h9906n3npgjt96lm0fw0000gn/T/easy_install-4zdguc7q/purl-1.2/setup.py", line 5, in <module>
  File "/var/folders/mm/5rlp4h9906n3npgjt96lm0fw0000gn/T/easy_install-4zdguc7q/purl-1.2/purl/__init__.py", line 1, in <module>
  File "/var/folders/mm/5rlp4h9906n3npgjt96lm0fw0000gn/T/easy_install-4zdguc7q/purl-1.2/purl/url.py", line 10, in <module>
ImportError: No module named 'six'

The problem seems to be that setup.py imports the package before it is installed. This works if six is already installed, or if pip is used (because pip has a workaround for this). Instead, you should either define the version in both __init__.py and setup.py, or open __init__.py as text and parse the version statically.
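The second option might look like the sketch below. `read_version` is a hypothetical helper, and it assumes the package declares its version as a `__version__ = '...'` line in `__init__.py`.

```python
import re

_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(init_text):
    """Hypothetical setup.py helper: pull __version__ out of source text
    without importing the package, so its dependencies need not be installed."""
    match = _VERSION_RE.search(init_text)
    if match is None:
        raise RuntimeError("__version__ not found")
    return match.group(1)


# In setup.py this would be fed the file contents, e.g.
# read_version(open("purl/__init__.py").read())
sample = "from .url import URL\n__version__ = '1.2'\n"
```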

License?

What license are you releasing this under?

Deprecation warning due to invalid escape sequences

Deprecation warnings are raised due to invalid escape sequences. This can be fixed by using raw strings or escaping the literals. pyupgrade can also do the conversion automatically: https://github.com/asottile/pyupgrade/

find . -iname '*.py' | grep -v example | xargs -P4 -I{} python3.8 -Wall -m py_compile {}
./purl/template.py:16: DeprecationWarning: invalid escape sequence \}
  patterns = re.compile("{([^\}]+)}")
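The raw-string fix for the flagged line is a one-character change: with the `r` prefix, the backslash reaches the regex engine instead of being parsed as a (invalid) Python string escape, and the compiled pattern behaves the same.

```python
import re

# Raw string: "\}" is passed to the regex engine untouched, so no
# invalid-escape warning is emitted at compile time.
patterns = re.compile(r"{([^\}]+)}")
```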

Host is not parsed correctly

It doesn't work with a Cyrillic domain:

from purl import URL

url = "www.мастеркофе.рф"
parse_url = URL(url)

print(parse_url.host())  # returns ""
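One plausible explanation, inferred from the standard library rather than checked against purl's source, is that the input has no scheme or leading `//`, so a `urlsplit`-style parser treats the whole string as a path regardless of the script used. The sketch below shows that behaviour and how an explicit scheme lets the host be found; IDNA encoding is included only in case an ASCII host is needed.

```python
from urllib.parse import urlsplit

raw = "www.мастеркофе.рф"

# Without a scheme or leading '//', the whole string is treated as a path.
no_scheme = urlsplit(raw)
# With a scheme, the parser finds the host.
with_scheme = urlsplit("http://" + raw)
# IDNA-encode the host if an ASCII (punycode) form is required.
ascii_host = with_scheme.hostname.encode("idna")
```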

Host is not detected

Add the ability to clear any part of a URL

Currently I can't clear the port, query or fragment:

>>> url = URL('http://google.com:80/path/to/doc.html?q=query#frag')

>>> url.port('')
80               # NOT a new URL!

>>> url.query('')
'q=query'        # NOT a new URL!

>>> url.fragment('')
'frag'           # NOT a new URL!
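The output suggests that an empty-string argument is being treated the same as "no argument", so each call falls through to the getter; this is an inference from the symptom, not from purl's source. A common fix for jQuery-style accessor/mutator methods is a sentinel default, sketched here with a hypothetical `TinyURL` class built on `urlsplit`.

```python
from urllib.parse import urlsplit, urlunsplit

# Sentinel object: lets an accessor tell "called with no argument" apart
# from "called with a falsy value" such as '' or None.
_MISSING = object()


class TinyURL:
    """Hypothetical immutable URL with jQuery-style accessor/mutator methods."""

    def __init__(self, url="", _parts=None):
        self._parts = _parts if _parts is not None else urlsplit(url)

    def _get_or_set(self, field, value):
        if value is _MISSING:
            return getattr(self._parts, field)  # no argument: act as a getter
        # Any argument, including '', produces a new instance
        return TinyURL(_parts=self._parts._replace(**{field: value}))

    def query(self, value=_MISSING):
        return self._get_or_set("query", value)

    def fragment(self, value=_MISSING):
        return self._get_or_set("fragment", value)

    def as_string(self):
        return urlunsplit(self._parts)


u = TinyURL("http://google.com:80/path/to/doc.html?q=query#frag")
```

The port is omitted because `urlsplit` stores it inside the netloc, so clearing it would also require rebuilding the netloc string.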

Add resolve() method (or function)

Here's the helper function I wrote:

from urllib import parse

from purl import URL


def resolve(base, url):
    """
    Resolves a target URL relative to a base URL, in a manner similar to a web
    browser resolving an anchor tag's HREF.
    :param base: str|URL
    :param url: str|URL
    :return: URL
    """
    baseurl = base if isinstance(base, URL) else URL(base)
    relurl = url if isinstance(url, URL) else URL(url)
    # Keep the raw string so a bare '?' or '#' can still be detected;
    # URL objects do not support the `in` operator.
    raw = url if isinstance(url, str) else url.as_string()

    # An absolute target URL wins outright.
    if relurl.host():
        return relurl

    # Scheme, host, port and credentials are always inherited from the base
    # (credentials are carried over so the output matches the usage below).
    common = dict(
        scheme=baseurl.scheme(),
        host=baseurl.host(),
        port=baseurl.port(),
        username=baseurl.username(),
        password=baseurl.password(),
    )

    if relurl.path():
        return URL(path=parse.urljoin(baseurl.path(), relurl.path()),
                   query=relurl.query(), fragment=relurl.fragment(), **common)
    if relurl.query() or '?' in raw:
        return URL(path=baseurl.path(), query=relurl.query(),
                   fragment=relurl.fragment(), **common)
    if relurl.fragment() or '#' in raw:
        return URL(path=baseurl.path(), query=baseurl.query(),
                   fragment=relurl.fragment(), **common)
    return baseurl

Usage:

>>> base = URL('http://user:[email protected]:8080/path/to/some/doc.html?q=query#frag')
...
>>> print(resolve(base, '../home'))
http://user:[email protected]:8080/path/to/home

>>> print(resolve(base, 'doc2.html'))
http://user:[email protected]:8080/path/to/some/doc2.html

>>> print(resolve(base, '?'))
http://user:[email protected]:8080/path/to/some/doc.html

>>> print(resolve(base, '?q=git'))
http://user:[email protected]:8080/path/to/some/doc.html?q=git

>>> print(resolve(base, '#'))
http://user:[email protected]:8080/path/to/some/doc.html?q=query
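For comparison, the standard library's `urllib.parse.urljoin` already implements RFC 3986 reference resolution for the whole URL, not just the path, and covers most of the cases above. The sketch uses a hypothetical `example.com` host because the host in the original example is not recoverable. The bare `?` and `#` cases are left out because `urljoin` may not produce the same output as the helper above for them, so check those on your Python version if the exact behaviour matters.

```python
from urllib.parse import urljoin

# Hypothetical base URL (example.com stands in for the original host).
base = "http://user:pass@example.com:8080/path/to/some/doc.html?q=query#frag"

up = urljoin(base, "../home")         # parent-directory reference
sibling = urljoin(base, "doc2.html")  # relative file in the same directory
new_query = urljoin(base, "?q=git")   # same path, new query string
```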
