beatsbears / tarsafe

A safe subclass of the TarFile class for interacting with tar files. It can be used as a drop-in replacement for safe use of extractall().

License: MIT License

Python 100.00%

tarsafe's Issues

`src` in uploaded wheel

👋 I was doing some random PyPI scraping and noticed that tarsafe has src/ inside the built distribution (wheel). I don't think that was intentional, and it can cause a few minor issues with import resolution.
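
One way to check a wheel for this is to list its top-level entries, since a wheel is a zip archive. The sketch below uses only the standard-library zipfile module. The helper name `wheel_top_level` is hypothetical, and the wheel path is whatever file was downloaded from PyPI.

```python
import zipfile


def wheel_top_level(wheel_path: str) -> set[str]:
    """Return the first path component of every entry in a wheel.

    Hypothetical helper for illustration: a wheel is a zip archive, so a
    stray "src" entry here means the src/ directory itself was packaged.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return {name.split("/", 1)[0] for name in wheel.namelist()}
```

If the returned set contains "src" next to the .dist-info directory, the package was probably built without mapping the src/ layout to the package root.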

tarsafe is very slow to extract some .tar.gz archives, unrelated to size

Hello!

We're using tarsafe as part of https://github.com/datadog/guarddog/ and I noticed that for some archives (unrelated to size), it takes a lot of time to extract, much more than the stdlib tarfile.

Sample file: https://files.pythonhosted.org/packages/2a/e3/624e95d2bc75f78ab7ce45e868b3609dea9da210a9f54e0e4e2c8cf95aa3/datadog-api-client-2.10.0.tar.gz (MD5 6f20eb7f5239a051230bb0a211d11f0b, only around 3k files and 1.5M)

Reproduction:

$ time python3 -c 'import tarfile; tarfile.open("datadog-api-client.tar.gz").extractall("/tmp/tarfile")'
python3 -c   0.55s user 0.67s system 96% cpu 1.276 total

$ time python3 -c 'import tarsafe; tarsafe.open("datadog-api-client.tar.gz").extractall("/tmp/tarsafe")'
python3 -c   66.29s user 1.64s system 98% cpu 1:08.97 total

Here's a profile generated using python3 -m cProfile -s tottime repro.py on Python 3.10.9.

         489374595 function calls (489363690 primitive calls) in 201.207 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12956401   54.448    0.000   77.369    0.000 posixpath.py:337(normpath)
 12960105   19.456    0.000   31.087    0.000 posixpath.py:71(join)
     3600   18.788    0.005  198.102    0.055 tarsafe.py:47(_safetar_check)
 12956400   15.129    0.000  153.743    0.000 tarsafe.py:64(_is_traversal_attempt)
 64785773   12.333    0.000   12.333    0.000 {method 'startswith' of 'str' objects}
 12956401   10.366    0.000  104.856    0.000 posixpath.py:376(abspath)
 12956401    8.232    0.000   16.106    0.000 posixpath.py:60(isabs)
 99561884    7.588    0.000    7.588    0.000 {method 'append' of 'list' objects}
 12956400    6.601    0.000    9.943    0.000 tarsafe.py:83(_is_device)
 25920105    6.439    0.000    9.825    0.000 posixpath.py:41(_get_sep)
 12956405    5.042    0.000    5.042    0.000 {method 'split' of 'str' objects}
 38887752    4.955    0.000    4.955    0.000 {built-in method builtins.isinstance}
 12956400    4.522    0.000    6.874    0.000 tarsafe.py:69(_is_unsafe_symlink)
 51832955    4.080    0.000    4.080    0.000 {built-in method posix.fspath}
 12956400    4.035    0.000    5.829    0.000 tarsafe.py:76(_is_unsafe_link)
 12956912    3.352    0.000    3.352    0.000 {method 'join' of 'str' objects}
 12963492    2.355    0.000    2.355    0.000 tarfile.py:1417(issym)
 12960129    2.340    0.000    2.340    0.000 {method 'endswith' of 'str' objects}
 12963600    2.304    0.000    2.927    0.000 tarfile.py:2453(__iter__)
 12963598    1.796    0.000    1.796    0.000 tarfile.py:1421(islnk)
 12956400    1.725    0.000    1.725    0.000 tarfile.py:1425(ischr)
 12956400    1.617    0.000    1.617    0.000 tarfile.py:1429(isblk)
     3494    1.466    0.000    1.466    0.000 {built-in method io.open}

Thanks!
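
A note on the call counts in the profile: _safetar_check runs 3600 times and _is_traversal_attempt runs 12956400 times, which is exactly 3600 × 3599. That suggests the whole member list is re-checked once per extracted member, so the cost grows quadratically with the number of entries rather than with archive size. If that reading is right, checking each member once would avoid the blow-up. Below is a minimal sketch of such a single-pass check. The helper `find_traversal_members` is hypothetical and not part of tarsafe. It covers only path traversal, not the symlink, hard link, or device checks, and it assumes POSIX-style paths.

```python
import os
import tarfile


def find_traversal_members(tar: tarfile.TarFile, dest: str = ".") -> list[str]:
    """Return names of members that would resolve outside dest.

    Hypothetical helper, not tarsafe's implementation: the destination is
    normalized once and each member is visited once, so the work is
    linear in the number of members.
    """
    base = os.path.abspath(dest)
    unsafe = []
    for member in tar.getmembers():
        target = os.path.abspath(os.path.join(base, member.name))
        # commonpath equals base only when target is base or lies under it.
        if os.path.commonpath([base, target]) != base:
            unsafe.append(member.name)
    return unsafe
```

For an archive with about 3600 members this does about 3600 path normalizations, compared with the roughly 13 million normpath calls in the profile above.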
