Comments (10)

eliben commented on May 26, 2024

Thanks for reporting.

I'll be happy to try to help, but I'll really need a better bug report here :) The usage of pycparser has to be isolated from this stack, and then I can try and see if memory usage is reasonable and can be reduced.

This can end up being a simple matter of a very large job given to pycparser which results in a large AST that consumes a lot of memory, but more data is needed to determine where the fault is.

from pycparser.

ianw commented on May 26, 2024

I instrumented pycparser to dump out whatever it was parsing. It is being handed [1] by cryptography->ffi, and parsing that input pretty much accounts for the increased memory usage:

ubuntu@ubuntu:/tmp$ cat test.py
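import time

from pycparser import parse_file

print("starting")
time.sleep(5)  # pause so the baseline RSS can be sampled with ps

parse_file('/tmp/file.c')  # the returned AST is not kept
print("parsed")

time.sleep(10)  # pause again so RSS can be sampled after parsing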
ubuntu@ubuntu:/tmp$ python ./test.py 
starting
^Z
[1]+  Stopped                 python ./test.py

ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd  | grep [p]ython
13922  6392  1.2 python ./test.py

ubuntu@ubuntu:/tmp$ fg
python ./test.py
parsed
^Z
[1]+  Stopped                 python ./test.py

ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd  | grep [p]ython
13922 20992  4.1 python ./test.py

Whether this is reasonable given the input, I'm not sure.

[1] http://paste.openstack.org/show/197640/
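The same kind of before/after comparison can also be made from inside the process rather than with ps. In the session above RSS went from 6392 KB to 20992 KB, a growth of about 14600 KB. What follows is a minimal sketch, assuming a Unix-like system; `peak_rss_kb`, `peak_rss_growth` and `build_big_list` are hypothetical names introduced here, and `build_big_list` only stands in for the `parse_file` call.

```python
import resource
import sys


def peak_rss_kb():
    """Return this process's peak resident set size in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak // 1024 if sys.platform == "darwin" else peak


def peak_rss_growth(func, *args, **kwargs):
    """Call func and return (result, growth of peak RSS in KB)."""
    before = peak_rss_kb()
    result = func(*args, **kwargs)
    return result, peak_rss_kb() - before


def build_big_list(n):
    # Hypothetical workload standing in for parse_file('/tmp/file.c').
    return [str(i) for i in range(n)]


if __name__ == "__main__":
    data, grew_kb = peak_rss_growth(build_big_list, 500000)
    print("peak RSS grew by about %d KB" % grew_kb)
```

Note that this reports peak RSS, which never decreases, whereas ps shows the current RSS at the moment it is sampled, so the two numbers are not directly interchangeable.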

ianw commented on May 26, 2024

For reference, here is how it seems to get there: the bindings defined in _modules at [1] are collected together in [2], where the TYPES, FUNCTIONS and MACROS of each binding are joined and eventually passed to ffi.cdef, which hands them to pycparser.

    for name in modules:
        module_name = module_prefix + name
         ...
        types.append(module.TYPES)
        macros.append(module.MACROS)
        functions.append(module.FUNCTIONS)

ffi = build_ffi(cdef_source="\n".join(types + functions + macros) ...)

...

def build_ffi(cdef_source, verify_source, libraries=[], extra_compile_args=[],
              extra_link_args=[]):
    ffi = FFI()
    ffi.cdef(cdef_source)

[1] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/openssl/binding.py#L65
[2] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/utils.py#L44

rbtcollins commented on May 26, 2024

So the question in this context, AIUI, is whether this memory use is excessive to parse the bindings for OpenSSL.

If the memory use is excessive, fine. But what I'm curious about is whether the AST is freed once the bindings are compiled and symbols resolved, or if the AST is being kept around indefinitely.
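One generic way to check whether an object is actually released is to hold only a weak reference to it and see whether that reference goes dead. The sketch below is an illustration under that assumption, not a statement about what cffi or pycparser do with the AST; `Node` and `is_freed` are hypothetical names.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for an AST node."""
    def __init__(self, children=()):
        self.children = list(children)


def is_freed(ref):
    """Return True if the object behind a weakref has been collected."""
    gc.collect()  # also clears objects kept alive only by reference cycles
    return ref() is None


if __name__ == "__main__":
    tree = Node([Node(), Node()])
    probe = weakref.ref(tree)
    print(is_freed(probe))  # checked while tree is still referenced
    del tree
    print(is_freed(probe))  # checked after the last strong reference is gone
```

If the probe stays alive after the caller has dropped its own reference, something else (a cache, a module-level variable) is still holding the tree.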

eliben commented on May 26, 2024

@ianw so it's 14 MB of memory used for that 2K-line header file? I wouldn't say this is completely outlandish - this is Python, and memory usage is quite high. It's possible that this can be reduced - I've never really attempted to micro-optimize pycparser's memory usage. But I don't see a disaster here either (no non-linear increase, no crazy usage).

There's much that can be done, but I don't have much free time to spend on memory optimizations right now. Patches are welcome though :)

vstinner commented on May 26, 2024

You may play with https://pytracemalloc.readthedocs.org/ to see exactly which lines allocate memory. The module has been part of the standard library, as tracemalloc, since Python 3.4.
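A minimal sketch of that workflow with the standard-library `tracemalloc` module follows. The helper `top_allocations` and the `workload` function are hypothetical; `workload` only stands in for whatever parse call is being investigated.

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run func under tracemalloc and return (result, top stats by line)."""
    tracemalloc.start()
    try:
        result = func()
        # Snapshot while result is still alive so its allocations are counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Statistics grouped by source line, largest total size first.
    return result, snapshot.statistics("lineno")[:limit]


def workload():
    # Hypothetical stand-in for a pycparser parse call.
    return [{"index": i} for i in range(10000)]


if __name__ == "__main__":
    _, stats = top_allocations(workload)
    for stat in stats:
        print(stat)
```

Each printed entry names a file and line together with the total size and number of blocks allocated there, which is usually enough to tell which allocations dominate.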

eliben commented on May 26, 2024

Thanks for the tip @Haypo -- I used to use heapy for this, but tracemalloc being built in is way more convenient.

eliben commented on May 26, 2024

@ianw check out the newly committed version. I use __slots__ in AST nodes to make them as small as possible, and this helps somewhat.

However, this may not affect peak memory usage too much (down from 21MB to 17.5MB on my machine), because the parser itself and the lexer use a bunch of memory while doing their thing - it's inside PLY, so I don't control it. But once the parsing is done, the memory consumed by the AST nodes should be much lower.

According to heapy, the memory consumption of the result is 33% lower after this change (not peak memory).

I'll keep this issue open for now - can you run some tests and verify that memory consumption is lower now?
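To illustrate why `__slots__` shrinks each node, here is a small sketch comparing two otherwise identical classes. `DictNode`, `SlotNode` and `shallow_size` are hypothetical names for illustration and are not pycparser's actual node classes; the exact byte counts depend on the Python version, so none are quoted here.

```python
import sys


class DictNode(object):
    """Ordinary class: every instance carries its own __dict__."""
    def __init__(self, name, coord):
        self.name = name
        self.coord = coord


class SlotNode(object):
    """Same attributes, stored in fixed slots with no per-instance __dict__."""
    __slots__ = ("name", "coord")

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord


def shallow_size(obj):
    """Size of the instance plus its __dict__, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


if __name__ == "__main__":
    print("with __dict__: ", shallow_size(DictNode("x", None)))
    print("with __slots__:", shallow_size(SlotNode("x", None)))
```

The saving is per instance, so it adds up across the many nodes of a large AST, but it does nothing for memory used transiently by the lexer and parser.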

eliben commented on May 26, 2024

After some more poking with heapy, I've reduced the size of the result AST by ~15% and the max RSS for the sample file goes down by another ~1MB.

eliben commented on May 26, 2024

Closing this - the recent changes should have reduced pycparser's memory usage considerably.
