pycparser increases memory usage of pyOpenSSL by an order of magnitude

eliben commented on May 26, 2024

pycparser increases memory usage of pyOpenSSL by an order of magnitude

from pycparser.

Comments (10)

eliben commented on May 26, 2024

Thanks for reporting.

I'll be happy to try to help, but I'll really need a better bug report here :) The usage of pycparser has to be isolated from this stack, and then I can try and see if memory usage is reasonable and can be reduced.

This can end up being a simple matter of a very large job given to pycparser which results in a large AST that consumes a lot of memory, but more data is needed to determine where the fault is.

ianw commented on May 26, 2024

I instrumented pycparser to dump out whatever it was parsing and it seems to account for the increased memory usage.

pycparser is handed [1] by cryptography (through its ffi), and parsing that input alone pretty much accounts for the increased memory usage:

ubuntu@ubuntu:/tmp$ cat test.py
import time

from pycparser import c_parser, parse_file

print "starting"
time.sleep(5)

parse_file('/tmp/file.c')
print "parsed"

time.sleep(10)

ubuntu@ubuntu:/tmp$ python ./test.py 
starting
^Z
[1]+  Stopped                 python ./test.py

ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd  | grep [p]ython
13922  6392  1.2 python ./test.py

ubuntu@ubuntu:/tmp$ fg
python ./test.py
parsed
^Z
[1]+  Stopped                 python ./test.py

ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd  | grep [p]ython
13922 20992  4.1 python ./test.py

Whether this is reasonable given the input, I'm not sure.

[1] http://paste.openstack.org/show/197640/
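The before/after comparison above can also be taken from inside the process, without suspending it and running ps. This is only a minimal sketch using the standard `resource` module; the helper names `peak_rss` and `measure_peak_growth` are made up for illustration, and the workload is a stand-in for the `parse_file('/tmp/file.c')` call. Note that `ru_maxrss` reports the peak RSS rather than the current one, and its unit depends on the platform.

```python
import resource


def peak_rss():
    """Peak resident set size of this process so far.

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_peak_growth(work):
    """Run work() and return (result, growth of peak RSS while it ran)."""
    before = peak_rss()
    result = work()  # the result is returned so the caller can keep it alive
    return result, peak_rss() - before


if __name__ == "__main__":
    # Stand-in workload (hypothetical); to measure pycparser, pass something
    # like lambda: parse_file('/tmp/file.c') instead.
    _, growth = measure_peak_growth(lambda: [str(i) for i in range(200000)])
    print("peak RSS grew by", growth)
```

Because the peak never decreases, this shows how much extra memory the parse needed at its worst, not how much remains in use afterwards.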

ianw commented on May 26, 2024

For reference, this is how the input seems to reach pycparser: the bindings defined in _modules at [1] are collected together in [2], where the TYPES, FUNCTIONS and MACROS for each binding are eventually passed to ffi.cdef, which hands them on to pycparser.

    for name in modules:
        module_name = module_prefix + name
        ...
        types.append(module.TYPES)
        macros.append(module.MACROS)
        functions.append(module.FUNCTIONS)

    ffi = build_ffi(cdef_source="\n".join(types + functions + macros) ...)

    ...

    def build_ffi(cdef_source, verify_source, libraries=[], extra_compile_args=[],
                  extra_link_args=[]):
        ffi = FFI()
        ffi.cdef(cdef_source)

[1] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/openssl/binding.py#L65
[2] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/utils.py#L44

rbtcollins commented on May 26, 2024

So the question in this context, AIUI, is whether this memory use is excessive to parse the bindings for OpenSSL.

If the memory use is not excessive, fine. But what I'm curious about is whether the AST is freed once the bindings are compiled and symbols resolved, or whether it is being kept around indefinitely.

eliben commented on May 26, 2024

@ianw so it's 14 MB of memory used for that 2K-line header file? I wouldn't say this is completely outlandish - this is Python, and memory usage is quite high. It's possible that this can be reduced - I've never really attempted to micro-optimize pycparser's memory usage. But I don't see a disaster here either (no non-linear increase, no crazy usage).

There's much that can be done, but I don't have much free time to spend on memory optimizations right now. Patches are welcome though :)

vstinner commented on May 26, 2024

You may play with https://pytracemalloc.readthedocs.org/ to see exactly which lines allocate memory. The module is now part of Python 3.4.
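As a rough illustration of that workflow, here is a minimal sketch using the `tracemalloc` module from the standard library (Python 3.4+). The helper name `top_allocations` is hypothetical and the workload is a stand-in for the actual parse; it is not taken from pycparser.

```python
import tracemalloc


def top_allocations(work, limit=5):
    """Run work() under tracemalloc; return (result, top `limit` stats by line)."""
    tracemalloc.start()
    try:
        result = work()  # keep the result alive while the snapshot is taken
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics("lineno") groups allocations by source line, largest first.
    return result, snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    # Stand-in workload (hypothetical); to profile pycparser, pass something
    # like lambda: parse_file('/tmp/file.c') instead.
    _, stats = top_allocations(lambda: [str(i) for i in range(100000)])
    for stat in stats:
        print(stat)
```

Each printed entry names a file and line along with the total size and number of blocks still allocated from it, which is enough to see whether the memory sits in the AST nodes or elsewhere.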

eliben commented on May 26, 2024

Thanks for the tip @Haypo -- I used to use heapy for this before, but tracemalloc being built-in is way more convenient.

eliben commented on May 26, 2024

@ianw check out the newly committed version. I use __slots__ in AST nodes to make them as small as possible, and this helps somewhat.

However, this may not affect peak memory usage too much (down from 21MB to 17.5MB on my machine), because the parser itself and the lexer use a bunch of memory while doing their thing - it's inside PLY, so I don't control it. But once the parsing is done, the memory consumed by the AST nodes should be much lower.

According to heapy, the memory consumption of the result is 33% lower after this change (not peak memory).

I'll keep this issue open for now - can you run some tests and verify that memory consumption is lower now?
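To illustrate why `__slots__` helps, here is a small sketch that compares two toy node classes. These are not pycparser's actual AST classes; `PlainNode`, `SlottedNode` and `bytes_for` are hypothetical names, and the exact savings will vary with the Python version and the number of fields.

```python
import tracemalloc


class PlainNode:
    """Node with a per-instance __dict__ (the default)."""

    def __init__(self, name, child):
        self.name = name
        self.child = child


class SlottedNode:
    """Same fields, stored in fixed slots with no per-instance __dict__."""

    __slots__ = ("name", "child")

    def __init__(self, name, child):
        self.name = name
        self.child = child


def bytes_for(node_cls, count=50000):
    """Return the bytes traced while `count` instances of node_cls are alive."""
    tracemalloc.start()
    try:
        # `nodes` keeps every instance alive until the measurement is taken.
        nodes = [node_cls("id", None) for _ in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


if __name__ == "__main__":
    print("plain:  ", bytes_for(PlainNode))
    print("slotted:", bytes_for(SlottedNode))
```

The slotted class should come out smaller because each instance no longer carries its own attribute dictionary; the trade-off is that attributes outside the declared slots can no longer be added to an instance.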

eliben commented on May 26, 2024

After some more poking with heapy, I've reduced the size of the result AST by ~15% and the max RSS for the sample file goes down by another ~1MB.

eliben commented on May 26, 2024

Closing this - the recent changes should have reduced pycparser's memory usage considerably.
