Comments (10)
Thanks for reporting.
I'll be happy to try to help, but I'll really need a better bug report here :) The usage of pycparser has to be isolated from this stack, and then I can try and see if memory usage is reasonable and can be reduced.
This can end up being a simple matter of a very large job given to pycparser which results in a large AST that consumes a lot of memory, but more data is needed to determine where the fault is.
from pycparser.
I instrumented pycparser to dump out whatever it was parsing. It is getting [1] from cryptography->ffi, and parsing that input pretty much accounts for the increased memory usage:
ubuntu@ubuntu:/tmp$ cat test.py
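import time
from pycparser import parse_file
print("starting")
# Pause so the baseline RSS can be read with ps before parsing.
time.sleep(5)
parse_file('/tmp/file.c')
print("parsed")
# Pause again so the RSS after parsing can be read.
time.sleep(10)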
ubuntu@ubuntu:/tmp$ python ./test.py
starting
^Z
[1]+ Stopped python ./test.py
ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd | grep [p]ython
13922 6392 1.2 python ./test.py
ubuntu@ubuntu:/tmp$ fg
python ./test.py
parsed
^Z
[1]+ Stopped python ./test.py
ubuntu@ubuntu:/tmp$ ps -eo pid,rss,pmem,cmd | grep [p]ython
13922 20992 4.1 python ./test.py
RSS goes from 6392 KB before the parse to 20992 KB after it. Whether that is reasonable given the input, I'm not sure.
[1] http://paste.openstack.org/show/197640/
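The same before/after comparison can also be taken from inside the process instead of with ps. This is only a minimal sketch: it reads peak RSS through the standard resource module (Unix only), and build_big_list is a hypothetical stand-in for the parse_file('/tmp/file.c') call, so the numbers it prints say nothing about pycparser itself.

```python
import resource


def peak_rss():
    # ru_maxrss is the peak resident set size of this process.
    # The unit is kilobytes on Linux and bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(work):
    """Run work() and return (result, growth in peak RSS)."""
    before = peak_rss()
    result = work()
    after = peak_rss()
    return result, after - before


def build_big_list():
    # Hypothetical stand-in for parse_file('/tmp/file.c').
    return [str(i) for i in range(200000)]


if __name__ == "__main__":
    data, growth = rss_growth(build_big_list)
    print("peak RSS grew by", growth)
```

Because this reports the peak rather than the current RSS, it cannot show memory being handed back after the work finishes; the ps approach above is still the one to use for that.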
For reference, this is how it seems to get this: the bindings defined in _modules at [1] are collected together in [2], where the TYPES, FUNCTIONS and MACROS for each binding are eventually sent down to ffi.cdef, which makes its way to pycparser:
for name in modules:
    module_name = module_prefix + name
    ...
    types.append(module.TYPES)
    macros.append(module.MACROS)
    functions.append(module.FUNCTIONS)

ffi = build_ffi(cdef_source="\n".join(types + functions + macros) ...)
...
def build_ffi(cdef_source, verify_source, libraries=[], extra_compile_args=[],
              extra_link_args=[]):
    ffi = FFI()
    ffi.cdef(cdef_source)
[1] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/openssl/binding.py#L65
[2] https://github.com/pyca/cryptography/blob/master/src/cryptography/hazmat/bindings/utils.py#L44
So the question in this context, AIUI, is whether this memory use is excessive to parse the bindings for OpenSSL.
If that much memory is needed for parsing, fine. But what I'm curious about is whether the AST is freed once the bindings are compiled and symbols resolved, or if the AST is being kept around indefinitely.
@ianw so it's 14 MB of memory used for that 2K-line header file? I wouldn't say this is completely outlandish - this is Python, and memory usage is quite high. It's possible that this can be reduced - I've never really attempted to micro-optimize pycparser's memory usage. But I don't see a disaster here either (no non-linear increase, no crazy usage).
There's much that can be done, but I don't have much free time to spend on memory optimizations right now. Patches are welcome though :)
You may play with https://pytracemalloc.readthedocs.org/ to see exactly which lines allocate memory. The module has been part of the standard library, as tracemalloc, since Python 3.4.
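A rough sketch of what that could look like with the standard-library tracemalloc module. The build_tree function is a hypothetical stand-in for a parse that creates many small objects; swapping in a real pycparser call is left as an assumption.

```python
import tracemalloc


def top_allocations(work, limit=5):
    """Run work() under tracemalloc; return (result, top stats by line)."""
    tracemalloc.start()
    try:
        result = work()
        # Take the snapshot while the result is still alive, so the
        # memory it holds is attributed to the lines that allocated it.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return result, snapshot.statistics("lineno")[:limit]


def build_tree():
    # Hypothetical stand-in for a parse: many small nested objects.
    return [{"name": "node%d" % i, "children": []} for i in range(50000)]


if __name__ == "__main__":
    tree, stats = top_allocations(build_tree)
    for stat in stats:
        print(stat)
```

Each entry names a file and line together with the total size and number of blocks allocated there, which is usually enough to tell whether the memory sits in the result or in the machinery that produced it.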
Thanks for the tip @Haypo -- I used to use heapy for this before, but tracemalloc being built-in is way more convenient.
@ianw check out the newly committed version. I use __slots__ in AST nodes to make them as small as possible, and this helps somewhat.
However, this may not affect peak memory usage too much (down from 21MB to 17.5MB on my machine), because the parser itself and the lexer use a bunch of memory while doing their thing - it's inside PLY, so I don't control it. But once the parsing is done, the memory consumed by the AST nodes should be much lower.
According to heapy, the memory consumption of the result is 33% lower after this change (not peak memory).
I'll keep this issue open for now - can you run some tests and verify that memory consumption is lower now?
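To illustrate why __slots__ shrinks each node, here is a minimal sketch. PlainNode and SlottedNode are hypothetical classes made up for this example, not pycparser's actual node classes, and the exact byte counts depend on the Python version.

```python
import sys


class PlainNode:
    # Ordinary class: every instance carries its own __dict__.
    def __init__(self, name, coord):
        self.name = name
        self.coord = coord


class SlottedNode:
    # __slots__ reserves fixed storage for these attributes and
    # suppresses the per-instance __dict__.
    __slots__ = ("name", "coord")

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord


def shallow_size(obj):
    # sys.getsizeof does not follow references, so add the instance
    # __dict__ by hand when there is one.
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


if __name__ == "__main__":
    print("plain:  ", shallow_size(PlainNode("x", None)))
    print("slotted:", shallow_size(SlottedNode("x", None)))
```

The trade-off is that a slotted instance rejects attributes that are not listed in __slots__, so code that attaches extra attributes to nodes would break.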
After some more poking with heapy, I've reduced the size of the result AST by ~15% and the max RSS for the sample file goes down by another ~1MB.
Closing this - the recent changes should have reduced pycparser's memory usage considerably.