GithubHelp home page GithubHelp logo

Comments (33)

nickrolfe avatar nickrolfe commented on August 22, 2024 1

Catastrophic errors will cause that source file to be missing from the database; regular errors will not. I've created an internal issue to investigate why we don't find stdatomic.h on Windows.

Regarding the excessive memory consumption, we've identified one of the major causes. In general, memory consumption should only be affected by the complexity of the one source file being extracted, but there is one feature that causes memory usage to scale with the overall size of the entire project. For the vast majority of projects, this is never a problem, but with Chromium and its tens of thousands of source files, it is. Of course we want to reliably build databases of Chromium and other massive C++ projects, so we're looking for a solution.

from codeql-cli-binaries.

criemen avatar criemen commented on August 22, 2024 1

Hi @creative16, I'm really sorry for the delays on this issue!
I just wanted to let you know that I'm now on it, and I was able to reproduce the stack overflow exceptions on Windows.
When building the chrome target, I got approximately 3500 (chromium HEAD, codeql HEAD).
I also observed some unrelated errors which will also lead to missing source files from the database.

I'll update the issue once I know more/have a fix.
Some of this might take some time though.

from codeql-cli-binaries.

criemen avatar criemen commented on August 22, 2024 1

While our vendor is still working on fixing the underlying issue, I discovered that we had a regression in our workaround.
After fixing the regression, we do not enounter any more stack overflow exceptions when extracting chromium!
There's still a couple of files we fail to extract, and I have logged issues internally for these, but overall it's looking very good now.

The fix is in the next release of the codeql cli.
Please reopen this issue, or file a new one if you encounter issues with chromium again.

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Hi @creative16 - I work on our C/C++ support, and I am absolutely interested in finding and fixing extractor crashes. I'll build a Chromium/Windows database to try and reproduce them and track them down.

Thanks!

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

A quick update: we've made a change to address this, so the next version (2.2.5) should fix the problem for you. I expect that release will happen in approximately 2 weeks' time.

Thanks again for letting us know.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Thanks for looking into it promptly! I'll look forward to the release.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

The chromium on linux build also misses many files, including the aforementioned src/services/network/p2p/socket_manager.cc. It might be a regression, as building with 2.1.0 actually results in a larger database with more files included.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

I noticed 2.25 came out and rebuilt chromium on Linux. Compared to my builds from 2.10-2.14 with 73k files, the 2.24 and 2.25 dbs return about 70k files. Running comm between them flagged up some very important files. I'll hence be moving back to 2.14 for now.

I have attached the filelists (from File f select f.getRelativePath())) here @nickrolfe. Please let me know if there is anything else in the log files (4 GB, sorry) to look out for in helping to troubleshoot
old_files.txt
new_files.txt

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

One thing that occurred to me was that my old, bigger databases were created without setting --threads. Codeql could be causing extra crashes in the extractor due to this; I'll rebuild on 2.14 and 2.25 with/without the --threads and see how it goes. Really hoping that this isn't the root cause though, especially if it leads to larger compile times overall.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

Have 4 builds for 2.14 and 2.25 with/without --threads, all of which still have around 70k files. The only two variables left that I can think of which vary as compared to my bigger 73k builds are different versions of github/codeql in my codeql-home, and varying power/clock speed configurations of my CPU (the bigger builds were done on CPUs with less cores and lower clock speeds) . Does the version of codeql affect the cli/extractor?

I did notice that identical (same codeql-cli, same github/codeql, same codebase) builds can result in slightly different database contents, but am at my wits end right now. It is looking like instability is the likely cause, but havn't quite pinned down exactly why.

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Hi @sad-dev - sorry to hear that the 2.2.5 release did not fix things for you, and thanks for letting us know. I will build a Chromium snapshot locally. Hopefully I can reproduce your problem and I will track down the cause of the missing files.

Usually it's because of either a segfault or catastrophic error in the extractor. In general, build-system parallelism shouldn't affect this, unless the crashes are related to out-of-memory errors. But the more likely cause is that the extractor is choking on some construct in a header file included by all the files that didn't get extracted. I'll let you know what I find.

Does the version of codeql affect the cli/extractor?

Yes, each new release of the CodeQL CLI includes the latest extractor changes.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

Hi @sad-dev - sorry to hear that the 2.2.5 release did not fix things for you, and thanks for letting us know. I will build a Chromium snapshot locally. Hopefully I can reproduce your problem and I will track down the cause of the missing files.

Usually it's because of either a segfault or catastrophic error in the extractor. In general, build-system parallelism shouldn't affect this, unless the crashes are related to out-of-memory errors. But the more likely cause is that the extractor is choking on some construct in a header file included by all the files that didn't get extracted. I'll let you know what I find.

Does the version of codeql affect the cli/extractor?

Yes, each new release of the CodeQL CLI includes the latest extractor changes.

Thank you for taking a look! With regards to the version of codeql, I was referring to the repos for the queries rather than the CLI (i.e. those found by codeql resolve languages). I deleted them and was still able to build with the cli, so that answers my question, I guess :)

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

@nickrolfe I made three builds of chromium with codeql-cli 2.25 on three machines with --threads=0 passed to codeql database create. The specs and results are below:

6C/12T, 32gb ram - 74.6k files
8C/16T, 32gb ram - 73k files
16C/32T, 32gb ram - 70k files

From the looks of it, more threads are leading to stability issues in the extractor, possibly due to a larger 'ripple effect'.

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Hi @sad-dev. The Chromium build I'm doing here (also using CodeQL 2.2.5) on an old Linux laptop still hasn't quite finished, but so far I see only 5 files (all in Skia) that failed to extract. And since you are seeing different behaviours on different machines, I'd like to see if we can find what's going wrong with the extractor from your build-tracer.log file.

Assuming that the failed extractions are caused by the extractor segfaulting or otherwise hitting a catastrophic error, do you think you could take a look at places where CodeQL C++ extractor: Backtrace occurs in the log and see if there's a common pattern in the backtraces you find? Note that the top few functions will always be in the exception-handling code, but the following entries will show what was executing when the exception happened.

If the log file is too big to open in a text editor, you could do something like:

grep -A 12 "CodeQL C++ extractor: Backtrace:" log/build-tracer.log

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

The greps (with and without the -A12) for "CodeQL C++ extractor" are attached for my 6,8, and 16 core builds, as well as the filelists.

extractor_6c.txt
extractor_6c_full.txt
extractor_8c.txt
extractor_8c_full.txt
extractor_16c.txt
extractor_16c_full.txt
f6c.txt
f8c.txt
f16c.txt

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Thanks for sending those. We (the CodeQL team) will need to fix those crashes, but they don't explain why you're missing so many files.

Can you confirm that you're building Chromium from scratch? CodeQL will only extract source files that it observes being compiled, so if the build system thinks foo.o is already up to date, it won't recompile foo.cc and then it won't get extracted. Also, are you using ninja to build? Ninja should be fine, but some other build systems compile code from a daemon process that CodeQL won't 'see'.

The next thing I would do is pick a file that's missing, say foobar.cc, and grep build-tracer.log for that name. If there are no results, then CodeQL did not observe that file being compiled. If there are some results, then they should give clues as to what happened when trying to extract that file.

The other possibility is that extraction was successful, but something went wrong during database finalization. Once my local build finishes, I can investigate whether anything goes wrong there for me, but it might be worth checking the database-create-*.log file for errors around the line Importing .*foobar.cc.*, if that exists.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

I edited the previous post to update with filelists and extractor crashes. Also can confirm that all three were build with autoninja -C out/Default chrome with a empty (rm -rf out) output folder

Picking v8/src/wasm/wasm-objects.cc at random from comm f6.txt f16.txt,

1.) The build-tracer.log files are slightly broken up (even on 6c, it appears that multiple threads write to it at the same time) but they all show :
a) the extractor being called to 'mimic' clang on v8/src/wasm/wasm-objects.cc to create obj/v8/v8_base_without_compiler/wasm-objects.o
b) Creating trap tarball /...v8/src/wasm/wasm-objects.cc....trap.tar.br
c) Emitting trap files for ../../v8/src/wasm/wasm-objects.cc
d) Archiving /home/user/codeql_chromium_db/src/home/user/chromium/src/v8/src/wasm/wasm-objects.cc

only my 6core and 8core logs contain lines of the form
e) Wrote XXX files to ...v8/src/wasm/wasm-objects.cc....trap.tar.br
f) 11 errors detected in the compilation of "../../v8/src/wasm/wasm-objects.cc"

2.) the database-create-*log does not mention v8/src/wasm/wasm-objects.cc on 6c and 8c, but the 16c log has the line:
[ERROR] Skipping import of TRAP files in /home/user/codeql_latest_db/trap/cpp/tarballs/home/user/chromium/src/v8/src/wasm/wasm-objects.cc.506d6f4e_0.trap.tar.br as the manifest could not be read.

grep "manifest could not be read" on the logs throws up 118, 321, and 769 errors on the 6c, 8c, and 16c builds respectively.Some quick sed commands show they they all correspond to missing files in the respective databases

Further observations:
3.) On the 16c build, a large number of files do not even reach step (b) - creating the trap tarball.
4.) the manifest reading error almost perfectly corresponds with the difference in number of files appearing in steps (b) and steps (e).
5.) The files reaching step (e) corresponds almost (> 99%) perfectly to the different .cc files appearing between builds
6.) .h files only appear in step d, if at all (and not in steps a to c), and I observed missing files reaching and not reaching this step.

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Thanks, that's really useful.

My guess is that, on the 16c machine, you have a file at /home/user/codeql_latest_db/trap/cpp/tarballs/home/user/chromium/src/v8/src/wasm/wasm-objects.cc.506d6f4e_0.trap.tar.br.manifest.tmp file that is partially or completely truncated, and was not renamed (to strip the .tmp extension).

That's consistent with the extractor terminating prematurely, but I don't yet know why that would happen, and without any kind of backtrace. One guess is that it's the OOM killer.

However, I now see that it's also happening on my machine, so I have a good chance of tracking this down. I'll let you know how I get on, but thanks again for your help and patience!

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

.h files only appear in step d, if at all (and not in steps a to c), and I observed missing files reaching and not reaching this step.

That is normal - the log entries in steps a to c derive from the name of the file being compiled. Step d corresponds to copying any observed source or header file to the database archive.

I also note the extractor running out of memory would be consistent with your observation that more threads means more failures. I will try to confirm if that's what's happening and, if so, what steps we can take to reduce the memory usage.

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

.h files only appear in step d, if at all (and not in steps a to c), and I observed missing files reaching and not reaching this step.

That is normal - the log entries in steps a to c derive from the name of the file being compiled. Step d corresponds to copying any observed source or header file to the database archive.

I also note the extractor running out of memory would be consistent with your observation that more threads means more failures. I will try to confirm if that's what's happening and, if so, what steps we can take to reduce the memory usage.

Instead of reducing the memory usage (or if it proves unfeasible to do so / fails to completely fix extractor crash problems), is it possible to rerun the extractor, either immediately upon failure or just before finalizing? One obvious thing to look out for is that build files might be temporary, thus necessitating an archival if the extractor is to be rerun at the end.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello @nickrolfe ,

just an update after trying version 2.2.5 to build Chromium on Windows. Generally, it definitely looks like an improvement over 2.2.4. The table below provides some summarised numbers from my build:

2.2.4 2.2.5
Total files in database 63356 73672
Total .c, .cc, .cpp, .cxx files 26248 31670
Total .h, .hxx, .hpp, .hh files 36787 41680

Grepping for "manifest could not be read" from database-create*.log only resulted in 76 results, as compared to 5555 for version 2.2.4. These files do correspond to files missing in the database.

There is, however, no significant difference when looking for "CodeQL C++ extractor: Backtrace" in build-tracer.log for both versions of CodeQL CLI (about 10 to 11 results each). The files indicated are also not present in database-create*.log, although I'm guessing perhaps this is to be expected. So there are still at least 76 + 11 = 87 missing source files.

Finally, as compared to version 2.2.4, the extractor for version 2.2.5 only crashed 7 times due to access violations as compared to hundreds (or thousands) of times for version 2.2.4.

To get a rough idea of how many files are missing, I'm wondering if you or @sad-dev have any idea how to get a full list of files that will be built by ninja?

from codeql-cli-binaries.

sad-dev avatar sad-dev commented on August 22, 2024

I monitored my peak memory usage on a smaller codebase that uses ninja: v8

-j = 34 (default with ninja), no codeql: 10GB
-j = 12, codeql --threads=1 (default): 18GB
-j = 12, codeql --threads=0 : 18GB
-j = 18, codeql --threads=0 : 28GB
As long as memory is not exhausted, I have consistent builds on v8. Remove the limiter and:

-j = 34, codeql --threads=0 : maxes out on my 32gb system, and throws about 150 manifest errors as compared to 0 otherwise. About 5% less files in the database.

We are thus looking at a roughly 5x multiplier on peak memory usage that codeql imposes when compiling v8, and threads used for importing doesn't seem to affect that. Given chromium's higher complexity and much larger (about 30x) size, I wouldn't be surprised if larger multipliers of 10x or so are observed on the memory requirements.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

After comparing the list of files in the database, and a list of possible files I believe should have been compiled, I think the database is still missing about 1000 source files (i.e. .cc, .cpp, .cxx, .c) in the Windows build.

I did a grep for some of the missing files in build-tracer.log, and managed to find results for them that appears to show they were intercepted by the extractor. However, there seemed to be errors indicated. I include three of the most common ones I found below. Could these errors be the reason why the files were not included in the database? I checked the compiled binary using IDA Pro and verified that the files were compiled into the binary during the build.

"../../third_party/ffmpeg/libavutil/cpu.c", line 21: catastrophic error: cannot open source file "stdatomic.h"
  #include <stdatomic.h>
"../../content/browser/gpu/gpu_process_host.cc": using precompiled header file "obj/content/browser/browser_cc.pch.semmle"
"../../base/trace_event/trace_category.h", line 34: error: expression must have a constant value
          offsetof(TraceCategory, state_) == 0,
          ^
"../../base/trace_event/trace_category.h", line 34: note: conversion from "unsigned char" to "char" is invalid in constant-expression evaluation
          offsetof(TraceCategory, state_) == 0,
          ^
"../../buildtools/third_party/libc++/trunk/include/exception", line 216: error: expected a ")"
  _LIBCPP_FUNC_VIS exception_ptr __copy_exception_ptr(void *__except, const void* __ptr);
                                                            ^

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello,

I've tried a Chromium build on Windows using version 2.2.6. The database had about 1000 more source files than the one I created using 2.2.5, which is great. Thanks for the improvements!

The catastrophic error regarding the missing stdatomic.h still appears in the log file. Another error I noticed that popped up 173 times in the log file was the following:

ERROR: Uncaught Windows exception (EXCEPTION_STACK_OVERFLOW). Exception code: 0xc00000fd

I checked a few of the source files where this error popped up and indeed they were not included in the database. Would this be a cause for the missing files as well?

from codeql-cli-binaries.

nickrolfe avatar nickrolfe commented on August 22, 2024

Hi @creative16, thanks for letting us know.

Yes, the stack overflow exception would also cause the files to be missing from the database. That was the same underlying issue for the problem reported at the start of this thread, but since then we increased the extractor's stack size while we investigate a longer-term fix for the excessive stack usage. If you're seeing that in 2.2.6, then even the increased stack size is not sufficient.

I haven't observed this problem on my own Windows machine while building V8 databases, but I'll do another build of Chromium to try and reproduce the exception.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello @nickrolfe,

noticed that building Chromium on Windows using version 2.4.2 produced 11134 instances of the EXCEPTION_STACK_OVERFLOW error in the log file, as opposed to 178 instances using version 2.3.2.

You previously mentioned that this had something to do with the extractor's stack size. Just wondering if there was a regression of some sort between the two versions, as the number of such errors are now even more than prior to the fix in 2.2.5.

I haven't had the chance to test this out on version 2.4.3 yet, but looking at the release notes for 2.4.3, I suspect there shouldn't be a difference.

Thanks!

from codeql-cli-binaries.

jbj avatar jbj commented on August 22, 2024

Thanks for following up, @creative16. If stack usage or allocation on Windows has indeed regressed as you observe, we're eager to fix the problem. Did you build the same version of Chromium between CodeQL 2.3.2 and 2.4.2? I'm wondering if a newer and more complex version of Chromium could have pushed us over some limit.

@nickrolfe has switched teams and no longer works on C/C++, but I'll make sure someone investigates this problem.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello @jbj,

thanks for replying. Nope, I did not build the same version of Chromium. However, I could try using 2.4.2 (or 2.4.3) to build the version of Chromium I used previously for CodeQL 2.3.2. I'll get back to you after I've got the chance to try that out.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello @jbj, I've finished building the version of Chromium I used previously for CodeQL 2.3.2 with 2.4.3. There were 10414 instances of the stack overflow exception in the log files. So it does appear that there might have been a regression of some sort between 2.3.2 and 2.4.3.

from codeql-cli-binaries.

jbj avatar jbj commented on August 22, 2024

Thanks for investigating! I've planned for the team to address this problem in our next sprint, meaning within three weeks.

from codeql-cli-binaries.

criemen avatar criemen commented on August 22, 2024

Hi @creative16, I established that our previous workaround of increasing the stack size further doesn't work anymore.

I did track the problem to the C++ compiler frontend that's supplied by our vendor.
I have contacted them, and we're working with them on a solution to this problem, but so far there's no ETA for a fix yet.

from codeql-cli-binaries.

creative16 avatar creative16 commented on August 22, 2024

Hello @criemen , thanks for the update, appreciate it. Hopefully it'll be fixed soon!

from codeql-cli-binaries.

criemen avatar criemen commented on August 22, 2024

The stars misaligned, and the upcoming CLI release 2.4.6 only contains half the fix for this issue. I hope that the release after that will contain all relevant patches for this problem.

from codeql-cli-binaries.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.