
Comments (17)

p0 avatar p0 commented on August 22, 2024 2

To process C++ code, we indeed need to know how to compile it. The easiest way of setting that up is codeql's default, but that indeed relies on observing compilations locally, and distributed build systems like goma or bazel will not work with that approach without disabling the "distributed" aspect.

One possible approach is to leverage compile_commands.json files, as generated by many build systems. The information in them is sufficient to drive the CodeQL tooling, but obtaining such a file is a non-standard, build-system-specific and sometimes project-specific process. One possibility would be to add support in CodeQL for creating a database based on such compiler settings files. [It's worth noting that this would not be equivalent to tracing a full local build, since CodeQL takes advantage of information from linker invocations too, and those are not represented in compilation command databases.]
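To make concrete what such a file contains, here is a minimal Python sketch that reads a compile_commands.json file and returns one compiler invocation per translation unit. It is only an illustration of the input format, not an existing CodeQL feature; the function name `load_compile_commands` is made up for this example. As noted above, linker invocations do not appear in this data.

```python
import json
import shlex
from pathlib import Path


def load_compile_commands(path):
    """Return a list of (directory, source_file, argv) tuples.

    Hypothetical helper for illustration; it only parses the file and does
    not talk to CodeQL in any way.
    """
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    result = []
    for entry in entries:
        # Each entry carries either an "arguments" list or a single
        # shell-style "command" string.
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            argv = shlex.split(entry["command"])
        result.append((entry["directory"], entry["file"], argv))
    return result
```

Each tuple describes how one source file is compiled, which is the per-file information a compilation database provides.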

from codeql-cli-binaries.

mvanotti avatar mvanotti commented on August 22, 2024 1

Maybe it would be good to have a way of determining what is needed for CodeQL to work properly, and then each build system could figure out how to export that data somehow. That way, having extractors for different build systems depends on the community.


haxmeadroom avatar haxmeadroom commented on August 22, 2024 1

The above approach did not work for me. I've also tried many combinations with --spawn_strategy=local, --nouse_action_cache, --batch, --action_env=LD_PRELOAD=...lib64trace.so, etc. I have fuzzgoat, which builds outside of bazel and gets 12 results (in the csv file output). If I build inside bazel, it runs the 142 evaluations but returns no rows in the csv file. I also always get a warning from bazel that LD_PRELOAD is being ignored. My impression is that LD_PRELOAD is required for this to work, right? Any ideas?


adityasharad avatar adityasharad commented on August 22, 2024 1

@pestophagous you can see a brief summary of the lines of code seen by CodeQL within your Actions logs here: https://github.com/pestophagous/heory/runs/2765497253?check_suite_focus=true#step:5:257 (Analysis summary for <language>).

We're in the process of rolling out some new features that give you additional diagnostic information about the codebase that was analysed, such as the number of files (or the list of files when running with higher verbosity). Will report back when you can try those out.


mvanotti avatar mvanotti commented on August 22, 2024 1

@pestophagous, I created issue #13 to track what you are asking for. There's a query that will give you the list of files that are in the database.


dmivankov avatar dmivankov commented on August 22, 2024 1

Another option for bazel with strict action_env is to add the following to your bazelrc:

# CodeQL build mode
# some vars are defined in https://github.com/github/codeql-action/blob/d7ad71d8034d228d5c8076dc7f058905e272a3fd/src/tracer-config.ts

# CodeQL needs to trace compiler via LD_PRELOAD + some other vars
build:codeql --action_env LD_PRELOAD --action_env ODASA_TRACER_CONFIGURATION --action_env SEMMLE_EXECP --action_env SEMMLE_JAVA_TOOL_OPTIONS --action_env SEMMLE_PRELOAD_libtrace --action_env SEMMLE_PRELOAD_libtrace32 --action_env SEMMLE_PRELOAD_libtrace64 --action_env SEMMLE_COPY_EXECUTABLES_ROOT

# CodeQL needs to compile everything locally and without cache
build:codeql --noremote_accept_cached --remote_upload_local_results=false --spawn_strategy=local

# Pass along CODEQL_* env vars
build:codeql --action_env CODEQL_EXEC_ARGS_OFFSET --action_env CODEQL_EXTRACTOR_JAVA_LOG_DIR --action_env CODEQL_EXTRACTOR_JAVA_RAM --action_env CODEQL_EXTRACTOR_JAVA_ROOT --action_env CODEQL_EXTRACTOR_JAVA_SOURCE_ARCHIVE_DIR --action_env CODEQL_EXTRACTOR_JAVA_THREADS --action_env CODEQL_EXTRACTOR_JAVA_TRAP_DIR --action_env CODEQL_EXTRACTOR_JAVA_WIP_DATABASE --action_env CODEQL_JAVA_HOME --action_env CODEQL_PARENT_ID --action_env CODEQL_PLATFORM --action_env CODEQL_PLATFORM_DLL_EXTENSION --action_env CODEQL_RAM --action_env CODEQL_SCRATCH_DIR --action_env CODEQL_THREADS --action_env CODEQL_DIST --action_env CODEQL_TRACER_LOG

and then use bazel build --config codeql as the build command.

Update: the above works for Java 11 code under Bazel 4 and for Java 17 under Bazel 5, but not for Java 11 under Bazel 5.


Manouchehri avatar Manouchehri commented on August 22, 2024

Using bear + #9 would be a decent solution in my opinion. =)


p0 avatar p0 commented on August 22, 2024

I would expect CodeQL's built-in support to be able to handle any situation where bear would also work. The problem with goma or bazel is that the compilations end up being done in a server process or on a different machine, and are invisible to whatever local monitoring of the build process you are trying to do.


haxmeadroom avatar haxmeadroom commented on August 22, 2024

Has CodeQL been tested to work with bazel by disabling the distributed aspect, or is this hypothetical? What changes were made to bazel to accomplish this?

To process C++ code, we indeed need to know how to compile it. The easiest way of setting that up is codeql's default, but that indeed relies on observing compilations locally, and distributed build systems like goma or bazel will not work with that approach without disabling the "distributed" aspect.


adityasharad avatar adityasharad commented on August 22, 2024

For Bazel, one approach that we have used successfully is the following. It can be passed as the build command to codeql database create, or used as a run shell step within a GitHub Actions workflow for CodeQL code scanning.

bazel shutdown; bazel build --spawn_strategy=local --nouse_action_cache //path/to/build/targets/...
  • shutdown stops all locally-running Bazel servers
  • --spawn_strategy=local disables the distributed aspect
  • --nouse_action_cache disables the action cache, increasing the likelihood that all your code is recompiled during the build

More involved integration into Bazel's dependency graph is possible, but is not likely to be needed for the majority of use cases. Please try this and let us know if it helps.
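As a rough sketch of the first option, the Python helper below assembles the argument list for codeql database create around the Bazel command above. The function name, the database directory, and the default language are assumptions made for this example. It passes the build string through unchanged; if the `;` turns out not to be interpreted the way you expect, putting the two Bazel commands in a small script and passing that script as the command is a safer choice.

```python
def codeql_database_create_argv(database_dir, bazel_targets, language="cpp"):
    """Build the argv for `codeql database create` around a local-only Bazel build.

    Hypothetical helper: it only constructs the argument list and runs nothing.
    """
    # Build command as quoted in the comment above: stop running Bazel
    # servers, then build locally with the action cache disabled.
    build_command = (
        "bazel shutdown; "
        "bazel build --spawn_strategy=local --nouse_action_cache " + bazel_targets
    )
    return [
        "codeql", "database", "create", database_dir,
        "--language=" + language,
        "--command=" + build_command,
    ]


if __name__ == "__main__":
    # "codeql-db" is a made-up directory name; the target pattern is the
    # placeholder used in the comment above.
    for arg in codeql_database_create_argv("codeql-db", "//path/to/build/targets/..."):
        print(arg)
```

The resulting list can be handed to `subprocess.run` once the target pattern has been replaced with real Bazel targets.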


adityasharad avatar adityasharad commented on August 22, 2024

@haxmeadroom could you share more about the project you're building (link if it's open source) and the build commands you're using with and without Bazel?


pestophagous avatar pestophagous commented on August 22, 2024

I discovered CodeQL this weekend (while digging around in GitHub repo settings looking for other things).

I enabled it to see what would happen, but beyond that I have put essentially zero extra time into reconfiguring my build or trying to get CodeQL to work better on my repo.

Relevant to this bug/enhancement ticket:

  • My build toolchain is qmake (for now), and it seems that any cc/cpp files built in my qmake build are not being scanned.

My build also uses a submodule pointing to a different project that uses CMake, and when my build first compiles that project (which is a dependency of my app), then those files built with CMake do appear to get scanned. I know this because there are 3 warnings from the submodule codebase.

I'm actually quite pleased to see the scan including the submodule code. (After all, any vulnerabilities in the submodule will become "my" vulns after I link to that library.)

Now I just need the scan to include my code, too!

Sometime (on the weekends, for my weekend-only side project), I am willing to tweak my build script to help the scanning work.


QUESTION:

  • Where in the Analysis results (in GitHub web UI) or in the CI/Action log can I see a list of all the cc/cpp files that are scanned?

There must be a list (?), so I don't need to keep injecting bad code into files to see if a warning appears.


Here is the PR where I investigated that my own code does not trigger CodeQL warnings: pestophagous/heory#46

I injected the same "Multiplication result converted to larger type" issue into my code to match the issue that I saw trigger a warning in the submodule. But the scan result says "No new or fixed alerts".

Screenshot from 2021-06-07 08-43-00


pestophagous avatar pestophagous commented on August 22, 2024

@adityasharad This is all I see when I follow the link to the "brief summary" that you mention:

Analysis produced the following metric data:

|                  Metric                   | Value  |
+-------------------------------------------+--------+
| Total lines of C/C++ code in the database | 775466 |
##[endgroup]
##[group]Analysis summary for cpp
Counted 605060 lines of code for cpp as a baseline.
Analysis produced the following metric data:

|                  Metric                   | Value  |
+-------------------------------------------+--------+
| Total lines of C/C++ code in the database | 775466 |

That provides a "no" answer to "can I see a list of all the cc/cpp files?"

Right? I'm not mad if the answer is "no". I just want to clearly understand if it is yes or no to make sure I didn't misunderstand or follow an incorrect link.

It's great to hear you are working on additional features! I look forward to it. I contributed to this ticket only in the spirit of "giving back" and providing more real-life test cases for the team. I'm not complaining! (How could I, this is all provided free of cost to me!)

Thanks for your reply and interest.


mvanotti avatar mvanotti commented on August 22, 2024

I have seen that there's a new "Indirect Tracing Mode" for building CodeQL databases. Is this the recommended way to build databases for other build environments (for example, GOMA or RBE)?

Would it be possible to just have something that parses compile_commands.json and emits the env variables that are needed for the CodeQL CLI?


mvanotti avatar mvanotti commented on August 22, 2024

Ah, my bad, the Indirect Build Tracing still tries to figure out what the extractors are. But it seems like it should detect gomacc, right?


adityasharad avatar adityasharad commented on August 22, 2024

👋 For goma (as I understand it) the main requirement is that you disable the distributed aspect of the build. If the build is constrained to the local machine, then either a direct command line passed to codeql database create or a sequence of build steps wrapped by CodeQL's indirect build tracing will work. Neither of those features is designed to force the build to run locally, so you must configure your build to do so.

compile_commands.json support is something we see the need for and are discussing at the moment, with the same caveats that p0 described earlier in this issue. Will keep you updated if this makes it onto our roadmap.


mvanotti avatar mvanotti commented on August 22, 2024

Hi @adityasharad !

I thought the CodeQL CLI only needed to look up the compiler invocations of the commands. My understanding is that when compiling with goma, you just use gomacc to build, instead of your regular compiler. That's why I thought it would be somewhat doable to trace.

AIUI, RBE (Remote Build Execution) uses a similar thing, but uses a different prefix (no gomacc).

So I am wondering what we would need for those to be recognized by the CodeQL CLI extractors.
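The idea of looking past a launcher such as gomacc can be sketched in Python as follows. This is purely illustrative and does not describe how the CodeQL tracer recognizes compilers; the `WRAPPERS` set and the `strip_wrapper` name are assumptions for this example, and a different prefix (as with RBE) would have to be added to the set.

```python
import os

# Hypothetical set of launcher names to look past; extend it for other
# prefixes used by your build.
WRAPPERS = {"gomacc"}


def strip_wrapper(argv, wrappers=WRAPPERS):
    """Return argv without a leading launcher, leaving the compiler invocation."""
    if argv and os.path.basename(argv[0]) in wrappers:
        return list(argv[1:])
    return list(argv)
```

For example, `strip_wrapper(["gomacc", "clang++", "-c", "a.cc"])` yields the plain `clang++` invocation, while a command that does not start with a known launcher is returned unchanged.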

