semgrep / semgrep-action
This project is deprecated. Use https://github.com/returntocorp/semgrep instead
Home Page: https://semgrep.dev/docs/semgrep-ci/
Looks like semgrep-agent is passing files to semgrep using --include, which causes semgrep to traverse the whole directory tree to see whether any files match the include patterns.
The fix will be to pass the files directly to semgrep as target arguments.
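A minimal sketch of that fix, assuming the agent builds semgrep's argv as a list. The base flags are copied from the semgrep invocations that show up in the agent's own failure logs; `build_semgrep_command` is a hypothetical name, not the agent's actual function.

```python
from pathlib import Path
from typing import List, Sequence

# Flags copied from semgrep invocations in semgrep-agent error logs
# (an assumption about what the agent passes, not its real code).
BASE_FLAGS = [
    "--skip-unknown-extensions",
    "--disable-nosem",
    "--json",
    "--no-rewrite-rule-ids",
]


def build_semgrep_command(config: str, targets: Sequence[Path]) -> List[str]:
    """Hypothetical helper: pass each changed file as a positional target.

    Because semgrep receives explicit paths, it never has to walk the
    whole tree looking for files that match --include patterns.
    """
    cmd = ["semgrep", *BASE_FLAGS, "--config", config]
    # Sort so the command line is deterministic across runs.
    cmd.extend(str(t) for t in sorted(targets))
    return cmd
```

The command then ends with `--config <file>` followed by the target files, with no `--include` anywhere.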
The --baseline-ref argument is only used in GitMeta, not in GithubMeta or GitlabMeta. This means that semgrep-action will claim that all files are new, even when --baseline-ref is given.
I expect --baseline-ref to override environment variables.
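The expected precedence, sketched under assumptions: an explicit --baseline-ref always wins, and CI variables are only a fallback. `resolve_baseline_ref` is hypothetical. GITHUB_BASE_REF and CI_MERGE_REQUEST_TARGET_BRANCH_NAME are the variables GitHub Actions and GitLab CI set for pull/merge request pipelines; whether GithubMeta/GitlabMeta read exactly these is not confirmed here.

```python
import os
from typing import Mapping, Optional


def resolve_baseline_ref(
    cli_ref: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Hypothetical precedence: an explicit --baseline-ref overrides CI env vars."""
    if cli_ref:
        return cli_ref
    # GitHub Actions sets this on pull_request events (empty otherwise).
    if env.get("GITHUB_BASE_REF"):
        return env["GITHUB_BASE_REF"]
    # GitLab CI sets this in merge request pipelines.
    if env.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME"):
        return env["CI_MERGE_REQUEST_TARGET_BRANCH_NAME"]
    return None  # no baseline: every file is treated as new
```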
Running in GitLab CI:
$ python -m semgrep_agent --config /tmp/semgrep.yml --baseline-ref reviewed
=== detecting environment
| versions - semgrep 0.32.0 on Python 3.7.9
| environment - running in environment gitlab-ci, triggering event is 'push'
| manage - not logged in
=== setting up agent configuration
| using semgrep rules from /tmp/semgrep.yml
| using default path ignore rules of common test and dependency directories
| found 911 files in the paths to be scanned
| skipping 43 files based on path ignore rules
=== looking for current issues in 868 files
| 20 current issues found
| No ignored issues found
| 20 current issues found
| No ignored issues found
=== not looking at pre-existing issues since all files with current issues are newly created
...
I tested out the .semgrep folder rule-passing option for the action and it works great. But if you directly copy the rules from semgrep-rules with the accompanying tests, semgrep-action will also scan the .semgrep folder, since the default .semgrepignore doesn't have an exception for it. Should this be the default behaviour?
Anyway, if possible, I vote that by default the .semgrep folder should be listed in .semgrepignore for semgrep-action. To help out, I even prepared a pull request: https://github.com/returntocorp/semgrep-action/pull/71
Thank you!
I am running semgrep_agent in a GitLab runner with the new --json option. Thanks a lot for the option. It's detecting an eval usage three times.
The sample file is modified from OWASP Juice Shop and contains an eval. This is detected. See the first comment after this for the file (I did not put it here to make the issue more readable).
We could list just the paths that actually have hits. Maybe hide this behind a verbose flag though, not sure how noisy it'd get.
We should omit fields that are both:
Specifically, the syntactic_context field falls in this category.
Original "bug" report below, but check out the discussion for more relevant details to this ticket
Sorry for the obscure title, I can't be much more descriptive because I'm not sure what's going on.
These were run within an hour of each other:
https://github.com/returntocorp/dry-runs/pull/7/checks
https://github.com/returntocorp/dry-runs/pull/8/checks
For some reason, it appears that when someone other than me tries to create a PR, semgrep errors out in CI with:
=== detecting environment
| versions - semgrep 0.30.0 on Python 3.7.9
| environment - running in github-actions, triggering event is 'pull_request'
| semgrep.dev - not logged in
=== setting up agent configuration
Error: OR] you didn't configure what rules semgrep should scan for.
(the first few lines from running on my PR:)
=== detecting environment
| versions - semgrep 0.30.0 on Python 3.7.9
| environment - running in github-actions, triggering event is 'pull_request'
| semgrep.dev - logged in as deployment #1
=== setting up agent configuration
| using semgrep rules configured on the web UI
The rules to scan for are just from the default policy, so they are not configured in semgrep.yml.
Maybe we should add the attrs package to https://github.com/returntocorp/semgrep-action/blob/develop/pyproject.toml#L11 since semgrep-agent uses it directly?
For improved user serviceability
The hidden SEMGREP_AGENT_DEBUG env variable is not exposed to users; we should make it a comfortable public flag instead.
@msorens recommended making it toggleable on the semgrep app's web UI.
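A minimal sketch of a public flag, assuming argparse (the agent may use a different CLI library) and assuming SEMGREP_AGENT_DEBUG accepts values like "1" or "true". The env variable only sets the default, so existing setups keep working.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Treat common truthy strings in an env variable as 'on' (assumed values)."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def parse_args(
    argv: Optional[Sequence[str]] = None, env: Mapping[str, str] = os.environ
) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="semgrep-agent")
    # Public --debug flag; SEMGREP_AGENT_DEBUG remains supported as the default.
    parser.add_argument(
        "--debug",
        action="store_true",
        default=env_flag(env, "SEMGREP_AGENT_DEBUG"),
        help="print verbose diagnostic output (also enabled by SEMGREP_AGENT_DEBUG=1)",
    )
    return parser.parse_args(argv)
```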
semgrep should get the --verbose flag too, and we should pipe its output through.
Seems like we'll need more changes than usual, as according to semgrep/semgrep#2054 (comment) the PRECOMPILED_LOCATION var we use no longer exists.
Sometime between Thursday and Saturday, semgrep-agent started exceeding the 20-minute timeout in my CI system (Buildkite). Prior to this, it used to take anywhere from 30 seconds to 2 minutes.
I am using the returntocorp/semgrep-agent:v1 docker image, and the v1 tag was updated with a new docker image yesterday, just prior to my first failing build:
https://hub.docker.com/layers/returntocorp/semgrep-agent/v1/images/sha256-93d7382e52[…]0d0aaf8c42f48ff9decc28950974cf038d7a9a201d405?context=explore
Here is what I know:
This same behavior occurs both in Buildkite and on the command-line when I run it locally. Also, my repository is open-source, so you can use my actual data to observe the problem.
The repo is here: https://github.com/chef/automate
And here are the bits from my Makefile:
SEMGREP_CONTAINER := returntocorp/semgrep-action:v1
SEMGREP_COMMON_PARAMS := -m semgrep_agent --publish-token ${SEMGREP_TOKEN} --publish-deployment ${SEMGREP_ID}
SEMGREP_REPO := --env SEMGREP_REPO_NAME=chef/automate
DOCKER_PARAMS := --volume $(realpath .):/automate --workdir /automate
semgrep: ## runs differential semgrep, checking only changes in the current PR, just as is done in CI
docker run -it --rm --init $(DOCKER_PARAMS) $(SEMGREP_REPO) $(SEMGREP_CONTAINER) python $(SEMGREP_COMMON_PARAMS) --baseline-ref master
Implementation details:
Priority: High
I have had to completely disable semgrep-agent until this can be resolved.
Greetings! Testing out the platform, and enjoying things so far. Got this error in my GitHub Actions pipeline, and followed your request to post it for analysis. Maybe related to #112?
Run returntocorp/semgrep-action@v1
with:
publishToken: ***
publishDeployment: 203
env:
GITHUB_TOKEN: ***
/usr/bin/docker run --name returntocorpsemgrepactionv1_351056 --label 179394 --workdir /github/workspace --rm -e GITHUB_TOKEN -e INPUT_PUBLISHTOKEN -e INPUT_PUBLISHDEPLOYMENT -e INPUT_CONFIG -e INPUT_GENERATESARIF -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/semgrep-test-repo/semgrep-test-repo":"/github/workspace" returntocorp/semgrep-action:v1
=== detecting environment
| versions - semgrep 0.32.0 on Python 3.7.9
| environment - running in environment github-actions, triggering event is 'pull_request'
| manage - logged in as deployment #203
=== setting up agent configuration
| policy - using Getting Started
| using semgrep rules configured on the web UI
| using default path ignore rules of common test and dependency directories
| looking at 4 changed paths
| found 4 files in the paths to be scanned
=== looking for current issues in 4 files
=== failed command's STDOUT:
{"results": [], "errors": [{"type": "SemgrepError", "code": 7, "message": "no valid configuration file found (0 configs were invalid)"}]}
=== failed command's STDERR:
A new version of Semgrep is available. Please see https://github.com/returntocorp/semgrep#upgrading for more information.
Error: ROR] `/root/.local/bin/semgrep --skip-unknown-extensions --disable-nosem --json --no-rewrite-rule-ids --config /tmp/tmp3tby2xfe.yml more_fail.py other_feature.py .github/workflows/semgrep.yml should_fail.py` failed with exit code 7
This is an internal error, please file an issue at https://github.com/returntocorp/semgrep-action/issues/new/choose
and include any log output from above.
The current semgrep-action can rely on a .semgrep.yml file with a collection of rules, which can be unwieldy to manage for a large number of rules. The "normal" semgrep supports a .semgrep folder that can contain a number of rules in the .semgrep/**/*.yml form, which makes it easy to maintain many semgrep rules in a single folder for a project.
So, if possible, I would suggest a feature where semgrep-action supports the same method for rule passing. This would make it easy to manage larger rulesets for CI (where a project can contain its own rule folder) in cases where an external registry can't be used because of regulatory requirements.
Thanks!
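A sketch of the requested lookup, assuming a .semgrep/ directory takes precedence over .semgrep.yml (that ordering is an assumption, as is the helper name `find_rule_files`). `Path.rglob("*.yml")` walks every subdirectory, matching the .semgrep/**/*.yml layout.

```python
from pathlib import Path
from typing import List


def find_rule_files(repo_root: Path) -> List[Path]:
    """Hypothetical lookup: .semgrep/**/*.yml first, then a single .semgrep.yml."""
    rule_dir = repo_root / ".semgrep"
    if rule_dir.is_dir():
        # Recursive glob over all nested folders inside .semgrep/.
        files = sorted(p for p in rule_dir.rglob("*.yml") if p.is_file())
        if files:
            return files
    single = repo_root / ".semgrep.yml"
    return [single] if single.is_file() else []
```

Each returned file could then be passed as its own `--config` to semgrep.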
Adding tests to your .semgrepignore will ignore only files named tests. To ignore module/tests/test.py, you need to add tests/ to .semgrepignore instead.
This is unexpected and not consistent with .gitignore, where writing just tests will already ignore module/tests/test.py.
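For comparison, a simplified sketch of the .gitignore-style behavior described above: a bare name matches a file or directory at any depth, and a trailing slash restricts the match to directories. This is not semgrep's implementation; anchored patterns, `!` negation and `**` are deliberately left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Simplified .gitignore-like matching for a file path (sketch only)."""
    parts = PurePosixPath(path).parts
    for raw in patterns:
        dir_only = raw.endswith("/")
        pattern = raw.rstrip("/")
        if not pattern or "/" in pattern:
            continue  # anchored/nested patterns are out of scope here
        # "tests/" may only match directory components, i.e. not the file name.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, pattern) for part in candidates):
            return True
    return False
```

With these rules, both `tests` and `tests/` ignore module/tests/test.py, and `.semgrep` ignores everything under the .semgrep folder.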
semgrep.dev will soon support different actions per rule.
Assume that Finding instances can have a "dev.semgrep.actions" key in their finding.metadata dict in the Results object that semgrep invocations return here: https://github.com/returntocorp/semgrep-action/blob/1e1c2f06307dcdda29c6f8889c2ffb5abb88eb35/src/semgrep_agent/main.py#L120
We should make the agent follow the actions recommended by semgrep-app.
Specifically, we should exit with
5 notify-only findings hidden in output
<exit code 0>
and
| [... actual blocking errors here ...]
| [... actual blocking errors here ...]
| [... actual blocking errors here ...]
1 notify-only finding hidden in output
<exit code 1>
depending on the value of metadata: dev.semgrep.actions: ["notify", "block"]
["block"]
as their actions when not otherwise specified.
Confusing things right now:
- Why :v1 instead of :stable?
- What is "agent" even supposed to mean anyway? Why not semgrep-ci?
@dlukeomalley am I missing anything else?
Currently, the agent can only accept the config file and prints out the results to stdout (absent other switches). There's no way to get a json output to process the results using the command line.
It would be nice to be able to configure the Semgrep core that is running inside the agent, mainly to get the JSON output. I think the easiest way to do so is by accepting Semgrep core switches (e.g., --json). This would make the transition between running Semgrep and the agent seamless. The agent is the suggested way of running Semgrep in a CI pipeline, so having access to the JSON output in the container would be great.
This seems to be the location where the context (which contains the config) is passed to Semgrep:
https://github.com/returntocorp/semgrep-action/blob/develop/src/semgrep_agent/main.py#L121
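One way to accept Semgrep switches, sketched with argparse's `parse_known_args`: the agent keeps the options it knows and forwards everything else to semgrep unchanged. The single `--config` option and the name `split_agent_args` are hypothetical stand-ins for the agent's real options.

```python
import argparse
from typing import List, Sequence, Tuple


def split_agent_args(argv: Sequence[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Hypothetical: keep the agent's own options, forward the rest to semgrep."""
    parser = argparse.ArgumentParser(prog="semgrep-agent", allow_abbrev=False)
    parser.add_argument("--config")
    # Unrecognized options such as --json are returned untouched, in order.
    known, passthrough = parser.parse_known_args(list(argv))
    return known, passthrough
```

The passthrough list would then be appended to the semgrep command line. The catch is that typos in agent options are silently forwarded instead of rejected.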
Assume you have this diff:
eval(foo)
# some code
+ eval(foo)
The addition of a new eval security issue should be warned about on a PR. Right now we use a set to figure out which findings are new, which leaves the agent unaware of the new issue. If we used a Counter, we could warn that there are now more instances of the same finding.
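A minimal sketch of the Counter approach. The finding key (rule ID, path, matched code) is hypothetical; line numbers are left out because they shift between commits. Counter subtraction keeps only positive counts, so the second identical `eval(foo)` survives while a set difference would drop it.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def new_findings(baseline: Iterable[Hashable], current: Iterable[Hashable]) -> List[Hashable]:
    """Findings present more often now than in the baseline, duplicates included."""
    extra = Counter(current) - Counter(baseline)
    return list(extra.elements())
```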
Besides the extra workflow step that we currently use, there is also https://docs.github.com/en/free-pro-team@latest/rest/reference/code-scanning#upload-a-sarif-file which would allow for direct upload, so users would only have to pass their GitHub token to get the Security tab working. And they might already have passed that to get Slack notifications working anyway.
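A sketch of what a direct upload could look like, assuming the documented code-scanning endpoint (POST /repos/{owner}/{repo}/code-scanning/sarifs), which takes `commit_sha`, `ref` and a gzip-compressed, Base64-encoded `sarif` string. The request is only built, not sent; the helper name is hypothetical.

```python
import base64
import gzip
import json
import urllib.request


def build_sarif_upload_request(
    owner: str, repo: str, token: str, commit_sha: str, ref: str, sarif_text: str
) -> urllib.request.Request:
    """Prepare (but do not send) a code-scanning SARIF upload request."""
    # The API expects the SARIF file gzip-compressed, then Base64-encoded.
    encoded = base64.b64encode(gzip.compress(sarif_text.encode("utf-8"))).decode("ascii")
    body = json.dumps({"commit_sha": commit_sha, "ref": ref, "sarif": encoded}).encode("utf-8")
    url = f"https://api.github.com/repos/{owner}/{repo}/code-scanning/sarifs"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be `urllib.request.urlopen(req)`. The token needs permission to write security events for the repository.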
E.g. invoke semgrep from the action with --no-rewrite-rule-ids
- [ ] Results in CI should be named with something short, which probably is just the rule ID (without the registry prefix)
- [ ] Labels of findings in CI should not change with the location of the rule in a pack or directory
Suggested cheap solution:
The action should call the semgrep binary with --no-rewrite-rule-ids
Run ./.github/actions/semgrep-action
/usr/bin/docker run --name returntocorpsemgrepactionv1_7098e9 --label 1e5c35 --workdir /github/workspace --rm -e INPUT_CONFIG -e INPUT_PUBLISHTOKEN -e INPUT_PUBLISHDEPLOYMENT -e INPUT_GENERATESARIF -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/workspace" returntocorp/semgrep-action:v1
Unable to find image 'returntocorp/semgrep-action:v1' locally
v1: Pulling from returntocorp/semgrep-action
df20fa9351a1: Pulling fs layer
36b3adc4ff6f: Pulling fs layer
4db9de03f499: Pulling fs layer
cd38a04a61f4: Pulling fs layer
9a3838385f13: Pulling fs layer
09359e37df4b: Pulling fs layer
2593afa0e612: Pulling fs layer
cff1f9ba2a6e: Pulling fs layer
27800508e272: Pulling fs layer
84c0aae16fc3: Pulling fs layer
fdda4f84e7a3: Pulling fs layer
cd38a04a61f4: Waiting
9a3838385f13: Waiting
09359e37df4b: Waiting
2593afa0e612: Waiting
cff1f9ba2a6e: Waiting
27800508e272: Waiting
84c0aae16fc3: Waiting
fdda4f84e7a3: Waiting
36b3adc4ff6f: Verifying Checksum
36b3adc4ff6f: Download complete
df20fa9351a1: Verifying Checksum
df20fa9351a1: Download complete
4db9de03f499: Verifying Checksum
4db9de03f499: Download complete
cd38a04a61f4: Verifying Checksum
cd38a04a61f4: Download complete
09359e37df4b: Verifying Checksum
09359e37df4b: Download complete
2593afa0e612: Verifying Checksum
2593afa0e612: Download complete
9a3838385f13: Verifying Checksum
9a3838385f13: Download complete
cff1f9ba2a6e: Verifying Checksum
cff1f9ba2a6e: Download complete
fdda4f84e7a3: Verifying Checksum
fdda4f84e7a3: Download complete
df20fa9351a1: Pull complete
27800508e272: Verifying Checksum
27800508e272: Download complete
84c0aae16fc3: Verifying Checksum
84c0aae16fc3: Download complete
36b3adc4ff6f: Pull complete
4db9de03f499: Pull complete
cd38a04a61f4: Pull complete
9a3838385f13: Pull complete
09359e37df4b: Pull complete
2593afa0e612: Pull complete
cff1f9ba2a6e: Pull complete
27800508e272: Pull complete
84c0aae16fc3: Pull complete
fdda4f84e7a3: Pull complete
Digest: sha256:8498aff37222c4f69405b4f4db3a67fcdd12ce60eb3ded39e82b097305fb913e
Status: Downloaded newer image for returntocorp/semgrep-action:v1
=== detecting environment
| versions - semgrep 0.27.0 on Python 3.7.9
| environment - running in github-actions, triggering event is 'pull_request'
| semgrep.dev - not logged in
=== setting up agent configuration
| using semgrep rules from the committed .semgrep.yml
| using default path ignore rules of common test and dependency directories
| looking at 3399 changed paths
| found 3387 files in the paths to be scanned
| skipping 303 files based on path ignore rules
=== looking for current issues in 3084 files
| No current issues found
| No current issues found
| 1 current issue found
| 2 current issues found
| 2 current issues found
| 2 current issues found
| 2 current issues found
=== failed command's STDOUT:
=== failed command's STDERR:
fatal: No pathspec was given. Which files should I remove?
Error: ROR] `/usr/bin/git rm -f` failed with exit code 128
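The log shows `git rm -f` being invoked with no paths at all. A guard along these lines would avoid that exit code 128, sketched with a hypothetical helper and an injectable runner. Why the path list was empty in this run isn't visible in the log.

```python
import subprocess
from typing import Callable, Optional, Sequence


def git_rm_force(
    paths: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[subprocess.CompletedProcess]:
    """Run `git rm -f` only when there is something to remove.

    With an empty list git fails with "No pathspec was given" (exit 128),
    so the empty case is treated as a no-op.
    """
    if not paths:
        return None
    # "--" stops git from treating paths that start with "-" as options.
    return run(["git", "rm", "-f", "--", *paths], check=True, capture_output=True, text=True)
```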
Our docker image is 388MB, but semgrep itself is <100MB and semgrep-agent is 100MB. We are adding a few dependencies, but we shouldn't be this large. This has a user impact: CI systems pull semgrep_agent, so the smaller we are, the faster we run.
Quick ideas:
- COPY --from=semgrep /usr/local/bin/semgrep-core /tmp/semgrep-core adds a file which we can't delete later, because docker layers are append-only. Either squash the image, or maybe we can just install semgrep from pip?
Some debugging:
ine@imbp4 ~/D/r/semgrep-action (develop)> docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
deleteme latest 3a2024614c6b 59 seconds ago 388MB
ine@imbp4 ~/D/r/semgrep-action (develop) [1]> docker image history deleteme
IMAGE CREATED CREATED BY SIZE COMMENT
3a2024614c6b About a minute ago /bin/sh -c #(nop) ENV SEMGREP_ACTION=true S… 0B
1248a8253a31 About a minute ago /bin/sh -c #(nop) CMD ["python" "-m" "semgr… 0B
ea96aab92e25 About a minute ago /bin/sh -c #(nop) ENV PATH=/root/.local/bin… 0B
5e31234a8ae2 About a minute ago /bin/sh -c #(nop) COPY dir:1fd9cad476546e4a5… 117kB
880a9eeb1cb8 About a minute ago /bin/sh -c apk add --no-cache --virtual=.bui… 207MB
f9731445eead 3 minutes ago /bin/sh -c #(nop) COPY file:242636c2950567f1… 139MB
32b1ab530668 3 minutes ago /bin/sh -c #(nop) ENV INSTALLED_SEMGREP_VER… 0B
d085a20dee12 3 minutes ago /bin/sh -c #(nop) COPY file:89f9fdac4917c31a… 597B
1866ac2367b4 3 minutes ago /bin/sh -c #(nop) COPY file:c53eceb6b503d20b… 8.47kB
c061f1cc2db7 3 minutes ago /bin/sh -c #(nop) WORKDIR /app 0B
6b73b71fd64e 8 days ago /bin/sh -c #(nop) CMD ["python3"] 0B
<missing> 8 days ago /bin/sh -c set -ex; wget -O get-pip.py "$P… 7.24MB
<missing> 8 days ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_SHA256… 0B
<missing> 8 days ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_URL=ht… 0B
<missing> 3 weeks ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=20… 0B
<missing> 3 weeks ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
<missing> 3 weeks ago /bin/sh -c set -ex && apk add --no-cache --… 27.7MB
<missing> 3 weeks ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.7.9 0B
<missing> 3 weeks ago /bin/sh -c #(nop) ENV GPG_KEY=0D96DF4D4110E… 0B
<missing> 3 weeks ago /bin/sh -c apk add --no-cache ca-certificates 512kB
<missing> 3 weeks ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 3 weeks ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
<missing> 3 weeks ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 3 weeks ago /bin/sh -c #(nop) ADD file:f17f65714f703db90… 5.57MB
I have a repo and set of rules.
When I run semgrep-action on the repo, it succeeds with "no errors":
semgrep-agent --baseline-ref ... --config semgrep.yml
=== detecting environment
| versions - semgrep 0.31.1 on Python 3.9.0
| environment - running in git, triggering event is 'unknown'
| semgrep.dev - not logged in
=== setting up agent configuration
| using semgrep rules from semgrep.yml
| using path ignore rules from .semgrepignore
| looking at 4 changed paths
| found 4 files in the paths to be scanned
=== looking for current issues in 4 files
| No current issues found
=== not looking at pre-existing issues since there are no current issues
=== exiting with success status
However, running semgrep directly on the code fails:
docker run -v ${PWD}:/src returntocorp/semgrep:0.31.0 --config /src/semgrep.yml ...
running 435 rules...
an internal error occured while invoking semgrep-core:
unknown exception: Parse_info.NoTokenLocation("Match returned an empty list with no token location information; this may be fixed by adding enclosing token information (e.g. bracket or parend tokens) to the list's enclosing node type.")
An error occurred while invoking the semgrep engine; please help us fix this by creating an issue at https://github.com/returntocorp/semgrep
The consequence of this is that security issues are silently making it through my CI pipeline (see https://github.com/returntocorp/semgrep-app/pull/1123#discussion_r526341734) (!)
As a user, I expect that, if Semgrep fails, my CI job should fail.
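A sketch of the expected behavior: inspect the `errors` array in semgrep's --json output (the shape seen in the agent's failure logs, e.g. `{"results": [], "errors": [...]}`) and fail the job even when there are no results. The function names and the status code 2 are arbitrary choices for this sketch.

```python
import json
from typing import List


def semgrep_errors(stdout: str) -> List[dict]:
    """Extract the "errors" array from semgrep's --json output."""
    return json.loads(stdout).get("errors", [])


def exit_code_for(returncode: int, stdout: str) -> int:
    """Fail the CI job if semgrep itself failed, even with zero findings."""
    if returncode != 0:
        return returncode  # propagate semgrep's own failure status
    if semgrep_errors(stdout):
        return 2  # errors reported in JSON despite a zero exit: still fail
    return 0
```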
This would improve total run time by probably around 40% in the case when you have 1 out of 5 changed files introducing new issues.
A continuation of https://github.com/returntocorp/semgrep-action/pull/25
Describe the bug
When I attempt to run semgrep-agent on the command-line in the same fashion that I run it in CI, it is not connecting to the right project on the web UI dashboard (and therefore not using the correct policy).
In the figure, my two real projects are highlighted: chef/automate (which exists at https://github.com/chef/automate) and chef/chef-cloud (https://github.com/chef/chef-cloud).
In CI, I use the block of code below (for Buildkite). The relevant portions are highlighted.
The equivalent from the command-line, as I understand it, is this:
$ cd ~/code/go/src/github.com/chef/automate
$ docker run --rm \
--volume $(realpath .):/chef/automate --workdir /chef/automate \
returntocorp/semgrep-action:v1 \
python -m semgrep_agent --publish-token $SEMGREP_TOKEN --publish-deployment $SEMGREP_ID --baseline-ref master
Notable:
- If I use --volume $(realpath .):/automate --workdir /automate, it still results in updating "automate" in the dashboard -- item (1) again.
- If I use --volume $(realpath .):/foo --workdir /foo or --volume $(realpath .):/src --workdir /src, then it creates (or updates) items (2) and (3) respectively.
To Reproduce
As above.
Expected behavior
Should be able to connect to the "chef/automate" project and use "Chef-01" policy.
Screenshots
As above.
What is the priority of the bug to you?
Is this a P0 (blocking your adoption of Semgrep or workflow), P1 (important to fix or quite annoying), P2 (regular bug that should get fixed)?
P2 (a bit frustrating, but I can get by without it for a time)
Environment
docker
Currently, semgrep-action prints the number of files that are lined up for scanning and the number of ignored files, but doesn't print the number of rules that are loaded or used (or what is being used to load those rules: a registry link, .semgrep or .semgrep.yml):
=== detecting environment
| versions - semgrep 0.25.0 on Python 3.7.9
| environment - running in gitlab-ci, triggering event is 'push'
| semgrep.dev - not logged in
=== setting up agent configuration
| using semgrep rules from the committed .semgrep/ directory
| using path ignore rules from .semgrepignore
| found 100 files in the paths to be scanned
| skipping 5 files based on path ignore rules
=== looking for current issues in 100 files
| 0 current issues found
Can we consider if we want to print out the number of rules used in scanning and where the rules have been fetched from?
Options:
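One option, sketched under assumptions: once the rules are parsed (e.g. by a YAML loader, not shown), log the count next to the source, in the same style as the existing agent output. `describe_rules` is hypothetical. It assumes the parsed config is a mapping with a top-level `rules` list.

```python
from typing import Any, Mapping


def describe_rules(config: Mapping[str, Any], source: str) -> str:
    """Format a log line with the rule count and where the rules came from."""
    count = len(config.get("rules", []))
    noun = "rule" if count == 1 else "rules"
    return f"| using {count} semgrep {noun} from {source}"
```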
The action appears to be reporting issues that occur in the PR as well as any new issues on the merge target (e.g. master).
It should only report issues in the PR itself.
Suggestion:
Calculate --diff-against using git merge-base of the PR commit and the target branch.
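A sketch of that suggestion: shell out to `git merge-base` and use its output as the commit to diff against. `diff_base` is a hypothetical name. It assumes the target branch has been fetched, which shallow CI clones may not do by default.

```python
import subprocess
from typing import Callable


def diff_base(
    head: str,
    target_branch: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the commit to diff against: the merge-base of head and target."""
    result = run(
        ["git", "merge-base", head, target_branch],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, `diff_base("HEAD", "origin/master")` would give the fork point of the PR, so new commits on master are not attributed to the PR.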
I mentioned this some time ago as a casual comment in slack, but surfacing here just to give a bit more exposure.
Since the v1 tag on semgrep-agent is continually bumped with new releases, that means that consumers of the v1 docker image are always at risk of having their CI build break due to a new release. My release engineering folks rather frown on that, which means I cannot make use of semgrep's failing a build iff there is a new problem in our code.
It would be nice to have the option to pin to an unchanging version of your docker image so I could eliminate the risk in my CI pipeline. Not saying you have to immobilize "v1" --I understand the desire to keep that at the head for your own needs--but perhaps have additional tags corresponding to the encapsulated semgrep (since I imagine that changes more frequently).
Right now, it's not clear whether the action will use the hard-coded config or use the backend-configured config.
Suggestion is to fail hard with a descriptive error message in this case.
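A sketch of failing hard, assuming "hard-coded config" means the config input (--config) and "backend-configured" means a semgrep.dev deployment selected via --publish-token. The exception and function names are hypothetical.

```python
from typing import Optional


class AgentConfigError(Exception):
    """Raised when the rule source is missing or ambiguous."""


def choose_config_source(config: Optional[str], publish_token: Optional[str]) -> str:
    if config and publish_token:
        raise AgentConfigError(
            "Both --config and --publish-token were provided; it is ambiguous whether "
            "to use the local rules or the policy configured on semgrep.dev. "
            "Remove one of them."
        )
    if config:
        return f"local config {config}"
    if publish_token:
        return "semgrep.dev policy"
    raise AgentConfigError("No rules configured: pass --config or --publish-token.")
```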
Discovered via dog-fooding on returntocorp/semgrep.
Even if we override :include .gitignore in .semgrepignore, semgrep itself ignores those files by default. We could run it with semgrep --no-git-ignore to fix this.
User should be able to configure file globs to define run locations when using action without the SaaS backend.
Possible solutions:
- --glob à la ripgrep
- passing --include and --exclude to semgrep
The issue is with a surprising behavior of GitHub: let’s assume you have this git history:
main branch  0--1--2--3--4--5
                 \
your branch       A--B
When GitHub starts this job, it actually merges commit B into 5! That’s the codebase semgrep sees, and our diff-aware scanning will compare “B merged into 5” against 1 to find new issues.
So while you expect that only A and B’s changes will be scanned, right now A, B, and 2 through 5 are all being scanned, hence the additional changed file count.
This might be the right one to use? But we still probably need to add python logic to fetch more commits for the baseline checkout to work https://github.com/actions/checkout#checkout-pull-request-head-commit-instead-of-merge-commit
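A toy model of the history drawn above that makes the file-count difference concrete. Commit names come from the diagram; "M" is a hypothetical label for the merge commit GitHub creates from B and 5. Comparing M against 1 pulls in 2 through 5, while comparing against 5 (or checking out B and comparing against its merge-base, 1) leaves only A and B.

```python
from typing import Dict, List, Set

# Parent links for the diagram; "M" is GitHub's merge of B into 5.
PARENTS: Dict[str, List[str]] = {
    "0": [], "1": ["0"], "2": ["1"], "3": ["2"], "4": ["3"], "5": ["4"],
    "A": ["1"], "B": ["A"], "M": ["5", "B"],
}


def ancestors(commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def scanned_commits(head: str, baseline: str) -> Set[str]:
    """Commits whose changes show up when comparing head against baseline."""
    return ancestors(head) - ancestors(baseline)


def merge_base(a: str, b: str) -> List[str]:
    """Best common ancestors of a and b (what `git merge-base --all` reports)."""
    common = ancestors(a) & ancestors(b)
    return sorted(c for c in common if not any(o != c and c in ancestors(o) for o in common))
```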
We should create a script that automatically pulls the SHA256 hash from Docker Hub and changes the necessary lines in the Dockerfile.
Changing the semgrep version is error-prone for maintenance devs.
Proposal:
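A sketch of the rewrite half of such a script, assuming the Dockerfile pins the image as `returntocorp/semgrep:<tag>@sha256:<digest>`; the real Dockerfile may instead track the version through an ENV such as INSTALLED_SEMGREP_VERSION. Fetching the digest from Docker Hub is not shown. `pin_semgrep_image` is hypothetical.

```python
import re

# Matches e.g. "returntocorp/semgrep:0.32.0@sha256:<64 hex chars>".
IMAGE_REF = re.compile(r"(returntocorp/semgrep):[\w.-]+@sha256:[0-9a-f]{64}")


def pin_semgrep_image(dockerfile: str, version: str, digest: str) -> str:
    """Rewrite every pinned returntocorp/semgrep reference to version@digest."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    new_text, count = IMAGE_REF.subn(lambda m: f"{m.group(1)}:{version}@{digest}", dockerfile)
    if count == 0:
        raise ValueError("no pinned returntocorp/semgrep image reference found")
    return new_text
```

Failing loudly when nothing matches helps catch a Dockerfile layout change instead of silently shipping the old version.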
Documentation has moved to semgrep.dev/docs and now includes details on using the GitHub Security Dashboard, but neither is discussed in the README. This ticket is to update that per #97
We can rework it to use poetry for this.
Got this error; not sure what's happening, but sharing as requested in the message:
=== failed command's STDOUT:
{"results": [], "errors": [{"type": "SemgrepError", "code": 2, "message": "an internal error occured while invoking semgrep-core:\n\tunknown exception: Parse_info.NoTokenLocation(\"Match returned an empty list with no token location information; this may be fixed by adding enclosing token information (e.g. bracket or parend tokens) to the list's enclosing node type.\")\nAn error occurred while invoking the semgrep engine; please help us fix this by creating an issue at https://github.com/returntocorp/semgrep"}]}
=== failed command's STDERR:
running 481 rules...
Error: ROR] `/root/.local/bin/semgrep --skip-unknown-extensions --disable-nosem --json --no-rewrite-rule-ids --config /tmp/tmpstr5uk_i.yml webhooks/json_map.go webhooks/notification_formatter.go webhooks/events.go webhooks/events_test.go` failed with exit code 2
This is an internal error, please file an issue at https://github.com/returntocorp/semgrep-action/issues/new/choose
and include any log output from above.
@ievans made a fork of semgrep-action that can also scan changes of yarn.lock etc. and post about how the dependency security hotspots have changed.
From a technical standpoint this feature is pretty well separated, so I'm not worried about unclean code. A wider feature set would also mean the same project would be useful for more people. Some might discover the Semgrep action by looking for a cool dependency change analysis tool.
I think it would lead to a branding nightmare though. Semgrepdep users would expect more support than we'd give it, semgrep-action users would be confused by a weird, unnatural option in the action's config. Users of both would still need to add separate workflows (like we did internally), which is also confusing to maintain since looking at a GHA overview you'd just see 'semgrep-action' running twice.
Is your feature request related to a problem? Please describe.
Not related to a problem
Describe the solution you'd like
Be able to use Semgrep in Jenkins CI, with something like a plugin.
Describe alternatives you've considered
Call Semgrep from CLI in Jenkins. This would need the Jenkins server to have Semgrep installed, which in some setups is a lot harder to configure. For instance, if Jenkins runs in distributed on-demand cloud / containers / runtime sandboxes, you would need to configure the install for Semgrep each time a container / runtime sandbox is created and used.
Since Semgrep already supports SARIF output, it would be nice if the action could just upload the results so they are shown under the Security tab on GitHub.
Upstream semgrep recently added, in version 0.33.0, a --severity flag that allows filtering rules to WARNING, etc. It would be fantastic if this project could support optionally configuring that. As an example use case, we would love the ability to add rules within our codebase such as:
- rules at WARNING severity
- rules at ERROR severity when all existing reports are fixed
With the following logic in our CI configuration:
- ERROR severity rules on main branch pushes
- WARNING and ERROR severity rules during pull request submission, so new code can be fixed before going in
Happy to provide more details or submit an implementation if pointed in the right direction! Thanks!
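A sketch of that CI logic, choosing --severity flags from the GitHub event name and ref. It assumes --severity can be repeated to allow several levels and that the main branch is refs/heads/main. `severity_flags` is hypothetical.

```python
from typing import List


def severity_flags(event_name: str, ref: str) -> List[str]:
    """Pick semgrep --severity flags from the CI event (policy from this request)."""
    if event_name == "pull_request":
        levels = ["WARNING", "ERROR"]
    elif event_name == "push" and ref == "refs/heads/main":
        levels = ["ERROR"]
    else:
        levels = []  # no filtering: run every rule
    flags: List[str] = []
    for level in levels:
        flags += ["--severity", level]
    return flags
```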
When connecting to semgrep.dev, we should log info about the policy being executed.