
someengineering / fixinventory


Fix Inventory consolidates user, resource, and configuration data from your cloud environments into a unified, graph-based asset inventory.

Home Page: https://inventory.fix.security

License: GNU Affero General Public License v3.0

Python 99.11% Shell 0.33% HTML 0.16% Makefile 0.25% Jupyter Notebook 0.12% Dockerfile 0.03% CSS 0.01%
aws gcp infrastructure-as-code digitalocean open-source security security-automation cnapp cspm cybersecurity

fixinventory's People

Contributors

1101-1, 1nv1, akash190104, anjafr, aquamatthias, azagaya, cebidhem, cprovencher, dependabot[bot], fatz, fernandocarletti, hexpy, imgbot[bot], kushthedude, lloesche, meln1k, mrmarvin, nburtsev, neilinger, rpicster, scapecast, sebbrandt87, snyk-bot, some-ci, tdickers, thecatlady, yuval-k


fixinventory's Issues

Implement an on-premise collector

Implement an on-premise collector that can take a network range, find nodes in it, and add them to the graph. Where possible, use SNMP, SSH, or WMI to connect to discovered systems and learn more about them. Essentially, get the equivalent of the information a cloud provider API would return about a compute instance or network.
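A minimal sketch of the discovery part, using a TCP connect scan over a CIDR range (SNMP is UDP and would need a different probe; the port list and function name here are illustrative):

```python
import ipaddress
import socket

def scan_range(cidr: str, ports=(22, 135, 80), timeout: float = 0.5):
    """Find live nodes in a network range by probing well-known TCP ports
    (22/ssh, 135/wmi-rpc, 80/http). Returns {ip: [open ports]}."""
    found = {}
    for ip in ipaddress.ip_network(cidr).hosts():
        open_ports = []
        for port in ports:
            try:
                # TCP connect scan; an OSError means closed/filtered/timeout
                with socket.create_connection((str(ip), port), timeout=timeout):
                    open_ports.append(port)
            except OSError:
                pass
        if open_ports:
            found[str(ip)] = open_ports
    return found
```

Each discovered node would then be turned into a graph resource, enriched via SSH/SNMP/WMI where credentials are available.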

[keepercore] rename merge -> replace

Rename merge -> replace and move the merge point up: if a collector delivers a graph and sets replace = true on the cloud kind node, the entire cloud should be replaced.

keepercore - collect query stats

Instrument the query parser and collect statistics on which fields are queried and how often, as input for automated index creation and removal.
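A rough sketch of what the stats collection could look like (class and method names are made up):

```python
from collections import Counter

class QueryStats:
    """Count how often each field appears in parsed query predicates,
    as a basis for automated index creation/removal (sketch)."""

    def __init__(self):
        self.field_counts = Counter()

    def record(self, fields):
        # called from the query parser with the fields of each predicate
        self.field_counts.update(fields)

    def index_candidates(self, min_hits: int = 100):
        # fields queried often enough to justify an index
        return [f for f, n in self.field_counts.most_common() if n >= min_hits]
```

Removal would work the other way around: indexes on fields that drop below the threshold over some time window become candidates for deletion.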

Use TLS for all communication

The current communication over unencrypted HTTP and websocket is obviously not acceptable. We should add the ability to load X.509 certificates for everything, starting with the server side (ckcore) but eventually doing client authentication (ckworker, ckmetrics, cksh) as well.
This is independent of any JWT users/roles authorization work.
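A sketch of what loading the certs could look like on both sides, using Python's stdlib ssl module (file paths and function names are assumptions):

```python
import ssl

def make_server_context(cert_file, key_file):
    """TLS context for the server side (ckcore); cert_file/key_file are
    paths to PEM files (assumed names)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(cert_file, key_file)
    return ctx

def make_client_context(ca_file=None):
    """Client context (ckworker etc.) that verifies the server certificate;
    ca_file would point to the internal CA bundle, if any."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.check_hostname = True
    return ctx
```

Client authentication would later add `ctx.verify_mode = ssl.CERT_REQUIRED` plus `load_cert_chain` on the client context as well.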

[docker] better defaults on many-core machines

Right now the defaults in our Docker image are to not fork collector plugins and to collect two accounts simultaneously. This is great for testing Cloudkeeper on a personal laptop but not for a many-core server. The image should detect the environment it is running in and the amount of resources available to it, and automatically scale to higher settings by default.
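A sketch of container-aware detection, assuming cgroup v2 (the path and the cap of 16 are illustrative):

```python
import os

def effective_cpu_count() -> int:
    """CPUs actually available to this container: honour the cgroup v2
    cpu.max quota if present, otherwise fall back to os.cpu_count()."""
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
            if quota != "max":
                return max(1, int(int(quota) / int(period)))
    except (OSError, ValueError):
        pass  # no cgroup v2 limit visible; use the host count
    return os.cpu_count() or 1

def default_pool_size() -> int:
    # e.g. collect one account per available core, with an upper cap
    return min(effective_cpu_count(), 16)
```

The same idea applies to memory (cgroup memory.max) when deciding whether to fork collector plugins.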

[plugin gcp] Collecting instances with custom machine types raises exceptions, breaks collecting

Issue

When collecting instances on a project with about 100 instances, the resulting graph only ends up with a small subset of those. This seems to be due to the collect task breaking early when encountering an instance with a custom machine type.

Context

Running latest main (3663023) using

ckworker --verbose --psk changeme --ckcore-uri http://localhost:8900 --ckcore-ws-uri ws://localhost:8900 --collector gcp --gcp-service-account="" --gcp-fork --gcp-project-pool-size=64 --gcp-project=myproject --gcp-collect=instances --verbose --debug-dump-json 2>&1 | tee log/ckworker.log

Error

[...]
2021-10-15 10:00:44,528 - DEBUG - 110556/gcp_myproject - Fetching custom instance type for gcp_instance my-custom-instance (2471263805638373273)
2021-10-15 10:00:44,549 - ERROR - 110556/gcp_myproject - Caught exception in collect_something(<cloudkeeper_plugin_gcp.collector.GCPProjectCollector object at 0x7f249e72bd90>, paginate_method_name='aggregatedList',
resource_class=<class 'cloudkeeper_plugin_gcp.resources.GCPInstance'>, post_process=<function GCPProjectCollector.collect_instances.<locals>.post_process at 0x7f2499d92040>, search_map={'__network': ['link', <function GCPProjectCollector.collect_instances.<locals>.<lambda> at 0x7f2499d92160>], '__subnetwork': ['link', <function GCPProjectCollector.collect_instances.<locals>.<lambda> at 0x7f2499d92280>], 'machine_type': ['link', 'machineType']}, attr_map={'instance_status': 'status', 'machine_type_link': 'machineType'}, predecessors=['__network', '__subnetwork', 'machine_type'])
Traceback (most recent call last):
  File "/home/marv/cloudkeeper/venv/lib/python3.9/site-packages/cklib/utils.py", line 489, in catch_and_log
    return f(*args, **kwargs)
  File "/home/marv/cloudkeeper/venv/lib/python3.9/site-packages/cloudkeeper_plugin_gcp/collector.py", line 657, in collect_something
    post_process(r, self.graph)
  File "/home/marv/cloudkeeper/venv/lib/python3.9/site-packages/cloudkeeper_plugin_gcp/collector.py", line 752, in post_process
    request = gr.get(**kwargs)
  File "/home/marv/cloudkeeper/venv/lib/python3.9/site-packages/googleapiclient/discovery.py", line 997, in method
    raise TypeError('Got an unexpected keyword argument {}'.format(name))
TypeError: Got an unexpected keyword argument region
2021-10-15 10:00:44,623 - DEBUG - 110487/gcp - Merging graph of gcp_project myproject into graph of cloud gcp
[...]
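The TypeError comes from passing a region keyword to a discovery method that does not accept it (custom machine types of zonal instances are fetched via a zonal get). One defensive fix, sketched with a hypothetical helper, is to drop kwargs the method does not support before calling it:

```python
def filter_method_kwargs(supported, **kwargs):
    """Drop keyword arguments a googleapiclient discovery method does not
    accept (e.g. 'region' on a zonal machineTypes.get call). 'supported'
    would be derived from the method's parameter list (hypothetical)."""
    return {k: v for k, v in kwargs.items() if k in supported}
```

With this in post_process, `gr.get(**filter_method_kwargs(supported, **kwargs))` would no longer break the whole collect run on custom machine types.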

Simple PSK auth between components

Right now all communication between components is open. As mentioned in #218 we're going to add transport layer security over all communication channels. However we also need to do authorization independent of the communication channel.

This will eventually be done by an OpenID Connect service, but as a first immediate step let's have a simple auth mechanism that is better than no auth (being open to the entire network) and better than plain-text passwords.

We'll use a pre-shared key to sign and verify a JWT.
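For illustration, HS256 signing and verification with a pre-shared key can be done with the stdlib alone (a real implementation would likely use a library such as PyJWT):

```python
import base64
import hashlib
import hmac
import json

def _b64(data: bytes) -> str:
    # JWT uses unpadded url-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def encode_jwt(payload: dict, psk: str) -> str:
    """Sign a JWT with the pre-shared key using HS256 (minimal sketch)."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(psk.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64(sig)}"

def verify_jwt(token: str, psk: str) -> dict:
    """Verify the signature with the same PSK and return the payload."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = hmac.new(psk.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(_b64(expected), sig):
        raise ValueError("invalid signature")
    pad = "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(body + pad))
```

Each component would be started with the same `--psk` value; any token signed with a different key is rejected.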

[keepercore] Aggregation: Allow for combined parameters

Why

There are scenarios where the aggregation grouping key has to be derived from more than one variable.
Example:

aggregate(cloud.name as cloud, account.name as account, region.name as region, instance_type as type, quota_type : sum(reservations) as reserved_instances_total) (merge_with_ancestors="cloud,account,region"): is("instance_type") and reservations >= 0

The account.name is not necessarily unique. Ideally we could express something like:

aggregate("{account.name} ({account.id})" as account, r

What

tbd

AC

tbd

[keepercore] Add http command to support web hooks

Why

While it is possible to attach to the event bus or to query the state of the system, the simplest solution might be to also allow web hooks to be triggered.
In order to support this scenario, an http command should be implemented, which will perform the request on execution.
This way it would be possible to trigger web hooks based on events (or any other trigger) in the system.

What

  • Add an http command that allows specifying: method, url
  • Every element in the in_stream will trigger an http request
  • When the method is POST: send the element as the body
  • Allow the definition of parallel requests per command
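A sketch of how each in_stream element could be turned into a request, using stdlib urllib for illustration (the actual implementation would presumably use the core's async HTTP client):

```python
import json
import urllib.request

def build_request(method: str, url: str, element) -> urllib.request.Request:
    """Build one web-hook request per stream element.
    For POST the element is JSON-serialized into the body."""
    data = json.dumps(element).encode() if method.upper() == "POST" else None
    return urllib.request.Request(
        url,
        data=data,
        method=method.upper(),
        headers={"Content-Type": "application/json"},
    )
```

Parallelism would then be a matter of how many of these requests are dispatched concurrently per command invocation.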

Open Questions

  • How to handle auth?

Acceptance Criteria

TBD

[keepercore] tag command needs to resolve cloud, account, region, zone

Why

The tagger needs additional information for cloud, account, region and zone.
This information should be provided without further user interaction.

What

Incoming elements need to load and merge all references (cloud, account, region, zone).

AC

match | tag command can be issued without any additional merge_ancestors

[keepercore] Add jq command

Why

Cloudkeeper v1 supports this command.

What

  • Add jq as dependency
  • Implement the command

Acceptance criteria

  • jq can be used in the CLI

[keepercore] - allow dynamic CLI aliases

From user feedback on Discord:

Queries like is(aws_alb) and ctime<"-7d" and backends==[] with(empty, <-- is(aws_alb_target_group) and target_type=="instance" and ctime<"-7d" with(empty, <-- is(aws_ec2_instance) and instance_status!="terminated")) <-[0:1]- is(aws_alb_target_group) or is(aws_alb) are powerful but also complex.

Chat protocol:

ViktorHarutyunyan:
I am not sure what's the target audience but I think this is an overkill, I think a simple query like:
> ck show unused lbs
then there should be like a man page explaining the criteria of selection. 
then the user can choose to say -r region -a account etc.. 
or say --days=7
imo, this is much simpler 
then if we talk about UI, then it is all clicks and nice small value edit windows. 
ck show unused tgs .... 
ck show unattached volumes ....

The CLI already has aliases. We should make those configurable so that users can download/exchange other users' aliases. The complex query above could then turn into show unused loadbalancers.

Personally I'd also not want to have to type the complex query even if I know the syntax. A shorthand form would be useful.
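A configurable alias table could be as simple as a mapping from shorthand to full query (the entries here are illustrative):

```python
# user-configurable aliases, e.g. loaded from a shared config file
ALIASES = {
    "show unused loadbalancers": 'query is(aws_alb) and ctime<"-7d" and backends==[]',
    "show unattached volumes": 'query is(volume) and volume_status=="available"',
}

def expand_alias(line: str, aliases=None) -> str:
    """Replace a known alias with its full CLI query; pass through
    everything else unchanged."""
    aliases = ALIASES if aliases is None else aliases
    return aliases.get(line.strip(), line)
```

Downloadable alias packs would just be such mappings merged into the user's local table.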

Upload to PyPI

It would be convenient to just pip install cloudkeeper.
Right now only the Docker image that also runs the tox tests is built automatically. We have all the appropriate PyPI package names reserved; there's just no automated build pipeline set up.

[keepercore] Add command to trigger workflows or jobs

Why

To test a workflow it should be easy to start it directly, so the user does not need to wait for the usual trigger.

What

  • Implement a command to start a workflow or job
  • start_task task_name

AC

A job or workflow can be started via the CLI.

[keepercore] less verbose logging on query errors

15:29:37 [WARNING] Request <Request POST /graph/ck/reported/query/aggregate > has failed with exception: Error: AttributeError
Message: Given kind does not exist: volume [core.web.api]
Traceback (most recent call last):
  File "/home/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 598, in middleware_handler
    response = await handler(request)
  File "/home/lukas/repo/cloudkeeper/venv/lib/python3.9/site-packages/aiohttp/web_urldispatcher.py", line 195, in handler_wrapper
    return await result
  File "/home/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 485, in query_aggregation
    return await self.stream_response_from_gen(request, gen)
  File "/home/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 565, in stream_response_from_gen
    gen = await force_gen(gen_in)
  File "/home/lukas/repo/cloudkeeper/keepercore/core/util.py", line 174, in force_gen
    return with_first(await gen.__anext__())
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 277, in query_aggregation
    q_string, bind = self.to_query(query)
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 857, in to_query
    part_tuple = part(p, idx, crsr)
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 820, in part
    cursor = filter_statement()
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 736, in filter_statement
    query_part += f"LET {out} = (FOR {crsr} in {in_cursor} FILTER {term(crsr, p.term)} RETURN {crsr})"
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 721, in term
    return is_instance(cursor, ab_term)
  File "/home/lukas/repo/cloudkeeper/keepercore/core/db/graphdb.py", line 706, in is_instance
    raise AttributeError(f"Given kind does not exist: {t.kind}")
AttributeError: Given kind does not exist: volume

is too much logging for saying "user tried to query data that doesn't exist"
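One way to sketch the fix: treat user-caused errors as expected and log them in one line, keeping full tracebacks only for genuinely unexpected failures (class and logger names here are assumptions):

```python
import logging

log = logging.getLogger("core.web.api")

class QueryError(Exception):
    """Expected, user-caused error (e.g. unknown kind) - hypothetical class."""

def log_request_failure(exc: Exception) -> None:
    """Called from the request middleware when a handler raised."""
    if isinstance(exc, (QueryError, AttributeError)):
        # user tried to query data that doesn't exist: one line is enough
        log.info("Invalid query: %s", exc)
    else:
        log.exception("Unhandled error in request handler")
```

Longer term, raising a dedicated QueryError instead of AttributeError from graphdb.py would make the distinction explicit.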

workerd - fan-out tasks to worker pool

Right now incoming tag update/delete tasks are worked on one by one. One could start n workerd instances to work on n tasks in parallel. Instead it would be better to accept n tasks in parallel and distribute them onto a worker pool. The basic functionality is already there for all collectors as well as the cleaner; it just isn't used yet for tag tasks.
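The fan-out itself could be sketched with a standard thread pool (handler stands in for the existing per-task work function):

```python
from concurrent.futures import ThreadPoolExecutor

def process_tasks(tasks, handler, pool_size: int = 4):
    """Distribute incoming tag tasks onto a worker pool instead of
    handling them one by one. Results come back in task order."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handler, tasks))
```

The pool size would default to something derived from the available cores, matching the behaviour the collectors already have.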

[keepercore] Allow v1 like collect in intervals behaviour

Currently workflows are scheduled using a cron like syntax.

If the collect workflow runs every hour and a run takes 65 minutes, the next execution would be skipped and the next workflow only executed at the next full hour.

In v1 the behaviour is: collect in intervals. If the interval target is 1 hour and a collect run takes 30 minutes, the system will wait another 30 minutes before the next run starts. However, if the target is 1 hour and the collect takes 65 minutes, the system will start the next collect right after the current one finishes.

This behaviour ensures that our data is as "fresh" and close to the chosen collection interval as possible. v2 should implement a similar scheduling behaviour.
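The interval behaviour described above boils down to sleeping only for the remainder of the interval; a minimal sketch:

```python
import time

def run_at_interval(collect, interval: float, runs: int) -> None:
    """v1-style interval scheduling: wait out the remainder of the
    interval after a fast run; start immediately after a run that
    overran it (collect is the workflow entry point)."""
    for _ in range(runs):
        started = time.monotonic()
        collect()
        elapsed = time.monotonic() - started
        # sleep 0 when the run took longer than the interval
        time.sleep(max(0.0, interval - elapsed))
```

In v2 this would live next to (or replace) the cron-style trigger for the collect workflow.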

[keepercore] topic/workflows -> allow to filter by criteria like cloud/account

The task queue allows filtering by attributes. E.g. if a worker tells the core that it can apply tagging for cloud aws then it would receive tag tasks for cloud aws account 123.

In workflows we currently don't have a similar filter criteria. It's just a "collect" or "cleanup" workflow. Not a "collect aws" or "cleanup aws account 123" workflow.

[ckcore] aggregation on graph traversals with duplicated elements are wrong

Graph traversals can emit an element multiple times; duplicated elements are normally filtered out on the ckcore side.

In the case of aggregations, duplicated elements are not filtered but counted multiple times.
Example:

> query is(region) <-[0:]- | count reported.kind
onelogin_region: 1
onelogin_account: 1
slack_region: 3
slack_team: 3
gcp_region: 476
gcp_project: 476
aws_account: 702
aws_region: 702
cloud: 1182
graph_root: 1182
total matched: 4728
total unmatched: 0

There are 1182 regions in total. The single graph root lies on the traversal path of every region, so it is emitted (and counted) 1182 times. The result of count is wrong in this case.
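A sketch of the fix: deduplicate by node id before aggregating (the field names follow the query output above but are assumptions):

```python
from collections import Counter

def count_kinds(nodes):
    """Count node kinds after deduplicating by node id, so elements a
    graph traversal emits multiple times are only counted once."""
    seen = set()
    counts = Counter()
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            counts[node["reported"]["kind"]] += 1
    return counts
```

With this, the traversal above would report a single graph_root instead of 1182.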

Better tests for Cloudkeeper core and plugins

Currently the core has some test coverage and the plugins essentially none. Each plugin has a stub that tests some CLI args, essentially a starting point for adding more tests. All the infrastructure is already there, and tox is configured for each plugin.

[keepercore] clean command should accept a reason as parameter

Why

When a resource is cleaned up, users of the system should be able to understand why.
The simplest solution for the moment: whenever a resource should be cleaned, the reason is written to the log.

What

  • CleanCommand accepts a reason message as argument
  • Add a log message for every resource that is marked for cleanup with the reason included

Acceptance Criteria

A user can see from the log why a resource was cleaned.
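A sketch of the proposed behaviour (the node structure and function name are assumptions):

```python
import logging

log = logging.getLogger("cleaner")

def mark_for_cleanup(node: dict, reason: str) -> None:
    """Mark a resource for cleanup and log why, so the reason given to
    the clean command ends up next to every affected resource."""
    node["desired"] = {**node.get("desired", {}), "clean": True}
    log.info("Resource %s marked for cleanup: %s", node.get("id"), reason)
```

Invocation would look like `... | clean "older than 7 days and untagged"`.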

[resotoworker] propagate collection progress to `resotocore`

Right now the workers only log collection progress. For the UI as well as CLI it would be useful to get collection feedback so they can provide info like "Collecting AWS account 4/20" or similar.
For this to work BasePlugin needs to be extended so it can feed current collection progress back to the worker and then the worker needs to forward that to the core.

Running `echo` without arguments causes an exception

Passing echo with no args to /cli/evaluate returns a 200 OK, but when passed to /cli/execute it causes an exception

01:04:01 [WARNING] Request <Request POST /cli/execute > has failed with exception: Error: JSONDecodeError
Message: Expecting value: line 1 column 1 (char 0) [core.web.api]
Traceback (most recent call last):
  File "/Users/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 490, in middleware_handler
    response = await handler(request)
  File "/Users/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 418, in execute
    return await self.stream_response_from_gen(request, result)
  File "/Users/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 484, in stream_response_from_gen
    return await respond_json()
  File "/Users/lukas/repo/cloudkeeper/keepercore/core/web/api.py", line 464, in respond_json
    async for item in gen:
  File "/Users/lukas/repo/cloudkeeper/venv/lib/python3.9/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
    result = task.result()
  File "/Users/lukas/repo/cloudkeeper/keepercore/core/cli/command.py", line 48, in parse
    js = json.loads(arg if arg else "")
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The same happens when passing an unquoted string. It evaluates okay but when executed causes an exception.
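A sketch of a defensive parse that covers both cases: empty input becomes an empty string, and non-JSON input falls back to the raw string instead of raising a JSONDecodeError:

```python
import json

def parse_echo_arg(arg):
    """Tolerant parse for the echo command's argument (sketch)."""
    if not arg:
        return ""
    try:
        return json.loads(arg)
    except json.JSONDecodeError:
        # unquoted strings are echoed verbatim
        return arg
```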

workerd - backpropagate tag changes

Currently tag update/delete are not backpropagated to the core. Instead changes are picked up during the next collect run.
It would be useful to update the individual nodes in real-time so that the graph immediately reflects the changes.

AWS Cloudformation (and by proxy EKS Nodegroup) delete() should block until completed or failed

Right now during the cleanup phase of AWS resources we mostly just call the delete() method on whatever resource we're trying to remove. For almost all resources this is a blocking call that returns whether or not a resource was successfully deleted. However, Cloudformation always returns success, and the operation status then has to be polled. Those deletes can run for many minutes, even hours, in complex environments, so waiting for them would currently block resource collection.
We could delete a Cloudformation stack and then wait a certain amount of time for the delete to complete before timing out.
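A sketch of the polling approach with a timeout (get_status stands in for a hypothetical wrapper around the describe_stacks API; the status strings are Cloudformation stack states):

```python
import time

def wait_for_delete(get_status, timeout: float = 3600, poll: float = 10) -> bool:
    """Poll a Cloudformation stack's delete status until it completes,
    fails, or the timeout is reached. Returns True on DELETE_COMPLETE,
    False on DELETE_FAILED, raises TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "DELETE_COMPLETE":
            return True
        if status == "DELETE_FAILED":
            return False
        time.sleep(poll)
    raise TimeoutError("stack deletion did not finish in time")
```

boto3 also ships a `stack_delete_complete` waiter that implements essentially this loop, which may be preferable to hand-rolling it.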
