
Cartography

Cartography is a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.


Why Cartography?

Cartography aims to enable a broad set of exploration and automation scenarios. It is particularly good at exposing otherwise hidden dependency relationships between your service's assets so that you may validate assumptions about security risks.

Service owners can generate asset reports, Red Teamers can discover attack paths, and Blue Teamers can identify areas for security improvement. All can benefit from using the graph for manual exploration through a web frontend interface, or in an automated fashion by calling the APIs.

Cartography is not the only security graph tool out there, but it differentiates itself by being fully featured yet generic and extensible enough to help anyone better understand their risk exposure, regardless of which platforms they use. Rather than focusing on one core scenario or attack vector like many other tools, Cartography focuses on flexibility and exploration.

You can learn more about the story behind Cartography in our presentation at BSidesSF 2019.

Installation

Time to set up the server that will run Cartography. Cartography should work on both Linux and Windows servers, but bear in mind that we've only tested it on Linux so far. Cartography requires Python 3.6 or greater.

  1. Get and install the Neo4j graph database on your server.

    1. Go to the Neo4j download page, click "Community Server" and download Neo4j Community Edition 3.5.*.

       ⚠️ At this time we run our automated tests on Neo4j version 3.5.*. Other versions may work but are not explicitly supported. ⚠️
      
    2. Install Neo4j on the server you will run Cartography on.

  2. If you're an AWS user, prepare your AWS account(s)

    • If you only have a single AWS account

      1. Set up an AWS identity (user, group, or role) for Cartography to use. Ensure that this identity has the built-in AWS SecurityAudit policy (arn:aws:iam::aws:policy/SecurityAudit) attached. This policy grants access to read security config metadata.
      2. Set up AWS credentials to this identity on your server, using a config and credential file. For details, see AWS' official guide.
    • If you want to pull from multiple AWS accounts, see here.

  3. If you're a GCP user, prepare your GCP credential(s)

    1. Create an identity - either a User Account or a Service Account - for Cartography to run as
    2. Ensure that this identity has the following roles (https://cloud.google.com/iam/docs/understanding-roles) attached to it:
      • roles/iam.securityReviewer
      • roles/resourcemanager.organizationViewer: needed to list/get GCP Organizations
      • roles/resourcemanager.folderViewer: needed to list/get GCP Folders
    3. Ensure that the machine you are running Cartography on can authenticate to this identity.
      • Method 1: You can do this by setting your GOOGLE_APPLICATION_CREDENTIALS environment variable to point to a json file containing your credentials. As per SecurityCommonSense™️, please ensure that only the user account that runs Cartography has read-access to this sensitive file.
      • Method 2: If you are running Cartography on a GCE instance or other GCP service, you can make use of the credential management provided by the default service accounts on these services. See the official docs on Application Default Credentials for more details.
    4. If you want to pull from multiple GCP Projects, see here.
  4. If you're a CRXcavator user, prepare your CRXcavator API key

    1. Generate an API key from your CRXcavator user page
    2. Populate the following environment variables in the shell running Cartography
      1. CRXCAVATOR_URL - the full URL to the CRXcavator API. https://api.crxcavator.io/v1 as of 07/09/19
      2. CREDENTIALS_CRXCAVATOR_API_KEY - your API key generated in the previous step. Note this is a credential and should be stored in an appropriate secret store to be populated securely into your runtime environment.
    3. If the credentials are configured, the CRXcavator module will run automatically on the next sync
  5. If you're using GSuite, prepare your GSuite Credential

    Ingesting GSuite Users and Groups utilizes the Google Admin SDK.

    1. Enable Google API access
    2. Create a new G Suite user account and accept the Terms of Service. This account will be used for domain-wide delegated access.
    3. Perform G Suite Domain-Wide Delegation of Authority
    4. Download the service account's credentials
    5. Export the environmental variables:
      1. GSUITE_GOOGLE_APPLICATION_CREDENTIALS - location of the credentials file.
      2. GSUITE_DELEGATED_ADMIN - email address that you created in step 2
  6. If you're using Okta intel module, prepare your Okta API token

    1. Generate your API token by following the steps from Okta Create An API Token documentation
    2. Populate an environment variable with the API token. You can pass the environment variable name via the CLI --okta-api-key-env-var parameter
    3. Use the CLI --okta-org-id parameter with the organization ID you want to query. The organization ID is the first part of the Okta URL for your organization.
    4. If you are using Okta to administer AWS as a SAML provider then the module will automatically match OktaGroups to the AWSRole they control access for
      • If you are using a regex other than the standard okta group to role regex ^aws\#\S+\#(?{{role}}[\w\-]+)\#(?{{accountid}}\d+)$ defined in Step 5: Enabling Group Based Role Mapping in Okta then you can specify your regex with the --okta-saml-role-regex parameter.
  7. Get and run Cartography

    1. Run pip install cartography to install our code.

    2. Finally, to sync your data:

      • If you have one AWS account, run

         cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687>
        
      • If you have more than one AWS account, run

         AWS_CONFIG_FILE=/path/to/your/aws/config cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687> --aws-sync-all-profiles
        

      The sync will pull data from your configured accounts and ingest it into Neo4j! This process might take a long time if your accounts have a lot of assets.
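      Once the sync finishes, a quick optional sanity check is to open the Neo4j web interface described in the Usage Tutorial below and count what was ingested per account. This is only an illustrative query using the AWSAccount nodes and RESOURCE relationships that the sync creates:

         // Count ingested assets per AWS account
         MATCH (a:AWSAccount)-[:RESOURCE]->(n)
         RETURN a.name AS account, count(n) AS ingested_assets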

Usage Tutorial

Once everything has been installed and synced, you can view the Neo4j web interface at http://localhost:7474. See the Neo4j Browser documentation for a reference on using this interface.

ℹ️ Already know how to query Neo4j? You can skip to our reference material!

If you already know Neo4j and just need to know what the nodes, attributes, and graph relationships are for our representation of infrastructure assets, you can skip this handholdy walkthrough and see our quick canned queries. You can also view our reference material.

What RDS instances are installed in my AWS accounts?

MATCH (aws:AWSAccount)-[r:RESOURCE]->(rds:RDSInstance)
RETURN *

Visualization of RDS nodes and AWS nodes

In this query we asked Neo4j to find all [:RESOURCE] relationships from AWSAccounts to RDSInstances, and return the nodes and the :RESOURCE relationships.

We will do more interesting things with this result next.

ℹ️ Protip - customizing your view

You can adjust the node colors, sizes, and captions by clicking on the node type at the top of the query result. For example, to change the color of an AWSAccount node, first click the "AWSAccount" icon at the top of the view to select the node type, and then pick options from the menu that appears at the bottom of the view.

Which RDS instances have encryption turned off?

MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance{storage_encrypted:false})
RETURN a.name, rds.id

Unencrypted RDS instances

The results show up in a table because we specified attributes like a.name and rds.id in our return statement (as opposed to having it return *). We used the "{}" notation to have the query only return RDSInstances where storage_encrypted is set to False.

If you want to go back to viewing the graph and not a table, simply make sure you don't have any attributes in your return statement -- use return * to return all nodes decorated with a variable label in your MATCH statement, or just return the specific nodes and relationships that you want.
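For example, to see the unencrypted RDS instances from the query above as a graph again, return the matched nodes and relationship instead of their attributes:

// Returning nodes and the relationship renders a graph view instead of a table
MATCH (a:AWSAccount)-[r:RESOURCE]->(rds:RDSInstance{storage_encrypted: false})
RETURN a, r, rds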

Let's look at some other AWS assets now.

Which EC2 instances are directly exposed to the internet?

MATCH (instance:EC2Instance{exposed_internet: true})
RETURN instance.instanceid, instance.publicdnsname

EC2 instances open to the internet

These instances are open to the internet either through permissive inbound IP permissions defined on their EC2SecurityGroups or their NetworkInterfaces.

If you know a lot about AWS, you may have noticed that EC2 instances don't actually have an exposed_internet field. We're able to query for this because Cartography performs some data enrichment to add this field to EC2Instance nodes.

Which S3 buckets have a policy granting any level of anonymous access to the bucket?

MATCH (s:S3Bucket)
WHERE s.anonymous_access = true
RETURN s

S3 buckets that allow anon access

These S3 buckets allow for any user to read data from them anonymously. Similar to the EC2 instance example above, S3 buckets returned by the S3 API don't actually have an anonymous_access field and this field is added by one of Cartography's data augmentation steps.

A couple of other things to notice: instead of using the "{}" notation to filter for anonymous buckets, we used a SQL-style WHERE clause. You can also use the SQL-style AS operator to relabel output columns, as the aggregation example below does.

How many unencrypted RDS instances do I have in all my AWS accounts?

Let's go back to analyzing RDS instances. In an earlier example we queried for RDS instances that have encryption turned off. We can aggregate this data by AWSAccount with a small change:

MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
WHERE rds.storage_encrypted = false
RETURN a.name AS AWSAccount, count(rds) AS UnencryptedInstances

Table of unencrypted RDS instances by AWS account
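For example, to sort the results so that the accounts with the most unencrypted instances appear first, add an ORDER BY clause:

// Same aggregation as above, ranked by the number of unencrypted instances
MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
WHERE rds.storage_encrypted = false
RETURN a.name AS AWSAccount, count(rds) AS UnencryptedInstances
ORDER BY UnencryptedInstances DESC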

Learning more

If you want to learn more in depth about Neo4j and Cypher queries you can look at this tutorial and see this reference card.

Extending Cartography with Analysis Jobs

You can add your own custom attributes and relationships without writing Python code! Here's how.
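Analysis jobs are essentially collections of Cypher statements run against the graph after a sync (the built-in ones are described under Data Enrichment below). As a rough illustration, a custom job might flag unencrypted RDS instances for review; note that needs_encryption_review is a hypothetical attribute name used only for this sketch, not part of the built-in schema:

// Illustrative custom analysis statement: tag unencrypted RDS instances.
// "needs_encryption_review" is a made-up attribute, not part of Cartography's schema.
MATCH (rds:RDSInstance)
WHERE rds.storage_encrypted = false
SET rds.needs_encryption_review = true
RETURN count(rds) AS flagged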

Contributing

Code of conduct

This project is governed by Lyft's code of conduct. All contributors and participants agree to abide by its terms.

Contributing code

How to test your code contributions

See these docs.

Sign the Contributor License Agreement (CLA)

We require a CLA for code contributions, so before we can accept a pull request we need to have a signed CLA. Please visit our CLA service and follow the instructions to sign the CLA.

File issues in GitHub

In general, all enhancements or bugs should be tracked via GitHub issues before PRs are submitted. We don't require them, but they'll help us plan and track.

When submitting bugs through issues, please try to be as descriptive as possible. It'll make it easier and quicker for everyone if the developers can easily reproduce your bug.

Submit pull requests

Our only method of accepting code changes is through GitHub pull requests.

Reference

Schema

Detailed view of our schema and all data types 😁.

Sample queries

What RDS instances are installed in my AWS accounts?

MATCH (aws:AWSAccount)-[r:RESOURCE]->(rds:RDSInstance)
RETURN *

Which RDS instances have encryption turned off?

MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance{storage_encrypted:false})
RETURN a.name, rds.id

Which EC2 instances are exposed (directly or indirectly) to the internet?

MATCH (instance:EC2Instance{exposed_internet: true})
RETURN instance.instanceid, instance.publicdnsname

Which ELB LoadBalancers are internet accessible?

MATCH (elb:LoadBalancer{exposed_internet: true})-->(listener:ELBListener)
RETURN elb.dnsname, listener.port
ORDER BY elb.dnsname, listener.port

Which ELBv2 LoadBalancerV2s (Application Load Balancers) are internet accessible?

MATCH (elbv2:LoadBalancerV2{exposed_internet: true})-->(listener:ELBV2Listener)
RETURN elbv2.dnsname, listener.port
ORDER BY elbv2.dnsname, listener.port

Which S3 buckets have a policy granting any level of anonymous access to the bucket?

MATCH (s:S3Bucket)
WHERE s.anonymous_access = true
RETURN s

How many unencrypted RDS instances do I have in all my AWS accounts?

MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
WHERE rds.storage_encrypted = false
RETURN a.name AS AWSAccount, count(rds) AS UnencryptedInstances

What users have the TotallyFake extension installed?

MATCH (u:GSuiteUser)-[r:INSTALLS]->(ext:ChromeExtension)
WHERE ext.name CONTAINS 'TotallyFake'
RETURN ext.name, ext.version, u.email

What users have installed extensions that are risky based on CRXcavator scoring?

Risk > 200 is evidence of 3 or more critical risks or many high risks in the extension.

MATCH (u:GSuiteUser)-[r:INSTALLS]->(ext:ChromeExtension)
WHERE ext.risk_total > 200
RETURN ext.name, ext.version, u.email

Data Enrichment

Cartography adds custom attributes to nodes and relationships to point out security-related items of interest. Unless mentioned otherwise these data augmentation jobs are stored in cartography/data/jobs/analysis. Here is a summary of all of Cartography's custom attributes.

  • exposed_internet indicates whether the asset is accessible to the public internet.

    • Elastic Load Balancers: The exposed_internet flag is set to True when the load balancer's scheme field is set to internet-facing, and the load balancer has an attached source security group with rules allowing 0.0.0.0/0 ingress on ports or port ranges matching listeners on the load balancer. This scheme indicates that the load balancer has a public DNS name that resolves to a public IP address.

    • Application Load Balancers: The exposed_internet flag is set to True when the load balancer's scheme field is set to internet-facing, and the load balancer has an attached security group with rules allowing 0.0.0.0/0 ingress on ports or port ranges matching listeners on the load balancer. This scheme indicates that the load balancer has a public DNS name that resolves to a public IP address.

    • EC2 instances: The exposed_internet flag on an EC2 instance is set to True when any of the following apply:

      • The instance is a member of an EC2 security group, or is attached to a network interface connected to an EC2 security group, that allows connectivity from the 0.0.0.0/0 subnet.

      • The instance is connected to an Elastic Load Balancer that has its own exposed_internet flag set to True.

      • The instance is connected to a TargetGroup which is attached to a Listener on an Application Load Balancer (elbv2) that has its own exposed_internet flag set to True.

    • ElasticSearch domain: exposed_internet is set to True if the ElasticSearch domain has a policy applied to it that makes it internet-accessible. This policy determination is made by using the policyuniverse library. The code for this augmentation is implemented at cartography.intel.aws.elasticsearch._process_access_policy().

  • anonymous_access indicates whether the asset allows access without needing to specify an identity.
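Because these enrichment flags are stored directly on the nodes, you can query across asset types in a single statement. For example, using only node labels already shown in this document, the following counts internet-exposed assets by type:

// Count internet-exposed assets by node label
MATCH (n)
WHERE n.exposed_internet = true
  AND (n:EC2Instance OR n:LoadBalancer OR n:LoadBalancerV2)
RETURN labels(n) AS asset_type, count(n) AS exposed_count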

Cross-Account Auditing

Multiple AWS Account Setup

There are many ways to allow Cartography to pull from more than one AWS account. We can't cover all of them, but we can show you the way we have things set up at Lyft. In this scenario we will assume that you are going to run Cartography on an EC2 instance.

  1. Pick one of your AWS accounts to be the "Hub" account. This Hub account will pull data from all of your other accounts - we'll call those "Spoke" accounts.

  2. Set up the IAM roles: Create an IAM role named cartography-read-only on all of your accounts. Configure the role on all accounts as follows:

    1. Attach the built-in AWS SecurityAudit IAM policy (arn:aws:iam::aws:policy/SecurityAudit) to the role. This grants access to read security config metadata.

    2. Set up a trust relationship so that the Spoke accounts will allow the Hub account to assume the cartography-read-only role. The resulting trust relationship should look something like this:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::<Hub's account number>:root"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
      
    3. Allow a role in the Hub account to assume the cartography-read-only role on your Spoke account(s).

      • On the Hub account, create a role called cartography-service.

      • On this new cartography-service role, add an inline policy with the following JSON:

         {
           "Version": "2012-10-17",
           "Statement": [
             {
               "Effect": "Allow",
               "Resource": "arn:aws:iam::*:role/cartography-read-only",
               "Action": "sts:AssumeRole"
             },
             {
               "Effect": "Allow",
               "Action": "ec2:DescribeRegions",
               "Resource": "*"
             }
           ]
         }
        

        This allows the Hub role to assume the cartography-read-only role on your Spoke accounts and to fetch all the different regions used by the Spoke accounts.

      • When prompted to name the policy, you can name it anything you want - perhaps CartographyAssumeRolePolicy.

  3. Set up your EC2 instance to correctly access these AWS identities

    1. Attach the cartography-service role to the EC2 instance that you will run Cartography on. You can do this by following these official AWS steps.

    2. Ensure that the [default] profile in your AWS_CONFIG_FILE (default ~/.aws/config on Linux and %UserProfile%\.aws\config on Windows) looks like this:

       [default]
       region=<the region of your Hub account, e.g. us-east-1>
       output=json
      
    3. Add a profile to your AWS_CONFIG_FILE for each AWS account you want Cartography to sync with. It will look something like this:

      [profile accountname1]
      role_arn = arn:aws:iam::<AccountId#1>:role/cartography-read-only
      region=us-east-1
      output=json
      credential_source = Ec2InstanceMetadata
      
      [profile accountname2]
      role_arn = arn:aws:iam::<AccountId#2>:role/cartography-read-only
      region=us-west-1
      output=json
      credential_source = Ec2InstanceMetadata
      
      ... etc ...
      

Multiple GCP Project Setup

In order for Cartography to be able to pull all assets from all GCP Projects within an Organization, the User/Service Account assigned to Cartography needs to be created at the Organization level. This is because IAM access control policies applied on the Organization resource apply throughout the hierarchy on all resources in the organization.
