This project forked from netograph/netograph-api

Tools and libraries for interacting with the Netograph API

Netograph API

This repository contains a command-line tool and language packs for interacting with the Netograph API.

ngc: Netograph command-line interface

ngc is a command-line tool that exposes the complete Netograph interface for interactive use. If you have a Go development environment set up, you can install it as follows:

go get -u github.com/netograph/netograph-api/go/cmd/ngc

Alternatively, binaries for all major platforms can be downloaded from the latest release page.

After installation, try running ngc for a high-level overview of the API, and ngc help command for help on any specific command.

You can find rendered documentation for the full API here.

Configuration

ngc is configured through environment variables.

Variable       Description
NGC_DATASET    The dataset to operate on, unless overridden with the --dset flag. Defaults to netograph:social, our largest public data repository.
NGC_TOKEN      The authentication token for API access. Can be overridden with the --token command-line flag.
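
The precedence described above (command-line flag first, then environment variable, then the built-in default) can be sketched in Python. This only illustrates the lookup order; the helper name resolve_setting is hypothetical and is not part of ngc or the netograph module.

```python
import os

# Default dataset named in the configuration description above.
DEFAULT_DATASET = "netograph:social"


def resolve_setting(flag_value, env_name, default=None, environ=None):
    """Return the flag value if given, else the environment variable, else the default.

    Hypothetical helper: it mirrors the documented precedence, not ngc's source.
    """
    environ = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value
    return environ.get(env_name, default)


if __name__ == "__main__":
    env = {"NGC_TOKEN": "MYTOKEN"}
    # No flag and no NGC_DATASET variable: falls back to the default.
    print(resolve_setting(None, "NGC_DATASET", DEFAULT_DATASET, env))
    # No flag: the environment variable is used.
    print(resolve_setting(None, "NGC_TOKEN", environ=env))
    # A flag value always wins over the environment.
    print(resolve_setting("other-token", "NGC_TOKEN", environ=env))
```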

Output formats

By default, ngc outputs data as indented JSON. You can also use the --cjson flag to output compact JSON, which puts each record on a single line with no embedded newlines and is therefore suitable for programmatic use.
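
To make the difference between the two formats concrete, the sketch below renders the same records both ways with Python's json module. The record contents are invented for illustration and do not reflect the fields the API actually returns.

```python
import json

# Invented sample records; real ngc output will have different fields.
records = [
    {"id": 1, "note": "first\nrecord"},
    {"id": 2, "note": "second record"},
]


def to_indented(record):
    """Human-readable form: spans several lines per record."""
    return json.dumps(record, indent=2)


def to_compact(record):
    """Compact form: exactly one line per record.

    Newlines inside string values are escaped by the JSON encoder,
    so the serialized record never contains a raw newline.
    """
    return json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    for rec in records:
        print(to_compact(rec))
```

Because each line is a complete JSON document, a consumer can process the compact form line by line without buffering the whole output.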

Examples

Begin by exporting your access token:

export NGC_TOKEN=MYTOKEN

You can now list the datasets that you have access to:

ngc datasets

Note that public datasets will be marked read-only. You can query these, but not submit new URLs to be captured to them. For datasets you have write access to, see the ngc submitcapture command for capture submission.

The default dataset is the netograph:social dataset, which aggregates a sizeable fraction of all URLs passing through social media in real time. As a first step, let's list all the satellites we've ever seen for a domain query.

ngc satellitesforroot rt.com

There are a few things to note here.

  • This command searches for rt.com and ALL its subdomains. See the Domain Queries section for a description of how to restrict to a specific domain.
  • The output is pretty verbose, and it would be nice to extract only the information we're interested in. We recommend jq, a lightweight command-line JSON processor, for this.
  • The command name is pretty long. Most commands have shorter aliases, which you can view using ngc help.
  • By default, we limit the number of responses for queries to 100, because it's pretty easy to craft a command that will return millions of records. In this case, we want to list ALL of the satellites we've ever seen on rt.com, so we disable the limit by setting it to 0 with the -n flag.

Putting all of this together, we have a command like this:

ngc -n 0 --cjson satsforroot rt.com | jq -r .satellite

At the time of writing, this command cleanly lists about 3600 third-party domains for rt.com.
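
If jq is not available, the same extraction can be done in a few lines of Python. This is a rough equivalent of jq -r .satellite, assuming (as the jq filter above implies) that each compact record carries a top-level satellite field. The script name satellites.py used below is hypothetical.

```python
import json
import sys


def satellites(lines):
    """Yield the "satellite" field from each compact JSON record.

    Blank lines are skipped; records without the field are ignored.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict) and "satellite" in record:
            yield record["satellite"]


if __name__ == "__main__":
    # Reads records from standard input, one per line.
    for name in satellites(sys.stdin):
        print(name)
```

It would be used at the end of the pipeline in place of jq:

ngc -n 0 --cjson satsforroot rt.com | python satellites.py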

Language packs

Go

You may install the Netograph Go library with the following command:

go get -u github.com/netograph/netograph-api/go

Please see the source for the ngc tool in this repo for a comprehensive usage example.

Python

You can install the current release from the PyPI registry as follows:

pip install netograph

To install the development version of the library, check out this repo, and type:

pip install ./python

Both of these commands will install the netograph Python module. See the examples within the Python directory for usage.

API Notes

Domain queries

When dealing with domains, the Netograph API usually accepts domain queries rather than specific domains. This means that a query for rt.com will also return results for www.rt.com, social.rt.com and so forth. You can restrict a query strictly to a specified domain by prefixing it with "$". So, a query for "$rt.com" will match no subdomains.
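
The matching rule can be sketched as a small Python predicate. This is only a model of the behaviour described above, written for illustration. The function name is hypothetical, the real matching happens on the Netograph side, and the label-boundary check is an assumption about how subdomains are recognised.

```python
def matches_domain_query(query, domain):
    """Return True if `domain` is matched by the domain query `query`.

    A leading "$" restricts the query to the exact domain; otherwise the
    domain itself and all of its subdomains match.
    """
    if query.startswith("$"):
        return domain == query[1:]
    # Assumption: matching respects label boundaries, so "notrt.com"
    # is not treated as a subdomain of "rt.com".
    return domain == query or domain.endswith("." + query)


if __name__ == "__main__":
    for q, d in [
        ("rt.com", "www.rt.com"),
        ("rt.com", "rt.com"),
        ("$rt.com", "www.rt.com"),
        ("$rt.com", "rt.com"),
    ]:
        print(q, d, matches_domain_query(q, d))
```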

Note that "$" is syntactically significant to most shells, even within double-quoted strings. You'll need to escape it when using ngc, e.g.:

ngc ipsfordomain "\$rt.com"
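
When ngc is driven from a script rather than typed by hand, the escaping can be generated instead of written manually. A minimal sketch using Python's standard shlex module, with the command taken from the example above:

```python
import shlex

# Arguments exactly as ngc should receive them.
argv = ["ngc", "ipsfordomain", "$rt.com"]

# shlex.quote wraps the argument in single quotes, inside which a POSIX
# shell performs no "$" expansion.
quoted = shlex.quote("$rt.com")

# shlex.join quotes every argument and joins them into one command line.
command_line = shlex.join(argv)

if __name__ == "__main__":
    print(quoted)
    print(command_line)
```

Passing argv as a list to subprocess.run without shell=True sidesteps the problem entirely, since no shell is involved in that case.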

Resume tokens

Netograph has an efficient streaming API: queries that return thousands or hundreds of thousands of records are permitted and common. In most cases, each record comes with a resume token, which can be passed in a later query to resume streaming if a connection is lost, or to provide functionality like paging. Resume tokens are only valid when passed to the exact same query that originated them, and should not be stored persistently.
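
The resume pattern can be sketched as follows. Everything here is hypothetical: fake_stream stands in for a streaming query and simulates a dropped connection, and its tokens are plain indexes, whereas real resume tokens should be treated as opaque. The point is only the control flow: remember the last token seen, and pass it back to the same query after a failure.

```python
class ConnectionLost(Exception):
    """Raised by the fake stream to simulate a dropped connection."""


def fake_stream(query, resume=None, fail_after=None):
    """Hypothetical stand-in for a streaming query.

    Yields (record, resume_token) pairs. Here the token is simply the
    index of the next record.
    """
    data = [f"{query}-record-{i}" for i in range(5)]
    start = 0 if resume is None else resume
    for sent, i in enumerate(range(start, len(data))):
        if fail_after is not None and sent == fail_after:
            raise ConnectionLost()
        yield data[i], i + 1


def fetch_all(query, max_retries=3):
    """Collect every record, resuming from the last token after a failure."""
    results = []
    token = None
    # Simulation only: drop the connection after two records on the first attempt.
    fail_after = 2
    for _ in range(max_retries + 1):
        try:
            # The token is kept in memory only and reused with the same query.
            for record, token in fake_stream(query, token, fail_after):
                results.append(record)
            return results
        except ConnectionLost:
            fail_after = None  # later attempts in this simulation succeed
    raise RuntimeError("too many dropped connections")


if __name__ == "__main__":
    for record in fetch_all("rt.com"):
        print(record)
```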

Capture IDs

Each capture has a unique ID. IDs can contain dashes ("-"), which means they need to be treated specially on the command line. In particular, since the first character of an ID can be a dash, an ID can be confused with a flag. To avoid this, you may need to use the rarely-used double-dash shell idiom to indicate the end of flag arguments:

ngc download ./dst -- -IDWITHINITIALDASH
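
When building the command from a script, it is simplest to always insert the double dash before the ID, whether or not the ID starts with a dash. A minimal sketch; the helper name download_argv is hypothetical, and the argument order follows the example above.

```python
import shlex


def download_argv(dst, capture_id):
    """Build an argument list for `ngc download`, guarding the ID with "--".

    Everything after "--" is treated as a positional argument, so an ID
    that begins with a dash is not mistaken for a flag.
    """
    return ["ngc", "download", dst, "--", capture_id]


if __name__ == "__main__":
    print(shlex.join(download_argv("./dst", "-IDWITHINITIALDASH")))
```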

Contributors

cortesi, mhils
