
ppfeister / oculus


Simplify the link between social and real identities

Home Page: http://sylva.pfeister.dev/

License: GNU Affero General Public License v3.0

Python 98.34% Dockerfile 1.66%
osint osint-tool social-network-analysis cti information-gathering identity-discovery

oculus's Introduction


Sylva identity discovery


Note that Sylva is undergoing rapid development, so documentation may quickly become outdated.

Visit the Sylva Wiki for more information.

Summary

Useful integrations and data sources

Name       Description                                      API Key
Endato     Person data source (phone, address, cell, etc.)  Req [ T | $ ]
IntelX     Data leak source                                 Req [ T | $ ]
ProxyNova  COMB API (cleartext passwords, usernames)        Native
Veriphone  Phone number lookup                              Req [ F+ ]
GitHub     See detail below                                 Opt [ F ]
Reddit     Natural language processing for residency hints  Native

$ : paid | T : trial | F : free | F+ : freemium

Most development was done without any paid access, so although some integrations require an account, the full experience is available without any subscription.

Generic modules

Name           Description
PGP Search     Search for identities through discovered PGP keys
Sherlock       Sherlock extended for discovery of additional identities and branching
Voter Records  Geographical, relation, and age lookup in 18 US states

GitHub Integration

Query GitHub for any known PGP keys, scrape both the oldest and newest 1000 commit authorships (2000 total) for leaked identifying information, and search for identities based on full name, email, or username.

A personal access token (PAT) is required for PGP scraping, but all other functions work out of the box. A PAT is still recommended for the other functions because it raises the rate limits. The PAT does not need any permissions assigned to it.
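As a rough illustration of the commit-authorship step, the sketch below builds GitHub REST API request headers with an optional PAT and pulls unique author name/email pairs out of commit payloads. It is not Sylva's implementation: the function names are hypothetical, and it assumes the standard REST commit listing shape, where each item carries a commit.author object with name and email fields.

```python
from typing import Iterable, List, Optional, Tuple

GITHUB_API = "https://api.github.com"


def github_headers(token: Optional[str] = None) -> dict:
    """Build request headers; the PAT is optional and needs no scopes."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # An authenticated request gets a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


def commits_url(owner: str, repo: str, page: int = 1) -> str:
    """URL for one page of a repository's commits (100 per page is the API maximum)."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/commits?per_page=100&page={page}"


def unique_commit_authors(commits: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (name, email) pairs in first-seen order, skipping entries without an email."""
    seen = set()
    authors = []
    for item in commits:
        person = (item.get("commit") or {}).get("author") or {}
        pair = (person.get("name"), person.get("email"))
        if pair[1] and pair not in seen:
            seen.add(pair)
            authors.append(pair)
    return authors
```

At 100 commits per page, 1000 commits correspond to 10 pages of results. How Sylva selects the oldest and newest pages is not covered here.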

Quick Start

Docker is the preferred method of installation, providing the most consistent and predictable user experience.

docker run -it sylva/sylva --help

For a preview of the latest changes, the preview tag may be used.

docker run -it sylva/sylva:preview --help

Tip

Some users may opt to add an alias to their shell for ease of use.

Adding alias sd="docker run -it sylva/sylva" to your ~/.bashrc or ~/.zshrc lets you type sd branch user123 instead of the full docker command. Append the :preview tag to the image name if necessary.

Other installation methods are described on the Sylva Wiki.

Packagers

It's recommended that you don't package Sylva yet. Changes are landing faster than most release cycles allow. If you'd like to package Sylva, feel free to reach out for info!

Contributing

Contributors should refer to our contributing guidelines for information on how to contribute to the project. Note that since the project is still in its infancy, there isn't yet a formal roadmap.

Contributors opening a pull request are assumed to have read and agreed to the guidelines.


oculus's People

Contributors

ppfeister, spiritgun91


oculus's Issues

Displayed count of results from Reddit NLP occasionally off by one (above)

After a module or integration is run, the dataframe it created is returned to the handler. The handler doesn't use the dataframe's contents, only its length, which it displays to the user as the number of discovered items.

Limited natural language processing has been added to search a user's contribution history on Reddit for possible hints about their residency or visited locations.

For certain usernames, the count displayed for this new module may be one too high. Not all usernames are affected, but the behavior is consistent for those that are. If 5 rows are added to the collector's dataframe, the stdout may read 6.
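One way to confirm the symptom is to compare the count the handler would display with the number of rows the collector actually gained. The sketch below is only an illustration under stated assumptions: plain lists of dicts stand in for the dataframes, and compare_counts and fake_module are hypothetical names, not part of Sylva. It does not identify the cause of the bug.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def compare_counts(
    module: Callable[[List[Row]], List[Row]], collector: List[Row]
) -> Tuple[int, int]:
    """Run a module and return (displayed count, rows actually added to the collector)."""
    before = len(collector)
    returned = module(collector)        # stand-in for the dataframe handed to the handler
    displayed = len(returned)           # what the handler would print
    added = len(collector) - before     # what really landed in the collector
    return displayed, added


def fake_module(collector: List[Row]) -> List[Row]:
    """Hypothetical module that adds 5 rows but returns 6, mimicking the report."""
    rows = [{"location": f"city{i}"} for i in range(5)]
    collector.extend(rows)
    return rows + [{"location": "extra"}]


displayed, added = compare_counts(fake_module, [])
print(f"displayed={displayed} added={added}")
```

Running the same comparison against an affected username would show whether the extra item is in the returned dataframe itself or is introduced when the count is displayed.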

Proxy service fails to start

Debian repositories updated Chromium to 127. This new version has a high rate of crashing on launch, leading to proxy failures. Chromium 126.0.6478.182 is known to be functional.

Either regress to 126.0.6478.182 in the Dockerfile, or patch the new version.

Note: With the hotfix applied, Dockerfile is the temporary working copy, while Dockerfile.std is the standard copy that works apart from this bug. Dockerfile.std should become the base again once the bug is fixed.

NLP helper (spaCy) lacks proper support for complex prompts with regard to residency

Sylva uses spaCy to help parse arbitrary text for possible residency info.

This helper module can be found at: https://github.com/ppfeister/sylva/blob/master/src/sylva/helpers/nlp.py

To reproduce using the included test suite, developers can either run pytest --runxfail (detailed) or pytest -rx (minimal).

Two examples of currently failing prompts from the test suite:
- I've lived in both Boston and Bremen before
- I've vacationed in Manchester, but lived in Bremen and Boston
Note that the second example is expected to not return Manchester, only Bremen and Boston.
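To make the expected results for these two prompts concrete, here is a minimal sketch that uses a plain regular expression in place of spaCy. It is not Sylva's helper and would not generalize to arbitrary text. The function name expected_residences and the verb list are assumptions chosen for illustration.

```python
import re
from typing import List

# Verbs treated as residency signals (an assumption for this sketch).
_VERBS = r"(?:live[sd]?|living|reside[sd]?)"
# A place name: one or more capitalized words, e.g. "Boston" or "New York".
_PLACE = r"[A-Z][\w'-]*(?:\s[A-Z][\w'-]*)*"
_SEP = r"(?:,\s*and\s+|,\s*|\s+and\s+)"
_PATTERN = re.compile(
    rf"\b{_VERBS}\s+in\s+(?:both\s+)?({_PLACE}(?:{_SEP}{_PLACE})*)"
)


def expected_residences(text: str) -> List[str]:
    """Return place names that directly follow a residency verb, in order of appearance."""
    places: List[str] = []
    for match in _PATTERN.finditer(text):
        for name in re.split(_SEP, match.group(1)):
            if name and name not in places:
                places.append(name)
    return places


print(expected_residences("I've lived in both Boston and Bremen before"))
print(expected_residences("I've vacationed in Manchester, but lived in Bremen and Boston"))
```

Because "vacationed" is not in the verb list, Manchester is skipped in the second prompt, which matches the expectation stated above.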


General improvements to this module (accuracy, match rate...) are always welcome.
