
hm-seclab / yafra

YAFRA is a semi-automated framework for analyzing and representing reports about IT Security incidents.

Home Page: https://seclab.cs.hm.edu/oss-projects/yafra/

License: Apache License 2.0

Languages: Python 98.29%, Dockerfile 0.66%, HTML 0.46%, Shell 0.34%, Makefile 0.25%

Topics: ioc, incident-response, cybersecurity, threatintel, threat-intelligence, threat-hunting, indicators, indicators-of-compromise, ioa, cyber-threat-intelligence

yafra's Introduction

YAFRA

YAFRA stands for [y]et [a]nother [f]ramework for [r]eport [a]nalysis

Description

YAFRA is a semi-automated framework for analysing and representing reports about IT security incidents. Users provide reports as PDF files, and YAFRA extracts the IOCs (indicators of compromise) they contain. After extraction, these IOCs are enriched with data from external sources such as VirusTotal or MITRE to provide more context.

Installation and Configuration

For installation and configuration instructions, see the docs folder.

Examples

Example reports can be found on the website of the US-CERT (CISA): https://us-cert.cisa.gov/ncas/analysis-reports

Extensions

YAFRA provides an easy-to-use extension system called YAFRA-Extensions. For more information, see the extensions folder.

yafra's People

Contributors

certbe-trey, deralexmeister, fritterhoff, p2h5, thomas-schreck

yafra's Issues

Empty branch for checkout

Instead of checking out the monthly branch when a new report is created, use an empty branch with no files.

This should improve GitLab performance and the usability of the branch system.

Replace Flask-Script

Flask-Script is no longer actively maintained, so replace it with a similar library.

Separate between auto scraped and individual reports in gitlab

Is your feature request related to a problem? Please describe.
To get a better overview within GitLab, it would be good practice to separate the branches and reports of auto-scraped data from the branches and reports that were analyzed based on an uploaded report.

Extractor No Broker Available Error

Describe the bug
Once in a while, the Extractor throws the following error:

NoBrokersAvailable

It still runs after this error, but throws it again later.

Adding Filter Service for IoC

Is your feature request related to a problem? Please describe.
Add a dedicated filter service as a standalone microservice. Also add the MISP-WarningLists (#21).

Describe the solution you'd like
Add a microservice that filters the IoCs found against a larger, optimized blacklist. This microservice should be placed between the Extractor and the Pusher but will NOT communicate with them directly.
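
The filtering step itself could look roughly like the sketch below. This is only an illustration under assumptions: the function names `load_blocklist` and `filter_iocs` are hypothetical, the blacklist is treated as a plain list of strings, and the messaging between the services is left out entirely.

```python
from typing import Iterable, List, Set


def load_blocklist(entries: Iterable[str]) -> Set[str]:
    """Normalize entries once so that later lookups are cheap set membership tests."""
    return {entry.strip().lower() for entry in entries if entry.strip()}


def filter_iocs(iocs: Iterable[str], blocklist: Set[str]) -> List[str]:
    """Return the IoCs that are not on the blocklist, preserving input order."""
    return [ioc for ioc in iocs if ioc.strip().lower() not in blocklist]


# Hypothetical example data; a real list would come from the project's datasets.
blocklist = load_blocklist(["example.com", "127.0.0.1", ""])
print(filter_iocs(["Example.com", "evil.test", "127.0.0.1"], blocklist))
```

Using a set keeps each lookup at roughly constant cost, which matters once the list grows to the size of the MISP warning lists.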

Add Webscraper for websites without rss feed

Is your feature request related to a problem? Please describe.
Various websites do not offer RSS feeds to scrape. For those websites, another way to get the data has to be found.

Describe the solution you'd like

  • Download the HTML from the given sources.
  • Extract links from the downloaded data.
  • Scrape the reports that the links point to for IOCs.
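
The first two steps above can be sketched with the Python standard library alone. This is a minimal sketch, not the project's scraper: `LinkCollector`, `download_html` and `extract_links` are hypothetical names, and the IOC extraction in the third step is assumed to stay with the Extractor.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links become absolute so they can be fetched later.
                self.links.append(urljoin(self.base_url, value))


def download_html(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the page and decode it with the declared charset."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html: str, base_url: str) -> List[str]:
    """Step 2: return all links found in the downloaded HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Offline example with a hypothetical page; no network access is needed here.
sample = '<a href="/reports/1.html">R1</a> <a href="https://other.test/x">X</a>'
print(extract_links(sample, "https://example.test/index.html"))
```

In a real run, the output of `download_html(source_url)` would be passed to `extract_links`, and each resulting link would be downloaded in turn and handed on for IOC extraction.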

Improve Readme

Add more information in the readme and link to the docs.

Extractor None Type Error

Describe the bug
The Extractor raises the following error shortly after starting:

object of type 'NoneType' has no len()

It still runs, but the error occurs once in a while.

Add rules for removing special chars from branch title

Describe the bug
Errors occur when creating new branches for reports whose titles contain special characters.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new Report with a special char in the title e.g. @'/
  2. Upload it with yafra to gitlab

Expected behavior
Creation of a new branch and report by title in gitlab.

Possible solution
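import re
title = "Report @'/ title"  # example report title containing special characters
# Keep only ASCII letters and digits; everything else is removed.
result = re.sub(r'[^a-zA-Z0-9]', '', title)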

Update docs

  • Configuration
  • System Monitoring
  • Requirements
  • Installation
  • Testing
  • Overview (README.md in docs)
  • Readme.md in root-dir

Add reverse proxy

To reduce the large number of required ports, add a reverse proxy in front of the web applications.

Remove empty reports

Is your feature request related to a problem? Please describe.
Remove empty reports in gitlab, to avoid too many branches.

Describe the solution you'd like
Only keep those reports that contain IOCs.

Separate links to further information from IOC url in tweets

Is your feature request related to a problem? Please describe.
Links to further information in tweets (e.g. articles, websites) are marked as IOC URLs.
It would be better if those links were marked as further links to other websites and also written into the report.

No IoC from Pusher when MISP is down

Describe the bug
When MISP is down or unavailable and IoCs are processed, the report will be empty if the IoC is a domain, IP address, etc.

To Reproduce
Steps to reproduce the behavior:

  1. Start YAFRA
  2. Stop MISP/Disconnect
  3. Enter a report/data from the scraper
  4. A report with a domain will not contain that IoC

Expected behavior
The report contains the domain, but without MISP information.
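
One way to get the expected behavior is to treat the MISP lookup as optional enrichment. The sketch below assumes a hypothetical `misp_lookup` callable and a dictionary-based report entry; neither is taken from the Pusher's actual code.

```python
from typing import Callable, Dict


def enrich_ioc(ioc: str, misp_lookup: Callable[[str], Dict]) -> Dict:
    """Always return the IoC; attach MISP context only if the lookup succeeds."""
    entry: Dict = {"ioc": ioc, "misp": None}
    try:
        entry["misp"] = misp_lookup(ioc)
    except Exception as error:  # assumption: a real client may raise narrower exceptions
        # Record the failure instead of dropping the IoC from the report.
        entry["misp_error"] = str(error)
    return entry


def unavailable_misp(ioc: str) -> Dict:
    """Stand-in for a MISP client while the server is down (hypothetical)."""
    raise ConnectionError("MISP is not reachable")


print(enrich_ioc("evil.test", unavailable_misp))
```

With this shape, a MISP outage only removes the extra context, and the domain itself still reaches the report.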

Sources error on scraper

Describe the bug
Error in the Scraper concerning rss_sources, twitter_sources and api_sources

To Reproduce
Steps to reproduce the behavior:

  1. Fetch the current version from feature-scraper-extractor-marriage
  2. Add .env
  3. docker-compose up

Desktop (please complete the following information):

  • Ubuntu 21.04

GitLab backoff on error

If GitLab returns a 500 or 502, the system should back off and wait for some time before retrying to send data.
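
A minimal sketch of such a backoff, using exponentially growing delays. The function name `send_with_backoff`, the `send` callable that returns an HTTP status code, and all default values are assumptions for illustration, not part of YAFRA.

```python
import time
from typing import Callable, Iterable


def send_with_backoff(
    send: Callable[[], int],
    retry_statuses: Iterable[int] = (500, 502),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    retryable = set(retry_statuses)
    status = send()
    for attempt in range(1, max_attempts):
        if status not in retryable:
            return status
        # Delay doubles per retry (1s, 2s, 4s, ...) and is capped at max_delay.
        sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
        status = send()
    return status


# Simulated GitLab responses: two server errors, then success.
responses = iter([502, 500, 201])
delays = []
print(send_with_backoff(lambda: next(responses), sleep=delays.append), delays)
```

Passing `sleep` as a parameter keeps the function testable without real waiting. The last status is returned even if it is still an error, so the caller can decide whether to queue the data or give up.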

Use slimmer Docker base image

At the moment, the "large" python:3.8 image (about 320MB) is used. In most cases, python:3.8-slim (40MB) or python:3.8-alpine (15MB) should be enough.

Use wsgi server for production

At the moment, the default Flask server is used for production. It is recommended best practice to use a WSGI server such as Gunicorn for production.
As far as I know, the scheduler could cause some issues due to the fork model of the server (depending on the individual solution).

Kafka does not start correctly

In some cases, the Kafka container does not start up correctly, so the services are unable to publish/... their data.

An indicator of this situation seems to be that Kafka starts a new broker with id != 1001 on the initial start and the services run into a timeout.

At the moment, the following workaround seems to fix the problem:

  • Stop Kafka
  • Get the implicitly created Docker volume by running docker volume kafka and note the volume id
  • Execute docker-compose down kafka
  • Delete the volume by its id

At the moment, this issue seems to affect Windows 10 + Docker and Debian 10 + Docker.

Get initial sources from a local directory at the first time.

Is your feature request related to a problem? Please describe.
To avoid errors caused by sources and the blacklist not being found in GitLab when YAFRA runs for the first time, it should initially load the sources and the blacklist from a local directory.

Describe the solution you'd like
Get all sources and the blacklist from a local directory, before calling it for the first time.
Note: the data sources and the blacklist are stored locally in /datasets.

MISP warning

Please make sure the API key and the URL are correct (http/https is required): maximum recursion depth exceeded while calling a Python object - (Pusher)

Collision of RSS-Feed Branch names possible?

Could a collision between multiple events in an RSS feed be possible?

Are there RSS feeds that send similar events with similar names?

When fetching information with the scraper, the system prints a lot of "Branch already exists" messages from time to time,

although the name of a branch from every source should be unique.

A test run with ~1650 RSS events resulted in ~900 in GitLab.
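
If colliding titles turn out to be the cause, one possible mitigation is to derive the branch name from the title plus a short hash of a value that is unique per event. The sketch below assumes that each RSS event has a distinct link; the helper `unique_branch_name` is hypothetical and not part of the scraper.

```python
import hashlib
import re


def unique_branch_name(title: str, link: str, max_title_length: int = 40) -> str:
    """Build a branch name from a sanitized title and a hash of the event link."""
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", title).strip("-").lower()
    slug = slug[:max_title_length].rstrip("-")
    # Eight hex characters of SHA-256 distinguish events that share a title.
    digest = hashlib.sha256(link.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}" if slug else digest


# Two hypothetical events with the same title but different links.
print(unique_branch_name("Weekly Threat Report", "https://feed.test/items/1"))
print(unique_branch_name("Weekly Threat Report", "https://feed.test/items/2"))
```

The same event always maps to the same name, so a repeated fetch still reports "Branch already exists" for true duplicates, while distinct events with identical titles no longer overwrite each other.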

Improve Logging

Is your feature request related to a problem? Please describe.
To learn about possible errors faster, we should log more specific information about the running jobs, functions, etc. by using LogTyp.INFO.

Throttle upload of data to Gitlab.

Describe the bug
GitLab returns a 500 and crashes if too much data is uploaded in a short amount of time.
Therefore, we should throttle the uploads.
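
A simple way to throttle is to enforce a minimum interval between two uploads. The following is a sketch under assumptions: the `Throttle` class and the interval value are illustrative, and a suitable interval for a given GitLab instance would have to be found by testing.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensure that at least min_interval seconds pass between two calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next upload is allowed; return the seconds waited."""
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited


# Usage: call wait() before each upload (the upload itself is omitted here).
throttle = Throttle(min_interval=0.01)
for report in ["report-a", "report-b", "report-c"]:
    throttle.wait()
    print("uploading", report)
```

The clock and sleep functions are injectable so the behavior can be checked without real delays.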

Error when creating a repository

Describe the bug
When starting the Extractor an error occurred on a new repository.

To Reproduce
Steps to reproduce the behavior:

  1. Fetch the current version from feature-scraper-extractor-marriage
  2. Add a .env-File
  3. docker-compose up

Expected behavior
No exception when starting the service.

Desktop (please complete the following information):

  • Ubuntu 21.04

Type Error Scraper

Describe the bug
While the system is booting, the scraper raises an exception. This occurs in the version on feature-scraper-extractor-marriage:

unsupported operand type(s) for +: 'NoneType' and 'NoneType'

To Reproduce
Steps to reproduce the behavior:

  1. Fetch the current version from feature-scraper-extractor-marriage
  2. Add .env
  3. docker-compose up

Expected behavior
No type error

Desktop (please complete the following information):

  • Ubuntu 21.04
