nvisosecurity / ee-outliers

Open-source framework to detect outliers in Elasticsearch events

Home Page: https://blog.nviso.eu

License: GNU General Public License v3.0

Dockerfile 0.25% Python 99.01% Shell 0.74%
outliers netsec threat-hunting statistics security-monitoring anomaly-detection outlier-detection siem cirt security-operations

ee-outliers's Introduction

ee-outliers

Framework to easily detect outliers in Elasticsearch events.

Developed in Python and fully dockerized!

What is ee-outliers?

ee-outliers is a framework to detect statistical outliers in events stored in an Elasticsearch cluster. It uses easy-to-write, user-defined configuration files to decide which events should be analysed for outliers, and how.

The framework was developed for the purpose of detecting anomalies in security events, however it could just as well be used for the detection of outliers in other data.

The only thing you need is Docker and an Elasticsearch cluster and you are ready to start your hunt for outlier events!

Why ee-outliers?

Although we love Elasticsearch, its search language is still lacking support for complex queries that allow for advanced analysis and detection of outliers - features we came to love while using other tools such as Splunk.

This framework tries to address these limitations by letting the user write simple use cases that help spot outliers in their data using statistical models. Machine learning models are under development.

How it works

The framework makes use of statistical models that are easily defined by the user in a configuration file. In case the models detect an outlier, the relevant Elasticsearch events are enriched with additional outlier fields. These fields can then be dashboarded and visualized using the tools of your choice (Kibana or Grafana for example).
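As a rough illustration of the enrichment step described above, the sketch below adds outlier fields to a copy of an event. The function name and the layout of the `outliers` field are assumptions made for this example, not the framework's actual schema.

```python
import copy


def enrich_with_outlier(event, outlier_type, outlier_reason, outlier_summary):
    """Return a copy of `event` with hypothetical outlier fields appended."""
    enriched = copy.deepcopy(event)
    # Hypothetical layout: one list per attribute, so an event flagged by
    # several use cases keeps all of its annotations.
    outliers = enriched.setdefault("outliers", {})
    outliers.setdefault("type", []).append(outlier_type)
    outliers.setdefault("reason", []).append(outlier_reason)
    outliers.setdefault("summary", []).append(outlier_summary)
    return enriched


event = {"_id": "example-id", "process": "powershell.exe"}
enriched = enrich_with_outlier(
    event,
    outlier_type="Windows Logs",
    outlier_reason="example reason",
    outlier_summary="example summary",
)
print(enriched["outliers"])
```

Fields added this way could then be filtered on or aggregated in a dashboard.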

The range of anomalies you can spot using ee-outliers is virtually limitless. A few examples of outliers we have detected ourselves during threat hunting activities include:

  • Detect beaconing (DNS, TLS, HTTP, etc.)
  • Detect geographically improbable activity
  • Detect obfuscated & suspicious command execution
  • Detect fileless malware execution
  • Detect malicious authentication events
  • Detect processes with suspicious outbound connectivity
  • Detect malicious persistence mechanisms (scheduled tasks, auto-runs, etc.)
  • …

Visit the page Getting started to get started with outlier detection in Elasticsearch yourself!

Contact

ee-outliers is developed & maintained by NVISO Labs.

You can reach out to the developers of ee-outliers by creating an issue on GitHub.
For any other communication, you can reach out by sending us an e-mail at [email protected].

We write about our research on our blog: https://blog.nviso.eu
You can follow us on Twitter: https://twitter.com/NVISO_Labs

Thank you for using ee-outliers and we look forward to your feedback!

License

ee-outliers is released under the GNU General Public License v3 (GPL-3).

Acknowledgements

We are grateful for the support received from INNOVIRIS and the Brussels region in funding our Research & Development activities.

ee-outliers's People

Contributors

0xthiebaut, daanraman, dependabot[bot], detobel36, jvanwilder, maximilienroberti, michielmeersmans, olivierbuez, rdepril, speedyfirecyclone

ee-outliers's Issues

helpers/utils.py -> flatten_dict

If a key contains the separator character, flatten_dict can map two different nested dictionaries to the same flattened key.

Example:

flatten_dict({'i.': {'j':0}}) == flatten_dict({'i': {'.j':0}})
# True

Solution:

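import collections.abc

def flatten_dict(d, parent_key='', sep='.'):
    # Flatten nested mappings into one dict whose keys are joined with `sep`.
    items = []
    for k, v in d.items():
        # Escape backslashes first, then the separator, so a key that contains
        # `sep` cannot collide with a genuinely nested key.
        k = k.replace('\\', '\\\\')
        k = k.replace(sep, '\\' + sep)
        new_key = parent_key + sep + k if parent_key else k
        # collections.MutableMapping was removed in Python 3.10; use collections.abc.
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)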
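Code that consumes the escaped keys has to split them on unescaped separators only. The helper below is a hypothetical companion to the proposed fix (it is not part of the issue), assuming a single-character separator other than the backslash:

```python
def split_flat_key(flat_key, sep='.'):
    """Split an escaped flattened key into its original key parts."""
    parts, current, i = [], [], 0
    while i < len(flat_key):
        ch = flat_key[i]
        if ch == '\\' and i + 1 < len(flat_key):
            # A backslash escapes the next character: keep it literally.
            current.append(flat_key[i + 1])
            i += 2
        elif ch == sep:
            # An unescaped separator marks a nesting boundary.
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append(''.join(current))
    return parts


# The two colliding inputs from the example produce these escaped keys:
print(split_flat_key('i\\..j'))   # key 'i.' containing key 'j'
print(split_flat_key('i.\\.j'))   # key 'i' containing key '.j'
```

With escaping in place the two flattened keys differ, and each one splits back into the path it came from.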

Allow outliers case to specify index pattern

Right now the index pattern is selected globally; it would be nice to define it per use case.
For example, in Sigma I'm getting hits on "powershell_ntfs_ads_access" in the SuricataFilter index.

I think I can work around it by using the _index meta field in the query string, but performance-wise it would probably be much quicker to just define the search index.

@daanraman I'm not entirely familiar with the internals of the search functions behind outliers. Is this a trivial improvement or does this require some serious backend overhaul?
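A minimal sketch of how a per-use-case override could look, assuming the configuration is parsed with Python's configparser. The option name `es_index_pattern`, the section names and the index patterns are all hypothetical; the idea is only that a use case falls back to the global pattern when it does not define its own.

```python
import configparser

# Hypothetical configuration: option and section names are assumptions.
EXAMPLE_CONFIG = """
[general]
es_index_pattern = logstash-*

[simplequery_powershell_ntfs_ads_access]
es_index_pattern = wineventfilter-*

[simplequery_other_use_case]
outlier_type = example
"""


def index_pattern_for(config, section, global_section="general",
                      option="es_index_pattern"):
    """Use the section's own index pattern if present, else the global one."""
    return config.get(section, option,
                      fallback=config.get(global_section, option))


config = configparser.ConfigParser(interpolation=None)
config.read_string(EXAMPLE_CONFIG)

print(index_pattern_for(config, "simplequery_powershell_ntfs_ads_access"))
print(index_pattern_for(config, "simplequery_other_use_case"))
```

Because the fallback is resolved at lookup time, existing use cases without the option would keep their current behaviour.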

Simplequery gives 0 hits on query that does give hits in Kibana discover

[simplequery_rc4_kerberos_ticket]
es_query_filter = (WineventFilter.EventID: 4769 AND WineventFilter.TicketOptions.raw:"0x40810000" AND WineventFilter.TicketEncryptionType.raw:"0x17" AND NOT (WineventFilter.ServiceName.raw: /.*$/ )) AND smoky_filter_name: WineventFilter
outlier_type = Windows Logs
outlier_reason = Kerberos RC4 encryption usage by {{WineventFilter.ServiceName}}
outlier_summary = Kerberos RC4 encryption usage
run_model = 1
test_model = 0

My gut feeling says something goes wrong when reading in the query (there are some special characters in there).
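One way to test that hypothesis is to read the section back and compare the parsed value with the text in the file. The sketch below assumes the configuration is read with Python's configparser, which the issue does not confirm; `read_query` is a hypothetical helper.

```python
import configparser

QUERY = (
    '(WineventFilter.EventID: 4769 AND WineventFilter.TicketOptions.raw:"0x40810000" '
    'AND WineventFilter.TicketEncryptionType.raw:"0x17" '
    'AND NOT (WineventFilter.ServiceName.raw: /.*$/ )) '
    'AND smoky_filter_name: WineventFilter'
)

SECTION = "simplequery_rc4_kerberos_ticket"
CONFIG_TEXT = "[" + SECTION + "]\nes_query_filter = " + QUERY + "\n"


def read_query(config_text, section, option="es_query_filter"):
    """Parse the config text and return the query exactly as the parser sees it."""
    # interpolation=None keeps characters such as % and $ literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get(section, option)


# repr() makes stray escapes or truncation visible.
print(repr(read_query(CONFIG_TEXT, SECTION)))
```

If the value read back differs from the text in the file, the parser is a likely culprit; if it matches, the problem probably lies elsewhere, for example in how the query is sent to Elasticsearch.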

term.py never reads the next batch

In term.py (lines 34-50), we break out of the for loop that reads the documents after the first batch has been read, so the next batch is never read.
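The intended control flow can be sketched generically as follows. This is not the code from term.py; `process_in_batches` and `handle_batch` are hypothetical names used to show a loop that resets the batch and keeps reading instead of breaking.

```python
def process_in_batches(docs, batch_size, handle_batch):
    """Feed `docs` to `handle_batch` in chunks of at most `batch_size`."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            handle_batch(batch)
            # Reset and keep iterating; a `break` here would stop after
            # the first batch and silently skip the remaining documents.
            batch = []
    if batch:
        # Handle the final, possibly smaller, batch.
        handle_batch(batch)


seen = []
process_in_batches(range(7), batch_size=3, handle_batch=lambda b: seen.append(list(b)))
print(seen)
```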

high number of tagged outliers results in scroll context expiring

2018-12-11 19:00:38 - ERROR - Traceback (most recent call last):
  File "outliers.py", line 104, in <module>
    perform_analysis()
  File "outliers.py", line 47, in perform_analysis
    metrics_generic.perform_analysis()
  File "/app/analyzers/metrics_generic.py", line 20, in perform_analysis
    run_generic_metrics_model(section_name=name, model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/metrics_generic.py", line 214, in run_generic_metrics_model
    evaluate_model(model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/metrics_generic.py", line 77, in evaluate_model
    for doc in es.scan(lucene_query=lucene_query):
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 379, in scan
    **scroll_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/__init__.py", line 1011, in scroll
    params=params, body=body)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/transport.py", line 314, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/http_urllib3.py", line 180, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, 'search_phase_execution_exception', 'No search context found for id [390777]')
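A possible mitigation, sketched under the assumption that the slow part is writing each outlier back while the scroll is still open: drain the scroll first and save afterwards. The function and parameter names are hypothetical, and the sketch trades memory for safety because it holds all matching documents until the scan completes. Raising the `scroll` keep-alive passed to elasticsearch.helpers.scan is another option.

```python
def collect_then_save(scan_results, is_outlier, save_outlier):
    """Drain the scroll first, then perform the slow writes."""
    # Phase 1: iterate quickly so each scroll request arrives before the
    # search context times out. No writes happen here.
    pending = [doc for doc in scan_results if is_outlier(doc)]
    # Phase 2: the scroll is finished, so slow updates cannot expire it.
    for doc in pending:
        save_outlier(doc)
    return len(pending)


saved = []
count = collect_then_save(iter(range(10)), lambda d: d % 2 == 0, saved.append)
print(count, saved)
```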

Do not crash outliers in case a single document can't be updated

Example error:


-eagleeye-%25%7B%5Bsmoky_filter_name%5D%7D-2018.10.26/doc/5SEcr2YBDK6V8tqIuxL_/_update?refresh=true [status:400 request:0.016s]
2018-10-31 10:25:00 - ERROR - Traceback (most recent call last):
  File "outliers.py", line 100, in <module>
    perform_analysis()
  File "outliers.py", line 44, in perform_analysis
    simplequery_generic.perform_analysis()
  File "/app/analyzers/simplequery_generic.py", line 20, in perform_analysis
    run_simplequery_model(section_name=name, model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/simplequery_generic.py", line 77, in run_simplequery_model
    evaluate_model(model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/simplequery_generic.py", line 68, in evaluate_model
    es.process_outliers(doc=doc, outliers=[outlier], should_notify= model_settings["should_notify"])
  File "/app/helpers/es.py", line 144, in process_outliers
    self.save_outlier(doc=doc, outlier=outlier)
  File "/app/helpers/es.py", line 165, in save_outlier
    self.conn.update(index=doc["_index"], doc_type=doc["_type"], id=doc["_id"], body=doc_body, refresh=True)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/__init__.py", line 547, in update
    doc_type, id, '_update'), params=params, body=body)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/transport.py", line 314, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/http_urllib3.py", line 180, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', '[esnode2][172.18.0.25:9300][indices:data/write/update[s]]')

2018-10-31 10:25:01 - INFO - housekeeping thread #140529569572608 stopped
2018-10-31 10:25:01 - INFO - finished performing outlier detection
2018-10-31 10:25:01 - INFO - next run scheduled on 2018-10-31 10:26:00
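A minimal sketch of the requested behaviour: wrap each update in its own try/except so that one rejected document is logged and skipped instead of ending the run. `save_all` and `save_one` are hypothetical names; in the framework the equivalent call would be the update performed in save_outlier.

```python
import logging


def save_all(docs, save_one, logger=None):
    """Save every document, logging failures instead of aborting the run."""
    logger = logger or logging.getLogger("outliers")
    failed_ids = []
    for doc in docs:
        try:
            save_one(doc)
        except Exception:
            # Record the failure with its traceback and continue.
            logger.exception("could not update document %s", doc.get("_id"))
            failed_ids.append(doc.get("_id"))
    return failed_ids


def flaky_save(doc):
    # Stand-in for an update call that rejects one document.
    if doc["_id"] == "bad":
        raise ValueError("illegal_argument_exception")


failed = save_all([{"_id": "a"}, {"_id": "bad"}, {"_id": "c"}], flaky_save)
print(failed)
```

Returning the failed identifiers lets the caller report them at the end of the run.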

Time window does not correctly shift along in daemon mode

In outliers.py, line 106:

logging.logger.info(time_window_info)

This reuses info from outside of the loop which is outdated:

# Prepare log messages
search_start_range_printable = dateutil.parser.parse(settings.search_range_start).strftime('%Y-%m-%d %H:%M:%S')
search_end_range_printable = dateutil.parser.parse(settings.search_range_end).strftime('%Y-%m-%d %H:%M:%S')
time_window_info = "processing events between " + search_start_range_printable + " and " + search_end_range_printable

The recommendation is to move this part into settings.py and call settings.print_time_window_info once inside the loop and once outside of it.
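One way to avoid the stale message is to build it from the current settings each time it is logged. A minimal sketch, assuming the range boundaries are ISO 8601 strings; the function name is hypothetical, and the standard library's datetime.fromisoformat stands in for dateutil.parser.parse to keep the example self-contained.

```python
from datetime import datetime

PRINTABLE_FORMAT = '%Y-%m-%d %H:%M:%S'


def build_time_window_info(search_range_start, search_range_end):
    """Format the current search window; call this on every loop iteration."""
    start = datetime.fromisoformat(search_range_start).strftime(PRINTABLE_FORMAT)
    end = datetime.fromisoformat(search_range_end).strftime(PRINTABLE_FORMAT)
    return "processing events between " + start + " and " + end


# Each run passes in the freshly computed window instead of reusing a string.
print(build_time_window_info("2018-10-31T10:25:00", "2018-10-31T10:26:00"))
print(build_time_window_info("2018-10-31T10:26:00", "2018-10-31T10:27:00"))
```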

Catch casting error in numerical_value

2019-01-15 10:08:04 - INFO - ===== evaluating suspiciously_small_process_size outlier detection =====
2019-01-15 10:08:04 - INFO - analyzing 188,389 events
2019-01-15 10:08:06 - ERROR - Traceback (most recent call last):
  File "outliers.py", line 104, in <module>
    perform_analysis()
  File "outliers.py", line 47, in perform_analysis
    metrics_generic.perform_analysis()
  File "/app/analyzers/metrics_generic.py", line 20, in perform_analysis
    run_generic_metrics_model(section_name=name, model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/metrics_generic.py", line 214, in run_generic_metrics_model
    evaluate_model(model_name=model_name, model_settings=model_settings)
  File "/app/analyzers/metrics_generic.py", line 99, in evaluate_model
    metric = float(target_value)
ValueError: could not convert string to float:
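The traceback shows float() being called on an empty string. A small guard along these lines would let the model skip such events; `to_float` is a hypothetical helper, and returning None for unparseable values is an assumption about the desired behaviour.

```python
def to_float(value):
    """Return `value` as a float, or None when it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Empty strings, None and non-numeric text all end up here.
        return None


for raw in ["12.5", "", None, "n/a"]:
    metric = to_float(raw)
    if metric is None:
        continue  # skip events without a usable numerical value
    print(metric)
```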

Make sure housekeeping is restarted in case thread crashes

The thread can crash if, for example, the cluster is temporarily down and the search returns 404.
In this case, housekeeping should not crash but continue on its planned schedule, while the rest of the outlier detection keeps running.

Location in code that requires exception management: housekeeping.py, line 23
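A minimal sketch of a housekeeping thread that survives a failing job, assuming the work can be passed in as a callable. The class below is hypothetical and is not the code in housekeeping.py: the exception is caught inside the loop, so the thread simply waits for the next scheduled run.

```python
import logging
import threading


class Housekeeping(threading.Thread):
    """Run `job` every `interval_seconds`, surviving exceptions in the job."""

    def __init__(self, job, interval_seconds):
        super().__init__(daemon=True)
        self.job = job
        self.interval_seconds = interval_seconds
        self.failures = 0
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            try:
                self.job()
            except Exception:
                # Log and carry on; the next run happens on schedule.
                self.failures += 1
                logging.exception("housekeeping job failed, retrying on next run")
            # wait() doubles as an interruptible sleep.
            self._stop_event.wait(self.interval_seconds)

    def stop(self):
        self._stop_event.set()
```

Calling stop() followed by join() shuts the thread down cleanly between runs.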

Fix typo in readme

pct_of_median_value: percentage of media value ("media" should read "median"). trigger_sensitivity ranges from 0-100.
