Join our community on Discord 🤓. Feel free to ask about anything on the channel.
You are already saving a lot of money by using Fortinet+Elastic, so consider making a contribution to the project. 💰💰💰 (It is NOT a must for asking for help)
- Paypal 🤑
So you want to take you Fortinet logs to Elasticseach??? You have come to the right place!!! 👍
But wait! Doesn't Elastic provide a Filebeat module for Fortinet??? Why should you go with all the logstash hassle?? 🤷
Well, Filebeat module and Fortidragon are like cousins 👪. The logic for Filebeat module for Fortigate was based on FortiDragon, we colaborated together with Elastic when they made it.
The main difference is that FortiDragon is a full anayltics platform for threat hunting with Fortinet datasources, we do not restrict it to just the "ingestion" of logs.
We actually use FortiDragon on our day to day operations for threat hunting, so we undestand all the painpoints of a security analyst. That is why we created it on the first place, after 10+ years experience with Fortinet we could not find a solution that could extract all the juice out of Fortinet logs. We tried several SIEMs along the way and found out that firewall logs are just a checkbox on their datasheet. Full parsing and performance for such volume of logs was not carefully considered by any SIEM vendor. Finally we decided we needed to build it ourselves and chose Elastic because of its flexibility, performance and cost. FortiDragon is by far the best out there.
Some notable differences are with Filebeat:
Category | FortiDragon | Filebeat |
---|---|---|
Dashboard | We got super cool dashboards!!! | Just one for now 😢 |
Other platforms | FortiEDR, Forticlient, Fortimail, Fortiweb | Just Fortigate |
Updates | Much more often | Dependant to Elastic releases |
Installation | Harder | Easier |
If you can handle the hassle of logstash installation, it is worth the effort.
Let's get this party on!!! 🤩
- Configure syslog
config log syslogd setting
set status enable
set server "logstash_IP"
set port 5140
end
Or if you run FortiOS v7, you can use syslog5424. RECOMMENDED
config log syslogd setting
set status enable
set server "logstash_IP"
set port 5141
set format rfc5424
end
You have to be very careful with your firewall name when usinng syslog5424 format
MY_FIREWALL_SITEA
will not work
MY-FIREWALL-SITEA
will work
- Extendend logging on webfilter OPTIONAL
config webfilter profile
edit "test-webfilter"
set extended-log enable
set web-extended-all-action-log enable
next
end
You may get a warning that you need to change to reliable syslogd. Remember that "The full rawdata field of 20KB is only sent to reliable Syslog servers. Other logging devices, such as disk, FortiAnalyzer, and UDP Syslog servers, receive the information, but only keep a maximum of 2KB total log length, including the rawdata field, and discard the rest of the extended log information."
- If you also would like to have metrics about your SDWAN Performance SLAs and view them in the SDWAN dashboard, you need to set both
sla-fail-log-period
andsla-pass-log-period
on your healthchecks.
config health-check
edit "Google"
set server "8.8.8.8" "8.8.4.4"
set sla-fail-log-period 10
set sla-pass-log-period 30
set members 0
config sla
edit 1
set latency-threshold 100
set jitter-threshold 10
set packetloss-threshold 5
next
end
next
end
- You can also pump your own fields into Fortigate's syslog OPTIONAL
config log custom-field
edit "3"
set name "org"
set value "some_organization_name"
next
end
config log setting
set custom-log-fields "3"
end
- Load ingest pipeline OPTIONAL Remember to comment out on the output pipeline if you decide not to use it.
PUT _ingest/pipeline/add_event_ingested
{
"processors": [
{
"set": {
"field": "event.ingested",
"value": "{{_ingest.timestamp}}"
}
}
]
}
- Create ILM policies according to your needs. You can use these examples. Make sure you name them accordingly to your index strategy. For our case, that would be:
-
logs-fortinet.fortigate.traffic
-
logs-fortinet.fortigate.utm
-
logs-fortinet.fortigate.event
In our experience,
type=traffic
generates lots of logs, whiletype=event
very few. So it makes sense to have different lifecycles for differente types of logs. Other slicing ideas can be found below.
- Load component templates both from Elastic ECS and FortiDragon specific. Do it manually one by one:
PUT _component_template/ecs-base
{
"_meta": {
"documentation": "https://www.elastic.co/guide/en/ecs/current/ecs-base.html",
"ecs_version": "8.3.1"
},
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"labels": {
"type": "object"
},
"message": {
"type": "match_only_text"
},
"tags": {
"ignore_above": 1024,
"type": "keyword"
}
}
}
}
}
-
Load index templates
-
Load Dashboards: Go to Management --> Stack Management --> Saved Objects --> Import
-
Make sure dashboard controls are enabled: Go to Management --> Kibana Advanced Settings --> Presentation Labs --> Enable dashboard controls
-
A good idea would be to setup your ES password as a secret
-
Logstash Hostname OPTIONAL
Add HOSTNAME="myhostname" to /etc/default/logstash when running logstash as a service
echo HOSTNAME=\""$HOSTNAME"\" | sudo tee -a /etc/default/logstash
It is very useful if you run several logstash instances.
- Install tld filter plugin (You should do it every time you upgrade logstash version as well)
cd /usr/share/logstash
sudo bin/logstash-plugin install logstash-filter-tld
- Copy pipelines.yml to your logstash folder.
- Copy conf.d content to your conf.d folder.
- Start logstash
Hopefully you should be dancing with your logs by now. 🕺💃
The overall pipeline flow is as follows:
graph LR;
fortimail-input-kv-->fortimail_2_ecs;
fortiweb-input-kv-->fortiweb_2_ecs;
fortigate-input-kv-->fortigate_2_ecs;
forticlient-input-kv-->forticlient_2_ecs;
fortisandbox-input-kv-->fortisandbox_2_ecs;
fortimail_2_ecs-->common_ecs-output;
fortiweb_2_ecs-->common_ecs-output;
fortigate_2_ecs-->common_ecs-output;
forticlient_2_ecs-->common_ecs-output;
fortisandbox_2_ecs-->common_ecs-output;
Receives syslog logs and populates data_stream
fields depending on udp port.
You can also uncomment Fortianalyzer tags is you are using it for syslog forwarding. Fortianalyzer stamps its own date format to the log, so it needs to be treated different.
Splits the original log into key-value pairs and sets the timestamp. Timezone is also obtained from the log itself if FortiOS v6.2+.
- Validates nulls on IP fields. Fortinet loves to fill with "N/A" null fields, which turns into ingestion errors if your field has IP mapping. We could do it with grok.
- Renames Fortigate fields that overlaps with ECS. In the future this will be done on the kv filter stage, to be more ECS compliant.
- Translates Fortigate field to ECS by
type
of logs. - Introduces
network.protocol_category
used on dashboards controls. Mega useful!!! - Populates other ECS fields based on ECS recommendations, like
event.kind
,event.category
,event.type
.
Populates several ECS fields based on other present fields.
*.locality
for private, loopback, link-local, multicast and public addresses. These fields are not ECS official fields.- Geo localitzation of IPs. 🌎
related.ip
andrelated.user
.network.bytes
andnetwork.packets
.event.duration
. ⌛event.hour_of_day
andevent.day_of_week
. These fields are not ECS official fields.- Calcualtes
network.community_id
just for tcp/udp. - Registered domain.
- Url parsing.
user_agent.*
.network.transport
.
Output is based on index strategy, which is crucial for large ingestion cases 🤯. On Fortigate datastreams are split by type
.
In our experience, type=traffic
generates lots of logs, while type=event
very few. Even inside type=traffic
you might have that most of your logs have action=denied
, so you may want to split them even further. Splitting into several datastreams allows to assign different ILMs policies and also will be faster for searching.
Elasticseach has a "statical" approach to datastream definition because we have to somehow map our strategy to the datastream fields. If we would like to add a couple of fields to our splitting decision, like action
and source.locality
, we would need to insert those fields into data_stream.dataset
and we might impact index template and ILM. Surely if we want to benefit from a faster searching on those fields we would need to change their mapping to constant_keyword
. We don't know in advance how our data is distributed, and even if we knew, that might change on the future. On the other hand, Grafana Loki provides a more "flexible" approach, we just index the fields that should "split" our data, and their are treated as labels. This looks really cool and we will be exploring Grafana Loki on the near future.
We have tried to follow Fortigate´s Logs & Report section. Main objective of these dashboards is to do threat hunting by checking some critical KPIs in order to spot anomalies on them.
We have migrated eveythigh to Lens now, so that has helped a lot on performance, but still it is very recommended to fine tune the dashboards with the relevant info to your needs. There a lot of visualizations on each dashboard so keep in mind performance can be impacted (slow loading times or even failures when loading).
All dashboards are connected via its header structure. Making it easy to navigate trough them.
Dashboards follow a (max) 3 layer structure, going from more general to more specific.
-
Top level reference Fortinet´s type field: traffic, utm or event. UTM is already disaggregated so it can be easier to go to an specif UTM type, just like in Fortigate´s Logs & Report section.
-
Second level dives into traffic direction (if possible). For example: on traffic dashboard, we have
Outbound | Inbound | LAN 2 LAN
. It makes total sense to analyze it separetly. -
Third level refers to which metric are we using for exploring the dataset: we only use sessions and bytes.
-
sessions: we consider each log as a unique session. Be careful is you have connections that go through several firewalls or vdoms.
-
bytes: we analyze
source.bytes
anddestination.bytes
by both sum and average.logid=20 introduces duplicate data when doing aggregations (sum of bytes for a particular
source.ip
). That is why it is filtered out on all dashboards. It is recommended not to drop this logs as they might be useful for troubleshooting or forensic analysis.
- Controls, above header structure, let you quickly filter your data.
Dashboards have 2 sections:
- The upper visualizations show specific fields for the dataset that it is been analyzed. We split the most meaningful fields by
action
, because that is why you bought a firewall in first place, to allow/block stuff. - The lower visualizations are entity specific, on the first row there will always be
source.ip
,destination.ip
,network.protocol
which are fields that are present on all logs. The second raw has entities that might be useful on the analysis of that specific dashboard.
We need your help for getting the datasets for versions 6.2 and forward. Currently we only got this. Fortinet fields are always "evolving" (changing witout any notice and logic), and not all changes get docummented on Log Reference releases. Any help on taking a Log Refence file and transform it into a csv will be welcome. 🆘
Once we got the Log Reference guides turned into spreadsheets we could process the data. We had to denormalize data, merge fields, verify fields mapping (data type), look for fileds that overlap with ECS fields, translate fields to ECS, make mappings and pipelines configs.
We plan to consolidate datasets per major version in a single spreadsheet.
Fortigate logs are an ugly beast, mainly because its lack of (good) documentation. Altough it has been improving, it is still far from being coherent. For example, starting from 6.2.1, type "utm" was documented, altough it existed long ago.
On top of that, GTP events cause some field mismatch mapping like:
-
checksum:
string | uint32
-
from:
ip | string
-
to:
ip | string
-
version:
string | uint32
As far as we are concern, GTP is only part of Fortigate Carrier, which is a different product (¿?) How can Fortigate manage a field that has 2 different data types in its internal relational database? how does fortianalyzer do it? We have no idea because we have never seen GTP events on a real scenario. In order to avoid any data type mismatch, GTP events are not going to be considered, and unless you use Fortigate Carrier, you are not going to see them either.
The spreadsheet goes like:
-
Data 6.2.X
is the denormalized data set obtained from the Log Reference Guide of version 6.2.X. This is the raw dataset of every version. -
Data
has all the datasets combined fromData 6.2.X
sheets. You can look at it as the denormalize version of all datasets of major release 6.2. -
On
Overlap ECS - Summary of fields
, we look for any field on Fortigate dataset that could overlap with ECS fields. First we consolidate all fields with a dynamic table, and then lookup for it over root fields onECS 1.X
. For example, Fortigate dataset has fieldagent
, which overlaps with fieldagent
on ECS. If we find an overlap, we propose a rename for Fortigate field:fortios.agent
. -
We have decided to slice the full dataset by type, resulting in 3 datasets: traffic, utm and event. Each of them has its own translation. So, on sheets
Summary of "traffic type" fields
,Summary of "utm type" fields
andSummary of "event type" fields
we consolidate the fields of each dataset independently. -
On
ECS Translation of "XXX type" fields
is where the magic happens. Based on our criteria, we translate to ECS the fields we consider can fit. -
On
logstash - XXX
we consolidate the translated fields of previous sheets and generate logstash code. -
On
fortigate mapping
we filter all fortigate fields that are not string and. Based on its type, mapping is generated. The template we use consider keyword as default mapping, this is why we only explicitly define non-keyword fields.
Translation is where we need more help from the community!!! Any suggestions are welcome.
Not updated in a while 😕
Not updated in a while 😕
We have not tested it yet on FortiOS v7+
We can divide the whole project into these phases:
- Parsing (DONE ✅)
- ECS normalization (DONE ✅)
- Common enrichements (GeoIP, network.community_id, etc.) (DONE ✅)
- Fortigate v7 support, specially Syslog RFC5424 format. (WIP 🏗)
- Palo Alto support (WIP 🏗)
- Asset Enrichment: Fortigate can map user identity inside the logs, but that is not enough. We need to map networks funtionality, assets risk and group. For example, we might have a critical application that may have several web servers, sveral backends and a distributed DB; we need to know in our logs that all those IPs belong to the same application, which is a critical asset for business. If an alert gets generated, we can treat it with more priority. 🧠
- IoC enrichment: IoC in general has 2 sides: enriching a log that is currently being ingested, and enriching a log that has already been ingested. Both approachs are needed, and both have very different challenges. For the first one, we can tweak the geoip filter for such purpose, like these guys do for assets.
- Explore other ingest options: Kafka and Rsyslog
Ingestion should be about fortifying raw logs as much as possible, so we can have more inputs for doing analysis.
One of the benefits of FortiDragon is that we are not limited to Elastic, we can use any tool we would like. Altough we love ELK, there are some other tools that can be used on specific areas such as Jupyter Notebooks or Google Colab for ML analytics, or Neo4j for graph analysis.
On the near future, we would like to integrate Loki/Grafana. Logstash already has a Loki output plugin, so it should not be very diffult to start testing it. We want to explore other visualization and alerting options.
graph LR;
Fortigate-->Logstash;
Palo_Alto-->Logstash;
Logstash-->Elasticsearch;
Logstash-->Loki;
Elasticsearch-->Kibana;
Loki-->Grafana;
We got our super enriched logs 🦾, now what 😕?? Well, firewall logs are just firewall logs: lots of them, very few value on each of them. We need to pivot from just logs to datapoints, by defining entities and features about the behaivor of those entities.
Traditionally we have used the 5 tuple concept to identify connections, meaning the components of the tuple are the important entities of any given connection. However, with next generation application firewalls we now have more relevant information, like application or user. We can define a "new" 5 tuple, composed of source.ip
, source.user
, destination.ip
, service
(destination port + protocol) and application
. These are the entities on the NGFW world that are present on all connecions and we should analyze KPIs of them and the interation between them. 💡💡💡
For example, let's say in a period of 1 hour, we see an IP that has had connecions to a thousand different DNS servers. That is really weird, right?
What have we just done? We have definied source.ip
as our entity, and we had definied unique destination.ip on UDP/53 over 1 hour
as a feature of that entity, transforming all those 1k logs into a single document. We can define many features that can be relevant to take a look at. Those features are what we call Key Security Indicators (KSIs) and they will be the foundation for making sense out of network logs. Once we got KSIs for our entities, we can profile them just by checking how these KSIs evolve over time, or in comparisson to other entities in our infraestructure. We should use Transforms and ML of Elasticsearch for such purpose.
Another particular topic that has always had me wonder is P(A-->B), meaning the probability of A talking to B. We already got all the connections that are running on the network, so obtaining that probability should be trivial. More complex to calculate would be P(A-->B-->C), probability of A talking to C through B. What we are trying to get is the relations that different assets of our network have. If we mix it with KSIs of every individual asset, we can have a very powerful analysis. This seems particular useful for lateral movement and beaconing. For such analytics we need a graph database like Neo4j.
- More dashboards: SD-WAN, traffic shapping, consolidated risk-score, etc.
- Vega visualiations.
- Canvas for reports and C-level presentations. 🖌
- Grafana
Logstash pipelines and Elasticsearch config @hoat23 and @enotspe 🐉
Dataset analysis and Kibana @enotspe 🐉
Current maintenance @enotspe 🐉