sepinf-inc / iped

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

License: Other

Java 90.63% JavaScript 5.04% Python 0.70% HTML 3.09% CSS 0.43% XSLT 0.03% Cypher 0.07%

forensic recovery digital-forensics

iped's Introduction

History

IPED - Digital Evidence Processor and Indexer (translated from Portuguese) is a tool implemented in Java. It was created by digital forensic experts from the Brazilian Federal Police, who have developed it since 2012. Although it was always open source, its code was only officially published in 2019.

Since the beginning, the goal of the tool was efficient data processing and stability. Some key characteristics of the tool are:

Command line data processing for batch case creation
Multiplatform support, tested on Windows and Linux systems
Portable cases that need no installation; you can run them from removable drives
Integrated and intuitive analysis interface
High multithread performance and support for large cases: up to 400GB/h processing speed using modern hardware and 135 million items in a (multi) case as of 12/12/2019

Currently IPED uses the Sleuthkit Library only to decode disk images and file systems, so the same image formats are supported: RAW/DD, E01, ISO9660, AFF, VHD, VMDK. There is also support for EX01, VHDX, UDF(ISO), AD1 (AccessData) and UFDR (Cellebrite) formats.

If you are new to the tool, please refer to the Beginner's Start Guide.

Building

To build from source, you need git, Maven and a Java 11 JDK with JavaFX (e.g. Liberica OpenJDK 11 Full JDK) installed. Set the JAVA_HOME environment variable to your Java 11 installation folder, then run:

git clone https://github.com/sepinf-inc/IPED.git
cd IPED
mvn clean install

This generates a snapshot version of IPED in the target/release folder.

Attention: the default master branch is the development one and is unstable. To build a stable version, check out one of the release tags after the clone step.

On Linux you must also build The Sleuthkit and additional dependencies. Please refer to the Linux Section.

Contributions are very welcome! Before contributing, please refer to Contributing.

Features

Some of IPED's many features are listed below:

Supported hashes: md5, sha-1, sha-256, sha-512 and edonkey. PhotoDNA is also available for law enforcement (please contact iped at pf dot gov dot br)
Supported hash sets: NIST NSRL, NIST CAID, ProjectVIC, Interpol ICSE, standard CSV format
Fast hash deduplication
Signature analysis
Categorization by file type and properties
Recursive container expansion of dozens of file formats
Embedded forensic/virtual disks expansion: supports split or single segment DD, E01, EX01, VHD, VHDX, VMDK (differential VMDKs are also supported)
Image and video gallery for hundreds of formats
Georeferencing of GPS data, using Google Maps, Bing or OpenStreetMap
Regex searches with optional script validation for credit cards, emails, urls, ip & mac addresses, money values, bitcoin, ethereum, monero, ripple wallets and more...
Embedded hex, unicode text, metadata and native viewers
File content and metadata indexing and fast searching, including unknown files and unallocated space
Efficient data carving engine (takes < 10% of processing time) that scans much more than unallocated space, with support for 40+ file formats, including videos, extensible by scripting
Optical Character Recognition powered by tesseract 5
Encryption detection for known formats and using entropy test
Processing profiles: forensic, pedo (csam), triage, fastmode (preview) and blind (for automatic data extraction)
Detection of 70+ languages
Named Entity Recognition (needs Stanford CoreNLP models to be downloaded)
Customizable filters based on any file metadata
Similar document search with configurable threshold
Similar image search, using internal or external image
Similar face recognition, optimized to run without GPU, with configurable threshold
Unified table timeline view and event filtering for timeline analysis
Powerful file grouping (clustering) based on ANY metadata
Support for multicases up to 135 million items
Extensible with javascript and python (including cpython extensions) scripts
External command line tools integration for file decoding
Browser history for IE, Edge, Firefox, Chrome and Safari
Custom parsers for Emule, Shareaza, Ares, WhatsApp, Skype, Telegram, Bittorrent, ActivitiesCache, and more...
Fast nudity detection for images and videos using a random forest algorithm (thanks to its author @tc-wleite)
Nudity detection using Yahoo open-nsfw deeplearning model (needs keras and tensorflow)
Audio Transcription, local and remote implementations with Azure and Google Cloud services
Graph analysis for communications (calls, emails, instant messages...)
Stable processing with out-of-process file system decoding and file parsing
Resuming or restarting stopped or aborted processing (--continue/--restart options)
Web API for searching remote cases, getting file metadata, raw content, decoded text and thumbnails, and posting bookmarks
Creation of bookmarks/tags for interesting data
HTML, CSV reports and portable cases with tagged data

Screenshots

(Screenshots omitted: Processing; Analysis; Data Carving & Video Thumbnails; Regex Results; Map; Communication links; Face search; Audio Transcription; Timeline; Time chart.)

iped's Issues

Release of the source code for contributions?

Dear Sir,

First of all, congratulations on the work. I would like to know whether there is interest in releasing the code so that other people can contribute.
Some visualizations and analyses could be added, for example graph theory applied to the contacts made, messages found, etc. In addition, with the code available it would be possible to integrate word-vectorization libraries to broaden the scope of regex searches. In short, there are several possible contributions. I am asking whether there is interest in opening the code so that I can then make these contributions.
Thank you very much,
Danilo

Skype v12 support

An initial implementation by Patrick Bernardina was pushed to the SkypeV12 branch. The implementation actually seems to support versions v13 and v14.

Selecting rows in ResultTable by typing is not working

Selecting rows in ResultTable by typing their content for the sorted column is not working anymore.
This is a minor regression bug, but I am not sure when (which release) it was introduced.

Monitor timeout for all processing modules

Currently only problematic modules (parsing, image and video thumbs, OCR) have timeout control. But even simple modules (like signature) could hang, e.g. after dependency updates (this has happened in the past) or when dealing with specially crafted files. Buggy user modules could hang too.

So a general timeout control is desired.
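A minimal sketch of what a general per-module timeout could look like, written in Python for brevity. The names `run_with_timeout`, `signature_module` and `hanging_module` are hypothetical and are not part of IPED. A worker thread cannot be forcibly stopped here, so a real monitor would more likely run modules out of process.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time


def run_with_timeout(module, item, timeout_s):
    """Run module(item) in a worker thread; return (ok, result_or_None)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(module, item)
    try:
        return True, future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running; it cannot be killed from here.
        return False, None
    finally:
        # Do not block the caller waiting for a stuck worker.
        executor.shutdown(wait=False)


def signature_module(item):
    """Hypothetical fast module: returns the first four bytes."""
    return item[:4]


def hanging_module(item):
    """Hypothetical module that stalls for a while."""
    time.sleep(0.5)
    return item
```

Applying the same wrapper to every module, with a per-module limit, would give simple modules the protection that only the problematic ones have today.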

GUI error when re-opening case while processing

While processing, if the analysis UI is opened, closed, and opened again, some errors occur: a NullPointerException (NPE) in ColumnsManager and a problem re-adding dockable panels.

Dependency on the image

Good morning.
After a case is indexed, will I always need the image (an E01, for example) in order to view the files? In which configuration file can I set the image path if I move it to another location?

Incremental processing

Make it possible to run any processing module (hash, signature, container expansion, carving, OCR, indexing...) after the initial processing. This will also allow resuming a processing run that ended with errors.

The index will probably be split into two (metadata and text indexes), so the metadata index with processing flags can be updated easily and efficiently. This will break backward compatibility, so it is scheduled for v4.0.

Use DockingFrames in viewer tabs

@lfcnassif was there any reason the viewer tabs were not converted to use Docking Frames library?
It could be useful to detach and move one of the viewers.

Add an easier way to change the number of thumbnail columns in the gallery.

Add an alternative (easier) way to change the number of columns displayed in the thumbnail gallery.
Currently this option is available as an item in the application menu.
My suggestion is to use a nice feature called “Action” available in the DockingFrames library.
It allows associating controls (e.g. buttons) to a “dockable” element, in this case the “Gallery”. These controls are shown in its title bar, close to the already existing window controls.
This feature (adding buttons to the title bar) is used by several applications (like Eclipse).
Whenever the gallery is active, it would be possible to change the number of thumbnail columns directly.
In my tests, this feature works better with small icon buttons (without text), so it would use tooltips.
If this option works fine, in the future, the current way of changing the number of thumbnail columns could be removed from the menu, as it contains a large number of items already.
An example from the library manual was attached as an image.

*.iped file

The manual, under the processing topic, says: "-d: miscellaneous data (can be used multiple times): folder, DD, 001, E01, AFF (Linux only) image, ISO, physical disk, or *.iped file (containing a selection of items to export and reindex)". What is the format of a *.iped file? What kinds of items can I have in this file? File names? Folder names? Thank you.

GPX and KML files should be expanded in profiles different from pt-BR/default

Just a profile configuration issue. Detected when merging #31.

Implement a view and export of all findings

We need to be able to visualize and export all findings from all the files.

Simple use case would be to check if all findings are the same.
It is easy to find more complicated scenarios.

User interface to configure options and start processing

Many third parties have developed user interfaces to configure and start processing. We have heard about at least 7 of them. So this is a needed feature, very important for non-technical users. It will be needed when additional/post processing is implemented.

Parsers for phone artifacts integrating ALeapp/iLeapp

Currently we just have parsers for WhatsApp and Skype (edited: and Telegram). To decrease the dependency on other tools (UFDR reports), it is important to have parsers for calls, contacts, calendar, sms/mms, notes, locations, other instant message apps (facebook, ~~telegram~~, instagram, twitter, snapchat...) and custom email containers. Android and iOS will need different parsers. This ticket could be broken into smaller ones for each artifact.

Contributions are very welcome :)

Problem importing NSRL

Good afternoon, lfcnassif

I am having trouble importing the NSRL into IPED; in my case the import does not even start. The following error appears:
"ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console."
Could you please help?
Thank you!

Range searches in Message-Date field and other internal dates (office, pdf, exif...) not working

It does not work with versions <= 3.17.1. Currently that field uses a tokenizer that breaks tokens at '-' characters. The field should not be tokenized, so that each value is indexed as a single token. This will not break sorting in multicases, because sorting info is stored separately.
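The effect can be illustrated with a rough Python sketch. It only mimics the behavior described above and does not use Lucene; `tokenize_on_dash`, `keyword_token` and `range_match` are hypothetical helpers, and the date value is made up. A term range query compares individual indexed terms, so a date split at '-' leaves no term that falls inside a date range.

```python
import re


def tokenize_on_dash(value):
    """Mimic a tokenizer that splits at '-' (an assumption, not Lucene itself)."""
    return [t for t in re.split(r"-", value) if t]


def keyword_token(value):
    """Index the whole value as a single term."""
    return [value]


def range_match(terms, low, high):
    """A term range query matches if any indexed term lies within [low, high]."""
    return any(low <= t <= high for t in terms)


date = "2019-12-12"
split_hit = range_match(tokenize_on_dash(date), "2019-01-01", "2019-12-31")
whole_hit = range_match(keyword_token(date), "2019-01-01", "2019-12-31")
```

With the split tokens ("2019", "12", "12") none lies inside the range, while the untokenized value does, because ISO-style date strings sort lexicographically in chronological order.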

Allow to choose file properties to include in HTML report

Currently just fixed basic file properties are included.

Error reading files on Linux without patch

IPED versions 3.16.3 and 3.17 are failing to read file content on Linux with pure Sleuthkit 4.6.5 and 4.8.0. Sleuthkit throws org.sleuthkit.datamodel.TskCoreException: Invalid IMG_INFO object.

The following fork works: https://github.com/lfcnassif/sleuthkit-APFS

It should work without APFS and optimization patches.

Problem importing NSRL

Good afternoon,

How do I use the NSRL database? I downloaded the files from their site, but they do not include .db files, as the configuration file suggests. I tried importing what is there, but it fails with an error. What is the correct procedure?

Improve HexViewer

Add goto, search, charset option and layout customization. PR #28 sent by @gfd2020.

Optimize bookmarking of duplicated items

Currently an index search is done to find hash duplicates of items being bookmarked to include those duplicates together in the bookmark. It could be a lot faster, using the same approach used by the fast DynamicDuplicateFilter (Lucene DocValues) that filters duplicates on the fly.

Use classic rectangular tabs instead of curved ones, to save space.

Currently the dockable tabs waste a lot of horizontal space because of their "curved" aspect.
@lfcnassif, do you have a personal preference or any other reason to keep this particular feature of the current look (Docking Frames Eclipse theme)?

On smaller screens, this can be annoying, as the divider between the left and the central controls can't be moved too far to the left without hiding one of the tab titles (e.g. Categories / Evidences).

My suggestion is to change it to use classic rectangular tabs and with reduced insets, as can be seen in the lower part of the picture.

Parser for binary plist files

The com.googlecode.plist:dd-plist library, already used by SafariPlistParser, could be used.

Allow image rotation in the viewer

Sometimes it is useful to rotate images that contain rotated text (e.g. scanned documents with wrong orientation).
My suggestion is to add a small vertical toolbar to the image viewer, which would only be visible when the mouse is inside the viewer area, to avoid wasting space.
The rotation would be available with two buttons: rotate right, rotate left.
As there will be a toolbar, I suggest also adding two basic functions:

Zooming buttons (in, out, and fit-to-window), as an alternative way to control zoom (other than existing mouse scroll button).
A slider to control image brightness (useful for dark images).

Store all email headers as metadata

Currently just subject, date, from, to, cc and bcc are stored. It would be useful to index all header fields as individual metadata, so users could filter or sort by those fields. The EML, PST and OST parsers should be updated.
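As a rough illustration of the idea, the sketch below collects every header of an EML-style message into a metadata map using Python's standard `email` package. The `header:` key prefix, the function name and the sample message are assumptions for the example; they are not IPED's actual metadata naming.

```python
from email import message_from_string

RAW = """From: alice@example.com
To: bob@example.com
Subject: Test
X-Mailer: ExampleMailer 1.0
Received: from mx1.example.com
Received: from mx2.example.com

Body text.
"""


def headers_as_metadata(raw_message, prefix="header:"):
    """Map each header name to the list of all its values (headers can repeat)."""
    msg = message_from_string(raw_message)
    metadata = {}
    for name, value in msg.items():
        metadata.setdefault(prefix + name.lower(), []).append(value)
    return metadata


meta = headers_as_metadata(RAW)
```

Keeping a list per header preserves repeated fields such as Received, which a single-valued map would silently drop.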

Temporary sqlite file leak

Temporary SQLite files are usually left behind in the temporary folder. Sometimes they are opened in write mode and use WAL logs, so closing the database file handles does not clear the WAL logs and mmap files.

Export to CSV sometimes generates malformed CSV

It seems to have started after automatic column management was turned on by default. Because export to CSV includes all visible columns, some custom columns, like image:comments, occasionally contain invalid characters, and that causes the issue.
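Assuming the invalid characters are delimiters, quotes or line breaks inside cell values (the issue does not say which), the usual remedy is to quote and escape every field on export. A small Python sketch with the standard `csv` module, using a made-up comment value:

```python
import csv
import io


def export_rows(columns, rows):
    """Write a header and rows, quoting any field that needs it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# A comment containing quotes, a comma and a line break.
rows = [["photo.jpg", 'taken at "home", 2019\nsecond line']]
text = export_rows(["name", "image:comments"], rows)

# Reading the output back should give the original values.
parsed = list(csv.reader(io.StringIO(text)))
```

A round trip like this (write, then parse back and compare) is also a cheap regression test for the exporter.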

Index (and store?) files into ElasticSearch cluster

An ElasticSearch cluster has many advantages over local Lucene indexes: a well defined and documented remote access API, scalability, replication, load balancing... It would also allow developing a web analysis UI fully decoupled from, and independent of, the processing engine.

Drawback is those cases will not be portable. Probably local indexes will need to continue being supported, at least for reports.

Refactor internationalization

Currently there are profiles for each language, and index fields change with the language. This is bad and must be changed. It will break backward compatibility, so it will be done for version 4.0.

HtmlViewer is downloading external resources since 3.16

It is happening since a "fix" to a mapTab/HtmlViewer issue that caused the map to stop working after a HTML location artifact was clicked, because that enables the blocking http proxy.

Profile fastmode not working in v3.17

It throws NullPointerException. Fixed with 4337033

Store small extracted files into container format

Currently IPED generates a lot of small files (thumbnails, container subitems and OCR text) in the case folder. That makes copying or deleting a case very slow. Storing small files in a container would be better. Big files (>50MB?) will remain extracted in the case folder because some modules need an actual file to process; otherwise processing or the UI would be slower, waiting for big temp files to be extracted from the new container format.

SQLite is a storage option, but performance needs to be evaluated because only one thread can write at a time into the database.
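A minimal sketch of the SQLite option, in Python. The class name, table layout and item IDs are hypothetical, and the 50 MB limit is only the tentative threshold mentioned above. It does not address the single-writer concern, which would still need to be measured.

```python
import sqlite3

SIZE_LIMIT = 50 * 1024 * 1024  # tentative ">50MB?" threshold from the issue


class SmallFileStore:
    """Keep small extracted files as BLOBs in a single SQLite database."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS blobs (id TEXT PRIMARY KEY, data BLOB)")

    def put(self, item_id, data):
        """Store data; return False if it is too big and should stay a file."""
        if len(data) > SIZE_LIMIT:
            return False
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO blobs VALUES (?, ?)", (item_id, data))
        return True

    def get(self, item_id):
        """Return the stored bytes, or None if the id is unknown."""
        row = self.db.execute(
            "SELECT data FROM blobs WHERE id = ?", (item_id,)).fetchone()
        return row[0] if row else None


store = SmallFileStore()
stored = store.put("thumb/1", b"\xff\xd8")
```

With one database file instead of many small files, copying or deleting a case touches far fewer file system entries.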

Organize evidence folders

Is there a way to organize the evidence into folders? For example, let's say I have data from two custodians, each with a forensic image of a laptop in E01 format and a Cellebrite forensic report in XML format.

Can I group this evidence by custodian folder, like the example below?

Evidences
    Custodian1_Folder
        Laptop.e01
        SmartphoneCellebrite
    Custodian2_Folder
        Laptop.e01
        SmartphoneCellebrite

Are you going to make it OpenSource ?

I read this:

http://m.folha.uol.com.br/poder/2017/01/1846272-neo-volume-de-dados-da-lava-jato-forca-pf-a-criar-novo-sistema.shtml

Vertical scroll bar of the Table tab does not appear

Good afternoon.
I processed a case successfully, but when viewing it with "ferramenta de pesquisa.exe", the vertical scroll bar of the Table and Gallery tabs does not appear. A screenshot showing the error follows.

Regards,
Claudemir

On-the-fly duplicate filter sometimes does not filter first hash

It happens when the first listed hash, after sorting, has a value (not null and not empty). When there is a null or empty hash, it works. The fix is simple.

Make IPED Viewer DPI-aware

When IPED search app is used in a high-resolution monitor (e.g. 4K), fonts and most of the controls look too small.
@lfcnassif , has anyone ever complained about this?

There are a few workarounds (at least in Windows 10), but all of them will need to scale up the window content, losing quality.
Currently there is the "CTRL+" / "CTRL-" to increase font sizes, but it doesn't work very well (and is "hidden").
The ideal solution, in my opinion, would be to review all user interface related code (iped-viewer?) to be "resolution aware", i.e. to know which is the current resolution of the monitor.
In practice, a single class would take care of the calculations to scale, but existing code would need to be reviewed to use that class.
Once the application is converted to be resolution aware, it would be possible to add a local parameter to allow scaling (e.g. 120%, if the user wants larger fonts/controls).
This is how it looks now on a 4K monitor:

Typed evidence description not included in html report

Affects version 3.17.1. It works with evidence info imported from .json or .asap files.

Filter Manager

Create a central filter manager dialog or tab that lists all applied or recent filters, so users can enable or disable specific filters, the last one, or all of them. Currently the user needs to go to each tab to disable its filter, which is annoying.

From a selected media file, go to related conversations / p2p histories

Similar to "go to parent folder", it would be useful to find chats where a specific media file was sent or received and automatically scroll to the media transfer point in the chat.

Problem with -ocr option

I am trying to use -ocr but cannot get it to work. Could you show the syntax? I am running it as shown below, but the result is not being indexed:
java -jar iped.jar -d S:\teste2\IPED\marcados\03-01.iped -ocr Documentos -o S:\teste2\IPED\teste-03-01.

Another question: is there a way to apply -ocr only to certain bookmarked items? Something like ... [-ocr marcador1] ?

Thank you.

Processing aborts when sleuthkit sqlite returns read error

This sometimes happens when processing cases over the network. We could retry Sleuthkit reads for a while, for all errors, before aborting; currently only SQLITE_BUSY is retried.
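The retry idea could be sketched as follows, in Python. `TransientReadError`, `read_with_retry` and `make_flaky_reader` are hypothetical names used only for illustration; the attempt count and delay are arbitrary values, not settings taken from IPED or Sleuthkit.

```python
import time


class TransientReadError(Exception):
    """Hypothetical stand-in for a read error that may succeed on retry."""


def read_with_retry(read, attempts=5, delay_s=0.01):
    """Call read() until it succeeds or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except TransientReadError:
            if attempt == attempts:
                raise  # give up and let the caller abort
            time.sleep(delay_s * attempt)  # wait a little longer each time


def make_flaky_reader(failures, payload):
    """Build a reader that fails `failures` times and then returns payload."""
    state = {"left": failures}

    def read():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientReadError("read failed")
        return payload

    return read
```

Bounding the number of attempts keeps a permanently broken network share from stalling processing forever.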

Graph/link analysis

Show graphs of communications (calls, emails, messages) or relations between entities (emails, phones, accounts, contacts, person ids). Initial implementation done by @filipesimoes. Will be pushed after dependency problems are solved.

Currently both entities and items (files, emails, messages) are represented as nodes. Probably the model will be refactored, so nodes will represent only entities and edges will represent case items (communications and files).

Continue aborted processing

I got an idea while working on #26. I don't think this needs #24 to be implemented first. The current obstacle to recovering processing is that item IDs change between runs because of multithreading.

But we can create an ID that persists between runs, such as (path + sleuthkitID + subitemID), and do partial commits into the index. If processing is aborted or crashes, we can load those persistent IDs from the index and skip already committed items. There are some details to work out, but I think it will work.
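A small Python sketch of this scheme, under stated assumptions: the ID format, the `process` function and the in-memory `committed` set are hypothetical, and the set merely stands in for persistent IDs loaded from the index after a partial commit.

```python
def persistent_id(path, sleuthkit_id, subitem_id):
    """Build an ID that does not depend on processing order."""
    return f"{path}|{sleuthkit_id}|{subitem_id}"


def process(items, committed, handle):
    """Process items not yet in `committed`; record each one once handled."""
    processed = []
    for item in items:
        pid = persistent_id(*item)
        if pid in committed:
            continue  # already indexed in a previous run
        handle(item)
        committed.add(pid)  # stands in for a partial commit to the index
        processed.append(pid)
    return processed


items = [("/a", 1, 0), ("/a", 1, 1), ("/b", 2, 0)]
# Pretend a previous run committed the first item before it was aborted.
committed = {persistent_id("/a", 1, 0)}
resumed = process(items, committed, lambda item: None)
```

Because the ID is derived from the item itself, a resumed run can recognize finished items even though multithreading assigns different item IDs in each run.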

How to obtain IPED?

Dear Nassif, I am a forensic expert at TJRJ and would like to obtain the software. Pardon my ignorance, but I could not find it on this page. How do I download it?

Problem with 3gp exporting

Good afternoon.
I successfully processed a case in IPED. I configured it to export all categories except:
#Peer-to-peer
#Arquivos OLE (OLE files)
#Registro do Windows (Windows Registry)
#Programas e Bibliotecas (Programs and Libraries)
#Tamanho Zero (Zero Size)
In IPED, I selected the 3gp files, all larger than 0 bytes and not deleted, and exported them to a folder on my computer. All of them were copied with size 0, although, as I verified in IPED's own listing and on the original media, they were not deleted and do not have size 0, since I viewed them with EnCase. Could there have been a problem when IPED imported the files?

Optimize PST decoding

While working on #62, I noticed java-libpst was wasting a lot of time in PSTNodeInputStream.seek(long), iterating over a list of skipPoints to find the correct position. Using a Set should be much faster.
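To show why the linear iteration is costly, the Python sketch below contrasts it with a binary search over a sorted list. This is an illustration only: it assumes skip points are sorted offsets and that a seek needs the last one at or before the target, which may not match java-libpst's internals, and it uses a sorted search rather than the Set suggested above.

```python
from bisect import bisect_right


def find_skip_point_linear(skip_points, position):
    """Linear scan: last skip point <= position, O(n) per seek."""
    found = None
    for point in skip_points:
        if point <= position:
            found = point
        else:
            break
    return found


def find_skip_point_bisect(skip_points, position):
    """Binary search over the same sorted list, O(log n) per seek."""
    index = bisect_right(skip_points, position)
    return skip_points[index - 1] if index else None


points = [0, 4096, 8192, 16384]  # made-up offsets
```

Both functions return the same skip point; only the cost per seek differs, which matters when a large stream is sought many times.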

Contacts and other PST objects generate different outputs between different runs

This is because some internal attributes do not override toString() method, so it results in different strings (hashcode values) in each run, changing the object html report hash.

Records from sqlites in wal mode may be missed

If an SQLite database is in WAL mode, the latest writes go to a separate WAL file. This file is usually merged and deleted when the last connection to the database is closed. But if the application terminates suddenly (computer power cable unplugged or phone battery removed), the WAL file retains the last records written. Currently WAL files are neither checked nor processed by the SQLite parsers.

Unified Timeline Table View and Event Filtering

Please implement a linear super-timeline view with MACB timestamps and the ability to export it to CSV.

In such a view, the same file is shown several times, once for each time associated with it.
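The requested view could work roughly as in this Python sketch. The item field names (`modified`, `accessed`, `changed`, `created`), the MACB string layout and the CSV columns are assumptions for illustration, not an existing IPED format.

```python
import csv
import io

FIELDS = (("M", "modified"), ("A", "accessed"), ("C", "changed"), ("B", "created"))


def super_timeline(items):
    """Return one (time, MACB flags, name) row per distinct timestamp of each item."""
    rows = []
    for item in items:
        by_time = {}
        for flag, key in FIELDS:
            ts = item.get(key)
            if ts is not None:
                by_time.setdefault(ts, []).append(flag)
        for ts, flags in by_time.items():
            # Show each flag in its MACB slot, '.' where it does not apply.
            macb = "".join(f if f in flags else "." for f, _ in FIELDS)
            rows.append((ts, macb, item["name"]))
    return sorted(rows)


def to_csv(rows):
    """Serialize timeline rows with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "macb", "name"])
    writer.writerows(rows)
    return out.getvalue()


items = [{"name": "a.txt",
          "modified": "2019-01-02T00:00:00Z", "accessed": "2019-01-03T00:00:00Z",
          "changed": "2019-01-02T00:00:00Z", "created": "2019-01-01T00:00:00Z"}]
rows = super_timeline(items)
```

In this example a.txt appears three times, because its modified and changed times coincide and share one row.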