mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.

License: Apache License 2.0

capa's Introduction

capa detects capabilities in executable files. You run it against a PE, ELF, .NET module, shellcode file, or a sandbox report and it tells you what it thinks the program can do. For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.

$ capa.exe suspicious.exe

+------------------------+--------------------------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                                               |
|------------------------+--------------------------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                                        |
| DISCOVERY              | Query Registry [T1012]                                                         |
|                        | System Information Discovery [T1082]                                           |
| EXECUTION              | Command and Scripting Interpreter::Windows Command Shell [T1059.003]           |
|                        | Shared Modules [T1129]                                                         |
| EXFILTRATION           | Exfiltration Over C2 Channel [T1041]                                           |
| PERSISTENCE            | Create or Modify System Process::Windows Service [T1543.003]                   |
+------------------------+--------------------------------------------------------------------------------+

+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY                                            | NAMESPACE                                       |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error                     | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server              | c2/file-transfer                                |
| execute shell command and capture output              | c2/shell                                        |
| receive data (2 matches)                              | communication                                   |
| send data (6 matches)                                 | communication                                   |
| connect to HTTP server (3 matches)                    | communication/http/client                       |
| send HTTP request (3 matches)                         | communication/http/client                       |
| create pipe                                           | communication/named-pipe/create                 |
| get socket status (2 matches)                         | communication/socket                            |
| receive data on socket (2 matches)                    | communication/socket/receive                    |
| send data on socket (3 matches)                       | communication/socket/send                       |
| connect TCP socket                                    | communication/socket/tcp                        |
| encode data using Base64                              | data-manipulation/encoding/base64               |
| encode data using XOR (6 matches)                     | data-manipulation/encoding/xor                  |
| run as a service                                      | executable/pe                                   |
| get common file path (3 matches)                      | host-interaction/file-system                    |
| read file                                             | host-interaction/file-system/read               |
| write file (2 matches)                                | host-interaction/file-system/write              |
| print debug messages (2 matches)                      | host-interaction/log/debug/write-event          |
| resolve DNS                                           | host-interaction/network/dns/resolve            |
| get hostname                                          | host-interaction/os/hostname                    |
| create a process with modified I/O handles and window | host-interaction/process/create                 |
| create process                                        | host-interaction/process/create                 |
| create registry key                                   | host-interaction/registry/create                |
| create service                                        | host-interaction/service/create                 |
| create thread                                         | host-interaction/thread/create                  |
| persist via Windows service                           | persistence/service                             |
+-------------------------------------------------------+-------------------------------------------------+

download and usage

Download stable releases of the standalone capa binaries here. You can run the standalone binaries without installation. capa is a command line tool that should be run from the terminal.

To use capa as a library or integrate with another tool, see doc/installation.md for further setup instructions.

For more information about how to use capa, see doc/usage.md.

example

In the above sample output, we ran capa against an unknown binary (suspicious.exe), and the tool reported that the program can send HTTP requests, encode data using XOR and Base64, install services, and spawn new processes. Taken together, this makes us think that suspicious.exe could be a persistent backdoor. Therefore, our next analysis step might be to run suspicious.exe in a sandbox and try to recover the command and control server.

By passing the -vv flag (for very verbose), capa reports exactly where it found evidence of these capabilities. This is useful for at least two reasons:

- it helps explain why we should trust the results, and enables us to verify the conclusions, and
- it shows where within the binary an experienced analyst might study with IDA Pro

$ capa.exe suspicious.exe -vv
...
execute shell command and capture output
namespace   c2/shell
author      [email protected]
scope       function
att&ck      Execution::Command and Scripting Interpreter::Windows Command Shell [T1059.003]
references  https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/ns-processthreadsapi-startupinfoa
function @ 0x4011C0
  and:
    match: create a process with modified I/O handles and window @ 0x4011C0
      and:
        number: 257 = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW @ 0x4012B8
        or:
          number: 68 = StartupInfo.cb (size) @ 0x401282
        or: = API functions that accept a pointer to a STARTUPINFO structure
          api: kernel32.CreateProcess @ 0x401343
    match: create pipe @ 0x4011C0
      or:
        api: kernel32.CreatePipe @ 0x40126F, 0x401280
    optional:
      match: create thread @ 0x40136A, 0x4013BA
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x4013D7
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x401395
    or:
      string: "cmd.exe" @ 0x4012FD
...

capa also supports analyzing CAPE sandbox reports for dynamic capability extraction. To use this, first submit your sample to CAPE for analysis, then run capa against the generated JSON report.

Here's an example of running capa against a packed binary, and then running capa against the CAPE report of that binary:

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.exe
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------
WARNING:capa.capabilities.common: This sample appears to be packed.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Packed samples have often been obfuscated to hide their logic.
WARNING:capa.capabilities.common: capa cannot handle obfuscation well using static analysis. This means the results may be misleading or incomplete.
WARNING:capa.capabilities.common: If possible, you should try to unpack this input file before analyzing it with capa.
WARNING:capa.capabilities.common: Alternatively, run the sample in a supported sandbox and invoke capa against the report to obtain dynamic analysis results.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Identified via rule: (internal) packer file limitation
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Use -v or -vv if you really want to see the capabilities identified by capa.
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.json

┍━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ ATT&CK Tactic          │ ATT&CK Technique                                                                   │
┝━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ CREDENTIAL ACCESS      │ Credentials from Password Stores T1555                                             │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DEFENSE EVASION        │ File and Directory Permissions Modification T1222                                  │
│                        │ Modify Registry T1112                                                              │
│                        │ Obfuscated Files or Information T1027                                              │
│                        │ Virtualization/Sandbox Evasion::User Activity Based Checks T1497.002               │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DISCOVERY              │ Account Discovery T1087                                                            │
│                        │ Application Window Discovery T1010                                                 │
│                        │ File and Directory Discovery T1083                                                 │
│                        │ Query Registry T1012                                                               │
│                        │ System Information Discovery T1082                                                 │
│                        │ System Location Discovery::System Language Discovery T1614.001                     │
│                        │ System Owner/User Discovery T1033                                                  │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ EXECUTION              │ System Services::Service Execution T1569.002                                       │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ PERSISTENCE            │ Boot or Logon Autostart Execution::Registry Run Keys / Startup Folder T1547.001    │
│                        │ Boot or Logon Autostart Execution::Winlogon Helper DLL T1547.004                   │
│                        │ Create or Modify System Process::Windows Service T1543.003                         │
┕━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

┍━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Capability                                           │ Namespace                                            │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ check for unmoving mouse cursor (3 matches)          │ anti-analysis/anti-vm/vm-detection                   │
│ gather bitkinex information                          │ collection/file-managers                             │
│ gather classicftp information                        │ collection/file-managers                             │
│ gather filezilla information                         │ collection/file-managers                             │
│ gather total-commander information                   │ collection/file-managers                             │
│ gather ultrafxp information                          │ collection/file-managers                             │
│ resolve DNS (23 matches)                             │ communication/dns                                    │
│ initialize Winsock library (7 matches)               │ communication/socket                                 │
│ act as TCP client (3 matches)                        │ communication/tcp/client                             │
│ create new key via CryptAcquireContext               │ data-manipulation/encryption                         │
│ encrypt or decrypt via WinCrypt                      │ data-manipulation/encryption                         │
│ hash data via WinCrypt                               │ data-manipulation/hashing                            │
│ initialize hashing via WinCrypt                      │ data-manipulation/hashing                            │
│ hash data with MD5                                   │ data-manipulation/hashing/md5                        │
│ generate random numbers via WinAPI                   │ data-manipulation/prng                               │
│ extract resource via kernel32 functions (2 matches)  │ executable/resource                                  │
│ interact with driver via control codes (2 matches)   │ host-interaction/driver                              │
│ get Program Files directory (18 matches)             │ host-interaction/file-system                         │
│ get common file path (575 matches)                   │ host-interaction/file-system                         │
│ create directory (2 matches)                         │ host-interaction/file-system/create                  │
│ delete file                                          │ host-interaction/file-system/delete                  │
│ get file attributes (122 matches)                    │ host-interaction/file-system/meta                    │
│ set file attributes (8 matches)                      │ host-interaction/file-system/meta                    │
│ move file                                            │ host-interaction/file-system/move                    │
│ find taskbar (3 matches)                             │ host-interaction/gui/taskbar/find                    │
│ get keyboard layout (12 matches)                     │ host-interaction/hardware/keyboard                   │
│ get disk size                                        │ host-interaction/hardware/storage                    │
│ get hostname (4 matches)                             │ host-interaction/os/hostname                         │
│ allocate or change RWX memory (3 matches)            │ host-interaction/process/inject                      │
│ query or enumerate registry key (3 matches)          │ host-interaction/registry                            │
│ query or enumerate registry value (8 matches)        │ host-interaction/registry                            │
│ delete registry key                                  │ host-interaction/registry/delete                     │
│ start service                                        │ host-interaction/service/start                       │
│ get session user name                                │ host-interaction/session                             │
│ persist via Run registry key                         │ persistence/registry/run                             │
│ persist via Winlogon Helper DLL registry key         │ persistence/registry/winlogon-helper                 │
│ persist via Windows service (2 matches)              │ persistence/service                                  │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

capa uses a collection of rules to identify capabilities within a program. These rules are easy to write, even for those new to reverse engineering. By authoring rules, you can extend the capabilities that capa recognizes. In some regards, capa rules are a mixture of the OpenIOC, Yara, and YAML formats.

Here's an example rule used by capa:

rule:
  meta:
    name: create TCP socket
    namespace: communication/socket/tcp
    authors:
      - [email protected]
      - [email protected]
      - [email protected]
    scopes:
      static: basic block
      dynamic: call
    mbc:
      - Communication::Socket Communication::Create TCP Socket [C0001.011]
    examples:
      - Practical Malware Analysis Lab 01-01.dll_:0x10001010
  features:
    - or:
      - and:
        - number: 6 = IPPROTO_TCP
        - number: 1 = SOCK_STREAM
        - number: 2 = AF_INET
        - or:
          - api: ws2_32.socket
          - api: ws2_32.WSASocket
          - api: socket
      - property/read: System.Net.Sockets.TcpClient::Client
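
To make the and/or logic of the rule above concrete, here is a minimal sketch of how such a feature tree could be evaluated against a set of extracted features. This is not capa's matching engine: the `matches` function, the tuple-based tree encoding, and the sample feature set are all assumptions made for illustration.

```python
# Toy evaluator for an and/or feature tree; NOT capa's actual matching engine.
# A node is either ("and", [children]), ("or", [children]),
# or a leaf feature such as ("number", 6) or ("api", "ws2_32.socket").

def matches(node, features):
    """Return True if the tree rooted at `node` is satisfied by `features`."""
    kind, value = node
    if kind == "and":
        return all(matches(child, features) for child in value)
    if kind == "or":
        return any(matches(child, features) for child in value)
    # Leaf: the feature must have been extracted from the code being examined.
    return node in features

# Hand-written encoding of the "create TCP socket" rule shown above.
CREATE_TCP_SOCKET = ("or", [
    ("and", [
        ("number", 6),   # IPPROTO_TCP
        ("number", 1),   # SOCK_STREAM
        ("number", 2),   # AF_INET
        ("or", [
            ("api", "ws2_32.socket"),
            ("api", "ws2_32.WSASocket"),
            ("api", "socket"),
        ]),
    ]),
    ("property/read", "System.Net.Sockets.TcpClient::Client"),
])

# Hypothetical features extracted from one basic block.
block_features = {("number", 6), ("number", 1), ("number", 2), ("api", "socket")}
print(matches(CREATE_TCP_SOCKET, block_features))
```

The top-level `or` means either branch is sufficient: the three constants together with one of the socket APIs, or the single .NET property read.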

The github.com/mandiant/capa-rules repository contains hundreds of standard library rules that are distributed with capa. Please learn to write rules and contribute new entries as you find interesting techniques in malware.

If you use IDA Pro, then you can use the capa explorer plugin. capa explorer helps you identify interesting areas of a program and build new capa rules using features extracted directly from your IDA Pro database.

If you use Ghidra, then you can use the capa + Ghidra integration to run capa's analysis directly on your Ghidra database and render the results in Ghidra's user interface.

further information

capa

capa rules

capa testfiles

The capa-testfiles repository contains the data we use to test capa's code and rules

capa's Issues

linter: lib rules should not have a namespace

capa explorer fails on Python 2, IDA 7.5

IDAPython: Error while calling Python callback <OnCreate>:
Traceback (most recent call last):
  File "ida_capa_explorer.py", line 99, in OnCreate
    self.load_capa_results()
  File "capa/capa/ida/ida_capa_explorer.py", line 342, in load_capa_results
    capabilities = capa.main.find_capabilities(rules, capa.features.extractors.ida.IdaFeatureExtractor(), True)
  File "capa\capa\main.py", line 99, in find_capabilities
    for f in tqdm.tqdm(extractor.get_functions(), disable=disable_progress, unit=" functions"):
  File "C:\Python27\lib\site-packages\tqdm\_tqdm.py", line 997, in __iter__
    for obj in iterable:
  File "capa\capa\features\extractors\ida\__init__.py", line 54, in get_functions
    from capa.features.extractors.ida import helpers
ImportError: cannot import name helpers
INFO:capa:form closed.
Python>sys.version
'2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]'

move pycodestyle and other dev dependencies into setup.py

rather than putting the python installation into the setup-hooks.py script

use an extras_require for [dev] maybe?

simplify metadata rendering

I propose the following formats to reduce duplicate information (MD5) and display the most important information first.

capa report could be included as a header/heading as well

default before

+------------------------+--------------------------------------------------------------+
| capa report for        | 34404a3fb9804977c6ab86cb991fb130                             |
| timestamp              | 2020-07-03T12:41:55.267000                                   |
| version                | 0.0.0                                                        |
| path                   | tests\data\34404a3fb9804977c6ab86cb991fb130.exe_             |
| md5                    | 34404a3fb9804977c6ab86cb991fb130                             |
+------------------------+--------------------------------------------------------------+

>>>>>
after

+------------------------+--------------------------------------------------------------+
| md5                    | 34404a3fb9804977c6ab86cb991fb130                             |
| path                   | tests\data\34404a3fb9804977c6ab86cb991fb130.exe_             |
| timestamp              | 2020-07-03T12:41:55.267000                                   |
| capa version           | 0.0.0                                                        |
+------------------------+--------------------------------------------------------------+



verbose, vverbose (should use same function) before

capa report for  34404a3fb9804977c6ab86cb991fb130
timestamp        2020-07-03T12:42:07.813000
version          0.0.0
path             tests\data\34404a3fb9804977c6ab86cb991fb130.exe_
md5              34404a3fb9804977c6ab86cb991fb130
sha1             b345e6fae155bfaf79c67b38cf488bb17d5be56d
sha256           c6930e298bba86c01d0fe2c8262c46b4fce97c6c5037a193904cfc634246fbec
format           auto
extractor        VivisectFeatureExtractor
base address     0x400000

>>>>>
after

md5              34404a3fb9804977c6ab86cb991fb130
sha1             b345e6fae155bfaf79c67b38cf488bb17d5be56d
sha256           c6930e298bba86c01d0fe2c8262c46b4fce97c6c5037a193904cfc634246fbec
path             tests\data\34404a3fb9804977c6ab86cb991fb130.exe_
timestamp        2020-07-03T12:42:07.813000
capa version     0.0.0
format           auto
extractor        VivisectFeatureExtractor
base address     0x400000

linter: lib rules should be found in lib directory

assume characteristics always encode the existence

after months of use, it seems that characteristic features are only used like characteristic(nzxor): True. that is, the value is always True. we can simplify and make the rule syntax more consistent by changing the format to look like characteristic: nzxor and count(characteristic(nzxor)).

to match the non-existence of this feature, use not: characteristic: ... or count(characteristic(...)): 0.

doc missing locations for "calls from" characteristic

The doc format does not include locations for calls from characteristic. From my understanding these locations are recorded and should be included?

{'children': [],
 'locations': (),
 'node': {'statement': {'child': {'characteristic': 'calls from',
                                  'type': 'characteristic'},
                        'max': 4,
                        'min': 0,
                        'type': 'range'},
          'type': 'statement'},
 'success': True},

add feature & function count to report metadata and render

from #91

also, this:

INFO:capa:format: blob, platform: windows, architecture: i386, number of functions: 42
INFO:capa:analyzed file and extracted 112 features

vivisect/viv-utils - Exception: Invalid File: shellcode

$ capa -f sc32 tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32
INFO:capa:--------------------------------------------------------------------------------
INFO:capa: Using default embedded rules.
INFO:capa: To provide your own rules, use the form `capa.exe  ./path/to/rules/  /path/to/mal.exe`.
INFO:capa: You can see the current default rule set here:
INFO:capa:     https://github.com/fireeye/capa-rules
INFO:capa:--------------------------------------------------------------------------------
WARNING:capa:skipping non-.yml file: .git
WARNING:capa:skipping non-.yml file: README.md
INFO:capa:successfully loaded 277 rules
INFO:capa:generating vivisect workspace for: tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\vivisect\impemu\monitor.py", line 147, in prehook
    cb(self, emu, op, starteip)
  File "c:\python27\lib\site-packages\vivisect\analysis\generic\switchcase.py", line 19, in analyzeJmp
    ctx = getSwitchBase(vw, op, starteip, emu)
  File "c:\python27\lib\site-packages\vivisect\analysis\generic\switchcase.py", line 69, in getSwitchBase
    imgbase = vw.getFileMeta(filename, 'imagebase')
  File "c:\python27\lib\site-packages\vivisect\__init__.py", line 2484, in getFileMeta
    raise Exception("Invalid File: %s" % filename)
Exception: Invalid File: shellcode
[...]
INFO:capa:format: blob, platform: windows, architecture: i386, number of functions: 42
INFO:capa:analyzed file and extracted 112 features
+------------------------+----------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                               |
|------------------------+----------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                        |
| EXECUTION              | Shared Modules [T1129]                                         |
+------------------------+----------------------------------------------------------------+

+---------------------------------------------+----------------------------------------------+
| CAPABILITY                                  | NAMESPACE                                    |
|---------------------------------------------+----------------------------------------------|
| contain obfuscated stackstrings (2 matches) | anti-analysis/obfuscation/string/stackstring |
| encode data using XOR                       | data-manipulation/encoding/xor               |
| parse PE header                             | load-code/pe                                 |
+---------------------------------------------+----------------------------------------------+

INFO:capa:done.

count: 0 - range fails if no feature extracted

test case: rule and output (it should match on functions with no calls)

rule:
  meta:
    name: calls from
    namespace: test
    author: [email protected]
    scope: function
  features:
    - or:
      - count(mnemonic(call)): 0
      - count(characteristic(calls from)): 0

capa tests/data/34404a3fb9804977c6ab86cb991fb130.exe_ -t test -vv
INFO:capa:--------------------------------------------------------------------------------
INFO:capa: Using default embedded rules.
INFO:capa: To provide your own rules, use the form `capa.exe  ./path/to/rules/  /path/to/mal.exe`.
INFO:capa: You can see the current default rule set here:
INFO:capa:     https://github.com/fireeye/capa-rules
INFO:capa:--------------------------------------------------------------------------------
WARNING:capa:skipping non-.yml file: .git
WARNING:capa:skipping non-.yml file: README.md
INFO:capa:successfully loaded 278 rules
INFO:capa:selected 1 rules
INFO:capa:generating vivisect workspace for: tests/data/34404a3fb9804977c6ab86cb991fb130.exe_
INFO:capa:format: pe, platform: windows, architecture: i386, number of functions: 853
INFO:capa:analyzed file and extracted 1549 features

INFO:capa:done.

capa/engine.py:156

    def evaluate(self, ctx):
        if self.child not in ctx:
            return Result(False, self, [])
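
One way to make `count(...): 0` match is to treat a feature that was never extracted as occurring zero times, instead of failing as soon as it is missing from `ctx`. The sketch below is a simplified stand-in for the code quoted above; the `Result` and `Range` classes are reduced versions written for illustration, and the fix actually adopted in capa may differ.

```python
from collections import namedtuple

# Simplified stand-ins for illustration; not capa's real classes.
Result = namedtuple("Result", ["success", "statement", "children"])

class Range:
    """Match when the number of locations of `child` lies in [min, max]."""

    def __init__(self, child, min=0, max=float("inf")):
        self.child = child
        self.min = min
        self.max = max

    def evaluate(self, ctx):
        # `ctx` maps each extracted feature to the set of locations where it
        # was seen. A missing feature counts as zero occurrences, so a range
        # that includes zero can still succeed.
        count = len(ctx.get(self.child, ()))
        return Result(self.min <= count <= self.max, self, [])

no_calls = {}                                          # function with no call instructions
two_calls = {"mnemonic(call)": {0x401000, 0x401010}}   # hypothetical locations
rule = Range("mnemonic(call)", min=0, max=0)           # count(mnemonic(call)): 0
print(rule.evaluate(no_calls).success, rule.evaluate(two_calls).success)
```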

vivisect workspace creation

@mr-tz

vivisect and/or viv-utils updates may result in modified workspaces. By default getWorkspace loads existing .viv files if they exist. This can lead to confusion, misleading analysis, and errors.

@williballenthin

we should probably report this upstream.

@williballenthin

in the meantime, maybe we can stuff the viv version in a meta field and do the check ourselves.
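
A rough sketch of the "do the check ourselves" idea follows. It makes no vivisect calls: instead of a meta field inside the workspace, it records the analyzer version in a sidecar file next to the cached workspace, and the caller regenerates the workspace when the versions differ. The function names, the `.meta.json` suffix, and the version strings are hypothetical.

```python
import json
import os
import tempfile

def workspace_is_fresh(workspace_path, analyzer_version):
    """Return True if a cached workspace was produced by `analyzer_version`.

    The version is kept in a sidecar file next to the workspace
    (hypothetical layout: <workspace>.meta.json).
    """
    meta_path = workspace_path + ".meta.json"
    if not (os.path.exists(workspace_path) and os.path.exists(meta_path)):
        return False
    with open(meta_path) as f:
        return json.load(f).get("analyzer_version") == analyzer_version

def record_workspace_version(workspace_path, analyzer_version):
    """Write the sidecar file after a workspace has been (re)generated."""
    with open(workspace_path + ".meta.json", "w") as f:
        json.dump({"analyzer_version": analyzer_version}, f)

# Demo with a throwaway file standing in for a saved .viv workspace.
demo_dir = tempfile.mkdtemp()
demo_ws = os.path.join(demo_dir, "sample.viv")
open(demo_ws, "w").close()
record_workspace_version(demo_ws, "1.0")
print(workspace_is_fresh(demo_ws, "1.0"), workspace_is_fresh(demo_ws, "1.1"))
```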

add documentation for IDA plugin

at least include a screenshot in the main readme so people can get a sense for what it does.

json: include locations for range nodes

update serialization of characteristic feature

remove special handling of characteristic feature when serializing and refreeze testbed files.

it currently maintains backwards compatibility with an old format, by using a list of two elements.

integrate capa with ghidra

lots of people use ghidra, which is free and open source. we should recommend a way of integrating capa results into ghidra.

add capafmt utility for consistent formatting of rules

it would be nice to format rules with a consistent style.

this includes:

- whitespacing, especially with lists
- order meta before features

by default, python yaml emits keys alphabetically. as an example:

rule:
  meta:
    att&ck:
    - Defense Evasion::Obfuscated Files or Information T1027.002
    author: [email protected]
    examples:
    - CD2CBA9E6313E8DF2C1273593E649682
    - Practical Malware Analysis Lab 01-02.exe_:0x0401000
    mbc:
    - Anti-Static Analysis::Software Packing
    name: packed with UPX
    namespace: anti-analysis/packer/upx
    scope: file
  features:
  - or:
    - section: UPX0
    - section: UPX1

this would look nicer:

rule:
  meta:
    name: packed with UPX
    namespace: anti-analysis/packer/upx
    author: [email protected]
    att&ck:
    - Defense Evasion::Obfuscated Files or Information T1027.002
    mbc:
    - Anti-Static Analysis::Software Packing
    examples:
    - CD2CBA9E6313E8DF2C1273593E649682
    - Practical Malware Analysis Lab 01-02.exe_:0x0401000
    scope: file
  features:
  - or:
    - section: UPX0
    - section: UPX1
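
The reordering itself could be sketched as below, operating on a rule that has already been loaded into Python dicts. The key order in `META_ORDER` is taken from the "nicer" example above and is an assumption, not an official specification; `order_meta` and `order_rule` are hypothetical helper names. The sketch relies on dicts preserving insertion order (Python 3.7+).

```python
# Preferred order for keys under `meta`; an assumed ordering taken from the
# "nicer" example above, not an official capa specification.
META_ORDER = ["name", "namespace", "author", "att&ck", "mbc", "examples", "scope"]

def order_meta(meta):
    """Return a new dict with known keys first, in META_ORDER, then the rest."""
    known = {key: meta[key] for key in META_ORDER if key in meta}
    extra = {key: meta[key] for key in sorted(meta) if key not in known}
    return {**known, **extra}

def order_rule(rule):
    """Place `meta` before `features` and order the keys inside `meta`."""
    return {"meta": order_meta(rule["meta"]), "features": rule["features"]}

# Keys in alphabetical order, as a YAML emitter would produce by default.
alphabetical = {
    "features": [{"or": [{"section": "UPX0"}, {"section": "UPX1"}]}],
    "meta": {
        "att&ck": ["Defense Evasion::Obfuscated Files or Information T1027.002"],
        "examples": ["CD2CBA9E6313E8DF2C1273593E649682"],
        "mbc": ["Anti-Static Analysis::Software Packing"],
        "name": "packed with UPX",
        "namespace": "anti-analysis/packer/upx",
        "scope": "file",
    },
}
formatted = order_rule(alphabetical)
print(list(formatted), list(formatted["meta"]))
```

When writing the result back out with PyYAML (5.1 or later), passing `sort_keys=False` to `yaml.dump` should keep this order instead of re-sorting the keys alphabetically.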

ci: configure black for code formatting

configure gh actions to update version

blocked on gh actions being available, though.

Associate context with a string

@Ana06

In some cases it could be useful to associate context with a string as it can be done with numbers. For example:
- string: "{3E5FC7F9-9A51-4367-9063-A120244FBEC7}" = CLSID_CMSTPLUA

@mr-tz

hm, good point! maybe it makes sense to make the extra context available to all features.

@williballenthin

+1
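
If the `value = description` convention were extended to strings, the parser would need to avoid splitting on an " = " that appears inside the quoted string itself. A minimal sketch of one way to do that is shown here; `split_description` is a hypothetical helper, not capa's parser, and it assumes the description itself contains no double quote.

```python
def split_description(value):
    """Split 'VALUE = description' into (VALUE, description or None)."""
    value = value.strip()
    if value.startswith('"'):
        # Quoted string: the description, if any, follows the closing quote,
        # so an " = " inside the quotes is not mistaken for the separator.
        end = value.rfind('"')
        literal, rest = value[1:end], value[end + 1:].strip()
        description = rest[1:].strip() if rest.startswith("=") else None
        return literal, description
    # Unquoted value (for example a number): split on the first " = ".
    literal, sep, description = value.partition(" = ")
    return literal.strip(), (description.strip() if sep else None)

print(split_description('"{3E5FC7F9-9A51-4367-9063-A120244FBEC7}" = CLSID_CMSTPLUA'))
print(split_description("6 = IPPROTO_TCP"))
```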

plan: rule reorganization

linter: filename should match rule name

modulo some stripping of special characters
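
As a sketch of what such a check might look like, the helper below derives an expected file name from a rule name and compares it with the actual one. The exact normalization (lowercase, hyphens for runs of special characters, `.yml` suffix) is an assumed convention for illustration, and both function names are hypothetical.

```python
import re

def expected_filename(rule_name):
    """Derive a file name from a rule name (assumed convention, for illustration).

    Lowercase the name, replace each run of non-alphanumeric characters with a
    single hyphen, trim hyphens from the ends, and append ".yml".
    """
    slug = re.sub(r"[^a-z0-9]+", "-", rule_name.lower()).strip("-")
    return slug + ".yml"

def filename_matches(rule_name, filename):
    """Return True if `filename` is the name expected for `rule_name`."""
    return filename == expected_filename(rule_name)

print(expected_filename("create TCP socket"))  # create-tcp-socket.yml
```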

Automate submodule sync (rules)

via GitHub Actions

Currently has to be done manually, see https://stackoverflow.com/questions/5828324/update-git-submodule-to-latest-commit-on-origin

<in capa base dir>
cd rules/
git checkout master
git pull origin master
cd ..
git add rules/
git commit
git push origin master

capa can't be used as a library on py3

capa relies on vivisect for its standalone code analysis (when run within IDA, it uses IDA's analysis). since vivisect is py2-only, this means capa is py2-only, when used standalone or as a library. we should provide an analysis backend that can be used on py3, as this is the future.

we're aware that everyone (actually, including ourselves) has already moved on to py3. you should be aware that using vivisect was the path of least resistance to developing capa. now that we've proved that capa works and is useful, it's finally appropriate to dedicate substantial time towards the upgrade.

note, the capa code base is already py3 compatible. this is strictly a limitation of the backend that we ship by default.

style-checker hook fails

count basic block

This rule is not working as I expect, I get no results. Am I using this wrong?

rule:
  meta:
    name: count bb
    namespace: test
    scope: function
  features:
    - and:
      - count(basic blocks): 1 or more

consider using `black` for formatting

https://github.com/psf/black

notably, this is found under the Python Software Foundation (PSF) organization. seems to lend some weight. also, tons of stars and engagements.

pull function scope features into file scope

@mr-tz

add another scope, `program`, to encompass file and function (and lower) scopes

@mr-tz

Should we prioritize this feature? We have various instances from Ana's work where this would be helpful. According to @mwilliams31 schannel is also likely implemented across multiple functions.

@williballenthin

works for me. shall we have @Ana06 tackle it? will require getting familiar with the matching logic, which is a good lesson (and maybe torture???).

@mr-tz

sounds good 😄 if it becomes too much torture, let us know, @Ana06

ci: configure isort for code formatting

$ isort --length-sort --line-width 120 --thirdparty idc --thirdparty idaapi --thirdparty idautils --thirdparty ida_gdl --thirdparty PyQt5 --thirdparty argparse --builtin posixpath --thirdparty tabulate --thirdparty viv_utils --recursive .

make sure pyinstaller still works

linter: namespace should match directory structure

...unless in nursery
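
A minimal sketch of this check, assuming rules live under a `rules/` root and that the nursery is the top-level directory `nursery/`. Both the layout and the function name are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def namespace_matches_directory(rule_path: str, namespace: str, rules_root: str = "rules") -> bool:
    """Check that a rule file sits in the directory named by its namespace."""
    # Directory of the rule file, relative to the rules root.
    relative_dir = PurePosixPath(rule_path).relative_to(rules_root).parent
    # Rules under the nursery are exempt from the check.
    if relative_dir.parts and relative_dir.parts[0] == "nursery":
        return True
    return relative_dir == PurePosixPath(namespace)
```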

output feature count

capa shows the file feature count

INFO:capa:analyzed file and extracted 21 file features

to avoid confusion, this should be removed or extended to also show function features

discussion: capa JSON format

I have the following questions/comments after changing the IDA plugin to use the new JSON format:

Does it make sense to define (if not done already) a JSON schema for the new format?
- Pros: Schema would allow for easy validation of the format and serve as documentation for developers wanting to ingest the data into their systems
- Cons: Time and effort
Does it make sense to include the original rule content for match? This data can be found in the source field of the parent match but finding the original source this way isn't as convenient
- Pros: Convenience when parsing/displaying rule data for match
- Cons: Duplicate data in output
Does it make sense to include the locations for range? These locations, and the corresponding context (e.g. the instruction at a location), used to be displayed in the IDA plugin.
- Pros: Locations can be rendered providing additional context
- Cons: More data in output
Does it make sense to include additional meta data e.g. hash value, entry point, etc. specific to the binary file from which the output was produced?
- Pros: Systems looking to ingest the data could render the additional context - meta data could be used to map output back to original binary
- Cons: More data in output and more work on extractor end to get the meta data
Does it make sense to include feature comments e.g. PAGE_EXECUTE_READWRITE from number: 0x40 = PAGE_EXECUTE_READWRITE
- Pros: Additional context/comments can be rendered
- Cons: More data in output

linter: warn on non-standard meta fields

and maybe suggest "reference" -> "references"
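
A minimal sketch of such a warning. The set of standard fields is an assumption assembled from the meta fields that appear in rules quoted on this page, plus "references"; it is not an authoritative list, and `lint_meta_fields` is a hypothetical name.

```python
# Assumption: illustrative set only, taken from rules quoted in this document.
STANDARD_META_FIELDS = {"name", "namespace", "author", "att&ck", "mbc", "examples", "scope", "references"}

# Hypothetical table of common slips and the field that was probably intended.
SUGGESTIONS = {"reference": "references"}


def lint_meta_fields(meta: dict) -> list:
    """Return one warning string per non-standard meta field."""
    warnings = []
    for field in meta:
        if field in STANDARD_META_FIELDS:
            continue
        message = "non-standard meta field: %r" % field
        if field in SUGGESTIONS:
            message += " (did you mean %r?)" % SUGGESTIONS[field]
        warnings.append(message)
    return warnings
```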

Introduce variable DESCRIPTION_SEPARATOR

rather than using an inline string ' = ' that is prone to typos and cannot be found with "find references", use a constant like DESCRIPTION_SEPARATOR = " = " and use it throughout the code. (from @williballenthin's comment in #39)
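
A minimal sketch of the idea: the separator is defined once and every parser refers to it by name. `split_description` is a hypothetical helper, not a function taken from the capa code base.

```python
DESCRIPTION_SEPARATOR = " = "


def split_description(text: str):
    """Split "value = description" into (value, description or None)."""
    if DESCRIPTION_SEPARATOR in text:
        # Split at the first occurrence of the separator only.
        value, _, description = text.partition(DESCRIPTION_SEPARATOR)
        return value, description
    return text, None
```

With a single named constant, a typo such as "= " without the leading space can no longer creep into one call site, and an editor's "find references" lists every place that depends on the format.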

discussion: capa doesn't extract features from packed files

capa relies on disassembly and code analysis that can easily be defeated by packing. right now, capa doesn't attempt to do any auto-unpacking, so even trivially packed samples can bypass capa. fortunately, capa can often recognize when packing is in use (if you notice a bypass, submit a rule!), and will emit a warning about this.

doing auto-unpacking is a non-trivial job, and not really in scope for what capa does. however, if there are easy ways to make this work, we can revisit the idea.

include additional metadata in json and verbose output modes

this could include:

md5, sha1, sha256 hashes
file format
path of input file
extractor used
arguments specified
base address of module
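
A minimal sketch of collecting the file-level part of this metadata with the standard library. It covers only the hashes and the input path; the format, extractor, arguments, and base address depend on capa internals and are left out. The function name and dictionary keys are assumptions.

```python
import hashlib
import os


def collect_file_metadata(path: str) -> dict:
    """Compute hashes and the absolute path for an input file."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "path": os.path.abspath(path),
    }
```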

capa refuses to output JSON if a file limitation is detected, e.g. AutoIt

discussion: capa doesn't handle sandbox or API traces

capa relies on analysis of code structures to identify patterns. this is similar to matching sequences of API calls or other events in a sandbox, but not exactly. right now, capa rules don't directly translate to identifying behaviors from sandbox or debugging output, but it seems like there's a lot of overlap. maybe we can find a way to re-use a lot of work we've done for the static analysis rules.

discussion: capa doesn't do anything for non-Windows or non-PE files

capa contains a small amount of code and a large number of default rules that assume the input file is a Windows PE file. this is because the original authors primarily analyze Windows malware. there is nothing stopping analysis of Linux ELF or macOS Mach-O binaries; however, we haven't yet had the experience, sample binaries, or time to make this happen.

support for additional platforms may be added in the future, especially with (1) contributions from experts in those fields, and (2) sufficient sample binaries to demonstrate capa works as expected. if you're interested in helping out in these areas, please get in touch!

discussion: false positives in vcrt functions

there are a number of interesting rules, like manual PEB parsing, that fire on standard routines inserted by the MSVC compiler. typically, we'd want to include these in the output, except that some of these normal runtime functions aren't doing anything nefarious (as the rule might suggest, like anti-vm).

this leads us to want to filter some known functions out of matching.

there are at least two obvious approaches:

using existing capa logic/rules to match known functions (like count of bb, count and/or distribution of mnemonics, etc) and then negate (`not`) those matches
rely on the analysis backend to provide metadata about functions, such as auto-detected function name, and let rules match against this

both of these have tradeoffs, and it's not clear what we should do.

if we use capa infrastructure to match functions,

pro:

need no new features or syntax, can do it today
works across all analysis backends
easy to inspect

con:

we have to maintain function signatures (not our goal here)
our signatures may not be as good as purpose-built tech, like FLIRT or Ghidra's database
matching N signatures against M functions may introduce performance issues (maybe, this is a guess)

if we rely on analysis backends to match functions,

pro:

rely on backend expertise to do function id very well
less maintenance

con:

need new syntax, maybe like function/name: __init_iob
different analysis backends have different quality, e.g. IDA is very good, and vivisect has minimal coverage
different analysis backends may use different names/formats for function names that we have to normalize
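
A minimal sketch of the second approach, filtering after matching instead of adding new rule syntax. All data shapes are assumptions: `matches` maps a function address to the rule names that matched there, and `function_names` maps a function address to whatever name the analysis backend reported. Name normalization across backends is not handled.

```python
def drop_library_function_matches(matches: dict, function_names: dict, library_names: set) -> dict:
    """Remove matches located in functions the backend identified as runtime code.

    matches: function address -> list of matched rule names (hypothetical shape)
    function_names: function address -> name reported by the analysis backend
    library_names: names of known runtime functions to ignore
    """
    return {
        address: rule_names
        for address, rule_names in matches.items()
        # Functions without a backend-provided name are always kept.
        if function_names.get(address) not in library_names
    }
```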

post-commit git hook incorrectly stashes unstaged changes

when i stage and commit only some of the pending changes, the post-commit git hook places unstaged changes into the git stash stack. i have to manually pop them with git stash pop stash@{0}. i would rather have these unstaged changes untouched by the git hook (at least, they should be there when the hook completes its job).

last night i was really scared that i had lost hours of work until i noticed the changes were hidden in the stack.

TypeError: Can't instantiate abstract class NullFeatureExtractor

Tests are failing in master:

――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ERROR collecting tests/test_freeze.py ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
tests/test_freeze.py:27: in <module>
   0x401002: {"features": [(0x401002, capa.features.insn.Mnemonic("mov")),],},
E   TypeError: Can't instantiate abstract class NullFeatureExtractor with abstract methods get_base_address

==================================================================================================== warnings summary ====================================================================================================
/usr/local/lib/python2.7/site-packages/vivisect/parsers/__init__.py:14
 /usr/local/lib/python2.7/site-packages/vivisect/parsers/__init__.py:14: DeprecationWarning: the md5 module is deprecated; use hashlib instead
   import md5

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Introduced in ff44801
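
For context, a minimal sketch of how this class of error arises and how it goes away. The class names are hypothetical stand-ins, and the sketch uses Python 3's `abc.ABC`, whereas the traceback above comes from Python 2.7. A subclass that does not implement every abstract method cannot be instantiated; adding the missing method resolves the error.

```python
import abc


class FeatureExtractor(abc.ABC):
    """Stand-in for an abstract extractor interface (hypothetical)."""

    @abc.abstractmethod
    def get_base_address(self):
        raise NotImplementedError


class BrokenNullExtractor(FeatureExtractor):
    # Does not implement get_base_address, so instantiation raises TypeError.
    pass


class FixedNullExtractor(FeatureExtractor):
    def __init__(self, base_address):
        self.base_address = base_address

    def get_base_address(self):
        return self.base_address
```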

separate logging level from output mode

add flag --debug to enable DEBUG level logging. this is independent of --verbose mode that affects result output.
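
A minimal sketch of the separation using `argparse` and `logging`. The `-v` short option and the INFO default level are assumptions; only `--debug` and `--verbose` come from the text above.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sketch: logging level vs. output mode")
    # Controls how results are rendered; has no effect on logging.
    parser.add_argument("-v", "--verbose", action="store_true", help="enable verbose result output")
    # Controls logging only; has no effect on result rendering.
    parser.add_argument("--debug", action="store_true", help="enable DEBUG level logging")
    return parser


def logging_level(args: argparse.Namespace) -> int:
    """Pick the logging level from --debug alone (INFO default is an assumption)."""
    return logging.DEBUG if args.debug else logging.INFO
```

A caller would pass `logging_level(args)` to `logging.basicConfig(level=...)` and consult `args.verbose` only when choosing a renderer.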

vivisect extractor: bytes features for immediate operands

currently this gets bytes features for many invalid immediate operands

        if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
            v = oper.getOperValue(oper)

for example add ebp, 0Bh etc.

this case should be fine-tuned or removed?

Get rid of the Element class

The Element class is just used for testing. By using Element we are not testing the actual code. Also, every time we implement a new feature for the Feature class, we need to implement it for Element as well. I think it would be a better idea to use real classes for testing and get rid of Element. We could start by substituting it with number, which should be straightforward. It could also be a good idea to add some more tests for the different Feature classes.

Related to #5, as it simplifies the implementation.

@mr-tz @williballenthin what do you think?

engine: support matching on rule namespace prefixes

right now we support matching on other rule names, like match: encrypt data with RC4 KSA

we should support matching on namespaces, as well, like match: data-manipulation/encryption

this would mean that rule authors don't have to know about all the possible techniques to do a thing (like encryption).
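
A minimal sketch of prefix matching on namespaces. The comparison works on whole path components so that `data-manipulation/encryption` does not accidentally match a sibling such as `data-manipulation/encryption-foo`. The function names and the name-to-namespace mapping are assumptions.

```python
def namespace_matches(rule_namespace: str, wanted: str) -> bool:
    """True if rule_namespace equals wanted or lies beneath it."""
    rule_parts = rule_namespace.split("/")
    wanted_parts = wanted.split("/")
    # Compare component by component, not character by character.
    return rule_parts[: len(wanted_parts)] == wanted_parts


def rules_in_namespace(rules: dict, wanted: str) -> list:
    """rules maps rule name -> namespace (hypothetical shape)."""
    return sorted(name for name, ns in rules.items() if namespace_matches(ns, wanted))
```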

add JSON-formatted output mode

use this JSON as the source data for all formatters. this will ensure it has all data necessary to render complete details of capa matches.

the JSON document will be the primary method of integration for external tools and scripts, rather than supporting a multitude of integrations.
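
A minimal sketch of that design: one result document, and every formatter is a function of that document only. The document shape (`{"rules": {name: {"namespace": ...}}}`) and the renderer names are assumptions for illustration, not the actual capa format.

```python
import json


def render_json(doc: dict) -> str:
    """The JSON output is the result document itself, serialized."""
    return json.dumps(doc, sort_keys=True)


def render_default(doc: dict) -> str:
    """A text renderer that reads only from the same result document."""
    lines = []
    for rule_name in sorted(doc["rules"]):
        namespace = doc["rules"][rule_name]["namespace"]
        lines.append("%s (%s)" % (rule_name, namespace))
    return "\n".join(lines)
```

Because no renderer touches anything except the document, any detail a text formatter needs must already be present in the JSON, which is what keeps the JSON complete for external tools.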

Support descriptions for regular expressions

This was not implemented in #39, as RegExp is not a Feature. We need to either make RegExp a Feature or implement this for RegExp as well. It should work the same way as for strings.

Just tracking it here, so that we don't forget about it. 😉

Add a CONTRIBUTING file

I think we should add a CONTRIBUTING file to collect some important information we now have in other documents. This information is usually in the CONTRIBUTING file in other projects, and it is where people expect it to be. In addition, GitHub uses it to help guide new contributors. For example, when someone opens a pull request or creates an issue, they will see a link to that file.

Reference: https://help.github.com/en/github/building-a-strong-community/setting-guidelines-for-repository-contributors

I think this document should include the following information:

How to contribute with issues, including a reference to the capa-rules repository and which issues belong in each repo. This should also be linked from the issue template.
How to write rules, linking current documentation and explaining the linter
How to contribute with code, including how to set the project up (currently in different documents) and how to run the tests.

Something else?

remove args from Features

After #39 it is really obvious that args and value are a duplication for most of the features. In most cases args = [value]. In a few features value has a different name, but I think it makes sense to rename this attribute. We could think about it as the value in the yaml file. So, I propose to get rid of args and introduce value for Feature (the main class instead of the subclasses). Removing duplication would simplify the code.
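
A minimal sketch of what the proposal could look like, with `value` stored once on the base class and no separate `args`. This is an illustration under assumptions, not the existing capa classes; the equality and hashing rules shown are one possible choice.

```python
class Feature:
    """Base class that owns the single value a feature carries."""

    def __init__(self, value, description=None):
        self.value = value
        self.description = description

    def __eq__(self, other):
        # Features are equal when they share a type and a value;
        # the description is context only and is ignored.
        return type(self) is type(other) and self.value == other.value

    def __hash__(self):
        return hash((type(self).__name__, self.value))

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.value)


class Number(Feature):
    pass


class String(Feature):
    pass
```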

@mr-tz @williballenthin what do you think?
