mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.

License: Apache License 2.0

Python 99.96% Dockerfile 0.04%
malware-analysis reverse-engineering gsoc-2024

capa's Introduction

capa

capa detects capabilities in executable files. You run it against a PE, ELF, .NET module, shellcode file, or a sandbox report and it tells you what it thinks the program can do. For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.

$ capa.exe suspicious.exe

+------------------------+--------------------------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                                               |
|------------------------+--------------------------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                                        |
| DISCOVERY              | Query Registry [T1012]                                                         |
|                        | System Information Discovery [T1082]                                           |
| EXECUTION              | Command and Scripting Interpreter::Windows Command Shell [T1059.003]           |
|                        | Shared Modules [T1129]                                                         |
| EXFILTRATION           | Exfiltration Over C2 Channel [T1041]                                           |
| PERSISTENCE            | Create or Modify System Process::Windows Service [T1543.003]                   |
+------------------------+--------------------------------------------------------------------------------+

+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY                                            | NAMESPACE                                       |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error                     | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server              | c2/file-transfer                                |
| execute shell command and capture output              | c2/shell                                        |
| receive data (2 matches)                              | communication                                   |
| send data (6 matches)                                 | communication                                   |
| connect to HTTP server (3 matches)                    | communication/http/client                       |
| send HTTP request (3 matches)                         | communication/http/client                       |
| create pipe                                           | communication/named-pipe/create                 |
| get socket status (2 matches)                         | communication/socket                            |
| receive data on socket (2 matches)                    | communication/socket/receive                    |
| send data on socket (3 matches)                       | communication/socket/send                       |
| connect TCP socket                                    | communication/socket/tcp                        |
| encode data using Base64                              | data-manipulation/encoding/base64               |
| encode data using XOR (6 matches)                     | data-manipulation/encoding/xor                  |
| run as a service                                      | executable/pe                                   |
| get common file path (3 matches)                      | host-interaction/file-system                    |
| read file                                             | host-interaction/file-system/read               |
| write file (2 matches)                                | host-interaction/file-system/write              |
| print debug messages (2 matches)                      | host-interaction/log/debug/write-event          |
| resolve DNS                                           | host-interaction/network/dns/resolve            |
| get hostname                                          | host-interaction/os/hostname                    |
| create a process with modified I/O handles and window | host-interaction/process/create                 |
| create process                                        | host-interaction/process/create                 |
| create registry key                                   | host-interaction/registry/create                |
| create service                                        | host-interaction/service/create                 |
| create thread                                         | host-interaction/thread/create                  |
| persist via Windows service                           | persistence/service                             |
+-------------------------------------------------------+-------------------------------------------------+

download and usage

Download stable releases of the standalone capa binaries from the project's GitHub releases page. The standalone binaries run without installation. capa is a command-line tool and should be run from a terminal.

To use capa as a library or integrate with another tool, see doc/installation.md for further setup instructions.

For more information about how to use capa, see doc/usage.md.

example

In the above sample output, we ran capa against an unknown binary (suspicious.exe), and the tool reported that the program can send HTTP requests, decode data via XOR and Base64, install services, and spawn new processes. Taken together, this makes us think that suspicious.exe could be a persistent backdoor. Therefore, our next analysis step might be to run suspicious.exe in a sandbox and try to recover the command and control server.
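
When triaging many samples, it can help to pull the capability rows out of the default table programmatically. The following is a minimal sketch, assuming the ASCII table layout shown in the sample output above; `parse_capability_table` is a hypothetical helper and not part of capa.

```python
import re

# Trailing "(N matches)" suffix shown when a rule matches more than once.
COUNT_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<count>\d+) matches\)$")


def parse_capability_table(text):
    """Extract (capability, match_count, namespace) tuples from the
    CAPABILITY/NAMESPACE table of the default ASCII output."""
    results = []
    in_capabilities = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("+"):
            # Border line; the one that follows the rows closes the table.
            if in_capabilities and results:
                in_capabilities = False
            continue
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue  # separator rows such as |-----+-----|
        if cells[0].upper() == "CAPABILITY":
            in_capabilities = True
            continue
        if not in_capabilities:
            continue  # rows that belong to the ATT&CK table
        m = COUNT_SUFFIX.match(cells[0])
        if m:
            results.append((m.group("name"), int(m.group("count")), cells[1]))
        else:
            results.append((cells[0], 1, cells[1]))
    return results
```

Rows without a "(N matches)" suffix are reported with a count of 1. The ATT&CK table is skipped because parsing only starts at the CAPABILITY header row.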

By passing the -vv flag (for very verbose), capa reports exactly where it found evidence of these capabilities. This is useful for at least two reasons:

  • it helps explain why we should trust the results, and enables us to verify the conclusions, and
  • it shows where within the binary an experienced analyst might study with IDA Pro.

$ capa.exe suspicious.exe -vv
...
execute shell command and capture output
namespace   c2/shell
author      [email protected]
scope       function
att&ck      Execution::Command and Scripting Interpreter::Windows Command Shell [T1059.003]
references  https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/ns-processthreadsapi-startupinfoa
function @ 0x4011C0
  and:
    match: create a process with modified I/O handles and window @ 0x4011C0
      and:
        number: 257 = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW @ 0x4012B8
        or:
          number: 68 = StartupInfo.cb (size) @ 0x401282
        or: = API functions that accept a pointer to a STARTUPINFO structure
          api: kernel32.CreateProcess @ 0x401343
    match: create pipe @ 0x4011C0
      or:
        api: kernel32.CreatePipe @ 0x40126F, 0x401280
    optional:
      match: create thread @ 0x40136A, 0x4013BA
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x4013D7
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x401395
    or:
      string: "cmd.exe" @ 0x4012FD
...

capa also supports analyzing CAPE sandbox reports for dynamic capability extraction. To use this, first submit your sample to CAPE for analysis, then run capa against the generated JSON report.

Here's an example of running capa against a packed binary, and then running capa against the CAPE report of that binary:

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.exe
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------
WARNING:capa.capabilities.common: This sample appears to be packed.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Packed samples have often been obfuscated to hide their logic.
WARNING:capa.capabilities.common: capa cannot handle obfuscation well using static analysis. This means the results may be misleading or incomplete.
WARNING:capa.capabilities.common: If possible, you should try to unpack this input file before analyzing it with capa.
WARNING:capa.capabilities.common: Alternatively, run the sample in a supported sandbox and invoke capa against the report to obtain dynamic analysis results.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Identified via rule: (internal) packer file limitation
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Use -v or -vv if you really want to see the capabilities identified by capa.
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.json

┍━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ ATT&CK Tactic          │ ATT&CK Technique                                                                   │
┝━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ CREDENTIAL ACCESS      │ Credentials from Password Stores T1555                                             │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DEFENSE EVASION        │ File and Directory Permissions Modification T1222                                  │
│                        │ Modify Registry T1112                                                              │
│                        │ Obfuscated Files or Information T1027                                              │
│                        │ Virtualization/Sandbox Evasion::User Activity Based Checks T1497.002               │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DISCOVERY              │ Account Discovery T1087                                                            │
│                        │ Application Window Discovery T1010                                                 │
│                        │ File and Directory Discovery T1083                                                 │
│                        │ Query Registry T1012                                                               │
│                        │ System Information Discovery T1082                                                 │
│                        │ System Location Discovery::System Language Discovery T1614.001                     │
│                        │ System Owner/User Discovery T1033                                                  │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ EXECUTION              │ System Services::Service Execution T1569.002                                       │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ PERSISTENCE            │ Boot or Logon Autostart Execution::Registry Run Keys / Startup Folder T1547.001    │
│                        │ Boot or Logon Autostart Execution::Winlogon Helper DLL T1547.004                   │
│                        │ Create or Modify System Process::Windows Service T1543.003                         │
┕━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

┍━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Capability                                           │ Namespace                                            │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ check for unmoving mouse cursor (3 matches)          │ anti-analysis/anti-vm/vm-detection                   │
│ gather bitkinex information                          │ collection/file-managers                             │
│ gather classicftp information                        │ collection/file-managers                             │
│ gather filezilla information                         │ collection/file-managers                             │
│ gather total-commander information                   │ collection/file-managers                             │
│ gather ultrafxp information                          │ collection/file-managers                             │
│ resolve DNS (23 matches)                             │ communication/dns                                    │
│ initialize Winsock library (7 matches)               │ communication/socket                                 │
│ act as TCP client (3 matches)                        │ communication/tcp/client                             │
│ create new key via CryptAcquireContext               │ data-manipulation/encryption                         │
│ encrypt or decrypt via WinCrypt                      │ data-manipulation/encryption                         │
│ hash data via WinCrypt                               │ data-manipulation/hashing                            │
│ initialize hashing via WinCrypt                      │ data-manipulation/hashing                            │
│ hash data with MD5                                   │ data-manipulation/hashing/md5                        │
│ generate random numbers via WinAPI                   │ data-manipulation/prng                               │
│ extract resource via kernel32 functions (2 matches)  │ executable/resource                                  │
│ interact with driver via control codes (2 matches)   │ host-interaction/driver                              │
│ get Program Files directory (18 matches)             │ host-interaction/file-system                         │
│ get common file path (575 matches)                   │ host-interaction/file-system                         │
│ create directory (2 matches)                         │ host-interaction/file-system/create                  │
│ delete file                                          │ host-interaction/file-system/delete                  │
│ get file attributes (122 matches)                    │ host-interaction/file-system/meta                    │
│ set file attributes (8 matches)                      │ host-interaction/file-system/meta                    │
│ move file                                            │ host-interaction/file-system/move                    │
│ find taskbar (3 matches)                             │ host-interaction/gui/taskbar/find                    │
│ get keyboard layout (12 matches)                     │ host-interaction/hardware/keyboard                   │
│ get disk size                                        │ host-interaction/hardware/storage                    │
│ get hostname (4 matches)                             │ host-interaction/os/hostname                         │
│ allocate or change RWX memory (3 matches)            │ host-interaction/process/inject                      │
│ query or enumerate registry key (3 matches)          │ host-interaction/registry                            │
│ query or enumerate registry value (8 matches)        │ host-interaction/registry                            │
│ delete registry key                                  │ host-interaction/registry/delete                     │
│ start service                                        │ host-interaction/service/start                       │
│ get session user name                                │ host-interaction/session                             │
│ persist via Run registry key                         │ persistence/registry/run                             │
│ persist via Winlogon Helper DLL registry key         │ persistence/registry/winlogon-helper                 │
│ persist via Windows service (2 matches)              │ persistence/service                                  │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

capa uses a collection of rules to identify capabilities within a program. These rules are easy to write, even for those new to reverse engineering. By authoring rules, you can extend the capabilities that capa recognizes. In some regards, capa rules are a mixture of the OpenIOC, Yara, and YAML formats.

Here's an example rule used by capa:

rule:
  meta:
    name: create TCP socket
    namespace: communication/socket/tcp
    authors:
      - [email protected]
      - [email protected]
      - [email protected]
    scopes:
      static: basic block
      dynamic: call
    mbc:
      - Communication::Socket Communication::Create TCP Socket [C0001.011]
    examples:
      - Practical Malware Analysis Lab 01-01.dll_:0x10001010
  features:
    - or:
      - and:
        - number: 6 = IPPROTO_TCP
        - number: 1 = SOCK_STREAM
        - number: 2 = AF_INET
        - or:
          - api: ws2_32.socket
          - api: ws2_32.WSASocket
          - api: socket
      - property/read: System.Net.Sockets.TcpClient::Client
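
To make the and/or logic of the rule above concrete, here is a minimal sketch of how such a feature tree could be evaluated against the features extracted from one basic block. This is an illustration only, not capa's matching engine: it ignores scopes, counts, and the other statement types, and the names `rule_matches` and `RULE_FEATURES` are hypothetical.

```python
def strip_description(value):
    """Drop the ' = description' suffix, e.g. '6 = IPPROTO_TCP' -> '6'."""
    return str(value).split(" = ", 1)[0]


def node_matches(node, features):
    """Evaluate one rule node against a set of (feature_type, value) pairs."""
    key, value = next(iter(node.items()))
    if key == "and":
        return all(node_matches(child, features) for child in value)
    if key == "or":
        return any(node_matches(child, features) for child in value)
    # Leaf node: a single feature such as {"api": "ws2_32.socket"}.
    return (key, strip_description(value)) in features


def rule_matches(rule_features, features):
    """All top-level entries under `features:` must match."""
    return all(node_matches(node, features) for node in rule_features)


# The `features:` section of "create TCP socket", written as the nested
# lists and dicts that a YAML loader would produce.
RULE_FEATURES = [
    {"or": [
        {"and": [
            {"number": "6 = IPPROTO_TCP"},
            {"number": "1 = SOCK_STREAM"},
            {"number": "2 = AF_INET"},
            {"or": [
                {"api": "ws2_32.socket"},
                {"api": "ws2_32.WSASocket"},
                {"api": "socket"},
            ]},
        ]},
        {"property/read": "System.Net.Sockets.TcpClient::Client"},
    ]},
]

# Features assumed to have been extracted from one basic block.
block = {("number", "6"), ("number", "1"), ("number", "2"), ("api", "ws2_32.socket")}
print(rule_matches(RULE_FEATURES, block))
```

In this sketch a basic block matches either when all three constants appear together with one of the socket APIs, or when the .NET property read is present on its own.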

The github.com/mandiant/capa-rules repository contains hundreds of standard library rules that are distributed with capa. Please learn to write rules and contribute new entries as you find interesting techniques in malware.

If you use IDA Pro, then you can use the capa explorer plugin. capa explorer helps you identify interesting areas of a program and build new capa rules using features extracted directly from your IDA Pro database.

If you use Ghidra, then you can use the capa + Ghidra integration to run capa's analysis directly on your Ghidra database and render the results in Ghidra's user interface.

further information

capa

capa rules

capa testfiles

The capa-testfiles repository contains the data we use to test capa's code and rules.

capa's People

Contributors

aaronatp, aayush-goel-04, ana06, anushkavirgaonkar, atlas-64, capa-bot, captaingeech42, cclauss, colton-gabertan, dependabot[bot], doomedraven, ggold7046, jcrussell, jsoref, kn0wl3dge, manasghandat, mike-hunhoff, mr-tz, psifertex, rainrat, recvfrom, ronniesalomonsen, ruppde, s-ff, stevemk14ebr, uckelman-sf, williballenthin, xusheng6, yelhamer, ygasparis

capa's Issues

capa explorer fails on Python 2, IDA 7.5

IDAPython: Error while calling Python callback <OnCreate>:
Traceback (most recent call last):
  File "ida_capa_explorer.py", line 99, in OnCreate
    self.load_capa_results()
  File "capa/capa/ida/ida_capa_explorer.py", line 342, in load_capa_results
    capabilities = capa.main.find_capabilities(rules, capa.features.extractors.ida.IdaFeatureExtractor(), True)
  File "capa\capa\main.py", line 99, in find_capabilities
    for f in tqdm.tqdm(extractor.get_functions(), disable=disable_progress, unit=" functions"):
  File "C:\Python27\lib\site-packages\tqdm\_tqdm.py", line 997, in __iter__
    for obj in iterable:
  File "capa\capa\features\extractors\ida\__init__.py", line 54, in get_functions
    from capa.features.extractors.ida import helpers
ImportError: cannot import name helpers
INFO:capa:form closed.
Python>sys.version
'2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]'

simplify metadata rendering

I propose the following formats to reduce duplicate information (MD5) and display the most important information first.

capa report could be included as a header/heading as well

default before

+------------------------+--------------------------------------------------------------+
| capa report for        | 34404a3fb9804977c6ab86cb991fb130                             |
| timestamp              | 2020-07-03T12:41:55.267000                                   |
| version                | 0.0.0                                                        |
| path                   | tests\data\34404a3fb9804977c6ab86cb991fb130.exe_             |
| md5                    | 34404a3fb9804977c6ab86cb991fb130                             |
+------------------------+--------------------------------------------------------------+

>>>>>
after

+------------------------+--------------------------------------------------------------+
| md5                    | 34404a3fb9804977c6ab86cb991fb130                             |
| path                   | tests\data\34404a3fb9804977c6ab86cb991fb130.exe_             |
| timestamp              | 2020-07-03T12:41:55.267000                                   |
| capa version           | 0.0.0                                                        |
+------------------------+--------------------------------------------------------------+



verbose, vverbose (should use same function) before

capa report for  34404a3fb9804977c6ab86cb991fb130
timestamp        2020-07-03T12:42:07.813000
version          0.0.0
path             tests\data\34404a3fb9804977c6ab86cb991fb130.exe_
md5              34404a3fb9804977c6ab86cb991fb130
sha1             b345e6fae155bfaf79c67b38cf488bb17d5be56d
sha256           c6930e298bba86c01d0fe2c8262c46b4fce97c6c5037a193904cfc634246fbec
format           auto
extractor        VivisectFeatureExtractor
base address     0x400000

>>>>>
after

md5              34404a3fb9804977c6ab86cb991fb130
sha1             b345e6fae155bfaf79c67b38cf488bb17d5be56d
sha256           c6930e298bba86c01d0fe2c8262c46b4fce97c6c5037a193904cfc634246fbec
path             tests\data\34404a3fb9804977c6ab86cb991fb130.exe_
timestamp        2020-07-03T12:42:07.813000
capa version     0.0.0
format           auto
extractor        VivisectFeatureExtractor
base address     0x400000

assume characteristics always encode the existence

after months of use, it seems that characteristic features are only used like characteristic(nzxor): True. that is, the value is always True. we can simplify and make the rule syntax more consistent by changing the format to look like characteristic: nzxor and count(characteristic(nzxor)).

to match the non-existence of this feature, use not: characteristic: ... or count(characteristic(...)): 0.
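
a rough sketch of how existing rules could be migrated to the proposed syntax, assuming the feature tree has already been loaded into nested lists and dicts. `migrate` is a hypothetical helper, and writing `not` with a one-element list is an assumption about the final syntax.

```python
import re

# Old-style key such as "characteristic(nzxor)"; count(...) keys do not match.
OLD_KEY = re.compile(r"^characteristic\((?P<name>[^()]+)\)$")


def migrate(node):
    """Recursively rewrite old-style characteristic entries in a feature tree."""
    if isinstance(node, list):
        return [migrate(child) for child in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        m = OLD_KEY.match(str(key))
        if m is None:
            # and/or/not/count(...) statements and all other features.
            out[key] = migrate(value)
        elif value is True:
            out["characteristic"] = m.group("name")
        else:
            # Old `...: False` meant "feature absent"; express it with `not`.
            out["not"] = [{"characteristic": m.group("name")}]
    return out


old = [{"and": [{"characteristic(nzxor)": True},
                {"count(characteristic(nzxor))": 2}]}]
print(migrate(old))
```

count(characteristic(...)) entries are left untouched, since the proposal keeps that form.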

doc missing locations for "calls from" characteristic

The doc format does not include locations for the calls from characteristic. From my understanding, these locations are recorded, so they should be included?

{'children': [],
 'locations': (),
 'node': {'statement': {'child': {'characteristic': 'calls from',
                                  'type': 'characteristic'},
                        'max': 4,
                        'min': 0,
                        'type': 'range'},
          'type': 'statement'},
 'success': True},

vivisect/viv-utils - Exception: Invalid File: shellcode

$ capa -f sc32 tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32
INFO:capa:--------------------------------------------------------------------------------
INFO:capa: Using default embedded rules.
INFO:capa: To provide your own rules, use the form `capa.exe  ./path/to/rules/  /path/to/mal.exe`.
INFO:capa: You can see the current default rule set here:
INFO:capa:     https://github.com/fireeye/capa-rules
INFO:capa:--------------------------------------------------------------------------------
WARNING:capa:skipping non-.yml file: .git
WARNING:capa:skipping non-.yml file: README.md
INFO:capa:successfully loaded 277 rules
INFO:capa:generating vivisect workspace for: tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\vivisect\impemu\monitor.py", line 147, in prehook
    cb(self, emu, op, starteip)
  File "c:\python27\lib\site-packages\vivisect\analysis\generic\switchcase.py", line 19, in analyzeJmp
    ctx = getSwitchBase(vw, op, starteip, emu)
  File "c:\python27\lib\site-packages\vivisect\analysis\generic\switchcase.py", line 69, in getSwitchBase
    imgbase = vw.getFileMeta(filename, 'imagebase')
  File "c:\python27\lib\site-packages\vivisect\__init__.py", line 2484, in getFileMeta
    raise Exception("Invalid File: %s" % filename)
Exception: Invalid File: shellcode
[...]
INFO:capa:format: blob, platform: windows, architecture: i386, number of functions: 42
INFO:capa:analyzed file and extracted 112 features
+------------------------+----------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                               |
|------------------------+----------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                        |
| EXECUTION              | Shared Modules [T1129]                                         |
+------------------------+----------------------------------------------------------------+

+---------------------------------------------+----------------------------------------------+
| CAPABILITY                                  | NAMESPACE                                    |
|---------------------------------------------+----------------------------------------------|
| contain obfuscated stackstrings (2 matches) | anti-analysis/obfuscation/string/stackstring |
| encode data using XOR                       | data-manipulation/encoding/xor               |
| parse PE header                             | load-code/pe                                 |
+---------------------------------------------+----------------------------------------------+

INFO:capa:done.

count: 0 - range fails if no feature extracted

test case: rule and output (it should match on functions with no calls)

rule:
  meta:
    name: calls from
    namespace: test
    author: [email protected]
    scope: function
  features:
    - or:
      - count(mnemonic(call)): 0
      - count(characteristic(calls from)): 0

capa tests/data/34404a3fb9804977c6ab86cb991fb130.exe_ -t test -vv
INFO:capa:--------------------------------------------------------------------------------
INFO:capa: Using default embedded rules.
INFO:capa: To provide your own rules, use the form `capa.exe  ./path/to/rules/  /path/to/mal.exe`.
INFO:capa: You can see the current default rule set here:
INFO:capa:     https://github.com/fireeye/capa-rules
INFO:capa:--------------------------------------------------------------------------------
WARNING:capa:skipping non-.yml file: .git
WARNING:capa:skipping non-.yml file: README.md
INFO:capa:successfully loaded 278 rules
INFO:capa:selected 1 rules
INFO:capa:generating vivisect workspace for: tests/data/34404a3fb9804977c6ab86cb991fb130.exe_
INFO:capa:format: pe, platform: windows, architecture: i386, number of functions: 853
INFO:capa:analyzed file and extracted 1549 features

INFO:capa:done.

capa/engine.py:156

    def evaluate(self, ctx):
        if self.child not in ctx:
            return Result(False, self, [])
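
One possible fix is to treat a feature that was never extracted as having a count of zero, and then apply the usual min/max comparison. The sketch below illustrates the idea under stated assumptions: `Result` and `Range` are simplified stand-ins rather than the classes in capa/engine.py, and `ctx` is assumed to map each feature to the set of addresses where it was found.

```python
class Result:
    """Simplified stand-in for the engine's result object."""

    def __init__(self, success, statement, children, locations=()):
        self.success = success
        self.statement = statement
        self.children = children
        self.locations = locations

    def __bool__(self):
        return self.success


class Range:
    """Match when the number of occurrences of `child` is within [min, max]."""

    def __init__(self, child, min=0, max=None):
        self.child = child
        self.min = min
        self.max = max if max is not None else (1 << 64) - 1

    def evaluate(self, ctx):
        # A missing feature counts as zero occurrences instead of failing early,
        # so `count(...): 0` (min=0, max=0) can match.
        locations = ctx.get(self.child, ())
        count = len(locations)
        return Result(self.min <= count <= self.max, self, [], locations=locations)


ctx = {"mnemonic(mov)": {0x401000, 0x401005}}
print(bool(Range("mnemonic(call)", min=0, max=0).evaluate(ctx)))
```

With this change, a function that contains no call instruction satisfies count(mnemonic(call)): 0, while ranges with a minimum of one or more still fail for missing features.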

vivisect workspace creation

@mr-tz

vivisect and/or viv-utils updates may result in modified workspaces. By default, getWorkspace loads an existing .viv file if one exists. This can lead to confusion, misleading analysis, and errors.

@williballenthin

we should probably report this upstream.

@williballenthin

in the meantime, maybe we can stuff the viv version in a meta field and do the check ourselves.
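
a minimal sketch of that check, with the workspace modeled as a plain dict so the example stays self-contained. the key name `analysis_version` and the helper `load_or_analyze` are hypothetical; a real implementation would store the value through the workspace's own metadata mechanism.

```python
META_KEY = "analysis_version"  # hypothetical name for the meta field


def load_or_analyze(cached, current_version, analyze):
    """Reuse a cached workspace only if the same analysis version produced it.

    `cached` is the previously saved workspace (or None), and `analyze` is a
    callable that builds a fresh workspace from scratch.
    """
    if cached is not None and cached.get("meta", {}).get(META_KEY) == current_version:
        return cached
    # Missing or stale cache: re-analyze and stamp the result.
    workspace = analyze()
    workspace.setdefault("meta", {})[META_KEY] = current_version
    return workspace


stale = {"meta": {META_KEY: "0.9"}, "functions": ["old"]}
fresh = load_or_analyze(stale, "1.0", lambda: {"functions": ["new"]})
print(fresh["functions"], fresh["meta"][META_KEY])
```

a workspace saved without the field is treated as stale, which errs on the side of re-analysis.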

update serialization of characteristic feature

remove special handling of characteristic feature when serializing and refreeze testbed files.

it currently maintains backwards compatibility with an old format, by using a list of two elements.

integrate capa with ghidra

lots of people use ghidra, which is free and open source. we should recommend a way of integrating capa results into ghidra.

add capafmt utility for consistent formatting of rules

it would be nice to format rules with a consistent style.

this includes:

  • whitespacing, especially with lists
  • order meta before features

by default, python yaml emits keys alphabetically. as an example:

rule:
  meta:
    att&ck:
    - Defense Evasion::Obfuscated Files or Information T1027.002
    author: [email protected]
    examples:
    - CD2CBA9E6313E8DF2C1273593E649682
    - Practical Malware Analysis Lab 01-02.exe_:0x0401000
    mbc:
    - Anti-Static Analysis::Software Packing
    name: packed with UPX
    namespace: anti-analysis/packer/upx
    scope: file
  features:
  - or:
    - section: UPX0
    - section: UPX1

this would look nicer:

rule:
  meta:
    name: packed with UPX
    namespace: anti-analysis/packer/upx
    author: [email protected]
    att&ck:
    - Defense Evasion::Obfuscated Files or Information T1027.002
    mbc:
    - Anti-Static Analysis::Software Packing
    examples:
    - CD2CBA9E6313E8DF2C1273593E649682
    - Practical Malware Analysis Lab 01-02.exe_:0x0401000
    scope: file
  features:
  - or:
    - section: UPX0
    - section: UPX1
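
a small sketch of the reordering step, assuming the rule has already been loaded into dicts. the key order is taken from the "nicer" example above; `META_ORDER`, `order_meta`, and `format_rule` are hypothetical names. it relies on dicts preserving insertion order (Python 3.7+), and a serializer would also need to be told not to sort keys, for example with PyYAML's `sort_keys=False` option to `yaml.dump`.

```python
# Preferred order of meta keys, following the example above.
META_ORDER = ["name", "namespace", "author", "att&ck", "mbc", "examples", "scope"]


def order_meta(meta):
    """Return a copy of `meta` with known keys first, then the rest sorted."""
    known = [key for key in META_ORDER if key in meta]
    unknown = sorted(key for key in meta if key not in META_ORDER)
    return {key: meta[key] for key in known + unknown}


def format_rule(rule):
    """Put `meta` before `features` and order the meta keys consistently."""
    body = rule["rule"]
    return {"rule": {"meta": order_meta(body["meta"]), "features": body["features"]}}


rule = {
    "rule": {
        "features": [{"or": [{"section": "UPX0"}, {"section": "UPX1"}]}],
        "meta": {
            "att&ck": ["Defense Evasion::Obfuscated Files or Information T1027.002"],
            "examples": ["CD2CBA9E6313E8DF2C1273593E649682"],
            "mbc": ["Anti-Static Analysis::Software Packing"],
            "name": "packed with UPX",
            "namespace": "anti-analysis/packer/upx",
            "scope": "file",
        },
    }
}
print(list(format_rule(rule)["rule"]["meta"]))
```

keys that are not in the preferred list are kept and appended alphabetically, so the formatter never drops information.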

Associate context with a string

@Ana06

In some cases it could be useful to associate context with a string as it can be done with numbers. For example:

- string: "{3E5FC7F9-9A51-4367-9063-A120244FBEC7}" = CLSID_CMSTPLUA

@mr-tz

hm, good point! maybe it makes sense to make the extra context available to all features.

@williballenthin

+1
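
A minimal sketch of how a shared parser could split the context off any feature value, including quoted strings that themselves contain " = ". `split_description` is a hypothetical helper, not capa's parser, and it assumes the `value = description` convention already used for numbers.

```python
def split_description(raw):
    """Split 'value = description' into (value, description or None)."""
    raw = raw.strip()
    if raw.startswith('"'):
        # Quoted string: only a ' = ' after the closing quote is a separator.
        head, sep, tail = raw.rpartition('" = ')
        if sep:
            return head[1:], tail.strip()
        return raw.strip('"'), None
    value, sep, description = raw.partition(" = ")
    return value, (description.strip() if sep else None)


print(split_description('"{3E5FC7F9-9A51-4367-9063-A120244FBEC7}" = CLSID_CMSTPLUA'))
print(split_description("6 = IPPROTO_TCP"))
```

Values without a description come back with `None` as the second element, so existing rules would not need to change.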

plan: rule reorganization

  • agree on proposed rule names and namespaces. see shared excel spreadsheet.
  • develop script to do migration #25
  • develop formatter to ensure consistency of formatted rules #8
  • run formatter on all rules (and confirm results) mandiant/capa-rules#12
  • execute migration mandiant/capa-rules#14
  • post snapshot of excel spreadsheet mandiant/capa-rules#14
  • run formatter on all rules
  • update linter to support namespaces
  • document rule naming and namespacing conventions
  • update outputter to support namespaces #34
  • update readme with new rules and output examples
  • update ida plugin to support namespaces @mike-hunhoff #58
  • update readme with screenshots of IDA plugin @mike-hunhoff #66
  • update FC service to support namespaces @MalwareMechanic

capa can't be used as a library on py3

capa relies on vivisect for its standalone code analysis (when run within IDA, it uses IDA's analysis). since vivisect is py2-only, this means capa is py2-only, when used standalone or as a library. we should provide an analysis backend that can be used on py3, as this is the future.

we're aware that everyone (actually, including ourselves) has already moved on to py3. you should be aware that using vivisect was the path of least resistance to developing capa. now that we've proved that capa works and is useful, it's finally appropriate to dedicate substantial time towards the upgrade.

note, the capa code base is already py3 compatible. this is strictly a limitation of the backend that we ship by default.

count basic block

This rule is not working as I expect, I get no results. Am I using this wrong?

rule:
  meta:
    name: count bb
    namespace: test
    scope: function
  features:
    - and:
      - count(basic blocks): 1 or more

pull function scope features into file scope

@mr-tz

add another scope program to encompass file and function (and lower) scopes

@mr-tz

Should we prioritize this feature? We have various instances from Ana's work where this would be helpful. According to @mwilliams31 schannel is also likely implemented across multiple functions.

@williballenthin

works for me. shall we have @Ana06 tackle it? will require getting familiar with the matching logic, which is a good lesson (and maybe torture???).

@mr-tz

sounds good 😄 if it becomes too much torture, let us know, @Ana06

ci: configure isort for code formatting

$ isort --length-sort --line-width 120 --thirdparty idc --thirdparty idaapi --thirdparty idautils --thirdparty ida_gdl --thirdparty PyQt5 --thirdparty argparse --builtin posixpath --thirdparty tabulate --thirdparty viv_utils --recursive .

output feature count

capa shows the file feature count

INFO:capa:analyzed file and extracted 21 file features

to avoid confusion, this should be removed or extended to also show function features

discussion: capa JSON format

I have the following questions/comments after changing the IDA plugin to use the new JSON format:

  • Does it make sense to define (if not done already) a JSON schema for the new format?

    • Pros: Schema would allow for easy validation of the format and serve as documentation for developers wanting to ingest the data into their systems
    • Cons: Time and effort
  • Does it make sense to include the original rule content for match? This data can be found in the source field of the parent match but finding the original source this way isn't as convenient

    • Pros: Convenience when parsing/displaying rule data for match
    • Cons: Duplicate data in output
  • Does it make sense to include the locations for range? These locations, and corresponding context e.g. the instruction at a location, used to be displayed in the IDA plugin.

    • Pros: Locations can be rendered providing additional context
    • Cons: More data in output
  • Does it make sense to include additional meta data e.g. hash value, entry point, etc. specific to the binary file from which the output was produced?

    • Pros: Systems looking to ingest the data could render the additional context - meta data could be used to map output back to original binary
    • Cons: More data in output and more work on extractor end to get the meta data
  • Does it make sense to include feature comments e.g. PAGE_EXECUTE_READWRITE from number: 0x40 = PAGE_EXECUTE_READWRITE

    • Pros: Additional context/comments can be rendered
    • Cons: More data in output

discussion: capa doesn't extract features from packed files

capa relies on disassembly and code analysis that can easily be defeated by packing. right now, capa doesn't attempt to do any auto-unpacking, so even trivially packed samples can bypass capa. fortunately, capa can often recognize when packing is in use (if you notice a bypass, submit a rule!), and will emit a warning about this.

doing auto-unpacking is a non-trivial job, and not really in scope for what capa does. however, if there are easy ways to make this work, we can revisit the idea.

discussion: capa doesn't handle sandbox or API traces

capa relies on analysis of code structures to identify patterns. this is similar to matching sequences of API calls or other events in a sandbox, but not exactly. right now, capa rules don't directly translate to identifying behaviors from sandbox or debugging output, but it seems like there's a lot of overlap. maybe we can find a way to re-use a lot of work we've done for the static analysis rules.

discussion: capa doesn't do anything for non-Windows or non-PE files

capa contains a small amount of code and a large number of default rules that assume the input file is a Windows PE file. this is because the original authors primarily analyze Windows malware. there is nothing stopping analysis of Linux ELF or macOS Mach-O binaries; however, we haven't yet had the experience, sample binaries, or time to make this happen.

support for additional platforms may be added in the future, especially with (1) contributions from experts in those fields, and (2) sufficient sample binaries to demonstrate capa works as expected. if you're interested in helping out in these areas, please get in touch!

discussion: false positives in vcrt functions

there are a number of interesting rules, like manual PEB parsing, that fire on standard routines inserted by the MSVC compiler. typically, we'd want to include these in the output, except that some of these normal runtime functions aren't doing anything nefarious (as the rule might suggest, like anti-vm).

this leads to the desire that we'd want to filter out some known functions from matching.

there are at least two obvious approaches:

  1. using existing capa logic/rules to match known functions (like count of bb, count and/or distribution of mnemonics, etc) and then negating those matches with not
  2. rely on the analysis backend to provide metadata about functions, such as auto-detected function name, and let rules match against this

both of these have tradeoffs, and it's not clear what we should do.

if we use capa infrastructure to match functions,

pro:

  • need no new features or syntax, can do it today
  • works across all analysis backends
  • easy to inspect

con:

  • we have to maintain function signatures (not our goal here)
  • our signatures may not be as good as purpose built tech, like FLIRT or Ghidra's database
  • matching N signatures against M functions may introduce performance issues (maybe, this is a guess)

if we rely on analysis backends to match functions,

pro:

  • rely on backend expertise to do function id very well
  • less maintenance

con:

  • need new syntax, maybe like function/name: __init_iob
  • different analysis backends have different quality, i.e. IDA is very good, and vivisect has minimal coverage
  • different analysis backends may use different names/formats for function names that we have to normalize
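The second approach could be sketched roughly as follows: the backend reports function names, and matches located in known runtime functions are dropped. Everything here is hypothetical (the `KNOWN_RUNTIME_FUNCTIONS` set, `normalize_name`, `filter_matches`, and the shape of the match data); it is not capa's API, and real normalization across backends would need more than stripping whitespace.

```python
from typing import Dict, List, Set

# Hypothetical list of compiler runtime functions whose matches we ignore.
KNOWN_RUNTIME_FUNCTIONS: Set[str] = {"__init_iob", "__security_init_cookie"}


def normalize_name(name: str) -> str:
    """Placeholder for per-backend name normalization (assumed trivial here)."""
    return name.strip()


def filter_matches(
    matches: Dict[int, List[str]],
    function_names: Dict[int, str],
) -> Dict[int, List[str]]:
    """Drop matches found in functions the backend identified as runtime code.

    matches: function address -> names of the rules matched there
    function_names: function address -> name reported by the analysis backend
    """
    kept = {}
    for address, rules in matches.items():
        name = function_names.get(address)
        if name is not None and normalize_name(name) in KNOWN_RUNTIME_FUNCTIONS:
            continue  # known runtime function: suppress its matches
        kept[address] = rules
    return kept


matches = {0x401000: ["manual PEB parsing"], 0x402000: ["manual PEB parsing"]}
names = {0x401000: "__init_iob"}  # the backend named only one function
filtered = filter_matches(matches, names)
```

Functions the backend could not name are kept, which is the conservative choice when backend coverage is minimal.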

post-commit git hook incorrectly stashes unstaged changes

when i stage and commit only some of the pending changes, the post-commit git hook places unstaged changes into the git stash stack. i have to manually pop them with git stash pop stash@{0}. i would rather have these unstaged changes untouched by the git hook (at least, they should be there when the hook completes its job).

last night i was really scared that i had lost hours of work until i noticed the changes were hidden in the stack.

TypeError: Can't instantiate abstract class NullFeatureExtractor

Tests are failing in master:

――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ERROR collecting tests/test_freeze.py ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
tests/test_freeze.py:27: in <module>
   0x401002: {"features": [(0x401002, capa.features.insn.Mnemonic("mov")),],},
E   TypeError: Can't instantiate abstract class NullFeatureExtractor with abstract methods get_base_address

==================================================================================================== warnings summary ====================================================================================================
/usr/local/lib/python2.7/site-packages/vivisect/parsers/__init__.py:14
 /usr/local/lib/python2.7/site-packages/vivisect/parsers/__init__.py:14: DeprecationWarning: the md5 module is deprecated; use hashlib instead
   import md5

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Introduced in ff44801
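For context, this is the error Python raises whenever a subclass of an abstract base class leaves an abstract method unimplemented. The class names below are illustrative only and do not mirror capa's actual class hierarchy; the presumable fix is for NullFeatureExtractor to implement get_base_address.

```python
import abc


class BaseExtractor(abc.ABC):
    """Illustrative abstract base with one required method."""

    @abc.abstractmethod
    def get_base_address(self):
        raise NotImplementedError


class IncompleteExtractor(BaseExtractor):
    """Does not implement get_base_address, so it cannot be instantiated."""


class CompleteExtractor(BaseExtractor):
    """Implements every abstract method, so instantiation succeeds."""

    def __init__(self, base_address):
        self.base_address = base_address

    def get_base_address(self):
        return self.base_address


try:
    IncompleteExtractor()
except TypeError as e:
    error = str(e)  # message names the class and the missing method
else:
    error = None

extractor = CompleteExtractor(0x401000)
```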

vivisect extractor: bytes features for immediate operands

currently this gets bytes features for many invalid immediate operands

        if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
            v = oper.getOperValue(oper)

for example add ebp, 0Bh etc.

this case should be fine-tuned or removed?
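One possible way to fine-tune this, sketched under the assumption that an immediate is only worth dereferencing when it points into mapped memory. The helper `is_mapped_address` and the `(start, size)` memory map format are hypothetical and not taken from vivisect or capa.

```python
from typing import Iterable, Tuple


def is_mapped_address(value: int, memory_map: Iterable[Tuple[int, int]]) -> bool:
    """Return True if value falls inside any (start, size) region."""
    return any(start <= value < start + size for start, size in memory_map)


# Hypothetical memory map with a single 64 KiB region.
memory_map = [(0x400000, 0x10000)]

# The immediate in `add ebp, 0Bh` is a small constant, not a pointer.
small_constant_is_pointer = is_mapped_address(0xB, memory_map)
# An immediate inside the mapped region is a plausible pointer.
mapped_value_is_pointer = is_mapped_address(0x401000, memory_map)
```

With such a check, bytes features would be emitted only for immediates that pass it.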

Get rid of the Element class

The Element class is just used for testing. By using Element we are not testing the actual code. Also, every time we implement a new feature for the Feature class, we need to implement it for Element as well. I think it would be a better idea to use real classes for testing and get rid of Element. We could start by substituting it with number, which should be straightforward. I also think it could be a good idea to add some more tests for the different Feature classes.

Related to #5, as it simplifies the implementation.

@mr-tz @williballenthin what do you think?

engine: support matching on rule namespace prefixes

right now we support matching on other rule names, like match: encrypt data with RC4 KSA

we should support matching on namespaces, as well, like match: data-manipulation/encryption

this would mean that rule authors don't have to know about all the possible techniques to do a thing (like encryption).
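A rough sketch of what the lookup could do, assuming a match reference is either an exact rule name or a namespace prefix compared component by component. The function name `rule_satisfies_match` and the `data-manipulation/encryption/rc4` namespace are made up for illustration.

```python
def rule_satisfies_match(reference: str, rule_name: str, rule_namespace: str) -> bool:
    """Return True if `reference` names the rule or is a prefix of its namespace."""
    if reference == rule_name:
        return True
    # Compare whole path components so that "data-manipulation/encrypt"
    # does not accidentally match "data-manipulation/encryption".
    ref_parts = reference.split("/")
    ns_parts = rule_namespace.split("/")
    return ns_parts[: len(ref_parts)] == ref_parts


name = "encrypt data with RC4 KSA"
namespace = "data-manipulation/encryption/rc4"  # hypothetical namespace

by_name = rule_satisfies_match("encrypt data with RC4 KSA", name, namespace)
by_prefix = rule_satisfies_match("data-manipulation/encryption", name, namespace)
partial_component = rule_satisfies_match("data-manipulation/encrypt", name, namespace)
```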

add JSON-formatted output mode

use this JSON as the source data for all formatters. this will ensure it has all data necessary to render complete details of capa matches.

the JSON document will be the primary method of integration for external tools and scripts, rather than supporting a multitude of integrations.
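As an illustration of the idea, the sketch below builds one result document and derives both the JSON output and a text rendering from it. The document layout (`rules`, `namespace`, `scope`) and the function names are assumptions made for this example, not the actual capa format.

```python
import json


def render_json(doc: dict) -> str:
    """Serialize the result document; this is the integration format."""
    return json.dumps(doc, indent=2, sort_keys=True)


def render_text(doc: dict) -> str:
    """Human-readable rendering derived from the same document."""
    lines = []
    for rule_name, match in sorted(doc["rules"].items()):
        lines.append("%s (%s) [%s]" % (rule_name, match["namespace"], match["scope"]))
    return "\n".join(lines)


# Hypothetical result document for a single matched rule.
doc = {
    "rules": {
        "packed with UPX": {
            "namespace": "anti-analysis/packer/upx",
            "scope": "file",
        }
    }
}

as_json = render_json(doc)
as_text = render_text(doc)
```

Because every formatter reads the same document, anything a formatter needs must be present in the JSON, which is the completeness check described above.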

Support descriptions for regular expressions

This was not implemented in #39, as RegExp is not a Feature. We need to either make RegExp a Feature or implement descriptions for RegExp as well. It should work the same way as for strings.

Just tracking it here, so that we don't forget about it. 😉

Add a CONTRIBUTING file

I think we should add a CONTRIBUTING file to collect some important information we now have in other documents. In other projects this information usually lives in the CONTRIBUTING file, which is where people expect it to be. In addition, GitHub uses it to help guide new contributors. For example, when someone opens a pull request or creates an issue, they will see a link to that file:

image

Reference: https://help.github.com/en/github/building-a-strong-community/setting-guidelines-for-repository-contributors

I think this document should include the following information:

  • How to contribute with issues, including a reference to the capa-rules repository and which issues belong in which repo. This should also be linked from the issues template.
  • How to write rules, linking current documentation and explaining the linter
  • How to contribute with code, including how to set the project up (currently in different documents) and how to run the tests.

Something else?

remove args from Features

After #39 it is really obvious that args and value are duplicated for most of the features. In most cases args = [value]. In a few features value has a different name, but I think it makes sense to rename this attribute. We could think about it as the value in the yaml file. So, I propose to get rid of args and introduce value for Feature (the main class instead of the subclasses). Removing duplication would simplify the code.

@mr-tz @williballenthin what do you think?
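A sketch of what the proposal could look like, with `value` stored once on the base class and no per-subclass `args`. The attribute and class layout here are assumptions for illustration and do not reproduce capa's actual Feature implementation.

```python
class Feature:
    """Base class holding the single value taken from the yaml rule."""

    def __init__(self, value, description=None):
        self.name = type(self).__name__.lower()
        self.value = value
        self.description = description

    def __eq__(self, other):
        # Descriptions are comments, so they do not affect equality.
        return type(self) is type(other) and self.value == other.value

    def __hash__(self):
        return hash((self.name, self.value))

    def __repr__(self):
        return "%s(%r)" % (self.name, self.value)


class Number(Feature):
    pass


class String(Feature):
    pass


page_rwx = Number(0x40, description="PAGE_EXECUTE_READWRITE")
```

Subclasses need no constructor of their own unless they validate or convert the value.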
