mrexodia / dumpulator

An easy-to-use library for emulating memory dumps. Useful for malware analysis (config extraction, unpacking) and dynamic analysis in general (sandboxing).

License: Boost Software License 1.0

Python 41.39% C++ 6.63% C 51.98%
python python3 malware-analysis malware-research malware-analyzer unicorn minidump emulator x64 easy-to-use

dumpulator's People

Contributors

anthonyprintup, calastrophe, mrexodia, oopsmishap, regionuser

dumpulator's Issues

Saving/restoring state

Right now the state after a .call or .start persists. It would be nice to have a feature to roll back memory changes without having to reload the dump.

I think unicorn has a feature for this.

Trace points

The idea is to generate a unique number for each syscall invocation and print it in the log. The sequence should be deterministic for a given dumpulator version (and preferably across versions as well). Use cases:

  • "Breakpoint" (stop emulation or literally a debugger breakpoint) at a certain ID you saw previously in the log for closer inspection.
  • Enable single step tracing (slow) only after/between trace points.
  • Dump the state after a certain trace point to avoid re-running the same code over and over again (needs #13)
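A minimal sketch of the numbering idea, assuming a hypothetical `TracePoints` helper that is not part of dumpulator. Each syscall invocation takes the next value of a counter, so the IDs repeat exactly as long as the emulation itself is deterministic.

```python
from typing import Callable, List, Optional


class TracePoints:
    """Hypothetical helper: numbers syscall invocations sequentially."""

    def __init__(self, break_at: Optional[int] = None,
                 log: Callable[[str], None] = print):
        self.next_id = 0
        self.break_at = break_at  # trace point ID to stop at, if any
        self.log = log

    def hit(self, syscall_name: str) -> bool:
        """Record one invocation; return True if emulation should stop."""
        trace_id = self.next_id
        self.next_id += 1
        self.log(f"[{trace_id}] {syscall_name}")
        return trace_id == self.break_at


# Collect log lines in a list instead of printing them.
lines: List[str] = []
tp = TracePoints(break_at=2, log=lines.append)
stops = [tp.hit(name) for name in ("ZwOpenFile", "ZwReadFile", "ZwClose")]
```

The same return value could be used to switch on single-step tracing or to trigger a state dump instead of stopping.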

Incorrect width for Enums in 64bit

When a syscall processes an enum argument in a 64-bit dump, the value is treated as a 64-bit integer. Most enums in Windows syscalls are 32-bit values, however. I can't find documentation for this, but the trend is clear once you go through the syscalls that take enums.

When a syscall with an enum argument is called, Dumpulator reads the full 64-bit value, which results in a ValueError:

Error: ValueError: 549755813892 is not a valid FSINFOCLASS

549755813892 = 0x8000000004

Previous argument in the call had the value: 0x8

I tried to implement a ctypes-style Python enum, to no avail.
The following hotfix is enough to fix the issue. It could break if an enum is anything other than 32 bits wide, but I have yet to find one that isn't.

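elif issubclass(argtype, Enum):
    try:
        # Keep only the low 32 bits; the upper half of the argument is not part of the enum
        argvalue = argtype(dp.args[i] & 0xFFFFFFFF)
    except ValueError as x:  # Enum lookup by value raises ValueError, not KeyError
        raise Exception(f"Unknown enum value {dp.args[i]:#x} for {argtype.__name__}") from x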
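The effect of the mask can be checked in isolation. The sketch below is self-contained: the `FSINFOCLASS` members listed are an illustrative subset, and `enum_arg` is a hypothetical helper name, not dumpulator's API.

```python
from enum import IntEnum


class FSINFOCLASS(IntEnum):
    # Illustrative subset for the demo; not dumpulator's full definition.
    FileFsVolumeInformation = 1
    FileFsSizeInformation = 3
    FileFsDeviceInformation = 4


def enum_arg(argtype, raw: int):
    """Convert a raw 64-bit argument to an enum using only the low 32 bits."""
    try:
        return argtype(raw & 0xFFFFFFFF)
    except ValueError as exc:
        raise ValueError(
            f"Unknown enum value {raw:#x} for {argtype.__name__}"
        ) from exc


# The value from the report: upper bits set, low 32 bits equal to 4.
value = enum_arg(FSINFOCLASS, 0x8000000004)
```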

Support fixed-size arrays in `Struct`

This can currently not be ported:

def _RTL_PROCESS_MODULE_INFORMATION(arch: Architecture):
    class _RTL_PROCESS_MODULE_INFORMATION(ctypes.Structure):
        _alignment_ = arch.alignment()
        _fields_ = [
            ("Section", arch.ptr_type()),
            ("MappedBase", arch.ptr_type()),
            ("ImageBase", arch.ptr_type()),
            ("ImageSize", ctypes.c_uint32),
            ("Flags", ctypes.c_uint32),
            ("LoadOrderIndex", ctypes.c_uint16),
            ("InitOrderIndex", ctypes.c_uint16),
            ("LoadCount", ctypes.c_uint16),
            ("OffsetToFileName", ctypes.c_uint16),
            ("FullPathName", ctypes.c_ubyte * 256),
        ]
    return _RTL_PROCESS_MODULE_INFORMATION()
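For comparison, this is the behavior a `Struct` port would need to reproduce. The sketch uses plain ctypes with a hypothetical 64-bit-only class `ModuleInfo64`, where `c_uint64` stands in for `arch.ptr_type()` (an assumption made to keep the example self-contained).

```python
import ctypes


class ModuleInfo64(ctypes.Structure):
    # Hypothetical 64-bit-only layout: c_uint64 replaces arch.ptr_type().
    _fields_ = [
        ("Section", ctypes.c_uint64),
        ("MappedBase", ctypes.c_uint64),
        ("ImageBase", ctypes.c_uint64),
        ("ImageSize", ctypes.c_uint32),
        ("Flags", ctypes.c_uint32),
        ("LoadOrderIndex", ctypes.c_uint16),
        ("InitOrderIndex", ctypes.c_uint16),
        ("LoadCount", ctypes.c_uint16),
        ("OffsetToFileName", ctypes.c_uint16),
        ("FullPathName", ctypes.c_ubyte * 256),  # fixed-size array type
    ]


info = ModuleInfo64()
path = b"C:\\Windows\\System32\\ntdll.dll"

# Copy the path into the embedded array, then record where the name starts.
ctypes.memmove(info.FullPathName, path, len(path))
info.OffsetToFileName = path.rindex(b"\\") + 1

# Slicing a c_ubyte array yields a list of ints; bytes() turns it back.
name = bytes(info.FullPathName[info.OffsetToFileName:len(path)])
total_size = ctypes.sizeof(ModuleInfo64)
```

The array contributes its full 256 bytes to the structure size, so a `Struct` equivalent needs both element access and a correct size computation.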

Fix loading of kernel32 on modern systems

The SizeOfImage range is not contiguous in memory, so unicorn fails to read it. A fix would be to read the memory in page-sized (0x1000-byte) chunks, or to pass only the first page to pefile and load further pages when necessary.
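A rough sketch of the first option, assuming a page-aligned base and a hypothetical `read(addr, length)` callback that raises on unmapped memory. Unreadable pages are zero-filled so that offsets within the image stay correct; whether zero-filling is acceptable for pefile is an assumption.

```python
from typing import Callable

PAGE_SIZE = 0x1000


def read_image(read: Callable[[int, int], bytes], base: int, size: int) -> bytes:
    """Read `size` bytes starting at `base`, one page at a time."""
    out = bytearray()
    for offset in range(0, size, PAGE_SIZE):
        length = min(PAGE_SIZE, size - offset)
        try:
            out += read(base + offset, length)
        except Exception:
            out += bytes(length)  # gap in the image: pad to keep offsets aligned
    return bytes(out)


# Fake address space with a hole in the middle of the image.
pages = {0x10000: b"A" * PAGE_SIZE, 0x12000: b"C" * PAGE_SIZE}


def fake_read(addr: int, length: int) -> bytes:
    return pages[addr][:length]  # raises KeyError for the unmapped page


image = read_image(fake_read, 0x10000, 3 * PAGE_SIZE)
```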

allocate throwing error

Getting the following error with a simple allocation call.

Dumpulator Version 0.1.2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-c4eb66b71b05> in <module>
----> 1 temp_addr = dp.allocate(256)

~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/dumpulator/dumpulator.py in allocate(self, size, page_align)
    499         if not self._allocate_ptr:
    500             self._allocate_base = self.memory.find_free(self._allocate_size)
--> 501             self.memory.reserve(
    502                 start=self._allocate_base,
    503                 size=self._allocate_size,

~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/dumpulator/memory.py in reserve(self, start, size, protect, type, info)
    163         assert isinstance(type, MemoryType)
    164         assert size > 0 and self.align_page(size) == size
--> 165         assert self.align_allocation(start) == start
    166         region = MemoryRegion(start, size, protect, type, info)
    167         if region.start < self._minimum or region.end > self._maximum:

~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/dumpulator/memory.py in align_allocation(self, addr)
    146     def align_allocation(self, addr: int):
    147         mask = self._granularity - 1
--> 148         return (addr + mask) & ~mask
    149 
    150     def find_free(self, size: int):

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Reproduce

from dumpulator import Dumpulator
dp = Dumpulator("shell.dmp")
temp_addr = dp.allocate(256)

Zipped version shell.dmp (shell.dmp.zip) available on Malshare

Implement a memory manager

Currently there is no information about the mapped memory regions available. There should be a robust system to manage memory and allow syscalls to use it for memory queries:

dp.mem_protect(addr, size, Memory.ReadWrite)
info = dp.mem_find(addr)
dp.mem_allocate(size, addr=None)
dp.mem_free(addr)
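One possible shape for such a manager is a sorted list of regions with binary-search lookup. Everything below is a hypothetical sketch: the method names mirror the proposal, but `mem_allocate` requires an explicit address, `mem_protect` changes a whole region, and protection is a plain string.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 0x1000


@dataclass
class Region:
    start: int
    size: int
    protect: str

    @property
    def end(self) -> int:
        return self.start + self.size


class MemoryMap:
    """Hypothetical region tracker; regions are kept sorted by start address."""

    def __init__(self) -> None:
        self._regions: List[Region] = []

    def mem_allocate(self, size: int, addr: int, protect: str = "rw") -> Region:
        size = (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)  # round up to pages
        region = Region(addr, size, protect)
        index = bisect.bisect_left([r.start for r in self._regions], addr)
        before = self._regions[index - 1] if index > 0 else None
        after = self._regions[index] if index < len(self._regions) else None
        if (before and before.end > addr) or (after and after.start < region.end):
            raise ValueError(f"region at {addr:#x} overlaps an existing one")
        self._regions.insert(index, region)
        return region

    def mem_find(self, addr: int) -> Optional[Region]:
        index = bisect.bisect_right([r.start for r in self._regions], addr) - 1
        if index >= 0 and addr < self._regions[index].end:
            return self._regions[index]
        return None

    def mem_protect(self, addr: int, protect: str) -> None:
        region = self.mem_find(addr)
        if region is None:
            raise KeyError(f"{addr:#x} is not mapped")
        region.protect = protect

    def mem_free(self, addr: int) -> None:
        region = self.mem_find(addr)
        if region is None or region.start != addr:
            raise KeyError(f"{addr:#x} is not the start of a region")
        self._regions.remove(region)


mm = MemoryMap()
mm.mem_allocate(0x1800, addr=0x10000)
mm.mem_protect(0x10000, "r")
found = mm.mem_find(0x11FFF)
past_end = mm.mem_find(0x12000)
mm.mem_free(0x10000)
after_free = mm.mem_find(0x10000)
```

Rebuilding the list of start addresses on every call keeps the sketch short; a real implementation would maintain it alongside the regions.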

Improve performance of the `LazyPageManager`

This is a very naïve implementation. It stores a map of page_address -> LazyPage in the PageManager. Initially none of the pages are committed, so the first access to a page raises a memory exception once execution starts. When this happens the page is committed and emulation resumes.

No optimization has been done on the data structure yet, so a 10GB RW region would create millions of dictionary entries (about 2.6 million with 0x1000-byte pages). The speedup is still very significant though.

#36

  • Lazily load the data from the minidump structure to improve startup times even further
  • Switch to a region-based structure (like the MemoryManager)
  • Load the full region in handle_lazy_page to reduce the amount of page faults
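The last two items can be sketched together. The class below is a hypothetical stand-in, not the actual `LazyPageManager`: it keeps one entry per region and commits the whole region on the first access, so one fault covers every page in it. The final line computes how many entries a per-page dictionary would need for a 10 GiB mapping with 0x1000-byte pages.

```python
PAGE_SIZE = 0x1000


class LazyRegions:
    """Hypothetical lazy store with one entry per region instead of per page."""

    def __init__(self) -> None:
        self.regions = {}    # start -> (size, loader callback)
        self.committed = {}  # start -> bytes, filled on first access
        self.faults = 0

    def add(self, start: int, size: int, loader) -> None:
        self.regions[start] = (size, loader)

    def read(self, addr: int, length: int) -> bytes:
        for start, (size, loader) in self.regions.items():
            if start <= addr and addr + length <= start + size:
                if start not in self.committed:
                    self.faults += 1  # stands in for the memory exception
                    self.committed[start] = loader()
                offset = addr - start
                return self.committed[start][offset:offset + length]
        raise KeyError(f"{addr:#x} is not mapped")


lazy = LazyRegions()
lazy.add(0x400000, 4 * PAGE_SIZE, lambda: bytes(range(256)) * 64)
first = lazy.read(0x400010, 4)   # commits the whole region
second = lazy.read(0x403000, 1)  # different page, no new fault
per_page_entries = (10 * 2**30) // PAGE_SIZE
```

The linear scan over regions is only for brevity; a sorted structure would make the lookup logarithmic.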

Better error when information is missing

   dp = Dumpulator("c:\\tmp\\dump1.dmp", trace=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\dumpulator\dumpulator.py", line 322, in __init__
    self._setup_memory()
  File "C:\Python311\Lib\site-packages\dumpulator\dumpulator.py", line 412, in _setup_memory
    for info in self._minidump.memory_info.infos:
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'infos'

This happens if the dump was created without the MiniDumpWithFullMemoryInfo flag.
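A clearer failure could come from an explicit check before the loop. This is only a sketch: `require_memory_info`, `MissingMemoryInfoError` and `FakeDump` are hypothetical names, and the only thing taken from the traceback is that `memory_info` is `None` for such dumps.

```python
class MissingMemoryInfoError(RuntimeError):
    """Hypothetical error type for dumps that lack the memory info stream."""


def require_memory_info(minidump):
    """Return minidump.memory_info, or fail with an actionable message."""
    memory_info = getattr(minidump, "memory_info", None)
    if memory_info is None:
        raise MissingMemoryInfoError(
            "The dump has no memory info list; "
            "recreate it with the MiniDumpWithFullMemoryInfo flag."
        )
    return memory_info


class FakeDump:
    memory_info = None  # stand-in for a parsed dump without the stream


try:
    require_memory_info(FakeDump())
    message = ""
except MissingMemoryInfoError as exc:
    message = str(exc)
```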

Dumpulator takes a long time to load when handling large dumps

Feature Request: Please look into addressing the issue where large dump files aren't easily processed by dumpulator.

Description: Dumpulator is taking an excessively long time to load a dump file of 40MB. I was testing the decryption of an RC4 encrypted configuration block from gh0stRAT. I dumped out the process and it was 40MB. I fed this dump into dumpulator and it was still trying to load after 20 minutes.

Sample: https://www.virustotal.com/gui/file/0a9c881b857d4044cb6dfeba682190d7d9dc6ef94bc149cac77f3e0f65e9c69a

Implement a module manager

Module information from the PEB data (base, size, path, exports, etc.) should be available through a simple interface.

Design ctypes equivalent for syscall implementation

Currently the type system for syscalls is very rough and you need to do a lot of manual work. A type system similar to ctypes needs to be implemented where you can set struct members, work with enums etc.

Once the type system is complete a pdb/header parser can be implemented to support all the native types.

Implement a handle manager

Handles are currently hacked in. Add a handle manager that allows something like this:

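def ZwCreateFile_syscall(...):
    handle_data = (...)
    # Register the data and hand the new handle value back to the caller
    hFile = dp.handles.new(handle_data)
    return hFile

def ZwReadFile_syscall(hFile: HANDLE, ...):
    # Look up the data stored for this handle (None if the handle is unknown)
    handle_data = dp.handles.get(hFile, None)
    return

def ZwClose_syscall(hFile: HANDLE):
    dp.handles.close(hFile)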

It should also support duplication (every handle points to refcounted data).
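A minimal sketch of such a manager, with duplication implemented as several handle values pointing at one refcounted payload. The class and method names are hypothetical and only mirror the `dp.handles` calls proposed above; handing out values in steps of 4 imitates Windows handle values.

```python
from typing import Any, Dict


class _Shared:
    """Refcounted payload shared by all duplicates of a handle."""

    def __init__(self, data: Any) -> None:
        self.data = data
        self.refs = 1


class HandleManager:
    """Hypothetical sketch of the proposed dp.handles interface."""

    def __init__(self, first: int = 0x100) -> None:
        self._next = first
        self._handles: Dict[int, _Shared] = {}

    def _insert(self, shared: _Shared) -> int:
        handle = self._next
        self._next += 4  # step of 4, imitating Windows handle values
        self._handles[handle] = shared
        return handle

    def new(self, data: Any) -> int:
        return self._insert(_Shared(data))

    def get(self, handle: int, default: Any = None) -> Any:
        shared = self._handles.get(handle)
        return default if shared is None else shared.data

    def duplicate(self, handle: int) -> int:
        shared = self._handles[handle]
        shared.refs += 1
        return self._insert(shared)

    def close(self, handle: int) -> bool:
        """Drop one handle; return True when the last reference is gone."""
        shared = self._handles.pop(handle)
        shared.refs -= 1
        return shared.refs == 0


handles = HandleManager()
h1 = handles.new({"path": "C:\\test.txt"})
h2 = handles.duplicate(h1)
closed_first = handles.close(h1)  # data survives, h2 still refers to it
still_open = handles.get(h2)
closed_last = handles.close(h2)   # last reference released
```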

Proper (extensible) testing

Currently the "tests" are just running the "getting-started" examples. This doesn't scale and makes it difficult to find regressions.

The test framework:

  • A single unified Visual Studio solution with test executables
  • Build the solution with GitHub Actions so new feature PRs can be tested in the same PR
  • A dump of a minimal Windows usermode environment (ntdll, kernel32, kernelbase, x64+x86)
  • A dump of a more "complete" environment (winsock, cryptapi, advapi32, shell32, user32, etc.)
  • Load the test executables with dp.map_module
  • Execute the (exported) test functions on a fresh Dumpulator instance each time
  • Calculate code coverage per test

Necessary tests:

All of these should use /NODEFAULTLIB so they don't depend on the MSVC runtime initialization (which uses a lot of unimplemented syscalls).

  • ExceptionTest (for testing different exception scenarios)
  • LoadLibrary (to test map_module and the whole PEB loading chain)
  • StringEncryptionFun
  • More depending on coverage...

x64 dump of x86 process fail to emulate

I have noticed that if you dump an x86 process on an x64 OS with a 64-bit tool (Task Manager or procdump64, for example), dumpulator tries to emulate everything as x64 even though most of the dump is actually x86, which causes unexpected behavior. This doesn't happen with an x86 dump taken from x32dbg or the 32-bit procdump.
