bediger4000 / reverse-php-malware

De-obfuscate and reverse engineer PHP malware

License: MIT License

PHP 99.33% Shell 0.67%
de-obfuscates-php malware deobfuscator deobfuscate

reverse-php-malware's Introduction

An aid to de-obfuscating PHP malware

Lots of malware that afflicts WordPress, Joomla and other PHP-based web sites is written in PHP. PHP is an interpreted language, so attackers distribute malware as source code. Much of the PHP malware is obfuscated. This (PHP) program does de-obfuscation to aid human understanding of the malware.

If you come across or possess PHP malware that this program doesn't de-obfuscate, email me at [email protected] and I will look into improving this code to handle your malware.

Why is PHP malware obfuscated?

My guess is that attackers obfuscate their PHP code for three reasons:

  1. To evade simple signature or checksum based malware detection.
  2. To attempt to keep website owners from understanding what the malware does.
  3. To keep other malware writers from "stealing" their code, or even understanding it.

I base guess no. 1 on the fact that obfuscation methods change rapidly, sometimes only getting used for a single installation of malware.

I make guess no. 2 because the obfuscation is often just a "visual confusion" thing, rather than any kind of encryption. Having assert() evaluate a single, very long line of PHP isn't going to fool any algorithm, but the human eye might glide right past it.

I base guess no. 3 on the fact that most PHP malware is either embarrassingly simple, or evolves by wholesale feature addition, even if that feature is a hidden back door or phone-home emails. Keeping other inept programmers from understanding the code might give an individual a temporary advantage.

Features

  • It replaces strings obfuscated by Base64 (encoding and decoding), Rot13, URL-encoding, reversal and some forms of compression.
  • It can de-obfuscate strings that are created by composing encoding, decoding, compression and other manipulations.
  • It can replace function names that are obfuscated by indirection (i.e. $function($arg1, $arg2...);), or by tricky use of $GLOBALS.
  • It can replace variable names that are obfuscated by the same indirections.
  • It replaces arguments of functions of special interest (eval(), fopen(), preg_replace(), etc) with de-obfuscated, or otherwise statically determined values.
  • It aggregates concatenated strings, or concatenated mixes of strings and obscuring function calls.
  • It evaluates Array() creations to allow deobfuscating strings made by concatenating array elements.
  • It pretty-prints function body arguments of create_function() invocations, composing names for the anonymous functions created that way, and uses those names to de-obfuscate.

Evaluating Array() calls when creating arrays means that revphp changes its own code on-the-fly. Hopefully this doesn't lead to code injection from malware into revphp, but the possibility is there. For better or for worse, malware uses arrays of strings quite often, so some feature like this is necessary.
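To make the "composed" obfuscation from the feature list concrete, here is a minimal sketch using only PHP built-ins. The payload string and the helper name undoLayers() are made up for illustration; this is not revphp's code, and it assumes the zlib extension is available for gzdeflate() and gzinflate().

```php
<?php
// Hypothetical payload; real malware would hide something less friendly.
$payload = 'echo "Hello, world";';

// Obfuscate by composing several reversible transforms, as malware often does.
$obfuscated = base64_encode(gzdeflate(str_rot13(strrev($payload))));

// De-obfuscation applies the inverse transforms in the opposite order.
function undoLayers(string $s): string
{
    $decoded = base64_decode($s, true);   // strict mode: false on invalid input
    if ($decoded === false) {
        throw new InvalidArgumentException('not valid Base64');
    }
    $inflated = @gzinflate($decoded);     // false if not a raw DEFLATE stream
    if ($inflated === false) {
        throw new InvalidArgumentException('not a raw DEFLATE stream');
    }
    // str_rot13() is its own inverse, and so is strrev().
    return strrev(str_rot13($inflated));
}

$recovered = undoLayers($obfuscated);
```

The order matters: each layer has to be peeled off in the reverse of the order in which it was applied, which is why a static de-obfuscator has to evaluate nested calls from the inside out.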

Installing

Use composer to retrieve the latest PHP-Parser code:

composer install

After that, everything should be in place.

Usage

Basic usage involves a file full of obfuscated PHP, and stdout:

/wherever/reverse-php-malware/revphp  obfuscated.php > pretty.php

or

/wherever/reverse-php-malware/revphp  -R obfuscated.php > pretty.php

The -R flag causes revphp to examine all variable names and replace those names that are indirected by various techniques.

Command line flag -C causes it to leave comments in the output code. Ordinarily it deletes comments, because who can believe comments in malware?

Should you find a function in some malware that deserves to have its arguments decoded, you can add that via a -f flag. For instance, fwrite() calls don't have their arguments de-obfuscated by default. To get revphp to do that:

/wherever/reverse-php-malware/revphp -f fwrite obfuscated.php > cleanedup.php

Occasionally you will want to rename a function (and its calls) in the de-obfuscated code. You can use the -F original=new flag:

/wherever/reverse-php-malware/revphp -F OO_000O__O=htaccess_creator obfuscated.php > cleanedup.php

In the file cleanedup.php, all calls to OO_000O__O() will appear as htaccess_creator(), and the function definition will also appear as function htaccess_creator().

Very rarely, a malware author will put in a unique obfuscating function that's not merely a composition of base64_encode(), gzinflate() and str_rot13(). In that case, you can extract the obfuscating function into its own file. revphp can read, evaluate, and use that special function during deobfuscation:

/wherever/reverse-php-malware/revphp -D decoding_function.php obfuscated.php > cleanedup.php

The testing script runtests includes a test of a unique obfuscating function. runtests invokes this:

./revphp -r zork -D tests/zork.php tests/t1_1.php

The PHP functions in file tests/zork.php get read in and evaluated using the -D flag. The -r zork flag causes revphp to examine and replace any obfuscated arguments to invocations of function zork() in the subject PHP, tests/t1_1.php. This is a somewhat confusing example, because tests/zork.php contains the definition of function zork(), and so does tests/t1_1.php. One function zork(), the one in tests/t1_1.php, just gets de-obfuscated. The other definition of function zork(), in tests/zork.php, gets read in and evaluated by the -D flag. The -r zork flag causes revphp to invoke the read-in-and-evaluated function zork() while revphp is traversing the parse tree of tests/t1_1.php.

This closely mimics a realistic situation, where you might run revphp on some malware PHP. You find that revphp can't decode some key obfuscated strings because the malware PHP has a custom decoding function. You can extract a copy of the custom decoding function into a file, and re-invoke revphp with appropriate -D and -r flags to cause the important strings to be de-obfuscated by the custom decoding function.
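As an illustration of what such an extracted file might contain, here is a purely hypothetical custom decoder. The function name shifted_b64() and its algorithm are invented for this sketch and are not taken from tests/zork.php or from any real sample.

```php
<?php
// decoding_function.php (hypothetical): a custom decoder lifted out of a sample.
// It subtracts 1 from every byte, then Base64-decodes the result.
function shifted_b64(string $s): string
{
    $shifted = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $shifted .= chr((ord($s[$i]) - 1) & 0xFF);
    }
    // Non-strict base64_decode() skips invalid characters; the cast guards
    // against a false return on malformed input.
    return (string) base64_decode($shifted);
}
```

Because the -D flag reads in and evaluates the file, it seems prudent to copy only the decoding function itself and to read it first, checking that it does nothing beyond string manipulation.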

Design

revphp is written in PHP, and de-obfuscates PHP, in a kind of philosophical short-circuit.

revphp uses PHP-Parser to create a parse tree from a source file, then traverses the parse tree. It keeps a global symbol table, and local symbol tables, which are created and destroyed on parse-tree-function entrance and exit.

During the traversal of the parse tree, it keeps track of assignments to variables. Any value it can de-obfuscate by base64_decode(), urldecode(), strrev(), gzinflate() and gzuncompress(), it will associate with the variable's name in the symbol table. It substitutes de-obfuscated values for obfuscated values in the parse tree. Most of the work revphp does is evaluating (if it can) the right-hand side of assignment statements. PHP malware tends to use a lot of superfluous variables, and a lot of assignments to and from those superfluous variables. Tracking variable contents allows de-obfuscation later.
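The assignment tracking described above can be sketched without PHP-Parser. In the sketch below, nested arrays are a hypothetical stand-in for parse-tree nodes, and the names DECODERS and evaluate() are invented for illustration; revphp's actual visitor works on PhpParser node objects and is more thorough.

```php
<?php
// Whitelist of pure decoding functions that are acceptable to run statically.
const DECODERS = ['base64_decode', 'urldecode', 'strrev', 'str_rot13',
                  'gzinflate', 'gzuncompress'];

/**
 * Evaluate a simplified expression against a symbol table.
 * Expression forms (hypothetical):
 *   ['str', 'literal']        string literal
 *   ['var', 'name']           variable reference
 *   ['concat', expr, expr]    string concatenation
 *   ['call', 'fn', expr]      one-argument function call
 * Returns the string value, or null when it cannot be determined statically.
 */
function evaluate(array $expr, array $symbols): ?string
{
    switch ($expr[0]) {
        case 'str':
            return $expr[1];
        case 'var':
            return $symbols[$expr[1]] ?? null;
        case 'concat':
            $l = evaluate($expr[1], $symbols);
            $r = evaluate($expr[2], $symbols);
            return ($l === null || $r === null) ? null : $l . $r;
        case 'call':
            $arg = evaluate($expr[2], $symbols);
            if ($arg === null || !in_array($expr[1], DECODERS, true)) {
                return null;   // unknown function or unknown argument
            }
            $value = @call_user_func($expr[1], $arg);
            return is_string($value) ? $value : null;
    }
    return null;
}

// Walk the assignments in order, recording every value that can be resolved.
$symbols = [];
$assignments = [
    ['a', ['str', 'c3lz']],
    ['b', ['str', 'dGVt']],
    ['c', ['call', 'base64_decode', ['concat', ['var', 'a'], ['var', 'b']]]],
    ['d', ['call', 'unknown_fn', ['var', 'c']]],
];
foreach ($assignments as [$name, $rhs]) {
    $value = evaluate($rhs, $symbols);
    if ($value !== null) {
        $symbols[$name] = $value;
    }
}
```

After the loop, $symbols['c'] holds the decoded function name, while 'd' stays unresolved because unknown_fn is not on the whitelist. Restricting evaluation to a whitelist keeps the de-obfuscator from running arbitrary code taken from the sample.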

PHP malware tries to obfuscate function calls, both names of functions, and arguments to functions. When revphp reaches a function call in the parse tree, it tries to de-obfuscate any indirect function names (like $fn()), substituting de-obfuscated for obfuscated function name in the parse tree. If revphp happens upon an instance of a select list of functions (some built-in, or set by -f flag on command line), it examines any arguments and tries to substitute de-obfuscated arguments for obfuscated arguments in the parse tree.

Keeping a global and local symbol table allows revphp to de-obfuscate constructions like this:

<?php
$glorf = 'c3lzdGVt';
$frolg = 'ZWNobyAiSGVsbG8sIHdvcmxkIg==';
// ... lots of code ...
function doBadStuff() {
    $fn = base64_decode($GLOBALS['glorf']);
    $fn(base64_decode($GLOBALS['frolg']));
}
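A rough sketch of how global and local symbol tables could resolve the example above follows. The class SymbolTables and its methods are hypothetical names for this illustration, not revphp's internals.

```php
<?php
// Minimal sketch of scoped symbol tables (hypothetical, not revphp's own class).
final class SymbolTables
{
    /** @var array<string,string> */
    private $globals = [];
    /** @var array<int,array<string,string>> stack of local scopes */
    private $locals = [];

    // A local table is created on function entry and destroyed on exit.
    public function enterFunction(): void { $this->locals[] = []; }
    public function leaveFunction(): void { array_pop($this->locals); }

    public function set(string $name, string $value): void
    {
        if ($this->locals) {
            $this->locals[count($this->locals) - 1][$name] = $value;
        } else {
            $this->globals[$name] = $value;
        }
    }

    // Plain variables resolve in the current scope only, as in PHP itself.
    public function get(string $name): ?string
    {
        if ($this->locals) {
            return $this->locals[count($this->locals) - 1][$name] ?? null;
        }
        return $this->globals[$name] ?? null;
    }

    // $GLOBALS['name'] always resolves against the global table.
    public function getGlobal(string $name): ?string
    {
        return $this->globals[$name] ?? null;
    }
}

$st = new SymbolTables();
$st->set('glorf', 'c3lzdGVt');
$st->set('frolg', 'ZWNobyAiSGVsbG8sIHdvcmxkIg==');

$st->enterFunction();                                  // function doBadStuff() {
$st->set('fn', base64_decode($st->getGlobal('glorf')));
$resolvedName = $st->get('fn');                        // name behind $fn
$resolvedArg  = base64_decode($st->getGlobal('frolg'));
$st->leaveFunction();                                  // }
$afterExit = $st->get('fn');                           // local scope is gone
```

Here $resolvedName is 'system' and $resolvedArg is 'echo "Hello, world"', which is the information needed to rewrite the indirect call $fn(...) as a direct one.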

After it has completely traversed the parse tree, revphp uses a PHP-Parser built-in pretty-printer to present the user with a (possibly de-obfuscated) source translation. Any anonymous functions created with create_function() calls have their bodies pretty printed at this time. Pretty-printing malware is about half the way towards understanding it.

The design of PHP-Parser caused me to create class RevPHPNodeVisitor extends PhpParser\NodeVisitorAbstract which contains almost all of the above functionality.

Testing

Directories zoo/ and tests/ contain pieces of PHP that illustrate obfuscations found in PHP malware. Many of the test cases are simplified extracts from malware that earlier versions of revphp had problems de-obfuscating. Invoking runtests will execute revphp against all PHP fragments in zoo/, and check the output against desired/correct outputs in desired/.

runtests also executes more complex scenarios, with code residing in tests/.

reverse-php-malware's Issues

PHP Warning: addcslashes() expects parameter 1 to be string, object given

Hi, getting this error 5 times before it ends:

PHP Warning: addcslashes() expects parameter 1 to be string, object given in /.../reverse-php-malware/vendor/nikic/php-parser/lib/PhpParser/PrettyPrinter/Standard.php on line 101

Using:

$ php -v
PHP 7.3.21 (cli) (built: Aug  4 2020 08:06:20) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.21, Copyright (c) 1998-2018 Zend Technologies

Tried replacing vendor/nikic/php-parser with latest version, but then got this:

PHP Notice:  Undefined property: PhpParser\Node\Param::$name in /.../reverse-php-malware/RevPHPNodeVisitor.php on line 402
PHP Notice:  Undefined property: PhpParser\Node\Param::$name in /.../reverse-php-malware/RevPHPNodeVisitor.php on line 402
PHP Notice:  Undefined property: PhpParser\Node\Param::$name in /.../reverse-php-malware/RevPHPNodeVisitor.php on line 402
PHP Fatal error:  Uncaught TypeError: Argument 1 passed to PhpParser\Node\Scalar\String_::__construct() must be of the type string, object given, called in /.../reverse-php-malware/RevPHPNodeVisitor.php on line 439 and defined in /.../reverse-php-malware/vendor/nikic/php-parser/lib/PhpParser/Node/Scalar/String_.php:36
Stack trace:
#0 /.../reverse-php-malware/RevPHPNodeVisitor.php(439): PhpParser\Node\Scalar\String_->__construct(Object(PhpParser\Node\Expr\Variable))
#1 /.../reverse-php-malware/RevPHPNodeVisitor.php(498): RevPHPNodeVisitor->newNodeAsType(Object(PhpParser\Node\Expr\Variable))
#2 /.../reverse-php-malware/RevPHPNodeVisitor.php(700): RevPHPNodeVisitor->performAssignment(Object(PhpParser\Node\Expr\Assign))
#3 /.../reverse-php-malware/vendor/nikic/php-parser/lib/PhpParser/NodeTraverser.php(153): RevPHPNodeVisitor->leaveNode(Object(PhpParser\Node\Expr\Assign))
#4 /... in /.../reverse-php-malware/vendor/nikic/php-parser/lib/PhpParser/Node/Scalar/String_.php on line 36

I need little help

Hi, I need help with merged and obfuscated code.
I have two files to decode. One of them contains only the variable definitions, for example code like this:
public $x673 = null;
public $x681 = null;
public $x68c = null;
public $x6be = null;

function __construct()
{
    $this->x673 = new \StdClass();
    $this->x681 = new \StdClass();
    $this->x68c = new \StdClass();
    $this->x6be = new \StdClass();
    $this->x673->x6cf = "\x78\66\143\144";
    $this->x6be->x176c = "\1PbqpeGUtu6gwPq2ujNkcnqEkDHqHqk2eR7";
    $this->x673->x8bf = "\x78\70b\141";
    $this->x681->xbfc = "\x78b\146\70";
    $this->x6be->x13d4 = "\x78\61\63d\61";....

The other obfuscated file takes its function and variable names from this file through class inheritance (extends).
How can I decode these files correctly? If I decode the first file (definitions only), it decodes correctly, but the other one gives me the following errors:
Couldn't find variable name
Couldn't find variable name
Could not find function name to see if arguments get replaced, line 13.
Couldn't find variable name
Could not find function name to see if arguments get replaced, line 13.
Couldn't find variable name
Could not find function name to see if arguments get replaced, line 13.
Could not find function name to see if arguments get replaced, line 13.
Could not find function name to see if arguments get replaced, line 13.
Could not find function name to see if arguments get replaced, line 13.
Could not find function name to see if arguments get replaced, line 13.
Could not find function name to see if arguments get replaced, line 13.

Not malware, but contains good exercises for deobfuscating

Sample source code: https://malwaredecoder.com/result/0a2a7c5bb813d755f72823f2a5895ac8

This is the cleanup module of the ai-bolit scanner, and it is a great sample of obfuscation techniques.

Until the third run of deobfuscation, I was missing the point about using the magic constant __FILE__

$f000001 = basename(__FILE__); // 'procu2.php' 

The obfuscation in this specific script relies only on the basename, procu2.php, but for the sake of completeness, the full path and filename of the file is:

/opt/psa/admin/plib/modules/revisium-antivirus/library/externals/procu2.php

Regarding the usage of __FILE__, I noticed a good solution in the linked malwaredecoder source.

  1. The first issue, which reverse-php-malware and PHP-Parser could not solve on their own, was a goto pair that jumps into the middle of the code, sets the function alias variables, then jumps back to the beginning.
    Without the variable-to-function replacements, a lot of things cannot be further deobfuscated.

  2. The second issue, which needed manual help, was the use of the XOR operator, sometimes combined with variable concatenation and string-part deobfuscation such as base64_decode, to generate the function names:

$f000000001 = "CDB" ^ "000"; // "str"
$a000000001 = $f000000001 . ("E]FDESE" ^ "1234567"); // "str" . "toupper" = "strtoupper"
$b000000001 = $f000000001 . '_repeat'; // "str" ."_repeat" = "str_repeat"
$d000000001 = ("P@AULi@YU" ^ "1234567890") . 'k'; // "array_wal".'k' = "array_walk"
$c000000001 = strlen($a000000001) - 4; // = 6
  3. The third issue, which had to be resolved manually, was caused by some added complexity:
    either the name of a variable that represented a function was concatenated in a weird way, as in array_map("i1001" . '100100111100111', array(''));
    or, in one case, the multi-level obfuscation generated a list of function names that array_walk called in their order of appearance, passing each to a wrapper function that processed a strongly obfuscated string recursively, in place of an explicit combination of base64_decode, compression, rot13, etc. calls.
$f000000001 = 'str';
$a000000001 = 'strtoupper';
$b000000001 = 'str_repeat';
$d000000001 = 'array_walk';
$c000000001 = 6;
$e000000001 = 'i1001010100101001';
$f000001 = $a000000001($b000000001($f000001, $c000000001)); // $f000001 = strtoupper(str_repeat('procu2.php', 6));
$i01000010100 = $i0010100010101(",", $i00100001000100("MjM8JmMGcTQtMz82Km8ySEc+LjwxJipvJkZcDzo/JGN8byZGXCItJnwwLjAwBBoPLDUzPSsm") ^ $f000001); // explode(',', "base64_decode,gzinflate,str_rot13,strrev,base64_decode");
$d000000001($i01000010100, $e000000001); // array_walk(array('base64_decode','gzinflate','str_rot13','strrev','base64_decode'), i1001010100101001);
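The XOR constructions from the second issue can be checked in isolation. The sketch below re-evaluates the three literal XOR expressions quoted above and adds a hypothetical helper, xorWithRepeatedKey(), that shows how a repeated key (such as the upper-cased, repeated basename used in this sample) decodes a longer string. The helper name is an assumption for illustration and is not part of revphp or of the sample.

```php
<?php
// PHP's ^ operator on two strings XORs them byte by byte and truncates
// the result to the length of the shorter operand.
$str   = "CDB" ^ "000";
$upper = $str . ("E]FDESE" ^ "1234567");
$walk  = ("P@AULi@YU" ^ "1234567890") . 'k';

// XOR is its own inverse: applying the same key twice restores the input.
function xorWithRepeatedKey(string $data, string $key): string
{
    if ($key === '') {
        throw new InvalidArgumentException('key must not be empty');
    }
    // Repeat the key until it is at least as long as the data.
    $repeated = str_repeat($key, (int) ceil(strlen($data) / strlen($key)));
    return $data ^ $repeated;   // result is truncated to strlen($data)
}
```

Since the key in this sample is derived from basename(__FILE__), a copy of the script saved under a different filename will decode to garbage, so a static de-obfuscator needs to be told the original filename.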
