trailofbits / fickling Goto Github PK

A Python pickling decompiler and static analyzer

License: GNU Lesser General Public License v3.0

Python 99.02% Makefile 0.98%

fickling's Introduction

Fickling

Fickling is a decompiler, static analyzer, and bytecode rewriter for Python pickle object serializations. You can use fickling to detect, analyze, reverse engineer, or even create malicious pickle or pickle-based files, including PyTorch files.

Fickling can be used both as a python library and a CLI.

Installation
Malicious file detection
Advanced usage
More information
Contact

Installation

Fickling has been tested on Python 3.8 through Python 3.11 and has very few dependencies. Both the library and command line utility can be installed through pip:

python -m pip install fickling

PyTorch is an optional dependency of Fickling. Therefore, in order to use Fickling's pytorch and polyglot modules, you should run:

python -m pip install fickling[torch]

Malicious file detection

Fickling can seamlessly be integrated into your codebase to detect and halt the loading of malicious files at runtime.

Below we show the different ways you can use fickling to enforce safety checks on pickle files. Under the hood, it hooks the pickle library to add safety checks so that loading a pickle file raises an UnsafeFileError exception if malicious content is detected in the file.

Option 1 (recommended): check safety of all pickle files loaded

# This enforces safety checks every time pickle.load() is used
fickling.always_check_safety()

# Attempt to load an unsafe file now raises an exception  
with open("file.pkl", "rb") as f:
    try:
        pickle.load(f)
    except fickling.UnsafeFileError:
        print("Unsafe file!")

Option 2: use a context manager

with fickling.check_safety():
    # All pickle files loaded within the context manager are checked for safety
    try:
        with open("file.pkl", "rb") as f:
            pickle.load("file.pkl")
    except fickling.UnsafeFileError:
        print("Unsafe file!")

# Files loaded outside of context manager are NOT checked
pickle.load("file.pkl")

Option 3: check and load a single file

# Use fickling.load() in place of pickle.load() to check safety and load a single pickle file 
try:
    fickling.load("file.pkl")
except fickling.UnsafeFileError as e:
    print("Unsafe file!")

Option 4: only check pickle file safety without loading

# Perform a safety check on a pickle file without loading it
if not fickling.is_likely_safe("file.pkl"):
    print("Unsafe file!")

Accessing the safety analysis results

You can access the details of fickling's safety analysis from within the raised exception:

>>> try:
...     fickling.load("unsafe.pkl")
... except fickling.UnsafeFileError as e:
...     print(e.info)

{
    "severity": "OVERTLY_MALICIOUS",
    "analysis": "Call to `eval(b'[5, 6, 7, 8]')` is almost certainly evidence of a malicious pickle file. Variable `_var0` is assigned value `eval(b'[5, 6, 7, 8]')` but unused afterward; this is suspicious and indicative of a malicious pickle file",
    "detailed_results": {
        "AnalysisResult": {
            "OvertlyBadEval": "eval(b'[5, 6, 7, 8]')",
            "UnusedVariables": [
                "_var0",
                "eval(b'[5, 6, 7, 8]')"
            ]
        }
    }
}

If you are using another language than Python, you can still use fickling's CLI to safety-check pickle files:

fickling --check-safety -p pickled.data

Advanced usage

Trace pickle execution

Fickling's CLI allows to safely trace the execution of the Pickle virtual machine without exercising any malicious code:

fickling --trace file.pkl

Pickle code injection

Fickling allows to inject arbitrary code in a pickle file that will run every time the file is loaded

fickling --inject "print('Malicious')" file.pkl

Pickle decompilation

Fickling can be used to decompile a pickle file for further analysis

>>> import ast, pickle
>>> from fickling.fickle import Pickled
>>> fickled_object = Pickled.load(pickle.dumps([1, 2, 3, 4]))
>>> print(ast.dump(fickled_object.ast, indent=4))
Module(
    body=[
        Assign(
            targets=[
                Name(id='result', ctx=Store())],
            value=List(
                elts=[
                    Constant(value=1),
                    Constant(value=2),
                    Constant(value=3),
                    Constant(value=4)],
                ctx=Load()))],
    type_ignores=[])

PyTorch polyglots

PyTorch contains multiple file formats with which one can make polyglot files, which are files that can be validly interpreted as more than one file format. Fickling supports identifying, inspecting, and creating polyglots with the following PyTorch file formats:

PyTorch v0.1.1: Tar file with sys_info, pickle, storages, and tensors
PyTorch v0.1.10: Stacked pickle files
TorchScript v1.0: ZIP file with model.json
TorchScript v1.1: ZIP file with model.json and attributes.pkl
TorchScript v1.3: ZIP file with data.pkl and constants.pkl
TorchScript v1.4: ZIP file with data.pkl, constants.pkl, and version set at 2 or higher (2 pickle files and a folder)
PyTorch v1.3: ZIP file containing data.pkl (1 pickle file)
PyTorch model archive format[ZIP]: ZIP file that includes Python code files and pickle files

>> import torch
>> import torchvision.models as models
>> from fickling.pytorch import PyTorchModelWrapper
>> model = models.mobilenet_v2()
>> torch.save(model, "mobilenet.pth")
>> fickled_model = PyTorchModelWrapper("mobilenet.pth")
>> print(fickled_model.formats)
Your file is most likely of this format:  PyTorch v1.3 
['PyTorch v1.3']

Check out our examples to learn more about using fickling!

More information

Pickled Python objects are in fact bytecode that is interpreted by a stack-based virtual machine built into Python called the "Pickle Machine". Fickling can take pickled data streams and decompile them into human-readable Python code that, when executed, will deserialize to the original serialized object. This is made possible by Fickling’s custom implementation of the PM. Fickling is safe to run on potentially malicious files because its PM symbolically executes code rather than overtly executing it.

The authors do not prescribe any meaning to the “F” in Fickling; it could stand for “fickle,” … or something else. Divining its meaning is a personal journey in discretion and is left as an exercise to the reader.

Learn more about fickling in our blog post and DEF CON AI Village 2021 talk.

Contact

If you'd like to file a bug report or feature request, please use our issues page. Feel free to contact us or reach out in Empire Hacking for help using or extending fickling.

License

This utility was developed by Trail of Bits. It is licensed under the GNU Lesser General Public License v3.0. Contact us if you're looking for an exception to the terms.

fickling's People

Contributors

Stargazers

Watchers

fickling's Issues

Add direct support for PyTorch/TorchScript serialized models

Right now, pytorch_poc.py injects malicious code contained within the pickle files of the PyTorch standard model format. This and the TorchScript serialization format are ZIP archives with pickle files. It would be great to expand upon those and provide users with easy-to-use functions that can directly manipulate these files since they're relatively common.

NotImplementedError: TODO: Add support for Opcode BININT

File "/Users/abenavides/workspace/enricher/fury_fda-models-hub-enricher/venvpython3/lib/python3.8/site-packages/fickling/pickle.py", line 106, in new
raise NotImplementedError(f"TODO: Add support for Opcode {info.name}")
NotImplementedError: TODO: Add support for Opcode BININT

Update `pytorch_poc.py`

This PoC should instead use the PyTorch module inside of fickling.

Injections not cleaning up after itself.

The malicious code injected doesn't clean up the stack after itself which is what prevents it from being injected into arbitrary locations. This also would be the easiest way to detect pickles you've injected into. A "correct" pickle will only leave one value on the stack when everything is done, the pointer to the final object. I've never seen a real pickle not comply with this, so using pickletools.dis or your symbolic interpreter you can detect pickles you've injected into because it leaves two values on the stack whether you inject at the beginning or end.

You can make the injections more covert by adding a pop instruction to the end so that it cleans up after itself. You would then also be able to inject into an arbitrary location like I do in https://github.com/coldwaterq/pickle_injector/blob/main/inject.py.

For replacing the output you would add the pop instruction to the beginning of your payload / end of the real pickle, to throw away everything created and replace it with what you create.

Errors when scanning Stable Diffusion/Textual Inversion embeddings pickle file

I'm trying to give the stable diffusion community the ability to trade Textual Inversion embeddings (basically, fine-tuning the model) between each other. When I run fickle against one my embeddings, I see this:

(base) berble@berbletron:~/Downloads/archive$ fickling -t data.pkl

PROTO
EMPTY_DICT
Pushed {}
BINPUT
Memoized 0 -> {}
MARK
Pushed MARK
BINUNICODE
Pushed 'string_to_token'
BINPUT
Memoized 1 -> 'string_to_token'
EMPTY_DICT
Pushed {}
BINPUT
Memoized 2 -> {}
BINUNICODE
Pushed ''
BINPUT
Memoized 3 -> ''
GLOBAL
Traceback (most recent call last):
File "/home/berble/.local/bin/fickling", line 8, in
sys.exit(main())
File "/home/berble/.local/lib/python3.8/site-packages/fickling/cli.py", line 82, in main
print(unparse(trace.run()))
File "/home/berble/.local/lib/python3.8/site-packages/fickling/tracing.py", line 54, in run
self.on_statement(added)
File "/home/berble/.local/lib/python3.8/site-packages/fickling/tracing.py", line 38, in on_statement
print(f"\t{unparse(statement).strip()}")
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/init.py", line 13, in unparse
Unparser(tree, file=v)
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 38, in init
self.dispatch(tree)
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 66, in dispatch
meth(tree)
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 113, in _ImportFrom
interleave(lambda: self.write(", "), self.dispatch, t.names)
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 19, in interleave
f(next(seq))
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 66, in dispatch
meth(tree)
File "/home/berble/.local/lib/python3.8/site-packages/astunparse/unparser.py", line 856, in _alias
if t.asname:
AttributeError: 'alias' object has no attribute 'asname'

Any idea where I could start looking? We'd really like to be able to share embeddings safely!

Here's a base64-encoded copy of my data.pkl:

gAJ9cQAoWA8AAABzdHJpbmdfdG9fdG9rZW5xAX1xAlgBAAAAKnEDY3RvcmNoLl91dGlscwpfcmVi
dWlsZF90ZW5zb3JfdjIKcQQoKFgHAAAAc3RvcmFnZXEFY3RvcmNoCkxvbmdTdG9yYWdlCnEGWAEA
AAAwcQdYAwAAAGNwdXEIS010cQlRSwEpKYljY29sbGVjdGlvbnMKT3JkZXJlZERpY3QKcQopUnEL
dHEMUnENc1gPAAAAc3RyaW5nX3RvX3BhcmFtcQ5jdG9yY2gubm4ubW9kdWxlcy5jb250YWluZXIK
UGFyYW1ldGVyRGljdApxDymBcRB9cREoWAgAAAB0cmFpbmluZ3ESiFgLAAAAX3BhcmFtZXRlcnNx
E2gKKVJxFGgDY3RvcmNoLl91dGlscwpfcmVidWlsZF9wYXJhbWV0ZXIKcRVoBCgoaAVjdG9yY2gK
RmxvYXRTdG9yYWdlCnEWWAEAAAAxcRdYBgAAAGN1ZGE6MHEYTQADdHEZUUsASwFNAAOGcRpNAANL
AYZxG4loCilScRx0cR1ScR6IaAopUnEfh3EgUnEhc1gIAAAAX2J1ZmZlcnNxImgKKVJxI1gbAAAA
X25vbl9wZXJzaXN0ZW50X2J1ZmZlcnNfc2V0cSRjX19idWlsdGluX18Kc2V0CnElXXEmhXEnUnEo
WA8AAABfYmFja3dhcmRfaG9va3NxKWgKKVJxKlgWAAAAX2lzX2Z1bGxfYmFja3dhcmRfaG9va3Er
TlgOAAAAX2ZvcndhcmRfaG9va3NxLGgKKVJxLVgSAAAAX2ZvcndhcmRfcHJlX2hvb2tzcS5oCilS
cS9YEQAAAF9zdGF0ZV9kaWN0X2hvb2tzcTBoCilScTFYGgAAAF9sb2FkX3N0YXRlX2RpY3RfcHJl
X2hvb2tzcTJoCilScTNYGwAAAF9sb2FkX3N0YXRlX2RpY3RfcG9zdF9ob29rc3E0aAopUnE1WAgA
AABfbW9kdWxlc3E2aAopUnE3WAUAAABfa2V5c3E4fXE5aANOc3VidS4=

can't create a safe python class

numpy_poc example has the following class as an example of an unsafe class:

...
class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        # Runs the other PoC found in /examples
        return (os.system, ("python pytorch_poc.py",))
...

removing the unsafe __reduce__ method from the class is not enough to make it safe:

...
class Test(object):
    def __init__(self):
        self.a = 1
...

$ python example/numpy_poc.py
...
Is this is_likely_safe?
❌

Is this behavior expected?

Polyglot module improvements

To better account for different parser implementations and to make identification more robust, we could use PolyFile and call it in Fickling. In addition, we should ensure the module directly corresponds to the Netron consensus as the present file format descriptions can be more granular. Specifically, the version numbers in the file name are partially dynamically generated in Netron; Fickling’s file format naming convention chooses the minimum possible file format version instead (for instance, a TorchScript v1.6 file in Netron may be deemed a TorchScript v1.4 by Fickling). Any issues with the PyTorch file format versioning system should be taken into consideration.

Current list:

PyTorch v0.1.1: Tar file with sys_info, pickle, storages, and tensors
PyTorch v0.1.10: Stacked pickle files
TorchScript v1.0: ZIP file with model.json
TorchScript v1.1: ZIP file with model.json and attributes.pkl (1 pickle file)
TorchScript v1.3: ZIP file with data.pkl and constants.pkl (2 pickle files)
TorchScript v1.4: ZIP file with data.pkl, constants.pkl, and version set at 2 or higher (2 pickle files)
PyTorch v1.3: ZIP file containing data.pkl (1 pickle file)
PyTorch model archive format [ZIP]: ZIP file that includes Python code files and pickle files

Is `socket` not considered an unsafe import ?

Hey,

I was wondering while looking at the Pickled.unsafe_imports method why socket was not present.

Have a nice day :)

Function hook does not work on all PyTorch inputs

The global function hook (shown in hook_functions.py does not work on all PyTorch model inputs. I added print statements in hook.run_hook and fickling.load() to demonstrate. More concretely, it does not work on PyTorch v1.3 files, but it does work on PyTorch v0.1.10 files.

PoC:

import os
import pickle
import numpy
import fickling.hook as hook
import torch
import torchvision.models as models

hook.run_hook()

model = models.mobilenet_v2()

torch.save(model, "model.pt")
torch.save(model, "legacy_model.pt", _use_new_zipfile_serialization=False)
print("\n\nMODEL\n\n")
torch.load("model.pt")
print("LEGACY MODEL")
torch.load("legacy_model.pt")

Output:

Running hook!
MODEL
LEGACY MODEL
Running fickling.load()
Running fickling.load()
Running fickling.load()
Running fickling.load()

Not python3.12 compatible due to distutils deprecation.

On python3.12, import fickling leads to ModuleNotFoundError: No module named 'distutils'

distutils has been deprecated in python3.12.

distutils.sysconfig.get_python_lib specifically is called as having no direct replacement and is used in fickle.py to determine is a module is in the standard library or not.

NotImplementedError: TODO: Add support for Opcode BINFLOAT

I was trying out something sophisticated with a simple model pre-trained on MNIST. But I git this error.

Traceback (most recent call last):
  File ".\pytorch_poc.py", line 147, in <module>
    exfil_model.pickled.insert_python_exec(
  File ".\pytorch_poc.py", line 58, in pickled
    self._pickled = Pickled.load(pickle_file)
  File "C:\Python38\lib\site-packages\fickling\pickle.py", line 343, in load
    opcodes.append(Opcode(info=info, argument=arg, data=data, position=pos))
  File "C:\Python38\lib\site-packages\fickling\pickle.py", line 105, in __new__
    raise NotImplementedError(f"TODO: Add support for Opcode {info.name}")
NotImplementedError: TODO: Add support for Opcode BINFLOAT

I guess, the project still needs work to allow to make a full-fledged ML-based attack.
Any plans for when this will be completed?

Further align the CLI and Python API

We should incorporate more Python API features inside of the CLI such as the PyTorch and polyglot modules. This would help with #97.

Possible to apply heuristics scan to pickle files?

I'm not so familiar with pickling and these scans. However, I wondered if maybe there are heuristics or signatures for certain types of pickle files that could be evaluated.

If you knew for example that a pickle file should be for a stable diffusion model, some properties could be examined that might help to verify a bit more.

If so, could set up something like a /signatures directoy and let people pull request in definitions, then could scan -security -sig='signatures/typename'

This can be closed, just wanted to pass the idea by in case it could be useful

Support more pickle-based file formats

NumPy
Sklearn/Joblib
Riva
Nemo
PyTorch Package
Executorch (only for identification)
PyTorch Mobile (only for identification)

Error using check-safety/trace features (AttributeError: 'alias' object has no attribute 'asname')

Hello! Great tool, I like that it also includes a way to check for potentially malicious opcodes in pickle files.

I injected a payload into a stylegan2-ada pickle file and it behaves as expected. :)

Now, when running both --check-safety or --trace commands the following error is shown:

!fickling --check-safety /tmp/network-snapshot-000250.backdoor.pkl

Traceback (most recent call last):
  File "/usr/local/bin/fickling", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/fickling/cli.py", line 79, in main
    return [1, 0][check_safety(pickled)]
  File "/usr/local/lib/python3.7/dist-packages/fickling/analysis.py", line 38, in check_safety
    shortened, already_reported = shorten_code(node)
  File "/usr/local/lib/python3.7/dist-packages/fickling/analysis.py", line 23, in shorten_code
    code = unparse(ast_node).strip()
  File "/usr/local/lib/python3.7/dist-packages/astunparse/__init__.py", line 13, in unparse
    Unparser(tree, file=v)
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 38, in __init__
    self.dispatch(tree)
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 66, in dispatch
    meth(tree)
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 113, in _ImportFrom
    interleave(lambda: self.write(", "), self.dispatch, t.names)
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 19, in interleave
    f(next(seq))
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 66, in dispatch
    meth(tree)
  File "/usr/local/lib/python3.7/dist-packages/astunparse/unparser.py", line 856, in _alias
    if t.asname:
AttributeError: 'alias' object has no attribute 'asname'

Let me know if there is anything more needed to debug the issue.

Greetings!

torch dependency

Hello and thank you very much for all your hard work! We use fickling as a dependency of polyfile. Version 0.1.0 added a significant build time for us due to the inclusion of torch as a requirement.

We can continue working with 0.0.8 for now, so we have no complaints. But we were wondering if the torch requirement could be made optional in the future?

Thank you again!

Support hooking `pickle.loads()` and add `fickling.loads()`

Similar to our hooks on pickle.load(), we should support pickle.loads(). It will be fairly easy to do so.

Injection does not properly handle `PROTO` and `FRAME`

Add support for injecting in the presence of these opcodes.

Support more pickle-based file formats and can san it

Hi, there are a lot of malicious POC under the url address https://github.com/mmaitre314/picklescan/tree/main/tests/data, and then use https://github.com/mmaitre314/picklescan the tool scans these pickle files normally and outputs the results. However, when using the fickling tool to scan these pickle files, multiple errors are reported, such as malicious10.pkl, malicious1.zip and so on.

runpy._run_code and torch.jit.unsupported_tensor_ops.execWrapper

Hello,

I've been playing around with some alternative ways to execute Python via pickles, and discovered both runpy._run_code and torch.jit.unsupported_tensor_ops.execWrapper can be used to call into exec without fickling detecting it. I have some demo code here that will create pickles using these techniques: https://bitbucket.org/hiddenlayersec/sai/src/master/pytorch_inject/torch_picke_inject.py

runpy._run_code produces no warnings, and execWrapper generates a "Call to execWrapper(...) can execute arbitrary code and is inherently unsafe" warning.

It might be worth adding explicit checks for both of these methods and detecting as overtly bad.

Many thanks btw for the awesome library!

Best regards,

Tom

Nothing

Nice 😅

check-safety returns no output

After installing fickling with python3 -m pip install fickling, and creating a simple test file with $ fickling simple_list.pickle, running $ fickling --check-safety simple_list.pickle returns no results. After injecting "hello world" with --inject, there is also no warning from check-safety, as demonstrated in the article 'Never a dill moment: Exploiting machine learning pickle files' from March 2021.

Adding an `allow-list` to the list of packages from which imports are considered safe

Right now the scanner checks if there is any imports from a package which is not a part of the standard library. However, when it comes to dumps of machine learning models, they are going to have imports from the used libraries (e.g. scikit-learn, numpy, scipy, lightgbm, etc).

It would be nice to have an option to include a list of libraries which are considered safe by the user, and not raise a warning on their imports.

We are planning to use this library at @huggingface, but such a modification is necessary to make it work for us. It would be really nice if we could include it in the upstream.

cc @McPatate (who would be happy to work on a PR).

WDYT @ESultanik ?

multiple pickles can be stacked in one file

since pickles aren't a file type you can dump multiple into a single file, which some ML frameworks do. As such injections will create a much smaller and incomplete version of the original when using fickling from the cli.

It would be nice to have a flag to determine which pickle to inject into, and to dump the remaining bytes in the file to the output when injection.

When detecting, all pickles in the file should be scanned so that injecting into the seccond pickle doesn't completely bypass all detections.

NotImplementedError: TODO: Add support for Opcode LONG1

When attempting to use fickling on PyTorch models I get this error. I believe these models were just the weights. So i'm currious if this is hard to fix, and if you don't have time to fix it, any guidance you can give about the code base to help me attempt to patch it.

Port this project's metadata over to `pyproject.toml`

In my ongoing effort to kill setup.py in all of our projects.