aff4 / pyaff4

The Python implementation of the AFF4 standard.

License: Apache License 2.0

Shell 0.01% Python 99.99%
forensics compression-library compression-formats forensic-analysis digital-forensics

pyaff4's People

Contributors

ajnelson-nist avatar blschatz avatar enqueuing avatar gonmator avatar grrrrrrrrr avatar hillu avatar joachimmetz avatar kant avatar mikolajww avatar scudette avatar starwarsfan2099 avatar timbolle-unil avatar tweksteen avatar wenzel avatar ydkhatri avatar

pyaff4's Issues

Directories in ingested zips are errantly treated as files

I wrote a test to confirm that files in an aff4 archive created with pyaff4 match what I expect them to be, by using aff4.py --extract-all. Unfortunately, dumping files fails because a directory from my input is treated like a file. The issue appears to affect all directories.

This processing path follows creating an aff4 archive from scratch using a zip. (Particularly, this is a zipped LoC Bag, though I don't think that has an impact apart from an internal path name not entirely relevant to this bug.) Reproduction instructions are included.

Suspected diagnosis

Every member of a zip, whether a file or directory, appears to be assigned the type aff4:FileImage per the --meta dump from the .aff4 file. I'm guessing in-zip directories should instead be aff4:FolderImage, as this query is being used to feed a loop:

for imageUrn in resolver.QueryPredicateObject(volume.urn, lexicon.AFF4_TYPE, lexicon.standard11.FileImage)

And in that loop, every FileImage is created and treated as a regular file. A directory thrown into the mix raises an IsADirectoryError.

Suspected correction

In the function BasicZipFile.parse_cd, somewhere before the info message on line 694, a check needs to be made for the file being a directory. Checking whether the last character of the name is "/", which is what ZipInfo.is_dir() has done since Python 3.6, should do.

However, I don't know the code well enough to suggest where that information should be integrated (aside from a check soon after fn is defined in that function) and how it should be propagated so that the member becomes an aff4:FolderImage. Perhaps the ZipInfo class in that file?
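
A minimal sketch of the directory check suggested above, using only the standard library's zipfile module rather than pyaff4's BasicZipFile (whose internals are not shown in this issue). The function names classify_members and build_example_zip are illustrative assumptions; only the type names aff4:FileImage and aff4:FolderImage come from the issue text.

```python
import io
import zipfile


def classify_members(zip_bytes):
    """Map each zip member name to the AFF4 type it would plausibly receive."""
    types = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # ZipInfo.is_dir() (Python 3.6+) is true when the name ends in "/".
            if info.is_dir():
                types[info.filename] = "aff4:FolderImage"
            else:
                types[info.filename] = "aff4:FileImage"
    return types


def build_example_zip():
    """Build an in-memory zip resembling deep.zip from the reproduction steps."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("file3.txt", "file 3\n")
        archive.writestr("input_dir_1/", "")  # explicit directory entry
        archive.writestr("input_dir_1/file4.txt", "file 4\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, aff4_type in classify_members(build_example_zip()).items():
        print(name, aff4_type)
```

Where such a check would sit inside parse_cd, and how the resulting type would reach the resolver, remains the open question raised above.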

Steps to reproduce

The code segments below work when run as individual shell scripts, confirmed on an Ubuntu 18.04 system.

  1. Create a zip with some directory in it.
#!/bin/bash

# step1.sh

rm -rf deep flat
mkdir -p flat
mkdir -p deep/input_dir_1

echo 'file 1' > flat/file1.txt
echo 'file 2' > flat/file2.txt
pushd flat
  zip -r ../flat.zip .
popd
rm -r flat

echo 'file 3' > deep/file3.txt
echo 'file 4' > deep/input_dir_1/file4.txt
pushd deep
  zip -r ../deep.zip .
popd
rm -r deep
  2. Ingest the zips into their respective aff4 archives.
#!/bin/bash

# step2.sh

# (First loading venv, fixing path to aff4.py ...)

python .../aff4.py \
  --hash \
  --ingest \
  --paranoid \
  --recursive \
  flat.aff4 \
  flat.zip

python .../aff4.py \
  --hash \
  --ingest \
  --paranoid \
  --recursive \
  deep.aff4 \
  deep.zip
  3. Extract everything from the flat aff4 archive. Currently works.

Pull Request 14 fixes an unrelated issue with the way extractAll is called, and updates Pull Request 13 as a matter of convenience, since I also found some of @gonmator's fixes while fixing this call.

#!/bin/bash

# step3.sh

# (First loading venv, fixing path to aff4.py ...)

rm -rf extraction_flat
mkdir extraction_flat

# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
  --extract-all \
  --folder extraction_flat \
  flat.aff4 \
  extraction_flat
  4. Extract everything from the deep aff4 archive. Currently fails.

PR 14 should be integrated in order to see step4.sh below fail in the illustrative way.

#!/bin/bash

# step4.sh

# (First loading venv, fixing path to aff4.py ...)

rm -rf extraction_deep
mkdir extraction_deep

# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
  --extract-all \
  --folder extraction_deep \
  deep.aff4 \
  extraction_deep

Traceback of step4.sh:

Traceback (most recent call last):
  File "../deps/pyaff4/aff4.py", line 421, in <module>
    main(sys.argv)
  File "../deps/pyaff4/aff4.py", line 414, in main
    extractAll(dest, args.folder)
  File "../deps/pyaff4/aff4.py", line 312, in extractAll
    with open(destFile, "wb") as destStream:
IsADirectoryError: [Errno 21] Is a directory: 'extraction_deep/deep.zip/input_dir_1'

Resolution confirmation

When step4.sh above creates this file hierarchy, this Issue's good to close.

$ find extraction_deep
extraction_deep
extraction_deep/deep.zip
extraction_deep/deep.zip/file3.txt
extraction_deep/deep.zip/input_dir_1
extraction_deep/deep.zip/input_dir_1/file4.txt
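
The hierarchy check above could be automated along these lines. This is only a sketch: list_tree and EXPECTED are hypothetical names, and the comparison assumes the extraction directory is named extraction_deep as in step4.sh.

```python
import os


def list_tree(root):
    """Return sorted paths under root, relative to root's parent, like `find`."""
    root = os.path.abspath(root)
    parent = os.path.dirname(root)
    found = [os.path.relpath(root, parent)]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, parent))
    # Normalise separators so the comparison also holds on Windows.
    return sorted(path.replace(os.sep, "/") for path in found)


# The hierarchy listed in the issue as the condition for closing it.
EXPECTED = sorted([
    "extraction_deep",
    "extraction_deep/deep.zip",
    "extraction_deep/deep.zip/file3.txt",
    "extraction_deep/deep.zip/input_dir_1",
    "extraction_deep/deep.zip/input_dir_1/file4.txt",
])


if __name__ == "__main__":
    if os.path.isdir("extraction_deep"):
        print(list_tree("extraction_deep") == EXPECTED)
```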

Why does pyaff4 need aff4-snappy instead of python-snappy?

pyaff4 depends on aff4-snappy, which is forked from python-snappy.

@blschatz @scudette I'm looking into building pyaff4 for testing purposes, some questions:

  • What makes aff4-snappy different from python-snappy?
  • Could the changes in aff4-snappy be merged into python-snappy?
  • Could pyaff4 be changed to use python-snappy instead of aff4-snappy?

Missing FS metadata when using --append

Dear pyaff4-maintainers,

when I use aff4.py from the master branch (commit 94a3583) and specify --append, no filesystem metadata is recorded in the resulting container for the appended file.
The following steps reproduce the issue:

echo "a" > a
echo "b" > b

python3 pyaff4/aff4.py --create-logical --paranoid --hash test.aff4 ./a
python3 pyaff4/aff4.py --create-logical --append --paranoid --hash test.aff4 ./b

When running --meta afterwards, only the FS metadata for a is displayed:

@prefix aff4: <http://aff4.org/Schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<aff4://961a3f9e-0dbf-4084-80c9-bf2e5ef74f5a> a aff4:ImageStream ;
    aff4:chunkSize 32768 ;
    aff4:chunksInSegment 1024 ;
    aff4:compressionMethod <http://code.google.com/p/snappy/> ;
    aff4:size 2 .

<aff4://b1062223-5dc0-4759-939f-08d08469493e//a> a aff4:FileImage,
        aff4:Image,
        aff4:Map ;
    aff4:birthTime "2021-12-26T14:59:50.257219+01:00"^^xsd:string ;
    aff4:hash "60b725f10c9c85c70d97880dfe8191b3"^^aff4:MD5,
        "3f786850e387550fdab836ed7e6dc881de23001b"^^aff4:SHA1 ;
    aff4:lastAccessed "2021-12-26T15:06:37.995319+01:00"^^xsd:string ;
    aff4:lastWritten "2021-12-26T15:06:46.735360+01:00"^^xsd:string ;
    aff4:originalFileName "./a"^^xsd:string ;
    aff4:recordChanged "2021-12-26T15:06:46.735360+01:00"^^xsd:string ;
    aff4:size 2 .

<aff4://b1062223-5dc0-4759-939f-08d08469493e//b> a aff4:FileImage,
        aff4:Image,
        aff4:Map ;
    aff4:originalFileName "./b"^^xsd:string .

<aff4://d9f08e06-ccc7-4ca9-99af-e1f91be71857> a aff4:ImageStream ;
    aff4:chunkSize 32768 ;
    aff4:chunksInSegment 1024 ;
    aff4:compressionMethod <http://code.google.com/p/snappy/> ;
    aff4:size 2 .

<aff4:sha512:FisLMvAkgtWsoKfJPdA86sOs1-QQpfGPP7mQ_JWK4N9vMiM7kYMer5nKWBqMTd-ci6MVrEgtttTqAcx4hKY1vg==> aff4:dataStream <aff4://961a3f9e-0dbf-4084-80c9-bf2e5ef74f5a[0x0:0x2]> .

<aff4:sha512:hopqxuHQKT10-tB_bZWVKz4B09MVPbZ3p12Ad5g_1OMNtr_Im3YIqT-yZGkjOp8aCVctaHqcXaeLID6xUQQKFQ==> aff4:dataStream <aff4://d9f08e06-ccc7-4ca9-99af-e1f91be71857[0x0:0x2]> .


        ./a <aff4://b1062223-5dc0-4759-939f-08d08469493e//a>
                [0,2] -> aff4:sha512:FisLMvAkgtWsoKfJPdA86sOs1-QQpfGPP7mQ_JWK4N9vMiM7kYMer5nKWBqMTd-ci6MVrEgtttTqAcx4hKY1vg==[0,2]
        ./b <aff4://b1062223-5dc0-4759-939f-08d08469493e//b>
                [0,2] -> aff4:sha512:hopqxuHQKT10-tB_bZWVKz4B09MVPbZ3p12Ad5g_1OMNtr_Im3YIqT-yZGkjOp8aCVctaHqcXaeLID6xUQQKFQ==[0,2]

Can you confirm that this is a bug?
If it is intended behaviour, how can I save FS metadata while using --append?

Thanks already in advance for a short reply and thank you for your work on pyaff4.

Best regards,
jgru

Handle symlinks and character devices

I was testing pyaff4 with an M1 Mac attached to another Mac using Mac Sharing mode, but it stalled after a short while. The last few lines of console output are:

	Adding: /Volumes/Untitled/Volumes/Untitled/Applications/iTerm.app
	Adding: /Volumes/Untitled/Volumes/Untitled/Applications/Developer.app
	Adding: /Volumes/Untitled/Volumes/Untitled/Applications/Firefox.app
	Adding: /Volumes/Untitled/Volumes/Untitled/dev/console
  • /Volumes/Untitled is where the M1 Mac is mounted (using sharing mode) on the second Mac.
  • /Volumes/Untitled/Volumes/Untitled is a symlink to /.
  • /Volumes/Untitled/Volumes/Untitled/dev/console is therefore /dev/console on the second Mac.

pyaff4 should not follow symlinks and it should not try to read /dev/console.
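
A minimal sketch of the requested behaviour, written against the standard library only and not taken from pyaff4. The helper name iter_regular_files is hypothetical; it illustrates one way to walk a tree without following symlinks and without opening device nodes such as /dev/console.

```python
import os
import stat


def iter_regular_files(root):
    """Yield paths of regular files under root without following symlinks."""
    # followlinks=False (the default) keeps os.walk from descending into
    # symlinked directories, such as a link that points back to "/".
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat inspects the link itself
            if stat.S_ISREG(mode):
                yield path
            # Symlinks, character/block devices, FIFOs and sockets are skipped.


if __name__ == "__main__":
    for path in iter_regular_files("."):
        print("Adding:", path)
```

Whether skipped entries should still be recorded as metadata (for example, a symlink's target) is a separate design decision that this sketch does not address.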

Undefined Variable 'file'

Undefined variable error in extract function:

Traceback (most recent call last):
  File "C:\test\pyaff4-master\aff4.py", line 421, in <module>
    main(sys.argv)
  File "C:\test\pyaff4-master\aff4.py", line 411, in main
    extract(dest, args.srcFiles, args.folder)
  File "C:\test\pyaff4-master\aff4.py", line 331, in extract
    printVolumeInfo(file, volume)
NameError: name 'file' is not defined

Getting a ChunkLen Error "not defined"

I am getting the following error while trying to create an aff4 file.
I am running the following command:

python3 aff4.py -r --create-logical test.aff4 /home/development/E01_Files/

    Adding: /home/development/E01_Files/Program Files/Messengernewalert.wav
    Adding: /home/development/E01_Files/Program Files/Mozilla Firefoxplc4.dll
    Adding: /home/development/E01_Files/Program Files/AIM6coolcore52.dll
    Adding: /home/development/E01_Files/Program Files/Windows Media Player
    Adding: /home/development/E01_Files/Program Files/NetMeetingmst123.dll
    Adding: /home/development/E01_Files/Program Files/Movie Makermoviemk.exe

Traceback (most recent call last):
  File "aff4.py", line 497, in <module>
    main(sys.argv)
  File "aff4.py", line 475, in main
    addPathNames(dest, args.srcFiles, args.recursive, args.append, args.hash, args.password)
  File "aff4.py", line 319, in addPathNames
    addPathNamesToVolume(resolver, volume, pathnames, recursive, hashbased)
  File "aff4.py", line 292, in addPathNamesToVolume
    urn = volume.writeLogicalStream(pathname, hasher, fsmeta.length)
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/container.py", line 404, in writeLogicalStream
    self.writeCompressedBlockStream(image_urn, filename, readstream)
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/container.py", line 354, in writeCompressedBlockStream
    stream.WriteStream(readstream)
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/aff4_image.py", line 214, in WriteStream
    bevy.WriteStream(stream, progress=progress)
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/zip.py", line 516, in WriteStream
    owner.StreamAddMember(
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/zip.py", line 952, in StreamAddMember
    data = stream.read(BUFF_SIZE)
  File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/aff4_image.py", line 91, in read
    if chunkLen < self.owner.chunk_size:
NameError: name 'chunkLen' is not defined

samples/extract_streams.py not compatible with lib

$ python samples/extract_streams.py  ../t.aff4 
Traceback (most recent call last):
  File "/home/r/pyaff4/samples/extract_streams.py", line 18, in <module>
    with zip.ZipFile.NewZipFile(resolver, volume_path_urn) as volume:
TypeError: NewZipFile() missing 1 required positional argument: 'backing_store_urn'

Indeed, NewZipFile requires three arguments.

Provide an executable pyaff4

Can a portable .exe of pyaff4 be provided for Windows, so it can be used without Python installed? It seems that pyaff4 is more advanced and more up to date than c-aff4.

Thank you.

update tzlocal version

pyaff4 uses an old version of tzlocal (1.5.1), which throws warnings on Python 3.8:

..../env/lib/python3.8/site-packages/tzlocal-1.5.1-py3.8.egg/tzlocal/unix.py:108: 
SyntaxWarning: "is not" with a literal. Did you mean "!="?
..../env/lib/python3.8/site-packages/tzlocal-1.5.1-py3.8.egg/tzlocal/unix.py:108: 
SyntaxWarning: "is not" with a literal. Did you mean "!="?

pyaff4 should upgrade tzlocal to the latest version (2.1), which fixes this.
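
For context, the warning comes from the Python compiler itself (3.8 and later), not from tzlocal's runtime logic. The sketch below reproduces it in isolation; the helper name literal_identity_warnings and the two source snippets are illustrative and are not copied from tzlocal.

```python
import warnings


def literal_identity_warnings(source):
    """Compile source and return the SyntaxWarning messages it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]


# Identity comparison against a literal: flagged on Python 3.8+.
FLAGGED = "value = 'x'\nresult = value is not ''\n"
# Equality comparison: the form the warning recommends.
CLEAN = "value = 'x'\nresult = value != ''\n"


if __name__ == "__main__":
    print(literal_identity_warnings(FLAGGED))
    print(literal_identity_warnings(CLEAN))
```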

Possible bug using the aff4.py basic script in a Windows environment

Hello,

I tried to create an aff4 container using the following command:
python aff4.py -c container.aff4 source_file
I had the following error:

Creating AFF4Container: file://container.aff4 <aff4://64ec2613-a8cf-4c52-8866-f8d1570f8634>
        Adding: source_file
Traceback (most recent call last):
  File "pyaff4\aff4.py", line 496, in <module>
    main(sys.argv)
  File "pyaff4\aff4.py", line 474, in main
    addPathNames(dest, args.srcFiles, args.recursive, args.append, args.hash, args.password)
  File "pyaff4\aff4.py", line 319, in addPathNames
    addPathNamesToVolume(resolver, volume, pathnames, recursive, hashbased)
  File "pyaff4\aff4.py", line 296, in addPathNamesToVolume
    fsmeta.store(resolver)
  File "C:\Users\Tim\Documents\Projets\Test\AFF4\pyaff4\pyaff4\logical.py", line 130, in store
    resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
AttributeError: 'MemoryDataStore' object has no attribute 'urn'

I'm running Python 2.7 on a Windows machine. I had the same issue with Python 3.7.

I fixed the problem by changing the following lines in pyaff4/logical.py (lines 130 to 133):

class WindowsFSMetadata(FSMetadata):
...
    def store(self, resolver):
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastWritten), rdfvalue.XSDDateTime(self.lastWritten))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastAccessed), rdfvalue.XSDDateTime(self.lastAccessed))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.birthTime), rdfvalue.XSDDateTime(self.birthTime))

Changed to:

class WindowsFSMetadata(FSMetadata):
...
    def store(self, resolver):
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastWritten), rdfvalue.XSDDateTime(self.lastWritten))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastAccessed), rdfvalue.XSDDateTime(self.lastAccessed))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.birthTime), rdfvalue.XSDDateTime(self.birthTime))

I'm not very familiar with bug correction so let me know if you want me to send a pull request!

TypeError enumerating images of a PreStdLogicalImageContainer

A TypeError is raised when listing the contents of a volume that is an instance of PreStdLogicalImageContainer, for instance when calling the printLogicalImageInfo() function in aff4.py. The error I got:

TypeError: __init__() missing 1 required positional argument: 'pathName'

I found that the issue is in the container.LogicalImageContainer and container.PreStdLogicalImageContainer classes. The images() method of the first class correctly initializes the LogicalImage instance it yields:

yield aff4.LogicalImage(self, self.resolver, self.urn, image, pathName)

whereas the images() method of PreStdLogicalImageContainer and the open() method of both classes fail to pass the container (self) as the first parameter:

yield aff4.LogicalImage(self.resolver, self.urn, image, pathName)
...
return aff4.LogicalImage(self.resolver, self.urn, urn, pathName)

Initializer of aff4.LogicalImage expects the container as first parameter:

class LogicalImage(AFF4Object):
    def __init__(self, container, resolver, volume, urn, pathName):
        super(LogicalImage, self).__init__(resolver, urn)
        self.volume = volume
        self.pathName = pathName
        self.container = container

Just passing self as first parameter should solve the issue.
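
The mismatch can be reproduced in isolation with stand-in classes. In this sketch only the LogicalImage initializer signature follows the issue; AFF4Object is reduced to a hypothetical stub, and Container, open_broken and open_fixed are invented names for illustration.

```python
class AFF4Object:
    """Hypothetical stand-in for pyaff4's AFF4Object base class."""

    def __init__(self, resolver, urn):
        self.resolver = resolver
        self.urn = urn


class LogicalImage(AFF4Object):
    """Mirrors the initializer signature quoted in the issue."""

    def __init__(self, container, resolver, volume, urn, pathName):
        super().__init__(resolver, urn)
        self.volume = volume
        self.pathName = pathName
        self.container = container


class Container:
    """Hypothetical container, used only to show the two call shapes."""

    def __init__(self):
        self.resolver = object()
        self.urn = "aff4://volume"

    def open_broken(self, urn, pathName):
        # Omits the container, so every argument shifts left by one position
        # and pathName ends up missing.
        return LogicalImage(self.resolver, self.urn, urn, pathName)

    def open_fixed(self, urn, pathName):
        # Passes self first, matching the initializer.
        return LogicalImage(self, self.resolver, self.urn, urn, pathName)


if __name__ == "__main__":
    container = Container()
    try:
        container.open_broken("aff4://image", "./a")
    except TypeError as error:
        print(error)
    print(container.open_fixed("aff4://image", "./a").pathName)
```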

(I could prepare a pull request if needed, but I would like to have some guide about branch naming, etc.)

Extreme Slowness after version 0.31

Something has changed that has made reading a lot slower. I'm using pyaff4 to read full disk images. With version 0.31, it takes about 3 minutes to read an APFS filesystem by seeking to random offsets and reading 4 KiB blocks at a time.
The same code with version 0.32 or above takes ~70 minutes.
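
A small timing harness along these lines might help narrow down the regression. It is a sketch only: time_random_reads is a hypothetical helper that works on any seekable file-like object, and the io.BytesIO stream in the usage part is a stand-in for whatever stream object pyaff4 returns for the image.

```python
import io
import random
import time


def time_random_reads(stream, stream_size, reads=1000, block_size=4096, seed=0):
    """Return seconds spent on `reads` block-sized reads at random offsets."""
    rng = random.Random(seed)  # fixed seed so each run uses the same offsets
    last_block = max(stream_size // block_size - 1, 0)
    start = time.perf_counter()
    for _ in range(reads):
        stream.seek(rng.randint(0, last_block) * block_size)
        stream.read(block_size)
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 16 * 1024 * 1024
    stand_in = io.BytesIO(bytes(size))  # replace with the real image stream
    print("seconds:", time_random_reads(stand_in, size))
```

Running the same harness under 0.31 and 0.32 with an identical seed would give comparable numbers for the two versions.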
