aff4 / pyaff4
The Python implementation of the AFF4 standard.
License: Apache License 2.0
I wrote a test to confirm that files in an aff4 archive created with pyaff4 match what I expect them to be, by using aff4.py --extract-all. Unfortunately, dumping files fails, because a directory from my input is treated like a file. It appears to be an issue that affects all directories.
This processing path follows creating an aff4 archive from scratch using a zip. (Specifically, this is a zipped LoC Bag, though I don't think that has an impact apart from an internal path name that is not relevant to this bug.) Reproduction instructions are included.
Every member of a zip, whether a file or directory, appears to be assigned the type aff4:FileImage, per the --meta dump from the .aff4 file. I'm guessing in-zip directories should instead be aff4:FolderImage, as this query is being used to feed a loop:
for imageUrn in resolver.QueryPredicateObject(volume.urn, lexicon.AFF4_TYPE, lexicon.standard11.FileImage)
In that loop, every FileImage is created and treated as a regular file. A directory thrown into the mix raises an IsADirectoryError.
In the function BasicZipFile.parse_cd, somewhere before the info message on line 694, a check needs to be made for the file being a directory. The method used since Python 3.6, checking whether the last character of the name is "/", should do.
However, I don't know the code well enough to suggest where that information should be integrated (aside from a check soon after fn is defined in that function) and propagated so that it results in an aff4:FolderImage. The ZipInfo class in that file?
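The directory check described above could be sketched as follows. This is a minimal illustration using only the standard-library zipfile module, not pyaff4's own zip parser; the names classify_members and build_sample_zip and the returned type strings are assumptions made for illustration, and where such a check belongs in BasicZipFile.parse_cd is left open.

```python
import io
import zipfile


def classify_members(zip_bytes):
    """Map each zip member name to a hypothetical AFF4 type string."""
    types = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            # ZipInfo.is_dir() (Python 3.6+) is true when the name ends in "/".
            if info.is_dir():
                types[info.filename] = "aff4:FolderImage"
            else:
                types[info.filename] = "aff4:FileImage"
    return types


def build_sample_zip():
    """Build an in-memory zip resembling deep.zip from step1.sh."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("input_dir_1/", b"")  # explicit directory entry
        zf.writestr("file3.txt", b"file 3\n")
        zf.writestr("input_dir_1/file4.txt", b"file 4\n")
    return buf.getvalue()


if __name__ == "__main__":
    for name, kind in classify_members(build_sample_zip()).items():
        print(kind, name)
```

With the type recorded this way, an extraction loop could create directories for folder entries instead of opening them for writing.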
The code segments below work when run as individual shell scripts, confirmed on an Ubuntu 18.04 system.
#!/bin/bash
# step1.sh
rm -rf deep flat
mkdir -p flat
mkdir -p deep/input_dir_1
echo 'file 1' > flat/file1.txt
echo 'file 2' > flat/file2.txt
pushd flat
zip -r ../flat.zip .
popd
rm -r flat
echo 'file 3' > deep/file3.txt
echo 'file 4' > deep/input_dir_1/file4.txt
pushd deep
zip -r ../deep.zip .
popd
rm -r deep
#!/bin/bash
# step2.sh
# (First loading venv, fixing path to aff4.py ...)
python .../aff4.py \
--hash \
--ingest \
--paranoid \
--recursive \
flat.aff4 \
flat.zip
python .../aff4.py \
--hash \
--ingest \
--paranoid \
--recursive \
deep.aff4 \
deep.zip
Pull Request 14 fixes an unrelated issue with the way extractAll is called, and updates Pull Request 13 as a matter of convenience; I also found some of @gonmator's fixes while fixing this call.
#!/bin/bash
# step3.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_flat
mkdir extraction_flat
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_flat \
flat.aff4 \
extraction_flat
PR 14 should be integrated in order to see step4.sh below fail in the illustrative way.
#!/bin/bash
# step4.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_deep
mkdir extraction_deep
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_deep \
deep.aff4 \
extraction_deep
Traceback of step4.sh:
Traceback (most recent call last):
File "../deps/pyaff4/aff4.py", line 421, in <module>
main(sys.argv)
File "../deps/pyaff4/aff4.py", line 414, in main
extractAll(dest, args.folder)
File "../deps/pyaff4/aff4.py", line 312, in extractAll
with open(destFile, "wb") as destStream:
IsADirectoryError: [Errno 21] Is a directory: 'extraction_deep/deep.zip/input_dir_1'
When step4.sh above creates this file hierarchy, this issue is good to close.
$ find extraction_deep
extraction_deep
extraction_deep/deep.zip
extraction_deep/deep.zip/file3.txt
extraction_deep/deep.zip/input_dir_1
extraction_deep/deep.zip/input_dir_1/file4.txt
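A check like the one this issue asks for could be automated along these lines. This is only a sketch: list_tree and extraction_matches are hypothetical helpers, and the expected paths are copied from the find listing above.

```python
import os


def list_tree(root):
    """Return the sorted paths under root, like `find root`, root included."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    # Normalize separators so the comparison also works on Windows.
    return sorted(p.replace(os.sep, "/") for p in paths)


EXPECTED = [
    "extraction_deep",
    "extraction_deep/deep.zip",
    "extraction_deep/deep.zip/file3.txt",
    "extraction_deep/deep.zip/input_dir_1",
    "extraction_deep/deep.zip/input_dir_1/file4.txt",
]


def extraction_matches(root="extraction_deep"):
    """True when the extracted hierarchy equals the expected listing."""
    return list_tree(root) == EXPECTED
```

Running extraction_matches() from the directory where step4.sh was executed would then report whether the hierarchy is complete.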
I am getting the following error while trying to create an aff4 file.
I am running the following command:
python3 aff4.py -r --create-logical test.aff4 /home/development/E01_Files/
Adding: /home/development/E01_Files/Program Files/Messengernewalert.wav
Adding: /home/development/E01_Files/Program Files/Mozilla Firefoxplc4.dll
Adding: /home/development/E01_Files/Program Files/AIM6coolcore52.dll
Adding: /home/development/E01_Files/Program Files/Windows Media Player
Adding: /home/development/E01_Files/Program Files/NetMeetingmst123.dll
Adding: /home/development/E01_Files/Program Files/Movie Makermoviemk.exe
Traceback (most recent call last):
File "aff4.py", line 497, in <module>
main(sys.argv)
File "aff4.py", line 475, in main
addPathNames(dest, args.srcFiles, args.recursive, args.append, args.hash, args.password)
File "aff4.py", line 319, in addPathNames
addPathNamesToVolume(resolver, volume, pathnames, recursive, hashbased)
File "aff4.py", line 292, in addPathNamesToVolume
urn = volume.writeLogicalStream(pathname, hasher, fsmeta.length)
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/container.py", line 404, in writeLogicalStream
self.writeCompressedBlockStream(image_urn, filename, readstream)
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/container.py", line 354, in writeCompressedBlockStream
stream.WriteStream(readstream)
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/aff4_image.py", line 214, in WriteStream
bevy.WriteStream(stream, progress=progress)
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/zip.py", line 516, in WriteStream
owner.StreamAddMember(
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/zip.py", line 952, in StreamAddMember
data = stream.read(BUFF_SIZE)
File "/home/nc3admin/nc3apps/development/E01_Files/pyaff4/pyaff4/aff4_image.py", line 91, in read
if chunkLen < self.owner.chunk_size:
NameError: name 'chunkLen' is not defined
$ python samples/extract_streams.py ../t.aff4
Traceback (most recent call last):
File "/home/r/pyaff4/samples/extract_streams.py", line 18, in <module>
with zip.ZipFile.NewZipFile(resolver, volume_path_urn) as volume:
TypeError: NewZipFile() missing 1 required positional argument: 'backing_store_urn'
Indeed, NewZipFile requires three arguments.
pyaff4 uses an old version of tzlocal (1.5.1), which emits warnings on Python 3.8:
..../env/lib/python3.8/site-packages/tzlocal-1.5.1-py3.8.egg/tzlocal/unix.py:108:
SyntaxWarning: "is not" with a literal. Did you mean "!="?
..../env/lib/python3.8/site-packages/tzlocal-1.5.1-py3.8.egg/tzlocal/unix.py:108:
SyntaxWarning: "is not" with a literal. Did you mean "!="?
pyaff4 should upgrade tzlocal to the latest version (2.1), which fixes this.
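For context, Python 3.8 and later emit this SyntaxWarning at compile time whenever code compares against a literal with "is" or "is not". The sketch below illustrates the warning in general terms; it does not reproduce the tzlocal source, and literal_identity_warnings is a hypothetical helper.

```python
import warnings


def literal_identity_warnings(source):
    """Compile an expression and return any SyntaxWarning messages raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]


if __name__ == "__main__":
    # Identity comparison against a literal: warns on Python 3.8+.
    print(literal_identity_warnings('value is not ""'))
    # Equality comparison, the form the warning suggests: no warning.
    print(literal_identity_warnings('value != ""'))
```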
pyaff4 depends on aff4-snappy, which is forked from python-snappy:
@blschatz @scudette I'm looking into building pyaff4 for testing purposes, some questions:
Something has changed that has made reading a lot slower. I'm using pyaff4 to read full disk images. With version 0.31, it takes about 3 minutes to read an APFS filesystem, randomly seeking to various offsets and reading 4 KiB blocks at a time.
The same code with version 0.32 or above takes ~70 minutes.
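A small timing harness could help compare versions on the same access pattern. The sketch below assumes only a file-like object with seek() and read(); time_random_reads is a hypothetical helper, and io.BytesIO stands in for a stream opened through pyaff4.

```python
import io
import random
import time


def time_random_reads(stream, size, count=1000, block=4096, seed=0):
    """Time `count` reads of `block` bytes at random block-aligned offsets."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    last_block = max(size // block - 1, 0)
    start = time.perf_counter()
    total = 0
    for _ in range(count):
        stream.seek(rng.randint(0, last_block) * block)
        total += len(stream.read(block))
    return time.perf_counter() - start, total


if __name__ == "__main__":
    data = io.BytesIO(bytes(1024 * 1024))  # 1 MiB stand-in for a disk image
    elapsed, nbytes = time_random_reads(data, size=1024 * 1024)
    print(f"read {nbytes} bytes in {elapsed:.4f}s")
```

Running the same seed against 0.31 and 0.32 would give directly comparable numbers for the regression.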
Undefined variable error in extract function:
Traceback (most recent call last):
File "C:\test\pyaff4-master\aff4.py", line 421, in <module>
main(sys.argv)
File "C:\test\pyaff4-master\aff4.py", line 411, in main
extract(dest, args.srcFiles, args.folder)
File "C:\test\pyaff4-master\aff4.py", line 331, in extract
printVolumeInfo(file, volume)
NameError: name 'file' is not defined
Dear pyaff4 maintainers,
when I use aff4.py from the master branch (commit 94a3583) and specify --append, no filesystem metadata is recorded inside the resulting container.
Here are the steps to reproduce the issue:
echo "a" > a
echo "b" > b
python3 pyaff4/aff4.py --create-logical --paranoid --hash test.aff4 ./a
python3 pyaff4/aff4.py --create-logical --append --paranoid --hash test.aff4 ./b
When running --meta afterwards, only FS metadata for a is displayed:
@prefix aff4: <http://aff4.org/Schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<aff4://961a3f9e-0dbf-4084-80c9-bf2e5ef74f5a> a aff4:ImageStream ;
aff4:chunkSize 32768 ;
aff4:chunksInSegment 1024 ;
aff4:compressionMethod <http://code.google.com/p/snappy/> ;
aff4:size 2 .
<aff4://b1062223-5dc0-4759-939f-08d08469493e//a> a aff4:FileImage,
aff4:Image,
aff4:Map ;
aff4:birthTime "2021-12-26T14:59:50.257219+01:00"^^xsd:string ;
aff4:hash "60b725f10c9c85c70d97880dfe8191b3"^^aff4:MD5,
"3f786850e387550fdab836ed7e6dc881de23001b"^^aff4:SHA1 ;
aff4:lastAccessed "2021-12-26T15:06:37.995319+01:00"^^xsd:string ;
aff4:lastWritten "2021-12-26T15:06:46.735360+01:00"^^xsd:string ;
aff4:originalFileName "./a"^^xsd:string ;
aff4:recordChanged "2021-12-26T15:06:46.735360+01:00"^^xsd:string ;
aff4:size 2 .
<aff4://b1062223-5dc0-4759-939f-08d08469493e//b> a aff4:FileImage,
aff4:Image,
aff4:Map ;
aff4:originalFileName "./b"^^xsd:string .
<aff4://d9f08e06-ccc7-4ca9-99af-e1f91be71857> a aff4:ImageStream ;
aff4:chunkSize 32768 ;
aff4:chunksInSegment 1024 ;
aff4:compressionMethod <http://code.google.com/p/snappy/> ;
aff4:size 2 .
<aff4:sha512:FisLMvAkgtWsoKfJPdA86sOs1-QQpfGPP7mQ_JWK4N9vMiM7kYMer5nKWBqMTd-ci6MVrEgtttTqAcx4hKY1vg==> aff4:dataStream <aff4://961a3f9e-0dbf-4084-80c9-bf2e5ef74f5a[0x0:0x2]> .
<aff4:sha512:hopqxuHQKT10-tB_bZWVKz4B09MVPbZ3p12Ad5g_1OMNtr_Im3YIqT-yZGkjOp8aCVctaHqcXaeLID6xUQQKFQ==> aff4:dataStream <aff4://d9f08e06-ccc7-4ca9-99af-e1f91be71857[0x0:0x2]> .
./a <aff4://b1062223-5dc0-4759-939f-08d08469493e//a>
[0,2] -> aff4:sha512:FisLMvAkgtWsoKfJPdA86sOs1-QQpfGPP7mQ_JWK4N9vMiM7kYMer5nKWBqMTd-ci6MVrEgtttTqAcx4hKY1vg==[0,2]
./b <aff4://b1062223-5dc0-4759-939f-08d08469493e//b>
[0,2] -> aff4:sha512:hopqxuHQKT10-tB_bZWVKz4B09MVPbZ3p12Ad5g_1OMNtr_Im3YIqT-yZGkjOp8aCVctaHqcXaeLID6xUQQKFQ==[0,2]
Can you confirm that this is a bug?
If it is intended behaviour, how can I save FS metadata while using --append?
Thanks in advance for a short reply, and thank you for your work on pyaff4.
Best regards,
jgru
Can a portable .exe of pyaff4 be provided for Windows, so it can be used without Python installed? It seems that pyaff4 is more advanced and more up to date than c-aff4.
Thank you.
I was testing pyaff4 with an M1 Mac attached to another Mac using Mac Sharing mode, but it stalled after a short while. The last few lines of console output are:
Adding: /Volumes/Untitled/Volumes/Untitled/Applications/iTerm.app
Adding: /Volumes/Untitled/Volumes/Untitled/Applications/Developer.app
Adding: /Volumes/Untitled/Volumes/Untitled/Applications/Firefox.app
Adding: /Volumes/Untitled/Volumes/Untitled/dev/console
/Volumes/Untitled is where the M1 Mac is mounted (using sharing mode) on the second Mac.
/Volumes/Untitled/Volumes/Untitled is a symlink to /.
/Volumes/Untitled/Volumes/Untitled/dev/console is therefore /dev/console on the second Mac.
pyaff4 should not follow symlinks and it should not try to read /dev/console.
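The requested behaviour could look roughly like the following directory walk. It is a sketch built on the standard library only, not pyaff4's actual traversal code, and iter_regular_files is a hypothetical name.

```python
import os
import stat


def iter_regular_files(root):
    """Yield regular files under root without following symlinks."""
    # followlinks=False (the default) keeps os.walk out of symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat does not dereference symlinks
            # Skip symlinks, character/block devices, FIFOs and sockets.
            if stat.S_ISREG(mode):
                yield path
```

Under this filter a path such as /dev/console, which is a character device, would never be opened for reading.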
A TypeError is raised when listing the content of a volume instance of a PreStdLogicalImageContainer (for instance, when calling the printLogicalImageInfo() function in aff4.py). The error I got:
TypeError: __init__() missing 1 required positional argument: 'pathName'
I found that the issue is in the container.LogicalImageContainer and container.PreStdLogicalImageContainer classes. The images() method of the first class correctly initializes the LogicalImage instance to be yielded:
yield aff4.LogicalImage(self, self.resolver, self.urn, image, pathName)
whereas the images() method of the PreStdLogicalImageContainer class and the open() method of both classes fail to pass the container (self) as the first parameter:
yield aff4.LogicalImage(self.resolver, self.urn, image, pathName)
...
return aff4.LogicalImage(self.resolver, self.urn, urn, pathName)
The initializer of aff4.LogicalImage expects the container as the first parameter:
class LogicalImage(AFF4Object):
    def __init__(self, container, resolver, volume, urn, pathName):
        super(LogicalImage, self).__init__(resolver, urn)
        self.volume = volume
        self.pathName = pathName
        self.container = container
Just passing self as the first parameter should solve the issue.
(I could prepare a pull request if needed, but I would like some guidance about branch naming, etc.)
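The failure can be reproduced in isolation with stand-in classes. In the sketch below, AFF4Object is a simplified stub rather than the real pyaff4 base class, the string arguments are placeholders, and the two helper functions are hypothetical; only the LogicalImage initializer is copied from the report.

```python
class AFF4Object(object):
    """Stand-in for the pyaff4 base class (an assumption, not the real one)."""

    def __init__(self, resolver, urn):
        self.resolver = resolver
        self.urn = urn


class LogicalImage(AFF4Object):
    """Initializer as quoted in the report."""

    def __init__(self, container, resolver, volume, urn, pathName):
        super(LogicalImage, self).__init__(resolver, urn)
        self.volume = volume
        self.pathName = pathName
        self.container = container


def call_without_container():
    """Reproduce the failing four-argument call; return the error message."""
    try:
        LogicalImage("resolver", "volume-urn", "image-urn", "path")
    except TypeError as err:
        return str(err)
    return None


def call_with_container():
    """The corrected call passes the container first."""
    return LogicalImage("container", "resolver", "volume-urn", "image-urn", "path")
```

Because every argument shifts one position to the left when the container is omitted, Python reports the last parameter, pathName, as the missing one.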
The latest PyPI push broke all dependent projects because the API has changed. pyaff4 should start to use semantic versioning from now on so that people can rely on it being stable moving forward.
Hello,
I tried to create an aff4 container using the following command:
python aff4.py -c container.aff4 source_file
I had the following error:
Creating AFF4Container: file://container.aff4 <aff4://64ec2613-a8cf-4c52-8866-f8d1570f8634>
Adding: source_file
Traceback (most recent call last):
File "pyaff4\aff4.py", line 496, in <module>
main(sys.argv)
File "pyaff4\aff4.py", line 474, in main
addPathNames(dest, args.srcFiles, args.recursive, args.append, args.hash, args.password)
File "pyaff4\aff4.py", line 319, in addPathNames
addPathNamesToVolume(resolver, volume, pathnames, recursive, hashbased)
File "pyaff4\aff4.py", line 296, in addPathNamesToVolume
fsmeta.store(resolver)
File "C:\Users\Tim\Documents\Projets\Test\AFF4\pyaff4\pyaff4\logical.py", line 130, in store
resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
AttributeError: 'MemoryDataStore' object has no attribute 'urn'
I'm running Python 2.7 on a Windows machine. I had the same issue with Python 3.7.
I fixed the problem by changing the following lines in pyaff4/logical.py (lines 130 to 133):
class WindowsFSMetadata(FSMetadata):
    ...
    def store(self, resolver):
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastWritten), rdfvalue.XSDDateTime(self.lastWritten))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastAccessed), rdfvalue.XSDDateTime(self.lastAccessed))
        resolver.Set(resolver.urn, self.urn, rdfvalue.URN(lexicon.standard11.birthTime), rdfvalue.XSDDateTime(self.birthTime))
Changed to:
class WindowsFSMetadata(FSMetadata):
    ...
    def store(self, resolver):
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.AFF4_STREAM_SIZE), rdfvalue.XSDInteger(self.length))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastWritten), rdfvalue.XSDDateTime(self.lastWritten))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.lastAccessed), rdfvalue.XSDDateTime(self.lastAccessed))
        resolver.Set(self.urn, self.urn, rdfvalue.URN(lexicon.standard11.birthTime), rdfvalue.XSDDateTime(self.birthTime))
I'm not very familiar with bug correction so let me know if you want me to send a pull request!