jessek / hashdeep Goto Github PK

View Code? Open in Web Editor NEW

696.0 696.0 132.0 6.51 MB

License: Other

Shell 7.55% C 41.79% C++ 45.66% Makefile 0.96% M4 4.04%

hashdeep's People

Contributors

Stargazers

Watchers

Forkers

chiehwen mnagel dago peterclemenko chudel ajnelson wagnerp armuk spnow henry-bennett bradyu gdemo1 esven davidstrauss jonathanschmid emuxevans robinmathewrajan samwilson kyonetca meizhoubao yekigoc rootbeer sdigit joswr1ght richardranft profcab influencia0406 dmfreemon vkorichkov cloudxtreme mfalhi jbohling dinkin fidergo-stephane-gourichon madivad tkelman rahul138 ols3er bogeyman ai8rahim trabing poeblu queensgate5 diamondhead90 dericed deveynull idkwim cnsuhao ethus3h universal-it-systems olivierh59500 kangamoo sethcohn ohio813 abdihaikal saranajafi ctr mattsee mathguru weeshlow glcraft actionjackson80 hoyt-harness s-d-adams shelltips i5r3doe2l 91thu6us vgoncharenko tinkertux forkme7 andrewnimmo tomotan mohamed00736 nixworks datafiend modulexcite kraj bluecodecat woorifis grevutiu-gabriel pebsconsulting shyang seabreg mimoune00 misko9 muthhus sliceofbytes julio1925 m4k3r-org haichuan0424 cymgen30 tzehringer blue-infosec 5l1v3r1 zymitsky richtong pullmoll tank0226 brendanhoar codeshane

hashdeep's Issues

-x mode produces message "stdin matches"

Converted from SourceForge issue 1256914, submitted by jessekornblum

When using the -x or -X modes with standard input, the
program will display "stdin matches" if the data
supplied via standard input does not match any of the
known hashes. This message should be changed to "stdin
does not match" for -x and -X modes.

Unhandled symlink loops

Converted from SourceForge issue 932476, submitted by gregorycjohnson

md5deep follows symlinks and does not do loop
detection.

Seems to me loop detection is unjustifiably complex in
forensic software. Thus, I suggest a command line
argument to ignore symlinks. (Which is what I thought "-
o f" would do)

See also: Feature requests 932473 and 932473.

SHA-1 wrong on AIX

Converted from SourceForge issue 1532071, submitted by jessekornblum

sha1deep version 1.12 produces the wrong result on
64-bit AIX systems. Example:

aix$ uname -a
AIX rs5 2 5 00CDFB3E4C00

aix$ echo "" | sha1deep
fd9876a88849d0cea696c78aee6678bc88c48aed

The correct result is:
adc83b19e793491b1c6ea0fd8b46cd9f32e592fc

This may be related to using variables that do not
specify the byte width.

Usage message should display all command line options

The current usage messages are limited to 22 lines of text--the default size of a Win32 command prompt window. Without access to the HASHDEEP.TXT file, users can't get a full help message. There's no Win32 equivalent to 'man hashdeep'. The -h option should include all of the command line options. Another option, expand the -hh mode to display all options.

Tigerdeep displays results in big endian

Converted from SourceForge issue 1421377, submitted by jessekornblum

Tigerdeep uses the reference implementation of the Tiger algorithm which displays the results in big endian format. Most other algorithms like MD5 and SHA-1 display their results in little endian format. Tiger should also be displayed in little endian. From one of the Perl modules implementing Tiger:

http://search.cpan.org/~clintdw/Digest-Tiger-0.02/Tiger.pm#NOTE

As of version 0.02, hexhash() returns a hex digest starting with the least significant byte first. For example:

Hash of "Tiger":
0 7 8 15 16 23
DD00230799F5009F EC6DEBC838BB6A27 DF2B9D6F110C7937

Instead of:
7 0 15 8 23 16
9F00F599072300DD 276ABB38C8EB6DEC 37790C116F9D2BDF

The print order issue was brought up by Gordon Mohr; Eli Biham clarifies with: "The testtiger.c was intended to allow easy testing of the code, rather than to define any particular print order. ...using a standard printing method, like the one for MD5 or SHA-1, the DD should probably should be printed first [for the example above]".

** Tigerdeep needs to be fixed to display hashes in little endian **

Regexp file spec causes up-tree recursion

Converted from SourceForge issue 932420, submitted by gregorycjohnson

The DOS style "md5deep -re -o f *" does not recurse.

This implies a regexp style filespec, or "md5deep -re -o
f .*"

This spec results in up-tree recursion after the current
directory is complete. (presumably via "..")

I believe there should be (at a minimum) a reference to
this behavior in the documentation.

Thanks,
-Greg

Recursive Filtering (-l -r *.txt)

Converted from SourceForge issue 1115142, submitted by nobody

When doing a recursive hash, if you specify a file other
than ., it ignores the recursive option. It would be
hugely beneficial to have the option to recursively hash
files of only a certain type. Example:

md5deep -l -r *.txt

This should (in theory) scan the current working
directory, and all subdirectories hashing only files with
extension txt. However (in actuality) it only hashes txt
files in the current directory. If you specify ., it works
fine, but hashes all files.

Is this known? Something you're working on? Not
something you plan to address? Thanks!

Using Triage Mode Output as Known Files

The programs should be able to accept Triage mode output files as files of known hashes. Ideally, output generated by the programs in -Z mode should be acceptable as input. Example:

$ sha256deep -Z * > ../known.txt
$ sha256deep -m ../known.txt
foo.txt
bar.txt

Makefile solaris won't compile

Converted from SourceForge issue 972847, submitted by nobody

I downloaded the source code and extracted the files
from the tarball.

After CD'ing into the md5deep directory, I typed in:
make solaris

There is an error on line 78 with an error
of "dependencies are included in line"....

Not sure if it's our machine or a compile bug

Thanks.

Expert mode becomes disabled when processing symlinks

Converted from SourceForge issue 1362492, submitted by jessekornblum

When processing a symbolic link to a directory, expert
mode becomes disabled. For example:

$ md5deep (-s -o fl -r /usr) follows a directory
(/usr/lib/cron) which is a symlink to /etc/cron.d:

[spinst4][uid=0(root)] $ ls -al /usr/lib/cron/
lrwxrwxrwx 1 root root 16 Aug 23 11:31
/usr/lib/cron/ -> ../../etc/cron.d

In this directory we have a FIFO (which causes md5deep
to hang):

[spinst4][uid=0(root)] $ ls -alL /usr/lib/cron/
total 32
drwxr-xr-x 2 root sys 512 Nov 9 14:35 .
drwxr-xr-x 58 root sys 6144 Nov 16 00:02 ..
-rwxr--r-- 1 root sys 72 Apr 7 2002
.proto
prw------- 1 root root 0 Nov 16 09:46 FIFO
-rw-r--r-- 1 root sys 43 Aug 23 12:44
at.allow

The problem is in dig.c:should_hash_expert() where
expert mode is stripped and the regular should hash is
called.

Extra characters on Win32 time estimation mode

Converted from SourceForge issue 1471976, submitted by jessekornblum

md5deep running on Win32 produces extra characters at
the end of each line while time estimation mode is
enabled. This appears to be similar to the bug fixed in
version 1.7.

man page on web site is out of date

Converted from SourceForge issue 1395540, submitted by jessekornblum

The man page on the md5deep web site
(http://md5deep.sourceforge.net/manpage.html) is out of
date. Please update it to reflect the latest version of
the program.

Program does not compile on some versions of Linux

Converted from SourceForge issue 1818223, submitted by jessekornblum

The program fails to compile on some versions of Linux (sorry I can't be more specific there) because the ioctl call to BLKSSZGET is not defined. A sample error message:

gcc -DHAVE_CONFIG_H -I. -I. -I. -I/usr/local/include
-I/usr/local/include -g -O2 -c test -f 'helpers.c' || echo './'helpers.c
helpers.c: In function find_file_size': helpers.c:190: error:BLKSSZGET' undeclared (first
use in this function)
helpers.c:190: error: (Each undeclared identifier is
reported only once
helpers.c:190: error: for each function it appears

This problem is known to occur on Debian 1:3.3.5-8ubuntu2 and some other systems. It does not appear to occur on the developer's Ubuntu machine.

Hashdeep audits fail when known files computed on different operating system

If you generate hashes recursively on a Windows (-c md5 -r -l *), it will have the Windows separator which is different from the Linux separator. So if you check these hashes from a Linux system, it will fail to verify saying the file has been moved.

First BSD hash skipped

Converted from SourceForge issue 1230875, submitted by jessekornblum

The program skips the first hash in a file of BSD hashes when operating in any of the matching modes. For example, given the file foo:

MD5 (report.doc) = b4b953517d0205fd8b034a58c65c893d
MD5 (finances.xls) = 1a2f4b08cef1d6fcdc9f814d66f8bb62
MD5 (checklist.txt) = 6acafcafdc0cda86b4834c0cca599164

and the command:

$ md5deep -m foo *

The hash for report.doc will not be read and thus won't be used in matching. Check match.c to see if we forget to rewind the file after identification?

Internal error on cycle checking

Converted from SourceForge issue 1721634, submitted by jessekornblum

A user on Win32 reported the following error while using time estimation mode.

"Internal error: Cycle checking failed to unregister directory. CONTACT DEVELOPER!"

The directory in question was C:\Documents and Settings[username]\Application Data\Microsoft\Installer{[SID deleted]}.

The user reported this error did not occur when not in time estimation mode.

Error displaying matching file names

Converted from SourceForge issue 1200202, submitted by jessekornblum

When using the -w mode, the program will occasionaly display the correct filename of the known file, but with some extra characters added on. This is most likely because the buffer being used to hold the known file names is not cleared between reads. A single memset call in match.c between each read should fix this.

Don't require spaces after known hashes

The programs currently require a space after each hash in a file of known hashes. Please modify the program to require either a space or a newline after the hash for validity.

Example:

$ md5deep < foo > known.txt

$ md5deep -m known.txt *
md5deep: known.txt: Unable to find any hashes in file, skipped.
md5deep: Unable to load any matching files.
Try md5deep -h for more information.

Report hashing performance

Would be useful in some circumstances if hashdeep would report performance MB/sec of hashing, both per-thread and aggregate.

stdout output isn't fflush()'ed

Converted from SourceForge issue 1440345, submitted by juza

When doing something like

md5deep -rl . > ../md5deep.txt

md5deep -rl . | less

(or other '*deep's as well, like sha1deep)

on a large directory full of large files, it takes a
long time before one sees any output/update. I mean,
the lines for finished hashes do not appear
immediately. This is because, in hash.c, there's only
printf()s without fflush()es.

Of course, one can use the -e flag to have a progress
indicator, but I find it a bit confusing to have no
entries although the program has finished calculaing
hashes for them.

(Note:

md5deep -rl .

works ok, because TTYs tend have a mechanism that does
fflush() after a newline character. But this is not the
case with output redirection and pipes.

http://c-faq.com/stdio/fflush.html)

One possible fix could be done by modifying helpers.c
like in the attached patch: add fflush(stdout) at the
end of the make_newline() function. I think this
shouldn't introduce too much I/O overhead as it makes
the program feel really doing the job. :)

(I'm currently creating checksum file for a directory
with approx. 150GiB of files, and now with a patched
version I can always 'ls -l' or 'tail -f' the checksum
file to see that the calculation keeps going... 9.7MiB
and counting ;)

Regards and thanks for the great program,
-- Jussi Sainio

It looks like the -n mode forces -0

Converted from SourceForge issue 1635942, submitted by nobody

when using the -n switch the ouput lookks like it was produced with the -0 (null) switch

[email protected]

Error on empty hash file

Converted from SourceForge issue 1911532, submitted by jessekornblum

The program appears to error out when asked to read a set of known hashes from an empty file.

$ rm -f foo
$ touch foo
$ md5deep -m foo *
*** malloc[27553]: Deallocation of a pointer not malloced: 0xa000a8d0; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug
bar: Unable to find any hashes in file, skipped.
md5deep: Unable to load any matching files
Try md5deep -h for more information.

Write FAQ

We need a FAQ for the project. Some questions to include:

Loading files of known hashes under the WIndows PowerShell. (Powershell produces Unicode output for the hash values. The programs only know how to handle ASCII hash input.)

Double-clicking on the program doesn't do anything.

Program can't match against previous stdin input

Converted from SourceForge issue 1802967, submitted by jessekornblum

The programs cannot use a previously hashed standard input as a matching file. Here's a highly recursive example:

$ cat md5deep | ./md5deep > foo
$ cat foo
bff9ef86fe533084847... [a hash, but with no spaces at the end]
$ ./md5deep -m foo *
foo: Unable to find any hashes in file, skipped.
md5deep: Unable to load any matching files
Try 'md5deep -h' for more information.

We should be able to use the result of one md5deep hashing as the input to another.

sha1deep fails on some 64-bit platforms

Converted from SourceForge issue 1429864, submitted by jessekornblum

sha1deep produces wildly innacurate data and
occasionally generates internal errors when compiled on
some 64-bit platforms. The error appears to be
related to the use of unsigned int and unsigned long
variables in the sha1.c and sha1.h files.

progress indicator bugs

Converted from SourceForge issue 1231796, submitted by nobody

Hello,

the progress indicator occasionally displays not the
remaining time but something like ..

Firefox.zip: 0MB of 5MB done,
2022810292:1073741824:18446744069414

.. or the error messages "Unable to estimate remaining
time".

This behavior is not tied to specific files. As the
error only occurs occasionally, the same files are
usually processed without any errors.

The error occurs with md5deep 1.8-003 beta on Windows
XP Professional.

Regards

Time estimates can leave remnants behind

Converted from SourceForge issue 922644, submitted by jessekornblum

When using the -e (time estimation) mode, the initial output can sometimes not be erased by subsequent updates. Also, these updates can span more than one line.

Each Time estimate should:

Fit on one line
Completely erase the previous estimate.

Matching files starting with '(' are not output correctly

Converted from SourceForge issue 1415251, submitted by nobody

I ran md5deep with the options -rwm * to find
files that had been tranferred correctly from one
computer to another.

I piped the output of the program to a file, so that I
could turn this into a batch file (I'm moving from
Windows to OS-X) to delete the files on the source machine.

The output file contained lines like:
(Audiobook) matched Audiobook)

The .md5 file contains the (, like this:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx *(Audiobook)

The file was checked using md5deep version 1.8.

Permission Denied errors are not handled as fatal

Converted from SourceForge issue 1200971, submitted by jessekornblum

If a read operation gets a Permission Denied error
(which can happen, for example when trying to access a
PhysicalDevice via vmWare), the program can crash. The
solution is to handle Permission Denied as a fatal error.

Odd time estimates for large files

Converted from SourceForge issue 2022302, submitted by jessekornblum

As demonstrated in http://stam.blogs.com/.a/6a00d83451c2f969e200e553ac4dd48834-pi, the programs can give some strange time estimates for large files.

no hashes found in valid hashfile

Converted from SourceForge issue 1315126, submitted by nobody

When generating a hashfile using the Windows binary
v1.8 of md5deep there are no hashes found in that
hashfile, when every filename contains an open parenthese.
The hashfile thus contains an open paranthese in every
line. Using
md5deep -m test.md5
(where test.md5 is the valid hashfile) results in the

following output:

md5deep: test.md5: Unable to find any hashes in file,
skipped.
md5deep: Unable to load any matching files

Try `md5deep -h` for more information.

After adding the line
00000000000000000000000000000000 test
to test.md5 the file is processed as usual and all
files (including those that have open parantheses in
the name) are checked.
Adding
00000000000000000000000000000000 tes(t
does not help.

Best regards,
Sven Arnhold, [email protected]

Time estimates can leave remnants behind

Converted from SourceForge issue 922644, submitted by jessekornblum

When using the -e (time estimation) mode, the initial output can sometimes not be erased by subsequent updates. Also, these updates can span more than one line.

Each Time estimate should:

Fit on one line
Completely erase the previous estimate.

md5deep 1.11 doesn't compile on (Gentoo) Linux

Converted from SourceForge issue 1462302, submitted by juza

Hi!

Thanks for fixing the fflush issue. However, I ran into
trouble when trying to compile the new release on
Gentoo Linux. I think this is issue is relevant on
other distributions too.

The problem is that BLOCK_SIZE seems to be defined in
<sys/mount.h> and in hash.c the same name is used as a
variable. When the C preprocessor substitutes the name
with the defined number, it obviously won't compile.

My suggestion is to add #undef BLOCK_SIZE into
md5deep.h right after the line where <sys/mount.h> is
included. (Patch attached.) Of course, another solution
would be to use a name like BLOCKSIZE for the variable.

Best regards,
--jussi

Conversion error on -p <size>

Converted from SourceForge issue 1710202, submitted by nobody

I think there is an integer overflow if the parameter to the -p option is too big:

xena.local(1021)==> /usr/local/bin/md5deep -p 14205025792 nt.dd

cf25a37f76c9911e1bdab3f0054be7a9 /Users/jimmy/ntfs-encase/nt.dd offset 0-1320123904

18c67e1c1d7738254a07a4c555036afe /Users/jimmy/ntfs-encase/nt.dd offset 1320123904-2640247808
...

Note that it seems to have chopped off everything more than 46 bits :-)

xena.local(1065)==> bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
a=1320123904
obase=16
a
4EAF7E00
b=14205025792
b
34EAF7E00

Thanks,
Jim
[email protected]

"No errror" on Windows 98, but no hashes either

Converted from SourceForge issue 1846170, submitted by jessekornblum

Version 2.0.1 does not work on Windows 98. The command line:

c:> md5deep -r f:.

produces the output:

md5deep: No error

but doesn't compute any hashes either. This is probably due to the call to _tgetcwd in main.c, which calls _wgetcwd, which isn't supported in Win9x. The solution will probably be to detect Win9x and call the non-Unicode version of this (and other) functions.

Using DFXML files as known hashes

The programs should be able to accept DFXML files as files of known hashes. Ideally, output generated by the programs in -d mode should be acceptable as input. Example:

$ hashdeep -d * > ../known.xml
$ hashdeep -ak ../known.xml *
hashdeep: Audit passed

progress indicator bug (no. 2)

Converted from SourceForge issue 1239505, submitted by nobody

Hello,

With a target file name longer than 30 characters,
md5deep sometimes only displays a colon. Instead,
the file name should always be displayed truncated

Example: "Oxford Advanced Learner Dictionary.iso"

Currently:
: 46MB of 513MB done, 00:00:30 left

Proposed:
Oxford Advanced Learner Dict..: 46MB of 513MB done, 00:00:30 left

The error occurs with md5deep 1.8-003 beta on Windows
XP Professional.

Regards

Hashkeeper FilePath entries

Converted from SourceForge issue 1196577, submitted by jessekornblum

Version 1.6 and lower have a problem reading Hashkeeper
files under the matching modes. If there is a value
present in the "FilePath" field, the program reports
"unable to find hash in line" and skips the line.

It appears that the program is looking for the nth
quoted string to be the hash value, which is the file
path if it's present. Normally the file path is not a
valid hash value and so the line is skipped.

Inaccurate time estimates in piecewise mode

Converted from SourceForge issue 1798988, submitted by jessekornblum

When running the progam with both piecewise and time estimation modes (i.e. -p [x] and -e), it generates odd time estimates. The estimated time remaining drops to 00:00:00 well before the file is complete. The error appears to be that the 'start' time is reset each time a piece is started. Thus the 'elapsed' time is always very short. This should be fixed in version 2.0.

MD5 doesn't properly wipe memory

Converted from SourceForge issue 982688, submitted by jessekornblum

After calculating the MD5 hash, the code in md5.c attempts to wipe the memory (ctx) used for calculations in line 147 with:

memset(ctx, 0, sizeof(ctx)); /* In case it's sensitive */

"The author of the code is being paranoid and assuming that some hacker will be reading data from the memory used by MD5Final after it returns (e.g., perhaps this function is implemented in some non-trapable area of memory on what is intended to be a secure application). To prevent this happening the author is zeroing out what might be considered sensitive areas of storage. However, sizeof(ctx) returns the size of the pointer, not the size of the pointed to object. So only part of the storage will be zeroed out." [dj]

Version 1.4 will wipe the entire storage area.

Does not properly skip busy files on Win32

Converted from SourceForge issue 1179758, submitted by jessekornblum

When running md5deep on the %WINNT% directory on Win32, there are several files that are busy and cannot be accessed. These files are not properly skipped by using the regular files option of expert mode. For example, running from Linux on a filesystem mounted via SMB

md5deep -r -l -of WINNT/* >> output.md5deep.txt

md5deep: WINNT/system32/config/default: error at offset 2138112: Text
file busy

Why is the error in the middle of the file?

This should work like:

find ./ -type f -exec md5sum {} ; > outputfile.txt

does!

Wrong -k switch help

Converted from SourceForge issue 1467205, submitted by camarade_tux

This is a purely cosmetic thing.

Running WinXP, using latest Cygwin version, md5deep -
h reports:
"-k - print asterisk before hash"

But output is :
"C:>md5deep -b hash.txt -k
8d1a691279d709454c14510ffe15f259 *hash.txt"

and both manpage hosted on sf.net and readme.txt say :
"Enables asterisk mode. An asterisk is inserted in
lieu of a second space between the filename and the
hash, just like md5sum in its binary (-b) mode."

As you see, it's an important bug. ;)

Windows version does not produce output with linefeeds

Converted from SourceForge issue 924146, submitted by jessekornblum

The Windows version 1.1 uses the standard UNIX convention of \n as a line break. In order to display correctly in some Win32 applications (e.g. Notepad), we need to use the Windows convention of \r\n for linebreaks.

How to recreate

C:> md5deep . > foo.txt

C:> notepad foo.txt

Incorrect size for standard input

Converted from SourceForge issue 982730, submitted by jessekornblum

Using the -z flag (file size) while processing standard input produces rather wild estimates for the file size. Example:

$ uname | md5deep -z
18446744073709551615 1b61f2a016f7478478fcb13130fcec7b

By way of comparison

$ uname | wc -c
6

Piecewise mode fails for block size > 1 GB

Converted from SourceForge issue 1463687, submitted by jessekornblum

The piecewise block size variables, both in main.c and
hash.c, are both 32 bit variables. As such, they can't
hold block sizes larger than 32 bits. Admittedly, this
isn't a major problem as most people are not doing
piecewise hashing in the 1GB+ range, but it should be
fixed. Save for version 1.13.

-o ... Ssee README/manpage

Converted from SourceForge issue 1415322, submitted by doryforos

md5deep v1.9.3:

md5deep -h outputs:
...
-o - Only process certain types of files. Ssee
README/manpage
...

(double "s")

X:>type foo.txt
foo

X:>md5deep foo.txt
2145971cf82058b108229a3a2e3bff35 X:\foo.txt

X:>md5deep < foo.txt
d3b07384d113edec49eaa6238ad5ff00