GithubHelp home page GithubHelp logo

tavianator / bfs Goto Github PK

View Code? Open in Web Editor NEW
974.0 15.0 37.0 3.12 MB

A breadth-first version of the UNIX find command

Home Page: https://tavianator.com/projects/bfs.html

License: BSD Zero Clause License

Makefile 2.35% C 83.19% Shell 14.45%
find command-line filesystem directory-tree breadth-first-search linux macos bsd unix

bfs's Introduction

bfs's People

Contributors

a1346054 avatar bmundt6 avatar bourgeoisbear avatar chapmanjacobd avatar data-man avatar dependabot[bot] avatar electronicsarchiver avatar farribeiro avatar lamby avatar lgtm-migrator avatar lugoues avatar markus-oberhumer avatar mpolden avatar rhermes avatar tavianator avatar thesamesam avatar virtualroot avatar visciang avatar vorpalblade avatar wscott avatar xfgusta avatar ylluminarious avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bfs's Issues

printf with \0 not handled the same as GNU find

GNU find outputs an ASCII NUL for -printf with \0, but not bfs.

$ find -name "*.c" -type f -printf '%P\0' | xargs -0 ls -s
24 bftw.c  12 color.c   4 dstring.c  24 eval.c  16 exec.c   4 main.c   4 mtab.c  76 parse.c  24 printf.c   8 typo.c   8 util.c

$ bfs -nohidden -name "*.c" -type f -printf '%P\0' | xargs -0 ls -s
ls: cannot access 'parse.ccolor.cmtab.cutil.cprintf.cmain.ctypo.ceval.cdstring.cbftw.cexec.c': No such file or directory

bfs just seems to omit the zero character, rather than simply terminating the string as I suspected:

$ bfs -name "parse.c" -type f -printf '%P\0%P'
parse.cparse.c

bfs 1.3 fails to build on FreeBSD

Commit b2e69d3 changes some EINVAL to ENODATA which is not defined on FreeBSD. Please advise on what I should replace it with.

Another issue is that util.h now includes sys/capability.h apparently unconditionally even on non-Linux platforms leading to

/usr/include/sys/capability.h:44:2: warning: this file includes <sys/capability.h> which is deprecated [-W#warnings]
#warning this file includes <sys/capability.h> which is deprecated
 ^

On FreeBSD sys/capability.h is an older name for sys/capcisum.h. It's not the same header as on Linux, so the __has_include check is not sufficient.

Implement a depth-first mode

Suggested by /u/mailme_gx and /u/defaultxr on reddit here. Depth-first mode takes less memory, and is probably slightly faster if you're exploring a whole tree.

bfs quite a bit slower than BSD and GNU find

I've recently written my own BFS version of find in Go and optimised it until it performed equally to regular find(1).

I've ran a few rudimentary benchmarks of bfs(1):

โžœ  ~  time find . >/dev/null
find . > /dev/null  0.97s user 4.96s system 6% cpu 1:34.89 total
โžœ  ~  time gfind . >/dev/null
gfind . > /dev/null  1.39s user 5.17s system 8% cpu 1:15.24 total
โžœ  ~  time ~/github/bfs/bfs >/dev/null
~/github/bfs/bfs > /dev/null  1.57s user 36.42s system 15% cpu 4:00.42 total
โžœ  ~  time ~/github/bfs/bfs >/dev/null

This is after warmup. bfs(1) was compiled with -O2 -flto.

I believe this might have to do with the fact that find in its default incarnation only does getdents(64) calls with a large enough buffer to amortize the system call overhead. At least, I was only able to match find(1) when I did something similar (which I had to do by copying and changing some of the Go stdlib, which insisted on doing a useless stat(2)).

I'm on OSX now and don't know how to properly use dtrace, so it'll have to wait until I'm at a linux box to see what's going on. But I'm leaving this here just as a heads up.

GNU find compatibility

For compatibility with GNU findutils' find, we need the following features:

Flags:

  • -H
  • -L
  • -P
  • -Olevel
  • -D

Expressions:

  • ( EXPR )
  • ! EXPR, -not EXPR
  • EXPR1 EXPR2, EXPR1 -a EXPR2, EXPR1 -and EXPR2
  • EXPR1 -o EXPR2, EXPR1 -or EXPR2
  • EXPR1 , EXPR2

Positional options:

  • -daystart
  • -follow
  • -regextype
  • -warn, -nowarn

Normal options:

  • -d, -depth
  • -maxdepth LEVELS
  • -mindepth LEVELS
  • -mount, -xdev
  • -noleaf
  • -ignore_readdir_race, -noignore_readdir_race
  • -help, --help
  • -version, --version

Tests:

  • -amin N
  • -anewer FILE
  • -atime N
  • -cmin N
  • -cnewer FILE
  • -ctime N
  • -empty
  • -false
  • -fstype TYPE
  • -gid N
  • -group NAME
  • -ilname PATTERN
  • -iname PATTERN
  • -inum N
  • -ipath PATTERN, -iwholename PATTERN
  • -iregex PATTERN
  • -links N
  • -lname PATTERN
  • -mmin N
  • -mtime N
  • -name PATTERN
  • -newer FILE
  • -newerXY REFERENCE
  • -nogroup
  • -nouser
  • -path PATTERN, -wholename PATTERN
  • -perm [+-/]?MODE
  • -regex PATTERN
  • -readable, -writable, -executable
  • -samefile FILE
  • -size N[bcwkMG]
  • -true
  • -type [bcdpflsD]
  • -uid N
  • -used N
  • -user NAME
  • -xtype [bcdpfls]

Actions:

  • -delete
  • -exec COMMAND ;
  • -exec COMMAND {} +
  • -ok COMMAND ;
  • -execdir COMMAND
  • -execdir COMMAND {} +
  • -okdir COMMAND ;
  • -ls
  • -fls FILE
  • -print
  • -fprint FILE
  • -print0
  • -fprint0 FILE
  • -printf FORMAT
  • -fprintf FILE FORMAT
  • -prune
  • -quit

Consider removing all nohidden/hidden code

Dotfiles are one of those old bugs introduced long ago in an attempt to make ls skip the directories . and .. which permeate every directory, and because the code only checked if the first byte was ., [1].

It resulted in decades of developer abuse and clutter, including paranoia as now shells ignore these so-called "dotfiles" during globbing operations resulting in data loss as * does not mean "matches any string" as the glob(7) manual suggests.

In the name of simplicity, consider removing support for this pointless distinction or at least make the reality of the filesystem (-nohidden) the default.

As an aside, this would make bfs more compatible with find.

PS: This mechanism was correctly done by GoboLinux with their gobohide tool and module.

Make bftw() multi-threaded

Multi-threading in bftw() should allow for some compute and I/O parallelism. fd gets a great speedup from it, although implementing it in bfs will have significantly more complexity.

A -type flag

I'd like to use this in some fzf contraption of mine. Yet for that I would need it to output only directories. For example like the type d flag commonly found in find.

Since you expose the typeflag parameter to the callback but don't do anything with it I assume you wanted to do this but didn't have the time.

The change is at any rate small (37+, 2- lines), I have a patch ready but cannot contribute to your project if it's under the WTFPL . Licenses like MIT/BSD/ISC/Apache 2 or similar would be fine, should you consider changing your license or dual-licensing the project.

I'm not asking you to change the license, that's your prerogative. But if you can't/won't do that, would you perhaps consider adding the feature yourself?

Feature Request: accept list of files from STDIN

One thing I've always thought would be great as a part of 'find' is accepting a list of files on stdin instead of searching through a provided directory. That way you could use the filters of find/bfs just on a list of files.

i.e.

$ cat foo
/home/user/file.txt
/home/user/file2.txt
/home/user/blah/file99.mp3
$ cat foo | bfs - -ctime +1
/home/user/file.txt
$ cat foo | bfs - -name '*.mp3'
/home/user/blah/file99.mp3

This sort of thing could probably also be done by using a bunch of other commands and parsing their output with grep. But it would be a lot more convenient and would significantly reduce cognitive load to be able to just pipe a list of files into bfs and use its filters on them.

I think this would be a great enhancement for bfs and would be yet another reason to use it over find. Either way, bfs is great and I'm thankful for your work on it!

Can't compile on OSX Sierra, missing `->st_atim` etc.

Full output:

โžœ  bfs git:(master) make release
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF bftw.d -c bftw.c -o bftw.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF color.d -c color.c -o color.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF dstring.d -c dstring.c -o dstring.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF eval.d -c eval.c -o eval.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF main.d -c main.c -o main.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF parse.d -c parse.c -o parse.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF typo.d -c typo.c -o typo.o
eval.c:139:20: error: no member named 'st_atim' in 'struct stat'
                time = &statbuf->st_atim;
                        ~~~~~~~  ^
eval.c:142:20: error: no member named 'st_ctim' in 'struct stat'
                time = &statbuf->st_ctim;
                        ~~~~~~~  ^
eval.c:145:20: error: no member named 'st_mtim' in 'struct stat'
                time = &statbuf->st_mtim;
                        ~~~~~~~  ^
eval.c:175:20: error: no member named 'st_atim' in 'struct stat'
                time = &statbuf->st_atim;
                        ~~~~~~~  ^
eval.c:178:20: error: no member named 'st_ctim' in 'struct stat'
                time = &statbuf->st_ctim;
                        ~~~~~~~  ^
eval.c:181:20: error: no member named 'st_mtim' in 'struct stat'
                time = &statbuf->st_mtim;
                        ~~~~~~~  ^
eval.c:199:40: error: no member named 'st_atim' in 'struct stat'
        time_t diff = timespec_diff(&statbuf->st_atim, &statbuf->st_ctim);
                                     ~~~~~~~  ^
eval.c:199:59: error: no member named 'st_ctim' in 'struct stat'
        time_t diff = timespec_diff(&statbuf->st_atim, &statbuf->st_ctim);
                                                        ~~~~~~~  ^
8 errors generated.
make: *** [eval.o] Error 1
make: *** Waiting for unfinished jobs....
parse.c:724:21: error: no member named 'st_mtim' in 'struct stat'
        expr->reftime = sb.st_mtim;
                        ~~ ^
parse.c:1156:23: error: no member named 'st_atim' in 'struct stat'
                        expr->reftime = sb.st_atim;
                                        ~~ ^
parse.c:1159:23: error: no member named 'st_ctim' in 'struct stat'
                        expr->reftime = sb.st_ctim;
                                        ~~ ^
parse.c:1162:23: error: no member named 'st_mtim' in 'struct stat'
                        expr->reftime = sb.st_mtim;
                                        ~~ ^
4 errors generated.
make: *** [parse.o] Error 1

Requires -lrt switch

Hello,

I just tried compiling this project.

It compiles if I do

make CFLAGS+=-lrt

Without that, I get

eval.c:858: undefined reference to `clock_gettime'

statx redefinition error

glibc has recently added statx which results in following redefinition error:

cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.2.2-10-g9caf477\" -D_FORTIFY_SOURCE=2 -std=c99 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -MD -MP -MF typo.d -c typo.c -o typo.o
In file included from stat.c:27:
/usr/include/linux/stat.h:56:8: error: redefinition of 'struct statx_timestamp'
 struct statx_timestamp {
        ^~~~~~~~~~~~~~~
In file included from /usr/include/sys/stat.h:446,
                 from stat.c:23:
/usr/include/bits/statx.h:25:8: note: originally defined here
 struct statx_timestamp
        ^~~~~~~~~~~~~~~
In file included from stat.c:27:
/usr/include/linux/stat.h:99:8: error: redefinition of 'struct statx'
 struct statx {
        ^~~~~
In file included from /usr/include/sys/stat.h:446,
                 from stat.c:23:
/usr/include/bits/statx.h:36:8: note: originally defined here
 struct statx
        ^~~~~
make: *** [Makefile;64: stat.o] Error 1
make: *** Waiting for unfinished jobs....

Pulseaudio had a similar situation when glibc added memfd, here is their solution: pulseaudio/pulseaudio@dfb0460#diff-a3824a947f7132abd61ce6a8a9b5e2a4

You could possibly check if __NR_statx defined, but I'm not sure.

Alternatively it might be better to expose a -D flag via the Makefile and guard against that in the implementation. That way I can compile it with make HAVE_STATX=1 or something like it.

idea: print inode of file (and even of the parent)

Today I was thinking that instead of building a comprehensive grammar into a lightweight tool like this, it would be useful to allow it to serve other programs as well as possible. An example could be performing operations on files hierarchies (for example deletion).

Naturally, a BFS traversal of a directory tree will find it hard to actually remove the entire tree in one go since the folders one encounters first can't easily be removed. Yet the files can.

Instead of:

$ find . -type f -delete

This would be possible:

# delete all files in a file hierarchy
$ bfs somedir -type f | parallel -X rm
# get rid of empty husk
$ rm -rf somedir

So that's great, doing one thing and one thing well. I wondered whether it could be made even better (and by better I mean faster):

It would seem that sorting by inode would be a good shot. Buffering up an array and sorting it in C is needlessly messy, and there are other tools that can do that. But bfs(1) would need to output the inode number as calling an external program to stat(2) every file is very expensive. Luckily, stat(2) is not necessary since the inode number is present in the dirent struct.

So we can imagine:

$ bfs -i .
 10720996 ./.git
 10721207 ./.gitignore
 10982737 ./bfs
 10979979 ./bfs.c
 10982713 ./bfs.d
 10982714 ./bfs.o

Notice that the output is not sorted by inode number. Naively doing:

$ bfs -i -type f . | sort -n | parallel -X rm -f

Would maybe not be ideal as it is very possible that this places files from lots of different directories close to each other. Intuitively it seems best to "empty" one directory in one god. this would need to be benchmarked though, because I'm just grasping at straws here.

So we could compare that to sorting by inode within a directory. For that it would be sufficient to output the inode of the parent directory as well:

$ bfs -i -p -type f .
...
10720996 10979934 ./.git/ORIG_HEAD
10720996 10721190 ./.git/packed-refs
10720996 10720997 ./.git/refs
10721002 10721003 ./.git/hooks/applypatch-msg.sample
10721002 10721004 ./.git/hooks/commit-msg.sample
10721002 10721005 ./.git/hooks/post-update.sample
...

$ bfs -i -p -type f . | sort -n -k1,2 | cut -f 2- -d' ' | parallel -X rm

That seems nice, if the intuition about what's more convenient for a file system is accurate, which remains to be seen. That said, this will also place directories in non-BFS order, which might be important. An alternative is to output a generation number instead of the parent directories' inode:

$ bfs -i -g -type f .
...
48 10977657 ./.git/objects/a4/7cee3a89da7a2cff6b615e381fd486d5d9c06e
49 10977653 ./.git/objects/a5/41b1002b2830a0bde86a2e50bf7e58cd41ea85
50 10721056 ./.git/objects/ac/11476e20136976bf689acf847fdcea4e05e37a
50 10976289 ./.git/objects/ac/2e304d1fca0aab2eebba954851c3ef759aaaf5
50 10721187 ./.git/objects/ac/6b2e7981854281d75b24fbb972933a25cb5bcc
51 10721048 ./.git/objects/ad/628c395cc45add3e3a87f21250cbd0df1d860e
52 10979608 ./.git/objects/b0/4f245cb03897e2616aaeb643cb03244993b4cd
53 10721137 ./.git/objects/b1/337c8e1dbfe6ae1b9d79e2b16af210562c2e17
53 10721119 ./.git/objects/b1/d430ec463fd1d49bf47443cb143af0e8e9905e
54 10721080 ./.git/objects/b2/510860f1a20cbaab76b3efca68efe9bc530c66
55 10741243 ./.git/objects/b4/0ccb5fe4e00584ec867706b71db707bf3f63ce
56 10721111 ./.git/objects/be/538a8a3cd111795549eecb16f05bce8c8ba907
...

This will allow one to sort only within directories, keeping BFS traversal order.

However, this still has a big problem: sort(1) wait until there's no more input before sorting and outputting. A bfs(1) run on a very large directory tree could take a long time, and in the meanwhile one could be performing operations on files instead of twiddling thumbs.

Something like this could work

bfs -i -dirdelim='\n' -type f | parallel --pipe --recend='\n\n' "sort -n | cut -f 2- -d' ' | parallel -X rm"

While it looks like it could work, it doesn't somehow, parallel(1) doesn't understand. It also looks complicated, which is almost never good.

Anyway, I just wanted to write down my thoughts for possible features. I've implemented a few of them (the output I pasted is mostly real :).

BSD find compatibility

Options:

  • -E
  • -H
  • -L
  • -P
  • -X
  • -d
  • -s
  • -x
  • -f PATH

Expressions:

  • ( EXPR )
  • ! EXPR, -not EXPR
  • EXPR1 EXPR2, EXPR1 -a EXPR2, EXPR1 -and EXPR2
  • EXPR1 -o EXPR2, EXPR1 -or EXPR2

Primaries:

  • -Bmin, -Bnewer, -Btime
  • -acl
  • -anewer, -amin, -atime
  • -cnewer, -cmin, -ctime
  • -asince, -csince, -since
  • -delete
  • -depth
  • -depth N
  • -empty
  • -exec COMMAND ;
  • -exec COMMAND {} +
  • -execdir COMMAND
  • -execdir COMMAND {} +
  • -exit STATUS
  • -flags FLAGS
  • -fstype TYPE
  • -gid GNAME
  • -group GNAME
  • -ignore_readdir_race, -noignore_readdir_race
  • -ilname PATTERN
  • -iname PATTERN
  • -inum N
  • -ipath PATTERN, -iwholename PATTERN
  • -iregex PATTERN
  • -links N
  • -lname PATTERN
  • -ls
  • -maxdepth LEVELS
  • -mindepth LEVELS
  • -mnewer
  • -mmin, -mtime
  • -mount
  • -name PATTERN
  • -newer FILE
  • -newerXY REFERENCE
  • -nogroup
  • -nouser
  • -noleaf
  • -ok COMMAND ;
  • -okdir COMMAND ;
  • -path PATTERN, -wholename PATTERN
  • -perm -?MODE
  • -perm +MODE
  • -print
  • -print0
  • -printx
  • -prune
  • -quit
  • -regex PATTERN
  • -rm
  • -samefile FILE
  • -size N[ckMGTP]
  • -sparse
  • -type [bcdpfls]
  • -uid UNAME
  • -user UNAME

Make -nohidden position independent

Hello,

When I do:

bfs -nohidden . -name "*foo*"

it works great! Fast, and ignoring dirs starting with ., as expected.

But when I do:

bfs . -name "*foo*" -nohidden

the -nohidden switch is not effective.. I get search results from hidden dirs too (dirs starting with .)!

bfs uses fcntl while find doesn't (unnecessary F_SETFD FD_CLOEXEC)

Running the following:

strace -c bfs -links 2 >/dev/null ; strace -c find . -links 2 >/dev/null 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 63.99    0.000972           2       530           newfstatat
 10.40    0.000158           0       374           getdents
  5.46    0.000083           0       558           fcntl
  4.41    0.000067           0       187           openat
  3.09    0.000047           0       375           close
  2.83    0.000043           5         9           mmap
  2.70    0.000041           0       189           fstat
  1.51    0.000023           6         4           mprotect
  1.18    0.000018           6         3         3 access
  0.86    0.000013           7         2           open
  0.79    0.000012          12         1           execve
  0.72    0.000011           4         3           brk
  0.72    0.000011           4         3         2 ioctl
  0.66    0.000010           5         2           munmap
  0.26    0.000004           4         1           read
  0.20    0.000003           3         1           getrlimit
  0.20    0.000003           3         1           arch_prctl
  0.00    0.000000           0         1           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.001519                  2244         5 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.69    0.002030           4       530           newfstatat
 18.37    0.001574           4       372           getdents
 15.33    0.001314           3       379           close
 13.06    0.001119           3       375           fchdir
 10.65    0.000913           5       186           openat
 10.26    0.000879           4       196         4 open
  6.60    0.000566           3       191           fstat
  0.68    0.000058           5        12           mmap
  0.40    0.000034           6         6           mprotect
  0.21    0.000018           5         4         4 access
  0.18    0.000015           4         4           read
  0.16    0.000014           5         3           munmap
  0.13    0.000011           4         3           brk
  0.12    0.000010           3         3         2 ioctl
  0.11    0.000009           9         1           execve
  0.04    0.000003           3         1           uname
  0.04    0.000003           3         1           arch_prctl
  0.00    0.000000           0         1           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.008570                  2268        10 total

I suspect the reason to be that find passed O_CLOEXEC to openat(2), while bfs does it in a separate fcntl(2) call or something. Which is valid because it doesn't fork but could be avoided.

Let's check.

$ strace find . -links 2 >/dev/null |& grep 'CLOEXEC' | tail
openat(AT_FDCWD, "d7", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "27", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "logs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "refs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "remotes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "origin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "heads", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "hooks", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "branches", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "info", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, ".git", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5

bfs does a strange sort of dance:

$  strace bfs -links 2 >/dev/null |& grep -P 'close|open|CLOEXEC' | less
...
openat(6, "73/", O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 9
fcntl(9, F_DUPFD_CLOEXEC, 0)            = 10
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
close(10)                               = 0
close(9)                                = 0
...

If I'm seeing that right, there's no reason to to the fcntl(10, F_SETFD, FD_CLOEXEC) operation after fcntl(9, F_DUPFD_CLOEXEC, 0) is called, From http://man7.org/linux/man-pages/man2/fcntl.2.html:

F_DUPFD_CLOEXEC (int; since Linux 2.6.24)
As for F_DUPFD, but additionally set the close-on-exec flag
for the duplicate file descriptor. Specifying this flag
permits a program to avoid an additional fcntl() F_SETFD
operation to set the FD_CLOEXEC flag. For an explanation of
why this flag is useful, see the description of O_CLOEXEC in
open(2).

As for the dup call which find doesn't do, I think it might have something to do with the breadth-first ways of bfs. So that could be required for proper functioning. Don't know.

Support more regex types

Findutils offers

  • findutils-default
  • awk
  • egrep
  • ed
  • emacs
  • gnu-awk
  • grep
  • posix-awk
  • posix-basic
  • posix-egrep
  • posix-extended
  • posix-minimal-basic
  • sed

Build failures on kfreebsd

Hi, bfs currently fails on the kfreebsd architectures in Debian:

c -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF spawn.d -c spawn.c -o spawn.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF stat.d -c stat.c -o stat.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF time.d -c time.c -o time.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF trie.d -c trie.c -o trie.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF typo.d -c typo.c -o typo.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF util.d -c util.c -o util.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF bfs  -Wl,-z,relro -Wl,-z,now bftw.o color.o darray.o diag.o dstring.o eval.o exec.o fsade.o main.o mtab.o opt.o parse.o printf.o spawn.o stat.o time.o trie.o typo.o util.o   -o bfs
/usr/bin/ld: fsade.o: in function `bfs_check_xattrs':
./fsade.c:288: undefined reference to `extattr_list_file'
/usr/bin/ld: ./fsade.c:288: undefined reference to `extattr_list_link'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:96: bfs] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
dh_auto_build: error: make -j2 "INSTALL=install --strip-program=true" returned exit code 2
make: *** [debian/rules:6: binary-arch] Error 25

โ€” https://buildd.debian.org/status/package.php?p=bfs

I see that we have a BFS_HAS_SYS_EXTATTR macro already, but I think the setup for this is wrong in kFreeBSD which is a weird hybrid (eg. perhaps the && !__FreeBSD__ logic is wrong here.)

Not available via apt-get

The readme says that bfs is available with apt-get

If you're on Debian GNU/Linux, bfs is available via apt-get install bfs.

But when I try to install it, It doesn't seem to be available.

$ sudo apt-get update && sudo apt-get install bfs
Hit:1 http://archive.ubuntu.com/ubuntu zesty InRelease
Hit:2 http://archive.ubuntu.com/ubuntu zesty-updates InRelease
Hit:3 http://archive.ubuntu.com/ubuntu zesty-security InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package bfs

Using apt search bfs does not show any matches either.

Respect VCS ignore files

As suggested by @merwok: #8 (comment), we should implement a -vcsignore action. Related to #30 since conditionally respecting .gitignore would be very weird, it should be an on/off thing regardless of where it appears on the command line.

[Feature request] Grep style directory exclusion

I've been using bfs for a while now, it is excellent. One thing that find is terrible for is excluding directories, and I thought maybe bfs could add on a more use friendly system.

It would be a lot nicer if there was the option to exclude/include directories/files in the same way that grep does:

--exclude-dir={build,vendor,.git,node_modules} --include=*.{css,vcl,conf,yml,jade,js,styl,php,config,html} --exclude={*.min.js,tags,json}

I'd be quite interested in helping to write this functionality, but I thought I would raise it as an issue first to see what your thoughts on it are.

Feature request: highlight matched part of filename

I think it would help with readability if matched substrings were highlighted in some way (eg: with a yellow background).

It would be extra helpful for regular expressions, where the patterns are abstract.

A workaround is to run bfs | grep --color, but this removes all of bfs' beautiful filetype colour coding!

The -Og flag doesn't exist in Clang

Hi, bfs(1) works fine on OSX, except that it doesn't compile with the default compiler:

โžœ  bfs git:(master) โœ— make
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs.d -c bfs.c -o bfs.o
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bftw.d -c bftw.c -o bftw.o
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF color.d -c color.c -o color.o
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
make: *** [bftw.o] Error 1
make: *** Waiting for unfinished jobs....
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
make: *** [bfs.o] Error 1
make: *** [color.o] Error 1

It does work with gcc 5.2

โžœ  bfs git:(master) โœ— CC=gcc-5 make
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs.d -c bfs.c -o bfs.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bftw.d -c bftw.c -o bftw.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF color.d -c color.c -o color.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs  bfs.o bftw.o color.o -o bfs

On that note, perhaps it wouldn't be bad to add a make release mode. Or, as is better in my experience, let the default make target be an optimized build (-O2 -flto) and provide a make debug target.

If it builds with a regular make invocation on OSX, this could get included in homebrew, which would be great.

Very beautiful code too, I enjoyed reading it.

Implement -sort

-s will sort by name, but it would be nice to sort by other attributes like type, inode number, etc. The main difficulty is that at the time we sort, we don't have the struct BFTW available, so we either need to potentially compute those earlier and cache them for the later actual visits, or pass a different (lighter) type to the comparison function. But even with a lighter type for comparisons, determining the type/inode number may need a stat(), which should be cached somehow anyway.

It would be extra nice to support compound orderings like -sort type,-name meaning sort by type first, then name backwards.

Permission Issue

When it try to list directories inside directory without read permissions, it throws out error. I think bfs should skip such directories before hand.

error: './cache/private': Permission denied
error: './cache/ldconfig': Permission denied
error: './db/sudo': Permission denied
error: './tmp/systemd-private-b0c63c239a4f49ebaf5fa0ee80d157c8-ModemManager.service-ODCouH': Permission denied
error: './tmp/systemd-private-b0c63c239a4f49ebaf5fa0ee80d157c8-systemd-timesyncd.service-Jze1lA': Permission denied
error: './lib/portables': Permission denied
error: './lib/machines': Permission denied
error: './lib/private': Permission denied
error: './lib/NetworkManager': Permission denied
error: './lib/udisks2': Permission denied
error: './lib/rpcbind': Permission denied
error: './log/private': Permission denied
error: './spool/cups': Permission denied
error: './lib/nfs/sm': Permission denied
error: './lib/nfs/sm.bak': Permission denied
error: './lib/sddm/.nv': Permission denied
error: './lib/sddm/.config': Permission denied
error: './lib/gssproxy/rcache': Permission denied

Latest bfs version fails to find libacl and libcap

> LANG=C git pull
Already up to date.

> LANG=C make
[โ€ฆ]
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.3\"  -std=c99 -g -Wall -Wmissing-declarations -MD -MP -MF bfs  -Wl,--as-needed  bftw.o color.o diag.o dstring.o eval.o exec.o main.o mtab.o opt.o parse.o posix1e.o printf.o spawn.o stat.o typo.o util.o  -lacl -lcap -lrt  -o bfs
/usr/bin/ld: cannot find -lacl
/usr/bin/ld: cannot find -lcap
collect2: error: ld returned 1 exit status
make: *** [bfs] Error 1

> uname -srm
Linux 4.4.0-119-generic x86_64

Support NO_COLOR

While bfs -nocolor exists it would perhaps be nice if it honoured NO_COLOR to provide a more universal way to prevent the use of colours.

bftw(): Permission denied

Arch Linux 64 bits
bfs 0.70 (installed from AUR)

When I execute a search in system folders, it shows the error bftw(): Permission denied.
Example:

$ bfs /usr > /dev/null 
'/usr/share/polkit-1/rules.d/': Permission denied
bftw(): Permission denied

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.