deadc0de6 / catcli
The command line catalog tool for your offline data
License: GNU General Public License v3.0
Thanks for a great tool, it's been really useful.
I have some files that have double quotes in their names, and sqlite errors when importing the CSV file because it doesn't like the format. The generally accepted approach seems to be escaping each double quote with a second double quote preceding it (there's more detail here, as CSV is not well specified: https://docs.python.org/3/library/csv.html).
I've got this working for myself and will attach a pull request to here shortly.
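A minimal sketch of that escaping with Python's standard csv module (the field values are made up for illustration): with the default doublequote=True, csv.writer writes each embedded double quote as two double quotes inside a quoted field, and csv.reader turns them back into one.

```python
import csv
import io


def to_csv_row(fields):
    """Serialize one row as CSV text.

    csv.writer (doublequote=True by default) escapes an embedded double
    quote by doubling it inside a quoted field.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


row = to_csv_row(['my "best" file.txt', "file", "24"])
print(row, end="")  # "my ""best"" file.txt","file","24"
```

Reading the row back with csv.reader(io.StringIO(row)) restores the original name, which is a quick way to check the output round-trips.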
Hi,
IMHO, showing the date of the index update/creation would be useful, e.g.:
catcli ls some-name
storage: some-name (free:800G, total:1T, date: 2018-09-26 07:16:44)
thx,
ramon
Can you please post the JSON schema for the catcli.catalog file so that it can be read easily from a programming language (such as Java, C#, Python, etc.)? JSON libraries/modules ask for the schema to be able to read in and properly format the data. Schema generators get the maccess field type wrong, and in some instances insist that the data file (catcli.catalog) does not conform to the schema they generated from that same file.
Thank you.
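For reading the file from Python specifically, the standard json module needs no schema at all. The sketch below assumes (this is not a documented catcli schema) that every node is a JSON object with a "name" and an optional "children" list; the sample catalog is invented. If a field such as maccess holds integers in some nodes and floats in others, a schema generator that samples only part of the file may infer too narrow a type; that is a guess at the cause, not a confirmed one.

```python
import json


def walk(node, depth=0):
    """Yield (depth, node) pairs for a nested dict tree.

    Assumes each node is a dict with an optional "children" list;
    the key names are an assumption about the catalog layout.
    """
    yield depth, node
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# Invented sample data, shaped like the assumed catalog structure.
raw = """{"name": "top", "type": "top", "children": [
  {"name": "storage1", "type": "storage", "children": [
    {"name": "a.txt", "type": "file", "size": 24, "maccess": 1632639404.5}
  ]}
]}"""

catalog = json.loads(raw)
for depth, node in walk(catalog):
    print("  " * depth + node["name"], node.get("maccess"))
```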
The mount command throws an error. Is it a bug, or am I doing something wrong?
If fuse needs some special config options, perhaps the documentation should mention this at https://github.com/deadc0de6/catcli#mount-catalog ?
Creating the catalog:
~/playground/catcli/tc-01$ catcli index --meta="Some description" my-name-for-ds-01 ../../data-sets/ds-01
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.5
Indexed 14 file(s) in 0:00:00.000824
Trying to mount it with catcli:
~/playground/catcli/tc-01$ mkdir --parent mnt
~/playground/catcli/tc-01$ catcli mount mnt
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.5
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
Traceback (most recent call last):
File "/home/chr/.local/bin/catcli", line 8, in <module>
sys.exit(main())
File "/home/chr/.local/lib/python3.8/site-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/home/chr/.local/lib/python3.8/site-packages/catcli/catcli.py", line 357, in main
if not cmd_mount(args, top, noder):
File "/home/chr/.local/lib/python3.8/site-packages/catcli/catcli.py", line 90, in cmd_mount
Fuser(mountpoint, top, noder,
File "/home/chr/.local/lib/python3.8/site-packages/catcli/fuser.py", line 33, in __init__
fuse.FUSE(filesystem,
File "/home/chr/.local/lib/python3.8/site-packages/fuse.py", line 711, in __init__
raise RuntimeError(err)
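The fusermount line above names the cause: the mount requested allow_other, which fusermount only permits for a regular user when user_allow_other is present in /etc/fuse.conf. So one option is adding that line to /etc/fuse.conf (as root); another is mounting without allow_other. A sketch of the second approach, where both helper names are hypothetical and expanding the returned dict into the keyword arguments of the fuse.FUSE(...) call seen in the traceback is an assumption about how it could be wired in:

```python
def user_allow_other_enabled(conf_text):
    """Return True if fuse.conf text enables user_allow_other.

    Lines starting with '#' are comments; the option is a bare keyword.
    """
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        if stripped == "user_allow_other":
            return True
    return False


def mount_options(conf_text):
    """Hypothetical helper: only request allow_other when fusermount permits it."""
    return {"allow_other": True} if user_allow_other_enabled(conf_text) else {}


# On a real system the text would come from open("/etc/fuse.conf").read().
print(mount_options("# user_allow_other\nmount_max = 1000\n"))  # {}
print(mount_options("user_allow_other\n"))  # {'allow_other': True}
```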
Is there a way to disable this banner? Every time I run a command it wastes screen space:
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.6.2
Also, is there a way to disable the color codes?
For the most part I like the colors by default, but sometimes they are distracting.
I guess it depends on each person's terminal color scheme, but an option to toggle them off would be nice.
catcli ls tmptest/a
should output
tmptest/a
and
catcli ls -l tmptest/a
should output
262 Oct 14 15:11 tmptest/a
Hi,
When I run 'catcli index ...', I get an AttributeError: 'list' object has no attribute 'splitlines'
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.4.5
Traceback (most recent call last):
File "/Users/tt/.pyenv/versions/3.6.0/bin/catcli", line 11, in <module>
sys.exit(main())
File "/Users/tt/.pyenv/versions/3.6.0/lib/python3.6/site-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/Users/tt/.pyenv/versions/3.6.0/lib/python3.6/site-packages/catcli/catcli.py", line 204, in main
cmd_index(args, noder, catalog, top)
File "/Users/tt/.pyenv/versions/3.6.0/lib/python3.6/site-packages/catcli/catcli.py", line 80, in cmd_index
attr = noder.clean_storage_attr(args['--meta'])
File "/Users/tt/.pyenv/versions/3.6.0/lib/python3.6/site-packages/catcli/noder.py", line 57, in clean_storage_attr
return ', '.join(attr.splitlines())
AttributeError: 'list' object has no attribute 'splitlines'
What do I need to do?
Thanks
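The traceback shows clean_storage_attr calling .splitlines() on a list, so --meta reached it as a list rather than a string. Argument parsers such as docopt can return a list for an option that may be repeated. A hypothetical defensive version (named after the function in the traceback, but not catcli's actual code) that accepts either form:

```python
def clean_storage_attr(attr):
    """Normalize --meta into a single comma-separated string.

    Hypothetical variant: accepts None, a string, or a list of strings,
    since an argument parser may hand back a list for a repeatable option.
    """
    if attr is None:
        return ""
    if isinstance(attr, (list, tuple)):
        attr = "\n".join(str(item) for item in attr)
    return ", ".join(attr.splitlines())


print(clean_storage_attr("Some description"))        # Some description
print(clean_storage_attr(["line one", "line two"]))  # line one, line two
```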
Hi,
I tried to install catcli on Ubuntu 20.04 with Python 3.9 using pip3.
After pip3 install catcli, python3 complained:
ModuleNotFoundError: No module named 'fuse'
After pip3 install fuse (installed fuse-0.1.3), trying to index a directory resulted in
class CatcliFilesystem(fuse.LoggingMixIn, fuse.Operations): # type: ignore
AttributeError: module 'fuse' has no attribute 'LoggingMixIn'
Best wishes
Question from #38 (comment)
A question though (I can create a separate issue if you'd like).
Should the mounted directories really be writable by 'other'?
$ find mnt/my-name-for-ds-01/data -type d -ls
7 0 drwxrwxrwx mnt/my-name-for-ds-01/data
10 0 drwxrwxrwx mnt/my-name-for-ds-01/data/photos_Nikon_D800
Perhaps the question of what permissions the mounted tree should have deserves its own issue... Does it ever make sense to be able to modify something in the mounted view of the catalog?
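For comparison, here is what read-only modes would look like next to the drwxrwxrwx in the listing. This only decodes mode bits with the standard stat module; having a FUSE getattr handler report 0o555 for directories and 0o444 for files is an assumption about one possible design, not catcli's current behaviour.

```python
import stat

# Hypothetical read-only modes for a catalog view: r-x for directories,
# r-- for files, with no write bit for anyone.
READONLY_DIR_MODE = stat.S_IFDIR | 0o555
READONLY_FILE_MODE = stat.S_IFREG | 0o444

print(stat.filemode(stat.S_IFDIR | 0o777))  # drwxrwxrwx (as in the listing)
print(stat.filemode(READONLY_DIR_MODE))     # dr-xr-xr-x
print(stat.filemode(READONLY_FILE_MODE))    # -r--r--r--
```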
Currently when using the find command, the results show only the last directory of the path in which the file is located.
For example if we have /var/log/pacman.log, the find results will show only: log/pacman.log
This behaviour can make it difficult to locate files in databases with complex directory structure.
I don't remember if it's also the case with results which are just directories, as my database is currently on another machine.
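Printing the full path only requires walking parent links up to the root and joining the names. A minimal sketch with a hypothetical Node class standing in for a catalog node; anytree, which catcli depends on according to the pip log further down, exposes a similar path tuple on its nodes.

```python
class Node:
    """Minimal tree node with a parent link (hypothetical stand-in)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def full_path(node):
    """Join names from the root down to node, e.g. mystorage/var/log/pacman.log."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return "/".join(reversed(parts))


storage = Node("mystorage")
var = Node("var", storage)
log = Node("log", var)
pacman = Node("pacman.log", log)
print(full_path(pacman))  # mystorage/var/log/pacman.log
```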
Seeing the following error when running an Update:
OSError: [WinError 4393] The tag present in the reparse point buffer is invalid:
Are there files that can cause this error that I should avoid?
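The message suggests an entry backed by a reparse point (Windows uses them for symlinks, junctions and some cloud-storage placeholders); the issue does not say which file triggered it. One way an indexer could cope is to record and skip entries that raise OSError instead of aborting the whole run. A sketch with the standard os.scandir, where walk_tolerant is a hypothetical helper, not catcli's walker:

```python
import os


def walk_tolerant(top, errors):
    """Yield file paths under top, recording entries that raise OSError.

    Directories that cannot be listed and entries that cannot be inspected
    are appended to errors as (path, exception) instead of stopping the walk.
    """
    try:
        entries = list(os.scandir(top))
    except OSError as exc:
        errors.append((top, exc))
        return
    for entry in entries:
        try:
            is_dir = entry.is_dir(follow_symlinks=False)
        except OSError as exc:
            errors.append((entry.path, exc))
            continue
        if is_dir:
            yield from walk_tolerant(entry.path, errors)
        else:
            yield entry.path
```

Usage: pass an empty list as errors, consume the generator, then report whatever ended up in the list as skipped entries.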
I want to use this for external disks. Say I modify a file on my disk: how can I update the catalog? catcli complains "storage named NNNNN already exist".
By the way, both the -r (recursive) and -f (force) arguments are ignored by catcli index, despite the documentation.
After using an old version, I wanted to try the new features, so I built an updated git package, and things stopped working. I get this error when trying to index data:
File "/usr/bin/catcli", line 33, in <module>
sys.exit(load_entry_point('catcli==0.9.2', 'console_scripts', 'catcli')())
File "/usr/lib/python3.10/site-packages/catcli/__init__.py", line 12, in main
import catcli.catcli
File "/usr/lib/python3.10/site-packages/catcli/catcli.py", line 23, in <module>
from catcli.walker import Walker
File "/usr/lib/python3.10/site-packages/catcli/walker.py", line 12, in <module>
from catcli.noder import Noder
File "/usr/lib/python3.10/site-packages/catcli/noder.py", line 13, in <module>
from pyfzf.pyfzf import FzfPrompt # type: ignore
ModuleNotFoundError: No module named 'pyfzf'
Looks like the dependencies need to be updated.
I'm filing this issue here because I saw that you also maintain the AUR package, but since it hasn't been updated in quite some time, I wasn't sure I would get a response in the package comments.
Hi,
First of all, thanks for catcli. I'm planning to use catcli to check file integrity on large backup disks and was surprised to see that the update command does not offer forced hash generation and hash comparison. Is it possible to add this feature?
Regards,
Flynn
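The requested check boils down to re-hashing each file and comparing against the digest recorded at index time. A minimal sketch with the standard hashlib module; the helper names are hypothetical, and md5 as the default is an assumption (sha256 is the safer choice if the stored digests allow it). Reading in fixed-size chunks keeps memory use low on large backup files.

```python
import hashlib


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need little memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, stored_digest, algo="md5"):
    """Return True if the file still matches the digest recorded earlier."""
    return file_digest(path, algo) == stored_digest
```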
A du command, similar to du(1) with the -s option:
catcli -B du temptst/etc
3.2M /etc
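A du -s style summary amounts to summing file sizes over a subtree and printing the total with a binary suffix. A sketch under stated assumptions: nodes are plain dicts with "size" and "children" (a hypothetical layout), only leaf sizes are summed, and the formatting approximates du -h without reproducing its exact rounding rules. The sizes are illustrative.

```python
def human_size(num_bytes):
    """Format a byte count with 1024-based K/M/G/T suffixes, one decimal."""
    size = float(num_bytes)
    for unit in ("", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}" if unit else f"{int(size)}"
        size /= 1024


def du(node):
    """Total size of a subtree: sum of leaf "size" values."""
    if not node.get("children"):
        return node.get("size", 0)
    return sum(du(child) for child in node["children"])


tree = {"name": "etc", "children": [
    {"name": "fstab", "size": 902},
    {"name": "ssl", "children": [{"name": "cert.pem", "size": 3_354_522}]},
]}
print(human_size(du(tree)), tree["name"])  # 3.2M etc
```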
Not really an issue, just my impressions from trying out catcli. I like catcli enough to bother writing this:
(catcli) weberjn:~$ catcli ls tmptest/a
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.6
a [nbfiles:3, totsize:72]
- 1 [size:24]
- 2 [size:24]
- 3 [size:24]
(catcli) weberjn:~$ catcli ls tmptest/a/1
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.6
"tmptest/a/1": nothing found
(catcli) weberjn:~$ catcli -B find fstab
fstab [size:902, storage:etc]
catcli -B find '*.conf'
(catcli find conf also finds containers/registries.conf.d)
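The difference between the two lookups is substring matching versus shell-style pattern matching. Whether catcli find accepts glob patterns is not stated here; this sketch only contrasts the two behaviours with Python's fnmatch module on a few sample names.

```python
import fnmatch

names = ["fstab", "registries.conf", "registries.conf.d", "resolv.conf"]

# Substring match: "conf" may appear anywhere in the name.
print([n for n in names if "conf" in n])
# ['registries.conf', 'registries.conf.d', 'resolv.conf']

# Shell-style pattern: the whole name must end in ".conf".
print([n for n in names if fnmatch.fnmatch(n, "*.conf")])
# ['registries.conf', 'resolv.conf']
```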
I'd prefer catcli to create a separate catalog file in ~/.catcli for each index, instead of asking whether to update the catalog:
Update catalog "catcli.catalog" [y|N] ?
Why ask? Isn't it natural to update the catalog?
I miss a du command, e.g.:
catcli du -h tmptest/a
catcli without parameters should enter an interactive mode and accept commands like ls, etc. (as ftp does).
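Python's standard cmd module already provides this kind of ftp-style loop: each do_<name> method becomes a command. A toy sketch over an in-memory tree of nested dicts (hypothetical, not catcli's catalog format):

```python
import cmd


class CatalogShell(cmd.Cmd):
    """Toy ftp-style prompt over nested dicts (None marks a file)."""

    prompt = "catcli> "

    def __init__(self, tree):
        super().__init__()
        self.tree = tree
        self.cwd = []  # path components below the root

    def _current(self):
        node = self.tree
        for part in self.cwd:
            node = node[part]
        return node

    def do_ls(self, arg):
        """ls: list entries in the current directory."""
        for name in sorted(self._current()):
            print(name)

    def do_cd(self, arg):
        """cd NAME | cd ..: change directory."""
        if arg == "..":
            if self.cwd:
                self.cwd.pop()
        elif isinstance(self._current().get(arg), dict):
            self.cwd.append(arg)
        else:
            print(f"no such directory: {arg}")

    def do_quit(self, arg):
        """quit: leave the shell."""
        return True
```

Calling CatalogShell({"tmptest": {"a": {"1": None, "2": None}}}).cmdloop() starts the interactive prompt; help lists the commands from their docstrings.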
catcli has been working really well to index and list files.
However when I try to update a catalog, I get the following error: AttributeError: 'AnyNode' object has no attribute 'flag'
I'm not using a hash, and it looks like the entries do have the proper modified time.
Initially I thought it might be because I was indexing a mounted samba share, but the same thing happens when indexing/updating local files.
$ catcli update Animation ~/mount/Animation/
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.5
Traceback (most recent call last):
File "/usr/local/bin/catcli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/usr/local/lib/python3.10/dist-packages/catcli/catcli.py", line 342, in main
cmd_update(args, noder, catalog, top)
File "/usr/local/lib/python3.10/dist-packages/catcli/catcli.py", line 157, in cmd_update
cnt = walker.reindex(path, root, top)
File "/usr/local/lib/python3.10/dist-packages/catcli/walker.py", line 94, in reindex
cnt = self._reindex(path, parent, top)
File "/usr/local/lib/python3.10/dist-packages/catcli/walker.py", line 119, in _reindex
node.flag()
AttributeError: 'AnyNode' object has no attribute 'flag'
find should show the path of the found file
catcli -B find fstab
/etc/fstab [size:902, storage:etc]
First of all, I really appreciate all the work on this tool, as it helps me catalog a large collection of archive data. I gave the wrong meta-information when indexing and have not been able to correct it. I want to change the meta information in a catalog using the following command:
#catcli edit --catalog=Movies_Series.catalog 014
The command opens the vim editor, where I can modify and save (:w) the metadata. The editor is editing the file /tmp/catcli6pfb08bh.tmp, which reflects the change. When I quit the editor, the following error message is shown:
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.7.0
Traceback (most recent call last):
File "/home/lacmac/.local/bin/catcli", line 8, in <module>
sys.exit(main())
File "/home/lacmac/.local/lib/python3.8/site-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/home/lacmac/.local/lib/python3.8/site-packages/catcli/catcli.py", line 277, in main
cmd_edit(args, noder, catalog, top)
File "/home/lacmac/.local/lib/python3.8/site-packages/catcli/catcli.py", line 213, in cmd_edit
node.attr = noder.clean_storage_attr(new)
AttributeError: 'Noder' object has no attribute 'clean_storage_attr'
catcli ls shows the metadata unchanged.
I understand I could remove the entry and index the source again with the correct meta-information, but I thought it might be possible to correct it with the edit command.
Hi,
I just installed catcli with
pip3 install catcli
Collecting catcli
Downloading https://files.pythonhosted.org/packages/f1/06/0253f58a67a143c5bdbeffa8c36574e9c16f4ab0703b42682b02cab3cba4/catcli-0.5.1-py3-none-any.whl
Collecting docopt (from catcli)
Downloading https://files.pythonhosted.org/packages/a2/55/8f8cab2afd404cf578136ef2cc5dfb50baa1761b68c9da1fb1e4eed343c9/docopt-0.6.2.tar.gz
Collecting anytree (from catcli)
Downloading https://files.pythonhosted.org/packages/b9/cd/abd10f53ba136c77dd6c68aa96d9e6881b9713c4778fd8e854ff5d9787ba/anytree-2.4.3.tar.gz
Collecting psutil (from catcli)
Downloading https://files.pythonhosted.org/packages/7d/9a/1e93d41708f8ed2b564395edfa3389f0fd6d567597401c2e5e2775118d8b/psutil-5.4.7.tar.gz (420kB)
100% |████████████████████████████████| 430kB 2.2MB/s
Collecting six>=1.9.0 (from anytree->catcli)
Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Building wheels for collected packages: docopt, anytree, psutil
Running setup.py bdist_wheel for docopt ... done
Stored in directory: /home/ralf/.cache/pip/wheels/9b/04/dd/7daf4150b6d9b12949298737de9431a324d4b797ffd63f526e
Running setup.py bdist_wheel for anytree ... done
Stored in directory: /home/ralf/.cache/pip/wheels/ea/2f/04/b50f9b3761fb3c3c9c3b9ed01b2680c7ee99030f82fde1062b
Running setup.py bdist_wheel for psutil ... done
Stored in directory: /home/ralf/.cache/pip/wheels/e2/9d/ea/1913d16f19bb927c32197308dec69cd8d10b61be8f7e265524
Successfully built docopt anytree psutil
Installing collected packages: docopt, six, anytree, psutil, catcli
Successfully installed anytree-2.4.3 catcli-0.5.1 docopt-0.6.2 psutil-5.4.7 six-1.11.0
and tried to index:
catcli index --meta='my test directory' -u tmptest /var/log
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.5.1
Traceback (most recent call last):
File "/home/ralf/.local/bin/catcli", line 11, in <module>
sys.exit(main())
File "/home/ralf/.local/lib/python3.5/site-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/home/ralf/.local/lib/python3.5/site-packages/catcli/catcli.py", line 231, in main
cmd_index(args, noder, catalog, top, debug=args['--verbose'])
File "/home/ralf/.local/lib/python3.5/site-packages/catcli/catcli.py", line 84, in cmd_index
attr = noder.format_storage_attr(args['--meta'].split(','))
AttributeError: 'list' object has no attribute 'split'
What do I need to do?
Thanks for your tool!
ramon
The find command is a bit slow for me. It takes over a minute (66.30s of user time, 1:09.56 in total) to find an entry in a 1.9GB catalog:
catcli find python 66.30s user 3.09s system 99% cpu 1:09.56 total
System:
Model Name: MacBook Pro
Model Identifier: MacBookPro11,3
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
OS: High Sierra
Is it normal for it to take that much time?
A few weeks ago, I had a need similar to what catcli covers but didn't know about catcli, so I built my own indexer and a fuse filesystem for browsing an offline catalog: decoyfs. Today I came across catcli and realized things would have been different if I had found it earlier :)
I added a converter tool in decoyfs so it can load catcli catalogs and show it as a vfs. decoyfs has a quite liberal license, so feel free to steal some code if you want a fuse filesystem for catcli!
Since Python 3.3, the shutil module includes shutil.disk_usage, which is similar to psutil.disk_usage, so I believe you could easily drop the psutil dependency.
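A sketch of the standard-library replacement: shutil.disk_usage returns a named tuple (total, used, free) in bytes. Unlike psutil.disk_usage it has no percent field, so a free percentage has to be computed; the formatting below is illustrative, not catcli's output format.

```python
import shutil

usage = shutil.disk_usage(".")  # any path on the volume of interest
free_ratio = usage.free / usage.total
print(f"free:{free_ratio:.1%}, used:{usage.used} of {usage.total} bytes")
```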
Could you please add the ability to export to CSV?
We are going to use catcli to help manage the use of data lake contents ("time traveling"). The question has come up as to where the best place for the catcli.catalog file is: inside or outside the data lake?
When I have an exact copy of a directory (or file) on two separate drives (two different storages) and I search for the file using "find", the result displays only the files contained in the most recently added storage.
I don't know if this is the intended behaviour or a bug. If it is intended, I think an option to perform a "full" search would be useful to identify duplicates across different storages (I could try to implement it since I need the function).
Maybe this option already exists and I am missing something; in that case I would appreciate a "guide" on how to perform such an operation.
Thanks for the attention.
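A "full" search means collecting matches from every storage rather than stopping at one, and duplicate detection can then group entries by a key. A sketch on a hypothetical layout (storage name mapped to {path: size}); grouping by file name plus size is only a heuristic, and comparing content hashes would be more reliable.

```python
from collections import defaultdict


def find_all(storages, name):
    """Return (storage, path) for every file called name, across all storages."""
    return [(s, p) for s, files in storages.items()
            for p in files if p.rsplit("/", 1)[-1] == name]


def duplicates(storages):
    """Group (storage, path) pairs by (file name, size); keep groups of 2+."""
    groups = defaultdict(list)
    for storage, files in storages.items():
        for path, size in files.items():
            groups[(path.rsplit("/", 1)[-1], size)].append((storage, path))
    return {key: hits for key, hits in groups.items() if len(hits) > 1}


storages = {
    "disk1": {"photos/a.jpg": 1000, "docs/b.txt": 20},
    "disk2": {"backup/photos/a.jpg": 1000},
}
print(find_all(storages, "a.jpg"))
print(duplicates(storages))
```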
catcli doesn't seem to index archives if the extension is not lowercase, e.g. myfiles.zip vs. myfiles.ZIP:
catcli index -a mystorage /run/media/user/mystorage
#### not working
catcli ls -a mystorage/valet.ZIP
valet.ZIP: nothing found
#### working
catcli ls -a mystorage/valet.zip
valet.zip [size:96.6K]
- ARCHIVES.MNU [archive:valet.zip]
- FORMAT.MNU [archive:valet.zip]
- REGISTER.TXT [archive:valet.zip]
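The behaviour above is consistent with a case-sensitive extension check. A case-insensitive version only needs to lowercase the name before comparing; the extension list and the is_archive helper are assumptions for illustration, not catcli's actual list or code.

```python
# Hypothetical set of archive suffixes; str.endswith accepts a tuple.
ARCHIVE_EXTS = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def is_archive(path):
    """Case-insensitive suffix check, so valet.ZIP is treated like valet.zip."""
    return path.lower().endswith(ARCHIVE_EXTS)


print(is_archive("valet.ZIP"))  # True
print(is_archive("notes.txt"))  # False
```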
I'm new to catcli, so I'm not sure if the error message below is due to me or a bug. If it's a user error, I'd like to suggest improving the readability of the error message...
Note: I haven't gotten 'catcli' to work at all so far. I got the error the first time I tried to use catcli. The output below is from after having switched to an absolute path.
Note: I installed 'catcli' with pip3.
Any suggestions on how to troubleshoot this, or if I should first try some easier example or so?
~/playground/catcli/tc-01$ catcli index --meta="Some description" my-name-for-ds-01 /home/chr/playground/data-sets/ds-01
+-+-+-+-+-+-+
|c|a|t|c|l|i|
+-+-+-+-+-+-+ v0.9.4
Traceback (most recent call last):
File "/home/chr/.local/bin/catcli", line 8, in <module>
sys.exit(main())
File "/home/chr/.local/lib/python3.8/site-packages/catcli/__init__.py", line 13, in main
if catcli.catcli.main():
File "/home/chr/.local/lib/python3.8/site-packages/catcli/catcli.py", line 337, in main
cmd_index(args, noder, catalog, top)
File "/home/chr/.local/lib/python3.8/site-packages/catcli/catcli.py", line 125, in cmd_index
root = noder.new_storage_node(name, path, top, attr)
File "/home/chr/.local/lib/python3.8/site-packages/catcli/noder.py", line 252, in new_storage_node
return NodeStorage(name,
File "/home/chr/.local/lib/python3.8/site-packages/catcli/nodes.py", line 179, in __init__
self.size = size
AttributeError: can't set attribute
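"can't set attribute" is what Python up to 3.10 reports when code assigns to a property that has no setter (3.11+ words it as "property ... has no setter"). One plausible cause here, not confirmed in the issue, is that a base class of the node, for instance a newer anytree release, defines size as a read-only property, so self.size = size in __init__ fails. A self-contained reproduction with hypothetical class names, plus one way around it:

```python
class Base:
    """Stands in for a base class that exposes a read-only size property."""

    @property
    def size(self):
        return 0


class BrokenNode(Base):
    def __init__(self, size):
        # Raises AttributeError: the inherited property has no setter.
        self.size = size


class FixedNode(Base):
    """One possible fix: override size with a property that has a setter."""

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        self._size = value

    def __init__(self, size):
        self.size = size
```

Renaming the attribute (for example to a hypothetical nodesize) would avoid the clash as well, at the cost of touching every reader of the field.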
I don't know which shell and OS you are using, but on my system,
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Linux gigabyty 6.2.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 30 Mar 2023 14:51:14 +0000 x86_64 GNU/Linux
This example listed on the README,
echo 'something in files in a' > /tmp/test/a/{1,2,3}
does not work,
-bash: /tmp/test/a/{1,2,3}: ambiguous redirect
I suppose you can't combine ">" with {} brace expansion like that: bash reports an ambiguous redirect when the target expands to more than one word.
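This creates all three files, although bash writes hello only to file3 (each > opens and truncates its file, and the last redirection receives the output):
echo hello >file1 >file2 >file3
And with this one, only file1 gets created (containing "hello file2 file3"):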
echo hello > file1 file2 file3
Not sure if this is intentional, or I miss a parameter. If neither of these then I think it is a feature request for your consideration.
The update command updates the content of the folder, but the media information (size, free space, date, etc.) remains as it was captured during the original index.
Example:
catcli index --meta="whatever" --catalog=whatever.catalog 191 /media/user/191
catcli ls --catalog=whatever.catalog 191
storage: 191 (whatever)
nbfiles:1 ,totsize:204.8G ,free:33.2% ,du:91.1G/274.0G ,date: 2021-08-26 15:52:46
catcli update --catalog=whatever.catalog 191 /media/user/191
As a workaround, I remove the storage from the catalog and index it again. It would be nice if the media information were updated along with the date. I use these values in later scripts to easily find available storage space on backup media.
BTW, I still love this piece of software. ;-)
The CSV output is a handy feature; many thanks. However, the native format still provides some data fields that are not available from CSV formatted output:
Would you consider matching the printed data fields between native and CSV formats?
My current output in native:
storage: 079 (Audio)
nbfiles:10 | totsize:881.3G | free:5.3% | du:49.3G/931.5G | date:2021-08-21 18:38:43
The same in csv:
"079","storage","","881.3G","2021-08-21 18:38:43","",""
Ideal output for me in CSV format would also include more fields from the native format, e.g:
"079","storage","","881.3G","2021-08-21 18:38:43","","","Audio","49.3G","931.5G"
Thank you for considering.
First let me thank you for this amazing tool!
Finally I can search through all my NAS files conveniently and offline too! and damn is that indexing fast!!!!
Now I think it would be really neat if I could use fzf instead of find to fuzzy-search through the catalog.
Piping works, but with built-in support I could strip the metadata, e.g. query only the filenames.
It would also be nice to persist the fzf settings for each catalog in a config file.
The config file could also be used to schedule catcli catalog updates: in my case, I'd like to update the catalog for each of 11 folders on my NAS once every 24 hours at 4 am.
This tool is a gamechanger especially with the power of fzf.
Thank you for your consideration.
Seeing the following issue when I run tree:
Traceback (most recent call last):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts\catcli.exe\__main__.py", line 7, in <module>
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\__init__.py", line 13, in main
if catcli.catcli.main():
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\catcli.py", line 292, in main
cmd_tree(args, noder, top)
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\catcli.py", line 191, in cmd_tree
noder.print_tree(node, fmt=fmt, header=hdr)
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\noder.py", line 434, in print_tree
self._to_csv(node, with_header=header)
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\noder.py", line 442, in _to_csv
self._node_to_csv(node)
File "C:\Users\pflores\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\catcli\noder.py", line 318, in _node_to_csv
out.append(utils.epoch_to_str(node.maccess))
AttributeError: 'AnyNode' object has no attribute 'maccess'
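The failing line reads node.maccess directly, so any node stored without that field (possibly one written by a different catcli version; the issue does not say) breaks the CSV output. A defensive sketch using getattr with a default; epoch_to_str here is a hypothetical stand-in for the utils.epoch_to_str in the traceback, not its real implementation.

```python
import time
from types import SimpleNamespace


def epoch_to_str(epoch):
    """Hypothetical stand-in: format a UNIX timestamp in local time."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch))


def maccess_str(node):
    """Return the formatted maccess, or '' for nodes stored without it."""
    epoch = getattr(node, "maccess", None)
    return epoch_to_str(epoch) if epoch is not None else ""


print(repr(maccess_str(SimpleNamespace(name="old-node"))))  # ''
```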