Comments (8)

eikek commented on August 28, 2024

Thank you for reporting! Yes, it's really a bad idea to change the working dir if the program uses it to determine things… 😕. And it is really embarrassing that none of the commands in the blog post work….

The reason it works for me is that I usually use the script generated in target/bin. I should train myself to always use the artefacts that I publish.

from chee.

arvindsv commented on August 28, 2024

:) No worries.

chee is very, very nice and really well documented too. Kudos to you for that!

I was writing my own, in Clojure, using drewnoakes/metadata-extractor as well, before I found chee. I threw that away and am going to use chee for managing my photos. I'm trying to see if it'll work well with a 200GB photo database. Let's see. Thanks for writing this and putting so much thought into it.

arvindsv commented on August 28, 2024

As an aside, the only extra piece of metadata I was using was the subsecond time, which helped me sort burst images a little better sometimes.

eikek commented on August 28, 2024

Thanks for your kind words! I'm very curious about how chee copes with your big photo collection! (I currently cannot recall the size of my photos in GB, but it's below 200G for sure). I guess some issues are waiting… I hope not too many, though. I'd appreciate it if you could share any problems you face.

Also thank you for your side note, this is interesting! I don't do burst images myself, so this just didn't occur to me. I'll create a separate issue for that and start experimenting when I find some time.

By the way, my first try was also using Clojure :-) I wasn't good enough in Clojure (I'm still not) and I had to give up on that.

arvindsv commented on August 28, 2024

@eikek: You'd asked about my experience with the 200GB of images. It has been great so far! It took a while (about 2 hours) to finish that, mostly because it was on a spinning hard drive (one of my redundant backups).

Since it finished, it has been fine and quick:

# Shows a total of 27870 images in 9 seconds (very happy!)
$ time chee find --all -p '~:checksum ~#length ~:path~%' >all.photos; echo; wc -l all.photos

real	0m9.521s
user	0m12.581s
sys	0m1.245s

27870 all.photos

# Found 2184 duplicate images in the index
$ cut -f1 -d' ' all.photos  | sort | uniq -d | wc -l
2184

# Shows that it found 189GB of images
$ cut -f2 -d' ' all.photos  | awk '{ sum += $1 } END {print sum / (1024 * 1024 * 1024)}'
189.008
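
For anyone who wants to see which files the duplicates actually are, here is a minimal sketch that groups paths by checksum. It is not part of the commands above: it only assumes the same "checksum length path" layout that the `-p` pattern writes to all.photos, and the `list_duplicates` function name and the sample lines are made up for illustration.

```shell
# Hypothetical sample in the "checksum length path" layout (made-up data).
sample='aaa111 1024 /photos/2015/img 001.jpg
bbb222 2048 /photos/2015/img002.jpg
aaa111 1024 /backup/img 001.jpg
ccc333 4096 /photos/2016/img003.jpg'

# Read "checksum length path" lines on stdin and print every checksum
# that occurs more than once, followed by the paths that share it.
list_duplicates() {
  awk '{
    ck = $1
    path = $0
    sub(/^[^ ]+ [^ ]+ /, "", path)   # drop checksum and length; the path may contain spaces
    count[ck]++
    paths[ck] = paths[ck] "  " path "\n"
  }
  END {
    for (c in count)
      if (count[c] > 1)
        printf "%s (%d copies)\n%s", c, count[c], paths[c]
  }'
}

printf '%s\n' "$sample" | list_duplicates
```

Against the real file this would be `list_duplicates < all.photos`. The order in which checksums are printed is not defined by awk, so pipe through `sort` if a stable order matters.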

There were a couple of small issues. Nothing I couldn't get past easily. I'll investigate a bit more and raise an issue, if I feel it's a bug:

  1. I had placed my config in the wrong location (~/.chee) so it wasn't picking up the index when I tried to query it. Using chee info and chee config, I was able to figure that out and fix it. But the difference in speed was night and day, so it was obvious to me that the index wasn't being used.

  2. chee add -r does not like being given a symbolic link to a directory. I had to run chee add -r dir-link/* instead. Symlinks are useful because I found that my original idea, multiple .chee directories virtually showing up as one, isn't part of chee's concepts. So, I am using the root repository. This is the one I want to investigate more and see why it's happening. Sorry about bringing it up in this (now closed) issue, but I don't want to lose it. :)

  3. After 2 hours of running, it said: Added: 21973 files; Skipped: 5487 files; in 2:15:17.499. I wasn't sure what was skipped (I'm sure it's fine). The logs didn't help much, because there weren't 5487 lines of files which were skipped. My suspicion is that the default query and the query I provided both took effect and disqualified 5487 files. I just need to verify that. What is worrying is that: 21973 + 5487 = 27870, which is what the index claims it has. So, it has skipped files too. Don't know what that means, at this time.

Overall, I'm very happy! I have plans to organize my images using tags and come up with galleries based on the tags to ensure, visually, that I have the categories right, etc. chee has been very helpful. Thanks!

[EDIT: It actually adds up to 27460. I added 410 files later]
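
As a quick sanity check of the numbers in point 3 and the edit note, here is the arithmetic as a small sketch. The variable names are only for illustration; the values are the ones reported in this comment.

```shell
added=21973     # "Added" count reported by chee add
skipped=5487    # "Skipped" count reported by chee add
indexed=27870   # total lines in all.photos

processed=$((added + skipped))     # files seen by that add run
later=$((indexed - processed))     # files in the index that were not part of that run

echo "processed by that run: $processed"
echo "indexed but not in that run: $later"
```

This prints 27460 and 410, which matches the 410 files added later.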

eikek commented on August 28, 2024

@arvindsv thank you a lot for your comments!

  1. That seems strange to me. A .chee directory (and so ~/.chee) is used for “repository mode” and it should pick up .chee/chee.conf. But this is not the global config file then, which would be $HOME/.config/chee.conf. So it depends where chee is invoked. But if you created a ~/.chee directory, it should actually pick up the config there if invoked somewhere below ~.

  2. I'm not sure if I understand correctly: I guess multiple .chee directories don't work as one, because one would need to query multiple databases and combine the results (not that this is a bad thing, it's just not done). It may very well be that add doesn't like symbolic links (I remember having issues with links several times)–this would be something to fix, though. Don't worry about where to bring something up :)

  3. This seems strange to me, too. When using add, it would/should only skip files if the path is already in the index. The query and default query should not affect that: they select the files that are then either added or skipped. Files that are removed by the queries don't show up in the results at all. If using a .chee repository the path is relative, otherwise an absolute path is used. Hmm, are there maybe links involved that are resolved to equal paths…? The logs should indeed give more of a clue about what was happening. At least the stdout should list every path and whether it was skipped or added. But then it is really strange that the index has a total of 27870 if it says 5487 were skipped… I have no idea how that can happen.

Don't hesitate to raise new issues if you think there's something strange. Thanks again for this information.

arvindsv commented on August 28, 2024

Here's what I found (it's not chee's problem):

  1. You were right. I tried running chee find --all in /tmp and that didn't give me anything. That's because it had created a ~/.chee-work/ dir and was using that. I like it using ~/.chee as my root directory everywhere possible. So, I symlinked ~/.chee/chee.conf to ~/.config/chee/chee.conf. All my files are now in ~/.chee/ and chee find works everywhere using the index!

  2. Yes, I somehow thought multiple .chee directories would be aggregated into one. That would unnecessarily complicate chee anyway. Instead, I created a directory with symlinks to various locations and used that instead. I am actually going to consolidate into one directory soon. So, I won't even need to do that.

    I'll raise a separate issue for the symlink issue with chee add. I have been able to reproduce it with a simple setup.

  3. I dug into it a bit more and it was again my mistake. I found that I had done a chee add of the inner directory before (probably to verify what you say here). So, when I ran it for the bigger and outer directory, it correctly skipped those files since they were already in the index, just as you said.

eikek commented on August 28, 2024

Ah, good that it is more or less resolved. Thanks for reporting #7, that is clearly a bug.
