Better handle hard links (czkawka, open)

FransM avatar FransM commented on June 11, 2024
Better handle hard links.

from czkawka.

Comments (7)

CalunVier avatar CalunVier commented on June 11, 2024

Hard links are already ignored on Linux and macOS. Because Windows is not a Unix-like OS, there is no implementation for Windows.

CalunVier avatar CalunVier commented on June 11, 2024

I found a crate, file_id, that implements a file ID getter for Windows. I'm learning Rust and trying to add the function, but this will take some time (and may never be done).
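
For context, on Linux and macOS two paths are hard links to the same file when they share a device and inode number. A minimal shell sketch of that check follows, using the `-ef` test that bash and most `sh` implementations provide. The function name `is_hardlink_pair` is made up for illustration, and this is not a description of how czkawka itself detects hard links.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if both paths refer to the same file
# (same device and inode). Note that -ef follows symlinks, so a symlink
# and the file it points to also compare as "the same file".
is_hardlink_pair() {
    [ "$1" -ef "$2" ]
}

# Example usage:
#   is_hardlink_pair a.opus b.opus && echo "already linked"
```

A duplicate finder could use a check like this to skip pairs that already occupy the same storage.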

iconoclasthero avatar iconoclasthero commented on June 11, 2024
Usage: ln [OPTION]... [-T] TARGET LINK_NAME
In the 1st form, create a link to TARGET with the name LINK_NAME.

I don't see hardlinks (HLs) being ignored, and the way czkawka converts duplicates to HLs is not acceptable for my use.
I had a set of audio files that were encoded to Opus and archived in 2020: 4.4 GB, 1970 files.
I had a set of audio files that were encoded to Opus and archived in 2023: 11 GB, 3867 files.
The 2023 files were written with a newer version of libopus and lavf/lavc (ffmpeg), are smaller, and are preferred over the 2020 files.
The 2020 archive almost completely overlaps the 2023 archive and, as the files are generally static, hardlinks would be an easy way to eliminate 4.4 GB of duplicate data.

I considered using the czkawka-gui hardlink command, but I could not find anywhere to specify which file of each pair should be the target. I want the 2023 files to be the target (the copy I'm keeping) and the 2020 files to be replaced with HLs to the 2023 versions. "Target" is used the way the man page uses it.

In the end I used

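# For each 2020 file, find the file with the same name under $PWD
# and replace the 2020 file with a hard link to it.
while read -r oldopus
  do
    newopus="$(find "$PWD" -iname "${oldopus##*\/}")"
    echo "$newopus"
    ln -f "$newopus" "$oldopus"          # TARGET = new file, LINK_NAME = old file
    mediainfo "$oldopus"|grep Writing    # show which encoder wrote the file
    ls -l --color=always "$oldopus"|grep --color=always ' 2 '   # link count should be 2
    read -p ''< /dev/tty                 # pause for manual review
  done < <(find "${olddir}" -iname "*opus")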
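
The loop above assumes `find` returns exactly one match and that both files exist. A more defensive version of the linking step might look like the sketch below. The name `relink_to` is hypothetical, and the checks are only one reasonable choice: the function refuses to act unless both paths are regular files, and it does nothing if they are already linked.

```shell
#!/bin/sh
# Hypothetical helper: replace "$2" with a hard link to "$1" (the file to keep).
# Returns 0 if the two names end up linked, non-zero otherwise.
relink_to() {
    keep=$1
    replace=$2
    # Both paths must be existing regular files.
    [ -f "$keep" ] && [ -f "$replace" ] || return 1
    # Nothing to do if they already share an inode.
    [ "$keep" -ef "$replace" ] && return 0
    # ln -f replaces the existing "$replace" with a link to "$keep".
    ln -f -- "$keep" "$replace"
}

# Example usage:
#   relink_to "2023/book.opus" "2020/book.opus" || echo "skipped"
```

The first argument plays the role of TARGET in the `ln` usage line quoted earlier, so the caller decides which copy survives.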

When I then went back to look for duplicates in czkawka by name, they were all still there. Thinking it was a cache problem, I compiled it on another machine and looked for duplicates from that machine over NFS, and they're still there, so...

1.) Allow the user to specify which direction the HLs will be made, i.e., I want to be able to specify the TARGET.
2.) No, czkawka does not filter out HLs, at least not when the files are checked by file name...

I can't say that 100% of the duplicates are HLs—some of them did fail—but as I go back and look through the log, most of them show a 2 for the HL number.

⋮
/library/books/collections/Various Authors -- redacted/path/to.opus
Writing application                      : Lavc60.6.101 libopus
Writing library                          : Lavf60.4.100
-rw-rw-r-- 2 user group 2647551 Aug 14  2023 /library/books/collections/redacte/path/to.opus
/library/books/collections/redacted/path/to.opus
⋮

iconoclasthero avatar iconoclasthero commented on June 11, 2024

So after running my HL script, 10% of the files have not been hard linked, either because they don't have a pair in the new data or because there is some other problem. 10% of 4.4 GB is about 440 MB and I'm not tracing that down...
However, both copies are still showing in the name view!*

$ find. "*opus"|wc -l; find. "*opus" -links +1|wc -l; find. "*opus" -links +1|\du -sh
1970
1750
4.4G	.

* This is a problem because the hash never showed duplicates and now the name view is showing files, 90% of which are hardlinked to each other.

I did a bit more investigating and of the remaining ca. 220 files that are not hardlinked via the script, it looks like about 189 of them are probably duplicates and vary by e.g., punctuation:

$ find. "*why religion*"; find. "*and the good news*"
path/to/file/2020/Why Religion.opus
⋮
path/to/file/2023/Why Religion?/Why Religion?.opus
path/to/file/2020/And the Good News Is.opus
⋮
path/to/file/2023/And the Good News Is ….opus

This can probably be scripted, and 200-300 hits aren't so many that I can't individually approve the HL replacement of each 2020 archive copy, but ...
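
One way such a script could pair names that differ only in punctuation is to compare a normalized form of each file name. The sketch below is an assumption about how that might be done, not something from czkawka. The helper name `normalize_name` is made up, and it keeps only ASCII letters and digits, so names that differ in non-ASCII letters would need a different rule. Equal keys only suggest a candidate pair; the contents still need to be reviewed before linking.

```shell
#!/bin/sh
# Hypothetical helper: reduce a path to a comparison key made of the
# lowercase ASCII letters and digits of its file name.
normalize_name() {
    base=${1##*/}    # strip the directory part
    printf '%s' "$base" \
        | LC_ALL=C tr 'A-Z' 'a-z' \
        | LC_ALL=C tr -cd 'a-z0-9'
}

# Example usage:
#   [ "$(normalize_name "$old")" = "$(normalize_name "$new")" ] && echo "candidate pair"
```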

tl;dr: well, isn't this supposed to be what czkawka is doing? I identified the directory for it to search, I let it search, it found duplicates I knew were there, then I couldn't HL them the way I wanted, so I wrote a de novo script to link 90% of them, and I still can't turn up the duplicates that I know exist with czkawka... Are you interested in looking at this to see what could be done to address this use case?

FransM avatar FransM commented on June 11, 2024

Hard links are already ignored on Linux and macOS. Because Windows is not a Unix-like OS, there is no implementation for Windows.

Windows does support hard links through the https://learn.microsoft.com/en-us/windows/desktop/api/WinBase/nf-winbase-createhardlinka function.

Whether hard links are supported is more a filesystem aspect than a host OS aspect.
Most filesystems that originate from the Linux world support them.
In the Windows world, NTFS supports hard links, but FAT does not
(FAT on Linux does not support hard links either).
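
Since support depends on the filesystem, a script can simply probe the directory it is about to work in instead of guessing from the OS. The following is a minimal sketch under that assumption; `supports_hardlinks` is a hypothetical name, and the probe needs write permission in the directory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a hard link can be created inside
# directory "$1". It creates and removes a temporary probe directory.
supports_hardlinks() {
    probe_dir=$(mktemp -d "$1/.hl-probe.XXXXXX") || return 1
    : > "$probe_dir/a"
    ln "$probe_dir/a" "$probe_dir/b" 2>/dev/null
    rc=$?
    rm -rf -- "$probe_dir"
    return "$rc"
}

# Example usage:
#   supports_hardlinks /library || echo "no hard links here"
```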

iconoclasthero avatar iconoclasthero commented on June 11, 2024

The OP did not mention Windows and I did not take this thread to be about HLs on that proprietary operating system.

$ uname -a; blkid|grep sdb; lsblk|grep sdb
Linux system 6.8.4-060804-generic #202404041833 SMP PREEMPT_DYNAMIC Thu Apr  4 18:46:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
/dev/sdb: UUID="544f0615-####-4e3c-bb6b-aadc1e512efb" UUID_SUB="2a92cb0a-####-48fe-a6e1-a8c9d2bfdfec" BLOCK_SIZE="4096" TYPE="btrfs" PTTYPE="dos"
sdb      8:16   0   3.6T  0 disk /library

Ubuntu 22.04/btrfs

CalunVier avatar CalunVier commented on June 11, 2024

allow for the user to specify which direction the HLs will be made, i.e., I want to be able to specify the TARGET.

The purpose of the duplicate file finder is to find identical files (not similar files). So at this time, there is no problem with the direction of hard links. All links share the same file (even the so-called target).
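
The point that all links share one file can be checked from the shell. The sketch below uses two hypothetical helpers, `inode_of` and `link_count`, which read the inode number and the link count from `ls` output (the link count is the `2` column in the `ls -l` line quoted earlier in this thread).

```shell
#!/bin/sh
# Hypothetical helpers: print the inode number and the hard-link count of a path.
inode_of()   { ls -di -- "$1" | awk '{print $1}'; }
link_count() { ls -ld -- "$1" | awk '{print $2}'; }

# Example usage:
#   ln a.opus b.opus
#   inode_of a.opus; inode_of b.opus    # same number twice
#   link_count a.opus                   # 2
```

Once two names are linked, neither is "the original": removing either one leaves the data reachable through the other. The direction only matters at the moment of linking, when the two files still have different contents.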
