Comments (6)

jsign commented on June 3, 2024

@frank-dspeed, thanks for the details. For sure we can talk about the details, that's where the fun is. :)

First, it is true that under different DAG creation configurations you'd get a different CID (e.g., by changing the hashing algorithm, DAG layout, raw leaves, etc.), but that's unrelated to your original question, because your definition of DHT isn't correct. This is why I said I was smelling some confusion here, and I suspected you wanted to refer to another concept.

I think that in your original question, the parts about "between different IPFS nodes" or "DHT" are irrelevant or wrong, since what matters is the DAG creation configuration. To be more verbose, here are some claims:

  • Under the same DAG creation configuration (i.e., hashing algorithm, DAG layout, etc.), the resulting CID for a chunk of data is deterministic. Determinism is about generating something under the same underlying assumptions. If you change the way you do something, a different result says nothing about determinism.
  • If two different IPFS nodes use the same DAG creation configuration, the generated CID is the same. Put differently, getting different CIDs for the same data isn't caused by using different IPFS nodes, only by using different configurations. You can get different CIDs for the same data (under different configs) on the same IPFS node. Conclusion: talking about different IPFS nodes in this discussion isn't relevant.
  • Your definition of DHT is not what most people in computer science (or the IPFS ecosystem) would understand. DHT is understood as Distributed Hash Table, which is basically a distributed map.
  • Your claim that "the CID also Includes the IPFS node DHT Information" still isn't correct under your definition of DHT. The CID format doesn't include any details about the DAG layout (e.g., balanced); at most it identifies the codec, but that's another thing.
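
To make the first two claims concrete, here is a minimal sketch in Python. It is a toy model, not the real UnixFS/CID algorithm: `DagConfig` and `toy_address` are hypothetical names invented for this illustration, and the "address" is just a hash over chunk hashes. It only shows that the result depends on the data and the configuration, never on which node computes it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DagConfig:
    """Hypothetical stand-in for DAG creation settings (not a real IPFS type)."""
    hash_name: str = "sha256"
    chunk_size: int = 262144  # 256 KiB


def toy_address(data: bytes, cfg: DagConfig) -> str:
    """Toy content address: hash each chunk, then hash the concatenated digests.

    This is NOT how IPFS builds a CID; it only mimics the overall shape.
    """
    leaves = [
        hashlib.new(cfg.hash_name, data[i:i + cfg.chunk_size]).digest()
        for i in range(0, len(data), cfg.chunk_size)
    ]
    return hashlib.new(cfg.hash_name, b"".join(leaves)).hexdigest()


data = b"hello threads" * 1000

# Two "nodes" using the same configuration agree on the address.
node_a = toy_address(data, DagConfig())
node_b = toy_address(data, DagConfig())
print(node_a == node_b)  # True

# The same data under a different configuration gets a different address.
other = toy_address(data, DagConfig(hash_name="sha512"))
print(node_a == other)  # False
```

Nothing in `toy_address` takes a node identity as input, which is the point: the only inputs are the bytes and the configuration.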

my conclusion is that we can never assume that everyone is running the same version with the same settings, so we have no deduplication

In general, most people in the space use ipfs add, which has had the same default values since roughly the beginning, mostly to avoid the very problem you're mentioning. If someone changes the DAG creation configuration, they should probably know what they're doing and understand that it will change the CID of the data compared to other people just running ipfs add.

If you want to be 100% strict and say that we should clarify this in our paper by adding "under the assumption of always using the same DAG building configuration", I think that is a fair point. It is not usually spelled out every time someone talks about leveraging content hashing, since talking about content addressing always implies a stable, baked-in address creation scheme. If you have f(data) = address, I think it is fair to say nobody should expect f to change in the middle of an argument.

from go-threads.

frank-dspeed commented on June 3, 2024

@jsign you're correct, please add that part. You should not underestimate the number of people who read the paper without the prerequisite knowledge.

I think we can assume that someone who uses this software is generally not familiar with the deeper implications of content addressing.

merlinran commented on June 3, 2024

No, the CID is purely based on the file content. You can generate a CID for the content without having anything IPFS-related.
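
As a rough illustration of that last sentence, the sketch below builds a CIDv1 string with nothing but the Python standard library. It assumes the simplest case only: content small enough to be a single block, stored as a raw leaf (the "raw" codec) and hashed with sha2-256. Larger files need chunking and a UnixFS DAG, which this does not attempt. The function name `cidv1_raw_sha256` is made up for this example.

```python
import base64
import hashlib


def cidv1_raw_sha256(content: bytes) -> str:
    """Build a CIDv1 string (raw codec, sha2-256, base32) for one block of bytes."""
    digest = hashlib.sha256(content).digest()
    # Multihash: 0x12 = sha2-256 code, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    # CID: 0x01 = CID version 1, 0x55 = "raw" multicodec.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase base32: lowercase, no padding, prefixed with "b".
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32


cid = cidv1_raw_sha256(b"hello world\n")
print(cid)
```

No node, peer ID, or network state appears anywhere in the computation; the inputs are only the bytes and the fixed choices of version, codec, and hash function.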

jsign commented on June 3, 2024

As an extra question regarding:

.. the CID also Includes the IPFS node DHT Information.

Can you provide the reference where you read that? That claim isn't true.
It feels to me there might be some confusion.

frank-dspeed commented on June 3, 2024

@jsign you can verify that by creating files on different nodes, but if you want the full details, the CID depends on:

  • The hashing algorithm used (sha256 or any other).
  • The DAG format used (default “balanced”, but it can be anything; this is what I call DHT Information).
  • The chunking algorithm used (default “fixed-262144”, i.e. 256 KiB blocks, but this can and will change).
  • Whether --raw-leaves was used.
  • For large folders, whether the HAMT directory sharding option was enabled.

Using --raw-leaves (implied by --nocopy, iirc) or --inline should also change the CID (but it might depend on the file content).

my conclusion is that we can never assume that everyone is running the same version with the same settings, so we have no deduplication
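
The deduplication concern can be sketched with a toy model of block-level sharing. This is only an illustration under stated assumptions: `block_ids` is a hypothetical helper that hashes fixed-size chunks with sha256, which is not how IPFS builds its blocks, and the chunk sizes are picked for the example.

```python
import hashlib


def block_ids(data: bytes, chunk_size: int) -> set:
    """Return the set of sha256 ids of the fixed-size chunks of `data` (toy model)."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


# Deterministic 1 MiB of pseudo-content, built so that every chunk is distinct.
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(32768))

same_a = block_ids(data, 262144)  # "node A", 256 KiB chunks
same_b = block_ids(data, 262144)  # "node B", same setting
other = block_ids(data, 131072)   # "node C", 128 KiB chunks

print(len(same_a & same_b))  # 4: every block is shared
print(len(same_a & other))   # 0: no block is shared
```

In this model, blocks are shared whenever the settings match and never when the chunk size differs, so how much deduplication survives in practice depends on how many participants keep the same settings.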

jsign commented on June 3, 2024

@frank-dspeed, thanks for your feedback!
