Your paper claims that a CID is based on the file content, and that is true; it also cla…


Your Paper has a wrong definition of the CID in ipfs (go-threads issue, 6 comments, closed)

frank-dspeed commented on June 3, 2024

Your Paper has a wrong definition of the CID in ipfs

from go-threads.

Comments (6)

jsign commented on June 3, 2024

@frank-dspeed, thanks for the details. Sure, we can talk about the details; that's where the fun is. :)

First, it's true that under different DAG creation configurations you'd get a different CID (e.g., changing the hashing algorithm, DAG layout, raw leaves, etc.), but that's unrelated to your original question, because you have a definition of DHT which isn't correct. This is why I said I was smelling some confusion here; I suspected you wanted to refer to another concept.

I think that in your original question, when you said "between different IPFS nodes" or "DHT", those things are irrelevant or wrong, since what matters is the DAG creation configuration. To be more verbose, here are some claims:

Under the same DAG creation configuration (e.g., hashing algorithm, DAG layout, etc.), the resulting CID for a chunk of data is deterministic. Determinism means generating something under the same underlying assumptions; if you change the way you do something, that is unrelated to determinism.
If two different IPFS nodes use the same DAG creation configuration, they generate the same CID. Put differently, different CIDs for the same data come from different configurations, not from different IPFS nodes: you can get different CIDs for the same data (under different configs) on the same IPFS node. Conclusion: talking about different IPFS nodes isn't relevant to this discussion.
Your definition of DHT is not what most people in computer science (or the IPFS ecosystem) would understand. DHT is understood as Distributed Hash Table, which is basically a distributed map.
Your claim that the CID also includes the IPFS node's DHT information still isn't correct under your definition of DHT. The CID format doesn't include any details about the DAG layout (e.g., balanced); at most it includes "codecs", but that's another thing.
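To make the last two claims concrete, here is a minimal Go sketch (standard library only) of how a CIDv1 for a single raw block is put together: a version, a codec code and a multihash of the bytes, behind a multibase prefix. The constant values (0x01 for CIDv1, 0x55 for the raw codec, 0x12 for sha2-256, "b" for base32 lower) are taken from the multiformats tables; the function name cidV1Raw is hypothetical. It only covers content that fits in one block and ignores UnixFS/dag-pb, so treat it as an illustration, not a replacement for a real CID library.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"encoding/binary"
	"fmt"
	"strings"
)

// Codes from the multiformats tables (assumed for this sketch).
const (
	cidVersion1 = 0x01 // CIDv1
	codecRaw    = 0x55 // "raw": the block is just the bytes themselves
	hashSHA256  = 0x12 // "sha2-256" multihash function code
)

// appendUvarint appends v as an unsigned varint, the encoding multiformats use for codes.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// cidV1Raw returns the base32 CIDv1 of data treated as one raw block.
// Its only inputs are the data and fixed choices (version, codec, hash);
// nothing about the node, its peers or the DHT goes into it.
func cidV1Raw(data []byte) string {
	digest := sha256.Sum256(data)
	buf := appendUvarint(nil, cidVersion1)
	buf = appendUvarint(buf, codecRaw)
	buf = appendUvarint(buf, hashSHA256)          // multihash: function code
	buf = appendUvarint(buf, uint64(len(digest))) // multihash: digest length
	buf = append(buf, digest[:]...)               // multihash: digest
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	return "b" + strings.ToLower(enc.EncodeToString(buf)) // "b" = multibase base32 lower
}

func main() {
	a := cidV1Raw([]byte("hello"))
	b := cidV1Raw([]byte("hello"))
	fmt.Println(a)
	fmt.Println(a == b) // same data, same configuration: same CID
}
```

Nothing in the computation refers to a node, a peer or the DHT, and the same bytes always give the same string. For a file small enough to fit in one chunk, the result is expected to match what ipfs add --cid-version=1 reports, though this sketch has not been checked against a specific release.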

My conclusion is that we can never assume that everyone is running the same version with the same settings, so we have no deduplication.

In general, most people in the space use ipfs add, which has had the same default values since roughly the beginning, mostly to avoid the same problem you're mentioning. If someone changes the DAG creation configuration, they should probably know what they're doing and understand that it will change the CID of the data relative to other people who just run ipfs add.

If you want to be 100% strict and say that we should clarify our paper by adding "under the assumption of always using the same DAG building configuration", I think that's a fair point. It isn't usually spelled out every time someone talks about leveraging content hashing, since content addressing always implies a stable, baked-in address creation scheme. If you have f(data) = address, I think it's fair to say nobody should expect f to change in the middle of an argument.
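The f(data) = address framing can be made explicit by passing the configuration in as an argument. Below is a minimal Go sketch with a hypothetical Address(cfg, data) function in which the only configurable part is the hash function (sha2-256 vs sha2-512, multihash codes 0x12 and 0x13); real DAG-building options such as chunking, layout and leaf format are deliberately left out, and Config is an illustrative stand-in, not an IPFS type.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/base32"
	"encoding/binary"
	"fmt"
	"strings"
)

// HashFunc pairs a multihash function code with the Go function that computes it.
type HashFunc struct {
	Code uint64
	Sum  func([]byte) []byte
}

var (
	// 0x12 = sha2-256 and 0x13 = sha2-512 in the multihash code table.
	SHA256Hash = HashFunc{Code: 0x12, Sum: func(d []byte) []byte { s := sha256.Sum256(d); return s[:] }}
	SHA512Hash = HashFunc{Code: 0x13, Sum: func(d []byte) []byte { s := sha512.Sum512(d); return s[:] }}
)

// Config is a hypothetical, heavily reduced stand-in for a "DAG creation
// configuration": here only the hash function can vary.
type Config struct {
	Hash HashFunc
}

// appendUvarint appends v using the unsigned varint encoding multiformats use.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// Address plays the role of f in f(data) = address, with the configuration
// made explicit. Output: a base32 CIDv1 using the raw codec (0x55).
func Address(cfg Config, data []byte) string {
	digest := cfg.Hash.Sum(data)
	buf := appendUvarint(nil, 1)                  // CID version 1
	buf = appendUvarint(buf, 0x55)                // raw codec
	buf = appendUvarint(buf, cfg.Hash.Code)       // multihash: function code
	buf = appendUvarint(buf, uint64(len(digest))) // multihash: digest length
	buf = append(buf, digest...)                  // multihash: digest
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	return "b" + strings.ToLower(enc.EncodeToString(buf))
}

func main() {
	data := []byte("same bytes")
	a := Address(Config{Hash: SHA256Hash}, data)
	b := Address(Config{Hash: SHA512Hash}, data)
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(a == b) // false: the data did not change, f did
}
```

Holding cfg fixed gives the same address every time; swapping the hash yields a different CID for identical bytes, which is the "different configuration" case rather than a property of any particular node.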


frank-dspeed commented on June 3, 2024

@jsign, you're correct; add that part. You should not underestimate the number of people without the prerequisite knowledge who read the paper.

I think we can assume that someone who uses this software is, in general, not familiar with the deep implications of content addressing.


merlinran commented on June 3, 2024

No, a CID is purely based on the file content. You can generate a CID for content without anything IPFS-related.


jsign commented on June 3, 2024

As an extra question regarding:

.. the CID also Includes the IPFS node DHT Information.

Can you provide the reference where you read that? That claim isn't true.
It feels to me there might be some confusion.
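One way to check what a CID actually carries is to take one apart. The Go sketch below decodes a base32 CIDv1 string into its fields; parseCIDv1 and CIDParts are hypothetical names, only the "b" multibase prefix is handled, validation is minimal, and the example string is believed to be the raw-codec, sha2-256 CIDv1 of empty input.

```go
package main

import (
	"encoding/base32"
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// CIDParts holds every field a CIDv1 carries.
type CIDParts struct {
	Version  uint64
	Codec    uint64 // content codec, e.g. 0x55 raw, 0x70 dag-pb
	HashCode uint64 // multihash function code, e.g. 0x12 sha2-256
	Digest   []byte
}

// parseCIDv1 decodes a base32-lower ("b"-prefixed) CIDv1 string.
func parseCIDv1(s string) (CIDParts, error) {
	if !strings.HasPrefix(s, "b") {
		return CIDParts{}, errors.New("only the base32 lower ('b') multibase is handled here")
	}
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	raw, err := enc.DecodeString(strings.ToUpper(s[1:]))
	if err != nil {
		return CIDParts{}, err
	}
	// Four varints in a row: version, codec, hash code, digest length.
	var fields [4]uint64
	for i := range fields {
		v, n := binary.Uvarint(raw)
		if n <= 0 {
			return CIDParts{}, errors.New("malformed varint")
		}
		fields[i] = v
		raw = raw[n:]
	}
	if uint64(len(raw)) != fields[3] {
		return CIDParts{}, errors.New("digest length mismatch")
	}
	return CIDParts{Version: fields[0], Codec: fields[1], HashCode: fields[2], Digest: raw}, nil
}

func main() {
	// Believed to be the CIDv1 (raw codec, sha2-256) of zero-length input.
	p, err := parseCIDv1("bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("version=%d codec=0x%x hash=0x%x digest=%x\n", p.Version, p.Codec, p.HashCode, p.Digest)
}
```

The decoded fields are a version, a content codec, a hash function code and a digest; there is no slot for node, peer or DHT information.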


frank-dspeed commented on June 3, 2024

@jsign, you can verify that by creating files on different nodes, but if you want the full details:

The hashing algorithm used (sha256 or any other).
The DAG format used (default “balanced”, but it can be anything; this is what I call DHT information).
The chunking algorithm used (default “fixed-262144”, i.e. 256 KiB blocks, but it can and will change).
Whether --raw-leaves was used.
For large folders, whether the HAMT directory sharding option was enabled.

Using --raw-leaves (implied by --nocopy, iirc) or --inline should also change the CID (but it might depend on the file content).
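The chunking parameter in the list matters because it decides which byte ranges become leaf blocks. Below is a minimal Go sketch of a fixed-size splitter; splitFixed and leafDigests are illustrative names, and the sketch stops at hashing the leaves instead of building the dag-pb root node that would link them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// splitFixed cuts data into consecutive chunks of at most size bytes, the idea
// behind a fixed-size chunker (e.g. 262144 bytes = 256 KiB).
func splitFixed(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil // guard against an infinite loop
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// leafDigests returns the hex sha2-256 digest of each chunk. In a real IPFS
// file these would be the hashes of the leaf blocks the root node links to.
func leafDigests(chunks [][]byte) []string {
	out := make([]string, 0, len(chunks))
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		out = append(out, hex.EncodeToString(sum[:]))
	}
	return out
}

func main() {
	data := make([]byte, 600000) // about 586 KiB of sample content
	for i := range data {
		data[i] = byte(i % 251)
	}
	for _, size := range []int{262144, 1048576} { // 256 KiB vs 1 MiB chunks
		leaves := leafDigests(splitFixed(data, size))
		fmt.Printf("chunk size %d: %d leaves, first leaf %s...\n", size, len(leaves), leaves[0][:16])
	}
}
```

With 600,000 bytes of input, 256 KiB chunks give three leaves while 1 MiB chunks give one, so the leaf hashes, and therefore any root built on top of them, differ even though the content is identical.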

My conclusion is that we can never assume that everyone is running the same version with the same settings, so we have no deduplication.


jsign commented on June 3, 2024

@frank-dspeed, thanks for your feedback!
