Comments (29)

versmisse commented on May 7, 2024

I found the problem: git_tree.entries uses two different comparison functions to order its entries, entry_search_cmp and entry_sort_cmp.

entry_search_cmp compares names alphabetically, but entry_sort_cmp is more sophisticated (directories and files are not ordered the same way), so the entries are sorted with one order and searched with another.
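As an illustration of the two orders, here is a minimal C sketch. It assumes a simplified entry type (a name plus a directory flag) instead of libgit2's real tree entry, and `path_cmp` and `alpha_cmp` are hypothetical names: the first follows git's rule that a directory compares as if its name ended in '/', the second is a plain strcmp.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style ordering: a directory compares as if its name ended in '/'. */
static int path_cmp(const void *x, const void *y)
{
    const struct entry *a = x, *b = y;
    size_t la = strlen(a->name), lb = strlen(b->name);
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a->name, b->name, n);

    if (cmp != 0)
        return cmp;
    /* Byte after the common prefix; an exhausted name yields '/' (dir) or NUL. */
    unsigned char ca = n < la ? (unsigned char)a->name[n] : (a->is_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b->name[n] : (b->is_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* Plain alphabetic ordering that ignores the entry type entirely. */
static int alpha_cmp(const void *x, const void *y)
{
    const struct entry *a = x, *b = y;
    return strcmp(a->name, b->name);
}
```

Sorting {"foo" (directory), "foo.c", "foo-bar"} with qsort and path_cmp yields foo-bar, foo.c, foo, because '-' (0x2D) < '.' (0x2E) < '/' (0x2F); alpha_cmp yields foo, foo-bar, foo.c. A vector sorted with one order and binary-searched with the other can therefore miss entries that are present.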

So I see a few possible solutions:
1- Sort the list alphabetically before searching (problem: git_vector_bsearch2 sorts the vector itself first).
2- Split entry_search_cmp into two functions, entry_search_file_cmp and entry_search_dir_cmp, use gitfo_cmp_path, and search for a file first and then for a directory.
3- Maybe we can always (everywhere) use an alphabetic sort?
4- Another, better solution...

I made this patch:

versmisse@8ee19a2

to test the solution, and it works fine with git_tree_entry_byname. But it is not very clean (I change a private member, _cmp) and it is incomplete (we must find a solution for every bsearch2 call in the code).

from libgit2.

chobie commented on May 7, 2024

I also encountered this problem. I'm looking forward to a fix for it :)

sakari commented on May 7, 2024

I stumbled onto this too, and wrote a test case for it before kind souls pointed me towards this issue. Is anybody working on this? The fix in versmisse/libgit2@8ee19a2 makes my test case sakari/libgit2@8af9b61 pass. It also fixes my original failing case in Haskell.

There is still one case using insert, remove and entry_byname that fails even with versmisse's commit, or maybe it is just the late hour... If it turns out to be a real failure, I'll turn it into a test case for libgit2.

sakari commented on May 7, 2024

Yes, the other test failure I got is caused by the same problem, this time with git_treebuilder_remove (test: sakari/libgit2@30c1d26).

vmg commented on May 7, 2024

I had totally missed this. Fixing asap.

carlosmn commented on May 7, 2024

What's the status of this?

sakari commented on May 7, 2024

The last I heard, tanoku was going to fix this asap, so I did not start working on it. But maybe I should have, as I seem to recall that the problem is still present in the 0.13 release, and it really bothers me.

Sakari

On Tue, Jul 12, 2011 at 6:45 PM, carlosmn wrote:

What's the status of this?

Reply to this email directly or view it on GitHub:
#127 (comment)

vmg commented on May 7, 2024

Working on this now.

vmg commented on May 7, 2024

Paging everyone: does this fix make sense at all? I think I have transcended reality and the bytes no longer make sense to me.

761aa2a

jdavid commented on May 7, 2024

Not fixed yet. A user has found a failing case with the Linux kernel Git repository; see libgit2/pygit2#38.

I have translated it to C. Here is the test program:

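#include <git2.h>
#include <stdio.h>

#define REPO "/home/jdavid/sandboxes/linux/.git"
#define TREE "bc18386d999a4f652df5dd61bf0ed5c38e698085"
#define NAME "i2c"

int main(void)
{
    int err;
    git_repository *repo = NULL;
    git_oid tree_id;
    git_object *tree = NULL;
    const git_tree_entry *entry;

    git_libgit2_init(); /* current libgit2 requires this before any other call */

    /* Stop at the first error instead of using an unset repo or tree */
    if ((err = git_repository_open(&repo, REPO)) < 0 ||
        (err = git_oid_fromstr(&tree_id, TREE)) < 0 ||
        (err = git_object_lookup(&tree, repo, &tree_id, GIT_OBJ_TREE)) < 0) {
        fprintf(stderr, "error %d while loading the tree\n", err);
        goto cleanup;
    }

    /* "i2c" is a directory in this tree, so this lookup should find it */
    entry = git_tree_entry_byname((git_tree *)tree, NAME);
    printf("%s %s\n", NAME, (entry == NULL ? "not found" : "found"));

cleanup:
    git_object_free(tree);
    git_repository_free(repo);
    git_libgit2_shutdown();
    return err < 0 ? 1 : 0;
}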

sigmaris commented on May 7, 2024

I had a little time this weekend to look for the cause of the bug. It seems to be caused by the assumption in 761aa2a that the path we're searching for is of the same kind (file or directory) as the tree entry it is compared with, which produces different orderings when searching than when sorting. For example, when comparing the directory "i2c" with the file "i2c.h" in the sort phase, git_futils_cmp_path knows the first is a directory and treats it as if it were named "i2c/"; since '.' (0x2E) sorts before '/' (0x2F), this produces the ordering:

i2c.h
i2c

But when comparing the name we're searching for, "i2c", with the file "i2c.h" in the search phase, it assumes "i2c" is not a directory either, so the shorter name sorts first:

i2c
i2c.h

so the binary search will fail. I guess that when sorting the tree for the purpose of doing a bsearch, it should always be sorted in plain alphabetic order, but the special treatment for directories seems to be required in other cases to follow the behaviour of Git exactly (see 35786cb).
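The failure can be reproduced with a small C model (hypothetical code, not libgit2's): entries are sorted in git order with `path_cmp`, and `find` binary-searches them with a key whose type is assumed rather than known. Searched as a file, "i2c" compares below both "i2c.h" and the directory "i2c", so bsearch never reaches the directory entry.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style ordering: a directory compares as if its name ended in '/'.
 * Used both to sort the entries and to binary-search them. */
static int path_cmp(const void *x, const void *y)
{
    const struct entry *a = x, *b = y;
    size_t la = strlen(a->name), lb = strlen(b->name);
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a->name, b->name, n);

    if (cmp != 0)
        return cmp;
    unsigned char ca = n < la ? (unsigned char)a->name[n] : (a->is_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b->name[n] : (b->is_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* Binary search for NAME, *assuming* its type is ASSUME_DIR.
 * Returns NULL when no entry compares equal under that assumption. */
static const struct entry *find(const struct entry *v, size_t n,
                                const char *name, int assume_dir)
{
    struct entry key = { name, assume_dir };
    return bsearch(&key, v, n, sizeof *v, path_cmp);
}
```

With a tree holding Kconfig, i2c.h and the directory i2c, find(tree, 3, "i2c", 0) returns NULL while find(tree, 3, "i2c", 1) returns the directory entry.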

itroot commented on May 7, 2024

Sorry, but this is not fixed yet and needs to be reopened, right?

vmg commented on May 7, 2024

Apologies, I missed this. Looking into it (again)

carlosmn commented on May 7, 2024

If the bug still exists, then yes, it should be reopened.

Is there any reason why we can't use git.git's sorting function everywhere?

itroot commented on May 7, 2024

I think there are only two solutions:

  1. On every search, we must know what we are searching for: a blob or a tree (a tree can also contain a commit, in the case of submodules).
  2. We always perform two searches, one for a tree and one for a blob. This could also be optimized into a single search that splits at the end.
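Option 2 can be sketched on the same kind of hypothetical model (`find_any` is an illustrative name, not a libgit2 function): search once treating the name as a blob, and only if that misses, search again treating it as a tree. Each pass is an ordinary O(log n) binary search over entries already sorted in git order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style ordering: a directory compares as if its name ended in '/'. */
static int path_cmp(const void *x, const void *y)
{
    const struct entry *a = x, *b = y;
    size_t la = strlen(a->name), lb = strlen(b->name);
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a->name, b->name, n);

    if (cmp != 0)
        return cmp;
    unsigned char ca = n < la ? (unsigned char)a->name[n] : (a->is_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b->name[n] : (b->is_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* One binary search with the key's type fixed to AS_DIR. */
static const struct entry *find(const struct entry *v, size_t n,
                                const char *name, int as_dir)
{
    struct entry key = { name, as_dir };
    return bsearch(&key, v, n, sizeof *v, path_cmp);
}

/* Option 2: try NAME as a blob (file) first, then as a tree (directory). */
static const struct entry *find_any(const struct entry *v, size_t n,
                                    const char *name)
{
    const struct entry *e = find(v, n, name, 0);
    return e != NULL ? e : find(v, n, name, 1);
}
```

The cost is at most two binary searches, and the caller no longer needs to know the entry type.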

The assumption in entry_search_cmp that the search key is of the same type as the entry, i.e. passing entry->attr for both sides:

int result = git_futils_cmp_path(
    ksearch->filename, ksearch->filename_len, entry->attr & 040000,
    entry->filename, entry->filename_len, entry->attr & 040000);

is wrong.

erikvanzijst commented on May 7, 2024

Any progress on this? I'm kind of blocked on this issue.

vmg commented on May 7, 2024

I'll jump into this this afternoon again.

itroot commented on May 7, 2024

How was the jump? :-)

vmg commented on May 7, 2024

Alright, after some more thought, I decided to go with @itroot's first solution: letting the library know what kind of object we're looking for. Instead of adding an explicit is_folder argument to the API, I've changed the search callback internally: if you're looking for an entry called i2c, and that entry is a folder, just call git_tree_entry_byname(tree, "i2c/"). That should do it.
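A sketch of the trailing-slash convention on a hypothetical model (not the actual libgit2 callback): a key ending in '/' has the slash stripped and is searched as a directory, anything else is searched as a file. The comparator takes explicit lengths so the stripped key does not have to be copied; `lookup_path` and `struct key` are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

/* Hypothetical search key: the name without any trailing '/', plus its type. */
struct key {
    const char *name;
    size_t len;
    int is_dir;
};

/* Git-style comparison on explicit lengths: a directory compares as name + '/'. */
static int cmp_path(const char *a, size_t la, int a_dir,
                    const char *b, size_t lb, int b_dir)
{
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a, b, n);

    if (cmp != 0)
        return cmp;
    unsigned char ca = n < la ? (unsigned char)a[n] : (a_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b[n] : (b_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* bsearch passes the key first and an array element second. */
static int key_cmp(const void *k, const void *e)
{
    const struct key *key = k;
    const struct entry *ent = e;
    return cmp_path(key->name, key->len, key->is_dir,
                    ent->name, strlen(ent->name), ent->is_dir);
}

/* "i2c/" finds only a directory named i2c; "i2c" finds only a file. */
static const struct entry *lookup_path(const struct entry *v, size_t n,
                                       const char *path)
{
    size_t len = strlen(path);
    struct key key = { path, len, 0 };

    if (len > 0 && path[len - 1] == '/') {
        key.len = len - 1;   /* drop the marker... */
        key.is_dir = 1;      /* ...and search for a directory */
    }
    return bsearch(&key, v, n, sizeof *v, key_cmp);
}
```

This keeps a single binary search per lookup, at the price of requiring the caller to know the type up front.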

I'm looking for feedback, in particular cases where you don't know beforehand whether the entry you're looking for is a folder or a normal file. I think this would be extremely rare, and the only ways to handle it would be two binary searches or a single linear search, which at first strikes me as the better option in 90% of cases...

erikvanzijst commented on May 7, 2024

I don't think that's so rare. I have that scenario right now and my workaround for this bug does a linear search when git_tree_entry_byname() doesn't find anything.

vmg commented on May 7, 2024

Can you give me some background on what you are trying to accomplish?

erikvanzijst commented on May 7, 2024

I'm writing an SCM-independent library that uses libgit2, and the path strings are provided by the caller. My API does not require the user to supply an entity type.

jdavid commented on May 7, 2024

Use case, Git:

$ git log foobar

That's the input provided by the user, and it must work whether foobar is a directory or a file.

The question is whether libgit2 should provide a function for this or whether it should be implemented at a higher level. In my opinion it would be better to have a libgit2 function; otherwise we will end up with an implementation in every binding (rugged, pygit2, etc.).

Regarding implementation, I prefer two binary searches over a linear search, because it scales. Beyond that it is a matter of testing: you could, for instance, check the length of the tree object and choose one strategy or the other based on that.

What about something like this:

/* the caller knows it is a folder */
git_tree_entry_byname(tree, "foobar", GIT_DIR);
/* the caller knows it is a file */
git_tree_entry_byname(tree, "foobar", GIT_FILE);
/* the caller does not know */
git_tree_entry_byname(tree, "foobar", GIT_DIR | GIT_FILE);

So if the caller knows, you get the best performance. If the caller does not know, it still works.
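One way the flag proposal could be combined with the idea of choosing a strategy by tree size, sketched on a hypothetical model: LOOKUP_FILE and LOOKUP_DIR stand in for the proposed GIT_FILE and GIT_DIR flags, and LINEAR_MAX is an arbitrary, unbenchmarked cut-off. Small trees get a single linear scan; larger ones get one binary search per requested type.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

enum { LOOKUP_FILE = 1, LOOKUP_DIR = 2 };  /* hypothetical stand-ins for GIT_FILE / GIT_DIR */
#define LINEAR_MAX 8                       /* hypothetical cut-off, not benchmarked */

/* Git-style ordering: a directory compares as if its name ended in '/'. */
static int path_cmp(const void *x, const void *y)
{
    const struct entry *a = x, *b = y;
    size_t la = strlen(a->name), lb = strlen(b->name);
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a->name, b->name, n);

    if (cmp != 0)
        return cmp;
    unsigned char ca = n < la ? (unsigned char)a->name[n] : (a->is_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b->name[n] : (b->is_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* One binary search, treating NAME as a directory or as a file. */
static const struct entry *find(const struct entry *v, size_t n,
                                const char *name, int as_dir)
{
    struct entry key = { name, as_dir };
    return bsearch(&key, v, n, sizeof *v, path_cmp);
}

/* Look up NAME among the entry types selected by FLAGS (V sorted in git order). */
static const struct entry *lookup(const struct entry *v, size_t n,
                                  const char *name, int flags)
{
    const struct entry *e = NULL;

    if (n <= LINEAR_MAX) {
        /* Small tree: a single pass checking both name and type. */
        for (size_t i = 0; i < n; i++) {
            int type = v[i].is_dir ? LOOKUP_DIR : LOOKUP_FILE;
            if ((flags & type) && strcmp(v[i].name, name) == 0)
                return &v[i];
        }
        return NULL;
    }
    /* Larger tree: one O(log n) search per requested type, file first. */
    if (flags & LOOKUP_FILE)
        e = find(v, n, name, 0);
    if (e == NULL && (flags & LOOKUP_DIR))
        e = find(v, n, name, 1);
    return e;
}
```

A caller that knows the type passes one flag and pays for one search; a caller that does not passes both.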

jdavid commented on May 7, 2024

I'd just like to add that a linear search would be enough for me today; like @erikvanzijst, that's my current workaround. The important thing is that it works.

carlosmn commented on May 7, 2024

Often, the caller can't know that a particular path refers to a file, not even by looking at the working tree. They can have a good idea, but they cannot know. If the user says they're interested in "foobar/", you know they only want a directory called foobar, so it shouldn't match a file; but if the user only types "foobar", it needs to match whether it's a file or a directory.

I believe the best solution is to use two linear searches like this:

  • use the path as-is; return if found
  • if the path has a trailing slash
    • return failure
  • Otherwise
    • try to find a directory with that name.

vmg commented on May 7, 2024

28c1451

Check it out yo

Check out this fix, yo. Two-pass binary-linear hybrid. Nobody finding any issues?
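The thread does not describe what 28c1451 does internally, so the following is only one plausible reading of a "two-pass binary-linear hybrid", written on a hypothetical model and assuming carlosmn's rules mean that a trailing '/' selects directories only and that otherwise a file is preferred over a directory. Pass one binary-searches for the first entry not less than the bare name; pass two scans forward through the entries sharing that prefix. This works because, in git order, every name starting with the searched bytes sits in one contiguous run, with the file "name" at its start and the directory "name/" after siblings such as "name-x" or "name.c".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree entry: a name plus a directory flag. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style comparison on explicit lengths: a directory compares as name + '/'. */
static int cmp_path(const char *a, size_t la, int a_dir,
                    const char *b, size_t lb, int b_dir)
{
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a, b, n);

    if (cmp != 0)
        return cmp;
    unsigned char ca = n < la ? (unsigned char)a[n] : (a_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b[n] : (b_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}

/* V[0..N) must be sorted in git order. Returns NULL if nothing matches. */
static const struct entry *lookup_hybrid(const struct entry *v, size_t n,
                                         const char *path)
{
    size_t len = strlen(path);
    int dir_only = len > 0 && path[len - 1] == '/';

    if (dir_only)
        len--;                       /* search for the bare name */

    /* Pass 1 (binary): first entry that is not less than the bare name. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmp_path(v[mid].name, strlen(v[mid].name), v[mid].is_dir,
                     path, len, 0) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Pass 2 (linear): walk the run of names that start with the bare name. */
    for (size_t i = lo; i < n; i++) {
        size_t elen = strlen(v[i].name);
        if (elen < len || memcmp(v[i].name, path, len) != 0)
            break;                   /* left the run */
        if (elen == len) {
            /* A file sorts before the directory, so it wins unless dir_only. */
            if (v[i].is_dir || !dir_only)
                return &v[i];
        } else if ((unsigned char)v[i].name[len] > '/') {
            break;                   /* already past "name/" in git order */
        }
    }
    return NULL;
}
```

The early exit on a byte greater than '/' is what keeps the linear pass short: once the scan is past "name/", no exact match can follow.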

nulltoken commented on May 7, 2024

@tanoku Works very nicely from a binding angle!

jdavid commented on May 7, 2024

The test programs work, bravo!

vmg commented on May 7, 2024

Most riveting! Looks like we can close this for good then...

Until it breaks again. :)
