Comments (29)
I found the problem: git_tree.entries uses 2 different comparison functions on its entries:
entry_search_cmp and entry_sort_cmp
entry_search_cmp compares names alphabetically, but entry_sort_cmp is more sophisticated (directories and files are not sorted the same way).
So I see 3 solutions:
1- Sort the list alphabetically before making the search (problem: git_vector_bsearch2 itself performs a sort)
2- Split entry_search_cmp into 2 functions, entry_search_file_cmp and entry_search_dir_cmp, both using gitfo_cmp_path, and search first for a file and then for a directory.
3- Maybe we can always (everywhere) use an alphabetic sort???
4- Another, better solution, ...
I made this patch:
to test the solution and it works fine with git_tree_entry_byname. But it is not very clean (I change a private member, _cmp) and it is incomplete (we must find a solution for every bsearch2 call in the code).
from libgit2.
I also encountered this problem. I'm looking forward to a fix :)
from libgit2.
I stumbled onto this also. I created a test case for this before kind souls pointed me towards this issue. Is anybody working on this? The fix in versmisse/libgit2@8ee19a2 corrects my test case sakari/libgit2@8af9b61. That also fixes my original failing case in Haskell.
There is still one case using insert, remove and entry_byname which fails even with versmisse's commit, or maybe it is just the late hour... If it turns out to be a real failure, I will turn it into a test case for libgit2.
from libgit2.
Yes, the other test failure I got is a result of the same problem, but now with git_treebuilder_remove (test sakari/libgit2@30c1d26).
from libgit2.
I had totally missed this. Fixing asap.
from libgit2.
What's the status of this?
from libgit2.
Last I heard about this was that tanoku was going to fix this asap. I did not start work on it due to this. But maybe I should have, as I seem to recall that the problem is still present in the 0.13 release and it really bothers me.
Sakari
On Tue, Jul 12, 2011 at 6:45 PM, carlosmn wrote:
What's the status of this?
from libgit2.
Working on this now.
from libgit2.
Paging everyone: does this fix make sense at all? I think I have transcended reality and the bytes no longer make sense to me.
from libgit2.
Not yet fixed. A user has found a broken case with the Linux kernel Git repo, see libgit2/pygit2#38
I have translated it to C. Here is the test program:
#include <git2.h>
#include <stdio.h>
#define REPO "/home/jdavid/sandboxes/linux/.git"
#define TREE "bc18386d999a4f652df5dd61bf0ed5c38e698085"
#define NAME "i2c"
int main(void)
{
    int err;
    git_repository *repo;
    git_oid tree_id;
    git_object *tree;
    const git_tree_entry *entry;
    /* Stop at the first failing call instead of ignoring err. */
    err = git_repository_open(&repo, REPO);
    if (err < 0)
        return 1;
    err = git_oid_fromstr(&tree_id, TREE);
    if (err < 0)
        return 1;
    err = git_object_lookup(&tree, repo, &tree_id, GIT_OBJ_TREE);
    if (err < 0)
        return 1;
    /* Look up the entry by name and report whether it was found. */
    entry = git_tree_entry_byname((git_tree*)tree, NAME);
    printf("%s %s\n", NAME, (entry == NULL ? "not found" : "found"));
    return 0;
}
from libgit2.
I had a little time this weekend to look for the cause of the bug. It seems to be the assumption made in 761aa2a that the path we're searching for is of the same kind (file or directory) as the tree entry it is compared with; this yields a different ordering when searching than when sorting. For example, when comparing the directory "i2c" with the file "i2c.h" in the sort phase, git_futils_cmp_path will know the first is a directory, treat it as if it were named "i2c/", and so produce this ordering:
i2c.h
i2c
but when comparing the name we're searching for, "i2c", with the file "i2c.h" in the search phase, it will assume "i2c" is not a directory and so order it as:
i2c
i2c.h
so the binary search will fail. I guess that when sorting the tree for the purpose of doing a bsearch, it should always be sorted in plain alphabetic order, but the special treatment for directories seems to be required in other cases to follow the behaviour of Git exactly (see 35786cb).
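The mismatch can be reproduced outside libgit2. The following is a minimal sketch, not libgit2 code: struct entry, cmp_path, sort_cmp and buggy_lookup are hypothetical names, and the three-entry tree (including i2c-x.h) is made up for illustration. cmp_path follows the rule described above (a directory compares as if its name ended in "/"), while buggy_lookup gives the search key the kind of whatever entry it is being compared with, mirroring the assumption from 761aa2a.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tree entry; not libgit2's git_tree_entry. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style comparison: a directory compares as if its name ended in '/'. */
static int cmp_path(const char *a, int a_dir, const char *b, int b_dir)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;
    int c = memcmp(a, b, n);
    unsigned char ca, cb;

    if (c != 0)
        return c;
    /* One name is a prefix of the other: compare the next byte. */
    ca = la > n ? (unsigned char)a[n] : (a_dir ? '/' : '\0');
    cb = lb > n ? (unsigned char)b[n] : (b_dir ? '/' : '\0');
    return (int)ca - (int)cb;
}

/* Sort phase: each entry is compared using its own kind. */
static int sort_cmp(const void *pa, const void *pb)
{
    const struct entry *a = pa, *b = pb;
    return cmp_path(a->name, a->is_dir, b->name, b->is_dir);
}

/* Search phase with the flawed assumption: the key borrows the entry's kind. */
static const struct entry *buggy_lookup(struct entry *v, size_t n,
                                        const char *name)
{
    size_t lo = 0, hi = n;

    assert(v != NULL);
    qsort(v, n, sizeof *v, sort_cmp);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp_path(name, v[mid].is_dir, v[mid].name, v[mid].is_dir);
        if (c == 0)
            return &v[mid];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

With the entries i2c (directory), i2c.h and i2c-x.h (files), the sort yields i2c-x.h, i2c.h, i2c. Searching for "i2c" first probes i2c.h, decides the key sorts before it, and never reaches the directory, so buggy_lookup returns NULL even though the entry exists.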
from libgit2.
I'm sorry but this is not yet fixed and must be reopened, right?
from libgit2.
Apologies, I missed this. Looking into it (again)
from libgit2.
If the bug still exists, then yes, it should be reopened.
Is there any reason why we can't use git.git's sorting function everywhere?
from libgit2.
I think there are only 2 solutions:
- On every search we must know what we are searching for: a blob or a tree (a commit can also appear in a tree, in the case of submodules).
- We always perform 2 searches, one for a tree and one for a blob. This could also be optimized into a single search that splits at the end.
The assumption made in entry_search_cmp, that the search key is of the same kind as the entry (entry->attr is used for both sides of the comparison), is wrong:
int result =
git_futils_cmp_path(
ksearch->filename, ksearch->filename_len, entry->attr & 040000,
entry->filename, entry->filename_len, entry->attr & 040000);
from libgit2.
Any progress on this? I'm kind of blocked on this issue.
from libgit2.
I'll jump into this this afternoon again.
from libgit2.
How was the jump? :-)
from libgit2.
Alright, after some more thought, I decided to go with @itroot's first solution: letting the library know what kind of object we're looking for. Instead of adding an explicit is_folder argument to the API, I've changed the search callback internally: if you're looking for an entry called i2c, and that entry is a folder, then just call git_tree_entry_byname(tree, "i2c/"). That should cut it.
I'm looking for some feedback, i.e. cases where you don't know beforehand if the entry you're looking for is a folder or a normal file. I think this would be extremely rare, and the only way to handle this would be the two binary searches, or probably a single linear search, which at first strikes me as more optimal on 90% of the cases...
from libgit2.
I'm looking for some feedback, i.e. cases where you don't know beforehand if the entry you're looking for is a folder or a normal file. I think this would be extremely rare, and the only way to handle this would be the two binary searches, or probably a single linear search, which at first strikes me as more optimal on 90% of the cases...
I don't think that's so rare. I have that scenario right now and my workaround for this bug does a linear search when git_tree_entry_byname() doesn't find anything.
from libgit2.
Can you give me some background on what you are trying to accomplish?
from libgit2.
I'm writing an SCM-independent library that uses libgit2, and the path strings are provided by the caller. My API does not require the user to supply an entity type.
from libgit2.
Use case, Git:
$ git log foobar
That's the input provided by the user, and it must work whether foobar is a directory or a file.
The question is whether libgit2 should provide a function for this or whether it should be implemented at a higher level. In my opinion it would be better if there were a libgit2 function; otherwise we will end up with a separate implementation in every binding (rugged, pygit2, etc.)
Regarding implementation, I prefer two binary searches over a linear search, because it scales. Then it is about testing, you could for instance check the length of the tree object and choose one strategy or the other based on that.
What about something like this :
/* the caller knows it is a folder */
git_tree_entry_byname(tree, "foobar", GIT_DIR);
/* the caller knows it is a file */
git_tree_entry_byname(tree, "foobar", GIT_FILE);
/* the caller does not know */
git_tree_entry_byname(tree, "foobar", GIT_DIR | GIT_FILE);
So if the caller knows you get the best performance. If the caller does not know, it still works.
from libgit2.
Just would like to add. A linear search would be enough for me today, like @erikvanzijst that's my current workaround. The important thing is that it works.
from libgit2.
So if the caller knows you get the best performance. If the caller does not know, it still works.
Often, the caller can't know that a particular path refers to a file. Not even if they look at the working tree. They can have a good idea, but not know. If the user says they're interested in "foobar/", you know they're only interested in a directory called foobar, so it won't match if it's a file; but if the user only types "foobar", it needs to match whether it's a file or a directory.
I believe the best solution is to use two linear searches like this:
- use the path as-is; return if found
- if the path has a trailing slash
  - return failure
- otherwise
  - try to find a directory with that name.
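One way to read the steps above is sketched below. This is only an illustration under stated assumptions, not the fix that went into libgit2: struct entry, cmp_path, search_kind and lookup are hypothetical names, entries are assumed to be stored without a trailing slash and already sorted in Git order, and each pass is done as a binary search with the key's kind stated explicitly (a path ending in "/" is treated as a directory-only request).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree entry; not libgit2's git_tree_entry. */
struct entry {
    const char *name;
    int is_dir;
};

/* Git-style comparison: a directory compares as if its name ended in '/'. */
static int cmp_path(const char *a, size_t la, int a_dir,
                    const char *b, size_t lb, int b_dir)
{
    size_t n = la < lb ? la : lb;
    int c = memcmp(a, b, n);
    unsigned char ca, cb;

    if (c != 0)
        return c;
    ca = la > n ? (unsigned char)a[n] : (a_dir ? '/' : '\0');
    cb = lb > n ? (unsigned char)b[n] : (b_dir ? '/' : '\0');
    return (int)ca - (int)cb;
}

/* Binary search where the caller states whether the key is a directory. */
static const struct entry *search_kind(const struct entry *v, size_t n,
                                       const char *name, size_t len,
                                       int is_dir)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp_path(name, len, is_dir,
                         v[mid].name, strlen(v[mid].name), v[mid].is_dir);
        if (c == 0)
            return &v[mid];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}

/* v must already be sorted in Git order (the order cmp_path defines). */
static const struct entry *lookup(const struct entry *v, size_t n,
                                  const char *path)
{
    size_t len = strlen(path);
    const struct entry *e;

    assert(v != NULL);
    if (len > 0 && path[len - 1] == '/')
        return search_kind(v, n, path, len - 1, 1); /* directory only */
    e = search_kind(v, n, path, len, 0);            /* first pass: file */
    if (e == NULL)
        e = search_kind(v, n, path, len, 1);        /* second pass: directory */
    return e;
}
```

The second pass only runs when the first one misses, so a caller who asks for a file that exists pays for a single search, and a caller who passes "foobar/" never matches a file named foobar.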
from libgit2.
Check out this fix yo. Two-pass binary-linear hybrid. Nebody finding any issues?
from libgit2.
Nebody finding any issues?
@tanoku Works very nicely from a binding angle!
from libgit2.
The test programs work, bravo!
from libgit2.
Most riveting! Looks like we can close this for good then...
Until it breaks again. :)
from libgit2.