curiosity-ai / hnsw-sharp

C# library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

License: MIT License

C# 100.00%
ann approximate-nearest-neighbor-search csharp netcore dotnet embeddings word2vec

hnsw-sharp's People

Contributors

azure-pipelines[bot], jelmerk, microsoftopensource, msftgits, theolivenbaum, wlou

hnsw-sharp's Issues

Consider allowing either M or LevelLambda to be specified with the other one computed

When using this library, it seems like you often need to specify both M and LevelLambda, but the two values are typically related.

It would be nice if you could just set one or the other and the default would follow. For example:

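class Parameters
{
        // Backing fields stay null until the caller sets a value explicitly.
        private int? _m;
        private double? _levelLambda;

        public int M
        {
            // Prefer an explicit M, then a value derived from LevelLambda, then the default.
            get => this._m is { } m ? m
                : this._levelLambda is { } levelLambda ? (int)Math.Round(Math.Exp(1 / levelLambda), MidpointRounding.AwayFromZero)
                : 10; // default
            set => this._m = value;
        }

        public double LevelLambda
        {
            // Prefer an explicit LevelLambda, otherwise derive it from M.
            get => this._levelLambda ?? 1 / Math.Log(this.M);
            set => this._levelLambda = value;
        }
}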
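
For reference, the relation the two getters above encode can be written out as a formula. This is only a sketch of the arithmetic, assuming LevelLambda is meant to be the level normalization factor from the HNSW paper (written \lambda here, a symbol not used in the library itself):

```latex
% Assumption: \lambda stands for LevelLambda and M for the M parameter.
\begin{align*}
\lambda &= \frac{1}{\ln M}
&&\Longleftrightarrow&
M &= e^{1/\lambda} \\[4pt]
% Worked values, rounded to four decimals:
M = 10 &:\quad \lambda = \frac{1}{\ln 10} \approx 0.4343 \\
M = 16 &:\quad \lambda = \frac{1}{\ln 16} \approx 0.3607 \\
M = 32 &:\quad \lambda = \frac{1}{\ln 32} \approx 0.2885
\end{align*}
```

Going the other way, a LevelLambda of 0.4343 gives e^{1/0.4343}, which is about 9.9997, so the proposed getter would round it back to M = 10.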

Tests failing

Looks like all the tests in SmallWorldTest.cs fail. I tried increasing the K parameter from 20 to 30 and now some of them pass reliably, but some still fail.

It'd be nice if the examples in the README were working, you know?

Like for example:

  1. new SmallWorld() takes more than one argument (and explaining the IProvideRandomValues stuff wouldn't go amiss...)
  2. There's no BuildGraph() function

At least the search example works. A pity I had to waste a lot of time on the initialization.

Is it an unwritten rule in ML-related fields to not document or comment anything, anywhere, ever? (MS's SPTAG is even worse in that regard.)

Self-links, duplicate edges in node 0

I'm not 100% sure this is a bug, but I don't understand why any node in the graph would want self-links or duplicate edges.

Here's a unit test which demonstrates this:

    [TestMethod]
    public void TestGraphDoesNotHaveSelfLinksOrDuplicateEdges()
    {
        var items = SmallVectors().Take(100)
            .Select((v, i) => (Id: (ulong)i, Vector: v))
            .ToArray();

        var graph = new SmallWorld<(ulong Id, float[] Vector), float>(
            distance: (a, b) => CosineDistance.NonOptimized(a.Vector, b.Vector),
            generator: new DeterministicGenerator(),
            parameters: new()
        );
        graph.AddItems(items);

        var underlyingGraph = (Graph<(ulong Id, float[] Vector), float>)graph.GetType()
            .GetField("Graph", BindingFlags.Instance | BindingFlags.NonPublic)
            .GetValue(graph);

        List<(int From, int To, int Layer, string Type)> badEdges = new();
        for (var i = 0; i < underlyingGraph.GraphCore.Nodes.Count; ++i)
        {
            var node = underlyingGraph.GraphCore.Nodes[i];
            for (var l = 0; l <= node.MaxLayer; ++l)
            {
                var connections = node[l];
                foreach (var group in connections.GroupBy(c => c))
                {
                    if (group.Key == i) { badEdges.Add((i, group.Key, l, "self-link")); }
                    if (group.Count() > 1) { badEdges.Add((i, group.Key, l, "duplicate")); }
                }
            }
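        }

        // Any self-link or duplicate edge collected above fails the test.
        Assert.AreEqual(0, badEdges.Count);
    }

    private IEnumerable<float[]> SmallVectors()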
    {
        Random random = new(Seed: 1234);
        while (true)
        {
            var vector = new float[10];
            for (var j = 0; j < vector.Length; ++j)
            {
                vector[j] = random.NextSingle();
            }
            yield return vector;
        }
    }

    internal sealed class DeterministicGenerator : IProvideRandomValues
    {
        private readonly Random _random = new(Seed: 12345);

        public bool IsThreadSafe => false;

        public int Next(int minValue, int maxValue) => this._random.Next(minValue, maxValue);

        public float NextFloat() => this._random.NextSingle();

        public void NextFloats(Span<float> buffer)
        {
            for (var i = 0; i < buffer.Length; ++i)
            {
                buffer[i] = this.NextFloat();
            }
        }
    }

Package Performance

Hi,

Do you have performance estimates for this code?
For example, I ran it on the GIST data (dim = 960, ef = 200, M = 32) and it takes about 3 hours to create the graph.

Maybe I am doing something wrong, although there are only a few parameters and not much room to make a mistake here.

I'm not sure this is the right place to submit this query, but I would be very happy to hear from you.

Thanks,
Samer

namespace HnswIndex is not found error in Visual Studio

Hi, thank you for publishing the HNSW.Net library.
I am trying to build a program using HNSW.Net, but I get an error saying that the namespace HnswIndex<> is not found.
How do I fix this error? Please advise.
I don't know whether I made a mistake when installing the HNSW.Net package from NuGet.

Best regards.
