hae's People

Contributors

eeroel

hae's Issues

Idea: support caching embedding vectors

Currently embeddings are computed on the fly for the input text. This is OK for one-off searches or ones that take a couple of seconds max, but can be awkwardly slow when searching a long text (such as a book) repeatedly. It would be nice to speed up such use cases, for example by having an option to cache the embeddings from the previous run.
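One way to picture the caching idea is a small on-disk store keyed by a hash of each text chunk, so that only chunks not seen in a previous run are embedded again. The sketch below is illustrative only: `embed_with_cache`, `chunk_key`, the JSON cache file, and the `embed` callback are hypothetical names, not part of hae.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List

Vector = List[float]


def chunk_key(text: str) -> str:
    """Stable cache key: SHA-256 of the chunk's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    chunks: List[str],
    embed: Callable[[str], Vector],  # hypothetical: stands in for the model call
    cache_path: Path,                # hypothetical: where the cache is stored
) -> List[Vector]:
    """Return one vector per chunk, calling `embed` only for unseen chunks."""
    cache: Dict[str, Vector] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    dirty = False
    for chunk in chunks:
        key = chunk_key(chunk)
        if key not in cache:
            cache[key] = embed(chunk)
            dirty = True

    # Rewrite the cache file only when something new was computed.
    if dirty:
        cache_path.write_text(json.dumps(cache), encoding="utf-8")

    return [cache[chunk_key(chunk)] for chunk in chunks]
```

A real implementation would also need to invalidate the cache when the model or the chunking changes, for example by including a model identifier in the key.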

Automate build

The complete build pipeline should be as follows:

  • Download the ONNX Runtime distribution
  • Download the tokenizer and model
  • Convert the model to ONNX
  • Update the tokenizer to remove the truncation and padding parameters
  • Convert the tokenizer and model into hexdumps
  • Compile
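Two of the steps above are simple enough to sketch in a few lines. The code below is a rough illustration and not the project's build script. It assumes the tokenizer is a JSON file with top-level `truncation` and `padding` keys (as in Hugging Face `tokenizers` files), and that "hexdump" means a C array in the style of `xxd -i`. Both points are assumptions, and the function names are hypothetical.

```python
import json


def strip_truncation_and_padding(tokenizer_json: str) -> str:
    """Disable truncation and padding in a tokenizer config.

    Assumes top-level "truncation" and "padding" keys; this layout is an
    assumption, not something stated in the issue.
    """
    config = json.loads(tokenizer_json)
    config["truncation"] = None
    config["padding"] = None
    return json.dumps(config, ensure_ascii=False)


def to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Render bytes as a C array definition, similar to `xxd -i` output."""
    lines = []
    for start in range(0, len(data), per_line):
        row = ", ".join(f"0x{b:02x}" for b in data[start:start + per_line])
        lines.append("  " + row)
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )
```

The generated text can be written to a header and compiled in, which is one way to embed the model and tokenizer in the executable.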

Broken Example

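The example in the README does not work on macOS (v0.1.4 ARM64 build from releases). The hae command prints nothing, even though grep confirms that the relevant passage is in the input:

$ man ls | ./hae "how to show file sizes" -n 1 -hl
$ man ls | grep -C 5 -E 'the following information'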
       By default, ls lists one entry per line to standard output; the exceptions are to terminals or when the -C or -x options are specified.

       File information is displayed with one or more ⟨blank⟩s separating the information associated with the -i, -s, and -l options.

   The Long Format
       If the -l option is given, the following information is displayed for each file: file mode, number of links, owner name, group name, number of bytes in the file, abbreviated month, day-of-month file was last modified,
       hour  file  last  modified, minute file last modified, and the pathname.  If the file or directory has extended attributes, the permissions field printed by the -l option is followed by a '@' character.  Otherwise, if
       the file or directory has extended security information (such as an access control list), the permissions field printed by the -l option is followed by a '+' character.  If the -% option is given, a '%' character fol‐
       lows the permissions field for dataless files and directories, possibly replacing the '@' or '+' character.

       If the modification time of the file is more than 6 months in the past or future, and the -D or -T are not specified, then the year of the last modification is displayed in place of the hour and minute fields.

Feat: support reading from input file

Currently only standard input is supported. Reading from a file should also be supported with syntax similar to grep (one input file should be enough to start with though).
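The grep-like behaviour could look roughly like the sketch below: a required query followed by an optional file, falling back to standard input when no file (or `-`) is given. This only illustrates the argument handling, using Python's `argparse`. It does not reflect hae's actual option parser, and `build_parser` and `open_input` are hypothetical names.

```python
import argparse
import sys
from typing import Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    """Parser for `hae QUERY [FILE]`, mirroring grep's `PATTERN [FILE]`."""
    parser = argparse.ArgumentParser(prog="hae")
    parser.add_argument("query", help="search query")
    parser.add_argument(
        "file",
        nargs="?",  # optional: zero or one input file
        help="input file; standard input is read when omitted or '-'",
    )
    return parser


def open_input(path: Optional[str]) -> TextIO:
    """Return the stream to search: the named file, or stdin as a fallback."""
    if path is None or path == "-":
        return sys.stdin
    return open(path, "r", encoding="utf-8")
```

With this layout, piping into the tool without a file argument keeps working unchanged, while a call such as `hae "how to show file sizes" ls.txt` would read from the file.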

Build onnxruntime from source

The idea would be to build onnxruntime as a static library, so the tool could be packaged as a self-contained executable.
