A streamlined and user-friendly library designed for performing local LLM inference directly through your preferred programming language. This library efficiently loads LLMs in GGUF format into CPU or GPU memory, utilizing a CUDA backend for enhanced processing speed.
- Download Infero and extract it to your preferred location.
- Acquire a GGUF model from Hugging Face that is compatible with llama.cpp. See MODELS.txt for supported models.
- Infero uses CUDA for enhanced performance on supported GPUs. In the absence of a CUDA-enabled GPU, computation falls back to the CPU. Ensure the model size does not exceed your available memory (VRAM for GPU inference, system RAM for CPU inference).
- Consult the `installdir\examples` directory for demonstrations of integrating Infero with your programming language.
- Include the following DLLs in your project distribution: `cublas64_12.dll`, `cublasLt64_12.dll`, `cudart64_12.dll`, `llama.dll`, and `Infero.dll`.
- The Infero API can be used from any programming language that supports Win64 and Unicode, with out-of-the-box support for Pascal (Delphi/FreePascal) and C/C++ (C++Builder, Visual Studio 2022).
- Ship-ready DLLs are included in the repository; however, if you need to rebuild `Infero.dll`, Delphi 12.1 is required.
- This project is developed using RAD Studio 12.1 on Windows 11, on an Intel Core i5-12400F (2.5 GHz, 6 cores/12 logical) with 36 GB RAM and an NVIDIA RTX 3060 GPU with 12 GB VRAM.
- We encourage testing and welcome pull requests.
- If you find this project beneficial, please consider starring the repository, sponsoring, or promoting it. Your support is invaluable and highly appreciated.
Delphi example:

```pascal
uses
  System.SysUtils,
  Infero;

begin
  // init
  if not Infero_Init('config.json', nil) then
    Exit;
  try
    // add messages
    Infero_AddMessage(ROLE_SYSTEM, 'You are a helpful AI assistant');
    Infero_AddMessage(ROLE_USER, 'What is AI?');

    // do inference
    if Infero_Inference('phi3', 1024, nil, nil, nil) then
      begin
        // success
      end
    else
      begin
        // error
      end;
  finally
    Infero_Quit();
  end;
end.
```
C/C++ example:

```c
#include <Infero.h>

int main()
{
  // init config
  Infero_InitConfig("config.json", NULL);

  // add messages
  Infero_AddMessage(ROLE_SYSTEM, "You are a helpful AI assistant");
  Infero_AddMessage(ROLE_USER, "What is AI?");

  // do inference
  if (Infero_Inference("phi3", 1024, NULL, NULL, NULL))
  {
    // success
  }
  else
  {
    // error
    Infero_Quit();
    return 1;
  }

  Infero_Quit();
  return 0;
}
```
Demo videos: `Infero01.mp4`, `Infero02.mp4`
Our development motto:
- We will not release products that are buggy or incomplete, and we will not add new features at the expense of fixing underlying issues.
- We will strive to fix issues found with our products in a timely manner.
- We will maintain an attitude of quality over quantity for our products.
- We will establish a great rapport with users/customers through communication, transparency, and respect, always encouraging feedback to help shape the direction of our products.
- We will be decent, fair, remain humble and committed to the craft.
Infero is a community-driven project created by tinyBigGAMES LLC.
BSD-3-Clause license.

Core developers:
Infero couldn't have been built without the help of wonderful people and great software already available from the community. Thank you!
Software
People
- John Claw
- Robert Jalarvo