This repository contains a collection of Rust command-line utilities that streamline text processing tasks with Large Language Models (LLMs) through APIs such as KoboldAI and Ollama, plus a few supporting tools. Each utility serves a specific purpose and can be used independently or combined with the others.
The koboldai_summarization_cli utility is one of the main projects in this repository. It condenses large text documents into concise summaries through KoboldAI's API, with parameters customizable via JSON.
The ollama_summarization_cli utility is the latest addition to the main projects. Like its KoboldAI counterpart, it generates document summaries, in this case through the Ollama API.
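As a rough illustration of what a summarization request can look like, the sketch below builds a non-streaming JSON body for Ollama's `/api/generate` endpoint, which accepts `model`, `prompt`, and `stream` fields. It is not taken from ollama_summarization_cli: the function names, the prompt wording, and the hand-written JSON escaping are assumptions chosen to keep the example free of external crates. In practice serde would normally handle the serialization and reqwest the HTTP call.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a non-streaming request body for Ollama's `/api/generate` endpoint.
/// The prompt wording is a placeholder (assumption), not the utility's actual prompt.
fn build_summary_request(model: &str, document: &str) -> String {
    let prompt = format!("Summarize the following text:\n\n{}", document);
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(&prompt)
    )
}
```

The returned string would be sent as the body of a POST request to a running Ollama server. The model name passed in is whatever model is installed locally.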
The repository also includes several auxiliary utilities to complement the main projects:
The json_text_merger utility merges text entries from JSON files into a single text file. It parses structured JSON data, sorts the text entries by the numeric values in their filenames, and merges them into one cohesive text document.
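The numeric ordering matters because a plain string sort would place `part_10` before `part_2`. The following is a minimal sketch of that idea, assuming the entries have already been parsed into `(filename, text)` pairs. The function names, the "first run of digits" rule, and the newline separator are illustrative assumptions, not the utility's actual implementation.

```rust
/// Return the first run of ASCII digits in `name` as a number, if any.
fn numeric_key(name: &str) -> Option<u64> {
    let digits: String = name
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    // An empty or overflowing digit run yields None.
    digits.parse().ok()
}

/// Sort `(filename, text)` entries by numeric key and join the texts.
/// Entries without a usable number sort last, keeping their relative order
/// because `sort_by_key` is stable.
fn merge_entries(mut entries: Vec<(String, String)>) -> String {
    entries.sort_by_key(|(name, _)| numeric_key(name).unwrap_or(u64::MAX));
    entries
        .iter()
        .map(|(_, text)| text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

With entries named `part_10.json`, `part_2.json`, and `part_1.json`, the merged text follows the order 1, 2, 10.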
The noscribe_transcript_extractor utility converts HTML files generated by noScribe into transcript text files. It extracts script text and corresponding start and end times from HTML files, facilitating easy conversion for further processing.
The transcript-splitter utility divides transcript text files into smaller parts based on a specified maximum number of tokens per split. It aims to simplify handling and processing of large transcript files by breaking them down into manageable chunks.
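The splitting step can be sketched as follows. This example treats each whitespace-separated word as one token, which is only an approximation of how an LLM tokenizer counts. The function name `split_by_tokens` and the word-based counting are assumptions for illustration, and the real utility may count tokens differently.

```rust
/// Split `text` into chunks of at most `max_tokens` whitespace-separated words.
/// Words stand in for tokens here (assumption); original line breaks are not kept.
fn split_by_tokens(text: &str, max_tokens: usize) -> Vec<String> {
    assert!(max_tokens > 0, "max_tokens must be at least 1");
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_tokens)
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Every chunk except possibly the last holds exactly `max_tokens` words, and an empty input produces no chunks. A variant that preserves speaker turns or timestamps would split on line boundaries instead of individual words.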
- Installation: Clone this repository and compile the utilities using Cargo.
- Execution: Run each utility with the command-line arguments described in its README file.
- Integration: Use the utilities individually or in combination to suit specific text processing requirements.
Each utility may have its own set of dependencies, listed in its README file. Common dependencies include serde for JSON serialization and deserialization, reqwest for HTTP requests, and structopt for command-line argument parsing.
The project is licensed under the MIT License. See the LICENSE file in each utility's directory for details.
For detailed usage instructions and examples, refer to the README files in each utility's directory.