microsoft / sammo Goto Github PK

View Code? Open in Web Editor NEW

301.0 9.0 19.0 2.86 MB

A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)

License: MIT License

Python 100.00%

llms prompt-engineering prompt-tuning

sammo's Introduction

SAMMO (📘User Guide)

A flexible, easy-to-use library for running and optimizing prompts for Large Language Models (LLMs).

How to Get Started

Go to the user guide for examples, how-tos, and API reference.

Just want to have a quick look? Try the live demo on Binder.

Install library only

pip install sammo

Install and run tutorials

Prerequisites

Python 3.11+

The following commands will install sammo and jupyter and launch jupyter notebook. It's recommended that you create and activate a virtualenv prior to installing packages.

pip install sammo jupyter

# clone sammo to a local directory
git clone https://github.com/microsoft/sammo.git
cd sammo

# launch jupyter notebook and open tutorials directory
jupyter notebook --notebook-dir docs/tutorials

Use Cases

SAMMO is designed to support

Efficient data labeling: Supports minibatching by packing and parsing multiple datapoints into a single prompt.
Prompt prototyping and engineering: Re-usable components and prompt structures to quickly build and test new prompts.
Instruction optimization: Optimize instructions to do better on a given task.
Prompt compression: Compress prompts while maintaining performance.
Large-scale prompt execution: parallelization and rate-limiting out-of-the-box so you can run many queries in parallel and at scale without overwhelming the LLM API.

It is less useful if you want to build

Interactive, agent-based LLM applications (→ check out AutoGen)
Interactive, production-ready LLM applications (→ check out LangChain)

Example

This is extending the chat dialog example from Guidance by running queries in parallel.

runner = OpenAIChat(model_id="gpt-3.5-turbo", api_config=API_CONFIG)
expert_names = GenerateText(
    Template(
        "I want a response to the following question:"
        "{{input}}\n"
        "Name 3 world-class experts (past or present) who would be great at answering this? Don't answer the question yet."
    ),
    system_prompt="You are a helpful and terse assistant.",
    randomness=0,
    max_tokens=300,
)

joint_answer = GenerateText(
    "Great, now please answer the question as if these experts had collaborated in writing a joint anonymous answer.",
    history=expert_names,
    randomness=0,
    max_tokens=500,
)

questions = [
    "How can I be more productive?",
    "What will AI look like in 10 years?",
    "How do we end world hunger?",
]
print(Output(joint_answer).run(runner, questions))

Licence

This project is licensed under MIT.

Authors

SAMMO was written by Tobias Schnabel.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

sammo's People

Contributors

Stargazers

Watchers

Forkers

onkarbhardwaj dearborn-open-ai latteliem showjiangnan springtian1982 hs-heddy art55japan ekoziol skaiphd hmzdtz liuchaoxd jxzhangjhu david-liu nalin879 evdcush t-schn toubat juanmackie ysz2862351242-ysz

sammo's Issues

Change from 4 number version to 3 number

Currently our version numbers look like "0.1.0.6". This proposal is to make the next version number "0.1.1". 3 number versions are conventional for Python, and are a bit easier to work with using some tooling such as poetry version (not a major issue).

I think for a project of this size, we'll be okay to just use semantic versioning with conventional major, minor and patch releases.

Add guidelines/workflow for contributors

Request to add guidelines/workflow somewhere in README or CONTRIBUTING.md - For example, how to replicate the environment, pre-commit hooks, formatting/linting (e.g., black/flake8) that's expected for a PR to be accepted.

Fail the build pipeline if we attempt to publish with a mismatched tag and pyproject.toml

ensure that the git tag and version string in pyproject.toml are consistent

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

CONTRIBUTING.md should be included in docs site

How to modify the number of concurrent requests to AI (perhaps the number of threads)?

question

I want to modify the number of concurrent requests to AI, what should I do?

background

I want to connect Zhipu glm-3-turbo model. When I execute the following script, it prompts that the number of API concurrency is too high. zhipu limits the concurrency of my account to 5, so I want to modify the number of concurrent requests for AI in the current project (perhaps called the number of threads) to be compatible with the excessive concurrency I encountered.

Phenomenon:

09:01:20,43: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:20,120: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:21,172: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:21,237: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:22,66: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:22,221: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:23,251: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}
09:01:23,288: sammo.runners.RetriableError: Server error: 429 {'error': {'code': '1302', 'message': '您当前使用该API的并发数过高，请降低并发，或联系客服增加限额。'}}

code

from sammo.search import BeamSearch

from sammo.mutators import BagOfMutators, InduceInstructions, Paraphrase
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import LLMResult,Costs
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os
from sammo.utils import serialize_json

from sammo.instructions import MetaPrompt, Section, Paragraph, InputData
from sammo.dataformatters import PlainFormatter
from sammo.search_op import one_of
from sammo import PROMPT_LOGGER_NAME
import logging

prompt_logger = logging.getLogger(PROMPT_LOGGER_NAME)




class ZhiPuAIChat(OpenAIChat):
    # 'https://open.bigmodel.cn/api/paas/v4/chat/completions
    BASE_URL = "https://open.bigmodel.cn/api/paas/v4"
    SUFFIX = "/chat/completions"

    async def generate_text(
        self,
        prompt: str,
        max_tokens: int | None = None,
        randomness: float | None = 0.01,
        seed: int = 0,
        priority: int = 0,
        system_prompt: str | None = None,
        history: list[dict] | None = None,
        json_mode: bool = False,
    ) -> LLMResult:
        """Calls the chat endpoint of the OAI model.

        Args:
            prompt: The user prompt.
            max_tokens: The maximum number of tokens to generate. If not set, corresponds to maximum
            available tokens.
            randomness: The randomness to use when generating tokens.
            seed: When using randomness, use this seed for local reproducibility (achieved by caching).
            priority: The priority of the request (used for throttling).

        Returns:
            Dictionary with keys "data" (the generated text), "cost" (the number of tokens used),
            and "retries" (the number of retries).
        """
        messages = []
        if system_prompt is not None:
            messages = [{"role": "system", "content": system_prompt}]
            if history:
                history = [x for x in history if x["role"] != "system"]
        if history is not None:
            messages = messages + history

        # check for images in prompt
        revised_prompt = self._post_process_prompt(prompt)
        messages += [{"role": "user", "content": revised_prompt}]

        request = dict(messages=messages, max_tokens=self._max_context_window or max_tokens, temperature=0.1)
        if json_mode:
            request["response_format"] = {"type": "json_object"}
        fingerprint = serialize_json({"seed": seed, "generative_model_id": self._equivalence_class, **request})

        return await self._execute_request(request, fingerprint, priority)

    def _to_llm_result(self, request: dict, json_data: dict, fingerprint: str | bytes) -> LLMResult:
        request_text = request["messages"][-1]["content"]
        prompt_logger.debug(f"\n\n\nAPI call:\n{request_text}\n->\n\n{json_data['choices'][0]['message']['content']}")
        return LLMResult(
            json_data["choices"][0]["message"]["content"],
            history=request["messages"] + [json_data["choices"][0]["message"]],
            costs=self._extract_costs(json_data),
            request_text=request["messages"][-1]["content"],
        )

    def _post_process_prompt(self, prompt: str):
        return prompt

    @staticmethod
    def _extract_costs(json_data: dict) -> dict:
        return Costs(
            input_costs=json_data["usage"].get("prompt_tokens", 0),
            output_costs=json_data["usage"].get("completion_tokens", 0),
        )

class InititialCandidates:
    def __init__(self, dtrain):
        self.dtrain = dtrain

    def __call__(self):
        example_formatter = PlainFormatter(
            all_labels=self.dtrain.outputs.unique(), orient="item"
        )

        labels = self.dtrain.outputs.unique()
        instructions = MetaPrompt(
            [
                Paragraph("Instructions: "),
                Paragraph(
                    one_of(
                        [
                            self.dtrain.constants["instructions"],
                            "",
                            "Find the best output label given the input.",
                            self.dtrain.constants["instructions"] * 2,
                        ]
                    ),
                    id="instructions",
                ),
                Paragraph("\n"),
                Paragraph(
                    f"Output labels: {', '.join(labels)}\n" if len(labels) <= 10 else ""
                ),
                Paragraph(InputData()),
                Paragraph("Output: "),
            ],
            render_as="raw",
            data_formatter=example_formatter,
        )

        return Output(
            instructions.with_extractor("raise"),
            minibatch_size=1,
            on_error="empty_result",
        )

API_CONFIG_FILE = pathlib.Path().cwd() / "config" / "personal.openai"
API_CONFIG = ""
if API_CONFIG_FILE.exists():
    API_CONFIG = API_CONFIG_FILE
if not API_CONFIG:
    raise ValueError('Please set API_CONFIG to {"api_key": "YOUR_KEY"}')

_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now

runner = ZhiPuAIChat(
    model_id="glm-3-turbo",
    api_config=API_CONFIG,
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30
)

# %load -s load_data,accuracy _init.py
def load_data(
    url="https://raw.githubusercontent.com/SinMu-L/BIG-bench/main/bigbench/benchmark_tasks/implicatures/task.json",
):
    task = json.loads(requests.get(url).content)

    # convert label to single string
    # for x in task["examples"]:
    #     x["output"] = max(x["target_scores"], key=x["target_scores"].get)

    return DataTable.from_records(
        task["examples"],
        input_fields="input",
        constants={"instructions": task["task_prefix"]},
    )


def accuracy(y_True: DataTable, y_pred: DataTable) -> EvaluationScore:
    y_True = y_True.outputs.normalized_values()
    y_pred = y_pred.outputs.normalized_values()
    n_correct = sum([y_p == y_t for y_p, y_t in zip(y_pred, y_True)])

    return EvaluationScore(n_correct / len(y_True))

mydata = load_data()
d_train = mydata.sample(8, seed=42)


mutation_operators = BagOfMutators(
    InititialCandidates(d_train),
    InduceInstructions({"id": "instructions"}, d_train),
    Paraphrase({"id": "instructions"}),
    sample_for_init_candidates=False,
)

prompt_optimizer = BeamSearch(
            runner,
            mutation_operators,
            accuracy,
            maximize=True,
            depth=3,
            mutations_per_beam=2,
            n_initial_candidates=4,
            beam_width=4,
            add_previous=True,
    )
prompt_optimizer.fit(d_train)
prompt_optimizer.show_report()
print(prompt_optimizer.best_prompt)

ps:Wish everything goes well for now

tiktoken dependency does not seem getting used, makes install process heavy

find . -iname "*" | xargs grep -in "tiktoken" does not yield any project files except toml file - is this dependency necessary? This is a heavyweight dependency which perhaps uses other heavy dependencies as well.

Adding more LLM APIs as runners / extending

Right now, only OpenAI API compatible models are supported. Possible other LLMs

Ideas:

Write wrapper around LangChain class
Tutorial to show how to implement custom Runner class.

Add Copyright header for Microsoft-produced source code

should apply to all .py files and other files deemed source code (ie: not configuration)

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

Bug in quickstart example in github pages

CACHE_FILE variable has not been defined in the quickstart example prior to its use.

This repo is missing a LICENSE file

This repository is currently missing a LICENSE file.

A license helps users understand how to use your project in a compliant manner. You can find the standard MIT license Microsoft uses at: https://github.com/microsoft/repo-templates/blob/main/shared/LICENSE.

If you would like to learn more about open source licenses, please visit the document at https://aka.ms/license (Microsoft-internal guidance).

Weird import errors.

I've tried to install sammo on my m1 macbook using
poetry install
and using
pip install sammo jupyter
but when I try to run the any of the notebooks I get

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[4], line 3
      1 # %load -r :27 _init.py
      2 import pathlib
----> 3 import sammo
      4 from sammo.runners import OpenAIChat
      5 from sammo.base import Template, EvaluationScore

File ~/PycharmProjects/github/sammo/sammo/__init__.py:4
      1 # Copyright (c) Microsoft Corporation.
      2 # Licensed under the MIT License.
      3 import logging
----> 4 import beartype
      5 import sammo.utils as utils
      6 from pathlib import Path

even though:

(sammo-py3.11) datascience@headsmac sammo % pip show beartype
Name: beartype
Version: 0.15.0
Summary: Unbearably fast runtime type checking in pure Python.
Home-page: https://beartype.readthedocs.io
Author: Cecil Curry, et al.
Author-email: [email protected]
License: MIT
Location: /Users/datascience/Library/Caches/pypoetry/virtualenvs/sammo-I0aNzNSL-py3.11/lib/python3.11/site-packages
Requires: 
Required-by: sammo
(sammo-py3.11) datascience@headsmac sammo %

Enable build pipeline and require PRs for merge to main

Enable build pipeline via GitHub Actions to automate quality checks and artifact assembly:

add branch protection rule for main requiring PR to merge
run pre-commit hooks
run pytest tests
run mypy type checks
fail build if any of above steps fail
build (but do not publish) documentation
build (but do not publish) PyPI package

Require a PR to merge to main (disable direct push), and require a green build pipeline before merge is allowed.

Subsequent issues will cover the release process wherein a new version is created and the code and documentation artifacts are published (to PyPI, GH Pages and readthedocs). Additional checks to consider in the future include test and documentation coverage reports.

Right now the SAMMO docs site hosted on GH Pages is built from a separate gh-pages branch. This issue moves towards building docs and code from the same branch (so, moving from "publishing from a branch" style to "publishing with a custom GitHub Actions workflow")