guidance-ai / guidance Goto Github PK

A guidance language for controlling large language models.

License: MIT License

Python 20.62% Jupyter Notebook 78.84% JavaScript 0.37% C++ 0.15% Rust 0.01%

guidance's Introduction

Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.

Install
Features
Example notebooks
Basic generation
Constrained generation
Stateful control + generation

Install

Guidance is available through PyPI, to use a specific model see loading models.

pip install guidance

Features

Write pure Python, with additional LM functionality.

from guidance import models, gen

# load a model (could be Transformers, LlamaCpp, VertexAI, OpenAI...)
llama2 = models.LlamaCpp(path) 

# append text or generations to the model
llama2 + f'Do you want a joke or a poem? ' + gen(stop='.')

Constrain generation with selects (i.e., sets of options), regular expressions, and context-free grammars, as well as with pre-built components (e.g., substring, json).

from guidance import select

# a simple select between two options
llama2 + f'Do you want a joke or a poem? A ' + select(['joke', 'poem'])

Call and deploy tools easily with automatic interleaving of control and generation.

Easy tool use, where the model stops generation when a tool is called, calls the tool, then resumes generation. For example, here is a simple version of a calculator, via four separate 'tools':

@guidance
def add(lm, input1, input2):
    lm += f' = {int(input1) + int(input2)}'
    return lm
@guidance
def subtract(lm, input1, input2):
    lm += f' = {int(input1) - int(input2)}'
    return lm
@guidance
def multiply(lm, input1, input2):
    lm += f' = {float(input1) * float(input2)}'
    return lm
@guidance
def divide(lm, input1, input2):
    lm += f' = {float(input1) / float(input2)}'
    return lm

Now we call gen with these tools as options. Notice how generation is stopped and restarted automatically:

lm = llama2 + '''\
1 + 1 = add(1, 1) = 2
2 - 3 = subtract(2, 3) = -1
'''
lm + gen(max_tokens=15, tools=[add, subtract, multiply, divide])

Get high compatibility—execute a single Guidance program on many backends

Works with Transformers, llama.cpp, AzureAI, VertexAI, OpenAI and others. Users can write one guidance program and execute it on many backends. (note that the most powerful control features require endpoint integration, and for now work best with Transformers and llama.cpp).

gpt = models.OpenAI("gpt-3.5-turbo")

with user():
    lm = gpt + "What is the capital of France?"

with assistant():
    lm += gen("capital")

with user():
    lm += "What is one short surprising fact about it?"

with assistant():
    lm += gen("fact")

Gain speed with stateful control + generation functions—no need for intermediate parsers.

In contrast to chaining, Guidance programs are the equivalent of a single LLM call. More so, whatever non-generated text that gets appended is batched, so that Guidance programs are faster than having the LM generate intermediate text when you have a set structure.

Token healing

Users deal with text (or bytes) rather than tokens, and thus don't have to worry about perverse token boundaries issues such as 'prompt ending in whitespace'.

Rich templates with f-strings.

llama2 + f'''\
Do you want a joke or a poem? A {select(['joke', 'poem'])}.
Okay, here is a one-liner: "{gen(stop='"')}"
'''

Abstract chat interface that uses correct special tokens for any chat model.

# capture our selection under the name 'answer'
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='answer')}.\n"

# make a choice based on the model's previous selection
if lm["answer"] == "joke":
    lm += f"Here is a one-line joke about cats: " + gen('output', stop='\n')
else:
    lm += f"Here is a one-line poem about dogs: " + gen('output', stop='\n')

Easy-to-write reusable components.

import guidance

@guidance
def one_line_thing(lm, thing, topic):
    lm += f'Here is a one-line {thing} about {topic}: ' + gen(stop='\n')
    return lm # return our updated model

# pick either a joke or a poem
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='thing')}.\n"

# call our guidance function
lm += one_line_thing(lm['thing'], 'cats')

A library of pre-built components

Common syntax elements are avilable out of the box, below is an example of substring for others (like json) checkout the docs.

from guidance import substring

# define a set of possible statements
text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'

# force the model to make an exact quote
llama2 + f'Here is a true statement about the guidance library: "{substring(text)}"'

Streaming support, also integrated with Jupyter notebooks.

lm = llama2 + 'Here is a cute 5-line poem about cats and dogs:\n'
for i in range(5):
    lm += f"LINE {i+1}: " + gen(temperature=0.8, suffix="\n")

For environments that don't support guidance's rich IPython/Jupyter/HTML based visualizations (e.g. console applications), all visualizations and console outputs can be supressed by setting echo=False in the constructor of any guidance.models object:

llama2 = models.LlamaCpp(path, echo=False)

Multi-modal support.

from guidance import image

gemini = models.VertexAI("gemini-pro-vision")

with user():
    lm = gemini + "What is this a picture of?" + image("longs_peak.jpg")

with assistant():
    lm += gen("answer")

Example notebooks

We are working on updating our example notebooks. The following ones have been updated:

More coming soon

Basic generation

An lm object is immutable, so you change it by creating new copies of it. By default, when you append things to lm, it creates a copy, e.g.:

from guidance import models, gen, select
llama2 = models.LlamaCpp(model)

# llama2 is not modified, `lm` is a copy of `llama2` with 'This is a prompt' appended to its state
lm = llama2 + 'This is a prompt'

You can append generation calls to model objects, e.g.

lm = llama2 + 'This is a prompt' + gen(max_tokens=10)

You can also interleave generation calls with plain text, or control flows:

# Note how we set stop tokens
lm = llama2 + 'I like to play with my ' + gen(stop=' ') + ' in' + gen(stop=['\n', '.', '!'])

Constrained generation

Select (basic)

select constrains generation to a set of options:

lm = llama2 + 'I like the color ' + select(['red', 'blue', 'green'])

Regular expressions

gen has optional arguments regex and stop_regex, which allow generation (and stopping, respectively) to be controlled by a regex.

Regex to constrain generation

Unconstrained:

lm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\n'
lm += 'How many balls does he have left?\n'
lm += 'Answer: ' + gen(stop='\n')

Constrained by regex:

lm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\n'
lm += 'How many balls does he have left?\n'
lm += 'Answer: ' + gen(regex='\d+')

Regex as stopping criterion

Unconstrained:

lm = llama2 + '19, 18,' + gen(max_tokens=50)

Stop with traditional stop text, whenever the model generates the number 7:

lm = llama2 + '19, 18,' + gen(max_tokens=50, stop='7')

Stop whenever the model generates the character 7 without any numbers around it:

lm = llama2 + '19, 18,' + gen(max_tokens=50, stop_regex='[^\d]7[^\d]')

Context-free grammars

We expose a variety of operators that make it easy to define CFGs, which in turn can be used to constrain generation. For example, we can use the select operator (it accepts CFGs as options), zero_or_more and one_or_more to define a grammar for mathematical expressions:

import guidance
from guidance import one_or_more, select, zero_or_more
# stateless=True indicates this function does not depend on LLM generations
@guidance(stateless=True)
def number(lm):
    n = one_or_more(select(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']))
    # Allow for negative or positive numbers
    return lm + select(['-' + n, n])

@guidance(stateless=True)
def operator(lm):
    return lm + select(['+' , '*', '**', '/', '-'])

@guidance(stateless=True)
def expression(lm):
    # Either
    # 1. A number (terminal)
    # 2. two expressions with an operator and optional whitespace
    # 3. An expression with parentheses around it
    return lm + select([
        number(),
        expression() + zero_or_more(' ') +  operator() + zero_or_more(' ') +  expression(),
        '(' + expression() + ')'
    ])

The @guidance(stateless=True) decorator makes it such that a function (e.g. expression) lives as a stateless grammar that does not get 'executed' until we call call lm + expression() or lm += expression(). For example, here is an example of unconstrained generation:

# Without constraints
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + gen(stop='\n') + '\n'

Notice how the model wrote the right equation but solved it (incorrectly). If we wanted to constrain the model such that it only writes valid expressions (without trying to solve them), we can just append our grammar to it:

grammar = expression()
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + grammar + '\n'

Grammars are very easy to compose. For example, let's say we want a grammar that generates either a mathematical expression or an expression followed by a solution followed by another expression. Creating this grammar is easy:

from guidance import regex
grammar = select([expression(), expression() +  regex(' = \d+; ') + expression()])

We can generate according to it:

llama2 + 'Here is a math expression for two plus two: ' + grammar

llama2 + '2 + 2 = 4; 3+3\n' + grammar

Even if you don't like thinking in terms of recursive grammars, this formalism makes it easy to constrain generation. For example, let's say we have the following one-shot prompt:

@guidance(stateless=True)
def ner_instruction(lm, input):
    lm += f'''\
    Please tag each word in the input with PER, ORG, LOC, or nothing
    ---
    Input: John worked at Apple.
    Output:
    John: PER
    worked: 
    at: 
    Apple: ORG
    .: 
    ---
    Input: {input}
    Output:
    '''
    return lm
input = 'Julia never went to Morocco in her life!!'
llama2 + ner_instruction(input) + gen(stop='---')

Notice that the model did not spell the word 'Morocco' correctly. Sometimes the model might also hallucinate a tag that doesn't exist. We can improve this by adding more few-shot examples, etc, but we can also constrain generation to the exact format we want:

import re

@guidance(stateless=True)
def constrained_ner(lm, input):
    # Split into words
    words = [x for x in re.split('([^a-zA-Z0-9])', input) if x and not re.match('\s', x)]
    ret = ''
    for x in words:
        ret += x + ': ' + select(['PER', 'ORG', 'LOC', '']) + '\n'
    return lm + ret
llama2 + ner_instruction(input) + constrained_ner(input)

While constrained_ner(input) is a grammar that constrains the model generation, it feels like you're just writing normal imperative python code with += and selects.

Capture a generation

The string generated by a stateless function can be saved to the lm object by using the capture function. capture takes two arguments: the stateless function and the name to store the captured variable.

from guidance import capture, one_or_more

ans = lm + "To close the open bracket sequence [[ the corresponding closing brackets are " + capture(one_or_more("]"), "brackets")
ans["brackets"]

Stateful control + generation

State in immutable objects

Whenever you do lm + grammar or lm + gen, lm + select, etc, you return a new lm object with additional state. For example:

lm = llama2 + 'This is a prompt' + gen(name='test', max_tokens=10)
lm += select(['this', 'that'], name='test2')
lm['test'], lm['test2']

Stateful `{guidance}` functions

The guidance decorator is @guidance(stateless=False) by default, meaning that a function with this decorator depends on the lm state to execute (either prior state or state generated within the function). For example:

@guidance(stateless=False)
def test(lm):
    lm += 'Should I say "Scott"?\n' + select(['yes', 'no'], name='answer') + '\n'
    if lm['answer'] == 'yes':
        lm += 'Scott'
    else:
        lm += 'Not Scott'
    return lm
llama2 + test()

Example: ReAct

A big advantage of stateful control is that you don't have to write any intermediate parsers, and adding follow-up 'prompting' is easy, even if the follow up depends on what the model generates. For example, let's say we want to implement the first example of ReAct prompt in this, and let's say the valid acts are only 'Search' or 'Finish'. We might write it like this:

@guidance
def react_prompt_example(lm, question, max_rounds=10):
    lm += f'Question: {question}\n'
    i = 1
    while True:
        lm += f'Thought {i}: ' + gen(suffix='\n')
        lm += f'Act {i}: ' + select(['Search', 'Finish'], name='act') 
        lm += '[' + gen(name='arg', suffix=']') + '\n'
        if lm['act'] == 'Finish' or i == max_rounds:
            break
        else:
            lm += f'Observation {i}: ' + search(lm['arg']) + '\n'
        i += 1
    return lm

Notice how we don't have to write a parser for Act and argument and hope that the model generates something valid: we enforce it. Notice also that the loop only stops once the model chooses to act with 'Finish' (or once we hit a maximum number of rounds).

Example: Changing intermediate step of a Chat session

We can also hide or change some of what the model generates. For example, below we get a Chat model (notice we use special role blocks) to name some experts to answer a question, but we always remove 'Ferriss' from the list if he is mentioned:

from guidance import user, system, assistant
lm = llama2
query = 'How can I be more productive?'
with system():
    lm += 'You are a helpful and terse assistant.'
with user():
    lm += f'I want a response to the following question:\n{query}\n'
    lm += 'Name 3 world-class experts (past or present) who would be great at answering this.'
with assistant():
    temp_lm = lm
    for i in range(1, 4):
        # This regex only allows strings that look like names (where every word is capitalized)
        # list_append appends the result to a list
        temp_lm += f'{i}. ' + gen(regex='([A-Z][a-z]*\s*)+', suffix='\n',
                                  name='experts', list_append=True)
    experts = [x for x in temp_lm['experts'] if 'Ferriss' not in x]
    # Notice that even if the model generates 'Ferriss' above,
    # it doesn't get added to `lm`, only to `temp_lm`
    lm += ', '.join(experts)
with user():
    lm += 'Please answer the question as if these experts had collaborated in writing an anonymous answer.'
with assistant():
    lm += gen(max_tokens=100)

Automatic interleaving of control and generation: tool use

Tool use is a common case of stateful control. To make it easy to do so, gen calls take tools as an optional argument, where each tool is defined by (1) a grammar that triggers its call and captures the arguments (if any), and (2) the actual tool call. Then, as generation unrolls, whenever the model generates something that matches the grammar of a tool call, it (1) stops generation, (2) calls the tool (which can append whatever it wants to the LM session), and (3) continues generation.

For example, here is how we might implement a calculator tool, leveraging our expression grammar above:

from guidance import capture, Tool
@guidance(stateless=True)
def calculator_call(lm):
    # capture just 'names' the expression, to be saved in the LM state
    return lm + 'calculator(' + capture(expression(), 'tool_args') + ')'

@guidance
def calculator(lm):
    expression = lm['tool_args']
    # You typically don't want to run eval directly for save reasons
    # Here we are guaranteed to only have mathematical expressions
    lm += f' = {eval(expression)}'
    return lm
calculator_tool = Tool(calculator_call(), calculator)
lm = llama2 + 'Here are five expressions:\ncalculator(3 *3) = 33\ncalculator(2 + 1 * 3) = 5\n'
lm += gen(max_tokens=30, tools=[calculator_tool], stop='\n\n')

Gsm8k example

Notice that the calculator is just called seamlessly during generation. Here is a more realistic exampe of the model solving a gsm8k question:

@guidance
def math_with_calc(lm, question):
    # Two-shot example
    lm += '''\
    Question: John starts with 2 balls. He then quintupled his number of balls. Then he lost half of them. He then gave 3 to his brother. How many does he have left?
    Reasoning:
    1. He quintupled his balls. So he has calculator(2 * 5) = 10 balls.
    1. He lost half. So he has calculator(10 / 2) = 5 balls.
    3. He gave 3 to his brother. So he has calculator(5 - 3) = 2 balls.
    Answer: 2

    Question: Jill get 7 dollars a day in allowance. She uses 1 each day to by a bus pass, then gives half away. How much does she have left each day?
    Reasoning:
    1. She gets 7 dollars a day.
    1. She spends 1 on a bus pass. So she has calculator(5 - 1) = 6.
    3. She gives half away. So that makes calculator(6 / 2) = 3.
    Answer: 3

    '''
    lm += f'Question: {question}\n'
    lm += 'Reasoning:\n' + gen(max_tokens=200, tools=[calculator_tool], stop='Answer')
    # Only numbers or commas
    lm += 'Answer: ' + gen(regex='[-\d,]+')
    return lm

question = '''Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?'''
llama2 + math_with_calc(question)

Automatic call grammar for @guidance functions

You can also initialize a Tool with any @guidance-decorated function, and the default call grammar will be like a python call. Here is an example of using multiple such tools in the same gen call:

@guidance
def say_scott(lm, n):
    lm += '\n'
    for _ in range(int(n)):
        lm += 'Scott\n'
    return lm

@guidance
def say_marco(lm, n):
    lm += '\n'
    for _ in range(int(n)):
        lm += 'marco\n'
    return lm

tools = [Tool(callable=say_scott), Tool(callable=say_marco)]
llama2 + '''\
I am going to call say_scott and say_marco a few times:
say_scott(1)
Scott
''' + gen(max_tokens=20, tools=tools)

Text, not tokens

The standard greedy tokenizations used by most language models introduce a variety of subtle and powerful biases, which that can have all kinds of unintended consequences for your prompts. For example, take the following prompt, given to gpt-2 (standard greedy tokenization):

hf_gen(prompt, max_tokens=10)

from transformers import pipeline
pipe = pipeline("text-generation", model="gpt2")
def hf_gen(prompt, max_tokens=100):
    return pipe(prompt, do_sample=False, max_length=max_tokens, return_full_text=False)[0]['generated_text']

prompt = 'http:'
hf_gen(prompt, max_tokens=10)

Notice how the output generated by the LLM does not complete the URL with the obvious next characters (two forward slashes). It instead creates an invalid URL string with a space in the middle. Why? Because the string :// is its own token, and so once the model sees a colon by itself, it assumes that the next characters cannot be //; otherwise, the tokenizer would not have used :, and instead would have used ://. This is why there are warnings about ending prompts in whitespace, but the problem is way more pervasive than that: any boundary that may span multiple tokens will cause problems, e.g. notice how a partial word causes incorrect completion:

prompt = 'John is a'
hf_gen(prompt, max_tokens=5)

prompt = 'John is a fo'
hf_gen(prompt, max_tokens=5)

While problematic enough for normal prompts, these problems would be a disaster in the kinds of prompts we wrote in this readme, where there is interleaving of prompting and generation happening multiple times (and thus multiple opportunities for problems). This is why {guidance} implements token healing, a feature that deals with prompt boundaries automatically, allowing users to just think in terms of text rather than tokens. For example:

from guidance import models
gpt = models.Transformers('gpt2')
prompt = 'http:'
gpt + prompt + gen(max_tokens=10)

prompt = 'John is a fo'
gpt + prompt + gen(max_tokens=2)

Fast

Integrated stateful control is faster

We have full control of the decoding loop in our integration with transformers and llamacpp, allowing us to add control and additional prompt without any extra cost.
If instead we're calling a server, we pay the extra cost of making additional requests, which might be ok if the server has caching, but quickly becomes impractical if the server does not have fine-grained caching. For example, note again the output from the gsm8k example with calculator above:

Every time we call calculator, we have to stop generation, append the result to the prompt, and resume generation. To avoid slowing down after the first call, a server would need to keep the KV cache up to '3 for breakfast. So she has calculator(16 - 3)', then roll forward generation from that point on. Even servers that do have caching often don't have a way to guarantee state is preserved at each stop and start, and so user's pay a significant overhead at each interruption. The normal approach of considering everything as a new prompt would cause significant slow downs every time calculator is called.

Guidance acceleration

In addition to the benefit above, {guidance} calls are often faster than running equivalent prompts the traditional way, because we can batch any additional text that is added by the user as execution unrolls (rather than generating it). Take the example below, where we generate a json with a GGUF compressed llama2 7B executed using llama.cpp:

@guidance
def character_maker(lm, id, description, valid_weapons):
    lm += f"""\
    The following is a character profile for an RPG game in JSON format.
    ```json
    {{
        "id": "{id}",
        "description": "{description}",
        "name": "{gen('name', stop='"')}",
        "age": {gen('age', regex='[0-9]+', stop=',')},
        "armor": "{select(options=['leather', 'chainmail', 'plate'], name='armor')}",
        "weapon": "{select(options=valid_weapons, name='weapon')}",
        "class": "{gen('class', stop='"')}",
        "mantra": "{gen('mantra', stop='"')}",
        "strength": {gen('strength', regex='[0-9]+', stop=',')},
        "items": ["{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}"]
    }}```"""
    return lm
a = time.time()
lm = llama2 + character_maker(1, 'A nimble fighter', ['axe', 'sword', 'bow'])
time.time() - a

Everything that is not green is not actually generated by the model, and is thus batched (much faster). This prompt takes about 1.2 seconds on an A100 GPU. Now, if we let the model generate everything (as in the roughly equivalent prompt below), it takes roughly 2.6 seconds (not only is it slower, we also have less control over generation).

@guidance
def character_maker2(lm, id, description):
    lm += f"""\
    The following is a character profile for an RPG game in JSON format. It has fields 'id', 'description', 'name', 'age', 'armor', weapon', 'class', 'mantra', 'strength', and 'items (just the names of 3 items)'
    please set description to '{description}'
    ```json""" + gen(stop='```')
    return lm
a = time.time()
lm = llama2 + character_maker2(1, 'A nimble fighter')
time.time() - a

Loading models

llama.cpp

Install the python bindings:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

Loading the model:

from guidance import models
lm = models.LlamaCpp(path_to_model, n_gpu_layers=-1)

Transformers

Install transformers:

from guidance import models
lm = models.Transformers(model_name_or_path)

Vertex AI

Remote endpoints that don't have explicit guidance integration are run "optimistically". This means that all the text that can be forced is given to the model as a prompt (or chat context) and then the model is run in streaming mode without hard constrants (since the remote API doesn't support them). If the model ever violates the contraints then the model stream is stopped and we optionally try it again at that point. This means that all the API-supported control work as expected, and more complex controls/parsing that is not supported by the API work if the model stays consistent with the program.

palm2 = models.VertexAI("text-bison@001")

with instruction():
    lm = palm2 + "What is one funny fact about Seattle?"

lm + gen("fact", max_tokens=100)

OpenAI

OpenAI endpoint don't have direct support for guidance grammars, but through optimistic running we can still control them in ways that match the model type:

Legacy completion models:

curie = models.OpenAI("text-curie-001")

curie + "The smallest cats are" + gen(stop=".")

Instruct tuned models:

gpt_instruct = models.OpenAI("gpt-3.5-turbo-instruct")

with instruction():
    lm = gpt_instruct + "What are the smallest cats?"
    
lm += gen(stop=".")

Chat models:

gpt = models.OpenAI("gpt-3.5-turbo")

with system():
    lm = gpt + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".")

guidance's People

Contributors

Stargazers

Watchers

Forkers

techthiyanes quaddarv1ne hertera1 microprediction marcosncosta1 azeroth-dev guillermo-ayala shrahimim homebrew-startup goodfitcareers thesekyi atulrawat stjordanis edreams allthingsllm abdoiiii jquesnelle hadryan mbaneshi mrcodechef jasondotparse moerehman dollop therockstardba seshakiran thegovind mikhjones jackiej tchebonenko jr4c touristshaun hoang1510-dt lyhiving tianjunz james-frowen wishtodaya superqd dianpeng coryjo shyamal-anadkat 2212168851 aicodehunt siviltaram hbcbh1999 mikelittman vintlin akhil4rajan ankitshah009 santoshdahale markedmondson1234 anthone rosssong winning1120xx anminhhung hervemignot dojimanoryyu willjallen adityadt68 mikeion ssingh13-rms therakeshpurohit tiantianlecheng gokhanyu contmasqedwo wdshin 3orprofxpersde veve0lagu pistdog0balhe 1menclestanda richardsonjf 0indemprovha masegraye 8conccremacontde positioner nitin-mane 0deonuotastbi nohanaga pratiryprudbo amitkml phaethonp thorvaldia eltociear oijoijcoiejoijce archerband dleodbs jesusoctavioas mallock mccharley jrcribb tonywhite11 edskamor dim-4 centenocodes 1dalliavrene evanmays ilanbm cucanqelta rajeev921 0xxyiy0 0mavoliobi

guidance's Issues

OpenAI API key requested when using transformers models

I'm trying to use a transformers model like so
model = guidance.llms.Transformers('stabilityai/stablelm-base-alpha-3b', device=0)
but getting the following error when I try to call the guidance program:
AssertionError: You must provide an OpenAI API key to use the OpenAI LLM. Either pass it in the constructor, set the OPENAI_API_KEY environment variable, or create the file ~/.openai_api_key with your key in it.

One-shot example in chatgpt vs open source comparison is wrong

There is an issue with this notebook:

https://github.com/microsoft/guidance/blob/main/notebooks/chatgpt_vs_open_source_on_harder_tasks.ipynb

The one-shot example for the simple equation is incorrect:

For example, if the solution is x=2, say SOLUTION = [3].

I believe the numbers there are meant to align to each other. Not sure if it would impact the evaluation results, but worth being accurate.

guidance load models as int8?

how do i call guidance but laod the models as int8 so i can fit them on even an 80Gb GPU?

Address incompatibility with GPT4 more above the fold

I'm seeing this popup around twitter. I've been thinking of workarounds, but perhaps you could highlight the issue as it seems to be pervasive. I'm also curious what you folks think on how to best address

Transformers separate model server?

This may be a stupid question, please forgive if so

the openAI interface obviously relies an on idenpendently existing server for gpt-3.5 and gpt-4

the Transformers interface, though, assumes guidance will load the model internally. Loading models in Transformers takes forever, even when already cached.

Is there a way to point to an existing 'guidance' server to handle guidance prompts, so I don't have to wait an entire model startup cycle every prompt test when using Transformer models like Wizard-13B?

Examples fail to run in IPython

I tried to toy around with the examples in the README but couldn't get a program to execute

Fresh Python 3.11 virtual env (via mamba, i.e., conda) on macOS
pip install guidance to get 0.0.39

Unable to `import guidance` due to unspecified dependency.

Python 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:58:31) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.

[ins] In [1]: import guidance
         ...: # Set the default llm. Could also pass a different one as argument to guidance(), with guidance(llm=...)
         ...: guidance.llm = guidance.llms.OpenAI("text-davinci-003")
         ...: program = guidance('''The best thing about the beach is {{~gen 'best' temperature=0.7 max_tokens=7}}''')
         ...: prompt = program()
         ...: prompt
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import guidance
      2 # Set the default llm. Could also pass a different one as argument to guidance(), with guidance(llm=...)
      3 guidance.llm = guidance.llms.OpenAI("text-davinci-003")

File ~/.local/state/mamba/envs/py311/lib/python3.11/site-packages/guidance/__init__.py:12
     10 from ._utils import load, chain
     11 from . import selectors
---> 12 import nest_asyncio
     14 # allows us to start inner event loops within jupyter notebooks
     15 nest_asyncio.apply()

ModuleNotFoundError: No module named 'nest_asyncio'

Unable to execute program due to missing `guidance.llm` attribute.

[ins] In [1]: import guidance
         ...:
         ...: # set the default language model used to execute guidance programs
         ...: guidance.llm = guidance.llms.OpenAI("text-davinci-003")
         ...:
         ...: # define a guidance program that adapts a proverb
         ...: program = guidance("""Tweak this proverb to apply to model instructions instead.
         ...:
         ...: {{proverb}}
         ...: - {{book}} {{chapter}}:{{verse}}
         ...:
         ...: UPDATED
         ...: Where there is no guidance{{gen 'rewrite' stop="\\n-"}}
         ...: - GPT {{gen 'chapter'}}:{{gen 'verse'}}""")
         ...:
         ...: # execute the program on a specific proverb
         ...: executed_program = program(
         ...:     proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
         ...:     book="Proverbs",
         ...:     chapter=11,
         ...:     verse=14
         ...: )
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 7
      4 guidance.llm = guidance.llms.OpenAI("text-davinci-003")
      6 # define a guidance program that adapts a proverb
----> 7 program = guidance("""Tweak this proverb to apply to model instructions instead.
      8
      9 {{proverb}}
     10 - {{book}} {{chapter}}:{{verse}}
     11
     12 UPDATED
     13 Where there is no guidance{{gen 'rewrite' stop="\\n-"}}
     14 - GPT {{gen 'chapter'}}:{{gen 'verse'}}""")
     16 # execute the program on a specific proverb
     17 executed_program = program(
     18     proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
     19     book="Proverbs",
     20     chapter=11,
     21     verse=14
     22 )

File ~/.local/state/mamba/envs/py311/lib/python3.11/site-packages/guidance/__init__.py:22, in Guidance.__call__(self, template, llm, cache_seed, logprobs, silent, async_mode, stream, caching, await_missing, **kwargs)
     21 def __call__(self, template, llm=None, cache_seed=0, logprobs=None, silent='auto', async_mode=False, stream='auto', caching=None, await_missing=False, **kwargs):
---> 22     return Program(template, llm=llm, cache_seed=cache_seed, logprobs=logprobs, silent=silent, async_mode=async_mode, stream=stream, caching=caching, await_missing=await_missing, **kwargs)

File ~/.local/state/mamba/envs/py311/lib/python3.11/site-packages/guidance/_program.py:85, in Program.__init__(self, text, llm, cache_seed, logprobs, silent, async_mode, stream, caching, await_missing, **kwargs)
     83 # save the given parameters
     84 self._text = text
---> 85 self.llm = llm or guidance.llm
     86 self.cache_seed = cache_seed
     87 self.caching = caching

AttributeError: module 'guidance' has no attribute 'llm'```

Running guidance within Gradio

Hi,

I'm trying to run guidance within a Gradio app. I'm not sure about the async framework behind Gradio but it seems to be AnyIO and is using some worker threads.

The problem I am experiencing is that when I attempt to run a program I get an error that there is no current event loop in the thread. I am not trying to use the program asynchronously (the flag passed in).

  File "/data/scratch/haukurpj/miniconda3/envs/greynirqa/lib/python3.9/site-packages/guidance/_program.py", line 194, in __call__
    new_program = Program(
  File "/data/scratch/haukurpj/miniconda3/envs/greynirqa/lib/python3.9/site-packages/guidance/_program.py", line 110, in __init__
    self._execute_complete = asyncio.Event() # fires when the program is done executing to resolve __await__
  File "/data/scratch/haukurpj/miniconda3/envs/greynirqa/lib/python3.9/asyncio/locks.py", line 177, in __init__
    self._loop = events.get_event_loop()
  File "/data/scratch/haukurpj/miniconda3/envs/greynirqa/lib/python3.9/site-packages/nest_asyncio.py", line 45, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
  File "/data/scratch/haukurpj/miniconda3/envs/greynirqa/lib/python3.9/asyncio/events.py", line 642, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'AnyIO worker thread'.

Any help is welcome.

error while using guidance with azure openai

The bug
A clear and concise description of what the bug is.

ValueError: Fail to create device flow. Err: {
"error": "invalid_request",
"error_codes": [
900144
],
....
}

To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.

# put your code snippet here

System info (please complete the following information):

OS (e.g. Ubuntu, Windows 11, Mac OS, etc.):
Guidance Version (guidance.__version__):

ValueError: No valid option generated in #select!

Just walking through the tutorial.

guidance.llm = guidance.llms.OpenAI("text-curie-001")

p = guidance('''Is the following sentence offensive? Yes, no, or maybe.
Sentence: {{example}}
Answer: {{#select 'answer' logprobs='logprobs'}} Yes{{or}} No{{or}} Maybe{{/select}}''')

p = p(example='You are such an asshole!')

What did I miss?

ps: text-davinci-003 failed as well.

role_simulator function and encountered a KeyError for the 'first_question' variable

Hello,

I've been using your role_simulator function and encountered a KeyError for the 'first_question' variable. The error message suggests that 'first_question' should have been passed as an argument when calling the program, or it should have been set as a default value when creating the program.

Here's the code that's causing the error:

role_simulator = guidance('''
...
{{#if first_question}}You can also start the conversation.{{/if}}
...
{{#if first_question}}Let me start the conversation now:
{{role}}: {{first_question}}{{/if}}
...
''')

republican = role_simulator(role='Republican')
democrat = role_simulator(role='Democrat')

first_question = '''What do you think is the best way to stop inflation?'''
...

The error occurs because 'first_question' is not defined at the time when the role_simulator function is called.

tutorial.ipynb "Using Tools" IncompleteParseError

installed on windows 10 with Anaconda Python 3.10.9 openai-0.27.6
ran pip install guidance
notebook tutorial.ipynb worked until "Using Tools" section

IncompleteParseError Traceback (most recent call last)
Cell In[8], line 87
35 program = guidance('''
36 {{#system~}}
37 You are a helpful assistant.
(...)
82 {{~/assistant}}
83 ''')
85 query = "What is Facebook's stock price right now?"
---> 87 program = program(
88 user_query=query,
89 search=search,
90 is_search=is_search,
91 practice=practice_round
92 )

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program.py:204, in Program.call(self, **kwargs)
195 new_program = Program(
196 text=self.marked_text,
197
(...)
200 {{k: v if callable(v) else copy.deepcopy(v) for k,v in self._variables.items()}, **kwargs}
201 )
203 # create an executor for the new program (this also marks the program as executing)
--> 204 new_program._executor = ProgramExecutor(new_program)
206 # if we are in async mode schedule the program in the current event loop
207 if new_program.async_mode:

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:50, in ProgramExecutor.init(self, program)
48 except parsimonious.exceptions.ParseError as e:
49 self._check_for_simple_error(text)
---> 50 raise e

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:47, in ProgramExecutor.init(self, program)
45 # parse the program text
46 try:
---> 47 self.parse_tree = grammar.parse(text)
48 except parsimonious.exceptions.ParseError as e:
49 self._check_for_simple_error(text)

File ~\AppData\Roaming\Python\Python310\site-packages\parsimonious\grammar.py:112, in Grammar.parse(self, text, pos)
106 """Parse some text with the :term:default rule.
107
108 :arg pos: The index at which to start parsing
109
110 """
111 self._check_default_rule()
--> 112 return self.default_rule.parse(text, pos=pos)

File ~\AppData\Roaming\Python\Python310\site-packages\parsimonious\expressions.py:143, in Expression.parse(self, text, pos)
141 node = self.match(text, pos=pos)
142 if node.end < len(text):
--> 143 raise IncompleteParseError(text, node.end, self)
144 return node

IncompleteParseError: Rule 'template' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '{{#each practice}}
{' (line 14, column 1).

The OpenAI API does not currently support partial assistant prompting. tutorial.py section 21 Agents

installed on windows 10 with Anaconda Python 3.10.9 openai-0.27.6
ran pip install guidance
notebook tutorial.ipynb worked until section 21

Traceback

Traceback (most recent call last):
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance\library_geneach.py", line 143, in geneach
gen_obj = await parser.llm_session(strip_markers(parser.prefix), stop=stop, max_tokens=len(stop_tokens)+2, temperature=0, cache_seed=0)
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py", line 310, in call
out = self.llm.caller(**call_args)
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py", line 186, in _library_call
kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py", line 24, in prompt_to_messages
assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting."
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py", line 94, in run
await self.visit(self.parse_tree)
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py", line 423, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py", line 423, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py", line 390, in visit
command_output = await command_function(*positional_args, **named_args)
File "C:\Users\David\AppData\Roaming\Python\Python310\site-packages\guidance\library_geneach.py", line 145, in geneach
raise Exception(f"Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.")
Exception: Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.

Error in program: Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.

AssertionError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python310\site-packages\guidance\library_geneach.py:143, in geneach(list_name, stop, max_iterations, min_iterations, num_iterations, hidden, join, single_call, single_call_temperature, single_call_max_tokens, single_call_top_p, _parser_context)
142 try:
--> 143 gen_obj = await parser.llm_session(strip_markers(parser.prefix), stop=stop, max_tokens=len(stop_tokens)+2, temperature=0, cache_seed=0)
144 except Exception:

File ~\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py:310, in OpenAISession.call(self, prompt, stop, stop_regex, temperature, n, max_tokens, logprobs, top_p, echo, logit_bias, token_healing, pattern, stream, cache_seed, caching)
309 call_args["logit_bias"] = {str(k): v for k,v in logit_bias.items()} # convert keys to strings since that's the open ai api's format
--> 310 out = self.llm.caller(**call_args)
312 except openai.error.RateLimitError:

File ~\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py:186, in OpenAI._library_call(self, **kwargs)
185 if self.chat_mode:
--> 186 kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
187 del kwargs['prompt']

File ~\AppData\Roaming\Python\Python310\site-packages\guidance\llms_openai.py:24, in prompt_to_messages(prompt)
21 # if len(start_tags) != len(end_tags):
22 # raise MalformedPromptException("Malformed prompt: start and end tags are not properly paired")
---> 24 assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting."
26 pattern = r'<|im_start|>(\w+)(.*?)(?=<|im_end|>|$)'

AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
Cell In[21], line 13
1 prompt = guidance(
2 '''{{#system~}}
3 You are a helpful assistant
(...)
11 {{/assistant}}
12 {{/geneach}}''')
---> 13 prompt= prompt(user_text ='hi there')
14 prompt

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program.py:225, in Program.call(self, **kwargs)
223 loop = asyncio.new_event_loop()
224 loop.create_task(new_program.update_display.run()) # start the display updater
--> 225 loop.run_until_complete(new_program.execute())
227 return new_program

File C:\anaconda3\lib\site-packages\nest_asyncio.py:90, in _patch_loop..run_until_complete(self, future)
87 if not f.done():
88 raise RuntimeError(
89 'Event loop stopped before Future completed.')
---> 90 return f.result()

File C:\anaconda3\lib\asyncio\futures.py:201, in Future.result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception.with_traceback(self._exception_tb)
202 return self._result

File C:\anaconda3\lib\asyncio\tasks.py:232, in Task.__step(failed resolving arguments)
228 try:
229 if exc is None:
230 # We use the send method directly, because coroutines
231 # don't have __iter__ and __next__ methods.
--> 232 result = coro.send(None)
233 else:
234 result = coro.throw(exc)

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program.py:311, in Program.execute(self)
309 else:
310 with self.llm.session(asynchronous=True) as llm_session:
--> 311 await self._executor.run(llm_session)
312 self._text = self._executor.prefix
314 # delete the executor and so mark the program as not executing

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:98, in ProgramExecutor.run(self, llm_session)
96 print(traceback.format_exc())
97 print("Error in program: ", e)
---> 98 raise e

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:94, in ProgramExecutor.run(self, llm_session)
88 self.llm_session = llm_session
89 try:
90 # first parse all the whitespace control
91 # self.whitespace_control_visit(self.parse_tree)
92
93 # now execute the program
---> 94 await self.visit(self.parse_tree)
95 except Exception as e:
96 print(traceback.format_exc())

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:423, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
421 else:
422 inner_prev_node = prev_node
--> 423 visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
424 # visited_children = [self.visit(child) for child in node.children]
426 if len(visited_children) == 1:

File ~\AppData\Roaming\Python\Python310\site-packages\guidance_program_executor.py:390, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
388 # call the optionally asyncronous command
389 if inspect.iscoroutinefunction(command_function):
--> 390 command_output = await command_function(*positional_args, **named_args)
391 else:
392 command_output = command_function(*positional_args, **named_args)

File ~\AppData\Roaming\Python\Python310\site-packages\guidance\library_geneach.py:145, in geneach(list_name, stop, max_iterations, min_iterations, num_iterations, hidden, join, single_call, single_call_temperature, single_call_max_tokens, single_call_top_p, _parser_context)
143 gen_obj = await parser.llm_session(strip_markers(parser.prefix), stop=stop, max_tokens=len(stop_tokens)+2, temperature=0, cache_seed=0)
144 except Exception:
--> 145 raise Exception(f"Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.")
146 if gen_obj["choices"][0]["finish_reason"] == "stop":
147 break

Exception: Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.

Exemple of Hidden generation seems to be wrong

I think that the output for hidden generation on README.me is wrong. Can someone double check it?

https://github.com/microsoft/guidance#hidden-generation

Raw text streaming with hugging face transformers?

What's the best way to take advantage of the streaming capabilities of hugging face transformers in this library? I see that streaming is all done internally but it's unclear how its exposed to the library user (me)

I figured out a few hacky methods. The only promising one so far is using list_append with a List-like object. Code needs a lot of work. append is called once at the beginning, and then __setitem__ is repeatedly called with -1, so the whole thing works more like a callback itself (output down below):

class dynlist:
    def __init__(self, callback):
        self.data = list()
        self.callback = callback
        
    def append(self, item):
        self.data.append(item)
        self.callback(self.data)
        
    def __setitem__(self, key, val):
        self.data.__setitem__(key, val)
        self.callback(self.data)
        
def update_x(session_id, data: List[str]):
    ....

prompt = guidance("...{{~gen "response" list_append=True temperature=0.4 top_p=0.9}}")
my_session_id = ...
response = dynlist(functools.partial(update_x, my_session_id))
await prompt("...", llm=llm, stream=True, async_mode=True, response=response)

list-append: 
set-item: -1  I
set-item: -1  I like
set-item: -1  I like hanging
set-item: -1  I like hanging out
set-item: -1  I like hanging out with
set-item: -1  I like hanging out with you
set-item: -1  I like hanging out with you.
set-item: -1  I like hanging out with you.
set-item: -1  I like hanging out with you.

Add more LLM support, like Claude-100k

Is your feature request related to a problem? Please describe.
I know this lib is new. It would be nice to see more LLMs to be added, especially Anthropic Claudes

Describe the solution you'd like
More LLMs support

Describe alternatives you've considered
N/A

Additional context
Claudes-100k is super useful.

Support for DSL grammars

Is your feature request related to a problem? Please describe.
Seeing that guidance has some ability to generate syntax specific outputs, I'm curious to know if the team has thoughts on generating more complex programming language grammars.

Describe the solution you'd like

When the points to a reference syntax, the llm output would be syntactically correct specific to the given language grammar.

Describe alternatives you've considered
None

Additional context
I have a bunch of domain specific languages I've built and I've been thinking that if I can basically validate each token that comes out of the stream against a language grammar, it should basically ensure that it works correctly all the time. It's kind of an extension of the valid options you have in the examples.

I'm happy to help build this but I'd need some help is outlining this.

Guidance not working within Streamlit

The bug
This is based on #19 which has been resolved. When I run the same code as that issue I get the error "No module named 'guidance'". This is despite having done pip install guidance
To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.

"""Run with `streamlit run app.py`"""
import guidance
import streamlit as st

#import asyncio
#loop = asyncio.new_event_loop()
#asyncio.set_event_loop(loop)

st.title("Test Guidance")

guidance.llm = guidance.llms.OpenAI("text-davinci-003")

prompt = guidance('''The best thing about the beach is {{~gen 'best' temperature=0.7 max_tokens=7}}''')

st.write(prompt())

System info (please complete the following information):
MacOS

Guidance Version (guidance.__version__):

Guaranteeing valid syntax JSON fails for arrays of numbers

The bug
Although producing individual fields as numbers seems to work fine (although invariably the actual parsed variable is still a string as seen below), and arrays of strings also seem to work fine, arrays of numbers do not seem work properly.

I'd love to be able to fully understand why this is happening and am also very happy to attempt to fix this if someone is able to point me in the right direction.

To Reproduce

import guidance
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

llm = guidance.llms.OpenAI("text-davinci-003")


def example():
    # we can pre-define valid option sets
    valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

    # define the prompt
    program = guidance("""The following is a character profile for an RPG game in JSON format.
```json
{
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra'}}",
    "strength": {{gen 'strength' stop=','}},
    "item_ids": [{{#geneach 'numbers' num_iterations=3}}
        {{gen 'this'}},{{/geneach}}
    ]
}```""")

    # execute the prompt
    result = program(
        description="A quick and nimble fighter.", valid_weapons=valid_weapons, llm=llm
    )

    print(result)
    print(result.variables())

    return result


if __name__ == "__main__":
    example()

The outputs of the print statements are as follows:

print(result)

The following is a character profile for an RPG game in JSON format.
```json
{
    "description": "A quick and nimble fighter.",
    "name": "Fighter",
    "age":  25,
    "armor": "leather",
    "weapon": "sword",
    "class": "warrior",
    "mantra": "Strength in numbers.",
    "strength":  8,
    "item_ids": [
         1,
         2,
         3
    ]
},
    ]
}```

print(result.variables())

{
'llm': <guidance.llms._openai.OpenAI object at 0x101205df0>,
'description': 'A quick and nimble fighter.', 
'valid_weapons': ['sword', 'axe', 'mace', 'spear', 'bow', 'crossbow'],
'name': 'Fighter',
'age': ' 25',
'armor': 'leather', 
'weapon': 'sword',
'class': 'warrior',
'mantra': 'Strength in numbers.', 
'strength': ' 8', 
'numbers': [' 1', ' 2', ' 3\n    ]\n}']
}

Most of the problem here seems to be that the generation of the final number in the array goes a bit wonky - not sure if I could tweak my prompt, or if it's a deeper issue.

System info

Mac OS:
Guidance Version 0.0.48

Unable to import module when running from a read-only filesystem

When importing guidance from a read-only filesystem, there are a couple of issues:

Opening the log.txt file (in https://github.com/microsoft/guidance/blob/main/guidance/_utils.py#L10) will always raise an exception as it can't be opened for writing. Can this be switched to use the standard python logging module instead?
Setting caching=False still tries to create the cache directory. Can be worked around by setting os.environ['XDG_CACHE_HOME'] = '/tmp' (or somewhere that is definitely writeable) before importing, but isn't ideal.

System info:

OS: AWS Lambda Python 3.9
Guidance Version: 0.0.47

Javascript Support

Is your feature request related to a problem? Please describe.
JavaScript support is currently not available.

Describe the solution you'd like
I would like to request JavaScript support.

Describe alternatives you've considered
I have considered using alternative libraries or frameworks to achieve similar functionality in JavaScript like langchainjs and my own version. However, the Guidance library stands out due to its specific features and capabilities.

Running chat.ipynb notebook with gpt-3.5-turbo throws "list index out of range"

Where?
https://github.com/microsoft/guidance/blob/b1e656f87abc543c68f3d07057a23f2f5922a878/notebooks/chat.ipynb

Steps to reproduce:

Set openAI api key
Open the chat notebook
change the openAI model from gpt-4 to gpt-3.5-turbo
Run the first 2 blocks in the notebook

Error

IndexError                                Traceback (most recent call last)
Cell In[14], line 45
      3     return options[best]
      5 create_plan = guidance('''{{#system~}}
      6 You are a helpful assistant.
      7 {{[~/system](https://file+.vscode-resource.vscode-cdn.net/Users/user/guidance/notebooks/~/system)}}
   (...)
     42 {{gen 'plan' max_tokens=500}}
     43 {{[~/assistant](https://file+.vscode-resource.vscode-cdn.net/Users/user/guidance/notebooks/~/assistant)}}''')
---> 45 out = create_plan(goal='read more books', parse_best=parse_best)
     46 out

File [/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/guidance/_program.py:216](https://file+.vscode-resource.vscode-cdn.net/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/guidance/_program.py:216), in Program.__call__(self, **kwargs)
    214     loop = asyncio.new_event_loop()
    215     loop.create_task(new_program.update_display.run()) # start the display updater
--> 216     loop.run_until_complete(new_program.execute())
    218 return new_program

File [/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/nest_asyncio.py:90](https://file+.vscode-resource.vscode-cdn.net/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/nest_asyncio.py:90), in _patch_loop..run_until_complete(self, future)
     87 if not f.done():
     88     raise RuntimeError(
     89         'Event loop stopped before Future completed.')
---> 90 return f.result()
...
      1 def parse_best(prosandcons, options):
----> 2     best = int(re.findall(r'Best=(\d+)', prosandcons)[0])
      3     return options[best]

IndexError: list index out of range

Streaming example

Can an example of streaming output as a generator be added? My use case is replacing langchain in production for a QA system.

use_clear_syntax.ipynb 'Transformers' object has no attribute 'role_start'

The bug
Bump into this error:

File "/home/lucas/miniconda3/envs/seed/lib/python3.11/site-packages/guidance/library/_role.py", line 12, in role
    partial_output(parser.program.llm.role_start(name))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Transformers' object has no attribute 'role_start'

Error in program:  'Transformers' object has no attribute 'role_start'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 21
      1 program = guidance('''
      2 {{#system}}You are an expert unix systems admin.{{/system}}
      3 
   (...)
     19 {{[~/assistant](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/lucas/seed/wip/guidance/~/assistant)}}
     20 ''', llm=chat_llm)
---> 21 out = program(os="Linux", unique=lambda x: list(set(x)), caching=False)

File [~/miniconda3/envs/seed/lib/python3.11/site-packages/guidance/_program.py:225](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/lucas/seed/wip/guidance/~/miniconda3/envs/seed/lib/python3.11/site-packages/guidance/_program.py:225), in Program.__call__(self, **kwargs)
    223     loop = asyncio.new_event_loop()
    224     loop.create_task(new_program.update_display.run()) # start the display updater
--> 225     loop.run_until_complete(new_program.execute())
    227 return new_program

File [~/miniconda3/envs/seed/lib/python3.11/site-packages/nest_asyncio.py:90](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/lucas/seed/wip/guidance/~/miniconda3/envs/seed/lib/python3.11/site-packages/nest_asyncio.py:90), in _patch_loop..run_until_complete(self, future)
     87 if not f.done():
     88     raise RuntimeError(
     89         'Event loop stopped before Future completed.')
---> 90 return f.result()

File [~/miniconda3/envs/seed/lib/python3.11/asyncio/futures.py:203](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/lucas/seed/wip/guidance/~/miniconda3/envs/seed/lib/python3.11/asyncio/futures.py:203), in Future.result(self)
    201 self.__log_traceback = False
...
     18     next_next_node=_parser_context["next_next_node"]
     19 )
     21 # send the role-end special tokens

AttributeError: 'Transformers' object has no attribute 'role_start'

Run
To Reproduce(direct copy from use_clear_syntax.ipynb)

chat_llm = guidance.llms.Transformers("stabilityai/stablelm-tuned-alpha-3b", device=1)

program = guidance('''
{{#system}}You are an expert unix systems admin.{{/system}}

{{#user~}}
What are the most common commands used in the {{os}} operating system?
{{~/user}}

{{#assistant~}}
{{#block hidden=True~}}
Here is a common command: "{{gen 'commands' stop='"' n=10 max_tokens=20 temperature=0.7}}"
{{~/block~}}

{{#each (unique commands)}}
{{@index}}. {{this}}
{{~/each}}

Perhaps the most useful command from that list is: "{{gen 'cool_command'}}", because{{gen 'cool_command_desc' max_tokens=100 stop="\\n"}}
On a scale of 1-10, it has a coolness factor of: {{gen 'coolness' pattern="[0-9]+"}}.
{{~/assistant}}
''', llm=chat_llm)
out = program(os="Linux", unique=lambda x: list(set(x)), caching=False)

System info (please complete the following information):

OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Win11 WSL2 ubuntu.
Guidance Version (guidance.__version__): 0.0.47

Cannot initialize AzureOpenAI llm

A minor bug is causing the AzureOpenAI llm __init__ to fail.

Steps to reproduce:

import guidance

guidance.llms.AzureOpenAI("gpt-3.5-turbo")

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[17], line 3
      1 import guidance
----> 3 guidance.llms.AzureOpenAI("gpt-3.5-turbo")

File [~/medical-writing/.venv/lib/python3.10/site-packages/guidance/llms/_azure_openai.py:34](https://file+.vscode-resource.vscode-cdn.net/Users/kevinblissett/medical-writing/~/medical-writing/.venv/lib/python3.10/site-packages/guidance/llms/_azure_openai.py:34), in AzureOpenAI.__init__(self, model, client_id, authority, caching, max_retries, max_calls_per_min, token, endpoint, scopes, temperature, chat_mode)
     31 if os.path.exists(self._token_cache_path):
     32     self._token_cache.deserialize(open(self._token_cache_path, 'r').read())
---> 34 self._rest_headers["X-ModelType"] = self.model_name

AttributeError: 'AzureOpenAI' object has no attribute '_rest_headers'

delete this repo

This project is seriously embarrassing. The sooner someone in management comes to their senses and gets this deleted the better off you will all be.

mere comment

Thanks ! Love the token control.

Just wish your example was Keynes v Hayek.

Error: No valid option generated in #select!

Message continues: Please post a GitHub issue since this should not happen :)

So here I am.

Raised from:

  File "[..]\guidance\_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "[..]\guidance\_program_executor.py", line 423, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "[..]\guidance\_program_executor.py", line 423, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "[..]\guidance\_program_executor.py", line 390, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "[..]\guidance\library\_select.py", line 138

Running the following from tutorial.ipynb:

import guidance
guidance.llm = guidance.llms.OpenAI("text-davinci-003")

prompt = guidance('''Is the following sentence offensive? Please answer with a single word, either "Yes", "No", or "Maybe".
Sentence: {{example}}
Answer:{{#select "answer" logprobs='logprobs'}} Yes{{or}} No{{or}} Maybe{{/select}}''')
prompt = prompt(example="I hate pineapple on pizza, so that I am ready to kill because of it") //not my actual opinion, was trying to get it to reply "Yes".
print(prompt)

option_logprobs is:

{' Yes': -1000, ' No': -1000, ' Maybe': -1000}

Select outputs duplicate strings for LLaMA-7B

My results with LLaMA-7B don't match the tutorial, please see some examples below.

import guidance
guidance.llm = guidance.llms.Transformers('Neko-Institute-of-Science/LLaMA-7B-HF', device=0)

1. Character Creator

Code (copied from readme)

# we can pre-define valid option sets
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define the prompt
character_maker = guidance("""The following is a character profile for an RPG game in JSON format.
```json
{
  "id": "{{id}}",
  "description": "{{description}}",
  "name": "{{gen 'name'}}",
  "age": {{gen 'age' pattern='[0-9]+' stop=','}},
  "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
  "weapon": "{{select 'weapon' options=valid_weapons}}",
  "class": "{{gen 'class'}}",
  "mantra": "{{gen 'mantra' temperature=0.7}}",
  "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
  "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}```""")

# generate a character
character_maker(
  id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
  description="A quick and nimble fighter.",
  valid_weapons=valid_weapons
)

Result:

The following is a character profile for an RPG game in JSON format.

{
    "id": "e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    "description": "A quick and nimble fighter.",
    "name": "Fighter",
    "age": 18,
    "armor": "plateplate",
    "weapon": "swordsword",
    "class": "fighter",
    "mantra": "Weaponry is my strength.",
    "strength": 10,
    "items": ["sword", "armor", "helmet", "sword", "fishing pole"]
}

select for the armor and weapon duplicates the output.

2. Basic Templating

Code (copied from examples)

```python prompt = guidance('''Is the following sentence offensive? Please answer with a single word, either "Yes", "No", or "Maybe". Sentence: {{example}} Answer:{{#select "answer" logprobs='logprobs'}} Yes{{or}} No{{or}} Maybe{{/select}}''') prompt = prompt(example='I hate tacos') prompt ```

Result:

Is the following sentence offensive? Please answer with a single word, either "Yes", "No", or "Maybe".
Sentence: I hate tacos
Answer: Yes Yes

Code (copied from examples)

prompt = guidance('''Generate a response to the following email:
{{email}}.
Response:{{gen "response"}}

Is the response above offensive in any way? Please answer with a single word, either "Yes" or "No".
Answer:{{#select "answer" logprobs='logprobs'}} Yes{{or}} No{{/select}}''')
prompt = prompt(email='I hate tacos')
prompt

Generate a response to the following email:
I hate tacos.
Response: I hate tacos too.
I hate tacos.
Response: I hate tacos too. I hate tacos too.
I hate tacos too.
Response: I hate tacos too. I hate tacos too. I hate tacos too.
I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate tacos too. I hate

Is the response above offensive in any way? Please answer with a single word, either "Yes" or "No".
Answer: Yes Yes

select duplicates the Yes

Add support for ggml models

Issue: we can't import ggml (ex: llama-cpp) models.

Comparison with Langchain?

I skimmed the documentation and notebooks, and I found some similarities between guidance and langchain. Can you please explain the reasons one should use guidance over langchain? More importantly, some of the features (e.g., token healing) seem not to be available for GPT-4 (or closed-source models in general). Is that true?

Select statement broken on options longer than 1 token

I suspect that the select statement is broken whenever some the options are longer than 1 token, due to a misunderstanding of how top_logprobs work. The crucial property is that the ith entry of top_logprobs contains the top logprobs assuming that the i-1 first tokens are as in the returned completion.

To exemplify the problem, say that your select has two options YesNo and NoYes for strange reasons. These are both 2 tokens and consist of tokens Yes and No, so those are the only ones biased to be output by the model. They're both weird, though, and it's much more likely that the model returns e.g. YesYes. So let's say it does.

What your select statement then should be computing is:

logprob(YesNo): The logprob of first generating Yes plus then generating No.
logprob(NoYes): The logprob of first generating No plus then generating Yes.

Your code gets the first one right. But for point two, what you're computing is rather:

X: The logprob of first generating No, plus the logprob of generating Yes after first generating Yes.

It's unclear how X will compare to the true logprob(NoYes), but the end result is that what gets selected has little relation to what the model truly finds most probably.

Note that the problem compounds the longer the options are.

Exception: Error generating stop tokens for geneach loop.

The bug

AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
...
Exception: Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.

The error goes away after seting stop=False as suggested in the error message but I think this should be added to the readme. I can do the pull request if you agree.

To Reproduce
I copy pasted the example Agents with geneach given in the readme.

import guidance
guidance.llm = guidance.llms.OpenAI('gpt-3.5-turbo')
guidance.llm.cache.clear()

prompt = guidance(
'''{{#system~}}
You are a helpful assistant
{{~/system}}
{{~#geneach 'conversation'}}
{{#user~}}
{{set 'this.user_text' (await 'user_text')}}
{{~/user}}
{{#assistant~}}
{{gen 'this.ai_text' temperature=0 max_tokens=300}}
{{~/assistant}}
{{~/geneach}}''')
prompt= prompt(user_text ='hi there')
prompt

System info (please complete the following information):

OS: Ubuntu
Guidance Version: 0.0.47 and 0.0.48

Import fails on Windows 10

Issue: Library fails to import on Windows 10 because termios has no Windows package.
Possible Solutions: Instead of termios, we must import msvcrt. For the pseudoterminals (pty), one could perhaps look at using andfoy/pywinpty.
Stacktrace:

File [{...}\lib\site-packages\guidance\library\__init__.py:1]({...}/lib/site-packages/guidance/library/__init__.py:1)
----> 1 from ._shell import shell
      2 from ._gen import gen
      3 from ._await import await_

File [{...}\lib\site-packages\guidance\library\_shell.py:7]({...}/lib/site-packages/guidance/library/_shell.py:7)
      5 import os
      6 import subprocess
----> 7 import pty
      8 import asyncio
     11 def shell(command, partial_output, safe=True):

File [{...}\lib\pty.py:12]({...}/lib/pty.py:12)
     10 import os
     11 import sys
---> 12 import tty
     14 __all__ = ["openpty","fork","spawn"]
     16 STDIN_FILENO = 0

File [{...}\lib\tty.py:5]({...}/lib/tty.py:5)
      1 """Terminal utilities."""
      3 # Author: Steen Lumholt.
----> 5 from termios import *
      7 __all__ = ["setraw", "setcbreak"]
      9 # Indexes for termios list.

ModuleNotFoundError: No module named 'termios'

Code reference: https://github.com/microsoft/guidance/blob/ff31b31177ee9ad036ef5130daada5008444c734/guidance/library/_shell.py#LL7C11-L7C11

add .save() and .load() functions to Program class to (de)serialize partially executed programs

Is your feature request related to a problem? Please describe.

For non-executed programs storing and loading from text files works fine. But for partially executed programs with custom variables an ugly workaround was necessary. Otherwise I would get the error

Error in program:  Can't set a property of a non-existing variable: conversation[-1].user_text

when trying to continue the chat after loading the text file and converting it to a program.

Describe the solution you'd like

Ideally we can just program.save(filename) and program.load(filename) and it just works without worrying about storing and loading the program.variables() separately.

Describe alternatives you've considered

I was trying to do this:

import guidance
guidance.llm = guidance.llms.OpenAI('gpt-3.5-turbo')
guidance.llm.cache.clear()

prompt = guidance(
'''{{#system~}}
You are a helpful assistant
{{~/system}}
{{~#geneach 'conversation' stop=False~}}
{{#user~}}
{{set 'this.user_text' (await 'user_text')}}
{{~/user}}
{{#assistant~}}
{{gen 'this.ai_text' temperature=0 max_tokens=300}}
{{~/assistant}}
{{~/geneach}}''', stream=False, silent=True)

prompt = prompt(user_text ='Hello there :)')

with open('prompt.txt', 'w') as f:
    f.write(str(prompt))

with open('prompt.txt', 'r') as f:
    prompt = f.read()

prompt = guidance(prompt)

prompt = prompt(user_text ='I want to travel to the moon')

But got the error mentioned above.

My solution was:


def save_prompt(prompt, filename):
    variables = prompt.variables()
    del variables['llm']
    to_store = {'text': str(prompt), 'variables': variables}
    with open(filename, 'w') as f:
        json.dump(to_store, f)
    
def load_prompt(filename):
    with open(filename, 'r') as f:
        loaded = json.load(f)
    prompt = guidance(loaded['text'], **loaded['variables'])
    return prompt

Note that the del variables['llm'] is necessary because the llm object is not serializable.

Bitsandbytes

Can there be a way to load a transformer from huggingface in bits and bytes. That could make the model loading easier. I might add this after work but it would be nice to have.

ValueError: No valid option generated in #select

The bug
ValueError: No valid option generated in #select

To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.

import os
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

import guidance

d = '''
    User: Question: Which is the best MCU film?

    Assistant: Yes. I need to search the web for the best MCU film.

    User: Question: {{question}}

    Assistant {{#select "answer"}} Yes{{or}} No{{/select}}
'''

chatgpt = guidance.llms.OpenAI("text-davinci-003")

prompt = guidance(d)

prompt = prompt(llm = chatgpt, question = "What is Google's Headquarter address?")

Traceback (most recent call last):
  File "[/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py)", line 94, in run
    await self.visit(self.parse_tree)
  File "[/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py)", line 423, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "[/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py)", line 423, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "[/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program_executor.py)", line 390, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "[/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/library/_select.py](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/library/_select.py)", line 138, in select
    raise ValueError("No valid option generated in #select! Please post a GitHub issue since this should not happen :)")
ValueError: No valid option generated in #select! Please post a GitHub issue since this should not happen :)

Error in program:  No valid option generated in #select! Please post a GitHub issue since this should not happen :)
Output exceeds the [size limit](command:workbench.action.openSettings?%5B%22notebook.output.textLineLimit%22%5D). Open the full output data [in a text editor](command:workbench.action.openLargeOutput?3f937783-9df6-4db8-8cc8-185d951c685d)---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 20
     16 chatgpt = guidance.llms.OpenAI("text-davinci-003")
     18 prompt = guidance(d)
---> 20 prompt = prompt(llm = chatgpt, question = "What is Google's Headquarter address?")

File [~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program.py:225](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/test/jack4/agents/~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/guidance/_program.py:225), in Program.__call__(self, **kwargs)
    223     loop = asyncio.new_event_loop()
    224     loop.create_task(new_program.update_display.run()) # start the display updater
--> 225     loop.run_until_complete(new_program.execute())
    227 return new_program

File [~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/nest_asyncio.py:90](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/test/jack4/agents/~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/nest_asyncio.py:90), in _patch_loop..run_until_complete(self, future)
     87 if not f.done():
     88     raise RuntimeError(
     89         'Event loop stopped before Future completed.')
---> 90 return f.result()

File [~/.pyenv/versions/3.10.0/lib/python3.10/asyncio/futures.py:201](https://file+.vscode-resource.vscode-cdn.net/Users/jackzhou/test/jack4/agents/~/.pyenv/versions/3.10.0/lib/python3.10/asyncio/futures.py:201), in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception
    202 return self._result
...
    137 if max(option_logprobs.values()) <= -1000:
--> 138     raise ValueError("No valid option generated in #select! Please post a GitHub issue since this should not happen :)")
    140 partial_output(selected_option)

ValueError: No valid option generated in #select! Please post a GitHub issue since this should not happen :)

System info (please complete the following information):

OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Mac OS
Guidance Version (guidance.__version__): guidance-0.0.47

Add a comparison between LMQL and Guidance

I would like to understand how Guidance compares to LMQL when reading the docs. Can we gather a list of additional features?

Error running demos: use_clear_syntax.ipynb

Hi,

Firstly, thanks for sharing the idea of prompt templating.

I'm playing with use_clear_syntax.ipynb in notebooks/art_of_prompt_design/use_clear_syntax.ipynb,
and trying to understand how it works with openai API using the following code:

https://gist.github.com/wooparadog/5765952394c33c5c9091f2835467eeec

But I got an exception:

Traceback (most recent call last):
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/_program_executor.py", line 434, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/_program_executor.py", line 434, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/_program_executor.py", line 394, in visit
    command_output = await command_function(*positional_args, **named_args)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/library/_system.py", line 13, in system
    return await role(name="system", hidden=hidden, _parser_context=_parser_context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wooparadog/Codes/github.com/microsoft/guidance/guidance/library/_role.py", line 4, in role
    block_content = _parser_context['block_content']
                    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable

Is there anything wrong with my test case? Thanks again

Make inferences from each iteration in each blocks available as output

I'm not sure if this is already possible, if so maybe an example could be included in the README? I was not able to find functionality that enables this.

It would be nice if all inferences in {{each}} blocks are available as output variables. Currently only the last generation is available as a named variable.

For example say the following guidance program is executed:

choices = ['odd', 'correct']

program = guidance("""
You are a helpful and terse assistant.

You have to check each of the following sentences for odd grammar and give a judgement.

{{#each sentences}}
Sentence {{@index}}: ```{{this}}```: Judgement: (Respond with 'correct' or 'odd' only!) {{select 'judgement' options=choices}}
{{/each}}""")

outputs = program(
    sentences=a_list_of_sentences,
    choices=choices
)().variables

print(outputs)

Will only output the last generated 'judgement':

{'llm': <guidance.llms._openai.OpenAI object at 0x0000025BA1A44B90>, 'sentences': ['sentence1', 'sentence2', 'sentence3'], 'choices': ['odd', 'correct'], 'judgement': 'odd'}

A much better result would be:

{'llm': <guidance.llms._openai.OpenAI object at 0x0000025BA1A44B90>, 'sentences': ['sentence1', 'sentence2', 'sentence3'], 'choices': ['odd', 'correct'], 'judgement': ['odd', 'correct', 'odd']}

Strange error on processing response?

I'm getting an error I think on trying to read the response from the llm. I'm using OpenAI via self.guide = guidance.llms.OpenAI(model='gpt-3.5-turbo'), and certain questions trigger the traceback below - not all questions, it seems to be fairly arbitrary. Looking at the code, it possibly needs a guard against empty string responses?

agent      |   File "/usr/local/lib/python3.9/site-packages/guidance/library/_gen.py", line 133, in gen
agent      |     gen_obj = await parser.llm_session(
agent      |   File "/usr/local/lib/python3.9/site-packages/guidance/llms/_openai.py", line 310, in __call__
agent      |     out = self.llm.caller(**call_args)
agent      |   File "/usr/local/lib/python3.9/site-packages/guidance/llms/_openai.py", line 192, in _library_call    
agent      |     out = add_text_to_chat_mode(out)
agent      |   File "/usr/local/lib/python3.9/site-packages/guidance/llms/_openai.py", line 57, in add_text_to_chat_mode
agent      |     c['text'] = c['message']['content']
agent      |   File "/usr/local/lib/python3.9/site-packages/openai/openai_object.py", line 71, in __setitem__        
agent      |     raise ValueError(
agent      | ValueError: You cannot set text to an empty string. We interpret empty strings as None in requests.You may set {
agent      |   "finish_reason": "stop",
agent      |   "index": 0,
agent      |   "message": {
agent      |     "content": "",
agent      |     "role": "assistant"
agent      |   }
agent      | }.text = None to delete the property

KeyError: "Command/variable 'first_question' not found!

The bug
KeyError: "Command/variable 'first_question' not found!

To Reproduce

import guidance
import re
guidance.llm = guidance.llms.OpenAI("gpt-4")
role_simulator = guidance('''
{{#system~}}
You are a helpful assistant
{{~/system}}
                                           
{{#user~}}
You will answer the user as {{role}} in the following conversation. At every step, I will provide you with the user input, as well as a comment reminding you of your instructions. Never talk about the fact that you are an AI, even if the user asks you. Always answer as {{role}}.
{{#if first_question}}You can also start the conversation.{{/if}}
{{~/user}}
                                           
{{~! The assistant either starts the conversation or not, depending on if this is the first or second agent }}
{{#assistant~}}
Ok, I will follow these instructions.
{{#if first_question}}Let me start the conversation now:
{{role}}: {{first_question}}{{/if}}
{{~/assistant}}

{{~! Then the conversation unrolls }}
{{~#geneach 'conversation'}}
{{#user~}}
User: {{set 'this.input' (await 'input')}}
Comment: Remember, answer as a {{role}}. Start your utterance with {{role}}:
{{~/user}}

{{#assistant~}}
{{gen 'this.response' temperature=0 max_tokens=300}}
{{~/assistant}}
{{~/geneach}}''')
                          
republican = role_simulator(role='Republican')
democrat = role_simulator(role='Democrat')

first_question = '''What do you think is the best way to stop inflation?'''
republican = republican(input=first_question, first_question=None)
democrat = democrat(input=republican["conversation"][-2]["response"].strip('Republican: '), first_question=first_question)

for i in range(2):
    republican = republican(input=democrat["conversation"][-2]["response"].replace('Democrat: ', ''))
    democrat = democrat(input=republican["conversation"][-2]["response"].replace('Republican: ', ''))
print('Democrat: ' + first_question)
for x in democrat['conversation'][:-1]:
    print('Republican:', x['input'])
    print()
    print(x['response'])

System info (please complete the following information):

OS: Ubuntu
Guidance Version: 0.0.47

#select raises ValueError if its child contents contain whitespace

Does not work

Assistant: {{#select "answer"}} Yes{{or}} No{{/select}}

Error: ValueError: No valid option generated in #select! Please post a GitHub issue since this should not happen :)

Works

Assistant: {{#select "answer"}}Yes{{or}}No{{/select}}

Originally posted by @PenutChen in #40 (comment)

ValueError: No valid option generated in #select!

Error was raised running the following:

prompt = guidance('''Is the following sentence offensive? Please answer with a single word, either "Yes", "Nein", or "Vielleicht".
Sentence: {{example}}
Answer:{{#select "answer" logprobs='logprobs'}} Ja{{or}} Nein{{or}} Vielleicht{{/select}}''')
prompt = prompt(example='I hate tacos.')
prompt

I tried multiple variations. When the answer format specified at the beginning of the prompt and the options inside the select tag vary then very often this error happens.

Generated text not respecting pattern

I'm trying to generate some text using a modified version of the main example and am running into an issue where the model is picking a token containing a non-numerical character for the 'chapter' variable despite that not matching the pattern. Here's my code:

prompt = '''
Tweak this proverb to apply to machine learning model instructions instead.

{{proverb}}
- {{book}} {{chapter}}:{{verse}}

UPDATED
Where there is no guidance{{gen 'rewrite' stop='- '}}
- GPT {{gen 'chapter' pattern='[0-9]' max_tokens=1}}:{{gen 'verse' pattern='[0-9]+' stop='\\n'}}
'''[1:-1]

model = guidance.llms.Transformers('stabilityai/stablelm-base-alpha-3b', device=0)
program = guidance(prompt, llm = model)
executed_program = program(
    proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
    book="Proverbs",
    chapter=11,
    verse=14,
)

print(executed_program)
print(executed_program["chapter"])

and the generated output:

Tweak this proverb to apply to machine learning model instructions instead.

Where there is no guidance, a people falls,
but in an abundance of counselors there is safety.
- Proverbs 11:14

UPDATED
Where there is no guidance, a people falls, but in an abundance of counselors there is safety.

- GPT 2::2
2:

I should also add that I saw a similar issue earlier today with pattern not working as intended so I made sure I was on the latest version just before running this just now.

Support for huggingface/text-generation-inference

This library from HF is pretty great and I get use out of it in production settings for LLMs. Would love to figure out how to integrate a system like this for LLM safety with it so I can use HF models, get dynamic batching, and be able to stream tokens with the guidance library!

Running guidance within Streamlit

Hello,

Like #11 , so I suppose the resolution will be similar. Creating this for future reference for other searches.

Pure intuition, Streamlit spawns worker threads to run the Python Script in, which may be decorrelated from any Asyncio loop you would start on initialization on the main thread through nest_asyncio. Though just creating/setting an asyncio loop in the script also doesn't register correctly :/

Using guidance==0.0.44 and streamlit==1.22.0

"""Run with `streamlit run app.py`"""
import guidance
import streamlit as st

#import asyncio
#loop = asyncio.new_event_loop()
#asyncio.set_event_loop(loop)

st.title("Test Guidance")

guidance.llm = guidance.llms.OpenAI("text-davinci-003")

prompt = guidance('''The best thing about the beach is {{~gen 'best' temperature=0.7 max_tokens=7}}''')

st.write(prompt())

  File "guidance\__init__.py", line 15, in <module>
    nest_asyncio.apply()
  File "nest_asyncio.py", line 16, in apply
    loop = loop or asyncio.get_event_loop()
  File "nest_asyncio.py", line 45, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
  File "lib\asyncio\events.py", line 656, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'ScriptRunner.scriptThread'.

Have a nice day

Pattern Guides (LLaMA-7B)

Pattern guides/regex patterns don't seem to have quite the impact one would expect

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Neko-Institute-of-Science/LLaMA-7B-HF")
model = AutoModelForCausalLM.from_pretrained("Neko-Institute-of-Science/LLaMA-7B-HF")
llama = guidance.llms.Transformers(model=model, tokenizer=tokenizer, device=5)
statement_gen = guidance("""
Today we want to say that our new tech company with the name {{gen 'companyname' max_tokens=10 pattern='[a-zA-Z]{8}'}} went public under the ticker {{gen 'ticker' pattern='[A-Z]{4}' temperature=0.3}}
""")
statement_gen(llm=llama)

I would expect this to output

Today we want to say that our new tech company with the name NameOfCompany went public under the ticker TCKR

where NameOfCompany matches [a-zA-Z]{8}, i.e. consisting of letters, no whitespace and exactly eight characters.

Actual output:

Today we want to say that our new tech company with the name ofTechno 2000 is going went public under the ticker TKOO

, i.e. companyname = ofTechno 2000 is going (bad) and ticker = TKOO (good)

Do I misunderstand pattern guides or is this an issue?

Select block returning options twice

When using {{select}}, the same response is returned twice. I have tested it with text-davinci-003, so not sure if it applies to other local models.

Example 1:

Here,
Expected output: Anachronism: Yes
Actual output: Anachronism: Yes Yes

Example 2:

Here,

Expected output: armor: leather
Actual output: armor: leatherleather
Expected output: weapon: sword
Actual output: weapon: swordsword

Import fails for v0.0.42

I'm on Python 3.11

pip3 install guidance==0.0.42

Then a simple script with

import guidance

Gets error

Traceback (most recent call last):
  File "scrubbed...", line 1, in <module>
    import guidance
  File "/opt/homebrew/lib/python3.11/site-packages/guidance/__init__.py", line 7, in <module>
    from ._program import Program
  File "/opt/homebrew/lib/python3.11/site-packages/guidance/_program.py", line 19, in <module>
    from . import library
  File "/opt/homebrew/lib/python3.11/site-packages/guidance/library/__init__.py", line 11, in <module>
    from ._geneach import geneach
  File "/opt/homebrew/lib/python3.11/site-packages/guidance/library/_geneach.py", line 112
    partial_output( block_content = parser_context
                                    ^^^^^^^^^^^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?

I run pip3 install guidance==0.0.41 and import works.

Docs building instructions

I'm very new to Python and tooling surrounding it, and making use of the library without documentation online is difficult.

A see that the repo contains /docs folder with docs that are supposed to be build with sphinx, but I can not get it to work.
I installed sphinx-build and a bunch of libraries it said it can not import, but stuck at

PandocMissing in example_notebooks\anachronism.ipynb:
Pandoc wasn't found.
Please check that pandoc is installed:
https://pandoc.org/installing.html

even though I did both installed pandoc executable and did pip install pandoc.

Would love to know what is the process here, and the steps to follow.

OS: Windows
Lib verions 0.0.48

guidance-ai / guidance Goto Github PK

guidance's Introduction

Install

Features

Write pure Python, with additional LM functionality.

Constrain generation with selects (i.e., sets of options), regular expressions, and context-free grammars, as well as with pre-built components (e.g., substring, json).

Call and deploy tools easily with automatic interleaving of control and generation.

Get high compatibility—execute a single Guidance program on many backends

Gain speed with stateful control + generation functions—no need for intermediate parsers.

Token healing

Rich templates with f-strings.

Abstract chat interface that uses correct special tokens for any chat model.

Easy-to-write reusable components.

A library of pre-built components

Streaming support, also integrated with Jupyter notebooks.

Multi-modal support.

Example notebooks

Basic generation

Constrained generation

Select (basic)

Regular expressions

Regex to constrain generation

Regex as stopping criterion

Context-free grammars

Capture a generation

Stateful control + generation

State in immutable objects

Stateful {guidance} functions

Example: ReAct

Example: Changing intermediate step of a Chat session

Automatic interleaving of control and generation: tool use

Gsm8k example

Automatic call grammar for @guidance functions

Text, not tokens

Fast

Integrated stateful control is faster

Guidance acceleration

Loading models

llama.cpp

Transformers

Vertex AI

OpenAI

guidance's People

Contributors

Stargazers

Watchers

Forkers

guidance's Issues

Unable to import guidance due to unspecified dependency.

Unable to execute program due to missing guidance.llm attribute.

Error in program: Error generating stop tokens for geneach loop. Perhaps you are outside of role tags (assistant/user/system)? If you don't want the loop to check for stop tokens, set stop=False or set num_iterations.

1. Character Creator

2. Basic Templating

Example 1:

Example 2:

Recommend Projects

Recommend Topics

Recommend Org

Jobs

Stateful `{guidance}` functions

Unable to `import guidance` due to unspecified dependency.

Unable to execute program due to missing `guidance.llm` attribute.