microsoft / lida

Automatic Generation of Visualizations and Infographics using Large Language Models

Home Page: https://microsoft.github.io/lida/

License: MIT License

Jupyter Notebook 31.41% Python 11.16% HTML 27.45% JavaScript 29.99%

datavisualization llm openai visualization cohere openai-api palm2 hacktoberfest

lida's Introduction

LIDA: Automatic Generation of Visualizations and Infographics using Large Language Models

LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Huggingface). Details on the components of LIDA are described in the paper here and in this tutorial notebook. See the project page here for updates.

Note on Code Execution: To create visualizations, LIDA generates and executes code. Ensure that you run LIDA in a secure environment.

Features

LIDA treats visualizations as code and provides a clean api for generating, executing, editing, explaining, evaluating and repairing visualization code.

Data Summarization
Goal Generation
Visualization Generation
Visualization Editing
Visualization Explanation
Visualization Evaluation and Repair
Visualization Recommendation
Infographic Generation (beta) # pip install lida[infographics]

from lida import Manager, llm

lida = Manager(text_gen = llm("openai")) # palm, cohere ..
summary = lida.summarize("data/cars.csv")
goals = lida.goals(summary, n=2) # exploratory data analysis
charts = lida.visualize(summary=summary, goal=goals[0]) # generate and execute visualization code

Getting Started

Set up your Python environment and verify that it runs Python 3.10 or higher (preferably, use Conda). Install the library via pip.

pip install -U lida

LIDA depends on llmx and openai. If you had these libraries installed previously, consider updating them.

pip install -U llmx openai

Once requirements are met, set up your api key. Learn more about setting up keys for other LLM providers here.

export OPENAI_API_KEY=<your key>

Alternatively, you can install the library in dev mode by cloning this repo and running pip install -e . in the repository root.

Web API and UI

LIDA comes with an optional bundled ui and web api that you can explore by running the following command:

lida ui  --port=8080 --docs

Then navigate to http://localhost:8080/ in your browser. To view the web api specification, add the --docs option to the cli command, and navigate to http://localhost:8080/api/docs in your browser.

The fastest and recommended way to get started after installation will be to try out the web ui above or run the tutorial notebook.

Building the Web API and UI with Docker

The LIDA web api and ui can be set up using docker and the command below (ensure that you have docker installed and that you have set your OPENAI_API_KEY environment variable).

docker compose up

Data Summarization

Given a dataset, generate a compact summary of the data.

from lida import Manager

lida = Manager()
summary = lida.summarize("data/cars.json") # generate data summary

Goal Generation

Generate a set of visualization goals given a data summary.

goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background") # generate goals

Add a persona parameter to generate goals based on that persona.

Visualization Generation

Generate, refine, execute and filter visualization code given a data summary and visualization goal. Note that LIDA represents visualizations as code.

# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib") # seaborn, ggplot ..

Visualization Editing

Given a visualization, edit the visualization using natural language.

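# modify chart using natural language
from lida import TextGenerationConfig
code, library = charts[0].code, "matplotlib"  # code of the chart generated above
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code,  summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)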

Visualization Explanation

Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code)

# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)

Visualization Evaluation and Repair

Given a visualization, evaluate it to find repair instructions (which may be human authored or generated), then repair the visualization.

evaluations = lida.evaluate(code=code,  goal=goals[i], library=library)

Visualization Recommendation

Given a dataset, generate a set of recommended visualizations.

recommendations = lida.recommend(code=code, summary=summary, n=2,  textgen_config=textgen_config)

Infographic Generation [WIP]

Given a visualization, generate a data-faithful infographic. This method should be considered experimental, and uses stable diffusion models from the peacasso library. You will need to run pip install lida[infographics] to install the required dependencies.

infographics = lida.infographics(visualization = charts[0].raster, n=3, style_prompt="line art")

Using LIDA with Locally Hosted LLMs (HuggingFace)

LIDA uses the llmx library as its interface for text generation. llmx supports multiple local models including HuggingFace models. You can use the huggingface models directly (assuming you have a gpu) or connect to an openai compatible local model endpoint e.g. using the excellent vllm library.

Using HuggingFace Models Directly

!pip3 install --upgrade llmx==0.0.17a0

# Restart the colab session

from lida import Manager
from llmx import  llm
text_gen = llm(provider="hf", model="uukuguy/speechless-llama2-hermes-orca-platypus-13b", device_map="auto")
lida = Manager(text_gen=text_gen)
# now you can call lida methods as above e.g.
summary = lida.summarize("data/cars.csv") # ....

Using an OpenAI Compatible Endpoint e.g. vllm server

from lida import Manager, TextGenerationConfig , llm

model_name = "uukuguy/speechless-llama2-hermes-orca-platypus-13b"
model_details = [{'name': model_name, 'max_tokens': 2596, 'model': {'provider': 'openai', 'parameters': {'model': model_name}}}]

# assuming your vllm endpoint is running on localhost:8000
text_gen = llm(provider="openai",  api_base="http://localhost:8000/v1", api_key="EMPTY", models=model_details)
lida = Manager(text_gen = text_gen)

Important Notes / Caveats / FAQs

LIDA generates and executes code based on provided input. Ensure that you run LIDA in a secure environment with appropriate permissions.
LIDA currently works best with datasets that have a small number of columns (<= 10). This is mainly due to the limited context size for most models. For larger datasets, consider preprocessing your dataset to use a subset of the columns.
LIDA assumes the dataset exists and is in a format that can be loaded into a pandas dataframe, for example a csv file or a json file with a list of objects. In practice, the right dataset may need to be curated and preprocessed to ensure that it is suitable for the task at hand.
Smaller LLMs (e.g., OSS LLMs on Huggingface) have limited instruction following capabilities and may not work well with LIDA. LIDA works best with larger LLMs (e.g., OpenAI GPT 3.5, GPT 4).
How reliable is the LIDA approach? The LIDA paper describes experiments that evaluate the reliability of LIDA using a visualization error rate metric. With the current version of prompts, data summarization techniques, preprocessing/postprocessing logic and LLMs, LIDA has an error rate of < 3.5% on over 2200 visualizations generated (compared to a baseline of over 10% error rate). This area is work in progress.
Can I build my own apps with LIDA? Yes! You can either use the python api directly in your app or setup a web api endpoint and use the web api in your app. See the web api section for more details.
How is LIDA related to OpenAI Code Interpreter: LIDA shares several similarities with code interpreter in the sense that both involve writing and executing code to address user intent. LIDA differs in its focus on visualization, providing a modular api for developer reuse and providing evaluation metrics on the visualization use case.

Naturally, some of the limitations above could be addressed by a much welcomed PR.
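The column-count caveat above suggests trimming wide datasets before summarizing them. A minimal sketch of that preprocessing step, using only the standard library, might look like the following. The helper name keep_columns and the sample column names are assumptions made for illustration; they are not part of LIDA.

```python
import csv
import io


def keep_columns(src, dst, columns):
    """Copy CSV rows from src to dst, keeping only the named columns."""
    reader = csv.DictReader(src)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    # extrasaction="ignore" drops every column that is not listed in `columns`.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Tiny in-memory dataset standing in for a wide CSV file.
wide = io.StringIO("name,mpg,hp,notes\nA,30,90,x\nB,22,150,y\n")
narrow = io.StringIO()
keep_columns(wide, narrow, ["name", "mpg"])
subset_csv = narrow.getvalue()
```

In practice the two StringIO objects would be replaced by files opened with open(path, newline=""), and the path of the narrowed file would then be passed to lida.summarize.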

Community Examples Built with LIDA

LIDA + Streamlit: lida-streamlit

Documentation and Citation

A short paper describing LIDA (Accepted at ACL 2023 Conference) is available here.

@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}

LIDA builds on insights in automatic generation of visualization from an earlier paper - Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.


lida's Issues

Improved Prompting

What

Currently the system prompts used in LIDA modules are integrated in the class files. Given the fluid nature of LLMs today and the need for low level prompt specification, this initial implementation works.

However it would be valuable to provide some sort of templating system where part of the prompt can be specified by the user and loaded as an option.

Some potential approaches

load prompt template files that either augment or replace existing prompts
prompt templates for specific LLMs
scaffold template files for specific languages/grammars

Issue in Plot Generation in 2nd iteration against csv file

I'm having an issue with the lida package. I upload a csv file, generate some goals with lida, and then use the visualize method to create a plot with the seaborn library.

#Issue
Scenario 1:
When I create a plot for my own defined goal, it works fine and the output is clear.
Scenario 2:
If I first create a plot for a goal from the goals list that lida generated and then create a plot for my own defined goal, I get an output, but it is not clearly visible: I can see only two dots instead of the graph that was generated in the first scenario.

Error processing csv

Hi!

Uploading the following .csv gives an "An error occurred. Please try again later." message. It looks like it's struggling with the CSV; the error message is:

Error processing file: Error tokenizing data. C error: Expected 4 fields in line 257, saw 5

The last cell has unescaped quotes in it, which is probably the issue.

results.csv
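A tokenizing error of this kind ("Expected 4 fields ..., saw 5") can be reproduced and located before uploading. The sketch below uses the standard csv module to list rows whose field count differs from the header. The function name find_malformed_rows and the sample data are made up for illustration, and the csv module's parsing rules are not guaranteed to match the pandas C parser in every case.

```python
import csv
import io


def find_malformed_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    bad = []
    for row in reader:
        # Skip blank lines; flag rows with too many or too few fields.
        if row and len(row) != len(header):
            bad.append((reader.line_num, len(row)))
    return bad


# The last row has a quote in the middle of an unquoted field, so the comma
# inside the quoted phrase is treated as a field separator.
sample = 'id,name,score,comment\n1,ann,3,ok\n2,bob,4,he said "yes, fine"\n'
problems = find_malformed_rows(sample)
```

Wrapping the whole cell in double quotes and doubling any inner quotes ("he said ""yes, fine""") is the usual way to make such a row parse as a single field.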

The model did not return a valid JSON object

This problem occurs when using the LLAMA-2-7B-FP16 model. Can someone help solve it? Thanks!

Support for Vega-lite

Are there any plans to support declarative visualisation specifications such as Vega-lite or LookML?

NameError: name 'OpenAITextGenerator' is not defined

Following the instructions:
from lida import Manager, TextGenerationConfig , llm
lida = Manager(text_gen = llm("openai", api_key=OPENAI_API_KEY))
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-4.0", use_cache=True)

NameError Traceback (most recent call last)
in <cell line: 1>()
----> 1 lida = Manager(text_gen = llm("openai", api_key=OPENAI_API_KEY))
2 textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)

/usr/local/lib/python3.10/dist-packages/llmx/generators/text/textgen.py in llm(provider, **kwargs)
50
51 if provider.lower() == "openai":
---> 52 return OpenAITextGenerator(**kwargs)
53 elif provider.lower() == "palm":
54 return PalmTextGenerator(**kwargs)

NameError: name 'OpenAITextGenerator' is not defined

Improvements to Code Executor

What

Current code executor class is fairly simple. It attempts to clean/filter the code, and based on the specified library (e.g. matplotlib, seaborn, altair etc), we compile the code (eval) and retrieve a chart object.

There are a few areas that could be improved:

How

Module not found detection and resolution. In some cases, the generated code may require modules that are not installed (e.g., maps etc).
It would be good to implement a strategy to explicitly discover and address this (e.g., install the library if it is on a list of preapproved libraries) or return an actionable error message.
Sandboxed code execution: Code is executed on the local machine ❌ 💀 . It would be good to figure out some sort of (lightweight) docker sandboxing setup where we set up a docker env with installed deps for code execution.
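The module-not-found detection idea above could be prototyped by statically inspecting the generated code before executing it. The following is a minimal sketch under stated assumptions: the PREAPPROVED set and both helper names are hypothetical, nothing here reflects how LIDA's executor is actually implemented, and no installation is performed.

```python
import ast
import importlib.util

# Hypothetical allowlist of packages that would be safe to install on demand.
PREAPPROVED = {"geopandas", "plotly", "wordcloud"}


def imported_modules(code):
    """Collect top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def missing_modules(code):
    """Return imported modules that cannot be found in the current environment."""
    return sorted(m for m in imported_modules(code)
                  if importlib.util.find_spec(m) is None)


generated = "import json\nimport not_a_real_pkg_xyz\nfrom os import path\n"
missing = missing_modules(generated)
# Only modules on the allowlist would be candidates for automatic installation;
# anything else could be reported back as an actionable error message.
installable = [m for m in missing if m in PREAPPROVED]
```

A static check like this only sees import statements written literally in the code; imports performed through importlib or exec would be missed, and the import name does not always equal the pip package name.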

Large dataset summary exceeds context size?

I uploaded a fairly large dataset (> 40 columns), and it seems the summary and response size exceeded the maximum context size.

Would it be possible to compress the summary size or let the users choose which columns are relevant to generate charts?

Alternative Free LLMs

I’m interested in using the LIDA project for generating automated visualizations. However, I’m unable to obtain an OpenAI key. I was wondering if there are any free Large Language Models (LLMs) that could be used as an alternative. Specifically, I’m curious about the possibility of using Bing AI or Google’s Bard. I’ve also tried a local Hugging Face LLM, but it doesn’t seem to be as beneficial. Any guidance or suggestions would be greatly appreciated.

Request: Visualise datasets longer than context length

Currently the data needs to be 'loaded' into the context window as a pandas dataframe. Is there any scope for extending this project to deal with large scale tabular databases that use pyspark dataframes, sql, etc.?

infographics AttributeError: 'NoneType' object has no attribute 'generate'

I am using the demo from the docs for a local LLM.

Ggg

swift
import UIKit

class SocialMediaAccount {
let platform: String
let username: String
let password: String

init(platform: String, username: String, password: String) {
    self.platform = platform
    self.username = username
    self.password = password
}

}

class User {
var socialMediaAccounts: [SocialMediaAccount] = []

func addSocialMediaAccount(account: SocialMediaAccount) {
    socialMediaAccounts.append(account)
}

func generateUserCode() -> String {
    // Generate a unique user code here
    let userCode = "ABC123"
    return userCode
}

}

// Usage example
let user = User()

// Add social media accounts
let facebookAccount = SocialMediaAccount(platform: "Facebook", username: "example_user", password: "password123")
let twitterAccount = SocialMediaAccount(platform: "Twitter", username: "example_user", password: "password123")
user.addSocialMediaAccount(account: facebookAccount)
user.addSocialMediaAccount(account: twitterAccount)

// Generate user code
let userCode = user.generateUserCode()
print("User code: \(userCode)")

Dataset Finder (Data Discovery Tool)

What

Data analysis and exploration typically begins with the assumption that the right dataset exists. For many scenarios, this assumption holds (e.g., organizational data already exists as a tidy csv or json file). However, for other use cases, the right dataset may not exist and needs to be found.

The high level goal of this functionality is

provide a set of approaches to finding data given some query or representation of the user's intent.

How

Supported approaches may include the following:

Heuristic strategy: define a work flow for identifying datasets that may be relevant. For example, support fixed providers like
- data.gov
- GHO https://www.who.int/data/gho/info/gho-odata-api
- github to find csvs, or json files relevant to queries.
Live agent strategy: define some mechanism that leverages web search in identifying related relevant datasets.

Possibly start off with a base DataFinder class (find method), HeuristicsDataFinder subclass, AgentDataFinder subclass.
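One way to picture the proposed class layout is sketched below. This is an assumption-heavy illustration, not an existing LIDA API: the DatasetCandidate structure, the keyword-overlap scoring, and the example.org URLs are all invented for the example, and the agent strategy is left unimplemented.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DatasetCandidate:
    """A dataset that might match the user's intent (hypothetical structure)."""
    title: str
    url: str
    source: str


class DataFinder(ABC):
    @abstractmethod
    def find(self, query, limit=5):
        """Return up to `limit` candidates relevant to `query`."""


class HeuristicsDataFinder(DataFinder):
    """Searches a fixed, pre-indexed catalog by keyword overlap with the title."""

    def __init__(self, catalog):
        self.catalog = list(catalog)

    def find(self, query, limit=5):
        terms = set(query.lower().split())
        scored = []
        for item in self.catalog:
            overlap = len(terms & set(item.title.lower().split()))
            if overlap:
                scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:limit]]


class AgentDataFinder(DataFinder):
    """Placeholder for a web-search-backed strategy; not implemented here."""

    def find(self, query, limit=5):
        raise NotImplementedError("live agent search is out of scope for this sketch")


# Placeholder catalog entries; the URLs do not point at real datasets.
catalog = [
    DatasetCandidate("Global life expectancy by country", "https://example.org/life.csv", "GHO"),
    DatasetCandidate("Vehicle fuel economy", "https://example.org/cars.csv", "data.gov"),
]
finder = HeuristicsDataFinder(catalog)
results = finder.find("fuel economy of cars")
```

Keeping find as the single abstract method means callers can swap the heuristic and agent strategies without changing how results are consumed.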

p.s. if you are interested in working on this, please share thoughts on your general approach for discussion and comment.

Plotly issue in visualize

Hi,

I'm having issues when using 'plotly' as the library.

Example :

i = 0
library = 'plotly'
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)
plot_raster(charts[0].raster)

Returns:

ValueError: Unsupported library. Choose from 'matplotlib', 'seaborn', 'plotly', 'bokeh', 'ggplot', 'altair'.

Openai ChatCompletion not found

The latest openai version (1.1.1) doesn't have openai.ChatCompletion. Either a previous version of the library is needed, or lida has to be updated to use the method from the latest documentation.
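A quick way to tell which situation an environment is in is to look at the installed package version: the module-level openai.ChatCompletion interface belongs to the pre-1.0 releases. The helpers below are a small sketch with hypothetical names; they only inspect version metadata and do not import openai.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_legacy_chat_completion(version):
    """True when an openai package version predates the 1.0 client rewrite."""
    major = int(version.split(".")[0])
    return major < 1


legacy = has_legacy_chat_completion("0.28.1")
current = has_legacy_chat_completion("1.1.1")
```

Calling installed_version("openai") and passing the result to has_legacy_chat_completion indicates whether code written against openai.ChatCompletion can be expected to run unchanged.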

Access to the code generated to plot

Hi, is there any way to get access to the code that LIDA generates and uses to plot?

LIDA rename columns

LIDA renames columns: special characters are replaced with _

Is it possible to use the native model ChatGLM?

Notebook Example

via query generate issue in generation graphic in the example.

Hi

fail

Haven't been able to generate a visualization with Altair

Hello,
I've been playing around with the toolkit for a couple days and it's working very well with the seaborn Library. However I would like to generate VegaLite specifications so I switched to the altair library. However, I've been having issues with generating visualizations through this library. I am running the following code snippet to produce a visualization.

new_goal = Goal(index=1, question='Show the relationship between budget and rating', visualization='scatterplot', rationale='Both are quantitative attributes so a scatterplot is necessary to find the correlation')
goals.append(new_goal)
temp = Summary(name=summary['name'], file_name=summary['file_name'], dataset_description=summary['dataset_description'], field_names=summary['field_names'], fields=summary['fields'])
library="altair"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=temp, goal=new_goal, textgen_config=textgen_config, library=library) 
plot_raster(charts[0].raster)

The code block produces the following error:

Is there something wrong with how I'm switching the library to altair?

edited_charts[0].raster is empty

Environment: Colab

The code is shown below:
`from lida import Manager, TextGenerationConfig, llm
from lida.utils import plot_raster
from lida import Manager

lida = Manager()
summary = lida.summarize("../data/cars.csv")  # generate data summary
print("summary:", summary)
# convert the dictionary to a JSON string
import json

summary_string = json.dumps(summary)

# print the JSON string
print(summary_string)

goals = lida.goals(summary, n=5)
print("goals", goals)

# generate visualization code
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib")
print(charts)
code = charts[0].code
# draw the chart
# this essentially decodes the image and displays it
plot_raster(charts[0].raster)

library = "seaborn"
# code = charts[0].code
textgen_config = TextGenerationConfig(n=1, temperature=0, use_cache=True)
instructions = ["zoom in 50%", "make the chart height and width equal", "change the color of the chart to red",
                "translate the chart to spanish"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library,
                          textgen_config=textgen_config)
plot_raster(edited_charts[0].raster)

`
The error is reported as follows

Single User functionality

To my understanding, LIDA functions as a single-threaded application, serving one user per session. We’re exploring scaling LIDA to an enterprise level, enabling multi-user support simultaneously.

Does Microsoft offer any existing frameworks or solutions to transition LIDA from single-threaded to multi-threaded architecture?

Furthermore, as we plan to use Azure Redis Cache Services, how would this scaling impact our caching strategy? Are there established solutions for managing cache effectively in a multi-user, multi-threaded environment?

What other information on this topic can you provide to help us design a solution?

Has anyone got the error 'not enough values to unpack' on a Mac system?

I get an error on my Mac, but it runs fine on Ubuntu. Has anyone got the same error message?

The error message is as follows:
Backend MacOSX is interactive backend. Turning interactive mode on.
Traceback (most recent call last):
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/main.py", line 39, in
cli.main()
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="main")
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/jinzhiliang/.vscode/extensions/ms-python.python-2023.20.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/Users/jinzhiliang/githubRepo/lida/tests/test_modules.py", line 81, in
test_summarizer()
File "/Users/jinzhiliang/githubRepo/lida/tests/test_modules.py", line 24, in test_summarizer
summary_enrich = lida.summarize(cars_data_url,
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/lida/components/manager.py", line 131, in summarize
return self.summarizer.summarize(
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/lida/components/summarizer.py", line 141, in summarize
data_summary = self.encrich(
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/lida/components/summarizer.py", line 104, in encrich
response = text_gen.generate(messages=messages, config=textgen_config)
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/llmx/generators/text/openai_textgen.py", line 52, in generate
prompt_tokens = num_tokens_from_messages(messages)
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/llmx/utils.py", line 22, in num_tokens_from_messages
encoding = tiktoken.encoding_for_model(model)
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/tiktoken/model.py", line 67, in encoding_for_model
return get_encoding(model_encoding_name)
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/tiktoken/registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
mergeable_ranks = load_tiktoken_bpe(
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/tiktoken/load.py", line 117, in load_tiktoken_bpe
return {
File "/Users/jinzhiliang/anaconda3/envs/py10_chatgml/lib/python3.10/site-packages/tiktoken/load.py", line 119, in
for token, rank in (line.split() for line in contents.splitlines() if line)
ValueError: not enough values to unpack (expected 2, got 1)

It seems the error occurs at the point shown in the following picture, but why?

TypeError: Manager.__init__() got an unexpected keyword argument 'llm'; also, the text_gen llm function does not work correctly and I had to change it

from lida import llm , Manager
from llmx.generators.text.hf_textgen import HFTextGenerator
text_gen = HFTextGenerator(provider="hf", models="uukuguy/speechless-llama2-hermes-orca-platypus-13b",device_map="auto")
print("success loading the model") # falcon7b ran out of memory (OOM); it needs a larger GPU allocation
lida = Manager(llm=text_gen)

summary = lida.summarize("order.csv") # summarize the dataset

How can I use a deployed ChatGPT model in lida?

Evaluation of HF Models with LIDA

What

Local models (e.g. LLAMA based models available via HuggingFace in the 7B or 13B size classes) offer multiple benefits (e.g., can be finetuned/adapted, run locally etc).
While LIDA has been mostly tested with OpenAI models, more work is needed to test workflows and performance for HF models.

Work Items

Test a set of local HF models either directly with llmx or with LIDA, systematically document bugs and suggest fixes via PRs

Feat request: Platform based visualizations

Current visualizations are standard and not customizable across different screen sizes. It would be great to have this parameter additionally to differentiate Mobile versus Desktop visualizations to make them more legible on smaller screen sizes.

tutorial notebook issue

I want to learn how to use lida, so I ran the tutorial jupyter notebook, when I ran the command

i = 0
library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)
# plot raster image of chart
plot_raster(charts[0].raster)

I got the below issue:

Use the code template below 
 
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
<imports>
# solution plan
# i.  ..
def plot(data: pd.DataFrame):

    # ii. convert Retail_Price and Dealer_Cost to log scale
    data['Retail_Price_log'] = data['Retail_Price'].apply(lambda x: np.log(x))
    data['Dealer_Cost_log'] = data['Dealer_Cost'].apply(lambda x: np.log(x))

    # iii. plot the distribution of Retail_Price and Dealer_Cost
    plt.figure(figsize=(10, 6))
    sns.histplot(data=data, x='Retail_Price_log', color='red', alpha=0.5, label='Retail_Price')
    sns.histplot(data=data, x='Dealer_Cost_log', color='blue', alpha=0.5, label='Dealer_Cost')
    plt.xlabel('Log Scale')
    plt.ylabel('Frequency')
    plt.legend()
    plt.title('What is the distribution of Retail_Price and Dealer_Cost?', wrap=True)
    return plt;

chart = plot(data) # data already contains the data to be plotted. Always include this line. No additional code beyond this line.. DO NOT modify the rest of the code template.
****
 name 'np' is not defined
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_1490/2717688829.py in <module>
      4 charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)
      5 # plot raster image of chart
----> 6 plot_raster(charts[0].raster)

IndexError: list index out of range
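In the traceback above, the generated code failed ("name 'np' is not defined"), so visualize returned an empty list and charts[0] raised the IndexError. A caller can guard against that case before plotting. The sketch below is illustrative only: first_raster and FakeChart are hypothetical names, and FakeChart merely stands in for a chart object with a raster attribute.

```python
def first_raster(charts):
    """Return the raster of the first chart, or None when nothing was generated."""
    if not charts:
        return None
    return getattr(charts[0], "raster", None)


class FakeChart:
    """Stand-in for a chart object; only the `raster` attribute is modelled."""

    def __init__(self, raster):
        self.raster = raster


# An empty list is what the caller sees when every generated snippet fails.
empty_result = first_raster([])
ok_result = first_raster([FakeChart("base64-image-data")])
```

With a guard like this, the notebook cell could print a message or retry generation when the result is None instead of failing on the index lookup.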

Supported visualization libraries

This example uses Seaborn as the visualization library. How can I check all the supported visualization libraries? Instead of Seaborn, can we use something like d3.js to show the visualizations?

Can the visualize model support Chinese in the generated image?

I have tried "altair", "matplotlib", "seaborn", "ggplot" and "plotly" as the 'library' parameter of the method 'lida.visualize( ... , library="seaborn")', but the output image is wrong when there is Chinese text in my data. Can it support displaying Chinese?

Unbundled front-end

Is it possible to publish the frontend ui currently used for Lida?
Thanks!

Can lida.summarize() take a long string as input instead of a local file?

I want to embed it in my program, so I want to pass a formatted long string to summarize(). Is this supported?

pip install err

Thank you very much for open-sourcing this tool. I have encountered some problems. Could you please take a look and see how to solve them?

ERROR: Could not find a version that satisfies the requirement llmx (from lida)
ERROR: No matching distribution found for llmx

branch: main
command: pip install .

Dockerfile and docker-compose

Any chance of an official Dockerfile so the project can be easily spun up in a repeatable way across various platforms?

`lida.summarize` returns wrong column types

With the sample csv file below, lida.summarize (https://github.com/microsoft/lida/blob/main/lida/components/summarizer.py#L53) returns a date dtype for the app_version column.

In LIDA's final step, matplotlib/seaborn then throws the error:

6.4.3 can not be cast to a date.

If I set the summary method with summary = lida.summarize("./info.csv", summary_method='llm'), the AI keeps the column type; OpenAI doesn't fix the wrong type.

user,app_version
Jack,6.4.3
Lisa,6.4.4
Tom,6.4.2

{
    "column": "app_version",
    "properties": {
        "dtype": "date",
        "min": "6.4.2",
        "max": "6.4.4",
        "samples": [
            "6.4.3",
            "6.4.4",
            "6.4.2"
        ],
        "num_unique_values": 3,
        "semantic_type": "date",
        "description": ""
    }
}
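One possible workaround on the caller's side is to post-check columns reported as dates and downgrade those whose samples look like dotted version numbers. The sketch below is a heuristic under explicit assumptions: the helper names and the "string" replacement label are hypothetical, this is not how LIDA's summarizer works, and a dotted date such as 15.01.2023 would be misread as a version by this pattern.

```python
import re

# Three or more dot-separated numeric parts with a short first part, e.g. 6.4.3.
VERSION_RE = re.compile(r"^\d{1,3}(\.\d+){2,}$")


def looks_like_version(samples):
    """True when every non-empty sample matches the dotted version pattern."""
    values = [s.strip() for s in samples if s and s.strip()]
    return bool(values) and all(VERSION_RE.match(v) for v in values)


def corrected_dtype(inferred, samples):
    """Downgrade a 'date' guess to 'string' when samples look like version numbers."""
    if inferred == "date" and looks_like_version(samples):
        return "string"
    return inferred


fixed = corrected_dtype("date", ["6.4.3", "6.4.4", "6.4.2"])
```

Applied to the summary entry above, the check would relabel app_version before the summary is handed to goal or chart generation, leaving genuinely date-like columns untouched.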

Persona Based Goal Generation

What

Currently, LIDA generates goals conditioned only on a data summary.
The expected functionality here will extend the Summarizer class to also consider a provided persona described in natural language or a data structure. This may also include the addition of a persona generator to generate a set of relevant personas given the dataset summary.

Mac installation error: geos_c.h not found

I suspect a number of Mac users will run into a GEOS issue on pip install lida

      src/_geoslib.c:751:10: fatal error: 'geos_c.h' file not found
      #include "geos_c.h"
               ^~~~~~~~~~
      1 warning and 1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  ERROR: Failed building wheel for basemap

This fixes the problem in my case:

brew install geos
export CPPFLAGS="-I/opt/homebrew/include"
export LDFLAGS="-L/opt/homebrew/lib"
pip install basemap
pip install lida

It may be helpful to include some mention of this in the docs or make the inclusion of basemap optional.

Context

Running Mac OS 13.2.1, Python 3.10.5

Is it possible to make this project available for locally deployed open source llm, such as chatglm2

Hello. Because of my company's network policy, the service can only be deployed on an offline server. Is it possible for LIDA to call a locally deployed open source LLM such as chatglm2, which provides an API similar to OpenAI's, as shown below?

    import openai  # uses the legacy openai<1.0 client interface

    if __name__ == "__main__":
        # Point the client at the local OpenAI-compatible server.
        openai.api_base = "http://localhost:8000/v1"
        openai.api_key = "none"
        for chunk in openai.ChatCompletion.create(
            model="chatglm2-6b",
            messages=[
                {"role": "user", "content": "你好"}  # "Hello"
            ],
            stream=True
        ):
            # Print each streamed token as it arrives.
            if hasattr(chunk.choices[0].delta, "content"):
                print(chunk.choices[0].delta.content, end="", flush=True)
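For reference, the request that such a client sends can be built with the standard library alone. This is a minimal sketch that assumes the local server accepts the OpenAI-style chat completions payload at `<api_base>/chat/completions`, as the snippet above implies. The helper name `build_chat_request` is hypothetical, and nothing here shows whether LIDA or llmx can be pointed at such a server.

```python
import json
import urllib.request


def build_chat_request(api_base, model, prompt, api_key="none"):
    """Build (but do not send) a POST request for an OpenAI-style chat endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8000/v1", "chatglm2-6b", "Hello")
print(req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```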

Failed to build basemap

ERROR: Could not build wheels for basemap which use PEP 517 and cannot be installed directly


> `Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.9/site-packages (from tiktoken->llmx->lida) (2023.3.23)
> Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.9/site-packages (from uvicorn->lida) (0.14.0)
> Building wheels for collected packages: basemap, matplotlib-venn
>   Building wheel for basemap (PEP 517) ... error
>   ERROR: Command errored out with exit status 1:
>    command: /usr/local/opt/python@3.9/bin/python3.9 /usr/local/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpveqgi13n
>        cwd: /private/tmp/pip-install-cjcckjp5/basemap_8efa836acd5543ea8d32db9d6b08ae0e
>   Complete output (35 lines):
>   <string>:58: RuntimeWarning: Cannot find GEOS library and/or headers in standard locations ('/Users/harshit/local', '/Users/harshit', '/usr/local', '/usr', '/opt/local', '/opt', '/sw'). Please install the corresponding packages using your software management system or set the environment variable GEOS_DIR to point to the location where GEOS is installed (for example, if 'geos_c.h' is in '/usr/local/include' and 'libgeos_c' is in '/usr/local/lib', then you need to set GEOS_DIR to '/usr/local'
>   /usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/dist.py:287: SetuptoolsDeprecationWarning: The namespace_packages parameter is deprecated, consider using implicit namespaces instead (PEP 420). See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
>     warnings.warn(msg, SetuptoolsDeprecationWarning)
>   running bdist_wheel
>   running build
>   running build_py
>   creating build
>   creating build/lib.macosx-11-x86_64-cpython-39
>   creating build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits
>   copying src/mpl_toolkits/__init__.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits
>   creating build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/cm.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/__init__.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/test.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/diagnostic.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/proj.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   copying src/mpl_toolkits/basemap/solar.py -> build/lib.macosx-11-x86_64-cpython-39/mpl_toolkits/basemap
>   running build_ext
>   cythoning src/_geoslib.pyx to src/_geoslib.c
>   building '_geoslib' extension
>   creating build/temp.macosx-11-x86_64-cpython-39
>   creating build/temp.macosx-11-x86_64-cpython-39/src
>   clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/include -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c src/_geoslib.c -o build/temp.macosx-11-x86_64-cpython-39/src/_geoslib.o
>   In file included from src/_geoslib.c:744:
>   In file included from /usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/include/numpy/arrayobject.h:5:
>   In file included from /usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
>   In file included from /usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/include/numpy/ndarraytypes.h:1929:
>   /usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with "          "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
>   #warning "Using deprecated NumPy API, disable it with " \
>    ^
>   src/_geoslib.c:745:10: fatal error: 'geos_c.h' file not found
>   #include "geos_c.h"
>            ^~~~~~~~~~
>   1 warning and 1 error generated.
>   error: command '/usr/bin/clang' failed with exit code 1
>   ----------------------------------------
>   ERROR: Failed building wheel for basemap`
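The RuntimeWarning in the log above says the build looks for geos_c.h under a fixed list of prefixes, or under GEOS_DIR if it is set. Before retrying the install, the same check can be approximated in a few lines. This is a sketch of the check as the message describes it, not basemap's actual setup code; `find_geos_dir` is a hypothetical helper and the candidate list is an assumption.

```python
import os


def find_geos_dir(candidates, env=None):
    """Return the first prefix that contains include/geos_c.h, or None."""
    env = os.environ if env is None else env
    prefixes = []
    if env.get("GEOS_DIR"):
        prefixes.append(env["GEOS_DIR"])  # an explicit setting is checked first
    prefixes.extend(candidates)
    for prefix in prefixes:
        if os.path.isfile(os.path.join(prefix, "include", "geos_c.h")):
            return prefix
    return None


# /opt/homebrew is where Homebrew installs on Apple Silicon; /usr/local on Intel.
print(find_geos_dir(["/opt/homebrew", "/usr/local", "/usr", "/opt/local"]))
```

If this prints a prefix, exporting GEOS_DIR with that value before running pip install basemap is the remedy the warning itself suggests. If it prints None, GEOS is probably not installed yet.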

The model did not return a valid JSON object

Hello,

While using the Hugging Face model "TheBloke/Llama-2-7b-chat-fp16", I get the error below.

Note: I used llmx to load the model for LIDA.

textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="TheBloke/Llama-2-7b-chat-fp16", use_cache=True)

summary = lida.summarize("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv", summary_method="default", textgen_config=textgen_config)
goals = lida.goals(summary, n=1, textgen_config=textgen_config)
print(goals)

Error decoding JSON: CHAP 1: Distribution of Engine Size
[model output continues with several thousand "-" characters and ends with the token "près"; omitted here]

JSONDecodeError Traceback (most recent call last)
File ~.conda\envs\hack\Lib\site-packages\lida\components\goal.py:43, in GoalExplorer.generate(self, summary, textgen_config, text_gen, n)
42 json_string = clean_code_snippet(result.text[0]["content"])
---> 43 result = json.loads(json_string)
44 # cast each item in the list to a Goal object

File ~.conda\envs\hack\Lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:

File ~.conda\envs\hack\Lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of s (a str instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()

File ~.conda\envs\hack\Lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
Cell In[6], line 5
2 textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="TheBloke/Llama-2-7b-chat-fp16", use_cache=True)
4 summary = lida.summarize("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv", summary_method="default", textgen_config=textgen_config)
----> 5 goals = lida.goals(summary, n=1, textgen_config=textgen_config)
6 print(goals)

File ~.conda\envs\hack\Lib\site-packages\lida\components\manager.py:82, in Manager.goals(self, summary, textgen_config, n)
77 def goals(
78 self, summary, textgen_config: TextGenerationConfig = TextGenerationConfig(),
79 n=5):
80 self.check_textgen(config=textgen_config)
---> 82 return self.goal.generate(summary=summary, text_gen=self.text_gen,
83 textgen_config=textgen_config, n=n)

File ~.conda\envs\hack\Lib\site-packages\lida\components\goal.py:51, in GoalExplorer.generate(self, summary, textgen_config, text_gen, n)
49 logger.info(f"Error decoding JSON: {result.text[0]['content']}")
50 print(f"Error decoding JSON: {result.text[0]['content']}")
---> 51 raise ValueError(
52 "The model did not return a valid JSON object while attempting generate goals. Please try again.")
53 return result

ValueError: The model did not return a valid JSON object while attempting generate goals. Please try again.

Can you please help me with this issue? If you have any sample code that uses the same dataset with a Hugging Face model, please post it.

I checked goal.py: the model did not return a JSON object, even though the model loaded correctly. Please see the output below for reference.

loading (…)of-00002.safetensors: 99%|███████████████████████████████████████▋| 3.47G/3.50G [01:16<00:00, 46.8MB/s]
Downloading (…)of-00002.safetensors: 99%|███████████████████████████████████████▊| 3.48G/3.50G [01:16<00:00, 46.0MB/s]
Downloading (…)of-00002.safetensors: 100%|███████████████████████████████████████▉| 3.49G/3.50G [01:16<00:00, 47.5MB/s]
Downloading (…)of-00002.safetensors: 100%|████████████████████████████████████████| 3.50G/3.50G [01:16<00:00, 45.6MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [04:56<00:00, 148.13s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.03s/it]
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████| 167/167 [00:00<?, ?B/s]
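The traceback above shows json.loads failing at character 0 because the model's reply ("CHAP 1: Distribution of Engine Size" followed by dashes) contains no JSON at all. A more tolerant parser can at least separate "no JSON present" from "JSON wrapped in extra text". The helper below is a hypothetical sketch using only the standard library; it is not how LIDA's goal.py parses responses.

```python
import json


def extract_json(text):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                obj, _end = decoder.raw_decode(text, i)
                return obj
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; keep scanning
    return None


print(extract_json("CHAP 1: Distribution of Engine Size\n-----"))
print(extract_json('Here are the goals: [{"index": 0, "question": "q"}] Done.'))
```

When this returns None, as it would for the output in this report, the model simply did not follow the JSON instruction, and retrying with a different model or prompt is more likely to help than further parsing.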

TypeError: llmx.generators.text.hf_textgen.HFTextGenerator() got multiple values for keyword argument 'provider'

Thanks for creating this library, but it seems testing was not a priority before this release. I installed it, followed the documentation, and immediately hit this error when using an HF model.

Here is the complete error:
Traceback (most recent call last):
  File "C:\Users\aiany\OneDrive\Desktop\lida demo\test.py", line 8, in <module>
    text_gen = llm(provider="hf", model="uukuguy/speechless-llama2-hermes-orca-platypus-13b", device_map="auto")
  File "C:\Users\aiany\OneDrive\Desktop\lida demo\.venv\lib\site-packages\llmx\generators\text\textgen.py", line 75, in llm
    return HFTextGenerator(provider=provider, models=models, **kwargs)
TypeError: llmx.generators.text.hf_textgen.HFTextGenerator() got multiple values for keyword argument 'provider'

These are the only lines of code I used:

from lida import llm

print("Import Successful!")

# text_gen = llm("openai")
text_gen = llm(provider="hf", model="uukuguy/speechless-llama2-hermes-orca-platypus-13b", device_map="auto")
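Python raises "got multiple values for keyword argument" when the same name is passed both explicitly and inside an unpacked `**kwargs`. The sketch below reproduces that mechanism with stand-in functions. It is one plausible explanation for the traceback above, not a confirmed reading of the llmx source; `make_generator`, `llm_buggy`, and `llm_fixed` are hypothetical names.

```python
# Stand-in for the real class; the name and signature are hypothetical.
def make_generator(provider, models=None, **kwargs):
    return {"provider": provider, "models": models, **kwargs}


def llm_buggy(provider, models=None, **kwargs):
    kwargs["provider"] = provider  # provider now lives in kwargs too
    return make_generator(provider=provider, models=models, **kwargs)


def llm_fixed(provider, models=None, **kwargs):
    kwargs.pop("provider", None)  # make sure it is passed exactly once
    return make_generator(provider=provider, models=models, **kwargs)


try:
    llm_buggy("hf", model="some-model")
except TypeError as exc:
    print("TypeError:", exc)

print(llm_fixed("hf", model="some-model"))
```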

Set default provider

How to use the lida UI with Hugging Face models?

When I run the command below, it asks for an OpenAI API key.

lida ui --port=8080 --docs

raise ValueError(
ValueError: OpenAI API key is not set. Please set the OPENAI_API_KEY environment variable.

Another odd error also appeared that seems unrelated to the project; I am not using that path.

.conda\envs\hack\Lib\pathlib.py", line 938, in _scandir
return os.scandir(self)
^^^^^^^^^^^^^^^^

OSError: [WinError 4393] The tag present in the reparse point buffer is invalid:

Can you please help me resolve the above issue?

Seeking Expert Advice with Accurately Interpreting Data from Diverse Sources, such as Financial Data or Generic Health Data

Lida is a big deal and very impressive. It fits many use cases well and lets us replace a tedious process with streamlined, end-to-end features that have a real impact. I want to ask whether it is possible to provide additional context to the summary and goals steps so that they can better evaluate the data and produce goals aligned with the user, system, and task. For example, financial data used by an FP&A analyst to evaluate accruals versus patient records consumed by the patient or the provider (different contexts).

Description:
Lida encounters challenges when dealing with files from diverse, previously unknown sources, like financial details from a presentation or general health data from a public report. These files can house multiple datasets spanning different topics or industries. Our existing processing system may not be adept at distinguishing and correctly interpreting these varied datasets, which can lead to inaccuracies.

Use Case:
Imagine a scenario where a user imports a multi-faceted report from a financial institution. This report could blend revenue stats, expense breakdowns, and trends in health-related expenditures. Each dataset might have its unique structure and nuance.

Steps to reproduce:

Import a multifaceted file, like a financial report that contains both financial metrics and some generic health data points.
Attempt to analyze or process the data with Lida.

Expected Behavior:

Lida should:

Automatically discern and classify different datasets.
Offer user-guided options to set context or provide metadata tags for specific sections, ensuring accurate interpretation.

Actual Behavior:

Lida handles all datasets in a homogeneous manner, possibly leading to misinterpretation. In particular, it leans towards distribution metrics by year, row ID, etc., which are not the core goals for providers or analysts.

Seeking Expert Advice

Integrate advanced algorithms, possibly leveraging machine learning, to better detect varied data structures and types. The persona (system behavior) issue is aligned with this. There is also a need for predefined core goals that I want to check dynamically against unknown datasets for feasibility.
Allow users to set context manually to guide Lida's data interpretation. For example, views for patients and for providers use the same data but need a different lens for the goals. Again, this ties to the persona issue. It would also help to be able to produce mock "insufficient data" charts when goal-to-visualization generation fails.
Add a review step where Lida proposes potential classifications for datasets, letting users confirm or adjust them based on their understanding. Built-in data cleanup for values such as "millions", dates, and currency signs would also help with LLM-based or traditional EDA and cleanup.

Error: 'DataFrame' object has no attribute 'dtype'

File c:\ProgramData\anaconda3\lib\site-packages\lida\components\manager.py:73, in Manager.summarize(self, data, file_name, n_samples, summary_method, textgen_config)
71 self.data = data
72 # self.data = data
---> 73 return self.summarizer.summarize(
74 data=self.data, text_gen=self.text_gen, file_name=file_name, n_samples=n_samples,
75 summary_method=summary_method, textgen_config=textgen_config)

File c:\ProgramData\anaconda3\lib\site-packages\lida\components\summarizer.py:127, in Summarizer.summarize(self, data, text_gen, file_name, n_samples, textgen_config, summary_method)
125 file_name = data.split("/")[-1]
126 data = read_dataframe(data)
--> 127 data_properties = self.get_column_properties(data, n_samples)
129 # default single stage summary construction
130 base_summary = {
131 "name": file_name,
132 "file_name": file_name,
133 "dataset_description": "",
134 "fields": data_properties,
135 }

File c:\ProgramData\anaconda3\lib\site-packages\lida\components\summarizer.py:38, in Summarizer.get_column_properties(self, df, n_samples)
36 properties_list = []
37 for column in df.columns:
---> 38 dtype = df[column].dtype
39 properties = {}
40 if dtype in [int, float, complex]:

File c:\ProgramData\anaconda3\lib\site-packages\pandas\core\generic.py:6202, in NDFrame.__getattr__(self, name)
6195 if (
6196 name not in self._internal_names_set
6197 and name not in self._metadata
6198 and name not in self._accessors
6199 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6200 ):
6201 return self[name]
-> 6202 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'dtype'
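
The traceback shows `df[column]` returning a DataFrame rather than a Series. One likely cause (an assumption, since the input file is not shown) is a dataset with repeated column names: pandas returns a DataFrame when a label matches more than one column, and a DataFrame has `dtypes` but no `dtype`. The sketch below reproduces that behavior with made-up data and shows one possible workaround applied before the data reaches Lida.

```python
import pandas as pd

# Hypothetical data: two columns share the label "year".
df = pd.DataFrame(
    [[2020, 2021, 1.5], [2022, 2023, 2.5]],
    columns=["year", "year", "value"],
)

selected = df["year"]                   # duplicate label -> DataFrame, not Series
print(type(selected).__name__)          # DataFrame
has_dtype = hasattr(selected, "dtype")  # False: only Series has .dtype

# One possible workaround: keep the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(type(deduped["year"]).__name__)   # Series
```

Renaming the repeated columns instead of dropping them is the safer choice when both carry data that matters.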

WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
success loading the model
/usr/local/lib/python3.10/dist-packages/lida/utils.py:49: DtypeWarning: Columns (29,38,42,53,61) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(file_location)
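
The DtypeWarning itself names two remedies, both of which are standard `pandas.read_csv` options. A minimal sketch follows, assuming the file is loaded with pandas before being handed to Lida; the inline CSV and the column name `code` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV in which the "code" column mixes numbers and text.
csv_text = "id,code\n1,100\n2,abc\n"

# Option 1: declare the type of the ambiguous column up front.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

# Option 2: parse the file in a single pass so type inference sees every row.
df_full = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df_typed["code"].tolist())
print(df_full["code"].tolist())
```

Declaring `dtype` is the more explicit of the two, while `low_memory=False` trades extra memory for consistent inference.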

[Idea] Vector DBs to extend dataset sizes

Love this tool! A limitation that I've run across is the token limits of LLMs when working with real-world large datasets.

I'd love to be able to point this tool to a Vector DB to extend the amount of data being worked on.

What do we think? Would this really solve the problem I'm facing with token limits, or is there a general limitation to LIDA because of LLM token sizes?
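
As context for the token question, the traceback earlier on this page shows the summarizer building per-column properties with an `n_samples` argument. The sketch below is not LIDA's implementation. It is a hypothetical pandas-only function (`compact_summary`) illustrating how a per-column summary grows with the number of columns rather than the number of rows, which is one way to keep prompts small without sending the full dataset.

```python
import pandas as pd


def compact_summary(df, n_samples=3):
    """Describe each column with its dtype, unique count, and a few samples."""
    fields = []
    for name in df.columns:
        col = df[name]
        fields.append({
            "column": name,
            "dtype": str(col.dtype),
            "num_unique": int(col.nunique()),
            "samples": col.drop_duplicates().head(n_samples).tolist(),
        })
    return fields


# Made-up frames with the same columns but very different row counts.
small = pd.DataFrame({"year": range(10), "price": [float(i) for i in range(10)]})
large = pd.DataFrame({"year": range(100_000), "price": [float(i) for i in range(100_000)]})

summary_small = compact_summary(small)
summary_large = compact_summary(large)
print(len(summary_small), len(summary_large))  # 2 2
```

Whether a vector DB adds anything beyond this depends on whether the model needs the raw rows or only a description of them.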

Converting Graph to matplotlib graph or Saving graph as an image

Hi,

I want to convert the graph into matplotlib graph format or save it as an image. How can I achieve this? The current visualization function does not return enough information to do so. I get an error when accessing the graph via rasterio, i.e. TypeError: Image data of dtype <U44560 cannot be converted to float

Another error that I got while plotting charts[0] is: AttributeError: 'ChartExecutorResponse' object has no attribute 'savefig'
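
The dtype `<U44560` in the first error is a NumPy Unicode string type, which suggests the image was passed along as a long text string rather than as pixel data. If that string is a base64-encoded PNG (an assumption about the installed version), it can be decoded and written to disk with the standard library alone. The helper name `save_base64_image` and the stand-in data below are hypothetical.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_base64_image(image_b64, out_path):
    """Decode a base64 string and write the raw bytes to out_path."""
    out_path.write_bytes(base64.b64decode(image_b64))
    return out_path


# Stand-in for the string a chart object might carry (assumption). It holds
# only the PNG signature plus filler bytes, so it is not a viewable image.
fake_raster = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    saved = save_base64_image(fake_raster, Path(tmp) / "chart.png")
    written = saved.read_bytes()

print(written.startswith(PNG_SIGNATURE))  # True
```

With a real response, the call might look like `save_base64_image(charts[0].raster, Path("chart.png"))`, assuming the response object exposes the image as a base64 string under that attribute name.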

Possibly start off with a base DataFinder class (with a find method), plus HeuristicsDataFinder and AgentDataFinder subclasses.
