openbmb / repoagent

An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.

License: Apache License 2.0

Python 97.40% Makefile 1.64% Shell 0.96%
agent chatglm gpt gpt-4 langchain llama llms qwen rag chatgpt chatgpt-api llama-index repo-level-debugging

repoagent's Introduction

RepoAgent: An LLM-Powered Framework for Repository-level Code Documentation Generation.



English | 简体中文

📺 Demo

Watch the video

👾 Background

In the realm of computer programming, the significance of comprehensive project documentation, including detailed explanations for each Python file, cannot be overstated. Such documentation is the cornerstone of understanding, maintaining, and enhancing a codebase: it provides essential context and rationale for the code, helps current and future developers grasp the project's purpose, functionality, and structure, keeps the software accessible and modifiable over time, and significantly eases the learning curve for new team members.

Traditionally, creating and maintaining software documentation demanded significant human effort and expertise, a challenge for small teams without dedicated personnel. The introduction of Large Language Models (LLMs) like GPT has transformed this, enabling AI to handle much of the documentation process. This shift allows human developers to focus on verification and fine-tuning, greatly reducing the manual burden of documentation.

🏆 Our goal is to create an intelligent document assistant that helps people read and understand repositories and generate documents, ultimately helping people improve efficiency and save time.

✨ Features

  • 🤖 Automatically detects changes in Git repositories, tracking additions, deletions, and modifications of files.
  • 📝 Independently analyzes the code structure through the AST, generating documentation for individual objects.
  • 🔍 Accurately identifies bidirectional invocation relationships between objects, enriching the global perspective of the documentation.
  • 📚 Seamlessly replaces Markdown content based on changes, keeping the documentation consistent.
  • 🕙 Executes multi-threaded concurrent operations, improving the efficiency of document generation.
  • 👭 Offers a sustainable, automated documentation update workflow for team collaboration.
  • 😍 Displays code documentation in an appealing way, with a per-project document book powered by GitBook.

🚀 Getting Started

Installation Method

Using pip (Recommended for Users)

Install the repoagent package directly using pip:

pip install repoagent

Development Setup Using PDM

If you're looking to contribute or set up a development environment:

  • Install PDM: If you haven't already, install PDM.

  • Use CodeSpace, or Clone the Repository:

    • Use CodeSpace: the easiest way to get a RepoAgent environment. Click below to open GitHub Codespaces, then go to the next step.

    Open in GitHub Codespaces

    • Clone the Repository
    git clone https://github.com/LOGIC-10/RepoAgent.git
    cd RepoAgent
  • Setup with PDM

    • Initialize the Python virtual environment. Make sure to run the command below in the /RepoAgent directory:

      pdm venv create --name repoagent
    • Activate the virtual environment (see the activation sketch after this list)

    • Install dependencies using PDM

       pdm install
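
If you are unsure how to activate the PDM-managed virtual environment, PDM can print the activation command for your shell. A minimal sketch for bash/zsh, assuming the venv name repoagent created above (the exact command differs per shell and PDM version):

    eval "$(pdm venv activate repoagent)"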

Configuring RepoAgent

Before configuring specific parameters for RepoAgent, please ensure that the OpenAI API is configured as an environment variable in the command line:

export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
set OPENAI_API_KEY=YOUR_API_KEY # on Windows
$Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)

Use repoagent configure if you need to modify the running parameters.

Enter the path to target repository: 
Enter the project hierarchy file name [.project_doc_record]: 
Enter the Markdown documents folder name [markdown_docs]: 
Enter files or directories to ignore, separated by commas []: 
Enter the language (ISO 639 code or language name, e.g., 'en', 'eng', 'English') [Chinese]: 
Enter the maximum number of threads [4]: 
Enter the maximum number of document tokens [1024]: 
Enter the log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) [INFO]: 
Enter the model [gpt-3.5-turbo]: 
Enter the temperature [0.2]: 
Enter the request timeout (seconds) [60.0]: 
Enter the base URL [https://api.openai.com/v1]: 

Run RepoAgent

Enter the root directory of RepoAgent and try the following command in the terminal:

repoagent run  # generate docs, or update existing docs (the pre-commit hook calls this automatically)

The run command supports the following optional flags (when set, they override the configured defaults); an example invocation follows the list:

  • -m, --model TEXT: Specifies the model to use for completion. Default: gpt-3.5-turbo
  • -t, --temperature FLOAT: Sets the generation temperature for the model. Lower values make the model more deterministic. Default: 0.2
  • -r, --request-timeout INTEGER: Defines the timeout in seconds for the API request. Default: 60
  • -b, --base-url TEXT: The base URL for the API calls. Default: https://api.openai.com/v1
  • -tp, --target-repo-path PATH: The file system path to the target repository. Used as the root for documentation generation. Default: path/to/your/target/repository
  • -hp, --hierarchy-path TEXT: The name or path for the project hierarchy file, used to organize documentation structure. Default: .project_doc_record
  • -mdp, --markdown-docs-path TEXT: The folder path where Markdown documentation will be stored or generated. Default: markdown_docs
  • -i, --ignore-list TEXT: A list of files or directories to ignore during documentation generation, separated by commas.
  • -l, --language TEXT: The ISO 639 code or language name for the documentation. Default: Chinese
  • -ll, --log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL]: Sets the logging level for the application. Default: INFO
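
For example, a run that overrides a few of these defaults might look like the following (the model name, path, and language here are placeholders, not recommendations):

    repoagent run -m gpt-4-0125-preview -t 0.2 -tp /path/to/your/target/repository -l en -ll INFO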

You can also try the following commands:

repoagent clean  # Remove RepoAgent-related cache
repoagent print-hierarchy  # Print how RepoAgent parses the target repo
repoagent diff  # Check which docs will be updated/generated based on the current code changes

If it's your first time generating documentation for the target repository, RepoAgent will automatically create a JSON file maintaining the global structure information and a folder named Markdown_Docs in the root directory of the target repository for storing documents.

Once you have initially generated the global documentation for the target repository, or if the project you cloned already contains global documentation information, you can then seamlessly and automatically maintain internal project documentation with your team by configuring the pre-commit hook in the target repository!

Use pre-commit

RepoAgent currently supports generating documentation for projects, which requires some configuration in the target repository.

First, ensure that the target repository is a git repository and has been initialized.

git init

Install pre-commit in the target repository to detect changes in the git repository.

pip install pre-commit

Create a file named .pre-commit-config.yaml in the root directory of the target repository. An example is as follows:

repos:
  - repo: local
    hooks:
    - id: repo-agent
      name: RepoAgent
      entry: repoagent
      language: system
      pass_filenames: false # prevent from passing filenames to the hook
      # You can specify the file types that trigger the hook, but currently only python is supported.
      types: [python]

For the specific configuration of hooks, please refer to the pre-commit documentation. After configuring the YAML file, execute the following command to install the hook.

pre-commit install

In this way, every git commit will trigger RepoAgent's hook, automatically detecting changes in the target repository and generating the corresponding documents. Next, you can make some modifications to the target repository, such as adding a new file or modifying an existing one. Just follow the normal git workflow: git add, git commit -m "your commit message", git push. The RepoAgent hook triggers on git commit, detects the files you changed in the previous step, and generates the corresponding documents.

After execution, RepoAgent automatically modifies the staged files in the target repository and completes the commit. Once it finishes, a green "Passed" is displayed, as shown in the figure below: Execution Result

The generated documents are stored in the specified folder in the root directory of the target repository and render as shown below: Documentation

We utilized the default model gpt-3.5-turbo to generate documentation for the XAgent project, which comprises approximately 270,000 lines of code. You can view the results of this generation in the Markdown_Docs directory of the XAgent project on GitHub. For enhanced documentation quality, we suggest considering more advanced models like gpt-4-1106 or gpt-4-0125-preview.

In the end, you can flexibly adjust the output format, template, and other aspects of the document by customizing the prompt. We are excited about your exploration of a more scientific approach to Automated Technical Writing and your contributions to the community.

Exploring chat with repo

We conceptualize Chat With Repo as a unified gateway for downstream applications, acting as a connector that links RepoAgent to human users and other AI agents. Our future research will focus on adapting the interface to various downstream applications and customizing it to their unique characteristics and implementation requirements.

Here we demonstrate a preliminary prototype of one of our downstream tasks: automatic Q&A for issues and code explanation. You can start the server by running the following command.

repoagent chat-with-repo

✅ Future Work

  • Generate README.md automatically, combined with the global documentation
  • Multi-programming-language support: support more programming languages such as Java, C, or C++
  • Local model support, e.g., Llama, ChatGLM, Qwen, GLM4

🥰 Featured Cases

Here are featured cases that have adopted RepoAgent.

  • MiniCPM: An edge-side LLM of 2B size, comparable to 7B model.
  • ChatDev: Collaborative AI agents for software development.
  • XAgent: An Autonomous LLM Agent for Complex Task Solving.
  • EasyRL4Rec: A user-friendly RL library for recommender systems.

📊 Citation

@misc{luo2024repoagent,
      title={RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation}, 
      author={Qinyu Luo and Yining Ye and Shihao Liang and Zhong Zhang and Yujia Qin and Yaxi Lu and Yesai Wu and Xin Cong and Yankai Lin and Yingli Zhang and Xiaoyin Che and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2402.16667},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

repoagent's People

Contributors

dependabot[bot], innovation64, logic-10, pooruss, sailaoda, sts07142, umpire2018, yeyn19


repoagent's Issues

cannot run or configure the repoagent, anyone facing similar issues?

Tried this on both Windows and macOS; got the same error:

Traceback (most recent call last):
  File "/Users/XXX/Library/Python/3.9/bin/repoagent", line 5, in <module>
    from repo_agent.main import app
  File "/Users/XXX/Library/Python/3.9/lib/python/site-packages/repo_agent/main.py", line 28, in <module>
    repo_path: Annotated[str, typer.Option(prompt="Enter the path to your local repository")] = settings.repo_path ,
  File "/Users/XXX/Library/Python/3.9/lib/python/site-packages/dynaconf/base.py", line 145, in __getattr__
    value = getattr(self._wrapped, name)
  File "/Users/XXX/Library/Python/3.9/lib/python/site-packages/dynaconf/base.py", line 328, in __getattribute__
    return super().__getattribute__(name)
AttributeError: 'Settings' object has no attribute 'REPO_PATH'

maximum recursion depth exceeded in comparison

Receiving the following error when generating documentation:
2024-07-09 12:48:43.996 | SUCCESS | repo_agent.log:set_logger_level_from_config:74 - Log level set to INFO!
parsing parent relationship: 0%| parsing parent relationship: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8184.01it/s]
Loading MetaInfo: /home/jovyan/work/Documentation/Arya/wiki/hierarchy_files
MetaInfo is Refreshed and Saved
2024-07-09 12:48:44.015 | INFO | repo_agent.runner:first_generate:104 - Starting to generate documentation
parsing bidirectional reference: 0%| | 0/2 [00:00<?, ?it/s]2024-07-09 12:48:44.198 | INFO | repo_agent.doc_meta_info:find_all_referencer:293 - Error occurred: column parameter (6) is not in a valid range (0-0) for line 205 ('\n').
2024-07-09 12:48:44.198 | INFO | repo_agent.doc_meta_info:find_all_referencer:294 - Parameters: variable_name=Embed, file_path=wikiv1.py, line_number=205, column_number=6
2024-07-09 12:48:44.208 | INFO | repo_agent.doc_meta_info:find_all_referencer:293 - Error occurred: line parameter is not in a valid range.
2024-07-09 12:48:44.208 | INFO | repo_agent.doc_meta_info:find_all_referencer:294 - Parameters: variable_name=create_collection, file_path=wikiv1.py, line_number=284, column_number=4
parsing bidirectional reference: 50%|█████████████████████████████████████▌ | 1/2 [00:00<00:00, 5.17it/s]2024-07-09 12:48:44.212 | INFO | repo_agent.doc_meta_info:find_all_referencer:293 - Error occurred: column parameter (6) is not in a valid range (0-0) for line 205 ('\n').
2024-07-09 12:48:44.213 | INFO | repo_agent.doc_meta_info:find_all_referencer:294 - Parameters: variable_name=Embed, file_path=.ipynb_checkpoints/wikiv1-checkpoint.py, line_number=205, column_number=6
2024-07-09 12:48:44.220 | INFO | repo_agent.doc_meta_info:find_all_referencer:293 - Error occurred: line parameter is not in a valid range.
2024-07-09 12:48:44.220 | INFO | repo_agent.doc_meta_info:find_all_referencer:294 - Parameters: variable_name=create_collection, file_path=.ipynb_checkpoints/wikiv1-checkpoint.py, line_number=284, column_number=4
parsing bidirectional reference: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 9.74it/s]
parsing topology task-list: 0%| | 0/10 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/main.py", line 312, in <module>
cli()
File "/opt/conda/envs/repoenv/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/repoenv/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/conda/envs/repoenv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/repoenv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/repoenv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/main.py", line 260, in run
runner.run()
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/runner.py", line 240, in run
self.first_generate()  # If this is the first documentation generation task, generate all docs via first_generate
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/runner.py", line 106, in first_generate
task_manager = self.meta_info.get_topology(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/doc_meta_info.py", line 619, in get_topology
task_manager = self.get_task_manager(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/work/repoagent/RepoAgent/repo_agent/doc_meta_info.py", line 577, in get_task_manager
if task_available_func(child) and (child not in deal_items):
^^^^^^^^^^^^^^^^^^^^^^^
File "", line 4, in eq
File "", line 4, in eq
File "", line 4, in eq
[Previous line repeated 851 more times]
RecursionError: maximum recursion depth exceeded in comparison
parsing topology task-list: 90%|███████████████████████████████████████████████████████████████████████ | 9/10 [00:00<00:00, 34.28it/s]

chat with repo workflow issue

Chat with Repo project requirements

Core concept

  • Goal: build a chat system that can interact with a code repository.
  • Summary: match the user's question to the corresponding documentation, code, and reference relationships; feed the matched results to a large language model, let the model reason over them, and finally generate an answer.
  • Inspiration: LangChain's RAG over code | 🦜️🔗 Langchain

Specific requirements

  1. Dynamically update the vectors of document chunks

    • Document change monitoring: since the Markdown files may change frequently, the system must monitor document changes and update the vector representations of the document chunks accordingly.
    • Vector storage and versioning: an efficient vector store is required to keep documents and their vector representations consistent.
  2. Organize documents and code blocks

    • Retrieval method: perform an embedding search by converting the user query into a vector and comparing it with the contents of the vector database; select and return the most similar chunks to ensure accurate and relevant answers (see the sketch after this list).
  3. Code integration

    • Include the original code: the code corresponding to each document should also be vectorized so it can be folded into retrieval.
    • Multi-path recall: besides vector search, include traditional methods such as keyword retrieval, and possibly semantic search and pattern matching, to make retrieval more complete and accurate.
  4. Handle reference relationships

    • Code block references: recalled code blocks should carry their exact location in the project and their reference relationships with other code blocks or documents, which helps build a more complete and coherent context. (Already implemented.)
  5. Summarization and answering by the LLM

    • Comprehensive answers: the LLM should be able to analyze holistically, understand complex relationships between code and documentation, and produce a comprehensive answer to the user's query based on the recalled content.
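
A minimal sketch of the embedding-search step in requirement 2 above (purely illustrative: embed is a hypothetical function wrapping whatever embedding model or API is chosen, and a real system would use a vector database rather than in-memory arrays):

import numpy as np

def top_k_chunks(query, chunks, embed, k=3):
    """Return the k chunks most similar to the query by cosine similarity."""
    query_vec = np.asarray(embed([query])[0], dtype=float)
    chunk_vecs = np.asarray(embed(chunks), dtype=float)
    # Cosine similarity between the query vector and every chunk vector.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]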

AttributeError in runner.py When Handling Exceeded Context Length

Description:
Encountered an AttributeError in runner.py after multiple attempts to process a long code snippet using the gpt-3.5-turbo-16k model.

Error Messages:
Repeated errors indicating the model's maximum context length was exceeded:

Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 1 of 5
...
Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 5 of 5

Followed by an AttributeError:

   File "ai_doc\runner.py", line 341, in <module>
    runner.run()
  File "ai_doc\runner.py", line 165, in run
    self.process_file_changes(repo_path, file_path, is_new_file)
  File "ai_doc\runner.py", line 217, in process_file_changes
    json_data[file_handler.file_path] = self.update_existing_item(json_data[file_handler.file_path], file_handler, changes_in_pyfile)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ai_doc\runner.py", line 298, in update_existing_item
    future.result()
  File "Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ai_doc\runner.py", line 308, in update_object
    obj["md_content"] = response_message.content
                        ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'content'

Suspected Issue:
The response_message object is None, likely due to the previous errors where the model's maximum context length was exceeded. The code attempts to access the content attribute of a NoneType object, leading to the AttributeError.

Suggested Solution:

  • Investigate why the model's maximum context length is being exceeded and attempt to reduce the input size accordingly.
  • Implement a check to ensure response_message is not None before attempting to access its content attribute, as sketched below. This could prevent the AttributeError and provide a clearer indication of the underlying issue.
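
A minimal sketch of that guard (an illustration, not the project's actual runner code; attempt_generation stands in for whatever wrapper performs the retried chat-completion call):

def update_object_safely(obj, attempt_generation):
    response_message = attempt_generation()
    if response_message is None:
        # All retries failed (e.g., the context length was exceeded); keep the
        # existing content and log a clear warning instead of crashing.
        print(f"Skipping {obj.get('name', '<unknown>')}: no response from the model.")
        return obj
    obj["md_content"] = response_message.content
    return obj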

`generate_overall_structure` processed an overly broad range of files.

Description

The generate_overall_structure method in our codebase is currently processing a wider range of files than necessary. This behavior is leading to the inclusion of files from directories like .venv and others that are not relevant to our intended use case.

Code Snippet:

def generate_overall_structure(self):
    repo_structure = {}
    for root, dirs, files in os.walk(self.repo_path):
        for file in files:
            if file.endswith('.py'):
                relative_file_path = os.path.relpath(os.path.join(root, file), self.repo_path)
                repo_structure[relative_file_path] = self.generate_file_structure(relative_file_path)
    return repo_structure

Observed Behavior

The method traverses all directories within self.repo_path, including those like .venv. It adds all Python files to the repo_structure dictionary, regardless of whether they are part of the virtual environment or other non-essential directories.

Expected Behavior

Ideally, the method should ignore directories that are not relevant to the repository's core functionality, such as .venv, __pycache__, and others typically found in a Python project's .gitignore file.

Suggested Fix

We might need to integrate a filtering mechanism that aligns with the patterns specified in .gitignore, or explicitly define a list of directories to ignore during traversal, as sketched below.
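
A minimal sketch of such a filter (an assumption about how it could look, not the project's actual fix; the ignore patterns are illustrative):

import fnmatch
import os

IGNORED_DIR_PATTERNS = [".venv", ".git", "__pycache__", "*.egg-info"]

def iter_python_files(repo_path):
    for root, dirs, files in os.walk(repo_path):
        # Prune ignored directories in place so os.walk never descends into them.
        dirs[:] = [
            d for d in dirs
            if not any(fnmatch.fnmatch(d, pat) for pat in IGNORED_DIR_PATTERNS)
        ]
        for file in files:
            if file.endswith(".py"):
                yield os.path.relpath(os.path.join(root, file), repo_path)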

Additional Context

This issue can lead to unnecessary bloating of the repo_structure and may also cause performance issues if the method processes a large number of irrelevant files.


Design of a fix for the abnormal pre-commit behaviour

Problem

As written in the install section, the pre-commit hook currently shows the commit as failed whenever files have changed, and the user has to commit again manually with --no-verify. This hurts the experience a bit; is there a better approach?

API design

What seems to be needed is the same effect as black's pre-commit hook: automatically run the documentation-generation command and then report a correct result.

So the main task here is to control the command-line return value precisely. For example, black's main function explicitly controls the value it returns to the shell.

Solution

  1. Inside the command, stage each round of updated results (see the sketch after this list);
  2. Make the command return a normal (zero) exit status.
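
A minimal sketch of those two steps (an assumption about the shape of the fix, not the shipped implementation; markdown_docs_path is illustrative):

import subprocess
import sys

def finish_hook(markdown_docs_path="markdown_docs"):
    # Step 1: stage the regenerated documentation so it joins the current commit.
    subprocess.run(["git", "add", markdown_docs_path], check=True)
    # Step 2: exit with a normal (zero) status so the pre-commit hook reports success.
    sys.exit(0)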

Facing problem with project_hierarchy.json file

I found project_hierarchy.json referenced in file_handler.py and in other parts of the repo as well. It is also initialised in the settings file, but it is not clear to me where this file is actually written. I am running into an issue with this.
Please help me clear up this doubt.

Thanks

PermissionError raised in `ai_doc/file_handler.py write_file` function

Fix permission errors in file path handling and directory creation

Problem description

In the project's write_file function, we ran into a problem with file path handling. When two path arguments are combined to create a file such as /workspaces/AI_doc/Markdown_Docs/ai_doc/runner.md, the function handles the paths incorrectly and tries to create a directory under the filesystem root, which raises a permission error.

Concretely, this shows up as PermissionError: [Errno 13] Permission denied: '/Markdown_Docs'. The cause is that the second path is interpreted as an absolute path rather than the intended relative path.

Problematic code

def write_file(self, file_path, content):
    """
    Write content to a file.

    Args:
        repo_path (str): repository path
        file_path (str): file path
        content (str): file content
    """
    file_path = os.path.join(self.repo_path, file_path)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, 'w') as file:
        file.write(content)

Suggested changes

To fix this problem, the following changes are suggested:

  1. Path format check: make sure file_path is a relative path; if it starts with /, strip the leading character.

  2. Improved path joining: handle the paths correctly in the os.path.join(self.repo_path, file_path) call so that all path components are combined properly.

  3. Directory creation logic: before attempting to write the file, ensure that all intermediate directories have been created, avoiding permission errors.

Fixed code

import os

def create_directory(base_path, file_path):
    # Ensure file_path is a relative path
    if file_path.startswith('/'):
        # Strip the leading '/'
        file_path = file_path[1:]

    # Join the paths with os.path.join
    full_path = os.path.join(base_path, file_path)

    # Extract the directory part
    directory_path = os.path.dirname(full_path)

    # Create the directory
    os.makedirs(directory_path, exist_ok=True)

    return directory_path

# Example call
base_path = '/workspaces/AI_doc'
file_path = '/Markdown_Docs/ai_doc/runner.md'
created_directory = create_directory(base_path, file_path)
print(f"Created directory: {created_directory}")

Deleting the files or object

How can we implement deletion features, such as deleting any .py file or object in the repo and then updating the generated documentation accordingly?

repoagent configure fails on Windows 11

Error message

(venv) PS C:\git\RepoAgent> repoagent configure
C:\git\RepoAgent\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\git\RepoAgent\venv\Scripts\repoagent.exe\__main__.py", line 4, in <module>
  File "C:\git\RepoAgent\repo_agent\main.py", line 12, in <module>
    from repo_agent.chat_with_repo import main as run_chat_with_repo
  File "C:\git\RepoAgent\repo_agent\chat_with_repo\__init__.py", line 3, in <module>
    from .main import main
  File "C:\git\RepoAgent\repo_agent\chat_with_repo\main.py", line 3, in <module>
    from repo_agent.settings import setting
  File "C:\git\RepoAgent\repo_agent\settings.py", line 87, in <module>
    setting = Setting.model_validate(_config_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\RepoAgent\venv\Lib\site-packages\pydantic\main.py", line 509, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Setting
chat_completion.openai_api_key
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing

Looked into it: this has already been fixed upstream, but there is no release containing the fix yet.

microsoft/onnxruntime@7b46b31

Questions: tree sitter, git, ollama

Hello, interesting project and architecture.
I see that support for other programming languages is left for the future. Have you considered using tree-sitter for code parsing?

Also, why did you decide to use pre-commit hooks instead of pulling the git repository with a scheduler? The LlamaIndex GitHub reader could be leveraged in that case.

Do you plan to support Ollama, and if so, which of the open-source models do you reckon would be the best fit?

Thanks

Is the documentation self-bootstrapped?

Is RepoAgent's own documentation generated by RepoAgent itself? If so, where can it be seen?

How to use AzureOpenAI instead of OpenAI

Hello, I loved the project and its workflow

I want to use Azure OpenAI instead of OpenAI. Can you please explain the process and provide template code for it?

[bug] How many people have actually managed to run this code? There are a lot of bugs

2024-07-25 17:26:12.759 | ERROR | repo_agent.chat_with_repo.json_handler:read_json_file:17 - File not found: .project_doc_record/project_hierarchy.json
Traceback (most recent call last):

File "/Users/Desktop/workspace/RepoAgent/repo_agent/main.py", line 310, in
run_chat_with_repo()
└ <function main at 0x7fef1231f880>

File "/Users/Desktop/workspace/RepoAgent/repo_agent/chat_with_repo/main.py", line 16, in main
md_contents, meta_data = assistant.json_data.extract_data()
│ │ └ <function JsonFileProcessor.extract_data at 0x7feefe17df30>
│ └ <repo_agent.chat_with_repo.json_handler.JsonFileProcessor object at 0x7feefeaca110>
└ <repo_agent.chat_with_repo.rag.RepoAssistant object at 0x7feefea91480>

File "/Users/Desktop/workspace/RepoAgent/repo_agent/chat_with_repo/json_handler.py", line 22, in extract_data
json_data = self.read_json_file()
│ └ <function JsonFileProcessor.read_json_file at 0x7feefe17dea0>
└ <repo_agent.chat_with_repo.json_handler.JsonFileProcessor object at 0x7feefeaca110>

File "/Users/Desktop/workspace/RepoAgent/repo_agent/chat_with_repo/json_handler.py", line 13, in read_json_file
with open(self.file_path, "r", encoding="utf-8") as file:
│ └ PosixPath('.project_doc_record/project_hierarchy.json')
└ <repo_agent.chat_with_repo.json_handler.JsonFileProcessor object at 0x7feefeaca110>

FileNotFoundError: [Errno 2] No such file or directory: '.project_doc_record/project_hierarchy.json'

Traceback (most recent call last):
  File "/Users/anaconda3/envs/freqtrade/bin/repoagent", line 5, in <module>
    from repo_agent.main import cli
  File "/Users/anaconda3/envs/freqtrade/lib/python3.10/site-packages/repo_agent/main.py", line 12, in <module>
    from repo_agent.chat_with_repo import main as run_chat_with_repo
  File "/Users/anaconda3/envs/freqtrade/lib/python3.10/site-packages/repo_agent/chat_with_repo/__init__.py", line 3, in <module>
    from .main import main
  File "/Users/anaconda3/envs/freqtrade/lib/python3.10/site-packages/repo_agent/chat_with_repo/main.py", line 3, in <module>
    from repo_agent.settings import setting
  File "/Users/anaconda3/envs/freqtrade/lib/python3.10/site-packages/repo_agent/settings.py", line 1, in <module>
    from enum import StrEnum
ImportError: cannot import name 'StrEnum' from 'enum' (/Users/anaconda3/envs/freqtrade/lib/python3.10/enum.py)

KeyError: `default_completion_kwargs` raised in ai_doc\chat_engine.py

Description:

Encountered a KeyError when accessing default_completion_kwargs in chat_engine.py.

Code Snippet:

model = self.config["default_completion_kwargs"]["model"]

Error Message:

  File "ai_doc\chat_engine.py", line 103, in generate_doc
    model = self.config["default_completion_kwargs"]["model"]
            ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'default_completion_kwargs'

Expected Behavior:

The default_completion_kwargs key should be present in the config dictionary.

Additional Improvement:

I propose including a new function, find_engine_or_model, to efficiently search for the 'engine' or 'model' key in nested dictionaries. The function returns the first occurrence of either key.

def find_engine_or_model(data):
    # Walk the nested config: data['api_keys'] maps provider names to lists of
    # per-deployment dicts; return the first 'engine' or 'model' value found.
    for first_level_key, first_level_value in data['api_keys'].items():
        for item in first_level_value:
            if 'engine' in item:
                return item['engine']
            elif 'model' in item:
                return item['model']
    return None
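
For reference, a hypothetical config shape that the helper above would accept (purely illustrative; the real config structure may differ):

config = {
    "api_keys": {
        "openai": [
            {"model": "gpt-3.5-turbo", "api_key": "YOUR_API_KEY"},
        ],
        "azure": [
            {"engine": "gpt-35-turbo", "api_key": "YOUR_API_KEY"},
        ],
    }
}
print(find_engine_or_model(config))  # -> "gpt-3.5-turbo"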

After installing with pip, running repoagent configure raises an error

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/bin/repoagent", line 5, in <module>
    from repo_agent.main import app
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/repo_agent/main.py", line 28, in <module>
    repo_path: Annotated[str, typer.Option(prompt="Enter the path to your local repository")] = settings.repo_path ,
                                                                                                ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 145, in __getattr__
    value = getattr(self._wrapped, name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 328, in __getattribute__
    return super().__getattribute__(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Settings' object has no attribute 'REPO_PATH'

How can I run this with a DeepSeek model? After configuring it, it still reports that the model does not exist (Enter the model [gpt-3.5-turbo]: deepseek-chat)

Enter the path to target repository: D:\RepoAgent
Enter the project hierarchy file name [.project_doc_record]:
Enter the Markdown documents folder name [markdown_docs]:
Enter files or directories to ignore, separated by commas []:
Enter the language (ISO 639 code or language name, e.g., 'en', 'eng', 'English') [Chinese]:
Enter the maximum number of threads [4]:
Enter the maximum number of document tokens [1024]:
Enter the log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) [INFO]:
2024-05-13 17:58:11.440 | SUCCESS | repo_agent.main:configure:109 - Project settings saved successfully.
Enter the model [gpt-3.5-turbo]: deepseek-chat
Enter the temperature [0.2]:
Enter the request timeout (seconds) [60.0]:
Enter the base URL [https://api.openai.com/v1]: https://api.deepseek.com/v1
2024-05-13 17:58:56.656 | SUCCESS | repo_agent.main:configure:129 - Chat completion settings saved successfully.

Handling `UnicodeDecodeError` During File Read Operation

Description:

Encountered a UnicodeDecodeError while attempting to read content from a file that contains a mix of English and Chinese characters. The content was initially saved with utf-8 encoding but resulted in encoding errors when read back from the file.

Error Message:

  File "AI_doc\ai_doc\runner.py", line 341, in <module>
    runner.run()
  File "AI_doc\ai_doc\runner.py", line 165, in run
    self.process_file_changes(repo_path, file_path, is_new_file)
  File "AI_doc\ai_doc\runner.py", line 225, in process_file_changes
    markdown = file_handler.convert_to_markdown_file(file_path=file_handler.file_path)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    json_data = json.load(f)
                ^^^^^^^^^^^^
  File "Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 207: invalid start byte

Issue Details:
The error occurred in the convert_to_markdown_file method, which suggests that the file content may have been incorrectly encoded or that the file contains a mix of encodings that are not properly handled by the standard utf-8 decoder.

Content Example:

        "synthesize_voice": {
            "type": "FunctionDef",
            "name": "synthesize_voice",
            "md_content": "**synthesize_voice����**���ú����Ĺ����ǽ�ָ���������ϳ�Ϊ�����������ϳɵ��������浽ָ�����ļ����С�\n\n�ú�������ϸ�������������£�\n\n- ���ȣ���voice_name_details��������ȡ�������ƺ��Ա���Ϣ���������ƺ��Ա���Ϣ֮��ʹ����������\"��\"��\"��\"���зָ���ͨ��rsplit�������������ƺ��Ա���Ϣ���룬��ʹ��rstrip����ȥ���Ա���Ϣĩβ���������š�Ȼ��ʹ��replace�������Ա���Ϣ�е�\"Ů��\"�滻Ϊ\"Ů\"����\"��ͯ\"�滻Ϊ\"ͯ\"���Լ��Ա�ı�ʾ��ʽ��\n\n- ���������������������õ�speech_config�����speech_synthesis_voice_name�����У��Ա��������ϳ�ʱʹ��ָ����������\n\n- Ȼ��ʹ��ѭ������������Դ����ij��ԡ�\n\n- ��ÿ�γ����У����ȳ�ʼ��SpeechSynthesizer���󣬲�����speech_config������\n\n- Ȼ��ʹ��os.path.join����������ļ��к��������ơ��Ա�ƴ�ӳ������Ƶ�ļ���·����\n\n- ���ţ�����AudioConfig���󣬽��ļ�·������filename������\n\n- Ȼ��ʹ��ָ����audio_config������ʼ��SpeechSynthesizer����\n\n- ����SpeechSynthesizer�����speak_text_async��������Ҫ�ϳɵ��ı���Ϊ�������룬��ʹ��get������ȡ�ϳɽ����\n\n- ���ϳɽ����reason���ԣ�����ϳɳɹ������ӡ�ϳɳɹ�����ʾ��Ϣ�������ء�\n\n- ����ϳɱ�ȡ�������ӡȡ����ԭ�򣬲�����ȡ����ԭ�������Ӧ�Ĵ�����\n\n- ��������쳣�����ӡ�쳣��Ϣ������ָ���������ӳ�ʱ���������ԡ�\n\n- ����ﵽ������Դ�����Ȼ�޷��ϳ����������ӡ�ϳ�ʧ�ܵ���ʾ��Ϣ��\n\n**ע��**��ʹ�øú���ʱ��Ҫע�����¼��㣺\n- ��Ҫ�ṩ�ϳ��������������ƺ��Ա���Ϣ��\n- ��Ҫ�ṩSpeechConfig������Ϊ���������ڸö��������ú��ʵ�������Ϣ��\n- ��Ҫ�ṩ����ļ��е�·����\n- ��Ҫָ��������Դ����������ӳ�ʱ�䡣\n\n**���ʾ��**������ɹ��ϳ��������������浽��ָ�����ļ����С�",
            "code_start_line": 42,
            "code_end_line": 85,
            "parent": null,
            "have_return": true,
            "code_content": "def synthesize_voice(voice_name_details, speech_config, output_folder, max_retries, retry_delay):\n    # Extract voice name and gender from the details\n    voice_name, gender = voice_name_details.rsplit('��', 1)\n    gender = gender.rstrip('��')\n    gender = gender.replace('Ů��', 'Ů').replace('��ͯ', 'ͯ')  # Simplify gender notation\n\n    # Set the voice name in the speech config.\n    speech_config.speech_synthesis_voice_name = f\"zh-CN-{voice_name}\"\n\n    for attempt in range(max_retries):\n        try:\n            # Initialize speech synthesizer.\n            synthesizer = SpeechSynthesizer(speech_config=speech_config)\n\n            # Get the path to the output audio file.\n            file_path = os.path.join(output_folder, f\"{voice_name}_{gender}.wav\")\n\n            audio_config = AudioConfig(filename=file_path)\n\n            # Use the synthesizer with the specified audio configuration\n            synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)\n\n            # Synthesize the voice name to a file.\n            result = synthesizer.speak_text_async(example_text).get()\n\n            # Check the result and break the loop if successful.\n            if result.reason == ResultReason.SynthesizingAudioCompleted:\n                print(f\"Speech synthesized for voice {voice_name} and saved to {file_path}\")\n                return\n            elif result.reason == ResultReason.Canceled:\n                cancellation_details = result.cancellation_details\n                print(f\"Speech synthesis canceled: {cancellation_details.reason}\")\n                if cancellation_details.reason == CancellationReason.Error:\n                    if cancellation_details.error_details:\n                        print(f\"Error details: {cancellation_details.error_details}\")\n                        raise Exception(cancellation_details.error_details)\n        except Exception as e:\n            print(f\"An error occurred: {e}. Retrying in {retry_delay} seconds.\")\n            time.sleep(retry_delay)\n        \n\n    print(f\"Failed to synthesize voice {voice_name} after {max_retries} attempts.\")\n",
            "name_column": 4
        }

A snippet from the file content includes function definitions and comments in both English and Chinese. The original content has been corrupted with a series of ���� characters, which are indicative of encoding issues.

Solution Discussed:
To address this issue, one option is to use charset_normalizer when reading files in subsequent logic. The idea is to re-read the file content with charset_normalizer, detect the correct encoding, and decode the content properly; a sketch of this read path follows the list below.

Proposed Changes to Workflow:

  • Integrate charset_normalizer into the file-reading step of the workflow to handle files with mixed or uncertain encodings.
  • Replace instances of direct file reading with charset_normalizer to ensure content is correctly decoded before processing.
  • Ensure all files are saved with a consistent encoding (utf-8 recommended) to prevent similar issues in the future.
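
A minimal sketch of that read path, assuming the charset_normalizer package is available (the function name and fallback behaviour here are illustrative, not the project's actual code):

from charset_normalizer import from_path

def read_text_robustly(file_path):
    # Detect the most likely encoding and decode the file with it.
    best_guess = from_path(file_path).best()
    if best_guess is None:
        # Fall back to utf-8 and replace undecodable bytes rather than crashing.
        with open(file_path, "rb") as f:
            return f.read().decode("utf-8", errors="replace")
    return str(best_guess)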

Additional Context:
This solution aims to normalize the file content during the read operation without changing the initial file-saving behavior. By processing the encoding on read, we can handle files from various sources and encoding states more robustly.
