develop-agent's Introduction

Web Development AI Agent

Python 11 License: MIT Code style: black

Basic Usage

  1. Set up your OpenAI API key:

    export OPENAI_API_KEY=<YOUR API KEY>
  2. Run the agent:

    python3 run.py --instruction 'Please develop a webpage that displays hello world.' --agent_type 'reflect' --model 'gpt-4-vision-preview'
    • --instruction: Specify the task or instruction for the agent.
    • --agent_type: Choose the type of agent. In this example, 'reflect' is used. (ReAct + Vision Feedback)
    • --model: Select the OpenAI model to be used. Here, gpt-4-vision-preview is specified.

Make sure to replace <YOUR API KEY> with your actual OpenAI API key.
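The same invocation can also be scripted. The following is a minimal sketch, assuming `run.py` sits in the current working directory; `build_command` and `run_agent` are hypothetical helper names and are not part of this repository. Only the three flags documented above are used.

```python
import os
import subprocess


def build_command(instruction, agent_type="reflect", model="gpt-4-vision-preview"):
    """Return the argument list for the run.py invocation shown above."""
    return [
        "python3", "run.py",
        "--instruction", instruction,
        "--agent_type", agent_type,
        "--model", model,
    ]


def run_agent(instruction, **kwargs):
    """Hypothetical helper: fail early if the API key is missing, then launch run.py."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; export it first.")
    # check=True raises CalledProcessError if run.py exits with a non-zero status.
    return subprocess.run(build_command(instruction, **kwargs), check=True)
```

Calling `run_agent('Please develop a webpage that displays hello world.')` would then start the agent with the default `reflect` agent type. Passing the arguments as a list avoids shell quoting problems when the instruction contains quotes.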

Agent Architecture

[Figure: AgentArch]

The agent's architecture is based on the ReAct framework, which alternates think and action steps. However, I observed two recurring problems: the agent often created .py files without running them, and after running the app it did not thoroughly check whether the app actually worked.

To address this issue, I introduced a new action called see. This action lets the agent view its running website and verify that it behaves as intended. As illustrated in the picture, without see the agent typically creates skeleton code for the Tetris game and terminates the task prematurely.

With the see action, the agent can recognize that it has only created skeleton code and that it still needs to fill in the HTML/CSS/JS components to complete the Tetris game. It can then evaluate its work more thoroughly and adjust the code until the website functions properly.

Overall, the see action improves the agent's ability to self-assess and refine its output, leading to more complete and functional web development projects.
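The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the names `Step`, `AgentState`, `run_episode`, `write`, and `scripted_policy` are hypothetical, the model is replaced by a scripted policy, and the `see` stub only compares strings, whereas the real action inspects the running website.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MAX_HOPS = 12  # action budget; matches the limit used in the Performance section


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str


@dataclass
class AgentState:
    instruction: str
    history: List[Step] = field(default_factory=list)


# A policy maps the current state to (thought, action, argument).
Policy = Callable[[AgentState], Tuple[str, str, str]]
Tool = Callable[[str], str]


def run_episode(instruction: str, policy: Policy, tools: Dict[str, Tool],
                max_hops: int = MAX_HOPS) -> AgentState:
    """Alternate think and action steps until 'finish' or the hop budget runs out."""
    state = AgentState(instruction)
    for _ in range(max_hops):
        thought, action, argument = policy(state)
        if action == "finish":
            break
        observation = tools[action](argument)
        state.history.append(Step(thought, action, argument, observation))
    return state


# --- Stubs standing in for the file system, the browser, and the vision model ---
files: Dict[str, str] = {}


def write(content: str) -> str:
    files["index.html"] = content
    return "wrote index.html"


def see(_: str) -> str:
    # Assumption: a plain string check replaces the real visual inspection.
    if files.get("index.html") == "full game":
        return "page looks complete"
    return "page shows only a skeleton"


def scripted_policy(state: AgentState) -> Tuple[str, str, str]:
    """Stand-in for the model: look after every write, finish only once see succeeds."""
    if not state.history:
        return ("Start with a skeleton.", "write", "skeleton")
    last = state.history[-1]
    if last.action == "write":
        return ("Check the running page.", "see", "")
    if last.observation == "page looks complete":
        return ("Verified visually.", "finish", "")
    return ("Only a skeleton is visible; fill in the code.", "write", "full game")


state = run_episode("Develop a Tetris game webpage.", scripted_policy,
                    {"write": write, "see": see})
for step in state.history:
    print(f"{step.action:5} -> {step.observation}")
```

In this sketch the policy cannot finish right after writing the skeleton, because the first `see` observation reports that only a skeleton is visible. That is the behavior the see action is meant to encourage.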

Performance

This section presents an evaluation of the agent's performance on a series of web development tasks. The agent was tested on five test cases of varying complexity. In each test case, the agent had to complete the task within a maximum of 12 actions (hops). The success rate for each test case was determined by manual human evaluation over 10 trials.

| No. | Test Case | Functionality Test | Success Rate |
|-----|-----------|--------------------|--------------|
| 1 | Hello World Webpage | Verify the page correctly displays the text. | 100% |
| 2 | Interactive Box Movement Webpage | Check if the box can be moved smoothly with mouse interactions. | 90% |
| 3 | Page Views Counter Webpage | Ensure the total count of views is stored and updated with each new visit. | 90% |
| 4 | Neumorphic Style Todo Webpage | Test the addition, deletion, and visibility of todos. | 90% |
| 5 | Tetris Game Webpage | Test full game functionality including block generation and game over logic. | 20% |
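Because each rate comes from 10 trials, every percentage corresponds to a whole number of passing trials. The short sketch below only redoes that arithmetic from the reported rates; the variable and function names are illustrative, and the pooled figure is derived here rather than reported by the original evaluation.

```python
TRIALS_PER_CASE = 10  # each test case was run 10 times

# Success rates as reported in the results above, in percent.
success_rate_percent = {
    "Hello World Webpage": 100,
    "Interactive Box Movement Webpage": 90,
    "Page Views Counter Webpage": 90,
    "Neumorphic Style Todo Webpage": 90,
    "Tetris Game Webpage": 20,
}


def passing_trials(rate_percent, trials=TRIALS_PER_CASE):
    """Convert a percentage into the number of trials that passed."""
    return round(rate_percent * trials / 100)


passes = {name: passing_trials(rate) for name, rate in success_rate_percent.items()}
total_passed = sum(passes.values())
total_trials = TRIALS_PER_CASE * len(passes)
pooled_rate = 100 * total_passed / total_trials

for name, count in passes.items():
    print(f"{name}: {count}/{TRIALS_PER_CASE}")
print(f"Pooled: {total_passed}/{total_trials} = {pooled_rate:.0f}%")
```

With only 10 trials per case, a single trial moves a rate by 10 percentage points, so the gap between 90% and 100% is one trial.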

Key Findings:

  • The AI agent excelled in basic web development tasks (Test Cases 1-4), with success rates of 90% to 100%.
  • The agent demonstrated proficiency in styling webpages using the neumorphism design style (Test Case 4).
  • The agent struggled with complex tasks involving game development and advanced logic (Test Case 5), succeeding in only 20% of trials.

The evaluation reveals that while the AI agent is highly capable of handling fundamental web development tasks, it faces significant difficulties in developing intricate applications like Tetris. The agent's inability to effectively identify and resolve issues in complex codebases highlights the need for further advancements in its debugging and error recovery capabilities.

Contributors

seungyounshin
