
thunlp / legent


Open Platform for Embodied Agents

Home Page: https://docs.legent.ai

License: Apache License 2.0

Python 99.48% Shell 0.52%
embodied-ai language-grounding large-multimodal-models physics-engine robot-simulator

legent's Introduction

LEGENT

Open Platform for Embodied Agents


Introduction

In the future, robots will perceive the environment as we do, communicate with us through natural language, and help us with our tasks. LEGENT is dedicated to developing robots that can chat, see, and act, moving from virtual worlds to the real world. Designed to integrate large models with embodied agents, the platform prioritizes ease of use and scalability and focuses on developing:

  • An easy-to-use environment that simulates a physical world, where an agent can interact with humans through language, receive egocentric vision, and perform physical actions.

  • Automated generation of training data, including scenes, tasks, and agent trajectories. The platform is tailored to training large multimodal models as embodied models on data generated at scale in simulated worlds. LEGENT serves as the data engine for embodied models in robotics and games, as well as for world models.

Demonstration

Interact with the embodied agent in realistic scenes.


Interact with the embodied agent in stylized scenes.


Features

  • Language Interaction. Use natural language as the human-robot interaction interface.

  • Fundamental Physics. The simulation incorporates gravity, friction, and collision dynamics.

  • Diverse Rendering. By adjusting assets and rendering features, LEGENT can achieve photorealistic rendering and stylized rendering. Instructions for trying out these scenes can be found here.

  • Interactable Objects. Agents and humans can manipulate various 3D objects.

  • Scalable Assets. LEGENT supports importing (1) your own 3D objects, (2) objects from academic datasets, and (3) objects created by generative models. Learn more here. Note that adequately annotated 3D objects are scarce and vary widely in format and quality. We are compiling a unified, open library of object assets that can be freely used for embodied agent research.

  • Humanoid Animation. Body movement and nonverbal expression are also important for embodied agents. LEGENT will continue to enhance support in this aspect.

  • Scene Generation. LEGENT integrates advanced scene generation algorithms to support scalable training.

  • Trajectory Generation. Automatically generates training data for turning multimodal models into language-grounded embodied models. A minimal example of a trajectory:

    {
      "id": "20240509-223825-320898",
      "interactions": [
          {
              "from": "human",
              "text": "Where is the orange?"
          },
          {
              "from": "agent",
              "trajectory": [
                  {
                      "image": "20240509-223825-320898/0000.png",
                      "action": "rotate_right(18)"
                  },
                  {
                      "image": "20240509-223825-320898/0001.png",
                      "action": "move_forward(2.0)"
                  },
                  {
                      "image": "20240509-223825-320898/0002.png",
                      "action": "move_forward(1.8), rotate_right(30)"
                  },
                  {
                      "image": "20240509-223825-320898/0003.png",
                      "action": "speak(\"It's on the sofa.\")"
                  }
              ]
          }
      ]
    }
  • User-friendly. LEGENT requires no complex installation and can run cross-platform on both PCs and servers. It is as intuitive as a game while also supporting complex research needs.
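The trajectory record shown under Trajectory Generation is plain JSON, so it can be consumed with the Python standard library. The following is a minimal sketch, not part of LEGENT, that flattens a record into (image, action, arguments) rows. It assumes every "action" string is valid Python call syntax, as in the example, where a single step may chain several calls with commas; the helper names parse_actions and agent_steps are hypothetical.

```python
import ast
import json
from typing import Any

# The example trajectory record, reproduced for a self-contained demo.
TRAJECTORY_JSON = r"""
{
  "id": "20240509-223825-320898",
  "interactions": [
    {"from": "human", "text": "Where is the orange?"},
    {"from": "agent", "trajectory": [
      {"image": "20240509-223825-320898/0000.png", "action": "rotate_right(18)"},
      {"image": "20240509-223825-320898/0001.png", "action": "move_forward(2.0)"},
      {"image": "20240509-223825-320898/0002.png", "action": "move_forward(1.8), rotate_right(30)"},
      {"image": "20240509-223825-320898/0003.png", "action": "speak(\"It's on the sofa.\")"}
    ]}
  ]
}
"""


def parse_actions(action: str) -> list[tuple[str, list[Any]]]:
    """Split e.g. "move_forward(1.8), rotate_right(30)" into (name, args) pairs.

    Assumption (hypothetical helper): the action string is valid Python call syntax.
    """
    tree = ast.parse(action, mode="eval").body
    # Several comma-separated calls parse as a tuple; a single call does not.
    calls = tree.elts if isinstance(tree, ast.Tuple) else [tree]
    parsed = []
    for call in calls:
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ValueError(f"unsupported action: {ast.unparse(call)}")
        # literal_eval only accepts literal values, so nothing is executed.
        args = [ast.literal_eval(arg) for arg in call.args]
        parsed.append((call.func.id, args))
    return parsed


def agent_steps(record: dict) -> list[tuple[str, str, list[Any]]]:
    """Flatten every agent turn into (image, action_name, args) rows."""
    rows = []
    for turn in record["interactions"]:
        if turn["from"] != "agent":
            continue  # human turns carry text, not actions
        for step in turn["trajectory"]:
            for name, args in parse_actions(step["action"]):
                rows.append((step["image"], name, args))
    return rows


record = json.loads(TRAJECTORY_JSON)
for image, name, args in agent_steps(record):
    print(image, name, args)
```

Parsing with ast instead of splitting on commas keeps speak() arguments that themselves contain commas intact, and a step with two chained calls yields two rows that share the same image.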

Note

The LEGENT team is currently organizing code and documentation and improving existing features. LEGENT will be more convenient to use once this process is complete. If you want a more stable version, please stay tuned!

TODO List

  • Polish APIs and write complete documentation.
  • Release the first stable version.
  • Develop a more powerful data generation system for training LMM-based embodied agents.
  • Add planning-level action APIs to support text-only research.
  • Add humanoid animation action APIs to support text-to-motion research.
  • Add physics-based character/body control by integrating more dedicated tools such as MuJoCo.
  • Add multi-agent support.

legent's People

Contributors

chengzl18, kingboyandgirl, shengdinghu, tuyuge, zt-wang19


legent's Issues

Missing file scripts/llava/zero3.json

When I run bash scripts/llava/train.sh, I get the following error:
ValueError: Expected a string path to an existing deepspeed config, or a dictionary, or a base64 encoded string. Received: scripts/llava/zero3.json.

Could the authors add the zero3.json file to the repository? Thanks!
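The error means the DeepSpeed config path passed by the script does not exist. Until the file is published, one possible workaround is to create a generic ZeRO stage-3 config at that path. This is a hedged sketch, not the authors' missing file, whose actual settings are unknown: the values below are common defaults assumed for illustration, and the "auto" entries assume the training script uses the Hugging Face Trainer's DeepSpeed integration, which fills them in from its training arguments. write_zero3_config is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed, generic ZeRO stage-3 settings (not the repository's zero3.json).
# "auto" values are resolved by the Hugging Face Trainer's DeepSpeed integration.
ZERO3_CONFIG = {
    "bf16": {"enabled": "auto"},
    "fp16": {"enabled": "auto"},
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,  # partition optimizer states, gradients, and parameters
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights when saving so checkpoints are usable directly.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}


def write_zero3_config(path: str = "scripts/llava/zero3.json") -> Path:
    """Hypothetical helper: write ZERO3_CONFIG to the path train.sh expects."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(ZERO3_CONFIG, indent=4))
    return target
```

Running write_zero3_config() from the repository root creates the file at the path named in the error; batch sizes and precision still come from the arguments in train.sh.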
