Spot Sim2Real Infrastructure

License: MIT License


spot-sim2real's Introduction

🤖 Spot-Sim2Real

Spot-Sim2Real is a modular library for development of Spot for embodied AI tasks (e.g., Language-guided Skill Coordination (LSC), Adaptive Skill Coordination (ASC)) -- configuring Spot robots, controlling sensorimotor skills, and coordinating Large Language Models (LLMs).

๐Ÿ“ Setup instructions

Please refer to the setup instructions page for information on how to set up the repo. Note that this repo does not track the dirty status of submodules by default; if you make any intentional changes within the third-party packages, be sure to track them separately.

💻 Connecting to the robot

The computer can be connected to the robot in one of the following modes.

  1. Ethernet (gives the best network speed, but adds cable clutter)
    This mode can be used to create a wired connection with the robot. Useful for teleoperating the robot via the computer.
  2. Access Point Mode
    This is a wireless mode where the robot creates its own wifi network. Connect the robot to this mode to teleoperate it using the controller over long distances. The robot is in Access Point mode if you see a wifi network with a name like spot-BD-*********** (where * is a number).
  3. Client Mode (gives the 2nd best network speed; we usually prefer this)
    This is a wireless mode where the robot is connected to an external wifi network (from a nearby router). The computer should be connected to the same network; a wired connection between the router and the computer will be faster than a wireless one.

Follow the steps from Spot's Network Setup page by Boston Dynamics to connect to the robot.

After setting up Spot in the correct network configuration, add its IP address to your bashrc

echo "export SPOT_IP=<spot's ip address>" >> ~/.bashrc
source ~/.bashrc

Test and ensure that you can ping Spot

ping $SPOT_IP

If you get a response like this, then you are on the right network

(spot_ros) user@linux-machine:~$ ping $SPOT_IP
PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=8.87 ms
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=7.36 ms

Before starting to run the code, you need to ensure that all ROS env variables are set up properly inside bashrc. Please follow the steps from Setting ROS env variables for proper ROS env var setup.

🖥️ Getting to the repo

Go to the repository

cd /path/to/spot-sim2real/

The code for the demo lies inside the main branch.

# Check your current git branch
git rev-parse --abbrev-ref HEAD

# If you are not in the `main` branch, then checkout to the `main` branch
git checkout main

🚈 Try teleoperating the robot using the keyboard

🚨 Running Emergency Stop

  • Since we do not have a physical emergency stop button (like the large red push buttons), we need to run an e-stop node:
    python -m spot_wrapper.estop
  • Keep this window open at all times; if the robot starts misbehaving, you should be able to quickly press s or the space bar to kill the robot

🎹 Running keyboard teleop

🎮 Instructions to record waypoints (use the joystick to move the robot around)

  • Before running scripts on the robot, waypoints should be recorded. These waypoints live inside the file spot-sim2real/spot_rl_experiments/configs/waypoints.yaml (a quick way to list them is sketched after these steps)

  • Before recording receptacles, make the robot sit at its home position, then run the following command

    spot_reset_home
  • There are 2 types of waypoints that one can record:

    1. clutter - These only require the (x, y, theta) of the target receptacle
    2. place - These require the (x, y, theta) of the target receptacle as well as the (x, y, z) of the exact drop location on the receptacle
  • To record a clutter target, teleoperate the robot (using the joystick) until it is close to the target receptacle, then run the following command

    spot_rl_waypoint_recorder -c <name_for_clutter_receptacle>
  • To record a place target, teleoperate the robot (using the joystick) until it is close to the target receptacle. Then use the joystick's manipulation mode to move the end-effector to the desired (x, y, z) position. Once you are satisfied with the end-effector position, run the following command

    spot_rl_waypoint_recorder -p <name_for_place_receptacle>
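
If you want to quickly check which waypoints have been recorded, something like the following can be used. This is a sketch: it assumes waypoints.yaml has a top-level nav_targets mapping (which matches the WAYPOINTS['nav_targets'] usage later on this page) and that you run it from the repo root.

import yaml

# List the recorded nav targets from the waypoints file (path relative to the repo root)
with open("spot_rl_experiments/configs/waypoints.yaml") as f:
    waypoints = yaml.safe_load(f)

print("Recorded nav targets:", list(waypoints.get("nav_targets", {}).keys()))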

🚀 Running instructions

Running the demo (ASC/LSC/Seq-Experts)

Step1. Run the local launch executable

  • In a new terminal, run the executable as

    spot_rl_launch_local

    This command starts 4 tmux sessions:

    1. roscore
    2. img_publishers
    3. proprioception
    4. tts
  • You can run tmux ls in the terminal to ensure that all 4 tmux sessions are running. All 4 sessions need to stay alive for at least 70 seconds after running spot_rl_launch_local. If any one of them dies before 70 seconds, something has gone wrong and you should rerun spot_rl_launch_local.

  • You should try re-running spot_rl_launch_local at least 2-3 times to see if the issue persists. roscore often takes a while to start, which causes the other nodes to die; re-running can fix this.

  • You can verify if all ros nodes are up and running as expected if the output of rostopic list looks like the following

    (spot_ros) user@linux-machine:~$ rostopic list
    /filtered_hand_depth
    /filtered_head_depth
    /hand_rgb
    /mask_rcnn_detections
    /mask_rcnn_visualizations
    /raw_hand_depth
    /raw_head_depth
    /robot_state
    /rosout
    /rosout_agg
    /text_to_speech
  • If you don't see this output, one of the tmux sessions might be failing. Follow the debugging strategies described in ISSUES.md for triaging and resolving these errors.

Step2. Run ROS image visualization

  • This is the image visualization tool that helps you understand what the robot is seeing and perceiving from the world

    spot_rl_ros_img_vis
  • Running this command will open an image viewer and start printing image frequencies from the different rostopics.

  • If the image frequency corresponding to mask_rcnn_visualizations is large and constant (like below), it means that the bounding box detector has not been fully initialized yet

    raw_head_depth: 9.33 raw_hand_depth: 9.33 hand_rgb: 9.33 filtered_head_depth: 11.20 filtered_hand_depth: 11.20 mask_rcnn_visualizations: 11.20
    raw_head_depth: 9.33 raw_hand_depth: 9.33 hand_rgb: 9.33 filtered_head_depth: 11.20 filtered_hand_depth: 8.57 mask_rcnn_visualizations: 11.20
    raw_head_depth: 9.33 raw_hand_depth: 9.33 hand_rgb: 9.33 filtered_head_depth: 8.34 filtered_hand_depth: 8.57 mask_rcnn_visualizations: 11.20

    Once the mask_rcnn_visualizations frequency starts changing (like below), you can proceed with the next steps

    raw_head_depth: 6.87 raw_hand_depth: 6.88 hand_rgb: 6.86 filtered_head_depth: 4.77 filtered_hand_depth: 5.01 mask_rcnn_visualizations: 6.14
    raw_head_depth: 6.87 raw_hand_depth: 6.88 hand_rgb: 6.86 filtered_head_depth: 4.77 filtered_hand_depth: 5.01 mask_rcnn_visualizations: 5.33
    raw_head_depth: 4.14 raw_hand_depth: 4.15 hand_rgb: 4.13 filtered_head_depth: 4.15 filtered_hand_depth: 4.12 mask_rcnn_visualizations: 4.03
    raw_head_depth: 4.11 raw_hand_depth: 4.12 hand_rgb: 4.10 filtered_head_depth: 4.15 filtered_hand_depth: 4.12 mask_rcnn_visualizations: 4.03

Step3. Reset home in a new terminal

  • This is an important step. Ensure the robot is at its start location and sitting, then run the following command in a new terminal

    spot_reset_home
  • The recorded waypoints are w.r.t the home location. Since the odometry drifts while the robot moves, it is necessary to reset home before the start of every new run

Step4. Emergency stop

Step5. Main demo code in a new terminal

  • Ensure you have correctly added the waypoints of interest by following the instructions to record waypoints

  • In a new terminal you can now run the code of your choice

    1. To run Sequential experts

      spot_rl_mobile_manipulation_env
    2. To run Adaptive skill coordination

      spot_rl_mobile_manipulation_env -m
    3. To run Language instructions with Sequential experts, ensure the usb microphone is connected to the computer

      python spot_rl_experiments/spot_rl/envs/lang_env.py
  • If you are done with the demo of one of the above and want to run another, you do not need to re-run the other sessions and nodes. Running a new command in the same terminal will work just fine. But make sure to bring the robot back to its home location and reset its home using spot_reset_home in the same terminal

Step6. [Optional] Pick with Pose estimation (uses NVIDIA's FoundationPose Model)

  • Ensure FoundationPoseForSpotSim2real is set up as a submodule in the third_party folder; please follow the instructions in SETUP_INSTRUCTIONS.md
  • Currently we only support pose estimation for the bottle, penguin plush toy & paper cup found in FB offices' microkitchen
  • New meshes can be added using a 360° video of the object from any camera (iPhone, Android); the entire process is described in the above repo's README
  • The pose estimation model FoundationPose runs as a microservice & can be communicated with through a pyzmq socket
  • Step 1 also starts the pose estimation service, so no other step is required to start this microservice
  • How to use Pose Estimation? (a sketch follows this list)
    • You can pass two flags, enable_pose_estimation & enable_pose_correction, to the pick skill as skillmanager.pick(enable_pose_estimation=True, enable_pose_correction=True)
    • If you enable pose correction, Spot will first manually correct the object pose (e.g. rotate a horizontal object to be vertical) & place the corrected object back at the same spot
    • Our orientation solver can also correct the object to face the camera, but this incurs an additional pick attempt before place can run, so it is kept False by default
    • Enabling pose estimation helps in two major ways - it informs the grasp API how to approach the object (top-down or from the side), which increases the grasp success probability, & it corrects the object orientation before place is run
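
A minimal sketch of the pick call described above; the import path and the bare constructor are assumptions (check your local tree), only the two flags are taken from these notes.

from spot_rl.envs.skill_manager import SpotSkillManager  # module path assumed

skillmanager = SpotSkillManager()
# Pick with pose estimation and pose correction enabled (flags as documented above)
skillmanager.pick(enable_pose_estimation=True, enable_pose_correction=True)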

Using Spot Data-logger

All logs will be stored inside the data/data_logs directory

Logged keys

The logger will capture Spot's data such that each timestamp's log packet is a dict with the following keys:

"timestamp" : double, # UTC epoch time from time.time()
"datatime": str # human readable corresponding local time as "YY-MM-DD HH:MM:SS"
"camera_data" : [
                    {
                        "src_info" : str, # this is name of camera source as defined in SpotCamIds
                        "raw_image": np.ndarray, # this is spot's camera data as cv2 (see output of Spot.image_response_to_cv2() for more info)
                        "camera_intrinsics": np.ndarray, # this is 3x3 matrix holding camera intrinsics
                        "base_T_camera": np.ndarray, # this is 4x4 transformation matrix of camera w.r.t base frame of robot
                    },
                    ...
                ],
"vision_T_base": np.ndarray, # this is 4x4 transformation matrix of base frame w.r.t vision frame
"base_pose_xyt": np.ndarray, # this is 3 element array representing x,y,yaw w.r.t home frame
"arm_pose": np.array, # this is a 6 element array representing arm joint states (ordering : sh0, sh1, el0, el1, wr0, wr1)
"is_gripper_holding_item": bool, # whether gripper is holding something or not
"gripper_open_percentage": double, # how much is the gripper open
"gripper_force_in_hand": np.ndarray, # force estimate on end-effector in hand frame

Logging data

The data logger is designed to log the data described above at whatever rate sensor data becomes available (which depends on the network setup).

To run the logger asynchronously, simply run the following command in a new terminal

python -m spot_wrapper.data_logger --log_data

This will record data in a while loop; press Ctrl+C to stop the logger. The log file will be saved as data/data_logs/<YY,MM,DD-HH,MM,SS>.pkl
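
To sanity-check a saved log from Python, a sketch like the following can be used; it assumes the pickle holds a list of per-timestamp packet dicts with the keys listed above, and the file name is a placeholder.

import pickle

# Load a saved log (placeholder file name) and print a short summary per packet
with open("data/data_logs/23,11,30-12,00,00.pkl", "rb") as f:
    log_packets = pickle.load(f)

for packet in log_packets:
    print(packet["datatime"], packet["base_pose_xyt"])
    for cam in packet["camera_data"]:
        print("   ", cam["src_info"], cam["raw_image"].shape)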

Warning : This logger will cause motion blur, as camera data is logged while the robot moves. Currently we do not support a Spot-Record-Go protocol for logging

Log replay

It is also possible to replay the logged data (essentially the camera streams that have been logged) using the following command :

python -m spot_wrapper.data_logger --replay="<name_of_log_file>.pkl"

Caution : For replay, the log file SHOULD be a pkl file with the keys listed above

Caution : Please ensure the log file is present inside data/data_logs dir.

🔧 Call skills (non-blocking) without installing spot-sim2real in your home environment

We provide a function that can call skills from a separate conda environment. The skill call itself is non-blocking.

Step1. Follow the Running instructions section to set up the image client in the spot_ros conda environment

Step2. Run skill_executor.py to listen for which skill to use. This will run in the background.

python spot_rl_experiments/spot_rl/utils/skill_executor.py 

Step3. Use ROS to trigger skills from your application. Now you can call skills in a non-blocking way.

# In your application, import rospy to select which skill to run
import rospy # This is the only package you need to install in your environment
rospy.set_param("skill_name_input", "Navigate,desk") # Call the navigation skill to navigate to the desk. This is a non-blocking call.
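
If your application needs to know when the skill has finished, it will need some completion signal from skill_executor.py. The polling sketch below is hypothetical: the convention that the executor clears skill_name_input when the skill is done is an assumption, so check skill_executor.py for the actual mechanism.

import time

import rospy

rospy.set_param("skill_name_input", "Navigate,desk")  # non-blocking request, as above
while rospy.get_param("skill_name_input") != "":      # assumed convention: executor clears the param when done
    time.sleep(0.5)
print("Navigation finished (under the assumed convention)")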

👓 Run Spot-Aria project code

Follow the steps in the project documentation.

โญ Convert pytorch weights to torchscript

To convert pytorch weights to torchscript, please follow Torchscript Conversion Instructions.

📣 Acknowledgement

We thank Naoki Yokoyama for setting up the foundation of the codebase, and Joanne Truong for polishing the codebase. Spot-Sim2Real is built upon Naoki's codebases (bd_spot_wrapper and spot_rl_experiments), adding new features (LLMs, pytest) and improving robustness.

โœ๏ธ Citations

If you find this repository helpful, feel free to cite our papers: Adaptive Skill Coordination (ASC) and Language-guided Skill Coordination (LSC).

@article{yokoyama2023adaptive,
  title={Adaptive Skill Coordination for Robotic Mobile Manipulation},
  author={Yokoyama, Naoki and Clegg, Alexander William and Truong, Joanne and Undersander, Eric  and Yang, Tsung-Yen and Arnaud, Sergio and Ha, Sehoon and Batra, Dhruv and Rai, Akshara},
  journal={arXiv preprint arXiv:2304.00410},
  year={2023}
}

@misc{yang2023adaptive,
    title={LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place},
    author={Yang, Tsung-Yen and Arnaud, Sergio and Shah, Kavit and Yokoyama, Naoki and  Clegg, Alexander William and Truong, Joanne and Undersander, Eric and Maksymets, Oleksandr and Ha, Sehoon and Kalakrishnan, Mrinal and Mottaghi, Roozbeh and Batra, Dhruv and Rai, Akshara},
    howpublished={\url{https://languageguidedskillcoordination.github.io/}}
}

License

Spot-Sim2Real is MIT licensed. See the LICENSE file for details.

spot-sim2real's People

Contributors: kavitshah1998, tusharsangam, jimmytyyang, zephirefaith, jdvakil


spot-sim2real's Issues

Record trajectories in super (Skill) class for both base & arm

As suggested by @akshararai in PR #131 -> #131 (comment)

Currently:
Nav skill returns trajectory.
Pick skill returns a dict.

self.gaze_result = {
    "target_object": target_object,
    "time_taken": time.time() - self.start_time,
    "success": self.env.grasp_attempted and success_status_from_user_feedback,
}

Place skill returns a dict

self.place_result = {
    "time_taken": time.time() - self.start_time,
    "success": is_position_within_bounds(
        local_place_target_spot,
        local_ee_pose_spot,
        self.config.SUCC_XY_DIST,
        self.config.SUCC_Z_DIST,
        convention="spot",
    ),
    "place_target": local_place_target_spot,
    "ee_pos": local_ee_pose_spot,
}

We should standardize what they all return when we move to an abstract class.
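
To make the proposal concrete, one possible shape for a standardized skill result is sketched below; the class and field names are hypothetical, not existing repo code.

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class SkillResult:
    success: bool
    time_taken: float
    trajectory: List[Any] = field(default_factory=list)   # per-step base and/or arm states
    extras: Dict[str, Any] = field(default_factory=dict)  # skill-specific info (e.g. place_target, ee_pos)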

Refactor SpotQRDetector class and Spot Interface

  • Move get_spot_a_tform_b to the main Spot class and replace all get_ABC_T_XYZ calls with this generic function #83
  • Convert all transformations inside class SpotQRDetector to Sophus SE3 instead of magnum
  • Also update the return types inside april_tag_pose_estimator
  • Check all comments and ensure there is no trace of magnum inside aria related files
  • Remove dependence of class Spot from class AprilTagPoseEstimator
  • Refactor convert_XYZ_to_ABC kind of functions to: <library>_<representation>_to_<library>_<representation>, e.g. ros_Pose_to_ros_TransformStamped (see the sketch after this list)
  • Use the SE3 averaging methods from math_utils.py in SpotQRDetector too.
  • Refactor class AprilTagDetector to make it generic enough for spot & aria
  • Update ros_topic_frames.yaml to have a nested frame structure for spot & aria
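
A sketch of the proposed naming convention; the helper below is hypothetical, not existing repo code.

from geometry_msgs.msg import Pose, TransformStamped

def ros_Pose_to_ros_TransformStamped(pose: Pose, frame_id: str, child_frame_id: str) -> TransformStamped:
    """Convert a ROS Pose into a ROS TransformStamped between the two given frames."""
    tf = TransformStamped()
    tf.header.frame_id = frame_id
    tf.child_frame_id = child_frame_id
    tf.transform.translation.x = pose.position.x
    tf.transform.translation.y = pose.position.y
    tf.transform.translation.z = pose.position.z
    tf.transform.rotation = pose.orientation
    return tf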

O

import os

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # placeholder -- do not hard-code real API keys
import time
from collections import Counter

import magnum as mn
import numpy as np
from spot_wrapper.spot import Spot

from spot_rl.envs.base_env import SpotBaseEnv
from spot_rl.envs.gaze_env import SpotGazeEnv
from spot_rl.real_policy import GazePolicy, MixerPolicy, NavPolicy, PlacePolicy
from spot_rl.utils.remote_spot import RemoteSpot
from spot_rl.utils.utils import (
    WAYPOINTS,
    closest_clutter,
    construct_config,
    get_clutter_amounts,
    get_default_parser,
    nav_target_from_waypoints,
    object_id_to_nav_waypoint,
    place_target_from_waypoints,
)
#from spot_rl.models.whisper import WhisperTranslator
from spot_rl.models.sentence_similarity import SentenceSimilarity

from spot_rl.llm.src.rearrange_llm import RearrangeEasyChain

from hydra import compose, initialize
import subprocess
import time

CLUTTER_AMOUNTS = Counter()
CLUTTER_AMOUNTS.update(get_clutter_amounts())
NUM_OBJECTS = np.sum(list(CLUTTER_AMOUNTS.values()))
DOCK_ID = int(os.environ.get("SPOT_DOCK_ID", 520))

DEBUGGING = False

def main(spot, use_mixer, config, out_path=None):


    #audio_to_text = WhisperTranslator()
    sentence_similarity = SentenceSimilarity()
    #with initialize(config_path='../llm/src/conf'):
    #    llm_config = compose(config_name='config')
    #llm = RearrangeEasyChain(llm_config)
    #print('Give instruction!')
    #audio_to_text.record()
    #instruction = audio_to_text.translate()
    #instruction = 'take the rubik cube from the sining table to the hamper'
    #print(instruction)

    nav_1, pick, nav_2 = 'dining table', 'box', 'hamper'
    print('PARSED', nav_1, pick, nav_2)

    nav_1 = sentence_similarity.get_most_similar_in_list(nav_1, list(WAYPOINTS['nav_targets'].keys()))
    nav_2 = sentence_similarity.get_most_similar_in_list(nav_2, list(WAYPOINTS['nav_targets'].keys()))
    print('Most Similar', nav_1, pick, nav_2)

    print('Calling processes')
    subprocess.Popen(['tmux', 'kill-session', '-t', 'img_pub'])
    cmd = f"tmux new -s img_pub -d '/home/akshara/anaconda3/envs/spot_ros/bin/python -m spot_rl.utils.img_publishers --local --owlvit_label \"{pick}\"'"
    print(cmd)
    s = subprocess.getstatusoutput(cmd)
    print(s)

    policy = SequentialExperts(
            config.WEIGHTS.NAV,
            config.WEIGHTS.GAZE,
            config.WEIGHTS.PLACE,
            device=config.DEVICE,
        )
    env_class = SpotMobileManipulationSeqEnv
    env = env_class(config, spot)
    env.power_robot()
    time.sleep(1)
    count = Counter()
    out_data = []

    waypoint = nav_target_from_waypoints(nav_1)
    observations = env.reset(waypoint=waypoint)
    env.target_obj_name = nav_2

    policy.reset()
    done = False
    expert = Tasks.NAV
    env.stopwatch.reset()
    while not done:
        out_data.append((time.time(), env.x, env.y, env.yaw))

        base_action, arm_action = policy.act(observations, expert=expert)
        nav_silence_only = True
        env.stopwatch.record("policy_inference")
        observations, _, done, info = env.step(
            base_action=base_action,
            arm_action=arm_action,
            nav_silence_only=nav_silence_only,
        )
        
        expert = info["correct_skill"]

        if done:
            env.say("Finished object rearrangement. Heading to dock.")
            waypoint = nav_target_from_waypoints("dock")
            observations = env.reset(waypoint=waypoint)

        policy.nav_policy.reset()
        env.owlvit_pick_up_object_name = pick

        env.stopwatch.print_stats(latest=True)


    # Go to the dock
    env.say("Finished object rearrangement. Heading to dock.")
    waypoint = nav_target_from_waypoints("dock")
    observations = env.reset(waypoint=waypoint)
    expert = Tasks.NAV
    policy.nav_policy.reset()
    while True:

        base_action, arm_action = policy.act(observations, expert=expert)
        nav_silence_only = True
        env.stopwatch.record("policy_inference")
        observations, _, done, info = env.step(
            base_action=base_action,
            arm_action=arm_action,
            nav_silence_only=nav_silence_only,
        )
        try:
            spot.dock(dock_id=DOCK_ID, home_robot=True)
            spot.home_robot()
            break
        except:
            print("Dock not found... trying again")
            time.sleep(0.1)

    print("Done!")

    out_data.append((time.time(), env.x, env.y, env.yaw))

    if out_path is not None:
        data = (
                "\n".join([",".join([str(i) for i in t_x_y_yaw]) for t_x_y_yaw in out_data])
                + "\n"
        )
        with open(out_path, "w") as f:
            f.write(data)


class Tasks:
    r"""Enumeration of types of tasks."""

    NAV = "nav"
    GAZE = "gaze"
    PLACE = "place"


class SequentialExperts:
    def __init__(self, nav_weights, gaze_weights, place_weights, device="cuda"):
        print("Loading nav_policy...")
        self.nav_policy = NavPolicy(nav_weights, device)
        print("Loading gaze_policy...")
        self.gaze_policy = GazePolicy(gaze_weights, device)
        print("Loading place_policy...")
        self.place_policy = PlacePolicy(place_weights, device)
        print("Done loading all policies!")

    def reset(self):
        self.nav_policy.reset()
        self.gaze_policy.reset()
        self.place_policy.reset()

    def act(self, observations, expert):
        base_action, arm_action = None, None
        if expert == Tasks.NAV:
            base_action = self.nav_policy.act(observations)
        elif expert == Tasks.GAZE:
            arm_action = self.gaze_policy.act(observations)
        elif expert == Tasks.PLACE:
            arm_action = self.place_policy.act(observations)

        return base_action, arm_action


class SpotMobileManipulationBaseEnv(SpotGazeEnv):
    node_name = "SpotMobileManipulationBaseEnv"

    def __init__(self, config, spot: Spot):
        super().__init__(config, spot)

        # Nav
        self.goal_xy = None
        self.goal_heading = None
        self.succ_distance = config.SUCCESS_DISTANCE
        self.succ_angle = np.deg2rad(config.SUCCESS_ANGLE_DIST)
        self.gaze_nav_target = None
        self.place_nav_target = None
        self.rho = float("inf")
        self.heading_err = float("inf")

        # Gaze
        self.locked_on_object_count = 0
        self.target_obj_name = config.TARGET_OBJ_NAME

        # Place
        self.place_target = None
        self.ee_gripper_offset = mn.Vector3(config.EE_GRIPPER_OFFSET)
        self.place_target_is_local = False

        # General
        self.max_episode_steps = 1000
        self.navigating_to_place = False

    def reset(self, waypoint=None, *args, **kwargs):
        # Move arm to initial configuration (w/ gripper open)
        self.spot.set_arm_joint_positions(
            positions=np.deg2rad(self.config.GAZE_ARM_JOINT_ANGLES), travel_time=0.75
        )
        # Wait for arm to arrive to position
        # import pdb; pdb.set_trace()
        time.sleep(0.75)
        print("open gripper called in SpotMobileManipulationBaseEnv")
        self.spot.open_gripper()

        # Nav
        if waypoint is None:
            self.goal_xy = None
            self.goal_heading = None
        else:
            self.goal_xy, self.goal_heading = (waypoint[:2], waypoint[2])

        # Place
        self.place_target = mn.Vector3(-1.0, -1.0, -1.0)

        # General
        self.navigating_to_place = False

        return SpotBaseEnv.reset(self)

    def step(self, base_action, arm_action, *args, **kwargs):
        # import pdb; pdb.set_trace()
        _, xy_dist, z_dist = self.get_place_distance()
        place = xy_dist < self.config.SUCC_XY_DIST and z_dist < self.config.SUCC_Z_DIST
        if place:
            print("place is true")

        if self.grasp_attempted:
            print("A")
            grasp = False
        else:
            print("b")
            grasp = self.should_grasp()

        if self.grasp_attempted:
            max_joint_movement_key = "MAX_JOINT_MOVEMENT_2"
        else:
            max_joint_movement_key = "MAX_JOINT_MOVEMENT"

        # Slow the base down if we are close to the nav target for grasp to limit blur
        if (
            not self.grasp_attempted
            and self.rho < 0.5
            and abs(self.heading_err) < np.rad2deg(45)
        ):
            self.slowdown_base = 0.5  # Hz
            print("!!!!!!Slow mode!!!!!!")
        else:
            self.slowdown_base = -1
        disable_oa = False if self.rho > 0.3 and self.config.USE_OA_FOR_NAV else None
        # import ipdb; ipdb.set_trace()
        observations, reward, done, info = SpotBaseEnv.step(
            self,
            base_action=base_action,
            arm_action=arm_action,
            grasp=grasp,
            place=place,
            max_joint_movement_key=max_joint_movement_key,
            disable_oa=disable_oa,
            *args,
            **kwargs,
        )
        if done:
            print("done is true")

        if self.grasp_attempted and not self.navigating_to_place:
            # Determine where to go based on what object we've just grasped

            waypoint_name = self.target_obj_name
            waypoint = nav_target_from_waypoints(waypoint_name)

            #waypoint_name, waypoint = object_id_to_nav_waypoint(self.target_obj_name)
            self.say("Navigating to " + waypoint_name)
            self.place_target = place_target_from_waypoints(waypoint_name)
            self.goal_xy, self.goal_heading = (waypoint[:2], waypoint[2])
            self.navigating_to_place = True
            info["grasp_success"] = True

        return observations, reward, done, info

    def get_observations(self):
        observations = self.get_nav_observation(self.goal_xy, self.goal_heading)
        rho = observations["target_point_goal_gps_and_compass_sensor"][0]
        self.rho = rho
        goal_heading = observations["goal_heading"][0]
        self.heading_err = goal_heading
        self.use_mrcnn = False
        observations.update(super().get_observations())
        observations["obj_start_sensor"] = self.get_place_sensor()

        return observations

    def get_success(self, observations):
        return self.place_attempted


class SpotMobileManipulationSeqEnv(SpotMobileManipulationBaseEnv):
    node_name = "SpotMobileManipulationSeqEnv"

    def __init__(self, config, spot: Spot):
        super().__init__(config, spot)
        self.current_task = Tasks.NAV
        self.timeout_start = float("inf")

    def reset(self, *args, **kwargs):
        observations = super().reset(*args, **kwargs)
        self.current_task = Tasks.NAV
        self.target_obj_name = 0
        self.timeout_start = float("inf")

        return observations

    def step(self, *args, **kwargs):
        pre_step_navigating_to_place = self.navigating_to_place
        observations, reward, done, info = super().step(*args, **kwargs)

        if self.current_task != Tasks.GAZE:
            # Disable target searching if we are not gazing
            self.last_seen_objs = []

        if self.current_task == Tasks.NAV and self.get_nav_success(
            observations, self.succ_distance, self.succ_angle
        ):
            if not self.grasp_attempted:
                self.current_task = Tasks.GAZE
                self.timeout_start = time.time()
                #self.target_obj_name = None
            else:
                self.current_task = Tasks.PLACE
                self.say("Starting place")
                self.timeout_start = time.time()

        if self.current_task == Tasks.PLACE and time.time() > self.timeout_start + 10:
            # call place after 10s of trying
            print("Place failed to reach target")
            done = True


        if not pre_step_navigating_to_place and self.navigating_to_place:
            # This means that the Gaze task has just ended
            self.current_task = Tasks.NAV


        info["correct_skill"] = self.current_task

        self.use_mrcnn = False
        #

        return observations, reward, done, info


if __name__ == "__main__":

    parser = get_default_parser()
    parser.add_argument("-m", "--use-mixer", action="store_true")
    parser.add_argument("--output")
    args = parser.parse_args()
    config = construct_config(args.opts)

    spot = (RemoteSpot if config.USE_REMOTE_SPOT else Spot)("RealSeqEnv")
    if config.USE_REMOTE_SPOT:
        try:
            main(spot, args.use_mixer, config, args.output)
        finally:
            spot.power_off()
    else:
        with spot.get_lease(hijack=True):
            try:
                main(spot, args.use_mixer, config, args.output)
            finally:
                spot.power_off()

Unnecessary Dependencies

  1. File : spot_rl_experiments/spot_rl/envs/spot_base_env.py
    The Spot2Habitat transform SHOULD NOT BE A PART OF the SpotBaseEnv class. It is an unnecessary dependency.

...

Address comments from Heuristic Nav PR

PR whose comments need to be fixed: #99

  • Remove all references to Magnum and use Sophus instead. Reasons -> Magnum installation causes errors with the P1 demo stack; the P1 demo and other external packages all use Sophus, so maintenance is easier
  • Remove images generated (tests/hardware_tests/imagesearch_looking_forward.png)
  • If images are going to be generated in every run, please add the path to those images to .gitignore

Add copyright to each file for open sourcing

Copyright (c) Meta Platforms, Inc. and its affiliates.

This source code is licensed under the MIT license found in the

LICENSE file in the root directory of this source tree.

Move the functionality to power on and shutdown the robot into spot.py

This is an optimization of the current state of the repo.

Currently, the latest "controllers" that are getting added to Nav, Pick and Place each have a shutdown() method which detects the base and docks the robot, or sits down in place if no dock is detected.

This code duplication can be prevented by moving it to spot.py.
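
A sketch of what the consolidated helper in spot.py could look like; dock(dock_id=..., home_robot=...) and power_off() appear elsewhere on this page, while sit() is assumed to exist on the wrapper.

def shutdown(spot, dock_id=None):
    """Dock if a dock id is given and reachable, otherwise sit in place, then power off."""
    try:
        if dock_id is not None:
            spot.dock(dock_id=dock_id, home_robot=True)
        else:
            spot.sit()
    except Exception:
        # Docking failed or dock not found: just sit where we are
        spot.sit()
    finally:
        spot.power_off()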

Open issues from review of P1 demo PR#110

This issue is used to track all open/unresolved conversations from PR - #110

Reviewers
(Please check against your name if you don't have any additional changes / recommendations for PR #110.)

Comments from PR#110 to be reviewed
@zephirefaith
#110 (comment)
#110 (comment)

@jimmytyyang
#110 (comment)
#110 (comment)
#110 (comment)

@akshararai
#110 (comment)
#110 (comment)
#110 (comment)
#110 (comment)
#110 (comment)

Recommended TODOs for improving the PR#110

  • Upgrade the demo_trigger mechanism
  • Please add TODOs as tasks here.

Missing functionality for different default arm configurations

For resetting the arm, right now each environment (that uses the arm) stores the arm's initial configuration differently and resets it inside its own implementation of the reset method.

While this is fine, it blocks the dev from reusing such resets in cases where envs are not to be used (e.g. while using BD APIs directly).

Thus it would make more sense to create default configurations (for stow, carry, ready, etc.) for spot-sim2real and have a wrapper method responsible for achieving those arm configurations, as sketched below.

The envs can further utilize this wrapper method. This will prevent a lot of code duplication and be cleaner in the long run.
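
An illustrative wrapper for named arm configurations; the configuration names and joint angles below are placeholders, not the repo's actual values, and set_arm_joint_positions is used the same way as elsewhere on this page.

import numpy as np

# Placeholder joint angles in degrees, ordered (sh0, sh1, el0, el1, wr0, wr1)
ARM_CONFIGS_DEG = {
    "gaze":  [0, -160, 100, 0, 90, 0],
    "carry": [0, -180, 180, 0, 0, 0],
    "stow":  [0, -180, 180, 0, 0, 0],
}

def set_arm_configuration(spot, name, travel_time=1.0):
    """Move the arm to a named default configuration via the Spot wrapper."""
    positions = np.deg2rad(ARM_CONFIGS_DEG[name])
    spot.set_arm_joint_positions(positions=positions, travel_time=travel_time)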

Fix ROS node init issue in base_env, its parent & its children

The issue is that SpotRobotSubscriberMixin, which is the absolute base class, is the one that initializes the ros node.

SpotBaseEnv, SpotNavEnv, SpotGazeEnv, SpotPlaceEnv keep modifying the ros node name, which finally gets initialized in SpotRobotSubscriberMixin. This system is prone to bugs because SpotSkillManager, which does not inherit from any of the above mentioned classes, will not be able to alter the node name, and a file that utilizes SpotSkillManager cannot have a separate ros node init as that raises a runtime error.

[Spot] `num_steps` stays constant even when Spot is moving, leading to infinite loops

Saw this when MG was called on 12/08/2023. @KavitShah1998 has also seen this in the past for other skills, so it seems like a logic problem at the skill-manager or skill-to-env integration layer.

****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.01, -0.12]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.01, -0.12]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.13]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.14]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.14]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.15]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.15]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.16]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.17]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.17]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.18]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.18]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.19]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.19]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.19]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.20]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 100
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.20]	raw_arm_ac: None
base_action: [0.00, 0.00, -0.11]	arm_action: None
****************************************************num_steps: 101
Execution has not exceeded 500 steps.
raw_base_ac: [0.02, -0.44]	raw_arm_ac: None
base_action: [0.00, 0.00, -0.23]	arm_action: None
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.10]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.01, -0.06]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.06]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.06]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.07]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.07]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.08]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.09]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.08]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.01, -0.01]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.04]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.05]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [-0.00, -0.05]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.06]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.07]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.00, -0.07]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102
Execution has not exceeded 500 steps.
raw_base_ac: [0.01, -0.08]	raw_arm_ac: None
base_action: None	arm_action: None
!!!! NO ACTIONS CALLED: moving to next step !!!!
****************************************************num_steps: 102

Gripper does not rotate by 90 deg if place fails

When the robot attempts a place, it approaches the target pose using the Boston Dynamics API. But sometimes the operation fails, and when that happens the robot performs a hardcoded operation to drop the object and retract the arm.

It would be much smoother if, right before placing the object, we also rotate the arm by 90 degrees.

Branch : lang_env

The robot sometimes crashes with an error stating that the command sent to the robot has an end-time too far in the future, which crashes code execution

!!!!!!Slow mode!!!!!!
raw_base_ac: [-0.25, 0.40]	raw_arm_ac: [0.00, 0.00, 0.00, 0.00]
base_action: [-0.06, 0.00, 0.10]	arm_action: None
DETECTIONS STR
None
policy_inference: 0.0065 run_actions: 0.7089 get_observations: 0.0066 total: 0.7219
gater: nav	corrective: 0.007, 0.011, 0.002, 0.048
!!!!!!Slow mode!!!!!!
raw_base_ac: [-0.43, -0.46]	raw_arm_ac: [0.00, 0.00, 0.00, 0.00]
Traceback (most recent call last):
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/akshara/asc_demo/spot-sim2real/spot_rl_experiments/spot_rl/envs/mobile_manipulation_env.py", line 395, in <module>
    main(spot, args.use_mixer, config, args.output)
  File "/home/akshara/asc_demo/spot-sim2real/spot_rl_experiments/spot_rl/envs/mobile_manipulation_env.py", line 104, in main
    observations, _, done, info = env.step(
  File "/home/akshara/asc_demo/spot-sim2real/spot_rl_experiments/spot_rl/envs/mobile_manipulation_env.py", line 278, in step
    observations, reward, done, info = SpotBaseEnv.step(
  File "/home/akshara/asc_demo/spot-sim2real/spot_rl_experiments/spot_rl/envs/base_env.py", line 331, in step
    self.spot.set_base_velocity(
  File "/home/akshara/asc_demo/spot-sim2real/bd_spot_wrapper/spot_wrapper/spot.py", line 428, in set_base_velocity
    cmd_id = self.command_client.robot_command(
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/bosdyn/client/robot_command.py", line 324, in robot_command
    return self.call(self._stub.RobotCommand, req, _robot_command_value, _robot_command_error,
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/bosdyn/client/common.py", line 237, in processor
    return func(self, rpc_method, request, value_from_response=value_from_response,
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/bosdyn/client/common.py", line 362, in call
    return self.handle_response(response, error_from_response, value_from_response)
  File "/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/bosdyn/client/common.py", line 370, in handle_response
    raise exc  # pylint: disable=raising-bad-type
bosdyn.client.robot_command.TooDistantError: bosdyn.api.RobotCommandResponse (TooDistantError): The command end time was too far in the future.
/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/rospy/core.py:146: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  _rospy_logger.warn(msg, *args, **kwargs)
/home/akshara/miniconda3/envs/spot_ros/lib/python3.8/site-packages/rospy/core.py:598: DeprecationWarning: isAlive() is deprecated, use is_alive() instead
  if t.isAlive():
(spot_ros) akshara@akshara-laptop:~/asc_demo/spot-sim2real$ ^C

Observed while running ASC (so far)

Move generic util functions, classes from Aria and Spot to 3rd common package

          This file had some functions such as _rotate_img(), _rectify_image() which are also repeated in aria_live_streamer.py. At some point in the future, we may want to move such functions in some utils file? Similarly, if I am not mistaken, the parse_camera_stream() function in aria_streamer basically does what process_frame() function in aria_live_streamer does but over the entire stream of frames. So perhaps this function can also be refactored into a "utility" to reduce code duplication.

Originally posted by @rutadesai in #110 (comment)

Phantom pick at the end of ASC demo when robot is supposed to go to dock

We saw this several times during the RL-R demo and also during tests inside the apartment space in Fremont.

On finishing all tasks from waypoints.yaml, the robot tries to navigate to the base. While doing so, if it sees any detections, even from a non-registered receptacle, it goes and picks the object (from object_targets inside waypoints.yaml) and takes it to its desired location (again from object_targets inside waypoints.yaml).

On reaching this desired location, it thinks that it has reached the dock & sits down there.

The messages from the log & the attached image are evidence of this. The chair on which the robot detects a dinosaur as a "penguin" is not mapped as a receptacle at all, but Spotty picks from it nevertheless.

policy_inference: 0.0408 run_actions: 0.8101 get_observations: 0.0008 total: 0.8517
[base_env.py]: Saying: Finished object rearrangement. Heading to dock.
open gripper called in SpotMobileManipulationBaseEnv


...

[bounding_box]: Detected: ball, penguin
policy_inference: 0.0361 run_actions: 0.5098 get_observations: 0.0008 total: 0.5467
gater: gaze	corrective: 0.047, -0.025
raw_base_ac: [0.00, -0.00]	raw_arm_ac: [-0.94, 0.90, 0.99, 0.86]
base_action: None	arm_action: [-0.25, -2.56, 2.02, 0.00, 1.57, 0.00]
DETECTIONS STR
penguin,0.09405618906021118,197,120,249,177
[bounding_box]: Detected: penguin
Locked on to target 1 time(s)...
policy_inference: 0.0043 run_actions: 0.5114 get_observations: 0.0012 total: 0.5169
gater: gaze	corrective: 0.062, 0.091
raw_base_ac: [0.00, -0.00]	raw_arm_ac: [-0.34, 0.97, 0.51, -0.98]
base_action: None	arm_action: [-0.28, -2.48, 2.07, 0.00, 1.49, 0.00]
DETECTIONS STR
penguin,0.19008836150169373,175,68,226,133
[bounding_box]: Detected: penguin
Locked on to target 2 time(s)...
policy_inference: 0.0759 run_actions: 0.5098 get_observations: 0.0010 total: 0.5867
gater: gaze	corrective: -0.015, -0.021
raw_base_ac: [0.00, -0.00]	raw_arm_ac: [0.33, 0.98, -0.58, -1.00]
base_action: None	arm_action: [-0.25, -2.40, 2.02, 0.00, 1.40, 0.00]
DETECTIONS STR
penguin,0.33202460408210754,144,51,210,122
[bounding_box]: Detected: penguin
Locked on to target 3 time(s)...
policy_inference: 0.0192 run_actions: 0.5111 get_observations: 0.0009 total: 0.5312
gater: gaze	corrective: -0.009, 0.099
raw_base_ac: [0.00, -0.00]	raw_arm_ac: [0.77, 0.98, -0.76, -1.00]
DETECTIONS STR
penguin,0.36048218607902527,148,37,203,113
[bounding_box]: Detected: penguin
Saved grasp image as /home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/grasp_visualizations/1685142245.4967258.png
Lost lock-on!
GRASP CALLED: Aiming at (x, y): (250, 106)!
[base_env.py]: Saying: Grasping penguin
Current grasp_point_in_image state:  MANIP_STATE_SEARCHING_FOR_GRASP
Current grasp_point_in_image state:  MANIP_STATE_SEARCHING_FOR_GRASP
Current grasp_point_in_image state:  MANIP_STATE_SEARCHING_FOR_GRASP
Current grasp_point_in_image state:  MANIP_STATE_SEARCHING_FOR_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_MOVING_TO_GRASP
Current grasp_point_in_image state:  MANIP_STATE_GRASP_SUCCEEDED
base_action: None	arm_action: [0.07, -2.76, 2.02, 0.00, 1.22, 0.00]
[base_env.py]: Saying: Navigating to kitchen_sink

...


base_action: [-0.05, 0.00, -0.51]	arm_action: None
policy_inference: 0.0095 run_actions: 0.5094 get_observations: 0.0140 total: 0.5328
gater: nav	corrective: 0.007, 0.011, -0.015, 0.029
raw_base_ac: [0.11, -0.94]	raw_arm_ac: [0.00, 0.00, -0.00, -0.00]
base_action: [0.06, 0.00, -0.49]	arm_action: None
[base_env.py]: Saying: Executing automatic docking
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
Dock not found... trying again
/home/kavitshah/miniconda3/envs/spot_ros/lib/python3.8/site-packages/rospy/core.py:146: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  _rospy_logger.warn(msg, *args, **kwargs)
/home/kavitshah/miniconda3/envs/spot_ros/lib/python3.8/site-packages/rospy/core.py:598: DeprecationWarning: isAlive() is deprecated, use is_alive() instead
  if t.isAlive():

/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/grasp_visualizations/1685142245.4967258.png

Sleep before grasping

In the base environment there's a sleep of half a second before grasping.

"Briefly pause and get latest gripper image to ensure precise grasp"

I actually think we should not hardcode the 0.5, but instead wait for the next gripper image and continue execution as soon as it arrives (a sketch of this follows). Does this make sense @KavitShah1998?
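
A sketch of the suggested change: block until the next gripper image arrives instead of sleeping for a fixed 0.5 s. The topic and message type here are assumptions (the /hand_rgb topic appears in the rostopic list earlier on this page).

import rospy
from sensor_msgs.msg import Image

def wait_for_next_gripper_image(timeout=1.0):
    """Return the next gripper RGB frame, or None if nothing arrives within the timeout."""
    try:
        return rospy.wait_for_message("/hand_rgb", Image, timeout=timeout)
    except rospy.ROSException:
        return None  # fall back to proceeding without a fresh frame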

<3

from transformers import OwlViTProcessor, OwlViTForObjectDetection
import torch
from PIL import Image
import time
import cv2
import argparse

class OwlVit():
    def __init__(self, labels, score_threshold, show_img):
        #self.device = torch.device('cpu')
        self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

        self.model = OwlViTForObjectDetection.from_pretrained('google/owlvit-base-patch32')
        self.model.eval()
        self.model.to(self.device)

        self.processor = OwlViTProcessor.from_pretrained('google/owlvit-base-patch32')

        self.labels = labels
        self.score_threshold = score_threshold
        self.show_img = show_img

    def run_inference(self, img):
        '''
        img: an open cv image in (H, W, C) format
        '''
        # Process inputs
        #img = img.to(self.device)
        inputs = self.processor(text=self.labels, images=img, return_tensors='pt')

        #Target image sizes (height, width) to rescale box predictions [batch_size, 2]
        #target_sizes = torch.Tensor([img.size[::-1]]) this is for PIL images
        target_sizes = torch.Tensor([img.shape[:2]]).to(self.device)
        inputs = inputs.to(self.device)

        # Inference
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Convert outputs (bounding boxes and class logits) to COCO API
        results = self.processor.post_process(outputs=outputs, target_sizes=target_sizes)
        #img = img.to('cpu')

        if self.show_img:
            self.show_img_with_overlaid_bounding_boxes(img, results)
        
        return self.get_most_confident_bounding_box_per_label(results)

    def run_inference_and_return_img(self, img):
        '''
        img: an open cv image in (H, W, C) format
        '''
        #img = img.to(self.device)

        inputs = self.processor(text=self.labels, images=img, return_tensors='pt')
        target_sizes = torch.Tensor([img.shape[:2]]).to(self.device)
        inputs = inputs.to(self.device)
        # Inference
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Convert outputs (bounding boxes and class logits) to COCO API
        results = self.processor.post_process(outputs=outputs, target_sizes=target_sizes)
        #img = img.to('cpu')
        #if self.show_img:
        #    self.show_img_with_overlaid_bounding_boxes(img, results)

        return self.get_most_confident_bounding_box_per_label(results), self.create_img_with_bounding_box(img, results)

    def show_img_with_overlaid_bounding_boxes(self, img, results):
        img = self.create_img_with_bounding_box(img, results)
        cv2.imshow('img', img)
        cv2.waitKey(1)

    def get_bounding_boxes(self, results):
        '''
        Returns all bounding boxes with a score above the threshold
        '''
        boxes, scores, labels = results[0]['boxes'], results[0]['scores'], results[0]['labels']
        boxes = boxes.to('cpu')
        labels = labels.to('cpu')
        scores = scores.to('cpu')

        target_boxes = []
        for box, score, label in zip(boxes, scores, labels):
            box = [round(i, 2) for i in box.tolist()]
            if score >= self.score_threshold:
                target_boxes.append([self.labels[0][label.item()], score.item(), box])
            
        return target_boxes

    def get_most_confident_bounding_box(self, results):
        '''
        Returns the most confident bounding box
        '''
        boxes, scores, labels = results[0]['boxes'], results[0]['scores'], results[0]['labels']
        boxes = boxes.to('cpu')
        labels = labels.to('cpu')
        scores = scores.to('cpu')

        target_box = []
        target_score = -float('inf')

        for box, score, label in zip(boxes, scores, labels):
            box = [round(i, 2) for i in box.tolist()]
            if score >= self.score_threshold:
                if score > target_score:
                    target_score = score
                    target_box = box

        if target_score == -float('inf'):
            return None
        else:
            x1 = int(target_box[0])
            y1 = int(target_box[1])
            x2 = int(target_box[2])
            y2 = int(target_box[3])

            print("location:", x1, y1, x2, y2)
            return x1, y1, x2, y2

    def get_most_confident_bounding_box_per_label(self, results):
        '''
        Returns the most confident bounding box for each label above the threshold
        '''
        boxes, scores, labels = results[0]['boxes'], results[0]['scores'], results[0]['labels']
        boxes = boxes.to('cpu')
        labels = labels.to('cpu')
        scores = scores.to('cpu')

        # Initialize dictionaries to store most confident bounding boxes and scores per label
        target_boxes = {}
        target_scores = {}

        for box, score, label in zip(boxes, scores, labels):
            box = [round(i, 2) for i in box.tolist()]
            if score >= self.score_threshold:
                # If the current score is higher than the stored score for this label, update the target box and score
                if label.item() not in target_scores or score > target_scores[label.item()]:
                    target_scores[label.item()] = score
                    target_boxes[label.item()] = box

        # Format the output
        result = []
        for label, box in target_boxes.items():
            x1 = int(box[0])
            y1 = int(box[1])
            x2 = int(box[2])
            y2 = int(box[3])

            result.append([self.labels[0][label], target_scores[label], [x1, y1, x2, y2]])

        return result

    def create_img_with_bounding_box(self, img, results):
        '''
        Returns an image with all bounding boxes above the threshold overlaid
        '''

        results = self.get_most_confident_bounding_box_per_label(results)
        font = cv2.FONT_HERSHEY_SIMPLEX

        for label, score, box in results:
            img = cv2.rectangle(img, box[:2], box[2:], (255, 0, 0), 5)
            # Draw the label below the box unless it would fall off the bottom of the image
            if box[3] + 25 > img.shape[0]:
                y = box[3] - 10
            else:
                y = box[3] + 25
            img = cv2.putText(
                img, label, (box[0], y), font, 1, (255, 0, 0), 2, cv2.LINE_AA
            )

        return img

    def update_label(self, labels):
        self.labels = labels
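
A hedged usage sketch for the wrapper above: the class name, import path, and constructor arguments (nested labels list, score threshold, show-image flag) are inferred from the attributes used in the methods rather than from a confirmed signature, and the image paths are placeholders.

import cv2

from spot_rl.models import OwlVit  # assumed import path / class name

# Assumed constructor signature; adjust to the actual wrapper definition
detector = OwlVit(labels=[["ball", "cup", "plush toy"]], score_threshold=0.1, show_img=False)

frame = cv2.imread("test_frame.png")  # any BGR (H, W, C) image
detections, annotated = detector.run_inference_and_return_img(frame)

# Each detection is [label, score, [x1, y1, x2, y2]]
for label, score, (x1, y1, x2, y2) in detections:
    print(f"{label}: {float(score):.2f} at ({x1}, {y1}, {x2}, {y2})")

cv2.imwrite("annotated_frame.png", annotated)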

Spot Sim2Real scratch installation on a new desktop

  • pycocotools and tensorflow had to be commented out in environment.yaml for the conda env install
  • the installation guide says to source ~/.bash_profile; change that to ~/.bashrc
  • source ~/.bashrc after adding the exported env lines
  • gcc and g++ had to be installed on the new machines
  • tensorflow (2.9.0) had to be installed separately, specifically for habitat-lab
  • tmux had to be installed on the new machines

Discussion about hardware and software testing

Daily testing for hardware (running an automatic testing script)

  • Low-level motor tests (we do not need these, since we use the BD API)
    • Single joints
    • Whether the base moves when given a command
  • High-level integration tests (see the sketch after this list)
    • Navigation: whether the robot can navigate to the goal point
    • Grasping: whether the robot can grasp the object
    • Perception: whether the robot can return a bounding box
    • Frequency
      • Weekly or daily testing
      • 5-10 minutes asking the robot to run an obstacle course
  • Software
    • CI test for launching the ROS node
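
A minimal pytest-style sketch of the perception integration check above; the import path, constructor arguments, and reference image are assumptions inferred from the OWL-ViT wrapper earlier in these notes, not the repo's actual test harness.

import cv2
import pytest

from spot_rl.models import OwlVit  # assumed import path / class name


@pytest.fixture(scope="module")
def detector():
    # Assumed constructor signature; replace with the real one if it differs
    return OwlVit(labels=[["ball", "cup"]], score_threshold=0.1, show_img=False)


def test_perception_returns_bounding_box(detector):
    img = cv2.imread("tests/assets/ball_on_table.jpg")  # hypothetical test asset
    if img is None:
        pytest.skip("reference image not available on this machine")

    # Each detection is [label, score, [x1, y1, x2, y2]], per the wrapper above
    detections, _ = detector.run_inference_and_return_img(img)
    assert any(label == "ball" for label, _, _ in detections)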

In discussion with Mrinal

Torchscript errors

Issue 1: I am getting this error about missing torchscript weights.

Traceback (most recent call last):
  File "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/utils/img_publishers.py", line 452, in <module>
    model = OWLVITModel()
  File "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/utils/img_publishers.py", line 363, in __init__
    self.config = config = construct_config()
  File "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/utils/utils.py", line 56, in construct_config
    raise KeyError(f"Neither {v} nor {new_v} exist!")
KeyError: 'Neither weights/torchscript/CUTOUT_WT_True_SD_200_ckpt.99.pvp_combined_net.torchscript nor /home/kavitshah/fair/spot-sim2real/spot_rl_experiments/weights/torchscript/CUTOUT_WT_True_SD_200_ckpt.99.pvp_combined_net.torchscript exist!'

Can you add a section in ISSUES.md explaining the steps to resolve it?
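
Until such a section exists, a quick diagnostic sketch (an assumption, not the repo's documented fix) is to compare the file the config expects under weights/torchscript with what is actually on disk, using the paths from the KeyError above:

import os

# Paths taken from the KeyError above; adjust the repo root for your machine
weights_dir = "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/weights/torchscript"
expected = "CUTOUT_WT_True_SD_200_ckpt.99.pvp_combined_net.torchscript"

present = sorted(os.listdir(weights_dir)) if os.path.isdir(weights_dir) else []
print("Expected:", expected)
print("Present :", present)
if expected not in present:
    print(f"Missing {expected}; download or convert the torchscript weights into {weights_dir}")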

Issue 2: Mobile Manipulation Environment fails (but Sequential Experts run fine)

(spot_ros) kavitshah@frerd001:~/fair/spot-sim2real/spot_rl_experiments/spot_rl/envs$ python mobile_manipulation_env.py -m
/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/albumentations/augmentations/geometric/functional.py:6: DeprecationWarning: Please use `gaussian_filter` from the `scipy.ndimage` namespace, the `scipy.ndimage.filters` namespace is deprecated.
  from scipy.ndimage.filters import gaussian_filter
/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/pretrainedmodels/datasets/utils.py:33: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  def __init__(self, size, interpolation=Image.BILINEAR):
/home/kavitshah/fair/spot-sim2real/third_party/mask_rcnn_detectron2/detectron2/detectron2/data/transforms/transform.py:46: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  def __init__(self, src_rect, output_size, interp=Image.BILINEAR, fill=0):
/home/kavitshah/fair/spot-sim2real/third_party/mask_rcnn_detectron2/detectron2/detectron2/data/transforms/augmentation_impl.py:117: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  def __init__(self, shape, interp=Image.BILINEAR):
/home/kavitshah/fair/spot-sim2real/third_party/habitat-lab/habitat/utils/visualizations/maps.py:25: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
  AGENT_SPRITE = imageio.imread(
/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
2024-02-12 15:44:59,217 - INFO - Current battery charge: 85.0%
2024-02-12 15:44:59,243 - INFO - Starting lease check-in
Loading policy...
/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
  warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
Executing automatic docking
Dock not found... trying again
Dock not found... trying again
2024-02-12 15:45:03,252 - INFO - Robot safely powered off.
2024-02-12 15:45:03,263 - INFO - Lease check-in stopped
2024-02-12 15:45:03,274 - INFO - Returned the lease.
Traceback (most recent call last):
  File "mobile_manipulation_env.py", line 423, in <module>
    main(spot, args.use_mixer, config, args.output)
  File "mobile_manipulation_env.py", line 40, in main
    policy = MixerPolicy(
  File "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/real_policy.py", line 298, in __init__
    super().__init__(
  File "/home/kavitshah/fair/spot-sim2real/spot_rl_experiments/spot_rl/real_policy.py", line 76, in __init__
    self.policy = policy_class.from_config(
  File "/home/kavitshah/fair/spot-sim2real/third_party/habitat-lab/habitat_baselines/rl/ppo/moe.py", line 172, in from_config
    return cls(
  File "/home/kavitshah/fair/spot-sim2real/third_party/habitat-lab/habitat_baselines/rl/ppo/moe.py", line 442, in __init__
    super().__init__(
  File "/home/kavitshah/fair/spot-sim2real/third_party/habitat-lab/habitat_baselines/rl/ppo/moe.py", line 114, in __init__
    nav_ckpt, self.expert_nav_policy = get_ckpt_policy(nav_ckpt_path)
  File "/home/kavitshah/fair/spot-sim2real/third_party/habitat-lab/habitat_baselines/rl/ppo/moe.py", line 109, in get_ckpt_policy
    ckpt["config"].RL.POLICY["init"] = False
  File "/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/torch/jit/_script.py", line 838, in __getitem__
    return self.forward_magic_method("__getitem__", idx)
  File "/home/kavitshah/miniforge3/envs/spot_ros/lib/python3.8/site-packages/torch/jit/_script.py", line 831, in forward_magic_method
    raise NotImplementedError()
NotImplementedError
2024-02-12 15:45:03,276 - INFO - signal_shutdown [atexit]
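
The torch.load warning plus the NotImplementedError on ckpt["config"] suggest the nav checkpoint being loaded here is a TorchScript archive, which cannot be indexed like a plain pickled checkpoint. A small diagnostic sketch to tell the two formats apart (the checkpoint path is hypothetical, and this is a debugging aid rather than the actual fix):

import torch

ckpt_path = "weights/nav_ckpt.pth"  # hypothetical path to the checkpoint being loaded

try:
    # TorchScript archives load here and carry no indexable config dict
    torch.jit.load(ckpt_path, map_location="cpu")
    print("TorchScript archive: point the MoE loader at a regular (pickled) checkpoint instead")
except RuntimeError:
    ckpt = torch.load(ckpt_path, map_location="cpu")
    print("Regular checkpoint with keys:", list(ckpt.keys()))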

Pyaudio

To record audio in Python on Linux, you can use the 'pyaudio' library. Here's a simple example showing how to record audio and save it to a WAV file:

  1. First, install 'pyaudio' using pip (if the build fails, you may first need the PortAudio headers, e.g. the portaudio19-dev package on Debian/Ubuntu):
pip install pyaudio
  2. Next, create a Python script with the following code:
import pyaudio
import wave

# Configuration
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAV_OUTPUT_FILENAME = "output.wav"

# Initialize PyAudio
audio = pyaudio.PyAudio()

# Open an input (recording) stream
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

print("Recording...")4gh5p-

frames = []

# Record audio data
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("Finished recording")

# Stop and close the stream
stream.stop_stream()
stream.close()

# Terminate the PortAudio interface
audio.terminate()

# Save the recorded frames to a WAV file
wf = wave.open(WAV_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(audio.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

print(f"Saved recording to {WAV_OUTPUT_FILENAME}")
