nasa / isaac Goto Github PK

Integrated System for Autonomous and Adaptive Caretaking

License: Apache License 2.0

Java 0.19% CMake 0.72% C++ 12.62% Python 4.44% Shell 0.77% Dockerfile 0.27% C 0.05% Jupyter Notebook 79.94% CSS 0.04% JavaScript 0.19% HTML 0.04% PDDL 0.74%

isaac's Introduction

ISAAC (Integrated System for Autonomous and Adaptive Caretaking)

The ISAAC project has three main technical thrusts:

Integrated data: The current state of the art is that data from sensors associated with different facility subsystems (Structures, GN&C, and ECLSS, etc.) remain siloed. ISAAC technology unifies facility data and models with autonomous robotics, linking data streams from facility subsystems, sensor networks, and robots, as well as linking 3D geometry and sensor data map layers, and detecting changes and anomalies.
Integrated control interface: Current state of the art manages facilities with largely independent interface tools for different subsystems. Interfaces have different heritage, design assumptions and operator interface styles. Subsystem interactions are hard to analyze due to poor connectivity between separate tools. ISAAC technology combines interface tools for facility subsystems and autonomous robots; improves system-level situation awareness, situation understanding, and operator productivity; enables linking and embedding between tools to improve subsystem interaction analysis; and supports the entire activity life cycle from planning through execution and analysis.
Coordinated execution: The current state of the art for executing facility activities that require coordination between subsystems is either manual commanding (operator tracks dependencies between subsystems) or simple sequencing that can be brittle to even minor contingencies during execution. ISAAC technology models dependencies, uses automated planning to translate a high-level task definition to a plan that can work given the current system state (e.g. include actions to open hatches so that a robot can move where needed), and leverages ISAAC’s integrated data technology to watch for execution contingencies and trigger replanning as needed. This technology will reduce the time to effect changes on the system during critical faults and emergencies.

This isaac repo serves as a master for integrating an end-to-end demo that draws on code from the other repos as well as directly including a significant amount of the ISAAC code, mostly relating to the Astrobee robot. This repository includes:

Astrobee software for added behaviors such as inpection and cargo transport as well as new sensor utilization.
Dense mapping to create a textured 3D map
Volumetric mapping to map volumetric signals, such as WiFi.
Image analysis module to train a neural network to detect anomalies
Survey Manager semi-autonomous planning and execution of imaging tasks.

You may also be interested in the separate repository for the ISAAC User Interface, which enables monitoring of multiple robots through a web browser.

System requirements

The isaac repo depends on the astrobee repo, therefore it inherits the same system requirements. You must use Ubuntu 16.04 to 20.04 64-bit. When running in simulation, certain Gazebo plugins require appropriate graphics drivers. See INSTALL.md in that repository for more information.

Usage

There are two main ways to install and run isaac:

For development: Build the isaac code on your native Ubuntu OS (or inside a normal virtual machine) in a way that makes it convenient for you to edit and incrementally recompile.
For demo: For the ISAAC integrated demo, many ISAAC repos are checked out, built, and run in a distributed fashion across multiple docker containers. The isaac code itself is built and run inside one of these containers. Note that the in-docker build is managed by the Dockerfile and completely separate from any build in your native OS, and you don't need to install for development prior to installing for demo.

Instructions on installing and using the ISAAC Software. For running the docker demos

Documentation

The documentation is auto-generated from the contents of this repository.

To compile the documentation locally (make sure you have the latest doxygen installed):

doxygen isaac.doxyfile

Contributing

The ISAAC Software is open source, and we welcome contributions from the public. Please submit pull requests to the develop branch. For us to merge any pull requests, we must request that contributors sign and submit either an Individual Contributor License Agreement or a Corporate Contributor License Agreement due to NASA legal requirements. Thank you for your understanding.

License

The "ISAAC - Integrated System for Autonomous and Adaptive Caretaking platform" software is licensed under the Apache License, Version 2.0 "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

isaac's People

Contributors

Stargazers

Watchers

isaac's Issues

Check camera_projection logic and reuse astrobee_fsw code when possible

          My biggest concern here is that I couldn't figure out where this function handles the camera distortion. And I think instead of trying to reinvent this logic, we should use some existing functions:

https://github.com/nasa/astrobee/blob/1a0b93a73d6dcfe932da127d2678724e0f9aa542/localization/camera/include/camera/camera_model.h#L45 - Convert from 3D point to UNDISTORTED_C 2D
Compare to undistorted half-width to check if in FOV per Oleg's comment
https://github.com/nasa/astrobee/blob/1a0b93a73d6dcfe932da127d2678724e0f9aa542/localization/camera/include/camera/camera_params.h#L178 - Convert from UNDISTORTED_C to DISTORTED

(Ryan and Oleg are the experts on this. My concern is that the math is tricky and we want to consistently use the well-tested implementation. And even if we use the existing implementation, I would still be a bit concerned about possible corner cases in checking whether the point is within the FOV, things like math blowing up if the point is right behind the camera. Ryan has thought about that kind of thing a lot.)

Less critical issues:

Logically, the outputs from this function could be floating point pixel coordinates. Not sure why we are forcing them to be rounded to an integer. That could be up to whoever is using the function. Like if they have an image and want to look up the value at the resulting XY position, they could choose to round to the nearest integer and use that pixel value (nearest neighbor lookup), or they could do something more sophisticated like interpolating using the surrounding pixel values.

It would be easier to understand if the function returned an Eigen::Vector2D (or similar) rather than using reference arguments as outputs. If an error is possible, it could return a std::optional<Eigen::Vector2D> with an error represented by an empty value (which is nice because any attempt to use the empty value will cause a run-time error). https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs

Originally posted by @trey0 in #70 (comment)

Minimize rotation in panorama taking

Make survey_planner.py aware of completed actions

Currently, survey_planner.py will always try to accomplish all goals in its problem instance, even if the completed-x predicates for some of the goals have been asserted in the problem instance. The desired behavior is to skip over these already-completed goals.

Relevant predicates it should pay attention to are completed-panorama and completed-stereo.

We should also add at least one minimal test problem instance that includes a completed goal and check that the planner successfully returns a plan without the action corresponding to that goal.

Include choices on inspection_tool

after the tool is initiated it should show:

the progress (it does this already)
the options pause/resume/skip/save
allow user input

Survey manager faster sim testing

Currently, the survey manager multi-robot sim standard testing approach operates the robots to achieve the baseline ISAAC13 goals, which takes > 1 hour, with most of the time spent running long individual actions (panoramas and stereo surveys) vs. going through the transitions between actions that are the main current concern for survey manager testing. Survey manager support for stereo and docking actions in particular have been under-tested because those actions happen to come at the end of a long plan and don't get exercised if earlier steps fail.

In order to streamline our testing:

It would be great to provide a mode where panoramas and stereo surveys take less time. (Examples: Panoramas collecting fewer frames, abbreviated stereo survey trajectory that starts and ends in the same place as the full trajectory.)
We should consider adding a shorter test case that exercises all actions.
This is another motivation for #142 - move optimizations speed up the sim as well as actual ops

Add cargo to install on armhf

Make survey_planner.py plan to proactively get out of the way

A known problem with survey_planner.py is that it will crash with an error if it detects a collision avoidance deadlock situation, i.e., a time when both robots are working on goals that require reserving conflicting locations in the map, and it's not sufficient for one robot to passively wait for the other robot to finish. An example scenario would be conflicting goals that require the robots to move past each other.

The desired behavior in this case is that survey_planner.py should arbitrarily but consistently prioritize one of the robots and make the other robot proactively move out of its way as needed, rather than deadlock. Note that the resulting plan is likely to be somewhat inefficient (when possible, it would be better to reallocate goals to different robots or reorder them to avoid the conflict in the first place), but at least this approach should return a valid plan rather than crashing.

Fixes to gmm detection

Comments from Jamie: the code that was commented out is okay to delete if that would make the code cleaner and maybe there should be an MIT or NASA copyright header on visualization.py

Temp disable analyst build in ci_push.yml

We already temp disabled the analyst build in ci_pr.yml as part of #135 (until #136 is fixed).

This ticket is to propagate the disabled status to ci_push.yml as well.

It's not terribly critical, but the broken analyst build in ci_push.yml has caused these problems:

The remote Docker images for develop have not been pushed for recent PR merges.
The CI badge on recent versions of develop has shown up as failed.

Bags with malformed rosjava message definitions break some rosbag operations

Astrobee FSW bags often contain messages produced by the guest science manager running on the HLP implemented in Android Java, publishing messages using rosjava. The relevant message topics start with /gs/gs_manager.

These bags cause certain rosbag API calls and command-line utilities, such as rosbag check and rosbag filter, to fail with the following non-intuitive error message:

genmsg.msg_loader.MsgNotFound: Cannot locate message [Header]: unknown package [std_msgs] on search path [{}]

This external issue discusses the problem: jacknlliu/development-issues#39 . It seems to be related to rosjava not including the expected dependency information along with its message definitions.

The error message makes it look suspiciously like the ROS environment is not activated or somehow misconfigured, but it occurs even when all the other messages in the FSW bag work fine, as you can verify by filtering out the /gs/* messages and making the same rosbag calls. (Also, rosmsg info std_msgs/Header works fine.)

We've observed this with FSW bags, making rosbag calls in the standard environment on the astrobeast server. An example bag is https://hivemind.ndc.nasa.gov/freeflyer/2022-01-03_100/robot_data/SN003/bags/20220103_1237_ars_default.bag

It's probably not within our project scope to fix this rosjava problem, but there should be some documented guidance on how to cope with it.

Improve action monitoring

A problem I've observed is that sometimes command_astrobee will get into a hanging state, but it's painful both to detect that condition and debug it:

Detection is painful because the action node parent of command_astrobee will keep merrily ticking away despite its child hanging (the action node only watches for termination of the child, not progress).
Debugging is painful because typically the best source of information is using monitor_astrobee to review the command_astrobee child's console output, but monitor_astrobee can't provide any of that console output if command_astrobee is unable to manage the socket I/O it relies on.

Some ideas about improvements that could help:

For more accurate liveness detection:
- Relevant command_astrobee children could print a heartbeat (i.e., any output at all) to stdout.
- command_astrobee could detect child heartbeats through its socket management and propagate them as ROS messages.
- The action node could subscribe to command_astrobee ROS heartbeats, detect missing heartbeats, and publish a warning in its console output and send_feedback() calls.
For better debugging of hanging commands:
- Review the status of command_astrobee logging of its child stdout. If not available already, record child stdout to a log file in addition to writing it to a socket. Make sure both the socket output and the log include not just the raw child stdout but also the child command that was run and the exit code once it completes. A main benefit of the log file is that it remains available even if command_astrobee socket management is hanging.
- Consider improvements to how command_astrobee communicates its state to monitor_astrobee. For example, if this doesn't happen already, it could help for command_astrobee to delete its sockets on exit so monitor_astrobee can use their absence to report that command_astrobee isn't running (vs. running but hanging).
- A late-connecting monitor_astrobee could start by reporting an excerpt from the latest log file before trying to read from the live socket (which may hang indefinitely without producing any useful info about what caused the hang).
- monitor_astrobee could have a --loop flag such that it stays running indefinitely and auto-connects to new sockets from new command_astrobee calls (transitions between different calls can be emphasized in its output styling). With this option available, it could be convenient during ops to leave a persistent monitor_astrobee instance running for each robot using two tmux windows. In some ways, this operating mode could substitute for other improvements above if it diminishes concerns about properly servicing late-connecting instances of monitor_astrobee.

Cargo Nodelet Failed to Load

Cargo Nodelet failed to load during the CM on 1/25/22.

Improve Science Camera Image Quality

Copying over Trey's initial email/thoughts:

I was thinking again about how SciCam image quality is one of the key issues for our close-up inspections ... Something that might be easier is avoiding JPEG compression artifacts. We could almost certainly save the color image as lossless PNG (modulo PNG compression would take longer and result in higher data volume saved on the HLP, would need to assess if we could handle that). I’ve spent a lot of time staring at zoomed-in-too-far SciCam images and can confirm there are a lot of JPEG artifacts.

A more intriguing possibility is saving RAW format images. It looks like the Android Camera2 API supports saving RAW images in DNG format, if the camera supports it. I tried to find more information about the SciCam’s Sony IMX 230 sensor… one spec sheet I found suggested it might be natively producing images with 10 bits per pixel. If so, having access to that extra bit depth could be especially helpful if we are trying to both zoom way in and contrast stretch images with poor illumination. Note that it might not be possible to save RAW at other than full resolution, in which case we would definitely need to double-check we could handle the data volume.

Revert isort call in pre-commit to ensure consistent errors with CI linter

The isort pre-commit call was edited here so that it calls isort only on changed files rather than recursively on the whole repo.

But there was a comment just above that line explaining in absurd detail why we are calling it recursively on the whole repo...

Fix -save option on inspection_tool

Survey manager multi-robot sim instructions

We need tested instructions in the repo for how to run the survey manager in multi-robot sim mode.

Not really surprising, but sharing instructions informally using posts in a Teams channel has proven to be a dismal failure. A particular annoyance is that code blocks inside Teams messages apparently can't be edited as we progress (unlike the rest of the message, ironically).

I personally have suffered a lot of time lost recovering from following inaccurate instructions that were apparently never tested.

Address issues in https://github.com/nasa/isaac/pull/103

          The planner is the component in the architecture that receives a domain and problem instance and returns a plan to execute. A lot of this stuff seems like it is really elsewhere in the achitecture, like it could be part of a `survey_executive` package instead. At the moment, some stuff (like `tmux_inject.py`) that was at a higher level is now being merged into `survey_planner`.

If we are going to stuff everything into one package, probably it should be called survey_manager, since that is the umbrella term for the coordinated execution part of the system of which the planner is one part.

I don't necessarily want to block merge based on issues like this. I just think it fosters confusion when we're talking about the system and different people may have different interpretations of what these words mean.

Originally posted by @trey0 in #103 (comment)

sci_cam_image timestamp updates

Update APK to publish updated timestamp
Publish camera_info topic
Turn off bagfile image publishing by default in the sci_cam_image config (only publish camera_info)
Update all ISAAC survey recording profiles to turn on logging of the camera_info message in the logging profile
Change the inspection manager to subscribe to / trigger on the camera_info topic
Change panorama pipeline to use camera_info timestamps by default instead of image and look up the HLP-saved image files
Change mapping pipeline to use camera_info timestamps by default instead of image and look up the HLP-saved image files
@kbrowne15 update checkout recording profile check to not expect sci_cam image, but the sci_cam_info

Bringing some dependencies in house

Back when Isaac started we looked around at the best packages to use for some of the work that we then built on. That was a good thing, but longer term having the isaac repo depend on third-party repos that may go away without notice is likely a fragile state of things.

When there's more dev time, perhaps one can look at this list of third-party deps, https://github.com/oleg-alexandrov/mvs-texturing/blob/isaac/elibs/CMakeLists.txt, and switch those to some forked repos under nasa/isaac. Even my own repo, as seen above, should be forked I think.

This came about because I managed to break some stuff in those dependencies (that I fixed with Marina). That was made robust now, by ensure a specific version is checked out, but the longer term dependency on repos whose history may be rewritten or even they may be deleted is still there.

Fix version conflict in Python dependencies of analyst image

Here's the relevant error:
https://github.com/nasa/isaac/actions/runs/7835253902/job/21380266683

We can unblock getting PRs to pass CI again once #135 is merged (temp disables the analyst build as part of the PR testing). This ticket is for a permanent fix that allows the image to be built again.

Note: Solving this issue is not as urgent as some others because the analyst notebook is not needed for ISS ops!

Running demos documentation for source build

Hi ISAAC team,

Thank you very much for releasing this project as Open Source! I just installed it in my Ubuntu 20.04 machine and installation went smoothly (only extra thing was to install Torch for img_analysis, as the readme in the package mentions). I am having a bit of trouble running the demos. I checked the documentation page but it refers to docker users (I tinkered a little bit trying to run isaac_astrobee.launch but got a bunch of I2C-related errors - running good ol' sim.launch at least start things fine). Any pointer would be appreciated, thanks!

survey executor not accepting plan when one robot docked

When we specify the initial configuration to be bumble in bay 6 and honey in berth2, I get the error at execution:

[ERROR] requirement not met: [(robot-at honey jem_bay7)]

the generated plan makes sense though:
0: (move bumble jem_bay6 jem_bay5 jem_bay4) [20]
20.001: (move bumble jem_bay5 jem_bay4 jem_bay3) [20]
20.002: (undock honey berth2 jem_bay7 jem_bay6 jem_bay8) [30]
40.002: (panorama bumble o0 jem_bay4) [780]
50.003: (panorama honey o1 jem_bay7) [780]
820.003: (move bumble jem_bay4 jem_bay3 jem_bay2) [20]
830.004: (move honey jem_bay7 jem_bay6 jem_bay5) [20]
840.004: (panorama bumble o1 jem_bay3) [780]
850.005: (panorama honey o2 jem_bay6) [780]
1620.01: (move bumble jem_bay3 jem_bay2 jem_bay1) [20]
1630.01: (move honey jem_bay6 jem_bay5 jem_bay4) [20]
1640.01: (panorama bumble o2 jem_bay2) [780]
1650.01: (panorama honey o3 jem_bay5) [780]
2420.01: (move bumble jem_bay2 jem_bay1 jem_bay0) [20]
2430.01: (move honey jem_bay5 jem_bay6 jem_bay7) [20]
2440.01: (panorama bumble o3 jem_bay1) [780]
2450.01: (move honey jem_bay6 jem_bay7 jem_bay8) [20]
2470.01: (stereo honey o4 jem_bay7 jem_bay4 jem_bay3 jem_bay5) [600]
3070.01: (dock honey jem_bay7 berth2) [90]
3220.01: (stereo bumble o4 jem_bay1 jem_bay4 jem_bay3 jem_bay5) [600]
3820.01: (move bumble jem_bay1 jem_bay2 jem_bay3) [20]
3840.01: (move bumble jem_bay2 jem_bay3 jem_bay4) [20]
3860.01: (move bumble jem_bay3 jem_bay4 jem_bay5) [20]
3880.01: (move bumble jem_bay4 jem_bay5 jem_bay6) [20]
3900.01: (move bumble jem_bay5 jem_bay6 jem_bay7) [20]
3920.01: (move bumble jem_bay6 jem_bay7 jem_bay8) [20]
3940.02: (dock bumble jem_bay7 berth1) [90]
saved.txt

Optimize JEM bay motions for robustness and speed

Currently, the JEM bay motions of the survey planner are not well-optimized.

The following issues are observed:

If the robot is moving multiple bays, it includes unnecessary rotations at each motion because it flies in face-forward mode and each pose has its attitude set to point the robot in the +X direction. Therefore, when flying multiple waypoints down the JEM centerline, at each waypoint, the robot turns 90 degrees to face +X, then turns back down the centerline. These rotations waste time and more importantly can negatively impact localization.
The poses are not following our typical recent guidance for how to optimize localization in the JEM, which is more or less to face the robot toward whichever end of the module is closest. (Roughly, point toward the airlock for JEM bay 4 and up, point toward the NOD2 hatch for JEM bay 3 and down.)

The following improvements are proposed:

Adjust attitude of JEM bay poses here to follow the localization guidance.
Turn off face-forward motion. Will think about the easiest approach for this. I think it would be ok for this to be a blanket change rather than configurable per position because we have been pretty consistent about wanting to avoid unnecessary point turns that could affect localization.

Convert TestCase class to structure

          Should this be a struct instead of a class?

Originally posted by @kbrowne15 in #70 (comment)