RGB-D Perception

Team Name

LabBot Visionaries

Timeline

Spring 2024 – Summer 2024

Students

  • Zobia Tahir – Computer Science
  • Diya Ranjit – Computer Science
  • Jose Morales – Computer Science
  • ChangHao Yang – Computer Science
  • Khang Nguyen – Computer Science

Abstract

RGB-D perception is an integral part of robotics: it enables robots to recognize target objects and handle them with precision. Our project applies this capability to medication handling and to various laboratory shelving mechanisms.

Background

This project is a versatile software module for three-dimensional (3D) key-point detection with semantic scene understanding, intended for vision-based automation applications such as robots or unmanned vehicles equipped with depth cameras. Although 3D perception is increasingly influential in the vision-based automation domain (e.g., navigation, localization, and manipulation), these systems are often hard-coded with their target objects (semantic information). To address this, we leverage RGB-D images to let the robot, in our case the Sawyer, “see” target objects in 3D along with their key-points, using neural network-based panoptic segmentation and key-point detection methods with an RGB-D camera, so that the Sawyer can be used in laboratories for handling medication packages.
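To make the 3D side of this concrete, the sketch below back-projects 2D key-points into camera-frame 3D coordinates using an aligned depth image and pinhole intrinsics. It is a minimal illustration only: the key-point detector is assumed to run upstream, and the intrinsic parameters (fx, fy, cx, cy) are placeholders rather than values from our camera.

    import numpy as np

    def keypoints_to_3d(keypoints_px, depth_m, fx, fy, cx, cy):
        """Lift 2D key-points (u, v) into 3D camera coordinates using an
        aligned depth image (in metres) and pinhole intrinsics."""
        points_3d = []
        for (u, v) in keypoints_px:
            z = depth_m[int(v), int(u)]   # depth at the pixel, in metres
            if z <= 0:                    # skip invalid depth readings
                continue
            x = (u - cx) * z / fx         # back-project onto the x axis
            y = (v - cy) * z / fy         # back-project onto the y axis
            points_3d.append((x, y, z))
        return np.asarray(points_3d)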

Project Requirements

The project must meet a minimum speed requirement: it must capture, process, and update RGB and depth data at a rate that allows the robotic arm to react to its environment in real time. The RGB-D camera captures the data, the depth data is processed into a point cloud, and the RGB image is passed through a neural network for segmentation and through a neural network for object recognition. The data must also be captured and processed accurately enough to be viable, and reliably enough to guarantee a consistent experience for the user.
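The speed requirement can be pictured as a simple capture, segment, and detect loop with a per-frame timing check. In the sketch below, camera.grab_rgbd, seg_model.segment, and kp_model.detect are hypothetical stand-ins for the real camera driver and networks, and the 10 Hz target is an assumed placeholder rather than a measured project requirement.

    import time

    TARGET_HZ = 10.0  # assumed placeholder rate; the actual requirement is project-specific

    def perception_loop(camera, seg_model, kp_model):
        """Capture -> segment -> detect cycle with a simple throughput check.
        The camera and model objects are hypothetical interfaces."""
        while True:
            start = time.perf_counter()
            rgb, depth = camera.grab_rgbd()          # aligned colour + depth frame
            masks = seg_model.segment(rgb)           # panoptic segmentation of the RGB image
            keypoints = kp_model.detect(rgb, masks)  # key-points restricted to target objects
            elapsed = time.perf_counter() - start
            if elapsed > 1.0 / TARGET_HZ:
                print(f"frame took {elapsed * 1000:.1f} ms, "
                      f"over the {1000.0 / TARGET_HZ:.0f} ms real-time budget")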

Design Constraints

The performance speed of the system is constrained by the hardware; although the arm and the camera may also limit speed, the predominant constraint is the neural network, which is limited by the processing power of the system. The accuracy of the captured data likewise constrains performance and is affected by environmental factors such as lighting, low-feature areas, and non-textured surfaces. The precision and articulation of the arm greatly affect system performance, especially when smaller objects are manipulated or objects must be manipulated in complex ways. Budget constrains the quality of the components that can be used (RGB-D camera, GPU, robotic arm), which ultimately gives rise to all of the constraints above. The project deadline also limits the amount of development, neural network training, and testing that can be done.

Engineering Standards

Accuracy/Precision standards for computing: Enables precise localization of object features.

Hardware Packaging Standards for Robotics: Hardware components will be securely packaged for shipping. The packaging will ensure that the hardware arrives safely and is ready for immediate use by end users.

Industry Best Practices for Technical Support: Offer continuous technical support to address operational issues, troubleshooting and user inquiries.

User-friendly design principles: Facilitates easy configuration, parameter tuning, and visualization of key point detection results.

NFPA 70: Compliance with all requirements specified in the National Electrical Code. This includes wire runs, insulation, grounding, enclosures, over-current protection, and all other specifications.

System Overview

Firstly, the hardware layer includes a high-definition camera connected to the Sawyer robot's main PC through a USB interface, the Sawyer arm, and the Sawyer's grippers, which are controlled by the main PC. We also plan to set up an SSH terminal over Ethernet so that the robot can be controlled remotely through the main PC; the remote PC is primarily the laptop of one of our team members. Next, the OS layer consists of the Sawyer robot's native Ubuntu version with drivers compatible with the hardware components, such as the RGB camera, so as not to create backward-compatibility problems for future use. The middleware layer then includes the ROS version compatible with the Ubuntu version in the OS layer. One of its critical components is arm motion planning, which enables the robot to execute the arm-planning algorithm to grasp the target object in later stages; however, its proficiency must be re-verified against the hardware available on the system for future use. Lastly, the application layer is built on top of the layers above. It solely comprises the software components discussed in Sec. 2, including the retrained feature extraction model and the segmentation model trained/fine-tuned on the custom dataset. To complete the project, the application must also include the source code for the components above.
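At the application layer, the glue between the camera driver and the models would typically be a ROS node that synchronizes the colour and depth streams before handing them to the perception code. The sketch below uses standard ROS 1 Python APIs (rospy, message_filters, cv_bridge); the topic names and the callback contents are illustrative assumptions, not our actual node.

    #!/usr/bin/env python
    # Minimal sketch of an application-layer ROS node; topic names and the
    # callback body are assumptions, not the project's actual code.
    import rospy
    import message_filters
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def process_frame(rgb_msg, depth_msg):
        # Convert ROS images to arrays, then hand them to the segmentation and
        # key-point models (not shown) before commanding the arm.
        rgb = bridge.imgmsg_to_cv2(rgb_msg, desired_encoding="bgr8")
        depth = bridge.imgmsg_to_cv2(depth_msg, desired_encoding="passthrough")
        rospy.loginfo("frame %s: rgb %s, depth %s",
                      rgb_msg.header.seq, rgb.shape, depth.shape)

    if __name__ == "__main__":
        rospy.init_node("rgbd_keypoint_node")
        rgb_sub = message_filters.Subscriber("/camera/color/image_raw", Image)
        depth_sub = message_filters.Subscriber("/camera/aligned_depth_to_color/image_raw", Image)
        sync = message_filters.ApproximateTimeSynchronizer(
            [rgb_sub, depth_sub], queue_size=10, slop=0.05)
        sync.registerCallback(process_frame)
        rospy.spin()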

Results

Our project has taken a significant step forward with a working system that uses an RGB-D camera for advanced key-point detection and semantic understanding. Our initial tests show promising results in accurately identifying key points in 3D space, which is crucial for precision-critical applications such as laboratory settings. The training and optimization of our neural networks have proven effective, opening up new possibilities for automated tasks in complex environments.

Future Work

We’re encouraged by our achievements but recognize there’s room to push the boundaries even further:

  1. Algorithm Optimization: We’ll keep refining our neural network models to enhance their performance across diverse environments, ensuring they can handle a variety of scenarios with ease.
  2. Robotic Arm Integration: Next, we plan to expand our system’s capabilities by integrating more complex functions of the robotic arm. This will allow for more sophisticated operations and fine control, essential for handling delicate tasks.
  3. User Interface Improvements: We are committed to making our system as user-friendly as possible. An intuitive interface will make it easier for all users to operate the system efficiently, irrespective of their technical background.
  4. Design for Flexibility: We aim to design our system to be modular and scalable, ready to be adapted for different robots and automation needs.
  5. Extensive Field Testing: Moving forward, we’ll conduct extensive tests outside the lab to see how our system performs under real-world conditions and make necessary adjustments.

Our goal is to continue evolving our project to meet the demands of real-world applications and improve both functionality and user experience for all involved.

Project Files

Project Charter
System Requirements Specification
Architectural Design Specification
Detailed Design Specification
Poster
