Last updated on October 7, 2024. This conference program is tentative and subject to change.
Technical Program for Tuesday October 15, 2024
|
TuWAT2 |
Room 2 |
Workshop on Interaction-Aware Autonomous Systems |
Workshop |
|
08:00-12:00, Paper TuWAT2.1 | |
Workshop on Interaction-Aware Autonomous Systems |
|
Hallgarten, Marcel | University of Tübingen, Robert Bosch GmbH |
Stoll, Martin | Robert Bosch GmbH |
Janjoš, Faris | Robert Bosch GmbH |
Ruppel, Felicia | Bosch Research and Ulm University |
Valada, Abhinav | University of Freiburg |
Zell, Andreas | University of Tübingen |
Pavone, Marco | Stanford University |
Gilitschenski, Igor | University of Toronto |
|
TuWAT11 |
Room 11 |
Robot Safety in Times of AI: Data, Decision, and Multimodal Interaction |
Workshop |
|
08:00-12:00, Paper TuWAT11.1 | |
Robot Safety in Times of AI: Data, Decision, and Multimodal Interaction |
|
Rajaei, Nader | Technical University of Munich |
Lilienthal, Achim J. | Orebro University |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Hoffmann, Matej | Czech Technical University in Prague, Faculty of Electrical Engineering |
Abdolshah, Saeed | KUKA Deutschland GmbH |
Mansfeld, Nico | Franka Robotics GmbH |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
|
TuWAT14 |
Room 14 |
Bio-Inspired, Biomimetics, and Biohybrid (Cyborg) Systems |
Workshop |
|
08:00-12:00, Paper TuWAT14.1 | |
Bio-Inspired, Biomimetics, and Biohybrid (Cyborg) Systems |
|
Li, Yao | Harbin Institute of Technology, Shenzhen |
Sato, Hirotaka | Nanyang Technological University |
Raman, Ritu | Massachusetts Institute of Technology |
Piazza, Cristina | Technical University Munich (TUM) |
Li, Liang | Max-Planck Institute of Animal Behavior |
Vo-Doan, T. Thang | The University of Queensland |
Do, Thanh Nho | University of New South Wales |
Umezu, Shinjiro | Waseda University |
Raman, Barani | Washington University in St. Louis |
Fukuda, Toshio | Nagoya University |
Valdivia y Alvarado, Pablo | Singapore University of Technology and Design, MIT |
Latif, Tahmid | Wentworth Institute of Technology |
Shoji, Kan | Nagaoka University of Technology |
Milana, Edoardo | University of Freiburg |
Xu, Nicole | University of Colorado Boulder |
Zhang, Hongying | National University of Singapore |
Zarrouk, David | Ben Gurion University |
Mouthuy, Pierre-Alexis | University of Oxford |
Shi, Qing | Beijing Institute of Technology |
Nenggan, Zheng | Zhejiang University |
|
TuPIT1 |
Room 1 |
Robotics and Automation I |
Teaser Session |
Co-Chair: Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Texas A&M University (TAMU) |
|
14:00-15:00, Paper TuPIT1.1 | |
FruitNeRF: A Unified Neural Radiance Field Based Fruit Counting Framework |
|
Meyer, Lukas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Gilson, Andreas | Fraunhofer IIS |
Schmid, Ute | University of Bamberg |
Stamminger, Marc | Universität Erlangen-Nürnberg |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: We introduce FruitNeRF, a novel unified fruit counting framework that leverages state-of-the-art view synthesis methods to count any fruit type directly in 3D. Our framework takes an unordered set of posed images captured by a monocular camera and segments fruit in each image. To make our system independent of the fruit type, we employ a foundation model that generates binary segmentation masks for any fruit. Utilizing both modalities, RGB and semantic, we train a semantic neural radiance field. Through uniform volume sampling of the implicit Fruit Field, we obtain fruit-only point clouds. By applying cascaded clustering on the extracted point cloud, our approach achieves a precise fruit count. The use of neural radiance fields provides significant advantages over conventional methods such as object tracking or optical flow, as the counting itself is lifted into 3D. Our method prevents double-counting fruit and avoids counting irrelevant fruit. We evaluate our methodology using both real-world and synthetic datasets. The real-world dataset consists of three apple trees with manually counted ground truth and a benchmark apple dataset with one row and ground-truth fruit locations, while the synthetic dataset comprises various fruit types including apple, plum, lemon, pear, peach, and mango. Additionally, we assess the performance of fruit counting using the foundation model compared to a U-Net.
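As a minimal sketch of the cascaded-clustering idea (illustrative only, not the authors' implementation), fruit instances can be counted by density-based clustering of the fruit-only point cloud; the eps and min_samples values below are placeholder assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

def count_fruit(points: np.ndarray, eps: float = 0.03, min_samples: int = 20) -> int:
    # points: (N, 3) array sampled from the fruit-only point cloud.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise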
|
|
14:00-15:00, Paper TuPIT1.2 | |
SPVSoAP3D: A Second-Order Average Pooling Approach to Enhance 3D Place Recognition in Horticultural Environments |
|
Barros, Tiago | Institute of Systems and Robotics - University of Coimbra |
Premebida, Cristiano | University of Coimbra |
Aravecchia, Stephanie | Georgia Tech Lorraine - IRL 2958 GT-CNRS |
Pradalier, Cedric | GeorgiaTech Lorraine |
Nunes, Urbano J. | Instituto De Sistemas E Robotica |
Keywords: Robotics and Automation in Agriculture and Forestry, Localization, Deep Learning Methods
Abstract: 3D LiDAR-based place recognition has been extensively researched in urban environments, yet it remains underexplored in agricultural settings. Unlike urban contexts, horticultural environments, characterized by their permeability to laser beams, result in sparse and overlapping LiDAR scans with suboptimal geometries. This phenomenon leads to intra- and inter-row descriptor ambiguity. In this work, we address this challenge by introducing SPVSoAP3D, a novel modeling approach that combines a voxel-based feature extraction network with an aggregation technique based on a second-order average pooling operator, complemented by a descriptor enhancement stage. Furthermore, we augment the existing HORTO-3DLM dataset by introducing two new sequences derived from horticultural environments. We evaluate the performance of SPVSoAP3D against state-of-the-art (SOTA) models, including OverlapTransformer, PointNetVLAD, and LOGG3D-Net, utilizing a cross-validation protocol on both the newly introduced sequences and the existing HORTO-3DLM dataset. The findings indicate that the average operator is more suitable for horticultural environments compared to the max operator and other first-order pooling techniques. Additionally, the results highlight the improvements brought by the descriptor enhancement stage.
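As a minimal sketch of second-order average pooling (illustrative only; the actual SPVSoAP3D operator and its descriptor enhancement stage are more involved), local features can be aggregated by averaging their outer products:

import numpy as np

def second_order_avg_pool(features: np.ndarray) -> np.ndarray:
    # features: (N, D) local descriptors -> flattened, normalised (D*D,) global descriptor.
    so = features.T @ features / features.shape[0]   # average of outer products
    desc = np.sign(so) * np.sqrt(np.abs(so))         # signed square-root
    desc = desc.flatten()
    return desc / (np.linalg.norm(desc) + 1e-12)     # L2 normalisation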
|
|
14:00-15:00, Paper TuPIT1.3 | |
TriLoc-NetVLAD: Enhancing Long-Term Place Recognition in Orchards with a Novel LiDAR-Based Approach |
|
Sun, Na | Southwest University |
Fan, Zhengqiang | Beijing University of Agriculture |
Qiu, Quan | Beijing Institute of Petrochemical Technology |
Li, Tao | Beijing Research Center of Intelligent Equipment for Agriculture |
Feng, Qingchun | Beijing Research Center of Intelligent Equipment for Agriculture |
Ji, Chao | Xinjiang Academy of Agricultural and Reclamation Science |
Zhao, Chunjiang | Beijing Research Center of Intelligent Equipment for Agriculture |
Keywords: Robotics and Automation in Agriculture and Forestry, Field Robots, Localization
Abstract: Accurate long-term place recognition is crucial for autonomous mobile robots operating in unstructured environments. However, in the challenging setting of orchards with high-frequency repetitive features, traditional LiDAR-based localization methods relying on geometric features prove to be inadequate. To address this challenge, we propose TriLoc-NetVLAD, a novel LiDAR-based long-term place recognition approach designed to handle the repetitive and ambiguous features of orchards. This approach initially fuses point cloud density, height and spatial information to transform unordered 3D point clouds into an ordered representation, and then applies a channel selection strategy based on the descriptor's sublayer similarity between the query and its corresponding positive and negative samples to amplify the differences in environmental features. Finally, a Triplet Network is employed to extract local features, encompassing both high-dimensional and low-dimensional information. These local features are then cascaded through a NetVLAD layer to form a global descriptor. Furthermore, we have built a cross-seasonal orchard dataset to evaluate the performance of our place recognition method. The experimental results demonstrate the advantageous localization performance of the proposed system over existing methods.
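As a minimal sketch of the triplet objective commonly used to train such place-recognition descriptors (illustrative only, with an assumed margin; not the authors' network):

import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    # Hinge loss: pull the query descriptor toward its positive and away from its negative.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)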
|
|
14:00-15:00, Paper TuPIT1.4 | |
3D Branch Point Cloud Completion for Robotic Pruning in Apple Orchards |
|
Qiu, Tian | Cornell University |
Zoubi, Alan | Cornell University |
Spine, Nikolai | Cornell University |
Cheng, Lailiang | Cornell University |
Jiang, Yu | Cornell University |
Keywords: Robotics and Automation in Agriculture and Forestry, Robotics and Automation in Life Sciences, Deep Learning for Visual Perception
Abstract: Robotic branch pruning, a rapidly growing field addressing labor shortages in agriculture, requires detailed perception of branch geometry and topology. However, point clouds obtained in agricultural settings often lack completeness, limiting pruning accuracy. This work addressed point cloud quality via a closed-loop approach, (Real2Sim)-1. Leveraging a Real-to-Simulation (Real2Sim) data generation pipeline, we generated simulated 3D apple trees based on realistically characterized apple tree information without manual parameterization. These 3D trees were used to train a simulation-based deep model that jointly performs point cloud completion and skeletonization on real-world partial branches, without extra real-world training. The Sim2Real qualitative results showed the model’s remarkable capability for geometry reconstruction and topology prediction. Additionally, we quantitatively evaluated the Sim2Real performance by comparing branch-level trait characterization errors using raw incomplete data and the best complete data. The Mean Absolute Error (MAE) reduced by 75% and 8% for branch diameter and branch angle estimation, respectively, which indicates the effectiveness of the Real2Sim data in a zero-shot generalization setting. The characterization improvements contributed to the precision and efficacy of robotic branch pruning.
|
|
14:00-15:00, Paper TuPIT1.5 | |
HortiBot: An Adaptive Multi-Arm System for Robotic Horticulture of Sweet Peppers |
|
Lenz, Christian | University of Bonn |
Menon, Rohit | University of Bonn |
Schreiber, Michael | University of Bonn |
Paul, Jacob, Melvin | Hochschule Bonn-Rhein-Sieg |
Behnke, Sven | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Dual Arm Manipulation, Perception for Grasping and Manipulation
Abstract: Horticultural tasks such as pruning and selective harvesting are labor intensive and horticultural staff are hard to find. Automating these tasks is challenging due to the semi-structured greenhouse workspaces, changing environmental conditions such as lighting, dense plant growth with many occlusions, and the need for gentle manipulation of non-rigid plant organs. In this work, we present the three-armed system HortiBot, with two arms for manipulation and a third arm as an articulated head for active perception using stereo cameras. Its perception system detects not only peppers, but also peduncles and stems in real time, and performs online data association to build a world model of pepper plants. Collision-aware online trajectory generation allows all three arms to safely track their respective targets for observation, grasping, and cutting. We integrated perception and manipulation to perform selective harvesting of peppers and evaluated the system in lab experiments. Using active perception coupled with end-effector force torque sensing for compliant manipulation, HortiBot achieves high success rates in our indoor pepper plant mock-up.
|
|
14:00-15:00, Paper TuPIT1.6 | |
Markerless Aerial-Terrestrial Co-Registration of Forest Point Clouds Using a Deformable Pose Graph |
|
Casseau, Benoit | University of Oxford |
Chebrolu, Nived | University of Oxford |
Mattamala, Matias | University of Oxford |
Freißmuth, Leonard | Technical University Munich |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Agriculture and Forestry, Mapping, Aerial Systems: Applications
Abstract: For biodiversity and forestry applications, end-users desire maps of forests that are fully detailed--from the forest floor to the canopy. Terrestrial laser scanning and aerial laser scanning are accurate and increasingly mature methods to scan the forest. However, individually they are not able to estimate attributes such as tree height, trunk diameter and canopy density due to the inherent differences in their field-of-view and mapping processes. In this work, we present a pipeline that can automatically generate a single joint terrestrial and aerial forest reconstruction. The novelty of the approach is a marker-free registration pipeline, which estimates a set of relative transformation constraints between the aerial cloud and terrestrial sub-clouds without requiring any co-registration reflective markers to be physically placed in the scene. Our method then uses these constraints in a pose graph formulation, which enables us to finely align the respective clouds while respecting spatial constraints introduced by the terrestrial SLAM scanning process. We demonstrate that our approach can produce a fine-grained and complete reconstruction of large-scale natural environments, enabling multi-platform data capture for forestry applications without requiring external infrastructure.
|
|
14:00-15:00, Paper TuPIT1.7 | |
Optimal View Point and Kinematic Control for Grape Stem Detection and Cutting with an In-Hand Camera Robot |
|
Stavridis, Sotiris | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Keywords: Robotics and Automation in Agriculture and Forestry, Reactive and Sensor-Based Planning
Abstract: In this work, a methodology to find the best view of a grape stem and approach angle in order to crop it is proposed. The control scheme is based only on a classified point cloud obtained by the in-hand camera attached to the robot’s end effector without continuous stem tracking. It is shown that the proposed controller finds and reaches the optimal view point and subsequently the stem fast and efficiently, accelerating the overall harvesting procedure. The proposed control scheme is evaluated through experiments in the lab with a UR5e robot with an in-hand RealSense camera on a mock-up vine.
|
|
14:00-15:00, Paper TuPIT1.8 | |
Real-Time Semantic Segmentation in Natural Environments with SAM-Assisted Sim-To-Real Domain Transfer |
|
Wang, Han | ETH Zurich |
Mascaro, Ruben | ETH Zurich |
Chli, Margarita | ETH Zurich & University of Cyprus |
Teixeira, Lucas | ETH Zurich |
Keywords: Robotics and Automation in Agriculture and Forestry, Semantic Scene Understanding, Transfer Learning
Abstract: Semantic segmentation plays a pivotal role in many robotic applications requiring high-level scene understanding, such as smart farming, where the precise identification of trees or plants can aid navigation and crop monitoring tasks. While deep-learning-based semantic segmentation approaches have reached outstanding performance in recent years, they demand large amounts of labeled data for training. Inspired by modern Unsupervised Domain Adaptation (UDA) techniques, in this paper, we introduce a two-step training pipeline specifically tailored to challenging natural scenes, where the availability of annotated data is often quite limited. Our strategy involves the initial training of a powerful domain adaptive architecture, followed by a refinement stage, where segmentation masks predicted by the Segment Anything Model (SAM) are used to improve the accuracy of the predictions on the target dataset. These refined predictions serve as pseudo-labels to supervise the training of a final distilled architecture for real-time deployment. Extensive experiments conducted in two real-world scenes demonstrate the effectiveness of the proposed method. Specifically, we show that our pipeline enables the training of a MobileNetV3 that achieves significant mIoU gains of 3.60% and 11.40% on our two datasets compared to the DAFormer while only demanding 1/15 of the latter's inference time. Code and datasets are available at https://github.com/VIS4ROB-lab/nature_uda_rt_segmentation .
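The mIoU figures quoted above follow the usual per-class intersection-over-union average; a minimal sketch of that metric (illustrative only):

import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    # pred, gt: integer label maps of identical shape.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))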
|
|
14:00-15:00, Paper TuPIT1.9 | |
Temporal and Viewpoint-Invariant Registration for Under-Canopy Footage Using Deep-Learning-Based Bird's-Eye View Prediction |
|
Zhou, Jiawei | ETH Zurich |
Mascaro, Ruben | ETH Zurich |
Cadena Lerma, Cesar | ETH Zurich |
Chli, Margarita | ETH Zurich & University of Cyprus |
Teixeira, Lucas | ETH Zurich |
Keywords: Robotics and Automation in Agriculture and Forestry, Localization, Deep Learning for Visual Perception
Abstract: Conducting visual assessments under the canopy using mobile robots is an emerging task in smart farming and forestry. However, it is challenging to register images across different data-collection days, especially across seasons, due to the self-occluding geometry and temporal dynamics in forests and orchards. This paper proposes a new approach for registering under-canopy image sequences in general and in these situations. Our methodology leverages standard GPS data and deep-learning-based perspective to bird's-eye view conversion to provide an initial estimation of the positions of the trees in images and their association across datasets. Furthermore, it introduces an innovative strategy for extracting tree trunks and clean ground surfaces from noisy and sparse 3D reconstructions created from the image sequences, utilizing these features to achieve precise alignment. Our robust alignment method effectively mitigates position and scale drift, which may arise from GPS inaccuracies and Sparse Structure from Motion (SfM) limitations. We evaluate our approach on three challenging real-world datasets, demonstrating that our method outperforms ICP-based methods on average by 50%, and surpasses FGR and TEASER++ by over 90% in alignment accuracy. These results highlight our method's cost efficiency and robustness, even in the presence of severe outliers and sparsity. https://github.com/VIS4ROB-lab/bev_undercanopy_registration
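The ICP, FGR, and TEASER++ baselines mentioned above all reduce to estimating a rigid transform between corresponding points; a minimal Kabsch-style sketch under known correspondences (illustrative only, not the paper's alignment pipeline):

import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    # Least-squares rotation R and translation t aligning src to dst, both (N, 3).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t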
|
|
14:00-15:00, Paper TuPIT1.10 | |
Design of Stickbug: A Six-Armed Precision Pollination Robot |
|
Smith, Trevor | West Virginia University |
Rijal, Madhav | West Virginia University |
Arend Tatsch, Christopher Alexander | West Virginia University |
Butts, R. Michael | West Virginia University |
Beard, Jared | West Virginia University |
Robert Cook, Tyler | West Virginia University |
Chu, Andy | West Virginia University |
Gross, Jason | West Virginia University |
Gu, Yu | West Virginia University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Multi-Robot Systems
Abstract: This work presents the design of Stickbug, a six-armed, multi-agent, precision pollination robot that combines the accuracy of single-agent systems with swarm parallelization in greenhouses. Precision pollination robots have often been proposed to offset the effects of a decreasing population of natural pollinators, but they frequently lack the required parallelization and scalability. Stickbug achieves this by allowing each arm and drive base to act as an individual agent, significantly reducing planning complexity. Stickbug uses a compact holonomic Kiwi drive to navigate narrow greenhouse rows, a tall mast to support multiple manipulators and reach plant heights, a detection model and classifier to identify Bramble flowers, and a felt-tipped end-effector for contact-based pollination. Initial experimental validation demonstrates that Stickbug can attempt over 1.5 pollinations per minute with a 49% success rate. Additionally, a Bramble flower perception dataset was created and is publicly available alongside Stickbug's software and design files.
|
|
14:00-15:00, Paper TuPIT1.11 | |
Occlusion Handling by Pushing for Enhanced Fruit Detection |
|
Gursoy, Ege | LIRMM, University of Montpellier CNRS |
Kulic, Dana | Monash University |
Cherubini, Andrea | LIRMM - Universite De Montpellier CNRS |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, RGB-D Perception
Abstract: In agricultural robotics, effective observation and localization of fruits present challenges due to occlusions caused by other parts of the tree, such as branches and leaves. These occlusions can result in false fruit localization or impede the robot from picking the fruit. The objective of this work is to push away branches that block the fruit's view to increase their visibility. Our setup consists of an RGB-D camera and a robot arm. First, we detect the occluded fruit in the RGB image and estimate its occluded part via a deep learning generative model in the depth space. The direction to push to clear the occlusions is determined using classic image processing techniques. We then introduce a 3D extension of the 2D Hough transform to detect straight line segments in the point cloud. This extension helps detect tree branches and identify the one mainly responsible for the occlusion. Finally, we clear the occlusion by pushing the branch with the robot arm. Our method uses a combination of deep learning for fruit appearance estimation, classic image processing for push direction determination, and 3D Hough transform for branch detection. We validate our perception methods through real data under different lighting conditions and various types of fruits (i.e. apple, lemon, orange), achieving improved visibility and successful occlusion clearance. We demonstrate the practical application of our approach through a real robot branch pushing demonstration.
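The paper extends the 2D Hough transform to detect 3D line segments; a common, simpler alternative for isolating a dominant straight branch in a point cloud is RANSAC line fitting, sketched below (illustrative only, with assumed iteration count and inlier threshold; not the authors' method):

import numpy as np

def ransac_line(points: np.ndarray, iters: int = 500, tol: float = 0.01):
    # Fit a dominant 3D line to (N, 3) points; returns (point on line, unit direction).
    rng = np.random.default_rng(0)
    best_p, best_d, best_inliers = None, None, -1
    for _ in range(iters):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        d = p2 - p1
        norm = np.linalg.norm(d)
        if norm < 1e-9:
            continue
        d = d / norm
        diff = points - p1
        dist = np.linalg.norm(diff - np.outer(diff @ d, d), axis=1)  # point-to-line distance
        inliers = int((dist < tol).sum())
        if inliers > best_inliers:
            best_p, best_d, best_inliers = p1, d, inliers
    return best_p, best_d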
|
|
14:00-15:00, Paper TuPIT1.12 | |
Toward Precise Robotic Weed Flaming Using a Mobile Manipulator with a Blowtorch |
|
Wang, Di | Texas A&M University |
Hu, Chengsong | Mohamed Bin Zayed University of Artificial Intelligence |
Xie, Shuangyu | Texas A&M University |
Johnson, Joe | Texas A&M University |
Ji, Hojun | Boston Dynamics |
Jiang, Yingtao | Texas A&M University |
Bagavathiannan, Muthukumar | Texas A&M University |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Mobile Manipulation, Perception for Grasping and Manipulation, Robotics and Automation in Agriculture and Forestry
Abstract: Robotic weed flaming is a new and environmentally friendly approach to weed removal in the agricultural field. Using a mobile manipulator equipped with a flamethrower, we design a new system and algorithm to enable effective weed flaming, which requires robotic manipulation with a soft and deformable end effector, as the thermal coverage of the flame is affected by dynamic or unknown environmental factors such as gravity, wind, atmospheric pressure, fuel tank pressure, and pose of the nozzle. System development includes overall design, hardware integration, and software pipeline. To enable precise weed removal, the greatest challenge is to detect and predict dynamic flame coverage in real time before motion planning, which is quite different from a conventional rigid gripper in grasping or a spray gun in painting. Based on the input from two onboard infrared cameras and the pose information of the mobile manipulator and the nozzle of the flamethrower, we propose a new dynamic flame coverage model. The flame model uses a center-arc curve with a Gaussian cross-section model to describe the flame coverage in real time. The experiments have demonstrated the working system and shown that our model and algorithm can achieve a mean average precision (mAP) of more than 76% in the reprojected images during online prediction.
|
|
14:00-15:00, Paper TuPIT1.13 | |
Towards Human-Centered Construction Robotics: A Reinforcement Learning-Driven Companion Robot for Contextually Assisting Carpentry Workers |
|
Wu, Yuning | Carnegie Mellon University |
Wei, Jiaying | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Cardoso Llach, Daniel | Carnegie Mellon University |
Keywords: Robotics and Automation in Construction, Reinforcement Learning, Human-Centered Robotics
Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workflow fluency while respecting construction labor's skilled nature. We conduct an in-depth study on deploying a robotic system in carpentry formwork, showcasing a prototype that emphasizes mobility, safety, and comfortable worker-robot collaboration in dynamic environments through a contextual Reinforcement Learning (RL)-driven modular framework. Our research advances robotic applications in construction, advocating for collaborative models where adaptive robots support rather than replace humans, underscoring the potential for an interactive and collaborative human-robot workforce.
|
|
14:00-15:00, Paper TuPIT1.14 | |
Dynamic Throwing with Robotic Material Handling Machines |
|
Werner, Lennart | ETH Zürich |
Nan, Fang | ETH Zurich |
Eyschen, Pol | ETH Zurich |
Spinelli, Filippo Alberto | ETH Zürich |
Yang, Hongyi | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Robotics and Automation in Construction, Underactuated Robots, Robotics in Hazardous Fields
Abstract: Automation of hydraulic material handling machinery is currently limited to semi-static pick-and-place cycles. Dynamic throwing motions, which utilize the passive joints, can greatly improve time efficiency as well as increase the dumping workspace. In this work, we use RL to design dynamic controllers for material handlers with underactuated arms as commonly used in logistics. The controllers are tested both in simulation and in real-world experiments on a 12-ton test platform. The method is able to exploit the passive joints of the gripper to perform dynamic throwing motions. With the proposed controllers, the machine is able to throw individual objects to targets outside the static reachability zone with good accuracy for its practical applications. The work demonstrates the possibility of using RL to perform highly dynamic tasks with heavy machinery, suggesting a potential for improving the efficiency and precision of autonomous material handling tasks.
|
|
14:00-15:00, Paper TuPIT1.15 | |
Extensive, Long-Term Task and Motion Planning with Signal Temporal Logic Specification for Autonomous Construction |
|
Satoh, Mineto | NEC Corporation |
Takano, Rin | NEC Corporation |
Oyama, Hiroyuki | NEC Corporation |
Keywords: Robotics and Automation in Construction, Task and Motion Planning, Optimization and Optimal Control
Abstract: We propose a hierarchical task and motion planning (TAMP) for autonomous construction that manipulates deformable objects, such as terrain excavation. The TAMP is required to generate an efficient task plan to meet high-level construction goals at sites with environmental diversity while ensuring motion feasibility. The difficulty, however, is to manipulate deformable objects containing nonlinear dynamics with a target given by a continuous value as the task specification. Optimization-based TAMP with signal temporal logic specifications in robotics is promising because of its continuous task specification and formulation as a nonlinear programming problem. The key to its application to extensive, long-term planning at real construction sites is a computationally efficient and stable formulation. We introduce a new expression for deformable objects with a simple differentiable function and a system model that can represent mode transitions based on machine action on the objects. This allows TAMP to be formulated as simultaneously selecting an action for objects and planning the motion to execute it. Furthermore, a hierarchical method that gives appropriate initial values is combined to improve optimality for large-scale nonlinear problems. From the verification by numerical experiments, the proposed method can generate a plan that minimizes the time to meet the task goal, even when the area is expanded.
|
|
14:00-15:00, Paper TuPIT1.16 | |
BEV Image-Based Lane Tracking Control System for Autonomous Lane Repainting Robot |
|
Seo, Junghyun | DGIST |
Jeon, Hyeonjae | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Choi, Joonyoung | Daegu Gyeongbuk Institute of Science and Technology |
Kwangho, Woo | Robot for People |
Lim, Yongseob | DGIST |
Jin, Yongsik | Electronics and Telecommunications Research Institute |
Keywords: Robotics and Automation in Construction, Software-Hardware Integration for Robot Systems, Industrial Robots
Abstract: In this paper, we present a novel study on a BEV image-based lane tracking control system for an autonomous lane repainting robot. Our research introduces a cutting-edge lane detection method based on BEV images, leveraging row-anchor techniques to enhance precision and provide detailed error information for lane tracking algorithms. By utilizing real-time sensor data and advanced deep learning processes, we have successfully implemented a high-performance lane repainting system that minimizes errors and ensures accuracy. Our proposed position-based visual pure pursuit algorithm (PV-PP) plays a crucial role in guiding the lane repainting process with precision and efficiency, ultimately improving the functionality and feasibility of the linear actuator responsible for paint spraying in real industrial fields. Through our contributions, including innovative lane detection methods, real-time sensor utilization, and robot control algorithm design, we aim to advance the field of autonomous lane repainting robots and enhance the safety and effectiveness of road maintenance operations.
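Classic pure pursuit computes a steering command toward a lookahead point on the detected lane; the sketch below shows the textbook geometry only (the paper's position-based visual variant, PV-PP, is not reproduced here):

import numpy as np

def pure_pursuit_steering(lookahead_xy, wheelbase: float) -> float:
    # lookahead_xy: lane target point (x forward, y left) in the robot frame.
    x, y = lookahead_xy
    ld = np.hypot(x, y)                               # lookahead distance
    curvature = 2.0 * y / (ld ** 2)                   # textbook pure pursuit curvature
    return float(np.arctan(wheelbase * curvature))    # bicycle-model steering angle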
|
|
14:00-15:00, Paper TuPIT1.17 | |
A Fast Heuristic Scheduling Search for Robotic Cellular Manufacturing Systems with Generalized and Timed Petri Nets |
|
Xiao, YuanZheng | Nanjing University of Science and Technology |
Gao, YangQing | Nanjing University of Science and Technology |
Wu, Haoran | Nanjing University of Science and Technology |
Huang, Bo | Nanjing University of Science and Technology |
Lv, Jianyong | Nanjing University of Science and Technology |
|
TuPIT2 |
Room 2 |
Assistive Robotics |
Teaser Session |
Chair: Campolo, Domenico | Nanyang Technological University |
Co-Chair: Cifuentes, Carlos A. | University of the West of England, Bristol |
|
14:00-15:00, Paper TuPIT2.1 | |
Kiri-Spoon: A Soft Shape-Changing Utensil for Robot-Assisted Feeding |
|
Keely, Maya | Virginia Tech |
Nemlekar, Heramb | Virginia Tech |
Losey, Dylan | Virginia Tech |
Keywords: Physically Assistive Devices, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: Assistive robot arms have the potential to help disabled or elderly adults eat everyday meals without relying on a caregiver. To provide meaningful assistance, these robots must reach for food items, pick them up, and then carry them to the human's mouth. Current work equips robot arms with standard utensils (e.g., forks and spoons). But --- although these utensils are intuitive for humans --- they are not easy for robots to control. If the robot arm does not carefully and precisely orchestrate its motion, food items may fall out of a spoon or slide off of the fork. Accordingly, in this paper we design, model, and test Kiri-Spoon, a novel utensil specifically intended for robot-assisted feeding. Kiri-Spoon combines the familiar shape of traditional utensils with the capabilities of soft grippers. By actuating a kirigami structure the robot can rapidly adjust the curvature of Kiri-Spoon: at one extreme the utensil wraps around food items to make them easier for the robot to pick up and carry, and at the other extreme the utensil returns to a typical spoon shape so that human users can easily take a bite of food. Our studies with able-bodied human operators suggest that robot arms equipped with Kiri-Spoon carry foods more robustly than when leveraging traditional utensils.
|
|
14:00-15:00, Paper TuPIT2.2 | |
A Wearable Platform Based on the Multi-Modal Foundation Model to Augment Spatial Cognition for People with Blindness and Low Vision |
|
Hao, Yu | New York University |
Magay, Alexey | New York University Abu Dhabi |
Huang, Hao | New York University |
Yuan, Shuaihang | New York University |
Wen, Congcong | New York University Abu Dhabi |
Fang, Yi | New York University |
Keywords: Physically Assistive Devices, Wearable Robotics
Abstract: Spatial cognition refers to the ability to understand and navigate the spatial relationship between objects in the environment. People with blindness and low vision (pBLV) face significant challenges with spatial cognition due to the reliance on visual input. Without the full range of visual cues, pBLV individuals often find it difficult to grasp a comprehensive understanding of their environment, leading to obstacles in scene recognition and precise object localization, especially in unfamiliar environments. This limitation extends to their ability to independently detect and avoid potential tripping hazards, making navigation and interaction with their environment more challenging. In this paper, we present a pioneering wearable platform tailored to enhance the spatial cognition of pBLV through the integration of a multi-modal foundation model. The proposed platform integrates a wearable camera with an audio module and leverages the advanced capabilities of vision-language foundation models (i.e., GPT-4 and GPT-4V) for the nuanced processing of visual and textual data. We then employ vision-language models to bridge the gap between visual information and the proprioception of visually impaired users, offering more intelligible guidance by aligning visual data with the natural perception of space and movement. We also apply prompt engineering to guide the large language model to act as an assistant tailored specifically for pBLV users to produce accurate answers. Another notable innovation in our model is the incorporation of a chain of thought reasoning process, which enhances the accuracy and interpretability of the model, facilitating the generation of more precise responses to complex user inquiries across diverse environmental contexts. To assess the practical impact of our proposed wearable platform, we carried out a series of real-world experiments across three tasks that are commonly challenging for people with blindness and low vision: risk assessment, object localization, and scene recognition. Additionally, through an ablation study conducted on the VizWiz dataset, we rigorously assess the contribution of each individual module, substantiating each module's integral role in the model's overall performance.
|
|
14:00-15:00, Paper TuPIT2.3 | |
Force-Triggered Control Design for User Intent-Driven Assistive Upper-Limb Robots |
|
Manzano, Maxime | IRISA UMR CNRS 6074 - INRIA - INSA Rennes |
Guegan, Sylvain | INSA Rennes |
Le Breton, Ronan | UNIV-RENNES - INSA Rennes |
Devigne, Louise | IRISA UMR CNRS 6074 - INRIA - INSA Rennes - Rehabilitation Cente |
Babel, Marie | IRISA UMR CNRS 6074 - INRIA - INSA Rennes |
Keywords: Physically Assistive Devices, Prosthetics and Exoskeletons
Abstract: Assistive devices are to be designed with the objective of use in daily life as well as broad adoption by end users. In this context, it is necessary to tackle usability challenges by properly detecting and acting in accordance with user intents while also minimizing the device installation complexity. In the case of physical assistive devices, using force/torque sensors is advantageous for detecting user intent compared to EMG interfaces, but it remains difficult to correctly translate the detected intent into actuator motions. Focusing on upper-limb assistive robots, the user's voluntary force is commonly used with a controller based on an admittance approach, which leads to relatively poor reactivity and requires the user to exert force throughout the movement, which can lead to fatigue, particularly for people with upper-limb impairments. This work proposes a Force-Triggered (FT) controller which can initiate and maintain movement from short force impulses only. The user's voluntary forces are retrieved from the total interaction forces by subtracting the passive component measured beforehand during a calibration phase. This paper presents the design of the proposed FT controller and its preliminary testing on pick-and-place tasks compared to an admittance strategy. This experiment was performed with one participant without impairment, equipped with an upper-limb exoskeleton prototype designed from recommendations of physical medicine therapists. This preliminary work highlights the potential of the proposed FT controller. It also provides directions for future work and clinical trials with end users to assess the usability of the proposed FT approach when used alone or in the form of a hybrid controller combining FT and admittance strategies.
|
|
14:00-15:00, Paper TuPIT2.4 | |
Multimodal Haptic Interface for Walker-Assisted Navigation |
|
Wang, Yikun | Bristol Robotics Laboratory, University of the West of England, |
Sierra M., Sergio D. | University of Bristol |
Harris, Nigel | Bristol Robotics Laboratory, University of the West of England, |
Munera, Marcela | University of the West of England |
Cifuentes, Carlos A. | University of the West of England, Bristol |
Keywords: Physically Assistive Devices, Human-Centered Robotics, Haptics and Haptic Interfaces
Abstract: This study investigates the efficacy of a haptic interface, aiming to offer the walking frame users accurate, intuitive, and easily understandable directional cues. The research introduces a novel haptic feedback interface incorporated into a walking frame to enhance navigation assistance. The haptic handle encompasses three distinct haptic feedback modalities: vibration, skin stretch, and combined feedback. Ten participants, all in good health, engaged with the haptic handle for navigation. Across all three haptic feedback methods, 60% of participants found the combined feedback to be the most effective, while 40% favoured the vibration feedback; none selected the skin-stretch feedback. Comparative analysis revealed significant disparities between vibration and combined input regarding velocity (p-value: 0.04). These findings emphasize the haptic handle's capacity to give users an instinctive perception of directional cues, thus offering a promising avenue for assistive navigation.
|
|
14:00-15:00, Paper TuPIT2.5 | |
Development and Functional Evaluation of the PrHand V3 Soft-Robotics Prosthetic Hand |
|
Ramos, Orion Yari Santiago | Universidad Del Rosario, School of Engineering, Science and Techn |
De Arco, Laura | Federal University of Espírito Santo |
Munera, Marcela | University of the West of England |
Robledo, Jorge | Prótesis Avanzadas SAS |
Moazen, Mehran | UCL |
Wurdemann, Helge Arne | University College London |
Cifuentes, Carlos A. | University of the West of England, Bristol |
Keywords: Prosthetics and Exoskeletons, Soft Robot Applications, Physically Assistive Devices
Abstract: The affordability and functionality of hand prosthetics in developing countries are still very limited. This work aims to present and evaluate the new version of the PrHand affordable robotic prosthesis (PrHand V3), built with soft robotics and compliant mechanisms. Compared to the previous version, PrHand V2, PrHand V3 implements a new frictionless tendon unification system, removes the thumb-opposition degree of freedom, and improves finger flexion. The study contributes by evaluating these mechanical changes and conducting the first functional assessment of PrHand V3 with an amputee user. The Anthropomorphic Hand Assessment Protocol (AHAP) dexterity test was the first evaluation in this work; it evaluated how the prosthesis performs eight different grips. PrHand V3 was compared with PrHand V2 and a commercial robotic prosthesis, the A3D from Prótesis Avanzadas SAS. PrHand V3's score on the AHAP test was 80%. This result is higher than the 69% obtained by PrHand V2 and the 79% obtained by the A3D. The Activities Measure for Upper Limb Amputees (AM-ULA) test was the second evaluation in this work; an A3D amputee user performed 23 Activities of Daily Living with PrHand V3 and an A3D. PrHand V3 obtained an average of 2.86/4, and the A3D obtained an average of 2.96/4, without significant differences between the two tests. The soft actuation of PrHand V3 as an affordable prosthesis performs similarly to a commercial robotic prosthesis, with the advantage of being more flexible to assist a trans-radial hand amputee.
|
|
14:00-15:00, Paper TuPIT2.6 | |
Evaluating the Impact of a Semi-Autonomous Interface on Configuration Space Accessibility for Multi-DOF Upper Limb Prostheses |
|
Greene, Rebecca J. | Johns Hopkins University |
Hunt, Christopher | Infinite Biomedical Technologies |
Acosta, Brooklyn Paige | Dillard University |
Huang, Zihan | Johns Hopkins University |
Kaliki, Rahul | Infinite Biomedical Technologies |
Thakor, Nitish V. | Johns Hopkins University, Baltimore, USA |
Keywords: Prosthetics and Exoskeletons, Human-Robot Collaboration, Virtual Reality and Interfaces
Abstract: Powered upper limb prostheses offer a particularly interesting case of human-machine interaction, where the user and the robot are physically coupled in an open chain manipulator. The biological and mechanical degrees of freedom (DOF) must collaborate for the user to manipulate objects in the environment. Current state-of-the-art systems use machine learning models to classify electromyogram (EMG) signals into motion intent primitives, allowing users to move the prosthetic joints sequentially at a fixed velocity. Current commercially available systems are limited to 1 or 2 powered DOF. This interface is intended to work for simple systems but does not extend well into higher DOF. In this paper, we present a semi-autonomous (SA) hybrid gaze-EMG interface that allows users to command the device in task-space instead of joint-space. Target end-effector poses are selected by tracking the user’s gaze vector, and then EMG signals guide the prosthetic along a calculated trajectory towards that pose. To examine how prosthesis interface performance scales with available mechanical DOF, we had 4 subjects complete virtual pick and place tasks with SA and traditional controller interfaces, varying the available DOF in the prosthetic wrist. Our results show that with the SA interface, increased DOF leads to a significant (p ≤ 0.05) reduction in compensatory motion of the upper arm, more effective (p ≤ 0.01) utilization of the increased configuration space, and overall more efficient motion (p ≤ 0.01) than traditional classification based interfaces. These findings indicate that when given SA interfaces, subjects can benefit from fully articulated prosthetic devices, which motivates more clinical research into SA systems and commercial development of higher-DOF devices.
|
|
14:00-15:00, Paper TuPIT2.7 | |
Data-Driven Predictive Control for Robust Exoskeleton Locomotion |
|
Li, Kejun | California Institute of Technology |
Kim, Jeeseop | Caltech |
Xiong, Xiaobin | University of Wisconsin Madison |
Akbari Hamed, Kaveh | Virginia Tech |
Yue, Yisong | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Prosthetics and Exoskeletons, Humanoid and Bipedal Locomotion, Model Learning for Control
Abstract: Exoskeleton locomotion must be robust while being adaptive to different users with and without payloads. To address these challenges, this work introduces a data-driven predictive control (DDPC) framework to synthesize walking gaits for lower-body exoskeletons. The proposed approach leverages DDPC through a multi-layer architecture. At the top layer, DDPC serves as a planner employing Hankel matrices and a state transition matrix to generate a data-driven model that can learn and adapt to varying users and payloads. At the lower layer, our method incorporates inverse kinematics and passivity-based control to map the planned trajectory from DDPC into the full-order states of the lower-body exoskeleton. We validate the effectiveness of this approach through numerical simulations and hardware experiments conducted on the Atalante lower-body exoskeleton with different payloads. Moreover, we conducted a comparative analysis against the model predictive control (MPC) framework based on the reduced-order linear inverted pendulum (LIP) model. Through this comparison, the paper demonstrates that DDPC enables robust bipedal walking at various velocities while accounting for model uncertainties and unknown perturbations.
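Data-driven predictive control in this style stacks recorded trajectories into Hankel matrices; a minimal construction sketch (illustrative only; the window depth and signal shape are assumptions):

import numpy as np

def block_hankel(signal: np.ndarray, depth: int) -> np.ndarray:
    # signal: (T, m) recorded trajectory; returns a (depth*m, T-depth+1) block-Hankel matrix.
    T, m = signal.shape
    cols = T - depth + 1
    return np.column_stack([signal[i:i + depth].reshape(-1) for i in range(cols)])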
|
|
14:00-15:00, Paper TuPIT2.8 | |
An Adaptive Robotic Exoskeleton for Comprehensive Force-Controlled Hand Rehabilitation |
|
Wilhelm, Nikolas Jakob | Technical University of Munich |
Schaack, Victor Gilles | Technical University Munich |
Leisching, Annick | TUM, MRI, Orthopaedics and Sport Orthopaedics |
Micheler, Carina M. | Technical University of Munich, TUM School of Medicine, Klinikum |
Haddadin, Sami | Technical University of Munich |
Burgkart, Rainer | Technische Universität München |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Wearable Robotics
Abstract: This study presents the development and validation of an innovative hand exoskeleton designed for the rehabilitation of patients with Complex Regional Pain Syndrome (CRPS), a condition frequently arising after injury or surgery. The prototype is tailored for the hand, a region commonly affected by CRPS, and is notable for its adaptability and a comprehensive sensor system for monitoring individual joint movements. These features allow for personalized rehabilitation and objective progress tracking, addressing limitations in traditional physiotherapy such as availability, cost, and time constraints. Reliable sensor performance was verified through precise force measurements and stability over time, showing minimal drift. The contributions of this work lie in its innovative design and the potential for robotic systems to improve therapeutic outcomes in CRPS rehabilitation.
|
|
14:00-15:00, Paper TuPIT2.9 | |
A Tactile Lightweight Exoskeleton for Teleoperation: Design and Control Performance |
|
Forouhar, Moein | Technische Universität München |
Sadeghian, Hamid | Technical University of Munich |
Pérez-Suay, Daniel | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Prosthetics and Exoskeletons, Neural and Fuzzy Control, Manipulation Planning
Abstract: In this work, an upgraded exoskeleton design is presented with enhanced trajectory tracking and mechanical transparency. Compared to the first version, the design features a 3-DoF actuated shoulder joint and a mechanism to regulate the pretension of Bowden cables. Force/torque sensors are installed to directly measure the interaction forces between the human arm and the exoskeleton at the connecting points. Three control strategies were evaluated to follow a desired trajectory: a PD controller, a PD controller with a friction observer, and an adaptive controller based on Radial Basis Functions (RBF). These strategies also form the basis for admittance control, aimed at improving the exoskeleton's mechanical transparency during interaction with the human arm. Simulations and experimental results demonstrate that the PD control, supported by friction estimation via a momentum observer, achieves superior tracking performance. Moreover, the system's mechanical transparency is enhanced using the admittance RBF-based controller, showing marginally superior results.
|
|
14:00-15:00, Paper TuPIT2.10 | |
Design of Upper-Limb Exoskeleton with Distal Branching Link Mechanism for Bilateral Operation of Humanoid Robots |
|
Yoshioka, Hiroki | The University of Tokyo |
Hiraoka, Naoki | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Prosthetics and Exoskeletons, Mechanism Design, Humanoid Robot Systems
Abstract: Exoskeletons for robot operation necessitate shoulders with a high range of motion and high degrees of freedom to fit the operator's shoulder girdle. These shoulder joints need high torque for force feedback on the operator. Existing exoskeletons struggle to simultaneously meet these requirements of high DOFs, wide ROM, and high torque due to spatial constraints. This study introduces an exoskeleton with a distal branching link mechanism that addresses this issue by concentrating on each link's absolute and relative degrees of freedom. In the proposed exoskeleton, the end-effector's absolute DOF, the forearm's absolute DOF, and the end-effector and forearm's relative DOF are matched between the operator and the exoskeleton. This is achieved while reducing the overall DOF by sharing the root link system's DOF. Furthermore, by avoiding direct attachment of the operator to the exoskeleton's shoulder, the design can accommodate the human shoulder's high torque and high ROM. The study demonstrates that the branching exoskeleton outperforms existing fixed-linkage exoskeletons in terms of tracking the operator's arms and the torque required by the exoskeleton's joints. Utilizing this exoskeleton, we successfully maneuvered an actual humanoid robot to perform daily activities where the forearm posture is crucial.
|
|
14:00-15:00, Paper TuPIT2.11 | |
Functional Kinematic and Kinetic Requirements of the Upper Limb During Activities of Daily Living: A Recommendation on Necessary Joint Capabilities for Prosthetic Arms |
|
Herneth, Christopher | Technical University Munich |
Ganguly, Amartya | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Prosthetics and Exoskeletons, Datasets for Human Motion, Modeling and Simulating Humans
Abstract: Prosthetic limb abandonment remains an unsolved challenge as amputees consistently reject their devices. Current prosthetic designs often fail to balance human-like performance with acceptable device weight, highlighting the need for optimised designs tailored to modern tasks. This study aims to provide a comprehensive dataset of joint kinematics and kinetics essential for performing activities of daily living (ADL), thereby informing the design of more functional and user-friendly prosthetic devices. Functionally required Ranges of Motion (ROM), velocities, and torques for the Glenohumeral (rotation), elbow, Radioulnar, and wrist joints were computed using motion capture data from 12 subjects performing 24 ADLs. Our approach included the computation of joint torques for varying mass and inertia properties of the upper limb, while torques induced by the manipulation of experimental objects were considered by their interaction wrench with the subject's hand. Joint torques pertaining to individual ADL scaled linearly with limb and object mass and mass distribution, permitting their generalisation to not explicitly simulated limb and object dynamics with linear regressors (LRM), exhibiting coefficients of determination R = 0.99 ± 0.01. Exemplifying an application of data-driven prosthesis design, we optimise wrist axes orientations for two serial and two differential joint configurations. Optimised axes reduced peak power requirements by 22% to 38% compared to anatomical configurations, by exploiting high torque correlations (r = −0.84, p < 0.05) between Ulnar deviation and wrist flexion/extension joints. This study offers critical insights into the functional requirements of upper limb prostheses, providing a valuable foundation for data-driven prosthetic design that addresses key user concerns and enhances device adoption.
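The reported linear scaling of joint torques with limb and object mass can be captured by an ordinary least-squares fit with a coefficient of determination; a minimal sketch (illustrative only, with made-up variable names):

import numpy as np

def fit_torque_regressor(mass_features: np.ndarray, peak_torques: np.ndarray):
    # mass_features: (N, k) limb/object mass and inertia descriptors; peak_torques: (N,).
    X = np.column_stack([mass_features, np.ones(len(mass_features))])   # add an intercept term
    coef, *_ = np.linalg.lstsq(X, peak_torques, rcond=None)
    pred = X @ coef
    r2 = 1.0 - np.sum((peak_torques - pred) ** 2) / np.sum((peak_torques - peak_torques.mean()) ** 2)
    return coef, r2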
|
|
14:00-15:00, Paper TuPIT2.12 | |
Optimal Integration of Hybrid FES-Exoskeleton for Precise Knee Trajectory Control |
|
Jafaripour, Masoud | University of Alberta |
Mushahwar, Vivian K. | University of Alberta |
Tavakoli, Mahdi | University of Alberta |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Optimization and Optimal Control
Abstract: This paper introduces a novel hybrid torque allocation method for improving wearability and mobility in integrated functional electrical stimulation (FES) of the quadriceps muscles and powered exoskeleton systems. Our proposed approach leverages a hierarchical closed-loop controller for knee joint position tracking while addressing limitations of powered exoskeletons and FES systems by reducing power consumption and battery size and by mitigating FES-induced muscle fatigue, respectively. The core component is a model-free optimization algorithm that dynamically distributes torque between FES and the exoskeleton by considering tracking error, effort, and the prediction of muscle fatigue in the cost function, computing allocation gain in an online manner. The online optimization approach interactively changes the optimal allocation gain by taking into account the instantaneous value of error and effort and also penalizing FES-induced fatigue, a common challenge in long-duration experiments. The results demonstrate that this dynamic allocation significantly improves system wearability by reducing power consumption without increasing muscle fatigue during the extension phase of walking. This hybrid control approach contributes to improving exoskeleton wearability and rehabilitation outcomes for individuals with SCI and mobility impairments, enhancing assistive technology and quality of life.
|
|
14:00-15:00, Paper TuPIT2.13 | |
Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains |
|
Chen, Chuheng | Southern University of Science and Technology |
Chen, Xinxing | Southern University of Science and Technology |
Yin, Shucong | Southern University of Science and Technology |
Wang, Yuxuan | The Southern University of Science and Technology |
Huang, Binxin | Southern University of Science and Technology |
Leng, Yuquan | Southern University of Science and Technology |
Fu, Chenglong | Southern University of Science and Technology (SUSTech) |
Keywords: Prosthetics and Exoskeletons, Visual Tracking, Sensor Fusion
Abstract: Environment awareness is crucial for enhancing the walking safety and stability of amputees wearing powered prostheses when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prostheses only provide terrain types and corresponding parameters, which fail to prevent potential collisions when crossing uneven terrains and may lead to falls and other severe consequences. In this paper, a visual-inertial motion estimation approach is proposed for the prosthesis to perceive its movement and the changing spatial relationship between the prosthesis and uneven terrain while traversing it. To achieve this, we estimate the knee motion by utilizing a depth camera to perceive the environment and align feature points extracted from uneven terrains. Subsequently, an error-state Kalman filter is incorporated to fuse the inertial data into the visual estimations to obtain a more robust and accurate estimate, which is then utilized to derive the motion of the whole prosthesis for our prosthetic control scheme. Experiments conducted on our collected dataset and stair walking trials with a powered prosthesis show that the proposed method can accurately track the motion of the human leg and the prosthesis, with an average root-mean-square error of the toe trajectory of less than 5 cm. The proposed method is expected to enable environment-adaptive control for prostheses, thereby enhancing amputees' safety and mobility on uneven terrains.
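Fusing an inertial prediction with a visual measurement follows the standard Kalman predict/update cycle; a scalar sketch for a single position state (illustrative only, far simpler than the paper's error-state filter):

def kalman_fuse(x_pred: float, p_pred: float, z_vis: float, r_vis: float):
    # x_pred, p_pred: IMU-propagated state and its variance; z_vis, r_vis: visual measurement and its variance.
    k = p_pred / (p_pred + r_vis)        # Kalman gain
    x = x_pred + k * (z_vis - x_pred)    # corrected estimate
    p = (1.0 - k) * p_pred               # updated variance
    return x, p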
|
|
14:00-15:00, Paper TuPIT2.14 | |
Using Hip Assisted Running Exoskeleton with Impact Isolation Mechanism to Improve Energy Efficiency |
|
Wang, Ziqi | Harbin Institute of Technology |
Liu, Junchen | Harbin Institute of Technology |
Li, Hongwu | Harbin Institute of Technology |
Zhang, Qinghua | Harbin Institute of Technology |
Li, Xianglong | Harbin Institute of Technology |
Huang, Yi | Harbin Institute of Technology |
Ju, Haotian | Harbin Institute of Technology |
Zheng, Tianjiao | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Humanoid Robot Systems
Abstract: Research has indicated that exoskeletons can assist human movement, but due to the influence of additional weight and challenges in control strategy design, only a few exoskeletons effectively reduce the wearers' metabolic costs during running. This paper proposes an innovative and efficient hip-assisted running exoskeleton (HARE) designed to facilitate the flexion and extension movements of the joint in the sagittal plane. For the structural design, we propose an active-passive combination constant-force suspension system, hereinafter referred to as the CFS, to effectively mitigate the impact of inertial forces during running. The decoupled transmission mechanism allows the CFS and assist mechanisms to operate independently, ensuring the tension of the cables. The flexible structural design reduces the locomotion limitations on the human body and the additional energy burden on the body. For the control strategy, the joint torque-generation strategy provides personalized assistance for each wearer by actively optimizing the control parameters. Meanwhile, the safety control strategy based on abnormal gait recognition ensures human safety. Experiments have shown that, compared to not wearing an exoskeleton, this device can reduce the energy consumption of the human body by 5.33% at a speed of 9 km/h, demonstrating its potential for human motion assistance.
|
|
14:00-15:00, Paper TuPIT2.15 | |
A Large Vision-Language Model Based Environment Perception System for Visually Impaired People |
|
Chen, Zezhou | China Unicom |
Liu, Zhaoxiang | China Unicom |
Wang, Kai | China Unicom |
Wang, Kohou | Chinaunicom |
Lian, Shiguo | China Unicom |
Keywords: Wearable Robotics, Physically Assistive Devices
Abstract: It is a challenging task for visually impaired people to perceive their surrounding environment due to the complexity of natural scenes, and their personal and social activities are thus highly limited. This paper introduces a Large Vision-Language Model (LVLM) based environment perception system which helps them to better understand the surrounding environment by capturing the current scene they face with a wearable device and then letting them retrieve the analysis results through the device. Visually impaired users can acquire a global description of the scene by long-pressing the screen to activate the LVLM output, retrieve the categories of the objects in the scene produced by a segmentation model by tapping or swiping the screen, and get a detailed description of the objects they are interested in by double-tapping the screen. To help visually impaired people perceive the world more accurately, this paper proposes incorporating the segmentation result of the RGB image as external knowledge into the input of the LVLM to reduce the LVLM's hallucination. Technical experiments on POPE, MME and LLaVA-QA90 show that the system provides a more accurate description of the scene than Qwen-VL-Chat, and exploratory experiments show that the system helps visually impaired people perceive their surrounding environment effectively.
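One simple way to picture "segmentation output as external knowledge" is a prompt that lists the verified objects before the user's request. The sketch below is illustrative only; the prompt wording and the hypothetical lvlm_chat call are not the system's actual interfaces:

    # Fold segmentation labels into the LVLM prompt as external knowledge
    # to reduce hallucination. `lvlm_chat` is a hypothetical LVLM call.
    def build_prompt(seg_labels, user_request):
        """Prepend segmentation-verified scene objects to the user's request."""
        knowledge = ", ".join(sorted(set(seg_labels)))
        return (
            "Objects confirmed by a segmentation model in the current image: "
            f"{knowledge}.\n"
            "Describe the scene for a visually impaired user, mentioning only "
            "objects consistent with this list.\n"
            f"User request: {user_request}"
        )

    labels = ["crosswalk", "traffic light", "bicycle", "pedestrian"]
    prompt = build_prompt(labels, "What is in front of me?")
    # response = lvlm_chat(image, prompt)   # hypothetical LVLM call
    print(prompt)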
|
|
TuPIT3 |
Room 3 |
Bioinspired Robotics |
Teaser Session |
Co-Chair: Ijspeert, Auke | EPFL |
|
14:00-15:00, Paper TuPIT3.1 | |
Modeling of Hydraulic Soft Hand with Rubber Sheet Reservoir and Evaluation of Its Grasping Flexibility and Control |
|
Ishibashi, Kyosuke | The University of Tokyo |
Ishikawa, Hiroki | The University of Tokyo |
Azami, Osamu | Staff Service-Engineering |
Yamamoto, Ko | University of Tokyo |
Keywords: Modeling, Control, and Learning for Soft Robots, Hydraulic/Pneumatic Actuators, Grippers and Other End-Effectors
Abstract: In situations where robots work alongside humans, they must be capable of responding flexibly to unexpected external forces. To address this challenge, researchers have conducted numerous studies on soft robotics. However, most of the soft hands studied so far are powered by pneumatic pressure and can only exert pressures up to several hundred kPa, resulting in low output. To solve this problem, we developed a hydraulic soft hand in our previous research. In this paper, we derive the relationship between driving pressure, bending angle, and grasping force of a soft hand with a reservoir to evaluate the effect of a rubber sheet reservoir. Additionally, we experimentally show that the soft hand provides grasping flexibility when angle control is applied using the model proposed in this paper.
|
|
14:00-15:00, Paper TuPIT3.2 | |
Manta Ray-Inspired Soft Robotic Swimmer for High-Speed and Multi-Modal Swimming |
|
Xu, Zefeng | South China University of Technology |
Liang, Jiaqiao | South China University of Technology |
Zhou, Yitong | South China University of Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Biologically-Inspired Robots
Abstract: Manta rays exhibit complex motion behavior through their flexible fins. This study proposes a novel manta ray-inspired soft robotic swimmer with bistable flapping wings for high-speed and multi-modal swimming. The wings are created with a prestressed bistable composite and actuated by small McKibben artificial muscles. Pressurizing and depressurizing the McKibben actuators integrated into the flapping wings generates alternating snap-throughs between two stable states, yielding swimming. Experiments are set up and conducted to analyze how the robot's responses vary as a function of input pressures and actuation frequencies for both bistable and monostable modes. Experimental results show that the highest swimming velocity is 0.58 body lengths (BL) per second (equivalent to 12.23 cm/s), and the maximum turning rate is 22.5° per second, with a smaller turning radius achieved by holding the fins in asymmetric positions in bistable modes. Multimodal swimming motions are achieved, including forward and backward translation, turning, and flip-turning.
|
|
14:00-15:00, Paper TuPIT3.3 | |
Harnessing Symmetry Breaking in Soft Robotics: A Novel Approach for Underactuated Fingers |
|
Hashem, Ryman | University College London |
Howison, Toby | University of Cambridge |
Stilli, Agostino | University College London |
Stoyanov, Danail | University College London |
Xu, Peter | Auckland University |
Iida, Fumiya | University of Cambridge |
Keywords: Soft Robot Applications, Actuation and Joint Mechanisms, Soft Robot Materials and Design
Abstract: Soft robotics, an emerging domain in modern robotics, introduces innovative possibilities alongside challenges in controllability, particularly with multi-degree inflatable actuators. We present a novel manipulation method using underactuated soft fingers that addresses these challenges by harnessing symmetry breaking. Central to our approach is the mechanism of self-organization within a ring actuator equipped with five fingers. Although buckling is typically considered a drawback, we exploit the actuator's buckling behavior to facilitate in-hand manipulation. This strategic utilization enables object motion in both clockwise and counterclockwise directions via system perturbations and adjustments of the frequency and duty cycle parameters. Employing the self-organizing properties of our actuator, our method is empirically validated through simulations and real actuator experiments, demonstrating the system's ability to manipulate objects by leveraging its inherent flexibility and morphological advantages. The design enables two degrees of freedom with minimal input, allowing objects to rotate due to the actuator's self-organizing actions. This simplification of control mechanisms is essential for soft robotic manipulation. Our findings indicate that control systems in soft robotics can be significantly simplified by harnessing the adaptable behavior inherent in the robot's morphology.
|
|
14:00-15:00, Paper TuPIT3.4 | |
PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin-Ray Fingers |
|
Wang, Xing | CSIRO |
Dabrowski, Joel Janek | CSIRO |
Pinskier, Joshua | CSIRO |
Liow, Lois | CSIRO |
Viswanathan, VinothKumar | CSIRO |
Scalzo, Richard | CSIRO |
Howard, David | CSIRO |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Modelling complex deformation for soft robotics provides a guideline to understand their behaviour, leading to safe interaction with the environment. However, building a surrogate model with high accuracy and fast inference speed can be challenging for soft robotics due to the nonlinearity arising from complex geometry, large deformation, material nonlinearity, etc. The reality gap of surrogate models also prevents their further deployment in the soft robotics domain. In this study, we propose a Physics-Informed Neural Network (PINN) named PINN-Ray to model complex deformation of a fin-ray soft robotic gripper, which embeds the minimum potential energy principle from elastic mechanics, together with additional high-fidelity experimental data, into the loss function of the neural network for training. This method offers better generalisation to complex geometry and greater robustness to data scarcity compared with other data-driven neural networks. Furthermore, it has been extensively evaluated to model the deformation of the fin-ray finger under external actuation. PINN-Ray demonstrates improved accuracy compared with finite element modelling (FEM) after applying the data assimilation scheme to treat the sim-to-real gap. Additionally, we introduce an automated framework to design and fabricate soft robotic fingers and characterise their deformation by visual tracking, which provides a guideline for the fast prototyping of soft robots.
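The core idea, a loss built from the minimum potential energy principle plus a data-fitting term, can be sketched in a few lines. For brevity the sketch below uses a 1-D elastic bar (axial stiffness EA, distributed load f) rather than the fin-ray geometry; all constants and weights are assumptions, not the PINN-Ray configuration:

    import torch
    import torch.nn as nn

    EA, f_load = 1.0, 1.0          # axial stiffness and distributed load (assumed)
    W_PHYS, W_DATA = 1.0, 10.0     # loss weights (assumed)

    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    x_col = torch.linspace(0, 1, 100).unsqueeze(1).requires_grad_(True)
    x_obs = torch.tensor([[0.0], [1.0]])     # measured points (assumed)
    u_obs = torch.tensor([[0.0], [0.5]])     # measured displacements (assumed)

    for step in range(2000):
        u = net(x_col)
        du_dx = torch.autograd.grad(u.sum(), x_col, create_graph=True)[0]
        # Total potential energy: internal strain energy minus external work.
        energy = (0.5 * EA * du_dx**2 - f_load * u).mean()
        data_loss = nn.functional.mse_loss(net(x_obs), u_obs)
        loss = W_PHYS * energy + W_DATA * data_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

Minimizing the potential energy term drives the network toward the physically admissible displacement field, while the data term anchors it to the experimental measurements, which is the composition the abstract describes.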
|
|
14:00-15:00, Paper TuPIT3.5 | |
Single Actuator Undulation Soft-Bodied Robots Using a Precompressed Variable Thickness Flexible Beam |
|
Ta, Tung D. | The University of Tokyo |
Keywords: Soft Robot Materials and Design, Mechanism Design, Biologically-Inspired Robots
Abstract: Soft robots, owing to the intrinsic flexibility of their bodies, can adaptively navigate unstructured environments. One of the most popular locomotion gaits implemented in soft robots is undulation. The undulation motion in soft robots resembles the locomotion gait of stringy creatures such as snakes, eels, and C. elegans. Typically, implementing undulation locomotion on a soft robot requires many actuators to control each segment of the stringy body, and the added weight of multiple actuators limits the navigating performance of soft-bodied robots. In this paper, we propose a simple tendon-driven flexible beam with only one actuator (a DC motor) that can generate a mechanical traveling wave along the beam to support the undulation locomotion of soft robots. The beam is precompressed along its vertical axis to form an S-shape, thus pretensioning the tendons. The motor winds and unwinds the tendons to deform the flexible beam and generate traveling waves along the body of the robot. We experiment with different pre-tensions to characterize the relationship between the tendon pre-tension forces and the DC motor winding/unwinding. Our proposal enables a simple implementation of undulation motion to support the locomotion of soft-bodied robots.
|
|
14:00-15:00, Paper TuPIT3.6 | |
CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System |
|
Luo, Lifan | The Hong Kong University of Science and Technology |
Zhang, Boyang | The Hong Kong University of Science and Technology |
Peng, Zhijie | Hong Kong University of Science and Technology |
Cheung, Yik Kin | The Hong Kong University of Science and Technology |
Zhang, Guanlan | The Hong Kong University of Science and Technology |
Li, Zhigang | Hong Kong Univ Sci Tech |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Keywords: Soft Sensors and Actuators
Abstract: As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that employs a compound-eye imaging system to combine near-field 3D visual and tactile sensing within a compact form factor. CompdVision utilizes two types of vision units to address diverse sensing needs, eliminating the need for complex modality conversion. Stereo units with far-focus lenses can see through the transparent elastomer for depth estimation beyond the contact surface. Simultaneously, tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing, proving its capability for reliable external object depth estimation and precise measurement of tangential and normal contact forces. The dual modalities and compact design make the sensor a versatile tool for robotic manipulation.
|
|
14:00-15:00, Paper TuPIT3.7 | |
A Perceptive Pneumatic Artificial Muscle Empowered by Double Helix Fiber Reinforcement |
|
Wang, Yufeng | University of Science and Technology of China |
Wu, Houping | University of Science and Technology of China |
Li, Chenchen | University of Science and Technology of China |
Peng, Yu Lian | University of Science and Technology of China |
Wang, Hongbo | University of Science and Technology of China |
Keywords: Soft Sensors and Actuators, Biologically-Inspired Robots
Abstract: In the last decades, soft robotics has been growing rapidly as an emerging research topic, bringing new paradigms for robotic manipulation, locomotion, and human-machine interaction. Pneumatic artificial muscles are powerful, lightweight, and fast-responding, with great design flexibility, making them promising for developing biological muscle-like robotic systems. The proposed perceptive pneumatic artificial muscle (PPAM) is made of a silicone tube body with double helix coil fiber reinforcement. The double helix coil fiber restricts the radial expansion of the cylindrical tube to achieve extension in actuation, and monitors the muscle length change in real time by measuring its inductance. A finite element model was built to simulate the actuation characteristics of the PPAM. A theoretical formula was derived to analyze the inductive length-sensing response of the double-helix coil on the PPAM. It is verified that the PPAM can sense its length change regardless of whether it is caused by active driving or external manipulation. Rigorous testing reveals that the PPAM has an ultrahigh length-sensing resolution of 5.9 µm in the relaxed state, with a short response time of 50 ms. The self-length sensing of the PPAM is hysteresis-free and highly repeatable, showing no degradation over 1000 operation cycles. In summary, the PPAM shows promising features for developing the next generation of perceptive and responsive soft robots, intelligent hybrid robots, and safer biomedical instruments.
|
|
14:00-15:00, Paper TuPIT3.8 | |
Climbing Gait for a Snake Robot by Adapting to a Flexible Net |
|
Yoshida, Kodai | The University of Electro-Communications |
Tanaka, Motoyasu | The Univ. of Electro-Communications |
Keywords: Biologically-Inspired Robots, Redundant Robots, Climbing Robots
Abstract: This paper introduces a climbing gait for a snake robot that adapts to a flexible net. A net deforms in various ways under external forces, so we tilt the snake robot's body to adapt to the deformation of the net. In addition, we can adjust the position of the head of the snake robot passing through the mesh, allowing the robot to move not only vertically but also horizontally. We demonstrate the validity of the proposed method through experiments on vertical movement and movement in a diagonal direction.
|
|
14:00-15:00, Paper TuPIT3.9 | |
A Biomimetic Robot Crawling Upstream Using Adhesive Suckers Inspired by Net-Winged Midge Larvae |
|
Xu, Haoyuan | Beihang University of Mechanical Engineering and Automation |
Zhao, Shuyong | Beihang University |
Zhi, Jiale | Beihang University |
Bi, Chongze | Beihang University |
Wen, Li | Beihang University |
Keywords: Biologically-Inspired Robots, Biomimetics, Soft Robot Applications
Abstract: Net-winged midge larvae (genus Liponeura) can achieve robust attachment and crawl on slippery surfaces in fast streams with their powerful abdominal suckers. The rigid spine-like structures distributed in the sucker cavity, called microtrichia, have been shown to be crucial in the adhesion process. In this work, we design a biomimetic sucker with spine-like structures and then conduct various tests using the biomimetic suckers to verify the adhesion enhancement brought by the spine-like structures. Finally, we assemble the suckers into a quadruped crawling robot capable of locomotion in both aerial and aquatic environments at a speed of 56.4 mm/s (0.225 BL/s) that can crawl upstream against a turbulent flow at a speed of 38.2 mm/s (0.152 BL/s). This study will inspire biomimetic design in future robotics and pave the way for future robots to realize long-term observation and monitoring in complex environments.
|
|
14:00-15:00, Paper TuPIT3.10 | |
Tension Feedback Control for Musculoskeletal Quadrupedal Locomotion Over Uneven Terrain |
|
Tanaka, Hiroaki | Osaka University |
Matsumoto, Ojiro | Osaka University |
Kawasetsu, Takumi | Osaka University |
Hosoda, Koh | Kyoto University |
Keywords: Biologically-Inspired Robots, Legged Robots, Modeling, Control, and Learning for Soft Robots
Abstract: Musculoskeletal quadruped robots driven by pneumatic artificial muscles (PAMs) are highly compliant. Due to this softness, the proprioceptive information of the PAMs (e.g., tension) reflects information about the environment. However, how to utilize this information for a stable quadrupedal gait has rarely been explored. In this work, we utilize PAM tension for stable locomotion over uneven terrain. We developed a durable tension sensor and propose tension feedback control for quadruped locomotion over uneven terrain. Our proposed controller stabilizes the trunk posture by modulating the phase of the leg. To verify the effectiveness of the proposed controller, we implemented it in a simple quadrupedal model and in a musculoskeletal quadruped robot driven by PAMs, and ran the model and robot over uneven terrain with and without tension feedback. In the experiments, the trunk posture oscillated more stably with tension feedback than without it. Furthermore, in the robot experiments, the running velocity with tension feedback was higher and had a smaller variance than without it. These results will lead to more robust musculoskeletal quadruped robots that can be employed in real-world environments.
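To make the phrase "modulating the phase of the leg" concrete, a toy oscillator whose phase velocity is shifted by the tension error could look like the sketch below (the oscillator form, gain, and reference tension are hypothetical, not the controller reported in the paper):

    import numpy as np

    OMEGA = 2 * np.pi * 2.0      # nominal gait frequency, 2 Hz (assumed)
    K_TENSION = 0.8              # feedback gain (assumed)
    DT = 0.002                   # control period in seconds (assumed)

    def step_phase(phase, tension, tension_ref):
        """Advance the leg phase; speed up or slow down based on tension error."""
        dphase = OMEGA + K_TENSION * (tension - tension_ref)
        return (phase + dphase * DT) % (2 * np.pi)

    phase = 0.0
    for tension in [95.0, 110.0, 120.0]:     # measured PAM tension samples (N)
        phase = step_phase(phase, tension, tension_ref=100.0)
    print(phase)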
|
|
14:00-15:00, Paper TuPIT3.11 | |
An Active and Dexterous Bionic Torso for a Quadruped Robot |
|
Li, Ruyue | Chang'an University |
Zhu, Yaguang | Chang'an University |
Wang, Yuntong | Chang'an University |
He, Zhimin | Chang'an University |
Zhou, Mengnan | Chang'an University |
Keywords: Biologically-Inspired Robots, Biomimetics, Legged Robots
Abstract: The torso of quadruped mammals serves as a robust foundation for their bodies, enabling a diverse array of agile and intricate movements. However, current quadruped robots cannot match the extensive range of motion exhibited by biological torsos while also serving a load-bearing role. Therefore, this paper presents an active bionic torso that emulates the quadrupedal animal's spinal column and associated muscular structure. This innovative torso can mimic animal torso movements, including flexion, extension, lateral bending, and axial rotation. A thorough analysis of the torso's kinetics, workspace, and structural dynamics has been conducted. The proposed torso boasts considerable load-bearing capacity and can support a load that exceeds its own weight tenfold. The passive spring incorporated into the bionic torso emulates the intervertebral discs' shock-absorbing and load-bearing functions. Additionally, this paper documents the development of a quadruped robot fitted with the proposed bionic torso, demonstrating the torso's mobility in a real-world application.
|
|
14:00-15:00, Paper TuPIT3.12 | |
An Agile Robotic Penguin Driven by Submersible Geared Servomotors: Various Maneuvers by Active Feathering of the Wings |
|
Shimooka, Taiki | Institute of Science Tokyo |
Kakogawa, Atsushi | Ritsumeikan University |
Tanaka, Hiroto | Institute of Science Tokyo |
Keywords: Biologically-Inspired Robots, Marine Robotics, Mechanism Design
Abstract: This study introduces an agile robotic penguin featuring a pair of 2-degree-of-freedom (DoF) wing mechanisms. Each wing can independently control flapping and feathering motions with two in-house submersible geared servomotors through a differential gear mechanism. Notably, our mechanism allows unrestricted feathering beyond 360°. Since feathering directly changes the wing's angle of attack (AoA), the hydrodynamic forces can be significantly adjusted to achieve agile maneuvers. This paper demonstrates various maneuvers, including rapid acceleration, hard braking, rolling, pitching, and yawing, achieved solely by changing the feathering motion of both wings. Our robotic penguin reached a maximum forward speed of 1.8 m/s, comparable to the foraging speed of real penguins. The average roll, pitch, and yaw rates were 363°/s, 75°/s, and 92°/s, respectively. This robot serves as a model for the biological study of maneuverability in real penguins and the engineering exploration of bioinspired agile underwater robots.
|
|
14:00-15:00, Paper TuPIT3.13 | |
Loco-Manipulation with Nonimpulsive Contact-Implicit Planning in a Slithering Robot |
|
Salagame, Adarsh | Northeastern University |
Gangaraju, Kruthika | Northeastern University |
Nallaguntla, Harin Kumar | Northeastern University |
Sihite, Eric | California Institute of Technology |
Schirner, Gunar | Northeastern U., Dept. of Electrical and Computer Engineering |
Ramezani, Alireza | Northeastern University |
Keywords: Biologically-Inspired Robots, Optimization and Optimal Control, Motion and Path Planning
Abstract: Object manipulation has been extensively studied in the context of fixed base and mobile manipulators. However, the overactuated locomotion modality employed by snake robots allows for a unique blend of object manipulation through locomotion, referred to as loco-manipulation. The following work presents an optimization approach to solving the loco-manipulation problem based on non-impulsive implicit contact path planning for our snake robot COBRA. We present the mathematical framework and show high-fidelity simulation results and experiments to demonstrate the effectiveness of our approach.
|
|
14:00-15:00, Paper TuPIT3.14 | |
An Ejecting System for Autonomous Takeoff of Flapping-Wing Robots |
|
Jiang, Xu | Southeast University |
Zhang, Jun | Southeast University |
Song, Aiguo | Southeast University |
Keywords: Biologically-Inspired Robots, Field Robots, Mechanism Design
Abstract: Autonomous takeoff of flapping-wing robots (FWRs) is crucial for accelerating response speed and reducing costs when executing tasks. Jumping-aided takeoff is an effective method adopted by birds. However, the limited power density of motors poses challenges in achieving this type of takeoff for FWRs. In this study, we introduce a ground-based FWR ejecting system that utilizes a symmetric slider-crank mechanism (S-SCM) to store energy in a spring; this stored energy is then converted into the takeoff speed of the FWR. The dynamic model of each working stage is established, and the design parameters are optimized according to simulation results. A prototype of the FWR ejecting system is fabricated for experimental validation. The results indicate that the system can provide a takeoff speed of 4 m/s for the 270 g FWR. Notably, the system is deployable on rough terrain and adds only a 3.2 g payload to the robot. Our work advances the autonomous takeoff of FWRs, promoting the application of such robots.
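A quick back-of-the-envelope check of the reported numbers, using only the 270 g mass and 4 m/s takeoff speed from the abstract (the spring stiffness below is an assumed placeholder, not a value from the paper):

    # Kinetic energy needed at launch and the corresponding spring compression
    # for an assumed stiffness, ignoring losses.
    m = 0.270          # FWR mass in kg (from the abstract)
    v = 4.0            # takeoff speed in m/s (from the abstract)
    k = 2000.0         # assumed spring stiffness in N/m

    E_kin = 0.5 * m * v**2             # about 2.16 J
    x = (2 * E_kin / k) ** 0.5         # compression if all energy is stored in the spring
    print(f"required energy: {E_kin:.2f} J, compression: {x*100:.1f} cm")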
|
|
14:00-15:00, Paper TuPIT3.15 | |
A Robust Visual SLAM System for Small-Scale Quadruped Robots in Dynamic Environments |
|
Li, Chengyang | Beijing Institute of Technology |
Zhang, Yulai | Beijing Institute of Technology |
Yu, Zhiqiang | Beijing Institute of Technology |
Liu, Xinming | Beijing Institute of Technology |
Shi, Qing | Beijing Institute of Technology |
Keywords: Biologically-Inspired Robots, Visual-Inertial SLAM, Localization
Abstract: This paper presents a robust visual SLAM system designed for small-scale quadruped robots (ViQu-SLAM) for accurate localization, especially to mitigate the issue of erroneous data association caused by moving objects in dynamic environments. The proposed approach leverages a self-adaptive framework that integrates semantic segmentation with alterations in the spatial location of categorized map points. In addition, the combination of leg odometry derived from forward kinematics with the IMU provides scale information for positional transformations between keyframes, thus improving the overall localization accuracy of quadruped robots. Finally, we evaluated the system across various stages, and the results demonstrate competitive performance, with a 53.16% reduction in average absolute trajectory error compared to ORB-SLAM3 on dynamic benchmark datasets. As a result, ViQu-SLAM, including visual and IMU-fused leg odometry, exhibits promising results on a small quadruped robot, reducing positioning errors in dynamic scenes by an average of 29.36% compared to existing state-of-the-art methods.
|
|
14:00-15:00, Paper TuPIT3.16 | |
Construction of Musculoskeletal Simulation for Shoulder Complex with Ligaments and Its Validation Via Model Predictive Control |
|
Sahara, Yuta | The University of Tokyo |
Miki, Akihiro | The University of Tokyo |
Ribayashi, Yoshimoto | The University of Tokyo |
Yoshimura, Shunnosuke | The University of Tokyo |
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Biomimetics, Modeling and Simulating Humans, Human and Humanoid Motion Analysis and Synthesis
Abstract: The complex ways in which humans utilize their bodies in sports and martial arts are remarkable, and human motion analysis is one of the most effective tools for robot body design and control. On the other hand, motion analysis is not easy, and it is difficult to measure complex body motions in detail due to the influence of numerous muscles and soft tissues, mainly ligaments. In response, various musculoskeletal simulators have been developed and applied to motion analysis and robotics. However, existing simulators reproduce only the muscles and not the ligaments, and none focus on the shoulder complex, including the clavicle and scapula, which is one of the most complex parts of the body. Therefore, in this study, a detailed simulation model of the shoulder complex including ligaments is constructed. The model mimics not only the skeletal structure and muscle arrangement but also the ligament arrangement and maximum muscle strengths. Through model predictive control based on the constructed simulation, we confirmed that the ligaments contribute to joint stabilization in the initial movement and that a proper distribution of maximum muscle forces contributes to equalizing the load on each muscle, demonstrating the effectiveness of this simulation.
|
|
TuPIT4 |
Room 4 |
Visual Learning |
Teaser Session |
Co-Chair: Shafique, Muhammad | New York University Abu Dhabi |
|
14:00-15:00, Paper TuPIT4.1 | |
DECADE: Towards Designing Efficient-Yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems |
|
Shahzad, Muhammad Zaeem | New York University Abu Dhabi |
Hanif, Muhammad Abdullah | New York University Abu Dhabi (NYUAD) |
Shafique, Muhammad | New York University Abu Dhabi |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: The proliferation of smartphones and other mobile devices provides a unique opportunity to make Advanced Driver Assistance Systems (ADAS) accessible to everyone in the form of an application empowered by low-cost Machine/Deep Learning (ML/DL) models to enhance road safety. For the critical feature of Collision Avoidance in Mobile ADAS, lightweight Deep Neural Networks (DNN) for object detection exist, but conventional pixel-wise depth/distance estimation DNNs are vastly more computationally expensive, making them unsuitable for real-time applications on resource-constrained devices. In this paper, we present a distance estimation model, DECADE, that processes each detector output instead of constructing pixel-wise depth/disparity maps. Within it, we propose a pose estimation DNN that estimates the allocentric orientation of detections to supplement the distance estimation DNN in its prediction of distance from bounding box features. We demonstrate that these modules can be attached to any detector to extend object detection with fast distance estimation. Evaluation of the proposed modules, attached to and fine-tuned on the outputs of the YOLO object detector, on the KITTI 3D Object Detection dataset achieves state-of-the-art performance with 1.38 meters in Mean Absolute Error and 7.3% in Mean Relative Error in the distance range of 0-150 meters. Our extensive evaluation scheme not only evaluates class-wise performance but also range-wise accuracy, especially in the critical range of 0-70 m.
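The per-detection design can be pictured as a small regression head attached to detector outputs rather than a dense depth network. The sketch below is an illustrative stand-in; the input features (normalized box geometry plus an orientation estimate) and layer sizes are assumptions, not the DECADE architecture:

    import torch
    import torch.nn as nn

    class DistanceHead(nn.Module):
        """Per-detection distance regressor over bounding-box features."""
        def __init__(self, in_dim=5, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, box_feats):
            # box_feats: [N, 5] = (cx, cy, w, h, allocentric_yaw), all normalized
            return self.mlp(box_feats).squeeze(-1)   # distance in metres

    head = DistanceHead()
    dets = torch.tensor([[0.52, 0.61, 0.08, 0.15, 0.3]])   # one detector output
    print(head(dets))   # untrained output; would be supervised with GT distances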
|
|
14:00-15:00, Paper TuPIT4.2 | |
Masked Mutual Guidance Transformer Tracking |
|
Fan, Baojie | Nanjing University of Posts and Telecommunications |
Wang, Zhiquan | Nanjing University of Posts and Telecommunications |
Ai, Jiajun | Nanjing University of Posts and Telecommunications |
Zhang, Caiyu | Nanjing University of Posts and Telecommunications |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Visual mask learning has received increasing attention in the field of visual object tracking. However, most existing studies merely utilize visual mask learning as a pre-training model without fully exploiting its potential for visual representation. In this paper, we present a novel approach for learning tracking target features, leveraging an encoder-decoder architecture with masked mutual guidance tracking (MMG). Initially, we perform joint visual feature extraction on both the template and search areas. Subsequently, these features undergo separate self-decoding processes, followed by mutual guidance decoding to reconstruct the original search and template images. This process fosters mutual understanding between the images, facilitating improved learning of object states and shapes across different frames. During the inference phase, we offload the decoder and implement a simple and effective tracker. Experimental results indicate that our proposed method is effective and that the mutual guidance strategy achieves state-of-the-art performance on five tracking datasets.
|
|
14:00-15:00, Paper TuPIT4.3 | |
BEV-ODOM: Reducing Scale Drift in Monocular Visual Odometry with BEV Representation |
|
Wei, Yufei | Zhejiang University |
Lu, Sha | Zhejiang University |
Han, Fuzhang | Zhejiang University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Visual Tracking, Localization, SLAM
Abstract: Monocular visual odometry (MVO) is vital in autonomous navigation and robotics, providing a cost-effective and flexible motion tracking solution, but the inherent scale ambiguity in monocular setups often leads to cumulative errors over time. In this paper, we present BEV-ODOM, a novel MVO framework leveraging the Bird's Eye View (BEV) Representation to address scale drift. Unlike existing approaches, BEV-ODOM integrates a depth-based perspective-view (PV) to BEV encoder, a correlation feature extraction neck, and a CNN-MLP-based decoder, enabling it to estimate motion across three degrees of freedom without the need for depth supervision or complex optimization techniques. Our framework reduces scale drift in long-term sequences and achieves accurate motion estimation across various datasets, including NCLT, Oxford, and KITTI. The results indicate that BEV-ODOM outperforms current MVO methods, demonstrating reduced scale drift and higher accuracy.
|
|
14:00-15:00, Paper TuPIT4.4 | |
DailySTR: A Daily Human Activity Pattern Recognition Dataset for Spatio-Temporal Reasoning |
|
Qiu, Yue | National Institute of Advanced Industrial Science and Technology |
Egami, Shusaku | National Institute of Advanced Industrial Science and Technology |
Fukuda, Ken | National Institute of Advanced Industrial Science and Technology |
Miyata, Natsuki | Inst. of Advanced Industrial Sci. & Tech |
Yagi, Takuma | National Institute of Advanced Industrial Science and Technology |
Hara, Kensho | National Institute of Advanced Industrial Science and Technology |
Iwata, Kenji | AIST |
Sagawa, Ryusuke | National Institute of Advanced Industrial Science AndTechnology |
Keywords: Computer Vision for Automation, Visual Learning, Deep Learning for Visual Perception
Abstract: Recognizing daily human activities is essential for domestic robots to assist humans effectively in indoor environments. These activities typically involve sequences of interactions between humans and objects across different locations and times within a household. Identifying these events and understanding their temporal and spatial relationships is crucial for accurately modeling human behavior patterns. However, most current methods and datasets for human activity recognition focus on identifying singular events at specific moments and locations, neglecting the complexity of activities that span multiple times and places. To address this gap, we collected data on human activity patterns over a single day through crowdsourcing. Based on this, we introduce a novel synthetic video question-answering dataset. Our proposed dataset includes videos of daily activities accompanied by question-answer pairs that require models to reason about sequences of activities in both time and space. We evaluated state-of-the-art methods against our dataset, highlighting their limitations in handling the intricate spatio-temporal dynamics of human activity sequences. To improve upon these methods, we propose a two-stage model. The proposed model initially decodes the detailed content of individual videos using a transformer-based approach, then employs LLMs for advanced spatio-temporal reasoning across multiple videos. We hope our research provides valuable benchmarks and insights, paving the way for advancements in the recognition of daily human activity patterns.
|
|
14:00-15:00, Paper TuPIT4.5 | |
Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement |
|
Cai, Yichen | Karlsruhe Institute of Technology |
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Pohl, Christoph | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Deep Learning in Grasping and Manipulation, Imitation Learning, Perception for Grasping and Manipulation
Abstract: Task-oriented object grasping and rearrangement are key skills for robots, which have to perform versatile real-world manipulation tasks. However, they remain challenging due to partial observations of the objects and shape variations in categorical objects. In this paper, we present the Multi-feature Implicit Model (MIMO), a novel object representation that encodes multiple spatial features between a point and an object in an implicit neural field. Training such a model on multiple features ensures that it embeds the object shapes consistently in different aspects, thus improving its performance in object shape reconstruction from partial observation, shape similarity measure, and modeling spatial relations between objects. Based on MIMO, we propose a framework to learn task-oriented object grasping and rearrangement from single or multiple human demonstration videos. The evaluations in simulation show that our approach outperforms the state-of-the-art methods for multi- and single-view observations. Real-world experiments demonstrate the efficacy of our approach in one- and few-shot imitation learning of manipulation tasks.
|
|
14:00-15:00, Paper TuPIT4.6 | |
TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization |
|
Tan, Zhen | National University of Defense Technology |
Zhou, Zongtan | National University of Defense Technology |
Ge, Yangbing | National University of Defense Technology |
Wang, Zi | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Hu, Dewen | National University of Defense Technology |
Keywords: Visual Learning, RGB-D Perception, Deep Learning for Visual Perception
Abstract: The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Field (NeRF) models for 3D reconstruction and SLAM tasks. Existing methods introduce monocular depth priors to jointly optimize the camera poses and NeRF, but they fail to fully exploit the depth priors and neglect the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing the learnable parameters of the radiance field and the camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
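The first advancement, sampling ray depths from a truncated normal centred on a monocular depth prior instead of uniformly over the near-far interval, can be sketched with an inverse-CDF draw. The truncation bounds and standard deviation below are illustrative assumptions, not the paper's settings:

    import torch

    def sample_depths(depth_prior, sigma, near, far, n_samples):
        """Per-ray depth samples from a normal truncated to [near, far]."""
        loc = depth_prior.unsqueeze(1)                    # [N, 1]
        normal = torch.distributions.Normal(loc, sigma)
        cdf_lo = normal.cdf(torch.full_like(loc, near))   # CDF at the bounds
        cdf_hi = normal.cdf(torch.full_like(loc, far))
        u = torch.rand(loc.shape[0], n_samples)           # uniform draws per ray
        # Map uniforms into the truncated CDF range, then invert the CDF.
        return normal.icdf(cdf_lo + u * (cdf_hi - cdf_lo))

    priors = torch.tensor([2.3, 5.1])      # monocular depth prior per ray (m)
    z = sample_depths(priors, sigma=0.3, near=0.1, far=10.0, n_samples=8)
    print(z.shape)                          # torch.Size([2, 8])

Concentrating samples around the prior spends ray samples where the surface is likely to be, which is what speeds up the joint pose and radiance field optimization.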
|
|
14:00-15:00, Paper TuPIT4.7 | |
Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning |
|
Qian, Yilue | Peking University |
Yu, Peiyu | UCLA |
Wu, Ying Nian | University of California, Los Angeles |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Wang, Wei | Beijing Institute for General Artificial Intelligence |
Fan, Lifeng | University of California, Los Angeles |
Keywords: Visual Learning, Learning Categories and Concepts, Task Planning
Abstract: Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual planning. Empirically, we show that our framework can generalize to unseen task trajectories, unseen object categories, and real-world data. Further details of this work are provided at https://fqyqc.github.io/ConTranPlan/.
|
|
14:00-15:00, Paper TuPIT4.8 | |
Sim-To-Real Domain Shift in Online Action Detection |
|
Patsch, Constantin | Technical University of Munich |
Torjmene, Wael | Technical University of Munich |
Zakour, Marsil | Technical University of Munich |
Wu, Yuankai | TUM |
Salihu, Driton | Technical University Munich |
Steinbach, Eckehard | Technical University of Munich |
Keywords: Visual Learning, Datasets for Human Motion, Simulation and Animation
Abstract: Human reasoning comprises the ability to understand and reason about the current action solely based on past information. To provide effective assistance in an eldercare or household environment, an assistive robot or intelligent assistive system has to assess human actions correctly. Based on this premise, the task of online action detection determines the current action solely from the past, without access to future information. During inference, the performance of the model is largely impacted by the attributes of the underlying training dataset. However, as high costs and ethical concerns are associated with the real-world data collection process, synthetically created data provides a way to mitigate these problems while providing additional data for training the underlying action detection model to improve performance. Due to the inherent domain shift between synthetic and real data, we introduce a new egocentric dataset called Human Kitchen Interactions (HKI) to investigate the sim-to-real gap. Our dataset contains a total of 100 synthetic and real videos in which 21 different actions are executed in a kitchen environment. The synthetic data is acquired in an egocentric virtual reality (VR) setup while capturing the virtual environment in a game engine. We evaluate state-of-the-art online action detection models on our dataset and provide insights into the sim-to-real domain shift. Upon acceptance, we will release our dataset and the corresponding features at https://c-patsch.github.io/HKI/.
|
|
14:00-15:00, Paper TuPIT4.9 | |
STAIR: Semantic-Targeted Active Implicit Reconstruction |
|
Jin, Liren | University of Bonn |
Kuang, Haofei | University of Bonn |
Pan, Yue | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Popovic, Marija | TU Delft |
Keywords: Visual Learning, Perception-Action Coupling, Motion and Path Planning
Abstract: Many autonomous robotic applications require object-level understanding when deployed. Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is therefore relevant for a robot to perform downstream tasks in an initially unknown environment. In this work, we propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input. The key components of our framework are a semantic implicit neural representation and a compatible planning utility function based on semantic rendering and uncertainty estimation, enabling adaptive view planning to target objects of interest. Our planning approach achieves better reconstruction performance in terms of mesh and novel view rendering quality compared to implicit reconstruction baselines that do not consider semantics for view planning. Our framework further outperforms a state-of-the-art semantic-targeted active reconstruction pipeline based on explicit maps, justifying our choice of utilising implicit neural representations to tackle semantic-targeted active reconstruction problems.
|
|
14:00-15:00, Paper TuPIT4.10 | |
VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation |
|
Wang, Weiyao | The Johns Hopkins University |
Lei, Yutian | Baidu |
Jin, Shiyu | Baidu |
Hager, Gregory | Johns Hopkins University |
Zhang, Liangjun | Baidu |
Keywords: Visual Learning, Deep Learning in Grasping and Manipulation, Imitation Learning
Abstract: In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code can be found at our project site: https://vihe-3d.github.io.
|
|
14:00-15:00, Paper TuPIT4.11 | |
Simultaneous Super-Resolution and Depth Estimation for Satellite Images Based on Diffusion Model |
|
Zhou, Yuwei | Rochester Institute of Technology |
Lee, Yangming | Rochester Institute of Technology |
Keywords: Visual Learning, RGB-D Perception
Abstract: Satellite images provide an effective way to observe the Earth's surface on a large scale. 3D landscape models can provide critical structural information, such as forestry and crop growth. However, there has been very limited research on estimating depth and 3D models of the Earth from satellite images. LiDAR measurements on satellites are usually quite sparse. RGB images have higher resolution than LiDAR, but there has been little research on 3D surface measurement based on satellite RGB images, and in comparison with in-situ sensing, satellite RGB images are usually of low resolution. In this research, we explore a method that enhances satellite image resolution to generate super-resolution images and then conducts depth estimation and 3D reconstruction based on the higher-resolution satellite images. Leveraging the strong generation capability of diffusion models, we developed a simultaneous diffusion-model learning framework that trains diffusion models for both super-resolution and depth estimation. With the super-resolution images and the corresponding depth maps, 3D surface reconstruction models with detailed landscape information can be generated. We evaluated the proposed methodology on multiple satellite datasets for both the super-resolution and depth estimation tasks, demonstrating the effectiveness of our methodology.
|
|
14:00-15:00, Paper TuPIT4.12 | |
Contrastive Mask Denoising Transformer for 3D Instance Segmentation |
|
Wang, He | Zhejiang University |
Lin, Minshen | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Visual Learning
Abstract: In transformer-based methods for point cloud instance segmentation, bipartite matching is used to establish one-to-one correspondences between predictions and ground truths. However, in early training stages, matches can be unstable and inconsistent between epochs, requiring the model to frequently adjust its learning path and thus reducing the quality of model convergence. To address this challenge, we propose the contrastive mask denoising transformer for 3D instance segmentation, which utilizes a mask denoising module to guide the model toward a more stable optimization path in the early training stages. Furthermore, we introduce a multi-pattern-aware query selection module to help the model learn multiple patterns at one position so that clustered objects can be discerned. In addition, the proposed modules are "plug and play" and can easily be integrated into transformer-based architectures. Experimental results on the ScanNetv2 dataset show that the proposed modules improve the performance of multiple pipelines, notably achieving +1.0 mAP on the main pipeline.
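For context, the bipartite matching whose early-training instability the denoising module is designed to work around is typically a Hungarian assignment over a classification-plus-mask cost. The sketch below is illustrative only; the cost weights and toy masks are assumptions, not the paper's formulation:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match(pred_cls_prob, pred_masks, gt_labels, gt_masks, w_cls=1.0, w_mask=1.0):
        """Return (pred_idx, gt_idx) pairs from Hungarian matching."""
        n_pred, n_gt = pred_masks.shape[0], gt_masks.shape[0]
        cost = np.zeros((n_pred, n_gt))
        for i in range(n_pred):
            for j in range(n_gt):
                cls_cost = -pred_cls_prob[i, gt_labels[j]]
                inter = np.logical_and(pred_masks[i], gt_masks[j]).sum()
                union = np.logical_or(pred_masks[i], gt_masks[j]).sum()
                mask_cost = 1.0 - inter / max(union, 1)        # 1 - IoU
                cost[i, j] = w_cls * cls_cost + w_mask * mask_cost
        return linear_sum_assignment(cost)

    # Toy example: 3 predicted masks over 100 points, 2 ground-truth instances.
    rng = np.random.default_rng(0)
    pred_masks = rng.random((3, 100)) > 0.5
    gt_masks = rng.random((2, 100)) > 0.5
    pred_cls = rng.dirichlet(np.ones(5), size=3)    # 5 classes
    print(match(pred_cls, pred_masks, np.array([1, 3]), gt_masks))

Feeding noised copies of the ground truth as extra queries with known assignments, as in denoising training, sidesteps this matching for those queries and stabilizes the early optimization path.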
|
|
14:00-15:00, Paper TuPIT4.13 | |
FlowTrack: Point-Level Flow Network for 3D Single Object Tracking |
|
Li, Shuo | Northeastern University |
Cui, Yubo | Northeastern University |
Li, Zhiheng | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Visual Learning, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Traditional motion-based approaches achieve target tracking by estimating the relative movement of the target between two consecutive frames. However, they usually overlook local motion information of the target and fail to exploit historical frame information effectively. To overcome these limitations, we propose a point-level flow method with multi-frame information for the 3D single object tracking (SOT) task, called FlowTrack. Specifically, by estimating the flow of each point in the target, our method can capture the local motion details of the target, thereby improving the localization accuracy for a specific object. At the same time, to handle scenes with sparse points, we present a learnable target feature as a bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head to transform dense point-level flow into instance-level motion, effectively aggregating local motion information to obtain the global target motion. Finally, our method achieves competitive performance, with improvements of 5.9% on KITTI and 2.9% on NuScenes compared to the next best method.
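The step from dense point-level flow to a single instance-level motion can be pictured as fitting one rigid transform to all flowed points. The closed-form least-squares (Kabsch) fit below illustrates the aggregation idea only; FlowTrack's Instance Flow Head is a learned module, not this solver:

    import numpy as np

    def rigid_from_flow(points, flow):
        """Fit R, t such that R @ p + t approximates p + flow for all points."""
        src = points
        dst = points + flow
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    pts = np.random.rand(50, 3)
    flow = np.tile([0.1, 0.0, 0.0], (50, 1))   # pure translation along x
    R, t = rigid_from_flow(pts, flow)
    print(np.round(R, 3), np.round(t, 3))      # identity rotation, t = [0.1, 0, 0]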
|
|
14:00-15:00, Paper TuPIT4.14 | |
Reinforcement Learning with Generalizable Gaussian Splatting |
|
Wang, Jiaxu | Hong Kong University of Science and Technology (Guangzhou) |
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Sun, Jingkai | The Hong Kong University of Science and Technology(GZ) |
Cao, Jiahang | The Hong Kong University of Science and Technology (Guangzhou) |
Han, Gang | PND Robotics |
Zhao, Wen | Nankai University |
Zhang, Weining | Beijing Innovation Center of Humanoid Robotics |
Shao, Yecheng | Zhejiang University |
Guo, Yijie | UBTECH Robotics |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Deep Learning Methods, Reinforcement Learning
Abstract: An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations have several drawbacks: they either cannot describe complex local geometries, fail to generalize well to unseen scenes, or require precise foreground masks. Moreover, implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework as the representation for RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving performance by 10%, 44%, and 15% over the baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.
|
|
14:00-15:00, Paper TuPIT4.15 | |
Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Network |
|
Cheng, Hao | The Hong Kong University of Science and Technology (Guangzhou) |
Cao, Jiahang | The Hong Kong University of Science and Technology (Guangzhou) |
Xiao, Erjia | The Hong Kong University of Science and Technology (Guangzhou) |
Sun, Mengshu | Beijing University of Technology |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Recognition, Bioinspired Robot Learning
Abstract: Deploying energy-efficient deep learning algorithms on computationally limited devices, such as robots, is still a pressing issue for real-world applications. Spiking Neural Networks (SNNs), a novel brain-inspired algorithm, offer a promising solution due to their low-latency and low-energy properties compared to traditional Artificial Neural Networks (ANNs). Despite these advantages, the dense structure of deep SNNs can still result in extra energy consumption. The Lottery Ticket Hypothesis (LTH) posits that within dense neural networks there exist winning Lottery Tickets (LTs), namely sub-networks, that can be obtained without compromising performance. Inspired by this, this paper delves into spiking-based LTs (SLTs), examining their unique properties and potential for extreme efficiency. Two significant sparse rewards are then gained through comprehensive explorations and meticulous experiments on SLTs across various dense structures. Moreover, a sparse algorithm tailored to the spiking transformer structure, which incorporates convolution operations into the Patch Embedding Projection (ConvPEP) module, is proposed to achieve Multi-level Sparsity (MultiSp). MultiSp refers to (1) patch number sparsity; (2) ConvPEP weight sparsity and binarization; and (3) ConvPEP activation layer binarization. Extensive experiments demonstrate that our method achieves extreme sparsity with only a slight performance decrease, paving the way for deploying energy-efficient neural networks in robotics and beyond.
|
|
14:00-15:00, Paper TuPIT4.16 | |
Uncertainty-Aware Semi-Supervised Semantic Key Point Detection Via Bundle Adjustment |
|
Li, Kai | Zhejiang University, Westlake University |
Zhang, Yin | WestLake University |
Zhao, Shiyu | Westlake University |
Keywords: Visual Learning, Object Detection, Segmentation and Categorization, Recognition
Abstract: Visual relative pose estimation is widely used in multi-robot systems. While semantic key points offer a promising solution for 6DoF pose estimation, manual data labeling for network training remains unavoidable. In this paper, we introduce a novel method that jointly estimates the semantic key point detection model and the 6DoF camera pose. Our key idea is to leverage the 3D-2D projection to produce pseudo labels for detection model training while taking the key point predictions as landmarks for 6DoF camera pose estimation. Compared with state-of-the-art works, our method eliminates the need for calibration and time synchronization of multi-camera systems, requiring only a handful of manually labeled data, which significantly improves training efficiency. Experiments validate the effectiveness and practicality of our method on public datasets and in real-world robotic applications. Code and data are made available.
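The pseudo-labeling step, projecting known 3D semantic keypoints through an estimated camera pose to obtain 2D training labels, can be sketched as below. The intrinsics and pose values are placeholders, not calibration from the paper:

    import numpy as np

    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])              # pinhole intrinsics (assumed)

    def project(points_w, R_cw, t_cw):
        """Project world-frame 3D points into pixel coordinates."""
        p_cam = (R_cw @ points_w.T + t_cw.reshape(3, 1)).T   # world -> camera
        uv = (K @ p_cam.T).T
        return uv[:, :2] / uv[:, 2:3]                         # perspective divide

    keypoints_3d = np.array([[0.1, 0.0, 2.0], [-0.1, 0.05, 2.2]])
    R = np.eye(3)                    # placeholder camera rotation
    t = np.zeros(3)                  # placeholder camera translation
    pseudo_labels = project(keypoints_3d, R, t)
    print(pseudo_labels)             # pixel coordinates used as pseudo ground truth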
|
|
TuPIT5 |
Room 5 |
Deep Learning I |
Teaser Session |
Chair: Ogata, Tetsuya | Waseda University |
|
14:00-15:00, Paper TuPIT5.1 | |
X-Neuron: Interpreting, Locating and Editing of Neurons in Reinforcement Learning Policy |
|
Ge, Yuhong | Tsinghua University, Shanghai Artificial Intelligence Laboratory |
Zhao, Xun | Shanghai AI Laboratory |
Pang, Jiangmiao | Shanghai AI Laboratory |
Zhao, Mingguo | Tsinghua University |
Lin, Dahua | The Chinese University of Hong Kong |
Keywords: AI-Based Methods, Reinforcement Learning, Human Factors and Human-in-the-Loop
Abstract: Despite the impressive performance of Reinforcement Learning (RL), the black-box neural network backbone hinders users from trusting and deploying trained agents in real-world applications where safety is crucial. To make agents more trustworthy and controllable, for a given RL-trained policy we propose to enhance its interpretability and make it human-controllable without retraining. We accomplish this goal by following a three-step pipeline: 1) we interpret neurons by analyzing their causal effect on kinematic attributes; 2) to help agents unlock novel skills and enable humans to assist agents in accomplishing tasks, we locate the X-neuron, the optimal neuron capable of evoking the desired behavior; and 3) we edit its activation values to achieve precise control. We evaluate our method on various RL tasks ranging from autonomous driving to robot locomotion, and the results show that our approach outperforms previous work on almost all evaluation metrics. By enhancing interpretability and introducing human control, the agents can improve safety and performance, even in unseen environments and novel tasks. For locomotion robots trained simply to walk forward, our method unlocks diverse controllable behaviors ranging from jumping to backflips.
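In practice, editing a located neuron in a trained policy without retraining can be done by overriding its activation in the forward pass. The sketch below uses a PyTorch forward hook; the policy architecture, layer index, neuron index, and target value are placeholder assumptions, not the paper's setup:

    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(),
                           nn.Linear(64, 64), nn.Tanh(),
                           nn.Linear(64, 2))
    X_NEURON = 17          # index of the located neuron (hypothetical)
    TARGET_VALUE = 1.5     # activation that evokes the desired behavior (hypothetical)

    def edit_neuron(module, inputs, output):
        # Returning a tensor from a forward hook replaces the layer's output.
        output = output.clone()
        output[..., X_NEURON] = TARGET_VALUE
        return output

    handle = policy[2].register_forward_hook(edit_neuron)   # hook one hidden layer
    obs = torch.randn(1, 8)
    action = policy(obs)     # forward pass with the edited activation
    handle.remove()          # restore the original policy behavior
    print(action)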
|
|
14:00-15:00, Paper TuPIT5.2 | |
Binary Amplitude-Only Hologram Design for Acoustic End-Effector Construction by Physics-Based Deep Learning |
|
Liu, Qing | ShanghaiTech University |
Su, Hu | Institute of Automation, Chinese Academy of Science |
Li, Jiaqi | ShanghaiTech University |
Li, Y.F. | City University of Hong Kong |
Zhang, Zhiyuan | Acoustic Robotics Systems Laboratory, Institute of Robotics And |
Liu, Song | ShanghaiTech University |
Keywords: Deep Learning Methods, Grippers and Other End-Effectors, Micro/Nano Robots
Abstract: Acoustic holography has emerged as a cutting-edge technique for constructing a micro-robot acoustic end-effector for non-contact manipulation. With favorable features such as sufficient penetration depth, high spatial resolution, non-injurious operation, and an unrestricted working environment, it exhibits promising potential in applications encompassing biomedicine, in-vivo surgery, and lab-on-a-chip systems. With its simple structure, the Binary Amplitude-Only Hologram (BAOH) is highly compatible with acoustic metasurfaces, enabling exquisite modulation of the acoustic field with straightforward fabrication at relatively low cost. However, existing research mainly focuses on hardware implementation rather than algorithmic design. In the present study, we propose a deep-learning-based BAOH generation algorithm for constructing a precise, reconfigurable, and high-resolution acoustic end-effector. Specifically, we incorporate an acoustic wave propagation model into the deep neural network, bypassing the laborious collection of labeled data and enabling the model to learn the inverse mapping. Additionally, differentiable binarization with an adaptive threshold is embedded into the framework to circumvent gradient invalidation and alleviate the information loss caused by the binarization process. Simulation experiments show that the proposed algorithm can predict BAOHs that support precise, robust, versatile, and real-time construction of acoustic end-effectors, with broad prospects in applications related to micro-robotic manipulation.
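One common way to keep gradients valid through a binarization step, and the general shape of the "differentiable binarization with adaptive threshold" mentioned above, is to replace the hard step with a steep sigmoid around a learnable threshold. The slope and parameterization below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def soft_binarize(amplitude: torch.Tensor, threshold: torch.Tensor, k: float = 50.0) -> torch.Tensor:
    """Differentiable approximation of step(amplitude - threshold)."""
    return torch.sigmoid(k * (amplitude - threshold))

# At inference time the soft mask can be hard-thresholded to a binary hologram:
# baoh = (amplitude > threshold).float()
```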
|
|
14:00-15:00, Paper TuPIT5.3 | |
Active Propulsion Noise Shaping for Multi-Rotor Aircraft Localization |
|
Serussi, Gabriele | Technion Institute of Technology |
Shor, Tamir | Technion Institute of Technology |
Hirshberg, Tom | Technion |
Baskin, Chaim | Technion Institute of Technology |
Bronstein, Alexander | TECHNION |
Keywords: Deep Learning Methods, Machine Learning for Robot Control, Kinematics
Abstract: Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benefits of lower system cost and energy footprint, which is especially important for micro aircraft. This paper proposes actively controlling and shaping the aircraft propulsion noise generated by the rotors to benefit localization tasks, rather than considering it a harmful nuisance. We present a neural network architecture for self-noise-based localization in a known environment. We show that training it simultaneously with learning time-varying rotor phase modulation achieves accurate and robust localization. The proposed methods are evaluated using a computationally affordable simulation of MAV rotor noise in 2D acoustic environments that is fitted to real recordings of rotor pressure fields. Code and data accompany the paper.
|
|
14:00-15:00, Paper TuPIT5.4 | |
VoxelContrast: Voxel Contrast-Based Unsupervised Learning for 3D Point Clouds |
|
Qin, Yuxiang | Tongji University |
Sun, Hao | National University of Singapore |
Keywords: Deep Learning Methods, Representation Learning, Object Detection, Segmentation and Categorization
Abstract: The annotation process for 3D point cloud data is more complex than for image data, and training with a small amount of annotated data significantly reduces the performance of deep learning models. Unsupervised learning can better utilize large amounts of unlabeled point cloud data for model pretraining, thereby achieving excellent performance on small-scale datasets. However, many existing 3D point cloud unsupervised learning methods are primarily focused on single-object CAD point clouds and may not be suitable for larger-scale autonomous driving LiDAR point clouds. To address this challenging problem, we propose a voxel contrast-based unsupervised learning method (VoxelContrast), which adapts well to different types of point cloud data through voxelization and can be seamlessly integrated with existing model frameworks. Specifically, we utilize voxelization methods to preprocess point cloud data. Then, we incorporate voxel information into contrastive learning, facilitating the creation of more meaningful positive and negative sample pairs. Finally, we conduct unsupervised training of the model using instance discrimination as the proxy task. Our method was validated in two downstream tasks: point cloud shape classification and 3D object detection. Experimental results demonstrated that models pretrained using a substantial amount of unlabeled data can further enhance the effectiveness of existing supervised learning methods.
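The overall shape of voxel-level contrastive pretraining can be sketched as follows: assign points to voxels, pool point features per voxel (pooling omitted here for brevity), and apply an InfoNCE loss in which matching voxels across two augmented views are positives. This is a generic illustration, not the paper's pipeline; the toy voxel hash and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def voxel_ids(points: torch.Tensor, voxel_size: float) -> torch.Tensor:
    """Map each point to an integer voxel key (toy hash of its grid coordinates)."""
    grid = torch.floor(points[:, :3] / voxel_size).long()
    return grid[:, 0] * 1_000_000 + grid[:, 1] * 1_000 + grid[:, 2]

def voxel_info_nce(feats_a: torch.Tensor, feats_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """feats_a[i] and feats_b[i] come from the same voxel in two views (positive pairs)."""
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / tau
    targets = torch.arange(a.shape[0], device=a.device)
    return F.cross_entropy(logits, targets)
```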
|
|
14:00-15:00, Paper TuPIT5.5 | |
Improving Out-Of-Distribution Generalization of Trajectory Prediction for Autonomous Driving Via Polynomial Representations |
|
Yao, Yue | Freie Universität Berlin & Continental AG |
Yan, Shengchao | University of Freiburg |
Goehring, Daniel | Freie Universität Berlin |
Burgard, Wolfram | University of Technology Nuremberg |
Reichardt, Joerg | Continental AG |
Keywords: Deep Learning Methods, Performance Evaluation and Benchmarking, Autonomous Vehicle Navigation
Abstract: Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets. We introduce a novel prediction algorithm based on polynomial representations for agent trajectory and road geometry on both the input and output sides of the model. With a much smaller model size, training effort, and inference time, we reach near SotA performance for ID testing and significantly improve robustness in OoD testing. Within our OoD testing protocol, we further study two augmentation strategies of SotA models and their effects on model generalization. Highlighting the contrast between ID and OoD performance, we suggest adding OoD testing to the evaluation criteria of trajectory prediction models.
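The polynomial representation idea can be illustrated in a few lines: a sampled trajectory is compressed into a small set of coefficients per axis and can be evaluated at arbitrary times. The degree below is an arbitrary example, and this sketch is not the paper's model.

```python
import numpy as np

def to_poly(t: np.ndarray, xy: np.ndarray, degree: int = 5):
    """Fit x(t) and y(t) polynomials to a trajectory sampled at times t."""
    return np.polyfit(t, xy[:, 0], degree), np.polyfit(t, xy[:, 1], degree)

def from_poly(coeffs_x, coeffs_y, t_query: np.ndarray) -> np.ndarray:
    """Evaluate the compact representation back into trajectory points."""
    return np.stack([np.polyval(coeffs_x, t_query),
                     np.polyval(coeffs_y, t_query)], axis=1)
```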
|
|
14:00-15:00, Paper TuPIT5.6 | |
Real-Time Coordinated Motion Generation: A Hierarchical Deep Predictive Learning Model for Bimanual Tasks |
|
Shikada, Genki | Waseda University |
Armleder, Simon | Technische Universität München |
Ito, Hiroshi | Hitachi, Ltd |
Cheng, Gordon | Technical University of Munich |
Ogata, Tetsuya | Waseda University |
Keywords: Deep Learning Methods, Bimanual Manipulation, Sensorimotor Learning
Abstract: Robots that autonomously operate in human living environments require the ability to adapt to unpredictable changes and flexibly handle a variety of tasks. Particularly, coordinated bimanual motions are essential for enabling tasks that are difficult with just one hand, such as grasping bulky objects, transporting heavy loads, and precision work. Traditional methods of generating robot motions typically involve executing pre-programmed motions, making it challenging to adapt to complex and unpredictable environmental changes. To address this issue, our research focuses on generating diverse motions that can flexibly adapt to environmental changes based on Deep Predictive Learning from a small amount of real-world data. Previous Deep Predictive Learning models have generated the motions of a robot's left and right arms with a single LSTM, making it difficult to operate them independently. Therefore, we propose a new hierarchical Deep Predictive Learning model specialized for generating coordinated bimanual motions. This model comprises three components: a Left-LSTM, which learns the body and visual information on the robot's left side, a Right-LSTM that performs a similar function for the right side, and a Union-LSTM which integrates this information at a higher level. To verify the effectiveness of the proposed model, we conducted bimanual grasping experiments with multiple different objects using two different robots. The experimental results showed that, independent of hardware, our model achieved a higher success rate than the traditional approach, indicating its enhanced capability in coordinating bimanual motions.
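A structural sketch of the three-level wiring described above (Left-LSTM, Right-LSTM, Union-LSTM) is given below; layer sizes, the prediction heads, and the way the union state is combined are assumptions made for clarity, not the authors' architecture.

```python
import torch
import torch.nn as nn

class BimanualHierarchy(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64, union_hidden: int = 64):
        super().__init__()
        self.left = nn.LSTM(obs_dim, hidden, batch_first=True)     # left-side body + vision
        self.right = nn.LSTM(obs_dim, hidden, batch_first=True)    # right-side body + vision
        self.union = nn.LSTM(2 * hidden, union_hidden, batch_first=True)  # higher-level integration
        self.head_left = nn.Linear(hidden + union_hidden, obs_dim)
        self.head_right = nn.Linear(hidden + union_hidden, obs_dim)

    def forward(self, left_obs: torch.Tensor, right_obs: torch.Tensor):
        hl, _ = self.left(left_obs)
        hr, _ = self.right(right_obs)
        hu, _ = self.union(torch.cat([hl, hr], dim=-1))
        # Predict the next sensorimotor state for each arm from its own and the shared context.
        return (self.head_left(torch.cat([hl, hu], dim=-1)),
                self.head_right(torch.cat([hr, hu], dim=-1)))
```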
|
|
14:00-15:00, Paper TuPIT5.7 | |
An LSTM-Based Model to Recognize Driving Style and Predict Acceleration |
|
Lu, Jiaxing | Oklahoma State University |
Hossain, Sanzida | Oklahoma State University |
Sheng, Weihua | Oklahoma State University |
Bai, He | Oklahoma State University |
Keywords: Deep Learning Methods
Abstract: To ensure safe cooperative driving in mixed traffic with both manned and unmanned vehicles, it is crucial to understand and model the driving styles of human drivers. This paper explores how to accurately recognize driving style and use it to predict vehicle motion, which enables better performance in cooperative driving. A simulation testbed that consists of a driving simulator and a copilot is first introduced for the purpose of data collection and testing. A Long Short-Term Memory (LSTM)-based network that models human driving styles and predicts driving acceleration is developed. Standalone tests are conducted to examine the model performance in the simulation testbed. Finally, the model is evaluated in a series of merging experiments that involve five vehicles.
|
|
14:00-15:00, Paper TuPIT5.8 | |
Loss Distillation Via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance |
|
Lin, Fangzhou | Tohoku University |
Liu, Haotian | Worcester Polytechnic Institute |
Zhou, Haoying | Worcester Polytechnic Institute |
Hou, Songlin | Dell Technologies |
Yamada, Kazunori | Tohoku University |
Fischer, Gregory Scott | Worcester Polytechnic Institute, WPI |
Li, Yanhua | Worcester Polytechnic Institute |
Zhang, Haichong | Worcester Polytechnic Institute |
Zhang, Ziming | Worcester Polytechnic Institute |
Keywords: Deep Learning Methods, Recognition
Abstract: 3D point clouds enhance a robot's ability to perceive the geometric structure of its environment, enabling many downstream tasks such as grasp pose detection and scene understanding. The performance of these tasks, though, heavily relies on the quality of the data input, as incomplete data can lead to poor results and failure cases. Recent training loss functions designed for deep learning-based point cloud completion, such as the Chamfer distance (CD) and its variants (e.g., HyperCD), imply that a good gradient weighting scheme can significantly boost performance. However, these CD-based loss functions usually require data-related parameter tuning, which can be time-consuming for data-extensive tasks. To address this issue, we aim to find a family of weighted training losses (weighted CD) that requires no parameter tuning. To this end, we propose a search scheme, Loss Distillation via Gradient Matching, to find good candidate loss functions by mimicking the learning behavior in backpropagation between HyperCD and weighted CD. Once this is done, we propose a novel bilevel optimization formula to train the backbone network based on the weighted CD loss. We observe that: (1) with proper weighting functions, the weighted CD can always achieve performance similar to HyperCD, and (2) the Landau weighted CD, namely Landau CD, can outperform HyperCD for point cloud completion and lead to new state-of-the-art results on several benchmark datasets. Our demo code is available at https://github.com/Zhang-VISLab/IROS2024-LossDistillationWeightedCD.
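The family of weighted Chamfer distances discussed above has the general form "apply a monotone weighting function to each nearest-neighbour distance before averaging". The sketch below shows that skeleton with a placeholder weighting (log1p); the Landau weighting reported in the paper is not reproduced here.

```python
import torch

def weighted_chamfer(pred: torch.Tensor, gt: torch.Tensor, w=torch.log1p) -> torch.Tensor:
    """pred: (N, 3), gt: (M, 3); w is a monotone weighting applied to each NN distance."""
    d = torch.cdist(pred, gt)                      # (N, M) pairwise distances
    return w(d.min(dim=1).values).mean() + w(d.min(dim=0).values).mean()
```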
|
|
14:00-15:00, Paper TuPIT5.9 | |
Event-Based Few-Shot Fine-Grained Human Action Recognition |
|
Yang, Zonglin | Beijing Institute of Technology |
Yang, Yan | The Australian National University |
Shi, Yuheng | Beijing Institute of Technology |
Yang, Hao | Beijing Institute of Technology |
Zhang, Ruikun | Beijing Institute of Technology |
Liu, Liu | Huawei |
Wu, Xinxiao | Beijing Institute of Technology |
Pan, Liyuan | Beijing Institute of Technology |
Keywords: Deep Learning Methods
Abstract: Few-shot fine-grained human (FGH) action recognition is crucial in the context of human-robot interaction within open-set real-world environments. Existing works mainly focus on features extracted from RGB frames. However, their performance is drastically degraded in challenging scenarios, such as high-dynamic or low lighting conditions. Event cameras can independently and sparsely capture brightness changes in a scene at microsecond resolution and high dynamic range, which offers a promising solution. However, the modality differences between events and RGB frames, and the lack of paired fine-grained data, hinder the development of event-based FGH action recognition. Therefore, in this paper, we introduce the first Event Camera Fine-grained Human Action (E-FAction) dataset. This dataset comprises 3304 paired event streams and RGB sequences, covering 15 coarse action classes and 128 fine-grained actions. Then, we develop a versatile event feature extractor. Considering the spatial sparsity of the event stream, we design two modules to mine the temporal motion and semantic features under the guidance of paired RGB frames, facilitating robust weight initialization for the feature extractor in few-shot FGH action recognition. We conduct extensive experiments on both published datasets and our own synthetic and real datasets, and consistently achieve state-of-the-art performance compared to existing baselines. Code and dataset will be available at link.
|
|
14:00-15:00, Paper TuPIT5.10 | |
FI-SLAM: Feature Fusion and Instance Reconstruction for Neural Implicit SLAM |
|
Wang, Xingshuo | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Zhang, Zhiyao | Northeastern University |
Wang, Mengting | Northeastern University |
Li, Zhiteng | Northeastern University |
Chen, Xuanhua | Northeastern University |
Keywords: Deep Learning Methods, Semantic Scene Understanding, SLAM
Abstract: Recent advancements in neural implicit fields for Simultaneous Localization and Mapping (SLAM) have provided breakthroughs. However, the benefits of the reconstruction results to the perception ability of robots are minimal. Therefore, we propose FI-SLAM, a dense semantic instance SLAM system based on neural implicit representation, which significantly aids robots in better understanding the scene. FI-SLAM employs a joint coordinate and plane encoding method, which reduces the difficulty of feature storage by flattening the feature space. Furthermore, to improve representation efficiency, we describe features via linear interpolation between adjacent feature levels. We propose a feature fusion (FF) method to merge object features with scene features. The fused feature vector enhances the reconstruction accuracy of the local scene while preserving the global reconstruction quality, improving both the global reconstruction of the scene and the accuracy of camera tracking. Numerous experiments on synthetic and real-world datasets demonstrate that our method achieves accurate tracking, high-fidelity reconstruction results, and complete semantic instance maps. In summary, the proposed algorithm substantially augments the scene perception capabilities of robots.
|
|
14:00-15:00, Paper TuPIT5.11 | |
PolyFit: A Peg-In-Hole Assembly Framework for Unseen Polygon Shapes Via Sim-To-Real Adaptation |
|
Lee, Geonhyup | Gwangju Institute of Science and Technology |
Lee, Joosoon | Gwangju Institute of Science and Technology |
Noh, Sangjun | Gwangju Institute of Science and Technology |
Ko, Minhwan | Gwangju Institute of Science and Technology (GIST) |
Kim, Kangmin | Gwangju Institute of Science and Technology |
Lee, Kyoobin | Gwangju Institute of Science and Technology |
Keywords: Deep Learning Methods, Deep Learning in Grasping and Manipulation, Compliant Assembly
Abstract: The study addresses the foundational and challenging task of peg-in-hole assembly in robotics, where misalignments caused by sensor inaccuracies and mechanical errors often result in insertion failures or jamming. This research introduces PolyFit, representing a paradigm shift by transitioning from a reinforcement learning approach to a supervised learning methodology. PolyFit is a Force/Torque (F/T)-based supervised learning framework designed for 5-DoF peg-in-hole assembly. It utilizes F/T data for accurate extrinsic pose estimation and adjusts the peg pose to rectify misalignments. Extensive training in a simulated environment involves a dataset encompassing a diverse range of peg-hole shapes, extrinsic poses, and their corresponding contact F/T readings. The study proposes a sim-to-real adaptation method for real-world application, using a sim-real paired dataset to enable effective generalization to complex and unseen polygon shapes. Real-world evaluations demonstrate substantial success rates of 96.7% and 91.3%, highlighting the robustness and adaptability of the proposed method. Videos of data generation and experiments are available online at https://sites.google.com/view/polyfit-peginhole.
|
|
14:00-15:00, Paper TuPIT5.12 | |
Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks |
|
Mehta, Shaunak | Virginia Tech |
Habibian, Soheil | Virginia Tech |
Losey, Dylan | Virginia Tech |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Probabilistic Inference
Abstract: Robot arms should be able to learn new tasks. One framework here is reinforcement learning, where the robot is given a reward function that encodes the task, and the robot autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These policies reason over hundreds of fine-grained actions that the robot arm needs to take: e.g., moving slightly to the right or rotating the end-effector a few degrees. But the manipulation tasks that we want robots to perform can often be broken down into a small number of high-level motions: e.g., reaching an object or turning a handle. In this paper we therefore propose a waypoint-based approach for model-free reinforcement learning. Instead of learning a low-level policy, the robot now learns a trajectory of waypoints, and then interpolates between those waypoints using existing controllers. Our key novelty is framing this waypoint-based setting as a sequence of multi-armed bandits: each bandit problem corresponds to one waypoint along the robot's motion. We theoretically show that an ideal solution to this reformulation has lower regret bounds than standard frameworks. We also introduce an approximate posterior sampling solution that builds the robot's motion one waypoint at a time. Results across benchmark simulations and two real-world experiments suggest that this proposed approach learns new tasks more quickly than state-of-the-art baselines. See our website here: https://collab.me.vt.edu/rl-waypoints/
|
|
14:00-15:00, Paper TuPIT5.13 | |
Reinforcement Learning of Dolly-In Filming Using a Ground-Based Robot |
|
Lorimer, Philip | University of Bath |
Saunders, Jack | University of Bath |
Hunter, Alan Joseph | University of Bath |
Li, Wenbin | University of Bath |
Keywords: Reinforcement Learning, Art and Entertainment Robotics, Machine Learning for Robot Control
Abstract: Free-roaming dollies enhance filmmaking with dynamic movement, but challenges in automated camera control remain unresolved. Our study advances this field by applying Reinforcement Learning (RL) to automate dolly-in shots using free-roaming ground-based filming robots, overcoming traditional control hurdles. We demonstrate the effectiveness of combined control for precise film tasks by comparing it to independent control strategies. Our robust RL pipeline surpasses traditional Proportional-Derivative controller performance in simulation and proves its efficacy in real-world tests on a modified ROSBot 2.0 platform equipped with a camera turret. This validates our approach's practicality and sets the stage for further research in complex filming scenarios, contributing significantly to the fusion of technology with cinematic creativity. This work presents a leap forward in the field and opens new avenues for research and development, effectively bridging the gap between technological advancement and creative filmmaking.
|
|
14:00-15:00, Paper TuPIT5.14 | |
Disentangled Acoustic Fields for Multimodal Physical Scene Understanding |
|
Yin, Jie | Shanghai Jiao Tong University |
Luo, Andrew | Carnegie Mellon University |
Du, Yilun | MIT |
Cherian, Anoop | Mitsubishi Electric Research Labs |
Marks, Tim K. | Mitsubishi Electric Research Laboratories (MERL) |
Le Roux, Jonathan | MERL |
Gan, Chuang | IBM |
Keywords: Representation Learning, Semantic Scene Understanding, AI-Based Methods
Abstract: We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to directly regress the variables from sound, leading to poor generalization and domain adaptation issues. In this paper, we illustrate that learning a disentangled model of acoustic formation, referred to as disentangled acoustic field (DAF), to capture the sound generation and propagation process, enables the embodied agent to construct a spatial uncertainty map over where the objects may have fallen. We demonstrate that our analysis-by-synthesis framework can jointly infer sound properties by explicitly decomposing and factorizing the latent space of the disentangled model. We further show that the spatial uncertainty map can significantly improve the success rate for the localization of fallen objects by proposing multiple plausible exploration locations.
|
|
14:00-15:00, Paper TuPIT5.15 | |
Kinematics-Aware Trajectory Generation and Prediction with Latent Stochastic Differential Modeling |
|
Jiao, Ruochen | Northwestern University |
Wang, Yixuan | Northwestern University |
Liu, Xiangguo | Northwestern University |
Zhan, Sinong | Northwestern University |
Huang, Chao | University of Liverpool |
Zhu, Qi | Northwestern University |
Keywords: Representation Learning, Autonomous Agents, AI-Enabled Robotics
Abstract: Trajectory generation and trajectory prediction are two critical tasks in autonomous driving, which generate various trajectories for testing during development and predict the trajectories of surrounding vehicles during operation, respectively. In recent years, emerging data-driven deep learning-based methods have shown great promise for these two tasks in learning various traffic scenarios and improving average performance without assuming physical models. However, it remains a challenging problem for these methods to ensure that the generated/predicted trajectories are physically realistic and controllable. This challenge arises because learning-based approaches often function as opaque black boxes and do not adhere to physical laws. Conversely, existing model-based methods provide physically feasible results but are constrained by predefined model structures, limiting their capabilities to address complex scenarios. To address the limitations of these two types of approaches, we propose a new method that integrates kinematic knowledge into neural stochastic differential equations (SDE) and designs a variational autoencoder based on this latent kinematics-aware SDE (LK-SDE) to generate vehicle motions. Experimental results demonstrate that our method significantly outperforms both model-based and learning-based baselines in producing physically realistic and precisely controllable vehicle trajectories. Additionally, it performs well in predicting unobserved physical variables in the latent space.
|
|
14:00-15:00, Paper TuPIT5.16 | |
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations |
|
Li, Puhao | Tsinghua University |
Liu, Tengyu | Beijing Institute for General Artificial Intelligence |
Li, Yuyang | Tsinghua University |
Han, Muzhi | University of California, Los Angeles |
Geng, Haoran | Peking University |
Wang, Shu | UCLA |
Zhu, Yixin | Peking University |
Zhu, Song-Chun | UCLA |
Huang, Siyuan | Beijing Institute for General Artificial Intelligence |
Keywords: Representation Learning, Deep Learning in Grasping and Manipulation
Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, current methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task representations. We introduce Ag2Manip (Agent Agnostic representations for Manipulation), a framework aimed at addressing these challenges through two key innovations: (1) an agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability; and (2) an agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object. Ag2Manip has been empirically validated across simulated benchmarks, showing a 325% performance increase without relying on domain-specific demonstrations. Ablation studies further underline the essential contributions of the agent-agnostic visual and action representations to this success. Extending our evaluations to the real world, Ag2Manip significantly improves imitation learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across both simulated and real environments.
|
|
TuPIT6 |
Room 6 |
Reinforcement Learning |
Teaser Session |
Chair: Wu, I-Chen | National Chiao Tung University |
Co-Chair: Panov, Aleksandr | AIRI |
|
14:00-15:00, Paper TuPIT6.1 | |
Bi-CL: A Reinforcement Learning Framework for Robots Coordination through Bi-Level Optimization |
|
Hu, Zechen | George Mason University |
Shishika, Daigo | George Mason University |
Xiao, Xuesu | George Mason University |
Wang, Xuan | George Mason University |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information for individual robots. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), that leverages a bi-level optimization structure within a centralized training with decentralized execution (CTDE) paradigm. Our bi-level reformulation decomposes the original problem into a reinforcement learning level with a reduced action space, and an imitation learning level that gains demonstrations from a global optimizer. Bi-CL further integrates an alignment penalty mechanism, aiming to minimize the discrepancy between the two levels without degrading their training efficiency. We introduce a running example to conceptualize the problem formulation. Simulation results demonstrate that Bi-CL can learn more efficiently and achieve performance comparable to traditional multi-agent reinforcement learning baselines for multi-robot coordination.
|
|
14:00-15:00, Paper TuPIT6.2 | |
Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks |
|
Valencia Redrovan, David Patricio | The University of Auckland |
Williams, Henry | University of Auckland |
Xing, Yuning | The University of Auckland |
Gee, Trevor | The University of Auckland |
Liarokapis, Minas | The University of Auckland |
MacDonald, Bruce | University of Auckland |
Keywords: Reinforcement Learning
Abstract: Reinforcement Learning (RL) has been widely used to solve tasks where the environment consistently provides a dense reward value. However, in real-world scenarios, rewards can often be poorly defined or sparse. Auxiliary signals are indispensable for discovering efficient exploration strategies and aiding the learning process. In this work, inspired by intrinsic motivation theory, we postulate that the intrinsic stimuli of novelty and surprise can assist in improving exploration in complex, sparsely rewarded environments. We introduce a novel sample-efficient method able to learn directly from pixels, an image-based extension of TD3 with an autoencoder called NaSA-TD3. The experiments demonstrate that NaSA-TD3 is easy to train and an efficient method for tackling complex continuous-control robotic tasks, both in simulated environments and real-world settings. NaSA-TD3 outperforms existing state-of-the-art RL image-based methods in terms of final performance without requiring pre-trained models or human demonstrations.
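Intrinsic stimuli of the kind described above are often computed from an autoencoder: "novelty" as reconstruction error and "surprise" as the error of a latent forward prediction. The sketch below shows that generic recipe; the networks, weighting, and exact definitions are placeholders, not the NaSA-TD3 formulation.

```python
import torch
import torch.nn.functional as F

def intrinsic_reward(obs, action, next_obs, encoder, decoder, forward_model,
                     w_novelty: float = 1.0, w_surprise: float = 1.0):
    z = encoder(obs)
    novelty = F.mse_loss(decoder(z), obs)                         # reconstruction error
    z_next_pred = forward_model(torch.cat([z, action], dim=-1))   # predicted next latent
    surprise = F.mse_loss(z_next_pred, encoder(next_obs))         # latent prediction error
    return w_novelty * novelty + w_surprise * surprise
```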
|
|
14:00-15:00, Paper TuPIT6.3 | |
Mitigating Adversarial Perturbations for Deep Reinforcement Learning Via Vector Quantization |
|
Luu, Tung | Korea Advanced Institute of Science and Technology |
Nguyen, Thanh | Korea Advanced Institute of Science and Technology (KAIST) |
Tee, Joshua Tian Jin | KAIST |
Kim, Sungwoong | Korea University |
Yoo, Chang D. | KAIST |
Keywords: Reinforcement Learning, Robot Safety, Robust/Adaptive Control
Abstract: Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works focus on developing robust training-based procedures to tackle this problem, including enhancing the robustness of the deep neural network component itself or adversarially training the agent on strong attacks. In this work, we instead study an input transformation-based defense for RL. Specifically, we propose using a variant of vector quantization (VQ) as a transformation for input observations, which is then used to reduce the space of adversarial attacks during testing, resulting in the transformed observations being less affected by attacks. Our method is computationally efficient and seamlessly integrates with adversarial training, further enhancing the robustness of RL agents against adversarial attacks. Through extensive experiments in multiple environments, we demonstrate that using VQ as the input transformation effectively defends against adversarial attacks on the agent's observations.
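The defense can be pictured as snapping each observation (or observation patch) to its nearest codebook entry before it reaches the policy, so small adversarial perturbations collapse onto the same code. The sketch below shows only that inference step; codebook size and how it is learned are left out, and the names are illustrative.

```python
import torch

def quantize(obs: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """obs: (B, D), codebook: (K, D); returns the nearest code for each observation."""
    dists = torch.cdist(obs, codebook)     # (B, K) distances to all codes
    idx = dists.argmin(dim=1)
    return codebook[idx]                   # quantized observations fed to the policy
```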
|
|
14:00-15:00, Paper TuPIT6.4 | |
Gradient-Based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning |
|
Li, Yi | National Yang Ming Chiao Tung University |
Cao, Hoang-Giang | National Yang Ming Chiao Tung University |
Dao, Cong-Tinh | National Yang Ming Chiao Tung University |
Chen, Yu-Cheng | National Yang Ming Chiao Tung University |
Wu, I-Chen | National Chiao Tung University |
Keywords: Reinforcement Learning, Robust/Adaptive Control
Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success, ranging from complex computer games to real-world applications, showing the potential for intelligent agents capable of learning in dynamic environments. However, its application in real-world scenarios presents challenges, including the jerkiness problem, in which jerky trajectories not only compromise system safety but also increase power consumption and shorten the service life of robotic and autonomous systems. To address jerky actions, a method called conditioning for action policy smoothness (CAPS) was previously proposed, adding regularization terms to reduce action changes. This paper further proposes a novel method, named Gradient-based CAPS (Grad-CAPS), that modifies CAPS by reducing the difference in the gradient of the action and then uses displacement normalization to enable the agent to adapt to invariant action scales. Consequently, our method effectively reduces zigzagging action sequences while enhancing policy expressiveness and adaptability across diverse scenarios and environments. In the experiments, we integrated Grad-CAPS with different reinforcement learning algorithms and evaluated its performance on various robotics-related tasks in DeepMind Control Suite and OpenAI Gym environments. The results demonstrate that Grad-CAPS effectively improves performance while maintaining a level of smoothness comparable to CAPS and vanilla agents.
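A rough sketch of smoothness regularizers in this spirit is shown below: a CAPS-style term penalizes the change between consecutive actions, and a gradient-style term penalizes the change of that change after normalizing by the state displacement. This is an approximation of the idea for illustration, not the Grad-CAPS loss itself.

```python
import torch

def caps_term(a_t: torch.Tensor, a_next: torch.Tensor) -> torch.Tensor:
    """CAPS-style temporal smoothness: penalize consecutive action changes."""
    return (a_next - a_t).pow(2).mean()

def grad_term(a_prev, a_t, a_next, s_prev, s_t, s_next, eps: float = 1e-6) -> torch.Tensor:
    """Penalize the change of the displacement-normalized action difference."""
    d1 = (a_t - a_prev) / ((s_t - s_prev).norm(dim=-1, keepdim=True) + eps)
    d2 = (a_next - a_t) / ((s_next - s_t).norm(dim=-1, keepdim=True) + eps)
    return (d2 - d1).pow(2).mean()
```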
|
|
14:00-15:00, Paper TuPIT6.5 | |
Towards Accurate and Robust Dynamics and Reward Modeling for Model-Based Offline Inverse Reinforcement Learning |
|
Zhang, Gengyu | University of Illinois at Chicago |
Yan, Yan | Illinois Institute of Technology |
Keywords: Reinforcement Learning, Learning from Demonstration, Task and Motion Planning
Abstract: This paper enhances model-based offline inverse reinforcement learning (IRL) by refining conservative Markov decision process (MDP) frameworks, traditionally employing uncertainty penalties to deter exploitation in uncertain areas. Existing methods, dependent on neural network ensembles to model MDP dynamics and quantify uncertainty through ensemble prediction heuristics, face limitations: they presume Gaussian-distributed state transitions, leading to simplified environmental representations. Additionally, ensemble modeling often results in high variance, indicating potential overfitting and a lack of generalizability. Moreover, the heuristic reliance for uncertainty quantification struggles to fully grasp environmental complexities, offering an incomplete foundation for informed decisions. Maintaining multiple models also demands substantial computational resources. Addressing these shortcomings, we propose leveraging score-based diffusion generative models for dynamic modeling. This method significantly broadens the scope of representable target distributions, surpassing Gaussian constraints. It not only improves the accuracy of transition modeling but also roots uncertainty quantification in diffusion models' theoretical underpinnings, enabling more precise and dependable reward regularization. We further innovate by incorporating a transition stability regularizer (TSR) into the reward estimation. This novel element embeds stability into the reward learning process, diminishing the influence of transition variability and promoting more consistent policy optimization. Our empirical studies on diverse Mujoco robotic control tasks demonstrate that our diffusion-based methodology not only furnishes more accurate transition estimations but also surpasses conventional ensemble approaches in policy effectiveness. The addition of the TSR marks a distinctive advancement in offline IRL by enhancing the reward and policy learning efficacy.
|
|
14:00-15:00, Paper TuPIT6.6 | |
Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning Via MetaGradient-Based Hyperparameter Tuning |
|
Honari, Homayoun | University of Victoria |
Soufi Enayati, Amir Mehdi | University of Victoria |
Ghafarian Tamizi, Mehran | University of Victoria |
Najjaran, Homayoun | University of Victoria |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Robot Safety
Abstract: Safe Reinforcement Learning (Safe RL) is one of the prevalently studied subcategories of trial-and-error-based methods with the intention to be deployed on real-world systems. In safe RL, the goal is to maximize reward performance while minimizing constraints, often achieved by setting bounds on constraint functions and utilizing the Lagrangian method. However, deploying Lagrangian-based safe RL in real-world scenarios is challenging due to the necessity of threshold fine-tuning, as imprecise adjustments may lead to suboptimal policy convergence. To mitigate this challenge, we propose a unified Lagrangian-based model-free architecture called Meta Soft Actor-Critic Lagrangian (Meta SAC-Lag). Meta SAC-Lag uses meta-gradient optimization to automatically update the safety-related hyperparameters. The proposed method is designed to address safe exploration and threshold adjustment with minimal hyperparameter tuning requirement. In our pipeline, the inner parameters are updated through the conventional formulation and the hyperparameters are adjusted using the meta-objectives which are defined based on the updated parameters. Our results show that the agent can reliably adjust the safety performance due to the relatively fast convergence rate of the safety threshold. We evaluate the performance of Meta SAC-Lag in five simulated environments against Lagrangian baselines, and the results demonstrate its capability to create synergy between parameters, yielding better or competitive results. Furthermore, we conduct a real-world experiment involving a robotic arm tasked with pouring coffee into a cup without spillage. Meta SAC-Lag is successfully trained to execute the task, while minimizing effort constraints. The success of Meta SAC-Lag in performing the experiment is intended to be a step toward practical deployment of safe RL algorithms to learn the control process of safety-critical real-world systems without explicit engineering.
|
|
14:00-15:00, Paper TuPIT6.7 | |
Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies |
|
Galelli Christmann, Guilherme Henrique | Inventec Corporation |
Luo, Ying-Sheng | Inventec Corporation |
Mandala, Hanjaya | Inventec Corporation |
Chen, Wei-Chao | Inventec Inc |
Keywords: Reinforcement Learning, Machine Learning for Robot Control
Abstract: Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to hardware in the real world. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. We define two broad classes: loss regularization and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions in the output space. We present benchmarks in terms of policy performance and control smoothness on traditional RL environments from Gymnasium and a complex manipulation task, as well as three robotics locomotion tasks that include deployment and evaluation with real-world hardware. Finally, we also propose hybrid methods that combine elements from both loss regularization and architectural methods. We find that the best-performing hybrid outperforms other methods, and improves control smoothness by 26.8% over the baseline, with a worst case performance degradation of just 2.8%.
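Oscillation in an action signal is commonly summarized by how much of its spectrum lies at high frequencies; the sketch below computes an amplitude-weighted mean frequency of one action dimension as such a measure. The exact metric used in the paper's benchmark may differ.

```python
import numpy as np

def mean_frequency(actions: np.ndarray, dt: float) -> float:
    """actions: (T,) one action dimension sampled every dt seconds; higher = less smooth."""
    spectrum = np.abs(np.fft.rfft(actions - actions.mean()))
    freqs = np.fft.rfftfreq(len(actions), d=dt)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
```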
|
|
14:00-15:00, Paper TuPIT6.8 | |
Deeper Introspective SLAM: How to Avoid Tracking Failures Over Longer Routes? |
|
Naveed, Kanwal | NUST |
Anjum, Muhammad Latif | National University of Sciences and Technology, Islamabad |
Hussain, Wajahat | National University of Sciences and Technology (NUST) |
Lee, Donghwan | KAIST |
Keywords: Reinforcement Learning, SLAM, Visual Tracking
Abstract: Large scale active exploration has recently revealed limitations of visual SLAM’s tracking ability. Active view planning methods, based on reinforcement learning, have been proposed to improve visual tracking robustness. In this work, we expose the limitations of deep reinforcement learning-based visual SLAM over longer routes. We demonstrate that additional modalities (depth, scene layout) offer little improvement. Furthermore, reward shaping is not the main reason behind the short-sightedness of the state-of-the-art visual SLAM tracker. We propose a novel video vision transformer-based architecture that improves the farsightedness of the visual tracker, which results in the completion of longer routes with efficient paths. Out of 60 challenging routes, our approach manages to complete 56 routes, which is a three-fold improvement over the state-of-the-art active view mapping (DI-SLAM) baseline. Interestingly, ORBSLAM3 was unable to complete a single route without tracking failure.
|
|
14:00-15:00, Paper TuPIT6.9 | |
Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks |
|
Feng, Pu | Beihang University |
Liang, Junkang | Beihang University |
Wang, Size | Beihang University |
Yu, Xin | Beihang University |
Ji, Xin | Big Data Center, State Grid Corporation of China |
Chen, Yiting | Big Data Center, State Grid Corporation of China |
Zhang, Kui | Beihang University |
Shi, Rongye | Beihang University |
Wu, Wenjun | Beihang University |
Keywords: Reinforcement Learning, Multi-Robot Systems, Distributed Robot Systems
Abstract: In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but struggles due to a gap: global state guidance in training versus reliance on local observations in execution, lacking global signals. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation. HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication. This approach enables agents to form a global consensus from local observations, using it as an additional piece of information to guide collaborative actions during execution. To cater to the dynamic requirements of various tasks, consensus is divided into multiple layers, encompassing both short-term and long-term considerations. Short-term observations prompt the creation of an immediate, low-layer consensus, while long-term observations contribute to the formation of a strategic, high-layer consensus. This process is further refined through an adaptive attention mechanism that dynamically adjusts the influence of each consensus layer. This mechanism optimizes the balance between immediate reactions and strategic planning, tailoring it to the specific demands of the task at hand. Extensive experiments and real-world applications in multi-robot systems showcase our framework's superior performance, marking significant advancements over baselines.
|
|
14:00-15:00, Paper TuPIT6.10 | |
DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction |
|
Pore, Ameya | University of Verona |
Muradore, Riccardo | University of Verona |
Dall'Alba, Diego | University of Verona |
Keywords: Reinforcement Learning, Representation Learning, Visual Learning
Abstract: Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data, especially when the visual scene is complex and unstructured. In this paper, we explore how the agent’s knowledge of its shape can improve the sample efficiency of visual RL methods. We propose a novel method, Disentangled Environment and Agent Representations (DEAR), that uses the segmentation mask of the agent as supervision to learn disentangled representations of the environment and the agent through feature separation constraints. Unlike previous approaches, DEAR does not require reconstruction of visual observations. These representations are then used as an auxiliary loss to the RL objective, encouraging the agent to focus on the relevant features of the environment. We evaluate DEAR on two challenging benchmarks: Distracting DeepMind control suite and Franka Kitchen manipulation tasks. Our findings demonstrate that DEAR surpasses state-of-the-art methods in sample efficiency, achieving comparable or superior performance with reduced parameters. Our results indicate that integrating agent knowledge into visual RL methods has the potential to enhance their learning efficiency and robustness.
|
|
14:00-15:00, Paper TuPIT6.11 | |
Task and Domain Adaptive Reinforcement Learning for Robot Control |
|
Liu, Yu Tang | Max Planck Institute for Intelligent Systems |
Nilaksh, Nilaksh | Indian Institute of Technology, Kharagpur |
Ahmad, Aamir | University of Stuttgart |
Keywords: Reinforcement Learning, Transfer Learning
Abstract: Deep reinforcement learning (DRL) has shown remarkable success in simulation domains, yet its application in designing robot controllers remains limited, due to its single-task orientation and insufficient adaptability to environmental changes. To overcome these limitations, we present a novel adaptive agent that leverages transfer learning techniques to dynamically adapt policy in response to different tasks and environmental conditions. The approach is validated through the blimp control challenge, where multitasking capabilities and environmental adaptability are essential. The agent is trained using a custom, highly parallelized simulator built on IsaacGym. We perform zero-shot transfer to fly the blimp in the real world to solve various tasks. We share our code at: https://github.com/robot-perception-group/adaptive_agent.
|
|
14:00-15:00, Paper TuPIT6.12 | |
Model-Based Policy Optimization Using Symbolic World Model |
|
Gorodetsky, Andrey | Moscow Institute of Physics and Technology |
Mironov, Konstantin | Ufa University of Science and Technology |
Panov, Aleksandr | AIRI |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Model Learning for Control
Abstract: The application of learning-based control methods in robotics presents significant challenges. One is that model-free reinforcement learning algorithms use observation data with low sample efficiency. To address this challenge, a prevalent approach is model-based reinforcement learning, which involves employing an environment dynamics model. We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression. Approximation of a mechanical system with a symbolic model has fewer parameters than approximation with neural networks, which can potentially lead to higher accuracy and quality of extrapolation. We use a symbolic dynamics model to generate trajectories in model-based policy optimization to improve the sample efficiency of the learning algorithm. We evaluate our approach across various tasks within simulated environments. Our method demonstrates superior sample efficiency in these tasks compared to model-free and model-based baseline methods.
|
|
14:00-15:00, Paper TuPIT6.13 | |
BayRnTune: Adaptive Bayesian Domain Randomization Via Strategic Fine-Tuning |
|
Huang, Tianle | Georgia Institute of Technology |
Sontakke, Nitish Rajnish | Georgia Institute of Technology |
Kannabiran, Niranjan Kumar | Georgia Institute of Technology |
Essa, Irfan | Georgia Institute of Technology |
Nikolaidis, Stefanos | University of Southern California |
Hong, Dennis | UCLA |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Reinforcement Learning, Model Learning for Control, Transfer Learning
Abstract: Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.
|
|
14:00-15:00, Paper TuPIT6.14 | |
Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers |
|
Krnjaic, Aleksandar | Dematic |
Steleac, Raul Dacian | University of Edinburgh |
Thomas, Jonathan David | University of Bristol |
Papoudakis, Georgios | University of Edinburgh |
Schäfer, Lukas | University of Edinburgh |
To, Andrew | Dematic |
Lao, Kuan-Ho | Dematic |
Cubuktepe, Murat | UTexas |
Haley, Matthew | Dematic |
Börsting, Peter | Dematic GmbH |
Albrecht, Stefano V. | University of Edinburgh |
Keywords: Reinforcement Learning, Planning, Scheduling and Coordination, Logistics
Abstract: We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms.
|
|
14:00-15:00, Paper TuPIT6.15 | |
Learning When to Stop: Efficient Active Tactile Perception with Deep Reinforcement Learning |
|
Niemann, Christopher | Bielefeld University |
Leins, David Philip | Bielefeld University |
Lach, Luca | Bielefeld University |
Haschke, Robert | Bielefeld University |
Keywords: Reinforcement Learning, Haptics and Haptic Interfaces
Abstract: Actively guiding attention is an important mechanism to employ limited processing resources efficiently. The Recurrent Visual Attention Model (RAM) has been successfully applied to process large input images by sequentially attending to smaller image regions with an RL framework. In tactile perception, sequential attention methods are required naturally due to the limited size of the tactile receptive field. The concept of RAM was transferred to the haptic domain by the Haptic Attention Model (HAM) to iteratively generate a fixed number of informative haptic glances for tactile object classification. We extend HAM to a system capable of actively determining when sufficient haptic data is available for reliable classification. To this end, we introduce a hybrid action space, augmenting the continuous glance location with the discrete decision of when to classify. This allows balancing the cost of obtaining new samples against the cost of misclassification, resulting in an optimized number of glances while maintaining reasonable accuracy. We evaluate the efficiency of our approach on a hand-crafted dataset, which allows us to compute the most efficient glance locations.
|
|
14:00-15:00, Paper TuPIT6.16 | |
TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments |
|
Hossain, Jumman | University of Maryland Baltimore County |
Faridee, Abu-Zaher | University of Maryland Baltimore County, USA |
Roy, Nirmalya | University of Maryland Baltimore County, USA |
Freeman, Jade | DEVCOM Army Research Lab, USA |
Gregory, Timothy | DEVCOM Army Research Lab, USA |
Trout, Theron T. | Stormfish Scientific Corp |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: Autonomous robots exploring unknown environments face a significant challenge: navigating effectively without prior maps and with limited external feedback. This challenge intensifies in sparse reward environments, where traditional exploration techniques often fail. In this paper, we present TopoNav, a novel topological navigation framework that integrates active mapping, hierarchical reinforcement learning, and intrinsic motivation to enable efficient goal-oriented exploration and navigation in sparse-reward settings. TopoNav dynamically constructs a topological map of the environment, capturing key locations and pathways. A two-level hierarchical policy architecture, comprising a high-level graph traversal policy and low-level motion control policies, enables effective navigation and obstacle avoidance while maintaining focus on the overall goal. Additionally, TopoNav incorporates intrinsic motivation to guide exploration towards relevant regions and frontier nodes in the topological map, addressing the challenges of sparse extrinsic rewards. We evaluate TopoNav in both simulated and real-world off-road environments using a Clearpath Jackal robot, across three challenging navigation scenarios: goal-reaching, feature-based navigation, and navigation in complex terrains. We observe an increase in exploration coverage by 7-20%, an increase in success rates by 9-19%, and reductions in navigation times by 15-36% across various scenarios, compared to state-of-the-art methods.
|
|
14:00-15:00, Paper TuPIT6.17 | |
Learning-Based Adaptive Control of Quadruped Robots for Active Stabilization on Moving Platforms |
|
Yoon, Minsung | Korea Advanced Institute of Science and Technology (KAIST) |
Shin, Heechan | KAIST |
Jeong, Jeil | Korea Advanced Institute of Science and Technology |
Yoon, Sung-eui | KAIST |
Keywords: Reinforcement Learning, Legged Robots
Abstract: A quadruped robot faces balancing challenges on a six-degrees-of-freedom moving platform, like subways, buses, airplanes, and yachts, due to independent platform motions and resultant diverse inertia forces on the robot. To alleviate these challenges, we present the Learning-based Active Stabilization on Moving Platforms (LAS-MP), featuring a self-balancing policy and system state estimators. The policy adaptively adjusts the robot's posture in response to the platform's motion. The estimators infer robot and platform states based on proprioceptive sensor data. For a systematic training scheme across various platform motions, we introduce platform trajectory generation and scheduling methods. Our evaluation demonstrates superior balancing performance across multiple metrics compared to three baselines. Furthermore, we conduct a detailed analysis of the LAS-MP, including ablation studies and evaluation of the estimators, to validate the effectiveness of each component.
|
|
TuPIT7 |
Room 7 |
Motion and Force Control |
Teaser Session |
Chair: Sakai, Satoru | Shinshu Univ |
Co-Chair: Meghjani, Malika | Singapore University of Technology and Design |
|
14:00-15:00, Paper TuPIT7.1 | |
Koopman Dynamic Modeling for Global and Unified Representations of Rigid Body Systems Making and Breaking Contact |
|
O'Neill, Cormac | Massachusetts Institute of Technology |
Asada, Harry | MIT |
Keywords: Contact Modeling, Dynamics
Abstract: A global modeling methodology based on Koopman operator theory for the dynamics of rigid bodies that make and break contact is presented. Traditionally, robotic systems that make contact with their environment are represented by multiple dynamic equations that are switched depending on the contact state. This switching of governing dynamics has been a challenge in both task planning and control. Here, a Koopman lifting linearization approach is presented to subsume multiple dynamics such that no explicit switching is required for examining the dynamic behaviors across diverse contact states. First, it is shown that contact/non-contact transitions are continuous at a microscopic level. This allows for the application of Koopman operator theory to the class of robotic systems that repeat contact/non-contact transitions. Second, an effective method for finding Koopman operator observables for capturing rapid changes to contact forces is presented. The method is applied to the modeling of dynamic peg insertion, in which a peg collides with and bounces on the chamfer of the hole. Furthermore, the method is applied to the dynamic modeling of a sliding object subject to complex friction and damping properties. Segmented dynamic equations are unified with the Koopman modeling method.
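A minimal EDMD-style sketch of the lifting idea follows: choose observables psi(x), stack snapshot pairs from trajectories spanning contact and non-contact phases, and fit one linear operator K with least squares so that psi(x_{t+1}) ≈ K psi(x_t). The generic polynomial observables below are an assumption; the paper's contact-aware observables are not reproduced.

```python
import numpy as np

def lift(x: np.ndarray) -> np.ndarray:
    """Observables: the state, its pairwise products, and a constant term."""
    quad = np.einsum('ti,tj->tij', x, x).reshape(x.shape[0], -1)
    return np.hstack([x, quad, np.ones((x.shape[0], 1))])

def fit_koopman(states: np.ndarray, next_states: np.ndarray) -> np.ndarray:
    """Least-squares fit of a single lifted linear model over all contact states."""
    Psi, Psi_next = lift(states), lift(next_states)
    K, *_ = np.linalg.lstsq(Psi, Psi_next, rcond=None)
    return K.T   # psi(x_{t+1}) ~ K @ psi(x_t)
```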
|
|
14:00-15:00, Paper TuPIT7.2 | |
Neuromorphic Force-Control in an Industrial Task: Validating Energy and Latency Benefits |
|
Amaya, Camilo | Fortiss - An-Institut Technische Universität München |
Eames, Evan | Fortiss - an Institut Technische Universität München |
Palinauskas, Gintautas | Fortiss - An-Institut Technische Universität München |
Perzylo, Alexander Clifford | Fortiss - An-Institut Technische Universität München |
Sandamirskaya, Yulia | ZHAW |
von Arnim, Axel | Fortiss |
Keywords: Neurorobotics, Biologically-Inspired Robots, Force Control
Abstract: As robots become smarter and more ubiquitous, optimizing the power consumption of intelligent compute becomes imperative towards ensuring the sustainability of technological advancements. Neuromorphic computing hardware makes use of biologically inspired neural architectures to achieve energy and latency improvements compared to conventional von Neumann computing architectures. Applying these benefits to robots has been demonstrated in several works in the field of neurorobotics, typically on relatively simple control tasks. Here, we introduce an example of neuromorphic computing applied to the real-world industrial task of object insertion. We trained a spiking neural network (SNN) to perform force-torque feedback control using a reinforcement learning approach in simulation. We then ported the SNN to the Intel neuromorphic research chip Loihi, interfaced with a KUKA robotic arm. At inference time we show latency competitive with current CPU/GPU architectures, and one order of magnitude less energy usage in comparison to traditional low-energy edge hardware. We offer this example as a proof-of-concept implementation of a neuromorphic controller in a real-world robotic setting, highlighting the benefits of neuromorphic hardware for the development of intelligent controllers for robots.
|
|
14:00-15:00, Paper TuPIT7.3 | |
Zero-Shot Transfer of a Tactile-Based Continuous Force Control Policy from Simulation to Robot |
|
Lach, Luca | Bielefeld University |
Haschke, Robert | Bielefeld University |
Tateo, Davide | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Ritter, Helge Joachim | Bielefeld University |
Borràs Sol, Júlia | Institut De Robòtica I Informàtica Industrial (CSIC-UPC) |
Torras, Carme | Csic - Upc |
Keywords: Force and Tactile Sensing, Deep Learning in Grasping and Manipulation, Force Control
Abstract: The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks. An important line of research in this regard is grasp force control, which aims to manipulate objects safely by limiting the amount of force exerted on the object. While prior works have either hand-modeled their force controllers, employed model-based approaches, or not shown sim-to-real transfer, we propose a model-free deep reinforcement learning approach trained in simulation and then transferred to the robot without further fine-tuning. We, therefore, present a simulation environment that produces realistic normal forces, which we use to train continuous force control policies. A detailed evaluation shows that the learned policy performs similarly or better than a hand-crafted baseline. Ablation studies prove that the proposed inductive bias and domain randomization facilitate sim-to-real transfer. Code, models, and supplementary videos are available on https://sites.google.com/view/rl-force-ctrl
|
|
14:00-15:00, Paper TuPIT7.4 | |
A Proxy-Tactile Reactive Control for Robots Moving in Clutter |
|
Caroleo, Giammarco | University of Oxford |
Giovinazzo, Francesco | University of Genoa |
Albini, Alessandro | University of Oxford |
Grella, Francesco | University of Genova |
Cannata, Giorgio | University of Genova |
Maiolino, Perla | University of Oxford |
Keywords: Force and Tactile Sensing, Sensor-based Control
Abstract: Robots performing tasks in challenging environments must be supported by control or planning algorithms that exploit sensor feedback to effectively plan the robot’s actions. In this paper, we propose a reactive control law that simultaneously utilizes proximity and tactile feedback to perform a pick-and-place task in an unknown and cluttered environment. Specifically, the presented solution leverages proximity sensing obtained from distributed Time of Flight (ToF) sensors to avoid collision when this does not interfere with the pick-and-place task. Safety is guaranteed by a higher-priority task using tactile feedback that reduces contact forces when a collision occurs. Additionally, we compare the effectiveness of this control scheme with a collision detection and reaction scheme based solely on tactile sensing. Our results demonstrate that the proposed approach reduces the collisions with the environment and the task execution time of the pick-and-place operation.
|
|
14:00-15:00, Paper TuPIT7.5 | |
Position Control of a Low-Energy C-Core Reluctance Actuator in a Motion System |
|
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Al-Rawashdeh, Yazan | Memorial University of Newfoundland |
Alatawneh, Natheer | Cysca Technologies |
Aljanaideh, Khaled | Jordan University of Science and Technology |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Semiconductor Manufacturing, Motion Control
Abstract: This paper introduces a position control system for a motion stage driven by a low-energy C-core reluctance actuator. The central concept explored here is the use of a variable air gap to enable energy-efficient operation of the motion stage. The paper begins by presenting the design and mathematical model of the reluctance-actuated motion system (RAMS). Subsequently, open-loop responses of the RAMS are analyzed under various conditions, including variable air gaps and different applied voltages, highlighting the advantages of a variable air gap, particularly in reducing the required current. Finally, the paper formulates a control approach that combines a feedforward controller to linearize the RAMS's dynamic behavior and a state feedback controller to achieve tracking performance. Experimental results demonstrate the effectiveness of this control approach in achieving tracking objectives with errors of less than 2% for a constant desired displacement and less than 10% for tracking a sinusoidal reference signal.
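For readers unfamiliar with reluctance-actuator linearization, the following sketch shows the general idea of pairing a feedforward inverse of a force model with a state-feedback tracking law; the simplified force model F ~ k i^2/g^2 and all numerical values are assumptions, not the RAMS model identified in the paper.

```python
import numpy as np

# Assumed simplified reluctance force model F = k * i^2 / g^2; the paper's
# RAMS model is more detailed, so treat these numbers as placeholders.
k_force = 2.5e-6   # force constant [N*m^2/A^2]
m = 0.05           # moving mass [kg]

def feedforward_current(f_des, gap):
    """Invert the force model to linearize the actuator (feedforward)."""
    return gap * np.sqrt(max(f_des, 0.0) / k_force)

def state_feedback_force(x, x_dot, x_ref, kp=400.0, kd=40.0):
    """PD-type state feedback producing a desired force for tracking."""
    return m * (kp * (x_ref - x) - kd * x_dot)

# One control step: position x, velocity x_dot, air gap g0 - x.
x, x_dot, g0, x_ref = 0.2e-3, 0.0, 1.0e-3, 0.5e-3
f_des = state_feedback_force(x, x_dot, x_ref)
i_cmd = feedforward_current(f_des, g0 - x)
print(f"commanded current: {i_cmd:.3f} A")
```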
|
|
14:00-15:00, Paper TuPIT7.6 | |
Improved Contact Stability for Admittance Control of Industrial Robots with Inverse Model Compensation |
|
Samuel, Kangwagye | Technical University of Munich |
Haninger, Kevin | Fraunhofer IPK |
Haddadin, Sami | Technical University of Munich |
Oh, Sehoon | DGIST |
Keywords: Compliance and Impedance Control, Human-Robot Collaboration, Motion Control
Abstract: Industrial robots have increased payload, repeatability, and reach compared to collaborative robots; however, they have a fixed position controller and low intrinsic admittance. This makes realizing safe contact challenging due to large contact force overshoots during contact transitions and contact instability when the environment and robot dynamics are coupled. To improve safe contact on industrial robots, we propose an admittance controller with inverse model compensation, designed and implemented outside the position controller. By including both the inner-loop and outer-loop dynamics in its design, the proposed method achieves expanded admittance in terms of increasing both the gain and the cutoff frequency of the desired admittance. Results from theoretical analyses and experiments on a commercial industrial robot show that the proposed method improves rendering of the desired admittance while maintaining contact stability. We further validate this by conducting actual assembly tasks of plug insertion with fine positioning, switch insertion onto a rail, and colliding the robot end effector with random objects and surfaces, as seen at https://youtu.be/8XfkdHEdWDs.
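To make the outer-loop admittance idea concrete, here is a minimal discrete-time admittance filter that converts a measured contact force into a position offset for the inner position controller; the inverse-model compensation that is central to the paper is not reproduced, and the M, D, K values are placeholders.

```python
class AdmittanceFilter:
    """Discrete admittance M x_ddot + D x_dot + K x = F_ext, integrated with
    semi-implicit Euler. The paper's inverse-model compensation of the inner
    position loop is omitted here; this is only the outer admittance law."""

    def __init__(self, M=2.0, D=40.0, K=200.0, dt=0.001):
        self.M, self.D, self.K, self.dt = M, D, K, dt
        self.x = 0.0   # admittance displacement added to the nominal reference
        self.v = 0.0

    def step(self, f_ext):
        a = (f_ext - self.D * self.v - self.K * self.x) / self.M
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x

adm = AdmittanceFilter()
for _ in range(1000):                 # 1 s of contact with a constant 10 N force
    x_offset = adm.step(10.0)
print(f"steady-state compliance offset: {x_offset * 1000:.1f} mm")  # ~ F/K = 50 mm
```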
|
|
14:00-15:00, Paper TuPIT7.7 | |
Current-Based Impedance Control for Interacting with Mobile Manipulators |
|
de Wolde, Jelmer | Delft University of Technology |
Knoedler, Luzia | Delft University of Technology |
Garofalo, Gianluca | ABB AB |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Compliance and Impedance Control, Mobile Manipulation, Physical Human-Robot Interaction
Abstract: As robots shift from industrial to human-centered spaces, adopting mobile manipulators, which expand workspace capabilities, becomes crucial. In these settings, seamless interaction with humans necessitates compliant control. Two common methods for safe interaction, admittance and impedance control, require force or torque sensors, often absent in lower-cost or lightweight robots. This paper presents an adaptation of impedance control that can be used on current-controlled robots without force or torque sensors and shows its application for compliant control of a mobile manipulator. A calibration method is designed that enables estimation of the actuators' current/torque ratios and frictions, used by the adapted impedance controller, and that can handle model errors. The calibration method and the performance of the designed controller are experimentally validated using the Kinova GEN3 Lite arm. Results show that the calibration method is consistent and that the designed controller for the arm is compliant while also being able to track targets with five-millimeter precision when no interaction is present. Additionally, this paper presents two operational modes for interacting with the mobile manipulator: one for guiding the robot around the workspace by interacting with the arm and another for executing a tracking task, both maintaining compliance to external forces. These operational modes were tested in real-world experiments, affirming their practical applicability and effectiveness.
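A minimal sketch of the current-based impedance idea, assuming hypothetical calibrated current/torque ratios and Coulomb friction values; the paper's actual calibration routine and friction handling are more involved.

```python
import numpy as np

# Hypothetical per-joint calibration results (current/torque ratio and Coulomb
# friction); the paper estimates these with its own calibration method.
torque_per_amp = np.array([0.9, 0.85, 0.8, 0.4])     # [Nm/A]
coulomb_friction = np.array([0.3, 0.25, 0.2, 0.1])   # [Nm]

def impedance_current(q, dq, q_des, gravity_torque,
                      K=np.diag([30, 30, 20, 10]), D=np.diag([3, 3, 2, 1])):
    """Joint impedance law mapped to motor currents via calibrated ratios."""
    tau = K @ (q_des - q) - D @ dq + gravity_torque
    tau_friction = coulomb_friction * np.sign(dq)      # simple friction feedforward
    return (tau + tau_friction) / torque_per_amp

q = np.array([0.1, -0.2, 0.3, 0.0])
dq = np.zeros(4)
q_des = np.zeros(4)
g_tau = np.array([1.2, 2.0, 0.5, 0.0])                # from a dynamics model
print("commanded currents [A]:", impedance_current(q, dq, q_des, g_tau))
```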
|
|
14:00-15:00, Paper TuPIT7.8 | |
Understanding Strain Wave Gear Directional Efficiency in the Context of Robotic Actuation and Overcoming the Corresponding Performance Limitations through Direct Torque Control |
|
Georgiev, Nikola | Jet Propulsion Laboratory |
|
14:00-15:00, Paper TuPIT7.9 | |
Response Improvement of Hydraulic Robotic Joints Via a Force Servo and Inverted Pendulum Demo |
|
Arai, Ryo | Shinshu University |
Sakai, Satoru | Shinshu Univ |
Ono, Kazuki | Shinshu University |
Keywords: Force Control, Hydraulic/Pneumatic Actuators, Dynamics
Abstract: In this paper, a force servo for hydraulic robot joints is designed and applied to an inverted pendulum demo. First, the nonlinear nominal model of the hydraulic cylinder is reviewed, and the nominal integrator is introduced. Second, through a partial fusion of our modeling and control techniques, a force servo is designed based on a modification of the nominal integrator as an intermediate stage. Finally, with long hydraulic pipes (1.9 m), the designed force servo improves the response in the presence of rotational motion effects.
|
|
14:00-15:00, Paper TuPIT7.10 | |
Development of a Spherical Wheel-Legged Composite Mobile Robot with Multimodal Motion Capabilities |
|
Du, Yuyang | Harbin Institute of Technology, Shenzhen |
Ye, Ruihua | Harbin Institute of Technology, Shenzhen |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Keywords: Motion Control, Mechanism Design, Body Balancing
Abstract: In this paper, we present a spherical wheel-legged mobile robot, aiming to meet the demands of adaptability to complex terrains and high maneuverability. It consists of a spherical main body and five-bar linkage parallel wheel-legged mechanisms. It can switch between legged and spherical modes by extending and retracting its legs according to the demands of the actual environment, thereby enhancing the overall mobility of the robot. By designing a Linear Quadratic Regulator (LQR) controller, we achieve impact-resistant leg balancing and autonomous pitch adjustment for the robot on inclined surfaces in the legged configuration. For rolling control in the spherical configuration, a hierarchical sliding mode control method and a Proportional-Integral-Derivative (PID) controller are employed to control the rolling and turning of the robot. We finally verify the robustness of the robot in the legged configuration against disturbances and the stability of its motion in the spherical configuration.
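As a generic reference for the balancing controller mentioned above, the snippet below computes a continuous-time LQR gain with SciPy; the two-state model and weights are placeholders, not the robot's wheel-legged dynamics.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr(A, B, Q, R):
    """Continuous-time LQR gain K such that u = -K x."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Toy planar balancing model (body pitch / pitch rate); placeholder numbers,
# not the paper's wheel-legged dynamics.
A = np.array([[0.0, 1.0],
              [15.0, 0.0]])          # inverted-pendulum-like instability
B = np.array([[0.0],
              [2.0]])
Q = np.diag([50.0, 1.0])
R = np.array([[0.5]])
print("LQR gain:", lqr(A, B, Q, R))
```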
|
|
14:00-15:00, Paper TuPIT7.11 | |
Segmented Safety Docking Control for Mobile Self-Reconfigurable Robots |
|
Zheng, Zhi | Chongqing University |
Jiang, Tao | Chongqing University |
Tan, Senqi | China North Artificial Intelligence & Innovation Research Instit |
Zhang, Hao | ChongQing University |
Ye, Jianchuan | Tsinghua University |
Keywords: Motion Control, Robust/Adaptive Control, Underactuated Robots
Abstract: Mobile self-reconfigurable robots (MSRRs), as a novel multi-robot system with flexible configurations and task adaptability, hold promising applications in unstructured task environments. However, existing autonomous docking strategies are primarily applied in laboratory settings and face numerous challenges and limitations in actual applications, including differences in sensor characteristics, safety threats, and saturation constraints. To address these issues, this paper proposes a segmented safe docking control framework based on global localization and local perception to achieve stable and reliable reconfiguration of MSRRs in practical applications. Specific contributions include a dual-layer constraint framework for the safety of units in the long-distance phase against nested velocity and acceleration windups, and the integration of active line-of-sight (LOS) correction and adaptive anti-windup, driving mobile units to achieve precise and rapid locking of docking positions within the LOS in the close-range phase. Finally, the validity of the proposed method is verified via physical experiments, offering an innovative approach to deploying MSRRs in complex scenarios.
|
|
14:00-15:00, Paper TuPIT7.12 | |
Attitude Control of the Hydrobatic Intervention AUV Cuttlefish Using Incremental Nonlinear Dynamic Inversion |
|
Slawik, Tom | German Research Center for Artificial Intelligence (DFKI GmbH), |
Vyas, Shubham | Robotics Innovation Center, DFKI GmbH |
Christensen, Leif | DFKI |
Kirchner, Frank | University of Bremen |
Keywords: Motion Control, Sensor-based Control, Marine Robotics
Abstract: In this paper, we present an attitude control scheme for an autonomous underwater vehicle (AUV), which is based on incremental nonlinear dynamic inversion (INDI). Conventional model-based controllers depend on an exact model of the controlled system, which is difficult to find, especially for marine vehicles subject to highly nonlinear hydrodynamic effects. INDI trades off model accuracy against sensor accuracy by incorporating acceleration feedback and actuator output feedback to linearize a nonlinear system incrementally. Existing research primarily focuses on studying INDI on unmanned aerial vehicles, and there is barely any research on controlling marine vehicles using INDI. The control task we perform is a 90-degree pitch-up maneuver, where the dual-arm intervention AUV Cuttlefish transitions from a horizontal traveling pose to a vertical intervention pose. We compare INDI to a classical model-based control scheme in the maritime test basin at DFKI RIC, Germany, and find that INDI keeps the AUV much steadier both in the transitioning phase and in the station-keeping phase.
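A compact sketch of the INDI control increment, assuming a placeholder control-effectiveness matrix for illustration; the paper's implementation additionally handles filtering of the acceleration feedback and the actuator dynamics.

```python
import numpy as np

def indi_increment(u_prev, omega_dot_meas, omega_dot_des, G):
    """Incremental nonlinear dynamic inversion: update the actuator command by
    inverting only the control-effectiveness matrix G, using measured angular
    acceleration feedback instead of a full hydrodynamic model."""
    delta = np.linalg.pinv(G) @ (omega_dot_des - omega_dot_meas)
    return u_prev + delta

# Placeholder effectiveness matrix mapping 4 thruster commands to body
# angular accelerations (roll, pitch, yaw); not the Cuttlefish's actual G.
G = np.array([[ 0.8, -0.8,  0.0,  0.0],
              [ 0.0,  0.0,  0.9, -0.9],
              [ 0.4,  0.4, -0.4, -0.4]])
u_prev = np.zeros(4)
omega_dot_meas = np.array([0.02, -0.01, 0.00])   # filtered gyro differentiation
omega_dot_des = np.array([0.00,  0.50, 0.00])    # pitch-up command
print("updated thruster commands:", indi_increment(u_prev, omega_dot_meas, omega_dot_des, G))
```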
|
|
14:00-15:00, Paper TuPIT7.13 | |
Robot Guided Evacuation with Viewpoint Constraints |
|
Gong, Chen | Singapore University of Technology and Design |
Meghjani, Malika | Singapore University of Technology and Design |
Prasetyo, Marcel Bartholomeus | Singapore University of Technology and Design |
Keywords: Motion Control, Search and Rescue Robots, Human-Aware Motion Planning
Abstract: We present a viewpoint-based non-linear Model Predictive Control (MPC) for evacuation guiding robots. Specifically, the proposed MPC algorithm enables evacuation guiding robots to track and guide cooperative human targets in emergency scenarios. Our algorithm accounts for the environment layout as well as the distance between the robot and the human target and the distance to the goal location. A key challenge for an evacuation guiding robot is the trade-off between its planned motion for leading the target toward a goal position and staying in the target's viewpoint while maintaining line-of-sight for guiding. We illustrate the effectiveness of our proposed evacuation guiding algorithm in both simulated and real-world environments with an Unmanned Aerial Vehicle (UAV) guiding a human. Our results suggest that using contextual information from the environment for motion planning increases the visibility of the guiding UAV to the human while achieving a faster total evacuation time.
|
|
14:00-15:00, Paper TuPIT7.14 | |
Virtual Model Control for Compliant Reaching under Uncertainties |
|
Zhang, Yi | University of Cambridge |
Larby, Daniel | University of Cambridge |
Iida, Fumiya | University of Cambridge |
Forni, Fulvio | University of Cambridge |
Keywords: Motion Control, Compliance and Impedance Control, Whole-Body Motion Planning and Control
Abstract: Virtual Model Control (VMC) is an approach to designing controllers for force-controlled robots in complex, uncertain environments. While this method was primarily investigated for legged robot locomotion in the past, it is more generally applicable to other types of robotic systems. This paper investigates the VMC framework for reaching tasks with a force-controlled robotic arm. We propose six different approaches to designing virtual models in order to achieve reaching tasks in environments with obstacles and uncertainties. A force-controlled 8-degree-of-freedom humanoid robot was used to validate the proposed approach in the real world. We conducted three experiments to test the performance of VMC controllers in terms of predictability, sensitivity to external force, and adaptability against known and unknown obstacles. Experimental analyses show that, even though the proposed approach sacrifices accuracy and trajectory optimality, it enables us to design complex reaching motions under uncertainties in an intuitive and extendable manner.
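To illustrate the virtual-model idea, the sketch below maps an attractive virtual spring to the goal and a repulsive virtual spring from an obstacle into joint torques via the Jacobian transpose; the gains, obstacle model, and Jacobian are assumptions, not one of the six designs evaluated in the paper.

```python
import numpy as np

def vmc_torques(jacobian, x_ee, x_goal, x_obs,
                k_attract=80.0, k_repel=2.0, d_influence=0.3):
    """Virtual model control sketch: an attractive spring to the goal and a
    repulsive potential from a nearby obstacle, both mapped to joint torques
    with the Jacobian transpose. Gains and the obstacle model are illustrative."""
    f = k_attract * (x_goal - x_ee)
    diff = x_ee - x_obs
    dist = np.linalg.norm(diff)
    if dist < d_influence:                        # obstacle inside influence zone
        f += k_repel * (1.0 / dist - 1.0 / d_influence) * diff / dist**2
    return jacobian.T @ f

J = np.array([[0.3, 0.1, 0.0],                    # placeholder 3x3 Jacobian
              [0.0, 0.25, 0.1],
              [0.1, 0.0, 0.2]])
tau = vmc_torques(J, x_ee=np.array([0.4, 0.2, 0.5]),
                  x_goal=np.array([0.6, 0.0, 0.4]),
                  x_obs=np.array([0.45, 0.15, 0.5]))
print("joint torques:", tau)
```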
|
|
TuPIT8 |
Room 8 |
Robot Calibration and Identification |
Teaser Session |
Co-Chair: He, Yuesheng | Shanghai Jiao Tong University |
|
14:00-15:00, Paper TuPIT8.1 | |
Online Adaptation of Learned Vehicle Dynamics Model with Meta-Learning Approach |
|
Tsuchiya, Yuki | Toyota Motor Corporation |
Balch, Thomas | Toyota Research Institute |
Drews, Paul | Toyota Research Institute |
Rosman, Guy | Massachusetts Institute of Technology |
Keywords: Continual Learning, Model Learning for Control, Field Robots
Abstract: We represent a vehicle dynamics model for autonomous driving near the limits of handling via a multi-layer neural network. Online adaptation is desirable in order to address unseen environments. However, the model needs to adapt to new environments without forgetting previously encountered ones. In this study, we apply Continual-MAML to overcome this difficulty. It enables the model to adapt to previously encountered environments quickly and efficiently by starting updates from optimized initial parameters. We evaluate the impact of online model adaptation on inference performance and on the control performance of a model predictive path integral (MPPI) controller using the TRIKart platform. The neural network was pre-trained using driving data collected in our test environment, and experiments for online adaptation were executed on multiple different road conditions not contained in the training data. Empirical results show that the model using Continual-MAML outperforms the fixed model and the model using gradient descent in test-set loss and in online tracking performance of MPPI.
|
|
14:00-15:00, Paper TuPIT8.2 | |
An Online Automatic Calibration Method for Infrastructure-Based LiDAR-Camera Via Cross-Modal Object Matching |
|
Wang, Tao | Shanghai Jiao Tong University |
He, Yuesheng | Shanghai Jiao Tong University |
Zhuang, Hanyang | Shanghai Jiao Tong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Sensor Networks, Sensor Fusion, Intelligent Transportation Systems
Abstract: In indoor environments where the Global Navigation Satellite System (GNSS) is not available, an infrastructure-based LiDAR-camera joint array can provide high-precision localization for mobile robots, such as in Autonomous Valet Parking (AVP). The primary challenge in employing the infrastructure-based LiDAR-camera joint array is the extrinsic calibration between the LiDAR and the camera. Moreover, to handle interference deviations caused by vibrations or inadequate mounting stiffness during operation, the extrinsic calibration parameters must be automatically updated online, placing higher demands on infrastructure-based LiDAR-camera extrinsic calibration. This paper proposes an infrastructure LiDAR-camera online automatic calibration method based on prior knowledge of cross-modal target registration. The method requires no manual targets or initial pose guesses and can achieve extrinsic calibration. The object-prior model, based on a lightweight object detection algorithm, can rapidly detect scenes favorable for extrinsic calibration in sub-images of camera images. This creates favorable conditions for cross-modal network registration and LiDAR-camera pose optimization. Additionally, because a lightweight algorithm is used, the process does not compromise efficiency or consume excessive computational resources. Experimental results demonstrate that the proposed calibration method is suitable for calibrating infrastructure-based LiDAR-camera systems, with comparable accuracy and the ability to perform online calibration. Comparative experiments also show that the object-prior model can indeed select better scenes for LiDAR-camera extrinsic calibration, thus improving the accuracy and stability of extrinsic calibration to some extent.
|
|
14:00-15:00, Paper TuPIT8.3 | |
EasyHeC++: Fully Automatic Hand-Eye Calibration with Pretrained Image Models |
|
Hong, Zhengdong | Zhejiang University |
Zheng, Kangfu | Tsinghua University |
Chen, Linghao | Zhejiang University |
Keywords: Calibration and Identification, Computer Vision for Automation, Recognition
Abstract: Hand-eye calibration plays a fundamental role in robotics by directly influencing the efficiency of critical operations such as manipulation and grasping. In this work, we present a novel framework, EasyHeC++, designed for fully automatic hand-eye calibration. In contrast to previous methods that necessitate manual calibration, specialized markers, or the training of arm-specific neural networks, our approach is the first system that enables accurate calibration of any robot arm in a marker-free, training-free, and fully automatic manner. Our approach employs a two-step process. First, we initialize the camera pose using a sampling or feature-matching-based method with the aid of pretrained image models. Subsequently, we perform pose optimization through differentiable rendering. Extensive experiments demonstrate the system's superior accuracy in both synthetic and real-world datasets across various robot arms and camera settings. Project page: https://ootts.github.io/easyhec_plus.
|
|
14:00-15:00, Paper TuPIT8.4 | |
A Direct Algorithm for Multi-Gyroscope Infield Calibration |
|
Wang, Tianheng | Apple |
Roumeliotis, Stergios | Apple Inc |
Keywords: Calibration and Identification, Sensor Networks, Visual-Inertial SLAM
Abstract: In this paper, we address the problem of estimating the rotational extrinsics, as well as the scale factors of two gyroscopes rigidly mounted on the same device. In particular, we formulate the problem as a least-squares minimization and introduce a direct algorithm that computes the estimated quantities without any iterations, hence avoiding local minima and improving efficiency. Furthermore, we show that the rotational extrinsics are always observable while the scale factors can be determined up to global scale for general configurations of the gyroscopes. To this end, we also study special placements of the gyroscopes where a pair, or all, of their axes are parallel and analyze their impact on the scale factors’ observability. Lastly, we evaluate our algorithm in simulations and real-world experiments to assess its performance as a function of key motion and sensor characteristics.
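For intuition, a closed-form alignment of two rigidly mounted gyroscopes can be cast as a Wahba/Kabsch problem; the sketch below estimates a rotation and a single common scale from synthetic data, whereas the paper's direct algorithm also handles per-gyroscope scale factors and observability analysis.

```python
import numpy as np

def align_gyros(w1, w2):
    """Closed-form estimate of rotation R and a single scale s such that
    w2 ~= s * R @ w1 for rigidly mounted gyros (Kabsch-style SVD solution).
    This is only the simplest case of the problem treated in the paper."""
    H = w1.T @ w2                                   # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # rotation mapping frame 1 -> 2
    rotated = (R @ w1.T).T
    s = np.sum(rotated * w2) / np.sum(rotated * rotated)
    return R, s

# Synthetic check: a shared angular-velocity signal seen by both gyros.
rng = np.random.default_rng(1)
w1 = rng.normal(size=(500, 3))
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                              # force a proper rotation
w2 = 1.02 * (R_true @ w1.T).T + 0.01 * rng.normal(size=(500, 3))
R_est, s_est = align_gyros(w1, w2)
print("scale:", round(s_est, 3),
      " rotation recovered:", np.linalg.norm(R_est - R_true) < 0.05)
```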
|
|
14:00-15:00, Paper TuPIT8.5 | |
Sensor-Agnostic Visuo-Tactile Robot Calibration Exploiting Assembly-Precision Model Geometries |
|
Gomes, Manuel | University of Aveiro |
Görner, Michael | University of Hamburg |
Oliveira, Miguel | University of Aveiro |
Zhang, Jianwei | University of Hamburg |
Keywords: Calibration and Identification, Sensor Fusion, Kinematics
Abstract: Visual sensor modalities dominate traditional robot calibration, but when environment contacts are relevant, tactile sensing provides another natural, accurate, and highly relevant modality. Most existing tactile sensing methods for robot calibration are constrained to specific sensor-object pairs, limiting their applicability. This paper pioneers a general approach to exploiting contacts in robot calibration, supporting self-touch throughout the entire system kinematics by generalizing touchable surfaces to any accurately represented mesh surface. The approach supports different contact sensors as long as a simple single-contact interface can be provided. Integrated into the Atomic Transformation Optimization Method (ATOM) calibration methodology, our work facilitates seamless integration of both modalities in a single approach. Our results demonstrate comparable performance to single-modality calibration but can trade off accuracy between both modalities, thus increasing overall robustness. Furthermore, we observe that utilizing a touch point at the end of a kinematic chain slightly improves calibration over touching the chain links with an external sensor, but find no significant advantage of restricting touch to end-effector contacts when calibrating a dual-arm system with our method.
|
|
14:00-15:00, Paper TuPIT8.6 | |
Extrinsic Calibration of Multiple LiDARs for a Mobile Robot Based on Floor Plane and Object Segmentation |
|
Niijima, Shun | Sony Group Corporation |
Suzuki, Atsushi | Sony Group Corporation |
Tsuzaki, Ryoichi | Sony Group Corporation |
Kinoshita, Masaya | Sony Group Corporation |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: Mobile robots equipped with multiple light detection and ranging (LiDAR) sensors and capable of recognizing their surroundings are increasing due to the miniaturization and cost reduction of LiDAR. This paper proposes a target-less extrinsic calibration method for multiple LiDARs with non-overlapping fields of view (FoV). The proposed method uses point clouds of the floor plane and objects accumulated while in motion. It enables accurate calibration even with challenging configurations of LiDARs directed towards the floor plane, where biased feature values otherwise degrade accuracy. Additionally, the method includes a noise removal module that considers the scanning pattern to address bleeding points, which are a significant source of error in point cloud alignment using high-density LiDARs. Evaluations in simulation demonstrate that the proposed method achieves higher-accuracy extrinsic calibration with two and four LiDARs than conventional methods, regardless of the type of objects. Furthermore, experiments using a real mobile robot have shown that our proposed noise removal module can eliminate noise more precisely than conventional methods, and that the estimated extrinsic parameters successfully create consistent 3D maps.
|
|
14:00-15:00, Paper TuPIT8.7 | |
Research of Calibration Method for Fusion of LDS Sensor and ToF Low-Cost Sensor |
|
Zhu, Jiahui | Ningbo University |
Yu, Guitao | Healthy & Intelligent Kitchen Engineering Research Center of Z |
He, Yang | Healthy & Intelligent Kitchen Engineering Research Center of Zhe |
Yang, Kui | Ningbo University |
Liang, Dongtai | Ningbo University |
Keywords: Calibration and Identification, SLAM, Wheeled Robots
Abstract: This paper proposes a method for calibrating the extrinsic parameters of an LDS sensor and a ToF depth camera based on three cylinders. The method obtains scanning data of the side surfaces of the three cylinders at different postures by changing the posture of the robot. For the single-line laser plane scanned by the LDS sensor, three elliptical contours are obtained by intersecting it with the side surfaces of the three cylinders. The Random Sample Consensus (RANSAC) algorithm is used to obtain the coordinates of the center points of the three elliptical contours and two random points on each elliptical contour. For the three-dimensional point cloud of the cylinders scanned by the ToF depth camera, the RANSAC algorithm is used to fit the central axes of the three cylinders. A nonlinear optimization problem is established using the three center points obtained from the three elliptical contours and the distances from the two random points on each elliptical contour to their corresponding central axes. We propose a fusion of the Powell algorithm and the BFGS algorithm to solve the nonlinear optimization equations and obtain the transformation matrix between the LDS sensor and the ToF depth camera. Finally, simulations and actual tests are carried out based on the proposed method, and the influence of the initial value of the calibration parameters on the calibration result is discussed. The accuracy of the proposed calibration algorithm is verified through comparative experiments. The results show that the calibration accuracy of the proposed method is better than that of the traditional planar calibration method, and that the method is simple to operate while achieving high calibration accuracy.
|
|
14:00-15:00, Paper TuPIT8.8 | |
Efficient Extrinsic Self-Calibration of Multiple IMUs Using Measurement Subset Selection |
|
Lee, Jongwon | University of Illinois Urbana-Champaign |
Hanley, David | University of Edinburgh |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Keywords: Calibration and Identification, Sensor Fusion, Space Robotics and Automation
Abstract: This paper addresses the problem of choosing a sparse subset of measurements for quick calibration parameter estimation. A standard solution is to select a measurement only if its utility, the difference between the posterior information (with the measurement) and the prior information (without it), exceeds some threshold. Theoretically, utility, a function of the parameter estimate, should be evaluated at the estimate obtained with all measurements selected so far, hence necessitating a recalibration with each new measurement. However, we hypothesize that utility is insensitive to changes in the parameter estimate for many systems of interest, suggesting that evaluating utility at some initial parameter guess would yield equivalent results in practice. We provide evidence supporting this hypothesis for extrinsic calibration of multiple inertial measurement units (IMUs), showing a reduction in calibration time of two orders of magnitude by forgoing recalibration for each measurement.
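A minimal sketch of utility-thresholded measurement selection, evaluating each measurement Jacobian at a fixed initial parameter guess as the paper hypothesizes; the Jacobians, prior, and threshold here are toy placeholders rather than IMU calibration models.

```python
import numpy as np

def utility(info_prior, H, meas_var=1e-4):
    """Information gain of one measurement: difference of log-determinants of
    the Fisher information with and without it (Jacobian H evaluated at a
    fixed initial parameter guess)."""
    info_post = info_prior + H.T @ H / meas_var
    return np.linalg.slogdet(info_post)[1] - np.linalg.slogdet(info_prior)[1]

def select_measurements(jacobians, threshold=0.5, n_params=6, meas_var=1e-4):
    info = np.eye(n_params) * 1e-3          # weak prior information
    selected = []
    for i, H in enumerate(jacobians):
        if utility(info, H, meas_var) > threshold:   # keep only informative ones
            info += H.T @ H / meas_var
            selected.append(i)
    return selected

# Toy stream of 1x6 measurement Jacobians (placeholders, not IMU models).
rng = np.random.default_rng(2)
jacobians = [rng.normal(size=(1, 6)) for _ in range(200)]
print("kept", len(select_measurements(jacobians)), "of 200 measurements")
```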
|
|
14:00-15:00, Paper TuPIT8.9 | |
MFCalib: Single-Shot and Automatic Extrinsic Calibration for LiDAR and Camera in Targetless Environments Based on Multi-Feature Edge |
|
Ye, Tianyong | Shenzhen University |
Xu, Wei | Manifold Tech Limited |
Zheng, Chunran | The University of Hong Kong |
Cui, Yukang | Shenzhen University |
Keywords: Calibration and Identification, SLAM, Sensor Fusion
Abstract: This paper presents MFCalib, an innovative extrinsic calibration technique for LiDAR and RGB camera that operates automatically in targetless environments with a single data capture. At the heart of this method is using a rich set of edge information, significantly enhancing calibration accuracy and robustness. Specifically, we extract both depth-continuous and depth-discontinuous edges, along with intensity-discontinuous edges on planes. This comprehensive edge extraction strategy ensures our ability to achieve accurate calibration with just one round of data collection, even in complex and varied settings. Addressing the uncertainty of depth-discontinuous edges, we delve into the physical measurement principles of LiDAR and develop a beam model, effectively mitigating the issue of edge inflation caused by the LiDAR beam. Extensive experiment results demonstrate that MFCalib outperforms the state-of-the-art targetless calibration methods across various scenes, achieving and often surpassing the precision of multi-scene calibrations in a single-shot collection. To support community development, we make our code available open-source on GitHub.
|
|
14:00-15:00, Paper TuPIT8.10 | |
A Graph-Based Self-Calibration Technique for Cable-Driven Robots with Sagging Cable |
|
Dindarloo, Mohammadreza | K. N. Toosi University of Technology |
Mirjalili, Amir Saman | K. N. Toosi University of Technology |
Khalilpour, S. Ahmad | K. N. Toosi University of Technology |
Khorrambakht, Rooholla | New York University |
Weiss, Stephan | Universität Klagenfurt |
Taghirad, Hamid D. | K.N.Toosi University of Technology |
Keywords: Calibration and Identification, Parallel Robots, Kinematics
Abstract: The efficient operation of large-scale Cable-Driven Parallel Robots (CDPRs) relies on precise calibration of kinematic parameters and the simplicity of the calibration process. This paper presents a graph-based self-calibration framework that explicitly addresses cable sag effects and facilitates the calibration procedure for large-scale CDPRs by only relying on internal sensors. A unified factor graph is proposed, incorporating a catenary cable model to capture cable sagging. The factor graph iteratively refines kinematic parameters, including anchor point locations and initial cable length, by considering jointly onboard sensor data and the robot's kineto-static model. The applicability and accuracy of the proposed technique are demonstrated through Finite Element (FE) simulations, on both large and small-scale CDPRs subjected to significant initialization perturbations.
|
|
14:00-15:00, Paper TuPIT8.11 | |
Uncertainty-Aware Deployment of Pre-Trained Language-Conditioned Imitation Learning Policies |
|
Wu, Bo | University of Pennsylvania |
Lee, Bruce | University of Pennsylvania |
Daniilidis, Kostas | University of Pennsylvania |
Bucher, Bernadette | University of Michigan |
Matni, Nikolai | University of Pennsylvania |
Keywords: Calibration and Identification, Imitation Learning, Perception-Action Coupling
Abstract: Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. To address this challenge, we propose an approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is available at https://github.com/BobWu1998/uncertainty_quant_all.git
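Temperature scaling itself is a small, standard procedure; the sketch below fits the single temperature on held-out logits with PyTorch. The data here are synthetic placeholders, and the paper's uncertainty-aware action aggregation is not shown.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels):
    """Temperature scaling: fit a scalar T > 0 on held-out data so that
    softmax(logits / T) is better calibrated (minimizes NLL)."""
    log_t = torch.zeros(1, requires_grad=True)       # optimize log T for positivity
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Toy overconfident logits (placeholder data, not a pre-trained policy).
torch.manual_seed(0)
labels = torch.randint(0, 5, (512,))
logits = 4.0 * F.one_hot(labels, 5).float() + torch.randn(512, 5)
logits[::4] = torch.randn(128, 5) * 4.0              # a quarter are wrong but confident
T = fit_temperature(logits, labels)
print(f"fitted temperature: {T:.2f}")                # T > 1 softens the softmax
```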
|
|
14:00-15:00, Paper TuPIT8.12 | |
MEMROC: Multi-Eye to Mobile RObot Calibration |
|
Allegro, Davide | University of Padova |
Terreran, Matteo | University of Padova |
Ghidoni, Stefano | University of Padova |
Keywords: Calibration and Identification, Wheeled Robots, Sensor Networks
Abstract: This paper presents MEMROC (Multi-Eye to Mobile RObot Calibration), a novel motion-based calibration method that simplifies the process of accurately calibrating multiple cameras relative to a mobile robot’s reference frame. MEMROC utilizes a known calibration pattern to facilitate accurate calibration with a lower number of images during the optimization process. Additionally, it leverages robust ground plane detection for comprehensive 6-DoF extrinsic calibration, overcoming a critical limitation of many existing methods that struggle to estimate the complete camera pose. The proposed method addresses the need for frequent recalibration in dynamic environments, where cameras may shift slightly or alter their positions due to daily usage, operational adjustments, or vibrations from mobile robot movements. MEMROC exhibits remarkable robustness to noisy odometry data, requiring minimal calibration input data. This combination makes it highly suitable for daily operations involving mobile robots. A comprehensive set of experiments on both synthetic and real data proves MEMROC’s efficiency, surpassing existing state-of-the-art methods in terms of accuracy, robustness, and ease of use. To facilitate further research, we have made our code publicly available: https://github.com/davidea97/MEMROC.git
|
|
14:00-15:00, Paper TuPIT8.13 | |
V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems |
|
Qu, Luca | School of Vehicle and Mobility, Tsinghua University |
Xiong, Yijin | Tsinghua University |
Zhang, Guipeng | Institute of Computing Technology of the Chinese Academy of Scie |
Wu, Xin | Beijing Jiaotong University |
Gao, Xiaohan | Tsinghua University |
Gao, Xin | China University of Mining & Technology, Beijing |
Li, Hanyu | Beijing Jiaotong University |
Guo, Shichun | Tsinghua University |
Zhang, Guoying | China University of Mining & Technology, Beijing |
Keywords: Calibration and Identification, Intelligent Transportation Systems, Localization
Abstract: Cooperative LiDAR systems integrating vehicles and road infrastructure exhibit substantial potential, yet their deployment encounters numerous challenges. A pivotal aspect of ensuring data accuracy and consistency across such systems is the calibration of LiDAR units across heterogeneous vehicular and infrastructural endpoints, termed V2I calibration. This necessitates calibration methods that are both real-time and robust, particularly methods that can ensure robust performance in urban canyon scenarios without relying on initial positioning values. Accordingly, this paper introduces a novel approach to V2I calibration, leveraging spatial association information among perceived objects. Central to this method is the innovative Overall Intersection over Union (oIoU) metric, which quantifies the correlation between targets identified by the vehicle and infrastructure systems, thereby facilitating real-time monitoring of calibration results. Our approach involves identifying common targets within the perception results of the vehicle and infrastructure LiDAR systems through the construction of an affinity matrix. These common targets then form the basis for the calculation and optimization of extrinsic parameters. Comparative and ablation studies conducted using the DAIR-V2X dataset substantiate the superiority of our approach. For further insights and resources, our project repository is accessible at https://github.com/MassimoQu/v2i-calib.
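To illustrate the flavor of an IoU-based affinity between vehicle-side and infrastructure-side detections, the sketch below matches axis-aligned BEV boxes with the Hungarian algorithm and averages the matched IoUs; this is a simplification for illustration only, not the paper's oIoU definition.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bev_iou(a, b):
    """Axis-aligned BEV IoU between boxes (x, y, w, l); a simplification of the
    paper's object overlap computation, which handles oriented boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def overall_iou(vehicle_boxes, infra_boxes_in_vehicle_frame):
    """Build an affinity matrix of pairwise IoUs, match with the Hungarian
    algorithm, and return the mean matched IoU as an overall agreement score
    (an oIoU-like quantity used here only for illustration)."""
    A = np.array([[bev_iou(v, i) for i in infra_boxes_in_vehicle_frame]
                  for v in vehicle_boxes])
    rows, cols = linear_sum_assignment(-A)           # maximize total IoU
    matched = A[rows, cols]
    return matched[matched > 0].mean() if (matched > 0).any() else 0.0

veh = np.array([[10.0, 2.0, 1.8, 4.5], [25.0, -3.0, 1.8, 4.5]])
infra = np.array([[10.3, 2.1, 1.8, 4.5], [24.6, -2.8, 1.8, 4.5]])
print(f"overall IoU under the current extrinsics: {overall_iou(veh, infra):.2f}")
```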
|
|
14:00-15:00, Paper TuPIT8.14 | |
A Piecewise-Weighted RANSAC Method Utilizing Abandoned Hypothesis Model Information with a New Application on Robot Self-Calibration |
|
He, Jianhui | Ningbo Institute of Materials Technology and Engineering, Chines |
Feng, Yiyang | Ningbo Institute of Material Technology & Engineering, CAS |
Yang, Guilin | Ningbo Institute of Material Technology and Engineering, Chines |
Shen, Wenjun | Ningbo Institute of Material Technology and Engineering, Chinese |
Chen, Silu | Ningbo Institute of Materials Technology and Engineering, CAS |
Zheng, Tianjiang | Ningbo Industrial Technology Research Institute |
Li, Junjie | Ningbo Institute of Material Technology and Engineering, Chinese |
Keywords: Calibration and Identification, Kinematics
Abstract: Industrial robots and collaborative robots are widely employed in industry and are progressively being utilized to assist individuals in their daily routines. To improve their absolute accuracy, self-calibration methods using portable local measurement devices are cost-effective solutions. However, compared with conventional external calibration methods, self-calibration methods employing two configurations as a calibration sample introduce more non-kinematic errors to the robot. Therefore, noise reduction is essential in self-calibration. A novel Piecewise-weighted Random Sample Consensus (RANSAC) method is proposed in this paper. Instead of choosing an optimal model with all inliers, the proposed method employs a general weight considering both the sample and hypothesis model qualities to generate a new model with the Weighted Least Squares (WLS) method. Moreover, the proposed method turns the goal of finding an uncontaminated set of inliers into the training of a proper weight coefficient for WLS, which not only improves accuracy but also greatly enhances speed. A self-calibration experiment on a 6-degree-of-freedom (DOF) CR10 robot shows that the proposed Piecewise-weighted RANSAC method achieves a 27.7% accuracy improvement over the Least Squares method, a 20.0% accuracy improvement over the standard RANSAC method, and a 5.5% accuracy improvement over the LO-RANSAC method. In addition, the proposed method is over 10.9 times faster than the standard RANSAC method and 18.6 times faster than the LO-RANSAC method.
|
|
14:00-15:00, Paper TuPIT8.15 | |
A Hybrid Model and Learning-Based Force Estimation Framework for Surgical Robots |
|
Yang, Hao | Johns Hopkins University |
Zhou, Haoying | Worcester Polytechnic Institute |
Fischer, Gregory Scott | Worcester Polytechnic Institute, WPI |
Wu, Jie Ying | Vanderbilt University |
Keywords: Calibration and Identification, Surgical Robotics: Laparoscopy, Model Learning for Control
Abstract: Haptic feedback to the surgeon during robotic surgery would enable safer and more immersive surgeries but estimating tissue interaction forces at the tips of robotically controlled surgical instruments has proven challenging. Few existing surgical robots can measure interaction forces directly and the additional sensor may limit the life of instruments. We present a hybrid model and learning-based framework for force estimation for the Patient Side Manipulators (PSM) of a da Vinci Research Kit (dVRK). The model-based component identifies the dynamic parameters of the robot and estimates free-space joint torque, while the learning-based component compensates for environmental factors, such as the additional torque caused by trocar interaction between the PSM instrument and the patient's body wall. We evaluate our method in an abdominal phantom and achieve an error in force estimation of under 10% normalized root-mean-squared error. We show that by using a model-based method to perform dynamics identification, we reduce reliance on the training data covering the entire workspace. Although originally developed for the dVRK, the proposed method is a generalizable framework for other compliant surgical robots. The code is available at https://github.com/vu-maple-lab/dvrk_force_estimation.
|
|
14:00-15:00, Paper TuPIT8.16 | |
Asynchronous Microphone Array Calibration Using Hybrid TDOA Information |
|
Zhang, Chengjie | Southern University of Science and Technology |
Wang, Jiang | Southern University of Science and Technology |
Kong, He | Southern University of Science and Technology |
Keywords: Robot Audition, Calibration and Identification
Abstract: Asynchronous microphone array calibration is a prerequisite for many robot audition applications. A popular solution to this calibration problem is the batch form of Simultaneous Localisation and Mapping (SLAM), using time difference of arrival measurements between two microphones (TDOA-M) and the odometry of the robot (which serves as a moving sound source during calibration). In this paper, we introduce a new form of measurement for microphone array calibration, i.e. the time difference of arrival between adjacent sound events (TDOA-S) with respect to the microphone channels. We propose to use TDOA-S and TDOA-M, called hybrid TDOA, together with odometry measurements for batch SLAM-based calibration of asynchronous microphone arrays. Extensive simulation and real-world experiments show that our method is more independent of microphone number, less sensitive to initial values (when using off-the-shelf algorithms such as Gauss-Newton iterations), and has better calibration accuracy and robustness under various TDOA noises. Simulation results also demonstrate that our method has a lower Cramér-Rao lower bound (CRLB) for the microphone parameters. To benefit the community, we open-source our code and data at https://github.com/AISLAB-sustech/Hybrid-TDOA-Calib.
|
|
TuPIT9 |
Room 9 |
Intelligent Transportation |
Teaser Session |
Chair: Lin, Ming C. | University of Maryland at College Park |
Co-Chair: Walas, Krzysztof, Tadeusz | Poznan University of Technology |
|
14:00-15:00, Paper TuPIT9.1 | |
NeSyMoF: A Neuro-Symbolic Model for Motion Forecasting |
|
Doula, Achref | Technical University of Darmstadt |
Yin, Huijie | Technical University of Darmstadt |
Mühlhäuser, Max | Technical University of Darmstadt |
Sanchez Guinea, Alejandro | TU Darmstadt |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Recent advancements in deep learning have significantly enhanced the development of efficient models for multi-modal path prediction within urban environments, offering approaches to navigate complex environments accurately. Despite their performance, models grounded in deep learning techniques frequently encounter challenges related to interpretability. This limitation not only hampers their practical application but also complicates the process of diagnosing and rectifying errors within these systems, which is a critical factor for ensuring reliability and safety in real-world deployments. In this paper, we propose NeSyMoF, a Neuro-Symbolic model for Motion Forecasting, to address this critical gap by combining the predictive power of deep neural networks with the interpretable logic inherent in symbolic reasoning. Data processing in NeSyMoF involves extracting pertinent features from the agent's environment and channeling them into a neuro-symbolic reasoning module. The neuro-symbolic reasoning module generates first-order logic rules that describe and condition the path prediction process, thereby providing clear explanations of and intentions behind the model's forecasts. We evaluate our model on the Argoverse benchmark for path forecasting, as it includes the challenging driving situations necessary for an extensive evaluation. The results show that NeSyMoF outperforms state-of-the-art interpretable models for single-mode predictions while providing logic-based explanations that articulate the reasoning behind its forecasts, making NeSyMoF better suited for human-centric applications.
|
|
14:00-15:00, Paper TuPIT9.2 | |
Improving Behavior Profile Discovery for Vehicles |
|
de Moura Martins Gomes, Nelson | ISIR |
Garrido Carpio, Fernando José | Valeo |
Nashashibi, Fawzi | INRIA |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems, Big Data in Robotics and Automation
Abstract: Multiple approaches have already been proposed to mimic real driver behaviors in simulation. This article proposes a new one, based solely on undisturbed observations of intersections, from which the behavior profiles for each macro-maneuver are discovered. Using the macro-maneuvers already identified in previous works, a method for comparing trajectories of different lengths using an Extended Kalman Filter (EKF) is proposed, which, combined with an Expectation-Maximization (EM) inspired method, defines the clusters that represent the observed behaviors. This is paired with a Kullback-Leibler divergence (KL) criterion to decide when clusters need to be split or merged. Finally, the behaviors for each macro-maneuver are determined by the discovered clusters, without using any map information about the environment and while remaining dynamically consistent with vehicle motion. From these observations it becomes clear that the two main factors in driver behavior are assertiveness and interaction with other road users.
|
|
14:00-15:00, Paper TuPIT9.3 | |
Applying Neural Monte Carlo Tree Search to Unsignalized Multi-Intersection Scheduling for Autonomous Vehicles |
|
Shi, Yucheng | Trinity College Dublin |
Wang, Wenlong | Trinity College Dublin |
Tao, Xiaowen | Trinity College Dublin |
Dusparic, Ivana | Trinity College Dublin |
Cahill, Vinny | Trinity College Dublin |
Keywords: Intelligent Transportation Systems, Collision Avoidance, Agent-Based Systems
Abstract: Dynamic scheduling of access to shared resources by autonomous systems is a challenging problem, characterized as being NP-hard. The complexity of this task leads to a combinatorial explosion of possibilities in highly dynamic systems where arriving requests must be continuously scheduled subject to strong safety and time constraints. An example of such a system is an unsignalized intersection, where automated vehicles' access to potential conflict zones must be dynamically scheduled. In this paper, we apply Neural Monte Carlo Tree Search (NMCTS) to the challenging task of scheduling platoons of vehicles crossing unsignalized intersections. Crucially, we introduce a transformation model that maps successive sequences of potentially conflicting road-space reservation requests from platoons of vehicles into a series of board-game-like problems and use NMCTS to search for solutions representing optimal road-space allocation schedules in the context of past allocations. To optimize the search, we incorporate a prioritized re-sampling method with parallel NMCTS (PNMCTS) to improve the quality of training data. To optimize training, a curriculum learning strategy is used to train the agent to schedule progressively more complex boards, culminating in overlapping boards that represent busy intersections. In a single four-way unsignalized intersection simulation, PNMCTS managed 95% of new high-density scenarios, reducing crossing time by 43% in light and 52% in heavy traffic versus first-in first-out control. In a 3x3 multi-intersection network, the proposed method maintained free flow in light traffic when all intersections were under the control of PNMCTS, and outperformed state-of-the-art RL-based traffic-light control in average travel time by 74.5% and total throughput by 16% in heavy traffic.
|
|
14:00-15:00, Paper TuPIT9.4 | |
Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic |
|
Zheng, Laura | University of Maryland, College Park |
Son, Sanghyun | University of Maryland |
Liang, Jing | University of Maryland |
Wang, Xijun | University of Maryland, College Park |
Clipp, Brian | Kitware Inc |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Intelligent Transportation Systems, Agent-Based Systems, Autonomous Vehicle Navigation
Abstract: In trajectory forecasting tasks for traffic, future output trajectories can be computed by advancing the ego vehicle's state with predicted actions according to a kinematics model. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories should not only be kinematically feasible but also relate uncertainty from one timestep to the next. While current works in probabilistic prediction do incorporate kinematic priors for mean trajectory prediction, variance is often left as a learnable parameter, despite uncertainty in one time step being inextricably tied to uncertainty in the previous time step. In this paper, we show simple and differentiable analytical approximations describing the relationship between variance at one timestep and that at the next with the kinematic bicycle model. In our results, we find that encoding the relationship between variance across timesteps works especially well in non-ideal settings, such as with small or noisy datasets. We observe up to a 50% performance boost in partial-dataset settings and up to an 8% performance boost in large-scale learning compared to previous kinematic prediction methods on SOTA trajectory forecasting architectures out of the box, with no fine-tuning.
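A generic way to relate variance across timesteps is first-order propagation through the kinematic bicycle model, sketched below; the paper derives its own analytical approximations, so the Jacobian-based propagation here only approximates that idea, and the numbers are placeholders.

```python
import numpy as np

def bicycle_step(state, action, dt=0.1, L=2.9):
    """Kinematic bicycle: state = (x, y, yaw, v), action = (accel, steer)."""
    x, y, yaw, v = state
    a, delta = action
    return np.array([x + v * np.cos(yaw) * dt,
                     y + v * np.sin(yaw) * dt,
                     yaw + v / L * np.tan(delta) * dt,
                     v + a * dt])

def propagate(state, action, P_state, P_action, dt=0.1, L=2.9):
    """First-order propagation of mean and covariance through one bicycle step
    (a generic linearization, not necessarily the paper's exact relations)."""
    x, y, yaw, v = state
    a, delta = action
    F = np.array([[1, 0, -v * np.sin(yaw) * dt, np.cos(yaw) * dt],
                  [0, 1,  v * np.cos(yaw) * dt, np.sin(yaw) * dt],
                  [0, 0, 1, np.tan(delta) / L * dt],
                  [0, 0, 0, 1]])
    G = np.array([[0, 0],
                  [0, 0],
                  [0, v / (L * np.cos(delta) ** 2) * dt],
                  [dt, 0]])
    mean = bicycle_step(state, action, dt, L)
    cov = F @ P_state @ F.T + G @ P_action @ G.T
    return mean, cov

state = np.array([0.0, 0.0, 0.1, 8.0])
P = np.diag([0.05, 0.05, 0.01, 0.2])
mean, cov = propagate(state, np.array([0.5, 0.02]), P, np.diag([0.3, 0.005]))
print("next-step position std:", np.sqrt(np.diag(cov))[:2])
```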
|
|
14:00-15:00, Paper TuPIT9.5 | |
Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator |
|
Yang, Donglin | BUAA |
Cai, Xinyu | Shanghai AI Laboratory |
Liu, Zhenfeng | Nankai University |
Jiang, Wentao | Beihang University |
Zhang, Bo | Shanghai Artificial Intelligence Laboratory |
Yan, Guohang | Shanghai AI Laboratory |
Gao, Xing | Shanghai AI Lab |
Liu, Si | Beihang University |
Shi, Botian | Shanghai AI Laboratory |
Keywords: Intelligent Transportation Systems, Simulation and Animation, Computer Vision for Transportation
Abstract: Data augmentation methods to enhance perception performance in adverse weather have recently attracted considerable attention. Most LiDAR data augmentation methods post-process existing datasets using physics-based models or machine-learning methods. However, due to the limited environmental annotations and the fixed vehicle trajectories in existing datasets, it is challenging to edit the scene and expand the diversity of traffic flow and scenarios. To this end, we propose a simulator-based physical modeling approach to augment LiDAR data in rainy weather, enhancing the performance of the perception model. We complete the modeling of the rainy weather effect in the CARLA simulator and establish a data collection pipeline for LiDAR. Furthermore, we pay special attention to the spray generated by vehicles in rainy weather and simulate this phenomenon through the Spray Emitter method we developed. In addition, considering the influence of different weather conditions on point cloud intensity, we develop a prediction network to forecast the intensity of the LiDAR echo. This enables us to complete the rainy weather simulation of 4D point cloud data. In the experiment, we observe that the model augmented with our synthetic dataset improves 3D object detection performance in rainy weather. Both code and dataset are available at https://github.com/PJLab-ADG/PCSim#rainypcsim.
|
|
14:00-15:00, Paper TuPIT9.6 | |
Multi-Agent Path Finding for Mixed Autonomy Traffic Coordination |
|
Zheng, Han | Massachusetts Institute of Technology |
Yan, Zhongxia | Massachusetts Institute of Technology |
Wu, Cathy | MIT |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: In the evolving landscape of urban mobility, the prospective integration of Connected and Automated Vehicles (CAVs) with Human-Driven Vehicles (HDVs) presents a complex array of challenges and opportunities for autonomous driving systems. While recent advancements in robotics have yielded Multi-Agent Path Finding (MAPF) algorithms tailored for agent coordination tasks characterized by simplified kinematics and complete control over agent behaviors, these solutions are inapplicable in mixed-traffic environments where uncontrollable HDVs must coexist and interact with CAVs. Addressing this gap, we propose Behavior Prediction Kinematic Priority Based Search (BK-PBS), which leverages an offline-trained conditional prediction model to forecast HDV responses to CAV maneuvers, integrating these insights into a Priority Based Search (PBS) whose A* search proceeds over motion primitives to accommodate kinematic constraints. We compare BK-PBS with CAV planning algorithms derived from rule-based car-following models and reinforcement learning (RL). Through comprehensive simulation of a highway merging scenario across diverse CAV penetration rates and traffic densities, BK-PBS outperforms these baselines in reducing collision rates and improving system-level travel delay. Our work is directly applicable to many scenarios of multi-human multi-robot coordination.
|
|
14:00-15:00, Paper TuPIT9.7 | |
SurrealDriver: Designing LLM-Powered Generative Driver Agent Framework Based on Human Drivers' Driving-Thinking Data |
|
Jin, Ye | Tsinghua University |
Yang, Ruoxuan | Tsinghua University |
Yi, Zhijie | Beijing Normal University |
Shen, Xiaoxi | City University of Hong Kong |
Huiling, Peng | Nankai University |
Liu, Xiaoan | New York University |
Qin, Jingli | Institute for AI Industry Research, Tsinghua University, Beijing |
Jiayang, Li | Tongji University |
Xie, Jintao | The Institute for AI Industry Research, Tsinghua University, Bei |
Gao, Peizhong | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Gong, Jiangtao | Tsinghua University |
Keywords: Intelligent Transportation Systems, Embodied Cognitive Science, Agent-Based Systems
Abstract: Leveraging the advanced reasoning capabilities and extensive world knowledge of large language models (LLMs) to construct generative agents for solving complex real-world problems is a major trend. However, LLMs inherently lack the embodiment of humans, resulting in suboptimal performance in many embodied decision-making tasks. In this paper, we introduce a framework for building human-like generative driving agents using post-driving self-reported driving-thinking data from human drivers as both demonstration and feedback. To capture high-quality, natural language data from drivers, we conducted urban driving experiments, recording drivers' verbalized thoughts under various conditions to serve as chain-of-thought prompts and demonstration examples for the LLM agent. The framework's effectiveness was evaluated through simulations and human assessments. Results indicate that incorporating expert demonstration data significantly reduced collision rates by 81.04% and increased human likeness by 50% compared to a baseline LLM-based agent. Our study provides insights into using natural language-based human demonstration data for embodied tasks. The driving-thinking dataset is available at https://github.com/AIR-DISCOVER/Driving-Thinking-Dataset.
|
|
14:00-15:00, Paper TuPIT9.8 | |
Learning Dynamics Models for Velocity Estimation in Autonomous Racing |
|
Węgrzynowski, Jan | IDEAS NCBR, Poznan University of Technology |
Czechmanowski, Grzegorz | IDEAS NCBR, Poznan University of Technology |
Kicki, Piotr | Poznan University of Technology |
Walas, Krzysztof Tadeusz | Poznan University of Technology |
Keywords: Intelligent Transportation Systems, Deep Learning Methods, Dynamics
Abstract: Velocity estimation is of great importance in autonomous racing. Still, existing solutions are characterized by limited accuracy, especially in the case of aggressive driving, and by poor generalization to unseen road conditions. To address these issues, we propose to utilize an Unscented Kalman Filter (UKF) with a learned dynamics model that is optimized directly for the state estimation task. Moreover, we propose to aid this model with an online-estimated friction coefficient, which increases the estimation accuracy and enables zero-shot adaptation to new road conditions. To evaluate the UKF-based velocity estimator with the proposed dynamics model, we introduce a publicly available dataset of aggressive maneuvers performed by an F1TENTH car, with sideslip angles reaching 40°. Using this dataset, we show that learning the dynamics model through the UKF leads to improved estimation performance and that the proposed solution outperforms state-of-the-art learning-based state estimators by 17% in the nominal scenario. Moreover, we demonstrate the zero-shot adaptation of the proposed method to unseen road surfaces, enabled by the proposed learning-based tire dynamics model with online friction estimation.
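As background, the core of a UKF-based estimator is the unscented transform that pushes sigma points through the dynamics model, which in this paper is learned. The sketch below is a generic numpy prediction step under assumed state dimensions; learned_dynamics is a trivial placeholder standing in for the trained model, not the authors' network.

import numpy as np

def learned_dynamics(x, dt):
    # Placeholder for a trained model x_{k+1} = f(x_k, dt); here a constant-velocity integrator.
    return x + dt * np.concatenate([x[3:], np.zeros(3)])   # state = [pos(3), vel(3)]

def ukf_predict(mu, P, Q, dt, alpha=1e-3, beta=2.0, kappa=0.0):
    # One unscented prediction step: sigma points -> dynamics -> weighted moments.
    n = mu.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)
    sigmas = np.vstack([mu, mu + S.T, mu - S.T])            # (2n+1, n) sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    prop = np.array([learned_dynamics(s, dt) for s in sigmas])
    mu_pred = wm @ prop
    diff = prop - mu_pred
    P_pred = diff.T @ (wc[:, None] * diff) + Q
    return mu_pred, P_pred

mu, P = ukf_predict(np.zeros(6), np.eye(6) * 0.1, np.eye(6) * 1e-3, dt=0.02)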
|
|
14:00-15:00, Paper TuPIT9.9 | |
Large Language Models Powered Context-Aware Motion Prediction |
|
Zheng, Xiaoji | Southeast University |
Wu, Lixiu | Minzu University of China |
Yan, Zhijie | Beihang University |
Tang, Yuanrong | Tsinghua University |
Zhao, Hao | Tsinghua University |
Zhong, Chen | Tsinghua University |
Chen, Bokui | Tsinghua University |
Gong, Jiangtao | Tsinghua University |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation, AI-Based Methods
Abstract: Motion prediction is among the most fundamental tasks in autonomous driving. Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks. In this paper, we utilize Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks. We first conducted systematic prompt engineering, visualizing complex traffic environments and historical trajectory information of traffic participants as an image prompt, the Transportation Context Map (TC-Map), accompanied by corresponding text prompts. Through this approach, we obtained rich traffic context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context can enhance the accuracy of motion predictions. Furthermore, considering the cost associated with LLMs, we propose a cost-effective deployment strategy: enhancing the accuracy of motion prediction tasks at scale with 0.7% LLM-augmented datasets. Our research offers valuable insights into enhancing LLMs' understanding of traffic scenes and the motion prediction performance of autonomous driving.
|
|
14:00-15:00, Paper TuPIT9.10 | |
A Data-Informed Analysis of Scalable Supervision for Safety in Autonomous Vehicle Fleets |
|
Hickert, Cameron | Massachusetts Institute of Technology |
Yan, Zhongxia | Massachusetts Institute of Technology |
Wu, Cathy | MIT |
Keywords: Intelligent Transportation Systems, Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop
Abstract: Autonomous driving is a highly anticipated approach toward eliminating roadway fatalities. At the same time, the bar for safety is both high and costly to verify. This work considers the role of remotely-located human operators supervising a fleet of autonomous vehicles (AVs) for safety. Such a 'scalable supervision' concept was previously proposed to bridge the gap between still-maturing autonomy technology and the pressure to begin commercial offerings of autonomous driving. The present article proposes DISCES, a framework for data-informed safety-critical event simulation, to investigate the practicality of this concept from a dynamic network loading standpoint. With a focus on the safety-critical context of AVs merging into mixed-autonomy traffic, vehicular arrival processes at 1,097 highway merge points are modeled using microscopic traffic reconstruction with historical data from interstates across three California counties. Combined with a queuing theoretic model, these results characterize the dynamic supervision requirements and thereby scalability of the teleoperation approach. Across all scenarios we find reductions in operator requirements greater than 99% as compared to in-vehicle supervisors for the time period analyzed. The work also demonstrates two methods for reducing these empirical supervision requirements: (i) the use of cooperative connected AVs --- which are shown to produce an average 3.67 orders-of-magnitude system reliability improvement across the scenarios studied --- and (ii) aggregation across larger regions.
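The queuing-theoretic component of such an analysis can be sketched with a standard M/M/c model: given an arrival rate of safety-critical events and a mean handling time, the Erlang C formula gives the probability that an event has to wait for a free remote operator, and the smallest operator count meeting a waiting-probability target can be found by search. The numbers below are invented for illustration and are unrelated to the paper's 1,097 merge points or reported reductions.

import math

def erlang_c(c, a):
    # P(wait) for an M/M/c queue with offered load a = lambda / mu (requires a < c).
    summation = sum(a**k / math.factorial(k) for k in range(c))
    top = a**c / math.factorial(c) * c / (c - a)
    return top / (summation + top)

def min_operators(events_per_hour, mean_handle_minutes, max_p_wait=0.01):
    a = events_per_hour * mean_handle_minutes / 60.0   # offered load in Erlangs
    c = max(1, math.ceil(a) + 1)                       # start strictly above the load
    while erlang_c(c, a) > max_p_wait:
        c += 1
    return c

# Invented example: 120 safety-critical events/hour, 30 s handling each, <=1% waiting probability.
print(min_operators(120, 0.5))   # -> 5 operators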
|
|
14:00-15:00, Paper TuPIT9.11 | |
SmartPathfinder: Pushing the Limits of Heuristic Solutions for Vehicle Routing Problem with Drones Using Reinforcement Learning |
|
Imran, Navid Mohammad | University of Memphis |
Won, Myounggyu | University of Memphis |
Keywords: Intelligent Transportation Systems, Logistics, Reinforcement Learning
Abstract: The Vehicle Routing Problem with Drones (VRPD) seeks to optimize the routing paths for both trucks and drones, where the trucks are responsible for delivering parcels to customer locations, and the drones are dispatched from these trucks for parcel delivery, subsequently being retrieved by the trucks. Given the NP-hard complexity of VRPD, numerous heuristic approaches have been introduced. However, improving solution quality and reducing computation time remain significant challenges. In this paper, we conduct a comprehensive examination of heuristic methods designed for solving VRPD, distilling and standardizing them into core elements. We then develop a novel reinforcement learning (RL) framework that is seamlessly integrated with the heuristic solution components, establishing a set of universal principles for combining the RL framework with heuristic strategies with the aim of improving both solution quality and computational speed. This integration has been applied to a state-of-the-art heuristic solution for VRPD, showcasing the substantial benefits of incorporating the RL framework. Our evaluation results demonstrate that the heuristic solution integrated with our RL framework not only elevates the quality of solutions but also achieves rapid computation speeds, especially when dealing with extensive customer locations.
|
|
14:00-15:00, Paper TuPIT9.12 | |
Agent-Agnostic Centralized Training for Decentralized Multi-Agent Cooperative Driving |
|
Yan, Shengchao | University of Freiburg |
König, Lukas Maximilian | Ruhr Universität Bochum |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Deep Learning Methods
Abstract: Active traffic management with autonomous vehicles offers the potential for reduced congestion and improved traffic flow. However, developing effective algorithms for real-world scenarios requires overcoming challenges related to infinite-horizon traffic flow and partial observability. To address these issues and further decentralize traffic management, we propose an asymmetric actor-critic model that learns decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning. By employing attention neural networks with masking, our approach efficiently manages real-world traffic dynamics and partial observability, eliminating the need for predefined agents or agent-specific experience buffers in multi-agent reinforcement learning. Extensive evaluations across various traffic scenarios demonstrate our method's significant potential in improving traffic flow at critical bottleneck points. Moreover, we address the challenges posed by conservative autonomous vehicle driving behaviors that adhere strictly to traffic rules, showing that our cooperative policy effectively alleviates potential slowdowns without compromising safety.
|
|
14:00-15:00, Paper TuPIT9.13 | |
HeteroLight: A General and Efficient Learning Approach for Heterogeneous Traffic Signal Control |
|
Zhang, Yifeng | National University of Singapore |
Li, Peizhuo | National University of Singapore |
Fan, Mingfeng | Central South University |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Intelligent Transportation Systems, Multi-Robot Systems, Reinforcement Learning
Abstract: Efficient and scalable adaptive traffic signal control is crucial in reducing congestion, maximizing throughput, and improving mobility experience in ever-expanding cities. Recent advances in multi-agent reinforcement learning (MARL) with parameter sharing have significantly improved the adaptive optimization of large-scale, complex, and dynamic traffic flows. However, the limited model representation capability due to shared parameters impedes the learning of diverse control strategies for intersections with different flows/topologies, posing significant challenges to achieving effective signal control in complex and varied real-world traffic scenarios. To address these challenges, we present a novel MARL-based general traffic signal control framework, called HeteroLight. Specifically, we first introduce a General Feature Extraction (GFE) module, crafted in a decoder-only fashion, where we employ an attention mechanism to facilitate efficient and flexible extraction of traffic dynamics at intersections with varied topologies. Additionally, we incorporate an Intersection Specifics Extraction (ISE) module, designed to identify key latent vectors that represent the unique intersection's topology and traffic dynamics through variational inference techniques. By integrating the learned intersection-specific information into policy learning, we enhance the parameter-sharing mechanism, improving the model's representation diversity among different agents and enabling the learning of a more efficient shared control strategy. Through comprehensive evaluations against other state-of-the-art traffic signal control methods on the real-world Monaco traffic network, our empirical findings reveal that HeteroLight consistently outperforms other methods across various evaluation metrics, highlighting its superiority in optimizing traffic flows in heterogeneous traffic networks.
|
|
14:00-15:00, Paper TuPIT9.14 | |
Multi-Uncertainty Aware Autonomous Cooperative Planning |
|
Zhang, Shiyao | Southern University of Science and Technology |
Li, He | University of Macau |
Zhang, Shengyu | Singapore University of Technology and Design |
Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Ng, Derrick Wing Kwan | University of New South Wales |
Xu, Chengzhong | University of Macau |
Keywords: Intelligent Transportation Systems, Multi-Robot Systems
Abstract: Autonomous cooperative planning (ACP) is a promising technique to improve the efficiency and safety of multi-vehicle interactions for future intelligent transportation systems. However, realizing robust ACP is a challenge due to the aggregation of perception, motion, and communication uncertainties. This paper proposes a novel multi-uncertainty aware ACP (MUACP) framework that simultaneously accounts for multiple types of uncertainties via regularized cooperative model predictive control (RC-MPC). The regularizers and constraints for perception, motion, and communication are constructed according to the confidence levels, weather conditions, and outage probabilities, respectively. The effectiveness of the proposed method is evaluated in the Car Learning to Act (CARLA) simulation platform. Results demonstrate that the proposed MUACP efficiently performs cooperative formation in real time and outperforms other benchmark approaches in various scenarios under imperfect knowledge of the environment.
|
|
14:00-15:00, Paper TuPIT9.15 | |
Adversarial Attack on Trajectory Prediction for Autonomous Vehicles with Generative Adversarial Networks |
|
Fan, Jiping | Beijing Institute of Technology |
Wang, Zhenpo | Beijing Institute of Technology |
Li, Guoqiang | Beijing Institute of Technology |
Keywords: Intelligent Transportation Systems, Deep Learning Methods, Robot Safety
Abstract: Accurate trajectory prediction is crucial for autonomous vehicles to realize safe driving. Current trajectory prediction approaches generally rely on deep neural networks, which are susceptible to adversarial attacks. To evaluate the adversarial robustness and security of deep-learning-based trajectory prediction models, this paper proposes an adversarial attack method on trajectory prediction using generative adversarial networks (GANs). First, a novel LSTM-based attack trajectory model named Adv-GAN is proposed considering both the temporal and spatial driving features. The networks in Adv-GAN are trained through game learning between the generator and the discriminator to obtain the adversarial trajectories with real driving feature distribution. Furthermore, the generated trajectory is optimized with the vehicle kinematics model for driving feasibility on roads. The derived adversarial attack can lead to considerable deviations in trajectory prediction which affects driving safety for autonomous vehicles. We evaluate the proposed Adv-GAN on three public datasets, and experimental results show the effectiveness with better attack performance compared to a state-of-the-art adversarial attack model.
|
|
14:00-15:00, Paper TuPIT9.16 | |
Active Vehicle Re-Localization Based on Non-Repetitive Lidar with Gimbal Motion Strategy |
|
Wu, Xin'ao | Shanghai Jiaotong University |
Yang, Chenxi | Shanghai Jiao Tong University |
Guo, Yiyang | Los Altos High School |
Zhuang, Hanyang | Shanghai Jiao Tong University |
Wang, Chunxiang | Shanghai Jiaotong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Intelligent Transportation Systems, Localization, Field Robots
Abstract: The installation of a multi-layer 3D LiDAR atop the vehicle is a widely adopted hardware configuration for map-matching-based localization in intelligent driving. By offering a comprehensive 360° horizontal Field of View (FoV), this setup aims to achieve precise matching through the imposition of substantial geometric constraints against dynamic interference and structural degradation. However, several factors limit its environmental adaptability, such as sparse point cloud density at long distances, insufficient maximum sensing range, and, notably, a restricted beam elevation angle that limits perception of the environment beyond obstacles. The rapid advancement of non-repetitive scanning LiDARs shows promise in mitigating such limitations. Nevertheless, their narrow FoV remains a challenge to overcome. In this study, we propose a solution that mounts a single such LiDAR on a two-axis rotating gimbal, enabling the vehicle to actively overcome the range and vertical FoV limitations of traditional setups. The corresponding gimbal motion strategy is designed to automatically focus on the environment component with the most robust geometric constraints. Experimental results validate that the proposed method achieves superior robustness under high dynamic interference while delivering sufficient performance under standard conditions.
|
|
TuPIT10 |
Room 10 |
Simultaneous Localization and Mapping (SLAM) I |
Teaser Session |
Co-Chair: Nuechter, Andreas | University of Würzburg |
|
14:00-15:00, Paper TuPIT10.1 | |
Geometry-Aided Underwater 3D Mapping Using Side-Scan Sonar |
|
Yang, Yiqiao | Northeastern University |
Pang, Chenglin | Northeastern University |
Wu, Chengdong | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Mapping, Marine Robotics, SLAM
Abstract: In recent years, the interest in underwater exploration with Autonomous Underwater Vehicles (AUVs) equipped with side-scan sonars (SSS) has grown considerably. However, state-of-the-art SSS Simultaneous Localization and Mapping (SLAM) systems encounter challenges in data association across large viewpoint changes. Additionally, these systems assume that the seabed is a flat surface, leading to significant mapping error in uneven underwater terrains. To address these challenges, we propose a framework that leverages the side-scan sonar geometry to facilitate data association and improve mapping accuracy. The framework begins with a preprocessing module that extracts feature points and provides initial estimates of the elevation angles of the landmarks. Then, a non-consecutive data association module applies epipolar line searching to establish correspondences between the current and historical frames. Finally, the mapping module uses side-scan sonar bundle adjustment to recover the positions of the landmarks. The proposed method is evaluated using an underwater terraced fields dataset. Our method achieves over 90 percent matching rate and reduces the average reprojection error from 764.17 to 4.91 after three iterations.
|
|
14:00-15:00, Paper TuPIT10.2 | |
Thermal-NeRF: Neural Radiance Fields from an Infrared Camera |
|
Ye, Tianxiang | Shanghai Jiaotong University |
Wu, Qi | Shanghai Jiao Tong University |
Deng, Junyuan | Shanghai Jiao Tong University |
Liu, Guoqing | Shanghai Jiao Tong University |
Liu, Liu | Hefei University of Technology |
Xia, Songpengcheng | Shanghai Jiao Tong University |
Pang, Liang | Shanghai Slamtec Co., Ltd |
Yu, Wenxian | Shanghai Jiao Tong University |
Pei, Ling | Shanghai Jiao Tong University |
Keywords: Mapping, Data Sets for Robotic Vision
Abstract: In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representations for 3D scene reconstruction. However, the predominant reliance on RGB imaging presupposes ideal lighting conditions, a premise frequently unmet in robotic applications plagued by poor lighting or visual obstructions. This limitation overlooks the capabilities of infrared (IR) cameras, which excel in low-light detection and present a robust alternative under such adverse scenarios. To tackle these issues, we introduce Thermal-NeRF, the first method that estimates a volumetric scene representation in the form of a NeRF solely from IR imaging. By leveraging a thermal mapping and a structural thermal constraint derived from the thermal characteristics of IR imaging, our method showcases unparalleled proficiency in recovering NeRFs in visually degraded scenes where RGB-based methods fall short. We conduct extensive experiments to demonstrate that Thermal-NeRF can achieve superior quality compared to existing methods. Furthermore, we contribute a dataset for IR-based NeRF applications, paving the way for future research in IR NeRF reconstruction; see https://github.com/Cerf-Volant425/Thermal-NeRF.
|
|
14:00-15:00, Paper TuPIT10.3 | |
CSR: A Lightweight Crowdsourced Road Structure Reconstruction System for Autonomous Driving |
|
Wang, Huayou | Huawei Technologies |
Liu, Qingyao | Li Auto |
Wu, Jiazheng | Tianjin University |
Liu, Kun | Li Auto Inc |
Ding, Chao | LiAuto |
Lang, Xianpeng | LiAuto |
Xue, Changliang | Huawei Technologies |
Keywords: Mapping, Autonomous Vehicle Navigation, Semantic Scene Understanding
Abstract: Highly accurate and robust vectorized reconstruction of road structures is crucial for autonomous vehicles. Traditional LiDAR-based methods require multiple processes and are often expensive, time-consuming, labor-intensive, and cumbersome. In this paper, we propose a lightweight crowdsourced road structure reconstruction system (termed CSR) that relies solely on online perceived semantic elements. Ambiguities and perceptual errors of semantic features and Global Navigation Satellite System (GNSS) global pose errors constitute the predominant challenge in achieving alignment across multi-trip data. To this end, a robust two-phased coarse-to-fine multi-trip alignment method is performed considering local geometric consistency, global topology consistency, intra-trip temporal consistency, and inter-trip consistency. Further, we introduce an incremental pose graph optimization framework with adaptive weight tuning ability to integrate pre-built maps, currently perceived multi-trip semantic features, odometry, and GNSS, enabling accurate and robust incremental road structure reconstruction. CSR is highly automated, efficient, and scalable for large-scale autonomous driving scenarios, significantly expediting road structure production. We quantitatively and qualitatively validate the reconstruction performance of CSR in real-world scenes. CSR achieves centimeter-level accuracy commensurate with established LiDAR-based methods, concurrently boosting efficiency and reducing resource expenditure.
|
|
14:00-15:00, Paper TuPIT10.4 | |
Neural Semantic Map-Learning for Autonomous Vehicles |
|
Herb, Markus | Technische Universität München |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: Mapping, Multi-Robot SLAM, Computer Vision for Transportation
Abstract: Autonomous vehicles demand detailed maps to maneuver reliably through traffic, which need to be kept up-to-date to ensure a safe operation. A promising way to adapt the maps to the ever-changing road-network is to use crowd-sourced data from a fleet of vehicles. In this work, we present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment including drivable area, lane markings, poles, obstacles and more as a 3D mesh. Each vehicle contributes locally reconstructed submaps as lightweight meshes, making our method applicable to a wide range of reconstruction methods and sensor modalities. Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field, which is supervised using the submap meshes to predict a fused environment representation. We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction. Our approach is evaluated on two datasets with different local mapping methods, showing improved pose alignment and reconstruction over existing methods. Additionally, we demonstrate the benefit of multi-session mapping and examine the required amount of data to enable high-fidelity map learning for autonomous vehicles.
|
|
14:00-15:00, Paper TuPIT10.5 | |
On the 3D Trochoidal Motion Model of LiDAR Sensors Placed Off-Centered Inside Spherical Mobile Mapping Systems |
|
Arzberger, Fabian | Julius-Maximilians-University of Würzburg |
Nuechter, Andreas | University of Würzburg |
Keywords: Mapping, Calibration and Identification, Kinematics
Abstract: We study the motion model of a sensor rigidly mounted inside a ball. Due to the rigid placement inside the ball, the geometry of the sensor trajectory resembles a 3D curtate trochoid. A new calibration method for spherical systems estimates the extrinsic parameters of the sensor with respect to the ball's center of rotation. We deploy the calibration and motion model on our spherical mobile mapping platform to estimate the trajectory of a LiDAR sensor and compare it to trajectories of state-of-the-art LiDAR-inertial odometry (LIO) methods. The motion model, which is solely based on IMU measurements, produces comparable results to the LIO methods, sometimes even outperforming them in positional accuracy. Although the LIO methods provide better rotational accuracy due to the utilization of LiDAR data, they struggle to reproduce the trochoidal nature of the trajectory and only provide pose estimates at the LiDAR frequency, whereas the motion model produces a more consistent trochoidal trajectory at the much higher IMU frequency. The results demonstrate the difficulty that current LIO methods have on spherical systems and indicate that our motion model is suitable for overcoming these issues.
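For orientation, the planar (2D) curtate trochoid traced by a point mounted at a distance d < R from the centre of a ball of radius R rolling without slipping along a straight line can be written as

\[
x(\theta) = R\,\theta - d\,\sin\theta, \qquad z(\theta) = R - d\,\cos\theta,
\]

where \(\theta\) is the rolled angle, so the sensor height oscillates between R - d and R + d while advancing with the ball. The paper's contribution is the full 3D generalisation of this geometry together with the extrinsic calibration, which this planar sketch does not capture.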
|
|
14:00-15:00, Paper TuPIT10.6 | |
V-PRISM: Probabilistic Mapping of Unknown Tabletop Scenes |
|
Wright, Herbert | University of Utah |
Zhi, Weiming | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Hermans, Tucker | University of Utah |
Keywords: Mapping, RGB-D Perception, Probability and Statistical Methods
Abstract: The ability to construct concise scene representations from sensor input is central to the field of robotics. This paper addresses the problem of robustly creating a 3D representation of a tabletop scene from a segmented RGB-D image. These representations are then critical for a range of downstream manipulation tasks. Many previous attempts to tackle this problem do not capture accurate uncertainty, which is required to subsequently produce safe motion plans. In this paper, we cast the representation of 3D tabletop scenes as a multi-class classification problem. To tackle this, we introduce V-PRISM, a framework and method for robustly creating probabilistic 3D segmentation maps of tabletop scenes. Our maps contain occupancy estimates, segmentation information, and principled uncertainty measures. We evaluate the robustness of our method in (1) procedurally generated scenes using open-source object datasets, and (2) real-world tabletop data collected from a depth camera. Our experiments show that our approach outperforms alternative continuous reconstruction approaches that do not explicitly reason about objects in a multi-class formulation.
|
|
14:00-15:00, Paper TuPIT10.7 | |
Enhancing Online Road Network Perception and Reasoning with Standard Definition Maps |
|
Zhang, Hengyuan | University of California, San Diego |
Paz, David | University of California, San Diego |
Guo, Yuliang | Bosch Research North America |
Das, Arun | Robert Bosch LLC |
Huang, Xinyu | Robert Bosch LLC |
Haug, Karsten | Robert Bosch GmbH |
Christensen, Henrik Iskov | UC San Diego |
Ren, Liu | Robert Bosch North America Research Technology Center |
Keywords: Mapping, Semantic Scene Understanding, Computer Vision for Transportation
Abstract: Autonomous driving for urban and highway driving applications often requires High Definition (HD) maps to generate a navigation plan. Nevertheless, various challenges arise when generating and maintaining HD maps at scale. While recent online mapping methods have started to emerge, their performance especially for longer ranges is limited by heavy occlusion in dynamic environments. With these considerations in mind, our work focuses on leveraging lightweight and scalable priors—Standard Definition (SD) maps—in the development of online vectorized HD map representations. We first examine the integration of prototypical rasterized SD map representations into various online mapping architectures. Furthermore, to identify lightweight strategies, we extend the OpenLane-V2 dataset with OpenStreetMaps and evaluate the benefits of graphical SD map representations. A key finding from designing SD map integration components is that SD map encoders are model agnostic and can be quickly adapted to new architectures that utilize bird's eye view (BEV) encoders. Our results show that making use of SD maps as priors for the online mapping task can significantly speed up convergence and boost the performance of the online centerline perception task by 30% (mAP). Furthermore, we show that the introduction of the SD maps leads to a reduction of the number of parameters in the perception and reasoning task by leveraging SD map graphs while improving the overall performance. Project Page: https://henryzhangzhy.github.io/sdhdmap/.
|
|
14:00-15:00, Paper TuPIT10.8 | |
Teaching Robots Where to Go and How to Act with Human Sketches Via Spatial Diagrammatic Instructions |
|
Sun, Qilin | Carnegie Mellon University |
Zhi, Weiming | Carnegie Mellon University |
Zhang, Tianyi | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Mapping, Learning from Demonstration, Probabilistic Inference
Abstract: This paper introduces Spatial Diagrammatic Instructions (SDIs), an approach for human operators to specify objectives and constraints that are related to spatial regions in the working environment. Human operators are enabled to sketch out regions on camera images that correspond to the objectives and constraints. These sketches are projected to 3D spatial coordinates, and continuous Spatial Instruction Maps are learned from them. These models can then be integrated into optimization problems for robot tasks. In particular, we demonstrate how SDIs can be applied to solve the Base Placement Problem of mobile manipulators, which concerns the best place to put the manipulator to facilitate a certain task. Human operators can specify, via sketch, spatial regions of interest for a manipulation task and permissible regions for the mobile manipulator to be at. Then, an optimization problem that maximizes the manipulator's reachability, or coverage, over the designated regions of interest while remaining in the permissible regions is solved. We provide extensive empirical evaluations and show that our formulation of Spatial Instruction Maps provides accurate representations of user-specified diagrammatic instructions. Furthermore, we demonstrate that our diagrammatic approach to the Mobile Base Placement Problem enables higher-quality solutions and faster runtimes.
|
|
14:00-15:00, Paper TuPIT10.9 | |
DHP-Mapping: A Dense Panoptic Mapping System with Hierarchical World Representation and Label Optimization Techniques |
|
Hu, Tianshuai | The Hong Kong University of Science and Technology |
Jiao, Jianhao | University College London |
Xu, Yucheng | University of Edinburgh |
Liu, Hongji | The Hong Kong University of Science and Technology |
Wang, Sheng | Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Mapping, Semantic Scene Understanding
Abstract: Maps provide robots with crucial environmental knowledge, thereby enabling them to perform interactive tasks effectively. Easily accessing accurate abstract-to-detailed geometric and semantic concepts from maps is crucial for robots to make informed and efficient decisions. To comprehensively model the environment and effectively manage the map data structure, we propose DHP-Mapping, a dense mapping system that utilizes multiple Truncated Signed Distance Field (TSDF) submaps and panoptic labels to hierarchically model the environment. The output map is able to maintain both voxel- and submap-level metric and semantic information. Two modules are presented to enhance the mapping efficiency and label consistency: (1) an inter-submaps label fusion strategy to eliminate duplicate points across submaps and (2) a conditional random field (CRF) based approach to enhance panoptic labels. We conducted experiments with two public datasets including indoor and outdoor scenarios. Our system performs comparably to state-of-the-art (SOTA) methods across geometry and label accuracy evaluation metrics. The experiment results highlight the effectiveness and scalability of our system, as it is capable of constructing precise geometry and maintaining consistent panoptic labels. Our code is publicly available at https://github.com/hutslib/DHP-Mapping.
|
|
14:00-15:00, Paper TuPIT10.10 | |
RMap: Millimeter-Wave Radar Mapping through Volumetric UpSampling |
|
Mopidevi, Ajay Narasimha | University of Colorado Boulder |
Harlow, Kyle | University of Colorado Boulder |
Heckman, Christoffer | University of Colorado at Boulder |
Keywords: Mapping, Range Sensing, Deep Learning Methods
Abstract: Millimeter Wave Radar is being adopted as a viable alternative to lidar and radar in adverse visually degraded conditions, such as in the presence of fog and dust. However, this sensor modality suffers from severe sparsity and noise under nominal conditions, which makes it difficult to use in precise applications such as mapping. This work presents a novel solution to generate accurate 3D maps from sparse radar point clouds. RMap uses a generative transformer architecture which upsamples, denoises, and fills the incomplete radar maps to resemble lidar maps. We test this method on the ColoRadar dataset to demonstrate its efficacy.
|
|
14:00-15:00, Paper TuPIT10.11 | |
A Novel Framework for Structure Descriptors-Guided Hand-Drawn Floor Plan Reconstruction |
|
Zhang, Zhentong | Southeast University |
Liu, Juan | Samsung Electronics(China)R&D Center |
Li, Xinde | Southeast University |
Hu, Chuanfei | University of Shanghai for Science and Technology |
Dunkin, Fir | Southeast University |
Zhang, Shaokun | Southeast University |
Keywords: Mapping, Semantic Scene Understanding, Computer Vision for Automation
Abstract: In the absence of a pre-built indoor map, robot navigation suffers from the limitations of sensors and environments, resulting in decreased efficiency in performing ad-hoc tasks. Given that blueprints are difficult to obtain, an intuitive method is to provide robots with prior knowledge via hand-drawn floor plans. However, due to the inability of robots to directly comprehend hand-drawn styles, the applicability of this method is limited. In this paper, we present a novel framework for hand-drawn floor plan reconstruction that can recognize abstract hand-drawn elements and standardize the reconstruction of hand-drawn floor plans, thereby providing robots with valuable global map information. Specifically, we design a new series of structure descriptors as reconstruction components and employ a deep learning-based model for recognition. Then the standardized results are obtained through the proposed floor plan reconstruction algorithm. To verify the effectiveness of the framework, we conduct experiments on electronic and paper hand-drawn floor plans. Compared with other state-of-the-art methods, our proposed method achieves superior reconstruction results. This work expands the application scenarios for indoor robots, enabling them to quickly comprehend the semantics of complex scenes, thereby enhancing the competitiveness in downstream tasks.
|
|
14:00-15:00, Paper TuPIT10.12 | |
PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing |
|
Li, Jianping | Nanyang Technological University |
Nguyen, Thien-Minh | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Xie, Lihua | NanyangTechnological University |
Keywords: Mapping, SLAM
Abstract: Accurate and consistent construction of point clouds from LiDAR scanning data is fundamental for 3D modeling applications. Current solutions, such as multiview point cloud registration and LiDAR bundle adjustment, predominantly depend on the local plane assumption, which may be inadequate in complex environments lacking planar geometries or under substantial initial pose errors. To mitigate this problem, this paper presents a LiDAR bundle adjustment with progressive spatial smoothing, which is suitable for complex environments and exhibits improved convergence capabilities. The proposed method consists of a spatial smoothing module and a pose adjustment module, which combines the benefits of local consistency and global accuracy. With the spatial smoothing module, we obtain robust and rich surface constraints by employing smoothing kernels across various scales. The pose adjustment module then corrects all poses utilizing the novel surface constraints. Ultimately, the proposed method simultaneously achieves fine poses and parametric surfaces that can be directly employed for high-quality point cloud reconstruction. The effectiveness and robustness of our proposed approach have been validated on both simulation and real-world datasets. The experimental results demonstrate that the proposed method outperforms existing methods and achieves better accuracy in complex environments with few planar structures.
|
|
14:00-15:00, Paper TuPIT10.13 | |
nu-DBA: Neural Implicit Dense Bundle Adjustment Enables Image-Only Driving Scene Reconstruction |
|
Mao, Yunxuan | Zhejiang University |
Shen, Bingqi | Zhejiang University |
Yang, Yifei | Zhejiang University |
Wang, Kai | HuaWei |
Xiong, Rong | Zhejiang University |
Liao, Yiyi | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Mapping, SLAM
Abstract: The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of bundle adjustment (BA), essential for autonomous driving. This paper presents nu-DBA, a novel framework implementing geometric dense bundle adjustment (DBA) using 3D neural implicit surfaces for map parametrization, which optimizes both the map surface and trajectory poses using geometric error guided by dense optical flow prediction. Additionally, we fine-tune the optical flow model with per-scene self-supervision to further improve the quality of the dense mapping. Our experimental results on multiple driving scene datasets demonstrate that our method achieves superior trajectory optimization and dense reconstruction accuracy. We also investigate the influences of photometric error and different neural geometric priors on the performance of surface reconstruction and novel view synthesis. Our method stands as a significant step towards leveraging neural implicit representations in dense bundle adjustment for more accurate trajectories and detailed environmental mapping.
|
|
14:00-15:00, Paper TuPIT10.14 | |
FRAGG-Map: Frustum Accelerated GPU-Based Grid Map |
|
Grimaldi, Michele | University of Girona |
Palomeras, Narcis | Universitat De Girona |
Carlucho, Ignacio | University of Edinburgh |
Petillot, Yvan R. | Heriot-Watt University |
Ridao, Pere | University of Girona |
Keywords: Mapping, SLAM, Marine Robotics
Abstract: In robotics, occupancy grids serve as required repositories of information about the environment in numerous applications. One such critical application is Simultaneous Localization and Mapping (SLAM), where robots dynamically scan and explore their surroundings while in motion. In the context of extended-duration missions, it becomes imperative to confront the complexities linked to the expansion of occupancy grids as well as handling loop closure detection. These challenges primarily revolve around two key aspects: enabling the seamless expansion of the map on multiple occasions, thus avoiding the need to map smaller regions in numerous separate missions, and ensuring real-time updates to the map to sustain the robot's knowledge base and enhance its responsiveness. To address these challenges, we introduce an innovative map called Frustum Accelerated GPU-Based Grid Map (FRAGG-Map). This map adopts a highly parallelizable 3D grid structure and leverages the power of CUDA kernels to facilitate efficient insertion of point-clouds and enables real-time updates of the map. FRAGG-Map identifies the portions of the map that require updates and utilises the GPU to update them, significantly enhancing computational performance. Our results show that FRAGG-Map can run 31 times faster than OctoMap, significantly outperforming state-of-the-art methods.
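A CPU-side caricature of the underlying data structure, assuming a flattened dense 3D grid with log-odds occupancy updates, is sketched below; the actual FRAGG-Map performs these updates in CUDA kernels and restricts them to the frustum-selected portion of the map, neither of which is reproduced here.

import numpy as np

class DenseGrid:
    # Toy flattened 3D occupancy grid with log-odds updates (CPU stand-in for a GPU grid).
    def __init__(self, dims=(256, 256, 64), resolution=0.1, origin=(0.0, 0.0, 0.0)):
        self.dims = np.array(dims)
        self.res = resolution
        self.origin = np.array(origin)
        self.log_odds = np.zeros(int(np.prod(dims)), dtype=np.float32)

    def _flat_index(self, points):
        idx = np.floor((points - self.origin) / self.res).astype(int)
        valid = np.all((idx >= 0) & (idx < self.dims), axis=1)
        idx = idx[valid]
        return idx[:, 0] * self.dims[1] * self.dims[2] + idx[:, 1] * self.dims[2] + idx[:, 2]

    def insert_hits(self, points, l_hit=0.85, l_max=3.5):
        flat = self._flat_index(points)
        np.add.at(self.log_odds, flat, l_hit)        # scatter-add, analogous to an atomicAdd on GPU
        np.clip(self.log_odds, -l_max, l_max, out=self.log_odds)

grid = DenseGrid()
grid.insert_hits(np.random.rand(1000, 3) * 10.0)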
|
|
14:00-15:00, Paper TuPIT10.15 | |
OpenOcc: Open Vocabulary 3D Scene Reconstruction Via Occupancy Representation |
|
Jiang, Haochen | Fudan University |
Xu, Yueming | Fudan University |
Zeng, Yihan | Shanghai Jiao Tong University |
Xu, Hang | Noah's Ark Lab |
Zhang, Wei | HUAWEI |
Feng, Jianfeng | Fudan University |
Zhang, Li | Fudan University |
Keywords: Mapping, Semantic Scene Understanding, Deep Learning Methods
Abstract: 3D reconstruction has been widely used in autonomous navigation for mobile robotics. However, prior research can only provide the basic geometric structure without the capability of open-world scene understanding, limiting advanced tasks like human interaction and visual navigation. Moreover, traditional 3D scene understanding approaches rely on expensive labeled 3D datasets to train a model for a single task with supervision. Thus, geometric reconstruction with zero-shot scene understanding, i.e., open-vocabulary 3D understanding and reconstruction, is crucial for the future development of mobile robots. In this paper, we propose OpenOcc, a novel framework unifying 3D scene reconstruction and open-vocabulary understanding with neural radiance fields. We model the geometric structure of the scene with an occupancy representation and distill the pre-trained open-vocabulary model into a 3D language field via volume rendering for zero-shot inference. Furthermore, a novel semantic-aware confidence propagation (SCP) method is proposed to relieve the issue of language field representation degeneracy caused by inconsistent measurements in distilled features. Experimental results show that our approach achieves competitive performance in 3D scene understanding tasks, especially for small and long-tail objects.
|
|
14:00-15:00, Paper TuPIT10.16 | |
Text2Map: From Navigational Instructions to Graph-Based Indoor Map Representations Using LLMs |
|
Karkour, Ammar | Carnegie Mellon University |
Harras, Khaled | Carnegie Mellon University |
Feo, Eduardo | Carnegie Mellon University |
Keywords: Mapping, AI-Based Methods, Human Factors and Human-in-the-Loop
Abstract: In spatial navigation, the shift from manual cartography to digital map representations has revolutionized how we interact with and comprehend both outdoor and indoor environments. While digital mapping has substantially advanced outdoor navigation with robust techniques like satellite imagery and sophisticated data labeling, the full potential of indoor digital mapping remains untapped. Accurate indoor mapping promises to enhance the operational efficiency of mobile robots, improving their ability to interact with human environments and bolstering emergency response capabilities. However, its realization is impeded by the complexity of current methods and the need for heavy manual labor, expert knowledge, and specialized equipment. To address these challenges, we introduce Text2Map, a novel methodology that harnesses natural language navigational instructions, the power of off-the-shelf Large Language Models (LLMs), and few-shot learning to create graph-based digital maps of indoor spaces. This approach simplifies the mapping process for widespread use, leveraging crowd-sourceable, ubiquitous navigation instructions as a data source without requiring specialized map data formats or hardware. Our paper presents the Text2Map system architecture, details the creation of the first dedicated dataset, and evaluates the system's efficacy, highlighting the substantial potential and scalability of our approach. Text2Map achieves a Graph Edit Distance (GED) ranging from 0.5x to 2x the total number of regions in a building and an Edge Similarity score between 0.87 and 0.9. These results highlight the precision, robustness, and effectiveness of our methodology. Our work paves the way for a more accessible and streamlined approach to indoor digital mapping, setting the stage for broader adoption in human and mobile robot navigation applications.
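To make the target representation concrete: once pairwise region connectivity has been extracted from the instructions, the indoor map can be held in an ordinary graph and compared to a reference with graph edit distance. The room names and adjacencies below are invented placeholders, and the LLM extraction step itself is omitted.

import networkx as nx

# Hypothetical (region_a, region_b) adjacencies extracted from navigation instructions.
extracted_edges = [("lobby", "corridor_1"), ("corridor_1", "room_101"), ("corridor_1", "room_102")]
reference_edges = [("lobby", "corridor_1"), ("corridor_1", "room_101"),
                   ("corridor_1", "room_102"), ("corridor_1", "stairs")]

predicted = nx.Graph(extracted_edges)
reference = nx.Graph(reference_edges)

# Exact GED is expensive in general; networkx can compute it for small graphs like this one.
print(nx.graph_edit_distance(predicted, reference))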
|
|
TuPIT11 |
Room 11 |
Marine Robotic Systems |
Teaser Session |
Chair: Yetkin, Harun | Bartin University |
Co-Chair: De Masi, Giulia | Khalifa University |
|
14:00-15:00, Paper TuPIT11.1 | |
Decentralized Linear Convoying for Underactuated Surface Craft with Partial State Coupling |
|
Turrisi, Raymond | Massachusetts Institute of Technology |
Benjamin, Michael | Massachusetts Institute of Technology |
Keywords: Marine Robotics, Multi-Robot Systems, Field Robots
Abstract: This work introduces a novel decentralized algorithm and control law for stable linear convoying using a layered control approach. The algorithm was implemented in MOOS-IvP, using an abstraction layer that models a virtual system decoupled from the system's actual dynamics. This makes the algorithm platform-agnostic and able to be combined with other behaviors, such as collision avoidance and operating-region behaviors. A trajectory defined by a lead agent is discretized, embedded with the leader's dynamics, and propagated to all following agents. We first demonstrate that this approach, when paired with a simple PD controller, prevents accumulated errors and improves trajectory tracking for follower agents. Thereafter, we demonstrate how virtually coupling a subset of agent states improves the overall cohesiveness of the convoy. Improvements are demonstrated in both simulations and field trials using five autonomous surface vehicles.
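As a rough sense of the follower-side control described above, a follower that regulates its along-track error to a propagated waypoint with a PD law on top of the leader's speed might look like the sketch below; the gains, limits, and the MOOS-IvP abstraction layer are assumptions, not the authors' implementation.

def pd_speed_command(along_track_err, err_rate, v_leader, kp=0.4, kd=0.1, v_max=2.5):
    # Toy PD law: follower speed = leader speed plus a correction on along-track error [m].
    cmd = v_leader + kp * along_track_err + kd * err_rate
    return max(0.0, min(v_max, cmd))

print(pd_speed_command(along_track_err=3.0, err_rate=-0.2, v_leader=1.5))   # -> 2.5 (saturated)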
|
|
14:00-15:00, Paper TuPIT11.2 | |
Opti-Acoustic Semantic SLAM with Unknown Objects in Underwater Environments |
|
Singh, Kurran | Massachusetts Institute of Technology |
Hong, Jungseok | MIT |
Rypkema, Nicholas Rahardiyan | Woods Hole Oceanographic Institution |
Leonard, John | MIT |
Keywords: Marine Robotics, SLAM, Sensor Fusion
Abstract: Despite recent advances in semantic Simultaneous Localization and Mapping (SLAM) for terrestrial and aerial applications, underwater semantic SLAM remains an open and largely unaddressed research problem due to the unique sensing modalities and the object classes found underwater. This paper presents an object-based semantic SLAM method for underwater environments that can identify, localize, classify, and map a wide variety of marine objects without a priori knowledge of the object classes present in the scene. The method performs unsupervised object segmentation and object-level feature aggregation, and then uses opti-acoustic sensor fusion for object localization. Probabilistic data association is used to determine observation to landmark correspondences. Given such correspondences, the method then jointly optimizes landmark and vehicle position estimates. Indoor and outdoor underwater datasets with a wide variety of objects and challenging acoustic and lighting conditions are collected for evaluation and made publicly available. Quantitative and qualitative results show the proposed method achieves reduced trajectory error compared to baseline methods, and is able to obtain comparable map accuracy to a baseline closed-set method that requires hand-labeled data of all objects in the scene.
|
|
14:00-15:00, Paper TuPIT11.3 | |
Development of Contextual Collision Risk Framework for Operational Envelope of Autonomous Navigation System |
|
Kim, Inbeom | Avikus |
Ko, Kwangsung | Avikus |
Park, Jinmo | Avikus |
Keywords: Marine Robotics, Collision Avoidance, Autonomous Vehicle Navigation
Abstract: This paper introduces the "Contextual Collision Risk" (CCR) method, a novel collision risk assessment approach for autonomous navigation systems. As global shipping traffic increases, marine traffic becomes more complex and busier, requiring advanced risk assessment methods for autonomous vessels. The CCR establishes the Operational Envelope (OE) of autonomous navigation systems by integrating the complexities of real-time marine traffic and the specific maneuvering capabilities of own ship. The central aspect of CCR involves setting up Reachable Velocities (RV), which include the dynamics of own ship. Additionally, the Velocity Obstacle (VO) algorithm is implemented to identify potential collision risks from other vessels. By integrating RV and VO analyses, CCR provides a framework that effectively quantifies collision risk in congested maritime environments. To demonstrate the effects of CCR, we employ data from transoceanic voyages across various scenarios. Furthermore, simulations of collision avoidance maneuvers are conducted, focusing on high-risk situations such as near-misses and potential collisions. In particular, the analysis of transoceanic voyage data shows that CCR holds promise for enhancing navigation safety in complex maritime environments.
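The VO ingredient of the framework can be illustrated on its own: a candidate own-ship velocity is flagged as risky with respect to another vessel if the relative velocity falls inside the collision cone defined by the other vessel's inflated radius. The geometry below is the textbook check with made-up numbers; it deliberately omits the Reachable Velocities construction and ship dynamics that CCR adds.

import numpy as np

def in_velocity_obstacle(p_own, v_own, p_other, v_other, combined_radius):
    # True if the relative velocity points into the collision cone of the other vessel.
    r = np.asarray(p_other, float) - np.asarray(p_own, float)
    v_rel = np.asarray(v_own, float) - np.asarray(v_other, float)
    dist = np.linalg.norm(r)
    if dist <= combined_radius:
        return True                                   # already within the combined radius
    speed = np.linalg.norm(v_rel)
    if speed < 1e-9:
        return False
    half_angle = np.arcsin(combined_radius / dist)    # half-opening of the collision cone
    angle_to_center = np.arccos(np.clip(np.dot(v_rel, r) / (speed * dist), -1.0, 1.0))
    return angle_to_center <= half_angle

# Made-up encounter: own ship heading east at 5 m/s, a nearly stationary target 200 m ahead.
print(in_velocity_obstacle((0, 0), (5, 0), (200, 10), (0, 0), combined_radius=30.0))   # -> True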
|
|
14:00-15:00, Paper TuPIT11.4 | |
IMU-Based Monitoring of Buoy-Ballast System through Cable Dynamics Simulation |
|
Peraud, Charly | COSMER Laboratory, Université De Toulon |
Filliung, Martin | CNRS LIS, COSMER Laboratory, Université De Toulon |
Anthierens, Cedric | Universite De Toulon |
Dune, Claire | Université De Toulon |
Boizot, Nicolas | Université De Toulon |
Hugel, Vincent | University of Toulon |
Keywords: Marine Robotics, Software Tools for Robot Programming, Sensor-based Control
Abstract: The contribution of this study is twofold. First, a comprehensive simulation framework for cable dynamics is introduced. This framework considers variable-length cables and allows elements such as buoys, ballasts, or Inertial Measurement Unit (IMU) sensors to be incorporated. The accuracy of this framework is assessed against experimental data. Second, a novel and improved solution for the instrumentation of a V-shaped buoy-ballast system using IMU sensors is investigated. The latter, designed for a neutrally buoyant tether between a Remotely Operated Vehicle (ROV) and an Unmanned Surface Vehicle (USV), is meant to improve operational safety. The discussed IMU-based solution provides richer information on the interaction between the ROV and the cable, including its 3D orientation and curvature amplitude, which could be used both for the control of the USV trajectory and for its onboard winch.
|
|
14:00-15:00, Paper TuPIT11.5 | |
TURTLMap: Real-Time Localization and Dense Mapping of Low-Texture Underwater Environments with a Low-Cost Unmanned Underwater Vehicle |
|
Song, Jingyu | University of Michigan |
Bagoren, Onur | University of Michigan |
Andigani, Razan | University of Michigan - Ann Arbor |
Venkatramanan Sethuraman, Advaith | University of Michigan |
Skinner, Katherine | University of Michigan |
Keywords: Marine Robotics, Mapping, Localization
Abstract: Significant work has been done on advancing localization and mapping in underwater environments. Still, state-of-the-art methods are challenged by low-texture environments, which are common in underwater settings. This makes it difficult to use existing methods in diverse, real-world scenes. In this paper, we present TURTLMap, a novel solution that focuses on textureless underwater environments through a real-time localization and mapping method. We show that this method is low-cost and capable of tracking the robot accurately while constructing a dense map of a low-texture environment in real time. We evaluate the proposed method using real-world data collected in an indoor water tank with a motion capture system and ground truth reference map. Qualitative and quantitative results validate that the proposed system achieves accurate and robust localization and precise dense mapping, even when subject to wave conditions. The project page for TURTLMap is https://umfieldrobotics.github.io/TURTLMap.
|
|
14:00-15:00, Paper TuPIT11.6 | |
Towards a Factor Graph-Based Method Using Angular Rates for Full Magnetometer Calibration and Gyroscope Bias Estimation |
|
Rodríguez-Martínez, Sebastián | Monterey Bay Aquarium Research Institute |
Troni, Giancarlo | Monterey Bay Aquarium Research Institute |
Keywords: Marine Robotics, Calibration and Identification, Field Robots
Abstract: MEMS Attitude Heading Reference Systems are widely employed to determine a system's attitude, but sensor measurement biases limit their accuracy. This paper introduces a novel factor graph-based method called MAgnetometer and GYroscope Calibration (MAGYC). MAGYC leverages three-axis angular rate measurements from an angular rate gyroscope to enhance calibration for batch and online applications. Our approach imposes less restrictive conditions for instrument movements required for calibration, eliminates the need for knowledge of the local magnetic field or instrument attitude, and facilitates integration into factor graph algorithms within Smoothing and Mapping frameworks. We evaluate the proposed methods through numerical simulations and in-field experimental assessments using a sensor installed on an underwater vehicle. Ultimately, our proposed methods reduced the underwater vehicle's heading error standard deviation from 6.21 degrees to 0.57 degrees for a standard seafloor mapping survey.
|
|
14:00-15:00, Paper TuPIT11.7 | |
Efficient Feature Mapping Using a Collaborative Team of AUVs |
|
Biggs, Benjamin | Virginia Polytechnic Institute and State University |
Stilwell, Daniel | Virginia Tech |
Yetkin, Harun | Bartin University |
McMahon, James | The Naval Research Laboratory |
Keywords: Marine Robotics, Field Robots
Abstract: We present the results of experiments performed using a team of small autonomous underwater vehicles (AUVs) to determine the location of an isobath. The primary contributions of this work are (1) the development of a novel objective function for level set estimation that utilizes a rigorous assessment of uncertainty, and (2) a description of the practical challenges and corresponding solutions needed to implement our approach in the field using a team of AUVs. We combine path planning techniques and an approach to decentralization from prior work that yields theoretical performance guarantees. Experimentation with a team of AUVs provides empirical evidence that the desirable performance guarantees can be preserved in practice even in the presence of limitations that commonly arise in underwater robotics, including slow and intermittent acoustic communications and limited computational resources.
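The kind of objective used for isobath (level set) estimation can be illustrated with a classical example: the straddle acquisition rewards candidate survey points whose predicted depth is both uncertain and close to the target isobath. This is a generic textbook heuristic shown for context, not the authors' uncertainty-aware objective function.

import numpy as np

def straddle_score(mu, sigma, target_depth, beta=1.96):
    # High where the credible interval straddles the target isobath.
    return beta * np.asarray(sigma) - np.abs(np.asarray(mu) - target_depth)

# Candidate points with predicted depth mean/std (e.g., from a Gaussian process over bathymetry).
mu = np.array([9.0, 10.2, 14.5])
sigma = np.array([0.5, 1.5, 0.3])
print(np.argmax(straddle_score(mu, sigma, target_depth=10.0)))   # -> 1 (most informative candidate)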
|
|
14:00-15:00, Paper TuPIT11.8 | |
Real-Time Horizon Locking on Unmanned Surface Vehicles |
|
Kiefer, Benjamin | University of Tuebingen |
Zell, Andreas | University of Tübingen |
Keywords: Marine Robotics, Computer Vision for Transportation, Data Sets for Robotic Vision
Abstract: The expanding use of automated vision, assistance systems, and augmented reality applications in marine settings calls for reliable and accurate horizon detection and locking. Traditional methods utilizing Inertial Measurement Units (IMUs) or feature-based computer vision techniques often yield inconsistent results, particularly when unmanned surface vehicles or boats are subject to high-speed movement or choppy waters. Addressing this, our work introduces a computer vision (CV)-based solution for real-time horizon locking. Employing real-time semantic segmentation, we accurately differentiate between sky, land, and water in the frame, enabling computational locking of the horizon's position. This stable visual reference significantly improves the performance and reliability of onboard systems for autonomous navigation, augmented reality overlays, and multi-object tracking. Supported by a dataset collected under various marine conditions, our method achieves high accuracy with low computational latency, making it a promising avenue for wide-scale implementation on automated and semi-automated systems.
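Once a per-pixel sky segmentation is available, locking the horizon reduces to fitting a line to the lowest sky pixel in each image column and reading off the image-plane roll. The snippet below shows that post-processing step on a synthetic mask; it is a generic illustration, not the authors' segmentation network or stabilisation pipeline.

import numpy as np

def fit_horizon(sky_mask):
    # Fit y = a*x + b to the sky boundary of a boolean (H, W) mask (True = sky).
    h, w = sky_mask.shape
    cols, ys = [], []
    for x in range(w):
        rows = np.flatnonzero(sky_mask[:, x])
        if rows.size:                        # lowest sky pixel in this column
            cols.append(x)
            ys.append(rows.max())
    a, b = np.polyfit(cols, ys, deg=1)
    return a, b, np.arctan(a)                # slope, intercept, roll angle [rad]

# Synthetic tilted horizon: sky occupies rows above y = 0.1*x + 40.
yy, xx = np.mgrid[0:480, 0:640]
print(fit_horizon(yy < (0.1 * xx + 40)))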
|
|
14:00-15:00, Paper TuPIT11.9 | |
Adaptive Control Barrier Functions for Near-Structure ROV Operations |
|
von Benzon, Malte | Aalborg University |
Marley, Mathias | NTNU |
Sørensen, Fredrik Fogh | Aalborg University |
Liniger, Jesper | Aalborg University |
Pedersen, Simon | Aalborg University |
Keywords: Marine Robotics, Robotics in Hazardous Fields, Motion Control
Abstract: This paper introduces a novel control design focused on enhancing the operational safety and efficiency of Inspection, Maintenance, and Repair (IMR) operations conducted by autonomous remotely operated vehicles. Specifically, we propose using a safeguarding controller based on an adaptive Control Barrier Function (CBF) incorporating safety properties previously identified in the literature. This approach permits temporary safe set violations, using an integrative penalty term to strengthen safety measures when necessary. A nominal nonlinear controller inside the safety bounds is proposed to ensure good reference tracking. The proposed controller is demonstrated in a simulation case study where an ROV is set to clean an offshore monopile. The simulation includes unknown time-varying and step-like disturbances caused by water waves, the tether, ocean currents, and the high-pressure water jet. The proposed control law can react to the disturbances, and the ROV never leaves the defined safe set.
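A minimal sketch of a (non-adaptive) control barrier function safety filter is given below, assuming scipy and a barrier derivative that is affine in the control; the paper's adaptive CBF with an integrative penalty term is more involved.

```python
import numpy as np
from scipy.optimize import minimize

def cbf_safety_filter(u_nom, h, dh_du, alpha=1.0):
    """Minimally modify a nominal control so the CBF condition holds.

    Solves  min ||u - u_nom||^2  s.t.  dh/dt >= -alpha * h(x),
    where, for brevity, the time derivative of the barrier h is assumed
    affine in u with the drift term dropped: dh/dt = dh_du @ u.
    """
    cons = {"type": "ineq",
            "fun": lambda u: dh_du @ u + alpha * h}
    res = minimize(lambda u: np.sum((u - u_nom) ** 2), x0=u_nom,
                   constraints=[cons])
    return res.x

# Hypothetical 2-input example: nominal thrust pushes toward a structure.
u_safe = cbf_safety_filter(u_nom=np.array([1.0, 0.0]),
                           h=0.2, dh_du=np.array([-1.0, 0.0]))
print(u_safe)  # first component reduced so that -u1 + 0.2 >= 0
```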
|
|
14:00-15:00, Paper TuPIT11.10 | |
Integrated 3DOF Trajectory Tracking Control for Under-Actuated Marine Surface Vehicles by Trajectory Linearization |
|
Sempertegui, Miguel | Ohio University |
Zhu, J. Jim | Ohio University |
Keywords: Marine Robotics, Motion Control, Nonholonomic Mechanisms and Systems
Abstract: An integrated 3DOF trajectory tracking control algorithm with lateral drift correction for under-actuated Marine Surface Vehicles (MSVs) is presented using the Multi-Nested Loop Trajectory Linearization Control architecture. The sideslip angle is used as a virtual control effector for generating a lateral hydrodynamic force to correct the lateral drift due to a skid-turn, which is intrinsic to MSVs. The nominal sideslip angle is determined based on the kinematics and dynamics of the MSV along a nominal trajectory. Simulation results for a sub-scale vessel with significant vessel parameter perturbations are presented to demonstrate the effectiveness of the proposed algorithm.
|
|
14:00-15:00, Paper TuPIT11.11 | |
SAVOR: Sonar-Aided Visual Odometry and Reconstruction for Autonomous Underwater Vehicles |
|
Coffelt, Jeremy Paul | Rosenxt |
Kampmann, Peter | ROSEN Technology and Research Center GmbH |
Wehbe, Bilal | German Research Center for Artificial Intelligence |
Keywords: Marine Robotics, Field Robots, Vision-Based Navigation
Abstract: Visual odometry (VO) relies on sequential camera images to estimate robot motion. For underwater robots, this is often complicated by turbidity, light attenuation, and environments containing scarce or repetitive features. Even ideal imagery suffers from the issue of scale ambiguity common to all monocular VO implementations. To address these issues, we supplement a camera with a multibeam echosounder. This acoustic, time-of-flight sensor comes with its own challenges, including relatively slow and sparse measurements that can be further degraded by backscatter from suspended particulate matter as well as interfering sounds from nearby marine traffic. We propose a method for fusing only data from these two inspection sensors into a hybrid VO solution that does not rely on IMU, DVL, or any other positioning sensor. We demonstrate this method on real data collected by an autonomous underwater vehicle performing end-to-end pipeline inspection in the open ocean, where multiple passes through the same scene (i.e., the "loop closure" common to SLAM algorithms) are often time- and cost-prohibitive. We also show how this approach can be extended for the creation of dense point clouds that provide a colored reconstruction of the surveyed scene.
|
|
14:00-15:00, Paper TuPIT11.12 | |
Prediction of Acoustic Communication Performance for AUVs Using Gaussian Process Classification |
|
Gao, Yifei | Virginia Tech |
Yetkin, Harun | Bartin University |
Stilwell, Daniel | Virginia Tech |
McMahon, James | The Naval Research Laboratory |
Keywords: Marine Robotics
Abstract: Cooperating autonomous underwater vehicles (AUVs) often rely on acoustic communication to coordinate their actions effectively. However, the reliability of underwater acoustic communication decreases as the communication range between vehicles increases. Consequently, teams of cooperating AUVs typically make conservative assumptions about the maximum range at which they can communicate reliably. To address this limitation, we propose a novel approach that involves learning a map representing the probability of successful communication based on the locations of the transmitting and receiving vehicles. This probabilistic communication map accounts for factors such as the range between vehicles, environmental noise, and multi-path effects at a given location. In pursuit of this goal, we investigate the application of Gaussian process binary classification to generate the desired communication map. We specialize existing results to this specific binary classification problem and explore methods to incorporate uncertainty in vehicle location into the mapping process. Furthermore, we compare the prediction performance of the probabilistic communication map generated using binary classification with that of a signal-to-noise ratio (SNR) communication map generated using Gaussian process regression. Our approach is experimentally validated using communication and navigation data collected during trials with a pair of Virginia Tech 690 AUVs.
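The snippet below shows generic Gaussian process binary classification with scikit-learn on hypothetical range/success data; the paper's map is built over transmitter and receiver locations and accounts for location uncertainty, which is omitted here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical training data: transmitter-receiver range (m) and a binary
# label for whether an acoustic packet was decoded successfully.
rng = np.random.default_rng(1)
ranges = rng.uniform(50, 800, size=(200, 1))
success = (rng.uniform(size=200) < np.clip(1.2 - ranges[:, 0] / 600, 0, 1)).astype(int)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=100.0))
gpc.fit(ranges, success)

# Probability-of-success "map" along range; in the paper's setting the input
# would be the locations of both vehicles rather than range alone.
query = np.linspace(50, 800, 5).reshape(-1, 1)
print(gpc.predict_proba(query)[:, 1])
```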
|
|
14:00-15:00, Paper TuPIT11.13 | |
This Is the Way: Mitigating the Roll of an Autonomous Uncrewed Surface Vessel in Wavy Conditions Using Model Predictive Control |
|
Jenkins, Daniel | Queen's University |
Marshall, Joshua A. | Queen's University |
Keywords: Marine Robotics, Motion and Path Planning, Optimization and Optimal Control
Abstract: Though larger vessels may be well-equipped to deal with wavy conditions, smaller vessels are often more susceptible to disturbances. This paper explores the development of a nonlinear model predictive control (NMPC) system for Uncrewed Surface Vessels (USVs) in wavy conditions to minimize average roll. The NMPC is based on a prediction method that uses information about the vessel's dynamics and an assumed wave model. This method is able to mitigate the roll of an under-actuated USV in a variety of conditions by adjusting the weights of the cost function. The results show a 39% reduction in average roll with a tuned controller in conditions with 1.75-metre sinusoidal waves. A general and intuitive tuning strategy is established. This preliminary work is a proof of concept which sets the stage for leveraging wave prediction methodologies to perform planning and control in real time for USVs in real-world scenarios and field trials.
|
|
14:00-15:00, Paper TuPIT11.14 | |
A Deep Reinforcement Learning Framework and Methodology for Reducing the Sim-To-Real Gap in ASV Navigation |
|
Batista, Luis F. W. | Georgia Institute of Technology and Université de Lorraine |
Ro, Junghwan | Georgia Institute of Technology |
Richard, Antoine | University of Luxembourg |
Schroepfer, Pete | CNRS IRL 2958 |
Hutchinson, Seth | Georgia Institute of Technology |
Pradalier, Cedric | GeorgiaTech Lorraine |
Keywords: Marine Robotics, Reinforcement Learning, Field Robots
Abstract: Despite the increasing adoption of Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), there still remain challenges limiting real-world deployment. In this paper, we first integrate buoyancy and hydrodynamics models into a modern Reinforcement Learning framework to reduce training time. Next, we show how system identification coupled with domain randomization improves the RL agent performance and narrows the sim-to-real gap. Real-world experiments for the task of capturing floating waste show that our approach lowers energy consumption by 13.1% while reducing task completion time by 7.4%. These findings, supported by sharing our open-source implementation, hold the potential to impact the efficiency and versatility of ASVs, contributing to environmental conservation efforts.
|
|
14:00-15:00, Paper TuPIT11.15 | |
OAS-GPUCB: On-The-Way Adaptive Sampling Using GPUCB for Bathymetry Mapping |
|
Agrawal, Rajat | Indian Institute of Science Education and Research Bhopal |
Nambiar, Karthik | IISER Bhopal |
Chhaglani, Bhawana | Bharati Vidyapeeth's College of Engineering |
Chitre, Mandar | National University of Singapore |
Pb, Sujit | IISER Bhopal |
Keywords: Marine Robotics
Abstract: Bathymetry mapping of static water bodies like lakes is essential for sustainable ecosystem development strategies. However, bathymetry mapping using (i) traditional manual sampling approaches with a single-beam echosounder (SBE) has large mapping errors, and (ii) performing a multi-beam survey is very expensive. Alternatively, performing lawn-mower SBE surveys is time-consuming due to the limited field-of-view. In order to address the above issues, in this paper, we present an on-the-way sampling approach with the Gaussian Process Upper Confidence Bound (GPUCB) algorithm, called OAS-GPUCB, that can adaptively sample the lake to minimize the bathymetry error while reducing the distance travelled to achieve a given mapping accuracy. We validate the proposed approach using simulations on actual lake bathymetry maps and also carry out real-world experiments using an Autonomous Surface Vehicle (ASV) with an SBE. Further, we compare OAS-GPUCB to the lawn-mower, GPUCB, and GPUCB with fixed radius approaches. The results consistently show that the proposed approach can achieve less than 10% bathymetry error while achieving a distance reduction of more than 55% compared to the lawn-mower approach, and more than 90% less distance travelled compared to the GPUCB and GPUCB with fixed radius approaches. The results show the general applicability of OAS-GPUCB for bathymetry mapping of water bodies without any prior information maps.
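For context, a minimal GP-UCB acquisition step with scikit-learn is sketched below on hypothetical depth soundings; the on-the-way sampling and travel-distance trade-off that define OAS-GPUCB are not modelled.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gpucb_next_sample(X_obs, depth_obs, candidates, beta=2.0):
    """Pick the next sounding location with the GP-UCB rule.

    The acquisition mu + sqrt(beta) * sigma favours points that are either
    predicted to be informative or highly uncertain; travel distance and
    the on-the-way constraint of OAS-GPUCB are ignored in this sketch.
    X_obs: (N, 2) visited locations, depth_obs: (N,) measured depths,
    candidates: (M, 2) candidate sounding locations.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=20.0),
                                  alpha=1e-2, normalize_y=True)
    gp.fit(X_obs, depth_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + np.sqrt(beta) * sigma
    return candidates[int(np.argmax(ucb))], ucb
```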
|
|
14:00-15:00, Paper TuPIT11.16 | |
Interpretation of Legged Locomotion in Underwater Robots Based on Rimless Wheel Model |
|
He, Yuetong | Japan Advanced Institute of Science and Technology |
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Keywords: Marine Robotics, Legged Robots, Search and Rescue Robots
Abstract: Inspired by the fascinating underwater locomotion of cephalopods such as octopuses, this research explores the possibility of using legged robots in underwater environments. Using a rimless wheel model, we investigated their navigation and adaptation capabilities in a dynamic fluid environment. Through sophisticated numerical simulations, we meticulously reproduce legged robot behaviours such as walking underwater and jumping over uneven seabed terrain. Our research results aim to provide valuable insights for the development of versatile underwater robotic systems suitable for various applications in ocean exploration and surveying.
|
|
14:00-15:00, Paper TuPIT11.17 | |
Risk-Averse Planning and Plan Assessment for Marine Robots |
|
Mohammadi Kashani, Mahya | IT-University of Copenhagen |
John, Tobias | University of Oslo |
Coffelt, Jeremy Paul | Rosenxt |
Johnsen, Einar Broch | University of Oslo |
Wasowski, Andrzej | IT University of Copenhagen |
Keywords: Formal Methods in Robotics and Automation, Planning, Scheduling and Coordination, Robot Safety
Abstract: Autonomous Underwater Vehicles (AUVs) need to operate for days without human intervention and thus must be able to do efficient and reliable task planning. Unfortunately, efficient task planning requires deliberately abstract domain models (for scalability reasons), which leads to plans that may be unreliable or underperform in practice. An optimal abstract plan may turn out suboptimal or unreliable during physical execution. To overcome this, we introduce a method that first generates a selection of diverse high-level plans and then assesses them in a low-level simulation to select the optimal and most reliable candidate. We evaluate the method using a realistic underwater robot simulation, estimating the risk metrics for different scenarios, demonstrating the feasibility and effectiveness of the approach.
|
|
TuPIT12 |
Room 12 |
Design of Robotics Systems |
Teaser Session |
Chair: Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Co-Chair: El-Khasawneh, Bashar | Khalifa University |
|
14:00-15:00, Paper TuPIT12.1 | |
MANIP: A Modular Architecture for Integrating Interactive Perception for Robot Manipulation |
|
Yu, Justin | University of California Berkeley |
Sadjadpour, Tara | University of California, Berkeley |
O'Neill, Abigail | UC Berkeley BAIR |
Khfifi, Mehdi | University of California Berkeley |
Chen, Lawrence Yunliang | UC Berkeley |
Cheng, Richard | California Institute of Technology |
Irshad, Muhammad Zubair | Georgia Institute of Technology |
Balakrishna, Ashwin | Toyota Research Institute |
Kollar, Thomas | Toyota Research Institute |
Goldberg, Ken | UC Berkeley |
Keywords: Methods and Tools for Robot System Design, Perception-Action Coupling, Bimanual Manipulation
Abstract: We propose a modular systems architecture, MANIP, that can facilitate the design and development of robot manipulation systems by systematically combining learned sub-policies with well-established procedural algorithmic primitives such as Inverse Kinematics, Kalman Filters, RANSAC outlier rejection, PID modules, etc. (aka "Good Old Fashioned Engineering (GOFE)"). The MANIP architecture grew from our lab's experience developing robot systems for folding clothes, routing cables, and untangling knots. To address failure modes, MANIP can facilitate inclusion of "interactive perception" sub-policies that execute robot actions to modify system state to bring the system into alignment with the training distribution and/or to disambiguate system state when system state confidence is low. We demonstrate how MANIP can be applied with 3 case studies and then describe a detailed case study in cable tracing with experiments that suggest MANIP can improve performance by up to 88%. Code and details are available at: https://berkeleyautomation.github.io/MANIP/
|
|
14:00-15:00, Paper TuPIT12.2 | |
Kinematic Modeling of Twisted String Actuator Based on Invertible Neural Networks |
|
Liu, Zekun | University of Electronic Science and Technology of China |
Wei, Dunwen | University of Electronic Science and Technology of China |
Gao, Tao | University of Electronic Science and Technology of China |
Gong, Jumin | University of Electronic Science and Technology of China |
Keywords: Tendon/Wire Mechanism, Kinematics, Soft Sensors and Actuators
Abstract: Twisted String Actuators (TSAs) exhibit several advantages, including light weight, compactness, and a high power-to-weight ratio. However, current research on kinematic models of TSAs is limited to deriving the relationship between motor input and output through idealized geometric calculations. Therefore, the accuracy of these models does not meet the requirements for practical applications. Previous studies on the kinematic modeling of TSAs have not considered the impact of material plastic deformation and stroke count on TSA kinematics. Accumulation of plastic deformation over multiple strokes leads to changes in the output displacement of the TSA, significantly affecting the accuracy of the kinematic model. This study aims to address the limitations of previous research by investigating the use of Invertible Neural Networks (INNs) in the kinematic modeling of TSAs, taking into account material plastic deformation and stroke count. Through a series of TSA experiments, a kinematic model of the TSA was established using an INN that considers stroke count. The INN model proves to be superior in both forward and inverse kinematic modeling by effectively compensating for the effects of plastic deformation during TSA operation. The experimental results demonstrate that the kinematic model established by the proposed INN is more aligned with actual conditions than traditional kinematic models. This insight can aid in predicting the lifespan of TSAs in the future.
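The idealized geometric TSA model that the abstract contrasts against is commonly written as Δx = ℓ − sqrt(ℓ² − θ²r²); a small numpy sketch (with hypothetical parameter values) is given below.

```python
import numpy as np

def tsa_contraction(theta, length, radius):
    """Idealized TSA kinematics: string shortening vs. motor twist.

    theta: motor rotation (rad), length: untwisted string length (m),
    radius: effective string radius (m). Plastic deformation and stroke
    history, the effects the INN model targets, are ignored.
    """
    return length - np.sqrt(length**2 - (theta * radius) ** 2)

def tsa_required_twist(delta_x, length, radius):
    """Inverse model: motor rotation needed for a desired contraction."""
    return np.sqrt(length**2 - (length - delta_x) ** 2) / radius

# Hypothetical example: 0.2 m string, 0.5 mm radius, 20 motor turns.
print(tsa_contraction(theta=40 * np.pi, length=0.20, radius=0.0005))  # ~10 mm
```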
|
|
14:00-15:00, Paper TuPIT12.3 | |
CubiX: Portable Wire-Driven Parallel Robot Connecting to and Utilizing the Environment |
|
Inoue, Shintaro | The University of Tokyo |
Kawaharazuka, Kento | The University of Tokyo |
Suzuki, Temma | The University of Tokyo |
Yuzaki, Sota | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Tendon/Wire Mechanism, Parallel Robots, Multi-Robot Systems
Abstract: A wire-driven parallel robot is a type of robotic system where multiple wires are used to control the movement of an end-effector. The wires are attached to the end-effector and anchored to fixed points on external structures. This configuration allows for the separation of actuators and end-effectors, enabling lightweight and simplified movable parts in the robot. However, its range of motion remains confined within the space formed by the wires, limiting the wire-driven capability to the pre-designed operational range. In this study, we develop a wire-driven robot, CubiX, capable of connecting to and utilizing the environment. CubiX connects itself to the environment using up to 8 wires and drives itself by winding these wires. By integrating actuators for winding the wires into CubiX, a portable wire-driven parallel robot is realized without limitations on its workspace. Consequently, the robot can form parallel wire-driven structures by connecting wires to the environment at any operational location.
|
|
14:00-15:00, Paper TuPIT12.4 | |
Formalization of Temporal and Spatial Constraints of Bimanual Manipulation Categories |
|
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Bimanual Manipulation, Dual Arm Manipulation, Learning from Demonstration
Abstract: Executing bimanual manipulation tasks on humanoid robots introduces additional challenges due to the inherent spatial and temporal coordination between both hands. In our previous work, we proposed the Bimanual Manipulation Taxonomy, which defines categories of bimanual manipulation strategies based on the coordination and physical interaction between both hands, the role of each hand in the task, and the symmetry of arm movements during task execution. In this work, we build upon this taxonomy and provide a formalization of temporal and spatial constraints associated with each category of the taxonomy. This formalization uses Petri nets to represent temporal constraints and differentiates between relative and global targets. We incorporate these constraints into category-specific controllers to enable reactive adaptation of the behavior according to the respective coordination constraints. We evaluated our approach in simulation and in real-world experiments on the humanoid robot ARMAR-6. The results demonstrate that category-specific constraints can be enforced when needed while maintaining flexibility to accommodate additional constraints.
|
|
14:00-15:00, Paper TuPIT12.5 | |
Design and Implementation of a Novel Wheel-Based Cable Inspection Robot |
|
Hou, Mengqi | Nanjing University of Posts and Telecommunications |
Li, Jie | Nanjing University of Posts and Telecommunications |
Xu, Fengyu | Southeast University |
Hu, LeZhi | Nanjing University of Posts and Telecommunications |
Keywords: Climbing Robots, Mechanism Design, Kinematics
Abstract: Regular maintenance and inspection of cables are essential for cable-stayed bridges and suspension bridges, as cables serve as the core components. In order to enable automated detection of cables, this paper proposes a novel wheeled cable inspection robot. The robot utilizes a bilateral wheel structure and is composed of four independent suspension mechanisms. By collaborating with a lifting mechanism, the robot achieves functions such as adhesion, climbing, and obstacle overcoming. The robot is powered by a lithium polymer battery and operated via wireless control by ground personnel. This paper provides a detailed exposition of the structural design and control system of the robot and conducts a mechanical analysis of its suspension mechanism. The maximum obstacle-negotiation height of the robot is calculated, and a mechanical model for cable climbing is established. During prototype testing, the robot demonstrated a mass of 6.7 kg, a maximum payload capacity of 6 kg, a maximum obstacle height of 10 mm, and a fastest climbing speed of 14 m/min. These specifications meet the requirements of practical inspections.
|
|
14:00-15:00, Paper TuPIT12.6 | |
Towards Electricity-Free Pneumatic Miniature Rotation Actuator for Optical Coherence Tomography Endoscopy |
|
Zhang, Tinghua | The Chinese University of Hong Kong |
Yuan, Sishen | The Chinese University of Hong Kong |
Xu, Chao | The Chinese University of Hong Kong |
Liu, Peng | Harbin Institute of Technology, Shenzhen |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Yuan, Wu | The Chinese University of Hong Kong |
Keywords: Hydraulic/Pneumatic Actuators, Actuation and Joint Mechanisms, Soft Sensors and Actuators
Abstract: Miniature rotation actuators have been extensively developed and utilized in optical coherence tomography (OCT) endoscopy, enabling distortion-free OCT imaging in complex and tortuous environments. However, the use of electrically driven rotation actuators raises safety concerns. Although magnetically driven rotation actuators have been reported in OCT endoscopy, their use can potentially interfere with other medical devices in clinical settings. Here, we propose a pneumatic miniature rotation actuator that eliminates the electricity and magnetism concerns in circumferential imaging for OCT endoscopy. The rotor of the actuator is designed as a windmill, enabling it to convert air energy into rotation energy. Meanwhile, to maintain stable rotation, both a sliding bearing with two supporting points and a glass spindle with a half-ball end surface are developed. The rotation speed of our pneumatic actuator can be controlled from 66 to 97 revolutions per second by adjusting the airflow rate from 3.25 to 4.00 liters per minute. By OCT imaging of human fingers, we demonstrate the feasibility of the pneumatic actuator in electricity-free distal scanning OCT endoscopy. Our pneumatic rotation actuator has wide-ranging potential in various fiber-imaging modalities, including not only OCT but also ultrasound imaging that requires similar rotation capabilities.
|
|
14:00-15:00, Paper TuPIT12.7 | |
ICR-Based Kinematics for Wheeled Skid-Steer Vehicles on Firm Slopes |
|
Martinez, Jorge L. | University of Malaga |
Morales, Jesús | Universidad De Málaga |
Sánchez-Montero, Manuel | University of Malaga |
García-Cerezo, Alfonso | University of Malaga |
Keywords: Kinematics, Autonomous Vehicle Navigation, Wheeled Robots
Abstract: This paper proposes a new kinematic model for wheeled skid-steer vehicles moving with low inertia on firm slopes that is based on the variation of the Instantaneous Center of Rotation (ICR) of its lateral treads. To this end, the ICR-based model for horizontal surfaces, where constant tread ICRs were considered, has been extended with two additional parameters to account for the sliding down phenomenon on inclined terrains that the former is unable to predict. The current pitch and roll angles of the vehicle together with the speeds of the treads are employed to calculate the changing positions of tread ICRs during turnings. This kinematic approach shows promising results when applied to the heavy robotic rover J8 on a slanted surface of smooth concrete.
|
|
14:00-15:00, Paper TuPIT12.8 | |
Enhancing Robustness in Manipulability Assessment: The Pseudo-Ellipsoid Approach |
|
Shahriari, Erfan | Technical University of Munich |
Peper, Kim Kristin | Technical University of Munich |
Hoffmann, Matej | Czech Technical University in Prague, Faculty of Electrical Engineering |
Haddadin, Sami | Technical University of Munich |
Keywords: Kinematics, In-Hand Manipulation, Human Factors and Human-in-the-Loop
Abstract: Manipulability analysis is a methodology employed to assess the capacity of an articulated system, at a specific configuration, to produce motion or exert force in diverse directions. The conventional method entails generating a virtual ellipsoid using the system's configuration and model. Yet, this approach poses challenges when applied to systems such as the human body, where direct access to such information is limited, necessitating reliance on estimations. Any inaccuracies in these estimations can distort the ellipsoid's configuration, potentially compromising the accuracy of the manipulability assessment. To address this issue, this article extends the standard approach by introducing the concept of the manipulability pseudo-ellipsoid. Through a series of theoretical analyses, simulations, and experiments, the article demonstrates that the proposed method exhibits reduced sensitivity to noise in sensory information, consequently enhancing the robustness of the approach.
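For reference, the conventional manipulability ellipsoid the abstract builds on can be computed from the Jacobian's singular value decomposition; a numpy sketch with a hypothetical two-link planar Jacobian follows.

```python
import numpy as np

def manipulability_ellipsoid(J):
    """Principal axes of the classical velocity manipulability ellipsoid.

    The ellipsoid is defined by J J^T; its semi-axes are the left singular
    vectors of J scaled by the singular values, and w = prod(sigma) is
    Yoshikawa's manipulability measure. The pseudo-ellipsoid variant in
    the abstract modifies this construction to be robust to noisy J.
    """
    U, s, _ = np.linalg.svd(J)
    axes = U[:, :len(s)] * s          # columns: ellipsoid semi-axes
    w = np.prod(s)
    return axes, w

# Hypothetical 2-link planar arm Jacobian at q = (0.3, 0.8) rad, 1 m links.
q1, q2, l1, l2 = 0.3, 0.8, 1.0, 1.0
J = np.array([[-l1*np.sin(q1) - l2*np.sin(q1+q2), -l2*np.sin(q1+q2)],
              [ l1*np.cos(q1) + l2*np.cos(q1+q2),  l2*np.cos(q1+q2)]])
print(manipulability_ellipsoid(J))
```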
|
|
14:00-15:00, Paper TuPIT12.9 | |
Navigated Locomotion and Controllable Splitting of a Microswarm in a Complex Environment |
|
Liu, Yuezhen | The Chinese University of Hong Kong, Shenzhen |
Zeng, Guangjun | The Chinese University of Hong Kong, Shenzhen |
Du, Xingzhou | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Fang, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Yu, Jiangfan | Chinese University of Hong Kong, Shenzhen |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales
Abstract: Reconfigurable microswarms have received extensive attention recently. In this work, we propose a control strategy for a ribbon-like swarm to perform navigated locomotion with a stable pattern and controllable splitting into two subswarms to reach two targets simultaneously. Two different behaviors of the ribbon-like swarm are first investigated, i.e., locomotion with a stable pattern and controllable splitting. The two behaviors are realized based on different aspect ratios of the swarm. Subsequently, we propose a morphology controller to keep the aspect ratio of the swarm within a desired range. The morphology controller consists of a feedforward controller and a PD controller: the feedforward controller contains a fitted model, while a fuzzy logic controller compensates online for the model error. A control strategy combining morphology planning, the morphology controller, path planning, and a motion controller is developed. Using the proposed control strategy, the ribbon-like swarm can be navigated to follow a desired path with a stable pattern while avoiding obstacles, and finally perform controllable splitting into two subswarms to reach two predefined targets simultaneously.
|
|
14:00-15:00, Paper TuPIT12.10 | |
NanoNeRF: Robot-Assisted Nanoscale 360° Reconstruction with Neural Radiance Field under Scanning Electron Microscope |
|
Fu, Xiang | ShanghaiTech University |
Xu, Yifan | ShanghaiTech University |
Wang, Shudong | Xi'an Jiaotong University |
Lu, Haojian | Zhejiang University |
Li, Jiaqi | ShanghaiTech University |
Li, Y.F. | City University of Hong Kong |
Su, Hu | Institute of Automation, Chinese Academy of Sciences |
Liu, Song | ShanghaiTech University |
Keywords: Micro/Nano Robots, Computer Vision for Automation, Automation at Micro-Nano Scales
Abstract: The pursuit of 3D reconstruction from 2D images for nanomanipulation under scanning electron microscopy stands as a critical research endeavor. Previous methods either necessitate additional lighting, which is difficult in standard SEM devices, or rely on feature matching with low resolution and precision, further constraining reconstruction performance. In this paper, we propose a novel robot-assisted nanoscale 360° reconstruction approach, which simplifies SEM setups and maximizes the utilization of robot motion and feedback. By harnessing a nanorobotic system, we capture 360° multi-view images automatically with precise mapping information and camera postures. Subsequently, a neural radiance field reconstructs the pixel-wise structure and synthesizes images from diverse perspectives. Experimental results using two real datasets demonstrate our approach's efficacy, achieving a PSNR of 28.1 and SSIM of 0.93 for nanotube reconstruction, and a PSNR of 32.8 and SSIM of 0.98 for AFM cantilever reconstruction. These results validate the reliability and robustness of our proposed robot-assisted reconstruction method.
|
|
14:00-15:00, Paper TuPIT12.11 | |
A New 10-Mg SMA-Based Fast Bimorph Actuator for Microrobotics |
|
Trygstad, Conor | Washington State University |
Blankenship, Elijah | Washington State University |
Perez-Arancibia, Nestor O | Washington State University (WSU) |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Marine Robotics
Abstract: We present a new millimeter-scale bimorph actuator for microrobotic applications, driven by feedforward controlled shape-memory alloy (SMA) wires. The device weighs 10 mg, measures 14 mm in length, and occupies a volume of 4.8 mm³, which makes it the lightest and smallest fully functional SMA-based bimorph actuator for microrobotics developed to date. The experimentally measured operational bandwidth is on the order of 20 Hz, and the unimorph and bimorph maximum low-frequency displacement outputs are on the order of 3.5 and 7 mm, respectively. To test and demonstrate the functionality and suitability of the actuator for microrobotics, we developed the Fish-&-Ribbon–Inspired Small Swimming Harmonic roBot (FRISSHBot). Loosely inspired by carangiformes, the FRISSHBot leverages fluid-structure interaction (FSI) phenomena to propel itself forward, weighs 30 mg, measures 34 mm in length, operates at frequencies of up to 4 Hz, and swims at speeds of up to 3.06 mm·s⁻¹ (0.09 Bl·s⁻¹). This robot is the lightest and smallest swimmer with onboard actuation developed to date.
|
|
14:00-15:00, Paper TuPIT12.12 | |
On a Magnetically Driven Array System with Autonomous Motion and Object Delivery for Biomedical Microrobots |
|
Liu, Yueyue | Jiangnan University |
Hou, Zhe | Jiangnan University |
Fan, Qigao | Jiangnan University |
Keywords: Micro/Nano Robots
Abstract: The application of microrobots in the biomedical field has attracted great interest, among which drug transportation is one of the application scenarios. Traditional studies used a global magnetic field to control a single microrobot, making it impossible to control multiple microrobots independently. To address this problem, this paper develops a local magnetic field generation system to realize the independent control of multiple microrobots. The proposed multi-microrobot motion system integrates perception, planning, and actuation, enabling autonomous multi-task drug delivery. In our system, we first develop a printed circuit board (PCB) based, magnetically driven microrobot system built on a micro-coil array; then the Yolov8 framework is employed for target/environment recognition, accurately identifying microrobots and magnetic fluids, while the Rapidly-exploring Random Trees (RRT) algorithm is used for path planning. We have conducted experiments on obstacle avoidance, droplet transport, and drug fusion. The results clearly demonstrate the significant potential of magnetic field-driven micro-coil array devices in transportation and drug fusion engineering.
|
|
14:00-15:00, Paper TuPIT12.13 | |
Analysis of Lockable Passive Prismatic and Revolute Joints |
|
Rosyid, Abdur | Khalifa University |
El-Khasawneh, Bashar | Khalifa University |
Keywords: Parallel Robots, Field Robots
Abstract: This paper analyzes the stresses, positional errors, and friction in lockable passive prismatic and revolute joints. The joints’ locking mechanisms that use solenoids to trigger the locking action have a self-alignment capability. The stress analysis evaluates the strength and material deformation of the joints’ components. The positional error analysis relates the clearances and contact deformations in the joints’ assembly with the positional errors of the joints. The friction analysis investigates how the friction during the locking motion interacts with the joint load, the pushing force, and the locking acceleration. The stress analysis was performed analytically for simplified cases and by finite element analysis for cases involving complex geometries and nonlinear contact. The positional error and friction analyses were performed analytically by deriving the kinematic and dynamic equations. Discussions based on the analyses provide a deeper understanding of the behavior of lockable joints that applies not only to the specific joints discussed in this paper but also to other lockable joints working with similar principles.
|
|
14:00-15:00, Paper TuPIT12.14 | |
Development of a Novel Redundant Parallel Mechanism with Enlarged Workspace and Enhanced Dexterity for Fracture Reduction Surgery |
|
Yuan, Quan | ShanghaiTech University |
Liang, Xu | Beijing Jiaotong University |
Su, Tingting | Beijing University of Technology |
Bai, Weibang | ShanghaiTech University |
Keywords: Parallel Robots, Redundant Robots, Mechanism Design
Abstract: The limited workspace and complex singularity issues are predominant factors impeding the clinical applicability of fracture reduction parallel robots. To address these challenges, this paper proposes a novel redundant parallel mechanism (NRPM) for robotic-assisted fracture reduction with an enlarged workspace and enhanced dexterity capabilities based on the traditional Stewart parallel mechanism (SPM). With six redundant degrees-of-freedom (DOFs) added to the novel mechanism, the kinematics of NRPM needs to be thoroughly analyzed. Furthermore, the calculation of its workspace and determination of its dexterity are deduced. Both the analytical simulation and real experiment results demonstrated the effectiveness and superior performance of the proposed NRPM compared to SPM.
|
|
14:00-15:00, Paper TuPIT12.15 | |
Embedded Sensing-Enabled External Interaction Estimation of 6-PSS Parallel Robots |
|
Xia, Jingyuan | Shanghai Jiao Tong University |
Lin, Zecai | Shanghai Jiao Tong University |
Ai, Xiaojie | Shanghai Jiao Tong University |
Yu, Guangjun | The Second Affiliated Hospital, the Chinese University of Hong Kong |
Gao, Anzhu | Shanghai Jiao Tong University |
Keywords: Parallel Robots
Abstract: Traditional interaction perception of parallel robots relies on a six-dimensional force sensor for contact sensing at the distal end. However, the sensor body occupies the space of the moving platform and also increases the load on the robot's actuation. To enable both miniaturization and embodied intelligence, this paper proposes an external interaction estimation method with embodied mechanical intelligence by embedding two single-axis force sensors in each leg of a 6-PSS parallel robot. The method uses a backward propagation neural network optimized by the sparrow search algorithm, and it can simultaneously estimate the external force and its position using information from multiple single-axis force sensors and the encoder of the driving motor. An experimental platform is established to collect the data and train the network. The results show that the mean force estimation error is 2.4% and the position estimation error is 2.9%. A demonstration with a virtual display interface, which shows the reconstructed parallel robot pose together with the interaction force and its position estimated by the proposed method, indicates the effectiveness of the proposed interaction method with embodied mechanical intelligence for the 6-PSS parallel robot.
|
|
14:00-15:00, Paper TuPIT12.16 | |
Abstraction of the Body Ability of the Transformer Robot System for the Transportation and Installation of Heavy Objects in Land and Underwater Environments |
|
Makabe, Tasuku | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Software-Hardware Integration for Robot Systems, Methods and Tools for Robot System Design, Humanoid Robot Systems
Abstract: To give a single robot system the ability to realize many behaviors and to accomplish tasks with shifting environments and objectives, it is necessary to abstract the robot's body abilities to the extent that they can be detected by sensors in the body, so that planning can be treated as a state-transition problem. In this paper, to abstract a transformer robot system that performs heavy lifting and environmental attachment tasks in aquatic and terrestrial environments, we extend the graphical representation of the robot's body to manage joint capability and the body's adaptability to the environment. To abstract the body ability, we divide the body into elements and define Connections between them at three different granularities. Using Connections, we propose the Connection Modification Feature (CMF) as a representation of changing body ability. To implement the Connection Modification Feature, we perform the abstract description and extract Connections to construct the Body Ability Graph, a graph with which the robot manages its body ability. Through multiple experiments, we show that the robot can plan to manipulate and use its own Connection Modification Features by defining Normal Actions, which do not change body abilities, and Body Ability Modifying Actions, which do.
|
|
TuAT1 |
Room 1 |
Best Conference Papers |
Regular session |
|
15:00-15:15, Paper TuAT1.1 | |
FogROS2-FT: Fault Tolerant Cloud Robotics |
|
Chen, Kaiyuan | University of California, Berkeley |
Hari, Kush | UC Berkeley |
Chung, Trinity | UC Berkeley |
Wang, Michael | Bosch |
Tian, Nan | University of California, Berkeley |
Juette, Christian | Bosch Research |
Ichnowski, Jeffrey | Carnegie Mellon University |
Ren, Liu | Robert Bosch North America Research Technology Center |
Kubiatowicz, John | UC Berkeley |
Stoica, Ion | UC Berkeley |
Goldberg, Ken | UC Berkeley |
Keywords: Distributed Robot Systems, Networked Robots, Multi-Robot Systems
Abstract: Cloud robotics enables robots to offload complex computational tasks to cloud servers for performance and ease of management. However, cloud compute can be costly, cloud services can suffer occasional downtime, and connectivity between the robot and cloud is prone to variations in network Quality-of-Service (QoS). We present FogROS2-FT (Fault Tolerant) to mitigate these issues by introducing a multi-cloud extension that automatically replicates independent stateless robotic services, routes requests to these replicas, and directs the first response back. With replication, robots can still benefit from cloud computations even when a cloud service provider is down or there is low QoS. Additionally, many cloud computing providers offer low-cost “spot” computing instances that may shutdown unpredictably. Normally, these low-cost instances would be inappropriate for cloud robotics, but the fault-tolerant nature of FogROS2-FT allows them to be used reliably. We demonstrate FogROS2-FT fault tolerance capabilities in 3 cloud-robotics scenarios in simulation (visual object detection, semantic segmentation, motion planning) and 1 physical robot experiment (scan-pick-and-place). Running on the same hardware specification, FogROS2-FT achieves motion planning with up to 2.2x cost reduction and up to a 5.53x reduction on 99 Percentile (P99) long-tail latency. FogROS2-FT reduces the P99 long-tail latency of object detection and semantic segmentation by 2.0x and 2.1x, respectively, under network slowdown and resource contention.
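A toy sketch of the replicate-and-take-first-response idea is shown below using Python asyncio; the replica names and timings are hypothetical, and FogROS2-FT itself performs this at the ROS 2 service level rather than with raw coroutines.

```python
import asyncio
import random

async def call_replica(name, request):
    """Hypothetical stand-in for one cloud replica of a stateless service."""
    await asyncio.sleep(random.uniform(0.05, 0.5))   # variable network QoS
    return f"{name} handled {request}"

async def fault_tolerant_call(request, replicas):
    """Send the request to every replica and return the first response.

    Slow or failed replicas are simply cancelled, which is the essence of
    the replicate-and-race idea sketched here.
    """
    tasks = [asyncio.create_task(call_replica(r, request)) for r in replicas]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return next(iter(done)).result()

print(asyncio.run(fault_tolerant_call("plan_motion", ["aws", "gcp", "spot-1"])))
```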
|
|
15:15-15:30, Paper TuAT1.2 | |
On the Modularity of Elementary Dynamic Actions |
|
Nah, Moses | MIT |
Lachner, Johannes | Massachusetts Institute of Technology |
Tessari, Federico | Massachusetts Institute of Technology |
Hogan, Neville | Massachusetts Institute of Technology |
Keywords: Compliance and Impedance Control, Learning from Demonstration, Dexterous Manipulation
Abstract: In this paper, a kinematically modular approach to robot control is presented. The method involves structures called Elementary Dynamic Actions and a network model combining these elements. With this control framework, a rich repertoire of movements can be generated by combination of basic modules. The problems of solving inverse kinematics, managing kinematic singularity and kinematic redundancy are avoided. The modular approach is robust against contact and physical interaction, which makes it particularly effective for contact-rich manipulation. Each kinematic module can be learned by Imitation Learning, thereby resulting in a modular learning strategy for robot control. The theoretical foundations and their implementation on a real robot are presented. Using a KUKA LBR iiwa robot, three tasks were considered: (1) generating a sequence of discrete movements, (2) generating a combination of discrete and rhythmic movements, and (3) a drawing and erasing task. The results obtained show that this modular approach has the potential to simplify the generation of a diverse range of robot actions.
|
|
15:30-15:45, Paper TuAT1.3 | |
Millipede-Inspired Multi-Legged Magnetic Soft Robots for Targeted Locomotion in Tortuous Environments |
|
Wang, Yibin | The Chinese University of HongKong, Shenzhen |
Xiong, Yiting | The Chinese University of Hong Kong, Shenzhen |
Fang, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Yu, Jiangfan | Chinese University of Hong Kong, Shenzhen |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots
Abstract: Miniature robots capable of untethered operation hold great promise for performing diagnostic and therapeutic procedures in hard-to-reach regions within the human body. Nonetheless, navigating these complex and diverse physiological environments remains a significant challenge. To effectively navigate the tortuous pathways inside the human body, it is essential to equip miniature robots with flexible body structures that can adapt to complex geometries and develop efficient actuation strategies for deformed robots. In this study, we present a miniature soft robot featuring a zigzag body structure, imparting the robot with remarkable deformation capabilities that enable it to adapt to confined and tortuous spaces. This robot is equipped with arrays of magnetic legs, enabling robust locomotion propelled by traveling metachronal waves. We demonstrate that the robot can crawl on both flat surfaces and slopes. Leveraging its in-plane flexibility and discrete actuation system, this robot can navigate through intricate environments with precise control using magnetic fields. Our work provides valuable insights into the development of crawling robots with enhanced agility and adaptability, creating opportunities for their future use in a wide range of biomedical applications.
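As a side note, a travelling (metachronal) wave of leg deflections can be described by a fixed phase lag between neighbouring legs; a small numpy sketch with assumed amplitude, frequency, and lag values follows.

```python
import numpy as np

def metachronal_leg_angles(t, n_legs=12, freq=2.0, amp=np.deg2rad(20), lag=np.pi/6):
    """Leg deflection pattern for a travelling (metachronal) wave.

    Each leg oscillates at the same frequency with a fixed phase lag to
    its neighbour, so the deflection crest travels along the body; in a
    magnetic soft robot such a phase pattern is imposed by the applied
    field acting on the pre-programmed magnetisation of the leg arrays.
    All parameter values here are illustrative assumptions.
    """
    phases = 2 * np.pi * freq * t - lag * np.arange(n_legs)
    return amp * np.sin(phases)

print(np.round(np.rad2deg(metachronal_leg_angles(t=0.1)), 1))
```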
|
|
15:45-16:00, Paper TuAT1.4 | |
DiMSam: Diffusion Models As Samplers for Task and Motion Planning under Partial Observability |
|
Fang, Xiaolin | MIT |
Garrett, Caelan | NVIDIA |
Eppner, Clemens | NVIDIA |
Lozano-Perez, Tomas | MIT |
Kaelbling, Leslie | MIT |
Fox, Dieter | University of Washington |
Keywords: Task and Motion Planning, Machine Learning for Robot Control
Abstract: Generative models, such as diffusion models, excel at capturing high-dimensional distributions with diverse input modalities, e.g., robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by composing diffusion models using a TAMP system. We use the learned components for constraints and samplers that are difficult to engineer in the planning model, and use a TAMP solver to search for the task plan with constraint-satisfying action parameter values. To tractably make predictions for unseen objects in the environment, we define the learned samplers and TAMP operators on learned latent embeddings of changing object states. We evaluate our approach in a simulated articulated object manipulation domain and show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning. We also apply the learned sampler in the real world.
|
|
TuAT2 |
Room 2 |
Best Cognitive Robotics Papers (KROS) |
Regular session |
Chair: Valada, Abhinav | University of Freiburg |
|
15:00-15:15, Paper TuAT2.1 | |
Evidential Semantic Mapping in Off-Road Environments with Uncertainty-Aware Bayesian Kernel Inference |
|
Kim, Junyoung | Agency for Defense Development |
Seo, Junwon | Carnegie Mellon University |
Min, Jihong | Agency for Defense Development |
Keywords: Mapping, Semantic Scene Understanding, Field Robots
Abstract: Robotic mapping with Bayesian Kernel Inference (BKI) has shown promise in creating semantic maps by effectively leveraging local spatial information. However, existing semantic mapping methods face challenges in constructing reliable maps in unstructured outdoor scenarios due to unreliable semantic predictions. To address this issue, we propose an evidential semantic mapping, which can enhance reliability in perceptually challenging off-road environments. We integrate Evidential Deep Learning into the semantic segmentation network to obtain the uncertainty estimate of semantic prediction. Subsequently, this semantic uncertainty is incorporated into an uncertainty-aware BKI, devised to prioritize confident semantic predictions when accumulating semantic information. By adaptively handling semantic uncertainties, the proposed framework constructs robust representations of the surroundings even in previously unseen environments. Comprehensive experiments across various off-road datasets demonstrate that our framework enhances accuracy and robustness, consistently outperforming existing methods in scenes with high perceptual uncertainties.
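For intuition, the snippet below shows how per-class evidence is typically converted into class probabilities and a vacuity (uncertainty) value in evidential deep learning; the exact network outputs and the uncertainty-aware BKI update of the paper are not reproduced.

```python
import numpy as np

def evidential_semantics(evidence):
    """Convert per-class evidence into class probabilities and vacuity.

    In subjective-logic style evidential deep learning, Dirichlet
    parameters are alpha = evidence + 1; the expected class probabilities
    are alpha / S and the vacuity u = K / S grows when total evidence is
    small, which is what an uncertainty-aware mapping step can use to
    down-weight unreliable semantic predictions.
    """
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0
    S = alpha.sum()
    probs = alpha / S
    vacuity = len(alpha) / S
    return probs, vacuity

print(evidential_semantics([9.0, 0.5, 0.5]))   # confident pixel, low vacuity
print(evidential_semantics([0.2, 0.1, 0.1]))   # ambiguous pixel, high vacuity
```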
|
|
15:15-15:30, Paper TuAT2.2 | |
Spike-Based High Energy Efficiency and Accuracy Tracker for Robot |
|
Qu, Jinye | Institute of Automation, Chinese Academy of Sciences |
Gao, Zeyu | Institute of Automation, Chinese Academy of Sciences |
Yi, Li | School of Information Engineering, Nanchang University |
Lu, Yanfeng | Institute of Automation, Chinese Academy of Sciences |
Qiao, Hong | Institute of Automation, Chinese Academy of Sciences |
Keywords: Visual Tracking, Computer Vision for Automation, Neurorobotics
Abstract: Spiking Neural Networks (SNNs) have gained attention for their apparent energy efficiency and significant biological interpretability, although they also face significant challenges such as prolonged latency and suboptimal tracking accuracy. Recent studies have explored the application of SNNs in object tracking tasks. Dynamic visual sensors (DVS) have become a popular way to implement SNN-based object tracking due to their asynchronous and spiking characteristics similar to SNNs. However, challenges such as the high cost of DVS cameras and the lack of object surface texture information hinder the utility and performance of DVS trackers. In contrast, RGB information has inherent advantages, including low acquisition cost and comprehensive object surface texture representation. However, RGB information is prone to excessive image blurring in low-light conditions or in fast-motion scenes. To address these challenges, we propose the "Motion Feature Extractor" and the "RGB-DVS Fusion Module". The "Motion Feature Extractor" can replace the DVS camera at a very low cost, and the "RGB-DVS Fusion Module" can deeply fuse the feature information of the two to make up for their respective deficiencies. In addition, we adopt a conversion method to obtain a lossless SNN version of the model. Through experiments, our model achieves a 13.6% improvement in the expected average overlap (EAO) index using only 1.47% of the energy consumption of SiamRPN (VOT2016 dataset). In addition, we deployed the model to a robot and then conducted tracking experiments, which confirmed that the model can operate on the robot losslessly with satisfactory results.
|
|
15:30-15:45, Paper TuAT2.3 | |
BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation |
|
Schramm, Jonas | University of Freiburg |
Vödisch, Niclas | University of Freiburg |
Petek, Kürsat | University of Freiburg |
Ravi, Kiran | Qualcomm |
Yogamani, Senthil | Valeo Vision Systems |
Burgard, Wolfram | University of Technology Nuremberg |
Valada, Abhinav | University of Freiburg |
Keywords: Sensor Fusion, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.
|
|
15:45-16:00, Paper TuAT2.4 | |
Multimodal Evolutionary Encoder for Continuous Vision-Language Navigation |
|
He, Zongtao | Tongji University |
Wang, Liuyi | Tongji University |
Chen, Lu | Tongji University |
Li, Shu | Tongji University |
Yan, Qingqing | Tongji University |
Liu, Chengju | Tongji University |
Chen, Qijun | Tongji University |
Keywords: Vision-Based Navigation, Multi-Modal Perception for HRI, Representation Learning
Abstract: Can multimodal encoder evolve when facing increasingly tough circumstances? Our work investigates this possibility in the context of continuous vision-language navigation (continuous VLN), which aims to navigate robots under linguistic supervision and visual feedback. We propose a multimodal evolutionary encoder (MEE) comprising a unified multimodal encoder architecture and an evolutionary pre-training strategy. The unified multimodal encoder unifies rich modalities, including depth and sub-instruction, to enhance the solid understanding of environments and tasks. It also effectively utilizes monocular observation, reducing the reliance on panoramic vision. The evolutionary pre-training strategy exposes the encoder to increasingly unfamiliar data domains and difficult objectives. The multi-stage adaption helps the encoder establish robust intra- and inter-modality connections and improve its generalization to unfamiliar environments. To achieve such evolution, we collect a large-scale multi-stage dataset with specialized objectives, addressing the absence of suitable continuous VLN pre-training. Evaluation on VLN-CE demonstrates the superiority of MEE over other direct action-predicting methods. Furthermore, we deploy MEE in real scenes using self-developed service robots, showcasing its effectiveness and potential for real-world applications. Our code and dataset are available at https://github.com/RavenKiller/MEE.
|
|
TuAT3 |
Room 3 |
Active Perception I |
Regular session |
Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
15:00-15:15, Paper TuAT3.1 | |
EFP: Efficient Frontier-Based Autonomous UAV Exploration Strategy for Unknown Environments |
|
Zhang, Hong | Harbin Institute of Technology |
Wang, SongYan | Harbin Institute of Technology |
Liu, Yuanshuai | Harbin Institute of Technology |
Ji, Pengtao | Harbin Institute of Technology |
Yu, Runzhuo | Harbin Institute of Technology |
Chao, Tao | Harbin Institute of Technology |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Autonomous Vehicle Navigation
Abstract: The optimization of quadrotors for the efficient and autonomous exploration of complex, unknown environments and the construction of complete corresponding maps is a high priority in unmanned aerial vehicle (UAV) research. To overcome the challenges of inefficient and incomplete map construction in autonomous UAV exploration, this study proposes EFP, an efficient frontier-based autonomous UAV exploration strategy for unknown environments. For this, the UFOMap algorithm was adopted to represent the entire environment and reduce the map construction time. Its accurate representation and hierarchical frontier structure were then employed to rapidly extract frontiers. Simultaneously, a fast Euclidean clustering approach was implemented to process the frontiers and obtain the relevant viewpoints, an approximate trajectory optimization strategy was used to rapidly obtain a preferred trajectory that traverses all the viewpoints, and finally the RRT-based global planner and sampling-based local planner algorithms were utilized to perform autonomous exploration with a drone. The proposed algorithm was analyzed and validated in both simulation and real-world scenarios, demonstrating higher efficiency than state-of-the-art approaches and enabling quadrotors to autonomously explore and construct complete maps in complex and unknown environments.
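A minimal occupancy-grid version of the frontier-detection step is sketched below with numpy (a frontier is a free cell adjacent to unknown space); EFP's UFOMap-based hierarchical extraction and clustering are omitted.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def detect_frontiers(grid):
    """Mark frontier cells: free cells with at least one unknown neighbour.

    A minimal 2D occupancy-grid version of frontier extraction; wrap-around
    at the grid borders from np.roll is ignored for brevity.
    """
    free = grid == FREE
    unknown_near = np.zeros_like(free)
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        unknown_near |= np.roll(grid == UNKNOWN, shift=(dr, dc), axis=(0, 1))
    return free & unknown_near

# Hypothetical 6x6 grid: a small explored free region inside unknown space.
grid = np.full((6, 6), UNKNOWN)
grid[2:5, 1:4] = FREE
grid[3, 3] = OCCUPIED
print(np.argwhere(detect_frontiers(grid)))   # frontier cell indices
```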
|
|
15:15-15:30, Paper TuAT3.2 | |
Semantics-Aware Receding Horizon Planner for Object-Centric Active Mapping |
|
Lu, Liang | University of Hong Kong |
Zhang, Yinqiang | The University of Hong Kong |
Zhou, Peng | The University of Hong Kong |
Qi, Jiaming | Centre for Transformative Garment Production, HongKong |
Pan, Yipeng | The University of Hong Kong |
Fu, Changhong | Tongji University |
Pan, Jia | University of Hong Kong |
Keywords: Reactive and Sensor-Based Planning, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: The escalating demands for real-time scene comprehension in modern industries underscore the growing significance of semantic information in the daily tasks of robots, particularly in areas like autonomous inspection and target searching. This letter introduces a semantics-aware receding horizon planner (SARHP) for efficiently building the object-centric volumetric map. It includes a multi-layer mapping strategy and a semantics-aware frontier detection and planning method. With the multi-layer map, the semantics-aware frontier detection is conducted in the local layer, and the route assessment is conducted in the Field-of-View layer, which can reduce the time cost of the planning stage. Moreover, kinematic cost, geometric cost, and semantic cost are considered in the planner to ensure high search performance for semantic objects without affecting the overall mapping efficiency. The effectiveness of the proposed mapping and planning algorithm is validated in simulation and real-world experiments.
|
|
15:30-15:45, Paper TuAT3.3 | |
View Planning for Grape Harvesting Based on Active Vision Strategy under Occlusion |
|
Yi, Tao | Xiangtan University |
Zhang, Dongbo | Xiangtan University |
Luo, Lufeng | Foshan University |
Luo, Jiangtao | Xiangtan University |
Keywords: Robotics and Automation in Agriculture and Forestry, Motion and Path Planning, Vision-Based Navigation
Abstract: Replacing humans with robots for fruit harvesting is the trend of agricultural automation in the future. However, for grape harvesting robots, locating the picking point becomes a significant challenge in highly occluded environments due to the small fruit stem, which can be entirely obscured by fruit leaves when the observation angle is poor. In this work, a view planner based on an active vision strategy is proposed to address the occlusion problem. It aims to find the picking point by altering the observation perspective of the harvesting robot. The view planning process is achieved through multiple iterations. Each iteration consists of three key steps: randomly generating candidate views, predicting the ideal perspective using a score function, and guiding the robotic arm to change the viewpoint. To evaluate the degree of occlusion, a novel concept of the Spatial Coverage Rate Metric (SC) is introduced. Based on this, the score function is improved by incorporating SC and motion cost. Finally, to validate the effectiveness of the planner, we conducted comparative experiments with other advanced view planners on a real grape harvesting robot. The experimental results demonstrate that our method achieves a higher picking success rate with lower computation time.
|
|
15:45-16:00, Paper TuAT3.4 | |
Deep Reinforcement Learning-Based Large-Scale Robot Exploration |
|
Cao, Yuhong | National University of Singapore |
Zhao, Rui | BYD Auto Industry Corporation LTD |
Wang, Yizhuo | National University of Singapore |
Xiang, Bairan | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: View Planning for SLAM, Reinforcement Learning, Motion and Path Planning
Abstract: In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale Lidar-based autonomous robot exploration problems in 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this end, our approach relies on learned attention mechanisms for their powerful ability to capture long-term dependencies at different spatial scales to reason about the robot's entire belief over known areas. Our approach relies on ground truth information (i.e., privileged learning) to guide the environment estimation during training, as well as on a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones. Simulation results show that our model exhibits better exploration efficiency (12% in path length, 6% in makespan) and lower planning time (60%) than the state-of-the-art planners in a 130m*100m benchmark scenario. We also validate our learned model on hardware.
|
|
TuAT4 |
Room 4 |
Compliance and Impedance Control |
Regular session |
Co-Chair: Lippiello, Vincenzo | University of Naples FEDERICO II |
|
15:00-15:15, Paper TuAT4.1 | |
Simple-Rotation Angle/Axis Representations Based Second-Order Impedance Control |
|
Gong, Chenwei | Xi'an Jiaotong University |
Zhao, Fei | Xi'an Jiaotong University |
Liao, Zhiwei | Xi'an Jiaotong University |
Tao, Tao | Xi'an Jiaotong University |
Wang, Xiao | Xi'an Jiaotong University |
Mei, Xuesong | Xi'an Jiaotong University |
Keywords: Compliance and Impedance Control, Robust/Adaptive Control, Dynamics
Abstract: Because classical impedance control uses the difference in angular velocity as the derivative of the orientation error, the desired dynamics no longer take the form of a second-order differential equation (SODE), and the resulting non-linearity limits its applications. To address this problem, this article uses simple-rotation angle/axis representations (SRAAR), as well as their derivatives, to describe the end-effector’s orientation displacement and its derivatives in impedance control. As a result, an SRAAR-based second-order impedance control, whose dynamic relationship has the form of SODE, is proposed. Furthermore, as a direct application of the proposed SRAAR-based second-order impedance control, an adaptive control method is also proposed to deal with the problem of uncertain dynamic parameters so that the desired dynamic relationship can be accurately realized. A simulation is carried out to show the difference between the classical impedance control and the proposed impedance control. Experiments on the Franka Emika Panda have been conducted, and the results validate the effectiveness of the proposed adaptive control, which also verifies the correctness of the proposed SRAAR-based second-order impedance control.
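To make the SODE form concrete, the sketch below extracts a simple-rotation angle/axis error from rotation matrices and evaluates a second-order impedance relation for it; the sign convention, gain matrices, and example values are placeholder assumptions rather than the paper's formulation:

```python
import numpy as np

def angle_axis_error(R_d, R):
    """Orientation displacement as a simple-rotation angle/axis vector
    (the rotation taking the current frame onto the desired one)."""
    R_err = R_d @ R.T
    cos_a = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_a)
    if np.isclose(angle, 0.0):
        return np.zeros(3)
    axis = np.array([R_err[2, 1] - R_err[1, 2],
                     R_err[0, 2] - R_err[2, 0],
                     R_err[1, 0] - R_err[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis

def impedance_accel(e, e_dot, f_ext, M, D, K):
    """Desired second-order relation  M e'' + D e' + K e = f_ext,
    solved for the error acceleration e''."""
    return np.linalg.solve(M, f_ext - D @ e_dot - K @ e)

# Small rotation about z as an example, with diagonal gains
a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
e = angle_axis_error(np.eye(3), R)
acc = impedance_accel(e, np.zeros(3), np.zeros(3),
                      M=np.eye(3), D=4.0 * np.eye(3), K=25.0 * np.eye(3))
```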
|
|
15:15-15:30, Paper TuAT4.2 | |
Robust Elastic Structure Preserving Control for High Impedance Rendering of Series Elastic Actuator |
|
Lee, Hyunwook | Gyeongsang National University |
Lee, Jinoh | German Aerospace Center (DLR) |
Keppler, Manuel | German Aerospace Center (DLR) |
Oh, Sehoon | DGIST |
Keywords: Compliance and Impedance Control, Flexible Robotics, Robust/Adaptive Control
Abstract: In this paper, a new robust approach is proposed to address the limitation of impedance rendering for Series Elastic Actuators (SEA). The concept of Elastic Structure Preserving (ESP) control allows for the attachment of desired load-side dynamics to the SEA while maintaining a passivity condition, regardless of the parameters for the attached dynamics. The characteristics of ESP control are revisited and translated in the frequency domain, which grants a new perspective to identify its advantages compared to conventional impedance control in terms of passivity. Additionally, we analyze the degradation of performance due to unwanted disturbance and uncertainties in spring stiffness and motor inertia, and a new form of the robust ESP method is proposed by endowing disturbance rejection capability and robustness against uncertainty.
|
|
15:30-15:45, Paper TuAT4.3 | |
Contact-Rich SE(3) Equivariant Robot Manipulation Task Learning Via Geometric Impedance Control |
|
Seo, Joohwan | University of California, Berkeley |
Potu Surya Prakash, Nikhil | University of California, Berkeley |
Zhang, Xiang | University of California, Berkeley |
Wang, Changhao | University of California, Berkeley |
Choi, Jongeun | Yonsei University |
Tomizuka, Masayoshi | University of California |
Horowitz, Roberto | Berkeley |
Keywords: Machine Learning for Robot Control, Compliance and Impedance Control, Learning from Demonstration
Abstract: This paper presents a differential geometric control approach that leverages SE(3) group invariance and equivariance to increase transferability in learning robot manipulation tasks that involve interaction with the environment. Specifically, we employ a control law and a learning representation framework that remain invariant under arbitrary SE(3) transformations of the manipulation task definition. Furthermore, the control law and learning representation framework are shown to be SE(3) equivariant when represented relative to the spatial frame. The proposed approach is based on utilizing a recently presented geometric impedance control (GIC) combined with a learning variable impedance control framework, where the gain scheduling policy is trained in a supervised learning fashion from expert demonstrations. A geometrically consistent error vector (GCEV) is fed to a neural network to achieve a gain scheduling policy that remains invariant to arbitrary translation and rotations. A comparison of our proposed control and learning framework with a well-known Cartesian space learning impedance control, equipped with a Cartesian error vector-based gain scheduling policy, confirms the significantly superior learning transferability of our proposed approach. A hardware implementation on a peg-in-hole task is conducted to validate the learning transferability and feasibility of the proposed approach. The simulation and hardware experiment video is posted: https://sites.google.com/berkeley.edu/equivariant-task-learning/home
|
|
15:45-16:00, Paper TuAT4.4 | |
Lie Group-Based User Motion Refinement Control for Teleoperation of a Constrained Robot Arm |
|
Kim, Jonghyeok | POSTECH |
Lee, Donghyeon | Pohang University of Science and Technology(POSTECH) |
Choi, Youngjin | Hanyang University |
Chung, Wan Kyun | POSTECH |
Keywords: Motion Control, Compliance and Impedance Control, Telerobotics and Teleoperation
Abstract: In unilateral teleoperation systems, robots often face challenges when performing tasks with specific geometric constraints. These constraints restrict the robot's movements to certain directions, requiring accurate control of its position and orientation. If the operator's commands do not consider these constraints, excessive contact force may occur, potentially damaging the robot and its environment. Such scenarios can also trigger frequent emergency stops, even with conventional admittance control. To mitigate these issues, we propose a new teleoperation framework tailored for handling geometric constraints. This framework comprises two main components: (1) Geometric Constraint Identification: We use a straightforward line regression method based on Lie group theory to identify geometric constraints. (2) Motion Command Reshaping: The operator's motion commands are safely recalculated using a projection filter coupled with a Lie group setpoint controller. This approach ensures that the robot's movements strictly conform to the identified geometric constraints. As a result, this approach significantly reduces the interaction forces and prevents the risk of severe failures or accidents.
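A simplified illustration of the two components named in the abstract, with an ordinary least-squares line fit standing in for the Lie-group line regression and a plain vector projection standing in for the projection filter; both are illustrative assumptions only:

```python
import numpy as np

def fit_line_direction(points):
    """Principal axis of the centred end-effector positions: a simple
    stand-in for the paper's Lie-group line regression."""
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])

def project_command(v_cmd, direction):
    """Projection filter: keep only the velocity component along the
    identified constraint direction, removing motion that would push
    against the constraint."""
    d = direction / np.linalg.norm(direction)
    return np.dot(v_cmd, d) * d

# Operator commands a diagonal motion; constraint is (roughly) a drawer slide along x
d = fit_line_direction([[0.0, 0.0, 0.0], [0.1, 0.002, 0.0], [0.2, -0.001, 0.0]])
v_safe = project_command(np.array([0.07, 0.07, 0.0]), d)
```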
|
|
TuAT5 |
Room 5 |
Additive Manufacturing |
Regular session |
Chair: El-Khasawneh, Bashar | Khalifa University |
|
15:00-15:15, Paper TuAT5.1 | |
Assessment of a Flow-Measurement Technique for the Printability of Extrusion-Based Bioprinting |
|
Tseng, Wei-Chih | National Central University |
Liao, Chao-Yaug | National Central University |
Chassagne, Luc | University of Versailles |
Cagneau, Barthélemy | Université De Versailles Saint-Quentin En Yvelines |
Keywords: Additive Manufacturing, Product Design, Development and Prototyping
Abstract: Extrusion-based 3D bioprinting is a common application in tissue engineering used to fabricate porous and complex scaffolds for tissue regeneration and repair. Flow-rate measurement can improve printability and reduce the trial-and-error steps needed to optimize printing parameters. This study develops a novel system for flow-rate measurement of chitosan solution based on particle image velocimetry (PIV). The results obtained using PIV are compared with the results obtained using an existing predictive model. In addition, results are double-checked using precision scales to demonstrate the robustness of the PIV method. The experimental results fit the exponential function of the mathematical model (adjusted R-squared = 0.999), and the measured errors of PIV compared to those of the precision scale were within 3%.
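For readers unfamiliar with the adjusted R-squared reported above, the sketch below fits an assumed exponential pressure-to-flow model to hypothetical PIV measurements and computes the statistic; the model form, units, and numbers are illustrative only:

```python
import numpy as np
from scipy.optimize import curve_fit

def flow_model(p, a, b):
    """Assumed exponential relation between extrusion pressure and flow
    rate; the paper's actual model may differ."""
    return a * np.exp(b * p)

# Hypothetical PIV measurements: pressure [kPa] vs. flow rate [uL/s]
pressure = np.array([40.0, 60.0, 80.0, 100.0, 120.0])
flow = np.array([0.8, 1.6, 3.1, 6.3, 12.4])

params, _ = curve_fit(flow_model, pressure, flow, p0=(0.1, 0.03))
pred = flow_model(pressure, *params)

ss_res = np.sum((flow - pred) ** 2)
ss_tot = np.sum((flow - flow.mean()) ** 2)
n, k = len(flow), len(params)
r2 = 1.0 - ss_res / ss_tot
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```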
|
|
15:15-15:30, Paper TuAT5.2 | |
Soft Printable Robots with Flexible Metal Endoskeleton (I) |
|
Chen, Chao-Yu | National University of Singapore |
Ang, Benjamin, Wee Keong | National University of Singapore |
Li, Yangfan | Institute of High Performance Computing, A*Star |
Liu, Jun | Institute of High Performance Computing |
Liu, ZhuangJian | Institute of High Performance Computing |
Yeow, Chen-Hua | National University of Singapore |
Keywords: Additive Manufacturing, Soft Robot Materials and Design, Soft Sensors and Actuators, Variable Stiffness
Abstract: Recent advancements in soft robotics have seen the rapid development of soft grippers for industrial pick-and-place applications. They are however ill-suited to bear heavy loads due to their compliant nature. Paradoxically, researchers have sought to increase the stiffness of soft grippers to improve load-bearing capabilities. Unfortunately, contemporary soft actuators with variable stiffness are fabricated using manual processes and their performance is subject to an individual’s mastery. They are hence not reliable for long-term industrial use. In this paper, we present our work on a 3D-printed metal-endoskeleton-reinforced actuator (MERA) for industrial pick-and-place applications. We also highlight the fabrication processes needed to recreate it repetitively. Using stainless steel splints (SSS), we demonstrate that MERA is able to modulate its stiffness at selective junctures for stable and effective grasping. We also describe our design rationale with a qualitative mathematical model and validate its performance quantitatively using a finite element model, which is further investigated in the following fatigue test. In our experiments, the MERA equipped with SSS is able to output a peak tip force of 8N, which is a 291% increase compared to the one without metallic reinforcement. In addition, an increase of 76.5% in gripping load and a maximum holding force per actuator of 13.8N are realized through the stiffness tuning of a MERA-Gripper. Despite significantly improving load-bearing capabilities, the actuator manages to retain an overall low profile with a weight of 82g. Finally, we adapted the MERA into a reconfigurable gripper and tested its grasping capabilities on objects of various shapes, sizes, and weights.
|
|
15:30-15:45, Paper TuAT5.3 | |
Optimal Design of Linkage-Driven Underactuated Hand for Precise Pinching and Powerful Grasping |
|
Meng, Hailiang | Zhejiang University of Technology |
Yang, Kaiyu | Zhejiang University of Technology |
Zhou, Lingxuan | Zhejiang University of Technology |
Shi, Yixiao | Zhejiang University of Technology |
Cai, Shibo | Zhejiang University of Technology |
Bao, Guanjun | Zhejiang University of Technology, China |
Keywords: Product Design, Development and Prototyping, Grasping, Mechanism Design
Abstract: Pinching and grasping are the key fundamental actions for manipulators to handle objects, and achieving high-quality execution of these actions has long attracted intense attention in the robotics field. In this paper, a novel underactuated manipulator with a composite multiple-linkage mechanism is proposed, which enables both precise pinching and powerful grasping functions. To eliminate trajectory errors of linear pinching, the linkage mechanism is optimized by a genetic algorithm based on modelling and kinematic analysis of the finger. The optimized configuration results in a vertical displacement error of less than 0.3 mm and a linear deviation of less than 1%. Furthermore, the envelope linkage mechanism is optimized by taking the uniformity of force distribution as a performance indicator. Additionally, a rotary mechanism is installed at the proximal interphalangeal joint of the finger to enable the manipulator to switch between different grasping modes for various types of objects. Finally, experiments on pinching small and thin objects and grasping objects in a pose-varied mode demonstrate that the developed manipulator is capable of both precise pinching and powerful grasping. This work offers a promising solution for robotic manipulators to perform multi-type object grasping tasks.
|
|
15:45-16:00, Paper TuAT5.4 | |
SPONGE: Open-Source Designs of Modular Articulated Soft Robots |
|
Habich, Tim-Lukas | Leibniz University Hannover |
Haack, Jonas | University of Bremen |
Belhadj, Mehdi | Leibniz University Hannover |
Lehmann, Dustin | TU Berlin |
Seel, Thomas | Leibniz Universität Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Hydraulic/Pneumatic Actuators
Abstract: Soft-robot designs are manifold, but only a few are publicly available. Often, these are only briefly described in their publications. This complicates reproduction and hinders the reproducibility and comparability of research results. If the designs were uniform and open source, validating researched methods on real benchmark systems would be possible. To address this, we present two variants of a soft pneumatic robot with antagonistic bellows as open source. Starting from a semi-modular design with multiple cables and tubes routed through the robot body, the transition to a modular robot with integrated microvalves and serial communication is highlighted. Modularity in terms of stackability, actuation, and communication is achieved, which is the crucial requirement for building soft robots with many degrees of freedom and high dexterity for real-world tasks. Both systems are compared regarding their respective advantages and disadvantages. The robots' functionality is demonstrated in experiments on airtightness, position control with mean tracking errors of <3 deg, and long-term operation of cast and printed bellows. All soft- and hardware files required for reproduction are provided.
|
|
TuAT6 |
Room 6 |
Tendon-Driven Robots |
Regular session |
Co-Chair: Stefanini, Cesare | Scuola Superiore Sant'Anna |
|
15:00-15:15, Paper TuAT6.1 | |
Design Optimization of Wire Arrangement with Variable Relay Points in Numerical Simulation for Tendon-Driven Robots |
|
Kawaharazuka, Kento | The University of Tokyo |
Yoshimura, Shunnosuke | The University of Tokyo |
Suzuki, Temma | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Tendon/Wire Mechanism, Redundant Robots, Biomimetics
Abstract: One of the most important features of tendon-driven robots is the ease of wire arrangement and the degree of freedom it affords, enabling the construction of a body that satisfies the desired characteristics by modifying the wire arrangement. Various wire arrangement optimization methods have been proposed, but they have simplified the configuration by assuming that the moment arms of wires to joints are constant, or by disregarding wire arrangements that span multiple joints and include relay points. In this study, we formulate a more flexible wire arrangement optimization problem in which each wire is represented by a start point, multiple relay points, and an end point, and achieve the desired physical performance based on black-box optimization. We consider a multi-objective optimization which simultaneously takes into account both the feasible operational force space and velocity space, and discuss the optimization results obtained from various configurations.
|
|
15:15-15:30, Paper TuAT6.2 | |
A Tendon-Driven Continuum Manipulator with Robust Shape Estimation by Multiple IMUs |
|
Peng, Rui | The University of Hong Kong |
Wang, Yu | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Tendon/Wire Mechanism, Modeling, Control, and Learning for Soft Robots
Abstract: In this letter, a tendon-driven continuum robotic manipulator with three individual continuum sections is developed and manufactured. The main contribution is that we propose a robust and accurate shape estimation method based on the fusion of multiple IMUs for the manipulator, under the PCC (Piecewise Constant Curvature) assumption. To intuitively present the robot’s configuration space, we develop a visualization environment to showcase the real-time continuum shape. To validate the proposed system with the estimation method, we evaluate fundamental attributes such as the bending range, tip velocity, effective workspace, and durability. Furthermore, we conduct motion experiments of shape deformation, dynamic tracking, and disturbance resistance. The results show that the proposed estimation method achieves less than 20 mm RMSE on tip motion during consecutive 3D motions. Meanwhile, we compare the proposed system with previous continuum robotic systems in terms of mechanism properties. Our proposed robotic system has a more compact and efficient structure.
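A small sketch of the PCC kinematics underlying such shape estimation: each section is described by a bending-plane angle, a bending angle, and an arc length, and the section transforms are chained to the tip. The mapping from IMU readings to these parameters and the section lengths used below are assumptions:

```python
import numpy as np

def pcc_transform(phi, theta, length):
    """Homogeneous transform of one constant-curvature section.
    phi: bending-plane angle, theta: total bending angle, length: arc length.
    Under PCC, (phi, theta) can be read from the tip IMU orientation
    relative to the section base."""
    def rz(a):
        return np.array([[np.cos(a), -np.sin(a), 0.0],
                         [np.sin(a),  np.cos(a), 0.0],
                         [0.0, 0.0, 1.0]])
    def ry(a):
        return np.array([[ np.cos(a), 0.0, np.sin(a)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(a), 0.0, np.cos(a)]])
    R = rz(phi) @ ry(theta) @ rz(-phi)
    if np.isclose(theta, 0.0):
        p = np.array([0.0, 0.0, length])          # straight section
    else:
        r = length / theta                         # radius of curvature
        p = np.array([r * (1 - np.cos(theta)) * np.cos(phi),
                      r * (1 - np.cos(theta)) * np.sin(phi),
                      r * np.sin(theta)])
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T

# Chain three sections (0.12 m each) to get the estimated tip pose
T_tip = (pcc_transform(0.0, 0.6, 0.12)
         @ pcc_transform(1.2, 0.4, 0.12)
         @ pcc_transform(-0.5, 0.3, 0.12))
```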
|
|
15:30-15:45, Paper TuAT6.3 | |
SAQIEL: Ultra-Light and Safe Manipulator with Passive 3D Wire Alignment Mechanism |
|
Suzuki, Temma | The University of Tokyo |
Bando, Masahiro | The University of Tokyo |
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Tendon/Wire Mechanism, Robot Safety, Redundant Robots
Abstract: Improving the safety of collaborative manipulators necessitates the reduction of inertia in the moving part. Within this paper, we introduce a novel approach in the form of a passive 3D wire aligner, serving as a lightweight and low-friction power transmission mechanism, thus achieving the desired low inertia in the manipulator's operation. Through the utilization of this innovation, the consolidation of hefty actuators onto the root link becomes feasible, consequently enabling a supple drive characterized by minimal friction. To demonstrate the efficacy of this device, we fabricate an ultralight 7 degrees of freedom (DoF) manipulator named SAQIEL, boasting a mere 1.5 kg weight for its moving components. Notably, to mitigate friction within SAQIEL's actuation system, we employ a distinctive mechanism that directly winds wires using motors, obviating the need for traditional gear or belt-based speed reduction mechanisms. Through a series of empirical trials, we substantiate that SAQIEL adeptly strikes a balance between lightweight design, substantial payload capacity, elevated velocity, precision, and adaptability.
|
|
15:45-16:00, Paper TuAT6.4 | |
A Novel Friction Measuring Method and Its Application to Improve the Static Modeling Accuracy of Cable-Driven Continuum Manipulators |
|
Dai, Yicheng | Harbin Institute of Technology (Shenzhen) |
Wang, Sheng | Harbin Institute of Technology |
Wang, Xin | Harbin Institute of Technology, Shenzhen |
Yuan, Han | Harbin Institute of Technology |
Keywords: Tendon/Wire Mechanism, Flexible Robotics
Abstract: Cable-driven continuum manipulators exhibit high flexibility and dexterity, leading to their increased popularity in recent years. Friction analysis is a crucial problem for these manipulators. Previous research has introduced friction models that are applicable to dynamic states where the direction of friction can be ascertained. However, in static states, the direction of internal friction remains undetermined. Additionally, previous studies have investigated the friction law within a single cable hole. However, as multiple cable holes exist along the manipulator, friction should be considered as a series of forces and examined across multiple cable holes. Measuring internal friction presents a challenge due to the unique structure of the manipulator. To our knowledge, no state-of-the-art research has studied how to measure the friction along the entire manipulator. In this paper, we propose a novel friction measuring method based on fiber Bragg grating (FBG) sensors. Experimental results show that the friction distribution can be fully measured. We apply this method to a static model and shape estimation experiments demonstrate that the accuracy of the static model is significantly improved, particularly when the friction has inconsistent directions. Our proposed friction measuring method provides a valuable approach for mechanics analysis of cable-driven continuum manipulators.
|
|
TuAT7 |
Room 7 |
Human-Robot Collaboration |
Regular session |
Co-Chair: Hussain, Irfan | Khalifa University |
|
15:00-15:15, Paper TuAT7.1 | |
Human-Robot Collaboration through a Multi-Scale Graph Convolution Neural Network with Temporal Attention |
|
Liu, Zhaowei | School of Computer and Control Engineering, Yantai University |
Lu, Xilang | Yantai University |
Liu, Wenzhe | Yantai University |
Qi, Wen | Politecnico Di Milano |
Su, Hang | Politecnico Di Milano |
Keywords: Human-Robot Collaboration, Intention Recognition, Data Sets for Robotic Vision
Abstract: Collaborative robots sensing and understanding the movements and intentions of their human partners are crucial for realizing human-robot collaboration. Human skeleton sequences are widely recognized as a kind of data with great application potential in human action recognition. In this paper, a multi-scale skeleton-based human action recognition network is proposed, which leverages a spatio-temporal attention mechanism. The network achieves high-accuracy human action prediction by aggregating multi-level key point features of the skeleton and applying the spatio-temporal attention mechanism to extract key temporal information features. In addition, a human action skeleton dataset containing eight different categories is collected for a human-robot collaboration task, where the human activity recognition network predicts skeleton sequences from a camera and the collaborating robot makes collaborative actions based on the predicted actions. In this study, the performance of the proposed method is compared with state-of-the-art human action recognition methods and ablation experiments are performed. The results show that the multi-scale spatio-temporal graph convolutional neural network has an action recognition accuracy of 94.16%. The effectiveness of the method is also verified by performing human-robot collaboration experiments on a real robot platform in a laboratory environment.
|
|
15:15-15:30, Paper TuAT7.2 | |
Safety Compliant, Ergonomic and Time-Optimal Trajectory Planning for Collaborative Robotics (I) |
|
Proia, Silvia | Università Di Modena E Reggio Emilia |
Cavone, Graziana | University Roma Tre |
Scarabaggio, Paolo | Politecnico Di Bari |
Carli, Raffaele | Politecnico Di Bari |
Dotoli, Mariagrazia | Politecnico Di Bari |
Keywords: Human-Robot Collaboration, Safety in HRI, Optimization and Optimal Control
Abstract: The demand for safe and ergonomic workplaces is rapidly growing in modern industrial scenarios, especially for companies that intensely rely on Human-Robot Collaboration (HRC). This work focuses on optimizing the trajectory of the end-effector of a cobot arm in a collaborative industrial environment, ensuring the maximization of the operator’s safety and ergonomics without sacrificing production efficiency requirements. Hence, a multi-objective optimization strategy for trajectory planning in a safe and ergonomic HRC is defined. This approach aims at finding the best trade-off between the total traversal time of the cobot’s end-effector trajectory and ergonomics for the human worker, while respecting, within the kinematic constraints of the optimization problem, the ISO safety requirements through the well-known Speed and Separation Monitoring (SSM) methodology. Guaranteeing an ergonomic HRC means reducing musculoskeletal disorders linked to risky and highly repetitive activities. The three main phases of the proposed technique are described as follows. First, a manikin designed using dedicated software is employed to evaluate the Rapid Upper Limb Assessment (RULA) ergonomic index in the working area. Next, a second-order cone programming problem is defined to represent a time-optimal safety compliant trajectory planning problem. Finally, the trajectory that ensures the best compromise between these two opposing goals (minimizing the task’s traversal time and maintaining a high level of ergonomics for the human worker) is computed by defining and solving a multi-objective control problem. The method is tested on an experimental case study in reference to an assembly task and the obtained results are discussed, showing the effectiveness of the proposed approach.
|
|
15:30-15:45, Paper TuAT7.3 | |
Effects of Shared Control on Cognitive Load and Trust in Teleoperated Trajectory Tracking |
|
Pan, Jiahe | ETH Zurich |
Eden, Jonathan | University of Melbourne |
Oetomo, Denny | The University of Melbourne |
Johal, Wafa | University of Melbourne |
Keywords: Human-Robot Collaboration, Acceptability and Trust, Telerobotics and Teleoperation
Abstract: Teleoperation is increasingly recognized as a viable solution for deploying robots in hazardous environments. Controlling a robot to perform a complex or demanding task may overload operators resulting in poor performance. To design a robot controller to assist the human in executing such challenging tasks, a comprehensive understanding of the interplay between the robot's autonomous behavior and the operator's internal state is essential. In this paper, we investigate the relationships between robot autonomy and both the human user's cognitive load and trust levels, and the potential existence of three-way interactions in the robot-assisted execution of the task. Our user study (N=24) results indicate that while autonomy level influences the teleoperator's perceived cognitive load and trust, there is no clear interaction between these factors. Instead, these elements appear to operate independently, thus highlighting the need to consider both cognitive load and trust as distinct but interrelated factors in varying the robot autonomy level in shared-control settings. This insight is crucial for the development of more effective and adaptable assistive robotic systems.
|
|
15:45-16:00, Paper TuAT7.4 | |
Reconciling Conflicting Intents: Bidirectional Trust-Based Variable Autonomy for Mobile Robots |
|
Li, Yinglin | Northwestern Polytechnical University |
Cui, Rongxin | Northwestern Polytechnical University |
Yan, Weisheng | Northwestern Polytechnical University |
Zhang, Shi | Northwestern Polytechnical University |
Yang, Chenguang | University of Liverpool |
Keywords: Human-Robot Collaboration, Telerobotics and Teleoperation, Human Factors and Human-in-the-Loop
Abstract: In the realm of semi-autonomous mobile robots designed for remote operation with humans, current variable autonomy approaches struggle to reconcile conflicting intents while ensuring compliance, autonomy, and safety. To address this challenge, we propose a bidirectional trust-based variable autonomy (BTVA) control approach. By incorporating diverse trust factors and leveraging Kalman filtering techniques, we establish a core abstraction layer to construct the state-space model of bidirectional computational trust. This bidirectional trust is integrated into the variable autonomy control loop. Real-time modulation of the degree of automation is achieved through variable weight receding horizon optimization. Through a within-group experimental study with twenty participants in a semi-autonomous navigation task, we validate the effectiveness of our method in goal transfer and assisted teleoperation. Statistical analysis reveals that our method achieves a balance between rapid response and trajectory smoothness. Compared with binary control switching, this method reduces operator workload by 14.3% and enhances system usability by 9.9%.
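As a loose illustration of the Kalman-filtering idea mentioned in the abstract (not the paper's bidirectional state-space model), a scalar filter tracking a trust level from noisy per-interval observations might look as follows; all matrices, noise levels, and the observation definition are placeholders:

```python
import numpy as np

class TrustFilter:
    """Scalar linear-Gaussian Kalman filter over a trust level in [0, 1],
    updated from noisy per-interval observations (e.g. intervention
    frequency mapped to a trust score)."""
    def __init__(self, a=0.98, q=1e-3, r=5e-2, x0=0.5, p0=0.1):
        self.a, self.q, self.r = a, q, r   # transition, process noise, obs noise
        self.x, self.p = x0, p0            # state estimate and its variance

    def step(self, z):
        # Predict: trust slowly decays without fresh evidence
        x_pred = self.a * self.x
        p_pred = self.a * self.p * self.a + self.q
        # Update with the new trust observation z
        k = p_pred / (p_pred + self.r)
        self.x = x_pred + k * (z - x_pred)
        self.p = (1.0 - k) * p_pred
        return self.x

tf = TrustFilter()
for z in [0.6, 0.7, 0.4, 0.8]:
    level = tf.step(z)   # modulates the degree of automation downstream
```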
|
|
TuAT8 |
Room 8 |
Autonomous Vehicle Navigation I |
Regular session |
Chair: Kheddar, Abderrahmane | CNRS-AIST |
Co-Chair: Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
|
15:00-15:15, Paper TuAT8.1 | |
Learning Vehicle Dynamics from Cropped Image Patches for Robot Navigation in Unpaved Outdoor Terrains |
|
Lee, Jeong Hyun | Korea Advanced Institute of Science & Technology (KAIST) |
Choi, Jinhyeok | Korea Advanced Institute of Science and Technology |
Ryu, Simo | Korea Advanced Institute of Science & Technology |
Oh, Hyunsik | Korea Advanced Institute of Science and Technology |
Choi, Suyoung | KAIST |
Hwangbo, Jemin | Korean Advanced Institute of Science and Technology |
Keywords: Autonomous Vehicle Navigation, Deep Learning Methods
Abstract: In the realm of autonomous mobile robots, safe navigation through unpaved outdoor environments remains a challenging task. Due to the high-dimensional nature of sensor data, extracting relevant information becomes a complex problem, which hinders adequate perception and path planning. Previous works have shown promising performances in extracting global features from full-sized images. However, they often face challenges in capturing essential local information. In this paper, we propose Crop-LSTM, which iteratively takes cropped image patches around the current robot’s position and predicts the future position, orientation, and bumpiness. Our method performs local feature extraction by paying attention to corresponding image patches along the predicted robot trajectory in the 2D image plane. This enables more accurate predictions of the robot’s future trajectory. With our wheeled mobile robot platform Raicart, we demonstrated the effectiveness of Crop-LSTM for point-goal navigation in an unpaved outdoor environment. Our method enabled safe and robust navigation using RGBD images in challenging unpaved outdoor terrains. The summary video is available at https://youtu.be/iIGNZ8ignk0.
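A minimal sketch of the patch-cropping step the method is built around, assuming the predicted trajectory has already been projected to pixel coordinates; the patch size and zero-padding behaviour are illustrative choices, not the Crop-LSTM implementation:

```python
import numpy as np

def crop_patch(image, center_xy, size=64):
    """Crop a (size x size) patch around a pixel coordinate, padding with
    zeros at the image border so every patch has the same shape."""
    h, w = image.shape[:2]
    half = size // 2
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    patch = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    patch[(y0 - cy + half):(y1 - cy + half),
          (x0 - cx + half):(x1 - cx + half)] = image[y0:y1, x0:x1]
    return patch

# Patches along a predicted trajectory projected into the image plane
image = np.zeros((480, 640, 3), dtype=np.uint8)
trajectory_px = [(320, 400), (330, 360), (345, 320)]
patches = np.stack([crop_patch(image, p) for p in trajectory_px])
```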
|
|
15:15-15:30, Paper TuAT8.2 | |
Planning Impact-Driven Logistic Tasks |
|
Zermane, Ahmed | CNRS-LIRMM |
Dehio, Niels | KUKA |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Logistics, Process Control, Industrial Robots
Abstract: This letter proposes a decoupled two-level model-based planning strategy and control for a class of robotic tasks involving impact to be made with desired performance and constraints. The first part solves the problem of planning a robotic arm trajectory from a given state (position and velocity) to another desired one, enabling non-stop trajectory cycles. The second part is an impact-aware model-based plugin; it is specific to each task and links the desired task-space impact objective to a via-point in the joint-space. The two parts are then combined to achieve the entire task. Our approach is assessed with real-robot experiments demonstrating how this strategy can be used to perform tossing, grabbing, and boxing or any combination of them in sorting logistics-industry use-cases.
|
|
15:30-15:45, Paper TuAT8.3 | |
PRIEST: Projection Guided Sampling-Based Optimization for Autonomous Navigation |
|
Rastgar, Fatemeh | University of Tartu |
Masnavi, Houman | Toronto Metropolitan University |
Sharma, Basant | University of Tartu |
Aabloo, Alvo | Tartu University |
Swevers, Jan | KU Leuven |
Singh, Arun Kumar | University of Tartu |
Keywords: Optimization and Optimal Control, Autonomous Vehicle Navigation, Collision Avoidance
Abstract: Efficient navigation in unknown and dynamic environments is crucial for expanding the application domain of mobile robots. The core challenge stems from the non-availability of a feasible global path for guiding optimization-based local planners. As a result, existing local planners often get trapped in poor local minima. In this paper, we present a novel optimizer that can explore multiple homotopies to plan high-quality trajectories over long horizons while still being fast enough for real-time applications. We build on the gradient-free paradigm by augmenting the trajectory sampling strategy with a projection optimization that guides the samples toward a feasible region. As a result, our method can recover from the frequently encountered pathological cases wherein all the sampled trajectories lie in the high-cost region. We push the state-of-the-art (SOTA) in the following respects. Over the navigation stack of the Robot Operating System (ROS), we show an improvement of 7-13% in success rate and up to two times in total travel time metric. On the same benchmarks and metrics, our approach achieves up to 44% improvement over model predictive path integral (MPPI) and its recent variants. On simple point-to-point navigation tasks, our optimizer is up to two times more reliable than SOTA gradient-based solvers, as well as sampling-based approaches such as the Cross-Entropy Method (CEM) and VPSTO. Codes: https://github.com/fatemeh-rastgar/PRIEST
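A toy version of the sample-then-project idea, with box clipping standing in for PRIEST's optimization-based projection and a cross-entropy-style distribution update; the cost function, horizon, and limits are illustrative assumptions:

```python
import numpy as np

def project_to_feasible(traj, lower, upper):
    """Stand-in for the projection step: push each sampled waypoint back
    into a box-shaped feasible region (the paper uses a proper
    optimization-based projection handling collision constraints)."""
    return np.clip(traj, lower, upper)

def plan(cost_fn, start, goal, horizon=20, samples=64, iters=5):
    """CEM-style loop: sample trajectories around a mean, project them to
    feasibility, then re-fit the sampling distribution to the elites."""
    mean = np.linspace(start, goal, horizon)          # (horizon, 2)
    std = np.ones_like(mean)
    for _ in range(iters):
        noise = np.random.randn(samples, *mean.shape) * std
        trajs = project_to_feasible(mean + noise, lower=-5.0, upper=5.0)
        costs = np.array([cost_fn(t) for t in trajs])
        elites = trajs[np.argsort(costs)[:8]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

# Toy cost: path length plus penalty for entering a unit disc obstacle at the origin
def cost_fn(t):
    length = np.sum(np.linalg.norm(np.diff(t, axis=0), axis=1))
    penalty = np.sum(np.maximum(0.0, 1.0 - np.linalg.norm(t, axis=1)))
    return length + 10.0 * penalty

traj = plan(cost_fn, start=np.array([-3.0, -3.0]), goal=np.array([3.0, 3.0]))
```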
|
|
15:45-16:00, Paper TuAT8.4 | |
Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving |
|
Li, He | University of Macau |
Han, Ruihua | University of Hong Kong |
Zhao, Zirui | Southern University of Science and Technology |
Xu, Wei | Manifold Tech Limited |
Hao, Qi | Southern University of Science and Technology |
Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Xu, Chengzhong | University of Macau |
Keywords: Virtual Reality and Interfaces, Autonomous Vehicle Navigation, Simulation and Animation
Abstract: Virtual reality (VR) is a promising data engine for autonomous driving (AD). However, data fidelity in this paradigm is often degraded by VR inconsistency, for which the existing VR approaches become ineffective, as they ignore the inter-dependency between low-level VR synchronizer designs (i.e., data collector) and high-level VR synthesizer designs (i.e., data processor). This paper presents a seamless virtual reality (SVR) platform for AD, which mitigates such inconsistency via cross-layer optimization of VR sensory and action data fidelity when agents interact with each other in a shared symbiotic world. The crux to SVR is an integrated synchronizer and synthesizer (IS2) design, which consists of a drift-aware lidar-inertial synchronizer for VR colocation and a motion-aware deep visual synthesis network for augmented reality image generation. We implement SVR on car-like robots in two sandbox platforms, achieving a cm-level VR colocalization accuracy with 50 Hz frequency and 3.2% VR image deviation, thereby avoiding missed collisions or model clippings. Experiments show that the proposed SVR reduces the intervention times, missed turns, and failure rates compared to other benchmarks. The SVR-trained neural network can handle unseen situations in real-world environments, by leveraging its knowledge learnt from the VR space.
|
|
TuAT9 |
Room 9 |
Visual Tracking |
Regular session |
Chair: Khorrami, Farshad | New York University Tandon School of Engineering |
|
15:00-15:15, Paper TuAT9.1 | |
DynaMeshSLAM: A Mesh-Based Dynamic Visual SLAMMOT Method |
|
Liu, Yang | Wuhan University |
Guo, Chi | Wuhan University |
Luo, Yarong | Wuhan University |
Wang, Yingli | Wuhan University |
Keywords: SLAM, Visual Tracking, Semantic Scene Understanding
Abstract: In order to estimate both camera poses and dynamic object poses, the visual SLAMMOT method combines visual Simultaneous Localization and Mapping (SLAM) with Multiple Object Tracking (MOT). Many visual SLAMMOT methods represent dynamic objects as bounding boxes and point cloud clusters, which ignores the geometric properties of the object surfaces that can provide additional constraints. In this letter, we propose DynaMeshSLAM, a visual SLAMMOT method, which represents dynamic objects as mesh models to leverage intrinsic geometric properties. Firstly, DynaMeshSLAM fuses the mesh projection and the optical flow to achieve multi-level object data association. Secondly, a constrained mesh smoothing method is embedded into the visual SLAMMOT framework to adjust dynamic landmarks depending on both the smoothness of object mesh models and the projection error of mesh vertices. Thirdly, a bundle adjustment solution incorporating the deformation graph optimizes the states of dynamic objects, while ensuring the local rigidity of the smoothed mesh models. Experiments on the KITTI-Tracking dataset demonstrate that our method achieves state-of-the-art performance in both object tracking and object pose estimation.
|
|
15:15-15:30, Paper TuAT9.2 | |
DiffOcclusion: Differentiable Optimization Based Control Barrier Functions for Occlusion-Free Visual Servoing |
|
Wei, Shiqing | New York University |
Dai, Bolun | New York University |
Khorrambakht, Rooholla | New York University |
Krishnamurthy, Prashanth | New York University Tandon School of Engineering |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Visual Servoing, Robot Safety, Sensor-based Control
Abstract: The visibility (possibly partial) of some image features is crucial to a broad class of visual servoing-based control. In this work, we consider the setting of image-based visual servoing (IBVS) and address the fundamental problem of keeping a moving object with an unknown motion profile in the field of view while ensuring it remains unobstructed by obstacles. Assuming that the projections of the target and obstacles are both convex polygons, we propose a systematic method for circumscribing these polygons by strictly convex shapes with tunable accuracy. We prove that the minimal scaling factor such that two convex shapes intersect is continuously differentiable with respect to their vertex coordinates. Then, we formulate a control barrier function (CBF) based on this minimal scaling factor and incorporate a motion observer into occlusion-free visual servoing. The effectiveness of our method is validated through both simulation studies and experiments on the Franka Research 3 robotic arm.
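The resulting safety filter can be illustrated with a single-constraint CBF quadratic program, which admits a closed-form solution; the barrier below is generic and does not reproduce the paper's minimal-scaling-factor construction:

```python
import numpy as np

def cbf_filter(u_des, grad_h, h, alpha=1.0):
    """Safety filter for one control barrier function h(x) >= 0 with
    single-integrator dynamics x' = u:
        min ||u - u_des||^2   s.t.   grad_h . u >= -alpha * h.
    With a single linear constraint, the solution is the projection of
    u_des onto the constraint half-space."""
    slack = grad_h @ u_des + alpha * h
    if slack >= 0.0:                      # nominal command already safe
        return u_des
    return u_des - slack * grad_h / (grad_h @ grad_h)

# h could be (minimal scaling factor - 1); its gradient w.r.t. the controlled
# variables would come from differentiating the scaling problem.
u_safe = cbf_filter(np.array([0.2, -0.1]), grad_h=np.array([0.0, 1.0]), h=0.05)
```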
|
|
15:30-15:45, Paper TuAT9.3 | |
S.T.A.R.-Track: Latent Motion Models for End-To-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations |
|
Doll, Simon | Mercedes-Benz AG, University of Tübingen |
Hanselmann, Niklas | Mercedes-Benz AG R&D, University of Tübingen |
Schneider, Lukas | Mercedes Benz AG |
Schulz, Richard | Mercedes-Benz AG |
Enzweiler, Markus | Esslingen University of Applied Sciences |
Lensch, Hendrik Peter Asmus | University of Tuebingen |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D. Traditional model-based tracking approaches incorporate the geometric effect of object- and ego motion between frames with a geometric motion model. Inspired by this, we propose S.T.A.R.-Track which uses a novel latent motion model (LMM) to additionally adjust object queries to account for changes in viewing direction and lighting conditions directly in the latent space, while still modeling the geometric motion explicitly. Combined with a novel learnable track embedding that aids in modeling the existence probability of tracks, this results in a generic tracking framework that can be integrated with any query-based detector. Extensive experiments on the nuScenes benchmark demonstrate the benefits of our approach, showing state-of-the-art (SOTA) performance for DETR3D-based trackers while drastically reducing the number of identity switches of tracks at the same time.
|
|
15:45-16:00, Paper TuAT9.4 | |
D-VAT: End-To-End Visual Active Tracking for Micro Aerial Vehicles |
|
Dionigi, Alberto | University of Perugia |
Felicioni, Simone | University of Perugia - Department of Engineering |
Leomanni, Mirko | University of Perugia |
Costante, Gabriele | University of Perugia |
Keywords: Visual Tracking, Reinforcement Learning, Aerial Systems: Applications
Abstract: Visual active tracking is a growing research topic in robotics due to its key role in applications such as human assistance, disaster recovery, and surveillance. In contrast to passive tracking, active tracking approaches combine vision and control capabilities to detect and actively track the target. Most of the work in this area focuses on ground robots, while the very few contributions on aerial platforms still pose important design constraints that limit their applicability. To overcome these limitations, in this paper we propose D-VAT, a novel end-to-end visual active tracking methodology based on deep reinforcement learning that is tailored to micro aerial vehicle platforms. The D-VAT agent computes the vehicle thrust and angular velocity commands needed to track the target by directly processing monocular camera measurements. We show that the proposed approach allows for precise and collision-free tracking operations, outperforming different state-of-the-art baselines on simulated environments which differ significantly from those encountered during training. Moreover, we demonstrate a smooth real-world transition to a quadrotor platform with mixed-reality.
|
|
TuAT10 |
Room 10 |
Computer Vision for Automation |
Regular session |
Co-Chair: Lim, Yongseob | DGIST |
|
15:00-15:15, Paper TuAT10.1 | |
Lane Segmentation Data Augmentation for Heavy Rain Sensor Blockage Using Realistically Translated Raindrop Images and CARLA Simulator |
|
Pahk, Jinu | Daegu Gyeongbuk Institute of Science and Technology |
Park, Seongjeong | Daegu Gyeongbuk Institute of Science and Technology |
Shim, Jungseok | DGIST |
Son, Sungho | KATRI |
Lee, Jungki | KATRI |
An, Jinung | DGIST |
Lim, Yongseob | DGIST |
Choi, GyeungHo | Daegu Gyeongbuk Institute of Science and Technology |
Keywords: Computer Vision for Automation, Data Sets for Robotic Vision, Simulation and Animation
Abstract: Lane segmentation and Lane Keeping Assist System (LKAS) play a vital role in autonomous driving. While deep learning technology has significantly improved the accuracy of lane segmentation, real-world driving scenarios present various challenges. In particular, heavy rainfall not only obscures the road with sheets of rain and fog but also creates water droplets on the windshield or lens of the camera that affects the lane segmentation performance. There may even be a false positive problem in which the algorithm incorrectly recognizes a raindrop as a road lane. Collecting heavy rain data is challenging in real-world settings, and manual annotation of such data is expensive. In this research, we propose a realistic raindrop conversion process that employs a contrastive learning-based Generative Adversarial Network (GAN) model to transform raindrops randomly generated using Python libraries. In addition, we utilize the attention mask of the lane segmentation model to guide the placement of raindrops in training images from the translation target domain (real Rainy-Images). By training the ENet-SAD model using the realistically Translated-Raindrop images and lane ground truth automatically extracted from the CARLA Simulator, we observe an improvement in lane segmentation accuracy in Rainy-Images. This method enables training and testing of the perception model while adjusting the number, size, shape, and direction of raindrops, thereby contributing to future research on autonomous driving in adverse weather conditions.
|
|
15:15-15:30, Paper TuAT10.2 | |
Street-View Image Generation from a Bird's-Eye View Layout |
|
Swerdlow, Alexander | Carnegie Mellon University |
Xu, Runsheng | UCLA |
Zhou, Bolei | University of California, Los Angeles |
Keywords: Computer Vision for Automation, Simulation and Animation, Deep Learning for Visual Perception
Abstract: Bird's-Eye View (BEV) Perception has received increasing attention in recent years as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications. At the same time, data-driven simulation for autonomous driving has been a focal point of recent research but with few approaches that are both fully data-driven and controllable. Instead of using perception data from real-life scenarios, an ideal model for simulation would generate realistic street-view images that align with a given HD map and traffic layout, a task that is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation with spatial attention design which learns the relationship between cameras and map views to ensure their consistency. We evaluate the proposed model on the challenging NuScenes and Argoverse 2 datasets. After training, BEVGen can accurately render road and lane lines, as well as generate traffic scenes under diverse weather conditions and times of day.
|
|
15:30-15:45, Paper TuAT10.3 | |
Toward Reliable Human Pose Forecasting with Uncertainty |
|
Saadatnejad, Saeed | EPFL |
Mirmohammadi, Mehrshad | Sharif University of Technology |
Daghyani, Matin | Sharif University of Technology |
Saremi, Parham | Sharif University of Technology |
Zoroofchi Benisi, Yashar | Sharif University of Technology |
Alimohammadi, Amirhossein | Simon Fraser University |
TehraniNasab, Zahra | McGill University |
Mordan, Taylor | EPFL |
Alahi, Alexandre | EPFL |
Keywords: Computer Vision for Automation, Human-Robot Collaboration, Human-Centered Robotics
Abstract: Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting, including multiple models, supporting several datasets, and employing standardized evaluation metrics, with the aim of promoting research and moving toward a unified and consistent evaluation. Second, we devise two types of uncertainty in the problem to increase performance and convey better trust: 1) we propose a method for modeling aleatoric uncertainty by using uncertainty priors to inject knowledge about the pattern of uncertainty. This focuses the capacity of the model in the direction of more meaningful supervision while reducing the number of learned parameters and improving stability; 2) we introduce a novel approach for quantifying the epistemic uncertainty of any model through clustering and measuring the entropy of its assignments. Our experiments demonstrate up to 25% improvements in forecasting at short horizons, with no loss on longer horizons on Human3.6M, AMASS, and 3DPW datasets, and better performance in uncertainty estimation. The code is available: https://github.com/vita-epfl/UnPOSed.
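A compact sketch of the second idea (epistemic uncertainty from the entropy of cluster assignments), using KMeans over sampled forecasts; the sample source, cluster count, and data shapes are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def epistemic_uncertainty(pose_samples, n_clusters=5):
    """Cluster a set of forecast samples (e.g. from an ensemble or
    MC-dropout) and report the entropy of the cluster-assignment
    distribution: spread-out assignments mean high epistemic uncertainty."""
    flat = pose_samples.reshape(len(pose_samples), -1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    probs = np.bincount(labels, minlength=n_clusters) / len(labels)
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log(probs)))

# 30 sampled forecasts of 25 future frames x 17 joints x 3D coordinates
samples = np.random.randn(30, 25, 17, 3)
u = epistemic_uncertainty(samples)
```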
|
|
15:45-16:00, Paper TuAT10.4 | |
GSDC Transformer: An Efficient and Effective Cue Fusion for Monocular Multi-Frame Depth Estimation |
|
Naiyu, Fang | Zhejiang University |
Lemiao, Qiu | Zhejiang University |
Zhang, Shuyou | Zhejiang University |
Zili, Wang | Zhejiang University |
Zheyuan, Zhou | Zhejiang University |
Kerui, Hu | Zhejiang University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Computer Vision for Transportation
Abstract: Depth estimation provides an alternative approach for perceiving 3D information in autonomous driving. Monocular depth estimation, whether with single-frame or multi-frame inputs, has achieved significant success by learning various types of cues and specializing in either static or dynamic scenes. Recently, the fusion of these cues has become an attractive topic, aiming to enable the combined cues to perform well in both types of scenes. However, adaptive cue fusion relies on attention mechanisms, where the quadratic complexity limits the granularity of cue representation. Additionally, explicit cue fusion depends on precise segmentation, which imposes a heavy burden on mask prediction. To address these issues, we propose the GSDC Transformer, an efficient and effective component for cue fusion in monocular multi-frame depth estimation. We utilize deformable attention to learn cue relationships at a fine scale, while sparse attention reduces computational requirements when granularity increases. To compensate for the precision drop in dynamic scenes, we represent scene attributes in the form of super tokens without relying on precise shapes. Within each super token attributed to dynamic scenes, we gather its relevant cues and learn local dense relationships to enhance cue fusion. Our method achieves state-of-the-art performance on the KITTI dataset with efficient fusion speed.
|
|
TuAT11 |
Room 11 |
Human and Humanoid Motion Analysis and Synthesis |
Regular session |
Chair: Ayusawa, Ko | National Institute of Advanced Industrial Science and Technology (AIST) |
Co-Chair: Loianno, Giuseppe | New York University |
|
15:00-15:15, Paper TuAT11.1 | |
Keyframe Selection Via Deep Reinforcement Learning for Skeleton-Based Gesture Recognition |
|
Gan, Minggang | Beijing Institute of Technology |
Liu, Jinting | Beijing Institute of Technology |
He, Yuxuan | Beijing Institute of Technology |
Chen, Aobo | Beijing Institute of Technology |
Ma, Qianzhao | Beijing Institute of Technology |
Keywords: Gesture, Posture and Facial Expressions, Reinforcement Learning
Abstract: Skeleton-based gesture recognition has attracted extensive attention and has made great progress. However, mainstream methods generally treat all frames as equally important, which may limit performance, especially when dealing with high inter-class variance in gestures. To tackle this issue, we propose an approach that models a Markov decision process to identify keyframes while discarding irrelevant ones. This paper proposes a deep reinforcement learning double-feature double-motion network comprising two main components: a baseline gesture recognition model and a frame selection network. These two components mutually influence each other, resulting in enhanced overall performance. Following evaluation on the SHREC-17 and F-PHAB datasets, our proposed method demonstrates superior performance.
|
|
15:15-15:30, Paper TuAT11.2 | |
GesGPT: Speech Gesture Synthesis with Text Parsing from ChatGPT |
|
Gao, Nan | Institute of Automation, Chinese Academy of Sciences |
Zhao, Zeyu | Institute of Automation, Chinese Academy of Sciences |
Zeng, Zhi | Beijing University of Posts and Telecommunications |
Zhang, Shuwu | Beijing University of Posts and Telecommunications |
Weng, Dongdong | Beijing Institute of Technology |
Bao, Yihua | Beijing Institute of Technology |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Gesture, Posture and Facial Expressions
Abstract: Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models (LLMs), such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures, offering a new perspective on semantic co-speech gesture generation.
|
|
15:30-15:45, Paper TuAT11.3 | |
Robust Upper Limb Kinematic Reconstruction Using a RGB-D Camera |
|
Li Gioi, Salvatore Maria | Università Campus Bio-Medico |
Loianno, Giuseppe | New York University |
Cordella, Francesca | University Campus Biomedico of Rome |
Keywords: Human Detection and Tracking, RGB-D Perception, Telerobotics and Teleoperation
Abstract: In this paper, we propose a new approach for human motion reconstruction based on a Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter applied to human joint positions extracted from an RGB-D camera (e.g. Kinect). Existing inference approaches require a proper association between measurements and joints, which cannot be maintained in case of the multi-tracking occlusion problem. The proposed GM-PHD recursively estimates the number and states of each group of targets. Furthermore, we embed kinematic constraints in the inference process to guarantee robustness to occlusions. We evaluate the accuracy of both the proposed approach and the default one obtained through a Kinect device by comparing them with a motion analysis system (i.e. Vicon optoelectronic system), even in the presence of occlusions of one or more body joints. Experimental results show that the filter outperforms the baseline commercial solution available in the Kinect device by reducing the hand position and elbow flexion errors by 55.8% and 36.3%, respectively. In addition, to evaluate the applicability of the approach in real-world applications, we employ it in a gesture-based context to remotely control a drone. The user is able to move the drone to a target position with a 100% success rate.
|
|
15:45-16:00, Paper TuAT11.4 | |
Fast Direct Optimal Control for Humanoids Based on Dynamics Representation in FPC Latent Space |
|
Shimizu, Soya | Tokyo University of Agriculture and Technology |
Ayusawa, Ko | National Institute of Advanced Industrial Science and Technology |
Venture, Gentiane | The University of Tokyo |
Keywords: Optimization and Optimal Control, Motion Control, Humanoid Robot Systems
Abstract: In this study, we introduce a novel approach to Direct Optimal Control (DOC) for generating feasible motions in humanoid robots through Functional Principal Component Analysis (FPCA). FPCA is an effective instrument for compressing complex, high-dimensional motion data including ground reaction forces into low-dimensional elements located within a latent space, referred to as the FPC space. These low-dimensional elements retain the fundamental characteristics of the original motions and can be readily transformed back into their diverse, original forms. Additionally, these elements are interchangeable, thereby enabling the synthesis of specific elements to formulate entirely new low-dimensional representations. Consequently, we anticipated a significant reduction in computational time by employing these low-dimensional elements as optimization variables for motion generation. To demonstrate the applicability of this approach for the generation of whole-body motions, experiments were conducted on the humanoid robot HRP-4J, incorporating various objective functions. Upon calculating the Root Mean Square Error (RMSE) for angle data between motion data generated using the proposed method and motion data created using conventional methods, an average RMSE of angle as low as 0.007 rad was obtained. This remarkably small value attests to the capability of the proposed method to generate robot motions with high precision without compromising the distinctive characteristics of the data, while the computation time of the proposed method was 35.8 times faster than the conventional one.
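As a discrete stand-in for the FPC-space idea (not the paper's functional PCA), ordinary PCA over flattened motion trajectories already shows how motions map to low-dimensional elements that can serve as optimization variables and be decoded back into full trajectories:

```python
import numpy as np

def fit_motion_basis(trajectories, n_components=8):
    """PCA over flattened joint trajectories as a discrete stand-in for
    functional PCA: returns the mean motion and principal components."""
    X = trajectories.reshape(len(trajectories), -1)     # (N, T*DoF)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(traj, mean, basis):
    return basis @ (traj.reshape(-1) - mean)             # low-dim element

def decode(z, mean, basis, shape):
    return (mean + basis.T @ z).reshape(shape)            # back to a motion

# Hypothetical data: 40 demonstrations, 200 timesteps, 30 channels
# (joint angles and contact forces stacked)
demos = np.random.randn(40, 200, 30)
mean, basis = fit_motion_basis(demos)
z = encode(demos[0], mean, basis)          # optimization variable for DOC
recon = decode(z, mean, basis, demos[0].shape)
```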
|
|
TuAT12 |
Room 12 |
Learning Categories and Concepts |
Regular session |
Chair: Song, Sichao | CyberAgent Inc |
|
15:00-15:15, Paper TuAT12.1 | |
StROL: Stabilized and Robust Online Learning from Humans |
|
Mehta, Shaunak | Virginia Tech |
Meng, Forrest | Virginia Tech |
Bajcsy, Andrea | Carnegie Mellon University |
Losey, Dylan | Virginia Tech |
Keywords: Intention Recognition, Dynamics, Model Learning for Control
Abstract: Robots often need to learn the human's reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human's reward parameters. We model the robot's learning algorithm as a dynamical system over the human preference parameters, where the human's true (but unknown) preferences are the equilibrium point. This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge. Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs. In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning. See videos and code here: https://github.com/VT-Collab/StROL_RAL
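To make the "learning rule as a dynamical system" framing concrete, the sketch below runs a plain gradient-descent reward-learning rule whose equilibrium is the human's true preference vector; the feature map, noise model, and step size are illustrative assumptions. StROL's contribution is to modify such a rule so that convergence holds for a larger set of human inputs, which this toy does not implement.

```python
import numpy as np

# Sketch of online reward-parameter learning viewed as a dynamical system:
# theta_{k+1} = theta_k + alpha * g(theta_k, human_input). Values are illustrative.
true_theta = np.array([0.8, -0.3])          # the human's (unknown) preference weights
alpha = 0.1

def features(state):
    return np.array([state, state ** 2])

def human_action(state, noise=0.2):
    # Noisy human input correlated with the true reward parameters.
    return features(state) @ true_theta + noise * np.random.randn()

def learning_dynamics(theta, state, u_h):
    # Original gradient rule: move theta so the predicted input matches the human's.
    error = u_h - features(state) @ theta
    return alpha * error * features(state)

theta = np.zeros(2)
for step in range(500):
    s = np.random.uniform(-1.0, 1.0)
    theta = theta + learning_dynamics(theta, s, human_action(s))
print("estimated preferences:", theta, "true:", true_theta)
```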
|
|
15:15-15:30, Paper TuAT12.2 | |
Wingman-Leader Recommendation: An Empirical Study on Product Recommendation Strategy Using Two Robots |
|
Song, Sichao | CyberAgent Inc |
Baba, Jun | CyberAgent, Inc |
Okafuji, Yuki | CyberAgent, Inc |
Nakanishi, Junya | Osaka Univ |
Yoshikawa, Yuichiro | Osaka University |
Ishiguro, Hiroshi | Osaka University |
Keywords: Design and Human Factors, Social HRI
Abstract: In recent years, the potential applications of social robots in retail environments have been explored. Previous studies have mainly focused on the performance of a single robot and have rarely considered the use of multiple robots to recommend products. Although a previous study did investigate the effectiveness of using two robots to recommend products, it remains unclear whether their results can be attributed to a cooperative strategy or simply to the number of robots used. In this work, we explore the effectiveness of a combination strategy called wingman-leader recommendation (WLR), in which a wingman robot outside an establishment promotes a leader robot positioned inside to improve the leader robot's sales performance. Our findings suggest that the wingman robot attracted customers' attention to the leader robot and encouraged them to initiate interactions with it inside the store. We found that using two robots with the WLR strategy achieved better overall performance in terms of sales compared to using two robots without this strategy.
|
|
15:30-15:45, Paper TuAT12.3 | |
Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers |
|
Ahmetoglu, Alper | Brown University |
Çelik, Mehmet Batuhan | Boğaziçi University |
Oztop, Erhan | Osaka University / Ozyegin University |
Ugur, Emre | Bogazici University |
Keywords: Developmental Robotics, Learning Categories and Concepts, Deep Learning Methods
Abstract: In this paper, we propose and realize a new deep learning architecture for discovering symbolic representations for objects and their relations based on the self-supervised continuous interaction of a manipulator robot with multiple objects on a tabletop environment. The key feature of the model is that it can take a changing number of objects as input and map the object-object relations into symbolic domain explicitly. In the model, we employ a self-attention layer that computes discrete attention weights from object features, which are treated as relational symbols between objects. These relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that allows the formation of object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the performance of our proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment where the robot needs to discover symbols related to the relative positions of objects to predict the action's result. Our experiments show that the proposed architecture performs better than other baselines in effect prediction while forming not only object symbols but also relational symbols.
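The core mechanism described above, computing pairwise attention weights from object features and discretizing them into relational symbols, can be sketched as follows. This is an assumption-based toy (shapes, the sigmoid/threshold discretization, and the straight-through trick are choices made for the example), not the paper's architecture.

```python
import torch

# Illustrative self-attention layer whose discretized weights act as relational symbols.
torch.manual_seed(0)
n_objects, feat_dim, key_dim = 5, 16, 8
obj_feats = torch.randn(n_objects, feat_dim)

W_q = torch.nn.Linear(feat_dim, key_dim, bias=False)
W_k = torch.nn.Linear(feat_dim, key_dim, bias=False)

scores = W_q(obj_feats) @ W_k(obj_feats).T / key_dim ** 0.5   # pairwise object-object scores
soft = torch.sigmoid(scores)                                   # continuous attention weights
hard = (soft > 0.5).float()                                    # discrete relational symbols
relations = hard + (soft - soft.detach())                      # straight-through estimator

# Aggregate object features along the discovered relations to predict per-object effects.
aggregated = relations @ obj_feats
print("relational symbols:\n", hard.int())
print("aggregated feature shape:", aggregated.shape)
```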
|
|
15:45-16:00, Paper TuAT12.4 | |
PRIMP: PRobabilistically-Informed Motion Primitives for Efficient Affordance Learning from Demonstration (I) |
|
Ruan, Sipu | National University of Singapore |
Liu, Weixiao | Johns Hopkins University |
Wang, Xiaoli | National University of Singapore |
Meng, Xin | National University of Singapore |
Chirikjian, Gregory | National University of Singapore |
Keywords: Learning from Demonstration, Probability and Statistical Methods, Motion and Path Planning, Service Robots
Abstract: This article proposes a Learning-from-Demonstration (LfD) method using probability densities on the workspaces of robot manipulators. The method, named PRobabilistically-Informed Motion Primitives (PRIMP), learns the probability distribution of the end effector trajectories in the 6-D workspace that includes both positions and orientations. It is able to adapt to new situations such as novel via points with uncertainty and a change of viewing frame. The method itself is robot-agnostic, in that the learned distribution can be transferred to another robot with the adaptation to its workspace density. Workspace-STOMP, a new version of the existing STOMP motion planner, is also introduced, which can be used as a post-process to improve the performance of PRIMP and any other reachability-based LfD method. The combination of PRIMP and Workspace-STOMP can further help the robot avoid novel obstacles that are not present during the demonstration process. The proposed methods are evaluated with several sets of benchmark experiments. PRIMP runs more than five times faster than existing state-of-the-art methods while generalizing trajectories more than twice as close to both the demonstrations and novel desired poses. They are then combined with our lab's robot imagination method that learns object affordances, illustrating the applicability to learn tool use through physical experiments.
|
|
TuAT13 |
Room 13 |
Agricultural Automation I |
Regular session |
Co-Chair: Mintchev, Stefano | ETH Zurich |
|
15:00-15:15, Paper TuAT13.1 | |
Design, Localization, Perception, and Control for GPS-Denied Autonomous Aerial Grasping and Harvesting |
|
Kumar, Ashish | Indian Institute of Technology, Kanpur |
Behera, Laxmidhar | IIT Kanpur |
Keywords: Aerial Systems: Applications, Agricultural Automation, Grasping
Abstract: In this paper, we present a comprehensive UAV system design to perform the highly complex task of off-centered aerial grasping. This task has several interdisciplinary research challenges which need to be addressed at once. The main design challenges are GPS-denied functionality, solely onboard computing, and avoiding off-the-shelf costly positioning systems. While in terms of algorithms, visual perception, localization, control, and grasping are the leading research problems. Hence in this paper, we make interdisciplinary contributions: (i) a detailed description of the fundamental challenges in indoor aerial grasping, (ii) a novel lightweight gripper design, (iii) a complete aerial platform design and in-lab fabrication, and (iv) localization, perception, control, and grasping systems, together with an end-to-end flight autonomy state machine. Finally, we demonstrate the resulting aerial grasping system, Drone-Bee, achieving a high grasping rate for a highly challenging agricultural task of apple-like fruit harvesting, indoors in a vertical farming setting (Fig. 1). To our knowledge, such a system has not been previously discussed in the literature, and with its capabilities, this system pushes aerial manipulation towards the 4th generation.
|
|
15:15-15:30, Paper TuAT13.2 | |
Vision-Based Cow Tracking and Feeding Monitoring for Autonomous Livestock Farming (I) |
|
Guo, Yangyang | School of Internet, Anhui University, Hefei, Anhui 230039, China |
Wenhao, Hong | Anhui University |
Wu, Jiaxin | Anhui University |
Huang, Xiaoping | Anhui University |
Qiao, Yongliang | University of Adelaide |
Kong, He | Southern University of Science and Technology |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Robotics and Automation in Life Sciences
Abstract: In autonomous livestock farming, it is urgent to build intelligent systems that achieve precise feeding. Animal tracking and feeding monitoring are of great significance for automatic measurement of individual cow welfare. To address target matching errors or losses in cow tracking caused by complex farming environments and the frequent movement of cows in the scene, an improved vision-based cow tracking and feeding monitoring approach based on a deep learning network is proposed. In the proposed approach, a Coordinate Attention (CA) module integrated into YOLOv5 was developed to capture spatial location information and improve the detection performance of the model for overlapping regions. A Vision Transformer (ViT) was then embedded in the re-identification network of DeepSORT to enhance feature matching and tracking accuracy. Comparative results on a complex multi-cow dataset constructed from a commercial farm show that the ID F1 Score (IDF1) and Multi-Object Tracking Accuracy (MOTA) of the improved YOLOv5s-CA+DeepSORT-ViT are 1.9% and 2.8% higher than those of the original model, respectively, and the accuracy of continuous detection and tracking was improved. The number of ID switches (ID Sw.) and the processing time are reduced by 50% and 20%, respectively, compared with the original model, which reduces the running time and the false identification of cows. The overall cow tracking performance of our proposed approach outperformed the other baselines (e.g., SORT, ByteTrack, BoT-SORT, and DeepSORT). The proposed approach can effectively track multiple cows in complex scenes, enabling multi-animal tracking in autonomous livestock farming.
|
|
15:30-15:45, Paper TuAT13.3 | |
Learning Occluded Branch Depth Maps in Forest Environments Using RGB-D Images |
|
Geckeler, Christian | ETH Zürich |
Aucone, Emanuele | ETH Zürich |
Schnider, Yannick | ETH Zurich |
Simeon, Andri | ETHZ |
von Bassewitz, Jan-Philipp | ETH Zurich |
Zhu, Yunying | ETHZ |
Mintchev, Stefano | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Robotics and Automation in Agriculture and Forestry, RGB-D Perception
Abstract: Covering over a third of all terrestrial land area, forests are crucial environments: as ecosystems, for farming, and for human leisure. However, they are challenging to access for environmental monitoring, for agricultural uses, and for search and rescue applications. To enter, aerial robots need to fly through dense vegetation, where foliage can be pushed aside, but occluded branches pose critical obstacles. Therefore, we propose pixel-wise depth regression of occluded branches using three different U-Net inspired architectures. Given RGB-D input of trees with partially occluded branches, the models estimate depth values of only the wooden parts of the tree. A large photorealistic simulation dataset comprising around 44K images of nine different tree species is generated, on which the models are trained. Extensive evaluation and analysis of the models on this dataset is shown. To improve network generalization to real-world data, different data augmentation and transformation techniques are performed. The approaches are then also successfully demonstrated on real-world data of broadleaf trees from Swiss temperate forests and a tropical Masoala Rainforest. This work showcases the previously unexplored task of frame-by-frame pixel-based occluded branch depth reconstruction to facilitate robot traversal of forest environments. All models, code and data are available online (http://hdl.handle.net/20.500.11850/634419).
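Because the networks regress depth only for the wooden parts of the tree, a masked regression loss is the natural training objective. The snippet below is a hedged sketch of such a masked L1 loss; the tensor shapes and the choice of L1 are assumptions for illustration and not necessarily the paper's exact objective.

```python
import torch

def masked_depth_loss(pred_depth, target_depth, branch_mask):
    """L1 depth loss computed only on pixels labeled as wood/branch.

    pred_depth, target_depth: (B, 1, H, W) tensors; branch_mask: (B, 1, H, W) in {0, 1}.
    Illustrative loss for occluded-branch depth regression, not the paper's exact one."""
    per_pixel = torch.abs(pred_depth - target_depth) * branch_mask
    return per_pixel.sum() / branch_mask.sum().clamp(min=1.0)

# Toy usage with random tensors standing in for U-Net output and labels.
pred = torch.rand(2, 1, 64, 64)
target = torch.rand(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.8).float()
print("masked L1:", masked_depth_loss(pred, target, mask).item())
```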
|
|
15:45-16:00, Paper TuAT13.4 | |
Robotic Volatile Sampling for Early Detection of Plant Stress (I) |
|
Geckeler, Christian | ETH Zürich |
Ramos, Sergio | University of Zürich |
Schuman, Meredith C. | University of Zurich |
Mintchev, Stefano | ETH Zurich |
Keywords: Robotics and Automation in Agriculture and Forestry, Aerial Systems: Applications, Field Robots
Abstract: Global agriculture is challenged to provide food for a larger than ever human population, while also reducing environmental impacts. Early identification of plant stress enables fast intervention to limit crop losses, and optimized application of pesticides and fertilizer to reduce environmental impacts. Current image-based approaches identify plant stress hours or days after the event, usually only after substantial damage has occurred. In contrast, plant volatiles are released after seconds to hours and can indicate both the type and severity of stress. An automatable and non-disruptive sampling method is needed to use plant volatiles in precision agriculture. This work details the development of a sampling pump which can be deployed and collected with an uncrewed aerial vehicle. The effect of sampling flow rate, horizontal distance to volatile source, and overhead downwash on collected volatiles is investigated, along with the deployment accuracy and retrieval successes. Robotic collection of plant volatiles is a first and important step towards the use of chemical signals for early stress detection and opens up new avenues for precision agriculture beyond visual remote sensing.
|
|
TuBT1 |
Room 1 |
Best Agri-Robotics Papers (YANMAR) |
Regular session |
Chair: Stachniss, Cyrill | University of Bonn |
Co-Chair: Papadopoulos, Evangelos | National Technical University of Athens |
|
16:00-16:15, Paper TuBT1.1 | |
BonnBeetClouds3D: A Dataset towards Point Cloud-Based Organ-Level Phenotyping of Sugar Beet Plants under Real Field Conditions |
|
Marks, Elias Ariel | University of Bonn |
Bömer, Jonas | Institute of Sugar Beet Research Goettingen |
Magistri, Federico | University of Bonn |
Sah, Anurag | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Data Sets for Robotic Vision, Computer Vision for Automation
Abstract: Agricultural production is facing challenges in the coming decades induced by climate change and the need for more sustainability by reducing its impact on the environment. Advances in field management through robotic intervention and monitoring of crops by autonomous unmanned aerial vehicles (UAVs), supporting the breeding of novel and more resilient crop varieties, can help to address these challenges. The analysis of plant traits is called phenotyping and is an essential activity in plant breeding; it however involves a great amount of manual labor. With this paper, we provide means to better tackle the problems of instance segmentation to support robotic intervention and the automatic fine-grained, organ-level geometric analysis needed for precision phenotyping. As the availability of real-world data in this domain is relatively scarce, we provide a novel dataset that was acquired using UAVs capturing high-resolution images of real breeding trials containing 48 plant varieties and therefore covering a relevant morphological and appearance spectrum. This enables the development of approaches for instance segmentation and autonomous phenotyping that generalize well to different plant varieties. Based on overlapping high-resolution images taken from multiple viewing angles, we provide photogrammetric dense point clouds and detailed and accurate point-wise labels for plants, leaves, and salient points such as the tip and the base in 3D. Additionally, we include measurements of phenotypic traits performed by experts from the German Federal Plant Variety Office on the real plants, allowing the evaluation of new approaches not only on segmentation and keypoint detection but also directly on actual traits. The provided labeled point clouds enable fine-grained plant analysis and support further progress in the development of automatic phenotyping approaches, but also enable further research in surface reconstruction, point cloud completion, and semantic interpretation of point clouds.
|
|
16:15-16:30, Paper TuBT1.2 | |
Vinymap: A Vineyard Inspection and 3D Reconstruction Framework for Agricultural Robots |
|
Zarras, Ioannis | National Technical University of Athens |
Mastrogeorgiou, Athanasios | National Technical University of Athens |
Machairas, Konstantinos | National Technical University of Athens |
Koutsoukis, Konstantinos | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Agricultural Automation, Vision-Based Navigation, RGB-D Perception
Abstract: Efficient and thorough vineyard inspection is crucial for optimizing yield and preventing disease from spreading. Manual approaches are labor-intensive and prone to human error, motivating the development of automated solutions. Precision viticulture benefits greatly from access to photo-realistic 3D vineyard maps and from capturing intricate visual details necessary for accurate canopy and grape health assessment. Generating such maps efficiently proves challenging, particularly when employing cost-effective equipment. This paper presents a novel vineyard inspection and 3D reconstruction framework implemented on a Robotic Platform (RP) equipped with three stereo cameras. The framework's performance was evaluated on an experimental synthetic vineyard developed at NTUA. This testing setup allowed experimentation under diverse lighting conditions, ensuring the system's robustness under realistic scenarios. Unlike existing solutions, which often focus on specific aspects of the inspection, our framework offers a top-down approach, encompassing autonomous navigation, high-fidelity 3D reconstruction, and canopy growth assessment. The developed software is available at the CSL's bitbucket repository.
|
|
16:30-16:45, Paper TuBT1.3 | |
Sim2real Cattle Joint Estimation in 3D Pointclouds |
|
Okour, Mohammad | University of Technology Sydney |
Alempijevic, Alen | University of Technology Sydney |
Falque, Raphael | University of Technology Sydney |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception
Abstract: Understanding the well-being of cattle is crucial in various agricultural contexts. Cattle's body shape and joint articulation carry significant information about their welfare, yet acquiring comprehensive datasets for 3D body pose estimation presents a formidable challenge. This study delves into the construction of such a dataset specifically tailored for cattle. Leveraging the expertise of digital artists, we use a single animated 3D model to represent diverse cattle postures. To address the disparity between virtual and real-world data, we augment the 3D model's shape to encompass a range of potential body appearances, thereby narrowing the "sim2real" gap. We use these annotated models to train a deep learning framework capable of estimating internal joints solely based on external surface curvature. Our contribution is specifically the use of geodesic distance over the surface manifold, coupled with multilateration to extract joints in a semantic keypoint detection encoder-decoder architecture. We demonstrate the robustness of joint extraction by analysis of link lengths extracted on real cattle mobbing and walking within a race. Furthermore, inspired by the established allometric relationship between bone length and the overall height of mammals, we utilise the estimated joints to predict hip height within a real cattle dataset, extending the utility of our approach to offer insights into improving cattle monitoring practices.
|
|
16:45-17:00, Paper TuBT1.4 | |
Ground-Density Clustering for Approximate Agricultural Field Segmentation |
|
Nelson, Henry J. | University of Minnesota |
Papanikolopoulos, Nikos | University of Minnesota |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry
Abstract: Instance and semantic segmentation form the backbone of robotic perception and are crucial to many tasks. While most research in the area focuses on improving segmentation quality metrics, there are plenty of applications where approximate methods are adequate as long as they are fast, especially in applications with large amounts of data like precision agriculture. In order to apply the recent successes of machine learning and computer vision on a large scale using robotics, efficient and general algorithms must be designed to intelligently split point clouds into small, yet actionable, portions that can then be processed by more complex algorithms. In this paper, we capitalize on a similarity between the current state-of-the-art for roughly segmenting corn plants and a commonly used density-based clustering algorithm, Quickshift. Exploiting this similarity we propose a novel algorithm, Ground-Density Quickshift++, with the goal of producing a general and scalable field segmentation algorithm that segments individual plants and their stems. This algorithm produces quantitatively better results than the current state-of-the-art on both plant separation and stem segmentation while being less sensitive to input parameters and maintaining the same algorithmic time complexity. When incorporated into field-scale phenotyping systems, the proposed algorithm should work as a drop-in replacement that can greatly improve the accuracy of results while ensuring that performance and scalability remain undiminished.
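The proposed method builds on Quickshift-style density clustering. The sketch below shows the core Quickshift idea (estimate a local density, then link each point to its nearest higher-density neighbor within a radius); bandwidths, radii, and the density estimate are illustrative assumptions, and the ground-density modification that gives the paper its name is not included.

```python
import numpy as np
from scipy.spatial import cKDTree

def quickshift_clusters(points, bandwidth=0.5, link_radius=1.0):
    """Toy Quickshift-style clustering: estimate local density, then link each point
    to its nearest higher-density neighbor within link_radius. Parameters are
    illustrative; Ground-Density Quickshift++ modifies the density term."""
    tree = cKDTree(points)
    # Crude density estimate from neighbor counts within the bandwidth.
    density = np.array([len(tree.query_ball_point(p, bandwidth)) for p in points])
    parent = np.arange(len(points))
    for i, p in enumerate(points):
        neighbors = tree.query_ball_point(p, link_radius)
        higher = [j for j in neighbors if density[j] > density[i]]
        if higher:
            parent[i] = min(higher, key=lambda j: np.linalg.norm(points[j] - p))
    # Follow parent links up to the density-mode roots to obtain cluster labels.
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i
    return np.array([root(i) for i in range(len(points))])

points = np.vstack([np.random.randn(50, 3), np.random.randn(50, 3) + [5.0, 0.0, 0.0]])
labels = quickshift_clusters(points)
print("clusters found:", len(set(labels)))
```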
|
|
TuBT2 |
Room 2 |
Best RoboCup Papers |
Regular session |
Co-Chair: Fang, Hao-Shu | Massachusetts Institute of Technology |
|
16:00-16:15, Paper TuBT2.1 | |
A Convex Formulation of Frictional Contact for the Material Point Method and Rigid Bodies |
|
Zong, Zeshun | University of California, Los Angeles |
Han, Xuchen | Toyota Research Institute |
Jiang, Chenfanfu | University of California, Los Angeles |
Keywords: Simulation and Animation, Contact Modeling, Dynamics
Abstract: In this paper, we introduce a novel convex formulation that seamlessly integrates the Material Point Method (MPM) with articulated rigid body dynamics in frictional contact scenarios. We extend the linear corotational hyperelastic model into the realm of elastoplasticity and include an efficient return mapping algorithm. This approach is particularly effective for MPM simulations involving significant deformation and topology changes, while preserving the convexity of the optimization problem. Our method ensures global convergence, enabling the use of large simulation time steps without compromising robustness. We have validated our approach through rigorous testing and performance evaluations, highlighting its superior capabilities in managing complex simulations relevant to robotics. Compared to previous MPM-based robotic simulators, our method significantly improves the stability of contact resolution -- a critical factor in robot manipulation tasks. We have made our method available in the open-source robotics toolkit, Drake.
|
|
16:15-16:30, Paper TuBT2.2 | |
Differentiable Collision-Free Parametric Corridors |
|
Arrizabalaga, Jon | Technical University of Munich (TUM) |
Manchester, Zachary | Carnegie Mellon University |
Ryll, Markus | Technical University Munich |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: This paper presents a method to compute differentiable collision-free parametric corridors. In contrast to existing solutions that decompose the obstacle-free space into multiple convex sets, the continuous corridors computed by our method are smooth and differentiable, making them compatible with existing numerical techniques for learning and optimization. To achieve this, we represent the collision-free corridors as a path-parametric off-centered ellipse with a polynomial basis. We show that the problem of maximizing the volume of such corridors is convex, and can be efficiently solved. To assess the effectiveness of the proposed method, we examine its performance in a synthetic case study and subsequently evaluate its applicability in a real-world scenario from the KITTI dataset.
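To illustrate what a path-parametric off-centered ellipse corridor looks like in code, the sketch below evaluates polynomial center and semi-axis functions along the path parameter and tests point containment. The coefficients and the axis-aligned simplification are made up for the example; the paper optimizes these quantities to maximize corridor volume.

```python
import numpy as np

# Illustrative path-parametric off-centered ellipse corridor in 2D.
center_coeffs = np.array([[0.0, 1.0, 0.0],    # c_x(s) = s
                          [0.0, 0.0, 0.5]])   # c_y(s) = 0.5 s^2
axis_coeffs = np.array([[0.8, -0.3, 0.0],     # a(s): first semi-axis
                        [0.5,  0.0, 0.0]])    # b(s): second semi-axis

def poly(coeffs, s):
    return sum(c * s ** k for k, c in enumerate(coeffs))

def inside_corridor(point, s):
    """Check whether a 2D point lies inside the corridor cross-section at parameter s."""
    center = np.array([poly(center_coeffs[0], s), poly(center_coeffs[1], s)])
    a, b = poly(axis_coeffs[0], s), poly(axis_coeffs[1], s)
    d = point - center
    return (d[0] / a) ** 2 + (d[1] / b) ** 2 <= 1.0   # axis-aligned for simplicity

for s in (0.0, 0.5, 1.0):
    print(f"s={s:.1f}", inside_corridor(np.array([s, 0.4 * s ** 2]), s))
```

Because the corridor is a smooth function of the path parameter and of its coefficients, membership constraints of this form remain differentiable, which is what makes the representation compatible with gradient-based learning and optimization.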
|
|
16:30-16:45, Paper TuBT2.3 | |
IMU Based Pose Reconstruction and Closed-Loop Control for Soft Robotic Arms |
|
Pei, Guanran | École Polytechnique Fédérale De Lausanne |
Stella, Francesco | EPFL |
Meebed, Omar Hani Mokhtar Ahmed | EPFL |
Bing, Zhenshan | Technical University of Munich |
Della Santina, Cosimo | TU Delft |
Hughes, Josie | EPFL |
Keywords: Soft Sensors and Actuators, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft continuum manipulators are celebrated for their versatility and physical robustness to external forces and perturbations. However, this feature comes at a cost. The many degrees of freedom and the compliance pose challenges for accurate pose reconstruction, both in terms of distributed sensing and pose reconstruction algorithms. Moreover, soft arms are inherently susceptible to deformation from external forces or loads, meaning that closed-loop control is essential for robust task performance. In this article, we propose the integration of multiple Inertial Measurement Units (IMUs) into a soft robot arm, Helix, for reconstruction of its pose under internal and external forces. Furthermore, we use this dynamic pose reconstruction for kinematics-based closed-loop control strategies. By serially integrating sensing along the body of the Helix soft manipulator, we provide the system with high-frequency pose reconstruction and demonstrate improvements in end effector positioning in comparison to open-loop performance.
|
|
16:45-17:00, Paper TuBT2.4 | |
EyeSight Hand: Design of a Fully-Actuated Dexterous Robot Hand with Integrated Vision-Based Tactile Sensors and Compliant Actuation |
|
Romero, Branden | Massachusetts Institute of Technology |
Fang, Hao-Shu | Shanghai Jiao Tong University |
Agrawal, Pulkit | MIT |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Multifingered Hands, Imitation Learning
Abstract: In this work, we introduce the EyeSight Hand, a 7 degrees of freedom (DoF) humanoid hand featuring integrated vision-based tactile sensors tailored for enhanced whole-hand manipulation. Additionally, we introduce an actuation scheme centered around quasi-direct drive actuation to achieve human-like strength and speed while ensuring robustness for large-scale data collection. We evaluate the EyeSight Hand on three challenging tasks: bottle opening, plasticine cutting, and plate pick and place, which require a blend of complex manipulation, tool use, and precise force application. Imitation learning models trained on these tasks, with a vision dropout strategy, showcase the benefits of tactile feedback in enhancing task success rates. Our results reveal that the integration of tactile sensing dramatically improves task performance, underscoring the critical role of tactile information in dexterous manipulation.
|
|
TuBT3 |
Room 3 |
Active Perception II |
Regular session |
Chair: Pb, Sujit | IISER Bhopal |
Co-Chair: Campolo, Domenico | Nanyang Technological University |
|
16:00-16:15, Paper TuBT3.1 | |
Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments |
|
Cui, Yuxiang | Zhejiang University |
Ye, Shuhao | Zhejiang University |
Xu, Xuecheng | Zhejiang University |
Sha, Hao | Zhejiang University |
Wang, Cheng | Zhejiang University |
Lin, Longzhong | Zhejiang University |
Yang, Yifei | Zhejiang University |
Yu, Jiyu | Zhejiang University |
Liu, Zhe | University of Cambridge |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Service Robotics, Reinforcement Learning, Integrated Planning and Learning
Abstract: Goal-reaching in unknown environments is one of the essential tasks in robot applications. Large-scale perception and long-horizon decision-making are the keys to solving this task as the operation scope expands or complexity rises. Existing navigation methods may suffer from degraded performance in complicated environments induced by scalability-limited map representation or greedy decision strategy. We propose the path-extended graph as a compact map representation providing sufficient structural information within a reasonable receptive field and incorporate it into a hierarchical policy for higher efficiency and generalizability. The path-extended graph contains the concise topology of environment structure and frontier layout for large-scale perception, avoiding the impact of redundant information. The hierarchical policy solves long-horizon non-myopic decision-making through a high-level frontier selection policy using deep reinforcement learning (DRL) and a low-level motion controller that handles path planning and collision avoidance. Simulation and real-world experiments demonstrate that our method outperforms other competitive approaches in avoiding redundant movement and achieves efficient goal-reaching, especially in complex environments.
|
|
16:15-16:30, Paper TuBT3.2 | |
Multi-Agent Deep Reinforcement Learning for Persistent Monitoring with Sensing, Communication, and Localization Constraints (I) |
|
Mishra, Manav | IISER Bhopal |
Poddar, Prithvi | University at Buffalo |
Agrawal, Rajat | Indian Institute of Science Education and Research Bhopal |
Chen, Jingxi | University of Maryland |
Tokekar, Pratap | University of Maryland |
Pb, Sujit | IISER Bhopal |
Keywords: Surveillance Robotic Systems, Reinforcement Learning, Multi-Robot Systems
Abstract: Determining multi-robot motion policies for persistently monitoring a region with limited sensing, communication, and localization constraints in non-GPS environments is a challenging problem. To take the localization constraints into account, in this paper, we consider a heterogeneous robotic system consisting of two types of agents: anchor agents with accurate localization capability and auxiliary agents with low localization accuracy. To localize itself, the auxiliary agents must be within the communication range of an anchor, directly or indirectly. The robotic team’s objective is to minimize environmental uncertainty through persistent monitoring. We propose a multi-agent deep reinforcement learning (MARL) based architecture with graph convolution called Graph Localized Proximal Policy Optimization (GALOPP), which incorporates the limited sensor field-of-view, communication, and localization constraints of the agents along with persistent monitoring objectives to determine motion policies for each agent. We evaluate the performance of GALOPP on open maps with obstacles having a different number of anchor and auxiliary agents. We further study 1) the effect of communication range, obstacle density, and sensing range on the performance and 2) compare the performance of GALOPP with area partition, greedy search, random search, and random search with communication constraint strategies. For its generalization capability, we also evaluated GALOPP in two different environments – 2-room and 4-room. The results show that GALOPP learns the policies and monitors the area well. As a proof-of-concept, we perform hardware experiments to demonstrate the performance of GALOPP.
|
|
16:30-16:45, Paper TuBT3.3 | |
Contingency Games for Multi-Agent Interaction |
|
Peters, Lasse | Delft University of Technology |
Bajcsy, Andrea | Carnegie Mellon University |
Chiu, Chih-Yuan | University of California, Berkeley |
Fridovich-Keil, David | The University of Texas at Austin |
Laine, Forrest | Vanderbilt University |
Ferranti, Laura | Delft University of Technology |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Human-Aware Motion Planning, Planning under Uncertainty, Motion and Path Planning
Abstract: Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot’s actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently interact with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time when intent uncertainty will be resolved. By estimating this parameter online, we construct a game-theoretic motion planner that adapts to changing beliefs while anticipating future certainty. We show that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Through a series of simulated autonomous driving scenarios, we demonstrate that contingency games close the gap between certainty-equivalent games that commit to a single hypothesis and non-contingent multi-hypothesis games that do not account for future uncertainty reduction.
|
|
16:45-17:00, Paper TuBT3.4 | |
GesGPT: Speech Gesture Synthesis with Text Parsing from ChatGPT |
|
Gao, Nan | Institute of Automation, Chinese Academy of Sciences |
Zhao, Zeyu | Institute of Automation, Chinese Academy of Sciences |
Zeng, Zhi | Beijing University of Posts and Telecommunications |
Zhang, Shuwu | Beijing University of Posts and Telecommunications |
Weng, Dongdong | Beijing Institute of Technology |
Bao, Yihua | Beijing Institute of Technology |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Gesture, Posture and Facial Expressions
Abstract: Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models (LLMs), such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. First, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis of emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with the base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures, offering a new perspective on semantic co-speech gesture generation.
|
|
TuBT4 |
Room 4 |
Modeling, Control, and Learning for Soft Robots |
Regular session |
Co-Chair: George Thuruthel, Thomas | University College London |
|
16:00-16:15, Paper TuBT4.1 | |
Hyperboloidal Pneumatic Artificial Muscle with Braided Straight Fibers |
|
Watanabe, Masahiro | Osaka University |
Tadakuma, Kenjiro | Osaka University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Hydraulic/Pneumatic Actuators, Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators
Abstract: This paper introduces the development and analysis of a hyperboloidal pneumatic artificial muscle (h-PAM) utilizing braided straight fibers, aimed at overcoming the limitations of traditional pneumatic artificial muscles (PAMs). The novel design features a hyperboloidal rubber tube coupled with a braided shell of straight fibers, enhancing both contraction performance and flexibility, a significant advantage particularly for shorter muscles. Through deformation simulations, we have demonstrated that critical performance metrics such as contraction ratio, radius expansion ratio, and contraction force can be effectively tailored by adjusting the height-to-radius ratio and the fiber's offset angle. The fabricated h-PAM prototype's performance was evaluated, showcasing its capability for contraction and flexibility even in compact forms, thus highlighting its potential as an actuator in soft robotics applications.
|
|
16:15-16:30, Paper TuBT4.2 | |
A Hybrid Adaptive Controller for Soft Robot Interchangeability |
|
Chen, Zixi | Scuola Superiore Sant'Anna |
Ren, Xuyang | Scuola Superiore Sant’Anna |
Bernabei, Matteo | Scuola Superiore Sant'Anna |
Mainardi, Vanessa | Scuola Superiore Sant'Anna |
Ciuti, Gastone | Scuola Superiore Sant'Anna |
Stefanini, Cesare | Scuola Superiore Sant'Anna |
Keywords: Modeling, Control, and Learning for Soft Robots, Machine Learning for Robot Control, Robust/Adaptive Control
Abstract: Soft robots have been leveraged in considerable areas like surgery, rehabilitation, and bionics due to their softness, flexibility, and safety. However, it is challenging to produce two identical soft robots, even with the same mold and manufacturing process, owing to the complexity of soft materials. Meanwhile, widespread usage of a system requires the ability to replace inner components without significantly affecting system performance, a property known as interchangeability. Due to the necessity of this property, a hybrid adaptive controller is introduced to achieve interchangeability from the perspective of control approaches. This method utilizes an offline-trained recurrent neural network controller to cope with the nonlinear and delayed response from soft robots. Furthermore, an online optimizing kinematics controller is applied to decrease the error caused by the above neural network controller. Soft pneumatic robots with different deformation properties but the same mold have been included for validation experiments. In the experiments, the systems with different actuation configurations and the different robots follow the desired trajectory with errors of 3.3 ± 2.9% and 4.3 ± 4.1% compared with the working space length, respectively. Such an adaptive controller also shows good performance at different control frequencies and desired velocities. This controller is also compared with a model-based controller in simulation. This controller endows soft robots with the potential for wide application, and future work may include different offline and online controllers. A weight parameter adjusting strategy may also be proposed in the future.
|
|
16:30-16:45, Paper TuBT4.3 | |
DisMech: A Discrete Differential Geometry-Based Physical Simulator for Soft Robots and Structures |
|
Choi, Andrew | University of California, Los Angeles |
Jing, Ran | Boston University |
Sabelhaus, Andrew | Boston University |
Khalid Jawed, Mohammad | University of California, Los Angeles |
Keywords: Modeling, Control, and Learning for Soft Robots, Simulation and Animation, Soft Robot Materials and Design
Abstract: Fast, accurate, and generalizable simulations are a key enabler of modern advances in robot design and control. However, existing simulation frameworks in robotics either model rigid environments and mechanisms only, or if they include flexible or soft structures, suffer significantly in one or more of these performance areas. To close this "sim2real" gap, we introduce DisMech, a simulation environment that models highly dynamic motions of rod-like soft continuum robots and structures, quickly and accurately, with arbitrary connections between them. Our methodology combines a fully implicit discrete differential geometry-based physics solver with fast and accurate contact handling, all in an intuitive software interface. Crucially, we propose a gradient descent approach to easily map the motions of hardware robot prototypes to control inputs in DisMech. We validate DisMech through several highly-nuanced soft robot simulations while demonstrating an order of magnitude speed increase over previous state of the art. Our real2sim validation shows high physical accuracy versus hardware, even with complicated soft actuation mechanisms such as shape memory alloy wires. With its low computational cost, physical accuracy, and ease of use, DisMech can accelerate translation of sim-based control for both soft robotics and deformable object manipulation.
|
|
16:45-17:00, Paper TuBT4.4 | |
RL-Based Adaptive Controller for High Precision Reaching in a Soft Robot Arm (I) |
|
Nazeer, Muhammad Sunny | College of Design Engineering, National University of Singapore |
Laschi, Cecilia | National University of Singapore |
Falotico, Egidio | Scuola Superiore Sant'Anna |
Keywords: Modeling, Control, and Learning for Soft Robots, Robust/Adaptive Control of Robotic Systems, Collision Avoidance, Bayesian Optimization Assisted Control
Abstract: High-precision control of soft robots is challenging due to their stochastic behavior and material-dependent nature. While RL has been applied in soft robotics, achieving precision in task execution is still a long way off. Traditionally, RL requires substantial data for convergence, often obtained from a training environment. Yet, despite exhibiting high accuracy in the training environment, RL policies often fall short in reality due to the training-to-reality gap, and the performance is exacerbated by the stochastic nature of soft robots. This study paves the way for the implementation of RL for soft robot control to achieve high precision in task execution. Two sample-efficient adaptive control strategies are proposed that leverage the RL policy. The schemes can overcome stochasticity, bridge the training-to-reality gap, and attain the desired accuracy even in challenging tasks such as obstacle avoidance. In addition, deliberate and reversible damage is induced in the pneumatic actuation chamber, altering the soft robot's behavior to test the adaptability of our solutions. Despite the damage, the desired accuracy was achieved in most scenarios without needing to retrain the RL policy.
|
|
TuBT5 |
Room 5 |
Calibration and Identification I |
Regular session |
Chair: Ganguly, Amartya | Technical University of Munich |
Co-Chair: Campolo, Domenico | Nanyang Technological University |
|
16:00-16:15, Paper TuBT5.1 | |
Spatio-Temporal Calibration for Omni-Directional Vehicle-Mounted Event Cameras |
|
Li, Xiao | National University of Defense Technology |
Zhou, Yi | Hunan University |
Guo, Ruibin | National University of Defense Technology |
Peng, Xin | ShanghaiTech University |
Zhou, Zongtan | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Keywords: Calibration and Identification, SLAM
Abstract: We propose a solution to the spatio-temporal calibration problem for event cameras mounted on omni-directional vehicles. Unlike traditional methods that determine the camera pose with respect to the vehicle body frame via trajectory alignment, our approach exploits two sets of linear velocity estimates obtained from the event data and the wheel odometry, respectively. The overall calibration task consists of estimating the temporal offset between the two heterogeneous sensors and, in addition, recovering the extrinsic rotation that defines the linear relationship between the two sets of velocity estimates. The optimal time offset is found by maximizing a correlation measure that is invariant to arbitrary linear transformations. Once the time offset is compensated, the extrinsic rotation is recovered incrementally by iterating a closed-form solver that registers the associated linear velocity estimates. The proposed algorithm is shown to be effective on both synthetic and real data, outperforming traditional methods based on trajectory alignment.
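The time-offset step can be illustrated with a brute-force correlation search between two speed signals, as sketched below. The synthetic data, the use of a plain Pearson correlation, and the sign convention are assumptions for the example; the paper uses a correlation measure that is invariant to arbitrary linear transformations, which this toy does not implement.

```python
import numpy as np

def estimate_delay(v_cam, v_odom, dt, max_shift=50):
    """Brute-force search for the sample shift that maximizes the correlation
    between two speed streams; returns the estimated delay of v_odom w.r.t. v_cam."""
    best_shift, best_corr = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        a = v_cam[max(0, shift): len(v_cam) + min(0, shift)]
        b = v_odom[max(0, -shift): len(v_odom) + min(0, -shift)]
        n = min(len(a), len(b))
        corr = np.corrcoef(a[:n], b[:n])[0, 1]
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return -best_shift * dt

# Synthetic example: the odometry stream lags the camera stream by 0.15 s.
dt = 0.01
t = np.arange(0, 10, dt)
speed = 1.0 + 0.5 * np.sin(2 * np.pi * 0.3 * t)
v_cam = speed + 0.02 * np.random.randn(t.size)
v_odom = np.interp(t - 0.15, t, speed) + 0.02 * np.random.randn(t.size)
print("estimated delay [s]:", estimate_delay(v_cam, v_odom, dt))
```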
|
|
16:15-16:30, Paper TuBT5.2 | |
Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor |
|
Wang, Di | Texas A&M University |
Duan, Xiaoyu | Texas A&M University |
Yeh, Shu-Hao | Texas A&M University |
Zou, Jun | Texas A&M University |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Calibration and Identification, Mechanism Design
Abstract: Micro scanning mirrors (MSMs) extend the range and field of view of LiDARs, medical imaging devices, and laser projectors. However, a new class of soft-hinged MSMs exhibits out-of-plane translation in addition to the 2 degree-of-freedom rotations, which presents a calibration challenge. We report a new calibration system and algorithm design to address the challenge. In the calibration system, a new low-cost calibration rig design employs a minimal 2-laser-beam approach. The new algorithm builds on the reflection principle and an optimization approach to precisely measure MSM poses. To establish the mapping between Hall sensor readings and MSM poses, we propose a self-synchronizing, periodicity-based model-fitting calibration approach. We achieve an MSM pose estimation accuracy of 0.020° with a standard deviation of 0.011°.
|
|
16:30-16:45, Paper TuBT5.3 | |
LiDAR-Link: Observability-Aware Probabilistic Plane-Based Extrinsic Calibration for Non-Overlapping Solid-State LiDARs |
|
Xu, Jie | Harbin Institute of Technology |
Huang, Song | Anhui Normal University |
Qiu, Shuxin | NanChang Institute of Technology |
Zhao, Lijun | Harbin Institute of Technology |
Yu, Wenlu | Harbin Institute of Technology |
Fang, Mingxing | Anhui Normal University |
Wang, Minhang | HAOMO.AI Technology Co., Ltd |
Li, Ruifeng | Harbin Institute of Technology |
Keywords: Calibration and Identification, Factory Automation
Abstract: As solid-state LiDAR technology advances, mobile robotics and autonomous driving increasingly rely on multiple solid-state LiDARs for perception. However, limited or non-overlapping fields of view (FoV) among these sensors pose significant challenges for extrinsic calibration. Moreover, there are no quantitative indicators to evaluate calibration quality currently. To tackle these challenges, we introduce LiDAR-Link, a novel approach for calibration and evaluation, consisting of LiDAR-Bridge and LiDAR-Align. LiDAR-Bridge employs a wide-angle LiDAR as an intermediary to align point clouds from small-angle solid-state LiDARs with limited or non-overlapping FoVs and indirectly obtain the extrinsics between the small-angle LiDARs. LiDAR-Align adaptively constructs a Voxel Probabilistic Plane Map to efficiently match point clouds and differentiate the contributions of various high-quality planar features. To align these point clouds, Iterated Extended Kalman Filter (IEKF) is utilized, which is based on point-to-plane residuals. The resulting covariance verifies whether point cloud alignment constraints are met and assesses calibration reliability. Furthermore, LiDAR-Align supports multi-scene joint calibration to overcome the limitations of fewer constraints caused by smaller LiDAR FoVs. We validate the accuracy of our method and the significance of observability analysis through a comprehensive set of experiments. To promote community development, we provide public access to LiDAR-Link's source code and experimental datasets.
|
|
16:45-17:00, Paper TuBT5.4 | |
I Get the Hang of It! a Learning-Free Method to Predict Hanging Poses for Previously Unseen Objects |
|
Li, Wanze | National University of Singapore |
Pan, Lexin | National University of Singapore |
Jiang, Boren | National University of Singapore |
Wu, Yuwei | National University of Singapore |
Liu, Weixiao | Johns Hopkins University |
Chirikjian, Gregory | National University of Singapore |
Keywords: Calibration and Identification, Computational Geometry, Domestic Robotics
Abstract: The action of hanging previously unseen objects remains a challenge for robots due to the multitude of object shapes and the limited number of stable hanging arrangements. This paper proposes a learning-free framework that enables robots to infer stable relative poses between the object being hung (object) and the supporting item (supporter). Our method identifies potential hanging positions and orientations on previously unseen supporters and objects by analyzing the hanging mechanics and geometric properties. An evaluation policy is designed to match potential hanging positions and directions and to optimize the relative hanging poses. Experiments were conducted in both simulation and real-world scenarios. The success rates of our strategy outperform the state-of-the-art baseline method. The proposed method was also tested on unhangable pairs of objects and supporters, and the results show that our algorithm can properly reject false-positive hangings. Finally, we ran experiments under different scanning conditions. Experimental results indicate that although the success rate decreases as the quality of the scan decreases, it remains at a high level.
|
|
TuBT6 |
Room 6 |
Parallel Robots |
Regular session |
Chair: Mueller, Andreas | Johannes Kepler University |
Co-Chair: Keshavan, Jishnu | Indian Institute of Science |
|
16:00-16:15, Paper TuBT6.1 | |
Graph-Propagation-Based Kinematic Algorithm for In-Pipe Truss Structure Robots |
|
Chen, Yu | Carnegie Mellon University |
Xu, Jinyun | Carnegie Mellon University |
Cai, Yilin | Georgia Institute of Technology |
Yang, Shuo | Carnegie Mellon University |
Brown, H. Ben | Carnegie Mellon University |
Ruan, Fujun | Carnegie Mellon University |
Gu, Yizhu | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Li, Lu | Carnegie Mellon University |
Keywords: Parallel Robots, Kinematics, Redundant Robots
Abstract: Robots designed for in-pipe navigation, inspection, and repair require flexibility for intricate pipeline traversal and the strength to carry payloads. However, conventional wheeled in-pipe robots face challenges in simultaneously achieving both substantial flexibility and payload-carrying capacity. A superior approach involves utilizing truss robots with redundant joints and linkages for pipe shape adaptation and actuation force distribution, providing significant advantages for complex pipeline navigation and heavy payload delivery. However, the kinematics of truss robots is computationally expensive for conventional Jacobian-based algorithms due to their complicated structural constraints. To address this limitation, we propose a novel algorithm for efficient truss-robot kinematics computation using a Graph Propagation (GP) method. Our method computes both the forward kinematics and the Jacobian in a propagative manner. It also guarantees geometric constraints with the Sigmoid function as the boundary. In simulation experiments, our algorithm accelerates pipe shape adaptation computation by 5.2 - 16.4 times compared to finite difference methods. The practical feasibility of our method is assessed through physical in-pipe crawling experiments using a truss robot prototype. Additionally, the prototype's ability to carry heavy payloads is demonstrated through payload-carrying experiments. Our prototype carries 2 - 4 times heavier payloads compared to two-wheeled robot approaches. We also showcase the proposed method's versatility in addressing manipulation tasks, indicating its generalizability across diverse applications.
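The propagative flavor of the kinematics computation can be illustrated on a much simpler structure: a planar serial chain, where the end-effector position and the Jacobian columns are accumulated node by node in a single pass. The sketch below is an assumption-based toy; a truss robot adds loop-closure constraints on top of this propagation, which are omitted here.

```python
import numpy as np

def planar_chain_fk_jacobian(joint_angles, link_lengths):
    """Propagative forward kinematics and Jacobian for a planar revolute chain."""
    position = np.zeros(2)
    angle = 0.0
    joint_positions = [position.copy()]            # location of each joint axis
    for theta, length in zip(joint_angles, link_lengths):
        angle += theta
        position = position + length * np.array([np.cos(angle), np.sin(angle)])
        joint_positions.append(position.copy())
    # Column i: effect of joint i on the end effector = perpendicular of its lever arm.
    jacobian = np.zeros((2, len(joint_angles)))
    for i in range(len(joint_angles)):
        lever = joint_positions[-1] - joint_positions[i]
        jacobian[:, i] = np.array([-lever[1], lever[0]])
    return joint_positions[-1], jacobian

pos, J = planar_chain_fk_jacobian([0.3, -0.5, 0.8], [1.0, 0.8, 0.5])
print("end effector:", pos)
print("Jacobian:\n", J)
```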
|
|
16:15-16:30, Paper TuBT6.2 | |
Dedicated Dynamic Parameter Identification for Delta-Like Robots |
|
Gnad, Daniel | Johannes Kepler University Linz |
Gattringer, Hubert | Johannes Kepler University Linz |
Mueller, Andreas | Johannes Kepler University |
Hoebarth, Wolfgang | B&R Industrie-Elektronik GmbH |
Riepl, Roland | B&R Industrial Automation GmbH |
Messner, Lukas | B&R Industrie-Elektronik GmbH |
Keywords: Parallel Robots, Dynamics, Calibration and Identification
Abstract: Dynamics simulation of parallel kinematic manipulators (PKM) and non-linear control methods require a precisely identified dynamics model and explicit generalized mass matrix. Standard methods, which identify so-called dynamic base-parameters, are not sufficient to this end. Algorithms for identifying the complete set of dynamic parameters were proposed for serial manipulators. A dedicated identification method for PKM does not exist, however. Such a method is introduced here for the large class of Delta-like PKM exploiting the parallel structure and making use of model simplifications specific to this class. The proposed method guarantees physical consistency of the identified parameters, and in particular a positive definite generalized mass matrix. The method is applied to a simulated model with exactly known parameters, which allows for verification of the obtained dynamic parameters. The results show that the generalized mass matrix, the acceleration, and the Coriolis, gravitation and friction terms in the equations of motion (EOM) are well approximated. The second example is a real 4-DOF industrial Delta robot ABB IRB 360-6/1600. For this robot, a physically consistent set of inertia and friction parameters is identified from measurements. The method allows prescribing estimated parameters, but does not rely on such data, e.g. from manufacturer or CAD.
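For context, the standard identification pattern that such methods build on writes the joint torque as a regressor that is linear in the dynamic parameters and solves a least-squares problem. The sketch below does this for a single joint with inertia, viscous, and Coulomb friction terms; it is illustrative only and does not enforce the physical-consistency constraints or the Delta-specific structure that the paper contributes.

```python
import numpy as np

# Toy identification for one joint: tau = I*ddq + c_v*dq + c_c*sign(dq).
rng = np.random.default_rng(1)
t = np.linspace(0, 5, 2000)
q = 0.8 * np.sin(2 * np.pi * 0.7 * t) + 0.3 * np.sin(2 * np.pi * 1.9 * t)
dq = np.gradient(q, t)
ddq = np.gradient(dq, t)

true_params = np.array([0.12, 0.05, 0.3])            # inertia, viscous, Coulomb
Y = np.column_stack([ddq, dq, np.sign(dq)])           # regressor matrix, linear in params
tau = Y @ true_params + 0.01 * rng.standard_normal(t.size)   # "measured" torque

estimate, *_ = np.linalg.lstsq(Y, tau, rcond=None)
print("identified parameters:", estimate)
```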
|
|
16:30-16:45, Paper TuBT6.3 | |
Real-Time Constrained Tracking Control of Redundant Manipulators Using a Koopman - Zeroing Neural Network Framework |
|
Sah, Chandan Kumar | Indian Institute of Science |
Singh, Rajpal | Indian Institute of Science |
Keshavan, Jishnu | Indian Institute of Science |
Keywords: Redundant Robots, Model Learning for Control, Machine Learning for Robot Control
Abstract: This study proposes a combined Koopman-ZNN (Zeroing Neural Network) architecture for real-time control of redundant manipulators subject to input constraints. An autoencoder-based neural architecture is employed to discover the bilinear Koopman model for manipulator dynamics in joint space using input-output data, which is subsequently integrated with a feed-forward neural network that maps the joint coordinates to end-effector Cartesian coordinates. The proposed architecture allows for efficient learning of highly accurate models using a significantly lower number of observable states compared to the previous studies. This learning architecture is then coupled with a ZNN controller, which offers a computationally inexpensive alternative to state-of-the-art Nonlinear Model Predictive Control (NMPC) controllers whose computational burden might render real-time control infeasible. The low-dimensional nature of the learned model, combined with a computationally inexpensive ZNN controller, facilitates real-time control applications with improved tracking accuracy. Simulation and experimental studies of trajectory tracking, including performance comparisons with leading alternative designs, are used to verify the efficacy of the proposed scheme.
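A minimal sketch of fitting a bilinear Koopman-style model z⁺ ≈ A z + u B z by least squares on lifted states is shown below, using a toy pendulum and a hand-picked lifting; the lifting functions, data, and dimensions are assumptions and do not reflect the paper's autoencoder-based architecture or ZNN controller.

```python
import numpy as np

# Toy bilinear Koopman-style fit for a controlled pendulum: z_next ≈ A z + u * B z.
rng = np.random.default_rng(2)

def lift(x):
    theta, omega = x
    return np.array([theta, omega, np.sin(theta), np.cos(theta), 1.0])

def pendulum_step(x, u, dt=0.02):
    theta, omega = x
    return np.array([theta + dt * omega, omega + dt * (-9.81 * np.sin(theta) + u)])

# Collect (z, u*z, z_next) data from random rollouts.
Z, UZ, Znext = [], [], []
x = np.array([0.5, 0.0])
for _ in range(5000):
    u = rng.uniform(-2.0, 2.0)
    x_next = pendulum_step(x, u)
    Z.append(lift(x)); UZ.append(u * lift(x)); Znext.append(lift(x_next))
    x = x_next if np.abs(x_next[0]) < np.pi else np.array([rng.uniform(-0.5, 0.5), 0.0])

# Stack regressors [z, u*z] and solve for [A, B] jointly in one least-squares problem.
Phi = np.hstack([np.array(Z), np.array(UZ)])
K, *_ = np.linalg.lstsq(Phi, np.array(Znext), rcond=None)
A, B = K[:5].T, K[5:].T
print("one-step lifted prediction error:",
      np.linalg.norm(Phi @ K - np.array(Znext)) / np.sqrt(len(Z)))
```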
|
|
16:45-17:00, Paper TuBT6.4 | |
Bio-Inspired Rigid-Soft Hybrid Origami Actuator with Controllable Versatile Motion and Variable Stiffness (I) |
|
Zhang, Zhuang | Westlake University |
Chen, Genliang | Shanghai Jiao Tong University |
Xun, Yuanhao | Shanghai Jiao Tong University |
Long, Yongzhou | Shanghai Jiaotong University |
Wang, Jue | Purdue University |
Wang, Hao | Shanghai Jiao Tong University |
Angeles, Jorge | McGill University |
Keywords: Soft Sensors and Actuators, Parallel Robots, Origami Robots, Soft Robot Materials and Design
Abstract: Conventional soft pneumatic actuators (SPAs) are made of soft materials that facilitate safe interaction and adaptability. In positioning and loading tasks, however, SPAs demonstrate limited performance. Here, we extend the current designs of SPAs by integrating a tendon-driven parallel mechanism into a pneumatic origami chamber, inspired by the performance and structure of vertebrates. The inner-rigid/outer-soft actuator exploits the advantages of both parallel mechanisms, to achieve precise, versatile motion, and SPAs, to form a compliant, modular structure. With the antagonistic actuation of tendon-pulling and air-pushing, the actuator can exhibit multi-mode motion, tunable stiffness, and load-carrying. Kinematic and quasi-static models are developed to predict the behavior and to control the actuator. Using readily accessible materials and fabrication methods, a prototype was built, on which validation experiments were conducted. The results prove the effectiveness of the model, and demonstrate the motion and stiffness characteristics of the actuator. The design strategy and comprehensive guidelines should expand the capabilities of soft robots for wider applications.
|
|
TuBT7 |
Room 7 |
Human-Robot Interaction I |
Regular session |
Chair: Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
|
16:00-16:15, Paper TuBT7.1 | |
Shared Autonomy of a Robotic Manipulator for Grasping under Human Intent Uncertainty Using POMDPs (I) |
|
Yow, J-Anne | Nanyang Technological University |
Garg, Neha Priyadarshini | NUS |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Human Performance Augmentation, Physical Human-Robot Interaction, Physically Assistive Devices, Planning Under Uncertainty
Abstract: In shared autonomy (SA), accurate user intent prediction is crucial for providing good robot assistance and avoiding user-robot conflicts. Prior works have relied on passive observation of joystick inputs to predict user intent, which works when the goals are clearly separated or when a common policy exists for multiple goals. However, this may not work well when grasping objects to perform daily activities, as there are multiple ways to grasp the same object. We demonstrate the need for active information-gathering in such cases and show how it can be done in a principled manner by formulating SA as a discrete-action Partially Observable Markov Decision Process (POMDP) that reasons over high-level actions. One of our insights is that, apart from explicit information-gathering actions and goal-oriented actions, it is important for the POMDP action space to include actions that move towards a distribution of goals and actions that provide no assistance. Compared to a method with no active information-gathering, our method performs tasks faster, requires less user input, and reduces opposing actions, especially for more complex objects, and it received higher ratings and stronger preference in our user study.
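As a minimal illustration of the underlying idea (not the paper's POMDP solver), the belief over candidate grasp goals can be updated with Bayes' rule from observed user inputs; observation_likelihood is a hypothetical placeholder for an intent observation model.
    import numpy as np

    def update_belief(belief, user_input, goals, observation_likelihood):
        """Bayes filter over candidate grasp goals.
        belief : prior probabilities over goals, shape (G,)
        observation_likelihood(user_input, goal) -> P(input | goal)."""
        likelihoods = np.array([observation_likelihood(user_input, g) for g in goals])
        posterior = likelihoods * belief
        total = posterior.sum()
        return posterior / total if total > 0 else belief   # keep prior if uninformative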
|
|
16:15-16:30, Paper TuBT7.2 | |
An Evaluation Framework of Human-Robot Teaming for Navigation among Movable Obstacles Via Virtual Reality-Based Interactions |
|
Huang, Ching-I | National Yang Ming Chiao Tung University |
Chou, Sun-Fu | National Yang Ming Chiao Tung University |
Liou, Li-Wei | National Yang Ming Chiao Tung University |
Moy, Nathan | George Mason University |
Wang, Chu-Ruei | National Yang Ming Chiao Tung University |
Wang, Hsueh-Cheng | National Yang Ming Chiao Tung University, Taiwan |
Ahn, Charles | George Mason University |
Huang, Chun-Ting | Qualcomm |
Yu, Lap-Fai | George Mason University |
Keywords: Human-Robot Teaming, Virtual Reality and Interfaces, Human-Robot Collaboration
Abstract: Robots are essential for tasks that are hazardous or beyond human capabilities. However, the results of the Defense Advanced Research Projects Agency (DARPA) Subterranean (SubT) Challenge revealed that despite various techniques for robot autonomy, human input is still required in some complex situations. Moreover, heterogeneous multirobot teams are often necessary. To manage these teams, effective user interfaces to support humans are required. Accordingly, we present a framework that enables intuitive oversight of a robot team through immersive virtual reality (VR) visualizations. The framework simplifies the management of complex navigation among movable obstacles (NAMO) tasks, such as search-and-rescue tasks. Specifically, the framework integrates a simulation of the environment with robot sensor data in VR to facilitate operator navigation, enhance robot positioning, and greatly improve operator situational awareness. The framework can also boost mission efficiency by seamlessly incorporating autonomous navigation algorithms, including NAMO algorithms, to reduce detours and operator workload. The framework is effective for operating in both simulated and real scenarios and is thus ideal for training or evaluating autonomous navigation algorithms. To validate the framework, we conducted user studies (N = 53) on the basis of the DARPA SubT Challenge’s search-and-rescue missions. Supplementary materials can be found at https://arg-nctu.github.io/projects/vr-navigation.html.
|
|
16:30-16:45, Paper TuBT7.3 | |
Optimizing Setup Configuration of a Collaborative Robot Arm-Based Bimanual Haptic Display for Enhanced Performance |
|
Lee, Joong-Ku | Korea Advanced Institute of Science and Technology (KAIST) |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Physical Human-Robot Interaction, Haptics and Haptic Interfaces, Human Factors and Human-in-the-Loop
Abstract: A bimanual haptic display that supports the full range of human arm motion and provides sufficient haptic feedback force/torque to a human operator is essential for achieving human-like manipulation in remote or virtual environments. This paper proposes the use of redundant collaborative robot arms as a bimanual haptic display and presents an optimization method to determine their ideal setup configuration. The proposed optimization method considers factors including human arm workspace coverage, redundancy, renderable haptic feedback force/torque, and potential collisions with the human arm. The method is applied to the Panda and iiwa 7 robot arms, and the results are compared to the setup configurations of NimbRo and the German Aerospace Center (DLR) HUG. The setup configurations optimized with the proposed method were observed to outperform the existing setup configurations.
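A hedged sketch of the kind of objective involved (not the paper's cost function): candidate robot base poses can be scored by how much of a sampled human-arm workspace they can reach; reachable() is assumed to wrap an inverse-kinematics feasibility test.
    import numpy as np

    def coverage_score(base_pose, workspace_samples, reachable):
        """Fraction of sampled human-arm workspace points reachable from a
        candidate robot base pose; reachable(base_pose, point) -> bool."""
        hits = sum(reachable(base_pose, p) for p in workspace_samples)
        return hits / len(workspace_samples)

    def best_setup(candidate_poses, workspace_samples, reachable):
        """Pick the candidate base pose with the highest workspace coverage."""
        return max(candidate_poses,
                   key=lambda pose: coverage_score(pose, workspace_samples, reachable))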
|
|
16:45-17:00, Paper TuBT7.4 | |
A Smooth Velocity Transition Framework Based on Hierarchical Proximity Sensing for Safe Human-Robot Interaction |
|
Wang, Ruohan | Zhejiang University, Hangzhou, China |
Li, Chen | Zhejiang University |
Lyu, Honghao | Zhejiang University |
Pang, Gaoyang | The University of Sydney |
Wu, Haiteng | Hangzhou Shenhao Technology Co, Ltd |
Yang, Geng | Zhejiang University |
Keywords: Physical Human-Robot Interaction, Safety in HRI, Sensor-based Control
Abstract: With rapid technological development driving the fifth industrial revolution, Industry 5.0, robots are moving out from behind fences and sharing their workspace with humans. In such a context, ensuring the safety of humans and robots is a critical demand. One effective method for this is proximity sensing, for which capacitive sensors are widely used to detect the proximity of humans. However, a capacitive sensor cannot provide accurate distance information, since the measured capacitance varies with the characteristics of the obstacle. This work develops a capacitive robot skin, seamlessly integrated into the proposed smooth velocity transition framework, to deal with this challenge. The robot skin is customized to cover a large area of the exterior of a 6-DOF robot arm. A hierarchical proximity perception approach is used to grade the sensing state. Based on this, distance-reduction and collision-avoidance velocity generation methods are used to achieve a smooth and quick decay of the velocity. The control strategy is applied in a pick-and-place scenario for verification. Compared to the traditional threshold-trigger method, the proposed smooth velocity transition framework greatly reduces the absolute value of the local maximum acceleration, enabling flexible and natural human-robot interaction while ensuring human safety.
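As a simplified sketch of the general idea (not the paper's controller), the commanded speed can be scaled smoothly between graded proximity thresholds instead of being switched off at a single trigger distance; the threshold values below are illustrative assumptions.
    import math

    def velocity_scale(distance, d_stop=0.05, d_slow=0.30):
        """Smooth speed scaling factor in [0, 1]: 0 below d_stop, 1 beyond d_slow,
        cosine blend in between (distances in meters, illustrative values)."""
        if distance <= d_stop:
            return 0.0
        if distance >= d_slow:
            return 1.0
        x = (distance - d_stop) / (d_slow - d_stop)   # position within the transition band
        return 0.5 * (1.0 - math.cos(math.pi * x))    # smooth ramp from 0 to 1

    # Example: the commanded speed is the nominal speed scaled by the factor.
    # v_cmd = velocity_scale(measured_distance) * v_nominal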
|
|
TuBT8 |
Room 8 |
Intelligent Transportation Systems I |
Regular session |
Chair: Zhao, Ding | Carnegie Mellon University |
|
16:00-16:15, Paper TuBT8.1 | |
SIMPL: A Simple and Efficient Multi-Agent Motion Prediction Baseline for Autonomous Driving |
|
Zhang, Lu | Hong Kong University of Science and Technology |
Li, Peiliang | HKUST, Robotics Institute |
Liu, Sikang | DJI |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Deep Learning Methods, Intelligent Transportation Systems, Representation Learning
Abstract: This paper presents a simple and efficient motion prediction baseline (SIMPL) for autonomous vehicles. Unlike conventional agent-centric methods with high accuracy but repetitive computations and scene-centric methods with compromised accuracy and generalizability, SIMPL delivers real-time, accurate motion predictions for all relevant traffic participants. To achieve improvements in both accuracy and inference speed, we propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner, enabling the network to forecast future motion for all road users in a single feed-forward pass and mitigating accuracy loss caused by viewpoint shifting. Additionally, we investigate the continuous trajectory parameterization using Bernstein basis polynomials in trajectory decoding, allowing evaluations of states and their higher-order derivatives at any desired time point, which is valuable for downstream planning tasks. As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks compared with other state-of-the-art methods. Furthermore, its lightweight design and low inference latency make SIMPL highly extensible and promising for real-world onboard deployment. We open-source the code at https://github.com/HKUST-Aerial-Robotics/SIMPL.
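To illustrate the continuous trajectory parameterization mentioned above, a Bernstein (Bezier) trajectory and its derivative can be evaluated in closed form at any normalized time; the sketch below is generic and not SIMPL's decoder.
    import numpy as np
    from math import comb

    def bernstein_basis(n, t):
        """Bernstein basis values B_{i,n}(t) for i = 0..n, with t in [0, 1]."""
        return np.array([comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)])

    def bezier_eval(control_points, t):
        """Evaluate a Bezier curve sum_i B_{i,n}(t) * P_i at normalized time t."""
        n = len(control_points) - 1
        return bernstein_basis(n, t) @ np.asarray(control_points)

    def bezier_derivative(control_points, t, duration=1.0):
        """First derivative (e.g. velocity): a degree n-1 Bezier over the
        differences of control points, scaled by n / duration."""
        P = np.asarray(control_points)
        n = len(P) - 1
        return (n / duration) * bezier_eval(P[1:] - P[:-1], t)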
|
|
16:15-16:30, Paper TuBT8.2 | |
A Safe Preference Learning Approach for Personalization with Applications to Autonomous Vehicles |
|
Karagulle, Ruya | University of Michigan |
Arechiga, Nikos | Toyota Research Institute |
Best, Andrew | Toyota Research Institute |
DeCastro, Jonathan | Cornell University |
Ozay, Necmiye | Univ. of Michigan |
Keywords: Formal Methods in Robotics and Automation, Safety in HRI, Learning from Demonstration
Abstract: This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences, and notably outperforms them when safety is considered.
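A toy sketch of the kind of constraint being solved (not the paper's formulation): for a weighted conjunction of STL robustness values, the learned weights must rank every preferred signal strictly above its non-preferred counterpart; the weighting rule below is one simple illustrative choice.
    import numpy as np

    def weighted_robustness(weights, robustness_values):
        """Toy weighted satisfaction of a conjunction of sub-formulas:
        minimum over sub-formulas of w_k * rho_k."""
        return np.min(np.asarray(weights) * np.asarray(robustness_values))

    def satisfies_preferences(weights, comparison_pairs):
        """Check that, with the given weights, every preferred signal scores
        strictly higher than its non-preferred counterpart.
        comparison_pairs : list of (rho_preferred, rho_non_preferred) vectors."""
        return all(weighted_robustness(weights, pref) > weighted_robustness(weights, non_pref)
                   for pref, non_pref in comparison_pairs)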
|
|
16:30-16:45, Paper TuBT8.3 | |
Safety-Aware Causal Representation for Trustworthy Offline Reinforcement Learning in Autonomous Driving |
|
Lin, Haohong | Carnegie Mellon University |
Ding, Wenhao | Carnegie Mellon University |
Liu, Zuxin | Carnegie Mellon University |
Niu, Yaru | Carnegie Mellon University |
Zhu, Jiacheng | Carnegie Mellon University |
Niu, Yuming | Ford Motor Company |
Zhao, Ding | Carnegie Mellon University |
Keywords: Intelligent Transportation Systems, Representation Learning, Reinforcement Learning
Abstract: In the domain of autonomous driving, offline Reinforcement Learning (RL) approaches exhibit notable efficacy in addressing sequential decision-making problems from offline datasets. However, how to maintain safety under diverse safety-critical scenarios remains a significant challenge due to the long-tailed and unforeseen scenarios absent from offline datasets. In this paper, we introduce the saFety-aware strUctured Scenario representatION (FUSION), a pioneering representation learning method in offline RL that facilitates the learning of a generalizable end-to-end driving policy by leveraging structured scenario information. FUSION capitalizes on the causal relationships between the decomposed reward, cost, state, and action spaces, constructing a framework for structured sequential reasoning in dynamic traffic environments. We conduct rigorous evaluations in two typical real-world settings of distribution shift in autonomous vehicles, demonstrating the good balance between safety cost and utility reward that FUSION achieves compared to contemporary state-of-the-art safe RL and imitation learning (IL) baselines. Empirical evidence under diverse driving scenarios attests that FUSION significantly enhances the safety and generalizability of autonomous driving agents, even in the face of challenging and unseen environments. Furthermore, our ablation studies reveal noticeable improvements from integrating the causal representation into the offline safe RL algorithm. Our project page is available at: https://sites.google.com/view/safe-fusion/
|
|
16:45-17:00, Paper TuBT8.4 | |
Decoupling-Based LPV Observer for Driver Torque Intervention Estimation in Human-Machine Shared Driving under Uncertain Vehicle Dynamics (I) |
|
Nguyen, Anh-Tu | INSA Hauts-De-France, Université Polytechnique Hauts-De-France |
Guerra, Thierry Marie | Polytechnic University Hauts-De-France |
Sentouh, Chouki | LAMIH UMR CNRS 8201, Université Polytechnique Hauts-De-France |
Popieul, Jean-Christophe | Université Polytechnique Hauts-De-France |
Keywords: Intelligent Transportation Systems, Robust/Adaptive Control, Neural and Fuzzy Control
Abstract: This paper proposes a method for simultaneous estimation of both the driver torque and the sideslip angle within the context of human-machine shared driving control for autonomous ground vehicles. To this end, the driver torque is considered as an unknown input (UI) and the sideslip angle as an unmeasured state of the vehicle dynamics system. For simultaneous estimation, a decoupling-based technique is leveraged to design an unknown input observer (UIO). The UIO design goal is to decouple the effect of the unknown driver torque while minimizing the influence of the modeling uncertainties, considered as unknown exogenous disturbances, arising from the lateral tire forces and the steering system. A linear parameter-varying (LPV) framework is used to deal with the time-varying nature of the vehicle longitudinal speed. Based on Lyapunov stability theory, we derive sufficient conditions, expressed as linear matrix inequality (LMI) constraints, for LPV unknown input observer design. The simultaneous vehicle estimation is reformulated as a convex optimization problem, where the influence of the modeling uncertainty is minimized via an L-infinity gain performance criterion. Hardware-in-the-loop (HiL) tests are performed with the SHERPA dynamic simulator and a human driver to show the effectiveness of the proposed UIO-based estimation method, especially within the cooperative driving control framework.
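As a hedged illustration of the LMI machinery referenced above (not the paper's observer conditions, which additionally involve decoupling and L-infinity gain constraints), a basic Lyapunov-type LMI for an observer error system can be posed and solved with a convex modeling tool such as cvxpy, assuming it is installed.
    import numpy as np
    import cvxpy as cp

    def lyapunov_lmi_feasible(A_err, eps=1e-6):
        """Check feasibility of P > 0, A_err^T P + P A_err < 0, i.e. whether the
        error dynamics x_dot = A_err x are asymptotically stable (toy example)."""
        n = A_err.shape[0]
        P = cp.Variable((n, n), symmetric=True)
        constraints = [P >> eps * np.eye(n),
                       A_err.T @ P + P @ A_err << -eps * np.eye(n)]
        problem = cp.Problem(cp.Minimize(0), constraints)
        problem.solve()
        return problem.status == cp.OPTIMAL

    # Hypothetical usage with a stable error matrix:
    # print(lyapunov_lmi_feasible(np.array([[-1.0, 0.5], [0.0, -2.0]])))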
|
|
TuBT9 |
Room 9 |
Semantic Scene Understanding I |
Regular session |
Chair: Beltrame, Giovanni | Ecole Polytechnique De Montreal |
|
16:00-16:15, Paper TuBT9.1 | |
Follow Anything: Open-Set Detection, Tracking, and Following in Real-Time |
|
Maalouf, Alaa | MIT |
Jadhav, Ninad | Harvard University |
Jatavallabhula, Krishna Murthy | MIT |
Chahine, Makram | Massachusetts Institute of Technology |
Vogt, Daniel | Harvard University |
Wood, Robert | Harvard University |
Torralba, Antonio | MIT |
Rus, Daniela | MIT |
Keywords: AI-Enabled Robotics, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed follow anything (FAn), is an open-vocabulary and multimodal model -- it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source our code on our project webpage: https://github.com/alaamaalouf/FollowAnything. We also encourage the reader to watch our 5-minute explainer video: https://www.youtube.com/watch?v=6Mgt3EPytrw.
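A minimal sketch of the matching idea (not FAn's implementation): a query embedding from a foundation model can be compared against per-pixel feature descriptors by cosine similarity to obtain a detection heat map; the feature extraction itself is assumed to be given, and the threshold is illustrative.
    import numpy as np

    def similarity_map(pixel_features, query_embedding):
        """Cosine similarity between a query embedding (D,) and per-pixel
        descriptors (H, W, D); higher values indicate likely matches."""
        feats = pixel_features / (np.linalg.norm(pixel_features, axis=-1, keepdims=True) + 1e-8)
        query = query_embedding / (np.linalg.norm(query_embedding) + 1e-8)
        return feats @ query                      # (H, W) heat map

    def detect(pixel_features, query_embedding, threshold=0.3):
        """Binary mask of pixels matching the query above a similarity threshold."""
        return similarity_map(pixel_features, query_embedding) > threshold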
|
|
16:15-16:30, Paper TuBT9.2 | |
FM-Fusion: Instance-Aware Semantic Mapping Boosted by Vision-Language Foundation Models |
|
Liu, Chuhao | Hong Kong University of Science and Technology |
Wang, Ke | Chang'an University |
Shi, Jieqi | The Hong Kong University of Science and Technology |
Qiao, Zhijian | Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Semantic Scene Understanding, Mapping, RGB-D Perception
Abstract: Semantic mapping based on supervised object detectors is sensitive to the image distribution. Object detection and segmentation performance can drop severely in real-world environments, preventing semantic mapping from being used in wider domains. On the other hand, vision-language foundation models demonstrate strong zero-shot transferability across data distributions, providing an opportunity to construct generalizable instance-aware semantic maps. Hence, this work explores how to boost instance-aware semantic mapping using object detections generated by foundation models. We propose a probabilistic label fusion method to predict close-set semantic classes from open-set label measurements. An instance refinement module merges over-segmented instances caused by inconsistent segmentation. We integrate all modules into a unified semantic mapping system. Reading a sequence of RGB-D inputs, our method incrementally reconstructs an instance-aware semantic map. We evaluate the zero-shot performance of our method on the ScanNet and SceneNN datasets. Our method achieves 40.3 mAP on the ScanNet semantic instance segmentation task, significantly outperforming the traditional semantic mapping method.
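As an illustrative sketch of probabilistic label fusion (not necessarily the paper's exact model), repeated open-set label observations of one instance can be fused by accumulating log-likelihoods over a close-set class vocabulary; the likelihood table is a hypothetical input.
    import numpy as np

    def fuse_labels(observations, likelihood, num_classes):
        """Fuse open-set label observations of one instance into a posterior over
        close-set classes.
        observations      : list of observed open-set label ids
        likelihood[obs][c] : P(observed label | true close-set class c)."""
        log_posterior = np.zeros(num_classes)        # uniform prior in log space
        for obs in observations:
            log_posterior += np.log(np.asarray(likelihood[obs]) + 1e-12)
        log_posterior -= log_posterior.max()         # numerical stabilization
        posterior = np.exp(log_posterior)
        return posterior / posterior.sum()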
|
|
16:30-16:45, Paper TuBT9.3 | |
Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation |
|
Ji-Yeon, Kim | POSTECH |
Oh, Hyun-Bin | POSTECH |
Kwon, Byungki | Pohang University of Science and Technology |
Kim, Dahun | KAIST |
Kwon, Yongjin | Electronics and Telecommunications Research Institute |
Oh, Tae-Hyun | POSTECH |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Visual Learning
Abstract: We present Uni-DVPS, a unified model for Depth-aware Video Panoptic Segmentation (DVPS) that jointly tackles distinct vision tasks, i.e., video panoptic segmentation, monocular depth estimation, and object tracking. In contrast to prior works that adopt diverged decoder networks tailored to each task, we propose an architecture with a unified Transformer decoder network. We design a single Transformer decoder network for multi-task learning to increase shared operations, facilitate synergies between tasks, and achieve high efficiency. We also observe that our unified query learns an instance-aware representation guided by multi-task supervision, which encourages query-based tracking and obviates the need for training an extra tracking module. We validate our architectural design choices with experiments on the Cityscapes-DVPS and SemKITTI-DVPS datasets. The performance of all tasks is jointly improved, and we achieve state-of-the-art results on the DVPQ metric for both datasets.
|
|
16:45-17:00, Paper TuBT9.4 | |
BEVGM: A Visual Place Recognition Method with Bird's Eye View Graph Matching |
|
Niu, Haochen | Shanghai Jiao Tong University |
Liu, Peilin | Shanghai Jiao Tong University |
Ji, Xingwu | Shanghai Jiao Tong University |
Zhang, Lantao | Shanghai Jiao Tong University |
Ying, Rendong | Shangha | |