| |
Last updated on May 13, 2025. This conference program is tentative and subject to change.
Technical Program for Thursday May 22, 2025
|
ThAT1 |
302 |
Planning and Large Language Models |
Regular Session |
Chair: Ikeuchi, Katsushi | Microsoft |
Co-Chair: Paulius, David | Brown University |
|
08:30-08:35, Paper ThAT1.1 | |
DELTA: Decomposed Efficient Long-Term Robot Task Planning Using Large Language Models |
|
Liu, Yuchen | Robert Bosch GmbH |
Palmieri, Luigi | Robert Bosch GmbH |
Koch, Sebastian | Ulm University, Robert Bosch GmbH |
Georgievski, Ilche | University of Stuttgart |
Aiello, Marco | University of Stuttgart |
Keywords: Task Planning, AI-Based Methods, Planning, Scheduling and Coordination
Abstract: Recent advancements in Large Language Models (LLMs) have sparked a revolution across many research fields. In robotics, the integration of common-sense knowledge from LLMs into task and motion planning has drastically advanced the field by unlocking unprecedented levels of context awareness. Despite their vast collection of knowledge, large language models may generate infeasible plans due to hallucinations or missing domain information. To address these challenges and improve plan feasibility and computational efficiency, we introduce DELTA, a novel LLM-informed task planning approach. By using scene graphs as environment representations within LLMs, DELTA achieves rapid generation of precise planning problem descriptions. To enhance planning performance, DELTA decomposes long-term task goals with LLMs into an autoregressive sequence of sub-goals, enabling automated task planners to efficiently solve complex problems. In our extensive evaluation, we show that DELTA enables an efficient and fully automatic task planning pipeline, achieving higher planning success rates and significantly shorter planning times compared to the state of the art.
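A minimal Python sketch of the goal-decomposition loop described in the abstract follows; the scene-graph text, LLM call, and PDDL planner interfaces are passed in as placeholders because the authors' actual interfaces are not given here.

def decompose_and_plan(scene_graph, goal, domain_pddl,
                       query_llm, build_problem, solve_pddl, apply_plan):
    """Ask an LLM to split a long-term goal into ordered sub-goals, then plan each.
    query_llm / build_problem / solve_pddl / apply_plan are hypothetical callables."""
    prompt = ("Environment (scene graph):\n" + scene_graph +
              "\nLong-term goal: " + goal +
              "\nList an ordered sequence of sub-goals, one per line.")
    sub_goals = [s for s in query_llm(prompt).splitlines() if s.strip()]
    full_plan, state = [], scene_graph
    for sub_goal in sub_goals:
        problem_pddl = build_problem(state, sub_goal)   # small sub-problem keeps the planner fast
        plan = solve_pddl(domain_pddl, problem_pddl)
        if plan is None:                                # infeasible sub-goal: caller can re-prompt
            return None
        full_plan.extend(plan)
        state = apply_plan(state, plan)                 # advance the symbolic state
    return full_plan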
|
|
08:35-08:40, Paper ThAT1.2 | |
Hey Robot! Personalizing Robot Navigation through Model Predictive Control with a Large Language Model |
|
Martinez-Baselga, Diego | University of Zaragoza |
de Groot, Oscar | Delft University of Technology |
Knoedler, Luzia | Delft University of Technology |
Alonso-Mora, Javier | Delft University of Technology |
Riazuelo, Luis | Instituto de Investigación en Ingeniería de Aragón, University of Zaragoza |
Montano, Luis | Universidad De Zaragoza |
Keywords: Motion and Path Planning, Human-Centered Robotics, Human-Aware Motion Planning
Abstract: Robot navigation methods allow mobile robots to operate in applications such as warehouses or hospitals. While the environment in which the robot operates imposes requirements on its navigation behavior, most existing methods do not allow the end-user to configure the robot's behavior and priorities, possibly leading to undesirable behavior (e.g., fast driving in a hospital). We propose a novel approach to adapt robot motion behavior based on natural language instructions provided by the end-user. Our zero-shot method uses an existing Visual Language Model to interpret a user text query or an image of the environment. This information is used to generate the cost function and reconfigure the parameters of a Model Predictive Controller, translating the user's instruction to the robot's motion behavior. This allows our method to safely and effectively navigate in dynamic and challenging environments. We extensively evaluate our method's individual components and demonstrate the effectiveness of our method on a ground robot in simulation and real-world experiments, and across a variety of environments and user specifications.
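To make the idea concrete, here is a hedged Python sketch of translating a user instruction into validated controller parameters; the parameter names, JSON schema, and query_vlm callable are illustrative assumptions rather than the authors' interface.

import json

DEFAULT_PARAMS = {"w_speed": 1.0, "w_clearance": 1.0, "v_max": 1.5}  # assumed names and units

def personalize_mpc(user_text, query_vlm, params=DEFAULT_PARAMS):
    """Ask a VLM for parameter updates and apply only validated, positive values."""
    prompt = ("User instruction: " + user_text + "\n"
              "Return JSON updating any of the keys " + ", ".join(params) +
              "; values must be positive numbers.")
    try:
        updates = json.loads(query_vlm(prompt))
    except (TypeError, ValueError):
        return dict(params)                       # malformed output: keep safe defaults
    new_params = dict(params)
    for key, value in updates.items():
        if key in new_params and isinstance(value, (int, float)) and value > 0:
            new_params[key] = float(value)        # only known keys pass through to the controller
    return new_params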
|
|
08:40-08:45, Paper ThAT1.3 | |
Large Language Model Based Autonomous Task Planning for Abstract Commands |
|
Kwon, Seokjoon | Korea Advanced Institute of Science and Technology |
Park, Jae-Hyeon | Samsung Display |
Jang, Hee-Deok | Korea Advanced Institute of Science and Technology |
Roh, CheolLae | Samsung Display Co |
Chang, Dong Eui | KAIST |
Keywords: Task Planning, Computer Vision for Automation, Robotics and Automation in Life Sciences
Abstract: Recent advances in large language models (LLMs) have demonstrated exceptional reasoning capabilities in natural language processing, sparking interest in applying LLMs to task planning problems in robotics. Most studies focused on task planning for clear natural language commands that specify target objects and their locations. However, for more user-friendly task execution, it is crucial for robots to autonomously plan and carry out tasks based on abstract natural language commands that may not explicitly mention target objects or locations, such as ‘Put the food ingredients in the same place.’ In this study, we propose an LLM-based autonomous task planning framework that generates task plans for abstract natural language commands. This framework consists of two phases: an environment recognition phase and a task planning phase. In the environment recognition phase, a large vision-language model generates a hierarchical scene graph that captures the relationships between objects and spaces in the environment surrounding a robot agent. During the task planning phase, an LLM uses the scene graph and the abstract user command to formulate a plan for the given task. We validate the effectiveness of the proposed framework in the AI2THOR simulation environment, demonstrating its superior performance in task execution when handling abstract commands.
|
|
08:45-08:50, Paper ThAT1.4 | |
Self-Corrective Task Planning by Inverse Prompting with Large Language Models |
|
Lee, Jiho | Chung-Ang University |
Lee, Hayun | Chung-Ang University |
Kim, Jonghyeon | Chung-Ang University |
Lee, Kyungjae | Korea University |
Kim, Eunwoo | Chung-Ang University |
Keywords: Task Planning
Abstract: In robot task planning, large language models (LLMs) have shown significant promise in generating complex and long-horizon action sequences. However, it is observed that LLMs often produce responses that sound plausible but are not accurate. To address these problems, existing methods typically employ predefined error sets or external knowledge sources, requiring human efforts and computation resources. Recently, self-correction approaches have emerged, where LLM generates and refines plans, identifying errors by itself. Despite their effectiveness, they are more prone to failures in correction due to insufficient reasoning. In this paper, we propose a novel self-corrective planning of tasks with inverse prompting, named InversePrompt, which contains reasoning steps to provide interpretable groundings for feedback. It generates the inverse actions corresponding to generated actions and verifies if these inverse actions can restore the system to its original state, thereby explicitly validating the logical flow of the generated plans. The results on benchmark datasets show an average 16.3% higher success rate over existing LLM-based task planning methods. Our approach offers clearer justifications for feedback in real-world environments, resulting in more successful task completion than existing self-correction approaches across various scenarios.
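A minimal Python sketch of the inverse-action check described above; the LLM, simulator, and state-comparison callables are hypothetical placeholders, since the paper's concrete interfaces are not reproduced here.

def verify_with_inverse_actions(init_state, plan, query_llm, simulate, states_equal):
    """Return (ok, feedback). simulate(state, actions) -> new state; all callables are assumed."""
    end_state = simulate(init_state, plan)
    prompt = ("For each action below, give the inverse action that undoes it, "
              "listed in reverse order:\n" + "\n".join(plan))
    inverse_plan = [a for a in query_llm(prompt).splitlines() if a.strip()]
    restored = simulate(end_state, inverse_plan)
    if states_equal(restored, init_state):
        return True, "inverse actions restore the initial state"
    return False, "plan is not invertible; preconditions or effects are likely inconsistent"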
|
|
08:50-08:55, Paper ThAT1.5 | |
Traffic Regulation-Aware Path Planning with Regulation Databases and Vision-Language Models |
|
Han, Xu | University of California Los Angeles |
Wu, Zhiwen | University of California, Los Angeles |
Xia, Xin | University of California, Los Angeles |
Ma, Jiaqi | University of California, Los Angeles |
Keywords: Motion and Path Planning, Integrated Planning and Control, Planning under Uncertainty
Abstract: This paper introduces and tests a framework that integrates traffic regulation compliance into automated driving systems (ADS). The framework enables ADS to follow traffic laws and make informed decisions based on the driving environment. Using RGB camera inputs and a vision-language model (VLM), the system generates descriptive text to support a regulation-aware decision-making process, ensuring legal and safe driving practices. This information is combined with a machine-readable ADS regulation database to guide future driving plans within legal constraints. Key features include: 1) a regulation database supporting ADS decision-making, 2) an automated process using sensor input for regulation-aware path planning, and 3) validation in both simulated and real-world environments. Particularly, the real-world vehicle tests not only assess the framework's performance but also evaluate the potential and challenges of VLMs to solve complex driving problems by integrating detection, reasoning, and planning. This work enhances the legality, safety, and public trust in ADS, representing a significant step forward in the field.
|
|
08:55-09:00, Paper ThAT1.6 | |
DrPlanner: Diagnosis and Repair of Motion Planners for Automated Vehicles Using Large Language Models |
|
Lin, Yuanfei | Technical University of Munich |
Li, Chenran | University of California, Berkeley |
Ding, Mingyu | UC Berkeley |
Tomizuka, Masayoshi | University of California |
Zhan, Wei | University of California, Berkeley |
Althoff, Matthias | Technische Universität München |
Keywords: Integrated Planning and Learning, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm has achieved perfection in the literature, and improving its performance is often time-consuming and labor-intensive. To tackle the aforementioned issues, we present DrPlanner, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories from both natural and programming languages. Leveraging the profound capabilities of large language models in addressing reasoning challenges, our framework returns repaired planners with detailed diagnostic descriptions. Furthermore, the framework advances iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using both search- and sampling-based motion planners for automated vehicles; experimental results highlight the need for demonstrations in the prompt and show the ability of our framework to effectively identify and rectify elusive issues.
|
|
ThAT2 |
301 |
SLAM 5 |
Regular Session |
Chair: Zelek, John S. | University of Waterloo |
Co-Chair: Younès, Raoui | University Mohammed V in Rabat |
|
08:30-08:35, Paper ThAT2.1 | |
MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization |
|
Zhu, Pengcheng | Northeastern University |
Zhuang, Yaoming | Northeastern University |
Chen, Baoquan | Northeastern University |
Li, Li | Northeastern University |
Wu, Chengdong | Northeastern University |
Liu, Zhanlin | University of Washington |
Keywords: SLAM, Mapping
Abstract: This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently, SLAM based on Gaussian Splatting has shown promising results. However, in monocular scenarios, the reconstructed Gaussian maps lack geometric accuracy and exhibit weaker tracking capability. To address these limitations, we jointly optimize sparse visual odometry tracking and the 3D Gaussian Splatting scene representation for the first time. Depth maps are estimated over the visual odometry keyframe window using a fast Multi-View Stereo (MVS) network to provide geometric supervision for the Gaussian maps. Furthermore, we propose a depth smooth loss and a Sparse-Dense Adjustment Ring (SDAR) to reduce the negative effect of the estimated depth maps and preserve the consistency in scale between the visual odometry and the Gaussian maps. We have evaluated our system across various synthetic and real-world datasets. Our pose estimation accuracy surpasses existing methods and achieves state-of-the-art performance. Additionally, our system outperforms previous monocular methods in terms of novel view synthesis and geometric reconstruction fidelity.
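For illustration, a common edge-aware formulation of a depth smoothness term is sketched below in PyTorch; the paper's exact loss may differ.

import torch

def depth_smooth_loss(depth: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """depth: (B,1,H,W), image: (B,3,H,W); penalize depth gradients except at image edges."""
    d_dx = torch.abs(depth[:, :, :, 1:] - depth[:, :, :, :-1])
    d_dy = torch.abs(depth[:, :, 1:, :] - depth[:, :, :-1, :])
    i_dx = torch.mean(torch.abs(image[:, :, :, 1:] - image[:, :, :, :-1]), 1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, 1:, :] - image[:, :, :-1, :]), 1, keepdim=True)
    # Down-weight the penalty where the image itself has strong gradients (likely true edges).
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()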
|
|
08:35-08:40, Paper ThAT2.2 | |
GARAD-SLAM: 3D GAussian Splatting for Real-Time Anti Dynamic SLAM |
|
Li, Mingrui | Dalian University of Technology |
Chen, Weijian | Sun Yat-Sen University |
Cheng, Na | Dalian University of Technology |
Xu, Jingyuan | Dalian University of Technology |
Li, Dong | University of Macau |
Wang, Hongyu | Dalian University of Technology |
Keywords: SLAM, Mapping, Localization
Abstract: The 3D Gaussian Splatting (3DGS)-based SLAM system has garnered widespread attention due to its excellent performance in real-time high-fidelity rendering. However, in real-world environments filled with dynamic objects, existing 3DGS-based SLAM systems often face mapping errors and tracking drift issues. To address this, we propose GARAD-SLAM, a real-time 3DGS-based SLAM system tailored for dynamic scenes. In terms of tracking, unlike traditional methods, we directly perform dynamic segmentation on Gaussians and map them back to the front end to obtain dynamic point labels through a Gaussian pyramid network, achieving precise dynamic removal and robust tracking. For mapping, we impose rendering penalties on dynamically labeled Gaussians updated through the network to avoid irreversible erroneous removal caused by simple pruning. Our results on real-world datasets demonstrate that our method is competitive in tracking compared to baseline methods, generating fewer artifacts and higher-quality reconstructions in rendering.
|
|
08:40-08:45, Paper ThAT2.3 | |
Optimizing NeRF-Based SLAM with Trajectory Smoothness Constraints |
|
He, Yicheng | Southern University of Science and Technology |
Chen, Guangcheng | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: SLAM, Localization, Mapping
Abstract: The joint optimization of Neural Radiance Fields (NeRF) and camera trajectories has been widely applied in SLAM tasks due to its superior dense mapping quality and consistency. NeRF-based SLAM learns camera poses using constraints from the implicit map representation. A widely observed phenomenon that results from constraints of this form is jerky and physically unrealistic estimated camera motion, which in turn affects the map quality. To address this deficiency of current NeRF-based SLAM, we propose in this paper TS-SLAM (TS for Trajectory Smoothness). It introduces smoothness constraints on camera trajectories by representing them with uniform cubic B-splines with continuous acceleration, which guarantees smooth camera motion. Benefiting from the differentiability and local control properties of B-splines, TS-SLAM can incrementally learn the control points end-to-end using a sliding window paradigm. Additionally, we regularize camera trajectories by exploiting a dynamics prior to further smooth the trajectories. Experimental results demonstrate that TS-SLAM achieves superior trajectory accuracy and improves mapping quality versus NeRF-based SLAM that does not employ the above smoothness constraints.
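As a small illustration of the representation, the sketch below evaluates a uniform cubic B-spline from control points (translations only; TS-SLAM works with full SE(3) poses).

import numpy as np

# Standard uniform cubic B-spline basis matrix.
M = (1.0 / 6.0) * np.array([[1, 4, 1, 0],
                             [-3, 0, 3, 0],
                             [3, -6, 3, 0],
                             [-1, 3, -3, 1]])

def bspline_position(ctrl_pts: np.ndarray, i: int, u: float) -> np.ndarray:
    """ctrl_pts: (N,3) control points; segment i uses points i..i+3; u in [0,1)."""
    U = np.array([1.0, u, u * u, u ** 3])
    return U @ M @ ctrl_pts[i:i + 4]          # (3,) interpolated position

# Example: sample a short trajectory densely between control points.
ctrl = np.array([[0, 0, 0], [1, 0, 0], [2, 1, 0], [3, 1, 0], [4, 0, 0]], float)
samples = [bspline_position(ctrl, i, u) for i in range(len(ctrl) - 3)
           for u in np.linspace(0, 1, 10, endpoint=False)]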
|
|
08:45-08:50, Paper ThAT2.4 | |
MGSO: Monocular Real-Time Photometric SLAM with Efficient 3D Gaussian Splatting |
|
Hu, Kevin | University of Waterloo |
Abboud, Nicolas | American University of Beirut |
Ali, Muhammad Q. | University of Waterloo |
Yang, Adam Srebrnjak | University of Waterloo |
Elhajj, Imad | American University of Beirut |
Asmar, Daniel | American University of Beirut |
Chen, Yuhao | University of Waterloo |
Zelek, John S. | University of Waterloo |
Keywords: SLAM, Mapping, Vision-Based Navigation
Abstract: Real-time SLAM with dense 3D mapping is computationally challenging, especially on resource-limited devices. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approach for real-time dense 3D reconstruction. However, existing 3DGS-based SLAM systems struggle to balance hardware simplicity, speed, and map quality. Most systems excel in one or two of the aforementioned aspects but rarely achieve all. A key issue is the difficulty of initializing 3D Gaussians while concurrently conducting SLAM. To address these challenges, we present Monocular GSO (MGSO), a novel real-time SLAM system that integrates photometric SLAM with 3DGS. Photometric SLAM provides dense structured point clouds for 3DGS initialization, accelerating optimization and producing more efficient maps with fewer Gaussians. As a result, experiments show that our system generates reconstructions with a balance of quality, memory efficiency, and speed that outperforms the state of the art. Furthermore, our system achieves all results using RGB inputs. We evaluate on the Replica, TUM-RGBD, and EuRoC datasets against current live dense reconstruction systems. Not only do we surpass contemporary systems, but experiments also show that we maintain our performance on laptop hardware, making it a practical solution for robotics, AR, and other real-time applications.
|
|
08:50-08:55, Paper ThAT2.5 | |
RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes |
|
Yu, Sicheng | HKUST(gz) |
Cheng, Chong | HKUST(GZ) |
Zhou, Yifan | The Hong Kong University of Science and Technology (Guangzhou) |
Yang, Xiaojun | The Hong Kong University of Science and Technology (Guangzhou) |
Wang, Hao | HKUST(GZ) |
Keywords: Deep Learning for Visual Perception, Visual Learning, SLAM
Abstract: 3D Gaussian Splatting (3DGS) has become a popular solution in SLAM, as it can produce high-fidelity novel views. However, previous GS-based methods primarily target indoor scenes and rely on RGB-D sensors or pre-trained depth estimation models, hence underperforming in outdoor scenarios. To address this issue, we propose an RGB-only Gaussian Splatting SLAM method for unbounded outdoor scenes—OpenGS-SLAM. Technically, we first employ a pointmap regression network to generate consistent pointmaps between frames for pose estimation. Compared to commonly used depth maps, pointmaps include spatial relationships and scene geometry across multiple views, enabling robust camera pose estimation. Then, we propose integrating the estimated camera poses with 3DGS rendering as an end-to-end differentiable pipeline. Our method achieves simultaneous optimization of camera poses and 3DGS scene parameters, significantly enhancing system tracking accuracy. In addition, we design an adaptive scale mapper for the pointmap regression network, which provides more accurate pointmap mapping to the 3DGS map representation. Our experiments on the Waymo dataset demonstrate that OpenGS-SLAM reduces tracking error to 9.8% of previous 3DGS methods, and achieves state-of-the-art results in novel view synthesis. Project page: https://opengsslam.github.io/.
|
|
08:55-09:00, Paper ThAT2.6 | |
FGO-SLAM: Enhancing Gaussian SLAM with Globally Consistent Opacity Radiance Field |
|
Zhu, Fan | University of Science and Technology of China |
Zhao, Yifan | University of Science and Technology of China |
Chen, Ziyu | University of Science and Technology of China |
Yu, Biao | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Zhu, Hui | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Keywords: Mapping, SLAM, Embodied Cognitive Science
Abstract: Visual SLAM has regained attention due to its ability to provide perception capabilities and simulation test data for Embodied AI. However, traditional SLAM systems struggle to meet the demands of high-quality scene reconstruction, and Gaussian SLAM systems, despite their rapid rendering and high-quality mapping capabilities, lack effective pose optimization methods and face challenges in geometric reconstruction. To address these issues, we introduce FGO-SLAM, a Gaussian SLAM system that employs an opacity radiance field as the scene representation to enhance geometric mapping performance. After initial pose estimation, we apply global adjustment to optimize camera poses and sparse point cloud, ensuring robust tracking of our system. Additionally, we maintain a globally consistent opacity radiance field based on 3D Gaussians and introduce depth distortion and normal consistency terms to refine the scene representation. Furthermore, after constructing tetrahedral grids, we identify level sets to directly extract surfaces from 3D Gaussians. Results across various real-world and large-scale synthetic datasets demonstrate that our method achieves state-of-the-art tracking accuracy and mapping performance.
|
|
ThAT3 |
303 |
Point Cloud Registration |
Regular Session |
Chair: Fraundorfer, Friedrich | Graz University of Technology |
Co-Chair: Lim, Hyungtae | Massachusetts Institute of Technology |
|
08:30-08:35, Paper ThAT3.1 | |
Multi-View Registration of Partially Overlapping Point Clouds for Robotic Manipulation |
|
Xie, Yuzhen | Southeast University |
Song, Aiguo | Southeast University |
Keywords: RGB-D Perception, Computer Vision for Automation, Data Sets for Robotic Vision
Abstract: Point cloud registration is a fundamental task in intelligent robots, aiming to achieve globally consistent geometric structures and providing data support for robotic manipulation. Due to the limited view of measurement devices, it is necessary to collect point clouds from multiple views to construct a complete model. Previous multi-view registration methods rely on sufficient overlap and on registering all pairs of point clouds, resulting in slow convergence and high cumulative errors. To address these challenges, we present a multi-view registration method based on the point-to-plane model and a pose graph. We introduce a robust kernel into the objective function to diminish registration errors caused by mismatched points. Additionally, an enhanced Euclidean clustering method is proposed for extracting object point clouds. Subsequently, by establishing pose constraints on non-adjacent frames of point clouds, the cumulative error is reduced, achieving global optimization based on the pose graph. Experimental results demonstrate the robustness of our method with respect to overlap ratios, successfully registering point clouds with overlap ratios exceeding 30%. In comparison to other techniques, our method reduces the E(R) of multi-view registration by 13.54% and E(t) by 18.72%, effectively reducing the cumulative error.
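A brief sketch of a robust-kernel point-to-plane cost of the kind referred to above; Huber is one possible kernel, and the threshold here is illustrative rather than the paper's choice.

import numpy as np

def huber_weight(r: np.ndarray, delta: float = 0.02) -> np.ndarray:
    """IRLS weights for the Huber loss: 1 inside the inlier band, delta/|r| outside."""
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))

def weighted_point_to_plane_error(src, tgt, normals, delta=0.02):
    """src, tgt, normals: (N,3) matched points and target normals in a common frame."""
    r = np.einsum("ij,ij->i", src - tgt, normals)   # signed point-to-plane residuals
    w = huber_weight(r, delta)
    return float(np.sum(w * r * r)), w              # weighted cost and per-point weights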
|
|
08:35-08:40, Paper ThAT3.2 | |
Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces |
|
Guadagnino, Tiziano | University of Bonn |
Mersch, Benedikt | University of Bonn |
Vizzo, Ignacio | Dexory |
Gupta, Saurabh | University of Bonn |
Malladi, Meher Venkata Ramakrishna | University of Bonn |
Lobefaro, Luca | University of Bonn |
Doisy, Guillaume | Dexory |
Stachniss, Cyrill | University of Bonn |
Keywords: Localization, Mapping
Abstract: LiDAR odometry is essential for many robotics applications, including 3D mapping, navigation, and simultaneous localization and mapping. LiDAR odometry systems are usually based on some form of point cloud registration to compute the ego-motion of a mobile robot. Yet, few of today's LiDAR odometry systems consider domain-specific knowledge or the kinematic model of the mobile platform during the point cloud alignment. In this paper, we present Kinematic-ICP, a LiDAR odometry system that focuses on wheeled mobile robots equipped with a 3D LiDAR and moving on a planar surface, which is a common assumption for warehouses, offices, hospitals, etc. Our approach introduces kinematic constraints within the optimization of a traditional point-to-point iterative closest point scheme. In this way, the resulting motion follows the kinematic constraints of the platform, effectively exploiting the robot's wheel odometry and the 3D LiDAR observations. We dynamically adjust the influence of LiDAR measurements and wheel odometry in our optimization scheme, allowing the system to handle degenerate scenarios such as feature-poor corridors. We evaluate our approach on robots operating in large-scale warehouse environments, but also outdoors. The experiments show that our approach achieves top performances and is more accurate than wheel odometry and common LiDAR odometry systems. Kinematic-ICP has been recently deployed in the Dexory fleet of robots operating in warehouses worldwide at their customers' sites, showing that our method can run in the real world alongside a complete navigation stack.
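To illustrate the flavour of such a kinematic constraint, the sketch below projects an unconstrained planar pose increment onto a constant-velocity unicycle arc; this is a simplified stand-in, not the authors' optimization scheme.

import numpy as np

def project_to_unicycle(dx, dy, dtheta, dt=0.1):
    """Project an unconstrained SE(2) increment onto a constant (v, omega) arc over dt."""
    omega = dtheta / dt
    if abs(dtheta) < 1e-9:
        return (dx, 0.0, 0.0), (dx / dt, 0.0)        # straight motion; lateral slip dy is dropped
    s, c = np.sin(dtheta), 1.0 - np.cos(dtheta)
    # Least-squares fit of the turning radius r to dx = r*sin(dtheta), dy = r*(1 - cos(dtheta)).
    r = (dx * s + dy * c) / (s * s + c * c)
    v = r * omega                                     # executable forward velocity
    return (r * s, r * c, dtheta), (v, omega)         # increment the platform can actually realize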
|
|
08:40-08:45, Paper ThAT3.3 | |
GERA: Geometric Embedding for Efficient Point Registration Analysis |
|
Li, Geng | Shandong University |
Cao, Haozhi | Nanyang Technological University |
Liu, Mingyang | Shandong University |
Yuan, Shenghai | Nanyang Technological University |
Yang, Jianfei | Nanyang Technological University |
Keywords: Computer Vision for Medical Robotics, Representation Learning, Medical Robots and Systems
Abstract: Point cloud registration aims to provide estimated transformations to align 3D point clouds, which plays a crucial role in pose estimation of various navigation systems, such as surgical guidance systems and autonomous vehicles. Despite the impressive performance of recent models on benchmark datasets, many rely on complex modules like KPConv and Transformers, which impose significant computational and memory demands. These requirements hinder their practical application, particularly in resource-constrained environments such as mobile robotics. In this paper, we propose a novel point cloud registration network that leverages a pure MLP architecture, constructing geometric information offline. This approach eliminates the computational and memory burdens associated with traditional complex feature extractors and significantly reduces training time and resource consumption. Our method is the first to replace 3D coordinate inputs with offline-constructed geometric encoding, improving generalization and stability, as demonstrated by Maximum Mean Discrepancy (MMD) comparisons. This efficient and accurate geometric representation marks a significant advancement in point cloud analysis, particularly for applications requiring fast and reliable processing.
|
|
08:45-08:50, Paper ThAT3.4 | |
KISS-Matcher: Fast and Robust Point Cloud Registration Revisited |
|
Lim, Hyungtae | Massachusetts Institute of Technology |
Kim, Daebeom | Korea Advanced Institute of Science and Technology |
Shin, Gunhee | KAIST |
Shi, Jingnan | Massachusetts Institute of Technology |
Vizzo, Ignacio | Dexory |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Park, Jaesik | Seoul National University |
Carlone, Luca | Massachusetts Institute of Technology |
Keywords: Mapping, Localization, SLAM
Abstract: While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers. In this paper, we take a holistic view on the registration problem and develop an open-source and versatile C++ library for point cloud registration, called KISS-Matcher. KISS-Matcher combines a novel feature detector, Faster-PFH, that improves over the classical fast point feature histogram (FPFH). Moreover, it adopts a k-core-based graph-theoretic pruning to reduce the time complexity of rejecting outlier correspondences. Finally, it combines these modules in a complete, user-friendly, and ready-to-use pipeline. As verified by extensive experiments, KISS-Matcher has superior scalability and broad applicability, achieving a substantial speed-up compared to state-of-the-art outlier-robust registration pipelines while preserving accuracy. Our code will be available at https://github.com/MIT-SPARK/KISS-Matcher.
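The sketch below illustrates k-core-based pruning of a correspondence consistency graph using networkx; the pairwise distance-consistency test and its tolerance are assumptions for illustration, not the library's actual implementation.

import itertools
import networkx as nx
import numpy as np

def kcore_prune(src, tgt, k=5, tol=0.05):
    """src, tgt: (N,3) putative correspondences. Return indices surviving the k-core."""
    G = nx.Graph()
    G.add_nodes_from(range(len(src)))
    for i, j in itertools.combinations(range(len(src)), 2):
        # Two correspondences are mutually consistent if they preserve pairwise distance.
        if abs(np.linalg.norm(src[i] - src[j]) - np.linalg.norm(tgt[i] - tgt[j])) < tol:
            G.add_edge(i, j)
    return sorted(nx.k_core(G, k).nodes)   # O(N^2) graph construction; fine for a sketch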
|
|
08:50-08:55, Paper ThAT3.5 | |
SANDRO: A Robust Solver with a Splitting Strategy for Point Cloud Registration |
|
Adlerstein, Michael | Italian Institute of Technology |
Soares, João Carlos Virgolino | Istituto Italiano Di Tecnologia |
Bratta, Angelo | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: RGB-D Perception, Mapping
Abstract: Point cloud registration is a critical problem in computer vision and robotics, especially in the field of navigation. Current methods often fail when faced with high outlier rates or take a long time to converge to a suitable solution. In this work, we introduce a novel algorithm for point cloud registration called SANDRO (Splitting strategy for point cloud Alignment using Non-convex anD Robust Optimization), which combines an Iteratively Reweighted Least Squares (IRLS) framework with a robust loss function with graduated non-convexity. This approach is further enhanced by a splitting strategy designed to handle high outlier rates and skewed distributions of outliers. SANDRO is capable of addressing important limitations of existing methods, such as challenging scenarios where the presence of high outlier rates and point cloud symmetries significantly hinders convergence. SANDRO achieves superior performance in terms of success rate when compared to state-of-the-art methods, demonstrating a 20% improvement over the current state of the art when tested on the Redwood real dataset and a 60% improvement when tested on synthetic data.
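A compact sketch of graduated non-convexity weights for an IRLS solver, using the Geman-McClure surrogate common in the GNC literature; the initialization and annealing factor below are illustrative choices, not necessarily the paper's.

import numpy as np

def gnc_gm_weights(residuals: np.ndarray, mu: float, c: float) -> np.ndarray:
    """Per-correspondence IRLS weights for the GNC surrogate of the Geman-McClure loss."""
    return (mu * c * c / (residuals ** 2 + mu * c * c)) ** 2

def gnc_schedule(r0_max: float, c: float, factor: float = 1.4):
    """Start nearly convex (large mu) and anneal mu toward 1 between IRLS iterations."""
    mu = max(1.0, r0_max ** 2 / (c * c))
    while mu > 1.0:
        yield mu
        mu = max(1.0, mu / factor)
    yield 1.0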
|
|
08:55-09:00, Paper ThAT3.6 | |
Bridging In-Situ and Satellite Data: Enhancing Gas Concentration Estimation through Integration of Data-Driven and Physics-Based Modeling |
|
Lu, Guoyu | University of Georgia |
Keywords: RGB-D Perception, Vision-Based Navigation, Visual Tracking
Abstract: Gas concentration estimation is crucial for understanding and mitigating climate change. While most research and monitoring efforts focus on major greenhouse gases such as CO2, significantly less attention has been given to trace gases like NO2, which play a critical role in atmospheric chemistry and air quality. This paper aims to enhance trace gas concentration estimation by integrating physics-based models into data-driven neural network frameworks. Furthermore, to improve large-scale estimation accuracy, we incorporate in-situ measurements to refine neural network models trained on satellite observations. The resulting model can provide reliable large-scale gas concentration estimates, particularly for locations lacking precise in-situ measurements. This approach offers a novel pathway to enhance the accuracy and applicability of gas monitoring for climate and environmental research. While NO2 serves as the target trace gas in this study, the proposed framework is potentially applicable to the prediction of other atmospheric gas concentrations.
|
|
ThAT4 |
304 |
Image and 3D Segmentation 1 |
Regular Session |
Chair: Koppal, Sanjeev | University of Florida |
Co-Chair: Matteucci, Matteo | Politecnico Di Milano |
|
08:30-08:35, Paper ThAT4.1 | |
A Novel Decomposed Feature-Oriented Framework for Open-Set Semantic Segmentation on LiDAR Data |
|
Deng, Wenbang | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Yu, Qinghua | National University of Defense Technology |
He, Yunze | Hunan University |
Xiao, Junhao | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Computer Vision for Transportation
Abstract: Semantic segmentation is a key technique that enables mobile robots to understand and navigate surrounding environments autonomously. However, most existing works focus on segmenting known objects, overlooking the identification of unknown classes, which is common in real-world applications. In this paper, we propose a feature-oriented framework for open-set semantic segmentation on LiDAR data, capable of identifying unknown objects while retaining the ability to classify known ones. We design a decomposed dual-decoder network to simultaneously perform closed-set semantic segmentation and generate distinctive features for unknown objects. The network is trained with multi-objective loss functions to capture the characteristics of known and unknown objects. Using the extracted features, we introduce an anomaly detection mechanism to identify unknown objects. By integrating the results of closed-set semantic segmentation and anomaly detection, we achieve effective feature-driven LiDAR open-set semantic segmentation. Evaluations on both SemanticKITTI and nuScenes datasets demonstrate that our proposed framework significantly outperforms state-of-the-art methods. The source code will be made publicly available at https://github.com/nubot-nudt/DOSS.
|
|
08:35-08:40, Paper ThAT4.2 | |
SAM-Guided Pseudo Label Enhancement for Multi-Modal 3D Semantic Segmentation |
|
Yang, Mingyu | University of Michigan |
Lu, Jitong | University of Michigan |
Kim, Hun-Seok | University of Michigan |
Keywords: Deep Learning for Visual Perception, Sensor Fusion
Abstract: Multi-modal 3D semantic segmentation is vital for applications such as autonomous driving and virtual reality (VR). To effectively deploy these models in real-world scenarios, it is essential to employ cross-domain adaptation techniques that bridge the gap between training data and real-world data. Recently, self-training with pseudo-labels has emerged as a predominant method for cross-domain adaptation in multi-modal 3D semantic segmentation. However, generating reliable pseudo-labels necessitates stringent constraints, which often result in sparse pseudo-labels after pruning. This sparsity can potentially hinder performance improvement during the adaptation process. We propose an image-guided pseudo-label enhancement approach that leverages the complementary 2D prior knowledge from the Segment Anything Model (SAM) to introduce more reliable pseudo-labels, thereby boosting domain adaptation performance. Specifically, given a 3D point cloud and the SAM masks from its paired image data, we collect all 3D points covered by each SAM mask that potentially belong to the same object. Then our method refines the pseudo-labels within each SAM mask in two steps. First, we determine the class label for each mask using majority voting and employ various constraints to filter out unreliable mask labels. Next, we introduce Geometry-Aware Progressive Propagation (GAPP) which propagates the mask label to all 3D points within the SAM mask while avoiding outliers caused by 2D-3D misalignment. Experiments conducted across multiple datasets and domain adaptation scenarios demonstrate that our proposed method significantly increases the quantity of high-quality pseudo-labels and enhances the adaptation performance over baseline methods.
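A simplified Python sketch of the per-mask majority vote described above; the geometry-aware propagation step (GAPP) is omitted, and the confidence threshold is an assumption.

import numpy as np

def refine_mask_labels(pseudo_labels, mask_point_idx, min_ratio=0.6, ignore=-1):
    """pseudo_labels: (N,) per-point labels (ignore for unlabeled);
    mask_point_idx: list of index arrays, one per SAM mask."""
    refined = pseudo_labels.copy()
    for idx in mask_point_idx:
        labels = pseudo_labels[idx]
        valid = labels[labels != ignore]
        if valid.size == 0:
            continue
        values, counts = np.unique(valid, return_counts=True)
        winner, votes = values[np.argmax(counts)], counts.max()
        if votes / valid.size >= min_ratio:          # only trust sufficiently confident masks
            refined[idx] = winner                    # propagate the label to all points in the mask
    return refined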
|
|
08:40-08:45, Paper ThAT4.3 | |
Robot Manipulation in Salient Vision through Referring Image Segmentation and Geometric Constraints |
|
Jiang, Chen | University of Alberta |
Wang, Allie | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Deep Learning for Visual Perception, Learning Categories and Concepts, Visual Servoing
Abstract: In this paper, we perform robot manipulation activities in real-world environments with language contexts by integrating a compact referring image segmentation model into the robot's perception module. First, we propose CLIPU^2Net, a lightweight referring image segmentation model designed for fine-grain boundary and structure segmentation from language expressions. Then, we deploy the model in an eye-in-hand visual servoing system to enact robot control in the real world. The key to our system is the representation of salient visual information as geometric constraints, linking the robot’s visual perception to actionable commands. Experimental results on 46 real-world robot manipulation tasks demonstrate that our method outperforms traditional visual servoing methods relying on labor-intensive feature annotations, excels in fine-grain referring image segmentation with a compact decoder size of 6.6 MB, and supports robot control across diverse contexts.
|
|
08:45-08:50, Paper ThAT4.4 | |
Boosting Cross-Spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation |
|
Kwon, SeokJun | Sejong University |
Shin, Jeongmin | Sejong University |
Kim, Namil | NAVER LABS |
Hwang, Soonmin | Hanyang University |
Choi, Yukyung | Sejong University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Recognition
Abstract: In autonomous driving, thermal image semantic segmentation has emerged as a critical research area, owing to its ability to provide robust scene understanding under adverse visual conditions. In particular, unsupervised domain adaptation (UDA) for thermal image segmentation can be an efficient solution to address the lack of labeled thermal datasets. Nevertheless, since these methods do not effectively utilize the complementary information between RGB and thermal images, they significantly decrease performance during domain adaptation. In this paper, we present a comprehensive study on cross-spectral UDA for thermal image semantic segmentation. We first propose a novel masked mutual learning strategy that promotes complementary information exchange by selectively transferring results between each spectral model while masking out uncertain regions. Additionally, we introduce a novel prototypical self-supervised loss designed to enhance the performance of the thermal segmentation model in nighttime scenarios. This approach addresses the limitations of RGB pre-trained networks, which cannot effectively transfer knowledge under low illumination due to the inherent constraints of RGB sensors. In experiments, our method achieves higher performance over previous UDA methods and comparable performance to state-of-the-art supervised methods.
|
|
08:50-08:55, Paper ThAT4.5 | |
VideoSAM: Open-World Video Segmentation |
|
Guo, Pinxue | Fudan University |
Zhao, Zixu | Amazon Web Services |
Gao, Jianxiong | Fudan University |
Wu, Chongruo | UC Davis |
He, Tong | Amazon.com |
Zhang, Zheng | AWS |
Xiao, Tianjun | AWS |
Zhang, Wenqiang | Fudan University |
Keywords: Recognition, Object Detection, Segmentation and Categorization, Computer Vision for Automation
Abstract: Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical. While the Segment Anything Model (SAM) has excelled in static image segmentation, extending its capabilities to video segmentation poses significant challenges. We tackle two major hurdles: a) SAM’s embedding limitations in associating objects across frames, and b) granularity inconsistencies in object segmentation. To this end, we introduce VideoSAM, an end-to-end framework designed to address these challenges by improving object tracking and segmentation consistency in dynamic environments. VideoSAM integrates an agglomerated backbone, RADIO, enabling object association through similarity metrics and introduces Cycle-ack-Pairs Propagation with a memory mechanism for stable object tracking. Additionally, we incorporate an autoregressive object-token mechanism within the SAM decoder to maintain consistent granularity across frames. Our experiments on the UVO and BURST benchmark, and also robotic videos, demonstrate VideoSAM’s effectiveness and robustness in real-world scenarios. All codes will be available.
|
|
08:55-09:00, Paper ThAT4.6 | |
Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion |
|
Liu, Jiangyuan | University of Chinese Academy of Sciences |
Ma, Hongxuan | Institute of Automation, Chinese Academy of Sciences |
Guo, Yuxin | University of Chinese Academy of Sciences |
Zhao, Yuhao | Institute of Automation, Chinese Academy of Sciences |
Zhang, Chi | Shijiazhuang Tiedao University |
Sui, Wei | Soochow University |
Zou, Wei | Chinese Academy of Sciences, University of Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization
Abstract: Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single-image input. Specifically, we devise a novel semantic and geometric fusion module, effectively integrating the multi-scale information between tasks. In addition, drawing inspiration from human perception of objects, we further incorporate an iterative strategy, which progressively refines initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input. Codes and models are publicly available at https://github.com/L-J-Yuan/MODEST.
|
|
ThAT5 |
305 |
Planning and Control for Legged Robots 3 |
Regular Session |
Chair: Qian, Feifei | University of Southern California |
Co-Chair: Marchionni, Luca | Pal Robotics SL |
|
08:30-08:35, Paper ThAT5.1 | |
Obstacle-Aided Trajectory Control of a Quadrupedal Robot through Sequential Gait Composition |
|
Hu, Haodi | University of Southern California |
Qian, Feifei | University of Southern California |
Keywords: Legged Robots, Biologically-Inspired Robots, Dynamics, Rough Terrain Locomotion
Abstract: Modeling and controlling legged robot locomotion on terrains with densely distributed large rocks and boulders are fundamentally challenging. Unlike traditional methods which often consider these rocks and boulders as obstacles and attempt to find a clear path to circumvent them, in this study we aim to develop methods for robots to actively utilize interaction forces with these "obstacles" for locomotion and navigation. To do so, we studied the locomotion of a quadrupedal robot as it traversed a simplified obstacle field, and discovered that with different gaits, the robot could passively converge to distinct orientations. A compositional return map explained this observed passive convergence, and enabled theoretical prediction of the steady-state orientation angles for any given quadrupedal gait. We experimentally demonstrated that with these predictions, a legged robot could effectively generate desired shape of trajectories amongst large, slippery obstacles, simply by switching between different gaits. Our study offered a novel method for robots to exploit traditionally-considered "obstacles" to achieve agile movements on challenging terrains.
|
|
08:35-08:40, Paper ThAT5.2 | |
Enhancing Navigation Efficiency of Quadruped Robots Via Leveraging Personal Transportation Platforms |
|
Yoon, Minsung | Korea Advanced Institute of Science and Technology (KAIST) |
Yoon, Sung-eui | KAIST |
Keywords: Reinforcement Learning, Legged Robots
Abstract: Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate the limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (RL-ATR), inspired by humans' utilization of personal transporters, including Segways. The RL-ATR features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within the RL-ATR. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding the operational range and efficiency.
|
|
08:40-08:45, Paper ThAT5.3 | |
Continuous Control of Diverse Skills in Quadruped Robots without Complete Expert Datasets |
|
Tu, Jiaxin | FuDan University |
Wei, Xiaoyi | Fudan University |
Zhang, Yueqi | Fudan University |
Hou, Taixian | FuDan University |
Gao, Xiaofei | Beijing Zhitong Robot Technology Co., Ltd |
Dong, Zhiyan | Fudan University |
Zhai, Peng | Fudan University |
Zhang, Lihua | Fudan University |
Keywords: Legged Robots, Reinforcement Learning
Abstract: Learning diverse skills for quadruped robots presents significant challenges, such as mastering complex transitions between different skills and handling tasks of varying difficulty. Existing imitation learning methods, while successful, rely on expensive datasets to reproduce expert behaviors. Inspired by introspective learning, we propose Progressive Adversarial Self-Imitation Skill Transition (PASIST), a novel method that eliminates the need for complete expert datasets. PASIST autonomously explores and selects high-quality trajectories based on predefined target poses instead of demonstrations, leveraging the Generative Adversarial Self-Imitation Learning (GASIL) framework. To further enhance learning, we develop a skill selection module to mitigate mode collapse by balancing the weights of skills with varying levels of difficulty. Through these methods, PASIST is able to reproduce skills corresponding to the target pose while achieving smooth and natural transitions between them. Evaluations on both simulation platforms and the Solo 8 robot confirm the effectiveness of PASIST, offering an efficient alternative to expert-driven learning.
|
|
08:45-08:50, Paper ThAT5.4 | |
PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion |
|
Shirwatkar, Aditya | Indian Institute of Science Bengaluru |
Saxena, Naman | Indian Institute of Science, Bengaluru |
Chandra, Kishore P | Visvesvaraya National Institute of Technology, Nagpur |
Kolathaya, Shishir | Indian Institute of Science |
Keywords: Legged Robots, Reinforcement Learning, Machine Learning for Robot Control
Abstract: A core strength of Model Predictive Control (MPC) for quadrupedal locomotion has been its ability to enforce constraints and provide interpretability of the sequence of commands over the horizon. However, despite being able to plan, MPC struggles to scale with task complexity, often failing to achieve robust behavior on rapidly changing surfaces. On the other hand, model-free Reinforcement Learning (RL) methods have outperformed MPC on multiple terrains, showing emergent motions but inherently lack any ability to handle constraints or perform planning. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, allowing for agile and safe locomotion behaviors through the horizon. Inspired by MPC, we incorporate an internal model that includes a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, facilitating exploration for improved locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate the robustness of our training framework through ablation studies on internal model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach across multi-terrain scenarios in both simulation and hardware.
|
|
08:50-08:55, Paper ThAT5.5 | |
Whole-Body End-Effector Pose Tracking |
|
Portela, Tifanny | ETH |
Cramariuc, Andrei | ETHZ |
Mittal, Mayank | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Whole-Body Motion Planning and Control, Reinforcement Learning, Legged Robots
Abstract: Combining manipulation with the mobility of legged robots is essential for a wide range of robotic applications. However, integrating an arm with a mobile base significantly increases the system’s complexity, making precise end-effector control challenging. Existing model-based approaches are often constrained by their modeling assumptions, leading to limited robustness. Meanwhile, recent Reinforcement Learning (RL) implementations restrict the arm’s workspace to be in front of the robot or track only the position to obtain decent tracking accuracy. In this work, we address these limitations by introducing a whole-body RL formulation for end-effector pose tracking in a large workspace on rough, unstructured terrains. Our proposed method involves a terrain-aware sampling strategy for the robot’s initial configuration and end-effector pose commands, as well as a game-based curriculum to extend the robot’s operating range. We validate our approach on the ANYmal quadrupedal robot with a six DoF robotic arm. Through our experiments, we show that the learned controller achieves precise command tracking over a large workspace and adapts across varying terrains such as stairs and slopes. On deployment, it achieves a pose-tracking error of 2.64 cm and 3.64°, outperforming existing competitive baselines.
|
|
08:55-09:00, Paper ThAT5.6 | |
MoRE : Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models |
|
Zhao, Han | Westlake University |
Song, Wenxuan | Westlake University |
Wang, Donglin | Westlake University |
Tong, Xinyang | Westlake University |
Ding, Pengxiang | Westlake University |
Cheng, Xuelian | Monash University |
Ge, Zongyuan | Monash University |
Keywords: Perception-Action Coupling, Legged Robots, Reinforcement Learning
Abstract: Developing versatile quadruped robots that can smoothly perform various actions and tasks in real-world environments remains a significant challenge. This paper introduces a novel vision-language-action (VLA) model, mixture of robotic experts (MoRE), for quadruped robots, which aims to introduce reinforcement learning (RL) for fine-tuning large-scale VLA models with a large amount of mixed-quality data. MoRE integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparse-activated mixture of experts model. This design enables the model to effectively adapt to a wide array of downstream tasks. Moreover, we employ a reinforcement learning-based training objective to train our model as a Q-function after deeply exploring the structural properties of our tasks. Effective learning from automatically collected mixed-quality data enhances data efficiency and model performance. Extensive experiments demonstrate that MoRE outperforms all baselines across six different skills and exhibits superior generalization capabilities in out-of-distribution scenarios. We further validate our method in real-world scenarios, confirming the practicality of our approach and laying a solid foundation for future research on multi-task learning in quadruped robots.
|
|
ThAT6 |
307 |
Perception for Human-Robot Interaction |
Regular Session |
Chair: Alami, Rachid | CNRS |
Co-Chair: Fu, Di | University of Surrey |
|
08:30-08:35, Paper ThAT6.1 | |
From Seeing to Recognising -- an Extended Self-Organizing Map for Human Postures Identification |
|
He, Xin | Graduate School of Information, Production and Systems, Waseda University |
Zielinska, Teresa | Warsaw University of Technology |
Dutta, Vibekananda | Warsaw University of Technology |
Matsumaru, Takafumi | Waseda University |
Sitnik, Robert | Warsaw University of Technology |
Keywords: Human-Centered Robotics, Human-Aware Motion Planning, Human and Humanoid Motion Analysis and Synthesis
Abstract: The article presents a dedicated method for recognizing human postures using classification and clustering options. The ultimate goal of the research is to recognise human actions based on posture sequences. Such a task imposes expectations on the developed method. For this purpose, a Sparse Autoencoder combined with a Self-Organizing Map (SOM) is proposed. The SOM is equipped with an additional layer for post-labeling or clustering. This entire structure is called the extended SOM. Two task-oriented modifications are applied to improve SOM performance -- a dedicated angular distance measure and a neighbourhood function for updating the SOM weights. The research contribution is the concept of the extended SOM, which is trained using unlabeled data and classifies or clusters the human postures. The Sparse Autoencoder preserves the characteristics of the data while reducing its dimensionality. Better classification efficiency of the developed method is demonstrated compared to other representative methods. Ablation studies illustrate how the introduced modifications improve classification results. The developed method is characterised by good resolution in distinguishing postures. A discussion of the concept's usefulness is provided at the end of the article.
|
|
08:35-08:40, Paper ThAT6.2 | |
MmDEAR: MmWave Point Cloud Density Enhancement for Accurate Human Body Reconstruction |
|
Yang, Jiarui | Shanghai Jiao Tong University |
Xia, Songpengcheng | Shanghai Jiao Tong University |
Lai, Zengyuan | Shanghai Jiao Tong University |
Sun, Lan | Shanghai Jiao Tong University |
Wu, Qi | Shanghai Jiao Tong University |
Yu, Wenxian | Shanghai Jiao Tong University |
Pei, Ling | Shanghai Jiao Tong University |
Keywords: Human Detection and Tracking, Human and Humanoid Motion Analysis and Synthesis
Abstract: Millimeter-wave (mmWave) radar offers robust sensing capabilities in diverse environments, making it a highly promising solution for human body reconstruction due to its privacy-friendly and non-intrusive nature. However, the significant sparsity of mmWave point clouds limits the estimation accuracy. To overcome this challenge, we propose a two-stage deep learning framework that enhances mmWave point clouds and improves human body reconstruction accuracy. Our method includes a mmWave point cloud enhancement module that densifies the raw data by leveraging temporal features and a multi-stage completion network, followed by a 2D-3D fusion module that extracts both 2D and 3D motion features to refine SMPL parameters. The mmWave point cloud enhancement module learns the detailed shape and posture information from 2D human masks in single-view images. However, image-based supervision is involved only during the training phase, and the inference relies solely on sparse point clouds to maintain privacy. Experiments on multiple datasets demonstrate that our approach outperforms state-of-the-art methods, with the enhanced point clouds further improving performance when integrated into existing models.
|
|
08:40-08:45, Paper ThAT6.3 | |
Human Activity Recognition by Using Enhanced Radar Point Cloud 2D Histograms and Doppler Feature Fusion |
|
Liao, Guanghang | Great Bay University |
Ma, Jieming | Harbin Institute of Technology, Shenzhen |
Luo, Fei | Great Bay University |
Keywords: Human-Centered Robotics, Gesture, Posture and Facial Expressions, Multi-Modal Perception for HRI
Abstract: Human activity recognition (HAR) based on millimeter wave (mmWave) radar has recently attracted significant interest due to its diverse applications in intelligent robots and human-computer interaction (HCI), including healthcare monitoring robots. 2-dimensional (2D) histogram features of radar point clouds have demonstrated high accuracy in HAR, but further expansion and refinement of this technique are needed. This paper presents a new precise non-invasive HAR framework based on radar point cloud 2D histograms. Our method enhances conventional 2D histograms by integrating fixed radar sensing boundaries into the histograms, which captures the relative spatial position changes of the target points detected by the radar. Additionally, we concatenate Doppler features (i.e., range-Doppler and angle-Doppler histograms) with the point cloud histograms, resulting in a more comprehensive feature representation than conventional point cloud histograms. We investigated the overfitting issue in stacked hybrid networks and established a multi-layer hybrid network with an optimal number of stacked layers for HAR. In the evaluation, our approach achieves state-of-the-art accuracy, with 99.72% on the mmWaveRadarWalking dataset and 98.67% on the CI4R-Human-Activity-Recognition dataset. The proposed method can be applied in the fields of robotics and HCI.
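A minimal sketch of building fixed-boundary point cloud histograms concatenated with a Doppler histogram; the bounds, bin counts, and normalization below are illustrative, not the paper's settings.

import numpy as np

def radar_frame_features(points, bounds=((-3, 3), (0, 6), (-2, 2)), bins=32, v_max=5.0):
    """points: (N,4) array of (x, y, z, doppler). Fixed bounds keep relative spatial position."""
    x, y, z, v = points.T
    h_xy, _, _ = np.histogram2d(x, y, bins=bins, range=(bounds[0], bounds[1]))
    h_xz, _, _ = np.histogram2d(x, z, bins=bins, range=(bounds[0], bounds[2]))
    h_v, _ = np.histogram(v, bins=bins, range=(-v_max, v_max))
    feats = np.concatenate([h_xy.ravel(), h_xz.ravel(), h_v])
    return feats / max(feats.sum(), 1.0)             # simple normalization for network input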
|
|
08:45-08:50, Paper ThAT6.4 | |
Estimating User Engagement in Human Robot Interaction Using a Dynamic Bayesian Network |
|
Hei, Xiaoxuan | ENSTA Paris, Institut Polytechnique De Paris |
Zhang, Heng | ENSTA Paris, Institut Polytechnique De Paris |
Tapus, Adriana | ENSTA Paris, Institut Polytechnique De Paris |
Keywords: Multi-Modal Perception for HRI, Robot Companions, Social HRI
Abstract: Engagement is a key concept in Human-Robot Interaction (HRI), as high engagement often leads to improved user experience and task performance. However, accurately estimating engagement during interactions is challenging. In this study, we propose a Dynamic Bayesian Network (DBN) to infer user engagement from various modalities, including head rotation, eye movements, facial expressions captured through visual sensors, as well as facial temperature variations measured by a thermal camera. Data was gathered from a human-robot interaction (HRI) experiment, where a robot guided participants and encouraged them to share their thoughts and insights on environmental issues. Our approach successfully combines these diverse features to offer a thorough assessment of user engagement. The network was tested on its capacity to classify participants as either engaged or not engaged, achieving an accuracy of 0.83 and an Area Under the Curve (AUC) of 0.82. These findings underscore the strength of our DBN in detecting user engagement during interactions.
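A minimal sketch of a two-state dynamic Bayesian filtering step is given below to illustrate how several per-frame observation likelihoods can be fused into an engagement belief. The transition matrix and likelihood values are illustrative assumptions, not the learned parameters of the paper.

```python
import numpy as np

# states: 0 = not engaged, 1 = engaged
T = np.array([[0.9, 0.1],    # hypothetical transition model P(s_t | s_{t-1}), rows = previous state
              [0.2, 0.8]])

def step(belief, likelihoods):
    """One DBN-style filtering step: predict with the transition model,
    then multiply in the per-modality observation likelihoods and renormalize."""
    pred = T.T @ belief
    for lik in likelihoods:          # e.g. head rotation, gaze, expression, temperature
        pred = pred * lik
    return pred / pred.sum()

belief = np.array([0.5, 0.5])
# illustrative per-frame likelihoods P(obs | state) for two modalities
belief = step(belief, [np.array([0.3, 0.7]), np.array([0.4, 0.6])])
print(belief)
```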
|
|
08:50-08:55, Paper ThAT6.5 | |
HRI-Free: Cognitive Robotic Simulation for Evaluating Embodied Social Attention Models |
|
Abawi, Fares | Universität Hamburg |
Fu, Di | University of Surrey |
Keywords: Cognitive Modeling, Embodied Cognitive Science, Social HRI
Abstract: Scaling social robot studies is constrained by the need for human interaction, which makes large-scale participant recruitment impractical. Robotics simulators help mitigate this limitation but generally lack the realism to accurately simulate social cues. We introduce a cognitive robotic simulation scheme to evaluate social attention models in physical environments. By projecting ground-truth priority maps into a simulated environment, we can directly compare predicted maps using common saliency metrics. Using the iCub robot, we assess a dynamic scanpath model that predicts attention targets, simulating human scanpaths. Evaluations with the FindWho and MVVA datasets show strong correlations between robot-captured metrics and direct-streamed video metrics. Our results indicate robustness of the social attention model to noise and real-world conditions, suggesting its practical usability for predicting personalized scanpaths in real settings. This approach reduces the need for extensive human-robot interaction studies in the early stages of study design, enabling the scalability and reproducibility of social robot evaluations.
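As an illustration of the kind of saliency metric such a comparison relies on, the sketch below computes the Pearson correlation coefficient (CC) between a predicted priority map and a ground-truth map. The toy maps are placeholders; the paper's evaluation protocol is not reproduced.

```python
import numpy as np

def correlation_coefficient(pred_map, gt_map):
    """Pearson correlation (CC), a standard saliency metric, between a
    predicted priority map and a ground-truth one of the same shape."""
    p = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-8)
    g = (gt_map - gt_map.mean()) / (gt_map.std() + 1e-8)
    return float((p * g).mean())

# toy maps standing in for the projected ground truth and the model prediction
gt = np.random.rand(60, 80)
pred = gt + 0.1 * np.random.randn(60, 80)   # noisy version of the ground truth
print(correlation_coefficient(pred, gt))
```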
|
|
08:55-09:00, Paper ThAT6.6 | |
An EEG Conformer Model for Error Feedback During Human-Robot Interaction |
|
Han, Jinpei | Imperial College London |
Li, Yinxuan | Imperial College London |
Gu, Xiao | University of Oxford |
Faisal, Aldo | Imperial College London |
Keywords: Brain-Machine Interfaces, Human Factors and Human-in-the-Loop, Intention Recognition
Abstract: Identifying a brain signal that enables the detection of incorrect execution in human-robot interaction (HRI) is considered a holy grail for real-time systems. A major challenge in achieving this is the inherent imbalance caused by the sparsity of error-related potential (ErrP) events in streaming electroencephalogram (EEG) data, which often leads models to learn irrelevant features and perform poorly. Moreover, while deep learning-based ErrP detection has seen considerable advancements, the variability in individual user reaction times introduces labelling errors, complicating model adaptation to new subjects. In addition, most deep learning methods are developed and validated on discrete, offline experiments using pre-defined windows, which fail to translate effectively to continuous, real-time HRI. Addressing these challenges is crucial to improving the robustness and adaptability of real-time ErrP detection in practical HRI applications. Here, we develop a causal EEG conformer framework, combining a convolutional neural network (CNN) encoder and a transformer with causal attention for real-time prediction of ErrP signals during HRI. We evaluated our ErrP model in a pseudo-online environment in both inter-session and inter-subject cross-validation settings for exoskeleton assistive robotics. Our model demonstrated superior performance in decoding accuracy and efficiency, showcasing better generalization for real-world dynamic HRI applications.
|
|
ThAT7 |
309 |
Marine Robotics 5 |
Regular Session |
Chair: Kelasidi, Eleni | NTNU |
Co-Chair: Chavez, Jalil | Purdue |
|
08:30-08:35, Paper ThAT7.1 | |
Cross-Platform Learning-Based Fault Tolerant Surfacing Controller for Underwater Robots |
|
Hamamatsu, Yuya | The University of Tokyo |
Remmas, Walid | Tallinn University of Technology / Université De Montpellier |
Rebane, Jaan | Tallinna Tehnikaülikool |
Kruusmaa, Maarja | Tallinn University of Technology (TalTech) |
Ristolainen, Asko | Tallinn University of Technology |
Keywords: Marine Robotics, Field Robots, Model Learning for Control
Abstract: In this paper, we propose a novel cross-platform fault-tolerant surfacing controller for underwater robots, based on reinforcement learning (RL). Unlike conventional approaches, which require explicit identification of malfunctioning actuators, our method allows the robot to surface using only the remaining operational actuators without needing to pinpoint the failures. The proposed controller learns a robust policy capable of handling diverse failure scenarios across different actuator configurations. Moreover, we introduce a transfer learning mechanism that shares a part of the control policy across various underwater robots with different actuators, thus improving learning efficiency and generalization across platforms. To validate our approach, we conduct simulations on three different types of underwater robots: a hovering-type AUV, a torpedo-shaped AUV, and a turtle-shaped robot (U-CAT). Additionally, real-world experiments are performed, successfully transferring the learned policy from simulation to a physical U-CAT in a controlled environment. Our RL-based controller demonstrates superior performance in terms of stability and success rate compared to a baseline controller, achieving an 85.7 percent success rate in real-world tests compared to 57.1 percent with the baseline. This research provides a scalable and efficient solution for fault-tolerant control for diverse underwater platforms, with potential applications in real-world aquatic missions.
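A rough sketch of how actuator failures might be randomized during training is shown below, assuming a generic environment with a reset/step interface. The wrapper interface, the failure counts, and the masking scheme are assumptions for illustration only, not the authors' training setup.

```python
import numpy as np

class ActuatorFailureWrapper:
    """Wraps an underwater-robot environment and zeroes out a random subset of
    thruster commands each episode, so the policy learns to surface with
    whatever actuators remain (no explicit fault identification)."""

    def __init__(self, env, n_actuators, max_failures=2, rng=None):
        self.env = env
        self.n = n_actuators
        self.max_failures = max_failures
        self.rng = rng or np.random.default_rng()
        self.mask = np.ones(self.n)

    def reset(self):
        k = self.rng.integers(0, self.max_failures + 1)
        failed = self.rng.choice(self.n, size=k, replace=False)
        self.mask = np.ones(self.n)
        self.mask[failed] = 0.0        # these thrusters produce no force this episode
        return self.env.reset()

    def step(self, action):
        return self.env.step(np.asarray(action) * self.mask)
```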
|
|
08:35-08:40, Paper ThAT7.2 | |
Optimizing Underwater Robot Navigation: A Study of DRL Algorithms and Multi-Modal Sensor Fusion |
|
Deowan, Md Ether | University of Toulon |
Yousha, Md Shamin Yeasher | Norwegian University of Science and Technology - NTNU |
Hossain, Tihan Mahmud | Norwegian University of Science and Technology - NTNU |
Hassan, Shahriar | Instituto Superior Técnico |
Marxer, Ricard | Université De Toulon, Aix Marseille Univ, CNRS, LIS |
Keywords: Marine Robotics, Autonomous Agents, Reinforcement Learning
Abstract: Autonomous underwater navigation faces significant challenges due to the complexity of the environment, limited localization methods, and poor visibility. This paper investigates the performance of various reinforcement learning (RL) algorithms—Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO), Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), and Advantage Actor-Critic (A2C)—to improve navigation capabilities of low-cost underwater robots equipped with multi-modal sensors. Advanced depth estimation models such as MiDaS and Depth Anything, combined with domain randomization techniques, are employed to enhance the system's robustness and generalization across varying underwater conditions. The proposed approach integrates real-time sensor data and historical actions to enable 3D maneuvering in simulated environments, leading to significant improvements in sensor fusion, depth perception, and obstacle avoidance. Simulation results demonstrate that the combination of RL techniques with sensor fusion considerably improves mapless autonomous underwater exploration, providing a robust solution for navigating unstructured aquatic environments.
|
|
08:40-08:45, Paper ThAT7.3 | |
PUGS: Perceptual Uncertainty for Grasp Selection in Underwater Environments |
|
Bagoren, Onur | University of Michigan |
Micatka, Marc | University of Washington |
Skinner, Katherine | University of Michigan |
Marburg, Aaron | University of Washington |
Keywords: Marine Robotics, Perception for Grasping and Manipulation
Abstract: When navigating and interacting in challenging environments where sensory information is imperfect and incomplete, robots must make decisions that account for these shortcomings. We propose a novel method for quantifying and representing such perceptual uncertainty in 3D reconstruction through occupancy uncertainty estimation. We develop a framework to incorporate it into grasp selection for autonomous manipulation in underwater environments. Instead of treating each measurement equally when deciding which location to grasp from, we present a framework that propagates uncertainty inherent in the multi-view reconstruction process into the grasp selection. We evaluate our method with both simulated and the real world data, showing that by accounting for uncertainty, the grasp selection becomes robust against partial and noisy measurements. Code will be made available at https://onurbagoren.github.io/PUGS/
|
|
08:45-08:50, Paper ThAT7.4 | |
Learning to Swim: Reinforcement Learning for 6-DOF Control of Thruster-Driven Autonomous Underwater Vehicles |
|
Cai, Levi | Massachusetts Institute of Technology |
Chang, Kevin | Oregon State University |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Keywords: Field Robots, Marine Robotics, Reinforcement Learning
Abstract: Controlling AUVs can be challenging because of the effect of complex non-linear hydrodynamic forces acting on the robot, which are significant in water and cannot be ignored. The problem is exacerbated for small AUVs for which the dynamics can change significantly with payload changes and deployments under different hydrodynamic conditions. The common approach to AUV control is a combination of passive stabilization with added buoyancy on top and weights on the bottom, and a PID controller tuned for simple and smooth motion primitives. However, the approach comes at the cost of sluggish controls and often the need to re-tune controllers with configuration changes. In this paper, we propose a fast (trainable in minutes), reinforcement learning-based approach for full 6-degree-of-freedom (DOF) control of thruster-driven AUVs, mapping 6-DOF command-conditioned inputs directly to thruster outputs. We present a new, highly parallelized simulator for underwater vehicle dynamics. We demonstrate this approach through zero-shot sim-to-real (with no tuning) transfer onto a real AUV that produces comparable results to hand-tuned PID controllers. Furthermore, we show that domain randomization on the simulator produces policies that are robust to small variations in the vehicle's physical parameters.
|
|
08:50-08:55, Paper ThAT7.5 | |
Underwater Motions Analysis and Control of a Coupling-Tiltable Unmanned Aerial-Aquatic Vehicle |
|
Huang, Dongyue | The Chinese University of Hong Kong |
Dou, Minghao | The Chinese University of Hong Kong |
Liu, Xuchen | The Chinese University of Hong Kong |
Sun, Tao | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Zhang, Jianguo | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Ding, Ning | The Chinese University of Hong Kong, Shenzhen |
Chen, Xinlei | Tsinghua University |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Marine Robotics, Aerial Systems: Mechanics and Control, Motion Control
Abstract: Coupling-Tiltable Unmanned Aerial-Aquatic Vehicles (UAAVs) have gained increasing importance, yet lack comprehensive analysis and suitable controllers. This paper analyzes the underwater motion characteristics of a self-designed UAAV, Mirs-Alioth, and designs a controller for it. The effectiveness of the controller is validated through experiments. The singularities of Mirs-Alioth are derived as the Singular Thrust Tilt Angle (STTA), which serves as an essential tool for analyzing its underwater motion characteristics. The analysis reveals several key factors for designing the controller. These include the need for logic switching, using a Nussbaum function to compensate for control direction uncertainty in the auxiliary channel, and employing an auxiliary controller to mitigate coupling effects. Based on these key points, a control scheme is designed. It consists of a controller that regulates the thrust tilt angle to the singular value, an auxiliary controller incorporating a Saturated Nussbaum function, and a logic switch. Finally, two sets of experiments are conducted to validate the effectiveness of the controller and demonstrate the necessity of the Nussbaum function.
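To illustrate the role of a Nussbaum function when the sign of the control gain is unknown, the sketch below uses the common choice N(k) = k^2 cos(k) with the standard adaptation k_dot = e^2 on a toy scalar plant. The saturated variant and the full control scheme of the paper are not reproduced; the gains and the plant are illustrative assumptions.

```python
import numpy as np

def nussbaum(k):
    """A commonly used Nussbaum-type function, N(k) = k^2 * cos(k); its running
    average over k takes arbitrarily large positive and negative values, which
    lets the controller probe both control directions when the sign of the
    input gain is unknown."""
    return k**2 * np.cos(k)

# illustrative adaptation loop for a scalar error channel
dt, k, e = 0.01, 0.0, 1.0
for _ in range(1000):
    k += dt * e**2            # typical Nussbaum parameter update, k_dot = e^2
    u = nussbaum(k) * e       # control with unknown-direction compensation
    e += dt * (-0.8 * u)      # toy plant: e_dot = b*u with unknown-sign b = -0.8
print(e)                      # the error settles near zero despite the unknown sign
```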
|
|
08:55-09:00, Paper ThAT7.6 | |
Adaptive Integral Sliding Mode Control for Attitude Tracking of Underwater Robots with Large Range Pitch Variations in Confined Spaces |
|
Wang, Xiaorui | Peking University |
Sha, Zeyu | Peking University |
Zhang, Feitian | Peking University |
Keywords: Marine Robotics, Motion Control, Robust/Adaptive Control
Abstract: Underwater robots play a crucial role in exploring aquatic environments. The ability to flexibly adjust attitude, especially pitch, is essential for underwater robots to accomplish tasks effectively in confined spaces. However, the highly coupled six-degree-of-freedom dynamics caused by attitude changes and the complex turbulence in confined regions pose significant challenges. To address the attitude control problem of underwater robots, this paper studies large-range pitch angle tracking during station keeping, together with simultaneous roll and yaw angle control, to achieve versatile attitude adjustment. Based on dynamic modeling, an adaptive integral sliding mode controller (AISMC) is proposed, which integrates an integral module into conventional sliding mode control (SMC) and adaptively adjusts the switching gain to improve tracking accuracy, reduce chattering, and enhance robustness.
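A minimal sketch of an adaptive integral sliding mode law for a single attitude channel is given below, assuming toy double-integrator error dynamics. The sliding surface s = e_dot + lam*e + ki*int(e), the tanh switching term, and all gains are illustrative choices, not the controller derived in the paper.

```python
import numpy as np

# integral sliding surface for one attitude channel (e.g., pitch error):
#   s = e_dot + lam*e + ki*integral(e)
# adaptive switching gain: rho_dot = gamma*|s|, control u = -k*s - rho*switch(s)
lam, ki, k, gamma, dt = 2.0, 1.0, 3.0, 0.5, 0.002
e, e_dot, e_int, rho = 0.6, 0.0, 0.0, 0.1

for i in range(5000):
    e_int += e * dt
    s = e_dot + lam * e + ki * e_int
    u = -k * s - rho * np.tanh(s / 0.05)     # tanh as a chattering-reducing switch
    rho += gamma * abs(s) * dt               # adaptive gain grows with |s|
    d = 0.3 * np.sin(0.5 * i * dt)           # bounded disturbance (illustrative)
    e_ddot = u + d                           # toy double-integrator error dynamics
    e_dot += e_ddot * dt
    e += e_dot * dt

print(round(e, 4), round(rho, 3))            # small residual error, adapted gain
```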
|
|
ThAT8 |
311 |
Aerial Robots: Learning 1 |
Regular Session |
Chair: Yim, Mark | University of Pennsylvania |
Co-Chair: Jagannatha Sanket, Nitin | Worcester Polytechnic Institute |
|
08:30-08:35, Paper ThAT8.1 | |
Learning Local Urban Wind Flow Fields from Range Sensing |
|
Folk, Spencer | University of Pennsylvania |
Melton, John | NASA Ames Research Center |
Margolis, Benjamin W. L. | NASA Ames Research Center |
Yim, Mark | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Perception and Autonomy, Deep Learning Methods, Automation Technologies for Smart Cities
Abstract: Obtaining accurate and timely predictions of the wind through an urban environment is a challenging task, but has wide-ranging implications for the safety and efficiency of autonomous aerial vehicles in future urban airspaces. Prior work relies strongly on global information about the environment, such as a precise map of the city and in-situ wind measurements at various locations, to run expensive computational fluid dynamics solvers to predict the entire wind flow field. In contrast, this paper introduces a new method to estimate the wind flow field in a region around the robot in real time, utilizing on-board range measurements to sense nearby buildings and sparse wind measurements to infer windspeed and direction. We propose that this information sufficiently characterizes the structure of the wind flow field in the local region of interest. To that end, we introduce a deep learning-based approach to predict local flow fields from range measurements. Our results indicate that a neural network trained on numerous simulated winds through small randomized maps is capable of reconstructing local wind flows while generalizing to larger environments with over 200 buildings. This contribution empowers computationally-constrained aerial robots to reason about the structure of local wind flow fields, thereby enabling new planning, control, and estimation strategies in windy urban environments without a priori knowledge of the map.
|
|
08:35-08:40, Paper ThAT8.2 | |
Whole-Body Control through Narrow Gaps from Pixels to Action |
|
Wu, Tianyue | Zhejiang University |
Chen, Yeke | Zhejiang University |
Chen, Tianyang | Zhejiang University |
Zhao, Guangyu | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Sensorimotor Learning, Reinforcement Learning
Abstract: Flying through body-size narrow gaps in the environment is one of the most challenging moments for an underactuated multirotor. We explore a purely data-driven method to master this flight skill in simulation, where a neural network directly maps pixels and proprioception to continuous low-level control commands. This learned policy enables whole-body control through gaps with different geometries demanding sharp attitude changes (e.g., near-vertical roll angle). The policy is achieved by successive model-free reinforcement learning (RL) and online observation space distillation. The RL policy receives (virtual) point clouds of the gaps' edges for scalable simulation and is then distilled into the high-dimensional pixel space. However, this flight skill is fundamentally expensive to learn by exploration in RL due to the restricted feasible solution space. We propose to reset the agent to states on trajectories generated by a model-based trajectory optimizer to alleviate this problem. The presented training pipeline is compared with baseline methods, and ablation studies are conducted to identify the key ingredients of the method. The immediate next step is to scale up the variation of gap sizes and geometries in anticipation of emergent policies and to demonstrate sim-to-real transfer.
|
|
08:40-08:45, Paper ThAT8.3 | |
VisFly: An Efficient and Versatile Simulator for Training Vision-Based Flight |
|
Li, Fanxing | Shanghai Jiao Tong University |
Sun, Fangyu | Shanghai Jiaotong University |
Zhang, Tianbao | Shanghai Jiao Tong University |
Zou, Danping | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Perception and Autonomy, Simulation and Animation, Visual Learning
Abstract: We present VisFly, a quadrotor simulator designed to efficiently train vision-based flight policies using reinforcement learning algorithms. VisFly offers a user-friendly framework and interfaces, leveraging Habitat-Sim's rendering engines to achieve frame rates exceeding 10,000 frames per second for rendering motion and sensor data. The simulator incorporates differentiable physics and is seamlessly wrapped with the Gym environment, facilitating the straightforward implementation of various learning algorithms. It supports directly importing open-source scene datasets compatible with Habitat-Sim, enabling training on diverse real-world environments simultaneously. To validate our simulator, we also provide three reinforcement learning examples for typical flight tasks relying on visual observations. The simulator is now available at [https://github.com/SJTU-ViSYS-team/VisFly].
|
|
08:45-08:50, Paper ThAT8.4 | |
Environment As Policy: Learning to Race in Unseen Tracks |
|
Wang, Hongze | ETH Zurich |
Xing, Jiaxu | University of Zurich |
Messikommer, Nico | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning, AI-Enabled Robotics
Abstract: Reinforcement learning (RL) has achieved outstanding success in complex robot control tasks, such as drone racing, where the RL agents have outperformed human champions in a known racing track. However, these agents fail in unseen track configurations, always requiring complete retraining when presented with new track layouts. This work aims to develop RL agents that generalize effectively to novel track configurations without retraining. The naive solution of training directly on a diverse set of track layouts can overburden the agent, resulting in suboptimal policy learning as the increased complexity of the environment impairs the agent’s ability to learn to fly. To enhance the generalizability of the RL agent, we propose an adaptive environment-shaping framework that dynamically adjusts the training environment based on the agent’s performance. We achieve this by leveraging a secondary RL policy to design environments that strike a balance between being challenging and achievable, allowing the agent to adapt and improve progressively. Using our adaptive environment shaping, one single racing policy efficiently learns to race in diverse and challenging tracks. Experimental results validated in both simulation and the real world show that our method enables drones to successfully fly complex and unseen race tracks, outperforming existing environment-shaping techniques.
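The sketch below illustrates the general idea of environment shaping with a simple proportional difficulty rule; in the paper this adjustment is itself learned by a secondary RL policy, so the rule, thresholds, and the simulated success rates here are purely illustrative assumptions.

```python
import random

def shape_track(success_rate, difficulty, step=0.05):
    """Toy environment-shaping rule: make the track harder when the racing policy
    succeeds often, easier when it fails often, keeping training near the edge
    of the agent's ability."""
    if success_rate > 0.7:
        difficulty = min(1.0, difficulty + step)
    elif success_rate < 0.3:
        difficulty = max(0.0, difficulty - step)
    return difficulty

difficulty = 0.2
for epoch in range(50):
    # hypothetical evaluation of the current policy on tracks of this difficulty
    success_rate = max(0.0, min(1.0, random.gauss(0.8 - 0.5 * difficulty, 0.1)))
    difficulty = shape_track(success_rate, difficulty)
print(difficulty)
```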
|
|
08:50-08:55, Paper ThAT8.5 | |
UAV-Assisted Self-Supervised Terrain Awareness for Off-Road Navigation |
|
Fortin, Jean-Michel | Université Laval |
Gamache, Olivier | Université Laval |
Fecteau, William | Université Laval |
Daum, Effie | Université Laval |
Larrivée-Hardy, William | Laval University |
Pomerleau, Francois | Université Laval |
Giguère, Philippe | Université Laval |
Keywords: Field Robots, Learning from Experience, Multi-Robot Systems
Abstract: Terrain awareness is an essential milestone to enable truly autonomous off-road navigation. Accurately predicting terrain characteristics allows optimizing a vehicle's path against potential hazards. Recent methods use deep neural networks to predict traversability-related terrain properties in a self-supervised manner, relying on proprioception as a training signal. However, onboard cameras are inherently limited by their point-of-view relative to the ground, suffering from occlusions and vanishing pixel density with distance. This paper introduces a novel approach for self-supervised terrain characterization using an aerial perspective from a hovering drone. We capture terrain-aligned images while sampling the environment with a ground vehicle, effectively training a simple predictor for vibrations, bumpiness, and energy consumption. Our dataset includes 2.8 km of off-road data collected in a forest environment, comprising 13 484 ground-based images and 12 935 aerial images. Our findings show that drone imagery improves terrain property prediction by 21.37% on the whole dataset and 37.35% in high vegetation, compared to ground images. We conduct ablation studies to identify the main causes of these performance improvements. We also demonstrate the real-world applicability of our approach by scouting an unseen area with a drone, planning and executing an optimized path on the ground.
|
|
08:55-09:00, Paper ThAT8.6 | |
EdgeFlowNet: 100FPS@1W Dense Optical Flow for Tiny Mobile Robots |
|
Pinnama Raju, Sai Ramana Kiran | Worcester Polytechnic Institute |
Singh, Rishabh | Worcester Polytechnic Institute |
Velmurugan, Manoj | Worcester Polytechnic Institute |
Jagannatha Sanket, Nitin | Worcester Polytechnic Institute |
Keywords: Aerial Systems: Perception and Autonomy, Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: Optical flow estimation is a critical task for tiny mobile robotics to enable safe and accurate navigation, obstacle avoidance, and other functionalities. However, optical flow estimation on tiny robots is challenging due to limited onboard sensing and computation capabilities. In this paper, we propose EdgeFlowNet, a high-speed, low-latency dense optical flow approach for tiny autonomous mobile robots by harnessing the power of edge computing. We demonstrate the efficacy of our approach by deploying EdgeFlowNet on a tiny quadrotor to perform static obstacle avoidance, flight through unknown gaps and dynamic obstacle dodging. EdgeFlowNet is about 20X faster than the previous state-of-the-art approaches while improving accuracy by over 20% and using only 1.08W of power enabling advanced autonomy on palm-sized tiny mobile robots.
|
|
ThAT9 |
312 |
Multi-Robot Formation Control |
Regular Session |
Chair: Agarwal, Saurav | University of Pennsylvania |
Co-Chair: Parasuraman, Ramviyas | University of Georgia |
|
08:30-08:35, Paper ThAT9.1 | |
GMF: Gravitational Mass-Force Framework for Parametric Multi-Level Coordination in Multi-Robot and Swarm Robotic Systems |
|
Starks, Michael | University of Georgia Heterogeneous Robotics Research Lab |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Multi-Robot Systems, Swarm Robotics, Cooperating Robots
Abstract: Distributed multi-robot coordination is critical to achieving reliable robotic missions that exploit the collective capability of swarm robots. In particular, the consensus and formation control problems have been extensively studied, resulting in distributed controllers that enable robots to rely only on information from themselves and their immediate neighbors. However, these algorithms are usually designed for specific objectives (e.g., cooperative object transportation, environmental coverage, etc.), requiring the controllers to be re-designed for domain variations. Therefore, we propose a new parametric framework inspired by gravitational fields that allows simultaneous coordination of robots at multiple levels, enabling generalization and domain adaptation. Our approach is built on top of a connectivity-preserving formation controller, with need-based and task-based ad hoc coordination at the private, local, and global layers of a swarm robot team. We demonstrate the remarkable potential of our framework through extensive simulations and real-world swarm robot experiments in three representative multi-robot tasks involving tight coordination: 1) robot-initiated rendezvous at different coordination layers, 2) coordinated boundary tracking and coverage of environmental processes, and 3) accommodating task executions and motion control while satisfying the coordination laws.
|
|
08:35-08:40, Paper ThAT9.2 | |
Leader-Follower Formation Control of Perturbed Nonholonomic Agents Along Parametric Curves with Directed Communication |
|
Zhang, Bin | The Hong Kong Polytechnic University |
Shao, Xiaodong | Beihang University |
Zhi, Hui | The Hong Kong Polytechnic University |
Qiu, Liuming | The Hong Kong Polytechnic University |
Romero Velazquez, Jose Guadalupe | ITAM |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Multi-Robot Systems, Motion Control, Nonholonomic Motion Planning
Abstract: In this letter, we propose a novel formation controller for nonholonomic agents to form general parametric curves. First, we derive a unified parametric representation for both open and closed curves. Then, a leader-follower formation controller is designed to drive agents to form the desired parametric curves using the curve coefficients as feedbacks. We consider directed communications and constant input disturbances rejection in the controller design. Rigorous Lyapunov-based stability analysis proves the asymptotic stability of the proposed controller. The convergence of the orientations of agents to some constant values is also guaranteed. The method has the potential to be extended to deal with various real-world applications, such as object enclosing. Detailed numerical simulations and experimental studies are conducted to verify the performance of the proposed method.
|
|
08:40-08:45, Paper ThAT9.3 | |
Versatile Distributed Maneuvering with Generalized Formations Using Guiding Vector Fields |
|
Lu, Yang | National University of Defense Technology |
Luo, Sha | University of Groningen |
Zhu, Pengming | National University of Defense Technology |
Yao, Weijia | Hunan University |
Garcia de Marina, Hector | Universidad De Granada |
Zhang, Xinglong | National University of Defense Technology |
Xu, Xin | National University of Defense Technology |
Keywords: Multi-Robot Systems, Motion Control, Distributed Robot Systems
Abstract: This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on the consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.
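A minimal sketch of a guiding vector field for a circular path is shown below: the commanded direction is the path tangent plus a correction proportional to the level-set error. The circle, gains, and the point-robot integration are illustrative; the manifold construction and coordination mechanism of the paper are not reproduced.

```python
import numpy as np

def gvf_circle(p, R=1.0, k=1.0):
    """Guiding vector field for following a circle of radius R: the field is the
    rotated gradient (path tangent) minus a correction proportional to the
    level-set error phi, i.e. v = E*grad(phi) - k*phi*grad(phi)."""
    x, y = p
    phi = x**2 + y**2 - R**2                 # zero on the desired path
    grad = np.array([2 * x, 2 * y])
    E = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
    v = E @ grad - k * phi * grad
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else v

# integrate a point robot along the field; it converges to and circulates the circle
p = np.array([2.0, 0.5])
for _ in range(2000):
    p = p + 0.01 * gvf_circle(p)
print(np.linalg.norm(p))   # close to R = 1.0
```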
|
|
08:45-08:50, Paper ThAT9.4 | |
Cooperative Distributed Model Predictive Control for Embedded Systems: Experiments with Hovercraft Formations |
|
Stomberg, Gösta | Hamburg University of Technology |
Schwan, Roland | EPFL |
Grillo, Andrea | EPFL |
Jones, Colin | École Polytechnique Fédérale De Lausanne (EPFL) |
Faulwasser, Timm | Hamburg University of Technology |
Keywords: Multi-Robot Systems, Optimization and Optimal Control, Cooperating Robots
Abstract: This paper presents experiments for embedded cooperative distributed model predictive control applied to a team of hovercraft floating on an air hockey table. The hovercraft collectively solve a centralized optimal control problem in each sampling step via a stabilizing decentralized real-time iteration scheme using the alternating direction method of multipliers. The efficient implementation does not require a central coordinator, executes onboard the hovercraft, and facilitates sampling intervals in the millisecond range. The formation control experiments showcase the flexibility of the approach on scenarios with point-to-point transitions, trajectory tracking, collision avoidance, and moving obstacles.
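To illustrate the coordinator-free consensus mechanism, the sketch below runs consensus ADMM on a toy scalar agreement problem. The per-agent data, penalty parameter, and quadratic costs are assumptions for illustration; the paper solves a full optimal control problem at every sampling step.

```python
import numpy as np

# Consensus ADMM on a toy problem: each agent i holds a private target a_i and
# all agents must agree on a common value z minimizing sum_i 0.5*(x_i - a_i)^2.
a = np.array([1.0, 4.0, 2.5, 3.5])        # hypothetical per-agent data
rho = 1.0
x = np.zeros_like(a)
u = np.zeros_like(a)
z = 0.0

for _ in range(50):
    x = (a + rho * (z - u)) / (1.0 + rho)  # local (per-agent) minimization
    z = np.mean(x + u)                     # agreement step (an average over agents)
    u = u + x - z                          # dual update, kept locally by each agent

print(z, a.mean())   # z converges to the average of the private targets
```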
|
|
08:50-08:55, Paper ThAT9.5 | |
Coordinated Multi-Robot Navigation with Formation Adaptation |
|
Deng, Zihao | University of Massachusetts Amherst |
Gao, Peng | North Carolina State University |
Jose, Williard Joshua | University of Massachusetts Amherst |
Reardon, Christopher M. | MITRE |
Wigness, Maggie | U.S. Army Research Laboratory |
Rogers III, John G. | US Army Research Laboratory |
Zhang, Hao | University of Massachusetts Amherst |
Keywords: Multi-Robot Systems, Machine Learning for Robot Control
Abstract: Coordinated multi-robot navigation is an essential ability for a team of robots operating in diverse environments. Robot teams often need to maintain specific formations, such as wedge formations, to enhance visibility, positioning, and efficiency during fast movement. However, complex environments such as narrow corridors challenge rigid team formations, which makes effective formation control difficult in real-world environments. To address this challenge, we introduce a novel Adaptive Formation with Oscillation Reduction (AFOR) approach to improve coordinated multi-robot navigation. We develop AFOR under the theoretical framework of hierarchical learning and integrate a spring-damper model with hierarchical learning to enable both team coordination and individual robot control. At the upper level, a graph neural network facilitates formation adaptation and information sharing among the robots. At the lower level, reinforcement learning enables each robot to navigate and avoid obstacles while maintaining the formations. We conducted extensive experiments using Gazebo in the Robot Operating System (ROS), a high-fidelity Unity3D simulator with ROS, and real robot teams. Results demonstrate that AFOR enables smooth navigation with formation adaptation in complex scenarios and outperforms previous methods. More details of this work are provided on the project website: https://hcrlab.gitlab.io/project/afor.
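The sketch below illustrates the lower-level spring-damper formation term with fixed wedge offsets; in AFOR the offsets and coordination are adapted by a graph neural network and reinforcement learning, so the gains and offsets here are illustrative assumptions only.

```python
import numpy as np

def formation_force(pos, vel, desired_offsets, leader_pos, k=2.0, c=1.5):
    """Spring-damper formation term: each robot is pulled toward its desired
    offset from the leader by a virtual spring and damped by its own velocity."""
    target = leader_pos + desired_offsets
    return -k * (pos - target) - c * vel

# wedge formation offsets for three followers behind a leader at the origin
offsets = np.array([[-1.0, -1.0], [-1.0, 1.0], [-2.0, 0.0]])
pos = np.random.randn(3, 2) * 3.0
vel = np.zeros((3, 2))
for _ in range(500):
    f = formation_force(pos, vel, offsets, leader_pos=np.zeros(2))
    vel += 0.02 * f
    pos += 0.02 * vel
print(np.round(pos, 2))   # approaches the desired wedge offsets
```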
|
|
ThAT10 |
313 |
Multi-Robot Systems 3 |
Regular Session |
Chair: Guo, Jia | Cornell University |
Co-Chair: Kim, Woojun | Carnegie Mellon University |
|
08:30-08:35, Paper ThAT10.1 | |
MARVEL: Multi-Agent Reinforcement Learning for Constrained Field-Of-View Multi-Robot Exploration in Large-Scale Environments |
|
Chiun, Jimmy | National University of Singapore |
Zhang, Shizhe | National University of Singapore |
Wang, Yizhuo | National University of Singapore |
Cao, Yuhong | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Multi-Robot Systems, Reinforcement Learning, Motion and Path Planning
Abstract: In multi-robot exploration, a team of mobile robots is tasked with efficiently mapping an unknown environment. While most exploration planners assume omnidirectional sensors like LiDAR, this is impractical for small robots such as drones, where lightweight, directional sensors like cameras may be the only option due to payload constraints. These sensors have a constrained field-of-view (FoV), which adds complexity to the exploration problem, requiring not only optimal robot positioning but also sensor orientation during movement. In this work, we propose MARVEL, a neural framework that leverages graph attention networks, together with a novel frontier and orientation feature fusion technique, to develop a collaborative, decentralized policy using multi-agent reinforcement learning (MARL) for robots with constrained FoV. To handle the large action space of viewpoint planning, we further introduce a novel information-driven action pruning strategy. MARVEL improves multi-robot coordination and decision-making in challenging large-scale indoor environments, while adapting to various team sizes and sensor configurations (i.e., FoV and sensor range) without additional training. Our extensive evaluation shows that MARVEL’s learned policies exhibit effective coordinated behaviors, outperforming state-of-the-art exploration planners across multiple metrics. We experimentally demonstrate MARVEL’s generalizability in large-scale environments, of up to 90m by 90m, and validate its practical applicability through successful deployment on a team of real drones.
|
|
08:35-08:40, Paper ThAT10.2 | |
RACE: A Fast and Lightweight Urban Exploration and Search Strategy for Multi-Robot Systems |
|
Leong, Jabez Kit | Singapore University of Technology and Design |
Soh, Gim Song | Singapore University of Technology and Design |
Keywords: Multi-Robot Systems, Search and Rescue Robots, Swarm Robotics
Abstract: Multi-Robot Systems (MRS) are increasingly deployed for hazardous tasks in urban environments. Among these tasks, search and rescue remains challenging, as it requires exploration of unknown, constrained indoor environments. For example, without global knowledge of the map of a building floor, it is not advantageous to choose one path over another at a corridor junction. Also, if the assigned frontiers are far from the robot, backtracking along a corridor will cost more than moving forward. Since exploration along corridors is similar to solving a maze, this paper examines classical maze-solving algorithms that are known to be computationally fast and lightweight, such as the Right Hand Rule (RHR) and Random Mouse (RM). The authors identify two gaps that need to be addressed before these algorithms can be applied to physical MRS. Firstly, these algorithms are not designed for cooperative exploration by multiple agents. Secondly, they are often applied only in low-fidelity simulation environments, and some work is needed to transfer them to the commonly used occupancy grid map representation. In this paper, the authors introduce RACE, a fast and lightweight collective urban exploration and search algorithm based on a modified and condensed version of the Ant Colony Optimization (ACO) algorithm. The proposed solution is verified in a low-fidelity simulation and evaluated against other exploration and search algorithms such as RHR and RM. An approach for transferring RACE from simulation to a physical implementation is presented, and a physical system evaluation is performed to compare RACE against a Rapidly-exploring Random Tree algorithm. Finally, the proposed solution is further verified with a physical experiment in which a quadrupedal robot is assigned to explore part of a floor of SUTD spanning approximately 55 m x 40 m. RACE also shows potential in handling challenging closed-loop and dead-end environments.
|
|
08:40-08:45, Paper ThAT10.3 | |
Reinforcement Learning Driven Multi-Robot Exploration Via Explicit Communication and Density-Based Frontier Search |
|
Calzolari, Gabriele | Luleå Tekniska Universitet |
Sumathy, Vidya | Luleå University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Reinforcement Learning, Multi-Robot Systems, Cooperating Robots
Abstract: Collaborative multi-agent exploration of unknown environments is crucial for search and rescue operations. Effective real-world deployment must address challenges such as limited inter-agent communication and static and dynamic obstacles. This paper introduces a novel decentralized collaborative framework based on Reinforcement Learning to enhance multi-agent exploration in unknown environments. Our approach enables agents to decide their next action using an agent-centered field-of-view occupancy grid, and features extracted from A* algorithm-based trajectories to frontiers in the reconstructed global map. Furthermore, we propose a constrained communication scheme that enables agents to share their environmental knowledge efficiently, minimizing exploration redundancy. The decentralized nature of our framework ensures that each agent operates autonomously, while contributing to a collective exploration mission. Extensive simulations in Gymnasium and real-world experiments demonstrate the robustness and effectiveness of our system, while all the results highlight the benefits of combining autonomous exploration with inter-agent map sharing, advancing the development of scalable and resilient robotic exploration systems.
|
|
08:45-08:50, Paper ThAT10.4 | |
Integrating Multi-Robot Adaptive Sampling and Informative Path Planning for Spatiotemporal Natural Environment Prediction |
|
Kailas, Siva | Georgia Institute of Technology |
Deolasee, Srujan | Carnegie Mellon University |
Luo, Wenhao | University of Illinois Chicago |
Kim, Woojun | Carnegie Mellon University |
Sycara, Katia | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents
Abstract: Learning to predict spatiotemporal (ST) environmental processes from a sparse set of samples collected autonomously is a difficult task from both a sampling perspective (collecting the best sparse samples) and from a learning perspective (predicting the next timestep). In this work, we focus on investigating the sample collection process via multi-robot informative path planning. We present an approach for incorporating multi-robot informative path planning into a spatiotemporal adaptive sampling framework while considering path length constraints for sampling location selection. We also incorporate informative path planning to determine the best path to collect samples along while en route to collecting the desired sample. We achieve this in a decentralized manner by decoupling the process into two stages: the first stage uses our spatiotemporal mixture of Gaussian Processes (STMGP) model to determine the most informative sampling location via a mutual information lower bound heuristic and the second stage plans an informative path to collect the desired sample and other additional informative samples via submodular function optimization. Moreover, we effectively leverage peer-to-peer communication to enable coordination. Simulation results on real-world spatiotemporal data are provided to validate the effectiveness of our proposed approach.
|
|
08:50-08:55, Paper ThAT10.5 | |
D-PBS: Dueling Priority-Based Search for Multiple Nonholonomic Robots Motion Planning in Congested Environments |
|
Zhang, Xiaotong | Chinese Academy of Sciences |
Xiong, Gang | Institute of Automation, Chinese Academy of Sciences |
Wang, Yuanjing | Durham University |
Teng, Siyu | HKBU; UIC |
Chen, Long | Chinese Academy of Sciences |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Nonholonomic Motion Planning
Abstract: This letter focuses on the multiple nonholonomic robots motion planning (MRMP) problem in congested and complex environments, where the complexity escalates dramatically with the increase in the number of robots, frequently leading to deadlocks. We present the Dueling Priority-Based Search (D-PBS), an efficient and scalable priority-based motion planner for multiple nonholonomic car-like robots, capable of enabling robots to move safely to destinations in spatially-constrained settings. We achieve this by adopting the alternate dueling collision resolution approach, coupled with the exploration of comprehensive priority relationships, effectively addressing the deadlock situations. We also introduce a novel priority-binding algorithm to enhance the scalability of our planner in restricted spaces densely populated with robots. Experimental evaluations in various scenarios demonstrate that D-PBS outperforms standard approaches to MRMP, offering superior path quality and scalability for larger robot swarms.
|
|
ThAT11 |
314 |
Haptics 1 |
Regular Session |
Chair: Moore, Carl A. | FAMU-FSU College of Engineering |
Co-Chair: Chen, Cheng-Wei | National Taiwan University |
|
08:30-08:35, Paper ThAT11.1 | |
Vision-Based Haptic Rendering with Self-Occlusion Resilience Using Shadow Correspondence |
|
Mao, Mu-Ting | National Taiwan University |
Chen, Cheng-Wei | National Taiwan University |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, RGB-D Perception
Abstract: Vision-based haptic feedback provides cost-effective preemptive protection and real-time guidance, enhancing teleoperation with reduced system complexity. However, challenges arise as the instrument approaches the target object, leading to occlusion of the point cloud behind the remote instrument, known as the self-occlusion issue. Prior solutions relying on historical point clouds or multiple viewpoints to refill the occluded region encounter adaptability issues under prolonged occlusion and limited space, thus hindering practical implementation. This paper introduces a novel non-refilling-based method for haptic force rendering, leveraging the correspondence between the tool-tip position and the tip position of the shadow-like occluded region. Experimental results demonstrate the proposed method's resilience across self-occlusion and dynamic environments, highlighting its practical applicability in robotic teleoperation.
|
|
08:35-08:40, Paper ThAT11.2 | |
A New Expression for the Passivity Bound for a Class of Sampled-Data Systems |
|
Roberts, Rodney | Florida State University |
Moore, Carl A. | FAMU-FSU College of Engineering |
Colgate, Edward | Northwestern University |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, Force Control, Passivity
Abstract: In this article, we characterize the passivity of a class of haptic systems modeled as a simple sampled-data system. Passivity is guaranteed by ensuring that there is enough damping in the haptic interface. A necessary and sufficient bound was determined in earlier work, but the corresponding mathematical expressions were complicated, and the derivations were not completely rigorous. In this article, a more tractable expression is derived. Based on the improved expression, passivity conditions are obtained for several classes of transfer functions representing virtual environments.
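For context, the sketch below checks the widely cited classical special case of such a bound for a virtual wall with stiffness K and damping B rendered at sampling period T by a device with physical damping b, namely b > KT/2 + |B|. The numerical values are illustrative; the more general and more tractable expression derived in the article is not reproduced here.

```python
def classical_passivity_ok(b, T, K, B):
    """Classical sampled-data passivity condition for a virtual wall
    (Colgate and Schenkel): physical damping b must satisfy b > K*T/2 + |B|.
    This is the well-known special case, not the article's new expression."""
    return b > K * T / 2.0 + abs(B)

# example: 1 kHz servo rate, 2 N*s/m of device damping
print(classical_passivity_ok(b=2.0, T=0.001, K=3000.0, B=0.3))   # True
print(classical_passivity_ok(b=2.0, T=0.001, K=5000.0, B=0.5))   # False
```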
|
|
08:40-08:45, Paper ThAT11.3 | |
A Haptic Feedback Device Actuated by Electromagnetic Torque |
|
Luo, Xionghuan | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Huang, Yuanrui | Xi'an Jiaotong-Liverpool University |
Zhao, Wenda | Institute of Automation,Chinese Academy of Sciences |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Keywords: Haptics and Haptic Interfaces, Wearable Robotics, Virtual Reality and Interfaces
Abstract: Haptic feedback enhances user interaction with systems by adding the sense of touch, thereby improving immersion and realism in applications like virtual reality (VR), augmented reality (AR), video games, education, and robotic surgery. To address the challenges in mechanically actuated haptic feedback devices, such as limited mobility, mechanical wear, and complex mechanical structures, several research efforts have sought to develop electromagnetic haptic feedback systems. However, these systems also suffer from the rapid decay of magnetic force with distance, which restricts their workspace size and application potential. In this paper, we propose a novel electromagnetic haptic feedback device that is actuated by magnetic torque instead of magnetic force. By controlling the magnetic torque, which decays with distance only at a third-order rate, our device achieves a large workspace (a 200-mm-diameter hemisphere) while still delivering perceptible real-time haptic feedback within the hemisphere. While using the device, the user wears a lightweight haptic thimble housing a permanent magnet on their finger, which enables 2 degree-of-freedom (DoF) haptic feedback. A 13-coil electromagnet array serves as the source of the magnetic field. A mathematical model is proposed to determine the currents in the electromagnet array to generate the desired amount of haptic feedback torque. We conducted two experiments to prove the viability of the device. A haptic feedback accuracy experiment validated the device's ability to generate sufficient torque within a large workspace. A user evaluation experiment showed that the device achieved an overall accuracy of 77.86% in a virtual enclosure exploration task, indicating its effectiveness and usability in haptic feedback applications.
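A small sketch of the underlying physics is given below: the torque on the thimble magnet is tau = m x B, and a dipole field falls off with the cube of distance, consistent with the third-order decay mentioned in the abstract. The dipole approximation of a coil and all numerical values are illustrative assumptions, not the paper's field model.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def dipole_field(m_src, r_vec):
    """Magnetic field of a point dipole m_src at displacement r_vec (falls off as 1/r^3)."""
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return MU0 / (4 * np.pi * r**3) * (3 * np.dot(m_src, r_hat) * r_hat - m_src)

def torque_on_thimble(m_thimble, B):
    """Torque on the thimble magnet: tau = m x B."""
    return np.cross(m_thimble, B)

# illustrative numbers: a coil approximated as a dipole 0.1 m below the finger magnet
B = dipole_field(m_src=np.array([0.0, 0.0, 50.0]), r_vec=np.array([0.02, 0.0, 0.10]))
tau = torque_on_thimble(m_thimble=np.array([0.05, 0.0, 0.05]), B=B)
print(tau)
```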
|
|
08:45-08:50, Paper ThAT11.4 | |
Vibrotactile Haptics with Soft Magnetoresponsive Surface Interface |
|
Rimer, Evan | Queen's University, Ingenuity Labs Research Institute |
Hashtrudi-Zaad, Keyvan | Queen's University |
Robertson, Matthew | Queen's University |
Keywords: Haptics and Haptic Interfaces, Soft Robot Materials and Design, Wearable Robotics
Abstract: This paper explores the feasibility of using magnetoresponsive silicone as the primary mechanism for generating vibrotactile feedback in haptic interfaces. The distinctive feature of this research lies in the integration of magnetoresponsive silicone, a flexible material that responds to electromagnetic fields to produce localized vibrations. Preliminary experiments evaluate the performance of these actuators, focusing on their ability to produce controlled vibrations across a range of frequencies and amplitudes relevant to human tactile perception. Building on this foundation, we introduce the VibroFlex Pad, a haptic interface featuring a magnetoresponsive silicone sheet and an array of electromagnets. The VibroFlex Pad demonstrates its versatility in generating varied tactile effects and simulating dynamic wave-like movements across its surface. To assess the VibroFlex Pad's effectiveness, a user study was conducted, separately evaluating tactile accuracy, overall performance, and user comfort. The findings suggest that the VibroFlex Pad offers reliable and precise vibrotactile feedback, highlighting its potential to enhance wearable haptic technologies and improve the user experience in a variety of applications.
|
|
08:50-08:55, Paper ThAT11.5 | |
Haptic Shoulder for Rendering Biomechanically Accurate Joint Limits for Human-Robot Physical Interactions |
|
Peiros, Lizzie | University of California, San Diego |
Joyce, Calvin | University of California, San Diego |
Murugesan, Tarun | University of California, San Diego |
Nguyen, Roger | University of California, San Diego |
Fiorini, Isabella | University of California, San Diego |
Galibut, Rizzi | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Physical Human-Robot Interaction, Safety in HRI, Biologically-Inspired Robots
Abstract: Human-robot physical interaction (pHRI) is a rapidly evolving research field with significant implications for physical therapy, search and rescue, and telemedicine. However, a major challenge lies in accurately understanding human constraints and safety when IRB-approved physical experiments with human subjects are not available. Concerns regarding human studies include safety, repeatability, and scalability in the number and diversity of participants. This paper examines whether a physical approximation can serve as a stand-in for human subjects to enhance robot autonomy for physical assistance. It introduces SHULDRD (Shoulder Haptic Universal Limb Dynamic Repositioning Device), an economical and anatomically similar device designed for real-time testing and deployment of pHRI planning tasks onto robots in the real world. SHULDRD replicates human shoulder motion, providing crucial force feedback and safety data. The device's open-source CAD and software facilitate easy construction and use, ensuring broad accessibility for researchers. By providing a flexible platform able to emulate an unlimited number of human subjects, ensure repeatable trials, and provide quantitative metrics to assess the effectiveness of the robotic intervention, SHULDRD aims to improve the safety and efficacy of human-robot physical interactions.
|
|
08:55-09:00, Paper ThAT11.6 | |
Experimental Evaluation of Haptic Shared Control for Multiple Electromagnetic Untethered Microrobots (I) |
|
Ferro, Marco | CNRS |
Pinan Basualdo, Franco Nicolas | Katholieke Universiteit Leuven |
Robuffo Giordano, Paolo | Irisa Cnrs Umr6074 |
Misra, Sarthak | University of Twente |
Pacchierotti, Claudio | Centre National De La Recherche Scientifique (CNRS) |
Keywords: Haptics and Haptic Interfaces, Telerobotics and Teleoperation, Micro/Nano Robots
Abstract: The precise manipulation of microrobots presents challenges arising from their small size and susceptibility to external disturbances. To address these challenges, we present the experimental evaluation of a haptic shared control teleoperation framework for the locomotion of multiple microrobots, relying on a kinesthetic haptic interface and a custom electromagnetic system. Six combinations of haptic and shared control strategies are evaluated during a safe 3D navigation scenario in a cluttered environment. Eighteen participants are asked to steer two spherical magnetic microrobots among obstacles to reach a predefined goal, under different conditions. For each condition, participants are provided with different obstacle avoidance and navigation guidance cues. Results show that providing assistance in avoiding obstacles guarantees safer performance, regardless of whether the assistance is autonomous or delivered through a haptic repulsive force. Moreover, autonomous obstacle avoidance also reduces the completion time by 30% compared to haptic obstacle avoidance and no obstacle avoidance cases, although haptic feedback is preferred by the users. Finally, providing haptic guidance towards the target improves the positioning accuracy of the microrobots by 65% compared to not providing this guidance.
|
|
ThAT12 |
315 |
Assembly |
Regular Session |
Chair: Liu, Changliu | Carnegie Mellon University |
Co-Chair: Bahar, Iris | Colorado School of Mines |
|
08:30-08:35, Paper ThAT12.1 | |
StableLego: Stability Analysis of Block Stacking Assembly |
|
Liu, Ruixuan | Carnegie Mellon University |
Deng, Kangle | Carnegie Mellon University |
Wang, Ziwei | Tsinghua University |
Liu, Changliu | Carnegie Mellon University |
Keywords: Assembly, Performance Evaluation and Benchmarking, Robotics and Automation in Construction
Abstract: Structural stability is a necessary condition for successful construction of an assembly. However, designing a stable assembly requires a non-trivial effort since a slight variation in the design could significantly affect the structural stability. To address the challenge, this paper studies the stability of assembly structures, in particular, block stacking assembly. The paper proposes a new optimization formulation, which optimizes over force balancing equations, for inferring the structural stability of 3D block stacking structures. The proposed stability analysis is verified on hand-crafted Lego examples. The experiment results demonstrate that the proposed method can correctly predict whether the structure is stable. In addition, it outperforms the existing methods since it can accurately locate the weakest parts in the design and, more importantly, solve any given assembly structure. To further validate the proposed method, we provide StableLego: a comprehensive dataset including 50k+ 3D objects with their Lego layouts. We test the proposed stability analysis and include the stability inference for each corresponding object in StableLego. Our code and the dataset are available at https://github.com/intelligent-control-lab/StableLego.
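As a drastically simplified illustration of inferring stability from force-balance equations, the sketch below solves the static equilibrium of a single block resting on two support points and checks that both contact forces are non-negative. This 1D toy is not the optimization formulation proposed in the paper.

```python
import numpy as np

def block_on_two_supports(x1, x2, x_com, mass, g=9.81):
    """Force-balance feasibility for one rigid block on two support points at x1, x2:
    solve f1 + f2 = m*g and f1*x1 + f2*x2 = m*g*x_com, then check both forces are
    non-negative (i.e., the block does not tip)."""
    A = np.array([[1.0, 1.0],
                  [x1,  x2 ]])
    b = np.array([mass * g, mass * g * x_com])
    f = np.linalg.solve(A, b)       # [f1, f2]
    return f, bool(np.all(f >= -1e-9))

print(block_on_two_supports(x1=0.0, x2=0.4, x_com=0.1, mass=0.05))   # stable
print(block_on_two_supports(x1=0.0, x2=0.4, x_com=0.6, mass=0.05))   # tips over
```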
|
|
08:35-08:40, Paper ThAT12.2 | |
Component Selection for Craft Assembly Tasks |
|
Isume, Vitor Hideyo | Osaka University |
Kiyokawa, Takuya | Osaka University |
Yamanobe, Natsuki | Advanced Industrial Science and Technology |
Domae, Yukiyasu | The National Institute of Advanced Industrial Science and Techno |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Assembly, Visual Learning, Computer Vision for Automation
Abstract: Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labeled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
|
|
08:40-08:45, Paper ThAT12.3 | |
Assembly Order Planning for Modular Structures by Autonomous Multi-Robot Systems |
|
Peters, Tom | TU Eindhoven |
Cheung, Kenneth C. | National Aeronautics and Space Administration (NASA) |
Kostitsyna, Irina | KBR at NASA Ames Research Center |
Keywords: Assembly, Path Planning for Multiple Mobile Robots or Agents, Parallel Robots
Abstract: Coordinated multi-agent robotic construction provides a means to build infrastructure in extreme environments and improve efficiency in high performance applications. Planning methods are key to understanding and achieving the scope of such applications, and are typically tailored to specific models of construction material and a consideration of passivity or activity thereof. Here, we focus on the NASA Automated Reconfigurable Mission Adaptive Digital Assembly Systems (ARMADAS) model, which includes passive lightweight structural modules and small robots that traverse the structure. We present an algorithm for calculating a build plan for robots under the constraints of this type of system. We then evaluate the quality of this plan experimentally. Many of the techniques we use can be applied to any robotic assembly system whose robots perform locomotion over the structure that they are building.
|
|
08:45-08:50, Paper ThAT12.4 | |
Master Rules from Chaos: Learning to Reason, Plan, and Interact from Chaos for Tangram Assembly |
|
Zhao, Chao | Hong Kong University of Science and Technology |
Jiang, Chunli | The Hong Kong University of Science and Technology |
Luo, Lifan | The Hong Kong University of Science and Technology |
Zhang, Guanlan | The Hong Kong University of Science and Technology |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Chen, Qifeng | HKUST |
Keywords: Grasping, Assembly
Abstract: Tangram assembly, the art of human intelligence and manipulation dexterity, is a new challenge for robotics and reveals the limitations of the state of the art. Here, we describe our initial exploration and highlight key problems in reasoning, planning, and manipulation for robotic tangram assembly. We present MRChaos (Master Rules from Chaos), a robust and general solution for learning assembly policies that can generalize to novel objects. In contrast to conventional methods based on prior geometric and kinematic models, MRChaos learns to assemble randomly generated objects through self-exploration in simulation without prior experience in assembling target objects. The reward signal is obtained from the visual observation change without manually designed models or annotations. MRChaos retains its robustness in assembling various novel tangram objects that have never been encountered during training, with only silhouette prompts. We show the potential of MRChaos in wider applications such as cutlery combinations. The presented work indicates that radical generalization in robotic assembly can be achieved by learning in much simpler domains.
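The reward "obtained from the visual observation change" is not detailed here; one plausible instantiation, sketched under that assumption, rewards the increase in silhouette overlap (IoU) between the current assembly and the target silhouette computed from binary masks.

    import numpy as np

    def iou(mask_a, mask_b):
        """Intersection-over-union of two boolean silhouette masks."""
        inter = np.logical_and(mask_a, mask_b).sum()
        union = np.logical_or(mask_a, mask_b).sum()
        return inter / union if union > 0 else 0.0

    def visual_change_reward(prev_obs, curr_obs, target_silhouette):
        """Reward = improvement in overlap with the target silhouette between
        consecutive observations (one reading of 'reward from visual observation
        change'; the paper's exact signal may differ)."""
        return iou(curr_obs, target_silhouette) - iou(prev_obs, target_silhouette)

    # Toy example: placing a piece that fills more of the target outline.
    target = np.zeros((8, 8), bool); target[2:6, 2:6] = True
    before = np.zeros((8, 8), bool); before[2:4, 2:6] = True
    after  = before.copy();          after[4:6, 2:6] = True
    print(round(visual_change_reward(before, after, target), 3))  # positive reward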
|
|
08:50-08:55, Paper ThAT12.5 | |
Robot Planning under Uncertainty for Object Assembly and Troubleshooting Using Human Causal Models |
|
Basu, Semanti | Brown University |
Tatlidil, Semir | Brown University |
Kim, Moon Hwan | Brown University |
Tran, Tiffany | Brown University |
Saxena, Serena | Brown University |
Williams, Tom | Colorado School of Mines |
Sloman, Steven | Brown University |
Bahar, Iris | Colorado School of Mines |
Keywords: Human-Centered Robotics, Embodied Cognitive Science, Planning under Uncertainty
Abstract: In this paper we explore if human mental models of objects, even when flawed, can be integrated with a collaborative robot's decision making framework to allow it to make smarter choices under partial observability for different object-related tasks such as assembly and troubleshooting. We demonstrate how (1) these informative causal models can be extracted from humans through crowdsourcing, (2) object assembly and troubleshooting can be formulated as Partially Observable Markov Decision Processes (POMDPs) and (3) our extracted causal models can be incorporated into those models in the form of approximate priors. Finally, (4) we use systematic experimentation in simulation to demonstrate the success of this approach, with 2X average improvement in reward observed for object assembly tasks, and 1.4X average improvement in reward observed for troubleshooting tasks.
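How a crowdsourced causal model might enter the planner as an approximate prior can be sketched as a simple Bayesian belief update; the fault names, prior probabilities, and observation likelihoods below are hypothetical placeholders, not values from the study.

    import numpy as np

    # Hypothetical crowdsourced causal model for a troubleshooting task:
    # prior probability that each hidden fault explains "lamp does not turn on".
    fault_names = ["bulb burnt out", "loose wiring", "dead power outlet"]
    crowd_prior = np.array([0.6, 0.25, 0.15])   # elicited from humans (illustrative numbers)

    # Observation model: P(observation | fault), e.g. the robot tests the outlet
    # and reads "outlet has power".
    p_obs_given_fault = np.array([0.95, 0.9, 0.05])

    # Bayesian belief update: the crowdsourced prior plays the role of the POMDP's
    # initial belief, refined by the robot's own sensing.
    posterior = crowd_prior * p_obs_given_fault
    posterior /= posterior.sum()

    for name, p in zip(fault_names, posterior):
        print(f"{name}: {p:.2f}")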
|
|
08:55-09:00, Paper ThAT12.6 | |
Robotic Dry-Stacking of Clocháin with Irregular Stones |
|
Liu, Yifang | Oak Ridge National Laboratory |
Napp, Nils | Cornell University |
Keywords: Robotics and Automation in Construction, Assembly
Abstract: This paper explores automated robotic construction of clocháin, a type of corbelled rock shelter, traditionally crafted by skilled workers. While robots have been employed for simple dry-stacking tasks in the past, such as construction of stone walls or vertical stone towers, the question of whether robots possess the capacity to construct more functional structures remains unanswered. This study presents a significant step forward in robotic dry-stacking of functional structures: the assembly of natural stones into freestanding clocháin structures. We also present a set of stackability measures to aid stone selection, which significantly improves the stability of the planned structures. Our sequential filtering approach, originally designed for planning stone walls, plays a foundational role in achieving stable clochán construction. Experimental results validate the effectiveness of the stackability measures and demonstrate the physical execution of dry-stacking clocháin. The progress demonstrated in this paper opens the door to robotic construction of a wide range of utility structures in unstructured environments.
|
|
ThAT13 |
316 |
Reinforcement Learning Applications |
Regular Session |
Chair: Ekenna, Chinwe | University at Albany |
Co-Chair: Roveda, Loris | SUPSI-IDSIA |
|
08:30-08:35, Paper ThAT13.1 | |
Synthesizing Depowdering Trajectories for Robot Arms Using Deep Reinforcement Learning |
|
Maurer, Maximilian | Festo SE & Co. KG |
Seefeldt, Simon | University of Tübingen |
Seyler, Jan Reinke | Festo SE & Co. KG |
Eivazi, Shahram | University of Tübingen |
Keywords: Reinforcement Learning, Task and Motion Planning, Representation Learning
Abstract: Research into robotics applications of deep reinforcement learning (DRL) has increasingly been focussed on learning precise object manipulation and trajectory planning. Extending these tasks to continuous robot-object interactions with the surface of complex geometries remains an open problem. In this paper we investigate end-to-end DRL solutions for depowdering tasks that work by directing a pressurized air stream onto the object's surfaces using a blast nozzle head mounted on a robotic arm. We develop a GPU accelerated vectorized cleaning effect for integration into RL training and consider ways to expose vision-less trajectory synthesis for surface treatment applications to the RL agent based on UV mapping. Our experimental evaluation demonstrates that DRL has the potential to be used for generating object-specific agents for depowdering tasks on a variety of 3D objects without requiring intermediate path planners even in a full 3D motion setup. Finally, we show that DRL-generated trajectories can be transferred to a real-world setup. Our task formulation lends itself to approximate a wide range of surface treatment applications (e.g., cleaning and spray painting) with various effects.
|
|
08:35-08:40, Paper ThAT13.2 | |
World Model-Based Perception for Visual Legged Locomotion |
|
Lai, Hang | Shanghai Jiao Tong University |
Cao, Jiahang | Shanghai Jiao Tong University |
Xu, Jiafeng | ByteDance |
Wu, Hongtao | Bytedance |
Lin, Yunfeng | Shanghai Jiao Tong University |
Kong, Tao | ByteDance |
Yu, Yong | Shanghai Jiao Tong University |
Zhang, Weinan | Shanghai Jiao Tong University |
Keywords: Reinforcement Learning, Legged Robots
Abstract: Legged locomotion over various terrains is challenging and requires precise perception of the robot and its surroundings from both proprioception and vision. However, learning directly from high-dimensional visual input is often data-inefficient and intricate. To address this issue, traditional methods attempt to learn a teacher policy with access to privileged information first and then learn a student policy to imitate the teacher's behavior with visual input. Despite some progress, this imitation framework prevents the student policy from achieving optimal performance due to the information gap between inputs. Furthermore, the learning process is unnatural since animals intuitively learn to traverse different terrains based on their understanding of the world without privileged knowledge. Inspired by this natural ability, we propose a simple yet effective method, World Model-based Perception (WMP), which builds a world model of the environment and learns a policy based on the world model. We illustrate that though completely trained in simulation, the world model can make accurate predictions of real-world trajectories, thus providing informative signals for the policy controller. Extensive simulated and real-world experiments demonstrate that WMP outperforms state-of-the-art baselines in traversability and robustness. Videos and Code are available at: https://wmp-loco.github.io/.
|
|
08:40-08:45, Paper ThAT13.3 | |
V-Pilot: A Velocity Vector Control Agent for Fixed-Wing UAVs from Imperfect Demonstrations |
|
Gong, Xudong | National University of Defense Technology |
Dawei, Feng | National University of Defense Technology |
Xu, Kele | National University of Defense Technology |
Zhou, Xing | National University of Defense Technology |
Zheng, Si | Qiyuan Lab |
Ding, Bo | National University of Defense Technology |
Wang, Huaimin | National University of Defense Technology |
Keywords: Reinforcement Learning, Learning from Demonstration, Aerial Systems: Applications
Abstract: This paper addresses the challenge of Velocity Vector Control (VVC) for fixed-wing UAVs using Reinforcement Learning (RL) in the presence of imperfect demonstrations. The multi-objective and long-horizon nature of VVC introduces significant spatial and temporal complexities, complicating RL's exploration. While demonstration-based RL methods can help mitigate exploration challenges, their effectiveness is often limited by the quality of the provided demonstrations. To tackle this, we propose V-Pilot, a novel approach that integrates: (1) a controller equipped with a control law model to reduce action oscillation, thus alleviating temporal exploration issues, and (2) a VVC-specific training workflow for iterative policy refinement and demonstration quality improvement. This framework is designed to enhance the performance of demonstration-based RL under imperfect demonstrations. We evaluate V-Pilot on the fixed-wing UAV RL environment, FlyCraft. Experimental results demonstrate that V-Pilot outperforms PID and Behavioral Cloning across multiple performance metrics.
|
|
08:45-08:50, Paper ThAT13.4 | |
Efficiently Generating Expressive Quadruped Behaviors Via Language-Guided Preference Learning |
|
Clark, Jaden | Stanford University |
Hejna, Donald | Stanford University |
Sadigh, Dorsa | Stanford University |
Keywords: Reinforcement Learning, Social HRI, Emotional Robotics
Abstract: Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advancements in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches. Website with videos: lgpl-gaits.github.io/
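A hedged sketch of the LGPL idea of letting LLM-proposed samples seed preference learning: the candidate gait parameter sets stand in for LLM output, and the pairwise-preference oracle stands in for the human; neither reflects the authors' actual prompts or interface.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for LLM-proposed gait parameter sets (e.g. [step_height, bounce, head_tilt]);
    # in LGPL these would come from prompting a pre-trained language model.
    llm_candidates = np.array([[0.8, 0.2, 0.1],
                               [0.5, 0.6, 0.3],
                               [0.9, 0.1, 0.5],
                               [0.3, 0.8, 0.2]])

    target = np.array([0.7, 0.3, 0.2])          # hidden "user taste" for the mock oracle

    def user_prefers(a, b):
        """Simulated pairwise preference query (a real system asks a human)."""
        return np.linalg.norm(a - target) < np.linalg.norm(b - target)

    # Tournament over the LLM-seeded candidates, then local refinement around
    # the winner using a handful of additional preference queries.
    best, queries = llm_candidates[0], 0
    for cand in llm_candidates[1:]:
        queries += 1
        if user_prefers(cand, best):
            best = cand
    for _ in range(4):                           # small refinement budget
        proposal = np.clip(best + rng.normal(scale=0.1, size=3), 0, 1)
        queries += 1
        if user_prefers(proposal, best):
            best = proposal
    print("selected behaviour:", np.round(best, 2), "after", queries, "queries")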
|
|
08:50-08:55, Paper ThAT13.5 | |
Learning Multi-Agent Coordination for Replenishment at Sea |
|
Han, Byeolyi | Georgia Institute of Technology |
Cho, Minwoo | Georgia Institute of Technology |
Chen, Letian | Georgia Institute of Technology |
Paleja, Rohan | MIT Lincoln Laboratory |
Wu, Zixuan | Georgia Institute of Technology |
Ye, Sean | Zoox |
Seraj, Esmaeil | Georgia Institute of Technology |
Sidoti, David | US Naval Research Laboratory |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Reinforcement Learning
Abstract: Optimizing large-scale logistics is computationally challenging due to its scale and requirement to be robust to stochastic and time-varying weather disturbances. However, prior research in multi-agent reinforcement learning (MARL) does not address scenarios that capture the complexity of logistics operations influenced by dynamic weather patterns. To address this gap, we introduce a new MARL environment, MARINE, which has two types of agents equipped with limited resources and integrates real wave data to model the influences of weather on the replenishment at sea (RAS) operation. To this end, we propose SchedHGNN, a novel MARL algorithm that incorporates a heterogeneous graph neural network and an intrinsic reward scheme to enhance agent coordination and mitigate challenges induced by environment non-stationarity. Our results show that the combination of effective RAS scheduling and improved communication enables our model to outperform competitive baselines by up to 37.8%. This achievement marks a significant advancement in applying MARL to complex, real-world logistics scenarios.
|
|
ThAT14 |
402 |
Exoskeletons |
Regular Session |
Chair: Sharma, Nitin | North Carolina State University |
Co-Chair: Zarrouk, David | Ben Gurion University |
|
08:30-08:35, Paper ThAT14.1 | |
Real-Time Ultrasound Imaging of a Human Muscle to Optimize Shared Control in a Hybrid Exoskeleton |
|
Iyer, Ashwin | North Carolina State University |
Sun, Ziyue | NCSU |
Lambeth, Krysten | North Carolina State University |
Singh, Mayank | North Carolina State University |
Cleveland, Christine | University of North Carolina-Chapel Hill |
Sharma, Nitin | North Carolina State University |
Keywords: Prosthetics and Exoskeletons, Optimization and Optimal Control, Rehabilitation Robotics, Ultrasound Imaging
Abstract: A hybrid exoskeleton is a class of wearable robotic technology that simultaneously uses a powered exoskeleton and functional electrical stimulation (FES) to generate assistive joint torques for people with impaired mobility due to neurological disorders such as spinal cord injury (SCI). The hybrid assistive technology benefits from FES that actively elicits force from paralyzed muscles via their neural excitation, leading to muscle strengthening. The main technical barrier to realizing the hybrid technology is to attain stable coordination between FES and the exoskeleton despite the quick onset of FES-induced muscle fatigue, which causes a rapid decline in the muscle force. Current methods to measure the induced fatigue lack direct muscle state measurements and may be ineffective at capturing the muscle force decay due to FES. Instead, ultrasound (US) imaging accurately quantifies FES-related muscle contractility and fatigue due to the direct visualization of muscle fibers. In this paper, we use real-time US imaging-derived muscle strain changes as biomarkers of FES-induced fatigue in an optimal controller that modulates exoskeleton assistance and FES dosage. To demonstrate that real-time US imaging is a promising muscle-machine interface technology that can optimize shared control in a hybrid exoskeleton, we perform experiments involving continuous seated knee extension and over-ground walking tasks on two participants with SCI and four participants without disabilities. Furthermore, this work helps design a novel and unprecedented robotic gait technology with the capability to impart FES-associated therapeutic benefits while assisting the gait of neurologically impaired individuals, including those with SCI, stroke, multiple sclerosis, etc.
|
|
08:35-08:40, Paper ThAT14.2 | |
Design and Control of a Novel Semi-Passive Knee Exoskeleton |
|
Sade, Alon | Ben Gurion University of the Negev |
Coifman, Itay | Ben Gurion University of the Negev |
Riemer, Raziel | Ben-Gurion University of the Negev |
Zarrouk, David | Ben Gurion University |
Keywords: Mechanism Design, Prosthetics and Exoskeletons, Wearable Robotics
Abstract: This paper presents a novel semi-passive knee exoskeleton designed to provide running assistance. It incorporates an energy-efficient clutch mechanism activated by a mini servomotor, which engages and disengages the spring that supports the leg during running. The exoskeleton extracts energy during the running phase when the muscles are acting as brakes (negative power), stores it in the spring, and then returns this energy during the positive power phase (when the muscles are acting as motors). The exoskeleton controller uses an inertial measurement unit (IMU) sensor to estimate the shank orientation, which determines when to engage and disengage the spring. Two experiments designed to probe the functionality of the exoskeleton were conducted to evaluate its control performance and actuation, and the exoskeleton's biomechanical impact on three subjects. The findings showed that the control mechanism could be engaged and disengaged in real time. The maximum moment created on the knee muscles was 17 Nm, although the device could supply 28 Nm. The ratio of servo energy consumption to the energy saved by the subjects was 1:160 (0.1 W input to 16 W saved). This study thus paves the way for the development of lightweight, inexpensive exoskeletons that can contribute to their greater availability for a broader range of individuals.
|
|
08:40-08:45, Paper ThAT14.3 | |
Model-Based Control Strategies Comparison of One Bionic Ankle Tensegrity Exoskeleton: BATE |
|
Wei, Dunwen | University of Electronic Science and Technology of China |
Mao, Shiyu | University of Electronic Science and Technology of China |
Zhang, Zhichao | University of Electronic Science and Technology of China |
Wei, Ximing | University of Electronic Science and Technology of China |
Gao, Tao | University of Electronic Science and Technology of China |
Ficuciello, Fanny | Università Di Napoli Federico II |
Keywords: Rehabilitation Robotics, Wearable Robotics, Biologically-Inspired Robots
Abstract: This paper presents a comparative analysis of model-based control strategies for a Bionic Ankle Tensegrity Exoskeleton (BATE). The BATE is designed to mimic the self-stress equilibrium and self-supporting characteristics of the human ankle biotensegrity structure. Model-based control strategies are conventional methods that can help discover the principles of complex tensegrity systems. The high dimensions and non-linearity of the BATE pose challenges for physical modelling and require unique model-based control strategies. In this study, we propose a modelling method that considers interaction force and explore the trajectory tracking performance and robustness of the ankle exoskeleton under three power-assisted control methods: position control, force control, and hybrid force-position control. The experimental results suggest that the position control (PC) method offers superior tracking performance and robustness compared to the other two methods. This method can be used for early rehabilitation training to improve flexibility. The control concept emphasizes its advantages over current wearable exoskeletons and introduces new ideas for high-performance exoskeletons.
|
|
08:45-08:50, Paper ThAT14.4 | |
Human-Like Walking Motion Generation for Self-Balancing Lower Limb Rehabilitation Exoskeletons |
|
Yang, Ming | University of Science and Technology of China |
Chen, Ziqiang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Li, Wentao | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Li, Feng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sci |
Shang, Weiwei | University of Science and Technology of China |
Tian, Dingkui | Shenzhen Advanced Technology Research Institute, Chinese Academy |
Wu, Xinyu | CAS |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Wearable Robotics
Abstract: Self-balancing lower limb rehabilitation exoskeletons (SLLREs) allow individuals with lower limb dysfunction to walk without the use of crutches. Stable and human-like walking motions are crucial for SLLREs because achieving a close imitation of healthy human walking is a key goal in rehabilitation therapy. Existing SLLREs can realize stable walking but lack human-like features. This paper designs a walking motion generator based on hierarchical optimization to generate a human-like walking motion with variable hip height, heel-strike, toe-off, and knee-stretched features. This hierarchically optimized human-like walking motion generator consists of a knee-stretched optimizer and an optimization-based stabilizing filter. Specifically, the knee-stretched optimizer realizes the stretched-knee feature by optimizing the hip trajectory with varying heights, and the stabilizing filter realizes stable walking by optimizing the hip trajectory in the sagittal plane direction. To validate the effectiveness of the proposed human-like walking motion generator, walking experiments were conducted on the SLLRE AutoLEE-G3 both in a simulation environment and in the real world. The experimental results show that the human-like walking motions look more natural and reduce the required torque for the knee joint compared with knee-bent walking.
|
|
08:50-08:55, Paper ThAT14.5 | |
Kinematic Benefits of a Cable-Driven Exosuit for Head-Neck Mobility |
|
Bales, Ian | University of Utah |
Zhang, Haohan | University of Utah |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Tendon/Wire Mechanism
Abstract: This letter presents a novel cable-driven exosuit intended for head-neck support and movement assistance. Mobility limitations in the head-neck, such as dropped head syndrome, can result from various neurological disorders. Current solutions, ranging from static neck collars to rigid-link robotic neck exoskeletons, are unsatisfactory. Neck collars are the most used clinically but fail to restore head-neck motion. Rigid-link neck exoskeletons can enable head movement but are bulky and restrictive. In this letter, we present the design of this exosuit, an analysis of its ability to balance the gravitational moment of the head in simulation, and the results of a user study comparing its kinematic performance to a state-of-the-art rigid-link neck exoskeleton. The exosuit is able to support the head across its full range of motion according to simulation results. It fits users of different sizes and participants exhibited more natural head-neck movement wearing the exosuit as compared to wearing the rigid-link exoskeleton. The exosuit allowed more head rotations than the rigid-link neck exoskeleton and required less compensatory torso movement for three daily tasks (looking for traffic, drinking from a bottle, and picking up an object from the floor). Its absolute range of motion was also much larger than the one allowed by the rigid-link neck exoskeleton. These results demonstrate the kinematic benefits of a cable-driven neck exosuit and provide justification for studying the use of such an exosuit for head-neck movement assistance in patient groups.
|
|
ThAT15 |
403 |
Continuum Robots 1 |
Regular Session |
Chair: Morimoto, Tania K. | University of California San Diego |
Co-Chair: Yuan, Sichen | The University of Alabama |
|
08:30-08:35, Paper ThAT15.1 | |
PH-Gauss-Lobatto Reduced-Order-Model for Shape Control of Soft-Continuum Manipulators |
|
Mbakop, Steeve | Junia |
Tagne, Gilles | Yncréa Hauts De France / ISEN Lille |
Chevillon, Tanguy | Junia |
Drakunov, Sergey | IHMC |
Merzouki, Rochdi | CRIStAL, CNRS UMR 9189, University of Lille1 |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics, Motion and Path Planning, Soft Robot Applications
Abstract: Soft and hyper-elastic materials possess properties of resilience and flexibility, characterizing a class of Soft-Continuum Manipulators (SCM). The latter describes a robot structure with an infinite number of degrees of freedom (DoFs), useful for mobility and manipulation. However, these geometric characteristics are a source of modeling and control problems. In this paper, a Pythagorean Hodograph (PH) curve based Reduced-Order-Model (ROM) relying on the Gauss-Lobatto quadrature is investigated for the modeling and the control of SCM. This allows, first, reducing the dimension of the SCM kinematics based on the PH parametric curves with a predefined length and, second, developing the shape kinematics control from its control polygon. The use of the Gauss-Lobatto quadrature allows the PH curve control points to be moved independently while preserving the PH features of length and minimum curve energy. These features are important for controlling the shape of the SCM in real time. The proposed approach has been validated numerically and experimentally on a bio-inspired Soft Continuum Elephant Trunk Robot.
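Gauss-Lobatto quadrature itself can be illustrated on the simpler task of evaluating the arc length of a parametric curve; the sketch below uses a cubic Bezier rather than the paper's PH parametrization, so it shows only the quadrature rule, not the reduced-order model.

    import numpy as np

    # 5-point Gauss-Lobatto nodes and weights on [-1, 1].
    GL_NODES   = np.array([-1.0, -np.sqrt(3/7), 0.0, np.sqrt(3/7), 1.0])
    GL_WEIGHTS = np.array([1/10, 49/90, 32/45, 49/90, 1/10])

    def bezier_speed(ctrl, t):
        """|dB/dt| of a cubic Bezier with control points ctrl (4 x dim)."""
        p0, p1, p2, p3 = ctrl
        d = 3*(1-t)**2*(p1-p0) + 6*(1-t)*t*(p2-p1) + 3*t**2*(p3-p2)
        return np.linalg.norm(d)

    def arc_length_gauss_lobatto(ctrl):
        """Arc length of the curve on t in [0, 1] via Gauss-Lobatto quadrature."""
        total = 0.0
        for u, w in zip(GL_NODES, GL_WEIGHTS):
            t = 0.5 * (u + 1.0)          # map node from [-1, 1] to [0, 1]
            total += w * bezier_speed(ctrl, t)
        return 0.5 * total               # Jacobian of the interval mapping

    # Straight-line control polygon: the exact length is 3.0.
    ctrl = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], float)
    print(arc_length_gauss_lobatto(ctrl))   # ~3.0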
|
|
08:35-08:40, Paper ThAT15.2 | |
Towards Contact-Aided Motion Planning for Tendon-Driven Continuum Robots |
|
Rao, Priyanka | University of Toronto |
Salzman, Oren | Technion |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion and Path Planning, Soft Robot Applications
Abstract: Tendon-driven continuum robots (TDCRs), with their flexible backbones, offer the advantage of being used for navigating complex, cluttered environments. However, to do so, they typically require multiple segments, often leading to complex actuation and control challenges. To this end, we propose a novel approach to navigate cluttered spaces effectively for a single-segment long TDCR, which is the simplest topology from a mechanical point of view. Our key insight is that by leveraging contact with the environment we can achieve multiple curvatures without mechanical alterations to the robot. Specifically, we propose a search-based motion planner for a single-segment TDCR. This planner, guided by a specially designed heuristic, discretizes the configuration space and employs a best-first search. The heuristic, crucial for efficient navigation, provides an effective cost-to-go estimation while respecting the kinematic constraints of the TDCR and environmental interactions. We empirically demonstrate the efficiency of our planner: over 525 queries in environments with both convex and non-convex obstacles, it achieves a success rate of about 80%, while the baselines do not exceed a success rate of 30%. The difference is attributed to our novel heuristic, which is shown to significantly reduce the required search space.
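A generic best-first search of the kind described above, sketched on a toy 2D grid; the neighbor function and Manhattan heuristic are placeholders for the paper's TDCR-specific configuration discretization and contact-aware cost-to-go estimate.

    import heapq

    def best_first_search(start, goal, neighbors, heuristic):
        """Greedy best-first search: always expand the node whose heuristic
        cost-to-go estimate is lowest (the paper's heuristic additionally
        encodes TDCR kinematic and contact constraints)."""
        frontier = [(heuristic(start, goal), start)]
        came_from = {start: None}
        while frontier:
            _, node = heapq.heappop(frontier)
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = came_from[node]
                return path[::-1]
            for nxt in neighbors(node):
                if nxt not in came_from:
                    came_from[nxt] = node
                    heapq.heappush(frontier, (heuristic(nxt, goal), nxt))
        return None

    # Toy 2D grid with one obstacle cell; Manhattan distance as the heuristic.
    obstacles = {(1, 1)}
    def neighbors(c):
        x, y = c
        cand = [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]
        return [p for p in cand if 0 <= p[0] < 4 and 0 <= p[1] < 4 and p not in obstacles]
    manhattan = lambda a, b: abs(a[0]-b[0]) + abs(a[1]-b[1])
    print(best_first_search((0, 0), (3, 3), neighbors, manhattan))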
|
|
08:40-08:45, Paper ThAT15.3 | |
A Simple Dynamics Model for Cable Driven Continuum Robots with Actuator Coupling |
|
Watson, Connor | Morimoto Lab, UCSD |
Morimoto, Tania K. | University of California San Diego |
Keywords: Modeling, Control, and Learning for Soft Robots, Surgical Robotics: Steerable Catheters/Needles, Tendon/Wire Mechanism
Abstract: The flexibility and dexterity of cable-driven continuum robots (CDCRs) make them well-suited for intricate tasks such as minimally invasive surgery. However, the complexity of accurately modeling their dynamics has limited their broader adoption and effective control. Current models either oversimplify the dynamics by assuming quasi-static conditions or overcomplicate them, making real-time application challenging. Additionally, many existing models neglect the critical coupling between the robot's body and actuator dynamics, a factor essential for accurate control. In this paper, we propose a new, minimal dynamics model for CDCRs that strikes a balance between simplicity and accuracy. Our model captures the essential dynamics of both the robot and its actuators, providing a practical tool for control design. We also establish connections between our model and those used for other robotic systems, enabling the transfer of well-established control strategies to CDCRs. The model is validated through hardware experiments, demonstrating its capability to effectively address complex control challenges in CDCR applications.
|
|
08:45-08:50, Paper ThAT15.4 | |
A Novel Tendon-Driven Articulated Continuum Robot with Stabilized Self-Locking Joints |
|
Ren, Jiankun | Fudan University |
Qi, Lizhe | Fudan University |
Jia, Yu | Fudan University |
Wang, Hecheng | Fudan University |
Wang, Ziheng | Academy for Engineering & Technology, Fudan University |
Sun, Yunquan | Fudan University |
Keywords: Mechanism Design, Tendon/Wire Mechanism, Actuation and Joint Mechanisms
Abstract: Articulated continuum robots (ACRs) are characterized by flexibility, controllability, and adaptability and perform excellently in complex and constrained environments. However, the large number of motor drives limit the ACRs' portability and make them cumbersome to control. This paper presents a novel tendon-driven ACR composed of stabilized self-locking joints (SLJs) connected in series. After triggering the mechanical constraints with shape memory alloy coils, each joint can be maintained in either a self-locking or release state with zero power consumption. Consequently, even with a single set of drive units, the ACR can operate in multiple modes, enabling variable motion performance and workspace adaptability, effectively reducing the number of motors. The ACR's stiffness also varies with the locking state of its SLJs, and no motor drive is required to maintain its shape when all SLJs are self-locking. The performance and reliability of the SLJ prototype were validated. The workspace of the ACR prototype model was analyzed, and its partial motion performance, motion error, and variable stiffness were verified.
|
|
08:50-08:55, Paper ThAT15.5 | |
Tensiworm: A Novel Tensegrity Robot with Enhanced Peristaltic Locomotion Efficiency |
|
Kazoleas, Christian | The University of Alabama |
Zhang, Jiajun | The University of Alabama |
Yuan, Sichen | The University of Alabama |
Keywords: Actuation and Joint Mechanisms, Biomimetics, Soft Robot Materials and Design
Abstract: Tensegrity structures have been widely explored for their lightweight, high-stiffness, and foldable properties. These unique characteristics have enabled their application in various fields including robotics. Tensegrity robots have demonstrated diverse locomotion modes offering versatile solutions for navigation in complex environments. Recent efforts in bio-inspired robotics have led to designs mimicking the movement of natural organisms, such as earthworms. However, existing designs, particularly those utilizing motor-pulley mechanisms for robot body contraction, face significant challenges due to their bulky actuation systems that reduce locomotion efficiency. This paper introduces a novel tensegrity robot, "Tensiworm," inspired by the peristaltic locomotion of an earthworm. Composed of three icosahedron tensegrity unit cells connected in series, the Tensiworm robot employs a sequential contraction and relaxation mechanism driven by active cable members made of shape memory actuators. This innovative design achieves a 59.13% folding ratio and weighs only 46.9 grams. The robot can travel a distance equal to its body length in approximately ten cycles with an average speed of 10.01 mm per minute. Furthermore, the use of thinner, flexible structural members broadens possibilities for development of millimeter-scale tensegrity robots, which hold significant potential for biomedical applications, including in-vivo testing and targeted drug delivery.
|
|
08:55-09:00, Paper ThAT15.6 | |
Accelerated Quasi-Static FEM for Real-Time Modeling of Continuum Robots with Multiple Contacts and Large Deformation |
|
Chen, Hao | University of Chinese Academy of Sciences |
Chen, Jian | Hong Kong Institute of Science and Innovation, Chinese Academy O |
Liu, Xinran | University of Chinese Academy of Sciences |
Zhang, Zihui | Institute of Automation, Chinese Academy of Sciences |
Huang, Yuanrui | Xi'an Jiaotong-Liverpool University |
Zhang, Zhongkai | University of Montpellier, CNRS |
Liu, Hongbin | Institute of Automation,Chinese Academy of Sciences |
Keywords: Simulation and Animation, Contact Modeling, Medical Robots and Systems
Abstract: Continuum robots offer high flexibility and many degrees of freedom, making them ideal for navigating narrow lumens. However, accurately simulating their behavior under large deformation and frequent environmental contact remains challenging. Current methods for solving the deformation of these robots, such as model order reduction and the Gauss-Seidel (GS) method, have notable drawbacks: their computation slows down as the number of contact points grows, and they struggle to balance speed and model accuracy. To overcome these limitations, we introduce a novel finite element method (FEM) named Acc-FEM. Acc-FEM adopts a large-deformation quasi-static finite element model and integrates an accelerated solver scheme to efficiently handle multi-contact simulations. Furthermore, it leverages parallel computing on graphics processing units (GPUs) to update the finite element model in real time and
|
|
ThAT16 |
404 |
Grasping 3 |
Regular Session |
Chair: Sun, Yu | University of South Florida |
Co-Chair: Natale, Lorenzo | Istituto Italiano Di Tecnologia |
|
08:30-08:35, Paper ThAT16.1 | |
Multi-Object Grasping -- Experience Forest for Robotic Finger Movement Strategies |
|
Chen, Tianze | University of South Florida |
Sun, Yu | University of South Florida |
Keywords: Logistics, Grasping
Abstract: This paper introduces a novel Experience Forest algorithm designed for multi-object grasping (MOG). Unlike single-object grasping, in MOG the hand poses during the last few steps before the grasp completes play an important role in its success. But, similar to single-object grasping, the hand poses that are far away from the end grasping pose are not as relevant. Therefore, the proposed approach introduces the Experience Forest structure, which organizes the finger-movement sequences collected by naive MOG approaches as a set of trees instead of a single tree. The algorithm propagates the success or failure result of each trial from the end-pose node only to the nodes representing several preceding hand poses. When using the trees to generate a grasping sequence, the algorithm produces a finger-movement policy that follows a MOG synergy at the beginning, then transitions to a tree in the Experience Forest, and finally employs a breadth-first search to reach a more reliable solution. Tested on various objects using a UR5e robotic arm and Barrett hand in both simulated and real environments, the strategy significantly boosts efficiency in object transfer tasks by up to 60%, marking a 10% improvement over our previous methods.
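The credit-propagation idea, with outcomes flowing back only to the last few hand poses before the grasp, can be sketched as follows; the node attributes and the three-pose horizon are illustrative choices, not the paper's parameters.

    class PoseNode:
        """One hand pose in a grasping sequence, stored as a node of an experience tree."""
        def __init__(self, pose, parent=None):
            self.pose, self.parent = pose, parent
            self.successes, self.trials = 0, 0

        def success_rate(self):
            return self.successes / self.trials if self.trials else None

    def record_trial(end_node, success, horizon=3):
        """Propagate a trial outcome only to the last `horizon` poses before the end
        pose -- earlier poses are treated as irrelevant to the grasp outcome."""
        node, depth = end_node, 0
        while node is not None and depth < horizon:
            node.trials += 1
            node.successes += int(success)
            node, depth = node.parent, depth + 1

    # Toy sequence of four poses ending in a successful multi-object grasp:
    # only the last three poses receive credit, the earliest one does not.
    approach = PoseNode("approach")
    spread = PoseNode("spread-fingers", parent=approach)
    curl = PoseNode("curl-fingers", parent=spread)
    close = PoseNode("close-grasp", parent=curl)
    record_trial(close, success=True)
    print([n.success_rate() for n in (approach, spread, curl, close)])  # [None, 1.0, 1.0, 1.0]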
|
|
08:35-08:40, Paper ThAT16.2 | |
VMF-Contact: Uncertainty-Aware Evidential Learning for Probabilistic Contact-Grasp in Noisy Clutter |
|
Shi, Yitian | Karlsruhe Institute of Technology |
Welte, Edgar | Karlsruhe Institute of Technology (KIT) |
Gilles, Maximilian | Karlsruhe Institute of Technology |
Rayyes, Rania | Karlsruhe Institute for Technology (KIT) |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Grasping
Abstract: Grasp learning in noisy environments, such as occlusions, sensor noise, and out-of-distribution (OOD) objects, poses significant challenges. Recent learning-based approaches focus primarily on capturing aleatoric uncertainty from inherent data noise. The epistemic uncertainty, which represents the OOD recognition, is often addressed by ensembles with multiple forward paths, limiting real-time application. In this paper, we propose an uncertainty-aware approach for 6-DoF grasp detection using evidential learning to comprehensively capture both uncertainties in real-world robotic grasping. As a key contribution, we introduce vMF-Contact, a novel architecture for learning hierarchical contact grasp representations with probabilistic modeling of directional uncertainty as von Mises–Fisher (vMF) distribution. To achieve this, we analyze the theoretical formulation of the second-order objective on the posterior parametrization, providing formal guarantees for the model's ability to quantify uncertainty and improve grasp prediction performance. Moreover, we enhance feature expressiveness by applying partial point reconstructions as an auxiliary task, improving the comprehension of uncertainty quantification as well as the generalization to unseen objects. In the real-world experiments, our method demonstrates a significant improvement by 39% in the overall clearance rate compared to the baselines. The code is available under: https://github.com/YitianShi/vMF-Contact/tree/main
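To relate the vMF parametrization to a concrete uncertainty value, the sketch below estimates the concentration kappa from sampled unit directions with the standard Banerjee et al. (2005) approximation, where a small kappa indicates high directional uncertainty; this is a textbook estimator, not the paper's evidential network head.

    import numpy as np

    def vmf_concentration(unit_dirs):
        """Estimate the von Mises-Fisher concentration kappa from unit vectors using
        the Banerjee et al. (2005) approximation; small kappa means the predicted
        grasp/contact direction is highly uncertain."""
        d = unit_dirs.shape[1]
        r_bar = np.linalg.norm(unit_dirs.mean(axis=0))
        return r_bar * (d - r_bar**2) / (1.0 - r_bar**2)

    rng = np.random.default_rng(0)
    # Tightly clustered directions around +z vs. nearly isotropic directions.
    tight = rng.normal([0, 0, 1], 0.05, size=(200, 3))
    loose = rng.normal(0, 1.0, size=(200, 3))
    tight /= np.linalg.norm(tight, axis=1, keepdims=True)
    loose /= np.linalg.norm(loose, axis=1, keepdims=True)
    print("kappa (confident direction):", round(vmf_concentration(tight), 1))
    print("kappa (uncertain direction):", round(vmf_concentration(loose), 1))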
|
|
08:40-08:45, Paper ThAT16.3 | |
QuadWBG: Generalizable Quadrupedal Whole-Body Grasping |
|
Wang, Jilong | Galaxy General Robot Co., Ltd |
Rajabov, Javokhirbek | Peking University |
Xu, Chaoyi | Beijing University of Posts and Telecommunications |
Zheng, Yiming | University of Toronto |
Wang, He | Peking University |
Keywords: Mobile Manipulation, Legged Robots, Whole-Body Motion Planning and Control
Abstract: Legged robots with advanced manipulation capabilities have the potential to significantly improve household duties and urban maintenance. Despite considerable progress in developing robust locomotion and precise manipulation methods, seamlessly integrating these into cohesive whole-body control for real-world applications remains challenging. In this paper, we present a modular framework for robust and generalizable whole-body loco-manipulation controller based on a single arm-mounted camera. By using reinforcement learning (RL), we enable a robust low-level policy for command execution over 5 dimensions (5D) and a grasp-aware high-level policy guided by a novel metric, Generalized Oriented Reachability Map (GORM). The proposed system achieves state-of-the-art one-time grasping accuracy of 89% in real world, including challenging tasks such as grasping transparent objects. Through extensive simulations and real-world experiments, we demonstrate that our system can effectively manage a large workspace, from floor level to above body height, and perform diverse whole-body loco-manipulation tasks.
|
|
08:45-08:50, Paper ThAT16.4 | |
Composing Dextrous Grasping and In-Hand Manipulation Via Scoring with a Reinforcement Learning Critic |
|
Röstel, Lennart | Technical University of Munich |
Winkelbauer, Dominik | DLR |
Pitz, Johannes | Technical University of Munich |
Sievers, Leon | German Aerospace Center |
Bäuml, Berthold | Technical University of Munich |
Keywords: Deep Learning in Grasping and Manipulation, In-Hand Manipulation, Dexterous Manipulation
Abstract: In-hand manipulation and grasping are fundamental yet often separately addressed tasks in robotics. For deriving in-hand manipulation policies, reinforcement learning has recently shown great success. However, the derived controllers are not yet useful in real-world scenarios because they often require a human operator to place the objects in suitable initial (grasping) states. Finding stable grasps that also promote the desired in-hand manipulation goal is an open problem. In this work, we propose a method for bridging this gap by leveraging the critic network of a reinforcement learning agent trained for in-hand manipulation to score and select initial grasps. Our experiments show that this method significantly increases the success rate of in-hand manipulation without requiring additional training. We also present an implementation of a full grasp manipulation pipeline on a real-world system, enabling autonomous grasping and reorientation even of unwieldy objects.
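Scoring candidate grasps with a manipulation critic and keeping the best one can be sketched as below; the critic here is a hand-written stand-in rather than a trained value network, and the grasp features are hypothetical.

    import numpy as np

    def critic_value(grasp, manipulation_goal):
        """Stand-in for the RL critic Q(s, a): in the paper this is the trained
        in-hand manipulation value network evaluated at the candidate grasp state.
        Here we simply favor grasps whose approach yaw is aligned with the goal."""
        return float(grasp["quality"] - 0.5 * abs(grasp["yaw"] - manipulation_goal["yaw"]))

    def select_initial_grasp(candidates, manipulation_goal):
        """Score every stable grasp candidate with the critic and keep the best."""
        scores = [critic_value(g, manipulation_goal) for g in candidates]
        return candidates[int(np.argmax(scores))], max(scores)

    candidates = [{"quality": 0.9, "yaw": 1.2},
                  {"quality": 0.7, "yaw": 0.1},
                  {"quality": 0.75, "yaw": 0.3}]
    goal = {"yaw": 0.0}   # reorientation target
    best, score = select_initial_grasp(candidates, goal)
    print(best, round(score, 2))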
|
|
08:50-08:55, Paper ThAT16.5 | |
Bring Your Own Grasp Generator: Leveraging Robot Grasp Generation for Prosthetic Grasping |
|
Stracquadanio, Giuseppe | Italian Institute of Technology |
Vasile, Federico | Istituto Italiano Di Tecnologia |
Maiettini, Elisa | Humanoid Sensing and Perception, Istituto Italiano Di Tecnologia |
Boccardo, Nicolò | IIT - Istituto Italiano Di Tecnologia |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Deep Learning in Grasping and Manipulation, Sensor Fusion, Prosthetics and Exoskeletons
Abstract: One of the most important research challenges in upper-limb prosthetics is enhancing the user-prosthesis communication to closely resemble the experience of a natural limb. As prosthetic devices become more complex, users often struggle to control the additional degrees of freedom. In this context, leveraging shared-autonomy principles can significantly improve the usability of these systems. In this paper, we present a novel eye-in-hand prosthetic grasping system that follows these principles. Our system initiates the approach-to-grasp action based on user's command and automatically configures the DoFs of a prosthetic hand. First, it reconstructs the 3D geometry of the target object without the need of a depth camera. Then, it tracks the hand motion during the approach-to-grasp action and finally selects a candidate grasp configuration according to user's intentions. We deploy our system on the Hannes prosthetic hand and test it on able-bodied subjects and amputees to validate its effectiveness. We compare it with a multi-DoF prosthetic control baseline and find that our method enables faster grasps, while simplifying the user experience. Code and demo videos are available online at this https URL.
|
|
ThAT17 |
405 |
Localization 5 |
Regular Session |
Chair: Lu, Guoyu | University of Georgia |
Co-Chair: Jiao, Jianhao | University College London |
|
08:30-08:35, Paper ThAT17.1 | |
AIR-HLoc: Adaptive Retrieved Images Selection for Efficient Visual Localisation |
|
Liu, Changkun | The Hong Kong University of Science and Technology |
Jiao, Jianhao | University College London |
Huang, Huajian | The Hong Kong University of Science and Technology |
Ma, Zhengyang | The Hong Kong University of Science and Technology |
Kanoulas, Dimitrios | University College London |
Braud, Tristan | HKUST |
Keywords: Localization, SLAM, Visual Learning
Abstract: State-of-the-art hierarchical localisation pipelines (HLoc) employ image retrieval (IR) to establish 2D-3D correspondences by selecting the top-k most similar images from a reference database. While increasing k improves localisation robustness, it also linearly increases computational cost and runtime, creating a significant bottleneck. This paper investigates the relationship between global and local descriptors, showing that greater similarity between the global descriptors of query and database images increases the proportion of feature matches. Low similarity queries significantly benefit from increasing k, while high similarity queries rapidly experience diminishing returns. Building on these observations, we propose an adaptive strategy that adjusts k based on the similarity between the query's global descriptor and those in the database, effectively mitigating the feature-matching bottleneck. Our approach reduces computational costs and processing time without sacrificing accuracy. Experiments on three indoor and outdoor datasets show that AIR-HLoc reduces feature matching time by up to 30% while preserving state-of-the-art accuracy. The results demonstrate that AIR-HLoc facilitates a latency-sensitive localisation system.
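The adaptive choice of k from global-descriptor similarity could be implemented as a simple thresholded rule, sketched below with illustrative thresholds and random descriptors rather than the paper's tuned values.

    import numpy as np

    def adaptive_top_k(query_desc, db_descs, k_small=5, k_large=20, thresholds=(0.7, 0.5)):
        """Retrieve fewer database images for 'easy' queries (high global-descriptor
        similarity) and more for 'hard' ones, in the spirit of the adaptive strategy
        described above; the threshold values here are illustrative."""
        q = query_desc / np.linalg.norm(query_desc)
        db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
        sims = db @ q                                  # cosine similarity to every DB image
        best = float(sims.max())
        if best >= thresholds[0]:
            k = k_small                                # confident query: few images suffice
        elif best >= thresholds[1]:
            k = (k_small + k_large) // 2
        else:
            k = k_large                                # low similarity: retrieve more images
        top_idx = np.argsort(-sims)[:k]
        return top_idx, k

    rng = np.random.default_rng(0)
    db = rng.normal(size=(100, 128))
    query = db[3] + 0.05 * rng.normal(size=128)        # near-duplicate of DB image 3
    idx, k = adaptive_top_k(query, db)
    print("retrieved", k, "images; best match index:", idx[0])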
|
|
08:35-08:40, Paper ThAT17.2 | |
NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features |
|
Zhai, Hongjia | Zhejiang University |
Boming, Zhao | Zhejiang University |
Li, Hai | Zhejiang University |
Pan, Xiaokun | Zhejiang University |
He, Yijia | TCL RayNeo |
Cui, Zhaopeng | Zhejiang University |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Localization, Mapping, RGB-D Perception
Abstract: Recently, neural radiance fields (NeRF) have gained significant attention in the field of visual localization. However, existing NeRF-based approaches either lack geometric constraints or require extensive storage for feature matching, limiting their practical applications. To address these challenges, we propose an efficient and novel visual localization approach based on the neural implicit map with complementary features. Specifically, to enforce geometric constraints and reduce storage requirements, we implicitly learn a 3D keypoint descriptor field, avoiding the need to explicitly store point-wise features. To further address the semantic ambiguity of descriptors, we introduce additional semantic contextual feature fields, which enhance the quality and reliability of 2D-3D correspondences. Besides, we propose descriptor similarity distribution alignment to minimize the domain gap between 2D and 3D feature spaces during matching. Finally, we construct the matching graph using both complementary descriptors and contextual features to establish accurate 2D-3D correspondences for 6-DoF pose estimation. Compared with the recent NeRF-based approach, our method achieves a 3x faster training speed and a 45x reduction in model storage. Extensive experiments on two widely used datasets demonstrate that our approach outperforms or is highly competitive with other state-of-the-art NeRF-based visual localization methods.
|
|
08:40-08:45, Paper ThAT17.3 | |
LiftFeat: 3D Geometry-Aware Local Feature Matching |
|
Liu, Yepeng | Wuhan University |
Lai, Wenpeng | SFMAP Technology |
Zhao, Zhou | Central China Normal University |
Xiong, Yuxuan | Wuhan University |
Zhu, Jinchi | Wuhan University |
Cheng, Jun | Institute for Infocomm Research, A*STAR |
Xu, Yongchao | Wuhan University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Localization
Abstract: Robust and efficient local feature matching plays a crucial role in applications such as SLAM and visual localization for robotics. Despite great progress, it is still very challenging to extract robust and discriminative visual features in scenarios with drastic lighting changes, low texture areas, or repetitive patterns. In this paper, we propose a new lightweight network called LiftFeat, which lifts the robustness of raw descriptor by aggregating 3D geometric feature. Specifically, we first adopt a pre-trained monocular depth estimation model to generate pseudo surface normal label, supervising the extraction of 3D geometric feature in terms of predicted surface normal. We then design a 3D geometry-aware feature lifting module to fuse surface normal feature with raw 2D descriptor feature. Integrating such 3D geometric feature enhances the discriminative ability of 2D feature description in extreme conditions. Extensive experimental results on relative pose estimation, homography estimation, and visual localization tasks, demonstrate that our LiftFeat outperforms some lightweight state-of-the-art methods. Code will be released at : https://github.com/lyp-deeplearning/LiftFeat.
|
|
08:45-08:50, Paper ThAT17.4 | |
DVS-Aware Visual Perception for Mobile Robots with Neuromorphic Hardware |
|
Zhong, Hanzhong | Tsinghua University |
Jin, YingJie | Lenovo Research |
Li, Guangbin | Lenovo Research |
Li, Xiang | Tsinghua University |
Wang, Zhepeng | Lenovo Research |
Keywords: Neurorobotics, Deep Learning for Visual Perception, Sensor-based Control
Abstract: The Dynamic Vision Sensor (DVS) is a distinctive visual sensor that exclusively responds to alterations in pixel brightness, enabling the real-time capture of swift and subtle movements with reduced power consumption and data bandwidth requirements. This paper proposes a DVS-aware visual perception method and presents its application to pose estimation of mobile robots. Specifically, a new marker is designed to provide pose reference data that leverages the inherent advantages of DVS more effectively. Moreover, we formulate a pose recognition system incorporating DVS, an algorithm based on Spiking Convolutional Neural Networks (SCNN), and a neuromorphic computing accelerator (Lynxi HS110). Such a formulation fully exploits the DVS's advantages, as its event-triggered nature matches that of SCNNs, while the neuromorphic hardware enables efficient, low-power execution, making the system highly suitable for real-time embedded applications. Comparative analysis with traditional ARcode-based pose recognition methods reveals that our innovative approach demonstrates significant advantages in recognition speed and energy efficiency. The whole system is deployed on mobile robots and evaluated in real-world scenarios.
|
|
08:50-08:55, Paper ThAT17.5 | |
Feedback RoI Features Improve Aerial Object Detection |
|
Ren, Botao | Tsinghua University |
Xu, Botian | Tsinghua University |
Wang, Jingyi | Tsinghua University |
Gao, Hanwei | SAIC AILab |
Yu, Qiankun | Tsinghua University |
Deng, Zhidong | Tsinghua University |
Keywords: Object Detection, Segmentation and Categorization, Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: Research in visual perception has shown that the human visual system utilizes high-level feedback information to guide lower-level processing, enabling adaptation to signals of varying characteristics. Inspired by this, we propose the Feedback multi-Level feature EXtractor (Flex) to dynamically adjust feature selection in object detection based on image-wise and instance-level feedback information. This is particularly beneficial for applications such as aerial object detection, UAV-based target recognition and autonomous vehicle navigation, where global image quality issues like sensor degradation, foggy, or rainy conditions can impact detection performance. Flex adapts to variations in image quality, refining the feature extraction process to improve robustness against these challenges. Experimental results demonstrate that Flex consistently enhances a range of state-of-the-art methods on challenging aerial object detection datasets, including DOTA-v1.0, DOTA-v1.5, and HRSC2016. Furthermore, additional experiments on MS COCO confirm the module's effectiveness in general object detection tasks. Our quantitative and qualitative analyses reveal that the improvements are strongly correlated with image quality, aligning with our original motivation to address global image quality issues in real-world scenarios.
|
|
08:55-09:00, Paper ThAT17.6 | |
Keypoint Detection and Description for Raw Bayer Images |
|
Lin, Jiakai | University of Georgia |
Zhang, Jinchang | University of Georgia |
Lu, Guoyu | University of Georgia |
Keywords: Visual Tracking, Vision-Based Navigation, Visual Learning
Abstract: Keypoint detection and local feature description are fundamental tasks in robotic perception, critical for applications such as SLAM, robot localization, feature matching, pose estimation, and 3D mapping. While existing methods predominantly operate on RGB images, we propose a novel network that directly processes raw images, bypassing the need for the Image Signal Processor (ISP). This approach significantly reduces hardware requirements and memory consumption, which is crucial for robotic vision systems. Our method introduces two custom-designed convolutional kernels capable of performing convolutions directly on raw images, preserving inter-channel information without converting to RGB. Experimental results show that our network outperforms existing algorithms on raw images, achieving higher accuracy and stability under large rotations and scale variations. This work represents the first attempt to develop a keypoint detection and feature description network specifically for raw images, offering a more efficient solution for resource-constrained environments.
|
|
ThAT18 |
406 |
Planning under Uncertainty 1 |
Regular Session |
Chair: Tariq, Faizan M. | Honda Research Institute USA, Inc |
Co-Chair: Kennedy, Monroe | Stanford University |
|
08:30-08:35, Paper ThAT18.1 | |
Delayed-Decision Motion Planning in the Presence of Multiple Predictions |
|
Isele, David | University of Pennsylvania, Honda Research Institute USA |
Anon, Alexandre Miranda | Honda Research Institute USA |
Tariq, Faizan M. | Honda Research Institute USA, Inc |
Yeh, Zheng-Hang | Honda Research Institute |
Singh, Avinash | Honda Research Institute, USA |
Bae, Sangjae | Honda Research Institute, USA |
Keywords: Motion and Path Planning, Planning under Uncertainty, Autonomous Agents
Abstract: Reliable automated driving technology is challenged by various sources of uncertainties, in particular, behavioral uncertainties of traffic agents. It is not uncommon for traffic agents to hold multiple possible intentions, each followed by a distinguishable maneuver, and the automated driving car must account for this uncertainty. This paper formalizes a behavior planning scheme in the presence of multiple possible futures with corresponding probabilities. In essence, we present a maximum entropy formulation and show how, under certain assumptions, this allows delayed decision-making to improve safety. The general formulation is then turned into a model predictive control formulation, which is solved as a quadratic program or a set of quadratic programs. We discuss implementation details for improving computation and present validation results in simulation and on a mobile robot.
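One hedged reading of the maximum entropy treatment of multiple predicted futures is an entropy-regularized (soft-minimum) aggregation of per-branch plan costs, sketched below; this is an illustrative surrogate for intuition, not the paper's exact MPC objective.

    import numpy as np

    def soft_min_cost(branch_costs, branch_probs, beta=5.0):
        """Entropy-regularized aggregation of per-prediction plan costs:
        beta -> 0 recovers the expected cost, beta -> inf the best-case cost.
        A moderate beta keeps several futures 'alive', which is what allows
        the planner to delay committing to a single predicted intention."""
        c = np.asarray(branch_costs, float)
        p = np.asarray(branch_probs, float)
        return -np.log(np.sum(p * np.exp(-beta * c))) / beta

    # Two predicted futures for another driver: yields (cheap plan) or cuts in (costly plan).
    costs, probs = [1.0, 4.0], [0.6, 0.4]
    for beta in (0.1, 5.0, 50.0):
        print(f"beta={beta:>4}: aggregated cost = {soft_min_cost(costs, probs, beta):.3f}")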
|
|
08:35-08:40, Paper ThAT18.2 | |
Stochastic Trajectory Prediction under Unstructured Constraints |
|
Ma, Hao | Institute of Automation, Chinese Academy of Sciences |
Pu, Zhiqiang | University of Chinese Academy of Sciences; Institute of Automati |
Wang, Shijie | Institute of Automation, Chinese Academy of Sciences |
Liu, Boyin | University of Chinese Academy of Sciences School of Artificial I |
Wang, Huimu | University of Chinese Academy of Sciences |
Liang, Yanyan | Macau University of Science and Technology |
Yi, Jianqiang | Chinese Academy of Sciences |
Keywords: Constrained Motion Planning, Motion and Path Planning, Task and Motion Planning
Abstract: Trajectory prediction facilitates effective planning and decision-making, while constrained trajectory prediction integrates regulation into prediction. Recent advances in constrained trajectory prediction focus on structured constraints by constructing optimization objectives. However, handling unstructured constraints is challenging due to the lack of differentiable formal definitions. To address this, we propose a novel method for constrained trajectory prediction using a conditional generative paradigm, named Controllable Trajectory Diffusion (CTD). The key idea is that any trajectory corresponds to a degree of conformity to a constraint. By quantifying this degree and treating it as a condition, a model can implicitly learn to predict trajectories under unstructured constraints. CTD employs a pre-trained scoring model to predict the degree of conformity (i.e., a score), and uses this score as a condition for a conditional diffusion model to generate trajectories. Experimental results demonstrate that CTD achieves high accuracy on the ETH/UCY and SDD benchmarks. Qualitative analysis confirms that CTD ensures adherence to unstructured constraints and can predict trajectories that satisfy combinatorial constraints.
|
|
08:40-08:45, Paper ThAT18.3 | |
A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps |
|
Chen, Timothy | Stanford University |
Swann, Aiden | Stanford |
Yu, Javier | Stanford University |
Shorinwa, Ola | Stanford University |
Murai, Riku | Imperial College London |
Kennedy, Monroe | Stanford University |
Schwager, Mac | Stanford University |
Keywords: Collision Avoidance, Robot Safety, Mapping
Abstract: SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Of the total compute time, a small fraction of it consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at https://chengine.github.io/safer-splat.
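A minimal single-integrator version of such a CBF action filter, assuming spherical obstacle primitives in place of Gaussians and using scipy's SLSQP as a generic stand-in for a QP solver: the desired command is modified as little as possible so that the barrier condition holds for every obstacle.

    import numpy as np
    from scipy.optimize import minimize

    def cbf_filter(x, u_des, centers, radii, alpha=1.0):
        """Minimally invasive safety filter for single-integrator dynamics:
        solve  min ||u - u_des||^2  s.t.  2(x - c_i)^T u + alpha * h_i(x) >= 0,
        where h_i(x) = ||x - c_i||^2 - r_i^2 is a barrier per obstacle primitive
        (a Gaussian in the GSplat map, approximated here by a sphere)."""
        cons = []
        for c, r in zip(centers, radii):
            h = np.dot(x - c, x - c) - r**2
            grad = 2.0 * (x - c)
            cons.append({"type": "ineq",
                         "fun": lambda u, g=grad, h=h: g @ u + alpha * h})
        res = minimize(lambda u: np.sum((u - u_des)**2), u_des, constraints=cons)
        return res.x

    x = np.array([0.0, 0.0, 1.0])                 # robot position
    u_des = np.array([1.0, 0.0, 0.0])             # pilot command: fly straight at obstacle
    centers = [np.array([1.5, 0.0, 1.0])]         # one spherical obstacle ahead
    radii = [0.5]
    print("filtered command:", np.round(cbf_filter(x, u_des, centers, radii), 3))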
|
|
08:45-08:50, Paper ThAT18.4 | |
A Skeleton-Based Topological Planner for Exploration in Complex Unknown Environments |
|
Niu, Haochen | Shanghai Jiao Tong University |
Ji, Xingwu | Shanghai Jiao Tong University |
Zhang, Lantao | Shanghai Jiao Tong University |
Wen, Fei | Shanghai Jiao Tong University |
Ying, Rendong | Shanghai Jiao Tong University |
Liu, Peilin | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Reactive and Sensor-Based Planning
Abstract: The capability of autonomous exploration in complex, unknown environments is important in many robotic applications. While recent research on autonomous exploration has achieved much progress, there are still limitations, e.g., existing methods relying on greedy heuristics or optimal path planning are often hindered by repetitive paths and high computational demands. To address such limitations, we propose a novel exploration framework that utilizes the global topology information of the observed environment to improve exploration efficiency while reducing computational overhead. Specifically, global information is utilized based on a skeletal topological graph representation of the environment geometry. We first propose an incremental skeleton extraction method based on wavefront propagation, based on which we then design an approach to generate a lightweight topological graph that can effectively capture the environment's structural characteristics. Building upon this, we introduce a finite state machine that leverages the topological structure to efficiently plan coverage paths, which can substantially mitigate the back-and-forth maneuvers (BFMs) problem. Experimental results demonstrate the superiority of our method in comparison with state-of-the-art methods. The source code will be made publicly available at: https://github.com/Haochen-Niu/STGPlanner.
|
|
08:50-08:55, Paper ThAT18.5 | |
Safety-Critical Online Quadrotor Trajectory Planner for Agile Flights in Unknown Environments |
|
Yuan, Jiazhe | Zhejiang University |
Cao, Dongcheng | Zhejiang University |
Mei, Jiahao | Zhejiang University of Technology |
Chen, Jiming | Zhejiang University |
Li, Shuo | Zhejiang University |
Keywords: Motion and Path Planning, Collision Avoidance, Aerial Systems: Mechanics and Control
Abstract: Autonomous high-speed flight in unknown, cluttered environments is essential for a variety of quadrotor applications, such as inspection, search, and rescue. In this study, we propose a novel trajectory planner designed to achieve efficient, high-speed, collision-free flights in such environments. The proposed approach begins by generating a safe flight corridor based on the path found by Lazy Theta*, representing the safe regions with polytopic sets. These sets are then used to define a discrete-time control barrier function (DCBF), ensuring the quadrotor stays within safe bounds during flight. By selecting a single point on the path ahead of the quadrotor as the next waypoint, the trajectory is optimized by considering both the total flight time and safety constraints. Extensive simulations and real-world experiments have confirmed our method's feasibility, demonstrating its capability for high-speed performance and reliable obstacle avoidance.
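The generic discrete-time CBF condition referenced above is h(x_{k+1}) >= (1 - gamma) h(x_k) with gamma in (0, 1]. A minimal feasibility check for a polytopic safe region follows (hypothetical names and numbers; not the authors' corridor generator or optimizer):

    import numpy as np

    def polytope_barrier(p, A, b):
        """h(p) = min_i (b_i - a_i^T p), positive inside the polytope {A p <= b}."""
        return float(np.min(b - A @ p))

    def dcbf_satisfied(p_k, p_next, A, b, gamma=0.3):
        """Discrete-time CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k)."""
        return polytope_barrier(p_next, A, b) >= (1.0 - gamma) * polytope_barrier(p_k, A, b)

    A = np.vstack([np.eye(3), -np.eye(3)])   # unit box as the safe polytope
    b = np.ones(6)
    print(dcbf_satisfied(np.zeros(3), np.array([0.2, 0.0, 0.0]), A, b))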
|
|
08:55-09:00, Paper ThAT18.6 | |
Anytime Replanning of Robot Coverage Paths for Partially Unknown Environments |
|
Ramesh, Megnath | University of Waterloo |
Imeson, Frank | Avidbots |
Fidan, Baris | University of Waterloo |
Smith, Stephen L. | University of Waterloo |
Keywords: Coverage Path Planning, Motion and Path Planning, Reactive and Sensor-Based Planning, Service Robots
Abstract: In this paper, we propose a method to replan coverage paths for a robot operating in an environment with initially unknown static obstacles. Existing coverage approaches reduce coverage time by covering along the minimum number of coverage lines (straight-line paths). However, recomputing such paths online can be computationally expensive, resulting in robot stoppages that increase coverage time. A naive alternative is greedy detour replanning, i.e., replanning with minimum deviation from the initial path, which is efficient to compute but may result in unnecessary detours. In this work, we propose an anytime coverage replanning approach named OARP-Replan that performs near-optimal replanning of an interrupted coverage path within a given time budget. We do this by solving linear relaxations of integer linear programs (ILPs) to identify sections of the interrupted path that can be optimally replanned within the time budget. We validate OARP-Replan in simulation and perform comparisons against a greedy detour replanner and other state-of-the-art coverage planners. We also demonstrate OARP-Replan in experiments using an industrial-level autonomous robot.
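To illustrate the general idea of relaxing an ILP (the cost vector and constraints below are invented and are not OARP-Replan's formulation), one can drop the integrality requirement, solve the resulting LP, and use its value as a fast lower bound when deciding what to replan:

    import numpy as np
    from scipy.optimize import linprog

    # toy set-cover-style ILP: minimize c^T x  subject to  A x >= 1,  x in {0, 1}
    c = np.array([3.0, 2.0, 4.0])
    A = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1]], dtype=float)

    # LP relaxation: replace x in {0, 1} with 0 <= x <= 1 (linprog expects A_ub x <= b_ub)
    res = linprog(c, A_ub=-A, b_ub=-np.ones(3), bounds=[(0, 1)] * 3, method="highs")
    print("LP lower bound:", res.fun, "fractional solution:", res.x)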
|
|
ThAT19 |
407 |
Tactile Sensing 3 |
Regular Session |
Chair: She, Yu | Purdue University |
Co-Chair: Hipwell, M Cynthia | Texas A&M University |
|
08:30-08:35, Paper ThAT19.1 | |
LeTac-MPC: Learning Model Predictive Control for Tactile-Reactive Grasping |
|
Xu, Zhengtong | Purdue University |
She, Yu | Purdue University |
Keywords: Force and Tactile Sensing, Grasping, Perception for Grasping and Manipulation, Sensor-based Control
Abstract: Grasping is a crucial task in robotics, necessitating tactile feedback and reactive grasping adjustments for robust grasping of objects under various conditions and with differing physical properties. In this article, we introduce LeTac-MPC, a learning-based model predictive control (MPC) for tactile-reactive grasping. Our approach enables the gripper to grasp objects with different physical properties in dynamic and force-interactive tasks. We utilize a vision-based tactile sensor, GelSight (Yuan et al. 2017), which is capable of perceiving high-resolution tactile feedback that contains information on the physical properties and states of the grasped object. LeTac-MPC incorporates a differentiable MPC layer designed to model the embeddings extracted by a neural network from tactile feedback. This design facilitates convergent and robust grasping control at a frequency of 25 Hz. We propose a fully automated data collection pipeline and collect a dataset using only standardized blocks with different physical properties. Nevertheless, our trained controller generalizes to daily objects with different sizes, shapes, materials, and textures. The experimental results demonstrate the effectiveness and robustness of the proposed approach. We compare LeTac-MPC with two purely model-based tactile-reactive controllers (MPC and PD) and open-loop grasping. Our results show that LeTac-MPC achieves the best performance in dynamic and force-interactive tasks as well as the best generalizability.
|
|
08:35-08:40, Paper ThAT19.2 | |
The Role of Tactile Sensing for Learning Reach and Grasp |
|
Zhang, Boya | University of Tübingen |
Andrussow, Iris | Max-Planck-Institute for Intelligent Systems |
Zell, Andreas | University of Tübingen |
Martius, Georg | Max Planck Institute for Intelligent Systems |
Keywords: Reinforcement Learning, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: Stable and robust robotic grasping is essential for current and future robot applications. In recent works, the use of large datasets and supervised learning has enhanced speed and precision in antipodal grasping. However, these methods struggle with perception and calibration errors due to large planning horizons. To obtain more robust and reactive grasping motions, leveraging reinforcement learning combined with tactile sensing is a promising direction. Yet, there is no systematic evaluation of how the complexity of force-based tactile sensing affects the learning behavior for grasping tasks. This paper compares various tactile and environmental setups using two model-free reinforcement learning approaches for antipodal grasping. Our findings suggest that under imperfect visual perception, various tactile features improve learning outcomes, while complex tactile inputs complicate training.
|
|
08:40-08:45, Paper ThAT19.3 | |
Task-Specific Embodied Tactile Sensing for Dexterous Hand |
|
Wei, Qi | Nanchang University |
Xiong, Pengwen | Nanchang University |
Song, Aiguo | Southeast University |
Li, Qiang | Shenzhen Technology University |
Keywords: Haptics and Haptic Interfaces, Embodied Cognitive Science, Behavior-Based Systems
Abstract: To obtain good tactile sensing, traditional dexterous hands keep all of their installed sensing units enabled all the time, even when only a few sensor units are actually used, which makes the tactile sensing system wasteful of resources and energy. To reduce this complexity by placing tactile sensing units only at critical locations, this work proposes an embodied tactile dexterous hand (ET-Hand) and a novel multimodal sensor placement framework that learns multiple tasks to generate optimal placement proposals. Furthermore, our ET-Hand can dynamically adjust the positions, types, and numbers of the perceived tactile sensors during robotic manipulation, providing novel tools and methods for investigating the tactile channels and placement scale required for robot exploration. In object recognition and slip detection tasks, the results show that our proposed method performs close to, or even better than, the traditional large-scale placement approach.
|
|
08:45-08:50, Paper ThAT19.4 | |
TacDiffusion: Force-Domain Diffusion Policy for Precise Tactile Manipulation |
|
Wu, Yansong | Technische Universität München |
Chen, Zongxie | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Zhang, Liding | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Swikir, Abdalla | Mohamed Bin Zayed University of Artificial Intelligence |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Dexterous Manipulation, Assembly, Learning from Demonstration
Abstract: Assembly is a crucial skill for robots in both modern manufacturing and service robotics. However, mastering transferable insertion skills that can handle a variety of high-precision assembly tasks remains a significant challenge. This paper presents a novel framework that utilizes diffusion models to generate 6D wrenches for high-precision tactile robotic insertion tasks. It learns from demonstrations performed on a single task and achieves a zero-shot transfer success rate of 95.7% across various novel high-precision tasks. Our method effectively inherits the self-adaptability demonstrated by our previous work. In this framework, we address the frequency misalignment between the diffusion policy and the real-time control loop with a dynamic system-based filter, significantly improving the task success rate by 9.15%. Furthermore, we provide a practical guideline regarding the trade-off between diffusion models' inference ability and speed.
|
|
08:50-08:55, Paper ThAT19.5 | |
UpViTaL: Unpaired Visual-Tactile Self-Supervised Representation Learning for Dexterous Robotic Manipulation |
|
Han, Guwen | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Cui, Yu | Zhejiang University |
Chen, Anjun | Zhejiang University |
Chen, Jiming | Zhejiang University |
Ye, Qi | Zhejiang University |
Keywords: Dexterous Manipulation, Representation Learning, Reinforcement Learning
Abstract: Visual and tactile pretraining have been extensively studied in dexterous robot manipulation tasks. However, existing methods typically require the simultaneous acquisition of visual and tactile data, making it difficult to utilize low-cost, unpaired visual-tactile datasets. Moreover, these methods often rely on tactile sensors to provide input data for reinforcement learning (RL) during the physical deployment of robotic dexterous hands, which greatly increases deployment costs. To address these challenges, we propose UpViTaL, an unpaired visual-tactile self-supervised representation learning method for RL-based robot dexterous manipulation. Specifically, we collect low-cost unpaired visual and tactile datasets for manipulation skill learning using a camera and tactile gloves on three robot manipulation tasks. The temporal tactile self-supervised representation learning module of UpViTaL is used to explore efficient tactile representations from time-series tactile data. In parallel, the visual pretraining module of UpViTaL helps to extract efficient visual representations from visual data. In addition, we fuse unpaired visual-tactile representations through an RL reward mechanism, which does not require tactile sensors on the robotic dexterous hand for practical deployment. We validate our approach on three dexterous robot manipulation tasks. Experimental results demonstrate that UpViTaL can efficiently learn robot manipulation skills. Compared to existing approaches for visual pretraining, our method significantly improves the success rate by more than 30%.
|
|
ThAT20 |
408 |
Acceptability and Trust |
Regular Session |
Chair: de Graaf, Maartje | Utrecht University |
Co-Chair: Doshi, Prashant | University of Georgia |
|
08:30-08:35, Paper ThAT20.1 | |
Trust-Preserved Human-Robot Shared Autonomy Enabled by Bayesian Relational Event Modeling |
|
Li, Yingke | Massachusetts Institute of Technology |
Zhang, Fumin | Hong Kong University of Science and Technology |
Keywords: Acceptability and Trust, Human-Robot Teaming, Probability and Statistical Methods
Abstract: Shared autonomy functions as a flexible framework that empowers robots to operate across a spectrum of autonomy levels, allowing for efficient task execution with minimal human oversight. However, humans might be intimidated by the autonomous decision-making capabilities of robots due to perceived risks and a lack of trust. This paper proposes a trust-preserved shared autonomy strategy that allows robots to seamlessly adjust their autonomy level, striving to optimize team performance and enhance their acceptance among human collaborators. By enhancing the relational event modeling framework with Bayesian learning techniques, this paper enables dynamic inference of human trust based solely on time-stamped relational events communicated within human-robot teams. Adopting a longitudinal perspective on trust development and calibration in human-robot teams, the proposed trust-preserved shared autonomy strategy enables robots to actively establish, maintain, and repair human trust, rather than merely passively adapting to it. We validate the effectiveness of the proposed approach through a user study on a human-robot collaborative search and rescue scenario. The objective and subjective evaluations demonstrate its merits in both task execution and user acceptability over the baseline approach that does not consider the preservation of trust.
|
|
08:35-08:40, Paper ThAT20.2 | |
Fostering Trust through Gesture and Voice-Controlled Robot Trajectories in Industrial Human-Robot Collaboration |
|
Campagna, Giulio | Aalborg University |
Frommel, Christoph | German Aerospace Center |
Haase, Tobias | German Aerospace Center (DLR) |
Gottardi, Alberto | University of Padova |
Villagrossi, Enrico | Italian National Research Council |
Chrysostomou, Dimitrios | Aalborg University |
Rehm, Matthias | Aalborg University |
Keywords: Human Factors and Human-in-the-Loop, Acceptability and Trust, Human-Robot Collaboration
Abstract: In the Industry 5.0 era, the focus shifts from basic automation to fostering collaboration between humans and robots. Trust is crucial in this new paradigm, enabling smooth interaction, especially for users with limited robotics knowledge. This study presents a novel framework that uses human hand gestures and voice commands to control robot movements, aiming to enhance trust, reduce cognitive workload, and minimize task execution time—key for efficient manufacturing. In automated systems, swift completion of micromanagement tasks is essential to prevent process disruption. To evaluate this framework, we devised a testbed scenario within an automated carbon fiber transportation and draping process, focusing on a maintenance task as the micromanagement challenge. Participants inspected the gripper, guided the robot along a defined path, and performed maintenance, such as attaching cables. Two conditions were tested: gestures and voice commands versus a smartPAD. The results showed that gestures and voice commands increased trust, lowered cognitive load, and shortened execution times, improving overall manufacturing efficiency.
|
|
08:40-08:45, Paper ThAT20.3 | |
Would You Trust Me Now? a Study on Trust Repair Strategies in Human-Robot Collaboration |
|
Mélot-Chesnel, Joséphine | Utrecht University |
de Graaf, Maartje | Utrecht University |
Keywords: Acceptability and Trust, Design and Human Factors, Human-Robot Collaboration
Abstract: As robots are prone to making errors that undermine trust, effective trust repair strategies are essential for human-robot collaboration. Our lab study evaluates three trust repair strategies --apology, denial, and compensation-- following two types of trust violations: competence-based and integrity-based. Consistent with prior research, integrity-based violations reduced moral trust more, while competence-based violations impacted performance trust. Denial caused greater discomfort than apology or compensation across both violation types. Dispositional trust influenced the effectiveness of repair strategies, particularly in willingness to engage and re-engage. Notably, individuals with high dispositional trust were more receptive to apologies. These findings underscore the need to consider individual trust differences, suggesting robots should assess human trust disposition to effectively foster continued collaboration.
|
|
08:45-08:50, Paper ThAT20.4 | |
Using Physiological Measures, Gaze, and Facial Expressions to Model Human Trust in a Robot Partner |
|
Green, Haley N. | University of Virginia |
Iqbal, Tariq | University of Virginia |
Keywords: Acceptability and Trust
Abstract: With robots becoming increasingly prevalent in various domains, it has become crucial to equip them with tools to achieve greater fluency in interactions with humans. One of the promising areas for further exploration lies in human trust. A real-time, objective model of human trust could be used to maximize productivity, preserve safety, and mitigate failure. In this work, we attempt to use physiological measures, gaze, and facial expressions to model human trust in a robot partner. We are the first to design an in-person, human-robot supervisory interaction study to create a dedicated trust dataset. Using this dataset, we train machine learning algorithms to identify the objective measures that are most indicative of trust in a robot partner, advancing trust prediction in human-robot interactions. Our findings indicate that a combination of sensor modalities (blood volume pulse, electrodermal activity, skin temperature, and gaze) can enhance the accuracy of detecting human trust in a robot partner. Furthermore, the Extra Trees, Random Forest, and Decision Trees classifiers exhibit consistently better performance in measuring the person's trust in the robot partner. These results lay the groundwork for constructing a real-time trust model for human-robot interaction, which could foster more efficient interactions between humans and robots.
|
|
08:50-08:55, Paper ThAT20.5 | |
A Novel Computational Framework of Robot Trust for Human-Robot Teams |
|
Nare, Bhavana | University of Georgia |
Frericks, John Bradley | University of Georgia |
Challa, Anusha | University of Georgia |
Doshi, Prashant | University of Georgia |
Johnsen, Kyle | University of Georgia |
Keywords: Acceptability and Trust, Human-Robot Teaming
Abstract: When humans collaborate, they form positive or negative experiences with each other. These experiences depend on various factors such as the individual's skills, abilities, and agency. In this paper, we consider human-robot collaborations and present a novel model of an autonomous robot's trust in humans based on the probability of the robot having a positive experience with the human. The model defines a dynamic trust-building process that translates into a computationally-accessible implementation. We hypothesize predictors of a positive experience with human teammates and derive trust in individual humans. As the interactions continue, team members develop an affinity toward each other. The robot's affinity towards humans can be viewed as kinship, and we also investigate how kinship affects trust and distrust. We present an algorithm for how the robot may use kinship-mediated trust in its decision-making, and demonstrate its use in simulated missions truly requiring human-robot collaboration.
|
|
08:55-09:00, Paper ThAT20.6 | |
Modeling Trust Dynamics in Robot-Assisted Delivery: Impact of Trust Repair Strategies |
|
Mangalindan, Dong Hae | Michigan State University |
Kandikonda, Karthik | Michigan State University |
Rovira, Ericka | United States Military Academy, West Point, NY |
Srivastava, Vaibhav | Michigan State University |
Keywords: Acceptability and Trust, Human-Robot Collaboration, Design and Human Factors
Abstract: With increasing efficiency and reliability, autonomous systems are becoming valuable assistants to humans in various tasks. In the context of robot-assisted delivery, we investigate how robot performance and trust repair strategies impact human trust. In this task, humans can choose to either send the robot to deliver autonomously or manually control it while handling a secondary task. The trust repair strategies examined include short and long explanations, apology and promise, and denial. Using data from human participants, we model human behavior using an Input-Output Hidden Markov Model (IOHMM) to capture the dynamics of trust and human action probabilities. Our findings indicate that humans are more likely to deploy the robot autonomously when their trust is high. Furthermore, state transition estimates show that long explanations are the most effective at repairing trust following a failure, while denial is most effective at preventing trust loss. We also demonstrate that the trust estimates generated by our model are isomorphic to self-reported trust values, making them interpretable. This model lays the groundwork for developing optimal policies that facilitate real-time adjustment of human trust in autonomous systems.
|
|
ThAT21 |
410 |
Manipulation Planning and Control 1 |
Regular Session |
Chair: Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Co-Chair: Pang, Tao | Boston Dynamics AI Institute |
|
08:30-08:35, Paper ThAT21.1 | |
Planning for Tabletop Object Rearrangement |
|
Hu, Jiaming | UC San Diego |
Szczekulski, Jan | University of California San Diego |
Peddabomma, Sudhansh | University of California San Diego |
Christensen, Henrik Iskov | UC San Diego |
Keywords: Manipulation Planning, Mobile Manipulation
Abstract: Finding a high-quality solution for tabletop object rearrangement planning is a challenging problem. Compared to determining a goal arrangement, rearrangement planning is challenging due to the dependencies between objects and the buffer capacity available to hold objects. Although ORLA* proposed an A*-based search strategy with lazy evaluation for finding optimal solutions, it is not scalable, with the success rate decreasing as the number of objects increases. Additionally, for noisy state representations, ORLA* provides only suboptimal solutions. To overcome these limitations, we propose an enhanced A*-based algorithm that improves state representation and employs incremental goal attempts with lazy evaluation at each iteration. This approach aims to enhance scalability while maintaining solution quality. Our evaluation demonstrates that our algorithm provides better solutions than ORLA* in a shorter time, for both stationary and mobile robots.
|
|
08:35-08:40, Paper ThAT21.2 | |
DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control |
|
Karim, Md Faizal | IIIT Hyderabad |
Bollimuntha, Shreya | International Institute of Information Technology Hyderabad |
Hashmi, Mohammed Saad | International Institute of Information Technology Hyderabad |
Das, Autrio | International Institute of Information Technology Hyderabad |
Singh, Gaurav | IIIT Hyderabad |
Sridhar, Srinath | Brown University |
Singh, Arun Kumar | University of Tartu |
Govindan, Nagamanikandan | IIITDM Kancheepuram |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Dual Arm Manipulation, Compliance and Impedance Control
Abstract: Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to perform tasks that require the coordinated use of two arms is essential for complex manipulation tasks such as handling large, complex objects, assembling components, and performing human-like interactions. However, achieving effective dual-arm manipulation is challenging due to the need for precise coordination, dynamic adaptability, and the ability to manage interaction forces between the arms and the objects being manipulated. We propose a novel pipeline that combines the advantages of policy learning based on environment feedback and gradient-based optimization to learn controller gains as well as the control outputs. This allows the robotic system to dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations. We evaluate our pipeline on a trajectory-tracking task involving a variety of large, complex objects with different masses and geometries. The performance is then compared to three other established methods for controlling dual-arm robots, demonstrating superior results.
|
|
08:40-08:45, Paper ThAT21.3 | |
Goal-Driven Robotic Pushing Manipulation under Uncertain Object Properties |
|
Lee, Yongseok | Pohang University of Science and Technology |
Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Keywords: Dexterous Manipulation, Manipulation Planning, Model Learning for Control
Abstract: Robotic pushing is one of the intuitive non-prehensile manipulation skills that can handle ungraspable objects without any complex task-specific tools. In this paper, we propose an accurate, goal-driven robotic pushing framework that can accomplish pushing tasks in practice under uncertain object properties. Building upon our prior work, we employ a model predictive path integral (MPPI) controller as the goal-driven pushing controller operating under uncertain object properties. Unlike our prior work, the proposed framework can push the object toward the goal pose without predefined trajectories. The results of the numerical experiments demonstrate that the proposed framework can accomplish the pushing task with a significantly shorter total path length, fewer total steps, and a higher success rate even though the model parameters are unknown. Moreover, real-robot demonstrations show that the proposed framework also works well in the real world.
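MPPI itself is compact enough to sketch directly; the point-mass dynamics and quadratic cost below are placeholders rather than the authors' pusher-slider model, and all parameter values are arbitrary:

    import numpy as np

    def mppi_step(x, u_nom, dynamics, cost, rng, samples=256, sigma=0.3, lam=1.0):
        """One MPPI update: perturb the nominal control sequence, roll out the
        dynamics, and re-weight the perturbations with a softmin of trajectory costs."""
        horizon, udim = u_nom.shape
        noise = rng.normal(scale=sigma, size=(samples, horizon, udim))
        costs = np.zeros(samples)
        for k in range(samples):
            xk = x.copy()
            for t in range(horizon):
                xk = dynamics(xk, u_nom[t] + noise[k, t])
                costs[k] += cost(xk)
        weights = np.exp(-(costs - costs.min()) / lam)
        weights /= weights.sum()
        return u_nom + np.einsum("k,kth->th", weights, noise)

    rng = np.random.default_rng(0)
    goal = np.array([1.0, 0.5])
    dynamics = lambda x, u: x + 0.05 * u                     # point-mass placeholder
    cost = lambda x: float(np.sum((x - goal) ** 2))
    u_nom, x = np.zeros((15, 2)), np.zeros(2)
    for _ in range(30):
        u_nom = mppi_step(x, u_nom, dynamics, cost, rng)
        x = dynamics(x, u_nom[0])                            # execute the first action
        u_nom = np.roll(u_nom, -1, axis=0); u_nom[-1] = 0.0  # receding-horizon shift
    print("final state:", x)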
|
|
08:45-08:50, Paper ThAT21.4 | |
Synthesizing Grasps and Regrasps for Complex Manipulation Tasks |
|
Patankar, Aditya | Stony Brook University |
Mahalingam, Dasharadhan | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
Keywords: Grasping, Manipulation Planning
Abstract: In complex manipulation tasks, e.g., manipulation by pivoting, the motion of the object being manipulated has to satisfy path constraints that can change during the motion. Therefore, a single grasp may not be sufficient for the entire path, and the object may need to be regrasped. Additionally, geometric data for objects from a sensor are usually available in the form of point clouds. The problem of computing grasps and regrasps from point-cloud representations of objects for complex manipulation tasks is a key problem in endowing robots with manipulation capabilities beyond pick-and-place. In this paper, we formalize the problem of grasping/regrasping for complex manipulation tasks with objects represented by (partial) point clouds and present an algorithm to solve it. We represent a complex manipulation task as a sequence of constant screw motions. Given a manipulation plan skeleton expressed as a sequence of constant screw motions, we use a grasp metric to find graspable regions on the object for every constant screw segment. The overlap of the graspable regions for contiguous screws is then used to determine when and how many times the object needs to be regrasped. We present experimental results on point cloud data collected from RGB-D sensors to illustrate our approach.
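A constant screw motion of the kind used in the plan skeleton can be written as the matrix exponential of a twist; the short numeric sketch below is generic rigid-body math (not the authors' grasp-metric code), with the unit twist chosen arbitrarily for illustration:

    import numpy as np
    from scipy.linalg import expm

    def twist_matrix(omega, v):
        """4x4 se(3) matrix of a twist (omega, v)."""
        W = np.array([[0.0, -omega[2], omega[1]],
                      [omega[2], 0.0, -omega[0]],
                      [-omega[1], omega[0], 0.0]])
        xi = np.zeros((4, 4))
        xi[:3, :3] = W
        xi[:3, 3] = v
        return xi

    # unit rotation about the z-axis through the origin, sampled along the screw
    omega, v = np.array([0.0, 0.0, 1.0]), np.zeros(3)
    for theta in (0.0, np.pi / 4, np.pi / 2):
        T = expm(twist_matrix(omega, v) * theta)    # SE(3) pose at screw parameter theta
        print(np.round(T, 3))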
|
|
08:50-08:55, Paper ThAT21.5 | |
A Helping (Human) Hand in Kinematic Structure Estimation |
|
Pfisterer, Adrian | Technische Universitaet Berlin |
Li, Xing | TU Berlin |
Mengers, Vito | Technische Universität Berlin |
Brock, Oliver | Technische Universität Berlin |
Keywords: RGB-D Perception, Probability and Statistical Methods, Learning from Demonstration
Abstract: Visual uncertainties such as occlusions, lack of texture, and noise present significant challenges in obtaining accurate kinematic models for safe robotic manipulation. We introduce a probabilistic real-time approach that leverages the human hand as a prior to mitigate these uncertainties. By tracking the constrained motion of the human hand during manipulation and explicitly modeling uncertainties in visual observations, our method reliably estimates an object’s kinematic model online. We validate our approach on a novel dataset featuring challenging objects that are occluded during manipulation and offer limited articulations for perception. The results demonstrate that by incorporating an appropriate prior and explicitly accounting for uncertainties, our method produces accurate estimates, outperforming two recent baselines by 195% and 140%, respectively. Furthermore, we demonstrate that our approach's estimates are precise enough to allow a robot to manipulate even small objects safely.
|
|
08:55-09:00, Paper ThAT21.6 | |
Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans? |
|
Shirai, Yuki | Mitsubishi Electric Research Laboratories |
Zhao, Tong | Massachusetts Institute of Technology |
Suh, Hyung Ju Terry | Massachusetts Institute of Technology |
Zhu, Huaijiang | New York University |
Ni, Xinpei | Georgia Institute of Technology |
Wang, Jiuguang | Boston Dynamics AI Institute |
Simchowitz, Max | MIT |
Pang, Tao | Boston Dynamics AI Institute |
Keywords: Dexterous Manipulation, Multi-Contact Whole-Body Motion Planning and Control, Optimization and Optimal Control
Abstract: Designing planners and controllers for contact-rich manipulation is extremely challenging as contact violates the smoothness conditions that many gradient-based controller synthesis tools assume. Contact smoothing approximates a non-smooth system with a smooth one, allowing one to use these synthesis tools more effectively. However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored. This paper analyzes the efficacy of linear controller synthesis using differentiable simulators based on contact smoothing. We introduce natural baselines for leveraging contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains to stabilize around open-loop plans. Using robotic bimanual whole-body manipulation as a testbed, we perform extensive empirical experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans.
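For context, the feedback-gain step analyzed here reduces, in its simplest form, to discrete-time LQR around a linearization of the smoothed dynamics. The sketch below finite-differences a placeholder smooth dynamics function (standing in for a contact-smoothed differentiable simulator) and solves the discrete algebraic Riccati equation; none of it is the authors' implementation:

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def smoothed_dynamics(x, u):
        """Placeholder smooth map x_{k+1} = f(x_k, u_k); a stand-in for a
        contact-smoothed differentiable simulator."""
        return x + 0.1 * np.array([x[1], u[0] - 0.5 * np.tanh(5.0 * x[0])])

    def linearize(f, x, u, eps=1e-5):
        """Finite-difference Jacobians A = df/dx and B = df/du."""
        n, m = len(x), len(u)
        A, B, f0 = np.zeros((n, n)), np.zeros((n, m)), f(x, u)
        for i in range(n):
            dx = np.zeros(n); dx[i] = eps
            A[:, i] = (f(x + dx, u) - f0) / eps
        for j in range(m):
            du = np.zeros(m); du[j] = eps
            B[:, j] = (f(x, u + du) - f0) / eps
        return A, B

    A, B = linearize(smoothed_dynamics, np.zeros(2), np.zeros(1))
    Q, R = np.eye(2), 0.1 * np.eye(1)
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # feedback law u = -K (x - x_ref)
    print("LQR gain:", K)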
|
|
ThAT22 |
411 |
Learning for Manipulation and Navigation |
Regular Session |
Chair: Sintov, Avishai | Tel-Aviv University |
Co-Chair: Kingston, Zachary | Purdue University |
|
08:30-08:35, Paper ThAT22.1 | |
Interaction-Driven Updates: 3D Scene Graph Maintenance During Robot Task Execution |
|
Li, Qingfeng | Beihang University |
Zhang, Xinlei | BUAA |
Chen, Chen | Hangzhou Innovation Institute of Beihang University |
Niu, Jianwei | Beihang University |
Zhao, Haochen | BUAA |
Keywords: Semantic Scene Understanding, Cognitive Control Architectures, Embodied Cognitive Science
Abstract: Robots powered by large language models (LLMs) demonstrate significant research and application potential by effectively interpreting scene information to respond to human commands. However, when robots rely on static scene information during task execution, they face difficulties in adapting to changes in the environment, posing a major challenge for dynamic scene perception. To address the above issues, we propose an innovative interaction-driven approach to enhance robots' ability to perceive dynamic scene information. This approach consists of two contributions: the observation point selection module and the dynamic scene maintenance module. Specifically, the robot first uses the 3D scene graph (3DSG) containing assets and objects to perceive static scene information through the LLM planner. Next, the best observation point for each asset is obtained through the observation point selection module. Then, with the help of the best observation point, the dynamic scene maintenance module interacts with the asset-related objects to dynamically update all the object node information related to the asset node. This approach enables robots to maintain dynamic scene information, enhancing their adaptability in unpredictable environments and improving task reliability. We evaluated our method using the iTHOR and RoboTHOR datasets within the AI2-THOR simulator and in real-world scenarios. Experimental results demonstrate that our method effectively and accurately maintains robots' perception of dynamic scene information.
|
|
08:35-08:40, Paper ThAT22.2 | |
ME-PATS: Mutually Enhancing Search-Based Planner and Learning-Based Agent for Tractor-Trailer Systems |
|
Fan, Ke | Tsinghua University |
Ren, Zhizhou | Tsinghua University |
Guo, Ruihan | Helixon |
Zhang, Jinpeng | Tsinghua University |
Huang, Zhuo | Tsinghua University |
Zhou, Yuan | Tsinghua University |
Zhang, Zufeng | Tsinghua University |
Keywords: Motion and Path Planning, AI-Based Methods, Integrated Planning and Learning
Abstract: Planning a kinodynamically feasible path for a tractor-trailer vehicle is challenging for both search-based and learning-based methods due to the vehicle’s unique kinematics and complex obstacles. These factors increase the likelihood of infeasible paths and exacerbate long-horizon issues. We introduce ME-PATS: a framework that mutually enhances the search-based planner and the learning-based agent for tractor-trailer systems. The search-based planner provides successful trajectories to help the learning-based agent update its policy, while the agent improves the planner’s efficiency through direct path simulation. Additionally, we propose two approaches to apply our framework to more challenging tasks: designing obstacle-aware networks to enhance the learning-based agent’s capabilities, and combining the planner’s paths with the trained agent’s simulated paths through multi-segment integration. Full details and results are available on our project website at https://github.com/FrankSinatral/TTsystems.
|
|
08:40-08:45, Paper ThAT22.3 | |
Jailbreaking LLM-Controlled Robots |
|
Robey, Alexander | University of Pennsylvania |
Ravichandran, Zachary | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Hassani, Hamed | University of Pennsylvania |
Pappas, George J. | University of Pennsylvania |
Keywords: AI-Enabled Robotics, Robot Safety, Machine Learning for Robot Control
Abstract: The recent introduction of large language models (LLMs) has revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction in domains as varied as manipulation, locomotion, and self-driving vehicles. When viewed as a stand-alone technology, LLMs are known to be vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails. To assess the risks of deploying LLMs in robotics, in this paper, we introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots. Unlike existing, textual attacks on LLM chatbots, RoboPAIR elicits harmful physical actions from LLM-controlled robots, a phenomenon we experimentally demonstrate in three scenarios: (i) a white-box setting, wherein the attacker has full access to the NVIDIA Dolphins self-driving LLM, (ii) a gray-box setting, wherein the attacker has partial access to a Clearpath Robotics Jackal UGV robot equipped with a GPT-4o planner, and (iii) a black-box setting, wherein the attacker has only query access to the GPT-3.5-integrated Unitree Robotics Go2 robot dog. In each scenario and across three new datasets of harmful robotic actions, we demonstrate that RoboPAIR, as well as several static baselines, finds jailbreaks quickly and effectively, often achieving 100% attack success rates. Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world. Indeed, our results on the Unitree Go2 represent the first successful jailbreak of a deployed commercial robotic system. Addressing this emerging vulnerability is critical for ensuring the safe deployment of LLMs in robotics. Additional media is available at: https://robopair.org.
|
|
08:45-08:50, Paper ThAT22.4 | |
CaStL: Constraints As Specifications through LLM Translation for Long-Horizon Task and Motion Planning |
|
Guo, Weihang | Rice University |
Kingston, Zachary | Purdue University |
Kavraki, Lydia | Rice University |
Keywords: AI-Enabled Robotics, Task and Motion Planning
Abstract: Large Language Models (LLMs) have demonstrated remarkable ability in long-horizon Task and Motion Planning (TAMP) by translating clear and straightforward natural language problems into formal specifications such as the Planning Domain Definition Language (PDDL). However, real-world problems are often ambiguous and involve many complex constraints. In this paper, we introduce Constraints as Specifications through LLMs (CaStL), a framework that identifies constraints such as goal conditions, action ordering, and action blocking from natural language in multiple stages. CaStL translates these constraints into PDDL and Python scripts, which are solved using a custom PDDL solver. Tested across three PDDL domains, CaStL significantly improves constraint handling and planning success rates from natural language specifications in complex scenarios.
|
|
08:50-08:55, Paper ThAT22.5 | |
Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data |
|
Verghese, Mrinal | Carnegie Mellon University |
Atkeson, Christopher | CMU |
Keywords: AI-Enabled Robotics, Learning from Demonstration, Big Data in Robotics and Automation
Abstract: This study explores the utility of various internet data sources to select among a set of template robot behaviors to perform skills. Learning contact-rich skills involving tool use from internet data sources has typically been challenging due to the lack of physical information such as contact existence, location, areas, and force in this data. Prior works have generally used internet data and foundation models trained on this data to generate low-level robot behavior. We hypothesize that these data and models may be better suited to selecting among a set of basic robot behaviors to perform these contact-rich skills. We explore three methods of template selection: querying large language models, comparing video of robot execution to retrieved human video using features from a pretrained video encoder common in prior work, and performing the same comparison using features from an optic flow encoder trained on internet data. Our results show that LLMs are surprisingly capable template selectors despite their lack of visual information, optical flow encoding significantly outperforms video encoders trained with an order of magnitude more data, and important synergies exist between various forms of internet data for template selection. By exploiting these synergies, we create a template selector using multiple forms of internet data that achieves a 79% success rate on a set of 16 different cooking skills involving tool-use.
|
|
08:55-09:00, Paper ThAT22.6 | |
LEMMo-Plan: LLM-Enhanced Learning from Multi-Modal Demonstration for Planning Sequential Contact-Rich Manipulation Tasks |
|
Chen, Kejia | Technical University of Munich |
Shen, Zheng | TU Munich |
Zhang, Yue | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: AI-Enabled Robotics, Dexterous Manipulation, Compliant Assembly
Abstract: Large Language Models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks. To enhance the validity of LLM-generated plans, visual demonstrations and online videos have been widely employed to guide the planning process. However, for manipulation tasks involving subtle movements but rich contact interactions, visual perception alone may be insufficient for the LLM to fully interpret the demonstration. Additionally, visual data provides limited information on force-related parameters and conditions, which are crucial for effective execution on real robots. In this paper, we introduce an in-context learning framework that incorporates tactile and force-torque information from human demonstrations to enhance the LLM's ability to generate plans for new task scenarios. We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan. This task plan is then used as a reference for planning in new task configurations. Real-world experiments on two different sequential manipulation tasks demonstrate the effectiveness of our framework in improving LLMs' understanding of multi-modal demonstrations and enhancing the overall planning performance.
|
|
ThAT23 |
412 |
Diffusion Models |
Regular Session |
Chair: Romeres, Diego | Mitsubishi Electric Research Laboratories |
Co-Chair: Gombolay, Matthew | Georgia Institute of Technology |
|
08:30-08:35, Paper ThAT23.1 | |
LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-Based Planning |
|
Feng, Zeyu | National University of Singapore |
Luan, Hao | National University of Singapore |
Goyal, Pranav | University of Michigan - Ann Arbor |
Soh, Harold | National University of Singapore |
Keywords: Imitation Learning, Machine Learning for Robot Control, Safety in HRI
Abstract: Operating effectively in complex environments while complying with specified constraints is crucial for the safe and successful deployment of robots that interact with and operate around people. In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time. We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using finite linear temporal logic (LTLf). LTLDoG leverages a satisfaction value function on LTLf and guides the sampling steps using its gradient field. This value function can also be trained to generalize to new instructions not observed during training, enabling flexible test-time adaptability. Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
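The core mechanism, steering the reverse diffusion step with the gradient of a satisfaction value, can be sketched generically. The toy value function and denoiser below are invented for illustration; the actual LTLf satisfaction network and noise schedule belong to the paper:

    import numpy as np

    def value_gradient(traj, waypoint):
        """Gradient of a toy 'satisfaction' value that rewards visiting a waypoint;
        it stands in for the gradient field of a learned LTLf value network."""
        grad = np.zeros_like(traj)
        closest = np.argmin(np.linalg.norm(traj - waypoint, axis=1))
        grad[closest] = waypoint - traj[closest]       # pull the nearest point toward it
        return grad

    def guided_reverse_step(traj, denoiser, waypoint, sigma, rng, scale=1.0):
        """One reverse diffusion step with classifier-guidance-style correction."""
        mean = denoiser(traj)                          # plain reverse-process mean
        mean = mean + scale * value_gradient(mean, waypoint)
        return mean + sigma * rng.normal(size=traj.shape)

    rng = np.random.default_rng(0)
    traj = rng.normal(size=(16, 2))                    # noisy 2-D trajectory sample
    denoiser = lambda t: 0.9 * t                       # placeholder denoiser
    for sigma in (0.5, 0.2, 0.05, 0.0):
        traj = guided_reverse_step(traj, denoiser, np.array([2.0, 2.0]), sigma, rng)
    print(traj)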
|
|
08:35-08:40, Paper ThAT23.2 | |
DARE: Diffusion Policy for Autonomous Robot Exploration |
|
Cao, Yuhong | National University of Singapore |
Lew, Jeric Jieyi | National University of Singapore |
Liang, Jingsong | National University of Singapore |
Cheng, Jin | ETH Zurich |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: View Planning for SLAM, Deep Learning Methods, Motion and Path Planning
Abstract: Autonomous robot exploration requires a robot to efficiently explore and map unknown environments. Compared to conventional methods that can only optimize paths based on the current robot belief, learning-based methods show the potential to achieve improved performance by drawing on past experiences to reason about unknown areas. In this paper, we propose DARE, a novel generative approach that leverages diffusion models trained on expert demonstrations, which can explicitly generate an exploration path through one-time inference. We build DARE upon an attention-based encoder and a diffusion model, and introduce ground truth optimal demonstrations for training to learn better patterns for exploration. The trained planner can reason about the partial belief to recognize the potential structure in unknown areas and consider these areas during path planning. Our experiments demonstrate that DARE achieves on-par performance with both conventional and learning-based state-of-the-art exploration planners, as well as good generalizability in both simulations and real-life scenarios.
|
|
08:40-08:45, Paper ThAT23.3 | |
NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation |
|
Zeng, Yiming | Sun Yat-Sen University |
Ren, Hao | Sun Yat-Sen University |
Wang, Shuhang | Sun Yet-Sen University |
Huang, Junlong | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Keywords: Vision-Based Navigation, Integrated Planning and Learning, Imitation Learning
Abstract: Visual navigation, a fundamental challenge in mobile robotics, demands versatile policies to handle diverse environments. Classical methods leverage geometric solutions to minimize specific costs, offering adaptability to new scenarios but remaining prone to system errors due to their multi-modular design and reliance on hand-crafted rules. Learning-based methods, while achieving high planning success rates, face difficulties in generalizing to unseen environments beyond the training data and often require extensive training. To address these limitations, we propose a hybrid approach that combines the strengths of learning-based methods and classical approaches for RGB-only visual navigation. Our method first trains a conditional diffusion model on diverse path-RGB observation pairs. During inference, it integrates the gradients of differentiable scene-specific and task-level costs, guiding the diffusion model to generate valid paths that meet the constraints. This approach alleviates the need for retraining, offering a plug-and-play solution. Extensive experiments in both indoor and outdoor settings, across simulated and real-world scenarios, demonstrate the zero-shot transfer capability of our approach, achieving higher success rates and fewer collisions compared to baseline methods. Code will be released at https://github.com/SYSU-RoboticsLab/NaviD.
|
|
08:45-08:50, Paper ThAT23.4 | |
NavigateDiff: Visual Predictors Are Zero-Shot Navigation Assistants |
|
Qin, Yiran | CUHKsz |
Sun, Ao | The Chinese University of Hong Kong, Shenzhen |
Hong, Yuze | The Chinese University of Hong Kong,Shenzhen |
Wang, Benyou | The Chinese University of Hong Kong, Shenzhen |
Zhang, Ruimao | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Visual Learning
Abstract: Navigating unfamiliar environments presents significant challenges for household robots, requiring the ability to recognize and reason about novel decoration and layout. Existing reinforcement learning methods cannot be directly transferred to new environments, as they typically rely on extensive mapping and exploration, which is time-consuming and inefficient. To address these challenges, we transfer the logical knowledge and the generalization ability of pre-trained foundation models to zero-shot navigation. By integrating a large vision-language model with a diffusion network, our approach, named NavigateDiff, constructs a visual predictor that continuously predicts the agent's potential observations in the next step, which can assist robots in generating robust actions. Furthermore, to accommodate the temporal nature of navigation, we introduce temporal historical information to ensure that the predicted image is aligned with the navigation scene. We then carefully design an information fusion framework that embeds the predicted future frames as guidance into the goal-reaching policy to solve downstream image navigation tasks. This approach enhances navigation control and generalization across both simulated and real-world environments. Through extensive experimentation, we demonstrate the robustness and versatility of our method, showcasing its potential to improve the efficiency and effectiveness of robotic navigation in diverse settings. Project Page: https://21styouth.github.io/NavigateDiff/.
|
|
08:50-08:55, Paper ThAT23.5 | |
FDPP: Fine-Tune Diffusion Policy with Human Preference |
|
Chen, Yuxin | University of California, Berkeley |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Tomizuka, Masayoshi | University of California |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Keywords: Imitation Learning, Reinforcement Learning, Sensorimotor Learning
Abstract: Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently witnessed huge success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), resulting in alignment of pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback–Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.
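Preference-based reward learning of this kind typically fits a Bradley-Terry model over trajectory returns. Below is a tiny numpy version with a linear reward (all data, features, and dimensions are invented for illustration; the paper's reward is a learned model):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_preference_reward(pairs, dim, lr=0.1, iters=500):
        """Fit r(s) = w . phi(s) from pairwise preferences via the Bradley-Terry
        model: P(traj_a preferred over traj_b) = sigmoid(R_a - R_b)."""
        w = np.zeros(dim)
        for _ in range(iters):
            grad = np.zeros(dim)
            for feats_a, feats_b in pairs:            # preferred trajectory listed first
                diff = feats_a.sum(axis=0) - feats_b.sum(axis=0)
                grad += (1.0 - sigmoid(w @ diff)) * diff
            w += lr * grad / len(pairs)
        return w

    rng = np.random.default_rng(1)
    true_w = np.array([1.0, -2.0])
    pairs = []
    for _ in range(50):
        a, b = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
        if (a.sum(axis=0) @ true_w) < (b.sum(axis=0) @ true_w):
            a, b = b, a                               # keep the preferred trajectory first
        pairs.append((a, b))
    print("learned reward weights:", fit_preference_reward(pairs, dim=2))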
|
|
08:55-09:00, Paper ThAT23.6 | |
Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance |
|
Lee, Kin Man | Georgia Institute of Technology |
Ye, Sean | Zoox |
Xiao, Qingyu | Georgia Institute of Technology |
Wu, Zixuan | Georgia Institute of Technology |
Zaidi, Zulfiqar | Georgia Institute of Technology |
D'Ambrosio, David | Google |
Sanketi, Pannag | Google |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Imitation Learning, Learning from Demonstration, Constrained Motion Planning
Abstract: Advances in robot learning have enabled robots to generate skills for a variety of tasks. Yet, robot learning is typically sample inefficient, struggles to learn from data sources exhibiting varied behaviors, and does not naturally incorporate constraints. These properties are critical for fast, agile tasks such as playing table tennis. Modern techniques for learning from demonstration improve sample efficiency and scale to diverse data, but are rarely evaluated on agile tasks. In the case of reinforcement learning, achieving good performance requires training on high-fidelity simulators. To overcome these limitations, we develop a novel diffusion modeling approach that is offline, constraint-guided, and expressive of diverse agile behaviors. The key to our approach is a kinematic constraint gradient guidance (KCGG) technique that computes gradients through both the forward kinematics of the robot arm and the diffusion model to direct the sampling process. KCGG minimizes the cost of violating constraints while simultaneously keeping the sampled trajectory in-distribution of the training data. We demonstrate the effectiveness of our approach for time-critical robotic tasks by evaluating KCGG in two challenging domains: simulated air hockey and real table tennis. In simulated air hockey, we achieved a 25.4% increase in block rate, while in table tennis, we achieved a 17.3% increase in success rate compared to imitation learning baselines.
|
|
ThLB1R |
Hall A1/A2 |
Late Breaking Results 5 |
Poster Session |
|
09:30-09:55, Paper ThLB1R.1 | |
Probabilistically-Safe Bipedal Navigation Over Uncertain Terrain Via Conformal Prediction and Contraction Analysis |
|
Muenprasitivej, Kasidit | Georgia Institute of Technology |
Zhao, Ye | Georgia Institute of Technology |
Chou, Glen | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Planning under Uncertainty, Optimization and Optimal Control
Abstract: This work presents a high-level navigation framework for bipedal robots that accounts for terrain uncertainty in footstep planning and motion control. Given a terrain map estimated via a Gaussian Process (GP) model with a nonstationary kernel, we leverage Conformal Prediction (CP) to construct tighter coverage-guarantee intervals containing the true terrain elevations. We also formulate a soft CP constraint to ensure safe foot-height changes between successive footfalls with a probabilistic guarantee. Additionally, we model the CP coverage intervals as a bounded disturbance, which is incorporated into a linear inverted pendulum plus flywheel (LIP) model. Given the LIP model with bounded disturbances arising from uncertain terrain, we formulate a flywheel torque control law using Control Contraction Metrics (CCMs). We then design a Robust Control Invariant (RCI) tube around the desired Center of Mass (CoM) phase-space trajectory, defining a region in which the system states can be stabilized by the flywheel torque control law. This ensures the maintenance of a constant CoM height (or constant surface slope) despite terrain uncertainty. The overall framework results in an uncertainty-informed Model Predictive Controller (MPC) that provides probabilistic safety guarantees, enabling robust locomotion across uncertain environments. This approach enhances the robot’s ability to navigate complex, unstructured terrains while maintaining stability in real-world deployment.
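Split conformal prediction, the ingredient used above to tighten the GP terrain intervals, is compact enough to sketch on toy 1-D data (the sine "terrain" and the stand-in predictor are invented; this is not the authors' nonstationary-kernel pipeline):

    import numpy as np

    def split_conformal_interval(predict, x_cal, y_cal, x_test, alpha=0.1):
        """Distribution-free intervals with roughly (1 - alpha) coverage: widen the
        point prediction by the conformal quantile of calibration residuals."""
        residuals = np.abs(y_cal - predict(x_cal))
        n = len(residuals)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        q = np.quantile(residuals, level)
        pred = predict(x_test)
        return pred - q, pred + q

    rng = np.random.default_rng(2)
    x_cal = rng.uniform(0, 5, size=200)
    y_cal = np.sin(x_cal) + 0.1 * rng.normal(size=200)   # noisy "terrain elevations"
    predict = np.sin                                     # stand-in for the GP mean
    lo, hi = split_conformal_interval(predict, x_cal, y_cal, np.array([1.0, 2.5]))
    print(lo, hi)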
|
|
09:30-09:55, Paper ThLB1R.2 | |
Robotic Tissue Manipulation in Endoscopic Submucosal Dissection: Late Breaking Results |
|
Zhang, Tao | Arizona State University |
Ghiyasi, Morteza | Arizona State University |
Gangrade, Navya | Arizona State University |
Arora, Deepit | Arizona State University |
Jue, Terry | Mayo Clinic |
Marvi, Hamidreza | Arizona State University |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Surgical Robotics: Planning
Abstract: To address the steep learning curve associated with ESD, we reengineered the tissue clip by introducing a magnetic mechanism to improve submucosal exposure and facilitate precise dissection. Furthermore, we proposed a robotic manipulation method that automates the motion of an external magnet mounted on a robotic arm to enhance operational efficiency. Based on simulation trials using ROS Gazebo (ICRA 2025 accepted, manuscript # 4388), which demonstrated promising results, we further evaluated the proposed method in a 3D-printed stomach and ex vivo setup using a clinical gastroscope (Olympus EVIS EXERA III GIF-1TH190). By labeling and training over 21,000 images captured from the gastroscope using a deep learning-based YOLO (You Only Look Once) method, we successfully deployed the magnetic robotic tissue manipulation system, integrating the robotic arm, Olympus gastroscope, mock-up and ex vivo tissue. In 42 trials using a 3D-printed stomach, the magnetic robotic manipulation system was able to orient the internal magnetic clip—clamped onto the tissue—to the desired angle (±35 degrees) in 55.36 ± 7.64 seconds (mean ± standard error, n = 42). Additionally, ex vivo trials using porcine stomach tissue recorded a performance time of 37.19 ± 3.78 seconds over 36 trials. These results lay the groundwork for future validation in in vivo scenarios.
|
|
09:30-09:55, Paper ThLB1R.3 | |
Shared Mental Models Improve Performance in Human-Robot Teams During Unforeseen Events |
|
George, Zariq | University of Michigan |
Tilbury, Dawn | University of Michigan |
Robert, Lionel | University of Michigan |
Keywords: Human-Robot Teaming, Human-Robot Collaboration
Abstract: As robots increasingly integrate into human environments, there is growing optimism about expanding and diversifying human-robot team (HRT) missions. However, existing approaches, which primarily rely on teleoperation and contingency handling, encounter limitations such as cognitive overload and scalability challenges. A user study was conducted to investigate how Shared Mental Models (SMMs) influence HRTs' problem-solving abilities when faced with novel challenges. The results revealed that a General SMM (GSMM) significantly improved task performance compared to a Specific SMM (SSMM), although the SMM type did not significantly impact overall team adaptability. Additionally, the ability to handle uncertain situations was crucial for performance.
|
|
09:30-09:55, Paper ThLB1R.4 | |
Streaming Flow Policy: Simplifying Diffusion/flow Policies by Treating Robot Trajectories As Flow Trajectories |
|
Jiang, Sunshine | Massachusetts Institute of Technology |
Fang, Xiaolin | MIT |
Roy, Nicholas | Massachusetts Institute of Technology |
Lozano-Perez, Tomas | MIT |
Kaelbling, Leslie | MIT |
Ancha, Siddharth | Massachusetts Institute of Technology |
Keywords: Imitation Learning, Machine Learning for Robot Control, Deep Learning in Grasping and Manipulation
Abstract: Recent advances in diffusion/flow policies have enabled imitation learning of complex, multi-modal action trajectories for robotic control. However, they are slow because they sample a trajectory of trajectories—a diffusion/flow trajectory of robot trajectories. They discard intermediate robot trajectories, and must wait for the sampling process to complete before any actions can be executed on the robot. In this work, we propose a novel framework that simplifies diffusion/flow policies by treating robot trajectories as flow trajectories. Instead of starting from pure noise, our algorithm starts from the current robot configuration, and incrementally integrates a velocity field learned via flow matching to produce a sequence of robot configurations that constitute a single trajectory. This enables computed actions to be streamed to the robot on-the-fly during the flow sampling process, for significantly faster and reactive policies. It is well-suited for receding horizon control where we can adaptively generate only as many actions as are executed on the robot, and no more. Despite streaming, our method retains the ability to model multi-modal behavior. We show that training flows that stabilize around demonstration trajectories reduces distribution shift and improves imitation learning performance. Streaming flow policy outperforms prior diffusion/flow policies on imitation learning benchmarks while enabling faster policy execution and tighter sensorimotor control loops.
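The streaming idea, integrating a learned velocity field forward from the current robot configuration and executing configurations as soon as they are produced, can be sketched with a placeholder field (the learned flow-matching model, the goal, and all constants here are assumptions of the sketch):

    import numpy as np

    def stream_actions(q0, velocity_field, dt=0.05, steps=40):
        """Euler-integrate a velocity field from the current configuration and yield
        each intermediate configuration immediately, instead of waiting for the full
        sample to finish."""
        q, t = q0.copy(), 0.0
        for _ in range(steps):
            q = q + dt * velocity_field(q, t)
            t += dt
            yield q.copy()

    goal = np.array([0.6, -0.2, 0.4])
    velocity_field = lambda q, t: 2.0 * (goal - q)       # placeholder learned field
    for q in stream_actions(np.zeros(3), velocity_field):
        pass                                             # each q would be streamed to the robot here
    print("final configuration:", q)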
|
|
09:30-09:55, Paper ThLB1R.5 | |
Battery-Free Computer Vision on Insect-Scale Microrobots |
|
Arroyos, Vicente | University of Washington |
Ibrahim, Michael | University of Washington |
Azuh Mensah, Emmanuel | University of Washington |
Johnson, Kyle | University of Washington Paul G. Allen School for Computer Scien |
Fuller, Sawyer | University of Washington |
Iyer, Vikram | University of Washington |
Keywords: Machine Learning for Robot Control, Visual Tracking, Micro/Nano Robots
Abstract: The goal of this project is to enable the use of efficient on-device deep learning models for battery-free mobility-based sensing robots. We target the deployment of these models on MilliMobile, a one-square-centimeter microrobotic platform with only 512 KB of RAM and 1 MB of flash memory (millimobile.cs.washington.edu), making it challenging to run traditional models onboard. We address this challenge by integrating intermittent computing and motion with image-based sensing, ensuring that the system performs high-power actions, such as locomotion, inference, and communication, only when specific conditions are satisfied. Event-based vision and intermittent computing approaches will also be leveraged to optimize the MCU's time in ultra-low-power modes. We train on insect images from the Insect Detect - insect classification dataset v2 and achieve almost 70% accuracy after 500 epochs while consuming milliwatts of power when running the system on the nRF52840 microcontroller.
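A toy sketch of the condition-gated intermittent operation described above; all function names, voltage thresholds, and the confidence cutoff are hypothetical placeholders rather than the MilliMobile firmware:

```python
# Toy sketch of condition-gated intermittent operation: high-power actions
# (inference, locomotion, radio) run only when the harvested-energy buffer and
# sensing conditions permit. All names and thresholds are illustrative.
def step(cap_voltage_v, motion_detected, classify, move, transmit,
         v_inference=2.8, v_locomotion=3.1, v_radio=3.0):
    if motion_detected and cap_voltage_v >= v_inference:
        label, confidence = classify()           # run tiny on-device model
        if confidence > 0.7 and cap_voltage_v >= v_radio:
            transmit(label)                       # report detection
        if cap_voltage_v >= v_locomotion:
            move()                                # reposition toward the target
    # otherwise stay in an ultra-low-power sleep mode and keep harvesting

# Example call with stub callables standing in for hardware drivers.
step(cap_voltage_v=3.2, motion_detected=True,
     classify=lambda: ("hoverfly", 0.82), move=lambda: print("move"),
     transmit=lambda label: print("tx:", label))
```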
|
|
09:30-09:55, Paper ThLB1R.6 | |
Implicit Behavioral Cues for Enhanced Pedestrian Comfort in Robot Social Navigation |
|
Lian, Yi | Georgia Institute of Technology |
Kim, Joanne Taery | Georgia Institute of Technology |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Social HRI, Human-Aware Motion Planning, Service Robotics
Abstract: Robots operating in public spaces must navigate alongside humans in ways that are not only safe but also intuitive and socially acceptable. In particular, how a robot communicates its navigation intent can significantly impact pedestrian comfort and trust. In this work, we explore how different robot behaviors - ranging from no cue to implicit (speed and trajectory adjustments) and explicit (verbal) signals - affect pedestrian perception during hallway encounters. We conducted a pilot user study with 12 participants, where a robot approached the participant from the front or side using one of five cue strategies. Subjective ratings (comfort, trust, clarity, predictability, and proxemics) and objective measures (hesitation, passing time) were collected. Results indicate that trajectory cues led to the highest perceived comfort and clarity, while sudden stops and no cues caused more confusion and hesitation. These findings highlight the potential of motion-based implicit cues to improve the legibility of robot navigation in shared human environments.
|
|
09:30-09:55, Paper ThLB1R.7 | |
Spectral Bayesian Inference and Neural Estimation of Acoustic Wave Propagation |
|
Huang, Yongchao | University of Aberdeen |
Keywords: Probabilistic Inference, Probability and Statistical Methods, AI-Based Methods
Abstract: We present a novel framework integrating physics and machine learning to estimate frequency-domain acoustic wave propagation coefficients. Using acoustic waveforms captured at speaker-receiver pairs, we estimate attenuation and wavenumber via: (1) Bayesian inference for uncertainty-aware learning from small and noisy data, (2) a neural-physical model trained with forward-backward physical losses, and (3) non-linear least squares as baseline. With inferred propagation coefficients, room impulse responses (RIRs) are derived, enabling robot relocalisation with uncertainty quantification.
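As an illustration of the non-linear least-squares baseline mentioned above, the sketch below fits attenuation and wavenumber from complex frequency-domain measurements under an assumed plane-wave model; the model form and the synthetic data are assumptions, not the paper's setup:

```python
# Hedged sketch of a non-linear least-squares fit of attenuation alpha and
# wavenumber k at one frequency from complex transfer-function measurements at
# several speaker-receiver distances, assuming H(d) = A * exp(-(alpha + i k) d).
import numpy as np
from scipy.optimize import least_squares

d = np.array([0.5, 1.0, 1.5, 2.0])                 # propagation distances [m]
alpha_true, k_true, A_true = 0.4, 18.0, 1.0
H_meas = A_true * np.exp(-(alpha_true + 1j * k_true) * d)
H_meas = H_meas + 0.01 * (np.random.randn(d.size) + 1j * np.random.randn(d.size))

def residuals(p):
    A, alpha, k = p
    H_model = A * np.exp(-(alpha + 1j * k) * d)
    r = H_meas - H_model
    return np.concatenate([r.real, r.imag])        # real-valued residual vector

sol = least_squares(residuals, x0=[0.8, 0.1, 15.0])
A_hat, alpha_hat, k_hat = sol.x
print(f"alpha ~ {alpha_hat:.3f} Np/m, k ~ {k_hat:.3f} rad/m")
```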
|
|
09:30-09:55, Paper ThLB1R.8 | |
Analyzing Human Perceptions of a MEDEVAC Robot in a Simulated Evacuation Scenario |
|
Jordan, Tyson | University of Georgia |
Pandey, Pranav Kumar | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Doshi, Prashant | University of Georgia |
Goodie, Adam | University of Georgia |
Keywords: Search and Rescue Robots, Design and Human Factors, Human-Robot Teaming
Abstract: The use of autonomous systems in medical evacuation (MEDEVAC) scenarios is promising, but existing implementations overlook key insights from human-robot interaction (HRI) research. Studies on human-machine teams demonstrate that human perceptions of a machine teammate are critical in governing the machine's performance. Consequently, it is essential to identify the factors that contribute to positive human perceptions in human-machine teams. Here, we present a mixed factorial design to assess human perceptions of a MEDEVAC robot in a simulated evacuation scenario. Participants were assigned to the role of casualty (CAS) or bystander (BYS) and subjected to three within-subjects conditions based on the MEDEVAC robot's operating mode: autonomous-slow (AS), autonomous-fast (AF), and teleoperation (TO). During each trial, a MEDEVAC robot navigated an 11-meter path, acquiring a casualty and transporting them to an ambulance exchange point while avoiding an idle bystander. Following each trial, subjects completed a questionnaire measuring their emotional states, perceived safety, and social compatibility with the robot. Results indicate a consistent main effect of operating mode on reported emotional states and perceived safety. Pairwise analyses suggest that the employment of the AF operating mode negatively impacted perceptions along these dimensions. There were no persistent differences between CAS and BYS responses.
|
|
09:30-09:55, Paper ThLB1R.9 | |
FRESHR-GSI: A Generalized Safety Model and Evaluation Framework for Mobile Robots in Multi-Human Environments |
|
Pandey, Pranav Kumar | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Doshi, Prashant | University of Georgia |
Keywords: Safety in HRI, Human-Robot Collaboration, Human-Robot Teaming
Abstract: Human safety is critical in applications involving close human-robot interactions (HRI) and is a key aspect of physical compatibility between humans and robots. While measures of human safety in HRI exist, these mainly target industrial settings involving robotic manipulators. Less attention has been paid to settings where mobile robots and humans share the space. This paper introduces a new robot-centered directional framework of human safety. It is particularly useful for evaluating mobile robots as they operate in environments populated by multiple humans. The framework integrates several key metrics, such as each human's relative distance, speed, and orientation. The core novelty lies in the framework's flexibility to accommodate different application requirements while allowing for both the robot-centered and external observer points of view. We instantiate the framework by using RGB-D based vision integrated with a deep learning-based human detection pipeline to yield a proxemics-guided generalized safety index (GSI) that instantaneously assesses human safety. We extensively validate GSI's capability of producing appropriate and fine-grained safety measures in real-world experimental scenarios and demonstrate its superior efficacy against extant safety models.
|
|
09:30-09:55, Paper ThLB1R.10 | |
Variable Stiffness Quasi-Direct Drive Cable-Actuated Tensegrity Robot with Visuotactile Contact Sensor |
|
Mi, Jonathan | University of Michigan, Ann Arbor |
Tong, Wenzhe | University of Michigan, Ann Arbor |
Ma, Yilin | University of Michigan, Ann Arbor |
Huang, Xiaonan | University of Michigan |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Compliant Joints and Mechanisms
Abstract: Tensegrity robots excel in tasks requiring extreme levels of deformability and robustness. However, there are challenges in state estimation and payload versatility due to their high number of degrees of freedom and unconventional shape. This work introduces a modular three-bar tensegrity robot featuring a customizable payload design. The modular exoskeleton supports scalable configurations and efficient packaging of sensing and computation. Our tensegrity robot employs a novel Quasi-Direct Drive (QDD) cable actuator with low-stretch polymer cables to achieve accurate proprioception without needing external force or torque sensors. The design allows for on-the-fly stiffness tuning for better environment and payload adaptability. Experimental data demonstrate high-accuracy cable length estimation (<1% error relative to bar length) and variable stiffness control of the cable actuator up to 7 times the minimum stiffness for self-support. To augment proprioception with environmental feedback, we develop a scalable, open-source visuotactile sensor embedded within each endcap of the tensegrity structure. The presented tensegrity robot is a platform for future advancements in autonomous operation and open-source module design.
|
|
09:30-09:55, Paper ThLB1R.11 | |
OceanSim: A GPU-Accelerated Underwater Robot Perception Simulation Framework |
|
Song, Jingyu | University of Michigan |
Ma, Haoyu | University of Michigan |
Bagoren, Onur | University of Michigan |
Venkatramanan Sethuraman, Advaith | University of Michigan |
Zhang, Yiting | University of Michigan, Ann Arbor |
Skinner, Katherine | University of Michigan |
Keywords: Marine Robotics, Simulation and Animation, Field Robots
Abstract: Underwater simulators offer support for building robust underwater perception solutions. Significant work has recently been done to develop new simulators and to advance the performance of existing underwater simulators. Still, there remains room for improvement on physics-based underwater sensor modeling and rendering efficiency. In this paper, we propose OceanSim, a high-fidelity GPU-accelerated underwater simulator to address this research gap. We propose advanced physics-based rendering techniques to reduce the sim-to-real gap for underwater image simulation. We develop OceanSim to fully leverage the computing advantages of GPUs and achieve real-time imaging sonar rendering and fast synthetic data generation. We evaluate the capabilities and realism of OceanSim using real-world data to provide qualitative and quantitative results.
|
|
09:30-09:55, Paper ThLB1R.12 | |
Bayesian Intent Inference Via Egocentric Vision and Gesture Recognition |
|
Timilsina, Prabin | Florida State University |
Higgins, Taylor | Florida State University |
Keywords: Intention Recognition, Physical Human-Robot Interaction, Physically Assistive Devices
Abstract: Many robotic assistive devices require users to perform explicit commanding actions, such as pressing buttons or tilting their bodies, to initiate assistance. However, it would be more natural if the device could infer intent implicitly. Prior work has explored intent prediction using different sensor modalities, such as electromyography (EMG), electroencephalography (EEG), and inertial measurement units (IMUs). Some vision-based approaches rely on external cameras to analyze human action from a third-person perspective. While these methods can provide useful insights, they either focus only on body signals without considering environmental context or require external cameras that are impractical for wearable robotics. In contrast, first-person vision offers a direct perspective on how users interact with their surroundings, yet it remains underexplored in intent inference research. We propose a hybrid approach that utilizes a Bayesian network to probabilistically fuse first-person visual context with body motion data to understand human intent. Unlike some prior work that uses computer vision only for perception, our approach reasons about the contextual affordances of objects in the environment and integrates them with human motion to improve both prediction accuracy and explainability. Our results show the feasibility of this approach in controlled experiments, and future research could expand on this work to improve robustness and generalizability in real-world environments.
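A minimal, illustrative sketch of the probabilistic fusion step (not the authors' Bayesian network): object-affordance context from the egocentric view and a body-motion cue are combined via Bayes' rule over a hypothetical set of intents:

```python
# Minimal sketch of fusing first-person visual context with motion evidence via
# Bayes' rule. Intents, affordance priors, and likelihoods are made-up placeholders.
def infer_intent(prior, p_obj_given_intent, p_motion_given_intent):
    # posterior proportional to prior * P(visible object | intent) * P(motion cue | intent)
    post = {i: prior[i] * p_obj_given_intent[i] * p_motion_given_intent[i] for i in prior}
    z = sum(post.values())
    return {i: p / z for i, p in post.items()}

prior = {"sit_down": 0.5, "walk_to_door": 0.5}
p_obj = {"sit_down": 0.8, "walk_to_door": 0.2}      # a chair dominates the egocentric view
p_motion = {"sit_down": 0.6, "walk_to_door": 0.4}   # gait deceleration observed
print(infer_intent(prior, p_obj, p_motion))          # sit_down becomes the most likely intent
```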
|
|
09:30-09:55, Paper ThLB1R.13 | |
Geometric and Dynamic Modeling of McKibben Muscles for Soft Robotic Control Applications |
|
Ochieze, Chukwuemeka George | University of Virginia |
Keywords: Rehabilitation Robotics, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft robotic systems offer remarkable adaptability and safety in dynamic environments, but their compliant structures introduce complex, nonlinear behaviors that challenge traditional modeling and control techniques. This paper proposes a unified modeling and control framework for pneumatic artificial muscles (PAMs), with emphasis on McKibben actuators in both rectilinear and curvilinear configurations. Recognizing the limitations of linear modeling for soft robotics, the framework is extended to include nonlinearities of soft body dynamics by incorporating nonlinear stiffness and damping terms, enabling accurate representation of hyper-elastic and viscoelastic behavior.
|
|
09:30-09:55, Paper ThLB1R.14 | |
Enhanced Diagnostic Imaging Via Robotic Arc Ultrasound Scanning and 3D Reconstruction |
|
Koo, Kyoungmo | University of Michigan |
Peng, Xiaorui | University of Michigan |
Ma, Guangshen | Duke University |
Draelos, Mark | University of Michigan |
Wang, Xueding | University of Michigan |
Keywords: Medical Robots and Systems, Hardware-Software Integration in Robotics
Abstract: This study presents a robotic ultrasound imaging system designed to enhance diagnostic image quality through arc-based scanning trajectories and 3D reconstruction. Building on prior work in robotic linear ultrasound for musculoskeletal imaging, we introduce an arc scanning approach that aligns more perpendicularly to curved anatomical surfaces—such as finger joints—thereby improving bone and soft tissue visualization. Our integrated system combines a GE Vivid E95 ultrasound scanner with a Universal Robots UR3 robotic arm to perform automated B-mode, Color Flow Mapping (CFM), and photoacoustic scans. The scanning procedure consists of initial alignment, trajectory execution, motion compensation, and volume reconstruction, with interpolation based on proximity-weighted pixels from the nearest planes. We conducted phantom, volunteer, and patient studies to evaluate performance. Quantitative results from phantom scans demonstrated that the arc scan offers sharper boundary contrast. In both volunteer and patient cases, arc scans consistently produced more continuous and symmetric bone surfaces with uniform soft tissue appearance in B-mode imaging. However, CFM scans revealed fewer motion artifacts and clearer vascular signals in linear trajectories, suggesting arc-induced mechanical wave artifacts. These findings highlight the complementary strengths of each trajectory, motivating future work on hybrid scanning that integrates both linear and arc paths for optimized diagnostics.
|
|
09:30-09:55, Paper ThLB1R.15 | |
Optimizing Topological Mapping for Robotics: Integrating Delaunay-Cech Filtration and Discrete Gradient Vector Fields |
|
Sahiner, Simge | University at Albany |
Ekenna, Chinwe | University at Albany |
Keywords: Formal Methods in Robotics and Automation, Motion and Path Planning, Planning under Uncertainty
Abstract: In this research, we explore the integration of topological data analysis techniques, specifically simplicial complexes and filtrations, with Discrete Morse Theory to enhance robotic motion planning and environmental mapping. While Vietoris-Rips complexes offer computational efficiency and Čech complexes capture topological features more accurately, combining these tools within a Discrete Morse Theory framework allows for simplification and identification of critical points in the configuration space without changing the topology of the structure. A central motivation for this work is the search for a simplicial complex that balances the topological accuracy of Čech complexes with the lower computational demands of Vietoris-Rips complexes. While creating such a complex is a challenging task, this trade-off drives the development of more efficient and accurate representations of space for robotics. Additionally, this research addresses the difficulty of identifying local minima and maxima in higher-dimensional settings, extending the utility of Discrete Morse Theory. Building on these ideas, we explore hybrid approaches involving Delaunay-Čech filtrations and Discrete Gradient Vector Fields to create more adaptive topological representations.
|
|
09:30-09:55, Paper ThLB1R.16 | |
Tensegrity Robot Proprioceptive State Estimation with Geometric Constraints |
|
Tong, Wenzhe | University of Michigan, Ann Arbor |
Lin, Tzu-Yuan | University of Michigan |
Mi, Jonathan | University of Michigan, Ann Arbor |
Jiang, Yicheng | University of Michigan-Ann Arbor |
Ghaffari, Maani | University of Michigan |
Huang, Xiaonan | University of Michigan |
Keywords: Modeling, Control, and Learning for Soft Robots, Localization, Sensor Fusion
Abstract: This paper presents a novel proprioceptive state estimator for tensegrity robots, addressing the challenges posed by their unique structural configurations and dynamic behaviors. The tensegrity robot, constructed from a synergistic assembly of rigid rods and elastic cables, is designed for high-impact tolerance and shape morphing. To accurately capture both the global pose and internal shape of the robot in real time, our approach fuses high-frequency inertial measurements from onboard IMUs with precise cable length data from motor encoders. An optimization-based shape reconstruction algorithm—enforced by rigorous geometric and chirality constraints—is employed to estimate the endcap positions in the body frame, overcoming the complexities inherent in its continuously deforming configuration. To further enhance the estimator's robustness, the forward kinematics from shape reconstruction is integrated into a contact-aided Invariant Extended Kalman Filter (InEKF) framework. This filter leverages detected endcap-ground contact events to correct state estimates via forward kinematics, ensuring improved accuracy even in scenarios with camera occlusions or degraded environmental conditions. Extensive simulation and real-world experiments validate the proposed framework, achieving an average drift of approximately 4.2% while maintaining computational efficiency suitable for onboard autonomous operations. Overall, this work advances state estimation methodologies for tensegrity robots.
|
|
09:30-09:55, Paper ThLB1R.17 | |
SkillWrapper: Autonomously Learning Interpretable Skill Abstractions with Foundation Models |
|
Yang, Ziyi | Brown University |
Sundara Raman, Shreyas | Brown University |
Hedegaard, Benned | Brown University |
Fu, Haotian | Brown University |
Zhao, Linfeng | Northeastern University |
Tellex, Stefanie | Brown |
Konidaris, George | Brown University |
Paulius, David | Brown University |
Shah, Naman | Arizona State University |
Keywords: Representation Learning, Task and Motion Planning, Deep Learning Methods
Abstract: We envision a future where robots are equipped "out of the box" with portable skills. However, to effectively deploy and compose these skills for robotic tasks, users must know the conditions under which they can be successfully executed and the consequences of their execution; in task planning, these are known as an action's preconditions and effects. We present SkillWrapper: an approach that automatically learns human-interpretable abstractions of skills, while guaranteeing complete and sound representations for planning, enabling long-horizon skill composition and zero-shot generalization. Our approach exploits foundation models to propose tasks from which it then learns semantically meaningful, grounded representations of preconditions and effects, given images of what the robot perceives before and after executing a skill.
|
|
09:30-09:55, Paper ThLB1R.18 | |
Batch Learning of Koopman Operators from Streaming Data of Off-Road Vehicle with Terrain Dynamics |
|
Loya, Kartik | Clemson University |
Tallapragada, Phanindra | Clemson University |
Keywords: Model Learning for Control, Wheeled Robots, Dynamics
Abstract: Autonomous off-road vehicles generate rich, high-frequency sensor data that capture complex and nonlinear interactions with unstructured terrain. While terrain dynamics are often modeled using physics-based approaches like Bekker models, such methods are computationally intensive and unsuitable for real-time use. Moreover, streaming data may contain rare or regime-specific events critical for effective modeling and control. This work addresses the challenge of updating Koopman operator-based models in real time without overwhelming memory and computational resources. Rather than storing all incoming data or discarding potentially useful samples, we propose a batch update method that selectively incorporates only informative datasets. Novelty in dynamics is detected using the Grassmannian distance between subspaces, enabling efficient identification of meaningful regime shifts. Our approach significantly reduces data volume and computational load while preserving model accuracy. Additionally, it includes basis function learning to optimize the Koopman representation. The framework is validated on simulated systems, demonstrating effective online learning, reduced model complexity, and improved prediction and control performance in off-road environments.
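One ingredient of the approach, detecting novel dynamics via the Grassmannian distance between data subspaces, can be sketched as below; the subspace dimension, toy data, and decision threshold are illustrative assumptions rather than the paper's settings:

```python
# Sketch of flagging a novel dynamics regime with a Grassmannian distance
# between the dominant subspaces spanned by two data batches, as a gate on
# whether the batch should update the Koopman model.
import numpy as np

def dominant_subspace(X, r):
    # Right singular vectors of the (samples x features) matrix span the
    # dominant r-dimensional feature subspace.
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:r].T                                   # (features, r), orthonormal columns

def grassmann_distance(X, Y, r=3):
    U, V = dominant_subspace(X, r), dominant_subspace(Y, r)
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(s, -1.0, 1.0))         # principal angles between subspaces
    return np.linalg.norm(angles)

rng = np.random.default_rng(0)
batch_old = rng.normal(size=(200, 6))
batch_new = batch_old @ np.diag([1, 1, 1, 0.1, 0.1, 0.1]) + 0.5 * rng.normal(size=(200, 6))
if grassmann_distance(batch_old, batch_new) > 0.5:    # hypothetical novelty threshold
    print("novel regime detected: include this batch in the Koopman update")
```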
|
|
09:30-09:55, Paper ThLB1R.19 | |
Data-Driven Prediction Model of Soft-Tissue Surface Deformation During Robotic Palpation |
|
Qin, Tianhao | University of Michigan |
Ma, Guangshen | Duke University |
Draelos, Mark | University of Michigan |
Keywords: Medical Robots and Systems, Machine Learning for Robot Control, Computer Vision for Medical Robotics
Abstract: Biological shape deformation resulting from robot palpation is a challenging problem in surgical robotics due to the complex contact modeling of tool-tissue interaction. Existing methods have mainly focused on the development of model-based frameworks (e.g., finite element analysis) and the use of force sensing to build a shape prediction model. However, these models are either computationally expensive or sensitive to changes in tool configuration and tissue properties. To overcome this problem, we present an entirely data-driven method for soft-tissue surface prediction during robot palpation. We implement a multi-layer perceptron model to learn the complex relationship between the tool's configuration (position and orientation) and the resulting surface deformation. Given a robot configuration and the local surface geometry around the contact center, the model predicts 3D displacement vectors of the surface prior to deformation. We conducted simulation experiments to evaluate the model performance for palpation angles between -40° and +40° and tool penetration depths from 0.3 mm to 1.9 mm. The results show an average root mean square error of approximately 0.01 mm and an average maximum error of less than 0.03 mm. This demonstrates the feasibility of using a generic multi-layer perceptron model to learn tool-tissue physics and shows the potential for fast surface inference and deformation prediction during robot palpation.
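A generic sketch of the kind of multi-layer perceptron regression described above, trained here on synthetic tool-configuration/displacement pairs rather than the paper's data; the input parameterization and the toy ground-truth model are assumptions:

```python
# Generic sketch: an MLP maps a tool configuration (contact position, palpation
# angle, penetration depth) to a 3D surface displacement vector. Synthetic data only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
N = 2000
tool = rng.uniform([-1, -1, -40, 0.3], [1, 1, 40, 1.9], size=(N, 4))  # x, y, angle [deg], depth [mm]
# Toy ground truth: displacement magnitude grows with depth, direction tilts with angle.
disp = np.stack([
    -0.02 * tool[:, 3] * np.sin(np.deg2rad(tool[:, 2])),
    np.zeros(N),
    -0.02 * tool[:, 3] * np.cos(np.deg2rad(tool[:, 2])),
], axis=1) + 0.001 * rng.normal(size=(N, 3))

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(tool[:1500], disp[:1500])
rmse = np.sqrt(np.mean((model.predict(tool[1500:]) - disp[1500:]) ** 2))
print(f"held-out RMSE ~ {rmse:.4f} mm")
```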
|
|
09:30-09:55, Paper ThLB1R.20 | |
An AI-Based Robot System for Automation Process of Slaughtering Ducks |
|
Ko, KwangEun | Korea Institute of Industrial Technology |
Yang, Gi-Hun | KITECH |
Kang, Jaehyeon | Korea Institute of Industrial Technology |
Choo, Sungwon | KITECH |
Nam, Kyung-Tae | Kitech |
Han, Sang Kuy | Korea Institute of Industrial Technology |
Keywords: Computer Vision for Automation, Agricultural Automation, Robotics and Automation in Agriculture and Forestry
Abstract: The bloodletting step in the duck slaughter process requires repetitive and dangerous labor. To automate this step, we developed a machine vision-based bloodletting robot system. The system includes AI-based target recognition technology to automatically detect the bloodletting area and a robotic slaughtering system. A Mask R-CNN is used for object-background segmentation. Following segmentation, an integrated pipeline finds the target bloodletting point based on the cervical vertebrae estimated from the neck area of each duck. A training dataset of 1,789 RGB images was collected from the duck slaughter process, and the deep learning model for duck neck-background segmentation was trained on it. In addition, classifying ducks as fainted or non-fainted is required to run the bloodletting process without system interruption. A lightweight CNN architecture using the ONNX library was implemented for deployment on edge devices in the workplace. Test results show high accuracy in automatically detecting the target bloodletting point. The slaughtering robot system consists of a three-axis orthogonal robot, and the target bloodletting point is controlled in real time using position information transmitted from the AI vision camera. The slaughtering task time using the robot system was 2.7 s per duck. For application in slaughterhouses, five robots equipped with AI vision cameras will work as a set.
|
|
09:30-09:55, Paper ThLB1R.21 | |
E-ARC: Experience-Based Subproblem Planning for Multi-Robot Motion Planning |
|
Solis Vidana, Juan Irving | University of Illinois Urbana-Champaign |
Motes, James | University of Illinois Urbana-Champaign |
Morales, Marco | University of Illinois Urbana-Champaign & Instituto Tecnológico |
Amato, Nancy | University of Illinois Urbana-Champaign |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: Multi-robot systems enhance efficiency and productivity across various applications, from manufacturing to surveillance. While single-robot motion planning has improved by using databases of prior solutions, extending this approach to multi-robot motion planning (MRMP) presents challenges due to the increased complexity and diversity of tasks and configurations. Recent discrete methods have attempted to address this by focusing on relevant lower-dimensional subproblems, but they are inadequate for complex scenarios like those involving manipulator robots. To overcome this, we propose a novel approach that constructs and utilizes databases of solutions for smaller sub-problems. By focusing on interactions between fewer robots, our method reduces the required database size compared to one that captures the full MRMP problem, enabling efficient handling of more complex scenarios. We validate our approach with experiments on mobile and manipulator robots, showing significant improvements in scalability and efficiency. Our method improves over its previous version with up to 40% faster planning and over existing experience-based methods with a 342× faster database construction and 2× faster planning. Our contributions include a rapidly constructed database for low-dimensional MRMP problems, a framework for applying these solutions to larger problems, and experimental validation with up to 32 mobile and 16 manipulator robots.
|
|
09:30-09:55, Paper ThLB1R.22 | |
Understanding Spatial Relationships with Spherical Image Data for Manipulating Intelligent Robotic Wheelchairs |
|
Sarathchandra, H.A.H.Y. | Shibaura Institute of Technology |
Shimbo, Y | Shibaura Institute of Technology |
Senevirathna, Nilupul Nuwan | Shibaura Institute of Technology |
Premachandra, Chinthaka | Shibaura Institute of Technology |
Keywords: Omnidirectional Vision, Visual Learning, Vision-Based Navigation
Abstract: Mobility assistance is one of the major supports for elderly and differently abled communities. Since wheelchairs are primarily used in human-centered areas, there is a significant risk of accidents caused by the wheelchair itself and by other pedestrians in the vicinity. To address this serious issue, this work proposes utilizing spherical camera data for environmental perception. Incorporating spherical data, however, introduces additional challenges, and omnidirectional cameras are currently only rarely used for navigation or scene understanding. Nevertheless, the equirectangular transformation of spherical data can help in understanding the surrounding environment, and these images offer better insight into the direction of a pedestrian's motion and their distance from the wheelchair. The study proposes an architecture to extract spatial relationships between the wheelchair and nearby pedestrians by identifying visual features and tracking the subjects' movements relative to the wheelchair. This method can reliably detect when a pedestrian is approaching the wheelchair or passing nearby without entering a danger zone, and the proposed architecture achieved over 95% accuracy at an average processing rate of 15 FPS. This demonstrates the potential of spherical image data, which can be further exploited to enable intelligent decisions by electric wheelchairs and to reduce the cognitive load on the user, making their lives more comfortable.
|
|
09:30-09:55, Paper ThLB1R.23 | |
RAFFM: Robot Assisted Feeding for Finger-Foods with Multimodal Capabilities |
|
Ramchandra, Mahanthesh | Cleveland State University, Cleveland, Ohio, Center for Human-Ma |
Miller, Jacob | Cleveland State University, Cleveland, Ohio, Center for Human-Ma |
Foley, Claire | Cleveland State University, Cleveland, Ohio, Center for Human-Ma |
Burkhart, Ian | North American Spinal Cord Injury Consortium |
Kubec, Gina | Cleveland State University, Cleveland, Ohio, Center for Human-Ma |
Schearer, Eric | The MetroHealth System |
Zingale, Nicholas | Cleveland State University |
Keywords: Multi-Modal Perception for HRI, Physically Assistive Devices, Object Detection, Segmentation and Categorization
Abstract: We present a robotic feeding system for finger foods for individuals with spinal cord injuries. Our system integrates a zero-shot object detection model, uses a Vision Language Model to recognize different food items, and leverages large language models to personalize feeding preferences. The platform is equipped with multimodal capabilities to process visual, auditory, and proprioceptive inputs in real time, respond with natural, human-like responses, and execute appropriate low-level robot actions. The system intelligently categorizes food items as single-bite or multi-bite to determine optimal grasp points. The robot also offers drink assistance using a straw-delivery approach. For safe and comfortable bite transfer, our solution implements dynamic positioning that adapts to the user's height and facial orientation, using visual servoing with lip detection to ensure food is delivered only when the user's mouth is open. Additionally, we provide a calibration method as an alternative to visual servoing, where the robot is set to admittance mode and manually moved to the user's desired bite transfer pose. User studies with seven participants, including one with a spinal cord injury, demonstrate excellent usability with SUS scores of 70 and NASA-TLX scores of 17, significantly outperforming the baseline average of 37±11.
|
|
09:30-09:55, Paper ThLB1R.24 | |
Human-Centered User Interface Design for Eye Imaging Medical Robot |
|
Staudinger, Samantha | University of Michigan |
Zhao, Genggeng | University of Michigan, Ann Arbor |
Pan, Haochi | University of Michigan |
Draelos, Mark | University of Michigan |
Keywords: Medical Robots and Systems
Abstract: We present the design and implementation of a user interface (UI) for a robotic optical coherence tomography (OCT) system, developed to enhance usability and accessibility in ophthalmic imaging. The system integrates a robotic arm that positions an actively tracking OCT scan head over an extended workspace, minimizing the need for mechanical head stabilization and manual alignment. The UI architecture supports both operator-assisted and autonomous imaging modes, offering real-time feedback, intuitive interaction, and motion-adaptive visualization to facilitate effective human-robot collaboration. Future work includes a formal user study to evaluate the UI’s performance in terms of usability, task efficiency, and adaptability.
|
|
09:30-09:55, Paper ThLB1R.25 | |
Equivariant Neural Inertial Odometry |
|
Kim, Chankyo | University of Michigan |
Lin, Tzu-Yuan | University of Michigan |
Zhu, Minghan | University of Michigan |
Liu, Ben | Southern University of Science and Technology |
Ghaffari, Maani | University of Michigan |
Keywords: Localization, Computational Geometry, AI-Based Methods
Abstract: Inertial odometry is essential for accurate localization in autonomous systems, particularly in environments where visual data is unreliable. However, noisy IMU measurements hinder accurate gravity compensation, which is critical for precise odometry. This paper presents a fully SO(3)-equivariant neural network framework for IMU-only odometry, designed to improve generalization across varying IMU mount orientations and mitigate challenges in gravity compensation. By representing IMU data in the Lie algebra and leveraging the symmetry properties of the rotation Lie group, the proposed method enforces 3D rotational symmetry within the feature space, leading to accurate state estimation in general unseen trajectories. Experimental results on diverse human motion datasets further demonstrate that our approach is more robust to gravitational perturbation on the IMU measurement, effectively reducing drift accumulation over the sequences. It remains robust under arbitrary IMU installation orientations, outperforming existing methods that rely on data augmentation or frame canonicalization. These improvements are achieved with a more compact network architecture, reducing computational complexity while maintaining accuracy. The results of this work enable reliable inertial position and orientation tracking in mobile robotics applications that have dynamic yaw direction motion with limited sensing capabilities.
|
|
09:30-09:55, Paper ThLB1R.26 | |
Fully Integrated Sensor Suite for Delicate Manipulation |
|
Shang, Siqi | University of Texas at Austin |
Seo, Mingyo | The University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Chin, Lillian | UT Austin |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Grasping
Abstract: Parallel-jaw grippers can perform a wide range of complex, everyday tasks. However, they often rely on open-loop control without direct grasp feedback, leading to failures in delicate manipulation and fragile object handling. Tactile sensing can address this, but existing methods typically involve high integration complexity or compromise gripper compliance. There is a clear need for developing sensorized compliant grippers without sacrificing simplicity. We address this by embedding 3D-printed air channels within Fin-Ray fingers. These channels act as force and slip sensors, providing precise, real-time feedback while remaining simple to fabricate and integrate. Our system estimates grip force with an RMSE under 0.2 N and incorporates an analytical slip detector using second-order spectral features of tactile vibrations. We evaluate the system on an autonomous grasp-and-lift task using 31 objects spanning fragile, slippery, and ordinary categories. Across 310 trials, our slip detector achieved 0.93 accuracy and 1.0 precision. Our system achieved 91.9% overall success and 98.6% on fragile objects. In contrast, an on-off strategy caused breakage of 84.6% of fragile objects, while a naive initial 0.25 N grip force yielded only 1.25% success on ordinary objects. These results demonstrate that our design combines accurate, real-time tactile feedback with the inherent compliance of the Fin-Ray structure, enabling reliable manipulation of delicate and diverse objects.
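An illustrative sketch of "second-order spectral features" of a tactile vibration signal (spectral centroid and bandwidth); the signal model, sampling rate, and slip threshold below are assumptions, not the authors' detector:

```python
# Sketch: spectral centroid and bandwidth of an air-channel pressure signal,
# used to flag the high-frequency vibration burst typical of slip.
import numpy as np

def spectral_features(x, fs):
    X = np.abs(np.fft.rfft(x - np.mean(x)))
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = X / (X.sum() + 1e-12)                                # normalised magnitude spectrum
    centroid = np.sum(f * p)                                 # first spectral moment
    bandwidth = np.sqrt(np.sum(((f - centroid) ** 2) * p))   # second spectral moment
    return centroid, bandwidth

fs = 1000.0
t = np.arange(0, 0.25, 1 / fs)
steady = 0.02 * np.random.randn(t.size)                      # stable-grasp sensor noise
slipping = steady + 0.2 * np.sin(2 * np.pi * 180 * t)        # high-frequency slip burst
for name, sig in [("steady", steady), ("slipping", slipping)]:
    c, b = spectral_features(sig, fs)
    print(f"{name}: centroid={c:.1f} Hz, bandwidth={b:.1f} Hz, slip={c > 100}")
```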
|
|
09:30-09:55, Paper ThLB1R.27 | |
Key Capabilities of Autonomous Mobile Platforms for Maintenance and Monitoring in Manufacturing Environments |
|
Sun, Yung-Ching | University of Michigan, Ann Arbor |
Staudinger, Samantha | University of Michigan |
Chapin, Hanna | Nestlé Purina PetCare North America |
Carter, Alyssa | Nestlé Purina PetCare North America |
Barton, Kira | University of Michigan at Ann Arbor |
Tilbury, Dawn | University of Michigan |
Keywords: Manufacturing, Maintenance and Supply Chains, Mobile Manipulation, Object Detection, Segmentation and Categorization
Abstract: Quadruped robots are increasingly being deployed in manufacturing environments for autonomous inspection and manipulation tasks due to their unique mobility advantages, such as navigating tight spaces and stairs, which make them strong candidates for manufacturing plants. This paper investigates the potential of quadruped robots equipped with a 6-degree-of-freedom arm to perform complex tasks, including pick-and-place operations and identifying target objects in a cluttered environment. We conducted experiments focused on a quadruped robot's ability to autonomously pick up, carry, and place a bucket in a structured workflow. We also investigated the use of computer vision and AprilTags for object detection and tracking. The quadruped demonstrated robust performance in both tasks, although challenges arise when handling heavier loads or finding target objects in cluttered environments. These findings clarify the capabilities and limitations of autonomous quadruped robots with manipulators in manufacturing settings.
|
|
09:30-09:55, Paper ThLB1R.28 | |
Dual-Arm Teleoperated Robotic Microsurgery System with Real-Time Volumetric OCT Image Feedback |
|
Liu, Jiawei | University of Michigan |
Ma, Guangshen | Duke University |
Zhou, Genggeng | Stanford University |
Pan, Haochi | University of Michigan |
Lam, Colin | University of Michigan |
Jin, Catherine | University of Michigan |
Valikodath, Nita | University of Michigan |
Draelos, Mark | University of Michigan |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Telerobotics and Teleoperation
Abstract: Microsurgical procedures demand exceptional precision and dexterity while posing significant challenges related to depth perception, fatigue, and hand tremor. Although intraoperative perception systems in surgical robotics typically provide real-time image feedback, they frequently lack volumetric or depth information. Moreover, teleoperated surgical robotic systems often require extensive training because their control mechanisms may not align with a surgeon's intuitive practices, and standardizing these controls is further complicated by varying user preferences. To address these issues, we propose a dual-arm teleoperated robotic system that provides high-fidelity intraoperative volumetric imaging for micro-scale tissue manipulation. The system integrates an optical coherence tomography (OCT) sensor for real-time 3D visualization and is operated through haptic input devices for precise manipulation. Surgeons can choose between multiple teleoperation frameworks to match their individual preferences and intuitions. We evaluate system performance through a precision positioning task and a vessel-following task in a retinal model, demonstrating average positioning errors of approximately 230 μm and 85 μm, respectively. Finally, we demonstrate the fully integrated system in an eggshell membrane peeling task that simulates retinal membrane peeling.
|
|
09:30-09:55, Paper ThLB1R.29 | |
Training Robot Swarms for Adaptive Foraging in Environments with Obstacles |
|
Biteng, Pigar | The University of Texas Rio Grande Valley |
Zaman, Tameem Uz | The University of Texas Rio Grande Valley |
Lu, Qi | The University of Texas Rio Grande Valley |
Keywords: Swarm Robotics, Multi-Robot Systems, Bioinspired Robot Learning
Abstract: We apply NeuroEvolution of Augmenting Topologies (NEAT) to train adaptive and efficient swarm robotic foraging behavior in unknown environments with random obstacles. By rewarding effective actions and penalizing inefficient ones, the system promotes coordination and obstacle avoidance, minimizing redundant exploration and outperforming traditional stochastic foraging algorithms. The optimization focuses on cumulative reward fitness, evaluated through simulations with three types of resource distributions, where swarm performance is analyzed in terms of foraging time efficiency and resource retrieval rates. Comparative experiments conducted across two swarm sizes demonstrate that the penalty-reward strategy enhances resource retrieval rates while reducing energy expenditure and improving obstacle avoidance, showing a significant improvement in foraging success and scalability over traditional foraging algorithms. In future work, we will leverage Federated Learning (FL) to build an efficient, distributed, scalable, and secure robot swarm tailored for foraging tasks.
|
|
09:30-09:55, Paper ThLB1R.30 | |
Design Concept of Electromagnetic Tracking Method Based on a Robotic Arm for Pose Estimation |
|
Hao-Kuei, Lu | National Taiwan University |
Hsu, Kuo-En | National Taiwan University |
Po-Ju, Huang | National Taiwan University |
Lin, Chun-Yeon | National Taiwan University |
Keywords: Localization
Abstract: Because electromagnetic tracking does not require a line of sight, it can be used inside the body or behind obstructions, which makes electromagnetic tracking systems attractive for minimally invasive procedures. One critical issue is that electromagnetic tracking does not provide uniform accuracy throughout the tracking volume. This study proposes integrating an electromagnetic tracking system, consisting of a magnetic field generator with five excitation coils and a sensing coil, with a robotic arm. The magnetic field generator is installed on the robotic arm's end effector, and the pose of the tracking device can be obtained by calibrating the electromagnetic tracking system with respect to the robotic arm. After calibration, the pose of the sensing coil in robotic arm coordinates can be obtained and used as a closed-loop signal to control the robotic arm and place the excitation coils in a range suitable for sensing. The proposed method, which integrates an electromagnetic tracking system with a robotic arm, can extend the range of electromagnetic tracking while keeping the tracking within its accurate region.
|
|
ThBT1 |
302 |
Planning and Simulation |
Regular Session |
Chair: Yoshida, Kazuya | Tohoku University |
Co-Chair: Dantam, Neil | Colorado School of Mines |
|
09:55-10:00, Paper ThBT1.1 | |
Guarantees on Robot System Performance Using Stochastic Simulation Rollouts |
|
Vincent, Joseph | Stanford University |
Feldman, Aaron | Stanford University |
Schwager, Mac | Stanford University |
Keywords: Probability and Statistical Methods, Optimization and Optimal Control, Motion and Path Planning, Risk-Sensitive Control
Abstract: We provide finite-sample performance guarantees for control policies executed on stochastic robotic systems. Given an open- or closed-loop policy and a finite set of trajectory rollouts under the policy, we bound the expected value, value-at-risk, and conditional-value-at-risk of the trajectory cost, and the probability of failure in a sparse cost setting. The bounds hold, with user-specified probability, for any policy synthesis technique and can be seen as a post-design safety certification. Generating the bounds only requires sampling simulation rollouts, without assumptions on the distribution or complexity of the underlying stochastic system. We adapt these bounds to also give a constraint satisfaction test to verify safety of the robot system. We provide a thorough analysis of the bound sensitivity to sim-to-real distribution shifts and provide results for constructing robust bounds that can tolerate some specified amount of distribution shift. Furthermore, we extend our method to apply when selecting the best policy from a set of candidates, requiring a multi-hypothesis correction. We show the statistical validity of our bounds in the Ant, Half-cheetah, and Swimmer MuJoCo environments and demonstrate our constraint satisfaction test with the Ant. Finally, using the 20 degree-of-freedom MuJoCo Shadow Hand, we show the necessity of the multi-hypothesis correction.
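A generic finite-sample sketch in the spirit of the abstract, using a plain Hoeffding bound on the expected trajectory cost from i.i.d. rollouts; the bounds derived in the paper (including value-at-risk and CVaR variants) may be constructed differently and be tighter:

```python
# Generic sketch: with probability >= 1 - delta, the expected trajectory cost is
# at most the empirical mean plus a Hoeffding term, assuming costs lie in [0, c_max]
# and rollouts are i.i.d.
import numpy as np

def hoeffding_upper_bound(costs, c_max, delta=0.05):
    costs = np.asarray(costs, dtype=float)
    n = costs.size
    return costs.mean() + c_max * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

rollout_costs = np.random.default_rng(0).uniform(0.0, 1.0, size=500)  # stand-in rollout costs
bound = hoeffding_upper_bound(rollout_costs, c_max=1.0)
print(f"95%-confidence upper bound on expected cost: {bound:.3f}")
```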
|
|
10:00-10:05, Paper ThBT1.2 | |
In-Pipe Navigation Development Environment and a Smooth Path Planning Method on Pipeline Surface |
|
Liu, Hao | Independent |
Li, Xiang | The Lab for High Technology, Tsinghua University |
Zhang, Xiang | Qylab |
Liu, Gang | Tsinghua University |
Lu, Mingquan | Tsinghua University |
Keywords: Motion and Path Planning, Climbing Robots
Abstract: Autonomous in-pipe inspection robots can automatically navigate through complex pipeline networks and detect potential risks from corrosion and defects, demonstrating great potential for replacing costly manual inspections. However, to the best of our knowledge, there is no publicly available simulation environment in which researchers can validate their in-pipe navigation algorithms, and navigation algorithms on the constrained 3D pipe surface, a critical software component, are rarely discussed. First, this paper proposes an open-source In-Pipe Navigation Development Environment. It contains various pipeline models, a magnetic wheel climbing robot model realized by an adhesion plugin, and baseline algorithms for navigation tasks. Second, a novel and effective path planning method is introduced. Instead of planning directly on the surface structure, the proposed method plans along the pipeline axis and maps the result into a local path on the pipe surface using the Frenet-Serret formulas, thereby generating smooth, feasible, and efficient paths. Finally, we conduct both qualitative and quantitative experiments in the proposed simulation environment and in real-world environments. The results show the usability of the development environment as well as the robustness and efficiency of the proposed planning method.
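A rough sketch of the axis-to-surface mapping idea: plan along the pipeline axis, attach a moving frame at each axis sample (a simplified stand-in for the Frenet-Serret frame), and offset by the pipe radius; the axis curve, radius, and frame construction are illustrative assumptions:

```python
# Sketch: map a path planned along the pipeline axis onto the pipe wall using a
# moving frame at each axis sample. The reference-vector frame below assumes the
# tangent never becomes vertical; a full Frenet-Serret or parallel-transport frame
# would handle that case.
import numpy as np

def moving_frames(axis_pts):
    """Discrete tangent/normal/binormal-like frames along a sampled 3D axis curve."""
    t = np.gradient(axis_pts, axis=0)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    ref = np.array([0.0, 0.0, 1.0])
    n = np.cross(t, ref)                      # pick a normal perpendicular to the tangent
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    b = np.cross(t, n)
    return t, n, b

def surface_path(axis_pts, radius, theta):
    """Map each axis sample to a point on the pipe wall at angle theta around the axis."""
    _, n, b = moving_frames(axis_pts)
    return axis_pts + radius * (np.cos(theta) * n + np.sin(theta) * b)

s = np.linspace(0, 2 * np.pi, 100)
axis = np.stack([np.cos(s), np.sin(s), 0.2 * s], axis=1)   # toy curved pipeline axis
wall_path = surface_path(axis, radius=0.15, theta=np.pi / 2)
print(wall_path[:3])
```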
|
|
10:05-10:10, Paper ThBT1.3 | |
Extended Friction Models for the Physics Simulation of Servo Actuators |
|
Duclusaud, Marc | LaBRI - University of Bordeaux |
Passault, Grégoire | LaBRI |
Padois, Vincent | Inria Bordeaux |
Ly, Olivier | LaBRI - Bordeaux University |
Keywords: Simulation and Animation, Calibration and Identification
Abstract: Accurate physical simulation is crucial for the development and validation of control algorithms in robotic systems. Recent works in Reinforcement Learning (RL) notably take advantage of extensive simulations to produce efficient robot control. State-of-the-art servo actuator models generally fail to capture the complex friction dynamics of these systems. This limits the transferability of simulated behaviors to real-world applications. In this work, we present extended friction models that allow servo actuator dynamics to be simulated more accurately. We propose a comprehensive analysis of various friction models, present a method for identifying model parameters using recorded trajectories from a pendulum test bench, and demonstrate how these models can be integrated into physics engines. The proposed friction models are validated on four distinct servo actuators and tested on 2R manipulators, showing significant improvements in accuracy over the standard Coulomb-Viscous model. Our results highlight the importance of considering advanced friction effects in the simulation of servo actuators to enhance the realism and reliability of robotic simulations.
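For context, the sketch below contrasts the standard Coulomb-viscous friction torque with one common extension that adds a Stribeck (velocity-dependent stiction) term; parameter values are illustrative, and the paper's extended models may take a different form:

```python
# Sketch of friction torque models for a servo joint: Coulomb + viscous, and a
# common Stribeck-type extension. Parameters are illustrative only.
import numpy as np

def coulomb_viscous(qdot, tau_c=0.08, b=0.01):
    return tau_c * np.sign(qdot) + b * qdot

def stribeck_extended(qdot, tau_c=0.08, tau_s=0.15, v_s=0.05, b=0.01):
    stribeck = (tau_s - tau_c) * np.exp(-(np.abs(qdot) / v_s) ** 2)  # extra torque near zero velocity
    return (tau_c + stribeck) * np.sign(qdot) + b * qdot

for v in [0.01, 0.1, 1.0]:
    print(f"qdot={v:4.2f} rad/s  CV={coulomb_viscous(v):.4f} N·m  "
          f"Stribeck={stribeck_extended(v):.4f} N·m")
```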
|
|
10:10-10:15, Paper ThBT1.4 | |
Hierarchically Accelerated Coverage Path Planning for Redundant Manipulators |
|
Wang, Yeping | University of Wisconsin-Madison |
Gleicher, Michael | University of Wisconsin - Madison |
Keywords: Motion and Path Planning, Industrial Robots
Abstract: Many robotic applications, such as sanding, polishing, wiping and sensor scanning, require a manipulator to dexterously cover a surface using its end-effector. In this paper, we provide an efficient and effective coverage path planning approach that leverages a manipulator's redundancy and task tolerances to minimize costs in joint space. We formulate the problem as a Generalized Traveling Salesman Problem and hierarchically streamline the graph size. Our strategy is to identify guide paths that roughly cover the surface and accelerate the computation by solving a sequence of smaller problems. We demonstrate the effectiveness of our method through a simulation experiment and an illustrative demonstration using a physical robot.
|
|
10:15-10:20, Paper ThBT1.5 | |
Decentralized Safe and Scalable Multi-Agent Control under Limited Actuation |
|
Zinage, Vrushabh | University of Texas at Austin |
Jha, Abhishek | Delhi Technological University |
Chandra, Rohan | University of Virginia |
Bakolas, Efstathios | The University of Texas at Austin |
Keywords: Integrated Planning and Control, Multi-Robot Systems
Abstract: To deploy safe and agile robots in cluttered environments, there is a need to develop fully decentralized controllers that guarantee safety, respect actuation limits, prevent deadlocks, and scale to thousands of agents. Current approaches fall short of meeting all these goals: optimization-based methods ensure safety but lack scalability, while learning-based methods scale but do not guarantee safety. We propose a novel algorithm to achieve safe and scalable control for multiple agents under limited actuation. Specifically, our approach includes: (i) learning a decentralized neural Integral Control Barrier function (neural ICBF) for scalable, input-constrained control, (ii) embedding a lightweight decentralized Model Predictive Control-based Integral Control Barrier Function (MPC-ICBF) into the neural network policy to ensure safety while maintaining scalability, and (iii) introducing a novel method to minimize deadlocks based on gradient-based optimization techniques from machine learning to address local minima in deadlocks. Our numerical simulations show that this approach outperforms state-of-the-art multi-agent control algorithms in terms of safety, input constraint satisfaction, and minimizing deadlocks. Additionally, we demonstrate strong generalization across scenarios with varying agent counts, scaling up to 1000 agents.
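A toy sketch of the underlying control-barrier-function filtering idea for a single pairwise constraint and single-integrator dynamics, solved in closed form; the paper's neural Integral CBF with actuation limits and deadlock handling is considerably richer than this:

```python
# Toy CBF safety filter: minimally modify the nominal input so that the barrier
# h(x) = ||x - x_other||^2 - r_safe^2 stays nonnegative, for xdot = u.
import numpy as np

def cbf_filter(x, x_other, u_nom, r_safe=0.5, alpha=1.0, u_max=1.0):
    d = x - x_other
    h = d @ d - r_safe ** 2            # barrier: positive when agents are separated
    a = 2.0 * d                        # gradient of h for single-integrator dynamics
    # constraint: a @ u >= -alpha * h
    if a @ u_nom >= -alpha * h:
        u = u_nom
    else:                              # project the nominal input onto the constraint boundary
        u = u_nom + (-alpha * h - a @ u_nom) / (a @ a) * a
    return np.clip(u, -u_max, u_max)   # crude actuation limit (clipping may relax safety)

u = cbf_filter(x=np.array([0.0, 0.0]), x_other=np.array([0.6, 0.0]),
               u_nom=np.array([1.0, 0.0]))
print(u)                               # the agent brakes instead of driving into its neighbor
```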
|
|
10:20-10:25, Paper ThBT1.6 | |
Multi-Agent Collective Construction of General Modular Structures |
|
Kostitsyna, Irina | KBR at NASA Ames Research Center |
Cheung, Kenneth C. | National Aeronautics and Space Administration (NASA) |
Gloyd, James | KBR Inc |
Keywords: Motion and Path Planning, Parallel Robots, Robotics and Automation in Construction
Abstract: We present an algorithmic framework for a multi-robot modular assembly system. Motivated by the prospects of in-space assembly, we focus on the NASA Automated Reconfigurable Mission Adaptive Digital Assembly Systems (ARMADAS) framework, in which multiple types of robots work together in a team to build large structures. Unlike with other multi-robot construction systems, the geometry of structures that ARMADAS robots can build is not limited to the class of histogram shapes. To address the intractability of path planning for a robot system with the exponentially growing number of dimensions, we present a decoupled planning approach, where the assembly and path planning is performed iteratively by one robot team at a time. We present a number of data structures which help us avoid collisions and deadlocks in the resulting robot schedule.
|
|
10:25-10:30, Paper ThBT1.7 | |
BPMP-Tracker: A Versatile Aerial Target Tracker Using Bernstein Polynomial Motion Primitives |
|
Lee, Yunwoo | Seoul National University |
Park, Jungwon | Seoul National University |
Jeon, Boseong | Seoul National University |
Jung, Seungwoo | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Visual Servoing, Reactive and Sensor-Based Planning, Motion and Path Planning
Abstract: This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Among the entire pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we employ the sample-check-select strategy: modules sample a set of candidate movements, check multiple constraints, and then select the best trajectory. Also, we leverage the properties of Bernstein polynomials for quick calculations. The prediction module predicts the trajectories of the targets, which do not overlap with static and dynamic obstacles. Then the trajectory planner outputs a trajectory, ensuring various conditions such as occlusion and collision avoidance, the visibility of all targets within a camera image and dynamical limits. We fully test the proposed tracker in simulations and hardware experiments under challenging scenarios, including dual-target following, environments with dozens of dynamic obstacles and complex indoor and outdoor spaces.
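A small sketch of evaluating a Bernstein-polynomial (Bezier) motion primitive with De Casteljau's algorithm; the control points are arbitrary illustrative values:

```python
# Sketch: evaluate a Bezier (Bernstein-basis) motion primitive with De Casteljau's algorithm.
import numpy as np

def de_casteljau(ctrl_pts, t):
    """Evaluate a Bezier curve defined by (n+1, dim) control points at t in [0, 1]."""
    pts = np.array(ctrl_pts, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]   # repeated linear interpolation
    return pts[0]

ctrl = [[0, 0, 1], [1, 0.5, 1.2], [2, 0.5, 1.5], [3, 0, 1.5]]   # toy cubic primitive
samples = [de_casteljau(ctrl, t) for t in np.linspace(0, 1, 5)]
print(np.round(samples, 3))
```

A useful property of such primitives is that the curve stays inside the convex hull of its control points, which keeps conservative collision and visibility checks cheap during the sample-check-select loop.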
|
|
ThBT2 |
301 |
SLAM 6 |
Regular Session |
Chair: Leonard, John | MIT |
Co-Chair: Schmid, Lukas M. | Massachusetts Institute of Technology (MIT) |
|
09:55-10:00, Paper ThBT2.1 | |
PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency |
|
Pan, Yue | University of Bonn |
Zhong, Xingguang | University of Bonn |
Wiesmann, Louis | University of Bonn |
Posewsky, Thorbjörn | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: SLAM, Mapping, Localization, Deep Learning
Abstract: Accurate and robust localization and mapping are essential components for most autonomous robots. In this paper, we propose a SLAM system for building globally consistent maps, called PIN-SLAM, that is based on an elastic and compact point-based implicit neural map representation. Taking range measurements as input, our approach alternates between incremental learning of the local implicit signed distance field and the pose estimation given the current local map using a correspondence-free, point-to-implicit model registration. Our implicit map is based on sparse optimizable neural points, which are inherently elastic and deformable with the global pose adjustment when closing a loop. Loops are also detected using the neural point features. Extensive experiments validate that PIN-SLAM is robust to various environments and versatile to different range sensors such as LiDAR and RGB-D cameras. PIN-SLAM achieves pose estimation accuracy better or on par with the state-of-the-art LiDAR odometry or SLAM systems and outperforms the recent neural implicit SLAM approaches while maintaining a more consistent, and highly compact implicit map that can be reconstructed as accurate and complete meshes. Finally, thanks to the voxel hashing for efficient neural points indexing and the fast implicit map-based registration without closest point association, PIN-SLAM can run at the sensor frame rate on a moderate GPU.
|
|
10:00-10:05, Paper ThBT2.2 | |
Data-Driven Batch Localization and SLAM Using Koopman Linearization |
|
Guo, Zi Cong | University of Toronto |
Dümbgen, Frederike | ENS, PSL University |
Forbes, James Richard | McGill University |
Barfoot, Timothy | University of Toronto |
Keywords: Localization, SLAM, Koopman, Model Learning for Control
Abstract: We present a framework for model-free batch localization and SLAM. We use lifting functions to map a control-affine system into a high-dimensional space, where both the process model and the measurement model are rendered bilinear. During training, we solve a least-squares problem using groundtruth data to compute the high-dimensional model matrices associated with the lifted system purely from data. At inference time, we solve for the unknown robot trajectory and landmarks through an optimization problem, where constraints are introduced to keep the solution on the manifold of the lifting functions. The problem is efficiently solved using a sequential quadratic program (SQP), where the complexity of an SQP iteration scales linearly with the number of timesteps. Our algorithms, called Reduced Constrained Koopman Linearization Localization (RCKL-Loc) and Reduced Constrained Koopman Linearization SLAM (RCKL-SLAM), are validated experimentally in simulation and on two datasets: one with an indoor mobile robot equipped with a laser rangefinder that measures range to cylindrical landmarks, and one on a golf cart equipped with RFID range sensors. We compare RCKL-
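An illustrative sketch of the data-driven "lift, then fit by least squares" step on a toy pendulum; the lifting functions and synthetic data are assumptions, and the paper's bilinear control-affine formulation and constrained inference are richer than this:

```python
# Sketch: lift states with fixed functions and fit a linear one-step predictor in
# the lifted space by least squares (an EDMD-style model learned purely from data).
import numpy as np

def lift(x):
    th, om = x
    return np.array([1.0, th, om, np.sin(th), np.cos(th), om * np.sin(th)])

# Generate a short pendulum trajectory as training data (Euler integration).
dt, g_over_l = 0.01, 9.81
xs = [np.array([0.5, 0.0])]
for _ in range(500):
    th, om = xs[-1]
    xs.append(np.array([th + dt * om, om - dt * g_over_l * np.sin(th)]))

Z = np.array([lift(x) for x in xs])
A, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)   # lifted one-step predictor: z_next ~ z @ A

z_pred = Z[0]
for _ in range(100):
    z_pred = z_pred @ A                               # roll the lifted linear model forward
print("predicted theta after 100 steps:", z_pred[1], " true:", xs[100][0])
```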
|
|
10:05-10:10, Paper ThBT2.3 | |
Certifiably Correct Range-Aided SLAM |
|
Papalia, Alan | Massachusetts Institute of Technology |
Fishberg, Andrew | MIT |
O'Neill, Brendan | WHOI/MIT |
How, Jonathan | Massachusetts Institute of Technology |
Rosen, David | Northeastern University |
Leonard, John | MIT |
Keywords: SLAM, Range Sensing, Optimization and Optimal Control, Certifiable Perception
Abstract: We present the first algorithm to efficiently compute certifiably optimal solutions to range-aided simultaneous localization and mapping (RA-SLAM) problems. Robotic navigation systems increasingly incorporate point-to-point ranging sensors, leading to state estimation problems in the form of RA-SLAM. However, the RA-SLAM problem is significantly more difficult to solve than traditional pose-graph SLAM: ranging sensor models introduce non-convexity and single range measurements do not uniquely determine the transform between the involved sensors. As a result, RA-SLAM inference is sensitive to initial estimates yet lacks reliable initialization techniques. Our approach, certifiably correct RA-SLAM (CORA), leverages a novel quadratically constrained quadratic programming (QCQP) formulation of RA-SLAM to relax the RA-SLAM problem to a semidefinite program (SDP). CORA solves the SDP efficiently using the Riemannian Staircase methodology; the SDP solution provides both (i) a lower bound on the RA-SLAM problem's optimal value, and (ii) an approximate solution of the RA-SLAM problem, which can be subsequently refined using local optimization. CORA applies to problems with arbitrary pose-pose, pose-landmark, and ranging measurements and, due to using convex relaxation, is insensitive to initialization. We evaluate CORA on several real-world problems. In contrast to state-of-the-art approaches, CORA is able to obtain high-quality solutions on all problems despite being initialized with random values. Additionally, we study the tightness of the SDP relaxation with respect to important problem parameters: the number of (i) robots, (ii) landmarks, and (iii) range measurements. These experiments demonstrate that the SDP relaxation is often tight and reveal relationships between graph connectivity and the tightness of the SDP relaxation.
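As background, the standard (Shor) semidefinite relaxation of a homogeneous QCQP, which is the general pattern such certifiable relaxations follow (the paper's exact constraint structure differs):

\min_{x}\; x^{\top} Q\, x \;\;\text{s.t.}\;\; x^{\top} A_i\, x = b_i
\qquad\longrightarrow\qquad
\min_{X \succeq 0}\; \operatorname{tr}(Q X) \;\;\text{s.t.}\;\; \operatorname{tr}(A_i X) = b_i,

obtained by substituting X = x x^{\top} and dropping the nonconvex constraint \operatorname{rank}(X) = 1. The SDP optimum lower-bounds the QCQP optimum; when the relaxation is tight (the SDP solution has rank one), the recovered estimate is certified globally optimal.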
|
|
10:10-10:15, Paper ThBT2.4 | |
DiTer++: Diverse Terrain and Multi-Modal Dataset for Multi-Robot SLAM in Multi-Session Environments |
|
Kim, Juwon | Dept. Electr. and Comput. Eng., Inha University, South Korea |
Kim, Hogyun | Inha University |
Jeong, Seokhwan | Inha University |
Shin, Young-Sik | KIMM |
Cho, Younggun | Inha University |
Keywords: Data Sets for SLAM, Localization, Mapping
Abstract: We encounter large-scale environments where both structured and unstructured spaces coexist, such as on campuses. In such environments, lighting conditions and dynamic objects change constantly. To tackle the challenges of large-scale mapping under such conditions, we introduce DiTer++, a diverse terrain and multi-modal dataset designed for multi-robot SLAM in multi-session environments. In our dataset's scenarios, Agent-A and Agent-B scan the area designated for efficient large-scale mapping during the day and at night, respectively. We also utilize legged robots for terrain-agnostic traversal. To generate the ground truth for each robot, we first build a survey-grade prior map. Then, we remove the dynamic objects and outliers from the prior map and extract the trajectory through scan-to-map matching. Our dataset and supplementary materials are available at https://github.com/sparolab/DiTer-plusplus/.
|
|
10:15-10:20, Paper ThBT2.5 | |
CELLmap: Enhancing LiDAR SLAM through Elastic and Lightweight Spherical Map Representation |
|
Duan, Yifan | University of Science and Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Li, Yao | University of Science and Technology of China |
You, Guoliang | University of Science and Technology of China |
Chu, Xiaomeng | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Mapping, SLAM
Abstract: SLAM is a fundamental capability of unmanned systems, with LiDAR-based SLAM gaining widespread adoption due to its high precision. Current SLAM systems can achieve centimeter-level accuracy within a short period. However, several challenges remain when dealing with large-scale mapping tasks, including significant storage requirements and difficulty in reusing the constructed maps. To address this, we first design an elastic and lightweight map representation called CELLmap, composed of several CELLs, each representing the local map at the corresponding location. Then, we design a general backend including a CELL-based bidirectional registration module and a loop closure detection module to improve global map consistency. Our experiments demonstrate that CELLmap can represent the precise geometric structure of large-scale maps of the KITTI dataset using only about 60 MB. Additionally, our general backend achieves up to a 26.88% improvement over various LiDAR odometry methods.
|
|
10:20-10:25, Paper ThBT2.6 | |
A Benchmark Dataset for Collaborative SLAM in Service Environments |
|
Park, Harin | UNIST |
Lee, Inha | Ulsan National Institute of Science & Technology |
Kim, Minje | Ulsan National Institute of Science & Technology |
Park, Hyungyu | Ulsan National Institute of Science and Technology |
Joo, Kyungdon | UNIST |
Keywords: Data Sets for SLAM, Multi-Robot SLAM, Data Sets for Robotic Vision
Abstract: We introduce a new multi-modal collaborative SLAM (C-SLAM) dataset for multiple service robots in various indoor service environments, called C-SLAM dataset in Service Environments (CSE). We use the NVIDIA Isaac Sim to generate data in various indoor service environments with the challenges that may occur in real-world service environments. By using the simulator, we can provide accurate and precisely time-synchronized sensor data, such as stereo RGB, stereo depth, IMU, and ground truth poses. We configure three common indoor service environments (Hospital, Office, and Warehouse), each of which includes various dynamic objects that perform motions suitable to each environment. In addition, we drive the three robots to mimic the actions of real service robots. Through these factors, we generate a realistic C-SLAM dataset for multiple service robots. We demonstrate our CSE dataset by evaluating diverse state-of-the-art single-robot SLAM and multi-robot SLAM methods. Our dataset will be available at https://github.com/vision3d-lab/CSE_Dataset.
|
|
10:25-10:30, Paper ThBT2.7 | |
A Consistent Parallel Estimation Framework for Visual-Inertial SLAM |
|
Huai, Zheng | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: SLAM, Visual-Based Navigation, Sensor Fusion, Estimation Consistency
Abstract: In this article, we revisit the optimal fusion of visual and inertial information from a monocular camera and an inertial measurement unit, and propose a novel parallel visual-inertial simultaneous localization and mapping (SLAM) estimation framework that favors multithreaded computation on a single CPU. We start by modeling the SLAM problem with a Bayesian batch estimator and then split it into two submodules, localization and mapping, which operate at different scales and processing rates and can thus run concurrently. Estimation consistency is taken into account in decoupling the two submodules, so that when loop closure occurs the localization accuracy can seamlessly benefit from the mapping result via online global optimization, which distinguishes our solution from others. To this end, we design the corresponding front-end and back-end to consistently solve localization and mapping in parallel; in particular, hybrid robocentric and world-centric formulations are used to model the respective problems. We also demonstrate the effectiveness of the proposed method using both synthetic data generated for Monte-Carlo simulations and diverse real datasets acquired in highly dynamic, long-term, and large-scale SLAM scenarios. Simulation results validate the significantly improved consistency and accuracy achieved by applying our method. Experimental results show better (or at least competitive) performance against a state-of-the-art method, while our method is capable of processing a huge number of measurements when building large-scale maps without blocking the high-accuracy real-time localization outputs.
|
|
ThBT3 |
303 |
Pose Estimation |
Regular Session |
Chair: Caverly, Ryan James | University of Minnesota |
Co-Chair: Anderson, Monica | The University of Alabama |
|
09:55-10:00, Paper ThBT3.1 | |
Depth-Based Efficient PnP: A Rapid and Accurate Method for Camera Pose Estimation |
|
Xie, Xinyue | Dalian University of Technology |
Zou, Deyue | Dalian University of Technology |
Keywords: SLAM, Vision-Based Navigation
Abstract: This paper presents DEPnP (Depth-based Efficient PnP), a novel approach to the Perspective-n-Point (PnP) problem, which estimates the pose of a calibrated camera from the 2D projections of known 3D points onto the camera image plane and is crucial in vision-based navigation and SLAM (Simultaneous Localization and Mapping) for robotics and automation. The method employs eight variables to control the depth of the control points and the orientation of the camera, formulating camera pose estimation as an optimization task. By optimizing these variables using mean-subtracted rotation equations, rapid and accurate camera pose estimation is achieved. Notably, the careful selection of variables and objective function simplifies the computation of the Jacobian matrix, ensuring computational efficiency. DEPnP demonstrates robustness against noise and inlier disturbances, consistently delivering accurate camera pose estimation. Experimental evaluations validate the effectiveness and accuracy of DEPnP, positioning it as a competitive solution for real-time applications requiring precise camera pose estimation in robotics and automation.
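A minimal sketch of camera pose estimation as nonlinear least squares over reprojection error, using a plain axis-angle-plus-translation parameterization rather than DEPnP's eight depth and orientation variables (all data below are synthetic):

import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    # Axis-angle vector -> rotation matrix (Rodrigues formula).
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def reprojection_residuals(params, pts3d, pts2d, K):
    R, t = rodrigues(params[:3]), params[3:]
    cam = (R @ pts3d.T).T + t                 # 3D points in the camera frame
    proj = (K @ cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]         # perspective division
    return (proj - pts2d).ravel()

# Synthetic scene with a known pose.
rng = np.random.default_rng(2)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts3d = rng.uniform(-1, 1, size=(20, 3)) + np.array([0, 0, 5.0])
R_true, t_true = rodrigues(np.array([0.1, -0.2, 0.05])), np.array([0.2, -0.1, 0.3])
cam = (R_true @ pts3d.T).T + t_true
pts2d = (K @ cam.T).T
pts2d = pts2d[:, :2] / pts2d[:, 2:3]

sol = least_squares(reprojection_residuals, np.zeros(6), args=(pts3d, pts2d, K))
print("recovered translation:", sol.x[3:])    # approx. [0.2, -0.1, 0.3]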
|
|
10:00-10:05, Paper ThBT3.2 | |
Kalman-Filter-Based Pose Estimation of Cable-Driven Parallel Robots Using Cable-Length Measurements with Colored Noise |
|
Nguyen, Vinh | University of Minnesota |
Caverly, Ryan James | University of Minnesota |
Keywords: Parallel Robots, Kinematics, Sensor Fusion
Abstract: This paper introduces a cable-length-based extended Kalman filter (L-EKF) framework to estimate the end-effector pose of a cable-driven parallel robot (CDPR). The L-EKF fuses end-effector accelerometer and rate gyroscope measurements with cable-length measurements. The main contribution compared to prior CDPR pose estimation EKF methods is that the L-EKF framework does not require an iterative forward kinematics algorithm to be solved at each time step, reducing the computation time of the EKF. Moreover, the L-EKF is amenable to the inclusion of colored measurement noise, which provides a more realistic quantification of the kinematic uncertainty present in the cable-length measurements. Experimental results demonstrate that the L-EKF is computationally more efficient than previous forward-kinematics-based EKF methods and that the colored noise model provides a moderate improvement in pose estimation.
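One common way to make an EKF amenable to colored measurement noise, shown here only as a generic construction (the paper's model may differ), is to describe the noise as a first-order Gauss-Markov process and augment the state:

\eta_{k+1} = \phi\,\eta_k + w_k,\quad w_k \sim \mathcal{N}(0, Q_{\eta}), \qquad
x_k^{a} = \begin{bmatrix} x_k \\ \eta_k \end{bmatrix}, \qquad
\ell_k = h(x_k) + \eta_k + v_k,

where \ell_k is the cable-length measurement, h(\cdot) the cable-length kinematics, and the EKF then runs on the augmented state x_k^{a} with the white noises w_k and v_k.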
|
|
10:05-10:10, Paper ThBT3.3 | |
A Unified End-To-End Network for Category-Level and Instance-Level Object Pose Estimation from RGB Images |
|
Ren, Jiale | Peking University |
Liu, Hong | Peking University |
Liu, Jinfu | Peking University |
Jiang, Peifeng | Peking University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Deep Learning Methods
Abstract: Accurately estimating the 6-DoF pose of objects is a fundamental challenge in computer vision and robotics. While category-level pose estimation based on RGBD data has achieved good performance in recent years, estimating poses solely from RGB images remains a significant challenge. Existing RGB-based category-level methods primarily focus on recovering object point clouds from RGB images, and pose prediction is not performed end-to-end by a network. This paper presents a Category-level and Instance-level Pose Estimation Network (CIPE), which models pose estimation as a set prediction problem and enables direct pose regression from RGB images. To further enhance the network's ability to learn object poses, first, a novel learnable rotation representation that redefines rotation learning within Euclidean space is introduced to facilitate rotation regression. Additionally, we propose a prior-query fusion strategy that utilizes a pre-trained point cloud feature extraction network to integrate categorical object features with bounding boxes, thereby improving the incorporation of category information. Experimental results demonstrate that CIPE significantly outperforms existing RGB-based methods on both category-level and instance-level datasets. The code is available at https://github.com/jialeren/CIPE.
|
|
10:10-10:15, Paper ThBT3.4 | |
MonoLDP: LED Assisted Indoor Mobile Bot Monocular Depth Prediction and Pose Estimation System |
|
Liang, Chenxin | Tsinghua University |
Wang, Jingyang | Shenzhen International Graduate School, Tsinghua University |
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Sou, Kit-Wa | Tsinghua University |
Luo, Xinyu | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Keywords: Computer Vision for Automation, Visual Learning, RGB-D Perception
Abstract: Multi-robot clusters are increasingly deployed in indoor environments, where effective communication and 3D perception are critical for coordinated operations. Monocular cameras, known for their lightweight design, cost-effectiveness, and versatility, present a promising solution for these tasks. However, relying solely on monocular cameras for comprehensive perception and communication presents significant challenges. To address this, we introduce MonoLDP, a novel system that leverages monocular cameras for depth estimation, mutual pose estimation, and visible light communication in indoor environments, providing an integrated framework to overcome these limitations. MonoLDP features a two-stage network: (1) a depth estimation module that infers depth from monocular images, and (2) a depth-guided 3D object recognition network for agent-relative localization and pose estimation. We created a custom dataset to validate the accuracy of MonoLDP. On our indoor dataset, MonoLDP outperforms the baseline by 43.39% in 3D detection and 42.39% in bird’s-eye view detection, with an average localization error of 0.104m and an orientation error of 1.66 degrees. Moreover, the depth estimation network demonstrates excellent performance on the NYU v2 dataset. Additionally, the system achieves a communication rate of 1.2 Kbps with a bit error rate below 10^(-2) at a distance of up to 4 meters using LED arrays. Our code will be released at https://github.com/RavenLiang1005/MonoLDP.git.
|
|
10:15-10:20, Paper ThBT3.5 | |
LCSPose: Efficient, Accurate and Scalable Markerless 6-DoF Pose Estimation of a Quay Crane Spreader Based on LiDAR and Camera |
|
Zhou, Yichen | Nanyang Technological University |
Zhang, Jun | Nanyang Technological University |
Peng, Guohao | Nanyang Technological University |
Yun, Yanpu | Nanyang Technological University |
Liu, Yiyao | Nanyang Technological University |
Wang, Yuanzhe | Shandong University |
Wang, Danwei | Nanyang Technological University |
Keywords: Field Robots, Industrial Robots, Perception for Grasping and Manipulation
Abstract: Accurate Six Degrees of Freedom (6-DoF) pose estimation of Ship-To-Shore (STS) quay crane spreaders is crucial for ensuring safe and efficient container handling in port automation. However, existing pose estimation techniques face significant challenges, as camera-based systems either rely on markers, which are prone to damage, or struggle with depth estimation inaccuracies. Additionally, 3D sensor-based approaches, particularly point cloud registration (PCR), face challenges such as initial pose errors, high-latency inference, and difficulties in object identification based purely on geometric features. To address these limitations, we propose LCSPose, a LiDAR-camera fusion-based 6-DoF pose estimation method that is marker-free, accurate, efficient, and scalable. Our approach integrates three key modules: (1) a semantic-geometric segmentation module for spreader segmentation and outlier removal, (2) a spatial consistency template sampling module based on Spatial Consistency Score (SC-Score) for reliable template selection across varying distances, and (3) a multi-view coarse-to-fine pose refinement module which incorporates multi-view PCA alignment for robust initial posture prior estimation and iterative pose refinement strategy for long-range registration. Our method demonstrates a 60% improvement in registration recall over state-of-the-art (SOTA) PCR methods, achieving up to 6 cm in translation error and 0.19 degrees in rotation error, while maintaining real-time processing at 20Hz.
|
|
10:20-10:25, Paper ThBT3.6 | |
ZeroBP: Learning Position-Aware Correspondence for Zero-Shot 6D Pose Estimation in Bin-Picking |
|
Chen, Jianqiu | Harbin Institute of Technology |
Zhou, Zikun | Pengcheng Laboratory |
Li, Xin | Pengcheng Laboratory |
Bao, Tianpeng | Guangzhou Medical University |
Zheng, Ye | JD Logistics |
He, Zhenyu | Harbin Institute of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception
Abstract: Bin-picking is a practical and challenging robotic manipulation task, where accurate 6D pose estimation plays a pivotal role. The workpieces in bin-picking are typically textureless and randomly stacked in a bin, which poses a significant challenge to 6D pose estimation. Existing solutions are typically learning-based methods, which require object-specific training. Their efficiency of practical deployment for novel workpieces is highly limited by data collection and model retraining. Zero-shot 6D pose estimation is a potential approach to address the issue of deployment efficiency. Nevertheless, existing zero-shot 6D pose estimation methods are designed to leverage feature matching to establish point-to-point correspondences for pose estimation, which is less effective for workpieces with textureless appearances and ambiguous local regions. In this paper, we propose ZeroBP, a zero-shot pose estimation framework designed specifically for the bin-picking task. ZeroBP learns Position-Aware Correspondence (PAC) between the scene instance and its CAD model, leveraging both local features and global positions to resolve the mismatch issue caused by ambiguous regions with similar shapes and appearances. Extensive experiments on the ROBI dataset demonstrate that ZeroBP outperforms state-of-the-art zero-shot pose estimation methods, achieving an improvement of 9.1% in average recall of correct poses.
|
|
10:25-10:30, Paper ThBT3.7 | |
Virtual Frame Rotation: A Novel Two-Stage Pose Estimation Scheme of Permanent Magnet Marker for Medical Applications |
|
Park, Jiho | Gwangju Institute of Science and Technology |
Lim, Buyong | GIST |
Yoon, Jungwon | Gwangju Institute of Science and Technology |
Keywords: Medical Robots and Systems, Micro/Nano Robots, Localization
Abstract: Permanent magnetic marker (PMM) has the potential to broaden the scope of medical robots by facilitating the localization of target points even in environments where vision-based methods cannot operate. However, conventional approaches rely on the accuracy of the modeling equations and are not adaptable to changes in the magnet's properties, which can occur due to factors like non-uniformity in the marker material or temperature fluctuations within the PMM. These constraints make it challenging to apply the PMM across diverse medical techniques. In this work, we introduce a novel two-stage PMM localization scheme, called Virtual Frame Rotation (VFR), designed to address this issue. VFR employs an approach that virtually rotates the observation frame of the hall sensors' output vector and checks the symmetry of the magnetic field in the rotated frame. This approach allows for robust pose estimation of the condition with variance in magnetic properties, as verified by comparing its localization performance with the conventional approach in the simulation and the real-world environments with temperature variance conditions. Based on these characteristics, VFR can expand the scope of medical applications that involve changes in the properties of magnetic markers, such as the in-body localization of magnetic macro particles for hyperthermia treatment.
|
|
ThBT4 |
304 |
Bioinspiration and Biomimetics 1 |
Regular Session |
Chair: Mazzolai, Barbara | Istituto Italiano Di Tecnologia |
Co-Chair: Ramezani, Alireza | Northeastern University |
|
09:55-10:00, Paper ThBT4.1 | |
Back-Stepping Experience Replay with Application to Model-Free Reinforcement Learning for a Soft Snake Robot |
|
Qi, Xinda | Michigan State University |
Chen, Dong | Mississippi State University |
Li, Zhaojian | Michigan State University |
Tan, Xiaobo | Michigan State University |
Keywords: Modeling, Control, and Learning for Soft Robots, Reinforcement Learning, Biologically-Inspired Robots
Abstract: In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a purification of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.
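A minimal sketch of how back-stepping transitions could populate an off-policy replay buffer, assuming an approximately reversible system and a user-supplied reverse-action model; BER's purification of inaccurate back-stepping transitions is not reproduced here:

from collections import deque
import random

def backstep_transitions(trajectory, reverse_action):
    # trajectory: list of (state, action, reward, next_state) tuples recorded forward in time.
    reversed_batch = []
    for s, a, r, s_next in reversed(trajectory):
        # Replay the step backwards: start from s_next and step toward s with an
        # approximate reverse action (only meaningful for near-reversible dynamics).
        reversed_batch.append((s_next, reverse_action(a), r, s))
    return reversed_batch

buffer = deque(maxlen=100_000)
trajectory = [((0.0,), +1.0, 0.0, (1.0,)), ((1.0,), +1.0, 1.0, (2.0,))]
buffer.extend(trajectory)
buffer.extend(backstep_transitions(trajectory, reverse_action=lambda a: -a))
print(random.sample(list(buffer), k=2))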
|
|
10:00-10:05, Paper ThBT4.2 | |
Continuous Convolution for Automated Measurement of Sperm Flagella |
|
Jin, Yufei | The Chinese University of Hong Kong (Shenzhen) |
Yang, Han | The Chinese University of Hong Kong, Shenzhen |
Chen, Wenyuan | University of Toronto |
Wang, Xinrui | The Chinese University of Hong Kong (Shenzhen) |
Sun, Yu | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Keywords: Automation at Micro-Nano Scales, Deep Learning Methods, Computer Vision for Automation
Abstract: Quantifying sperm flagellar beating behavior (e.g., beating amplitude, frequency, and wavelength) plays a crucial role in biological research, clinical diagnostics, and the design of sperm-inspired microrobots. However, existing computational methods struggle to accurately and efficiently analyze the highly dynamic, complex, and fine structures of sperm flagella, especially when portions of the flagellum become invisible due to three-dimensional out-of-focus beating. This paper proposes an automated high-throughput tool for quantitative analysis of sperm flagellar beating. The core innovation is continuous convolution (CConv), which adaptively captures the irregular, time-varying patterns of sperm flagella while ensuring continuity in segmentation outputs, even in the presence of locally invisible regions caused by out-of-focus motion. CConv can be integrated into various neural network architectures as a plug-and-play module. Extensive experiments demonstrate that integrating CConv consistently improves the accuracy and continuity of flagella segmentation across different networks. Furthermore, utilizing a curvature-based approach, we quantified key flagellar beating parameters, including length, amplitude, frequency, and wavelength. Applying the high-throughput tool on 1200 sperm revealed that sperm from fertile donors had significantly higher flagellar beating frequency than sperm from infertile patients. The proposed automated tool unlocks high-throughput, quantitative analysis of sperm flagellar beating, showing the potential for applications in reproductive biology and engineering research.
|
|
10:05-10:10, Paper ThBT4.3 | |
Adaptive Concertina Locomotion of a Robotic Snake through Narrow Uncertain Channels |
|
Koley, Jit | Indian Institute of Technology Bombay |
Sharma, Devashish | Hindustan Institute of Technology and Science, Chennai |
Chakraborty, Debraj | Indian Institute of Technology Bombay |
K. Pillai, Harish | Indian Institute of Technology Bombay |
Keywords: Redundant Robots, Search and Rescue Robots, Actuation and Joint Mechanisms
Abstract: The problem of mimicking the concertina locomotion mode of biological snakes through narrow channels of uncertain width, using a multi-link planar serpenoid robot, is considered. A novel algorithm for generating a reference trajectory that accurately reproduces this natural gait pattern is proposed and analysed for straight channels. A modification of this algorithm leverages feedback from the joints’ current and angular velocities to dynamically adjust the robot’s movements within channels of unknown and varying widths. Experiments through rugged artificial channels of varying width show remarkable ability of the programmed snake robot to negotiate such terrains with agility and reasonable speed.
|
|
10:10-10:15, Paper ThBT4.4 | |
Bio-Inspired Distributed Neural Locomotion Controller (D-NLC) for Robust Locomotion and Emergent Behaviors |
|
Zhang, Zhikai | Carnegie Mellon University |
Guo, Siqi | Carnegie Mellon University |
Kou, Henry | Carnegie Mellon University |
Shikhare, Ishayu | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Li, Lu | Carnegie Mellon University |
Keywords: Biologically-Inspired Robots, Cellular and Modular Robots, Neurorobotics
Abstract: With relatively fewer neurons than more complex life forms, insects are still capable of producing astonishing locomotive behaviors, such as traversing diverse habitats and making rapid gait adaptations after extreme injury or autotomy. Biologists attribute this to a chain of segmental neuron clusters (ganglia) within insect nervous systems, which act as distributed, self-organizing sensorimotor control units. Inspired by the neural structure of the Carausius morosus, the common stick insect, this research introduces the Distributed Neural Locomotion Controller (D-NLC), a modular control framework utilizing local proprioceptive feedback to modulate joint-level Central Pattern Generator (CPG) signals to produce emergent locomotive behaviors. We implemented this framework using a modular legged robot with distributed joint-level embedded computing units and assessed its performance and behavior under various experimental settings. Based on real-world experiments, we observe an overall 31.3% average increase in curvilinear motion performance under external (terrain) and internal (amputation) disturbances compared to a centralized predefined gait controller. This difference is statistically significant (P<<0.05) for larger perturbations but not for single-leg amputations. Experiments with perturbation-induced leg stance duration and leg-phase-difference analysis further validated our hypothesis regarding D-NLC's role in the robust perceptive locomotion and self-emergent gait adaptation against complex unforeseen perturbations. This proposed control framework does not require any numerical optimization or weight training processes, which are time-consuming and computationally expensive. To the best of our knowledge, this framework is the first bio-inspired neural controller deployed on a distributed embedded system.
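A minimal sketch of a joint-level phase-oscillator CPG whose frequency is modulated by a local proprioceptive load signal, a common building block in distributed insect-inspired controllers; the coupling structure and parameter values here are invented, not those of D-NLC:

import numpy as np

def cpg_step(phases, coupling, load, dt=0.01, omega=2 * np.pi):
    # phases: (n,) oscillator phases; coupling: (n, n) weights for anti-phase coordination;
    # load: (n,) proprioceptive feedback in [0, 1] that slows down loaded legs.
    dphi = omega * (1.0 - 0.5 * load)
    for i in range(len(phases)):
        dphi[i] += np.sum(coupling[i] * np.sin(phases - phases[i] - np.pi))
    return (phases + dt * dphi) % (2 * np.pi)

phases = np.array([0.0, np.pi])                      # two legs, anti-phase target
coupling = np.array([[0.0, 1.0], [1.0, 0.0]])
for _ in range(500):
    phases = cpg_step(phases, coupling, load=np.array([0.2, 0.0]))
print("leg phase difference:", (phases[1] - phases[0]) % (2 * np.pi))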
|
|
10:15-10:20, Paper ThBT4.5 | |
Reduced-Order Model-Based Gait Generation for Snake Robot Locomotion Using NMPC |
|
Salagame, Adarsh | Northeastern University |
Sihite, Eric | California Institute of Technology |
Ramezani, Milad | CSIRO |
Ramezani, Alireza | Northeastern University |
Keywords: Biologically-Inspired Robots, Optimization and Optimal Control, Motion Control
Abstract: This paper presents an optimization-based motion planning methodology for snake robots operating in constrained environments. By using a reduced-order model, the proposed approach simplifies the planning process, enabling the optimizer to autonomously generate gaits while constraining the robot’s footprint within tight spaces. The method is validated through high-fidelity simulations that accurately model contact dynamics and the robot’s motion. Key locomotion strategies are identified and further demonstrated through hardware experiments, including successful navigation through narrow corridors.
|
|
10:20-10:25, Paper ThBT4.6 | |
AquaMILR+: Design of an Untethered Limbless Robot for Complex Aquatic Terrain Navigation |
|
Fernandez, Matthew | Georgia Institute of Technology |
Wang, Tianyu | Georgia Institute of Technology |
Tunnicliffe, Galen | Georgia Institute of Technology |
Dortilus, Donoven | Georgia Institute of Technology |
Gunnarson, Peter | California Institute of Technology |
Dabiri, John | California Institute of Technology |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Biologically-Inspired Robots, Redundant Robots, Search and Rescue Robots
Abstract: This paper presents AquaMILR+, an untethered limbless robot designed for agile navigation in complex aquatic environments. The robot features a bilateral actuation mechanism that models musculoskeletal actuation in many anguilliform swimming organisms which propagates a moving wave from head to tail allowing open fluid undulatory swimming. This actuation mechanism employs mechanical intelligence, enhancing the robot's maneuverability when interacting with obstacles. AquaMILR+ also includes a compact depth control system inspired by the swim bladder and lung structures of eels and sea snakes. The mechanism, driven by a syringe and telescoping leadscrew, enables depth and pitch control -- capabilities that are difficult for most anguilliform swimming robots to achieve. Additional structures, such as fins and a tail, further improve stability and propulsion efficiency. Our tests in both open water and indoor 2D and 3D heterogeneous aquatic environments highlight AquaMILR+'s capabilities and suggest a promising system for complex underwater tasks such as search and rescue and deep-sea exploration.
|
|
10:25-10:30, Paper ThBT4.7 | |
Traversing between Two Planes Using Obstacle-Aided Locomotion of a Snake Robot |
|
Yoshida, Yuto | The University of Electro-Communications |
Chin, Ching Wen | The University of Electro-Communications |
Tanaka, Motoyasu | The Univ. of Electro-Communications |
|
|
ThBT5 |
305 |
Model Predictive Control for Legged Robots 1 |
Regular Session |
Chair: Lee, Jinoh | German Aerospace Center (DLR) |
Co-Chair: Zhao, Ye | Georgia Institute of Technology |
|
09:55-10:00, Paper ThBT5.1 | |
Adapting Gait Frequency for Posture-Regulating Humanoid Push-Recovery Via Hierarchical Model Predictive Control |
|
Li, Junheng | University of Southern California |
Le, Zhanhao | University of Southern California |
Ma, Junchao | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Humanoid and Bipedal Locomotion, Optimization and Optimal Control, Whole-Body Motion Planning and Control
Abstract: Current humanoid push-recovery strategies often use whole-body motion, yet they tend to overlook posture regulation. For instance, in manipulation tasks, the upper body may need to stay upright and have minimal recovery displacement. This paper introduces a novel approach to enhancing humanoid push-recovery performance under unknown disturbances and regulating body posture by tailoring the recovery stepping strategy. We propose a hierarchical-MPC-based scheme that analyzes and detects instability in the prediction window and quickly recovers through adapting gait frequency. Our approach integrates a high-level nonlinear MPC, a posture-aware gait frequency adaptation planner, and a low-level convex locomotion MPC. The planners predict the center of mass (CoM) state trajectories that can be assessed for precursors of potential instability and posture deviation. In simulation, we demonstrate improved maximum recoverable impulse by 131% on average compared with baseline approaches. In hardware experiments, a 125 ms advancement in recovery stepping timing/reflex has been observed with the proposed approach. We also demonstrate improved push-recovery performance and minimized body attitude change under 0.2 rad.
|
|
10:00-10:05, Paper ThBT5.2 | |
Robots with Attitude: Singularity-Free Quaternion-Based Model-Predictive Control for Agile Legged Robots |
|
Zhang, Zixin | Northwestern University |
Zhang, John | Carnegie Mellon University |
Yang, Shuo | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Legged Robots, Optimization and Optimal Control, Body Balancing
Abstract: We present a model-predictive control (MPC) framework for legged robots that avoids the singularities associated with common three-parameter attitude representations like Euler angles during large-angle rotations. Our method parameterizes the robot's attitude with singularity-free unit quaternions and makes modifications to the iterative linear-quadratic regulator (iLQR) algorithm to deal with the resulting geometry. The derivation of our algorithm requires only elementary calculus and linear algebra, deliberately avoiding the abstraction and notation of Lie groups. We demonstrate the performance and computational efficiency of quaternion MPC in several experiments on quadruped and humanoid robots.
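A common singularity-free construction of this kind, given only as a generic sketch (the paper derives its own quaternion-aware iLQR modifications), optimizes a three-parameter increment applied multiplicatively to a reference quaternion:

q = q_{\mathrm{ref}} \otimes \delta q(\phi), \qquad
\delta q(\phi) = \frac{1}{\sqrt{4 + \|\phi\|^{2}}}\begin{bmatrix} 2 \\ \phi \end{bmatrix}, \qquad \phi \in \mathbb{R}^{3},

so that gradients, feedback gains, and updates are computed for the local increment \phi while the attitude itself always remains a unit quaternion; no global three-parameter chart, and hence no gimbal-lock singularity, is ever used.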
|
|
10:05-10:10, Paper ThBT5.3 | |
Online Nonlinear MPC for Multimodal Locomotion |
|
Taliani, Saverio | Italian Institute of Technology |
Nava, Gabriele | Istituto Italiano Di Tecnologia |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Elobaid, Mohamed | Fondazione Istituto Italiano Di Tecnologia |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Aerial Systems: Mechanics and Control, Control Architectures and Programming
Abstract: Aerial humanoid robots can enhance the efficiency and safety of rescue operations in disaster scenarios. The control of such complex machines presents many challenges, for instance, controlling the different locomotion strategies and stabilizing the transition maneuvers. In this article, we present an online nonlinear Model Predictive Controller and the associated prediction model to stabilize walking and flying trajectories. The controller uses a reduced model to generate feasible base link references, thrust profiles, and contact forces while dealing with different locomotion strategies and transition maneuvers. The control algorithm is tested in a simulated environment using our aerial humanoid robot iRonCub under the effect of external disturbances. The proposed control strategy is shown to effectively stabilize the desired trajectories while keeping the problem tractable online.
|
|
10:10-10:15, Paper ThBT5.4 | |
Terrain-Aware Model Predictive Control of Heterogeneous Bipedal and Aerial Robot Coordination for Search and Rescue Tasks |
|
Shamsah, Abdulaziz | Georgia Institute of Technology |
Jiang, Jesse | Georgia Institute of Technology |
Yoon, Ziwon | Georgia Institute of Technology |
Coogan, Samuel | Georgia Tech |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Multi-Robot Systems, Search and Rescue Robots
Abstract: Humanoid robots offer significant advantages for search and rescue tasks, thanks to their capability to traverse rough terrains and perform transportation tasks. In this study, we present a task and motion planning framework for search and rescue operations using a heterogeneous robot team composed of humanoids and aerial robots. We propose a terrain-aware Model Predictive Controller (MPC) that incorporates terrain elevation gradients learned using Gaussian processes (GP). This terrain-aware MPC generates safe navigation paths for the bipedal robots to traverse rough terrain while minimizing terrain slopes, and it directs the quadrotors to perform aerial search and mapping tasks. The rescue subjects' locations are estimated by a target belief GP, which is updated online during the map exploration. A high-level planner for task allocation is designed by encoding the navigation tasks using syntactically cosafe Linear Temporal Logic (scLTL), and a consensus-based algorithm is designed for task assignment of individual robots. We evaluate the efficacy of our planning framework in simulation in an uncertain environment with various terrains and random rescue subject placements.
|
|
10:15-10:20, Paper ThBT5.5 | |
Koopman Operator Based Linear Model Predictive Control for Quadruped Trotting |
|
Yang, Chun-Ming | University of Illinois at Chicago |
Bhounsule, Pranav | University of Illinois at Chicago |
Keywords: Legged Robots, Model Learning for Control, Force Control
Abstract: Online optimal control of quadruped robots would enable them to adapt to varying inputs and changing conditions in real time. A common way of achieving this is linear model predictive control (LMPC), where a quadratic programming (QP) problem is formulated over a finite horizon with a quadratic cost and linear constraints obtained by linearizing the equations of motion and solved on the fly. However, the model linearization may lead to model inaccuracies. In this paper, we use the Koopman operator to create a linear model of the quadrupedal system in high dimensional space which preserves the nonlinearity of the equations of motion. Then using LMPC, we demonstrate high fidelity tracking and disturbance rejection on a quadrupedal robot. This is the first work that uses the Koopman operator theory for LMPC of quadrupedal locomotion.
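A minimal sketch of linear MPC over a lifted state, assuming cvxpy is available and using a hand-specified toy lifted model rather than one identified with the Koopman operator as in the paper (all matrices, weights, and limits are invented):

import numpy as np
import cvxpy as cp

nz, nu, T = 4, 2, 10                                    # lifted-state dim, input dim, horizon
A = np.eye(nz) + 0.05 * np.diag(np.ones(nz - 1), k=1)   # toy lifted dynamics
B = 0.05 * np.vstack([np.zeros((nz - nu, nu)), np.eye(nu)])
z0 = np.array([1.0, 0.0, 0.0, 0.0])

z = cp.Variable((nz, T + 1))
u = cp.Variable((nu, T))
cost, constraints = 0, [z[:, 0] == z0]
for t in range(T):
    cost += cp.sum_squares(z[:, t + 1]) + 0.1 * cp.sum_squares(u[:, t])   # drive lifted state to zero
    constraints += [z[:, t + 1] == A @ z[:, t] + B @ u[:, t],
                    cp.norm(u[:, t], "inf") <= 5.0]
cp.Problem(cp.Minimize(cost), constraints).solve()
print("first control to apply:", u.value[:, 0])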
|
|
10:20-10:25, Paper ThBT5.6 | |
Kinodynamic Model Predictive Control for Energy Efficient Locomotion of Legged Robots with Parallel Elasticity |
|
Zhuang, Yulun | University of Michigan |
Wang, Yichen | University of Michigan |
Ding, Yanran | University of Michigan |
Keywords: Legged Robots, Optimization and Optimal Control, Compliant Joints and Mechanisms
Abstract: In this paper, we introduce a kinodynamic model predictive control (MPC) framework that exploits unidirectional parallel springs (UPS) to improve the energy efficiency of dynamic legged robots. The proposed method employs a hierarchical control structure, where the solution of MPC with simplified dynamic models is used to warm-start the kinodynamic MPC, which accounts for nonlinear centroidal dynamics and kinematic constraints. The proposed approach enables energy efficient dynamic hopping on legged robots by using UPS to reduce peak motor torques and energy consumption during stance phases. Simulation results demonstrated a 38.8% reduction in the cost of transport (CoT) for a monoped robot equipped with UPS during high-speed hopping. Additionally, preliminary hardware experiments show a 14.8% reduction in energy consumption.
|
|
10:25-10:30, Paper ThBT5.7 | |
Dynamic Bipedal MPC with Foot-Level Obstacle Avoidance and Adjustable Step Timing |
|
Wang, Tianze | Florida State University |
Hubicki, Christian | Florida State University |
Keywords: Legged Robots, Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: Collision-free planning is essential for bipedal robots operating within unstructured environments. This paper presents a real-time Model Predictive Control (MPC) framework that addresses both body and foot avoidance for dynamic bipedal robots. Our contribution is two-fold: we introduce (1) a novel formulation for adjusting step timing to facilitate faster body avoidance and (2) a novel 3D foot-avoidance formulation that implicitly selects swing trajectories and footholds that either step over or navigate around obstacles with awareness of Center of Mass (COM) dynamics. We achieve body avoidance by applying a half-space relaxation of the safe region, but introduce a switching heuristic based on tracking error to detect the need to change foot-timing schedules. To enable foot avoidance and viable landing footholds on all sides of foot-level obstacles, we decompose the non-convex safe region on the ground into several convex polygons and use Mixed-Integer Quadratic Programming to determine the optimal candidate. We found that introducing a soft minimum-travel-distance constraint is effective in preventing the MPC from being trapped in local minima that can stall half-space relaxation methods behind obstacles. We demonstrate the proposed algorithms in multibody simulations of the bipedal robot platforms Cassie and Digit, as well as in hardware experiments on Digit.
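The convex-decomposition step can be encoded with a standard big-M construction, sketched here generically (the paper's exact constraints may differ). For safe polygons \{p : A_k p \le b_k\}, a foothold p and binary selectors z_k satisfy

A_k\, p \;\le\; b_k + M\,(1 - z_k)\,\mathbf{1} \quad \forall k, \qquad \sum_k z_k = 1, \qquad z_k \in \{0, 1\},

for a sufficiently large constant M, so exactly one polygon's constraints are enforced while the rest of the MPC remains quadratic, yielding a mixed-integer quadratic program.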
|
|
ThBT6 |
307 |
Perception for Manipulation 1 |
Regular Session |
Chair: Calli, Berk | Worcester Polytechnic Institute |
Co-Chair: Liu, Tengyu | Beijing Institute for General Artificial Intelligence |
|
09:55-10:00, Paper ThBT6.1 | |
Enhancing Robotic Perception with Low-Cost Fast Active Vision Achieving Sub-Millimeter Accurate Marker-Based Pose Estimation |
|
Knobbe, Dennis | Technical University of Munich |
Standke, Johann Jakob Wilhelm | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Visual Servoing, Computer Vision for Automation, Performance Evaluation and Benchmarking
Abstract: Robust perception of the environment is a critical challenge for robots, especially those that use mobile platforms or humanoid forms to perform manipulation tasks. Active vision, leveraging strategic camera movements and adaptive imaging parameters, holds great potential for addressing critical challenges such as achieving high accuracy in precise manipulation, ensuring low latency for rapid responsiveness, and overcoming occlusions and illumination variations in dynamic environments. This paper introduces a novel, cost-effective, and easily deployable active vision system designed to enhance visual perception for robotic applications. Integrated with a novel hybrid software setup, the system utilizes ArUco markers to achieve high-accuracy, low-latency performance, boasting sub-millimeter and sub-degree accuracy at 200 Hz with a latency of less than 15 ms. Additionally, a new measurement and evaluation procedure is presented, offering benchmarking for marker-based object detection systems that for the first time includes rotation measurements as well. The benchmarking results for the proposed system indicate that achieving the desired performance levels necessitates specialized active vision measurement strategies. For instance, to ensure high positional accuracy, the system needs precise object centering, while high rotational accuracy requires accounting for lateral or rotational offsets.
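A minimal sketch of the marker-based pose estimation pipeline this class of system relies on, assuming OpenCV's ArUco module (API names vary slightly across OpenCV versions) and placeholder intrinsics and marker size:

import cv2
import numpy as np

K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])  # placeholder intrinsics
dist = np.zeros(5)
marker_len = 0.05                                    # marker side length in meters (placeholder)
obj_pts = 0.5 * marker_len * np.array(
    [[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]], dtype=np.float32)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())  # OpenCV >= 4.7 API

def estimate_marker_poses(frame):
    # Detect markers and solve a planar PnP for each detected square.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    poses = {}
    if ids is not None:
        for marker_id, c in zip(ids.flatten(), corners):
            ok, rvec, tvec = cv2.solvePnP(obj_pts, c.reshape(4, 2).astype(np.float32),
                                          K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
            if ok:
                poses[int(marker_id)] = (rvec, tvec)
    return poses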
|
|
10:00-10:05, Paper ThBT6.2 | |
PhysPart: Physically Plausible Part Completion for Interactable Objects |
|
Luo, Rundong | Cornell University |
Geng, Haoran | University of California, Berkeley |
Deng, Congyue | Stanford |
Li, Puhao | Tsinghua University |
Wang, Zan | Beijing Institute of Technology |
Jia, Baoxiong | Beijing Institute for General Artificial Intelligence |
Guibas, Leonidas | Stanford University |
Huang, Siyuan | Beijing Institute for General Artificial Intelligence |
Keywords: Perception for Grasping and Manipulation, Manipulation Planning
Abstract: Interactable objects are ubiquitous in our daily lives. Recent advances in 3D generative models make it possible to automate the modeling of these objects, benefiting a range of applications from 3D printing to the creation of robot simulation environments. However, while significant progress has been made in modeling 3D shapes and appearances, modeling object physics, particularly for interactable objects, remains challenging due to the physical constraints imposed by inter-part motions. In this paper, we tackle the problem of physically plausible part completion for interactable objects, aiming to generate 3D parts that not only fit precisely into the object but also allow smooth part motions. To this end, we propose a diffusion-based part generation model that utilizes geometric conditioning through classifier-free guidance and formulates physical constraints as a set of stability and mobility losses to guide the sampling process. Additionally, we demonstrate the generation of dependent parts, paving the way toward sequential part generation for objects with complex part-whole hierarchies. Experimentally, we introduce a new metric for measuring physical plausibility based on motion success rates. Our model outperforms existing baselines over shape and physical metrics, especially those that do not adequately model physical constraints. We also demonstrate our applications in 3D printing, robot manipulation, and sequential part generation, showing our strength in realistic tasks with the demand for high physical plausibility.
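A minimal sketch of the classifier-free-guidance update used to condition diffusion sampling on geometry; the denoiser below is a stand-in, and the paper's stability and mobility guidance losses are not reproduced:

import numpy as np

def cfg_noise_estimate(denoiser, x_t, t, cond, guidance_scale=4.0):
    # Blend unconditional and geometry-conditioned noise predictions.
    eps_uncond = denoiser(x_t, t, cond=None)
    eps_cond = denoiser(x_t, t, cond=cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Tiny stand-in denoiser so the snippet runs end to end (not a real network).
toy_denoiser = lambda x, t, cond: 0.1 * x if cond is None else 0.2 * x
x_t = np.random.default_rng(3).normal(size=(1, 2048, 3))   # noisy point cloud
print(cfg_noise_estimate(toy_denoiser, x_t, t=10, cond="part_bbox").shape)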
|
|
10:05-10:10, Paper ThBT6.3 | |
Generalizable Zero-Shot Object Pose Estimation for Bin-Picking |
|
Zhang, Zijiang | Kyushu Institute of Technology |
Huimin, Lu | Southeast University |
Jintong, Cai | Southeast University |
Kamiya, Tohru | Kyushu Institute of Technology |
Serikawa, Seiichi | Kyushu Institute of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Unordered grasping in industrial robotic manipulation requires precise six-degree-of-freedom (6D) pose estimation. However, existing methods often struggle with unknown objects and require retraining, limiting their practicality. Traditional 3D point-pair feature methods, while training-free, perform poorly with textured symmetric objects. We propose a generalizable approach for zero-shot 6D pose estimation without retraining. Our method consists of two steps: generating CAD-based templates through real-time rendering for coarse pose estimation, and refining poses using semantic point-pair features aligned with the camera viewpoint. We conducted experiments on seven core datasets from the Benchmark for 6D Object Pose Estimation (BOP) challenge, and the results are publicly available on the BOP website. Integration into a robotic grasping system further highlights its high precision and fast execution, making it ideal for applications such as bin-picking.
|
|
10:10-10:15, Paper ThBT6.4 | |
Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing |
|
Mack, Lukas | University of Augsburg |
Grüninger, Felix | Max Planck Institute for Intelligent Systems |
Richardson, Benjamin A. | Max Planck Institute for Intelligent Systems |
Lendway, Regine | University of Tuebingen |
Kuchenbecker, Katherine J. | Max Planck Institute for Intelligent Systems |
Stueckler, Joerg | University of Augsburg |
Keywords: Perception for Grasping and Manipulation, Sensor Fusion, Force and Tactile Sensing
Abstract: Accurate 3D pose estimation of grasped objects is an important prerequisite for robots to perform assembly or in-hand manipulation tasks, but object occlusion by the robot's own hand greatly increases the difficulty of this perceptual task. Here, we propose that combining visual information and proprioception with binary, low-resolution tactile contact measurements from across the interior surface of an articulated robotic hand can mitigate this issue. The visuo-tactile object-pose-estimation problem is formulated probabilistically in a factor graph. The pose of the object is optimized to align with the three kinds of measurements using a robust cost function to reduce the influence of visual or tactile outlier readings. The advantages of the proposed approach are first demonstrated in simulation: a custom 15-DoF robot hand with one binary tactile sensor per link grasps 17 YCB objects while observed by an RGB-D camera. This low-resolution in-hand tactile sensing significantly improves object-pose estimates under high occlusion and also high visual noise. We also show these benefits through grasping tests with a preliminary real version of our tactile hand, obtaining reasonable visuo-tactile estimates of object pose at approximately 13.3 Hz on average.
|
|
10:15-10:20, Paper ThBT6.5 | |
Proactive Tactile Exploration for Object-Agnostic Shape Reconstruction from Minimal Visual Priors |
|
Oikonomou, Paris | National Technical University of Athens (NTUA) |
Retsinas, George | National Technical University of Athens |
Maragos, Petros | National Technical University of Athens |
Tzafestas, Costas S. | ICCS - Inst of Communication and Computer Systems |
Keywords: Perception for Grasping and Manipulation
Abstract: The perception of an object’s surface is important for robotic applications enabling robust object manipulation. The level of accuracy in such a representation affects the outcome of the action planning, especially during tasks that require physical contact, e.g. grasping. In this paper, we propose a novel iterative method for 3D shape reconstruction consisting of two steps. At first, a mesh is fitted on data points acquired from the object’s surface, based on a single primitive template. Subsequently, the mesh is properly adjusted to adequately represent local deformities. Moreover, a novel proactive tactile exploration strategy aims at minimizing the total uncertainty with the least number of contacts, while reducing the risk of contact failure in case the estimated surface differs significantly from the real one. The performance of the methodology is evaluated both in 3D simulation and on a real setup.
|
|
10:20-10:25, Paper ThBT6.6 | |
Multi-Layer Feature Exchange Transformer for Multi-View 6D Object Pose Estimation in Robot Bin Picking |
|
Khalil, Momen | Technical University of Munich |
Dietrich, Vincent | Siemens Corporate Technology |
Ilic, Slobodan | Technische Universitat Munchen |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Accurate 6D object pose estimation is crucial in industrial automation, particularly in robotic bin picking, where objects are often textureless, reflective, and arranged in cluttered environments. Multi-view pose estimation methods offer significant advantages over single-view methods by providing more comprehensive information, effectively handling occlusions and lack of features, and resolving depth ambiguities. However, current multi-view methods often rely on late-stage information fusion, limiting their ability to fully exploit complementary multi-view data. This paper presents a novel approach to enhance multi-view 6D pose estimation by introducing a Feature Exchange Transformer (FET) for early-stage feature fusion. This approach leverages self-attention and epipolar cross-attention mechanisms to enable multi-layer feature aggregation across views. Additionally, we introduce a coarse-to-fine strategy for an efficient feature exchange at multiple network layers. Our method, implemented on top of EpiSurfEmb, enhances the utilization of multi-view information, leading to significant improvements in pose estimation accuracy and robustness, especially in challenging bin-picking scenarios. We evaluate our approach on the ROBI dataset, demonstrating that it outperforms both the baseline EpiSurfEmb and other state-of-the-art multi-view pose estimation methods
|
|
ThBT7 |
309 |
Assistive Human-Robot Interaction |
Regular Session |
Chair: Yanco, Holly | UMass Lowell |
Co-Chair: Haring, Kerstin Sophie | University of Denver |
|
09:55-10:00, Paper ThBT7.1 | |
DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding |
|
Liu, Shuijing | The University of Texas at Austin |
Hasan, Aamir | University of Illinois Urbana-Champaign |
Hong, Kaiwen | University of Illinois at Urbana Champaign |
Wang, Runxuan | University of Illinois at Urbana Champaign |
Chang, Peixin | University of Illinois at Urbana Champaign |
Mizrachi, Zachary | University of Illinois at Urbana Champaign |
Lin, Justin | University of Illinois at Urbana-Champaign |
McPherson, D. Livingston | University of Illinois |
Rogers, Wendy | University of Illinois Urbana-Champaign |
Driggs-Campbell, Katherine | University of Illinois at Urbana-Champaign |
Keywords: Human-Centered Robotics, Natural Dialog for HRI, AI-Enabled Robotics
Abstract: Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner. Videos are available at https://sites.google.com/view/dragon-wayfinding/home.
|
|
10:00-10:05, Paper ThBT7.2 | |
Space-Aware Instruction Tuning: Dataset and Benchmark for Guide Dog Robots Assisting the Visually Impaired |
|
Han, ByungOk | ETRI |
Yun, Woo-han | Electronics and Telecommunications Research Institute (ETRI) |
Seo, BeomSu | ETRI |
Kim, Jaehong | ETRI |
Keywords: Multi-Modal Perception for HRI, Data Sets for Robot Learning, Natural Dialog for HRI
Abstract: Guide dog robots offer promising solutions to enhance mobility and safety for visually impaired individuals, addressing the limitations of traditional guide dogs, particularly in perceptual intelligence and communication. With the emergence of Vision-Language Models (VLMs), robots are now capable of generating natural language descriptions of their surroundings, aiding in safer decision-making. However, existing VLMs often struggle to accurately interpret and convey spatial relationships, which is crucial for navigation in complex environments such as street crossings. We introduce the Space-Aware Instruction Tuning (SAIT) dataset and the Space-Aware Benchmark (SA-Bench) to address the limitations of current VLMs in understanding physical environments. Our automated data generation pipeline focuses on the virtual path to the destination in 3D space and the surroundings, enhancing environmental comprehension and enabling VLMs to provide more accurate guidance to visually impaired individuals. We also propose an evaluation protocol to assess VLM effectiveness in delivering walking guidance. Comparative experiments demonstrate that our space-aware instruction-tuned model outperforms state-of-the-art algorithms. We have fully open-sourced the SAIT dataset and SA-Bench, along with the related code, at https://github.com/byungokhan/Space-awareVLM.
|
|
10:05-10:10, Paper ThBT7.3 | |
FitnessAgent: A Unified Agent Framework for Open-Set and Personalized Fitness Evaluation |
|
Tang, Zhenhui | Shanghai Jiaotong University |
jiahao Li, Ljh | Dalian University of Technology |
Guo, Ping | Intel |
Tian, Bowen | University of Electronic Science and Technology of China |
Xing, Qingjun | Beijing Sport University |
Xing, XuYang | Nanjing University of Science and Technology |
Wang, Peng | Intel |
Keywords: Multi-Modal Perception for HRI, Computer Vision for Automation, Data Sets for Robotic Vision
Abstract: Robotic systems face challenges in performing open-set and personalized fitness evaluations, especially when adapting to new exercises and individual user needs. This paper introduces FitnessAgent, a unified agent framework designed to address these challenges. Unlike traditional systems that rely on pre-trained neural networks or fixed rule-based criteria, FitnessAgent can assess any exercise without prior training, adapting evaluation metrics based on expert knowledge and user-specific requirements. The system breaks down fitness evaluation tasks into combinations of metrics, each calculated using measurable operators such as angles, distances, and positions. By leveraging a set of primitive, exercise-agnostic operators, a large language model (LLM)-based planner dynamically selects and combines these operators for each task. The open-set capability of FitnessAgent is validated through experiments on both the widely-used Functional Movement Screen dataset and a newly collected isometric pose dataset. Results highlight the system's flexibility in handling new movements and its ability to adapt to personalized evaluation criteria without the need for code or algorithm modifications. FitnessAgent offers a scalable and personalized solution for fitness evaluation, making it well-suited for robotic applications that require adaptability to diverse user needs.
|
|
10:10-10:15, Paper ThBT7.4 | |
A Reinforcement Learning-Based Social Robot for Personalized Learning in Children with Autism |
|
Askari, Farzaneh | McGill University |
Abdollahi, Hojjat | University of Denver |
Haring, Kerstin Sophie | University of Denver |
Mahoor, Mohammad | University of Denver |
Keywords: Human-Robot Collaboration, Reinforcement Learning, Robot Companions
Abstract: This work hypothesizes that a social robot that uses reinforcement learning can effectively adapt to individual differences in teaching imitation skills (e.g., facial expressions) to children with autism spectrum disorder. We developed an active learning method based on reinforcement learning to personalize human-robot interaction sessions based on each child's imitation performance and preference. We evaluated this method with five children with autism spectrum disorder, and the results demonstrated varying responses to different methods of presenting facial expressions to teach imitation skills. We found that the robot consistently promoted increased shared attention, including visual contact and physical proximity during imitation tasks. This suggests that adaptive human-robot interactions can cater to the unique needs of children with autism, offering a promising avenue for personalized intervention. Additionally, we discuss observed qualitative insights from our study and considerations for robot behavior mitigation strategies to sustain engagement.
|
|
10:15-10:20, Paper ThBT7.5 | |
Comparison of User Interface Paradigms for Assistive Robotic Manipulators |
|
Sinclaire, Amelia | University of Massachusetts Lowell |
Wilkinson, Alexander | University of Massachusetts Lowell |
Kim, Boyoung | George Mason University Korea |
Yanco, Holly | UMass Lowell |
Keywords: Design and Human Factors, Virtual Reality and Interfaces
Abstract: This paper presents the results of a within-subjects user study with 27 participants over the age of 60, comparing the use of two different user interfaces for an assistive robot scooter. The graphical user interface (GUI) shows a representation of the environment on a 10-inch touchscreen. The tangible user interface (TUI) consists of a joystick, a box of buttons, and a projector -- designed to keep the user's attention in the real world. Trends suggest that the TUI could help mitigate difficulty caused by highly cluttered environments, as well as differences in individual spatial reasoning ability, but additional studies are needed.
|
|
10:20-10:25, Paper ThBT7.6 | |
VQA-Driven Event Maps for Assistive Navigation for People with Low Vision in Urban Environments |
|
Morales, Joseph | Massachusetts Institute of Technology |
Gebregziabher, Bruk | Biel Glasses |
Cabañeros, Alex | Biel Glasses |
Sanchez-Riera, Jordi | IRI, CSIC-UPC |
Keywords: Multi-Modal Perception for HRI, Semantic Scene Understanding, Human Performance Augmentation
Abstract: We introduce a novel framework for assistive urban navigation for individuals with low vision. Utilizing a smart glasses platform developed by Biel Glasses, which provides a continuous stream of stereo images and GPS fixes, we generate an Event Map based on key semantic elements extracted by carefully prompted visual question-answering (VQA) models. For individuals with blurry or reduced fields of vision (low vision), traversing city streets poses a variety of challenges; they may struggle to perceive construction work, potholes, crowded sidewalks, and other ambiguous obstacles obstructing their paths. Some tasks, such as distinguishing traffic light signals, are nigh impossible without assistance from a companion or city infrastructure aimed towards accessibility. Although the majority of these problems may be solved with individually tailored traditional computer vision algorithms, developing and running a suite of these algorithms is challenging and resource demanding. Therefore, our proposed solution capitalizes on a single underlying implementation that need only be extended by adding queries. We validate our approach using a custom dataset of over 1,300 annotated images from various locations around Barcelona, reporting performance across different urban navigation tasks. We demonstrate the performance of the end-to-end system on a run of data collected by the Biel Glasses platform.
|
|
ThBT8 |
311 |
Aerial Robots 4 |
Regular Session |
Chair: Aloimonos, Yiannis | University of Maryland |
Co-Chair: Foong, Shaohui | Singapore University of Technology and Design |
|
09:55-10:00, Paper ThBT8.1 | |
A Robust High-Strength Multi-Surface Rapid UV-Curable Payload Installation System for Generic Multirotors Via Impact Delivery |
|
Lim, Ryan Jon Hui | Singapore University of Technology & Design |
Tan, Jeck Chuang | Singapore University of Technology and Design |
Ng, Matthew | Singapore University of Technology and Design |
Low, Hong Yee | Singapore University of Technology and Design |
Foong, Shaohui | Singapore University of Technology and Design |
Keywords: Aerial Systems: Applications, Field Robots
Abstract: This letter details the design and development of a novel 3D-printed, lightweight and rapid-curing automated payload installation system for aerial robots, using a 3D printed resin-filled adhesive carrier tile (ACT). Its structure is designed to fracture and disperse ultraviolet (UV) curable resin on impact, delivered with a lightweight spring-driven impactor that rams the tile against a target surface. The dispersed resin is then cured with UV light. Shear-testing experiments with 40×40 mm ACTs across common building materials, surface conditions and roughness demonstrate loading exceeding 900 N after only 10 seconds of curing, showcasing the strength, robustness and speed of the proposed system. Automated payload installation experiments show potential for applications requiring strong and permanent bonds to wall structures, such as sensor payloads or tether points within urban environments. To the authors’ knowledge, this is the first work employing wet UV adhesives for payload installation via multirotors.
|
|
10:00-10:05, Paper ThBT8.2 | |
Multi-View Stereo with Geometric Encoding for Dense Scene Reconstruction |
|
Yang, Guidong | The Chinese University of Hong Kong |
Cao, Rui | The Chinese University of Hong Kong |
Wen, Junjie | The Chinese University of Hong Kong |
Zhao, Benyun | The Chinese University of Hong Kong |
Li, Qingxiang | The Chinese University of Hong Kong |
Huang, Yijun | The Chinese University of Hong Kong |
Lei, Lei | City University of Hong Kong |
Chen, Xi | The Chinese University of Hong Kong |
Lam, Alan Hiu-Fung | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Applications
Abstract: Multi-view stereo (MVS) implicitly encodes photometric and geometric cues into the cost volume for multi-view correspondence matching, but transfers insufficient geometric cues, which are essential to depth estimation and reconstruction. This paper proposes GE-MVS, a novel multi-view stereo network with geometric encoding for more accurate and complete depth estimation and point cloud reconstruction. First, the cross-view adaptive cost volume aggregation module is proposed to strengthen multi-view geometric cues encoding during cost volume construction. Then, the depth consistency optimization is performed in the 3D point space during learning by invoking ground-truth depth cues from adjacent views. Finally, the surface normal geometries are explicitly encoded to refine the sampled depth hypotheses to be consistent in the local neighbor regions. Extensive experiments on the standard MVS benchmarks including DTU, Tanks and Temples, and BlendedMVS demonstrate the state-of-the-art depth estimation and point cloud reconstruction performance of GE-MVS. The GE-MVS is further deployed in real-world experiments for UAV-based large-scale reconstruction, where our method outperforms the prevalent industrial reconstruction solutions in terms of reconstruction efficiency and efficacy. Our project page is: https://cuhk-usr-group.github.io/GE-MVS/
|
|
10:05-10:10, Paper ThBT8.3 | |
MicroASV: An Affordable 3D-Printed Centimeter-Scale Autonomous Surface Vehicle |
|
Macauley, Kevin | University of Wisconsin-Madison |
Chen, Zhiheng | Cornell University |
Wang, Wei | University of Wisconsin-Madison |
Keywords: Marine Robotics, Swarm Robotics, Field Robots
Abstract: This paper introduces the design, fabrication, and autonomous control of MicroASV, a low-cost, centimeter-scale autonomous surface vehicle (ASV). MicroASV has a square footprint with a side length of 85 mm. Its propulsion system consists of four custom water jets arranged in a “Diamond”-shaped actuator configuration, powered by magnetically coupled brushless motors. This setup allows for complete 2D mobility, enabling forward and backward motion, lateral translation, and in-place rotation. The MicroASV is built using commercially available motors and 3D-printed components, creating a modular, appendage-free structure that is simple to assemble. An onboard camera and inertial measurement unit (IMU) are integrated to enable real-time localization, with position and heading controllers developed to provide autonomous feedback control. Preliminary experiments validate the platform’s effectiveness in motion, sensing, and control, establishing MicroASV as a valuable tool for studying centimeter-scale ASV control, both individually and in collective swarm operations.
|
|
10:10-10:15, Paper ThBT8.4 | |
Airflow Source Seeking on Small Quadrotors Using a Single Flow Sensor |
|
Thomas, Lenworth | Carnegie Mellon University |
Bridges, Tjaden | Carnegie Mellon University |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Aerial Systems: Applications, Environment Monitoring and Management, Reactive and Sensor-Based Planning
Abstract: As environmental disasters become more frequent and severe, seeking the source of pollutants or harmful particulates using plume tracking becomes even more important. Plume tracking on small quadrotors would allow these systems to operate around humans and fly in more confined spaces, but can be challenging due to poor sensitivity and long response times from gas sensors that fit on small drones. In this work, we present an approach to complement chemical plume tracking with airflow source-seeking behavior using a custom flow sensor that can sense both airflow magnitude and direction on small quadrotors (<100 g). We use this sensor to implement a modified version of the 'Cast and Surge' algorithm that takes advantage of flow direction sensing to find and navigate towards flow sources. A series of characterization experiments verified that the system can detect airflow while in flight and reorient the quadrotor toward the airflow. Several trials with random starting locations and orientations were used to show that our source-seeking algorithm can reliably find a flow source. This work aims to provide a foundation for future platforms that can use flow sensors in concert with other sensors to enable richer plume tracking data collection and source-seeking.
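For illustration, a minimal Python sketch of a cast-and-surge style heading policy in the spirit of the approach above; the flow-sensor interface, magnitude threshold, and sweep timing are assumptions and do not reproduce the authors' implementation.

```python
import numpy as np

def cast_and_surge_heading(flow_mag, flow_dir, t, cast_period=2.0, mag_threshold=0.5):
    """Heading command for a simplified cast-and-surge search policy.

    flow_mag  : measured airflow magnitude (units and threshold are assumed)
    flow_dir  : direction the flow comes from, in the world frame [rad]
    t         : elapsed time since the search started [s]
    Returns (heading_command [rad], mode string).
    """
    if flow_mag > mag_threshold:
        # Flow detected: surge upwind, i.e. fly into the measured flow direction.
        return flow_dir, "surge"
    # No usable flow: cast crosswind, alternating sweeps every cast_period seconds.
    sweep = np.pi / 2 if int(t // cast_period) % 2 == 0 else -np.pi / 2
    return sweep, "cast"

print(cast_and_surge_heading(0.1, 0.0, t=1.0))        # casting sweep while no flow
print(cast_and_surge_heading(1.2, np.pi / 2, t=3.0))  # surge toward the flow source
```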
|
|
10:15-10:20, Paper ThBT8.5 | |
Air-FAR: Fast and Adaptable Routing for Aerial Navigation in Large-Scale Complex Unknown Environments |
|
He, Botao | University of Maryland |
Chen, Guofei | Carnegie Mellon University |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Zhang, Ji | Carnegie Mellon University |
Keywords: Field Robots, Task and Motion Planning, Aerial Systems: Perception and Autonomy
Abstract: This paper presents a novel method for real-time 3D navigation in large-scale, complex environments using a hierarchical 3D visibility graph (V-graph). The proposed algorithm addresses the computational challenges of V-graph construction and shortest path search on the graph simultaneously. By introducing hierarchical 3D V-graph construction with heuristic visibility update, the 3D V-graph is constructed in O(K·n² log n) time, which guarantees real-time performance. The proposed iterative divide-and-conquer path search method can achieve near-optimal path solutions within the constraints of real-time operations. The algorithm ensures efficient 3D V-graph construction and path search. Extensive tests in simulated and real-world environments validated that our algorithm reduces the travel time by 42%, achieves up to 24.8% higher trajectory efficiency, and runs faster than most benchmarks by orders of magnitude in complex environments. The code and developed simulator have been open-sourced to facilitate future research.
|
|
10:20-10:25, Paper ThBT8.6 | |
Multi-Agent Visual-Inertial Localization for Integrated Aerial Systems with Loose Fusion of Odometry and Kinematics |
|
Lai, Ganghua | Beijing Institute of Technology |
Shi, Chuanbeibei | University of Bristol |
Wang, Kaidi | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Dong, Yiqun | Nanyang Technological University |
Franchi, Antonio | University of Twente / Sapienza University of Rome |
Keywords: Aerial Systems: Applications, Localization, Multi-Robot SLAM
Abstract: Reliably and efficiently estimating the relative pose and global localization of robots in a common reference for Integrated Aerial Platforms (IAPs) is a challenging problem. Unlike unmanned aerial vehicle (UAV) swarms, where each agent is able to move freely, IAPs connect UAV agents with mechanical joints, such as spherical joints, and form a rigid central platform, limiting the degree of freedom (DOF) of agents. Traditional methods, which rely on forming loop closures, object detection, or range sensors, suffer from degeneration or inefficiency due to the restricted relative motion between agents. In this paper, we present a centralized multi-agent localization system that fuses the internal kinematic constraints of IAPs and odometry measurements, using only visual-inertial sensor suites for agent ego-motion estimation and an additional 9-DOF Inertial Measurement Unit (IMU) attached to the central platform for posture estimation. A general formulation for kinematic constraints is derived without requiring knowledge about detailed kinematic parameters. A sliding-window optimization-based state estimator is constructed to estimate the relative transformation between agents. Our proposed approach is validated on our collected dataset. The results show that the proposed method reduces the global localization drift by 27.15% and relative localization error by 53.4% in the translation part and 36.99% in the rotation part compared to the baseline.
|
|
10:25-10:30, Paper ThBT8.7 | |
Multi Map Visual Localization for Unmanned Aerial Vehicles |
|
Lømo, Tobias | University of Oslo |
Maffei, Renan | Federal University of Rio Grande Do Sul |
Kolberg, Mariana | UFRGS |
Torresen, Jim | University of Oslo |
Keywords: Aerial Systems: Perception and Autonomy, Localization, Vision-Based Navigation
Abstract: Localization has long been an essential area of research within robotics. The popularity of using Unmanned Aerial Vehicles (UAVs) to solve different tasks has increased and is expected to continue. Developing a robust system to complement the Global Navigation Satellite Systems (GNSS) used today has been an active research topic, and visual localization using cameras and satellite images is a popular choice. One of the challenges with using satellite images is that different images over the same area can impact the system’s performance. This article proposes a novel approach called Multi Map Visual Localization (MMVL), a method that uses multiple satellite images simultaneously, combining them using a weighted average of probability maps. The proposal uses a convolutional neural network (CNN) with a caching strategy together with Monte Carlo Localization (MCL). MMVL achieves excellent robustness compared to other approaches and manages to estimate the correct location on all test flights. At the same time, using multiple satellite images does not significantly impact accuracy and computation time.
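As an illustration of the combination step described in the abstract, a minimal sketch of fusing per-map localization likelihoods by a weighted average; the weights, map sizes, and normalization are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def fuse_probability_maps(prob_maps, weights):
    """Combine per-satellite-image localization probability maps.

    prob_maps : list of HxW arrays, one likelihood grid per satellite image
    weights   : per-map confidence weights (assumed given, e.g. from image quality)
    Returns a single HxW fused probability map normalized to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.zeros_like(prob_maps[0], dtype=float)
    for w, p in zip(weights, prob_maps):
        fused += w * p
    return fused / fused.sum()

# Example with two 3x3 toy maps that disagree on the peak location.
m1 = np.array([[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]])
m2 = np.array([[0.9, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]])
print(fuse_probability_maps([m1, m2], weights=[0.7, 0.3]).round(3))
```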
|
|
ThBT9 |
312 |
Task and Motion Planning 1 |
Regular Session |
Chair: Morales, Marco | University of Illinois Urbana-Champaign & Instituto Tecnológico Autónomo De México |
Co-Chair: Beetz, Michael | University of Bremen |
|
09:55-10:00, Paper ThBT9.1 | |
Task and Motion Planning for Execution in the Real |
|
Pan, Tianyang | Rice University |
Shome, Rahul | The Australian National University |
Kavraki, Lydia | Rice University |
Keywords: Task and Motion Planning, Motion and Path Planning, Manipulation Planning, Task Planning
Abstract: Task and motion planning represents a powerful set of hybrid planning methods that combine reasoning over discrete task domains and continuous motion generation. Traditional reasoning necessitates task domain models and enough information to ground actions to motion planning queries. Gaps in this knowledge often arise from sources like occlusion or imprecise modeling. This work generates task and motion plans that include actions that cannot be fully grounded at planning time. During execution, such an action is handled by a provided human-designed or learned closed-loop behavior. Execution combines offline planned motions and online behaviors until the task goal is reached. Failures of behaviors are fed back as constraints to find new plans. Forty real-robot trials and motivating demonstrations are performed to evaluate the proposed framework and compare against state-of-the-art. Results show faster execution time, fewer actions, and higher success in problems where diverse gaps arise. The experiment data is shared for researchers to simulate these settings. The work shows promise in expanding the applicable class of realistic partially grounded problems that robots can address.
|
|
10:00-10:05, Paper ThBT9.2 | |
Automated Planning Domain Inference for Task and Motion Planning |
|
Huang, Jinbang | University of Toronto |
Tao, Allen | University of Toronto |
Marco, Rozilyn | University of Toronto |
Bogdanovic, Miroslav | University of Toronto |
Kelly, Jonathan | University of Toronto |
Shkurti, Florian | University of Toronto |
Keywords: Task and Motion Planning, Integrated Planning and Learning
Abstract: Task and motion planning (TAMP) frameworks address long and complex planning problems by integrating high-level task planners with low-level motion planners. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and postconditions of all high-level actions. This paper proposes a method to automate planning domain inference from a handful of test-time trajectory demonstrations, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate components of a domain for a new task and a search algorithm that refines this prediction, reducing the size and ensuring the utility of the inferred domain. Our method can generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach outperforms behavior cloning baselines, which directly imitate planner behavior, in terms of planning performance and generalization across a variety of tasks. Additionally, our method reduces computational costs and data amount requirements at test time for inferring new planning domains.
|
|
10:05-10:10, Paper ThBT9.3 | |
Shadow Program Inversion with Differentiable Planning: A Framework for Unified Robot Program Parameter and Trajectory Optimization |
|
Alt, Benjamin | ArtiMinds Robotics |
Kienle, Claudius | ArtiMinds Robotics GmbH |
Katic, Darko | HFT STUTTGART |
Jäkel, Rainer | Karlsruhe Institute of Technology |
Beetz, Michael | University of Bremen |
Keywords: Motion and Path Planning, Task and Motion Planning, Integrated Planning and Learning
Abstract: This paper presents SPI-DP, a novel first-order optimizer capable of optimizing robot programs with respect to both high-level task objectives and motion-level constraints. To that end, we introduce DGPMP2-ND, a differentiable collision-free motion planner for serial N-DoF kinematics, and integrate it into an iterative, gradient-based optimization approach for generic, parameterized robot program representations. SPI-DP allows first-order optimization of planned trajectories and program parameters with respect to objectives such as cycle time or smoothness subject to e.g. collision constraints, while enabling humans to understand, modify or even certify the optimized programs. We provide a comprehensive evaluation on two practical household and industrial applications.
|
|
10:10-10:15, Paper ThBT9.4 | |
AlignBot: Aligning VLM-Powered Customized Task Planning with User Reminders through Fine-Tuning for Household Robots |
|
Zhaxizhuoma, Zhaxizhuoma | Shanghai Artificial Intelligence Laboratory |
Chen, Pengan | The University of Hong Kong |
Wu, Ziniu | University of Bristol |
Sun, Jiawei | Shanghai Artificial Intelligence Laboratory |
Wang, Dong | Shanghai Artificial Intelligence Laboratory |
Zhou, Peng | Great Bay University |
Cao, Nieqing | Binghamton University |
Ding, Yan | SUNY Binghamton |
Zhao, Bin | Northwestern Polytechnical University |
Li, Xuelong | Northwestern Polytechnical University |
Keywords: Task and Motion Planning, Human-Centered Robotics, Learning from Experience
Abstract: This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders, such as personalized preferences, corrective guidance, and contextual assistance, into structured prompts that guide GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving an 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65-percentage-point improvement and over four times greater effectiveness. Supplementary materials are available at: https://yding25.com/AlignBot/
|
|
10:15-10:20, Paper ThBT9.5 | |
Curiosity-Driven Imagination: Discovering Plan Operators and Learning Associated Policies for Open-World Adaptation |
|
Lorang, Pierrick | AIT Austrian Institute of Technology GmbH - Tufts University |
Lu, Hong | Tufts University |
Scheutz, Matthias | Tufts University |
Keywords: Integrated Planning and Learning, Task and Motion Planning, Learning from Experience
Abstract: Adapting quickly to dynamic, uncertain environments, often called "open worlds", remains a major challenge in robotics. Traditional Task and Motion Planning (TAMP) approaches struggle to cope with unforeseen changes, are data-inefficient when adapting, and do not leverage world models during learning. We address this issue with a hybrid planning and learning system that integrates two models: a low-level neural network-based model that learns stochastic transitions and drives exploration via an Intrinsic Curiosity Module (ICM), and a high-level symbolic planning model that captures abstract transitions using operators, enabling the agent to plan in an "imaginary" space and generate reward machines. Our evaluation in a robotic manipulation domain with sequential novelty injections demonstrates that our approach converges faster and outperforms state-of-the-art hybrid methods.
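To make the intrinsic-curiosity idea referenced above concrete, a minimal numpy sketch where the intrinsic reward is the prediction error of a learned forward model; the linear model, learning rate, and state/action dimensions are placeholder assumptions, not the authors' ICM architecture.

```python
import numpy as np

class TinyForwardModel:
    """Linear forward model s' ~ W @ [s, a]; its prediction error drives curiosity."""

    def __init__(self, state_dim, action_dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.lr = lr

    def intrinsic_reward(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = s_next - self.W @ x
        # Curiosity bonus: squared prediction error (novel transitions score high).
        reward = 0.5 * float(err @ err)
        # One SGD step on the forward model, so repeated transitions become boring.
        self.W += self.lr * np.outer(err, x)
        return reward

model = TinyForwardModel(state_dim=3, action_dim=2)
s, a, s_next = np.zeros(3), np.ones(2), np.array([0.5, -0.2, 0.1])
for _ in range(3):
    print(round(model.intrinsic_reward(s, a, s_next), 4))  # decreases as the model learns
```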
|
|
10:20-10:25, Paper ThBT9.6 | |
Optimization-Based Task and Motion Planning under Signal Temporal Logic Specifications Using Logic Network Flow |
|
Lin, Xuan | UCLA |
Ren, Jiming | Georgia Institute of Technology |
Coogan, Samuel | Georgia Tech |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Task and Motion Planning, Path Planning for Multiple Mobile Robots or Agents, Formal Methods in Robotics and Automation
Abstract: This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", to integrate signal temporal logic (STL) specifications into efficient mixed-binary linear programs. In this framework, temporal predicates are encoded as polyhedron constraints on each edge of the network flow, instead of as constraints between the nodes as in the traditional Logic Tree formulation. Synthesized with Dynamic Network Flows, Logic Network Flows render a tighter convex relaxation compared to Logic Trees derived from these STL specifications. Our formulation is evaluated on several multi-robot motion planning case studies. Empirical results demonstrate that our formulation outperforms the Logic Tree formulation in terms of computation time for several planning problems. As the problem size scales up, our method still discovers better lower and upper bounds by exploring fewer nodes during branch-and-bound search.
|
|
10:25-10:30, Paper ThBT9.7 | |
Integrating Active Sensing and Rearrangement Planning for Efficient Object Retrieval from Unknown, Confined, Cluttered Environments |
|
Kim, Junyoung | Purdue University |
Ren, Hanwen | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Task and Motion Planning, Reactive and Sensor-Based Planning, Task Planning
Abstract: Retrieving target objects from unknown, confined spaces remains a challenging task that requires integrated, task-driven active sensing and rearrangement planning. Previous approaches have independently addressed active sensing and rearrangement planning, limiting their practicality in real-world scenarios. This paper presents a new, integrated heuristic-based active sensing and Monte-Carlo Tree Search (MCTS)-based retrieval planning approach. These components provide feedback to one another to actively sense critical, unobserved areas suitable for the retrieval planner to plan a sequence for relocating path-blocking obstacles and a collision-free trajectory for retrieving the target object. We demonstrate the effectiveness of our approach using a robot arm equipped with an in-hand camera in both simulated and real-world confined, cluttered scenarios. Our framework is compared against various state-of-the-art methods. The results indicate that our proposed approach outperforms baseline methods by a significant margin in terms of the success rate, the object rearrangement planning time consumption and the number of planning trials before successfully retrieving the target.
|
|
ThBT10 |
313 |
Multi-Robot Systems 4 |
Regular Session |
Chair: Keren, Sarah | Technion - Israel Institute of Technology |
Co-Chair: Zhao, Lin | National University of Singapore |
|
09:55-10:00, Paper ThBT10.1 | |
A Cooperative Bearing-Rate Approach for Observability-Enhanced Target Motion Estimation |
|
Zheng, Canlun | Westlake University |
Guo, Hanqing | Westlake University |
Zhao, Shiyu | Westlake University |
Keywords: Sensor Networks, Localization
Abstract: Vision-based target motion estimation is a fundamental problem in many robotic tasks. The existing methods have the limitation of low observability and, hence, face challenges in tracking highly maneuverable targets. Motivated by the aerial target pursuit task where a target may maneuver in 3D space, this paper studies how to further enhance observability by incorporating the bearing rate information that has not been well explored in the literature. The main contribution of this paper is to propose a new cooperative estimator called STT-R (Spatial-Temporal Triangulation with bearing Rate), which is designed under the framework of distributed recursive least squares. The resulting observability enhancement is further verified by numerical simulation and real-world experiments. It is shown that the proposed STT-R algorithm can generate more accurate estimates and effectively reduce the lag in velocity estimation, enabling tracking of more maneuverable targets.
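For readers unfamiliar with the recursive least squares framework the estimator above builds on, a minimal generic RLS sketch; the measurement model, dimensions, and noise values are toy assumptions and do not reproduce the STT-R equations.

```python
import numpy as np

class RecursiveLeastSquares:
    """Generic RLS estimator of x minimizing the sum of ||y_k - H_k x||^2."""

    def __init__(self, dim, p0=1e3):
        self.x = np.zeros(dim)        # estimate (e.g. target position/velocity)
        self.P = np.eye(dim) * p0     # estimate covariance

    def update(self, H, y, R):
        # Standard recursive least squares update for a measurement y = H x + noise.
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (y - H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P
        return self.x

# Toy example: noisy linear measurements of a constant 2D state from varying geometry.
rls = RecursiveLeastSquares(dim=2)
true_x = np.array([3.0, -1.0])
rng = np.random.default_rng(1)
for _ in range(50):
    H = rng.normal(size=(1, 2))
    y = H @ true_x + rng.normal(scale=0.05, size=1)
    est = rls.update(H, y, R=np.eye(1) * 0.05**2)
print(est.round(2))   # converges toward [3.0, -1.0]
```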
|
|
10:00-10:05, Paper ThBT10.2 | |
Overlapping Free: Anchorless UWB-Assisted Relative Pose Estimation for Multi-Robot Systems |
|
Yun, Yanpu | Nanyang Technological University |
Peng, Guohao | Nanyang Technological University |
Zhou, Yichen | Nanyang Technological University |
Zhang, Jun | Nanyang Technological University |
Liu, Yiyao | Nanyang Technological University |
Mao, Kaimin | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Multi-Robot Systems
Abstract: Accurate Relative Pose Estimation (RPE) is critical for effective collaboration of multi-robot systems. Traditional methods using cameras or LiDARs heavily rely on overlapping Fields of View (FoV) between robots, which is highly demanding in practical applications and may hinder collaboration efficiency. To address this issue, we propose Anchorless UWB-Assisted Relative Pose Estimation (AURPE), a novel approach that leverages ultra-wideband (UWB) technology in an anchorless setup to achieve multi-robot RPE without requiring overlapping FoVs or external infrastructure. AURPE first estimates the initial relative poses between robots using inter-robot UWB ranging combined with a Bayesian framework and constrained optimization. During robot operation, AURPE continuously refines the relative poses by integrating UWB measurements with LiDAR-inertial odometry (LIO) and employs a consensus voting mechanism to identify the most reliable pose estimates. Additionally, a pose graph-based back-end optimization is incorporated to enhance the accuracy of both initial and real-time relative pose. Extensive simulations and real-world experiments demonstrate that AURPE achieves accurate RPE even in non-overlapping scenarios where traditional methods fail. Compared to state-of-the-art point cloud registration methods, AURPE shows superior performance in both accuracy and robustness, highlighting its potential to significantly enhance cooperative tasks in multi-robot systems operating in complex environments.
|
|
10:05-10:10, Paper ThBT10.3 | |
Maintaining Strong R-Robustness in Reconfigurable Multi-Robot Networks Using Control Barrier Functions |
|
Lee, Haejoon | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Networked Robots, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: In leader-follower consensus, strong r-robustness of the communication graph provides a sufficient condition for followers to achieve consensus in the presence of misbehaving agents. Previous studies have assumed that robots can form and/or switch between predetermined network topologies with known robustness properties. However, robots with distance-based communication models may not be able to achieve these topologies while moving through spatially constrained environments, such as narrow corridors, to complete their objectives. This paper introduces a Control Barrier Function (CBF) that ensures robots maintain strong r-robustness of their communication graph above a certain threshold without maintaining any fixed topologies. Our CBF directly addresses robustness, allowing robots to have flexible reconfigurable network structure while navigating to achieve their objectives. The efficacy of our method is tested through various simulation and hardware experiments.
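A minimal sketch of a generic control barrier function safety filter of the kind the abstract builds on; single-integrator dynamics, the barrier gradient, and the class-K gain are illustrative assumptions, and the paper's robustness-specific barrier is not reproduced here.

```python
import numpy as np

def cbf_filter(u_nom, grad_h, h, alpha=1.0):
    """Minimum-deviation safety filter for single-integrator dynamics x_dot = u.

    Enforces the standard CBF condition grad_h . u + alpha * h >= 0; in the paper,
    h would encode the strong r-robustness margin of the communication graph.
    """
    slack = grad_h @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom                                   # nominal input already safe
    # Closed-form solution of the min ||u - u_nom||^2 QP with one active constraint.
    return u_nom - (slack / (grad_h @ grad_h)) * grad_h

# Example: the nominal motion would violate the barrier condition; the filter corrects it.
u_nom = np.array([1.0, 0.0])
grad_h = np.array([-1.0, 0.5])   # assumed gradient of the robustness barrier
print(cbf_filter(u_nom, grad_h, h=0.2))   # corrected input satisfies the constraint
```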
|
|
10:10-10:15, Paper ThBT10.4 | |
Online Waypoint Recognition of Controlled Agents in Uncertain Environments |
|
Guo, Jia | Cornell University |
Surve, Sushrut | Cornell University |
He, Zilong | Cornell University |
Ferrari, Silvia | Cornell University |
Keren, Sarah | Technion - Israel Institute of Technology |
Keywords: Cooperating Robots, Integrated Planning and Control, Autonomous Agents
Abstract: For multi-robot teams with limited communication, the ability to rapidly recognize the intention of a teammate via its exhibited behavior is key to achieving effective collaboration. While current research on plan and goal recognition provides powerful tools, most of these rely on a high-level abstraction of the environment and of its dynamics. We propose online waypoint recognition (OWR) that incorporates knowledge about the dynamic models into the analysis of the observed agent behavior. Our algorithm takes the form of a Kalman filter and performs recognition of the agent's intended waypoint at high frequency. The approach is robust to uncertainties in dynamics and observations. Moreover, it does not require the agent to reach the next waypoint to perform recognition, which saves valuable time. Our empirical evaluation shows the ability of our proposed algorithm to expedite recognition of both simulated and real-world mobile robots.
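As a rough illustration of waypoint recognition from observed motion, a Bayesian hypothesis-scoring sketch; the goal-directed motion model, speed, and noise level are assumptions, and this is not the paper's Kalman-filter formulation.

```python
import numpy as np

def update_waypoint_belief(belief, pos, vel, waypoints, speed=1.0, sigma=0.3):
    """Recursive Bayesian scoring of candidate waypoints from observed motion.

    belief    : prior probability over candidate waypoints
    pos, vel  : observed agent position and velocity (e.g. from a state estimator)
    waypoints : candidate waypoint positions, shape (K, 2)
    Assumes a simple goal-directed motion model (move at `speed` toward the waypoint).
    """
    likelihoods = np.empty(len(waypoints))
    for k, w in enumerate(waypoints):
        direction = (w - pos) / (np.linalg.norm(w - pos) + 1e-9)
        err = np.linalg.norm(vel - speed * direction)
        likelihoods[k] = np.exp(-0.5 * (err / sigma) ** 2)
    posterior = belief * likelihoods
    return posterior / posterior.sum()

waypoints = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])
belief = np.ones(3) / 3
belief = update_waypoint_belief(belief, np.zeros(2), np.array([0.9, 0.1]), waypoints)
print(belief.round(3))   # probability mass shifts toward the waypoint at (5, 0)
```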
|
|
10:15-10:20, Paper ThBT10.5 | |
MARF: Cooperative Multi-Agent Path Finding with Reinforcement Learning and Frenet Lattice in Dynamic Environments |
|
Hu, Tianyang | Zhejiang University |
Zhang, Zhen | Zhejiang University |
Zhu, Chengrui | Zhejiang University |
Xu, Gang | Zhejiang University |
Wu, Yuchen | Zhejiang University |
Wu, Huifeng | Hangzhou Dianzi University |
Liu, Yong | Zhejiang University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Reinforcement Learning
Abstract: Multi-agent path finding (MAPF) in dynamic and complex environments is a highly challenging task. Recent research has often focused on the scalability of the number of robots or the complexity of the environment. Usually, they disregard the robots' physical models or use a differential drive robot. However, this approach fails to adequately capture the kinematic and dynamic constraints of real-world vehicles, particularly those equipped with Ackermann steering in warehousing applications. This paper presents a novel MAPF algorithm that combines reinforcement learning (RL) with a lattice planner. RL provides strong generalization capabilities while maintaining computational efficiency. By incorporating lattice planner trajectories into the action space of the RL framework, agents are capable of generating smooth and feasible paths that respect the kinematic and dynamic constraints. In addition, we adopt a decentralized training and execution framework, where a network of shared value functions enables efficient cooperation among agents during decision-making. Simulation results and real-world experiments in different scenarios demonstrate that our method achieves superior performance in terms of success rate, average speed, extra distance of trajectory, and computing time.
|
|
10:20-10:25, Paper ThBT10.6 | |
Robust Self-Reconfiguration for Fault-Tolerant Control of Modular Aerial Robot Systems |
|
Huang, Rui | National University of Singapore |
Tang, Siyu | National University of Singapore |
Cai, Zhiqian | National University of Singapore |
Zhao, Lin | National University of Singapore |
Keywords: Cellular and Modular Robots, Failure Detection and Recovery, Aerial Systems: Applications
Abstract: Modular Aerial Robotic Systems (MARS) consist of multiple drone units assembled into a single, integrated rigid flying platform. With inherent redundancy, MARS can self-reconfigure into different configurations to mitigate rotor or unit failures and maintain stable flight. However, existing works on MARS self-reconfiguration often overlook the practical controllability of intermediate structures formed during the reassembly process, which limits their applicability. In this paper, we address this gap by considering the control-constrained dynamic model of MARS and proposing a robust and efficient self-reconstruction algorithm that maximizes the controllability margin at each intermediate stage. Specifically, we develop algorithms to compute optimal, controllable disassembly and assembly sequences, enabling robust self-reconfiguration. Finally, we validate our method in several challenging fault-tolerant self-reconfiguration scenarios, demonstrating significant improvements in both controllability and trajectory tracking while reducing the number of assembly steps. The videos and source code of this work are available at https://github.com/RuiHuangNUS/MARS-Reconfig/
|
|
10:25-10:30, Paper ThBT10.7 | |
Where Are You? Unscented Particle Filter for Single Range Relative Pose Estimation in Unobservable Motion Using UWB and VIO |
|
Durodié, Yuri | Vrije Universiteit Brussel |
Convens, Bryan | Vrije Universiteit Brussel |
Liu, Gaoyuan | Vrije Universiteit Brussel |
Decoster, Thomas | Vrije Universiteit Brussel |
Munteanu, Adrian | Vrije Universiteit Brussel |
Vanderborght, Bram | Vrije Universiteit Brussel |
Keywords: Multi-Robot Systems, Localization, Sensor Fusion
Abstract: Real-time relative pose (RP) estimation is a cornerstone for effective multi-agent collaboration. When conventional global positioning infrastructure such as GPS is unavailable, the use of Ultra-Wideband (UWB) technology on each agent provides a practical means to measure inter-agent range, eliminating the need for external hardware installations, due to UWB’s precise range measurements and robust communication capabilities. However, when only a single UWB device per agent is used, the relative pose between the agents can be unobservable, resulting in a complex solution space with multiple possible RPs. In this paper, a novel method is proposed based on an Unscented Particle Filter (UPF) that fuses single UWB ranges with visual-inertial odometry (VIO). The proposed decentralized method solves the multi-modal solution in 3D (4-DoF) for the RP when it is unobservable. Moreover, a pseudo-state is introduced to correct for rotational drift of the agents. Through simulations and experiments involving two robots, the proposed solution was shown to be competitive, but less computationally expensive. Additionally, the proposed solution provides all possible relative poses from the first measurement. The code and link to the video are available https://github.com/y2d2/UPF_RPE.
|
|
ThBT11 |
314 |
Robot Vision 1 |
Regular Session |
Chair: Malis, Ezio | Inria |
Co-Chair: Culbertson, Preston | Cornell University |
|
09:55-10:00, Paper ThBT11.1 | |
Asynchronous Blob Tracker for Event Cameras |
|
Wang, Ziwei | Australian National University |
Molloy, Timothy L. | Australian National University |
van Goor, Pieter | University of Twente |
Mahony, Robert | Australian National University |
Keywords: Computer Vision for Automation, Aerial Systems: Perception and Autonomy, Visual Tracking, Event Cameras
Abstract: Event-based cameras are popular for tracking fast-moving objects due to their high temporal resolution, low latency, and high dynamic range. In this paper, we propose a novel algorithm for tracking event blobs using raw events asynchronously in real time. We introduce the concept of an event blob as a spatio-temporal likelihood of event occurrence where the conditional spatial likelihood is blob-like. Many real-world objects such as car headlights or any quickly moving foreground objects generate event blob data. The proposed algorithm uses a nearest neighbour classifier with a dynamic threshold criteria for data association coupled with an extended Kalman filter to track the event blob state. Our algorithm achieves highly accurate blob tracking, velocity estimation, and shape estimation even under challenging lighting conditions and high-speed motions (> 11000 pixels/s). The microsecond time resolution achieved means that the filter output can be used to derive secondary information such as time-to-contact or range estimation, which will enable applications to real-world problems such as collision avoidance in autonomous driving.
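A minimal sketch of the nearest-neighbour, dynamically gated data association step described above; the gate width, update rates, and the exponential mean/variance update standing in for the per-blob extended Kalman filter are all assumptions.

```python
import numpy as np

class Blob:
    def __init__(self, xy):
        self.mean = np.asarray(xy, dtype=float)   # current blob centre [px]
        self.var = 4.0                             # isotropic spatial variance, assumed init

def associate_event(blobs, event_xy, gate_sigmas=3.0, alpha=0.05):
    """Assign one incoming event to the nearest blob if inside a variance-scaled gate.

    A full implementation would run an extended Kalman filter per blob; here a
    simple exponential update of mean and variance stands in for it.
    """
    event_xy = np.asarray(event_xy, dtype=float)
    if blobs:
        dists = [np.linalg.norm(event_xy - b.mean) for b in blobs]
        i = int(np.argmin(dists))
        # Dynamic threshold: gate radius grows with the blob's spatial spread.
        if dists[i] < gate_sigmas * np.sqrt(blobs[i].var):
            b = blobs[i]
            b.var = (1 - alpha) * b.var + alpha * dists[i] ** 2
            b.mean = (1 - alpha) * b.mean + alpha * event_xy
            return i
    blobs.append(Blob(event_xy))   # no match: start a new blob track
    return len(blobs) - 1

blobs = []
for e in [(10, 10), (11, 9), (60, 40), (12, 10)]:
    print(associate_event(blobs, e))   # 0, 0, 1, 0
```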
|
|
10:00-10:05, Paper ThBT11.2 | |
Deep Height Decoupling for Precise Vision-Based 3D Occupancy Prediction |
|
Wu, Yuan | Nanjing University of Science and Technology |
Yan, Zhiqiang | Nanjing University of Science and Tenchnology |
Wang, Zhengxue | Nanjing University of Science and Technology |
Li, Xiang | Nankai University |
Hui, Le | Nanjing University of Science and Technology |
Yang, Jian | Nanjing University of Science & Technology |
Keywords: Computer Vision for Transportation, Semantic Scene Understanding
Abstract: The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, a grid representing features within a certain height range usually introduces many confusing features that belong to other height ranges. To address this challenge, we present Deep Height Decoupling (DHD), a novel framework that incorporates an explicit height prior to filter out the confusing features. Specifically, DHD first predicts height maps via explicit supervision. Based on the height distribution statistics, DHD designs Mask Guided Height Sampling (MGHS) to adaptively decouple the height map into multiple binary masks. MGHS projects the 2D image features into multiple subspaces, where each grid contains features within reasonable height ranges. Finally, a Synergistic Feature Aggregation (SFA) module is deployed to enhance the feature representation through channel and spatial affinities, enabling further occupancy refinement. On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art performance even with minimal input frames. Source code is released at https://github.com/yanzq95/DHD.
|
|
10:05-10:10, Paper ThBT11.3 | |
RE0: Recognize Everything with 3D Zero-Shot Instance Segmentation |
|
Yan, Xiaohan | Tongji University |
Jiang, Zijian | Tongji University |
Shuai, Yinghao | Tongji University |
Wang, Nan | Tongji University |
Song, Xiaowei | Tongji University |
Ji, Wenbo | Tongji University |
Wu, Ge | Nankai University |
He, Jinyu | Xiamen University |
Wei, Gang | Tongji University |
Wang, Zhicheng | Tongji University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Embodied Cognitive Science
Abstract: Recognizing objects in the 3D world is a significant challenge for robotics. Due to the lack of high-quality 3D data, directly training a general-purpose segmentation model in 3D is almost infeasible. Meanwhile, vision foundation models (VFM) have revolutionized the 2D computer vision field with outstanding performance, making the use of VFM to assist 3D perception a promising direction. However, most existing VFM-assisted methods do not effectively address the 2D-3D inconsistency problem or adequately provide corresponding semantic information for 3D instance objects. To address these two issues, this paper introduces a novel framework for 3D zero-shot instance segmentation called RE0. For the given 3D point clouds and multi-view RGB-D images with poses, we leverage the 3D geometric information, projection relationships, and CLIP semantic features. Specifically, we utilize CropFormer to extract mask information from multi-view posed images, combined with projection relationships to assign point-level labels to each point in the point cloud, and achieve instance-level consistency through inter-frame information interaction. Then, we employ projection relationships again to assign CLIP semantic features to the point cloud and achieve aggregation of small-scale point clouds. Notably, RE0 does not require any additional training and requires only one inference of CropFormer and one inference of CLIP. Experiments on ScanNet200 and ScanNet++ show that our method achieves higher quality segmentation than the previous zero-shot methods. Our codes and demos are available at https://recognizeeverything.github.io/, with only one RTX 3090 GPU required.
|
|
10:10-10:15, Paper ThBT11.4 | |
PTQ4RIS: Post-Training Quantization for Referring Image Segmentation |
|
Jiang, Xiaoyan | Shanghai University of Engineering Science |
Yang, Hang | Shanghai University of Engineering Science |
Zhu, Kaiying | SenseTime |
Qiu, Xihe | Shanghai University of Engineering Science |
Zhao, Shibo | Carnegie Mellon University |
Zhou, Sifan | Southeast University |
Keywords: Robotics in Under-Resourced Settings, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to explore top-performance models, disregarding considerations for practical applications on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization and propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in visual and text encoders. Extensive experiments on three benchmarks with different bit settings (from 8 to 4 bits) demonstrate its superior performance. Importantly, ours is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications.
|
|
10:15-10:20, Paper ThBT11.5 | |
LeAP: Consistent Multi-Domain 3D Labeling Using Foundation Models |
|
Gebraad, Simon | Delft University of Technology |
Palffy, Andras | Delft University of Technology |
Caesar, Holger | TU Delft |
Keywords: Data Sets for Robotic Vision, Sensor Fusion, Deep Learning for Visual Perception
Abstract: Availability of datasets is a strong driver for research on 3D semantic understanding, and whilst obtaining unlabeled 3D data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. As a result, labeled 3D datasets have largely been confined to the popular automotive domain due to the abundance of labeled data. Recently, Vision Foundation Models (VFMs) enable open-set semantic segmentation, potentially aiding automatic labeling. However, VFMs for 3D data have been limited to adaptations of 2D models, which can introduce inconsistencies to 3D labels. This work introduces Label Any Pointcloud (LeAP), leveraging 2D VFMs to automatically label multi-frame 3D data with any set of classes in any kind of application whilst ensuring label consistency. Using a Bayesian update, point labels are combined into voxels to improve spatio-temporal consistency. A novel 3D Consistency Network (3D-CN) further enhances geometric consistency. Through various experiments, we show that our method can generate high-quality 3D semantic labels across diverse fields without any manual labeling. Further, models adapted to new domains using our labels show a significant mIoU increase in semantic segmentation tasks.
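A minimal sketch of the Bayesian point-to-voxel label fusion step mentioned above; the naive-Bayes style log-likelihood accumulation and the class count are assumed stand-ins for the paper's exact update rule.

```python
import numpy as np

def bayesian_voxel_update(log_probs, point_label_probs):
    """Fuse per-point class probabilities into a voxel's running label belief.

    log_probs         : current per-voxel log-probabilities over C classes
    point_label_probs : (N, C) predicted class probabilities of the points in this voxel
    Returns the updated log-belief and the current most likely class index.
    """
    log_probs = log_probs + np.log(point_label_probs + 1e-9).sum(axis=0)
    log_probs -= log_probs.max()             # numerical stability before normalizing
    probs = np.exp(log_probs)
    probs /= probs.sum()
    return np.log(probs), int(np.argmax(probs))

# Example with 3 classes: two frames vote for class 1, one noisy frame disagrees.
belief = np.log(np.ones(3) / 3)
frames = [np.array([[0.1, 0.8, 0.1]]), np.array([[0.2, 0.7, 0.1]]),
          np.array([[0.6, 0.3, 0.1]])]
for f in frames:
    belief, label = bayesian_voxel_update(belief, f)
print(label)   # 1: the consistent vote wins despite the noisy frame
```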
|
|
10:20-10:25, Paper ThBT11.6 | |
PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion |
|
Kannan, Shyam Sundar | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Localization, Visual Learning, Deep Learning for Visual Perception
Abstract: Visual place recognition is a challenging task in the field of computer vision, and autonomous robotics and vehicles, which aims to identify a location or a place from visual inputs. Contemporary methods in visual place recognition employ convolutional neural networks and utilize every region within the image for the place recognition task. However, the presence of dynamic and distracting elements in the image can impact the effectiveness of the place recognition process. Therefore, it is meaningful to focus on the task-relevant regions of the image for improved recognition. In this paper, we present PlaceFormer, a novel transformer-based approach for visual place recognition. PlaceFormer uses patch tokens from the transformer to create global image descriptors, which are then used for image retrieval. To re-rank the retrieved images, PlaceFormer merges the patch tokens from the transformer to form multi-scale patches. Utilizing the transformer's self-attention mechanism, it selects patches that correspond to task-relevant areas in an image. These selected patches undergo geometric verification, generating similarity scores across different patch sizes. Subsequently, the spatial scores from each patch size are fused to produce a final similarity score. This score is then used to re-rank the images initially retrieved using global image descriptors. Extensive experiments on benchmark datasets demonstrate that PlaceFormer outperforms several state-of-the-art methods in terms of accuracy and computational efficiency, requiring less time and memory.
|
|
10:25-10:30, Paper ThBT11.7 | |
Motion-Aware Optical Camera Communication with Event Cameras |
|
Su, Hang | ShanghaiTech University |
Gao, Ling | ShanghaiTech University |
Liu, Tao | ShanghaiTech University |
Kneip, Laurent | ShanghaiTech University |
Keywords: Localization, Visual Tracking, Automation Technologies for Smart Cities
Abstract: As the ubiquity of smart mobile devices continues to rise, Optical Camera Communication systems have gained more attention as a solution for efficient and private data streaming. This system utilizes optical cameras to receive data from digital screens via visible light. Despite their promise, most of them are hindered by dynamic factors such as screen refreshing and rapid camera motion. CMOS cameras, often serving as the receivers, suffer from limited frame rates and motion-induced image blur, which degrade overall performance. To address these challenges, this paper unveils a novel system that utilizes event cameras. We introduce a dynamic visual marker and design event-based tracking algorithms to achieve fast localization and data streaming. Remarkably, the event camera's unique capabilities mitigate issues related to screen refresh rates and camera motion, enabling a high throughput of up to 114 Kbps in static conditions, and a 1 cm localization accuracy with 1% bit error rate under various camera motions. We plan on open-sourcing the code upon acceptance.
|
|
ThBT12 |
315 |
Applications in the Wild |
Regular Session |
Chair: Kelasidi, Eleni | NTNU |
Co-Chair: Hutter, Marco | ETH Zurich |
|
09:55-10:00, Paper ThBT12.1 | |
Hybrid State Estimation and Mode Identification of an Amphibious Robot |
|
Amundsen, Herman Bjørn | NTNU |
Randeni, Supun | Massachusetts Institute of Technology |
Bingham, Russell | Pliant Energy Systems Inc |
Civit, Carles | Pliant Energy Systems Inc |
Filardo, Benjamin Pietro | Pliant Energy Systems Inc |
Føre, Martin | NTNU |
Kelasidi, Eleni | NTNU |
Benjamin, Michael | Massachusetts Institute of Technology |
Keywords: Discrete Event Dynamic Automation Systems, Localization, Biologically-Inspired Robots
Abstract: C-Ray is an amphibious robot that is capable of swimming in water and crawling on land using its undulating fins, enabling operations in a wide range of environments. The robot can be modeled as a hybrid dynamical system whose dynamics and propulsion change when the robot transitions between water and land. Most importantly, the direction of wave travel in the robot's fins is reversed between its swimming and crawling locomotion styles. To operate autonomously, C-Ray requires both accurate identification of when transitions between water and land occur and robust state estimation in littoral environments where the transition dynamics are highly discontinuous and transient. This paper presents a hybrid observer for estimating continuous states and identifying state-driven mode switches for C-Ray, enabling autonomous water/land-transitions. The proposed observer is a combination of the multiplicative extended Kalman filter (MEKF) and the salted Kalman filter, a newly proposed Kalman filter for mapping state uncertainty during hybrid transitions. We also propose an altitude and sea floor geometry observer and incorporate this directly into the MEKF. The performance is evaluated in simulations.
|
|
10:00-10:05, Paper ThBT12.2 | |
LiDARDustX: A LiDAR Dataset for Dusty Unstructured Road Environments |
|
Wei, Chenfeng | Wuxi Intelligent Control Research Institute, HNU |
Wu, Qi | Wuxi Intelligent Control Research Institute, Hunan University |
Zuo, Si | Hunan University |
Xu, Jiahua | Wuxi Intelligent Control Research Institute, Hunan University |
Zhao, Boyang | Tsinghua University |
Yang, Zeyu | Hunan University |
Xie, Guotao | Hunan University |
Wang, Shenhong | Xi'an Jiaotong-Liverpool University |
Keywords: Data Sets for Robotic Vision, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Autonomous driving datasets are essential for validating the progress of intelligent vehicle algorithms, which include localization, perception, and prediction. However, existing datasets are predominantly focused on structured urban environments, which limits the exploration of unstructured and specialized scenarios, particularly those characterized by significant dust levels. This paper introduces the LiDARDustX dataset, which is specifically designed for perception tasks under high-dust conditions, such as those encountered in mining areas. The LiDARDustX dataset consists of 30,000 LiDAR frames captured by six different LiDAR sensors, each accompanied by 3D bounding box annotations and point cloud semantic segmentation. Notably, over 80% of the dataset comprises dust-affected scenes. By utilizing this dataset, we have established a benchmark for evaluating the performance of state-of-the-art 3D detection and segmentation algorithms. Additionally, we have analyzed the impact of dust on perception accuracy and delved into the causes of these effects. The data and further information can be accessed at: https://github.com/vincentweikey/LiDARDustX.
|
|
10:05-10:10, Paper ThBT12.3 | |
How about Them Apples: 3D Pose and Cluster Estimation of Apple Fruitlets in a Commercial Orchard |
|
Qureshi, Ans | University of Auckland |
Smith, David Anthony James | University of Auckland |
Gee, Trevor | The University of Auckland |
Ahn, Ho Seok | The University of Auckland, Auckland |
McGuinness, Benjamin John | University of Waikato |
Downes, Catherine | University of Waikato |
Jangali, Rahul | The University of Waikato |
Black, Kale | Black Box Technologies LTD |
Lim, Shen Hin | University of Waikato |
Duke, Mike | Waikato University |
MacDonald, Bruce | University of Auckland |
Williams, Henry | University of Auckland |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: Aotearoa’s apple industry struggles to maintain the skilled workforce required for fruitlet thinning each year. Skilled labourers play a pivotal role in managing crop loads by precisely thinning fruitlets to achieve the desired spacing for high-quality apple growth. This complex task requires accurate mapping of the fruitlets along each branch. This paper presents a novel vision system capable of mapping the orientation and clustering information of apple fruitlets as a human expert does. The vision system has been validated against data collected from a real-world commercial apple orchard. The results show an improved counting accuracy of 83.97% over prior implementations, an orientation estimate accuracy of 88.1%, and a clustering accuracy of 94.3%. Future work will utilise this information to determine which fruitlets to remove and then robotically thin them from the canopy.
|
|
10:10-10:15, Paper ThBT12.4 | |
Active Semantic Mapping with Mobile Manipulator in Horticultural Environments |
|
Cuaran, Jose | University of Illinois at Urbana-Champaign |
Singh Ahluwalia, Kulbir | University of Illinois at Urbana-Champaign |
Koe, Kendall | University of Illinois Urbana Champaign |
Uppalapati, Naveen Kumar | University of Illinois at Urbana-Champaign |
Chowdhary, Girish | University of Illinois at Urbana Champaign |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Mapping
Abstract: Semantic maps are fundamental for robotics tasks such as navigation and manipulation. They also enable yield prediction and phenotyping in agricultural settings. In this paper, we introduce an efficient and scalable approach for active semantic mapping in horticultural environments, employing a mobile robot manipulator equipped with an RGB-D camera. Our method leverages probabilistic semantic maps to detect semantic targets, generate candidate viewpoints, and compute the corresponding information gain. We present an efficient ray-casting strategy and a novel information utility function that accounts for both semantics and occlusions. The proposed approach reduces total runtime by 8% compared to previous baselines. Furthermore, our information metric surpasses other metrics in reducing multiclass entropy and improving surface coverage, particularly in the presence of segmentation noise. Real-world experiments validate our method's effectiveness but also reveal challenges such as depth sensor noise and varying environmental conditions, requiring further research.
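As a rough illustration of the kind of utility described above (not the authors' exact metric), the sketch below ranks candidate viewpoints by the summed multiclass entropy of voxels visible along cast rays, terminating each ray at the first likely-occupied voxel to model occlusion. The sparse dictionary map, the assumption of four semantic classes, and the step size are illustrative choices.

import numpy as np

def voxel_entropy(p):
    # Shannon entropy of a per-voxel class distribution (p sums to 1).
    p = np.clip(p, 1e-9, 1.0)
    return float(-np.sum(p * np.log(p)))

def viewpoint_information_gain(class_probs, occupancy, origin, directions,
                               max_range=2.0, step=0.05):
    # class_probs and occupancy are sparse dict maps keyed by integer voxel index.
    # Sum the entropy of voxels visible along each cast ray; a ray stops at the
    # first voxel whose occupancy probability exceeds 0.5 (occlusion handling).
    origin = np.asarray(origin, dtype=float)
    gain, visited = 0.0, set()
    uniform = np.ones(4) / 4.0          # prior for never-observed voxels (4 classes assumed)
    for d in directions:
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        for t in np.arange(step, max_range, step):
            idx = tuple(np.floor((origin + t * d) / step).astype(int))
            if idx in visited:
                continue
            visited.add(idx)
            gain += voxel_entropy(class_probs.get(idx, uniform))
            if occupancy.get(idx, 0.0) > 0.5:
                break                   # surface hit: everything behind it is occluded
    return gain

# The best next view is simply the candidate with the largest gain, e.g.:
# best = max(candidates, key=lambda v: viewpoint_information_gain(probs, occ, v.origin, v.rays))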
|
|
10:15-10:20, Paper ThBT12.5 | |
Surface Roughness Estimation for Terrain Perception |
|
Ye, Minxiang | Zhejiang Lab |
Zhang, Yifei | Beihang University |
Gu, Jason | Dalhousie University |
Xiang, Senwei | Hangzhou International Innovation Institute, Beihang University, |
Kong, Lingyu | Zhejiang Lab |
Xie, Anhuan | Zhejiang University |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation, Legged Robots
Abstract: Ground terrain perception has become the primary visual task for the robust navigation of intelligent systems in unstructured outdoor environments. However, complex terrain poses a significant challenge to vision-based perception. This work introduces a novel estimation task using RGB images to facilitate low-cost terrain perception in extracting surface roughness information. The proposed task presents both semantic-aware and edge-aware roughness descriptors at the pixel level instead of a single value for a given image. To promote the research on the proposed novel terrain roughness estimation task, we introduce a multimodal synthetic dataset for terrain perception in outdoor scenes, containing multiple terrain categories, diverse viewpoints, different lighting and weather conditions, as well as semantic and roughness annotations. Additionally, inspired by computer graphics, we introduce TRENet, a roughness estimation architecture to model the intrinsic correlation of depth-normal-roughness. We also perform ablation studies on the effect of each component and diverse types of inputs. Extensive evaluations and comparisons demonstrate that our method can effectively predict pixel-wise terrain surface roughness with high accuracy.
|
|
10:20-10:25, Paper ThBT12.6 | |
Automatic Identification of Individual African Leopards in Unlabeled Camera Trap Images (I) |
|
Guo, Cheng | Colorado State University |
Miguel, Agnieszka | Seattle University |
Maciejewski, Anthony A. | Colorado State University |
Keywords: Computer Vision for Automation
Abstract: This article describes an algorithm to solve a real-world animal identification problem, i.e., determining the unknown number K of individual animals in a dataset of N unlabeled camera-trap images of African leopards, provided by Panthera. To determine the leopards’ IDs, we propose an effective automated algorithm that consists of segmenting leopard bodies from images, scoring similarity between image pairs, and clustering followed by verification. To perform clustering, we employ a modified ternary search that uses a novel adaptive k-medoids++ clustering algorithm. The best clustering is determined using an expanded definition of the silhouette score. A new post-clustering verification procedure is used to further improve the quality of a clustering. The algorithm was evaluated using the Panthera dataset, which consists of 677 individual leopards taken from 1555 images, and resulted in a clustering with an adjusted mutual information score of 0.958, compared to 0.864 using a baseline k-medoids++ clustering algorithm.
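A minimal sketch of the clustering stage as described: ternary search over the number of clusters K, with each candidate K scored by the silhouette of a k-medoids clustering on a precomputed pairwise-distance matrix. It uses plain k-medoids with random initialization rather than the paper's adaptive k-medoids++, and the standard silhouette score rather than the expanded definition.

import numpy as np
from sklearn.metrics import silhouette_score

def k_medoids(D, k, iters=50, seed=0):
    # Plain alternating k-medoids on a precomputed distance matrix D (n x n).
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def best_k_by_silhouette(D, k_lo=2, k_hi=30):
    # Ternary search over K, scoring each clustering with the silhouette score
    # (assumes the score is roughly unimodal in K, as the ternary search requires).
    cache = {}
    def s(k):
        if k not in cache:
            cache[k] = silhouette_score(D, k_medoids(D, k), metric="precomputed")
        return cache[k]
    while k_hi - k_lo > 2:
        m1 = k_lo + (k_hi - k_lo) // 3
        m2 = k_hi - (k_hi - k_lo) // 3
        if s(m1) < s(m2):
            k_lo = m1 + 1
        else:
            k_hi = m2 - 1
    return max(range(k_lo, k_hi + 1), key=s)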
|
|
10:25-10:30, Paper ThBT12.7 | |
RoadRunner M&M: Learning Multi-Range Multi-Resolution Traversability Maps for Autonomous Off-Road Navigation |
|
Patel, Manthan | ETH Zurich |
Frey, Jonas | ETH Zurich |
Atha, Deegan | Jet Propulsion Laboratory |
Spieler, Patrick | JPL |
Hutter, Marco | ETH Zurich |
Khattak, Shehryar | NASA Jet Propulsion Laboratory |
Keywords: Field Robots, Deep Learning for Visual Perception, Mapping
Abstract: Autonomous robot navigation in off-road environments requires a comprehensive understanding of the terrain geometry and traversability. The degraded perceptual conditions and sparse geometric information at longer ranges make the problem challenging, especially when driving at high speeds. Furthermore, the sensing-to-mapping latency and the look-ahead map range can limit the maximum speed of the vehicle. Building on top of the recent work RoadRunner, in this work, we address the challenge of long-range (±100m) traversability estimation. Our RoadRunner (M&M) is an end-to-end learning-based framework that directly predicts the traversability and elevation maps at multiple ranges (±50m, ±100m) and resolutions (0.2m, 0.8m), taking as input multiple images and a LiDAR voxel map. Our method is trained in a self-supervised manner by leveraging the dense supervision signal generated by fusing predictions from an existing traversability estimation stack (X-Racer) in hindsight and satellite Digital Elevation Maps. RoadRunner M&M achieves a significant improvement of up to 50% for elevation mapping and 30% for traversability estimation over RoadRunner, and is able to predict in 30% more regions compared to X-Racer while achieving real-time performance. Experiments on various out-of-distribution datasets also demonstrate that our data-driven approach starts to generalize to novel unstructured environments. We integrate our proposed framework in closed-loop with the path planner to demonstrate autonomous high-speed off-road robotic navigation in challenging real-world environments. Project page: https://leggedrobotics.github.io/roadrunner_mm
|
|
ThBT13 |
316 |
Perception Systems |
Regular Session |
Chair: Zhu, Pingping | Marshall University |
Co-Chair: Hays, James | Georgia Institute of Technology, Argo AI |
|
09:55-10:00, Paper ThBT13.1 | |
RipGAN: A GAN-Based Rip Current Data Augmentation Method |
|
Qian, Shenyang | UNSW Sydney |
Harley, Mitchell Dean | UNSW Sydney |
Razzak, Imran | MBZUAI |
Song, Yang | University of New South Wales |
Keywords: Computer Vision for Automation, Deep Learning Methods, Data Sets for Robotic Vision
Abstract: Rip currents are a major hazard on beaches worldwide, and their strong, offshore-directed currents can place even experienced beachgoers at risk of drowning. While it is intuitive to consider developing an automated rip current detection system to assist lifeguards in protecting beachgoers, rip current detection is in its infancy due to the lack of high-quality, large-scale annotated rip current datasets. Moreover, collecting and annotating rip current images requires expert knowledge, which makes building such datasets even more difficult. This paper therefore proposes a GAN-based rip current data augmentation method, RipGAN, to improve the performance of rip current detectors by increasing the amount of representative training data. To create new training images, RipGAN has two branches. One is a texture generator that enriches the pattern and texture details of waves, making the image more realistic. The other is a rip generator based on FFFM-Unet, where the FFFM (Fast Fourier Fusion Module) uses Fast Fourier convolution to fuse the features from the low and the high layers, further refining the generated image. Furthermore, we trained YOLOv8, YOLOv10, DINO and RT-DETR as rip current detectors to demonstrate the effectiveness of RipGAN. The detectors' mAP 50:95 improved by 2.67% on the test set and AP 50 by 4.93% on real-scene videos, outperforming other data augmentation methods. In addition, extensive ablation studies have been conducted to further evaluate each component of RipGAN.
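A hedged sketch of a Fourier-domain fusion block in the spirit of the FFFM described above (the actual module's layout is not specified in the abstract): low- and high-level feature maps are merged channel-wise, and a pointwise convolution is applied to their 2D FFT, giving a global receptive field before a residual fusion. The module and parameter names are assumptions.

import torch
import torch.nn as nn

class SpectralFuse(nn.Module):
    # Fuses a low-level and a high-level feature map with a Fourier-domain 1x1 conv.
    def __init__(self, channels):
        super().__init__()
        self.pre = nn.Conv2d(2 * channels, channels, kernel_size=1)       # concat -> C
        self.spectral = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.post = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, low_feat, high_feat):
        # Align resolutions, then merge the two branches channel-wise.
        high_feat = nn.functional.interpolate(high_feat, size=low_feat.shape[-2:],
                                              mode="bilinear", align_corners=False)
        x = self.pre(torch.cat([low_feat, high_feat], dim=1))
        b, c, h, w = x.shape
        # Pointwise conv applied in the frequency domain: global receptive field.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = torch.cat([spec.real, spec.imag], dim=1)
        spec = self.spectral(spec)
        real, imag = spec.chunk(2, dim=1)
        y = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        return self.post(y) + x       # residual fusion

# Example: fused = SpectralFuse(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16))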
|
|
10:00-10:05, Paper ThBT13.2 | |
Points, Images and Texts: Boosting Point Cloud Completion with Multi-Modal Features |
|
Xia, ChengKai | Tongji University |
Lu, Fan | Tongji University |
Li, Bin | Tongji University |
Yu, Guo | Tongji University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Chen, Guang | Tongji University |
Keywords: Computer Vision for Automation, Visual Learning, Semantic Scene Understanding
Abstract: Point cloud completion is crucial for reconstructing accurate shapes in many 3D visual applications. Recent approaches incorporate images into the completion pipeline, introducing geometric clues and global constraints. However, their fusion processes often fail to reconstruct detailed parts and maintain global consistency simultaneously. Except for images, text is another important clue for recognizing the target’s characteristics. Thus, in this work, we propose to combine multiple modalities including points, images and texts for point cloud completion. Specifically, inspired by recently pre-trained large language models, we generate the description texts for images by Visual Question Answering (VQA) models and introduce Visual-Textual Embedding (VTE) models to extract joint features of image-text pairs. Furthermore, we describe the edge geometric patterns by multi-scale edge convolution to guide the refinement of shapes in local areas. Then we adopt cross attention mechanism to effectively fuse multi-modal features and refine the coarse shape. Extensive experiments on the ShapeNet-ViPC benchmark demonstrate our method’s superior performance over previous uni-modal and cross-modal methods.
|
|
10:05-10:10, Paper ThBT13.3 | |
3DWG: 3D Weakly Supervised Visual Grounding Via Category and Instance-Level Alignment |
|
Li, Xiaoqi | Peking University |
Liu, Jiaming | Peking University |
Han, Nuowei | Beijing University of Posts and Telecommunications |
Heng, Liang | Peking University |
Guo, Yandong | OPPO Research Institute |
Dong, Hao | Peking University |
Liu, Yang | Peking University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Visual Learning
Abstract: The 3D weakly-supervised visual grounding task aims to localize oriented 3D boxes in point clouds based on natural language descriptions without requiring annotations to guide model learning. This setting presents two primary challenges: category-level ambiguity and instance-level complexity. Category-level ambiguity arises from representing objects of fine-grained categories in a highly sparse point cloud format, making category distinction challenging. Instance-level complexity stems from multiple instances of the same category coexisting in a scene, leading to distractions during grounding. To address these challenges, we propose a novel weakly-supervised grounding approach that explicitly differentiates between categories and instances. In the category-level branch, we utilize extensive category knowledge from a pre-trained external detector to align object proposal features with sentence-level category features, thereby enhancing category awareness. In the instance-level branch, we utilize spatial relationship descriptions from language queries to refine object proposal features, ensuring clear differentiation among objects. These designs enable our model to accurately identify target-category objects while distinguishing instances within the same category. Compared to previous methods, our approach achieves state-of-the-art performance on three widely used benchmarks: Nr3D, Sr3D, and ScanRef.
|
|
10:10-10:15, Paper ThBT13.4 | |
MPI-Mamba : Cross Propagation Mamba for Multipath Interference Correction |
|
An, Kang | ShenZhen University |
Jiang, ZhaoXiang | Guangdong Laboratory of Artificial Intelligence and Digital Econ |
Tian, Jindong | Guangdong Laboratory of Artificial Intelligence and Digital Econ |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Owing to their compact structure, high stability, and low cost, Indirect Time-of-Flight (IToF) cameras have gained increasing attention in the fields of robotics and automation. However, in real-world scenarios, IToF cameras are affected by multipath interference, which severely degrades imaging quality. Existing learning-based methods for multipath interference correction are all based on CNN architectures and rely on synthetic datasets, leading to poor generalization in real-world scenarios. We propose an efficient and accurate real-data collection scheme and explore the application of Transformer and Mamba architectures to multipath interference correction tasks. Additionally, we introduce a cross-propagation network that integrates Mamba and CNN modules, reducing system complexity to linear levels while achieving superior multipath interference correction compared to state-of-the-art methods.
|
|
10:15-10:20, Paper ThBT13.5 | |
SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference |
|
Chen, Zhen | Centre for Artificial Intelligence and Robotics (CAIR), Hong Kon |
Luo, Xingjian | Centre for Artificial Intelligence and Robotics (CAIR) Hong Kong |
Wu, Jinlin | Institute of Automation, Chinese Academy of Sciences |
Bai, Long | The Chinese University of Hong Kong |
Lei, Zhen | Institute of Automation, Chinese Academy of Sciences |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Ourselin, Sebastien | University College London |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Keywords: Recognition, Visual Learning, Deep Learning for Visual Perception
Abstract: Surgical phase recognition is critical for assisting surgeons in understanding surgical videos. Existing studies focused more on online surgical phase recognition, by leveraging preceding frames to predict the current frame. Despite great progress, they formulated the task as a series of frame-wise classification, which resulted in a lack of global context of the entire procedure and incoherent predictions. Moreover, besides online analysis, accurate offline surgical phase recognition is also in significant clinical need for retrospective analysis, and existing online algorithms do not fully analyze the entire video, thereby limiting accuracy in offline analysis. To overcome these challenges and enhance both online and offline inference capabilities, we propose a universal Surgical Phase Localization Network, named SurgPLAN++, with the principle of temporal detection. To ensure a global understanding of the surgical procedure, we devise a phase localization strategy for SurgPLAN++ to predict phase segments across the entire video through phase proposals. For online analysis, to generate high-quality phase proposals, SurgPLAN++ incorporates a data augmentation strategy to extend the streaming video into a pseudo-complete video through mirroring, center-duplication, and down-sampling. For offline analysis, SurgPLAN++ capitalizes on its global phase prediction framework to continuously refine preceding predictions during each online inference step, thereby significantly improving the accuracy of phase recognition. We perform extensive experiments to validate the effectiveness, and our SurgPLAN++ achieves remarkable performance in both online and offline modes, which outperforms state-of-the-art methods. The source code is available at https://github.com/franciszchen/SurgPLAN-Plus.
|
|
10:20-10:25, Paper ThBT13.6 | |
Real-Time LiDAR Point Cloud Compression and Transmission for Resource-Constrained Robots |
|
Cao, Yuhao | Harbin Institute of Technology Shenzhen |
Wang, Yu | University of Science and Technology of China |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Robotics in Under-Resourced Settings, Field Robots
Abstract: LiDARs are widely used in autonomous robots due to their ability to provide accurate environment structural information. However, the large size of point clouds poses challenges in terms of data storage and transmission. In this paper, we propose a novel point cloud compression and transmission framework for resource-constrained robotic applications, called RCPCC. We iteratively fit the surface of point clouds with similar range values and eliminate redundancy through their spatial relationships. Then, we use Shape-Adaptive DCT (SA-DCT) to transform the unfit points and reduce the data volume by quantizing the transformed coefficients. We design an adaptive bitrate control strategy with QoE as the optimization goal to control the quality of the transmitted point cloud. Experiments show that our framework achieves compression rates of 40x to 80x while maintaining high accuracy for downstream applications. Our method significantly outperforms other baselines in terms of accuracy when the compression rate exceeds 70x. Furthermore, in situations of reduced communication bandwidth, our adaptive bitrate control strategy demonstrates significant QoE improvements.
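To make the quantization and rate-control ideas concrete, here is a simplified stand-in (a plain block DCT instead of SA-DCT, and a bit-count proxy instead of the QoE objective) that transforms a block of range values, quantizes the coefficients, and coarsens the quantization step until a bandwidth budget is met.

import numpy as np
from scipy.fft import dctn, idctn

def compress_block(range_block, q_step):
    # Transform a block of LiDAR range values, quantize, and reconstruct.
    # Returns the quantized integer coefficients and the mean reconstruction error.
    coeffs = dctn(range_block, norm="ortho")
    q = np.round(coeffs / q_step).astype(np.int32)        # quantization
    recon = idctn(q * q_step, norm="ortho")
    return q, float(np.mean(np.abs(recon - range_block)))

def pick_q_step(range_block, bit_budget, steps=(0.01, 0.02, 0.05, 0.1, 0.2, 0.5)):
    # Crude rate control: coarsen quantization until the (proxy) bit cost fits the budget.
    for q_step in steps:
        q, err = compress_block(range_block, q_step)
        bits = 16 * np.count_nonzero(q)                    # proxy: 16 bits per nonzero coeff
        if bits <= bit_budget:
            return q_step, bits, err
    return steps[-1], bits, err

# Example on a synthetic 16x16 range patch (values in metres are illustrative):
# q_step, bits, err = pick_q_step(np.random.rand(16, 16) * 50.0, bit_budget=1024)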
|
|
ThBT14 |
402 |
Language Guided Manipulation |
Regular Session |
Chair: Walter, Matthew | Toyota Technological Institute at Chicago |
Co-Chair: Chen, Haonan | University of Illinois at Urbana-Champaign |
|
09:55-10:00, Paper ThBT14.1 | |
A Shared Autonomy System for Precise and Efficient Remote Underwater Manipulation |
|
Phung, Amy | MIT-WHOI Joint Program |
Billings, Gideon | University of Sydney, Australian Center for Field Robotics |
Daniele, Andrea F | Toyota Technological Institute at Chicago |
Walter, Matthew | Toyota Technological Institute at Chicago |
Camilli, Richard | Woods Hole Oceanographic Institution |
Keywords: Cognitive Human-Robot Interaction, Perception for Grasping and Manipulation, Virtual Reality and Interfaces, Shared Autonomy and Field Robotics
Abstract: Conventional underwater intervention operations using robotic vehicles require expert teleoperators and limit interaction with remote scientists. We present the SHared Autonomy for Remote Collaboration (SHARC) framework that enables novice operators to cooperatively conduct underwater sampling and manipulation tasks. With SHARC, operators can plan and complete manipulation tasks using natural language or hand gestures through a virtual reality (SHARC-VR) interface. The interface provides remote operators with a contextual 3D scene understanding that is updated according to bandwidth availability. Evaluation of the SHARC framework through controlled lab experiments demonstrates that SHARC-VR enables novice operators to complete manipulation tasks in framerate-limited conditions (i.e., 0.1–0.5 frames per second) faster than expert pilots using a conventional topside controller. For both novice and expert users, the SHARC-VR interface also increases the task completion rate and improves sampling precision. The SHARC framework is readily extensible to other hardware architectures, including terrestrial and space systems.
|
|
10:00-10:05, Paper ThBT14.2 | |
E2Map: Experience-And-Emotion Map for Self-Reflective Robot Navigation with Language Models |
|
Kim, Chan | Seoul National University |
Kim, Keonwoo | Seoul National University |
Oh, Mintaek | Seoul National University |
Baek, Hanbi | Seoul National University |
Lee, Jiyang | Seoul National University |
Jung, Donghwi | Seoul National University |
Woo, Soojin | Seoul National University |
Woo, Younkyung | Carnegie Mellon University |
Tucker, John | Stanford University |
Firoozi, Roya | Stanford University |
Seo, Seung-Woo | Seoul National University |
Schwager, Mac | Stanford University |
Kim, Seong-Woo | Seoul National University |
Keywords: AI-Enabled Robotics, Learning from Experience, Emotional Robotics
Abstract: Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stochastic, initial plans based solely on LLMs' general knowledge may fail to achieve their objectives, unlike in static scenarios. To address this limitation, this study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences, drawing inspiration from human emotional responses. The proposed methodology enables one-shot behavior adjustments by updating the E2Map based on the agent's experiences. Our evaluation in stochastic navigation environments, including both simulations and real-world scenarios, demonstrates that the proposed method significantly enhances performance in stochastic environments compared to existing LLM-based approaches.
|
|
10:05-10:10, Paper ThBT14.3 | |
Improving Zero-Shot ObjectNav with Generative Communication |
|
Dorbala, Vishnu Sashank | University of Maryland, College Park |
Sharma, Vishnu D. | Nokia Bell Labs |
Tokekar, Pratap | University of Maryland |
Manocha, Dinesh | University of Maryland |
Keywords: Agent-Based Systems, Domestic Robotics, AI-Enabled Robotics
Abstract: We propose a new method for improving Zero-Shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have a limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and the ground agent with an obfuscated view, both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view with the target, we note a drop in performance (-13% in OSR and -13% in SPL) of a fully cooperative assistance scheme over an unassisted baseline. In contrast, a selective assistance scheme where the ground agent retains its independent exploratory behaviour shows a 10% OSR and 7.65% SPL improvement. To explain navigation performance, we analyze the GC for unique traits, quantifying the presence of hallucination and cooperation. Specifically, we identify the novel linguistic trait of preemptive hallucination in our embodied setting, where the overhead agent assumes that the ground agent has executed an action in the dialogue when it is yet to move, and note its strong correlation with navigation performance. We conduct real-world experiments and present some qualitative examples where we mitigate hallucinations via prompt finetuning to improve ObjectNav performance.
|
|
10:10-10:15, Paper ThBT14.4 | |
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models |
|
Chen, Annie | Stanford University |
Lessing, Alec | Stanford |
Tang, Andy | Stanford University |
Chada, Govind | Stanford University |
Smith, Laura | UC Berkeley |
Levine, Sergey | UC Berkeley |
Finn, Chelsea | Stanford University |
Keywords: AI-Based Methods, Autonomous Agents, Legged Robots
Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot’s controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unusual scenarios successfully. This presents an open challenge to current learning methods, which often struggle with generalization to the long tail of unexpected situations without heavy human supervision. To address this issue, we investigate how to leverage the broad knowledge about the structure of the world and commonsense reasoning capabilities of vision-language models (VLMs) to aid legged robots in handling difficult, ambiguous situations. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection with VLMs: (1) in-context adaptation over previous robot interactions and (2) planning multiple skills into the future and replanning. We evaluate VLM-PC on several challenging real-world obstacle courses, involving dead ends and climbing and crawling, on a Go1 quadruped robot. Our experiments show that by reasoning over the history of interactions and future plans, VLMs enable the robot to autonomously perceive, navigate, and act in a wide range of complex scenarios that would otherwise require environment-specific engineering or human guidance.
|
|
10:15-10:20, Paper ThBT14.5 | |
Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Manipulation |
|
Li, Hang | Technical University of Munich |
Feng, Qian | Technical University of Munich |
Zheng, Zhi | TUM |
Feng, Jianxiang | Technical University of Munich (TUM) |
Chen, Zhaopeng | University of Hamburg |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Imitation Learning, Manipulation Planning, Learning from Demonstration
Abstract: Learning from demonstrations faces challenges in generalizing beyond the training data and often lacks collision awareness. This paper introduces Lan-o3dp, a language-guided object-centric diffusion policy framework that can adapt to unseen situations such as cluttered scenes, shifting camera views, and ambiguous similar objects, while offering training-free collision avoidance and achieving a high success rate with few demonstrations. We train a diffusion model conditioned on 3D point clouds of task-relevant objects to predict the robot's end-effector trajectories, enabling it to complete the tasks. During inference, we incorporate cost optimization into the denoising steps to guide the generated trajectory to be collision-free. We leverage open-set segmentation to obtain the 3D point clouds of related objects and use a large language model to identify the target objects and possible obstacles by interpreting the user's natural language instructions. To effectively guide the conditional diffusion model using a time-independent cost function, we propose a novel guided generation mechanism based on the estimated clean trajectories. In simulation, we show that the diffusion policy based on the object-centric 3D representation achieves a much higher success rate (68.7%) compared to baselines with simple 2D (39.3%) and 3D scene (43.6%) representations, across 21 challenging RLBench tasks with only 40 demonstrations. In real-world experiments, we extensively evaluated the generalization in various unseen situations and validated the effectiveness of the proposed zero-shot cost-guided collision avoidance.
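A simplified, assumption-laden sketch of the cost-guided denoising idea: at each step the clean trajectory is estimated from the noisy sample, a differentiable collision cost is evaluated on that estimate, and the sample is nudged down the cost gradient. The noise model, cost function, and schedule variable are placeholders, and a full sampler would re-noise the result for the next diffusion step.

import torch

def guided_denoise_step(noise_model, x_t, t, alpha_bar_t, collision_cost, guide_scale=1.0):
    # One cost-guided denoising update (DDPM-style notation).
    # noise_model(x_t, t) predicts the noise; collision_cost maps a trajectory to a scalar.
    a = torch.as_tensor(alpha_bar_t, dtype=x_t.dtype)
    x_t = x_t.detach().requires_grad_(True)
    eps = noise_model(x_t, t)
    x0_hat = (x_t - torch.sqrt(1.0 - a) * eps) / torch.sqrt(a)   # estimated clean trajectory
    grad = torch.autograd.grad(collision_cost(x0_hat).sum(), x_t)[0]
    return (x0_hat - guide_scale * grad).detach()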
|
|
10:20-10:25, Paper ThBT14.6 | |
This&That: Language-Gesture Controlled Video Generation for Robot Planning |
|
Wang, Boyang | University of Michigan |
Sridhar, Nikhil | University of Michigan |
Feng, Chao | University of Michigan - Ann Arbor |
Van der Merwe, Mark | University of Michigan |
Fishman, Adam | OpenAI |
Fazeli, Nima | University of Michigan |
Park, Jeong Joon | University of Michigan, Ann Arbor |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Embodied Cognitive Science
Abstract: Clear, interpretable instructions are invaluable for complex tasks, helping to clarify goals and anticipate necessary steps. In this work, we propose a robot learning framework for communicating, planning, and executing a wide range of tasks, dubbed This&That. This&That solves general tasks by leveraging video generative models, which, through training on internet-scale data, contain rich physical and semantic context. Through this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intent, and 3) translating visual plans into robot actions. This&That adds gesture conditioning alongside language to generate video predictions, as a succinct and unambiguous alternative to existing language-only methods, especially in complex and uncertain environments. These video predictions are then fed into a behavior cloning architecture dubbed Diffusion Video to Action (DiVA), which outperforms prior state-of-the-art behavior cloning and video-based planning methods by substantial margins.
|
|
ThBT15 |
403 |
Robot Safety |
Regular Session |
Chair: Vela, Patricio | Georgia Institute of Technology |
Co-Chair: Koga, Shumon | Kobe University |
|
09:55-10:00, Paper ThBT15.1 | |
Quantifying the Risk of Unmapped Associations for Mobile Robot Localization Safety |
|
Chen, Yihe | Illinois Insitute of Technology |
Pervan, Boris | Illinois Institute of Technology |
Spenko, Matthew | Illinois Institute of Technology |
Keywords: Robot Safety, Localization, Integrity Risk, Probability and Statistical Methods
Abstract: Integrity risk is a measure of localization safety that accounts for the presence of undetected sensor faults. The metric has been used for decades in aviation and has recently been applied to terrestrial robots operating in life-critical missions. For ground vehicles, integrity risk can be quantified for systems using lidar measurements, where two specific fault types have been identified: miss-association and unmapped association. While miss-association faults, which occur when a correctly extracted feature is associated to the wrong landmark, have been well studied, the probability of an unmapped association fault, where an incorrectly extracted feature is associated to a landmark, is not well understood. Namely, previous research has never quantified this value and instead relies on an assumed value that has not been properly justified. This work is the first to provide a methodology that estimates the risk of unmapped association for each mapped landmark; the paper demonstrates the effect of this probability for both the chi-squared and fixed-lag smoothing methods for integrity monitoring. Data collected in downtown Chicago, IL, USA was used to test the proposed approach.
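For context, the chi-squared monitor referenced above builds on a standard residual-based gate like the one below: a measurement innovation is accepted only if its normalized squared magnitude stays under a chi-squared threshold set by an allowed false-alarm probability. This is generic textbook machinery, not the paper's full integrity-risk derivation.

import numpy as np
from scipy.stats import chi2

def passes_chi_squared_gate(innovation, innovation_cov, p_false_alarm=1e-3):
    # Chi-squared fault-detection gate on a measurement innovation.
    # Returns (accept, test_statistic, threshold).
    nu = np.asarray(innovation, dtype=float)
    S = np.asarray(innovation_cov, dtype=float)
    stat = float(nu @ np.linalg.solve(S, nu))             # squared Mahalanobis distance
    thresh = chi2.ppf(1.0 - p_false_alarm, df=nu.size)
    return stat <= thresh, stat, thresh

# Example: a 2-D lidar feature innovation checked against a mapped landmark
# (innovation and covariance values are illustrative).
accept, stat, thr = passes_chi_squared_gate(np.array([0.12, -0.05]),
                                            np.diag([0.02, 0.02]))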
|
|
10:00-10:05, Paper ThBT15.2 | |
Control Strategies for Pursuit-Evasion under Occlusion Using Visibility and Safety Barrier Functions |
|
Zhou, Minnan | University of California, San Diego |
Shaikh, Mustafa | University of California, San Diego |
Chaubey, Vatsalya | University of California, San Diego |
Haggerty, Patrick | General Dynamics Mission Systems |
Koga, Shumon | Honda Research and Development |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Sensor-based Control, Vision-Based Navigation, Robot Safety
Abstract: This paper develops a control strategy for pursuit-evasion problems in environments with occlusions. We address the challenge of a mobile pursuer keeping a mobile evader within its field of view (FoV) despite line-of-sight obstructions. The signed distance function (SDF) of the FoV is used to formulate visibility as a control barrier function (CBF) constraint on the pursuer's control inputs. Similarly, obstacle avoidance is formulated as a CBF constraint based on the SDF of the obstacle set. While the visibility and safety CBFs are Lipschitz continuous, they are not differentiable everywhere, necessitating the use of generalized gradients. To achieve non-myopic pursuit, we generate reference control trajectories leading to evader visibility using a sampling-based kinodynamic planner. The pursuer then tracks this reference via convex optimization under the CBF constraints. We validate our approach in CARLA simulations and real-world robot experiments, demonstrating successful visibility maintenance using only onboard sensing, even under severe occlusions and dynamic evader movements.
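A minimal sketch of the control step implied by the abstract, specialized to single-integrator dynamics: a quadratic program tracks a reference input while enforcing one barrier constraint built from the field-of-view SDF and one from the obstacle SDF. The barrier values, gradients, and class-K gain alpha are assumed to be supplied by the perception stack; the paper's generalized gradients and kinodynamic reference generation are not reproduced.

import numpy as np
import cvxpy as cp

def cbf_qp(u_ref, h_vis, grad_h_vis, h_safe, grad_h_safe, alpha=1.0, u_max=1.5):
    # One CBF-filtered control step for single-integrator dynamics x_dot = u.
    # h_* are barrier values (signed distances), grad_h_* their gradients at the state.
    u = cp.Variable(2)
    constraints = [
        grad_h_vis @ u >= -alpha * h_vis,     # keep the evader inside the field of view
        grad_h_safe @ u >= -alpha * h_safe,   # stay outside the obstacle set
        cp.norm(u, 2) <= u_max,
    ]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    prob.solve()
    return u.value

# Example call with hypothetical barrier evaluations from SDF lookups.
u = cbf_qp(u_ref=np.array([1.0, 0.0]),
           h_vis=0.4, grad_h_vis=np.array([0.0, 1.0]),
           h_safe=0.8, grad_h_safe=np.array([-1.0, 0.0]))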
|
|
10:05-10:10, Paper ThBT15.3 | |
Dynamic Gap: Safe Gap-Based Navigation in Dynamic Environments |
|
Asselmeier, Maxwell | Georgia Institute of Technology |
Ahuja, Dhruv | Georgia Institute of Technology |
Zaro, Abdel | University of California, Berkeley |
Abuaish, Ahmad | Georgia Institute of Technology |
Zhao, Ye | Georgia Institute of Technology |
Vela, Patricio | Georgia Institute of Technology |
Keywords: Vision-Based Navigation, Motion and Path Planning, Collision Avoidance
Abstract: This paper extends the family of gap-based local planners to unknown dynamic environments through generating provably collision-free properties for hierarchical navigation systems. Existing perception-informed local planners that operate in dynamic environments rely on emergent or empirical robustness for collision avoidance as opposed to performing formal analysis of dynamic obstacles. In addition to this, the obstacle tracking that is performed in these existent planners is often achieved with respect to a global inertial frame, subjecting such tracking estimates to transformation errors from odometry drift. The proposed local planner, dynamic gap, shifts the tracking paradigm to modeling how the free space, represented as gaps, evolves over time. Gap crossing and closing conditions are developed to aid in determining the feasibility of passage through gaps, and a breadth of simulation benchmarking is performed against other navigation planners in the literature where the proposed dynamic gap planner achieves the highest success rate out of all planners tested in all environments.
|
|
10:10-10:15, Paper ThBT15.4 | |
Conformalized Reachable Sets for Obstacle Avoidance with Spheres |
|
Kwon, Yong Seok | University of Michigan |
Michaux, Jonathan | University of Michigan |
Isaacson, Seth | University of Michigan |
Zhang, Bohao | University of Michigan |
Ejakov, Matthew | University of Michigan |
Skinner, Katherine | University of Michigan |
Vasudevan, Ram | University of Michigan |
Keywords: Robot Safety, Planning under Uncertainty, Constrained Motion Planning
Abstract: Safe motion planning algorithms are necessary for deploying autonomous robots in unstructured environments. Motion plans must be safe to ensure that the robot does not harm humans or damage any nearby objects. Generating these motion plans in real time is also important to ensure that the robot can adapt to sudden changes in its environment. Many trajectory optimization methods introduce heuristics that balance safety and real-time performance, potentially increasing the risk of the robot colliding with its environment. This paper addresses this challenge by proposing Conformalized Reachable Sets for Obstacle Avoidance With Spheres (CROWS). CROWS is a novel real-time, receding-horizon trajectory planner that generates probabilistically safe motion plans. Offline, CROWS learns a novel neural network-based representation of a sphere-based reachable set that overapproximates the swept volume of the robot's motion. CROWS then uses conformal prediction to compute a confidence bound that provides a probabilistic safety guarantee on the learned reachable set. At runtime, CROWS performs trajectory optimization to select a trajectory that is probabilistically guaranteed to be collision-free. We demonstrate that CROWS outperforms a variety of state-of-the-art methods in solving challenging motion planning tasks in cluttered environments while remaining collision-free. Code, data, and video demonstrations can be found at https://roahmlab.github.io/crows/.
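The conformal step can be illustrated with a standard split-conformal quantile: on a calibration set, measure how far the true swept-volume distances exceed the predicted sphere radii, and inflate all radii by the (1 - alpha) conformal quantile of those scores. The calibration arrays and nominal radius referenced below are hypothetical.

import numpy as np

def conformal_radius_inflation(pred_radii_cal, true_dists_cal, alpha=0.05):
    # Split-conformal bound: scores measure how much the predicted sphere radius
    # under-approximates the true swept-volume distance on held-out calibration data.
    scores = np.asarray(true_dists_cal, dtype=float) - np.asarray(pred_radii_cal, dtype=float)
    n = scores.size
    k = int(np.ceil((n + 1) * (1.0 - alpha)))          # conformal quantile index
    k = min(k, n)                                      # clamp when alpha is very small
    return float(np.sort(scores)[k - 1])

# Usage: inflate every predicted radius by the bound before collision checking, so the
# true reachable set is covered with probability at least 1 - alpha (marginally), e.g.:
# safe_radii = predicted_radii + max(conformal_radius_inflation(r_cal, d_cal), 0.0)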
|
|
10:15-10:20, Paper ThBT15.5 | |
System-Level Safety Monitoring and Recovery for Perception Failures in Autonomous Vehicles |
|
Chakraborty, Kaustav | University of Southern California |
Feng, Zeyuan | Stanford University |
Veer, Sushant | NVIDIA |
Sharma, Apoorva | NVIDIA |
Ivanovic, Boris | NVIDIA |
Pavone, Marco | Stanford University |
Bansal, Somil | Stanford University |
Keywords: Intelligent Transportation Systems, Failure Detection and Recovery, Autonomous Vehicle Navigation
Abstract: The safety-critical nature of autonomous vehicle (AV) operation necessitates the development of task-relevant algorithms that can reason about safety at the system level and not just at the component level. To reason about the impact of a perception failure on the entire system performance, such task-relevant algorithms must contend with various challenges: the complexity of AV stacks, high uncertainty in the operating environments, and the need for real-time performance. To overcome these challenges, in this work, we introduce a Q-network called SPARQ (abbreviation for Safety evaluation for Perception And Recovery Q-network) that evaluates the safety of a plan generated by a planning algorithm, accounting for perception failures that the planning process may have overlooked. This Q-network can be queried during system runtime to assess whether a proposed plan is safe for execution or poses potential safety risks. If a violation is detected, the network can then recommend a corrective plan while accounting for the perceptual failure. We validate our algorithm using the NuPlan-Vegas dataset, demonstrating its ability to handle cases where a perception failure compromises a proposed plan while the corrective plan remains safe. We observe an overall accuracy and recall of 90% while sustaining a frequency of 42 Hz on the unseen testing dataset. We compare our performance to a popular reachability-based baseline and analyze some interesting properties of our approach in improving the safety of an AV pipeline.
|
|
10:20-10:25, Paper ThBT15.6 | |
Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents |
|
Pizarro Bejarano, Federico | University of Toronto |
Brunke, Lukas | University of Toronto |
Schoellig, Angela P. | TU Munich |
Keywords: Robot Safety, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Reinforcement learning (RL) controllers are flexible and performant but rarely guarantee safety. Safety filters impart hard safety guarantees to RL controllers while maintaining flexibility. However, safety filters can cause undesired behaviours due to the separation between the controller and the safety filter, often degrading performance and robustness. In this paper, we analyze several modifications to incorporating the safety filter in training RL controllers rather than solely applying it during evaluation. The modifications allow the RL controller to learn to account for the safety filter. This paper presents a comprehensive analysis of training RL with safety filters, featuring simulated and real-world experiments with a Crazyflie 2.0 drone. We examine how various training modifications and hyperparameters impact performance, sample efficiency, safety, and chattering. Our findings serve as a guide for practitioners and researchers focused on safety filters and safe RL.
|
|
ThBT16 |
404 |
Soft Robotics 1 |
Regular Session |
Chair: Dorsey, Kristen | Northeastern University |
Co-Chair: Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
|
09:55-10:00, Paper ThBT16.1 | |
Pneumatic Logic Systems for Selectively Operating Distributed Pneumatic Elements |
|
Ferrin Pozuelo, Rafael | National Institute of Advanced Industrial Science and Technology |
Tomita, Kohji | National Institute of Advanced Industrial Science AndTechnology |
Kamimura, Akiya | National Institute of Advanced Industrial Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Hydraulic/Pneumatic Actuators
Abstract: Microfluidic and pneumatic logic systems are valuable for applications such as lab-on-a-chip devices, soft robotics, and factory automation. These systems are particularly advantageous when metal or electronic components are impractical or when there are constraints on the control system volume or weight. This paper introduces a novel individual membrane valve that functions as a set-reset latch and can reduce the number of valves required for some pneumatic or microfluidic logic systems. An application of pneumatic logic systems in soft robotics is the access to multiple tethered pneumatic elements through a reduced number of pneumatic lines. To this end, this paper proposes two pneumatic logic systems capable of selecting among multiple distributed sets of pneumatic elements and operating the elements of the set simultaneously and independently through the different pneumatic lines. The selection is achieved via a sequence of pressure pulses applied on the same lines used afterwards for operation. Two prototypes of these pneumatic logic systems were built and successfully demonstrated, consisting primarily of set-reset membrane valves and powered by binary high/low pressure sources. The first prototype features a hierarchical network with four lines and five sets of three pneumatic elements each; the second prototype features a non-hierarchical network with five lines and twelve sets of four pneumatic elements each.
|
|
10:00-10:05, Paper ThBT16.2 | |
Helical Structured Soft Growing Robot for Hazardous Gas Suction in Inaccessible Environments |
|
Lee, Sanghun | Korea Advanced Institute of Science and Technology |
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Seo, Dongoh | Korea Advanced Institute of Science and Technology |
Park, Shinwoo | KAIST |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots
Abstract: Immediate removal of hazardous gases is critical for ensuring safety. Traditional methods, such as portable ventilation equipment, are difficult to use when hazardous gases are released in inaccessible environments. In this paper, we propose a novel mechanism that integrates an inflatable helical structure into a soft growing robot. The proposed mechanism is capable of performing suction through its inner channel after navigating complex environments, while maintaining the inherent advantages of the soft growing robot as it grows. The mechanism operates in two phases: a growing phase, in which the robot extends by eversion, and a suction phase, in which suction is performed through the inner channel of the robot. Experiments and demonstrations were conducted to evaluate the performance of the proposed mechanism. The experimental results confirmed the ability to maintain the passageway shape of the inner channel during suction operations and provided a design guideline. The demonstration validated that the mechanism can effectively navigate inaccessible environments and perform suction to remove hazardous gases.
|
|
10:05-10:10, Paper ThBT16.3 | |
Shape-Programming Robotic Reflectors for Wireless Networks |
|
Liu, Yawen | Carnegie Mellon University |
Prabhakara, Akarsh | Carnegie Mellon University |
Zhu, Jiangyifei | Carnegie Mellon University |
Qiao, Shenyi | Carnegie Mellon University |
Kumar, Swarun | Carnegie Mellon University |
Keywords: Soft Robot Applications, Automation Technologies for Smart Cities, Sensor Networks
Abstract: With the increasing use of wireless technologies in robotics for communication, sensing, and localization, the potential benefits of how robotics can complement and enhance wireless systems remain underexplored. This paper explores a novel application of existing inflatable robots for wireless communication systems: forming a shape-programming, reflective waveguide that enhances the received signal quality for wireless devices. Our primary target is enhancing Low-Power Wide-Area Networks (LPWANs), where 10-year battery-powered client devices (e.g., energy meters or smart home sensors) connect to cellular-like powered base stations to deliver data. Devices in these networks often experience significant seasonal variability in battery life; even simple obstructions between the device and base station (e.g., due to construction) can shave off years of battery life. We propose MetaMorph, a programmable robotic reflector attached to base stations that improves signal quality from client devices by enhancing the received signal energy with controlled reflections. We investigate the design of the reflector, and our experiments show the ability to improve signal quality for LPWAN (LoRa) communication systems, demonstrating both signal-quality and battery-life benefits. To the best of our knowledge, MetaMorph is the first work to explore how flexible robotics can serve as virtuous reflectors for wireless communication systems.
|
|
10:10-10:15, Paper ThBT16.4 | |
MORF: Magnetic Origami Reprogramming and Folding System for Repeatably Reconfigurable Structures with Fold Angle Control |
|
Unger, Gabriel | University of Pennsylvania |
Shenoy, Sridhar | University of Pennsylvania |
Li, Tianyu | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Sung, Cynthia | University of Pennsylvania |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: We present the Magnetic Origami Reprogramming and Folding System (MORF), a magnetically reprogrammable system capable of precise shape control, repeated transformations, and adaptive functionality for robotic applications. Unlike current self-folding systems, which often lack re-programmability or lose rigidity after folding, MORF generates stiff structures over multiple folding cycles without degradation in performance. The ability to reconfigure and maintain structural stability is crucial for tasks such as reconfigurable tooling. The system utilizes a thermoplastic layer sandwiched within a thin magnetically responsive laminate sheet, enabling structures to self-fold in response to a combination of external magnetic field and heating. We demonstrate that the resulting folded structures can bear loads over 40 times their own weight and can undergo up to 50 cycles of repeated transformations without losing structural integrity. We showcase these strengths in a reconfigurable tool for unscrewing and screwing bolts and screws of various sizes, allowing the tool to adapt its shape to different bolt sizes while withstanding the mechanical stresses involved. This capability highlights the system’s potential for task-varying, load-bearing applications in robotics, where both versatility and durability are essential.
|
|
10:15-10:20, Paper ThBT16.5 | |
Tunable Leg Stiffness in a Monopedal Hopper for Energy-Efficient Vertical Hopping across Varying Ground Profiles |
|
Chen, Rongqian | George Washington University |
Kwon, Jun | University of Pennsylvania |
Wu, Kefan | University of Connecticut |
Chen, Wei-Hsi | University of Pennsylvania |
Keywords: Soft Robot Applications, Legged Robots, Mechanism Design
Abstract: We present the design and implementation of HASTA (Hopper with Adjustable Stiffness for Terrain Adaption), a vertical hopping robot with real-time tunable leg stiffness, aimed at optimizing energy efficiency across various ground profiles (a pair of ground stiffness and damping conditions). By adjusting leg stiffness, we aim to maximize apex hopping height, a key metric for energy-efficient vertical hopping. We hypothesize that softer legs perform better on soft, damped ground by minimizing penetration and energy loss, while stiffer legs excel on hard, less damped ground by reducing limb deformation and energy dissipation. Through experimental tests and simulations, we find the best leg stiffness within our selection for each combination of ground stiffness and damping, enabling the robot to achieve maximum steady-state hopping height with a constant energy input. These results support our hypothesis that tunable stiffness improves energy-efficient locomotion in controlled experimental conditions. In addition, the simulation provides insights that could aid in future development of controllers for selecting leg stiffness.
|
|
10:20-10:25, Paper ThBT16.6 | |
Online Learning Based Shape Control for a Soft Manipulator Based on Spatial Features Feedback |
|
Shen, Yi | Huazhong University of Science and Technology |
Zhang, Jinghao | Huazhong University of Science and Technology |
Yuan, Ye | Huazhong University of Science and Technology |
Zhang, Fumin | Hong Kong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Although soft manipulators are endowed with compliance and flexibility, most control strategies focus on end-effector control and lack shape control ability. This letter aims to design a shape controller for the soft manipulator. Firstly, we establish a modified forward kinematics model (FKM) based on the long-short-term-memory (LSTM) neural network to describe the mapping between actuation inputs and spatial features. The spatial features consist of the backbone curve and contour features. The backbone curve is represented by the piecewise Bézier curve under geometrically continuous constraint. The contour features are extracted from the camera-generated point cloud. Besides, an adaptive online learning based shape controller (OLSC) is designed by online back-propagating shape error. The stability of OLSC is proved based on the Lyapunov theorem. Finally, the random excitation model validation experiment demonstrates the prediction accuracy of the proposed modified FKM, and the shape control experiments in air and water validate the effectiveness of the proposed OLSC.
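A small sketch of the backbone-curve representation mentioned above: cubic Bezier segments chained with a G1 (tangent-direction) continuity constraint, which is one common way to enforce geometric continuity between segments. The control points and tension parameter are illustrative, not taken from the paper.

import numpy as np

def bezier_point(ctrl, t):
    # Evaluate a cubic Bezier segment (4 x 3 control points) at parameter t in [0, 1].
    b = np.array([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    return b @ ctrl

def g1_continue(prev_ctrl, p2, p3, tension=1.0):
    # Build the next segment so the joint is G1-continuous: its first interior control
    # point lies on the line through the previous segment's last two control points.
    p0 = prev_ctrl[3]
    tangent = prev_ctrl[3] - prev_ctrl[2]
    p1 = p0 + tension * tangent
    return np.vstack([p0, p1, p2, p3])

# Two-segment backbone curve, sampled densely (control points are illustrative).
seg1 = np.array([[0, 0, 0], [0.1, 0, 0.05], [0.2, 0.05, 0.1], [0.3, 0.1, 0.1]], float)
seg2 = g1_continue(seg1, p2=np.array([0.5, 0.2, 0.1]), p3=np.array([0.6, 0.3, 0.05]))
curve = np.array([bezier_point(s, t) for s in (seg1, seg2) for t in np.linspace(0, 1, 50)])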
|
|
10:25-10:30, Paper ThBT16.7 | |
Augmenting Compliance with Motion Generation through Imitation Learning Using Drop-Stitch Reinforced Inflatable Robot Arm with Rigid Joints |
|
Gubbala, Gangadhara Naga Sai | Waseda University |
Nagashima, Masato | Waseda University |
Mori, Hiroki | Waseda University |
Seong, Young Ah | The University of Tokyo |
Sato, Hiroki | The University of Tokyo |
Niiyama, Ryuma | Meiji University |
Suga, Yuki | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Modeling, Control, and Learning for Soft Robots, Deep Learning Methods, Soft Robot Materials and Design
Abstract: Safe physical human-robot collaboration can be possible with soft robots due to their inherent compliance and low inertia. Soft bodies provide passive compliance and adaptability due to their deformations, but these same characteristics also lead to difficulty in dynamic control and mathematical modeling. We focus on motion generation for a 3-DOF (Degree of freedom) inflatable robot arm, consisting of soft inflatable body links and rigid joints. This research explores the limitations of relying only on soft robot compliance for contact-based tasks. Our goal is to generate adaptive motion for contact-based tasks by exploiting the compliance of the soft links. We compare contact-based tasks for the inflatable robot with and without a learning model. This shows improved performance when soft robot compliance is augmented with imitation learning. The combination of soft robot compliance and the machine learning model's adaptability shows the potential for collaborative robots to interact with humans and their surroundings safely.
|
|
ThBT17 |
405 |
Planning, Scheduling and Coordination |
Regular Session |
Chair: Pecora, Federico | Amazon Robotics |
Co-Chair: Rastgoftar, Hossein | University of Arizona |
|
09:55-10:00, Paper ThBT17.1 | |
Safe Human-UAS Collaboration from High-Level Planning to Low-Level Tracking (I) |
|
Rastgoftar, Hossein | University of Arizona |
Keywords: Planning, Scheduling and Coordination, Intention Recognition, Aerial Systems: Applications
Abstract: This paper studies the problem of safe human-uncrewed aerial system (UAS) collaboration in a shared work environment. By considering the human and UAS as co-workers, we use Petri Nets to abstractly model the evolution of shared tasks assigned to human and UAS co-workers. In particular, the Petri Nets’ “places” represent work stations; therefore, the Petri Nets’ transitions can formally specify displacements between the work stations. The paper’s first objective is to incorporate uncertainty regarding the intentions of human co-workers into motion planning for the UAS, when the UAS closely interacts with human co-workers. To this end, the proposed Petri Nets model uses “conflict” constructs to represent situations in which the UAS deals with incomplete knowledge about human co-worker intention. The paper’s second objective is then to plan the motion of the UAS in a resilient and safe manner in the presence of non-cooperative human co-workers. In order to achieve this objective, UAS equipped with onboard perception and decision-making capabilities are able to, through real-time processing of in-situ observations, predict human intention, quantify human distraction, and apply a non-stationary Markov Decision Process (MDP) model to safely plan UAS motion in the presence of uncertainty. Given the current and next UAS waypoints, the paper applies Pontryagin's minimum principle to plan the desired trajectory of the UAS and uses the feedback linearization method for trajectory tracking control.
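Between two waypoints with fixed boundary velocities, applying Pontryagin's minimum principle to a double integrator with an integral-of-squared-control cost yields a linear optimal acceleration and hence a cubic position profile; the sketch below solves for that cubic. The segment time and boundary values are illustrative, and the paper's exact cost functional may differ.

import numpy as np

def min_effort_trajectory(x0, v0, xf, vf, T):
    # Minimum-control-effort (integral of u^2) trajectory of a double integrator between
    # fixed boundary position/velocity in fixed time T. Pontryagin's minimum principle
    # gives a linear optimal acceleration, i.e. a cubic position profile (Hermite form).
    x0, v0, xf, vf = map(np.atleast_1d, (x0, v0, xf, vf))
    a0, a1 = x0, v0
    a2 = (3 * (xf - x0) - (2 * v0 + vf) * T) / T ** 2
    a3 = (2 * (x0 - xf) + (v0 + vf) * T) / T ** 3
    def pos(t):
        return a0 + a1 * t + a2 * t ** 2 + a3 * t ** 3
    def acc(t):
        return 2 * a2 + 6 * a3 * t
    return pos, acc

# Example: rest-to-rest segment between two 3D waypoints over 4 seconds.
pos, acc = min_effort_trajectory([0, 0, 1], [0, 0, 0], [2, 1, 1.5], [0, 0, 0], T=4.0)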
|
|
10:00-10:05, Paper ThBT17.2 | |
Reliable and Efficient Multi-Agent Coordination Via Graph Neural Network Variational Autoencoders |
|
Meng, Yue | Massachusetts Institute of Technology |
Majcherczyk, Nathalie | Worcester Polytechnic Institute |
Liu, Wenliang | Amazon |
Kiesel, Scott | Amazon |
Fan, Chuchu | Massachusetts Institute of Technology |
Pecora, Federico | Amazon Robotics |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Deep Learning Methods
Abstract: Multi-agent coordination is crucial for reliable multi-robot navigation in shared spaces such as automated warehouses. In regions of dense robot traffic, local coordination methods may fail to find a deadlock-free solution. In these scenarios, it is appropriate to let a central unit generate a global schedule that decides the passing order of robots. However, the runtime of such centralized coordination methods increases significantly with the problem scale. In this paper, we propose to leverage Graph Neural Network Variational Autoencoders (GNN-VAE) to solve the multi-agent coordination problem faster than through centralized optimization at scale. We formulate the coordination problem as a graph problem and collect ground truth data using a Mixed-Integer Linear Program (MILP) solver. During training, our learning framework encodes good quality solutions of the graph problem into a latent space. At inference time, solution samples are decoded from the sampled latent variables, and the lowest-cost sample is selected for coordination. By construction, our GNN-VAE framework returns solutions that always respect the constraints of the considered coordination problem. Numerical results show that our approach trained on small-scale problems can achieve high-quality solutions even for large-scale problems with 250 robots, being much faster than other baselines.
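The inference procedure described above reduces to a sample-decode-select loop; a skeletal version is shown below, with `decoder` and `cost_fn` standing in for the trained GNN decoder and the coordination cost, and the latent dimension and sample count chosen arbitrarily.

import torch

@torch.no_grad()
def sample_and_select(decoder, cost_fn, graph, latent_dim=16, n_samples=32):
    # Draw latent samples from the prior, decode each into a feasible coordination
    # solution for the given graph, and keep the cheapest one.
    best, best_cost = None, float("inf")
    z = torch.randn(n_samples, latent_dim)
    for zi in z:
        solution = decoder(graph, zi)          # e.g. a passing order per shared region
        c = cost_fn(graph, solution)
        if c < best_cost:
            best, best_cost = solution, c
    return best, best_cost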
|
|
10:05-10:10, Paper ThBT17.3 | |
Efficient Cross-Boundary Grasping in Stacked Clutter with Single-Visual Mapping Multi-Step |
|
Luo, Yudong | Dalian Maritime University |
Wang, Tong | Dalian Martime University |
Xie, Feiyu | Dalian Maritime University |
Zhao, Na | Dalian Maritime University |
Fu, Xianping | Dalian Maritime University |
Shen, Yantao | University of Nevada, Reno |
Keywords: Logistics, Factory Automation
Abstract: In logistics applications, the vision-based technology for grasping target objects in the air is relatively mature. However, when operating across the air and water, such as grasping marine products from the water, the visual information collected by the camera will be disturbed by ripples and bubbles on the water surface, resulting in low grasping efficiency. Therefore, we introduce a grasping strategy based on single-visual mapping for multi-step (SVMMS) operations, which is suitable for cross-medium operations involving stacked objects. Specifically, we design a multifunctional integrated network model based on Deep Q-learning, which extracts visual features from the scene to detect stacked objects and outputs their hierarchical relationships effectively. Moreover, we quantify the potential relationship between motion logic during action execution and changes in RGB-D information to help the robot achieve efficient and collision-free operations. Our approach also incorporates a time-series design with prioritized experience replay to optimize the action sequence globally. Additionally, we propose a novel sim2real method by combining domain randomization to address the difference in object sizes between the simulation and the real world. Extensive experiments in both simulation and physical environments show that SVMMS-Grasp significantly outperforms existing methods regarding task success rate, stability, and operational efficiency.
|
|
10:10-10:15, Paper ThBT17.4 | |
Efficient Second-Order Cone Programming for the Close Enough Traveling Salesman Problem |
|
Gutow, Geordan | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Keywords: Planning, Scheduling and Coordination, Optimization and Optimal Control, Motion and Path Planning
Abstract: When agents must execute multiple tasks at spatially distinct locations, it is common to formulate and solve a Traveling Salesman Problem (TSP) to find the order of locations (targets) that requires the smallest travel cost. Approaching such task sequencing problems as a TSP is restrictive, as it requires that a unique location be specified for each task. In reality, a set of acceptable locations might be available. The Close Enough Traveling Salesman Problem (CETSP) is a generalization of the TSP in which the agent need only visit a spherical neighborhood surrounding each target, and can thus address this task sequencing problem when any location in a sphere is acceptable. Prior work has developed a branch-and-bound approach that finds globally optimal solutions to instances of the CETSP by solving a sequence of Second-Order Cone Programs (SOCPs). We demonstrate that it is possible to eliminate 2/3 of the variables and 1/2 of the constraints in these SOCPs, show how to reuse computation and memory allocation across multiple SOCPs in the sequence, and propose a strategy to warm-start the SOCPs using solutions obtained earlier in the sequence. Collectively, these three changes halve the time required to solve 210 random CETSP instances to optimality. We also obtained improved lower bounds on 73 instances from the literature, including solving one instance to optimality for the first time.
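For intuition, the SOCP subproblem at the heart of such approaches (for a fixed visiting sequence, pick one point inside each target's sphere so the tour is shortest) can be sketched generically as below. This is not the authors' reduced formulation; it assumes numpy and cvxpy are available, and the data are made up:

    # Generic "touring" SOCP for a fixed target sequence: choose a visit point
    # inside each sphere to minimize total tour length from and back to a depot.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    n = 6
    centers = rng.uniform(0.0, 10.0, size=(n, 3))   # sphere centers (targets)
    radii = np.full(n, 1.0)                         # "close enough" radii
    depot = np.zeros(3)

    P = cp.Variable((n, 3))                         # visit point for each target
    legs = [cp.norm(P[0] - depot)]
    legs += [cp.norm(P[i + 1] - P[i]) for i in range(n - 1)]
    legs += [cp.norm(depot - P[n - 1])]
    constraints = [cp.norm(P[i] - centers[i]) <= radii[i] for i in range(n)]

    prob = cp.Problem(cp.Minimize(sum(legs)), constraints)
    prob.solve(warm_start=True)   # warm-starting is the kind of reuse exploited across many related SOCPs
    print("tour length:", prob.value)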
|
|
10:15-10:20, Paper ThBT17.5 | |
Decoupled Training Neural Solver for Dynamic Traveling Salesman Problem |
|
Lin, Shaoheng | South China University of Technology |
Cui, Hanyun | South China University of Technology |
Yang, Wang | South China University of Technology |
Jia, Ya-Hui | South China University of Technology |
Keywords: Planning, Scheduling and Coordination, Planning under Uncertainty, Task Planning
Abstract: Deep reinforcement learning (DRL) methods have achieved remarkable success in solving static traveling salesman problems (TSP). However, the dynamic TSP (DTSP), in which new customers appear randomly over time, introduces additional complexities that challenge DRL methods: obtaining an optimized routing policy becomes difficult, leading to sub-optimal results and reduced training efficiency. To address these issues, we propose a decoupled training neural solver (DTNS) based on the encoder-decoder architecture, a novel approach that decouples the optimization of the encoder and decoder, enhancing the model's ability to handle dynamic changes. Our method first trains under a Fore-Reveal condition, in which the information of all customer nodes is known in advance, to obtain an optimized encoder and an initialization for the decoder, and then fine-tunes the decoder in dynamic scenarios where customers are revealed over time. This training paradigm results in a flexible and globally optimized routing policy. Experimental results demonstrate that DTNS efficiently adapts to new customer requests in dynamic scenarios, outperforming existing methods in dynamic routing environments.
|
|
10:20-10:25, Paper ThBT17.6 | |
Multi-Drone-Truck Collaborative Delivery with En Route Operations: A Hierarchical MARL-Based Approach |
|
Hu, Shun | Tongji University |
Li, Bing | Tongji University |
Zhang, Rongqing | Tongji University |
Keywords: Planning, Scheduling and Coordination, Distributed Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Multi-drone-truck collaborative delivery, in which unmanned trucks serve as mobile supply stations for drones, effectively combines the strengths of both vehicle types and has wide application prospects. However, most existing literature restricts drone launch and retrieval operations (LARO) to a stationary truck, and potential collisions between drone routes are mostly ignored. This prevents the capability of the drones from being fully exploited. We address these gaps and introduce a new variant of multi-drone-truck collaborative delivery. Scheduling the drones and the truck, however, involves a high-dimensional solution space and complex constraints, making centralized solving nearly impossible. To this end, we develop a hierarchical solution framework that decomposes the complete problem into two levels of subproblems. The upper solver centrally allocates tasks and schedules when drones launch, while the lower solver, based on multi-agent reinforcement learning (MARL), plans paths for each drone agent in a decentralized but cooperative manner. In addition, we validate the effectiveness of our method by benchmarking it against three state-of-the-art approaches, demonstrating its superiority in terms of both efficiency and collision avoidance.
|
|
10:25-10:30, Paper ThBT17.7 | |
Risk-Aware Energy-Constrained UAV-UGV Cooperative Routing Using Attention-Guided Reinforcement Learning |
|
Mondal, Mohammad Safwan | University of Illinois Chicago |
Ramasamy, Subramanian | University of Illinois at Chicago |
Rownak, Ragib | University of Illinois Chicago |
Russo, Luca | University of Illinois at Chicago |
Humann, James | DEVCOM Army Research Laboratory |
Dotterweich, James | Army Research Laboratory |
Bhounsule, Pranav | University of Illinois at Chicago |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Autonomous Agents
Abstract: Maximizing the endurance of unmanned aerial vehicles (UAVs) in large-scale monitoring missions spanning wide areas requires addressing their limited battery capacity. Deploying unmanned ground vehicles (UGVs) as mobile recharging stations offers a practical solution, extending the UAVs' operational range. This introduces the challenge of optimizing UAV-UGV routes for efficient mission point coverage and seamless recharging coordination. In this paper, we present a risk-aware deep reinforcement learning (Ra-DRL) framework with a multi-head attention mechanism within an encoder-decoder transformer architecture to solve this cooperative routing problem for a UAV-UGV team. Our model minimizes mission time while accounting for the stochastic fuel consumption of the UAV, influenced by environmental factors such as wind velocity, ensuring adherence to a risk threshold to avoid mid-mission energy depletion. Extensive evaluations on various problem sizes show that our method significantly outperforms nearest-neighbor heuristics in both solution quality and risk management. We validate the Ra-DRL policy in a Gazebo-ROS SITL environment with a PX4-based custom UAV and a Clearpath Husky UGV. The results demonstrate the robustness and adaptability of our policy, making it highly effective for mission planning in dynamic, uncertain scenarios.
|
|
ThBT18 |
406 |
RADAR-Based Navigation |
Regular Session |
Chair: Khattak, Shehryar | NASA Jet Propulsion Laboratory |
Co-Chair: Heidingsfeld, Michael | CARIAD SE |
|
09:55-10:00, Paper ThBT18.1 | |
Ground-Aware Automotive Radar Odometry |
|
Casado Herraez, Daniel | University of Bonn & CARIAD SE |
Kaschner, Franz | Technical University of Munich |
Zeller, Matthias | CARIAD SE |
Muhle, Dominik | Technical University of Munich |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Cremers, Daniel | Technical University of Munich |
Stachniss, Cyrill | University of Bonn |
Keywords: SLAM, Localization, Autonomous Vehicle Navigation
Abstract: Odometry is crucial for the navigation of autonomous vehicles in unknown environments. While cameras and LiDARs are commonly used to estimate the ego-motion of a vehicle, these sensors face limitations under bad lighting and severe weather conditions. Automotive radars overcome these challenges, but radar point clouds are generally sparse and noisy, making it difficult to identify useful features within a radar scan. In this paper, we address the problem of ego-motion estimation using a single automotive radar sensor. We propose a simple, yet effective, heuristic-based method to extract the ground plane from single radar scans and perform ground plane matching between consecutive scans. Additionally, we perform a windowed factor-graph optimization of the poses together with the ground plane, improving the accuracy of the pose estimation. We put our work to the test using the 4DRadarDataset. Our findings illustrate the state-of-the-art performance of our odometry approach compared to existing alternatives that use radar point clouds.
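As a rough illustration of heuristic ground extraction from a single radar scan (the paper's own heuristic may differ), a RANSAC-style plane fit over the scan points could look like the following, assuming numpy is available and using made-up toy data:

    # Minimal RANSAC-style ground-plane fit on a point cloud: repeatedly fit a
    # plane to three random points and keep the plane with the most inliers.
    import numpy as np

    def fit_ground_plane(points, iters=200, inlier_tol=0.15, seed=2):
        """points: (N, 3) array; returns (normal, d) with n.x + d = 0."""
        rng = np.random.default_rng(seed)
        best_inliers, best_model = 0, None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            if np.linalg.norm(n) < 1e-6:
                continue                      # degenerate (collinear) sample
            n = n / np.linalg.norm(n)
            d = -n @ sample[0]
            inliers = int((np.abs(points @ n + d) < inlier_tol).sum())
            if inliers > best_inliers:
                best_inliers, best_model = inliers, (n, d)
        return best_model

    # Toy scan: mostly near-ground returns plus some clutter above the sensor plane.
    rng = np.random.default_rng(3)
    ground = np.c_[rng.uniform(-20, 20, (300, 2)), rng.normal(0.0, 0.05, 300)]
    clutter = rng.uniform([-20, -20, 0.5], [20, 20, 3.0], (50, 3))
    normal, d = fit_ground_plane(np.vstack([ground, clutter]))
    print("ground normal:", np.round(normal, 2))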
|
|
10:00-10:05, Paper ThBT18.2 | |
CAO-RONet: A Robust 4D Radar Odometry with Exploring More Information from Low-Quality Points |
|
Li, Zhiheng | Northeastern University |
Cui, Yubo | Northeastern University |
Huang, Ningyuan | Northeastern University |
Pang, Chenglin | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Localization, SLAM, Visual Learning
Abstract: Recently, 4D millimetre-wave radar has exhibited more stable perception ability than LiDAR and cameras under adverse conditions (e.g., rain and fog). However, low-quality radar points hinder its application, especially for the odometry task, which requires dense and accurate matching. To fully explore the potential of 4D radar, we introduce a learning-based odometry framework enabling robust ego-motion estimation from limited and uncertain geometric information. First, for sparse radar points, we propose a local completion to supplement missing structures and provide a denser guideline for aligning two frames. Then, a context-aware association with a hierarchical structure flexibly matches points of different scales aided by feature similarity, and improves local matching consistency through correlation balancing. Finally, we present a window-based optimizer that uses historical priors to establish a coupled state estimation and correct errors of inter-frame matching. The superiority of our algorithm is confirmed on the View-of-Delft dataset, achieving around a 50% performance improvement over previous approaches and delivering accuracy on par with LiDAR odometry. The code will be released at https://github.com/NEU-REAL/CAO-RONet.
|
|
10:05-10:10, Paper ThBT18.3 | |
Radar Teach and Repeat: Architecture and Initial Field Testing |
|
Qiao, Xinyuan | University of Toronto |
Krawciw, Alec | University of Toronto |
Lilge, Sven | University of Toronto |
Barfoot, Timothy | University of Toronto |
Keywords: Field Robots, Autonomous Vehicle Navigation, Localization
Abstract: Frequency-modulated continuous-wave (FMCW) scanning radar has emerged as an alternative to spinning LiDAR for state estimation on mobile robots. Radar's longer wavelength is less affected by small particulates, providing operational advantages in challenging environments such as dust, smoke, and fog. This paper presents Radar Teach and Repeat (RT&R): a full-stack radar system for long-term off-road robot autonomy. RT&R can drive routes reliably in off-road cluttered areas without any GPS. We benchmark the radar system's closed-loop path-tracking performance and compare it to its 3D LiDAR counterpart. 11.8 km of autonomous driving was completed without interventions using only radar and gyro for navigation. RT&R was evaluated on four different routes with progressively less structured scene geometry. RT&R achieved lateral path-tracking root mean squared errors (RMSE) of 5.6 cm, 7.5 cm, and 12.1 cm as the routes became more challenging. These RMSE values are less than half of the width of one tire (24 cm) on our robot testing platform. These same routes have worst-case errors of 21.7 cm, 24.0 cm, and 43.8 cm. We conclude that radar is a viable alternative to LiDAR for long-term autonomy in challenging off-road scenarios. The implementation of RT&R is open-source and available at: https://github.com/utiasASRL/vtr3.
|
|
10:10-10:15, Paper ThBT18.4 | |
Structure-Aware Radar-Camera Depth Estimation |
|
Zhang, Fuyi | Zhejiang University |
Yu, Zhu | Zhejiang University |
Li, ChunHao | Zhejiang University |
Zhang, Runmin | Zhejiang University |
Bai, Xiaokai | Zhejiang University |
Zhou, Zili | Zhejiang University |
Cao, Siyuan | Zhejiang University |
Wang, Fang | Hangzhou City University |
Shen, Hui-liang | Zhejiang University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Visual Learning
Abstract: Radar has gained much attention in autonomous driving due to its accessibility and robustness. However, its standalone application for depth perception is constrained by issues of sparsity and noise. Radar-camera depth estimation offers a more promising complementary solution. Despite significant progress, current approaches fail to produce satisfactory dense depth maps, due to unsatisfactory processing of the sparse and noisy radar data. They constrain the regions of interest for radar points to rigid rectangular regions, which may introduce unexpected errors and confusion. To address these issues, we develop a structure-aware strategy for radar depth enhancement, which provides more targeted regions of interest by leveraging the structural priors of RGB images. Furthermore, we design a Multi-Scale Structure Guided Network to enhance radar features and preserve detailed structures, achieving accurate and structure-detailed dense metric depth estimation. Building on these, we propose a structure-aware radar-camera depth estimation framework, named SA-RCD. Extensive experiments demonstrate that our SA-RCD achieves state-of-the-art performance on the nuScenes dataset. Our code will be available at https://github.com/FreyZhangYeh/SA-RCD.
|
|
10:15-10:20, Paper ThBT18.5 | |
Doppler Former: Velocity Supervision of Raw Radar Data |
|
Zhao, Shuo | Megvii |
Sun, Wei | Fvidar |
Li, Huadong | MEGVII Technique |
Jiang, Zhaoying | Southeast University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Manufacturing
Abstract: Thanks to the high robustness of 4D millimeter-wave radar in various environments, it has been widely applied in the field of autonomous driving. Recent research has increasingly focused on utilizing raw radar data as a substitute for the sparse and noisy point cloud data. However, these approaches have not fully exploited the Doppler features present in the raw data. In this paper, we introduce the Doppler Former (DPF) module to efficiently extract velocity information from the target environment. DPF can be seamlessly integrated into most radar perception backbones and enhances their performance in downstream tasks. Additionally, we propose a new backbone, the Fully Complex Convolutional Network (FCCN), which is better suited to raw data. By incorporating the DPF module into FCCN, we achieved state-of-the-art (SOTA) performance on the RADIal dataset, with code available at https://github.com/coconut-zs/Fvidar-DopplerFormer.
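As a rough sketch of the kind of building block a fully complex convolutional backbone relies on (illustrative only, not the FCCN architecture; PyTorch is assumed available), a complex 2D convolution can be composed from two real convolutions:

    # Complex 2D convolution built from two real convolutions, applied to the
    # real/imaginary channels of a toy range-Doppler map. Illustrative sketch.
    import torch
    import torch.nn as nn

    class ComplexConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, k=3, padding=1):
            super().__init__()
            self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=padding)
            self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=padding)

        def forward(self, x_r, x_i):
            # (a + ib) convolved with (Wr + iWi) = (Wr*a - Wi*b) + i(Wr*b + Wi*a)
            real = self.conv_r(x_r) - self.conv_i(x_i)
            imag = self.conv_r(x_i) + self.conv_i(x_r)
            return real, imag

    x_r, x_i = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
    out_r, out_i = ComplexConv2d(1, 8)(x_r, x_i)
    print(out_r.shape, out_i.shape)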
|
|
10:20-10:25, Paper ThBT18.6 | |
Robust High-Speed State Estimation for Off-Road Navigation Using Radar Velocity Factors |
|
Nissov, Morten | NTNU |
Edlund, Jeffrey | Jet Propulsion Lab |
Spieler, Patrick | JPL |
Padgett, Curtis | JPL |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Khattak, Shehryar | NASA Jet Propulsion Laboratory |
Keywords: Field Robots, Sensor Fusion, Localization
Abstract: Enabling robot autonomy in complex environments for mission-critical applications requires robust state estimation, particularly under conditions where the exteroceptive sensors on which navigation depends can be degraded by environmental challenges, leading to mission failure. It is precisely in such challenging conditions that the potential of Frequency Modulated Continuous Wave (FMCW) radar sensors is highlighted: a complementary exteroceptive sensing modality with direct velocity-measuring capabilities. In this work we integrate radial speed measurements from an FMCW radar sensor, using a radial speed factor, to provide linear velocity updates in a sliding-window state estimator for fusion with LiDAR pose and IMU measurements. We demonstrate that this augmentation increases the robustness of the state estimator to challenging environmental conditions and to the negative effects they can have on vulnerable exteroceptive modalities. The proposed method is extensively evaluated in robotic field experiments conducted with an autonomous, full-scale, off-road vehicle operating at high speeds (~12 m/s) in complex desert environments. Furthermore, the robustness of the approach is demonstrated for cases of both simulated and real-world degradation of the LiDAR odometry performance, along with comparisons against state-of-the-art methods for radar-inertial odometry on public datasets.
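The core of a radial-speed factor is a simple residual between the measured Doppler speed of a return from a static point and the projection of the ego velocity onto that point's bearing. A minimal sketch follows; the sign convention and function name are assumptions for illustration, not the paper's implementation, and numpy is assumed available:

    # Residual between measured Doppler (radial) speed and the speed predicted
    # from the sensor's velocity, for a static world point seen in the radar frame.
    import numpy as np

    def radial_speed_residual(point_xyz, measured_radial_speed, ego_velocity):
        """point_xyz: target position in the radar frame (m);
        ego_velocity: sensor velocity in the same frame (m/s).
        For a static point, the expected Doppler speed is the negative projection
        of the ego velocity onto the unit bearing vector (sign convention assumed)."""
        bearing = point_xyz / np.linalg.norm(point_xyz)
        predicted = -bearing @ ego_velocity
        return measured_radial_speed - predicted

    # Driving forward at 10 m/s toward a point straight ahead gives ~-10 m/s
    # predicted radial speed under this convention; the residual should be small.
    print(radial_speed_residual(np.array([20.0, 0.0, 0.0]), -9.8,
                                np.array([10.0, 0.0, 0.0])))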
|
|
ThBT19 |
407 |
Active Sensing |
Regular Session |
Chair: Abraham, Ian | Yale University |
Co-Chair: Yau, Wei-Yun | I2R |
|
09:55-10:00, Paper ThBT19.1 | |
Graph-Based SLAM-Aware Exploration with Prior Topo-Metric Information |
|
Bai, Ruofei | Nanyang Technological University |
Guo, Hongliang | Agency for Science Technology and Research |
Yau, Wei-Yun | I2R |
Xie, Lihua | Nanyang Technological University |
Keywords: Planning under Uncertainty, SLAM, Autonomous Vehicle Navigation
Abstract: Autonomous exploration requires a robot to explore an unknown environment while constructing an accurate map using SLAM (Simultaneous Localization and Mapping) techniques. Without prior information, the exploration performance is usually conservative due to the limited planning horizon. This paper exploits a prior topo-metric graph of the environment to benefit both the exploration efficiency and the pose graph reliability in SLAM. Based on the relationship between pose graph reliability and graph topology, we formulate a SLAM-aware path planning problem over the prior graph, which finds a fast exploration path enhanced with the globally informative loop-closing actions to stabilize the SLAM pose graph. A greedy algorithm is proposed to solve the problem, in which we derive theoretical thresholds that significantly prune non-optimal loop-closing actions without affecting the potential informative ones. Furthermore, we incorporate the proposed planner into a hierarchical exploration framework, with flexible features including path replanning, and online prior graph update that adds additional information to the prior graph. Simulation and real-world experiments indicate that the proposed method can reliably achieve higher mapping accuracy than compared methods when exploring environments with rich topologies, while maintaining comparable exploration efficiency. Our method is open-sourced on GitHub.
|
|
10:00-10:05, Paper ThBT19.2 | |
Dynamic Multi-Objective Ergodic Path Planning Using Decomposition Methods |
|
Breitfeld, Abigail | Carnegie Mellon University |
Wettergreen, David | Carnegie Mellon University |
Keywords: Motion and Path Planning, Space Robotics and Automation, Field Robots
Abstract: Robots are often employed in hazardous or inaccessible environments, such as disaster sites, extraterrestrial terrains, agricultural fields, and ocean floors. Autonomous operation is crucial in these scenarios to reduce reliance on human operators and enable real-time decision-making. However, robots must balance multiple, often conflicting, objectives. These objectives are subject to change based on new data or evolving conditions. This paper presents a novel approach to dynamic multi-objective trajectory planning. The proposed method leverages the boundary intersection decomposition technique to adaptively plan trajectories that balance multiple evolving objectives. Our approach ensures efficient and effective exploration by continuously optimizing the trade-offs between changing objectives. We show that our method performs on average 34% better in terms of solution quality on the dynamic multi-objective trajectory planning problem as compared to prior work.
|
|
10:05-10:10, Paper ThBT19.3 | |
Rapid Autonomous Exploration of Large-Scale Environments for Ground Robots Based on Region Partitioning |
|
Wen, Zhi | Xidian University |
Liu, Xiaotao | Xidian University |
Lu, GaoJie | Xidian University |
Liu, Jing | Xidian University |
Keywords: Motion and Path Planning, Vision-Based Navigation, Wheeled Robots
Abstract: Autonomous exploration in large environments often leads to inefficient long backtracking, as distant targets are prioritized over closer ones. To address this issue, we propose a hierarchical planning method based on region partitioning. The space is dynamically partitioned at a coarse resolution, and as exploration progresses, regions with sufficiently large known areas are further subdivided to locate unknown areas more precisely. A utility function considering unknown area size, travel distance, and sequence similarity is used, and a simulated annealing algorithm generates a subregion sequence for global guidance. Within each subregion, a linear acceleration model helps select target points. This method reduces computational load and minimizes long-distance backtracking, enabling more efficient high-frequency planning. Extensive simulations and real-world tests show that our method significantly improves exploration efficiency compared to existing vision-based techniques.
|
|
10:10-10:15, Paper ThBT19.4 | |
MapEx: Indoor Structure Exploration with Probabilistic Information Gain from Global Map Predictions |
|
Ho, Cherie | Carnegie Mellon University |
Kim, Seungchan | Carnegie Mellon University |
Moon, Brady | Carnegie Mellon University |
Parandekar, Aditya | Birla Institute of Technology and Science, Pilani - Goa Campus |
Harutyunyan, Narek | Brown University |
Wang, Chen | University at Buffalo |
Sycara, Katia | Carnegie Mellon University |
Best, Graeme | University of Technology Sydney |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Planning under Uncertainty, Integrated Planning and Learning
Abstract: Exploration is a critical challenge in robotics, centered on understanding unknown environments. In this work, we focus on structured indoor environments, which often exhibit predictable, repeating patterns. Conventional frontier-based exploration approaches have difficulty leveraging this predictability, relying on simple heuristics such as 'closest first' for exploration. More recent deep learning-based methods predict unknown regions of the map for information gain computation, but these approaches are often sensitive to the predicted map quality or fail to account for sensor coverage. To overcome these issues, our key insight is to jointly reason over what the robot can observe and its uncertainty to calculate probabilistic information gain. We introduce MapEx, a new exploration framework that uses predicted maps to form a probabilistic sensor model for information gain estimation. MapEx generates multiple predicted maps based on observed information, and takes into consideration both the computed variances of the predicted maps and the estimated visible area to estimate the information gain of a given viewpoint. Experiments on the real-world KTH dataset showed an average improvement of 12.4% over a representative map-prediction-based exploration method and 25.4% over the nearest-frontier approach. Website: mapex-explorer.github.io
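A minimal sketch of this kind of ensemble-based gain (score a viewpoint by the prediction variance inside its estimated visible area) could look as follows. It is illustrative, not MapEx's actual information-gain formulation, and assumes numpy is available:

    # Score a candidate viewpoint by how much the ensemble of predicted occupancy
    # maps disagrees within the area the viewpoint is expected to see.
    import numpy as np

    def viewpoint_score(predicted_maps, visible_mask):
        """predicted_maps: (K, H, W) occupancy probabilities from K map predictions;
        visible_mask: (H, W) boolean mask of cells estimated visible from the viewpoint."""
        per_cell_var = predicted_maps.var(axis=0)         # disagreement across predictions
        return float(per_cell_var[visible_mask].sum())    # higher = more to be gained by looking

    rng = np.random.default_rng(4)
    maps = rng.uniform(0.0, 1.0, (8, 100, 100))           # toy ensemble of predicted maps
    mask = np.zeros((100, 100), dtype=bool)
    mask[40:60, 40:80] = True                             # toy visible region for one viewpoint
    print("viewpoint score:", viewpoint_score(maps, mask))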
|
|
10:15-10:20, Paper ThBT19.5 | |
Ergodic Trajectory Optimization on Generalized Domains Using Maximum Mean Discrepancy |
|
Hughes, Christian | Yale University |
Warren, Houston | University of Sydney |
Lee, Darrick | Univ. of Edinburgh |
Ramos, Fabio | University of Sydney, NVIDIA |
Abraham, Ian | Yale University |
Keywords: Motion and Path Planning, Integrated Planning and Control
Abstract: We present a novel formulation of ergodic trajectory optimization that can be specified over general domains using kernel maximum mean discrepancy. Ergodic trajectory optimization is an effective approach that generates coverage paths for problems such as robotic inspection, information gathering, and search and rescue. These optimization schemes compel the robot to spend time in a region proportional to the expected utility of visiting that region. Current methods for ergodic trajectory optimization rely on domain-specific knowledge, e.g., a defined utility map and well-defined spatial basis functions, to produce ergodic trajectories. Here, we present a generalization of ergodic trajectory optimization based on maximum mean discrepancy that requires only samples from the search domain. We demonstrate the ability of our approach to produce coverage trajectories on a variety of problem domains, including robotic inspection of objects with differential kinematic constraints and on Lie groups, without having access to domain-specific knowledge. Furthermore, we show favorable computational scaling compared to existing state-of-the-art methods for ergodic trajectory optimization, with a trade-off between domain-specific knowledge and computational scaling, thus extending the versatility of ergodic coverage to a wider range of application domains.
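The quantity such a formulation drives down is the kernel MMD between points along the trajectory and samples drawn from the target (utility) distribution. A minimal numpy sketch with an RBF kernel, illustrative only and not the paper's optimizer:

    # Squared kernel MMD between trajectory points and target-distribution samples.
    import numpy as np

    def rbf(a, b, sigma=0.5):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def mmd2(traj_pts, target_pts, sigma=0.5):
        kxx = rbf(traj_pts, traj_pts, sigma).mean()
        kyy = rbf(target_pts, target_pts, sigma).mean()
        kxy = rbf(traj_pts, target_pts, sigma).mean()
        return kxx + kyy - 2.0 * kxy

    rng = np.random.default_rng(5)
    trajectory = rng.uniform(0.0, 1.0, (200, 2))           # points visited by the robot
    target = rng.normal([0.5, 0.5], 0.1, (500, 2))         # samples from the utility distribution
    print("MMD^2:", mmd2(trajectory, target))               # smaller = more "ergodic" coverage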
|
|
10:20-10:25, Paper ThBT19.6 | |
Ergodic Exploration Over Meshable Surfaces |
|
Dong, Dayi, E | University of California Berkeley |
Xu, Albert | Carnegie Mellon University |
Gutow, Geordan | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Abraham, Ian | Yale University |
Keywords: Motion and Path Planning, Search and Rescue Robots, Computational Geometry
Abstract: Robotic search and rescue, exploration, and inspection require trajectory planning across a variety of domains. A popular approach to trajectory planning for these types of missions is ergodic search, which biases a trajectory to spend time in parts of the exploration domain that are believed to contain more information. Most prior work on ergodic search has been limited to searching simple surfaces, like a 2D Euclidean plane or a sphere, as it relies on projecting functions defined on the exploration domain onto analytically obtained Fourier basis functions. In this paper, we extend ergodic search to any surface that can be approximated by a triangle mesh. The basis functions are approximated through finite element methods on a triangle mesh of the domain. We formally prove that this approximation converges to the continuous case as the mesh approximation converges to the true domain. We demonstrate that on domains where analytical basis functions are available (plane, sphere), the proposed method obtains equivalent results, while on other domains (torus, bunny, wind turbine) the approach is versatile enough to still search effectively. Lastly, we also compare with an existing ergodic search technique that can handle complex domains and show that our method results in higher-quality exploration.
|
|
10:25-10:30, Paper ThBT19.7 | |
FALCON: Fast Autonomous Aerial Exploration Using Coverage Path Guidance |
|
Zhang, Yichen | The Hong Kong University of Science and Technology |
Chen, Xinyi | The Hong Kong University of Science and Technology |
Feng, Chen | Hong Kong University of Science and Technology |
Zhou, Boyu | Southern University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Motion and Path Planning, Autonomous Exploration
Abstract: This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. FALCON effectively harnesses the full potential of online generated coverage paths in enhancing exploration efficiency. The framework begins with an incremental connectivity-aware space decomposition and connectivity graph construction. Subsequently, a hierarchical planner generates a coverage path spanning the entire unexplored space, serving as a global guidance. Then, a local planner optimizes the frontier visitation order, consciously incorporating the intention of the global guidance. For fair and comprehensive benchmark experiments, we introduce a lightweight exploration planner evaluation environment that allows for comparing exploration planners across a variety of testing scenarios using an identical quadrotor simulator. Extensive benchmark experiments and ablation studies demonstrate the significant performance of FALCON. Real-world experiments conducted fully onboard further validate FALCON’s practical capability in complex and challenging environments.
|
|
ThBT20 |
408 |
Agricultural Automation 2 |
Regular Session |
Chair: Chowdhary, Girish | University of Illinois at Urbana Champaign |
Co-Chair: Cappelleri, David | Purdue University |
|
09:55-10:00, Paper ThBT20.1 | |
Improving Robotic Fruit Harvesting within Cluttered Environments through 3D Shape Completion |
|
Magistri, Federico | University of Bonn |
Pan, Yue | University of Bonn |
Bartels, Jake | Queensland University of Technology (QUT) |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Lehnert, Christopher | Queensland University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Perception for Grasping and Manipulation
Abstract: The world population is increasing and will, by 2050, nearly double its demand for food, feed, fuel, and fiber. Besides environmental challenges, labor shortages also pose crucial challenges to the agricultural production system. Automation of manual tasks in crop production can potentially increase efficiency but also lead to a change in agricultural practices toward more effective usage of available land. In this paper, we address the problem of robotic fruit harvesting in challenging real-world scenarios such as vertical farms, where robotic sensing and acting need to cope with a cluttered environment. Robotic fruit harvesting is typically done by directly detecting a grasp point in the sensor reading, which can lie on the fruit itself or on its peduncle depending on crop harvesting requirements. However, grasp point detection is not always possible, as the ideal grasp point may be hidden behind leaves or other fruits. Our approach exploits shape completion techniques, allowing us to estimate the complete 3D shape of a target fruit together with its pose even under strong occlusions. In this way, we can estimate a grasp point even when the fruit is only partially visible. We evaluate our approach on a real robotic manipulator operating in a vertical farm growing different fruit species and employing different harvesting tools. Our experiments show that, on average, our proposed pipeline increases the success rate by 18.5 percentage points, in terms of end-effector positioning, compared to the most competitive baseline reported in this work, which does not rely on shape completion.
|
|
10:00-10:05, Paper ThBT20.2 | |
P-AgSLAM: In-Row and Under-Canopy SLAM for Agricultural Monitoring in Cornfields |
|
Kim, Kitae | Purdue University |
Deb, Aarya | Purdue University |
Cappelleri, David | Purdue University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, SLAM
Abstract: In this paper, we present an in-row and under-canopy Simultaneous Localization and Mapping (SLAM) framework called the Purdue AgSLAM, or P-AgSLAM, which is designed for robot pose estimation and agricultural monitoring in cornfields. Our SLAM approach is primarily based on a 3D light detection and ranging (LiDAR) sensor and is designed for the extraction of unique morphological features of cornfields, which have significantly different characteristics from structured indoor and outdoor urban environments. The performance of the proposed approach has been validated with experiments in simulation and in real cornfield environments. P-AgSLAM outperforms existing state-of-the-art LiDAR-based state estimators in robot pose estimation and mapping.
|
|
10:05-10:10, Paper ThBT20.3 | |
Robotic Mushroom Harvesting with Real2Sim2Real and Model Predictive Path Integral (MPPI) Based Planning |
|
Vasios, Konstantinos | University of Essex |
Porichis, Antonis | University of Essex |
Mohan, Vishwanathan | University of Essex |
Chatzakos, Panagiotis | University of Essex AI Innovation Centre |
Keywords: Agricultural Automation, Manipulation Planning, Dexterous Manipulation
Abstract: We present a strategy for robotic button mushroom (Agaricus bisporus) harvesting that involves a Real2Sim2Real pipeline with dynamic scene reconstruction and a Model Predictive Path Integral (MPPI) control and planning architecture for generating optimal uprooting motion primitives based on a physics engine simulation framework. Given the complex, nonlinear, anisotropic material properties of the mushrooms, combined with the multiple failure modes involved, we design a simulation framework around the PyBullet rigid-body physics engine by utilizing first-order approximations of the equivalent continuum mechanics models. By exploiting the computational efficiency of this simulation framework, we directly apply the MPPI control framework to generate offline optimal mushroom uprooting motion primitives, defining a set of cost objectives for an optimal and within-constraint harvesting plan. We show that with this planning strategy, the "root-bending" action emerges autonomously for the single mushroom case as an optimal uprooting maneuver, which corresponds well to empirical knowledge obtained from expert pickers. A video demonstration of the proposed architecture can be found at https://youtu.be/k38ePBsBego.
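For readers unfamiliar with MPPI, its basic update can be sketched on a toy one-dimensional model in a few lines: sample control perturbations, roll them out, weight each rollout by its exponentiated negative cost, and average. This is illustrative only (not the authors' PyBullet simulator or cost terms), with numpy assumed available:

    # Minimal MPPI loop on a toy 1-D model: push-velocity sequence u is refined by
    # importance-weighted averaging of sampled perturbations.
    import numpy as np

    rng = np.random.default_rng(6)
    H, K, lam, sigma = 20, 128, 1.0, 0.2      # horizon, samples, temperature, noise std
    u = np.zeros(H)                           # nominal control (push-velocity) sequence
    target = 1.0                              # desired displacement

    def rollout_cost(controls):
        x = 0.0
        for v in controls:
            x += 0.05 * v                     # toy dynamics: displacement per step
        return (x - target) ** 2 + 1e-3 * np.sum(controls ** 2)

    for _ in range(50):                       # MPPI iterations
        noise = rng.normal(0.0, sigma, (K, H))
        costs = np.array([rollout_cost(u + noise[k]) for k in range(K)])
        weights = np.exp(-(costs - costs.min()) / lam)
        weights /= weights.sum()
        u = u + weights @ noise               # importance-weighted perturbation average

    print("final cost:", rollout_cost(u))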
|
|
10:10-10:15, Paper ThBT20.4 | |
Collision-Aware Traversability Analysis for Autonomous Vehicles in the Context of Agricultural Robotics |
|
Philippe, Florian | Université De Haute-Alsace |
Laconte, Johann | French National Research Institute for Agriculture, Food and The |
Lapray, Pierre-Jean | Université De Haute-Alsace |
Spisser, Matthias | Technology & Strategy Engineering SAS |
Lauffenburger, Jean-Philippe | Université De Haute-Alsace |
Keywords: Agricultural Automation, Sensor Fusion, Collision Avoidance
Abstract: In this paper, we introduce a novel method for safe navigation in agricultural robotics. As global environmental challenges intensify, robotics offers a powerful solution to reduce chemical usage while meeting the increasing demands for food production. However, significant challenges remain in ensuring the autonomy and resilience of robots operating in unstructured agricultural environments. Obstacles such as crops and tall grass, which are deformable, must be identified as safely traversable, compared to rigid obstacles. To address this, we propose a new traversability analysis method based on a 3D spectral map reconstructed using a LIDAR and a multispectral camera. This approach enables the robot to distinguish between safe and unsafe collisions with deformable obstacles. We perform a comprehensive evaluation of multispectral metrics for vegetation detection and incorporate these metrics into an augmented environmental map. Utilizing this map, we compute a physics-based traversability metric that accounts for the robot’s weight and size, ensuring safe navigation over deformable obstacles.
|
|
10:15-10:20, Paper ThBT20.5 | |
Enhanced View Planning for Robotic Harvesting: Tackling Occlusions with Imitation Learning |
|
Li, Lun | University of Groningen |
Kasaei, Hamidreza | University of Groningen |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Imitation Learning
Abstract: In agricultural automation, inherent occlusion presents a major challenge for robotic harvesting. We propose an imitation learning-based viewpoint planning approach to actively adjust the camera viewpoint and capture unobstructed images of the target crop. Traditional viewpoint planners and existing learning-based methods, which depend on manually designed evaluation metrics or reward functions, often struggle to generalize to complex, unseen scenarios. Our method employs the Action Chunking with Transformers (ACT) algorithm to learn effective camera motion policies from expert demonstrations. This enables continuous six-degree-of-freedom (6-DoF) viewpoint adjustments that are smoother, more precise, and better at revealing occluded targets. Extensive experiments in both simulated and real-world environments, featuring agricultural scenarios and a 6-DoF collaborative robot arm equipped with an RGB-D camera, demonstrate our method's superior success rate and efficiency, especially in complex occlusion conditions, as well as its ability to generalize across different crops without reprogramming. This study advances robotic harvesting by providing a practical "learn from demonstration" (LfD) solution to occlusion challenges, ultimately enhancing autonomous harvesting performance and productivity.
|
|
10:20-10:25, Paper ThBT20.6 | |
Precision Harvesting in Cluttered Environments: Integrating End Effector Design with Dual Camera Perception |
|
Koe, Kendall | University of Illinois Urbana Champaign |
Shah, Poojan Kalpeshbhai | University of Illinois |
Walt, Benjamin | University of Illinois Urbana-Champaign |
Westphal, Jordan | University of Illinois at Urbana-Champaign |
Marri, Samhita | University of Illinois at Urbana Champaign |
Kamtikar, Shivani Kiran | University of Illinois at Urbana-Champaign |
Nam, James Seungbum | University of Illinois at Urbana-Champaign |
Uppalapati, Naveen Kumar | University of Illinois at Urbana-Champaign |
Chowdhary, Girish | University of Illinois at Urbana Champaign |
Krishnan, Girish | University of Illinois Urbana Champaign |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Field Robots
Abstract: Due to labor shortages in specialty crop industries, a need for robotic automation to increase agricultural efficiency and productivity has arisen. Previous manipulation systems harvest well in uncluttered and structured environments. High tunnel environments are more compact and cluttered in nature, requiring a rethinking of the large form factor systems and grippers. We propose a novel co-designed framework incorporating a global detection camera and a local eye-in-hand camera that demonstrates precise localization of small fruits via closed-loop visual feedback and reliable error handling. Field experiments in high tunnels show that our system can reach 85.0% of cherry tomato fruit in 10.98s on average.
|
|
10:25-10:30, Paper ThBT20.7 | |
S^2BEV: Lightweight, Robust, and Precise SLAM-Oriented Segmentation Bird Eye’s View Mapping Approach |
|
Sun, Yefeng | Shanghai Jiao Tong University |
Gong, Liang | Shanghai Jiao Tong University |
Dai, Jialing | University of Chinese Academy of Sciences |
Bishu, Gao | Shanghai Jiao Tong University |
Cai, Jinghan | Shanghai Jiao Tong University |
Lin, Gengjie | Shanghai Jiao Tong University |
Moutarde, Fabien | MINES ParisTech - PSL University |
Lu, Junguo | Shanghai Jiaotong University |
Liu, Chengliang | Shanghai Jiao Tong University |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Mapping
Abstract: As modern agriculture progresses, the swift deployment of accurate maps becomes essential for the autonomous navigation and operation of orchard robots. Traditional mapping techniques often fall short in addressing the challenges posed by orchards, which are characterized by unstructured, dynamically changing environments with complex spatial and temporal dynamics due to seasonal changes and continuous operations. This paper proposes a new approach to orchard map construction that merges topological maps with semantic SLAM and leverages semantic segmentation to separate topologically invariant structure from volatile orchard scenery during mapping. This integration enables the creation, optimization, and rapid deployment of maps that are not only lightweight and robust but also precise. To evaluate the effectiveness of our method, we performed navigation tests in orchard environments using the newly developed maps. The experimental outcomes demonstrated a significant reduction in CPU usage, with maximum and average reductions of 7.6% and 4.5%, respectively. This approach not only enhances navigation efficiency but also facilitates quicker map deployment, effectively freeing computational resources for other critical tasks.
|
|
ThBT21 |
410 |
Manipulation Planning and Control 2 |
Regular Session |
Chair: Hu, Ai-Ping | Georgia Tech Research Institute |
Co-Chair: Misimi, Ekrem | SINTEF Ocean |
|
09:55-10:00, Paper ThBT21.1 | |
Non-Prehensile Object Transport by Nonholonomic Robots Connected by Linear Deformable Elements |
|
Zhi, Hui | The Hong Kong Polytechnic University |
Zhang, Bin | The Hong Kong Polytechnic University |
Qi, Jiaming | Centre for Transformative Garment Production, HongKong |
Romero Velazquez, Jose Guadalupe | ITAM |
Shao, Xiaodong | Beihang University |
Yang, Chenguang | University of Liverpool |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Motion Control, Constrained Motion Planning, Soft Robot Applications
Abstract: This paper presents a new method to automatically transport objects with mobile robots via non-prehensile actions. Our proposed approach utilizes a pair of nonholonomic robots connected by a deformable tube to efficiently manipulate objects of irregular shapes toward target locations. To autonomously perform this task, we develop a local integrated planning and control strategy that solves the problem in two steps (viz. enveloping and transport) based on the model predictive control (MPC) framework. The deformable underactuated system is simplified by a linear kinematic model. The enveloping problem is formulated as the minimization of multiple criteria that represent the enclosing error of the object by the variable morphology system. The transport problem is tackled by formulating the non-prehensile dragging action as an inequality constraint specified by the body frame of the deformable system. Reactive obstacle avoidance is ensured by a maximum margin-based term that utilizes the system's geometry and the feedback proximity to the environment. To validate the performance of the proposed methodology, we report a detailed experimental study with vision-guided robotic prototypes conducting multiple autonomous object transport tasks.
|
|
10:00-10:05, Paper ThBT21.2 | |
Implicit Physics-Aware Policy for Dynamic Manipulation of Rigid Objects Via Soft Body Tools |
|
Wang, Zixing | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Sensorimotor Learning
Abstract: Recent advancements in robot tool use have unlocked their usage for novel tasks, yet the predominant focus is on rigid-body tools, while the investigation of soft-body tools and their dynamic interaction with rigid bodies remains unexplored. This paper takes a pioneering step towards dynamic one-shot soft tool use for manipulating rigid objects, a challenging problem posed by complex interactions and unobservable physical properties. To address these problems, we propose the Implicit Physics-aware (IPA) policy, designed to facilitate effective soft tool use across various environmental configurations. The IPA policy conducts system identification to implicitly identify physics information and predict goal-conditioned, one-shot actions accordingly. We validate our approach through a challenging task, i.e., transporting rigid objects using soft tools such as ropes to distant target positions in a single attempt under unknown environment physics parameters. Our experimental results indicate the effectiveness of our method in efficiently identifying physical properties, accurately predicting actions, and smoothly generalizing to real-world environments. The related video is available at: https://youtu.be/4hPrUDTc4Rg?si=WUZrT2vjLMt8qRWA
|
|
10:05-10:10, Paper ThBT21.3 | |
General-Purpose Clothes Manipulation with Semantic Keypoints |
|
Deng, Yuhong | National University of Singapore |
Hsu, David | National University of Singapore |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Representation Learning
Abstract: Clothes manipulation is a critical capability for household robots; yet, existing methods are often confined to specific tasks, such as folding or flattening, due to the complex high-dimensional geometry of deformable fabric. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP) for general-purpose clothes manipulation, which enables the robot to perform diverse manipulation tasks over different types of clothes. The key idea of CLASP is semantic keypoints---e.g., "right shoulder", "left sleeve", etc.---a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints of clothes can be effectively extracted from depth images and are sufficient to represent a broad range of clothes manipulation policies. CLASP leverages semantic keypoints to bridge LLM-powered task planning and low-level action execution in a two-level hierarchy. Extensive simulation experiments show that CLASP outperforms baseline methods across diverse clothes types in both seen and unseen tasks. Further, experiments with a Kinova dual-arm system on four distinct tasks---folding, flattening, hanging, and placing---confirm CLASP's performance on a real robot.
|
|
10:10-10:15, Paper ThBT21.4 | |
Robust Optical Transceiver Manipulation in Cluttered Cable Environments Using 3D Scene Understanding and Planning |
|
Sarantopoulos, Iason | Microsoft Research |
Liu, Chenyu | Peking University |
Weng, Bohong | University of Science and Technology of China |
Xu, Sicheng | Microsoft Research Asia |
Zhang, Yizhong | Microsoft |
Yang, Jiaolong | Microsoft Research |
Tong, Xin | Microsoft |
Otto, Fabian | Microsoft Research |
Sweeney, David | Microsoft Research |
Chatzieleftheriou, Andromachi | Microsoft |
Rowstron, Antony | Microsoft Research |
Keywords: Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization, Constrained Motion Planning
Abstract: Robotic manipulation in cluttered environments presents significant challenges, particularly when the clutter includes thin, deformable objects like cables, which complicate perception and decision-making processes. In the context of datacenters, the automation of networking tasks often involves the manipulation of optical transceivers within densely packed cable configurations. Such environments are characterized by an abundance of delicate, overlapping, and intersecting cables, leading to frequent occlusions. This paper introduces an innovative system designed for the manipulation of optical transceivers in environments cluttered by cables. Our integrated approach combines advanced 3D scene understanding with a heuristic-based pushing policy to effectively manipulate optical transceivers amidst clutter. The system's perception component utilizes image segmentation and 3D reconstruction to accurately model the transceivers and surrounding cables. Meanwhile, the planning aspect employs a search algorithm with task-specific heuristics, to navigate the gripper, displace obstructing cables, and safely achieve a precise pre-grasp position in front of the target transceiver. We have conducted extensive evaluations of our methodology in both simulated and real-world settings, demonstrating its high success rates, robustness, and proficiency in addressing the unique challenges posed by cable-occluded environments within datacenters.
|
|
10:15-10:20, Paper ThBT21.5 | |
ReloPush: Multi-Object Rearrangement in Confined Spaces with a Nonholonomic Mobile Robot Pusher |
|
Ahn, Jeeho | University of Michigan |
Mavrogiannis, Christoforos | University of Michigan |
Keywords: Mobile Manipulation, Task and Motion Planning, Manipulation Planning
Abstract: We focus on the problem of rearranging a set of objects within a confined space with a nonholonomically constrained mobile robot pusher. This problem is relevant to many real-world domains, including warehouse automation and construction. These domains give rise to instances involving a combination of geometric, kinematic, and physics constraints, which make planning particularly challenging. Prior work often makes simplifying assumptions, such as the use of holonomic mobile robots or dexterous manipulators capable of unconstrained overhand reaching. Our key insight is that we can empower even a constrained mobile pusher to tackle complex rearrangement tasks by enabling it to modify the environment in its favor in a constraint-aware fashion. To this end, we describe a Push-Traversability graph, whose vertices represent poses from which the pusher can push objects and whose edges represent optimal, kinematically feasible, and stable push-rearrangements of objects. Based on this graph, we develop ReloPush, a planning framework that leverages Dubins curves and standard graph search techniques to generate an efficient sequence of object rearrangements to be executed by the pusher. We evaluate ReloPush across a series of challenging scenarios, involving the rearrangement of densely cluttered workspaces with up to eight objects by a 1/10th-scale mobile robot pusher. ReloPush exhibits orders-of-magnitude faster runtimes and significantly more robust execution in the real world, evidenced by lower execution times and fewer losses of object contact, compared to two baselines lacking our proposed graph structure.
|
|
10:20-10:25, Paper ThBT21.6 | |
Non-Prehensile Shape Manipulation of Elastoplastic Objects with Reinforcement Learning |
|
Herland, Sverre | Norwegian University of Science and Technology |
Misimi, Ekrem | SINTEF Ocean |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: We present a novel framework for non-prehensile shape manipulation of deformable objects using Deep Reinforcement Learning. Unlike previous approaches that rely on grasping, our method employs a sequence of gentle pushing actions to deform objects into target shapes. We introduce a continuous parametrization of pushing actions that allows for precise control over pushing trajectories, enabling more flexible and efficient manipulation. The framework is applicable to a wide range of objects by representing them as sampled boundary coordinates, removing the need for predefined object partitions. Trained entirely in simulation, our controller demonstrates zero-shot transfer to real-world scenarios without additional training. Extensive evaluations show that our approach not only matches but substantially exceeds the performance of previous methods, while being more gentle and efficient. We demonstrate successful manipulation across various deformable objects and materials, including food items like salmon and pork loin. This work represents a significant advancement in robotic manipulation of deformable objects, with potential applications in food processing, manufacturing, and beyond.
|
|
10:25-10:30, Paper ThBT21.7 | |
ORLA*: Mobile Manipulator-Based Object Rearrangement with Lazy A* |
|
Gao, Kai | Rutgers University |
Zhaxizhuoma, Zhaxizhuoma | Shanghai Artificial Intelligence Laboratory |
Ding, Yan | SUNY Binghamton |
Zhang, Shiqi | SUNY Binghamton |
Yu, Jingjin | Rutgers University |
Keywords: Mobile Manipulation, Task Planning, Manipulation Planning
Abstract: Effectively performing object rearrangement is an essential skill for mobile manipulators, e.g., setting up a dinner table. A key challenge in such problems is deciding an appropriate ordering to effectively untangle object-object dependencies while considering the necessary motions for realizing manipulation tasks (e.g., pick and place). Computing time-optimal multi-object rearrangement solutions for mobile manipulators remains a largely untapped research direction. In this work, we propose ORLA*, which leverages delayed/lazy evaluation in searching for a high-quality object pick-n-place sequence that considers both end-effector and mobile robot base travel. ORLA* readily handles multi-layered rearrangement tasks powered by learning-based stability predictions. Employing an optimal solver for finding temporary locations for displacing objects, ORLA* can achieve global optimality. Through extensive simulation and ablation study, we confirm the effectiveness of ORLA* delivering quality solutions for challenging rearrangement instances. Supplementary materials are available at: https://gaokai15.github.io/ORLA-Star/
|
|
ThBT22 |
411 |
Imitation Learning for Manipulation 1 |
Regular Session |
Chair: Hoffman, Judy | Georgia Tech |
Co-Chair: Ravichandar, Harish | Georgia Institute of Technology |
|
09:55-10:00, Paper ThBT22.1 | |
Learning Prehensile Dexterity by Imitating and Emulating State-Only Observations |
|
Han, Yunhai | Georgia Institute of Technology |
Chen, Zhenyang | Georgia Institute of Technology |
Williams, Kyle | Sandia National Labs |
Ravichandar, Harish | Georgia Institute of Technology |
Keywords: Imitation Learning, Dexterous Manipulation
Abstract: When humans acquire physical skills (e.g., tennis) from experts, we tend to first learn by merely observing the expert. But this is often insufficient. We then engage in practice, where we try to emulate the expert and ensure that our actions produce similar effects on our environment. Inspired by this observation, we introduce Combining IMitation and Emulation for Motion Refinement (CIMER) -- a two-stage framework to learn dexterous prehensile manipulation skills from state-only observations. CIMER's first stage involves imitation: simultaneously encoding the complex interdependent motions of the robot hand and the object in a structured dynamical system. This results in a reactive motion generation policy that provides a reasonable motion prior, but lacks the ability to reason about contact effects due to the lack of action labels. The second stage involves emulation: learning a motion refinement policy via reinforcement learning that adjusts the robot hand's motion prior such that the desired object motion is reenacted. CIMER is both task-agnostic (no task-specific reward design or shaping) and intervention-free (no additional teleoperated or labeled demonstrations). Detailed experiments with prehensile dexterity reveal that i) imitation alone is insufficient, but adding emulation drastically improves performance, ii) CIMER outperforms existing methods in terms of sample efficiency and the ability to generate realistic and stable motions, and iii) CIMER can either zero-shot generalize or learn to adapt to novel objects from the YCB dataset, even outperforming expert policies trained with action labels in most cases. Source code and videos are available at https://sites.google.com/view/cimer-2024/.
|
|
10:00-10:05, Paper ThBT22.2 | |
EgoMimic: Scaling Imitation Learning Via Egocentric Video |
|
Kareer, Simar | Georgia Tech |
Patel, Dhruv | Georgia Institute of Technology |
Punamiya, Ryan | Georgia Institute of Technology |
Mathur, Pranay | Georgia Institute of Technology |
Cheng, Shuo | Gatech |
Wang, Chen | Stanford University |
Hoffman, Judy | Georgia Tech |
Xu, Danfei | Georgia Institute of Technology |
Keywords: Big Data in Robotics and Automation, Imitation Learning, Transfer Learning
Abstract: The scale and diversity of demonstration data required for imitation learning is a significant challenge. We present EgoMimic, a full-stack framework that scales manipulation through egocentric-view human demonstrations. EgoMimic achieves this through: (1) an ergonomic human data collection system using the Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, (3) cross-domain data alignment techniques, and (4) an imitation learning architecture that co-trains on hand and robot data. Compared to prior works that only extract high-level intent from human videos, our approach treats human and robot data equally as embodied demonstration data and learns a unified policy from both data sources. EgoMimic achieves significant improvement on a diverse set of long-horizon, single-arm and bimanual manipulation tasks over state-of-the-art imitation learning methods and enables generalization to entirely new scenes. Finally, we show a favorable scaling trend for EgoMimic, where adding 1 hour of additional hand data is significantly more valuable than 1 hour of additional robot data. Videos and additional information can be found at https://egomimic.github.io/.
|
|
10:05-10:10, Paper ThBT22.3 | |
Neural Dynamics Augmented Diffusion Policy |
|
Wu, Ruihai | Peking University |
Chen, Haozhe | University of Illinois Urbana-Champaign |
Zhang, Mingtong | UIUC |
Lu, Haoran | Peking University |
Li, Yitong | Tsinghua University |
Li, Yunzhu | Columbia University |
Keywords: Imitation Learning, Model Learning for Control, Machine Learning for Robot Control
Abstract: Imitation learning has been proven effective in mimicking demonstrations across various robotic manipulation tasks. However, to develop robust policies, current imitation methods, such as diffusion policy, require training on extensive demonstrations, making data collection labor-intensive. In contrast, model-based planning with dynamics models can effectively cover a sufficient range of configurations using only off-policy data. Yet, without the guidance of expert demonstrations, many tasks are difficult and time-consuming to plan using the dynamics models. Therefore, we take the best of both model learning and imitation learning, and propose neural dynamics augmented imitation learning that covers a large range of scene configurations with few-shot demonstrations. This method trains a robust diffusion policy in a local support region using few-shot demonstrations and rearranges objects outside this region into it using offline-trained neural dynamics models. Extensive experiments across various tasks in both simulations and real-world scenarios, including granular manipulation, contact-rich tasks, and multi-object interaction tasks, have demonstrated that, trained with only 1 to 30 demonstrations, our proposed method can robustly cover a significantly larger area than the policy trained purely from the demonstrations. Our project page is available at: https://dynamics-dp.github.io/.
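A minimal sketch of the two-mode control flow described above, assuming placeholder models throughout (a callable diffusion policy, a learned one-step dynamics function, and a Euclidean notion of the demonstration support region): inside the support region the diffusion policy acts; outside it, a random-shooting planner over the dynamics model pushes the configuration back toward the region.

```python
import numpy as np

def in_support(state, demo_states, radius=0.1):
    # Inside the local region covered by the few-shot demonstrations?
    return np.min(np.linalg.norm(demo_states - state, axis=1)) < radius

def plan_with_dynamics(state, target, dynamics, horizon=10, samples=256):
    # Random-shooting planner over an offline-trained dynamics model.
    best_cost, best_actions = np.inf, None
    for _ in range(samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, 2))
        s = state.copy()
        for a in actions:
            s = dynamics(s, a)
        cost = np.linalg.norm(s - target)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions

def act(state, demo_states, dynamics, diffusion_policy):
    if in_support(state, demo_states):
        return diffusion_policy(state)      # imitation inside the support region
    target = demo_states[np.argmin(np.linalg.norm(demo_states - state, axis=1))]
    return plan_with_dynamics(state, target, dynamics)[0]

# Toy usage with placeholder models.
demos = np.random.uniform(-0.1, 0.1, size=(20, 2))
dyn = lambda s, a: s + 0.05 * a
dp = lambda s: np.zeros(2)
action = act(np.array([0.8, -0.6]), demos, dyn, dp)
```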
|
|
10:10-10:15, Paper ThBT22.4 | |
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation |
|
Xia, Shangning | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Fang, Hao-Shu | Massachusetts Institute of Technology |
Keywords: Imitation Learning, Learning from Demonstration
Abstract: Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with limited demonstrations. This paper introduces CAGE, a novel robotic manipulation policy designed to overcome these generalization barriers by integrating a pre-trained visual representation with a causal attention mechanism. CAGE utilizes the powerful feature extraction capabilities of the vision foundation model DINOv2, combined with LoRA fine-tuning for robust environment understanding. The policy further employs a causal perceiver for effective token compression and a diffusion-based action head with attention to enhance task-specific fine-grained conditioning. With as few as 50 demonstrations from a single training environment, CAGE achieves robust generalization across diverse visual changes in objects, backgrounds, and viewpoints. Extensive experiments validate that CAGE significantly outperforms existing state-of-the-art RGB/RGB-D-based approaches in various manipulation tasks, especially under large distribution shifts. In similar environments, CAGE offers an average 42% increase in task completion rate. While all baselines fail in unseen environments, CAGE manages to obtain a 43% completion rate and a 51% success rate on average, marking a substantial advancement toward the practical deployment of robots in real-world settings. Project website: cage-policy.github.io.
|
|
10:15-10:20, Paper ThBT22.5 | |
RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations |
|
Ameperosa, Ezra | Georgia Institute of Technology |
Collins, Jeremy | Georgia Institute of Technology |
Jain, Mrinal | Georgia Institute of Technology |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Imitation Learning, Bimanual Manipulation, Deep Learning Methods
Abstract: Imitation learning in robotics faces significant challenges in generalization due to the complexity of robotic environments and the high cost of data collection. We introduce RoCoDA, a novel method that unifies the concepts of invariance, equivariance, and causality within a single framework to enhance data augmentation for imitation learning. RoCoDA leverages causal invariance by modifying task-irrelevant subsets of the environment state without affecting the policy's output. Simultaneously, we exploit SE(3) equivariance by applying rigid body transformations to object poses and adjusting corresponding actions to generate synthetic demonstrations. We validate RoCoDA through extensive experiments on five robotic manipulation tasks, demonstrating improvements in policy performance, generalization, and sample efficiency compared to state-of-the-art data augmentation methods. Our policies exhibit robust generalization to unseen object poses, textures, and the presence of distractors. Furthermore, we observe emergent behavior such as re-grasping, indicating policies trained with RoCoDA possess a deeper understanding of task dynamics. By leveraging invariance, equivariance, and causality, RoCoDA provides a principled approach to data augmentation in imitation learning, bridging the gap between geometric symmetries and causal reasoning. Project Page: https://rocoda.github.io
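The SE(3)-equivariance component lends itself to a short illustration. The sketch below applies the same random planar rigid-body transform to an object pose and to the corresponding action (here taken to be an end-effector target pose); the SE(2) restriction and the pose conventions are assumptions of this example, not the paper's exact recipe.

```python
import numpy as np

def random_se2(max_xy=0.1, max_yaw=np.pi / 6):
    # A random planar rigid-body transform as a 4x4 homogeneous matrix.
    yaw = np.random.uniform(-max_yaw, max_yaw)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 3] = np.random.uniform(-max_xy, max_xy, size=2)
    return T

def augment(obj_pose, ee_target_pose, T):
    # Apply the same world-frame transform to the object pose and to the
    # action (an end-effector target pose) so the pair stays consistent.
    return T @ obj_pose, T @ ee_target_pose

obj_pose = np.eye(4)
ee_target = np.eye(4)
ee_target[:3, 3] = [0.4, 0.0, 0.2]
aug_obj, aug_act = augment(obj_pose, ee_target, random_se2())
```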
|
|
10:20-10:25, Paper ThBT22.6 | |
Conditional Neural Expert Processes for Learning Movement Primitives from Demonstration |
|
Yildirim, Yigit | Bogazici University |
Ugur, Emre | Bogazici University |
Keywords: Learning from Demonstration, Deep Learning Methods
Abstract: Learning from Demonstration (LfD) is a widely used technique for skill acquisition in robotics. However, demonstrations of the same skill may exhibit significant variances, or learning systems may attempt to acquire different means of the same skill simultaneously, making it challenging to encode these motions into movement primitives. To address these challenges, we propose an LfD framework, namely the Conditional Neural Expert Processes (CNEP), that learns to assign demonstrations from different modes to distinct expert networks utilizing the inherent information within the latent space to match experts with the encoded representations. CNEP does not require supervision on which mode the trajectories belong to. We compare the performance of CNEP against widely used and powerful LfD methods such as Gaussian Mixture Models, Probabilistic Movement Primitives, and Stable Movement Primitives and show that our method outperforms these baselines on multimodal trajectory datasets. The results reveal enhanced modeling performance for movement primitives, leading to the synthesis of trajectories that more accurately reflect those demonstrated by experts, particularly when the skill demonstrations include intersection points from various trajectories. We evaluated the CNEP model on two real-robot tasks, namely obstacle avoidance and pick-and-place tasks, that require the robot to learn multi-modal motion trajectories and execute the correct primitives given target environment conditions. We also showed that our system is capable of on-the-fly adaptation to environmental changes via an online conditioning mechanism. Lastly, we believe that CNEP offers improved explainability and interpretability by autonomously finding discrete behavior primitives and providing probability values about its expert selection decisions.
|
|
10:25-10:30, Paper ThBT22.7 | |
PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning |
|
Gao, Tian | Stanford University |
Nasiriany, Soroush | The University of Texas at Austin |
Liu, Huihan | University of Texas, Austin |
Yang, Quantao | KTH Royal Institute of Technology |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Deep Learning Methods
Abstract: Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizons. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.
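One way to picture the scaffolding step is a greedy segmentation of each demonstration into parameterized primitives, after which a high-level policy is trained to predict the primitive sequence by imitation. The two toy primitives, the fixed segment length, and the residual-based selection rule below are illustrative assumptions only.

```python
import numpy as np

def fit_grasp(chunk):
    # Stationary-hand primitive: residual is the total motion within the chunk.
    return float(np.linalg.norm(chunk - chunk[0]))

def fit_reach(chunk):
    # Straight-line reaching primitive: residual is the deviation from a
    # constant-velocity segment between the chunk's endpoints.
    line = np.linspace(chunk[0], chunk[-1], len(chunk))
    return float(np.linalg.norm(chunk - line))

# Ties favor the simpler (stationary) primitive because it is listed first.
PRIMITIVES = {"grasp": fit_grasp, "reach": fit_reach}

def segment(demo, horizon=5):
    # Label each fixed-length chunk with the primitive that reproduces it best;
    # a high-level policy would then learn to predict these labels (and their
    # parameters) from observations via imitation learning.
    labels = []
    for t in range(0, len(demo) - horizon + 1, horizon):
        chunk = demo[t:t + horizon]
        name = min(PRIMITIVES, key=lambda k: PRIMITIVES[k](chunk))
        labels.append((t, name))
    return labels

# A toy demo that moves for 20 steps, then holds still for 20 steps.
demo = np.concatenate([np.cumsum(np.full((20, 2), 0.02), axis=0),
                       np.full((20, 2), 0.4)])
print(segment(demo))
```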
|
|
ThBT23 |
412 |
Diffusion-Based Visual Perception and Learning |
Regular Session |
Chair: Brandt, Laura Eileen | Massachusetts Institute of Technology |
Co-Chair: Nalpantidis, Lazaros | Technical University of Denmark |
|
09:55-10:00, Paper ThBT23.1 | |
Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model |
|
Zhang, Ruibin | Zhejiang University |
Xue, Donglai | Huzhou Institute of Zhejiang University |
Wang, Yuhan | Nanyang Technological University |
Geng, Ruixu | University of Science and Technology of China |
Gao, Fei | Zhejiang University |
Keywords: Range Sensing, Deep Learning Methods, Mapping
Abstract: Millimeter wave (mmWave) radars have attracted significant attention from both academia and industry due to their capability to operate in extreme weather conditions. However, they face challenges in terms of sparsity and noise interference, which hinder their application in the field of micro aerial vehicle (MAV) autonomous navigation. To this end, this paper proposes a novel approach to dense and accurate mmWave radar point cloud construction via cross-modal learning. Specifically, we introduce diffusion models, which possess state-of-the-art performance in generative modeling, to predict LiDAR-like point clouds from paired raw radar data. We also incorporate the most recent diffusion model inference accelerating techniques to ensure that the proposed method can be implemented on MAVs. We validate the proposed method through extensive benchmark comparisons and real-world experiments, demonstrating its superior performance and generalization ability. Code and pre-trained models will be available at https://github.com/ZJU-FAST-Lab/Radar-Diffusion.
|
|
10:00-10:05, Paper ThBT23.2 | |
DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model |
|
Jia, Peijin | Tsinghua University |
Wen, Tuopu | Tsinghua University |
Luo, Ziang | TsingHua University |
Yang, Mengmeng | Tsinghua University |
Jiang, Kun | Tsinghua University |
Liu, ZiYuan | Tsinghua University |
Tang, Xuewei | Tsinghua University |
Lei, Zhiquan | Tsinghua University |
Cui, Le | DiDi Inc |
Sheng, Kehua | DiDi Inc |
Zhang, Bo | DiDi Inc |
Yang, Diange | Tsinghua University |
Keywords: Mapping, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced and certain structural errors present in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately delineate semantic information. Furthermore, through extensive visualization analysis, our model demonstrates superior proficiency in generating results that more accurately reflect real-world map layouts, further validating its efficacy in improving the quality of the generated maps.
|
|
10:05-10:10, Paper ThBT23.3 | |
AVD2: Accident Video Diffusion for Accident Video Description |
|
Li, Cheng | The Hong Kong University of Science and Technology |
Zhou, Keyuan | Jilin University |
Liu, Tong | Nanjing University of Science and Technology |
Wang, Yu | Beijing Institute of Technology |
Zhuang, Mingqiao | Fudan University |
Gao, Huan-ang | Tsinghua University |
Jin, Bu | Institute of Automation, Chinese Academy of Sciences |
Zhao, Hao | Tsinghua University |
Keywords: Computer Vision for Transportation, Semantic Scene Understanding
Abstract: Traffic accidents present complex challenges for autonomous driving, often creating unpredictable scenarios that hinder accurate system interpretation and responses. Therefore, understanding accident scenarios is crucial for improving safety and gaining public trust. However, current methods struggle to fully explain accident causes and preventive actions. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating detailed natural language descriptions and reasoning. Additionally, we propose a new approach for augmenting accident video datasets by generating accident videos with a customized diffusion model, resulting in the EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset, a higher-quality, more diverse version of MM-AU. Experimental results demonstrate that using the AVD2 system and training on the EMM-AU dataset achieves state-of-the-art performance in both automated metrics and human evaluations, significantly advancing accident analysis and prevention. Project resources are available at https://an-answer-tree.github.io
|
|
10:10-10:15, Paper ThBT23.4 | |
LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models |
|
Wen, Qiang | The Hong Kong University of Science and Technology |
Rao, Zhefan | HKUST |
Xing, Yazhou | The Hong Kong University of Science and Technology |
Chen, Qifeng | HKUST |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Automation
Abstract: Enhancing a low-light noisy RAW image into a well-exposed and clean sRGB image is a significant challenge for modern digital cameras. Prior approaches have difficulties in recovering fine-grained details and true colors of the scene under extremely low-light environments due to near-to-zero SNR. Meanwhile, diffusion models have shown significant progress towards general domain image generation. In this paper, we propose to leverage the pre-trained latent diffusion model to perform the neural ISP for enhancing extremely low-light images. Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules to inject the RAW information into the diffusion denoising process via modulating the intermediate features of UNet. We further observe different roles of UNet denoising and decoder reconstruction in the latent diffusion model, which inspires us to decompose the low-light image enhancement task into latent-space low-frequency content generation and decoding-phase high-frequency detail maintenance. Through extensive experiments on representative datasets, we demonstrate that our simple design not only achieves state-of-the-art performance in quantitative evaluations but also shows significant superiority in visual comparisons over strong baselines, highlighting the effectiveness of powerful generative priors for neural ISP under extremely low-light environments.
|
|
10:15-10:20, Paper ThBT23.5 | |
SteeredMarigold: Steering Diffusion towards Depth Completion of Largely Incomplete Depth Maps |
|
Gregorek, Jakub | DTU - Technical University of Denmark |
Nalpantidis, Lazaros | Technical University of Denmark |
Keywords: RGB-D Perception, Deep Learning for Visual Perception
Abstract: Even though the depth maps captured by RGB-D sensors deployed in real environments are often characterized by large areas missing valid depth measurements, the vast majority of depth completion methods still assume depth values covering all areas of the scene. To address this limitation, we introduce SteeredMarigold, a training-free, zero-shot depth completion method capable of producing metric dense depth, even for largely incomplete depth maps. SteeredMarigold achieves this by using the available sparse depth points as conditions to steer a denoising diffusion probabilistic model. Our method outperforms relevant top-performing methods on the NYUv2 dataset, in tests where no depth was provided for a large area, achieving state-of-the-art performance and exhibiting remarkable robustness against depth map incompleteness. Our source code is publicly available at https://steeredmarigold.github.io.
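The steering idea can be caricatured as a data-consistency nudge applied inside the denoising loop: wherever sparse depth exists, the current estimate is pulled toward it, while unobserved pixels are left to the generative prior. The denoiser and the update rule below are crude placeholders, so this is a schematic of the conditioning term only, not of the actual diffusion model.

```python
import numpy as np

def steer(x0_hat, sparse_depth, mask, strength=0.8):
    # Pull the current clean-depth estimate toward the available sparse
    # measurements; pixels without measurements are left untouched.
    return x0_hat - strength * mask * (x0_hat - sparse_depth)

def toy_denoising_loop(noisy, sparse_depth, mask, steps=50):
    x = noisy.copy()
    for _ in range(steps):
        x0_hat = 0.5 * (x + x.mean())      # placeholder for the diffusion denoiser
        x0_hat = steer(x0_hat, sparse_depth, mask)
        x = 0.7 * x0_hat + 0.3 * x         # placeholder reverse-diffusion update
    return x

rng = np.random.default_rng(0)
gt = np.tile(np.linspace(1.0, 3.0, 64), (64, 1))          # a simple depth ramp
mask = (rng.random((64, 64)) < 0.02).astype(float)        # ~2% valid depth
completed = toy_denoising_loop(gt + rng.normal(0, 0.5, gt.shape), gt * mask, mask)
```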
|
|
10:20-10:25, Paper ThBT23.6 | |
DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion |
|
Li, Haoteng | Xi'an Jiaotong University |
Yang, Zhao | Xi'an Jiaotong University |
Qian, Zezhong | Xi'an Jiaotong University |
Zhao, Gongpeng | University of Science and Technology of China |
Huang, Yuqi | Xi'an Jiaotong University |
Yu, Jun | University of Science and Technology of China |
Zhou, Huazheng | Xi'an Jiaotong University |
Liu, Longjun | Xi'an Jiaotong University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: Accurate and high-fidelity driving scene reconstruction relies on fully leveraging scene information as conditioning. However, existing approaches, which primarily use 3D bounding boxes and binary maps for foreground and background control, fall short in capturing the complexity of the scene and integrating multi-modal information. In this paper, we propose DualDiff, a dual-branch conditional diffusion model designed to enhance multi-view driving scene generation. We introduce Occupancy Ray Sampling (ORS), a semantic-rich 3D representation, alongside numerical driving scene representation, for comprehensive foreground and background control. To improve cross-modal information integration, we propose a Semantic Fusion Attention (SFA) mechanism that aligns and fuses features across modalities. Furthermore, we design a foreground-aware masked (FGM) loss to enhance the generation of tiny objects. DualDiff achieves state-of-the-art performance in FID score, as well as consistently better results in downstream BEV segmentation and 3D object detection tasks.
|
|
10:25-10:30, Paper ThBT23.7 | |
Anomalies-By-Synthesis: Anomaly Detection Using Generative Diffusion Models for Off-Road Navigation |
|
Ancha, Siddharth | Massachusetts Institute of Technology |
Jiang, Sunshine | Massachusetts Institute of Technology |
Manderson, Travis | McGill University |
Brandt, Laura Eileen | Massachusetts Institute of Technology |
Du, Yilun | MIT |
Osteen, Philip | U.S. Army Research Laboratory |
Roy, Nicholas | Massachusetts Institute of Technology |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Learning
Abstract: In order to navigate safely and reliably in off-road environments, robots must detect anomalies that are out-of-distribution (OOD) with respect to the training data. We present an analysis-by-synthesis approach for pixel-wise anomaly detection without making any assumptions about the nature of OOD data. Given an input image, we use a generative diffusion model to synthesize an edited image that removes anomalies while keeping the remaining image unchanged. Then, we formulate anomaly detection as analyzing which image segments were modified by the diffusion model. We propose a novel inference approach for guided diffusion by analyzing the ideal guidance gradient and deriving a principled approximation that bootstraps the diffusion model to predict guidance gradients. Our editing technique operates purely at test time and can be integrated into existing workflows without the need for retraining or fine-tuning. Finally, we use a combination of vision-language foundation models to compare pixels between the original and synthesized images in a learned feature space and detect semantically meaningful edits. Our diffusion-based analysis-by-synthesis method enables accurate anomaly detection for off-road navigation.
|
|
ThCT1 |
302 |
Mobile Manipulation: Planning and Control |
Regular Session |
Chair: Martín-Martín, Roberto | University of Texas at Austin |
Co-Chair: Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
|
11:15-11:20, Paper ThCT1.1 | |
EHC-MM: Embodied Holistic Control for Mobile Manipulation |
|
Wang, Jiawen | Peking University |
Jin, Yixiang | Samsung Research China – Beijing (SRC-B) |
Shi, Jun | Samsung Research China – Beijing (SRC-B) |
A, Yong | Samsung Research China – Beijing (SRC-B) |
Li, Dingzhe | Beihang University |
Sun, Fuchun | Tsinghua University |
Luo, Dingsheng | Peking University |
Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua Un |
Keywords: Mobile Manipulation, Embodied Cognitive Science, Whole-Body Motion Planning and Control
Abstract: Mobile manipulation typically entails the base for mobility, the arm for accurate manipulation, and the camera for perception. The principle of Distant Mobility, Close Grasping (DMCG) is essential for holistic control. We propose Embodied Holistic Control for Mobile Manipulation (EHC-MM) with the embodied function sig(w): by formulating the DMCG principle as a Quadratic Programming (QP) problem, sig(w) dynamically balances the robot's emphasis between movement and manipulation, taking into account the robot's state and environment. In addition, we propose Monitor-Position-Based Servoing (MPBS) with sig(w), enabling tracking of the target during operation. This approach enables coordinated control among the robot's base, arm, and camera, enhancing task efficiency. Through extensive simulations and real-world experiments, our approach significantly improves both the success rate and efficiency of mobile manipulation tasks, achieving a 95.6% success rate in real-world scenarios and a 52.8% increase in time efficiency.
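A rough sketch of how a sigmoid weighting function could arbitrate between base and arm inside a quadratic program: far from the target the arm's velocities are penalized (favoring mobility), and close to the target the base's are (favoring grasping). The damped least-squares form, the 2-DoF-base/3-DoF-arm split, and the weighting direction are assumptions of this illustration, not the paper's formulation.

```python
import numpy as np

def sig(d, d0=1.0, k=5.0):
    # Smoothly approaches 1 far from the target and 0 close to it.
    return 1.0 / (1.0 + np.exp(-k * (d - d0)))

def holistic_velocity(J, v_des, dist_to_target):
    w = sig(dist_to_target)
    # Far from the target: penalize arm motion (favor the base); near the
    # target: penalize base motion (favor the arm). The small offsets keep
    # the problem well conditioned.
    weights = np.concatenate([np.full(2, 1.0 - w) + 1e-3,          # base DoFs
                              np.full(J.shape[1] - 2, w) + 1e-3])  # arm DoFs
    W = np.diag(weights)
    # Closed-form solution of min ||J qdot - v_des||^2 + qdot^T W qdot.
    return np.linalg.solve(J.T @ J + W, J.T @ v_des)

J = np.random.randn(3, 5)                       # 3-D task, 2 base + 3 arm DoFs
qdot = holistic_velocity(J, np.array([0.1, 0.0, 0.05]), dist_to_target=2.0)
```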
|
|
11:20-11:25, Paper ThCT1.2 | |
BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-Wide Mobile Manipulation |
|
Shah, Rutav | The University of Texas at Austin |
Yu, Albert | UT Austin |
Zhu, Yifeng | The University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Martín-Martín, Roberto | University of Texas at Austin |
Keywords: Mobile Manipulation, Big Data in Robotics and Automation, Continual Learning
Abstract: To operate at a building scale, service robots must perform long-horizon mobile manipulation tasks by navigating to different rooms, accessing multiple floors, and interacting with a wide and unseen range of everyday objects. We refer to these tasks as Building-wide Mobile Manipulation. To tackle these inherently long-horizon tasks, we introduce BUMBLE, a unified Vision-Language Model (VLM)-based framework integrating open-world RGB-D perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory. Our extensive evaluation (90+ hours) indicates that BUMBLE outperforms competitive baselines in long-horizon building-wide tasks that require sequencing up to 12 skills, spanning 15 minutes per trial. BUMBLE achieves 47.1% success rate averaged over 70 trials in different buildings, tasks, and scene layouts from various starting locations. Our user study shows 22% higher task satisfaction using our framework compared to state-of-the-art VLM-based mobile manipulation methods. Finally, we show the potential of using increasingly capable foundation models to improve the system performance further. For more information, see https://robin-lab.cs.utexas.edu/BUMBLE/
|
|
11:25-11:30, Paper ThCT1.3 | |
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation |
|
Liu, Peiqi | New York University |
Guo, Zhanqiu | New York University |
Warke, Mohit | New York University |
Chintala, Soumith | Facebook AI Research |
Paxton, Chris | Meta AI |
Shafiullah, Nur Muhammad (Mahi) | New York University |
Pinto, Lerrel | New York University |
Keywords: Semantic Scene Understanding, Mobile Manipulation, Continual Learning
Abstract: Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system’s applicability in real-world scenarios where environments frequently change due to human intervention or the robot’s own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot’s environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, a 3X improvement over state-of-the-art static systems.
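A minimal sketch of a dynamic spatio-semantic point memory in the spirit described above: voxels carry an open-vocabulary feature and a timestamp, newer observations overwrite older ones, voxels observed to be free can be deleted, and localization queries are answered by cosine similarity against a query feature. The voxel hashing scheme and the random feature vectors are placeholders for the actual vision-language features.

```python
import numpy as np

class SpatioSemanticMemory:
    def __init__(self, voxel=0.05):
        self.voxel, self.store = voxel, {}          # key -> (feature, time)

    def _key(self, p):
        return tuple(np.floor(np.asarray(p) / self.voxel).astype(int))

    def add(self, point, feature, t):
        # Newer observations overwrite older ones at the same voxel.
        self.store[self._key(point)] = (feature / np.linalg.norm(feature), t)

    def remove_if_free(self, point):
        # Called when a voxel is observed to be empty (object moved/removed).
        self.store.pop(self._key(point), None)

    def query(self, text_feature):
        # Return the (approximate) location of the best-matching voxel.
        text_feature = text_feature / np.linalg.norm(text_feature)
        best = max(self.store.items(),
                   key=lambda kv: float(kv[1][0] @ text_feature))
        return np.array(best[0]) * self.voxel

mem = SpatioSemanticMemory()
mem.add([1.0, 0.2, 0.5], np.random.randn(512), t=0)
loc = mem.query(np.random.randn(512))
```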
|
|
11:30-11:35, Paper ThCT1.4 | |
Whole-Body Model Predictive Control for Mobile Manipulation with Task Priority Transition |
|
Wang, Yushi | Tsinghua University |
Chen, Ruoqu | Tsinghua University |
Zhao, Mingguo | Tsinghua University |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control
Abstract: Mobile manipulators enable a wide range of operations with mobility and advanced manipulation capabilities. Despite their potential, existing approaches typically treat the mobile base and the manipulator separately, thereby limiting the optimality of the system for composite whole-body behaviors. In this work, we present a Whole-Body Model Predictive Control framework for mobile manipulation involving tasks with varying timelines. We integrate task priorities across both the task and time dimensions, providing an inherent ability to transition between priorities with enhanced performance. Our approach improves the trajectory tracking performance by up to 36% in terms of manipulability and reduces the maximum velocity during task priority transitions by 53% compared to the existing approach, while maintaining a low computational cost of 4.3 ms, allowing for high reactivity in real-world applications. We demonstrate its effectiveness through a door-opening and traversing behavior, showcasing the first successful implementation of a non-holonomic mobile manipulator in such a scenario. See https://wbmpc.github.io/ for supplemental materials.
|
|
11:35-11:40, Paper ThCT1.5 | |
Dynamic Object Goal Pushing with Mobile Manipulators through Model-Free Constrained Reinforcement Learning |
|
Dadiotis, Ioannis | Italian Institute of Technology |
Mittal, Mayank | ETH Zurich |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Hutter, Marco | ETH Zurich |
Keywords: Mobile Manipulation, AI-Enabled Robotics, Deep Learning in Grasping and Manipulation
Abstract: Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
|
|
11:40-11:45, Paper ThCT1.6 | |
Door-To-Door Parcel Delivery from Supply Point to Users Home with Heterogeneous Robot Team: EuROBIN First Year Robotics Hackathon |
|
Suarez, Alejandro | University of Seville |
Kartmann, Rainer | Karlsruhe Institute of Technology (KIT) |
Leidner, Daniel | German Aerospace Center (DLR) |
Rossini, Luca | Istituto Italiano Di Tecnologia |
Huber, Johann | ISIR, Sorbonne Université |
Azevedo, Carlos | Instituto Superior Técnico - Institute for Systems and Robotics |
Rouxel, Quentin | INRIA |
Bjelonic, Marko | ETH Zurich |
Gonzalez-Morgado, Antonio | Universidad De Sevilla |
Dreher, Christian R. G. | Karlsruhe Institute of Technology (KIT) |
Schmaus, Peter | German Aerospace Center (DLR) |
Laurenzi, Arturo | Istituto Italiano Di Tecnologia |
Hélénon, François | Sorbonne Université |
Serra, Rodrigo | Institute for Systems and Robotics / Instituto Superior Técnico |
Rochel, Olivier | INRIA Institut National De Recherche En Sciences Et Technologies |
Wellhausen, Lorenz | ETH Zürich |
Perez Sanchez, Vicente | University of Seville, GRVC |
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Bauer, Adrian Simon | German Aerospace Center (DLR) |
De Luca, Alessio | Istituto Italiano Di Tecnologia |
Abrini, Mouad | Sorbonne University |
Bettencourt, Rui | Institute for Systems and Robotics / Instituto Superior Técnico |
Mouret, Jean-Baptiste | Inria |
Lee, Joonho | Neuromeka |
Viana Servan, Pablo | GRVC |
Pohl, Christoph | Karlsruhe Institute of Technology (KIT) |
Batti, Nesrine | German Aerospace Center (DLR) |
Vedelago, Diego | Fondazione Istituto Italiano Di Tecnologia (IIT) |
Guda, Vamsi Krishna | Sorbonne University |
Carlos, Alvarez Cia | Universidad De Sevilla |
Reister, Fabian | Karlsruhe Institute of Technology (KIT) |
Friedl, Werner | German Aerospace Center (DLR) |
Burchielli, Corrado | Fondazione Istituto Italiano Di Tecnologia (IIT) |
Baudry, Aline | CNRS |
Peller-Konrad, Fabian | Karlsruhe Institute of Technology (KIT) |
Gumpert, Thomas | German Aerospace Center (DLR) |
Muratore, Luca | Istituto Italiano Di Tecnologia |
Gauthier, Philippe | Sorbonne Université |
Schedl-Warpup, Rebecca | German Aerospace Center (DLR) |
Hutter, Marco | ETH Zurich |
Ivaldi, Serena | INRIA |
Lima, Pedro U. | Instituto Superior Técnico - Institute for Systems and Robotics |
Doncieux, Stéphane | Sorbonne University |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Ollero, Anibal | AICIA. G41099946 |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Keywords: Cooperating Robots, Mobile Manipulation, Service Robotics
Abstract: Logistics and service operations involving parcel preparation, delivery, and unpacking from a supply point to the user's home could be carried out completely by robots in the near future, taking advantage of the capabilities of the different robot morphologies in logistics, outdoor, and domestic environments. The use of robots for parcel delivery can contribute to the goals of sustainability and reduced emissions by exploiting the different locomotion modalities (wheeled, legged, and aerial). This paper reports the development and results obtained from the first robotics hackathon held as part of the European Robotics and Artificial Intelligence Network (euROBIN), involving eight robotic platforms in three domains: 1) an industrial robotic arm for parcel preparation at the supply point, 2) a Centauro robot, a dual-arm aerial manipulator, and a wheeled-legged quadruped for parcel transportation, and 3) two humanoid robots and two commercial mobile manipulators for parcel delivery and unpacking in domestic scenarios. The paper describes the joint operation and the evaluation scenario, the features and capabilities of the robots, particularly those involved in the realization of the tasks, and the lessons learned.
|
|
ThCT2 |
301 |
Bio-Inspired Robot Learning |
Regular Session |
Chair: Tucker, Maegan | Georgia Institute of Technology |
Co-Chair: Krichmar, Jeffrey | University of California, Irvine |
|
11:15-11:20, Paper ThCT2.1 | |
HSRL: A Hierarchical Control System Based on Spiking Deep Reinforcement Learning for Robot Navigation |
|
Yang, Bo | Zhejiang University |
Zhou, Shibo | Zhejiang Lab |
Lin, Chaohui | Zhejiang Lab |
Chai, Qingao | Zhejiang University |
Yan, Rui | Zhejiang University of Technology |
Ma, De | Zhejiang University |
Pan, Gang | Zhejiang University |
Tang, Huajin | Zhejiang University, China |
Keywords: Bioinspired Robot Learning, Reinforcement Learning, Motion Control
Abstract: Reinforcement Learning (RL) has shown promise in robotic navigation tasks, yet applying it to real-world environments remains challenging due to dynamic complexities and the need for dynamically feasible actions. We propose a hierarchical control framework based on Spiking Deep Reinforcement Learning (SDRL) for robust robot navigation in real environments. Our approach utilizes a two-layer architecture: a high-level decision layer powered by a Spiking GRU network for handling partially observable environments, and a low-level executive layer employing Continuous Attractor Neural Networks (CANNs) to ensure precise and continuous actions. This hierarchical structure allows real-time decision-making that respects the physical constraints of the robot. Experimental results show that our method adapts effectively to new environments without fine-tuning and surpasses existing methods in performance. We also explore the implementation on the Darwin3 chip, paving the way for biologically inspired motion control in future robotic applications.
|
|
11:20-11:25, Paper ThCT2.2 | |
Materials Matter: Investigating Functional Advantages of Bio-Inspired Materials Via Simulated Robotic Hopping |
|
Schulz, Andrew | Max Planck Institute for Intelligent Systems |
Ahmad, Ayah | Georgia Institute of Technology |
Tucker, Maegan | Georgia Institute of Technology |
Keywords: Methods and Tools for Robot System Design, Biologically-Inspired Robots, Simulation and Animation
Abstract: In contrast with the diversity of materials found in nature, most robots are designed with some combination of aluminum, stainless steel, and 3D-printed filament. Additionally, robotic systems are typically assumed to follow basic rigid-body dynamics. However, several examples in nature illustrate how changes in physical material properties yield functional advantages. In this paper, we explore how physical materials (non-rigid bodies) affect the functional performance of a hopping robot. In doing so, we address the practical question of how to model and simulate material properties. Through these simulations we demonstrate that material gradients in the leg of a single-limb hopper provide functional advantages compared to homogeneous designs. For example, when considering incline ramp hopping, a material gradient with increasing density provides a 35% reduction in tracking error and a 23% reduction in power consumption compared to homogeneous stainless steel. By applying bio-inspiration to the rigid limbs of a robotic system, we seek to show that future robot fabrication should leverage the material anisotropies of moduli and density found in nature. This would reduce vibrations in the system and offset joint torques while protecting structural integrity against fatigue and wear. This simulation system could inspire intelligent material gradients in future custom-fabricated robotic locomotion devices.
|
|
11:25-11:30, Paper ThCT2.3 | |
SHIRE: Enhancing Sample Efficiency Using Human Intuition in REinforcement Learning |
|
Joshi, Amogh | Purdue University |
Kosta, Adarsh Kumar | Purdue University |
Roy, Kaushik | Purdue University |
Keywords: Reinforcement Learning, Bioinspired Robot Learning, Probabilistic Inference
Abstract: The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning (DeepRL) has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, DeepRL suffers from poor sample efficiency, i.e., it requires a large number of environmental interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25–78% sample efficiency gains across the environments we evaluate, at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
|
|
11:30-11:35, Paper ThCT2.4 | |
Hyperdimensional Computing-Based Federated Learning in Mobile Robots through Synthetic Oversampling |
|
Lee, Hyunsei | DGIST |
Han, WoongJae | DGIST |
Kim, Hojeong | DGIST |
Kwon, Hyukjun | DGIST |
Jang, Shinhyoung | DGIST |
Suh, Il Hong | Hanyang University |
Kim, Yeseong | DGIST |
Keywords: Bioinspired Robot Learning, Networked Robots, Learning from Demonstration
Abstract: Traditional federated learning frameworks, often reliant on deep neural networks, face challenges related to computational demands and privacy risks. In this paper, we present a novel Hyperdimensional (HD) Computing-based federated learning framework designed for resource-constrained mobile robots. Unlike other HD-based learning approaches, ours introduces dynamic encoding, which improves both model accuracy and privacy by continuously updating hypervector representations. To further address the issue of imbalanced data, especially prevalent in robotics tasks, we propose a hypervector oversampling technique, enhancing model robustness. Extensive evaluations on LiDAR-equipped mobile robots demonstrate that our oversampling method outperforms state-of-the-art HD computing frameworks, achieving up to a 22.9% increase in accuracy while maintaining computational efficiency and privacy protection.
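The hypervector oversampling idea can be illustrated with a few lines of NumPy: samples are encoded into bipolar hypervectors by a random projection, a minority class is balanced by adding slightly perturbed copies of its hypervectors, and a class prototype is formed by bundling. The dimensionality, the projection encoder, and the bit-flip rate are illustrative choices, not the paper's exact settings.

```python
import numpy as np

D, F = 10_000, 32                                  # hypervector and feature dims
rng = np.random.default_rng(0)
proj = rng.standard_normal((F, D))

def encode(x):
    # Random-projection encoding into a bipolar hypervector.
    return np.sign(x @ proj)

def oversample(hvs, target_count, flip=0.02):
    # Balance a minority class by adding slightly perturbed copies.
    out = list(hvs)
    while len(out) < target_count:
        hv = hvs[rng.integers(len(hvs))].copy()
        idx = rng.random(D) < flip                 # flip a small fraction of bits
        hv[idx] *= -1
        out.append(hv)
    return out

def prototype(hvs):
    # Bundling: element-wise majority of the class's hypervectors.
    return np.sign(np.sum(hvs, axis=0))

minority = [encode(rng.standard_normal(F)) for _ in range(5)]
balanced = oversample(minority, target_count=50)
class_hv = prototype(balanced)
```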
|
|
11:35-11:40, Paper ThCT2.5 | |
Brain-Inspired Spatial Continuous State Encoding for Efficient Spiking-Based Navigation |
|
Chai, Qingao | Zhejiang University |
Wang, Jiashuo | Zhejiang University |
Jiang, Runhao | Zhejiang University |
Yang, Bo | Zhejiang University |
Yan, Rui | Zhejiang University of Technology |
Tang, Huajin | Zhejiang University, China |
Keywords: Bioinspired Robot Learning, Reinforcement Learning, Cognitive Modeling
Abstract: Spiking neural networks (SNNs) show great potential in mapless navigation tasks due to their low power consumption, but the continuous representation of spatial information poses a challenge to SNN training. Neuroscience findings reveal that spatial cognition cells encode spatial information through population spike patterns. Inspired by this, we propose a navigation method based on SNNs, leveraging spatial cognition cells, which include grid cells (GCs), head direction cells (HDCs), and boundary vector cells (BVCs). Our method integrates spike-based information to achieve precise navigation goal encoding and egocentric environment perception, significantly improving SNN navigation capabilities in complex environments. Simulation and real-world experiments demonstrate that our method achieves significant improvements in navigation success rate and energy efficiency, showcasing superior adaptability across environments. Our work provides a novel approach to developing efficient brain-inspired navigation systems.
|
|
11:40-11:45, Paper ThCT2.6 | |
A Rapid Adapting and Continual Learning Spiking Neural Network Path Planning Algorithm for Mobile Robots |
|
Espino, Harrison | University of California, Irvine |
Bain, Robert | University of California Irvine |
Krichmar, Jeffrey | University of California, Irvine |
Keywords: Neurorobotics, Learning from Experience, Motion and Path Planning
Abstract: Mapping traversal costs in an environment and planning paths based on this map are important for autonomous navigation. We present a neurorobotic navigation system that utilizes a Spiking Neural Network (SNN) Wavefront Planner and E-prop learning to concurrently map and plan paths in a large and complex environment. We incorporate a novel method for mapping which, when combined with the Spiking Wavefront Planner (SWP), allows for adaptive planning by selectively considering any combination of costs. The system is tested on a mobile robot platform in an outdoor environment with obstacles and varying terrain. Results indicate that the system is capable of discerning features in the environment using three measures of cost, (1) energy expenditure by the wheels, (2) time spent in the presence of obstacles, and (3) terrain slope. In just twelve hours of online training, E-prop learns and incorporates traversal costs into the path planning maps by updating the delays in the SWP. On simulated paths, the SWP plans significantly shorter and lower cost paths than A* and RRT*. The SWP is compatible with neuromorphic hardware and could be used for applications requiring low size, weight, and power.
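The planner can be pictured as wavefront propagation on a cost grid in which each cell's traversal cost acts as a delay (the role played by the learned delays in the spiking implementation); a path is then read out by descending arrival times from the start. The grid, the costs, and the 4-connectivity below are toy assumptions for illustration.

```python
import heapq
import numpy as np

def wavefront(cost, goal):
    # Propagate an arrival-time field from the goal; higher cost = longer delay.
    arrival = np.full(cost.shape, np.inf)
    arrival[goal] = 0.0
    pq = [(0.0, goal)]
    while pq:
        t, (r, c) = heapq.heappop(pq)
        if t > arrival[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < cost.shape[0] and 0 <= nc < cost.shape[1]:
                nt = t + cost[nr, nc]
                if nt < arrival[nr, nc]:
                    arrival[nr, nc] = nt
                    heapq.heappush(pq, (nt, (nr, nc)))
    return arrival

def extract_path(arrival, start):
    # Follow decreasing arrival times from the start back to the goal.
    path, cur = [start], start
    while arrival[cur] > 0:
        r, c = cur
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < arrival.shape[0] and 0 <= c + dc < arrival.shape[1]]
        cur = min(nbrs, key=lambda p: arrival[p])
        path.append(cur)
    return path

cost = np.ones((20, 20))
cost[5:15, 10] = 50.0                     # a high-cost wall to route around
path = extract_path(wavefront(cost, goal=(19, 19)), start=(0, 0))
```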
|
|
ThCT3 |
303 |
Space Robotics 1 |
Regular Session |
Chair: Naclerio, Nicholas | University of California, Santa Barbara |
Co-Chair: Beksi, William J. | The University of Texas at Arlington |
|
11:15-11:20, Paper ThCT3.1 | |
LuVo: Lunar Visual Odometry Using Homography-Based Image Feature Matching |
|
Soussan, Ryan | Aerodyne Industries |
McCaffery, John | KBR, Inc |
McMichael, Scott | NASA Ames Research Center, KBR Inc |
Deans, Matthew | NASA Ames Research Center |
Keywords: Space Robotics and Automation, Vision-Based Navigation
Abstract: We present LuVo, an initialization-free stereo visual odometry (VO) method developed for the VIPER lunar rover. We provide a novel stereo registration method using LightGlue image feature matching in a warped, locally planar space that improves matching robustness to larger baseline stereo sequences and repetitive terrain that traditionally challenge odometry approaches. We additionally introduce methods that increase the usable image region for matching by estimating a horizon cutoff in image space and enhance robustness to stereo correspondence failures using a Manhattan distance search for valid stereo points during cloud alignment. We evaluate the performance of LuVo on a dataset of 155 simulated lunar stereo sequences and show that it significantly improves registration accuracy and success rates for clouds separated by both expected driving ranges below eight meters and longer distance translations of up to 16 meters. While LuVo is developed for VIPER, it can be used in other environments featuring slip-prone and repetitive terrain that limit rover travel.
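The sketch below illustrates matching in a homography-warped, locally planar space: the right image is warped by an assumed ground-plane homography, features are matched in the warped pair, and matches are mapped back through the inverse homography. ORB with brute-force matching stands in here for the LightGlue matcher named above, and the homography is assumed to be given.

```python
import cv2
import numpy as np

def warped_matches(img_left, img_right, H_ground):
    # Warp the right image into the locally planar space of the left image.
    h, w = img_left.shape[:2]
    right_warped = cv2.warpPerspective(img_right, H_ground, (w, h))

    # Detect and match features in the warped pair (ORB as a stand-in matcher).
    orb = cv2.ORB_create(2000)
    kpl, dl = orb.detectAndCompute(img_left, None)
    kpr, dr = orb.detectAndCompute(right_warped, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(dl, dr)

    pts_l = np.float32([kpl[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_r = np.float32([kpr[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Undo the warp so correspondences refer to the original right image.
    pts_r = cv2.perspectiveTransform(pts_r, np.linalg.inv(H_ground))
    return pts_l, pts_r
```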
|
|
11:20-11:25, Paper ThCT3.2 | |
Instance Segmentation-Based Hazard Detection with Lunar South Pole Lighting |
|
Cloud, Joseph | NASA Kennedy Space Center |
Buckles, Bradley | NASA Kennedy Space Center |
Muller, Thomas | Bennett Aerospace, NASA Kennedy Space Center |
Beksi, William J. | The University of Texas at Arlington |
Schuler, Jason | NASA Kennedy Space Center |
Keywords: Space Robotics and Automation, Mining Robotics, Object Detection, Segmentation and Categorization
Abstract: This paper addresses rock hazard detection for in-situ resource utilization (ISRU) robotic navigation in the challenging visual environment of the lunar south pole (LSP). We evaluate three state-of-the-art instance segmentation models—Mask R-CNN, YOLOv8, and SAM—using a novel, synthetically generated dataset that simulates LSP-specific illumination challenges at sun angles of 2.5°, 5°, and 7.5°. Additionally, we evaluate these approaches in both up-sun and down-sun driving under low solar-angle light. This study highlights the potential of deep learning-based approaches for improving ISRU operations by reliably identifying visual surface hazards, such as rocks, which may impede robotic navigation and excavation in future lunar missions.
|
|
11:25-11:30, Paper ThCT3.3 | |
Resettable Land Anchor Launcher for Unmanned Rover Rescue and Slope Climbing |
|
Kainth, Aaryan | University of California Santa Barbara |
Krohn, Andrew R. | University of California Santa Barbara |
Johnson, Kyle A. | NASA Glenn Research Center |
Schepelmann, Alexander | Carnegie Mellon University |
Hawkes, Elliot Wright | University of California, Santa Barbara |
Naclerio, Nicholas | University of California, Santa Barbara |
Keywords: Space Robotics and Automation, Mechanism Design
Abstract: Unmanned planetary rovers have traversed kilometers of Lunar and Martian terrain while performing valuable science. However, they still face mobility challenges including steep slopes and unstable soil that can entrap vehicles, as demonstrated by NASA’s Spirit rover. Vehicles on Earth can depend on a human operator or rescue vehicle to tow them out of an entrapment, but remote rovers cannot, limiting their route to highly conservative path selections. To increase rover mobility on slopes and unstable soils, we present a resettable anchor launcher for independent self-rescue. The device launches a tethered land anchor away from the rover and then uses a winch to tow the rover up a hill or out of an entrapment. This paper presents the design of the launcher and its integration into a half-meter-long rover mobility platform with field testing at the NASA Glenn Research Center SLOPE Lab. We demonstrate repeatable launching and winching to help the rover climb a 17° slope of loose GRC-1 Lunar regolith simulant that it otherwise could not climb. Our work presents an alternative method to increase rover mobility, especially up slopes, and enables independent rover rescue, which could eventually increase mission duration and reduce risk of entrapment during extraterrestrial exploration.
|
|
11:30-11:35, Paper ThCT3.4 | |
SOF-E: An Energy Efficient Robot for Collaborative Transport and Placement of Mechanical Meta-Material Modules |
|
Moon, Inchul | Seoul National University |
Sebastianelli, Frank | NASA |
Gregg, Christine | NASA Ames Research Center |
Cheung, Kenneth C. | National Aeronautics and Space Administration (NASA) |
Keywords: Space Robotics and Automation, Robotics and Automation in Construction, Cooperating Robots
Abstract: In-space assembly is a key capability to enable construction of large-scale structures required for sustained human presence in space. Robotic assembly is critical to reduce required crew time and risk, while modularity ensures that solutions are versatile and adaptive to complex mission concepts. NASA’s Automated Reconfigurable Mission Adaptive Digital Assembly Systems (ARMADAS) project demonstrated that robots with relatively low cost, size, and degrees-of-freedom (DoFs) can be used for large-scale modular lattice structure assembly. This is possible by using the structural modules for robotic systems metrology and error mitigation. Robots with reduced complexity may lead to advantages in initial and maintenance cost, offering an alternative to large, complex, and expensive robots. In this paper, we describe the Structure Omni-directional Foldable Explorer (SOF-E), a robot with significantly lower mass and DoF compared to the previous ARMADAS robot architecture. Although SOF-E is a five DoF robot with only two or three control states per actuator, it is capable of transporting and placing structural modules by collaborating with other instances of itself. We discuss the mechanical design and architecture of SOF-E, including analysis of energy usage during each operation. Experiments demonstrate that during locomotion and module transport tasks, SOF-E requires significantly lower energy than the previous cargo transport robot architecture, the Scaling Omni-directional Lattice Locomoting Explorer (SOLL-E). The cost of transport metric is used to compare the energy efficiency of the operation.
|
|
11:35-11:40, Paper ThCT3.5 | |
Quarry-Bot: A Reconfigurable Cable-Suspended Robot for Lunar Site Engineering |
|
Castrejon, Zahir | University of Nevada Las Vegas |
Oh, Paul Y. | University of Nevada, Las Vegas (UNLV) |
Keywords: Field Robots, Robotics and Automation in Agriculture and Forestry, Space Robotics and Automation
Abstract: This paper introduces Quarry-Bot, a Reconfigurable Cable-Suspended Robot developed to support the NASA Artemis program’s efforts in preparing for the long-term colonization of the Moon and Mars. Quarry-Bot autonomously clears debris on the lunar surface, a key step in site preparation for future habitats and infrastructure. The system utilizes active control strategies, combined with the Moon’s lower gravity, to perform underhand rock tosses as a scalable approach to extraterrestrial site preparation. Its reconfigurable structure, including motorized anchor points and a lightweight tripod design, adjusts cable tensions to generate swing motions for debris displacement. The system is driven by two Dynamixel MX-106 motors for movement and steering, along with a NEMA 17 stepper motor for cable adjustments. A decentralized control system, managed by Raspberry Pi units, coordinates these components. Simulations and experiments conducted under both Earth and lunar gravity conditions demonstrate the effectiveness of Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) strategies in achieving rock throws. Quarry-Bot reaches swing angles and projects rocks over distances that may support lunar site clearing and overall engineering purposes. The paper concludes by discussing potential areas for further system refinement, including adjustments for different terrain conditions and improved actuation strategies for lunar missions.
|
|
11:40-11:45, Paper ThCT3.6 | |
A Tugging Controller That Maximizes Lateral Resistive Force by Mounding Sandy Terrain |
|
Moon, Deaho | University of California Berkeley |
Huang, Chris | University of California Berkeley |
Page, Justin | UC Berkeley Mechanical Engineering |
Stuart, Hannah | UC Berkeley |
Keywords: Space Robotics and Automation, Field Robots, Sensor-based Control
Abstract: Sandy environments present challenges for robotic space rovers and systems due to reduced traction, limiting mobility and tugging force. This paper presents an anchoring method that utilizes a winching system to create a sand mound in front of a mobile agent dragged through the media. The proposed controller is designed to consistently achieve real-time capture of close-to-maximal lateral sand mound resistive force, even when applied to varied uneven terrains, like holes or waves. Notably, tugging is non-reversible, so suitable peaks should be captured before breakdown and without necessarily knowing the global optimum a priori. The controller logic tracks both tugging force and agent pitch gradients to detect terrain conditions and peak force trends. Results show that the controller captures an average 92% of the maximum forces, within the previously winched workspace tested, across three different granular media with four varying structured terrain features. The controller achieves higher resistive force peaks on terrains with geometric features, as opposed to flat sand. We conclude that sand mounding through tugging is a viable means to generate robotic resistive forces for unknown sandy terrains, a simple yet effective anchoring mechanism.
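One plausible reading of the peak-capture logic is sketched below: keep winching while the smoothed tugging force rises or the agent is still pitching up onto the mound, and latch (stop) once the force gradient turns negative after a meaningful force has built up, so a near-peak value is captured before breakdown. The smoothing window and the thresholds are assumptions, not the paper's tuned values.

```python
import numpy as np

def capture_step(force_hist, pitch_hist, window=5, min_force=20.0):
    # Decide whether to keep winching or latch the current (near-peak) force.
    if len(force_hist) < window + 1:
        return "winch"
    f = np.convolve(force_hist, np.ones(window) / window, mode="valid")
    df = f[-1] - f[-2]                        # smoothed force gradient
    dpitch = pitch_hist[-1] - pitch_hist[-window]
    rising_mound = dpitch > 0                 # agent still pitching up onto the mound
    if f[-1] > min_force and df < 0 and not rising_mound:
        return "stop"                         # force gradient turned negative: capture
    return "winch"
```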
|
|
ThCT4 |
304 |
Image and 3D Segmentation 2 |
Regular Session |
Chair: Burgard, Wolfram | University of Technology Nuremberg |
Co-Chair: Marron, Pedro Jose | University of Duisburg-Essen |
|
11:15-11:20, Paper ThCT4.1 | |
RMSeg-UDA: Unsupervised Domain Adaptation for Road Marking Segmentation under Adverse Conditions |
|
Cai, Yi-Chang | National Chung Cheng University |
Hsiao, Heng Chih | National Chung Cheng University |
Chiu, Wei-Chen | National Chiao Tung University |
Lin, Huei-Yung | National Taipei University of Technology |
Chiao-Tung, Chan | Mechanical and Mechatronics Systems Research Laboratories, Indus |
Keywords: Computer Vision for Transportation, Vision-Based Navigation, Intelligent Transportation Systems
Abstract: The segmentation of road markings plays a crucial role in visual perception for the autonomous driving system. It enables vehicles to recognize road markings at the pixel level, and facilitates subsequent path planning, localization, and map construction tasks. Current techniques mainly focus on normal driving scenes (i.e., clear daytime), and performance decreases significantly under adverse weather conditions. This work proposes RMSeg-UDA: an unsupervised domain adaptive road marking segmentation framework. By combining schedule self-training and class-conditioned adversarial training, the network utilizes both labeled normal data and unlabeled data from other domains to train a road marking segmentation model. For the evaluation on adverse conditions, a new image dataset, RLMD-AC, is established with rainy and nighttime driving scenes. The experiments conducted using both public and our datasets have demonstrated the effectiveness of the proposed technique. Code and dataset are available at https://github.com/stu9113611/RMSeg-UDA.
|
|
11:20-11:25, Paper ThCT4.2 | |
Enhancing the Utilization of Color Information in Point Cloud Semantic Segmentation |
|
Guo, Xinyu | Wuhan University |
Gao, Zhi | Temasek Laboratories @ NUS |
Zhou, Zhiyu | Wuhan University |
Wang, Jingshi | 1.School of Aeronautics and Astronautics, Shanghai Jiao Tong Uni |
Tang, Luliang | Wuhan University |
Cao, Min | Wuhan Guanggu Zoyon Science and Technology Company Ltd., Wuhan 4 |
Keywords: Recognition, RGB-D Perception, Sensor Fusion
Abstract: Point cloud semantic segmentation is crucial in various applications such as autonomous driving, robotics, and virtual reality, aiming to assign labels to each point in a cloud to reflect spatial relationships and boundaries. While previous methods primarily focus on geometric features, they often overlook the auxiliary role of color information, especially in scenes where geometric structures are less distinct. In this paper, we propose the Color Point Cloud Enhancement (CPCE) method to effectively leverage color information for improved 3D scene understanding. CPCE introduces a color information enhancement module with multi-scale consistency, enriching point features throughout the encoder stages. Additionally, we develop a novel contrastive learning module that uses relative color coordinates for point cloud serialization, allowing for the capture of positive and negative samples from distant points with similar color textures. Furthermore, we design a contrastive learning module tailored for scenes with weak geometric structures, enhancing feature representation through color-augmented contrast. Our method achieved a 78.1% mIoU on the ScanNet dataset, outperforming existing models trained on a single dataset. These results highlight the effectiveness of CPCE in scenarios where traditional methods struggle, particularly in enhancing segmentation accuracy by utilizing color as a critical feature.
|
|
11:25-11:30, Paper ThCT4.3 | |
UltraFastCrackSeg: A Lightweight Real-Time Crack Segmentation Model with Task-Oriented Pretraining |
|
Qi, Weiqing | HKUST |
Zhao, Guoyang | HKUST(GZ) |
Ma, Fulong | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Yang, Yang | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Computer Vision for Automation, Object Detection, Segmentation and Categorization, Robotics in Under-Resourced Settings
Abstract: Crack segmentation is pivotal for structural health monitoring, enabling the timely maintenance of critical infrastructure such as bridges and roads. However, existing deep learning models are often too computationally intensive for deployment on resource-constrained devices. To address this limitation, we introduce UltraFastCrackSeg, a lightweight model designed for real-time crack segmentation that effectively balances high accuracy with low computational demands. Featuring an efficient encoder-decoder architecture, our model significantly reduces parameter count and floating point operations (FLOPs) compared to current methods. We further enhance performance through a self-supervised pretraining approach that employs a novel, task-oriented masking strategy, thereby improving feature extraction. Experiments across multiple datasets demonstrate that UltraFastCrackSeg achieves state-of-the-art Intersection over Union (IoU) and F1 scores while maintaining a compact model size and high inference speed. Evaluations on a low-power CPU device confirm its capability to achieve up to 80 frames per second (FPS) with ONNX runtime optimization, making it highly suitable for real-time, on-site applications. These findings establish UltraFastCrackSeg as a robust and efficient solution for practical crack detection tasks.
|
|
11:30-11:35, Paper ThCT4.4 | |
Enhancing 3D Scene Graphs with Real-Time Room Classification |
|
Janzon, Simon | University of Duisburg Essen |
Medina Sanchez, Carlos | Duisburg Essen University |
Golkowski, Alexander Julian | University of Duisburg-Essen |
Handte, Marcus | University of Duisburg-Essen |
Marron, Pedro Jose | University of Duisburg-Essen |
Keywords: Semantic Scene Understanding, Software Architecture for Robotic and Automation
Abstract: In recent years, 3D scene graphs have become a critical tool in robotics and computer vision for enabling systems to understand both the geometric and semantic aspects of their surroundings. These data structures represent spatial and semantic relationships between objects in a three-dimensional environment, supporting tasks like navigation, object manipulation, and scene understanding. This paper presents a real-time pipeline for 3D scene graph generation that offers flexibility in image segmentation techniques while incorporating room classification that is based on a Random Forest model. Our work enables robots to dynamically update their understanding of complex and large-scale environments in real-time. We evaluate our approach systematically on a dataset and in a real-life experiment. The results demonstrate the capability of running our solution at over 10 Hz on an Nvidia Jetson AGX Orin SoC while also scaling favorably in larger environments. Our proposed room classification approach predicts classes with an average accuracy of 80%.
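As a rough illustration of the room-classification idea summarized above (not the authors' implementation), the following sketch trains a scikit-learn Random Forest on hypothetical per-room object-count features and re-classifies a room node as new detections arrive; the feature layout, counts, and labels are invented for illustration.
```python
# Minimal illustrative sketch (not the paper's code): classify rooms from
# object-occurrence features with a Random Forest, as a 3D scene graph
# pipeline might do after aggregating detections per room node.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features: counts of detected object classes per room,
# e.g. [bed, sofa, oven, desk, toilet].
X_train = np.array([
    [2, 0, 0, 1, 0],   # bedroom
    [0, 2, 0, 0, 0],   # living room
    [0, 0, 1, 0, 0],   # kitchen
    [0, 0, 0, 0, 1],   # bathroom
])
y_train = ["bedroom", "living room", "kitchen", "bathroom"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# At runtime, a room node's aggregated object counts can be re-classified
# whenever new detections arrive, keeping the scene graph up to date.
new_room = np.array([[1, 0, 0, 2, 0]])
print(clf.predict(new_room), clf.predict_proba(new_room))
```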
|
|
11:35-11:40, Paper ThCT4.5 | |
MFSeg: Efficient Multi-Frame 3D Semantic Segmentation |
|
Huang, Chengjie | University of Waterloo |
Czarnecki, Krzysztof | University of Waterloo |
Keywords: Deep Learning for Visual Perception
Abstract: We propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. By aggregating point cloud sequences at the feature level and regularizing the feature extraction and aggregation process, MFSeg reduces computational overhead while maintaining high accuracy. Moreover, by employing a lightweight MLP-based point decoder, our method eliminates the need to upsample redundant points from past frames. Experiments on the nuScenes and Waymo datasets show that MFSeg outperforms existing methods, demonstrating its effectiveness and efficiency.
|
|
11:40-11:45, Paper ThCT4.6 | |
A Good Foundation Is Worth Many Labels: Label-Efficient Panoptic Segmentation |
|
Vödisch, Niclas | University of Freiburg |
Petek, Kürsat | University of Freiburg |
Käppeler, Markus | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Robotics and Automation in Agriculture and Forestry
Abstract: A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We underline the relevance of our approach by employing PASTEL to important robot perception use cases from autonomous driving and agricultural robotics. In extensive experiments, we demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations. The code of our work is publicly available at https://pastel.cs.uni-freiburg.de.
|
|
ThCT5 |
305 |
Explainable AI in Robotics |
Regular Session |
Chair: Chernova, Sonia | Georgia Institute of Technology |
Co-Chair: Feil-Seifer, David | University of Nevada, Reno |
|
11:15-11:20, Paper ThCT5.1 | |
CE-MRS: Contrastive Explanations for Multi-Robot Systems |
|
Schneider, Ethan | Georgia Institute of Technology |
Wu, Daniel | Georgia Institute of Technology |
Das, Devleena | Georgia Institute of Technology |
Chernova, Sonia | Georgia Institute of Technology |
Keywords: Design and Human Factors, Human Factors and Human-in-the-Loop, Multi-Robot Systems
Abstract: As the complexity of multi-robot systems grows to incorporate a greater number of robots, more complex tasks, and longer time horizons, the solutions to such problems often become too complex to be fully intelligible to human users. In this work, we introduce an approach for generating natural language explanations that justify the validity of the system's solution to the user, or else aid the user in correcting any errors that led to a suboptimal system solution. Toward this goal, we first contribute a generalizable formalism of contrastive explanations for multi-robot systems, and then introduce a holistic approach to generating contrastive explanations for multi-robot scenarios that selectively incorporates data from multi-robot task allocation, scheduling, and motion-planning to explain system behavior. Through user studies with human operators, we demonstrate that our integrated contrastive explanation approach leads to significant improvements in users' ability to identify and solve system errors, which in turn significantly improves overall multi-robot team performance.
|
|
11:20-11:25, Paper ThCT5.2 | |
Affordance-Based Explanations of Robot Navigation |
|
Halilovic, Amar | Ulm University |
Krivic, Senka | University of Sarajevo |
Keywords: Social HRI, Human-Centered Robotics
Abstract: This paper introduces affordance-based explanations of robot navigational decisions. The rationale behind affordance-based explanations draws on the theory of affordances, a principle rooted in ecological psychology that describes potential actions the objects in the environment offer to the robot. We demonstrate how affordances can be incorporated into visual and textual explanations for common robot navigation and path-planning scenarios. Furthermore, we formalize and categorize the concept of affordance-based explanations and connect it to existing explanation types in robotics. We present the results of a user study that shows participants to be, on average, highly satisfied with visual-textual, i.e., multimodal, affordance-based explanations of robot navigation. Furthermore, we investigate the complexity of different types of textual affordance-based explanations. Our research contributes to the expanding domain of explainable robotics, focusing on explaining robot actions in navigation.
|
|
11:25-11:30, Paper ThCT5.3 | |
Explainable Reinforcement Learning Via Dynamic Mixture Policies |
|
Schier, Maximilian | Leibniz Universität Hannover |
Schubert, Frederik | Leibniz University Hannover |
Rosenhahn, Bodo | Institute of Information Processing, Leibniz Universität Hannove |
Keywords: Reinforcement Learning, Acceptability and Trust
Abstract: Learning control policies using deep reinforcement learning has shown great success for a variety of applications, including robotics and automated driving. A key factor limiting the adoption of RL in the real world is the lack of trust in the decision-making process of such policies. Therefore, explainability is a requirement of any RL agent operating in the real world. In this work, we propose a family of control policies that are explainable-by-design regarding individual observation components on object-based scene representations. By estimating diagonal squashed Gaussian and categorical mixture distributions on sub-spaces of the decomposed observations, we develop stochastic policies with easy-to-read explanations of the decision-making process. Our design is generally applicable to any RL algorithm using stochastic policies. We showcase the explainability on an extensive suite of single- and multi-agent simulations, set- and sequence-based high-level scenes, and discrete and continuous action spaces, with performance at least on par with, or better than, standard policy architectures. In additional experiments, we analyze the robustness of our approach to its single additional hyper-parameter and examine its potential for very low computational requirements with tiny policies.
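To make the policy design above concrete, here is a minimal, hypothetical sketch of one squashed-Gaussian head acting on a single decomposed observation component; it is not the authors' code, and the network sizes and clamp ranges are assumptions.
```python
# Illustrative sketch only: a per-component policy head that outputs a
# diagonal Gaussian over a sub-action and squashes samples with tanh.
# Combining several such heads over decomposed observation components is
# the idea behind explainable-by-design mixture policies.
import torch
import torch.nn as nn

class SquashedGaussianHead(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs_component):
        h = self.net(obs_component)
        mean, log_std = self.mean(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mean, log_std.exp())
        raw_action = dist.rsample()
        # The per-component mean is a readable summary of what this
        # observation component "wants" the agent to do.
        return torch.tanh(raw_action), mean

head = SquashedGaussianHead(obs_dim=4, act_dim=2)
action, explanation = head(torch.randn(1, 4))
```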
|
|
11:30-11:35, Paper ThCT5.4 | |
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation |
|
Chang, Chun-Peng | DFKI |
Pagani, Alain | German Research Center for Artificial Intelligence |
Stricker, Didier | German Research Center for Artificial Intelligence |
Keywords: Multi-Modal Perception for HRI, Deep Learning for Visual Perception, Visual Learning
Abstract: Multimodal Large Language Models (MLLMs) have made significant progress in tasks such as image captioning and question answering. However, while these models can generate realistic captions, they often struggle with providing precise instructions, particularly when it comes to localizing and disambiguating objects in complex 3D environments. This capability is critical as MLLMs become more integrated with collaborative robotic systems. In scenarios where a target object is surrounded by similar objects (distractors), robots must deliver clear, spatially-aware instructions to guide humans effectively. We refer to this challenge as contextual object localization and disambiguation, which imposes stricter constraints than conventional 3D dense captioning, especially in ensuring target exclusivity. In response, we propose simple yet effective techniques to enhance the model's ability to localize and disambiguate target objects. Our approach not only achieves state-of-the-art performance on conventional metrics that evaluate sentence similarity, but also demonstrates improved 3D spatial understanding through a 3D visual grounding model.
|
|
11:35-11:40, Paper ThCT5.5 | |
Towards Transparent Multi-Agent Autonomous Systems through Principled Multi-Source Knowledge Distillation |
|
Zhongzheng, Guo | Chinese Academy of Military Science |
Chaoran, Wang | Zhejiang University |
Zhu, Xiaozhou | Chinese Academy of Military Science |
Changju, Wu | Zhejiang University |
Deng, Baosong | Academy of Military Science |
Yao, Wen | Chinese Academy of Military Science |
Keywords: AI-Based Methods, Behavior-Based Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Many real-world robotic applications can be formulated as Multi-Agent Path-Finding (MAPF) problems and approximated using Multi-Agent Reinforcement Learning (MARL) algorithms. However, the opaque nature of the black-box neural network models employed by MARL algorithms has impeded their widespread adoption due to concerns over interpretability, debugging, and user trust. To address these limitations, we propose an interpretable MAPF framework that emulates a group of n path-finding agents optimized through reinforcement learning (RL) using behavior trees (BTs), where n is the number of agents in path-finding scenarios. Expert behavior datasets consisting of state-action trajectories from MARL algorithms are generated, and a knowledge distillation approach is employed to reduce the size of the datasets and extract implicit rules. Additionally, a principled rules factorization technique based on Boolean algebra theory is utilized to prune the behavior rules and create more compact BT representations. The proposed framework is evaluated on randomly generated MAPF scenarios and demonstrates superior performance compared to conventional BT generation methods. This paper advances the field of interpretable AI by enabling the extraction of understandable decision-making processes from complex reinforcement learning models in multi-agent systems.
|
|
11:40-11:45, Paper ThCT5.6 | |
Through the Clutter: Exploring the Impact of Complex Environments on the Legibility of Robot Motion |
|
Schmidt-Wolf, Melanie | University of Nevada, Reno |
Becker, Tyler J | University of Nevada, Reno |
Oliva, Denielle | University of Nevada, Reno |
Nicolescu, Monica | University of Nevada, Reno |
Feil-Seifer, David | University of Nevada, Reno |
Keywords: Intention Recognition, Human-Robot Collaboration, Social HRI
Abstract: The environments in which the collaboration of a robot would be the most helpful to a person are frequently uncontrolled and cluttered with many objects present. Legible robot arm motion is crucial in tasks like these in order to avoid possible collisions, improve the workflow and help ensure the safety of the person. Prior work in this area, however, focuses on solutions tested only in uncluttered environments, and few results have been reported for cluttered environments. In this research we present a measure of clutteredness based on an entropic measure of the environment, and a novel motion planner based on potential fields. Both our measure and the planner were tested in a cluttered environment meant to represent a more typical tool-sorting task for which the person would collaborate with a robot. The in-person validation study with Baxter robots shows a significant improvement in legibility of our proposed legible motion planner compared to the current state-of-the-art legible motion planner in cluttered environments. Further, the results show a significant difference in the performance of the planners in cluttered and uncluttered environments, and the need to further explore legible motion in cluttered environments. We argue that the inconsistency of our results in cluttered environments with those obtained from uncluttered environments points out several important issues with the current research performed in the area of legible motion planners.
|
|
ThCT6 |
307 |
Perception for Manipulation 2 |
Regular Session |
Chair: Chrysostomou, Dimitrios | Aalborg University |
Co-Chair: Grotz, Markus | University of Washington (UW) |
|
11:15-11:20, Paper ThCT6.1 | |
OpenSU3D: Open World 3D Scene Understanding Using Foundation Models |
|
Mohiuddin, Rafay | Technical University of Munich |
Prakhya, Sai Manoj | Huawei Technologies Deutschland GmbH |
Collins, Fiona | TUM |
Liu, Ziyuan | Huawei Group |
Borrmann, Andre | Technical University of Munich |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, RGB-D Perception
Abstract: In this paper, we present a novel, scalable approach for constructing open set, instance-level 3D scene representations, advancing open world understanding of 3D environments. Existing methods require pre-constructed 3D scenes, face scalability issues due to per-point feature representation, and additionally struggle with contextual queries. Our method overcomes these limitations by incrementally building instance-level 3D scene representations using 2D foundation models, and efficiently aggregating instance-level details such as masks, feature vectors, names, and captions. We introduce fusion schemes for feature vectors to enhance their contextual knowledge and performance on complex queries. Additionally, we explore large language models for robust automatic annotation and spatial reasoning tasks. We evaluate our proposed approach on multiple scenes from the ScanNet and Replica datasets, demonstrating zero-shot generalization capabilities and exceeding current state-of-the-art methods in open world 3D scene understanding. Project page: https://opensu3d.github.io/
|
|
11:20-11:25, Paper ThCT6.2 | |
Task-Aware Semantic Map: Autonomous Robot Task Assignment Beyond Commands |
|
Choi, Daewon | Hanyang University |
Lee, Ho Sung | Hanyang Univ |
Hwang, Soeun | Hanyang University |
Oh, Yoonseon | Hanyang University |
Keywords: Semantic Scene Understanding, Task Planning, Perception-Action Coupling
Abstract: With recent advancements in Large Language Models, task planning methods that interpret human commands have garnered significant attention. However, as home robots become more common, specifying every daily task could become impractical. This paper introduces a novel semantic map called the Task-Aware Semantic Map (TASMap), which enables robots to autonomously assign and propose necessary tasks in a scene without explicit human commands. The core innovation of this approach is the ability of TASMap to comprehend the context of objects within a scene and autonomously generate task proposals. This capability significantly advances autonomous robotic assistance, reducing the dependency on specific commands and enhancing interaction with environments. We present two key applications of TASMap: contextual task proposal and spatial task proposal. Our results, verified across 35 diverse and realistically disordered scenes, underscore the effectiveness of TASMap in both simulation and real-world environments.
|
|
11:25-11:30, Paper ThCT6.3 | |
High-Quality Unknown Object Instance Segmentation Via Quadruple Boundary Error Refinement |
|
Back, Seunghyeok | Korea Institute of Machinery & Materials |
Lee, Sangbeom | Gwangju Institute of Science and Technology |
Kim, Kangmin | Gwangju Institute of Science and Technology |
Lee, Joosoon | Gwangju Institute of Science and Technology |
Shin, Sungho | Hyundai Motors |
Maeng, Jemo | Gwangju Institute of Science and Technology(GIST) |
Lee, Kyoobin | Gwangju Institute of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: Accurate and efficient segmentation of unknown objects in unstructured environments is essential for robotic manipulation. Unknown Object Instance Segmentation (UOIS), which aims to identify all objects in unknown categories and backgrounds, has become a key capability for various robotic tasks. However, existing methods struggle with over-segmentation and under-segmentation, leading to failures in manipulation tasks such as grasping. To address these challenges, we propose QuBER (Quadruple Boundary Error Refinement), a novel error-informed refinement approach for high-quality UOIS. QuBER first estimates quadruple boundary errors—true positive, true negative, false positive, and false negative pixels—at the instance boundaries of the initial segmentation. It then refines the segmentation using an error-guided fusion mechanism, effectively correcting both fine-grained and instance-level segmentation errors. Extensive evaluations on three public benchmarks demonstrate that QuBER outperforms state-of-the-art methods and consistently improves various UOIS methods while maintaining a fast inference time of less than 0.1 seconds. Furthermore, we show that QuBER improves the success rate of grasping target objects in cluttered environments. Code and supplementary materials are available at https://sites.google.com/view/uois-quber.
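The notion of quadruple boundary errors can be illustrated with a small sketch (not QuBER's actual pipeline): given binary predicted and ground-truth masks, pixels in a band around the instance boundaries are labeled as TP/TN/FP/FN, the kind of signal an error-guided refinement stage could consume. The masks and band width below are invented for illustration.
```python
# Toy illustration of quadruple boundary errors on binary masks.
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def boundary(mask):
    # Pixels on the mask border: mask minus its erosion.
    return mask & ~binary_erosion(mask)

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True   # hypothetical prediction
gt   = np.zeros((8, 8), bool); gt[3:7, 2:6] = True     # hypothetical ground truth

# Restrict the analysis to a band around predicted and true boundaries.
band = binary_dilation(boundary(pred) | boundary(gt), iterations=1)
tp = band & pred & gt
tn = band & ~pred & ~gt
fp = band & pred & ~gt
fn = band & ~pred & gt

# These four maps describe where and how the initial segmentation errs,
# which an error-guided fusion module could use to correct boundaries.
print(int(fp.sum()), int(fn.sum()))
```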
|
|
11:30-11:35, Paper ThCT6.4 | |
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph |
|
Linok, Sergey | MIPT |
Zemskova, Tatiana | AIRI, MIPT |
Ladanova, Svetlana | MIPT |
Titkov, Roman | Moscow Institute of Physics and Technology |
Yudin, Dmitry | Moscow Institute of Physics and Technology |
Monastyrny, Maxim | Sberbank of Russia |
Valenkov, Aleksei | Sberbank of Russia |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, RGB-D Perception
Abstract: Locating objects described in natural language presents a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object grounding with simple (bare) queries, but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs a 3D scene graph representation with metric and semantic edges and utilizes a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to construct a 3D object-centric map and an advanced raycasting algorithm with a 2D vision-language model to describe them as graph nodes. On the Replica and ScanNet datasets, we have demonstrated that BBQ takes a leading place in open-vocabulary 3D semantic segmentation compared to other zero-shot methods. Also, we show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On the challenging Sr3D+, Nr3D and ScanRefer benchmarks, our deductive approach demonstrates a significant improvement, enabling object grounding via complex queries compared to other state-of-the-art methods. The combination of our design choices and software implementation has resulted in significant data processing speed in experiments on the robot's on-board computer. This promising performance enables the application of our approach in intelligent robotics projects. We made the code publicly available at linukc.github.io/BeyondBareQueries.
|
|
11:35-11:40, Paper ThCT6.5 | |
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space |
|
He, Yonghao | D-Robotics |
Su, Hu | Institute of Automation, Chinese Academy of Science |
Yu, Haiyong | D-Robotics |
Yang, Cong | Soochow University |
Sui, Wei | Soochow University |
Wang, Cong | D-Robotics |
Liu, Song | ShanghaiTech University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications due to their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), which is a practical and highly efficient solution for supporting real-time OSOD tasks in robotic systems. Specifically, DOSOD builds upon the YOLO-World pipeline by integrating a vision-language model (VLM) with a detector. A Multilayer Perceptron (MLP) adaptor is developed to transform text embeddings extracted by the VLM into a joint space, within which the detector learns the region representations of class-agnostic proposals. Cross-modality features are directly aligned in the joint space, avoiding complex feature interactions and thereby improving computational efficiency. DOSOD operates like a traditional closed-set detector during the testing phase.
|
|
11:40-11:45, Paper ThCT6.6 | |
LBSNet: Lightweight Joint Boundary Detection and Semantic Segmentation for Transparent and Reflective Objects |
|
Tong, Ling | Southeast University |
Qian, Kun | Southeast University |
Jing, Xingshuo | Southeast University |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Computer Vision for Automation
Abstract: Accurate visual detection of transparent and reflective objects remains a challenging issue for mobile manipulators. For the most common depth cameras and LiDAR sensors, the distinctive optical attributes inherent in both transparent and reflective objects pose a significant challenge. To address this problem, this study proposes a lightweight joint boundary detection and semantic segmentation network named LBSNet. LBSNet aims to enhance the perception of transparent and reflective objects in complex and dynamic environments, using RGB images only. It leverages the synergy between boundary detection and semantic segmentation through feature fusion and a multitask learning mechanism. The encoder consists of two paths: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The gated channel adaptive (GCA) module enhances boundary features by learning channel parameters. The dynamic adaptive feature fusion (DAFF) module dynamically adjusts semantic and boundary information through cross-feature fusion. These methods effectively capture the distinctive characteristics of transparent and reflective objects, such as light refraction, boundary blurring and low contrast. Experimental results show that LBSNet achieves higher accuracy and faster processing speed on multiple public datasets compared with existing methods. Moreover, its lightweight design makes it suitable for resource-constrained mobile manipulators.
|
|
ThCT7 |
309 |
Marine Robotics 6 |
Regular Session |
Chair: Johnson-Roberson, Matthew | Carnegie Mellon University |
Co-Chair: Roznere, Monika | Binghamton University |
|
11:15-11:20, Paper ThCT7.1 | |
Stonefish: Supporting Machine Learning Research in Marine Robotics |
|
Grimaldi, Michele | University of Girona |
Cieslak, Patryk | Universitat De Girona |
Ochoa, Eduardo | Universitat De Girona |
Bharti, Vibhav | Heriot-Watt University |
Rajani, Hayat | University of Girona |
Carlucho, Ignacio | University of Edinburgh |
Koskinopoulou, Maria | Heriot-Watt University |
Petillot, Yvan R. | Heriot-Watt University |
Gracias, Nuno | University of Girona |
Keywords: Marine Robotics, Simulation and Animation
Abstract: Simulations are highly valuable in marine robotics, offering a cost-effective and controlled environment for testing in the challenging conditions of underwater and surface operations. Given the high costs and logistical difficulties of real-world trials, simulators capable of capturing the operational conditions of subsea environments have become key in developing and refining remotely-operated and autonomous underwater vehicles. This paper highlights recent enhancements to the Stonefish simulator, an advanced open-source platform supporting development and testing of marine robotics solutions. Key updates include a suite of additional sensors, such as an event-based camera, a thermal camera, and an optical flow camera, as well as visual light communication, support for tethered operations, improved thruster modelling, more flexible hydrodynamics, and enhanced sonar accuracy. These developments and an automated annotation tool significantly bolster Stonefish’s role in marine robotics research, especially in the field of deep learning, where training data with a known ground truth is hard or impossible to collect. https://github.com/patrykcieslak/stonefish
|
|
11:20-11:25, Paper ThCT7.2 | |
Sea-U-Whale: A Reconfigurable Marine Robot with Multi-Modal Motion |
|
Ding, Wendi | The Chinese University of Hong Kong |
Zhao, Zuoquan | The Chinese University of Hong Kong |
Yan, Ruixin | The Chinese University of Hong Kong |
Gao, Songqun | The Chinese University of Hong Kong |
Guo, Zixuan | The Chinese University of Hong Kong |
Liu, Xuchen | The Chinese University of Hong Kong |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Marine Robotics, Actuation and Joint Mechanisms, Product Design, Development and Prototyping
Abstract: As marine exploration becomes increasingly important, marine robots have been extensively studied in recent years. Although some well-designed robots have already accomplished various missions successfully, most existing robots struggle to adapt to diverse demands or tasks due to their fixed structures and the complexity of the marine environment. To address these challenges, we present a novel reconfigurable marine robot named Sea-U-Whale. This system can dynamically adjust its actuator configuration in the marine environment, providing superior environmental adaptability, maneuverability, and versatile mobility. Considering the demands of unmanned ocean exploration, an active reconfiguration mechanism and three distinct vehicle modes are designed for optimal actuation in various marine scenarios. The multi-modal mobility of our system and its robust performance have been validated through extensive field tests and water tank experiments, demonstrating its potential in handling a wide range of mission profiles.
|
|
11:25-11:30, Paper ThCT7.3 | |
MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement |
|
Thengane, Shrutika | Singapore University of Technology and Design |
Prasetyo, Marcel Bartholomeus | Singapore University of Technology and Design |
Tan, Yu Xiang | Singapore University of Technology and Design |
Meghjani, Malika | Singapore University of Technology and Design |
Keywords: Marine Robotics, Environment Monitoring and Management, Computer Vision for Automation
Abstract: Autonomous and targeted underwater visual monitoring and exploration using Autonomous Underwater Vehicles (AUVs) can be a challenging task due to both online and offline constraints. The online constraints comprise limited onboard storage capacity and communication bandwidth to the surface, whereas the offline constraints entail the time and effort required for the selection of desired keyframes from the video data. An example use case of targeted underwater visual monitoring is finding the most interesting visual frames of fish in a long sequence of an AUV's visual experience. This challenge of targeted informative sampling is further aggravated in murky waters with poor visibility. In this paper, we present MERLION, a novel framework that provides semantically aligned and visually enhanced summaries for murky underwater marine environment monitoring and exploration. Specifically, our framework integrates (a) an image-text model for semantically aligning the visual samples to the user's needs, (b) an image enhancement model for murky water visual data and (c) an informative sampler for summarizing the monitoring experience. We validate our proposed MERLION framework on real-world data with user studies and present qualitative and quantitative results using our evaluation metric and show improved results compared to the state-of-the-art approaches. The code is available at https://github.com/MARVL-Lab/MERLION.git
|
|
11:30-11:35, Paper ThCT7.4 | |
PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal |
|
Choi, Jiwon | Inha University |
Cho, Dongjin | Inha University |
Lee, Gihyeon | Inha University |
Kim, Hogyun | Inha University |
Yang, Geonmo | Inha University |
Kim, Joowan | Samsung Heavy Industries |
Cho, Younggun | Inha University |
Keywords: Marine Robotics, Data Sets for Robotic Vision, Sensor Fusion
Abstract: Maritime environments often present hazardous situations due to factors such as moving ships or buoys, which become obstacles under the influence of waves. In such challenging conditions, the ability to detect and track potentially hazardous objects is critical for the safe navigation of marine robots, but datasets capturing these scenarios remain limited. To address this limitation, we introduce a new multi-modal dataset that includes image and point-wise annotations of maritime obstacles. Our dataset provides detailed ground truth for obstacle detection and tracking, including objects as small as 10×10 pixels, which are crucial for maritime safety. To validate the dataset’s effectiveness as a reliable benchmark, we conducted evaluations using various methodologies, including state-of-the-art (SOTA) techniques for object detection and tracking. These evaluations are expected to contribute to improving performance, particularly in complex maritime environments. This represents the first demonstration of a dataset offering multi-modal annotations specifically tailored to maritime environments. Our dataset is available at https://github.com/sparolab/PoLaRIS.
|
|
11:35-11:40, Paper ThCT7.5 | |
Confidence-Aware Object Capture for a Manipulator Subject to Floating-Base Disturbances |
|
Xu, Ruoyu | The Chinese University of Hong Kong, Shenzhen |
Jiang, Zixing | The Chinese University of Hong Kong |
Liu, Beibei | The Chinese University of Hongkong, Shenzhen |
Wang, Yuquan | Tencent |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Marine Robotics, Field Robots, Robotics in Hazardous Fields, Floating-Base Manipulator
Abstract: Capturing stationary aerial objects on unmanned surface vehicles (USVs) is challenging due to quasiperiodic and fast floating-base motions caused by wave-induced disturbances. It is hard to (1) maintain high motion prediction accuracy due to the stochastic nature of these disturbances and (2) perform object capture through real-time tracking due to the limited active torque. We introduce confidence analysis in predictive capture. To address prediction inaccuracies, we calculate a real-time confidence tube to evaluate the prediction quality. To overcome tracking difficulties, we plan a trajectory to capture the object at a future moment while maximizing the confidence of the capture position on the predicted trajectory. All calculations are completed within 0.2 seconds to ensure a timely response. We validate our approach through experiments, where we simulate disturbances by executing real USV motions using a servo platform. The results demonstrate that our method achieves an 80% success rate.
|
|
11:40-11:45, Paper ThCT7.6 | |
RecGS: Removing Water Caustic with Recurrent Gaussian Splatting |
|
Zhang, Tianyi | Carnegie Mellon University |
Zhi, Weiming | Carnegie Mellon University |
Meyers, Braden | Brigham Young University |
Durrant, Sterling Nelson | Brigham Young University |
Huang, Kaining | Carnegie Mellon University |
Mangelson, Joshua | Brigham Young University |
Barbalata, Corina | Louisiana State University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering the performance when generalizing to real-world seafloor data with 3D structures. In this paper, we present a novel method Recurrent Gaussian Splatting (RecGS), which takes advantage of today’s photorealistic 3D reconstruction technology, 3D Gaussian Splatting (3DGS), to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustic with low-pass filtering in each iteration. In the experiments, we analyze and compare with different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our proposed RecGS paradigm can effectively separate the caustic from the seafloor, improving the visual appearance, and can be potentially applied on more problems with inconsistent illumination.
|
|
ThCT8 |
311 |
Aerial Robots: Learning 2 |
Regular Session |
Chair: Robuffo Giordano, Paolo | Irisa Cnrs Umr6074 |
Co-Chair: Shim, David Hyunchul | KAIST |
|
11:15-11:20, Paper ThCT8.1 | |
Learning to Fly in Seconds |
|
Eschmann, Jonas | New York University |
Albani, Dario | Technology Innovation Institute |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Learning-based methods, particularly Reinforcement Learning (RL), hold great promise for streamlining deployment, enhancing performance, and achieving generalization in the control of autonomous multirotor aerial vehicles. Deep RL has been able to control complex systems with impressive fidelity and agility in simulation, but the simulation-to-reality transfer often brings a hard-to-bridge reality gap. Moreover, RL is commonly plagued by prohibitively long training times. In this work, we propose a novel asymmetric actor-critic-based architecture coupled with a highly reliable RL-based training paradigm for end-to-end quadrotor control. We show how curriculum learning and a highly optimized simulator reduce sample complexity and lead to fast training times. To precisely discuss the challenges related to low-level/end-to-end multirotor control, we also introduce a taxonomy that classifies the existing levels of control abstractions as well as non-linearities and domain parameters. Our framework enables Simulation-to-Reality (Sim2Real) transfer for direct RPM control after only 18 seconds of training on a consumer-grade laptop as well as its deployment on microcontrollers to control a multirotor under real-time guarantees. Finally, our solution exhibits competitive performance in trajectory tracking, as demonstrated through various experimental comparisons with existing state-of-the-art control solutions using a real Crazyflie nano quadrotor. We open source the code including a very fast multirotor dynamics simulator that can simulate about 5 months of flight per second on a laptop GPU. The fast training times and deployment to a cheap, off-the-shelf quadrotor lower the barriers to entry and help democratize the research and development of these systems.
|
|
11:20-11:25, Paper ThCT8.2 | |
Multi-UAVs End-To-End Distributed Trajectory Generation Over Point Cloud Data |
|
Marino, Antonio | University of Rennes |
Pacchierotti, Claudio | Centre National De La Recherche Scientifique (CNRS) |
Robuffo Giordano, Paolo | Irisa Cnrs Umr6074 |
Keywords: Aerial Systems: Mechanics and Control, Multi-Robot Systems, Deep Learning Methods
Abstract: This paper introduces an end-to-end trajectory planning algorithm tailored for multi-UAV systems that generates collision-free trajectories in environments populated with both static and dynamic obstacles, leveraging point cloud data. Our approach consists of a 2-branch neural network fed with sensing and localization data, able to communicate intermediate learned features among the agents. One network branch crafts an initial collision-free trajectory estimate, while the other devises a neural collision constraint for subsequent optimization, ensuring trajectory continuity and adherence to physical actuation limits. Extensive simulations in challenging cluttered environments, involving up to 25 robots and 25% obstacle density, show a collision avoidance success rate in the range of 85% to 100%. Finally, we introduce a saliency map computation method acting on the point cloud data, offering qualitative insights into our methodology.
|
|
11:25-11:30, Paper ThCT8.3 | |
Lightweight yet High-Performance Defect Detector for UAV-Based Large-Scale Infrastructure Real-Time Inspection |
|
Zhao, Benyun | The Chinese University of Hong Kong |
Duan, Qigeng | The Chinese University of Hong Kong |
Yang, Guidong | The Chinese University of Hong Kong |
Tang, Haoyun (Jerry) | UC Berkeley |
Song, Zhenbo | Nanjing University of Science and Technology |
Wen, Junjie | The Chinese University of Hong Kong |
Liu, Xuchen | The Chinese University of Hong Kong |
Li, Qingxiang | The Chineses University of Hong Kong |
Lei, Lei | City University of Hong Kong |
Zhang, Jihan | Chinese University of Hong Kong |
Chen, Xi | The Chinese University of Hong Kong |
Mueller, Mark Wilfried | University of California, Berkeley |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Deep Learning Methods
Abstract: Defect diagnosis in urban infrastructure is crucial for public safety. Traditional manual inspections face significant challenges in terms of accuracy and cost-effectiveness. In this paper, we propose a lightweight and hardware-friendly large-scale infrastructure detector, CUPID, highly suitable for unmanned aerial vehicles (UAVs). Given the significant challenges in automatically detecting defects of varying intensity and size within complex infrastructure, along with the tendency of lightweight models to lose detail and fail to fully capture features during the defect extraction process, we propose the CUPID_Block, a multi-level information fusion block to construct the backbone, featuring the CUPID_Conv module equipped with our proposed CCA (CrissCross Attention). Furthermore, CUPID features an auxiliary training branch that assimilates lower feature maps, helping to recover details lost in deeper convolutional layers. To verify the effectiveness of CUPID and to address the lack of a suitable dataset in the community, we establish a multi-scenario infrastructure defect dataset, CUBIT2024, to conduct extensive experiments. Finally, to assess the efficiency and adaptability of CUPID on UAVs for online infrastructure inspection, we design a compact autonomous drone, CU-Astro, where the proposed CUPID is deployed on the onboard Jetson Orin NX computer to evaluate the speed and power consumption of the inference.
|
|
11:30-11:35, Paper ThCT8.4 | |
ProxFly: Robust Control for Close Proximity Quadcopter Flight Via Residual Reinforcement Learning |
|
Zhang, Ruiqi | University of California, Berkeley |
Zhang, Dingqi | University of California, Berkeley |
Mueller, Mark Wilfried | University of California, Berkeley |
Keywords: Reinforcement Learning, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: This paper proposes ProxFly, a residual deep Reinforcement Learning (RL)-based controller for close proximity quadcopter flight. Specifically, we design a residual module on top of a cascaded controller (denoted as the basic controller) to generate high-level control commands, which compensate for external disturbances and thrust loss caused by downwash effects from other quadcopters. First, our method takes only the ego state and controllers' commands as inputs and does not rely on any communication between quadcopters, thereby reducing the bandwidth requirement. Through domain randomization, our method relaxes the requirement for accurate system identification and fine-tuned controller parameters, allowing it to adapt to changing system models. Meanwhile, our method not only reduces the proportion of unexplainable signals from the black box in control commands but also enables the RL training to skip the time-consuming exploration from scratch via guidance from the basic controller. We validate the effectiveness of the residual module in simulation with different proximities. Moreover, we conduct real close-proximity flight tests to compare ProxFly with the basic controller and an advanced model-based controller with complex aerodynamic compensation. Finally, we show that ProxFly can be used for challenging quadcopter mid-air docking, where two quadcopters fly in extreme proximity, and strong airflow significantly disrupts flight. However, our method can stabilize the quadcopter in this case and accomplish docking. The resources are available at https://github.com/ruiqizhang99/ProxFly.
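A minimal conceptual sketch of the residual-control idea (not ProxFly itself) is shown below: a bounded learned correction is added on top of a basic cascaded controller's command; the gains, residual limit, and placeholder policy are all assumptions made for illustration.
```python
# Conceptual sketch of residual control on top of a basic controller.
import numpy as np

def basic_controller(state, setpoint):
    # Placeholder cascaded controller: PD on position error (hypothetical gains).
    kp, kd = 4.0, 2.0
    return kp * (setpoint - state["pos"]) - kd * state["vel"]

def residual_policy(state, base_cmd):
    # Stand-in for the learned network; in practice a trained RL policy
    # compensating e.g. downwash-induced thrust loss.
    return np.zeros_like(base_cmd)

def proximity_control(state, setpoint, residual_limit=0.3):
    base_cmd = basic_controller(state, setpoint)
    residual = np.clip(residual_policy(state, base_cmd),
                       -residual_limit, residual_limit)
    # The residual stays a small, bounded share of the total command.
    return base_cmd + residual

cmd = proximity_control({"pos": np.zeros(3), "vel": np.zeros(3)},
                        np.array([0.0, 0.0, 1.0]))
```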
|
|
11:35-11:40, Paper ThCT8.5 | |
TempFuser: Learning Agile, Tactical, and Acrobatic Flight Maneuvers Using a Long Short-Term Temporal Fusion Transformer |
|
Seong, Hyunki | KAIST |
Shim, David Hyunchul | KAIST |
Keywords: Aerial Systems: Applications, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Dogfighting is a challenging scenario in aerial applications that requires a comprehensive understanding of both strategic maneuvers and the aerodynamics of agile aircraft. The aerial agent needs to not only understand tactically evolving maneuvers of fighter jets from a long-term perspective but also react to rapidly changing aerodynamics of aircraft from a short-term viewpoint. In this paper, we introduce TempFuser, a novel long short-term temporal fusion transformer architecture that can learn agile, tactical, and acrobatic flight maneuvers in complex dogfight problems. Our approach integrates two distinct temporal transition embeddings into a transformer-based network to comprehensively capture both the long-term tactics and short-term agility of aerial agents. By incorporating these perspectives, our policy network generates end-to-end flight commands that secure dominant positions over the long term and effectively outmaneuver agile opponents. After training in a high-fidelity flight simulator, our model successfully learns to execute strategic maneuvers, outperforming baseline policy models against various types of opponent aircraft. Notably, our model exhibits human-like acrobatic maneuvers even when facing adversaries with superior specifications, all without relying on prior knowledge. Moreover, it demonstrates robust pursuit performance in challenging supersonic and low-altitude situations. Demo videos are available at https://sites.google.com/view/tempfuser.
|
|
11:40-11:45, Paper ThCT8.6 | |
Modular Reinforcement Learning for a Quadrotor UAV with Decoupled Yaw Control |
|
Yu, Beomyeol | The George Washington University |
Lee, Taeyoung | George Washington University |
Keywords: Aerial Systems: Mechanics and Control, Reinforcement Learning, AI-Enabled Robotics
Abstract: This paper presents modular reinforcement learning (RL) frameworks for the low-level control of a quadrotor, enabling direct control of yawing motion. While traditional monolithic RL approaches have demonstrated success in real-world autonomous flight, they often struggle to precisely control both the translational and yawing motions due to their distinct dynamic characteristics and strong coupling. Moreover, training a large-scale monolithic network typically demands a wealth of training data for broad generalization. To address these issues, we decompose the quadrotor dynamics into translational and yawing subsystems and assign dedicated modular RL agents for each. This design significantly improves performance, as each RL agent is trained for its specific purpose, and they are integrated in a synergistic way. It further enhances robustness, as potential failures within one module have minimal impact on the other, promoting fault tolerance. These improvements are illustrated by flight experiments achieved via zero-shot sim-to-real transfer. It is shown that the proposed modular policies substantially enhance training efficiency, tracking performance, and adaptability to real-world conditions.
|
|
ThCT9 |
312 |
Task and Motion Planning 2 |
Regular Session |
Chair: Pappas, George J. | University of Pennsylvania |
Co-Chair: Ashur, Stav | University of Illinois |
|
11:15-11:20, Paper ThCT9.1 | |
HBTP: Heuristic Behavior Tree Planning with Large Language Model Reasoning |
|
Cai, Yishuai | National University of Defense Technology |
Chen, Xinglin | National University of Defense Technology |
Mao, Yunxin | National University of Defense Technology |
Li, Minglong | National University of Defense Technology |
Yang, Shaowu | National University of Defense Technology |
Yang, Wenjing | State Key Laboratory of High Performance Computing (HPCL), Schoo |
Wang, Ji | National University of Defense Technology |
Keywords: AI-Enabled Robotics
Abstract: Behavior Trees (BTs) are increasingly becoming a popular control structure in robotics due to their modularity, reactivity, and robustness. In terms of BT generation methods, BT planning shows promise for generating reliable BTs. However, the scalability of BT planning is often constrained by prolonged planning times in complex scenarios, largely due to a lack of domain knowledge. In contrast, pre-trained Large Language Models (LLMs) have demonstrated task reasoning capabilities across various domains, though the correctness and safety of their planning remain uncertain. This paper proposes integrating BT planning with LLM reasoning, introducing Heuristic Behavior Tree Planning (HBTP)—a reliable and efficient framework for BT generation. The key idea in HBTP is to leverage LLMs for task-specific reasoning to generate a heuristic path, which BT planning can then follow to expand efficiently. We first introduce the heuristic BT expansion process, along with two heuristic variants designed for optimal planning and satisficing planning, respectively. Then, we propose methods to address the inaccuracies of LLM reasoning, including action space pruning and reflective feedback, to further enhance both reasoning accuracy and planning efficiency. Experiments demonstrate the theoretical bounds of HBTP, and results from four datasets confirm its practical effectiveness in everyday service robot applications.
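The heuristic-path idea can be sketched roughly as follows (not HBTP's implementation): a best-first expansion that discounts actions appearing on an LLM-suggested action sequence, shown on a toy domain with invented states and actions.
```python
# Toy sketch: prefer expanding actions on an LLM-suggested heuristic path.
import heapq

def heuristic_cost(action, llm_path):
    return 0 if action in llm_path else 1   # favour the suggested actions

def plan(start, goal_test, successors, llm_path):
    frontier = [(0, 0, start, [])]
    seen, tie = set(), 0
    while frontier:
        cost, _, state, plan_so_far = heapq.heappop(frontier)
        if goal_test(state):
            return plan_so_far
        if state in seen:
            continue
        seen.add(state)
        for action, nxt in successors(state):
            tie += 1
            heapq.heappush(frontier, (cost + 1 + heuristic_cost(action, llm_path),
                                      tie, nxt, plan_so_far + [action]))
    return None

# Invented toy domain: states are strings, actions move between them.
successors = lambda s: {"start": [("pick", "holding"), ("wander", "lost")],
                        "holding": [("place", "done")],
                        "lost": []}.get(s, [])
print(plan("start", lambda s: s == "done", successors, llm_path={"pick", "place"}))
```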
|
|
11:20-11:25, Paper ThCT9.2 | |
SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments |
|
Ravichandran, Zachary | University of Pennsylvania |
Murali, Varun | University of Pennsylvania |
Tzes, Mariliza | University of Pennsylvania |
Pappas, George J. | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: AI-Enabled Robotics, Autonomous Agents, Field Robots
Abstract: As robots become increasingly capable, users will want to describe high-level missions and have robots infer the relevant details. Because pre-built maps are difficult to obtain in many realistic settings, accomplishing such missions will require the robot to map and plan online. While many semantic planning methods operate online, they are typically designed for well-specified missions such as object search or exploration. Recently, Large Language Models (LLMs) have demonstrated powerful contextual reasoning abilities over a range of robotic tasks described in natural language. However, existing LLM-enabled planners typically do not consider online planning or complex missions; rather, relevant subtasks and semantics are provided by a pre-built map or a user. We address these limitations via SPINE, an online planner for missions with incomplete mission specifications provided in natural language. The planner uses an LLM to reason about subtasks implied by the mission specification and then realizes these subtasks in a receding horizon framework. Tasks are automatically validated for safety and refined online with new map observations. We evaluate SPINE in simulation and real-world settings with missions that require multiple steps of semantic reasoning and exploration in cluttered outdoor environments of over 20,000 square meters. Compared to baselines that use existing LLM-enabled planning approaches, our method is over twice as efficient in terms of time and distance, requires fewer user interactions, and does not require a full map. Additional resources are provided at https://zacravichandran.github.io/SPINE.
|
|
11:25-11:30, Paper ThCT9.3 | |
Closed Loop Interactive Embodied Reasoning for Robot Manipulation |
|
Nazarczuk, Michal | Imperial College London |
Behrens, Jan Kristof | Czech Technical University in Prague, CIIRC |
Stepanova, Karla | Czech Technical University |
Hoffmann, Matej | Czech Technical University in Prague, Faculty of Electrical Engi |
Mikolajczyk, Krystian | Imperial College London |
Keywords: AI-Enabled Robotics, Manipulation Planning, Reactive and Sensor-Based Planning
Abstract: Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. This usually involves changing the belief about the scene or physically interacting and changing the scene (e.g., sort the objects from lightest to heaviest). In order to facilitate the development of such systems, we introduce a new modular Closed Loop Interactive Embodied Reasoning (CLIER) approach that takes into account the measurements of non-visual object properties, changes in the scene caused by external disturbances as well as uncertain outcomes of robotic actions. CLIER performs multi-modal reasoning and action planning and generates a sequence of primitive actions that can be executed by a robot manipulator. Our method operates in a closed loop, responding to changes in the environment. Our approach is developed using the MuBle simulation environment and tested in 10 interactive benchmark scenarios. We extensively evaluate our reasoning approach in simulation and in real-world manipulation tasks, achieving success rates above 76% and 64%, respectively.
|
|
11:30-11:35, Paper ThCT9.4 | |
SayComply: Grounding Field Robotic Tasks in Operational Compliance through Retrieval-Based Language Models |
|
Ginting, Muhammad Fadhil | Stanford University |
Kim, Dong Ki | Massachusetts Institute of Tech |
Kim, Sung-Kyun | NASA Jet Propulsion Laboratory, Caltech |
Bandi, Jai Krishna | Field AI |
Kochenderfer, Mykel | Stanford University |
Omidshafiei, Shayegan | Massachusetts Institute of Technology |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Keywords: AI-Enabled Robotics, Field Robots, Task and Motion Planning
Abstract: This paper addresses the problem of task planning for robots that must comply with operational manuals in real-world settings. Task planning under these constraints is essential for enabling autonomous robot operation in domains that require adherence to domain-specific knowledge. Current methods for generating robot goals and plans rely on common sense knowledge encoded in large language models. However, these models lack grounding of robot plans to domain-specific knowledge and are not easily transferable between multiple sites or customers with different compliance needs. In this work, we present SayComply, which enables grounding robotic task planning with operational compliance using retrieval-based language models. We design a hierarchical database of operational, environment, and robot embodiment manuals and procedures to enable efficient retrieval of the relevant context under the limited context length of the LLMs. We then design a task planner using a tree-based retrieval augmented generation (RAG) technique to generate robot tasks that follow user instructions while simultaneously complying with the domain knowledge in the database. We demonstrate the benefits of our approach through simulations and hardware experiments in real-world scenarios that require precise context retrieval across various types of context, outperforming the standard RAG method. Our approach bridges the gap in deploying robots that consistently adhere to operational protocols, offering a scalable and edge-deployable solution for ensuring compliance across varied and complex real-world environments.
|
|
11:35-11:40, Paper ThCT9.5 | |
LiP-LLM: Integrating Linear Programming and Dependency Graph with Large Language Models for Multi-Robot Task Planning |
|
Obata, Kazuma | Osaka University |
Aoki, Tatsuya | Osaka University |
Horii, Takato | Osaka University |
Taniguchi, Tadahiro | Ritsumeikan University |
Nagai, Takayuki | Osaka University |
Keywords: Multi-Robot Systems, Task Planning, Cooperating Robots
Abstract: This study proposes LiP-LLM: integrating linear programming and dependency graph with large language models (LLMs) for multi-robot task planning. In order for multiple robots to perform tasks more efficiently, it is necessary to manage the precedence dependencies between tasks. Although multi-robot decentralized and centralized task planners using LLMs have been proposed, none of these studies focus on precedence dependencies from the perspective of task efficiency or leverage traditional optimization methods. LiP-LLM addresses key challenges in managing dependencies between skills and optimizing task allocation. LiP-LLM consists of three steps: skill list generation and dependency graph generation by LLMs, and task allocation using linear programming. The LLMs are utilized to generate a comprehensive list of skills and to construct a dependency graph that maps the relationships and sequential constraints among these skills. To ensure the feasibility and efficiency of skill execution, the skill list is generated by calculated likelihood, and linear programming is used to optimally allocate tasks to each robot. Experimental evaluations in simulated environments demonstrate that this method outperforms existing task planners, achieving higher success rates and efficiency in executing complex, multi-robot tasks. The results indicate the potential of combining LLMs with optimization techniques to enhance the capabilities of multi-robot systems in executing coordinated tasks accurately and efficiently. In an environment with two robots, a maximum success rate difference of 0.82 is observed in the language instruction group with a change in the object name.
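As a rough sketch of the dependency-graph-plus-assignment idea (not LiP-LLM's implementation), the snippet below topologically orders hypothetical skills and allocates the currently executable ones to robots by solving a min-cost assignment, a special case of a linear program; the graph and cost matrix are invented for illustration.
```python
# Illustrative sketch: order skills via a dependency graph, then allocate the
# currently executable skills to robots with a min-cost assignment.
from graphlib import TopologicalSorter
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical dependency graph: each key maps to its prerequisite skills.
deps = {"place_cup": {"pick_cup"}, "wipe_table": set(), "pick_cup": set()}
order = list(TopologicalSorter(deps).static_order())   # prerequisites first

# Skills with no pending prerequisites can be executed now.
ready = [s for s in order if not deps[s]]

# Hypothetical costs: rows = robots, cols = ready skills.
cost = np.array([[1.0, 3.0],
                 [2.5, 0.5]])
rows, cols = linear_sum_assignment(cost)                # min-cost allocation
allocation = {f"robot{r}": ready[c] for r, c in zip(rows, cols)}
print(order, allocation)
```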
|
|
11:40-11:45, Paper ThCT9.6 | |
Transformer-Based Model Predictive Control: Trajectory Optimization Via Sequence Modeling |
|
Celestini, Davide | Politecnico Di Torino |
Gammelli, Daniele | Stanford |
Guffanti, Tommaso | Stanford University |
D’Amico, Simone | Stanford University |
Capello, Elisa | Politecnico Di Torino CNR IEIIT |
Pavone, Marco | Stanford University |
Keywords: Optimization and Optimal Control, Deep Learning Methods, Machine Learning for Robot Control
Abstract: Model predictive control (MPC) has established itself as the primary methodology for constrained control, enabling general-purpose robot autonomy in diverse real-world scenarios. However, for most problems of interest, MPC relies on the recursive solution of highly non-convex trajectory optimization problems, leading to high computational complexity and strong dependency on initialization. In this work, we present a unified framework to combine the main strengths of optimization-based and learning-based methods for MPC. Our approach entails embedding high-capacity, transformer-based neural network models within the optimization process for trajectory generation, whereby the transformer provides a near-optimal initial guess, or target plan, to a non-convex optimization problem. Our experiments, performed in simulation and the real world onboard a free flyer platform, demonstrate the capabilities of our framework to improve MPC convergence and runtime. Compared to purely optimization-based approaches, results show that our approach can improve trajectory generation performance by up to 75%, reduce the number of solver iterations by up to 45%, and improve overall MPC runtime by 7x without loss in performance.
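The following sketch illustrates the warm-starting pattern the abstract describes: a learned model supplies the initial guess for a non-convex trajectory optimization. The learned_initial_guess function is a hand-coded stand-in for the paper's transformer, and the cost function is a toy, so this shows only the shape of the idea.

```python
# Minimal sketch of warm-starting a non-convex trajectory optimization with a
# learned initial guess. The "network" here is a stand-in function.
import numpy as np
from scipy.optimize import minimize

T, goal = 20, np.array([1.0, 1.0])

def traj_cost(z):
    x = z.reshape(T, 2)
    effort = np.sum(np.diff(x, axis=0) ** 2)                          # control effort
    obstacle = np.sum(np.exp(-20 * np.sum((x - 0.5) ** 2, axis=1)))   # non-convex term
    terminal = 100 * np.sum((x[-1] - goal) ** 2)
    return effort + obstacle + terminal

def learned_initial_guess():
    # Stand-in for the transformer: a straight line warped away from the obstacle.
    s = np.linspace(0, 1, T)[:, None]
    return (s * goal + 0.2 * np.sin(np.pi * s) * np.array([1.0, -1.0])).reshape(-1)

cold = minimize(traj_cost, np.zeros(2 * T), method="L-BFGS-B")
warm = minimize(traj_cost, learned_initial_guess(), method="L-BFGS-B")
print("cold-start cost:", round(cold.fun, 3), "iterations:", cold.nit)
print("warm-start cost:", round(warm.fun, 3), "iterations:", warm.nit)
```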
|
|
ThCT10 |
313 |
Multi-Robot Systems 5 |
Regular Session |
Chair: Saeedi, Sajad | Toronto Metropolitan University |
Co-Chair: Sabattini, Lorenzo | University of Modena and Reggio Emilia |
|
11:15-11:20, Paper ThCT10.1 | |
A Method for Constructing Building Structure Grid Map Based on a Climbing Algorithm |
|
Zhou, Xidong | Hunan University |
Zhong, Hang | Hunan University |
Zhang, Hui | Hunan University |
Chen, MingYuan | Hunan University |
Yu, Haoyang | Hunan University |
Wang, Weizheng | Hunan University |
Wang, Yaonan | Hunan University |
Keywords: Aerial Systems: Perception and Autonomy, Mapping, Motion and Path Planning
Abstract: Aerial-terrestrial amphibious robots excel in search and rescue tasks in unstructured terrains but face challenges in autonomous navigation indoors. Traditional full-mapping methods can degrade global path planning performance, especially when semi-static obstacles shift, leading to suboptimal paths. We propose a method for constructing building structure grid maps that are unaffected by semi-static obstacles. Our approach includes a building structure recognition algorithm based on an octree structure to differentiate between occupied and free grid cells. Experimental results demonstrate that coverage path planning on building structure grid maps produces superior global paths compared to traditional grid maps, offering a more streamlined and robust solution for autonomous navigation of aerial-terrestrial amphibious robots in indoor environments.
|
|
11:20-11:25, Paper ThCT10.2 | |
Efficient Scale-Uniform 3D Visual Coverage Algorithm for UAV Based on Elastic Photogrammetric Constraints |
|
Zong, Jianping | Nankai University |
Cao, Zhongzhi | Nankai University |
Chen, Qi | Nankai University |
Sun, Chuanyu | Nankai University |
Shao, Xiuli | Nankai University |
Li, Haifeng | Civil Aviation University of China |
Wang, Hongpeng | Nankai University |
Keywords: Aerial Systems: Applications, Environment Monitoring and Management, Search and Rescue Robots
Abstract: Unmanned aerial vehicles equipped with modern vision algorithms are crucial for missions such as reconstruction and target acquisition. However, when deployed in the field, undulating terrain can cause significant fluctuations in image scale and degrade the performance of vision algorithms. Instead of developing specialized image processing schemes with limited adaptability, this paper presents a novel 3D visual coverage algorithm that is compatible with existing generic vision algorithms and maintains a uniform image scale for ground targets. In detail, photogrammetric constraints are initially introduced to generate aerial waypoints, and then the negative effects of valley clustering are addressed. Elastic Photogrammetric Constraints (EPC) are further proposed to eliminate valley clustering effects induced by saddle terrain. The experimental results demonstrate that EPC reduces the traversal path length by up to 37.38% compared to the previous work, but with a minor trade-off in scale variations.
|
|
11:25-11:30, Paper ThCT10.3 | |
Target-Aware Viewpoint Generation for Active Robotic Exploration in Unknown Environments |
|
Xu, Pu | Northeastern University |
Liu, Haoming | Northeastern University (CN) |
Li, Zhiheng | Northeastern University |
Bai, Zhaoqiang | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Search and Rescue Robots, Constrained Motion Planning, Motion and Path Planning
Abstract: When entering an unfamiliar environment, animals usually scan their surroundings to identify points of interest. In search and rescue robotics, autonomous exploration requires both coarse mapping of unknown areas and detailed target detection, which poses a significant challenge in balancing these tasks. To that end, we propose a target-aware robotic exploration framework that prioritizes both exploration efficiency and search completeness through three components: First, considering the computational limitations of robotic platforms, a lightweight 3D target detection method with post-fusion is introduced to detect target positions in real time. Secondly, we propose a target-aware viewpoint generation approach that integrates information gain and inspection gain to identify promising viewpoints for thorough target searches. Lastly, since a detailed examination of the environment demands numerous viewpoints, we propose a heuristic-based active exploration framework that employs a hierarchical structure to optimize exploration gain, traveling distance, and path smoothness to maximize the utility function of viewpoint sequences and ultimately find the optimal path. Extensive simulations and real-world experiments demonstrate our framework significantly enhances target search capabilities, achieving a 13% average improvement in exploration efficiency over existing methods.
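A minimal, hypothetical version of the viewpoint-scoring step is sketched below: candidate viewpoints are ranked by a weighted combination of information gain, inspection gain, travel distance, and heading smoothness. The weights and gain values are placeholders rather than the paper's utility function.

```python
# Minimal sketch of scoring candidate viewpoints by a weighted utility.
# Illustrative values only; not the paper's actual gains or weights.
import numpy as np

rng = np.random.default_rng(0)
robot = np.array([0.0, 0.0])
prev_dir = np.array([1.0, 0.0])

candidates = rng.uniform(-5, 5, size=(8, 2))        # candidate viewpoint positions
info_gain = rng.uniform(0, 1, size=8)               # unexplored volume seen
inspect_gain = rng.uniform(0, 1, size=8)            # nearby detected targets to inspect

def utility(idx, w=(1.0, 1.5, 0.3, 0.5)):
    p = candidates[idx]
    dist = np.linalg.norm(p - robot)
    heading = (p - robot) / (dist + 1e-9)
    smooth = float(heading @ prev_dir)               # prefer small heading change
    return w[0] * info_gain[idx] + w[1] * inspect_gain[idx] - w[2] * dist + w[3] * smooth

best = max(range(len(candidates)), key=utility)
print("next viewpoint:", candidates[best], "utility:", round(utility(best), 3))
```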
|
|
11:30-11:35, Paper ThCT10.4 | |
Online Multi-Robot Federated Learning for Distributed Coverage Control of Unknown Spatial Processes |
|
Mantovani, Mattia | University of Modena and Reggio Emilia |
Pratissoli, Federico | Università Degli Studi Di Modena E Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Distributed Robot Systems, Multi-Robot Systems, Networked Robots
Abstract: Distributed multi-robot teams are increasingly used for optimal coverage of domains with unknown density distributions, often modeled with Gaussian Processes (GPs). However, current methods rely on data sharing, raising privacy concerns and computational issues. We propose a Federated Learning (FL) approach that enables collaborative training of GP models without sharing raw data. To enhance scalability and efficiency, we introduce a filtering strategy that selects relevant data samples, minimizing computational load. Realistic simulations emulating real scenarios demonstrate the effectiveness of our method in achieving robust environmental estimates with minimal data sharing and reduced complexity.
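The sketch below shows a generic FedAvg-style round for the setting the abstract describes: each robot fits a Gaussian Process on its private samples and only kernel hyperparameters are aggregated, never raw data. It illustrates the general pattern, not the paper's federated or sample-filtering algorithm.

```python
# Minimal sketch of a federated round for GP-based coverage: share and average
# kernel hyperparameters only, keep raw samples on each robot.
# Generic illustration; not the paper's exact scheme.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
density = lambda x: np.exp(-np.sum((x - 0.7) ** 2, axis=1) / 0.05)

local_length_scales = []
for robot in range(3):
    X = rng.uniform(0, 1, size=(25, 2))                    # private local samples
    y = density(X) + 0.01 * rng.standard_normal(25)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(X, y)
    local_length_scales.append(gp.kernel_.length_scale)    # share hyperparameter only

global_ls = float(np.mean(local_length_scales))            # server-side aggregation
print("aggregated RBF length scale:", round(global_ls, 4))
```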
|
|
11:35-11:40, Paper ThCT10.5 | |
Constrained Learning for Decentralized Multi-Objective Coverage Control |
|
Cervino, Juan | MIT |
Agarwal, Saurav | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Ribeiro, Alejandro | University of Pennsylvania |
Keywords: Deep Learning Methods, Autonomous Vehicle Navigation, Multi-Robot Systems
Abstract: The multi-objective coverage control problem requires a robot swarm to collaboratively provide sensor coverage to multiple heterogeneous importance density fields (IDFs) simultaneously. We pose this as an optimization problem with constraints and study two different formulations: (1) Fair coverage, where we minimize the maximum coverage cost for any field, promoting equitable resource distribution among all fields; and (2) Constrained coverage, where each field must be covered below a certain cost threshold, ensuring that critical areas receive adequate coverage according to predefined importance levels. We study the decentralized setting where robots have limited communication and local sensing capabilities, making the system more realistic, scalable, and robust. Given the complexity, we propose a novel decentralized constrained learning approach that combines primal-dual optimization with a Learnable Perception-Action-Communication (LPAC) neural network architecture. We show that the Lagrangian of the dual problem can be reformulated as a linear combination of the IDFs, enabling the LPAC policy to serve as a primal solver. We empirically demonstrate that the proposed method (i) significantly outperforms state-of-the-art decentralized controllers by 30% on average in terms of coverage cost, (ii) transfers well to larger environments with more robots, and (iii) is scalable in the number of IDFs and robots in the swarm.
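A toy primal-dual loop in the spirit of the abstract is sketched below: dual variables weight the per-field coverage costs, a surrogate primal step stands in for running the LPAC policy, and dual ascent grows the multiplier of any field whose cost exceeds its threshold. All numbers and the surrogate are invented.

```python
# Minimal sketch of the primal-dual pattern for constrained coverage.
# The primal step is a toy surrogate, not the LPAC policy from the paper.
import numpy as np

thresholds = np.array([1.0, 0.6, 0.8])       # per-IDF cost limits
lam = np.zeros(3)                            # dual variables
eta = 0.2                                    # dual step size

def primal_step(lam):
    # Toy surrogate for "run the coverage policy with weights lam":
    # more weight on a field yields lower cost on that field.
    return 1.5 / (1.0 + lam)                 # per-field coverage costs

for it in range(50):
    costs = primal_step(lam)
    lam = np.maximum(0.0, lam + eta * (costs - thresholds))   # dual ascent

print("final per-field costs:", np.round(costs, 3))
print("dual variables:       ", np.round(lam, 3))
```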
|
|
11:40-11:45, Paper ThCT10.6 | |
Di-NeRF: Distributed NeRF for Collaborative Learning with Relative Pose Refinement |
|
Asadi, Mahboubeh | Toronto Metropolitan University |
Zareinia, Kourosh | Toronto Metropolitan University |
Saeedi, Sajad | Toronto Metropolitan University |
Keywords: Distributed Robot Systems, Mapping, Multi-Robot SLAM
Abstract: Collaborative mapping of unknown environments can be done faster and more robustly than with a single robot. However, a collaborative approach requires a distributed paradigm to be scalable and deal with communication issues. This work presents a fully distributed algorithm enabling a group of robots to collectively optimize the parameters of a Neural Radiance Field (NeRF). The algorithm involves the communication of each robot's trained NeRF parameters over a mesh network, where each robot trains its NeRF and has access to its own visual data only. Additionally, the relative poses of all robots are jointly optimized alongside the model parameters, enabling mapping with less accurate relative camera poses. We show that multi-robot systems can benefit from differentiable and robust 3D reconstruction optimized from multiple NeRFs. Experiments on real-world and synthetic data demonstrate the efficiency of the proposed algorithm. See the website of the project for videos of the experiments and supplementary material https://sites.google.com/view/di-nerf/home.
|
|
ThCT11 |
314 |
Haptics 2 |
Regular Session |
Chair: Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Co-Chair: Dills, Patrick | University of Wisconsin - Madison |
|
11:15-11:20, Paper ThCT11.1 | |
A Hybrid Haptic Device for Virtual Car Door Interactions: Design and Implementation |
|
Ma, Jihyeong | Korea Advanced Institute of Science and Technology |
Kim, Ji-Sung | KAIST |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Haptics and Haptic Interfaces, Virtual Reality and Interfaces, Compliance and Impedance Control
Abstract: As cars evolve from mere modes of transportation into living spaces, the importance of haptic interaction with vehicles is increasing. Here, we introduce a hybrid haptic device for the virtual prototyping of car doors, employing both a motor and a brake. Physical prototyping, which is a conventional method for product design, is often expensive and time-consuming. As a valuable alternative, virtual prototyping with a haptic device that delivers realistic haptic feedback can be utilized. However, replicating the substantial torque of a car door requires a high torque capacity motor, which can potentially pose safety risks to the user during haptic interaction. The proposed hybrid haptic device, combining a servo motor and a magnetic powder brake, effectively renders the dynamics of car doors. We experimentally measured the door's torque profile and confirmed significant friction from the door check mechanism and hinge. The torque profile was divided into active and passive torque, and each torque was distributed to the motor and brake, respectively. Finally, the proposed device and control method demonstrate the capability to accurately render the car door's kinesthetic haptic feedback, confirming its potential as an efficient tool for virtual prototyping in automotive design.
|
|
11:20-11:25, Paper ThCT11.2 | |
RAR-6: An Optimized Reconfigurable Asymmetric 6-DOF Haptic Robot for Gross and Fine Motor Tasks |
|
Zhang, Changqi | SINOPEC Research Institute of Petroleum Engineering Co., Ltd |
Wang, Cui | Southern University of Science and Technology |
Wang, Congzhe | Chongqing University of Posts and Telecommunications |
Zhang, Mingming | Southern University of Science and Technology |
Keywords: Haptics and Haptic Interfaces, Optimization and Optimal Control, Mechanism Design
Abstract: Robot-assisted task-oriented training demonstrates immense potential in the rehabilitation area. Parallel robots, with advantages such as low inertia and high stiffness, facilitate precise haptic feedback, yet their application in rehabilitation is limited by workspace constraints. To this end, we propose a design scheme for a haptic robot based on a reconfigurable asymmetric parallel mechanism. We first introduce a two-stage multi-objective optimization method to obtain the optimal parameter configurations. Then, to achieve precise assembly of the reconfigurable mechanism in each configuration, corresponding positioning mechanisms are designed. System performance tests validate the robot’s capabilities under different configurations: workspace meets design requirements, stiffness output reaches 30 N/mm, force output is 40 N, RMS of maximum back-driven force along x, y, and z axes is 7.5 N, and RMS of maximum back-driven torque around x and y axes is 567.4 N∙mm. Target tracking and virtual channel trajectory tracking experiments demonstrate the system’s haptic rendering ability for gross motor tasks (GMTs) and fine motor tasks (FMTs), respectively. The developed 6-DOF haptic robot holds promise for versatile task-oriented rehabilitation training.
|
|
11:25-11:30, Paper ThCT11.3 | |
Design, Implementation, and Validation of an Ungrounded Visuo-Tactile Haptic Interface for Robotic Teleoperation in High-Risk Steel Production |
|
Park, Jaehyun | Pohang University of Science and Technology |
Choi, Il Seop | POSCO HOLDINGS |
Choi, Sang-Woo | PoscoHoldings |
Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Robotics in Hazardous Fields
Abstract: Haptic devices are widely used as control interfaces for robotic teleoperation, offering intuitive rendering of interactions between the remote robot and its environment. In particular, cutaneous feedback devices provide intrinsic stability and a reduced form factor compared to kinesthetic feedback interfaces. However, the implementation of cutaneous feedback devices in industrial settings must be rigorously validated to prevent potential equipment accidents, which could lead to substantial economic losses due to unskilled robot manipulation. This paper presents a novel ungrounded haptic control interface (POstick-VF), designed specifically for high-risk steel production tasks. POstick-VF offers visuo-tactile feedback within an extensive workspace, enabling intuitive robot manipulation through its kinematic similarity with real tools while ensuring safety. The performance of the developed POstick is rigorously validated and compared with a conventional joystick controller through experiments conducted with an on-site hydraulic robot.
|
|
11:30-11:35, Paper ThCT11.4 | |
Enhanced Tiny Haptic Dial with T-Shaped Shaft Based on Magnetorheological Fluid |
|
Heo, Yong Hae | Korea University of Technology and Education |
Kim, Seongho | Korea University of Technology and Education |
Kim, Sang-Youn | Korea Univ. Technology & Education |
Keywords: Haptics and Haptic Interfaces, Touch in HRI
Abstract: This paper introduces a tiny haptic dial utilizing magnetorheological fluid (MRF) to enhance its resistive torque feedback. Moreover, we design the T-shaped rotary shaft with bumps and embed it into the haptic dial to enhance its haptic performance (resistive torque). This structure enables two operation modes (shear and flow) of MRF that contribute to the actuation simultaneously in the proposed haptic dial. This structure allows the magnetic flux to flow towards the MRF, helping further maximize the resistive torque. We conduct a simulation to confirm that the magnetic flux generated from a solenoid forms a closed-loop magnetic path without magnetic saturation or leakage in the proposed haptic dial. The resistive torque of the proposed haptic dial varied from 8 N·mm to 47 N·mm as the input current changed from 0 to 300 mA, thus indicating that the proposed haptic dial can create a variety of haptic sensations in a tiny size (diameter: 20 mm; height: 20 mm).
|
|
11:35-11:40, Paper ThCT11.5 | |
Path-Constrained Haptic Motion Guidance Via Adaptive Phase-Based Admittance Control |
|
Shahriari, Erfan | Boston Dynamics AI Institute |
Svarny, Petr | CTU in Prague, FEE |
Baradaran Birjandi, Seyed Ali | Technical University of Munich |
Hoffmann, Matej | Czech Technical University in Prague, Faculty of Electrical Engi |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Motion and Path Planning, Physical Human-Robot Interaction, Robust/Adaptive Control of Robotic Systems, Motion Control
Abstract: Robots have surpassed humans in terms of strength and precision, yet humans retain an unparalleled ability for decision-making in the face of unpredictable disturbances. This article aims to combine the strengths of both entities within a singular task: human motion guidance under strict geometric constraints, particularly adhering to predetermined paths. To tackle this challenge, a modular haptic guidance law is proposed that takes the human-applied wrench as an input. Using an auxiliary variable called phase, the generated desired motion is guaranteed to consistently adhere to the constraint path. It is demonstrated how the guidance policy can be generalized into physically interpretable terms, adjustable either prior to initiating the task or dynamically while the task is in progress. Additionally, an illustrative guidance adaptation policy is showcased that takes into account the human's manipulability. Leveraging passivity analysis, potential sources of instability are pinpointed, and subsequently, overall system stability is ensured by incorporating an augmented virtual energy tank. Lastly, a comprehensive set of experiments, including a 20-participant user study, explores various aspects of the approach in practice, encompassing both technical and usability considerations.
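The sketch below illustrates the core idea of phase-based guidance on a fixed path: the human wrench can only advance or retract a scalar phase, so the commanded motion never leaves the constraint path. The circular path, gains, and simulated force profile are illustrative and do not reproduce the article's controller, energy tank, or adaptation policy.

```python
# Minimal sketch of phase-based guidance along a fixed geometric path.
# All gains, the path, and the force profile are placeholders.
import numpy as np

def path(s):                       # constraint path: circle of radius 0.3 m
    return np.array([0.3 * np.cos(s), 0.3 * np.sin(s)])

def tangent(s):
    t = np.array([-0.3 * np.sin(s), 0.3 * np.cos(s)])
    return t / np.linalg.norm(t)

s, dt, admittance_gain = 0.0, 0.002, 4.0
for k in range(1000):
    f_human = np.array([1.0, 0.5]) if k < 600 else np.array([-0.5, 0.0])
    s_dot = admittance_gain * float(tangent(s) @ f_human)   # phase dynamics
    s += s_dot * dt
    x_des = path(s)                                          # always on the path

print("final phase:", round(s, 3), "desired position:", np.round(x_des, 3))
```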
|
|
11:40-11:45, Paper ThCT11.6 | |
A Pneumatic-Actuated Feel-Through Wearable Haptic Display for Multi-Cue Delivery |
|
Pagnanelli, Giulia | University of Pisa |
Latella, Giovanni | University of Pisa |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Bianchi, Matteo | University of Pisa |
Keywords: Haptics and Haptic Interfaces, Wearable Robotics, Mechanism Design
Abstract: Compared to the "Seeing-through" paradigm for the concurrent display of both real and virtual images in vision-enabled Augmented Reality (AR), its haptic counterpart, i.e., the "Feeling-through" via wearable tactile systems, which enables the user to simultaneously experience physical objects and haptically rendered virtual properties, is still largely unexplored. In a previous work, we introduced the Wearable-Fabric Yielding Display (W-FYD), which uses an elastic thin fabric as the interaction surface with the finger, allowing the delivery of softness-related cues both in active and passive exploration mode, together with sliding stimuli. The device was proven effective, but the current design faces form factor issues related to the dimensions and weight of the device, due to the actuation strategy of the lifting mechanism in the passive mode. To tackle this issue, we propose a miniaturized version of the system, named the W-FYD AIR, which allows reducing the overall dimensions of the device, from 100 × 60 × 36 mm to 78 × 45 × 37 mm, and its weight, from 100 g to 54 g, by exploiting pneumatically-actuated chambers for the lifting mechanism. Through careful sizing of each component and a process of characterization and identification, we demonstrated that the new system attained the same characteristics and functionality as the original one.
|
|
ThCT12 |
315 |
Big Data |
Regular Session |
Chair: Xu, Danfei | Georgia Institute of Technology |
Co-Chair: Shi, Guangyao | University of Southern California |
|
11:15-11:20, Paper ThCT12.1 | |
How Generalizable Is My Behavior Cloning Policy? a Statistical Approach to Trustworthy Performance Evaluation |
|
Vincent, Joseph | Stanford University |
Nishimura, Haruki | Toyota Research Institute |
Itkina, Masha | Stanford University |
Shah, Paarth | University of Oxford |
Schwager, Mac | Stanford University |
Kollar, Thomas | Toyota Research Institute |
Keywords: Performance Evaluation and Benchmarking, Probability and Statistical Methods, AI-Enabled Robotics
Abstract: With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.
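For intuition on distribution-level bounds from a handful of rollouts, the sketch below uses the classical Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to build a confidence band on the CDF of a performance score. This is a textbook construction offered only as an illustration; the paper's bounds are built differently and may be tighter.

```python
# Minimal sketch of a distribution-level bound from few rollouts via the DKW
# inequality. Rollout scores and the confidence level are made-up examples.
import numpy as np

scores = np.array([0.9, 1.0, 0.7, 1.0, 0.8, 0.6, 1.0, 0.9])   # per-rollout returns
n, delta = len(scores), 0.05
eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))                 # DKW half-width

xs = np.sort(scores)
ecdf = np.arange(1, n + 1) / n
cdf_upper = np.minimum(1.0, ecdf + eps)    # F(x) <= ecdf + eps, uniformly, w.p. >= 1 - delta

# A worst-case (stochastically smallest) performance distribution assigns as
# much probability as the band allows to low scores.
for x, u in zip(xs, cdf_upper):
    print(f"P(score <= {x:.2f}) <= {u:.2f}")
```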
|
|
11:20-11:25, Paper ThCT12.2 | |
Fine-Grained Open-Vocabulary Object Detection with Fine-Grained Prompts: Task, Dataset and Benchmark |
|
Liu, Ying | Northeastern University, China |
Hua, Yijing | Northeastern University, China |
Chai, Haojiang | Northeastern University, China |
Wang, Yanbo | Northeastern University, China |
TengQi, Ye | Articul8 AI |
Keywords: Data Sets for Robotic Vision, Computer Vision for Automation, Object Detection, Segmentation and Categorization
Abstract: Open-vocabulary detectors are proposed to locate and recognize objects in novel classes. However, variations in vision-aware language vocabulary data used for open-vocabulary learning can lead to unfair and unreliable evaluations. Recent evaluation methods have attempted to address this issue by incorporating object properties or adding locations and characteristics to the captions. Nevertheless, since these properties and locations depend on the specific details of the images instead of classes, detectors cannot make accurate predictions without precise descriptions provided through human annotation. This paper introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to the open-vocabulary setting. Our task is intuitive and challenging, requiring a deep understanding of fine-grained captions and careful attention to fine-grained details in images in order to accurately detect fine-grained objects. Additionally, due to the scarcity of qualified fine-grained object detection datasets, we have created a new dataset, NEU-171K, tailored for both supervised and open-vocabulary settings. We benchmark state-of-the-art object detectors on our dataset for both settings. Furthermore, we propose a simple yet effective post-processing technique. Our data, annotations, and codes are available at https://github.com/tengerye/3FOVD.
|
|
11:25-11:30, Paper ThCT12.3 | |
GPU-Accelerated Subsystem-Based ADMM for Large-Scale Interactive Simulation |
|
Ji, Harim | Seoul National University |
Kim, Hyunsu | Seoul National University |
Lee, Jeongmin | Seoul National University |
Lee, Somang | Seoul National University |
An, Seoki | Seoul National University |
Heo, Jinuk | Seoul National University |
Lee, Youngseon | Seoul National University |
Lee, Yongseok | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Simulation and Animation, Virtual Reality and Interfaces, Haptics and Haptic Interfaces
Abstract: In this paper, we implement the GPU-accelerated subsystem-based Alternating Direction Method of Multipliers (SubADMM) for interactive simulation. The challenging objective for interactive simulations is to deliver realistic results under tight performance, even for large-scale scenarios. We aim to achieve this by exploiting the parallelizable nature of SubADMM to the fullest extent. We introduce a new subsystem division strategy to make SubADMM 'GPU friendly' along with custom kernel designs and optimization regarding efficient memory access patterns. We successfully implement the GPU-accelerated SubADMM and show the accuracy and speed of the framework for large-scale scenarios, highlighted with an interactive 'Hand demo' scenario. We also show improved robustness and accuracy compared to other state-of-the-art interactive simulators with several challenging scenarios that introduce large-scale ill-conditioned dynamics problems.
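The sketch below shows plain consensus ADMM on toy quadratic subsystems, the algorithmic family whose per-subsystem updates the paper parallelizes on the GPU. The real solver handles contact dynamics and custom kernels, none of which is reproduced here.

```python
# Minimal sketch of consensus ADMM: independent local updates per subsystem,
# an averaging consensus step, and a dual update. Toy quadratics only.
import numpy as np

targets = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 3.0]])   # per-subsystem data a_i
n, dim, rho = targets.shape[0], targets.shape[1], 1.0

x = np.zeros((n, dim))          # local variables
u = np.zeros((n, dim))          # scaled dual variables
z = np.zeros(dim)               # consensus variable

for _ in range(100):
    # Local updates (independent per subsystem -> embarrassingly parallel):
    # argmin_x 0.5||x - a_i||^2 + (rho/2)||x - z + u_i||^2
    x = (targets + rho * (z - u)) / (1.0 + rho)
    z = np.mean(x + u, axis=0)          # consensus (averaging) step
    u = u + x - z                       # dual update

print("consensus solution:", np.round(z, 3))     # approaches mean(targets)
print("mean of targets:   ", np.round(targets.mean(axis=0), 3))
```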
|
|
11:30-11:35, Paper ThCT12.4 | |
Local Policies Enable Zero-Shot Long-Horizon Manipulation |
|
Dalal, Murtaza | Carnegie Mellon University |
Liu, Min | Carnegie Mellon University |
Talbott, Walter | Apple |
Chen, Chen | Apple |
Pathak, Deepak | Carnegie Mellon University |
Zhang, Jian | Purdue University |
Salakhutdinov, Ruslan | University of Toronto |
Keywords: Big Data in Robotics and Automation, Machine Learning for Robot Control, Deep Learning Methods
Abstract: Sim2real for robotic manipulation is difficult due to the challenges of simulating complex contacts and generating realistic task distributions. To tackle the latter problem, we introduce ManipGen, which leverages a new class of policies for sim2real transfer: local policies. Locality enables a variety of appealing properties including invariances to absolute robot and object pose, skill ordering, and global scene configuration. We combine these policies with foundation models for vision, language and motion planning and demonstrate SOTA zero-shot performance of our method on Robosuite benchmark tasks in simulation (97%). We transfer our local policies from simulation to reality and observe they can solve unseen long-horizon manipulation tasks with up to 8 stages with significant pose, object and scene configuration variation. ManipGen outperforms SOTA approaches such as SayCan, OpenVLA and LLMTrajGen across 50 real-world manipulation tasks by 36%, 76% and 62% respectively. All code, models and datasets will be released. Video results at manipgen.github.io
|
|
11:35-11:40, Paper ThCT12.5 | |
DART: Dexterous Augmented Reality Teleoperation Platform for Large-Scale Robot Data Collection in Simulation |
|
Park, Younghyo | MIT |
Bhatia, Jagdeep | Massachusetts Institute of Technology |
Ankile, Lars | Massachusetts Institute of Technology |
Agrawal, Pulkit | MIT |
Keywords: Data Sets for Robot Learning, Telerobotics and Teleoperation, Virtual Reality and Interfaces
Abstract: The scarcity of diverse and high-quality data impedes the quest to build a generalist robotic system. Current robotics data collection efforts face many challenges: the need for physical robotic hardware, setting up the environment, frequent resets, and the fatigue for data collectors operating real robots. We introduce DART, a teleoperation platform designed for crowdsourcing that reimagines robotic data collection by leveraging cloud-based simulation and augmented reality (AR) to address many limitations of prior data collection efforts. User studies show that DART enables higher data collection throughput and lower physical fatigue than real-world teleoperation. We also demonstrate that policies trained using DART-collected datasets successfully transfer to reality and are robust to unseen visual disturbances. All data collected through DART is automatically stored in a cloud-hosted database, DexHub, paving the path for an ever-growing data hub for robot learning.
|
|
ThCT13 |
316 |
Motion Prediction |
Regular Session |
Chair: Liang, Xiao | Texas A&M University |
Co-Chair: Stiffler, Nicholas | University of Dayton |
|
11:15-11:20, Paper ThCT13.1 | |
TransFusion: A Practical and Effective Transformer-Based Diffusion Model for 3D Human Motion Prediction |
|
Tian, Sibo | Texas A&M University |
Zheng, Minghui | Texas A&M University |
Liang, Xiao | Texas A&M University |
Keywords: Human-Robot Collaboration, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Predicting human motion plays a crucial role in ensuring a safe and effective human-robot close collaboration in intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those focusing on accuracy, predicting a single future motion, and those generating diverse predictions based on observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter group often produces motion sequences that deviate too far from the ground truth or become unrealistic within historical contexts. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction which can generate samples that are more likely to happen while maintaining a certain level of diversity. Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform to model motion sequences in the frequency space, thereby improving performance. In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization to condition the prediction on past observed motion, we treat all inputs, including conditions, as tokens to create a more practical and effective model compared to existing approaches. Extensive experimental studies are conducted on benchmark datasets to validate the effectiveness of our human motion prediction model. The project page is available at https://github.com/sibotian96/TransFusion.
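A small example of the frequency-space representation mentioned above: applying the discrete cosine transform along time and keeping only the low-frequency coefficients gives a compact, smooth approximation of each joint trajectory. The motion data here is synthetic.

```python
# Minimal sketch of representing a motion sequence in frequency space with the
# DCT and truncating to low frequencies. Synthetic data only.
import numpy as np
from scipy.fft import dct, idct

T, J = 50, 17                                    # frames, joints (one coordinate each)
t = np.linspace(0, 2 * np.pi, T)[:, None]
motion = np.sin(t + np.arange(J) * 0.1)          # toy (T, J) joint trajectories

coeffs = dct(motion, axis=0, norm="ortho")       # per-joint DCT along time
k = 10                                           # keep the 10 lowest frequencies
coeffs[k:] = 0.0
reconstructed = idct(coeffs, axis=0, norm="ortho")

err = np.max(np.abs(reconstructed - motion))
print(f"max reconstruction error with {k}/{T} DCT coefficients: {err:.4f}")
```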
|
|
11:20-11:25, Paper ThCT13.2 | |
DE-TGN: Uncertainty-Aware Human Motion Forecasting Using Deep Ensembles |
|
Eltouny, Kareem | Simpson Gumpertz & Heger |
Liu, Wansong | University at Buffalo |
Tian, Sibo | Texas A&M University |
Zheng, Minghui | Texas A&M University |
Liang, Xiao | Texas A&M University |
Keywords: Human-Robot Collaboration, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Ensuring the safety of human workers in a collaborative environment with robots is of utmost importance. Although accurate pose prediction models can help prevent collisions between human workers and robots, they are still susceptible to critical errors. In this study, we propose a novel approach called deep ensembles of temporal graph neural networks (DE-TGN) that not only accurately forecast human motion but also provide a measure of prediction uncertainty. By leveraging deep ensembles and employing stochastic Monte-Carlo dropout sampling, we construct a volumetric field representing a range of potential future human poses based on covariance ellipsoids. To validate our framework, we conducted experiments using three motion capture datasets including Human3.6M, and two human-robot interaction scenarios, achieving state-of-the-art prediction error. Moreover, we discovered that deep ensembles not only enable us to quantify uncertainty but also improve the accuracy of our predictions.
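The sketch below combines a deep ensemble with Monte-Carlo dropout to produce a predictive mean and covariance for a forecasted 3D joint position, the kind of uncertainty the abstract converts into covariance ellipsoids. The tiny untrained MLPs are placeholders for the temporal graph networks used in the paper.

```python
# Minimal sketch of deep-ensemble + MC-dropout uncertainty for one predicted
# 3D joint position. Placeholder MLPs; not the paper's architecture.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Dropout(p=0.2),
                         nn.Linear(64, 3))          # past features -> next 3D position

ensemble = [make_model() for _ in range(5)]
x = torch.randn(1, 6)                               # observed motion features

samples = []
with torch.no_grad():
    for model in ensemble:
        model.train()                               # keep dropout active at inference
        for _ in range(20):                         # MC dropout samples per member
            samples.append(model(x).squeeze(0))

S = torch.stack(samples)                            # (5 * 20, 3)
mean = S.mean(dim=0)
centered = S - mean
cov = centered.T @ centered / (S.shape[0] - 1)      # 3x3 covariance ellipsoid
print("predicted mean:", mean)
print("covariance diagonal:", torch.diag(cov))
```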
|
|
11:25-11:30, Paper ThCT13.3 | |
A Large-Scale Dataset for Humanoid Robotics Enabling a Novel Data-Driven Fall Prediction |
|
Urbann, Oliver | Fraunhofer IML |
Eßer, Julian | Fraunhofer IML |
Kleingarn, Diana | TU Dortmund University |
Moos, Arne | Robotics Research Institute |
Brämer, Dominik | Fraunhofer IML |
Brömmel, Piet | Fraunhofer IML |
Bach, Nicolas | Fraunhofer IML |
Jestel, Christian | Fraunhofer IML |
Larisch, Aaron | TU Dortmund University |
Kirchheim, Alice | TU Dortmund |
Keywords: Humanoid and Bipedal Locomotion, Failure Detection and Recovery, Data Sets for Robot Learning
Abstract: In this paper, we present a comprehensive dataset comprising 37.9 hours of sensor data collected from humanoid robots, including 18.3 hours of walking and 2,519 recorded falls. This extensive dataset is a valuable resource for various robotics and machine learning applications. Leveraging this data, we propose RePro-TCN, a Temporal Convolutional Network (TCN) enhanced with two novel extensions: Relaxed Loss Formulation and Progressive Forecasting. Predicting falls is a critical capability in humanoid robotics for implementing countermeasures such as lunging or stopping the walk. Thanks to the new dataset, we train RePro-TCN and demonstrate its superiority over previous approaches under real-world conditions that were previously unattainable.
|
|
11:30-11:35, Paper ThCT13.4 | |
Social-MAE: Social Masked Autoencoder for Multi-Person Motion Representation Learning |
|
Ehsanpour, Mahsa | University of Adelaide |
Reid, Ian | University of Adelaide |
Rezatofighi, Hamid | Monash University |
Keywords: Deep Learning for Visual Perception, Recognition, Human-Centered Robotics
Abstract: For seamless robot navigation, it’s vital to thoroughly understand multi-person scenes, which requires moving beyond simple tasks such as detection and tracking. Higher-level tasks, such as understanding the interactions and social activities among individuals, are also crucial. Progress towards models that can fully understand scenes involving multiple people is hindered by a lack of sufficient annotated data for such high-level tasks. To address this challenge, we introduce Social-MAE, a simple yet effective transformer-based masked autoencoder framework for multi-person human motion data. The framework uses masked modeling to pre-train the encoder to reconstruct masked human joint trajectories, enabling it to learn generalizable representations of motion in human crowded scenes. Social-MAE comprises a transformer as the MAE encoder and a lighter-weight transformer as the MAE decoder which operates on multi-person joints’ trajectory. After the reconstruction task, the MAE decoder is replaced with a task-specific decoder and the model is fine-tuned end-to-end for a variety of high-level social tasks. Our proposed model combined with our pre-training approach achieves the state-of-the-art results on various high-level social tasks, including multi-person pose forecasting, social grouping, and social action understanding. These improvements are demonstrated across four popular multi-person datasets encompassing both human 2D and 3D body pose.
|
|
11:35-11:40, Paper ThCT13.5 | |
Depth-Temporal Attention with Dual Modality Data for Walking Intention Prediction in Close-Proximity Front-Following |
|
Zhao, Chongyu | The University of Hong Kong |
Guo, Lingyu | The University of Hong Kong |
Wen, Rongwei | The University of Hong Kong |
Wang, Yanrui | The University of Hong Kong |
Wu, Chuan | The University of Hong Kong |
Keywords: Human Detection and Tracking, Intention Recognition, Visual Learning
Abstract: The role of robot following is crucial for effective human-robot collaboration. Traditional methods often rely on maintaining a significant distance between the robot and the human, which limits interaction and responsiveness. In contrast, close-proximity front-following facilitates immediate engagement, enhancing user experience and improving human-robot interaction. Nonetheless, it presents challenges in accurately interpreting human walking intentions due to a restricted observational field. In our paper, we introduce an innovative Depth-Temporal Attention Network that takes lower-limb depth images and robot motor signals as input, to accurately predict human walking intentions. This network leverages a depth attention module to capture essential spatial features and integrates a temporal attention mechanism to analyze movement dynamics. To enhance generalization, we use a domain adversarial module that focuses on shared features across diverse walking data, ensuring consistent performance across users. Experimental results demonstrate that our approach achieves an impressive average intention prediction accuracy of 91.09%, significantly surpassing baseline models by 12.59% to 23.66%. Additionally, an ablation study reveals that the depth-attention module substantially improves the model's understanding of depth features, resulting in an 11.44% increase in accuracy. With this high prediction accuracy, smooth front-following is achieved at close-proximity.
|
|
11:40-11:45, Paper ThCT13.6 | |
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction |
|
Nilavadi, Nisarga | University of Technology Nuremberg |
Rudenko, Andrey | Robert Bosch GmbH |
Linder, Timm | Robert Bosch GmbH |
Keywords: Human Detection and Tracking, Human Factors and Human-in-the-Loop, Datasets for Human Motion
Abstract: We introduce a unified approach to forecast the dynamics of human keypoints along with the motion trajectory based on a short sequence of input poses. While many studies address either full-body pose prediction or motion trajectory prediction, only a few attempt to merge them. We propose a motion transformation technique to simultaneously predict full-body pose and trajectory key-points in a global coordinate frame. We utilize an off-the-shelf 3D human pose estimation module, a graph attention network to encode the skeleton structure, and a compact, non-autoregressive transformer suitable for real-time motion prediction for human-robot interaction and human-aware navigation. We introduce a human navigation dataset "DARKO" with specific focus on navigational activities that are relevant for human-aware mobile robot navigation. We perform extensive evaluation on Human3.6M, CMU-Mocap, and our DARKO dataset. In comparison to prior work, we show that our approach is compact, real-time, and accurate in predicting human navigation motion across all datasets. Result animations, our dataset, and code will be available at https://nisarganc.github.io/UPTor-page/
|
|
ThCT14 |
402 |
Scene Reconstruction Using Radiance Fields |
Regular Session |
Chair: Schwertfeger, Sören | ShanghaiTech University |
Co-Chair: Zakharov, Sergey | Toyota Research Institute |
|
11:15-11:20, Paper ThCT14.1 | |
Category-Level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment |
|
Lee, Taekbeom | Seoul National University |
Jang, Youngseok | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Mapping, Semantic Scene Understanding, Visual Learning
Abstract: Neural implicit representation has been attracting attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object-compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To better treat this problem, we introduce a category-level neural field that learns meaningful common 3D information among objects belonging to the same category present in the scene. Our key idea is to subcategorize objects based on their observed shape for better training of the category-level model. Then we take advantage of the neural field to conduct the challenging task of registering partially observed objects by selecting and aligning against representative objects selected by ray-based uncertainty. Experiments on both simulation and real-world datasets demonstrate that our method improves the reconstruction of unobserved parts for several categories.
|
|
11:20-11:25, Paper ThCT14.2 | |
PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields |
|
Chen, Zheng | Indiana University Bloomington |
Yan, Qingan | Goertek US |
Zhan, Huangying | The University of Adelaide |
Cai, Changjiang | Stevens Institute of Technology |
Xu, Xiangyu | OPPO |
Huang, Yuzhong | University of Southern California |
Wang, Weihan | Stevens Institute of Technology |
Feng, Ziyue | Clemson University |
Xu, Yi | OPPO US Research Center |
Liu, Lantao | Indiana University |
Keywords: RGB-D Perception, Recognition
Abstract: Identifying spatially complete planar primitives from visual data is a crucial task in computer vision. Prior methods are largely restricted to either 2D segment recovery or simplifying 3D structures, even with extensive plane annotations. We present PlanarNeRF, a novel framework capable of detecting dense 3D planes through online learning. Drawing upon the neural field representation, PlanarNeRF brings three major contributions. First, it enhances 3D plane detection with concurrent appearance and geometry knowledge. Second, a lightweight plane fitting module is used to estimate plane parameters. Third, a novel global memory bank structure with an update mechanism is introduced, ensuring consistent cross-frame correspondence. The flexible architecture of PlanarNeRF allows it to function in both 2D-supervised and self-supervised solutions, in each of which it can effectively learn from sparse training signals, significantly improving training efficiency. Through extensive experiments, we demonstrate the effectiveness of PlanarNeRF in various real-world scenarios and remarkable improvement in 3D plane detection over existing works.
|
|
11:25-11:30, Paper ThCT14.3 | |
FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving Via Point-Level Dynamic-Static Decoupling |
|
Wen, Yue | Shanghai Jiao Tong University |
Song, Liang | Dimension |
Liu, Yijia | China University of Mining and Technology |
Zhu, Siting | Shanghai Jiao Tong University |
Miao, Yanzi | China University of Mining and Technology |
Han, Lijun | Shanghai Jiao Tong University |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Dynamic scene reconstruction for autonomous driving enables vehicles to perceive and interpret complex scene changes more precisely. Dynamic Neural Radiance Fields (NeRFs) have recently shown promising capability in scene modeling. However, many existing methods rely heavily on accurate pose inputs and multi-sensor data, leading to increased system complexity. To address this, we propose FreeDriveRF, which reconstructs dynamic driving scenes using only sequential RGB images without requiring pose inputs. We innovatively decouple dynamic and static parts at the early sampling level, avoiding image blurring and artifacts. To overcome the challenges posed by object motion and occlusion with a monocular camera, we introduce a warped ray-guided dynamic object rendering consistency loss, utilizing optical flow to better constrain the dynamic modeling process. Additionally, we incorporate estimated dynamic flow to constrain the pose optimization process, improving the stability and accuracy of unbounded scene reconstruction. Extensive experiments conducted on the KITTI and Waymo datasets demonstrate the superior performance of our method in dynamic scene modeling for autonomous driving.
|
|
11:30-11:35, Paper ThCT14.4 | |
LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment |
|
Wang, Haoran | The University of Sussex |
Huang, Jingwei | University of Electronic Science and Technology of China |
Yang, Lu | University of Electronic Science and Technology of China |
Deng, Tianchen | Shanghai Jiao Tong University |
Zhang, Gaojing | University of Sussex |
Li, Mingrui | Dalian University of Technology |
Keywords: Visual Learning, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: 3D Gaussian Splatting has shown remarkable capabilities in novel view rendering tasks and exhibits significant potential for multi-view optimization. However, the original 3D Gaussian Splatting lacks color representation for inputs in low-light environments. Simply using enhanced images as inputs would lead to issues with multi-view consistency, and current single-view enhancement systems rely on pre-trained data, lacking scene generalization. These problems limit the application of 3D Gaussian Splatting in low-light conditions in the field of robotics, including high-fidelity modeling and feature matching. To address these challenges, we propose an unsupervised multi-view stereoscopic system based on Gaussian Splatting, called Low-Light Gaussian Splatting (LLGS). This system aims to enhance images in low-light environments while reconstructing the scene. Our method introduces a decomposable Gaussian representation called M-Color, which separately characterizes color information for targeted enhancement. Furthermore, we propose an unsupervised optimization method with zero-knowledge priors, using direction-based enhancement to ensure multi-view consistency. Experiments conducted on real-world datasets demonstrate that our system outperforms state-of-the-art methods in both low-light enhancement and 3D Gaussian Splatting.
|
|
11:35-11:40, Paper ThCT14.5 | |
Hash-GS: Anchor-Based 3D Gaussian Splatting with Multi-Resolution Hash Encoding for Efficient Scene Reconstruction |
|
Xie, Yijia | Zhejiang University |
Lin, Yuhang | Zhejiang University |
Li, Laijian | Zhejiang University |
Liu, Lina | Zhejiang University |
Wei, Xiaobin | Wasu Media & Network Co., Ltd |
Liu, Yong | Zhejiang University |
Lv, Jiajun | Zhejiang University |
Keywords: Visual Learning, Mapping, Deep Learning Methods
Abstract: Realistic 3D object and scene reconstruction is pivotal in advancing fields such as world model simulation and embodied intelligence. In this paper, we introduce Hash-GS, a storage-efficient method for large-scale scene reconstruction using anchor-based 3D Gaussian Splatting (3DGS). The vanilla 3DGS struggles with high memory demands due to the large number of primitives, especially in complex or extensive scenes. Hash-GS addresses these challenges with a compact representation by leveraging high-dimensional features to parameterize primitive properties, stored in compact hash tables, which reduces memory usage while preserving rendering quality. It also incorporates adaptive anchor management to efficiently control the number of anchors and neural Gaussians. Additionally, we introduce an analytic 3D smoothing filter to mitigate aliasing and support Level-of-Detail for optimized rendering across varying intrinsic parameters. Experimental results on several datasets demonstrate that Hash-GS improves storage efficiency while maintaining competitive rendering performance, especially in large-scale scenes.
|
|
11:40-11:45, Paper ThCT14.6 | |
Elite-EvGS: Learning Event-Based 3D Gaussian Splatting by Distilling Event-To-Video Priors |
|
Zhang, Zixin | HKUST-GZ |
Chen, Kanghao | Hong Kong University of Science and Technology (Guangzhou) |
Wang, Lin | Nanyang Technological University (NTU) |
Keywords: Visual Learning, Deep Learning for Visual Perception, Mapping
Abstract: Event cameras are bio-inspired sensors that output asynchronous and sparse event streams, instead of fixed frames. Benefiting from their distinct advantages, such as high dynamic range and high temporal resolution, event cameras have been applied to address 3D reconstruction, important for robotic mapping. Recently, neural rendering techniques, such as 3D Gaussian splatting (3DGS), have been shown successful in 3D reconstruction. However, it still remains under-explored how to develop an effective event-based 3DGS pipeline. In particular, as 3DGS typically depends on high-quality initialization and dense multiview constraints, a potential problem appears for the 3DGS optimization with events given its inherent sparse property. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. Our key idea is to distill the prior knowledge from the off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs the window-slicing operation to progressively reduce the number of events used for supervision. This subtly relieves the temporal randomness of the event frames, benefiting the optimization of local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on the captured real-world data, including diverse challenging conditions, such as fast motion and low light scenes. For demo and more results, please check our project page: https://vlislab22.github.io/elite-evgs/
|
|
ThCT15 |
403 |
Continuum Robots 2 |
Regular Session |
Chair: Krishnan, Girish | University of Illinois Urbana Champaign |
Co-Chair: Alambeigi, Farshid | University of Texas at Austin |
|
11:15-11:20, Paper ThCT15.1 | |
Hysteresis Compensation of Flexible Continuum Manipulator Using RGBD Sensing and Temporal Convolutional Network |
|
Park, Junhyun | DGIST |
Jang, Seonghyeok | DGIST |
Park, Hyojae | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Bae, Seongjun | DGIST |
Hwang, Minho | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Tendon/Wire Mechanism, Machine Learning for Robot Control, Modeling, Control, and Learning for Soft Robots
Abstract: Flexible continuum manipulators are valued for minimally invasive surgery, offering access to confined spaces through nonlinear paths. However, cable-driven manipulators face control difficulties due to hysteresis from cabling effects such as friction, elongation, and coupling. These effects are difficult to model due to nonlinearity, and the difficulties become even more evident when dealing with a long, coupled, multi-segmented manipulator. This paper proposes a data-driven approach based on Deep Neural Networks (DNN) to capture these nonlinear, previous-state-dependent characteristics of cable actuation. We collect physical joint configurations according to commanded joint configurations using RGBD sensing and 7 fiducial markers to model the hysteresis of the proposed manipulator. Results of a study comparing the estimation performance of four DNN models show that the Temporal Convolution Network (TCN) demonstrates the highest predictive capability. Leveraging trained TCNs, we build a control algorithm to compensate for hysteresis. Tracking tests in task space using unseen trajectories show that the proposed control algorithm reduces the average position and orientation error by 61.39% (from 13.7 mm to 5.29 mm) and 64.04% (from 31.17° to 11.21°), respectively. This result implies that the proposed calibrated controller effectively reaches the desired configurations by estimating the hysteresis of the manipulator. Applying this method in real surgical scenarios has the potential to enhance control precision and improve surgical performance.
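As a rough sketch of the regression described above, the code below defines a small causal temporal convolutional network that maps a window of commanded joint configurations to the measured physical configuration at the last time step. The layer sizes, 7-joint dimension, and window length are illustrative choices, not the paper's architecture.

```python
# Minimal sketch of a causal TCN for hysteresis regression: commanded joint
# history in, estimated physical joint configuration out. Illustrative sizes.
import torch
import torch.nn as nn

class CausalBlock(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation                 # left padding keeps causality
        self.conv = nn.Conv1d(ch, ch, kernel_size=3, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        x_p = nn.functional.pad(x, (self.pad, 0))     # pad only the past
        return self.act(self.conv(x_p)) + x           # residual connection

class TCN(nn.Module):
    def __init__(self, joints=7, hidden=32):
        super().__init__()
        self.inp = nn.Conv1d(joints, hidden, kernel_size=1)
        self.blocks = nn.Sequential(*[CausalBlock(hidden, d) for d in (1, 2, 4)])
        self.out = nn.Conv1d(hidden, joints, kernel_size=1)

    def forward(self, cmd):                            # cmd: (batch, joints, time)
        h = self.blocks(self.inp(cmd))
        return self.out(h)[..., -1]                    # physical config at last step

model = TCN()
commands = torch.randn(8, 7, 50)                       # batch of command histories
predicted_physical = model(commands)                   # (8, 7) estimated configurations
print(predicted_physical.shape)
```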
|
|
11:20-11:25, Paper ThCT15.2 | |
Command Filtered Cartesian Impedance Control for Tendon Driven Continuum Manipulators with Actuator Fault Compensation |
|
Zheng, Xianjie | Nanjing University of Science and Technology |
Yu, Zhaobao | Nanjing University of Science and Technology |
Ding, Meng | Nanjing University of Science and Technology |
Liu, Liaoxue | Nanjing University of Science and Technology |
Guo, Jian | Nanjing Univ. of Sci. & Tech |
Guo, Yu | Nanjing University of Science and Technology |
Keywords: Modeling, Control, and Learning for Soft Robots, Compliance and Impedance Control
Abstract: Continuum robots are well-suited for constrained environments due to their superior flexibility and structural compliance. However, relying solely on passive compliance may lead to damage to both the robot and the surrounding environment. This work proposes a finite-time Cartesian impedance control scheme for tendon-driven continuum manipulators (TDCMs), where a second-order low-pass filter is used to adjust the reference trajectory according to the external robot tip force. The controller is designed using the command filtered backstepping method, and finite-time stability is established via the designed Lyapunov function. In TDCM systems, the tendons operate antagonistically, and actuators often fail to quickly reach the desired tendon tension, leading to partial failures. To address this, we propose an actuator fault compensation algorithm to enhance system performance and reliability. We conducted trajectory tracking experiments on a multi-segment TDCM prototype; the results demonstrate that the designed Cartesian impedance controller achieves effective compliance control and high position control accuracy.
|
|
11:25-11:30, Paper ThCT15.3 | |
A Synergistic Framework for Learning Shape Estimation and Shape-Aware Whole-Body Control Policy for Continuum Robots |
|
Kasaei, Mohammadreza | University of Edinburgh |
Alambeigi, Farshid | University of Texas at Austin |
Khadem, Mohsen | University of Edinburgh |
Keywords: Modeling, Control, and Learning for Soft Robots, Machine Learning for Robot Control, Soft Robot Applications
Abstract: In this paper, we present a novel synergistic framework for learning shape estimation and a shape-aware whole-body control policy for continuum robots. Our approach leverages the interaction between two Augmented Neural Ordinary Differential Equations (ANODEs) - the Shape-NODE and Control-NODE - to achieve continuous shape estimation and shape-aware control. The Shape-NODE integrates prior knowledge from Cosserat rod theory, allowing it to adapt and account for model mismatches, while the Control-NODE uses this shape information to optimize a whole-body control policy, trained in a Model Predictive Control (MPC) fashion. This unified framework effectively overcomes limitations of existing data-driven methods, such as poor shape awareness and challenges in capturing complex nonlinear dynamics. Extensive evaluations in both simulation and real-world environments demonstrate the framework’s robust performance in shape estimation, trajectory tracking, and obstacle avoidance. The proposed method consistently outperforms state-of-the-art end-to-end, Neural-ODE, and Recurrent Neural Network (RNN) models, particularly in terms of tracking accuracy and generalization capabilities.
|
|
11:30-11:35, Paper ThCT15.4 | |
On the Benefits of Hysteresis in Tendon Driven Continuum Robots |
|
Hanley, David | University of Edinburgh |
Alambeigi, Farshid | University of Texas at Austin |
Khadem, Mohsen | University of Edinburgh |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Surgical Robotics: Steerable Catheters/Needles
Abstract: Hysteresis in the tendons driving continuum robots is frequently regarded as a nuisance and a problem that is best avoided. Some prior work seeks to ameliorate the effects of hysteresis through the selection of materials; others propose models of hysteresis to compensate for its effects. In this work, we present an empirically validated model of hysteresis in tendon-driven continuum robots. We demonstrate that hysteresis contributes to the stability of these robots by mitigating undesirable tensions in the robot's backbone. As a result, a model-based approach to hysteresis can be used not just to compensate for a nuisance, but to enhance the utility of continuum robots in safety-critical applications such as medical robots.
|
|
11:35-11:40, Paper ThCT15.5 | |
Automating Tension Calibration for Tendon-Driven Continuum Robots: A Low-Cost Approach towards Consistent Teleoperation |
|
Lee, Kyum | University of Toronto |
Shentu, Chengnan | University of Toronto |
Pogue, Chloe | University of Toronto |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: We present a low-cost method to automate tension calibration for tendon-driven continuum robots (TDCRs), particularly those lacking tension sensing. Our method utilizes Hall effect sensors to localize the robot's tip with respect to the one-dimensional trajectory it follows under individual tendon actuation. We propose two workflows for robots with and without a static model, making the method generalizable to other tendon-driven soft robots. We demonstrate our method's ability to repeatably tension the tendons through associated tendon displacements. The calibration approach's measured repeatability (±0.03 mm) is also benchmarked against manual calibration on a TDCR prototype, and its accuracy in achieving target tensions is assessed (0.06 ± 0.20 N). We further investigate how tension calibration impacts open-loop tracking accuracy, confirming the effectiveness of our method in enhancing motion consistency in open-loop control and teleoperation.
|
|
11:40-11:45, Paper ThCT15.6 | |
A Neural Network-Based Framework for Fast and Smooth Posture Reconstruction of a Soft Continuum Arm |
|
Wang, Tixian | University of Illinois at Urbana-Champaign |
Chang, Heng-Sheng | University of Illinois Urbana-Champaign |
Kim, Seung Hyun | University of Illinois at Urbana-Champaign |
Guo, Jiamiao | University of Illinois Urbana-Champaign |
Akcal, M. Ugur | University of Illinois Urbana-Champaign |
Walt, Benjamin | University of Illinois Urbana-Champaign |
Biskup, Darren | Columbia University |
Halder, Udit | University of South Florida |
Krishnan, Girish | University of Illinois Urbana Champaign |
Chowdhary, Girish | University of Illinois at Urbana Champaign |
Gazzola, Mattia | University of Illinois at Urbana-Champaign |
Mehta, Prashant | University of Illinois Urbana-Champaign |
Keywords: Soft Robot Applications, Modeling, Control, and Learning for Soft Robots, Software-Hardware Integration for Robot Systems
Abstract: A neural network-based framework is developed and experimentally demonstrated for the problem of estimating the shape of a soft continuum arm (SCA) from noisy measurements of the pose at a finite number of locations along the length of the arm. The neural network takes as input these measurements and produces as output a finite-dimensional approximation of the strain, which is further used to reconstruct the infinite-dimensional smooth posture. This problem is important for various soft robotic applications. It is challenging because the arm's flexibility leads to an infinite-dimensional reconstruction problem for the continuous posture and strains; as a result, past solutions to this problem are computationally intensive. The proposed fast smooth reconstruction method is shown to be five orders of magnitude faster while having comparable accuracy. The framework is evaluated on two testbeds: a simulated octopus muscular arm and a physical BR2 pneumatic soft manipulator.
|
|
ThCT16 |
404 |
Grasping 4 |
Regular Session |
Chair: Chakraborty, Nilanjan | Stony Brook University |
Co-Chair: Harada, Kensuke | Osaka University |
|
11:15-11:20, Paper ThCT16.1 | |
GraspSAM: When Segment Anything Model Meets Grasp Detection |
|
Noh, Sangjun | Gwangju Institute of Science and Technology |
Kim, Jong-Won | GIST(Gwangju Institute of Science and Technology) |
Nam, Dongwoo | Gwangju Institute of Science and Technology |
Back, Seunghyeok | Korea Institute of Machinery & Materials |
Kang, Raeyoung | Gwangju Institute of Science and Technology |
Lee, Kyoobin | Gwangju Institute of Science and Technology |
Keywords: Deep Learning Methods, Grasping, Transfer Learning
Abstract: Grasp detection requires flexibility to handle objects of various shapes without relying on prior object knowledge, while also offering intuitive, user-guided control. In this paper, we introduce GraspSAM, an innovative extension of the Segment Anything Model (SAM) designed for prompt-driven and category-agnostic grasp detection. Unlike previous methods, which are often limited by small-scale training data, GraspSAM leverages SAM’s large-scale training and prompt-based segmentation capabilities to efficiently support both target-object and category-agnostic grasping. By utilizing adapters, learnable token embeddings, and a lightweight modified decoder, GraspSAM requires minimal fine-tuning to integrate object segmentation and grasp prediction into a unified framework. Our model achieves state-of-the-art (SOTA) performance across multiple datasets, including Jacquard, Grasp-Anything, and Grasp-Anything++. Extensive experiments demonstrate GraspSAM’s flexibility in handling different types of prompts (such as points, boxes, and language), highlighting its robustness and effectiveness in real-world robotic applications. Robot demonstrations, additional results, and code can be found at https://gistailab.github.io/GraspSAM/.
|
|
11:20-11:25, Paper ThCT16.2 | |
Dexterous Ungrasping Manipulation in Three Dimensions |
|
Kang, Taewoong | Pusan National University |
Kim, Joonyoung | Pusan National University |
Oh, Seung Hwa | Pusan National University |
Lim, WooSung | Pusan National University |
Lee, Junwoo | Pusan National University |
Yi, Seung-Joon | Pusan National University |
Seo, Jungwon | Pusan National University |
Keywords: Dexterous Manipulation, Grasping, Assembly
Abstract: This study focuses on the robotic capability of ungrasping, or releasing, an object in a grasp from the gripper to the robot’s environment. The presented technique enables the delicate release of a grasped object using non-static contacts, allowing for rolling and/or sliding. This dexterous manipulation capability is particularly relevant when ungrasping thin or slender objects, as will be demonstrated with real examples. We initially discuss the establishment of three-dimensional stability during ungrasping manipulation, ensuring robustness. Subsequently, we present a planning and control solution for three-dimensional ungrasping, building upon our previous planar version. A series of experiments across various test scenarios, ranging from precision placement to puzzle tiling, showcase the viability and effectiveness of our approach.
|
|
11:25-11:30, Paper ThCT16.3 | |
RTAGrasp: Learning Task-Oriented Grasping from Human Videos Via Retrieval, Transfer, and Alignment |
|
Dong, Wenlong | Southern University of Science and Technology |
Huang, Dehao | Southern University of Science and Technology |
Liu, Jiangshan | Southern University of Science and Technology |
Tang, Chao | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: Grasping, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Task-oriented grasping (TOG) is crucial for robots to accomplish manipulation tasks, requiring the determination of TOG positions and directions. Existing methods either rely on costly manual TOG annotations or only extract coarse grasping positions or regions from human demonstrations, limiting their practicality in real-world applications. To address these limitations, we introduce RTAGrasp, a Retrieval, Transfer, and Alignment framework inspired by human grasping strategies. Specifically, our approach first effortlessly constructs a robot memory from human grasping demonstration videos, extracting both TOG position and direction constraints. Then, given a task instruction and a visual observation of the target object, RTAGrasp retrieves the most similar human grasping experience from its memory and leverages semantic matching capabilities of vision foundation models to transfer the TOG constraints to the target object in a training-free manner. Finally, RTAGrasp aligns the transferred TOG constraints with the robot's action for execution. Evaluations on the public TOG benchmark, TaskGrasp dataset, show the competitive performance of RTAGrasp on both seen and unseen object categories compared to existing baseline methods. Real-world experiments further validate its effectiveness on a robotic arm. Our code, appendix, and video are available at https://sites.google.com/view/rtagrasp/home.
|
|
11:30-11:35, Paper ThCT16.4 | |
You Only Estimate Once: Unified, One-Stage, Real-Time Category-Level Articulated Object 6D Pose Estimation for Robotic Grasping |
|
Huang, Jingshun | Fudan University |
Lin, Haitao | Tencent |
Wang, Tianyu | Fudan University |
Fu, Yanwei | Fudan University |
Jiang, Yu-Gang | Fudan University |
Xue, Xiangyang | Fudan University |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: This paper addresses the problem of category-level pose estimation for articulated objects in robotic manipulation tasks. Recent works have shown promising results in estimating part pose and size at the category level. However, these approaches primarily follow a complex multi-stage pipeline that first segments part instances in the point cloud and then estimates the Normalized Part Coordinate Space (NPCS) representation for 6D poses. These approaches suffer from high computational costs and low performance in real-time robotic tasks. To address these limitations, we propose YOEO, a single-stage method that simultaneously outputs instance segmentation and NPCS representations in an end-to-end manner. We use a unified network to generate point-wise semantic labels and centroid offsets, allowing points from the same part instance to vote for the same centroid. We further utilize a clustering algorithm to distinguish points based on their estimated centroid distances. Finally, we separate the NPCS region of each instance and align the separated regions with the real point cloud to recover the final pose and size. Experimental results on the GAPart dataset demonstrate the pose estimation capabilities of our proposed single-shot method. We also deploy our synthetically-trained model in a real-world setting, providing real-time visual feedback at 200 Hz and enabling a physical Kinova robot to interact with unseen articulated objects. This showcases the utility and effectiveness of our proposed method.
|
|
11:35-11:40, Paper ThCT16.5 | |
Point Cloud Decomposition for Task-Oriented Grasping |
|
Phi, Khiem | Stony Brook University |
Patankar, Aditya | Stony Brook University |
Mahalingam, Dasharadhan | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
Ramakrishnan, Iv | Stony Brook University |
Keywords: Grasping, Perception for Grasping and Manipulation
Abstract: Accurate localization of graspable regions within a single object point cloud is critical to enable task-based robot grasps. State-of-the-art task-based robot grasp synthesis methods fit over-approximated 3D bounding boxes that fail to isolate graspable regions even if they exist. While deep learning or geometrical shape decomposition methods can offer improved approximations, they lack guarantees for the graspability of segmented regions, require prior knowledge of the object, and/or demand large annotated datasets for fine-tuning. In this paper, we overcome these limitations by introducing ITSI (Iterative Slicing), a complete, task-oriented grasp synthesis approach that functions independently of object-specific knowledge. ITSI effectively segments multiple graspable regions that conform to the constraints of robot grippers, thereby enabling compatibility with any object a robot seeks to grasp and any robot gripper size. Our extensive real-world and simulation experiments on diverse object datasets demonstrate that ITSI dramatically increases the number of discoverable robot grasps, by up to 44% compared to the state of the art. We also expand ITSI's capabilities beyond task-based robot grasp synthesis to highlight its performance in human affordance segmentation, outperforming fully supervised deep learning methods by 1%.
|
|
11:40-11:45, Paper ThCT16.6 | |
Adaptive Grasping of Moving Objects in Dense Clutter Via Global-To-Local Detection and Static-To-Dynamic Planning |
|
Chen, Hao | Osaka University |
Kiyokawa, Takuya | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Grasping, Dexterous Manipulation, Planning under Uncertainty
Abstract: Robotic grasping faces a variety of real-world uncertainties caused by non-static object states, unknown object properties, and cluttered object arrangements. The difficulty of grasping increases with the presence of more uncertainties, where commonly used learning-based approaches struggle to perform stably across varying conditions. In this study, we extend the idea of using similarity matching to tackle the challenge of grasping novel objects that are simultaneously in motion and densely cluttered, where multiple uncertainties coexist, using only a single in-hand camera. We achieve this difficult task by shifting visual detection from global to local states and grasp planning from static to dynamic states. We propose several methods and algorithms to optimize planning efficiency and accuracy. Our system adapts to different object types, arrangements, and movement speeds without additional training, as demonstrated by our real-world experiments.
|
|
ThCT17 |
405 |
Localization 6 |
Regular Session |
Chair: Scherer, Sebastian | Carnegie Mellon University |
Co-Chair: Costante, Gabriele | University of Perugia |
|
11:15-11:20, Paper ThCT17.1 | |
UASTHN: Uncertainty-Aware Deep Homography Estimation for UAV Satellite-Thermal Geo-Localization |
|
Xiao, Jiuhong | New York University |
Loianno, Giuseppe | New York University |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Applications, Localization
Abstract: Geo-localization is an essential component of Unmanned Aerial Vehicle (UAV) navigation systems to ensure precise absolute self-localization in outdoor environments. To address the challenges of GPS signal interruptions or low illumination, Thermal Geo-localization (TG) employs aerial thermal imagery to align with reference satellite maps to accurately determine the UAV's location. However, existing TG methods lack uncertainty measurement in their outputs, compromising system robustness in the presence of textureless or corrupted thermal images, self-similar or outdated satellite maps, geometric noise, or thermal images that extend beyond the satellite map coverage. To overcome these limitations, this paper presents UASTHN, a novel approach for Uncertainty Estimation (UE) in Deep Homography Estimation (DHE) tasks for TG applications. Specifically, we introduce a novel Crop-based Test-Time Augmentation (CropTTA) strategy, which leverages the homography consensus of cropped image views to effectively measure data uncertainty. This approach is complemented by Deep Ensembles (DE) employed for model uncertainty, offering comparable performance with improved efficiency and seamless integration with any DHE model. Extensive experiments across multiple DHE models demonstrate the effectiveness and efficiency of CropTTA in TG applications. Analysis of detected failure cases underscores the improved reliability of CropTTA under challenging conditions. Finally, we demonstrate the capability of combining CropTTA and DE for a comprehensive assessment of both data and model uncertainty. Our research provides profound insights into the broader intersection of localization and uncertainty estimation. The code and models are publicly available.
|
|
11:20-11:25, Paper ThCT17.2 | |
Enhancing Feature Tracking Reliability for Visual Navigation Using Real-Time Safety Filter |
|
Kim, Dabin | Seoul National University |
Jang, Inkyu | Seoul National University |
Han, Youngsoo | Seoul National University |
Hwang, Sunwoo | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Sensor-based Control, Reactive and Sensor-Based Planning, View Planning for SLAM
Abstract: Vision sensors are extensively used for localizing a robot's pose, particularly in environments where global localization tools such as GPS or motion capture systems are unavailable. In many visual navigation systems, localization is achieved by detecting and tracking visual features or landmarks, which provide information about the sensor's relative pose. For reliable feature tracking and accurate pose estimation, it is crucial to maintain visibility of a sufficient number of features. This requirement can sometimes conflict with the robot's overall task objective. In this paper, we approach it as a constrained control problem. By leveraging the invariance properties of visibility constraints within the robot's kinematic model, we propose a real-time safety filter based on quadratic programming. This filter takes a reference velocity command as input and produces a modified velocity that minimally deviates from the reference while ensuring the information score from the currently visible features remains above a user-specified threshold. Numerical simulations demonstrate that the proposed safety filter preserves the invariance condition and ensures the visibility of more features than the required minimum. We also validated its real-world performance by integrating it into a visual simultaneous localization and mapping (SLAM) algorithm, where it maintained high estimation quality in challenging environments, outperforming a simple tracking controller.
|
|
11:25-11:30, Paper ThCT17.3 | |
SuperLoc: The Key to Robust LiDAR-Inertial Localization Lies in Predicting Alignment Risks |
|
Zhao, Shibo | Carnegie Mellon University |
Zhu, Honghao | CMU |
Gao, Yuanjun | Carnegie Mellon University |
Kim, Beomsoo | Hanyang University |
Qiu, Yuheng | Carnegie Mellon University |
Johnson, Aaron M. | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Localization, SLAM, Mapping
Abstract: Map-based LiDAR localization, while widely used in autonomous systems, faces significant challenges in degraded environments due to the lack of distinct geometric features. This paper introduces SuperLoc, a robust LiDAR localization package that addresses key limitations in existing methods. SuperLoc features a novel predictive alignment risk assessment technique, enabling early detection and mitigation of potential failures before optimization. This approach significantly improves performance in challenging scenarios such as corridors, tunnels, and caves. Unlike existing degeneracy mitigation algorithms that rely on post-optimization analysis and heuristic thresholds, SuperLoc evaluates the localizability of raw sensor measurements. Experimental results demonstrate significant performance improvements over state-of-the-art methods across various degraded environments. Our approach achieves a 49.7% increase in accuracy and exhibits the highest robustness. To facilitate further research, we release our implementation along with datasets from eight challenging scenarios.
|
|
11:30-11:35, Paper ThCT17.4 | |
Active Illumination for Visual Ego-Motion Estimation in the Dark |
|
Crocetti, Francesco | University of Perugia |
Dionigi, Alberto | University of Perugia |
Brilli, Raffaele | University of Perugia |
Costante, Gabriele | University of Perugia |
Valigi, Paolo | Universita' Di Perugia |
Keywords: Vision-Based Navigation, Perception-Action Coupling, Localization
Abstract: Visual Odometry (VO) and Visual SLAM (V-SLAM) systems often struggle in low-light and dark environments due to the lack of robust visual features. In this paper, we propose a novel active illumination framework to enhance the performance of VO and V-SLAM algorithms in these challenging conditions. The developed approach dynamically controls a moving light source to illuminate highly textured areas, thereby improving feature extraction and tracking. Specifically, a detector block, which incorporates a deep learning-based enhancing network, identifies regions with relevant features. Then, a pan-tilt controller guides the light beam toward these areas so as to provide information-rich images to the ego-motion estimation algorithm. Experimental results on a real robotic platform demonstrate the effectiveness of the proposed method, showing a reduction in pose estimation error of up to 75% with respect to a traditional fixed-lighting technique.
|
|
11:35-11:40, Paper ThCT17.5 | |
Intensity Triangle Descriptor Constructed from High-Resolution Spinning LiDAR Intensity Image for Loop Closure Detection |
|
Zhang, Yanfeng | Institute of Automation, Chinese Academy of Sciences |
Tian, Yunong | Institute of Automation, Chinese Academy of Sciences |
Yang, Guodong | Institute of Automation, Chinese Academy of Sciences; Beijing Zh |
Li, Zhishuo | Chinese Academy of Sciences |
Luo, Mingrui | Institute of Automation, Chinese Academy of Sciences |
Li, En | Institute of Automation, Chinese Academy of Sciences |
Jing, Fengshui | Institute of Automation, CAS |
Keywords: SLAM, Recognition, Localization
Abstract: LiDAR-based loop closure detection is a crucial part of realizing robust SLAM algorithms for intelligent vehicles with LiDAR sensors. Existing methods often reduce the keypoint dimension to encode the global descriptor, which sacrifices the freedom of loop detection and correction. Based on the 6-DOF rigid transformation property of spatial triangles, we propose an algorithm for extracting and describing 3D keypoints from high-resolution spinning LiDAR intensity images to encode triangle descriptors, termed the intensity triangle descriptor (ITD). In comparison to the direct extraction of keypoints from the point cloud, the use of image-derived feature points provides additional photometric texture information and better handles the uneven spatial density of the point cloud, which is advantageous in unstructured and geometrically degraded scenes. To enhance the stability of keypoints, the spatial positions of multi-frame image feature points are registered to a keyframe using odometry for voxel downsampling and non-maximum suppression, with the objective of reducing unstable feature points. For high discrimination, the neighboring image patches of each vertex (keypoint) are aggregated to estimate a Gaussian mixture model (GMM) as the keypoint signature. An efficient two-stage loop closure detection method is then proposed for ITD, consisting of candidate retrieval based on triangle side lengths and vertex GMMs, followed by geometric verification of matched descriptor pairs. The effectiveness of the proposed method is evaluated on the STheReO, FusionPortable, and our self-collected datasets.
|
|
11:40-11:45, Paper ThCT17.6 | |
IBTC: An Image-Assisting Binary and Triangle Combined Descriptor for Place Recognition by Fusing LiDAR and Camera Measurements |
|
Zou, Zuhao | The University of Hong Kong |
Zheng, Chunran | The University of Hong Kong |
Yuan, Chongjian | The University of Hong Kong |
Zhou, Shunbo | Huawei |
Xue, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Localization, Mapping
Abstract: In this work, we introduce a novel multimodal descriptor, the image-assisting binary and triangle combined (iBTC) descriptor, which fuses LiDAR (Light Detection and Ranging) and camera measurements for 3D place recognition. The inherent invariance of a triangle to rigid transformations inspires us to design triangle-based descriptors. We first extract distinct 3D key points from both LiDAR and camera measurements and organize them into triplets to form triangles. By utilizing the lengths of the sides of these triangles, we can create triangle descriptors, enabling the rapid retrieval of similar triangles from a database. By encoding the geometric and visual details at the triangle vertices into binary descriptors, we augment the triangle descriptors with richer local information. This enrichment process empowers our descriptors to reject mismatched triangle pairs. Consequently, the remaining matched triangle pairs yield accurate loop closure place indices and relative poses. In our experiments, we conduct a thorough comparison of our proposed method with several SOTA methods across public and self-collected datasets. The results demonstrate that our method exhibits superior performance in place recognition and overcomes the limitations associated with unimodal methods like BTC, RING++, ORB-DBoW2, and NetVLAD. Additionally, a time-cost benchmark indicates that our method's time consumption is reasonable compared with baseline methods. A demonstration video is available at https://www.youtube.com/watch?v=fe1Q0eR2fWk.
|
|
ThCT18 |
406 |
Planning under Uncertainty 2 |
Regular Session |
Chair: Yamane, Katsu | Path Robotics Inc |
Co-Chair: Gammell, Jonathan | Queen's University |
|
11:15-11:20, Paper ThCT18.1 | |
Belief Roadmaps with Uncertain Landmark Evanescence |
|
Fuentes, Erick | Massachusetts Institute of Technology |
Strader, Jared | Massachusetts Institute of Technology |
Fahnestock, Ethan | MIT |
Roy, Nicholas | Massachusetts Institute of Technology |
Keywords: Planning under Uncertainty
Abstract: We would like a robot to navigate to a goal location while minimizing state uncertainty. To aid the robot in this endeavor, maps provide a prior belief over the location of objects and regions of interest. To localize itself within the map, a robot identifies mapped landmarks using its sensors. However, as the time between map creation and robot deployment increases, portions of the map can become stale, and landmarks, once believed to be permanent, may disappear. We refer to the propensity of a landmark to disappear as landmark evanescence. Reasoning about landmark evanescence during path planning, and the associated impact on localization accuracy, requires analyzing the presence or absence of each landmark, leading to an exponential number of possible outcomes of a given motion plan. To address this complexity, we develop BRULE, an extension of the Belief Roadmap. During planning, we replace the belief over future robot poses with a Gaussian mixture which is able to capture the effects of landmark evanescence. Furthermore, we show that belief updates can be made efficient, and that maintaining a random subset of mixture components is sufficient to find high quality solutions. We demonstrate performance in simulated and real-world experiments. Software is available at https://bit.ly/BRULE.
|
|
11:20-11:25, Paper ThCT18.2 | |
Safe and Efficient Path Planning under Uncertainty Via Deep Collision Probability Fields |
|
Herrmann, Felix | Technische Universität Darmstadt |
Zach, Sebastian Bernhard | Technische Universität Darmstadt |
Banfi, Jacopo | Amazon |
Peters, Jan | Technische Universität Darmstadt |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Tateo, Davide | Technische Universität Darmstadt |
Keywords: Planning under Uncertainty, Deep Learning Methods, Motion and Path Planning
Abstract: Estimating collision probabilities between robots and environmental obstacles or other moving agents is crucial to ensure safety during path planning. This is an important building block of modern planning algorithms in many application scenarios such as autonomous driving, where noisy sensors perceive obstacles. While many approaches exist, they either provide too conservative estimates of the collision probabilities or are computationally intensive due to their sampling-based nature. To deal with these issues, we introduce Deep Collision Probability Fields, a neural-based approach for computing collision probabilities of arbitrary objects with arbitrary unimodal uncertainty distributions. Our approach relegates the computationally intensive, sampling-based estimation of collision probabilities to the training step, allowing for fast neural network inference of the constraints during planning. In extensive experiments, we show that Deep Collision Probability Fields can produce reasonably accurate collision probabilities (up to 10^{-3}) for planning and that our approach can be easily plugged into standard path planning approaches to plan safe paths on 2-D maps containing uncertain static and dynamic obstacles. Additional material, code, and videos are available at https://sites.google.com/view/ral-dcpf.
|
|
11:25-11:30, Paper ThCT18.3 | |
Safe POMDP Online Planning among Dynamic Agents Via Adaptive Conformal Prediction |
|
Sheng, Shili | University of Virginia |
Yu, Pian | University College London |
Parker, David | University of Oxford |
Kwiatkowska, Marta | University of Oxford |
Feng, Lu | University of Virginia |
Keywords: Formal Methods in Robotics and Automation, Planning under Uncertainty, Collision Avoidance
Abstract: Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that maximizes expected returns while providing probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) to quantify the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
|
|
11:30-11:35, Paper ThCT18.4 | |
Rao-Blackwellized POMDP Planning |
|
Lee, Jiho | University of Colorado Boulder |
Ahmed, Nisar | University of Colorado Boulder |
Wray, Kyle | N/a |
Sunberg, Zachary | University of Colorado |
Keywords: Planning under Uncertainty, Reinforcement Learning, Probabilistic Inference
Abstract: Partially Observable Markov Decision Processes (POMDPs) provide a structured framework for decision-making under uncertainty, but their application requires efficient belief updates. Sequential Importance Resampling Particle Filters (SIRPF), also known as Bootstrap Particle Filters, are commonly used as belief updaters in large approximate POMDP solvers, but they face challenges such as particle deprivation and high computational costs as the system's state dimension grows. To address these issues, this study introduces Rao-Blackwellized POMDP (RB-POMDP) approximate solvers and outlines generic methods to apply Rao-Blackwellization in both belief updates and online planning. We compare the performance of SIRPF and Rao-Blackwellized Particle Filters (RBPF) in a simulated localization problem where an agent navigates toward a target in a GPS-denied environment using POMCPOW and RB-POMCPOW planners. Our results not only confirm that RBPFs maintain efficient belief approximations over time with fewer particles, but also show that, more surprisingly, RBPFs combined with quadrature-based integration significantly improve planning quality compared to SIRPF-based planning under the same computational limits.
|
|
11:35-11:40, Paper ThCT18.5 | |
Nearest-Neighbourless Asymptotically Optimal Motion Planning with Fully Connected Informed Trees (FCIT*) |
|
Wilson, Tyler S. | Queen's University |
Thomason, Wil | The AI Institute |
Kingston, Zachary | Purdue University |
Kavraki, Lydia | Rice University |
Gammell, Jonathan | Queen's University |
Keywords: Motion and Path Planning, Manipulation Planning, Constrained Motion Planning
Abstract: Improving the performance of motion planning algorithms for high-degree-of-freedom robots usually requires reducing the cost or frequency of computationally expensive operations. Traditionally, and especially for asymptotically optimal sampling-based motion planners, the most expensive operations are local motion validation and querying the nearest neighbours of a configuration. Recent advances have significantly reduced the cost of motion validation by using single instruction/multiple data (SIMD) parallelism to improve solution times for satisficing motion planning problems. These advances have not yet been applied to asymptotically optimal motion planning. This paper presents Fully Connected Informed Trees (FCIT*), the first fully connected, informed, anytime almost-surely asymptotically optimal (ASAO) algorithm. FCIT* exploits the radically reduced cost of edge evaluation via SIMD parallelism to build and search fully connected graphs. This removes the need for nearest-neighbours structures, which are a dominant cost for many sampling-based motion planners, and allows it to find initial solutions faster than state-of-the-art ASAO (VAMP, OMPL) and satisficing (OMPL) algorithms on the MotionBenchMaker dataset while converging towards optimal plans in an anytime manner.
|
|
11:40-11:45, Paper ThCT18.6 | |
Efficient Path Planning in Complex Environments with Trust Region Continuous Belief Tree Search |
|
Nunez, Andre Julio | University of Technology Sydney |
Kong, Felix Honglim | The University of Technology Sydney |
González-Cantos, Alberto | Navantia |
Fitch, Robert | University of Technology Sydney |
Keywords: Constrained Motion Planning, Motion and Path Planning, Marine Robotics
Abstract: Real-world applications of path planning must contend with complicated constraint and objective functions imposed by the surrounding operational and regulatory environment. Traditional methods such as PRM* and RRT* have asymptotic guarantees, but often struggle in practice with complex black-box objective/constraint functions, especially in compute-limited situations. Continuous Belief Tree Search (CBTS) addresses these limitations by maintaining local estimates of the objective function in order to sample new nodes from continuous space, often giving high-quality solutions more quickly. However, CBTS requires careful tuning of a control duration parameter, which introduces a tradeoff between compute time and path cost/feasibility. In environments with complex costs and constraints, there may be no single control duration that gives good paths in short compute time. This paper proposes Trust Region CBTS (TR-CBTS), an extension of CBTS with an adaptive control duration parameter inspired by trust region methods. TR-CBTS adjusts the control duration based on information from recently sampled candidate nodes, allowing a longer control duration where possible to speed up computation, and shortening it when precise navigation is required in environments with complex, unknown constraint and objective functions. We show that TR-CBTS outperforms existing comparable planners for a realistic robotic path planning application in autonomous ship routing.
|
|
ThCT19 |
407 |
Active Perception |
Regular Session |
Chair: Bezzo, Nicola | University of Virginia |
Co-Chair: Lopez, Brett | University of California, Los Angeles |
|
11:15-11:20, Paper ThCT19.1 | |
PRIMER: Perception-Aware Robust Learning-Based Multiagent Trajectory Planner |
|
Kondo, Kota | Massachusetts Institute of Technology |
Tewari, Claudius Taroon | Massachusetts Institute of Technology |
Tagliabue, Andrea | Massachusetts Institute of Technology |
Tordesillas Torres, Jesus | ICAI School of Engineering, Comillas Pontifical University |
Lusk, Parker C. | Massachusetts Institute of Technology |
Peterson, Mason B. | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Imitation Learning, Aerial Systems: Applications
Abstract: In decentralized multiagent trajectory planners, agents need to communicate and exchange their positions to generate collision-free trajectories. However, due to localization errors/uncertainties, trajectory deconfliction can fail even if trajectories are perfectly shared between agents. To address this issue, we first present PARM and PARM*, perception-aware, decentralized, asynchronous multiagent trajectory planners that enable a team of agents to navigate uncertain environments while deconflicting trajectories and avoiding obstacles using perception information. PARM* differs from PARM as it is less conservative, using more computation to find closer-to-optimal solutions. While these methods achieve state-of-the-art performance, they suffer from high computational costs as they need to solve large optimization problems onboard, making it difficult for agents to replan at high rates. To overcome this challenge, we present our second key contribution, PRIMER, a learning-based planner trained with imitation learning (IL) using PARM* as the expert demonstrator. PRIMER leverages the low computational requirements of neural networks at deployment and achieves computation speeds up to 5614 times faster than optimization-based approaches.
|
|
11:20-11:25, Paper ThCT19.2 | |
HGS-Planner: Hierarchical Planning Framework for Active Scene Reconstruction Using 3D Gaussian Splatting |
|
Xu, Zijun | Fudan University |
Jin, Rui | Zhejiang University |
Wu, Ke | Fudan University |
Zhao, Yi | Fudan University |
Zhang, Zhiwei | Fudan University |
Zhao, Jieru | Shanghai Jiao Tong University |
Gao, Fei | Zhejiang University |
Gan, Zhongxue | Fudan University |
Ding, Wenchao | Fudan University |
Keywords: View Planning for SLAM, Deep Learning for Visual Perception
Abstract: In complex missions such as search and rescue, robots must make intelligent decisions in unknown environments, relying on their ability to perceive and understand their surroundings. High-quality and real-time reconstruction enhances situational awareness and is crucial for intelligent robotics. Traditional methods often struggle with poor scene representation or are too slow for real-time use. Inspired by the efficacy of 3D Gaussian Splatting (3DGS), we propose a hierarchical planning framework for fast and high-fidelity active reconstruction. Our method evaluates completion and quality gain to adaptively guide reconstruction, integrating global and local planning for efficiency. Experiments in simulated and real-world environments show our approach outperforms existing real-time methods.
|
|
11:25-11:30, Paper ThCT19.3 | |
An Active Perception Game for Robust Information Gathering |
|
He, Siming | University of Pennsylvania |
Tao, Yuezhan | University of Pennsylvania |
Spasojevic, Igor | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Chaudhari, Pratik | University of Pennsylvania |
Keywords: Mapping, Probability and Statistical Methods, Vision-Based Navigation
Abstract: Active perception approaches select future viewpoints by using some estimate of the information gain. An inaccurate estimate can be detrimental in critical situations, e.g., locating a person in distress. However, the true information gain can only be calculated post hoc, i.e., after the observation is realized. We present an approach to estimate the discrepancy between the estimated information gain (which is the expectation over putative future observations while neglecting correlations among them) and the true information gain. The key idea is to analyze the mathematical relationship between active perception and the estimation error of the information gain in a game-theoretic setting. Using this, we develop an online estimation approach that achieves sub-linear regret (in the number of time-steps) for the estimation of the true information gain and reduces the sub-optimality of active perception systems. We demonstrate our approach for active perception using a comprehensive set of experiments on: (a) different types of environments, including a quadrotor in a photorealistic simulation, real-world robotic data, and real-world experiments with ground robots exploring indoor and outdoor scenes; (b) different types of robotic perception data; and (c) different map representations. On average, our approach reduces information gain estimation errors by 42%, increases the information gain by 7%, PSNR by 5%, and semantic accuracy (measured as the number of objects that are localized correctly) by 6%. In real-world experiments with a Jackal ground robot, our approach demonstrated complex trajectories to explore occluded regions.
|
|
11:30-11:35, Paper ThCT19.4 | |
Take Your Best Shot: Sampling-Based Planning for Autonomous Photography |
|
Gao, Shijie | University of Virginia |
Bramblett, Lauren | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Vision-Based Navigation, Planning under Uncertainty, Reactive and Sensor-Based Planning
Abstract: Autonomous mobile robots (AMRs) equipped with high-quality cameras are revolutionizing the field of autonomous photography by delivering efficient and cost-effective methods for capturing dynamic visual content. As AMRs are deployed in increasingly diverse environments, the challenge of consistently producing high-quality photographic content remains. Traditional approaches often involve AMRs following a predetermined path while capturing data-intensive imagery, which can be suboptimal, especially in environments with limited connectivity or physical obstructions. These drawbacks necessitate intelligent decision-making to pinpoint optimal vantage points for image capture. Inspired by Next Best View studies, we propose a novel autonomous photography framework that enhances image quality and minimizes the number of photos needed. This framework incorporates a proposed evaluation metric that leverages ray-tracing and Gaussian process interpolation, enabling the assessment of potential visual information from the target in partially known environments. A derivative-free optimization (DFO) method is then proposed to sample candidate views and identify the optimal viewpoint. The effectiveness of our approach is demonstrated by comparing it with existing methods and further validated through simulations and experiments with various vehicles.
|
|
11:35-11:40, Paper ThCT19.5 | |
An Addendum to NeBula: Toward Extending Team CoSTAR’s Solution to Larger Scale Environments (I) |
|
Morrell, Benjamin | Jet Propulsion Laboratory, California Institute of Technology |
Otsu, Kyohei | California Institute of Technology |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Fan, David D | NASA Jet Propulsion Laboratory |
Kim, Sung-Kyun | NASA Jet Propulsion Laboratory, Caltech |
Ginting, Muhammad Fadhil | Stanford University |
Lei, Xianmei | NASA JPL |
Edlund, Jeffrey | Jet Propulsion Lab |
Fakoorian, Seyed Abolfazl | Cleveland State University |
Bouman, Amanda | Caltech |
Chavez, Fernando | Jet Propulsion Laboratory |
Kim, Taeyeon | Korea Advanced Institute of Science and Technology |
Correa, Gustavo J. | University of California Riverside |
Saboia Da Silva, Maira | NASA Jet Propulsion Laboratory |
Santamaria-Navarro, Angel | Universitat Politècnica de Catalunya |
Lopez, Brett | University of California, Los Angeles |
Kim, Boseong | Korea Advanced Institute of Science and Technology (KAIST) |
Jung, Chanyoung | KAIST |
Sobue, Mamoru | The University of Tokyo |
Peltzer, Oriana | Stanford University |
Ott, Joshua | Stanford University |
Trybula, Robert | University of Southern California |
Touma, Thomas | Caltech |
Kaufmann, Marcel | Polytechnique Montreal |
Vaquero, Tiago | JPL, Caltech |
Pailevanian, Torkom | Jet Propulsion Laboratory |
Palieri, Matteo | NASA Jet Propulsion Laboratory |
Chang, Yun | MIT |
Reinke, Andrzej | University of Bonn |
Spieler, Patrick | JPL |
Clark, Lillian | University of Southern California |
Archanian, Avak | Jet Propulsion Laboratory, California Institute of Technology |
Chen, Kenny | University of California, Los Angeles |
Melikyan, Hovhannes | Jet Propulsion Laboratory, California Institute of Technology |
Dixit, Anushri | University of California, Los Angeles |
Delecki, Harrison | Stanford University |
Pastor, Daniel | Caltech |
Ridge, Barry | NASA Jet Propulsion Laboratory, California Institute of Technology |
Marchal, Nicolas Paul | ETH Zurich |
Uribe, Jose | Jet Propulsion Laboratory |
Kochenderfer, Mykel | Stanford University |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Nikolakopoulos, George | Luleå University of Technology |
Shim, David Hyunchul | KAIST |
Carlone, Luca | Massachusetts Institute of Technology |
Burdick, Joel | California Institute of Technology |
Keywords: Field Robots, Multi-Robot Systems, Software-Hardware Integration for Robot Systems
Abstract: This article presents an appendix to the original NeBula autonomy solution developed by the Team Collaborative SubTerranean Autonomous Robots (CoSTAR), participating in the DARPA Subterranean Challenge. Specifically, this article presents extensions to NeBula’s hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithmic perspective, we discuss the following extensions to the original NeBula framework: 1) large-scale geometric and semantic environment mapping; 2) an adaptive positioning system; 3) probabilistic traversability analysis and local planning; 4) large-scale partially observable Markov decision process (POMDP)-based global motion planning and exploration behavior; 5) large-scale networking and decentralized reasoning; 6) communication-aware mission planning; and 7) multimodal ground–aerial exploration solutions. We demonstrate the application and deployment of the presented systems and solutions in various large-scale underground environments, including limestone mine exploration scenarios as well as deployment in the DARPA Subterranean challenge.
|
|
11:40-11:45, Paper ThCT19.6 | |
InstanceVO: Self-Supervised Semantic Visual Odometry by Using Metric Learning to Incorporate Geometrical Priors in Instance Objects |
|
Xie, Yuanyan | Tsinghua University |
Yang, Junzhe | University of Science and Technology Beijing |
Zhou, Huaidong | Tsinghua University |
Sun, Fuchun | Tsinghua University |
Keywords: Localization, Semantic Scene Understanding, Autonomous Agents
Abstract: Visual odometry is one of the key technologies for unmanned ground vehicles. To improve the robustness of these systems and enable intelligent tasks, researchers have introduced learning-based recognition modules into visual odometry systems, but have not achieved tight coupling between the visual odometry systems and the recognition modules. This paper proposes a self-supervised semantic visual odometry method that completes the tasks of ego-motion estimation, depth prediction, and instance segmentation with a shared encoder. Potential dynamic regions are removed and the image reconstruction loss is rectified using the instance detection results. Moreover, an instance-guided triplet loss and cross-task self-attention modules are devised to learn the geometric relationships among pixels that are implied in instance object priors. The proposed method is validated on the KITTI and ComplexUrban datasets. The experimental results show that our method outperforms baseline models in both pose estimation and depth prediction. We also discuss the efficacy of evaluation metrics for pose estimation and consider the accumulated errors of trajectories.
|
|
ThCT20 |
408 |
In-Hand Manipulation |
Regular Session |
Chair: Mason, Matthew T. | Carnegie Mellon University |
Co-Chair: Iba, Soshi | Honda Research Institute USA |
|
11:15-11:20, Paper ThCT20.1 | |
GET-Zero: Graph Embodiment Transformer for Zero-Shot Embodiment Generalization |
|
Patel, Austin | Stanford University |
Song, Shuran | Stanford University |
Keywords: Transfer Learning, Dexterous Manipulation, Multifingered Hands
Abstract: This paper introduces GET-Zero, a model architecture and training procedure for learning an embodiment-aware control policy that can immediately adapt to new hardware changes without retraining. To do so, we present Graph Embodiment Transformer (GET), a transformer model that leverages the embodiment graph connectivity as a learned structural bias in the attention mechanism. We use behavior cloning to distill demonstration data from embodiment-specific expert policies into an embodiment-aware GET model that conditions on the hardware configuration of the robot to make control decisions. We conduct a case study on a dexterous in-hand object rotation task using different configurations of a four-fingered robot hand with joints removed and with link length extensions. Using the GET model along with a self-modeling loss enables GET-Zero to zero-shot generalize to unseen variation in graph structure and link length, yielding a 20% improvement over baseline methods. All code and qualitative video results are on our project website https://get-zero-paper.github.io.
|
|
11:20-11:25, Paper ThCT20.2 | |
Proprioceptive Object Shape and Size Extraction Via In-Hand-Manipulation with a Variable Friction Robot Gripper |
|
Bodnar, Igor | Imperial College London |
Spiers, Adam | Imperial College London |
Keywords: In-Hand Manipulation, Grippers and Other End-Effectors, Force and Tactile Sensing
Abstract: Robotic manipulation tasks commonly rely on computer vision or tactile sensing to extract the physical characteristics of an object. However, this additional sensing capability adds complexity and financial cost to a robotic system. Our work investigates the inexpensive alternative of feature extraction via proprioceptive sensing. Our goal is to determine whether proprioceptive data combined with in-hand manipulation provides sufficient information to enable geometric reconstruction of object profiles. We use a newly designed 3-DOF robotic gripper with variable-friction finger surfaces to perform model-free in-hand manipulation on a set of test objects comprised of two-dimensional convex prisms. We have devised a manipulation sequence based on the rotation and sliding of test objects that allows side counting and successful measurement of shapes and sizes, with average angle and size errors of 1.64% and 6.76%, respectively. In addition, we have outlined potential research directions aimed at resolving inherent limitations of proprioceptive approaches and making our algorithm generalisable to any arbitrary shape.
|
|
11:25-11:30, Paper ThCT20.3 | |
Diffusion-Informed Probabilistic Contact Search for Multi-Finger Manipulation |
|
Kumar, Abhinav | University of Michigan |
Power, Thomas | Robotics Institute, University of Michigan |
Yang, Fan | University of Michigan |
Aguilera, Sergio | Georgia Institute of Technology |
Iba, Soshi | Honda Research Institute USA |
Soltani Zarrin, Rana | Honda Research Institute - USA |
Berenson, Dmitry | University of Michigan |
Keywords: Dexterous Manipulation, Manipulation Planning, Deep Learning in Grasping and Manipulation
Abstract: Planning contact-rich interactions for multi-finger manipulation is challenging due to the high-dimensionality and hybrid nature of dynamics. Recent advances in data-driven methods have shown promise, but are sensitive to the quality of training data. Combining learning with classical methods like trajectory optimization and search adds additional structure to the problem and domain knowledge in the form of constraints, which can lead to outperforming the data on which models are trained. We present Diffusion-Informed Probabilistic Contact Search (DIPS), which uses an A* search to plan a sequence of contact modes informed by a diffusion model. We train the diffusion model on a dataset of demonstrations consisting of contact modes and trajectories generated by a trajectory optimizer given those modes. In addition, we use a particle filter-inspired method to reason about variability in diffusion sampling arising from model error, estimating likelihoods of trajectories using a learned discriminator. We show that our method outperforms ablations that do not reason about variability and can plan contact sequences that outperform those found in training data across multiple tasks. We evaluate on simulated tabletop card sliding and screwdriver turning tasks, as well as the screwdriver task in hardware to show that our combined learning and planning approach transfers to the real world.
|
|
11:30-11:35, Paper ThCT20.4 | |
Variable-Friction In-Hand Manipulation for Arbitrary Objects Via Diffusion-Based Imitation Learning |
|
Yan, Qiyang | Imperial College London |
Ding, Zihan | Princeton University |
Zhou, Xin | Imperial College London |
Spiers, Adam | Imperial College London |
Keywords: In-Hand Manipulation, Imitation Learning, Machine Learning for Robot Control
Abstract: Dexterous in-hand manipulation (IHM) for arbitrary objects is challenging due to the rich and subtle contact process. Variable-friction manipulation is an alternative approach to dexterity, previously demonstrating robust and versatile 2D IHM capabilities with only two single-joint fingers. However, the hard-coded manipulation methods for variable friction hands are restricted to regular polygon objects and limited target poses, as well as requiring the policy to be tailored for each object. This paper proposes an end-to-end learning-based manipulation method to achieve arbitrary object manipulation for any target pose on real hardware, with minimal engineering efforts and data collection. The method features a diffusion policy-based imitation learning method with co-training from simulation and a small amount of real-world data. With the proposed framework, arbitrary objects including polygons and non-polygons can be precisely manipulated to reach arbitrary goal poses within 2 hours of training on an A100 GPU and only 1 hour of real-world data collection. The precision is higher than that of previous customized object-specific policies, achieving an average success rate of 71.3% with average pose errors of 2.676 mm and 1.902°. Code and videos can be found at: https://sites.google.com/view/vf-ihm-il/home.
|
|
11:35-11:40, Paper ThCT20.5 | |
From Simple to Complex Skills: The Case of In-Hand Object Reorientation |
|
Qi, Haozhi | UC Berkeley |
Yi, Brent | University of California, Berkeley |
Lambeta, Mike Maroje | Facebook |
Ma, Yi | University of Illinois at Urbana-Champaign |
Calandra, Roberto | TU Dresden |
Malik, Jitendra | UC Berkeley |
Keywords: In-Hand Manipulation, Dexterous Manipulation, Reinforcement Learning
Abstract: Learning policies in simulation and transferring them to the real world has become a promising approach in dexterous manipulation. However, bridging the sim-to-real gap for each new task requires substantial human effort, such as careful reward engineering, hyperparameter tuning, and system identification. In this work, we present a system that leverages low-level skills to address these challenges for more complex tasks. Specifically, we introduce a hierarchical policy for in-hand object reorientation based on previously acquired rotation skills. This hierarchical policy learns to select which low-level skill to execute based on feedback from both the environment and the low-level skill policies themselves. Compared to learning from scratch, the hierarchical policy is more robust to out-of-distribution changes and transfers easily from simulation to real-world environments. Additionally, we propose a generalizable object pose estimator that uses proprioceptive information, low-level skill predictions, and control errors as inputs to estimate the object's pose over time. We demonstrate that our system can reorient objects, including symmetrical and textureless ones, to a desired pose.
|
|
11:40-11:45, Paper ThCT20.6 | |
DROP: Dexterous Reorientation Via Online Planning |
|
Li, Albert H. | California Institute of Technology |
Culbertson, Preston | Cornell University |
Kurtz, Vincent | California Institute of Technology |
Ames, Aaron | California Institute of Technology |
Keywords: In-Hand Manipulation, Dexterous Manipulation, Manipulation Planning
Abstract: Achieving human-like dexterity is a longstanding challenge in robotics, in part due to the complexity of planning and control for contact-rich systems. In reinforcement learning (RL), one popular approach has been to use massively-parallelized, domain-randomized simulations to learn a policy offline over a vast array of contact conditions, allowing robust sim-to-real transfer. Inspired by recent advances in real-time parallel simulation, this work considers instead the viability of online planning methods for contact-rich manipulation by studying the well-known in-hand cube reorientation task. We propose a simple architecture that employs a sampling-based predictive controller and vision-based pose estimator to search for contact-rich control actions online. We conduct thorough experiments to assess the real-world performance of our method, architectural design choices, and key factors for robustness, demonstrating that our simple sampling-based approach achieves performance comparable to prior RL-based works. Supplemental material: https://caltech-amber.github.io/drop.
|
|
ThCT21 |
410 |
Safety and Control in HRI |
Regular Session |
Chair: He, Hongsheng | The University of Alabama |
Co-Chair: Kim, Wansoo | Hanyang University ERICA |
|
11:15-11:20, Paper ThCT21.1 | |
Uncertainty-Aware Probabilistic 3D Human Motion Forecasting Via Invertible Networks |
|
Ma, Yue | Beihang University |
Zhou, Kanglei | Beihang University |
Yu, Fuyang | Beihang University |
Li, Frederick W. B. | University of Durham |
Xiaohui, Liang | State Key Laboratory of Virtual Reality Technology and Systems, Beihang University |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Modeling and Simulating Humans, Safety in HRI
Abstract: 3D human motion forecasting aims to enable autonomous applications. Estimating uncertainty for each prediction (i.e., confidence based on probability density or quantile) is essential for safety-critical contexts like human-robot collaboration to minimize risks. However, existing diverse motion forecasting approaches struggle with uncertainty quantification due to implicit probabilistic representations hindering uncertainty modeling. We propose ProbHMI, which introduces invertible networks to parameterize poses in a disentangled latent space, enabling probabilistic dynamics modeling. A forecasting module then explicitly predicts future latent distributions, allowing effective uncertainty quantification. Evaluated on benchmarks, ProbHMI achieves strong performance for both deterministic and diverse prediction while validating uncertainty calibration, critical for risk-aware decision making.
|
|
11:20-11:25, Paper ThCT21.2 | |
MonLog: MONotonic-Constrained LOGistic Regressions for Automated Safety Curve Design |
|
Melone, Alessandro | Technical University of Munich |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Müller, Dirk | Department of Orthopaedics and Sports Orthopaedics, Klinikum Rechts der Isar |
Swikir, Abdalla | Mohamed Bin Zayed University of Artificial Intelligence |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Safety in HRI, Physical Human-Robot Interaction, Human-Centered Robotics
Abstract: The increasing integration of robots in close human environments necessitates robust safety measures that can adapt to evolving tasks and conditions. Current standards rely on task-specific safety evaluations that are often inflexible, requiring repeated assessments whenever task parameters change. This work proposes MonLog, a data-driven, probabilistic method to automatically derive safety curves (SCs) from recent injury protection data sets. By leveraging non-linear modeling techniques, our approach addresses the limitations of conventional linear SCs, which often result in overly conservative speed restrictions. We present a comprehensive test routine to validate our method, highlighting improvements in both compliance with safety constraints and operational efficiency. Our findings demonstrate that the proposed approach not only enhances safety but also optimizes robotic performance, making it suitable for a wide range of applications.
|
|
11:25-11:30, Paper ThCT21.3 | |
Passivity Filters for Bilateral Teleoperation with Variable Impedance Control |
|
Alyousef Almasalmah, Fadi | University of Strasbourg |
Poignonec, Thibault | University of Strasbourg, Icube Laboratory |
Omran, Hassan | ICube Laboratory, University of Strasbourg, Strasbourg |
Liu, Chao | LIRMM |
Bayle, Bernard | University of Strasbourg |
Keywords: Telerobotics and Teleoperation, Compliance and Impedance Control, Safety in HRI
Abstract: In robotic teleoperation, it is crucial to be able to dynamically adjust interactions with the environment. Drawing inspiration from human behavior during interactions, Variable Impedance Control (VIC) has been widely adopted to enhance robotic flexibility and adaptability. However, maintaining the passivity of such control systems remains a critical safety concern. This paper introduces an optimization-based framework for passive variable impedance control in bilateral teleoperation, combining the advantages of Passivity Filters (PFs), Time-Domain Passivity (TDP) control, and Passive-Set-Position-Modulation (PSPM). The method solves an optimization problem aimed at dissipating the energy that could lead to a lack of passivity. The proposed method is assessed through experiments, illustrating its ability to keep the teleoperation system passive and safe under a variable impedance profile.
|
|
11:30-11:35, Paper ThCT21.4 | |
Robots That Learn to Safely Influence Via Prediction-Informed Reach-Avoid Dynamic Games |
|
Pandya, Ravi | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Bajcsy, Andrea | Carnegie Mellon University |
Keywords: Human-Robot Collaboration, Safety in HRI, Robot Safety
Abstract: Robots can influence people to accomplish their tasks more efficiently: autonomous cars can inch forward at an intersection to pass through, and tabletop manipulators can go for an object on the table first. However, a robot's ability to influence can also compromise the physical safety of nearby people if naively executed. In this work, we pose and solve a novel robust reach-avoid dynamic game which enables robots to be maximally influential, but only when a safety backup control exists. On the human side, we model the human's behavior as goal-driven but conditioned on the robot's plan, enabling us to capture influence. On the robot side, we solve the dynamic game in the joint physical and belief space, enabling the robot to reason about how its uncertainty in human behavior will evolve over time. We instantiate our method, called SLIDE (Safely Leveraging Influence in Dynamic Environments), in a high-dimensional (39-D) simulated human-robot collaborative manipulation task solved via offline game-theoretic reinforcement learning. We compare our approach to a robust baseline that treats the human as a worst-case adversary, a safety controller that does not explicitly reason about influence, and an energy-function-based safety shield. We find that SLIDE consistently enables the robot to leverage the influence it has on the human when it is safe to do so, ultimately allowing the robot to be less conservative while still ensuring a high safety rate during task execution.
|
|
11:35-11:40, Paper ThCT21.5 | |
Multi-Layered Safety of Redundant Robot Manipulators Via Task-Oriented Planning and Control |
|
Jia, Xinyu | National University of Singapore |
Wang, Wenxin | National University of Singapore |
Yang, Jun | National University of Singapore |
Pan, Yongping | Peng Cheng Laboratory |
Yu, Haoyong | National University of Singapore |
Keywords: Safety in HRI, Collision Avoidance, Motion Control
Abstract: Ensuring safety is crucial to promote the application of robot manipulators in open workspaces. Factors such as sensor errors or unpredictable collisions make the environment full of uncertainties. In this work, we investigate these potential safety challenges on redundant robot manipulators, and propose a task-oriented planning and control framework to achieve multi-layered safety while maintaining efficient task execution. Our approach consists of two main parts: a task-oriented trajectory planner based on multiple-shooting model predictive control (MPC) method, and a torque controller that allows safe and efficient collision reaction using only proprioceptive data. Through extensive simulations and real-hardware experiments, we demonstrate that the proposed framework can effectively handle uncertain static or dynamic obstacles, and perform disturbance resistance in manipulation tasks when unforeseen contacts occur.
|
|
11:40-11:45, Paper ThCT21.6 | |
A Multi-Task Energy-Aware Impedance Controller for Enhanced Safety in Physical Human-Robot Interaction |
|
Choi, SeungMin | Hanyang University |
Ha, Seongmin | Hanyang University |
Kim, Wansoo | Hanyang University ERICA |
Keywords: Safety in HRI, Physical Human-Robot Interaction, Human-Robot Collaboration
Abstract: In physical human-robot interaction (pHRI), ensuring human safety in all tasks conducted by the robot is crucial. Traditional compliance control strategies, such as admittance and impedance control, often lead to unpredictable robot behavior due to incidents like contact loss or unexpected external forces, which can cause significant harm to humans. To overcome these limitations, this study introduces a multi-task energy-aware impedance controller for kinematically redundant robots. This controller extends the energy-aware impedance control strategy, which ensures the passivity and safety of a single task using a virtual global energy tank, to kinematically redundant robots performing multiple tasks. The proposed controller effectively regulates the power flow of all tasks performed by the robot through a single global energy tank, ensuring the safety and passivity of the tasks. Experimental results in a shared environment, where external forces are simultaneously applied to the end-effector and the third joint of the Franka Emika Panda, showed that the robot's energy and power, as well as the power of all tasks, consistently remained within predefined thresholds. Additionally, when comparing the proposed controller with a controller that does not consider null-space projection in the power regulation stage and a controller that does not regulate the robot's power, our approach effectively managed the robot's energy and power, as well as the power of all tasks, ensuring passivity and enhanced safety.
|
|
ThCT22 |
411 |
Learning for Manipulation |
Regular Session |
Chair: Zambelli, Martina | Google DeepMind |
Co-Chair: Meißner, Pascal | Wuerzburg-Schweinfurt Technical University of Applied Sciences |
|
11:15-11:20, Paper ThCT22.1 | |
A Parameter-Efficient Tuning Framework for Language-Guided Object Grounding and Robot Grasping |
|
Yu, Houjian | University of Minnesota, Twin Cities |
Li, Mingen | University of Minnesota Twin Cities |
Rezazadeh, Alireza | University of Minnesota |
Yang, Yang | Meta |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Perception for Grasping and Manipulation, Semantic Scene Understanding, Deep Learning in Grasping and Manipulation
Abstract: The language-guided robot grasping task requires a robot agent to integrate multimodal information from both visual and linguistic inputs to predict actions for target-driven grasping. While recent approaches utilizing Multimodal Large Language Models (MLLMs) have shown promising results, their extensive computation and data demands limit the feasibility of local deployment and customization. To address this, we propose a novel CLIP-based multimodal parameter-efficient tuning (PET) framework designed for three language-guided object grounding and grasping tasks: (1) Referring Expression Segmentation (RES), (2) Referring Grasp Synthesis (RGS), and (3) Referring Grasp Affordance (RGA). Our approach introduces two key innovations: a bi-directional vision-language adapter that aligns multimodal inputs for pixel-level language understanding and a depth fusion branch that incorporates geometric cues to facilitate robot grasping predictions. Experiment results demonstrate superior performance in the RES object grounding task compared with existing CLIP-based full-model tuning or PET approaches. In the RGS and RGA tasks, our model not only effectively interprets object attributes based on simple language descriptions but also shows strong potential for comprehending complex spatial reasoning scenarios, such as multiple identical objects present in the workspace.
|
|
11:20-11:25, Paper ThCT22.2 | |
Cascaded Diffusion Models for Neural Motion Planning |
|
Sharma, Mohit | Carnegie Mellon University |
Fishman, Adam | OpenAI |
Kumar, Vikash | Meta AI |
Paxton, Chris | Meta AI |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: AI-Based Methods, Deep Learning in Grasping and Manipulation
Abstract: Robots in the real world need to perceive and move to goals in complex environments without collisions. Avoiding collisions is especially difficult when relying on sensor perception and when goals are among clutter. Diffusion policies and other generative models have shown strong performance in solving local planning problems, but often struggle at avoiding all of the subtle constraint violations that characterize truly challenging global motion planning problems. In this work, we propose an approach for learning global motion planning using diffusion policies, allowing the robot to generate full trajectories through complex scenes and reasoning about multiple obstacles along the path. Our approach uses cascaded hierarchical models which unify global prediction and local refinement together with online plan repair to ensure the trajectories are collision free. Our method outperforms a wide variety of baselines by approximately 5% on challenging tasks in multiple domains including navigation and manipulation.
|
|
11:25-11:30, Paper ThCT22.3 | |
Reinforcement Learning with Lie Group Orientations for Robotics |
|
Schuck, Martin | Technical University of Munich |
Bruedigam, Jan | Technical University of Munich |
Hirche, Sandra | Technische Universität München |
Schoellig, Angela P. | TU Munich |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Grasping
Abstract: Handling orientations of robots and objects is a crucial aspect of many applications. Yet, ever so often, there is a lack of mathematical correctness when dealing with orientations, especially in learning pipelines involving, for example, artificial neural networks. In this paper, we investigate reinforcement learning with orientations and propose a simple modification of the network's input and output that adheres to the Lie group structure of orientations. As a result, we obtain a practically efficient implementation that is directly usable with existing learning libraries and achieves significantly better performance than other common orientation representations. We briefly introduce Lie theory specifically for orientations in robotics to motivate and outline our approach. Subsequently, a thorough empirical evaluation of different combinations of orientation representations for states and actions demonstrates the superior performance of our proposed approach in different scenarios, including: direct orientation control, end effector orientation control, and pick-and-place tasks.
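To make the idea above concrete, the following hedged sketch (not the paper's exact parameterization) encodes orientations as flattened rotation matrices for the network input and maps an unconstrained 3-vector action through the SO(3) exponential map, so network outputs always correspond to valid rotations.

```python
# Minimal sketch, assuming a policy that consumes orientations and emits small
# orientation increments; the specific encoding choices are illustrative only.
import numpy as np
from scipy.spatial.transform import Rotation as R

def encode_orientation(q_xyzw):
    """Network input: flattened rotation matrix, a common Lie-group-friendly choice."""
    return R.from_quat(q_xyzw).as_matrix().reshape(-1)   # shape (9,)

def apply_action(current_q_xyzw, tangent_action):
    """Network output: 3-vector in the Lie algebra, mapped back via the exp map."""
    delta = R.from_rotvec(np.asarray(tangent_action))    # exp: so(3) -> SO(3)
    return (delta * R.from_quat(current_q_xyzw)).as_quat()

obs = encode_orientation([0.0, 0.0, 0.0, 1.0])            # identity orientation
new_q = apply_action([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.1])
print(obs.shape, new_q)
```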
|
|
11:30-11:35, Paper ThCT22.4 | |
DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity |
|
Lee, Kang-Won | Dongguk University |
Qin, Yuzhe | UC San Diego |
Wang, Xiaolong | UC San Diego |
Lim, Soo-Chul | Dongguk University |
Keywords: Dexterous Manipulation, Reinforcement Learning, AI-Enabled Robotics
Abstract: The sense of touch is an essential ability for skillfully performing a variety of tasks, providing the capacity to search and manipulate objects without relying on visual information. In this paper, we introduce a multi-finger robot system designed to manipulate objects using the sense of touch, without relying on vision. For tasks that mimic daily life, the robot uses its sense of touch to manipulate randomly placed objects in the dark. The objective of this study is to enable robots to perform manipulation without vision by using tactile sensation to compensate for the information gap caused by the absence of vision, given the presence of prior information. By training the policy through reinforcement learning in simulation and transferring it to the real environment, we demonstrate that manipulation without visual input can be achieved on real robots. In addition, the experiments showcase the importance of tactile sensing in tasks performed without vision. Our project page is available at https://lee-kangwon.github.io/dextouch/
|
|
11:35-11:40, Paper ThCT22.5 | |
Catch It! Learning to Catch in Flight with Mobile Dexterous Hands |
|
Zhang, Yuanhang | Carnegie Mellon University |
Liang, Tianhai | Tsinghua University |
Chen, Zhenyang | Georgia Institute of Technology |
Ze, Yanjie | Stanford University |
Xu, Huazhe | Tsinghua University |
Keywords: Mobile Manipulation, Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: Catching objects in flight (i.e., thrown objects) is a common daily skill for humans, yet it presents a significant challenge for robots. This task requires a robot with agile and accurate motion, a large spatial workspace, and the ability to interact with diverse objects. In this paper, we build a mobile manipulator composed of a mobile base, a 6-DoF arm, and a 12-DoF dexterous hand to tackle such a challenging task. We propose a two-stage reinforcement learning framework to efficiently train a whole-body-control catching policy for this high-DoF system in simulation. The objects' throwing configurations, shapes, and sizes are randomized during training to enhance policy adaptivity to various trajectories and object characteristics in flight. The results show that our trained policy catches diverse objects with randomly thrown trajectories, at a high success rate of about 80% in simulation, with a significant improvement over the baselines. The policy trained in simulation can be directly deployed in the real world with onboard sensing and computation, which achieves catching sandbags in various shapes, randomly thrown by humans. Our project page is available at https://mobile-dex-catch.github.io/
|
|
ThCT23 |
412 |
Legged Robots |
Regular Session |
Chair: Johnson, Aaron M. | Carnegie Mellon University |
Co-Chair: Zhao, Ding | Carnegie Mellon University |
|
11:15-11:20, Paper ThCT23.1 | |
Adaptive Complexity Model Predictive Control |
|
Norby, Joseph | Apptronik |
Tajbakhsh, Ardalan | Carnegie Mellon University |
Yang, Yanhao | Oregon State University |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Optimization and Optimal Control, Legged Robots, Underactuated Robots, Dynamics
Abstract: This work introduces a formulation of model predictive control (MPC) which adaptively reasons about the complexity of the model while maintaining feasibility and stability guarantees. Existing approaches often handle computational complexity by shortening prediction horizons or simplifying models, both of which can result in instability. Inspired by related approaches in behavioral economics, motion planning, and biomechanics, our method solves MPC problems with a simple model for dynamics and constraints over regions of the horizon where such a model is feasible and a complex model where it is not. The approach leverages an interleaving of planning and execution to iteratively identify these regions, which can be safely simplified if they satisfy an exact template/anchor relationship. We show that this method does not compromise the stability and feasibility properties of the system, and measure performance in simulation experiments on a quadrupedal robot executing agile behaviors over terrains of interest. We find that this adaptive method enables more agile motion (55% increase in top speed) and expands the range of executable tasks compared to fixed-complexity implementations.
|
|
11:20-11:25, Paper ThCT23.2 | |
Benchmarking Different QP Formulations and Solvers for Dynamic Quadrupedal Walking |
|
Stark, Franek | Robotics Innovation Center, DFKI GmbH |
Middelberg, Jakob | German Research Center for Artificial Intelligence |
Mronga, Dennis | University of Bremen, German Research Center for Artificial Intelligence (DFKI) |
Vyas, Shubham | Robotics Innovation Center, DFKI GmbH |
Kirchner, Frank | University of Bremen |
Keywords: Performance Evaluation and Benchmarking, Whole-Body Motion Planning and Control, Legged Robots
Abstract: Quadratic Programs (QPs) are widely used in the control of walking robots, especially in Model Predictive Control (MPC) and Whole-Body Control (WBC). In both cases, the controller design requires the formulation of a QP and the selection of a suitable QP solver, both requiring considerable time and expertise. While computational performance benchmarks exist for QP solvers, studies comparing optimal combinations of computational hardware (HW), QP formulation, and solver performance are lacking. In this work, we compare dense and sparse QP formulations, and multiple solving methods on different HW architectures, focusing on their computational efficiency in dynamic walking of four-legged robots using MPC. We introduce the Solve Frequency per Watt (SFPW) as a performance measure to enable a cross-hardware comparison of the efficiency of QP solvers. We also benchmark different QP solvers for WBC that we use for trajectory stabilization in quadrupedal walking. As a result, this paper recommends a starting point for practitioners on the selection of QP formulations and solvers for different HW architectures in walking robots and indicates which problems deserve greater technical effort.
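For readers unfamiliar with the proposed measure, the following toy snippet illustrates one plausible reading of Solve Frequency per Watt, assuming SFPW is the number of QP solves per second divided by the average power draw; the numbers are hypothetical.

```python
# Tiny illustration only; the exact definition used in the paper may differ.
n_solves = 5000            # QP instances solved during a benchmark run
wall_time_s = 10.0         # elapsed wall-clock time in seconds
avg_power_w = 15.0         # average power draw of the compute hardware in watts

solve_frequency_hz = n_solves / wall_time_s
sfpw = solve_frequency_hz / avg_power_w
print(f"SFPW = {sfpw:.1f} solves per second per watt")
```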
|
|
11:25-11:30, Paper ThCT23.3 | |
Indoor and Outdoor Multi-Terrain Stair-Climbing Robot Design |
|
Chen, Wei-Ting | National Taiwan University |
Tsui, En-Chieh | National Taiwan University |
Yu, Wei-Shun | National Taiwan University |
Lin, Pei-Chun | National Taiwan University |
Keywords: Wheeled Robots, Legged Robots, Mechanism Design
Abstract: This paper introduces an autonomous mobile robot (IOMT) designed for indoor and outdoor multi-terrain environments. The robot features a four-wheel independent drive and steering system (4WID-4WIS), allowing it to maintain high maneuverability on smooth surfaces. Additionally, while keeping control complexity low, the IOMT addresses the challenges associated with stair climbing by providing stable pitch control, which effectively reduces the impact of stairs on the robot's posture, in particular its pitch angle. The design also incorporates a special mechanism that reduces energy consumption through a self-locking worm gear system and combines steering with shock absorption to reduce mechanism complexity. This paper not only proposes a stair-climbing strategy for the IOMT configuration but also explores the impact of various design parameters on the robot's pitch angle, ultimately validating the feasibility and development potential of the design for multi-terrain mobility.
|
|
11:30-11:35, Paper ThCT23.4 | |
WaLTER: A Wheel and Leg Tumbling Expedition Robot |
|
Jay, David | FAMU-FSU College of Engineering |
Hackett, Jacob | Florida State University |
Bosscher, Paul | Harris Corporation |
Hubicki, Christian | Florida State University |
Clark, Jonathan | Florida State University |
Keywords: Wheeled Robots, Legged Robots, Field Robots
Abstract: For effective operation in challenging outdoor environments, mobile unmanned robots face stiff and competing demands including payload capacity, driving speed, and range, as well as the ability to traverse rough terrain. To address these issues, we introduce the hybrid wheel-leg quadrupedal robot WaLTER. WaLTER utilizes a unique combination of continuously rotating distal leg joints, actuated wheels, and a roll body DOF to efficiently drive on flat ground and effectively tumble over stairs and difficult, broken terrain. We developed an intuitive teleoperation scheme and employed deep reinforcement learning as proof-of-concept control techniques for the novel morphology. To test its capabilities, we constructed a multi-body simulation in MuJoCo and a 2.1-kg physical prototype for experimentation on traversability and energy economy. Our testing demonstrated rougher terrain negotiation relative to larger-wheeled counterparts and reliable stair climbing while maintaining a 4 km range on a 24.4 Wh battery (COT: 1.21).
|
|
11:35-11:40, Paper ThCT23.5 | |
Deformable Multibody Modeling for Model Predictive Control in Legged Locomotion with Embodied Compliance |
|
Ye, Keran | University of California, Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Dynamics, Legged Robots, Compliant Joints and Mechanisms
Abstract: The paper presents a method to stabilize dynamic gait for a legged robot with embodied compliance. Our approach introduces a unified description for rigid and compliant bodies to approximate their deformation and a formulation for deformable multibody systems. We develop the centroidal composite predictive deformed inertia (CCPDI) tensor of a deformable multibody system and show how to integrate it with the standard-of-practice model predictive controller (MPC). Simulation shows that the resultant control framework can stabilize trot stepping on a quadrupedal robot with both rigid and compliant spines under the same MPC configurations. Compared to standard MPC, the developed CCPDI-enabled MPC distributes the ground reactive forces closer to the heuristics for body balance, and it is thus more likely to stabilize the gaits of the compliant robot. A parametric study shows that our method preserves some robustness within a suitable envelope of key parameter values.
|
|
11:40-11:45, Paper ThCT23.6 | |
Learning Multi-Agent Loco-Manipulation for Long-Horizon Quadrupedal Pushing |
|
Feng, Yuming | Peking University |
Hong, Chuye | Tsinghua University |
Niu, Yaru | Carnegie Mellon University |
Liu, Shiqi | Carnegie Mellon University |
Yang, Yuxiang | Google Deepmind |
Zhao, Ding | Carnegie Mellon University |
Keywords: Multi-Robot Systems, Legged Robots, Reinforcement Learning
Abstract: Recently, quadrupedal robots have achieved significant success in locomotion, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, demonstrating significant improvements over baseline approaches, with 36.0% higher success rates and a 24.5% reduction in completion time compared to the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world. The videos and code of this work can be found at: https://collaborative-mapush.github.io/.
|
|
ThLB2R |
Hall A1/A2 |
Late Breaking Results 6 |
Poster Session |
|
14:45-15:15, Paper ThLB2R.1 | |
Large Language Models As Natural Selector for Embodied Soft Robot Design |
|
Chen, Changhe | University of Michigan |
Xu, Xiaohao | University of Michigan, Ann Arbor |
Wang, Xiangdong | University of Michigan |
Huang, Xiaonan | University of Michigan |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Performance Evaluation and Benchmarking
Abstract: Designing soft robots is a complex and iterative process that demands cross-disciplinary expertise in materials science, mechanics, and control, often relying on intuition and extensive experimentation. While Large Language Models (LLMs) have demonstrated impressive reasoning abilities, their capacity to learn and apply embodied design principles—crucial for creating functional robotic systems—remains largely unexplored. This paper introduces RoboCrafter-QA, a novel benchmark to evaluate whether LLMs can learn representations of soft robot designs that effectively bridge the gap between high-level task descriptions and low-level morphological and material choices. RoboCrafter-QA leverages the EvoGym simulator to generate a diverse set of soft robot design challenges, spanning robotic locomotion, manipulation, and balancing tasks. Our experiments with state-of-the-art multi-modal LLMs reveal that while these models exhibit promising capabilities in learning design representations, they struggle with fine-grained distinctions between designs with subtle performance differences. We further demonstrate the practical utility of LLMs for robot design initialization. Our code and benchmark will be available to encourage the community to foster this exciting research direction.
|
|
14:45-15:15, Paper ThLB2R.2 | |
Toward Embedded LLM-Guided Navigation and Object Detection for Aerial Robots |
|
Suganda, Richie Ryulie | University of Houston |
Hu, Bin | University of Houston |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, AI-Enabled Robotics
Abstract: This work presents a novel framework that integrates natural language understanding with autonomous quadrotor navigation and object detection, aiming toward fully embedded, language-driven autonomy. Our long-term objective is to develop efficient and lightweight Large Language Models (LLMs) deployable on edge platforms such as the ModalAI Seeker drone. We leverage Low-Rank Adaptation (LoRA) to fine-tune a pre-trained LLaMA model on a custom dataset composed of natural language commands for exploration and object localization. The proposed system adopts a hierarchical architecture: the LLM interprets high-level language instructions and translates them into task-level goals, which are then executed via onboard modules including visual-inertial odometry (VIO)-based control, path planning, and real-time object detection. To evaluate the framework, we design a hardware-in-the-loop (HIL) testbed using the ModalAI Seeker platform, enabling closed-loop validation in realistic scenarios. In our current setup, LLM inference runs on an offboard workstation equipped with an RTX 4090 GPU, while the drone autonomously handles the perception and control stack. This poster demonstrates a proof-of-concept toward scalable, natural language-based human-robot interaction in real-world environments.
|
|
14:45-15:15, Paper ThLB2R.3 | |
Deep Learning Model Enables Prediction of Future Physiological States During Perturbed Locomotion |
|
Arvelo Rojas, Sofia | Georgia Institute of Technology |
Leestma, Jennifer | Georgia Institute of Technology |
Sawicki, Gregory | Georgia Institute of Technology |
Young, Aaron | Georgia Tech |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Human Performance Augmentation
Abstract: Falling is the leading cause of injury-related death in older adults. Exoskeletons have the potential to augment balance to reduce fall rate; however, classical control strategies are not well suited for transient movements. Recently, a deep learning-based task-agnostic controller based on the user’s physiological state was shown to decrease metabolic cost across cyclic and non-cyclic tasks, showing this controller’s ability to generalize. However, the effectiveness of this approach for perturbation recovery has not been investigated. We anticipate that the timing of joint moment-driven exoskeleton assistance will differ for approaches that aim to augment balance. Metabolic cost-reducing approaches apply joint moment assistance that is delayed relative to the user’s joint moment, while faster-than-human assistance has proven beneficial for balance augmentation. The aim of this study is to optimize a deep learning model that predicts future joint moments, enabling faster-than-human control. We trained a series of temporal convolutional networks that used wearable IMUs (pelvis, torso, and bilateral dorsal foot, shank, and thigh) to predict joint moments. We trained models that forecasted joint moments 0 ms (estimation at the current time), 40 ms, 80 ms, and 120 ms into the future. We found that model R² values were above 0.8 for most forecasts, a threshold previously shown to allow for highly controllable exoskeletons. This controller shows potential to assist perturbed locomotion.
|
|
14:45-15:15, Paper ThLB2R.4 | |
FireantV3 Geometry and Locomotion Simplifies Localization in Self-Assembled Structures |
|
Rashidioun, Mohammadali | New Jersey Institute of Technology |
Sosa, Michael | New Jersey Institute of Technology |
Swissler, Petras | New Jersey Institute of Technology |
Keywords: Swarm Robotics, Localization, Climbing Robots
Abstract: Accurate localization plays a key role in swarm robotics, especially for algorithms targeting modular and self-assembling robotic (MSR) systems, where knowing the shape of the self-assembled structure is often important. However, most methods for achieving localization in MSR systems rely on known offsets of robots within a lattice, and thus are not useful for free-form MSR systems. Work towards localizing in free-form systems has thus far required complex sensor arrays and extensive global coordination. In this work, we introduce a 3D localization approach for FireAntV3 robots that relies solely on local communication between robots. The robots rely on contact-based sensing using vibration to identify neighboring robots and communicate information. In the simulation-based investigation we present here, the first robot serves as an origin reference from which robots localize as they move about the structure while executing the ReactiveBuild algorithm. The ability to localize contacts between the three spheres of the FireAntV3 robots and the discrete nature of robot locomotion provide ample information for robots to use in a gradient descent localization algorithm. With the presented method, we find that the average absolute localization error across a 100-robot structure is no larger than two dock radii, demonstrating the utility of the presented approach.
|
|
14:45-15:15, Paper ThLB2R.5 | |
Let's Make a Splan: Risk-Aware Trajectory Optimization in a Normalized Gaussian Splat |
|
Michaux, Jonathan | University of Michigan |
Isaacson, Seth | University of Michigan |
Enninful Adu, Challen | University of Michigan |
Li, Adam | University of Michigan |
Swayampakula, Rahul Kashyap | University of Michigan, Ann Arbor |
Ewen, Parker | University of Michigan |
Rice, Sean | University of Michigan |
Skinner, Katherine | University of Michigan |
Vasudevan, Ram | University of Michigan |
Keywords: Manipulation Planning, Sensor-based Control, RGB-D Perception
Abstract: Neural Radiance Fields and Gaussian Splatting have recently transformed computer vision by enabling photo-realistic representations of complex scenes. However, they have seen limited application in real-world robotics tasks such as trajectory optimization. This is due to the difficulty in reasoning about collisions in radiance models and the computational complexity associated with operating in dense models. This paper addresses these challenges by proposing SPLANNING, a risk-aware trajectory optimizer operating in a Gaussian Splatting model. This paper first derives a method to rigorously upper-bound the probability of collision between a robot and a radiance field. Then, this paper introduces a normalized reformulation of Gaussian Splatting that enables efficient computation of this collision bound. Finally, this paper presents a method to optimize trajectories that avoid collisions in a Gaussian Splat. Experiments show that SPLANNING outperforms state-of-the-art methods in generating collision-free trajectories in cluttered environments. The proposed system is also tested on a real-world robot manipulator. A project page is available at https://roahmlab.github.io/splanning.
|
|
14:45-15:15, Paper ThLB2R.6 | |
Analyzing the Limitations of Imitation Learning for Autonomous Surgical Robot Environments |
|
Jairam, Andrew | University of Toronto |
Sun, Jinjie | University of Toronto |
Pore, Ameya | University of Toronto |
Kahrs, Lueder Alexander | University of Toronto Mississauga |
Keywords: Imitation Learning, Surgical Robotics: Laparoscopy
Abstract: As the field of supervised machine learning continues to evolve, imitation learning (IL) emerges as a promising alternative to reinforcement learning (RL) for training autonomous surgical robotic agents. In particular, Transformer-based models such as the Action Chunking Transformer (ACT) show potential to robustly clone expert demonstrations that require fine-grained manipulation while bypassing the need for bridging the simulation-to-reality gap experienced in training RL policies. This study systematically investigates the limitations and challenges associated with using ACT-based IL for learning surgically relevant tasks on a simulated and real da Vinci Research Kit (dVRK). Through a series of controlled experiments using simulated and real dVRK setups, this work evaluates the sensitivity of ACT models to environmental and architectural factors such as lighting, demonstration quality, and chunk size hyperparameter choices. Baseline accuracies of manipulation tasks are presented, along with experiments assessing model robustness under distractors and visual inconsistencies. The findings highlight the critical importance of dataset quality and lighting consistency in achieving successful behaviour cloning. While IL proves viable for certain structured tasks under controlled conditions, the study outlines clear limitations and identifies future directions to improve generalization and robustness of learned policies for real-world surgical deployment.
|
|
14:45-15:15, Paper ThLB2R.7 | |
Structured Noise for Better State-Conditioned Diffusion in Robotic Motion Planning |
|
Kim, Minji (Amelie) | Georgia Tech |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Motion and Path Planning, Probabilistic Inference
Abstract: In diffusion-based motion planning, the standard use of zero-mean isotropic Gaussian noise may overlook an opportunity to encode task-relevant structure into the generative process. We explore a simple yet effective modification: replacing the standard noise with a non-standard Gaussian distribution—specifically, a Gaussian mixture—whose mode-specific mean and covariance encode desirable properties of the trajectory, such as smoothness and state conditioning. We begin by deriving a forward diffusion formulation using general, non-standard Gaussian noise. We then derive corresponding reverse denoising formulations, both for neural-network-based and model-based diffusion models. Finally, we propose a specific design for the mean and covariance structure—realized through the Gaussian mixture components—to encode trajectory priors. When applied to robot motion planning with both types of diffusion models, our structured noise formulation improves the quality of the generated trajectories compared to the baselines using standard Gaussian noise.
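As a minimal sketch of the core idea (not the authors' implementation), the snippet below replaces standard i.i.d. Gaussian diffusion noise with a two-component Gaussian-mixture noise whose covariances encode temporal smoothness; the lengthscales, weights, and trajectory length are hypothetical.

```python
# Structured noise for forward diffusion: sample from a Gaussian mixture whose
# components differ in how smooth (temporally correlated) the noise is.
import numpy as np

T = 32                                   # trajectory length (waypoints)
t = np.arange(T)

def rbf_cov(lengthscale, var=1.0):
    # Smoothness-inducing covariance over time steps.
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-6 * np.eye(T)

# Two mixture modes: smoother vs. more agile trajectory noise.
weights = np.array([0.6, 0.4])
means = [np.zeros(T), np.zeros(T)]
covs = [rbf_cov(6.0), rbf_cov(2.0)]

def sample_structured_noise(rng):
    k = rng.choice(len(weights), p=weights)
    return rng.multivariate_normal(means[k], covs[k])

rng = np.random.default_rng(0)
eps = sample_structured_noise(rng)       # used in place of N(0, I) in forward diffusion
print(eps.shape)
```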
|
|
14:45-15:15, Paper ThLB2R.8 | |
Task-Dependent Grasp Metric with Object and Manipulator Dynamics |
|
Patankar, Aditya | Stony Brook University |
Mahalingam, Dasharadhan | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
Keywords: Grasping, Manipulation Planning
Abstract: Grasp evaluation is a key aspect of many grasp planning algorithms. This paper introduces a novel optimization framework as a second-order cone program (SOCP) for evaluating a grasp's ability to perform the task. We formalize the notion of a manipulation task as a constant screw motion (or a sequence of constant screw motions) to be applied to the object after grasping. Our proposed task-dependent grasp metric evaluates a grasp's ability to generate force/moment to impart the desired constant screw motion. A key distinguishing feature of our grasp metric is its ability to incorporate the manipulator and object dynamics. It also considers nonlinear friction cone constraints at the object-robot and the object-environment contact. Since our proposed quality measure is an SOCP, it can be solved optimally using interior point methods. We present simulation results for the task of pivoting, showing the effectiveness of our approach.
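The following hedged example (not the paper's formulation) shows what a small second-order cone program of this flavor looks like: two planar contacts with Coulomb friction cones must produce a required wrench, and the minimal contact-force effort serves as a crude quality value. The grasp geometry and wrench are hypothetical, and the paper's metric additionally accounts for manipulator and object dynamics. Requires `cvxpy`.

```python
# Toy SOCP: can a two-contact planar grasp produce a required wrench within
# friction cones, and at what force cost?
import numpy as np
import cvxpy as cp

mu = 0.5                                   # friction coefficient
# Columns of G map local contact forces (ft1, fn1, ft2, fn2) to the planar
# wrench (fx, fy, tau): contact 1 presses up at (0, -0.5), contact 2 presses
# down at (0, 0.5).
G = np.array([
    [1.0, 0.0,  1.0,  0.0],
    [0.0, 1.0,  0.0, -1.0],
    [0.5, 0.0, -0.5,  0.0],
])
w_req = np.array([0.0, 1.0, 0.2])          # required task wrench (fx, fy, tau)

f = cp.Variable(4)                         # local contact forces
constraints = [
    G @ f == w_req,                        # wrench balance
    cp.SOC(mu * f[1], f[0:1]),             # |ft1| <= mu * fn1
    cp.SOC(mu * f[3], f[2:3]),             # |ft2| <= mu * fn2
]
prob = cp.Problem(cp.Minimize(cp.sum_squares(f)), constraints)
prob.solve()
print("grasp quality proxy (smaller forces = better):", prob.value)
```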
|
|
14:45-15:15, Paper ThLB2R.9 | |
STITCHER: Real-Time Trajectory Planning with Motion Primitive Search |
|
Levy, Helene J. | University of California, Los Angeles |
Lopez, Brett | University of California, Los Angeles |
Keywords: Motion and Path Planning, Aerial Systems: Perception and Autonomy, Collision Avoidance
Abstract: Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy various constraints can be systematically computed. However, meeting computation time constraints and the potential for numerical instabilities can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework that stitches short trajectory segments together with graph search to compute long range, expressive, and near-optimal trajectories in real-time. Our STITCHER algorithm is shown to outperform modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is conducted to analyze the algorithmic components that make up STITCHER, and a thorough comparison with two state-of-the-art optimization planners is performed. It is shown STITCHER can generate trajectories through complex environments over long distances (tens of meters) with low computation times (milliseconds).
|
|
14:45-15:15, Paper ThLB2R.10 | |
STATE-NAV: Stability-Aware Traversability Estimation for Bipedal Navigation Over Rough Terrain |
|
Yoon, Ziwon | Georgia Institute of Technology |
Zhu, Yunzhou | University of Edinburgh |
Gan, Lu | Georgia Institute of Technology |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Motion and Path Planning
Abstract: Bipedal robots have advantages in human-centered environments but face greater failure risk compared to more stable morphologies such as wheeled or quadrupedal robots. While learning-based traversability has been widely explored for these platforms, bipedal traversability has largely been addressed using heuristic methods, with limited consideration for instability on rough terrain. In this work, we present the first learning-based traversability estimation and safe navigation framework tailored to bipedal robots operating in diverse, uneven environments. We train our transformer-based neural network to predict bipedal locomotion instability in a risk-sensitive manner, enabling safer locomotion. Based on the predicted instability, we define traversability as stability-aware command velocity—the fastest velocity that can keep instability below a user-defined limit. This velocity-based traversability is integrated into a hierarchical planner, combining RRT* for time-efficient global path planning with MPC for safe execution. We validate our method in MuJoCo simulation, demonstrating improved safety and navigation performance, with enhanced robustness across varying terrains compared to traditional baselines.
|
|
14:45-15:15, Paper ThLB2R.11 | |
Congestion Mitigation for Foraging Robot Swarms Using Trajectory Analysis and Adaptive Spiral Path Strategies |
|
Gonzalez, Arturo | University of Texas at Rio Grande Valley |
Trevino, Artemisa | University of Texas Rio Grande Valley |
Lu, Qi | The University of Texas Rio Grande Valley |
Keywords: Swarm Robotics, Multi-Robot Systems, Biologically-Inspired Robots
Abstract: As foraging robot swarms grow in size, congestion and inter-robot collisions become critical challenges, especially in high-traffic areas like the resource collection zone where robots drop off collected resources. This work aims to mitigate congestion through trajectory analysis and adaptive spiral path strategies. The focus was to enhance resource collection efficiency while reducing inter-robot collisions. A data-driven method was developed using 112 simulated robot trajectories, which were analyzed to extract key movement features. These features fed into a logistic regression model that detects congestion with approximately 93% accuracy, enabling real-time behavior adjustments. Initial results demonstrate substantial improvements through the use of adaptive spiral paths, like square and circular spirals, where robots enter these paths when congestion is detected. Resource delivery rates increased by over 120%, and collision time decreased by 21%, validating the method’s effectiveness. This research underscores the potential of dynamic behavior modification to improve scalability and coordination in swarm robotics.
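As an illustrative sketch of the detection step only (not the authors' pipeline), the snippet below fits a logistic-regression congestion classifier to simple per-trajectory features; the feature names and synthetic data are hypothetical stand-ins for the 112 analyzed trajectories.

```python
# Congestion detection from trajectory features with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 112
# Hypothetical per-trajectory features: mean speed, stop fraction, turn rate.
X = rng.normal(size=(n, 3))
# Synthetic label: slow, stop-and-go, high-turn trajectories look congested.
y = ((-X[:, 0] + X[:, 1] + 0.5 * X[:, 2]) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))

# At run time, a robot flagged as congested would switch to a spiral exit path.
p_congested = clf.predict_proba(X_te[:1])[0, 1]
```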
|
|
14:45-15:15, Paper ThLB2R.12 | |
Vision-Language Action Models for Autonomous Liver Ultrasound: A Comparative Study of π0 and Transformer-Based Policies |
|
McLeod, Angus Jonathan | NVIDIA |
Nelson, Nigel | NVIDIA |
Varvak, Peter | Eonite Perception Inc |
Simson, Walter | Technical University Munich |
Moghani, Masoud | University of Toronto |
Hom, Wendell | Georgia Institute of Technology |
Ofir, Maximilian | NVIDIA |
Alle, Sachidanand | NVIDIA |
Johannsmeier, Lars | Franka Robotics GmbH |
Azizian, Mahdi | Nvidia |
Keywords: Medical Robots and Systems, Imitation Learning
Abstract: Building on advancements in Vision Language Models, researchers have adapted these into Vision Language Action (VLA) models to achieve state-of-the-art results on various robotic manipulation benchmarks. We present the application of the recently released π0 model pretrained on 10,000 hours of robotic data and fine-tune it to perform a cursory survey of the liver. As a baseline, we compare against trained Action Chunking Transformers (ACT). Both methods were trained on 175 ultrasound sweeps representing approximately an hour of data acquired on a tissue mimicking phantom. The methods were then evaluated based on their ability to achieve targeted views during initial contact as well as during the middle and end of the scan. The π0 policy had a higher success rate in acquiring each of these three views. The most challenging target was the initial contact where π0 was successful 47% of the time, compared to only 7% with ACT. The success rate for π0 and ACT was 67% vs 33% and 60% vs 33% for the middle and end scan views respectively. These preliminary results show the potential value of foundational VLA models in autonomous ultrasound acquisitions, an area characterized by many highly complex tasks and where obtaining large numbers of expert demonstrations is difficult. While imitation learning has been applied to robotic ultrasound acquisitions before, to our knowledge, this work represents the first use of VLA in robotic ultrasound.
|
|
14:45-15:15, Paper ThLB2R.13 | |
Open-PAV: An Open Platform Designed to Facilitate Data, Model, and Simulation of Product AV |
|
Ma, Ke | University of Wisconsin-Madison |
Zhou, Hang | University of Wisconsin-Madison |
Li, Xiaopeng | University of Wisconsin-Madison |
Keywords: Data Sets for Robot Learning, Autonomous Vehicle Navigation
Abstract: Open-PAV (Open Product Automated Vehicle) is an open platform designed to facilitate data collection, model calibration, and simulation of product automated vehicle behaviors. It integrates diverse datasets and calibrated vehicle models, making it an essential tool for researchers and developers aiming to study product automated vehicle (PAV) dynamics and their impacts. The project encourages contributions from the research community and provides ready-to-use model parameters for seamless integration with simulation tools.
|
|
14:45-15:15, Paper ThLB2R.14 | |
Learning Precise Robot Motion from Demonstration with Constraint-Aware Refinement |
|
Joo, Sungmoon | Korea Atomic Energy Research Institute |
Keywords: Learning from Demonstration, Imitation Learning, Deep Learning Methods
Abstract: We present a modular system for generating precise robot motion from human demonstrations with constraint-aware refinement. Targeting flexible, precision assembly tasks in small-to-medium factories, our system addresses key challenges in motion planning, data collection, and execution reliability. Demonstrations are captured in real time using a portable teaching interface equipped with a GoPro camera, HTC Vive tracking, and encoder-based gripper sensing, supporting physical, simulated, and real-robot teleoperation modes. Motion generation follows a two-step approach: (1) learning an initial trajectory using GMM/GMR or a diffusion policy, and (2) refining it via constrained nonlinear optimization that ensures goal accuracy and adherence to robot limits. Compared to GMM/GMR, diffusion better captures multi-modal behavior and generalizes to unseen conditions. The refinement stage significantly improves precision and constraint satisfaction. Experiments show that the GMM/GMR method learned only unimodal behavior and failed on test cases far from the demonstrations. The optimization-based refinement improved the success rate but did not guarantee success for all test cases. The diffusion model, however, learned multimodal behavior, and the refinement ensured success in all test cases, highlighting the system’s effectiveness for generalizable, high-precision robotic tasks.
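A minimal sketch of the refinement step only, under simplifying assumptions (a 1-DoF toy joint with hypothetical limits): the learned trajectory is nudged so it reaches the goal exactly and respects bounds while staying close to the demonstration prior.

```python
# Constraint-aware refinement of a learned trajectory via SLSQP.
import numpy as np
from scipy.optimize import minimize

T = 20
learned = np.linspace(0.0, 0.9, T)          # stand-in for a GMR/diffusion output
goal = 1.0                                  # required final joint position
lo, hi = -1.2, 1.2                          # hypothetical joint limits

def objective(q):
    return np.sum((q - learned) ** 2)       # stay close to the demonstration prior

constraints = [{"type": "eq", "fun": lambda q: q[-1] - goal}]   # exact goal reach
bounds = [(lo, hi)] * T

res = minimize(objective, learned, method="SLSQP",
               bounds=bounds, constraints=constraints)
refined = res.x
print("final position:", refined[-1])
```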
|
|
14:45-15:15, Paper ThLB2R.15 | |
Understanding Human Fall from Standing Via Whole-Body Musculoskeletal Simulation |
|
Ma, Chengtian | Tsinghua University |
Wei, Yunyue | Tsinghua University |
Zuo, Chenhui | Tsinghua University |
Zhang, Chen | Tsinghua University |
Sui, Yanan | Tsinghua University |
Keywords: Modeling and Simulating Humans, Body Balancing, Robotics and Automation in Life Sciences
Abstract: Understanding human balancing and falling is crucial for advancing the design of assistive robots and human-like bipedal robots. This study uses a high-dimensional musculoskeletal model with a comprehensive set of skeletal muscles to simulate the transition from standing to falling. We investigate the intrinsic temporal dynamics, motor responses, and the distribution of related metrics across the human body during loss of balance. Through large-scale simulations, we identify a central stability region in front of the feet where center-of-mass trajectories converge, and demonstrate that backward and lateral leaning are associated with increased fall risk. Under fall-prone conditions, such as muscle weakness and neural control latency, we observe shorter fall durations and more consistent fall patterns. Contact analysis reveals frequent collisions at the pelvis, thigh, and head, consistent with clinical observations of hip and head injuries. These findings contribute to a more quantitative understanding of human balancing and falling, and provide insights for the design of bio-inspired control strategies and fall prevention systems.
|
|
14:45-15:15, Paper ThLB2R.16 | |
Spin Swimmer: A Fast, Efficient and Agile Fish-Like Robot |
|
Chivkula, Prashanth | Clemson University |
Tallapragada, Phanindra | Clemson University |
Keywords: Marine Robotics, Biologically-Inspired Robots, Mechanism Design
Abstract: Underwater roboticists have long aimed to replicate the remarkable swimming capabilities of fish, combining speed, efficiency, and agility. To achieve this, engineers and roboticists have focused their efforts on the design of soft or articulated multi-body tails that can oscillate or undulate at frequencies and amplitudes similar to those of the fish they seek to mimic. Such kinematic approaches do not account for the complex interaction of the efficiency of actuation, dynamic response of flexible appendages, and hydrodynamic forces. This work presents a fundamentally novel means of mechanical actuation: a fast-spinning unbalanced rotor internal to the body of the robot that transfers a periodic axial force to an otherwise passive flexible tail. The net result is that the tail acts as a parametric oscillator that undergoes a 2:1 subharmonic resonance. High tail-beat frequencies are achieved with minimal input power due to this parametric resonance. The resulting robot has the lowest cost of transport among free-swimming robots while also being fast, extremely agile, and gyroscopically roll- and pitch-stable. The results demonstrate the importance of exploiting parametric resonances in designing efficient fish-like robots.
|
|
14:45-15:15, Paper ThLB2R.17 | |
Multi-Sensor Wearable Systems for Proactive Fall Risk Assessment |
|
Im, Nathaniel | Portsmouth Abbey School |
Keywords: Whole-Body Motion Planning and Control
Abstract: This study presents an AI-based wearable system for proactive fall risk prediction in older adults. The system integrates IMUs and pressure sensors at key anatomical locations to continuously monitor gait patterns. Using temporal convolutional networks (TCNs) and LSTM neural networks, the system analyzes spatiotemporal features to predict fall risk. Cross-dataset validation demonstrated 91% mean accuracy (±3%) with an AUC of 0.92, while a pilot study with 20 older adults showed 88% accuracy. The TCN model achieved 89% sensitivity and 93% specificity, with a 30% reduction in false alarms compared to single-sensor systems. This technology enables early detection of fall risk indicators weeks or months before incidents occur, allowing for timely interventions that can improve health outcomes and extend independence for older adults, ultimately reducing healthcare costs associated with fall-related injuries.
|
|
14:45-15:15, Paper ThLB2R.18 | |
Myoelectric Spatiotemporal Fusion for Gesture Recognition in Human-Robot Interaction |
|
Ying, Zhenzhi | The University of Tokyo |
Li, Jiaxuan | Dalian University of Technology |
Zhao, Yuxin | Dalian University of Technology |
Zhang, Xianyu | The University of Tokyo |
Li, Shihao | The University of Tokyo |
Sugita, Naohiko | The University of Tokyo |
Shu, Liming | Dalian University of Technology |
Keywords: Gesture, Posture and Facial Expressions, Prosthetics and Exoskeletons, Physical Human-Robot Interaction
Abstract: The application of gesture recognition in human-robot interaction (HRI) demands high levels of robustness and accuracy, while controlling model computational speed and memory usage. Currently, deep learning-based methods for muscle activity feature extraction fail to effectively capture spatiotemporal information from electromyography (EMG) signals, potentially leading to the curse of dimensionality. This paper introduces a novel temporal and cross-channel convolutional neural network (TCCNN) algorithm to process EMG signals and enhance the accuracy and robustness of gesture recognition. TCCNN utilizes a circular concatenation and cross-channel learning approach to extract EMG data correlations between muscle fibers and their adjacent fibers. By integrating parallel and serial processing methods to obtain fused spatiotemporal features, the proposed algorithm improves feature extraction capabilities. In offline experiments containing 17 gestures with 10 subjects, TCCNN achieved a gesture recognition accuracy of 95.21% and significantly outperformed baseline models in recall rates. Online tests further confirmed the system's low latency and natural interaction quality, demonstrating its suitability for real-world HRI applications.
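The following hedged sketch (not the released TCCNN) illustrates the circular cross-channel idea: because EMG electrodes form a ring around the forearm, the electrode axis is circularly padded before a 2D convolution over (electrode, time), with a parallel temporal branch convolving over time only; all shapes and layer sizes are hypothetical.

```python
# Parallel spatial (cross-channel, circular) and temporal branches whose
# outputs are fused into spatiotemporal features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossChannelBlock(nn.Module):
    def __init__(self, out_ch=16):
        super().__init__()
        self.conv = nn.Conv2d(1, out_ch, kernel_size=(3, 5), padding=(0, 2))

    def forward(self, x):                  # x: (batch, 1, electrodes, time)
        x = F.pad(x, (0, 0, 1, 1), mode="circular")   # wrap the electrode ring
        return torch.relu(self.conv(x))

class TemporalBlock(nn.Module):
    def __init__(self, electrodes=8, out_ch=16):
        super().__init__()
        self.conv = nn.Conv1d(electrodes, out_ch, kernel_size=5, padding=2)

    def forward(self, x):                  # x: (batch, electrodes, time)
        return torch.relu(self.conv(x))

emg = torch.randn(4, 8, 200)               # 4 windows, 8 electrodes, 200 samples
spatial = CrossChannelBlock()(emg.unsqueeze(1))        # (4, 16, 8, 200)
temporal = TemporalBlock()(emg)                        # (4, 16, 200)
fused = torch.cat([spatial.mean(dim=2), temporal], dim=1)   # fused features
print(fused.shape)                                     # (4, 32, 200)
```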
|
|
14:45-15:15, Paper ThLB2R.19 | |
Integration of Vision, Language, and Video-Based Learning for Robotic Tasks |
|
Ganatra, Shyam | Arizona State University |
Jeong, Heejin | Arizona State University |
Keywords: Visual Learning, Computer Vision for Automation, Human Factors and Human-in-the-Loop
Abstract: This research presents a novel framework for integrating vision, language, and video-based learning to enhance robotic task execution in dynamic environments. Current robotics systems struggle with generalizing task execution across different scenarios due to limitations in multimodal learning capabilities. Our approach addresses this challenge by developing a comprehensive architecture that enables robots to learn task representations from diverse input modalities autonomously. The proposed framework consists of three primary components: (1) a Vision-Language Model incorporating object detection, semantic segmentation, and visual question answering for scene understanding; (2) a Video Understanding Module extracting action sequences, object interactions, and workflows from demonstrations; and (3) a Task Representation Framework synthesizing multimodal inputs into unified representations for planning. Large Language Models provide reasoning capabilities to generate adaptive task plans, while reinforcement learning optimizes execution based on simulation and real-world feedback. Our experimental methodology includes system validation using the Reachy humanoid robot with camera and microphone inputs, as well as human-robot interaction studies to evaluate adaptation to user feedback. The system is implemented using PyTorch, OpenCV, and ROS2, and once the models are trained it will be evaluated in both Isaac Sim environments and real-world scenarios.
|
|
14:45-15:15, Paper ThLB2R.20 | |
Gaussian Mixture Koopman Kalman Filter: Application to a Closed Koopman System |
|
Van Heck, Cedric | UGent - University of Ghent |
Coene, Annelies | Ghent University |
Crevecoeur, Guillaume | Ghent University |
Keywords: Probabilistic Inference, Probability and Statistical Methods, Dynamics
Abstract: This work extends the Koopman Kalman Filter (KKF), a method that leverages Koopman operator theory to represent nonlinear dynamical systems in a lifted space with linear evolution. By modeling the extended state as a joint Gaussian distribution, KKF enables the use of classical linear filtering tools for nonlinear estimation. Building on this foundation, we explore richer probabilistic representations within the same Koopman-based framework. In particular, we introduce a Gaussian Mixture Model variant (GMMKKF), where the extended state is modeled as a mixture of Gaussians. Each component evolves under globally linear models, enabling efficient inference in systems with significant nonlinearity. Compared to traditional approaches like the Extended Kalman Filter—which relies on local linearizations and unimodal Gaussian beliefs—our method captures multimodal uncertainty for nonlinear evolving systems, while retaining computational tractability. This filtering perspective offers a principled and scalable approach to nonlinear state estimation.
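As a rough illustration of the filtering structure this abstract describes, the sketch below runs one predict/update cycle in which every Gaussian component of the lifted state evolves under the same linear Koopman operator and is corrected by a standard Kalman update; the lifted operator A, noise covariances Q and R, observation model H, and all variable names are assumptions for illustration, not the paper's implementation.

import numpy as np

def gmm_kkf_step(weights, means, covs, A, Q, H, R, z):
    """One predict/update cycle of a Gaussian-mixture Kalman filter in the
    lifted (Koopman) space: each component evolves under the shared linear
    operator A, is corrected with measurement z, and is re-weighted by its
    measurement likelihood."""
    new_w, new_m, new_P = [], [], []
    for w, m, P in zip(weights, means, covs):
        # Predict in the lifted space: globally linear dynamics.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Standard Kalman update.
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        innov = z - H @ m_pred
        new_m.append(m_pred + K @ innov)
        new_P.append((np.eye(len(m)) - K @ H) @ P_pred)
        # Re-weight by the Gaussian measurement likelihood.
        lik = np.exp(-0.5 * innov @ np.linalg.solve(S, innov)) / np.sqrt(np.linalg.det(2 * np.pi * S))
        new_w.append(w * lik)
    new_w = np.array(new_w)
    return new_w / new_w.sum(), new_m, new_P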
|
|
14:45-15:15, Paper ThLB2R.21 | |
Infrastructure Inspection with Robots and 3D Reconstruction Methods: A Comparative Study of Computer Vision Algorithms |
|
Ramírez Vázquez, Hortencia Alejandra | Tecnologico De Monterrey |
Pacheco Ramírez, Max | Tecnologico De Monterrey |
Salcedo Vazquez, Andrea Marisol | Tecnológico De Monterrey |
Ceron Lopez, Arturo Eduardo | Tecnologico De Monterrey |
Keywords: Field Robots, Environment Monitoring and Management, Robotics in Hazardous Fields
Abstract: This study evaluates the feasibility of using 3D reconstruction methods based on Structure-from-Motion (SfM), such as 3D Meshing and 3D Gaussian Splatting, as complementary tools for surface inspection in industrial and civil infrastructure settings. Our focus is on detecting small-scale and fine defects (e.g. cracks, scuffs, edge chips) with high accuracy. Our approach consists of taking several pictures of an object using a single RGB camera mounted on an Omnidirectional Mobile Robot. Then, we benchmarked the defect representation accuracy of the 3D reconstruction methods through a set of feature detection methods based on Computer Vision (CV), covering both Classical CV (e.g. Canny edge detection, contour detection, etc.) as well as Modern CV (e.g. Class-Activation Map (CAM) based on CNNs).
|
|
14:45-15:15, Paper ThLB2R.22 | |
Flow Matching Architecture for Navigation |
|
Hernandez, Eduardo | Tecnologico De Monterrey |
Diaz, Alan Ulises | Tecnologico De Monterrey |
Ceron Lopez, Arturo Eduardo | Tecnologico De Monterrey |
Keywords: Vision-Based Navigation, Planning under Uncertainty, Motion and Path Planning
Abstract: This work explores a multimodal approach for generating instantaneous, collision-free trajectories for robotic navigation in novel environments. By utilizing RGB and sparse point cloud data, we aim to enhance environmental awareness and navigation performance. Our focus is on developing a model that maintains robust performance across unseen settings by integrating both Efficient Channel Attention (ECA) and Transformer-based bottleneck models. Initial results demonstrate the potential of fusing RGB and point cloud modalities for effective navigation. The model is initially validated in the PushT simulation environment, lowering the number of parameters of existing baselines in typical Diffusion Policies. However, the absence of a joint multi-modal encoder for native fusion of raw modalities presents a key limitation. Future work will address this by integrating such a component and incorporating online reinforcement learning to enable real-time trajectory correction. We plan to validate our approach through deployment on a modified Shuttle Personnel Carrier EG6088K, equipped with a Velodyne HDL-32 and a Multisense S21 camera, in a dynamic outdoor environment.
|
|
14:45-15:15, Paper ThLB2R.23 | |
Generalized Gimbal Construction: Algorithmic Design of Kinematic Chains with No Self-Collision in Any Configuration |
|
Feshbach, Daniel | University of Pennsylvania |
Schaumburg, Emil | University of Pennsylvania |
Chen, Wei-Hsi | University of Pennsylvania |
Sung, Cynthia | University of Pennsylvania |
Keywords: Kinematics, Computational Geometry, Motion and Path Planning
Abstract: We present a linear-time design algorithm mapping any sequence of axes of motion to a kinematic chain design implementing those axes, arranged such that no configuration has self-collision. Specifically, the algorithm is responsible for placing each joint in a specific pose on its axis of motion and finding link shapes connecting them. The core idea generalizes the structure of gimbals to deal with axes of motion that do not all intersect at a point: the algorithm maintains a bounding sphere about everything generated so far, centered on the previous axis of motion, then routes outside this sphere to place and connect to the next joint. The algorithm is enabled by abstracting mechanism thickness as a bounding radius, and thus link shapes as tubes defined by their centerline paths. Since a tube cannot bend more tightly than its own radius, the link design problem becomes a Dubins planning problem. Proof of the algorithm provides justification for using this tubular abstraction in computational design of linkages, showing that restricting the design space in this way makes it tractable to explore but still fully kinematically expressive. The algorithm is also useful as an initialization to be further optimized for compactness or other properties. We have a preliminary approach to sequentially re-arrange joints (preserving their axes of motion) and link waypoints to minimize chain length while maintaining self-collision avoidance in the neutral configuration.
|
|
14:45-15:15, Paper ThLB2R.24 | |
Towards Mobile Robotic Optical Coherence Tomography for Practical Clinical Imaging |
|
Pan, Haochi | University of Michigan |
Zhou, Genggeng | Stanford University |
Staudinger, Samantha | University of Michigan |
Liu, Jiawei | University of Michigan |
Fleifil, Salma | University of Michigan |
Jin, Catherine | University of Michigan |
Valikodath, Nita | University of Michigan |
Draelos, Mark | University of Michigan |
Keywords: Medical Robots and Systems
Abstract: Optical coherence tomography (OCT) is an indispensable technology in ophthalmology for diagnosing and managing eye disease, but it requires patients who can sit upright and participate in imaging. To overcome this barrier, we introduce a mobile and motion-tolerant robotic OCT system that is capable of versatile use in both outpatient and inpatient clinical environments and is suitable for imaging in diverse clinical configurations. Our system includes a robot arm and a vertical lift, and is equipped with real-time nested MPPI motion planning algorithms for face and pupil movement tracking and obstacle collision avoidance during the imaging process. To enhance motion tolerance, we implement model-free, autoregressive-filter-based active cancellation of eye movement, in which motion-cancellation aiming offsets are derived from filter predictions instead of relying on the last observed eye position. We validate the system's workspace, dynamic tracking, and obstacle avoidance capabilities and then evaluate the effectiveness of predictive cancellation using a retinal eye phantom on a motorized stage to simulate jerk nystagmus. Predictive scan aiming stabilized repeated B-scans and retained contrast within retinal structures, corresponding to higher mean brightness, while residual motion was observed with traditional scan aiming. The combination of these results enables us to move toward clinical deployment in the near future.
|
|
14:45-15:15, Paper ThLB2R.25 | |
RKHS Gaussian Splatting SLAM |
|
Wu, Junzhe | University of Michigan |
Zhang, Ray | University of Michigan |
Ghaffari, Maani | University of Michigan |
Keywords: SLAM, Mapping
Abstract: Recent advancements in 3D Gaussian Splatting (3D-GS) have enabled visually appealing and computationally efficient scene representations for robotics applications. However, while these approaches deliver high visual fidelity, they often fall short in preserving the geometric accuracy required for robust Simultaneous Localization and Mapping (SLAM), particularly in tasks such as manipulation and autonomous driving. In this work, we introduce RKHS GS SLAM, a novel RGBD SLAM system that leverages a 3D-GS-based non-correspondence registration algorithm, augmented with a robust loop closure module and global pose optimization framework. Unlike conventional point cloud registration methods, our approach integrates covariance and feature information inherent to the 3D-GS representation, thereby preserving both visual and geometric properties even when employing a sparser mapping strategy. Extensive experiments on simulated and real-world datasets reveal that our sparse 3D-GS mapping retains most of the performance of dense mappings while reducing the number of Gaussians by 80%. Furthermore, our registration algorithm demonstrates improved tracking accuracy when compared to traditional point cloud registration and rendering-based pose optimization techniques. These results highlight the potential of 3D-GS for efficient and precise SLAM in complex robotic applications.
|
|
14:45-15:15, Paper ThLB2R.26 | |
Learning PID Gains for Planar Tracking of Bio-Inspired Swimming Robots Using Simplified Models |
|
Loya, Kartik | Clemson University |
Chivkula, Prashanth | Clemson University |
Tallapragada, Phanindra | Clemson University |
Keywords: Incremental Learning, Reinforcement Learning, Biologically-Inspired Robots
Abstract: The rising demand for autonomous systems capable of operating in complex marine environments has led to growing interest in underwater and bio-inspired fish-like robots, with applications in environmental monitoring, search and rescue, and offshore inspection. However, accurately modeling and controlling these robots remains a major challenge due to nonlinear fluid-structure interactions, high degrees of freedom, and the unsteady nature of underwater dynamics. To address these complexities, reduced-order models that capture the essential system dynamics offer a promising balance between accuracy and computational efficiency for real-time control. While kinematic models have been widely used in fish-like robots, dynamic models remain relatively scarce, particularly in the context of control. In this work, we develop a reduced-order dynamic model based on a modified Chaplygin Sleigh tailored to an underactuated, internally actuated fish-like robot driven by a reaction wheel. This model serves as the basis for a control framework in which we use data-driven methods and reinforcement learning to tune control gains in order to track a planar trajectory in an efficient and optimal manner. The approach is then validated through implementation and testing on the robotic platform.
|
|
14:45-15:15, Paper ThLB2R.27 | |
Imitation Balancing Control Using Human Balancing Skills for a Wheeled Inverted Pendulum Robot with a Fan |
|
Dohyeon, Kim | Jeonbuk National University |
Jung, Yeongtae | Jeonbuk National University |
Keywords: Telerobotics and Teleoperation, Imitation Learning, Wheeled Robots
Abstract: Wheeled Inverted Pendulum (WIP) robots offer agile mobility but suffer from unstable, underactuated dynamics. To address this, we are developing a Wheeled Inverted Pendulum with a Fan (WIPF), which uses bidirectional fan thrust to achieve full actuation. This study proposes an imitation balancing controller that leverages human balancing skills. A human-machine interface (HMI) synchronizes the human and robot by sending reference trajectories from the human to the robot and providing force feedback based on their state differences. During teleoperation, external disturbances are applied to the robot, and the user responds by balancing it in real time. A 1D Convolutional Neural Network (1D-CNN) was trained to imitate the human-generated reference trajectory. The proposed controller outperformed traditional Linear Quadratic Regulator (LQR) in both simulations and experiments. Future work will extend to locomotion control and more dynamic movements beyond balancing.
|
|
14:45-15:15, Paper ThLB2R.28 | |
Universal Control Barrier Functions for Agile, but Safe, Multi-Robot Control in Dynamic, Cluttered Environments |
|
Chandra, Rohan | University of Virginia |
Keywords: Multi-Robot Systems
Abstract: Robots operating in complex human environments must balance two often conflicting objectives: agility and safety. Traditional safety guarantees, while effective at preventing collisions and unsafe behaviors, tend to restrict agile movement and can result in deadlocks in crowded, multi-agent scenarios. This poster introduces Universal Control Barrier Functions (UCBFs), a new class of control-theoretic tools designed to simultaneously ensure safety and prevent deadlocks by introducing Liveness Sets—a novel extension of traditional safety sets. The work demonstrates how UCBFs generalize classical Control Barrier Functions to provide both safety (set invariance) and liveness (ensuring progress), allowing for safe yet fluid motion in real-world dynamic environments. Preliminary simulations and hardware results on quadrupeds and other robots show that UCBFs enable agile behavior without compromising on safety. The framework is further extended to accommodate challenges such as limited actuation, lidar-only perception, decentralized multi-agent coordination, and neural approximations of control. These results underscore the versatility of UCBFs as a foundational control primitive for next-generation robotic systems deployed in cluttered, human-centric spaces.
|
|
14:45-15:15, Paper ThLB2R.29 | |
Autonomous Robotic Assistant for Knee Replacement Surgery |
|
Baweja, Paramjit Singh | Carnegie Mellon University |
Gupta, Shivangi | Carnegie Mellon University |
Yang, Liwei | Carnegie Mellon University |
Warrier, Abhishek | Carnegie Mellon University |
Wu, Qilin | Carnegie Mellon University |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Surgical Robotics: Planning
Abstract: The emergence of computer-assisted total knee arthroplasty (TKA) aims to reduce post-operative complications by improving surgical accuracy. Leveraging implant geometry and patient-specific bone shape, modern technology enables more precise bone preparation. However, current systems rely on infrared-based tracking for registration, which is invasive, cumbersome, and introduces additional hardware into the sterile field. These systems also impose strict line-of-sight constraints, obstructing critical areas within the surgical field and limiting surgeon mobility. In contrast, our work proposes a vision-based robotic alternative capable of performing accurate registration and autonomously drilling pilot holes for surgical pin placement, without reliance on external tracking hardware. This system represents a step toward fully self-contained surgical robots that simplify operating room setup, reduce intra-operative errors, and improve the accessibility of robotic assistance in orthopedic procedures.
|
|
14:45-15:15, Paper ThLB2R.30 | |
AURA: Painted Heart Beats |
|
Adhya, Angshu | University of Michigan, Ann Arbor |
Yang, Cindy | University of Michigan |
Wu, Emily | University of Michigan |
Hasan, Rishad | University of Michigan |
Narula, Abhishek | University of Michigan |
Alves-Oliveira, Patrícia | Amazon Lab126 |
Keywords: Human-Centered Robotics, Human Detection and Tracking, Art and Entertainment Robotics
Abstract: In this work we present AURA, a framework for synergetic human-artist painting. We developed a robot arm that collaboratively paints with a human artist. The robot has an awareness of the artist's heartbeat through the EmotiBit sensor, which provides the arousal levels of the painter. Given the heartbeat detected, the robot decides to increase proximity to the artist's workspace or retract. If a higher heartbeat is detected, which is associated with increased arousal in human artists, the robot will move away from that area of the canvas. If the artist's heart rate is detected as neutral, indicating the human artist's baseline state, the robot will continue its painting actions across the entire canvas. We also demonstrate and propose alternative robot-artist interactions using natural language and physical touch. This work combines the biometrics of a human artist to inform fluent artistic interactions.
|
|
ThDT1 |
302 |
Model Predictive Control |
Regular Session |
Chair: Lin, Ming C. | University of Maryland at College Park |
Co-Chair: Ding, Yanran | University of Michigan |
|
15:15-15:20, Paper ThDT1.1 | |
Time-Correlated Model Predictive Path Integral: Smooth Action Generation for Sampling-Based Control |
|
Lee, Minhyeong | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Motion and Path Planning, Integrated Planning and Control, Optimization and Optimal Control
Abstract: In this paper, we introduce time-correlated model predictive path integral (TC-MPPI), a novel approach to mitigate action noise in sampling-based control methods. Unlike conventional smoothing techniques that rely on post-processing or additional state variables, TC-MPPI directly incorporates temporal correlation of actions into stochastic optimal control, effectively enforcing quadratic costs on action derivatives. This reformulation enables us to generate smooth action sequences without extra modifications, using a time-correlated and conditional Gaussian sampling distribution. We demonstrate the effectiveness of our approach through simulations on various robotic platforms, including a pendulum, cart-pole, 2D bicopter, 3D quadcopter, and autonomous vehicle. Simulation videos are available at https://youtu.be/nWfJ2MAV2JI.
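As a rough illustration of what a time-correlated Gaussian sampling distribution can look like when it encodes quadratic costs on action derivatives, the sketch below draws noise trajectories whose precision matrix penalizes first differences; the construction, parameter names, and values are illustrative assumptions rather than the paper's exact distribution.

import numpy as np

def sample_time_correlated_noise(horizon, n_samples, lam=10.0, eps=1e-3, rng=None):
    """Draw action-noise sequences whose inverse covariance penalizes first
    differences, i.e. an assumed smoothness prior of the kind the abstract
    describes; each returned row is one smooth noise trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    # First-difference operator D of shape (horizon-1, horizon): (D @ u)[i] = u[i+1] - u[i].
    D = np.diff(np.eye(horizon), axis=0)
    precision = lam * D.T @ D + eps * np.eye(horizon)   # quadratic penalty on derivatives
    cov = np.linalg.inv(precision)
    L = np.linalg.cholesky(cov)
    return (L @ rng.standard_normal((horizon, n_samples))).T

# Example: 64 smooth noise trajectories over a 30-step horizon (hypothetical sizes).
noise = sample_time_correlated_noise(horizon=30, n_samples=64)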
|
|
15:20-15:25, Paper ThDT1.2 | |
Gradient-Based Trajectory Optimization with Parallelized Differentiable Traffic Simulation |
|
Son, Sanghyun | University of Maryland |
Zheng, Laura | University of Maryland, College Park |
Clipp, Brian | Kitware Inc |
Greenwell, Connor | Kitware Inc |
Philip, Sujin | Kitware Inc |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Simulation and Animation, Optimization and Optimal Control
Abstract: We present a parallelized differentiable traffic simulator based on the Intelligent Driver Model (IDM), a car-following framework that incorporates driver behavior as key variables. Our vehicle simulator efficiently models vehicle motion, generating trajectories that can be supervised to fit real-world data. By leveraging its differentiable nature, IDM parameters are optimized using gradient-based methods. With the capability to simulate up to 2 million vehicles in real time, the system is scalable for large-scale trajectory optimization. We show that we can use the simulator to filter noise in the input trajectories (trajectory filtering), reconstruct dense trajectories from sparse ones (trajectory reconstruction), and predict future trajectories (trajectory prediction), with all generated trajectories adhering to physical laws. We validate our simulator and algorithm on several datasets including NGSIM and Waymo Open Dataset. The code is publicly available at: https://github.com/SonSang/diffidm.
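The Intelligent Driver Model at the core of the simulator is a standard closed-form car-following law, so a minimal differentiable version is easy to sketch; the snippet below uses generic textbook parameter defaults and is not the authors' implementation (which is available at the repository linked above).

import torch

def idm_acceleration(v, dv, s, v0=30.0, T=1.5, a=1.5, b=2.0, s0=2.0, delta=4.0):
    """Differentiable Intelligent Driver Model acceleration for a batch of vehicles.
    v: ego speed, dv: approach rate to the leader (v - v_lead), s: bumper-to-bumper gap.
    Parameter values are generic defaults, not the ones fitted in the paper; all
    operations stay in torch so gradients can flow back to the inputs/parameters."""
    s_star = s0 + v * T + v * dv / (2.0 * (a * b) ** 0.5)     # desired dynamic gap
    return a * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)

# Example: three followers with different gaps and approach rates (hypothetical values).
v = torch.tensor([25.0, 25.0, 25.0], requires_grad=True)
dv = torch.tensor([2.0, 0.0, -1.0])
s = torch.tensor([30.0, 50.0, 15.0])
acc = idm_acceleration(v, dv, s)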
|
|
15:25-15:30, Paper ThDT1.3 | |
Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs |
|
Hu, Tianxin | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Bai, Ruofei | Nanyang Technological University |
Xu, Xinhang | Nanyang Technological University |
Liao, Yuwen | Nanyang Technological University |
Liu, Fen | Nanyang Technological University |
Xie, Lihua | NanyangTechnological University |
Keywords: Integrated Planning and Control, Motion and Path Planning, Computational Geometry
Abstract: Multi-axle autonomous mobile robots (AMRs) are set to revolutionize the future of robotics in logistics. As the backbone of next-generation solutions, these robots face a critical challenge: managing and minimizing swept volume during turns while maintaining precise control. Traditional systems designed for standard vehicles often struggle with the complex dynamics of multi-axle configurations, leading to inefficiency and increased safety risk in confined spaces. Our innovative framework overcomes these limitations by combining swept volume minimization with Signed Distance Field (SDF) path planning and model predictive control (MPC) for independent wheel steering. This approach not only plans paths with an awareness of the swept volume, but actively minimizes it in real-time, allowing each axle to follow a precise trajectory while significantly reducing the space the vehicle occupies. By predicting future states and adjusting the turning radius of each wheel, our method enhances both maneuverability and safety, even in the most constrained environments. Unlike previous works, our solution goes beyond basic path calculation and tracking, offering real-time path optimization with minimal swept volume and efficient individual axle control. To our knowledge, this is the first comprehensive approach to tackle these challenges, delivering life-saving improvements in control, efficiency, and safety for multi-axle AMRs. Furthermore, we will open-source our work to foster collaboration and enable others to advance safer and more efficient autonomous systems.
|
|
15:30-15:35, Paper ThDT1.4 | |
Efficient Trajectory Generation Based on Traversable Planes in 3D Complex Architectural Spaces |
|
Zhang, Mengke | Zhejiang University |
Tian, Zhihao | Nanjing Institute of Technology |
Xia, Yaoguang | China Tobacco Zhejiang Industrial Co., Ltd |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: Motion and Path Planning, Field Robots, Nonholonomic Motion Planning
Abstract: With the increasing integration of robots into human life, their role in architectural spaces, where people spend most of their time, has become more prominent. While motion capabilities and accurate localization for automated robots have rapidly developed, the challenge remains to generate efficient, smooth, comprehensive, and high-quality trajectories in these spaces. In this paper, we propose a novel efficient planner for ground robots to autonomously navigate in large, complex, multi-layered architectural spaces. Considering that traversable regions typically include ground, slopes, and stairs, which are planar or nearly planar structures, we simplify the problem to navigation within and between complex intersecting planes. We first extract traversable planes from 3D point clouds through segmenting, merging, classifying, and connecting to build a plane-graph, which is lightweight but fully represents the traversable regions. We then formulate a trajectory optimization based on the motion-state trajectory and fully consider the special constraints that arise when crossing multi-layer planes to maximize the robot's maneuverability. We conduct experiments in simulated environments and test on a CubeTrack robot in real-world scenarios, validating the method's effectiveness and practicality.
|
|
15:35-15:40, Paper ThDT1.5 | |
Model Predictive Control with Visibility Graphs for Humanoid Path Planning and Tracking against Adversarial Opponents |
|
Hou, Ruochen | UCLA |
Fernandez, Gabriel Ikaika | University of California Los Angeles |
Zhu, Mingzhang | University of California, Los Angeles |
Hong, Dennis | UCLA |
Keywords: Motion and Path Planning, Collision Avoidance, Optimization and Optimal Control
Abstract: In this paper we detail the methods used for obstacle avoidance, path planning, and trajectory tracking that helped us win the adult-sized, autonomous humanoid soccer league in RoboCup 2024. Our team was undefeated in all seated matches and scored 45 goals over 6 games, winning the championship game 6 to 1. During the competition, a major challenge for collision avoidance was the measurement noise coming from bipedal locomotion and a limited field of view (FOV). Furthermore, obstacles would sporadically jump in and out of our planned trajectory, and at times our estimator would place our robot inside a hard constraint. Any planner in this competition must also be computationally efficient enough to re-plan and react in real time. This motivated our approach to trajectory generation and tracking. In many scenarios both long-term and short-term planning are needed. To efficiently find a long-term general path that avoids all obstacles we developed DAVG (Dynamic Augmented Visibility Graphs). DAVG focuses on essential path planning by setting certain regions to be active based on obstacles and the desired goal pose. By augmenting the states in the graph, turning angles are considered, which is crucial for a large soccer-playing robot as turning may be more costly. A trajectory is formed by linearly interpolating between discrete points generated by DAVG. A modified version of model predictive control (MPC), called cf-MPC (Collision-Free MPC), is then used to track this trajectory and handle short-term planning. Without having to switch formulations, cf-MPC takes into account the robot dynamics and collision-free constraints, and without a hard switch the control input can transition smoothly in cases where the noise places our robot inside a constraint boundary. The nonlinear formulation runs at approximately 120 Hz, while the quadratic version achieves around 400 Hz.
|
|
15:40-15:45, Paper ThDT1.6 | |
Learning Time-Optimal Online Replanning for Distributed Model Predictive Contouring Control of Quadrotors |
|
Guan, Xin | Zhejiang University |
Zhao, Fangguo | Zhejiang University |
Tian, Shunxin | Zhejiang University |
Li, Shuo | Zhejiang University |
Keywords: Motion and Path Planning, Aerial Systems: Mechanics and Control
Abstract: Achieving time-optimal flight in real time for multi-drone systems presents significant challenges, particularly in scenarios requiring rapid responses or aggressive maneuvers. This paper introduces a novel framework that bridges the gap between time-optimal polynomial trajectory generation and optimal control, facilitating efficient online replanning (100 Hz onboard) for multiple quadrotors. Specifically, the proposed method leverages a neural network to learn optimal time allocations for polynomial trajectories, which are then integrated with Model Predictive Contouring Control to fully exploit the dynamics of quadrotors. We further extend this approach to multi-drone systems, enabling collaborative high-speed flight with reciprocal collision avoidance. We benchmark the time-optimal performance and computational efficiency of our method in a drone racing scenario and demonstrate its effectiveness in agile cooperative flight within more constrained simulation and real-world environments. The results demonstrate that the proposed method achieves agile waypoint traversal at speeds of up to 19 m/s in simulation and up to 9 m/s in a two-drone real-world scenario.
|
|
15:45-15:50, Paper ThDT1.7 | |
Predictive Control with Indirect Adaptive Laws for Payload Transportation by Quadrupedal Robots |
|
Amanzadeh, Leila | Virginia Tech University |
Chunawala, Taizoon Aliasgar | Virginia Polytechnic Institute and State University |
Fawcett, Randall | Virginia Polytechnic Institute and State University |
Leonessa, Alexander | Virginia Tech |
Akbari Hamed, Kaveh | Virginia Tech |
Keywords: Legged Robots, Motion Control, Multi-Contact Whole-Body Motion Planning and Control
Abstract: This paper formally develops a novel hierarchical planning and control framework for robust payload transportation by quadrupedal robots, integrating a model predictive control (MPC) algorithm with a gradient-descent-based adaptive updating law. At the framework's high level, an indirect adaptive law estimates the unknown parameters of the reduced-order (template) locomotion model under varying payloads. These estimated parameters feed into an MPC algorithm for real-time trajectory planning, incorporating a convex stability criterion within the MPC constraints to ensure the stability of the template model's estimation error. The optimal reduced-order trajectories generated by the high-level adaptive MPC (AMPC) are then passed to a low-level nonlinear whole-body controller (WBC) for tracking. Extensive numerical investigations validate the framework's capabilities, showcasing the robot's proficiency in transporting unmodeled, unknown static payloads of up to 109% of its mass in experiments on flat terrains and 91% on rough experimental terrains. The robot also successfully manages dynamic payloads of 73% of its mass on rough terrains. Performance comparisons with a normal MPC and an L1-MPC indicate a significant improvement. Furthermore, comprehensive hardware experiments conducted in indoor and outdoor environments confirm the method's efficacy on rough terrains despite uncertainties such as payload variations, push disturbances, and obstacles.
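A minimal sketch of what a gradient-descent-based indirect adaptive law can look like is given below; the parameter vector, the reduced-order prediction function, the finite-difference gradient, and the step size are all hypothetical placeholders, since the abstract does not specify the exact update.

import numpy as np

def adaptive_update(theta, x, x_pred_fn, x_meas, gamma=0.05):
    """One gradient-descent step on the squared one-step prediction error of a
    reduced-order (template) model. theta: unknown parameters (e.g., payload
    terms), x_pred_fn(x, theta): predicted next state, x_meas: measured next
    state. The gradient is taken by forward finite differences for brevity."""
    err = x_pred_fn(x, theta) - x_meas
    grad = np.zeros_like(theta)
    h = 1e-6
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = h
        err_p = x_pred_fn(x, theta + d) - x_meas
        grad[i] = (0.5 * err_p @ err_p - 0.5 * err @ err) / h
    return theta - gamma * grad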
|
|
ThDT2 |
301 |
Learning-Based SLAM 3 |
Regular Session |
Chair: Leutenegger, Stefan | Technical University of Munich |
Co-Chair: Papalia, Alan | Massachusetts Institute of Technology |
|
15:15-15:20, Paper ThDT2.1 | |
M3DSS: A Multi-Platform, Multi-Sensor, and Multi-Scenario Dataset for SLAM System |
|
Huang, Shulei | Northeastern University |
Zhang, Haotian | Northeastern University |
Xu, Kang | Northeastern University |
Lv, Xianwei | Northeastern University |
Ma, Xiaoguang | Northeastern University |
Keywords: Data Sets for SLAM, SLAM, Visual-Inertial SLAM
Abstract: This paper proposes M3DSS, a multi-platform, multi-sensor, and multi-scenario dataset for Simultaneous Localization and Mapping (SLAM) systems. Fifty-five sequences were collected from multiple platforms, including handheld equipment, an unmanned ground vehicle, a quadruped robot, a car, and an unmanned aerial vehicle. The sensors used in M3DSS include two pairs of stereo event cameras with resolutions of 640×480 and 346×260, one infrared camera, four RGB cameras, two visual-inertial sensors, four mechanical LiDARs and one solid-state LiDAR, three inertial measurement units, and two global navigation satellite and inertial navigation systems with real-time kinematic signals. In total, 21 sensors were used on 5 different platforms under various challenging scenarios, including extreme illumination, aggressive motion, low-texture, and high-speed driving scenarios. To the best of our knowledge, M3DSS offers the richest event-based sensory information for SLAM to date. We comprehensively evaluate state-of-the-art SLAM approaches and identify their limitations on M3DSS. Details can be found at https://neufs-ma.github.io/M3DSS.
|
|
15:20-15:25, Paper ThDT2.2 | |
Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping |
|
Jung, Jaehyung | Technical University of Munich |
Boche, Simon | Technical University of Munich |
Barbas Laina, Sebastián | TU Munich |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Visual-Inertial SLAM, SLAM, Mapping
Abstract: We propose visual-inertial simultaneous localization and mapping that tightly couples sparse reprojection errors, inertial measurement unit pre-integrals, and relative pose factors with dense volumetric occupancy mapping. Hereby depth predictions from a deep neural network are fused in a fully probabilistic manner. Specifically, our method is rigorously uncertainty-aware: first, we use depth and uncertainty predictions from a deep network not only from the robot's stereo rig, but we further probabilistically fuse motion stereo that provides depth information across a range of baselines, therefore drastically increasing mapping accuracy. Next, predicted and fused depth uncertainty propagates not only into occupancy probabilities but also into alignment factors between generated dense submaps that enter the probabilistic nonlinear least squares estimator. This submap representation offers globally consistent geometry at scale. Our method is thoroughly evaluated in two benchmark datasets, resulting in localization and mapping accuracy that exceeds the state of the art, while simultaneously offering volumetric occupancy directly usable for downstream robotic planning and control in real-time.
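Per pixel, the probabilistic fusion of learned depth with motion-stereo depth described above reduces, under a Gaussian assumption, to inverse-variance weighting; the snippet below is a minimal sketch of that step and is not the authors' implementation.

import numpy as np

def fuse_depths(d_net, var_net, d_stereo, var_stereo):
    """Per-pixel inverse-variance fusion of two depth estimates with their
    predicted uncertainties (all arrays share the same shape); the fused
    variance is what would then propagate into occupancy probabilities."""
    w_net = 1.0 / var_net
    w_stereo = 1.0 / var_stereo
    d_fused = (w_net * d_net + w_stereo * d_stereo) / (w_net + w_stereo)
    var_fused = 1.0 / (w_net + w_stereo)
    return d_fused, var_fused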
|
|
15:25-15:30, Paper ThDT2.3 | |
Real-Time 3D Reconstruction Via Camera-LIDAR (2D) Fusion for Mobile Robots: A Gaussian Splatting Approach |
|
Sandula, Ajay Kumar | Indian Institute of Science, Bengaluru |
Damodaran, Shriram | National Institute of Technology, Jalandhar, India |
Nagaraj, Suhas | University of Maryland, College Park |
Ghose, Debasish | Indian Institute of Science |
Biswas, Pradipta | Indian Institute of Science |
Keywords: Visual-Inertial SLAM, Mapping, Sensor Fusion
Abstract: We present a novel 3D reconstruction-based SLAM (Simultaneous Localization and Mapping) approach for robots that leverages multimodal sensory input data, including a camera and a 2D LiDAR. By integrating these inputs with the Gaussian splatting technique, our method significantly enhances performance over traditional SLAM approaches. Traditional SLAM techniques often struggle with the limitations of monocular vision and fail to accurately map and locate objects in dynamic and cluttered environments. Relying purely on a camera to localize the robot and create the map is challenging in the presence of dynamic obstacles in the scene. To address this, we propose a multimodal sensor-fusion-based 3D reconstruction approach. Our approach employs LiDAR-based localization to achieve precise positioning of both the camera and the robot, while utilizing the Gaussian splatting technique for robust environmental mapping and 3D reconstruction. This approach is robust to dynamic obstacles in the scene. We have conducted extensive experiments in various real-world and simulated environments, demonstrating that our method not only outperforms traditional monocular SLAM approaches but also achieves higher accuracy in terms of localization and the constructed map. Our results demonstrate substantial improvements in 3D reconstruction for mobile robots, achieving reduced computational load, higher FPS, and enhanced scaling accuracy.
|
|
15:30-15:35, Paper ThDT2.4 | |
DVN-SLAM: Dynamic Visual Neural SLAM Based on Local-Global Encoding |
|
Wu, Wenhua | Shang Hai Jiao Tong University |
Wang, Guangming | University of Cambridge |
Deng, Ting | Imperial College London |
Aegidius, Sebastian | University College London |
Shanks, Stuart | University College London |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: SLAM, Mapping, Localization
Abstract: Recent research on Simultaneous Localization and Mapping (SLAM) based on implicit representation has shown promising results in indoor environments. However, some challenges remain: the limited scene representation capability of implicit encoding, the uncertainty in the rendering process from implicit representations, and the disruption of consistency by dynamic objects. To address these challenges, we propose a dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM. To improve the scene representation capability, we introduce a local-global fusion neural implicit representation that enables the construction of an implicit map while considering both global structure and local details. To tackle uncertainties arising from the rendering process, we design an information concentration loss for optimization, aiming to concentrate scene information on object surfaces. The proposed DVN-SLAM achieves competitive performance in localization and mapping across multiple datasets. More importantly, DVN-SLAM demonstrates robustness in dynamic scenes without semantic or optical-flow priors, which sets it apart from other NeRF-based methods.
|
|
15:35-15:40, Paper ThDT2.5 | |
Dy3DGS-SLAM: Monocular 3DGS-SLAM System for Dynamic Environments |
|
Li, Mingrui | Dalian University of Technology |
Zhou, Yiming | Saarland University of Applied Science |
Zhou, Hongxing | Beijing University of Chemical Technology |
Hu, Xinggang | Dalian University of Technology |
Roemer, Florian | Fraunhofer IZFP |
Wang, Hongyu | Dalian University of Technology |
Osman, Ahmad | Htw Saar |
Keywords: SLAM, Mapping, Localization
Abstract: The current SLAM methods based on NeRF or 3DGS have shown impressive results in reconstructing ideal static 3D scenes. However, they perform poorly in tracking and reconstruction when facing more challenging dynamic environments, such as real-world scenes involving dynamic elements. Although some NeRF-based SLAM methods have attempted to address these dynamic challenges, they rely on RGB-D inputs, and there is a lack of methods that work with pure RGB input. To address these challenges, we introduce Dy3DGS-SLAM, the first 3DGS-SLAM method for dynamic scenes using monocular RGB input. For tracking, our method first acquires dynamic object masks through an optical flow estimation system, then combines them with a monocular depth estimation system to obtain merged masks and recover scale. This allows us to remove dynamic objects from non-predefined scenes, enabling dense frame-to-frame mapping. For rendering, we prune the Gaussians generated by pixels with dynamic masks, while applying a scale regularizer to avoid Gaussian artifacts. We impose additional photometric, geometric, and uncertainty losses on the proxy depth to improve rendering accuracy. Experimental results show that our method achieves state-of-the-art (SOTA) tracking and rendering results in dynamic environments, while also being competitive with or outperforming RGB-D methods.
|
|
15:40-15:45, Paper ThDT2.6 | |
SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment |
|
Ji, Xingyu | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Li, Jianping | Nanyang Technological University |
Yin, Pengyu | Nanyang Technological University |
Cao, Haozhi | Nanyang Technological University |
Xie, Lihua | NanyangTechnological University |
Keywords: Mapping, Localization, SLAM
Abstract: LiDAR bundle adjustment (BA) is an effective approach to reduce the drifts in pose estimation from the front-end. Existing works on LiDAR BA usually rely on predefined geometric features for landmark representation. This reliance restricts generalizability, as the system will inevitably deteriorate in environments where these specific features are absent. To address this issue, we propose SGBA, a LiDAR BA scheme that models the environment as a semantic Gaussian mixture model (GMM) without predefined feature types. This approach encodes both geometric and semantic information, offering a comprehensive and general representation adaptable to various environments. Additionally, to limit computational complexity while ensuring generalizability, we propose an adaptive semantic selection framework that selects the most informative semantic clusters for optimization by evaluating the condition number of the cost function. Lastly, we introduce a probabilistic feature association scheme that considers the entire probability density of assignments, which can manage uncertainties in measurement and initial pose estimation. We have conducted various experiments and the results demonstrate that SGBA can achieve accurate and robust pose refinement even in challenging scenarios with low-quality initial pose estimation and limited geometric features. We plan to open source the work for the benefit of the community @ https://github.com/Ji1Xinyu/SGBA.
|
|
15:45-15:50, Paper ThDT2.7 | |
GeoRecon: Geometric Coherence for Online 3D Scene Reconstruction from Monocular Video |
|
Wang, Yanmei | Chinese Academy of Sciences |
Chu, Fupeng | Chinese Academy of Sciences |
Han, Zhi | Shenyang Institute of Automation, Chinese Academy of Sciences |
Tang, Yandong | Shenyang Institute of Automation, CAS |
Keywords: Mapping, Cognitive Modeling
Abstract: Online 3D scene reconstruction from monocular video aims to incrementally recover a 3D mesh from monocular RGB videos. It enables robots to accomplish tasks involving interactions with the environment. Due to the high memory consumption of 3D data, almost all existing methods adopt a coarse-to-fine architecture, in which the voxels are progressively sparsified and split across levels. However, these methods overlook alignment between different levels, resulting in poor geometric properties of the reconstructed scene. Furthermore, the whole framework relies on voxel features for supervision, lacking effective supervision of the image geometric features extracted by the feature extraction network. These geometric features are essential for further 3D scene reconstruction. To tackle the above problems, we propose GeoRecon, which achieves geometrically coherent reconstruction through keyframe 2D representation self-regression and cross-level 3D voxel feature alignment. Specifically, in the 2D image space, to alleviate the lack of supervision in 2D feature extraction, an image reconstruction self-supervised regression constraint is introduced on the input 2D keyframes to ensure that the extracted features learn accurate geometric features and, in turn, accurate voxel features. In the 3D voxel feature space, to achieve consistent alignment between different levels, the high-level voxel features are used to constrain the low-level voxel features, achieving alignment from coarse (i.e., low-level) voxel features to fine (i.e., high-level) voxel features. With these two components, the proposed method effectively reconstructs the geometric structures of the scene. The experimental results demonstrate the effectiveness of the proposed method.
|
|
ThDT3 |
303 |
Space Robotics 2 |
Regular Session |
Chair: Janabi-Sharifi, Farrokh | Ryerson University |
Co-Chair: Vidal-Calleja, Teresa A. | University of Technology Sydney |
|
15:15-15:20, Paper ThDT3.1 | |
AstroLoc2: Fast Sequential Depth-Enhanced Localization for Free-Flying Robots |
|
Soussan, Ryan | Aerodyne Industries |
Moreira, Marina | Instituto Superior Técnico, Lisbon University |
Coltin, Brian | Carnegie Mellon University |
Smith, Trey | NASA Ames Research Center |
Keywords: Space Robotics and Automation, Vision-Based Navigation, Localization
Abstract: We present AstroLoc2, a monocular and time-of-flight (ToF) visual-inertial graph-based localizer used by the Astrobee free-flying robots on the International Space Station (ISS). AstroLoc2 sequentially performs odometry and absolute localization in a single process to decouple map noise from velocity and IMU bias estimation and run efficiently on resource constrained platforms. It improves monocular visual-inertial odometry robustness by adding ToF correspondence factors and uses adaptive map-matching to increase image registration reliability in dynamic environments while preserving fast matching in static ones. We evaluate the performance of AstroLoc2 on a public dataset of 10 ISS activities and show that it improves localization accuracy by 16% and success rates by 5.5% while maintaining a faster runtime than leading methods. AstroLoc2 has enabled the Astrobee robots to perform higher precision maneuvers in changing environments on the ISS. It can be configured for other limited computation platforms and we release the source code to the public.
|
|
15:20-15:25, Paper ThDT3.2 | |
Mixing Data-Driven and Geometric Models for Satellite Docking Port State Estimation Using an RGB or Event Camera |
|
Le Gentil, Cedric | University of Toronto |
Naylor, Jack | University of Sydney |
Munasinghe, Nuwan | University of Technology Sydney (UTS) |
Mehami, Jasprabhjit | University of Technology Sydney |
Dai, Benny | University of Technology Sydney |
Asavkin, Mikhail | ANT61 |
Dansereau, Donald | University of Sydney |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Keywords: Space Robotics and Automation, Visual Tracking, Deep Learning for Visual Perception
Abstract: In-orbit automated servicing is a promising path towards lowering the cost of satellite operations and reducing the amount of orbital debris. For this purpose, we present a pipeline for automated satellite docking port detection and state estimation using monocular vision data from standard RGB sensing or an event camera. Rather than taking snapshots of the environment, an event camera has independent pixels that asynchronously respond to light changes, offering advantages such as high dynamic range, low power consumption, and low latency. This work focuses on satellite-agnostic operations (only a geometric knowledge of the actual port is required) using the recently released Lockheed Martin Mission Augmentation Port (LM-MAP) as the target. By leveraging shallow data-driven techniques to preprocess the incoming data to highlight the LM-MAP's reflective navigational aids and then using basic geometric models for state estimation, we present a lightweight and data-efficient pipeline that can be used independently with either RGB or event cameras. We demonstrate the soundness of the pipeline and perform a quantitative comparison of the two modalities based on data collected with a photometrically accurate test bench that includes a robotic arm to simulate the target satellite's uncontrolled motion. The data has been made publicly available: https://uts-ri.github.io/rgb_event_docking_port/
|
|
15:25-15:30, Paper ThDT3.3 | |
A Visual Servo System for Robotic On-Orbit Servicing Based on 3D Perception of Non-Cooperative Satellite |
|
Zhao, Panpan | Shandong University |
Jin, Li | Shandong University |
Chen, Yeheng | Zhejiang Lab |
Li, Jiachen | Zhejiang University |
Song, Xiuqiang | Shandong University, China; Engineering Research Center of Digit |
Chen, Wenxuan | Zhejiang Lab |
Li, Nan | Technology and Engineering Center for Space Utilization, Chinese |
Du, Wenjuan | Zhejiang Lab |
Ma, Ke | Zhejiang Lab |
Wang, Xiaokun | Zhejianglab |
Li, Yuehua | Zhejiang Lab |
Xiangxu, Xiangxu | Shandong University |
Qin, Xueying | Shandong University |
Keywords: Space Robotics and Automation, Perception for Grasping and Manipulation, Visual Servoing
Abstract: The 3D perception of satellites, including both their shape and pose, is a key foundation for robotic on-orbit servicing. However, the demanding space environment—such as intense and dim illumination—presents significant challenges. Previous non-cooperative methods focus on specific geometric features like solar panel brackets or docking rings, overlooking the satellite's overall shape and increasing the risk of collisions during grasping. Additionally, satellites are often weakly textured, limiting the accuracy of 3D perception. To address these issues, we propose, for the first time, a 3D perception-based visual servo system of non-cooperative satellites. This system combines reconstruction and tracking to enhance shape perception and pose estimation accuracy in orbital conditions. Specifically, we employ an alternating iterative strategy to simultaneously reconstruct and track the satellite and introduce a novel constraint to fuse different cues under extreme conditions. Further, we develop a simulation environment platform, a dual-arm microgravity grasping system, and an online monitoring module to enhance system capabilities for on-orbit servicing. Synthetic and real-world datasets from the simulation environment are also created for experimental validation. Results show that each module of our system achieves state-of-the-art performance.
|
|
15:30-15:35, Paper ThDT3.4 | |
A Control Strategy for an Orbital Manipulator Equipped with an External Actuator at the End-Effector |
|
Sena, Francesco | German Aerospace Center (DLR) |
Mishra, Hrishik | German Aerospace Center (DLR) |
Vijayan, Ria | German Aerospace Center (DLR) |
De Stefano, Marco | German Aerospace Center (DLR) |
Keywords: Space Robotics and Automation, Motion Control, Dynamics
Abstract: This paper exploits the robotic capabilities of an orbital manipulator equipped with an actuation module at its end-effector to perform close-proximity robotic operations. The proposed control strategy enables repositioning the system’s center-of-mass by reconfiguring the manipulator configuration and using the end-effector-mounted thrusting mechanism to achieve displacement. The key advantage of the proposed method is that the plume impingement due to thruster firing of the servicer satellite in close-proximity operations towards the client is mitigated. This is achieved by regulating the internal motion of the manipulator such that the thrust firing does not occur near the space asset. The effectiveness of the controller is verified through a multibody dynamic simulation of an orbital manipulator.
|
|
15:35-15:40, Paper ThDT3.5 | |
Robotic Space Simulator: Controls Implementation for Auxiliary Axes and Zero-G Dynamics |
|
Hilburn, Eddie | Texas A&M University |
Pettinger, Adam | Texas A&M University |
Wilkinson, Emily | Texas A&M University |
Lansdowne, Ian | Texas A&M University |
Ambrose, Robert | Texas A&M University |
Keywords: Space Robotics and Automation, Force Control, Parallel Robots
Abstract: The Robotic Space Simulator was developed as a physical simulation for in-space manipulation tasks. It incorporates external inputs into its dynamics simulation via force/torque sensors mounted on the two 6-DoF Stewart platforms that compose its primary structure. Each platform is augmented with an additional degree of freedom in the form of an auxiliary axis: one in translation and one in rotation. Previous work has not effectively included the additional workspace provided by these auxiliary axes. Additionally, it limited the use of external force/torque inputs to the case of platform translation only, because the external forces/torques due to platform motion and gravitational force were not removed from the sensor inputs prior to inclusion in the dynamic simulation. In this work, we address each of these limitations. We develop and test two methods of auxiliary axis control, Cartesian Workspace and Joint Cost-Function, and find that both methods are an improvement over the existing system. Additionally, we develop and test a method for calculating the mass properties of hardware mounted to the force/torque sensors and a dynamics compensation method for this hardware. Using this technique, we are able to effectively compensate for gravitational force in different platform orientations and achieve zero-g behavior of the system.
|
|
15:40-15:45, Paper ThDT3.6 | |
Dynamics, Simulation & Control of Orbital Modules for On-Orbit Assembly |
|
Mishra, Hrishik | German Aerospace Center (DLR) |
Vicariotto, Tommaso | Politecnico Di Milano |
De Stefano, Marco | German Aerospace Center (DLR) |
Keywords: Space Robotics and Automation, Motion Control, Multi-Robot Systems
Abstract: In the context of in-orbit assembly, modular building blocks offer the advantage of distributed launches. After the orbit injection, the overall motion control requires the individual modules to approach each other while regulating their relative shape and total formation. This kind of formation control has already been addressed for rigid body modules. However, in practical cases, each module might be a multibody (with rotors) system. To address the control problem for such a fleet of fixed-inertia multibody modules, we propose a novel dynamics formulation that is inertia-decoupled, singularity-free, and invariant of their absolute poses. We extend the passive decomposition theory for deriving new representative systems corresponding to the total momentum (locked) and relative shape variations. We exploit the dynamics to design two distinct control laws with complementary mission benefits to regulate the locked and relative motions. We also leverage the proposed formulation to design a Hardware-in-the-Loop (HIL) framework, in which the facility reproduced the relative motions while total momentum was propagated in software. Furthermore, the proposed HIL framework and the motion control are experimentally validated.
|
|
15:45-15:50, Paper ThDT3.7 | |
Int-Ball2: On-Orbit Demonstration of Autonomous Intravehicular Flight and Docking for Image Capturing and Recharging |
|
Hirano, Daichi | Japan Aerospace Exploration Agency |
Mitani, Shinji | JAXA |
Watanabe, Keisuke | Japan Aerospace Exploration Agency |
Nishishita, Taisei | Japan Aerospace Exploration Agency |
Yamamoto, Tatsuya | Japan Aerospace Exploration Agency (JAXA) |
Yamaguchi, Seiko Piotr | Japan Aerospace Exploration Agency (JAXA) |
Keywords: Space Robotics and Automation, Aerial Systems: Mechanics and Control, Motion Control
Abstract: This article presents the system architecture and the orbital demonstration results of the Int-Ball2, a free-flying camera robot developed by the Japan Aerospace Exploration Agency (JAXA). The purpose of the Int-Ball2 project is to assist astronauts and reduce their workload on the International Space Station (ISS). This robot is an upgrade from the first Int-Ball, enhancing the propulsion subsystem for greater maneuverability and adding a new docking station (DS) for autonomous battery recharging. This study performed comprehensive ground tests for autonomous maneuvering and docking, employing a combination of a fully software-based simulator, a hardware-in-the-loop (HIL) simulator, and a planar air-bearing facility. After a successful launch to the ISS, the Int-Ball2 demonstrated its ability to work in microgravity without relying on astronaut support. The results obtained from ground and orbital tests underscored the effectiveness of our system design and ground verification approach. Further, we present key technologies essential for the Int-Ball2's successful implementation on board the ISS. We expect the insights from this project to be invaluable to future missions involving free-flying robots in microgravity.
|
|
ThDT4 |
304 |
Bioinspiration and Biomimetics 2 |
Regular Session |
Chair: Hasegawa, Yasuhisa | Nagoya University |
Co-Chair: Ozkan-Aydin, Yasemin | University of Notre Dame |
|
15:15-15:20, Paper ThDT4.1 | |
Harnessing Flagella Dynamics for Enhanced Robot Locomotion at Low Reynolds Number |
|
Chikere, Nnamdi | University of Notre Dame |
Ozkan-Aydin, Yasemin | University of Notre Dame |
Keywords: Biologically-Inspired Robots, Biomimetics, Soft Robot Applications
Abstract: Navigating environments with low Reynolds numbers (Re), where viscous forces dominate, presents unique challenges, such as the need for non-reciprocal motion dynamics. Microorganisms like algae and bacteria, with their specialized structures such as asymmetrical and flexible cilia and flagella, inspire efficient propulsion in such media. However, the mechanism for enhancing the propulsion speed of these microorganisms remains not fully understood. This study introduces a quadriflagellated, algae-inspired, cable-driven robot that mirrors these biological locomotion mechanisms. A single DC motor actuates four multi-segmented flagella, modulating their stiffness throughout the propulsion cycle. We focus on enhancing propulsion speed, hypothesizing that strategic flexibility alterations in flagella—increased during the backward stroke and decreased during the forward stroke—significantly improve propulsion speed. Our experimental results confirm this, showing a marked improvement in propulsion speed, achieving a rate of 0.7 ± 0.11 cm/cycle. Additionally, we explore the impact of flagella length and number on propulsion, providing valuable insights for biomedical and microfluidic research applications.
|
|
15:20-15:25, Paper ThDT4.2 | |
Development of Multi-Joint Biohybrid Soft Robot by Using Skeletal Muscle Tissue |
|
Kim, Eunhye | Nagoya University |
Takeuchi, Masaru | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Fukuda, Toshio | Nagoya University |
Keywords: Biological Cell Manipulation, Micro/Nano Robots, Soft Sensors and Actuators
Abstract: Various forms of biohybrid robots have been developed; however, creating robots with multiple degrees of freedom remains a challenging task. In this paper, we developed a multi-joint biohybrid robot by using skeletal muscle tissue. To achieve this, we first developed a modular bio-actuator actuated by skeletal muscle tissues. The objective of this study was to enhance the contraction force of the actuator and establish optimal experimental conditions for creating high-performance robots. By applying continuous electrical stimulation for five days during culture of the bio-actuator, we were able to increase the contraction force more than threefold. Additionally, we determined the appropriate electric field based on the electrode distance, which enabled us to establish an optimal experimental setup. We also confirmed that connecting the actuators in series can significantly increase the moving distance. Connecting two actuators in series resulted in a total movement distance equivalent to the sum of the distances of each actuator. This finding suggests the potential to create robots with a larger operational workspace. Using these actuators, we first constructed a manipulator with a rotational joint. This research is expected to contribute not only to the development of various robots utilizing bio-actuators but also to advancements in biotechnology.
|
|
15:25-15:30, Paper ThDT4.3 | |
A Novel Underwater Robot with Carangiform Locomotion Achieved Via Single Degree of Actuation and Magnetically Transmitted Traveling Wave |
|
Manduca, Gianluca | Scuola Superiore Sant'Anna |
Padovani, Luca | Sapienza |
Santaera, Gaspare | Sant'Anna School of Advanced Studies |
Graziani, Giorgio | Sapienza University, Rome |
Dario, Paolo | Scuola Superiore Sant'Anna |
Romano, Donato | Scuola Superiore Sant’Anna |
Stefanini, Cesare | Scuola Superiore Sant'Anna |
Keywords: Biologically-Inspired Robots, Marine Robotics, Mechanism Design
Abstract: The phenomenon of the “traveling wave,” commonly observed in various organisms, involves a wave that propagates along the body, serving as a locomotion mechanism. Particularly, in aquatic environments, organisms such as fish and cetaceans utilize traveling waves to propel themselves through water, minimizing fluid drag and maximizing movement efficiency. Inspired by nature, robotics has extensively explored replicating such locomotion strategies. This work presents a fish robot with an innovative magnetic transmission system. The mechanism transforms the unidirectional rotation of a single motor into an oscillatory, phase-shifted movement across the modules of the kinematic chain, generating a traveling wave along the body. The robot’s design and functionality are detailed, highlighting advancements in bio-inspired robotics for underwater applications, such as efficient and non-invasive monitoring and exploration of marine ecosystems. The fish robot achieved a swimming speed of approximately 2 body lengths per second (BL/s) with a tail-beat frequency of 3.24 Hz and a minimum Cost of Transport (CoT) of 5.33 J/(kg·m). Biomimetic robotics can play a key role in sustainable aquafarming, biodiversity conservation, and animal-robot interaction research, offering the potential to minimize ecosystem disruption and advance marine science.
|
|
15:30-15:35, Paper ThDT4.4 | |
AquaMILR: Mechanical Intelligence Simplifies Control of Undulatory Robots in Cluttered Fluid Environments |
|
Wang, Tianyu | Georgia Institute of Technology |
Mankame, Nishanth | Georgia Institute of Technology |
Fernandez, Matthew | Georgia Institute of Technology |
Kojouharov, Velin | Georgia Institute of Technology |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Biologically-Inspired Robots, Redundant Robots, Search and Rescue Robots
Abstract: While undulatory swimming of elongate limbless robots has been extensively studied in open hydrodynamic environments, less research has been focused on limbless locomotion in complex, cluttered aquatic environments. Motivated by the concept of mechanical intelligence, where controls for obstacle navigation can be offloaded to passive body mechanics in terrestrial limbless locomotion, we hypothesize that principles of mechanical intelligence can be extended to cluttered hydrodynamic regimes. To test this, we developed an untethered limbless robot capable of undulatory swimming on water surfaces, utilizing a bilateral cable-driven mechanism inspired by organismal muscle actuation morphology to achieve programmable anisotropic body compliance. We demonstrated through robophysical experiments that, similar to terrestrial locomotion, an appropriate level of body compliance can facilitate emergent swimming through complex hydrodynamic environments under pure open-loop control. Moreover, we found that swimming performance depends on undulation frequency, with effective locomotion achieved only within a specific frequency range. This contrasts with highly damped terrestrial regimes, where inertial effects can often be neglected. Further, to enhance performance and address the challenges posed by nondeterministic obstacle distributions, we incorporated computational intelligence by developing a real-time body compliance tuning controller based on cable tension feedback. This controller improves the robot's robustness and overall speed in heterogeneous hydrodynamic environments.
|
|
15:35-15:40, Paper ThDT4.5 | |
Ambient Flow Perception of Freely Swimming Robotic Fish Using an Artificial Lateral Line System |
|
Dai, Hongru | Shanghaitech University |
Lin, Xiaozhu | ShanghaiTech University |
Chao, Kaitian | ShanghaiTech University |
Wang, Yang | Shanghaitech University |
Keywords: Biologically-Inspired Robots, Bioinspired Robot Learning, Marine Robotics
Abstract: Robotic fish hold significant promise as efficient underwater systems, yet their inability to accurately perceive ambient flow hinders their deployment in real-world scenarios. Inspired by the natural lateral line system (LLS), a flow-responsive organ in fish that plays a crucial role in behaviors such as rheotaxis, this paper introduces the first Artificial Lateral Line System (ALLS)-based ambient flow classifier that allows robotic fish to perceive flow fields while swimming freely. To be specific, using just 5 pressure sensors and 3.5 minutes of swimming data, we trained a Long Short-Term Memory (LSTM) network, achieving a classification accuracy of 81.25% across 8 flow speed categories, ranging from 0.08 m/s to 0.18 m/s. A key innovation of this work is the formulation of ambient flow perception as a classification task, which not only enables the robotic fish to extract meaningful information but also enhances the robustness and generalizability of the perception framework. Extensive experiments further identify critical factors affecting the effectiveness of the ambient flow classifier, offering valuable insights for future development.
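As an illustration of the classification stage described in this abstract, the following is a minimal sketch (not the authors' code) of an LSTM network that maps short windows of multi-channel pressure readings to one of eight flow-speed classes; the window length, hidden size, and synthetic training data are assumptions made purely for the example.

```python
# Minimal sketch: LSTM classifier from pressure-sensor windows to flow-speed bins.
import torch
import torch.nn as nn

NUM_SENSORS = 5        # pressure channels, per the abstract
NUM_CLASSES = 8        # flow-speed bins from 0.08 to 0.18 m/s
WINDOW = 50            # assumed number of time steps per sample

class FlowLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(NUM_SENSORS, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, x):              # x: (batch, WINDOW, NUM_SENSORS)
        _, (h, _) = self.lstm(x)       # h: (num_layers, batch, hidden)
        return self.head(h[-1])        # class logits

model = FlowLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on synthetic data (real data would be the logged pressure
# windows mentioned in the abstract, labeled by flow-speed category).
x = torch.randn(32, WINDOW, NUM_SENSORS)
y = torch.randint(0, NUM_CLASSES, (32,))
loss = loss_fn(model(x), y)
optim.zero_grad()
loss.backward()
optim.step()
```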
|
|
15:40-15:45, Paper ThDT4.6 | |
Leader-Follower Formation Enabled by Pressure Sensing in Free-Swimming Undulatory Robotic Fish |
|
Panta, Kundan | The Pennsylvania State University |
Deng, Hankun | Penn State University |
DeLattre, Micah | Penn State University |
Cheng, Bo | Pennsylvania State University |
Keywords: Biologically-Inspired Robots, Imitation Learning, Marine Robotics
Abstract: Fish use their lateral lines to sense flows and pressure gradients, enabling them to detect nearby objects and organisms. Towards replicating this capability, we demonstrated successful leader-follower formation swimming using flow pressure sensing in our undulatory robotic fish (µBot/MUBot). The follower µBot is equipped at its head with bilateral pressure sensors to detect signals excited by both its own and the leader's movements. First, using experiments with static formations between an undulating leader and a stationary follower, we determined the formation that resulted in strong pressure variations measured by the follower. This formation was then selected as the desired formation in free swimming for obtaining an expert policy. Next, a long short-term memory neural network was used as the control policy that maps the pressure signals along with the robot motor commands and the Euler angles (measured by the onboard IMU) to the steering command. The policy was trained to imitate the expert policy using behavior cloning and Dataset Aggregation (DAgger). The results show that with merely two bilateral pressure sensors and less than one hour of training data, the follower effectively tracked the leader within distances of up to 200 mm (≈ 1 body length) while swimming at speeds of 155 mm/s (≈ 0.8 body lengths/s). This work highlights the potential of fish-inspired robots to effectively navigate fluid environments and achieve formation swimming through the use of flow pressure feedback.
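The imitation pipeline described above can be illustrated, in a much simplified form, by the following DAgger-style loop; a ridge regressor stands in for the paper's LSTM policy, and the one-dimensional follower dynamics, expert rule, and feature layout are hypothetical placeholders, not the paper's setup.

```python
# Simplified DAgger sketch: the learner maps sensed features to a steering
# command, and the expert relabels states visited under the learner's policy.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def expert(obs):
    # hypothetical expert: steer to cancel the lateral offset (obs[0])
    return -1.5 * obs[0]

def rollout(policy, steps=100):
    obs = rng.normal(size=4)          # [lateral offset, pressure L, pressure R, yaw]
    traj = []
    for _ in range(steps):
        a = policy(obs)
        traj.append(obs.copy())
        obs[0] += 0.1 * a + 0.01 * rng.normal()   # toy follower dynamics
        obs[1:] = rng.normal(size=3)              # toy sensor readings
    return np.array(traj)

# Seed with expert demonstrations, then aggregate expert-labeled learner rollouts.
X = rollout(expert)
Y = np.array([expert(o) for o in X])
model = Ridge().fit(X, Y)
for _ in range(5):
    new_X = rollout(lambda o: float(model.predict(o[None])[0]))
    X = np.vstack([X, new_X])
    Y = np.concatenate([Y, [expert(o) for o in new_X]])
    model = Ridge().fit(X, Y)
```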
|
|
15:45-15:50, Paper ThDT4.7 | |
Analysis of Kinematics and Propulsion of a Self-Sensing Multi-DoF Undulating Soft Robotic Fish |
|
Park, Myungsun | University of California San Diego |
Cervera Torralba, Jacobo | University of California, San Diego |
Adibnazari, Iman | University of California, San Deigo |
Pawlak, Geno | UC San Diego |
Tolley, Michael T. | University of California, San Diego |
Keywords: Biologically-Inspired Robots, Soft Robot Applications, Marine Robotics
Abstract: In this paper we explore kinematics ranging from anguilliform to thunniform in a self-sensing multi-degree-of-freedom soft robotic fish and analyze their effect on swimming. First, we examine the characteristics of the bending actuators of the robotic fish. Then, we express the kinematics of the fish as a propagating wave parameterized by three bending amplitudes and a wavelength, which are determined by the flow rates and phase shift of the pumps. We capture various motion patterns generated by different actuator inputs and directly measure the thrust generated by each pattern. We observe that the robotic swimmer can reproduce two different modes of propulsion that are embodied by two distinct morphological patterns in nature: anguilliform and thunniform. When neither mode is activated, propulsion is zero or even negative. Finally, we estimate the stationary swimming speed by towing the undulating fish, which satisfies the slip condition (with the speed of the body wave matching the swimming velocity). The analysis of a wide range of kinematic patterns in this study, including the two extreme cases of anguilliform and thunniform modes, will provide insights for a comprehensive understanding of the mechanics of efficient swimming.
|
|
ThDT5 |
305 |
Model Predictive Control for Legged Robots 2 |
Regular Session |
Chair: Wensing, Patrick M. | University of Notre Dame |
Co-Chair: Park, Hae-Won | Korea Advanced Institute of Science and Technology |
|
15:15-15:20, Paper ThDT5.1 | |
Model Predictive Parkour Control of a Monoped Hopper in Dynamically Changing Environments |
|
Albracht, Maximilian | German Aerospace Center (DLR) |
Kumar, Shivesh | DFKI GmbH |
Vyas, Shubham | Robotics Innovation Center, DFKI GmbH |
Kirchner, Frank | University of Bremen |
Keywords: Legged Robots, Optimization and Optimal Control, Underactuated Robots
Abstract: A great advantage of legged robots is their ability to operate on particularly difficult and obstructed terrain, which demands dynamic, robust, and precise movements. The study of obstacle courses provides invaluable insights into the challenges legged robots face, offering a controlled environment to assess and enhance their capabilities. Traversing such a course with a one-legged hopper introduces intricate challenges, such as planning over contacts and dealing with flight phases, which necessitates a sophisticated controller. A novel model predictive parkour controller is introduced that finds an optimal path through an obstacle course that changes in real time, using mixed-integer motion planning. The execution of this optimized path is then achieved through a state machine employing a PD control scheme with feedforward torques, ensuring robust and accurate performance.
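The execution layer mentioned above (a PD scheme with feedforward torques) can be sketched as follows; the gain values, joint dimension, and reference quantities are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of joint-space PD tracking with feedforward torques.
import numpy as np

KP = np.diag([80.0, 80.0, 60.0])   # assumed gains for a 3-joint hopper leg
KD = np.diag([2.0, 2.0, 1.5])

def pd_feedforward(q, dq, q_ref, dq_ref, tau_ff):
    """tau = tau_ff + Kp (q_ref - q) + Kd (dq_ref - dq)"""
    return tau_ff + KP @ (q_ref - q) + KD @ (dq_ref - dq)

# Example call with placeholder states (a real controller would read these
# from the robot and from the optimized parkour plan each control cycle).
tau = pd_feedforward(q=np.zeros(3), dq=np.zeros(3),
                     q_ref=np.array([0.1, -0.2, 0.3]),
                     dq_ref=np.zeros(3), tau_ff=np.zeros(3))
```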
|
|
15:20-15:25, Paper ThDT5.2 | |
Humanoid Walking Stabilization Via Model Predictive Control with Step Adjustment Based on the 3D Divergent Component of Motion |
|
Park, Gyeongjae | Seoul National University |
Kim, Myeong-Ju | Hyundai Motor Company |
Lee, Kwanwoo | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Humanoid and Bipedal Locomotion, Body Balancing, Legged Robots
Abstract: In this paper, as an approach to stabilizing humanoid walking in which the CoM height varies, a novel Model Predictive Control framework based on the three-dimensional Divergent Component of Motion (3D-DCM) is proposed. To ensure the feasible utilization of contact forces for maintaining humanoid balance, constraints on the control inputs, Virtual Repellent Point (VRP) and footstep adjustment, and their correlation are analytically formulated in quadratic form, resulting in a Quadratically Constrained Quadratic Program. Additionally, to enable the humanoid robot to withstand disturbances over a broader range of strides or safely navigate various terrains without encountering knee stretch, the distance between the CoM and the foot is constrained in the 3D-CoM trajectory planner. The effectiveness of the proposed method is validated through simulations and real-robot experiments in scenarios involving external disturbances and step-down motions.
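For readers unfamiliar with the DCM quantities used above, a minimal sketch of the standard DCM relations (xi = x + xdot/omega and xidot = omega (xi - v_vrp)) is given below; the nominal height and example states are placeholders, not values from the paper, and the constant omega assumes a fixed CoM-to-VRP height.

```python
# Sketch of the divergent component of motion (DCM) and its VRP-driven dynamics.
import numpy as np

GRAVITY = 9.81
DELTA_Z_VRP = 0.9                      # assumed nominal CoM-to-VRP height [m]
OMEGA = np.sqrt(GRAVITY / DELTA_Z_VRP)

def dcm(com, com_vel):
    """3D DCM: xi = x + xdot / omega."""
    return com + com_vel / OMEGA

def dcm_rate(xi, vrp):
    """Divergent dynamics: xidot = omega * (xi - v_vrp)."""
    return OMEGA * (xi - vrp)

com = np.array([0.0, 0.0, 0.9])
com_vel = np.array([0.3, 0.0, 0.0])
xi = dcm(com, com_vel)
print(xi, dcm_rate(xi, vrp=np.array([0.0, 0.0, 0.0])))
```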
|
|
15:25-15:30, Paper ThDT5.3 | |
MPC-QP-Based Control Framework for Compliant Behavior of Humanoid Robots in Physical Collaboration with Humans |
|
Kumbhar, Shubham | University of Delaware |
Artemiadis, Panagiotis | University of Delaware |
Keywords: Legged Robots, Human-Robot Collaboration
Abstract: We present a control framework specifically for physical human-humanoid collaboration involving the transportation and manipulation of heavy objects. Using this framework, the humanoid can exhibit desired levels of compliance with the object to be co-transported. This desired compliance is achieved through an admittance model. A Model Predictive Control (MPC) problem, based on a novel Interaction Linear Inverted Pendulum (I-LIP) model, generates footstep patterns that facilitate this desired compliant behavior while keeping the robot stable. Subsequently, an object-informed low-level quadratic program (QP) sends control inputs to realize the high-level plans on the robot. The stiffness parameters of the I-LIP are modulated in real time for better compliance tracking performance of the robot. We verify all results through simulation on the Digit humanoid platform, showing the capability of the framework to collaboratively transport heavy objects with a human.
|
|
15:30-15:35, Paper ThDT5.4 | |
Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control |
|
Alvarez Padilla, Juan Rodolfo | Carnegie Mellon University |
Zhang, John | Carnegie Mellon University |
Kwok, Sofia | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Legged Robots, Multi-Contact Whole-Body Motion Planning and Control, Motion Control
Abstract: This paper presents a system for enabling real-time synthesis of whole-body locomotion and manipulation policies for real-world legged robots. Motivated by recent advancements in robot simulation, we leverage the efficient parallelization capabilities of the MuJoCo simulator on a multi-core CPU to achieve fast sampling over the robot state and action trajectories. Our results show surprisingly effective real-world locomotion and manipulation capabilities with a very simple control strategy. We demonstrate our approach on several hardware and simulation experiments: robust locomotion over flat and uneven terrains, climbing over a box whose height is comparable to the robot, and pushing a box to a goal position. To our knowledge, this is the first successful deployment of whole-body sampling-based MPC on real-world legged robot hardware.
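As a schematic of the sampling-based MPC idea referred to above, the following toy MPPI-style update rolls out many perturbed control sequences on a point mass and averages them by cost; it is serial and low-dimensional, unlike the paper's parallel whole-body rollouts in MuJoCo, and all parameters are assumptions.

```python
# Toy MPPI-style update: sample noisy control sequences, weight by cost, average.
import numpy as np

rng = np.random.default_rng(0)
H, K, SIGMA, LAMBDA = 20, 256, 0.5, 1.0    # horizon, samples, noise, temperature
GOAL = np.array([1.0, 0.0])

def rollout_cost(x0, controls, dt=0.05):
    x = x0.copy()                           # state: [px, py, vx, vy]
    cost = 0.0
    for u in controls:                      # double-integrator point mass
        x[:2] += dt * x[2:]
        x[2:] += dt * u
        cost += np.sum((x[:2] - GOAL) ** 2) + 1e-3 * np.sum(u ** 2)
    return cost

def mppi_step(x0, u_nom):
    noise = rng.normal(0.0, SIGMA, size=(K, H, 2))
    costs = np.array([rollout_cost(x0, u_nom + noise[k]) for k in range(K)])
    w = np.exp(-(costs - costs.min()) / LAMBDA)
    w /= w.sum()
    return u_nom + np.tensordot(w, noise, axes=1)   # cost-weighted noise average

u = np.zeros((H, 2))
u = mppi_step(np.zeros(4), u)                       # one MPC update
```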
|
|
15:35-15:40, Paper ThDT5.5 | |
Wallbounce: Push Wall to Navigate with Contact-Implicit MPC |
|
Liu, Xiaohan | Carnegie Mellon University |
Dai, Cunxi | Carnegie Mellon University |
Zhang, John | Carnegie Mellon University |
Bishop, Arun | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Hollis, Ralph | Carnegie Mellon University |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Optimization and Optimal Control, Body Balancing
Abstract: In this work, we introduce a framework that enables highly maneuverable locomotion using non-periodic contacts. This task is challenging for traditional optimization and planning methods to handle due to difficulties in specifying contact mode sequences in real-time. To address this, we use a bi-level contact-implicit planner and hybrid model predictive controller to draft and execute a motion plan. We investigate how this method allows us to plan arm contact events on the shmoobot, a smaller ballbot, which uses an inverse mouse-ball drive to achieve dynamic balancing with a low number of actuators. Through multiple experiments we show how the arms allow for acceleration, deceleration and dynamic obstacle avoidance that are not achievable with the mouse-ball drive alone. This demonstrates how a holistic approach to locomotion can increase the control authority of unique robot morphologies without additional hardware by leveraging robot arms that are typically used only for manipulation. Project website: https://cmushmoobot.github.io/Wallbounce
|
|
15:40-15:45, Paper ThDT5.6 | |
Reduced-Order Model Guided Contact-Implicit Model Predictive Control for Humanoid Locomotion |
|
Esteban, Sergio | California Institute of Technology |
Kurtz, Vincent | California Institute of Technology |
Ghansah, Adrian | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Whole-Body Motion Planning and Control, Humanoid and Bipedal Locomotion
Abstract: Humanoid robots have great potential for real-world applications due to their ability to operate in environments built for humans, but their deployment is hindered by the challenge of controlling their underlying high-dimensional nonlinear hybrid dynamics. While reduced-order models like the Hybrid Linear Inverted Pendulum (HLIP) are simple and computationally efficient, they lose whole-body expressiveness. Meanwhile, recent advances in Contact-Implicit Model Predictive Control (CI-MPC) enable robots to plan through multiple hybrid contact modes, but remain vulnerable to local minima and require significant tuning. We propose a control framework that combines the strengths of HLIP and CI-MPC. The reduced-order model generates a nominal gait, while CI-MPC manages the whole-body dynamics and modifies the contact schedule as needed. We demonstrate the effectiveness of this approach in simulation with a novel 24 degree-of-freedom humanoid robot: Achilles. Our proposed framework achieves rough terrain walking, disturbance recovery, robustness under model and state uncertainty, and allows the robot to interact with obstacles in the environment, all while running online in real-time at 50 Hz.
|
|
15:45-15:50, Paper ThDT5.7 | |
CAFE-MPC: A Cascaded-Fidelity Model Predictive Control Framework with Tuning-Free Whole-Body Control |
|
Li, He | University of Notre Dame |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Legged Robots, Optimization and Optimal Control, Humanoid and Bipedal Locomotion, Whole-Body Control
Abstract: This work introduces an optimization-based locomotion control framework for on-the-fly synthesis of complex dynamic maneuvers. At the core of the proposed framework is a cascaded-fidelity model predictive controller (CAFE-MPC). CAFE-MPC strategically relaxes the planning problem along the prediction horizon (i.e., with descending model fidelity, increasingly coarse time steps, and relaxed constraints) for computational and performance gains. This problem is numerically solved with an efficient customized multiple-shooting iLQR (MS-iLQR) solver. The action-value function from CAFE-MPC is then used as the basis for a new value-function-based whole-body control (VWBC) technique that avoids additional tuning for the WBC. We show that CAFE-MPC, if configured appropriately, advances the performance of whole-body MPC without necessarily increasing computational cost. Further, we show the superior performance of the proposed VWBC over the Riccati feedback controller in terms of constraint handling. The proposed framework enables accomplishing, for the first time, a gymnastic-style running barrel roll on the MIT Mini Cheetah.
|
|
ThDT6 |
307 |
Perception for Manipulation 3 |
Regular Session |
Chair: Wachs, Juan | Purdue University |
Co-Chair: Ogata, Tetsuya | Waseda University |
|
15:15-15:20, Paper ThDT6.1 | |
Accurate Robotic Pushing Manipulation through Online Model Estimation under Uncertain Object Properties |
|
Lee, Yongseok | Pohang University of Science and Technology |
Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Keywords: Model Learning for Control, Manipulation Planning
Abstract: Robotic pushing is a fundamental non-prehensile manipulation skill essential for handling objects that are difficult to grasp. This letter proposes a highly accurate robotic pushing framework that utilizes an online estimated model to push objects along a given nominal trajectory, despite uncertain object properties such as friction coefficients, mass distribution, and the position of the center of friction (CoF). The core concept involves estimating an optimal pushing motion model capable of representing observed local motions. A generalized form of the conventional analytical model, coupled with a moving-window Unscented Kalman Filter (UKF), serves as the online estimated model. It captures the local behavior of the pushed objects and is integrated with a model predictive control-based pushing strategy to achieve precise pushing performance. In experiments, the proposed robotic pushing framework demonstrated superior accuracy in tracking the given nominal trajectory compared to the conventional analytical model and data-driven model approaches, even when the motion model was perturbed. Additionally, the practicality of the proposed framework was showcased through a demonstration involving an autonomous robot collecting dishes, illustrating its applicability in various real-world applications.
|
|
15:20-15:25, Paper ThDT6.2 | |
Exploring the Domain-Invariant Flow Representation in Vision-Based Tactile Sensors for Omni-Hardness Perception |
|
Yang, Xuewen | Ocean University of China |
Wang, Nan | Ocean University of China |
Gu, Jiayang | Ocean University of China |
Zhang, Yugang | Ocean University of China |
Wang, Guoyu | Ocean University of China |
Song, Aiguo | Southeast University |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing
Abstract: Vision-based tactile sensors have recently gained prominence due to their superior resolution and ability to capture multi-dimensional contact information. However, even when sensors share the same sensing principle, variations in production factors can lead to differences in the color patterns of tactile signals. Unlike common vision tasks, vision-based tactile perception depends on tracking light variation in colorful signals, making it more susceptible to lighting conditions and thus more prone to domain gaps. In this paper, we propose an Omni-hardness perception framework that enables adaptation across various vision-based tactile sensors. Firstly, in-depth analyses of the factors influencing the generalization of hardness perception are presented. Furthermore, the light balance module and the force scale module are coupled to regulate network learning of generalized representations. Experimental results across multiple sensors demonstrate the transferability of learned representations. Additionally, downstream tasks in natural object perception, tumor detection, and grasping stability prediction are proposed to evaluate potential applications. The framework's performance shows promise for advancing general tactile sensing and embodied tactile perception.
|
|
15:25-15:30, Paper ThDT6.3 | |
Focused Blind Switching Manipulation Based on Constrained and Regional Touch States of Multi-Fingered Hand Using Deep Learning |
|
Funabashi, Satoshi | Waseda University |
Hiramoto, Atsumu | Waseda University |
Chiba, Naoya | Osaka University |
Schmitz, Alexander | Waseda University |
Kulkarni, Shardul | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Deep Learning in Grasping and Manipulation, Force and Tactile Sensing, Multifingered Hands
Abstract: To achieve a desired grasping posture (including object position and orientation), multi-finger motions need to be conducted according to the current touch state. Specifically, when subtle changes occur while correcting the object state, not only proprioception but also tactile information from the entire hand can be beneficial. However, switching motions with the high DOFs of multiple fingers and abundant tactile information is still challenging. In this study, we propose a loss function with constraints on touch states and an attention mechanism for focusing on important modalities depending on the touch states. The policy model is an AE-LSTM, which consists of an autoencoder (AE) that compresses the abundant tactile information and a long short-term memory (LSTM) network that switches the motion depending on the touch states. Cap-opening was chosen as the target task, which consists of the subtasks of sliding an object and opening its cap. As a result, the proposed method achieved the best success rates with a variety of objects for real-time cap-opening manipulation. Furthermore, we confirmed that the proposed model acquired the features of each subtask and attended to specific modalities.
|
|
15:30-15:35, Paper ThDT6.4 | |
A Magnetic-Actuated Vision-Based Whisker Array for Contact Perception and Grasping |
|
Hu, Zhixian | Purdue University |
Wachs, Juan | Purdue University |
She, Yu | Purdue University |
Keywords: Perception for Grasping and Manipulation, Grippers and Other End-Effectors, Force and Tactile Sensing
Abstract: Tactile sensing and the manipulation of delicate objects are critical challenges in robotics. This study presents a vision-based magnetic-actuated whisker array sensor that integrates these functions. The sensor features eight whiskers arranged circularly, supported by an elastomer membrane and actuated by electromagnets and permanent magnets. A camera tracks whisker movements, enabling high-resolution tactile feedback. The sensor's performance was evaluated through object classification and grasping experiments. In the classification experiment, the sensor approached objects from four directions and accurately identified five distinct objects with a classification accuracy of 99.17% using a Multi-Layer Perceptron model. In the grasping experiment, the sensor tested configurations of eight, four, and two whiskers, achieving the highest success rate of 87% with eight whiskers. These results highlight the sensor's potential for precise tactile sensing and reliable manipulation.
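The object-classification stage can be illustrated with a small MLP of the kind mentioned above; the feature layout (2D tip displacement per whisker) and the synthetic data are assumptions, and the 99.17% figure refers to the paper's own sensor data, not this sketch.

```python
# Minimal sketch: MLP classifier from tracked whisker deflections to object labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
NUM_WHISKERS, NUM_OBJECTS = 8, 5

# Placeholder features: 2D image-plane displacement per whisker tip.
X = rng.normal(size=(500, NUM_WHISKERS * 2))
y = rng.integers(0, NUM_OBJECTS, size=500)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, y)
print(clf.predict(X[:3]))
```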
|
|
15:35-15:40, Paper ThDT6.5 | |
GAPartManip: A Large-Scale Part-Centric Dataset for Material-Agnostic Articulated Object Manipulation |
|
Cui, Wenbo | Institute of Automation, Chinese Academy of Sciences |
Zhao, Chengyang | Carnegie Mellon University |
Wei, Songlin | Soochow University |
Zhang, Jiazhao | Peking University |
Geng, Haoran | University of California, Berkeley |
Chen, Yaran | Institute of Automation, Chinese Academy of Sciences |
Li, Haoran | Institute of Automation, Chinese Academy of Sciences |
Wang, He | Peking University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomizations and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios. More information and demos can be found at: https://pku-epic.github.io/GAPartManip/.
|
|
15:40-15:45, Paper ThDT6.6 | |
High-Precision Object Pose Estimation Using Visual-Tactile Information for Dynamic Interactions in Robotic Grasping |
|
Peng, Zicai | Beijing Institute of Technology |
Cui, Te | Beijing Institute of Technology |
Chen, Guangyan | Beijing Institute of Technology |
Lu, Haoyang | Beijing Institute of Techonology |
Yang, Yi | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Force and Tactile Sensing, Manipulation Planning, Grasping
Abstract: In various robotic applications, accurate understanding of object poses is essential for high-precision tasks such as factory assembly or daily insertions. Tactile sensing, which complements visual information, offers rich texture-based or force-based data for object pose estimation. However, previous methods for pose estimation typically overlook dynamic situations, such as slippage of grasped objects or movement of contacted objects during interactions with the environment, thus increasing the complexity of pose estimation. To address these challenges, we propose an efficient method that utilizes visual and tactile sensing to estimate object poses through particle filtering. We leverage visual information to track the pose of the contacted object in real-time and estimate the pose changes of the grasped object using displacement data obtained from tactile sensors. Our experimental evaluation on 13 objects with diverse geometric shapes demonstrated the ability to estimate high-precision poses, revealing the robot's ability to cope with dynamic scenes in which objects are forced to move and proving our framework's adaptability in practical scenarios with uncertainty.
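A much simplified sketch of the particle-filter idea in this abstract is given below: particles over a planar object pose are propagated with a tactile-estimated displacement and reweighted against a visual pose measurement; the noise models and measurement values are illustrative assumptions, not the paper's.

```python
# Simplified planar (x, y, yaw) particle filter fusing tactile motion and visual pose.
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.normal(scale=0.01, size=(N, 3))      # particles around a prior pose
weights = np.full(N, 1.0 / N)

def predict(particles, tactile_delta, noise=(0.002, 0.002, 0.01)):
    """Propagate with the tactile-estimated displacement plus process noise."""
    return particles + tactile_delta + rng.normal(scale=noise, size=particles.shape)

def update(particles, weights, visual_pose, sigma=(0.005, 0.005, 0.02)):
    """Reweight by a Gaussian likelihood of the visual pose measurement."""
    err = (particles - visual_pose) / sigma
    w = weights * np.exp(-0.5 * np.sum(err ** 2, axis=1))
    return w / w.sum()

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = predict(particles, tactile_delta=np.array([0.001, 0.0, 0.005]))
weights = update(particles, weights, visual_pose=np.array([0.001, 0.0, 0.004]))
particles, weights = resample(particles, weights)
estimate = np.average(particles, weights=weights, axis=0)
```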
|
|
15:45-15:50, Paper ThDT6.7 | |
Object-Aware Impedance Control for Human-Robot Collaborative Task with Online Object Parameter Estimation (I) |
|
Park, Jinseong | Korea Institute of Machinery and Materials |
Shin, Young-Sik | KIMM |
Kim, Sanghyun | Kyung Hee University |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Compliance and Impedance Control
Abstract: Physical human-robot interactions (pHRIs) can improve robot autonomy and reduce physical demands on humans. In this paper, we consider a collaborative task with a considerably long object and no prior knowledge of the object's parameters. An integrated control framework with an online object parameter estimator and a Cartesian object-aware impedance controller is proposed to realize complicated scenarios. During the transportation task, the object parameters are estimated online while a robot and human keep lifting an object. The perturbation motion is incorporated into the null space of the desired trajectory to enhance the estimator precision. An object-aware impedance controller is designed by incorporating the real-time estimation results to effectively transmit the intended human motion to the robot through the object. Experimental demonstrations of collaborative tasks, including object transportation and assembly, are implemented to show the effectiveness of our proposed method. The proposed controller was also compared to a conventional impedance controller through subjective testing and found to be more sensitive, requiring less human effort.
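The admittance model referred to above can be sketched as a discrete-time Cartesian mass-damper-spring, M dv/dt + D v + K x = F_ext, integrated at the control rate; the parameter values below are placeholders, not the paper's tuned settings.

```python
# Minimal discrete-time admittance model: external force -> compliant reference motion.
import numpy as np

M = np.diag([5.0, 5.0, 5.0])      # virtual mass
D = np.diag([40.0, 40.0, 40.0])   # virtual damping
K = np.diag([0.0, 0.0, 0.0])      # zero stiffness -> free-floating compliance
DT = 0.002                        # assumed 500 Hz control rate

def admittance_step(x, v, f_ext):
    """Return updated offset x and velocity v of the compliant reference."""
    acc = np.linalg.solve(M, f_ext - D @ v - K @ x)
    v = v + DT * acc
    x = x + DT * v
    return x, v

x, v = np.zeros(3), np.zeros(3)
x, v = admittance_step(x, v, f_ext=np.array([2.0, 0.0, 0.0]))  # e.g., a human push
```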
|
|
ThDT7 |
309 |
Navigation Planning |
Regular Session |
Chair: Andersson, Olov | KTH Royal Institute of Technology |
Co-Chair: Petit, Louis | Université De Sherbrooke |
|
15:15-15:20, Paper ThDT7.1 | |
SARO: Space-Aware Robot System for Terrain Crossing Via Vision-Language Model |
|
Zhu, Shaoting | Tsinghua University |
Li, Derun | Shanghai Jiao Tong University |
Mou, Linzhan | University of Pennsylvania |
Liu, Yong | Zhejiang University |
Xu, Ningyi | Shanghai Jiao Tong University |
Zhao, Hang | Tsinghua University |
Keywords: AI-Enabled Robotics, Legged Robots, Autonomous Agents
Abstract: The application of vision-language models (VLMs) has achieved impressive success in various robotics tasks. However, there have been few explorations of foundation models for quadruped robot navigation across terrains in 3D environments. We introduce SARO (Space-Aware Robot System for Terrain Crossing), an innovative system composed of a high-level reasoning module, a closed-loop sub-task execution module, and a low-level control policy. It enables the robot to navigate across 3D terrains and reach the goal position. For high-level reasoning and execution, we propose a novel algorithmic system that takes advantage of a VLM, with a task-decomposition design and a closed-loop sub-task execution mechanism. For low-level locomotion control, we utilize the Probability Annealing Selection (PAS) method to effectively train a control policy by reinforcement learning. Numerous experiments show that our whole system can accurately and robustly navigate across several 3D terrains, and its generalization ability supports applications in diverse indoor and outdoor scenarios and terrains. The appendix and videos can be found on the project page: https://saro-vlm.github.io/
|
|
15:20-15:25, Paper ThDT7.2 | |
Lab2Car: A Versatile Wrapper for Deploying Experimental Planners in Complex Real-World Environments |
|
Heim, Marc | Motional AD |
Suárez-Ruiz, Francisco | Motional Inc |
Bhuiyan, Ishraq | Motional |
Brito, Bruno | TU Delft |
Tomov, Momchil | Motional |
Keywords: Autonomous Agents, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Human-level autonomous driving is an ever-elusive goal, with planning and decision making -- the cognitive functions that determine driving behavior -- posing the greatest challenge. Despite a proliferation of promising approaches, progress is stifled by the difficulty of deploying experimental planners in naturalistic settings. In this work, we propose Lab2Car, an optimization-based wrapper that can take a trajectory sketch from an arbitrary motion planner and convert it to a safe, comfortable, dynamically feasible trajectory that the car can follow. This allows motion planners that do not provide such guarantees to be safely tested and optimized in real-world environments. We demonstrate the versatility of Lab2Car by using it to deploy a machine learning (ML) planner and a classical planner on self-driving cars in Las Vegas. The resulting systems handle challenging scenarios, such as cut-ins, overtaking, and yielding, in complex urban environments like casino pick-up/drop-off areas. Our work paves the way for quickly deploying and evaluating candidate motion planners in realistic settings, ensuring rapid iteration and accelerating progress towards human-level autonomy.
|
|
15:25-15:30, Paper ThDT7.3 | |
One Map to Find Them All: Real-Time Open-Vocabulary Mapping for Zero-Shot Multi-Object Navigation |
|
Busch, Finn Lukas | KTH Royal Institute of Technology |
Homberger, Timon | KTH Royal Institute of Technology |
Ortega Peimbert, Jesús Gerardo | KTH Royal Institute of Technology |
Yang, Quantao | KTH Royal Institute of Technology |
Andersson, Olov | KTH Royal Institute |
Keywords: Semantic Scene Understanding, AI-Enabled Robotics, Autonomous Agents
Abstract: The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverage this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks in both simulation as well as with a real robot, running in real-time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches both on single and multi-object navigation tasks. Additional videos, code and the multi-object navigation benchmark will be available on https://finnbsch.github.io/OneMap.
|
|
15:30-15:35, Paper ThDT7.4 | |
Exploring Adversarial Obstacle Attacks in Search-Based Path Planning for Autonomous Mobile Robots |
|
Szvoren, Adrian | University College London |
Liu, Jianwei | University College London |
Kanoulas, Dimitrios | University College London |
Tuptuk, Nilufer | University College London |
Keywords: Autonomous Agents, Constrained Motion Planning, Performance Evaluation and Benchmarking
Abstract: Path planning algorithms, such as the search-based A*, are a critical component of autonomous mobile robotics, enabling robots to navigate from a starting point to a destination efficiently and safely. We investigated the resilience of the A* algorithm in the face of potential adversarial interventions known as obstacle attacks. The adversary’s goal is to delay the robot’s timely arrival at its destination by introducing obstacles along its original path. We developed malicious software to execute the attacks and conducted experiments to assess their impact, both in simulation using TurtleBot in Gazebo and in real-world deployment with the Unitree Go1 robot. In simulation, the attacks resulted in an average delay of 36%, with the most significant delays occurring in scenarios where the robot was forced to take substantially longer alternative paths. In real-world experiments, the delays were even more pronounced, with all attacks successfully rerouting the robot and causing measurable disruptions. These results highlight that the algorithm’s robustness is not solely an attribute of its design but is significantly influenced by the operational environment. For example, in constrained environments like tunnels, the delays were maximized due to the limited availability of alternative routes.
|
|
15:35-15:40, Paper ThDT7.5 | |
Topological Mapping for Traversability-Aware Long-Range Navigation in Off-Road Terrain |
|
Tremblay, Jean-François | McGill University |
Alhosh, Julie | McGill University |
Petit, Louis | Université De Sherbrooke |
Lotfi, Faraz | McGill University |
Landauro, Lara | McGill University |
Meger, David Paul | McGill University |
Keywords: Field Robots, Integrated Planning and Learning, Vision-Based Navigation
Abstract: Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topological map, which is a set of panoramic snapshots connected with edges containing traversability information. A novel traversability analysis method is demonstrated, predicting the existence of a safe path towards a target in an image. Navigating between nodes is done using goal-conditioned behavior cloning, leveraging the power of a pretrained vision transformer. An exploration planner is presented, efficiently covering an unknown off-road area with unknown traversability using a frontiers-based approach. The approach is successfully deployed to autonomously explore two 400 m² forest sites unseen during training, in difficult conditions for navigation.
|
|
15:40-15:45, Paper ThDT7.6 | |
GPU-Enabled Parallel Trajectory Optimization Framework for Safe Motion Planning of Autonomous Vehicles |
|
Lee, Yeongseok | Korea Advanced Institute of Science and Technology |
Choi, Keun Ha | Korea Advanced Institute of Science and Technology |
Kim, Kyung-Soo | KAIST(Korea Advanced Institute of Science and Technology) |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Integrated Planning and Control
Abstract: This paper presents a GPU-enabled parallel trajectory optimization framework for model predictive control (MPC) in complex urban environments. It fuses the advantages of sampling-based MPC, which can cope with nonconvex costmaps through random sampling of trajectories, with the advantages of gradient-based MPC, which can generate smooth trajectories. In addition, we leverage a generalized safety-embedded MPC problem definition with a discrete barrier state (DBaS). The proposed framework has three steps: 1) a costmap builder to generate the barrier function map, 2) a seed trajectory generator to choose randomly generated trajectories to send to the optimizers, and 3) a batch trajectory optimizer to optimize each of the seed trajectories and select the best trajectory. Experiments with real-time simulations compare the effectiveness of the proposed framework, sampling-based MPC, and gradient-based MPC, which optimizes a single trajectory. The experiments also compare the application of two different control sequence sampling schemes to the proposed framework. The results show that the proposed framework performs gradient-based optimization but can plan a better trajectory even in complex environments by providing various initial guesses. We also show that the proposed framework can perform more accurate control actions than sampling-based MPC.
|
|
15:45-15:50, Paper ThDT7.7 | |
A Real-Time Spatio-Temporal Trajectory Planner for Autonomous Vehicles with Semantic Graph Optimization |
|
He, Shan | Beihang University |
Ma, Yalong | Beihang University |
Song, Tao | Beihang University |
Jiang, Yongzhi | Beihang University |
Wu, Xinkai | Beihang University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Planning a safe and feasible trajectory for autonomous vehicles in real-time by fully utilizing perceptual information in complex urban environments is challenging. In this paper, we propose a spatio-temporal trajectory planning method based on graph optimization. It efficiently extracts the multi-modal information of the perception module by constructing a semantic spatio-temporal map, processing static and dynamic obstacles separately, and then quickly generates feasible trajectories via sparse graph optimization based on a semantic spatio-temporal hypergraph. Extensive experiments have proven that the proposed method can effectively handle complex urban public road scenarios and perform in real time. We will also release our code to facilitate benchmarking by the research community.
|
|
ThDT8 |
311 |
Collision Avoidance 1 |
Regular Session |
Chair: Christensen, Henrik | University of California, San Diego |
Co-Chair: Hereid, Ayonga | Ohio State University |
|
15:15-15:20, Paper ThDT8.1 | |
Sailing through Point Clouds: Safe Navigation Using Point Cloud Based Control Barrier Functions |
|
Dai, Bolun | New York University |
Khorrambakht, Rooholla | New York University |
Krishnamurthy, Prashanth | New York University Tandon School of Engineering |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Robot Safety, Collision Avoidance, Motion and Path Planning
Abstract: The capability to navigate safely in an unstructured environment is crucial when deploying robotic systems in real-world scenarios. Recently, control barrier function (CBF) based approaches have been highly effective in synthesizing safety-critical controllers. In this work, we propose a novel CBF-based local planner comprised of two components: Vessel and Mariner. The Vessel is a novel scaling factor based CBF formulation that synthesizes CBFs using only point cloud data. The Mariner is a CBF-based preview control framework that is used to mitigate getting stuck in spurious equilibria during navigation. To demonstrate the efficacy of our proposed approach, we first compare the proposed point cloud based CBF formulation with other point cloud based CBF formulations. Then, we demonstrate the performance of our proposed approach and its integration with global planners using experimental studies on the Unitree B1 and Unitree Go2 quadruped robots in various environments.
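The general CBF-QP safety-filter pattern underlying approaches like this one can be sketched as follows for a single-integrator robot and one barrier built from the closest point-cloud point; this is a generic illustration under assumed values, not the Vessel/Mariner formulation itself.

```python
# Generic CBF-QP safety filter: minimally modify a nominal command so that
# the barrier condition h_dot >= -alpha * h holds for one obstacle point.
import numpy as np
import cvxpy as cp

ALPHA = 1.0
robot = np.array([0.0, 0.0])
obstacle_pt = np.array([1.0, 0.5])        # assumed closest point-cloud point
r_safe = 0.4

# h(x) = ||x - p||^2 - r^2, grad_h = 2 (x - p); single-integrator xdot = u
h = np.sum((robot - obstacle_pt) ** 2) - r_safe ** 2
grad_h = 2.0 * (robot - obstacle_pt)

u_nom = np.array([0.5, 0.3])              # command from a global planner (assumed)
u = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                  [grad_h @ u >= -ALPHA * h])
prob.solve()
print(u.value)                            # filtered, safety-respecting command
```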
|
|
15:20-15:25, Paper ThDT8.2 | |
Parallel-Constraint Model Predictive Control: Exploiting Parallel Computation for Improving Safety |
|
Fontanari, Elias | University of Trento |
Lunardi, Gianni | University of Trento |
Saveriano, Matteo | University of Trento |
Del Prete, Andrea | University of Trento |
Keywords: Robot Safety, Optimization and Optimal Control, Motion Control
Abstract: Ensuring constraint satisfaction is a key requirement for safety-critical systems, which include most robotic platforms. For example, constraints can be used for modeling joint position/velocity/torque limits and collision avoidance. Constrained systems are often controlled using Model Predictive Control, because of its ability to naturally handle constraints relying on numerical optimization. However, ensuring constraint satisfaction is challenging for nonlinear systems/constraints. A well-known tool to make controllers safe is the so-called control-invariant set (a.k.a. safe set). In our previous work we have shown that safety can be improved by letting the safe set constraint recede along the horizon. In this paper we push that idea further. We suggest to exploit parallel computation for solving several MPC problems at the same time. Each problem instantiates the safe set constraint at a different time step along the horizon. Finally, the controller can select the best solution according to some user-defined criteria. We validated this idea through extensive simulations with a 3-joint robotic arm, showing that significant improvements can be achieved, even using as little as 4 computational cores.
|
|
15:25-15:30, Paper ThDT8.3 | |
Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking |
|
Zhang, Wei | Harbin Institute of Techonolgy |
Li, Pengfei | Institute for AI Industry Research (AIR), Tsinghua University |
Wang, Junli | Institute of Automation, Chinese Academy of Sciences |
Sun, Bingchuan | Lenovo |
Jin, Qihao | Fudan University |
Bao, Guangjun | Lenovo |
Yu, Yang | Lenovo |
Ding, Wenchao | Fudan University |
Li, Peng | Tsinghua University |
Chen, Yilun | Tsinghua University |
Keywords: Autonomous Agents, Semantic Scene Understanding
Abstract: Automatic Emergency Braking (AEB) systems are a crucial component in ensuring the safety of passengers in autonomous vehicles. Conventional AEB systems primarily rely on closed-set perception modules to recognize traffic conditions and assess collision risks. To enhance the adaptability of AEB systems in open scenarios, we propose Dual-AEB, a system that combines an advanced multimodal large language model (MLLM) for comprehensive scene understanding and a conventional rule-based rapid AEB to ensure quick response times. To the best of our knowledge, Dual-AEB is the first method to incorporate MLLMs within AEB systems. Through extensive experimentation, we have validated the effectiveness of our method.
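The rule-based rapid-braking half of such a system can be illustrated with a simple time-to-collision trigger; the thresholds and the interface to the MLLM-based scene understanding are assumptions made only for illustration.

```python
# Toy rule-based AEB trigger using time-to-collision (TTC).
def rule_based_aeb(gap_m, closing_speed_mps, ttc_brake_s=1.5, ttc_warn_s=2.5):
    """Return 'brake', 'warn', or 'none' given range and closing speed."""
    if closing_speed_mps <= 0.0:          # opening gap: not on a collision course
        return "none"
    ttc = gap_m / closing_speed_mps
    if ttc < ttc_brake_s:
        return "brake"
    if ttc < ttc_warn_s:
        return "warn"
    return "none"

print(rule_based_aeb(gap_m=12.0, closing_speed_mps=10.0))   # -> 'brake'
```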
|
|
15:30-15:35, Paper ThDT8.4 | |
Estimating Control Barriers from Offline Data |
|
Yu, Hongzhan | University of California San Diego |
Farrell, Seth | University of California San Diego |
Yoshimitsu, Ryo | IHI Corporation |
Qin, Zhizhen | University of California, San Diego |
Christensen, Henrik | University of California, San Diego |
Gao, Sicun | UCSD |
Keywords: AI-Based Methods, Robot Safety, Collision Avoidance
Abstract: Learning-based methods for constructing control barrier functions (CBFs) are gaining popularity for enforcing safety in practical robot control under complex dynamics and uncertainty that are hard to model. A major limitation of existing methods is their reliance on extensive sampling over the state space, making it hard to construct CBFs on real robots. In this work we introduce methods for learning neural CBFs from a fixed, sparsely labeled dataset collected prior to training either the CBFs or the controllers. We propose novel annotation techniques based on out-of-distribution analysis to effectively propagate the information from the limited labeled data to the unlabeled data. We evaluate the proposed algorithm on real-world platforms. With a limited amount of offline data, the proposed methods can achieve state-of-the-art performance for dynamic obstacle avoidance, with statistically safer and less conservative maneuvers compared to existing methods.
|
|
15:35-15:40, Paper ThDT8.5 | |
Real-Time Safe Bipedal Robot Navigation Using Linear Discrete Control Barrier Functions |
|
Peng, Chengyang | The Ohio State University |
Paredes, Victor | The Ohio State University |
Castillo, Guillermo A. | The Ohio State University |
Hereid, Ayonga | Ohio State University |
Keywords: Humanoid and Bipedal Locomotion, Integrated Planning and Control, Collision Avoidance
Abstract: Safe navigation in real time is an essential task for humanoid robots in real-world deployment. Since humanoid robots are inherently underactuated due to unilateral ground contacts, a path is considered safe if it is obstacle-free and respects the robot's physical limitations and underlying dynamics. Existing approaches often decouple path planning from gait control due to the significant computational challenge caused by the full-order robot dynamics. In this work, we develop a unified, safe path and gait planning framework that can be evaluated online in real-time, allowing the robot to navigate cluttered environments while sustaining stable locomotion. Our approach uses the popular Linear Inverted Pendulum (LIP) model as a template model to represent walking dynamics. It incorporates heading angles in the model to properly evaluate kinematic constraints essential for physically feasible gaits. In addition, we leverage discrete control barrier functions (DCBF) for obstacle avoidance, ensuring that the subsequent foot placement provides a safe navigation path within cluttered environments. To guarantee real-time computation, we use a novel approximation of the DCBF to produce linear DCBF constraints. We validate our proposed approach in simulation using a Digit robot in randomly generated environments. The results demonstrate that the proposed approach can generate safe gaits for a non-trivial humanoid robot to navigate a cluttered environment in real time.
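A minimal sketch of the LIP template and a discrete-CBF-style foot-placement check, in the spirit of the framework above, is given below; the step time, obstacle, and decay rate are illustrative assumptions, and the sketch omits the heading angles and the linearization that the paper introduces.

```python
# Sketch: analytic LIP step evolution plus a discrete CBF safety condition.
import numpy as np

G, Z0, T_STEP = 9.81, 0.9, 0.4
OMEGA = np.sqrt(G / Z0)

def lip_step(x, v, foot, T=T_STEP):
    """Analytic LIP evolution of CoM position/velocity about a fixed foot."""
    c, s = np.cosh(OMEGA * T), np.sinh(OMEGA * T)
    dx = x - foot
    return foot + c * dx + (s / OMEGA) * v, OMEGA * s * dx + c * v

def dcbf_ok(foot_next, obstacle, r=0.35, gamma=0.5, h_prev=None):
    """Discrete CBF condition h_{k+1} >= (1 - gamma) * h_k for a disk obstacle."""
    h_next = np.linalg.norm(foot_next - obstacle) - r
    return h_next >= (1.0 - gamma) * (h_prev if h_prev is not None else h_next)

x, v = np.array([0.0, 0.0]), np.array([0.4, 0.0])
foot_candidate = np.array([0.35, 0.1])
x, v = lip_step(x, v, foot_candidate)
print(dcbf_ok(foot_candidate, obstacle=np.array([1.0, 0.0]), h_prev=0.6))
```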
|
|
15:40-15:45, Paper ThDT8.6 | |
FuzzRisk: Online Collision Risk Estimation for Autonomous Vehicles Based on Depth-Aware Object Detection Via Fuzzy Inference |
|
Liao, Brian Hsuan-Cheng | DENSO AUTOMOTIVE Deutschland GmbH |
Xu, Yingjie | Technical University of Munich |
Cheng, Chih-Hong | Chalmers University of Technology |
Esen, Hasan | DENSO AUTOMOTIVE Deutschland GmbH |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Object Detection, Segmentation and Categorization, Robot Safety, Intelligent Transportation Systems
Abstract: This paper presents a novel monitoring framework that infers the level of collision risk for autonomous vehicles (AVs) based on their object detection performance. The framework takes two sets of predictions from different algorithms and associates their inconsistencies with the collision risk via fuzzy inference. The first set of predictions is obtained by retrieving safety-critical 2.5D objects from a depth map, and the second set comes from the ordinary AV's 3D object detector. We experimentally validate that, based on Intersection-over-Union (IoU) and a depth discrepancy measure, the inconsistencies between the two sets of predictions strongly correlate to the error of the 3D object detector against ground truths. This correlation allows us to construct a fuzzy inference system and map the inconsistency measures to an AV collision risk indicator. In particular, we optimize the fuzzy inference system towards an existing offline metric that matches AV collision rates well. Lastly, we validate our monitor's capability to produce relevant risk estimates with the large-scale nuScenes dataset and demonstrate that it can safeguard an AV in closed-loop simulations.
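The fuzzy-inference mapping described above can be illustrated with a toy rule base over an IoU-based inconsistency and a depth discrepancy; the membership breakpoints and consequent risk levels below are invented for the example and are not the paper's tuned system.

```python
# Toy fuzzy inference: two inconsistency measures -> scalar collision risk.
def tri(x, a, b, c):
    """Triangular membership with peak at b over support [a, c]."""
    return max(0.0, min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)))

def collision_risk(iou_inconsistency, depth_discrepancy_m):
    low_i  = tri(iou_inconsistency, -0.1, 0.0, 0.5)
    high_i = tri(iou_inconsistency,  0.3, 1.0, 1.1)
    low_d  = tri(depth_discrepancy_m, -0.5, 0.0, 2.0)
    high_d = tri(depth_discrepancy_m,  1.0, 5.0, 6.0)
    # Rules: (antecedent firing strength, consequent risk level in [0, 1])
    rules = [(min(low_i, low_d), 0.1),
             (max(high_i, high_d), 0.6),
             (min(high_i, high_d), 0.95)]
    num = sum(w * r for w, r in rules)
    den = sum(w for w, _ in rules) + 1e-9
    return num / den                      # weighted-average defuzzification

print(collision_risk(iou_inconsistency=0.8, depth_discrepancy_m=3.0))
```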
|
|
15:45-15:50, Paper ThDT8.7 | |
Adaptive Deadlock Avoidance for Decentralized Multi-Agent Systems Via CBF-Inspired Risk Measurement |
|
Zhang, Yanze | University of Illinois Chicago |
Lyu, Yiwei | Carnegie Mellon University |
Jo, Siwon | University of North Carolina at Charlotte |
Yang, Yupeng | University of North Carolina at Charlotte |
Luo, Wenhao | University of Illinois Chicago |
Keywords: Autonomous Agents, Agent-Based Systems, Multi-Robot Systems
Abstract: Decentralized safe control plays an important role in multi-agent systems given the scalability and robustness without reliance on a central authority. However, without an explicit global coordinator, the decentralized control methods are often prone to deadlock --- a state where the system reaches equilibrium, causing the robots to stall. In this paper, we propose a generalized decentralized framework that unifies the Control Lyapunov Function (CLF) and Control Barrier Function (CBF) to facilitate efficient task execution and ensure deadlock-free trajectories for the multi-agent systems. As the agents approach the deadlock-related undesirable equilibrium, the framework can detect the equilibrium and drive agents away before that happens. This is achieved by a secondary deadlock resolution design with an auxiliary CBF to prevent the multi-agent systems from converging to the undesirable equilibrium. To avoid dominating effects due to the deadlock resolution over the original task-related controllers, a deadlock indicator function using CBF-inspired risk measurement is proposed and encoded in the unified framework for the agents to adaptively determine when to activate the deadlock resolution. This allows the agents to follow their original control tasks and seamlessly unlock or deactivate deadlock resolution as necessary, effectively improving task efficiency. We demonstrate the effectiveness of the proposed method through theoretical analysis, numerical simulations, and real-world experiments.
|
|
ThDT9 |
312 |
Task and Motion Planning 3 |
Regular Session |
Chair: Park, Shinkyu | KAUST |
Co-Chair: Mukherjee, Koena | NIT Silchar |
|
15:15-15:20, Paper ThDT9.1 | |
SEAL: A Sample-Efficient Adjustment-Learning Method for Table Tennis Robot Serve |
|
Guo, Qitong | The University of Tokyo |
Shi, Xiaohang | The University of Tokyo |
Murakami, Kenichi | The University of Tokyo |
Jia, Ruoyu | The University of Tokyo |
Yamakawa, Yuji | The University of Tokyo |
Keywords: Machine Learning for Robot Control, Task and Motion Planning, Manipulation Planning
Abstract: Table tennis robots have significantly advanced in performance owing to the rapid progress in deep learning and reinforcement learning technologies. However, these advancements often require a large number of training samples. Moreover, research focused on the robot serve task remains relatively limited. In response to these problems, this paper proposes a sample-efficient adjustment-learning (SEAL) method for the serve task inspired by human experience in table tennis, which can inherently augment the available training samples without the need for additional sample collection. Adjustment learning does not require complex network structures but demonstrates superior performance. The models trained by adjustment learning generalize well and are robust, adapting to different serve styles and reducing system transfer errors very efficiently. In addition, a random interpolation method is introduced during the dataset generation stage, and the effectiveness of simultaneous learning in both joint space and Cartesian space is also demonstrated. For the specific serve task, an accuracy of less than 30 mm to any designated position is achieved at the first shot.
|
|
15:20-15:25, Paper ThDT9.2 | |
Inference Based Multi-Object Reactive Search in a Partially Known Environment with Temporal Logic Specifications |
|
Kang, Yaohui | University of Science and Technology of China |
Chen, Ziyang | University of Science and Technology of China |
Xia, Yanjie | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Task and Motion Planning, Formal Methods in Robotics and Automation
Abstract: Efficiently searching for multiple objects in a partially known environment, where only the names and locations of landmarks are available, presents significant challenges. Existing search algorithms in the literature fail to fully utilize prior knowledge to improve search efficiency, and exhibit significantly diminished efficiency when extended to multi-object search. To address these limitations, we propose an inference-based multi-object reactive search framework. This framework utilizes the COMET inference model to reason about co-occurrence values between the target objects and known landmarks, thereby enhancing search efficiency. These co-occurrence values are integrated into a reactive temporal logic motion planning strategy, which allows the robot to search for multiple objects under temporal logic constraints specified in LTL and to adapt dynamically if the inferred reasoning differs from the actual object arrangement encountered during the search. Extensive simulations were conducted to evaluate the feasibility and efficiency of the proposed motion planning algorithm. Results demonstrate that the integration of commonsense reasoning with reactive temporal logic planning significantly improves multi-object search efficiency. Project website: https://sites.google.com/view/imors.
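A minimal sketch of the co-occurrence-guided idea described above: landmarks are visited in decreasing order of an inferred co-occurrence score with the target object. The landmark names and scores are placeholders standing in for what a commonsense model such as COMET would provide; this is not the paper's implementation.

def plan_visit_order(target, landmarks, cooccurrence):
    """Sort landmarks by how likely the target co-occurs with them."""
    return sorted(landmarks, key=lambda lm: cooccurrence.get((target, lm), 0.0),
                  reverse=True)

landmarks = ["kitchen_counter", "bookshelf", "bathroom_sink"]
cooccurrence = {("mug", "kitchen_counter"): 0.9,     # placeholder inferred scores
                ("mug", "bookshelf"): 0.3,
                ("mug", "bathroom_sink"): 0.2}
print(plan_visit_order("mug", landmarks, cooccurrence))
# ['kitchen_counter', 'bookshelf', 'bathroom_sink']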
|
|
15:25-15:30, Paper ThDT9.3 | |
Planning with Adaptive World Models for Autonomous Driving |
|
Vasudevan, Arun Balajee | Carnegie Mellon University |
Peri, Neehar | Carnegie Mellon University |
Schneider, Jeff | Carnegie Mellon University |
Ramanan, Deva | Carnegie Mellon University |
Keywords: Task and Motion Planning, Behavior-Based Systems, Robust/Adaptive Control
Abstract: Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive simulator. We analyze the characteristics of nuPlan's recorded logs and find that each city has its own unique driving behaviors, suggesting that robust planners must adapt to different environments. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) that predicts reactive agent behaviors using features derived from recently-observed agent histories; intuitively, some aggressive agents may tailgate lead vehicles, while others may not. To model such phenomena, BehaviorNet predicts the parameters of an agent's motion controller rather than directly predicting its spacetime trajectory (as most forecasters do). Finally, we present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions. Our extensive experiments demonstrate that AdaptiveDriver achieves state-of-the-art results on the nuPlan closed-loop planning benchmark, improving over prior work by 2% on Test-14 Hard R-CLS, and generalizes even when evaluated on never-before-seen cities.
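To illustrate the design choice of predicting controller parameters rather than trajectories, the sketch below uses the Intelligent Driver Model (IDM) as the per-agent reactive controller; IDM is an assumption for illustration, and the parameter values stand in for what a network like BehaviorNet could output.

# IDM as an assumed example of a parameterized motion controller; p holds the
# per-agent parameters a behavior network could predict.
import math

def idm_acceleration(v, v_lead, gap, p):
    dv = v - v_lead                                       # approach rate to lead vehicle
    s_star = p["s0"] + v * p["T"] + v * dv / (2.0 * math.sqrt(p["a_max"] * p["b"]))
    return p["a_max"] * (1.0 - (v / p["v0"]) ** 4 - (s_star / max(gap, 0.1)) ** 2)

aggressive = {"v0": 15.0, "T": 0.8, "s0": 1.0, "a_max": 2.0, "b": 2.5}  # tailgater-like
cautious   = {"v0": 12.0, "T": 2.0, "s0": 3.0, "a_max": 1.0, "b": 1.5}
print(idm_acceleration(10.0, 8.0, 12.0, aggressive))
print(idm_acceleration(10.0, 8.0, 12.0, cautious))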
|
|
15:30-15:35, Paper ThDT9.4 | |
Subassembly to Full Assembly: Effective Assembly Sequence Planning through Graph-Based Reinforcement Learning |
|
Shu, Chang | KAUST |
Kim, Anton | KAUST |
Park, Shinkyu | KAUST |
Keywords: Task and Motion Planning, Assembly, Manipulation Planning
Abstract: This paper proposes an assembly sequence planning framework, named Subassembly to Assembly (S2A). The framework is designed to enable a robotic manipulator to assemble multiple parts in a prespecified structure by leveraging object manipulation actions. The primary technical challenge lies in the exponentially increasing complexity of identifying a feasible assembly sequence as the number of parts grows. To address this, we introduce a graph-based reinforcement learning approach, where a graph attention network is trained using a delayed reward assignment strategy. In this strategy, rewards are assigned only when an assembly action contributes to the successful completion of the assembly task. We validate the framework's performance through physics-based simulations, comparing it against various baselines to emphasize the significance of the proposed reward assignment approach. Additionally, we demonstrate the feasibility of deploying our framework in a real-world robotic assembly scenario.
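A minimal sketch of the delayed reward assignment idea: intermediate assembly actions receive no immediate reward, and credit is assigned only once the full assembly succeeds. The function names and discount value are assumptions, not the paper's exact scheme.

def assign_delayed_rewards(episode_actions, assembly_succeeded, gamma=0.95):
    """One reward per action; all zero unless the episode ended in success."""
    n = len(episode_actions)
    if not assembly_succeeded:
        return [0.0] * n
    # Terminal reward of 1 propagated backwards with discounting.
    return [gamma ** (n - 1 - t) for t in range(n)]

print(assign_delayed_rewards(["place_leg", "place_leg", "place_top"], True))
print(assign_delayed_rewards(["place_leg", "remove_leg"], False))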
|
|
15:35-15:40, Paper ThDT9.5 | |
Fuel-Optimal Operational Speed Planning for Autonomous Trucking on Highways |
|
Li, Wei | Inceptio |
Wu, Bin | Inceptio |
Xiang, Jiahao | Tongji University, Inceptio Technology |
Ren, Jiaping | Inceptio Technology |
Wu, Yi | Nanjing University of Posts and Telecommunications |
Yang, Ruigang | University of Kentucky |
Keywords: Task and Motion Planning, Logistics, Planning, Scheduling and Coordination
Abstract: The rapid advancement of autonomous driving technology, particularly in autonomous trucking on highways, shows great value for enhancing efficiency and reducing costs in the logistics industry. In this work, we define the full-trip speed planning problem for autonomous trucks under delivery time and fuel consumption constraints, referred to as the Operational Speed Planning (OSP) problem. To support and accelerate research on the OSP problem, we have developed a comprehensive dataset using a fleet of over 400 trucks. The dataset contains rich, diverse information covering more than 22 million kilometers of real-world highway driving data. In addition to this static dataset, we have developed a closed-loop simulator that allows for the interactive evaluation of OSP solutions, enabling researchers to test speed planning strategies in a realistic environment. Furthermore, we provide an OSP baseline method based on dynamic programming to optimize speed planning, balancing the delivery time requirements and fuel consumption. Our extensive experiments demonstrate both the accuracy of the simulation and the effectiveness of the OSP baseline in planning optimal speeds, proving its capability to meet time constraints while improving fuel efficiency. The dataset, simulator, and baseline will be made publicly available to foster further research and innovation in this area.
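A toy dynamic-programming sketch of the kind of full-trip speed planning described above: one speed per road segment is chosen to trade off fuel against trip time, with a small penalty on speed changes. The fuel model, segment data, and weights are all assumptions, not the Inceptio baseline.

# Toy DP over discretized speeds: lam penalizes trip time, mu penalizes speed changes.
def fuel_rate(v, grade):                      # placeholder fuel-rate model
    return 0.05 + 0.002 * v ** 2 + 0.5 * max(grade, 0.0)

def plan_speeds(segments, speeds, lam=1.0, mu=0.05):
    """segments: (length_m, grade) pairs; speeds: candidate speeds in m/s."""
    best = {v: 0.0 for v in speeds}           # cost-to-come keyed by current speed
    picks = []
    for length, grade in segments:
        nxt, pick = {}, {}
        for v in speeds:
            seg_cost = (fuel_rate(v, grade) + lam) * (length / v)
            cand = {vp: best[vp] + mu * (v - vp) ** 2 for vp in speeds}
            vp_best = min(cand, key=cand.get)
            nxt[v] = cand[vp_best] + seg_cost
            pick[v] = vp_best
        best = nxt
        picks.append(pick)
    v = min(best, key=best.get)               # backtrace the optimal speed profile
    profile = [v]
    for pick in reversed(picks):
        v = pick[v]
        profile.append(v)
    return list(reversed(profile))[1:]        # drop the artificial pre-trip speed

print(plan_speeds([(1000, 0.0), (1000, 0.05), (1000, -0.02)], [20.0, 25.0, 30.0]))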
|
|
15:40-15:45, Paper ThDT9.6 | |
Verifiably Following Complex Robot Instructions with Foundation Models |
|
Quartey, Benedict | Brown University |
Rosen, Eric | The AI Institute |
Tellex, Stefanie | Brown |
Konidaris, George | Brown University |
Keywords: Task and Motion Planning, Mobile Manipulation, Semantic Scene Understanding
Abstract: When instructing robots, users want to flexibly express constraints, refer to arbitrary landmarks, and verify robot behavior, while robots must disambiguate instructions into specifications and ground instruction referents in the real world. To address this problem, we propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow complex, open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot’s alignment with an instructor’s intended motives and affords the synthesis of correct-by-construction robot behaviors. We conduct a large-scale evaluation of LIMP on 150 instructions across five real-world environments, demonstrating its versatility and ease of deployment in diverse, unstructured domains. LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and additionally achieves a 79% success rate on complex spatiotemporal instructions, significantly outperforming baselines that only reach 38%.
|
|
15:45-15:50, Paper ThDT9.7 | |
A Hierarchical Approach for Joint Task Allocation and Path Planning |
|
Ho, Florence | NEC Corporation, National Institute of Advanced Industrial Scien |
Higa, Ryota | NEC Corporation, National Institute of Advanced Industrial Scien |
Kato, Takuro | National Institute of Advanced Industrial Science and Technology |
Nakadai, Shinji | NEC Corporation |
Keywords: Task Planning, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: This paper addresses the joint task allocation and path planning problem, whereby a fleet of vehicles must be optimally assigned to service multiple given tasks while their planned paths must be collision-free. This problem, composed of two tightly coupled optimization problems, has a complexity that grows rapidly with the number of tasks and vehicles, so optimal solvers do not scale to large instances. Therefore, we propose a novel method to solve this problem, HTAPPS, which introduces a hierarchical resolution framework. Our approach decomposes a given instance into three levels of abstraction, each with an associated amount of detail, that progressively filter the search space. This reduces the computational effort required when performing task allocation and multi-agent path planning jointly. We perform simulations on automated warehouse scenarios and compare our approach to baseline solvers. The obtained results show that our proposed approach is able to solve large instances within a limited time.
|
|
ThDT10 |
313 |
Multi-Robot Systems 6 |
Regular Session |
Chair: Tsiotras, Panagiotis | Georgia Tech |
Co-Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
15:15-15:20, Paper ThDT10.1 | |
Multi S-Graphs: An Efficient Distributed Semantic-Relational Collaborative SLAM |
|
Fernandez-Cortizas, Miguel | Universidad Politécnica De Madrid |
Bavle, Hriday | University of Luxembourg |
Perez Saura, David | Computer Vision and Aerial Robotics Group (CVAR), Universidad Po |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Campoy, Pascual | Computer Vision & Aerial Robotics Group, Universidad Politécnica |
Voos, Holger | University of Luxembourg |
Keywords: Multi-Robot SLAM, SLAM, Multi-Robot Systems
Abstract: Collaborative Simultaneous Localization and Mapping (CSLAM) is critical to enable multiple robots to operate in complex environments. Most CSLAM techniques rely on raw sensor measurements or low-level features such as keyframe descriptors, which can lead to wrong loop closures due to the lack of deep understanding of the environment. Moreover, the exchange of these measurements and low-level features among the robots requires the transmission of a significant amount of data, which limits the scalability of the system. To overcome these limitations, we present Multi S-Graphs, a decentralized CSLAM system that utilizes high-level semantic-relational information embedded in the four-layered hierarchical and optimizable situational graphs for cooperative map generation and localization in structured environments while minimizing the information exchanged between the robots. To support this, we present a novel room-based descriptor which, along with its connected walls, is used to perform inter-robot loop closures, addressing the challenges of initialization in the multi-robot kidnapped-robot problem. Multiple experiments in simulated and real environments validate the improvement in accuracy and robustness of the proposed approach while reducing the amount of data exchanged between robots compared to other state-of-the-art approaches.
|
|
15:20-15:25, Paper ThDT10.2 | |
Language-Conditioned Offline RL for Multi-Robot Navigation |
|
Morad, Steven | The University of Cambridge |
Shankar, Ajay | University of Cambridge, UK |
Blumenkamp, Jan | University of Cambridge |
Prorok, Amanda | University of Cambridge |
Keywords: Multi-Robot Systems, Networked Robots, Reinforcement Learning
Abstract: We present a method for synthesizing navigation policies for multi-robot teams that interpret and follow natural language instructions. We condition these policies on embeddings from pretrained Large Language Models (LLMs), and train them via offline reinforcement learning with as little as 20 minutes of randomly-collected real-world data. Experiments on a team of five real robots show that these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models, and produces low-latency control policies that can be deployed directly to real robots without finetuning. We provide videos of our experiments at https://sites.google.com/view/llm-marl.
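A minimal sketch (assumptions throughout) of a policy conditioned on a frozen language-instruction embedding, as described above: the embedding is concatenated with the robot observation and fed to a small MLP that outputs a bounded velocity command.

import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    """Observation and a frozen instruction embedding are concatenated and mapped
    to a bounded velocity command; the dimensions here are assumptions."""
    def __init__(self, obs_dim=16, embed_dim=384, act_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs, instruction_embedding):
        return self.net(torch.cat([obs, instruction_embedding], dim=-1))

policy = LanguageConditionedPolicy()
obs = torch.zeros(1, 16)
emb = torch.zeros(1, 384)        # placeholder for a pretrained LLM embedding
print(policy(obs, emb).shape)    # torch.Size([1, 2])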
|
|
15:25-15:30, Paper ThDT10.3 | |
Deep Reinforcement Learning for Coordinated Payload Transport in Biped-Wheeled Robots |
|
Mehta, Dhruv K | Clemson University |
Joglekar, Ajinkya | Clemson University |
Krovi, Venkat | Clemson University |
Keywords: Cooperating Robots, Multi-Robot Systems, Reinforcement Learning
Abstract: Coordinated payload transport via a fleet of modular wheeled mobile robots offers flexibility for handling larger loads in indoor and outdoor environments. Biped-wheeled robots have recently emerged as a viable architecture for an independent/stand-alone wheeled mobile robot. In this work, we explore the use of two biped-wheeled robots that can leverage their mobility and maneuverability for enhanced spatial pose control and stabilization in various payload transport tasks. However, coordinated control of multiple articulated wheeled robots for path tracking of a payload presents significant (and potentially competing) challenges, including kinematic redundancy, stability concerns, relative motion between the payload and robots, and precise motion control to achieve effective coordination. To address these challenges, we propose a Deep Reinforcement Learning (DRL) framework to develop motion plans for the system. In particular, this approach generates the ego robot's body twist and the follower robot's relative twist with respect to the ego robot. By formulating the action space of the follower robot as a relative twist, our approach facilitates pairwise interactions between robots. Furthermore, we use only relative pose information and the corresponding errors as states for the DRL controller, thereby making it agnostic to initial conditions and avoiding explicit dependency on absolute pose. We validate our approach through simulations conducted in Isaac Sim and on hardware using Diablo biped-wheeled robots with zero-shot transfer, demonstrating effective payload path tracking across varying parameters.
|
|
15:30-15:35, Paper ThDT10.4 | |
Reinforcement Learning within the Classical Robotics Stack: A Case Study in Robot Soccer |
|
Labiosa, Adam | University of Wisconsin-Madison |
Wang, Zhihan | The University of Texas at Austin |
Agarwal, Siddhant | The University of Texas at Austin |
Cong, William | University of Wisconsin-Madison |
Hemkumar, Geethika | The University of Texas at Austin |
Harish, Abhinav Narayan | University of Wisconsin Madison |
Hong, Benjamin | University of Wisconsin - Madison |
Kelle, Josh | University of Texas at Austin |
Li, Chen | UW-Madison |
Li, Yuhao | University of Wisconsin–Madison |
Shao, Zisen | University of Wisconsin–Madison |
Stone, Peter | University of Texas at Austin |
Hanna, Josiah | University of Wisconsin -- Madison |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Multi-Robot Systems
Abstract: Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decision-making in such domains; however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture integrating RL within a classical robotics stack, while employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.
|
|
15:35-15:40, Paper ThDT10.5 | |
Residual Descent Differential Dynamic Game (RD3G) -- a Fast Newton Solver for Constrained General Sum Games |
|
Zhang, Zhiyuan | Georgia Institute of Technology |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Integrated Planning and Control, Optimization and Optimal Control
Abstract: We present Residual Descent Differential Dynamic Game (RD3G), a Newton-based solver for constrained multi-agent game-control problems. The proposed solver seeks a local Nash equilibrium for games where agents are coupled through their rewards and state constraints. By maintaining a dynamic set of active constraints, combined with a barrier function on satisfied constraints and a backtracking line search, the proposed method is able to satisfy state constraints while keeping the dimension of the Newton descent direction problem to a minimum. We compare the proposed method against state-of-the-art techniques and showcase the computational benefits of the RD3G algorithm on several example problems. The RD3G is up to 4X faster and has 2X higher convergence rate than existing approaches in higher dimensional games.
|
|
15:40-15:45, Paper ThDT10.6 | |
MARLadona - towards Cooperative Team Play Using Multi-Agent Reinforcement Learning |
|
Li, Zichong | ANYbotics |
Bjelonic, Filip | ETH Zürich, Switzerland |
Klemm, Victor | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Cooperating Robots, Multi-Robot Systems, Reinforcement Learning
Abstract: Robot soccer, in its full complexity, poses an unsolved research challenge. Current solutions heavily rely on engineered heuristic strategies, which lack robustness and adaptability. Deep reinforcement learning has gained significant traction in various complex robotics tasks such as locomotion, manipulation, and competitive games (e.g., AlphaZero, OpenAI Five), making it a promising solution to the robot soccer problem. This paper introduces MARLadona, a decentralized multi-agent reinforcement learning (MARL) training pipeline capable of producing agents with sophisticated team play behavior, bridging the shortcomings of heuristic methods. Furthermore, we created an open-source multi-agent soccer environment. Utilizing our MARL framework and a modified global entity encoder (GEE) as our core architecture, our approach achieves a 66.8% win rate against the HELIOS agent, which employs a state-of-the-art heuristic strategy. In addition, we provide an in-depth analysis of the policy behavior and interpret the agent’s intention using the critic network.
|
|
15:45-15:50, Paper ThDT10.7 | |
Multi-Agent Inverse Q-Learning from Demonstrations |
|
Haynam, Nathaniel | UC Berkeley |
Khoja, Adam | UC Berkeley |
Kumar, Dhruv | UC Berkeley |
Myers, Vivek | UC Berkeley |
Bıyık, Erdem | University of Southern California |
Keywords: Multi-Robot Systems, Imitation Learning
Abstract: When reward functions are hand-designed, deep reinforcement learning algorithms often suffer from reward misspecification, causing them to learn suboptimal policies. In the single-agent case, Inverse Reinforcement Learning (IRL) techniques attempt to address this issue by inferring the reward function from expert demonstrations. However, in multi-agent problems, misalignment between the learned and true objectives is exacerbated due to increased environment non-stationarity and variance that scale with multiple agents. As such, in multi-agent general-sum games, multi-agent IRL algorithms have difficulty balancing cooperative and competitive objectives. To address these issues, we propose Multi-Agent Marginal Q-Learning from Demonstrations (MAMQL), a novel sample-efficient framework for multi-agent IRL. For each agent, MAMQL learns a critic marginalized over the other agents' policies, allowing for a well-motivated use of Boltzmann policies in the multi-agent context. We identify a connection between optimal marginalized critics and single-agent soft-Q IRL, allowing us to apply a direct, simple optimization criterion from the single-agent domain. Across our experiments on three different simulated domains, MAMQL significantly outperforms previous multi-agent methods in average reward, sample efficiency, and reward recovery by often more than 2-5x. We make our code available at https://sites.google.com/view/mamql .
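A small sketch of the construction the abstract leans on: turning a per-agent marginalized critic Q_i(s, a_i) into a Boltzmann policy via a softmax. The Q-values and temperature below are placeholders.

import numpy as np

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax over an agent's marginalized action values."""
    z = q_values / temperature
    z = z - np.max(z)            # for numerical stability
    p = np.exp(z)
    return p / p.sum()

print(boltzmann_policy(np.array([1.0, 2.0, 0.5]), temperature=0.5))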
|
|
ThDT11 |
314 |
Robot Vision 2 |
Regular Session |
Chair: Wang, Lin | Nanyang Technological University (NTU) |
Co-Chair: Le Gentil, Cedric | University of Toronto |
|
15:15-15:20, Paper ThDT11.1 | |
LoGS: Visual Localization for Mobile Robots with Gaussian Splatting |
|
Cheng, Yuzhou | University College London |
Jiao, Jianhao | University College London |
Wang, Yue | Zhejiang University |
Kanoulas, Dimitrios | University College London |
Keywords: Localization, Mapping, RGB-D Perception
Abstract: Visual localization involves estimating a query image’s 6-DoF (degrees of freedom) camera pose, which is a fundamental component in various computer vision and robotic tasks. This paper presents LoGS, a vision-based localization pipeline utilizing the 3D Gaussian Splatting (GS) technique as scene representation. This novel representation allows high-quality novel view synthesis. During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map. During localization, the initial position is obtained through image retrieval, local feature matching coupled with a PnP solver, and then a high-precision pose is achieved through the analysis-by-synthesis manner on the GS map. Experimental results on four large-scale datasets demonstrate the proposed approach’s SoTA accuracy in estimating camera poses and robustness under challenging few-shot conditions.
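The coarse localization step described above (2D-3D matches fed to a RANSAC PnP solver) can be sketched as follows; the correspondences and camera intrinsics are placeholders, and the GS-based analysis-by-synthesis refinement is not shown.

import numpy as np
import cv2

# Placeholder 2D-3D correspondences; in a real pipeline these would come from
# image retrieval plus local feature matching against the reconstructed map.
points_3d = (np.random.rand(20, 3) * 5.0).astype(np.float32)
points_2d = (np.random.rand(20, 2) * 400.0).astype(np.float32)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the coarse initial pose
    print(R, tvec)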
|
|
15:20-15:25, Paper ThDT11.2 | |
Unified Human Localization and Trajectory Prediction with Monocular Vision |
|
Luan, Po-Chien | EPFL |
Gao, Yang | EPFL |
Demonsant, Céline | EPFL |
Alahi, Alexandre | EPFL |
Keywords: Intelligent Transportation Systems, Localization, Computer Vision for Transportation
Abstract: Conventional human trajectory prediction models rely on clean curated data, requiring specialized equipment or manual labeling, which is often impractical for robotic applications. Existing predictors tend to overfit to clean observations, affecting their robustness when used with noisy inputs. In this work, we propose MonoTransmotion (MT), a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. Our framework has two main modules: Bird’s Eye View (BEV) localization and trajectory prediction. The BEV localization module estimates the position of a person using 2D human poses, enhanced by a novel directional loss for smoother sequential localizations. The trajectory prediction module predicts future motion from these estimates. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios made of noisy inputs. We validate our MT network on both curated and non-curated datasets. On the curated dataset, MT achieves around 12% improvement over baseline models on BEV localization and trajectory prediction. On the real-world non-curated dataset, experimental results indicate that MT maintains similar performance levels, highlighting its robustness and generalization capability.
|
|
15:25-15:30, Paper ThDT11.3 | |
HGSLoc: 3DGS-Based Heuristic Camera Pose Refinement |
|
Niu, Zhongyan | National University of Defense Technology |
Tan, Zhen | National University of Defense Technology |
Zhang, Jinpu | National University of Defense Technology |
Yang, Xueliang | National University of Defense Technology |
Hu, Dewen | National University of Defense Technology |
Keywords: Localization, Visual Learning, Computer Vision for Automation
Abstract: Visual localization refers to the process of determining the camera position and orientation within a known scene representation. This task is often complicated by factors such as changes in illumination and variations in viewing angle. In this paper, we propose HGSLoc, a novel lightweight plug-and-play pose optimization framework, which integrates 3D reconstruction with a heuristic refinement strategy to achieve higher pose estimation accuracy. Specifically, we introduce an explicit geometric map for 3D representation and high-fidelity rendering, allowing the generation of high-quality synthesized views to support accurate visual localization. Our method demonstrates higher localization accuracy compared to NeRF-based neural rendering localization approaches. We introduce a heuristic refinement strategy whose efficient optimization quickly locates the target node, and we add a step-level optimization stage to enhance pose accuracy in scenarios with small errors. With carefully designed heuristic functions, the strategy enables rapid error reduction in rough localization estimates. Our method mitigates the dependence on complex neural network models while demonstrating improved robustness against noise and higher localization accuracy in challenging environments, as compared to neural network joint optimization strategies. The proposed optimization framework introduces novel approaches to visual localization by integrating the advantages of 3D reconstruction and the heuristic refinement strategy, and demonstrates strong performance across multiple benchmark datasets, including 7Scenes and the Deep Blending dataset. The implementation of our method has been released at https://github.com/anchang699/HGSLoc.
|
|
15:30-15:35, Paper ThDT11.4 | |
Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus |
|
Zhang, Jinchang | University of Georgia |
Xu, Ningning | University of Georgia |
Zhang, Hao | University of Massachusetts Amherst |
Lu, Guoyu | University of Georgia |
Keywords: Range Sensing, Mapping, RGB-D Perception
Abstract: Depth estimation is a fundamental task in 3D geometry. While stereo depth estimation can be achieved through triangulation methods, it is not as straightforward for monocular methods, which require the integration of global and local information. The Depth from Defocus (DFD) method utilizes camera lens models and parameters to recover depth information from blurred images and has been proven to perform well. However, these methods rely on All-In-Focus (AIF) images for depth estimation, which is nearly impossible to obtain in real-world applications. To address this issue, we propose a self-supervised framework based on 3D Gaussian splatting and Siamese networks. By learning the blur levels at different focal distances of the same scene in the focal stack, the framework predicts the defocus map and Circle of Confusion (CoC) from a single defocused image, using the defocus map as input to DepthNet for monocular depth estimation. The 3D Gaussian splatting model renders defocused images using the predicted CoC, and the differences between these and the real defocused images provide additional supervision signals for the Siamese Defocus self-supervised network. This framework has been validated on both artificially synthesized and real blurred datasets. Subsequent quantitative and visualization experiments demonstrate that our proposed framework is highly effective as a DFD method.
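For reference, the thin-lens Circle of Confusion relation that Depth-from-Defocus methods build on can be written as a one-line function; this standard optics formula is assumed here and is not taken from the paper's code.

def circle_of_confusion(depth, focus_dist, focal_len, f_number):
    """Thin-lens CoC diameter on the sensor; all quantities in metres."""
    aperture = focal_len / f_number
    return aperture * abs(depth - focus_dist) / depth * focal_len / (focus_dist - focal_len)

# A point 3 m away seen by a 50 mm f/1.8 lens focused at 1.5 m:
print(circle_of_confusion(3.0, 1.5, 0.05, 1.8))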
|
|
15:35-15:40, Paper ThDT11.5 | |
GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion |
|
Wei, Jiaxin | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Mapping, RGB-D Perception
Abstract: Traditional volumetric fusion algorithms preserve the spatial structure of 3D scenes, which is beneficial for many tasks in computer vision and robotics. However, they often lack realism in terms of visualization. Emerging 3D Gaussian splatting bridges this gap, but existing Gaussian-based reconstruction methods often suffer from artifacts and inconsistencies with the underlying 3D structure, and struggle with real-time optimization, making them unable to provide users with immediate, high-quality feedback. One of the bottlenecks arises from the massive number of Gaussian parameters that need to be updated during optimization. Instead of using 3D Gaussians as a standalone map representation, we incorporate them into a volumetric mapping system to take advantage of geometric information and propose to use a quadtree data structure on images to drastically reduce the number of splats initialized. In this way, we simultaneously generate a compact 3D Gaussian map with fewer artifacts and a volumetric map on the fly. Our method, GSFusion, significantly enhances computational efficiency without sacrificing rendering quality, as demonstrated on both synthetic and real datasets. Code is available at https://github.com/goldoak/GSFusion.
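An illustrative sketch of quadtree-driven splat seeding: an image tile is subdivided only where it is detailed, so uniform regions receive a single Gaussian. The variance criterion and thresholds are assumptions, not GSFusion's exact rule.

import numpy as np

def seed_splats(img, x0, y0, size, var_thresh=50.0, min_size=4):
    """Return (cx, cy, tile_size) seeds: subdivide only where the tile is detailed."""
    patch = img[y0:y0 + size, x0:x0 + size]
    if size <= min_size or patch.var() < var_thresh:
        return [(x0 + size / 2.0, y0 + size / 2.0, size)]   # one splat for the tile
    half = size // 2
    seeds = []
    for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
        seeds += seed_splats(img, x0 + dx, y0 + dy, half, var_thresh, min_size)
    return seeds

img = np.zeros((64, 64))
img[20:40, 20:40] = 255.0                     # a single bright square
print(len(seed_splats(img, 0, 0, 64)))        # far fewer seeds than 64*64 pixels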
|
|
15:40-15:45, Paper ThDT11.6 | |
San Francisco World: Leveraging Structural Regularities of Slope for 3-DoF Visual Compass |
|
Ham, Jungil | Gwangju Institute of Science and Technology |
Kim, Minji | Gwangju Institute of Science and Technology |
Kang, Suyoung | University of Massachusetts Amherst |
Joo, Kyungdon | UNIST |
Li, Haoang | Hong Kong University of Science and Technology (Guangzhou) |
Kim, Pyojin | Gwangju Institute of Science and Technology (GIST) |
Keywords: Mapping, Vision-Based Navigation, RGB-D Perception
Abstract: We propose the San Francisco world (SFW) model, a novel structural model inspired by San Francisco's hilly terrain, enabling 3D inter-floor navigation in urban areas rather than being limited to 2D intra-floor navigation of various robotics platforms. Our SFW consists of a single vertical dominant direction (VDD), two horizontal dominant directions (HDDs), and four sloping dominant directions (SDDs) sharing a common inclination angle. Although SFW is a more general model than the Manhattan world (MW), it is a more compact model than the mixture of Manhattan world (MMW). Leveraging the structural regularities of SFW, such as uniform inclination angle and geometric patterns of the four SDDs, we design an efficient and robust DD/vanishing point estimation method by aggregating sloping line normals on the Gaussian sphere. We further utilize the structural patterns of SFW for the 3-DoF visual compass, the rotational motion tracking from a single line and plane, which corresponds to the theoretical minimal sampling for 3-DoF rotation estimation. Our method demonstrates enhanced adaptability in more challenging inter-floor scenes in urban areas and the highest rotational tracking accuracy compared to state-of-the-art methods. We release the first dataset of sequential RGB-D images captured in San Francisco world (SFW) and open source codes at: https://SanFranciscoWorld.github.io/.
|
|
15:45-15:50, Paper ThDT11.7 | |
Monocular 360 Depth Estimation Via Spherical Fully-Connected CRFs |
|
Cao, Zidong | HKUST |
Wang, Lin | Nanyang Technological University (NTU) |
Keywords: Omnidirectional Vision, Deep Learning for Visual Perception
Abstract: Monocular 360 depth estimation poses significant challenges due to the inherent distortion of the equirectangular projection (ERP). This distortion separates adjacent spherical points after their projection onto the ERP plane, especially in the polar regions, resulting in insufficient spherical relationships. To address this issue, recent methods calculate spherical neighbors within the tangent domain. However, since the tangent patch and the sphere share only one common point, spherical relationships are established only among neighbors around this common point. In this paper, we propose Spherical Fully-Connected CRFs (SF-CRFs). We start by evenly partitioning an ERP image into regular windows, where windows at the equator have broader spherical neighbors than those at the poles. To enhance spherical relationships, our SF-CRFs feature two key components. Firstly, to include sufficient spherical neighbors, we introduce a Spherical Window Transform (SWT) module. This module replicates the equator window’s spherical relationships across all other windows, leveraging the rotational invariance of the sphere. Remarkably, the transformation process is efficient, transforming all windows in a 512x1024 ERP image in just 0.038 seconds on a CPU. Secondly, we introduce a Planar-Spherical Interaction (PSI) module to calculate the SF-CRFs, which facilitates the relationships between regular and transformed windows. By integrating SF-CRFs blocks into a decoder, we propose CRF360D, a novel 360 depth estimation framework that achieves state-of-the-art performance across diverse datasets. Our CRF360D is compatible with different perspective image-trained backbones, serving as the encoder.
|
|
ThDT12 |
315 |
Motion Control 1 |
Regular Session |
Chair: Zhang, Cheng | Texas A&M University |
Co-Chair: Roncone, Alessandro | University of Colorado Boulder |
|
15:15-15:20, Paper ThDT12.1 | |
Bidirectional Energy Flow Modulation for Passive Admittance Control |
|
Lee, Donghyeon | Pohang University of Science and Technology(POSTECH) |
Ko, Dongwoo | POSTECH |
Kim, Min Jun | KAIST |
Chung, Wan Kyun | POSTECH |
Keywords: Compliance and Impedance Control, Force Control, Physical Human-Robot Interaction, Passivity-based Control
Abstract: Admittance control is a control scheme to enable physical interactions of a robot, but it easily induces instability when the robot contacts a rigid surface. In this study, passivity analysis was conducted on a robotic system with admittance control. The results showed that coupled stability with the environment can be ensured when the velocity error between the proxy and the real robot is eliminated. Thus, an adaptive structure modification method is proposed to suppress the possible source of instability. In addition, the energy tank method is combined with the proposed method to ensure system passivity. As a proof of concept, three robot experiments were performed, and the results of the proposed method were compared with those of the conventional admittance control and impedance control (with friction compensation). The comparison showed that the proposed method could make the system passive while it realized the desired dynamics during the interaction.
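A minimal discrete-time admittance law, for readers who want the basic scheme the paper starts from: the measured wrench drives a virtual mass-damper-spring proxy whose motion is sent to the robot's position controller. The parameters and explicit-Euler integration are illustrative assumptions; the paper's passivity-preserving structure modification and energy tank are not shown.

import numpy as np

class AdmittanceController:
    """Virtual mass-damper-spring proxy driven by the measured wrench; its motion
    becomes the position command. Parameters are illustrative."""
    def __init__(self, M=2.0, D=20.0, K=100.0, dt=0.001):
        self.M, self.D, self.K, self.dt = M, D, K, dt
        self.x = np.zeros(3)     # proxy deviation from the nominal pose
        self.v = np.zeros(3)

    def step(self, f_ext):
        a = (f_ext - self.D * self.v - self.K * self.x) / self.M
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x            # commanded position offset

ctrl = AdmittanceController()
for _ in range(1000):            # 1 s of a constant 5 N push along x
    offset = ctrl.step(np.array([5.0, 0.0, 0.0]))
print(offset)                    # settles near f/K = 0.05 m along x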
|
|
15:20-15:25, Paper ThDT12.2 | |
A Minimum-Jerk Approach to Handle Singularities in Virtual Fixtures |
|
Braglia, Giovanni | University of Modena and Reggio Emilia |
Calinon, Sylvain | Idiap Research Institute |
Biagiotti, Luigi | University of Modena and Reggio Emilia |
Keywords: Human-Robot Collaboration, Motion and Path Planning, Optimization and Optimal Control
Abstract: Implementing virtual fixtures in guiding tasks constrains the movement of the robot's end effector to specific curves within its workspace. However, incorporating guiding frameworks may encounter discontinuities when optimizing the reference target position to the nearest point relative to the current robot position. This article aims to give a geometric interpretation of such discontinuities, with specific reference to the commonly adopted Gauss-Newton algorithm. The effect of such discontinuities, defined as Euclidean Distance Singularities, is experimentally proved. We then propose a solution based on a linear quadratic tracking problem with a minimum-jerk command, and compare and validate the performance of the proposed framework in two different human-robot interaction scenarios.
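A sketch of the nearest-point computation whose discontinuities the paper analyzes: Gauss-Newton iteration on the curve parameter s that minimizes ||c(s) - p||^2. The circular test curve and iteration count are illustrative assumptions.

import numpy as np

def curve(s):                    # unit circle as a simple test curve
    return np.array([np.cos(s), np.sin(s)])

def curve_d(s):                  # derivative dc/ds
    return np.array([-np.sin(s), np.cos(s)])

def project_gauss_newton(p, s0, iters=30):
    """Gauss-Newton on the curve parameter minimizing ||c(s) - p||^2."""
    s = s0
    for _ in range(iters):
        r = curve(s) - p         # residual
        J = curve_d(s)           # 2x1 Jacobian, handled as a vector
        s -= float(J @ r) / float(J @ J)
    return s

# Converges to s* = atan2(0.3, 1.2) for this query point:
print(project_gauss_newton(np.array([1.2, 0.3]), s0=0.0))
# For query points near the circle's centre (its medial axis), the closest-point
# map is discontinuous and iterations like this can jump between branches; this is
# the kind of singularity the paper formalizes.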
|
|
15:25-15:30, Paper ThDT12.3 | |
Continuous Wrist Control on the Hannes Prosthesis: A Vision-Based Shared Autonomy Framework |
|
Vasile, Federico | Istituto Italiano Di Tecnologia |
Maiettini, Elisa | Humanoid Sensing and Perception, Istituto Italiano Di Tecnologia |
Pasquale, Giulia | Istituto Italiano Di Tecnologia |
Boccardo, Nicolò | IIT - Istituto Italiano Di Tecnologia |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Sensor-based Control, Deep Learning for Visual Perception, Prosthetics and Exoskeletons
Abstract: Most control techniques for prosthetic grasping focus on dexterous fingers control, but overlook the wrist motion. This forces the user to perform compensatory movements with the elbow, shoulder and hip to adapt the wrist for grasping. We propose a computer vision-based system that leverages the collaboration between the user and an automatic system in a shared autonomy framework, to perform continuous control of the wrist degrees of freedom in a prosthetic arm, promoting a more natural approach-to-grasp motion. Our pipeline allows to seamlessly control the prosthetic wrist to follow the target object and finally orient it for grasping according to the user intent. We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm. Code and videos: https://hsp-iit.github.io/hannes-wrist-control
|
|
15:30-15:35, Paper ThDT12.4 | |
Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control |
|
Wang, Haochen | Shandong University |
Zhiwei, Shi | Shandong University |
Zhu, Chengxi | Shandong University |
Qiao, Yafei | Shandong University |
Zhang, Cheng | Texas A&M University |
Yang, Fan | Deepcode Robotics Co., Ltd |
Ren, Pengjie | Shandong University |
Lu, Lan | Shanghai Jiao Tong University |
Xuan, Dong | Shandong University |
Keywords: Product Design, Development and Prototyping, AI-Enabled Robotics, Autonomous Agents
Abstract: Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), can produce excellent control policies for challenging agile robot tasks, such as sports robots. However, no existing work has harmonized learning-based policies with model-based methods to reduce training complexity and ensure safety and stability for agile badminton robot control. In this paper, we introduce Hamlet, a novel hybrid control system for agile badminton robots. Specifically, we propose a model-based strategy for chassis locomotion which provides a base for the arm policy. We introduce a physics-informed “IL+RL” training framework for the learning-based arm policy. In this training framework, a model-based strategy with privileged information is used to guide arm policy training during both the IL and RL phases. In addition, we train the critic model during the IL phase to alleviate the performance drop when transitioning from IL to RL. We present results on our self-engineered badminton robot, achieving a 94.5% success rate against the serving machine and a 90.7% success rate against human players. Our system can be easily generalized to other agile mobile manipulation tasks, e.g., agile catching and table tennis. A video demonstrating our system can be viewed at https://youtu.be/8-ixKAD18Mk.
|
|
15:35-15:40, Paper ThDT12.5 | |
Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems |
|
Welde, Jake | University of Pennsylvania |
Rao, Nishanth Arun | University of Pennsylvania |
Kunapuli, Pratik | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Dynamics, Reinforcement Learning, Aerial Systems: Mechanics and Control
Abstract: Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems. In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states. Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system. We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fully-actuated space robot), and the Quadrotor (an underactuated system). Results show that a symmetry-aware approach both accelerates training and reduces tracking error at convergence.
|
|
15:40-15:45, Paper ThDT12.6 | |
Quadratic Programming-Based Reference Spreading Control for Dual-Arm Robotic Manipulation with Planned Simultaneous Impacts |
|
van Steen, Jari J. | Eindhoven University of Technology |
van den Brandt, Gijs | Eindhoven University of Technology |
van de Wouw, Nathan | Eindhoven University of Technology |
Kober, Jens | TU Delft |
Saccon, Alessandro | Eindhoven University of Technology - TU/e |
Keywords: Impact-aware manipulation, Motion Control of Manipulators, Dual Arm Manipulation, Learning from Demonstration
Abstract: With the aim of further enabling the exploitation of intentional impacts in robotic manipulation, a control framework is presented that directly tackles the challenges posed by tracking control of robotic manipulators that are tasked to perform nominally simultaneous impacts. This framework is an extension of the reference spreading (RS) control framework, in which overlapping ante- and post-impact references that are consistent with impact dynamics are defined. In this work, such a reference is constructed starting from a teleoperation-based approach. By using the corresponding ante- and post-impact control modes in the scope of a quadratic programming control approach, peaking of the velocity error and control inputs due to impacts is avoided while maintaining high tracking performance. With the inclusion of a novel interim mode, we aim to also avoid input peaks and steps when uncertainty in the environment causes a series of unplanned single impacts to occur rather than the planned simultaneous impact. This work in particular presents for the first time an experimental evaluation of RS control on a robotic setup, showcasing its robustness against uncertainty in the environment compared to three baseline control approaches.
|
|
15:45-15:50, Paper ThDT12.7 | |
HARMONIOUS - Human-Like Reactive Motion Control and Multimodal Perception for Humanoid Robots |
|
Rozlivek, Jakub | Czech Technical University in Prague, Faculty of Electrical Engi |
Roncone, Alessandro | University of Colorado Boulder |
Pattacini, Ugo | Istituto Italiano Di Tecnologia |
Hoffmann, Matej | Czech Technical University in Prague, Faculty of Electrical Engi |
Keywords: Humanoid Robots, Physical Human-Robot Interaction, Collision Avoidance, Biologically-Inspired Robots
Abstract: For safe and effective operation of humanoid robots in human-populated environments, the problem of commanding a large number of Degrees of Freedom (DoF) while simultaneously considering dynamic obstacles and human proximity has still not been solved. We present a new reactive motion controller that commands two arms of a humanoid robot and three torso joints (17 DoF in total). We formulate a quadratic program that seeks joint velocity commands respecting multiple constraints while minimizing the magnitude of the velocities. We introduce a new unified treatment of obstacles that dynamically maps visual and proximity (pre-collision) and tactile (post-collision) obstacles as additional constraints to the motion controller, in a distributed fashion over the surface of the upper body of the iCub robot (with 2000 pressure-sensitive receptors). This results in a bio-inspired controller that: (i) gives rise to a robot with whole-body visuo-tactile awareness, resembling peripersonal space representations, and (ii) produces human-like minimum jerk movement profiles. The controller was extensively experimentally validated, including a physical human-robot interaction scenario.
|
|
ThDT13 |
316 |
Resiliency and Security 1 |
Regular Session |
Chair: Kaur, Upinder | Purdue University |
Co-Chair: Gu, Jason | Dalhousie University |
|
15:15-15:20, Paper ThDT13.1 | |
FedDet: Data Poisoning Attack Detection for Federated Skeleton-Based Action Recognition |
|
Kim, Min Hyuk | Chonnam National University |
Lee, Eungi | Chonnam National University |
Yoo, Seok Bong | Chonnam National University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Recognition
Abstract: Skeleton-based action recognition (SAR) models often centralize skeleton data, raising significant privacy concerns. To address this, decentralized training models for SAR have been advanced, particularly using federated learning (FL), a research area of considerable value with wide-ranging applications, including human-robot interaction, camera-enabled devices, and security surveillance. However, FL-based SAR faces the challenge of substantial accuracy degradation due to data poisoning attacks; thus, it requires the identification of malicious clients. This paper introduces a novel approach for detecting data poisoning attacks in federated SAR, called FedDet. The method involves creating prototypes of perspective transforms and exchanging these matrices between the clients and the server to identify the malicious client. Additionally, a prototype-guided attack detector is developed, incorporating spatiotemporal matching to analyze the correlation between prototype skeleton data. Experimental results on FL frameworks and SAR models demonstrate that the proposed approach outperforms existing models. Our code is publicly available at https://github.com/alsgur0720/federated-detection.
|
|
15:20-15:25, Paper ThDT13.2 | |
ROS2WASM: Bringing the Robot Operating System to the Web |
|
Fischer, Tobias | Queensland University of Technology |
Paredes, Isabel | QuantStack, RWTH Aachen |
Batchelor, Michael | Queensland University of Technology |
Beier, Thorsten | QuantStack |
Haviland, Jesse | Queensland University of Technology |
Traversaro, Silvio | Istituto Italiano Di Tecnologia |
Vollprecht, Wolf Kristian | QuantStack |
Schmitz, Markus | RWTH Aachen University |
Milford, Michael J | Queensland University of Technology |
Keywords: Software Tools for Robot Programming, Software Tools for Benchmarking and Reproducibility, Engineering for Robotic Systems
Abstract: The Robot Operating System (ROS) has become the de facto standard middleware in robotics, widely adopted across domains ranging from education to industrial applications. The RoboStack distribution, a conda-based packaging system for ROS, has extended ROS's accessibility by facilitating installation across all major operating systems and architectures, integrating seamlessly with scientific tools such as PyTorch and Open3D. This paper presents ROS2WASM, a novel integration of RoboStack with WebAssembly, enabling the execution of ROS 2 and its associated software directly within web browsers, without requiring local installations. ROS2WASM significantly enhances the reproducibility and shareability of research, lowers barriers to robotics education, and leverages WebAssembly's robust security framework to protect against malicious code. We detail our methodology for cross-compiling ROS 2 packages into WebAssembly, the development of a specialized middleware for ROS 2 communication within browsers, and the implementation of www.ros2wasm.dev, a web platform enabling users to interact with ROS 2 environments. Additionally, we extend support to the Robotics Toolbox for Python and adapt its Swift simulator for browser compatibility. Our work paves the way for unprecedented accessibility in robotics, offering scalable, secure, and reproducible environments that have the potential to transform educational and research paradigms.
|
|
15:25-15:30, Paper ThDT13.3 | |
Prepared for the Worst: Resilience Analysis of the ICP Algorithm Via Learning-Based Worst-Case Adversarial Attacks |
|
Zhang, Ziyu | University of Toronto |
Laconte, Johann | French National Research Institute for Agriculture, Food and The |
Lisus, Daniil | University of Toronto |
Barfoot, Timothy | University of Toronto |
Keywords: Localization, Deep Learning Methods, Robot Safety
Abstract: This paper presents a novel method for assessing the resilience of the ICP algorithm via learning-based, worst-case attacks on lidar point clouds. For safety-critical applications such as autonomous navigation, ensuring the resilience of algorithms before deployments is crucial. The ICP algorithm is the standard for lidar-based localization, but its accuracy can be greatly affected by corrupted measurements from various sources, including occlusions, adverse weather, or mechanical sensor issues. Unfortunately, the complex and iterative nature of ICP makes assessing its resilience to corruption challenging. While there have been efforts to create challenging datasets and develop simulations to evaluate the resilience of ICP, our method focuses on finding the maximum possible ICP error that can arise from corrupted measurements at a location. We demonstrate that our perturbation-based adversarial attacks can be used pre-deployment to identify locations on a map where ICP is particularly vulnerable to corruptions in the measurements. With such information, autonomous robots can take safer paths when deployed, to mitigate against their measurements being corrupted. The proposed attack outperforms baselines more than 88% of the time across a wide range of scenarios.
|
|
15:30-15:35, Paper ThDT13.4 | |
SLAMSpoof: Practical LiDAR Spoofing Attacks on Localization Systems Guided by Scan Matching Vulnerability Analysis |
|
Nagata, Rokuto | Keio University |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Hayakawa, Yuki | Keio University |
Suzuki, Ryo | Keio University |
Ikeda, Kazuma | Keio University |
Sako, Ozora | Keio University |
Chen, Qi Alfred | University of California, Irvine |
Sato, Takami | University of California, Irvine |
Yoshioka, Kentaro | Keio University |
Keywords: Localization, SLAM, Intelligent Transportation Systems
Abstract: Accurate localization is essential for enabling modern full self-driving services. These services heavily rely on map-based traffic information to reduce uncertainties in recognizing lane shapes, traffic light locations, and traffic signs. Achieving this level of reliance on map information requires centimeter-level localization accuracy, which is currently only achievable with LiDAR sensors. However, LiDAR is known to be vulnerable to spoofing attacks that emit malicious lasers against LiDAR to overwrite its measurements. Once localization is compromised, the attack could lead the victim off roads or make them ignore traffic lights. Motivated by these serious safety implications, we design SLAMSpoof, the first practical LiDAR spoofing attack on localization systems for self-driving to assess the actual attack significance on autonomous vehicles. SLAMSpoof can effectively find the effective attack location based on our scan matching vulnerability score (SMVS), a point-wise metric representing the potential vulnerability to spoofing attacks. To evaluate the effectiveness of the attack, we conduct real-world experiments on ground vehicles and confirm its high capability in real-world scenarios, inducing position errors of ≥4.2 meters (more than typical lane width) for all 3 popular LiDAR-based localization algorithms. We finally discuss the potential countermeasures of this attack. Code is available at https://github.com/Keio-CSG/slamspoof
|
|
15:35-15:40, Paper ThDT13.5 | |
Gradient-Based Adversarial Attacks on Deep LiDAR Odometry |
|
Song, Zhenbo | Nanjing University of Science and Technology |
Chen, Xuanzhu | Nanjing University of Science and Technology |
Zhang, Zhenyuan | Nanjing University of Science and Technology |
Zhang, Kaihao | Harbin Institute of Technology, Shenzhen |
Lu, Jianfeng | Nanjing University of Science & Technology |
Li, Weiqing | Nanjing University of Sci.&Tech |
Keywords: Intelligent Transportation Systems, Robot Safety, Deep Learning Methods
Abstract: Adversarial attacks have been recently investigated in LiDAR perception problems for autonomous driving, where a small perturbation to the source inputs can result in incorrect predictions. However, most prior studies focus on attacks on single-frame perception modules, lacking explorations of attacks on consecutive-frame tasks, i.e. the LiDAR odometry. In this paper, we propose a gradient optimization-based adversarial attack towards deep LiDAR odometry networks. To generate point clouds consistent with real-world scenarios, we constrain adversarial points within the range of a small object, e.g. a traffic cone, and render new points to simulate real LiDAR measurements. By incorporating such adversarial points in consecutive frames, we demonstrate a significant decrease in pose estimation accuracy of current popular LiDAR odometry networks. In addition, we also evaluate traditional geometric odometry approaches and report their robustness over adversarial points. Extensive experiments on the KITTI and Waymo datasets illustrate the effectiveness of the proposed attack method and the vulnerability of deep LiDAR odometry methods against adversarial points.
|
|
15:40-15:45, Paper ThDT13.6 | |
Enhancing 3D Robotic Vision Robustness by Minimizing Adversarial Mutual Information through a Curriculum Training Approach |
|
Darabi, Nastaran | University of Illinois Chicago |
Jayasuriya, Dinithi | University of Illinois Chicago |
Naik, Devashri | University of Illinois Chicago |
Tulabandhula, Theja | University of Illinois Chicago |
Trivedi, Amit Ranjan | University of Illinois at Chicago (UIC), Chicago, USA |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Robot Safety
Abstract: Adversarial attacks exploit vulnerabilities in a model's decision boundaries through small, carefully crafted perturbations that lead to significant mispredictions. In 3D vision, the high dimensionality and sparsity of data greatly expand the attack surface, making 3D vision particularly vulnerable for safety-critical robotics. To enhance 3D vision's adversarial robustness, we propose a training objective that simultaneously minimizes prediction loss and mutual information (MI) under adversarial perturbations to contain the upper bound of misprediction errors. This approach simplifies handling adversarial examples compared to conventional methods, which require explicit searching and training on adversarial samples. However, minimizing prediction loss conflicts with minimizing MI, leading to reduced robustness and catastrophic forgetting. To address this, we integrate curriculum advisors in the training setup that gradually introduce adversarial objectives to balance training and prevent models from being overwhelmed by difficult cases early in the process. The advisors also enhance robustness by encouraging training on diverse MI examples through entropy regularizers. We evaluated our method on ModelNet40 and KITTI using PointNet, DGCNN, SECOND, and PointTransformers, achieving 2--5% accuracy gains on ModelNet40 and a 5--10% mAP improvement in object detection. Our code is publicly available at https://github.com/nstrndrbi/Mine-N-Learn.
|
|
ThDT14 |
402 |
End-Effectors |
Regular Session |
Chair: Hughes, Josie | EPFL |
Co-Chair: Tadokoro, Satoshi | Tohoku University |
|
15:15-15:20, Paper ThDT14.1 | |
PaTS-Wheel: A Passively-Transformable Single-Part Wheel for Mobile Robot Navigation on Unstructured Terrain |
|
Godden, Thomas | Imperial College London |
Mulvey, Barry William | Imperial College London |
Redgrave, Ellen | Imperial College London |
Nanayakkara, Thrishantha | Imperial College London |
Keywords: Compliant Joints and Mechanisms, Underactuated Robots
Abstract: Most mobile robots use wheels that perform well on even and structured ground, like in factories and warehouses. However, they face challenges traversing unstructured terrain such as stepped obstacles. This paper presents the design and testing of the PaTS-Wheel: a Passively-Transformable Single-part Wheel that can transform to render hooks when presented with obstacles. The passive rendering of this useful morphological feature is guided purely by the geometry of the obstacle. The energy consumption and vibrational profile of the PaTS-Wheel on flat ground are comparable to those of a standard wheel of the same size. In addition, our novel wheel design (with a diameter of 120 mm) was tested traversing different terrains with stepped obstacles of incremental heights. The PaTS-Wheel achieved a 100% success rate at traversing stepped obstacles 83 mm high (~70% of its diameter), higher than the results obtained for an equivalent wheel at 30 mm (~25% of its diameter) and an equivalent wheg at 73 mm (~61% of its diameter). This achieves the design objectives of combining the energy efficiency and ride smoothness of wheels with the obstacle traversal capabilities of legged robots, all without requiring any sensors, actuators, or controllers.
|
|
15:20-15:25, Paper ThDT14.2 | |
Balloon Pin-Array Gripper: Two-Step Shape Adaptation Mechanism for Stable Grasping against Object Misalignment |
|
Kemmotsu, Yuto | Tohoku University |
Tadakuma, Kenjiro | Osaka University |
Abe, Kazuki | Osaka University |
Watanabe, Masahiro | Osaka University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Compliant Joints and Mechanisms, Grasping, Soft Robot Materials and Design
Abstract: This study introduces a balloon pin-array gripper combining shape adaptability to various objects, stable holding by multipoint contact, and isotropic grasping performance. This is particularly useful when the shape or position of the objects cannot be accurately determined because of sensor limitations. This gripper has multiple pins whose tips are covered by flexible balloons. The gripper can adapt to the shapes of objects in two steps: axial sliding of the pins and radial inflation of the balloons. This study focuses on the effect of the layout of pins on grasping and proposes a simulation model to quantify the characteristics of each layout. Simulations showed that the concentric layout enables stable grasping by ensuring many pins contact the object, regardless of misalignment. Experiments using a prototype gripper demonstrated a trend consistent with the simulation results, proving the validity of the simulation model.
|
|
15:25-15:30, Paper ThDT14.3 | |
Adaptive Perching and Grasping by Aerial Robot with Light-Weight and High Grip-Force Tendon-Driven Three-Fingered Hand Using Single Actuator |
|
Iida, Hisaaki | The University of Tokyo |
Sugihara, Junichiro | The University of Tokyo |
Sugihara, Kazuki | The University of Tokyo |
Kozuka, Haruki | The University of Tokyo |
Li, Jinjie | The University of Tokyo |
Nagato, Keisuke | The University of Tokyo |
Zhao, Moju | The University of Tokyo |
Keywords: Aerial Systems: Applications, Multifingered Hands, Tendon/Wire Mechanism
Abstract: Aerial robots, especially multirotor types, have been utilized in various scenarios such as inspection, surveillance, and logistics. The most critical issue for multirotors is the limited flight time due to the large power consumption required for hovering against gravity. Inspired by nature, various research efforts have focused on perching and grasping by deploying a gripper on the multirotor to grasp arboreal environments for saving energy; however, most mechanical gripper designs restrict the approach path, which significantly limits the performance of perching and grasping. Besides, it is also challenging to design a light gripper that also offers sufficiently large grip force to hang itself. Therefore, in this work, we develop a single-actuator hand for an aerial robot that enables adaptive grasping of various objects, and thus can perch from various approach directions. First, we present the design of the lightweight three-fingered hand with a pair of special two-dimensional differential plates that enables adaptive grasping with a single actuator. In addition, we develop a unique control method for the over-actuated aerial robot equipped with this hand to perform both adaptive pendulum-like perching and detachment. Finally, we demonstrate the feasibility of the prototype hand via a load-bearing test and various object-grasping tests, along with in-flight perching experiments.
|
|
15:30-15:35, Paper ThDT14.4 | |
CAFEs: Cable-Driven Collaborative Floating End-Effectors for Agriculture Applications |
|
Cheng, Hung Hon | EPFL |
Hughes, Josie | EPFL |
Keywords: Robotics and Automation in Agriculture and Forestry, Tendon/Wire Mechanism, Actuation and Joint Mechanisms
Abstract: CAFEs (Collaborative Agricultural Floating End-effectors) is a new robot design and control approach to automating large-scale agricultural tasks. Based upon a cable-driven robot architecture in which modular robotic arms share the same roller-driven cable set, a fast-switching clamping mechanism allows each CAFE to clamp onto or release from the moving cables, enabling both independent and synchronized movement across the workspace. The methods developed to enable this system include the mechanical design, precise position control, and a dynamic model for the spring-mass-like system, ensuring accurate and stable movement of the robotic arms. The system's scalability is further explored by studying the tension and sag in the cables to maintain performance as more robotic arms are deployed. Experimental and simulation results demonstrate the system's effectiveness in tasks including pick-and-place, showing its potential to contribute to agricultural automation.
|
|
15:35-15:40, Paper ThDT14.5 | |
A Robotic Finger with a 4-Bar Linkage-Based Compact and Continuously Variable Active Transmission |
|
Chung, Sungho | Sogang University |
Sohn, Eugene | Sogang University |
Jeong, Seokhwan | Mechanical Eng., Sogang University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: This paper presents a practical design implementation of a 4-bar linkage-based compact and continuously variable active transmission (CCVAT) specifically tailored to the form factor of a robotic finger. The proposed CCVAT aims to solve the two major limitations of conventional linkage-based continuously variable transmissions: increased inertia and the complexities associated with miniaturization. To counter these limitations, our design incorporates a custom flexible shaft within the joint of the robotic finger, enhancing its adaptability and operational efficiency. In addition, we propose a cascaded control architecture, combining a disturbance-observer-based low-level controller and a mid-level controller responsible for managing both the transmission ratio and flexion angle of the system. Finally, the feasibility of the prototype was evaluated by conducting several experiments.
|
|
15:40-15:45, Paper ThDT14.6 | |
A Dexterous and Compliant (DexCo) Hand Based on Soft Hydraulic Actuation for Human Inspired Fine In-Hand Manipulation |
|
Zhou, Jianshu | University of California, Berkeley |
Junda, Huang | Chinese University of Hong Kong |
Dou, Qi | The Chinese University of Hong Kong |
Abbeel, Pieter | UC Berkeley |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Dexterous Manipulation, Soft Robot Applications, Multifingered Hands, Grippers and Other End-Effectors
Abstract: Human beings possess a remarkable skill for fine in-hand manipulation, utilizing both intra-finger interactions (in-finger) and finger-environment interactions across a wide range of daily tasks. These tasks range from skilled activities like screwing light bulbs, picking and sorting pills, and in-hand rotation, to more complex tasks such as opening plastic bags, cluttered bin picking, and counting cards. Despite its prevalence in human activities, replicating these fine motor skills in robotics remains a substantial challenge. This study tackles the challenge of fine in-hand manipulation by introducing the dexterous and compliant (DexCo) hand system. The DexCo hand mimics human dexterity, replicating the intricate interaction between the thumb, index, and middle fingers, with a contractable palm. The key to maneuverable fine in-hand manipulation lies in its innovative soft hydraulic actuation, which strikes a balance between control complexity, dexterity, compliance, and motion accuracy within a compact structure, enhancing the overall performance of the system. The model of soft hydraulic actuation, based on hydrostatic force analysis, reveals the compliance of hand joints, whic
|
|
ThDT15 |
403 |
Robot Applications |
Regular Session |
Chair: Yim, Sehyuk | KIST |
Co-Chair: Cramariuc, Andrei | ETH Zurich |
|
15:15-15:20, Paper ThDT15.1 | |
A Minimally Designed Audio-Animatronic Robot |
|
Park, Kyu Min | Sejong University |
Cheon, Jeongah | Korea Institute of Science and Technology |
Yim, Sehyuk | KIST |
Keywords: Mechanism Design, Additive Manufacturing, Tendon/Wire Mechanism, Audio-Driven Motion Generation
Abstract: Animatronic robots that simulate lively and realistic motions of creatures can be excellent robotic platforms for social interaction with people. In particular, a robot head is a very important part for expressing various emotions and generating human-friendly and aesthetic impressions. This article presents Ray, a new type of audio-animatronic robot head. The entire mechanical structure of the robot is built in a single 3D-printing step and has multiple layers expressing the overall shape of a human head and important features such as eyes, nose, mouth, and chin. This simple, lightweight structure and the separate tendon-based actuation system underneath allow for smooth, fast motions of the robot. We also develop an audio-driven motion generation module that automatically synthesizes natural and rhythmic motions of the head and mouth. The developed robot platform is used for various applications, for example as a talking robot, robot singer, and robot MC. We expect this research to open up a new paradigm and application possibilities for minimally designed audio-animatronic robots.
|
|
15:20-15:25, Paper ThDT15.2 | |
High Speed Robotic Table Tennis Swinging Using Lightweight Hardware with Model Predictive Control |
|
Nguyen, David | Massachusetts Institute of Technology |
Cancio, Kendrick | Massachusetts Institute of Technology |
Kim, Sangbae | Massachusetts Institute of Technology |
Keywords: Hardware-Software Integration in Robotics, Optimization and Optimal Control, Humanoid Robot Systems
Abstract: We present a robotic table tennis platform that achieves a variety of hit styles and ball-spins with high precision, power, and consistency. This is enabled by a custom lightweight, high-torque, low rotor inertia, five degree-of-freedom arm capable of high acceleration. To generate swing trajectories, we formulate an optimal control problem (OCP) that constrains the state of the paddle at the time of the strike. The terminal position is given by a predicted ball trajectory, and the terminal orientation and velocity of the paddle are chosen to match various possible styles of hits: loops (topspin), drives (flat), and chops (backspin). Finally, we construct a fixed-horizon model predictive controller (MPC) around this OCP to allow the hardware to quickly react to changes in the predicted ball trajectory. We validate on hardware that the system is capable of hitting balls with an average exit velocity of 11 m/s at an 88% success rate across the three swing types.
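To make the terminal-constraint idea concrete, here is a minimal Python sketch (not the authors' OCP/MPC formulation): a quintic polynomial segment whose boundary conditions pin the paddle position and velocity at the strike time; all numbers are purely illustrative.

```python
# Minimal sketch (not the paper's OCP/MPC): a quintic polynomial trajectory whose
# boundary conditions fix the paddle state at the moment of the strike.
import numpy as np

def quintic_coeffs(x0, v0, a0, xf, vf, af, T):
    """Solve for quintic polynomial coefficients matching the boundary conditions."""
    A = np.array([
        [1, 0,   0,      0,       0,        0],
        [0, 1,   0,      0,       0,        0],
        [0, 0,   2,      0,       0,        0],
        [1, T,   T**2,   T**3,    T**4,     T**5],
        [0, 1, 2*T,    3*T**2,  4*T**3,   5*T**4],
        [0, 0,   2,    6*T,    12*T**2,  20*T**3],
    ])
    b = np.array([x0, v0, a0, xf, vf, af])
    return np.linalg.solve(A, b)

# Example: paddle x-position must reach the predicted ball position (0.40 m)
# with a chosen strike velocity (3.0 m/s, e.g. a flat "drive") at T = 0.25 s.
c = quintic_coeffs(x0=0.0, v0=0.0, a0=0.0, xf=0.40, vf=3.0, af=0.0, T=0.25)
t = np.linspace(0.0, 0.25, 6)
pos = sum(c[i] * t**i for i in range(6))
print(np.round(pos, 3))  # samples along the swing, ending at 0.40 m
```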
|
|
15:25-15:30, Paper ThDT15.3 | |
Learning Quiet Walking for a Small Home Robot |
|
Watanabe, Ryo | SONY Group |
Miki, Takahiro | ETH Zurich |
Shi, Fan | National University of Singapore |
Kadokawa, Yuki | Nara Institute of Science and Technology |
Bjelonic, Filip | ETH Zürich, Switzerland |
Kawaharazuka, Kento | The University of Tokyo |
Cramariuc, Andrei | ETHZ |
Hutter, Marco | ETH Zurich |
Keywords: Domestic Robotics, Legged Robots, Reinforcement Learning
Abstract: As home robotics gains traction, robots are increasingly integrated into households, offering companionship and assistance. Quadruped robots, particularly those resembling dogs, have emerged as popular alternatives to traditional pets. However, user feedback highlights concerns about the noise these robots generate while walking at home, particularly the loud footstep impact sound. To address this issue, we propose a reinforcement learning (RL) based approach to minimize the foot contact velocity, which is closely related to the footstep sound. Our framework incorporates three key elements: learning varying PD gains to actively dampen and stiffen each joint, utilizing foot contact sensors, and employing curriculum learning to gradually enforce penalties on foot contact velocity. Experiments demonstrate that our learned policy achieves superior quietness compared to an RL baseline and the carefully handcrafted Sony commercial controller baselines. Furthermore, the trade-off between robustness and quietness is shown. This research contributes to developing quieter and more user-friendly robotic companions for home environments.
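A hedged sketch of how a foot-contact-velocity penalty with a curriculum weight could look in code; the reward terms, scale factors, and ramp schedule below are illustrative assumptions, not the paper's actual reward.

```python
# Illustrative foot-contact-velocity penalty with a curriculum weight ramped over
# training steps; not the paper's reward function.
import numpy as np

def quietness_penalty(foot_vertical_vel, in_contact, curriculum_weight):
    """Penalize downward foot velocity at the moment of contact.

    foot_vertical_vel: (n_feet,) vertical velocities [m/s], negative = downward.
    in_contact: (n_feet,) booleans/ints from foot contact sensors.
    curriculum_weight: scalar in [0, 1], ramped up over training.
    """
    impact_speed = np.clip(-foot_vertical_vel, 0.0, None) * in_contact
    return -curriculum_weight * np.sum(impact_speed ** 2)

def total_reward(task_reward, foot_vertical_vel, in_contact, step, ramp_steps=2_000_000):
    w = min(1.0, step / ramp_steps)          # gradually enforce the penalty
    return task_reward + quietness_penalty(foot_vertical_vel, in_contact, w)

print(total_reward(1.0, np.array([-0.6, 0.1]), np.array([1, 0]), step=1_000_000))
```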
|
|
15:30-15:35, Paper ThDT15.4 | |
Evaluating Human-Robot Skill Gaps in Electrical Circuit Inspection: A New Electronic Task Board for Benchmarking Manipulation |
|
So, Peter | Technical University of Munich |
Swikir, Abdalla | Mohamed Bin Zayed University of Artificial Intelligence |
Abu-Dakka, Fares | New York University Abu Dhabi |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Performance Evaluation and Benchmarking, Industrial Robots, Software Tools for Benchmarking and Reproducibility
Abstract: Robot manipulation researchers reference human performance as a goal for their work; however, human data is seldom present in robotics benchmarks. We introduce a real-world benchmark targeting manipulation skills for performing electrical circuit inspection with a multimeter using an Internet-connected electronic task board. We present timing study results and an exemplary robot solution across six different tasks from the Robothon Grand Challenge at the automatica conference in 2023. Contributions from 16 robot teams were collected using task boards we manufactured and distributed as part of the 30-day international competition as an initial performance database. Our work systematically highlights the skill gap between the winning robot solution and the best human performance from a group of 30 subjects. Our goal is to chronicle progress over time in robot manipulation skills and provide a standardized, physical benchmark across the global community. Videos of the team submissions, the exemplary robot solution, as well as the project reproduction code are provided in the included repository.
|
|
15:35-15:40, Paper ThDT15.5 | |
RaccoonBot: An Autonomous Wire-Traversing Solar-Tracking Robot for Persistent Environmental Monitoring |
|
Mendez-Flores, Efrain | University of California, Irvine |
Pourshahidi, Agaton | University of California, Irvine |
Egerstedt, Magnus | University of California, Irvine |
Keywords: Hardware-Software Integration in Robotics, Environment Monitoring and Management, Energy and Environment-Aware Automation
Abstract: Environmental monitoring is used to characterize the health of organisms and their relationship with their environments. In forest ecosystems, robots can serve as platforms to acquire such data, even in hard-to-reach places, where wire-traversing platforms are particularly promising due to their efficient displacement. This paper presents the RaccoonBot, a novel autonomous wire-traversing robot for persistent environmental monitoring, featuring a fail-safe mechanical design with a self-locking mechanism in case of electrical shortage. The robot also features energy-aware mobility through a novel solar-tracking algorithm that allows the robot to find a position on the wire with direct exposure to sunlight, increasing the energy harvested. Experimental results validate the electro-mechanical features of the RaccoonBot, showing that it is able to handle wire perturbations and different inclinations, and to achieve energy autonomy.
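As a rough illustration of the energy-aware positioning idea, the sketch below samples candidate positions along the wire and moves to the one with the highest measured solar power; the measurement model, step size, and function names are hypothetical stand-ins, not the paper's algorithm.

```python
# Illustrative-only sketch of energy-aware positioning on a wire: sample candidate
# positions, read (here: simulate) harvested solar power, and move to the best one.
import numpy as np

def measured_solar_power(position_m):
    # Hypothetical stand-in for a photovoltaic power measurement; a canopy shadow
    # is modeled as a dip in available irradiance along the wire.
    shade = np.exp(-((position_m - 3.0) ** 2) / 0.5)
    return 5.0 * (1.0 - 0.8 * shade)  # watts

def best_position_on_wire(wire_length_m, step_m=0.25):
    candidates = np.arange(0.0, wire_length_m + 1e-9, step_m)
    powers = [measured_solar_power(p) for p in candidates]
    return candidates[int(np.argmax(powers))], max(powers)

pos, power = best_position_on_wire(wire_length_m=6.0)
print(f"move to {pos:.2f} m on the wire (~{power:.1f} W available)")
```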
|
|
15:40-15:45, Paper ThDT15.6 | |
Fast and Accurate Relative Motion Tracking for Dual Industrial Robots |
|
He, Honglu | Rensselaer Polytechnic Institute |
Lu, Chen-Lung | Rensselaer Polytechnic Institute |
Saunders, Glenn | Rensselaer Polytechnic Institute |
Wason, John | Wason Technology, LLC |
Yang, Pinghai | GE Research |
Schoonover, Jeffrey | GE Research |
Ajdelsztajn, Leo | GE |
Paternain, Santiago | Rensselaer Polytechnic Institute |
Julius, Agung | Rensselaer Polytechnic Institute |
Wen, John | Rensselaer Polytechnic Institute |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Industrial Robots
Abstract: Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-arm setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear and circular paths and linear joint paths at constant Cartesian speed). The actual robot motion is affected by the blending between these motion primitives and the pose of the robot (an outstretched/near-singularity pose tends to have larger path-tracking errors). Choosing the waypoints and the speed along each motion segment to achieve the performance requirement is challenging. At present, there is no automated solution, and laborious manual tuning by robot experts is needed to approach the desired performance. In this letter, we present a systematic three-step approach to designing and programming a dual-arm system to optimize system performance. The first step is to select the relative placement between the two robots based on the specified relative motion path. The second step is to select the relative waypoints and the motion primitives. The final step is to update the waypoints iteratively based on the actual measured relative motion. Waypoint iteration is first executed in simulation and then completed using the actual robots. For performance assessment, we use the mean path speed subject to the relative position and orientation constraints and the path speed uniformity constraint. We have demonstrated the effectiveness of this method on two systems, a physical testbed of two ABB robots and a simulation testbed of two FANUC robots, for two challenging test curves.
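The waypoint-iteration step described above can be illustrated with a small sketch: each commanded waypoint is shifted against the measured relative-path deviation until the error falls below a tolerance. The gain, tolerance, and toy measurement model are assumptions for illustration only, not the paper's update rule.

```python
# Minimal sketch of iterative waypoint correction from measured relative motion.
import numpy as np

def iterate_waypoints(waypoints, measure_error, gain=0.7, tol=1e-3, max_iters=20):
    """waypoints: (N, 3) commanded relative positions along the curve.
    measure_error: callable returning (N, 3) measured-minus-desired deviations."""
    wp = waypoints.copy()
    for _ in range(max_iters):
        err = measure_error(wp)
        if np.max(np.linalg.norm(err, axis=1)) < tol:
            break
        wp -= gain * err   # move commands opposite to the observed deviation
    return wp

# Toy stand-in: the "system" adds a constant bias, so iteration cancels it.
desired = np.linspace([0.0, 0.0, 0.0], [0.1, 0.0, 0.0], 5)
bias = np.array([0.002, -0.001, 0.0])
corrected = iterate_waypoints(desired, lambda wp: (wp + bias) - desired)
print(np.round(corrected - desired, 4))  # approximately -bias at every waypoint
```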
|
|
ThDT16 |
404 |
Soft Robotics 2 |
Regular Session |
Chair: Chin, Lillian | UT Austin |
Co-Chair: Han, Amy Kyungwon | Seoul National University |
|
15:15-15:20, Paper ThDT16.1 | |
Inflatable-Structure-Based Working-Channel Securing Mechanism for Soft Growing Robots |
|
Seo, Dongoh | Korea Advanced Institute of Science and Technology |
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Soft growing robots are being used in various fields owing to their distinct advantages. However, their ability to manipulate tools in different applications is still challenging. In this paper, we propose an inflatable-structure-based working-channel securing mechanism for soft growing robots. The proposed mechanism provides a solution for securing a stable and accessible working channel with pressure equal to the atmospheric pressure, while maintaining the unique advantages of soft growing robots. The proposed soft growing robot can freely transfer materials and tools through its interior channel; therefore, it can adapt and replace equipment based on specific work requirements. This capability enhances the versatility and efficiency of the robot in various applications. Prototyping and experimental validation were conducted to show the performance and capabilities of the robot. The results of the experiments demonstrated that the soft growing robot effectively secured the working channel, enabling the transfer of materials and tools without interference from the inflation pressure. The accessibility of the secured channel was validated through slide-plate and pipe-pulling experiments. The demonstration of the growing mechanism confirmed the ability of the robot to secure a working channel during its growth, whereas the steering demonstration showcased its inherent steering function.
|
|
15:20-15:25, Paper ThDT16.2 | |
Tendon Locking for Antagonistic Configuration and Stiffness-Control in Soft Robots |
|
Licher, Johann | Leibniz University Hannover |
Peters, Jan | Leibniz Universität Hannover |
Raatz, Annika | Leibniz Universität Hannover |
Wurdemann, Helge Arne | University College London |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Soft Robot Materials and Design
Abstract: Some applications, such as surgical interventions, require that potential soft robots have the capability to alter their shape and enhance their force output on demand. This paper presents an antagonistic stiffening mechanism combining pneumatic actuation with tendon locking to achieve configuration- and stiffness control. Elongation of a soft pneumatic section, resulting from air actuation, is opposed by constraining the length of integrated tendons. These tendons can be locked in length by pneumatically activated levers at the base of each segment. Hence, tendon locking will not affect the configuration of other segments of a multi-segment manipulator. Our concept achieves a stiffness increase of up to 201.7% and a larger, more uniform radial workspace compared to the widely used pneumatic actuation concept while maintaining the low technical effort required for actuation. We also demonstrate how our actuation concept enables independent control of stiffness levels for individual segments of a multi-segment manipulator and their MR compatibility.
|
|
15:25-15:30, Paper ThDT16.3 | |
Large-Expansion Bi-Layer Auxetics Create Compliant Cellular Motion |
|
Chin, Lillian | UT Austin |
Xie, Gregory | MIT |
Lipton, Jeffrey | Northeastern University |
Rus, Daniela | MIT |
Keywords: Actuation and Joint Mechanisms, Swarm Robotics, Compliant Joints and Mechanisms
Abstract: There is significant interest in creating compliant modular robots that can change their volume. Inspired by how biological cells move, these systems can potentially combine the resilience of modular robotics with the increased environmental interactions of soft robotics. However, current versions have limited speed, expansion, and portability. In this paper, we address these concerns through AuxSwarm, a compliant system composed of auxetic-based robotic voxels. These voxels control their volume through a scissor-like bi-layer auxetic design, growing up to 1.57 times their original size in 0.2 seconds. This combination of speed and expansion is unique across modular soft robots, enabling dynamic locomotion capabilities. We characterize the voxels and demonstrate the versatility of this approach through case studies of 2D bending and 3D cube flipping. AuxSwarm provides a first step towards addressable voxel-based smart materials, while simultaneously addressing the robustness and actuation challenges faced by soft robots.
|
|
15:30-15:35, Paper ThDT16.4 | |
EViper-2D: A Thin Large-Area Soft Robotics Platform |
|
Cheng, Hsin | Princeton University |
Veilleux, Elias | Princeton University |
Zheng, Zhiwu | Princeton University |
Wagner, Sigurd | Princeton University |
Verma, Naveen | Princeton University |
Sturm, James | Princeton University |
Chen, Minjie | Princeton University |
Keywords: Modeling, Control, and Learning for Soft Robots
Abstract: This paper presents the key principles of eViper-2D -- a thin large-area soft robotics platform -- as a new development of the previous extendable Vibrating Intelligent Piezo-Electric Robot (eViper) platform. We first introduce the mechanical, electrical, and control framework of eViper-2D, and then develop systematic and scalable methods to study the impact of diverse actuation patterns on robotic motion dynamics and energy efficiency. By integrating power electronics, communication circuits, piezoelectric actuators, and batteries onboard, the eViper-2D platform enables rapid design iteration and quick evaluation of different control strategies for the multi-actuator soft robot. The platform supports data-driven modeling via automated data acquisition. We show that eViper-2D can provide rich insights into optimizing actuation patterns to achieve agile motion and minimal cost of transport (COT).
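For readers unfamiliar with the cost-of-transport metric mentioned above, a minimal sketch with illustrative numbers follows; the robot mass, energy, and distance values are assumptions, not measurements from the paper.

```python
# Minimal sketch of the cost-of-transport (COT) metric:
# COT = energy consumed / (weight * distance traveled).
def cost_of_transport(energy_j, mass_kg, distance_m, g=9.81):
    return energy_j / (mass_kg * g * distance_m)

# e.g. a 20 g thin robot using 1.2 J to travel 0.5 m (illustrative values)
print(f"COT = {cost_of_transport(1.2, 0.020, 0.5):.1f}")
```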
|
|
15:35-15:40, Paper ThDT16.5 | |
Bio-Inspired Soft Magnetic Swimming Robot for Flexible Motions |
|
Li, Xiaosa | Tsinghua University |
Lin, Zenan | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Software-Hardware Integration for Robot Systems
Abstract: Bio-inspired soft robots have gained significant attention for their flexible design and adaptability to various environments, making them suitable for exploration and task execution in confined or hazardous areas. However, the deformation and motion of soft magnetic robots rely on both their structural design and magnetization, which complicates guided movement and balance maintenance in aquatic environments. In this work, inspired by the flat and symmetrical body of rays, we design a soft magnetic fish-shaped robot capable of flexible motions and trajectory swimming on the water surface. This robot features a muscle made of magnetic elastomer, which connects with the acrylic skeleton and silicone-film fins to form a soft body. Under an external magnetic field, the robot achieves hovering by flapping its fins, driven by the magnetically actuated deformation of its magnetic muscle. In addition, the robot's axial magnetization enables rapid steering guided by a horizontal field. In experiments, the soft magnetic robot was tasked with performing a looping figure-eight trajectory on the water surface, guided by the field gradient generated by a dense planar array of electromagnetic coils. While moving, the onboard circuit board collected the robot's inertial and temperature data and sent them to the host computer via Bluetooth in real time for motion monitoring. The received data demonstrated that our robot performed the specified afloat swimming trajectory, exhibiting good yaw-angle stability during continuous motion. The soft magnetic swimming robot shows integrated functionalities in untethered actuation, on-robot sensing, and wireless communication, indicating significant prospects for applications in inspection and cleaning within narrow pipelines and enclosed mechanical interior spaces.
|
|
15:40-15:45, Paper ThDT16.6 | |
Magnetic Programming of Soft Materials Using Digitally Processed Laser Heating |
|
Kocabas, Fatih | University of Wisconsin-Madison |
Oguztuzun, Ozan | University of Wisconsin-Madison |
Zhou, Youyi | University of Wisconsin-Madison |
Alapan, Yunus | University of Wisconsin-Madison |
Keywords: Soft Robot Materials and Design
Abstract: Spatial programming of magnetic soft materials holds immense potential for wide-ranging applications in soft robotics, minimally invasive medicine, and haptic interfaces. Despite tremendous and rapid progress in encoding spatially resolved magnetization directions over soft structures, the currently available approaches employ sequential encoding, resulting in slow and tedious processes with limited throughput. In this paper, we present a rapid and parallel magnetic programming strategy based on digitally processed laser heating. Heating above the Curie temperature of the magnetic microparticles embedded within the soft material allows their facile magnetization in desired directions via small external magnetic fields. To achieve parallel and rapid magnetic programming, we developed an integrated digital laser processing and magnetic field generation system, facilitating the generation of desired shapes and patterns at high resolution. The performance of the pattern generation and of the magnetic soft material is experimentally evaluated. Employing the described magnetic programming framework, shape-morphing of magnetic soft structures with varying magnetic profiles is demonstrated. The proposed approach establishes a rapid and facile encoding procedure with high-throughput magnetic programming potential.
|
|
15:45-15:50, Paper ThDT16.7 | |
Proprioceptive State Estimation for Amphibious Tactile Sensing |
|
Han, Xudong | Southern University of Science and Technology |
Guo, Ning | Southern University of Science and Technology |
Zhong, Shuqiao | Southern University of Science and Technology |
Zhou, Zhiyuan | Southern University of Science and Technology |
Lin, Jian | Southern University of Science and Technology |
Song, Chaoyang | Southern University of Science and Technology |
Wan, Fang | Southern University of Science and Technology |
Keywords: Modeling, Control, and Learning for Soft Robots, Computer Vision for Other Robotic Applications, Grasping, Proprioceptive State Estimation
Abstract: This paper presents a novel vision-based proprioception approach for a soft robotic finger that can estimate and reconstruct tactile interactions in terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compact in-finger camera captures high-framerate images of the finger's deformation during contact, extracting crucial tactile data in real time. We present a volumetric discretized model of the soft finger and use the geometry constraints captured by the camera to find the optimal estimation of the deformed shape. The approach is benchmarked using a motion capture system with sparse markers and a haptic device with dense measurements. Both results show state-of-the-art accuracies, with a median error of 1.96 mm for overall body deformation, corresponding to 2.1% of the finger's length. More importantly, the state estimation is robust in both on-land and underwater environments, as we demonstrate its usage for underwater object shape sensing. This combination of passive adaptation and real-time tactile sensing paves the way for amphibious robotic grasping applications. All codes are shared on GitHub: https://github.com/ancorasir/PropSE.
|
|
ThDT17 |
405 |
Planning with Contact |
Regular Session |
Chair: Lozano-Perez, Tomas | MIT |
Co-Chair: Stueckler, Joerg | University of Augsburg |
|
15:15-15:20, Paper ThDT17.1 | |
Fast Contact-Implicit Model Predictive Control |
|
Le Cleac'h, Simon | Stanford University |
Howell, Taylor | Stanford University |
Yang, Shuo | Carnegie Mellon University |
Lee, Chi-Yen | Carnegie Mellon University |
Zhang, John | Carnegie Mellon University |
Bishop, Arun | Carnegie Mellon University |
Schwager, Mac | Stanford University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Optimization and Optimal Control, Model Predictive Control, Legged Robots, Motion Control
Abstract: We present a general approach for controlling robotic systems that make and break contact with their environments. Contact-implicit model predictive control (CI-MPC) generalizes linear MPC to contact-rich settings by utilizing a bi-level planning formulation with lower-level contact dynamics formulated as time-varying linear complementarity problems (LCPs) computed using strategic Taylor approximations about a reference trajectory. These dynamics enable the upper-level planning problem to reason about contact timing and forces, and generate entirely new contact-mode sequences online. To achieve reliable and fast numerical convergence, we devise a structure-exploiting interior-point solver for these LCP contact dynamics and a custom trajectory optimizer for the tracking problem. We demonstrate real-time solution rates for CI-MPC and the ability to generate and track non-periodic behaviours in hardware experiments on a quadrupedal robot. We also show that the controller is robust to model mismatch and can respond to disturbances by discovering and exploiting new contact modes across a variety of robotic systems in simulation, including a pushbot, planar hopper, planar quadruped, and planar biped.
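To give a flavor of the LCP view of contact dynamics used above, here is a deliberately tiny Python sketch: one point mass and one ground contact, solved with projected Gauss-Seidel. It is not the paper's structure-exploiting interior-point solver, and the numbers are illustrative.

```python
# Minimal LCP contact example: find lam >= 0 with w = A lam + b >= 0 and
# lam * w = 0 per contact, solved by projected Gauss-Seidel.
import numpy as np

def solve_lcp_pgs(A, b, iters=50):
    lam = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            r = A[i] @ lam + b[i]
            lam[i] = max(0.0, lam[i] - r / A[i, i])
    return lam

# One contact: mass m falling with velocity v under gravity for one time step h.
m, g, h, v = 1.0, 9.81, 0.01, -0.3
A = np.array([[1.0 / m]])          # effective inverse mass along the contact normal
b = np.array([v - h * g])          # predicted normal velocity without any impulse
lam = solve_lcp_pgs(A, b)          # non-negative contact impulse
v_next = v - h * g + lam[0] / m    # post-step normal velocity (complementary, >= 0)
print(lam, v_next)
```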
|
|
15:20-15:25, Paper ThDT17.2 | |
Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation |
|
Lou, Haozhe | University of Southern California |
Liu, Yurong | Beijing Institute of Technology |
Pan, Yike | University of Michigan |
Geng, Yiran | Peking University |
Chen, Jianteng | Hong Kong University of Science and Technology |
Ma, Wenlong | Beijing Institute of Technology |
Li, Chenglong | Beijing Institute of Technology |
Wang, Lin | Beijing Institute of Technology |
Feng, Hengzhen | Beijing Institute of Technology |
Shi, Lu | Tsinghua University |
Shi, Yongliang | Tsinghua University |
Keywords: Simulation and Animation, Methods and Tools for Robot System Design, Software Architecture for Robotics and Automation
Abstract: Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. Existing methods lack a comprehensive solution to accurately reconstruct real-world objects with spatial representations and their associated physics attributes. We propose a Real2Sim pipeline with a hybrid representation model that integrates mesh geometry, 3D Gaussian kernels, and physics attributes to enhance the digital asset representation of robotic arms. This hybrid representation is implemented through a Gaussian-Mesh-Pixel binding technique, which establishes an isomorphic mapping between mesh vertices and Gaussian models. This enables a fully differentiable rendering pipeline that can be optimized through numerical solvers, achieves high-fidelity rendering via Gaussian Splatting, and facilitates physically plausible simulation of the robotic arm's interaction with its environment using mesh-based methods. Given the digital assets, we propose a manipulable Real2Sim pipeline that standardizes coordinate systems and scales, ensuring the seamless integration of multiple components. In addition to reconstructing the robotic arm, the surrounding static background and objects can be holistically reconstructed, enabling seamless interactions between the robotic arm and its environment. We also provide datasets covering various robotic manipulation tasks and robotic arm mesh reconstructions. These datasets include real-world motion captured in digital assets, ensuring precise representation of mass and friction, which are crucial for robotic manipulation. Our model achieves state-of-the-art results in realistic rendering and mesh reconstruction quality for robotic applications.
|
|
15:25-15:30, Paper ThDT17.3 | |
One-Shot Manipulation Strategy Learning by Making Contact Analogies |
|
Liu, Yuyao | Tsinghua University |
Mao, Jiayuan | MIT |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Lozano-Perez, Tomas | MIT |
Kaelbling, Leslie | MIT |
Keywords: Integrated Planning and Learning, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: We present a novel approach, MAGIC (manipulation analogies for generalizable intelligent contacts), for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects. By leveraging a reference action trajectory, MAGIC effectively identifies similar contact points and sequences of actions on novel objects to replicate a demonstrated strategy, such as using different hooks to retrieve distant objects of different shapes and sizes. Our method is based on a two-stage contact-point matching process that combines global shape matching using pretrained neural features with local curvature analysis to ensure precise and physically plausible contact points. We experiment with three tasks including scooping, hanging, and hooking objects. MAGIC demonstrates superior performance over existing methods, achieving significant improvements in runtime speed and generalization to different object categories. Website: https://magic-2024.github.io/.
|
|
15:30-15:35, Paper ThDT17.4 | |
Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation Using Parallelizable Physics Simulators |
|
Baumeister, Fabian | Max Planck Institute for Intelligent Systems |
Mack, Lukas | University of Augsburg |
Stueckler, Joerg | University of Augsburg |
Keywords: Incremental Learning, Integrated Planning and Learning, Learning from Experience
Abstract: Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which incrementally adapts a physics-based dynamics model for model-predictive control (MPC). The model prediction is aligned with a few examples of robot-object interactions collected with the MPC. This is achieved by using a parallelizable rigid-body physics simulation as a dynamic world model and sampling-based optimization of the model parameters. In turn, the optimized dynamics model can be used for MPC using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in object pushing experiments in simulation and with a real robot.
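A hedged sketch of the sampling-based parameter-alignment idea: a cross-entropy-style loop fits simulator parameters to a few recorded interactions. The toy push model, parameter names, and recorded data are hypothetical stand-ins for the paper's parallelized rigid-body simulation.

```python
# Cross-entropy-style fitting of simulator parameters to a few real interactions.
import numpy as np

def simulate_push(params, action):
    friction, mass = params
    return action / (mass * (1.0 + friction))   # toy "object displacement" model

def fit_params(real_actions, real_displacements, iters=20, pop=64, elite=8):
    mean, std = np.array([0.5, 1.0]), np.array([0.3, 0.5])
    for _ in range(iters):
        samples = np.abs(np.random.normal(mean, std, size=(pop, 2)))
        errs = [np.mean((simulate_push(s, real_actions) - real_displacements) ** 2)
                for s in samples]
        elites = samples[np.argsort(errs)[:elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

# A few "real" examples generated from hidden parameters (friction=0.2, mass=2.0).
actions = np.array([0.5, 1.0, 1.5])
displacements = simulate_push(np.array([0.2, 2.0]), actions)
# Note: this toy model only constrains the product mass*(1+friction), so the fit
# returns parameters consistent with the data rather than the exact hidden pair.
print(np.round(fit_params(actions, displacements), 2))
```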
|
|
15:35-15:40, Paper ThDT17.5 | |
Efficient Gradient-Based Inference for Manipulation Planning in Contact Factor Graphs |
|
Lee, Jeongmin | Seoul National University |
Park, Sunkyung | Seoul National University |
Lee, Minji | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Manipulation Planning, Contact Modeling, Dexterous Manipulation
Abstract: This paper presents a framework designed to tackle a range of planning problems arising in manipulation, which typically involve complex geometric-physical reasoning related to contact and dynamic constraints. We introduce the Contact Factor Graph (CFG) to graphically model these diverse factors, enabling us to perform inference on the graphs to approximate the distribution and sample appropriate solutions. We propose a novel approach that can incorporate various phenomena of contact manipulation as differentiable factors, and we develop an efficient inference algorithm for CFG that leverages this differentiability along with the conditional probabilities arising from the structured nature of contact. Our results demonstrate the capability of our framework in generating viable samples and approximating posterior distributions for various manipulation scenarios.
|
|
15:40-15:45, Paper ThDT17.6 | |
Polyhedral Collision Detection Via Vertex Enumeration |
|
Cinar, Andrew | Vanderbilt University |
Zhao, Yue | Vanderbilt University |
Laine, Forrest | Vanderbilt University |
Keywords: Collision Avoidance, Constrained Motion Planning
Abstract: Collision detection is a critical functionality for robotics. The degree to which objects collide cannot be represented as a continuously differentiable function for any shapes other than spheres. This paper proposes a framework for handling collision detection between polyhedral shapes. We frame the signed distance between two polyhedral bodies as the optimal value of a convex optimization, and consider constraining the signed distance in a bilevel optimization problem. To avoid relying on specialized bilevel solvers, our method exploits the fact that the signed distance is the minimal point of a convex region related to the two bodies. Our method enumerates the values obtained at all extreme points of this region and lists them as constraints in the higher-level problem. We compare our formulation to existing methods in terms of accuracy and speed when solved using the same mixed complementarity problem solver. We demonstrate that our approach more reliably solves difficult collision detection problems with multiple obstacles than other methods, and is faster than existing methods in some cases.
|
|
15:45-15:50, Paper ThDT17.7 | |
Flying Calligrapher: Contact-Aware Motion and Force Planning and Control for Aerial Manipulation |
|
Guo, Xiaofeng | Carnegie Mellon Univeristy |
He, Guanqi | Carnegie Mellon University |
Xu, Jiahe | Carnegie Mellon University |
Mousaei, Mohammadreza | Carnegie Mellon University |
Geng, Junyi | Pennsylvania State University |
Scherer, Sebastian | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Integrated Planning and Control
Abstract: Aerial manipulation has gained interest for completing high-altitude tasks that are challenging for human workers, such as contact inspection and defect detection. Previous research has focused on maintaining static contact points or forces. This letter addresses a more general and dynamic task: simultaneously tracking time-varying contact force in the surface normal direction and motion trajectories on tangential surfaces. We propose a pipeline that includes a contact-aware trajectory planner to generate dynamically feasible trajectories, and a hybrid motion-force controller to track such trajectories. We demonstrate the approach in an aerial calligraphy task using a novel sponge pen design as the end-effector, whose stroke width is positively related to the contact force. Additionally, we develop a touchscreen interface for flexible user input. Experiments show our method can effectively draw diverse letters, achieving an IoU of 0.59 and an end-effector position (force) tracking RMSE of 2.9 cm (0.7 N). Website: https://xiaofeng-guo.github.io/flying-calligrapher/.
|
|
ThDT18 |
406 |
Imaging, Scanning, Localization |
Regular Session |
Chair: Jiang, Zhongliang | Technical University of Munich |
Co-Chair: Huang, Baoru | Imperial College London |
|
15:15-15:20, Paper ThDT18.1 | |
Autonomous Robotic Ultrasound Approach for Fetoscope Tracking by Fusing Optical and 2D Ultrasound Data |
|
Cai, Yuyu | KU Leuven |
Li, Ruixuan | KU Leuven |
Davoodi, Ayoob | Katholieke Universiteit Leuven(KU Leuven) |
Ourak, Mouloud | University of Leuven |
Deprest, Jan | University Hospital Leuven |
Vander Poorten, Emmanuel B | KU Leuven |
Keywords: Medical Robots and Systems, Sensor Fusion
Abstract: 2D ultrasound (US) guidance is an essential tool in fetoscopic laser photocoagulation (FLP) to treat twin-to-twin transfusion syndrome (TTTS). During the procedure, the sonographer and endoscopic surgeon manage different image modalities, each with its own field of view. Tacit collaboration is needed between them to visualize the right information and ensure the smooth operation of the procedure. Robotic approaches could simplify this interaction but would require robust localization tools to cope with the complex fetoscopic motion patterns. This study proposes a method for robotic ultrasound (rUS) fetoscope tracking, fusing an optical tracking system (OTS) and 2D US imaging. A Kalman filter is designed to guarantee robust online registration and enhance fetoscope tracking. Real-time detection of the fetoscope tip is achieved using the You Only Look Once (YOLO v7) algorithm. Additionally, a US image-based searching strategy is proposed for situations where the optical camera is obstructed. Hybrid position-force control is employed to manipulate the US probe safely against the pregnant abdomen. Validation on a silicone phantom demonstrates accurate tracking results, with a mean error below 2.59 mm and tip visibility exceeding 90% in most experiments. The proposed system could potentially reduce surgeon workload and training costs for FLP surgery.
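As a minimal illustration of fusing two position sources in a Kalman filter, the sketch below runs a constant-velocity filter with separate measurement updates for an optical tracker and an image-based tip detection; the dimensions, noise covariances, and measurement values are illustrative assumptions, not the paper's filter design.

```python
# Constant-velocity Kalman filter fusing two position measurement sources.
import numpy as np

dt = 0.05
F = np.block([[np.eye(3), dt * np.eye(3)], [np.zeros((3, 3)), np.eye(3)]])
H = np.hstack([np.eye(3), np.zeros((3, 3))])        # both sensors observe position
Q = 1e-4 * np.eye(6)
R_optical, R_us = 0.5e-3 * np.eye(3), 2.0e-3 * np.eye(3)

x, P = np.zeros(6), np.eye(6)

def kf_step(x, P, z, R):
    x = F @ x                                  # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                        # update with measurement z
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = kf_step(x, P, np.array([0.010, 0.002, 0.050]), R_optical)  # optical update
x, P = kf_step(x, P, np.array([0.012, 0.001, 0.049]), R_us)       # ultrasound update
print(np.round(x[:3], 4))
```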
|
|
15:20-15:25, Paper ThDT18.2 | |
Guiding the Last Centimeter: Novel Anatomy-Aware Probe Servoing for Standardized Imaging Plane Navigation in Robotic Lung Ultrasound (I) |
|
Ma, Xihan | Worcester Polytechnic Institute |
Zeng, Mingjie | Worcester Polytechnic Institute |
Hill, Jeffrey C. | MCPHS University |
Hoffmann, Beatrice | Beth Israel Deaconess Medical Center |
Zhang, Ziming | Worcester Polytechnic Institute |
Zhang, Haichong | Worcester Polytechnic Institute |
Keywords: Medical Robots and Systems, Visual Servoing, Object Detection, Segmentation and Categorization
Abstract: Navigating the ultrasound (US) probe to the standardized imaging plane (SIP) for image acquisition is a critical but operator-dependent task in conventional freehand diagnostic US. Robotic US systems (RUSS) offer the potential to enhance imaging consistency by leveraging real-time US image feedback to optimize the probe pose, thereby reducing reliance on operator expertise. However, determining the proper approach to extracting generalizable features from the US images for probe pose adjustment remains challenging. In this work, we propose a SIP navigation framework for RUSS, exemplified in the context of robotic lung ultrasound (LUS). This framework facilitates automatic probe adjustment when in proximity to the SIP. This is achieved by explicitly extracting multiple anatomical features presented in real-time LUS images and performing non-patient-specific template matching to generate probe motion towards the SIP using image-based visual servoing (IBVS). The framework is further integrated with the active-sensing end-effector (A-SEE), a customized robot end-effector that leverages patient external body geometry to maintain optimal probe alignment with the contact surface, thus preserving US signal quality throughout the navigation. The proposed approach ensures procedural interpretability and inter-patient adaptability. Validation is conducted through anatomy-mimicking phantom and in-vivo evaluations involving five human subjects.
|
|
15:25-15:30, Paper ThDT18.3 | |
Automatic Robotic-Assisted Diffuse Reflectance Spectroscopy Scanning System |
|
Deng, Kaizhong | Imperial College London |
Peters, Christopher | Imperial College London |
Mylonas, George | Imperial College London |
Elson, Daniel | Imperial College London |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Visual Servoing
Abstract: Diffuse Reflectance Spectroscopy (DRS) is a well-established optical technique for tissue composition assessment which has been validated for tumour detection to ensure the complete removal of cancerous tissue. While point-wise assessment has many potential applications, incorporating automated large-area scanning would enable holistic tissue sampling with higher consistency. We propose a robotic system to facilitate autonomous DRS scanning with hybrid visual servoing control. A specially designed height compensation module enables precise contact condition control. The evaluation results show that the system can accurately execute the scanning command and acquire consistent DRS spectra with comparable results to the manual collection, which is the current gold standard protocol. Integrating the proposed system into surgery lays the groundwork for autonomous intra-operative DRS tissue assessment with high reliability and repeatability. This could reduce the need for manual scanning by the surgeon while ensuring complete tumor removal in clinical practice.
|
|
15:30-15:35, Paper ThDT18.4 | |
Robust and Accurate Multi-View 2D/3D Image Registration with Differentiable X-Ray Rendering and Dual Cross-View Constraints |
|
Cui, Yuxin | Shandong University |
Min, Zhe | University College London |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Robust and accurate 2D/3D registration, which aligns the preoperative model and the intraoperative image of the same anatomy, plays an important role in enabling successful interventional navigation. To alleviate the challenge of the limited field of view associated with a single intraoperative image, more than one intraoperative image can be leveraged, and multi-view 2D/3D registration is thus needed. In this paper, we propose a novel multi-view 2D/3D rigid registration approach which consists of two stages. In the first stage, the network is trained with a combined loss function consisting of the differences between the predicted and ground-truth poses and the dissimilarities (e.g., normalized cross-correlation) between the simulated and observed intraoperative images. More importantly, additional cross-view training loss terms are formulated for both the pose and image losses to explicitly consider the cross-view constraints. In the second stage, test-time optimization is conducted to refine the poses estimated in the coarse stage. Our method leverages the mutual constraints of multi-frame view projection poses to enhance the robustness of the multi-view 2D/3D registration approach. The proposed framework achieves an mTRE of 0.79±2.17 mm on six datasets from DeepFluoro, further advancing beyond the state-of-the-art registration algorithms on this dataset.
|
|
15:35-15:40, Paper ThDT18.5 | |
Robust Robotic Breast Ultrasound Scanning and Real-Time Lesion Localization |
|
Cao, Zhiyan | Huazhong University of Science and Technology |
Wang, Yiwei | Huazhong University of Science and Technology |
Zhao, Huan | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Zhang, Shaohua | Huazhong University of Science and Technology |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: The inherent flexibility and real-time deformation of breast tissue pose significant challenges for achieving full coverage and accurate lesion localization in autonomous breast ultrasound scanning. This paper introduces a robust finite state machine-based framework that mimics the decision-making process of an experienced physician, dynamically transitioning between a global breast scan and a fine lesion scan. An autonomous radial and anti-radial global scan pattern ensures comprehensive breast coverage. To avoid lesion misidentification caused by soft tissue movement, a real-time lesion fine scan method is proposed for lesion detection and localization. Experimental results demonstrate that, in full-coverage tests, our system identifies all 7 of 7 existing lesions and maintains a robust localization accuracy of 3.23 mm across phantoms with varying stiffnesses.
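A bare-bones finite-state-machine skeleton in the spirit of the described framework is sketched below; the state names and transition triggers are assumptions, not the paper's exact logic.

```python
# Illustrative FSM: alternate between a global scan and a fine lesion scan,
# resuming the global scan once a lesion is localized.
from enum import Enum, auto

class ScanState(Enum):
    GLOBAL_SCAN = auto()
    FINE_SCAN = auto()
    DONE = auto()

def next_state(state, lesion_detected, lesion_localized, coverage_complete):
    if state is ScanState.GLOBAL_SCAN:
        if lesion_detected:
            return ScanState.FINE_SCAN
        if coverage_complete:
            return ScanState.DONE
    elif state is ScanState.FINE_SCAN and lesion_localized:
        return ScanState.GLOBAL_SCAN       # resume coverage after localization
    return state

s = ScanState.GLOBAL_SCAN
s = next_state(s, lesion_detected=True, lesion_localized=False, coverage_complete=False)
s = next_state(s, lesion_detected=False, lesion_localized=True, coverage_complete=False)
print(s)  # back to GLOBAL_SCAN
```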
|
|
15:40-15:45, Paper ThDT18.6 | |
Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-Assisted Radioguided Surgery |
|
Zhang, Hanyi | Imperial College London |
Deng, Kaizhong | Imperial College London |
Hu, Zhaoyang Jacopo | Imperial College London |
Huang, Baoru | Imperial College London |
Elson, Daniel | Imperial College London |
Keywords: Medical Robots and Systems, Surgical Robotics: Laparoscopy, Reinforcement Learning
Abstract: Radioguided surgery, such as sentinel lymph node biopsy, relies on the precise localization of radioactive targets by non-imaging gamma/beta detectors. Manual radioactive target detection based on visual display or audible indication of gamma level is highly dependent on the ability of the surgeon to track and interpret the spatial information. This paper presents a learning-based method to realize autonomous radiotracer detection in robot-assisted surgeries by navigating the probe to the radioactive target. We propose a novel hybrid approach that combines deep reinforcement learning (DRL) with adaptive robotic scanning. The adaptive grid-based scanning provides an initial direction estimate, while the DRL-based agent efficiently navigates to the target utilising historical data. Simulation experiments demonstrate a 95% success rate, and improved efficiency and robustness compared to conventional techniques. Real-world evaluation on the da Vinci Research Kit (dVRK) further confirms the feasibility of the approach, achieving an 80% success rate in radiotracer detection. This method has the potential to enhance consistency, reduce operator dependency, and improve procedural accuracy in radioguided surgeries.
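As a simplified stand-in for the described pipeline, the sketch below replaces the DRL agent with a plain greedy hill-climb on simulated gamma counts after a coarse grid scan; it is for illustration only and does not reproduce the paper's learned policy. The count model and source position are invented.

```python
# Coarse grid scan followed by greedy refinement on simulated gamma counts.
import numpy as np

rng = np.random.default_rng(0)
source = np.array([0.7, 0.3])              # hidden radiotracer position (toy)

def counts(p):                             # inverse-square-style rate + Poisson noise
    return rng.poisson(200.0 / (0.01 + np.sum((p - source) ** 2)))

grid = [np.array([x, y]) for x in np.linspace(0, 1, 4) for y in np.linspace(0, 1, 4)]
probe = max(grid, key=counts)              # coarse scan: start at the hottest cell

step = 0.08
for _ in range(40):                        # greedy refinement toward the source
    moves = [probe + step * d for d in
             (np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1]))]
    best = max(moves, key=counts)
    if counts(best) <= counts(probe):
        step *= 0.5                        # shrink steps near the peak
    else:
        probe = best
print(np.round(probe, 2), "target:", source)
```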
|
|
15:45-15:50, Paper ThDT18.7 | |
Improving Probe Localization for Freehand 3D Ultrasound Using Lightweight Cameras |
|
Huang, Dianye | Technical University of Munich |
Navab, Nassir | TU Munich |
Jiang, Zhongliang | Technical University of Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Ultrasound (US) probe localization relative to the examined subject is essential for freehand 3D US imaging, which offers significant clinical value due to its affordability and unrestricted field of view. However, existing methods often rely on expensive tracking systems or bulky probes, while recent US image-based deep learning methods suffer from accumulated errors during probe maneuvering. To address these challenges, this study proposes a versatile, cost-effective probe pose localization method for freehand 3D US imaging, utilizing two lightweight cameras. To eliminate accumulated errors during US scans, we introduce PoseNet, which directly predicts the probe's 6D pose relative to a preset world coordinate system based on camera observations. We first jointly train pose and camera image encoders based on pairs of 6D pose and camera observations densely sampled in simulation. This will encourage each pair of probe pose and its corresponding camera observation to share the same representation in latent space. To ensure the two encoders handle unseen images and poses effectively, we incorporate a triplet loss that enforces smaller differences in latent features between nearby poses compared to distant ones. Then, the pose decoder uses the latent representation of the camera images to predict the probe's 6D pose. To bridge the sim-to-real gap, in the real world, we use the trained image encoder and pose decoder for initial predictions, followed by an additional MLP layer to refine the estimated pose, improving accuracy. The results obtained from an arm phantom demonstrate the effectiveness of the proposed method, which notably surpasses state-of-the-art techniques, achieving average positional and rotational errors of 2.03 mm and 0.37 deg, respectively.
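The triplet constraint mentioned above can be written compactly; the sketch below uses random vectors as stand-ins for latent features of camera images taken from nearby and distant probe poses, and the margin value is an assumption.

```python
# Triplet margin constraint: features from nearby poses should be closer than
# features from distant poses by at least a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(1)
z_anchor = rng.normal(size=32)                     # latent of image at pose p
z_near = z_anchor + 0.05 * rng.normal(size=32)     # image from a nearby pose
z_far = rng.normal(size=32)                        # image from a distant pose
print(round(triplet_loss(z_anchor, z_near, z_far), 3))  # ~0 when already satisfied
```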
|
|
ThDT19 |
407 |
Manufacturing and Assembly Processes |
Regular Session |
Chair: Fox, Dieter | University of Washington |
Co-Chair: Fang, Kuan | Cornell University |
|
15:15-15:20, Paper ThDT19.1 | |
Robot-Based Automatic Charging for Electric Vehicles Using Incremental Learning and Biomimetic Control |
|
Zeng, Chao | University of Liverpool |
Ye, Dexi | South China University of Technology |
Wang, Ning | Sheffield Hallam University |
Feng, Chen | Zhejiang VIE Science & Technology Co., Ltd |
Yang, Chenguang | University of Liverpool |
Keywords: Compliant Assembly, Incremental Learning, Compliance and Impedance Control
Abstract: With the growing popularity of electric vehicles, the demand for robot-based unmanned automatic charging has become both urgent and challenging. Two key challenges need to be addressed: how to efficiently locate the charging port, and how to compliantly insert the connector into the port. In this paper, we propose an incremental learning method based on the broad learning system to address the visual positioning error of the charging port. This method allows the robot to transfer and generalize the search skills learned in simulation to real-world scenarios. As a result, the robot can rapidly locate the charging port in real world environments without the need for complex contact state modeling, time-consuming data collection, or model retraining. Subsequently, a biomimetic admittance controller is designed to enable the robot to adapt its compliant behavior online during the plugging process. Finally, experiments are performed on a UR robot to verify the effectiveness of our method.
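For context, a standard discrete admittance-control update (M a + D v + K x = f_ext) is sketched below; the paper's biomimetic controller adapts its compliant behavior online, which this generic version does not model, and the gains are illustrative.

```python
# Generic discrete admittance control step for one Cartesian axis.
import numpy as np

def admittance_step(x, v, f_ext, M=2.0, D=40.0, K=200.0, dt=0.002):
    """Return updated deviation x and velocity v of the compliant frame."""
    a = (f_ext - D * v - K * x) / M
    v = v + a * dt
    x = x + v * dt
    return x, v

x, v = 0.0, 0.0
for _ in range(500):                 # 1 s of a constant 5 N contact force
    x, v = admittance_step(x, v, f_ext=5.0)
print(round(x, 4))                   # settles near f/K = 0.025 m
```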
|
|
15:20-15:25, Paper ThDT19.2 | |
CC-STAR: An Estimation for Contact State Transition Using Reconstruction-Based Anomaly Detection for Peg-In-Hole Assembly |
|
Lee, Haeseong | Graduate School of Convergence Science and Technology, Seoul Nat |
Sung, Eunho | Seoul National University |
You, Seungbin | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Assembly, Intelligent and Flexible Manufacturing, AI-Enabled Robotics
Abstract: For successful peg-in-hole assembly, predefined sub-tasks should be executed sequentially according to the current contact state. Therefore, recognizing contact state transitions is essential in order to determine whether to continue the current task or proceed to the next. In that context, learning-based solutions have shown outstanding results. However, these methods heavily rely on balanced datasets, which are challenging to obtain due to the short duration of certain contact states and rare failure cases. To address this issue, this paper proposes a framework for estimating contact state transitions using anomaly detection through input data reconstruction. The proposed framework operates in a semi-supervised manner, eliminating the need for balanced datasets during training. For input data reconstruction, a convolutional neural network is combined with a variational autoencoder to process various sensor measurements as a multivariate time series. Unlike traditional binary anomaly detection, the proposed anomaly detector scores reconstruction errors and leverages domain knowledge to identify various contact state transitions in the peg-in-hole assembly. The effectiveness of the proposed framework is validated through experiments using a torque-controlled dual manipulator system.
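As a hedged illustration of reconstruction-based anomaly scoring, the sketch below uses a PCA reconstruction (in place of the paper's CNN + variational autoencoder) fit on nominal signal windows, flagging samples whose reconstruction error exceeds a percentile threshold; the data and dimensions are synthetic stand-ins.

```python
# Reconstruction-error anomaly scoring with a PCA model fit on nominal data only.
import numpy as np

rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 0.1, size=(500, 12))          # nominal F/T + pose windows

mean = nominal.mean(axis=0)
_, _, Vt = np.linalg.svd(nominal - mean, full_matrices=False)
basis = Vt[:4]                                           # keep 4 principal components

def anomaly_score(x):
    recon = mean + (x - mean) @ basis.T @ basis          # project and reconstruct
    return np.linalg.norm(x - recon)

threshold = np.percentile([anomaly_score(x) for x in nominal], 99)
new_sample = rng.normal(0.5, 0.3, size=12)               # e.g. a sudden contact change
print(anomaly_score(new_sample) > threshold)             # True -> flag a transition
```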
|
|
15:25-15:30, Paper ThDT19.3 | |
Blox-Net: Generative Design-For-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset |
|
Goldberg, Andrew | University of California Berkeley |
Kondap, Kavish | University of California, Berkeley |
Qiu, Tianshuang | University of California, Berkeley |
Ma, Zehan | University of California, Berkeley |
Fu, Letian | UC Berkeley |
Kerr, Justin | University of California, Berkeley |
Huang, Huang | University of California at Berkeley |
Chen, Kaiyuan | University of California, Berkeley |
Fang, Kuan | Cornell University |
Goldberg, Ken | UC Berkeley |
Keywords: Assembly, Robotics and Automation in Construction, AI-Based Methods
Abstract: Generative AI systems have shown impressive capabilities in creating text, code, and images. Inspired by the importance of research in industrial Design for Assembly, we introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA). The task is to generate an assembly based on a natural language prompt (e.g., “giraffe”) and an image of available physical components, such as 3D-printed blocks. The output is an assembly, a spatial arrangement of these components, accompanied by instructions for a robot to build it. The output geometry must 1) resemble the requested object and 2) be reliably assembled by a 6 DoF robot arm with a suction gripper. We then present Blox-Net, a GDfRA system that combines generative vision language models with well-established methods in computer vision, simulation, perturbation analysis, motion planning, and physical robot experimentation to solve a class of GDfRA problems without human supervision. Blox-Net achieved a Top-1 accuracy of 63.5% in the semantic accuracy of its designed assemblies. Six designs, after Blox-Net’s automated perturbation redesign, were reliably assembled by a robot, achieving near-perfect success across 10 consecutive assembly iterations with human intervention only during reset prior to assembly. The entire pipeline from the textual word to reliable physical assembly is performed without human intervention.
|
|
15:30-15:35, Paper ThDT19.4 | |
Geometry and Force-Informed Robotic Assembly with Small Relative Initial Deviations for Circular Electrical Connectors |
|
Wang, Zhenyu | Huazhong University of Science and Technology |
Li, Xiangfei | Huazhong University of Science and Technology |
Zhao, Huan | Huazhong University of Science and Technology |
Shao, Lingjun | Huazhong University of Science and Technology |
Zhang, Hao | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Keywords: Assembly, Compliant Assembly
Abstract: Circular electrical connectors (CECs) have a wide range of applications in scenarios that require reliable connections. However, sockets are often located in narrow scenes with random spatial orientations, complex lighting conditions, and obstructions from cables, making it difficult to accurately locate them through cameras. In addition, due to the complex geometric structure of CECs and the presence of electrode protection slots, existing research on the assembly of cylindrical or polygonal pegs and holes may not be applicable to such components. To this end, this article proposes a novel robotic assembly strategy for CECs with small relative initial deviations, the core of which is a search trajectory and a heuristic force strategy designed to perceive force/pose (F/P) discontinuity characteristics under different geometric constraints. This assembly strategy is independent of the CEC's size and is not affected by the socket's spatial orientation. Experiments with two different sizes of CECs are conducted on a robot equipped with a 6-dimensional force/torque (F/T) sensor, demonstrating the effectiveness and robustness of the proposed assembly strategy for CECs.
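The F/P discontinuity cue can be illustrated with a simple sketch that flags large sample-to-sample jumps in the measured axial force along the search trajectory; the threshold value and the use of only the axial force component are assumptions, and the paper's full search and heuristic force strategy is not reproduced.

```python
import numpy as np

def force_discontinuities(force_z, jump_threshold=1.5):
    """force_z: 1D array of axial force samples (in newtons) logged along the
    search trajectory. Returns sample indices where the jump between
    consecutive readings exceeds the threshold, i.e. candidate changes of the
    geometric constraint (e.g. the connector reaching a protection slot)."""
    jumps = np.abs(np.diff(force_z))
    return np.where(jumps > jump_threshold)[0] + 1
```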
|
|
15:35-15:40, Paper ThDT19.5 | |
MatchMaker: Automated Asset Generation for Robotic Assembly |
|
Wang, Yian | Umass Amherst |
Tang, Bingjie | University of Southern California |
Gan, Chuang | IBM |
Fox, Dieter | University of Washington |
Mo, Kaichun | NVIDIA |
Narang, Yashraj | NVIDIA |
Akinola, Iretiayo | Columbia University |
Keywords: Assembly, AI-Enabled Robotics, Computer Vision for Manufacturing
Abstract: Robotic assembly remains a significant challenge due to complexities in visual perception, functional grasping, contact-rich manipulation, and performing high-precision tasks. Simulation-based learning and sim-to-real transfer have led to recent success in solving assembly tasks in the presence of object pose variation, perception noise, and control error; however, the development of a generalist (i.e., multi-task) agent for a broad range of assembly tasks has been limited by the need to manually curate assembly assets, which greatly constrains the number and diversity of assembly problems that can be used for policy learning. Inspired by the recent success of using Generative AI to scale up robot learning, we propose MatchMaker, a pipeline to automatically generate diverse, simulation-compatible assembly asset pairs to facilitate learning assembly skills. Specifically, MatchMaker can 1) take a simulation-incompatible, interpenetrating asset pair as input and automatically convert it into a simulation-compatible, interpenetration-free pair, 2) take an arbitrary single asset as input and generate a geometrically-mating asset to create an asset pair, and 3) automatically erode contact surfaces from (1) or (2) according to a user-specified clearance parameter to generate realistic parts.
|
|
15:40-15:45, Paper ThDT19.6 | |
CNSv2: Probabilistic Correspondence Encoded Neural Image Servo |
|
Chen, Anzhe | Zhejiang University |
Yu, Hongxiang | Zhejiang University |
Li, Shuxin | Zhejiang University |
Chen, Yuxi | Zhejiang University |
Zhou, Zhongxiang | Zhejiang University |
Sun, WenTao | Beijing Institute of Technology |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Assembly, Intelligent and Flexible Manufacturing, Visual Servoing
Abstract: Visual servo approaches based on traditional image matching methods often require accurate keypoint correspondence for high-precision control. However, keypoint detection or matching tends to fail in challenging scenarios with inconsistent illumination or textureless objects, resulting in significant performance degradation. Previous approaches, including our proposed Correspondence encoded Neural image Servo policy (CNS), attempted to alleviate these issues by integrating neural control strategies. While CNS shows some improvement over conventional image-based controllers in handling erroneous correspondences, it could not fully resolve the limitations arising from poor keypoint detection and matching. In this paper, we continue to address this problem and propose a new solution: Probabilistic Correspondence Encoded Neural Image Servo (CNSv2). CNSv2 leverages probabilistic feature matching to improve robustness in challenging scenarios. By redesigning the architecture to condition on multimodal feature matching, CNSv2 achieves high precision, improved robustness across diverse scenes, and real-time operation. We validate CNSv2 with simulations and real-world experiments, demonstrating its effectiveness in overcoming the limitations of detector-based methods in visual servo tasks.
|
|
15:45-15:50, Paper ThDT19.7 | |
Supervised Representation Learning towards Generalizable Assembly State Recognition |
|
Schoonbeek, Tim Jeroen | Eindhoven University of Technology |
Balachandran, Goutham | ASML |
Onvlee, Hans | ASML |
Houben, Tim | Eindhoven University of Technology |
Hung, Shao-Hsuan | Eindhoven University of Technology |
Kustra, Jacek | ASML |
de With, Peter H.N. | Eindhoven University of Technology |
van der Sommen, Fons | Eindhoven University of Technology |
Keywords: Representation Learning, Computer Vision for Manufacturing, Deep Learning Methods
Abstract: Assembly state recognition facilitates the execution of assembly procedures, offering feedback to enhance efficiency and minimize errors. However, recognizing assembly states poses challenges in scalability, since parts are frequently updated, and the robustness to execution errors remains underexplored. To address these challenges, this paper proposes an approach based on representation learning and the novel intermediate-state informed loss function modification (ISIL). ISIL leverages unlabeled transitions between states and demonstrates significant improvements in clustering and classification performance for all tested architectures and losses. Although the model is trained exclusively on images without execution errors, a thorough analysis of error states demonstrates that our approach accurately distinguishes between correct states and states with various types of execution errors. The integration of the proposed algorithm can offer meaningful assistance to workers and mitigate unexpected losses due to procedural mishaps in industrial settings. The code and data are publicly available.
|
|
ThDT20 |
408 |
Agricultural Automation 3 |
Regular Session |
Chair: Papageorgiou, Dimitrios | Hellenic Mediterranean University |
Co-Chair: Berenson, Dmitry | University of Michigan |
|
15:15-15:20, Paper ThDT20.1 | |
Panoptic Segmentation with Partial Annotations for Agricultural Robots |
|
Weyler, Jan | University of Bonn |
Läbe, Thomas | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: A detailed analysis of agricultural fields is key to reducing the use of agrochemicals and achieving a more sustainable crop production. To this end, agricultural robots equipped with vision-based systems offer the potential to detect individual plants in the field automatically. This capability enables targeted management actions in the field, effectively reducing the amount of agrochemicals. A primary target of such vision systems is to perform a panoptic segmentation, combining the tasks of semantic and instance segmentation. Recent methods use neural networks for this task, which typically have to be trained on densely annotated images containing the required ground truth information for each pixel. Gathering these dense annotations is generally laborious and requires expert knowledge of the agricultural domain. In this paper, we propose a method to effectively reduce the annotation bottleneck and yet achieve high performance using partial annotations. These partial annotations contain ground truth information only for a subset of pixels per image and are thus much faster to obtain than dense annotations. We propose a novel set of losses that exploit measures from vector fields used in physics, i.e., divergence and curl, to effectively supervise predictions without ground truth annotations. The experimental evaluation shows that our approach outperforms several state-of-the-art methods that also aim to reduce the amount of annotation required.
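The divergence and curl measures used for supervision can be computed with finite differences on a dense 2D vector-field prediction, as in the hedged sketch below; the forward-difference scheme and how the two quantities enter the loss are assumptions.

```python
import torch

def divergence_and_curl(field):
    """field: (B, 2, H, W) tensor; channel 0 is the x-component and channel 1
    the y-component of the predicted vector field. Forward differences give
    per-pixel divergence and (scalar, out-of-plane) curl on an (H-1, W-1) grid."""
    fx, fy = field[:, 0], field[:, 1]
    dfx_dx = fx[:, :, 1:] - fx[:, :, :-1]
    dfy_dy = fy[:, 1:, :] - fy[:, :-1, :]
    dfy_dx = fy[:, :, 1:] - fy[:, :, :-1]
    dfx_dy = fx[:, 1:, :] - fx[:, :-1, :]
    div = dfx_dx[:, :-1, :] + dfy_dy[:, :, :-1]
    curl = dfy_dx[:, :-1, :] - dfx_dy[:, :, :-1]
    return div, curl
```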
|
|
15:20-15:25, Paper ThDT20.2 | |
Robotic 3D Flower Pose Estimation for Small-Scale Urban Farms |
|
Muriki, Venkata Harsh Suhith | Georgia Institute of Technology |
Teo, Hong Ray | Georgia Institute of Technology |
Sengupta, Ved | Georgia Tech Research Institute |
Hu, Ai-Ping | Georgia Tech Research Institute |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation
Abstract: The small scale of urban farms and the commercial availability of low-cost robots (such as the FarmBot) that automate simple tending tasks enable an accessible platform for plant phenotyping. We have used a FarmBot with a custom camera end-effector to estimate strawberry plant flower pose (for robotic pollination) from acquired 3D point cloud models. We describe a novel algorithm that translates individual occupancy grids along orthogonal axes of a point cloud to obtain 2D images corresponding to the six viewpoints. For each image, 2D object detection models for flowers are used to identify 2D bounding boxes which can be converted into the 3D space to extract flower point clouds. Pose estimation is performed by fitting three shapes (superellipsoids, paraboloids and planes) to the flower point clouds and compared with manually labeled ground truth. Our method successfully finds approximately 80% of flowers scanned using our customized FarmBot platform and has a mean flower pose error of 7.7 degrees, which is sufficient for robotic pollination and rivals previous results. All code will be made available at https://github.com/harshmuriki/flowerPose.git.
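The six-viewpoint projection can be sketched as a voxelization followed by a first-hit depth image along each axis direction; the voxel size, grid resolution, and depth-image formulation are assumptions, and the paper's occupancy-grid translation procedure is not reproduced exactly.

```python
import numpy as np

def six_view_depth_images(points, voxel=0.002, grid=256):
    """points: (N, 3) array in metres. Voxelize the cloud, then for each of
    the six axis-aligned viewpoints store the depth (voxel index) of the
    first occupied cell; 2D flower detectors are then run on these images."""
    p = points - points.min(axis=0)
    idx = np.clip((p / voxel).astype(int), 0, grid - 1)
    occ = np.zeros((grid, grid, grid), dtype=bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True

    views = {}
    for axis in range(3):
        for reverse in (False, True):
            vol = np.flip(occ, axis=axis) if reverse else occ
            hit = vol.any(axis=axis)
            depth = vol.argmax(axis=axis).astype(float)  # first occupied index
            depth[~hit] = np.nan                         # rays that hit nothing
            views[(axis, reverse)] = depth
    return views
```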
|
|
15:25-15:30, Paper ThDT20.3 | |
Fault Management System for the Safety of Perception Systems in Highly Automated Agricultural Machines |
|
Lee, Changjoo | Technical University of Munich |
Schätzle, Simon | STW (Sensor-Technik Wiedemann GmbH) |
Lang, Stefan Andreas | Sensor-Technik Wiedemann |
Maier, Michael | Technical University of Munich |
Oksanen, Timo | Technical University of Munich |
Keywords: Robotics and Automation in Agriculture and Forestry, Robot Safety, Deep Learning for Visual Perception
Abstract: Safe and reliable environmental perception is crucial for the highly automated or even autonomous operation of agricultural machines. However, developing such a system is challenging due to imperfect perception sensors. This article proposes a fault management system (FMS) for detecting, diagnosing, and mitigating risks that compromise the safety and reliability of perception systems. In particular, we develop an improved image quality safety model (IQSM) within the FMS to detect and diagnose the causes of performance insufficiencies in object detection. The IQSM achieves an accuracy of about 98%, demonstrating its ability to effectively identify performance insufficiencies under pre-defined hazardous scenarios.
|
|
15:30-15:35, Paper ThDT20.4 | |
Learning to Prune Branches in Modern Tree-Fruit Orchards |
|
Jain, Abhinav | Oregon State University |
Grimm, Cindy | Oregon State University |
Lee, Stefan | Oregon State University |
Keywords: Robotics and Automation in Agriculture and Forestry, Visual Servoing, Field Robots
Abstract: Dormant tree pruning is labor-intensive but essential to maintaining modern highly-productive fruit orchards. In this work we present a closed-loop visuomotor controller for robotic pruning. The controller guides the cutter through a cluttered tree environment to reach a specified cut point and ensures the cutters are perpendicular to the branch. We train the controller using a novel orchard simulation that captures the geometric distribution of branches in a target apple orchard configuration. Unlike traditional methods requiring full 3D reconstruction, our controller uses just optical flow images from a wrist-mounted camera. We deploy our learned policy in simulation and the real world for an example V-Trellis envy tree with zero-shot transfer, achieving a ~30% success rate -- approximately half the performance of an oracle planner.
|
|
15:35-15:40, Paper ThDT20.5 | |
Towards Safe and Efficient Through-The-Canopy Autonomous Fruit Counting with UAVs |
|
Yang, Teaya | UC Berkeley |
Ibrahimov, Roman | UC Berkeley |
Mueller, Mark Wilfried | University of California, Berkeley |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Agricultural Automation
Abstract: We present an autonomous aerial system for safe and efficient through-the-canopy fruit counting. Aerial robot applications in large-scale orchards face significant challenges due to the complexity of fine-tuning flight paths based on orchard layouts, canopy density, and plant variability. Through-the-canopy navigation is crucial for minimizing occlusion by leaves and branches but is more challenging due to the complex and dense environment compared to traditional over-the-canopy flights. Our system addresses these challenges by integrating: i) a high-fidelity simulation framework for global path planning, ii) a low-cost autonomy stack for canopy-level navigation and data collection, and iii) a robust workflow for fruit detection and counting using RGB images. We validate our approach through fruit counting with canopy-level aerial images and by demonstrating the autonomous navigation capabilities of our experimental vehicle.
|
|
15:40-15:45, Paper ThDT20.6 | |
Language-Guided Object Search in Agricultural Environments |
|
Balaji, Advaith | University of Michigan |
Pradhan, Saket | University of Michigan |
Berenson, Dmitry | University of Michigan |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning Methods
Abstract: Creating robots that can assist in farms and gardens can help reduce the mental and physical workload experienced by farm workers. We tackle the problem of object search in a farm environment, providing a method that allows a robot to semantically reason about the location of an unseen target object among a set of previously seen objects in the environment using a Large Language Model (LLM). We leverage object-to-object semantic relationships to plan a path through the environment that will allow us to accurately and efficiently locate our target object while also reducing the overall distance traveled, without needing high-level room or area-level semantic relationships. During our evaluations, we found that our method outperformed a current state-of-the-art baseline and our ablations. Our offline testing yielded an average path efficiency of 84%, reflecting how closely the predicted path aligns with the ideal path. Upon deploying our system on the Boston Dynamics Spot robot in a real-world farm environment, we found that our system had a success rate of 80%, with a success weighted by path length of 0.67, which demonstrates a reasonable trade-off between task success and path efficiency under real-world conditions. The project website can be viewed at: adi-balaji.github.io/losae
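The "success weighted by path length" (SPL) figure reported above follows the standard definition SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is binary success, l_i the shortest-path length, and p_i the length of the path actually traveled. A minimal computation is sketched below, assuming this standard definition is the one used.

```python
def spl(successes, shortest_lengths, actual_lengths):
    """Success weighted by Path Length over N episodes."""
    terms = [s * l / max(p, l)
             for s, l, p in zip(successes, shortest_lengths, actual_lengths)]
    return sum(terms) / len(terms)
```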
|
|
15:45-15:50, Paper ThDT20.7 | |
Robotic Grape Inspection and Selective Harvesting in Vineyards |
|
Stavridis, Sotiris | Aristotle University of Thessaloniki |
Droukas, Leonidas | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Papageorgiou, Dimitrios | Hellenic Mediterranean University |
Dimeas, Fotios | Aristotle University of Thessaloniki |
Soriano, Angel | Robotnik Automation SL |
Molina, Sergi | University of Lincoln |
Deiri, Ahmed Sami | SAGA Robotics |
Hutchinson, Michael | Saga Robotics |
Pulido Fentanes, Jaime | Saga Robotics |
Hroob, Ibrahim | University of Lincoln |
Polvara, Riccardo | University of Lincoln |
Hanheide, Marc | University of Lincoln |
Cielniak, Grzegorz | University of Lincoln |
Samarinas, Nikiforos | Laboratory of Remote Sensing, Spectroscopy, and GIS, School of A |
Kateris, Dimitrios | Centre for Research and Technology Hellas (CERTH) |
Bochtis, Dionysis | CERTH |
Peleka, Georgia | CERTH, Thessaloniki Greece |
Papadam, Stefanos | Certh / Iti |
Triantafyllou, Dimitra | CERTH |
Papadimitriou, Alexios | Certh / Iti |
Papadopoulos, Christos | ITI/CERTH |
Mariolis, Ioannis | CERTH |
Giakoumis, Dimitris | Centre for Research and Technology Hellas |
Tzovaras, Dimitrios | Centre for Research and Technology Hellas |
Keywords: Robotics and Automation in Agriculture and Forestry, Bimanual Manipulation, Computer Vision for Automation
Abstract: Driven by increasing food demand and the need for higher-quality cultivation, precision agriculture has grown steadily over the last decade. It involves the application of mobile robots and intelligent robotic technologies in various agricultural field tasks, across a variety of crop types. To compensate for the lack of selective robotic harvesting solutions for the high-value crop of grapes, the EU-funded project BACCHUS develops an intelligent mobile robotic system comprising two independent and cooperative robots: one for grape inspection and the collection of valuable data on maturity level, and one for the bimanual harvesting of grapes in a human-inspired manner. Validated via real-field trials, the proposed autonomous system advances precision agriculture for a particularly sensitive crop type in the challenging and heavily cluttered environment of vineyards, facilitating the selective harvesting of high-quality grapes.
|
|
ThDT21 |
410 |
Diffusion for Manipulation |
Regular Session |
Chair: Duong, Thai | Rice University |
Co-Chair: Pérez-D'Arpino, Claudia | NVIDIA |
|
15:15-15:20, Paper ThDT21.1 | |
ProDapt: Proprioceptive Adaptation Using Long-Term Memory Diffusion |
|
Pizarro Bejarano, Federico | University of Toronto |
Jones, Bryson | NASA Jet Propulsion Laboratory |
Pastor, Daniel | Caltech |
Bowkett, Joseph | NASA Jet Propulsion Laboratory |
Schoellig, Angela P. | TU Munich |
Backes, Paul | Jet Propulsion Laboratory |
Keywords: Machine Learning for Robot Control, Imitation Learning, Space Robotics and Automation
Abstract: Diffusion models have revolutionized imitation learning, allowing robots to replicate complex behaviours. However, diffusion often relies on cameras and other exteroceptive sensors to observe the environment and lacks long-term memory. In space, military, and underwater applications, robots must be highly robust to failures in exteroceptive sensors, operating using only proprioceptive information. In this paper, we propose ProDapt, a method of incorporating long-term memory of previous contacts between the robot and the environment in the diffusion process, allowing it to complete tasks using only proprioceptive data. This is achieved by identifying "keypoints", essential past observations maintained as inputs to the policy. We test our approach using a UR10e robotic arm in both simulation and real experiments and demonstrate the necessity of this long-term memory for task completion.
|
|
15:20-15:25, Paper ThDT21.2 | |
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners |
|
Ng, Wen Zheng Terence | Nanyang Technological University |
Chen, Jianda | Nanyang Technological University |
Xu, Yuan | Nanyang Technological University |
Zhang, Tianwei | Nanyang Technological University |
Keywords: Deep Learning Methods, Reinforcement Learning, Representation Learning
Abstract: This work addresses the challenge of personalizing automated decision-making systems by introducing a resource-efficient approach that enables rapid adaptation to individual users' preferences. Our method leverages a pretrained conditional diffusion model with Preference Latent Embeddings (PLE), trained on a large, reward-free offline dataset. The PLE serves as a compact representation for capturing specific user preferences. By adapting the pretrained model using our proposed preference inversion method, which directly optimizes the learnable PLE, we achieve superior alignment with human preferences compared to existing solutions like Reinforcement Learning from Human Feedback (RLHF) and Low-Rank Adaptation (LoRA). To better reflect practical applications, we create a benchmark experiment using real human preferences on diverse, optimal trajectories.
|
|
15:25-15:30, Paper ThDT21.3 | |
Joint Localization and Planning Using Diffusion |
|
Lao Beyer, Lukas | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Deep Learning Methods, Localization, Autonomous Vehicle Navigation
Abstract: Diffusion models have been successfully applied to robotics problems such as manipulation and vehicle path planning. In this work, we explore their application to end-to-end navigation -- including both perception and planning -- by considering the problem of jointly performing global localization and path planning in known but arbitrary 2D environments. In particular, we introduce a diffusion model which produces collision-free paths in a global reference frame given an egocentric LIDAR scan, an arbitrary map, and a desired goal position. To this end, we implement diffusion in the space of paths in SE(2), and describe how to condition the denoising process on both obstacles and sensor observations. In our evaluation, we show that the proposed conditioning techniques enable generalization to realistic maps of considerably different appearance than the training environment, demonstrate our model's ability to accurately describe ambiguous solutions, and run extensive simulation experiments showcasing our model's use as a real-time, end-to-end localization and planning stack.
|
|
15:30-15:35, Paper ThDT21.4 | |
Diverse Motion Planning with Stein Diffusion Trajectory Inference |
|
Zeya, Yin | University of Sydney |
Lai, Tin | University of Sydney |
Barcelos, Lucas | University of Sydney |
Jacob, Jayadeep | University of Sydney |
Li, Yong Hui | University of Sydney |
Ramos, Fabio | University of Sydney, NVIDIA |
Keywords: Probabilistic Inference, Integrated Planning and Learning
Abstract: Acquiring prior knowledge of trajectory distributions in specific environments can significantly expedite the optimisation process in robot motion planning. Leveraging successful past plans and utilising trajectory generative models as priors offers a clear advantage. Previous studies have proposed various methods to harness these priors, such as using prior samples for initialisation or incorporating the prior distribution into trajectory optimisation through inference. Recently, diffusion models have demonstrated effectiveness in encoding multimodal data in high-dimensional settings. In this study, we propose a method that uses diffusion models as priors and employs Stein variational inference with Gaussian Process trajectories to integrate them into a batch inverse denoising process. This approach reduces the computation time required to approximate the posterior distribution of trajectories, particularly when adapting to new, unseen environments. Additionally, we incorporate path signatures into our method to enhance the diversity of the posterior distribution. To validate our approach, we conduct comparative assessments against multiple baseline methods across various scenarios, including 2D planar robots and robotic manipulators. Our experiments demonstrate that our method identifies the optimal solution with significantly reduced computational time.
|
|
15:35-15:40, Paper ThDT21.5 | |
The Ingredients for Robotic Diffusion Transformers |
|
Dasari, Sudeep | Carnegie Mellon University |
Mees, Oier | University of California, Berkeley |
Zhao, Sebastian | University of California, Berkeley |
Srirama, Mohan Kumar | Carnegie Mellon University |
Levine, Sergey | UC Berkeley |
Keywords: Machine Learning for Robot Control, Learning from Demonstration, Deep Learning in Grasping and Manipulation
Abstract: In recent years roboticists have achieved remarkable progress in solving increasingly general tasks on dexterous robotic hardware by leveraging high capacity Transformer network architectures and generative diffusion models. Unfortunately, combining these two orthogonal improvements has proven surprisingly difficult, since there is no clear and well understood process for making important design choices. In this paper, we identify, study and improve key architectural design decisions for high-capacity diffusion transformer policies. The resulting models can efficiently solve diverse tasks on multiple robot embodiments, without the excruciating pain of per-setup hyper-parameter tuning. By combining the results of our investigation with our improved model components, we are able to present a novel architecture, named DiT-Block Policy, that significantly outperforms the state of the art in solving long-horizon (1500+ time-steps) dexterous tasks on a bi-manual ALOHA robot. In addition, we find that our policies show improved scaling performance when trained on 10 hours of highly multi-modal, language annotated ALOHA demonstration data. We hope this work will open the door for future robot learning techniques that leverage the efficiency of generative diffusion modeling with the scalability of large scale transformer architectures. Code, robot dataset, and videos are available at: https://dit-policy.github.io
|
|
15:40-15:45, Paper ThDT21.6 | |
Inference-Time Policy Steering through Human Interactions |
|
Wang, Yanwei | MIT |
Wang, Lirui | Massachusetts Institute of Technology |
Du, Yilun | MIT |
Sundaralingam, Balakumar | NVIDIA Corporation |
Yang, Xuning | NVIDIA |
Chao, Yu-Wei | NVIDIA |
Pérez-D'Arpino, Claudia | NVIDIA |
Fox, Dieter | University of Washington |
Shah, Julie A. | MIT |
Keywords: Imitation Learning, Human-Robot Collaboration, Deep Learning Methods
Abstract: Generative policies trained with human demonstrations can autonomously accomplish multimodal, long-horizon tasks. However, during inference, humans are often removed from the policy execution loop, limiting the ability to guide a pre-trained policy towards a specific sub-goal or trajectory shape among multiple predictions. Naive human intervention may inadvertently exacerbate distribution shift, leading to constraint violations or execution failures. To better align policy output with human intent without inducing out-of-distribution errors, we propose an Inference-Time Policy Steering (ITPS) framework that leverages human interactions to bias the generative sampling process, rather than fine-tuning the policy on interaction data. We evaluate ITPS across three simulated and real-world benchmarks, testing three forms of human interaction and associated alignment distance metrics. Among six sampling strategies, our proposed stochastic sampling with diffusion policy achieves the best trade-off between alignment and distribution shift. Videos are available at https://yanweiw.github.io/itps/.
|
|
15:45-15:50, Paper ThDT21.7 | |
Legibility Diffuser: Offline Imitation for Intent Expressive Motion |
|
Bronars, Matthew | Carnegie Mellon University |
Cheng, Shuo | Gatech |
Xu, Danfei | Georgia Institute of Technology |
Keywords: Imitation Learning, Human-Robot Collaboration, Deep Learning Methods
Abstract: In human-robot collaboration, legible motion that conveys a robot's intentions and goals is known to improve safety, task efficiency, and user experience. Legible robot motion is typically generated using hand-designed cost functions and classical motion planners. However, with the rise of deep learning and data-driven robot policies, we need methods for training end-to-end on offline demonstration data. In this paper, we propose Legibility Diffuser, a diffusion-based policy that learns intent expressive motion directly from human demonstrations. By variably combining the noise predictions from a goal-conditioned diffusion model, we guide the robot's motion toward the most legible trajectory in the training dataset. We find that decaying the guidance weight over the course of the trajectory is critical for maintaining a high success rate while maximizing legibility.
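The variable combination of noise predictions can be sketched as a classifier-free-guidance-style blend whose weight decays over the trajectory horizon; the linear decay schedule, the maximum weight, and the (B, T, D) tensor layout are assumptions rather than the authors' exact formulation.

```python
import torch

def legibility_guided_noise(eps_uncond, eps_goal, w_max=3.0):
    """eps_*: (B, T, D) noise predictions from a goal-conditioned diffusion
    policy with and without goal conditioning. The guidance weight decays
    linearly over the trajectory horizon T, so early waypoints are pushed
    toward the goal-revealing mode while later waypoints stay close to the
    unguided prediction to preserve task success."""
    B, T, D = eps_uncond.shape
    w = w_max * torch.linspace(1.0, 0.0, T, device=eps_uncond.device)
    return eps_uncond + w.view(1, T, 1) * (eps_goal - eps_uncond)
```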
|
|
ThDT22 |
411 |
Imitation Learning 3 |
Regular Session |
Chair: Kober, Jens | TU Delft |
Co-Chair: Bıyık, Erdem | University of Southern California |
|
15:15-15:20, Paper ThDT22.1 | |
Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning |
|
Giammarino, Vittorio | Boston University |
Queeney, James | Mitsubishi Electric Research Laboratories |
Paschalidis, Ioannis | Boston University |
Keywords: Imitation Learning, Reinforcement Learning, Visual Learning
Abstract: We propose C-LAIfO, a computationally efficient algorithm designed for imitation learning from videos in the presence of visual mismatch between agent and expert domains. We analyze the problem of imitation from expert videos with visual discrepancies, and introduce a solution for robust latent space estimation using contrastive learning and data augmentation. Provided a visually robust latent space, our algorithm performs imitation entirely within this space using off-policy adversarial imitation learning. We conduct a thorough ablation study to justify our design and test C-LAIfO on high-dimensional continuous robotic tasks. Additionally, we demonstrate how C-LAIfO can be combined with other reward signals to facilitate learning on a set of challenging hand manipulation tasks with sparse rewards. Our experiments show improved performance compared to baseline methods, highlighting the effectiveness of C-LAIfO. To ensure reproducibility, we open source our code.
|
|
15:20-15:25, Paper ThDT22.2 | |
One-Shot Imitation under Mismatched Execution |
|
Kedia, Kushal | Cornell University |
Dan, Prithwish | Cornell University |
Chao, Angela | Cornell University |
Pace, Maximus | Cornell University |
Choudhury, Sanjiban | Cornell University |
Keywords: Learning from Demonstration, Representation Learning, Transfer Learning
Abstract: Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.
|
|
15:25-15:30, Paper ThDT22.3 | |
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning |
|
Dai, Yinpei | University of Michigan |
Lee, Jayjun | University of Michigan |
Fazeli, Nima | University of Michigan |
Chai, Joyce | University of Michigan |
Keywords: Imitation Learning, Data Sets for Robot Learning, Deep Learning in Grasping and Manipulation
Abstract: Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework, which combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io
|
|
15:30-15:35, Paper ThDT22.4 | |
Improving Vision-Language-Action Model with Online Reinforcement Learning |
|
Guo, Yanjiang | Tsinghua University |
Zhang, Jianke | Tsinghua University |
Chen, Xiaoyu | Tsinghua University |
Ji, Xiang | Tsinghua University |
Wang, Yen-Jen | University of California, Berkeley |
Hu, Yucheng | Tsinghua |
Chen, Jianyu | Tsinghua University |
Keywords: Imitation Learning, Continual Learning, Reinforcement Learning
Abstract: Recent studies have successfully integrated large vision-language models (VLMs) into low-level robotic control by supervised fine-tuning (SFT) with expert robotic datasets, resulting in what we term vision-language-action (VLA) models. Although the VLA models are powerful, how to improve these large models during interaction with environments remains an open question. In this paper, we explore how to further improve these VLA models via Reinforcement Learning (RL), a commonly used fine-tuning technique for large models. However, we find that directly applying online RL to large VLA models presents significant challenges, including training instability that severely impacts the performance of large models, and computing demands that exceed the capabilities of most local machines. To address these problems, we propose iRe-VLA framework, which iterates between Reinforcement Learning and supervised learning to effectively improve VLA models, leveraging the exploratory benefits of RL while maintaining the stability of supervised learning. Experiments in two simulated benchmarks and a real-world manipulation suite validate the effectiveness of our method.
|
|
15:35-15:40, Paper ThDT22.5 | |
MILE: Model-Based Intervention Learning |
|
Korkmaz, Yigit | University of Southern California |
Bıyık, Erdem | University of Southern California |
Keywords: Imitation Learning, AI-Based Methods, Human Factors and Human-in-the-Loop
Abstract: Imitation learning techniques have been shown to be highly effective in real-world control scenarios, such as robotics. However, these approaches not only suffer from compounding error issues but also require human experts to provide complete trajectories. Although there exist interactive methods where an expert oversees the robot and intervenes if needed, these extensions usually only utilize the data collected during intervention periods and ignore the feedback signal hidden in non-intervention timesteps. In this work, we create a model to formulate how the interventions occur in such cases, and show that it is possible to learn a policy with just a handful of expert interventions. Our key insight is that it is possible to get crucial information about the quality of the current state and the optimality of the chosen action from expert feedback, regardless of the presence or the absence of intervention. We evaluate our method on various discrete and continuous simulation environments, a real-world robotic manipulation task, as well as a human subject study. Videos and the code can be found at https://liralab.usc.edu/mile.
|
|
15:40-15:45, Paper ThDT22.6 | |
Validity Learning on Failures: Mitigating the Distribution Shift in Autonomous Vehicle Planning |
|
Arasteh, Fazel | Noah's Ark Lab, Huawei |
Elmahgiubi, Mohammed | Huawei Technologies Inc |
Khamidehi, Behzad | University of Toronto |
Mirkhani, Hamidreza | Huawei Technologies Canada |
Zhang, Weize | Huawei |
Cao, Tongtong | Noah's Ark Lab, Huawei Technologies |
Rezaee, Kasra | Huawei Technologies |
Keywords: Imitation Learning, Learning from Demonstration, Reinforcement Learning
Abstract: The planning problem constitutes a fundamental aspect of the autonomous driving framework. Recent strides in representation learning have empowered vehicles to comprehend their surrounding environments, thereby facilitating the integration of learning-based planning strategies. Among these approaches, Imitation Learning stands out due to its notable training efficiency. However, traditional Imitation Learning methodologies encounter challenges associated with the covariate shift phenomenon. We propose Validity Learning on Failures, VL(on failure), as a remedy to address this issue. The essence of our method lies in deploying a pre-trained planner across diverse scenarios. Instances where the planner deviates from its immediate objectives, such as maintaining a safe distance from obstacles or adhering to traffic rules, are flagged as failures. The states corresponding to these failures are compiled into a new dataset, termed the failure dataset. Notably, the absence of expert annotations for this data precludes the applicability of standard imitation learning approaches. To facilitate learning from the closed-loop mistakes, we introduce the VL objective which aims to discern valid trajectories within the current environmental context. Experimental evaluations conducted on both reactive CARLA simulation and non-reactive log-replay simulations reveal substantial enhancements in closed-loop metrics such as Score, Progress, and Success Rate, underscoring the effectiveness of the proposed methodology. Further evaluations against the Bench2Drive benchmark demonstrate that VL(on failure) outperforms the state-of-the-art methods by a large margin.
|
|
15:45-15:50, Paper ThDT22.7 | |
Iteratively Adding Latent Human Knowledge within Trajectory Optimization Specifications Improves Learning and Task Outcomes |
|
Chang, Christine T | University of Colorado Boulder |
Stull, Maria P | University of Colorado Boulder |
Crockett, Breanne | University of Colorado Boulder |
Jensen, Emily | University of Colorado Boulder |
Lohrmann, Clare | University of Colorado Boulder |
Hebert, Mitchell | Draper |
Hayes, Bradley | University of Colorado Boulder |
Keywords: Human Factors and Human-in-the-Loop, Human-Robot Teaming, Aerial Systems: Applications
Abstract: Frictionless and understandable tasking is essential for leveraging human-autonomy teaming in commercial, military, and public safety applications. Existing technology for facilitating human teaming with uncrewed aerial vehicles (UAVs), utilizing planners or trajectory optimizers that incorporate human input, introduces a usability and operator-capability gap because it does not explicitly support user upskilling by promoting system understanding or predictability. Supplementing annotated waypoints with natural language guidance affords an opportunity for both. In this work we investigate one-shot versus iterative input, introducing a testbed system based on government and industry UAV planning tools that accepts inputs in the form of both natural language text and drawn annotations on a terrain map. The testbed uses an LLM-based subsystem to map user inputs into additional terms for the trajectory optimization objective function. We demonstrate through a human subjects study that prompting a human teammate to iteratively add latent knowledge to a trajectory optimization aids the user in learning how the system functions, elicits more desirable robot behaviors, and ultimately achieves better task outcomes.
|
|
ThDT23 |
412 |
Autonomous Vehicle Perception 6 |
Regular Session |
Chair: Dam, Tanmoy | Emory University |
Co-Chair: Ding, Zhengming | Tulane University |
|
15:15-15:20, Paper ThDT23.1 | |
HybridOcc: NeRF Enhanced Transformer-Based Multi-Camera 3D Occupancy Prediction |
|
Zhao, Xiao | Fudan University |
Chen, Bo | FAW Group |
Sun, Mingyang | Fudan University |
Yang, Dingkang | Fudan University |
Wang, Youxing | Fudan University |
Zhang, Xukun | Fudan University |
Li, Mingcheng | Fudan University |
Kou, Dongliang | Fudan University |
Wei, Xiaoyi | Fudan University |
Zhang, Lihua | Fudan University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Recognition
Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method that combines a Transformer framework with a NeRF representation, refined in a coarse-to-fine SSC prediction framework. HybridOcc aggregates contextual features through the Transformer paradigm based on hybrid query proposals, while combining them with the NeRF representation to obtain depth supervision. The Transformer branch contains multiple scales and uses spatial cross-attention for 2D to 3D transformation. The newly designed NeRF branch implicitly infers scene occupancy through volume rendering, including visible and invisible voxels, and explicitly captures scene depth rather than generating RGB color. Furthermore, we present an innovative occupancy-aware ray sampling method that orients sampling toward the SSC task instead of focusing on the scene surface, further improving overall performance. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the effectiveness of HybridOcc on the SSC task.
|
|
15:20-15:25, Paper ThDT23.2 | |
Temporal Consistency for RGB-Thermal Data-Based Semantic Scene Understanding |
|
Li, Haotian | The Hong Kong Polytechnic University |
Chu, Henry | The Hong Kong Polytechnic University |
Sun, Yuxiang | City University of Hong Kong |
Keywords: Automation Technologies for Smart Cities, Intelligent Transportation Systems
Abstract: Semantic scene understanding is a fundamental capability for autonomous vehicles. Under challenging lighting conditions, such as nighttime and on-coming headlights, semantic scene understanding performance using only RGB images is usually degraded. Thermal images can provide complementary information to RGB images, so many recent semantic segmentation networks have been proposed using RGB-Thermal (RGB-T) images. However, most existing networks focus only on improving segmentation accuracy for single image frames, neglecting the information consistency between consecutive frames. To address this issue, we propose a temporally consistent framework for RGB-T semantic segmentation, which introduces a virtual view image generation module to synthesize a virtual image for the next moment, and a consistency loss function to ensure segmentation consistency. We also propose an evaluation metric that measures both the accuracy and the consistency of semantic segmentation. Experimental results show that our framework outperforms state-of-the-art methods.
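The consistency loss can be sketched as a divergence between the segmentation predicted on the real next frame and the segmentation predicted on the virtual next-frame image synthesized from the current frame; the KL form and the direction of the comparison are assumptions.

```python
import torch.nn.functional as F

def temporal_consistency_loss(logits_next_real, logits_next_virtual):
    """Both inputs: (B, num_classes, H, W) segmentation logits. Penalizes
    disagreement between predictions on the real frame at t+1 and on the
    virtual view synthesized from frame t."""
    log_p = F.log_softmax(logits_next_real, dim=1)
    q = F.softmax(logits_next_virtual, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean")
```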
|
|
15:25-15:30, Paper ThDT23.3 | |
SaViD: Spectravista Aesthetic Vision Integration for Robust and Discerning 3D Object Detection in Challenging Environments |
|
Dam, Tanmoy | Emory University |
Dharavath, Sanjay Bhargav | Indian Institute of Technology, Kharagpur, India |
Alam, Sameer | Saab-NTU Joint Lab, Nanyang Technological University, Singapore |
Lilith, Nimrod | Saab-NTU Joint Lab, Nanyang Technological University, Singapore |
Maiti, Aniruddha | ADP |
Chakraborty, Supriyo | Indian Institute of Technology, Kharagpur, India |
Feroskhan, Mir | Nanyang Technological University |
Keywords: Object Detection, Segmentation and Categorization, Autonomous Vehicle Navigation, Sensor Fusion
Abstract: The fusion of LiDAR and camera sensors has demonstrated significant effectiveness in achieving accurate detection for short-range tasks in autonomous driving. However, this fusion approach can face challenges in long-range detection scenarios due to the disparity between sparse LiDAR data and high-resolution camera data. Moreover, sensor corruption introduces complexities that affect the ability to maintain robustness, despite the growing adoption of sensor fusion in this domain. We present SaViD, a novel framework comprising a three-stage fusion alignment mechanism designed to address long-range detection challenges in the presence of natural corruption. The SaViD framework consists of three key elements: the Global Memory Attention Network (GMAN), which enhances the extraction of image features by offering a deeper understanding of global patterns; the Attentional Sparse Memory Network (ASMN), which enhances the integration of LiDAR and image features; and the KNNnectivity Graph Fusion (KGF), which enables the full fusion of spatial information. SaViD achieves superior performance on the long-range detection Argoverse-2 (AV2) dataset with a performance improvement of 9.87% in AP value and an improvement of 2.39% in mAPH for L2 difficulties on the Waymo Open dataset (WOD). Comprehensive experiments are carried out to showcase its robustness against 14 natural sensor corruptions. SaViD exhibits a robust performance improvement of 31.43% for AV2 and 16.13% for WOD in RCE value compared to other existing fusion-based methods while considering all the corruptions for both datasets. Our code is available at https://anonymous.4open.science/r/SAVID-2A0D/README.md.
|
|
15:30-15:35, Paper ThDT23.4 | |
CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection Based View Transformation |
|
Lee, In-Jae | Seoul National University |
Hwang, Sihwan | Korea Advanced Institute of Science and Technology |
Kim, Youngseok | Korea Advanced Institute of Science and Technology |
Kim, Wonjune | Korea Advanced Institute of Science and Technology |
Kim, Sanmin | Kookmin University |
Kum, Dongsuk | KAIST |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. In this paper, to address the aforementioned limitations, we propose a novel camera-radar fusion-based 3D object detection and segmentation model named CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), using a backward projection that leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance the comprehension of the 3D scene. When evaluated on the nuScenes open dataset, our proposed approach achieves a state-of-the-art performance among backward projection-based camera-radar fusion methods with 62.4% NDS and 54.0% mAP in 3D object detection.
|
|
15:35-15:40, Paper ThDT23.5 | |
Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning |
|
Li, Jianhao | Beihang University |
Sun, Tianyu | Tsinghua University |
Zhang, Xueqian | Tsinghua University |
Wang, Zhongdao | Noah's Ark Laboratory |
Feng, Bailan | Noah's Ark Laboratory |
Xu, Ke | Beihang University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: This paper studies point cloud perception within outdoor environments. Existing methods face limitations in recognizing objects located at a distance or occluded, due to the sparse nature of outdoor point clouds. In this work, we observe a significant mitigation of this problem by accumulating multiple temporally consecutive LiDAR sweeps, resulting in a remarkable improvement in perception accuracy. However, the computation cost also increases, hindering previous approaches from utilizing a large number of LiDAR sweeps. To tackle this challenge, we find that a considerable portion of points in the accumulated point cloud is redundant, and discarding these points has minimal impact on perception accuracy. We introduce a simple yet effective Gumbel Spatial Pruning (GSP) layer that dynamically prunes points based on a learned end-to-end sampling. The GSP layer is decoupled from other network components and thus can be seamlessly integrated into existing point cloud network architectures. Extensive experiments show that our pruning strategy improves several perception algorithms in multiple tasks.
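A per-point keep/drop gate trained with straight-through Gumbel-softmax is sketched below; the two-logit gating head, the temperature, and the placement in the backbone are assumptions rather than the paper's exact GSP layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelPointGate(nn.Module):
    """Learns, end to end, which points of the accumulated multi-sweep cloud
    to keep. At inference, points with a zero gate are simply discarded
    before the more expensive downstream stages."""
    def __init__(self, feat_dim, tau=1.0):
        super().__init__()
        self.score = nn.Linear(feat_dim, 2)   # logits for [drop, keep]
        self.tau = tau

    def forward(self, point_feats):           # point_feats: (N, feat_dim)
        logits = self.score(point_feats)
        gate = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        return gate[:, 1]                     # differentiable 0/1 keep mask
```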
|
|
15:40-15:45, Paper ThDT23.6 | |
RoBiFusion: A Robust and Bidirectional Interaction Camera-LiDAR 3D Object Detection Framework |
|
Wen, Xubin | Southeast University |
Xia, Haifeng | Southeast University |
Ding, Zhengming | Tulane University |
Xia, Siyu | Southeast University |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Sensor Fusion
Abstract: Camera-LiDAR 3D object detection is currently becoming a crucial component in the field of autonomous driving perception. However, previous models perform camera-LiDAR feature fusion only at the deep BEV level. This approach lacks interaction between the shallow-level sensor features, which would be beneficial for constructing the corresponding BEV features. At the same time, naive shallow-level feature interaction can introduce sensor noise caused by intrinsic and extrinsic camera calibration errors. To address this, we propose RoBiFusion, a novel camera-LiDAR 3D object detection framework designed for effective sensor feature interaction and mitigation of sensor noise interference. This framework consists of three submodules: the Camera-LiDAR Feature Matching module, the LiDAR-to-Camera module, and the Camera-to-LiDAR module. First, in the Camera-LiDAR Feature Matching module, we use cross-attention to dynamically match the camera features and the LiDAR features, which resolves the feature inconsistency caused by noise in the camera's intrinsic and extrinsic parameters. Second, in the LiDAR-to-Camera module, we propose a novel depth representation that effectively mitigates LiDAR noise interference. Third, in the Camera-to-LiDAR module, we introduce deformable attention to help the LiDAR features capture instance-level semantic features. Additionally, we design a novel differentiable and efficient grid sample module to accelerate the process, since the bilinear grid sample module in deformable attention is time-consuming and not deployment-friendly. We compared RoBiFusion to the state-of-the-art BEVFusion on the nuScenes dataset and found that RoBiFusion surpasses BEVFusion by 1.5% mAP and 2.4% NDS. Furthermore, we designed a series of ablation experiments to verify the effectiveness of the aforementioned modules.
|
|
15:45-15:50, Paper ThDT23.7 | |
Towards Accurate Semi-Supervised BEV 3D Object Detection with Depth-Aware Refinement and Denoising-Aided Alignment |
|
Yang, Zhao | Xi'an Jiaotong University |
Shi, Yinan | Technical University Munich |
Zhu, Jiangtong | Xi'an Jiaotong University |
Xu, Weixiang | Institute of Automation, Chinese Academy of Sciences |
Liu, Longjun | Xi'an Jiaotong University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Recently, camera-based Bird’s-Eye View (BEV) representation has gained significant traction in 3D object detection. However, training high-performance BEV 3D detectors typically requires a large number of annotated samples, which can be costly. Traditional semi-supervised methods for BEV 3D object detection face challenges including loss of rich depth information, inconsistent object representations across spaces, and unreliable pseudo label generation, leading to decreased accuracy and performance. Addressing this challenge, we pioneer the introduction of a semi-supervised BEV 3D object detection framework. Our approach leverages a small set of labeled data alongside a larger set of unlabeled data, significantly reducing annotation costs while maintaining robust detection performance. Firstly, we propose a depth-based self-refinement module to generate high-quality and stable pseudo labels, which can effectively regulate training with noisy labels. Secondly, we designed a denoising labels regression module that integrates denoising for both labeled and unlabeled data. Thirdly, in order to alleviate object inconsistency, we propose a consistent object-guided alignment method to ensure the consistency of objects in multi-spaces. Finally, our method can be easily plugged into various BEV 3D detection networks. Extensive experiments show that the proposed method achieves a new state-of-the-art compared to various camera-based 3D detectors tested on multiple public autonomous driving datasets.
|
|
ThET1 |
302 |
Visual Perception and Learning |
Regular Session |
Chair: Zhang, Hao | University of Massachusetts Amherst |
Co-Chair: Zhang, Jing | New York University |
|
16:35-16:40, Paper ThET1.1 | |
Open-RGBT: Open-Vocabulary RGB-T Zero-Shot Semantic Segmentation in Open-World Environments |
|
Yu, Meng | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Yang, Luojie | Beijing Institute of Technology |
He, Xunjie | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Semantic segmentation is a critical technique for effective scene understanding. Traditional RGB-T semantic segmentation models often struggle to generalize across diverse scenarios due to their reliance on pretrained models and predefined categories. Recent advancements in Visual Language Models (VLMs) have facilitated a shift from closed-set to open-vocabulary semantic segmentation methods. However, these models face challenges in dealing with intricate scenes, primarily due to the heterogeneity between RGB and thermal modalities. To address this gap, we present Open-RGBT, a novel open-vocabulary RGB-T semantic segmentation model. Specifically, we obtain instance-level detection proposals by incorporating visual prompts to enhance category understanding. Additionally, we employ the CLIP model to assess image-text similarity, which helps enforce semantic consistency and mitigates ambiguities in category identification. Empirical evaluations demonstrate that Open-RGBT achieves superior performance in diverse and challenging real-world scenarios, even in the wild, significantly advancing the field of RGB-T semantic segmentation. The project page of Open-RGBT is available at https://OpenRGBT.github.io/.
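As a rough sketch of the CLIP-based category check mentioned in the abstract (the model choice, function name, and rescoring logic are illustrative assumptions, not the Open-RGBT implementation), one could rescore a cropped detection proposal against candidate category names via image-text similarity:

# Hedged illustration: rank candidate labels for one proposal crop with CLIP.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rescore_proposal(crop: Image.Image, candidate_labels: list) -> str:
    """Return the candidate label whose text embedding best matches the crop."""
    inputs = processor(text=candidate_labels, images=crop,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape: (1, num_labels)
    probs = logits.softmax(dim=-1).squeeze(0)
    return candidate_labels[int(probs.argmax())]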
|
|
16:40-16:45, Paper ThET1.2 | |
Positioning in Congested Space by Combining Vision-Based and Proximity-Based Control |
|
Thomas, John | Institut Pascal |
Chaumette, Francois | Inria Center at University of Rennes |
Keywords: Sensor-based Control, Visual Servoing
Abstract: In this paper, we consider positioning in congested space within the framework of Sensor-based Control (SBC) using vision and proximity sensors. Vision acts as the primary sensing modality for performing the positioning task, while proximity sensors complement it by ensuring that the robotic platform does not collide with objects in the workspace. Sensor information is combined in a shared manner using the QP formalism, where ideas from safety-critical control are used to express inequality constraints. The proposed method is validated through various real experiments.
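The abstract's shared-control formulation can be pictured with a small quadratic program: stay close to the visual-servoing command while satisfying CBF-style linear inequalities built from proximity readings. The variable names, constraint form, and numbers below are assumptions for the sketch, not the authors' exact QP.

# Illustrative QP sketch: track the vision-based command u_vision subject to
# proximity safety constraints A_prox @ u >= -gamma * h_prox.
import cvxpy as cp
import numpy as np

def safe_command(u_vision, A_prox, h_prox, gamma=1.0):
    u = cp.Variable(u_vision.shape[0])
    objective = cp.Minimize(cp.sum_squares(u - u_vision))
    constraints = [A_prox @ u >= -gamma * h_prox]
    cp.Problem(objective, constraints).solve()
    return u.value

# Example with a 6-DoF velocity command and two active proximity constraints.
u_cmd = safe_command(np.array([0.1, 0.0, 0.05, 0.0, 0.0, 0.02]),
                     A_prox=np.eye(6)[:2], h_prox=np.array([0.3, 0.1]))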
|
|
16:45-16:50, Paper ThET1.3 | |
SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation |
|
Li, Jianing | Nanjing University |
Lu, Ming | Intel Labs |
Liu, Juntao | China Mobile Research Institute |
Wang, Hao | Peking University |
Gu, Chenyang | Peking University |
Zheng, Wenzhao | Tsinghua University |
Du, Li | Nanjing University |
Zhang, Shanghang | Peking University |
Keywords: Computer Vision for Manufacturing, Deep Learning for Visual Perception, Visual Learning
Abstract: 3D semantic occupancy prediction is a crucial task in visual perception, demanding a simultaneous understanding of both scene geometry and semantics. It plays a pivotal role in 3D scene comprehension and holds great potential for various applications, such as robotic vision perception and autonomous driving. Many previous works leverage planar-based representations like Bird’s Eye View (BEV) and Tri-Perspective View (TPV), which aim to simplify the complexity of 3D scenes while preserving essential object information, thereby facilitating efficient scene representation. However, in dense indoor environments where occlusions are prevalent, directly applying these planar-based methods often leads to difficulties in capturing global semantic occupancy, ultimately degrading model performance. In this paper, we introduce a novel vertical slice representation, which divides the scene along the vertical axis and projects spatial point features onto the nearest pair of parallel planes. To harness these slice features, we propose SliceOcc, a camera-based model specifically tailored for indoor 3D semantic occupancy prediction. SliceOcc utilizes pairs of slice queries and cross-attention mechanisms to extract planar features from input images. These local planar features are then combined to form a global scene representation, which is employed for indoor occupancy estimation. Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories, setting a new state-of-the-art performance among RGB-based models for indoor 3D semantic occupancy prediction.
|
|
16:50-16:55, Paper ThET1.4 | |
Bandwidth-Adaptive Spatiotemporal Correspondence Identification for Collaborative Perception |
|
Gao, Peng | North Carolina State University |
Jose, Williard Joshua | University of Massachusetts Amherst |
Zhang, Hao | University of Massachusetts Amherst |
Keywords: RGB-D Perception, Deep Learning Methods, Multi-Robot Systems
Abstract: Correspondence identification (CoID) is an essential capability for multi-robot collaborative perception, which allows a group of robots to consistently refer to the same objects in their own fields of view. In real-world applications, such as connected autonomous driving, connected vehicles cannot directly share their raw observations due to the limited communication bandwidth. To address this challenge, we propose a novel approach of bandwidth-adaptive spatiotemporal CoID for collaborative perception, where robots interactively select partial spatiotemporal observations to share with others, while adapting to the communication constraint that dynamically changes over time. We evaluate our approach over various scenarios in connected autonomous driving simulations. Experimental results have demonstrated that our approach enables CoID and adapts to the dynamic change of bandwidth constraints. In addition, our approach achieves 8%-56% overall improvements in terms of covisible object retrieval for CoID and data sharing efficiency, which outperforms the previous techniques and achieves the state-of-the-art performance. More information is available at: https://gaopeng5.github.io/acoid/.
|
|
16:55-17:00, Paper ThET1.5 | |
Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion |
|
Liu, Shengyuan | The Chinese University of Hong Kong |
Chen, Zhen | Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong |
Yang, Qiushi | City University of Hong Kong |
Yu, Weihao | Chinese University of Hong Kong |
Dong, Di | Institute of Automation, Chinese Academy of Sciences |
Hu, Jiancong | The Sixth Affiliated Hospital, Sun Yat-Sen University |
Yuan, Yixuan | Chinese University of Hong Kong |
Keywords: Computer Vision for Automation, Medical Robots and Systems, Deep Learning for Visual Perception
Abstract: Automated diagnostic systems (ADS) have shown significant potential in the early detection of polyps during endoscopic examinations, thereby reducing the incidence of colorectal cancer. However, due to high annotation costs and strict privacy concerns, acquiring high-quality endoscopic images poses a considerable challenge in the development of ADS. Despite recent advancements in generating synthetic images for dataset expansion, existing endoscopic image generation algorithms fail to accurately generate the details of polyp boundary regions and typically require medical priors to specify plausible locations and shapes of polyps, which limits the realism and diversity of the generated images. To address these limitations, we present Polyp-Gen, the first fully automatic diffusion-based endoscopic image generation framework. Specifically, we devise a spatial-aware diffusion training scheme with a lesion-guided loss to enhance the structural context of polyp boundary regions. Moreover, to capture medical priors for the localization of potential polyp areas, we introduce a hierarchical retrieval-based sampling strategy to match similar fine-grained spatial features. In this way, our Polyp-Gen can generate realistic and diverse endoscopic images for building reliable ADS. Extensive experiments demonstrate state-of-the-art generation quality, and the synthetic images improve the downstream polyp detection task. Additionally, our Polyp-Gen has shown remarkable zero-shot generalizability on other datasets. The source code is available at https://github.com/CUHK-AIM-Group/Polyp-Gen.
|
|
17:00-17:05, Paper ThET1.6 | |
DetailRefine: Towards Fine-Grained and Efficient Online Monocular 3D Reconstruction |
|
Chu, Fupeng | Chinese Academy of Sciences |
Cong, Yang | Chinese Academy of Sciences, China |
Chen, Ronghan | Shenyang Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Automation, Visual Learning, Deep Learning for Visual Perception
Abstract: Online monocular 3D reconstruction has attracted widespread attention as it promotes the application of robots in interactive scenarios. Most existing methods focus on 1) real-time reconstruction, 2) accurate voxel feature learning, and 3) effective voxel sparsification algorithms. To this end, 1) they adopt a coarse-to-fine pipeline, where all non-empty voxels are sent to the next level for refinement. However, this results in over-refinement of flat regions, leading to unnecessary computational overhead. Furthermore, 2) advanced methods focus on exploring view visibility but overlook the discriminability among visible views, which limits the representation of learned voxel features. Moreover, 3) existing sparsification algorithms struggle to distinguish detailed and empty voxels, resulting in either the loss of detailed voxels or the retention of empty voxels. To tackle these challenges, 1) we present Dynamic Detail Refinement (DDR) to allocate more voxels to detailed regions for refinement, which could alleviate the computational burden. Furthermore, 2) we propose Discriminability-Aware Fusion (DAF) to focus on discriminative views, which helps to capture accurate voxel features. In addition, 3) we propose Hierarchical Hybrid Sparsification (HHS) to balance global completeness and local refinement, which helps to preserve detailed voxels at hierarchical levels effectively. Extensive experiments conducted on the representative ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of the proposed method.
|
|
17:05-17:10, Paper ThET1.7 | |
DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-Light Enhancement and Deblurring |
|
Wang, Ling | HKUST(GZ) |
Wu, Chen | University of Science and Technology of China |
Wang, Lin | Nanyang Technological University (NTU) |
Keywords: Visual Learning, Deep Learning for Visual Perception
Abstract: Autonomous vehicles and robots often struggle with reliable visual perception at night due to the low illumination and the motion blur caused by the long exposure time of RGB cameras. Existing methods address this challenge by sequentially connecting off-the-shelf pretrained low-light enhancement and deblurring models. Unfortunately, these methods often lead to noticeable artifacts (e.g., color distortions) in the over-exposed regions or make it hard to learn the motion cues of dark regions. In this paper, we find that vision-language models, e.g., Contrastive Language-Image Pretraining (CLIP), can comprehensively perceive diverse degradation levels at night. In light of this, we propose a novel transformer-based joint learning framework, named DAP-LED, which can jointly achieve low-light enhancement and deblurring, benefiting downstream tasks such as depth estimation, segmentation, and detection in the dark. The key insight is to leverage CLIP to adaptively learn the degradation levels from images at night. This subtly enables learning rich semantic information and visual representation for optimization of the joint tasks. To achieve this, we first introduce a CLIP-guided cross-fusion module to obtain multi-scale patch-wise degradation heatmaps from the image embeddings. Then, the heatmaps are fused via the designed CLIP-enhanced transformer blocks to retain useful degradation information for effective model optimization. Experimental results show that, compared to existing methods, our DAP-LED achieves state-of-the-art performance in the dark. Meanwhile, the enhanced results are demonstrated to be effective for three downstream tasks. For a demo and more results, please check the project page: https://vlislab22.github.io/dap-led/.
|
|
17:10-17:15, Paper ThET1.8 | |
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction |
|
Fang, Irving | New York University |
Shi, Kairui | New York University |
He, Xujin | New York University |
Tan, Siqi | New York University |
Wang, Yifan | New York University |
Zhao, Hanwen | New York University |
Huang, Hung-Jui | Carnegie Mellon University |
Yuan, Wenzhen | University of Illinois |
Feng, Chen | New York University |
Zhang, Jing | NYU |
Keywords: Deep Learning for Visual Perception, Force and Tactile Sensing, Object Detection, Segmentation and Categorization
Abstract: Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning and local geometric constraints. This advancement results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previously state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.
|
|
ThET2 |
301 |
Multi-Robot SLAM and Mapping |
Regular Session |
Chair: Chli, Margarita | ETH Zurich & University of Cyprus |
Co-Chair: Morbidi, Fabio | Université De Picardie Jules Verne |
|
16:35-16:40, Paper ThET2.1 | |
Multi-Robot Object SLAM Using Distributed Variational Inference |
|
Cao, Hanwen | University of California, San Diego |
Shreedharan, Sriram | University of California, San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Multi-Robot SLAM, Distributed Robot Systems, Probability and Statistical Methods
Abstract: Multi-robot simultaneous localization and mapping (SLAM) enables a robot team to achieve coordinated tasks by relying on a common map of the environment. Constructing a map by centralized processing of the robot observations is undesirable because it creates a single point of failure and requires pre-existing infrastructure and significant communication throughput. This paper formulates multi-robot object SLAM as a variational inference problem over a communication graph subject to consensus constraints on the object estimates maintained by different robots. To solve the problem, we develop a distributed mirror descent algorithm with regularization enforcing consensus among the communicating robots. Using Gaussian distributions in the algorithm, we also derive a distributed multi-state constraint Kalman filter (MSCKF) for multi-robot object SLAM. Experiments on real and simulated data show that our method improves the trajectory and object estimates, compared to individual-robot SLAM, while achieving better scaling to large robot teams, compared to centralized multi-robot SLAM.
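A toy version of the consensus-regularized update described above (synchronous communication, Euclidean point estimates, and all step sizes are assumptions; the paper works with distributions and mirror descent) might look like:

# Each robot descends its local objective while being pulled toward the mean
# estimate of its communication neighbors.
import numpy as np

def consensus_step(estimates, adjacency, local_gradients, step=0.1, rho=0.5):
    """estimates: (num_robots, dim); adjacency: (num_robots, num_robots) 0/1 graph."""
    updated = np.empty_like(estimates)
    for i in range(len(estimates)):
        neighbors = np.nonzero(adjacency[i])[0]
        consensus = estimates[neighbors].mean(axis=0) if len(neighbors) else estimates[i]
        updated[i] = estimates[i] - step * local_gradients[i] + rho * (consensus - estimates[i])
    return updated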
|
|
16:40-16:45, Paper ThET2.2 | |
DVM-SLAM: Decentralized Visual Monocular Simultaneous Localization and Mapping for Multi-Agent Systems |
|
Bird, Joshua | University of Cambridge |
Blumenkamp, Jan | University of Cambridge |
Prorok, Amanda | University of Cambridge |
Keywords: Multi-Robot Systems, Multi-Robot SLAM, SLAM
Abstract: Cooperative Simultaneous Localization and Mapping (C-SLAM) enables multiple agents to work together in mapping unknown environments while simultaneously estimating their own positions. This approach enhances robustness, scalability, and accuracy by sharing information between agents, reducing drift, and enabling collective exploration of larger areas. In this paper, we present Decentralized Visual Monocular SLAM (DVM-SLAM), the first open-source decentralized monocular C-SLAM system. By only utilizing low-cost and light-weight monocular vision sensors, our system is well suited for small robots and micro aerial vehicles (MAVs). DVM-SLAM's real-world applicability is validated on physical robots with a custom collision avoidance framework, showcasing its potential in real-time multi-agent autonomous navigation scenarios. We also demonstrate comparable accuracy to state-of-the-art centralized monocular C-SLAM systems. We open-source our code and provide supplementary material online.
|
|
16:45-16:50, Paper ThET2.3 | |
TCAFF: Temporal Consistency for Robot Frame Alignment |
|
Peterson, Mason B. | Massachusetts Institute of Technology |
Lusk, Parker C. | Massachusetts Institute of Technology |
Avila, Antonio | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Localization, Multi-Robot SLAM
Abstract: In the field of collaborative robotics, the ability to communicate spatial information like planned trajectories and shared environment information is crucial. When no global position information is available (e.g., indoor or GPS-denied environments), agents must align their coordinate frames before shared spatial information can be properly expressed and interpreted. Coordinate frame alignment is particularly difficult when robots have no initial alignment and are affected by odometry drift. To this end, we develop a novel multiple hypothesis algorithm, called TCAFF, for aligning the coordinate frames of neighboring robots. TCAFF considers potential alignments from associating sparse open-set object maps and leverages temporal consistency to determine an initial alignment and correct for drift, all without any initial knowledge of neighboring robot poses. We demonstrate TCAFF being used for frame alignment in a collaborative object tracking application on a team of four robots tracking six pedestrians and show that TCAFF enables robots to achieve a tracking accuracy similar to that of a system with ground truth localization. The code and hardware dataset are available at https://github.com/mit-acl/tcaff.
|
|
16:50-16:55, Paper ThET2.4 | |
Effective Heterogeneous Point Cloud-Based Place Recognition and Relative Localization for Ground and Aerial Vehicles |
|
Mao, Rui | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Keywords: Range Sensing, Localization, Multi-Robot SLAM
Abstract: Place recognition and relative localization are crucial for realizing the potential of collaboration in ground and aerial robot teams. Many existing works focus only on ground robots and are not well-suited for heterogeneous robot systems in large-scale environments. In this paper, we propose a novel pipeline based on BEV density images, combined with an enhanced data structure, for place recognition in air-ground robotic collaboration systems. An efficient height alignment algorithm is proposed for relative localization. Extensive experiments on various types of public datasets validate the efficacy of our method compared to other SOTA works. We also show that our method is capable of detecting inter- and intra-robot loop closures in a ground and aerial multi-session SLAM system.
|
|
16:55-17:00, Paper ThET2.5 | |
Distributed Invariant Kalman Filter for Object-Level Multi-Robot Pose SLAM |
|
Li, Haoying | Chinese University of Hong Kong, Shenzhen |
Zeng, Qingcheng | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Haoran | Chinese University of Hong Kong, Shenzhen |
Zhang, Yanglin | The Chinese University of Hong Kong, Shenzhen |
Wu, Junfeng | The Chinese University of Hong Kong, Shenzhen |
Keywords: Distributed Robot Systems, Multi-Robot SLAM, Autonomous Agents
Abstract: Cooperative localization and target tracking are essential for multi-robot systems to implement high-level tasks. To this end, we propose a distributed invariant Kalman filter (KF) based on covariance intersection (CI) for effective multi-robot pose estimation. The paper utilizes object-level measurement models, which condense information and further reduce the communication burden. Besides, by modeling states on special Lie groups and representing uncertainty in the corresponding Lie algebras, better linearity and consistency are obtained under the invariant KF framework. We also use a combination of CI and KF to avoid overly confident or conservative estimates in multi-robot systems with intricate and unknown correlations, and some level of robot degradation is acceptable through multi-robot collaboration. The simulation and real data experiments validate the practicability and superiority of the proposed algorithm.
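For reference, the textbook covariance intersection rule that underlies the CI component can be sketched as below; this is the standard Euclidean form with a simple grid search over the weight, not the paper's invariant, Lie-group formulation.

# Fuse two estimates with unknown cross-correlation via covariance intersection.
import numpy as np

def covariance_intersection(x1, P1, x2, P2, num_w=50):
    best = None
    for w in np.linspace(1e-3, 1 - 1e-3, num_w):
        info = w * np.linalg.inv(P1) + (1 - w) * np.linalg.inv(P2)
        P = np.linalg.inv(info)
        if best is None or np.trace(P) < np.trace(best[1]):
            x = P @ (w * np.linalg.inv(P1) @ x1 + (1 - w) * np.linalg.inv(P2) @ x2)
            best = (x, P)
    return best  # (fused mean, fused covariance)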
|
|
17:00-17:05, Paper ThET2.6 | |
MT-PCR: Leveraging Modality Transformation for Large-Scale Point Cloud Registration with Limited Overlap |
|
Wu, Yilong | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Chen, Yuxi | University of Science and Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Shen, Yedong | University of Science & Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Zhang, Lu | Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
Keywords: Multi-Robot SLAM, Aerial Systems: Perception and Autonomy, Mapping
Abstract: Large-scale scene point cloud registration with limited overlap is a challenging task due to computational load and constrained data acquisition. To tackle these issues, we propose a point cloud registration method, MT-PCR, based on Modality Transformation. MT-PCR leverages a Bird’s Eye View (BEV) capturing the maximal overlap information to improve the accuracy and utilizes images to provide complementary spatial features. Specifically, MT-PCR converts 3D point clouds to BEV images and estimates correspondences by 2D image keypoint extraction and matching. Subsequently, the 2D correspondence estimates are transformed back to 3D point clouds using inverse mapping. We have applied MT-PCR to Terrestrial Laser Scanning (TLS) and Aerial Laser Scanning (ALS) point cloud registration on the GrAco dataset, involving 8 low-overlap, square-kilometer scale registration scenarios. Experiments and comparisons with commonly used methods demonstrate that MT-PCR can achieve superior accuracy and robustness in large-scale scenes with limited overlap.
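The modality-transformation step can be pictured with a short sketch that rasterizes a point cloud into a BEV density image, after which any standard 2D keypoint detector and matcher can be applied; the grid size, extent, and normalization are assumed values, not those used by MT-PCR.

import numpy as np

def bev_density_image(points, grid_size=0.5, extent=200.0):
    """points: (N, 3) array in metres; returns an 8-bit BEV density image."""
    bins = int(2 * extent / grid_size)
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                range=[[-extent, extent], [-extent, extent]])
    hist = np.log1p(hist)                           # compress dynamic range
    return (255 * hist / max(hist.max(), 1e-6)).astype(np.uint8)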
|
|
17:05-17:10, Paper ThET2.7 | |
Large-Scale Multi-Session Point-Cloud Map Merging |
|
Wei, Hairuo | The University of Hong Kong |
Li, Rundong | University of Hong Kong |
Cai, Yixi | KTH Royal Institute of Technology |
Yuan, Chongjian | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zou, Zuhao | The University of Hong Kong |
Wu, Huajie | Hong Kong University |
Zheng, Chunran | The University of Hong Kong |
Zhou, Shunbo | Huawei |
Xue, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Zhang, Fu | University of Hong Kong |
Keywords: Multi-Robot SLAM, Mapping, SLAM
Abstract: This paper introduces LAMM, an open-source framework for large-scale multi-session 3D LiDAR point cloud map merging. LAMM can automatically integrate sub-maps from multiple agents carrying LiDARs with different scanning patterns, facilitating place feature extraction, data association, and global optimization in various environments. Our framework incorporates two key novelties that enable robust, accurate, large-scale map merging. The first novelty is a temporal bidirectional filtering mechanism that removes dynamic objects from 3D LiDAR point cloud data. This eliminates the effect of dynamic objects on the 3D map model, providing higher-quality map merging results. The second novelty is a robust and efficient outlier removal algorithm for detected loop closures. This algorithm ensures a high recall rate and a low false alarm rate in position retrieval, significantly reducing outliers in repetitive environments during large-scale merging. We evaluate our framework using various datasets, including KITTI, H
|
|
ThET3 |
303 |
Robotics and Automation in Life Science and Rescue Applications |
Regular Session |
Chair: Kaiser, Tanja Katharina | University of Technology Nuremberg |
Co-Chair: Alterovitz, Ron | University of North Carolina at Chapel Hill |
|
16:35-16:40, Paper ThET3.1 | |
The qPCRBot: Combining Automated Data Handling, Standardization, and Robotic Labware Transport for Better qPCR Measurements |
|
Zwirnmann, Henning | Technical University of Munich |
Eckhoff, Moritz | Technical University of Munich |
Knobbe, Dennis | Technical University of Munich |
Fülöp, Dorian | Technical University of Munich (TUM) |
Gabrielli, Andrea | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Robotics and Automation in Life Sciences, Software Architecture for Robotic and Automation, Biological Cell Manipulation
Abstract: Laboratory automation is a key driver for higher efficiency and reproducibility of experiments and measurements in natural science laboratories. One process that is particularly susceptible to manual errors in the physical handling of labware, faulty data analyses, and incomplete reporting is the quantitative Polymerase Chain Reaction (qPCR). It is a ubiquitous analysis method in biolaboratories to amplify and measure the amount of a specific DNA sequence in a sample. Our system, which we call the qPCRBot, addresses these issues through three key pillars: automating data analysis and handling processes, standardizing data management and system communication protocols, and utilizing a robotic manipulator for labware transport. To achieve this, we developed a SiLA 2-based client-server architecture for unified and standardized access to both the qPCR device and the robot. For the manipulator, we implemented a Cartesian motion generator to ensure proper labware transport. We transform all experiment data to a standardized, XML-based format and integrate a widely-used Laboratory Information Management System for its storage. These developments collectively enable streamlined qPCR measurements without human interaction, thus enhancing both efficiency and reproducibility.
|
|
16:40-16:45, Paper ThET3.2 | |
Distributed Pursuit of an Evader with Adaptive Robust Path Control under State Measurement Uncertainty |
|
Rao, Kai | East China University of Science and Technology |
Yan, Huaicheng | East China University of Science and Technology |
Huang, Zhihao | East China University of Science and Technology |
Yang, Penghui | East China University of Science and Technology |
Lv, Yunkai | East China University of Science and Technology |
Keywords: Surveillance Robotic Systems, Search and Rescue Robots, Multi-Robot Systems
Abstract: This paper presents a distributed pursuit framework for environments with obstacles considering state measurement uncertainty. Our framework consists of two primary components: the computation of safe pursuit regions based on Voronoi cell (VC) and the solution of an adaptive robust path controller based on Control Barrier Function (CBF). Initially, the chance constrained obstacle-aware Voronoi cell (CCOVC) for each pursuer is constructed by calculating separation hyperplane and buffer terms. Subsequently, we formulate chance CBF and chance Control Lyapunov Function (CLF) constraints, using convex approximation to determine their upper bounds. We then find the adaptive robust path controller by solving a Quadratically Constrained Quadratic Program (QCQP). The advantage of this framework lies in its capability to adaptively compute the path controller and ensure robust collision avoidance among pursuers and with obstacles. Simulation and experimental results demonstrate the effectiveness and robustness of the proposed framework.
|
|
16:45-16:50, Paper ThET3.3 | |
Multimodal Behaviour Trees for Robotic Laboratory Task Automation |
|
Fakhruldeen, Hatem | University of Liverpool |
Raveendran Nambiar, Arvind | University of Liverpool |
Veeramani, Satheeshkumar | University of Liverpool |
Tailor, Bonilkumar Vijaykumar | University of Liverpool |
Beyzaee Juneghani, Hadi | University of Liverpool |
Pizzuto, Gabriella | University of Liverpool |
Cooper, Andrew Ian | University of Liverpool |
Keywords: Robotics and Automation in Life Sciences
Abstract: Laboratory robotics offer the capability to conduct experiments with a high degree of precision and reproducibility, with the potential to transform scientific research. Trivial and repeatable tasks, e.g., sample transportation for analysis and vial capping, are well-suited for robots; if these are done successfully and reliably, chemists can devote their efforts to more critical research activities. Currently, robots can perform these tasks faster than chemists, but how reliable are they? Improper capping could result in human exposure to toxic chemicals, which could be fatal. To ensure that robots perform these tasks as accurately as humans, sensory feedback is required to assess the progress of task execution. To address this, we propose a novel methodology based on behaviour trees with multimodal perception. Along with automating robotic tasks, this methodology also verifies the successful execution of the task, a fundamental requirement in safety-critical environments. The experimental evaluation was conducted on two lab tasks: sample vial capping and laboratory rack insertion. The results show a high success rate, i.e., 88% for capping and 92% for insertion, along with strong error detection capabilities. This ultimately demonstrates the robustness and reliability of our approach and suggests that multimodal behaviour trees should pave the way towards the next generation of robotic chemists.
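The core pattern, a behaviour tree that pairs an action with a multimodal success check, can be illustrated with the hand-rolled sketch below; the node classes and sensor cues are hypothetical and are not the authors' implementation.

SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, children): self.children = children
    def tick(self):
        for child in self.children:
            if child.tick() == FAILURE:
                return FAILURE
        return SUCCESS

class CapVial:
    def tick(self):
        print("executing capping motion")
        return SUCCESS

class VerifyCapSeated:
    """Multimodal check: force/torque and camera cues must both agree."""
    def __init__(self, torque_ok, vision_ok):
        self.torque_ok, self.vision_ok = torque_ok, vision_ok
    def tick(self):
        return SUCCESS if (self.torque_ok and self.vision_ok) else FAILURE

tree = Sequence([CapVial(), VerifyCapSeated(torque_ok=True, vision_ok=True)])
print(tree.tick())   # SUCCESS only if the action ran and both modalities confirm it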
|
|
16:50-16:55, Paper ThET3.4 | |
A Hierarchical Graph-Based Terrain-Aware Autonomous Navigation Approach for Complementary Multimodal Ground-Aerial Exploration |
|
Patel, Akash | Luleå University of Technology |
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Stathoulopoulos, Nikolaos | Luleå University of Technology |
Sankaranarayanan, Viswa Narayanan | Lulea University of Technology |
Tevetzidis, Ilias | Luleå University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Search and Rescue Robots, Field Robots, Cooperating Robots
Abstract: Autonomous navigation in unknown environments is a fundamental challenge in robotics, particularly in coordinating ground and aerial robots to maximize exploration efficiency. This paper presents a novel approach that utilizes a hierarchical graph to represent the environment, encoding both geometric and semantic traversability. The framework enables the robots to compute a shared confidence metric, which helps the ground robot assess terrain and determine when deploying the aerial robot will extend exploration. The robot's confidence in traversing a path is based on factors such as predicted volumetric gain, path traversability, and collision risk. A hierarchy of graphs is used to maintain an efficient representation of traversability and frontier information through multi-resolution maps. Evaluated in a real subterranean exploration scenario, the approach allows the ground robot to autonomously identify zones that are no longer traversable but suitable for aerial deployment. By leveraging this hierarchical structure, the ground robot can selectively share graph information on confidence-assessed frontier targets from parts of the scene, enabling the aerial robot to navigate beyond obstacles and continue exploration.
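One plausible reading of the shared confidence metric is a weighted combination of predicted volumetric gain, terrain traversability, and collision risk; the weights, threshold, and normalization below are purely assumed for illustration and are not taken from the paper.

def path_confidence(volumetric_gain, traversability, collision_risk,
                    w_gain=0.4, w_trav=0.4, w_risk=0.2):
    """All inputs normalized to [0, 1]; higher values favour ground traversal."""
    return w_gain * volumetric_gain + w_trav * traversability - w_risk * collision_risk

# Deploy the aerial robot when the ground robot's confidence drops below a threshold.
deploy_aerial = path_confidence(0.8, 0.2, 0.7) < 0.3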
|
|
16:55-17:00, Paper ThET3.5 | |
Introducing Collaborative Robots As a First Step towards Autonomous Reprocessing of Medical Equipment |
|
Voigt, Florian | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Robotics and Automation in Life Sciences, Bimanual Manipulation, Medical Robots and Systems
Abstract: Ensuring the sterility of medical equipment, particularly endoscopes used in environments teeming with diverse pathogens and drug-resistant bacteria, is crucial for safe medical procedures. However, the complexity of endoscope reprocessing, which involves numerous dexterous manual manipulations, poses significant challenges. Achieving certification for sterilization requires precise, repetitive execution with strict tolerances. In this study, we propose a framework that automates the handling and storage of endoscopes right after the sterilization process and employs compliant collaborative robots to address these dexterous manipulation challenges. In the first stage, we identified the key manipulation skills involved in the process through observations and feedback from medical personnel. In the second stage, we proposed a system that employs a high-level action planner to orchestrate the removal and storage of endoscopes, integrating two collaborative robots and a linear unit. Through real-time force measurements, compliant control, task knowledge, and safety protocols, we establish a system that ensures the safety of both medical equipment and personnel in proximity. In our first experiment, we conducted 50 trials with a 100% reliability rate. Each trial had an execution time of 102 seconds, with a variance of 1.2 seconds. In our second experiment, we performed 10 trials with a human obstructing the transfer path, facing away from the robot. In all cases, the system successfully and promptly detected the collision. This work pioneers the automation of medical reprocessing in sterile environments using tactile robots and addresses the associated challenges.
|
|
17:00-17:05, Paper ThET3.6 | |
CloudTrack: Scalable UAV Tracking with Cloud Semantics |
|
Blei, Yannik | University of Technology Nuremberg |
Krawez, Michael | University of Technology Nuremberg |
Nilavadi, Nisarga | University of Technology Nuremberg |
Kaiser, Tanja Katharina | University of Technology Nuremberg |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Search and Rescue Robots, Aerial Systems: Applications, Human Detection and Tracking
Abstract: Nowadays, unmanned aerial vehicles (UAVs) are commonly used in search and rescue scenarios to gather information in the search area. The automatic identification of the person searched for in aerial footage could increase the autonomy of such systems, reduce the search time, and thus increase the missing person’s chances of survival. In this paper, we present a novel approach to perform semantically conditioned open vocabulary object tracking that is specifically designed to cope with the limitations of UAV hardware. Our approach has several advantages: It can run with verbal descriptions of the missing person, e.g., the color of the shirt, it does not require dedicated training to execute the mission, and can efficiently track a potentially moving person. Our experimental results demonstrate the versatility and efficacy of our approach. We publish the method's source code at https://github.com/utn-blei/CloudTrack.
|
|
17:05-17:10, Paper ThET3.7 | |
The Experiment Orchestration System (EOS): Comprehensive Foundation for Laboratory Automation |
|
Angelopoulos, Angelos | University of North Carolina at Chapel Hill |
Baykal, Cem | University of North Carolina at Chapel Hill |
Kandel, Jade | University of North Carolina at Chapel Hill |
Verber, Matthew | University of North Carolina at Chapel Hill |
Cahoon, James | University of North Carolina at Chapel Hill |
Alterovitz, Ron | University of North Carolina at Chapel Hill |
Keywords: Robotics and Automation in Life Sciences, Software Architecture for Robotic and Automation, Foundations of Automation
Abstract: As scientific research in chemistry, materials science, and applied sciences becomes increasingly complex and data-driven, there is a growing need for efficient, scalable, and flexible automation to accelerate discoveries and reduce human burden and error in laboratories. We introduce the Experiment Orchestration System (EOS), an open-source software framework and runtime offering a comprehensive foundation for laboratory automation. EOS offers an extensible framework allowing users to define labs, devices, tasks, experiments, and optimization criteria using YAML and Python plugins, and also offers a distributed runtime for managing and executing automation. EOS has a central orchestrator that communicates with and controls laboratory equipment to execute tasks. EOS implements autonomous experiment campaigns, parameter optimization, task scheduling, result aggregation, and more. By providing a common infrastructure for laboratory automation, EOS aims to reduce automation implementation barriers and accelerate discoveries in science laboratories.
|
|
ThET4 |
304 |
Bioinspiration and Biomimetics 3 |
Regular Session |
Chair: Floreano, Dario | Ecole Polytechnique Fédérale De Lausanne (EPFL) |
Co-Chair: Degani, Amir | Technion - Israel Institute of Technology |
|
16:35-16:40, Paper ThET4.1 | |
Design of a Bioinspired Jumping Mechanism for Self-Takeoff of Flapping Robot |
|
Pan, Erzhen | Harbin Institute of Technology, Shenzhen |
Sun, Wei | Harbin Institute of Technology Shenzhen |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Keywords: Biologically-Inspired Robots, Biomimetics
Abstract: Most birds in nature rely on jumping for take-off. Flapping-wing robots can flap and fly like birds but require an operator to take off, as they are unable to generate sufficient lift to maintain flight at low airspeed and must accelerate to take-off speed in a short time. This poses a challenge for the design of the jumping mechanism. Inspired by the jump-takeoff of birds, this study designs a simple and lightweight jumping leg that is capable of storing and releasing energy with only one degree of freedom. In addition, a prototype with a wingspan of 2 meters and a mass of 1.6 kilograms was developed and tested; it accelerates to 4 m/s in 52 milliseconds by jumping, achieving jumping take-off from the ground.
|
|
16:40-16:45, Paper ThET4.2 | |
Embodied Adaptive Sensing for Odor Concentration Maximization in Bio-Inspired Robotics |
|
Homchanthanakul, Jettanan | Vidyasirimedhi Institute of Science and Technology |
Shigaki, Shunsuke | National Institute of Informatics |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Biologically-Inspired Robots, Neural and Fuzzy Control, Legged Robots
Abstract: Animals exhibit remarkable adaptability in sensing their environments, employing strategies that optimize information gathering. For instance, silk moths adjust their wing-flapping frequency to detect pheromones, while dogs modify their sniffing behavior by altering sniff height and frequency based on proximity to an odor source. Despite the potential to enhance odor detection for olfactory navigation by drawing inspiration from these natural mechanisms, many existing approaches focus on computationally intensive methods like multi-sensory integration or rely on multiple robots for odor localization, rather than leveraging embodied sensing. In this study, we propose an embodied adaptive sensing strategy that enhances odor detection by implementing an active odor sensor on a legged robot and applying a bio-inspired adaptive robot height control system for dynamically adapting the robot's height based on real-time gas concentration feedback. The control system employs a simple artificial hormone mechanism to regulate the robot height by processing gas concentration derivatives, mimicking biological adaptability. By utilizing the interaction between the active odor sensor, adaptive control system, and the legged body, this approach allows the robot to optimize its height online to capture the maximum gas concentration, thereby reducing the need for complex algorithms and high computational resources. As a result, it offers a more efficient solution for odor-driven tasks, with potential applications in real-world environments.
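A toy version of the hormone-gated height adaptation could look as follows; the gains, decay rate, and height limits are illustrative assumptions rather than the values used on the robot.

def update_height(height, conc, prev_conc, hormone, dt=0.1,
                  gain=0.02, decay=0.95, h_min=0.10, h_max=0.35):
    """Adapt body height using the time-derivative of gas concentration,
    gated by a hormone-like internal variable that decays over time."""
    d_conc = (conc - prev_conc) / dt
    hormone = decay * hormone + d_conc      # accumulate recent concentration change
    height += gain * hormone                # move toward increasing concentration
    return min(max(height, h_min), h_max), hormone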
|
|
16:45-16:50, Paper ThET4.3 | |
SKOOTR: A SKating, Omni-Oriented, Tripedal Robot |
|
Hung, Adam Joshua | University of Michigan |
Enninful Adu, Challen | University of Michigan |
Moore, Talia | University of Michigan |
Keywords: Biologically-Inspired Robots, Biomimetics
Abstract: In both animals and robots, locomotion capabilities are determined by the physical structure of the system. The majority of legged animals and robots are bilaterally symmetric, which facilitates locomotion with consistent headings and obstacle traversal, but leads to constraints in their turning ability. On the other hand, radially symmetric animals have demonstrated rapid turning abilities enabled by their omni-directional body plans. Radially symmetric tripedal robots are able to turn instantaneously, but are commonly constrained by needing to change direction with every step, resulting in inefficient and less stable locomotion. Inspired by the radial symmetry and maneuverability of brittle stars and octopuses, we introduce a novel design for a tripedal robot that has both frictional and rolling contacts. Additionally, a freely rotating central sphere provides an added contact point so the robot can retain a stable tripod base of support while lifting and pushing with any one of its legs. The SKating, Omni-Oriented, Tripedal Robot (SKOOTR) is more versatile and stable than existing tripedal robots. It is capable of multiple forward gaits, multiple turning maneuvers, obstacle traversal, and stair climbing. SKOOTR has been designed to facilitate customization for diverse applications: it is fully open-source, is constructed with 3D printed or off-the-shelf parts, and costs approximately 500 USD to build. A project page with CAD files, assembly guide, and links to the github repository is posted at https://www.embirlab.com/skootr.
|
|
16:50-16:55, Paper ThET4.4 | |
AllGaits: Learning All Quadruped Gaits and Transitions |
|
Bellegarda, Guillaume | EPFL |
Shafiee, Milad | EPFL |
Ijspeert, Auke | EPFL |
Keywords: Biologically-Inspired Robots, Legged Robots
Abstract: We present a framework for learning a single policy capable of producing all quadruped gaits and transitions. The framework consists of a policy trained with deep reinforcement learning (DRL) to modulate the parameters of a system of abstract oscillators (i.e. Central Pattern Generator), whose output is mapped to joint commands through a pattern formation layer that sets the gait style, i.e. body height, swing foot ground clearance height, and foot offset. Different gaits are formed by changing the coupling between different oscillators, which can be instantaneously selected at any velocity by a user. With this framework, we systematically investigate which gait should be used at which velocity, and when gait transitions should occur from a Cost of Transport (COT), i.e. energy-efficiency, point of view. Additionally, we note how gait style changes as a function of locomotion speed for each gait to keep the most energy-efficient locomotion. While the currently most popular gait (trot) does not result in the lowest COT, we find that considering different co-dependent metrics such as mean base angular velocity and joint acceleration result in different 'optimal' gaits than those that minimize COT. We deploy our controller in various hardware experiments, focusing on 9 quadruped animal gaits, and demonstrate generalizability to novel and unseen gaits during training, and robustness to leg failures.
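The oscillator layer that the policy modulates can be sketched as coupled phase oscillators whose phase-offset vector selects the gait; the frequency, coupling gain, and trot offsets below are assumed values for illustration only, not the paper's trained parameters.

import numpy as np

TROT = np.array([0.0, np.pi, np.pi, 0.0])   # assumed per-leg phase offsets (FL, FR, HL, HR)

def cpg_step(phases, freq=2.0, coupling=1.0, offsets=TROT, dt=0.01):
    """phases: (4,) oscillator phases, one per leg."""
    dphi = 2 * np.pi * freq * np.ones(4)
    for i in range(4):
        for j in range(4):
            dphi[i] += coupling * np.sin(phases[j] - phases[i] - (offsets[j] - offsets[i]))
    return (phases + dt * dphi) % (2 * np.pi)

Swapping the offsets vector (e.g., to a walk or bound pattern) changes the gait without touching the rest of the pipeline, which is the kind of instantaneous gait selection the abstract describes.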
|
|
16:55-17:00, Paper ThET4.5 | |
Bird-Inspired Tendon Coupling Improves Paddling Efficiency by Shortening Phase Transition Times |
|
Lin, Jianfeng | Georgia Institute of Technology |
Guo, Zhao | Wuhan University |
Badri-Spröwitz, Alexander | Max Planck Institute for Intelligent Systems |
Keywords: Biologically-Inspired Robots, Biomimetics, Tendon/Wire Mechanism
Abstract: Drag-based swimming with rowing appendages, fins, and webbed feet is a widely adapted locomotion form in aquatic animals. To develop effective underwater and swimming vehicles, a wide range of bioinspired drag-based paddles have been proposed, often faced with a trade-off between propulsive efficiency and versatility. Webbed feet provide an effective propulsive force in the power phase, are lightweight and robust, and can even be partially folded away in the recovery phase. However, during the transition between recovery and power phase, much time is lost folding and unfolding, leading to drag and reducing efficiency. In this work, we took inspiration from the coupling tendons of aquatic birds and utilized tendon coupling mechanisms to shorten the transition time between recovery and power phase. Results from our hardware experiments show that the proposed mechanisms improve propulsive efficiency by 2.0 and 2.4 times compared to a design without extensor tendons or based on a passive paddle, respectively. We further report that distal leg joint clutching, which has been shown to improve efficiency in terrestrial walking, did not play a major role in swimming locomotion. In sum, we describe a new principle for an efficient, drag-based leg and paddle design, with potential relevance for the swimming mechanics in aquatic birds.
|
|
17:00-17:05, Paper ThET4.6 | |
A Bio-Inspired Sand-Rolling Robot: Effect of Body Shape on Sand Rolling Performance |
|
Liao, Xingjue | University of Southern California |
Liu, Wenhao | University of Southern California |
Wu, Hao | University of Southern California |
Qian, Feifei | University of Southern California |
Keywords: Biologically-Inspired Robots, Biomimetics, Passive Walking
Abstract: The capability of effectively moving on complex terrains such as sand and gravel can empower our robots to robustly operate in outdoor environments, and assist with critical tasks such as environment monitoring, search-and-rescue, and supply delivery. Inspired by the Mount Lyell salamander's ability to curl its body into a loop and effectively roll down hill slopes, in this study we develop a sand-rolling robot and investigate how its locomotion performance is governed by the shape of its body. We experimentally tested three different body shapes: Hexagon, Quadrilateral, and Triangle. We found that Hexagon and Triangle can achieve a faster rolling speed on sand, but exhibited more frequent failures of getting stuck. Analysis of the interaction between robot and sand revealed the failure mechanism: the deformation of the sand produced a local "sand incline" underneath robot contact segments, increasing the effective region of supporting polygon (ERSP) and preventing the robot from shifting its center of mass (CoM) outside the ERSP to produce sustainable rolling. Based on this mechanism, a highly-simplified model successfully captured the critical body pitch for each rolling shape to produce sustained rolling on sand, and informed design adaptations that mitigated the locomotion failures and improved robot speed by more than 200%. Our results provide insights into how locomotors can utilize different morphological features to achieve robust rolling motion across deformable substrates.
|
|
17:05-17:10, Paper ThET4.7 | |
A Programmable Substrate to Study Robots Jumping from Non-Rigid Surfaces |
|
Divi, Sathvik | Carnegie Mellon University |
Yim, Justin K. | University of Illinois Urbana-Champaign |
Bedillion, Mark | Carnegie Mellon University |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Biologically-Inspired Robots, Biomimetics, Compliance and Impedance Control
Abstract: This study presents the development, characterization, and demonstration of a tunable substrate for small jumping robots. Jumping robots in the literature are typically evaluated when jumping from rigid surfaces, in contrast to surfaces with more significant compliance or damping that are encountered in the natural world. The aim of this work is to create a physical substrate, or 'ground', for which the effective mass, compliance, and damping can be programmed. This system enables quick testing of various substrate conditions and also allows for the introduction of complex nonlinearities to analyze the interactions between latch-mediated spring actuation (LaMSA) systems and their environment. A mathematical model for the substrate is defined and the system is built with a fast brushless DC motor and controller running on a real-time target machine. The results illustrate the range of compliance and damping that can be achieved, as well as example jumps from the substrate using a 4 g jumper and a 108 g jumping robot.
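The programmable substrate idea can be summarized as rendering a virtual mass-spring-damper under the measured foot force; the parameter values in the sketch are assumptions, not the hardware's calibrated settings.

def substrate_step(x, v, force, m_eff=0.05, k=800.0, c=2.0, dt=0.001):
    """x, v: virtual substrate position/velocity; force: measured contact force (N).
    Returns the ground position/velocity the motor should render next."""
    a = (force - k * x - c * v) / m_eff
    v += a * dt
    x += v * dt
    return x, v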
|
|
ThET5 |
305 |
Learning for Legged Locomotion 2 |
Regular Session |
Chair: Havoutis, Ioannis | University of Oxford |
Co-Chair: Daniel, Mélodie | LaBRI - Université De Bordeaux |
|
16:35-16:40, Paper ThET5.1 | |
Fine-Tuning Hard-To-Simulate Objectives for Quadruped Locomotion: A Case Study on Total Power Saving |
|
Nai, Ruiqian | Tsinghua University |
You, Jiacheng | Tsinghua University |
Cao, Liu | Tsinghua University |
Cui, Hanchen | University of Minnesota Twin Cities |
Zhang, Shiyuan | Tsinghua University |
Xu, Huazhe | Tsinghua University |
Gao, Yang | Tsinghua University |
Keywords: Reinforcement Learning, Legged Robots
Abstract: Legged locomotion is not just about mobility; it also encompasses crucial objectives such as energy efficiency, safety, and user experience, which are vital for real-world applications. However, key factors such as battery power consumption and stepping noise are often inaccurately modeled or missing in common simulators, leaving these aspects poorly optimized or unaddressed by current sim-to-real methods. Hand-designed proxies, such as mechanical power and foot contact forces, have been used to address these challenges but are often problem-specific and inaccurate. In this paper, we propose a data-driven framework for fine-tuning locomotion policies, targeting these hard-to-simulate objectives. Our framework leverages real-world data to model these objectives and incorporates the learned model into simulation for policy improvement. We demonstrate the effectiveness of our framework on power saving for quadruped locomotion, achieving a significant 24-28% net reduction in total power consumption from the battery pack at various speeds. In essence, our approach offers a versatile solution for optimizing hard-to-simulate objectives in quadruped locomotion, providing an easy-to-adapt paradigm for continual improving with real-world knowledge.
|
|
16:40-16:45, Paper ThET5.2 | |
Think on Your Feet: Seamless and Command-Adaptive Transition between Human-Like Locomotions |
|
Huang, Huaxing | Noetix Robotics |
Cui, Wenhao | Noetix |
Zhang, Tonghe | Tsinghua University |
Li, Shengtao | Noetix |
Han, Jinchao | Noetix |
Qin, Bangyu | Noetix Robotics |
Zheng, Liang | Noetix |
Tang, Ziyang | Noetix Robotics |
Zhang, Tianchu | Noetix Robotics |
Hu, Chenxu | Tsinghua University |
Zhang, Shipu | Noetix Robotics |
Jiang, Zheyuan | NOETIX Robotics |
Keywords: Reinforcement Learning, Imitation Learning, Humanoid and Bipedal Locomotion
Abstract: While it is relatively easy to train humanoid robots to mimic specific locomotion skills, it is more challenging to learn from various motions and adhere to continuously changing commands. These robots must accurately track motion instructions, seamlessly transition between a variety of movements, and master intermediate motions not present in their reference data. In this work, we propose a novel approach that integrates human-like motion transfer with precise velocity tracking by a series of improvements to classical imitation learning. To enhance generalization, we employ the Wasserstein divergence criterion (WGAN-div). Furthermore, a Hybrid Internal Model provides structured estimates of hidden states and velocity to enhance mobile stability and environment adaptability, while a curiosity bonus fosters exploration. Our comprehensive method promises highly human-like locomotion that adapts to varying velocity requirements, direct generalization to unseen motions and multitasking, as well as zero-shot transfer to the simulator and the real world across different terrains. These advancements are validated through simulations across various robot models and extensive real-world experiments.
|
|
16:45-16:50, Paper ThET5.3 | |
RINA: Rapid Introspective Neural Adaptation for Out-Of-Distribution Payload Configurations on Quadruped Robots |
|
Youngquist, Oscar | University of Massachusetts Amherst |
Zhang, Hao | University of Massachusetts Amherst |
Keywords: Machine Learning for Robot Control, Legged Robots, Deep Learning Methods
Abstract: Adaptive locomotion is a fundamental capability for quadruped robots, particularly in real-world scenarios when they must transport novel or out-of-distribution (O.O.D.) payloads across diverse terrains. Previous learning-based methods often tightly couple a locomotion controller's learned parameters with the adaptation process, which requires extensive pre-training or slow online updates when encountering O.O.D. payloads. To enable adaptation of quadruped locomotion to O.O.D. payloads, we propose the novel Rapid Introspective Neural Adaptation (RINA) method that rapidly compensates for differences between expected and actual joint torques caused by O.O.D. payloads. RINA introduces an adaptive residual dynamics representation that decouples the learning model's parameters from those used for adaptation. A new neural operator network is introduced to learn a set of basis functions as the learning model, which are combined using linear coefficients to predict residual dynamics. Then, these residual dynamics are used to adjust the locomotion controller's output, compensating for additional torques induced by the O.O.D. payload. During execution, the mixing coefficients can be rapidly and introspectively adapted on-the-go to generate joint torque compensations for O.O.D. payloads, while keeping the learned basis functions unchanged. Experimental results have demonstrated that our RINA approach well addresses on-the-go O.O.D. payload adaptation on varied natural terrains without collecting and retraining on additional data and outperforms baseline methods.
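The decoupling of learned basis functions from the adapted mixing coefficients can be illustrated with a recursive-least-squares update over a fixed feature vector; the class below is a generic sketch under that assumption, not the RINA implementation.

import numpy as np

class ResidualMixer:
    def __init__(self, num_basis, dim, lam=0.99):
        self.W = np.zeros((num_basis, dim))   # mixing coefficients (adapted online)
        self.P = np.eye(num_basis) * 1e3      # RLS covariance
        self.lam = lam                        # forgetting factor

    def predict(self, basis_out):             # basis_out: (num_basis,) fixed-basis features
        return basis_out @ self.W             # predicted residual torques, shape (dim,)

    def adapt(self, basis_out, torque_error):
        k = self.P @ basis_out / (self.lam + basis_out @ self.P @ basis_out)
        self.W += np.outer(k, torque_error - self.predict(basis_out))
        self.P = (self.P - np.outer(k, basis_out @ self.P)) / self.lam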
|
|
16:50-16:55, Paper ThET5.4 | |
Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion |
|
Liu, Dikai | NVIDIA |
Zhang, Tianwei | Nanyang Technological University |
Yin, Jianxiong | NVIDIA |
See, Simon | NVIDIA |
Keywords: Legged Robots
Abstract: With the rising focus on quadrupeds, a generalized policy capable of handling different robot models and sensor inputs becomes highly beneficial. Although several methods have been proposed to address different morphologies, it remains a challenge for learning-based policies to manage various combinations of proprioceptive information. This paper presents Masked Sensory-Temporal Attention (MSTA), a novel transformer-based mechanism with masking for quadruped locomotion. It employs direct sensor-level attention to enhance the sensory-temporal understanding and handle different combinations of sensor data, serving as a foundation for incorporating unseen information. MSTA can effectively understand its states even with a large portion of missing information, and is flexible enough to be deployed on physical systems despite the long input sequence.
|
|
16:55-17:00, Paper ThET5.5 | |
Robust Robot Walker: Learning Agile Locomotion Over Tiny Traps |
|
Zhu, Shaoting | Tsinghua University |
Huang, Runhan | Tsinghua University |
Mou, Linzhan | University of Pennsylvania |
Zhao, Hang | Tsinghua University |
Keywords: Legged Robots, Reinforcement Learning, AI-Based Methods
Abstract: Quadruped robots must exhibit robust walking capabilities in practical applications. In this work, we propose a novel approach that enables quadruped robots to pass various small obstacles, or "tiny traps". Existing methods often rely on exteroceptive sensors, which can be unreliable for detecting such tiny traps. To overcome this limitation, our approach focuses solely on proprioceptive inputs. We introduce a two-stage training framework incorporating a contact encoder and a classification head to learn implicit representations of different traps. Additionally, we design a set of tailored reward functions to improve both the stability of training and the ease of deployment for goal-tracking tasks. To benefit further research, we design a new benchmark for the tiny trap task. Extensive experiments in both simulation and real-world settings demonstrate the effectiveness and robustness of our method. The appendix can be found on the project page: https://robust-robot-walker.github.io/.
|
|
17:00-17:05, Paper ThET5.6 | |
FRASA: An End-To-End Reinforcement Learning Agent for Fall Recovery and Stand up of Humanoid Robots |
|
Gaspard, Clément | LaBRI - University of Bordeaux |
Duclusaud, Marc | LaBRI - University of Bordeaux |
Passault, Grégoire | LaBRI |
Daniel, Mélodie | LaBRI - Université De Bordeaux |
Ly, Olivier | LaBRI - Bordeaux University |
Keywords: Reinforcement Learning, Humanoid Robot Systems, Body Balancing
Abstract: Humanoid robotics faces significant challenges in achieving stable locomotion and recovering from falls in dynamic environments. Traditional methods, such as Model Predictive Control (MPC) and Key Frame Based (KFB) routines, either require extensive fine-tuning or lack real-time adaptability. This paper introduces FRASA, a Deep Reinforcement Learning (DRL) agent that integrates fall recovery and stand up strategies into a unified framework. Leveraging the Cross-Q algorithm, FRASA significantly reduces training time and offers a versatile recovery strategy that adapts to unpredictable disturbances. Comparative tests on Sigmaban humanoid robots demonstrate FRASA's superior performance against the KFB method deployed by the Rhoban Team, world champions of the KidSize League, at RoboCup 2023.
|
|
17:05-17:10, Paper ThET5.7 | |
DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains |
|
Lee, Seunghyun | KAIST (Korea Advanced Institute of Science and Technology) |
Nahrendra, I Made Aswin | KAIST |
Lee, Dongkyu | KAIST |
Yu, Byeongho | KAIST |
Oh, Minho | KAIST |
Lee, Hyeonwoo | KAIST (Korea Advanced Institute of Science and Technology) |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Keywords: Legged Robots, Reinforcement Learning, Robust/Adaptive Control
Abstract: Recent advances in quadrupedal robots have demonstrated impressive agility and the ability to traverse diverse terrains. However, hardware issues, such as motor overheating or joint locking, may occur during long-distance walking or traversing rough terrains and lead to locomotion failures. Although several studies have proposed fault-tolerant control methods for quadrupedal robots, there are still challenges in traversing unstructured terrains. In this paper, we propose DreamFLEX, a robust fault-tolerant locomotion controller that enables a quadrupedal robot to traverse complex environments even under joint failure conditions. DreamFLEX integrates an explicit failure estimation and modulation network that jointly estimates the robot's joint fault vector and utilizes this information to adapt the locomotion pattern to faulty conditions in real-time, enabling quadrupedal robots to maintain stability and performance in rough terrains. Experimental results demonstrate that DreamFLEX outperforms existing methods in both simulation and real-world scenarios, effectively managing hardware failures while maintaining robust locomotion performance.
|
|
17:10-17:15, Paper ThET5.8 | |
Curriculum-Based Reinforcement Learning for Quadrupedal Jumping: A Reference-Free Design |
|
Atanassov, Vassil | University of Oxford |
Ding, Jiatao | Delft University of Technology |
Kober, Jens | TU Delft |
Havoutis, Ioannis | University of Oxford |
Della Santina, Cosimo | TU Delft |
Keywords: Legged Robots, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Deep reinforcement learning (DRL) has emerged as a promising solution to mastering explosive and versatile quadrupedal jumping skills. However, current DRL-based frameworks usually rely on pre-existing reference trajectories obtained by capturing animal motions or transferring experience from existing controllers. This work aims to prove that learning dynamic jumping is possible without relying on imitating a reference trajectory by leveraging a curriculum design. Starting from a vertical in-place jump, we generalize the learned policy to forward and diagonal jumps and, finally, we learn to jump across obstacles. Conditioned on the desired landing location, orientation, and obstacle dimensions, the proposed approach yields a wide range of omnidirectional jumping motions in real-world experiments. In particular, we achieve a 90 cm forward jump, exceeding all previous records reported for similar robots in the existing literature. Additionally, the robot can reliably execute continuous jumping on soft grassy grounds, which is especially remarkable as such conditions were not included in the training stage.
|
|
ThET6 |
307 |
Perception for Manipulation 4 |
Regular Session |
Chair: Liu, Katherine | Toyota Research Institute |
Co-Chair: Gaidon, Adrien | Toyota Research Institute |
|
16:35-16:40, Paper ThET6.1 | |
OmniShape: Zero-Shot Multi-Hypothesis Shape and Pose Estimation in the Real World |
|
Liu, Katherine | Toyota Research Institute |
Zakharov, Sergey | Toyota Research Institute |
Chen, Dian | Toyota Research Institute |
Ikeda, Takuya | Woven by Toyota, Inc |
Shakhnarovich, Gregory | Toyota Technological Institute at Chicago |
Gaidon, Adrien | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: We would like to estimate the pose and full shape of an object from a single observation, without assuming a known 3D model or category. In this work, we propose OmniShape, the first method of its kind to enable probabilistic pose and shape estimation. OmniShape is based on the key insight that shape completion can be decoupled into two multi-modal distributions: one capturing how measurements project into a normalized object reference frame defined by the dataset and the other modelling a prior over object geometries represented as triplanar neural fields. By training separate conditional diffusion models for these two distributions, we enable sampling multiple hypotheses from the joint pose and shape distribution. OmniShape demonstrates compelling performance on challenging real-world datasets.
|
|
16:40-16:45, Paper ThET6.2 | |
Self-Supervised Learning of Reconstructing Deformable Linear Objects under Single-Frame Occluded View |
|
Wang, Song | Tsinghua University |
Shen, Guanghui | Tsinghua University |
Wu, Shirui | Tsinghua University |
Wu, Dan | Tsinghua University |
Keywords: Perception for Grasping and Manipulation, RGB-D Perception, Deep Learning for Visual Perception
Abstract: Deformable linear objects (DLOs), such as ropes, cables, and rods, are common in various scenarios, and accurate occlusion reconstruction of them is crucial for effective robotic manipulation. Previous studies on DLO reconstruction either rely on supervised learning, which is limited by the availability of labeled real-world data, or geometric approaches, which fail to capture global features and often struggle with occlusions and complex shapes. This paper presents a novel DLO occlusion reconstruction framework that integrates self-supervised point cloud completion with traditional techniques like clustering, sorting, and fitting to generate ordered key points. A memory module is proposed to enhance the self-supervised training process by consolidating prototype information, while DLO shape constraints are utilized to improve reconstruction accuracy. Experimental results on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art algorithms, particularly in scenarios involving complex occlusions and intricate self-intersections.
|
|
16:45-16:50, Paper ThET6.3 | |
PseudoTouch: Efficiently Imaging the Surface Feel of Objects for Robotic Manipulation |
|
Röfer, Adrian | University of Freiburg |
Heppert, Nick | University of Freiburg |
Ayad, Abdallah | University of Freiburg |
Chisari, Eugenio | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Representation Learning
Abstract: Tactile sensing is vital for human dexterous manipulation; however, it has not been widely used in robotics. Compact, low-cost sensing platforms can facilitate a change, but unlike their popular optical counterparts, they are difficult to deploy in high-fidelity tasks due to their low signal dimensionality and lack of a simulation model. To overcome these challenges, we introduce PseudoTouch, which links high-dimensional structural information to low-dimensional sensor signals. It does so by learning a low-dimensional visual-tactile embedding, wherein we encode a depth patch from which we decode the tactile signal. We collect and train PseudoTouch on a dataset comprising aligned tactile and visual data pairs obtained through random touching of eight basic geometric shapes. We demonstrate the utility of our trained PseudoTouch model in two downstream tasks: object recognition and grasp stability prediction. In the object recognition task, we evaluate the learned embedding's performance on a set of five basic geometric shapes and five household objects. Using PseudoTouch, we achieve an object recognition accuracy of 84% after just ten touches, surpassing a proprioception baseline. For the grasp stability task, we use ACRONYM labels to train and evaluate a grasp success predictor using PseudoTouch's predictions derived from virtual depth information. Our approach yields a 32% absolute improvement in accuracy compared to the baseline relying on partial point cloud data. We make the data, code, and trained models publicly available at https://pseudotouch.cs.uni-freiburg.de.
|
|
16:50-16:55, Paper ThET6.4 | |
Segment Any Repeated Object |
|
Liu, Yushi | University Tübingen |
Graf, Christian | Robert Bosch GmbH |
Spies, Markus | Bosch Center for Artificial Intelligence |
Keuper, Margret | University of Mannheim |
Keywords: Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: Understanding a scene in terms of objects and their properties is fundamental for various vision-based robotic applications, including item picking. To effectively clear a bin, a robot must comprehend objects as graspable entities, often without prior access to models of the target object. This study focuses on open world object segmentation with the additional requirement of assigning identical class labels for repeated instances of the same object. This capability enables item picking tasks with homogeneous bins, filtering out packaging material, and sorting tasks. We propose a novel pipeline for detecting repeated instances of identical objects, building on recent advancements in vision foundation models and exploring approaches for estimating object similarities based on feature embeddings or keypoint correspondence matching. Through a comprehensive experimental evaluation, we establish a new state-of-the-art on ARMBench repeated objects segmentation, a particularly challenging open problem in bin-picking robotics. Additionally, we demonstrate the real-world application of our method integrated into a robot picking cell to showcase its relevance to industrial use cases.
|
|
16:55-17:00, Paper ThET6.5 | |
ViTa-Zero: Zero-Shot Visuotactile Object 6D Pose Estimation |
|
Li, Hongyu | Brown University |
Akl, James | Amazon |
Sridhar, Srinath | Brown University |
Brady, Tye | Amazon |
Padir, Taskin | Northeastern University |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Sensor Fusion
Abstract: Object 6D pose estimation is a critical challenge in robotics, particularly for manipulation tasks. While prior research combining visual and tactile (visuotactile) information has shown promise, these approaches often struggle with generalization due to the limited availability of visuotactile data. In this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation framework. Our key innovation lies in leveraging a visual model as its backbone and performing feasibility checking and test-time optimization based on physical constraints derived from tactile and proprioceptive observations. Specifically, we model the gripper-object interaction as a spring–mass system, where tactile sensors induce attractive forces, and proprioception generates repulsive forces. We validate our framework through experiments on a real-world robot setup, demonstrating its effectiveness across representative visual backbones and manipulation scenarios, including grasping, object picking, and bimanual handover. Compared to the visual models, our approach overcomes some drastic failure modes while tracking the in-hand object pose. In our experiments, our approach shows an average increase of 55% in AUC of ADD-S and 60% in ADD, along with an 80% lower position error compared to FoundationPose.
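As a toy picture of the spring-mass reasoning above (attractive forces from tactile contacts, repulsive forces from proprioception), the sketch below refines a candidate object position under both force types. The spherical object proxy, gains, and names are assumptions made for illustration; this is not the paper's implementation.

import numpy as np

def attractive_force(obj_center, contact_points, k_att=5.0):
    # springs pulling the object hypothesis toward each tactile contact point
    return k_att * (contact_points - obj_center).sum(axis=0)

def repulsive_force(obj_center, finger_points, obj_radius, k_rep=20.0):
    # fingertip positions (from proprioception) push the hypothesis out of penetration
    f = np.zeros(3)
    for p in finger_points:
        d = obj_center - p
        dist = np.linalg.norm(d)
        if 1e-9 < dist < obj_radius:
            f += k_rep * (obj_radius - dist) * d / dist
    return f

def refine(obj_center, contact_points, finger_points, obj_radius, steps=50, dt=0.01):
    x = obj_center.copy()
    for _ in range(steps):
        total = attractive_force(x, contact_points) + repulsive_force(x, finger_points, obj_radius)
        x = x + dt * total              # overdamped update of the pose hypothesis
    return x

# usage with made-up geometry: two contacts and two fingertip points around a 3 cm object
contacts = np.array([[0.02, 0.0, 0.0], [-0.02, 0.0, 0.0]])
fingers = np.array([[0.025, 0.0, 0.0], [-0.025, 0.0, 0.0]])
print(refine(np.array([0.0, 0.01, 0.0]), contacts, fingers, obj_radius=0.03))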
|
|
17:00-17:05, Paper ThET6.6 | |
DoorBot: Closed-Loop Task Planning and Manipulation for Door Opening in the Wild with Haptic Feedback |
|
Wang, Zhi | UIUC |
Mo, Yuchen | University of Illinois, Urbana-Champaign |
Jin, Shengmiao | University of Illinois Urbana-Champaign |
Yuan, Wenzhen | University of Illinois |
Keywords: Force and Tactile Sensing, Mobile Manipulation, Perception for Grasping and Manipulation
Abstract: Robots operating in unstructured environments face significant challenges when interacting with everyday objects like doors. They particularly struggle to generalize across diverse door types and conditions. Existing vision-based and open-loop planning methods often lack the robustness to handle varying door designs, mechanisms, and push/pull configurations. In this work, we propose a haptic-aware closed-loop hierarchical control framework that enables robots to explore and open different unseen doors in the wild. Our approach leverages real-time haptic feedback, allowing the robot to adjust its strategy dynamically based on force feedback during manipulation. We test our system on 20 unseen doors across different buildings, featuring diverse appearances and mechanical types. Our framework achieves a 90% success rate, demonstrating its ability to generalize and robustly handle varied door-opening tasks. This scalable solution offers potential applications in broader open-world articulated object manipulation tasks.
|
|
17:05-17:10, Paper ThET6.7 | |
SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-To-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery |
|
Xu, Jialang | University College London |
Sirajudeen, Nazir | University College London |
Boal, Matthew | The Griffin Institute |
Francis, Nader | The Griffin Institute |
Stoyanov, Danail | University College London |
Mazomenos, Evangelos | UCL |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Laparoscopy, Visual Learning
Abstract: Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying duration. Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases. Specifically, we deploy the clinically validated observational clinical human reliability assessment tool (OCHRA) to annotate the errors during suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50). Experimental results demonstrate that our SEDMamba outperforms state-of-the-art methods with at least 1.82% AUC and 3.80% AP performance gains with significantly reduced computational complexity. The corresponding error annotations, code and models will be released at https://github.com/wzjialang/SEDMamba.
|
|
ThET7 |
309 |
Deep Learning Applications |
Regular Session |
Chair: Ostyn, Frederik | Ghent University |
Co-Chair: Attali, Amnon | University of Illinois at Urbana-Champaign |
|
16:35-16:40, Paper ThET7.1 | |
Automated Generation of Transformations to Mitigate Sensor Hardware Migration in ADS |
|
Von Stein, Meriel | University of Virginia |
Elbaum, Sebastian | University of Virginia |
Wang, Hongning | University of Virginia |
Keywords: Sensor-based Control, Deep Learning Methods, Autonomous Vehicle Navigation
Abstract: Autonomous driving systems (ADSs) rely on massive amounts of sensed data to train their underlying machine-learned components. Common sensor hardware migrations can render an existing machine-learned pipeline inadequate. This necessitates the development of bespoke transformations to adapt new sensor data to the old learned model, or the retraining of a new model with new sensor data. These solutions are expensive, often performed reactively to sensor hardware migration, and rely only on empirical reconstruction and validation metrics, which lack knowledge of the features important to the learned model. To address these challenges, we propose PreFixer, a technique that can systematically generate transformations for many types of sensor hardware migration during the ADS development lifecycle. PreFixer collects small datasets using colocated new and old sensors, and then uses that data and the output of the learned model to train an augmented encoder to learn a transformation that maps new sensor data to old sensor data. The trained encoder can then be deployed as a preprocessor to the old learned model. Our study shows that, for a common set of camera sensor hardware migrations, PreFixer can match or improve the performance of the best-performing baseline technique in terms of distance travelled safely, using only 10% of the training dataset and taking at most half of the training time.
|
|
16:40-16:45, Paper ThET7.2 | |
Probabilistic Latent Variable Modeling for Dynamic Friction Identification and Estimation |
|
Vantilborgh, Victor | Ghent University |
De Witte, Sander | Ghent University |
Ostyn, Frederik | Ghent University |
Lefebvre, Tom | Ghent University |
Crevecoeur, Guillaume | Ghent University |
Keywords: Industrial Robots, Deep Learning Methods, Probabilistic Inference
Abstract: Precise identification of dynamic models in robotics is essential to support dynamic simulations, control design, friction compensation, output torque estimation, etc. A longstanding challenge remains in the development and identification of friction models for robotic joints, given the numerous physical phenomena affecting the underlying friction dynamics, which result in nonlinear characteristics and hysteresis behaviour in particular. These phenomena prove difficult to model and capture accurately using physical analogies alone. This has motivated researchers to shift from physics-based to data-driven models. Currently, these methods are still limited in their ability to generalize effectively to typical industrial robot deployment, characterized by high- and low-velocity operations and frequent direction reversals. Empirical observations motivate the use of dynamic friction models, but these remain particularly challenging to establish. To address the current limitations, we propose to account for unidentified dynamics in the robot joints using latent dynamic states. The friction model may then utilize both the dynamic robot state and additional information encoded in the latent state to evaluate the friction torque. We cast this stochastic and partially unsupervised identification problem as a standard probabilistic representation learning problem. In this work, both the friction model and latent state dynamics are parametrized as neural networks and are integrated into the conventional lumped-parameter dynamic robot model. The complete dynamics model is directly learned from the noisy encoder measurements in the robot joints. We use the Expectation-Maximisation (EM) algorithm to find a Maximum Likelihood Estimate (MLE) of the model parameters. The effectiveness of the proposed method is validated in terms of open-loop prediction accuracy in comparison with baseline methods, using the Kuka KR6 R700 as a test platform.
|
|
16:45-16:50, Paper ThET7.3 | |
Learning Three-Dimensional Bin Packing with Adjustable-Order Semi-Online Setting |
|
Yin, Hao | Southwest Jiaotong University |
Zhang, Chenxi | Southwest Jiaotong University |
Chen, Fan | Southwest Jiaotong University |
He, Hongjie | Southwest Jiaotong University |
Keywords: Reinforcement Learning, Deep Learning Methods, Industrial Robots
Abstract: The online setting brings greater flexibility and practicality to the three-dimensional bin packing problem (3D-BPP) but at the cost of algorithm performance. Existing methods mitigate the performance impact by introducing semi-online settings with look-ahead or buffer zones. However, these methods either fail to fundamentally alter the packing order or reduce packing efficiency. This paper proposes a novel semi-online setting that allows for the observation of multiple items and the selection of one for packing, thereby adjusting the packing order without reducing packing efficiency. We solve the semi-online packing problem via reinforcement learning, which faces two real-world challenges: (1) a variable and difficult-to-predict number of observed items, and (2) the obstruction of robotic arm movement by already packed items. On the one hand, we design a policy network capable of adapting to variable item quantities. On the other hand, we introduce a guided bottom-up packing reward function to free up space for robotic arm motion. We show that our method outperforms the baselines in terms of space utilization when at least two items are observed. Further experiments demonstrate the functionality of our reward function, which can guide a virtual robot to complete packing tasks.
|
|
16:50-16:55, Paper ThET7.4 | |
Multiple Rotation Averaging with Constrained Reweighting Deep Matrix Factorization |
|
Li, Shiqi | Xi'an Jiaotong University |
Zhu, Jihua | Xi'an Jiaotong University |
Xie, Yifan | Xi'an Jiaotong University |
Hu, Naiwen | Xi'an Jiaotong University |
Zhu, Mingchen | University of California, Davis |
Li, Zhongyu | Xi'an Jiaotong University |
Wang, Di | Xi'an Jiaotong University |
Lu, Huimin | Southeast University |
Keywords: SLAM, Deep Learning for Visual Perception
Abstract: Multiple rotation averaging plays a crucial role in computer vision and robotics domains. The conventional optimization-based methods optimize a nonlinear cost function based on certain noise assumptions, while most previous learning-based methods require ground truth labels in the supervised training process. Recognizing that the handcrafted noise assumption may not be reasonable in all real-world scenarios, this paper proposes an effective rotation averaging method for mining data patterns in a learning manner while avoiding the requirement of labels. Specifically, we apply deep matrix factorization to directly solve the multiple rotation averaging problem in free linear space. For deep matrix factorization, we design a neural network model, which is explicitly low-rank and symmetric to better suit the background of multiple rotation averaging. Meanwhile, we utilize a spanning tree-based edge filtering to suppress the influence of rotation outliers. Moreover, we adopt a reweighting scheme and a dynamic depth selection strategy to further improve the robustness. Our method synthesizes the merits of both optimization-based and learning-based methods. Experimental results on various datasets validate the effectiveness of our proposed method.
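The "explicitly low-rank and symmetric" design mentioned above reflects a classical structure of rotation averaging: stacking relative rotations G_ij = R_i R_j^T yields a symmetric positive semidefinite matrix of rank 3 whose leading eigenvectors recover the absolute rotations up to one shared global rotation. The sketch below demonstrates that structure with a classical spectral factorization on noise-free synthetic data; it is not the authors' deep matrix factorization, edge filtering, or reweighting pipeline.

import numpy as np

def random_rotation(rng):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def project_so3(m):
    # nearest rotation matrix (cleans up numerical noise)
    u, _, vt = np.linalg.svd(m)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1
    return u @ vt

rng = np.random.default_rng(0)
n = 6
R_true = [random_rotation(rng) for _ in range(n)]

# symmetric block matrix of relative rotations G_ij = R_i R_j^T (rank 3, PSD)
G = np.zeros((3 * n, 3 * n))
for i in range(n):
    for j in range(n):
        G[3 * i:3 * i + 3, 3 * j:3 * j + 3] = R_true[i] @ R_true[j].T

w, v = np.linalg.eigh(G)                 # eigenvalues in ascending order
F = v[:, -3:] * np.sqrt(w[-3:])          # rank-3 factor with F @ F.T == G
if np.linalg.det(F[:3, :]) < 0:          # fix the reflection ambiguity of the factor
    F[:, 0] *= -1
R_est = [project_so3(F[3 * i:3 * i + 3, :]) for i in range(n)]

# all estimates match the ground truth up to one shared global rotation Q
Q = R_true[0].T @ R_est[0]
print(max(np.linalg.norm(R_true[i] @ Q - R_est[i]) for i in range(n)))   # close to zero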
|
|
16:55-17:00, Paper ThET7.5 | |
Magnetometer-Calibrated Hybrid Transformer for Robust Inertial Tracking in Robotics |
|
Zheng, Xinzhe | The University of Hong Kong |
Ji, Sijie | California Institute of Technology |
Pan, Yipeng | The University of Hong Kong |
Zhang, Kaiwen | The University of Hong Kong |
Pan, Jia | University of Hong Kong |
Wu, Chenshu | The University of Hong Kong |
Keywords: Localization, Deep Learning Methods
Abstract: Inertial tracking is vital for autonomous robots and has gained popularity with the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To bridge the gap, we introduce NeurIT, which employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining RNN and Transformer to learn representative time-frequency features. To fully utilize IMU information, we strategically employ differentiation of body-frame magnetometer measurements for orientation calibration in a sensor-fusion manner. Experiments conducted in diverse environments show that NeurIT maintains a mere 1-meter tracking error over a 300-meter distance, surpassing state-of-the-art baselines by 48.21% on unseen data. NeurIT also performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions and surpasses it in plain environments. We share the code and data to promote further research: https://github.com/aiot-lab/NeurIT.
|
|
17:00-17:05, Paper ThET7.6 | |
MotionGlot: A Multi-Embodied Motion Generation Model |
|
Harithas, Sudarshan S | Brown University |
Sridhar, Srinath | Brown University |
Keywords: AI-Enabled Robotics, AI-Based Methods, Representation Learning
Abstract: This paper introduces MotionGlot, a model that can generate motion across multiple embodiments with different action dimensions, such as quadruped robots and human bodies. By leveraging the well-established training procedures commonly used in large language models (LLMs), we introduce an instruction-tuning template specifically designed for motion related tasks. Our approach demonstrates that the principles underlying LLM training can be successfully adapted to learn a wide range of motion generation tasks across multiple embodiments with different action dimensions. We demonstrate the various abilities of MotionGlot on a set of 6 tasks and report an average improvement of 35.3% across tasks. Additionally, we contribute two new datasets: (1) a dataset of expert controlled quadruped locomotion with approximately 48,000 trajectories paired with direction-based text annotations, and (2) a dataset of over 23,000 situational text prompts for human motion generation tasks. Finally, we conduct hardware experiments to validate the capabilities of our system in real-world applications.
|
|
17:05-17:10, Paper ThET7.7 | |
Retinex-BEVFormer: Using Retinex to Enhance Multi-View Image-Based BEV Detector in Low Light Scenes |
|
Liu, Xuan | Beihang University |
Xiong, Zhongxia | Beihang University |
Yao, Ziying | Beihang University |
Wu, Xinkai | Beihang University |
Keywords: Intelligent Transportation Systems, Deep Learning for Visual Perception
Abstract: Multi-view image-based BEV (Bird's Eye View) 3D perception is gaining attention as an alternative to high-cost LiDAR systems and has achieved notable success. However, image-based BEV autonomous driving raises a significant safety concern in low-light conditions (such as nighttime), yet research on BEV detectors for these scenes remains limited. In this paper, we attempt to enhance low-light BEV perception with illumination-guided feature fusion. We propose Retinex-BEVFormer, which uses illumination information generated by the Retinex theory to enhance the model's robustness to varying lighting conditions and improve detection performance in low-light scenes. Additionally, to address the illumination estimation discontinuity from multi-view images that can adversely affect detection, we propose the MVB-Retinex module, which balances illumination estimation by leveraging overlapping regions between adjacent images. Notably, our proposed method is a plug-and-play module that can be applied to any image-based BEV detector and does not require any additional ground truth supervision. We conduct extensive experiments on the Nuscenes dataset, validating our algorithm in nighttime and daytime scenes. Compared to the baseline, our algorithm achieves a 2.9% increase in mAP on the validation set with minimal computational cost, especially showing a 3.6% improvement in nighttime scenes. The experiments demonstrate that our Retinex-BEVFormer effectively improves detection performance in low-light conditions and enhances performance under normal illumination, indicating increased robustness of the BEV detector.
|
|
ThET8 |
311 |
Collision Avoidance 2 |
Regular Session |
Chair: Figueredo, Luis | University of Nottingham (UoN) |
Co-Chair: Bylard, Andrew | Stanford University |
|
16:35-16:40, Paper ThET8.1 | |
Reactive Collision Avoidance for Safe Agile Navigation |
|
Saviolo, Alessandro | New York University |
Picello, Niko | University of Padova |
Mao, Jeffrey | New York University |
Verma, Rishabh | New York University |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Reactive collision avoidance is essential for agile robots navigating complex and dynamic environments, enabling real-time obstacle response. However, this task is inherently challenging because it requires a tight integration of perception, planning, and control, which traditional methods often handle separately, resulting in compounded errors and delays. This paper introduces a novel approach that unifies these tasks into a single reactive framework using solely onboard sensing and computing. Our method combines nonlinear model predictive control with adaptive control barrier functions, directly linking perception-driven constraints to real-time planning and control. Constraints are determined by using a neural network to refine noisy RGB-D data, enhancing depth accuracy, and selecting points with the minimum time-to-collision to prioritize the most immediate threats. To maintain a balance between safety and agility, a heuristic dynamically adjusts the optimization process, preventing overconstraints in real time. Extensive experiments with an agile quadrotor demonstrate effective collision avoidance across diverse indoor and outdoor environments, without requiring environment-specific tuning or explicit mapping.
|
|
16:40-16:45, Paper ThET8.2 | |
Hardware-Accelerated Ray Tracing for Discrete and Continuous Collision Detection on GPUs |
|
Sui, Sizhe | University of Texas, Austin |
Sentis, Luis | The University of Texas at Austin |
Bylard, Andrew | Stanford University |
Keywords: Collision Avoidance, Computational Geometry, Motion and Path Planning
Abstract: This paper presents a set of simple and intuitive robot collision detection algorithms that show substantial scaling improvements for high geometric complexity and large numbers of collision queries by leveraging hardware-accelerated ray tracing on GPUs. It is the first to leverage hardware-accelerated ray tracing for direct volume mesh-to-mesh discrete collision detection and to apply it to continuous collision detection. We introduce two methods: Ray-Traced Discrete-Pose Collision Detection for exact robot mesh to obstacle mesh collision detection, and Ray-Traced Continuous Collision Detection for robot sphere representation to obstacle mesh swept collision detection, using piecewise-linear or quadratic B-splines. For robot link meshes totaling 24k triangles and obstacle meshes of over 190k triangles, our methods were up to 2.8 times faster in batched discrete-pose queries than a state-of-the-art GPU-based method using a sphere robot representation. For the same obstacle mesh scene, our sphere-robot continuous collision detection was up to 7 times faster depending on trajectory batch size. We also performed detailed measurements of the volume coverage accuracy of various sphere/mesh pose/path representations to provide insight into the tradeoffs between speed and accuracy of different robot collision detection methods.
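The primitive behind both queries above is a ray/triangle intersection test, which the paper offloads to GPU ray-tracing hardware. The CPU sketch below shows the Moller-Trumbore test and uses it for a crossing-parity point-in-mesh check on a toy tetrahedron; it is purely illustrative and not the authors' GPU implementation.

import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    # Moller-Trumbore ray/triangle intersection; returns hit distance t or None
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None                      # ray parallel to the triangle plane
    inv = 1.0 / det
    s = origin - v0
    u = (s @ p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = (direction @ q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = (e2 @ q) * inv
    return t if t > eps else None

def point_inside_mesh(point, triangles, direction=np.array([1.0, 0.0, 0.0])):
    # a point is inside a closed mesh iff a ray from it crosses the surface an odd number of times
    hits = sum(1 for (a, b, c) in triangles
               if ray_triangle(point, direction, a, b, c) is not None)
    return hits % 2 == 1

# example: unit tetrahedron with vertices at the origin and the three axis points
V0 = np.zeros(3)
V1, V2, V3 = np.eye(3)
faces = [(V0, V1, V2), (V0, V1, V3), (V0, V2, V3), (V1, V2, V3)]
print(point_inside_mesh(np.array([0.1, 0.1, 0.1]), faces))   # True  (inside)
print(point_inside_mesh(np.array([0.9, 0.9, 0.9]), faces))   # False (outside)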
|
|
16:45-16:50, Paper ThET8.3 | |
Collision Avoidance in Model Predictive Control Using Velocity Damper |
|
Haffemayer, Arthur | LAAS-CNRS |
Jordana, Armand | New York University |
De Matteïs, Ludovic | LAAS-CNRS |
Wojciechowski, Krzysztof | LAAS-CNRS |
Righetti, Ludovic | New York University |
Lamiraux, Florent | CNRS |
Mansard, Nicolas | CNRS |
Keywords: Collision Avoidance, Optimization and Optimal Control
Abstract: We propose an advanced method for controlling the motion of a manipulator robot with strict collision avoidance in dynamic environments, leveraging a velocity damper constraint. Unlike conventional distance-based constraints, which tend to saturate near obstacles to reach optimality, the velocity damper constraint considers both distance and relative velocity, ensuring a safer separation. This constraint is incorporated into a model predictive control framework and enforced as a hard constraint through analytical derivatives supplied to the numerical solver. The approach has been fully implemented on a Franka Emika Panda robot and validated through experimental trials, demonstrating effective collision avoidance during dynamic tasks and robustness to unmodeled disturbances. An efficient open-source implementation, along with examples, is provided at https://gepettoweb.laas.fr/articles/haffemayer2025.html.
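For readers unfamiliar with the constraint, the classical velocity damper of Faverjon and Tournassoud is sketched below; the exact form used inside the paper's MPC may differ. Rather than only requiring the obstacle distance d to stay above a safety margin d_s, it bounds the approach speed once d drops below an influence distance d_i, so the allowed closing velocity shrinks to zero as d approaches d_s.

def velocity_damper_ok(d, d_dot, d_s=0.05, d_i=0.20, xi=1.0):
    # enforce d_dot >= -xi * (d - d_s) / (d_i - d_s) whenever d <= d_i
    if d > d_i:
        return True                    # outside the influence zone: no restriction
    return d_dot >= -xi * (d - d_s) / (d_i - d_s)

# example: at d = 0.10 m the robot may close the gap at most at xi * (0.05 / 0.15) = 0.33 m/s
print(velocity_damper_ok(0.10, -0.30))    # True
print(velocity_damper_ok(0.10, -0.40))    # False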
|
|
16:50-16:55, Paper ThET8.4 | |
On the Synthesis of Reactive Collision-Free Whole-Body Robot Motions: A Complementarity-Based Approach |
|
Yao, Haowen | Technical Univerity of Munich |
Laha, Riddhiman | Technical University of Munich |
Sinha, Anirban | GE Aerospace Research |
Hall, Jonas | Boston University |
Figueredo, Luis | University of Nottingham (UoN) |
Chakraborty, Nilanjan | Stony Brook University |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Optimization and Optimal Control, Whole-Body Motion Planning and Control, Reactive and Sensor-Based Planning
Abstract: This paper is about generating motion plans for high degree-of-freedom systems that account for both static and dynamic collisions along the entire body. A particular class of mathematical programs with complementarity constraints becomes useful in this regard. Optimization-based planners can tackle confined-space trajectory planning while being cognizant of robot and (mostly static) obstacle constraints. However, handling moving obstacles is non-trivial in a real-time setting. To this end, we present the FLIQC (Fast LInear Quadratic Complementarity based) motion planner. Our reactive planner employs a novel motion model that captures the entire rigid robot as well as the obstacle geometry and ensures non-penetration between the surfaces due to the imposed constraint. We perform thorough comparative studies with the state-of-the-art, which demonstrate improved performance. Extensive simulation and hardware experiments validate our claim of generating continuous and real-time motion plans at 1 kHz for modern collaborative robots with constant minimal parameters.
|
|
16:55-17:00, Paper ThET8.5 | |
Rapid Dynamic Obstacle Avoidance for UAVs Enhanced by DVS and Neuromorphic Computing |
|
Wang, Siyang | Xi'an Jiaotong University |
Yu, Sheng | Xi'an Jiaotong University |
Liang, Tingbang | Xi'an Jiaotong University |
Shi, Yilin | Xi’an Jiaotong University |
Ma, Yongqiang | Xi'an Jiaotong University |
Ren, Pengju | Xi'an Jiaotong University |
Keywords: Collision Avoidance, Aerial Systems: Applications, Force Control
Abstract: Achieving rapid and accurate dynamic obstacle avoidance is crucial for enhancing the survivability of unmanned aerial vehicles (UAVs) in hazardous conditions. To accomplish dynamic obstacle avoidance, sensors with high temporal resolution and efficient processing models are required. Dynamic vision sensors (DVS) fulfill the sensing requirements, while spiking neural networks (SNNs) address the processing demands. In this paper, we develop an end-to-end obstacle avoidance algorithm for UAVs using only a single monocular DVS as the sensor and further enhance accuracy and speed through our proposed mechanisms. The algorithm consists of three components: ego-motion compensation, an SNN model for movement analysis, and a force filter inspired by spiking neurons. In movement analysis, we propose the temporal potential pooling (TPP) and incremental event (EI) mechanisms to accelerate our SNN model. The real-flight experiments confirm that our algorithm achieves approximately 90% accuracy with a processing latency as low as 4ms on a GPU, surpassing state-of-the-art methods. Ablation studies show that the proposed method maintains high accuracy in movement detection while significantly reducing computational time. Our method operates in real-time, achieves high accuracy, and is feasible across a wide range of environments. Our code is available at https://github.com/AmperiaWang/oanet_s1 for reproducibility.
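The "force filter inspired by spiking neurons" mentioned above can be pictured with a leaky integrate-and-fire unit per steering bin, as in the toy sketch below. Everything here (bin layout, leak, threshold, 2-D repulsion) is an assumption made for illustration; it is not the paper's network or its TPP/EI mechanisms.

import numpy as np

N_DIR = 8                          # discretized bearing bins around the UAV (assumed)
v = np.zeros(N_DIR)                # membrane potentials
TAU, THRESH = 0.9, 1.0             # leak factor per step and firing threshold (assumed)

def step(evidence):
    # evidence: per-bin obstacle activation extracted from the event stream this cycle
    global v
    v = TAU * v + evidence         # leaky integration suppresses isolated noise events
    fired = v >= THRESH
    v[fired] = 0.0                 # reset units that fired
    bearings = np.linspace(0.0, 2.0 * np.pi, N_DIR, endpoint=False)
    # sum unit repulsion vectors pointing away from every bin that fired
    return -np.array([np.cos(bearings[fired]).sum(), np.sin(bearings[fired]).sum()])

# usage: feed per-bin event counts each control cycle; persistent obstacles accumulate
# potential and trigger repulsive commands, while sporadic noise decays away.
rng = np.random.default_rng(1)
force = np.zeros(2)
for _ in range(10):
    force = step(rng.poisson(0.3, N_DIR).astype(float))
print(force)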
|
|
17:00-17:05, Paper ThET8.6 | |
Efficient Collision Detection Framework for Enhancing Collision-Free Robot Motion |
|
Zhu, Xiankun | Tsinghua University |
Xin, Yucheng | Tsinghua University |
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Xia, Chongkun | Sun Yat-Sen University |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Collision Avoidance, Integrated Planning and Learning, Reactive and Sensor-Based Planning
Abstract: Fast and efficient collision detection is essential for motion generation in robotics. In this paper, we propose an efficient collision detection framework based on the Signed Distance Field (SDF) of robots, seamlessly integrated with a self-collision detection module. Firstly, we decompose the robot's SDF using forward kinematics and leverage multiple extremely lightweight networks in parallel to efficiently approximate the SDF. Moreover, we introduce support vector machines to integrate the self-collision detection module into the framework, which we refer to as the SDF-SC framework. Using statistical features, our approach unifies the representation of collision distance for both SDF and self-collision detection. During this process, we maintain and utilize the differentiable properties of the framework to optimize collision-free robot trajectories. Finally, we develop a reactive motion controller based on our framework, enabling real-time avoidance of multiple dynamic obstacles. While maintaining high accuracy, our framework achieves inference speeds up to five times faster than previous methods. Experimental results on the Franka robotic arm demonstrate the effectiveness of our approach.
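To make the idea of a differentiable learned SDF concrete, the sketch below queries a tiny, untrained, randomly initialized MLP standing in for one of the lightweight per-link networks and turns a finite-difference gradient of the minimum clearance into a repulsive direction. Sizes, names, and the finite-difference gradient are assumptions; the paper's networks are trained, and its gradients come from the differentiable framework itself.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.normal(size=(32, 3)), np.zeros(32)
W2, b2 = 0.5 * rng.normal(size=(1, 32)), np.zeros(1)

def link_sdf(p_local):
    # approximate signed distance from a point (in the link frame) to the link surface
    return float(W2 @ np.tanh(W1 @ p_local + b1) + b2)

def clearance(link_center, obstacle_points):
    # minimum predicted distance from any obstacle point to the link
    return min(link_sdf(p - link_center) for p in obstacle_points)

def repulsive_direction(link_center, obstacle_points, eps=1e-3):
    # numerical gradient of the clearance w.r.t. the link position
    g = np.zeros(3)
    for k in range(3):
        d = np.zeros(3); d[k] = eps
        g[k] = (clearance(link_center + d, obstacle_points)
                - clearance(link_center - d, obstacle_points)) / (2 * eps)
    return g                       # moving the link along +g increases predicted clearance

obstacles = rng.normal(size=(50, 3))
print(repulsive_direction(np.zeros(3), obstacles))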
|
|
17:05-17:10, Paper ThET8.7 | |
Differentiable Composite Neural Signed Distance Fields for Robot Navigation in Dynamic Indoor Environments |
|
Bukhari, Syed Talha | Purdue University |
Lawson, Daniel | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Vision-Based Navigation, RGB-D Perception
Abstract: Neural Signed Distance Fields (SDFs) provide a differentiable environment representation to readily obtain collision checks and well-defined gradients for robot navigation tasks. However, updating neural SDFs as the scene evolves entails re-training, which is tedious, time-consuming, and inefficient, making it unsuitable for robot navigation with limited field-of-view in dynamic environments. To address this, we propose a compositional framework of neural SDFs to solve robot navigation in indoor environments using only an onboard RGB-D sensor. Our framework embodies a dual-mode procedure for trajectory optimization, with different modes using complementary methods of modeling collision costs and collision avoidance gradients. The primary stage queries the robot body's SDF, swept along the route to goal, at the obstacle point cloud, enabling swift local optimization of trajectories. The secondary stage infers the visible scene's SDF by aligning and composing the SDF representations of its constituents, providing better-informed costs and gradients for trajectory optimization. The dual-mode procedure combines the best of both stages, achieving a success rate of 98%, 14.4% higher than the baseline with comparable amortized plan time on iGibson 2.0. We also demonstrate its effectiveness in adapting to real-world indoor scenarios.
|
|
17:10-17:15, Paper ThET8.8 | |
On the Evaluation of Collision Probability Along a Path |
|
Paiola, Lorenzo | Istituto Italiano Di Tecnologia |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Keywords: Risk, Collision Avoidance, Probability and Statistical Methods, Robot Safety
Abstract: Characterizing the risk of operations is a fundamental requirement in robotics, and a crucial ingredient of safe planning. The problem is multifaceted, with multiple definitions arising in the vast recent literature fitting different application scenarios and leading to different computational approaches. A basic element shared by most frameworks is the definition and evaluation of the probability of collision for a mobile object in an environment with obstacles. We observe that, even in basic cases, different interpretations are possible. This paper proposes an index we call Risk Density, which offers a theoretical link between conceptually distant assumptions about the interplay of single collision events along a continuous path. We show how this index can be used to approximate the collision probability in the case where the robot evolves along a nominal continuous curve from random initial conditions. Indeed, under this hypothesis, the proposed approximation outperforms some well-established methods in either accuracy or computational cost.
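The abstract contrasts different assumptions about how single collision events along a path combine. The hedged toy numbers below illustrate three standard aggregation rules (independent events, a union bound, and a survival-function form driven by a per-unit-length collision density); the Risk Density index itself is defined in the paper and is not reproduced here, and all values are made up.

import numpy as np

p = np.array([0.01, 0.02, 0.015, 0.03, 0.005])   # per-segment collision probabilities (made up)
ds = 0.1                                          # segment length along the path [m]

# (a) segments treated as independent collision events
p_independent = 1.0 - np.prod(1.0 - p)

# (b) union (Boole) bound: an upper bound on the probability of any collision
p_union_bound = min(1.0, p.sum())

# (c) survival-function form using a collision density rho = p / ds along the path
rho = p / ds
p_survival = 1.0 - np.exp(-np.sum(rho * ds))

print(p_independent, p_union_bound, p_survival)   # approx. 0.078, 0.080, 0.077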
|
|
ThET9 |
312 |
Task and Motion Planning 4 |
Regular Session |
Chair: Bera, Aniket | Purdue University |
Co-Chair: Shkurti, Florian | University of Toronto |
|
16:35-16:40, Paper ThET9.1 | |
Fast and Accurate Task Planning Using Neuro-Symbolic Language Models and Multi-Level Goal Decomposition |
|
Kwon, Minseo | Ewha Womans University |
Kim, Yaesol | Istituto Italiano Di Tecnologia |
Kim, Young J. | Ewha Womans University |
Keywords: Task Planning, Task and Motion Planning
Abstract: In robotic task planning, symbolic planners using rule-based representations like PDDL are effective but struggle with long-sequential tasks in complicated environments due to exponentially increasing search space. Meanwhile, LLM-based approaches, which are grounded in artificial neural networks, offer faster inference and commonsense reasoning but suffer from lower success rates. To address the limitations of the current symbolic (slow speed) or LLM-based approaches (low accuracy), we propose a novel neuro-symbolic task planner that decomposes complex tasks into subgoals using LLM and carries out task planning for each subgoal using either symbolic or MCTS-based LLM planners, depending on the subgoal complexity. This decomposition reduces planning time and improves success rates by narrowing the search space and enabling LLMs to focus on more manageable tasks. Our method significantly reduces planning time while maintaining high success rates across task planning domains, as well as real-world and simulated robotics environments. More details are available at http://graphics.ewha.ac.kr/LLMTAMP/.
|
|
16:40-16:45, Paper ThET9.2 | |
OpenBench: A New Benchmark and Baseline for Semantic Navigation in Smart Logistics |
|
Wang, Junhui | Macau University of Science and Technology |
Huo, Dongjie | Beijing University of Chemical Technology |
Xu, ZeHui | Harbin Institute of Technology |
Shi, Yongliang | Tsinghua University |
Yan, Yimin | University of Chinese Academy of Sciences |
Wang, Yuanxin | Beijing Institute of Technology |
Gao, Chao | University of Cambridge |
Qiao, Yan | Macau University of Science and Technology |
Zhou, Guyue | Tsinghua University |
Keywords: Autonomous Vehicle Navigation, Task and Motion Planning, Engineering for Robotic Systems
Abstract: The increasing demand for efficient last-mile delivery in smart logistics underscores the role of autonomous robots in enhancing operational efficiency and reducing costs. Traditional navigation methods, which depend on high-precision maps, are resource-intensive, while learning-based approaches often struggle with generalization in real-world scenarios. To address these challenges, this work proposes the Openstreetmap-enhanced oPen-air sEmantic Navigation (OPEN) system that combines foundation models with classic algorithms for scalable outdoor navigation. The system leverages OpenStreetMap (OSM) for flexible map representation, thereby eliminating the need for extensive pre-mapping efforts. It also employs Large Language Models (LLMs) to comprehend delivery instructions and Vision-Language Models (VLMs) for global localization, map updates, and house number recognition. To compensate for the limitations of existing benchmarks, which are inadequate for assessing last-mile delivery, this work introduces a new benchmark specifically designed for outdoor navigation in residential areas, reflecting the real-world challenges faced by autonomous delivery systems. Extensive experiments validate the effectiveness of the proposed system in enhancing navigation efficiency and reliability. To facilitate further research, our code and benchmark are publicly available.
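As a flavor of how OSM can replace a pre-built metric map for global routing, the sketch below builds a tiny OSM-style graph with networkx and extracts a shortest route that a local, vision-based navigator would then follow. Node IDs, coordinates, and the delivery address are hypothetical; this is not the OPEN system's code.

import networkx as nx

G = nx.Graph()
G.add_node("n1", lat=22.1586, lon=113.5659)
G.add_node("n2", lat=22.1590, lon=113.5667)
G.add_node("n3", lat=22.1595, lon=113.5663)
G.add_node("goal_house_42", lat=22.1598, lon=113.5670)   # hypothetical delivery address
G.add_edge("n1", "n2", length=85.0)                      # edge weights: path length in meters
G.add_edge("n2", "n3", length=60.0)
G.add_edge("n2", "goal_house_42", length=95.0)
G.add_edge("n3", "goal_house_42", length=55.0)

# global route that the robot then tracks with local, vision-based navigation
route = nx.shortest_path(G, "n1", "goal_house_42", weight="length")
print(route)          # ['n1', 'n2', 'goal_house_42']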
|
|
16:45-16:50, Paper ThET9.3 | |
KARMA: Augmenting Embodied AI Agents with Long-And-Short Term Memory Systems |
|
Wang, Zixuan | Institute of Automation, Chinese Academy of Sciences |
Yu, Bo | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Zhao, Junzhe | Alibaba |
Sun, Wenhao | Institute of Computing Technology, Chinese Academy of Sciences |
Hou, Sai | Beijing Institute of Technology |
Liang, Shuai | Institute of Computing Technology, Chinese Academy of Sciences ( |
Hu, Xing | Institute of Computing Technology, Chinese Academy of Sciences |
Han, Yinhe | Institute of Computing Technology, Chinese Academy of Sciences |
Gan, Yiming | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: AI-Based Methods, Task Planning, Motion and Path Planning
Abstract: Embodied agents tasked with executing interconnected, long-sequence household tasks often struggle with contextual memory, leading to inefficient task execution and errors. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules and enhances Large Language Models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory: long-term memory captures a comprehensive 3D scene graph as a representation of the environment, while short-term memory dynamically records changes in object positions and states. This dual-memory structure allows the agent to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. The short-term memory employs an effective and adaptive memory replacement strategy, ensuring the retention of critical information while discarding less relevant data.
|
|
16:50-16:55, Paper ThET9.4 | |
Socratic Planner: Self-QA-Based Zero-Shot Planning for Embodied Instruction Following |
|
Shin, Suyeon | Seoul National University |
Jeon, Sujin | Seoul National University |
Kim, Junghyun | Seoul National University |
Kang, Gi-Cheon | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Task and Motion Planning, AI-Based Methods, Task Planning
Abstract: Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in interactive environments. A key challenge in EIF is compositional task planning, typically addressed through supervised learning or few-shot in-context learning with labeled data. To this end, we introduce the Socratic Planner, a self-QA-based zero-shot planning method that infers an appropriate plan without any further training. The Socratic Planner first facilitates the Large Language Model (LLM) in performing self-questioning and answering, which in turn helps generate a sequence of subgoals. While executing the subgoals, an embodied agent may encounter unexpected situations, such as unforeseen obstacles. The Socratic Planner then adjusts plans based on dense visual feedback through a visually-grounded re-planning mechanism. Experiments demonstrate the effectiveness of the Socratic Planner, outperforming current state-of-the-art planning models on the ALFRED benchmark across all metrics, particularly excelling in long-horizon tasks that demand complex inference. We further demonstrate real-world applicability through deployment on a physical robot.
|
|
16:55-17:00, Paper ThET9.5 | |
Hypergraph-Based Coordinated Task Allocation and Socially-Aware Navigation for Multi-Robot Systems |
|
Wang, Weizheng | Purdue University |
Bera, Aniket | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Task and Motion Planning, Deep Learning Methods, Multi-Robot Systems
Abstract: A team of multiple robots seamlessly and safely working in human-filled public environments requires adaptive task allocation and socially-aware navigation that account for dynamic human behavior. Current approaches struggle with highly dynamic pedestrian movement and the need for flexible task allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot task allocation and socially-aware navigation, leveraging multi-agent reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics between robots, humans, and points of interest (POIs) using a hypergraph, enabling adaptive task assignment and socially-compliant navigation through a hypergraph diffusion mechanism. Our framework, trained with MARL, effectively captures interactions between robots and humans, adapting tasks based on real-time changes in human activity. Experimental results demonstrate that Hyper-SAMARL outperforms baseline models in terms of social navigation, task completion efficiency, and adaptability in various simulated scenarios.
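A standard normalized hypergraph diffusion step conveys how information propagates between robots, humans, and POIs that share a hyperedge; the learned diffusion mechanism in Hyper-SAMARL may differ. The incidence structure, weights, and features below are made up for illustration.

import numpy as np

H = np.array([            # incidence matrix: 5 nodes (robots/humans/POIs) x 3 hyperedges
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
], dtype=float)
w = np.array([1.0, 0.5, 2.0])                       # hyperedge weights
X = np.random.default_rng(0).normal(size=(5, 4))    # node feature matrix

dv = H @ w                                          # weighted node degrees
de = H.sum(axis=0)                                  # hyperedge degrees
Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
De_inv = np.diag(1.0 / de)
W = np.diag(w)

# one diffusion step: X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X
X_new = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt @ X
print(X_new.shape)                                  # (5, 4)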
|
|
17:00-17:05, Paper ThET9.6 | |
Bootstrapping Object-Level Planning with Large Language Models |
|
Paulius, David | Brown University |
Agostini, Alejandro | University of Innsbruck |
Quartey, Benedict | Brown University |
Konidaris, George | Brown University |
Keywords: Task and Motion Planning, AI-Based Methods, Task Planning
Abstract: We introduce a new method that extracts knowledge from a large language model (LLM) to produce object-level plans, which describe high-level changes to object state, and uses them to bootstrap task and motion planning (TAMP). Existing work uses LLMs to directly output task plans or generate goals in representations like PDDL. However, these methods fall short because they rely on the LLM to do the actual planning or output a hard-to-satisfy goal. Our approach instead extracts knowledge from an LLM in the form of plan schemas as an object-level representation called functional object-oriented networks (FOON), from which we automatically generate PDDL subgoals. Our method markedly outperforms alternative planning strategies in completing several pick-and-place tasks in simulation.
|
|
17:05-17:10, Paper ThET9.7 | |
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration |
|
Wake, Naoki | Microsoft |
Kanehira, Atsushi | Microsoft |
Sasabuchi, Kazuhiro | Microsoft |
Takamatsu, Jun | Microsoft |
Ikeuchi, Katsushi | Microsoft |
Keywords: Task and Motion Planning, Task Planning, Imitation Learning
Abstract: We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes videos of humans performing tasks and outputs executable robot programs that incorporate insights into affordances. The process begins with GPT-4V analyzing the videos to obtain textual explanations of environmental and action details. A GPT-4-based task planner then encodes these details into a symbolic task plan. Subsequently, vision systems spatially and temporally ground the task plan in the videos—objects are identified using an open-vocabulary object detector, and hand-object interactions are analyzed to pinpoint moments of grasping and releasing. This spatiotemporal grounding allows for the gathering of affordance information (e.g., grasp types, waypoints, and body postures) critical for robot execution. Experiments across various scenarios demonstrate the method's efficacy in achieving real robots' operations from human demonstrations in a one-shot manner. Meanwhile, quantitative tests have revealed instances of hallucination in GPT-4V, highlighting the importance of incorporating human supervision within the pipeline. The prompts of GPT-4V/GPT-4 are available at this project page: https://microsoft.github.io/GPT4Vision-Robot-Manipulation-Prompts/
|
|
17:10-17:15, Paper ThET9.8 | |
Action Contextualization: Adaptive Task Planning and Action Tuning Using Large Language Models |
|
Gupta, Sthithpragya | Ecole Polytechnique Federale De Lausanne |
Yao, Kunpeng | Massachusetts Institute of Technology |
Niederhauser, Loïc | EPFL |
Billard, Aude | EPFL |
Keywords: Task and Motion Planning, Task Planning, AI-Based Methods
Abstract: Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of robots' adaptability and error correction. This work aims to overcome this limitation by enabling robots to modify their motions and select the most suitable task plans based on the context. We introduce a novel framework to achieve action contextualization, aimed at tailoring robot actions to the context of specific tasks, thereby enhancing adaptability through applying LLM-derived contextual insights. Our framework integrates motion metrics that evaluate robot performances for each motion to resolve redundancy in planning. Moreover, it supports online feedback between the robot and the LLM, enabling immediate modifications to the task plans and corrections of errors. An overall success rate of 81.25% has been achieved through extensive experimental validation. Finally, when integrated with dynamical system (DS)-based robot controllers, the robotic arm-hand system demonstrates its proficiency in autonomously executing LLM-generated motion plans for sequential table-clearing tasks, rectifying errors without human intervention, and showcasing robustness against external disturbances. Our proposed framework also features the potential to be integrated with modular control approaches, significantly enhancing robots' adaptability and autonomy in performing sequential tasks in the real world.
|
|
ThET10 |
313 |
Multi-Robot Systems and Tools |
Regular Session |
Chair: Wilson, Sean | Georgia Institute of Technology, Georgia Tech Research Institute |
Co-Chair: Goldberg, Ken | UC Berkeley |
|
16:35-16:40, Paper ThET10.1 | |
CognitiveOS: Large Multimodal Model Based System to Endow Any Type of Robot with Generative AI |
|
Lykov, Artem | Skolkovo Institute of Science and Technology |
Konenkov, Mikhail | Skolkovo Institute of Science and Technology |
Gbagbe, Koffivi Fidele | Skolkovo Institute of Science and Technology |
Litvinov, Mikhail | Skolkovo Institute of Science and Technology |
Davletshin, Denis | Skolkovo Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Altamirano Cabrera, Miguel | Skolkovo Institute of Science and Technology (Skoltech), Moscow, |
Peter Vimalathas, Robinroy | Intelligent Space Robotics Laboratory, Skolkovo Institute of Sci |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Keywords: Cognitive Control Architectures, Multi-Modal Perception for HRI, Cooperating Robots
Abstract: This paper introduces CognitiveOS, the first operating system designed for cognitive robots capable of functioning across diverse robotic platforms. CognitiveOS is structured as a multi-agent system comprising modules built upon a transformer architecture, facilitating communication through an internal monologue format. These modules collectively empower the robot to tackle intricate real-world tasks. The paper delineates the operational principles of the system along with descriptions of its nine distinct modules. The modular design endows the system with distinctive advantages over traditional end-to-end methodologies, notably in terms of adaptability and scalability. The system's modules are configurable, modifiable, or deactivatable depending on the task requirements, while new modules can be seamlessly integrated. This system serves as a foundational resource for researchers and developers in the cognitive robotics domain, alleviating the burden of constructing a cognitive robot system from scratch. Experimental findings demonstrate the system's advanced task comprehension and adaptability across varied tasks, robotic platforms, and module configurations, underscoring its potential for real-world applications. Moreover, in the Reasoning category it outperformed CognitiveDog (by 15%) and RT2 (by 31%), achieving the highest rate to date of 77%. We provide a code repository and dataset for the replication of CognitiveOS: https://github.com/Arcwy0/cognitiveos
|
|
16:40-16:45, Paper ThET10.2 | |
CLSTR: Capability-Level System for Tracking Robots |
|
Bejarano, Alexandra | Colorado School of Mines |
Bonial, Claire | US Army Research Laboratory |
Williams, Tom | Colorado School of Mines |
Keywords: Multi-Robot Systems
Abstract: For human operators to effectively task teams of robots, it is critical that they maintain situational awareness about the status of those robots. However, maintaining this situational awareness becomes particularly difficult when there are dynamic changes not only in the members of the robot team, but also in the capabilities of those robots. Prior work has shown that situational awareness can be supported through interfaces that effectively visualize task-relevant information. As such, in this work, we introduce a Capability-Level System for Tracking Robots (CLSTR), a new visualization that supports operators in maintaining an appropriate level of situational awareness over the capabilities of a dynamic robot team. In evaluating CLSTR through an online human-subject study (n=123), we found that a combination of different visual elements within an interface, such as icons to summarize robot capabilities and animations to indicate team changes, can help operators maintain awareness over robot teams.
|
|
16:45-16:50, Paper ThET10.3 | |
Mitigating Side Effects in Multi-Agent Systems Using Blame Assignment |
|
Rustagi, Pulkit | Oregon State University |
Saisubramanian, Sandhya | Oregon State University |
Keywords: Multi-Robot Systems, Planning under Uncertainty, Path Planning for Multiple Mobile Robots or Agents
Abstract: When independently trained or designed robots are deployed in a shared environment, their combined actions can lead to unintended negative side effects (NSEs). To ensure safe and efficient operation, robots must optimize task performance while minimizing the penalties associated with NSEs, balancing individual objectives with collective impact. We model the problem of mitigating NSEs in a cooperative multi-agent system as a bi-objective lexicographic decentralized Markov decision process. We assume independence of transitions and rewards with respect to the robots' tasks, but the joint NSE penalty creates a form of dependence in this setting. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each robot using credit assignment, which facilitates decentralized policy computation. We empirically demonstrate, using mobile robots and in simulation, the effectiveness and scalability of our approach in mitigating NSEs. Code: https://tinyurl.com/RECON-NSE-Mitigation
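Read abstractly (our notation, not the paper's), the lexicographic bi-objective can be sketched as follows: among the policies that maximize each robot's primary task value, choose the one that minimizes the expected joint NSE penalty; scalability then comes from approximating that joint penalty with per-robot credits,
\[ R^{\text{NSE}}(s, a) \;\approx\; \sum_{i} \rho_i(s_i, a_i), \]
so that each robot \(i\) can recompute its policy against its own penalty term \(\rho_i\) in a decentralized fashion.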
|
|
16:50-16:55, Paper ThET10.4 | |
Decentralized Drone Swaps for Online Rebalancing of Drone Delivery Tasks |
|
Vakil, Kamran | Boston University |
Pierson, Alyssa | Boston University |
Keywords: Multi-Robot Systems, Networked Robots, Sensor Networks
Abstract: Recent research has seen the advancement of drone depot models as a promising way to allocate drones for large-scale task completion. Applications of these drone depot models include data collection, environmental monitoring, package delivery, and more. This paper focuses on sharing agents between static depots for task allocation based on expected demand. We model the problem as a Binary Nonlinear Program, then derive an iterative neighborhood search based on solving a series of Binary Linear Programs to drive towards the optimal configuration of agents for each depot. We show that our method is more tractable than a Branch and Bound approach for this model as problem complexity grows. We also show through simulations that, with near-optimal allocation between local depots, the overall system outperforms greedy and non-sharing approaches.
|
|
16:55-17:00, Paper ThET10.5 | |
A Fairness-Oriented Control Framework for Safety-Critical Multi-Robot Systems: Alternative Authority Control |
|
Shi, Lei | Johns Hopkins University |
Liu, Qichao | University of Wisconsin–Madison |
Zhou, Cheng | Tencent |
Li, Xiong | Tencent |
Keywords: Multi-Robot Systems, Intelligent Transportation Systems, Collision Avoidance
Abstract: This paper proposes a fair control framework for multi-robot systems, which integrates the newly introduced Alternative Authority Control (AAC) and Flexible Control Barrier Function (F-CBF). Control authority refers to a single robot that can plan its trajectory while considering the others as moving obstacles, meaning the other robots do not have authority to plan their own paths. The AAC method dynamically distributes the control authority, enabling fair and coordinated movement across the system. This approach significantly improves computational efficiency, scalability, and robustness in complex environments. The proposed F-CBF extends traditional CBFs by incorporating obstacle shape, velocity, and orientation. F-CBF enhances safety through accurate dynamic obstacle avoidance. The framework is validated through simulations in multi-robot scenarios, demonstrating its safety, robustness, and computational efficiency.
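For context, the standard control barrier function condition that F-CBF extends (textbook notation, not the paper's; the flexible variant adds obstacle shape, velocity, and orientation terms) keeps the state in a safe set \(\mathcal{C}=\{x : h(x)\ge 0\}\) by requiring the control input \(u\) of a control-affine system \(\dot{x}=f(x)+g(x)u\) to satisfy
\[ \frac{\partial h}{\partial x}\bigl(f(x)+g(x)u\bigr) \;\ge\; -\alpha\bigl(h(x)\bigr) \]
for some class-\(\mathcal{K}\) function \(\alpha\).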
|
|
17:00-17:05, Paper ThET10.6 | |
FogROS2-PLR: Probabilistic Latency-Reliability for Cloud Robotics |
|
Chen, Kaiyuan | University of California, Berkeley |
Tian, Nan | University of California, Berkeley |
Juette, Christian | Bosch Research |
Qiu, Tianshuang | University of California, Berkeley |
Ren, Liu | Robert Bosch North America Research Technology Center |
Kubiatowicz, John | UC Berkeley |
Goldberg, Ken | UC Berkeley |
Keywords: Networked Robots, Cellular and Modular Robots, Engineering for Robotic Systems
Abstract: Cloud robotics enables robots to offload complex computational tasks to cloud servers for performance, cost, and ease of management. However, the network and cloud computing infrastructure are not designed for reliable timing guarantees, leading to fluctuating Quality-of-Service (QoS). In this work, we formulate an impossibility triangle of latency reliability, singleton deployment, and commodity hardware. The theorem implies that providing replicated resources with uncorrelated failures exponentially reduces the probability of missing a deadline. We present FogROS2-Probabilistic Latency Reliability (PLR), which uses multiple independent network interfaces to send requests to replicated cloud resources and uses the first response that arrives. We design routing mechanisms to discover, connect, and route through non-default network interfaces on robots. FogROS2-PLR optimizes the selection of interfaces to servers by minimizing the probability of missing a deadline. We conduct a cloud-connected driving experiment with two 5G service providers, demonstrating that FogROS2-PLR effectively provides smooth service quality even if one of the service providers experiences low coverage and base station handover. We use 99th-percentile (P99) latency to evaluate anomalous long-tail latency behavior. In the experiment, FogROS2-PLR improves P99 latency by up to 3.7x compared to using one service provider. We deploy FogROS2-PLR on a physical Stretch 3 robot with an indoor human-tracking task. Even in a fully covered Wi-Fi and 5G environment, FogROS2-PLR improves the responsiveness of the robot, reducing mean latency by 36% and P99 latency by 33%. Code and supplementary material can be found on the project website.
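As a rough illustration of the replication argument (symbols ours, not taken from the paper): if a request is served by \(n\) replicas whose deadline misses are independent with probabilities \(p_1,\dots,p_n\), and the client accepts the first response, the request misses its deadline only when every replica misses,
\[ \Pr[\text{miss}] \;=\; \prod_{i=1}^{n} p_i, \]
which for identical replicas equals \(p^{n}\) and therefore shrinks exponentially in the number of independently deployed resources.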
|
|
17:05-17:10, Paper ThET10.7 | |
Jointly Assigning Processes to Machines and Generating Plans for Autonomous Mobile Robots in a Smart Factory |
|
Leet, Christopher | University of Southern California |
Sciortino, Aidan | University of Rochester |
Koenig, Sven | University of Southern California |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Industrial Robots
Abstract: A modern smart factory runs a manufacturing procedure using a collection of programmable machines. Typically, materials are ferried between these machines using a team of mobile robots. To embed a manufacturing procedure in a smart factory, a factory operator must a) assign its processes to the smart factory's machines and b) determine how agents should carry materials between machines. A good embedding maximizes the smart factory's throughput: the rate at which it outputs products. Existing smart factory management systems solve the aforementioned problems sequentially, limiting the throughput that they can achieve. In this paper, we introduce ACES, the Anytime Cyclic Embedding Solver, the first solver that jointly optimizes the assignment of processes to machines and the assignment of paths to agents. We evaluate ACES and show that it can scale to real industrial scenarios.
|
|
ThET11 |
314 |
Physical Human-Robot Interaction |
Regular Session |
Chair: Song, Kai-Tai | National Yang Ming Chiao Tung University |
Co-Chair: Secchi, Cristian | Univ. of Modena & Reggio Emilia |
|
16:35-16:40, Paper ThET11.1 | |
A Control Scheme for Collaborative Object Transportation between a Human and a Quadruped Robot Using the MIGHTY Suction Cup |
|
Plotas, Konstantinos | Hellenic Mediterranean University |
Papadakis, Emmanouil | Foundation for Research and Technology - Hellas |
Drosakis, Drosakis | Foundation for Research and Technology–Hellas |
Trahanias, Panos | Foundation for Research and Technology – Hellas (FORTH) |
Papageorgiou, Dimitrios | Hellenic Mediterranean University |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Compliance and Impedance Control
Abstract: In this work, a control scheme for human-robot collaborative object transportation is proposed, considering a quadruped robot equipped with the MIGHTY suction cup that serves both as a gripper for holding the object and as a force/torque sensor. The proposed control scheme is based on the notion of admittance control, and incorporates a variable damping term aimed at increasing the human's control over the motion while, at the same time, decreasing her/his effort. Furthermore, to ensure that the object is not detached from the suction cup during the collaboration, an additional control signal is proposed, which is based on a barrier artificial potential. The proposed control scheme is proven to be passive and its performance is demonstrated through experimental evaluations conducted using the Unitree Go1 robot equipped with the MIGHTY suction cup.
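A minimal sketch of the kind of admittance law described, with symbols of our own choosing rather than the paper's: the commanded end-effector motion \(x\) follows
\[ M\ddot{x} + D(t)\,\dot{x} \;=\; F_h - \nabla U_b(x), \]
where \(F_h\) is the human force/torque measured through the suction cup, \(D(t)\) is the variable damping term, and \(U_b\) is a barrier-type artificial potential that grows as the interaction approaches the detachment limit of the suction cup.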
|
|
16:40-16:45, Paper ThET11.2 | |
DTRT: Enhancing Human Intent Estimation and Role Allocation for Physical Human-Robot Collaboration |
|
Liu, Haotian | Institute of Automation, Chinese Academy of Sciences |
Tong, Yuchuang | The Institute of Automation of the Chinese Academy of Sciences |
Zhang, Zhengtao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Intention Recognition
Abstract: In physical Human-Robot Collaboration (pHRC), accurate human intent estimation and rational human-robot role allocation are crucial for safe and efficient assistance. Existing methods that rely on short-term motion data for intention estimation lack multi-step prediction capabilities, hindering their ability to sense intent changes and adjust human-robot assignments autonomously, resulting in potential discrepancies. To address these issues, we propose a Dual Transformer-based Robot Trajectron (DTRT) featuring a hierarchical architecture, which harnesses human-guided motion and force data to rapidly capture human intent changes, enabling accurate trajectory predictions and dynamic robot behavior adjustments for effective collaboration. Specifically, human intent estimation in DTRT uses two Transformer-based Conditional Variational Autoencoders (CVAEs), incorporating robot motion data in the obstacle-free case and human-guided trajectory and force for obstacle avoidance. Additionally, Differential Cooperative Game Theory (DCGT) is employed to synthesize predictions based on human-applied forces, ensuring that robot behavior aligns with human intention. Compared to state-of-the-art (SOTA) methods, DTRT incorporates human dynamics into long-term prediction, providing an accurate understanding of intention and enabling rational role allocation, achieving robot autonomy and maneuverability. Experiments demonstrate DTRT's accurate intent estimation and superior collaboration performance.
|
|
16:45-16:50, Paper ThET11.3 | |
Learning-Based Dynamic Robot-To-Human Handover |
|
Kim, Hyeonseong | Korea University |
Kim, Chanwoo | Korea University |
Pan, Matthew | Queen's University |
Lee, Kyungjae | Korea University |
Choi, Sungjoon | Korea University |
Keywords: Physical Human-Robot Interaction, Human-Aware Motion Planning, Learning from Experience
Abstract: This paper presents a novel learning-based approach to dynamic robot-to-human handover, addressing the challenges of delivering objects to a moving receiver. We hypothesize that dynamic handover, where the robot adjusts to the receiver’s movements, results in more efficient and comfortable interaction compared to static handover, where the receiver is assumed to be stationary. To validate this, we developed a nonparametric method for generating continuous handover motion, conditioned on the receiver's movements, and trained the model using a dataset of 1,000 human-to-human handover demonstrations. We integrated preference learning for improved handover effectiveness and applied impedance control to ensure user safety and adaptiveness. The approach was evaluated in both simulation and real-world settings, with user studies demonstrating that dynamic handover significantly reduces handover time and improves user comfort compared to static methods. Videos and demonstrations of our approach are available at https://zerotohero7886.github.io/dyn-r2h-handover/.
|
|
16:50-16:55, Paper ThET11.4 | |
A Novel Dynamic Motion Primitives Framework for Safe Human-Robot Collaboration |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Di Vittorio, Filippo | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Safety in HRI, Learning from Demonstration
Abstract: Learning by demonstration techniques are gaining popularity in human-robot collaboration (HRC) scenarios. This is because they allow operators to deeply exploit the versatility of collaborative robots. In this context, dynamic motion primitives (DMPs) have become a standard method for enabling human operators to easily teach tasks to robots. However, DMPs have two main limitations. First, they may encounter difficulties in generalizing some tasks, which can lead to non-intuitive behavior. Second, it is not guaranteed that the output of DMPs is compliant with ISO/TS 15066, which provides guidelines for assessing safety in collaborative scenarios. This work aims to address these two issues by introducing a novel control pipeline. This pipeline leverages a new variant of DMPs, called Swap DMPs (SDMPs), introduced in this work. The SDMPs enable a more intuitive behavior when the robot reproduces the learned task. Subsequently, SDMPs are encoded into a new optimization problem that ensures the robot complies with the Speed and Separation Monitoring (SSM) collaborative mode. The proposed approach has been experimentally validated and compared with traditional DMPs in both simulation and a real scenario, where a UR5e and a human operator collaborate on a polishing task.
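For reference, the standard discrete DMP transformation and canonical systems that variants such as SDMPs build on (standard notation, not specific to this paper) read
\[ \tau\dot{z} = \alpha_z\bigl(\beta_z(g-y)-z\bigr) + f(s), \qquad \tau\dot{y} = z, \qquad \tau\dot{s} = -\alpha_s s, \]
where \(y\) is the motion variable, \(g\) the goal, \(s\) the phase, and \(f(s)\) the forcing term learned from the demonstration.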
|
|
16:55-17:00, Paper ThET11.5 | |
Depth Restoration of Hand-Held Transparent Objects for Human-To-Robot Handover |
|
Yu, Ran | Tsinghua University |
Yu, Haixin | Tsinghua Shenzhen International Graduate School |
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Huang, Yan | Tsinghua University |
Song, Ziwu | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Keywords: Multi-Modal Perception for HRI, Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Transparent objects are common in daily life, while their optical properties pose challenges for RGB-D cameras to capture accurate depth information. This issue is further amplified when these objects are hand-held, as hand occlusions further complicate depth estimation. For assistant robots, however, accurately perceiving hand-held transparent objects is critical to effective human-robot interaction. This paper presents a Hand-Aware Depth Restoration (HADR) method based on creating an implicit neural representation function from a single RGB-D image. The proposed method utilizes hand posture as an important guidance to leverage semantic and geometric information of hand-object interaction. To train and evaluate the proposed method, we create a high-fidelity synthetic dataset named TransHand-14K with a real-to-sim data generation scheme. Experiments show that our method has better performance and generalization ability compared with existing methods. We further develop a real-world human-to-robot handover system based on HADR, demonstrating its potential in human-robot interaction applications.
|
|
17:00-17:05, Paper ThET11.6 | |
Leveraging Semantic and Geometric Information for Zero-Shot Robot-To-Human Handover |
|
Liu, Jiangshan | Southern University of Science and Technology |
Dong, Wenlong | Southern University of Science and Technology |
Wang, Jiankun | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Physical Human-Robot Interaction, Robot Companions, Grasping
Abstract: Human-robot interaction (HRI) encompasses a wide range of collaborative tasks, with handover being one of the most fundamental. As robots become more integrated into human environments, the potential for service robots to assist in handing objects to humans is increasingly promising. In robot-to-human (R2H) handover, selecting the optimal grasp is crucial for success, as it requires avoiding interference with the human's preferred grasp region and minimizing intrusion into their workspace. Existing methods either inadequately consider geometric information or rely on data-driven approaches, which often struggle to generalize across diverse objects. To address these limitations, we propose a novel zero-shot system that combines semantic and geometric information to generate optimal handover grasps. Our method first identifies grasp regions using semantic knowledge from vision-language models (VLMs) and, by incorporating customized visual prompts, achieves finer granularity in region grounding. A grasp is then selected based on grasp distance and approach angle to maximize human ease and avoid interference. We validate our approach through ablation studies and real-world comparison experiments. Results demonstrate that our system improves handover success rates and provides a more user-preferred interaction experience. Videos, appendixes and more are available at https://sites.google.com/view/vlm-handover/.
|
|
17:05-17:10, Paper ThET11.7 | |
Human-To-Robot Handover Control of an Autonomous Mobile Robot Based on Hand-Masked Object Pose Estimation |
|
Song, Kai-Tai | National Yang Ming Chiao Tung University |
Huang, Yu-Yun | National Yang Ming Chiao Tung University |
Keywords: Human-Robot Collaboration, Grasping, Visual Servoing
Abstract: This paper presents a human-to-robot handover design for an Autonomous Mobile Robot (AMR). The developed control system enables the AMR to navigate to a specific person and grasp the object that the person wants to hand over. This paper proposes a motion planning algorithm for grasping an unseen object held in hand. Through hand detection and segmentation, the hand region is masked and removed from the acquired depth image, which is used to estimate the object pose for grasping. For grasp pose determination, we propose to add the Convolutional Block Attention Module (CBAM) to the Generative Grasping Convolutional Neural Network (GGCNN) model to enhance the recognition rate. For the object-grasp task, the AMR localizes the object in the person's hand, and uses a Model Predictive Control (MPC)-based controller to simultaneously control the mobile base and manipulator to grasp the object. A laboratory-developed mobile manipulator, equipped with a 6-DoF TM5M-900, is used for experimental verification. The experimental results show an average handover success rate of 81% for five different objects.
|
|
ThET12 |
315 |
Motion Control 2 |
Regular Session |
Chair: Fan, Chuchu | Massachusetts Institute of Technology |
Co-Chair: Oh, Sehoon | DGIST |
|
16:35-16:40, Paper ThET12.1 | |
Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction |
|
Zhao, Xiyuan | Southeast University |
Li, Huijun | Southeast University |
Miao, Tianyuan | Southeast University |
Zhu, Xianyi | Southeast University |
Wei, Zhikai | Southeast University |
Tan, Lifen | China Astronaut Research and Training Center |
Song, Aiguo | Southeast University |
Keywords: Multi-Modal Perception for HRI, Human Factors and Human-in-the-Loop
Abstract: The rapid development of collaborative robotics has provided a new possibility of helping the elderly who have difficulties in daily life, allowing robots to operate according to specific intentions. However, efficient human-robot cooperation requires natural, accurate, and reliable intention recognition in shared environments. The paramount challenge is to reduce the uncertainty of the fused multimodal intention to be recognized and to adaptively reason toward a more reliable result under the current interaction conditions. In this work, we propose a novel learning-based multimodal fusion framework, Batch Multimodal Confidence Learning for Opinion Pool (BMCLOP). Our approach combines a Bayesian multimodal fusion method and a batch confidence learning algorithm to improve accuracy, uncertainty reduction, and success rate given the interaction conditions. In particular, the generic and practical multimodal intention recognition framework can easily be extended further. Our desired assistive scenarios consider three modalities (gestures, speech, and gaze), all of which produce categorical distributions over all the finite intentions. The proposed method is validated with a six-DoF robot through extensive experiments and exhibits high performance compared to baselines.
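One common Bayesian opinion-pool form for this kind of fusion (a generic sketch, not necessarily the exact BMCLOP rule) combines the per-modality posteriors with learned confidence weights,
\[ P(I \mid o_g, o_s, o_z) \;\propto\; P(I)\prod_{m\in\{g,s,z\}} P(I \mid o_m)^{w_m}, \]
where \(I\) ranges over the finite intention set, \(o_m\) is the observation from gesture, speech, or gaze, and \(w_m \ge 0\) is the confidence assigned to modality \(m\).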
|
|
16:40-16:45, Paper ThET12.2 | |
Optimize and Coordinate Multiple DMPs under Constraints to Achieve a Collaborative Manipulation Task |
|
Kordia, Ali H. | Instituto Superior Técnico |
Melo, Francisco S. | Instituto Superior Tecnico |
Keywords: Motion Control, Planning, Scheduling and Coordination, Human-Robot Collaboration
Abstract: This paper addresses a significant challenge in achieving collaborative tasks: how can a robot or multiple robots, endowed with a library of pre-learned primitive movements, generate multiple simultaneous coordinated robotic movements, adapting and optimizing those in the library, to complete one collaborative task? This work can thus be seen as a follow-up to prior work in which a motion is represented as a dynamic movement primitive (DMP), now considering collaborative tasks and the existence of multiple robots/manipulators. Specifically, we start with a simple task using one DMP and extend it to accommodate the coordinated execution of multiple DMPs in robots with multiple manipulators or, alternatively, multiple robots with a single manipulator. We investigate mechanisms to jointly optimize multiple DMPs to perform one task in a coordinated fashion. The joint trajectory is built from initial DMPs learned for a single manipulator, and its optimization must comply with task-specific constraints. We illustrate the application of our approach both in a simulated environment and on a simulated and real Baxter robot.
|
|
16:45-16:50, Paper ThET12.3 | |
A Modified Resistance Model for Magnetic Honeycomb Robots to Navigate in Low Reynolds Number Fluids |
|
Zou, Leyao | Fudan University |
Ma, Shihao | Fudan University |
Liu, Yi | Fudan University |
Dong, Xinyang | Fudan University |
Zhou, Ziqing | Fudan University |
Ouyang, Chun | Fudan University |
Gan, Zhongxue | Fudan University |
Keywords: Motion Control, Micro/Nano Robots, Motion and Path Planning
Abstract: In recent years, magnetically controlled microrobots have garnered significant attention. This paper presents the H-robot, a self-designed microrobot featuring an innovative structure. The H-robot features a honeycomb porous spherical design specifically engineered to enhance cargo capacity. A new dynamic model for this structure has been developed for low Reynolds number fluid environments, along with a robust backstepping sliding mode control (RBSMC) strategy. Experiments were conducted in a calibrated magnetic field produced by a field generator to achieve precise motion control. The results demonstrate that the H-robot accurately tracks standard trajectories, with root mean square errors (RMSE) of 9.09×10⁻⁴ m for the Number-8 path and 8.29×10⁻⁴ m for the S-shaped path. Additionally, the proposed resistance model enhances tracking accuracy by 73.61% compared to traditional models, effectively adjusting the dynamic behavior of the H-robot in low Reynolds number fluids and significantly improving its motion performance. Finally, path planning experiments in a maze demonstrate the H-robot's ability to navigate and avoid obstacles.
|
|
16:50-16:55, Paper ThET12.4 | |
Manual, Semi or Fully Autonomous Flipper Control? a Framework for Fair Comparison |
|
Číhala, Valentýn | Ceske Vysoke Uceni Technicke V Praze, FEL |
Pecka, Martin | Ceske Vysoke Uceni Technicke V Praze, FEL |
Svoboda, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
Keywords: Motion Control, Software Tools for Benchmarking and Reproducibility, Imitation Learning
Abstract: We investigated the performance of existing semi- and fully autonomous methods for controlling flipper-based skid-steer robots. Our study involves reimplementation of these methods for fair comparison, and it introduces a novel semi-autonomous control policy that provides a compelling trade-off among current state-of-the-art approaches. We also propose new metrics for assessing cognitive load and traversal quality and offer a benchmarking interface for generating Quality-Load graphs from recorded data. Our results, presented in a 2D Quality-Load space, demonstrate that the new control policy effectively bridges the gap between autonomous and manual control methods. Additionally, we reveal the surprising fact that fully manual, continuous control of all six degrees of freedom remains highly effective when performed by an experienced operator using a well-designed analog controller from a third-person view.
|
|
16:55-17:00, Paper ThET12.5 | |
Safety-Critical Locomotion of Biped Robots in Infeasible Paths: Overcoming Obstacles During Navigation Toward Destination |
|
Lee, Jaemin | North Carolina State University |
Dai, Min | California Institute of Technology |
Kim, Jeeseop | Caltech |
Ames, Aaron | California Institute of Technology |
Keywords: Motion Control, Robot Safety, Humanoid and Bipedal Locomotion
Abstract: This paper proposes a safety-critical locomotion control framework for legged robots traversing infeasible paths in obstacle-rich environments. Our research focuses on achieving safe and robust locomotion where robots confront unavoidable obstacles en route to their designated destination. By utilizing the outcomes of physical interactions with unknown objects, we establish a hierarchy among the safety-critical conditions for avoiding the obstacles. This hierarchy enables the generation of a safe reference trajectory that adeptly mitigates conflicts among safety conditions and reduces the risk while controlling the robot toward its destination without additional motion planning methods. In addition, robust bipedal locomotion is achieved by utilizing the Hybrid Linear Inverted Pendulum model, coupled with a disturbance observer that addresses disturbances arising from the physical interaction.
|
|
17:00-17:05, Paper ThET12.6 | |
Optimal Framework for Constrained Admittance Path-Following Control |
|
Besi, Giulio | University of Modena and Reggio Emilia |
Pupa, Andrea | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Ferraguti, Federica | Università Degli Studi Di Modena E Reggio Emilia |
Keywords: Motion and Path Planning, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: In this article, an optimal controller for achieving constrained admittance control is proposed. This controller strictly adheres to the constraint boundaries while ensuring minimal variations in kinematic energy. The proposed method integrates admittance control for human-robot interaction with the Udwadia-Kalaba equations for constrained motion into a unified framework. The proposed architecture has been tested and validated both with simulations and real tests on a 6-DoF UR5e robot. The results demonstrate that the proposed architecture outperforms virtual fixtures, one of the most commonly used techniques to implement effective path-following control.
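For readers unfamiliar with the Udwadia-Kalaba formulation referenced here (standard form, our notation): for dynamics \(M(q)\ddot{q} = Q\) subject to constraints written at the acceleration level as \(A(q,\dot{q})\ddot{q} = b(q,\dot{q})\), the constrained acceleration is
\[ \ddot{q} \;=\; a + M^{-1/2}\bigl(A M^{-1/2}\bigr)^{+}\bigl(b - A a\bigr), \qquad a = M^{-1}Q, \]
where \((\cdot)^{+}\) denotes the Moore-Penrose pseudoinverse; in a framework like the one described, the unconstrained motion \(a\) would come from the admittance dynamics.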
|
|
17:05-17:10, Paper ThET12.7 | |
Robust Orientation Control of Robot Manipulator Using Orientation Disturbance Observer |
|
Choi, Kiyoung | Daegu Gyeongbuk Institute of Science and Technology |
Song, JunHo | Daegu Gyeongbuk Institute of Science and Technology |
Yun, WonBum | Korea Institute of Robotics and Technology Convergence |
Oh, Sehoon | DGIST |
Keywords: Motion Control, Dynamics
Abstract: This paper presents a robust control algorithm for precise orientation control of robot manipulators using a Disturbance Observer (DOB) specifically designed for orientation dynamics. Our approach addresses the challenges of 3D orientation control by incorporating various orientation representations, such as Euler angles, quaternions, and exponential coordinates, and analyzing their impact on DOB performance. Through theoretical analysis and experimental validation, we demonstrate the effectiveness of our method in achieving high-precision orientation control under uncertainties and disturbances. This work offers a comprehensive framework for robust orientation control, advancing the application of DOB in complex robotic tasks.
|
|
17:10-17:15, Paper ThET12.8 | |
Predictive Kinematic Coordinate Control for Aerial Manipulators Based on Modified Kinematics Learning |
|
Li, Zhengzhen | Westlake University |
Shen, Jiahao | Westlake University |
Ji, Mengyu | Westlake University |
Cao, Huazi | Beihang University |
Zhao, Shiyu | Westlake University |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Kinematics
Abstract: High-precision manipulation has always been a developmental goal for aerial manipulators. This paper investigates the kinematic coordinate control issue in aerial manipulators. We propose a predictive kinematic coordinate control method based on model learning, which includes a learning-based modified kinematic model and a model predictive control (MPC) scheme based on weight allocation. Compared to existing methods, our proposed approach offers several attractive features. First, the kinematic model incorporates closed-loop dynamics characteristics and online residual learning. Compared to methods that do not consider closed-loop dynamics and residuals, our proposed method improves accuracy by 59.6%. Second, an MPC method that considers weight allocation is proposed, which can coordinate the motion strategies of quadcopters and manipulators. Compared to methods that do not consider weight allocation, the proposed method can meet the requirements of more tasks. The proposed approach is verified through complex trajectory tracking and moving target tracking experiments. The results validate the effectiveness of the proposed method.
|
|
ThET13 |
316 |
Resiliency and Security 2 |
Regular Session |
Chair: Ueda, Jun | Georgia Institute of Technology |
Co-Chair: Chou, Glen | Georgia Institute of Technology |
|
16:35-16:40, Paper ThET13.1 | |
Affine Transformation-Based Perfectly Undetectable False Data Injection Attacks on Remote Manipulator Kinematic Control with Attack Detector |
|
Ueda, Jun | Georgia Institute of Technology |
Blevins, Jacob | Georgia Institute of Technology |
Keywords: Networked Robots, Failure Detection and Recovery, Motion Control
Abstract: This paper demonstrates the viability of perfectly undetectable affine transformation attacks against robotic manipulators where intelligent attackers can inject multiplicative and additive false data while remaining completely hidden from system users. The attacker can implement these communication line attacks by satisfying three Conditions presented in this work. These claims are experimentally validated on a FANUC 6 degree of freedom manipulator by comparing a nominal (non-attacked) trial and a detectable attack case against three perfectly undetectable trajectory attack Scenarios: scaling, reflection, and shearing. The results show similar observed end effector error for the attack Scenarios and the nominal case, indicating that the perfectly undetectable affine transformation attack method keeps the attacker perfectly hidden while enabling them to attack manipulator trajectories.
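One generic way to read "perfectly undetectable" here (a schematic sketch, not the paper's exact construction): the attacker applies an affine map to the commanded reference on the forward channel, \(\tilde{r} = A r + c\), and the corresponding inverse map \(A^{-1}(\tilde{y} - c)\) to the measurements \(\tilde{y}\) returned on the feedback channel, so the tracking error observed by the user remains indistinguishable from the nominal case even though the manipulator actually follows the scaled, reflected, or sheared trajectory.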
|
|
16:40-16:45, Paper ThET13.2 | |
CDA: Covert Deception Attacks in Multi-Agent Resource Scheduling |
|
Hao, Wei | Nanjing University |
Liu, Jia | Nanjing University |
Li, Wenjie | Nanjng University |
Chen, Lijun | Nanjing University |
Keywords: Robot Safety, Swarm Robotics, Deep Learning Methods
Abstract: In this letter, we address the critical security concerns in multi-agent systems, where illegal infiltration is commonly used to convert agents into malicious entities. Existing research predominantly focuses on explicit malicious attack patterns. Our work introduces a covert deception attack framework in the context of multi-agent resource scheduling scenarios. We first highlight vulnerabilities in scheduling strategies based on time and path costs. Exploiting these weaknesses, an infiltrated agent clandestinely gathers motion characteristics of other agents while posing as a teammate. Using these motion characteristics, the infiltrated agent employs an LSTM architecture to learn and predict congestion areas, thereby designing attack paths with greater time efficiency. This approach allows the infiltrated agent to secure additional resources and evade capture more effectively. Validation through simulation and real-world experiments demonstrates the feasibility and effectiveness of our approach, underscoring the importance of evaluating covert attacks in risk assessments within multi-agent systems.
|
|
16:45-16:50, Paper ThET13.3 | |
Early Model-Based Safety Analysis for Collaborative Robotic Systems (I) |
|
Manjunath, Meenakshi | Technical University of Applied Sciences Würzburg-Schweinfurt |
Jesus Raja, Jeshwitha | Technical University of Applied Sciences Würzburg-Schweinfurt |
Daun, Marian | Technical University of Applied Sciences Würzburg-Schweinfurt |
Keywords: Safety in HRI, Intelligent and Flexible Manufacturing, Modeling, Control, and Learning for Soft Robots
Abstract: The current era is marked by an accelerated digitization of manufacturing processes, with robotic systems increasingly integrated into various workflows. Yet, despite significant advancements, it is impractical to fully automate certain tasks due to prohibitive costs and technical constraints. As a result, there’s a growing emphasis on human-robot collaboration (HRC) for intricate operations. In HRC scenarios, humans and robots co-inhabit the same work environment, operating side by side. More than just mere coexistence in the same space, they actively collaborate on shared tasks, thus raising the stakes in terms of safety. The dynamic behavior of robots must be synchronized with the anticipated and unexpected human actions, adding another layer of complexity to the safety considerations. It is essential to conduct comprehensive safety analyses that identify potential risks that pose harm to the human operator. As a proactive measure to foster early-stage safety and risk analysis, we propose the use of goal models. The approach enables the specification of safety threats within the HRC context, thereby facilitating the development of safety tasks and supportive monitoring mechanisms. This approach helps in the refinement and implementation of safety measures, ensuring a secure and productive environment for human-robot collaboration.
|
|
16:50-16:55, Paper ThET13.4 | |
Investigating Security Threats in Multi-Tenant ROS 2 Systems |
|
Xia, Lichen | University of Delaware |
Gao, Xing | University of Delaware |
Shi, Weisong | University of Delaware |
Keywords: Multi-Robot Systems, Software Architecture for Robotic and Automation
Abstract: Robot Operating System (ROS) has been widely used to develop robotic applications. The first generation of ROS generally lacks security features, and ROS 2 is introduced with security support. However, security concerns still exist for running ROS in practical multi-tenant environments. In this paper, we conduct an in-depth investigation into the security of ROS 2. We focus on vulnerabilities in ROS nodes and topics and intend to explore methods to break the isolation and security mechanisms systematically. We devise a set of strategies that can be exploited by attackers to escalate privilege or cause information leakage in a multi-tenant environment. These attacks can bypass existing isolation and security mechanisms, including ROS 2’s native security module. To validate our findings, we employ simulations across various real-world scenarios to demonstrate how attackers could exploit these vulnerabilities to bypass existing security mechanisms. Finally, we present several defense practices to mitigate these identified threats.
|
|
16:55-17:00, Paper ThET13.5 | |
Multi-Task Robustness Enhancement Framework against Various Adversarial Patches |
|
Jing, Lihua | Chinese Academy of Sciences |
Wang, Rui | Chinese Academy of Sciences |
Li, Runbo | Chinese Academy of Sciences |
Zhu, Zixuan | Chinese Academy of Sciences |
Wei, Xingxing | Beihang University |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Visual Learning
Abstract: Autonomous systems leveraging visual perception face a rising threat from adversarial patches, jeopardizing their robustness. Existing defense methods adaptable to various pre-trained models typically rely on observed patch characteristics or prior attack data, having difficulty adapting to new threats. This study innovatively focuses on modeling patch attack behavior instead of existing patches, proposing a unified robustness enhancement framework against various adversarial patches. Through self-supervised learning, we accurately locate diverse adversarial patches without prior attack knowledge. Furthermore, we introduce an efficient adaptive patch inpainting method to mitigate patch impact while maintaining visual coherence. Experiments show that our methods effectively boost the robustness of visual perception models against various adversarial patches across different tasks.
|
|
17:00-17:05, Paper ThET13.6 | |
Perfectly Undetectable False Data Injection Attacks on Encrypted Bilateral Teleoperation System Based on Dynamic Symmetry and Malleability |
|
Kwon, Hyukbin | Georgia Institute of Technology |
Kawase, Hiroaki | The University of Electro-Communications |
Nieves-Vazquez, Heriberto Andres | Georgia Institute of Technology |
Kogiso, Kiminao | The University of Electro-Communications |
Ueda, Jun | Georgia Institute of Technology |
Keywords: Telerobotics and Teleoperation, Networked Robots, Dynamics
Abstract: This paper investigates the vulnerability of bilateral teleoperation systems to perfectly undetectable False Data Injection Attacks (FDIAs). Teleoperation, one of the major applications in robotics, involves a leader manipulator operated by a human and a follower manipulator at a remote site, connected via a communication channel. While this setup enables operation in challenging environments, it also introduces cybersecurity risks, particularly in the communication link. The paper focuses on a specific class of cyberattacks: perfectly undetectable FDIAs, where attackers alter signals without leaving detectable traces at all. Compared to previous research on linear and first-order nonlinear systems, this paper examines bilateral teleoperation systems with second-order nonlinear manipulator dynamics. The paper derives mathematical conditions based on Lie Group theory that enable such attacks, demonstrating how an attacker can modify the follower manipulator's motion while the operator perceives normal operation through the leader device. This vulnerability challenges conventional detection methods based on observable changes and highlights the need for advanced security measures in teleoperation systems. To validate the theoretical results, the paper presents experimental demonstrations using a teleoperation system connecting robots in the US and Japan.
|
|
ThET14 |
402 |
Hand and Gripper Design |
Regular Session |
Chair: Bekiroglu, Yasemin | Chalmers University of Technology, University College London |
Co-Chair: Plecnik, Mark | University of Notre Dame |
|
16:35-16:40, Paper ThET14.1 | |
A Novel Under-Actuated Gripper Based on Passive-Locking Mechanism for Stable Gripping under Environmental Constraints |
|
Yang, Seokjun | Kwangwoon University |
Lee, Sungon | Hanyang University |
Yang, Woosung | Kwangwoon University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: This paper presents a novel under-actuated two-finger gripper that passively adapts to various environments and maintains its grip posture using a passive-locking mechanism. The proposed mechanism features fingers with three phalanges, each incorporating four-bar and eight-bar linkages arranged in parallel. These linkages perform crucial functions, including maintaining the grip angle and ensuring passive characteristics during pinch grips. Previous grippers with passive mechanisms and three-phalanx fingers faced issues with gripping instability, particularly when changes in the passive joint angle were caused by object inertia or external lateral forces. To address this problem, we propose a new passive-locking mechanism utilizing an eight-bar linkage. This innovative design is engineered to adapt to environmental conditions, establish a secure grip, and maintain the grip angle of the passive joint after the grip is achieved. To demonstrate the advantages of the proposed mechanism, this paper conducts a fingertip force vector analysis and a mobility analysis according to the pinch sequence. It also details the derivation process and principles of the mechanism. The gripper's operational range and gripping force are examined through kinematic analysis and verified by simulation. Furthermore, the study shows that the proposed mechanism effectively responds to environmental constraints, even in environments with obstacles surrounding the object. Comparative experiments with and without a contact bar indicate that the proposed gripper can stably secure an object in scenarios involving swing motions and external forces of approximately 5 N.
|
|
16:40-16:45, Paper ThET14.2 | |
Juzu Type Gripper That Can Change Both Shape and Firmness |
|
Hara, Shunya | Osaka University |
Fukuda, Osamu | Saga University |
Higashimori, Mitsuru | Osaka University |
Keywords: Grippers and Other End-Effectors, Mechanism Design
Abstract: This paper presents a novel gripper capable of actively changing both shape and firmness. The gripper increases its grasp ability by changing its finger posture and firmness suitable for given target objects. In the proposed gripper, each finger is constructed by serially connecting multiple Juzu units. By controlling the angles between neighboring Juzu units individually using two actuators used for sending and bending, arbitrary finger shapes can be generated. In addition, by controlling the tension of the wire that penetrates all Juzu units in each finger, the friction between Juzu units is adjusted and the firmness of the finger can be varied. A prototype gripper was designed and developed, and experiments to evaluate the capabilities of changing shape and firmness were conducted. Furthermore, through experiments of preshaping and grasping various objects with different shapes and sizes, the validity of the proposed method was demonstrated.
|
|
16:45-16:50, Paper ThET14.3 | |
A Direct-Drive Gripper Designed by Ellipse Synthesis across Two Output Modes |
|
Ramesh, Shashank | University of Notre Dame |
Plecnik, Mark | University of Notre Dame |
Keywords: Mechanism Design, Grippers and Other End-Effectors, Kinematics
Abstract: There are many ways for a gripper to estimate the forces between its fingers. If powered by direct-drive brushless motors, then one technique is to measure their current. This is not the most accurate technique, but it is simple, keeps the sensor remote, and requires no new components. The estimation involves multiplying the current signals by the torque constant and the inverse transpose of the Jacobian. The Jacobian either amplifies the signal from fingertip force to motor current (at the cost of tip force production), or diminishes it (with the gain of tip force production), indicating an inherent trade-off. However, the Jacobian is a function of configuration, and for any workspace point there are multiple configurations (multiple inverse kinematics solutions), therefore a selection of Jacobians exists. For a given workspace point, the number of Jacobian choices is just a few, but these choices can be designed (through dimensional synthesis) to overcome the trade-off. The problem can be framed as velocity ellipse synthesis over multiple output modes. In this work, we conduct optimal synthesis to compute a new gripper design. The gripper was built and tested. It transitions between two different modes: sense mode and grip mode. Sense mode can sense forces 3 times smaller than grip mode. Grip mode can exert forces 4 times greater than sense mode.
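The current-based force estimate described above can be sketched in a few lines (a minimal illustration with made-up link lengths, angles, and motor constants, not the authors' code): joint torques are recovered from measured currents via the torque constant, and fingertip forces follow from the inverse transpose of the configuration-dependent Jacobian.

import numpy as np

# Made-up two-joint planar finger (illustrative values only).
l1, l2 = 0.05, 0.04        # link lengths [m]
q1, q2 = 0.4, 0.8          # joint angles [rad]
k_t = 0.021                # motor torque constant [Nm/A], direct drive (no gearing)
i_meas = np.array([0.30, 0.12])   # measured motor currents [A]

# Planar Jacobian mapping joint velocities to fingertip velocity.
J = np.array([
    [-l1*np.sin(q1) - l2*np.sin(q1+q2), -l2*np.sin(q1+q2)],
    [ l1*np.cos(q1) + l2*np.cos(q1+q2),  l2*np.cos(q1+q2)],
])

tau = k_t * i_meas                 # joint torques estimated from currents
f_tip = np.linalg.solve(J.T, tau)  # statics: tau = J^T F  =>  F = J^{-T} tau
print(f_tip)                       # estimated fingertip force [N]

Because the Jacobian depends on which inverse-kinematics branch the finger sits in, the same fingertip position can yield either a force-amplifying or a force-attenuating estimate, which is the trade-off the synthesis in the paper is designed to exploit.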
|
|
16:50-16:55, Paper ThET14.4 | |
Mechanisms and Computational Design of Multi-Modal End-Effector with Force Sensing Using Gated Networks |
|
Tanaka, Yusuke | University of California, Los Angeles |
Zhu, Alvin | University of California Los Angeles |
Lin, Richard | UC Los Angeles |
Mehta, Ankur | UCLA |
Hong, Dennis | UCLA |
Keywords: Grippers and Other End-Effectors, Legged Robots, Climbing Robots
Abstract: In limbed robotics, end-effectors must serve dual functions, such as both feet for locomotion and grippers for grasping, which presents design challenges. This paper introduces a multi-modal end-effector capable of transitioning between flat and line foot configurations while providing grasping capabilities. MAGPIE integrates 8-axis force sensing using proposed mechanisms with hall effect sensors, enabling both contact and tactile force measurements. We present a computational design framework for our sensing mechanism that accounts for noise and interference, allowing for desired sensitivity and force ranges and generating ideal inverse models. The hardware implementation of MAGPIE is validated through experiments, demonstrating its capability as a foot and verifying the performance of the sensing mechanisms, ideal models, and gated network-based models.
|
|
16:55-17:00, Paper ThET14.5 | |
Single-Motor-Driven (4 + 2)-Fingered Robotic Gripper Capable of Expanding the Workable Space in the Extremely Confined Environment |
|
Nishimura, Toshihiro | Kanazawa University |
Akasaka, Keisuke | Kanazawa University |
Ishikawa, Subaru | Kanazawa University |
Watanabe, Tetsuyou | Kanazawa University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: This study proposes a novel robotic gripper that can expand workable spaces in a target environment to pick up objects from confined spaces. The proposed gripper is most effective for retrieving objects from deformable environments, such as taking an object out of a drawstring bag, or for extracting target objects located behind surrounding objects. The proposed gripper achieves both work-space expansion and grasping motion by using only a single motor. The gripper is equipped with four outer fingers for expanding the environment and two inner fingers for grasping an object. The inner and outer fingers move in different directions for their respective functions of grasping and spatial expansion. To realize two different movements of the fingers, a novel self-motion switching mechanism is developed that switches between acting as a feed-screw mechanism and as a rack-and-pinion mechanism. The mechanism switches the motions according to the magnitude of the force applied to the inner fingers. This paper presents the mechanism design of the developed gripper, including the self-motion switching mechanism and the actuation strategy for expanding the workable space. The mechanical analysis is also presented, and the analysis result is validated experimentally. Moreover, an automatic object-picking system using the developed gripper is constructed to evaluate the gripper.
|
|
17:00-17:05, Paper ThET14.6 | |
A Three-Finger Adaptive Gripper with Finger-Embedded Suction Cups for Enhanced Object Grasping Mechanism |
|
Yoon, Jimin | Sungkyunkwan University |
Jeong, Heeyeon | Sungkyunkwan University |
Park, Jae Hyeong | Sungkwunkwan University |
Gong, Young Jin | SungKyunKwan University(SKKU) |
Shin, Dongsu | Sungkyunkwan University |
Seo, Hyeon-Woong | Sungkyunkwan University |
Moon, Seung Jae | Sungkyunkwan, Mechanical Engineering, Robottory |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Grippers and Other End-Effectors, Grasping, Mechanism Design
Abstract: With the growth of logistics automation, there is an increasing demand for advanced grippers. This study presents a gripper that integrates suction cups into the fingertips to overcome the limitations of traditional robotic gripping methods. Designed with a 5-degree-of-freedom (DOF) structure, the gripper allows for angle adjustment of the suction cups, facilitating effective grasping in various environments. Its adaptive grasping mechanism simplifies control by using the fingertips and distal phalanges to cage objects without manually controlling them. The versatility of the gripper was tested by performing hybrid finger-suction gripping, as well as conventional finger and suction gripping. These advanced gripping strategies are designed to enhance flexibility and efficiency in logistics automation when handling a diverse range of objects.
|
|
ThET15 |
403 |
Datasets and Benchmarking |
Regular Session |
Chair: Xiao, Ted | Google DeepMind |
Co-Chair: Sintov, Avishai | Tel-Aviv University |
|
16:35-16:40, Paper ThET15.1 | |
Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities |
|
Mohan, Rohit | University of Freiburg |
Arce y de la Borbolla, José | University of Freiburg |
Mokhtar, Sassan | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Data Sets for Robotic Vision, Computer Vision for Medical Robotics, Medical Robots and Systems
Abstract: Safety and efficiency are paramount in healthcare facilities where the lives of patients are at stake. Despite the adoption of robots to assist medical staff in challenging tasks such as complex surgeries, human expertise is still indispensable. The next generation of autonomous healthcare robots hinges on their capacity to perceive and understand their complex and frenetic environments. While deep learning models are increasingly used for this purpose, they require extensive annotated training data which is impractical to obtain in real-world healthcare settings. To bridge this gap, we present Syn-Mediverse, the first hyper-realistic multimodal synthetic dataset of diverse healthcare facilities. Syn-Mediverse contains over 48,000 images from a simulated industry-standard optical tracking camera and provides more than 1.5M annotations spanning five different scene understanding tasks including depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. We demonstrate the complexity of our dataset by evaluating the performance on a broad range of state-of-the-art baselines for each task. To further advance research on scene understanding of healthcare facilities, along with the public dataset we provide an online evaluation benchmark available at http://syn-mediverse.cs.uni-freiburg.de.
|
|
16:40-16:45, Paper ThET15.2 | |
STEER: Flexible Robotic Manipulation Via Dense Language Grounding |
|
Smith, Laura | UC Berkeley |
Irpan, Alexander | Google |
Gonzalez Arenas, Montserrat | Google |
Kirmani, Sean | Google DeepMind |
Kalashnikov, Dmitry | Google Brain |
Shah, Dhruv | Google DeepMind |
Xiao, Ted | Google DeepMind |
Keywords: Learning from Demonstration, Data Sets for Robot Learning, Big Data in Robotics and Automation
Abstract: The complexity of the real world demands robotic systems that can intelligently adapt to unseen situations. We present STEER, a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control. Our approach translates complex situational awareness into actionable low-level behavior through training language-grounded policies with dense annotation. By structuring policy training around fundamental, modular manipulation skills expressed in natural language, STEER exposes an expressive interface for humans or Vision-Language Models (VLMs) to intelligently orchestrate the robot's behavior by reasoning about the task and context. Our experiments demonstrate the skills learned via STEER can be combined to synthesize novel behaviors to adapt to new situations or perform completely new tasks without additional data collection or training.
|
|
16:45-16:50, Paper ThET15.3 | |
MBE-ARI: A Multimodal Dataset Mapping Bi-Directional Engagement in Animal-Robot Interaction |
|
Noronha, Ian | Purdue University |
Jawaji, Advait Prasad | Purdue University |
Soto, Juan | Purdue University |
An, Jiajun | The Chinese University of Hong Kong |
Gu, Yan | Purdue University |
Kaur, Upinder | Purdue University |
Keywords: Gesture, Posture and Facial Expressions, Data Sets for Robot Learning, Multi-Modal Perception for HRI
Abstract: Animal-robot interaction (ARI) remains an unexplored challenge in robotics, as robots struggle to interpret the complex, multimodal communication cues of animals, such as body language, movement, and vocalizations. Unlike human-robot interaction, which benefits from established datasets and frameworks, animal-robot interaction lacks the foundational resources needed to facilitate meaningful bidirectional communication. To bridge this gap, we present the MBE-ARI (Multimodal Bidirectional Engagement in Animal-Robot Interaction), a novel multimodal dataset that captures detailed interactions between a legged robot and cows. The dataset includes synchronized RGB-D streams from multiple viewpoints, annotated with body pose and activity labels across interaction phases, offering an unprecedented level of detail for ARI research. Additionally, we introduce a full-body pose estimation model tailored for quadruped animals, capable of tracking 39 keypoints with a mean average precision (mAP) of 92.7%, outperforming existing benchmarks in animal pose estimation. The MBE-ARI dataset and our pose estimation framework lay a robust foundation for advancing research in animal-robot interaction, providing essential tools for developing perception, reasoning, and interaction frameworks needed for effective collaboration between robots and animals. The dataset and resources are publicly available at https://github.com/RISELabPurdue/MBE-ARI/, inviting further exploration and development in this critical area.
|
|
16:50-16:55, Paper ThET15.4 | |
A Diffusion-Based Data Generator for Training Object Recognition Models in Ultra-Range Distance |
|
Bamani Beeri, Eran | Tel Aviv University |
Nissinman, Eden | Tel-Aviv University |
Koenigsberg, Lisa | Tel-Aviv University |
Meir, Inbar | Tel Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Data Sets for Robotic Vision, Gesture, Posture and Facial Expressions, Recognition
Abstract: Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction where the user exhibits directive gestures at a distance of up to 25 m from the robot. However, training a model to recognize hardly visible objects located in ultra-range requires an exhaustive collection of a significant amount of labeled samples. The generation of synthetic training datasets is a recent solution to the lack of real-world data, but it fails to properly replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework based on a Diffusion model to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models, showcasing superiority both in fidelity and in recognition success rate when training a URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-based URGR model is also demonstrated in gesture-based direction of a ground robot.
|
|
16:55-17:00, Paper ThET15.5 | |
MovingCables: Moving Cable Segmentation Method and Dataset |
|
Holesovsky, Ondrej | Czech Technical University in Prague |
Skoviera, Radoslav | Czech Institute of Informatics, Robotics, and Cybernetics; Czech |
Hlavac, Vaclav | Czech Technical University in Prague |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Manipulating cluttered cables, hoses or ropes is challenging for both robots and humans. Humans often simplify these perceptually challenging tasks by pulling or pushing tangled cables and observing the resulting motions. We propose to use a similar interactive perception principle to aid robotic cable manipulation. A fundamental building block of such an endeavor is a cable motion segmentation method that densely labels moving cable image pixels. This letter presents MovingCables, a moving cable dataset, which we hope will motivate the development and evaluation of cable motion segmentation algorithms. The dataset consists of real-world image sequences automatically annotated with ground truth segmentation masks and optical flow. In addition, we propose a cable motion segmentation method and evaluate its performance on the new dataset.
|
|
ThET16 |
404 |
Soft Sensors |
Regular Session |
Chair: Stuart, Hannah | UC Berkeley |
Co-Chair: Monje, Concepción A. | University Carlos III of Madrid |
|
16:35-16:40, Paper ThET16.1 | |
Dynamic Contact Force Estimation Via Integration of Soft Sensor Based on Fiber Bragg Grating and Series Elastic Actuator |
|
Na, Hyunbin | DGIST |
Lee, Hyunwook | Gyeongsang National University |
Park, Chang Hyun | Pusan National University |
Kim, Gyeong Hun | Pusan National University |
Kim, Chang-Seok | Pusan National University |
Oh, Sehoon | DGIST |
Keywords: Force and Tactile Sensing, Compliant Joints and Mechanisms, Flexible Robotics
Abstract: Research on interactive force measurement in robotics follows two trends: distributed force sensing using soft tactile sensors and centered force sensing using rigid sensors. This study proposes a novel force sensing mechanism and algorithm that integrate the two approaches, taking advantage of a soft tactile sensor and a spring-based rigid actuator. Soft tactile sensors allow for gentle contact with humans but have a limited recovery and measurable force range. The rigidity of the spring-based actuator is utilized to address these force estimation issues, allowing a wider range of forces to be estimated while maintaining the softness. The paper presents a novel approach for integrating the two sensors using sophisticated algorithms. Specifically, a deep neural network is developed to estimate the contact location through the tactile sensor. Subsequently, a state-space observer is proposed based on the dynamic characteristics of the robot link, which integrates the network output and the torque measurements obtained from the spring-based actuator. This algorithm provides accurate force estimation during dynamic behavior and enables a wide measurable force range across the entire area of the robot link. The efficacy of the proposed mechanism and algorithm is validated through rigorous experimentation, demonstrating fast recovery characteristics and high accuracy.
|
|
16:40-16:45, Paper ThET16.2 | |
A Piezoresistive Printable Strain Sensor for Monitoring and Control of Soft Robotic Links |
|
Sánchez, Claudia | University Carlos III of Madrid |
Rodriguez, Daniel | AIMPLAS |
Otero, Susana | AIMPLAS |
Monje, Concepción A. | University Carlos III of Madrid |
Keywords: Soft Sensors and Actuators, Additive Manufacturing, Flexible Robotics
Abstract: Integrating sensors into soft links with complex geometries without compromising their flexibility, precision, or structural integrity remains one of the main challenges in soft robotics. This article presents the design, fabrication, and electromechanical evaluation of a 3D-printed flexible strain sensor tailored for monitoring and controlling these links. By combining Fused Filament Fabrication (FFF) and Direct Ink Writing (DIW) technologies, we manufactured a sensor composed of a thermoplastic polyurethane (TPU) substrate and a pattern of silver (Ag) nanoparticles ink, ensuring high flexibility and conductivity. We performed electromechanical tests to assess the sensor's performance, including three-point bending tests, cyclic loading to evaluate its durability, and angular deflection measurements to confirm its precision in detecting bending angles. The sensor demonstrated efficient piezoresistive behavior within a defined working range between 3% and 8% of flexure strain with a Gauge Factor (GF) of 0.24 and stable repeatability. We also tested its integration into a soft link, showing that the sensor maintains flexibility and accuracy during deformation.
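To make the reported sensitivity figure concrete, the Gauge Factor relates the relative resistance change to the applied flexure strain. The sketch below shows how such a GF would be computed from raw readings; the resistance values are hypothetical illustrations, not measurements from the paper.

```python
# Minimal sketch: computing a piezoresistive Gauge Factor from raw readings.
# The numeric values below are hypothetical, chosen only to reproduce the
# reported order of magnitude (GF ~ 0.24 within the 3-8% strain range).

def gauge_factor(r0_ohm: float, r_ohm: float, strain: float) -> float:
    """GF = (dR / R0) / strain, with strain as a fraction (0.05 = 5%)."""
    return ((r_ohm - r0_ohm) / r0_ohm) / strain

# Example: resistance rising from 100.0 to 101.2 Ohm at 5% flexure strain
print(gauge_factor(100.0, 101.2, 0.05))  # -> 0.24
```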
|
|
16:45-16:50, Paper ThET16.3 | |
AnySkin: Plug-And-Play Skin Sensing for Robotic Touch |
|
Bhirangi, Raunaq Mahesh | New York University |
Pattabiraman, Venkatesh | New York University |
Erciyes, Mehmet Enes | New York University |
Cao, Yifeng | Columbia University |
Hellebrekers, Tess | Meta AI Research |
Pinto, Lerrel | New York University |
Keywords: Soft Sensors and Actuators, Sensorimotor Learning, Transfer Learning
Abstract: While tactile sensing is widely accepted as an important and useful sensing modality, its use pales in comparison to other sensory modalities like vision and proprioception. AnySkin addresses the critical challenges that impede the use of tactile sensing -- versatility, replaceability, and data reusability. Building on the simple design of ReSkin and decoupling the sensing electronics from the sensing interface, AnySkin simplifies integration, making it as straightforward as putting on a phone case and connecting a charger. Furthermore, AnySkin is the first uncalibrated tactile sensor with cross-instance generalizability of learned manipulation policies. To summarize, this work makes three key contributions: first, we introduce a streamlined fabrication process and a design tool for creating an adhesive-free, durable, and easily replaceable magnetic tactile sensor; second, we characterize slip detection and policy learning with the AnySkin sensor; and third, we demonstrate zero-shot generalization of models trained on one instance of AnySkin to new instances, and compare it with popular existing tactile solutions like DIGIT and ReSkin. Code, design files, and videos of policy experiments can be found at https://any-skin.github.io
|
|
16:50-16:55, Paper ThET16.4 | |
Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation |
|
Yin, Jessica | University of Pennsylvania |
Shah, Paarth | University of Oxford |
Kuppuswamy, Naveen | Toyota Research Institute |
Beaulieu, Andrew | Toyota Research Institute |
Uttamchandani, Avinash | Toyota Research Institute |
Castro, Alejandro | Toyota Research Institute |
Pikul, James | University of Pennsylvania |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Visuotactile sensors are a popular tactile sensing strategy due to high-fidelity estimates of local object geometry. However, existing algorithms for processing raw sensor inputs to useful intermediate signals such as contact patches struggle in high-deformation regimes. This is due to physical constraints imposed by sensor hardware and small-deformation assumptions used by mechanics-based models. In this work, we propose a fusion algorithm for proximity and visuotactile point clouds for contact patch segmentation, entirely independent from membrane mechanics. This algorithm exploits the synchronous, high spatial resolution proximity and visuotactile modalities enabled by an extremely deformable, selectively transmissive soft membrane, which uses visible light for visuotactile sensing and infrared light for proximity depth. We evaluate our contact patch algorithm in low (10%), medium (60%), and high (100%+) strain states. We compare our method against three baselines: proximity-only, tactile-only, and a first principles mechanics model. Our approach outperforms all baselines with an average RMSE under 2.8 mm of the contact patch geometry across all strain ranges. We demonstrate our contact patch algorithm in four applications: varied stiffness membranes, torque and shear-induced wrinkling, closed loop control, and pose estimation.
|
|
16:55-17:00, Paper ThET16.5 | |
Spatial Sensitivity Equalization of ERT-Based Robotic Skin through Gauge Factor Distribution Optimization |
|
Cho, Junhwi | KAIST |
Chung, Hyunjo | Korea Advanced Institute of Science and Technology (KAIST) |
Park, Kyungseo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Kim, Jung | KAIST |
Keywords: Force and Tactile Sensing, Touch in HRI, Soft Sensors and Actuators
Abstract: Electrical Resistance Tomography (ERT) has emerged as a promising technology for large-area robotic skin due to its ability to reconstruct pressure distribution over extensive regions using a few sparsely distributed electrodes. Despite ERT’s potential to reconstruct the external forces applied on 3D surfaces, the uneven distribution of spatial sensitivity leads to significant errors in identifying the physical quantities of contacts, inhibiting this technique from being an effective tactile sensor. To address this issue, this paper proposes a method to equalize the spatial sensitivity by modulating the conductivity of ERT sensors through topology optimization. In a simulation environment, the sensor's conductive domain was converted into a binary image and optimized to equalize spatial sensitivity and reduce disparities between low and high-sensitivity areas. Additionally, we present a sensor fabrication method with a complex optimized conductive patch pattern from simulation by applying screen printing techniques. The effectiveness of the implemented spatial sensitivity equalization was validated by comparing it to a conventional ERT sensor in both simulations and real-world environments. The proposed sensitivity optimization method expands the use of ERT-based sensors for distributed tactile sensing in physical human-robot interaction scenarios.
|
|
17:00-17:05, Paper ThET16.6 | |
Milli-Scale AcousTac Sensing Using Soft Helmholtz Resonators |
|
Aderibigbe, Jadesola | University of California, Berkeley |
Li, Monica | Yale University |
Lee, Jungpyo | University of California, Berkeley |
Stuart, Hannah | UC Berkeley |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing, Soft Robot Applications
Abstract: Acoustic transmission, or sound, can effectively communicate information over distances through various media. We focus on generating acoustic transmission using pneumatically driven resonators for wireless tactile sensing without the need for any electronics at the end-effector or contact point. We explore the relationship between emitted frequency and the geometry of the resonance chamber. When a normal compressive force is applied to the end cap, the compliant resonant cavity deforms, leading to an increase in frequency measurable by an external microphone. Prior work uses tube resonators with fipple attachments. In the present work, we study whether a different smaller audible cylindrical resonator with air blown across the entryway can be utilized instead. We test the utility of the Helmholtz resonator model in predicting the experimental frequency response. Resonance is often modeled for rigid cavities, presenting unique challenges in predicting resonance for the design of soft resonating taxels.
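The abstract tests the Helmholtz resonator model against the measured frequency response. For reference, the classical lumped (rigid-cavity) estimate depends only on neck area, neck length, and cavity volume; a minimal sketch is given below, with all dimensions hypothetical milli-scale values rather than the paper's designs.

```python
import math

def helmholtz_frequency(neck_area_m2: float, neck_length_m: float,
                        cavity_volume_m3: float, c: float = 343.0) -> float:
    """Classical Helmholtz estimate: f = (c / 2*pi) * sqrt(A / (V * L)).
    Assumes a rigid cavity and neglects end corrections; a compliant taxel
    deviates from this, which is the frequency shift the sensor exploits."""
    return (c / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m))

# Hypothetical geometry: 1 mm diameter neck, 2 mm long, 50 mm^3 cavity
neck_area = math.pi * (0.5e-3) ** 2
print(helmholtz_frequency(neck_area, 2e-3, 50e-9))  # ~4.8 kHz, audible range
```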
|
|
17:05-17:10, Paper ThET16.7 | |
Enhanced Model-Free Dynamic State Estimation for a Soft Robot Finger Using an Embedded Optical Waveguide Sensor |
|
Krauss, Henrik | Keio University, Faculty of Science and Technology |
Takemura, Kenjiro | Keio University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Machine Learning for Robot Control
Abstract: In this letter, an advanced stretchable optical waveguide sensor is implemented into a multidirectional PneuNet soft actuator to enhance dynamic state estimation through a NARX neural network. The stretchable waveguide featuring a semidivided core design from previous work is sensitive to multiple strain modes. It is integrated into a soft finger actuator with two pressure chambers that replicates human finger motions. The soft finger, designed for applications in soft robotic grippers or hands, is viewed in isolation under pneumatic actuation controlled by motorized linear stages. The research first characterizes the soft finger's workspace and sensor response. Subsequently, three dynamic state estimators are developed using NARX architecture, differing in the degree of incorporating the optical waveguide sensor response. Evaluation on a testing path reveals that the full sensor response significantly improves end effector position estimation, reducing mean error by 51% from 5.70 mm to 2.80 mm, compared to only 21% improvement to 4.53 mm using the estimator representing a single core waveguide design. The letter concludes by discussing the application of these estimators for (open-loop) model-predictive control and recommends future focus on advanced, structured soft (optical) sensors for model-free state estimation and control of soft robots.
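The estimators above follow a NARX structure, regressing the next end-effector state on lagged outputs and inputs. The sketch below only shows how such lagged regressors are assembled; the data are synthetic placeholders and a plain least-squares fit stands in for the NARX neural network used in the paper.

```python
import numpy as np

def build_narx_regressors(y: np.ndarray, u: np.ndarray, ny: int, nu: int):
    """Stack [y_{t-1..t-ny}, u_{t-1..t-nu}] as features to predict y_t.
    y: (T, dy) past outputs (e.g. fingertip position),
    u: (T, du) inputs (e.g. chamber pressures and waveguide sensor channels)."""
    lag = max(ny, nu)
    rows, targets = [], []
    for t in range(lag, len(y)):
        past_y = y[t - ny:t][::-1].ravel()   # most recent output first
        past_u = u[t - nu:t][::-1].ravel()
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    return np.asarray(rows), np.asarray(targets)

# Hypothetical data; linear least squares as a stand-in for the NARX network
T = 500
y = np.cumsum(np.random.randn(T, 3) * 0.1, axis=0)   # fake 3-D tip position
u = np.random.randn(T, 4)                            # fake pressures + sensor channels
X, Y = build_narx_regressors(y, u, ny=3, nu=3)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("one-step prediction:", X[-1] @ W)
```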
|
|
ThET17 |
405 |
Design and Control |
Regular Session |
Chair: Le Goff, Leni Kenneth | Edinburgh Napier University |
Co-Chair: Padir, Taskin | Northeastern University |
|
16:35-16:40, Paper ThET17.1 | |
Efficient and Diverse Generative Robot Designs Using Evolution and Intrinsic Motivation |
|
Le Goff, Leni Kenneth | UPMC |
Smith, Simón C. | Edinburgh Napier University |
Keywords: Evolutionary Robotics, Methods and Tools for Robot System Design, Embodied Cognitive Science
Abstract: Methods for generative design of robot physical configurations can automatically find optimal and innovative solutions for challenging tasks in complex environments. The vast search-space includes the physical design-space and the controller parameter-space, making it a challenging problem in machine learning and optimisation in general. Evolutionary algorithms (EAs) have shown promising results in generating robot designs via gradient-free optimisation. Morpho-evolution with learning (MEL) uses EAs to concurrently generate robot designs and learn the optimal parameters of the controllers. Two main issues prevent MEL from scaling to higher complexity tasks: i) computational cost and ii) premature convergence to sub-optimal designs. To address these issues, we propose combining morpho-evolution with intrinsic motivations. Intrinsically motivated behaviour arises from embodiment and simple learning rules without external guidance. We use a homeokinetic controller that generates exploratory behaviour in a few seconds with minimal knowledge of the robot’s design. Homeokinesis replaces costly learning phases, reducing computational time and favouring diversity, preventing premature convergence. We compare our approach with current MEL methods in several downstream tasks. The generated designs score higher in all the tasks, are more diverse, and are quickly generated compared to morpho-evolution with static parameters. Source and containers available at github.com/AutonomousRoboticEvolution.
|
|
16:40-16:45, Paper ThET17.2 | |
A Novel Hybrid Hysteresis Modeling Method for Multiloop-Asymmetry Hysteresis Behavior of Nonlinear Compliant Actuators |
|
Zhou, Libo | Zhejiang University of Technology |
Xu, Lingpeng | Zhejiang University of Technology |
Ou, Linlin | Zhejiang University of Technology |
Yu, Xinyi | Zhejiang University of Technology |
Feng, Yalei | Midea Group |
Bai, Shaoping | Aalborg University |
Keywords: Compliant Joints and Mechanisms, Prosthetics and Exoskeletons, Wearable Robotics
Abstract: Nonlinear compliant actuators are increasingly used in human-robot interaction scenarios due to their inherent flexibility. However, they exhibit nonlinear hysteresis, which degrades force/torque tracking performance if not modeled accurately. Moreover, existing methods struggle to handle multi-loop asymmetric hysteresis. In this work, we present a novel modeling method in which the hysteresis curves are decoupled into nonlinear reference lines and symmetrical hysteresis loops. A hybrid hysteresis model based on a power function and the Maxwell-slip model is then developed to fit the nonlinear reference lines and the symmetrical hysteresis loops, respectively. Experiments were conducted on a nonlinear compliant actuator, and the results show that the root-mean-square error (RMSE) of the hysteresis model decreases by 24.4% compared with the Maxwell-slip-based hysteresis model.
|
|
16:45-16:50, Paper ThET17.3 | |
Dynamic Mode Decomposition with Sonomyography and Electromyography for Predictive Modeling of Lower Limb Exoskeleton Walking |
|
Lambeth, Krysten | North Carolina State University |
Xue, Xiangming | North Carolina State University |
Singh, Mayank | North Carolina State University |
Huang, He (Helen) | North Carolina State University and University of North Carolina |
Sharma, Nitin | North Carolina State University |
Keywords: Model Learning for Control, Prosthetics and Exoskeletons, Rehabilitation Robotics
Abstract: The nonlinear dynamics required to model walking with multi-joint lower limb exoskeleton assistance result in a high computational burden. To address this, we derive a Koopman-based linearized model of the human-exoskeleton system using electromyography and ultrasound-derived metrics of volitional muscle activity during exoskeleton-assisted walking. Data are collected from one participant with spinal cord injury (SCI) and two participants with no disabilities. Various electromyography and ultrasound-derived features, in addition to normalized motor currents, are used to derive predictive models, and we identify which muscle activation metrics produce the most accurate model for each subject. For both subjects without disabilities, the most accurate model uses only ultrasound-derived echogenicity as a metric of muscle activity, while the most accurate model for the subject with SCI uses only EMG waveform length. Furthermore, the inclusion of ground reaction force increases the prediction accuracy of all models for one participant with no disabilities while decreasing the accuracy of most models for the participant with SCI. For all subjects, the most accurate subject-specific linear model has a root-mean-square error (averaged across limb segment angles) of <8°.
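The Koopman-based linearization described above amounts to fitting a best linear operator between successive (lifted) state snapshots with control inputs. A minimal dynamic-mode-decomposition-with-control sketch is shown below; the snapshot data and feature choices are hypothetical placeholders, not the paper's EMG/ultrasound pipeline.

```python
import numpy as np

def fit_linear_operator(Z, U):
    """Least-squares fit of z_{k+1} ~= A z_k + B u_k (DMD with control).
    Z: (n, T) lifted-state snapshots, U: (m, T-1) inputs (e.g. motor currents)."""
    X, Xp = Z[:, :-1], Z[:, 1:]
    Omega = np.vstack([X, U])            # stacked [state; input] snapshots
    G = Xp @ np.linalg.pinv(Omega)       # [A B] = X' * pinv([X; U])
    n = Z.shape[0]
    return G[:, :n], G[:, n:]

# Hypothetical snapshots: limb-segment angles plus muscle-activity features
Z = np.random.randn(6, 200)
U = np.random.randn(2, 199)
A, B = fit_linear_operator(Z, U)
print(A.shape, B.shape)   # (6, 6) (6, 2)
```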
|
|
16:50-16:55, Paper ThET17.4 | |
Data-Driven Sampling Based Stochastic MPC for Skid-Steer Mobile Robot Navigation |
|
Trivedi, Ananya | Northeastern University |
Prajapati, Sarvesh | Northeastern University |
Shirgaonkar, Anway Prasad | Northeastern University |
Zolotas, Mark | Toyota Research Institute |
Padir, Taskin | Northeastern University |
Keywords: Model Learning for Control, Planning under Uncertainty, Robust/Adaptive Control
Abstract: Traditional approaches to motion modeling for skid-steer robots struggle to capture nonlinear tire-terrain dynamics, especially during high-speed maneuvers. In this paper, we tackle such nonlinearities by enhancing a dynamic unicycle model with Gaussian Process (GP) regression outputs. This enables us to develop an adaptive, uncertainty-informed navigation formulation. We solve the resultant stochastic optimal control problem using a chance-constrained Model Predictive Path Integral (MPPI) control method. This approach formulates obstacle avoidance and path-following as chance constraints, accounting for residual uncertainties from the GP to ensure safety and reliability in control. Leveraging GPU acceleration, we efficiently manage the non-convex nature of the problem, ensuring real-time performance. Our approach unifies path-following and obstacle avoidance across different terrains, unlike prior works which typically focus on one or the other. We compare our GP-MPPI method against unicycle and data-driven kinematic models within the MPPI framework. In simulations, our approach shows superior tracking accuracy and obstacle avoidance. We further validate our approach through hardware experiments on a skid-steer robot platform, demonstrating its effectiveness in high-speed navigation. The GPU implementation of the proposed method and supplementary video footage are available at https://stochasticmppi.github.io.
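The motion model above augments a dynamic unicycle with GP-predicted residuals before rolling it out inside MPPI. The sketch below shows one Euler-integrated rollout step with a placeholder residual term; the first-order lag constants and the residual function are stand-ins, not the paper's trained GP or identified dynamics.

```python
import numpy as np

def unicycle_step(state, cmd, dt, residual_fn):
    """state = [x, y, yaw, v, omega]; cmd = [v_cmd, omega_cmd].
    residual_fn stands in for the GP mean correction on [v_dot, omega_dot]."""
    x, y, yaw, v, omega = state
    v_dot_nom = (cmd[0] - v) / 0.5        # hypothetical first-order velocity lag
    w_dot_nom = (cmd[1] - omega) / 0.3    # hypothetical first-order yaw-rate lag
    dv, dw = residual_fn(state, cmd)      # learned tire/terrain correction
    v += (v_dot_nom + dv) * dt
    omega += (w_dot_nom + dw) * dt
    x += v * np.cos(yaw) * dt
    y += v * np.sin(yaw) * dt
    yaw += omega * dt
    return np.array([x, y, yaw, v, omega])

zero_residual = lambda s, u: (0.0, 0.0)   # replace with a GP mean prediction
print(unicycle_step(np.zeros(5), np.array([1.0, 0.2]), 0.05, zero_residual))
```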
|
|
16:55-17:00, Paper ThET17.5 | |
Agile Mobility with Rapid Online Adaptation Via Meta-Learning and Uncertainty-Aware MPPI |
|
Kalaria, Dvij | Carnegie Mellon University |
Xue, Haoru | University of California Berkeley |
Xiao, Wenli | Carnegie Mellon University |
Tao, Tony | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Keywords: Robust/Adaptive Control, Machine Learning for Robot Control, Representation Learning
Abstract: Modern non-linear model-based controllers require an accurate physics model and model parameters to be able to control mobile robots at their limits. Also, due to surface slipping at high speeds, the friction parameters may continually change (like tire degradation in autonomous racing), and the controller may need to adapt rapidly. Many works derive a task-specific robot model with a parameter adaptation scheme that works well for the task but requires a lot of effort and tuning for each platform and task. In this work, we design a fully model-learning-based controller based on meta pre-training that can adapt very quickly, using few-shot dynamics data, to any wheel-based robot with any model parameters, while also reasoning about model uncertainty. We demonstrate our results in small-scale numerical simulation, in the large-scale Unity simulator, and on a medium-scale hardware platform with a wide range of settings. We show that our results are comparable to domain-specific, well-engineered controllers and that our controller has excellent generalization performance across all scenarios.
|
|
17:00-17:05, Paper ThET17.6 | |
Variable Transmission Mechanisms for Robotic Applications: A Review |
|
Park, Jihyuk | Yeungnam University |
Lee, Joon | Sogang University |
Seo, Hyung-Tae | Kyonggi University |
Jeong, Seokhwan | Mechanical Eng., Sogang University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: Actuators play a crucial role in robotics, determining the force and speed capabilities necessary for varied tasks, directly affecting the performance of the robotic system. With the growing reliance on robotics in both industrial applications and daily life, innovative actuator research has expanded significantly. Despite advances, traditional actuators encounter limitations in performance and operational range due to inherent physical constraints. To address these challenges, variable transmission mechanisms (VTMs) have emerged over the past decade as an alternative solution, enhancing the adaptability and efficiency of robotic systems. However, there is currently a lack of survey articles that comprehensively cover the mechanisms and working principles of VTMs in robotics. This review article fills this gap by offering an extensive analysis of VTM applications in robotics. It categorizes VTMs based on their mechanisms and principles, presents case studies on both commercial and experimental VTMs, and provides insights into future research directions.
|
|
17:05-17:10, Paper ThET17.7 | |
Continuously Variable Transmission and Stiffness Actuator Based on Actively Variable Four-Bar Linkage for Highly Dynamic Robot Systems |
|
Hur, Jungwoo | Sogang University |
Song, Hangyeol | Georgia Institute of Technology |
Jeong, Seokhwan | Mechanical Eng., Sogang University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: This paper presents a novel actuation mechanism that combines a continuously variable transmission (CVT) mechanism with a variable stiffness actuator (VSA) for highly dynamic robot systems such as legged robots. The CVT effectively changes the input-output transmission ratio of the system, thereby extending the operational torque-speed range. Concurrently, the VSA adjusts the system stiffness, altering its compliance characteristics. Both CVT and VSA are seamlessly integrated into a single four-bar linkage mechanism, with their active features enabled by an actively variable link within this linkage. This CVT-VSA mechanism offers a range of dynamic advantages by inversely varying transmission ratio and stiffness, which includes impact mitigation, torque or speed amplification, and expanded control bandwidth. The implementation and efficacy of the CVT-VSA mechanism in a legged robot were tested and validated through a series of experiments.
|
|
ThET18 |
406 |
Planning under Uncertainty 3 |
Regular Session |
Chair: Kurniawati, Hanna | Australian National University |
Co-Chair: Fridovich-Keil, David | The University of Texas at Austin |
|
16:35-16:40, Paper ThET18.1 | |
A Data-Driven Aggressive Autonomous Racing Framework Utilizing Local Trajectory Planning with Velocity Prediction |
|
Li, Zhouheng | Zhejiang University |
Zhou, Bei | Zhejiang University |
Hu, Cheng | Zhejiang University |
Xie, Lei | Zhejiang University |
Su, Hongye | Zhejiang University |
Keywords: Integrated Planning and Learning, Integrated Planning and Control, Constrained Motion Planning
Abstract: The development of autonomous driving has boosted the research on autonomous racing. However, existing local trajectory planning methods have difficulty planning trajectories with optimal velocity profiles at racetracks with sharp corners, thus weakening the performance of autonomous racing. To address this problem, we propose a local trajectory planning method that integrates Velocity Prediction based on Model Predictive Contouring Control (VPMPCC). The optimal parameters of VPMPCC are learned through Bayesian Optimization (BO) based on a proposed novel Objective Function adapted to Racing (OFR). Specifically, VPMPCC achieves velocity prediction by encoding the racetrack as a reference velocity profile and incorporating it into the optimization problem. This method optimizes the velocity profile of local trajectories, especially at corners with significant curvature. The proposed OFR balances racing performance with vehicle safety, ensuring safe and efficient BO training. In the simulation, the number of training iterations for OFR-based BO is reduced by 42.86% compared to the state-of-the-art method. The optimal simulation-trained parameters are then applied to a real-world F1TENTH vehicle without retraining. During prolonged racing on a custom-built racetrack featuring significant sharp corners, the mean projected velocity of VPMPCC reaches 93.18% of the vehicle's handling limits. The released code is available at https://github.com/zhouhengli/VPMPCC.
|
|
16:40-16:45, Paper ThET18.2 | |
RLPP: A Residual Method for Zero-Shot Real-World Autonomous Racing on Scaled Platforms |
|
Ghignone, Edoardo | ETH |
Baumann, Nicolas | ETH |
Hu, Cheng | Zhejiang University |
Wang, Jonathan | ETH Zurich |
Xie, Lei | Zhejiang University |
Carron, Andrea | ETH Zurich |
Magno, Michele | ETH Zurich |
Keywords: Field Robots, Reinforcement Learning, Wheeled Robots
Abstract: Autonomous racing presents a complex environment requiring robust controllers capable of making rapid decisions under dynamic conditions. While traditional controllers based on tire models are reliable, they often demand extensive tuning or system identification. Reinforcement learning (RL) methods offer significant potential due to their ability to learn directly from interaction, yet they typically suffer from the sim-to-real gap, where policies trained in simulation fail to perform effectively in the real world. In this paper, we propose RLPP, a residual RL framework that enhances a Pure Pursuit (PP) controller with an RL-based residual. This hybrid approach leverages the reliability and interpretability of PP while using RL to fine-tune the controller's performance in real-world scenarios. Extensive testing on the F1TENTH platform demonstrates that RLPP improves the lap times of the baseline controllers by up to 6.37%, closing the gap to state-of-the-art (SotA) methods by more than 52% and providing reliable performance in zero-shot real-world deployment, overcoming key challenges associated with the sim-to-real transfer and reducing the performance gap from simulation to reality by more than 8-fold compared to the baseline RL controller. The RLPP framework is made available as an open-source tool, encouraging further exploration and advancement in autonomous racing research. The code is available at: www.github.com/forzaeth/rlpp.
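The residual structure keeps the geometric Pure Pursuit command and lets the learned policy add a bounded correction on top. A minimal sketch of that idea is shown below, using the standard bicycle-model PP steering law and a clipped residual; the lookahead, wheelbase, and residual bound are hypothetical values, not the paper's tuning.

```python
import math

def pure_pursuit_steer(alpha: float, lookahead: float, wheelbase: float) -> float:
    """Geometric Pure Pursuit: steer toward a lookahead point at bearing alpha."""
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

def rlpp_steer(alpha, lookahead, wheelbase, residual, max_residual=0.1):
    """Base PP command plus a clipped learned residual (radians)."""
    base = pure_pursuit_steer(alpha, lookahead, wheelbase)
    return base + max(-max_residual, min(max_residual, residual))

# Hypothetical F1TENTH-scale numbers: 0.33 m wheelbase, 1.5 m lookahead
print(rlpp_steer(alpha=0.2, lookahead=1.5, wheelbase=0.33, residual=0.05))
```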
|
|
16:45-16:50, Paper ThET18.3 | |
Uncertainty-Aware Probabilistic Risk Quantification of SOTIF for Autonomous Vehicles |
|
Yao, Botao | Harbin Institute of Technology |
Huang, Shuohan | Harbin Institute of Technology |
Liu, Chuanyi | Harbin Institute of Technology |
Han, Peiyi | Harbin Institute of Technology |
Lin, Jie | Harbin Institute of Technology |
Duan, Shaoming | Pengcheng Laboratory |
Keywords: Collision Avoidance, Intelligent Transportation Systems, Motion and Path Planning
Abstract: Ensuring the Safety of the Intended Functionality (SOTIF) for autonomous vehicles (AVs) is critical. Effective risk assessment helps AVs make decisions and avoid risks. However, existing methods face challenges due to environmental uncertainties, insufficient multi-dimensional risk quantification, and limited predictive accuracy. To address this challenge, we propose an uncertainty-aware probabilistic risk assessment framework that quantifies the risk of AVs violating safety constraints and calculates the expected average severity of such violations in uncertain environments. We first establish a general SOTIF risk model to characterize the static risk of the AV and surrounding traffic participants. Following this, we introduce a method for predicting dynamic uncertainty risks, resulting in probabilistic risk quantification. This framework accounts for multi-dimensional uncertainties and enhances safety under dynamic conditions. Extensive evaluations across typical traffic scenarios—including highways, intersections, and roundabouts—demonstrate that our method outperforms typical algorithms like Time Headway (THW) and Time-to-Collision (TTC). Empirical studies in extreme scenarios further validate the framework's ability to reduce risks and improve system generalization. The related code is available at: https://github.com/idslab-autosec/risk_uncertainty.
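The THW and TTC baselines mentioned above are simple kinematic surrogates for risk. For reference, their standard definitions are sketched below with hypothetical values; this is only an illustration of the baselines, not the paper's probabilistic risk model.

```python
def time_headway(gap_m: float, v_ego_mps: float) -> float:
    """THW: time for the ego vehicle to cover the current gap at its own speed."""
    return float('inf') if v_ego_mps <= 0 else gap_m / v_ego_mps

def time_to_collision(gap_m: float, v_ego_mps: float, v_lead_mps: float) -> float:
    """TTC: time until contact assuming constant speeds; infinite if not closing."""
    closing = v_ego_mps - v_lead_mps
    return float('inf') if closing <= 0 else gap_m / closing

print(time_headway(30.0, 15.0), time_to_collision(30.0, 15.0, 10.0))  # 2.0 s, 6.0 s
```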
|
|
16:50-16:55, Paper ThET18.4 | |
Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions |
|
Hu, Haimin | Princeton University |
Fernández Fisac, Jaime | Princeton University |
Leonard, Naomi | Princeton University |
Gopinath, Deepak | Northwestern University |
DeCastro, Jonathan | Cornell University |
Rosman, Guy | Massachusetts Institute of Technology |
Keywords: Motion and Path Planning, Human-Aware Motion Planning, Learning from Demonstration
Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking “corridor” opens. While an expert human can do well at making such time-sensitive decisions, autonomous agents are incapable of rapidly reasoning about complex, potentially conflicting options, leading to suboptimal behaviors such as deadlocks. Recently, the nonlinear opinion dynamics (NOD) model has proven to exhibit fast opinion formation and avoidance of decision deadlocks. However, NOD modeling parameters are oftentimes assumed fixed, limiting their applicability in complex and dynamic environments. It remains an open challenge to determine such parameters automatically and adaptively, accounting for the ever-changing environment. In this work, we propose for the first time a learning-based and game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. We demonstrate Neural NOD’s ability to make fast and deadlock-free decisions in a simulated autonomous racing example. We find that Neural NOD consistently outperforms the state-of-the-art data-driven inverse game baseline in terms of safety and overtaking performance.
|
|
16:55-17:00, Paper ThET18.5 | |
Online Risk-Bounded Graph-Based Local Planning for Autonomous Driving with Theoretical Guarantees |
|
Ahmad, Abdulrahman | Khalifa University of Science and Technology |
Khonji, Majid | Khalifa University |
Elbassioni, Khaled | Khalifa University of Science and Technology |
Dias, Jorge | Khalifa University |
Al-Sumaiti, Ameena | Khalifa University |
Keywords: Constrained Motion Planning, Collision Avoidance, Planning under Uncertainty
Abstract: Risk-bounded motion planning in dynamic environments for autonomous driving presents complex challenges, particularly in solving the nonconvex problem of ensuring continuous, safe, and real-time navigation towards a destination. This paper introduces an online graph-based local planning approach constrained by a user-defined driving style, expressed as a risk budget Δ for the entire mission. Our online approach assigns a risk bound to each motion planning decision, ensuring that the total risk consumed remains within Δ. First, we construct a spatial lattice graph that adheres to the vehicle's curvature constraints. Then, the trajectory planning problem is reformulated as an online optimization problem, where decisions must be made sequentially without prior knowledge of future events. We therefore reduce the problem to an online multiple-choice knapsack problem (ON-MCKP), where the knapsack items are candidate paths generated by solving constrained shortest-path problems. To solve the ON-MCKP, we deploy online algorithms that offer theoretical guarantees on the risk allocation throughout the entire mission. The effectiveness of our method is demonstrated empirically, showing significant improvements in the objective without violating safety constraints.
|
|
17:00-17:05, Paper ThET18.6 | |
Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning |
|
Wang, Xian | Zhejiang University |
Zhou, Jin | Zhejiang University |
Feng, Yuanli | Zhejiang University |
Mei, Jiahao | Zhejiang University of Technology |
Chen, Jiming | Zhejiang University |
Li, Shuo | Zhejiang University |
Keywords: Reinforcement Learning, Motion and Path Planning
Abstract: Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations, and enhanced maneuverability in multi-drone systems by applying optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network using multi-agent reinforcement learning for time-optimal multi-drone flight. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision-free mechanism inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training while ensuring lightweight implementation. Extensive simulations show that, despite slight performance trade-offs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with a low collision rate. Real-world experiments validate our method, with two quadrotors using the same network as in simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m × 5.5 m × 2.0 m space across various tracks, relying entirely on onboard computation.
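The "soft collision-free mechanism" above blends a smooth proximity penalty into the reward instead of imposing a hard constraint. One plausible form of such a penalty is sketched below; the quadratic hinge shape, safety radius, and weight are assumptions for illustration, since the paper's exact formulation is not reproduced here.

```python
def soft_collision_penalty(dist_m: float, safe_radius_m: float = 0.6,
                           weight: float = 5.0) -> float:
    """Smooth penalty: zero outside the safety radius, growing quadratically
    as two drones approach each other inside it."""
    if dist_m >= safe_radius_m:
        return 0.0
    return -weight * (1.0 - dist_m / safe_radius_m) ** 2

print(soft_collision_penalty(0.3), soft_collision_penalty(1.0))  # -1.25, 0.0
```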
|
|
17:05-17:10, Paper ThET18.7 | |
Kernel-Based Metrics Learning for Uncertain Opponent Vehicle Trajectory Prediction in Autonomous Racing |
|
Lee, Hojin | Ulsan National Institute of Science and Technology |
Nam, Youngim | Ulsan National Institute of Science and Technology |
Lee, Sanghun | Ulsan Institute of Science and Technology |
Kwon, Cheolhyeon | Ulsan National Institute of Science and Technology |
Keywords: Planning under Uncertainty, Integrated Planning and Learning, Machine Learning for Robot Control
Abstract: Autonomous racing confronts significant challenges in safely overtaking Opponent Vehicles (OVs) that exhibit uncertain trajectories, stemming from unknown driving policies. To address these challenges, this study proposes heterogeneous kernel metrics for Deep Kernel Learning (DKL), designed to robustly capture the diverse driving policies of OVs, and carry out precise trajectory predictions along with the associated uncertainties. A key virtue of the proposed kernel metrics lies in their ability to align similar driving policies and disjoin dissimilar ones in an unsupervised manner, given the observed interactions between the Ego Vehicle (EV) and OVs. The efficacy of the proposed method is substantiated through experimental studies on a 1/10th scale racecar platform, demonstrating improved prediction accuracy and thereby safely overtaking against OVs. Furthermore, our method is computationally efficient for onboard computing units, affirming its viability in fast-paced racing environments.
|
|
17:10-17:15, Paper ThET18.8 | |
Inferring Occluded Agent Behavior in Dynamic Games from Noise Corrupted Observations |
|
Qiu, Tianyu | University of Texas at Austin |
Fridovich-Keil, David | The University of Texas at Austin |
Keywords: Planning under Uncertainty, Optimization and Optimal Control, Multi-Robot Systems
Abstract: In mobile robotics and autonomous driving, it is natural to model agent interactions as the Nash equilibrium of a noncooperative, dynamic game. These methods inherently rely on observations from sensors such as lidars and cameras to identify agents participating in the game and, therefore, have difficulty when some agents are occluded. To address this limitation, this paper presents an occlusion-aware game-theoretic inference method to estimate the locations of potentially occluded agents, and simultaneously infer the intentions of both visible and occluded agents, which best accounts for the observations of visible agents. Additionally, we propose a receding horizon planning strategy based on an occlusion-aware contingency game designed to navigate in scenarios with potentially occluded agents. Monte Carlo simulations validate our approach, demonstrating that it accurately estimates the game model and trajectories for both visible and occluded agents using noisy observations of visible agents. Our planning pipeline also significantly enhances navigation safety compared to an occlusion-ignorant baseline.
|
|
ThET19 |
407 |
Manufacturing and Processes |
Regular Session |
Chair: Lennartson, Bengt | Chalmers University of Technology |
Co-Chair: Zhou, Zhengxue | University of Liverpool |
|
16:35-16:40, Paper ThET19.1 | |
Domain Randomization for Object Detection in Manufacturing Applications Using Synthetic Data: A Comprehensive Study |
|
Zhu, Xiaomeng | KTH and Scania CV AB |
Henningsson, Jacob | Uppsala University |
Li, Duruo | Scania CV AB |
Mårtensson, Pär | Scania |
Hanson, Lars | Skövde University |
Björkman, Mårten | KTH |
Maki, Atsuto | KTH Royal Institute of Technology |
Keywords: Computer Vision for Manufacturing, Data Sets for Robotic Vision, Computer Vision for Automation
Abstract: This paper addresses key aspects of domain randomization in generating synthetic data for manufacturing object detection applications. To this end, we present a comprehensive data generation pipeline that reflects different factors: object characteristics, background, illumination, camera settings, and post-processing. We also introduce the Synthetic Industrial Parts Object Detection dataset (SIP15-OD), consisting of 15 objects from three industrial use cases under varying environments, as a test bed for the study, while also employing a publicly available industrial dataset for robotic applications. In our experiments, we present extensive results and insights into the feasibility as well as the challenges of sim-to-real object detection. In particular, we identify material properties, rendering methods, post-processing, and distractors as important factors. Leveraging these, our method achieves top performance on the public dataset with Yolov8 models trained exclusively on synthetic data: mAP@50 scores of 96.4% on the robotics dataset, and 94.1%, 99.5%, and 95.3% across three of the SIP15-OD use cases, respectively. The results showcase the effectiveness of the proposed domain randomization, suggesting that the generated data closely covers the distribution of real data for these applications.
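The pipeline randomizes object, background, illumination, camera, and post-processing factors for each rendered frame. A minimal sketch of such a per-frame parameter sampler is given below; all parameter names, choices, and ranges are hypothetical illustrations, not the paper's configuration.

```python
import random

def sample_render_config():
    """Draw one randomized scene configuration; ranges are illustrative only."""
    return {
        "object": {"metallic": random.uniform(0.0, 1.0),
                   "roughness": random.uniform(0.1, 0.9)},
        "background": random.choice(["plain", "factory_hdr", "random_texture"]),
        "illumination": {"intensity_lux": random.uniform(200, 2000),
                         "color_temp_k": random.uniform(3000, 6500)},
        "camera": {"focal_mm": random.uniform(20, 50),
                   "distance_m": random.uniform(0.4, 1.5)},
        "post_processing": {"gaussian_noise_std": random.uniform(0.0, 0.02),
                            "motion_blur_px": random.randint(0, 3)},
        "distractors": random.randint(0, 5),
    }

print(sample_render_config())
```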
|
|
16:40-16:45, Paper ThET19.2 | |
Component-Aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection |
|
Tong, Xuan | Fudan University |
Chang, Yang | Fudan University |
Zhao, Qing | Fudan University |
Yu, Jiawen | Fudan University |
Wang, Boyang | Fudan University |
Lin, Junxiong | Fudan University |
Lin, Yuxuan | Fudan University |
Mai, Xinji | Fudan University |
Wang, Haoran | Fudan University |
Tao, Zeng | Fudan University |
Wang, Yan | Fudan University |
Zhang, Wenqiang | Fudan University |
Keywords: Computer Vision for Manufacturing, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Anomaly detection is critical in industrial manufacturing for ensuring product quality and improving efficiency in automated processes. The scarcity of anomalous samples limits traditional detection methods, making anomaly generation essential for expanding the data repository. However, recent generative models often produce unrealistic anomalies that increase false positives, or they require real-world anomaly samples for training. In this work, we treat anomaly generation as a compositional problem and propose ComGEN, a component-aware and unsupervised framework that addresses the gap in logical anomaly generation. Our method comprises a multi-component learning strategy to disentangle visual components, followed by subsequent generation-editing procedures. The disentangled text-to-component pairs, which reveal intrinsic logical constraints, guide attention-based residual mapping and model training with iteratively matched references across multiple scales. Experiments on the MVTecLOCO dataset confirm the efficacy of ComGEN, which achieves the best AUROC score of 91.2%. Additional experiments on a real-world diesel-engine scenario and the widely used MVTecAD dataset demonstrate significant performance improvements when integrating simulated anomalies generated by ComGEN into automated production workflows.
|
|
16:45-16:50, Paper ThET19.3 | |
Use the Force, Bot! - Force-Aware ProDMP with Event-Based Replanning |
|
Lödige, Paul Werner | Karlsruhe Institute of Technology |
Li, Maximilian Xiling | Karlsruhe Institute of Technology |
Lioutikov, Rudolf | Karlsruhe Institute of Technology |
Keywords: Learning from Demonstration, Imitation Learning
Abstract: Movement Primitives (MPs) are a well-established method for representing and generating modular robot trajectories. This work presents FA-ProDMP, a novel approach that introduces force awareness to Probabilistic Dynamic Movement Primitives (ProDMP). FA-ProDMP adapts trajectories during runtime to account for measured and desired forces, offering smooth trajectories and capturing position and force correlations across multiple demonstrations. FA-ProDMPs support multiple axes of force, making them agnostic to Cartesian or joint space control. This versatility makes FA-ProDMP a valuable tool for learning contact-rich manipulation tasks, such as power plug insertion. To reliably evaluate FA-ProDMP, this work additionally introduces a modular, 3D-printed task suite called POEMPEL, inspired by the popular Lego Technic pins. POEMPEL mimics industrial peg-in-hole assembly tasks with force requirements and offers multiple parameters of adjustment, such as position, orientation, and plug stiffness level, thereby varying the direction and amount of required forces. Our experiments demonstrate that FA-ProDMP outperforms other MP formulations on the POEMPEL setup and an electrical power plug insertion task, thanks to its replanning capabilities based on measured forces. These findings highlight how FA-ProDMP enhances the performance of robotic systems in contact-rich manipulation tasks.
|
|
16:50-16:55, Paper ThET19.4 | |
Reinforcement Learning on Reconfigurable Hardware: Overcoming Material Variability in Laser Material Processing |
|
Masinelli, Giulio | EPFL |
Rajani, Chang | Swiss Federal Laboratories for Materials Science and Technology |
Hoffmann, Patrik | Empa |
Wasmer, Kilian | EMPA |
Atienza, David | Epfl Sti Imt Esl |
Keywords: Manufacturing, Maintenance and Supply Chains, Reinforcement Learning, Hardware-Software Integration in Robotics
Abstract: Ensuring consistent processing quality is challenging in laser processes due to varying material properties and surface conditions. Although some approaches have shown promise in solving this problem via automation, they often rely on predetermined targets or are limited to simulated environments. To address these shortcomings, we propose a novel real-time reinforcement learning approach for laser process control, implemented on a Field Programmable Gate Array to achieve real-time execution. Our experimental results from laser welding tests on stainless steel samples with a range of surface roughnesses validated the method's ability to adapt autonomously, without relying on reward engineering or prior setup information. Specifically, the algorithm learned the optimal power profile for each unique surface characteristic, demonstrating significant improvements over hand-engineered optimal constant power strategies — up to 23% better performance on rougher surfaces and 7% on mixed surfaces. This approach represents a significant advancement in automating and optimizing laser processes, with potential applications across multiple industries.
|
|
16:55-17:00, Paper ThET19.5 | |
GenCo: A Dual LVLM Generate-Correct Framework for Adaptive Peg-In-Hole Robotics |
|
Zhou, Zhengxue | University of Liverpool |
Veeramani, Satheeshkumar | University of Liverpool |
Fakhruldeen, Hatem | University of Liverpool |
Uyanik, Seda | University of Liverpool |
Cooper, Andrew Ian | University of Liverpool |
Keywords: Perception-Action Coupling, Cognitive Control Architectures, Industrial Robots
Abstract: Recent advances in Vision Language Models (VLMs) have enhanced their application in robotics, encompassing both high-level task planning and low-level action control. Despite their strong performance across various robotic tasks, even in zero-shot scenarios, most VLM applications remain open-loop, adhering to a plan-and-execute paradigm without mechanisms to assess task completion. To address this limitation, we propose GenCo, a Generate-Correct framework designed to automate a peg-in-hole task using a UR5e robot. This framework integrates a VLM-based motion generator and a motion expert, working collaboratively to refine and correct actions during robotic task execution. Both VLM agents are fine-tuned using the pre-trained LLaVA, enhancing adaptability and scaling efficiently to diverse tasks. Our experiments demonstrate the adaptiveness of the framework, improving the success rate for the peg-in-hole task by 12.75% compared to a single VLM open-loop method. Notably, in unseen scenarios, the success rate for a triangular peg was increased by 15%, and for a random-shaped peg by 17%, underscoring the system's effectiveness in handling novel tasks. Adaptive testing under varied camera positions demonstrated robust performance, affirming reliability despite shifts in the visual input. The framework is also designed to be lightweight and efficient, facilitating broader adoption and practical deployment. Access to our code and model is provided here: https://github.com/Zhengxuez/generate_correct
|
|
17:00-17:05, Paper ThET19.6 | |
ASCENT: Autonomous Skill Learning Toward Complex Embodied Tasks with Foundation Models |
|
Wu, Haolin | Sun Yat-Sen University |
Liu, Yuecheng | Huawei Noah's Ark Lab |
Dong, Junyi | Cornell University |
Zhang, Heng | Huawei |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Wang, Hesheng | Shanghai Jiao Tong University |
Wu, Weigang | Sun Yat-Sen University |
Zhou, Shunbo | Huawei |
Keywords: Domestic Robotics
Abstract: Collecting data from simulated scenarios for training robotic skills provides a safer and more controllable alternative to real-world environments. However, it demands considerable effort, including the manual construction of simulation environments, the careful design of tasks, and the difficulty of obtaining effective trajectories. These limitations hinder the efficiency of data collection from simulated scenarios. In this paper, we leverage the prior knowledge of Large Language Models (LLMs) and Large Multimodal Models (LMMs) to generate simulated scenarios and embodied tasks. We introduce a novel framework, ASCENT (Autonomous Skill learning toward Complex Embodied tasks with fouNdaTion models), designed to efficiently accomplish these tasks and generate trajectory data. ASCENT features a fully autonomous skill learning mechanism based on an AI agent. During task training, the AI agent identifies suitable atomic skills from an atomic skill library to either directly complete the task or serve as an initial policy for further training. Newly acquired atomic skills are subsequently added to the library. To address training failures and enhance efficiency, the AI agent uses an LLM to automatically optimize the skill training process based on feedback received from simulations. Experimental results indicate that the number of training steps required for learning new tasks can be reduced by up to 65.9%.
|
|
17:05-17:10, Paper ThET19.7 | |
Ms. NAMI: Multimodal Semantic Navigation on Relative Metric Intention Graph |
|
Zhai, Shichao | Zhejiang University |
Cui, Yuxiang | Zhejiang University |
Ye, Shuhao | Zhejiang University |
Yu, Xuan | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | Huawei |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Domestic Robotics, Autonomous Vehicle Navigation, Reinforcement Learning
Abstract: Embodied navigation in unknown environments presents the significant challenge of integrating tasks with multimodal goals into a unified framework. In this paper, we propose the Multimodal Semantic Navigation on Relative Metric Intention Graph (Ms. NAMI), a framework that integrates various navigation tasks with multimodal goals based on a relative topo-metric intention graph. A reinforcement learning based policy with a concise action space, consisting of frontier nodes and intention nodes, is designed to guide the agent to select reasonable sub-goals. A sparse reward design is introduced to reduce bias during training. Additionally, several engineering optimizations are implemented to enhance overall performance. The experimental results indicate that our method can achieve robust navigation performance in a variety of unknown environments.
|
|
ThET20 |
408 |
Agricultural Automation 4 |
Regular Session |
Chair: Hauser, Kris | University of Illinois at Urbana-Champaign |
Co-Chair: Behley, Jens | University of Bonn |
|
16:35-16:40, Paper ThET20.1 | |
Towards Autonomous Crop Monitoring: Inserting Sensors in Cluttered Environments |
|
Lee, Moonyoung | Carnegie Mellon University |
Berger, Aaron | Harvard University |
Guri, Dominic | Carnegie Mellon University |
Zhang, Kevin | Carnegie Mellon University |
Coffey, Lisa | Iowa State University |
Kantor, George | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Hardware-Software Integration in Robotics
Abstract: Monitoring crop nutrients can aid farmers in optimizing fertilizer use. Many existing robots, however, rely on vision-based phenotyping, which can only indirectly estimate nutrient deficiencies once crops have undergone visible color changes. We present a contact-based phenotyping robot platform that can directly insert nitrate sensors into cornstalks to proactively monitor macronutrient levels in crops. This task is challenging because inserting such sensors requires subcentimeter precision in an environment that contains high levels of clutter, lighting variation, and occlusion. To address these challenges, we develop a robust perception-action pipeline to grasp stalks, and create a custom robot gripper which mechanically aligns the sensor before inserting it into the stalk. Through experimental validation on 48 unique stalks in a cornfield in Iowa, we demonstrate our platform’s capability of detecting a stalk with 94% success, grasping a stalk with 90% success, and inserting a sensor with 60% success. In addition to developing an autonomous phenotyping research platform, we share key challenges and insights obtained from deployment in the field. Our research platform is open-sourced, with additional information available at https://kantor-lab.github.io/cornbot.
|
|
16:40-16:45, Paper ThET20.2 | |
A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics |
|
Magistri, Federico | University of Bonn |
Läbe, Thomas | University of Bonn |
Marks, Elias Ariel | University of Bonn |
Nagulavancha, Sumanth | University of Bonn |
Pan, Yue | University of Bonn |
Smitt, Claus | University of Bonn |
Klingbeil, Lasse | University of Bonn |
Halstead, Michael Allan | Bonn University |
Kuhlmann, Heiner | University of Bonn |
McCool, Christopher Steven | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Data Sets for Robotic Vision, Agricultural Automation
Abstract: As the world population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7,000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.
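The dataset ships segmented RGB-D frames together with camera intrinsics so that users can back-project colored point clouds themselves. A minimal pinhole back-projection sketch is shown below; the image size and intrinsic values are hypothetical, not the dataset's calibration.

```python
import numpy as np

def depth_to_points(depth_m: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (H, W) in metres to an (N, 3) point cloud."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

# Hypothetical 640x480 frame and intrinsics
depth = np.random.uniform(0.3, 1.0, size=(480, 640))
print(depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0).shape)
```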
|
|
16:45-16:50, Paper ThET20.3 | |
A Novel Control Strategy for Offset Points Tracking in the Context of Agricultural Robotics |
|
Ngnepiepaye Wembe, Stephane | University of Clermont Auvergne, French National Research Instit |
Rousseau, Vincent | IRSTEA |
Laconte, Johann | French National Research Institute for Agriculture, Food and The |
Lenain, Roland | INRAE |
Keywords: Agricultural Automation, Motion Control, Robotics and Automation in Agriculture and Forestry
Abstract: In this paper, we present a novel method to control a rigidly connected location on the vehicle, such as a point on the implement in the case of agricultural tasks. Agricultural robots are transforming modern farming by enabling precise and efficient operations, replacing humans in arduous tasks while reducing the use of chemicals. Traditionally, path-following algorithms are designed to guide the vehicle's center along a predefined trajectory. However, since the actual agronomic task is performed by the implement, it is essential to control a specific point on the tool itself rather than the vehicle's center. As such, we present in this paper two approaches for achieving control of an offset point on the robot. The first approach adapts existing control laws, initially intended for the rear axle's midpoint, to manage the desired lateral deviation. The second approach employs backstepping control techniques to create a control law that directly targets the implement. We conduct real-world experiments, highlighting the limitations of traditional approaches for offset point control, and demonstrating the strengths and weaknesses of the proposed methods.
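For readers unfamiliar with offset-point control, a minimal kinematic sketch (not the paper's control laws, which handle lateral deviation and backstepping) shows how the position and velocity of a point rigidly attached to a unicycle-like vehicle follow from the vehicle state; the symbols below are illustrative:

```python
import numpy as np

def offset_point_state(x, y, theta, v, omega, dx, dy):
    """Position and velocity of a point rigidly offset from the vehicle frame.

    (x, y, theta): vehicle pose, v: forward speed, omega: yaw rate.
    (dx, dy): offset of the controlled point (e.g., on the implement),
    expressed in the vehicle frame.
    """
    c, s = np.cos(theta), np.sin(theta)
    px = x + c * dx - s * dy
    py = y + s * dx + c * dy
    # Differentiating the rigid-body transform gives the offset-point velocity.
    vx = v * c - omega * (s * dx + c * dy)
    vy = v * s + omega * (c * dx - s * dy)
    return np.array([px, py]), np.array([vx, vy])
```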
|
|
16:50-16:55, Paper ThET20.4 | |
Towards Over-Canopy Autonomous Navigation: Crop-Agnostic LiDAR-Based Crop-Row Detection in Arable Fields |
|
Liu, Ruiji | Carnegie Mellon University |
Yandun, Francisco | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Keywords: Agricultural Automation, Reactive and Sensor-Based Planning, Field Robots
Abstract: Autonomous navigation is crucial for various robotics applications in agriculture. However, many existing methods depend on RTK-GPS devices, which can be susceptible to loss of radio signal or intermittent reception of corrections from the internet. Consequently, research has increasingly focused on using RGB cameras for crop-row detection, though challenges persist when dealing with grown plants. This paper introduces a LiDAR-based navigation system that can achieve crop-agnostic over-canopy autonomous navigation in row-crop fields, even when the canopy fully blocks the inter-row spacing. Our algorithm can detect crop rows across diverse scenarios, encompassing various crop types, growth stages, illumination conditions, the presence of weeds, curved rows, and discontinuities. Without utilizing a global localization method (i.e., based on GPS), our navigation system can perform autonomous navigation in these challenging scenarios, detect the end of the crop rows, and navigate to the next crop row autonomously, providing a crop-agnostic approach to navigate an entire field. The proposed navigation system has undergone tests in various simulated and real agricultural fields, achieving an average cross-track error of 3.55 cm without human intervention. The system has been deployed on a customized UGV robot, which can be reconfigured depending on the field conditions.
|
|
16:55-17:00, Paper ThET20.5 | |
Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits |
|
Yao, Shaoxiong | University of Illinois Urbana-Champaign |
Pan, Sicong | University of Bonn |
Bennewitz, Maren | University of Bonn |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Agricultural Automation
Abstract: Fruit monitoring plays an important role in crop management, and rising global fruit consumption combined with labor shortages necessitates automated monitoring with robots. However, occlusions from plant foliage often hinder accurate shape and pose estimation. Therefore, we propose an active fruit shape and pose estimation method that physically manipulates occluding leaves to reveal hidden fruits. This paper introduces a framework that plans robot actions to maximize visibility and minimize leaf damage. We developed a novel scene-consistent shape completion technique to improve fruit estimation under heavy occlusion and utilize a perception-driven deformation graph model to predict leaf deformation during planning. Experiments on artificial and real sweet pepper plants demonstrate that our method enables robots to safely move leaves aside, exposing fruits for accurate shape and pose estimation, outperforming baseline methods. Project page: https://shaoxiongyao.github.io/lmap-ssc/.
|
|
17:00-17:05, Paper ThET20.6 | |
Autonomous Sensor Exchange and Calibration for Cornstalk Nitrate Monitoring Robot |
|
Lee, Janice Seungyeon | Carnegie Mellon University |
Detlefsen, Thomas | Carnegie Mellon University |
Lawande, Shara | Carnegie Mellon University |
Ghatge, Saudamini | Carnegie Mellon University |
Ramesh Shanthi, Shrudhi | Carnegie Mellon University |
Mukkamala, Sruthi | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Robotics and Automation in Agriculture and Forestry, Grippers and Other End-Effectors, Agricultural Automation
Abstract: Interactive sensors are an important component of robotic systems but often require manual replacement due to wear and tear. Automating this process can enhance system autonomy and facilitate long-term deployment. We developed an autonomous sensor exchange and calibration system for an agriculture crop monitoring robot that inserts a nitrate sensor into cornstalks. A novel gripper and replacement mechanism, featuring a reliable funneling design, were developed to enable efficient and reliable sensor exchanges. To maintain consistent nitrate sensor measurement, an on-board sensor calibration station was integrated to provide in-field sensor cleaning and calibration. The system was deployed at the Ames Curtis Farm in June 2024, where it successfully inserted nitrate sensors with high accuracy into 30 cornstalks with a 77% success rate.
|
|
17:05-17:10, Paper ThET20.7 | |
Enhancing Agricultural Environment Perception Via Active Vision and Zero-Shot Learning |
|
La Greca, Michele Carlo | Politecnico Di Milano |
Usuelli, Mirko | Politecnico Di Milano |
Matteucci, Matteo | Politecnico Di Milano |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, RGB-D Perception
Abstract: Agriculture, fundamental for human sustenance, faces unprecedented challenges. The need for efficient, human-cooperative, and sustainable farming methods has never been greater. The core contributions of this work involve leveraging Active Vision (AV) techniques and Zero-Shot Learning (ZSL) to improve the robot's ability to perceive and interact with the agricultural environment in the context of fruit harvesting. The AV Pipeline implemented within ROS 2 integrates Next-Best View (NBV) Planning for 3D environment reconstruction through a dynamic 3D Occupancy Map. Our system allows the robotic arm to dynamically plan and move to the most informative viewpoints and explore the environment, updating the 3D reconstruction using semantic information produced through ZSL models. Simulation and real-world experimental results demonstrate our system's effectiveness in complex visibility conditions, outperforming traditional and static predefined planning methods. The ZSL segmentation models employed, such as YOLO World + EfficientViT SAM, exhibit high-speed performance and accurate segmentation, allowing flexibility when dealing with semantic information in unknown agricultural contexts without requiring any fine-tuning process.
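As a hedged illustration of the NBV idea the abstract describes (the actual ROS 2 pipeline, occupancy-map update, and ZSL semantics are not reproduced here), one can rank candidate viewpoints by how many currently unknown voxels they are predicted to observe; `visible_voxels` is a placeholder for ray casting against the map:

```python
def next_best_view(candidate_views, occupancy, visible_voxels):
    """Pick the viewpoint expected to reveal the most unknown space.

    candidate_views: list of viewpoint identifiers (e.g., poses).
    occupancy: dict voxel -> 'free' | 'occupied' | 'unknown'.
    visible_voxels: callable(view) -> iterable of voxel keys predicted
                    visible from that view (e.g., by ray casting).
    """
    def info_gain(view):
        return sum(1 for vox in visible_voxels(view)
                   if occupancy.get(vox, 'unknown') == 'unknown')
    return max(candidate_views, key=info_gain)
```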
|
|
17:10-17:15, Paper ThET20.8 | |
CitDet: A Benchmark Dataset for Citrus Fruit Detection |
|
James, Jordan | University of Texas at Arlington |
Manching, Heather K. | North Carolina State University |
Mattia, Matthew R. | USDA Agricultural Research Service |
Bowman, Kim D. | USDA Agricultural Research Service |
Hulse-Kemp, Amanda M. | US Department of Agriculture |
Beksi, William J. | The University of Texas at Arlington |
Keywords: Agricultural Automation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: In this letter, we present a new dataset to advance the state of the art in detecting citrus fruit and accurately estimating yield on trees affected by the Huanglongbing (HLB) disease in orchard environments via imaging. Despite the fact that significant progress has been made in solving the fruit detection problem, the lack of publicly available datasets has complicated direct comparison of results. For instance, citrus detection has long been of interest to the agricultural research community, yet there is an absence of work, particularly involving public datasets of citrus affected by HLB. To address this issue, we enhance state-of-the-art object detection methods for use in typical orchard settings. Concretely, we provide high-resolution images of citrus trees located in an area known to be highly affected by HLB, along with high-quality bounding box annotations of citrus fruit. Fruit on both the trees and the ground are labeled to allow for identification of fruit location, which contributes to advancements in yield estimation and a potential measure of HLB impact via fruit drop. The dataset consists of over 32,000 bounding box annotations for fruit instances contained in 579 high-resolution images. In summary, our contributions are the following: (i) we introduce a novel dataset along with baseline performance benchmarks on multiple contemporary object detection algorithms, (ii) we show the ability to accurately capture fruit location on tree or on ground, and finally (iii) we present a correlation of our results with yield estimations.
|
|
ThET21 |
410 |
Integrating Motion Planning and Learning 3 |
Regular Session |
Chair: Balakirsky, Stephen | Georgia Tech |
Co-Chair: Solovey, Kiril | Technion--Israel Institute of Technology |
|
16:35-16:40, Paper ThET21.1 | |
Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making |
|
Zhuang, Lei | Harbin Institute of Technology |
Zhao, Jingdong | Harbin Institute of Technology |
Li, Yuntao | Harbin Institute of Technology |
Xu, Zichun | Harbin Institute of Technology, School of Mechatronics Engineeri |
Zhao, Liangliang | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Motion and Path Planning, Deep Learning Methods
Abstract: Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often results in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), which synergizes a Co-Regulation Environmental Information Encoder (CEIE) with a Motion Planning Transformer (MPT). CEIE converts scenario data into encoded environmental information (EEI), providing MPT with an insightful understanding of the environment. MPT leverages an attention mechanism to dynamically recalibrate its focus on EEI, task objectives, and historical planning data, refining the sampling node generation. To demonstrate the capabilities of TEMP, we train our model using a dataset consisting of planning results produced by RRT*. CEIE and MPT are collaboratively trained, enabling CEIE to autonomously learn and extract patterns from environmental data, thereby forming informative representations that MPT can more effectively interpret and utilize for motion planning. Subsequently, we systematically evaluate TEMP's efficacy across diverse dimensions and assess it in out-of-distribution real-world scenarios, demonstrating that TEMP achieves exceptional performance metrics and a heightened degree of generalizability compared to state-of-the-art SBMPs.
|
|
16:40-16:45, Paper ThET21.2 | |
From Configuration-Space Clearance to Feature-Space Margin: Sample Complexity in Learning-Based Collision Detection |
|
Tubul, Sapir | Technion - Israel Institute of Technology |
Tamar, Aviv | Technion |
Solovey, Kiril | Technion--Israel Institute of Technology |
Salzman, Oren | Technion |
Keywords: Integrated Planning and Learning, Probability and Statistical Methods, Collision Avoidance
Abstract: Motion planning is a central challenge in robotics, with learning-based approaches gaining significant attention in recent years. Our work focuses on a specific aspect of these approaches: using machine-learning techniques, particularly Support Vector Machines (SVM), to evaluate whether robot configurations are collision free, an operation termed “collision detection”. Despite the growing popularity of these methods, there is a lack of theory supporting their efficiency and prediction accuracy. This is in stark contrast to the rich theoretical results of machine-learning methods in general and of SVMs in particular. Our work bridges this gap by analyzing the sample complexity of an SVM classifier for learning-based collision detection in motion planning. We bound the number of samples needed to achieve a specified accuracy at a given confidence level. This result is stated in terms relevant to robot motion planning such as the system’s clearance. Building on these theoretical results, we propose a collision-detection algorithm that can also provide statistical guarantees on the algorithm’s error in classifying robot configurations as collision-free or not.
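To make the setting concrete (this is a generic sketch, not the paper's algorithm or its statistical guarantees), learning-based collision detection fits a classifier to configurations labeled by an exact collision checker; the sampler and checker below are placeholders, and both labels are assumed to occur among the samples:

```python
import numpy as np
from sklearn.svm import SVC

def learn_collision_detector(sample_config, is_collision_free, n_samples=2000):
    """Fit an SVM that predicts whether a configuration is collision-free.

    sample_config: callable() -> configuration vector (placeholder sampler).
    is_collision_free: exact but expensive collision checker, used only for labels.
    Assumes both free and in-collision configurations appear among the samples.
    The learned margin plays a role analogous to the configuration-space
    clearance that the paper relates to sample complexity.
    """
    X = np.array([sample_config() for _ in range(n_samples)])
    y = np.array([1 if is_collision_free(q) else 0 for q in X])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, y)
    return clf  # clf.predict(q.reshape(1, -1)) gives a fast approximate check
```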
|
|
16:45-16:50, Paper ThET21.3 | |
CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration |
|
Yang, Chunyu | China University of Mining and Technology |
Bi, Shengben | China University of Mining and Technology |
Xu, Yihui | China University of Mining and Technology |
Zhang, Xin | China University of Mining and Technology |
Keywords: Integrated Planning and Learning, Reinforcement Learning, Planning under Uncertainty
Abstract: With the increasing demand for efficient and flexible robotic exploration solutions, Reinforcement Learning (RL) is becoming a promising approach in the field of autonomous robotic exploration. However, current RL-based exploration algorithms often face limited environmental reasoning capabilities, slow convergence rates, and substantial challenges in Sim-To-Real (S2R) transfer. To address these issues, we propose a Curriculum Learning-based Transformer Reinforcement Learning Algorithm (CTSAC) aimed at improving both exploration efficiency and transfer performance. To enhance the robot's reasoning ability, a Transformer is integrated into the perception network of the Soft Actor-Critic (SAC) framework, leveraging historical information to improve the farsightedness of the strategy. A periodic review-based curriculum learning strategy is proposed, which enhances training efficiency while mitigating catastrophic forgetting during curriculum transitions. Training is conducted on the ROS-Gazebo continuous robotic simulation platform, with LiDAR clustering optimization to further reduce the S2R gap. Experimental results demonstrate that CTSAC outperforms state-of-the-art non-learning and learning-based algorithms in terms of success rate and success rate-weighted exploration time. Moreover, real-world experiments validate the strong S2R transfer capabilities of CTSAC.
|
|
16:50-16:55, Paper ThET21.4 | |
Guiding Long-Horizon Task and Motion Planning with Vision Language Models |
|
Yang, Zhutian | Massachusetts Institute of Technology |
Garrett, Caelan | NVIDIA |
Fox, Dieter | University of Washington |
Lozano-Perez, Tomas | MIT |
Kaelbling, Leslie | MIT |
Keywords: Integrated Planning and Learning, Task and Motion Planning, Mobile Manipulation
Abstract: Vision-Language Models (VLM) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps such as opening drawers to access objects are often omitted. Task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and insert physically necessary actions, but do not scale to everyday problems that require common-sense knowledge and involve large state spaces composed of many variables. We leverage the VLM for 1) system dynamics (i.e., the recipe) and 2) search guidance. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate intermediate subgoals that guide the sampling of a task and motion planner. When a subgoal or action cannot be refined, the VLM is queried again for replanning. We evaluate VLM-TAMP on kitchen tasks where a robot must accomplish cooking goals that require performing 30-50 actions in sequence and interacting with up to 21 objects. We found that VLM-TAMP substantially outperforms baselines that rigidly and independently execute VLM-generated action sequences (success rate 50 to 100% versus 0%, average task completion percentage 72 to 100% versus 15 to 45%). See the project site https://zt-yang.github.io/vlm-tamp-robot/ for more information.
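A schematic sketch of the hierarchical loop described in the abstract (the function names are hypothetical placeholders; the paper's actual prompting, refinement, and state handling are richer than this):

```python
def vlm_guided_tamp(goal, state, query_vlm_subgoals, tamp_refine, max_replans=3):
    """Hierarchical loop in the spirit of VLM-guided task and motion planning.

    query_vlm_subgoals: callable(goal, state) -> list of symbolic subgoals
                        (placeholder for the VLM prompt/response step).
    tamp_refine: callable(state, subgoal) -> motion plan, or None if the
                 subgoal is geometrically/kinematically infeasible.
    """
    plan = []
    for _ in range(max_replans):
        subgoals = query_vlm_subgoals(goal, state)
        for sg in subgoals:
            traj = tamp_refine(state, sg)
            if traj is None:
                break              # infeasible subgoal: re-query the VLM and replan
            plan.append(traj)
            state = sg             # simplification: treat the achieved subgoal as the new state summary
        else:
            return plan            # all subgoals refined successfully
    return None
```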
|
|
16:55-17:00, Paper ThET21.5 | |
CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation |
|
Kumar, Naman | Robotics Research Center, IIIT Hyderabad, India |
Singha, Antareep | Robotics Research Center, IIIT Hyderabad |
Nanwani, Laksh | Robotics Research Center, IIIT Hyderabad, India |
Potdar, Dhruv | Robotics Research Center, IIIT Hyderabad, India |
Ramakrishnan, Tarun | Robotics Research Center, IIIT Hyderabad, India |
Rastgar, Fatemeh | Örebro University |
Idoko, Simon | University of Tartu |
Singh, Arun Kumar | University of Tartu |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Integrated Planning and Learning, Collision Avoidance, Motion and Path Planning
Abstract: Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, this is used as an initialization for a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
|
|
17:00-17:05, Paper ThET21.6 | |
CLIMB: Language-Guided Continual Learning for Task Planning with Iterative Model Building |
|
Byrnes, Walker | Georgia Institute of Technology |
Bogdanovic, Miroslav | University of Toronto |
Balakirsky, Avi | The Ohio State University |
Balakirsky, Stephen | Georgia Tech |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Integrated Planning and Learning, Continual Learning, Incremental Learning
Abstract: Intelligent and reliable task planning is a core capability for generalized robotics, which requires a descriptive domain representation that sufficiently models all object and state information for the scene. We present CLIMB, a continual learning framework for robot task planning that leverages foundation models and feedback from execution to guide the construction of domain models. CLIMB can build a model from a natural language description, learn non-obvious predicates while solving tasks, and store that information for future problems. We demonstrate the ability of CLIMB to improve performance in common planning environments compared to baseline methods. We also developed the BlocksWorld++ domain, a simulated environment with an easily usable real counterpart, together with a curriculum of tasks with progressing difficulty to evaluate continual learning.
|
|
17:05-17:10, Paper ThET21.7 | |
Safe Multi-Agent Navigation Guided by Goal-Conditioned Safe Reinforcement Learning |
|
Feng, Meng | MIT |
Parimi, Viraj | Massachusetts Institute of Technology |
Williams, Brian | MIT |
Keywords: Integrated Planning and Learning, Robot Safety, Reinforcement Learning
Abstract: Safe navigation is essential for autonomous systems operating in hazardous environments. Traditional planning methods are effective for solving long-horizon tasks but depend on the availability of a graph representation with predefined distance metrics. In contrast, safe Reinforcement Learning (RL) is capable of learning complex behaviors without relying on manual heuristics but fails to solve long-horizon tasks, particularly in goal-conditioned and multi-agent scenarios. In this paper, we introduce a novel method that integrates the strengths of both planning and safe RL. Our method leverages goal-conditioned RL (GCRL) and safe RL to learn a goal-conditioned policy for navigation while concurrently estimating cumulative distance and safety levels using learned value functions via an automated self-training algorithm. By constructing a graph with states from the replay buffer, our method prunes unsafe edges and generates a waypoint-based plan that the agent then executes by following those waypoints sequentially until their goal locations are reached. This graph pruning and planning approach via the learned value functions allows our approach to flexibly balance the trade-off between faster and safer routes especially over extended horizons. Utilizing this unified high-level graph and a shared low-level safe GCRL policy, we extend this approach to address the multi-agent safe navigation problem. In particular, we leverage Conflict-Based Search (CBS) to create waypoint-based plans for multiple agents allowing for their safer navigation over extended horizons. This integration enhances the scalability of goal-conditioned safe RL in multi-agent scenarios, enabling efficient coordination among agents. Extensive benchmarking against state-of-the-art baselines demonstrates the effectiveness of our method in achieving distance goals safely for multiple agents in complex and hazardous environments. More details can be found at https://safe-visual-mapf-mers.mit.csail.mit.
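The graph-pruning-and-planning step can be sketched as follows (an illustrative simplification using networkx; the learned distance and safety value functions are placeholders, and the self-training procedure and CBS multi-agent extension are omitted):

```python
import networkx as nx

def build_waypoint_plan(states, edges, dist_value, safety_value,
                        start, goal, safety_threshold=0.9):
    """Waypoint planning over replay-buffer states with unsafe edges pruned.

    states: list of state identifiers drawn from the replay buffer.
    edges: iterable of (u, v) candidate transitions between states.
    dist_value: callable(u, v) -> learned cumulative-distance estimate.
    safety_value: callable(u, v) -> learned probability of staying safe on (u, v).
    """
    g = nx.DiGraph()
    g.add_nodes_from(states)
    for u, v in edges:
        if safety_value(u, v) >= safety_threshold:   # prune unsafe edges
            g.add_edge(u, v, weight=dist_value(u, v))
    # Raises NetworkXNoPath if pruning disconnects start from goal.
    return nx.shortest_path(g, source=start, target=goal, weight="weight")
```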
|
|
17:10-17:15, Paper ThET21.8 | |
Motion Planning for 2-DOF Transformable Wheel Robots Using Reinforcement Learning |
|
Park, Inha | Hanyang University |
Ryu, Sijun | Hanyang University |
Won, Jeeho | Hanyang University |
Yoon, Hyeongyu | Hanyang University |
Kim, SangGyun | Hanyang University |
Kim, Hwa Soo | Kyonggi University |
Seo, TaeWon | Hanyang University |
Keywords: Motion and Path Planning, Reinforcement Learning, Model Learning for Control
Abstract: Transformable robots have been developed to perform various tasks using flexible methods. However, the transformation properties present challenges in controlling and planning motion strategies, as the system model changes when transformations occur. To address this issue, we propose a planning framework based on artificial intelligence, called Geometric Manipulability Reinforcement Learning (GM-RL). GM-RL consists of two components: the manipulability estimator and the motion planner. The manipulability estimator employs graph neural networks (GNN) to provide action guidelines based on the dynamic manipulability of the transformable robots. The motion planner generates transformation plans using reinforcement learning (RL). The activation ratio alpha adjusts the ratio of the guideline accepted between the two components. In experiments utilizing a 2-DoF transformable wheel called STEP, GM-RL with alpha=0.5 generated an optimal transformation plan with an average dynamic manipulability measure of 0.0424, the highest measure compared to pure dynamic manipulability and reinforcement learning. A real-world experiment demonstrated that the transformation plan is efficient for overcoming stairs.
|
|
ThET22 |
411 |
Imitation Learning for Manipulation 2 |
Regular Session |
Chair: Martín-Martín, Roberto | University of Texas at Austin |
Co-Chair: Hou, Mengxue | University of Notre Dame |
|
16:35-16:40, Paper ThET22.1 | |
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation Via Segment-Level Selection and Optimization |
|
Chen, Jingjing | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Fang, Hao-Shu | Massachusetts Institute of Technology |
Lu, Cewu | Shanghai Jiao Tong University |
Keywords: Learning from Demonstration, Imitation Learning, Deep Learning in Grasping and Manipulation
Abstract: Data is crucial for robotic manipulation, as it underpins the development of robotic systems for complex tasks. While high-quality, diverse datasets enhance the performance and adaptability of robotic manipulation policies, collecting extensive expert-level data is resource-intensive. Consequently, many current datasets suffer from quality inconsistencies due to operator variability, highlighting the need for methods to utilize mixed-quality data effectively. To mitigate these issues, we propose "Select Segments to Imitate" (S2I), a framework that selects and optimizes mixed-quality demonstration data at the segment level, while ensuring plug-and-play compatibility with existing robotic manipulation policies. The framework has three components: demonstration segmentation, which divides the original data into meaningful segments; segment selection, which uses contrastive learning to find high-quality segments; and trajectory optimization, which refines suboptimal segments for better policy learning. We evaluate S2I through comprehensive experiments in simulation and real-world environments across six tasks, demonstrating that with only 3 expert demonstrations for reference, S2I can improve the performance of various downstream policies when trained with mixed-quality demonstrations. Project website: https://tonyfang.net/s2i/.
|
|
16:40-16:45, Paper ThET22.2 | |
DABI: Evaluation of Data Augmentation Methods Using Downsampling in Bilateral Control-Based Imitation Learning with Images |
|
Kobayashi, Masato | Osaka University |
Buamanee, Thanpimon | Osaka University |
Uranishi, Yuki | Osaka University |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: Autonomous robot manipulation is a complex and continuously evolving robotics field. This paper focuses on data augmentation methods in imitation learning. Imitation learning consists of three stages: data collection from experts, model learning, and execution. However, collecting expert data requires manual effort and is time-consuming. Additionally, as sensors have different data acquisition intervals, preprocessing such as downsampling to match the lowest frequency is necessary. Downsampling enables data augmentation and also contributes to the stabilization of robot operations. In light of this background, this paper proposes the Data Augmentation Method for Bilateral Control-Based Imitation Learning with Images, called "DABI". DABI collects robot joint angles, velocities, and torques at 1000 Hz, and uses images from gripper and environmental cameras captured at 100 Hz as the basis for data augmentation. This enables a tenfold increase in data. In this paper, we collected just 5 expert demonstration datasets. We trained the bilateral control Bi-ACT model with the unaltered dataset and two augmentation methods for comparative experiments and conducted real-world experiments. The results confirmed a significant improvement in success rates, thereby proving the effectiveness of DABI. For additional material, please check: https://mertcookimg.github.io/dabi
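The core augmentation idea can be sketched in a few lines (illustrative only; the array shapes and the alignment with bilateral-control data are assumptions, and the Bi-ACT model is not shown): each of the ten phase offsets of the 1000 Hz robot stream yields one 100 Hz sequence aligned with the camera frames.

```python
import numpy as np

def offset_downsample_augment(robot_data_1khz, images_100hz, factor=10):
    """Tenfold augmentation by offset downsampling, in the spirit of DABI.

    robot_data_1khz: (T*factor, D) joint angles/velocities/torques at 1000 Hz.
    images_100hz:    (T, ...) camera frames at 100 Hz.
    Returns a list of `factor` aligned (robot, image) sequences at 100 Hz.
    """
    T = images_100hz.shape[0]
    augmented = []
    for offset in range(factor):
        robot_seq = robot_data_1khz[offset::factor][:T]   # one 100 Hz slice per phase offset
        augmented.append((robot_seq, images_100hz[:len(robot_seq)]))
    return augmented
```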
|
|
16:45-16:50, Paper ThET22.3 | |
Learning from Imperfect Demonstrations with Self-Supervision for Robotic Manipulation |
|
Wu, Kun | Syracuse University |
Liu, Ning | Beijing Innovation Center of Humanoid Robotics |
Zhao, Zhen | Midea Group |
Qiu, Di | Peking University |
Li, Jinming | Shanghai University |
Che, Zhengping | X-Humanoid |
Xu, Zhiyuan | Midea Group |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: Learning from Demonstration, Imitation Learning, Deep Learning in Grasping and Manipulation
Abstract: Improving data utilization, especially for imperfect data from task failures, is crucial for robotic manipulation due to the challenging, time-consuming, and expensive data collection process in the real world. Current imitation learning (IL) typically discards imperfect data, focusing solely on successful expert data. While reinforcement learning (RL) can learn from explorations and failures, the sim2real gap and its reliance on dense reward and online exploration make it difficult to apply effectively in real-world scenarios. In this work, we aim to conquer the challenge of leveraging imperfect data without the need for reward information to improve the model performance for robotic manipulation in an offline manner. Specifically, we introduce a Self-Supervised Data Filtering framework (SSDF) that combines expert and imperfect data to compute quality scores for failed trajectory segments. High-quality segments from the failed data are used to expand the training dataset. Then, the enhanced dataset can be used with any downstream policy learning method for robotic manipulation tasks. Extensive experiments on the ManiSkill2 benchmark built on the high-fidelity Sapien simulator and real-world robotic manipulation tasks using the Franka robot arm demonstrated that the SSDF can accurately expand the training dataset with high-quality imperfect data and improve the success rates for all robotic manipulation tasks.
|
|
16:50-16:55, Paper ThET22.4 | |
MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies |
|
Huang, Haojie | Northeastern University |
Liu, Haotian | Worcester Polytechnic Institute |
Wang, Dian | Northeastern University |
Walters, Robin | Northeastern University |
Platt, Robert | Northeastern University |
Keywords: Learning from Demonstration, Imitation Learning, Transfer Learning
Abstract: Many manipulation tasks require the robot to rearrange objects relative to one another. Such tasks can be described as a sequence of relative poses between parts of a set of rigid bodies. In this work, we propose Match Policy, a simple but novel pipeline for solving high-precision pick and place tasks. Instead of predicting actions directly, our method registers the pick and place targets to the stored demonstrations. This transfers action inference into a point cloud registration task and enables us to realize nontrivial manipulation policies without any training. Match Policy is designed to solve high-precision tasks with a key-frame setting. By leveraging the geometric interaction and the symmetries of the task, it achieves extremely high sample efficiency and generalizability to unseen configurations. We demonstrate its state-of-the-art performance across various tasks on RLbench benchmark compared with several strong baselines and test it on a real robot with six tasks.
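A minimal sketch of the registration-based transfer idea (assuming known point correspondences, which the real pipeline does not require; the names below are illustrative): align the stored demonstration cloud to the observed target cloud, then push the demonstrated grasp pose through the recovered transform.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst points.

    src, dst: (N, 3) corresponding points (demo cloud and observed cloud).
    A simplified stand-in for the registration step; full point cloud
    registration must also solve for correspondences.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # enforce a proper rotation (no reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def transfer_grasp(demo_grasp_R, demo_grasp_t, R, t):
    """Map a demonstrated grasp pose through the recovered demo-to-scene transform."""
    return R @ demo_grasp_R, R @ demo_grasp_t + t
```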
|
|
16:55-17:00, Paper ThET22.5 | |
Self-Improving Autonomous Underwater Manipulation |
|
Liu, Ruoshi | Columbia University |
Ha, Huy | Columbia University |
Hou, Mengxue | University of Notre Dame |
Song, Shuran | Stanford University |
Vondrick, Carl | Columbia |
Keywords: Sensorimotor Learning, Marine Robotics, Imitation Learning
Abstract: Underwater robotic manipulation faces significant challenges due to complex fluid dynamics and unstructured environments, causing most manipulation systems to rely heavily on human teleoperation. In this paper, we introduce AquaBot, a fully autonomous manipulation system that combines behavior cloning from human demonstrations with self-learning optimization to improve beyond human teleoperation performance. With extensive real-world experiments, we demonstrate AquaBot's versatility across diverse manipulation tasks, including object grasping, trash sorting, and rescue retrieval. Our real-world experiments show that AquaBot's self-optimized policy outperforms a human operator by 41% in speed. AquaBot represents a promising step towards autonomous and self-improving underwater manipulation systems.
|
|
17:00-17:05, Paper ThET22.6 | |
DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation Via Imitation Learning |
|
Jiang, Zhenyu | The University of Texas at Austin |
Xie, Yuqi | University of Texas at Austin |
Lin, Kevin | Stanford |
Xu, Zhenjia | Columbia University |
Wan, Weikang | Peking University |
Mandlekar, Ajay Uday | NVIDIA |
Fan, Linxi | Stanford University |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Big Data in Robotics and Automation, Learning from Demonstration
Abstract: Imitation learning from human demonstrations is an effective means to teach robots manipulation skills. But data acquisition is a major bottleneck in applying this paradigm more broadly, due to the high costs and human efforts involved. There has been significant interest in imitation learning for bimanual dexterous robots, like humanoids. Unfortunately, data collection is even more challenging here due to the difficulty of simultaneously controlling the two arms and multi-fingered hands. Automated data generation in simulation is a compelling, scalable alternative to fuel this need for training data. To this end, we introduce DexMimicGen, a large-scale automated data generation system that synthesizes trajectories from a handful of human demonstrations for bimanual robots with dexterous hands. We present a collection of simulation environments in the setting of bimanual dexterous manipulation, spanning a range of manipulation behaviors and different requirements for coordination among the two arms. We generate 21K demos across these tasks from just 60 source human demos and study the effect of several data generation and policy learning decisions on agent performance. Finally, we present a real-to-sim-to-real pipeline and deploy it on a real-world humanoid can sorting task. Generated datasets, simulation environments and additional results are at dexmimicgen.github.io.
|
|
17:05-17:10, Paper ThET22.7 | |
The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations |
|
von Hartz, Jan Ole | University of Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Boedecker, Joschka | University of Freiburg |
Keywords: Imitation Learning, Learning from Demonstration, Sensorimotor Learning
Abstract: Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.
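The velocity factorization itself is simple (sketch below, illustrative only; the Riemannian GMM modeling of the direction component is not shown):

```python
import numpy as np

def factorize_velocity(v, eps=1e-8):
    """Split an end-effector velocity into a unit direction and a scalar magnitude."""
    speed = np.linalg.norm(v)
    direction = v / speed if speed > eps else np.zeros_like(v)
    return direction, speed     # direction lives on the sphere, speed on the positive reals

def recompose_velocity(direction, speed):
    """Inverse of the factorization."""
    return speed * direction
```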
|
|
17:10-17:15, Paper ThET22.8 | |
ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos |
|
Shi, Junyao | University of Pennsylvania |
Zhao, Zhuolun | University of Pennsylvania, Skild AI |
Wang, Tianyou | University of Pennsylvania |
Pedroza, Ian | University of Pennsylvania |
Luo, Amy | University of Pennsylvania |
Wang, Jie | University of Pennsylvania |
Ma, Yecheng Jason | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Keywords: Imitation Learning, Sensorimotor Learning, Transfer Learning
Abstract: Many recent advances in robotic manipulation have come through imitation learning, yet these rely largely on mimicking a particularly hard-to-acquire form of demonstrations: those collected on the same robot in the same room with the same objects as the trained policy must handle at test time. In contrast, large pre-recorded human video datasets demonstrating manipulation skills in-the-wild already exist, which contain valuable information for robots. Is it possible to distill a repository of useful robotic skill policies out of such data without any additional requirements on robot-specific demonstrations or exploration? We present the first such system ZeroMimic, that generates immediately deployable image goal-conditioned skill policies for several common categories of manipulation tasks (opening, closing, pouring, pick&place, cutting, and stirring) each capable of acting upon diverse objects and across diverse unseen task setups. ZeroMimic is carefully designed to exploit recent advances in semantic and geometric visual understanding of human videos, together with modern grasp affordance detectors and imitation policy classes. After training ZeroMimic on the popular EpicKitchens dataset of ego-centric human videos, we evaluate its out-of-the-box performance in varied real-world and simulated kitchen settings with two different robot embodiments, demonstrating its impressive abilities to handle these varied tasks. To enable plug-and-play reuse of ZeroMimic policies on other task setups and robots, we release software and policy checkpoints of our skill policies.
|
|
ThET23 |
412 |
Autonomous Vehicle Perception 7 |
Regular Session |
Chair: Sun, Shunqiao | The University of Alabama |
Co-Chair: Zhou, MengChu | New Jersey Institute of Technology |
|
16:35-16:40, Paper ThET23.1 | |
Object Importance Estimation Using Counterfactual Reasoning for Intelligent Driving |
|
Gupta, Pranay | Carnegie Mellon University |
Biswas, Abhijat | Carnegie Mellon University |
Admoni, Henny | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: The ability to identify important objects in a complex and dynamic driving environment is essential for autonomous driving agents to make safe and efficient driving decisions. It also helps assistive driving systems decide when to alert drivers. We tackle object importance estimation in a data-driven fashion and introduce HOIST (Human-annotated Object Importance in Simulated Traffic). HOIST contains driving scenarios with human-annotated importance labels for vehicles and pedestrians. We additionally propose a novel approach that relies on counterfactual reasoning to estimate an object's importance. We generate counterfactual scenarios by modifying the motion of objects and ascribe importance based on how the modifications affect the ego vehicle's driving. Our approach outperforms strong baselines for the task of object importance estimation on HOIST. We also perform ablation studies to justify our design choices and show the significance of the different components of our proposed approach.
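The counterfactual scoring can be sketched as follows (all callables are hypothetical placeholders for the paper's simulator, driving policy, and cost; the actual method differs in its details):

```python
def object_importance(scene, objects, rollout_ego, modify_motion, driving_cost):
    """Score each object by how much perturbing its motion changes ego driving.

    rollout_ego: callable(scene) -> ego trajectory under the driving policy.
    modify_motion: callable(scene, obj) -> counterfactual scene with the
                   object's motion altered (e.g., removed or slowed).
    driving_cost: callable(trajectory) -> scalar safety/progress cost.
    """
    baseline = driving_cost(rollout_ego(scene))
    scores = {}
    for obj in objects:
        counterfactual = modify_motion(scene, obj)
        scores[obj] = abs(driving_cost(rollout_ego(counterfactual)) - baseline)
    return scores   # larger score -> more important object
```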
|
|
16:40-16:45, Paper ThET23.2 | |
3D Multi-Modal Object Detection Based on Cross-Attention Feature Fusion |
|
Jhong, Sin-Ye | Tamkang University |
Ho, Min-Hsuan | National Taiwan University of Science and Technology |
Lu, Si-Yu | National Taiwan University |
Chen, Yung-Yao | National Taiwan University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion
Abstract: In Advanced Driver Assistance Systems (ADAS), environmental perception and object detection are crucial for ensuring safe autonomous driving. Single-modality systems often struggle under adverse weather conditions, underscoring the need for multi-modal approaches. Current fusion methods typically rely on simplistic concatenation of multi-modal features, which neglects semantic alignment and does not fully exploit inter-modal correlations. This paper proposes a cross-attention feature fusion specifically designed to enhance the global correlation between camera and radar features. By dynamically adjusting feature weights through cross-attention, our approach significantly improves feature integration. Furthermore, we propose a depth-weighted voting fusion strategy to select the most accurate sensor depth, thereby enhancing decision-making stability. Experimental results on the nuScenes dataset show substantial improvements, with mean Average Precision (mAP) of 0.399 and mean Average Translation Error (mATE) of 0.602, highlighting the effectiveness of our approach in enhancing the robustness and accuracy of multi-modal fusion.
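A minimal PyTorch-style sketch of cross-attention fusion between camera and radar tokens (illustrative only; the paper's actual architecture, token construction, and depth-weighted voting are not reproduced):

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal camera-radar cross-attention fusion (illustrative only)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, camera_tokens, radar_tokens):
        # Camera features query the radar features, so each camera token is
        # re-weighted by its correlation with the radar modality.
        fused, _ = self.attn(query=camera_tokens, key=radar_tokens, value=radar_tokens)
        return self.norm(camera_tokens + fused)   # residual connection

# Example: batch of 2, 100 camera tokens and 60 radar tokens of width 256.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 60, 256))
```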
|
|
16:45-16:50, Paper ThET23.3 | |
Multi-Modality Test-Time Adaptation for Semantic Segmentation in Robotic Perception |
|
Liu, Yan | Sun Yat-Sen Univerisity |
Zhu, Hongyuan | A*STAR |
Zhang, Ye | Sun Yat-Sen University |
Lei, Yinjie | Sichuan University |
Guo, Yulan | Sun Yat-Sen University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Automation, Sensor Fusion
Abstract: Test-Time Adaptation (TTA) adjusts pre-trained models among unlabeled unseen environments during the test phase, making it more practical for robotic applications. However, the constant changes of the physical world create significant domain gaps between the received data during robot deployment and the source data used for training. In addition, existing methods mainly focus on a single modality, e.g., RGB images, limiting the application of these methods in multi-modality input scenarios. In this work, we propose a Deep Multi-modality Aggregation Test-time Adaptation (DMATA) method to address the above-mentioned issues. To prevent the domain shifts from disrupting the adaptation process, we first propose a Momentum-based Teacher-Student (MTS) framework. Since the teacher model and the student model contain complementary information, we design an Uncertainty-Guide (UG) feature fusion block to fuse the teacher model and student model of each modality. Finally, we introduce a 3D-Guide-2D (3G2) feature fusion block to extract spatial information from RGB images. In this way, 2D feature extraction is enhanced.
|
|
16:50-16:55, Paper ThET23.4 | |
MDC-Seg: Multi-Directional Convolution-Based Semantic Segmentation for LiDAR Point Clouds |
|
Ouyang, Xin | Northeastern University |
Qian, Xiaolong | Northeastern University, China |
Zhang, Yunzhou | Northeastern University |
Shen, You | Northeastern University |
Wang, Guiyuan | Jiangsu Shuguang Optoelectronics Co., Ltd., Yangzhou, China |
Liu, Wei | Jiangsu Shuguang Optoelectronics Co., Ltd., Yangzhou, China |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: LiDAR point cloud 3D semantic segmentation enables efficient and accurate environmental sensing for intelligent vehicles and autonomous robots, greatly advancing these domains. Existing advanced methods using 3D sparse convolution often suffer from a small Effective Receptive Field (ERF), limiting context sensing and challenging high-performance segmentation. Building on this observation, we propose MDC-Seg for efficient ERF enlargement. We design Multi-directional Convolution (MDConv), which simultaneously performs sparse feature encoding on the Bird's Eye View (BEV) and Range View (RV) planes to enlarge the ERF of 3D sparse convolution. To enhance feature fusion in MDConv, we introduce an attention mechanism and design an efficient multi-feature fusion (EMFF) module suitable for both 3D and 2D sparse features. To improve segmentation accuracy, we design a point-voxel constraint (PVC) module to handle edge voxels containing multiple point cloud categories, optimizing the final inference results. These modules add minimal memory and inference time but significantly improve performance compared to the baseline. Extensive experiments on the SemanticKITTI benchmark achieve excellent performance, while supplementary experiments on nuScenes also yield good results, demonstrating the superiority of MDC-Seg. The source code is available at https://github.com/OYgreat-river/MDC-Seg.
|
|
16:55-17:00, Paper ThET23.5 | |
Illumination Adaptation for SAM to Achieve Accurate Segmentation of Images Taken in Low-Light Scenes |
|
Mu, Hongmin | Beijing University of Chemical Technology |
Zhou, MengChu | New Jersey Institute of Technology |
Cao, Zhengcai | Harbin Institute of Technology |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Achieving accurate segmentation in low-light scenes is challenging due to 1) severe domain shift encountered when models trained on daylight data are applied to such scenes and 2) lack of large-scale fine-grained labels in low-light conditions. A natural idea is to use the generalization capabilities of segmentation foundation models like the Segment Anything Model (SAM) to address the scarcity of annotated data. However, applying SAM to low-light scenes faces a severe domain shift issue due to the lack of inductive bias in effectively transforming low-light features into natural-light ones. To address this issue, we propose to adapt SAM for low-light scenes. To reduce the reliance on labels of low-light data, we develop a self-training method that makes SAM generate source-free predictions. To reduce the domain gap between low-light target data and SAM's natural-light trained data, we design a transformation head that enhances low-light features prior to the application of SAM. We further propose a domain shift compensation loss that trains our model to select a domain-adaptation-optimal illumination-enhanced feature map. Experimental results demonstrate that our method clearly outperforms the state of the art on the Dark Zurich and Nighttime Driving datasets.
|
|
17:00-17:05, Paper ThET23.6 | |
4DRadDet: Cluster-Queried Enhanced 3D Object Detection with 4D Radar |
|
Weng, Caien | Tongji University |
Bi, Xin | College of Automotive Studies,Tongji University |
Tong, Panpan | Tongji University |
Eichberger, Arno | Graz University of Technology |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Computer Vision for Automation
Abstract: 3D object detection plays a critical role in advancing autonomous driving technology. To improve perception capabilities while maintaining low costs and ensuring performance in adverse weather conditions, 4D radar has emerged as a promising alternative for 3D object detection. However, current methods fail to fully exploit raw data and density information of 4D radar point clouds to tackle challenges like sparse data and noise. To address these limitations and make use of the unique Doppler velocity information provided by 4D radar, we propose a novel approach called 4DRadDet, which uses cross-attention fusion with cluster-queried techniques for 3D object detection. The 4DRadDet model uses a specially designed incremental clustering method to cluster potential object point clouds, reducing measurement errors from limited radar angular resolution and signal multipath effects. The cross-attention feature fusion (CAFF) module enhances network performance by querying the clustered point cloud feature map, allowing the network to leverage reliable prior information from the clustered point cloud to better detect potential objects. Our experimental evaluations on the View-of-Delft (VoD) dataset demonstrate the effectiveness of 4DRadDet, showcasing state-of-the-art performance. Specifically, 4DRadDet achieves a 3D mean average precision (mAP3D) of 51.44% and a bird's-eye view mean average precision (mAPBEV) of 57.07%. Our proposed method demonstrates impressive inference times and achieves real-time detection capabilities.
|
|
17:05-17:10, Paper ThET23.7 | |
Robust Visual Localization System with HD Map Based on Joint Probabilistic Data Association |
|
Gu, Zizhen | Harbin Institute of Technology |
Cheng, Shaowu | Harbin Institute of Technology |
Wang, Chuan | Harbin Institute of Technology |
Wang, Ruihan | Harbin Institute of Technology |
Zhao, Yong | Harbin Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, Localization
Abstract: Localization based on a high-definition (HD) map is a pivotal technology for autonomous driving. Nonetheless, establishing precise data association (DA) between detected landmarks and map landmarks presents a formidable challenge when leveraging prior information on maps. Traditional DA algorithms relying on nearest-neighbor methods only partially mitigate the ambiguity in DA caused by missed or false detections from the perception module, especially in complex and challenging environments. In this letter, we propose a novel joint probability data association (JPDA) algorithm. By integrating joint probability encompassing semantic likelihood, local spatial likelihood, and global structural likelihood of landmarks, alongside incorporating inter-frame temporal continuity of DA, the proposed algorithm can effectively rectify the erroneous DA. Additionally, we also introduce a max-mixture factor graph optimization framework, which couples the measurements of landmarks and odometry for pose estimation. Building upon these methods, a high-precision and robust visual semantic localization system employing consumer-level sensors has been developed. Experiments conducted on public datasets and real urban roads validate the efficacy of the proposed system in providing more robust and accurate localization results for autonomous driving vehicles.
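A simplified sketch of combining the three likelihood terms into a joint association score (the likelihood functions are placeholders for the paper's semantic, local spatial, and global structural terms; the temporal-continuity term and max-mixture factor-graph optimization are omitted):

```python
import numpy as np

def associate_landmarks(detections, map_landmarks,
                        semantic_lik, spatial_lik, structural_lik,
                        min_log_lik=-10.0):
    """Assign each detected landmark to a map landmark by joint likelihood.

    semantic_lik, spatial_lik, structural_lik: callables mapping a
    (detection, landmark) pair to a probability in (0, 1].
    Returns {detection index: map landmark or None (rejected association)}.
    """
    associations = {}
    for i, det in enumerate(detections):
        log_liks = [np.log(semantic_lik(det, lm))
                    + np.log(spatial_lik(det, lm))
                    + np.log(structural_lik(det, lm))
                    for lm in map_landmarks]
        best = int(np.argmax(log_liks))
        # Reject ambiguous or false detections whose joint likelihood is too low.
        associations[i] = map_landmarks[best] if log_liks[best] > min_log_lik else None
    return associations
```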
|
|
17:10-17:15, Paper ThET23.8 | |
SALON: Self-Supervised Adaptive Learning for Off-Road Navigation |
|
Sivaprakasam, Matthew | Carnegie Mellon University |
Triest, Samuel | Carnegie Mellon University |
Ho, Cherie | Carnegie Mellon University |
Aich, Shubhra | Carnegie Mellon University Robotics Institute |
Lew, Jeric Jieyi | National University of Singapore |
Adu, Isaiah | Pennsylvania State University |
Wang, Wenshan | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Vision-Based Navigation, Learning from Experience, Field Robots
Abstract: Autonomous robot navigation in off-road environments presents a number of challenges due to its lack of structure, making it difficult to handcraft robust heuristics for diverse scenarios. While learned methods using hand labels or self-supervised data improve generalizability, they often require a tremendous amount of data and can be vulnerable to domain shifts. To improve generalization in novel environments, recent works have incorporated adaptation and self-supervision to develop autonomous systems that can learn from their own experiences online. However, current works often rely on significant prior data, for example minutes of human teleoperation data for each terrain type, which is difficult to scale with more environments and robots. To address these limitations, we propose SALON, a perception-action framework for fast adaptation of traversability estimates with minimal human input. SALON rapidly learns online from experience while avoiding out of distribution terrains to produce adaptive and risk-aware cost and speed maps. Within seconds of collected experience, our results demonstrate comparable navigation performance over kilometer-scale courses in diverse off-road terrain as methods trained on 100-1000x more data. We additionally show promising results on significantly different robots in different environments. Our code is available at https://theairlab.org/SALON.
|
| |