Last updated on September 8, 2025. This conference program is tentative and subject to change.
Technical Program for Tuesday October 21, 2025
|
TuAT1 |
401 |
Award Finalists 1 |
Regular Session |
Co-Chair: Liu, Lu | City University of Hong Kong |
|
10:30-10:35, Paper TuAT1.1 | |
Neural MP: A Neural Motion Planner |
|
Dalal, Murtaza | Carnegie Mellon University |
Yang, Jiahui | Carnegie Mellon University |
Mendonca, Russell | Carnegie Mellon University |
Khaky, Youssef | Carnegie Mellon University |
Salakhutdinov, Ruslan | University of Toronto |
Pathak, Deepak | Carnegie Mellon University |
Keywords: Big Data in Robotics and Automation, Data Sets for Robot Learning, Machine Learning for Robot Control
Abstract: The current paradigm for motion planning generates solutions from scratch for every new problem, which consumes significant amounts of time and computational resources. For complex, cluttered scenes, motion planning approaches can often take minutes to produce a solution, while humans are able to accurately and safely reach any goal in seconds by leveraging their prior experience. We seek to do the same by applying data-driven learning at scale to the problem of motion planning. Our approach builds a large number of complex scenes in simulation, collects expert data from a motion planner, then distills it into a reactive neural policy. We then combine this with lightweight optimization to obtain a safe path for real-world deployment. We perform a thorough evaluation of our method on 64 motion planning tasks across four diverse environments with randomized poses, scenes, and obstacles, in the real world, demonstrating improvements of 23%, 17%, and 79% in motion planning success rate over state-of-the-art sampling-based, optimization-based, and learning-based planning methods. All code, models, and datasets will be released on acceptance. Video results are available at mihdalal.github.io/neuralmotionplanner.
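The two-stage recipe described above (distill expert planner data into a reactive policy, then clean up its output with lightweight optimization) can be sketched as follows; `policy` and `collision_fn` are hypothetical stand-ins, and the shortcut smoother is a generic choice standing in for the paper's unspecified optimization step.

```python
import numpy as np

def is_segment_free(q_a, q_b, collision_fn, resolution=0.05):
    """Densely interpolate a joint-space segment; reject it if any sample collides."""
    n = max(2, int(np.linalg.norm(q_b - q_a) / resolution))
    return all(not collision_fn(q_a + t * (q_b - q_a)) for t in np.linspace(0.0, 1.0, n))

def rollout_policy(policy, q_start, goal, horizon=100):
    """Roll out a reactive neural policy that outputs small joint-space deltas."""
    q = np.asarray(q_start, dtype=float)
    path = [q]
    for _ in range(horizon):
        q = q + policy(q, goal)
        path.append(q)
    return path

def shortcut_smooth(path, collision_fn, iters=200, seed=0):
    """Lightweight post-hoc optimization: replace sub-paths with straight,
    collision-checked segments to shorten and de-noise the rollout."""
    rng = np.random.default_rng(seed)
    path = list(path)
    for _ in range(iters):
        i, j = sorted(rng.integers(0, len(path), size=2))
        if j - i > 1 and is_segment_free(path[i], path[j], collision_fn):
            path = path[: i + 1] + path[j:]
    return path
```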
|
|
10:35-10:40, Paper TuAT1.2 | |
Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller |
|
Bi, Zhihai | Hong Kong University of Science and Technology (Guangzhou) |
Chen, Kai | The Hong Kong University of Science and Technology |
Zheng, Chunxin | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Yulin | Hong Kong University of Science and Technology (HKUST) |
Li, Haoang | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Mobile Manipulation, Autonomous Vehicle Navigation, Collision Avoidance
Abstract: Interactive navigation is crucial in scenarios where proactively interacting with objects can yield shorter paths, thus significantly improving traversal efficiency. While existing methods primarily rely on body-velocity-based pushing for relocation of large obstacles (which could be comparable to the size of a robot), they prove ineffective in narrow or constrained spaces where the robot's dimensions restrict its manipulation capabilities, thus compromising clearance and overall navigation performance. This paper introduces a novel interactive navigation framework for legged manipulators, featuring an active arm-pushing mechanism for effective obstacle clearance. The framework enables the robot to dynamically detect and reposition movable obstacles in space-constrained environments, leading to more feasible and efficient navigation paths. At the core of this framework, we develop a reinforcement learning-based arm-pushing controller with a two-stage reward strategy for large-object manipulation. Specifically, this strategy progressively refines the pushing behavior, first guiding the manipulator to a pre-pushing zone for a kinematically feasible contact configuration, and then maintaining end-effector positioning at appropriate contact points for stable object displacement while preventing toppling. Through simulations and ablation studies, we validate the robustness of the arm-pushing controller, showing that the two-stage reward strategy improves policy convergence and long-term performance in large-object manipulation. Real-world experiments further underscore its effectiveness in confined environments with movable obstacles, achieving shorter paths and reduced traversal time. The open-source project can be found at https://github.com/Zhihaibi/Interactive-Navigation-for-legged-manipulator.git.
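The two-stage reward strategy above can be illustrated with a minimal sketch; the exact terms and gains are not given in the abstract, so the signals and weights below are assumptions.

```python
import numpy as np

def two_stage_pushing_reward(ee_pos, prepush_pos, contact_pos, obj_tilt,
                             reached_prepush, w_dist=1.0, w_tilt=0.5):
    """Progressive shaping: first guide the end effector into a pre-pushing
    zone, then hold it at the contact point while penalizing object tilt
    (toppling). All signals and weights are illustrative assumptions."""
    if not reached_prepush:
        # Stage 1: approach a kinematically feasible pre-pushing pose.
        return -w_dist * float(np.linalg.norm(ee_pos - prepush_pos))
    # Stage 2: maintain the contact point and keep the object upright.
    return -w_dist * float(np.linalg.norm(ee_pos - contact_pos)) - w_tilt * abs(obj_tilt)
```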
|
|
10:40-10:45, Paper TuAT1.3 | |
Implicit Disparity-Blur Alignment for Fast and Precise Autofocus in Robotic Microsurgical Imaging |
|
Fu, Pan | Beijing Institute of Technology |
Li, Zhen | Institute of Automation, Chinese Academy of Sciences |
Zhang, Ming-Yang | Institute of Automation, Chinese Academy of Sciences |
Zhai, Yu-Peng | Taiyuan University of Technology |
Wang, Junzheng | Beijing Institute of Technology |
He, Wenhao | Institute of Automation, Chinese Academy of Sciences |
Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Servoing
Abstract: Creating an intelligent surgical environment requires not only advanced robotic systems but also optimized microscopic imaging. However, autofocus remains a fundamental challenge, with current methods suffering from slow iterative processes or directional ambiguity, which compromises real-time performance. This paper presents an implicit disparity-blur alignment approach for robotic microsurgical autofocus, integrating stereo geometry’s monotonic depth cues with defocus characteristics for rapid convergence. A novel physics-guided dual-stream network is developed to encode implicit depth representations through hierarchical cross-pathway feature fusion, enabling reliable focus prediction without explicit stereo matching in blur-degraded regions. An ROI-aware attention module is proposed to dynamically optimize focus-critical regions, coupled with learnable physics-guided kernel learning for precise Z-offset estimation. The approach achieves a top directional accuracy of 94.85% and a single-pass focus error of 0.20 mm with an inference time of 53 ms on a surgical dataset, which outperforms state-of-the-art methods in reducing iteration count by 22.8% and inference time by 51.8%. An intelligent robotic microscope prototype is developed, with validation through ex vivo tests demonstrating its ability to enable fast and precise multi-region focusing for microsurgeries.
|
|
10:45-10:50, Paper TuAT1.4 | |
Accelerating Layered Manufacturing-Based 3D Printing through Optimized Non-Printing Travel-Path Planning and Infill Strategies |
|
Wang, Liuyin | University of Nevada, Reno |
Hua, Weijian | University of Nevada, Reno |
Jin, Yifei | University of Nevada Reno |
Shen, Yantao | University of Nevada, Reno |
Keywords: Additive Manufacturing, Intelligent and Flexible Manufacturing
Abstract: This paper presents a rapid 3D printing framework that enhances the efficiency of commercially available layered manufacturing-based 3D printers. Unlike traditional methods that simplify printing regions to a single point or rely on predefined entry and exit points, our approach utilizes an improved Traveling Salesman Problem (TSP) algorithm to autonomously generate an optimized, cyclic printing path, while automatically assigning entry and exit points for each region. This minimizes non-printing paths and improves efficiency. Additionally, we propose a principal axis calculation method for irregular shapes, aligning better with geometric orientation. This optimization enhances infill uniformity and surface smoothness. Simulations and experimental results demonstrate that the proposed framework improves printing efficiency while maintaining print quality, with promising applicability to large-scale and complex 3D printing models.
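As a point of reference for the improved TSP formulation above, a plain cyclic tour over printing-region anchor points (nearest-neighbor construction plus 2-opt refinement) is sketched below; the paper's improved algorithm and its automatic entry/exit-point assignment are not reproduced here.

```python
import numpy as np

def tour_length(order, pts):
    """Total length of the cyclic tour visiting pts in the given order."""
    return sum(np.linalg.norm(pts[order[i]] - pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(pts):
    """Greedy initial cyclic tour over printing-region anchor points."""
    unvisited, order = set(range(1, len(pts))), [0]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(pts[j] - pts[last]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def two_opt(order, pts):
    """2-opt improvement: reverse tour segments while that shortens the cycle."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j][::-1] + order[j:]
                if tour_length(cand, pts) < tour_length(order, pts):
                    order, improved = cand, True
    return order
```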
|
|
10:50-10:55, Paper TuAT1.5 | |
Designing a Magnetic Endoscope for in Vivo Contact-Based Tissue Scanning Using Developable Roller |
|
Greenidge, Nikita Jasmine | University of Leeds |
Marzi, Christian | Karlsruhe Institute of Technology |
Calmé, Benjamin | STORM Lab, School of Electronic and Electrical Engineering, Univ |
Martin, James William | University of Leeds |
Scaglioni, Bruno | University of Leeds |
Mathis-Ullrich, Franziska | Friedrich-Alexander-University Erlangen-Nurnberg (FAU) |
Valdastri, Pietro | University of Leeds |
Keywords: Medical Robots and Systems, Sensor-based Control, Underactuated Robots
Abstract: Magnetic manipulation has been adopted as a method of actuation in both wireless capsule endoscopy and soft-tethered endoscopy, with the goal of improving gastrointestinal procedures. However, by the nature of magnetic manipulation, these endoscopes are typically limited to a maximum of five degrees of freedom (DoF). With the need to introduce additional contact-based sensing modalities for subsurface investigation into these systems, as well as to improve overall dexterity, it is both practically and clinically beneficial to recover the lost DoF, i.e., the roll around the main axis. This paper presents a method of achieving the magnetic manipulation of an underactuated device by leveraging developable surfaces, specifically the oloid shape. The design of a clinically relevant magnetic endoscope with all its ancillary elements, as well as contact sensors, is proposed and demonstrated in vivo. The contact sensor data from the in vivo experiments show that for sweeping motions over 100° of roll, contact between the endoscope's sensor region and the colon wall can be maintained for 74% of the motion.
|
|
10:55-11:00, Paper TuAT1.6 | |
PhysGCN-DL: Physics-Informed Graph Convolutional Networks with Diversity-Aware Loss Optimization for Multimodal Pedestrian Trajectory Prediction |
|
Jiang, ZiHan | Tongji University |
Liu, Ruonan | Shanghai Jiao Tong University |
Zhou, Yibo | Shanghai Jiao Tong University |
Lu, Haibo | Peng Cheng Laboratory |
Yang, Boyuan | Nanjing University |
Lin, Di | Tianjin University |
Zhang, Weidong | Shanghai Jiao Tong University |
Keywords: Intelligent Transportation Systems, Deep Learning Methods, AI-Based Methods
Abstract: Pedestrian trajectory prediction is essential for safe navigation in autonomous driving and intelligent robotics. Existing methods, including physical models, machine learning, and deep learning, have shown promising results but still face challenges in handling dynamic environments, social interactions, and high-dimensional data. In this paper, we propose a novel PhysGCN-DL model within the iTransformer framework to address these challenges. Our model incorporates physically inspired dynamic interaction modeling by representing physical interactions between pedestrians as edge weights in graph convolution. This approach captures the heterogeneity of pedestrian movement and improves the interpretability of social interactions. Additionally, we introduce a new loss function that balances diversity and accuracy in trajectory prediction, ensuring robustness in dense and sparse environments. Experimental results demonstrate the superiority of our model in predicting diverse and accurate pedestrian trajectories compared to existing methods.
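The loss balancing diversity and accuracy is not spelled out in the abstract; a common construction with that property is a best-of-K accuracy term plus a pairwise-distance diversity bonus, sketched here in PyTorch (shapes and weights are hypothetical):

```python
import torch

def diversity_aware_loss(preds, gt, div_weight=0.1):
    """preds: (K, T, 2) candidate trajectories, gt: (T, 2) ground truth.
    Accuracy term: best-of-K displacement error (keeps the closest mode accurate).
    Diversity term: mean pairwise distance between candidates (kept large)."""
    errors = torch.norm(preds - gt.unsqueeze(0), dim=-1).mean(dim=-1)  # (K,)
    accuracy = errors.min()
    pairwise = torch.cdist(preds.flatten(1), preds.flatten(1))         # (K, K)
    K = preds.shape[0]
    diversity = pairwise.sum() / (K * (K - 1))
    return accuracy - div_weight * diversity
```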
|
|
11:00-11:05, Paper TuAT1.7 | |
Reinforcement Learning Assist-As-Needed Control Promotes Recovery of Walking Speed Following Ankle Weight Perturbations |
|
Li, Andy | Stevens Institute of Technology |
Li, Haoran | Stevens Institute of Technology |
Teker, Aytac | Stevens Institute of Technology |
Hernandez-Rocha, Mariana | Stevens Institute of Technology |
Gebre, Biruk | Stevens Institute of Technology |
Nolan, Karen J. | Kessler Foundation |
Pochiraju, Kishore | Stevens Institute of Technology |
Zanotto, Damiano | Stevens Institute of Technology |
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Wearable Robotics
Abstract: Self-selected walking speed is a key outcome for exercise-based rehabilitation programs following lower-extremity trauma. This work introduces a novel reinforcement learning-based assist-as-needed (RL-AAN) controller for ankle exoskeletons, aimed at gait speed training. Built on an actor–critic architecture, the RL-AAN controller integrates a control objective that balances the trade-off between expected stride velocity (SV) errors and exoskeleton assistance. This approach allows the exoskeleton to progressively reduce ankle plantar- and dorsiflexion (PDF) assistance as the user’s performance improves, promoting active participation. The desired assistive torque is computed as the product of the actor output and the wearer’s biomechanical ankle PDF moment, estimated by a subject-agnostic model, thereby ensuring personalized and biomechanically relevant assistance. In a proof-of-concept study with healthy individuals walking on a self-paced treadmill with ankle weights, the RL-AAN controller outperformed a conventional Fixed-K controller—achieving greater immediate speed increases during assisted walking (14.2% vs. 10.0% relative to unassisted perturbed walking) and inducing short-term gait speed adaptation post-training, not observed with the conventional controller. These findings highlight the potential of RL-AAN control for subject-tailored gait training, with promising clinical implications for exercise-based rehabilitation in individuals with neurological or musculoskeletal gait impairments.
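The control objective sketched in the abstract (trade off stride-velocity error against assistance, and scale the assistive torque by an estimated biomechanical moment) can be illustrated as follows; the weights and signal names are assumptions, not the paper's implementation.

```python
def assistive_torque(actor_output, estimated_pdf_moment):
    """Desired torque: actor gain scaled by the wearer's estimated ankle
    plantar/dorsiflexion (PDF) moment from a subject-agnostic model."""
    return actor_output * estimated_pdf_moment

def aan_reward(stride_velocity, target_velocity, assistance,
               w_err=1.0, w_assist=0.1):
    """Assist-as-needed trade-off: penalize stride-velocity error and,
    jointly, the amount of assistance, so support fades as the wearer
    improves. Weights are illustrative assumptions."""
    return -w_err * (stride_velocity - target_velocity) ** 2 \
           - w_assist * assistance ** 2
```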
|
|
11:05-11:10, Paper TuAT1.8 | |
Autonomous Hiking Trail Navigation Via Semantic Segmentation and Geometric Analysis |
|
Reed, Camndon | West Virginia University |
Arend Tatsch, Christopher Alexander | West Virginia University |
Gross, Jason | West Virginia University |
Gu, Yu | West Virginia University |
Keywords: Field Robots, Vision-Based Navigation, Robotics and Automation in Agriculture and Forestry
Abstract: Natural environments pose significant challenges for autonomous robot navigation, particularly due to their unstructured and ever-changing nature. Hiking trails, with their dynamic conditions influenced by weather, vegetation, and human traffic, represent one of these challenges. This work introduces a novel approach to autonomous hiking trail navigation that balances trail adherence with the flexibility to adapt to off-trail routes when necessary. The solution is a Traversability Analysis module that integrates semantic data from camera images with geometric information from LiDAR to create a comprehensive understanding of the surrounding terrain. A planner uses this traversability map to navigate safely, adhering to trails while allowing off-trail movement when necessary to avoid on-trail hazards or for safe off-trail shortcuts. The method is evaluated through simulation to determine the balance between semantic and geometric information in traversability estimation. These simulations tested various weights to assess their impact on navigation performance across different trail scenarios. Weights were then validated through autonomous field tests at the West Virginia University Core Arboretum, demonstrating the method's effectiveness in a real-world environment.
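A minimal sketch of the semantic/geometric fusion described above, assuming per-cell cost grids and a single blend weight of the kind the simulations sweep over (the paper's exact fusion rule may differ):

```python
import numpy as np

def traversability_map(semantic_cost, geometric_cost, w_semantic=0.5):
    """Fuse per-cell semantic cost (from segmented camera images) with
    geometric cost (from LiDAR, e.g., slope/roughness) into a single
    traversability grid; w_semantic is the blend weight being tuned."""
    assert semantic_cost.shape == geometric_cost.shape
    fused = w_semantic * semantic_cost + (1.0 - w_semantic) * geometric_cost
    return np.clip(fused, 0.0, 1.0)
```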
|
|
TuAT2 |
402 |
Mobile Manipulation 1 |
Regular Session |
|
10:30-10:35, Paper TuAT2.1 | |
GeT-USE: Learning Generalized Tool Usage for Bimanual Mobile Manipulation Via Simulated Embodiment Extensions |
|
Wu, Bohan | Stanford University |
de La Sayette, Paul | Stanford |
Fei-Fei, Li | Stanford University |
Martín-Martín, Roberto | University of Texas at Austin |
Keywords: Mobile Manipulation, Bimanual Manipulation, Deep Learning in Grasping and Manipulation
Abstract: The ability to use random objects as tools in a generalizable manner is a missing piece in robots' intelligence today to boost their versatility and problem-solving capabilities. State-of-the-art robotic tool usage methods have focused on procedurally generating or crowd-sourcing datasets of tools for a task to learn how to grasp and manipulate them for that task. However, these methods assume that only one object is provided and that it is possible, with the correct grasp, to perform the task; they are not capable of identifying, grasping, and using the best object for a task when many are available, especially when the optimal tool is absent. In this work, we propose GeT-USE, a two-step procedure that learns to perform real-robot generalized tool usage by first learning to extend the robot's embodiment in simulation and then transferring the learned strategies to real-robot visuomotor policies. Our key insight is that by exploring a robot's embodiment extensions (i.e., building new end-effectors) in simulation, the robot can identify the general tool geometries most beneficial for a task. This learned geometric knowledge can then be distilled to perform generalized tool usage tasks by selecting and using the best available real-world object as a tool. On a real robot with 22 degrees of freedom (DOFs), GeT-USE outperforms state-of-the-art methods by 30-60% in success rate across three vision-based bimanual mobile manipulation tool-usage tasks under unseen real-world objects, environments, and lighting conditions.
|
|
10:35-10:40, Paper TuAT2.2 | |
Globally-Guided Geometric Fabrics for Reactive Mobile Manipulation in Dynamic Environments |
|
Merva, Tomas | Technical University of Kosice |
Bakker, Saray | Delft University of Technology |
Spahn, Max | TU Delft |
Zhao, Danning | Delft University of Technology (TU Delft) |
Virgala, Ivan | Technical University of Košice |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Mobile Manipulation, Constrained Motion Planning, Collision Avoidance
Abstract: Mobile manipulators operating in dynamic environments shared with humans and robots must adapt in real time to environmental changes to complete their tasks effectively. While global planning methods are effective at considering the full task scope, they lack the computational efficiency required for reactive adaptation. In contrast, local planning approaches can be executed online but are limited by their inability to account for the full task’s duration. To tackle this, we propose Globally-Guided Geometric Fabrics (G3F), a framework for real-time motion generation along the full task horizon, by interleaving an optimization-based planner with a fast reactive geometric motion planner, called Geometric Fabrics (GF). The approach adapts the path and explores a multitude of acceptable target poses, while accounting for collision avoidance and the robot’s physical constraints. This results in a real-time adaptive framework considering whole-body motions, where a robot operates in close proximity to other robots and humans. We validate our approach through various simulations and real-world experiments on mobile manipulators in multi-agent settings, achieving improved success rates compared to vanilla GF, Prioritized Rollout Fabrics and Model Predictive Control.
|
|
10:40-10:45, Paper TuAT2.3 | |
PlaceNet: Obstacle Aware Mobile Manipulator Base Placement through Deep Learning |
|
Navarro, Alex | University of Texas at Austin |
Pryor, Mitchell | University of Texas |
Keywords: Mobile Manipulation, Deep Learning in Grasping and Manipulation, Task and Motion Planning
Abstract: In this work, we present PlaceNet: a deep learning framework for mobile manipulator base placement which provides solutions to the shortcomings common in the state-of-the-art. Our method addresses the lack of obstacle awareness of reachability methods and the limited generalization of learning methods. Using only the raw pointcloud and task pose data as input, PlaceNet learns the concepts of reachability and obstacle occlusions in an environment-independent manner, enabling its use in situations outside its training experiences. Tests comparing PlaceNet to inverse reachability and heuristic methods demonstrated state-of-the-art performance in both the In-Distribution and Out-Of-Distribution test sets, achieving as high as 98% success rate for problems with many solutions, and an 82% success rate overall. PlaceNet can be trained on grounded pointcloud data from any source without the need for dynamic simulation, marking it as an accessible alternative to similar frameworks which require expensive, high-performance GPUs for running simultaneous simulation and training or which depend on labor intensive data collection. PlaceNet is lightweight during deployment and can easily run with low latency on affordable hardware, including laptop GPUs and the NVIDIA Jetson line for embedded deployment.
|
|
10:45-10:50, Paper TuAT2.4 | |
Benchmarking Long-Horizon Mobile Manipulation in Multi-Room Dynamic Environments |
|
Zhang, Junbo | Tsinghua University |
Ma, Kaisheng | Tsinghua University |
Keywords: Mobile Manipulation, Deep Learning Methods, Visual Learning
Abstract: Long-horizon reasoning and task execution are crucial for complex mobile manipulation tasks in household environments. Existing benchmarks and methods primarily focus on single-room or single-object mobile manipulation scenarios, limiting the scope of long-horizon planning and scene-level understanding. To address this gap, we introduce a novel benchmark for long-horizon mobile manipulation in multi-room household environments. Our task requires agents to follow a sequence of language instructions, each directing the movement of specific objects across receptacles and rooms. In this task, we investigate the role of long-term memory by constructing a hierarchical scene graph that captures the relationships between objects, furniture, and rooms. This scene graph-based memory is dynamically updated as the agent explores the environment, which effectively aligns the scene information with the targets and environmental context specified in the language instructions. Additionally, we benchmark the proposed task in dynamic environments where objects can be relocated during task execution, simulating real-world scenarios. Our results demonstrate that the scene graph-based memory significantly improves the agent's performance in long-horizon mobile manipulation tasks. Moreover, dynamically updating the state of objects within the scene graph enables the agent to better adapt to dynamic conditions.
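A minimal sketch of the hierarchical scene-graph memory described above (rooms contain furniture, furniture contains objects), including the dynamic update when an object is observed at a new location; all class and method names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                      # "room" | "furniture" | "object"
    children: dict = field(default_factory=dict)

class SceneGraphMemory:
    """Hierarchical room -> furniture -> object memory, updated as the
    agent observes (object, receptacle, room) triples during exploration."""
    def __init__(self):
        self.rooms = {}

    def update(self, obj, receptacle, room):
        r = self.rooms.setdefault(room, Node(room, "room"))
        f = r.children.setdefault(receptacle, Node(receptacle, "furniture"))
        # Remove stale placements elsewhere (the object may have been relocated).
        for other in self.rooms.values():
            for furn in other.children.values():
                furn.children.pop(obj, None)
        f.children[obj] = Node(obj, "object")

    def locate(self, obj):
        """Return (room, furniture) where the object was last seen, or None."""
        for room in self.rooms.values():
            for furn in room.children.values():
                if obj in furn.children:
                    return room.name, furn.name
        return None
```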
|
|
10:50-10:55, Paper TuAT2.5 | |
Manipulate-To-Navigate: Reinforcement Learning with Visual Affordances and Manipulability Priors |
|
Zhang, Yuying | Aalto University |
Pajarinen, Joni | Aalto University |
Keywords: Mobile Manipulation, Legged Robots, Reinforcement Learning
Abstract: Mobile manipulation in dynamic environments is challenging due to movable obstacles blocking the robot's path. Traditional methods, which treat navigation and manipulation as separate tasks, often fail in such "manipulate-to-navigate" scenarios, as obstacles must be removed before navigation. In these cases, active interaction with the environment is required to clear obstacles while ensuring sufficient space for movement. To address the manipulate-to-navigate problem, we propose a reinforcement learning-based approach for learning manipulation actions that facilitate subsequent navigation. Our method combines manipulability priors, which focus the robot on high-manipulability body positions, with affordance maps for selecting high-quality manipulation actions. By focusing on feasible and meaningful actions, our approach reduces unnecessary exploration and allows the robot to learn manipulation strategies more effectively. We present two new manipulate-to-navigate simulation tasks, called Reach and Door, with the Boston Dynamics Spot robot. The first task tests whether the robot can select a good hand position in the target area such that the robot base can move effectively forward while keeping the end-effector position fixed. The second task requires the robot to move a door aside in order to clear the navigation path. Both tasks require manipulation first, followed by navigating the base forward. Results show that our method allows a robot to effectively interact with and traverse dynamic environments. Finally, we transfer the learned policy to a real Boston Dynamics Spot robot, which successfully performs the Reach task.
|
|
10:55-11:00, Paper TuAT2.6 | |
Dynamic Open-Vocabulary 3D Scene Graphs for Long-Term Language-Guided Mobile Manipulation |
|
Yan, Zhijie | Beihang University |
Li, Shufei | City University of Hong Kong |
Wang, Zuoxu | Beihang University |
Wu, Lixiu | Minzu University of China |
Wang, Han | Institute for AI Industry Research, Tsinghua University |
Zhu, Jun | Afanti |
Chen, Lijiang | Afanti Tech LLC |
Liu, Jihong | Beihang University |
Keywords: Mobile Manipulation, Long term Interaction, Semantic Scene Understanding
Abstract: Enabling mobile robots to perform long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently due to human-robot interactions or the robot's own actions. Traditional methods typically assume static scenes, which limits their applicability in the continuously changing real world. To overcome these limitations, we present DovSG, a novel mobile manipulation framework that leverages dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and utilizes vision-language models (VLMs) for object detection to obtain high-level object semantic features. Based on the segmented objects, a structured 3D scene graph is generated to capture low-level spatial relationships. Furthermore, an efficient mechanism for locally updating the scene graph allows the robot to adjust parts of the graph dynamically during interactions without the need for full scene reconstruction. This mechanism is particularly valuable in dynamic environments, enabling the robot to continually adapt to scene changes and effectively support the execution of long-term tasks. We validated our system in real-world environments with varying degrees of manual modifications, demonstrating its effectiveness and superior performance in long-term tasks. Our project page is available at https://bjhyzj.github.io/dovsg-web.
|
|
11:00-11:05, Paper TuAT2.7 | |
CushionCatch: A Compliant Catching Mechanism for Mobile Manipulators Via Combined Optimization and Learning |
|
Chen, Bingjie | Tsinghua |
Fan, Keyu | Tsinghua University |
Yang, Qi | Tsinghua University |
Cheng, Yi | Tsinghua University |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Dong, Kangkang | Jianghuai Advanced Technology Center |
Xia, Chongkun | Sun Yat-Sen University |
Han, Liang | Hefei University of Technology |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Mobile Manipulation, Manipulation Planning, Learning from Demonstration
Abstract: Catching a flying object through a cushioning process is a skill that humans routinely perform, yet it remains a significant challenge for robots. In this paper, we introduce a framework that combines optimization and learning to achieve compliant catching on mobile manipulators (CCMM). First, we propose a high-level catching planner for the mobile manipulator (MM) that computes the optimal catching point and joint configuration. Next, a pre-catching (PRC) planner ensures that the robot reaches the target joint configuration as quickly as possible. To learn a compliant catching policy, we propose a network that leverages the strengths of LSTMs in capturing temporal dependencies together with a positional encoding of spatial context (P-LSTM). This network is designed to effectively learn compliant strategies from human demonstrations. Following this, a post-catching (POC) planner tracks the compliant sequence output by the P-LSTM while avoiding potential collisions caused by the structural differences between humans and robots. We validate the framework in both simulated and real-world catching scenarios, achieving success rates of 98.70% in simulation and 92.59% in real-world tests, with a 28.7% reduction in impact torque. The open-source code will be released for the community.
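A minimal PyTorch sketch of the LSTM-plus-positional-encoding combination (P-LSTM) named above; the input/output dimensions and layer sizes are assumptions, not the paper's architecture:

```python
import math
import torch
import torch.nn as nn

class PLSTM(nn.Module):
    """LSTM over catching-phase sequences with sinusoidal positional encoding
    added to the spatial input features before recurrence."""
    def __init__(self, in_dim=7, hidden=128, out_dim=7, max_len=200):
        super().__init__()
        pe = torch.zeros(max_len, in_dim)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, in_dim, 2).float() * (-math.log(1e4) / in_dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div[: in_dim // 2])
        self.register_buffer("pe", pe)
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):                 # x: (batch, T, in_dim)
        x = x + self.pe[: x.shape[1]]     # inject spatial/temporal position
        h, _ = self.lstm(x)
        return self.head(h)               # compliant pose/torque sequence
```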
|
|
11:05-11:10, Paper TuAT2.8 | |
Predictive Reachability for Embodiment Selection in Mobile Manipulation Behaviors |
|
Feng, Xiaoxu | Osaka University |
Horii, Takato | Osaka University |
Nagai, Takayuki | Osaka University |
Keywords: Mobile Manipulation, Reinforcement Learning
Abstract: Mobile manipulators require coordinated control between navigation and manipulation to accomplish tasks. Typically, coordinated mobile manipulation behaviors consist of base navigation to approach the goal, followed by arm manipulation to reach the desired pose. The choice of embodiment between the base and arm can be determined based on reachability. Previous methods evaluate reachability by computing inverse kinematics and activate arm motions once solutions are identified. In this study, we introduce a new approach, called predictive reachability, that decides reachability based on predicted arm motions. Our model utilizes a hierarchical policy framework built upon a world model. The world model allows the prediction of future trajectories and the evaluation of reachability. The hierarchical policy selects the embodiment based on the predicted reachability and plans accordingly. Unlike methods that require prior knowledge about robots and environments for inverse kinematics, our method relies only on image-based observations. We evaluate our approach through basic reaching tasks across various environments. The results demonstrate that our method outperforms previous model-based approaches in both sample efficiency and performance, while enabling more reasonable embodiment selection based on predictive reachability.
|
|
TuAT3 |
403 |
In-Hand Manipulation |
Regular Session |
|
10:30-10:35, Paper TuAT3.1 | |
Learning Dexterous In-Hand Manipulation with Multifingered Hands Via Visuomotor Diffusion |
|
Koczy, Piotr | KTH Royal Institute of Technology |
Welle, Michael C. | KTH Royal Institute of Technology |
Kragic, Danica | KTH |
Keywords: In-Hand Manipulation, Imitation Learning, Multifingered Hands
Abstract: We present a framework for learning dexterous in-hand manipulation with multifingered hands using visuomotor diffusion policies. Our system enables complex in-hand manipulation tasks, such as unscrewing a bottle lid with one hand, by leveraging a fast and responsive teleoperation setup for the four-fingered Allegro Hand. We collect high-quality expert demonstrations using an augmented reality (AR) interface that tracks hand movements and applies inverse kinematics and motion retargeting for precise control. The AR headset provides real-time visualization, while gesture controls streamline teleoperation. To enhance policy learning, we introduce a novel demonstration outlier removal approach based on HDBSCAN clustering and the Global-Local Outlier Score from Hierarchies (GLOSH) algorithm, effectively filtering out low-quality demonstrations that could degrade performance. We evaluate our approach extensively in real-world settings and provide all experimental videos on the project website: https://dex-manip.github.io/
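The demonstration-filtering step maps naturally onto the hdbscan package, which exposes GLOSH outlier scores after clustering; the demonstration featurization and threshold below are assumptions:

```python
import numpy as np
import hdbscan

def filter_demonstrations(demo_features, score_threshold=0.8):
    """Cluster demonstration feature vectors with HDBSCAN and drop those
    whose GLOSH outlier score exceeds a threshold (likely low-quality demos).
    demo_features: (n_demos, n_features) array; threshold is an assumption."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=5).fit(demo_features)
    scores = clusterer.outlier_scores_        # GLOSH scores in [0, 1]
    keep = np.where(scores < score_threshold)[0]
    return keep, scores
```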
|
|
10:35-10:40, Paper TuAT3.2 | |
Wearable Roller Rings to Augment In-Hand Manipulation through Active Surfaces |
|
Webb, Hayden | Rice University |
Chanrungmaneekul, Podshara | Rice University |
Yuan, Shenli | The Boston Dynamics AI Institute |
Hang, Kaiyu | Rice University |
Keywords: In-Hand Manipulation, Multifingered Hands
Abstract: In-hand manipulation is a crucial ability for reorienting and repositioning objects within grasps. The main challenges in this are not only the complexity of the computational models, but also the risks of grasp instability caused by active finger motions, such as rolling, sliding, breaking, and remaking contacts. This paper presents the development of the Roller Ring (RR), a modular robotic attachment with active surfaces that is wearable by both robot and human hands to manipulate without lifting a finger. By installing the angled RRs on hands, such that their spatial motions are not colinear, we derive a general differential motion model for manipulating objects. Our motion model shows that complete in-hand manipulation skill sets can be provided by as few as only 2 RRs through non-holonomic object motions, while more RRs can enable enhanced manipulation dexterity with fewer motion constraints. Through extensive experiments, we test the RRs on both a robot hand and a human hand to evaluate their manipulation capabilities. We show that the RRs can be employed to manipulate arbitrary object shapes to provide dexterous in-hand manipulation.
|
|
10:40-10:45, Paper TuAT3.3 | |
Open-Loop Deep Reinforcement Learning Control of Soft Robotic In-Hand Manipulations |
|
Suske, Gabriel | Thales |
Pilch, Samuel | University of Stuttgart |
Beger, Artem | Festo SE & Co. KG |
Heidingsfeld, Julia Laura | University of Stuttgart |
Sawodny, Oliver | University of Stuttgart |
Keywords: In-Hand Manipulation, Modeling, Control, and Learning for Soft Robots, Reinforcement Learning
Abstract: In-hand manipulation tasks using hand-like robotic grippers offer a promising approach to accomplishing various tasks in a human-centered environment. Due to the inherent safety of soft robots, in-hand manipulations performed by soft robots provide great opportunities for future human-robot collaboration, which is the scope of this paper. By modeling a new and innovative soft-robotic gripper known as the Anthropomorphic Soft Gripper and synthesizing an open-loop controller with deep reinforcement learning, we demonstrate how the movement of objects by in-hand manipulation can be accomplished. Moreover, this work explores the application of deep reinforcement learning methods without the use of domain randomization. As noted by Bhatt et al., the inherent soft properties of soft robotic grippers enable remarkably robust in-hand manipulation under open-loop control, giving the impetus for the approach followed in this work. Motion sequences generated in simulation are successfully transferred to the real anthropomorphic soft gripper and validated in experiments.
|
|
10:45-10:50, Paper TuAT3.4 | |
Vibration-Induced Friction Modulation to Enable Controlled Sliding for In-Hand Manipulation |
|
Mane, Shambhuraj | Worcester Polytechnic Institute |
Jagetia, Anuj | Worcester Polytechnic Institute |
Naukudkar, Samruddhi | Worcester Polytechnic Institute |
Morgan, Andrew | The AI Institute |
Calli, Berk | Worcester Polytechnic Institute |
Keywords: In-Hand Manipulation, Grippers and Other End-Effectors, Dexterous Manipulation
Abstract: Achieving controlled sliding of objects on finger surfaces is a significant challenge for robots, substantially constraining their ability to perform complex in-hand manipulation tasks. In this work, we investigate the role of surface vibration in modulating the effective friction at the object-finger contact locations to facilitate controlled sliding. We demonstrate that friction at contact points can be reduced by applying targeted vibrations at specific locations on a robotic finger, creating regions that are suitable for sliding. In this way, we create sticking/sliding regions on finger surfaces on demand and can easily switch between sliding and rolling contacts. To investigate this phenomenon, we embedded an array of vibration modules into robotic fingers. We first analyzed the velocity fields created by surface vibrations on a single finger. Then, we developed a method to select the appropriate activation states of the modules that achieve the desired velocity field at a given object location. Utilizing these fingers and the vibration selection method, we formed a two-finger robotic hand and demonstrated controlled sliding and rotation of a held object within the hand. To the best of our knowledge, this is the first work that utilizes vibration-induced friction modulation for in-hand manipulation that can achieve combinations of object sliding and rolling actions.
|
|
10:50-10:55, Paper TuAT3.5 | |
On the Role of Jacobians in Robust Manipulation |
|
Grace, Joshua | Yale University |
Chanrungmaneekul, Podshara | Rice University |
Hang, Kaiyu | Rice University |
Dollar, Aaron | Yale University |
Keywords: In-Hand Manipulation, Dexterous Manipulation
Abstract: Traditional robot control relies on analytical methods that require precise system models, which are hard to apply in real-world settings and limit generalization to arbitrary tasks. However, systems like serial manipulators and passively adaptive hands feature inherently stable regions without control discontinuities like loss of contact or singularities. In these regions, approximate controllers focusing on the correct direction of motion enable successful coarse manipulation. When coupled with a rough estimation of the motion magnitude, precision manipulation is achieved. Leveraging this insight, we introduce a novel inverse Jacobian estimation method that independently estimates the primary motion direction and magnitude of the manipulator's actuators. Our method efficiently estimates the direct mapping from task to actuator space with no need for a priori system knowledge, enabling the same framework to control both hands and arms without compromising task performance. We present a novel control method with no a priori knowledge for precision manipulation. Experiments on the Yale Model O hand, Yale Stewart Hand, and a UR5e arm demonstrate that the inverse Jacobians estimated via our approach enable real-time control with submillimeter precision in manipulation tasks. These results highlight that online self-ID data alone is sufficient for precise real-world manipulation.
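For comparison with the direction/magnitude decomposition described above, a classic online (Broyden-style) inverse-Jacobian estimator that learns the task-to-actuator mapping from observed displacement pairs, with no a priori system knowledge, can be sketched as:

```python
import numpy as np

class InverseJacobianEstimator:
    """Online secant (Broyden-style) estimate of the map from task-space
    displacements to actuator displacements: dq ~= J_inv @ dx."""
    def __init__(self, n_act, n_task, step=1.0):
        self.J_inv = np.zeros((n_act, n_task))
        self.step = step

    def update(self, dq, dx, eps=1e-9):
        """After commanding dq and observing task displacement dx,
        correct J_inv along the direction of dx (secant condition)."""
        denom = float(dx @ dx)
        if denom > eps:
            self.J_inv += np.outer(dq - self.J_inv @ dx, dx) / denom

    def control(self, x_err):
        """Actuator step that moves the task state toward reducing x_err."""
        return -self.step * self.J_inv @ x_err
```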
|
|
TuAT4 |
404 |
Robot Safety 1 |
Regular Session |
|
10:30-10:35, Paper TuAT4.1 | |
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards |
|
Brunke, Lukas | Technical University of Munich |
Zhang, Yanni | Technical University of Munich |
Römer, Ralf | Technical University of Munich |
Naimer, Jack | University of Toronto |
Staykov, Nikola | ETH |
Zhou, Siqi | Technical University of Munich |
Schoellig, Angela P. | TU Munich |
Keywords: Robot Safety, AI-Enabled Robotics
Abstract: Ensuring safe interactions in human-centric environments requires robots to understand and adhere to constraints recognized by humans as "common sense" (e.g., "moving a cup of water above a laptop is unsafe as the water may spill" or "rotating a cup of water is unsafe as it can lead to pouring its content"). Recent advances in computer vision and machine learning have enabled robots to acquire a semantic understanding of and reason about their operating environments. While extensive literature on safe robot decision-making exists, semantic understanding is rarely integrated into these formulations. In this work, we propose a semantic safety filter framework to certify robot inputs with respect to semantically defined constraints (e.g., unsafe spatial relationships, behaviors, and poses) and geometrically defined constraints (e.g., environment-collision and self-collision constraints). In our proposed approach, given perception inputs, we build a semantic map of the 3D environment and leverage the contextual reasoning capabilities of large language models to infer semantically unsafe conditions. These semantically unsafe conditions are then mapped to safe actions through a control barrier certification formulation. We demonstrate the proposed semantic safety filter in teleoperated manipulation tasks and with learned diffusion policies applied in a real-world kitchen environment that further showcases its effectiveness in addressing practical semantic safety constraints. Together, these experiments highlight our approach's capability to integrate semantics into safety certification, enabling safe robot operation beyond traditional collision avoidance.
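Downstream of the semantic reasoning, control barrier certification minimally adjusts the input subject to a barrier constraint; for a single affine constraint the QP has a closed form. A generic sketch of such a filter (not the paper's exact formulation):

```python
import numpy as np

def cbf_safety_filter(u_nom, h, Lf_h, Lg_h, alpha=5.0):
    """Single-constraint control barrier certification: minimally adjust the
    nominal input so that  Lf_h + Lg_h @ u + alpha * h >= 0.
    Closed-form solution of  min ||u - u_nom||^2  subject to that constraint;
    assumes Lg_h is a nonzero vector."""
    residual = Lf_h + Lg_h @ u_nom + alpha * h
    if residual >= 0.0:
        return u_nom                          # nominal input already certified
    return u_nom - residual * Lg_h / (Lg_h @ Lg_h)
```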
|
|
10:35-10:40, Paper TuAT4.2 | |
Analysis and Mitigation of Inconsistencies in Blockchain-Enabled Robot Swarms |
|
Simionato, Giada | University of Pisa |
Strobel, Volker | Université Libre De Bruxelles |
Cimino, Mario G. C. A. | University of Pisa |
Dorigo, Marco | Université Libre De Bruxelles |
Keywords: Swarm Robotics, Multi-Robot Systems, Robot Safety
Abstract: Recent research has demonstrated that blockchain-enabled robot swarms—where robots coordinate using blockchain technology—can secure robot swarms by neutralizing malicious and malfunctioning robots. This security is achieved through blockchain technology's consistency properties. However, prior work addressed malfunctions at the information level, that is, it studied how to neutralize robots that stored information in the blockchain that did not correspond to the real-world state (i.e., it studied the oracle problem). In contrast, this study focuses on inconsistencies at the blockchain protocol level. We analyze how network partitions, which may arise from robots' local-only communication capabilities, malfunctioning hardware, or external attacks, can lead to inconsistent information in a robot swarm. In order to mitigate these disruptions, we propose a decentralized approach to detect partitions and a corresponding response. We study our approach in a swarm robotics simulator, where we demonstrate its effectiveness in reducing blockchain inconsistencies.
|
|
10:40-10:45, Paper TuAT4.3 | |
Integrating Opinion Dynamics into Safety Control for Decentralized Airplane Encounter Resolution |
|
Qi, Shuhao | Eindhoven University of Technology |
Tang, Zhiqi | KTH Royal Institute of Technology |
Sun, Zhiyong | Peking University (PKU) |
Haesaert, Sofie | Eindhoven University of Technology |
Keywords: Robot Safety, Collision Avoidance
Abstract: As the airspace becomes increasingly congested, decentralized conflict resolution methods for airplane encounters have become essential. While decentralized safety controllers can prevent dangerous midair collisions, they do not always ensure prompt conflict resolution. As a result, airplane progress may be blocked for extended periods in certain situations. To address this blocking phenomenon, this paper proposes integrating bio-inspired nonlinear opinion dynamics into the airplane safety control framework, thereby guaranteeing both safety and blocking-free resolution. In particular, opinion dynamics enable the safety controller to achieve collaborative decision-making for blocking resolution and facilitate rapid, safe coordination without relying on communication or preset rules. Extensive simulation results validate the improved flight efficiency and safety guarantees. This study provides practical insights into the design of autonomous controllers for airplanes.
|
|
10:45-10:50, Paper TuAT4.4 | |
Robots That Suggest Safe Alternatives |
|
Jeong, Hyun Joe | University of California, San Diego |
Chen, Rosy | Carnegie Mellon University |
Bajcsy, Andrea | Carnegie Mellon University |
Keywords: Robot Safety, Safety in HRI, Reinforcement Learning
Abstract: Goal-conditioned policies, such as those learned via imitation learning, provide an easy way for humans to influence what tasks robots accomplish. However, these robot policies are not guaranteed to execute safely or to succeed when faced with out-of-distribution goal requests. In this work, we enable robots to know when they can confidently execute a user's desired goal, and automatically suggest safe alternatives when they cannot. Our approach is inspired by control-theoretic safety filtering, wherein a safety filter minimally adjusts a robot's candidate action to be safe. Our key idea is to pose alternative suggestion as a safe control problem in goal space, rather than in action space. Offline, we use reachability analysis to compute a goal-parameterized reach-avoid value network which quantifies the safety and liveness of the robot’s pre-trained policy. Online, our robot uses the reach-avoid value network as a safety filter, monitoring the human's given goal and actively suggesting alternatives that are similar but meet the safety specification. We demonstrate our Safe ALTernatives (SALT) framework in simulation experiments with Franka Panda tabletop manipulation. We find that SALT is able to learn to predict successful and failed closed-loop executions, is a less pessimistic monitor than open-loop uncertainty quantification, and proposes alternatives that consistently align with those that people find acceptable.
|
|
10:50-10:55, Paper TuAT4.5 | |
Safe, Task-Consistent Manipulation with Operational Space Control Barrier Functions |
|
Morton, Daniel | Stanford University |
Pavone, Marco | Stanford University |
Keywords: Robot Safety, Manipulation Planning, Optimization and Optimal Control
Abstract: Safe real-time control of robotic manipulators in unstructured environments requires handling numerous safety constraints without compromising task performance. Traditional approaches, such as artificial potential fields (APFs), suffer from local minima, oscillations, and limited scalability, while model predictive control (MPC) can be computationally expensive. Control barrier functions (CBFs) offer a promising alternative due to their high level of robustness and low computational cost, but these safety filters must be carefully designed to avoid significant reductions in the overall performance of the manipulator. In this work, we introduce an Operational Space Control Barrier Function (OSCBF) framework that integrates safety constraints while preserving task-consistent behavior. Our approach scales to hundreds of simultaneous constraints while retaining real-time control rates, ensuring collision avoidance, singularity prevention, and workspace containment even in highly cluttered settings or during dynamic motions. By explicitly accounting for the task hierarchy in the CBF objective, we prevent degraded performance across both joint-space and operational-space tasks, when at the limit of safety. We validate performance in both simulation and hardware, and release our open-source high-performance code and media on our project webpage, https://stanfordasl.github.io/oscbf/
|
|
10:55-11:00, Paper TuAT4.6 | |
Secure Safety Filter: Towards Safe Flight Control under Sensor Attacks |
|
Tan, Xiao | California Institute of Technology |
Sundar, Junior | Technology Innovation Institute |
Bruzzone, Renzo | Technology Innovation Institute |
Ong, Pio | California Institute of Technology |
T. Lunardi, Willian | Technology Innovation Institute |
Andreoni, Martin | Technology Innovation Institute |
Tabuada, Paulo | UCLA |
Ames, Aaron | Caltech |
Keywords: Robot Safety, Robust/Adaptive Control, Collision Avoidance
Abstract: Modern autopilot systems are prone to sensor attacks that can jeopardize flight safety. To mitigate this risk, we propose a modular solution: the secure safety filter, which extends the well-established control barrier function (CBF)-based safety filter to account for, and mitigate, sensor attacks. This module consists of a secure state reconstructor (which generates plausible states) and a safety filter (which computes the safe control input that is closest to the nominal one). Differing from existing work focusing on linear, noise-free systems, the proposed secure safety filter handles bounded measurement noise and, by leveraging reduced-order model techniques, is applicable to the nonlinear dynamics of drones. Software-in-the-loop simulations and drone hardware experiments demonstrate the effectiveness of the secure safety filter in rendering the system safe in the presence of sensor attacks.
|
|
11:00-11:05, Paper TuAT4.7 | |
SHIELD: Safety on Humanoids Via CBFs in Expectation on Learned Dynamics |
|
Yang, Lizhi | California Institute of Technology |
Werner, Blake | California Institute of Technology |
Cosner, Ryan | California Institute of Technology |
Fridovich-Keil, David | The University of Texas at Austin |
Culbertson, Preston | Cornell University |
Ames, Aaron | Caltech |
Keywords: Robot Safety, Machine Learning for Robot Control, Collision Avoidance
Abstract: Robot learning has produced remarkably effective “black-box” controllers for complex tasks such as dynamic locomotion on humanoids. Yet ensuring dynamic safety, i.e., constraint satisfaction, remains challenging for such policies. Reinforcement learning (RL) embeds constraints heuristically through reward engineering, and adding or modifying constraints requires retraining. Model-based approaches, like control barrier functions (CBFs), enable runtime constraint specification with formal guarantees but require accurate dynamics models. This paper presents SHIELD, a layered safety framework that bridges this gap by: (1) training a generative, stochastic dynamics residual model using real-world data from hardware rollouts of the nominal controller, capturing system behavior and uncertainties; and (2) adding a safety layer on top of the nominal (learned locomotion) controller that leverages this model via a stochastic discrete-time CBF formulation enforcing safety constraints in probability. The result is a minimally-invasive safety layer that can be added to the existing autonomy stack to give probabilistic guarantees of safety that balance risk and performance. In hardware experiments on an Unitree G1 humanoid, SHIELD enables safe navigation (obstacle avoidance) through varied indoor and outdoor environments using a nominal (unknown) RL controller and onboard perception.
|
|
11:05-11:10, Paper TuAT4.8 | |
Feasibility Analysis of Real-Time Robustness Certification |
|
Seferis, Emmanouil | NTUA |
Kollias, Stefanos | National Technical University of Athens |
Keywords: Robot Safety, Deep Learning Methods, Vision-Based Navigation
Abstract: The robustness certification of deep neural networks (DNNs) is crucial in many safety-critical domains. Randomized Smoothing (RS) has emerged as the current state-of-the-art method for DNN robustness verification: it successfully scales to the large DNNs used in practice, has achieved excellent results, and has been extended to a large variety of adversarial perturbation scenarios. However, an important cost in RS is incurred during inference, since it requires passing tens or hundreds of thousands of perturbed samples through the DNN to perform the verification. In this work we aim to address this, and explore what happens as we decrease the number of samples by orders of magnitude, and the effect on the certified radius. Surprisingly, we find that the performance reduction in terms of average certified radius is not too large, even if we decrease the number of samples by two orders of magnitude or more. Moreover, we find that the resulting certified radius reduction can be mitigated using off-the-shelf methods designed to improve RS performance. This can pave the way for dramatically faster robustness certification, unlocking the possibility of performing it in real time, which we demonstrate. We perform a detailed analysis, both theoretically and experimentally, and show promising results on the standard CIFAR-10 and ImageNet datasets.
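The sample-count/radius trade-off studied above follows from the standard randomized smoothing certificate of Cohen et al.: the certified L2 radius is sigma times the probit of a binomial lower confidence bound on the top-class probability, and that bound loosens as the sample count shrinks. A compact single-batch sketch (`f` is a hypothetical base classifier returning class indices):

```python
import numpy as np
from scipy.stats import norm, beta

def certified_radius(f, x, sigma, n_samples, alpha=0.001):
    """Estimate the smoothed classifier's top class under Gaussian noise,
    lower-bound its probability with a one-sided Clopper-Pearson interval,
    and convert to an L2 radius. Single-batch variant for brevity; the
    full procedure uses separate selection and estimation samples."""
    noisy = x[None] + sigma * np.random.randn(n_samples, *x.shape)
    preds = np.array([f(z) for z in noisy])        # class indices
    top = np.bincount(preds).argmax()
    count = int((preds == top).sum())
    p_lower = beta.ppf(alpha, count, n_samples - count + 1)
    if p_lower <= 0.5:
        return top, 0.0                            # abstain: cannot certify
    return top, sigma * norm.ppf(p_lower)
```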
|
|
TuAT5 |
407 |
Motion Control 1 |
Regular Session |
|
10:30-10:35, Paper TuAT5.1 | |
Design and Flight Control of a Novel Thrust-Vectored Tricopter Using Twisting and Tilting Rotors |
|
Li, Xinliang | Sun Yat-Sen University |
Chen, Zheyu | Sun Yat-Sen University |
Wei, Jingbo | Sun Yat-Sen University |
Qin, Zijie | Sun Yat-Sen University |
Chen, Weijian | Sun Yat-Sen University |
Liu, Kun | Sun Yat-Sen University |
Keywords: Aerial Systems: Mechanics and Control, Motion Control
Abstract: This paper presents a novel, compact overactuated tricopter featuring a servo-driven twisting and tilting mechanism, preventing the adverse effects of internal force contradiction during flight. Each arm's vectored thrust is provided by a single motor, with the twisting and tilting angles controlled by two vertically mounted servos. These components are collectively mounted within a 3D-printed semi-ring structure and are rigidly attached to the fuselage via carbon tubes at the twist end. To address the asymmetry inherent in the tricopter configuration, we conducted a qualitative analysis of the disturbances introduced by the actuators. Additionally, we emphasize the need to include gyroscopic torque effects caused by arm rotations. This issue is addressed using a control allocation method, with our proposed improved Force Decomposition (FD)-based iteration offering a computationally low-cost solution. The dynamic models of the motion of the rotational joints, identified and employed as virtual sensors, contribute to the estimation of the improved control effectiveness matrix. This overactuated tricopter can operate like a conventional rotorcraft, with the added capability of achieving attitude adjustments through manual control inputs. Finally, we demonstrate the tricopter's advantages by comparing simulations and flight experiments, both with and without the application of the improved method.
|
|
10:35-10:40, Paper TuAT5.2 | |
Online Anti-Swing Trajectory Refinement for Variable-Length Cable-Suspended Aerial Transportation Robot |
|
Yu, Hai | Nankai University |
Yang, Zhichao | Nankai University |
He, Wei | Nankai University |
Han, Jianda | Nankai University |
Fang, Yongchun | Nankai University |
Liang, Xiao | Nankai University |
Keywords: Aerial Systems: Mechanics and Control, Intelligent Transportation Systems, Underactuated Robots
Abstract: Aerial robots have demonstrated significant potential in suspended cargo transportation, especially in industries such as logistics and food delivery. Due to the underactuated and nonlinear dynamics of the cable-suspended system, directly tracking a given trajectory with a multicopter without modifying its controller often leads to significant payload swing. This compromises the safety and stability of the cargo. To address the aforementioned issue, this paper proposes an online trajectory refinement method for a variable-length cable-suspended aerial transportation robot, independent of the control layer. By incorporating payload swing angle information, the reference trajectory is refined in real time, effectively suppressing payload oscillations during transportation. Specifically, Lyapunov techniques and LaSalle's invariance theorem are employed to rigorously guarantee the feasibility of the designed trajectory refinement scheme. Finally, hardware experiments are conducted to validate the effectiveness and superiority of the proposed method. The results demonstrate that the refined trajectory not only enables precise positioning of the multicopter but also effectively suppresses payload oscillations during transportation, significantly enhancing the safety and reliability of aerial cargo delivery.
|
|
10:40-10:45, Paper TuAT5.3 | |
Differential-Flatness-Based Tracking Control for Tractor-Trailers in Reversing Maneuvers |
|
Yang, Bo | Tsinghua University |
Zhuang, Zhenhao | East China University of Science and Technology |
Yu, Zitian | KargoBot |
Wang, Qian | KargoBot |
Wei, Junqing | KargoBot |
Mo, Yilin | Tsinghua University |
Yang, Wen | East China University of Science and Technology |
Keywords: Autonomous Vehicle Navigation, Wheeled Robots, Motion Control
Abstract: In this paper, we propose a differential-flatness-based control (DFBC) framework for precise tracking control of tractor-trailers, particularly during reversing maneuvers, which are challenging due to the unstable nonlinear kinematics. The proposed controller leverages the differential flatness property of tractor-trailers, equivalently transforming the nonlinear kinematics into a Brunovsky canonical form, thus allowing the incorporation of a linear feedback mechanism within the space defined by the flat outputs and their derivatives. Compared to traditional linear quadratic regulator (LQR) controllers based on Jacobian linearization, the proposed DFBC method achieves higher precision and stability in reversing maneuvers. We also implement the proposed DFBC method on a self-developed 1/10-scale autonomous tractor-trailer to showcase its effectiveness in real-world scenarios.
|
|
10:45-10:50, Paper TuAT5.4 | |
Steering Elongate Multi-Legged Robots by Modulating Body Undulation Waves |
|
Flores, Esteban | Georgia Institute of Technology |
Chong, Baxi | Georgia Institute of Technology |
Soto, Daniel | Georgia Institute of Technology |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Biologically-Inspired Robots, Redundant Robots, Motion Control
Abstract: Centipedes exhibit great maneuverability in diverse environments due to their many legs and body-driven control. By leveraging similar morphologies and control strategies, their robotic counterparts also demonstrate effective terrestrial locomotion. However, the success of these multi-legged robots is largely limited to forward locomotion; steering is substantially less studied, in part because of the difficulty in coordinating a high-degree-of-freedom robot to follow predictable, planar trajectories. To resolve these challenges, we take inspiration from control schemes based on geometric mechanics (GM) for elongate systems' locomotion through highly damped environments. We model the elongate, multi-legged system as a "terrestrial swimmer" in highly frictional environments and implement steering schemes derived from low-order templates. We identify an effective turning strategy by superimposing two traveling waves of lateral body undulation and further explore variations of the "turning wave" to enable a spectrum of arc-following steering primitives. We test our hypothesized modulation scheme on a robophysical model and validate steering trajectories against theoretically predicted displacements, producing steering radii between 0 and 0.6 body lengths. We then apply our control framework to Ground Control Robotics' elongate multi-legged robot, Major Tom, using these motion primitives to autonomously navigate around obstacles and corners on indoor and outdoor terrain. Our work creates a systematic framework for controlling these highly mobile devices in the plane using a low-order model based on sequences of body shape changes.
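The turning strategy superimposes two traveling waves of lateral body undulation. The specific wave parameters are the paper's contribution; the sketch below only illustrates the superposition structure, with arbitrary amplitudes and frequencies:

```python
import numpy as np

def body_joint_angles(n_joints, t, A_fwd=0.5, A_turn=0.2,
                      spatial_freq=2 * np.pi, temporal_freq=2 * np.pi):
    """Superimpose two traveling waves of lateral body undulation: the first
    drives forward 'terrestrial swimming'; the second (the 'turning wave')
    biases the body curvature to steer along an arc. All parameters here
    are illustrative assumptions, not the paper's tuned values."""
    s = np.linspace(0.0, 1.0, n_joints)           # normalized body coordinate
    forward = A_fwd * np.sin(spatial_freq * s - temporal_freq * t)
    turning = A_turn * np.sin(spatial_freq * s - 2 * temporal_freq * t)
    return forward + turning
```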
|
|
10:50-10:55, Paper TuAT5.5 | |
LESO-Based NMPC Tracking Control of Climbing Robot on Large Components with Variable Curvature (I) |
|
Tan, Ke | Huazhong University of Science and Technology |
Gong, Zeyu | Huazhong University of Science and Technology |
Tao, Bo | Huazhong University of Science and Technology |
Zhang, Yuhao | Huazhong University of Science and Technology |
Shi, Ying | Huazhong University of Science and Technology |
Gu, Zhenfeng | Huazhong University of Science and Technology |
Wu, Chong | Fuzhou University |
Ding, Han | Huazhong University of Science and Technology |
Keywords: Climbing Robots, Industrial Robots, Motion Control
Abstract: Wheeled climbing robots have great application prospects in the machining of large components with variable curvature. However, their accurate motion control on variable-curvature surfaces faces two critical challenges. The varying contact states between the robot's wheels and the variable-curvature surfaces make it difficult to establish an accurate kinematics model. Additionally, the robot slips to different degrees when moving in different attitudes due to the dragging effect of gravity. To overcome these problems, we first present a kinematics modeling method with instantaneous plane constraints on a variable-curvature surface. Subsequently, a linear extended state observer (LESO)-based nonlinear model predictive control (NMPC) scheme is designed, in which the NMPC calculates the nominal control inputs and the LESO estimates and compensates for the lumped disturbance caused by robot slippage and surface constraints. Experiments on a real wind turbine blade with variable curvature show that the proposed control scheme effectively eliminates the influence of the lumped disturbance, and the climbing robot achieves unbiased (AVG < 0.1 mm) and high-precision (RMSE < 2 mm) trajectory tracking. This work provides a solution for automatic, high-precision trajectory tracking for climbing robots with localization systems in factories, offering the potential for high-quality machining of large components.
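A minimal sketch of a linear extended state observer for a scalar channel, assuming the lumped disturbance enters additively (x_dot = u + d); the bandwidth parameterization and simulated disturbance are illustrative assumptions:

```python
import numpy as np

def leso_step(xhat, dhat, x_meas, u, dt, w0=20.0):
    """One LESO update with bandwidth-parameterized gains (l1, l2) = (2*w0, w0^2)."""
    e = x_meas - xhat
    xhat_dot = dhat + u + 2.0 * w0 * e   # state estimate dynamics
    dhat_dot = w0**2 * e                 # extended (disturbance) state dynamics
    return xhat + dt * xhat_dot, dhat + dt * dhat_dot

dt, x, xhat, dhat = 0.001, 0.0, 0.0, 0.0
for k in range(5000):
    t = k * dt
    d = 0.3 * np.sin(2 * np.pi * t)      # unknown lumped disturbance (slip, surface)
    u = 1.0 - dhat                       # nominal command + LESO compensation
    x += dt * (u + d)                    # true plant
    xhat, dhat = leso_step(xhat, dhat, x, u, dt)
print("disturbance estimate error:", abs(dhat - 0.3 * np.sin(2 * np.pi * 5000 * dt)))
```

In the paper's scheme the compensated channel would feed the NMPC's nominal model rather than the simple proportional law used here.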
|
|
10:55-11:00, Paper TuAT5.6 | |
Adaptive Observer-Based Sliding Mode Control for Piezoelectric Nanopositioning System (I) |
|
Chen, Liheng | Harbin Engineering University |
Tan, Yongjie | Harbin Engineering University |
Fu, Shasha | Heilongjiang University |
Keywords: Robust/Adaptive Control, Motion Control
Abstract: This paper proposes a novel adaptive global sliding mode control strategy for a two-degree-of-freedom (2-DOF) piezoelectric nanopositioning system based on the integral extended state observer (IESO) technique. Its uniqueness is that a global robustness property is generated over the whole precision motion control process, which effectively circumvents sensitivity to perturbations during the reaching phase. First, to generate an accurate disturbance estimate for compensation control, the IESO is constructed by incorporating an integral action into the observer design. Then, a global sliding mode control approach is developed for the nanopositioning system to ensure global robustness against the unknown hysteresis nonlinearity and cross-axis coupling motion. Moreover, an adaptive rule is established for the global sliding mode controller, which does not require a priori knowledge of the estimation error, hysteresis, or cross-coupling nonlinearity in the control design. Experimental studies are conducted to demonstrate the effectiveness and superiority of the proposed motion control scheme over existing ones.
|
|
TuAT6 |
301 |
Micro/Nano Robots 1 |
Regular Session |
|
10:30-10:35, Paper TuAT6.1 | |
An Intelligent Skeleton Based on Liquid Metal for Biohybrid Actuator Powered by Muscle |
|
Lu, Xiaoqi | Shanghai University |
Zhang, Yuyin | Shanghai University |
Gan, Yuanjie | Shanghai University |
Gao, Shen | Shanghai University |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Yue, Tao | Shanghai University |
Keywords: Micro/Nano Robots, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Biological machines that combine living cells with soft materials to sense the environment, draw power from bioenergy, and generate driving force are called biohybrid actuators. With the development of tissue engineering and organoid technology, researchers have applied biohybrid actuator technology to precision medicine and targeted drug delivery, but research on feedback and evaluation of biohybrid actuation performance has been limited to visual observation and simulation calculations. We therefore develop an intelligent crawling skeleton with sensing functionality that can be used to evaluate the actuation ability of muscle actuators and ultimately enable high-precision control of biohybrid actuators. In this work, an intelligent crawling skeleton based on three-dimensional liquid metal is proposed to detect and provide feedback on the crawling of C2C12 muscle actuators. Three-dimensional muscle tissue was formed by mixing hydrogels and cells, and the functionalization of muscle rings was promoted using static mechanical forces and external electric field stimulation. The composite crawling skeleton is fabricated by inverse molding and soft lithography. The skeleton can accommodate large deformations above 90 degrees, and its sensitivity to deformation can be tuned by selecting materials with different elastic moduli. Inspired by the tendon-bone structure, the intelligent crawling skeleton can measure the deformation of the biohybrid actuator during crawling from the deformation characteristics of the muscle tissue, providing a promising basis for feedback and closed-loop control of biohybrid actuators.
|
|
10:35-10:40, Paper TuAT6.2 | |
Design of DNA Origami-Engineered Tetrahedral Nanorobots |
|
Chen, Haowen | Beijing Institute of Technology |
Liu, Fengyu | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Nanomanufacturing, Automation at Micro-Nano Scales
Abstract: DNA nanorobots have emerged as a promising technology in the biomedical field, owing to their nanoscale dimensions, programmable structures, and minimal physiological toxicity. However, existing DNA-based carriers often exhibit limited stability in complex biological environments and lack efficient targeted drug delivery capabilities. Addressing these challenges requires the development of DNA nanorobots with highly controllable and multifunctional designs for precise biomedical applications. This study introduces a conformational transition design of tetrahedral DNA structures, leveraging a multi-resolution molecular dynamics simulation model validated by all-atom molecular modeling. Through this approach, we dynamically analyzed the stability of the nanomechanical structures using simulation-based methods. Utilizing this design strategy, we successfully constructed a deformable tetrahedral DNA nanorobot characterized by high stability and efficient drug delivery capabilities. Furthermore, we demonstrated the precise release of therapeutic agents with low physiological toxicity and high controllability by employing the DNA nanorobot for in vitro targeting of circulating tumor cells (CTCs). These findings highlight the potential of the proposed DNA origami-based nanorobot design method to advance the development of nanorobots in biomedicine and their application in targeted therapeutic interventions.
|
|
10:40-10:45, Paper TuAT6.3 | |
On-Chip Dynamic Mechanical Characterization: From Cells to Nucleus |
|
Ge, Jingjin | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Bai, Chenhao | Beijing Institute of Technology |
Liu, Fengyu | Beijing Institute of Technology |
Li, Yuke | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: This study presents a novel method for automatically characterizing the mechanical properties of tumor and normal cells, using narrow microchannels to induce and measure cell deformation. The dynamic mechanical characterization technique, based on serially connected microchannels, mimics the deformation and migration of tumor cells in biologically relevant environments, enabling clear differentiation between tumor and normal cells through successive compressions. High-speed imaging combined with computer vision and image processing techniques enables rapid and accurate automated analysis of tumor cells. Experiments using three tumor cell lines and three normal cell lines validate the method. Furthermore, this study reveals that the mechanical properties of the nucleus determine overall cell mechanics, with the differences between tumor and normal cells attributed to changes in nuclear mechanics. This approach shows promise for early cancer diagnosis.
|
|
10:45-10:50, Paper TuAT6.4 | |
A Study on the Generation of Single Cell Droplets Via the Combination of Lateral-Field Optoelectronic Tweezers and Electrowetting-On-Dielectric |
|
Zhao, Jiawei | Beihang University, School of Mechanical Engineering and Automation |
Huang, Shunxiao | Beihang University |
Xiong, Hongyi | Beihang University |
Gan, Chunyuan | Beihang University |
Ye, Jingwen | Beihang University |
Niu, Wenyan | Beihang University |
Feng, Lin | Beihang University |
Keywords: Micro/Nano Robots, Biological Cell Manipulation, Automation at Micro-Nano Scales
Abstract: Microfluidic technology is currently a popular approach in single-cell research, where it is used to reveal heterogeneity among cells. However, most existing microfluidic technologies for single-cell research lack the ability to control the microenvironment of single cells after isolating them. In this work, a technology that combines lateral-field optoelectronic tweezers (LOET) with electrowetting-on-dielectric (EWOD) is used to separate cells into single cells and then encapsulate each single cell within an individual droplet, generating single-cell droplets. More importantly, it also enables control of the microenvironment of the separated single cells. Driving control of the single-cell droplets is achieved through EWOD, which has good application prospects in the field of single-cell research.
|
|
10:50-10:55, Paper TuAT6.5 | |
Achieving Lift-To-Weight Ratio >3.5 in Piezoelectric Direct-Driven Insect-Scale Flapping-Wing MAVs |
|
Lu, Xiang | National University of Defense Technology |
Chen, Jie | National University of Defense Technology |
Chen, Yang | National University of Defense Technology |
Deng, Zixin | National University of Defense Technology |
Wu, Yulie | National University of Defense Technology |
Wu, Xuezhong | National University of Defense Technology |
Xiao, Dingbang | National University of Defense Technology |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Mechanism Design
Abstract: Insect-scale flapping-wing micro aerial vehicles (FWMAVs) employing piezoelectric direct-drive configurations eliminate traditional kinematic chains through direct coupling of the wing and actuator. While this design approach significantly reduces structural complexity and manufacturing costs compared to transmission-dependent systems, it inherently limits wing stroke amplitude and consequent lift generation. This paper presents a novel lift-enhancement strategy for piezoelectric direct-drive FWMAVs, effectively improving payload capacity through optimized aerodynamic performance. The redesigned X-configuration prototype demonstrates outstanding metrics: 68 mm wingspan with 212 mg total mass achieves 7.47 mN maximum lift (exceeding 3.5:1 lift-to-weight ratio) and 1.25 m/s takeoff speed. Experimental validation confirms 39% payload capacity improvement and 34% lift-to-weight ratio enhancement compared to baseline designs. This enhancement establishes our robot as the current state-of-the-art in piezoelectric direct-drive FWMAVs regarding lift-to-weight ratio.
|
|
10:55-11:00, Paper TuAT6.6 | |
Dual-Mode Motion Control of Multi-Stimulus Deformable Miniature Robots with Adaptive Orientation Compensation in Unstructured Environments |
|
Zhong, Shihao | Beijing Institute of Technology |
Li, Wenbo | Beijing Institute of Technology |
Yang, Haotian | Beijing Institute of Technology |
Niu, Zhenyang | Beijing Institute of Technology |
Hou, Yaozhen | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Wang, Huaping | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Soft Robot Materials and Design
Abstract: Miniature robots hold great promise for performing micromanipulation tasks within hard-to-reach confined spaces. However, effectively maneuvering across complex and unstructured terrain, achieving adaptive morphogenesis, and developing adaptive multimodal locomotion strategies remain challenges for these robotic systems. Here, we develop a multi-stimulus-responsive deformable miniature robot integrated with an adaptive multimodal motion control method. Sodium alginate hydrogel and graphene-coated magnetic elastomer are integrated into the sheet-shaped robot to enable responsiveness to temperature, humidity, and magnetic fields. A kinematic gait model is designed to control oscillatory motion in the semi-contracted state and rotational motion in the fully contracted state of the miniature robot. To automatically mitigate angular deviation between the robot's motion direction and the intended path, an adaptive orientation compensation control algorithm based on Support Vector Regression (SVR) is proposed. Experimental results demonstrate that the proposed robot exhibits capabilities for flexible and accurate navigation within unstructured environments (e.g., rock piles and stomach models), and is further shown to be capable of cargo transport. The proposed adaptive morphogenesis robots, enabled by dual-mode motion control, hold significant potential for targeted delivery and other micromanipulation applications in complex, unstructured, and confined environments.
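A minimal sketch of SVR-based orientation compensation, assuming logged (commanded heading, observed angular deviation) pairs are available; the synthetic data and hyperparameters below are placeholders, not the authors' setup:

```python
import numpy as np
from sklearn.svm import SVR

# Fabricate training data: commanded heading vs. observed angular drift.
rng = np.random.default_rng(0)
cmd = rng.uniform(-np.pi, np.pi, size=(200, 1))                 # commanded heading (rad)
drift = 0.2 * np.sin(cmd[:, 0]) + 0.02 * rng.normal(size=200)   # observed deviation (rad)

# Fit an RBF-kernel SVR mapping command -> expected drift.
model = SVR(kernel="rbf", C=10.0, epsilon=0.005).fit(cmd, drift)

# At run time, pre-compensate the command by the predicted deviation.
target = np.array([[0.8]])
corrected = target[0, 0] - model.predict(target)[0]
print("corrected heading command:", corrected)
```

The regression acts as a learned feedforward term; the published controller additionally adapts this compensation online as the robot moves.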
|
|
11:00-11:05, Paper TuAT6.7 | |
Enhanced Rolling Motion of Magnetic Microparticles by Tuning Interface Lubrication |
|
Li, Yuke | Beijing Institute of Technology |
Liang, Xiyue | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Liao, Hongzhe | Beijing Institute of Technology |
Zhao, Yue | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Medical Robots and Systems, Dynamics
Abstract: Micro-nano robots must break the symmetry of the flow field to generate net displacement in low Reynolds number environments. Spherical micro-robots utilize the frictional forces generated through interaction with the surface. We designed a magnetic microroller robot powered by a rotating AC magnetic field. Here, we employed dual measurements of laser ranging and computer vision to demonstrate that a single 100 μm microroller maintains a lubrication film of 1 to 15 μm with the surface during normal motion. We found that the translational velocity of the microroller is correlated with the lubrication film thickness. Exploiting the robot's gravity, we applied an additional downward gradient magnetic field to effectively increase the load on the robot and reduce the lubrication film thickness, thereby controllably increasing the translational velocity of the robot. For example, the gradient magnetic field generated by superimposing a 30 mA direct current input can reduce the lubrication film thickness from 8 μm to 4 μm in a 10 Hz rotating magnetic field, and increase the translational velocity from 230 μm/s to 460 μm/s. This enhancement of the robot's motion performance enables better control of its movement in fluids. Finally, we validated the strategy for controllable acceleration of micro-scale particles rolling on surfaces, applying it to control fluid motion in multiple arteries within blood vessels. These results offer deeper insights into the physical motion mechanism of surface robots and hold significant implications for future applications in biomedical engineering.
|
|
11:05-11:10, Paper TuAT6.8 | |
On-Demand Motion Conversion of Magnetic Helical Microrobots Using Chemistry and Microstructural-Modified Surface Wettability Modulation |
|
Hou, Yaozhen | Beijing Institute of Technology |
Bai, Shanming | Beijing Institute of Technology |
Siyi, Li | Beijing Institute of Technology |
Nie, Ruhao | Beijing Institute of Technology |
Du, Jiabao | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Wang, Huaping | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Robotics and Automation in Life Sciences
Abstract: Magnetic helical microrobots have been widely applied in environmental remediation, sensing, targeted medical applications, and beyond. However, for locomotion and manipulation in unstructured liquid environments, the capabilities of distinguishable motions and on-demand parking/starting over a team of microrobots are essential. Here, we propose a method for achieving on-demand motion conversion of helical microrobots by modulating surface wettability through surface chemical modification and surface microstructural modification. Microrobots after chemical modification exhibit hydrophilicity, whereas microrobots after microstructural modification exhibit hydrophobicity; the latter possess a higher step-out frequency and maximum forward velocity than microrobots after surface chemical modification. The step-out frequencies of the three types of microrobots (chemistry-modified, unmodified, and pimple-modified) are 13 Hz, 16 Hz, and 22 Hz, with maximum velocities of 385 μm/s, 511 μm/s, and 649 μm/s, respectively. Furthermore, we have demonstrated that our method can be employed to achieve effective on-demand targeted motions and modal conversion in liquid environments. We anticipate that the method can potentially be employed to achieve precise targeted drug delivery and surgery in biomedical applications.
|
|
TuAT7 |
307 |
Motion and Path Planning 1 |
Regular Session |
|
10:30-10:35, Paper TuAT7.1 | |
Dynamic Risk-Aware MPPI for Mobile Robots in Crowds Via Efficient Monte Carlo Approximations |
|
Trevisan, Elia | Delft University of Technology |
Mustafa, Khaled | TU Delft |
Notten, Godert Christiaan | TU Delft |
Wang, Xinwei | Queen Mary University of London |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Motion and Path Planning, Collision Avoidance, Robot Safety
Abstract: Deploying mobile robots safely among humans requires the motion planner to account for the uncertainty in the other agents' predicted trajectories. This remains challenging in traditional approaches, especially with arbitrarily shaped predictions and real-time constraints. To address these challenges, we propose a Dynamic Risk-Aware Model Predictive Path Integral control (DRA-MPPI), a motion planner that incorporates uncertain future motions modelled with potentially non-Gaussian stochastic predictions. By leveraging MPPI’s gradient-free nature, we propose a method that efficiently approximates the joint Collision Probability (CP) among multiple dynamic obstacles for several hundred sampled trajectories in real-time via a Monte Carlo (MC) approach. This enables the rejection of samples exceeding a predefined CP threshold or the integration of CP as a weighted objective within the navigation cost function. Consequently, DRA-MPPI mitigates the freezing robot problem while enhancing safety. Real-world and simulated experiments with multiple dynamic obstacles demonstrate DRA-MPPI’s superior performance compared to state-of-the-art approaches, including Scenario-based Model Predictive Control (S-MPC), Frenét planner, and vanilla MPPI. Videos of the experiments can be found at https://autonomousrobots.nl/paper_websites/dra-mppi.
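The Monte Carlo collision-probability estimate at the core of the approach can be sketched as follows; the array shapes, safety radius, and obstacle-prediction sampler are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def collision_probability(robot_traj, obs_samples, r_safe=0.5):
    """MC estimate of collision probability for one sampled robot trajectory.
    robot_traj: (T, 2); obs_samples: (M, T, 2) sampled obstacle futures."""
    d = np.linalg.norm(obs_samples - robot_traj[None, :, :], axis=-1)  # (M, T)
    hit = d.min(axis=1) < r_safe           # collision anywhere along the horizon
    return hit.mean()                      # fraction of futures in collision

T, M = 20, 500
robot = np.stack([np.linspace(0, 2, T), np.zeros(T)], axis=1)
# Non-Gaussian obstacle futures: a random walk around a nominal position.
obs = np.array([1.5, 0.3]) + 0.03 * np.random.randn(M, T, 2).cumsum(axis=1)
cp = collision_probability(robot, obs)
print("estimated collision probability:", cp)  # reject the sample if cp > threshold
```

In an MPPI loop this estimate would be computed for every sampled rollout, either gating the sample or entering the cost as a weighted penalty.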
|
|
10:35-10:40, Paper TuAT7.2 | |
Towards Map-Agnostic Policies for Adaptive Informative Path Planning |
|
Rückin, Julius | University of Bonn |
Morilla-Cabello, David | Universidad De Zaragoza |
Stachniss, Cyrill | University of Bonn |
Montijano, Eduardo | Universidad De Zaragoza |
Popovic, Marija | TU Delft |
Keywords: Motion and Path Planning, Aerial Systems: Perception and Autonomy, Reinforcement Learning
Abstract: Robots are frequently tasked to gather relevant sensor data in unknown terrains. A key challenge for classical path planning algorithms used for autonomous information gathering is adaptively replanning paths online as the terrain is explored, given limited onboard compute resources. Recently, learning-based approaches have emerged that train planning policies offline and enable computationally efficient online replanning by performing policy inference. These approaches are designed and trained for terrain monitoring missions assuming a single specific map representation, which limits their applicability to different terrains. To address this limitation, we propose a novel formulation of the adaptive informative path planning problem unified across different map representations, enabling training and deploying planning policies in a larger variety of monitoring missions. Experimental results validate that our novel formulation easily integrates with classical non-learning-based planning approaches while maintaining their performance. Our trained planning policy performs similarly to state-of-the-art policies trained for specific map representations. We validate our learned policy on unseen real-world terrain datasets.
|
|
10:40-10:45, Paper TuAT7.3 | |
Vision-Language Guided Adaptive Robot Action Planning: Responding to Intermediate Results and Implicit Human Intentions |
|
Cai, Weihao | Ritsumeikan University |
Mori, Yoshiki | The University of Osaka |
Shimada, Nobutaka | Ritsumeikan University |
Keywords: Motion and Path Planning, AI-Based Methods, Human-Robot Collaboration
Abstract: Recent advances in research have demonstrated that Vision-Language Models (VLMs) are a promising technology for robot task planning. This paper presents a novel approach that leverages visual prompts and VLMs to generate feasible robot action sequences for achieving shared tasks through human-robot collaboration while simultaneously estimating human intentions. Our method enhances VLMs’ understanding of the environment by utilizing annotations (bounding boxes and labels) and dynamically infers human intentions based on changing environmental conditions to generate optimal robot action sequences to achieve common goals. Additionally, the system incorporates a mechanism to regenerate new sequences through VLM analysis when action failures or external interference occur. Furthermore, by designing prompts as versatile modules for diverse tasks, our proposed technology offers a new approach to robot action planning that excels in both efficiency and adaptability.
|
|
10:45-10:50, Paper TuAT7.4 | |
Causal-Planner: Causal Interaction Disentangling with Episodic Memory Gating for Autonomous Planning |
|
Yuan, Yibo | Xi'an Jiaotong University |
Fang, Jianwu | Xi'an Jiaotong University |
Zhou, Yang | Xi'an Jiaotong University |
Yang, Zhao | Xi'an Jiaotong University |
Lv, Chen | Nanyang Technological University |
Xue, Jianru | Xi'an Jiaotong University |
Keywords: Motion and Path Planning, Autonomous Agents, Imitation Learning
Abstract: Autonomous vehicle trajectory planning faces significant challenges in dynamic traffic environments due to the complex and mixed causal relationships between critical scene elements (e.g., pedestrians, vehicles, road markings) and safe decision-making. To identify the causal factors influencing planning outcomes, we propose Causal-Planner, which disentangles the scene interaction graph into causal and confounding components via attention-based adversarial graph learning. Additionally, we introduce a long-short-term episodic memory gating (LSTEM) module that enhances causal interaction disentangling by adaptively capturing evolving causal relationships in dynamic scenarios through bidirectional gated memory fusion. Extensive experiments on the nuPlan dataset suggest that Causal-Planner achieves competitive performance, performing well in both Test-random and Test-hard scenarios under open-loop and closed-loop evaluations.
|
|
10:50-10:55, Paper TuAT7.5 | |
Motion Planning and Control with Unknown Nonlinear Dynamics through Predicted Reachability |
|
Zhang, Zhiquan | University of Illinois-Urbana Champaign |
Puthumanaillam, Gokul | University of Illinois Urbana-Champaign |
Vora, Manav | University of Illinois Urbana-Champaign |
Ornik, Melkior | University of Illinois Urbana-Champaign |
Keywords: Motion and Path Planning, Autonomous Agents, Planning, Scheduling and Coordination
Abstract: Autonomous motion planning under unknown nonlinear dynamics presents significant challenges. An agent needs to continuously explore the system dynamics to acquire its properties, such as reachability, in order to guide system navigation adaptively. In this paper, we propose a hybrid planning-control framework designed to compute a feasible trajectory toward a target. Our approach involves partitioning the state space and approximating the system by a piecewise affine (PWA) system with constrained control inputs. By abstracting the PWA system into a directed weighted graph, we incrementally update the existence of its edges via affine system identification and reach control theory, introducing a predictive reachability condition by exploiting prior information of the unknown dynamics. Heuristic weights are assigned to edges based on whether their existence is certain or remains indeterminate. Consequently, we propose a framework that adaptively collects and analyzes data during mission execution, continually updates the predictive graph, and synthesizes a controller online based on the graph search outcomes. We demonstrate the efficacy of our approach through simulation scenarios involving a mobile robot operating in unknown terrains, with its unknown dynamics abstracted as a single integrator model.
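The graph-search side of such a framework can be sketched minimally; here verified edges get unit cost and indeterminate edges a pessimistic heuristic weight, with the graph and weights being illustrative assumptions:

```python
import heapq

CERTAIN, UNKNOWN = 1.0, 3.0      # heuristic penalty for unverified edges

edges = {                         # node -> [(neighbor, reachability verified?)]
    "s": [("a", True), ("b", False)],
    "a": [("g", False)],
    "b": [("g", True)],
    "g": [],
}

def dijkstra(start, goal):
    """Shortest path over the predictive reachability graph."""
    pq, dist = [(0.0, start)], {start: 0.0}
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            return d
        for v, verified in edges[u]:
            nd = d + (CERTAIN if verified else UNKNOWN)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

print("plan cost:", dijkstra("s", "g"))  # replan as edges get verified online
```

As system identification confirms or refutes edges during execution, the weights are updated and the search is rerun, mirroring the paper's online replanning loop.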
|
|
10:55-11:00, Paper TuAT7.6 | |
BSSM: GPU-Accelerated Point-Cloud Distance Metric for Motion Planning |
|
Gonçalves, Vinicius Mariano | Federal University of Minas Gerais, UFMG, Brazil |
Krishnamurthy, Prashanth | New York University Tandon School of Engineering |
Tzes, Anthony | New York University Abu Dhabi |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Motion and Path Planning, Collision Avoidance, Computational Geometry
Abstract: We propose the BSSM: a Point-Cloud-based (B)iased (S)igned (S)mooth (M)etric used to compute a distance metric between a manipulator and its environment. Unlike many methods that require the environment to be modeled using simple geometric primitives such as spheres, boxes, and cylinders, our proposed metric directly utilizes point clouds. The proposed metric has the properties of being smooth (infinitely differentiable), signed (yielding non-zero values upon overlap), and biased. The latter is a novel feature that, as demonstrated by our simulation results, offers advantages for motion planning. The metric is well suited to GPU parallelization, and simulation studies and a real experiment are offered to investigate its benefits.
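A smooth point-cloud distance in this spirit can be sketched with a log-sum-exp softmin, which is infinitely differentiable and trivially parallelizable; the biasing and signing of the published metric are not reproduced here:

```python
import numpy as np

def smooth_min_distance(p, cloud, h=0.05):
    """Softmin distance from query point p (3,) to a point cloud (N, 3).
    Approaches the hard minimum distance as the temperature h -> 0."""
    d = np.linalg.norm(cloud - p, axis=1)          # (N,) pairwise distances
    return -h * np.log(np.mean(np.exp(-d / h)))    # smooth surrogate of min(d)

cloud = np.random.rand(10000, 3)                   # stand-in environment cloud
print(smooth_min_distance(np.array([0.5, 0.5, 1.5]), cloud))
```

Because the expression is a reduction over independent per-point terms, it maps directly onto GPU kernels, which is what makes this family of metrics attractive for sampling-based planners.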
|
|
11:00-11:05, Paper TuAT7.7 | |
Neural Configuration Distance Function for Continuum Robot Control |
|
Long, Kehan | University of California at San Diego |
Parwana, Hardik | University of Michigan |
Fainekos, Georgios | Toyota NA-R&D |
Hoxha, Bardh | Toyota Research Institute of North America |
Okamoto, Hideki | Toyota Motor North America |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Motion and Path Planning, Collision Avoidance, Motion Control
Abstract: This paper presents a novel method for modeling the shape of a continuum robot as a Neural Configuration Euclidean Distance Function (N-CEDF). By learning separate distance fields for each link and combining them through the kinematics chain, the learned N-CEDF provides an accurate and computationally efficient representation of the robot's shape. The key advantage of a distance function representation of a continuum robot is that it enables efficient collision checking for motion planning in dynamic and cluttered environments, even with point-cloud observations. We integrate the N-CEDF into a Model Predictive Path Integral (MPPI) controller to generate safe trajectories for multi-segment continuum robots. The proposed approach is validated for continuum robots with various links in several simulated environments with static and dynamic obstacles.
|
|
11:05-11:10, Paper TuAT7.8 | |
Reactive Model Predictive Contouring Control for Robot Manipulators |
|
Yoon, Junheon | Seoul National University |
Baek, Woo-Jeong | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Motion and Path Planning, Collision Avoidance, Optimization and Optimal Control
Abstract: This contribution presents a robot path-following framework via Reactive Model Predictive Contouring Control (RMPCC) that successfully avoids obstacles, singularities, and self-collisions in dynamic environments at 100 Hz. Many path-following methods rely on time parametrization, but struggle to handle collision and singularity avoidance while adhering to kinematic limits or other constraints. Specifically, the error between the desired path and the actual position can become large when executing evasive maneuvers. Thus, this paper derives a method that parametrizes the reference path by a path parameter and performs the optimization via RMPCC. In particular, Control Barrier Functions (CBFs) are introduced to avoid collisions and singularities in dynamic environments. A Jacobian-based linearization and Gauss-Newton Hessian approximation enable solving the nonlinear RMPCC problem at 100 Hz, outperforming state-of-the-art methods by a factor of 10. Experiments confirm that the framework handles dynamic obstacles in real-world settings with low contouring error and low robot acceleration.
|
|
TuAT8 |
308 |
Medical Robots and Systems 1 |
Regular Session |
|
10:30-10:35, Paper TuAT8.1 | |
Development of a Novel Miniaturized Dexterous Manipulator with Variable Stiffness for NOTES |
|
Cong, Rong | Tsinghua University |
Wu, Xipeng | Beijing Institute of Technology |
Qian, Chao | Beijing Institute of Technology |
Zhang, Kaijie | Beijing Institute of Technology |
Duan, Xing-guang | Intelligent Robotics Institute, Beijing Institute of Technology |
Li, Changsheng | Beijing Institute of Technology |
Keywords: Flexible Robotics, Medical Robots and Systems, Mechanism Design
Abstract: Natural Orifice Transluminal Endoscopic Surgery (NOTES) holds great promise due to its ability to eliminate external incisions, reduce trauma, and accelerate recovery. However, the adoption of NOTES is hindered by the limited capabilities of existing instruments, particularly in achieving the required balance between compact size, dexterity, and load capacity. This paper introduces a novel robotic manipulator designed for NOTES, featuring a 5 mm diameter and 7 degrees of freedom (DoF). The manipulator incorporates an innovative 3-PRS flexible parallel mechanism combined with a continuum parallel structure, achieving enhanced dexterity and variable stiffness functionality within a miniaturized design. A kinematic and variable stiffness analysis is performed, and experimental validation demonstrates its bending performance and stiffness modulation. Additionally, the feasibility and practicality of the robotic system are confirmed through a peg-transfer experiment, proving its potential for real-world surgical applications. This research offers a viable solution for enhancing the performance of NOTES instruments.
|
|
10:35-10:40, Paper TuAT8.2 | |
Design of a Soft Automatic Anchoring System for Enhanced Mobility and Stability in Colonoscopy Robots |
|
Liang, Yiying | Tsinghua University |
Wang, Xuchen | National University of Singapore |
Shu, Jing | The Chinese University of Hong Kong |
Zhang, Huayu | Southeast University |
Zhu, Puchen | The Chinese University of Hong Kong |
Xia, Xianfeng | Chow Yuk Ho Technology Centre for Innovative Medicine, The Chinese University of Hong Kong |
Ma, Xin | Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Flexible Robotics
Abstract: Ensuring both mobility and stability during colonoscopy is crucial for enhancing procedural efficiency and reducing the risk of complications. Traditional colonoscopy robots face challenges due to the fixed diameter of the colonoscope and the variability in colon anatomy. To address this, we propose a soft automatic anchoring system (SAAS) to enhance mobility and stability in colonoscopy robots. The SAAS features proximal and distal soft balloon anchors, employing origami principles to achieve a 42.2% higher maximum expansion capability compared to conventional flat-surface anchors. With real-time pressure feedback, the SAAS automatically anchors upon contact detection, ensuring precise anchoring without over-inflation while adapting to colon diameter variations and robot posture changes. Experimental results show a tenfold reduction in displacement during the stability test, significantly enhancing the robot's performance under external loads. Performance comparison tests in a phantom further demonstrated notable improvements in the efficiency of colonoscopy procedures using the SAAS. This system has the potential to greatly enhance the safety, precision, and overall efficiency of colonoscopy, offering substantial benefits for both medical practitioners and patient outcomes.
|
|
10:40-10:45, Paper TuAT8.3 | |
Head-Mounted Robotic Needle Positioning: Learning from Augmented Reality Demonstration of Neuronavigation and Planning |
|
Fang, Zhiwei | The Chinese University of Hong Kong |
Hung, Hok Man | The Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Medical Robots and Systems, Virtual Reality and Interfaces
Abstract: Robotic needle positioning tasks in neurosurgery often face challenges due to insufficient perception of planar guidance images during surgery. In this work, we propose an Augmented Reality (AR) interface to help perform robotic needle positioning tasks by learning from demonstration (LfD). Enhanced immersion in the workflow is achieved by displaying surgical scenes and calculated navigation information. The framework utilizes mixed interactive interfaces in virtual and real environments, enhancing demonstration efficiency and quality. A head-mounted display and an optical tracking system are utilized to perform the visualization and needle tracking. A Gaussian Mixture Model (GMM) and Gaussian Mixture Regression (GMR) are employed to learn a robust and smooth trajectory policy from demonstrations. Experiments on robot reproduction of the needle positioning task achieved a final positioning error of 0.6 mm and an average trajectory error of 1.07 mm. Comparative user studies against haptic-device-based teleoperation show a low completion time of 62.76 s and a reduced workload for the proposed system.
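A minimal GMM/GMR sketch: fit a mixture on joint (time, position) demonstration data, then regress position from time by Gaussian conditioning. The 1-D position here is a placeholder for the real multi-dimensional needle trajectories:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
t = np.tile(np.linspace(0, 1, 100), 5)                 # 5 demonstrations over time
x = np.sin(2 * t) + 0.01 * rng.normal(size=t.size)     # noisy demonstrated positions
gmm = GaussianMixture(n_components=6, random_state=0).fit(np.c_[t, x])

def gmr(t_query):
    """Condition the joint GMM on time to get the expected position."""
    mu, S, w = gmm.means_, gmm.covariances_, gmm.weights_
    # responsibility of each component for t_query (shared constants cancel)
    pt = w * np.exp(-0.5 * (t_query - mu[:, 0])**2 / S[:, 0, 0]) / np.sqrt(S[:, 0, 0])
    pt /= pt.sum()
    # per-component conditional means, blended by responsibility
    cond = mu[:, 1] + S[:, 1, 0] / S[:, 0, 0] * (t_query - mu[:, 0])
    return float(pt @ cond)

print(gmr(0.5), "vs ground truth", np.sin(1.0))
```

Sampling the regression densely over t yields the smooth trajectory policy that the robot then reproduces.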
|
|
10:45-10:50, Paper TuAT8.4 | |
An Inflatable Soft Robotic Manipulator with Decoupled Dual-Wrist Design for Advanced Endoscopy |
|
Lou, Hanqi | Imperial College London |
Yang, Jianlin | Nanjing University of Aeronautics and Astronautics |
Zhou, Zhangxi | Imperial College London |
Chen, Junhong | Imperial College London |
Runciman, Mark | Imperial College London |
Mylonas, George | Imperial College London |
Keywords: Medical Robots and Systems, Tendon/Wire Mechanism, Soft Robot Materials and Design
Abstract: Minimally Invasive Surgery has advanced surgical practice, yet early-stage gastrointestinal cancer treatment remains challenging. Endoscopic Submucosal Dissection offers a solution but faces maneuverability constraints in complex anatomical environments. This paper presents a novel inflatable soft robotic manipulator with a biocompatible thin-film shell and tendon-driven antagonistic actuation. The robot remains compact at 6.5 mm in diameter, expanding to 11.7 mm to enhance stiffness for force exertion and precise manipulation. Featuring two decoupled wrists with four degrees of freedom, it enables dexterous motion for advanced endoscopic procedures. The study details design, fabrication, actuation modeling, workspace evaluation, and simulated retraction experiments in a constrained environment. Results demonstrate high repeatability, high master-slave control accuracy, effective workspace utilization, and feasibility for endoluminal applications, enhancing robotic-assisted endoscopic procedures with improved dexterity and adaptability under soft actuation constraints.
|
|
10:50-10:55, Paper TuAT8.5 | |
Augmented Bridge Spinal Fixation: A New Concept for Addressing Pedicle Screw Pullout Via a Steerable Drilling Robot and Flexible Pedicle Screws |
|
Kulkarni, Yash | The University of Texas at Austin |
Sharma, Susheela | Vanderbilt University |
Rezayof, Omid | University of Texas at Austin |
Kapuria, Siddhartha | University of Texas at Austin |
Amadio, Jordan P. | University of Texas Dell Medical School |
Khadem, Mohsen | University of Edinburgh |
Tilton, Maryam | University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles
Abstract: To address the screw loosening and pullout limitations of rigid pedicle screws in spinal fixation procedures, and to leverage our recently developed Concentric Tube Steerable Drilling Robot (CT-SDR) and Flexible Pedicle Screw (FPS), in this paper, we introduce the concept of Augmented Bridge Spinal Fixation (AB-SF). In this concept, two connecting J-shape tunnels are first drilled through the pedicles of a vertebra using the CT-SDR. Next, two FPSs are passed through these tunnels and bone cement is then injected through the cannulated region of the FPS to form an augmented bridge between the two pedicles and reinforce the strength of the fixated spine. To experimentally analyze and study the feasibility of the AB-SF technique, we first used our robotic system (i.e., a CT-SDR integrated with a robotic arm) to create two different fixation scenarios in which two J-shape tunnels, forming a bridge, were drilled at different depths of a vertebral phantom. Next, we implanted two FPSs within the drilled tunnels and then successfully simulated the bone cement augmentation process.
|
|
10:55-11:00, Paper TuAT8.6 | |
Uncertainty-Aware Shared Control for Vision-Based Micromanipulation |
|
Tian, Huanyu | King's College London |
Huber, Martin | King's College London |
Zeng, Lingyun | King's College London |
Han, Zhe | Beijing Institute of Technology |
Bennett, Wayne | Conceivable Life Sciences |
Silvestri, Giuseppe | Conceivable Life Sciences |
Chavez-Badiola, Alejandro | Conceivable Life Sciences |
Mendizabal-Ruiz, Gerardo | Conceivable Life Sciences |
Bergeles, Christos | King's College London |
Keywords: Medical Robots and Systems, Physical Human-Robot Interaction, Computer Vision for Medical Robotics
Abstract: This paper presents an uncertainty-aware shared control and calibration method for micromanipulation using a digital microscope and a tool-mounted, multi-joint robotic arm, integrating real-time human intervention with a visual-motor policy. Our calibration algorithm leverages co-manipulation control to calibrate the hand-eye relationship without requiring knowledge of the kinematics of the microtool mounted on the robot, while remaining robust to camera intrinsics errors. Experimental results show that the proposed calibration method achieves a 39.6% improvement in accuracy over established methods. Additionally, our control structure and calibration method reduce the time required to reach single-point targets from 5.74 s (best conventional method) to 1.91 s, and decrease trajectory tracking errors from 392 μm to 40 μm. These findings establish our method as a robust solution for improving reliability in high-precision biomedical micromanipulation.
|
|
11:00-11:05, Paper TuAT8.7 | |
CapsDT: Diffusion-Transformer for Capsule Robot Manipulation |
|
He, Xiting | Chinese University of Hong Kong |
Su, Mingwu | The Chinese University of Hong Kong |
Jiang, Xinqi | The Chinese University of Hong Kong |
Bai, Long | Alibaba DAMO Academy |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Medical Robots and Systems, Machine Learning for Robot Control, Vision-Based Navigation
Abstract: Vision-Language-Action (VLA) models have emerged as a prominent research area, showcasing significant potential across a variety of applications. However, their performance in endoscopy robotics, particularly endoscopy capsule robots that perform actions within the digestive system, remains unexplored. The integration of VLA models into endoscopy robots allows more intuitive and efficient interactions between human operators and medical devices, improving both diagnostic accuracy and treatment outcomes. In this work, we design CapsDT, a Diffusion Transformer model for capsule robot manipulation in the stomach. By processing interleaved visual inputs and textual instructions, CapsDT can infer corresponding robotic control signals to facilitate endoscopy tasks. In addition, we developed a capsule endoscopy robot system, a capsule robot controlled by a robotic arm-held magnet, addressing four endoscopy tasks of different difficulty levels and creating corresponding capsule robot datasets within a stomach simulator. Comprehensive evaluations on various robotic tasks indicate that CapsDT can serve as a robust vision-language generalist, achieving state-of-the-art performance across task levels while achieving a 26.25% success rate in real-world simulation manipulation.
|
|
11:05-11:10, Paper TuAT8.8 | |
Toward Safer GI Endoscopy with a Novel Robot-Assisted Endoscopic System |
|
Feng, Guang | Tianjin University |
Wang, Sai | Tianjin University |
Li, Jinhua | Tianjin University |
Zuo, Siyang | Tianjin University |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Robot Safety
Abstract: Increased demand for minimally invasive surgery has accelerated the adoption of natural orifice transluminal endoscopic surgery. Meanwhile, the increasing complexity of endoscopic procedures has raised the demands on endoscopist competence. In this paper, we present a novel endoscopic system to perform robotic intervention with a flexible endoscope to enhance the safety and efficiency of endoscopy. The system is composed of an endoscopic manipulation module (EMM) for control of endoscopic interventions, a 6-DOF robotic arm for adjusting the pose of the EMM, and a haptic interface for tele-control. The compact system enables endoscopists to remotely and stably operate flexible endoscopes with one hand. A master-slave control strategy with active constraints is proposed to guide the motion of the flexible endoscope within the safe working space. The proposed system is validated by recruiting subjects to perform target area localization and endoscopic examinations through a highly realistic upper digestive tract phantom. The system can reduce the time of target area localization by at least 10% compared to manual control. The mean angle errors can be reduced by 75% with the help of the constraint force. Results demonstrate the potential clinical value of the system for efficient operation and safety control.
|
|
TuAT9 |
309 |
Computer Vision Applications |
Regular Session |
|
10:30-10:35, Paper TuAT9.1 | |
On the Benefits of Visual Stabilization for Frame and Event-Based Perception |
|
Rodriguez-Gomez, Juan Pablo | Leonardo S.p.a |
Martinez-de Dios, J.R. | University of Seville |
Ollero, Anibal | AICIA. G41099946 |
Gallego, Guillermo | Technische Universität Berlin |
Keywords: Computer Vision for Automation, Biologically-Inspired Robots
Abstract: Vision-based perception systems are typically exposed to large orientation changes in different robot applications. In such conditions, their performance might be compromised due to the inherent complexity of processing data captured under challenging motion. Integration of mechanical stabilizers to compensate for the camera rotation is not always possible due to robot payload constraints. This paper presents a processing-based stabilization approach to compensate for the camera's rotational motion both on events and on frames (i.e., images). Assuming that the camera's attitude is available, we evaluate the benefits of stabilization in two perception applications: feature tracking and estimating the translation component of the camera's ego-motion. The validation is performed using synthetic data and sequences from well-known event-based vision datasets. The experiments unveil that stabilization can improve feature tracking and camera ego-motion estimation accuracy by 27.37% and 34.82%, respectively. Concurrently, stabilization can reduce the processing time of computing the camera's linear velocity by at least 25%.
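Rotation compensation of pixel coordinates can be sketched as the warp x_stab ~ K R^T K^(-1) x; the intrinsics and attitude source (e.g., an IMU) below are illustrative assumptions:

```python
import numpy as np

K = np.array([[320.0, 0.0, 160.0],      # assumed pinhole intrinsics
              [0.0, 320.0, 120.0],
              [0.0, 0.0, 1.0]])

def derotate_points(uv, R):
    """uv: (N, 2) pixel coords of events/features; R: camera rotation since t0."""
    pts = np.c_[uv, np.ones(len(uv))] @ np.linalg.inv(K).T  # back-project to rays
    pts = pts @ R                        # right-multiply = apply R^T to each ray
    pts = pts @ K.T                      # re-project to the stabilized image
    return pts[:, :2] / pts[:, 2:3]

theta = np.deg2rad(5.0)                  # example: 5 degree roll about the optical axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
print(derotate_points(np.array([[200.0, 150.0]]), R))
```

For events, the same warp is applied per event using the attitude interpolated at its timestamp, which is what makes the approach natural for asynchronous data.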
|
|
10:35-10:40, Paper TuAT9.2 | |
GNN Topology Representation Learning for Deformable Multi-Linear Objects Dual-Arm Robotic Manipulation (I) |
|
Caporali, Alessio | University of Bologna |
Galassi, Kevin | Università di Bologna |
Zanella, Riccardo | Università degli Studi di Bologna |
Palli, Gianluca | University of Bologna |
Keywords: Computer Vision for Automation, Perception for Grasping and Manipulation, Dual Arm Manipulation
Abstract: Deformable Multi-Linear Objects (DMLOs), or Branched Deformable Linear Objects (BDLOs), are flexible objects that possess a linear structure similar to DLOs but also feature branching or bifurcation points where the object's path diverges into multiple sections. The representation of complex DMLOs, such as wiring harnesses, poses significant challenges in various applications, including robotic systems' perception and manipulation planning. This paper proposes an approach to address the robust and efficient estimation of a topological representation for DMLOs leveraging a graph-based description of the scene obtained via graph neural networks. Starting from a binary mask of the scene, graph nodes are sampled along the objects' estimated centerlines. Then, a data-driven pipeline is employed to learn the assignment of graph edges between nodes and to characterize the node's type based on their local topology and orientation. Finally, by utilizing the learned information, a solver combines the predictions and generates a coherent representation of the objects in the scene. The approach is experimentally evaluated using a test set of complex real-world DMLOs. Within an offline evaluation, the proposed approach achieves a Dice score exceeding 90% in predicting graph edges. Similarly, the identification accuracy of branch and intersection points in the graph topology is above 90%. Additionally, the method demonstrates efficient performance, achieving a runtime of over 20 FPS. In an online assessment employing a dual-arm robotic setup, the approach is successfully applied to disentangle three automotive wiring harnesses, demonstrating the effectiveness of the proposed approach in a real-world scenario.
|
|
10:40-10:45, Paper TuAT9.3 | |
MambaXCTrack: Mamba-Based Tracker with SSM Cross-Correlation and Motion Prompt for Ultrasound Needle Tracking |
|
Zhang, Yuelin | CUHK |
Lei, Long | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Yan, Wanquan | The Chinese University of Hong Kong |
Zhang, Tianyi | Duke Kunshan University |
Tang, Raymond Shing-Yan | The Chinese University of Hong Kong, Department of Medicine and Therapeutics |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics, Deep Learning Methods, Medical Robots and Systems
Abstract: Ultrasound (US)-guided needle insertion is widely employed in percutaneous interventions. However, providing feedback on the needle tip position via US images presents challenges due to noise, artifacts, and the thin imaging plane of US, which degrade needle features and lead to intermittent tip visibility. In this paper, a Mamba-based US needle tracker, MambaXCTrack, utilizing structured state space model cross-correlation (SSMX-Corr) and an implicit motion prompt is proposed, which is the first application of Mamba in US needle tracking. The SSMX-Corr enhances cross-correlation by long-range modeling and global searching of distant semantic features between template and search maps, benefiting tracking under noise and artifacts by implicitly learning potential distant semantic cues. By combining it with a cross-map interleaved scan (CIS), local pixel-wise interaction with positional inductive bias can also be introduced to SSMX-Corr. An implicit low-level motion descriptor is proposed as a non-visual prompt to enhance tracking robustness, addressing the intermittent tip visibility problem. Extensive experiments on a dataset with motorized needle insertion in both phantom and tissue samples demonstrate that the proposed tracker outperforms other state-of-the-art trackers, while ablation studies further highlight the effectiveness of each proposed tracking module.
|
|
10:45-10:50, Paper TuAT9.4 | |
LiVeDet: Lightweight Density-Guided Adaptive Transformer for Online On-Device Vessel Detection |
|
Zhang, Zijie | Tongji University |
Fu, Changhong | Tongji University |
Cao, Yongkang | Tongji University |
Li, Mengyuan | Tongji University |
Zuo, Haobo | University of Hong Kong |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Intelligent Transportation Systems
Abstract: Vision-based online vessel detection boosts the automation of waterways monitoring, transportation management and navigation safety. However, a significant gap exists in on-device deployment between general high-performance PCs/servers and embedded AI processors. Existing state-of-the-art (SOTA) online vessel detectors lack sufficient accuracy and are prone to high latency on the edge AI camera, especially in scenarios with dense vessels and diverse distributions. To solve the above issues, a novel lightweight framework with density-guided adaptive Transformer (LiVeDet) is proposed for the edge AI camera to achieve online on-device vessel detection. Specifically, a new instance-aware representation extractor is designed to suppress cluttered background noise and capture instance-aware content information. Additionally, an innovative vessel distribution estimator is developed to direct superior feature representation learning by focusing on local regions with varying vessel density. Besides, a novel dynamic region embedding is presented to integrate hierarchical features represented by multi-scale vessels. A new benchmark comprising 100 high-definition, high-frame rate video sequences from vessel-intensive scenarios is established to evaluate the efficacy of vessel detectors under challenging conditions prevalent in dynamic waterways. Extensive evaluations on this challenging benchmark demonstrate the robustness and efficiency of LiVeDet, achieving 32.9 FPS on the edge AI camera. Furthermore, real-world applications confirm the practicality of the proposed method.
|
|
10:50-10:55, Paper TuAT9.5 | |
EvTTC: An Event Camera Dataset for Time-To-Collision Estimation |
|
Sun, Kaizhen | Hunan University |
Li, Jinghang | Hunan University |
Dai, Kuan | Hunan University |
Liao, Bangyan | Westlake University |
Xiong, Wei | Xidi Zhijia (Hunan) Co., Ltd |
Zhou, Yi | Hunan University |
Keywords: Computer Vision for Transportation, Collision Avoidance, Data Sets for Robotic Vision
Abstract: Time-to-Collision (TTC) estimation lies at the core of the forward collision warning (FCW) functionality, which is key to all Automatic Emergency Braking (AEB) systems. Although the success of solutions using frame-based cameras (e.g., Mobileye's solutions) has been witnessed in normal situations, some extreme cases, such as sudden variations in the relative speed of leading vehicles and the sudden appearance of pedestrians, still pose significant risks that cannot be handled. This is due to the inherent imaging principles of frame-based cameras, where the time interval between adjacent exposures introduces considerable system latency to AEB. Event cameras, as a novel bio-inspired sensor, offer ultra-high temporal resolution and can asynchronously report brightness changes at the microsecond level. To explore the potential of event cameras in the above-mentioned challenging cases, we propose EvTTC, which is, to the best of our knowledge, the first multi-sensor dataset focusing on TTC tasks under high-relative-speed scenarios. EvTTC consists of data collected using standard cameras and event cameras, covering various potential collision scenarios in daily driving and involving multiple collision objects. Additionally, LiDAR and GNSS/INS measurements are provided for the calculation of ground-truth TTC. Considering the high cost of testing TTC algorithms on full-scale mobile platforms, we also provide a small-scale TTC testbed for experimental validation and data augmentation. All the data and the design of the testbed are open-sourced, and they can serve as a benchmark that will facilitate the development of vision-based TTC techniques.
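Under a constant-relative-velocity assumption, TTC reduces to range over closing speed; the sketch below uses two range samples and is purely illustrative of the quantity being benchmarked, not of how the dataset's ground truth is computed:

```python
def ttc(d_prev, d_curr, dt):
    """TTC from two range-to-lead-vehicle samples taken dt seconds apart."""
    closing_speed = (d_prev - d_curr) / dt   # > 0 when approaching
    if closing_speed <= 0:
        return float("inf")                  # not on a collision course
    return d_curr / closing_speed

print(ttc(d_prev=20.0, d_curr=19.2, dt=0.05))  # 19.2 / 16 = 1.2 s
```

Vision-based pipelines estimate the same ratio from image scale change (frames) or brightness-change geometry (events), which is exactly where the microsecond resolution of event cameras pays off at high relative speed.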
|
|
10:55-11:00, Paper TuAT9.6 | |
WeatherDG: LLM-Assisted Procedural Weather Generation for Domain-Generalized Semantic Segmentation |
|
Qian, Chenghao | University of Leeds |
Guo, Yuhu | Carnegie Mellon University |
Mo, Yuhong | Carnegie Mellon University |
Li, Wenjing | University of Leeds |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Deep Learning for Visual Perception
Abstract: In this work, we propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse driving-scene images based on the cooperation of two foundation models, i.e., Stable Diffusion (SD) and a Large Language Model (LLM). Specifically, we first fine-tune the SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. Then, we propose a procedural prompt generation method based on the LLM, which can enrich scenario descriptions and help SD automatically generate more diverse, detailed images. In addition, we introduce a balanced generation strategy, which encourages the SD to generate high-quality objects of tailed classes under various weather conditions, such as riders and motorcycles. This segmentation-model-agnostic method can improve the generalization ability of existing models by additionally adapting them with the generated synthetic data. Experiments on three challenging datasets show that our method can significantly improve the segmentation performance of different state-of-the-art models on target domains. Notably, in the setting of ''Cityscapes to ACDC'', our method improves the baseline HRDA by 13.9% in mIoU.
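Procedural prompt assembly in this spirit can be sketched as below; the vocabulary and template are invented placeholders (the paper enriches descriptions with an LLM rather than a fixed word list):

```python
import random

WEATHER = ["heavy rain", "dense fog", "snowfall", "clear dusk"]
TAIL_CLASSES = ["a rider on a motorcycle", "a cyclist", "a traffic sign"]

def make_prompt(rng=random):
    """Compose one weather-diverse scene description for a text-to-image model,
    deliberately featuring a tail class to balance the generated data."""
    return (f"a photo of a driving scene in {rng.choice(WEATHER)}, "
            f"featuring {rng.choice(TAIL_CLASSES)}, "
            f"dashcam viewpoint, photorealistic")

random.seed(0)
for _ in range(3):
    print(make_prompt())
```

Forcing tail classes into a fixed share of prompts is one simple way to realize the balanced generation strategy the abstract describes.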
|
|
11:00-11:05, Paper TuAT9.7 | |
Autonomous Hyperspectral Characterisation Station: Robot Aided Measuring of Polymer Degradation (I) |
|
Azizi, Shayan | RMIT University |
Asadi, Ehsan | RMIT University |
Howard, Shaun | CSIRO |
Muir, Benjamin Ward | CSIRO |
O'shea, Riley | CSIRO |
Bab-Hadiashar, Alireza | RMIT University |
Keywords: Computer Vision for Automation, Reactive and Sensor-Based Planning, Task Planning
Abstract: This paper addresses a gap between the capabilities and utilisation of robotics and automation in laboratory settings and builds upon the concept of Self-Driving Labs (SDL). We introduce an innovative approach to the temporal characterisation of materials. The article discusses the challenges posed by manual methods involving established laboratory equipment and presents an automated hyperspectral characterisation station. This station integrates robot-aided hyperspectral imaging (HSI), complex material characterisation modelling, and automated data analysis, offering a non-destructive and comprehensive approach. This work explains how the proposed assembly can automatically measure the half-life of biodegradable polymers with higher throughput and accuracy than manual methods. The investigation explores the effect of pH, number-average molecular weight (Mn), end groups, and blends on the degradation rate of polylactic acid (PLA). The novel contributions of the paper lie in introducing an adaptable classification station for characterisation and presenting an innovative methodology for polymer degradation rate measurements. The proposed system holds promise for expediting the development of high-throughput screening and characterisation methods within advanced material and chemistry laboratories. Note to Practitioners—The characterisation and classification of materials hold significant importance within the realms of materials science, chemistry, manufacturing, and the circular economy. We introduce an innovative approach by employing robotics to automate hyperspectral imaging and processing to accelerate and improve material characterisation within a laboratory environment. This methodology facilitates the time-dependent characterisation of materials. This automated system encompasses sample manipulation and handling for hyperspectral scanning, accompanied by automated image and data processing procedures to minimise manual interventions effectively. The developed system can be seamlessly integrated into lab settings, offering objective, accurate, and automated measurement of biodegradable polymer degradation rates.
|
|
11:05-11:10, Paper TuAT9.8 | |
Decompose-Compose Feature Augmentation for Imbalanced Crack Recognition in Industrial Scenarios (I) |
|
Chen, ZhuangZhuang | Shenzhen University |
Xu, Chengqi | University of Southern California |
Hu, Tao | Shenzhen University |
Wang, Li | Shenzhen University |
Chen, Jie | Shenzhen University |
Li, Jianqiang | Shenzhen University |
Keywords: Computer Vision for Manufacturing, Computer Vision for Transportation
Abstract: Automated crack recognition has achieved remarkable progress in the past decades as a critical task in structural health monitoring, ensuring safety and durability in many industrial scenarios. However, imbalanced crack recognition remains challenging due to the scarcity of crack samples and the consequential limited diversity. To resolve this, Artificial Intelligence Generated Content (AIGC) has been gradually adopted to generate synthetic data and reduce reliance on large amounts of labeled crack samples. This paper assumes that a crack sample in the feature space can be regarded as a combination of crack and background semantics. Then, the decompose-compose feature augmentation framework (DeCo) is proposed to perform crack data synthesis in the feature space by randomly composing crack and background semantic-relevant features. Specifically, a contrastive learning-based decomposing loss is proposed to enforce two encoders to separately learn crack and background semantics from crack samples with a theoretical guarantee. After that, an effective cross-instance feature union strategy is proposed to synthesize diverse crack samples by composing the crack-relevant features from a crack sample and background-relevant features across other training samples. Experimental results show that DeCo performs favorably against state-of-the-art competitors in imbalanced crack recognition tasks.
|
|
TuAT10 |
310 |
Computer Vision for Medical Robotics |
Regular Session |
|
10:30-10:35, Paper TuAT10.1 | |
Unsupervised Liver Deformation Correction Network Using Optimal Transport for Image-Guided Liver Surgery |
|
Liu, Mingyang | Shandong University |
Li, Geng | Shandong University |
Yu, Hao | Shandong University |
Du, Xinzhe | Shandong University |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Min, Zhe | University College London |
Keywords: Computer Vision for Medical Robotics, Medical Robots and Systems
Abstract: In this paper, we propose a novel unsupervised liver deformation correction method, Learning Coherent Point Drift Network (LCNet), for image-guided liver surgery (IGLS). To address the complete-to-partial registration problem, we first crop the preoperative liver point cloud to obtain the overlapping regions with the intraoperative point cloud. The cropped preoperative point set is then fed into the subsequent registration network for alignment. Secondly, we establish reliable correspondences between the two point sets using the optimal transport (OT) module by leveraging both original points and learnt features, which are robust to the rigid transformation. Finally, we compute the displacements by solving the involved matrix equation in the Transformation module, where the point localisation noise is explicitly considered. In addition, we present three variants of the proposed approach, i.e., LCNet, LCNet-ED and LCNet-WD, where LCNet outperforms the other two, demonstrating the superiority of the Chamfer loss. We have extensively evaluated LCNet on the MedShapeNet dataset, consisting of 615 different liver shapes of real patients, and on 3Dircadb, consisting of another 20 liver models of real patients. For example, when the overlap ratio is 25%, the deformation magnitude is 8 mm, the maximum noise magnitude is 2 mm and the rotation angle lies in the range of [-45°, 45°], LCNet achieves a root-mean-square error (RMSE) of 3.21 mm on MedShapeNet, outperforming Lepard and RoITr, whose RMSEs are 5.41 mm (p<0.001) and 4.90 mm (p<0.001), respectively. Extensive experimental results, under different deformation and noise levels, demonstrate that LCNet exhibits significant improvements over existing state-of-the-art registration methods and holds significant application potential in IGLS.
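To make the OT correspondence step concrete, here is a minimal entropy-regularised Sinkhorn sketch that computes a soft matching between two point sets from a pairwise cost. The cost here uses coordinates only, whereas LCNet also leverages learnt features; the sinkhorn helper is illustrative, not the released code.

```python
# Entropy-regularised optimal transport via Sinkhorn iterations, used here
# to build soft point correspondences; a sketch, not LCNet's OT module.
import numpy as np

def sinkhorn(cost: np.ndarray, eps: float = 0.05, iters: int = 100) -> np.ndarray:
    """Return a soft correspondence matrix with uniform marginals."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    v = np.full(m, 1.0 / m)
    for _ in range(iters):
        u = (1.0 / n) / (K @ v)      # scale rows to the source marginal
        v = (1.0 / m) / (K.T @ u)    # scale columns to the target marginal
    return np.diag(u) @ K @ np.diag(v)

rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))                 # source points
tgt = src + 0.01 * rng.normal(size=(50, 3))    # jittered targets
cost = np.linalg.norm(src[:, None] - tgt[None], axis=-1)
matches = sinkhorn(cost).argmax(axis=1)        # hard correspondences
print("correct matches:", int((matches == np.arange(50)).sum()), "/ 50")
```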
|
|
10:35-10:40, Paper TuAT10.2 | |
A Deep Learning-Driven Autonomous System for Retinal Vein Cannulation: Validation Using a Chicken Embryo Model |
|
Wang, Yi | Johns Hopkins University |
Zhang, Peiyao | Johns Hopkins University |
Esfandiari, Mojtaba | Johns Hopkins University |
Gehlbach, Peter | Johns Hopkins Medical Institute |
Iordachita, Ioan Iulian | Johns Hopkins University |
Keywords: Computer Vision for Medical Robotics, Medical Robots and Systems, Vision-Based Navigation
Abstract: Retinal vein cannulation (RVC) is a minimally invasive microsurgical procedure for treating retinal vein occlusion (RVO), a leading cause of vision impairment. However, the small size and fragility of retinal veins, coupled with the need for high-precision, tremor-free needle manipulation, pose significant challenges for surgeons. These limitations underscore the necessity for robotic assistance to enhance surgical accuracy, stability, and reproducibility. This study presents an automated robotic system with a top-down microscope and cross-sectional B-scan optical coherence tomography (OCT) imaging for precise depth sensing. Deep learning-based models enable real-time needle navigation, contact detection, and vein puncture recognition, using a chicken embryo model, a well-established biological surrogate for human retinal veins. The system autonomously detects needle position and puncture events with 85% accuracy. The experiments demonstrate notable reductions in navigation and puncture times compared to manual methods. Our results demonstrate the potential of integrating advanced imaging and deep learning to automate microsurgical tasks, providing a pathway for safer and more reliable RVC procedures with enhanced precision and reproducibility.
|
|
10:40-10:45, Paper TuAT10.3 | |
Deep Coarse-To-Fine Networks for Robust Segmentation and Pose Estimation of Surgical Suturing Threads |
|
Zhou, Xinyao | Shanghai Jiao Tong University |
Liu, Yuxuan | Shanghai Jiao Tong University |
Zhang, Musen | Shanghai Jiaotong University |
Li, Jinkai | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Laparoscopy
Abstract: Autonomous suturing is a critical challenge in robot-assisted surgery, where accurate segmentation and pose estimation of suturing threads are essential prerequisites. However, suturing threads are easily occluded by moving instruments and embedded in deformable tissues, which makes the task much more challenging. To address this, we propose a coarse-to-fine network for detailed segmentation and pose estimation of suturing threads. The coarse stage aims to capture the global thread structure, while the fine stage refines the detailed structure through error residual correction. A spatial context fusion module is incorporated to improve the perception of occluded regions, and a weighted balanced cross-entropy loss as well as a hard-sample mining strategy are implemented to enhance small-target segmentation performance. To deal with severe occlusions, topological constraints are utilized to effectively identify and reconstruct invisible thread segments. Experiments have been conducted on three datasets collected from different surgical scenes including phantom, endoscopy, and microsurgery. Both quantitative and qualitative results demonstrate that our proposed framework outperforms baseline methods on segmentation and pose estimation of suturing threads, particularly in detecting occluded threads. Our proposed framework generalizes well across different surgical scenarios, showing its potential for automatic suturing.
|
|
10:45-10:50, Paper TuAT10.4 | |
Directed Spatial Consistency-Based Partial-To-Partial Point Cloud Registration with Deep Graph Matching |
|
Zhou, Jingwen | Shandong University |
Fu, Kexue | Fudan University |
Du, Xinzhe | Shandong University |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Min, Zhe | University College London |
Keywords: Computer Vision for Medical Robotics, Medical Robots and Systems
Abstract: 3D point cloud registration is an essential problem in computer vision, robotics, surgical navigation and augmented reality. Accurate registration of partially overlapped intraoperative point clouds (e.g., femoral reconstruction) remains critical yet challenging in orthopedic navigation due to incomplete overlap and dynamic noise. In this study, we propose a partial-to-partial point cloud registration framework based on directional spatial consistency. First, we extract overlapped areas from partially overlapping point clouds and leverage the point registration graph matching module to calculate the hard point matching matrix. Second, we sample nodes from the source point cloud and generate translation-invariant edge vectors (direction/scale-preserving) via their k-nearest neighbors, guided by predicted point correspondences. This bypasses translation ambiguities by encoding spatial consistency through edges, reducing pose estimation to 3-DoF alignment (rotation). The loss explicitly couples point-level matches with edge-level geometric constraints for dual optimization. Building upon this framework, we extract reliable overlapping edge representations and prune their similarity matrix by thresholding low-confidence scores, effectively suppressing spurious matches. The proposed edge-aware matching mechanism further exploits the translation invariance of local structures to refine point correspondences with enhanced accuracy. Finally, we introduce a bidirectional registration mechanism to reinforce optimization stability, achieving state-of-the-art performance across benchmarks. Extensive experiments on ModelNet40, ShapeNet, and MedShapeNet validate our method under diverse scenarios: partial-to-partial, unseen categories, partial-to-full, and cross-dataset generalization, surpassing existing methods in registration accuracy. The code will be publicly released at https://github.com/pidan0824/DSCGM.
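The translation-invariant edge construction can be pictured in a few lines of code: edge vectors to k-nearest neighbours are unchanged by any global translation, leaving only the rotation to estimate. The knn_edge_vectors helper below is a hypothetical illustration, not the DSCGM implementation.

```python
# Translation-invariant edge vectors from k-nearest neighbours; a sketch
# of the idea, with brute-force neighbour search for clarity.
import numpy as np

def knn_edge_vectors(points: np.ndarray, k: int = 4) -> np.ndarray:
    """Return (N, k, 3) vectors from each point to its k nearest neighbours."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)               # exclude self-matches
    idx = np.argsort(dist, axis=1)[:, :k]
    return points[idx] - points[:, None, :]

pts = np.random.default_rng(1).normal(size=(100, 3))
shifted = pts + np.array([5.0, -2.0, 1.0])       # globally translated copy
print("translation-invariant:",
      np.allclose(knn_edge_vectors(pts), knn_edge_vectors(shifted)))
```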
|
|
10:50-10:55, Paper TuAT10.5 | |
Revisiting 3D Curve to Surface Registration Using Tangent and Normal Vectors for Computer-Assisted Orthopedic Surgery |
|
Zhang, Zhengyan | Harbin Institute of Technology, Shenzhen |
Du, Xinzhe | Shandong University |
Min, Zhe | University College London |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics, Medical Robots and Systems
Abstract: In this paper, we present a novel curve-to-surface registration method, termed Bi-directional Hybrid Mixture Model Registration based on Dual-constrained Tangent and Normal Vectors (BiHMM-DTN). While hybrid registration models incorporating tangent and normal vectors (HMM-TN) demonstrate success, their geometric constraints prove inadequate or inappropriate for sparse intraoperative point sets, frequently yielding suboptimal optimization outcomes. By critically revisiting the geometric constraints of HMM-TN, we propose a dual-constraints-based hybrid mixture model registration framework with enhanced intraoperative point cloud acquisition protocols. To deal with noise and outliers in preoperative and intraoperative point sets—caused by reconstruction inaccuracies and tracking errors respectively—our approach employs a bi-directional registration mechanism for curve-to-surface registration. We provide rigorous proofs validating the geometric completeness of the dual constraints within this mechanism. The BiHMM-DTN framework is formulated as a maximum likelihood estimation (MLE) problem and optimized using an expectation-maximization (EM) algorithm. Furthermore, to enhance convergence stability and accelerate optimization, the rotation matrix is updated iteratively through successive incremental steps. Extensive experiments on human femur and hip models demonstrate that our method outperforms state-of-the-art approaches, including both traditional optimization and deep learning methods, under various noise and outlier conditions. In addition, real-world phantom experiments highlight the potential clinical value of our method for surgical navigation applications. The codes and data are available at https://github.com/sam-zyzhang/BiHMM-DTN.git.
|
|
10:55-11:00, Paper TuAT10.6 | |
Gaussian Splatting with Reflectance Regularization for Endoscopic Scene Reconstruction |
|
Li, Chengkun | The Chinese University of Hong Kong |
Chen, Kai | The Chinese University of Hong Kong |
Qiu, Shi | The Chinese University of Hong Kong |
Chan, Jason Ying-Kuen | The Chinese University of Hong Kong |
Dou, Qi | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics
Abstract: Endoscopic reconstruction plays a crucial role in surgical robotics. The dynamic lighting conditions and integrated camera-light source in endoscopic scenes create a distinct reconstruction challenge: shape ambiguity. To mitigate this, we propose a Gaussian Splatting (GS) based framework for endoscopic scene reconstruction, enhanced with reflectance regularization. We embed every 3D Gaussian point with physical reflective attributes and combine this representation with a physically based inverse rendering framework. By jointly training 3DGS for view synthesis with this reflectance regularization, we are able to attain high-quality geometry without changing the volume rendering pipeline. Our experiments demonstrate superiority in both geometry representation and rendering performance compared to existing GS approaches, making our framework a practical solution for endoscopic applications. The project page is available at: https://med-air.github.io/GSR2.
|
|
TuAT11 |
311A |
Reinforcement Learning 1 |
Regular Session |
Co-Chair: Xiao, Xuesu | George Mason University |
|
10:30-10:35, Paper TuAT11.1 | |
Deep Reinforcement Learning-Based Mapless Navigation for Mobile Robot in Unknown Environment with Local Optima |
|
Hu, Yiming | Huazhong University of Science and Technology |
Wang, Shuting | Huazhong University of Science and Technology |
Xie, Yuanlong | Huazhong University of Science and Technology |
Zheng, Shiqi | China University of Geosciences Wuhan Campus |
Shi, Peng | The University of Adelaide |
Rudas, Imre J. | Óbuda University |
Cheng, Xiang | Huazhong University of Science and Technology |
Keywords: Reinforcement Learning, Collision Avoidance
Abstract: Local optima issues challenge mobile robot mapless navigation with the dilemma of avoiding collisions and approaching the target. Planning-based methods rely on environmental models and manual strategies for updating local paths to guide the robot. In contrast, learning-based methods are capable of processing original sensor data to navigate the robot in real time but struggle with local optima problems. To address this, we design reward rules that punish the robot for revisiting previously passed areas that may trap it, and reward it for exploring local areas in diverse ways and escaping from local-optima areas. Then, we improve the Soft Actor-Critic (SAC) algorithm by making its temperature parameter adaptive to the current training status and memorizing it in the experiences for strategy updating, bringing additional exploratory behaviors and necessary stability into the training. Finally, with the assistance of auxiliary networks, the robot learns to handle various navigation tasks with local optima risks. Simulations demonstrate the advantages of our method in terms of both success rate and path efficiency compared to several existing methods. Real-world experiments further verify the proposed method.
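As an illustrative sketch of such a revisit-penalty reward rule (the grid cell size and reward weights below are our assumptions, not the paper's values):

```python
# Hypothetical revisit-penalty reward shaping for mapless navigation:
# penalise returning to an already-visited grid cell, reward new cells.
import numpy as np

class RevisitPenalty:
    def __init__(self, cell: float = 0.5, penalty: float = -0.1,
                 explore_bonus: float = 0.05):
        self.cell = cell
        self.visited = set()
        self.penalty = penalty
        self.explore_bonus = explore_bonus

    def __call__(self, xy: np.ndarray) -> float:
        key = (int(xy[0] // self.cell), int(xy[1] // self.cell))
        if key in self.visited:
            return self.penalty       # discourage oscillating in local optima
        self.visited.add(key)
        return self.explore_bonus     # reward covering a new local area

shaper = RevisitPenalty()
path = np.array([[0.1, 0.1], [0.6, 0.1], [0.1, 0.1]])  # returns to start
print([shaper(p) for p in path])  # [0.05, 0.05, -0.1]
```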
|
|
10:35-10:40, Paper TuAT11.2 | |
Point Cloud-Based End-To-End Formation Control Using a Two Stage SAC Algorithm |
|
Li, Mingfei | Beijing University of Technology |
Liu, Haibin | Beijing University of Technology |
Xie, Feng | Beijing University of Technology |
Huang, He | Beijing University of Technology |
Keywords: Reinforcement Learning, Sensor-based Control, AI-Based Methods
Abstract: This study develops a novel end-to-end formation strategy for leader-follower formation control of mobile robots that uses onboard LiDAR sensors in non-communication environments. The main contributions of this paper are twofold: Firstly, we propose a point cloud-based LiDAR servoing control method (PCLS) aimed at ensuring mobile robots achieve the predefined formation performance without direct communication. Secondly, an innovative two-stage Soft Actor-Critic (TSSAC) algorithm is presented, specifically designed for end-to-end training of PCLS. This algorithm skillfully combines the strengths of a distance-based agent (serving as a "teacher") and a point cloud-based agent (serving as a "student"), effectively addressing the issues of slow convergence and insufficient generalization in deep reinforcement learning methods that use high-dimensional features (such as point clouds, images) as inputs. Furthermore, as part of our method, we designed a novel reward function and normalized the point cloud inputs to provide consistent incentives for the agent across diverse formation tasks, thereby facilitating better learning and adaptation to formation tasks in different environments. Finally, through extensive experiments conducted in the Gazebo simulator and real-world environments, we confirmed the effectiveness of the proposed method. Compared to other formation control strategies, our approach relies solely on onboard LiDAR sensors, without the need for additional communication devices, while ensuring excellent transient and steady-state performance. The video of the test in the real-world environment can be found at https://youtu.be/jhggsLUczPA
|
|
10:40-10:45, Paper TuAT11.3 | |
Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning |
|
Chen, Xiaocong | CSIRO |
Wang, Siyu | University of New South Wales |
Yu, Tong | Adobe Research |
Yao, Lina | CSIRO & UNSW |
Keywords: Reinforcement Learning
Abstract: Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. Specifically, our model-free offline RL algorithm learns risk-averse policies and characterizes the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
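One way to picture risk-averse behavior with a distributional critic is to rank candidate actions by the conditional value-at-risk (CVaR) of their return distributions rather than the mean; the quantile values below are fabricated for illustration, and the cvar helper is not the paper's algorithm.

```python
# CVaR over return quantiles: the mean prefers the risky action, while
# CVaR prefers the safe one. All values are fabricated.
import numpy as np

def cvar(quantiles: np.ndarray, alpha: float = 0.25) -> float:
    """Average of the worst alpha-fraction of sorted return quantiles."""
    q = np.sort(quantiles)
    k = max(1, int(np.ceil(alpha * len(q))))
    return float(q[:k].mean())

returns_risky = np.array([-5.0, 1.0, 2.0, 3.0, 10.0])
returns_safe = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
print("mean prefers risky:", returns_risky.mean() > returns_safe.mean())  # True
print("CVaR prefers safe:", cvar(returns_safe) > cvar(returns_risky))     # True
```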
|
|
10:45-10:50, Paper TuAT11.4 | |
IHGSL: Interpretable Heuristic Graph Structure Learning for Multi-Robot Autonomous Collaborative Systems |
|
Han, Yue | AVIC SADRI |
Li, Hanqi | Shenyang Aerospace University |
Liu, Cuiwei | Shenyang Aerospace University |
Liang, Chen | Shenyang Institute of Automation (SIA), Chinese Academy of Sciences |
Sun, Zhixiao | AVIC SADRI |
Keywords: Reinforcement Learning, Cooperating Robots, Representation Learning
Abstract: In multi-robot systems, capturing the complex and dynamic interaction relationships is essential for enhancing autonomous collaboration. However, existing learning-based approaches usually overlook the understanding of these relationships, leading to reliability issues and hindering their application to real-world scenarios. This paper proposes a novel approach called Interpretable Heuristic Graph Structure Learning (IHGSL) to better comprehend the complex collaborative relationships in multi-robot systems. We first construct a predicate space to define diverse predicates that express fundamental relationships. Then we employ the variational information bottleneck technique to acquire a latent representation of the current observation by aligning it with the historical trajectory. On this basis, the predicates that the robot should currently focus on the most are learned, and interaction relationships are established accordingly. In this way, an interpretable relationship graph is generated heuristically to guide multi-robot autonomous collaborative decision-making. Through experimental evaluation, we demonstrate the process of relationship inference, thus validating the interpretability of IHGSL. Compared with existing methods, IHGSL also achieves superior collaboration performance, which highlights the effectiveness of the learned heuristic graph structure.
|
|
10:50-10:55, Paper TuAT11.5 | |
Visual Multitask Policy Learning with Asymmetric Critic Guided Distillation |
|
Srinivasan, Krishnan | Stanford University |
Xu, Jie | NVIDIA |
Ang, Henry | Stanford University |
Heiden, Eric | NVIDIA |
Fox, Dieter | University of Washington |
Bohg, Jeannette | Stanford University |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Reinforcement Learning, Imitation Learning, Dexterous Manipulation
Abstract: We present Asymmetric Critic Guided Distillation (ACGD), a framework for learning multi-task dexterous manipulation policies that can manipulate articulated objects using images as input. ACGD is a scalable student-teacher distillation approach that utilizes behavior cloning to distill multiple expert policies into a single vision-based, multi-task student policy for dexterous manipulation. The expert policies are trained with traditional RL techniques with access to privileged state information of both the robot and the manipulated object, while the distilled student policy operates under realistic sensory constraints, specifically using only camera images and robot proprioception. During distillation, we use an expert critic that provides action labels and value estimates to refine the student's action sampling through a dual IL/RL objective. In the multi-task setting, we achieve this through an aggregate critic for different single-task experts. Our approach exhibits strong performance compared to a number of state-of-the-art imitation learning (IL) and reinforcement learning (RL) baselines. We evaluate across a variety of multi-task dexterous manipulation benchmarks, including bimanual manipulation, single-hand object articulation tasks, and a tendon-actuated hand, and achieve state-of-the-art performance with a 10-15% improvement over baseline algorithms. Visit our website (https://critic-guided-distillation.github.io) for more details.
|
|
10:55-11:00, Paper TuAT11.6 | |
GACL: Grounded Adaptive Curriculum Learning with Active Task and Performance Monitoring |
|
Wang, Linji | George Mason University |
Xu, Zifan | University of Texas at Austin |
Stone, Peter | The University of Texas at Austin |
Xiao, Xuesu | George Mason University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Continual Learning
Abstract: Curriculum learning has emerged as a promising approach for training complex robotics tasks, yet current applications predominantly rely on manually designed curricula, which demand significant engineering effort and can suffer from subjective and suboptimal human design choices. While automated curriculum learning has shown success in simple domains like grid worlds and games where task distributions can be easily specified, robotics tasks present unique challenges: they require handling complex task spaces while maintaining relevance to target domain distributions that are only partially known through limited samples. To this end, we propose Grounded Adaptive Curriculum Learning (GACL), a framework specifically designed for robotics curriculum learning with three key innovations: (1) a task representation that consistently handles complex robot task design, (2) an active performance tracking mechanism that allows adaptive curriculum generation appropriate for the robot's current capabilities, and (3) a grounding approach that maintains target domain relevance through alternating sampling between reference and synthetic tasks. We validate GACL on wheeled navigation in constrained environments and quadruped locomotion in challenging 3D confined spaces, achieving 6.8% and 6.1% higher success rates, respectively, than state-of-the-art methods in each domain.
|
|
11:00-11:05, Paper TuAT11.7 | |
RT-HCP: Dealing with Inference Delays and Sample Efficiency to Learn Directly on Robotic Platforms |
|
El Asri, Zakariae | Sorbonne Université, CNRS, ISIR, F-75005 Paris, France |
Laiche, Ibrahim | Sorbonne Université, CNRS, ISIR |
Rambour, Clément | Sorbonne University |
Sigaud, Olivier | Sorbonne Université |
Thome, Nicolas | Sorbonne University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Model Learning for Control
Abstract: Learning a controller directly on the robot requires extreme sample efficiency. Model-based reinforcement learning (RL) methods are the most sample efficient, but they often suffer from an inference time too long to meet the robot control frequency requirements. In this paper, we address the sample efficiency and inference time challenges with two contributions. First, we define a general framework to deal with inference delays, in which the slow-inference robot controller provides a sequence of actions to feed the control-hungry robotic platform without execution gaps. Then, we compare several RL algorithms in the light of this framework and propose RT-HCP, an algorithm that offers an excellent trade-off between performance, sample efficiency and inference time. We validate the superiority of RT-HCP with experiments where we learn a controller directly on a simple but high-frequency Furuta pendulum platform.
|
|
11:05-11:10, Paper TuAT11.8 | |
DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy |
|
Zhou, Weitao | Tsinghua University |
Zhang, Bo | DIdi Inc |
Cao, Zhong | University of Michigan |
Li, Xiang | The Lab for High Technology, Tsinghua University |
Cheng, Qian | Tsinghua University |
Liu, Chunyang | DiDi Chuxing |
Zhang, Ya-Qin | Institute for AI Industry Research(AIR), Tsinghua University |
Yang, Diange | Tsinghua University |
Keywords: Robot Safety, Reinforcement Learning, Continual Learning
Abstract: With the increasing presence of automated vehicles on open roads under driver supervision, disengagement cases are becoming more prevalent. While some data-driven planning systems attempt to directly utilize these disengagement cases for policy improvement, the inherent scarcity of disengagement data (often occurring as a single instance) restricts training effectiveness. Furthermore, some disengagement data should be excluded, since a disengagement does not always stem from a failure of the driving policy; e.g., the driver may casually intervene for a while. To this end, this work proposes disengagement-reason-augmented reinforcement learning (DRARL), which enhances the driving policy improvement process according to the reason behind each disengagement case. Specifically, the reason for disengagement is identified by an out-of-distribution (OOD) state estimation model. When no such reason exists, the case is identified as a casual disengagement that requires no additional policy adjustment. Otherwise, the policy is updated in a reason-augmented imagination environment, improving the policy performance on disengagement cases with similar reasons. The method is evaluated using real-world disengagement cases collected by autonomous driving robotaxis. Experimental results demonstrate that the method accurately identifies policy-related disengagement reasons, allowing the agent to handle both original and semantically similar cases through reason-augmented training. Furthermore, the approach prevents the agent from becoming overly conservative after policy adjustments. Overall, this work provides an efficient way to improve driving policy performance with disengagement cases.
|
|
TuAT12 |
311B |
RGB-D Perception 1 |
Regular Session |
|
10:30-10:35, Paper TuAT12.1 | |
Towards Label-Free 3D Visual Grounding with Vision Foundation Models |
|
Wu, Xiaopei | Zhejiang University |
Hou, Yuenan | Chinese University of Hong Kong |
Lin, Binbin | Zhejiang University |
Zhu, Xinge | CUHK |
Ma, Yuexin | ShanghaiTech University |
Liu, Haifeng | Zhejiang University |
Cai, Deng | Zhejiang University |
Sun, Xiao | Shanghai AI Laboratory, China |
Keywords: RGB-D Perception, Recognition
Abstract: 3D visual grounding is pivotal for enabling intelligent agents to find the target object in a 3D scene given a linguistic description. However, contemporary methods are typically hindered by the scarcity of large-scale 3D datasets with fine-grained annotations and the complexity of modeling spatial relationships in 3D space. Inspired by the exceptional performance of Vision Foundation Models (VFMs) and Vision Language Models (VLMs), we propose a novel label-free 3D visual grounding method, termed LF-3DVG, which minimizes the heavy reliance on fine-grained annotations and leverages off-the-shelf vision foundation models for zero-shot 3D visual grounding. Our LF-3DVG comprises two main components, i.e., VFM-guided 3D Object Detection and VLM-based 3D Visual Grounding. Specifically, we first utilize SAM3D to generate high-quality instance masks for the objects in the 3D scene. Since SAM3D cannot provide categorical information, we further employ Semantic-SAM to assign class labels to the detected masks. As for the VLM-based 3D Visual Grounding, we first feed multi-view images and textual descriptions to the VLM for 2D visual grounding. To lift the 2D predictions to 3D space, we design a 2D-3D object association module to effectively match the 2D detection results with the 3D boxes produced by the 3D detector, yielding the final 3D visual grounding results. Through extensive experiments on the ScanRefer and Sr3D/Nr3D benchmarks, we demonstrate that our method consistently outperforms previous approaches. Our algorithm can also boost the performance of 3D visual grounding when given labeled training samples and can be seamlessly integrated into contemporary 3D visual grounding models.
|
|
10:35-10:40, Paper TuAT12.2 | |
LR^2Depth: Large-Region Aggregation at Low Resolution for Efficient Monocular Depth Estimation |
|
Ning, Chao | The University of Tokyo, RIKEN |
Xuan, Weihao | The University of Tokyo |
Gan, Wanshui | The University of Tokyo |
Yokoya, Naoto | The University of Tokyo & RIKEN AIP |
Keywords: RGB-D Perception, Computer Vision for Automation, Deep Learning Methods
Abstract: Monocular depth estimation (MDE) is crucial for various computer vision applications, but existing methods often struggle to balance inference speed and accuracy when processing large-region visual information. This paper introduces LR^2Depth, a novel MDE method that addresses this challenge by utilizing large-kernel convolution on low-resolution feature maps for efficient large-region feature aggregation. Our approach leverages the fact that each pixel on low-resolution feature maps corresponds to a larger region of the original image, allowing for fast and accurate depth predictions at a lower inference cost. Extensive experiments on NYU-Depth-V2, KITTI, and SUN RGB-D datasets demonstrate that LR^2Depth not only achieves state-of-the-art performance but also operates approximately twice as fast as previous MDE methods. Notably, at the time of submission, LR^2Depth secured the top-1 position on the KITTI depth prediction online benchmark.
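A minimal sketch of the core building block, assuming PyTorch: a depthwise large-kernel convolution applied to a low-resolution feature map, so each output pixel aggregates a large region of the original image at low cost. The channel count and kernel size are assumptions, not the paper's configuration.

```python
# Illustrative large-region aggregation block: depthwise large-kernel conv
# on a low-res feature map plus pointwise channel mixing; not LR^2Depth code.
import torch
import torch.nn as nn

class LargeRegionBlock(nn.Module):
    def __init__(self, channels: int = 64, kernel: int = 13):
        super().__init__()
        # Depthwise conv keeps cost low even with a large kernel; each
        # low-res pixel already covers a wide area of the input image.
        self.spatial = nn.Conv2d(channels, channels, kernel,
                                 padding=kernel // 2, groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.mix(self.spatial(x)))

feat = torch.randn(1, 64, 15, 20)      # hypothetical 1/32-resolution features
print(LargeRegionBlock()(feat).shape)  # torch.Size([1, 64, 15, 20])
```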
|
|
10:40-10:45, Paper TuAT12.3 | |
Adjacent-View Transformers for Supervised Surround-View Depth Estimation |
|
Guo, Xianda | School of Computer Science, Wuhan University |
Yuan, Wenjie | Alibaba Group |
Zhang, Yunpeng | PhiGent Robotics |
Yang, Tian | PhiGent Robotics |
Zhang, Chenming | Xi'an Jiaotong University |
Zhu, Zheng | Institute of Automation, Chinese Academy of Sciences |
Zou, Qin | Wuhan University |
Chen, Long | Chinese Academy of Sciences |
Keywords: RGB-D Perception, Computer Vision for Transportation
Abstract: Depth estimation has been widely studied and serves as the fundamental step of 3D perception for robotics and autonomous driving. Though significant progress has been made in monocular depth estimation in the past decades, these attempts are mainly conducted on the KITTI benchmark with only front-view cameras, which ignores the correlations across surround-view cameras. In this paper, we propose an Adjacent-View Transformer for Supervised Surround-view Depth estimation (AVT-SSDepth) to jointly predict the depth maps across multiple surrounding cameras. Specifically, we employ a global-to-local feature extraction module that combines CNN with transformer layers for enriched representations. Further, an adjacent-view attention mechanism is proposed to enable intra-view and inter-view feature propagation. The former is achieved by the self-attention module within each view, while the latter is realized by the adjacent attention module, which computes the attention across multiple cameras to exchange multi-scale representations across surround-view feature maps. In addition, AVT-SSDepth has strong cross-dataset generalization. Extensive experiments show that our method achieves superior performance on both the DDAD and nuScenes datasets.
|
|
10:45-10:50, Paper TuAT12.4 | |
OpenFusion++: An Open-Vocabulary Real-Time Scene Understanding System |
|
Xiaofeng, Jin | Politecnico Di Milano |
Frosi, Matteo | Politecnico Di Milano |
Matteucci, Matteo | Politecnico Di Milano |
Keywords: RGB-D Perception, Semantic Scene Understanding, Range Sensing
Abstract: Real-time open-vocabulary scene understanding is essential for efficient 3D perception in applications such as vision-language navigation, embodied intelligence, and augmented reality. However, existing methods suffer from imprecise instance segmentation, static semantic updating, and limited handling of complex queries. To address these issues, we propose OpenFusion++, a TSDF-based real-time 3D semantic-geometric reconstruction system. Our approach refines 3D point clouds by fusing confidence maps from foundation models, dynamically updates global semantic labels via an adaptive cache based on instance regions, and adopts a dual-path encoding framework that integrates object attributes with environmental context for precise query responses. Experiments on the ICL, Replica, ScanNet, and ScanNet++ datasets demonstrate that OpenFusion++ significantly outperforms the baseline in both semantic accuracy and query responsiveness.
|
|
10:50-10:55, Paper TuAT12.5 | |
Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping |
|
Guo, Teng | Rutgers University |
Huang, Baichuan | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: RGB-D Perception, Grasping, Computer Vision for Automation
Abstract: Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and nonprehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transparent objects. On the other hand, state-of-the-art monocular depth estimation models (MDEMs) provide only affine-invariant depths up to an unknown scale and shift. Metric MDEMs achieve some successful zero-shot results on public datasets, but fail to generalize. We propose a novel framework, monocular one-shot metric-depth alignment, MOMA, to recover metric depth from a single RGB image, through a one-shot adaptation building on MDEM techniques. MOMA performs scale-rotation-shift alignments during camera calibration, guided by sparse ground-truth depth points, enabling accurate depth estimation without additional data collection or model retraining on the testing setup. MOMA supports fine-tuning the MDEM on transparent objects, demonstrating strong generalization capabilities. Real-world experiments on tabletop 2-finger grasping and suction-based bin-picking applications show MOMA achieves high success rates in diverse tasks, confirming its effectiveness.
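The scale/shift portion of such an alignment has a closed-form least-squares solution; the sketch below recovers scale and shift for affine-invariant depth predictions from sparse ground-truth points (the rotation component of MOMA's alignment is omitted, and all numbers are invented).

```python
# Closed-form scale/shift recovery for affine-invariant depth, assuming
# sparse metric ground-truth samples; a sketch, not MOMA itself.
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Solve min_{s,b} ||s * pred + b - gt||^2 by linear least squares."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return float(s), float(b)

rng = np.random.default_rng(0)
rel = rng.uniform(0.1, 1.0, size=30)   # relative depths at sparse pixels
metric = 2.5 * rel + 0.4               # hypothetical metric ground truth
s, b = align_scale_shift(rel, metric)
print(f"recovered scale = {s:.2f}, shift = {b:.2f}")  # ~2.50, ~0.40
```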
|
|
10:55-11:00, Paper TuAT12.6 | |
Learning-Based Keypoints Detection with Topological Order on Deformable Linear Objects from Incomplete Point Clouds |
|
Li, Can | Nankai University |
Liu, Jingyang | Nankai University |
Sun, Lei | Nankai University |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: Detection of deformable linear objects (DLOs) in three-dimensional space is essential for robotic manipulation of DLOs. However, their complex deformations and high degrees of freedom make perception highly susceptible to occlusions, noise, and missing data. To address these challenges, we propose a deep learning-based method that leverages the topological properties of DLOs to robustly detect keypoints from incomplete point clouds while preserving the topological order of the keypoints. Our approach initializes a sequence of keypoints that adheres to the topological structure of DLOs. Then, these ordered keypoints are refined through bidirectional sequence learning. Simulation results demonstrate that our method generates accurate, uniform, and smooth keypoint sequences under varying levels of occlusion. Compared to existing baselines, our approach achieves superior performance. Real-world experiments further validate the generalization capability of our method in unseen and challenging scenarios involving occlusion and self-occlusion while maintaining real-time performance.
|
|
11:00-11:05, Paper TuAT12.7 | |
Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images |
|
Ding, Laiyan | The Chinese University of Hong Kong, Shenzhen |
Jiang, Hualie | Insta360 Research |
Chen, Jiwei | The Chinese University of Hong Kong, Shenzhen |
Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Keywords: RGB-D Perception, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires ground-truth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed and scale-aware depth maps. Starting from an image-based self-supervised depth estimation pipeline, we add low-resolution depth as input, design a new depth consistency loss, propose a scale-recovery module, and finally obtain a large performance boost. Furthermore, since ToF signal sparsity varies in real-world applications, we upgrade SelfToF to SelfToF* with submanifold convolution and guided feature fusion. Consequently, SelfToF* maintains robust performance across varying sparsity levels in ToF data. Overall, our proposed method is both efficient and effective, as verified by extensive experiments on the NYU and ScanNet datasets. The code is available at https://github.com/denyingmxd/selftof.
|
|
TuAT13 |
311C |
Deep Learning for Visual Perception 1 |
Regular Session |
Chair: Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
|
10:30-10:35, Paper TuAT13.1 | |
Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation (I) |
|
Wang, An | The Chinese University of Hong Kong |
Islam, Mobarakol | University College London |
Xu, Mengya | National University of Singapore |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, AI-Based Methods
Abstract: Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field for alleviating model degradation at the deployment site. To preserve model performance across multiple testing domains, this work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase, where the higher the shift is, the harder it is to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain to the source domain in the frequency space during the curriculum-style training to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which is beneficial for the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance. Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show that our method is also beneficial in the domain-adaptive classification task with skin lesion datasets.
|
|
10:35-10:40, Paper TuAT13.2 | |
TCNet: A Temporally Consistent Network for Self-Supervised Monocular Depth Estimation |
|
Zhu, Ying | Peking University |
Liu, Hong | Peking University |
Wu, Jianbing | Peking University |
Liu, Mengyuan | Peking University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Learning
Abstract: Despite significant advances in self-supervised monocular depth estimation methods, achieving temporally consistent and accurate depth maps from frame sequences remains a formidable challenge. Existing approaches often estimate depth maps for individual frames in isolation, neglecting the rich geometric and temporal coherence present across frames. Consequently, this oversight leads to temporally inconsistent outputs, resulting in noticeable temporal flickering artifacts. In response, this paper presents TCNet, a Temporally Consistent Network for self-supervised monocular depth estimation. Specifically, we propose an Inter-frame Temporal Fusion (ITF) module to emphasize the influence of preceding images on the depth estimation of the current frame. A Temporal Consistency Loss (TCL) is proposed to leverage the temporal constraints between the depth maps of adjacent frames. Besides, TCNet can also be applied to both single-frame and multi-frame scenarios during inference. Experimental evaluations on the KITTI dataset demonstrate that our method surpasses state-of-the-art depth estimation methods in accuracy and temporal consistency. Our code will be made public.
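A hedged sketch of what a temporal-consistency penalty can look like, assuming PyTorch: an L1 discrepancy between the current depth map and the previous frame's depth warped into the current view. The paper's actual TCL formulation may differ, and the warping step is assumed to happen elsewhere.

```python
# Illustrative temporal-consistency penalty between adjacent depth maps;
# not the paper's TCL, and the geometric warp is assumed already applied.
import torch

def temporal_consistency_loss(depth_t: torch.Tensor,
                              depth_prev_warped: torch.Tensor) -> torch.Tensor:
    """L1 penalty between the current depth and the warped previous depth."""
    return (depth_t - depth_prev_warped).abs().mean()

d_t = torch.rand(1, 1, 192, 640)
d_prev = d_t + 0.01 * torch.randn_like(d_t)  # stand-in for a warped prior depth
print(temporal_consistency_loss(d_t, d_prev).item())
```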
|
|
10:40-10:45, Paper TuAT13.3 | |
DPSN: Dual Prior Knowledge Induced Tactile Paving and Obstacle Joint Segmentation Network |
|
Song, Youqi | East China Normal University |
Li, Wenqi | East China Normal University |
Zhang, Zhao | East China Normal University |
Wu, Yu | University of California, Davis |
Jin, Zilong | East China Normal University |
Wang, Changbo | East China Normal University |
He, Gaoqi | East China Normal University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Vision-Based Navigation
Abstract: Accurate semantic segmentation of both tactile paving and obstacles is crucial for the safe mobility of visually impaired individuals. However, existing methods face two major challenges: (i) discontinuous segmentation fragments; (ii) inaccurate obstacle recognition. To address challenge (i), we propose incorporating appearance priors of complete tactile pavings to prevent the model from directly learning irregular ground-truth masks. To tackle challenge (ii), we propose introducing cross-modal semantic priors to complement the semantic information of obstacles. We implement these strategies in the proposed Dual Prior knowledge induced tactile paving and obstacle joint Segmentation Network (DPSN). Based on a bilateral network architecture, DPSN merges obstacle category masks into tactile paving categories, constructing a complete tactile paving mask. Utilizing the complete mask, DPSN transfers appearance prior knowledge to detail features from boundary and structural perspectives. Concurrently, DPSN leverages the CLIP Text Encoder to guide visual feature decoding via attention mechanisms, transferring rich cross-modal semantic prior knowledge to the visual feature maps. Furthermore, we propose the TPO-Dataset, the first dataset for joint tactile paving and obstacle segmentation acquired from actual scenes. Experiments demonstrate that DPSN achieves state-of-the-art results on the TPO-Dataset, with relative gains of 27.16% in obstacle IoU and 30.53% in accuracy metrics compared to baseline methods. Notably, DPSN achieves real-time performance at 88.25 FPS at the maximum scale of 2048×512 resolution.
|
|
10:45-10:50, Paper TuAT13.4 | |
DCT-Diffusion: Depth Completion for Transparent Objects with Diffusion Denoising Approach |
|
Zhou, Zhenning | Shanghai Jiao Tong University |
Shen, Weiqing | Shanghai Jiao Tong University |
Sun, Han | Shanghai Jiao Tong University |
Wang, Yizhao | Shanghai Jiao Tong University |
Cao, Qixin | Shanghai Jiao Tong University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Deep Learning Methods
Abstract: Transparent objects are common in industrial automation and daily life. However, accurate visual perception of these objects remains challenging due to their reflective and refractive properties. Most previous studies fail to capture contextual information and typically rely on regression-based methods at the decoder stage, suffering from overfitting and unsatisfactory object details. To overcome these limitations, we present a novel depth completion framework for transparent objects with a diffusion denoising approach (DCT-Diffusion). First, we adopt a transformer-based encoder to globally learn the depth relationships from different parts of the input by modeling long-distance dependencies. Then, we propose to introduce the diffusion model to generate refined depth maps from a random depth distribution. Through iterative refinement, our model can progressively enhance depth map details and achieve fine-grained performance. Lastly, a conditioned fusion module is developed, which utilizes encoder features as visual conditions and fuses them with the denoising block at each step using augmented attention. Extensive comparative studies and cross-domain experiments prove that DCT-Diffusion outperforms previous methods and significantly improves robustness and generalization ability. Moreover, visualization results further illustrate that our method can generate depth maps with more complete geometry and clearer boundaries, achieving satisfactory results.
|
|
10:50-10:55, Paper TuAT13.5 | |
Stimulating Imagination: Towards General-Purpose "Something Something Placement" |
|
Wu, Jianyang | The University of Tokyo |
Gu, Jie | A4X, Rightly Robotics |
Ma, Xiaokang | Rightly Robotics |
Qiu, Fangzhou | Rightly Robotics |
Tang, Chu | Rightly.ai |
Chen, Jingmin | Rightly Robotics |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation
Abstract: General-purpose object placement is a fundamental capability of an intelligent generalist robot: being capable of rearranging objects following precise human instructions even in novel environments. This work is dedicated to achieving general-purpose object placement with "something something" instructions. Specifically, we break the entire process down into three parts, including object localization, goal imagination and robot control, and propose a method named SPORT. SPORT leverages a pre-trained large vision model for broad semantic reasoning about objects, and learns a diffusion-based pose estimator to ensure physically realistic results in 3D space. Only object types (movable or reference) are communicated between these two parts, which brings two benefits. One is that we can fully leverage the powerful ability of open-set object recognition and localization, since no specific fine-tuning is needed for the robotic scenario. The other is that the diffusion-based estimator only needs to "imagine" the object poses after placement, without requiring their semantic information. The training burden is thus greatly reduced and no massive training is required. The training data for goal pose estimation is collected in simulation and annotated using GPT-4. Experimental results demonstrate the effectiveness of our approach. SPORT can not only generate promising 3D goal poses for unseen simulated objects, but also be seamlessly applied to real-world settings.
|
|
10:55-11:00, Paper TuAT13.6 | |
DroneKey: Drone 3D Pose Estimation in Image Sequences Using Gated Key-Representation and Pose-Adaptive Learning |
|
Hwang, Seo-Bin | Chonnam National University |
Cho, Yeong-Jun | Chonnam National University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Deep Learning Methods
Abstract: Estimating the 3D pose of a drone is important for anti-drone systems, but existing methods struggle with the unique challenges of drone keypoint detection. Drone propellers serve as keypoints but are difficult to detect due to their high visual similarity and diversity of poses. To address these challenges, we propose DroneKey, a framework that combines a 2D keypoint detector and a 3D pose estimator specifically designed for drones. In the keypoint detection stage, we extract two key-representations (intermediate and compact) from each transformer encoder layer and optimally combine them using a gated sum. We also introduce a pose-adaptive Mahalanobis distance in the loss function to ensure stable keypoint predictions across extreme poses. We built new datasets of drone 2D keypoints and 3D pose to train and evaluate our method, which have been publicly released. Experiments show that our method achieves an AP of 99.68% (OKS) in keypoint detection, outperforming existing methods. Ablation studies confirm that the pose-adaptive Mahalanobis loss function improves keypoint prediction stability and accuracy. Additionally, improvements in the encoder design enable real-time processing at 44 FPS. For 3D pose estimation, our method achieved an MAE-angle of 10.62°, an RMSE of 0.221m, and an MAE-absolute of 0.076m, demonstrating high accuracy and reliability. The code and dataset are available at https://github.com/kkanuseobin/DroneKey.
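To illustrate the pose-adaptive Mahalanobis idea, the sketch below scores keypoint errors under an anisotropic covariance so that error along an uncertain direction is penalised less; how DroneKey derives the covariance from the pose is not reproduced here, and the matrix and coordinates are purely illustrative.

```python
# Mahalanobis keypoint loss with an anisotropic covariance; a sketch of
# the pose-adaptive idea, not the DroneKey loss itself.
import numpy as np

def mahalanobis_loss(pred: np.ndarray, gt: np.ndarray, cov: np.ndarray) -> float:
    """Mean Mahalanobis distance between predicted and true 2D keypoints."""
    inv = np.linalg.inv(cov)
    diff = pred - gt                       # shape (K, 2)
    return float(np.mean(np.einsum("ki,ij,kj->k", diff, inv, diff)))

cov = np.array([[4.0, 0.0],                # more tolerance along x, e.g. for
                [0.0, 1.0]])               # a near-edge-on propeller (invented)
gt = np.array([[10.0, 10.0], [20.0, 5.0]])
pred = gt + np.array([[2.0, 0.0], [0.0, 2.0]])
print(mahalanobis_loss(pred, gt, cov))     # x-error counts less than y-error
```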
|
|
11:00-11:05, Paper TuAT13.7 | |
L2COcc: Lightweight Camera-Centric Semantic Scene Completion Via Distillation of LiDAR Model |
|
Wang, Ruoyu | Zhejiang University |
Ma, Yukai | Zhejiang University |
YaoYi, Yaoyi | Zhejiang University |
Tao, Sheng | BigDataCloudAI |
Li, Haoang | Hong Kong University of Science and Technology (Guangzhou) |
Zhu, Zongzhi | Zhejiang Guoli Xin'an Technology Co., Ltd |
Liu, Yong | Zhejiang University |
Zuo, Xingxing | MBZUAI |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Semantic Scene Completion (SSC) constitutes a pivotal element in autonomous driving perception systems, tasked with inferring the 3D semantic occupancy of a scene from sensory data. To improve accuracy, prior research has implemented various computationally demanding and memory-intensive 3D operations, imposing significant computational requirements on the platform during training and testing. This paper proposes L2COcc, a lightweight camera-centric SSC framework that also accommodates LiDAR inputs. With our proposed efficient voxel transformer (EVT) and three types of cross-modal knowledge modules (FSD, TPVD, PAD), our method substantially reduces the computational burden while maintaining high accuracy. The experimental evaluations demonstrate that our proposed method surpasses the current state-of-the-art vision-based SSC methods in accuracy on both the SemanticKITTI and SSCBench-KITTI-360 benchmarks. Additionally, our method is more lightweight, exhibiting a reduction in both memory consumption and inference time of over 25%. Code is available at the project page: https://studyingfufu.github.io/L2COcc/.
|
|
11:05-11:10, Paper TuAT13.8 | |
Self-Distilled Stereo Matching: Real-Time Domain Generalization for Robotic Depth Perception |
|
Zhang, Xuxin | Sun Yat-Sen University |
Li, Kunhong | Sun Yat-Sen University |
Zhang, Yongjian | Sun Yat-Sen University |
Song, Zhuo | University of Chinese Academy of Sciences |
Jiang, Runqing | Sun Yat-Sen University |
Zhang, Ye | Sun Yat-Sen University |
Guo, Yulan | Sun Yat-Sen University |
Keywords: Deep Learning for Visual Perception, Machine Learning for Robot Control
Abstract: While human vision inherently achieves robust cross-domain depth estimation through binocular coordination, robotic systems employing stereo matching still confront significant challenges in maintaining robustness across domains when performing real-time environmental depth perception. Furthermore, most stereo matching methods struggle with challenging regions such as object boundaries and non-overlapping areas on the left side of the left image, resulting in disparity maps that are relatively indistinct and lacking fine details. In this paper, we propose Learning More in Challenging Areas (LMC) to alleviate this problem, which enhances the domain generalization of the model through targeted training on challenging regions. LMC is a simple yet effective data-driven training framework primarily based on self-distillation. Specifically, 1) we pre-train models on a high-frequency dataset to improve perception of object boundaries; 2) we develop a self-distillation training strategy to benefit learning in non-overlapping areas on the left side of the left image; 3) we design an adaptive difficult-area mask to balance the loss weight on other undefined challenging regions. Under our proposed training framework, GwcNet achieves 33% and 23% performance improvements on the autonomous driving benchmarks KITTI 2012 and KITTI 2015, respectively, while preserving real-time inference efficiency without additional computational overhead.
|
|
TuAT14 |
311D |
Deep Learning Methods 1 |
Regular Session |
|
10:30-10:35, Paper TuAT14.1 | |
YO-CSA-T: A Real-Time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention |
|
Lai, Yuan | Shandong University |
Zhiwei, Shi | Shandong University |
Zhu, Chengxi | Shandong University |
Keywords: Deep Learning Methods, Visual Tracking, AI-Based Methods
Abstract: A badminton rally robot for human-robot competition requires the 3D trajectory of the shuttlecock in real time and with high accuracy. However, the fast flight speed of the shuttlecock, various visual effects, and its tendency to blend with environmental elements, such as court lines and lighting, present challenges for rapid and accurate 2D detection. In this paper, we first propose the YO-CSA detection network, which optimizes and reconfigures the YOLOv8s model's backbone, neck, and head by incorporating contextual and spatial attention mechanisms to enhance the model's ability to extract and integrate both global and local features. Next, we integrate three major sub-tasks—detection, prediction, and compensation—into a real-time 3D shuttlecock trajectory detection system. Specifically, our system maps the 2D coordinate sequence extracted by YO-CSA into 3D space using stereo vision, then predicts the future 3D coordinates based on historical information, and re-projects them onto the left and right views to update the position constraints for 2D detection. Additionally, our system includes a compensation module to fill in missing intermediate frames, ensuring a more complete trajectory. We conduct extensive experiments on our own dataset to evaluate both YO-CSA's performance and system effectiveness. Experimental results show that YO-CSA achieves a high accuracy of 90.43% mAP@0.75, surpassing both YOLOv8s and YOLO11s. Our system performs excellently, maintaining a speed of over 130 fps across 12 test sequences.
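The 2D-to-3D mapping step amounts to rectified stereo triangulation; a minimal sketch with invented intrinsics follows (not the system's calibration).

```python
# Rectified stereo triangulation of a matched left/right detection; the
# camera parameters below are invented for illustration.
import numpy as np

def triangulate(u_l: float, u_r: float, v: float,
                fx: float, fy: float, cx: float, cy: float,
                baseline: float) -> np.ndarray:
    """Return (X, Y, Z) in the left camera frame from a rectified pair;
    assumes the match lies on the same image row (v_l == v_r == v)."""
    disparity = u_l - u_r
    z = fx * baseline / disparity
    x = (u_l - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

print(triangulate(660.0, 620.0, 340.0,
                  fx=900.0, fy=900.0, cx=640.0, cy=360.0, baseline=0.12))
```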
|
|
10:35-10:40, Paper TuAT14.2 | |
RCGNet: RGB-Based Category-Level 6D Object Pose Estimation with Geometric Guidance |
|
Yu, Sheng | Beijing Institute of Technology |
Zhai, Di-Hua | Beijing Institute of Technology |
Xia, Yuanqing | Beijing Institute of Technology |
Keywords: Deep Learning Methods, Computer Vision for Automation, Computer Vision for Manufacturing
Abstract: While most current RGB-D-based category-level object pose estimation methods achieve strong performance, they face significant challenges in scenes lacking depth information. In this paper, we propose a novel category-level object pose estimation approach that relies solely on RGB images. This method enables accurate pose estimation in real-world scenarios without the need for depth data. Specifically, we design a transformer-based neural network for category-level object pose estimation, where the transformer is employed to predict and fuse the geometric features of the target object. To ensure that these predicted geometric features faithfully capture the object's geometry, we introduce a geometric feature-guided algorithm, which enhances the network's ability to effectively represent the object's geometric information. Finally, we utilize the RANSAC-PnP algorithm to compute the object's pose, addressing the challenges associated with variable object scales in pose estimation. Experimental results on benchmark datasets demonstrate that our approach is not only highly efficient but also achieves superior accuracy compared to previous RGB-based methods. These promising results offer a new perspective for advancing category-level object pose estimation using RGB images.
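The final pose-recovery step named above maps directly onto OpenCV's RANSAC-PnP routine. A hedged usage sketch, where the correspondences (obj_pts, img_pts) and the intrinsics K are assumed inputs that would come from the network's predicted geometric features:

import cv2
import numpy as np

def pose_from_correspondences(obj_pts, img_pts, K):
    # obj_pts: (N, 3) model-frame points; img_pts: (N, 2) pixel coordinates.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(obj_pts), np.float32(img_pts), K, distCoeffs=None,
        iterationsCount=100, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP failed to find a consistent pose")
    R, _ = cv2.Rodrigues(rvec)                      # rotation vector -> matrix
    return R, tvec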
|
|
10:40-10:45, Paper TuAT14.3 | |
Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance |
|
Yao, Chenhao | Shenzhen Technology University |
Yuan, Zike | Shenzhen University |
Liu, Xiaoxu | Shenzhen Technology University |
Zhu, Chi | Shenzhen Technology University |
Keywords: Deep Learning Methods, Multi-Robot Systems, Big Data in Robotics and Automation
Abstract: Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. Among the methodologies employed in MAS, Multi-Agent Reinforcement Learning (MARL) stands out as one of the most efficacious algorithms. However, when confronted with the complex objective of Formation Control with Collision Avoidance (FCCA), a key challenge arises: designing an effective reward function that facilitates swift convergence of the policy network to an optimal solution. In this paper, we introduce a novel framework that aims to overcome this challenge. By providing large language models (LLMs) with the task priorities and the observable information available to each agent, our framework generates reward functions that can be dynamically adjusted online based on evaluation outcomes, employing more advanced evaluation metrics rather than the rewards themselves. This mechanism enables the MAS to simultaneously achieve formation control and obstacle avoidance in dynamic environments with enhanced efficiency, requiring fewer iterations to reach superior performance levels. Our empirical studies, conducted in both simulation and real-world settings, validate the practicality and effectiveness of our proposed approach.
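The generate-evaluate-refine loop this describes might look roughly like the following Python skeleton. The llm and evaluate callables, the prompt wording, and the reward entry point are hypothetical stand-ins, not the paper's interface:

def compile_reward(code_str):
    # Hypothetical helper: exec generated code (sandbox it in practice)
    # and pull out its `reward` function.
    ns = {}
    exec(code_str, ns)
    return ns["reward"]

def refine_reward(llm, evaluate, task_priorities, obs_spec, rounds=5):
    prompt = ("Write a Python function `reward(state)` for formation control "
              f"with collision avoidance.\nTask priorities: {task_priorities}\n"
              f"Per-agent observations: {obs_spec}")
    best_fn, best_score = None, float("-inf")
    for _ in range(rounds):
        reward_fn = compile_reward(llm(prompt))
        score = evaluate(reward_fn)                 # advanced metrics, not raw return
        if score > best_score:
            best_fn, best_score = reward_fn, score
        prompt += f"\nPrevious attempt scored {score:.3f}; improve it."
    return best_fn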
|
|
10:45-10:50, Paper TuAT14.4 | |
NaviDiffuser: Tackling Multi-Objective Robot Navigation by Weight Range Guided Diffusion Model |
|
Zhang, Xuyang | University of Science and Technology of China |
Feng, Ziyang | University of Science and Technology of China |
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Peng, Jie | University of Science and Technology of China |
Li, Haoyu | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Deep Learning Methods, Collision Avoidance, Learning from Experience
Abstract: The data-driven paradigm has shown great potential in solving many decision-making tasks. In the robot navigation realm, it has also sparked a new trend. People believe powerful data-driven methods can learn efficient and general navigation policies from a vast offline dataset. However, robot navigation tasks differ from common planning tasks and present unique challenges. They often involve multi-objective optimization to meet arbitrary and ever-changing human preferences. They must also overcome short-sightedness to obtain globally optimal performance. Furthermore, high planning frequency is needed to address real-time demands. These factors obstruct the application of data-driven methods in robot navigation. To address these challenges, we integrate one of the most powerful data-driven methods, the diffusion model, into robot navigation. Our proposed approach, NaviDiffuser, utilizes a novel classification label to guide the diffusion model in capturing the complex connections between navigation and human preferences. Its Transformer network backbone outputs action sequences to alleviate short-sightedness. It also employs dedicated distillation techniques to boost planning speed and quality. We conduct experiments in both simulated and real-world scenarios to evaluate our approach. In these experiments, NaviDiffuser not only demonstrates an extremely high arrival rate but also adjusts its navigation policy to align with different human preferences.
|
|
10:50-10:55, Paper TuAT14.5 | |
Annotation-Free Curb Detection Leveraging Altitude Difference Image |
|
Ma, Fulong | The Hong Kong University of Science and Technology |
Hou, Peng | Tsinghua University |
Liu, Yuxuan | Hong Kong University of Science and Technology |
Liu, Yang | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Deep Learning Methods, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Road curbs are considered one of the most crucial and ubiquitous traffic features, essential for ensuring the safety of autonomous vehicles. Current methods for detecting curbs primarily rely on camera imagery or LiDAR point clouds. Image-based methods are vulnerable to fluctuations in lighting conditions and exhibit poor robustness, while methods based on point clouds circumvent the issues associated with lighting variations. However, significant processing delays are typically encountered due to the large number of 3D points contained in each frame of point cloud data. Furthermore, the inherently unstructured characteristics of point clouds pose challenges for integrating the latest deep learning advancements into point cloud applications. To address these issues, this work proposes an annotation-free curb detection method leveraging the Altitude Difference Image (ADI) (as shown in Fig. 1), which effectively mitigates the aforementioned challenges. Given that methods based on deep learning generally demand extensive, manually annotated datasets, which are both expensive and labor-intensive to create, we present an Automatic Curb Annotator (ACA) module. This module utilizes a deterministic curb detection algorithm to automatically generate a vast quantity of training data. Consequently, it facilitates the training of the curb detection model without necessitating any manual annotation. Finally, by incorporating a post-processing module, we achieve state-of-the-art results on the KITTI 3D curb dataset with considerably reduced processing delays compared to existing methods, which underscores the effectiveness of our approach in curb detection tasks.
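While the abstract does not spell out the ADI construction, one plausible rasterization is to grid the point cloud and store the max-min height per cell, so curbs appear as small vertical steps. A minimal numpy sketch; grid extents and resolution are illustrative assumptions:

import numpy as np

def altitude_difference_image(points, res=0.1, x_rng=(0, 40), y_rng=(-20, 20)):
    # points: (N, 3) LiDAR points in the vehicle frame (x forward, z up).
    H = int((x_rng[1] - x_rng[0]) / res)
    W = int((y_rng[1] - y_rng[0]) / res)
    zmin = np.full((H, W), np.inf)
    zmax = np.full((H, W), -np.inf)
    u = ((points[:, 0] - x_rng[0]) / res).astype(int)
    v = ((points[:, 1] - y_rng[0]) / res).astype(int)
    ok = (u >= 0) & (u < H) & (v >= 0) & (v < W)
    u, v, z = u[ok], v[ok], points[ok, 2]
    np.minimum.at(zmin, (u, v), z)                  # per-cell lowest altitude
    np.maximum.at(zmax, (u, v), z)                  # per-cell highest altitude
    return np.where(np.isfinite(zmin), zmax - zmin, 0.0)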
|
|
10:55-11:00, Paper TuAT14.6 | |
R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model |
|
Zheng, Boyuan | Tongji University |
Lu, Shouyi | Tongji University |
Huang, Renbo | Tongji University |
Huang, Minqing | Tongji University |
Lu, Fan | Tongji University |
Tian, Wei | Tongji University |
Zhuo, Guirong | Tongji University, Shanghai |
Xiong, Lu | Tongji University |
Keywords: Deep Learning Methods, Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: We introduce R2LDM, an innovative approach for generating dense and accurate 4D radar point clouds, guided by corresponding LiDAR point clouds. Instead of utilizing range images or bird’s eye view (BEV) images, we represent both LiDAR and 4D radar point clouds using voxel features, which more effectively capture 3D shape information. Subsequently, we propose the Latent Voxel Diffusion Model (LVDM), which performs the diffusion process in the latent space. Additionally, a novel Latent Point Cloud Reconstruction (LPCR) module is utilized to reconstruct point clouds from high-dimensional latent voxel features. As a result, R2LDM effectively generates LiDAR-like point clouds from paired raw radar data. We evaluate our approach on two different datasets, and the experimental results demonstrate that our model achieves 6- to 10-fold densification of radar point clouds, outperforming state-of-the-art baselines in 4D radar point cloud super-resolution. Furthermore, the enhanced radar point clouds generated by our method significantly improve downstream tasks, achieving up to 31.7% improvement in point cloud registration recall rate and 24.9% improvement in object detection accuracy.
|
|
11:00-11:05, Paper TuAT14.7 | |
DGETP: Dynamic Graph Attention Network for Embodied Task Planning |
|
Sun, Pengfei | North China University of Technology |
Wang, Guiling | North China University of Technology |
Zhang, Xinli | North China University of Technology |
Yu, Jian | Auckland University of Technology |
Keywords: Deep Learning Methods, Task Planning, AI-Based Methods
Abstract: With the development of embodied intelligence, many studies have made progress in task planning by integrating scene graphs and GNNs. However, most methods still struggle to fully capture the sequential relationships between agent actions and the environment, making it difficult to handle the dynamic changes and inherent complexity of embodied tasks. This paper proposes a Dynamic Graph Attention Network for Embodied Task Planning (DGETP) to dynamically process scene graph sequences and the robot's perception of the environment. In DGETP, we design a Hierarchical Dynamic Graph Attention Network (H-DGAT) that models dynamically evolving features through both structural and temporal attention mechanisms. A Dual-branch Action-Object Predictor (DAP) is proposed within DGETP, which introduces sequences of previous actions and objects to efficiently aggregate historical information. DAP captures temporal dependencies between past and future actions through explicit sequence modeling, reduces prediction complexity through a decoupled dual-branch architecture for action and object prediction, and correlates the two branches through target feature fusion. Experiments show that DGETP improves task accuracy by more than 30%, exceeding baselines by more than 15% in unseen scenes. In complex scenarios, DGETP exhibits strong generalization capability. Finally, results in the simulation environment indicate that DGETP achieves higher goal completion than most baselines.
|
|
11:05-11:10, Paper TuAT14.8 | |
Evidential Uncertainty Estimation for Multi-Modal Trajectory Prediction |
|
Marvi, Mohammad Sajad | University of Freiburg, Mercedes Benz AG |
Rist, Christoph | Daimler |
Schmidt, Julian | Mercedes-Benz AG |
Jordan, Julian | Mercedes-Benz AG |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, AI-Based Methods, Probabilistic Inference
Abstract: Accurate trajectory prediction is crucial for autonomous driving, yet uncertainty in agent behavior and perception noise makes it inherently challenging. While multi-modal trajectory prediction models generate multiple plausible future paths with associated probabilities, effectively quantifying uncertainty remains an open problem. In this work, we propose a novel multi-modal trajectory prediction approach based on evidential deep learning that estimates both positional and mode probability uncertainty in real time. Our approach leverages a Normal Inverse Gamma distribution for positional uncertainty and a Dirichlet distribution for mode uncertainty. Unlike sampling-based methods, it infers both types of uncertainty in a single forward pass, significantly improving efficiency. Additionally, we experimented with uncertainty-driven importance sampling to improve training efficiency by prioritizing underrepresented high-uncertainty samples over redundant ones. We perform extensive evaluations of our method on the Argoverse 1 and Argoverse 2 datasets, demonstrating that it provides reliable uncertainty estimates while maintaining high trajectory prediction accuracy.
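The two distributions named above admit closed-form uncertainty estimates under the standard evidential deep-learning formulations; a hedged Python sketch (the paper's exact heads may differ):

import numpy as np

def nig_uncertainties(gamma, nu, alpha, beta):
    # Normal-Inverse-Gamma head (requires alpha > 1):
    # predicted mean, aleatoric E[sigma^2], epistemic Var[mu].
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic

def dirichlet_mode_uncertainty(alphas):
    # Dirichlet head over K modes: mode probabilities and vacuity K / sum(alpha).
    alphas = np.asarray(alphas, dtype=float)
    S = alphas.sum()
    return alphas / S, len(alphas) / S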
|
|
TuAT15 |
206 |
Swarm Robotics 1 |
Regular Session |
|
10:30-10:35, Paper TuAT15.1 | |
Frontier Shepherding: A Bio-Inspired Multi-Robot Framework for Large-Scale Exploration |
|
Lewis, John | Instituto Superior Técnico, Lisboa |
Basiri, Meysam | Instituto Superior Técnico |
Lima, Pedro U. | Instituto Superior Técnico - Institute for Systems and Robotics |
Keywords: Multi-Robot Systems, Behavior-Based Systems, Swarm Robotics
Abstract: Efficient exploration of large-scale environments remains a critical challenge in robotics, with applications ranging from environmental monitoring to search and rescue operations. This article proposes Frontier Shepherding (FroShe), a bio-inspired multi-robot framework for large-scale exploration. The framework heuristically models frontier exploration based on the shepherding behavior of herding dogs, where frontiers are treated as a swarm of sheep reacting to robots modeled as shepherding dogs. FroShe is robust across varying environment sizes and obstacle densities, requiring minimal parameter tuning for deployment across multiple agents. Simulation results demonstrate that the proposed method performs consistently, regardless of environment complexity, and outperforms state-of-the-art exploration strategies by an average of 20% with three UAVs. The approach was further validated in real-world experiments using single- and dual-drone deployments in a forest-like environment.
|
|
10:35-10:40, Paper TuAT15.2 | |
Multimodal Upstream Motion of Magnetically Controlled Micro/Nano Robots in High-Viscosity Fluids |
|
Li, Chan | Beihang University |
Zeng, Zijin | Beihang University |
Fan, Tianyi | Beihang University |
Wang, Shengyuan | Beihang University |
Wang, Chutian | Beihang University |
Sun, Hongyan | Beihang University |
Huang, Shunxiao | Beihang University |
Niu, Wenyan | Beihang University |
Guo, Yingjian | Beihang University |
Feng, Lin | Beihang University |
Keywords: Micro/Nano Robots, Swarm Robotics, Field Robots
Abstract: The efficacy of targeted cancer drug therapy is significantly compromised by imprecise drug delivery mechanisms. Micro/nano robots (MNRs), characterized by their controllable motion, present a promising solution to this challenge. However, the non-Newtonian nature of blood, with its high viscosity and interference from blood cells, poses substantial limitations on the upstream efficiency of MNRs. This paper discusses, for the first time, the effects of blood viscosity and blood-cell interference on the motion of MNRs, investigating their upstream motion capabilities in blood through comprehensive theoretical modeling, simulation, and experimental validation. A dynamic model of MNR motion was developed, and the velocity formula for MNRs in non-Newtonian fluid was derived. Experiments were conducted using different magnetic fields in pure water, high-viscosity simulated blood, and diluted blood. Results indicated that under a gradient magnetic field, the upstream velocities of MNRs in pure water, simulated blood, and diluted blood were 45.0, 14.4, and 11.1 mm/s, respectively. Under a rotating magnetic field, the velocities of vortex swarms were 825, 240, and 145 µm/s, respectively. Increased fluid viscosity reduced MNR velocity by 70%, while blood cells caused an additional 10% reduction. This research establishes a theoretical and experimental framework for the upstream motion of MNRs against blood flow, enhancing their potential in targeted drug delivery and broader biomedical applications.
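As a hedged first-order illustration of the velocity balance being modeled (not the paper's derivation, which treats the non-Newtonian rheology explicitly), a gradient-field pull balanced against Stokes drag with an effective viscosity gives the upstream speed:

% LaTeX sketch; \eta_{eff} stands in for the non-Newtonian rheology.
F_{\mathrm{mag}} = \nabla\,(\mathbf{m}\cdot\mathbf{B}), \qquad
F_{\mathrm{drag}} = 6\pi\,\eta_{\mathrm{eff}}\, r\,(v + v_{\mathrm{flow}}), \qquad
v = \frac{\lVert \nabla(\mathbf{m}\cdot\mathbf{B}) \rVert}{6\pi\,\eta_{\mathrm{eff}}\, r} - v_{\mathrm{flow}},

which makes the reported trends plausible: raising the effective viscosity or the counter-flow speed directly reduces the achievable upstream speed v.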
|
|
10:40-10:45, Paper TuAT15.3 | |
A Spatiotemporal Downwash Modeling for Agile Close-Proximity Multirotor Flight |
|
Kharitenko, Pavel | Technical University of Munich |
Fan, Yicheng | ShanghaiTech University |
Liu, Xiaopei | SHANGHAITECH UNIVERSITY |
Wang, Yang | Shanghaitech University |
Keywords: Swarm Robotics, Multi-Robot Systems, Model Learning for Control
Abstract: Accurate aerodynamic interaction modeling in multi-drone tasks is crucial for enhancing system stability and efficiency, especially when facing major disturbances from downwash wake effects. Conventional data-driven and empirical models mainly address simplified cases where one drone hovers or all vehicles have low absolute and relative velocities (≤ 0.5 m/s), and rely solely on relative states. In this study, we use high-fidelity Computational Fluid Dynamics (CFD) simulations to explore quadrotor interactions at higher speeds (0.5–4.0 m/s). We find that as the absolute velocities of the UAVs rise, downwash effects change significantly. To account for these discrepancies, we present a data-driven model considering both the absolute and relative properties of the downwash problem. We propose a geometric deep neural network predictor and compare its performance with existing data-driven and empirical models. Validations on two quadrotor settings show that our model gives more reliable predictions in tough scenarios and trains well without rigorous fine-tuning. Finally, we combine our predictor with a nonlinear feedback controller to enhance flight control under downwash disturbances. However, we encounter limitations during trajectory tracking at our speed ranges, such as delays and velocity loss. Despite these challenges, our encoding and prediction method proves to be a promising step toward addressing downwash effects at higher speeds. We release our dataset, method, and re-implementations at: https://github.com/pavelkharitenko/flare-dw
|
|
10:45-10:50, Paper TuAT15.4 | |
Modeling Deception in Multi-Robot Target-Attacker-Defender Game Via Deep Reinforcement Learning |
|
Gou, Fandi | Shanghai Jiao Tong University |
Zhao, Chenyu | Shanghai Jiao Tong University |
Du, Haikuo | Shanghai Jiao Tong University |
Cai, Yunze | Shanghai Jiao Tong University |
Keywords: Multi-Robot Systems, Reinforcement Learning, Optimization and Optimal Control
Abstract: Deception is a crucial strategy in adversarial scenarios, yet its application in multi-agent confrontations remains understudied. This paper investigates deception in a multi-robot Target-Attacker-Defender (MR-TAD) game, where Attackers aim to capture Targets while evading Defenders. To model deception effectively, we propose a hierarchical decision-making framework that integrates multi-agent reinforcement learning (MARL) for high-level deceptive strategies and optimal control for low-level motion control. Furthermore, we introduce a novel composite deception-oriented reward function, which combines hitting rewards, belief switch rewards, and position advantage rewards to facilitate the training of deceptive behaviors. Simulation results across varying numbers of robots demonstrate that incorporating deception significantly increases the success rate of Attackers, with an average improvement of over 70% compared to non-deceptive strategies. Additionally, real-world experiments with omnidirectional mobile robots further confirm the effectiveness of the proposed method. This study establishes a generalizable framework for modeling deception in multi-agent systems, with potential applications in various multi-agent scenarios.
|
|
10:50-10:55, Paper TuAT15.5 | |
MRS-CWC: A Weakly Constrained Multi-Robot System with Controllable Constraint Stiffness for Mobility and Navigation in Unknown 3D Rough Environments |
|
Xiao, Runze | The University of Tokyo |
Wang, Yongdong | The University of Tokyo |
Tsunoda, Yusuke | University of Hyogo |
Osuka, Koichi | Osaka Institute of Technology |
Asama, Hajime | The University of Tokyo |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Flexible Robotics
Abstract: Navigating unknown three-dimensional (3D) rugged environments is challenging for multi-robot systems. Traditional discrete systems struggle with rough terrain due to limited individual mobility, while modular systems—where rigid, controllable constraints link robot units—improve traversal but suffer from high control complexity and reduced flexibility. To address these limitations, we propose the Multi-Robot System with Controllable Weak Constraints (MRS-CWC), where robot units are connected by constraints with dynamically adjustable stiffness. This adaptive mechanism softens or stiffens in real time during environmental interactions, ensuring a balance between flexibility and mobility. We formulate the system’s dynamics and control model and evaluate MRS-CWC against six baseline methods and an ablation variant in 100 benchmark simulations. Results show that MRS-CWC achieves the highest navigation completion rate and ranks second in success rate, efficiency, and energy cost in the highly rugged terrain group, outperforming all baseline methods without relying on environmental modeling, path planning, or complex control. Even where MRS-CWC ranks second, its performance is only slightly behind a more complex ablation variant with environmental modeling and path planning. Finally, we develop a physical prototype and validate its feasibility in a constructed rugged environment. For videos, simulation benchmarks, and code, please visit the project website https://wyd0817.github.io/project-mrs-cwc/.
|
|
10:55-11:00, Paper TuAT15.6 | |
A Two-Stage Swarm Planning Framework for Efficient Multi-Drone Waypoint Traversal |
|
Cui, Kailun | Harbin Institute of Technology |
He, Fenghua | Harbin Institute of Technology |
Hao, Ning | Harbin Institute of Technology |
Hu, Hao | Harbin Institute of Technology
Keywords: Motion and Path Planning, Swarm Robotics, Multi-Robot Systems
Abstract: Multi-drone waypoint traversal has significant potential for aerial robot swarms in various applications. However, it still faces challenges including low time efficiency, susceptibility to local minima, poor resilience to external disturbances, high computational complexity, and high communication burden. To address these issues, we propose a two-stage swarm planning framework that integrates an offline global trajectory generator and an online distributed local trajectory planner. This approach not only ensures time-optimality but also enhances resistance to external disturbances. Specifically, a complementary progress constraint (CPC)-based global trajectory planning method is first presented to generate globally optimal reference trajectories. Then, by taking these trajectories as global guidance, a local planner is designed to guarantee collision-free traversal. In the local planner, we present a distributed local re-planning algorithm by embedding positional constraints constructed from Voronoi diagrams into the model predictive contouring control (MPCC). The drones only exchange their position information, significantly reducing the communication load. Additionally, the Voronoi-based spatial constraints allow the swarm to eliminate the collision risk caused by asynchronous communication. To reduce onboard computational resource requirements, the local planner adopts the real-time iteration (RTI) technique, executing the optimization only once per control cycle. Both simulation and real-world experiments demonstrate that our approach outperforms state-of-the-art methods in terms of waypoint tracking accuracy, safety, and global time optimality.
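The Voronoi-based positional constraint can be pictured as one separating half-plane per neighbor, in the spirit of buffered Voronoi cells. A minimal numpy sketch; the exact buffering and reference frames are assumptions:

import numpy as np

def voronoi_halfplane(p_i, p_j, r_safe):
    # Returns (a, b) such that drone i's position p must satisfy a @ p <= b,
    # i.e. stay on its own side of the i-j bisector, buffered by r_safe.
    d = p_j - p_i
    a = d / np.linalg.norm(d)
    b = a @ (0.5 * (p_i + p_j)) - r_safe
    return a, b

Inside the MPCC, every predicted position of drone i along the horizon would be subject to one such linear constraint per neighbor; because each drone stays in its own buffered cell, safety can hold even when neighbors' position updates arrive asynchronously.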
|
|
11:00-11:05, Paper TuAT15.7 | |
Collective Behavior Clone with Visual Attention Via Neural Interaction Graph Prediction |
|
Li, Kai | Zhejiang University, Westlake University |
Ma, Zhao | Westlake University |
Li, Liang | Max-Planck Institute of Animal Behavior |
Zhao, Shiyu | Westlake University |
Keywords: Swarm Robotics, Agent-Based Systems
Abstract: In this paper, we propose a framework, collective behavioral cloning (CBC), to learn the underlying interaction mechanism and control policy of a swarm system. Given the trajectory data of a swarm system, we propose a graph variational autoencoder (GVAE) to learn the local interaction graph. Based on the interaction graph and swarm trajectory, we use behavioral cloning to learn the control policy of the swarm system. To demonstrate the practicality of CBC, we deploy it on a real-world decentralized vision-based robot swarm system. A visual attention network is trained based on the learned interaction graph for online neighbor selection. Experimental results show that our method outperforms previous approaches in predicting both the interaction graph and swarm actions with higher accuracy. This work offers a promising approach for understanding interaction mechanisms and swarm dynamics in future swarm robotics research. Code and data are available.
|
|
TuAT16 |
207 |
Human-Robot Interaction 1 |
Regular Session |
|
10:30-10:35, Paper TuAT16.1 | |
TagGuideBot: Enhancing Robot Intelligence with Object Tags and VLMs |
|
Chen, Jiayi | Shenzhen University |
He, Ying | Shenzhen University |
Yu, Fei | Guangming Lab |
Keywords: Multi-Modal Perception for HRI, Imitation Learning, Object Detection, Segmentation and Categorization
Abstract: This research aims to enhance the interaction between humans and robots, especially in environments where multiple objects of the same type or semantic ambiguities exist. Traditional command-based interactions typically require users to provide precise descriptions, which often poses a significant challenge for users. To address this issue, we propose a framework named TagGuideBot, which leverages Visual Language Models (VLMs) and utilizes object markers to help locate and identify objects in the environment. By integrating positional point prompts of the target objects with robot motion planning models, we aim to achieve a more accurate understanding and execution of complex commands, thus improving the efficiency and naturalness of interactions. Experimental results demonstrate that TagGuideBot effectively addresses the challenges posed by complex commands and environmental complexities, achieving an accuracy of 66.3% on user instructions extended beyond the training set, providing solid support for further optimization of human-robot interaction.
|
|
10:35-10:40, Paper TuAT16.2 | |
A Multimodal Neural Network for Recognizing Subjective Self-Disclosure towards Social Robots |
|
Powell, Henry | Amazon |
Laban, Guy | University of Cambridge |
Cross, Emily S | ETH Zurich |
Keywords: Multi-Modal Perception for HRI, Emotional Robotics, Social HRI
Abstract: Subjective self-disclosure is an important feature of human social interaction. While much has been done in the social and behavioural literature to characterise the features and consequences of subjective self-disclosure, little work has been done thus far to develop computational systems that are able to accurately model it. Even less work has been done that attempts to model specifically how human interactants self-disclose with robotic partners. This gap is becoming more pressing as we require social robots to work alongside and establish relationships with humans in various social settings. In this paper, we develop a custom multimodal attention network based on models from the emotion recognition literature, train this model on a large self-collected self-disclosure video corpus, and construct a new loss function, the scale-preserving cross entropy loss, which improves upon both classification and regression formulations of this problem. Our results show that the best performing model, trained with our novel loss function, achieves an F1 score of 0.83, an improvement of 0.48 over the best baseline model. This result makes significant headway toward the aim of allowing social robots to pick up on an interaction partner's self-disclosures, an ability that will be essential in social robots with social cognition.
|
|
10:40-10:45, Paper TuAT16.3 | |
A Coarse-To-Fine Approach to Multi-Modality 3D Occupancy Grounding |
|
Shi, Zhan | Zhejiang University |
Wang, Song | Zhejiang University |
Chen, Junbo | Udeer AI |
Zhu, Jianke | Zhejiang University |
Keywords: Multi-Modal Perception for HRI, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Visual grounding aims at identifying objects or regions in a scene based on natural language descriptions, which is essential for spatially aware perception in autonomous driving. However, existing visual grounding tasks typically depend on bounding boxes that often fail to capture fine-grained details. Not all voxels within a bounding box are occupied, resulting in inaccurate object representations. To address this, we introduce a benchmark for 3D occupancy grounding in challenging outdoor scenes. Built on the nuScenes dataset, it fuses natural language with voxel-level occupancy annotations, offering more precise object perception compared to the traditional grounding task. Moreover, we propose GroundingOcc, an end-to-end model designed for 3D occupancy grounding through multimodal learning. It combines visual, textual, and point cloud features to predict object location and occupancy information from coarse to fine. Specifically, GroundingOcc comprises a multimodal encoder for feature extraction, an occupancy head for voxel-wise predictions, and a grounding head for refining localization. Additionally, a 2D grounding module and a depth estimation module enhance geometric understanding, thereby boosting model performance. Extensive experiments on the benchmark demonstrate that our method outperforms existing baselines on 3D occupancy grounding. The dataset is available at https://github.com/RONINGOD/GroundingOcc.
|
|
10:45-10:50, Paper TuAT16.4 | |
ROD-VLM: A Framework of Real-Time Robotic Perception, Reasoning and Manipulation |
|
Zhu, Yinkai | Chongqing University |
Wang, Xinbei | Chongqing University |
Yu, Feilin | Chongqing University |
Lei, Tianjiao | Chongqing University |
Sun, Yizhuo | Harbin Institute of Technology |
Keywords: Multi-Modal Perception for HRI, AI-Enabled Robotics, Perception for Grasping and Manipulation
Abstract: In recent years, Vision-Language Models (VLMs) have exhibited powerful capabilities in reasoning, decomposing long-horizon tasks, and motion planning for robotic manipulation. However, the current operating speed of VLMs limits the interaction frequency between users and the model to several seconds, which prevents real-time perception of environmental changes while executing tasks issued by the VLM. We propose Real-time Object Detection-VLM (ROD-VLM), a novel framework that combines the classical object detection algorithm YOLO-v5x with a VLM to achieve real-time robotic environmental perception, reasoning, and manipulation. Specifically, we introduce the concept of key frames to the VLM, capturing crucial information through the object detection algorithm to assist the VLM in perceiving the varying environment. Our comprehensive real-world experiments show that ROD-VLM possesses excellent capability in real-time environmental understanding, decision-making, and action execution.
|
|
10:50-10:55, Paper TuAT16.5 | |
A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment |
|
Inoue, Koji | Kyoto University |
Okafuji, Yuki | CyberAgent, Inc |
Baba, Jun | CyberAgent, Inc |
Ohira, Yoshiki | CyberAgent |
Hyodo, Katsuya | CyberAgent Inc |
Kawahara, Tatsuya | Kyoto Univ |
Keywords: Natural Dialog for HRI, Service Robotics, Multi-Modal Perception for HRI
Abstract: Turn-taking is a crucial aspect of human-robot interaction, directly influencing conversational fluidity and user engagement. While previous research has explored turn-taking models in controlled environments, their robustness in real-world settings remains underexplored. In this study, we propose a noise-robust voice activity projection (VAP) model, based on a Transformer architecture, to enhance real-time turn-taking in dialogue robots. To evaluate the effectiveness of the proposed system, we conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system. Our analysis covered both subjective user evaluations and objective behavioral analysis. The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation where both the robot and users responded faster. The subjective evaluations suggested that faster responses contribute to a better interaction experience.
|
|
10:55-11:00, Paper TuAT16.6 | |
Anomaly Detection in Human-Robot Interaction Using Multimodal Models Constructed from In-The-Wild Interactions |
|
Mochizuki, Shota | Nagoya University |
Yamashita, Sanae | Nagoya University |
Hoshimure, Kenya | Osaka University |
Baba, Jun | CyberAgent, Inc |
Kubota, Tomonori | Nagoya University |
Ogawa, Kohei | Nagoya University |
Higashinaka, Ryuichiro | Nagoya University/NTT |
Keywords: Natural Dialog for HRI, Multi-Modal Perception for HRI, Telerobotics and Teleoperation
Abstract: In recent years, numerous studies have been conducted on dialogue robots powered by large language models, enabling sophisticated interactions such as providing guidance and engaging in small talk. However, the interaction performance remains imperfect, and the robots sometimes cause problems during interactions. In this study, we aim to automatically detect such anomalies in human-robot interactions by creating a dataset and developing anomaly detection models. To this end, we created a dataset by manually annotating videos of in-the-wild interactions collected from our field experiment designed to test a framework of parallel conversations in which a human intervenes when a problem occurs in the interaction. Using this dataset, we trained classification models to construct anomaly detection models. We then conducted another field experiment in which the model's detection results were presented as alerts to operators within the parallel conversation framework. The results confirmed that providing alerts on the basis of the anomaly detection model was useful for facilitating operator intervention.
|
|
11:00-11:05, Paper TuAT16.7 | |
Incremental Language Understanding for Online Motion Planning of Robot Manipulators |
|
Abrams, Mitchell | Tufts University |
Oelerich, Thies | TU Wien |
Hartl-Nesic, Christian | TU Wien |
Kugi, Andreas | TU Wien |
Scheutz, Matthias | Tufts University |
Keywords: Natural Dialog for HRI, Manipulation Planning, Motion and Path Planning
Abstract: Human-robot interaction requires robots to process language incrementally, adapting their actions in real-time based on evolving speech input. Existing approaches to language-guided robot motion planning typically assume fully specified instructions, resulting in inefficient stop-and-replan behavior when corrections or clarifications occur. In this paper, we introduce a novel reasoning-based incremental parser which integrates an online motion planning algorithm within the cognitive architecture. Our approach enables continuous adaptation to dynamic linguistic input, allowing robots to update motion plans without restarting execution. The incremental parser maintains multiple candidate parses, leveraging reasoning mechanisms to resolve ambiguities and revise interpretations when needed. By combining symbolic reasoning with online motion planning, our system achieves greater flexibility in handling speech corrections and dynamically changing constraints. We evaluate our framework in real-world human-robot interaction scenarios, demonstrating online adaptation of goal poses, constraints, or task objectives. Our results highlight the advantages of integrating incremental language understanding with real-time motion planning for natural and fluid human-robot collaboration. The experiments are demonstrated in the accompanying video at www.acin.tuwien.ac.at/42d5.
|
|
11:05-11:10, Paper TuAT16.8 | |
EHoA: A Benchmark for Task-Oriented Hand-Object Action Recognition Via Event Vision (I) |
|
Chen, Wenkai | University of Hamburg |
Liu, Shang-Ching | Universität Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Datasets for Human Motion, Data Sets for Robotic Vision, Data Sets for Robot Learning
Abstract: The event-based camera is a novel neuromorphic vision sensor that can perceive different dynamic behaviors due to its low latency, asynchronous data stream, and high dynamic range. Much work based on event cameras has addressed problems such as object tracking, visual odometry, and gesture recognition. However, relevant research on adopting event vision to analyze hand-object actions in dynamic environments, a problem that regular CMOS cameras cannot handle, is still lacking. This work presents a richly annotated task-oriented hand-object action dataset consisting of asynchronous event streams, captured by an event-based camera system in different application scenarios. In addition, we design an attention-based residual spiking neural network (ARSNN) that learns temporal-wise and spatial-wise attention simultaneously and introduces a particular residual connection structure to achieve dynamic hand-object action recognition. Extensive experiments against existing baseline methods are conducted to form a vision benchmark. We also show that the learned recognition model can be transferred to classify real robot hand-object actions.
|
|
TuAT17 |
210A |
Autonomous Navigation |
Regular Session |
Chair: Zhang, Wei | Eastern Institute of Technology, Ningbo |
|
10:30-10:35, Paper TuAT17.1 | |
Decentralized Multi-Robot Navigation Policy with Enhanced Security Using Graph GRU Policy Network |
|
Chen, Lin | Hunan University |
Ao, Yu Xuan | Xi'an Jiaotong University |
Zhou, Zhen | Hunan University |
Wang, Yaonan | Hunan University |
Wang, Danwei | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Formulating a multi-robot obstacle avoidance policy is essential for enabling safe and efficient navigation in multi-robot environments, forming a critical component of the effective operation of multi-robot systems. Recently, reinforcement learning has been applied to improve the performance of decentralized, policy-driven robots in task execution. However, ensuring the safety of these agents during movement remains a significant challenge due to the inherent risks of the reinforcement learning process, such as frequent collisions. To address this issue and enhance the safety of policy-guided multi-robot navigation, we propose a policy framework based on imitation learning. This framework introduces a novel policy neural network that integrates a graph attention mechanism with a GRU network structure. The key innovation lies in utilizing the interactions between neighboring robots to enhance the safety of their movements. In a multi-robot simulation environment, robot behaviors are directed by the proposed policy. A comparative analysis was conducted between our approach and RL-RVO, one of the advanced methods in the field. The results demonstrate that our approach outperforms RL-RVO, achieving a higher success rate and significantly improved safety performance.
|
|
10:35-10:40, Paper TuAT17.2 | |
SynthDrive: Scalable Real2Sim2Real Sensor Simulation Pipeline for High-Fidelity Asset Generation and Driving Data Synthesis |
|
Chen, ZhengQing | Fudan University |
Mei, Ruohong | Soochow University |
Guo, Xiaoyang | Horizon Robotics |
Wang, Qingjie | HUST |
Hu, Yubin | Tsinghua University |
Yin, Wei | University of Adelaide |
Ren, Weiqiang | Horizon Robotics |
Zhang, Qian | Horizon Robotics |
Keywords: Autonomous Vehicle Navigation, Simulation and Animation, Deep Learning for Visual Perception
Abstract: In the field of autonomous driving, sensor simulation is essential for generating rare and diverse scenarios that are difficult to capture in real-world environments. Current solutions fall into two categories: 1) CG-based methods, such as CARLA, which lack diversity and struggle to scale to the vast array of rare cases required for robust perception training; and 2) learning-based approaches, such as NeuSim, which are limited to specific object categories (vehicles) and require extensive multi-sensor data, hindering their applicability to generic objects. To address these limitations, we propose SynthDrive, a scalable real2sim2real system that leverages 3D generation to automate asset mining, generation, and rare-case data synthesis. Our framework introduces two key innovations: 1) Automated rare-case mining and synthesis. Given a text prompt describing specific objects, SynthDrive automatically mines image data from the Internet and then generates corresponding high-fidelity 3D assets, which eliminates the need for costly manual data collection. By integrating these assets into existing street-view data, our pipeline produces photorealistic rare-case data, supporting rapid scaling to diverse assets including irregular obstacles and temporary traffic facilities. 2) High-fidelity 3D generation. We propose a hybrid asset generation pipeline that combines a geometry-aware LRM, iterative mesh optimization, and an improved texture fusion algorithm. Our approach achieves a Chamfer Distance of 0.0164 on the GSO dataset, outperforming InstantMesh by 14.1% in geometry accuracy, and achieves 19.05 PSNR (vs. 16.84) for texture quality. This enables fine geometric detail and high-resolution texture generation, which is essential for perception model training. Experiments demonstrate that SynthDrive-generated data improves the performance of downstream perception tasks (2D and 3D detection of rare objects) by 2-4% mAP. SynthDrive greatly lowers data production cost and improves diversity for corner-case data generation, showcasing extensive potential applications in the field of autonomous driving.
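For reference, the symmetric Chamfer Distance cited above is typically computed between point sets sampled from the two meshes; a minimal KD-tree sketch (normalization conventions vary between papers, so treat the exact form as an assumption):

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(A, B):
    # A: (N, 3) and B: (M, 3) points sampled from predicted and reference meshes.
    d_ab = cKDTree(B).query(A)[0]                   # nearest-neighbor distances A -> B
    d_ba = cKDTree(A).query(B)[0]                   # and B -> A
    return d_ab.mean() + d_ba.mean()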
|
|
10:40-10:45, Paper TuAT17.3 | |
Emergency Avoidance: Model Predictive Control Based Path Tracking for Unmanned Ground Vehicles with Active Obstacle Avoidance |
|
Chen, ZongLiang | Southeast University |
Pan, Shuguo | Southeast University |
Tang, Xinhua | Southeast University |
Gao, Wang | Southeast University |
Liang, Shaobo | Southeast University |
Li, Xiaocong | Eastern Institute of Technology, Ningbo |
Keywords: Autonomous Vehicle Navigation, Collision Avoidance, Kinematics
Abstract: Autonomous driving is a high-performance, safety-critical task. Effectively controlling autonomous vehicles to improve their control performance and safety is critical, especially in complex and dynamic environments. However, in real-time obstacle avoidance (OA) scenarios, the planning layer often fails due to high computational complexity and response delays. This trade-off between computational efficiency and safety performance highlights a critical challenge: how to achieve an optimal balance between autonomous driving safety and real-time performance. Therefore, in recent years, addressing OA at the control layer has become a key research focus for enhancing the safety of autonomous vehicles. Considering the advantages of model predictive control (MPC) in prediction and constraint handling, this paper integrates an OA safety distance constraint into MPC to effectively address OA for unmanned ground vehicles (UGVs). First, a Taylor expansion is used to construct the error model of the UGV. Then, safe distance constraints for obstacle avoidance are proposed that account for both tracking errors and closeness to obstacles. Physical constraints are also taken into account, yielding a safe obstacle avoidance MPC (SOAMPC) that is implemented on the UGVs. Furthermore, essential control-theoretic properties are established, including recursive feasibility, guaranteed collision avoidance, and system stability. To evaluate the effectiveness of the SOAMPC controller, simulations and experiments are conducted in a multiple-obstacle environment. The results demonstrate that SOAMPC successfully avoids obstacles while maintaining stability. Compared to other methods, SOAMPC achieves superior efficiency while maintaining accurate path tracking.
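The constraint structure this describes can be summarized, in hedged form (not the paper's exact formulation), as a tracking MPC with a safe-distance constraint and input bounds:

% LaTeX sketch of the SOAMPC-style problem; all symbols are illustrative.
\min_{u_{0:N-1}} \; \sum_{k=0}^{N-1} \lVert e_k \rVert_Q^2 + \lVert u_k \rVert_R^2
\quad \text{s.t.} \quad
e_{k+1} = A_k e_k + B_k u_k, \qquad
\lVert p_k - p_{\mathrm{obs}} \rVert \ge d_{\mathrm{safe}}, \qquad
u_{\min} \le u_k \le u_{\max},

where e_k is the Taylor-linearized tracking error, p_k the predicted vehicle position, and d_safe the obstacle-avoidance safety distance.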
|
|
10:45-10:50, Paper TuAT17.4 | |
SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation Via World Model |
|
Li, Xinqing | University of Chinese Academy of Sciences |
Song, Ruiqi | Tongji University |
Xie, Qingyu | University of Chinese Academy of Sciences |
Wu, Ye | University of Chinese Academy of Sciences |
Zeng, Nanxing | University of Chinese Academy of Sciences |
Ai, Yunfeng | University of Chinese Academy of Sciences |
Keywords: Autonomous Vehicle Navigation, Deep Learning Methods, Simulation and Animation
Abstract: With the rapid advancement of autonomous driving technology, a lack of data has become a major obstacle to enhancing perception model accuracy. Researchers are now exploring controllable data generation using world models to diversify datasets. However, previous work has been limited to studying image generation quality on specific public datasets; there is still relatively little research on how to build data generation engines for real-world application scenes to achieve large-scale data generation for challenging scenes. In this paper, a simulator-conditioned scene generation engine based on a world model is proposed. By constructing a simulation system consistent with real-world scenes, simulation data and labels, which serve as the conditions for data generation in the world model, can be collected for arbitrary scenes. The result is a novel data generation pipeline that combines the powerful scene simulation capabilities of the simulation engine with the robust data generation capabilities of the world model. In addition, a benchmark with proportionally constructed virtual and real data is provided for exploring the capabilities of world models in real-world scenes. Quantitative results show that the generated images significantly improve the performance of downstream perception models. Finally, we explore the generative performance of the world model in urban autonomous driving scenarios. All data and code will be available at url{https://github.com/Li-Zn-H/SimWorld}.
|
|
10:50-10:55, Paper TuAT17.5 | |
Enhancing Deep Reinforcement Learning-Based Robot Navigation Generalization through Scenario Augmentation |
|
Wang, Shanze | The Hong Kong Polytechnic University |
Tan, Mingao | Eastern Institute of Technology, Ningbo |
Yang, Zhibo | National University of Singapore |
Wang, Xianghui | Eastern Institute of Technology, Ningbo, China |
Shen, Xiaoyu | Eastern Institute of Technology, Ningbo, China |
Huang, Hailong | The Hong Kong Polytechnic University |
Zhang, Wei | Eastern Institute of Technology, Ningbo |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, AI-Enabled Robotics
Abstract: This work focuses on enhancing the generalization performance of deep reinforcement learning-based robot navigation in unseen environments. We present a novel data augmentation approach called scenario augmentation, which enables robots to navigate effectively across diverse settings without altering the training scenario. The method operates by mapping the robot's observation into an imagined space, generating an imagined action based on this transformed observation, and then remapping this action back to the real action executed in simulation. Through scenario augmentation, we conduct extensive comparative experiments to investigate the underlying causes of suboptimal navigation behaviors in unseen environments. Our analysis indicates that limited training scenarios represent the primary factor behind these undesired behaviors. Experimental results confirm that scenario augmentation substantially enhances the generalization capabilities of deep reinforcement learning-based navigation systems. The improved navigation framework demonstrates exceptional performance by producing near-optimal trajectories with significantly reduced navigation time in real-world applications.
|
|
10:55-11:00, Paper TuAT17.6 | |
OVL-MAP: An Online Visual Language Map Approach for Vision-And-Language Navigation in Continuous Environments |
|
Wen, Shuhuan | Yanshan University |
Zhang, Ziyuan | Yanshan University |
Sun, Yuxiang | City University of Hong Kong |
Wang, Zhiwen | Yanshan University Hebei Key Laboratory of Industrial Computer C |
Keywords: AI-Enabled Robotics, Embodied Cognitive Science, RGB-D Perception
Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, focused on topological and semantic maps, often face limitations in accurately understanding and adapting to complex or previously unseen environments, particularly due to static and offline map constructions. To address these challenges, this paper proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module utilizes the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments.
|
|
11:00-11:05, Paper TuAT17.7 | |
Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization |
|
Chen, Jiayi | Shenzhen Research Institute of Big Data, the Chinese University |
Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Li, Guoliang | University of Macau |
Xu, Wei | Manifold Tech Limited |
Zhu, Guangxu | Shenzhen Research Institute of Big Data |
Ng, Derrick Wing Kwan | University of New South Wales |
Xu, Chengzhong | University of Macau |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. While collaboration between the two offers a promising solution, the key challenge is deciding when and how to engage the large model. To address this issue, this paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models through two key innovations. First, we propose large vision model guided model predictive control (LVM-MPC), which leverages the cloud for LVM perception and decision making. The cloud output serves as a global guidance for a local MPC, thereby forming a closed-loop perception-to-control system. Second, to determine the best timing for large model query and service, we propose collaboration timing optimization (CTO), including object detection confidence thresholding (ODCT) and cloud forward simulation (CFS), which decides when to seek cloud assistance and when to offer cloud service. Extensive experiments show that the proposed OCP outperforms existing methods in terms of both navigation time and success rate.
|
|
11:05-11:10, Paper TuAT17.8 | |
Multi-Step Deep Koopman Network (MDK-Net) for Vehicle Control in Frenet Frame |
|
Abtahi, Mohammad | University of California Davis |
Rabbani, Mahdis | University of California, Davis |
Abdolmohammadi, Armin | University of California Davis |
Nazari, Shima | University of California Davis |
Keywords: Autonomous Vehicle Navigation, Deep Learning Methods, Optimization and Optimal Control
Abstract: The highly nonlinear dynamics of vehicles present a major challenge for the practical implementation of optimal control and Model Predictive Control (MPC) approaches in path planning and tracking applications. Koopman operator theory offers a global linear representation of nonlinear dynamical systems, making it a promising framework for optimization-based vehicle control. This paper introduces a novel deep learning-based Koopman modeling approach that employs deep neural networks to capture the full vehicle dynamics, from pedal and steering inputs to chassis states, within a curvilinear Frenet frame. The superior accuracy of the Koopman model compared to identified linear models is shown for a double lane change maneuver. Furthermore, it is shown that an MPC controller deploying the Koopman model provides significantly improved performance while maintaining computational efficiency comparable to a linear MPC.
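The core idea, lifting the state so the dynamics become linear, can be illustrated with plain EDMD and a fixed dictionary; the paper instead learns the lifting with a deep network over multiple prediction steps. A minimal numpy sketch:

import numpy as np

def fit_koopman(X, U, Xp, phi):
    # X, Xp: (T, n) states and their successors; U: (T, m) inputs;
    # phi: lifting function x -> z (learned by a deep net in the paper).
    Z = np.stack([phi(x) for x in X])
    Zp = np.stack([phi(x) for x in Xp])
    ZU = np.hstack([Z, U])                          # regressors [z, u]
    K, *_ = np.linalg.lstsq(ZU, Zp, rcond=None)     # solve ZU @ K ~ Zp
    d = Z.shape[1]
    return K[:d].T, K[d:].T                         # A, B with z+ ~ A z + B u

An illustrative dictionary would be phi = lambda x: np.concatenate([x, np.sin(x), np.cos(x)]); the lifted linear model z+ = A z + B u is what lets a standard linear MPC act on the nonlinear vehicle dynamics.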
|
|
TuAT18 |
210B |
Multi-Robot Systems 1 |
Regular Session |
Co-Chair: Zhao, Shiyu | Westlake University |
|
10:30-10:35, Paper TuAT18.1 | |
Bridging the Reality Gap: Communication-Aware Task Allocation with Multi-Objective Asynchronous Policy Learning |
|
Xiong, Zehao | National University of Defense Technology |
Zhou, Yu | National University of Defense Technology |
Xi, YeXun | National University of Defense Technology |
Cao, Yizhe | College of Intelligence Science and Technology, National Univers |
Wang, Chang | National University of Defense Technology |
Li, Jie | National University of Defense Technology |
Keywords: Distributed Robot Systems, Multi-Robot Systems, Networked Robots
Abstract: Distributed task allocation in UAV swarms is sensitive to excessive bandwidth requirements and frequent inter-UAV communication. Combining reinforcement learning with traditional distributed task allocation demonstrates great potential for enhancing algorithm performance and optimizing communication. However, existing studies rely on ideal bandwidth assumptions and unrealistic time synchronization, making training and validation impractical under real-world conditions. This paper proposes a communication-aware distributed task allocation method that employs an asynchronous strategy learning framework and multi-objective optimization to reduce communication throughput and task conflicts. First, the allocation algorithm integrated with communication is formalized as an Asynchronous Constrained Decentralized Partially Observable Markov Decision Process (ACDEC-POMDP), which extends the original learning objective to accommodate asynchronous requirements. Channel access and other features are observations, actions are inter-agent adaptive gating mechanisms, and the shared reward reflects global task-conflict changes. Second, to address the asynchronous data processing problem under the Centralized Training-Distributed Execution (CTDE) paradigm, a method based on 'concatenation' is proposed to stitch together trajectories from different UAVs, enabling flexible and stable training deployment. Third, a Multi-Objective Coupled Proximal Policy Optimization (MOC-PPO) is proposed, aiming to simultaneously reduce bandwidth demands and minimize task conflicts, where a reinforcement learning method integrating Lagrangian-dual optimization is employed to solve for the optimal parameters. Finally, a Hardware-in-the-Loop (HIL) environment is built, using authentic network protocols and simulated channel transmissions, to bridge the gap between simulation-trained strategies and real-world communication deployment. The experimental results show that, compared with the original asynchronous method, the trained strategy can significantly reduce communication overhead without increasing the optimization loss. In addition, the trained communication strategies demonstrate strong scalability and generalization.
|
|
10:35-10:40, Paper TuAT18.2 | |
MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-Based Multi-Robot Relative Localization in Large Indoor Environments |
|
Ghanta, Sai Krishna | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Multi-Robot Systems, Localization, Sensor Networks
Abstract: Relative localization is a crucial capability for multi-robot systems operating in GPS-denied environments. Existing approaches for multi-robot relative localization often depend on costly or short-range sensors like cameras and LiDARs. Consequently, these approaches face challenges such as high computational overhead (e.g., map merging) and difficulties in disjoint environments. To address this limitation, this paper introduces MGPRL, a novel distributed framework for multi-robot relative localization using convex-hull of multiple Wi-Fi access points (AP). To accomplish this, we employ co-regionalized multi-output Gaussian Processes for efficient Radio Signal Strength Indicator (RSSI) field prediction and perform uncertainty-aware multi-AP localization, which is further coupled with weighted convex hull-based alignment for robust relative pose estimation. Each robot predicts the RSSI field of the environment by an online scan of APs in its environment, which are utilized for position estimation of multiple APs. To perform relative localization, each robot aligns the convex hull of its predicted AP locations with that of the neighbor robots. This approach is well-suited for devices with limited computational resources and operates solely on widely available Wi-Fi RSSI measurements without necessitating any dedicated pre-calibration or offline fingerprinting. We rigorously evaluate the performance of the proposed MGPRL in ROS simulations and demonstrate it with real-world experiments, comparing it against multiple state-of-the-art approaches. The results showcase that MGPRL outperforms existing methods in terms of localization accuracy and computational efficiency.
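For the uncertainty-aware RSSI prediction step, a single-output GP per access point already illustrates the mechanics (the paper uses co-regionalized multi-output GPs, which sklearn does not provide). A hedged sketch with assumed kernel hyperparameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_rssi_field(positions, rssi):
    # positions: (N, 2) scan locations; rssi: (N,) signal strengths for one AP.
    kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(positions, rssi)
    return gp                                       # query: gp.predict(X, return_std=True)

The predictive standard deviation from gp.predict is what would feed the uncertainty-aware AP localization before the convex-hull alignment.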
|
|
10:40-10:45, Paper TuAT18.3 | |
Impact of Heterogeneous UWB Sensor Noise on the Optimality and Sensitivity of Mobile Positioning Systems |
|
Theunissen, Mathilde | LS2N, CNRS |
Fantoni, Isabelle | CNRS |
Malis, Ezio | Inria |
Keywords: Multi-Robot Systems, Localization
Abstract: In this paper, we propose a theoretical framework for designing a multi-robot formation equipped with Ultra-wideband (UWB) sensors to localize a target robot. In the presence of noisy range measurements, the accuracy of the target robot’s pose estimation is highly dependent on the chosen formation geometry. Different from existing works, we account for the heterogeneous standard deviations of range measurements across different UWB transmitter-receiver pairs. We establish new optimality conditions for formation geometries and conduct a sensitivity analysis of optimal formations under robot positioning errors. In a 2D setting, we derive necessary and sufficient conditions for both optimality and robustness to robot positioning uncertainty. Experimental results confirm the heterogeneous standard deviations of UWB range measurements and validate the target robot’s confidence ellipse model. An experimental comparison of formation geometries, optimized with and without considering heterogeneous noise, emphasizes the importance of accounting for the heterogeneous standard deviations of range measurements. In addition, we experimentally demonstrate that robust formation geometries improve the target robot’s confidence ellipse in the presence of positioning errors.
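The heterogeneous-noise effect on the confidence ellipse can be illustrated with a standard Fisher-information computation for range measurements (a generic sketch, not the paper's derivation): each transmitter-receiver pair contributes rank-one information scaled by 1/sigma_i^2, so formation geometries can be compared through det(FIM) under the per-pair sigmas. Anchor layout and sigmas below are assumptions.

```python
import numpy as np

# 2D Fisher information of a target position from range-only measurements
# with heterogeneous per-anchor standard deviations (toy numbers).

def range_fim(anchors, target, sigmas):
    fim = np.zeros((2, 2))
    for p, s in zip(anchors, sigmas):
        u = (target - p) / np.linalg.norm(target - p)  # unit bearing vector
        fim += np.outer(u, u) / s**2                   # rank-1 info per range
    return fim

anchors = np.array([[0.0, 0.0], [5.0, 0.0], [2.5, 4.0]])
target = np.array([2.5, 1.5])
homog = range_fim(anchors, target, np.array([0.10, 0.10, 0.10]))
heter = range_fim(anchors, target, np.array([0.05, 0.10, 0.20]))
for name, F in [("homogeneous", homog), ("heterogeneous", heter)]:
    axes = np.sqrt(np.linalg.eigvalsh(np.linalg.inv(F)))  # CRLB ellipse axes
    print(name, "ellipse semi-axes ~", np.round(axes, 4))
```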
|
|
10:45-10:50, Paper TuAT18.4 | |
Learning Distributed End-To-End Hunting Locomotion for Multiple Quadruped Robots |
|
Yeung, Chung Yui | City University of Hong Kong |
Wong, Shing Ming | City University of Hong Kong |
Tung, Wai Nma | City University of Hong Kong |
Xu, Shaohang | City University of Hong Kong |
Ho, Chin Pang | City University of Hong Kong |
Keywords: Multi-Robot Systems, Bioinspired Robot Learning, Machine Learning for Robot Control
Abstract: Quadruped robots have demonstrated remarkable versatility in various applications, from search and rescue to exploration. Recent advancements have shifted focus from individual robots to swarms, recognizing the potential of collaborative behaviors to achieve complex tasks beyond the capabilities of a single robot. Inspired by the cooperative hunting behaviors observed in nature, this paper presents a reinforcement learning framework for a swarm of quadruped robots to learn decentralized end-to-end hunting locomotion. In particular, we integrate stable and dynamic locomotion with hunting objectives and utilize a guidance vector as privileged information for efficient training. The framework accounts for the control dynamics of quadruped robots, ensuring both low-level stability and high-level hunting coordination in multi-robot environments. The trained policy is deployed onto a real robot system, and the experimental results demonstrate coordinated behavior in various scenarios. The implementation code is released to benefit the community.
|
|
10:50-10:55, Paper TuAT18.5 | |
Deep Equivariant Multi-Agent Control Barrier Functions |
|
Bousias, Nikolaos | University of Pennsylvania |
Lindemann, Lars | University of Southern California |
Pappas, George J. | University of Pennsylvania |
Keywords: Multi-Robot Systems, Collision Avoidance, Machine Learning for Robot Control
Abstract: With multi-agent systems increasingly deployed autonomously at scale in complex environments, ensuring the safety of data-driven policies is critical. Control Barrier Functions have emerged as an effective tool for enforcing safety constraints, yet existing learning-based methods often lack scalability, generalization, and sampling efficiency as they overlook the inherent geometric structures of the system. To address this gap, we introduce symmetries-infused distributed CBFs, enforcing the satisfaction of intrinsic symmetries on learnable graph-based safety certificates. We theoretically motivate the need for equivariant parametrization of CBFs and policies, and propose a simple, yet efficient and adaptable methodology for constructing such equivariant group-modular networks via the compatible group actions. This approach encodes safety constraints in a distributed, data-efficient manner, enabling zero-shot generalization to larger and denser swarms. Through extensive simulations on multi-robot navigation tasks, we demonstrate that our method outperforms state-of-the-art baselines in terms of safety, scalability, and task success rates, highlighting the importance of embedding symmetries in safe distributed neural policies.
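For readers unfamiliar with CBF safety filters, the underlying mechanism (independent of the learned, equivariant parametrization proposed here) can be sketched for single-integrator agents with a pairwise distance barrier; a closed-form projection for one active constraint stands in for the usual quadratic program. All constants are illustrative.

```python
import numpy as np

# Toy CBF safety filter: barrier h = ||p_i - p_j||^2 - d_safe^2, enforcing
# hdot + alpha * h >= 0 with the smallest change to the nominal input.

def cbf_filter(p_i, p_j, u_nom, d_safe=1.0, alpha=1.0):
    diff = p_i - p_j
    h = diff @ diff - d_safe**2              # barrier value (>0 means safe)
    grad = 2.0 * diff                        # dh/dp_i (neighbor assumed static)
    violation = -(grad @ u_nom + alpha * h)  # constraint residual
    if violation > 0.0:                      # active: minimally correct input
        u_nom = u_nom + violation * grad / (grad @ grad)
    return u_nom

u_safe = cbf_filter(np.array([0.0, 0.0]), np.array([1.2, 0.0]), np.array([1.0, 0.0]))
print(u_safe)   # nominal command bent to slow the approach toward the neighbor
```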
|
|
10:55-11:00, Paper TuAT18.6 | |
A Novel Large-Scale Collaborative Mapping Framework with Heterogeneous Point Clouds for Aerial-Ground Robots |
|
Luan, Shuang | Dalian University of Technology |
He, Guojian | Dalian Maritime University |
Peng, Haoyuan | Dalian University of Technology |
Yan, Fei | Dalian University of Technology |
Zhuang, Yan | Dalian University of Technology |
Keywords: Multi-Robot Systems, Mapping, Localization
Abstract: Ground and aerial robots, with distinct sensing perspectives, acquire heterogeneous point clouds that exhibit limited overlap, presenting significant challenges for collaborative mapping. To address these challenges, this article proposes a robust LiDAR-based aerial-ground collaborative mapping framework for large-scale outdoor environments. Firstly, to perform reliable cross-source place recognition and detect loop closure between aerial-ground robots, a deep network that fuses multi-level bird’s-eye view (BEV) and geometric features is developed to ensure consistent feature extraction and emphasize overlaps between heterogeneous point clouds. Next, an overlap-aware registration method is proposed to align point clouds within a detected loop closure. This method can strategically perform point cloud sparsification based on overlap ratio estimation, and mitigate the adverse effects of interfering points in non-overlapping regions. Furthermore, a graph optimization is implemented to consider all loop closure constraints simultaneously and ensure global map consistency. Comparative experiments on public and self-collected datasets demonstrate the superiority of the proposed approach. We will release our code at https://github.com/luanshuang/AGCMapping.
|
|
11:00-11:05, Paper TuAT18.7 | |
GeoSafe: A Unified Unconstrained Multi-DOF Optimization Framework for Multi-UAV Cooperative Hoisting and Obstacle Avoidance |
|
Li, Xingyu | Northeastern University - China |
Nie, Hongyu | Shenyang University of Technology |
Xu, Haoxuan | Northeastern University |
Liu, Xingrui | Northeastern University |
Tan, Zhaotong | Northeastern University |
Jiang, Chunyu | Woozoom |
Feng, Yang | Shenyang WooZoom Technology Co., Ltd |
Mei, Sen | Shenyang Woozoom Technology Co., Ltd |
Keywords: Multi-Robot Systems, Motion and Path Planning, Intelligent Transportation Systems
Abstract: In warehouse logistics and post-disaster rescue, multi-UAV payload transport must navigate tight spaces, such as 1.2 m × 0.8 m aisles and collapsed pipelines as narrow as 0.6 m. Traditional four-DOF (translation and scaling) trajectory planning struggles under such constraints. To overcome this, we propose an optimization-based framework that introduces rotational degrees of freedom, expanding the solution space to five dimensions. Using the MINCO transformation, we reformulate constrained formation adjustment into an unconstrained optimization problem via smooth mappings and penalty functions, enabling simultaneous obstacle avoidance and formation control. The GeoSafe algorithm further enhances safe passage by integrating iterative region expansion and semi-definite programming to maximize obstacle-free space. Extensive simulations and real-world experiments show our method’s superiority over sampling-based and IF-based approaches in narrow passage traversal, computational efficiency, and formation scalability.
|
|
11:05-11:10, Paper TuAT18.8 | |
Sparse Hierarchical LiDAR Bundle Adjustment for Online Collaborative Localization and Mapping |
|
Liu, Jiangpin | Zhejiang University |
Xu, Xuecheng | Zhejiang University |
Lu, Sha | Zhejiang University |
Wang, Si | Zhejiang University |
Wang, Chaoqun | Shandong University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Multi-Robot SLAM, SLAM, Cooperating Robots
Abstract: This letter presents a sparse hierarchical LiDAR bundle adjustment method for online multi-robot collaborative simultaneous localization and mapping (C-SLAM). The motivation behind this work is that the pose graph cannot directly reflect map inconsistencies. As a result, the map divergence across multiple robots persists even after pose graph optimization. While existing methods have utilized LiDAR bundle adjustment (BA) to address the divergence issues, none of these methods are particularly effective at improving online localization accuracy in multi-robot scenarios. In this work, we propose a sparse hierarchical mechanism where LiDAR bundle adjustment functions as a low-latency module within a centralized C-SLAM system. The hierarchical design accelerates the original BA process, while sparse selection further decreases the problem’s complexity, thereby improving computational efficiency. The combination of high-frequency multi-robot pose graph optimization (MR-PGO) and low-frequency multi-robot hierarchical bundle adjustment (MR-HBA) improves accuracy and provides real-time localization results. To validate the effectiveness of our proposed method, we conduct comparative experiments using multiple public datasets and a self-collected dataset, benchmarking against state-of-the-art multi-robot SLAM systems. The results demonstrate that our method significantly enhances the accuracy of localization and mapping. Additionally, we have made the entire system available as an open-source implementation to benefit the broader research community.
|
|
TuAT19 |
210C |
Grasping 1 |
Regular Session |
|
10:30-10:35, Paper TuAT19.1 | |
An Inflatable Deployable Origami Grasper for Adaptive and High-Load Grasping |
|
Yan, Peng | Harbin Institute of Technology, Shenzhen |
Liang, Guang | Department of Mechanical Engineering and Automation, Harbin Inst |
Wang, Sen | Harbin Institute of Technology, Shenzhen |
Huang, Hailin | Harbin Institute of Technology, Shenzhen |
Wang, Wei | Harbin Institute of Technology, Shenzhen |
Li, Xu | Harbin Institute of Technology, Shenzhen |
Li, Bing | Harbin Institute of Technology (Shenzhen) |
Keywords: Grasping, Grippers and Other End-Effectors
Abstract: Robotic graspers are essential for enhancing the efficiency and versatility of robots in grasping tasks. In this paper, we propose a novel inflatable deployable origami grasper with a rigid-flexible coupling structure. The proposed grasper can achieve multiple deployment configurations under a single pneumatic actuation, enabling both deployment and grasping operations while also allowing for passive self-folding during deflation. The design and fabrication of the grasper are presented. Then, the stiffness model for the inflatable deployable origami unit is developed based on the equivalent truss method. Experimental results show that the grasper successfully grasps objects of various shapes and sizes in both enveloping and fingertip grasping modes, using either two or four fingers. With its simple mechanical system and high deploy/fold ratio, the proposed grasper holds significant potential for applications in industrial automation and space exploration.
|
|
10:35-10:40, Paper TuAT19.2 | |
Task-Aware Robotic Grasping by Evaluating Quality Diversity Solutions through Foundation Models |
|
Appius, Aurel Xaver | ETH Zürich |
Garrabé, Émiland | ISIR, Sorbonne Université |
Hélénon, François | Sorbonne Université |
Khoramshahi, Mahdi | Sorbonne University |
Chetouani, Mohamed | Sorbonne University |
Doncieux, Stéphane | Sorbonne University |
Keywords: Grasping, AI-Based Methods, Evolutionary Robotics
Abstract: Task-aware robotic grasping is a challenging problem that requires the integration of semantic understanding and geometric reasoning. This paper proposes a novel framework that leverages Large Language Models (LLMs) and Quality Diversity (QD) algorithms to enable zero-shot task-conditioned grasp synthesis. The framework segments objects into meaningful subparts and labels each subpart semantically, creating structured representations that can be used to prompt an LLM. By coupling semantic and geometric representations of an object's structure, the LLM's knowledge about tasks and which parts to grasp can be applied in the physical world. The QD-generated grasp archive provides a diverse set of grasps, allowing us to select the most suitable grasp based on the task. We evaluated the proposed method on a subset of the YCB dataset with a Franka Emika robot. A consolidated ground truth for task-specific grasp regions is established through a survey. Our work achieves a weighted intersection over union (IoU) of 73.6% in predicting task-conditioned grasp regions in 65 task-object combinations. An end-to-end validation study on a smaller subset further confirms the effectiveness of our approach, with 88% of responses favoring the task-aware grasp over the control group. A binomial test shows that participants significantly prefer the task-aware grasp.
|
|
10:40-10:45, Paper TuAT19.3 | |
KGN-Pro: Keypoint-Based Grasp Prediction through Probabilistic 2D-3D Correspondence Learning |
|
Chen, Bingran | Zhejiang University |
Li, Baorun | Zhejiang University |
Yang, Jian | China Research and Development Academy of Machinery Equipment |
Liu, Yong | Zhejiang University |
Zhai, Guangyao | Technical University of Munich |
Keywords: Grasping, Deep Learning in Grasping and Manipulation
Abstract: High-level robotic manipulation tasks demand flexible 6-DoF grasp estimation to serve as a basic function. Previous approaches either directly generate grasps from point-cloud data, suffering from challenges with small objects and sensor noise, or infer 3D information from RGB images, which introduces expensive annotation requirements and discretization issues. Recent methods mitigate some challenges by retaining a 2D representation to estimate grasp keypoints and applying Perspective-n-Point (PnP) algorithms to compute 6-DoF poses. However, these methods are limited by their non-differentiable nature and reliance solely on 2D supervision, which hinders the full exploitation of rich 3D information. In this work, we present KGN-Pro, a novel grasping network that preserves the efficiency and fine-grained object grasping of previous KGNs while integrating direct 3D optimization through probabilistic PnP layers. KGN-Pro encodes paired RGB-D images to generate grasp center heatmaps and keypoint offsets, and further computes a 2D confidence map to weight keypoint contributions during re-projection error minimization. By modeling the weighted sum of squared re-projection errors probabilistically, the network effectively transmits 3D supervision to its 2D keypoint predictions, enabling end-to-end learning. Experiments on both simulated and real-world platforms demonstrate that KGN-Pro outperforms existing methods in terms of grasp cover rate and success rate. Project website: https://waitderek.github.io/kgnpro.
|
|
10:45-10:50, Paper TuAT19.4 | |
Towards Extrinsic Dexterity Grasping in Unrestricted Environments |
|
Ma, Chengzhong | Xi'an Jiaotong University |
Yang, Houxue | Xi'an Jiaotong University |
Zhang, Hanbo | National University of Singapore |
Liu, Zeyang | Xi'an Jiaotong University |
Zhao, Chao | Xi'an Jiaotong University |
Tang, Jian | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Grasping, Manipulation Planning, Perception for Grasping and Manipulation
Abstract: Grasping large and flat objects (e.g., a book or a pan) is often regarded as an ungraspable task, which poses significant challenges due to the unreachable grasping poses. Prior research has exploited environmental interactions through Extrinsic Dexterity, utilizing external structures such as walls or table edges to facilitate object grasping. However, they are confined to task-specific policies while neglecting semantic perception and planning to identify optimal pre-grasp configurations. This limits their operational versatility, impeding effective adaptation to varied extrinsic dexterity constraints. In this work, we present ExDiff, a robot manipulation approach for extrinsic dexterity grasping in unrestricted environments. It utilizes Vision-Language Models (VLMs) to perceive the environmental state and generate instructions, followed by a Goal-Conditioned Action Diffusion (GCAD) model to predict the sequence of low-level actions. This diffusion model learns the low-level policy, conditioned on high-level instructions and cumulative rewards, which improves the generation of robot actions. Simulation experiments and real-world deployment results demonstrate that ExDiff effectively performs ungraspable tasks and generalizes to previously unseen target objects and scenes.
|
|
10:50-10:55, Paper TuAT19.5 | |
Dual Graph Attention Networks for Multi-View Visual Manipulation Relationship Detection and Robotic Grasping (I) |
|
Ding, Mengyuan | Xi'an Jiaotong University |
Liu, YaXin | Xi'an Jiaotong University |
Shi, Yaorui | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Grasping, Manipulation Planning, Dual Arm Manipulation
Abstract: Visual manipulation relationship detection facilitates robots in achieving safe, orderly, and efficient grasping tasks. However, most existing algorithms only model object-level or relational-level dependencies individually, lacking sufficient global information, which makes it difficult to handle different types of reasoning errors, especially in complex environments with multi-object stacking and occlusion. To solve the above problems, we propose Dual Graph Attention Networks (Dual-GAT) for visual manipulation relationship detection, with an object-level graph network for capturing object-level dependencies and a relational-level graph network for capturing relational triplet-level interactions. The attention mechanism assigns different weights to different dependencies, obtains more accurate global context information for reasoning, and produces a manipulation relationship graph. In addition, we use multi-view feature fusion to improve the features of occluded objects, thereby enhancing relationship detection performance in multi-object scenes. Finally, our method is deployed on a robot to construct a multi-object grasping system, which can be well applied to stacking environments. Experimental results on the VMRD and REGRAD datasets show that our method significantly outperforms others.
|
|
10:55-11:00, Paper TuAT19.6 | |
DexDiffuser: Generating Dexterous Grasps with Diffusion Models |
|
Weng, Zehang | KTH |
Lu, Haofei | Royal Institute of Technology |
Kragic, Danica | KTH |
Lundell, Jens | Royal Institute of Technology |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Dexterous Manipulation
Abstract: We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iterative denoising of randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion and Evaluator-based Sampling Refinement. The experimental results demonstrate that DexDiffuser consistently outperforms the state-of-the-art multi-finger grasp generation method FFHNet, achieving, on average, 9.12% and 19.44% higher grasp success rates in simulation and real robot experiments, respectively.
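Evaluator-based sampling refinement, one of the two refinement strategies named above, can be caricatured as hill climbing on the evaluator's score; the quadratic "evaluator" below is a made-up stand-in for the learned DexEvaluator, and the 6-D grasp vector is an assumption.

```python
import numpy as np

# Toy evaluator-based refinement: perturb the grasp and keep only candidates
# the (here fictitious) evaluator scores higher.

def evaluator(grasp):                        # higher = predicted more stable
    return -float(np.sum((grasp - 0.3) ** 2))

def refine(grasp, iters=200, step=0.05, seed=1):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        candidate = grasp + step * rng.standard_normal(grasp.shape)
        if evaluator(candidate) > evaluator(grasp):
            grasp = candidate                # keep only improving perturbations
    return grasp

g0 = np.zeros(6)                             # toy grasp parameter vector
print("score before:", evaluator(g0), "after:", round(evaluator(refine(g0)), 4))
```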
|
|
11:00-11:05, Paper TuAT19.7 | |
GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping |
|
Wang, Ruixiang | Harbin Institude of Technology |
Zhou, Huayi | The Chinese University of Hong Kong, Shenzhen |
Yao, Xinyue | The Chinese University of Hongkong, Shenzhen |
Liu, Guiliang | Chinese University of Hong Kong, Shenzhen |
Jia, Kui | Shenzhen Institute of Advanced Technology, Chinese Academy |
Keywords: Grasping, Perception for Grasping and Manipulation, Human-Robot Collaboration
Abstract: Achieving precise and generalizable grasping across diverse objects and environments is essential for intelligent and collaborative robotic systems. However, existing approaches often struggle with ambiguous affordance reasoning and limited adaptability to unseen objects, leading to suboptimal grasp execution. In this work, we propose GAT-Grasp, a gesture-driven grasping framework that directly utilizes human hand gestures to guide the generation of task-specific grasp poses with appropriate positioning and orientation. Specifically, we introduce a retrieval-based affordance transfer paradigm, leveraging the implicit correlation between hand gestures and object affordances to extract grasping knowledge from large-scale human-object interaction videos. By eliminating the reliance on pre-given object priors, GAT-Grasp enables zero-shot generalization to novel objects and cluttered environments. Real-world evaluations confirm its robustness across diverse and unseen scenarios, demonstrating reliable grasp execution in complex task settings.
|
|
11:05-11:10, Paper TuAT19.8 | |
Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation Via Immersive VR |
|
Li, Kelin | Imperial College London |
Wagh, Shubham Maroti | Extend Robotics Limited |
Sharma, Nitish | Extend Robotics |
Bhadani, Saksham | SRM Institute of Science and Technology |
Chen, Wei | Imperial College London |
Liu, Chang | Imperial College London |
Kormushev, Petar | Imperial College London |
Keywords: Grasping, Learning from Demonstration, Haptics and Haptic Interfaces
Abstract: Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains a challenge. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we conducted a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation compared to the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.
|
|
TuAT20 |
210D |
Humanoid Robot Systems 1 |
Regular Session |
|
10:30-10:35, Paper TuAT20.1 | |
Humanoid-Human Sit-To-Stand-To-Sit Assistance |
|
Lefèvre, Hugo | LIRMM, CNRS-Université De Montpellier |
Chaki, Tomohiro | Honda R&D Co., Ltd |
Kawakami, Tomohiro | Honda R&D Co., Ltd |
Tanguy, Arnaud | CNRS-UM LIRMM |
Yoshiike, Takahide | Honda R&D Co. Ltd., |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Humanoid Robot Systems, Body Balancing, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Standing and sitting are basic tasks that become increasingly difficult with age or frailty. Assisting these movements using humanoid robots is a complex challenge, particularly in determining where and how much force the robot should apply to effectively support the human’s dynamic motions. In this letter, we propose a method to compute assistive forces directly from the human’s dynamic balance, using criteria typically employed in humanoid robots. Specifically, we map humanoid dynamic balance metrics onto human motion to calculate the forces required to stabilize the human’s current posture. These forces are then applied at the appropriate locations on the human body by the humanoid. Our approach combines the variable-height 3D divergent component of motion with gravito-inertial wrench cones to define a 3D balance region. Using centroidal feedback, we compute the required assistance force to maintain balance and distribute the resulting wrenches across the human’s body using a humanoid robot dynamically balanced according to the same criteria. We demonstrate the effectiveness of this framework through both simulations and experiments, where a humanoid assists a person in sit-to-stand and stand-to-sit motions, with the person wearing an age-simulation suit to emulate frailty.
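The DCM criterion at the heart of this balance mapping reduces, in its constant-height form, to xi = c + c_dot/omega with omega = sqrt(g/z); a toy check of whether the DCM stays inside a support region is sketched below. The variable-height DCM and wrench-cone machinery used in the letter are omitted, and all numbers are assumed.

```python
import numpy as np

# Constant-height DCM sketch: xi = c + c_dot / omega, omega = sqrt(g / z_com).

g, z_com = 9.81, 0.9
omega = np.sqrt(g / z_com)

com = np.array([0.02, 0.00])        # horizontal CoM position [m]
com_vel = np.array([0.30, 0.05])    # horizontal CoM velocity [m/s]
dcm = com + com_vel / omega         # where the unstable CoM dynamics diverge to

x_rng, y_rng = (-0.10, 0.15), (-0.08, 0.08)   # toy support region
balanced = x_rng[0] <= dcm[0] <= x_rng[1] and y_rng[0] <= dcm[1] <= y_rng[1]
print("DCM:", dcm.round(3), "| balanced without assistance:", balanced)
# If False, an assistance wrench must pull the DCM back toward the region.
```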
|
|
10:35-10:40, Paper TuAT20.2 | |
Teacher Motion Priors: Enhancing Robot Locomotion Over Challenging Terrain |
|
Jin, Fangcheng | University of Chinese Academy of Sciences |
Wang, Yuqi | Beijing Zhongke Huiling Robot Technology Co., LTD |
Ma, Peixin | Zhongke Huiling Company |
Yang, Guodong | Institute of Automation, Chinese Academy of Sciences; Beijing Zh |
Zhao, Pan | Beijing Zhongke Huiling Robot Technology Co., LTD |
Li, En | Institute of Automation, Chinese Academy of Sciences |
Zhang, Zhengtao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Imitation Learning
Abstract: Achieving robust locomotion on complex terrains remains a challenge due to high-dimensional control and environmental uncertainties. This paper introduces a teacher-prior framework based on the teacher-student paradigm, integrating imitation and auxiliary task learning to improve learning efficiency and generalization. Unlike traditional paradigms that strongly rely on encoder-based state embeddings, our framework decouples the network design, simplifying the policy network and deployment. A high-performance teacher policy is first trained using privileged information to acquire generalizable motion skills. The teacher’s motion distribution is transferred to the student policy, which relies only on noisy proprioceptive data, via a generative adversarial mechanism to mitigate performance degradation caused by distributional shifts. Additionally, auxiliary task learning enhances the student policy’s feature representation, speeding up convergence and improving adaptability to varying terrains. The framework is validated on a humanoid robot, showing markedly improved locomotion stability on dynamic terrains and significant reductions in development costs. This work provides a practical solution for deploying robust locomotion strategies in humanoid robots.
|
|
10:40-10:45, Paper TuAT20.3 | |
Enhancing Humanoid Robot Dynamics: An Optimization Framework for Shoulder Base Angle Adjustment |
|
Yoon, Jiwon | Korea University |
Lee, Sujin | Korea Institute of Science and Technology |
Ihn, Yong Seok | Korea Institute of Science and Technology |
Keywords: Humanoid Robot Systems, Optimization and Optimal Control, Dynamics
Abstract: Optimizing the initial angle of the shoulder's base frame is crucial for defining the workspace and enhancing the manipulation performance of humanoid robotic arms. Previous studies primarily emphasized geometric analyses, neglecting dynamic factors, which limits their practical applicability. This study presents a multi-metric optimization framework to enhance the dynamic performance of humanoid robotic arms by optimizing the shoulder’s initial angle. We formulate a cost function that incorporates torque efficiency, energy consumption, and overload ratio, utilizing the differential evolution (DE) algorithm for optimization. Furthermore, to address the limitations of conventional geometric workspace analysis, we introduce the concept of effective workspace, integrating dynamic constraints to quantitatively evaluate the effects of optimized shoulder angles. We validate the proposed framework in a hybrid simulation environment combining MuJoCo and RBDL, using the KIST humanoid and Unitree G1 robotic arms. Experimental results confirm that the optimized shoulder angles enhance torque distribution, expanding the effective workspace by 18.4% and 3.78% for the KIST and Unitree G1 robotic arms, respectively. These findings demonstrate that the proposed optimization framework enhances manipulation and dynamic performance as well as energy efficiency and system reliability, contributing to advancements in humanoid robotic arm design.
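Only the optimization layer is easy to sketch generically: SciPy's differential_evolution minimizing a weighted scalar cost over the shoulder base angle. The three cost terms below are toy stand-ins for the torque-efficiency, energy, and overload metrics the paper obtains from MuJoCo/RBDL simulation, and the weights are assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

# DE over a single design variable (shoulder base angle, degrees) with a
# fictitious weighted cost; the real metrics come from dynamics simulation.

def cost(x, w=(0.5, 0.3, 0.2)):
    a = np.deg2rad(x[0])
    torque_eff = 1.0 - 0.5 * np.cos(2 * a)          # invented metric shapes
    energy = 0.8 + 0.4 * np.sin(a) ** 2
    overload = 0.2 + 0.6 * np.abs(np.sin(a - 0.4))
    return w[0] * torque_eff + w[1] * energy + w[2] * overload

res = differential_evolution(cost, bounds=[(-45.0, 45.0)], seed=0)
print(f"optimized shoulder base angle: {res.x[0]:.1f} deg (cost {res.fun:.3f})")
```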
|
|
10:45-10:50, Paper TuAT20.4 | |
Multi-Objective Optimization of Humanoid Robot Hardware and Control for Multiple Tasks Via Genetic Algorithms |
|
Sartore, Carlotta | Istituto Italiano Di Tecnologia |
Traversaro, Silvio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Methods and Tools for Robot System Design, Optimization and Optimal Control
Abstract: The optimization of hardware and control of humanoid robots for multiple tasks is still an open challenge due to the competing objectives of different behaviors and the complexity of considering control architectures at the design level of a humanoid robot. In this work, we propose a unified multi-objective optimization framework that jointly optimizes both hardware and hierarchical control architectures of a humanoid robot to enhance performance in multiple tasks. Our method employs a Non-dominated Sorting Genetic Algorithm II (NSGA-II) to identify optimal robot morphology and control parameters while balancing trade-offs between diverse task requirements. By leveraging genetic algorithms, we enable the integration of discrete search spaces while overcoming the local minima limitations associated with classical nonlinear optimization techniques. Furthermore, the proposed approach directly incorporates the simulation results, ensuring that hardware optimization is performed considering the system dynamics. We validate our approach by optimizing a humanoid robot for two distinct tasks: walking and payload lifting, leveraging MuJoCo to evaluate the task performances. The proposed framework successfully identifies Pareto-optimal tradeoffs, providing a set of design solutions adaptable to different operational requirements.
|
|
10:50-10:55, Paper TuAT20.5 | |
End-Effectors Changer Design for Humanoids |
|
Roux, Julien | LIRMM, CNRS - Université De Montpellier, France |
Izard, Jean-Baptiste | Alted |
Tanguy, Arnaud | CNRS-UM LIRMM |
Kaminaga, Hiroshi | National Inst. of AIST |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Humanoid Robot Systems, Engineering for Robotic Systems, Mechanism Design
Abstract: Almost all (not to say all) existing humanoids have not been designed with the ability to change their end-effectors (head, feet, and hands) on-the-fly. Inspired by the tool-changing mechanisms used in robotic automation and manufacturing, we propose an end-effector changer mechanism suitable for humanoids. This letter explains why enabling humanoids with such a technology is important and why existing tool-changer mechanisms are not adapted to it. The proposed changer mechanism is not actuated; yet, it does not require human intervention to assist the change of end-effectors. We assess our mechanism through a comparative study with existing tool-changers and demonstrations with the HRP-4 humanoid. We claim that our idea could mark a new turn in the design of future humanoids and open perspectives on the modular sizing of robots. The proposed mechanism could also apply, to some extent, to animaloids.
|
|
10:55-11:00, Paper TuAT20.6 | |
Reinforcement Learning-Based Optimization of Humanoid Joint Motion Control Via Text-Driven Human Motion Mapping |
|
Xu, Zihan | Tongji University |
Hu, Mengxian | Tongji University |
Xiao, Kaiyan | Tongji University |
Fang, Qin | Tongji University |
Liu, Chengju | Tongji University |
Chen, Qijun | Tongji University |
Keywords: Humanoid Robot Systems, Reinforcement Learning, Motion Control
Abstract: Human motion retargeting for humanoid robots, transferring human motion data to robots for imitation, presents significant challenges but offers considerable potential for real-world applications. Traditionally, this process relies on human demonstrations captured through pose estimation or motion capture systems. In this paper, we explore a text-driven approach to obtain imitation motion data more flexibly and simply. To address the inherent discrepancies between the generated motion representations and the kinematic constraints of humanoid robots, we propose an angle signal network based on a norm-position and rotation loss (NPR Loss). It generates joint angles, which serve as inputs to a reinforcement learning-based whole-body motion control policy. The policy ensures tracking of the generated motions while maintaining the robot's stability during execution. Our experimental results demonstrate the efficacy of this approach, successfully transferring text-driven human motion to a real NAO humanoid robot.
|
|
11:00-11:05, Paper TuAT20.7 | |
FABRIC: Fabricating Bodily-Expressive Robots for Inclusive and Low-Cost Design |
|
Arabi, Abul Al | Texas A&M University |
Kim, Jeeeun | Texas A&M University |
Keywords: Humanoid Robot Systems, Software Tools for Robot Programming, Mechanism Design
Abstract: Sign language serves individuals with hearing impairments as a crucial communication mode operating through visual-manual means. While embodiment theory is well established across multiple fields, only limited research has worked to lower the barrier to physical embodiment for spatial perception and engagement. Embodied robots are often cost-prohibitive, and existing open-source robot fabrication packages are limited in their ability to fully address communication nuances, typically running only predefined programs. Reprogramming them for broader bodily interactions, such as gestures in other domains (e.g., construction), is nearly impossible without prior expertise. We introduce FABRIC, an end-to-end toolkit for fabricating and programming bodily language for unique human-robot interactions. The toolkit includes a fully 3D-printable robot, designed for consumer-grade FDM machinery, that learns from demonstration (LfD) to capture and translate users’ bodily expressions through its upper torso (arms and hands) movements. A visual programming interface enables appending or sequencing demonstrations from various sources, i.e., videos, cameras, and expandable word/phrase/sentence libraries.
|
|
TuAT21 |
101 |
Optimization and Optimal Control 1 |
Regular Session |
|
10:30-10:35, Paper TuAT21.1 | |
Sampling-Based Model Predictive Control Leveraging Parallelizable Physics Simulations |
|
Pezzato, Corrado | Delft University of Technology |
Salmi, Chadi | Delft University of Technology |
Trevisan, Elia | Delft University of Technology |
Spahn, Max | TU Delft |
Alonso-Mora, Javier | Delft University of Technology |
Hernández Corbato, Carlos | Delft University of Technology |
Keywords: Optimization and Optimal Control, Contact Modeling, Whole-Body Motion Planning and Control
Abstract: We present a sampling-based model predictive control method that uses a generic physics simulator as the dynamical model. In particular, we propose a Model Predictive Path Integral controller (MPPI) that employs the GPU-parallelizable IsaacGym simulator to compute the forward dynamics of the robot and environment. Since the simulator implicitly defines the dynamic model, our method is readily extendable to different objects and robots, allowing one to solve complex navigation and contact-rich tasks. We demonstrate the effectiveness of this method in several simulated and real-world settings, including mobile navigation with collision avoidance, non-prehensile manipulation, and whole-body control for high-dimensional configuration spaces. This is a powerful and accessible open-source tool to solve many contact-rich motion planning tasks.
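The core MPPI update is compact enough to sketch; the method's distinguishing feature is that rollout_cost would be evaluated by parallel IsaacGym rollouts, whereas the toy 1D integrator below only keeps this sketch self-contained and runnable.

```python
import numpy as np

# Minimal MPPI update: sample control perturbations, weight them by the
# exponentiated rollout cost, and average.

def mppi_step(u_nom, rollout_cost, n_samples=256, sigma=0.3, lam=1.0):
    noise = sigma * np.random.randn(n_samples, len(u_nom))
    costs = np.array([rollout_cost(u_nom + eps) for eps in noise])
    w = np.exp(-(costs - costs.min()) / lam)   # path-integral weights
    w /= w.sum()
    return u_nom + w @ noise                   # importance-weighted update

def rollout_cost(u_seq, dt=0.1):               # reach x = 1 with little effort
    x = np.cumsum(u_seq) * dt
    return (x[-1] - 1.0) ** 2 + 1e-3 * np.sum(u_seq ** 2)

u = np.zeros(20)
for _ in range(50):
    u = mppi_step(u, rollout_cost)
print("terminal position:", round(np.cumsum(u)[-1] * 0.1, 3))
```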
|
|
10:35-10:40, Paper TuAT21.2 | |
Interpreting and Improving Optimal Control Problems with Directional Corrections |
|
Barron, Trevor | Apple Inc |
Zhang, Xiaojing | Apple Inc |
Keywords: Optimization and Optimal Control, Machine Learning for Robot Control
Abstract: Many robotics tasks, such as path planning or trajectory optimization, are formulated as optimal control problems (OCPs). The key to obtaining high performance lies in the design of the OCP's objective function. In practice, the objective function consists of a set of individual components that must be carefully modeled and traded off such that the OCP has the desired solution. It is often challenging to balance multiple components to achieve the desired solution and to understand, when the solution is undesired, the impact of individual cost components. In this paper, we present a framework addressing these challenges based on the concept of directional corrections. Specifically, given the solution to an OCP that is deemed undesirable, and access to an expert providing the direction of change that would increase the desirability of the solution, our method analyzes the individual cost components for their consistency with the provided directional correction. This information can be used to improve the OCP formulation, e.g., by increasing the weight of consistent cost components, or reducing the weight of -- or even redesigning -- inconsistent cost components. We also show that our framework can automatically tune parameters of the OCP to achieve consistency with a set of corrections.
|
|
10:40-10:45, Paper TuAT21.3 | |
pi-MPPI: A Projection-Based Model Predictive Path Integral Scheme for Smooth Optimal Control of Fixed-Wing Aerial Vehicles |
|
Andrejev, Edvin Martin | University of Tartu |
Manoharan, Amith | University of Tartu |
Unt, Karl-Eerik | Estonian Aviation Academy |
Singh, Arun Kumar | University of Tartu |
Keywords: Optimization and Optimal Control, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Model Predictive Path Integral (MPPI) is a popular sampling-based Model Predictive Control (MPC) algorithm for nonlinear systems. It optimizes trajectories by sampling control sequences and averaging them. However, a key issue with MPPI is the non-smoothness of the optimal control sequence, leading to oscillations in systems like fixed-wing aerial vehicles (FWVs). Existing solutions use post-hoc smoothing, which fails to bound control derivatives. This paper introduces a new approach: we add a projection filter pi to minimally correct control samples, ensuring bounds on control magnitude and higher-order derivatives. The filtered samples are then averaged using MPPI, leading to our pi-MPPI approach. We minimize computational overhead by using a neural accelerated custom optimizer for the projection filter. pi-MPPI offers a simple way to achieve arbitrary smoothness in control sequences. While we focus on FWVs, this projection filter can be integrated into any MPPI pipeline. Applied to FWVs, pi-MPPI is easier to tune than the baseline, resulting in smoother, more robust performance.
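The projection-filter idea can be conveyed with a crude alternating-clipping stand-in: each sampled control sequence is corrected toward magnitude and rate bounds before the MPPI average. The paper formulates this as an optimization solved by a neural-accelerated custom solver; the loop below only approximates that behavior under assumed bounds.

```python
import numpy as np

# Approximate projection of a control sample onto magnitude and rate bounds
# by alternating clipping and re-integration (a stand-in, not pi-MPPI itself).

def project_sample(u, u_max, du_max, dt, iters=50):
    u = u.copy()
    for _ in range(iters):
        u = np.clip(u, -u_max, u_max)               # magnitude bound
        du = np.clip(np.diff(u) / dt, -du_max, du_max)
        u[1:] = u[0] + np.cumsum(du) * dt           # re-integrate bounded rates
    return u

rng = np.random.default_rng(0)
raw = rng.uniform(-3.0, 3.0, size=30)               # a jerky control sample
smooth = project_sample(raw, u_max=2.0, du_max=4.0, dt=0.1)
print("max rate after projection:", round(np.abs(np.diff(smooth) / 0.1).max(), 3))
```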
|
|
10:45-10:50, Paper TuAT21.4 | |
Parallel Transmission Aware Co-Design: Enhancing Manipulator Performance through Actuation-Space Optimization |
|
Kumar, Rohit | DFKI GmbH |
Boukheddimi, Melya | DFKI GmbH |
Mronga, Dennis | University of Bremen, German Research Center for Artificial Inte |
Kumar, Shivesh | DFKI GmbH |
Kirchner, Frank | University of Bremen |
Keywords: Optimization and Optimal Control, Parallel Robots, Robotics and Automation in Agriculture and Forestry
Abstract: In robotics, structural design and behavior optimization have long been considered separate processes, resulting in the development of systems with limited capabilities. Recently, co-design methods have gained popularity, where bi-level formulations are used to simultaneously optimize the robot design and behavior for specific tasks. However, most implementations assume a serial or tree-type model of the robot, overlooking the fact that many robot platforms incorporate parallel mechanisms. In this paper, we present a first co-design formulation that explicitly incorporates parallel coupling constraints into the dynamic model of the robot. In this framework, an outer optimization loop focuses on the design parameters, in our case the transmission ratios of a parallel belt-driven manipulator, which map the desired torques from the joint space to the actuation space. An inner loop performs trajectory optimization in the actuation space, thus exploiting the entire dynamic range of the manipulator. We compare the proposed method with a conventional co-design approach based on a simplified tree-type model. By taking advantage of the actuation space representation, our approach leads to a significant increase in dynamic payload capacity compared to the conventional co-design implementation.
|
|
10:50-10:55, Paper TuAT21.5 | |
Budget-Optimal Multi-Robot Layout Design for Box Sorting |
|
Zeng, Peiyu | ETH Zurich |
Huang, Yijiang | ETH Zurich |
Huber, Simon | ETH Zürich |
Coros, Stelian | ETH Zurich |
Keywords: Multi-Robot Systems, Optimization and Optimal Control, Logistics
Abstract: Robotic systems are routinely used in the logistics industry to enhance operational efficiency, but the design of robot workspaces remains a complex and manual task, which limits the system's flexibility to changing demands. This paper aims to automate robot workspace design by proposing a computational framework to generate a budget-minimizing layout by selectively placing stationary robots on a floor grid to sort packages from given input and output locations. Finding a good layout that minimizes the hardware budget while ensuring motion feasibility is a challenging combinatorial problem with nonconvex motion constraints. We propose a new optimization-based approach that models layout planning as a subgraph optimization problem subject to network flow constraints. Our core insight is to abstract away motion constraints from the layout optimization by precomputing a kinematic reachability graph and then extract the optimal layout on this ground graph. We validate the motion feasibility of our approach by proposing a simple task assignment and motion planning technique. We benchmark our algorithm on problems with various grid resolutions and number of outputs and show improvements in memory efficiency over a heuristic search algorithm. In addition, we demonstrate that our algorithm can be extended to handle various types of robot manipulators and conveyor belts, box payload constraints, and cost assignments.
|
|
10:55-11:00, Paper TuAT21.6 | |
LoL-NMPC: Low-Level Dynamics Integration in Nonlinear Model Predictive Control for Unmanned Aerial Vehicles |
|
Gupta, Parakh M. | Czech Technical University in Prague |
Procházka, Ondřej | Czech Technical University in Prague |
Hřebec, Jan | CTU in Prague |
Novosad, Matej | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Keywords: Optimization and Optimal Control, Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: In this paper, we address the problem of tracking high-speed agile trajectories for Unmanned Aerial Vehicles (UAVs), where model inaccuracies can lead to large tracking errors. Existing Nonlinear Model Predictive Controller (NMPC) methods typically neglect the dynamics of low-level flight controllers, such as the underlying PID controllers present in many flight stacks, which results in suboptimal tracking performance at high speeds and accelerations. To this end, we propose a novel NMPC formulation, LoL-NMPC, which explicitly incorporates low-level controller dynamics and motor dynamics in order to minimize trajectory tracking errors while maintaining computational efficiency. By leveraging linear constraints inside the low-level dynamics, our approach inherently accounts for actuator constraints without requiring additional reallocation strategies. The proposed method is validated in both simulation and real-world experiments, demonstrating improved tracking accuracy and robustness at speeds up to 98.57 km/h and accelerations of 3.5 g. Our results show an average 21.97% reduction in trajectory tracking error over the standard NMPC formulation, with LoL-NMPC maintaining real-time feasibility at 100 Hz on an embedded ARM-based flight computer.
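The key modeling choice, augmenting the prediction model with low-level controller and motor dynamics, can be sketched with a first-order lag from commanded to achieved body rate (the time constant below is an assumed value, not one reported in the paper).

```python
import numpy as np

# Toy 1D chain with an actuator lag: commanded rate -> achieved rate -> vel -> pos.
# An NMPC model that ignores tau over-predicts how fast commands take effect.

def step(state, cmd_rate, dt=0.01, tau=0.05):
    pos, vel, rate = state
    rate += dt * (cmd_rate - rate) / tau   # low-level loop + motor lag
    vel += dt * rate
    pos += dt * vel
    return np.array([pos, vel, rate])

s = np.zeros(3)
for _ in range(20):                        # 0.2 s of a unit rate command
    s = step(s, cmd_rate=1.0)
print("achieved rate after 0.2 s:", round(s[2], 3), "(a lag-free model assumes 1.0)")
```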
|
|
11:00-11:05, Paper TuAT21.7 | |
DBaS-Log-MPPI: Efficient and Safe Trajectory Optimization Via Barrier States |
|
Wang, Fanxin | Xi'an Jiaotong-Liverpool University (XJTLU) |
Jiang, Haolong | Xi'an Jiaotong-Liverpool University (XJTLU) |
Wan, Wenbin | UNM |
Tao, Chuyuan | University of Illinois, Urbana and Champaign |
Cheng, Yikun | University of Illinois at Urbana-Champaign |
Keywords: Optimization and Optimal Control, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Optimizing trajectory costs for nonlinear control systems remains a significant challenge. Model Predictive Control (MPC), particularly sampling-based approaches such as the Model Predictive Path Integral (MPPI) method, has recently demonstrated considerable success by leveraging parallel computing to efficiently evaluate numerous trajectories. However, MPPI often struggles to balance safe navigation in constrained environments with effective exploration in open spaces, leading to infeasibility in cluttered conditions. To address these limitations, we propose DBaS-Log-MPPI, a novel algorithm that integrates Discrete Barrier States (DBaS) to ensure safety while enabling adaptive exploration with enhanced feasibility. Our method is efficiently validated through three simulation missions and one real-world experiment, involving a 2D quadrotor and a ground vehicle navigating through cluttered obstacles. We demonstrate that our algorithm surpasses both Vanilla MPPI and Log-MPPI, achieving higher success rates, lower tracking errors, and a conservative average speed.
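A barrier state in its simplest discrete flavor adds a penalty proportional to 1/h(x) inside the rollout cost, so MPPI samples are steered away from the constraint boundary well before they violate it. The single-integrator dynamics, obstacle, and gains below are made-up illustrations.

```python
import numpy as np

# Toy barrier-state-penalized rollout cost for an MPPI-style planner.

def h(x, obstacle=np.array([2.0, 0.0]), radius=0.8):
    return float(np.sum((x - obstacle) ** 2) - radius ** 2)   # >0 means safe

def rollout_cost(x0, u_seq, goal=np.array([4.0, 0.0]), dt=0.1, w_bas=0.05):
    x, cost = x0.astype(float), 0.0
    for u in u_seq:
        x = x + dt * np.asarray(u)
        hx = h(x)
        if hx <= 0.0:
            return np.inf                 # sample crossed the boundary
        cost += np.sum((x - goal) ** 2) + w_bas / hx   # barrier-state penalty
    return cost

straight = np.tile([1.0, 0.0], (40, 1))   # drives through the obstacle
detour = np.tile([1.0, 0.5], (40, 1))     # skirts around it
print("straight:", rollout_cost(np.zeros(2), straight),
      "| detour:", round(rollout_cost(np.zeros(2), detour), 1))
```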
|
|
TuAT22 |
102A |
Robotics and Automation in Agriculture and Forestry 1 |
Regular Session |
|
10:30-10:35, Paper TuAT22.1 | |
Efficient and Safe Trajectory Planning for Autonomous Agricultural Vehicle Headland Turning in Cluttered Orchard Environments |
|
Wei, Peng | University of California, Davis |
Lu, Wenwu | ZJU-Hangzhou Global Scientific and Technological Innovation Cent |
Zhu, Yuankai | University of California, Davis |
Vougioukas, Stavros | UC Davis |
Fei, Zhenghao | Zhejiang University |
Ge, Zhikang | Zhejiang University |
Peng, Chen | Zhejiang University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Motion and Path Planning
Abstract: Autonomous agricultural vehicles (AAVs), including field robots and autonomous tractors, are becoming essential in modern farming by improving efficiency and reducing labor costs. A critical task in AAV operations is headland turning between crop rows. This task is challenging in orchards with limited headland space, irregular boundaries, operational constraints, and static obstacles. While traditional trajectory planning methods work well in arable farming, they often fail in cluttered orchard environments. This letter presents a novel trajectory planner that enhances the safety and efficiency of AAV headland maneuvers, leveraging advancements in autonomous driving. Our approach includes an efficient front-end algorithm and a high-performance back-end optimization. Applied to vehicles with various implements, it outperforms state-of-the-art methods in both standard and challenging orchard fields. This work bridges agricultural and autonomous driving technologies, facilitating a broader adoption of AAVs in complex orchards.
|
|
10:35-10:40, Paper TuAT22.2 | |
SemP-NBV: Semantic-Aware Predictive Next-Best-View for Autonomous Plant 3D Reconstruction |
|
Li, Xingjian | North Carolina State University |
He, Weilong | North Carolina State University |
Park, Jeremy | North Carolina State University |
Reberg-Horton, s. Chris | North Carolina State University |
Mirsky, Steven | USDA ARS |
Lobaton, Edgar | North Carolina State University |
Xiang, Lirong | Cornell University |
Keywords: Robotics and Automation in Agriculture and Forestry, Reactive and Sensor-Based Planning, Simulation and Animation
Abstract: Three-dimensional (3D) plant phenotyping enables comprehensive trait analysis for evaluating plant growth in precision agriculture. Current phenotyping frameworks are low-throughput due to frequent manual intervention and inefficiencies in handling large volumes of repetitive data. Existing view planners for phenotyping are limited to static sampling methods or a narrow focus on specific plant organs, restricting their utility in capturing the full complexity of plant structures. To address these limitations, we propose SemP-NBV, a novel semantic-aware predictive next-best-view approach for sample-efficient 3D plant phenotyping. Evaluated in a photorealistic simulator with 12 plant categories, SemP-NBV achieves 15.3% more observed points than an even-space sampler at 8 images on average, while matching the reconstruction quality of 20 even-space images. We show that existing state-of-the-art predictive planners designed for artificial structures fail to improve over the even-space sampler on complex plant structures. Furthermore, our approach generates semantic information during reconstruction, reducing the need for post hoc semantic labeling and streamlining the 3D phenotyping workflow. The project is available at https://github.com/ARLabXiang/SemP-NBV.
|
|
10:40-10:45, Paper TuAT22.3 | |
Force Aware Branch Manipulation to Assist Agricultural Tasks |
|
Rijal, Madhav | West Virginia University |
Shrestha, Rashik | West Virginia University |
Smith, Trevor | West Virginia University |
Gu, Yu | West Virginia University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Manipulation Planning
Abstract: This study presents a methodology to safely manipulate branches to aid various agricultural tasks. Humans in a real agricultural environment often manipulate branches to perform agricultural tasks effectively, but current agricultural robots lack this capability. This proposed strategy to manipulate branches can aid in different precision agriculture tasks, such as fruit picking in dense foliage, pollinating flowers under occlusion, and moving overhanging vines and branches for navigation. The proposed method modifies RRT* to plan a path that satisfies the branch geometric constraints and obeys branch deformable characteristics. Re-planning is done to obtain a path that helps the robot exert force within a desired range so that branches are not damaged during manipulation. Experimentally, this method achieved a success rate of 78% across 50 trials, successfully moving a branch from the starting point to the target region.
|
|
10:45-10:50, Paper TuAT22.4 | |
3D Hierarchical Panoptic Segmentation in Real Orchard Environments across Different Sensors |
|
Sodano, Matteo | Photogrammetry and Robotics Lab, University of Bonn |
Magistri, Federico | University of Bonn |
Marks, Elias Ariel | University of Bonn |
Aboul Hosn, Fares | University of Bonn |
Zurbayev, Aibek | University of Bonn |
Marcuzzi, Rodrigo | University of Bonn |
Malladi, Meher Venkata Ramakrishna | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Data Sets for Robotic Vision
Abstract: Crop yield estimation is a relevant problem in agriculture, because an accurate yield estimate can support farmers’ decisions on harvesting or precision intervention. Robots can help to automate this process. To do so, they need to be able to perceive the surrounding environment to identify target objects such as trees and plants. In this paper, we introduce a novel approach to address the problem of hierarchical panoptic segmentation of apple orchards on 3D data from different sensors. Our approach is able to simultaneously provide semantic segmentation, instance segmentation of trunks and fruits, and instance segmentation of trees (a trunk with its fruits). This allows us to identify relevant information such as individual plants, fruits, and trunks, and to capture the relationships among them, such as precisely estimating the number of fruits associated with each tree in an orchard. To efficiently evaluate our approach for hierarchical panoptic segmentation, we provide a dataset designed specifically for this task. Our dataset is recorded in Bonn, Germany, in a real apple orchard with a variety of sensors, spanning from a terrestrial laser scanner to an RGB-D camera mounted on different robot platforms. The experiments show that our approach surpasses state-of-the-art approaches in 3D panoptic segmentation in the agricultural domain, while also providing full hierarchical panoptic segmentation. Our dataset is publicly available at https://www.ipb.uni-bonn.de/data/hops/. The open-source implementation of our approach is available at https://github.com/PRBonn/hapt3D.
|
|
10:50-10:55, Paper TuAT22.5 | |
Optimal Scheduling of a Dual-Arm Robot for Efficient Strawberry Harvesting in Plant Factories |
|
Zhu, Yuankai | University of California, Davis |
Lu, Wenwu | ZJU-Hangzhou Global Scientific and Technological Innovation Cent |
Ren, Guoqiang | Zhejiang University |
Gao, Haiming | Zhejiang University |
Vougioukas, Stavros | UC Davis |
Ying, Yibin | Zhejiang University |
Peng, Chen | Zhejiang University |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination, Field Robots
Abstract: Plant factory cultivation is widely recognized for its ability to optimize resource use and boost crop yields. To further increase the efficiency in these environments, we propose a general mixed-integer linear programming (MILP) framework that systematically schedules and coordinates dual-arm harvesting tasks, minimizing the overall harvesting makespan based on pre-mapped fruit locations. Specifically, we focus on a specialized dual-arm harvesting robot and employ pose coverage analysis of its end effector to maximize picking reachability. Additionally, we compare the performance of the dual-arm configuration with that of a single-arm vehicle, demonstrating that the dual-arm system can nearly double efficiency when fruit densities are roughly equal on both sides. Extensive simulations show a 10–20% increase in throughput and a significant reduction in the number of stops compared to nonoptimized methods. These results underscore the advantages of an optimal scheduling approach in improving the scalability and efficiency of robotic harvesting in plant factories.
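The scheduling question (independent of the paper's full MILP, which also sequences stops and models each arm's reachable poses) can be illustrated by brute-forcing the split of pre-mapped fruits between two arms to minimize the makespan; the per-fruit handling times below are invented.

```python
import itertools
import numpy as np

# Brute-force two-arm assignment minimizing the makespan. A MILP would scale
# to realistic fruit counts and add sequencing/reachability constraints.

times = np.array([4.2, 3.1, 5.0, 2.4, 3.8, 4.5])     # seconds per fruit (toy)
best_makespan, best_split = np.inf, None
for assignment in itertools.product([0, 1], repeat=len(times)):
    a = np.array(assignment)
    makespan = max(times[a == 0].sum(), times[a == 1].sum())
    if makespan < best_makespan:
        best_makespan, best_split = makespan, a
print(f"makespan {best_makespan:.1f} s with arm assignment {best_split}")
```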
|
|
10:55-11:00, Paper TuAT22.6 | |
JPDS-NN: Reinforcement Learning-Based Dynamic Task Allocation for Agricultural Vehicle Routing Optimization |
|
Fan, Yixuan | Tsinghua University |
Xu, Haotian | Beijing Institute of Astronautical Systems Engineering |
Liu, Mengqiao | Tsinghua University |
Zhuo, Qing | TSINGHUA University |
Zhang, Tao | Tsinghua University |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination, Reinforcement Learning
Abstract: The Entrance Dependent Vehicle Routing Problem (EDVRP) is a variant of the Vehicle Routing Problem (VRP) where the scale of cities influences routing outcomes, necessitating consideration of their entrances. This paper addresses EDVRP in agriculture, focusing on multi-parameter vehicle planning for irregularly shaped fields. To address the limitations of traditional methods, such as heuristic approaches, which often overlook field geometry and entrance constraints, we propose a Joint Probability Distribution Sampling Neural Network (JPDS-NN) to effectively solve the EDVRP. The network uses an encoder-decoder architecture with graph transformers and attention mechanisms to model routing as a Markov Decision Process, and is trained via reinforcement learning for efficient and rapid end-to-end planning. Experimental results indicate that JPDS-NN reduces travel distances by 48.4–65.4%, lowers fuel consumption by 14.0–17.6%, and computes two orders of magnitude faster than baseline methods, while demonstrating 15–25% superior performance in dynamic arrangement scenarios. Ablation studies validate the necessity of cross-attention and pre-training. The framework enables scalable, intelligent routing for large-scale farming under dynamic constraints.
|
|
11:00-11:05, Paper TuAT22.7 | |
Adaptive Viewpoint Selection for Tomato Truss Localization Via Polytope Hypotheses |
|
van den Brandt, Gijs | Eindhoven University of Technology |
Senden, Jordy Patrick Franciscus | Eindhoven University of Technology |
Van Esch, Hilde | Eindhoven University of Technology |
Torta, Elena | Eindhoven University of Technology |
van de Molengraft, Marinus Jacobus Gerardus | University of Technology Eindhoven |
Keywords: Computational Geometry, Visual Servoing, Agricultural Automation
Abstract: Robotization is considered a key solution to labor shortages in the agri-food industry. However, deploying robots in natural environments is challenging due to unpredictable factors such as plant variation and occlusions. This paper focuses on the localization of tomato trusses for autonomous harvesting by servoing a robot-mounted camera to different viewpoints. We build on previous work where the robot is provided with prior knowledge of the tomato plant. Specifically, the geometric relations between the trusses are modeled as ranges, which reflect uncertainty. Our main contribution is an approach that represents this uncertainty as polytope volumes. Polytopes enable scalable reasoning that facilitates likelihood estimation for viewpoint selection. Our method first constructs polytope hypotheses regarding the truss locations based on prior plant knowledge. It then refines the polytope shapes using Bayesian updates based on camera observations. Finally, the polytopes are used to select the next viewpoint that maximizes the chance of observing a new tomato truss. Experiments show that polytope-based viewpoint selection speeds up truss localization compared to earlier methods, advancing robotic harvesting.
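The viewpoint-scoring idea can be illustrated with a heavily simplified model. In the sketch below, axis-aligned boxes stand in for the paper's polytope hypotheses, each viewpoint's visible region is approximated as a box too, and all coordinates and priors are invented for illustration.

```python
# Illustrative sketch only: score viewpoints by the probability-weighted
# fraction of hypothesis volume they would observe.
import numpy as np

def box_overlap_volume(a_min, a_max, b_min, b_max):
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint)."""
    extent = np.minimum(a_max, b_max) - np.maximum(a_min, b_min)
    return float(np.prod(np.clip(extent, 0.0, None)))

# Hypothetical truss-location hypotheses: (box_min, box_max, prior probability)
hypotheses = [
    (np.array([0.0, 0.0, 0.5]), np.array([0.2, 0.2, 0.9]), 0.6),
    (np.array([0.3, 0.1, 0.4]), np.array([0.5, 0.3, 0.8]), 0.4),
]
# Candidate viewpoints, each with a box-approximated visible region
viewpoints = {
    "left":  (np.array([-0.1, -0.1, 0.3]), np.array([0.3, 0.3, 1.0])),
    "right": (np.array([0.25, 0.0, 0.3]),  np.array([0.6, 0.4, 1.0])),
}

def expected_observation(view):
    v_min, v_max = viewpoints[view]
    score = 0.0
    for h_min, h_max, p in hypotheses:
        visible = box_overlap_volume(h_min, h_max, v_min, v_max)
        total = box_overlap_volume(h_min, h_max, h_min, h_max)  # box volume
        score += p * visible / total
    return score

best = max(viewpoints, key=expected_observation)
print(best, round(expected_observation(best), 3))
```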
|
|
TuAT23 |
102B |
Sensor Fusion 1 |
Regular Session |
|
10:30-10:35, Paper TuAT23.1 | |
MT-Fusion: Multi-Task Learning for Degradation-Aware Infrared and Visible Image Fusion |
|
Gao, Yuyang | Waseda University |
Zhang, Yifei | Waseda University |
Xie, Jianan | Waseda University |
Xu, Zhen | Waseda University |
Shi, Mengyao | Waseda University |
Hashimoto, Kenji | Waseda University |
Keywords: Sensor Fusion, AI-Based Methods
Abstract: The effective fusion of infrared and visible images can enhance environment perception during robot rescue missions by combining complementary information from both sensors. However, most existing fusion methods are developed for images captured under normal conditions, which limits their performance in real-world rescue scenarios where images often suffer from diverse degradations such as haze, low light, noise, and low contrast. To address this challenge, we propose MT-Fusion, a novel deep learning workflow based on multi-task learning (MTL) that targets multiple specific degradation scenarios. MT-Fusion incorporates specific encoder modules for processing various degraded images, a degradation attention mechanism for fusion, and a shared decoder for image reconstruction. Extensive experiments demonstrate that our proposed scenario-specific image fusion strategy offers clear advantages for robot perception in both image fusion performance and degradation handling. For the inference stage, we propose a selector that automatically categorizes the degradation of the input and activates the corresponding encoder. MT-Fusion thus provides a practical solution for enhancing robot rescue operations in challenging environments.
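The encoder-per-degradation / shared-decoder structure can be summarized in a few lines. The PyTorch sketch below is a structural illustration only: the layer sizes, the channel-attention form, and the degradation labels are assumptions, not the authors' network.

```python
# Structural sketch of a degradation-aware multi-task fusion network.
import torch
import torch.nn as nn

class MTFusionSketch(nn.Module):
    def __init__(self, degradations=("haze", "low_light", "noise")):
        super().__init__()
        # One lightweight encoder per degradation scenario; input is the
        # stacked infrared + visible grayscale pair (2 channels).
        self.encoders = nn.ModuleDict({
            d: nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
            for d in degradations
        })
        # Degradation attention: per-channel weights from pooled features.
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, 32), nn.Sigmoid())
        # Shared decoder reconstructs the single fused image.
        self.decoder = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, ir, vis, degradation):
        feat = self.encoders[degradation](torch.cat([ir, vis], dim=1))
        feat = feat * self.attn(feat)[..., None, None]   # reweight channels
        return self.decoder(feat)

model = MTFusionSketch()
fused = model(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64), "haze")
print(fused.shape)  # torch.Size([1, 1, 64, 64])
```

At inference, the proposed selector would play the role of choosing the `degradation` key automatically.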
|
|
10:35-10:40, Paper TuAT23.2 | |
CalibMutiL: Online Calibration of LiDAR-Camera Based on Multi-Level Visual Feature Fusion |
|
Zhang, GuangHui | Xinjiang University |
Eksan, Firkat | Tsinghua University |
Suleyman, Eliyas | School of Computing Science, University of Glasgow |
Xie, Bangquan | Great Bay University |
Li, Fengze | Chang'an University |
Hamdulla, Askar | Xinjiang University |
Keywords: Sensor Fusion, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Multi-sensor fusion is a key technology in the fields of autonomous driving and robotics. Traditional offline multi-sensor calibration methods rely on manual operations and fail to meet real-time requirements, while recent online calibration techniques have limited generalization capabilities. This paper proposes CalibMutiL, an end-to-end calibration network that departs from conventional deep feature fusion by leveraging multi-level RGB image features to guide point cloud alignment. CalibMutiL introduces a Multi-level Fusion module (MLF) that effectively utilizes the rich visual features of the image. In addition, we regard the alignment process as a sequence prediction problem and further improve performance through an Iterative Refinement module (IRM). Evaluation on the KITTI odometry and raw datasets demonstrates an average calibration error of 0.81 cm and 0.09°. Generalization tests resulted in errors of 4.24 cm and 0.13°, outperforming existing methods. Our implementation will be publicly available at https://github.com/VIP-G/CalibMutiL.
|
|
10:40-10:45, Paper TuAT23.3 | |
IMMNN: Robust Wireless Electromagnetic-Inertial Fusion Tracking Via Learning an Adaptive IMM |
|
Lin, Sichao | Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences |
Wang, Zengwei | Technical University of Munich |
Sun, Yilun | Technical University of Munich |
Hao, Guangjun | Fujian Agriculture and Forestry University |
Lian, Yanglin | Quanzhou Institute of Equipment Manufacturing, Fujian Institute |
Xia, Xuke | Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences |
Dai, Houde | Haixi Institutes, Chinese Academy of Sciences |
Lueth, Tim C. | Technical University of Munich |
Keywords: Sensor Fusion, AI-Based Methods, Deep Learning Methods
Abstract: Wireless Electromagnetic Tracking (WEMT) enables non-line-of-sight (NLoS) pose estimation in robotics but faces accuracy limitations from restricted operational range and environmental interference. This paper proposes a WEMT-inertial fusion system enhanced by a learning-based Interacting Multiple Model (IMMNN) to address these challenges. The framework integrates a multi-transmitter array with a WEMT-IMU fusion tracker, leveraging IMMNN to mitigate performance degradation caused by nonlinear spatial noise and motion uncertainty in dynamic, array-based environments. IMMNN employs a graph attention network to dynamically model spatial correlations among array units, adaptively optimizing state transition probabilities across motion models. A gated recurrent framework further enhances robustness by analyzing residual sequences to suppress transient noise and outliers. Experimental results demonstrate that the proposed system achieves a root-mean-square error (RMSE) of 30.4 mm over an expanded 1.9×1.9 m² operational area. The graph attention mechanism enables adaptive spatial noise suppression and ensures stable tracking under rapid motion and electromagnetic disturbances. By synergizing model-driven filtering with data-driven learning, IMMNN effectively improves accuracy and robustness, advancing high-precision WEMT solutions for complex robotic applications.
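At the core of any IMM variant is the model-probability update. The snippet below gives the textbook form in NumPy; in IMMNN the transition matrix would be produced by the learned graph-attention module rather than fixed, and the numbers here are placeholders.

```python
# Classic IMM model-probability update (textbook form, for intuition).
import numpy as np

def imm_model_update(mu, Pi, likelihoods):
    """mu: prior model probabilities (M,); Pi[i, j] = P(model j | model i);
    likelihoods: measurement likelihood under each model's filter (M,)."""
    c = Pi.T @ mu                      # predicted model probabilities
    mu_new = likelihoods * c           # Bayes update with filter likelihoods
    return mu_new / mu_new.sum()

mu = np.array([0.7, 0.3])                       # e.g. slow motion vs. rapid motion
Pi = np.array([[0.95, 0.05], [0.10, 0.90]])     # stand-in transition matrix
print(imm_model_update(mu, Pi, np.array([0.2, 1.4])))
```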
|
|
10:45-10:50, Paper TuAT23.4 | |
AKF-LIO: LiDAR-Inertial Odometry with Gaussian Map by Adaptive Kalman Filter |
|
Xie, Xupeng | The Hong Kong University of Science and Technology |
Geng, Ruoyu | Hong Kong University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Zhou, Boyu | Southern University of Science and Technology |
Keywords: Localization, Sensor Fusion, SLAM
Abstract: Existing LiDAR-Inertial Odometry (LIO) systems typically use sensor-specific or environment-dependent measurement covariances during state estimation, leading to laborious parameter tuning and suboptimal performance in challenging conditions (e.g., sensor degeneracy, noisy observations). Therefore, we propose an Adaptive Kalman Filter (AKF) framework that dynamically estimates time-varying noise covariances of LiDAR and Inertial Measurement Unit (IMU) measurements, enabling context-aware confidence weighting between sensors. During LiDAR degeneracy, the system prioritizes IMU data while suppressing contributions from unreliable inputs like moving objects or noisy point clouds. Furthermore, a compact Gaussian-based map representation is introduced to model environmental planarity and spatial noise. A correlated registration strategy ensures accurate plane normal estimation via pseudo-merge, even in unstructured environments like forests. Extensive experiments validate the robustness of the proposed system across diverse environments, including dynamic scenes and geometrically degraded scenarios. Our method achieves reliable localization results across all MARS-LVIG sequences and ranks 8th on the KITTI Odometry Benchmark. The code will be released at https://github.com/xpxie/AKF-LIO.git.
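One common way to make a Kalman filter "adaptive" is to estimate the measurement noise covariance from a sliding window of innovations. The sketch below shows that classical heuristic (for intuition; not necessarily AKF-LIO's exact estimator, and the dimensions and window are placeholders).

```python
# Innovation-based measurement-covariance adaptation, a standard AKF heuristic.
import numpy as np

def adapt_R(innovations, H, P):
    """innovations: recent residuals z - H x, shape (N, m); H: measurement
    Jacobian; P: state covariance. R ~ innovation covariance minus H P H^T."""
    C = np.cov(np.asarray(innovations).T)          # empirical innovation cov
    R = np.atleast_2d(C - H @ P @ H.T)
    # Project back to positive semi-definite to absorb sampling noise.
    w, V = np.linalg.eigh(R)
    return V @ np.diag(np.clip(w, 1e-9, None)) @ V.T

H = np.eye(2)
P = 0.01 * np.eye(2)
window = np.random.default_rng(0).normal(0.0, 0.3, size=(50, 2))
print(adapt_R(window, H, P))
```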
|
|
10:50-10:55, Paper TuAT23.5 | |
Opti-Acoustic Scene Reconstruction in Highly Turbid Underwater Environments |
|
Collado-Gonzalez, Ivana | Stevens Institute of Technology |
McConnell, John | United States Naval Academy |
Szenher, Paul | Stevens Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Marine Robotics, Sensor Fusion, Mapping
Abstract: Scene reconstruction is an essential capability for underwater robots navigating in close proximity to structures. Monocular vision-based reconstruction methods are unreliable in turbid waters and lack depth scale information. Sonars are robust to turbid water and non-uniform lighting conditions; however, they have low resolution and elevation ambiguity. This work proposes a real-time opti-acoustic scene reconstruction method that is specially optimized to work in turbid water. Our strategy avoids having to identify point features in visual data and instead identifies regions of interest in the data. We then match relevant regions in the image to corresponding sonar data. A reconstruction is obtained by leveraging range data from the sonar and elevation data from the camera image. Experimental comparisons against other vision-based and sonar-based approaches at varying turbidity levels, and field tests conducted in marina environments, validate the effectiveness of the proposed approach. We have made our code open-source to facilitate reproducibility and encourage community engagement.
|
|
10:55-11:00, Paper TuAT23.6 | |
BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure |
|
Cai, Haoxin | Guangdong University of Technology |
Yuan, Shenghai | Nanyang Technological University |
Li, Xinyi | Guangdong University of Technology |
Guo, Junfeng | Guangdong University of Technology |
Liu, Jianqi | Guangdong University of Technology |
Keywords: Sensor Fusion, Computer Vision for Automation, SLAM
Abstract: This work introduces BEV-LIO(LC), a novel LiDAR-Inertial Odometry (LIO) framework that combines Bird's Eye View (BEV) image representations of LiDAR data with geometry-based point cloud registration and incorporates loop closure (LC) through BEV image features. By normalizing point density, we project LiDAR point clouds into BEV images, thereby enabling efficient feature extraction and matching. A lightweight convolutional neural network (CNN) based feature extractor is employed to extract distinctive local and global descriptors from the BEV images. Local descriptors are used to match BEV images with FAST keypoints for reprojection error construction, while global descriptors facilitate loop closure detection. Reprojection error minimization is then integrated with point-to-plane registration within an iterated Extended Kalman Filter (iEKF). In the back-end, global descriptors are used to create a KD-tree-indexed keyframe database for accurate loop closure detection. When a loop closure is detected, Random Sample Consensus (RANSAC) computes a coarse transform from BEV image matching, which serves as the initial estimate for Iterative Closest Point (ICP). The refined transform is subsequently incorporated into a factor graph along with odometry factors, improving the global consistency of localization. Extensive experiments conducted in various scenarios with different LiDAR types demonstrate that BEV-LIO(LC) outperforms state-of-the-art methods, achieving competitive localization accuracy. Our code and video can be found at https://github.com/HxCa1/BEV-LIO-LC.
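Projecting a LiDAR scan into a density-normalized BEV image, the first step of a pipeline like this one, can be sketched in a few lines of NumPy. The grid extent, resolution, and log normalization below are hypothetical choices, not the paper's parameters.

```python
# Minimal density-normalized BEV projection of a LiDAR point cloud.
import numpy as np

def bev_image(points, x_range=(-40, 40), y_range=(-40, 40), res=0.25):
    """points: (N, 3) array in the sensor frame. Returns an 8-bit BEV image."""
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    u = ((points[:, 0] - x_range[0]) / res).astype(int)
    v = ((points[:, 1] - y_range[0]) / res).astype(int)
    ok = (u >= 0) & (u < H) & (v >= 0) & (v < W)
    img = np.zeros((H, W), dtype=np.float32)
    np.add.at(img, (u[ok], v[ok]), 1.0)            # per-cell point counts
    img = np.log1p(img)                            # normalize point density
    return (255 * img / max(img.max(), 1e-6)).astype(np.uint8)

cloud = np.random.default_rng(1).uniform(-40, 40, size=(100_000, 3))
print(bev_image(cloud).shape)  # (320, 320)
```

On such an image, standard 2D machinery (e.g., FAST keypoint detection) becomes directly applicable to LiDAR data.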
|
|
11:00-11:05, Paper TuAT23.7 | |
What Really Matters for Robust Multi-Sensor HD Map Construction? |
|
Hao, Xiaoshuai | Samsung Research China - Beijing (SRC-B) |
Zhao, Yuting | Institute of Automation, Chinese Academy of Sciences |
Ji, Yuheng | Chinese Academy of Science |
Dai, Luanyuan | Nanjing University of Science and Technology |
Hao, Peng | University of Chinese Academy of Sciences |
Li, Dingzhe | Beihang University |
Cheng, Shuai | China North Artificial Intelligent & Innovation Research Institute |
Yin, Rong | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Sensor Fusion, Autonomous Vehicle Navigation, Deep Learning for Visual Perception
Abstract: High-definition (HD) map construction methods are crucial for providing precise and comprehensive static environmental information, which is essential for autonomous driving systems. While Camera-LiDAR fusion techniques have shown promising results by integrating data from both modalities, existing approaches primarily focus on improving model accuracy, often neglecting the robustness of perception models—a critical aspect for real-world applications. In this paper, we explore strategies to enhance the robustness of multi-modal fusion methods for HD map construction while maintaining high accuracy. We propose three key components: data augmentation, a novel multi-modal fusion module, and a modality dropout training strategy. These components are evaluated on a challenging dataset containing 13 types of multi-sensor corruption. Experimental results demonstrate that our proposed modules significantly enhance the robustness of baseline methods. Furthermore, our approach achieves state-of-the-art performance on the clean validation set of the NuScenes dataset. Our findings provide valuable insights for developing more robust and reliable HD map construction models, advancing their applicability in real-world autonomous driving scenarios.
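Of the three components, modality dropout is the simplest to illustrate: during training, one sensor branch is randomly zeroed so the fusion head cannot over-rely on either modality. The PyTorch sketch below is a generic rendition with hypothetical probabilities and feature shapes, not the paper's implementation.

```python
# Generic modality-dropout training trick for camera-LiDAR fusion.
import torch

def modality_dropout(cam_feat, lidar_feat, p=0.25):
    """With probability p, drop the camera branch; otherwise, with
    probability p, drop the LiDAR branch (never both)."""
    if torch.rand(()) < p:
        cam_feat = torch.zeros_like(cam_feat)
    elif torch.rand(()) < p:
        lidar_feat = torch.zeros_like(lidar_feat)
    return cam_feat, lidar_feat

cam, lidar = torch.rand(2, 64), torch.rand(2, 64)
print([f.abs().sum().item() for f in modality_dropout(cam, lidar)])
```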
|
|
11:05-11:10, Paper TuAT23.8 | |
OpenVox: Real-Time Instance-Level Open-Vocabulary Probabilistic Voxel Representation |
|
Deng, Yinan | Beijing Institute of Technology |
Yao, BiCheng | Beijing Institute of Technology |
Tang, Yihang | Beijing Institute of Technology |
Zhou, Tianxing | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Mapping, Semantic Scene Understanding
Abstract: In recent years, vision-language models (VLMs) have advanced open-vocabulary mapping, enabling mobile robots to simultaneously achieve environmental reconstruction and high-level semantic understanding. While integrated object cognition helps mitigate semantic ambiguity in point-wise feature maps, efficiently obtaining rich semantic understanding and robust incremental reconstruction at the instance-level remains challenging. To address these challenges, we introduce OpenVox, a real-time incremental open-vocabulary probabilistic instance voxel representation. In the front-end, we design an efficient instance segmentation and comprehension pipeline that enhances language reasoning through encoding captions. In the back-end, we implement probabilistic instance voxels and formulate the cross-frame incremental fusion process into two subtasks: instance association and live map evolution, ensuring robustness to sensor and segmentation noise. Extensive evaluations across multiple datasets demonstrate that OpenVox achieves state-of-the-art performance in zero-shot instance segmentation, semantic segmentation, and open-vocabulary retrieval. The project page of OpenVox is available at https://open-vox.github.io/.
|
|
TuAT24 |
102C |
Software Architecture and Tools |
Regular Session |
|
10:30-10:35, Paper TuAT24.1 | |
PyRoki: A Modular Toolkit for Robot Kinematic Optimization |
|
Kim, Chung Min | University of California, Berkeley |
Yi, Brent | University of California, Berkeley |
Choi, Hongsuk | Samsung AI Center, New York |
Ma, Yi | University of Illinois at Urbana-Champaign |
Goldberg, Ken | UC Berkeley |
Kanazawa, Angjoo | UC Berkeley |
Keywords: Software Tools for Robot Programming, Kinematics
Abstract: Robot motion can have many goals. Depending on the task, we might optimize for pose error, speed, collision, or similarity to a human demonstration. Motivated by this, we present PyRoki: a modular, extensible, and device-agnostic toolkit for solving kinematic optimization problems. PyRoki couples an interface for specifying kinematic variables and costs with an efficient nonlinear least squares optimizer. Unlike existing tools, it is also device-agnostic: optimization runs natively on CPU, GPU, and TPU. In this paper, we present (i) the design and implementation of PyRoki, (ii) motion retargeting and planning case studies that highlight the advantages of PyRoki's modularity, and (iii) optimization benchmarking, where PyRoki can be 1.4-1.7x faster and converges to lower errors than cuRobo, an existing GPU-accelerated inverse kinematics library. The code is open-sourced at https://pyroki-toolkit.github.io.
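The cost-stacking formulation such toolkits expose can be illustrated on a toy problem. The sketch below (plain NumPy, deliberately not PyRoki's API) solves a 2-link planar IK by Gauss-Newton over a stacked residual of pose error plus a rest-posture regularizer, the same "kinematic variables plus weighted costs" pattern described above.

```python
# Toy cost-based kinematic optimization: 2-link planar IK via Gauss-Newton.
import numpy as np

L1, L2 = 1.0, 0.8   # link lengths (hypothetical)

def fk(q):
    """Forward kinematics: end-effector position of a 2-link planar arm."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def residuals(q, target, q_rest, w_pose=1.0, w_rest=0.05):
    # Stack weighted costs: pose error + pull toward a resting posture.
    return np.concatenate([w_pose * (fk(q) - target), w_rest * (q - q_rest)])

def gauss_newton(q, target, q_rest, iters=50, eps=1e-6):
    for _ in range(iters):
        r = residuals(q, target, q_rest)
        J = np.stack([(residuals(q + eps * np.eye(2)[i], target, q_rest) - r) / eps
                      for i in range(2)], axis=1)   # finite-difference Jacobian
        q = q - np.linalg.lstsq(J, r, rcond=None)[0]
    return q

q = gauss_newton(np.array([0.3, 0.3]), np.array([1.2, 0.6]), np.zeros(2))
print(q, fk(q))
```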
|
|
10:35-10:40, Paper TuAT24.2 | |
Ad-Trait: A Fast and Flexible Automatic Differentiation Library in Rust |
|
Liang, Chen | Yale University |
Wang, Qian | Yale University |
Xu, Andy | Yale University |
Rakita, Daniel | Yale University |
Keywords: Software, Middleware and Programming Environments, Optimization and Optimal Control
Abstract: The Rust programming language is an attractive choice for robotics and related fields, offering highly efficient and memory-safe code. However, a key limitation preventing its broader adoption in these domains is the lack of high-quality, well-supported Automatic Differentiation (AD)—a fundamental technique that enables convenient derivative computation by systematically accumulating data during function evaluation. In this work, we introduce ad-trait, a new Rust-based AD library. Our implementation overloads Rust's standard floating-point type with a flexible trait that can efficiently accumulate necessary information for derivative computation. The library supports both forward-mode and reverse-mode automatic differentiation, making it the first operator-overloading AD implementation in Rust to offer both options. Additionally, ad-trait leverages Rust’s performance-oriented features, such as Single Instruction, Multiple Data acceleration in forward-mode AD, to enhance efficiency. Through benchmarking experiments, we show that our library is among the fastest AD implementations across several programming languages for computing derivatives. Moreover, it is already integrated into a Rust-based robotics library, where we showcase its ability to facilitate fast optimization procedures. We conclude with a discussion of the limitations and broader implications of our work.
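The operator-overloading mechanism that ad-trait applies to Rust's floating-point type is easiest to see with forward-mode dual numbers. The Python sketch below is a conceptual analogue only, not the library's implementation: overloaded arithmetic makes every value carry its derivative and accumulate the chain rule during evaluation.

```python
# Forward-mode AD via operator overloading with dual numbers (conceptual).
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot      # value and its derivative
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.dot * o.val + self.val * o.dot)   # product rule
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def f(x):                    # f(x) = x*sin(x) + 3x, written as ordinary code
    return x * x.sin() + 3 * x

x = Dual(2.0, 1.0)           # seed dx/dx = 1
y = f(x)
print(y.val, y.dot)          # f(2) and f'(2) = sin(2) + 2*cos(2) + 3
```

Reverse mode works analogously but records operations for a backward sweep, which is why offering both in one overloaded type is a notable design point.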
|
|
10:40-10:45, Paper TuAT24.3 | |
Automated Behaviour-Driven Acceptance Testing of Robotic Systems |
|
Nguyen, Minh | University of Bremen |
Wrede, Sebastian | Bielefeld University |
Hochgeschwender, Nico | University of Bremen |
Keywords: Software Tools for Robot Programming, Software Tools for Benchmarking and Reproducibility, Engineering for Robotic Systems
Abstract: The specification and validation of robotics applications require bridging the gap between formulating requirements and systematic testing. This often involves manual and error-prone tasks that become more complex as requirements, design, and implementation evolve. To address this challenge systematically, we propose extending behaviour-driven development (BDD) to define and verify acceptance criteria for robotic systems. In this context, we use domain-specific modelling and represent composable BDD models as knowledge graphs for robust querying and manipulation, facilitating the generation of executable testing models. A domain-specific language helps to efficiently specify robotic acceptance criteria. We explore the potential for automated generation and execution of acceptance tests through a software architecture that integrates a BDD framework, Isaac Sim, and model transformations, focusing on acceptance criteria for pick-and-place applications. We tested this architecture with an existing pick-and-place implementation and evaluated the execution results, which shows how this application behaves and fails differently when tested against variations of the agent and environment. This research advances the rigorous and automated evaluation of robotic systems, contributing to their reliability and trustworthiness.
|
|
10:45-10:50, Paper TuAT24.4 | |
AS2FM: Enabling Statistical Model Checking of ROS 2 Systems for Robust Autonomy |
|
Henkel, Christian | Robert Bosch GmbH |
Lampacrescia, Marco | Bosch Research |
Klauck, Michaela | Bosch Research |
Morelli, Matteo | Université Paris Saclay, CEA, LIST, F-91120 Palaiseau, France |
Keywords: Software, Middleware and Programming Environments, Software Tools for Robot Programming, Software Architecture for Robotic and Automation
Abstract: Designing robotic systems to act autonomously in unforeseen environments is a challenging task. This work presents a novel approach to use formal verification, specifically Statistical Model Checking (SMC), to verify system properties of autonomous robots at design-time. We introduce an extension of the SCXML format, designed to model system components including both Robot Operating System 2 (ROS 2) and Behavior Tree (BT) features. Further, we contribute Autonomous Systems to Formal Models (AS2FM), a tool to translate the full system model into JANI. The use of JANI, a standard format for quantitative model checking, enables verification of system properties with off-the-shelf SMC tools. We demonstrate the practical usability of AS2FM both in terms of applicability to real-world autonomous robotic control systems, and in terms of verification runtime scaling. We provide a case study, where we successfully identify problems in a ROS 2-based robotic manipulation use case that is verifiable in less than one second using consumer hardware. Additionally, we compare to the state of the art and demonstrate that our method is more comprehensive in system feature support, and that the verification runtime scales linearly with the size of the model, instead of exponentially.
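Statistical model checking itself reduces to estimating a property's probability from sampled executions. The toy below (the failure model and confidence bound are invented for illustration and are unrelated to AS2FM's SCXML-to-JANI pipeline) shows the core estimate plus a 95% confidence interval.

```python
# Monte Carlo property estimation, the core of statistical model checking.
import math
import random

def simulate(p_fail=0.02, steps=100):
    """Toy stochastic system: each step fails independently with p_fail."""
    return all(random.random() > p_fail for _ in range(steps))

runs = 10_000
successes = sum(simulate() for _ in range(runs))
p_hat = successes / runs
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / runs)   # normal-approx 95% CI
print(f"P(property holds) ~ {p_hat:.3f} +/- {half_width:.3f}")
```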
|
|
10:50-10:55, Paper TuAT24.5 | |
QBIT: Quality-Aware Cloud-Based Benchmarking for Robotic Insertion Tasks |
|
Schempp, Constantin | Karlsruhe University of Applied Sciences |
Zhang, Yongzhou | Karlsruhe University of Applied Sciences |
Friedrich, Christian | Karlsruhe University of Applied Sciences |
Hein, Björn | Karlsruhe University of Applied Sciences |
Keywords: Software Tools for Benchmarking and Reproducibility, Assembly, Methods and Tools for Robot System Design
Abstract: Insertion tasks are fundamental yet challenging for robots, particularly in autonomous operations, due to their continuous interaction with the environment. AI-based approaches appear to be up to the challenge, but in production they must not only achieve high success rates. They must also ensure insertion quality and reliability. To address this, we introduce QBIT, a quality-aware benchmarking framework that incorporates additional metrics such as force energy, force smoothness and completion time to provide a comprehensive assessment. To ensure statistical significance and minimize the sim-to-real gap, we randomize contact parameters in the MuJoCo simulator, account for perceptual uncertainty, and conduct large-scale experiments on a Kubernetes-based infrastructure. Our microservice-oriented architecture ensures extensibility, broad applicability, and improved reproducibility. To facilitate seamless transitions to physical robotic testing, we use ROS2 with containerization to reduce integration barriers. We evaluate QBIT using three insertion approaches: geometric-based, force-based, and learning-based, in both simulated and real-world environments. In simulation, we compare the accuracy of contact simulation using different mesh decomposition techniques. Our results demonstrate the effectiveness of QBIT in comparing different insertion approaches and accelerating the transition from laboratory to real-world applications.
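The quality metrics named above are straightforward to compute from a recorded force trace. The definitions below are common-sense ones (integral of squared force, mean force rate, duration) and are not guaranteed to match QBIT's exact formulas.

```python
# Illustrative insertion-quality metrics from a sampled force signal.
import numpy as np

dt = 0.001                                                   # 1 kHz sampling
force = np.abs(np.random.default_rng(2).normal(2.0, 0.5, size=5000))  # N (fake)

force_energy = float(np.sum(force**2) * dt)                  # integral |F|^2 dt
force_smoothness = float(np.mean(np.abs(np.diff(force))) / dt)  # mean |dF/dt|
completion_time = len(force) * dt                            # seconds
print(force_energy, force_smoothness, completion_time)
```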
|
|
10:55-11:00, Paper TuAT24.6 | |
Set Phasers to Stun: Beaming Power and Control to Mobile Robots with Laser Light |
|
Carver, Charles | Massachusetts Institute of Technology |
Schwartz, Hadleigh | Columbia University |
Itagaki, Toma | Columbia University |
Englhardt, Zachary | University of Washington |
Liu, Kechen | Columbia University |
Graciela Nauli Manik, Megan | Columbia University |
Chang, Chun-Cheng | University of Washington |
Iyer, Vikram | University of Washington |
Plancher, Brian | Barnard College, Columbia University and Dartmouth College |
Zhou, Xia | Columbia University |
Keywords: Software-Hardware Integration for Robot Systems, Engineering for Robotic Systems, Micro/Nano Robots
Abstract: We present Phaser, a flexible system that directs narrow-beam laser light to moving robots for concurrent wireless power delivery and communication. We design a semi-automatic calibration procedure to enable fusion of stereo-vision-based 3D robot tracking with high-power beam steering, and a low-power optical communication scheme that reuses the laser light as a data channel. We fabricate a Phaser prototype using off-the-shelf hardware and evaluate its performance with battery-free autonomous robots. Phaser delivers optical power densities of over 110 mW/cm² and error-free data to mobile robots at multi-meter ranges, with on-board decoding drawing 0.3 mA (97% less current than Bluetooth Low Energy). We demonstrate Phaser fully powering gram-scale battery-free robots to nearly 2x higher speeds than prior work while simultaneously controlling them to navigate around obstacles and along paths. Code, an open-source design guide, and a demonstration video of Phaser are available at https://mobilex.cs.columbia.edu/phaser/.
|
|
11:00-11:05, Paper TuAT24.7 | |
Service Discovery-Based Hybrid Network Middleware for Efficient Communication in Distributed Robotic Systems |
|
Sang, Shiyao | Huaiyin Institute of Technology |
Ling, Yinggang | Huaiyin Institute of Technology |
Keywords: Software Architecture for Robotic and Automation, Computer Architecture for Robotic and Automation, Software, Middleware and Programming Environments
Abstract: Robotic middleware is fundamental to ensuring reliable communication among system components and is crucial for intelligent robotics, autonomous vehicles, and smart manufacturing. However, existing robotic middleware often struggles to meet the diverse communication demands, optimize data transmission efficiency, and maintain scheduling determinism between Orin computing units in large-scale L4 autonomous vehicle deployments. This paper presents RIMAOS2C, a service discovery-based hybrid network communication middleware designed to tackle these challenges. By leveraging multi-level service discovery multicast, RIMAOS2C supports a wide variety of communication modes, including multiple cross-chip Ethernet protocols and PCIe communication capabilities. The core mechanism of the middleware, the Message Bridge, optimizes data flow forwarding and employs shared memory for centralized message distribution, reducing message redundancy and minimizing transmission delay uncertainty, thus improving both communication efficiency and scheduling stability. Tested and validated on L4 vehicles and Jetson Orin domain controllers, RIMAOS2C leverages TCP-based ZeroMQ to overcome the large-message transmission bottleneck inherent in native CyberRT middleware. In scenarios involving two cross-chip subscribers, RIMAOS2C eliminates message redundancy, enhancing transmission efficiency by 36%–40% for large data transfers while reducing callback time differences by 42%–906%. This research advances the communication capabilities of robotic operating systems and introduces a novel approach to optimizing communication in distributed computing architectures for autonomous driving systems.
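The cross-chip transport that this paper (and the next) builds on is ZeroMQ publish/subscribe. A minimal pyzmq example of the pattern, with a hypothetical topic name and payload, is shown below; a middleware like the one described would layer service discovery, shared-memory fan-out, and serialization on top of this primitive.

```python
# Minimal ZeroMQ PUB/SUB link, the building block of a cross-chip bridge.
# Requires `pip install pyzmq`.
import time
import zmq

ctx = zmq.Context()

pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "lidar/points")   # topic prefix filter

time.sleep(0.2)                        # let the slow-joiner subscription settle
pub.send_multipart([b"lidar/points", b"<serialized point cloud>"])
topic, payload = sub.recv_multipart()
print(topic.decode(), len(payload), "bytes")
```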
|
|
11:05-11:10, Paper TuAT24.8 | |
HyperGraph ROS: An Open-Source Robot Operating System for Hybrid Parallel Computing Based on Computational HyperGraph |
|
Zhang, Shufang | Tianjin University |
Wu, Jiazheng | Tianjin University |
Wang, Kaiyi | Tianjin University |
JiaCheng, He | Tianjin University |
An, Shan | JD.COM Inc |
Keywords: Software Architecture for Robotic and Automation, Software Tools for Robot Programming, Software-Hardware Integration for Robot Systems
Abstract: This paper presents HyperGraph ROS, an open-source robot operating system that unifies intra-process, inter-process, and cross-device computation into a computational hypergraph for efficient message passing and parallel execution. In order to optimize communication, HyperGraph ROS dynamically selects the optimal communication mechanism while maintaining a consistent API. For intra-process messages, Intel-TBB Flow Graph is used with C++ pointer passing, which ensures zero memory copying and instant delivery. Meanwhile, inter-process and cross-device communication seamlessly switch to ZeroMQ. When a node receives a message from any source, it is immediately activated and scheduled for parallel execution by Intel-TBB. The computational hypergraph consists of nodes represented by TBB flow graph nodes and edges formed by TBB pointer-based connections for intra-process communication, as well as ZeroMQ links for inter-process and cross-device communication. This structure enables seamless distributed parallelism. Additionally, HyperGraph ROS provides ROS-like utilities such as a parameter server, a coordinate transformation tree, and visualization tools. Evaluation in diverse robotic scenarios demonstrates significantly higher transmission efficiency compared to ROS2. Our work is available at https://github.com/wujiazheng2020/hyper_graph_ros.
|
|
TuAT25 |
103A |
Dexterous Manipulation 1 |
Regular Session |
|
10:30-10:35, Paper TuAT25.1 | |
LDexMM: Language-Guided Dexterous Multi-Task Manipulation with Reinforcement Learning |
|
Yan, Hengxu | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Fang, Hao-Shu | Massachusetts Institute of Technology |
Yu, Qiaojun | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Dexterous Manipulation, Reinforcement Learning, Grasping
Abstract: Language plays a crucial role in robotic manipulation, particularly in facilitating complex tasks. Previous work primarily focused on two-finger manipulation. However, leveraging language to guide reinforcement learning for dexterous hands remains a challenge due to their high degrees of freedom. In this work, we introduce a language-guided dexterous multi-task manipulation framework (LDexMM), which decomposes the problem into two distinct phases. First, we use language instructions to guide a segmentation model in generating a dexterous grasp pose for the functional part of the object. After establishing this initial grasp, reinforcement learning is employed to refine the grasp pose and complete the task. Simultaneously, language constraints are applied to focus the actions on the specified object. Our experiments demonstrate success rates of 31%, 40%, 49.2%, and 72.7% on 10, 7, 5, and 3 tasks, respectively, with a single model.
|
|
10:35-10:40, Paper TuAT25.2 | |
RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation |
|
Liu, Haichao | The Hong Kong University of Science and Technology |
Guo, Sikai | Harbin Institute of Technology |
Mai, Pengfei | The Hong Kong University of Science and Technology (Guangzhou) |
Cao, Jiahang | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Haoang | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Dexterous Manipulation, Manipulation Planning, Perception for Grasping and Manipulation
Abstract: This paper introduces RoboDexVLM, an innovative framework for robot task planning and grasp detection tailored for a collaborative manipulator equipped with a dexterous hand. Previous methods focus on simplified and limited manipulation tasks, which often neglect the complexities associated with grasping a diverse array of objects in a long-horizon manner. In contrast, our proposed framework utilizes a dexterous hand capable of grasping objects of varying shapes and sizes while executing tasks based on natural language commands. The proposed approach has the following core components: First, a robust task planner with a task-level recovery mechanism that leverages vision-language models (VLMs) is designed, which enables the system to interpret and execute open-vocabulary commands for long sequence tasks. Second, a language-guided dexterous grasp perception algorithm is presented based on robot kinematics and formal methods, tailored for zero-shot dexterous manipulation with diverse objects and commands. Comprehensive experimental results validate the effectiveness, adaptability, and robustness of RoboDexVLM in handling long-horizon scenarios and performing dexterous grasping. These results highlight the framework's ability to operate in complex environments, showcasing its potential for open-vocabulary dexterous manipulation. Our open-source project page can be found at https://henryhcliu.github.io/robodexvlm.
|
|
10:40-10:45, Paper TuAT25.3 | |
Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge |
|
Yan, Hengxu | Shanghai Jiao Tong University |
Fang, Hao-Shu | Massachusetts Institute of Technology |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Dexterous Manipulation, Grasping, Reinforcement Learning
Abstract: Dexterous manipulation has received considerable attention in recent research. Predominantly, existing studies have concentrated on reinforcement learning methods to address the substantial degrees of freedom in hand movements. Nonetheless, these methods typically suffer from low efficiency and accuracy. In this work, we introduce a novel reinforcement learning approach that leverages prior dexterous grasp pose knowledge to enhance both efficiency and accuracy. Unlike previous work, which keeps the robotic hand in a fixed dexterous grasp pose throughout manipulation, we decouple the manipulation process into two distinct phases: initially, we generate a dexterous grasp pose targeting the functional part of the object; after that, we employ reinforcement learning to comprehensively explore the environment. Our findings suggest that the majority of learning time is expended in identifying the appropriate initial position and selecting the optimal manipulation viewpoint. Experimental results demonstrate significant improvements in learning efficiency and success rates across four distinct tasks.
|
|
10:45-10:50, Paper TuAT25.4 | |
DexPour: Effective and Efficient High-DoF Robotic Hand Liquid Pouring Via Hierarchical Reward with Approximated Proxy Abstraction |
|
Fang, Xinmin | University of Colorado Denver |
Tao, Lingfeng | Kennesaw State University |
Li, Zhengxiong | University of Colorado Denver |
Keywords: Dexterous Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Pouring fluids is a routine task for humans but challenging for high-DoF robots, particularly given fluid simulation's computational demands while training policies. In this paper, we propose DexPour, a novel reinforcement learning method with hierarchical rewards and Approximated Proxy Abstraction (APA) method. APA efficiently approximates liquid behavior using a small set of spheres, reducing computational overhead. Meanwhile, our hierarchical reward framework breaks down the intricate pouring process into four distinct stages—approach, grasp, transport, and pour—providing fine-grained feedback and fostering stable policy learning. Extensive experiments demonstrate that DexPour achieves a 92% fluid transfer efficiency with a 70% cup fill and a 99% efficiency at 30% fill, highlighting its robust performance across varying liquid volumes. Ablation studies highlight the contribution of each component, confirming the necessity of detailed stage-wise guidance for complex dexterous manipulation. In addition, we compare DexPour with a full fluid simulation baseline, showing comparable pouring efficiency while reducing training time by 81.6%, demonstrating DexPour's efficiency and practical viability for fluid manipulation tasks.
|
|
10:50-10:55, Paper TuAT25.5 | |
High DOF Tendon-Driven Soft Hand: A Modular System for Versatile and Dexterous Manipulation |
|
Jang, Yeonwoo | Ulsan National Institute of Science and Technology (UNIST) |
Lee, Hajun | UC Berkeley |
Kim, Junghyo | Ulsan National Institute of Science and Technology |
Yoon, Taerim | Korea University |
Chai, Yoonbyung | Korea University |
Won, Heejae | Ulsan National Institute of Science and Technology |
Choi, Sungjoon | Korea University |
Kim, Jiyun | Ulsan National Institute of Science and Technology |
Keywords: Dexterous Manipulation, Soft Robot Materials and Design, Tendon/Wire Mechanism
Abstract: The soft robotic hand exhibits a wide range of manipulation capabilities, which are attributed to the dexterity of its soft fingers and their coordinated movements. Therefore, designing a versatile soft hand requires careful consideration of both the characteristics of the individual fingers, such as degree of freedom (DOF), and their strategic arrangements to optimize performance for specific target tasks. This work presents a modularized high DOF tendon-driven soft finger and a customized design of a soft robotic hand for diverse dexterous manipulation tasks. Furthermore, an all-in-one module is developed that integrates both the 4-way tendon-driven soft finger body and drive parts. Its high DOF enables multi-directional actuations with a wide actuation range, thereby expanding possible manipulation modes. The modularity of the system expands the design space for finger arrangements, which enables the diverse configuration of robotic hands and facilitates the customization of task-oriented platforms. To achieve sophisticated control of these complex configurations, we employ neural network-planned trajectories, enabling the precise execution of complicated tasks. The performance of a single finger is validated, including dexterity and payload, and several real-world manipulation tasks are demonstrated, including writing, grasping, rotating, and spreading, using motion primitives of diverse soft hands with distinctive finger arrangements. These demonstrations showcase the system's versatility and precision in various tasks. We expect that our system will contribute to the expansion of possibilities in the field of soft robotic manipulation.
|
|
10:55-11:00, Paper TuAT25.6 | |
Peg-In-Hole Assembly Method Based on Visual Reinforcement Learning and Tactile Pose Estimation |
|
Tao, Yong | Beijing University of Aeronautics and Astronautics |
Chen, Shuo | Beihang University |
Liu, Haitao | Beihang University |
Gao, He | BeiHang University |
Tao, Yu | Beihang University |
Chen, Yixian | Beihang University |
Wei, Hongxing | Beihang University |
Keywords: Dexterous Manipulation, Learning from Experience, Haptics and Haptic Interfaces
Abstract: When robots replicate human actions in peg-in-hole assembly tasks, such as USB Type-A insertion and removal, the complexity of the process and frequent obstructions from the inner walls make it difficult for robots to handle collisions or avoid jamming. These difficulties contribute to a low success rate in assembly. This paper proposes a vision-guided reinforcement learning pre-assembly method combined with tactile-feedback-based pose estimation adjustment for peg-in-hole assembly, achieving a significant improvement in success rates for complex assembly tasks. First, during reinforcement learning pretraining, high-reward sample data is collected, and a behaviour cloning (BC) network is constructed based on the structure of the sample data and pretrained as a policy regression layer. Under sparse reward conditions, the outputs of the twin delayed deep deterministic policy gradient (TD3) network and the BC network are combined to improve training stability and accelerate convergence, enhancing the efficiency of vision-based assembly. Then, to address the instability caused by collisions with the inner and outer walls of the hole when vision-based assembly remains incomplete, an in-hand pose estimation algorithm based on the GelSight visuotactile sensor is integrated. This algorithm facilitates real-time adjustments to the position of the robot's end-effector, improving the likelihood of successful peg-in-hole assembly. Finally, to validate the effectiveness of the proposed method, experiments were conducted on the V-REP simulation platform and a real Franka robot platform, achieving success rates of 90-93% and 80-85%, respectively.
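The TD3/BC combination under sparse rewards can be sketched abstractly: early in training, a pretrained behaviour-cloning policy dominates the chosen action, and its influence is annealed away as the RL actor improves. The networks, weights, and schedule below are stand-ins for illustration, not the paper's architecture.

```python
# Illustrative blending of a BC policy with a TD3 actor during training.
import torch
import torch.nn as nn

def make_policy(obs_dim=12, act_dim=6):
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, act_dim), nn.Tanh())

td3_actor, bc_policy = make_policy(), make_policy()

def combined_action(obs, step, anneal_steps=10_000):
    """Early in training, lean on BC; anneal toward the TD3 actor."""
    beta = max(0.0, 1.0 - step / anneal_steps)     # BC weight decays to 0
    with torch.no_grad():
        return beta * bc_policy(obs) + (1.0 - beta) * td3_actor(obs)

obs = torch.rand(1, 12)
print(combined_action(obs, step=2_000))
```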
|
|
11:00-11:05, Paper TuAT25.7 | |
Vision Guided Cable Installation in Constraint Environments Utilizing Parametric Curve Representation |
|
Jiang, Xin | Harbin Institute of Technology, Shenzhen |
Wei, HuangTao | Harbin Institute of Technology,Shenzhen |
Liu, Zhitong | Harbin Institute of Technology, Shenzhen |
Liao, Wenxi | Harbin Institute of Technology |
Ran, Wei | Harbin Institute of Technology |
Keywords: Dexterous Manipulation, Manipulation Planning
Abstract: In this paper, a vision-based method is proposed for cable installation tasks in constrained environments. The main challenge of such tasks lies in the potential interference between the cable and surrounding obstacles. Model-based approaches are not well-suited for these industrial scenarios due to variations in the physical properties of workpieces. To address this, the proposed method integrates a potential field-based tip trajectory regulation with a shape deformation servo. In the shape deformation servo, a planner is employed to determine a feasible shape curve that avoids obstacles. This step is crucial, as an infeasible reference for shape control may result in unstable behavior. The effectiveness of the proposed method is validated through experiments. Notably, this approach does not rely on prior model information, making it highly adaptable for industrial deployment.
|
|
TuAT26 |
103B |
Soft Robot Materials and Design 1 |
Regular Session |
Co-Chair: Zhang, Hongying | National University of Singapore |
|
10:30-10:35, Paper TuAT26.1 | |
PneuChip: A Compact Pneumatic Controller for Large-Scale Soft Artificial Muscles |
|
Wang, Zheng | National University of Singapore |
Liu, Zhe | Xi'an Jiaotong University |
Wang, Yimo | National University of Singapore |
Zhang, Hongying | National University of Singapore |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Pneumatic soft actuators are known for their versatility and reliability; however, their control presents a major challenge as systems scale beyond tens of actuators. Traditional rigid pneumatic valves add bulk, weight, and complexity, while most soft valves fail to generate programmable independent output states. We propose PneuChip, a compact pneumatic controller designed for large-scale soft actuators. The PneuChip functions as a two-dimensional array with m rows and n columns, controlled by m+n input signals to generate 2^(m+n)-2^m-2^n+2 distinct output states, enabling programmable control over m×n soft actuators. To validate its effectiveness, we implemented PneuChip in a muscular-skeletal robotic arm comprising 24 Miura-Ori inspired, negative pressure-actuated artificial muscles and a rigid two-link skeleton connected by a ball joint. A 4×6 PneuChip was fabricated and integrated to control the arm’s 3-degree-of-freedom (DOF) motion. Within a 120° rotation range, the robot arm achieved 946 distinct positions with smooth state transitions, paving the way for future applications in trajectory tracking and dexterous manipulation. The compact design and high controllability of PneuChip promise to notably simplify complex pneumatic systems, significantly enhancing the practicality of large-scale soft robots for various applications.
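The state count quoted above is easy to verify: for the 4×6 array, the formula 2^(m+n) - 2^m - 2^n + 2 evaluates exactly to the 946 distinct arm positions reported.

```python
# Verifying the PneuChip output-state formula for the paper's 4x6 array.
m, n = 4, 6
states = 2 ** (m + n) - 2 ** m - 2 ** n + 2
print(states, "distinct output states from", m + n, "input signals")  # 946
```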
|
|
10:35-10:40, Paper TuAT26.2 | |
A Bio-Inspired Stiffness-Programmable Robotic Flexible Joint Based on Electro-Adhesive Clutches |
|
Ma, Yongxian | Southern University of Science and Technology |
Wu, Chuang | Xi'an University of Architecture and Technology |
Li, Qingbiao | The University of Macau |
Cao, Chongjing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Li, Xiaozheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Keywords: Soft Robot Materials and Design, Compliant Joints and Mechanisms, Biologically-Inspired Robots
Abstract: Robots with active variable stiffness (VS) capabilities can potentially achieve safer interactions with humans and better adaptability to uncertainties in complex environments. Currently, conventional jamming or phase-change-based VS mechanisms act on the stiffness of the robotic joint in all axes simultaneously, making it difficult to achieve decoupled stiffness programming in different directions/axes. To overcome this challenge, a bio-inspired stiffness-programmable robotic flexible joint (SPRFJ) based on electro-adhesive (EA) clutches is proposed. By programming the ON/OFF states of the EA clutches on different surfaces around the SPRFJ, stiffness profiles can be customized in different directions/axes, and therefore the load-bearing capacity and flexibility of the robotic arm can be adjusted. A SPRFJ prototype consisting of four EA clutch units is developed, and through extensive experiments we demonstrate that it can achieve a 21-fold stiffness change and withstand resisting forces up to 13.41 N at 1 kV. The reliable multi-directional stiffness programmability of the SPRFJ is shown via extensive tests. Demonstrations of stable position locking at different angles and free movement while carrying payloads showcase its application in soft robotics. The SPRFJ developed in this work shows potential for industrial robots, search-and-rescue missions, and space exploration.
|
|
10:40-10:45, Paper TuAT26.3 | |
T-Touch: A Soft Thermal-Haptic Multimodal Fingertip Wearable Device for Immersive Virtual Reality |
|
Wang, Youzhan | University of Chinese Academy of Sciences |
Li, Jinjun | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Wu, Chuang | Xi'an University of Architecture and Technology |
Li, Xiaozheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Li, Qingbiao | The University of Macau |
Digumarti, Krishna Manaswi | Queensland University of Technology |
Cao, Chongjing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Haptics and Haptic Interfaces
Abstract: Virtual reality (VR) technology has enormous applications in education, entertainment, and healthcare. Haptic feedback can significantly enhance the immersive experience in VR. However, most commercial hand/fingertip wearable VR haptic devices rely on bulky rigid structures, which are limited in the offered stimuli and cause fatigue. This study introduces a novel soft wearable fingertip device, T-Touch, that provides both thermal and multi-frequency haptic feedback for more realistic VR experiences. A flexible electrohydraulic actuator (EHA) is adopted for multi-frequency mechanical stimuli, and a flexible thermoelectric array (Flex-TEA) is utilized for distinct thermal stimuli. The EHA and Flex-TEA can be independently controlled to activate simultaneously or independently, thereby rendering ON/OFF contact stimuli, vibrations, controlled temperature stimuli, or any combination of the three modalities. Our T-Touch device features a compact form factor of 35 mm × 25 mm × 22 mm and weighs only ∼ 8 g. It can generate mechanical stimuli with the maximum stroke of ∼ 1 mm, a force of 0.47 N, at a bandwidth >10 Hz, and can render precise thermal stimuli in the range of 20 to 40 °C. The main performance of the EHA and Flex-TEA modules is characterized in extensive experiments and the effects of the key design and actuation parameters are investigated to optimise performance. Preliminary user tests verify the efficacy of our T-Touch design in immersive VR applications.
|
|
10:45-10:50, Paper TuAT26.4 | |
Modeling of Viscoelastic Liquid Crystal Elastomer Actuators |
|
Wang, Hao | The Chinese University of Hong Kong, Shenzhen |
Xu, Yiqun | The Chinese University of Hong Kong, Shenzhen |
Xiao, Fei | The Chinese University of Hong Kong, Shenzhen |
Xu, Zhipeng | The Chinese University of Hong Kong, Shenzhen |
Li, Jisen | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
He, Qiguang | The Chinese University of Hong Kong |
Zhu, Jian | Chinese University of Hong Kong, Shenzhen |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators
Abstract: Soft robots and smart materials have seen rapid advancements in recent years, with significant potential applications in medical devices. Liquid crystal elastomers (LCEs) are particularly known for their large deformations and diverse actuation modes, which can serve as the end effector in soft catheters. However, LCEs exhibit strong hysteresis, which makes their modeling and control challenging. In this paper, we develop a dynamic model of a light-stimulated LCE to describe its nonlinear time-dependent behavior. We first derive the relationship between the input laser power and the resulting temperature change, and then analyze the viscoelastic behavior of the LCE using a spring-dashpot framework. For both the linear contraction actuator and the bending actuator, the dynamic equations describe the observed behavior with acceptable errors. The model is well-suited for LCE actuators with complex structures and provides a foundation for future control.
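A minimal spring-dashpot example is the standard linear solid (Zener) model, integrated below with forward Euler. The parameter values and the ramp-and-hold strain input are illustrative, not fitted to the paper's LCE.

```python
# Standard linear solid (Zener) viscoelastic model, integrated explicitly.
import numpy as np

E0, E1, eta = 1.0e5, 4.0e5, 2.0e4   # parallel spring, Maxwell spring, dashpot
                                    # [Pa, Pa, Pa*s], illustrative values

def zener_stress(strain, dt):
    """sigma = E0*eps + sigma_m, where the Maxwell branch evolves as
    d(sigma_m)/dt = E1*d(eps)/dt - (E1/eta)*sigma_m."""
    sigma_m, out = 0.0, []
    deps = np.gradient(strain, dt)
    for e, de in zip(strain, deps):
        sigma_m += dt * (E1 * de - (E1 / eta) * sigma_m)
        out.append(E0 * e + sigma_m)
    return np.array(out)

t = np.arange(0.0, 1.0, 1e-3)
strain = 0.05 * np.clip(t / 0.2, 0.0, 1.0)          # ramp-and-hold actuation
print(zener_stress(strain, 1e-3)[[0, 200, -1]])     # stress relaxation visible
```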
|
|
10:50-10:55, Paper TuAT26.5 | |
Target Handling Modalities with Obstacle Avoidance for Planar Soft Growing Manipulator Design |
|
Nurcan, Ozan | Kadir Has University |
Astar, Ahmet | Kadir Has University |
Kalafatlar, Ömer | Kadir Has University |
Stroppa, Fabio | Kadir Has University |
Keywords: Soft Robot Materials and Design, Collision Avoidance, Soft Robot Applications
Abstract: Soft growing robots mimic plant-like growth to navigate complex environments thanks to their specific actuation and material. This class of robots can also be used for manipulation tasks. While manufacturing these robots for specific tasks, it is crucial to carefully design their length and placement of joints. In this work, we extend our state-of-the-art optimizer for planar soft growing manipulators design, which retrieves the optimal robot dimensions for a specific given task. While the first version of the optimizer only considered a base case (where targets were only points in space), in this work, we implement five target handling modalities based on real-case manipulation scenarios. Specifically, targets are treated as obstacles and, as such, occupy space in the environment. Depending on the modality, the way these targets are handled can change. Results show that with this extension, the optimizer can tackle different manipulation cases correctly.
|
|
10:55-11:00, Paper TuAT26.6 | |
A Crab-Inspired Soft Gripper with Single-Finger Dexterous Grasping Capabilities |
|
Zhang, Yunce | Shandong University |
Lv, Haobin | Shandong University |
Liu, Yixiang | Shandong University |
Min, Zhe | University College London |
Zhou, Shizhao | Zhejiang University |
Wang, Tao | Zhejiang University |
Zhu, Shiqiang | Zhejiang University |
Song, Rui | Shandong University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft grippers conform to the shape and surface properties of the objects to be grasped, effectively avoiding damage to soft and fragile items. Despite the variety of existing soft gripper designs, their structures lack sufficient flexibility for effectively grasping slender objects or operating in narrow spaces. To address these challenges, we propose a soft gripper with single-finger grasping capabilities, inspired by the structure of crab claws. The structural design and the fabrication method of the gripper are introduced, and the analytical bending model is derived. Experiments are conducted under typical operating conditions to validate the model, and the results indicate that the measured data are in good accordance with the predicted responses. Furthermore, a series of grasping experiments are carried out to test the single-finger grasping capabilities of the proposed soft gripper. The results indicate that the proposed soft gripper can efficiently and stably grasp slender or irregular objects with a single finger. In particular, it demonstrates suitability for operations in narrow spaces and shows potential for handling complex tasks. This innovative design effectively reduces the complexity of the system, while exhibiting promising capabilities in grasping slender or irregular objects and operating within restricted spaces.
|
|
11:00-11:05, Paper TuAT26.7 | |
Design and Development of a Propulsion Induced Rolling Spherical Tensegrity Robot |
|
Zhang, Niansong | Nanjing University of Science and Technology |
Jiang, Rui | Nanjing University of Science and Technology |
Shao, Xinfeng | Nanjing University of Science and Technology |
Wu, Yongliang | Xihua University |
Qiang, Guiyan | Anhui Province Key Laboratory of Machine Vision Detection and Pe |
Huang, Wenkai | Shandong University |
Liu, Yixiang | Shandong University |
Li, Yibin | Shandong University |
Zhao, Jie | Harbin Institute of Technology |
|
|
11:05-11:10, Paper TuAT26.8 | |
Towards Deformation Modeling and Simulation of a Soft and Inflatable Endoscopic Vision-Based Tactile Sensing Balloon for Cancer Diagnosis |
|
Kara, Ozdemir Can | University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots
Abstract: In this study, we introduce a simulation-based modeling framework for the optimal design of our recently developed inflatable endoscopic vision-based tactile sensing balloon (E-VTSB). Of note, E-VTSB is designed for providing a safe and high-resolution textural mapping and morphology characterization of colorectal cancer (CRC) polyps to enhance the early diagnosis of cancerous polyps. Leveraging the Simulation Open Framework Architecture (SOFA) software and by performing complementary experimental validation, we thoroughly analyzed and investigated the impact of the elastic modulus of the material constitution of E-VTSB on its deformation behavior under different applied pressures. Our findings revealed a close correlation between the simulated outcomes and experimental data performed on two different E-VTSBs. In particular, with the maximum absolute deformation error of <12%, our results clearly validated the proposed framework's accuracy in predicting the E-VTSB's deformation trend and its potential use for optimizing the involved design parameters of the sensor.
|
|
TuAT27 |
103C |
Space Robotics and Automation |
Regular Session |
|
10:30-10:35, Paper TuAT27.1 | |
Hybrid Control Approach for Walking-Assembly Integrated Space Robots in On-Orbit Assembly |
|
Douglas, Darran A.J. | Beijing Institute of Technology |
Shi, Lingling | Beijing Institute of Technology |
Hu, Yong | Beijing Institute of Control Engineering |
Yan, Xinle | Beijing Institute of Technology, School of Mechanical Eng |
Keywords: Space Robotics and Automation, Motion Control, Reinforcement Learning
Abstract: As space exploration advances, the demand for assembling large-scale structures in orbit, such as telescopes and space stations, continues to grow due to transportation size constraints. Robotic systems play a critical role in these tasks, requiring precise control and efficient energy management in challenging space conditions. This paper proposes a hybrid control strategy that combines Model Predictive Control (MPC) with Reinforcement Learning (RL) to optimize the performance of a 7-degree-of-freedom walking-assembly integrated space robot. MPC optimizes control inputs by predicting the robot's response over a defined horizon while handling constraints in real time, and RL dynamically tunes MPC parameters to adapt the control system to task-specific priorities. The proposed controller operates in two distinct modes: energy-efficient mode and accuracy-focused mode, balancing energy consumption and task precision. Simulation results validate the approach, demonstrating significant energy savings while maintaining high accuracy during space assembly tasks.
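A minimal way to see the two-mode idea is an MPC whose cost weights are switched between an energy-focused and an accuracy-focused set (in the paper, an RL agent tunes these parameters). The random-shooting controller below for a 1-D double integrator is purely illustrative; the weights, horizon, and dynamics are assumptions.

```python
# Sampling-based MPC with mode-dependent cost weights (toy illustration).
import numpy as np

rng = np.random.default_rng(0)
dt, horizon, samples = 0.1, 20, 512

def rollout_cost(x0, v0, U, w_err, w_energy):
    x, v = np.full(samples, x0), np.full(samples, v0)
    cost = np.zeros(samples)
    for t in range(horizon):
        v = v + dt * U[:, t]                 # double-integrator dynamics
        x = x + dt * v
        cost += w_err * x**2 + w_energy * U[:, t]**2
    return cost

def mpc_action(x0, v0, mode):
    # The RL layer would choose these weights; here they are hard-coded.
    w_err, w_energy = (10.0, 0.1) if mode == "accuracy" else (1.0, 2.0)
    U = rng.uniform(-1.0, 1.0, size=(samples, horizon))
    return U[np.argmin(rollout_cost(x0, v0, U, w_err, w_energy)), 0]

print(mpc_action(x0=1.0, v0=0.0, mode="accuracy"))
```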
|
|
10:35-10:40, Paper TuAT27.2 | |
VLM-Empowered Multi-Mode System for Efficient and Safe Planetary Navigation |
|
Cheng, Sinuo | Harbin Institute of Technology |
Zhou, Ruyi | Harbin Institute of Technology |
Feng, Wenhao | Harbin Institute of Technology |
Yang, Huaiguang | Harbin Institute of Technology |
Gao, Haibo | Harbin Institute of Technology |
Deng, Zongquan | Harbin Institute of Technology |
Ding, Liang | Harbin Institute of Technology |
Keywords: Space Robotics and Automation, Wheeled Robots, Engineering for Robotic Systems
Abstract: Increasingly complex and diverse planetary exploration environments require a more adaptable and flexible rover navigation strategy. In this study, we propose a VLM-empowered multi-mode system to achieve efficient and safe autonomous navigation for planetary rovers. A Vision-Language Model (VLM) parses scene information from image inputs to achieve a human-level understanding of terrain complexity. Based on this complexity classification, the system switches to the most suitable navigation mode, composed of perception, mapping and planning modules designed for different terrain types, to traverse the terrain ahead before reaching the next waypoint. By integrating the local navigation system with a map server and a global waypoint generation module, the rover is equipped to handle long-distance navigation tasks in complex scenarios. The navigation system is evaluated in various simulation environments. Compared to a single-mode conservative navigation method, our multi-mode system improves time and energy efficiency in long-distance traversals with varied types of obstacles, enhancing efficiency by 79.5%, while maintaining its avoidance capabilities against terrain hazards to guarantee rover safety. More system information is shown at https://chengsn1234.github.io/multi-mode-planetary-navigation/.
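A minimal sketch of the mode-switching logic described in this abstract, assuming a stubbed VLM terrain-complexity classifier; the names classify_terrain and NavigationMode, the mode labels, and the parameter values are all hypothetical.

# Hypothetical sketch of multi-mode dispatch: a VLM classifies terrain
# complexity from the onboard image, and the system selects the matching
# perception/mapping/planning stack. The VLM call itself is stubbed out.
from dataclasses import dataclass

@dataclass
class NavigationMode:
    name: str
    max_speed: float       # m/s, assumed mode parameter
    map_resolution: float  # m per grid cell, assumed mode parameter

MODES = {
    "flat":      NavigationMode("flat", 0.8, 0.20),
    "rocky":     NavigationMode("rocky", 0.3, 0.10),
    "hazardous": NavigationMode("hazardous", 0.1, 0.05),
}

def classify_terrain(image) -> str:
    """Placeholder for the VLM query (e.g. 'How complex is the terrain ahead?').
    A real system would parse the VLM's textual answer into one of the labels."""
    return "rocky"

def select_mode(image) -> NavigationMode:
    label = classify_terrain(image)
    return MODES.get(label, MODES["hazardous"])  # fail safe to the cautious mode

mode = select_mode(image=None)
print(f"switching to '{mode.name}' mode: v <= {mode.max_speed} m/s, "
      f"grid {mode.map_resolution} m")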
|
|
10:40-10:45, Paper TuAT27.3 | |
REALMS2 - Resilient Exploration and Lunar Mapping System 2 – a Comprehensive Approach |
|
van der Meer, Dave | Interdisciplinary Centre for Security, Reliability and Trust - U |
Chovet, Loïck | Interdisciplinary Centre for Security, Reliability and Trust (Sn |
Garcia, Gabriel Manuel | University of Luxembourg - SnT - SpaceR |
Bera, Abhishek | Interdisciplinary Centre for Security, Reliability and Trust - U |
Olivares-Mendez, Miguel A. | Interdisciplinary Centre for Security, Reliability and Trust - U |
Keywords: Space Robotics and Automation, Distributed Robot Systems, Multi-Robot SLAM
Abstract: The European Space Agency (ESA) and the European Space Resources Innovation Centre (ESRIC) created the Space Resources Challenge to invite researchers and companies to propose innovative solutions for Multi-Robot System (MRS) space prospection. This paper proposes the Resilient Exploration And Lunar Mapping System 2 (REALMS2), an MRS framework for planetary prospection and mapping. Based on Robot Operating System version 2 (ROS 2) and enhanced with Visual Simultaneous Localisation And Mapping (vSLAM) for map generation, REALMS2 uses a mesh network to maintain a robust ad hoc communication network. A single graphical user interface (GUI) controls all the rovers, providing a simple overview of the robotic mission. The system is designed for heterogeneous multi-robot exploratory missions, tackling the challenges presented by extraterrestrial environments. REALMS2 was used during the second field test of the ESA-ESRIC Challenge and enabled mapping of around 60% of the area with three homogeneous rovers while handling communication delays and blackouts.
|
|
10:45-10:50, Paper TuAT27.4 | |
MPC-Based Deep Reinforcement Learning Method for Space Robotic Control with Fuel Sloshing Mitigation |
|
Ramezani, Mahya | University of Luxembourg |
Alandihallaj, Mohammadamin | University of Luxembourg |
Yalcin, Baris | SpaceR. SnT-University of Luxembourg |
Olivares-Mendez, Miguel A. | Interdisciplinary Centre for Security, Reliability and Trust - U |
Voos, Holger | University of Luxembourg |
Keywords: Space Robotics and Automation, Optimization and Optimal Control, Machine Learning for Robot Control
Abstract: This paper presents an integrated Reinforcement Learning (RL) and Model Predictive Control (MPC) framework for autonomous satellite docking with a partially filled fuel tank. Traditional docking control faces challenges due to fuel sloshing in microgravity, which induces unpredictable forces affecting stability. To address this, we integrate Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) RL algorithms with MPC, leveraging MPC’s predictive capabilities to accelerate RL training and improve control robustness. The proposed approach is validated through Zero-G Lab experiments for planar stabilization and high-fidelity numerical simulations for 6-DOF docking with fuel sloshing dynamics. Results demonstrate that SAC-MPC achieves superior docking accuracy, higher success rates, and lower control effort, outperforming standalone RL and PPO-MPC methods. This study advances fuel-efficient and disturbance-resilient satellite docking, enhancing the feasibility of on-orbit refueling and servicing missions.
|
|
10:50-10:55, Paper TuAT27.5 | |
Federated Multi-Agent Mapping for Planetary Exploration |
|
Szatmari, Tiberiu-Ioan | Technical University of Denmark |
Cauligi, Abhishek | Johns Hopkins University |
Keywords: Space Robotics and Automation, Multi-Robot Systems, Mapping
Abstract: Multi-agent robotic exploration stands to play an important role in space exploration as the next generation of robotic systems ventures to far-flung environments. A key challenge in this new paradigm will be to effectively share and utilize the vast amount of data generated onboard while operating in the bandwidth-constrained regimes typical of space missions. Federated learning (FL) is a promising tool for bridging this gap. Drawing inspiration from the upcoming CADRE Lunar rover mission, we propose a federated multi-agent mapping approach that jointly trains a global map model across agents without transmitting raw data. Our method leverages implicit neural mapping to generate parsimonious, adaptable representations, reducing data transmission by up to 93.8% compared to raw maps. Furthermore, we enhance this approach with meta-initialization on Earth-based traversability datasets to significantly accelerate map convergence, reducing the iterations required to reach target performance by 80% compared to random initialization. We demonstrate the efficacy of our approach on Martian terrain and glacier datasets, achieving downstream path planning F1 scores as high as 0.95 while also outperforming baselines on map reconstruction loss.
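The aggregation step described above can be illustrated with a FedAvg-style sketch: only network parameters travel between agents, never raw map data. The flat parameter vectors, placeholder gradients, and sample counts below are assumptions for illustration, not the paper's actual training loop.

# Minimal FedAvg-style sketch: each rover trains its implicit map network
# locally and only weight vectors are exchanged. Network internals are
# abstracted to flat parameter arrays.
import numpy as np

def local_update(weights, local_data, lr=1e-2, steps=10):
    """Stand-in for local training of the implicit neural map on one rover."""
    w = weights.copy()
    for _ in range(steps):
        grad = np.random.randn(*w.shape) * 0.01  # placeholder gradient
        w -= lr * grad
    return w

def federated_round(global_w, agents_data, sample_counts):
    locals_w = [local_update(global_w, d) for d in agents_data]
    total = sum(sample_counts)
    # Weighted parameter average (FedAvg), weighted by local sample count.
    return sum(n / total * w for w, n in zip(locals_w, sample_counts))

global_w = np.zeros(1024)        # e.g. meta-initialized on Earth-based data
agents = [None, None, None]      # three rovers' local datasets (abstracted)
for rnd in range(5):
    global_w = federated_round(global_w, agents, sample_counts=[120, 80, 200])
print("aggregated map model norm:", np.linalg.norm(global_w))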
|
|
10:55-11:00, Paper TuAT27.6 | |
VDTF-ACT: ACT-Based Multimodal Space Fine Manipulation Method with Visual Depth Tactile Fusion |
|
Lang, Siyi | Northwestern Polytechnical University |
Chen, Jihang | Northwestern Polytechnical University |
Zhang, Bo | Shanghai Xiaoyuan Innovation Center |
Dong, Hanlin | Northwestern Polytechnical University |
Huang, Panfeng | Northwestern Polytechnical University |
Ma, Zhiqiang | Northwestern Polytechnical University |
Keywords: Space Robotics and Automation, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: Autonomous fine manipulation in space for orbital assembly continues to present a critical challenge in the field of aerospace engineering. Under low-gravity conditions, during satellite manipulator operations on free-floating objects, the absence of significant gravitational forces and friction constraints leads to unpredictable relative motions between the manipulator's end-effectors and objects, degrading manipulation performance. This study proposes a fine manipulation method for satellite robots with floating platforms, grounded in multimodal perception and an enhanced Action Chunking with Transformer (ACT) architecture that enables multimodal state interaction. By integrating visual, tactile, and depth sensory data, the satellite robot's space fine manipulation capabilities are substantially improved. In experiments conducted in simulated environments, the proposed method achieves a success rate exceeding 80% for peg-in-socket insertion tasks, outperforming conventional approaches whose success rate is approximately 45%. Project Website: https://github.com/LSY0528/VDTF-ACT.
|
|
TuAT28 |
104 |
Marine Robotics 1 |
Regular Session |
|
10:30-10:35, Paper TuAT28.1 | |
AVIP: Acoustic-Visual-Inertial-Pressure Fusion-Based Underwater Localization System with Multi-Centric Calibration |
|
Xue, Yuanbo | The Hong Kong Polytechnic University |
Hu, Yang | University College London |
Zhang, Dejin | Shenzhen University |
Wen, Chih-yung | The Hong Kong Polytechnic University |
Wang, Bing | University of Oxford |
Keywords: Marine Robotics, Localization, Sensor Fusion
Abstract: Underwater localization is a crucial capability for robust and accurate vehicle navigation. Although various well-developed localization systems exist, they focus primarily on ground and aerial applications. The challenges posed by underwater environments, such as sparse textures and dynamic disturbances, make multi-modal fusion a promising solution for localization. This paper presents AVIP, a localization method that fuses Acoustic, Visual, Inertial, and Pressure modalities for underwater applications. To integrate the information from all modalities during initialization, the visual and inertial modalities are alternately assigned as centric sensors to pairwise predict and update the estimations of the other modalities. The multi-centric calibration problem is addressed through factor graph optimization, fully integrated into the graph-based AVIP system as a calibration factor. The proposed method is evaluated against state-of-the-art approaches using semi-physical datasets recorded by a BlueROV2 robot and public real-world datasets. Extensive experiments demonstrate that AVIP achieves superior localization accuracy and exhibits adaptability across a range of sensor configurations.
|
|
10:35-10:40, Paper TuAT28.2 | |
OceanSim: A GPU-Accelerated Underwater Robot Perception Simulation Framework |
|
Song, Jingyu | University of Michigan |
Ma, Haoyu | University of Michigan |
Bagoren, Onur | University of Michigan |
Venkatramanan Sethuraman, Advaith | University of Michigan |
Zhang, Yiting | University of Michigan, Ann Arbor |
Skinner, Katherine | University of Michigan |
Keywords: Marine Robotics, Simulation and Animation, Field Robots
Abstract: Underwater simulators offer support for building robust underwater perception solutions. Significant work has recently been done to develop new simulators and to advance the performance of existing underwater simulators. Still, there remains room for improvement on physics-based underwater sensor modeling and rendering efficiency. In this paper, we propose OceanSim, a high-fidelity GPU-accelerated underwater simulator to address this research gap. We propose advanced physics-based rendering techniques to reduce the sim-to-real gap for underwater image simulation. We develop OceanSim to fully leverage the computing advantages of GPUs and achieve real-time imaging sonar rendering and fast synthetic data generation. We evaluate the capabilities and realism of OceanSim using real-world data to provide qualitative and quantitative results. The code and detailed documentation are made available on the project website to support the marine robotics community: https://umfieldrobotics.github.io/OceanSim.
|
|
10:40-10:45, Paper TuAT28.3 | |
Decentralized Declustering of Multiple Underactuated Autonomous Surface Vehicles: Managing Robot Swarms in the Field |
|
Strřmstad, Filip Traasdahl | Massachusetts Institute of Technology |
Benjamin, Michael | Massachusetts Institute of Technology |
Keywords: Marine Robotics, Swarm Robotics, Underactuated Robots
Abstract: Deploying a large number of autonomous vehicles is challenging, risky and often overlooked in the literature. These vehicles are typically deployed from a single location, and their underactuated nature, close proximity, and susceptibility to external disturbances make it difficult to achieve a mission-ready configuration without collisions. In this paper, we address the problem of transitioning a set of underactuated Autonomous Surface Vehicles (ASVs) from arbitrary and inconvenient initial conditions to a deconflicted set of deployed vehicles. We propose a decentralized and scalable method that assigns the vehicles to their target positions, generates optimal paths given minimum turning radii, and ensures collision avoidance between the vehicles. Performance is verified through simulation and extensive field trials. Results demonstrate that our approach improves the time to decluster by 58% compared to the current manual method. By improving efficiency and robustness while eliminating human involvement, this work streamlines ASV fleet deployments, enabling more effective multi-agent field operations.
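As a centralized stand-in for the assignment step of the declustering method (the paper's version is decentralized), the sketch below matches vehicles to deployment slots with the Hungarian algorithm; realizing each leg as a minimum-turning-radius Dubins path is left out, and the positions are synthetic.

# Sketch under stated assumptions: assign ASVs to deployment slots by
# minimizing total travel distance. A real system could use Dubins path
# length (respecting minimum turning radius) as the cost instead.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
starts = rng.uniform(0, 10, size=(6, 2))                        # clustered launch area
targets = np.array([[20.0, 2.0 + 4.0 * i] for i in range(6)])   # spread-out slots

cost = np.linalg.norm(starts[:, None, :] - targets[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)                        # Hungarian algorithm
for v, t in zip(rows, cols):
    print(f"ASV {v} -> slot {t}, straight-line distance {cost[v, t]:.1f} m")
print("total distance:", cost[rows, cols].sum())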
|
|
10:45-10:50, Paper TuAT28.4 | |
MCE-Based Direct FTC Method for Dynamic Positioning of Underwater Vehicles with Thruster Redundancy |
|
Li, Ji-Hong | Korea Institute of Robotics and Technology Convergence |
Lee, Munjik | Korea Institute of Robot & Convergence |
Kim, Min-gyu | Korea Institute of Robotics & Technology Convergence |
Kang, Hyungjoo | Korea Institute of Robot and Convergence |
Jin, Han-Sol | Korea Institute of Robotics & Technology Convergence |
Park, JungHyeun | Korea Institute of Robotics & Technology Convergence |
Cho, Gun Rae | Korea Institute of Robotics and Technology Convergence |
Keywords: Marine Robotics, Redundant Robots, Failure Detection and Recovery
Abstract: This paper presents an active, model-based FTC (fault tolerant control) method for the dynamic positioning of underwater vehicles with thruster redundancy. Unlike conventional approaches that rely heavily on state and parameter estimation, the proposed scheme directly utilizes the vehicle's motion control error (MCE) to construct a residual for detecting thruster faults and failures during steady-state operation of the control system. One of the primary challenges in thruster fault identification is the unavailability of the actual control input under fault conditions. However, a detailed and rigorous analysis of the MCE variation trends associated with thruster faults can extract valuable information about this unknown control input. This insight forms the foundation of the proposed FTC strategy. Control reconfiguration is straightforward, since the thrust losses can be directly estimated through the fault identification process. Numerical studies with a real-world vehicle model are carried out to validate the effectiveness of the proposed method.
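A hedged sketch of the residual idea: in steady state the motion control error should stay small, so a sustained rise in a filtered MCE norm can flag a thruster fault. The filter constant, threshold, and synthetic error sequences below are illustrative, not the paper's values.

# Illustrative MCE-based residual for fault detection.
import numpy as np

def mce_residual(errors, alpha=0.9):
    """Exponentially filtered norm of the motion control error sequence."""
    r, residuals = 0.0, []
    for e in errors:
        r = alpha * r + (1 - alpha) * np.linalg.norm(e)
        residuals.append(r)
    return np.array(residuals)

THRESHOLD = 0.15  # would be tuned from fault-free steady-state data
healthy = [np.random.randn(3) * 0.02 for _ in range(200)]
faulty = [np.random.randn(3) * 0.02 + np.array([0.2, 0.0, 0.0]) for _ in range(100)]
res = mce_residual(healthy + faulty)
alarm = np.argmax(res > THRESHOLD)  # first index exceeding the threshold
print("fault alarm at step:", alarm if res[alarm] > THRESHOLD else "none")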
|
|
10:50-10:55, Paper TuAT28.5 | |
Distributed Cooperative Target Tracking and Active Sensing of Dual-AUV Based on Flank Array Sonar Detection |
|
Qi, Qi | Harbin Engineering University |
Chen, Tao | Harbin Engineering University |
Jiang, Yiming | Harbin Engineering University |
Pan, Yanjie | Harbin Engineering University |
Keywords: Marine Robotics, Cooperating Robots, Sensor-based Control
Abstract: When tracking an underwater target, autonomous underwater vehicles (AUVs) need to estimate the target state based on the information detected by their sensors and plan their own tracking paths accordingly to achieve active sensing of the target. When the AUV is equipped with a flank array sonar, the problem becomes significantly more complex due to the sonar's limited field of view (FOV) and the fact that only bearing information is available for observation. To address this issue, this paper proposes a distributed solution for cooperative tracking and active sensing using dual-AUV systems equipped with flank array sonar for detection. Based on an analysis of underwater acoustic communication modes in dual-AUV systems, this study decomposes the problem into two aspects: cooperative estimation, and planning control for active sensing. Corresponding algorithms are proposed and their effectiveness is verified.
|
|
10:55-11:00, Paper TuAT28.6 | |
A Sim-To-Real Transfer Framework for Enhancing Marine Vehicle Performance in Ocean Environments |
|
Zheng, Ze | Shanghai University |
Wang, Zihao | Shanghai University |
Xie, Wenbo | Shanghai University |
Keywords: Marine Robotics, Transfer Learning, Calibration and Identification
Abstract: Reinforcement learning (RL) has gained attention for complex decision-making in uncertain environments. However, high costs and risks of real-world experimentation limit its direct application to marine vehicles. This motivates the use of simulation-based training and sim-to-real transfer techniques. Despite growing interest, a systematic understanding of how to design effective transfer strategies for marine contexts remains lacking. This paper presents a sim-to-real transfer framework tailored for marine vehicles, integrating high-fidelity, data-driven dynamics modeling with multi-factor domain randomization to address marine environmental uncertainties. Maneuvering data is utilized to extract nonlinear hydrodynamic characteristics of marine vehicles to enhance model realism. Additionally, domain randomization is explored across multiple environmental factors, including wind, wave, and current. To evaluate transferability, we construct a sim-to-sim platform with a pseudo-real environment that emulates the reality gap and adopt a path-following task using Soft Actor-Critic. We comprehensively assess the impacts of model fidelity and environmental randomization strategies on sim-to-real transfer performance. Results indicate that model accuracy positively impacts transfer performance, while aggressive domain randomization may reduce adaptability in calm conditions. Finally, a data-driven modeling and multi-factor randomization recipe is proposed for RL policy transfer in marine applications.
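The multi-factor domain randomization described above might look like the following per-episode sampling of wind, wave, and current parameters before each simulator rollout; all ranges are illustrative assumptions, not the paper's values.

# Minimal sketch of multi-factor domain randomization for marine RL training.
import numpy as np

rng = np.random.default_rng(42)

def sample_environment():
    return {
        "wind_speed":    rng.uniform(0.0, 8.0),       # m/s
        "wind_dir":      rng.uniform(0.0, 2 * np.pi),
        "wave_height":   rng.uniform(0.0, 1.5),       # m
        "wave_period":   rng.uniform(2.0, 8.0),       # s
        "current_speed": rng.uniform(0.0, 0.6),       # m/s
        "current_dir":   rng.uniform(0.0, 2 * np.pi),
    }

for episode in range(3):
    env = sample_environment()
    # simulator.reset(**env); run the SAC policy rollout here
    print(f"episode {episode}: {env}")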
|
|
11:00-11:05, Paper TuAT28.7 | |
Design and Dynamic Modeling Analysis of Undulatory Propulsion Underwater Robot with Rotational Passive Degrees of Freedom in Fin Rays |
|
Zhang, Tangjia | Xi'an Jiaotong University |
Hu, Qiao | Xi'an Jiaotong University |
Li, Shijie | Xi'an Jiaotong University |
Zeng, Yangbin | Xi'an Jiaotong University |
Zu, Siyu | Xi'an Jiaotong University |
Sun, Liangjie | Xi'an Jiaotong University |
Keywords: Marine Robotics, Biomimetics, Dynamics
Abstract: Current research on undulatory propulsion robots has predominantly centered on hydrodynamic performance simulations. However, challenges such as limited mobility and difficulties in parameter identification during underwater biomimetic motion remain unresolved. To address these issues, this study proposes a novel undulating fin robot featuring passive rotational joints, aiming to enhance motion capabilities and facilitate more accurate modeling. These joints enhance both the agility and stability of the robot's movements. Initially, the research develops models for the undulatory motion of the undulating fin and the rotational passive degrees of freedom in the fin rays. Based on fluid drag theory, a hydrodynamic model for undulating fin propulsion is constructed to analyze the thrust, lateral force, and lift generated at varying frequencies. Furthermore, a comprehensive dynamics model for the underwater motion of the biomimetic undulating fin robot is developed. Numerical simulations of the robot's non-steady-state motion are conducted to identify the hydrodynamic parameters of the model, thereby enabling the solution of the dynamic model. The experimental results demonstrate that the robot achieves an underwater straight-line motion speed exceeding 0.5 m/s, a turning speed of approximately 45°/s, and an inclined upward motion speed of 0.21 m/s. This study provides a novel approach for the design of underwater undulating fin robots and the resolution of kinematic models for underwater robots. It is hoped that this research can contribute to the further development of undulatory propulsion robot technology.
|
|
11:05-11:10, Paper TuAT28.8 | |
Coordinated Energy-Trajectory Economic Model Predictive Control for Autonomous Surface Vehicles under Disturbances |
|
Deng, Zhongqi | Hunan University |
Wang, Yuan | Hunan University |
Huang, Jian | Huazhong University of Science and Technology |
Zhang, Hui | Hunan University |
Wang, Yaonan | Hunan University |
Keywords: Marine Robotics, Software-Hardware Integration for Robot Systems, Underactuated Robots
Abstract: The paper proposes a novel Economic Model Predictive Control (EMPC) scheme for Autonomous Surface Vehicles (ASVs) to simultaneously address path following accuracy and energy constraints under environmental disturbances. By formulating lateral deviations as energy-equivalent penalties in the cost function, our method enables explicit trade-offs between tracking precision and energy consumption. Furthermore, a motion-dependent decomposition technique is proposed to estimate terminal energy costs based on vehicle dynamics. Compared with an existing EMPC method, simulations with real-world ocean disturbance data demonstrate that the controller reduces cross-track errors by up to 18.61% with only a 0.06% increase in energy consumption. Field experiments conducted on an ASV equipped with an Intel N100 CPU in natural lake environments validate practical feasibility, achieving 0.22 m average cross-track error at nearly 1 m/s and a 10 Hz control frequency. The proposed scheme provides a computationally tractable solution for ASVs operating under resource constraints.
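A minimal sketch of the economic stage cost described above, assuming a simplified quadratic propulsion-energy model; the conversion factor k_energy_per_m that turns cross-track deviation into an energy-equivalent penalty is a hypothetical design parameter, not the paper's formulation.

# Illustrative EMPC stage cost: propulsion energy plus an energy-equivalent
# penalty on lateral (cross-track) deviation, so accuracy and energy are
# traded off in a single economic objective.
import numpy as np

def stage_cost(cross_track_err, thrust, dt=0.1, k_energy_per_m=5.0):
    propulsion_energy = np.sum(thrust**2) * dt            # simplified energy model
    deviation_penalty = k_energy_per_m * abs(cross_track_err)
    return propulsion_energy + deviation_penalty

print(stage_cost(cross_track_err=0.3, thrust=np.array([4.0, 3.5])))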
|
|
TuAT29 |
105 |
SLAM 1 |
Regular Session |
|
10:30-10:35, Paper TuAT29.1 | |
SE(3)-Manifold Reinforcement Learning for Robust Extrinsic Calibration with Degenerate Motion Resilience |
|
Li, Baorun | Zhejiang University |
Zhu, Chengrui | Zhejiang University |
Du, Siyi | Zhejiang University |
Chen, Bingran | Zhejiang University |
Ren, Jie | Zhejiang University |
Wang, Wenfei | Zhejiang Guozi Robotics Technology Co., Ltd |
Liu, Yong | Zhejiang University |
Lv, Jiajun | Zhejiang University |
Keywords: SLAM, Calibration and Identification, Sensor Fusion
Abstract: Extrinsic calibration is essential for multi-sensor fusion, but existing methods rely on structured targets or fully-excited data, limiting real-world applicability. Online calibration further suffers from weak excitation, leading to unreliable estimates. To address these limitations, we propose a reinforcement learning (RL)-based extrinsic calibration framework that formulates extrinsic calibration as a decision-making problem and directly optimizes the SE(3) extrinsics to enhance odometry accuracy. Our approach leverages a probabilistic Bingham distribution to model 3D rotations, ensuring stable optimization while inherently retaining quaternion symmetry. A trajectory alignment reward mechanism enables robust calibration without structured targets by quantitatively evaluating the estimated tightly-coupled trajectory against a reference trajectory. Additionally, an automated data selection module filters uninformative samples, significantly improving efficiency and scalability for large-scale datasets. Extensive experiments on UAVs, UGVs, and handheld platforms demonstrate that our method outperforms traditional optimization-based approaches, achieving high-precision calibration even under weak excitation conditions. Our framework simplifies deployment across diverse robotic platforms by eliminating the need for high-quality initial extrinsics and enabling calibration from operational data. The code will be available at https://github.com/APRIL-ZJU/learn-to-calibrate.
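The trajectory alignment reward can be sketched as the negative RMSE between the fused trajectory produced by a candidate extrinsic and the reference trajectory; the synthetic trajectories below are placeholders, and the Bingham rotation model and odometry pipeline are not shown.

# Illustrative trajectory-alignment reward for the RL calibration loop.
import numpy as np

def alignment_reward(est_traj, ref_traj):
    """est_traj, ref_traj: (N, 3) position sequences, assumed time-synchronized."""
    rmse = np.sqrt(np.mean(np.sum((est_traj - ref_traj) ** 2, axis=1)))
    return -rmse  # RL maximizes reward, so lower trajectory error is better

ref = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)
est = ref + np.random.randn(100, 3) * 0.02  # trajectory from a candidate extrinsic
print("reward:", alignment_reward(est, ref))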
|
|
10:35-10:40, Paper TuAT29.2 | |
Hyla-SLAM: Toward Maximally Scalable 3D LiDAR-Based SLAM Using Dynamic Memory Management and Behavior Trees |
|
Swanbeck, Steven | The University of Texas at Austin |
Pryor, Mitchell | University of Texas |
Keywords: Mapping, SLAM, Software Architecture for Robotic and Automation
Abstract: Solving the Simultaneous Localization and Mapping (SLAM) problem is essential for most mobile robotics applications that do not have an a priori environment representation. The SLAM problem is well-studied, with previous works demonstrating impressive results with a variety of robots, sensors, and environments. However, despite the widespread need for SLAM solutions across a broad spectrum of robotics disciplines and applications, it remains challenging to quickly and easily scale existing solutions to new robots, environments, and tasks. In this paper, we address this problem by introducing Hyla-SLAM, a framework for 3D LiDAR-based SLAM that uses dynamic memory management to efficiently create and manage maps of environments of arbitrary size and density. Hyla-SLAM also scales to diverse systems and applications by using behavior trees to maximize runtime flexibility and extensibility. We demonstrate the scalability of Hyla-SLAM in experiments using datasets collected in North America, Europe, and Asia to generate a single, unified global-scale map thousands of kilometers across that can be efficiently accessed and expanded. We also show experiments using the behavior tree interface to make robot- or task-informed modifications that enable deployment on heterogeneous robots with varying system constraints. These results demonstrate the framework’s ability to efficiently create and manage huge maps while generalizing to a wide range of systems and applications with minimal reconfiguration. We release Hyla-SLAM’s code implementation open-source.
|
|
10:40-10:45, Paper TuAT29.3 | |
LiDAR-Inertial Odometry in Dynamic Driving Scenarios Using Label Consistency Detection |
|
Yuan, Zikang | Huazhong University, Wuhan, 430073, China |
Wang, Xiaoxiang | Huazhong University of Science and Technology |
Wu, Jingying | Huazhong University of Science and Technology |
Cheng, Junda | Huazhong University of Science and Technology |
Yang, Xin | Huazhong University of Science and Technology |
Keywords: SLAM, Localization, Sensor Fusion
Abstract: In this paper, a LiDAR-inertial odometry (LIO) method that eliminates the influence of moving objects in dynamic driving scenarios is proposed. This method constructs binarized labels for the 3D points of the current sweep, and utilizes the label difference between each point and its surrounding points in the global map to identify moving objects. The surrounding points in the global map are localized by voxel-location-based nearest neighbor search, without involving any massive computations. In addition, the proposed method is embedded into a LIO system (i.e., Dynamic-LIO), and achieves state-of-the-art performance on public datasets with extremely low computational overhead (i.e., 1–9 ms/sweep). We have released the source code of this work for the development of the community.
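A rough sketch of the label-consistency test, assuming binary per-point labels and a voxel hash over the global map; the voxel size, the majority vote, and the helper names are illustrative, not the paper's exact formulation.

# Illustrative label-consistency check: a point in the current sweep is
# flagged as dynamic when its label disagrees with the majority label of
# map points sharing its voxel. The voxel hash replaces any expensive
# nearest-neighbor search.
import numpy as np
from collections import defaultdict

VOXEL = 0.5  # m, assumed voxel size

def voxel_key(p):
    return tuple(np.floor(p / VOXEL).astype(int))

# Global map: voxel -> binary labels of the points stored there.
map_labels = defaultdict(list)
for p, lbl in zip(np.random.rand(1000, 3) * 20, np.zeros(1000, dtype=int)):
    map_labels[voxel_key(p)].append(lbl)

def is_dynamic(point, label):
    labels = map_labels.get(voxel_key(point))
    if not labels:
        return False                         # unseen space: keep the point
    return label != round(np.mean(labels))   # disagreement with map majority

print(is_dynamic(np.array([1.0, 1.0, 1.0]), label=1))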
|
|
10:45-10:50, Paper TuAT29.4 | |
CGS-SLAM: Compact 3D Gaussian Splatting for Dense Visual SLAM |
|
Deng, Tianchen | Shanghai Jiao Tong University |
Chen, Yaohui | Shanghai Jiao Tong University |
Yang, Jianfei | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Liu, Jiuming | Shanghai Jiao Tong University |
Wang, Danwei | Nanyang Technological University |
Chen, Weidong | Shanghai Jiao Tong University |
Keywords: SLAM, Deep Learning for Visual Perception, Mapping
Abstract: Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs and slow training speed. To address this limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then, a novel geometry codebook-based quantization method is proposed to further compress 3D Gaussian geometric attributes. Robust and accurate pose estimation is achieved by a local-to-global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training, rendering speed, and low memory usage while maintaining the state-of-the-art (SOTA) quality of the scene representation.
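The geometry codebook idea can be sketched with plain k-means vector quantization over per-Gaussian scale attributes, so each ellipsoid stores a small codebook index instead of full floats; the codebook size, synthetic attributes, and storage accounting below are assumptions, with k-means standing in for the paper's learned codebook.

# Illustrative codebook quantization of Gaussian geometric attributes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
scales = np.abs(rng.normal(0.05, 0.02, size=(5000, 3)))  # per-Gaussian 3D scales

codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(scales)
indices = codebook.predict(scales)               # one small integer per Gaussian
reconstructed = codebook.cluster_centers_[indices]

orig_bytes = scales.astype(np.float32).nbytes
quant_bytes = (indices.astype(np.uint8).nbytes
               + codebook.cluster_centers_.astype(np.float32).nbytes)
print(f"storage: {orig_bytes} B -> {quant_bytes} B, "
      f"mean abs error {np.abs(scales - reconstructed).mean():.4f}")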
|
|
10:50-10:55, Paper TuAT29.5 | |
Leveraging Semantic Graphs for Efficient and Robust LiDAR SLAM |
|
Wang, Neng | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Zheng, Zhiqiang | National University of Defense Technology |
Liu, Yunhui | Chinese University of Hong Kong |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: SLAM, Localization
Abstract: Accurate and robust simultaneous localization and mapping (SLAM) is crucial for autonomous mobile systems, typically achieved by leveraging the geometric features of the environment. Incorporating semantics provides a richer scene representation that not only enhances localization accuracy in SLAM but also enables advanced cognitive functionalities for downstream navigation and planning tasks. Existing point-wise semantic LiDAR SLAM methods often suffer from poor efficiency and generalization, making them less robust in diverse real-world scenarios. In this paper, we propose a semantic graph-enhanced SLAM framework, named SG-SLAM, which effectively leverages the geometric, semantic, and topological characteristics inherent in environmental structures. The semantic graph serves as a fundamental component that facilitates critical functionalities of SLAM, including robust relocalization during odometry failures, accurate loop closing, and semantic graph map construction. Our method employs a dual-threaded architecture, with one thread dedicated to online odometry and relocalization, while the other handles loop closure, pose graph optimization, and map updating. This design enables our method to operate in real time and generate globally consistent semantic graph maps and point cloud maps. We extensively evaluate our method across the KITTI, MulRAN, and Apollo datasets, and the results demonstrate its superiority compared to state-of-the-art methods. Our method has been released at https://github.com/nubot-nudt/SG-SLAM.
|
|
10:55-11:00, Paper TuAT29.6 | |
Maximum Clique-Based Floorplan Association for Robust Multi-Session Stereo SLAM in Challenging Indoor Environments |
|
Wang, Haolin | Institute of Automation, Chinese Academy of Sciences |
Wei, Hao | University of Chinese Academy of Sciences |
Lv, Zeren | Beijing University of Chemical Technology |
Zhu, Haijiang | Beijing University of Chemical Technology |
Wu, Yihong | National Laboratory of Pattern Recognition, InstituteofAutomatio |
Keywords: SLAM, Localization
Abstract: Existing multi-session visual simultaneous localization and mapping (SLAM) systems struggle severely to achieve robust localization and map merging under extreme viewpoint and illumination variations, particularly when handling completely opposite viewpoints and drastic day-night lighting changes. These challenges stem largely from the limited viewpoint/illumination invariance of conventional low-level visual features and their inability to capture a global structural context. In this paper, we make the critical observation that a life-long floorplan not only encodes rich geometric and semantic information—serving as a robust high-level structural representation—but is also inherently more robust to severe viewpoint and illumination variations than purely visual data. Building on this insight, we propose a novel hierarchical framework for multi-session SLAM that integrates a floorplan-based map as a global feature to achieve robust indoor localization and map merging under drastic viewpoint and illumination shifts. In particular, we innovatively formulate floorplan association as a maximum clique problem augmented with trajectory data to achieve robust floorplan-level global localization. We further introduce a novel coarse-to-fine localization and map merging strategy that seamlessly integrates floorplan alignment, multi-stage point cloud registration, and feature matching, fully leveraging the macro-level stability of global features and the micro-level precision of local features to achieve keyframe-level fine localization. Extensive experiments on both public and self-collected datasets demonstrate that our method consistently outperforms state-of-the-art approaches reliant solely on low-level visual or geometric features. Crucially, it delivers superior accuracy and robustness even in the face of completely opposite viewpoints and extreme day–night illumination changes. This work underscores the promise of fusing macro-level floorplan representations with conventional SLAM frameworks to advance long-term, robust indoor localization and map merging under the most challenging conditions.
|
|
11:00-11:05, Paper TuAT29.7 | |
THE-SEAN: A Heart Rate Variation-Inspired Temporally High-Order Event-Based Visual Odometry with Self-Supervised Spiking Event Accumulation Networks |
|
Xiong, Chaoran | Shanghai Jiao Tong University |
Wei, Litao | ShangHai JiaoTong University |
Ma, Kehui | Shanghai Jiao Tong University |
Sun, Zhen | Shanghai Jiao Tong University |
Xiang, Yan | Shanghai Jiao Tong University |
ZiHan, Nan | Beijing Institute of Aerospace Control Devices |
Truong, Trieu-Kien | Shanghai Jiao Tong University |
Pei, Ling | Shanghai Jiao Tong University |
Keywords: SLAM, Bioinspired Robot Learning, Reinforcement Learning
Abstract: Event-based visual odometry has recently gained attention for its high accuracy and real-time performance in fast-motion systems. Unlike traditional synchronous estimators that rely on constant-frequency (zero-order) triggers, event-based visual odometry can actively accumulate information to generate temporally high-order estimation triggers. However, existing methods primarily focus on adaptive event representation after estimation triggers, neglecting the decision-making process for efficient temporal triggering itself. This oversight leads to computational redundancy and noise accumulation. In this paper, we introduce a temporally high-order event-based visual odometry with spiking event accumulation networks (THE-SEAN). To the best of our knowledge, it is the first event-based visual odometry capable of dynamically adjusting its estimation trigger decision in response to motion and environmental changes. Inspired by biological systems that regulate hormone secretion to modulate heart rate, a self-supervised spiking neural network is designed to generate estimation triggers. This spiking network extracts temporal features to produce triggers, with rewards based on block matching points and the Fisher information matrix (FIM) trace acquired from the estimator itself. Finally, THE-SEAN is evaluated across several open datasets, demonstrating average improvements of 13% in estimation accuracy, 9% in smoothness, and 38% in triggering efficiency compared to state-of-the-art methods.
|
|
11:05-11:10, Paper TuAT29.8 | |
Parallel Self-Assembly for a Multi-USV System on Water Surface with Obstacles (I) |
|
Zhang, Lianxin | The Chinese University of Hong Kong, Shenzhen |
Huang, Yihan | Stevens Institute of Technology |
Cao, Zhongzhong | Chinese University of Hong Kong(Shenzhen) |
Jiao, Yang | University of California San Diego |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Multi-Robot SLAM, Assembly, Collision Avoidance
Abstract: Parallel self-assembly is an efficient approach to accelerate the assembly process for modular robots. However, existing approaches cannot accommodate complicated environments with obstacles, which restricts their applications. In previous work, we considered surrounding stationary obstacles and proposed a parallel self-assembly planning algorithm. With this algorithm, modular robots can avoid immovable obstacles when performing docking actions, which adapts the parallel self-assembly process to complex scenes. The algorithm was simulated in 25 distinct maps with different obstacle configurations and achieved a success rate of more than 80%, significantly higher than existing parallel self-assembly algorithms. For verification in real-world applications, in this paper we develop a multi-agent hardware testbed system. The algorithm is successfully deployed on four omnidirectional unmanned surface vehicles, CuBoats. The navigation strategy that translates the high-level discrete plan to the continuous controller on the CuBoats is presented. The algorithm's feasibility and flexibility are demonstrated through successful self-assembly experiments on 5 maps with varying obstacle configurations.
|
|
TuAT30 |
106 |
Aerial Perception 1 |
Regular Session |
|
10:30-10:35, Paper TuAT30.1 | |
RSSS: Robust Structural Semantic Segmentation for Autonomous Drone Delivery to Door |
|
Xia, Shengqing | Purdue University |
Du, Jiaxin | Purdue University |
Peng, Chunyi | Purdue University |
Keywords: Aerial Systems: Perception and Autonomy, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Autonomous drone delivery to door relies on a popular computer vision technique called semantic segmentation (SS) to recognize meaningful house segments and determine a precise drop-off point near the door. While this SS-based approach is effective in specific environments, it fails to perform well under other common environmental conditions, such as different seasons, times of day, weather, and illumination levels. In this work, we propose Robust Structural Semantic Segmentation (RSSS), a novel patch to the existing SS solution that requires no re-training for new environments. The core idea is to “Let Strong Help Weak”, where the results of semantic segmentation obtained under favorable/strong conditions are utilized to enhance the weaker ones in adverse settings. This improvement is achieved by leveraging house structures and spatial layouts, which remain largely invariant across various environments. Our evaluation shows that RSSS outperforms state-of-the-art methods and significantly enhances the robustness of SS and drone delivery across various environments. The dataset collected for this study is released on GitHub.
|
|
10:35-10:40, Paper TuAT30.2 | |
EDSOD: An Encoder-Decoder, Diffusion-Model, and Swin-Transformer-Based Small Object Detector |
|
Li, Junnian | Beijing Univ. of Chemical Tech |
Zhou, MengChu | New Jersey Institute of Technology |
Cao, Zhengcai | Harbin Institute of Technology |
Keywords: Aerial Systems: Perception and Autonomy, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Small object detection (SOD) in aerial images suffers from an information imbalance across different feature scales, which makes accurate SOD extremely challenging. Existing methods, e.g., Feature Pyramid Network (FPN)-based algorithms, focus on extracting high-resolution and low-resolution semantic features from different convolution layers. However, in deeper convolution layers, semantic feature misalignment and the loss of key information are inevitable. To tackle these issues, this work proposes a new encoder-decoder-based SOD framework with a Diffusion Model and Swin Transformer, called EDSOD for short. First, we reformulate the SOD task as a Noise-to-Box process. We then construct an encoder-decoder framework using a diffusion model and Swin Transformer for dynamic bounding box generation. We introduce a decoupled training and inference strategy to recognize and locate small objects accurately. Finally, we evaluate the proposed framework on several public benchmarks. The experimental results show that it outperforms state-of-the-art methods in SOD performance.
|
|
10:40-10:45, Paper TuAT30.3 | |
Rapid Flight Trajectory Planning for Autonomous Terrain Avoidance Via Generative Learning |
|
Cetin, Ahmet Talha | Istanbul Technical University |
Aldabbas, Samer Raed | Turkish Aerospace Industries, Istanbul Technical University |
Abu-Khalaf, Murad | Massachusetts Institute of Technology |
Koyuncu, Emre | Istanbul Technical University |
Keywords: Aerial Systems: Perception and Autonomy, Human Factors and Human-in-the-Loop, Motion and Path Planning
Abstract: Ensuring aircraft safety against terrain collisions in complex and dynamic environments remains a critical challenge in aviation. To address this, a parallel autonomy system is proposed that can take control from a human pilot to prevent a controlled-flight-into-terrain collision. The proposed system operates in the background, continuously maintaining a forward-looking motion plan that can be executed immediately if a terrain collision is projected to happen absent its timely intervention. Terrain avoidance motion plans are rapidly generated based on the aircraft's current state vector and a Digital Elevation Model of the surrounding terrain. The planning process involves two main steps: first, a sampling-based motion planner leverages prior knowledge acquired through generative adversarial learning to bias the search toward escape paths within the most favorable regions of Cartesian space. Second, differential flatness of the aircraft model is utilized to ensure the dynamic feasibility of the associated Cartesian-space escape trajectory and to flatten it into a state-control trajectory. This converts the output tracking problem in Cartesian space into a ready-to-invoke state-feedback control.
|
|
10:45-10:50, Paper TuAT30.4 | |
Rotation-Equivariant Robot Vision: A Perspective Via Correspondence-Matching and Pre-Training |
|
Su, Shuai | Tongji University, China |
Pan, Xianghui | Tongji University |
Du, Jiayuan | Tongji University |
Liu, Chengju | Tongji University |
Chen, Qijun | Tongji University |
Keywords: Aerial Systems: Perception and Autonomy, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Correspondence matching is a fundamental and crucial task in robot vision. In recent years, deep learning-based keypoint matching techniques have shown outstanding performance in downstream tasks. Conventional learning-based correspondence matching methods rely on large datasets and a specific training procedure. Correspondence techniques based on pre-trained features have been preliminarily explored by researchers. Unfortunately, traditional convolutional neural networks only possess translation invariance and lack rotational invariance; hence, their performance suffers significantly under heavy rotations. Therefore, we propose a correspondence matching method based on pre-trained group-equivariant neural networks and compare the performance of various rotation-equivariant and rotation-invariant transformers. We conducted experiments on the Rotated-Hpatches and Rotated-MegaDepth datasets, and the results indicate that our proposed method is concise and effective, achieving state-of-the-art performance without the need for retraining in downstream tasks.
|
|
10:50-10:55, Paper TuAT30.5 | |
FIELD: Fast Information-Driven Autonomous Exploration Using Larger Perception Distance |
|
Zhang, Yuefeng | Huazhong University of Science and Technology |
Yang, Fan | Huazhong University of Science and Technology |
Yuan, Nanjun | Huazhong University of Science and Technology |
Tao, Wenbing | Huazhong University of Science & Technology |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Task and Motion Planning
Abstract: Autonomous exploration is a critical challenge for various unmanned aerial vehicle (UAV) applications. Existing methods often suffer from low exploration rates due to limitations such as inefficient global coverage and inadequate sensor data utilization. In this paper, we introduce FIELD, a Fast Information-driven aerial robot Exploration planner using Larger perception Distance. FIELD leverages a larger perception distance to identify high-information-gain viewpoints while maintaining mapping precision and utilizing more sensor data to guide the exploration process. Then, the method incorporates a history-aware coverage path to determine a consistent and reasonable sequence for visiting frontier viewpoints. Local viewpoints are refined to find the optimal combination of these viewpoints. We compare our method with state-of-the-art frontier-based approaches in benchmark environments. Our method shows a 13% to 17% improvement in exploration efficiency.
|
|
10:55-11:00, Paper TuAT30.6 | |
High-Fidelity Integrated Aerial Platform Simulation for Control, Perception, and Learning (I) |
|
Du, Jianrui | Beijing Institute of Technology |
Wang, Kaidi | Beijing Institute of Technology |
Fan, Yingjun | Beijing Institute of Technology |
Lai, Ganghua | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Keywords: Simulation and Animation, Aerial Systems: Perception and Autonomy, SLAM
Abstract: This paper presents a simulator framework tailored to Integrated Aerial Platforms (IAPs) built from multiple quadrotors. Our framework prioritizes photographic and contact fidelity, achieved through a modular design that balances rendering and dynamics computation. Key features include: i) support for diverse IAP configurations; ii) a customizable physics engine for realistic motion and contact simulation for aerial manipulation; and iii) Unreal Engine 5 for lifelike rendering, with sensor designs for visual-inertial SLAM positioning simulation. We showcase our framework's versatility through a range of scenarios, including trajectory tracking for both fully and under-actuated IAPs, peg-in-hole and direct wrench control tasks under external wrench influence, tightly-coupled SLAM positioning with physical constraints, and air docking task training and testing using offline-to-online reinforcement learning. Furthermore, we validate our simulator framework's fidelity by comparing results with real flight data for trajectory tracking and direct wrench control tasks. Our simulator framework promises to be valuable for developing and testing integrated aerial platform systems for aerial manipulation. Note to Practitioners—Motivated by the demand for effective simulation tools for Integrated Aerial Platforms (IAPs), this research addresses a significant gap in the availability of comprehensive simulation platforms designed to meet their unique challenges. This paper presents a high-fidelity simulation platform tailored specifically for IAPs, supporting a variety of configurations and capabilities. The platform not only generates high-fidelity image data and facilitates contact simulation but also serves as a vital resource for advancing perception, control, and learning for IAPs. By offering a robust simulation environment, this work aims to bridge the divide between theoretical research and practical applications, ultimately driving advancements in the field of aerial robotics.
|
|
11:00-11:05, Paper TuAT30.7 | |
Efficient and Precise Drone Rephotography for Video Sequences |
|
Xu, Hao-Liang | National Yang Ming Chiao Tung University |
Chi, Chu-Chun | National Yang Ming Chiao Tung University |
Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
Keywords: Aerial Systems: Perception and Autonomy, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Precise drone rephotography technology aims to recover camera poses from a reference sequence and obtain well-aligned image sequences, playing a crucial role in autonomous drone inspection tasks. However, existing rephotography methods rely on static image inputs, resulting in low efficiency and limited applicability in real-world scenarios. This paper presents a novel precise drone rephotography system that operates on video sequences. To the best of our knowledge, this is the first work to extend precise drone rephotography from still images to videos while significantly reducing rephotography time. The proposed approach integrates advanced visual SLAM techniques with a dense flow prediction model to continuously refine the drone's pose, enabling robust and precise rephotography tasks. To further quantify system performance, we introduce a trajectory-based visual similarity evaluation standard, Dynamic Frame Alignment Error (DFAE), which assesses the visual similarity of drone-captured videos of varying durations. We conducted multiple experiments with drones in real-world scenarios. Experimental results demonstrate that the proposed system achieves efficient and precise rephotography across multiple indoor and outdoor trials. Specifically, the average rephotography error is only 7.956 pixels indoors and 9.800 pixels outdoors. More importantly, the rephotography time is only half that of the baseline.
|
|
TuBT1 |
401 |
Award Finalists 2 |
Regular Session |
Co-Chair: Laschi, Cecilia | National University of Singapore |
|
13:20-13:25, Paper TuBT1.1 | |
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields |
|
Meyer, Lukas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Ardelean, Andrei-Timotei | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Weyrich, Tim | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Stamminger, Marc | Universität Erlangen-Nürnberg |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry
Abstract: We introduce FruitNeRF++, a novel fruit-counting approach that combines contrastive learning with neural radiance fields to count fruits from unstructured input photographs of orchards. Our work is based on FruitNeRF, which employs a neural semantic field combined with a fruit-specific clustering approach. The requirement for adaptation for each fruit type limits the applicability of the method, and makes it difficult to use in practice. To lift this limitation, we design a shape-agnostic multi-fruit counting framework, that complements the RGB and semantic data with instance masks predicted by a vision foundation model. The masks are used to encode the identity of each fruit as instance embeddings into a neural instance field. By volumetrically sampling the neural fields, we extract a point cloud embedded with the instance features, which can be clustered in a fruit-agnostic manner to obtain the fruit count. We evaluate our approach using a synthetic dataset containing apples, plums, lemons, pears, peaches, and mangoes, as well as a real-world benchmark apple dataset. Our results demonstrate that FruitNeRF++ is easier to control and compares favorably to other state-of-the-art methods.
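The fruit-agnostic clustering step might be sketched as density-based clustering over points carrying both position and instance-embedding features, with the number of clusters giving the fruit count; the synthetic blobs and DBSCAN parameters below are illustrative assumptions, not the paper's pipeline.

# Illustrative counting step: cluster points in the joint (position,
# instance-embedding) space; one cluster per fruit, regardless of shape.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Synthetic stand-in: 3 fruits, each a blob of points with a distinct embedding.
points, embeds = [], []
for i in range(3):
    points.append(rng.normal(loc=i * 1.0, scale=0.03, size=(200, 3)))
    embeds.append(np.tile(rng.normal(size=4), (200, 1))
                  + rng.normal(scale=0.01, size=(200, 4)))
features = np.hstack([np.vstack(points), np.vstack(embeds)])

labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(features)
fruit_count = len(set(labels)) - (1 if -1 in labels else 0)  # ignore noise label
print("estimated fruit count:", fruit_count)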
|
|
13:25-13:30, Paper TuBT1.2 | |
Interdigitated Electrodes for Selective Stimulation of Skeletal Muscle Actuators in Biosyncretic Robots |
|
Yang, Lianchao | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Chuang | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Qi | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Yiwei | Chinese Academy of Sciences |
Qin, Hengshen | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Biologically-Inspired Robots, Biomimetics, Soft Sensors and Actuators
Abstract: Engineered skeletal muscle tissues (SMTs) are ideal driving units for achieving fine movements in biosyncretic robots due to their excellent controllability and potentially large driving force. However, the selective stimulation of SMTs continues to pose a significant technical challenge. In this study, we propose a method for the selective stimulation of 3D SMTs using thin-film interdigitated electrodes (IDEs). By optimizing the IDE geometry through finite element simulations, the electric field intensity at the electrode fingertips is effectively reduced. The thin-film IDEs are fabricated on a polyester (PET) substrate using screen printing technology and successfully enable selective activation and controlled contraction of the SMTs. Compared to conventional parallel-plate electrodes (PPEs) and rod-shaped electrodes (RSEs), the IDEs significantly improve the electric field distribution and enhance spatial resolution. This advancement provides a promising new approach for achieving high-precision motion control in biosyncretic (or biohybrid) robots.
|
|
13:30-13:35, Paper TuBT1.3 | |
Human-Guided Robotic-Assistance Handheld Continuum Medical Robot System |
|
Wang, Fei | ZheJiang University |
Luo, Changhao | ZheJiang University |
Zhao, Zexi | Zhejiang University |
Xiang, Pingyu | Zhejiang University |
Qiu, Ke | Zhejiang University |
Wei, Yufei | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Lu, Haojian | Zhejiang University |
Keywords: Biologically-Inspired Robots, Automation at Micro-Nano Scales, Medical Robots and Systems
Abstract: Laparoscopic surgical procedures currently face a trade-off between expensive, complex robotic systems and manual instruments with limited functionality. Fully robotic solutions offer precision but lack haptic feedback, portability, and intuitive control, while manual tools rely solely on the surgeon's dexterity, limiting maneuverability and depth perception in confined spaces. To bridge this gap, we propose a Human-Guided Robotic-Assistance Handheld Continuum Medical Robot System (HRHC). This system combines intuitive manual operation with robotic precision, extending the surgeon's capabilities while maintaining portability. Additionally, a stereo vision system enhances real-time depth perception, improving spatial awareness in minimally invasive procedures.
|
|
13:35-13:40, Paper TuBT1.4 | |
Evaluation and Analysis of Precision Leaf Pruning End-Effectors within Dense Foliage Agriculture |
|
Barthelme, Quinlan Teagan | Queensland University of Technology |
Lehnert, Christopher | Queensland University of Technology |
Parayil, Nidhi | QUT Centre for Robotics |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Grippers and Other End-Effectors
Abstract: Physical interaction tasks within dense foliage, such as leaf pruning and fruit harvesting, are current challenges for agricultural robotics. This is due to the cluttered and unstructured environment these tasks are conducted within, with complex stem structures providing a number of obstacles that obscure and constrain end-effector operational workspaces. Therefore, enabling robots to operate within a dense foliage environment requires a purpose-built end-effector able to perform precision-based tasks despite workspace challenges. Whilst many tools have been implemented within related literature, most prior work evaluates their operational performance within reduced-foliage testing environments. As such, this paper presents the performance evaluation of three end-effector mechanisms developed for robotic leaf pruning operations within unaltered dense foliage, referred to as the scissor-cutter, curved-cutter and vacuum-cutter. End-effector mechanisms were chosen based on compact tool shapes, target approach direction range and deployment within related literature. Evaluation criteria focused on the mechanisms' operational success rate and damage caused to the plant. We show that the vacuum-cutter achieved the highest success rate of 75%, whilst the scissor-cutter caused the least plant damage. A comprehensive failure mechanism assessment and improvement recommendations for future prototypes are also provided.
|
|
13:40-13:45, Paper TuBT1.5 | |
On the Design of Fast-Response Variable-Stiffness Continuum Robot with Electro-Permanent Magnet-Based Ball Joints |
|
Lee, Taerim | Korea Institute of Science and Technology |
Lee, Han-Sol | Korea Institute of Science and Technology |
Song, Changseob | Carnegie Mellon University |
Hwang, Donghyun | Korea Institute of Science and Technology |
Keywords: Actuation and Joint Mechanisms, Mechanism Design, Flexible Robotics
Abstract: Continuum robots offer exceptional adaptability and dexterity for tasks in confined and complex spaces. However, their flexible structures inherently limit payload capacity and structural robustness. To overcome this trade-off, many previous studies have proposed variable stiffness continuum robots, but these robots have slow response times that hinder rapid tasks. In this regard, we propose a novel continuum robot with electro-permanent magnet (EPM)-based ball joints. This robot has two key components: an EPM actuator and a variable stiffness ball joint (VSBJ). The EPM-embedded VSBJ constitutes discrete segments of the robot and dynamically transitions between low- and high-stiffness states by controlling the EPM’s magnetic field. This provides the robot with fast-response and variable-stiffness capabilities. In experiments, the VSBJ achieved a 205-fold stiffness variation ratio with a response time of 0.019 s, surpassing the performance of many previously studied robots. Leveraging these capabilities, the proposed robot demonstrated single or multiple bending motions by controlling the stiffness of discrete segments. Finally, the potential of the fast-response variable-stiffness continuum robot for real-world applications was confirmed.
|
|
13:45-13:50, Paper TuBT1.6 | |
TFRR: A Novel Tensegrity-Based Fracture Reduction Robot with Force Sensing |
|
Cui, Chenguang | University of Electronic Science and Technology of China |
Wei, Dunwen | University of Electronic Science and Technology of China |
Ficuciello, Fanny | Università di Napoli Federico II |
Keywords: Medical Robots and Systems, Bioinspired Robot Learning, Tendon/Wire Mechanism
Abstract: This paper proposes a novel Tensegrity-Based Fracture Reduction Robot (TFRR) designed to enhance the safety and efficacy of orthopedic procedures through integrated force-sensing and control capabilities. Inspired by the biomechanics of skeletal muscles, the robot adopts a tensegrity architecture that enables real-time monitoring of internal force distribution and dynamic adjustment of posture and inter-bone contact forces via controlled tensioning of its string network. To establish a theoretical foundation for system control, a comprehensive static analysis of the tensegrity structure is conducted, allowing accurate modulation of topological configurations through systematic tension control. Extensive experimental validation demonstrates the robustness and reliability of the proposed method across a range of operating conditions. In particular, targeted experiments on contact-force regulation confirm the robot’s ability to precisely monitor and adjust inter-bone forces during fracture reduction. These features collectively enable safer, more controlled surgical interventions, with the potential to reduce tissue trauma and improve clinical outcomes.
|
|
13:50-13:55, Paper TuBT1.7 | |
A Ribbed Hybrid Rigid-Flexible Tail with Graded Stiffness and Anisotropic Friction for Enhanced Robot Locomotion and Fall Damage Prevention |
|
Borijindakul, Pongsiri | Vidyasirimedhi Institute of Science & Technology |
Khaheshi, Ali | London South Bank University |
Phetpoon, Theerawath | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Rajabi, Hamed | London South Bank University |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Biologically-Inspired Robots, Climbing Robots, Mechanism Design
Abstract: Lizards are capable of climbing stably on various terrains. Their tails are key to this ability. The lizard uses its flexible tail with graded stiffness as a fifth limb and climbing aid. The tail also enables soft landings, preventing injury from falls. Inspired by this, tails have been incorporated into many climbing robots to enhance their mobility, mimicking lizards. These robotic tails are generally classified as either rigid (stiff) or flexible (soft). A rigid tail can provide a large preload for pitch-back prevention but has a limited contact area for surface adhesion to avoid sliding backward on slopes. In contrast, a flexible tail conforms to the terrain’s contours, increasing the contact area and thereby improving surface adhesion. However, it provides limited preload. Therefore, in this study, we propose a novel hybrid rigid-flexible robotic tail (HIFLEX) that achieves a balanced combination of preload and contact area. The tail structure design features double-sided inclined ribs and is divided into three modular segments (base, middle, and tip), with graded stiffness decreasing progressively from the base to the tip. The asymmetric (inclined) ribbed structure allows the tail to generate anisotropic friction, resulting in high adhesion (tail-to-surface attachment) to prevent backward sliding and low friction (tail-to-surface release) to facilitate upward climbing. The proposed tail is attached to a climbing robot via an actuator capable of pressing the tail downward to generate sufficient preload. The experimental results demonstrate that this unique tail enhances the robot’s climbing performance on rough and deformable slopes while preventing damage to the robot during falls.
|
|
13:55-14:00, Paper TuBT1.8 | |
Design and Geometry-Aware Planning of a Novel Probe-Scanning Manipulator with RCM Constraint |
|
Luo, Xiao | The Chinese University of Hong Kong |
Jiang, Zixing | The Chinese University of Hong Kong |
Lei, Man Cheong | The Chinese University of Hong Kong |
Xian, Yitian | The Chinese University of Hong Kong |
Hu, Yingbai | Technische Universität München |
Dong, Ai | The Chinese University of Hong Kong |
Chiu, Peter Ka Fung | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Li, Zheng | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Mechanism Design, Motion and Path Planning
Abstract: The remote center of motion (RCM) constraint is a vital requirement in the design of robotic systems for transrectal ultrasound (TRUS) probe-scanning. This paper presents the design and development of a novel RCM-constrained manipulator specifically tailored for TRUS probe-scanning applications. The proposed system features a six-degree-of-freedom (6-DoF) parallel-serial hybrid mechanism that enables the TRUS probe to perform pivot and spin rotations while maintaining the RCM constraint. The kinematic model incorporating the RCM constraint is then derived. A geometry-aware path planning method is subsequently introduced, considering variations in the desired rotation targets. This method parameterizes distance metrics on SO(3) (a Lie group) using coordinate-free Riemannian geometry, enabling the dynamic optimization of rotation orders to minimize the calculated Riemannian metrics. Furthermore, a smooth rotational trajectory generation method is proposed, constructing rotation curves between the ordered matrices on SO(3) while minimizing angular acceleration. Both simulations and experimental results validate the effectiveness and practicality of the proposed manipulator and its path planning method.
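A minimal sketch of the distance metric this abstract refers to may help: the geodesic (Riemannian) distance between two rotations on SO(3) is the rotation angle of R1^T R2. The brute-force ordering below is an illustrative stand-in for the paper's dynamic order optimization, not the authors' implementation.

```python
# Geodesic distance on SO(3) and a brute-force rotation ordering
# (illustrative sketch; not the authors' code).
import itertools
import numpy as np

def so3_geodesic(R1, R2):
    """Riemannian distance on SO(3): the angle of the relative rotation R1^T R2."""
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def order_targets(R_start, targets):
    """Brute-force the target order minimizing the summed geodesic metric
    (a stand-in for the paper's dynamic order optimization)."""
    best = (np.inf, None)                      # (cost, permutation)
    for perm in itertools.permutations(range(len(targets))):
        cost, R_prev = 0.0, R_start
        for i in perm:
            cost += so3_geodesic(R_prev, targets[i])
            R_prev = targets[i]
        best = min(best, (cost, perm))
    return best
```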
|
|
TuBT2 |
402 |
Mobile Manipulation 2 |
Regular Session |
|
13:20-13:25, Paper TuBT2.1 | |
Learning Accurate Whole-Body Throwing with High-Frequency Residual Policy and Pullback Tube Acceleration |
|
Ma, Yuntao | Light Robotics |
Liu, Yang | EPFL |
Qu, Kaixian | ETH Zürich |
Hutter, Marco | ETH Zurich |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control, Reinforcement Learning
Abstract: Throwing is a fundamental skill that enables robots to manipulate objects in ways that extend beyond the reach of their arms. We present a control framework that combines learning and model-based control for prehensile whole-body throwing with legged mobile manipulators. The framework integrates a nominal end-effector tracking policy, a high-frequency residual policy for improved tracking accuracy, and an optimization-based module for precise end-effector acceleration control. The proposed controller achieved an average landing error of 0.28 m when throwing at targets located 6 m away. Furthermore, in a comparison study with university students, the system achieved a velocity tracking error of 0.398 m/s and a success rate of 56.8%, hitting small targets randomly placed at distances of 3-5 m while throwing at a specified speed of 6 m/s. In contrast, the human participants achieved a success rate of only 15.2%. This work provides an early demonstration of prehensile throwing with quantified accuracy on hardware, contributing to progress in dynamic whole-body manipulation.
|
|
13:25-13:30, Paper TuBT2.2 | |
Trajectory Tracking Control of Wheeled Mobile Manipulators with Joint Flexibility Via Virtual Decomposition Approach (I) |
|
Xing, Hongjun | Nanjing University of Aeronautics and Astronautics |
Xu, YuZhe | Nanjing University of Aeronautics and Astronautics |
Ding, Liang | Harbin Institute of Technology |
Chen, Jinbao | Nanjing University of Aeronautics and Astronautics |
Gao, Haibo | Harbin Institute of Technology |
Tavakoli, Mahdi | University of Alberta |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control, Robust/Adaptive Control
Abstract: Wheeled mobile manipulators (WMMs) involving a wheeled mobile platform and a serial manipulator are finding increasing applications in diverse fields, creating new challenges in performing high-precision operations in a spacious workspace. WMMs are challenging to control due to uncertainties in system parameters, coupled dynamics, and external disturbances, which make stability guarantees difficult. This paper proposes a virtual decomposition control (VDC)-based trajectory tracking controller for WMMs, addressing joint flexibility, external disturbances, etc. The proposed method uses a VDC-based iterative approach to manage the complex coupled dynamics and employs a separate adaptive controller to handle joint flexibility. The robotic system’s stability is validated using the specific features of VDC (proof of each subsystem’s virtual stability) according to the Lyapunov stability theory. The advantages and effectiveness of the proposed method are demonstrated through experiments.
|
|
13:30-13:35, Paper TuBT2.3 | |
RMMI: Reactive Mobile Manipulation Using an Implicit Neural Map |
|
Marticorena, Nicolás | Queensland University of Technology |
Fischer, Tobias | Queensland University of Technology |
Haviland, Jesse | Queensland University of Technology |
Sünderhauf, Niko | Queensland University of Technology |
Keywords: Mobile Manipulation
Abstract: Mobile manipulator robots operating in complex domestic and industrial environments must effectively coordinate their base and arm motions while avoiding obstacles. While current reactive control methods gracefully achieve this coordination, they rely on simplified and idealised geometric representations of the environment to avoid collisions. This limits their performance in cluttered environments. To address this problem, we introduce RMMI, a reactive control framework that leverages the ability of neural signed distance fields (SDFs) to provide a continuous and differentiable representation of the environment's geometry. RMMI formulates a quadratic program that optimises jointly for robot base and arm motion, maximises manipulability, and avoids collisions through a set of inequality constraints. These constraints are constructed by querying the SDF for the distance and direction to the closest obstacle for a large number of sampling points on the robot. We evaluate RMMI both in simulation and in a set of real-world experiments. For reaching in cluttered environments, we observe a 25% increase in success rate. For additional details, code, and experiment videos, please visit https://rmmi.github.io/
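To make the constraint construction concrete, here is a minimal sketch of one common way such SDF-derived inequality rows are formed, using a velocity-damper form. The names d_safe, d_influence, and xi are illustrative assumptions, not taken from the paper; n is the unit SDF gradient at a sampled robot point (pointing away from the obstacle) and J is that point's translational Jacobian.

```python
# Sketch: one inequality row per nearby sample point, then a small QP
# (velocity-damper form; parameters are illustrative, not the paper's).
import numpy as np
from scipy.optimize import minimize

def collision_rows(samples, ndof, d_safe=0.05, d_influence=0.30, xi=1.0):
    """samples: iterable of (distance d, unit direction n, point Jacobian J)."""
    A, b = [], []
    for d, n, J in samples:
        if d < d_influence:                  # only nearby points constrain qdot
            A.append(n @ J)                  # distance rate:  d_dot ~ n^T J qdot
            b.append(-xi * (d - d_safe) / (d_influence - d_safe))
    if not A:
        return np.zeros((0, ndof)), np.zeros(0)
    return np.vstack(A), np.asarray(b)

def solve_qp(qd_des, A, b):
    """min ||qdot - qd_des||^2  s.t.  A qdot >= b  (SLSQP stands in for a
    dedicated QP solver; the paper's objective also rewards manipulability)."""
    res = minimize(lambda x: float(np.sum((x - qd_des) ** 2)), qd_des,
                   jac=lambda x: 2.0 * (x - qd_des),
                   constraints=[{"type": "ineq", "fun": lambda x: A @ x - b}])
    return res.x
```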
|
|
13:35-13:40, Paper TuBT2.4 | |
MORE: Mobile Manipulation Rearrangement through Grounded Language Reasoning |
|
Mohammadi, Mohammad | University of Toronto |
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Büchner, Martin | University of Freiburg |
Cassinelli, Matteo | Toyota Motor Europe |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Despinoy, Fabien | Toyota Motor Europe |
Gilitschenski, Igor | University of Toronto |
Valada, Abhinav | University of Freiburg |
Keywords: Mobile Manipulation, Task Planning, AI-Enabled Robotics
Abstract: Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.
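A minimal sketch of the active-filtering step described above, assuming a toy dict-based scene graph mapping regions to labeled object instances (the data structure and relevance test are illustrative, not the paper's):

```python
# Keep only task-relevant object instances plus the regions that contain
# them, bounding the planning problem handed to the language model.
def task_relevant_subgraph(scene_graph, task_labels):
    sub = {}
    for region, objects in scene_graph.items():
        kept = [o for o in objects if o["label"] in task_labels]
        if kept:
            sub[region] = kept
    return sub

scene = {
    "kitchen": [{"id": "apple_1", "label": "apple"},
                {"id": "mug_3", "label": "mug"}],
    "hallway": [{"id": "shoe_2", "label": "shoe"}],
}
print(task_relevant_subgraph(scene, task_labels={"apple", "table"}))
# {'kitchen': [{'id': 'apple_1', 'label': 'apple'}]}
```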
|
|
13:40-13:45, Paper TuBT2.5 | |
Env-Mani: Quadrupedal Robot Loco-Manipulation with Environment-In-The-Loop |
|
Li, Yixuan | Beijing Institute of Technology |
Wang, Zan | Beijing Institute of Technology |
Liang, Wei | Beijing Institute of Technology |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control, Manipulation Planning
Abstract: Dogs can climb onto tables using their front legs for support, enabling them to retrieve objects and significantly expand their workspace by leveraging the external environment. However, the ability of quadrupedal robots to perform similar skills remains largely unexplored. In this work, we introduce a unified, learning-based loco-manipulation framework for quadrupedal robots, allowing them to utilize the external environment as support to extend their workspace and enhance their manipulation capabilities. Specifically, our method proposes a unified policy that takes limited onboard sensors and proprioception as input, generating whole-body actions that enable the robot to manipulate objects. To guide the policy learning for environment-in-the-loop manipulation, we design a set of rewards that address challenges such as imprecise perception and center-of-mass shifts. Additionally, we employ curriculum learning to train both teacher and student policies, ensuring effective skill transfer in complex tasks. We train the policy in simulation and conduct extensive experiments, demonstrating that our approach allows robots to manipulate previously inaccessible objects, opening up new possibilities for enhancing quadrupedal robot capabilities without the need for hardware modifications or additional costs. The project page is available at https://sites.google.com/view/env-mani.
|
|
TuBT3 |
403 |
Agricultural Automation |
Regular Session |
|
13:20-13:25, Paper TuBT3.1 | |
A Strawberry Harvesting Tool with Minimal Footprint |
|
Sorour, Mohamed | University of Edinburgh |
Abdelwahab, Mohamed Heshmat Hassan | Mohamed Bin Zayed University of Artificial Intelligence, MBZUAI |
Elgeneidy, Khaled | Coventry University |
From, Pål Johan | Norwegian University of Life Sciences |
Keywords: Agricultural Automation, Service Robotics, Grippers and Other End-Effectors
Abstract: In this paper, a novel prototype for harvesting table-top grown strawberries is presented that is minimalist in the footprint with which it interacts with the fruit. In our methodology, a smooth trapper manipulates the stem into a precise groove location at which a distant laser beam is focused. The tool reaches temperatures as high as 188 degrees Celsius, thereby killing germs and preventing the spread of local plant diseases. The burnt stem wound preserves water content and in turn extends the fruit's shelf life. Cycle and cut times achieved are 5.56 and 2.88 seconds, respectively, in a successful indoor harvesting demonstration. Extensive experiments are performed to optimize the laser spot diameter and lateral speed against the cutting time.
|
|
13:25-13:30, Paper TuBT3.2 | |
GO-VMP: Global Optimization for View Motion Planning in Fruit Mapping |
|
Isaac Jose, Allen | Bonn-Rhein-Sieg University of Applied Sciences |
Pan, Sicong | University of Bonn |
Zaenker, Tobias | University of Bonn |
Menon, Rohit | University of Bonn |
Houben, Sebastian | University of Applied Sciences Bonn-Rhein-Sieg |
Bennewitz, Maren | University of Bonn |
Keywords: Agricultural Automation
Abstract: Automating labor-intensive tasks such as crop monitoring with robots is essential for enhancing production and conserving resources. However, autonomously monitoring horticulture crops remains challenging due to their complex structures, which often result in fruit occlusions. Existing view planning methods attempt to reduce occlusions but either struggle to achieve adequate coverage or incur high robot motion costs. We introduce a global optimization approach for view motion planning that aims to minimize robot motion costs while maximizing fruit coverage. To this end, we leverage coverage constraints derived from the set covering problem (SCP) within a shortest Hamiltonian path problem (SHPP) formulation. While both SCP and SHPP are well-established, their tailored integration enables a unified framework that computes a global view path with minimized motion while ensuring full coverage of selected targets. Given the NP-hard nature of the problem, we employ a region-prior-based selection of coverage targets and a sparse graph structure to achieve effective optimization outcomes within a limited time. Experiments in simulation demonstrate that our method detects more fruits, enhances surface coverage, and achieves higher volume accuracy than the motion-efficient baseline with a moderate increase in motion cost, while significantly reducing motion costs compared to the coverage-focused baseline. Real-world experiments further confirm the practical applicability of our approach.
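The two ingredients named above can be sketched with greedy stand-ins for the paper's global optimization: a set cover that chooses views so every target is seen, followed by a short Hamiltonian-style ordering of the chosen views. The helpers below are illustrative, not the authors' formulation.

```python
# Greedy set cover over candidate views, then a nearest-neighbor ordering
# (stand-ins for the SCP-constrained SHPP solved globally in the paper).
import numpy as np

def greedy_set_cover(targets, view_covers):
    """view_covers[i] = set of target ids visible from candidate view i."""
    uncovered, chosen = set(targets), []
    while uncovered:
        i = max(range(len(view_covers)),
                key=lambda i: len(view_covers[i] & uncovered))
        if not (view_covers[i] & uncovered):
            raise ValueError("some targets cannot be covered")
        chosen.append(i)
        uncovered -= view_covers[i]
    return chosen

def nearest_neighbor_path(view_xyz, chosen, start):
    """Order the chosen views by repeatedly visiting the closest one."""
    rest, path, cur = set(chosen), [], start
    while rest:
        nxt = min(rest, key=lambda i: np.linalg.norm(view_xyz[i] - cur))
        path.append(nxt)
        cur = view_xyz[nxt]
        rest.remove(nxt)
    return path
```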
|
|
13:30-13:35, Paper TuBT3.3 | |
SheepDA-YOLO: Cross-Domain Adaptive Mean Teacher with Dual-Path Decoupling for Sheep Behavior Recognition |
|
Chen, Xinjie | Northwest A&F University |
Zhang, Haotian | Northwest A&F University |
Qiao, Yongyuan | Northwest A&F University |
Wang, Meili | Northwest A&F University |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Deep Learning Methods
Abstract: With the rapid advancement of smart farming towards large-scale livestock operations, the demand for model generalization in cross-pen behavior recognition has significantly increased. Traditional deep learning models suffer from substantial performance degradation due to variations in illumination and structure across different sheep pens, often necessitating the re-annotation of tens of thousands of frames for each new environment to mitigate domain shift issues. This severely limits the deployment of models in large-scale sheep farms. To achieve the goal of "annotate once, generalize across pens," we propose the SheepDA-YOLO framework, which innovatively integrates contrastive image translation and feature decoupling to address cross-domain adaptation challenges in agriculture. The core of our method consists of four parts: generating bidirectional pseudo-images for the source and target domains based on the CUT method to reduce image-level domain discrepancies through mixed training sets; employing a Mean Teacher architecture combined with a quadruple loss function to ensure stable knowledge transfer; proposing the DP-DMAF module, which suppresses illumination interference and feature confusion through dual-path feature decoupling and separable large-kernel attention; and adding a high-resolution detection layer to enhance small-target recognition accuracy. Experimental results demonstrate that SheepDA-YOLO achieves 89.7% mAP in cross-domain testing on target sheep pens, outperforming state-of-the-art methods by 3.4% and significantly reducing annotation costs. This study is the first to validate the feasibility of cross-pen adaptation, providing an efficient solution for the scalable implementation of smart livestock farming.
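At the core of any Mean Teacher architecture is an exponential-moving-average (EMA) update of the teacher's weights from the student's; a generic sketch follows (the momentum value is illustrative, not the paper's).

```python
# Mean Teacher EMA update: teacher <- m * teacher + (1 - m) * student,
# applied weight by weight after each student gradient step.
def ema_update(teacher_params, student_params, momentum=0.999):
    for name, w_student in student_params.items():
        teacher_params[name] = (momentum * teacher_params[name]
                                + (1.0 - momentum) * w_student)

# The teacher's (more stable) predictions then supervise the student on
# unlabeled target-pen images via a consistency loss.
```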
|
|
13:35-13:40, Paper TuBT3.4 | |
A “Botany-Bot” for Digital Twin Monitoring of Occluded and Underleaf Plant Structures |
|
Adebola, Simeon Oluwafunmilore | University of California, Berkeley |
Kim, Chung Min | University of California, Berkeley |
Kerr, Justin | University of California, Berkeley |
Xie, Shuangyu | UC Berkeley |
Akella, Prithvi | California Institute of Technology |
Susa Rincon, Jose Luis | Siemens Corporation |
Solowjow, Eugen | Siemens Corporation |
Goldberg, Ken | UC Berkeley |
Keywords: Agricultural Automation, Computer Vision for Automation, Sensor Fusion
Abstract: Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the undersides/topsides of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed topside/underside images with 77.3% accuracy. Code, videos, and datasets are available at https://berkeleyautomation.github.io/Botany-Bot/.
|
|
13:40-13:45, Paper TuBT3.5 | |
Wireless Collaborative Inference Acceleration Based on Distillation for Weed Detection and Instance Segmentation |
|
Li, Rongjiao | Southwest Minzu University |
Mo, Yunchao | Southwest Minzu University |
Zhao, Rongze | Southwest Minzu University |
Gao, Haojia | Beijing University of Technology |
Que, Haohua | Tsinghua University |
Mu, Lei | Southwest Minzu University |
Keywords: Agricultural Automation
Abstract: This paper presents a wireless collaborative inference framework optimized for deep learning-based weed instance segmentation on resource-limited weeding robots. Traditional Mask R-CNN struggles with detecting small weeds, suffers from low recall rates, and exhibits the checkerboard effect in segmentation results. To address these challenges, we introduce three key improvements: a feature fusion strategy in the backbone network to enhance small object detection, an improved Region Proposal Network (RPN) with Soft-NMS to reduce false positives and missed detections in complex environments, and a refined mask branch incorporating fully connected upsampling to mitigate checkerboard effects. Additionally, knowledge distillation is employed to compress the model, significantly improving inference speed while maintaining segmentation accuracy. To further enhance inference efficiency, we propose a two-stage approach for determining the optimal partition point and develop a resource-aware optimization algorithm that dynamically adjusts to fluctuating network bandwidth and computational constraints. Experimental evaluations confirm that the proposed approach surpasses existing methods and remains stable across varying resource conditions. A real-world implementation of a drone-server system validates the feasibility of the framework, showcasing its potential for robust and scalable weed detection and segmentation in precision agriculture applications.
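The partition-point search reduces to comparing, for each candidate split, on-device compute plus wireless transfer plus server compute. A simplified sketch follows; all per-layer numbers in the example are hypothetical placeholders, not the paper's measurements.

```python
# Pick the layer split minimizing device compute + transfer + server compute.
def best_partition(device_ms, server_ms, act_bytes, bandwidth_bps):
    """act_bytes[k] = bytes crossing the link if we split after layer k
    (act_bytes[0] is the raw input; len(act_bytes) == n_layers + 1)."""
    n = len(device_ms)
    def latency_ms(k):
        transfer = act_bytes[k] * 8.0 / bandwidth_bps * 1e3
        return sum(device_ms[:k]) + transfer + sum(server_ms[k:])
    k = min(range(n + 1), key=latency_ms)
    return k, latency_ms(k)

# Toy 4-layer network: early layers are cheap on-device but emit large
# activations, so a mid-network split often wins at moderate bandwidth.
k, t = best_partition(device_ms=[12, 18, 30, 25], server_ms=[2, 3, 5, 4],
                      act_bytes=[600_000, 300_000, 80_000, 40_000, 2_000],
                      bandwidth_bps=20e6)
print(f"split after layer {k}, end-to-end latency {t:.1f} ms")
```

A resource-aware scheme like the one described above would re-run this search as the measured bandwidth changes.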
|
|
13:45-13:50, Paper TuBT3.6 | |
Deep Learning-Based Pig Behavior Captioning for Smart Livestock Farming |
|
Jiang, Honghua | Shandong Agricultural University |
Zeng, Yongqing | Shandong Agricultural University |
Qiao, Yongliang | University of Adelaide |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception
Abstract: Precise monitoring of pig behavior has become pivotal for enhancing animal welfare and breeding efficiency. However, existing studies predominantly focus on behavior recognition while neglecting environmental influences, and lack specialized image captioning models and datasets tailored for farm scenarios, hindering textual analysis of behavior-environment interactions. In this study, a multimodal image captioning model was proposed to generate semantic textual descriptions of pig behavior, thereby supporting smart decision-making in farm management. The model employs a ResNet-18 encoder to extract pig visual biometric features from RGB and depth images, coupled with an innovative decoder integrating an enhanced Long Short-Term Memory (LSTM) network and Graph Convolutional Network (GCN) for pig behavior textual description, effectively resolving the input inconsistency between training and inference phases in traditional Encoder-Decoder architectures. Additionally, a dedicated pig behavior dataset comprising 9,052 annotated images was constructed, covering four behavioral categories: standing, sitting, lying, and eating. The experimental results show that the proposed approach achieves a METEOR score of 88.25%, outperforming baseline models by up to 21.58%. By recognizing pig behavior and interpreting environmental context, the proposed approach introduces a practical methodology for analyzing behavior-environment interactions and facilitates the integration of LLM-embedded robotic systems into smart livestock farming.
|
|
13:50-13:55, Paper TuBT3.7 | |
In-Situ Classification of Soil Types Exploiting Electrical Impedance Tomography with a Robotic Actuating Probe |
|
Xu, Xiaoxian | University of Cambridge |
Merchant, Catherine Emma Maria | University of Cambridge |
Ishida, Michael | University of Cambridge |
Hardman, David | University of Cambridge |
Iida, Fumiya | University of Cambridge |
Keywords: Agricultural Automation, Robotics and Automation in Agriculture and Forestry, Sensor Networks
Abstract: Soil is a vital resource for various industries, including agriculture, engineering, and manufacturing, where accurate in-situ classification is essential for a wide range of applications. Electrical Impedance Tomography (EIT) enables real-time soil classification by capturing complex impedance data across varying distances. This study presents a novel approach integrating EIT with actuating probes to dynamically generate rich datasets for distinguishing soil types and moisture levels. By utilizing eight moving electrodes multiplexed across 32 channels, this system overcomes the limitations of traditional laboratory-based methods, such as time constraints and data skew caused by non-homogeneous inclusions. The moving electrode design significantly outperforms the stationary setup, achieving an average classification accuracy of 93% across varying moisture levels of sand, clay, and silt combinations. Experimental results on a larger dataset demonstrate a classification accuracy of up to 79.7% across 25 different soil-moisture combinations, underscoring the technique's potential for effective in-field soil analysis. The improved accuracy achieved through actuation, compared to stationary probes, suggests broader applications in precision agriculture, civil engineering, and environmental monitoring.
|
|
13:55-14:00, Paper TuBT3.8 | |
A Climbing Robot for Tube-Sheet Inspection Based on Planar Parallel Mechanisms |
|
Zhang, Kuan | Harbin Institute of Technology |
Fan, Jizhuang | Robot Research Institute, Harbin Institute of Technology |
Xu, Tian | Harbin Institute of Technology |
Zhang, Xuehe | Harbin Institute of Technology |
Lin, Jinghan | Harbin Institute of Technology |
Yu, Xuan | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Keywords: Climbing Robots, Mechanism Design, Robotics in Hazardous Fields
Abstract: This paper introduces a novel climbing robot for tube-sheet inspection (CRTI) that uses inner wall grippers (IWGs) to grasp heat transfer tubes (HTTs), enabling it to hang and crawl beneath the tube-sheet plane. The robot is designed primarily for inspecting HTTs within the steam generators of nuclear power plants. A planar parallel configuration with metamorphic characteristics is proposed to address the limitations of existing CRTIs in adaptability, load capacity, and efficiency. Meanwhile, a modular pneumatic IWG with an inclined wedge mechanism is introduced, featuring passive rotational freedom, a load capacity exceeding 100 kg, fall prevention in the event of gas or power failure, and a grasping action that does not damage the HTT. Furthermore, a forward kinematic solution method for parallel robots based on coordinate transformation is proposed, providing a unique analytical solution for the forward kinematics of all parallel configurations of this CRTI. The performance evaluation, which is grounded in singularity analysis and typical experiments, reveals that the proposed CRTI offers high efficiency, substantial load capacity, and exceptional adaptability, indicating its significant potential for future applications.
|
|
TuBT4 |
404 |
Robot Safety 2 |
Regular Session |
|
13:20-13:25, Paper TuBT4.1 | |
Learning-Based Passive Fault-Tolerant Control of a Quadrotor with Rotor Failure |
|
Chen, Jiehao | Harbin Institute of Technology Shenzhen |
Zhao, Kaidong | Harbin Institute of Technology |
Liu, Zihan | Harbin Institute of Technology, Shenzhen |
Li, Yanjie | Harbin Institute of Technology (Shenzhen) |
Lou, Yunjiang | Harbin Institute of Technology, Shenzhen |
Keywords: Machine Learning for Robot Control, Robot Safety, Aerial Systems: Mechanics and Control
Abstract: This paper proposes a learning-based passive fault-tolerant control (PFTC) method for quadrotors that is capable of handling arbitrary single-rotor failures, covering conditions ranging from fault-free operation to complete rotor failure, without requiring any rotor fault information or controller switching. Unlike existing methods that treat rotor faults as disturbances and rely on a single controller for multiple fault scenarios, our approach introduces a novel Selector-Controller network structure. This architecture integrates the fault detection module and the controller into a unified policy network, effectively combining the adaptability of PFTC to multiple fault scenarios with the superior control performance of active fault-tolerant control (AFTC). To optimize performance, the policy network is trained using a hybrid framework that synergizes reinforcement learning (RL), behavior cloning (BC), and supervised learning with fault information. Extensive simulations and real-world experiments validate the proposed method, demonstrating significant improvements in fault response speed and position tracking performance compared to state-of-the-art PFTC and AFTC approaches.
|
|
13:25-13:30, Paper TuBT4.2 | |
FOCI: Trajectory Optimization on Gaussian Splats |
|
Gomez Andreu, Mario Alejandro | Technical University Darmstadt |
Wilder-Smith, Maximum | ETH Zurich |
Klemm, Victor | ETH Zurich |
Patil, Vaishakh | RSL ETH Zurich |
Tordesillas Torres, Jesus | ICAI School of Engineering, Comillas Pontifical University |
Hutter, Marco | ETH Zurich |
Keywords: Motion and Path Planning, Collision Avoidance, Robot Safety
Abstract: 3D Gaussian Splatting (3DGS) has recently gained popularity as a faster alternative to Neural Radiance Fields (NeRFs) in 3D reconstruction and view synthesis methods. Leveraging the spatial information encoded in 3DGS, this work proposes ConvGauss, an algorithm that is able to optimize trajectories directly on the Gaussians themselves. ConvGauss leverages a novel and interpretable collision formulation for 3DGS using the notion of the convolution between Gaussians. Contrary to other approaches, which represent the robot with conservative bounding boxes that underestimate the traversability of the environment, we propose to represent both the environment and the robot as Gaussian Splats. This not only has desirable computational properties, but also allows for orientation-aware planning, enabling the robot to pass through very tight and narrow spaces. We extensively test our algorithm in both synthetic and real Gaussian Splats, showcasing that collision-free trajectories for the ANYmal legged robot can be computed in a few seconds, even with hundreds of thousands of Gaussians making up the environment.
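The convolution-based collision idea rests on a standard identity: the convolution of N(mu_r, S_r) and N(mu_o, S_o) is again Gaussian with summed means and covariances, so the overlap integral between a robot splat and an obstacle splat has a closed form. The sketch below scores that overlap; the exact cost shaping in the paper may differ.

```python
# Overlap integral of two Gaussians:
#   int N(x; mu_r, S_r) N(x; mu_o, S_o) dx = N(mu_r - mu_o; 0, S_r + S_o)
import numpy as np

def splat_overlap(mu_r, S_r, mu_o, S_o):
    d = mu_r - mu_o
    S = S_r + S_o
    maha = d @ np.linalg.solve(S, d)             # squared Mahalanobis distance
    norm = np.sqrt((2 * np.pi) ** len(d) * np.linalg.det(S))
    return np.exp(-0.5 * maha) / norm            # high value = strong overlap

def trajectory_cost(robot_splats, obstacle_splats):
    """Sum pairwise overlaps; a trajectory optimizer would minimize this."""
    return sum(splat_overlap(mr, Sr, mo, So)
               for mr, Sr in robot_splats for mo, So in obstacle_splats)
```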
|
|
13:30-13:35, Paper TuBT4.3 | |
Safety-Guided RRT*: Hyperoctant Sampling-Based Path Planning with SDF-Based Robotic Representation |
|
Xie, Yangmin | Shanghai University |
Zhong, Yuqiao | Shanghai University |
Shi, Hang | Shanghai University |
Yang, Yusheng | Shanghai University |
Keywords: Motion and Path Planning, Collision Avoidance, Robot Safety
Abstract: Sampling-based path planning algorithms, such as the Rapidly-exploring Random Tree (RRT), are widely used for motion planning in high degree-of-freedom robotic systems due to their efficiency in exploring high-dimensional spaces. However, traditional methods rely on binary collision detection, which only determines whether a sampled configuration is in collision without quantifying its safety, often resulting in trajectories that pass overly close to obstacles and in reduced planning success rates, especially in complex environments with narrow passages. To address this issue, we propose Safety-Guided RRT* (SG-RRT*), which integrates a quantitative safety metric based on signed distance functions (SDFs) with a hyperoctant sampling strategy, enabling the planner to prioritize safer configurations and steer tree expansion toward collision-free regions. This approach significantly improves path planning success rates while generating safer trajectories with greater clearance from obstacles. Extensive simulations and real-world experiments demonstrate that SG-RRT* outperforms state-of-the-art methods, including RRT*, Informed-RRT*, TRRT, and Bi-TRRT, by achieving higher success rates and reducing collision risks, with only a slight increase in trajectory length.
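A simplified sketch of safety-guided sampling: draw several candidates, score each by its SDF clearance, and keep the safest. This stands in for the paper's hyperoctant strategy; `sdf_clearance` is an assumed user-supplied function returning the minimum signed distance to obstacles.

```python
# Safety-weighted candidate selection (simplified stand-in for the
# hyperoctant sampling strategy described above).
import numpy as np

rng = np.random.default_rng(0)

def safety_guided_sample(sdf_clearance, lo, hi, n_candidates=16):
    cands = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    scores = np.array([sdf_clearance(q) for q in cands])
    return cands[np.argmax(scores)]      # prefer configurations far from obstacles

# Example: clearance to a single disc obstacle at (1, 1) with radius 0.3.
clearance = lambda q: np.linalg.norm(q - np.array([1.0, 1.0])) - 0.3
q = safety_guided_sample(clearance, lo=np.array([0.0, 0.0]), hi=np.array([2.0, 2.0]))
```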
|
|
13:35-13:40, Paper TuBT4.4 | |
Fault-Tolerant Model Predictive Control for Safety of Unmanned Surface Vessels Berthing under Multimodal Disturbances and Various Constraints |
|
Shi, Jiangteng | Hainan University |
Deng, Shineng | Hainan University |
Ren, Jia | Hainan University |
Chen, Yujing | Hainan University |
Keywords: Marine Robotics, Robot Safety, Robust/Adaptive Control
Abstract: Autonomous berthing is a typical task for maritime operations of unmanned surface vessels (USVs). However, during berthing operations, USVs are subject to multimodal disturbances, such as external ocean disturbances (EODs) and internal thruster faults (ITFs), as well as various constraints, including underactuated nonlinear dynamic constraints, actuator saturation constraints, and an obstacle avoidance constraint. In this paper, a fault-tolerant model predictive control (FTMPC) framework is proposed for the safe berthing of USVs by uniformly considering both disturbances and constraints. Specifically, a control density function is integrated into the FTMPC framework to model the obstacle avoidance constraint. Moreover, leveraging the backstepping method and a fuzzy logic system, an auxiliary control law considering the EODs and ITFs is constructed as a constraint. Furthermore, sufficient conditions that ensure recursive feasibility, and thus closed-loop stability, are provided analytically. Within this FTMPC framework, multimodal disturbances and various constraints can be naturally considered simultaneously. Multiple experimental results in autonomous berthing scenarios demonstrate that the proposed method has excellent fault tolerance and safety under multimodal disturbances and various constraints.
|
|
13:40-13:45, Paper TuBT4.5 | |
Risk Euclidean Distance-Based Model Predictive Path Integral to Safety-Critical Obstacle Avoidance |
|
Huang, Zihao | Shenzhen University |
Li, Ruocheng | Beijing Institute of Technology |
Weili, Chen | Shenzhen University |
Lin, Zicong | Shenzhen University |
Wu, Zhipeng | Shenzhen University |
Zhang, Bo | Shenzhen University, Shenzhen 518060, China |
Keywords: Motion Control, Robust/Adaptive Control, Robot Safety
Abstract: Sampling-based Model Predictive Control (MPC) algorithms such as Model Predictive Path Integral (MPPI) excel at managing nonlinear constraints and complex systems. However, their conventional sampling strategies often result in suboptimal local solutions. To address this problem, we propose RESM-MPPI, a novel dynamic obstacle avoidance algorithm that integrates the Risk Euclidean Safety Metric (RESM), an enhanced version of the Conventional Euclidean Safety Metric (CESM), to more effectively quantify collision risks between autonomous mobile robots (AMRs) and dynamic obstacles. Our approach extends the classical Control Barrier Function (CBF) framework by introducing the Risk Control Barrier Function (RCBF) and integrating a Control Obstacle Avoidance Annealing (COAA) sampling strategy to enhance obstacle avoidance performance. This combination enables the generation of safe and smooth trajectories for AMRs in dynamic environments. Extensive simulations and real-world experiments demonstrate the effectiveness of the proposed method. Experimental videos are available at: https://youtu.be/WUchIzz_0wU.
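For reference, the core MPPI update that variants such as the above build on: sample control perturbations, roll out their costs, and average the perturbations with softmax weights. The dynamics and cost below are toys (a 2-D single integrator); the RESM/RCBF terms would enter through the cost.

```python
# One MPPI iteration: perturb the nominal control sequence, roll out,
# and take the information-theoretic weighted average of perturbations.
import numpy as np

rng = np.random.default_rng(1)

def mppi_step(u_nom, x0, horizon=20, n_samples=256, lam=1.0, sigma=0.5, dt=0.1,
              goal=np.array([5.0, 0.0])):
    eps = rng.normal(0.0, sigma, size=(n_samples, horizon, 2))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0.copy()
        for t in range(horizon):
            x = x + (u_nom[t] + eps[k, t]) * dt      # single-integrator rollout
            costs[k] += np.sum((x - goal) ** 2)      # state cost (risk terms go here)
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u_nom + np.einsum("k,kte->te", w, eps)    # weighted perturbation average

# Usage: u = mppi_step(np.zeros((20, 2)), x0=np.zeros(2)); apply u[0], repeat.
```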
|
|
13:45-13:50, Paper TuBT4.6 | |
On the Vulnerability of LLM/VLM-Controlled Robotics |
|
Wu, Xiyang | University of Maryland |
Chakraborty, Souradip | University of Maryland |
Xian, Ruiqi | University of Maryland-College Park |
Liang, Jing | University of Maryland |
Guan, Tianrui | University of Maryland |
Liu, Fuxiao | University of Maryland |
Sadler, Brian | Army Research Laboratory |
Manocha, Dinesh | University of Maryland |
Bedi, Amrit Singh | University of Maryland, College Park |
Keywords: Machine Learning for Robot Control, Robot Safety
Abstract: In this work, we highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. While LLM/VLM-controlled robots show impressive performance across various tasks, their reliability under slight input variations remains underexplored yet critical. These models are highly sensitive to instruction or perceptual input changes, which can trigger misalignment issues, leading to execution failures with severe real-world consequences. To study this issue, we analyze the misalignment-induced vulnerabilities within LLM/VLM-controlled robotic systems and present a mathematical formulation for failure modes arising from variations in input modalities. We propose empirical perturbation strategies to expose these vulnerabilities and validate their effectiveness through experiments on multiple robot manipulation tasks. Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems. These findings underscore the importance of input modality robustness and motivate further research to ensure the safe and reliable deployment of advanced LLM/VLM-controlled robotic systems.
|
|
13:50-13:55, Paper TuBT4.7 | |
In-Link Actuators for Low-Inertia Robots (I) |
|
Morikawa, Kazuma | Keio University |
Katsura, Seiichiro | Keio University |
Keywords: Actuation and Joint Mechanisms, Robot Safety, Motion Control
Abstract: Manipulators are versatile and widely used as industrial robots, but they are also expected to see wide use in environments familiar to the general public. To facilitate close interaction with humans and integration into people's daily lives, safety is extremely important. By reducing the inertia of the moving parts of a robot, its kinetic energy can be reduced, thereby increasing safety even when the robot is driven at high speed. In addition, when the inertia of a multi-degree-of-freedom manipulator or robot is large, the output required of the actuators that support it becomes large. This results in a negative spiral in which actuators with large outputs are installed, and the actuators of the root link become increasingly larger to support them. Thus, low inertia is important as a means of addressing fundamental problems that affect a wide range of robots. In this study, an in-link actuator is proposed, in which a drive unit consisting of coils and a magnet is directly incorporated into the links to reduce the inertia of the joints and of the robot as a whole. Coils and a magnet embedded in each separate link make it possible to drive the links simply by combining them. An in-link actuator with drive performance aimed at practical use was fabricated, and through experiments, its drive performance as an actuator, its ability to significantly reduce the moment of inertia, and several further advantages were confirmed.
|
|
TuBT5 |
407 |
Motion Control 2 |
Regular Session |
|
13:20-13:25, Paper TuBT5.1 | |
Mysteric-Net: MIMO Hysteretic Friction-Aware Lagrangian-Based Network for Legged Robot |
|
Yeo, Hoyeong | DGIST |
Hong, Jinsong | DGIST |
Kong, Taejune | DGIST |
Oh, Sehoon | DGIST |
Keywords: Motion Control, Deep Learning Methods, Model Learning for Control
Abstract: Accurate dynamics modeling is crucial for achieving precise Ground Reaction Force (GRF) control and high-performance legged locomotion. However, real-world legged systems exhibit strong frictional effects with hysteresis and inter-joint coupling, which conventional static friction models or purely data-driven approaches often fail to capture. In this paper, we propose Mysteric-Net, a novel MIMO hysteretic friction-aware network that combines a Lagrangian-based formulation with a Temporal Convolutional Network (TCN). By embedding the physical laws of Lagrangian mechanics while modeling history-dependent frictional dissipation via the TCN, our framework accurately identifies the system dynamics, including complex friction and coupling effects. This paper demonstrates that the proposed method significantly improves the accuracy of inverse dynamics estimation on a robotic leg. Furthermore, this paper shows that the learned model enables the design of an effective feedforward controller that mitigates friction and enhances tracking performance over conventional baseline methods.
|
|
13:25-13:30, Paper TuBT5.2 | |
Proxy-Based Super-Twisting Algorithm for MEMS Mirror Control under Input Saturation and Vibration (I) |
|
Fan, Zhiyu | Harbin Institute of Technology Shenzhen |
Xiong, Xiaogang | Harbin Institute of Technology, Shenzhen |
Lou, Yunjiang | Harbin Institute of Technology, Shenzhen |
Zhu, Xu | Harbin Institute of Technology Shenzhen |
Keywords: Motion Control, Embedded Systems for Robotic and Automation, Robust/Adaptive Control
Abstract: Microelectromechanical system (MEMS) scanning mirrors are widely used in imaging devices such as light detection and ranging (LIDAR) and head-up displays (HUDs). When installed in autonomous vehicles, these MEMS mirrors must accurately track set-points or continuous trajectories despite vibration caused by rugged environments and limited input voltage due to battery management. The super-twisting algorithm (STA), a sliding mode control (SMC) strategy with reduced chattering, can be used to control the MEMS scanning mirror. However, the conventional STA cannot be employed due to requirements for antiwindup and robustness against discontinuous disturbances. This manuscript proposes a novel strategy called proxy-based STA (PSTA), combined with a set-valued terminal SMC, for controlling MEMS scanning mirrors under conditions of discontinuous disturbances and input saturation. The PSTA inherits the terminal SMC's antiwindup effects and robustness to discontinuous disturbances, along with the high control accuracy of STA. To reduce chattering effects caused by the terminal SMC, this manuscript further discretizes and realizes PSTA through semi-implicit Euler methods. Both simulation and experimental results demonstrate that the proposed PSTA realization achieves better performance regarding discontinuous disturbances and input saturation limits.
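For reference, the textbook super-twisting algorithm in discrete time with a plain explicit-Euler step; the paper's contributions (proxy dynamics and the chattering-suppressing semi-implicit discretization) are not reproduced here.

```python
# Super-twisting algorithm (STA):
#   u = -k1 * sqrt(|s|) * sign(s) + z,     z_dot = -k2 * sign(s)
# where s is the sliding variable (e.g. tracking error) and z the integral
# (twisting) state. Explicit-Euler step shown for clarity only.
import numpy as np

def sta_step(s, z, k1=1.5, k2=1.1, dt=1e-4):
    u = -k1 * np.sqrt(abs(s)) * np.sign(s) + z
    z_next = z - k2 * np.sign(s) * dt
    return u, z_next
```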
|
|
13:30-13:35, Paper TuBT5.3 | |
Minimum Time Formation Control of AUVs with Smooth Transition in Communication Topology |
|
Yan, Jing | Yanshan University |
Wu, Chuang | Yanshan University |
Yang, Xian | Yanshan University |
Chen, Cailian | Shanghai Jiao Tong University |
Guan, Xinping | Shanghai Jiaotong University |
Keywords: Motion Control, Machine Learning for Robot Control, Integrated Planning and Control
Abstract: This letter studies the minimum-time formation control issue of autonomous underwater vehicles (AUVs), subject to switching topology during the formation procedure. We first employ the smoothstep function to describe the smooth transition of the communication topology. Then, a minimum-time formation control problem is constructed by minimizing the integral temporal difference errors of the AUVs. To solve the above problem, an integral reinforcement learning (IRL) based controller is designed for the AUVs to achieve the leader-follower formation task. Of note, the smooth transition can reduce oscillation and improve safety during the formation change procedure, while the IRL-based formation controller can achieve the minimum-time formation task in an unknown environment. Finally, the theoretical results are verified by simulation and experimental results.
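The smoothstep function referred to above is the cubic 3u^2 - 2u^3; blending two adjacency matrices with it yields a communication topology that changes without jumps. A toy three-AUV illustration (the matrices and transition window are made up for illustration):

```python
# Smoothstep blend of two communication topologies over t in [t0, t1].
import numpy as np

def smoothstep(t, t0, t1):
    """C^1 transition: 0 before t0, 1 after t1, 3u^2 - 2u^3 in between."""
    u = np.clip((t - t0) / (t1 - t0), 0.0, 1.0)
    return 3.0 * u**2 - 2.0 * u**3

A_before = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
A_after  = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)
A_t = lambda t: ((1.0 - smoothstep(t, 2.0, 4.0)) * A_before
                 + smoothstep(t, 2.0, 4.0) * A_after)
```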
|
|
13:35-13:40, Paper TuBT5.4 | |
Robust Stabilization of an Autonomous Underwater Vehicle in Specified Finite-Time with Disturbance Rejection |
|
Niu, Hongjiao | Tsinghua University |
Keywords: Motion Control, Marine Robotics, Robust/Adaptive Control
Abstract: This study investigates the robust finite-time stabilization of an autonomous underwater vehicle (AUV) with disturbance rejection, where the finite time can be predetermined. The AUV is modeled as a rigid body moving within fluids, and the system's dynamics involve uncertain parameters arising from the hydrodynamic coupling between the AUV and the fluid, along with unknown external disturbances. Direct compensation for the system's dynamics, a common approach in controller design, is ineffective for AUVs with uncertainties. Therefore, a robust anti-disturbance control method that does not compensate for the unknown dynamics is proposed. To begin, a time-rescaling method is introduced to convert the specified finite-time stabilization of the system into the asymptotic stabilization of a time-rescaled system. Then, an exponential PID controller is designed for the time-rescaled system to handle unknown constant disturbances, while a high-gain control strategy is used to suppress the uncertain dynamics, which also enhances robustness against them. The specified finite-time stabilization control law is ultimately derived from the designed exponential control law of the time-rescaled system. Numerical simulations are conducted to verify the results.
|
|
13:40-13:45, Paper TuBT5.5 | |
Dynamic Electromagnetic Navigation |
|
Zughaibi, Jasan | ETH Zurich, Swiss Federal Institute of Technology Zurich |
Nelson, Bradley J. | ETH Zurich |
Muehlebach, Michael | Max Planck Institute for Intelligent Systems |
Keywords: Motion Control, Medical Robots and Systems
Abstract: Magnetic navigation offers wireless control over magnetic objects, which has important medical applications, such as targeted drug delivery and minimally invasive surgery. Magnetic navigation systems are categorized into systems using permanent magnets and systems based on electromagnets. Electromagnetic Navigation Systems (eMNSs) are believed to have a superior actuation bandwidth, facilitating trajectory tracking and disturbance rejection. This greatly expands the range of potential medical applications and includes even dynamic environments as encountered in cardiovascular interventions. To showcase the dynamic capabilities of eMNSs, we successfully stabilize a (non-magnetic) inverted pendulum on the tip of a magnetically driven arm. Our approach employs a model-based framework that leverages Lagrangian mechanics to capture the interaction between the mechanical dynamics and the magnetic field. Using system identification, we estimate unknown parameters, the actuation bandwidth, and characterize the system's nonlinearity. To explore the limits of electromagnetic navigation and evaluate its scalability, we characterize the electrical system dynamics and perform reference measurements on a clinical-scale eMNS, affirming that the proposed dynamic control methodologies effectively translate to larger coil configurations. A state-feedback controller stabilizes the inherently unstable pendulum, and an iterative learning control scheme enables accurate tracking of non-equilibrium trajectories. Furthermore, to understand structural limitations of our control strategy, we analyze the influence of magnetic field gradients on the motion of the system. To our knowledge, this is the first demonstration to stabilize a 3D inverted pendulum through electromagnetic navigation.
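As a generic illustration of the state-feedback step mentioned above, an LQR gain for a linearized inverted pendulum can be obtained from the continuous-time algebraic Riccati equation. The toy A and B below are not the paper's coupled electromechanical model, which is far richer.

```python
# LQR stabilization of a linearized inverted pendulum about the upright.
import numpy as np
from scipy.linalg import solve_continuous_are

g, l = 9.81, 0.3                      # gravity, pendulum length (toy values)
A = np.array([[0.0, 1.0],
              [g / l, 0.0]])          # state: [theta, theta_dot] about upright
B = np.array([[0.0], [1.0]])          # torque-like input (toy)
Q, R = np.diag([10.0, 1.0]), np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)       # u = -K x stabilizes the upright equilibrium
print("feedback gain K =", K)
```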
|
|
13:45-13:50, Paper TuBT5.6 | |
LLA-MPC: Fast Adaptive Control for Autonomous Racing |
|
AL-Sunni, Maitham | Carnegie Mellon University |
Almubarak, Hassan | Georgia Institute of Technology, King Fahd University of Petroleum and Minerals |
Horng, Katherine | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Keywords: Motion Control, Robust/Adaptive Control, Optimization and Optimal Control
Abstract: We present Look-Back and Look-Ahead Adaptive Model Predictive Control (LLA-MPC), a real-time adaptive control framework for autonomous racing that addresses the challenge of rapidly changing tire-surface interactions. Unlike existing approaches requiring substantial data collection or offline training, LLA-MPC employs parallelization over a bank of models for rapid adaptation with no training. It integrates two key mechanisms: a look-back window that uses recent vehicle behavior to optimize the model used in a look-ahead stage for trajectory optimization and control. The optimized model and its associated parameters are then incorporated into an adaptive path planner to optimize reference racing paths in real time. Experiments across diverse racing scenarios demonstrate that LLA-MPC outperforms state-of-the-art methods in adaptation speed and handling, even during sudden friction transitions. Its learning-free, computationally efficient design enables rapid adaptation, making it ideal for high-speed autonomous racing in multi-surface environments.
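The look-back mechanism can be sketched as scoring every model in the bank by its recent one-step prediction error and adopting the best for the look-ahead stage (the model bank and history below are toy callables and arrays, not the paper's vehicle models):

```python
# Look-back model selection over a bank of candidate dynamics models.
import numpy as np

def select_model(model_bank, states, inputs, window=20):
    """model_bank: callables f(x, u) -> x_next; states: (T+1, nx); inputs: (T, nu)."""
    xs = np.asarray(states)[-(window + 1):]
    us = np.asarray(inputs)[-window:]
    errs = [np.mean([np.linalg.norm(f(x, u) - x_next)
                     for x, u, x_next in zip(xs[:-1], us, xs[1:])])
            for f in model_bank]
    return int(np.argmin(errs))          # index handed to the look-ahead MPC
```

Because the bank is evaluated in parallel against a short window, the selection adapts within a few control steps of a friction change, which matches the adaptation-speed claim above.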
|
|
13:50-13:55, Paper TuBT5.7 | |
Adaptive Predefined-Time Synchronization and Tracking Control for Multimotor Driving Servo Systems (I) |
|
Hu, Shuangyi | Zhejiang University of Technology |
Chen, Qiang | Zhejiang University of Technology |
Ren, Xuemei | Beijing Institute of Technology |
Wang, Shubo | Kunming University of Science and Technology |
Keywords: Motion Control, Robust/Adaptive Control
Abstract: This article proposes a predefined-time synchronization and tracking control strategy for multimotor driving servo systems. First, a mean relative coupling synchronization controller is constructed with predefined-time convergence, such that the convergence time can be preset by adjusting one parameter and does not rely on initial states. In order to avoid the singularity problem, a predefined-time command filtered backstepping tracking controller is designed to reduce the computational complexity and ensure that the tracking error can converge into a neighborhood near the origin within a predefined time. Experiments are conducted to verify the efficiency of the proposed method.
|
|
13:55-14:00, Paper TuBT5.8 | |
Feedback Control of a Two-Degree-Of-Freedom Electromagnetic Reluctance Precision Motion System |
|
Pumphrey, Michael Joseph | University of Guelph |
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Alatawneh, Natheer | University of Guelph |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Motion Control
Abstract: This study investigates a novel Xθ actuation system driven by a reluctance actuator (RA) and two accompanying moving magnet actuators (MMAs). The system enables precise control of both translational (x) and rotational (θ) motion, offering a two-degree-of-freedom (2DOF) solution for high-precision applications. The two MMAs introduce additional force and torque dynamics through the solenoid and permanent magnet (PM) pairs. Flexure hinges assist with the retraction force of the mover element, providing the necessary stiffness without introducing frictional effects. The system was modeled analytically, optimized, and validated experimentally with the developed feedback control, achieving steady-state errors of approximately ±7 µm in x translation and ±0.3 mrad in θ rotation, which can be attributed to systematic errors in the sensor itself. The most relevant application is scanning mirror systems, where specific targeted rotational and translational trajectories can benefit light beam positioning. This system allows translation and rotation specifications to be realized in one actuation unit, opening up more design possibilities for controlling precision motion systems.
|
|
TuBT6 |
301 |
Micro/Nano Robots 2 |
Regular Session |
|
13:20-13:25, Paper TuBT6.1 | |
A Self-Moving Piezoelectric Actuator with High Carrying/Positioning Capability Via Bending-Resonant-Vibration-Induced Stick-Slip Motion (I) |
|
Liu, Jinshuo | Shandong University |
Ding, Zhaochun | Shandong University |
Wu, Jiang | Shandong University |
Wang, Lipeng | Yanshan University |
Chen, Teng | Shandong University |
Rong, Xuewen | Shandong University |
Song, Rui | Shandong University |
Li, Yibin | Shandong University |
Keywords: Micro/Nano Robots, Actuation and Joint Mechanisms, Medical Robots and Systems
Abstract: A self-moving piezoelectric actuator (SMPA) with high carrying/positioning capability is presented in this article. Its Π-shaped mechanical part comprises four piezo-legs, each of which combines the bending vibrations of the first three orders into quasi-sawtooth motion at the driving foot. Besides, a homemade onboard circuit is integrated with the mechanical part to form a compact structure. Initially, by establishing a Krimhertz-transmission-theory-based model, the piezo-leg's resonant frequencies of the 1st, 2nd, and 3rd bending modes were structurally tuned to a ratio of approximately 1:2:3. Subsequently, a prototype with a size of 75 × 55 × 55 mm3 and a weight of 38.5 g was fabricated to assess its moving/carrying/positioning performance. At a frequency of 2065 Hz, the SMPA in a tethered manner yielded a maximal payload of 1130 g (equal to 29.3 times its weight), a maximal speed of 224.1 mm/s, and a maximal towing force of 1.24 N. In an untethered manner, the SMPA provided planar movements when receiving commands wirelessly, and it produced a minimal step displacement of 12.2 nm and a maximal running distance of 9.16 m. Benefitting from the two-DOF untethered movement, the SMPA is potentially applicable to robot-assisted precision operations, e.g., cell puncture.
|
|
13:25-13:30, Paper TuBT6.2 | |
Reinforcement Learning-Based Energy-Efficient and Obstacle-Free Path Planning for Magnetic Microrobots in Dynamic Environments |
|
Wang, Hongwei | Chinese Academy of Sciences |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Luo, Jun | Chinese Academy of Sciences |
Jiang, Mingguo | Shenzhen Institute of Advanced Technology,Chinese Academy |
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Shen, Haolan | Chinese Academy of Sciences |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Micro/Nano Robots, AI-Based Methods, Motion and Path Planning
Abstract: Online path planning for magnetic microrobots actuated by an electromagnetic system in a dynamic flow field presents significant challenges due to time-varying fluid dynamics, energy constraints, and collision risks. Traditional path planning approaches, which often rely on static flow assumptions or simplified geometric models, struggle to balance energy efficiency, path continuity, and adaptability in real-world scenarios. This paper introduces an end-to-end path planner for energy-efficient and collision-free navigation of magnetic helical microrobots, integrating flow field feature extraction with a reinforcement learning (RL) framework. Our method employs a transformer encoder to capture contextual correlations of the flow field and uses a Soft Actor-Critic (SAC) framework to optimize energy consumption while ensuring dynamic obstacle avoidance. Simulations and experiments in dynamic flow environments validate our approach, demonstrating 14.7% lower energy consumption and robust collision avoidance across several different test scenarios.
|
|
13:30-13:35, Paper TuBT6.3 | |
Zoned Artificial Repulsion: Path Planning through Local Minima for Multiple-Robot Dexterous Micromanipulation |
|
Dannawi Aissaoui, Tala | Université Marie et Louis Pasteur |
Dahmouche, Redwan | Université de Franche-Comté |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Dexterous Manipulation
Abstract: We introduce in this paper an original planning algorithm, Zoned Artificial Repulsion (ZAR*), designed for multiple-robot micromanipulation. The algorithm combines a modified Artificial Potential Field (APF) with A* to efficiently compute micromanipulation trajectories. The motivation behind ZAR* is to integrate the advantages of APF and A* while avoiding their limitations. While APF is computationally efficient but prone to local minima, A* guarantees completeness but has high complexity. ZAR* employs a modified repulsive field to segment the configuration space into distinct zones, each converging to a unique local minimum. These zones are reduced to nodes in a graph, allowing A* to compute the inter-zone transitions. The modified APF then handles navigation within the zones. This method ensures the completeness of the algorithm, avoids local minima, and significantly reduces the number of nodes in the graph, leading to a highly efficient algorithm in terms of processing time and path cost. We compare ZAR* against A*, APF, RRT, RRT*, and PRM, as well as more recent hybrid algorithms. On average, ZAR* reduces the number of nodes by 909 times and speeds up path construction efficiency (time × cost) by 499 times, while maintaining a 100% success rate. It also performs more than 4 times better than the best hybrid alternative of standard algorithms, making it suitable for dexterous micromanipulation tasks.
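A compact grid illustration of the zoning idea: label every free cell by the local minimum its potential-descent path reaches, treat basins as graph nodes with edges between touching basins, and search that graph. BFS stands in for A* here, and the actual ZAR* repulsive-field construction is not reproduced; `pot` is any 2-D potential array.

```python
# Zone decomposition of a 2-D potential grid plus a zone-graph search.
from collections import deque
import numpy as np

def descend(pot, start):
    """Follow steepest descent on a 2-D potential grid to its local minimum."""
    r, c = start
    while True:
        nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if 0 <= r + dr < pot.shape[0] and 0 <= c + dc < pot.shape[1]]
        best = min(nbrs, key=lambda p: pot[p])
        if pot[best] >= pot[r, c]:
            return (r, c)                     # this cell is the basin's minimum
        r, c = best

def zone_graph(pot):
    """Label every cell by its basin; touching cells with different labels
    induce an edge between the corresponding zones."""
    cells = [(r, c) for r in range(pot.shape[0]) for c in range(pot.shape[1])]
    label = {cell: descend(pot, cell) for cell in cells}
    edges = set()
    for (r, c) in cells:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in label and label[nb] != label[(r, c)]:
                edges.add(frozenset((label[(r, c)], label[nb])))
    return label, edges

def zone_path(edges, z_start, z_goal):
    """Breadth-first search over the zone graph (stand-in for A*)."""
    adj = {}
    for e in edges:
        a, b = tuple(e)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {z_start: None}, deque([z_start])
    while queue:
        z = queue.popleft()
        if z == z_goal:
            path = []
            while z is not None:
                path.append(z)
                z = prev[z]
            return path[::-1]
        for nb in adj.get(z, []):
            if nb not in prev:
                prev[nb] = z
                queue.append(nb)
    return None
```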
|
|
13:35-13:40, Paper TuBT6.4 | |
Optoelectronic Navigation-Based Microtruck: For Efficient Cargo Loading, Transport, and Unloading |
|
Wang, Ao | Beihang University |
Niu, Wenyan | Beihang University |
Ni, Caiding | Beihang University |
Huang, Shunxiao | Beihang University |
Guo, Yingjian | Beihang University |
Feng, Lin | Beihang University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: This study proposes an optoelectronic navigation strategy leveraging Ag-SiO2 microspheres as "microtrucks" to overcome the limitations of traditional optoelectronic tweezers (OET) in manipulating negative dielectrophoresis (nDEP) particles. By dynamically adjusting the electric field frequency and optical parameters, we regulate particle-induced dielectrophoretic (PiDEP) forces to achieve efficient adsorption, high-speed transport, and site-specific unloading of nDEP-responsive cargo. Experimental results demonstrate a sevenfold enhancement in manipulation velocity compared to conventional direct optical methods, along with the capability for simultaneous multi-particle transport. In addition, we utilize finite element simulations to analyze the optimal electric field frequency and optical parameters for the microtruck's loading and unloading processes. Furthermore, a systematic analysis of critical velocities and failure modes under varying cargo loads further validates the robustness of this approach. Demonstrated within a labyrinthine microenvironment, this strategy enables programmable navigation, sequential cargo handling, and micrometer positional accuracy. This study provides an efficient solution for biomedical applications, including precise single-cell manipulation and targeted drug delivery.
|
|
13:40-13:45, Paper TuBT6.5 | |
Compact R-X-Y Stage and Dual-Finger Micromanipulator under Inverted Optical Microscope for Microassembly |
|
Pang, Jichao | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Chen, Yan | Beijing Institute of Technology |
Li, Yuke | Beijing Institute of Technology |
Li, Yunsheng | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Assembly, Parallel Robots
Abstract: Microassembly plays an important role in fabricating complex structures from small basic components in industrial and biomedical fields. An inverted optical microscope can provide high-quality image feedback for microassembly with its continuously improving resolution. However, a compact stage capable of positioning and reorienting micro-objects while fitting within the limited space under an inverted optical microscope remains unavailable. This paper proposes a compact R-X-Y stage that can transport micro-objects over long distances in the X and Y directions and reorient them through 360-degree continuous rotation. Additionally, unlike the common practice of placing a rotational stage on an X-Y stage, we mount the thin X-Y stage on a rotational stage. Thus, once the centers of the visual field and the rotational stage are aligned, visible micro-objects do not move out of the visual field during rotation. We further integrate the R-X-Y stage with the dual-finger micromanipulator and use them to assemble 2-D patterns and a complex 3-D micromachine. The obtained results and preliminary demonstration indicate that the proposed compact R-X-Y stage has great potential for assembling complex micromachines.
|
|
13:45-13:50, Paper TuBT6.6 | |
SpongeBot: A Soft Magnetic Mini-Robot for Controlled Gastric Cell Sampling |
|
Tian, Jiyuan | German Cancer Research Center |
Chhaparwal, Nidhi | German Cancer Research Center |
Jeong, Moonkwang | German Cancer Research Center (DKFZ) |
Gao, Yue | German Cancer Research Center (DKFZ) |
Müller, Ann-Sophia | German Cancer Research Center (DKFZ) |
Zhang, Meng | German Cancer Research Center (DKFZ) |
Bosch, Katharina | German Cancer Research Center |
Nowicki-Osuch, Karol | German Cancer Research Center |
Qiu, Tian | German Cancer Research Center (DKFZ) |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Medical Robots and Systems
Abstract: Early detection of gastrointestinal (GI) cancer is critical for improving treatment outcomes and survival rates. Yet conventional endoscopic techniques remain invasive and labor-intensive, presenting significant challenges for cancer screening in large populations. Current commercially available sponge-based sampling devices are passive and limited in reach to the esophagus, hindering comprehensive sampling in the stomach. Here, for the first time, we report the SpongeBot, a non-invasive soft mini-robot designed for active cell sampling in the upper GI tract, with a particular focus on the stomach. The SpongeBot integrates an open-cell sponge and a magnetic actuator, enabling precise and controlled sampling under a wireless external magnetic field. To accommodate the intricate anatomy of the stomach, the robot can transition between two modes of motion, a navigation mode and a sampling mode, allowing trajectory control and targeted sampling at desired locations. A kinematic model is established to accurately represent the locomotion of the robot on wet mucosal surfaces. Pilot testing on ex vivo porcine stomachs was successfully performed, with sufficient cells sampled for subsequent clinical laboratory testing. Histological analysis shows that the sampling causes no detectable damage to the mucosal layer. SpongeBot shows potential as a cell sampling device for the upper GI tract, to be deployed in primary care settings for cancer prevention.
|
|
13:50-13:55, Paper TuBT6.7 | |
Enhanced Precession of a Magnetic Helical Microbot in a Viscoelastic Gel |
|
Zhang, Meng | German Cancer Research Center (DKFZ) |
Tan, Liyuan | Dresden University of Technology |
Mariyanna, Jyothi Kumari | Division of Smart Technologies for Tumor Therapy, German Cancer |
Jeong, Moonkwang | German Cancer Research Center (DKFZ) |
Tian, Jiyuan | German Cancer Research Center |
Müller, Ann-Sophia | German Cancer Research Center (DKFZ) |
Qiu, Tian | German Cancer Research Center (DKFZ) |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Medical Robots and Systems
Abstract: Magnetic helical micro-robots (microbots) have attracted strong interest due to their unique propulsion mechanisms and potential applications in biomedical fields, particularly in minimally-invasive surgical procedures. Earlier research primarily focused on studying helical microbots in viscous liquids, while their dynamic behavior in viscoelastic solids remains largely unexplored. Here, we present an experimental study of a helical microbot operating in a viscoelastic gelatin hydrogel. The robot is fabricated by two-photon polymerization and actuated by an external rotating magnetic field. We observe that in viscoelastic solids, the robot ruptures the gel and creates a three-dimensional (3D) helical trajectory, despite the rotational axis of the driving magnetic field being fixed. Largely distinct from the propulsion behavior in a Newtonian fluid, the precession angle of the helix is significantly enhanced in the viscoelastic gel and increases with a rising rotational frequency. A dynamic model is developed using the multipole expansion method, incorporating the gel’s complex viscosity and shear-thinning properties to capture the key characteristics of this dynamic response. These findings offer new insights into the behavior of helical microbots in viscoelastic media, expanding possible application scenarios of microbots in biomedicine.
|
|
13:55-14:00, Paper TuBT6.8 | |
Long-Distance Delivery of Collective Cell Microrobots Driven by Mobile Magnetic Actuation System |
|
Sun, Yimin | Southeast University
Cao, Ying | Southeast University |
Zhang, Haoyu | Southeast University |
Wang, Bin | Southeast University |
Yang, Qijun | Southeast University |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Xu, Tiantian | Chinese Academy of Sciences |
Wang, Qianqian | Southeast University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Collective microrobots enable controlled batch delivery, showing promising applications in the biomedical field. However, significant challenges remain in achieving long-distance delivery of collective microrobots in dynamic environments. This study proposes a magnetic actuation strategy for delivering collective cell microrobots in flowing conditions. The magnetic actuation method of a multi-coil system is investigated, and a mobile magnetic actuation system with multi-coil coordination is developed to generate spatially isotropic magnetic fields. We conduct experiments on the delivery of collective cell microrobots under an average flow velocity of 8.84 mm/s. Experimental results demonstrate that the proposed actuation strategy enhances driving performance in dynamic environments, achieving long-distance delivery of collective cell microrobots (over 548 mm), with delivery rates reaching up to 90% and 95% in upstream and downstream conditions, respectively. Our strategy provides an efficient control method for delivering collective microrobots, showing potential for biomedical applications.
|
|
TuBT7 |
307 |
Motion and Path Planning 2 |
Regular Session |
|
13:20-13:25, Paper TuBT7.1 | |
Planning Shorter Paths in Graphs of Convex Sets by Undistorting Parametrized Configuration Spaces |
|
Garg, Shruti | Massachusetts Institute of Technology |
Cohn, Thomas | Massachusetts Institute of Technology |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Motion and Path Planning, Collision Avoidance, Optimization and Optimal Control
Abstract: Optimization-based motion planning provides a useful modeling framework through various costs and constraints. Using the Graph of Convex Sets (GCS) for trajectory optimization gives guarantees of feasibility and optimality by representing configuration space as the finite union of convex sets. Nonlinear parametrization can be used to extend this technique (to handle cases such as kinematic loops), but it distorts distances, such that solving with convex objectives yields paths that are suboptimal in the original space. We present a method to extend GCS to nonconvex objectives, allowing us to "undistort" the optimization landscape while maintaining feasibility guarantees. We demonstrate our method's efficacy on three different robotic planning domains: a bimanual robot moving an object with both arms, the set of 3D rotations using Euler angles, and a rational parametrization of kinematics that enables certifying regions as collision-free. Across the board, our method significantly improves path length and trajectory duration with only a minimal increase in runtime.
|
|
13:25-13:30, Paper TuBT7.2 | |
Point Cloud-Based Control Barrier Functions for Model Predictive Control in Safety-Critical Navigation of Autonomous Mobile Robots |
|
Liang, Faduo | South China University of Technology |
Yang, Yunfeng | Guangdong Academy of Safety Production and Emergency Management |
Dai, Shi-Lu | South China University of Technology |
Keywords: Motion and Path Planning, Collision Avoidance, Vision-Based Navigation
Abstract: In this work, we propose a novel motion planning algorithm to facilitate safety-critical navigation for autonomous mobile robots. The proposed algorithm integrates a real-time dynamic obstacle tracking and mapping system that categorizes point clouds into dynamic and static components. For dynamic point clouds, the Kalman filter is employed to estimate and predict their motion states. Based on these predictions, we extrapolate the future states of dynamic point clouds, which are subsequently merged with static point clouds to construct the forward-time-domain (FTD) map. By combining control barrier functions (CBFs) with nonlinear model predictive control, the proposed algorithm enables the robot to effectively avoid both static and dynamic obstacles. The CBF constraints are formulated based on risk points identified through collision detection between the predicted future states and the FTD map. Experimental results from both simulated and real-world scenarios demonstrate the efficacy of the proposed algorithm in complex environments. In simulation experiments, the proposed algorithm is compared with two baseline approaches, showing superior performance in terms of safety and robustness in obstacle avoidance. The source code is released for the reference of the robotics community.
|
|
13:30-13:35, Paper TuBT7.3 | |
Any-Shape Real-Time Replanning Via Swept Volume SDF |
|
Wang, Yijin | Huzhou Institute of Zhejiang University |
Zhang, Tingrui | Zhejiang University |
Zhang, Mengke | Zhejiang University |
Ji, Shuhang | Zhejiang University
Li, Xiaoying | HEU |
Gao, Fei | Zhejiang University |
Keywords: Motion and Path Planning, Collision Avoidance, Whole-Body Motion Planning and Control
Abstract: Existing robotic trajectory planning frameworks typically approximate the robot's geometry and environmental constraints. While this improves computational efficiency, it sacrifices the solution space and frequently fails in confined environments. However, attaining a precise geometric representation and a continuous collision-free trajectory usually necessitates greater computational expenditure. This paper proposes a methodology that utilizes the concept of swept volume to address these limitations. An efficient Swept Volume Signed Distance Field computation algorithm and a B-spline trajectory representation yield a significant increase in computational efficiency while maintaining strict safety guarantees. The proposed method combines computational efficiency with maximal exploitation of the solution space. Additionally, it ensures continuous obstacle avoidance, achieving real-time 10 Hz replanning on an Intel i5 NUC11TNK for arbitrarily shaped rigid objects in complex, unstructured environments.
|
|
13:35-13:40, Paper TuBT7.4 | |
Efficient Swept Volume-Based Trajectory Generation for Arbitrary-Shaped Ground Robot Navigation |
|
Li, Yisheng | University of Hong Kong |
Yin, Longji | The University of Hong Kong |
Cai, Yixi | KTH Royal Institute of Technology |
Liu, Jianheng | The University of Hong Kong |
Zhu, Fangcheng | The University of Hong Kong |
Ma, Mingpu | The University of Hong Kong |
Liang, Siqi | The University of Hong Kong |
Li, Haotian | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Motion and Path Planning, Collision Avoidance
Abstract: Navigating an arbitrary-shaped ground robot safely in cluttered environments remains a challenging problem. Existing trajectory planners that account for the robot's physical geometry suffer from intractable runtimes. To achieve both computational efficiency and Continuous Collision Avoidance (CCA) for arbitrary-shaped ground robot planning, we propose a novel coarse-to-fine navigation framework that significantly accelerates planning. In the first stage, a sampling-based method selectively generates distinct topological paths that guarantee a minimum inflated margin. In the second stage, a geometry-aware front-end strategy discretizes these topologies into full-state robot motion sequences while concurrently partitioning the paths into SE(2) sub-problems and simpler ℝ² sub-problems for back-end optimization. In the final stage, an SVSDF-based optimizer generates trajectories tailored to these sub-problems and seamlessly splices them into a continuous final motion plan. Extensive benchmark comparisons show that the proposed method is one to several orders of magnitude faster than cutting-edge methods in runtime while maintaining a high planning success rate and ensuring CCA.
|
|
13:40-13:45, Paper TuBT7.5 | |
Enforcing Temporal and Spatial Separation Constraints in Multi-Vehicle Trajectory Generation Problems Using a Bernstein Relaxation and Refinement Method |
|
Sabetghadam, Bahareh | Instituto Superior Tecnico, Institute for Systems and Robotics |
Cunha, Rita | Instituto Superior Tecnico |
Pascoal, Antonio | Instituto Superior Tecnico |
Keywords: Motion and Path Planning, Constrained Motion Planning, Autonomous Vehicle Navigation
Abstract: Satisfying collision-avoidance constraints at all time instances along vehicles' trajectories is crucial to the success of a multi-vehicle mission. A common approach to handling collision-avoidance constraints in a trajectory generation problem is to check the constraints at some discrete points in time (or space). This approach, while straightforward, cannot always guarantee that the generated trajectories are collision-free between the points. On the other hand, most approaches for ensuring collision avoidance at all times can get overly conservative or computationally expensive. Furthermore, with these approaches, spatial deconfliction between trajectories can be very difficult, if not impossible, to enforce. In this paper, we parameterize trajectories with Bézier curves and leverage the unique properties of these curves to propose a Bernstein relaxation and refinement method for evaluating temporal and spatial separation constraints in multi-vehicle trajectory generation problems. The proposed method can guarantee inter-vehicle collision avoidance at all times, while allowing for a flexible trade-off between the conservatism and the computational complexity of generating trajectories.
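The relaxation-and-refinement idea rests on two Bézier properties: the curve lies in the convex hull of its control points (relaxation), and de Casteljau subdivision shrinks those hulls toward the curve (refinement). A sketch of a spatial-separation certificate built on these properties, assuming a simple separating-axis test along the centroid direction rather than the paper's exact Bernstein machinery:

```python
import numpy as np

def de_casteljau_split(ctrl, t=0.5):
    """Split a Bezier curve at parameter t into two halves.

    Subdivision shrinks the convex hulls of the control points toward the
    curve, which is the 'refinement' step of the check below.
    """
    ctrl = np.asarray(ctrl, dtype=float)
    left, right = [ctrl[0]], [ctrl[-1]]
    pts = ctrl
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
        left.append(pts[0])
        right.append(pts[-1])
    return np.array(left), np.array(right)[::-1]

def separated(ctrl_a, ctrl_b, d_safe, depth=8):
    """Conservatively certify two Bezier curves stay >= d_safe apart.

    Relaxation: each curve lies in the convex hull of its control points,
    so a separating-axis margin between the control point sets certifies
    the curves themselves. Refinement: on failure, subdivide and recurse.
    A sketch of the general idea, not the paper's exact test.
    """
    axis = ctrl_b.mean(axis=0) - ctrl_a.mean(axis=0)
    n = np.linalg.norm(axis)
    if n > 0:
        axis /= n
        gap = (ctrl_b @ axis).min() - (ctrl_a @ axis).max()
        if gap >= d_safe:
            return True
    if depth == 0:
        return False   # inconclusive: report a possible violation
    a1, a2 = de_casteljau_split(ctrl_a)
    b1, b2 = de_casteljau_split(ctrl_b)
    return all(separated(x, y, d_safe, depth - 1)
               for x in (a1, a2) for y in (b1, b2))
```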
|
|
13:45-13:50, Paper TuBT7.6 | |
Online Hierarchical Policy Learning Using Physics Priors for Robot Navigation in Unknown Environments |
|
Chen, Weihan | Purdue University |
Liu, Yuchen | Purdue University |
Buynitsky, Alexiy | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Motion and Path Planning, Deep Learning Methods, Representation Learning
Abstract: Robot navigation in large, complex, and unknown indoor environments is a challenging problem. The existing approaches, such as traditional sampling-based methods, struggle with resolution control and scalability, while imitation learning-based methods require a large amount of demonstration data. Active Neural Time Fields (ANTFields) have recently emerged as a promising solution by using local observations to learn cost-to-go functions without relying on demonstrations. Despite their potential, these methods are hampered by challenges such as spectral bias and catastrophic forgetting, which diminish their effectiveness in complex scenarios. To address these issues, our approach decomposes the planning problem into a hierarchical structure. At the high level, a sparse graph captures the environment’s global connectivity, while at the low level, a planner based on neural fields navigates local obstacles by solving the Eikonal PDE. This physics-informed strategy overcomes common pitfalls like spectral bias and neural field fitting difficulties, resulting in a smooth and precise representation of the cost landscape. We validate our framework in large-scale environments, demonstrating its enhanced adaptability and precision compared to previous methods, and highlighting its potential for online exploration, mapping, and real-world navigation.
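The low-level planner's physics prior is the Eikonal PDE, ||∇T(x)|| = 1/v(x), whose solution T is an arrival-time (cost-to-go) field. A generic PyTorch sketch of the corresponding residual loss, assuming a network that maps query points to a scalar cost-to-go; the paper's exact parameterization may differ:

```python
import torch

def eikonal_loss(model, x, speed=1.0):
    """Physics-informed residual for the Eikonal PDE ||grad T(x)|| = 1/speed.

    model maps query points x of shape (N, d) to a scalar cost-to-go T(x);
    driving this residual to zero makes T a valid arrival-time field whose
    gradient-descent paths are locally shortest.
    """
    x = x.requires_grad_(True)
    T = model(x).sum()                     # sum gives per-point gradients
    grad = torch.autograd.grad(T, x, create_graph=True)[0]
    return ((grad.norm(dim=-1) - 1.0 / speed) ** 2).mean()
```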
|
|
13:50-13:55, Paper TuBT7.7 | |
Accelerated Reeds-Shepp and Under-Specified Reeds-Shepp Algorithms for Mobile Robot Path Planning |
|
Ibrahim, Ibrahim | KU Leuven |
Decré, Wilm | Katholieke Universiteit Leuven |
Swevers, Jan | KU Leuven |
Keywords: Nonholonomic Motion Planning, Motion and Path Planning, Computational Geometry, Wheeled Robots
Abstract: In this study, we present a simple and intuitive method for accelerating optimal Reeds-Shepp path computation. Our approach uses geometrical reasoning to analyze the behavior of optimal paths, resulting in a new partitioning of the state space and a further reduction in the minimal set of viable paths. We revisit and reimplement classic methodologies from the literature, which lack contemporary open-source implementations, to serve as benchmarks for evaluating our method. Additionally, we address the under-specified Reeds-Shepp planning problem where the final orientation is unspecified. We perform exhaustive experiments to validate our solutions. Compared to the modern C++ implementation of the original Reeds-Shepp solution in the Open Motion Planning Library, our method demonstrates a 15× speedup, while classic methods achieve a 5.79× speedup. Both approaches exhibit machine-precision differences in path lengths compared to the original solution. We release our proposed C++ implementations for both the accelerated and under-specified Reeds-Shepp problems as open-source code.
|
|
13:55-14:00, Paper TuBT7.8 | |
Kino-PAX: Highly Parallel Kinodynamic Sampling-Based Planner |
|
Perrault, Nicolas | University of Colorado at Boulder |
Ho, Qi Heng | University of Colorado Boulder |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Nonholonomic Motion Planning, Motion and Path Planning
Abstract: Sampling-based motion planners (SBMPs) are effective for planning with complex kinodynamic constraints in high-dimensional spaces, but they still struggle to achieve real-time performance, mainly due to their serial computation design. We present Kinodynamic Parallel Accelerated eXpansion (Kino-PAX), a novel highly parallel kinodynamic SBMP designed for parallel devices such as GPUs. Kino-PAX grows a tree of trajectory segments directly in parallel. Our key insight is how to decompose the iterative tree-growth process into three massively parallel subroutines. Kino-PAX is designed to align with parallel-device execution hierarchies by ensuring that threads are largely independent, share equal workloads, and take advantage of low-latency resources while minimizing high-latency data transfers and process synchronization. This design results in a very efficient GPU implementation. We prove that Kino-PAX is probabilistically complete and analyze its scalability with compute hardware improvements. Empirical evaluations demonstrate solutions on the order of 10 ms on a desktop GPU and on the order of 100 ms on an embedded GPU, representing up to a 1000× improvement over coarse-grained CPU parallelization of state-of-the-art sequential algorithms across a range of complex environments and systems.
|
|
TuBT8 |
308 |
Medical Robots and Systems 2 |
Regular Session |
|
13:20-13:25, Paper TuBT8.1 | |
A Situation-Aware Autonomous Camera Alignment for Enhanced Suturing in Robot-Assisted Minimally Invasive Surgery |
|
Iovene, Elisa | Politecnico Di Milano |
Kossowsky Lev, Hanna | Ben Gurion University |
Sharon, Yarden | Ben-Gurion University of the Negev |
Netz, Uri | Soroka University Medical Center |
Geftler, Alex | Ben Gurion University |
Ferrigno, Giancarlo | Politecnico Di Milano |
De Momi, Elena | Politecnico Di Milano |
Nisky, Ilana | Ben Gurion University of the Negev |
Keywords: Medical Robots and Systems, Robotics and Automation in Life Sciences, Telerobotics and Teleoperation
Abstract: In robot-assisted minimally invasive surgery, optimal camera positioning is crucial for effective visualization and manipulation of tissues, which impacts the success of procedures. Traditional camera control can increase cognitive workload, lead to suboptimal camera viewpoints, and complicate surgical tasks. We propose an autonomous camera system that uses situational awareness and real-time prediction of user intent from kinematic data, and adjusts the camera position dynamically during a simulated suturing task. Such a system reduces the need to manually adjust the camera, allowing users to stay focused on the procedure. We demonstrated the framework in a user study with eight non-expert participants. They used the da Vinci Research Kit to control a simulated camera and instruments in a suturing task. We compared the performances in the suturing task with the autonomous and the teleoperated camera control. The autonomous system reduced execution time by 43%, shortened path length by 30%, and decreased completion cost by 35%. These results serve as a proof of concept for a situationally aware camera system and suggest that autonomous camera control can improve efficiency and simplify surgical workflow.
|
|
13:25-13:30, Paper TuBT8.2 | |
Design and Integration of an Optical Frequency Domain Reflectometry (OFDR) Sensor with a Flexible Pedicle Screw for Biomechanical Evaluation |
|
Kulkarni, Yash | The University of Texas at Austin |
Tavangarifard, Mobina | The University of Texas at Austin |
Amadio, Jordan P. | University of Texas Dell Medical School |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Engineering for Robotic Systems
Abstract: Spinal fixation procedures rely on pedicle screws to stabilize the vertebral column, but conventional rigid pedicle screws (RPS) face challenges such as misplacement, pullout, and loosening, particularly in patients with low bone mineral density (BMD). To overcome these limitations, we recently proposed a flexible pedicle screw (FPS) inserted along a J-shaped trajectory drilled by a steerable drilling robot. Towards biomechanical evaluation of our proposed FPS for spinal fixation procedures, in this paper we introduce the design, integration, calibration, and evaluation of an optical frequency domain reflectometry (OFDR) strain sensor integrated into an FPS. This sensor-integrated FPS (Si-FPS) provides real-time strain- and shape-sensing information, facilitating improved assessment and optimization of implant functionality. To thoroughly evaluate the Si-FPS, we first additively manufacture a special FPS and integrate an OFDR shape-sensing assembly within its structure. We then assess the shape-sensing performance of this sensorized FPS using static and dynamic FPS insertion experiments.
|
|
13:30-13:35, Paper TuBT8.3 | |
A Da Vinci Open Spina Bifida Suturing Simulator with Continuum Tools for Surgeon Skills Training |
|
Nimal, Nillan | University of Toronto |
Law, Arion | University of Toronto |
Lee, Connor Derrick | University of Toronto |
Gondokaryono, Radian | University of Toronto |
Drake, James | Hospital for Sick Children, University of Toronto |
Van Mieghem, Tim | Sinai Health |
Munawar, Adnan | Johns Hopkins University |
Looi, Thomas | Hospital for Sick Children |
Keywords: Medical Robots and Systems, Surgical Robotics: Laparoscopy, Telerobotics and Teleoperation
Abstract: Open Spina Bifida (OSB) is a congenital neural tube defect that affects approximately 1 in 1000 births worldwide. Robotic in-utero OSB repair provides a minimally invasive alternative to open surgery, which places significant strain on both baby and mother. Recent advancements in da Vinci miniature continuum tools reduce port sizes through the uterus for access to the fetus with lower maternal risk. However, idiosyncrasies in continuum tool behaviour further complicate an already difficult procedure. Consequently, a high-fidelity da Vinci OSB repair simulator featuring continuum tools is presented for surgeon skills training. The simulator incorporates a plugin for suture physics handling and soft-body physics for deformable tissues, and implements haptic virtual fixtures for improved situational awareness during suturing. Quantitative validation demonstrated virtual tool accuracy, with a mean-squared continuum backbone error of 0.64 mm² and system-level end-effector trajectory errors averaging 3.25 mm for a helix-tracing task. High-fidelity performance was maintained during suturing. Four expert surgeons from relevant specialties provided positive qualitative feedback, reporting that the simulator accurately replicates real tool control and offers a realistic and valuable training experience. Ultimately, the simulator shows promise as a training platform for safer robotic in-utero OSB repair and for facilitating the adoption of novel continuum wristed tools in clinical settings.
|
|
13:35-13:40, Paper TuBT8.4 | |
Surgical D-Knot: Augmented Dexterity for Tying Double Knots by Monitoring Optical Flow in Monocular Attention Windows |
|
Chen, Ziyang | University of California at Berkeley, Berkeley |
Hari, Kush | UC Berkeley |
Dasari, Tanmayi | University of California, Berkeley |
Shieh, Karen | University of California, Berkeley |
Jain, Ria | University of California, Berkeley |
Fer, Danyal | University of California, San Francisco East Bay |
Guthart, Gary | Intuitive Surgical |
Goldberg, Ken | UC Berkeley |
Keywords: Medical Robots and Systems
Abstract: Knot tying is a fundamental dexterous surgical subtask that is a key step in suturing. One challenge to robot augmentation is limited depth perception due to the small baseline of surgical endoscopic cameras. In this work, we present Surgical D-Knot: an augmented dexterity pipeline combining learned perception with model-based methods to perform surgical double knots using only one monocular RGB camera. This pipeline includes 2D grasp point identification, 3D suture thread grasping using local feature servoing, suture thread wrapping using relative motion and 2D re-grasp point identification. Human dexterity is required for initial thread setup and thread cutting after each double knot. Physical experiments with 120 double knot trials result in a success rate of 80.83% for the initial knot and 55.83% for the second knot. Translation of surgical knot tying to chicken skin results in success rates of 73.75% for the initial knot and 40% for the second knot. Each double knot requires on average 70 seconds. This is the first work to our knowledge that augments human dexterity for double knot tying.
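Monitoring optical flow inside a monocular attention window can be sketched as a dense flow computation on a cropped region; the window convention, the Farnebäck flow call, and the mean-magnitude statistic below are assumptions of this illustration, not necessarily the authors' implementation:

```python
import cv2
import numpy as np

def flow_magnitude_in_window(prev_gray, gray, window):
    """Mean dense optical-flow magnitude inside an attention window.

    window = (x, y, w, h) crops the region around, e.g., the grasp point;
    thresholding the returned magnitude is one way to detect whether the
    suture thread is still moving between frames.
    """
    x, y, w, h = window
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[y:y + h, x:x + w], gray[y:y + h, x:x + w], None,
        pyr_scale=0.5, levels=3, winsize=15, iterations=3,
        poly_n=5, poly_sigma=1.2, flags=0)
    return float(np.linalg.norm(flow, axis=2).mean())
```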
|
|
13:40-13:45, Paper TuBT8.5 | |
S3D: A Spatial Steerable Surgical Drilling Framework for Robotic Spinal Fixation Procedures |
|
Maroufi, Daniyal | University of Texas at Austin |
Huang, Xinyuan | The University of Texas at Austin |
Kulkarni, Yash | The University of Texas at Austin |
Rezayof, Omid | University of Texas at Austin |
Sharma, Susheela | Vanderbilt University |
Goggela, Vaibhav | The University of Texas at Austin |
Amadio, Jordan P. | University of Texas Dell Medical School |
Khadem, Mohsen | University of Edinburgh |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles
Abstract: In this paper, we introduce S³D: a Spatial Steerable Surgical Drilling Framework for Robotic Spinal Fixation Procedures. S³D is designed to enable realistic steerable drilling while accounting for the anatomical constraints associated with vertebral access in spinal fixation (SF) procedures. To achieve this, we first enhanced our previously designed concentric-tube Steerable Drilling Robot (CT-SDR) to facilitate steerable drilling across all vertebral levels of the spinal column. Additionally, we propose a four-phase calibration, registration, and navigation procedure to perform realistic SF procedures on a spine-holder phantom by integrating the CT-SDR with a seven-degree-of-freedom robotic manipulator. The functionality of this framework is validated through planar and out-of-plane steerable drilling experiments in vertebral phantoms.
|
|
13:45-13:50, Paper TuBT8.6 | |
Closed-Loop Shape-Forming Control of a Magnetic Soft Continuum Robot |
|
Francescon, Vittorio | University of Leeds |
Murasovs, Nikita | University of Leeds |
Lloyd, Peter Robert | University of Leeds |
Onaizah, Onaizah | McMaster University |
Chathuranga, Damith Suresh | University of Leeds |
Valdastri, Pietro | University of Leeds |
Keywords: Modeling, Control, and Learning for Soft Robots, Medical Robots and Systems
Abstract: Continuum manipulators are frequently employed in endoluminal interventions; however, a lack of softness and dexterity in standard manipulators can risk trauma during navigation and limit the reachable workspace. Magnetically Actuated Soft Continuum Robots (MSCRs) offer enhanced miniaturization potential and reduced rigidity due to their external actuation. Magnetizations confined to the tip of the robot offer a limited range of deformation options, whereas more versatile MSCRs can be embedded with distinct, lengthwise magnetization profiles. These full-body bespoke profiles allow the robots to form pre-determined shapes under actuation. Here we propose an approach to model and control MSCR behavior in closed loop. We employ this system to achieve shape-forming navigations subject to variations in initial conditions. To validate our methodology, we conduct experiments using a 50 mm long by 1.8 mm diameter MSCR navigating through a soft phantom from the tip of a duodenoscope. The proposed system is capable of rejecting variations in the angle at which the MSCR is inserted. We employed homogeneous magnetic fields for actuation and closed-loop vision-based control to manipulate the lengthwise body shape of our MSCR. The performance of this closed-loop approach is compared with an open-loop counterpart, which fails in all but one navigation attempt into the pancreatic duct.
|
|
13:50-13:55, Paper TuBT8.7 | |
Physics-Informed Residual Network for Magnetic Dipole Model Correction and High-Accuracy Localization |
|
Shen, Miaozhang | Southern University of Science and Technology |
Guo, Shuxiang | Southern University of Science and Technology |
Chunying, Li | Southern University of Science and Technology |
Wang, Zixu | Southern University of Science and Technology |
Keywords: Medical Robots and Systems, Localization, Optimization and Optimal Control
Abstract: The magnetic dipole model exhibits significant deviations from real-world sensor data due to neglected material nonlinearities and environmental interference. This paper proposes a Physics-Informed Residual Network (PIRNet) that adaptively corrects simulated magnetic field data by integrating dipole theory with deep residual learning. The network takes a 5×5 triaxial magnetic matrix as input and employs a dual-branch architecture: a convolutional residual branch extracts local sensor-level distortion features, while a physics-encoding branch models systematic position- and orientation-related deviations. A gated fusion mechanism dynamically combines these features, with a divergence-free constraint (∇·B=0) incorporated as a regularization term. The corrected data is processed through Levenberg-Marquardt (LM) optimization for pose estimation, followed by hybrid lookup-table compensation combining distance-weighted trilinear interpolation for spatial coordinates and spherical linear interpolation (Slerp) for orientation vectors. Experimental results show that the positioning error was reduced from 3.23 mm to 1.15 mm, the orientation error from 3.23° to 1.01°, and the average magnet-positioning speed reached 44.7 ms per frame. This approach provides a high-precision, low-cost sim-to-real transfer solution for magnetic navigation robots.
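The divergence-free regularizer can be approximated with finite differences over the sensor grid. A sketch assuming a (batch, 3, 5, 5) field tensor and penalizing only the in-plane terms, since ∂Bz/∂z is unobservable from a single planar slice; the paper's exact treatment may differ:

```python
import torch

def divergence_penalty(b_field):
    """Finite-difference penalty encouraging div B = 0 on a 5x5 triaxial grid.

    b_field: tensor (batch, 3, 5, 5) holding (Bx, By, Bz) on the sensor
    plane. Only dBx/dx + dBy/dy is penalized here; the out-of-plane term
    is an assumption left out of this sketch.
    """
    dbx_dx = b_field[:, 0, :, 1:] - b_field[:, 0, :, :-1]   # (batch, 5, 4)
    dby_dy = b_field[:, 1, 1:, :] - b_field[:, 1, :-1, :]   # (batch, 4, 5)
    div = dbx_dx[:, :-1, :] + dby_dy[:, :, :-1]             # common 4x4 grid
    return (div ** 2).mean()
```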
|
|
13:55-14:00, Paper TuBT8.8 | |
Tactile-Guided Robotic Ultrasound: Mapping Preplanned Scan Paths for Intercostal Imaging |
|
Zhang, Yifan | Technical University of Munich |
Huang, Dianye | Technical University of Munich |
Navab, Nassir | TU Munich |
Jiang, Zhongliang | Technical University of Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Sensor-based Control
Abstract: Medical ultrasound (US) imaging is widely used in clinical examinations due to its portability, real-time capability, and radiation-free nature. To address inter- and intra-operator variability, robotic ultrasound systems have gained increasing attention. However, their application in challenging intercostal imaging remains limited due to the lack of an effective scan path generation method within the constrained acoustic window. To overcome this challenge, we explore the potential of tactile cues for characterizing subcutaneous rib structures as an alternative signal for ultrasound segmentation-free bone surface point cloud extraction. Compared to 2D US images, 1D tactile-related signals offer higher processing efficiency and are less susceptible to acoustic noise and artifacts. By leveraging robotic tracking data, a sparse tactile point cloud is generated through a few scans along the rib, mimicking human palpation. To robustly map the scanning trajectory into the intercostal space, the sparse tactile bone location point cloud is first interpolated to form a denser representation. This refined point cloud is then registered to an image-based dense bone surface point cloud, enabling accurate scan path mapping for individual patients. Additionally, to ensure full coverage of the object of interest, we introduce an automated tilt angle adjustment method to visualize structures beneath the bone. To validate the proposed method, we conducted comprehensive experiments on four distinct phantoms. The final scanning waypoint mapping achieved MNND and HD errors of 3.41 mm and 3.65 mm, respectively, while the reconstructed object beneath the bone had errors of 0.69 mm and 2.2 mm compared to the CT ground truth.
|
|
TuBT9 |
309 |
Semantic Scene Understanding: Sensor Fusion |
Regular Session |
Chair: Valada, Abhinav | University of Freiburg |
|
13:20-13:25, Paper TuBT9.1 | |
DiffSSC: Semantic LiDAR Scan Completion Using Denoising Diffusion Probabilistic Models |
|
Cao, Helin | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Perception systems play a crucial role in autonomous driving, incorporating multiple sensors and corresponding computer vision algorithms. 3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings. However, such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. To address these challenges, Semantic Scene Completion (SSC) jointly predicts unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation. Building on promising results of diffusion models in image generation and super-resolution tasks, we propose their extension to SSC by implementing the noising and denoising diffusion processes in the point and semantic spaces individually. To control the generation, we employ semantic LiDAR point clouds as conditional input and design local and global regularization losses to stabilize the denoising process. We evaluate our approach on autonomous driving datasets, and it achieves state-of-the-art performance for SSC, surpassing most existing methods.
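The forward (noising) half of such a diffusion process follows the standard DDPM closed form q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t)I). A sketch applied in point space with an illustrative linear schedule; the paper runs analogous processes in the point and semantic spaces individually:

```python
import torch

def forward_noising(points, t, betas):
    """Forward (noising) diffusion step applied to point coordinates.

    Uses the DDPM closed form: x_t = sqrt(alpha_bar_t) * x_0
    + sqrt(1 - alpha_bar_t) * eps. The denoiser is trained to predict
    the returned noise `eps`. Schedule values are illustrative.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)[t]
    noise = torch.randn_like(points)
    x_t = alpha_bar.sqrt() * points + (1.0 - alpha_bar).sqrt() * noise
    return x_t, noise

betas = torch.linspace(1e-4, 0.02, 1000)        # common linear schedule
x_t, eps = forward_noising(torch.randn(2048, 3), t=500, betas=betas)
```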
|
|
13:25-13:30, Paper TuBT9.2 | |
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance |
|
Zhao, Jiayi | Hunan University |
Teng, Fei | Hunan University |
Luo, Kai | Hunan University |
Zhao, Guoqiang | Hunan University |
Li, Zhiyong | Hunan University
Zheng, Xu | The Hong Kong University of Science and Technology |
Yang, Kailun | Hunan University |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: The perception capability of robotic systems depends on the richness of their datasets. While the Segment Anything Model 2 (SAM2), trained on large-scale datasets, demonstrates strong potential in perception tasks, its inherent training paradigm makes it unsuitable for RGB-thermal (RGB-T) tasks. To address these challenges, we propose SHIFNet, a novel SAM2-driven hybrid interaction paradigm that unlocks the potential of SAM2 through language guidance for efficient RGB-thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; and (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% on PST900 and 67.8% on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, effectively reducing the high costs associated with data collection while endowing robotic systems with comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
|
|
13:30-13:35, Paper TuBT9.3 | |
SORT3D: Spatial Object-Centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models |
|
Zantout, Nader | Carnegie Mellon University |
Zhang, Haochen | Carnegie Mellon University |
Kachana, Pujith | Carnegie Mellon University |
Qiu, Jinkai | Carnegie Mellon University |
Chen, Guofei | Carnegie Mellon University |
Zhang, Ji | Carnegie Mellon University |
Wang, Wenshan | Carnegie Mellon University |
Keywords: Semantic Scene Understanding, Vision-Based Navigation, AI-Enabled Robotics
Abstract: Interpreting object-referential language and grounding objects in 3D with spatial relations and attributes is essential for robots operating alongside humans. However, this task is often challenging due to the diversity of scenes, large number of fine-grained objects, and complex free-form nature of language references. Furthermore, in the 3D domain, obtaining large amounts of natural language training data is difficult. Thus, it is important for methods to learn from little data and zero-shot generalize to new environments. To address these challenges, we propose SORT3D, an approach that utilizes rich object attributes from 2D data and merges a heuristics-based spatial reasoning toolbox with the ability of large language models (LLMs) to perform sequential reasoning. Importantly, our method does not require text-to-3D data for training and can be applied zero-shot to unseen environments. We show that SORT3D achieves state-of-the-art zero-shot performance on complex view-dependent grounding tasks on two benchmarks. We also implement the pipeline to run real-time on two autonomous vehicles and demonstrate that our approach can be used for object-goal navigation on previously unseen real-world environments. All source code for the system pipeline is publicly released.
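A heuristics-based spatial toolbox exposes predicates the LLM can call during sequential reasoning. A toy sketch of one view-dependent predicate, with names and conventions that are illustrative rather than taken from SORT3D:

```python
import numpy as np

def left_of(anchor, target, viewer):
    """View-dependent 'left of' predicate on 2D ground-plane positions.

    The viewer looks from `viewer` toward `anchor`; `target` counts as
    'left of' the anchor when the 2D cross product of the viewing
    direction and the anchor->target offset is positive.
    """
    anchor, target, viewer = map(np.asarray, (anchor, target, viewer))
    view = anchor - viewer
    offset = target - anchor
    return float(view[0] * offset[1] - view[1] * offset[0]) > 0.0

print(left_of([2, 0], [2, 1], viewer=[0, 0]))   # True: target on the left
```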
|
|
13:35-13:40, Paper TuBT9.4 | |
REACT: Real-Time Efficient Attribute Clustering and Transfer for Updatable 3D Scene Graph |
|
Nguyen, Phuoc | Aalto University |
Verdoja, Francesco | Aalto University |
Kyrki, Ville | Aalto University |
Keywords: Semantic Scene Understanding, Mapping, Object Detection, Segmentation and Categorization
Abstract: Modern-day autonomous robots need high-level map representations to perform sophisticated tasks. Recently, 3D Scene Graphs (3DSG) have emerged as a promising alternative to traditional grid maps, blending efficient memory use and rich feature representation. However, most efforts to apply them have been limited to static worlds. This work introduces REACT, a framework that efficiently performs real-time attribute clustering and transfer to relocalize object nodes in a 3DSG. REACT employs a novel method for comparing object instances using an embedding model trained on triplet loss, facilitating instance clustering and matching. Experimental results demonstrate that REACT is able to relocalize objects while maintaining computational efficiency. The REACT framework's source code will be available as an open-source project, promoting further advancements in reusable and updatable 3DSGs.
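A minimal sketch of the triplet objective behind the instance-embedding model; the margin value and the hand-rolled form (rather than torch.nn.TripletMarginLoss) are choices of this illustration:

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over instance embeddings: pulls views of the same
    object together and pushes different objects apart by at least
    `margin`. Matching a re-observed object then reduces to nearest-
    neighbor search in embedding space.
    """
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```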
|
|
13:40-13:45, Paper TuBT9.5 | |
Hybrid Transformer-Mamba Model for 3D Semantic Segmentation |
|
Wang, Xinyu | Huazhong University of Science and Technology |
Hou, Jinghua | Huazhong University of Science and Technology |
Liu, Zhe | Huazhong University of Science and Technology |
Zhu, Yingying | Huazhong University of Science and Technology |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: Transformer-based methods have demonstrated remarkable capabilities in 3D semantic segmentation through their powerful attention mechanisms, but the quadratic complexity limits their modeling of long-range dependencies in large-scale point clouds. While recent Mamba-based approaches offer efficient processing with linear complexity, they struggle with feature representation when extracting 3D features. However, effectively combining these complementary strengths remains an open challenge in this field. In this paper, we propose HybridTM, the first hybrid architecture that integrates Transformer and Mamba for 3D semantic segmentation. In addition, we propose the Inner Layer Hybrid Strategy, which combines attention and Mamba at a finer granularity, enabling simultaneous capture of long-range dependencies and fine-grained local features. Extensive experiments demonstrate the effectiveness and generalization of our HybridTM on diverse indoor and outdoor datasets. Furthermore, our HybridTM achieves state-of-the-art performance on ScanNet, ScanNet200, and nuScenes benchmarks. The code will be made available at https://github.com/deepinact/HybridTM.
|
|
13:45-13:50, Paper TuBT9.6 | |
Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning |
|
Mohan, Rohit | University of Freiburg |
Hindel, Julia | University of Freiburg |
Drews, Florian | Robert Bosch GmbH |
Glaeser, Claudius | Robert Bosch GmbH |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Semantic Scene Understanding, Intelligent Transportation Systems, Deep Learning for Visual Perception
Abstract: Autonomous vehicles that navigate in open-world environments may encounter previously unseen object classes. However, most existing LiDAR panoptic segmentation models rely on closed-set assumptions, failing to detect unknown object instances. In this work, we propose ULOPS, an uncertainty-guided open-set panoptic segmentation framework that leverages Dirichlet-based evidential learning to model predictive uncertainty. Our architecture incorporates separate decoders for semantic segmentation with uncertainty estimation, embedding with prototype association, and instance center prediction. During inference, we leverage uncertainty estimates to identify and segment unknown instances. To strengthen the model's ability to differentiate between known and unknown objects, we introduce three uncertainty-driven loss functions: a Uniform Evidence Loss that encourages high uncertainty in unknown regions; an Adaptive Uncertainty Separation Loss that ensures a consistent difference in uncertainty estimates between known and unknown objects at a global scale; and a Contrastive Uncertainty Loss that refines this separation at the fine-grained level. To evaluate open-set performance, we extend benchmark settings on KITTI-360 and introduce a new open-set evaluation for nuScenes. Extensive experiments demonstrate that ULOPS consistently outperforms existing open-set LiDAR panoptic segmentation methods. We make the code and pre-trained models available at http://ulops.cs.uni-freiburg.de.
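In Dirichlet-based evidential learning, per-class evidence e_k induces a Dirichlet with α_k = e_k + 1, and the vacuity-style uncertainty is u = K/S with S = Σ_k α_k, approaching 1 when no evidence supports any class. A sketch of this standard computation (the paper builds its three losses on top of such estimates):

```python
import torch

def dirichlet_uncertainty(evidence):
    """Predictive uncertainty from Dirichlet evidential outputs.

    evidence: nonnegative per-class evidence e_k of shape (batch, K).
    Returns expected class probabilities alpha/S and the vacuity
    uncertainty u = K / S, which is high for unknown objects.
    """
    alpha = evidence + 1.0
    S = alpha.sum(dim=-1, keepdim=True)
    probs = alpha / S
    u = evidence.shape[-1] / S.squeeze(-1)
    return probs, u
```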
|
|
13:50-13:55, Paper TuBT9.7 | |
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation |
|
Ge, Luzhou | Beijing Institute of Technology |
Zhu, Xiangyu | Beijing Institute of Technology |
Yang, Zhuo | Beijing Institute of Technology |
Li, Xuesong | Beijing Institute of Technology |
Keywords: Semantic Scene Understanding, RGB-D Perception
Abstract: In real-world scenarios, environment changes caused by human or agent activities make it extremely challenging for robots to perform various long-term tasks. Recent works typically struggle to effectively understand and adapt to dynamic environments due to the inability to update their environment representations in memory in response to environment changes and lack of fine-grained reconstruction of the environments. To address these challenges, we propose DynamicGSG, a dynamic, high-fidelity, open-vocabulary scene graph construction system leveraging Gaussian Splatting. DynamicGSG builds hierarchical scene graphs using advanced vision language models to represent the spatial hierarchy and semantic relationships between objects in the environments, utilizes a joint feature loss to supervise Gaussian instance grouping while optimizing the Gaussian maps, and locally updates the Gaussian scene graphs according to real environment changes for long-term environment adaptation. Experiments and ablation studies demonstrate the performance and efficacy of our proposed method in terms of semantic segmentation, language-guided object retrieval, and reconstruction quality. In addition, we validate the dynamic updating capabilities of our system within real-world laboratory settings. The source code and supplementary materials will be available at: https://github.com/GeLuzhou/Dynamic-GSG.
|
|
13:55-14:00, Paper TuBT9.8 | |
SDS-SLAM: VSLAM Fusing Static and Dynamic Semantic Information for Driving Scenarios (I) |
|
Liu, Yang | Wuhan University |
Guo, Chi | Wuhan University |
Zhan, Jiao | Wuhan University |
Wu, Xiaoyu | Wuhan University |
Keywords: SLAM, Visual Tracking, Semantic Scene Understanding
Abstract: Visual semantic SLAM integrates geometric measurements with semantic perception, making it widely applicable in autonomous driving and robotics. Semantic-assisted localization and dynamic object perception are two critical tasks in visual semantic SLAM. However, many existing state-of-the-art methods address only one of these tasks in isolation. To address issues of functional limitations and insufficient information utilization in a single framework, we propose a unified visual semantic SLAM framework, SDS-SLAM, which tightly couples static and dynamic semantic information to handle the motion estimation of both the camera and observed objects in driving scenarios. A multi-task network for driving perception is employed to extract semantic information, including drivable areas, lanes, and vehicles. Based on various information obtained, we propose semantic local ground manifolds (SLGMs) to represent the geometric structure and semantic features, enabling the online generation of a lightweight semantic map. Subsequently, we integrate SLGM-based constraints such as lane alignment and planar motion to promote camera and object pose estimation. We evaluated our method on the public KITTI dataset and self-collected real-world data. The results demonstrate that our method effectively perceives both dynamic and static semantic elements in driving scenarios, achieving high accuracy in estimating the poses of the camera and objects.
|
|
TuBT10 |
310 |
Semantic Scene Understanding |
Regular Session |
|
13:20-13:25, Paper TuBT10.1 | |
SfmOcc: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments |
|
Marcuzzi, Rodrigo | University of Bonn |
Nunes, Lucas | University of Bonn |
Marks, Elias Ariel | University of Bonn |
Wiesmann, Louis | University of Bonn |
Läbe, Thomas | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Semantic Scene Understanding, Deep Learning Methods
Abstract: Semantic scene understanding is crucial for autonomous systems and 3D semantic occupancy prediction is a key task since it provides geometric and possibly semantic information of the vehicle’s surroundings. Most existing vision-based approaches to occupancy estimation rely on 3D voxel labels or segmented LiDAR point clouds for supervision. This limits their application to the availability of a 3D LiDAR sensor or the costly labeling of the voxels. While other approaches rely only on images for training, they usually supervise only with a few consecutive images and optimize for proxy tasks like volume reconstruction or depth prediction. In this paper, we propose a novel method for semantic occupancy prediction using only vision data also for supervision. We leverage all the available training images of a sequence and use bundle adjustment to align the images and estimate camera poses from which we obtain depth images. We compute semantic maps from a pre-trained open-vocabulary image model and generate occupancy pseudo labels to explicitly optimize for the 3D semantic occupancy prediction task. Without any manual or LiDAR-based labels, our approach predicts full 3D occupancy voxel grids and achieves state-of-the-art results for 3D occupancy prediction among methods trained without labels.
|
|
13:25-13:30, Paper TuBT10.2 | |
Exploiting Information Theory for Intuitive Robot Programming of Manual Activities |
|
Merlo, Elena | Italian Institute of Technology |
Lagomarsino, Marta | Istituto Italiano Di Tecnologia |
Lamon, Edoardo | University of Trento |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Semantic Scene Understanding, Learning from Demonstration, Manipulation Planning, Information Theory
Abstract: Observational learning is a promising approach to enable people without expertise in programming to transfer skills to robots in a user-friendly manner, since it mirrors how humans learn new behaviors by observing others. Many existing methods focus on instructing robots to mimic human trajectories, but motion-level strategies often pose challenges for skill generalization across diverse environments. This article proposes a novel framework that allows robots to achieve a higher-level understanding of human-demonstrated manual tasks recorded in RGB videos. By recognizing the task structure and goals, robots can generalize what they observe to unseen scenarios. We ground our task representation in Shannon's Information Theory (IT), which is applied for the first time to manual tasks. IT helps extract the active scene elements and quantify the information shared between hands and objects. We exploit scene-graph properties to encode the extracted interaction features in a compact structure and segment the demonstration into blocks, streamlining the generation of behavior trees for robot replication. Experiments validated the effectiveness of IT in automatically generating robot execution plans from a single human demonstration. In addition, we provide HANDSOME, an open-source dataset of HAND Skills demOnstrated by Multi-subjEcts, to promote further research and evaluation in this field.
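The hand-object information sharing the abstract mentions can be quantified with Shannon mutual information over a co-occurrence table. A toy sketch with illustrative counts; the paper's exact random variables may differ:

```python
import numpy as np

def mutual_information(joint_counts):
    """Shannon mutual information I(H; O) from a hand-object co-occurrence
    table (rows: hand states, columns: objects), quantifying how much
    observing the hand tells us about which object is being acted on.
    """
    p = joint_counts / joint_counts.sum()
    ph, po = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / (ph @ po)[mask])).sum())

counts = np.array([[8.0, 1.0], [1.0, 8.0]])   # hand state vs. object (toy data)
print(mutual_information(counts))             # ~0.50 bits
```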
|
|
13:30-13:35, Paper TuBT10.3 | |
SemRaFiner: Panoptic Segmentation in Sparse and Noisy Radar Point Clouds |
|
Zeller, Matthias | CARIAD SE |
Casado Herraez, Daniel | University of Bonn & CARIAD SE |
Ayan, Bengisu | Technical University of Munich |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Stachniss, Cyrill | University of Bonn |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: Semantic scene understanding, including the perception and classification of moving agents, is essential to enabling safe and robust driving behaviours of autonomous vehicles. Cameras and LiDARs are commonly used for semantic scene understanding. However, both sensor modalities face limitations in adverse weather and usually do not provide motion information. Radar sensors overcome these limitations and directly offer information about moving agents by measuring the Doppler velocity, but the measurements are comparably sparse and noisy. In this paper, we address the problem of panoptic segmentation in sparse radar point clouds to enhance scene understanding. Our approach, called SemRaFiner, accounts for changing density in sparse radar point clouds and optimizes the feature extraction to improve accuracy. Furthermore, we propose an optimized training procedure to refine instance assignments by incorporating a dedicated data augmentation. Our experiments suggest that our approach outperforms state-of-the-art methods for radar-based panoptic segmentation.
|
|
13:35-13:40, Paper TuBT10.4 | |
CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes |
|
Broedermann, Tim | ETH Zurich |
Sakaridis, Christos | ETH Zurich |
Fu, Yuqian | INSAIT |
Van Gool, Luc | ETH Zurich |
Keywords: Sensor Fusion, Semantic Scene Understanding, Computer Vision for Transportation
Abstract: Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further newly introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single and shared pre-trained backbone. By dynamically adapting sensor fusion based on the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic and 78.2 mIoU for semantic segmentation and also sets the new state of the art on DeLiVER. The source code is publicly available at: https://github.com/timbroed/CAFuser.
|
|
13:40-13:45, Paper TuBT10.5 | |
Stereo-LiDAR Fusion by Semi-Global Matching with Discrete Disparity-Matching Cost and Semidensification |
|
Yao, Yasuhiro | The University of Tokyo |
Ishikawa, Ryoichi | The University of Tokyo |
Oishi, Takeshi | The University of Tokyo |
Keywords: Sensor Fusion, Computer Vision for Automation, Range Sensing
Abstract: We present a real-time, nonlearning depth estimation method that fuses Light Detection and Ranging (LiDAR) data with stereo camera input. Our approach comprises three key techniques: Semi-Global Matching (SGM) stereo with Discrete Disparity-matching Cost (DDC), semidensification of LiDAR disparity, and a consistency check that combines stereo images and LiDAR data. Each of these components is designed for parallelization on a GPU to realize real-time performance. When it was evaluated on the KITTI dataset, the proposed method achieved an error rate of 2.79%, outperforming the previous state-of-the-art real-time stereo-LiDAR fusion method, which had an error rate of 3.05%. Furthermore, we tested the proposed method in various scenarios, including different LiDAR point densities and weather conditions, to demonstrate its high adaptability. We believe that the real-time and nonlearning nature of our method makes it highly practical for applications in robotics and automation.
|
|
13:45-13:50, Paper TuBT10.6 | |
SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum |
|
Low, JunEn | Stanford University |
Adang, Maximilian | Stanford University |
Yu, Javier | Stanford University |
Nagami, Keiko | Stanford University |
Schwager, Mac | Stanford University |
Keywords: Sensorimotor Learning, Vision-Based Navigation, Aerial Systems: Perception and Autonomy
Abstract: We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
|
|
13:50-13:55, Paper TuBT10.7 | |
ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion from Monocular Camera |
|
Liang, Jing | University of Maryland |
Yin, He | Amazon.com, Inc |
Qi, Xuewei | Toyota Research Labs |
Park, Jong Jin | Amazon Lab126 |
Sun, Min | National Tsing Hua University |
Madhivanan, Rajasimman | Amazon.com |
Manocha, Dinesh | University of Maryland |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Visual Learning
Abstract: We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera. Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions. By designing a triplane-based deformable attention mechanism, our approach improves geometric understanding of the scene compared to other SOTA approaches and reduces noise in semantic predictions. Additionally, through the use of a Conditional Variational AutoEncoder (CVAE), we estimate the uncertainties of these predictions. The generated semantic and uncertainty maps will help formulate navigation strategies that facilitate safe and permissible decision making in the future. Evaluated on the Semantic-KITTI dataset, ET-Former achieves the highest Intersection over Union (IoU) and mean IoU (mIoU) scores while maintaining the lowest GPU memory usage, surpassing state-of-the-art (SOTA) methods. It improves the SOTA scores of IoU from 44.71 to 51.49 and mIoU from 15.04 to 16.30 on the SemanticKITTI test set, with a notably low training memory consumption of 10.9 GB. Project page: https://github.com/amazon-science/ET-Former
|
|
TuBT11 |
311A |
Reinforcement Learning 2 |
Regular Session |
|
13:20-13:25, Paper TuBT11.1 | |
MAVRL: Learn to Fly in Cluttered Environments with Varying Speed |
|
Yu, Hang | Delft University of Technology |
De Wagter, Christophe | Delft University of Technology |
de Croon, Guido | TU Delft |
Keywords: Reinforcement Learning, Collision Avoidance, Vision-Based Navigation
Abstract: Autonomous flight in unknown, cluttered environments is still a major challenge in robotics. Existing obstacle avoidance algorithms typically adopt a fixed flight velocity, overlooking the crucial balance between safety and agility. We propose a reinforcement learning algorithm to learn an adaptive flight speed policy tailored to varying environment complexities, enhancing obstacle avoidance safety. A downside of learning-based obstacle avoidance algorithms is that the lack of a mapping module can lead to the drone getting stuck in complex scenarios. To address this, we introduce a novel training setup for the latent space that retains memory of previous depth map observations. The latent space is explicitly trained to predict both past and current depth maps. Our findings confirm that varying speed leads to a superior balance of success rate and agility in cluttered environments. Additionally, our memory-augmented latent representation outperforms the latent representation commonly used in reinforcement learning. Furthermore, an extensive comparison of our method with the existing state-of-the-art approaches Agile-autonomy and Ego-planner shows the superior performance of our approach, especially in highly cluttered environments. Finally, after minimal fine-tuning, we successfully deployed our network on a real drone for enhanced obstacle avoidance.
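To make the memory-augmented latent representation concrete, the sketch below is an illustrative toy model, not the authors' implementation: a recurrent encoder whose latent is supervised to reconstruct both the current and the previous depth map, so that the latent retains memory of past observations. The module sizes, the GRU choice, and the 64x64 depth resolution are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLatent(nn.Module):
    """Toy encoder whose recurrent latent must reconstruct both the current
    and the previous depth map, forcing it to retain memory (illustrative)."""
    def __init__(self, dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, dim), nn.ReLU())
        self.gru = nn.GRUCell(dim, dim)          # carries memory across steps
        self.dec_now = nn.Linear(dim, 64 * 64)   # reconstructs current depth
        self.dec_past = nn.Linear(dim, 64 * 64)  # reconstructs previous depth

    def forward(self, depth, h):
        h = self.gru(self.enc(depth), h)
        return self.dec_now(h), self.dec_past(h), h

def latent_loss(model, depth_t, depth_prev, h):
    """Explicitly train the latent to predict past and current depth maps."""
    now, past, h = model(depth_t, h)
    loss = (F.mse_loss(now, depth_t.flatten(1))
            + F.mse_loss(past, depth_prev.flatten(1)))
    return loss, h
```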
|
|
13:25-13:30, Paper TuBT11.2 | |
Diverse Policy Learning Via Random Obstacle Deployment for Zero-Shot Adaptation |
|
Choi, Seokjin | Seoul National University |
Lee, Yonghyeon | MIT |
Kim, Seungyeon | Seoul National University |
Park, Che-Sang | Seoul National University |
Hwang, Himchan | Seoul National University |
Park, Frank | Seoul National University |
Keywords: Reinforcement Learning, Representation Learning, Machine Learning for Robot Control
Abstract: In this paper, we propose a novel reinforcement learning framework that enables zero-shot policy adaptation in environments with unseen, dynamically changing obstacles. Adopting the idea that learning a policy capable of generating diverse actions is key to achieving such adaptability, our primary contribution is a novel learning algorithm that incorporates random obstacle deployment, enabling the policy to explore and learn diverse actions. This method overcomes the limitations of existing diverse policy learning approaches, which primarily rely on mutual information maximization to increase diversity. To enable zero-shot dynamic adaptation, our method further involves two key components: a state-dependent latent skill sampler and a motion predictor. We sample multiple skill variables at each state using the skill sampler, filter out unsafe skills using the motion predictor, and then execute actions corresponding to safe skills. Compared to existing methods, our experiments demonstrate that the proposed method generates significantly more diverse actions and adapts better to dynamically changing environments, making it highly effective for tasks with varying constraints such as moving obstacles.
|
|
13:30-13:35, Paper TuBT11.3 | |
Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robotic Navigation |
|
Patel, Bhrij | UMD |
Kulathun Mudiyanselage, Kasun Weerakoon | University of Maryland, College Park |
Suttle, Wesley A. | DEVCOM ARL |
Koppel, Alec | JP Morgan Chase |
Sadler, Brian | Army Research Laboratory |
Zhou, Tianyi | University of Maryland, College Park |
Manocha, Dinesh | University of Maryland |
Bedi, Amrit Singh | University of Maryland, College Park |
Keywords: Reinforcement Learning, AI-Based Methods, Optimization and Optimal Control
Abstract: Reinforcement learning (RL) offers a promising solution for robotic navigation, enabling robots to learn through trial and error. However, sparse reward settings, common in real-world robotic tasks, present a significant challenge by limiting exploration and leading to suboptimal policies. We introduce Confidence-Controlled Exploration (CCE), a novel method to improve sample efficiency in RL-based robotic navigation without modifying the reward function. Unlike existing techniques such as entropy regularization and reward shaping, which introduce instability by altering the reward, CCE dynamically adjusts trajectory length based on policy entropy, shortening trajectories when uncertainty is high to enhance exploration efficiency and extending them when confidence is high to prioritize exploitation. CCE is a principled and practical solution inspired by a theoretical connection between policy entropy and gradient estimation. It integrates seamlessly with on-policy and off-policy RL methods and requires minimal modifications. We validate CCE across REINFORCE, PPO, and SAC in both simulated and real-world navigation tasks. CCE outperforms fixed-trajectory and entropy-regularized baselines, achieving an 18% higher success rate, 20-38% shorter paths, and 9.32% lower elevation costs under a fixed training sample budget. Deploying CCE on a Clearpath Husky robot further demonstrates its effectiveness in complex outdoor environments.
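As a minimal sketch of the confidence-controlled idea (the linear schedule, bounds, and function name are assumptions, not the paper's exact rule), the rollout horizon can be shortened when policy entropy is high and lengthened when it is low:

```python
import numpy as np

def cce_horizon(policy_entropy, h_min, h_max, t_min=32, t_max=512):
    """Map policy entropy to a rollout horizon: high entropy (low confidence)
    yields short trajectories that favor exploration; low entropy yields long
    trajectories that favor exploitation. The linear mapping is illustrative."""
    u = np.clip((policy_entropy - h_min) / (h_max - h_min + 1e-8), 0.0, 1.0)
    return int(round(t_max - u * (t_max - t_min)))
```

For instance, `cce_horizon(1.8, h_min=0.5, h_max=2.0)` returns a horizon near `t_min`, cutting rollouts short while the policy is still uncertain.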
|
|
13:35-13:40, Paper TuBT11.4 | |
Disturbance Observer-Based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning |
|
Kalaria, Dvij | Carnegie Mellon University |
Lin, Qin | University of Houston |
Dolan, John M. | Carnegie Mellon University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Robust/Adaptive Control
Abstract: Reinforcement learning (RL) agents need to explore their environment to learn optimal behaviors and achieve maximum rewards. However, exploration can be risky when training RL directly on real systems, while simulation-based training introduces the tricky issue of the sim-to-real gap. Recent approaches have leveraged safety filters, such as control barrier functions (CBFs), to penalize unsafe actions during RL training. However, the strong safety guarantees of CBFs rely on a precise dynamic model. In practice, uncertainties always exist, including internal disturbances from the errors of dynamics and external disturbances such as wind. In this work, we propose a novel safe RL framework built on a robust CBF, where the discrepancy between the nominal and true dynamic models is quantified through a combination of disturbance observation and residual model learning. We demonstrate our results on the Safety-gym benchmark for Point and Car robots on all tasks where we can outperform state-of-the-art approaches that use only residual model learning or a disturbance observer (DOB). We further validate the efficacy of our framework using a physical F1/10 racing car. Videos: https://sites.google.com/view/res-dob-cbf-rl
|
|
13:40-13:45, Paper TuBT11.5 | |
Affordance-Guided Reinforcement Learning Via Visual Prompting |
|
Lee, Olivia Y. | Stanford University |
Xie, Annie | Stanford University |
Fang, Kuan | Cornell University |
Pertsch, Karl | UC Berkeley & Stanford University |
Finn, Chelsea | Stanford University |
Keywords: Reinforcement Learning, Continual Learning, Big Data in Robotics and Automation
Abstract: Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, there has also been growing adoption of large multi-modal foundation models for robotics that can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capabilities, in this work, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive zero-shot reasoning about affordances through keypoints, and we use these to define dense rewards that guide autonomous robotic learning. On diverse real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 45K online fine-tuning steps.
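A dense keypoint-shaped reward of the kind described could look like the following sketch; the Gaussian shaping and the sigma value are hypothetical choices for illustration, not the paper's exact reward.

```python
import numpy as np

def keypoint_reward(ee_pos, keypoints, sigma=0.05):
    """Dense reward from VLM-proposed affordance keypoints: peaks as the
    end-effector reaches the nearest keypoint. ee_pos: (3,), keypoints: (K, 3)."""
    d = np.linalg.norm(keypoints - ee_pos, axis=1).min()
    return float(np.exp(-(d / sigma) ** 2))  # in (0, 1], peaked at the keypoint
```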
|
|
13:45-13:50, Paper TuBT11.6 | |
A Safety-Adjusted Policy Optimization Algorithm and Application for Obstacle Avoidance in the Quadcopter |
|
Xia, Gang | Sichuan University |
Yang, Xinsong | Sichuan University |
Qi, Qihan | Sichuan University |
Sun, Yaping | Sichuan University |
Dong, Xiwang | Beihang University |
Keywords: Reinforcement Learning, Collision Avoidance
Abstract: Ensuring the safety of various real-world applications based on reinforcement learning (RL), such as quadcopter control, robotic manipulators, and autonomous robots, remains a critical challenge, despite RL’s remarkable success in solving complex decision-making tasks. Existing on-policy Lagrangian optimization methods in safe RL typically use a single policy to balance the trade-off between safety and return, without taking into account the potential benefits of adopting multiple policies. In this paper, a new on-policy method is proposed, named Safety-Adjusted Policy Optimization (SAPO), which is a dual-policy framework designed to address safety constraint violations in RL. By incorporating a cost-oriented policy to dynamically adjust a reward-oriented policy, SAPO effectively resolves the trade-off between safety and return. Moreover, to enhance performance in carrying out high-dimensional tasks, the Kullback-Leibler (KL) divergence and a Gaussian kernel are employed in the distance functions to facilitate the training. In addition, a quadcopter safe-navigation task is designed to overcome the drawback of previous research on quadcopter safe navigation with RL, which only pays attention to reward-function design without considering policy-level optimization. Finally, experimental results verify the feasibility of the designed task. Meanwhile, as indicated by tests on a real device, the proposed algorithm is easy to implement, offers performance guarantees, and outperforms existing safe RL baselines.
|
|
13:50-13:55, Paper TuBT11.7 | |
Dynamics-Invariant Quadrotor Control Using Scale-Aware Deep Reinforcement Learning |
|
Vaidya, Varad | Indian Institute of Science |
Keshavan, Jishnu | Indian Institute of Science |
Keywords: Reinforcement Learning, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: Due to dynamic variations such as changing payload, aerodynamic disturbances, and varying platforms, a robust solution for quadrotor trajectory tracking remains challenging. To address these challenges, we present a deep reinforcement learning (DRL) framework that achieves physical dynamics invariance by directly optimizing force/torque inputs, eliminating the need for traditional intermediate control layers. Our architecture integrates a temporal trajectory encoder, which processes finite-horizon reference positions/velocities, with a latent dynamics encoder trained on historical state-action pairs to model platform-specific characteristics. Additionally, we introduce scale-aware dynamics randomization parameterized by the quadrotor's arm length, enabling our approach to maintain stability across drones spanning from 30 g to 2.1 kg and outperform other DRL baselines by 85% in tracking accuracy. Extensive real-world validation of our approach on the Crazyflie 2.1 quadrotor, encompassing over 200 flights, demonstrates robust adaptation to wind, ground effects, and swinging payloads while achieving less than 0.05 m RMSE at speeds up to 2.0 m/s. This work introduces a universal quadrotor control paradigm that compensates for dynamic discrepancies across varied conditions and scales, paving the way for more resilient aerial systems.
|
|
13:55-14:00, Paper TuBT11.8 | |
Learning Robust and Flexible Locomotion of Wheel-Legged Quadruped Robots in Complex Terrains |
|
Zhou, Shiyu | Shanghai Jiao Tong University |
Liu, Shaoxun | Zhejiang University |
Wang, Rongrong | Zhejiang University |
Keywords: Reinforcement Learning, AI-Enabled Robotics, Model Learning for Control
Abstract: The wheel-legged quadruped robot, equipped with leg and end-wheel structures, possesses the capability to traverse continuous surfaces at relatively high speeds while also being able to navigate unstructured terrains. However, designing its controller using traditional methods presents significant challenges, particularly under conditions of limited or lost external environmental perception and highly variable terrain complexity. In light of this issue, this paper proposes a novel, concise, and effective reinforcement learning framework. The framework employs an asymmetric actor-critic structure incorporating a velocity estimation network and leverages multi-contact states generated by a central pattern generator for fusion, thereby training a single control policy to address the robust and flexible traversal of complex terrains by wheel-legged robots relying solely on an inertial measurement unit and joint sensors. Our method enables the modified Unitree Go1-based wheel-legged robot to traverse various challenging terrains, such as steps, high obstacles, rough terrain, and low-adhesion surfaces, while ensuring efficient locomotion performance on smooth and continuous surfaces. The effectiveness of the framework's training results has been validated through testing in both simulation and real-world environments.
|
|
TuBT12 |
311B |
RGB-D Perception 2 |
Regular Session |
|
13:20-13:25, Paper TuBT12.1 | |
Self-Supervised Monocular Depth Estimation for Dynamic Objects with Ground Propagation |
|
Li, Huan | Bologna University |
Poggi, Matteo | University of Bologna |
Tosi, Fabio | University of Bologna |
Mattoccia, Stefano | University of Bologna |
Keywords: RGB-D Perception
Abstract: Self-supervised single-view depth estimation, trained on video sequences, faces significant challenges when dynamic objects are present in the training data, as they violate the basic multi-view geometry assumptions used to compute photometric losses. We propose a novel approach that leverages the relationship between the depth of moving objects and their ground contact points. By iteratively propagating ground features to moving targets in perceptual layers, we recalibrate the depth of dynamic entities while preserving details. Our method maintains the end-to-end training paradigm without additional networks or complex training procedures. Our experiments demonstrate that our method achieves state-of-the-art performance when estimating depth for dynamic objects and attains superior generalization compared to existing approaches.
|
|
13:25-13:30, Paper TuBT12.2 | |
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation |
|
Nie, Dujun | Institute of Automation, Chinese Academy of Sciences |
Guo, Xianda | School of Computer Science, Wuhan University |
Duan, Yiqun | University of Technolgoy Sydney |
Zhang, Ruijun | Institute of Automation,Chinese Academy of Sciences |
Chen, Long | Chinese Academy of Sciences |
Keywords: RGB-D Perception, Computer Vision for Automation, Vision-Based Navigation
Abstract: Object Goal Navigation, which requires an agent to locate a specific object in an unseen environment, remains a core challenge in embodied AI. Although recent progress in Vision-Language Model (VLM)-based agents has demonstrated promising perception and decision-making abilities through prompting, none has yet established a fully modular world model design that reduces risky and costly interactions with the environment by predicting the future state of the world. We introduce WMNav, a novel World Model-based Navigation framework powered by Vision-Language Models (VLMs). It predicts possible outcomes of decisions and builds memories to provide feedback to the policy module. In order to retain the predicted state of the environment, WMNav proposes the online maintained Curiosity Value Map as part of the world model memory to provide dynamic configuration for navigation policy. By decomposing according to a human-like thinking process, WMNav effectively alleviates the impact of model hallucination by making decisions based on the feedback difference between the world model plan and observation. To further boost efficiency, we implement a two-stage action proposer strategy: broad exploration followed by precise localization. Extensive evaluation on HM3D and MP3D validates that WMNav surpasses existing zero-shot benchmarks in both success rate and exploration efficiency (absolute improvement: +3.2% SR and +3.2% SPL on HM3D, +13.5% SR and +1.1% SPL on MP3D). Project page: https://b0b8k1ng.github.io/WMNav/.
|
|
13:30-13:35, Paper TuBT12.3 | |
Efficient Prediction of Dense Visual Embeddings Via Distillation and RGB-D Transformers |
|
Fischedick, Söhnke Benedikt | Ilmenau University of Technology |
Seichter, Daniel | Ilmenau University of Technology |
Stephan, Benedict | Ilmenau University of Technology |
Schmidt, Robin | Ilmenau University of Technology |
Gross, Horst-Michael | Ilmenau University of Technology |
Keywords: Deep Learning Methods, RGB-D Perception, Representation Learning
Abstract: In domestic environments, robots require a comprehensive understanding of their surroundings to interact effectively and intuitively with untrained humans. In this paper, we propose DVEFormer - an efficient RGB-D Transformer-based approach that predicts dense text-aligned visual embeddings (DVE) via knowledge distillation. Instead of directly performing classical semantic segmentation with fixed predefined classes, our method uses teacher embeddings from Alpha-CLIP to guide our efficient student model DVEFormer in learning fine-grained pixel-wise embeddings. While this approach still enables classical semantic segmentation, e.g., via linear probing, it further enables flexible text-based querying and other applications, such as creating comprehensive 3D maps. Evaluations on common indoor datasets demonstrate that our approach achieves competitive performance while meeting real-time requirements, operating at 26.3 FPS for the full model and 77.0 FPS for a smaller variant on an NVIDIA Jetson AGX Orin. Additionally, we show qualitative results that highlight the effectiveness and possible use cases in real-world applications. Overall, our method serves as a drop-in replacement for traditional segmentation approaches while enabling flexible natural-language querying and seamless integration into 3D mapping pipelines for mobile robotics.
|
|
13:35-13:40, Paper TuBT12.4 | |
Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction |
|
Ma, Junyi | Beijing Institute of Technology |
Bao, Wentao | Rochester Institute of Technology |
Xu, Jingyi | Shanghai Jiao Tong University |
Sun, Guanzhong | China University of Mining and Technology |
Chen, Xieyuanli | National University of Defense Technology |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: RGB-D Perception, Visual Learning
Abstract: Predicting hand motion is critical for understanding human intentions and bridging the action space between human movements and robot manipulations. Existing hand trajectory prediction (HTP) methods forecast the future hand waypoints in 3D space conditioned on past egocentric observations. However, such models are only designed to accommodate 2D egocentric video inputs. There is a lack of awareness of multimodal environmental information from both 2D and 3D observations, hindering the further improvement of 3D HTP performance. In addition, these models overlook the synergy between hand movements and headset camera egomotion, either predicting hand trajectories in isolation or encoding egomotion only from past frames. To address these limitations, we propose novel diffusion models (MMTwin) for multimodal 3D hand trajectory prediction. MMTwin is designed to absorb multimodal information as input encompassing 2D RGB images, 3D point clouds, past hand waypoints, and text prompts. Besides, two latent diffusion models, the egomotion diffusion and the HTP diffusion as twins, are integrated into MMTwin to predict camera egomotion and future hand trajectories concurrently. We propose a novel hybrid Mamba-Transformer module as the denoising model of the HTP diffusion to better fuse multimodal features. The experimental results on three publicly available datasets and our self-recorded data demonstrate that our proposed MMTwin predicts more plausible future 3D hand trajectories than the state-of-the-art baselines, and generalizes well to unseen environments. The code and pretrained models will be released at https://github.com/IRMVLab/MMTwin.
|
|
13:45-13:50, Paper TuBT12.6 | |
Plane Detection and Ranking Via Model Information Optimisation |
|
Zhong, Daoxin | University of Oxford |
Li, Jun | Institute for Infocomm Research |
Chuah, Meng Yee (Michael) | Agency for Science, Technology and Research (A*STAR) |
Keywords: RGB-D Perception, Calibration and Identification, Probability and Statistical Methods
Abstract: Plane detection from depth images is a crucial subtask with broad robotic applications, often accomplished by iterative methods such as Random Sample Consensus (RANSAC). While RANSAC is a robust strategy with strong probabilistic guarantees, the ambiguity of its inlier threshold criterion makes it susceptible to false positive plane detections. This issue is particularly prevalent in complex real-world scenes, where the true number of planes is unknown and multiple planes coexist. In this paper, we aim to address this limitation by proposing a generalised framework for plane detection based on model information optimization. Building on previous works, we treat the observed depth readings as discrete random variables, with their probability distributions constrained by the ground truth planes. Various models containing different candidate plane constraints are then generated through repeated random sub-sampling to explain our observations. By incorporating the physics and noise model of the depth sensor, we can calculate the information for each model, and the model with the least information is accepted as the most likely ground truth. This information optimization process serves as an objective mechanism for determining the true number of planes and preventing false positive detections. Additionally, the quality of each detected plane can be ranked by summing the information reduction of inlier points for each plane. We validate these properties through experiments with synthetic data and find that our algorithm estimates plane parameters more accurately compared to the default Open3D RANSAC plane segmentation. Furthermore, we accelerate our algorithm by partitioning the depth map using neural network segmentation, which enhances its ability to generate more realistic plane parameters in real-world data.
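The information-optimization criterion can be sketched as a description-length score under a Gaussian depth-noise model: the candidate plane set (generated by repeated random sub-sampling, as the abstract describes) with the least total information is accepted. The per-plane model cost and sigma below are assumed constants for illustration, not the paper's calibrated values.

```python
import numpy as np

def model_information(points, planes, sigma=0.01, cost_per_plane=100.0):
    """Description-length-style score (in nats, up to constants) for a set of
    candidate planes; the set minimizing this score is accepted.
    points: (N, 3); planes: list of (unit_normal, offset) with n.x + d = 0."""
    if not planes:
        return np.inf
    residuals = np.stack([np.abs(points @ n + d) for n, d in planes])
    best = residuals.min(axis=0)  # each point is explained by its best plane
    nll = 0.5 * (best / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))
    return nll.sum() + cost_per_plane * len(planes)  # data cost + model cost
```

The model cost penalizes every extra plane, which is what suppresses false positives when the true number of planes is unknown.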
|
|
13:55-14:00, Paper TuBT12.8 | |
DynamicPose: Real-Time and Robust 6D Object Pose Tracking for Fast-Moving Cameras and Objects |
|
Liang, Tingbang | Xi'an Jiaotong University |
Zeng, Yixin | Sun Yat-Sen University |
Xie, Jiatong | Sun Yat-Sen University |
Zhou, Boyu | Southern University of Science and Technology |
Keywords: RGB-D Perception, Visual Tracking, Perception for Grasping and Manipulation
Abstract: We present DynamicPose, a retraining-free 6D pose tracking framework that improves tracking robustness in highly dynamic scenarios. Previous work is mainly applicable to static or quasi-static scenes, and its performance significantly deteriorates when both the object and the camera move rapidly. To overcome these challenges, we propose three synergistic components: (1) A visual-inertial odometry compensates for the shift in the Region of Interest (ROI) caused by camera motion; (2) A depth-informed 2D tracker corrects ROI deviations caused by large object translation; (3) A VIO-guided Kalman filter predicts object rotation, generates multiple candidate poses, and then obtains the final pose by hierarchical refinement. The 6D pose tracking results guide subsequent 2D tracking and Kalman filter updates, forming a closed-loop system that ensures accurate pose initialization and precise pose tracking. Simulation and real-world experiments demonstrate the effectiveness of our method, achieving real-time and robust 6D pose tracking for fast-moving cameras and objects.
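The VIO-based ROI compensation can be sketched as re-projecting the object's last known 3D position through the camera pose updated by VIO and shifting the ROI by the resulting pixel displacement. The pose and intrinsics conventions below are assumptions for illustration, not the authors' code.

```python
import numpy as np

def compensate_roi(roi, obj_pos_w, T_wc_prev, T_wc_curr, K):
    """Shift a 2D ROI (u, v, w, h) to account for camera motion reported by
    VIO. T_wc_*: 4x4 world-from-camera poses; K: 3x3 camera intrinsics."""
    def project(T_wc, p_w):
        p_c = np.linalg.inv(T_wc) @ np.append(p_w, 1.0)  # world -> camera
        uv = K @ p_c[:3]
        return uv[:2] / uv[2]
    du, dv = project(T_wc_curr, obj_pos_w) - project(T_wc_prev, obj_pos_w)
    u, v, w, h = roi
    return (u + du, v + dv, w, h)
```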
|
|
TuBT13 |
311C |
Deep Learning for Visual Perception 2 |
Regular Session |
|
13:20-13:25, Paper TuBT13.1 | |
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory |
|
Kou, Wei-Bin | The University of Hong Kong |
Lin, Qingfeng | The University of HongKong |
Tang, Ming | Southern University of Science and Technology |
Lei, Jingreng | The University of Hong Kong |
Wang, Shuai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Ye, Rongguang | Southern University of Science and Technology |
Zhu, Guangxu | Shenzhen Research Institute of Big Data |
Wu, Yik-Chung | The University of Hong Kong |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, Autonomous Agents
Abstract: To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model's fitting capacities, the AD model is prone to under-fitting. To address this issue, we propose to use a pretrained Large Vision Models (LVMs) as backbone coupled with downstream perception head to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to LVMs' powerful fitting capabilities, but also enhance the perception generalization thanks to LVMs' vast and diverse training data. On the other hand, to mitigate vehicles' computational burden of training the perception head while running LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48% and converges faster over 6 times, compared to the existing state-of-the-art approaches.
|
|
13:25-13:30, Paper TuBT13.2 | |
Self-Supervised Diffusion-Based Scene Flow Estimation and Motion Segmentation with 4D Radar |
|
Liu, Yufei | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Wang, Neng | National University of Defense Technology |
Andreev, Stepan | Moscow Institute of Physics and Technology |
Dvorkovich, Alexander | Moscow Institute of Physics and Technology |
Fan, Rui | Tongji University |
Lu, Huimin | National University of Defense Technology |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Scene flow estimation (SFE) and motion segmentation (MOS) using 4D radar are emerging but challenging tasks in robotics and autonomous driving applications. Existing LiDAR- or RGB-D-based point cloud processing methods usually perform poorly on radar data, since radar signals are highly sparse, noisy, and prone to artifacts. Moreover, for radar-based SFE and MOS, the lack of annotated datasets further exacerbates these challenges. To address these issues, we propose a novel self-supervised framework that leverages a denoising diffusion model to effectively handle noisy radar inputs and simultaneously predict point-wise scene flow and motion states. To extract key features, we design a Transformer-based feature encoder tailored to the sparsity of 4D radar data. In addition, we generate self-supervised segmentation signals by exploiting the discrepancy between robust rigid ego-motion estimation and the scene flow predictions, thereby eliminating the need for manual annotation. Experimental evaluations on the View-of-Delft (VoD) dataset and TJ4DRadSet demonstrate that our method achieves state-of-the-art performance for radar-based SFE and MOS. The code and pretrained weights of our method will be released at https://github.com/nubot-nudt/RadarSFEMOS.
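The self-supervised segmentation signal described above can be sketched as thresholding the residual between the predicted scene flow and the flow that rigid ego-motion alone would induce; the threshold value and rigid-motion convention are illustrative assumptions.

```python
import numpy as np

def pseudo_motion_labels(points, pred_flow, T_ego, tau=0.15):
    """Self-supervised moving/static labels for radar points: a point is
    'moving' if its predicted flow deviates from the flow explained by rigid
    ego-motion alone. points: (N, 3); pred_flow: (N, 3); T_ego: 4x4."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    ego_flow = (homog @ T_ego.T)[:, :3] - points  # ego-motion-induced flow
    residual = np.linalg.norm(pred_flow - ego_flow, axis=1)
    return residual > tau  # True = moving, used as a training signal
```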
|
|
13:30-13:35, Paper TuBT13.3 | |
Motion-Feat: Motion Blur-Aware Local Feature Description for Image Matching |
|
Gao, Ye | Beihang University |
Zhang, Dongshuo | Nanyang Technological University |
Yu, Xiaolong | Hangzhou Innovation Institute, Beihang University |
Gao, Qing | Beihang University |
Xu, Zhijun | Hangzhou Hikrobot Co., Ltd |
Lam, Siew Kei | Nanyang Technological University |
Lv, Jinhu | Beihang University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Deep Learning Methods
Abstract: Local feature description is crucial for robotic tasks, yet existing methods struggle with motion blur, a prevalent challenge in high-dynamic and low-light environments. While effective on sharp images, they suffer significant degradation under blur. To address this issue, we propose Motion-Feat, an end-to-end motion blur-aware feature description method. Our approach introduces a Motion Deformable Block (MDB) that adaptively adjusts the receptive field based on pixel-wise motion information at different stages of the network, enhancing multi-scale feature descriptor robustness in blurred conditions. Additionally, we construct synthetic blurred datasets to systematically benchmark feature matching performance across varying blur intensities. Extensive experiments demonstrate that Motion-Feat outperforms state-of-the-art methods on blurred images while maintaining competitive performance on sharp images for relative camera pose estimation and homography estimation tasks. Both code and datasets are available at https://github.com/AndreGao08/Motion-Feat.
|
|
13:35-13:40, Paper TuBT13.4 | |
BoRe-Depth: Self-Supervised Monocular Depth Estimation with Boundary Refinement for Embedded Systems |
|
Liu, Chang | Beijing Institute of Technology |
Li, Juan | Beijing Institute of Technology |
Zhang, Sheng | Beijing Institute of Technology |
Liu, Chang | BIT |
Li, Jie | Beijing Institute of Technology |
Zhang, Xu | Beijing Institute of Technology |
Keywords: Range Sensing, Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: Depth estimation is one of the key technologies for realizing 3D perception in unmanned systems. Monocular depth estimation has been widely researched because of its low-cost advantage, but the existing methods face the challenges of poor depth estimation performance and blurred object boundaries on embedded systems. In this paper, we propose a novel monocular depth estimation model, BoRe-Depth, which contains only 8.7M parameters. It can accurately estimate depth maps on embedded systems and significantly improves boundary quality. Firstly, we design an Enhanced Feature Adaptive Fusion Module (EFAF) which adaptively fuses depth features to enhance boundary detail representation. Secondly, we integrate semantic knowledge into the encoder to improve the object recognition and boundary perception capabilities. Finally, BoRe-Depth is deployed on NVIDIA Jetson Orin, and runs efficiently at 50.7 FPS. We demonstrate that the proposed model significantly outperforms previous lightweight models on multiple challenging datasets, and we provide detailed ablation studies for the proposed methods. The code is available at https://github.com/liangxiansheng093/BoRe-Depth.
|
|
13:40-13:45, Paper TuBT13.5 | |
MambaSFLNet: A Mamba-Based Model for Low-Light Image Enhancement with Spatial and Frequency Features |
|
Liu, Mingyu | Technical University of Munich |
Cui, Yuning | Technical University of Munich |
Strand, Leah | Technical University of Munich |
Zhou, Xingcheng | Technical University of Munich |
Zhang, Jiajie | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Low-light image enhancement (LLIE) aims to enhance the illumination of images that are captured under dark conditions, which is critical for various applications in dim environments, such as robotics and autonomous driving. Existing convolutional neural network (CNN)-based methods usually struggle to capture long-range dependencies, while transformer-based methods, despite their effectiveness, are resource-consuming. Besides, the frequency domain carries important lightness-degradation information. To this end, we propose a Mamba-based framework called MambaSFLNet to effectively address LLIE by integrating spatial and frequency features. Our approach utilizes the Visual State Space Module to establish relationships across different regions of the input image while maintaining low model complexity. Furthermore, the spatial module not only balances illumination distribution but also suppresses noise and artifacts during enhancement. In addition, the frequency module enhances image contrast and sharpness by leveraging frequency-domain information. Extensive experiments on nine widely used benchmarks demonstrate that our approach achieves superior performance and exhibits strong generalization capabilities compared to existing methods.
|
|
13:45-13:50, Paper TuBT13.6 | |
CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception |
|
Wang, Rujia | Texas A&M University |
Gao, Xiangbo | Texas A&M University |
Xiang, Hao | University of California, Los Angeles |
Xu, Runsheng | UCLA |
Tu, Zhengzhong | Texas A&M University |
Keywords: Deep Learning for Visual Perception, Intelligent Transportation Systems, Multi-Robot Systems
Abstract: Multi-agent collaborative perception enhances each agent’s perceptual capabilities by sharing sensing information to cooperatively perform robot perception tasks. This approach has proven effective in addressing challenges such as sensor deficiencies, occlusions, and long-range perception. However, existing representative collaborative perception systems transmit intermediate feature maps, such as bird’s-eye view (BEV) representations, which contain a significant amount of non-critical information, leading to high communication bandwidth requirements. To enhance communication efficiency while preserving perception capability, we introduce CoCMT, an object-query-based collaboration framework that optimizes communication bandwidth by selectively extracting and transmitting essential features. Within CoCMT, we introduce the Efficient Query Transformer (EQFormer) to effectively fuse multi-agent object queries and implement a synergistic deep supervision to enhance the positive reinforcement between stages, leading to improved overall performance. Experiments on OPV2V and V2V4Real datasets show CoCMT outperforms state-of-the-art methods while drastically reducing communication needs. On V2V4Real, our model (Top-50 object queries) requires only 0.416 Mb of bandwidth (83 times less than SOTA methods) while improving AP@70 by 1.1%. This efficiency breakthrough enables practical collaborative perception deployment in bandwidth-constrained environments without sacrificing detection accuracy. The code and models are open-sourced through the following link: https://github.com/taco-group/COCMT.
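The bandwidth saving comes from transmitting a small set of object queries instead of dense BEV feature maps. A minimal sketch of top-k query selection follows; the confidence scoring and the fp16 payload estimate are assumptions, not the paper's exact pipeline.

```python
import torch

def select_queries(queries, scores, k=50):
    """Keep the k most confident object queries for transmission.
    queries: (N, D) query features; scores: (N,) per-query confidence."""
    idx = torch.topk(scores, k=min(k, scores.numel())).indices
    payload = queries[idx]
    payload_bytes = payload.numel() * 2  # rough size assuming fp16 transfer
    return payload, idx, payload_bytes
```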
|
|
13:50-13:55, Paper TuBT13.7 | |
NeuFlow-V2: Push High-Efficiency Optical Flow to the Limit |
|
Zhang, Zhiyong | Northeastern University |
Gupta, Aniket | Northeastern University |
Jiang, Huaizu | Northeastern University |
Singh, Hanumant | Northeatern University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Real-time high-accuracy optical flow estimation is critical for a variety of real-world robotic applications. However, current learning-based methods often struggle to balance accuracy and computational efficiency: methods that achieve high accuracy typically demand substantial processing power, while faster approaches tend to sacrifice precision. These fast approaches specifically falter in their generalization capabilities and do not perform well across diverse real-world scenarios. In this work, we revisit the limitations of the SOTA methods and present NeuFlow-V2, a novel method that offers both high accuracy on real-world datasets and low computational overhead. In particular, we introduce a novel light-weight backbone and a fast refinement module to keep computational demands tractable while delivering accurate optical flow. Experimental results on synthetic and real-world datasets demonstrate that NeuFlow-V2 provides similar accuracy to SOTA methods while achieving 10x-70x speedups. It is capable of running at over 20 FPS on 512x384 resolution images on a Jetson Orin Nano. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow_v2.
|
|
13:55-14:00, Paper TuBT13.8 | |
Non-Overlap-Aware Egocentric Pose Estimation for Collaborative Perception in Connected Autonomy |
|
Huang, Hong | North Carolina State University |
Xu, Dongkuan | North Carolina State University |
Zhang, Hao | University of Massachusetts Amherst |
Gao, Peng | North Carolina State University |
Keywords: Multi-Robot Systems, RGB-D Perception, Deep Learning for Visual Perception
Abstract: Egocentric pose estimation is a fundamental capability for multi-robot collaborative perception in connected autonomy, such as connected autonomous vehicles. During multi-robot operations, a robot needs to know the relative pose between itself and its teammates with respect to its own coordinates. However, different robots usually observe completely different views that contain similar objects, which leads to erroneous pose estimation. In addition, it is unrealistic to allow robots to share their raw observations to detect overlap, due to limited communication bandwidth. In this paper, we introduce a novel method for Non-Overlap-Aware Egocentric Pose Estimation (NOPE), which performs egocentric pose estimation in a multi-robot team while identifying non-overlapping views and satisfying the communication bandwidth constraint. NOPE is built upon a unified hierarchical learning framework that integrates two levels of robot learning: (1) high-level deep graph matching for correspondence identification, which identifies whether two views overlap; (2) low-level position-aware cross-attention graph learning for egocentric pose estimation. To evaluate NOPE, we conduct extensive experiments in both high-fidelity simulation and real-world scenarios. Experimental results have demonstrated that NOPE enables the novel capability for non-overlap-aware egocentric pose estimation and achieves state-of-the-art performance compared with the existing methods.
|
|
TuBT14 |
311D |
Deep Learning Methods 2 |
Regular Session |
|
13:20-13:25, Paper TuBT14.1 | |
FLAME: A Federated Learning Benchmark for Robotic Manipulation |
|
Bou Betran, Santiago | KTH |
Longhini, Alberta | KTH Royal Institute of Technology |
Vasco, Miguel | KTH Royal Institute of Technology |
Zhang, Yuchong | KTH Royal Institute of Technology |
Kragic, Danica | KTH |
Keywords: Big Data in Robotics and Automation, Learning from Demonstration, Deep Learning Methods
Abstract: Recent progress in robotic manipulation has been fueled by large-scale datasets collected across diverse environments. Training robotic manipulation policies on these datasets is traditionally performed in a centralized manner, raising concerns regarding scalability, adaptability, and data privacy. While federated learning enables decentralized, privacy-preserving training, its application to robotic manipulation remains largely unexplored. We introduce FLAME (Federated Learning Across Manipulation Environments), the first benchmark designed for federated learning in robotic manipulation. FLAME consists of: (i) a set of large-scale datasets of over 160,000 expert demonstrations of multiple manipulation tasks, collected across a wide range of simulated environments; (ii) a training and evaluation framework for robotic policy learning in a federated setting. We evaluate standard federated learning algorithms in FLAME, showing their potential for distributed policy learning and highlighting key challenges. Our benchmark establishes a foundation for scalable, adaptive, and privacy-aware robotic learning. The code is publicly available at https://github.com/KTH-RPL/ELSA-Robotics-Challenge
|
|
13:25-13:30, Paper TuBT14.2 | |
Spatial-Temporal Graph Contrastive Learning with Decreasing Masks for Traffic Flow Forecasting |
|
Ren, Bin | Dongguan University of Technology |
Zhang, Yongfa | Dongguan University of Technology |
Wen, Yamin | ShenZhen University |
Luo, Haocheng | Dongguan University of Technology |
Zhang, Hao | Dongguan University of Technology |
He, Chunhong | Dongguan City University |
Keywords: Deep Learning Methods, Intelligent Transportation Systems
Abstract: In recent years, contrastive learning has shown great potential in traffic flow prediction tasks. However, existing contrastive learning methods have difficulty dealing with missing data and noise, and a single contrast strategy struggles to fully capture both local and global correlations. In this paper, a Decreasing Mask Spatio-Temporal Graph Contrastive Learning model (DMSTGCL) is proposed. The model dynamically adjusts the mask ratio through an adaptive mask-reduction technique to effectively handle missing data and noise. Meanwhile, the projection head is further combined with the TripleAttention mechanism in the spatio-temporal contrastive learning process, which overcomes the limitations of a single contrast method and captures the complex relationships in local and global space more effectively. Experiments on three real-world datasets demonstrate that DMSTGCL achieves significantly higher prediction accuracy than existing methods.
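A minimal sketch of a decreasing-mask schedule follows; the linear annealing and the start/end ratios are assumptions for illustration, since the paper's adaptive rule may differ.

```python
import numpy as np

def decreasing_mask_ratio(epoch, num_epochs, r_start=0.5, r_end=0.1):
    """Anneal the masking ratio so early epochs see heavy masking (robustness
    to missing data and noise) and later epochs see lighter masking."""
    t = min(epoch / max(num_epochs - 1, 1), 1.0)
    return r_start + t * (r_end - r_start)

def apply_node_mask(x, ratio, rng):
    """Zero out a random fraction of node feature rows. x: (nodes, feats)."""
    keep = rng.random(x.shape[0]) >= ratio
    return x * keep[:, None]
```

For example, with `rng = np.random.default_rng(0)`, epoch 0 of 100 masks about half the nodes while the final epochs mask only about 10%.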
|
|
13:30-13:35, Paper TuBT14.3 | |
REGRACE: A Robust and Efficient Graph-Based Re-Localization Algorithm Using Consistency Evaluation |
|
Oliveira, Débora | University of Technology Nuremberg |
Knights, Joshua Barton | Queensland University of Technology |
Barbas Laina, Sebastián | TU Munich |
Boche, Simon | Technical University of Munich |
Burgard, Wolfram | University of Technology Nuremberg |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, Recognition
Abstract: Loop closures are essential for correcting odometry drift and creating consistent maps, especially in the context of large-scale navigation. Current methods using dense point clouds for accurate place recognition do not scale well due to computationally expensive scan-to-scan comparisons. Alternative object-centric approaches are more efficient but often struggle with sensitivity to viewpoint variation. In this work, we introduce REGRACE, a novel approach that addresses these challenges of scalability and perspective difference in re-localization by using LiDAR-based submaps. We introduce rotation-invariant features for each labeled object and enhance them with neighborhood context through a graph neural network. To identify potential revisits, we employ a scalable bag-of-words approach, pooling one learned global feature per submap. Additionally, we define a revisit with geometrical consistency cues rather than embedding distance, allowing us to recognize far-away loop closures. Our evaluations demonstrate that REGRACE achieves similar results compared to state-of-the-art place recognition and registration baselines while being twice as fast. Code and models are publicly available.
|
|
13:35-13:40, Paper TuBT14.4 | |
CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation |
|
Yang, Yifan | Fudan University |
Yan, Yuxiang | Fudan University |
Liu, Boda | Fudan University |
Pu, Jian | Fudan University |
Keywords: Deep Learning Methods
Abstract: Point clouds collected from real-world environments are often incomplete due to factors such as limited sensor resolution, single viewpoints, occlusions, and noise. These challenges make point cloud completion essential for various applications. A key difficulty in this task is predicting the overall shape and reconstructing missing regions from highly incomplete point clouds. To address this, we introduce CasPoinTr, a novel point cloud completion framework using cascaded networks and knowledge distillation. CasPoinTr decomposes the completion task into two synergistic stages: Shape Reconstruction, which generates auxiliary information, and Fused Completion, which leverages this information alongside knowledge distillation to generate the final output. Through knowledge distillation, a teacher model trained on denser point clouds transfers incomplete-complete associative knowledge to the student model, enhancing its ability to estimate the overall shape and predict missing regions. Together, the cascaded networks and knowledge distillation enhance the model’s ability to capture global shape context while refining local details, effectively bridging the gap between incomplete inputs and complete targets. Experiments on ShapeNet-55 under different difficulty settings demonstrate that CasPoinTr outperforms existing methods in shape recovery and detail preservation, highlighting the effectiveness of our cascaded structure and distillation strategy.
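The distillation step can be sketched as a feature-matching term added to the completion loss; the Chamfer task loss and MSE feature term below are standard stand-ins under assumed shapes, not necessarily the paper's exact objectives.

```python
import torch
import torch.nn.functional as F

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (B, N, 3), b: (B, M, 3)."""
    d = torch.cdist(a, b)  # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def kd_completion_loss(student_out, target, student_feat, teacher_feat, alpha=0.5):
    """Task loss on the student's completion plus a feature-matching term that
    transfers the teacher's denser-input representation to the student."""
    return chamfer(student_out, target) + alpha * F.mse_loss(
        student_feat, teacher_feat.detach())
```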
|
|
13:40-13:45, Paper TuBT14.5 | |
FoldPath: End-To-End Object-Centric Motion Generation Via Modulated Implicit Paths |
|
Rabino, Paolo | Politecnico Di Torino |
Tiboni, Gabriele | Politecnico Di Torino |
Tommasi, Tatiana | Politecnico Di Torino |
Keywords: Deep Learning Methods, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: Object-Centric Motion Generation (OCMG) is instrumental in advancing automated manufacturing processes, particularly in domains requiring high-precision expert robotic motions, such as spray painting and welding. To realize effective automation, robust algorithms are essential for generating extended, object-aware trajectories across intricate 3D geometries. However, contemporary OCMG techniques are either based on ad-hoc heuristics or employ learning-based pipelines that are still reliant on sensitive post-processing steps to generate executable paths. We introduce FoldPath, a novel, end-to-end, neural field based method for OCMG. Unlike prior deep learning approaches that predict discrete sequences of end-effector waypoints, FoldPath learns the robot motion as a continuous function, thus implicitly encoding smooth output paths. This paradigm shift eliminates the need for brittle post-processing steps that concatenate and order the predicted discrete waypoints. Particularly, our approach demonstrates superior predictive performance compared to recently proposed learning-based methods, and attains generalization capabilities even in real industrial settings, where only a limited number of expert samples are provided. We validate FoldPath through comprehensive experiments in a realistic simulation environment and introduce new, rigorous metrics designed to comprehensively evaluate long-horizon robotic paths, thus advancing the OCMG task towards practical maturity.
|
|
13:45-13:50, Paper TuBT14.6 | |
Contrastive Conditional Adversarial Autoencoder with Class-Specific Forces for Imbalanced Open-Set Fault Detection (I) |
|
Zuo, Zhonglin | Zhejiang University |
Meng, Tao | Zhejiang University |
Li, Zheng | Chongqing University |
Zhang, Hao | The College of Automation, Chongqing University |
Liu, Tong | University of Sheffield |
Chen, Zhansheng | Shanghai Aerospace Space Technology Co |
Pang, Zhibo | ABB Corporate Research Sweden |
Keywords: Failure Detection and Recovery, Deep Learning Methods, Representation Learning
Abstract: In real-world industrial scenarios, fault detection faces the widely recognized challenge of data imbalance, which not only refers to the scarcity of fault data but also includes the imbalance in healthy data. This article is concerned with imbalanced open-set fault detection (IOSFD), a practical yet challenging scenario in industrial applications where multiple healthy operating conditions and multiple fault types are imbalanced. In this article, we propose a new contrastive conditional adversarial autoencoder for IOSFD. It constructs an end-to-end unified model based on multi-class known healthy and faulty data to address the reliance of traditional methods on fault samples, while optimizing with class-specific weighted forces to ensure equal attention to imbalanced known classes. Input and feature reconstruction conditioned on operating modes are utilized to learn a compact decision plane and achieve both unknown fault detection and known data classification. Significantly, we formulate the optimization objective of conditional reconstruction based on contrastive learning and introduce adversarial training to further enhance the model's performance. The effectiveness of the proposed method is validated through real-world pipeline leak detection and Tennessee-Eastman multi-fault detection.
|
|
13:50-13:55, Paper TuBT14.7 | |
Consistent Feature Alignment for Cross-Modal Knowledge Distillation in Monocular 3D Object Detection |
|
Li, Fan | Xi'an Jiaotong University |
Ding, Rui | Xi'an Jiaotong University |
Yang, Meng | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Deep Learning Methods, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Cross-modal knowledge distillation (CMKD) in monocular 3D object detection transfers LiDAR's accurate depth information to compensate for the limitation of camera model. However, current methods directly align the intermediate features of the teacher and student networks, in which the modality gap between LiDAR and camera hinders their effectiveness. To mitigate this issue, we design two modules, namely, Consistent Alignment Module (CAM) and Deformable Adapter Module (DAM), to reduce the modality gap of CMKD. The CAM transforms intermediate features of LiDAR and camera into consistent features through a lightweight Target Head. It is based on the observation that some high-level features such as heatmaps and depths are highly correlated in CMKD, even though a modality gap appears between LiDAR and camera. Therefore, these features can be effectively transferred from teacher to student in CMKD. The DAM introduces a deformable adapter for the intermediate features of the student network to reduce background noise in CMKD. It helps to dynamically align its intermediate features with the teacher network. We then propose a Consistent Feature Alignment network (MonoCFA) for CMKD to boost monocular 3D object detection. Our network integrates the two designed modules at different levels of the teacher and student networks, in order to align the intermediate features of LiDAR and camera more accurately and reliably. Our model can be widely applied to existing monocular 3D object detection models. For validation, we choose the representative MonoDLE, GUPNet, and DID-M3D as base models. Experiments on the KITTI benchmark show that our method significantly outperforms the three base models by 39%, 15.5%, and 15%, respectively, and achieves state-of-the-art performance when compared to other CMKD models.
|
|
13:55-14:00, Paper TuBT14.8 | |
Meta-Learning Based Safety-Critical Control in Multi-Obstacles Environments (I) |
|
Zhang, Yu | Technical University of Munich |
Wen, Long | Technical University of Munich |
Huang, Yuhong | Technische Universität München |
Bing, Zhenshan | Technical University of Munich |
He, Wei | University of Science and Technology Beijing |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Robot Safety, Deep Learning Methods
Abstract: Autonomous robots operating in diverse scenarios are expected to safely and efficiently adapt to new, unknown, and cluttered environments. In this paper, we introduce a real-time goal-seeking and exploration framework incorporating novel meta-signed distance functions (MetaSDFs) and meta-buffer robust control barrier functions (Meta-BRCBFs). To adapt to environmental changes in real time, we employ Bayesian meta-learning to construct MetaSDFs. Deep neural network weights are initially trained offline, followed by efficient online adaptation at the last Bayesian layer, allowing for online updates at linear time complexity. Each MetaSDF is individually trained for its corresponding obstacle class, enhancing online distance estimation accuracy. Subsequently, buffer zones are constructed around the MetaSDFs to establish corresponding Meta-BRCBFs. These Meta-BRCBFs are activated only when the robot enters these zones, substantially reducing the number of CBFs required. Outside these specified buffer zones, the robot remains in goal-seeking mode, focusing on task completion. After entering a buffer zone, it transitions to exploration mode, prioritizing safety and exploring safe pathways, effectively balancing task execution with environmental adaptability. We demonstrate that, under this framework, the system achieves both safety and asymptotic stabilization. Extensive simulations and experiments are conducted to demonstrate our framework's effectiveness in both simulated scenarios and real-world environments. These tests confirm our framework's real-time capabilities and safety assurances in dynamic settings where state-of-the-art methods fail.
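A minimal sketch of the buffer-zone activation idea: a control barrier function built from an estimated signed distance is enforced only once the robot enters the buffer, and the safety correction admits a closed form for a single affine constraint. The function names (`sdf`, `grad_sdf`), the gains, and the single-integrator dynamics are assumptions for illustration, not the paper's Meta-BRCBF formulation.

```python
import numpy as np

def buffered_cbf_control(x, u_nom, sdf, grad_sdf, margin=0.2,
                         buffer_width=1.0, alpha=1.0):
    """Single-constraint CBF filter, active only inside a buffer zone.

    x, u_nom: robot state and nominal control (numpy arrays).
    sdf(x): estimated signed distance to the obstacle (e.g., a MetaSDF).
    grad_sdf(x): its gradient w.r.t. the robot state."""
    h = sdf(x) - margin                 # barrier: h >= 0 means safe
    if h > buffer_width:                # outside buffer: pure goal-seeking
        return u_nom
    g = grad_sdf(x)
    lhs = g @ u_nom + alpha * h         # CBF condition: dh/dt >= -alpha * h
    if lhs >= 0.0:                      # nominal input already safe
        return u_nom
    # Closed-form minimally invasive correction (QP with one constraint)
    return u_nom - (lhs / (g @ g)) * g
```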
|
|
TuBT15 |
206 |
Swarm Robotics 2 |
Regular Session |
|
13:20-13:25, Paper TuBT15.1 | |
Energy Efficient Scheduling for Position Reconfiguration of Swarm Drones (I) |
|
Liu, Han | Sun Yat-Sen University |
Wei, Mingxin | Sun Yat-Sen University |
Zhao, Shuai | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Huang, Kai | Sun Yat-Sen University |
Keywords: Swarm Robotics, Planning, Scheduling and Coordination, Biologically-Inspired Robots
Abstract: Enhancing the energy efficiency of drones, particularly extending their flight lifetime, has emerged as a crucial research area. Position reconfiguration has been explored as a mechanism to achieve this goal for swarm drones. Building on this concept, we investigate how position reconfiguration can be applied within urban wind environments to further extend the lifetime of drone swarms. Despite its potential, efficiently implementing position reconfiguration remains challenging. To address this, we propose an efficient position reconfiguration scheme that reduces the energy consumption imbalance of the swarm and prolongs its lifetime. The scheme includes: (1) a MIP (mixed integer programming)-based optimization method; and (2) an approximation algorithm that runs in pseudo-polynomial time without the need for an optimization solver. The scheme provides a complete position reconfiguration solution that determines (i) the number of position reconfigurations; (ii) when to perform each reconfiguration; and (iii) which drones should change positions. Simulation and experimental results demonstrate the effectiveness of our scheme.
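The paper's MIP formulation and pseudo-polynomial algorithm are not reproduced here; as a toy illustration of the underlying idea, the greedy sketch below periodically reassigns the highest-drain formation slots to the drones with the most residual energy, balancing consumption across the swarm. The names and the balancing rule are illustrative assumptions.

```python
def schedule_reconfigurations(residual, drain, horizon, k):
    """Toy greedy heuristic (not the paper's algorithm).

    residual: battery level per drone (mutated in place).
    drain:    per-step energy drain per formation slot.
    k:        steps between reconfiguration instants.
    Returns the lifetime (steps until the first drone depletes) and the
    sequence of slot assignments."""
    n = len(residual)
    assignment = list(range(n))
    history = []
    for t in range(horizon):
        if t % k == 0:  # reconfiguration instant
            by_energy = sorted(range(n), key=lambda i: -residual[i])
            by_drain = sorted(range(n), key=lambda s: -drain[s])
            assignment = [0] * n
            for drone, slot in zip(by_energy, by_drain):
                assignment[drone] = slot  # most energy -> highest drain
            history.append((t, list(assignment)))
        for i in range(n):
            residual[i] -= drain[assignment[i]]
            if residual[i] <= 0:
                return t + 1, history
    return horizon, history
```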
|
|
13:25-13:30, Paper TuBT15.2 | |
Collaborative Swarm Shape Reconstruction of Tumbling Space Targets Via Decentralized Dynamic Factor Graph Optimization |
|
Asri, El Ghali | York University |
Zhu, ZhengHong (George) | York University |
Keywords: Space Robotics and Automation, Multi-Robot SLAM, Swarm Robotics
Abstract: On-orbit servicing (OOS) has become an essential aspect of modern space missions, encompassing satellite maintenance, orbital assembly, and debris removal. This paper presents a novel decentralized navigation algorithm designed for a swarm of servicing spacecraft to collaboratively reconstruct the shape of unknown tumbling space objects. The proposed method leverages a dynamic factor graph-based Collaborative Simultaneous Localization and Mapping (C-SLAM) framework, integrating observed and identified point cloud features across the swarm. To address the challenges associated with the target's tumbling dynamics, the approach also incorporates a dynamic SLAM formulation, utilizing a noisy parametric model to propagate the dynamic map and construct the dynamic factor graph at the front-end. Kinematic factors are introduced to account for loop closures, enabling the swarm to recognize previously observed features as they rotate with the target, thereby enhancing mapping robustness. Additionally, the target kinematic model parameters themselves, such as its center of mass, linear velocity, and angular velocity, are estimated in real time from the reconstructed maps. The angular velocity, notably, is estimated using a singular value decomposition approach that determines the best-fit rotation between two consecutive sets of mapped points. These estimates are fed back into the kinematic factors and loop closure processes to refine map optimization iteratively. Simulation results demonstrate that incorporating kinematic factors, addressing loop closures, and facilitating inter-robot communication significantly enhance the swarm's ability to track the evolving map without a central leader. This decentralized approach highlights the potential of equipping spacecraft swarms with advanced, robust, and scalable perception capabilities for the collaborative inspection and characterization of unknown tumbling space targets.
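The SVD-based best-fit rotation mentioned above is the classical Kabsch procedure; a minimal sketch, assuming two ordered (3, N) point sets with known correspondences, is:

```python
import numpy as np

def best_fit_rotation(P, Q):
    """Kabsch/SVD estimate of the rotation R minimizing ||R @ P - Q||
    for two (3, N) sets of mapped points at consecutive times. The
    angular velocity then follows from the axis-angle of R over dt."""
    Pc = P - P.mean(axis=1, keepdims=True)   # remove the centroid
    Qc = Q - Q.mean(axis=1, keepdims=True)
    H = Pc @ Qc.T                            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force a proper rotation with det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T
```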
|
|
13:30-13:35, Paper TuBT15.3 | |
Distributed Oscillatory Guidance for Formation Flight of Fixed-Wing Drones |
|
Xu, Yang | University of Granada |
Bautista, Jesús | Universidad De Granada |
Hinojosa, Jose | Universidad De Granada |
Garcia de Marina, Hector | Universidad De Granada |
Keywords: Swarm Robotics, Aerial Systems: Mechanics and Control, Distributed Robot Systems
Abstract: Autonomous formation flight of fixed-wing drones is difficult when coordination requires actuating their speeds, since airspeeds are tightly bounded and aircraft are mostly designed to fly at a nominal airspeed. This paper proposes an algorithm to achieve formation flights of fixed-wing drones without requiring any actuation over their speed. In particular, we guide all the drones to travel over specific paths, e.g., parallel straight lines, and we superpose an oscillatory behavior onto the guiding vector field that drives the drones to the paths. This oscillation enables control over the average velocity along the path, thereby facilitating inter-drone coordination. Each drone adjusts its oscillation amplitude distributively in a closed-loop manner by communicating with neighboring agents over an undirected and connected graph. A novel consensus algorithm is introduced, leveraging a non-negative, asymmetric saturation function. This unconventional saturation is justified because negative amplitudes cannot make drones travel backward or have a negative velocity along the path. Rigorous theoretical analysis of the algorithm is complemented by validation through numerical simulations and a real-world formation flight.
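A minimal sketch of a saturated amplitude-consensus update of the kind described above; the gains, the coordination-error term, and the saturation bound are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sat(a, a_max=2.0):
    """Non-negative, asymmetric saturation: amplitudes are clipped below
    at zero, since a drone cannot travel backward along the path."""
    return np.clip(a, 0.0, a_max)

def amplitude_consensus_step(a, coord_error, neighbors, k=0.5, dt=0.02):
    """One distributed update of the oscillation amplitudes a[i].

    a:           numpy array of current amplitudes.
    coord_error: local path-coordination error per drone.
    neighbors:   neighbors[i] lists drone i's communication neighbors
                 (undirected, connected graph)."""
    a_next = a.copy()
    for i in range(len(a)):
        consensus = sum(sat(a[j]) - sat(a[i]) for j in neighbors[i])
        a_next[i] = sat(a[i] + dt * (consensus + k * coord_error[i]))
    return a_next
```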
|
|
13:35-13:40, Paper TuBT15.4 | |
ImpedanceGPT: VLM-Driven Impedance Control of Swarm of Mini-Drones for Intelligent Navigation in Dynamic Environment |
|
Batool, Faryal | Skoltech |
Yaqoot, Yasheerah | Skolkovo Institute of Science and Technology |
Zafar, Malaika | Skolkovo Institute of Science and Technology |
Khan, Roohan Ahmed | The Skolkovo Institute of Science and Technology |
Khan, Muhammad Haris | Intelligent Space Robotics Laboratory, Skoltech |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Keywords: Swarm Robotics, Semantic Scene Understanding, Motion and Path Planning
Abstract: Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains in ensuring safe and efficient navigation in environments shared by both dynamic alive (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Model (VLM) with a retrieval-augmented generation (RAG) framework to enable real-time reasoning for adaptive navigation of a mini-drone swarm in complex environments. The key innovation of ImpedanceGPT lies in the integration of the VLM-RAG system with impedance control, an active compliance strategy. This system provides drones with an enhanced semantic understanding of their surroundings and allows them to dynamically adjust impedance control parameters in response to obstacle types and environmental conditions. Our approach not only ensures safe and precise navigation but also improves coordination between drones in the swarm. Experimental evaluations demonstrate the effectiveness of the system. The VLM-RAG framework achieved an obstacle detection and retrieval accuracy of 80% under optimal lighting. In static environments, drones navigated around dynamic inanimate obstacles at 1.4 m/s but slowed to 0.7 m/s with an increased safety margin around humans. In dynamic environments, speed adjusted to 1.0 m/s near hard obstacles, while reducing to 0.6 m/s with a larger deflection region to safely avoid moving humans.
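As a hedged sketch of how obstacle-type-dependent impedance parameters might be applied, the snippet below runs one step of a virtual mass-spring-damper with softer stiffness (a larger deflection region) near humans; the parameter values are illustrative, not those selected by the VLM-RAG system.

```python
import numpy as np

def impedance_step(x, v, x_goal, obstacle_type, dt=0.01):
    """One virtual mass-spring-damper update; stiffness and damping are
    loosened near humans and tightened near hard inanimate obstacles,
    mimicking parameter switching by obstacle class."""
    params = {                      # illustrative values, not the paper's
        "human":     dict(m=1.0, d=12.0, k=20.0),
        "inanimate": dict(m=1.0, d=6.0,  k=60.0),
    }
    p = params[obstacle_type]
    f = p["k"] * (x_goal - x) - p["d"] * v   # virtual spring-damper force
    v_next = v + (f / p["m"]) * dt
    return x + v_next * dt, v_next
```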
|
|
13:40-13:45, Paper TuBT15.5 | |
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps |
|
Tan, Yan Rui | National University of Singapore |
Liu, Wenqi | National University of Singapore |
Leong, Wai Lun | National University of Singapore |
Tan, John | National University of Singapore |
Yong, Wen Huei, Wayne | National University of Singapore |
Foong, Shaohui | Singapore University of Technology and Design |
Shi, Fan | National University of Singapore |
Teo, Rodney | NUS |
Keywords: Swarm Robotics, Multi-Robot Systems, Aerial Systems: Perception and Autonomy
Abstract: Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo-3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and their ability to overcome local minima. Finally, we validate GO-Flock in obstacle-filled environments as well as in hardware-in-the-loop experiments, where we successfully flocked a team of nine drones (six physical and three virtual) in a forest environment.
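A minimal sketch of a reactive APF term that also repels from the virtual agents spawned by the Perception Module, assuming hypothetical gains and a standard repulsive potential; GO-Flock's actual APF strategy differs in detail.

```python
import numpy as np

def apf_velocity(p_i, peers, virtual_agents, waypoint,
                 k_goal=1.0, k_rep=1.5, k_coh=0.3, d_safe=1.2):
    """Reactive APF velocity for one drone: waypoint attraction, cohesion
    with flock-mates, and repulsion from both real peers and the virtual
    agents placed on obstacle surfaces."""
    v = k_goal * (waypoint - p_i)                     # goal attraction
    if len(peers) > 0:
        v += k_coh * (np.mean(peers, axis=0) - p_i)   # flock cohesion
    for q in list(peers) + list(virtual_agents):
        diff = p_i - q
        d = np.linalg.norm(diff)
        if 1e-6 < d < d_safe:                         # classic repulsion
            v += k_rep * (1.0 / d - 1.0 / d_safe) * diff / d**3
    return v
```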
|
|
13:45-13:50, Paper TuBT15.6 | |
Asynchronous Harmony-Based Decentralized Auctions Method for Scalable UAV Swarm |
|
Chen, Runfeng | National University of Defense Technology |
Li, Jie | National University of Defense Technology |
Chen, Yiting | National University of Defense Technology |
Huang, Yuchong | National University of Defense Technology |
Xiong, Zehao | National University of Defense Technology |
Keywords: Task Planning, Planning, Scheduling and Coordination, Swarm Robotics
Abstract: Unmanned aerial vehicle (UAV) swarms find extensive applications in diverse fields, including search and rescue, logistics delivery, and environmental surveillance, necessitating meticulous task and temporal scheduling to meet intricate spatio-temporal requirements. A market-based strategy emerges as a suitable option for self-organizing swarm coordination. However, the consensus mechanisms employed by most market-based algorithms require synchronous communication, leading to waiting times. Researchers have turned to asynchronous approaches for enhanced efficiency, yet the communication burden of existing asynchronous methods escalates swiftly as the swarm size grows. Therefore, this paper proposes an Asynchronous Harmony-based Decentralized Auctions (AHDA) method for networked UAV swarms to reduce the communication load and scheduling time required by a market-based approach. First, proximity communication is proposed to reduce the broadcast range and content of UAVs. Second, new conflict resolution protocols are designed to eliminate task conflicts between UAVs faster. Third, propagation rules are designed to limit the scope of task information diffusion. Ultimately, communication load and scheduling time decrease because the method only needs to satisfy the minimum requirement that no task conflicts exist between UAVs, rather than full scheduling consistency across the swarm. Monte Carlo simulations spanning 32 to 128 UAVs demonstrate that, compared with the Asynchronous Consensus-Based Bundle Algorithm (ACBBA), the proposed AHDA achieves reductions of up to 70.16% in transmitted messages, 75.78% in communication traffic, and 63.12% in scheduling time.
|
|
13:50-13:55, Paper TuBT15.7 | |
Integrated Task Assignment and Trajectory Planning for a Massive Number of Agents Based on Bilayer-Coupled Mean Field Games (I) |
|
Niu, Zijia | Beihang University |
Yao, Wang | Beihang University |
Jin, Yuxin | Beihang University |
Huang, Sanjin | Beihang University |
Zhang, Xiao | Beihang University |
Qian, Langyu | Beihang University |
Keywords: Swarm Robotics, Path Planning for Multiple Mobile Robots or Agents, Optimization and Optimal Control
Abstract: To address the integrated task assignment and trajectory planning of a massive number of agents in scenarios with task nodes of different priorities and multiple static obstacles, this paper proposes a general framework based on bilayer-coupled mean field games, which couples the minimum trajectory-planning cost of an agent into the task assignment process to achieve reasonable, globally optimal, and adjustable assignment results. In the proposed framework, a multi-population mean field game is first used to plan the optimal trajectory of an agent between each pair of priority-adjacent task nodes, and the minimum costs are calculated. Then, based on a discrete-time, finite-state-space mean field game, a task assignment model in the discrete task space is constructed; the minimum costs obtained from trajectory planning are coupled into the model as a reference, from which the task assignment strategies are finally obtained. Moreover, we give a specific example of the proposed framework and prove the existence of equilibrium solutions for the two mean field games. The effectiveness of the proposed framework is demonstrated through simulation experiments and analysis of the results.
|
|
13:55-14:00, Paper TuBT15.8 | |
SubCDM: Collective Decision Making with a Swarm Subset |
|
Fuady, Samratul | University of Southampton |
Tarapore, Danesh | University of Southampton |
Soorati, Mohammad D. | University of Southampton |
Keywords: Swarm Robotics, Distributed Robot Systems, Multi-Robot Systems
Abstract: Collective decision-making is a key function of autonomous robot swarms, enabling them to reach a consensus on actions based on environmental features. Existing strategies require the participation of all robots in the decision-making process, which is resource-intensive and prevents the swarm from allocating robots to other tasks. We propose Subset-Based Collective Decision-Making (SubCDM), which enables decisions using only a swarm subset. The construction of the subset is dynamic and decentralized, relying solely on local information. Our method allows the swarm to adaptively determine the size of the subset needed for accurate decision-making, depending on the difficulty of reaching a consensus. Simulation results using one hundred robots show that our approach achieves accuracy comparable to using the entire swarm while reducing the number of robots required, making it a resource-efficient solution for collective decision-making in swarm robotics.
|
|
TuBT16 |
207 |
Human-Robot Interaction 2 |
Regular Session |
|
13:20-13:25, Paper TuBT16.1 | |
Modeling and Evaluating Trust Dynamics in Multi-Human Multi-Robot Task Allocation |
|
Obi, Ike | Purdue University |
Wang, Ruiqi | Purdue University |
Jo, Wonse | University of Michigan |
Min, Byung-Cheol | Purdue University |
Keywords: Acceptability and Trust, Human-Centered Robotics, Safety in HRI
Abstract: Trust is essential in human-robot collaboration, particularly in multi-human, multi-robot (MH-MR) teams, where it plays a crucial role in maintaining team cohesion in complex operational environments. Despite its importance, trust is rarely considered in task allocation and reallocation algorithms for MH-MR collaboration. Prior research in single-human, single-robot interactions has demonstrated that incorporating trust significantly enhances both performance outcomes and user experience. However, limited studies have investigated the role of trust in MH-MR task allocation. In this paper, we introduce the Expectation Confirmation Trust (ECT) Model, a novel approach for modeling trust dynamics in MH-MR teams. We evaluate the ECT model alongside five existing trust models and a no-trust baseline to examine the impact of trust on task allocation outcomes across different team configurations (2H-2R, 5H-5R, and 10H-10R). Our results show that the ECT model improves task success rate, reduces mean completion time, and lowers task error rates. These findings highlight the complexities of trust-based task allocation in MH-MR teams. We discuss the implications of integrating trust into task allocation algorithms and propose directions for future research on adaptive trust mechanisms to balance efficiency and performance in dynamic, multi-agent environments.
|
|
13:25-13:30, Paper TuBT16.2 | |
Frozen Triumph: Lessons from GARMI's Bimanual Trophy Handover at the Kandahar Ski World Cup – Shaping Current Research Directions |
|
Troebinger, Mario | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Sadeghian, Hamid | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Service Robotics, Calibration and Identification, Acceptability and Trust
Abstract: This paper presents GARMI's successful outdoor demonstration during the Kandahar Ski World Cup, where it performed trophy handovers in sub-zero temperatures. The event highlighted challenges in deploying robots in extreme conditions, including fluctuating temperatures and uneven terrain. GARMI successfully completed the trophy handover during the live event, which was streamed to 60 million viewers. This experience raised two key research questions: the feasibility of high-precision robotics in harsh weather and strategies to compensate for environmental effects. To address them, we extended our previous framework to estimate the mass of the lifted trophy in real time, incorporating IMU data and conducting experiments under varying temperatures and orientations. Experimental results showed that even slight variations in the robot's base orientation had a significant impact on the accuracy of mass estimation. For instance, a 5° tilt in the robot's base orientation resulted in a more than 100% increase in mass estimation error. Online mass estimation, performed using a quasi-static model, demonstrated improved accuracy when incorporating IMU-based corrections for base orientation. Additionally, temperature variations were found to affect robot control performance, with tracking errors increasing outside the manufacturer's recommended temperature range. These findings highlight the need for real-time correction and compensation of base orientation and temperature effects in the robot dynamics, ensuring safe human-robot interaction.
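A hedged sketch of a quasi-static mass estimate with an IMU-based base-orientation correction. The sign conventions and inputs are assumptions for illustration (gravity direction pointing downward, the measured force being the payload's gravity wrench in the base frame), not the paper's exact model.

```python
import numpy as np

G = 9.81  # m/s^2

def estimate_mass(f_ext_base, g_dir_imu):
    """Quasi-static payload mass from the external end-effector force.

    f_ext_base: 3D external force in the base frame (assumed to be the
                payload's weight, pointing along gravity).
    g_dir_imu:  unit gravity direction in the base frame, from the IMU.
    The naive estimate assumes the base z-axis is exactly anti-parallel
    to gravity, so any base tilt biases it, consistent with the tilt
    sensitivity reported above."""
    naive = -f_ext_base[2] / G
    corrected = float(np.dot(f_ext_base, g_dir_imu)) / G
    return naive, corrected
```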
|
|
13:30-13:35, Paper TuBT16.3 | |
J-ORA: A Multimodal Framework and Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception |
|
Atuhurra, Jesse | Nara Institute of Science and Technology |
Kamigaito, Hidetaka | Nara Institute of Science and Technology |
Watanabe, Taro | Nara Institute of Science and Technology |
Yoshino, Koichiro | Institute of Physical and Chemical Research (RIKEN) |
Keywords: Data Sets for Robotic Vision, Social HRI, Recognition
Abstract: We introduce J-ORA, a novel multimodal dataset that bridges a gap in robot perception by providing detailed object attribute annotations within Japanese human-robot dialogue scenarios. J-ORA is designed to support three critical perception tasks, namely object identification, reference resolution, and next-action prediction, by leveraging a comprehensive template of attributes (e.g., category, color, shape, size, material, and spatial relations). Extensive evaluations with both proprietary and open-source Vision Language Models (VLMs) reveal that incorporating detailed object attributes substantially improves multimodal perception performance compared to settings without such attributes. Despite this improvement, a gap remains between proprietary and open-source VLMs. In addition, our analysis of object affordances demonstrates varying abilities in understanding object functionality and contextual relationships across different VLMs. These findings underscore the importance of rich, context-sensitive attribute annotations in advancing robot perception in dynamic environments.
|
|
13:35-13:40, Paper TuBT16.4 | |
OpenRoboCare: A Multi-Modal Multi-Task Expert Demonstration Dataset for Robot Caregiving |
|
Liang, Xiaoyu | Cornell University |
Liu, Ziang | Cornell University |
Lin, Kelvin | National University of Singapore |
Gu, Edward | MIT |
Ye, Ruolin | Cornell University |
Nguyen, Tam | University of Massachusetts Lowell |
Hsu, Cynthia | Columbia University |
Yang, Xiaoman | Cornell University |
Wu, Zhanxin | Cornell University |
Cheung, Christy | Cornell University |
Soh, Harold | National University of Singapore |
Dimitropoulou, Katherine | Columbia University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Datasets for Human Motion, Multi-Modal Perception for HRI, Force and Tactile Sensing
Abstract: We present OpenRoboCare, a multi-modal dataset for robot-assisted caregiving, capturing expert occupational therapist demonstrations of Activities of Daily Living (ADLs). Caregiving tasks involve complex physical human-robot interactions, requiring precise perception under occlusions, safe physical contact, and long-horizon planning. While recent advances in robot learning from demonstrations have shown promise, there is a lack of a large-scale, diverse, and expert-driven dataset that captures real-world caregiving routines. To address this gap, we collected data from 21 occupational therapists performing 15 ADL tasks on two manikins. The dataset spans five modalities (RGBD video, pose tracking, eye-gaze tracking, task and action annotations, and tactile sensing), providing rich multi-modal insights into caregiver movement, attention, force application, and task execution strategies. We further analyze expert caregiving principles and strategies, offering insights to improve robot efficiency and task feasibility. Additionally, our evaluations demonstrate that OpenRoboCare presents challenges for state-of-the-art robot perception and human activity recognition methods, both critical for developing safe and adaptive assistive robots, highlighting the value of our contribution.
|
|
13:40-13:45, Paper TuBT16.5 | |
Adaptive Model Prediction Control Framework with Game Theory for Brain-Controlled Air-Ground Collaborative Unmanned Systems |
|
Shi, Haonan | Beijing Institute of Technology |
Bi, Luzheng | Beijing Institute of Technology |
Yang, Zhenge | Beijing Institute of Technology |
Ge, Haorui | Beijing Institute of Technology |
Fei, Weijie | Beijing Institute of Technology |
Wang, Ling | Tsinghua University |
Keywords: Brain-Machine Interfaces, Human Factors and Human-in-the-Loop, Human-Robot Collaboration
Abstract: Brain-machine interfaces (BMIs) can enable humans to bypass the peripheral nervous system and directly control devices through the central nervous system. In this way, operators' hands are freed up, allowing them to interact with other devices and thus enabling multitasking operations. In this paper, to improve the performance of air-ground collaborative systems, we propose an adaptive model predictive control framework for brain-controlled air-ground collaborative systems, which consists of a BMI with a probabilistic output model, an interface model based on fuzzy logic, and an adaptive model-predictive-control shared controller based on game theory. We establish a human-in-the-loop experimental platform to validate the proposed method in trajectory tracking and obstacle avoidance scenarios. The experimental results show the effectiveness of the proposed method in improving performance and decreasing operators' workload. This work can contribute to the research and development of air-ground collaboration and provide new insights into the study of human-machine integration.
|
|
TuBT17 |
210A |
Autonomous Vehicles 1 |
Regular Session |
Chair: Zhang, Ya | Southeast University |
|
13:20-13:25, Paper TuBT17.1 | |
ComDrive: Comfort-Oriented End-To-End Autonomous Driving |
|
Wang, Junming | The University of Hong Kong |
Zhang, Xingyu | Horizon Robotics |
Xing, Zebin | UCAS |
Gu, Songen | Fudan University |
Guo, Xiaoyang | Horizon Robotics |
Hu, Yang | Horizon Robotics |
Song, Ziying | Beijing Jiaotong University |
Zhang, Qian | Horizon Robotics |
Long, Xiaoxiao | The University of Hong Kong |
Yin, Wei | University of Adelaide |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, Deep Learning Methods
Abstract: We propose ComDrive: the first comfort-oriented end-to-end autonomous driving system to generate temporally consistent and comfortable trajectories. Recent studies have demonstrated that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select safe trajectories that closely mimic expert demonstrations. However, such planners and scorers often produce temporally inconsistent and uncomfortable trajectories. To address these issues, ComDrive first extracts 3D spatial representations through sparse perception, which then serve as conditional inputs. These inputs are used by a Conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner to generate temporally consistent multi-modal trajectories. A dual-stream adaptive trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle. Experiments demonstrate that ComDrive achieves state-of-the-art performance in both comfort and safety, outperforming UniAD by 17% in driving comfort and reducing collision rates by 25% compared to SparseDrive. More results are available on our project page: https://jmwang0117.github.io/ComDrive/.
|
|
13:25-13:30, Paper TuBT17.2 | |
CoPAD : Multi-Source Trajectory Fusion and Cooperative Trajectory Prediction with Anchor-Oriented Decoder in V2X Scenarios |
|
Wu, Kangyu | Southeast University |
Qiao, Jiaqi | Southeast University |
Zhang, Ya | Southeast University |
Keywords: Autonomous Vehicle Navigation, Sensor Fusion, Intelligent Transportation Systems
Abstract: Recently, data-driven trajectory prediction methods have achieved remarkable results, significantly advancing the development of autonomous driving. However, the instability of single-vehicle perception imposes certain limitations on trajectory prediction. In this paper, a novel lightweight framework for cooperative trajectory prediction, CoPAD, is proposed. This framework incorporates a fusion module based on the Hungarian algorithm and Kalman filtering, along with a Past Time Attention (PTA) module, a mode attention module, and an anchor-oriented decoder (AoD). It effectively performs early fusion on multi-source trajectory data from vehicles and road infrastructure, yielding trajectories with high completeness and accuracy. The PTA module efficiently captures potential interaction information among historical trajectories, and the mode attention module is proposed to enrich the diversity of predictions. Additionally, the decoder based on sparse anchors is designed to generate the final complete trajectories. Extensive experiments show that CoPAD achieves state-of-the-art performance on the DAIR-V2X-Seq dataset, validating the effectiveness of the model for cooperative trajectory prediction in V2X scenarios.
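As a rough sketch of the fusion module's two ingredients, the snippet below associates ego and roadside tracks with the Hungarian algorithm (scipy's `linear_sum_assignment`) and fuses matched Gaussian state estimates with a standard Kalman-style rule; the gating threshold and state layout are illustrative assumptions, not CoPAD's exact design.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_tracks(ego_tracks, infra_tracks, gate=4.0):
    """Match ego-vehicle and roadside trajectories via the Hungarian
    algorithm on pairwise position distance (tracks: (N, >=2) arrays)."""
    cost = np.linalg.norm(
        ego_tracks[:, None, :2] - infra_tracks[None, :, :2], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]

def kalman_fuse(x_ego, P_ego, x_infra, P_infra):
    """Static Kalman fusion of two Gaussian estimates of the same state."""
    K = P_ego @ np.linalg.inv(P_ego + P_infra)
    x = x_ego + K @ (x_infra - x_ego)
    P = (np.eye(len(x_ego)) - K) @ P_ego
    return x, P
```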
|
|
13:30-13:35, Paper TuBT17.3 | |
Autonomous Vehicle Controllers from End-To-End Differentiable Simulation |
|
Nachkov, Asen | INSAIT, Sofia University |
Paudel, Danda Pani | INSAIT-Sofia University |
Van Gool, Luc | ETH Zurich |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Simulation and Animation
Abstract: Current methods to learn controllers for autonomous vehicles (AVs) focus on behavioural cloning. Being trained only on exact historic data, the resulting agents often generalize poorly to novel scenarios. Simulators provide the opportunity to go beyond offline datasets, but they are still treated as complicated black boxes, only used to update the global simulation state. As a result, these RL algorithms are slow, sample-inefficient, and prior-agnostic. In this work, we leverage a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers on the large-scale Waymo Open Motion Dataset. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of the environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We combine this setup with a recurrent architecture that can efficiently propagate temporal information across long simulated trajectories. This APG method allows us to learn robust, accurate, and fast policies, while only requiring widely-available expert trajectories, instead of scarce expert actions. We compare to behavioural cloning and find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
|
|
13:35-13:40, Paper TuBT17.4 | |
Efficient Distributional Reinforcement Learning with Monotonic Approximation for Driving Decision-Making |
|
Yin, Jianwen | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Kou, YuRan | Shenzhen Institutes of Advanced Technology Chinese Academy of Sc |
Li, Rixin | Southern University of Science and Technology |
Liu, Jia | ShenZhen Institutes of Advanced Technology, Chinese Academy of S |
Sun, Tianfu | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Autonomous Agents
Abstract: Decision-making in urban autonomous driving scenarios presents significant challenges due to the highly stochastic and interactive nature of traffic participants. While reinforcement learning-based approaches show promise for developing high-level driving policies, they often struggle with low sample efficiency and inadequate generalization. In this paper, we introduce a regularization-based value network to enhance the decision-making capabilities of distributional reinforcement learning, resulting in improved sample efficiency and stability. Specifically, we apply regularization techniques along with random ensemble methods to strengthen the learning process of the value network and address Q-value overestimation. Additionally, we utilize monotonic rational-quadratic splines to learn the quantile functions, naturally resolving the quantile crossing issue. Through extensive experiments, our approach demonstrates superior performance compared to baseline methods on the NoCrash benchmark, Town05, Town06, and our closed-loop OpenDD scenario. The experimental results indicate that the proposed method outperforms the baselines in terms of success rate, safety, and efficiency.
|
|
13:40-13:45, Paper TuBT17.5 | |
Inverse Attention-Weighted Model with Heterogeneous Spatio-Temporal Interaction Graph for Autonomous Navigation Systems |
|
Li, YiLin | National Chung Hsing University |
Tsai, Hsiao-Ping | National Chung Hsing University |
Keywords: Autonomous Vehicle Navigation, Collision Avoidance, Reinforcement Learning
Abstract: This study proposes a Res-MLP-based attention mechanism tailored for dynamic environments with high-density crowds and static obstacles, aiming to significantly improve robotic collision avoidance capabilities and advance autonomous navigation. Traditional navigation approaches often lack predictive foresight, are trained in controlled environments that do not reflect real-world complexities, and face computational constraints in offline training. These limitations make them inadequate for handling diverse obstacle shapes and hinder their applicability in practical scenarios. To address these challenges, we propose a novel model that integrates an inverse attention-weighted module based on Res-MLP to refine both Robot-Human and Robot-Obstacle attention mechanisms to improve robustness. This leads to faster convergence, greater sensitivity to unpredictable hazards, and improved success rates. Our model builds upon HH attn, and incorporates the Gumbel Social Transformer (GST), which utilizes interaction graph modeling to enhance trajectory prediction accuracy. Additionally, we construct a heterogeneous spatio-temporal interaction graph in the training environment and incorporate circular and rectangular static obstacles to create a more realistic and challenging scenario. By employing curriculum learning, which simulates the human learning process, we enhance the model’s efficiency, real-world applicability, and generalization capability. Experimental results demonstrate superior performance in high-density crowd environments and scenarios with more diverse geometric obstacles. By integrating the inverse attention-weighted module and heterogeneous spatio-temporal interaction graphs, our approach enhances feature extraction and ensures robust and safe navigation in real-world applications, effectively mitigating performance degradation. This allows the robot to balance conservatism and assertiveness, resulting in a highly adaptable and reliable navigation system.
|
|
13:45-13:50, Paper TuBT17.6 | |
Hierarchical Decision-Making for Autonomous Navigation: Integrating Deep Reinforcement Learning and Fuzzy Logic in Four-Wheel Independent Steering and Driving Systems |
|
Wang, Yizhi | Central South University |
Xu, Degang | Central South University |
Xie, Yongfang | Central South University |
Tan, Shuzhong | Central South University |
Zhou, Xianhan | Central South University |
Chen, Peng | Central South University |
Keywords: Autonomous Vehicle Navigation, Redundant Robots, Engineering for Robotic Systems
Abstract: This paper presents a hierarchical decision-making framework for autonomous navigation in four-wheel independent steering and driving (4WISD) systems. The proposed approach integrates deep reinforcement learning (DRL) for high-level navigation with fuzzy logic for low-level control to ensure both task performance and physical feasibility. The DRL agent generates global motion commands, while the fuzzy logic controller enforces kinematic constraints to prevent mechanical strain and wheel slippage. Simulation experiments demonstrate that the proposed framework outperforms traditional navigation methods, offering enhanced training efficiency and stability and mitigating erratic behaviors compared to purely DRL-based solutions. Real-world validations further confirm the framework’s ability to navigate safely and effectively in dynamic industrial settings. Overall, this work provides a scalable and reliable solution for deploying 4WISD mobile robots in complex, real-world scenarios.
|
|
13:50-13:55, Paper TuBT17.7 | |
Robust and Real-Time Perception and Planning for UGVs in Complex Outdoor Environments |
|
Huo, Dongjie | Beijing University of Chemical Technology |
Wang, Dengshuo | Beijing University of Chemical Technology |
Zhang, Dong | School of Information Science and Technology, Beijing University |
Zhou, MengChu | New Jersey Institute of Technology |
Cao, Zhengcai | Harbin Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Collision Avoidance
Abstract: Large-scale outdoor navigation is essential for unmanned ground vehicles (UGVs), but despite significant advancements, they still face two key challenges in practical applications. The first one is how to ensure safe navigation in environments with dynamic and low-lying obstacles that LiDAR cannot detect. The second one is how to conduct the adaptive re-planning of target points while some of them are blocked by temporary obstacles. To address these challenges, this work proposes a Dynamic and Low-lying-obstacle Avoidance Navigation (DLAN) system to conduct perception, planning, and point correction for UGVs. To efficiently and accurately detect dynamic obstacles, it designs a lightweight Ensemble3D framework that integrates three fast but low-accuracy detection methods. A multi-criteria waypoint optimizer is used to assist UGVs in path planning. It ensures a balance between obstacle avoidance and path following. To adjust blocked target points through local re-planning, this work designs a checkpoint correction method. Extensive simulations and real-world experiments demonstrate that DLAN enables reliable navigation with high efficiency and robust obstacle avoidance in complex environments. More details can be found on our project homepage and video.
|
|
13:55-14:00, Paper TuBT17.8 | |
Roadside GNSS Aided Multi-Sensor Integrated System for Vehicle Positioning in Urban Areas |
|
Huang, Feng | The Hong Kong Polytechnic University |
Zhong, Yihan | The Hong Kong Polytechnic University |
Chen, Hang | Hong Kong Applied Science and Technology Research Institute |
Su, Dongzhe | Hong Kong Applied Science and Technology Research Institute |
Wu, Jin | University of Science and Technology Beijing |
Wen, Weisong | Hong Kong Polytechnic University |
Hsu, Li-ta | Hong Kong Polytechnic University |
Keywords: Autonomous Vehicle Navigation, Sensor Fusion, Intelligent Transportation Systems
Abstract: Global navigation satellite system (GNSS) positioning can be significantly degraded by multipath and non-line-of-sight (NLOS) signals in urban areas. Cellular vehicle-to-everything (C-V2X) technology provides new opportunities to enhance the GNSS performance of a single intelligent vehicle by leveraging roadside GNSS (RSG) and C-V2X. Inspired by this, we propose an RSG-aided GNSS/LiDAR/IMU (RSG-GLIO) method to achieve reliable odometry and mapping, which leverages the high-quality double-differenced (DD) measurements provided by nearby RSG receivers, effectively mitigating shared errors such as multipath and NLOS. Our RSG-GLIO first estimates the absolute state of the vehicle using onboard sensors. Utilizing this initial positioning estimate, the proposed method introduces a coarse-to-fine selection scheme to identify consistent DD observations from the available RSG measurements. Finally, the consistent roadside DD constraints are jointly optimized in a factor graph optimization. Static and dynamic data collected from multiple RSG receivers deployed in the Hong Kong C-V2X testbed are extensively evaluated to assess the effectiveness of roadside-aided positioning. The results demonstrate a significant 36.6% improvement in absolute positioning accuracy compared to the state-of-the-art GLIO method. Furthermore, we showcase the potential of employing RSG as low-cost base stations in dense urban areas. The data of our work is publicly accessible at https://github.com/DarrenWong/RSG-GLIO.
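For readers unfamiliar with double differencing, a minimal sketch of a DD pseudorange between the vehicle and a roadside receiver follows; the dictionary-style inputs are an assumption for illustration.

```python
def double_difference(rho_vehicle, rho_rsg, i, j):
    """Double-differenced pseudorange between the vehicle and a roadside
    GNSS (RSG) receiver for satellites i and j. Receiver clock biases and
    errors common to both receivers (and, for nearby receivers, shared
    multipath/NLOS effects, as the abstract notes) cancel out.

    rho_vehicle, rho_rsg: pseudoranges keyed by satellite id."""
    sd_i = rho_vehicle[i] - rho_rsg[i]   # single difference, satellite i
    sd_j = rho_vehicle[j] - rho_rsg[j]   # single difference, satellite j
    return sd_i - sd_j                   # double difference
```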
|
|
TuBT18 |
210B |
Multi-Robot Systems 2 |
Regular Session |
Chair: Parasuraman, Ramviyas | University of Georgia |
|
13:20-13:25, Paper TuBT18.1 | |
An Anytime, Scalable and Complete Algorithm for Embedding a Manufacturing Procedure in a Smart Factory |
|
Leet, Christopher | University of Southern California |
Sciortino, Aidan | University of Rochester |
Koenig, Sven | University of Southern California |
Keywords: Multi-Robot Systems, Motion and Path Planning
Abstract: Modern automated factories increasingly run manufacturing procedures using a matrix of programmable machines, such as 3D printers, interconnected by a programmable transport system, such as a fleet of tabletop robots. To embed a manufacturing procedure into a smart factory, an operator must: (a) assign each of its processes to a machine and (b) specify how agents should transport parts between machines. The problem of embedding a manufacturing procedure into a smart factory is termed the Smart Factory Embedding (SFE) problem. State-of-the-art SFE solvers can only scale to factories containing a couple dozen machines. Modern smart factories, however, may contain hundreds of machines. We fill this gap by introducing TS-ACES, the Traffic System based Anytime Cyclic Embedding Solver, the first highly scalable solution to the SFE. We show that TS-ACES is complete and can scale to SFE instances based on real industrial scenarios with more than a hundred machines.
|
|
13:25-13:30, Paper TuBT18.2 | |
Cooperative Bearing-Only Target Pursuit Via Multiagent Reinforcement Learning: Design and Experiment |
|
Li, Jianan | Westlake University |
Wang, Zhikun | Westlake University |
Susheng, Ding | Westlake University |
Guo, Shiliang | Westlake University
Zhao, Shiyu | Westlake University |
Keywords: Multi-Robot Systems
Abstract: This paper addresses the multi-robot pursuit problem for an unknown target, encompassing both target state estimation and pursuit control. First, in state estimation, we focus on using only bearing information, as it is readily available from vision sensors and effective for small, distant targets. Challenges such as instability due to the nonlinearity of bearing measurements and singularities in the two-angle representation are addressed through a proposed uniform bearing-only information filter. This filter integrates multiple 3D bearing measurements, provides a concise formulation, and enhances stability and resilience to target loss caused by limited field of view (FoV). Second, in target pursuit control within complex environments, where challenges such as heterogeneity and limited FoV arise, conventional methods like differential games or Voronoi partitioning often prove inadequate. To address these limitations, we propose a novel multiagent reinforcement learning (MARL) framework, enabling multiple heterogeneous vehicles to search, localize, and follow a target while effectively handling those challenges. Third, to bridge the sim-to-real gap, we propose two key techniques: incorporating adjustable low-level control gains in training to replicate the dynamics of real-world autonomous ground vehicles (AGVs), and proposing spectral-normalized RL algorithms to enhance policy smoothness and robustness. Finally, we demonstrate the successful zero-shot transfer of the MARL controllers to AGVs, validating the effectiveness and practical feasibility of our approach. The accompanying video is available at https://youtu.be/HO7FJyZiJ3E.
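As background for the bearing-only estimation above, the sketch below shows the standard orthogonal-projection least-squares triangulation from multiple 3D unit bearings; the paper's uniform bearing-only information filter is a recursive estimator built on the same geometry, not this batch solver.

```python
import numpy as np

def triangulate_from_bearings(positions, bearings):
    """Least-squares target position from multiple 3D unit bearing vectors.

    Solves sum_i (I - g_i g_i^T)(p_t - p_i) = 0 for the target p_t, where
    p_i are observer positions and g_i the unit bearings toward the target."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, g in zip(positions, bearings):
        Pg = np.eye(3) - np.outer(g, g)   # projects out the bearing direction
        A += Pg
        b += Pg @ p
    return np.linalg.solve(A, b)
```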
|
|
13:30-13:35, Paper TuBT18.3 | |
A Novel Strategy for Connectivity Maintenance and Recovery in Heterogeneous Multi-Robot Systems |
|
Fiasche, Enrico | Université Côte D'Azur |
Malis, Ezio | Inria |
Martinet, Philippe | INRIA |
Keywords: Multi-Robot Systems, Task Planning, Cooperating Robots
Abstract: Ensuring connectivity and coordination in heterogeneous multi-robot systems (MRS) navigating complex environments is a critical challenge, especially when communication constraints and obstacles cause robots to become lost or disconnected. This paper presents a novel approach integrating Model Predictive Control (MPC) with Generalized Connectivity Maintenance (GCM) to enable real-time path adaptation while preserving connectivity. We introduce a decentralized decision-making framework that enables robots to recover lost members dynamically. When reconnection is infeasible, the system adapts the mission to continue while accounting for disconnected robots. Our method is evaluated through extensive simulations, showing its scalability and effectiveness in maintaining connectivity and ensuring mission success. Additionally, we propose a new evaluation metric that comprehensively assesses system performance, considering connectivity, coordination, and mission success in challenging environments.
|
|
13:35-13:40, Paper TuBT18.4 | |
Bio-Inspired Shape Self-Assembly in Large-Scale Swarm Robots under Information Asymmetry |
|
Wang, Jiali | Chongqing University |
Xiao, Dengyu | Chongqing University |
Wang, Gang | Chongqing University |
Wu, Lang | Huazhong University of Science and Technology |
Pu, Huayan | Chongqing University |
Luo, Jun | Chongqing University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: This study investigates the shape self-assembly problem for large-scale robot swarms under conditions of information asymmetry. Existing methods assume complete sharing of global information; however, this assumption has significant limitations in terms of resource consumption and swarm emergence. On the other hand, strategies that rely entirely on local information struggle to achieve the self-assembly of complex shapes, especially disconnected shapes. To address these challenges, this study proposes a novel bio-inspired distributed self-assembly strategy specifically designed for information-asymmetric swarms. The strategy draws inspiration from the task specialization mechanism between scout ants and worker ants in social insects, guiding individuals to efficiently complete shape assembly under information asymmetry through local perception, neighborhood interactions, and dynamic rule adjustments. Experimental results show that the proposed strategy successfully achieves the self-assembly of various shapes, including simple shapes (e.g., circle, rectangle), complex shapes (e.g., human, flower, and the letter "A"), and disconnected shapes (e.g., the letters "IO"). This demonstrates the strategy's adaptability to shape complexity. Furthermore, experiments with varying swarm sizes validate the strategy's robustness and scalability across different scales. During the experiments, we unexpectedly observed emergent behaviors within the swarm, further confirming that the proposed strategy not only significantly enhances task flexibility but also strengthens swarm emergence. These results indicate that the proposed method provides an efficient, scalable, and innovative solution for swarm robot self-assembly under information asymmetry.
|
|
13:40-13:45, Paper TuBT18.5 | |
Multi-UAV Deployment in Obstacle-Cluttered Environments with LOS Connectivity |
|
Chen, Yuda | Peking University |
Wang, Shuaikang | Peking University |
Li, Jie | National University of Defense Technology |
Guo, Meng | Peking University |
Keywords: Multi-Robot Systems, Planning, Scheduling and Coordination, Motion Control
Abstract: A reliable communication network is essential for multiple UAVs operating within obstacle-cluttered environments, where communication is often limited by obstructions. A common solution is to deploy intermediate UAVs that relay information via a multi-hop network, which introduces two challenges: (i) how to design the structure of the multi-hop network; and (ii) how to maintain connectivity during collaborative motion. To this end, this work first proposes an efficient constrained search method based on the minimum-edge RRT* algorithm to find a spanning-tree topology that requires fewer UAVs for the deployment task. Then, to achieve this deployment, a distributed model predictive control strategy is proposed for online motion coordination. It explicitly incorporates not only the inter-UAV and UAV-obstacle distance constraints, but also the line-of-sight (LOS) connectivity constraint. These constraints are well known to be nonlinear and are often tackled by various approximations. In contrast, this work provides a theoretical guarantee that all agent trajectories are collision-free with team-wise LOS connectivity at all times. Numerous simulations are performed in 3D valley-like environments, while hardware experiments validate the method's dynamic adaptation when the deployment position changes online.
|
|
13:45-13:50, Paper TuBT18.6 | |
DHC-ME: A Decentralized Hybrid Cooperative Approach for Multi-Robot Autonomous Exploration |
|
Jia, Wenhao | Zhejiang University of Technology |
Xu, Yang | Zhejiang University |
Qian, Chenglong | Zhejiang University of Technology |
Shi, Xiufang | Zhejiang University of Technology |
Chen, Jiming | Zhejiang University |
Li, Liang | Zhejiang Univerisity |
Keywords: Multi-Robot Systems, View Planning for SLAM, Task and Motion Planning
Abstract: Multi-robot exploration in unknown environments is a fundamental task for multi-robot systems, which requires coordinating the robots to avoid collisions and conflicts while performing task allocation. Existing exploration strategies improve the efficiency of multi-robot exploration by modeling the multi-robot task allocation problem as a variant of the multiple traveling salesman problem. However, this is computationally intensive and difficult to deploy on physical platforms. Hence, this paper develops a hybrid strategy for range-sensing multi-robot exploration with effective team coordination, enabling a larger team dispersion degree and higher exploration efficiency. In addition, we present a novel multi-robot exploration point detection method suitable for narrow and dynamic environments, effectively reducing exploration failure and incompleteness. Gazebo simulations demonstrate that our exploration framework achieves better exploration efficiency and the lowest time cost compared with state-of-the-art methods, and real-world experiments further validate its effectiveness. The code is released at https://github.com/NeSC-IV/DHC_ME.
|
|
13:50-13:55, Paper TuBT18.7 | |
CLGA: A Collaborative LLM Framework for Dynamic Goal Assignment in Multi-Robot Systems |
|
Yu, Xin | Beihang University |
Li, Haoyuan | Beihang University |
Wang, Yandong | Beihang University |
Li, Simin | Beihang University |
Shi, Rongye | Beihang University |
Ai, Gangzheng | Beihang University |
Pu, Zhiqiang | University of Chinese Academy of Sciences; Institute of Automati |
Wu, Wenjun | Beihang University |
Keywords: Multi-Robot Systems, Human-Robot Collaboration, Task Planning
Abstract: Goal assignment is a critical challenge in multi-robot systems. The emergence of large language models (LLMs) has enabled the use of natural language commands for tackling goal assignment problems. However, applying LLMs directly to these tasks presents two limitations: 1) limited accuracy and 2) excessive decision delays due to their autoregressive nature, hindering adaptability to unexpected changes. To address these issues, inspired by dual-process theory, we propose a framework called Collaborative LLMs for dynamic Goal Assignment (CLGA). Specifically, we leverage LLMs for pre-planning tasks and invoke an external solver to generate an initial goal assignment solution, ensuring solution accuracy. During execution, small-scale models enable real-time adjustments to respond to dynamic environmental changes. This approach integrates the strengths of slow, precise pre-planning and fast, adaptive online adjustments, allowing agents to efficiently handle real-world challenges. Additionally, we introduce a benchmark dataset for NLP-based goal assignment to advance research in this domain. Simulation and real-world experiments demonstrate that CLGA significantly enhances task execution efficiency and flexibility in multi-robot systems.
|
|
13:55-14:00, Paper TuBT18.8 | |
Scalable Multi-Robot Cooperation for Multi-Goal Tasks Using Reinforcement Learning |
|
An, Tianxu | ETH Zurich |
Lee, Joonho | Neuromeka |
Bjelonic, Marko | ETH Zurich |
De Vincenti, Flavio | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Reinforcement Learning, Multi-Robot Systems
Abstract: Coordinated navigation of an arbitrary number of robots to an arbitrary number of goals is a major challenge in robotics, often hindered by the scalability limitations of existing strategies. This paper introduces a decentralized multi-agent control system using neural network policies trained in simulation. By leveraging permutation-invariant neural network architectures and model-free reinforcement learning, our policy enables robots to prioritize varying numbers of collaborating robots and goals in a zero-shot manner, without being biased by ordering or limited by a fixed capacity. We validate the task performance and scalability of our policies through experiments in both simulation and real-world settings. Our approach achieves a 10.3% higher success rate in collaborative navigation tasks compared to a policy without a permutation-invariant encoder. Additionally, it finds near-optimal solutions for multi-robot navigation problems while being two orders of magnitude faster than an optimization-based centralized controller. We deploy our multi-goal navigation policies on two wheeled-legged quadrupedal robots, which successfully complete a series of multi-goal navigation missions.
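A minimal deep-sets style sketch of a permutation-invariant encoder of the kind described above; the layer sizes and mean pooling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PermutationInvariantEncoder(nn.Module):
    """Embed each observed robot/goal with a shared MLP, then mean-pool,
    so the policy input is invariant to the ordering and count of
    collaborators and goals."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())

    def forward(self, entities):
        # entities: (batch, n_entities, in_dim); n_entities may vary
        return self.phi(entities).mean(dim=1)   # pooled: (batch, hidden)
```

Because the pooled embedding has a fixed size regardless of `n_entities`, the downstream policy head needs no changes when the team or goal count grows, which is what enables the zero-shot scaling described above.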
|
|
TuBT19 |
210C |
Grasping 2 |
Regular Session |
|
13:20-13:25, Paper TuBT19.1 | |
Diffusion Suction Grasping with Large-Scale Parcel Dataset |
|
Huang, Dingtao | Tsinghua University |
Hua, Debei | Tsinghua University |
Yu, Dongfang | Tsinghua University |
He, Xinyi | Tsinghua University |
Lin, Ente | Tsinghua University |
Wang, Lianghong | FuWei |
Hou, Jinliang | Guangzhou Fuwei Intelligent Technology Co., Ltd |
Zeng, Long | Tsinghua University |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Factory Automation
Abstract: While recent advances in suction grasping have shown remarkable progress, significant challenges persist, particularly in cluttered and complex parcel handling scenarios. Current approaches are limited by (1) the lack of comprehensive parcel-specific suction grasp datasets and (2) poor adaptability to diverse object properties, including size, geometry, and texture. We address these challenges through two main contributions. First, we introduce the Parcel-Suction-Dataset, a large-scale synthetic dataset containing 25 thousand cluttered scenes with 410 million precision-annotated suction grasp poses, generated via our novel geometric sampling algorithm. Second, we propose Diffusion-Suction, a framework that innovatively reformulates suction grasp prediction as a conditional generation task using denoising diffusion probabilistic models. Our method iteratively refines random noise into suction grasping scores through visually-conditioned guidance from point cloud observations, effectively learning spatial point-wise affordances from our synthetic dataset. Extensive experiments demonstrate that the simple yet efficient Diffusion-Suction achieves new state-of-the-art performance compared to previous models on both the Parcel-Suction-Dataset and the public SuctionNet-1Billion benchmark. This work provides a robust foundation for advancing automated parcel handling systems in real-world applications.
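A hedged sketch of a reverse-diffusion sampling loop that refines per-point noise into suction scores, conditioned on the point cloud; the noise schedule, the `model(x, points, t)` signature, and the final sigmoid are illustrative assumptions, not the paper's exact parameterization.

```python
import torch

@torch.no_grad()
def sample_suction_scores(model, points, T=50):
    """DDPM-style reverse process: start from Gaussian noise over per-point
    scores and iteratively denoise, conditioned on the point cloud.

    model(x, points, t): hypothetical noise-prediction network.
    points: (N, 3) point cloud observation."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(points.shape[0], 1)          # noisy per-point scores
    for t in reversed(range(T)):
        eps = model(x, points, t)                # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])   # posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.sigmoid()                           # map to [0, 1] scores
```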
|
|
13:25-13:30, Paper TuBT19.2 | |
DG16M: A Large-Scale Dataset for Dual-Arm Grasping with Force-Optimized Grasps |
|
Karim, Md Faizal | IIIT Hyderabad |
Hashmi, Mohammed Saad | International Institute of Information Technology Hyderabad |
Bollimuntha, Shreya | International Institute of Information Technology Hyderabad |
Tapeti, Mahesh Reddy | VNR Vignana Jyothi Institute of Engineering &Technology |
Singh, Gaurav | IIIT Hyderabad |
Govindan, Nagamanikandan | IIITDM Kancheepuram |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Dual Arm Manipulation
Abstract: Dual-arm robotic grasping is crucial for handling large objects that require stable and coordinated manipulation. While single-arm grasping has been extensively studied, datasets tailored for dual-arm settings remain scarce. We introduce a large-scale dataset of 16 million dual-arm grasps, evaluated under improved force-closure constraints. Additionally, we develop a benchmark dataset containing 300 objects with approximately 30,000 grasps, evaluated in a physics simulation environment, providing a better grasp quality assessment for dual-arm grasp synthesis methods. Finally, we demonstrate the effectiveness of our dataset by training a Dual-Arm Grasp Classifier network that outperforms state-of-the-art methods by 15%, achieving higher grasp success rates and improved generalization across objects.
|
|
13:30-13:35, Paper TuBT19.3 | |
KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping |
|
Boyalakuntla, Kowndinya | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: Grasping, Physical Human-Robot Interaction, Visual Servoing
Abstract: We present Kalman-filter Assisted Reinforcement Learner (KARL) for dynamic object tracking and grasping over eye-on-hand (EoH) systems, significantly expanding such systems' capabilities in challenging, realistic environments. In comparison to the previous state-of-the-art, KARL (1) incorporates a novel six-stage RL curriculum that doubles the system’s motion range, thereby greatly enhancing the system's grasping performance, (2) integrates a robust Kalman filter layer between the perception and reinforcement learning (RL) control modules, enabling the system to maintain an uncertain but continuous 6D pose estimate even when the target object temporarily exits the camera’s field-of-view or undergoes rapid, unpredictable motion, and (3) introduces mechanisms to allow retries to gracefully recover from unavoidable policy execution failures. Extensive evaluations conducted in both simulation and real-world experiments qualitatively and quantitatively corroborate KARL's advantage over earlier systems, achieving higher grasp success rates, notably faster robot execution speed, and reduced collision incidence. Source code and supplementary materials for KARL will be made available at: https://github.com/arc-l/karl.
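To make the filtering idea concrete, the following is a minimal constant-velocity Kalman filter that simply skips the update step when the target is out of view, so the pose estimate keeps propagating with growing uncertainty. It tracks translation only, and all noise levels are illustrative assumptions, not KARL's filter.

    import numpy as np

    class ConstantVelocityKF:
        def __init__(self, dim=3, dt=0.02, q=1e-3, r=1e-2):
            self.F = np.eye(2 * dim)                  # state = [position; velocity]
            self.F[:dim, dim:] = dt * np.eye(dim)
            self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])
            self.Q, self.R = q * np.eye(2 * dim), r * np.eye(dim)
            self.x, self.P = np.zeros(2 * dim), np.eye(2 * dim)

        def step(self, z=None):
            self.x = self.F @ self.x                  # predict
            self.P = self.F @ self.P @ self.F.T + self.Q
            if z is not None:                         # update only when visible
                S = self.H @ self.P @ self.H.T + self.R
                K = self.P @ self.H.T @ np.linalg.inv(S)
                self.x = self.x + K @ (z - self.H @ self.x)
                self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
            return self.x[:len(self.R)]               # current position estimate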
|
|
13:35-13:40, Paper TuBT19.4 | |
PANDAS: Prediction and Detection of Accurate Slippage |
|
Yan, Teng | Shenzhen University |
Zhou, Xiaohong | Shenzhen Technology University |
Long, Jiamin | Shenzhen Technology University |
Zhang, Yang | Shenzhen Technology University |
Li, Wenxian | Shenzhen Technology University |
Keywords: Grasping, Grippers and Other End-Effectors, Deep Learning Methods
Abstract: High-resolution tactile sensing and advanced computational models have accelerated progress in robotic grasping; however, real-time, stable manipulation of smooth and fragile objects still lags behind. The challenges are twofold: first, the robot must detect incipient slip at sub-millimeter scales in real time; second, the system must issue millisecond-level early warnings before true instability occurs so that the controller has sufficient time to react. To address these challenges, we propose PANDAS (Prediction AND Detection of Accurate Slippage), a framework that integrates a physics-informed, multimodal spatiotemporal network for slip detection with a probabilistic temporal reasoning module for forecasting near-future risk. Experimental results demonstrate that the proposed method achieves a slip sensitivity of 94.6%, a response latency of 28 ms, and an early-warning lead time of 32 ms. Moreover, under 5 dB Gaussian noise, it maintains a high F1-score of 92.3%, validating its robustness, predictive capability, and suitability for edge deployment in dynamic, high-noise environments.
|
|
13:40-13:45, Paper TuBT19.5 | |
Grasping Unknown Objects with Only One Demonstration |
|
Yanghong, Li | University of Science and Technology of China |
He, Haiyang | University of Science and Technology of China |
Chai, Jin | University of Science and Technology of China |
Bai, Guangrui | University of Science and Technology of China |
Dong, Erbao | University of Science and Technology of China |
Keywords: Grasping, Learning from Demonstration, Multifingered Hands
Abstract: The combination of imitation learning and reinforcement learning is expected to solve the challenge of grasping unknown objects with anthropomorphic hand-arm systems. However, this approach requires a large number of perfect demonstrations, and performance on real robots often differs greatly from that in simulation. In this work, we introduce a curriculum learning mechanism and propose a multifinger grasping learning method that requires only one demonstration. First, a human remotely manipulates the robot via a wearable device to perform a successful grasping demonstration. The state of the object and the robot is recorded as the initial reference trajectory for reinforcement learning training. Then, by combining robot proprioception and the point cloud features of the target object, a multimodal deep reinforcement learning agent generates corrective actions for the reference demonstration in the synergy subspace of grasping and trains in simulation environments. Meanwhile, considering the topological and geometric variations of different objects, we establish a learning curriculum for objects to gradually improve the generalization ability of the agent, starting from similar objects and progressing to unknown ones. Finally, only successfully trained models are deployed on real robots. Compared to the baseline method, our method reduces dependence on the grasping dataset while improving learning efficiency, and it achieves a higher success rate for grasping novel objects.
|
|
13:45-13:50, Paper TuBT19.6 | |
Grasp-And-Classify Robotic Sorting with Grasping Rectangle Correction and Weighted Nearest-Neighbor Relation Network |
|
Han, Dongxiao | Shanghai University |
Li, Yuwen | Shanghai University |
Keywords: Grasping, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Robotic sorting in cluttered environments still faces significant challenges, especially with resource-constrained hardware. Traditional detect-and-grasp workflows usually require extensive image collection and annotation for model training, which can become impractical when the categories of the sorted objects frequently change. To overcome this issue, this article proposes a grasp-and-classify robotic sorting method with deep learning-based object grasping and classification algorithms that can be deployed on resource-constrained hardware platforms. To do this, a Grasping Rectangle Correction (GRC) algorithm is incorporated to adjust the grasping poses generated by the Generative Residual Convolutional Neural Network (GR-ConvNetv2). Then, an efficient Weighted Nearest-Neighbor Relation Network (WNNRNet) is developed for few-shot object classification. This model unifies the Deep Nearest Neighbor Neural Network (DN4) and the Relation Network to reduce overfitting through feature sharing, and joint training with a weighted multi-task loss function enhances the generalization capability of few-shot classification, as sketched below. Simulation tests have been carried out to validate the GRC and WNNRNet algorithms on the Cornell, Jacquard, and MiniImageNet datasets. Finally, a robotic sorting system with a UR10 robot and a Kinect camera has been built to perform real-world sorting tests that demonstrate the effectiveness of the proposed method. Benefiting from the efficient correction of the grasping pose with the GRC algorithm and the fact that WNNRNet requires limited samples for training, the proposed method can be deployed on a consumer-level laptop for sorting stacked objects in scenarios with varying categories.
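The joint training mentioned above can be read as a weighted multi-task objective over shared features; the two classification heads and the weights below are illustrative placeholders, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def wnnr_loss(logits_dn4, logits_rel, target, w_dn4=0.6, w_rel=0.4):
        # Weighted sum of a DN4-style nearest-neighbor classification loss
        # and a Relation-Network-style loss computed on shared features.
        return (w_dn4 * F.cross_entropy(logits_dn4, target)
                + w_rel * F.cross_entropy(logits_rel, target))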
|
|
13:50-13:55, Paper TuBT19.7 | |
A Novel Gripper with Semi-Peaucellier Linkage and Idle-Stroke Mechanism for Linear Pinching and Self-Adaptive Grasping |
|
Ding, Haokai | Shenzhen Technology University |
Zhang, Wenzeng | Shenzhen X-Institute |
Keywords: Grasping, Grippers and Other End-Effectors, Mechanism Design
Abstract: This paper introduces a novel robotic gripper, named the SPD gripper. It features a palm and two mechanically identical, symmetrically arranged fingers, which can be driven independently or by a single motor. The fingertips follow a linear motion trajectory, facilitating the grasping of objects of various sizes on a tabletop without the need to adjust the overall height of the gripper. Traditional industrial grippers with parallel gripping capabilities often exhibit an arcuate motion at the fingertips, requiring the entire robotic arm to adjust its height to avoid collisions with the tabletop. The SPD gripper, with its linear parallel gripping mechanism, effectively addresses this issue. Furthermore, the SPD gripper possesses adaptive capabilities, accommodating objects of different shapes and sizes. This paper presents the design philosophy, fundamental composition principles, and optimization analysis theory of the SPD gripper. Based on the design theory, a robotic gripper prototype was developed and tested. The experimental results demonstrate that the gripper successfully achieves linear parallel gripping and exhibits good adaptability. In the context of the ongoing development of embodied intelligence technologies, this gripper can assist various robots in achieving effective grasping, laying a solid foundation for collecting data to enhance deep learning training.
|
|
13:55-14:00, Paper TuBT19.8 | |
Learning Gentle Grasping from Human-Free Force Control Demonstration |
|
Li, Mingxuan | Tsinghua University |
Zhang, Lunwei | Tsinghua University |
Li, Tiemin | Tsinghua University |
Jiang, Yao | Tsinghua University |
Keywords: Grasping, Force and Tactile Sensing, Learning from Demonstration
Abstract: Humans can steadily and gently grasp unfamiliar objects based on tactile perception. Robots still face challenges in achieving similar performance due to the difficulty of learning accurate grasp-force predictions and force control strategies that generalize from limited data. In this article, we propose an approach for learning grasping from ideal force control demonstrations, to approach the performance of human hands with limited data. Our approach utilizes objects with known contact characteristics to automatically generate reference force curves without human demonstrations. In addition, we design a dual convolutional neural network (Dual-CNN) architecture that incorporates a physics-based mechanics module for learning target grasping force predictions from demonstrations. The described method can be effectively applied with vision-based tactile sensors and enables gentle and stable grasping of objects from the ground. The prediction model and grasping strategy were validated in offline evaluations and online experiments, demonstrating their accuracy and generalizability.
|
|
TuBT20 |
210D |
Humanoid Robot Systems 2 |
Regular Session |
|
13:20-13:25, Paper TuBT20.1 | |
IWalker: Imperative Visual Planning for Walking Humanoid Robot |
|
Lin, Xiao | Georgia Institute of Technology |
Huang, Yuhao | University of Wisconsin-Madison |
Fu, Taimeng | University at Buffalo |
Xiong, Xiaobin | University of Wisconsin Madison |
Wang, Chen | University at Buffalo |
Keywords: Humanoid Robot Systems, Whole-Body Motion Planning and Control, AI-Enabled Robotics
Abstract: Humanoid robots, designed to operate in human-centric environments, serve as a fundamental platform for a broad range of tasks. Although humanoid robots have been extensively studied for decades, a majority of existing humanoid robots still heavily rely on complex modular frameworks, leading to potential compounded errors from independent sensing, planning, and acting components. In response, we propose an end-to-end humanoid sense-plan-act walking system that simultaneously enables vision-based obstacle avoidance, footstep planning, and whole-body balancing. To achieve self-supervised learning, we designed two imperative learning (IL)-based bilevel optimizations for model-predictive step planning and whole-body balancing, respectively. This enables the robot to learn from arbitrary unlabeled data, significantly improving its adaptability and generalization capabilities. We refer to our method as iWalker and demonstrate its effectiveness in both simulated and real-world environments, representing a significant advancement toward autonomous humanoid robots.
|
|
13:25-13:30, Paper TuBT20.2 | |
Generalizable Humanoid Manipulation with 3D Diffusion Policies |
|
Ze, Yanjie | Stanford University |
Chen, Zixuan | University of California San Diego |
Wang, Wenhao | University of Pennsylvania |
Chen, Tianyi | University of Pennsylvania |
He, Xialin | UIUC |
Yuan, Ying | Tsinghua University |
Peng, Xue Bin | Simon Fraser University |
Wu, Jiajun | Stanford University |
Keywords: Humanoid Robot Systems, Imitation Learning, Learning from Demonstration
Abstract: Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists. However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills and the high cost of in-the-wild humanoid robot data. In this work, we build a real-world robotic system to address this challenging problem. Our system mainly integrates 1) a whole-upper-body robotic teleoperation system to acquire human-like robot data, 2) a 25-DoF humanoid robot platform with a height-adjustable cart and a 3D LiDAR sensor, and 3) an improved 3D Diffusion Policy learning algorithm for humanoid robots to learn from noisy human data. We run more than 2000 episodes of policy rollouts on the real robot for rigorous policy evaluation. Empowered by this system, we show that using only data collected in one single scene and with only onboard computing, a full-sized humanoid robot can autonomously perform skills in diverse real-world scenarios.
|
|
13:30-13:35, Paper TuBT20.3 | |
Structural Analysis and Design of Humanoid Arms from Human Arm Reachable Workspace |
|
Zhu, Xin | BIT |
Liu, Haozhou | Beijing Institute of Technology |
Li, Qingqing | Beijing Institute of Technology |
Cai, Zhaoyang | Beijing University of Civil Engineering and Architecture |
Yu, Zhangguo | Beijing Institute of Technology |
Chen, Xuechao | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Humanoid Robot Systems, Human and Humanoid Motion Analysis and Synthesis
Abstract: Having a postural-reachable workspace around the upper body that is similar to that of a human arm is essential for a humanoid arm to operate effectively in human-like environments. However, research on this topic is scarce. The purpose of this paper is to explore the optimal structural configuration of a humanoid arm in the workspace around the upper body (WAUB). To achieve this, we first analyze reachability data of the human arm in the WAUB, and then propose a design strategy that integrates these human data into the humanoid arm design. The strategy consists of a structural parameter optimization method based on the postural reachability of task points and a task point identification method in the WAUB. The optimization process utilizes the covariance matrix adaptation evolution strategy (CMA-ES). To ensure realistic arm motion, collision-free control is integrated into each optimization loop. The task point identification involves establishing an expected workspace from human data and selecting task points through analysis of the hard-to-reach areas. We propose various humanoid upper body models, taking into account the anthropomorphic shape of the human upper body and the structural features of the humanoid shoulder. The effectiveness of the optimization is assessed through performance tests in the WAUB. Our results provide practical insights for humanoid arm design.
|
|
13:35-13:40, Paper TuBT20.4 | |
Physics-Informed Neural Networks with Unscented Kalman Filter for Sensorless Joint Torque Estimation in Humanoid Robots |
|
Sorrentino, Ines | Istituto Italiano Di Tecnologia |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Moretti, Lorenzo | Istituto Italiano Di Tecnologia |
Traversaro, Silvio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Calibration and Identification, Humanoid and Bipedal Locomotion
Abstract: This paper presents a novel framework for whole-body torque control of humanoid robots without joint torque sensors, designed for systems with electric motors and high-ratio harmonic drives. The approach integrates Physics-Informed Neural Networks (PINNs) for friction modeling and Unscented Kalman Filtering (UKF) for joint torque estimation, within a real-time torque control architecture. PINNs estimate nonlinear static and dynamic friction from joint and motor velocity readings, capturing effects like motor actuation without joint movement. The UKF utilizes PINN-based friction estimates as direct measurement inputs, improving torque estimation robustness. Experimental validation on the ergoCub humanoid robot demonstrates improved torque tracking accuracy, enhanced energy efficiency, and superior disturbance rejection compared to the state-of-the-art Recursive Newton-Euler Algorithm (RNEA), using a dynamic balancing experiment. The framework’s scalability is shown by consistent performance across robots with similar hardware but different friction characteristics, without re-identification. Furthermore, a comparative analysis with position control highlights the advantages of the proposed torque control approach. The results establish the method as a scalable and practical solution for sensorless torque control in humanoid robots, ensuring torque tracking, adaptability, and stability in dynamic environments.
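A minimal sketch of the friction-modeling idea: a small network fits measured friction torque from joint and motor velocities, while a physics residual keeps it close to a viscous-plus-Coulomb prior. The network shape, prior coefficients, and weighting here are assumptions for illustration, not the ergoCub model.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

    def pinn_friction_loss(q_dot, m_dot, tau_meas, Fv=0.1, Fc=0.5, lam=0.1):
        inp = torch.stack([q_dot, m_dot], dim=-1)            # joint + motor velocity
        tau_hat = net(inp).squeeze(-1)                       # predicted friction torque
        data = ((tau_hat - tau_meas) ** 2).mean()            # fit the measurements
        prior = Fv * q_dot + Fc * torch.tanh(10.0 * q_dot)   # smooth Coulomb + viscous
        physics = ((tau_hat - prior) ** 2).mean()            # physics-informed residual
        return data + lam * physics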
|
|
13:40-13:45, Paper TuBT20.5 | |
Humanoid Robot RHP Friends: Seamless Combination of Autonomous and Teleoperated Tasks in a Nursing Context |
|
Benallegue, Mehdi | AIST Japan |
Lorthioir, Guillaume | AIST |
Dallard, Antonin | LIRMM |
Cisneros Limon, Rafael | National Institute of Advanced Industrial Science and Technology |
Kumagai, Iori | National Inst. of AIST |
Morisawa, Mitsuharu | National Inst. of AIST |
Kaminaga, Hiroshi | National Inst. of AIST |
Murooka, Masaki | AIST |
André, Antoine N. | AIST |
Gergondet, Pierre | CNRS |
Kaneko, Kenji | National Inst. of AIST |
Caron, Guillaume | CNRS |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Yukizaki, Soh | Kawasaki Heavy Industries, Ltd |
Karasuyama, Junichi | Kawasaki Heavy Industries |
Murakami, Junichi | Kawasaki Heavy Industries |
Kamon, Masayuki | Kawasaki Heavy Industries, Ltd |
Keywords: Humanoid Robot Systems, Multi-Contact Whole-Body Motion Planning and Control, Telerobotics and Teleoperation
Abstract: This paper describes RHP Friends, a social humanoid robot developed to enable assistive robotic deployments in human-coexisting environments. As a use-case application, we present its potential use in nursing by extending its capabilities to operate human devices and tools according to the task and by enabling remote assistance operations. To meet a wide variety of tasks and situations in environments designed by and for humans, we developed a system that seamlessly integrates the slim and lightweight robot and several technologies: locomanipulation, multi-contact motion, teleoperation, and object detection and tracking. We demonstrated the system's usage in a nursing application. The robot efficiently performed the daily task of patient transfer and a non-routine task, represented by a request to operate a circuit breaker. This demonstration, held at the 2023 International Robot Exhibition (IREX), was conducted three times a day over three days.
|
|
13:45-13:50, Paper TuBT20.6 | |
HiFAR: Multi-Stage Curriculum Learning for High-Dynamics Humanoid Fall Recovery |
|
Chen, Penghui | Tsinghua University |
Wang, Yushi | Tsinghua University |
Luo, Changsheng | Tsinghua University |
Cai, Wenhan | Booster Robotics |
Zhao, Mingguo | Tsinghua University |
Keywords: Humanoid Robot Systems, Failure Detection and Recovery, Machine Learning for Robot Control
Abstract: Humanoid robots encounter considerable difficulties in autonomously recovering from falls, especially within dynamic and unstructured environments. Conventional control methodologies are often inadequate in addressing the complexities associated with high-dimensional dynamics and the contact-rich nature of fall recovery. Meanwhile, reinforcement learning techniques are hindered by issues related to sparse rewards, intricate collision scenarios, and discrepancies between simulation and real-world applications. In this study, we introduce a multi-stage curriculum learning framework, termed HiFAR. This framework employs a staged learning approach that progressively incorporates increasingly complex and high-dimensional recovery tasks, thereby facilitating the robot's acquisition of efficient and stable fall recovery strategies. Furthermore, it enables the robot to adapt its policy to effectively manage real-world fall incidents. We assess the efficacy of the proposed method using a real humanoid robot, showcasing its capability to autonomously recover from a diverse range of falls with high success rates, rapid recovery times, robustness, and generalization.
|
|
13:50-13:55, Paper TuBT20.7 | |
Distillation-PPO: A Novel Two-Stage Reinforcement Learning Framework for Humanoid Robot Perceptive Locomotion |
|
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Han, Gang | Beijing Innovation Center of Humanoid Robotics |
Zhao, Wen | Beijing Innovation Center of Humanoid Robotics |
Sun, Jingkai | The Hong Kong University of Science and Technology (GZ) |
Cao, Jiahang | The Hong Kong University of Science and Technology (Guangzhou) |
Wang, Jiaxu | Hong Kong University of Science and Technology |
Sun, Chenghao | Beijing Innovation Center of Humanoid Robotics Co. Ltd |
Guo, Yijie | Beijing Innovation Center of Humanoid Robotics |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Humanoid Robot Systems
Abstract: In recent years, humanoid robots have garnered significant attention from both academia and industry due to their high adaptability to environments and human-like characteristics. With the rapid advancement of reinforcement learning, substantial progress has been made in the walking control of humanoid robots. However, existing methods still face challenges when dealing with complex environments and irregular terrains. In the field of perceptive locomotion, existing approaches are generally divided into two-stage methods and end-to-end methods. Two-stage methods first train a teacher policy in a simulated environment and then use distillation techniques, such as DAgger, to transfer the privileged information learned as latent features or actions to the student policy. End-to-end methods, on the other hand, forgo the learning of privileged information and directly learn policies from a partially observable Markov decision process (POMDP) through reinforcement learning. However, due to the lack of supervision from a teacher policy, end-to-end methods often face difficulties in training and exhibit unstable performance in real-world applications. This paper proposes an innovative two-stage perceptive locomotion framework that combines the advantages of teacher policies learned in a fully observable Markov decision process (MDP) to regularize and supervise the student policy. At the same time, it leverages the characteristics of reinforcement learning to ensure that the student policy can continue to learn in a POMDP, thereby raising the policy's performance upper bound. Our experimental results demonstrate that our two-stage training framework achieves higher training efficiency and stability in simulated environments, while also exhibiting better robustness and generalization capabilities in real-world applications.
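A compact way to read the proposed combination is a PPO objective on the student's POMDP rollouts plus a regularizing term toward the frozen teacher; the clipped surrogate below is standard, while the MSE distillation term and its weight beta are our illustrative stand-ins for the paper's supervision.

    import torch

    def distillation_ppo_loss(ratio, adv, a_student, a_teacher,
                              clip=0.2, beta=0.5):
        # Clipped PPO surrogate (to be minimized) ...
        unclipped = ratio * adv
        clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * adv
        ppo = -torch.min(unclipped, clipped).mean()
        # ... plus teacher-action supervision of the student policy.
        distill = ((a_student - a_teacher) ** 2).mean()
        return ppo + beta * distill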
|
|
TuBT21 |
101 |
Optimization and Optimal Control 2 |
Regular Session |
|
13:20-13:25, Paper TuBT21.1 | |
System-Level Efficient Performance of EMLA-Driven Heavy-Duty Manipulators Via Bilevel Optimization Framework with a Leader-Follower Scenario (I) |
|
Bahari, Mohammad | Tampere University |
Paz Anaya, Alvaro | Tampere University |
Shahna, Mehdi Heydari | Tampere University |
Mustalahti, Pauli | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Optimization and Optimal Control, Mobile Manipulation, Robust/Adaptive Control
Abstract: The global push for sustainability and energy efficiency is driving significant advancements across various industries, including the development of electrified solutions for heavy-duty mobile manipulators (HDMMs). Electromechanical linear actuators (EMLAs), powered by permanent magnet synchronous motors, present an all-electric alternative to traditional internal combustion engine (ICE)-powered hydraulic actuators, offering a promising path toward an eco-friendly future for HDMMs. However, the limited operational range of electrified HDMMs, closely tied to battery capacity, highlights the need to fully exploit the potential of the EMLAs that drive the manipulators. This goal is contingent upon a deep understanding of the harmonious interplay between EMLA mechanisms and the dynamic behavior of heavy-duty manipulators. To this end, this paper introduces a bilevel multi-objective optimization framework, conceptualizing the EMLA-actuated manipulator of an electrified HDMM as a leader–follower scenario. At the leader level, the optimization algorithm maximizes EMLA efficiency by considering electrical and mechanical constraints, while the follower level optimizes the manipulator's motion through a trajectory reference generator that adheres to manipulator limits. This approach keeps the actuation system operating within its most efficient region, achieving a total efficiency of 70.3%. Furthermore, to complement this framework and ensure precise tracking of the generated optimal trajectories, a robust decomposed system control (RDSC) strategy is developed with accurate control and exponential stability. The proposed methodologies are validated on a 3-degree-of-freedom (DoF) manipulator, demonstrating significant efficiency improvements while maintaining high-performance operation. Finally, experiments are conducted on an EMLA test bed under predefined optimal trajectories, simulating dynamic load conditions of the manipulator's lift joint and controlled with the developed RDSC. The results validate the effectiveness of the optimization framework and the control strategy.
|
|
13:25-13:30, Paper TuBT21.2 | |
Sensitivity-Aware Model Predictive Control for Robots with Parametric Uncertainty |
|
Belvedere, Tommaso | CNRS |
Cognetti, Marco | LAAS-CNRS and Université De Toulouse |
Oriolo, Giuseppe | Sapienza University of Rome |
Robuffo Giordano, Paolo | Irisa Cnrs Umr6074 |
Keywords: Optimization and Optimal Control, Model Predictive Control, Robust/Adaptive Control of Robotic Systems, Aerial Systems: Mechanics and Control
Abstract: This paper introduces a computationally efficient robust MPC scheme for controlling nonlinear systems affected by parametric uncertainties in their models. The approach leverages the recent notion of closed-loop state sensitivity and the associated ellipsoidal tubes of perturbed trajectories for taking into account online time-varying restrictions on state and input constraints. This makes the MPC controller “aware” of potential additional requirements needed to cope with parametric uncertainty, thus significantly improving the tracking performance and success rates during navigation in constrained environments. An extensive simulation campaign is presented to demonstrate the effectiveness of the proposed approach in handling parametric uncertainties and enhancing task performance, safety, and overall robustness. Furthermore, we also provide an experimental validation that shows the feasibility of the approach in real-world conditions and corroborates the statistical findings of the simulation campaign. The versatility and efficiency of the proposed method make it a valuable tool for real-time control of robots subject to uncertainty in their models.
|
|
13:30-13:35, Paper TuBT21.3 | |
Inducing Desired Equilibria in Constrained Noncooperative Games Via Nudging |
|
Wang, Ao | Tongji University |
Meng, Min | Tongji University |
Li, Xiuxian | Tongji University |
Keywords: Optimization and Optimal Control, Task and Motion Planning
Abstract: This paper explores nudge schemes for a central regulator aimed at incentivizing players in constrained noncooperative games to reach desired equilibria. Unlike traditional intervention mechanisms, where players update their actions by blindly following signals from the regulator, the nudge mechanism comprehensively integrates players' rational judgment by incorporating trust variables into players' models. This implies that players update their actions by evaluating the signals from the regulator against their own expectations of the incentive mechanisms. If the regulator's signals significantly deviate from the players' expectations, players decrease their trust in the regulator and rely more on their own expectations when updating their actions. Conversely, if the signals align closely with their expectations, players tend to increase trust in the regulator and place greater emphasis on the regulator's signals. It should be noted that each player does not have access to other players' actions; each updates its action in a distributed manner by observing only the actions of its neighbors through a directed balanced graph. Furthermore, static and dynamic nudges are designed based on the different information available to the regulator, and are also extended to an online case with a time-varying desired equilibrium. Finally, an application to robot formation control is presented to validate the obtained results.
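A schematic version of the trust dynamics described above (the update rule and gains are illustrative assumptions, not the paper's equations): trust grows when the regulator's signal matches the player's expectation and decays otherwise, and the next action blends the two accordingly.

    import numpy as np

    def trust_step(trust, signal, expectation, eta=0.5, kappa=2.0):
        deviation = np.linalg.norm(signal - expectation)
        # exp(-kappa*d) > 0.5 for small deviations -> trust increases
        trust = np.clip(trust + eta * (np.exp(-kappa * deviation) - 0.5), 0.0, 1.0)
        action = trust * signal + (1.0 - trust) * expectation
        return trust, action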
|
|
13:35-13:40, Paper TuBT21.4 | |
ExAMPC: The Data-Driven Explainable and Approximate NMPC with Physical Insights |
|
Allamaa, Jean Pierre | Siemens Digital Industries Software |
Patrinos, Panagiotis | KU Leuven |
Tong, Son | Siemens Digital Industries Software |
Keywords: Optimization and Optimal Control, Autonomous Vehicle Navigation, Machine Learning for Robot Control
Abstract: Amidst the surge in the use of Artificial Intelligence (AI) for control purposes, classical and model-based control methods maintain their popularity due to their transparency and deterministic nature. However, advanced controllers like Nonlinear Model Predictive Control (NMPC), despite proven capabilities, face adoption challenges due to their computational complexity and unpredictable closed-loop performance in complex validation systems. This paper introduces ExAMPC, a methodology bridging classical control and explainable AI by augmenting the NMPC with data-driven insights to improve trustworthiness and reveal the sensitivities of the optimization solution and closed-loop performance to physical variables and system parameters. By employing a low-order spline embedding, we reduce the open-loop trajectory dimensionality by over 95%, and integrate it with SHAP and Symbolic Regression from eXplainable AI (XAI) for an approximate NMPC, enabling intuitive physical insights into the NMPC's optimization routine. The prediction accuracy of the approximate NMPC is enhanced through physics-inspired continuous-time constraint penalties, reducing the predicted continuous trajectory violations by 93%. ExAMPC also enables accurate forecasting of the NMPC's computational requirements, with explainable insights on worst-case scenarios. Experimental validation on automated valet parking and autonomous racing with lap-time optimization demonstrates the methodology's practical effectiveness for potential real-world applications.
|
|
13:40-13:45, Paper TuBT21.5 | |
Robust and Efficient Embedded Convex Optimization through First-Order Adaptive Caching |
|
Mahajan, Ishaan Abhay | Columbia University |
Plancher, Brian | Barnard College, Columbia University and Dartmouth College |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control
Abstract: Recent advances in Model Predictive Control (MPC), leveraging a combination of first-order methods, such as the Alternating Direction Method of Multipliers (ADMM), with offline precomputation and caching of select operations, have enabled real-time MPC on microcontrollers. Unfortunately, these approaches require the use of fixed hyperparameters, limiting their adaptability and overall performance. In this work, we introduce First-Order Adaptive Caching, which precomputes not only select matrix operations but also their sensitivities to hyperparameter variations, enabling online hyperparameter updates without full recomputation of the cache. We demonstrate the effectiveness of our approach on a number of dynamic quadrotor tasks, achieving up to a 63.4% reduction in ADMM iterations over the use of optimized fixed hyperparameters and approaching 70% of the performance of a full cache recomputation, while reducing the computational cost from O(n^3) to O(n^2) complexity. This performance enables us to perform figure-eight trajectories on a 27 g tiny quadrotor under wind disturbances. We release our implementation open-source for the benefit of the wider robotics community.
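The caching idea can be illustrated on a single ADMM-style factor: cache K(rho) = (A + rho*I)^{-1} together with its exact sensitivity dK/drho = -K K, then move to a nearby rho with an O(n^2) first-order update instead of an O(n^3) re-inversion. This is a minimal sketch of the principle, not the released implementation.

    import numpy as np

    def build_cache(A, rho):
        K = np.linalg.inv(A + rho * np.eye(A.shape[0]))  # offline O(n^3) solve
        return {"K": K, "dK": -K @ K, "rho": rho}        # d/drho (A + rho*I)^-1

    def adapt_cache(cache, new_rho):
        d = new_rho - cache["rho"]
        return cache["K"] + d * cache["dK"]              # online O(n^2) update

    A = np.diag([1.0, 4.0, 9.0])
    cache = build_cache(A, rho=1.0)
    K_new = adapt_cache(cache, new_rho=1.1)              # approx. inverse at rho=1.1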
|
|
13:45-13:50, Paper TuBT21.6 | |
Zero-Sum Differential Game-Based Optimal Fault-Tolerant Control for Modular Robot Manipulators with Actuator Failure (I) |
|
An, Tianjiao | Changchun University of Technology |
Jing, Haoxuan | Changchun University of Technology |
Xu, Zixuan | Changchun University of Technology |
Ji, Zebin | Changchun University of Technology |
Ma, Bing | Changchun University of Technology |
Dong, Bo | Changchun University of Technology |
Keywords: Optimization and Optimal Control
Abstract: This article proposes zero-sum differential game-based optimal fault-tolerant control for modular robot manipulators (MRMs) with actuator failure. First, the joint torque feedback (JTF) technique is applied to establish the dynamic model of the MRM system with actuator failure. By applying game theory-based adaptive dynamic programming (ADP), the controller input and the actuator failure are treated as two completely opposing players in the game. Second, the uncertainty terms within the MRM system model are identified online using a radial basis function neural network, thereby enhancing the accuracy of the system model. A single critic neural network is utilized to approximately solve the Hamilton-Jacobi-Isaacs (HJI) equation. Furthermore, the system's tracking error is proved to be uniformly ultimately bounded using Lyapunov stability theory. Finally, experimental results acquired from the MRM platform demonstrate the effectiveness of the proposed method, significantly enhancing both the fault-tolerant performance and control performance of the system.
|
|
13:50-13:55, Paper TuBT21.7 | |
Data-Driven Modeling and Operation Optimization with Inherent Feature Extraction for Complex Industrial Processes (I) |
|
Li, Sihong | Qingdao University of Science and Technology |
Zheng, Yi | Shanghai Jiao Tong University |
Li, Shaoyuan | Shanghai Jiao Tong University |
Huang, Meng | Hangzhou Dianzi University |
Keywords: Process Control, Optimization and Optimal Control
Abstract: This work proposes a strategy for intelligent modeling and operational optimization of industrial processes to address varying feedstock properties during production. The proposed method demonstrates sustained efficacy in a major petroleum refinery in China.
|
|
TuBT22 |
102A |
Robotics and Automation in Agriculture and Forestry 2 |
Regular Session |
Chair: Cafolla, Daniele | Swansea University |
|
13:20-13:25, Paper TuBT22.1 | |
Development and Characterization of an Adaptive Baromorphic End-Effector for Precision Agricultural Handling |
|
Al Khathib, Nader | Swansea University |
Cafolla, Daniele | Swansea University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: Agricultural harvesting requires careful handling of delicate crops, a challenge often unmet by traditional machinery. To address this need, this paper presents an intelligent baromorphic end-effector that marks a significant innovation in agricultural technology. This novel architecture integrates sensing elements that enable the system to dynamically adjust to different crop shapes and textures, ensuring a gentle touch that minimizes damage. The design and testing of the end-effector, which utilizes flexible materials, an embedded air channel, and an embedded sensor to effectively modulate grip adaptability, are presented. The manufacturing approach focuses on advanced techniques that ensure the end-effector’s durability and scalability. The embedded sensor provides real-time data feedback, allowing continuous adjustments that enhance both safety and efficiency during the harvesting process. Through extensive simulation and experimental testing, the system has demonstrated these capabilities.
|
|
13:25-13:30, Paper TuBT22.2 | |
Efficient Collision Detection for Long and Slender Robotic Links in Euclidean Distance Fields: Application to a Forestry Crane |
|
Ecker, Marc-Philip | TU Wien, Austrian Institute of Technology |
Bischof, Bernhard | Austrian Institute of Technology |
Vu, Minh Nhat | TU Wien, Austria |
Froehlich, Christoph | Austrian Institute of Technology |
Glück, Tobias | AIT Austrian Institute of Technology GmbH |
Kemmetmueller, Wolfgang | TU Wien |
Keywords: Robotics and Automation in Agriculture and Forestry, Collision Avoidance
Abstract: Collision-free motion planning in complex outdoor environments relies heavily on perceiving the surroundings through exteroceptive sensors. A widely used approach represents the environment as a voxelized Euclidean distance field, where robots are typically approximated by spheres. However, for large-scale manipulators such as forestry cranes, which feature long and slender links, this conventional spherical approximation becomes inefficient and inaccurate. This work presents a novel collision detection algorithm specifically designed to exploit the elongated structure of such manipulators, significantly enhancing the computational efficiency of motion planning algorithms. Unlike traditional sphere decomposition methods, our approach not only improves computational efficiency but also naturally eliminates the need to fine-tune the approximation accuracy as an additional parameter. We validate the algorithm’s effectiveness using real-world LiDAR data from a forestry crane application, as well as simulated environment data.
|
|
13:30-13:35, Paper TuBT22.3 | |
3D Plant Root Skeleton Detection and Extraction |
|
Lin, Jiakai | Binghamton University |
Zhang, Jinchang | University of Georgia |
Jin, Ge | Yancheng Institute of Technology |
Song, Wenzhan | University of Georgia |
Liu, Tianming | University of Georgia |
Lu, Guoyu | University of Georgia |
Keywords: Robotics and Automation in Life Sciences, Deep Learning Methods, AI-Based Methods
Abstract: Plant roots typically exhibit a highly complex and dense architecture, incorporating numerous slender lateral roots and branches, which significantly hinders the precise capture and modeling of the entire root system. Additionally, roots often lack sufficient texture and color information, making it difficult to identify and track root traits using visual methods. Previous research on roots has been largely confined to 2D studies; however, exploring the 3D architecture of roots is crucial in botany. Since roots grow in real 3D space, 3D phenotypic information is more critical for studying genetic traits and their impact on root development. We introduce a 3D root skeleton extraction method that efficiently derives the 3D architecture of plant roots from a few images. This method includes the detection and matching of lateral roots, triangulation to extract the skeletal structure of lateral roots, and the integration of lateral and primary roots. We developed a highly complex root dataset and tested our method on it. The extracted 3D root skeletons showed considerable similarity to the ground truth, validating the effectiveness of the model. This method can play a significant role in automated breeding robots. Through precise 3D root structure analysis, breeding robots can better identify plant phenotypic traits, especially root structure and growth patterns, helping practitioners select seeds with superior root systems. This automated approach not only improves breeding efficiency but also reduces manual intervention, making the breeding process more intelligent and efficient, thus advancing modern agriculture.
|
|
13:35-13:40, Paper TuBT22.4 | |
Transformer-Based Spatio-Temporal Association of Apple Fruitlets |
|
Freeman, Harry | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, Agricultural Automation
Abstract: In this paper, we present a transformer-based method to spatio-temporally associate apple fruitlets in stereo images collected on different days and from different camera poses. State-of-the-art association methods in agriculture are dedicated to matching larger crops using either high-resolution point clouds or temporally stable features, both of which are difficult to obtain for smaller fruit in the field. To address these challenges, we propose a transformer-based architecture that encodes the shape and position of each fruitlet, and propagates and refines these features through a series of transformer encoder layers with alternating self- and cross-attention. We demonstrate that our method achieves an F1-score of 92.4% on data collected in a commercial apple orchard and outperforms all baselines and ablations. The code and data can be found at https://kantor-lab.github.io/fruit_associator/
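A minimal sketch of the alternating attention pattern described above, using stock PyTorch attention layers; dimensions, head counts, and the residual wiring are illustrative, not the paper's architecture.

    import torch
    import torch.nn as nn

    class AlternatingAttention(nn.Module):
        def __init__(self, d=64, heads=4):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)

        def forward(self, a, b):                   # fruitlet features, two days
            a = a + self.self_attn(a, a, a)[0]     # self-attention within each day
            b = b + self.self_attn(b, b, b)[0]
            a2 = a + self.cross_attn(a, b, b)[0]   # cross-attention across days
            b2 = b + self.cross_attn(b, a, a)[0]
            return a2, b2

    fa, fb = torch.randn(1, 12, 64), torch.randn(1, 9, 64)
    fa, fb = AlternatingAttention()(fa, fb)
    affinity = fa @ fb.transpose(1, 2)             # (1, 12, 9) matching scores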
|
|
13:40-13:45, Paper TuBT22.5 | |
Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data |
|
Glytsos, Marios | Athena Research Center |
Filntisis, Panagiotis Paraskevas | National Technical University of Athens |
Retsinas, George | National Technical University of Athens |
Maragos, Petros | National Technical University of Athens |
Keywords: Robotics and Automation in Agriculture and Forestry, Perception for Grasping and Manipulation, Agricultural Automation
Abstract: Accurate 6D object pose estimation is essential for robotic grasping and manipulation, particularly in agriculture, where fruits and vegetables exhibit high intra-class variability in shape, size, and texture. The vast majority of existing methods rely on instance-specific CAD models or require depth sensors to resolve geometric ambiguities, making them impractical for real-world agricultural applications. In this work, we introduce PLANTPose, a novel framework for category-level 6D pose estimation that operates purely on RGB input. PLANTPose predicts both the 6D pose and deformation parameters relative to a base mesh, allowing a single category-level CAD model to adapt to unseen instances. This enables accurate pose estimation across varying shapes without relying on instance-specific data. To enhance realism and improve generalization, we also leverage Stable Diffusion to refine synthetic training images with realistic texturing, mimicking variations due to ripeness and environmental factors and bridging the domain gap between synthetic data and the real world. Our evaluations on a challenging benchmark that includes bananas of various shapes, sizes, and ripeness levels demonstrate the effectiveness of our framework in handling large intra-class variations while maintaining accurate 6D pose predictions, significantly outperforming the state-of-the-art RGB-based approach MegaPose.
|
|
13:45-13:50, Paper TuBT22.6 | |
FloPE: Flower Pose Estimation for Precision Pollination |
|
Shrestha, Rashik | West Virginia University |
Rijal, Madhav | West Virginia University |
Smith, Trevor | West Virginia University |
Gu, Yu | West Virginia University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Data Sets for Robotic Vision
Abstract: This study presents Flower Pose Estimation (FloPE), a novel, real-time flower pose estimation framework for computationally constrained robotic pollination systems. Robotic pollination has been proposed to supplement natural pollination to ensure global food security due to the decreased population of natural pollinators. However, flower detection for pollination is challenging due to natural variability, flower clusters, and high accuracy demands due to the flowers' fragility when pollinating. This method leverages 3D Gaussian Splatting to generate photorealistic synthetic datasets with precise pose annotations, enabling effective knowledge distillation from a high-capacity teacher model to a lightweight student model for efficient inference. The approach was evaluated on both single and multi-arm robotic platforms, achieving a mean pose estimation error of 0.6 cm and 19.14 degrees within a very low computational cost. Our experiments validate the effectiveness of FloPE, achieving up to 78.75% pollination success rate and outperforming prior robotic pollination techniques.
|
|
13:50-13:55, Paper TuBT22.7 | |
Online Estimation of Table-Top Grown Strawberry Mass in Field Conditions with Occlusions |
|
Zhen, Jinshan | Beijing Academy of Agriculture and Forestry Sciences |
Ge, Yuanyue | Beijing Academy of Agriculture and Forestry Sciences |
Zhu, Tianxiao | Beijing Academy of Agriculture and Forestry Sciences |
Zhao, Hui | Tianjin University of Technology |
Xiong, Ya | Beijing Academy of Agriculture and Forestry Sciences |
Keywords: Computer Vision for Automation, Agricultural Automation, Robotics and Automation in Agriculture and Forestry
Abstract: Accurate mass estimation of table-top grown strawberries under field conditions remains challenging due to frequent occlusions and pose variations. This study proposes a vision-based pipeline integrating RGB-D sensing and deep learning to enable non-destructive, real-time, online mass estimation. The method employs YOLOv8-Seg for instance segmentation, a cycle-consistent generative adversarial network (CycleGAN) for occluded region completion, and tilt-angle correction to refine frontal projection area calculations. A polynomial regression model then maps the geometric features to mass. Experiments demonstrated mean mass estimation errors of 8.11% for non-occluded strawberries and 10.47% for occluded cases. CycleGAN outperformed the large mask inpainting (LaMa) model in occlusion recovery, achieving superior pixel area ratios (PAR) (mean: 0.978 vs. 1.112) and higher intersection over union (IoU) scores (92.3% vs. 47.7% in the [0.9–1] range). This approach addresses critical limitations of traditional methods, offering a robust solution for automated harvesting and yield monitoring under complex occlusion patterns.
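The final regression stage reduces to fitting a low-order polynomial from corrected projection area to mass. The sketch below uses made-up numbers and a quadratic purely for illustration; the paper's fitted coefficients and degree may differ.

    import numpy as np

    # Illustrative training pairs: tilt-corrected frontal projection area
    # (arbitrary units) vs. ground-truth mass in grams. Not real data.
    areas = np.array([820.0, 1130.0, 1510.0, 1980.0])
    masses = np.array([9.1, 13.8, 19.5, 26.7])
    coeffs = np.polyfit(areas, masses, deg=2)       # quadratic fit

    def estimate_mass(area):
        return float(np.polyval(coeffs, area))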
|
|
TuBT23 |
102B |
Sensor Fusion 2 |
Regular Session |
|
13:20-13:25, Paper TuBT23.1 | |
Offline Motion Tracking of Multi-Link Mechanisms Using Inertial Sensor Fusion and EKF-Preconditioned FGO |
|
Tilahun, Aderajew | University of Virginia |
Cox, Jeronimo | University of Virginia |
Furukawa, Tomonari | University of Virginia |
Dissanayake, Gamini | University of Technology Sydney |
Keywords: Sensor Fusion, Human and Humanoid Motion Analysis and Synthesis, Optimization and Optimal Control
Abstract: This paper presents a novel strategy for offline estimation of the spatial motion of a multi-link mechanism using Inertial Measurement Unit (IMU) sensors. Accelerometers, gyroscopes, and magnetometers are strategically mounted and modeled to maximize measurement accuracy, building on our past work on inertial sensor fusion. The core contribution of this paper is the development of Factor Graph Optimization (FGO) preconditioned by the Extended Kalman Filter (EKF), termed FGOPreEKF in this paper, and its integration with the inertial sensor fusion. Since the online EKF efficiently derives the initial guess using the same motion and sensor models, the FGO estimates the motion of a multi-link mechanism efficiently and accurately. The proposed approach was experimentally validated on a two-link system mounted on a fast-moving linear axis, demonstrating superior accuracy compared to standalone EKF or FGO. These results demonstrate the potential of this approach for estimating multi-link motion in more complex scenarios.
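In outline, the preconditioning amounts to seeding a batch least-squares (factor-graph) solve with the forward EKF trajectory. In the sketch below, ekf_pass and stack_residuals are hypothetical problem-specific callables standing in for the shared motion and sensor models.

    import numpy as np
    from scipy.optimize import least_squares

    def fgo_pre_ekf(measurements, ekf_pass, stack_residuals):
        x0 = ekf_pass(measurements)           # online EKF estimate (flattened
                                              # states) used as the initial guess
        sol = least_squares(stack_residuals,  # stacked motion + sensor residuals
                            x0, args=(measurements,))
        return sol.x                          # refined batch (FGO) estimate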
|
|
13:25-13:30, Paper TuBT23.2 | |
Learning Point Correspondences in Radar 3D Point Clouds for Radar-Inertial Odometry |
|
Michalczyk, Jan | University of Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Steinbrener, Jan | Universität Klagenfurt |
Keywords: Sensor Fusion, Deep Learning Methods, Localization
Abstract: Using 3D point clouds in odometry estimation in robotics often requires finding a set of correspondences between points in subsequent scans. While there are established methods for point clouds of sufficient quality, state-of-the-art still struggles when this quality drops. Thus, this paper presents a novel learning-based framework for predicting robust point correspondences between pairs of noisy, sparse and unstructured 3D point clouds from a light-weight, low-power, inexpensive, consumer-grade System-on-Chip (SoC) Frequency Modulated Continuous Wave (FMCW) radar sensor. Our network is based on the transformer architecture which allows leveraging the attention mechanism to discover pairs of points in consecutive scans with the greatest mutual affinity. The proposed network is trained in a self-supervised way using set-based multi-label classification cross-entropy loss, where the ground-truth set of matches is found using the Linear Sum Assignment (LSA) algorithm, which avoids tedious hand annotation of the training data. Additionally, posing the loss calculation as multi-label classification permits supervising on point correspondences directly instead of odometry error, which is not feasible for sparse and noisy data from the SoC radar we use. We evaluate our method with an open-source state-of-the-art Radar-Inertial Odometry (RIO) framework in real-world Unmanned Aerial Vehicle (UAV) flights and with the widely used public Coloradar dataset. Evaluation shows that the proposed method improves the position estimation accuracy by over 14% and 19% on average, respectively.
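The self-supervised labeling step can be sketched directly with SciPy's LSA solver: after motion-compensating one scan, ground-truth matches are taken as the minimum-cost assignment over pairwise distances (compensate is a hypothetical helper; the real cost may include more cues).

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def make_match_labels(scan_a, scan_b, compensate):
        a = compensate(scan_a)                            # align A to B's frame
        cost = np.linalg.norm(a[:, None, :] - scan_b[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)          # optimal assignment
        return list(zip(rows.tolist(), cols.tolist()))    # (i in A, j in B) pairs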
|
|
13:30-13:35, Paper TuBT23.3 | |
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation |
|
Kim, Kyung-Hwan | Hyundai Motor Group |
Ahn, DongHyun | Hyundai Motor Company |
Lee, Dong-hyun | KIST(Korea Institute of Science and Technology), Seoul, Korea |
Yoon, JuYoung | Hyundai Motor Company |
Hyun, Dong Jin | Hyundai Motor Company |
Keywords: Sensor Fusion, Legged Robots
Abstract: State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
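One common form of innovation-based covariance adaptation, given here as a rough sketch of the kind of online adjustment the abstract describes (the paper's contact-model variant will differ): blend the latest innovation statistics into the measurement-noise estimate.

    import numpy as np

    def adapt_measurement_noise(R, innovation, H, P, alpha=0.05):
        nu = innovation[:, None]                      # latest innovation vector
        R_emp = nu @ nu.T - H @ P @ H.T               # empirical noise estimate
        R_emp = np.maximum(R_emp, 0.0)                # crude floor to keep
                                                      # variances non-negative
        return (1.0 - alpha) * R + alpha * R_emp      # exponential blending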
|
|
13:35-13:40, Paper TuBT23.4 | |
Iterative Camera-LiDAR Extrinsic Optimization Via Surrogate Diffusion |
|
Ou, Ni | Beijing Institute of Technology |
Chen, Zhuo | King's College London |
Zhang, Xinru | Beijing Institute of Technology |
Wang, Junzheng | Beijing Institute of Technology |
Keywords: Sensor Fusion, Intelligent Transportation Systems, Calibration and Identification
Abstract: Cameras and LiDAR are essential sensors for autonomous vehicles. The fusion of camera and LiDAR data addresses the limitations of individual sensors but relies on precise extrinsic calibration. Recently, numerous end-to-end calibration methods have been proposed; however, most predict extrinsic parameters in a single step and lack iterative optimization capabilities. To address the increasing demand for higher accuracy, we propose a versatile iterative framework based on surrogate diffusion. This framework can enhance the performance of any calibration method without requiring architectural modifications. Specifically, the initial extrinsic parameters undergo iterative refinement through a denoising process, in which the original calibration method serves as a surrogate denoiser to estimate the final extrinsics at each step. For comparative analysis, we selected four state-of-the-art calibration methods as surrogate denoisers and compared the results of our diffusion process with those of two other iterative approaches. Extensive experiments demonstrate that when integrated with our diffusion model, all calibration methods achieve higher accuracy, improved robustness, and greater stability compared to other iterative techniques and their single-step counterparts.
|
|
13:40-13:45, Paper TuBT23.5 | |
RoCaRS: Robust Camera-Radar BEV Segmentation for Sensor Failure Scenarios |
|
Park, Byounghun | Hanyang University |
Kim, Jeongtae | Hanyang University |
Cho, Yongjae | Hanyang University |
Hwang, Soonmin | Hanyang University |
Keywords: Sensor Fusion, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: While camera–radar fusion has led to notable progress in autonomous driving, many existing approaches overlook the risk of sensor failures, which can critically compromise system safety. To address this limitation, we propose RoCaRS, a robust camera–radar fusion model designed for bird’s-eye view (BEV) segmentation under sensor failure scenarios. RoCaRS incorporates two key components—Radar-aware Backbone (RB) and Feature Spreading (FS)—to enhance BEV feature representation, along with a Dynamic Input Dropout Strategy (DIDS) and Bidirectional Feature Refinement (BFR) to address missing sensor inputs. Experiments on the nuScenes benchmark show that RoCaRS not only outperforms state-of-the-art fusion models under normal conditions but also maintains high performance under various sensor failure settings. Notably, in the complete absence of camera input, RoCaRS exceeds the baseline by +23.2 mIoU for map and +30.0 IoU for vehicle. Furthermore, it retains 99% of the radar-only model’s performance and achieves 103% of the camera-only model’s performance when either all cameras or all radars are disabled—without any retraining. These results highlight the potential of intermediate fusion to match the robustness of late fusion, while more effectively leveraging complementary modalities.
|
|
13:45-13:50, Paper TuBT23.6 | |
CORENet: Cross-Modal 4D Radar Denoising Network with LiDAR Supervision for Autonomous Driving |
|
Liu, Fuyang | Institute of Computing Technology, Chinese Academy of Sciences |
Mei, Jilin | Institute of Computing Technology, Chinese Academy of Sciences |
Mao, Fangyuan | Institute of Computing Technology, Chinese Academy of Sciences |
Hu, Yu | Institute of Computing Technology Chinese Academy of Sciences |
Min, Chen | Chinese Academy of Sciences |
Xing, Yan | Beijing Institute of Control Engineering |
Keywords: Sensor Fusion, Deep Learning Methods, Recognition
Abstract: 4D radar-based object detection has garnered great attention for its robustness in adverse weather conditions and capacity to deliver rich spatial information across diverse driving scenarios. Nevertheless, the sparse and noisy nature of 4D radar point clouds poses substantial challenges for effective perception. To address the limitation, we present CORENet, a novel cross-modal denoising framework that leverages LiDAR supervision to identify noise patterns and extract discriminative features from raw 4D radar data. Designed as a plug-and-play architecture, our solution enables seamless integration into voxel-based detection frameworks without modifying existing pipelines. Notably, the proposed method only utilizes LiDAR data for cross-modal supervision during training while maintaining full radar-only operation during inference. Extensive evaluation on the challenging Dual-Radar dataset, which is characterized by elevated noise level, demonstrates the effectiveness of our framework in enhancing detection robustness. Comprehensive experiments validate that CORENet achieves superior performance compared to existing mainstream approaches.
|
|
13:50-13:55, Paper TuBT23.7 | |
The Constitutional Filter: Bayesian Estimation of Compliant Agents |
|
Kohaut, Simon | TU Darmstadt |
Divo, Felix | TU Darmstadt |
Flade, Benedict | Honda Research Institute Europe GmbH |
Dhami, Devendra Singh | Eindhoven University of Technology |
Eggert, Julian P. | Honda Research Institute Europe GmbH |
Kersting, Kristian | TU Darmstadt & Hessian.AI |
Keywords: Sensor Fusion, Intelligent Transportation Systems, Probabilistic Inference
Abstract: Predicting agents impacted by legal policies, physical limitations, and operational preferences is inherently difficult. In recent years, neuro-symbolic methods have emerged, integrating machine learning and symbolic reasoning models into end-to-end learnable systems. This has opened up a promising avenue for expressing high-level constraints over multi-modal input data in robotics. This work introduces an approach for Bayesian estimation of agents expected to comply with a human-interpretable neuro-symbolic model, which we call their Constitution. To this end, we present the Constitutional Filter (CoFi), which improves the tracking of agents by leveraging expert knowledge, incorporating deep learning architectures, and accounting for environmental uncertainties. CoFi extends the general, recursive Bayesian estimation setting, ensuring compatibility with a vast landscape of established techniques such as Particle Filters. To underpin the advantages of CoFi, we evaluate its performance on real-world marine traffic data. Beyond improved performance, we show how CoFi can learn to trust and adapt to the level of compliance of an agent, recovering baseline performance even if the assumed Constitution clashes with reality.
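Since CoFi extends recursive Bayesian estimation, one way to picture it is a particle filter whose weights also reflect Constitution compliance, scaled by a learned trust level. The following is our illustrative sketch, not the paper's formulation:

```python
import numpy as np

def cofi_update(particles, weights, z, motion, sensor_lik, constitution_lik, trust):
    """One recursive-Bayes step of a 'constitutional' particle filter.

    sensor_lik(z, particle)    -> p(z | state)
    constitution_lik(particle) -> compliance of the state with the symbolic
                                  Constitution, in [0, 1]
    trust in [0, 1] scales how strongly compliance shapes the posterior;
    it can itself be adapted online when the Constitution clashes with data.
    """
    particles = np.array([motion(p) for p in particles])        # predict
    w = np.array([sensor_lik(z, p) * constitution_lik(p) ** trust
                  for p in particles]) * weights                # weight
    w /= w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```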
|
|
13:55-14:00, Paper TuBT23.8 | |
PlugAndFilter: Architecture Agnostic Booster for Lightweight Registration |
|
Malaspina, Edoardo | Institut Pascal Université Clermont Auvergne (UCA), SMA-RTY SAS |
Abdelouahab, Kamel | SMA-RTY SAS |
Berry, Francois | CNRS UCA |
Keywords: Sensor Fusion, Deep Learning for Visual Perception
Abstract: This paper introduces PlugAndFilter, a framework designed to enhance the performance of multi-modal image registration, particularly for real-time video registration tasks. The improvements provided by PlugAndFilter include not only better registration quality for individual image pairs but also the transformation of image registration methods into a more robust video registration system. These enhancements are made possible by three proposed contributions: spatial outlier detection, temporal outlier detection, and confidence-based keypoint accumulation. PlugAndFilter is compatible with a wide range of thermal-visible registration models, and any registration method capable of producing keypoint matches can be integrated. The proposed implementation is optimized for real-time video registration on edge devices, with key design decisions highlighted to support this goal.
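A rough sketch of how temporal outlier rejection and confidence-based accumulation might be combined on top of any keypoint matcher; the thresholds and the homography-based temporal prior are our assumptions, not the paper's design:

```python
import numpy as np

def filter_and_accumulate(matches, prev_H, bank, conf_thresh=0.6, reproj_tol=3.0):
    """Boost one frame's keypoint matches for video registration.

    matches: list of (pt_src, pt_dst, confidence) from any matcher.
    prev_H:  3x3 homography estimated on the previous frame (temporal prior).
    bank:    high-confidence matches accumulated from earlier frames.
    """
    kept = []
    for src, dst, c in matches:
        pred = prev_H @ np.array([src[0], src[1], 1.0])
        pred = pred[:2] / pred[2]
        # temporal outlier test: the match must agree with recent motion
        if np.linalg.norm(pred - np.asarray(dst)) < reproj_tol:
            kept.append((src, dst, c))
            if c > conf_thresh:              # confidence-based accumulation
                bank.append((src, dst, c))
    return kept + bank, bank
```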
|
|
TuBT24 |
102C |
Software Architecture and AI-Based Methods |
Regular Session |
|
13:20-13:25, Paper TuBT24.1 | |
Gazing Preference Induced Controllable Milling Behavior in Swarm Robotics |
|
Zhou, Yongjian | Northwestern Polytechnical University |
Song, Jintao | Xi’an University of Architecture and Technology |
Liu, Tong | Northwestern Polytechnical University |
Peng, Xingguang | Northwestern Polytechnical University |
Keywords: Swarm Robotics, Biologically-Inspired Robots, Probability and Statistical Methods
Abstract: Milling is a collective behavior that is useful in a variety of real scenarios, but how to regulate milling behavior simply and efficiently is still a challenging problem. This letter introduces a novel method for controlling milling behavior in real-world robots, where both the direction and radius of the milling pattern can be continuously adjusted by tuning a single parameter. Inspired by visual attention mechanisms, the proposed model introduces the concept of gazing preference. That is, the robot will prefer to choose a neighbor in a particular direction for interaction, which in turn creates a force that deviates from the direction of velocity and leads to the milling behavior. Additionally, a potential function ensures swarm cohesion and prevents collisions through simulated repulsive and attractive forces. Simulations and experiments involving up to 50 robots demonstrate that adjusting the gazing preference parameter enables seamless control of the rotation direction and fine-tuning of the milling pattern’s angular velocity and radius. Overall, this letter provides a straightforward and effective approach for designing controlled milling behavior in swarm robotics.
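The gazing-preference mechanism can be illustrated as follows: each robot picks the neighbor whose bearing best matches a preferred gazing direction offset from its heading, then steers toward it. Symbols and gains below are ours, not the letter's notation:

```python
import numpy as np

def gazing_force(pos, vel, neighbors, psi, k_att=1.0):
    """Steer toward the neighbor closest to the preferred gazing direction.

    psi: gazing-preference angle measured from the heading; tuning it
    shifts the interaction force off the velocity axis, which is what
    induces milling with a controllable direction and radius.
    """
    heading = np.arctan2(vel[1], vel[0])
    gaze = heading + psi                              # preferred direction
    def ang_err(n):
        b = np.arctan2(n[1] - pos[1], n[0] - pos[0])  # bearing to neighbor
        return abs((b - gaze + np.pi) % (2 * np.pi) - np.pi)
    target = min(neighbors, key=ang_err)
    d = np.asarray(target) - np.asarray(pos)
    return k_att * d / np.linalg.norm(d)              # attraction toward it
```

The cohesion/collision-avoidance potential function mentioned in the abstract would be added to this force before integrating the robot's velocity.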
|
|
13:25-13:30, Paper TuBT24.2 | |
CAD2SLAM: Adaptive Projection between CAD Blueprints and SLAM Maps |
|
Bayón-Gutiérrez, Martín | Universidad De León |
Prieto-Fernández, Natalia | University of León |
García-Ordás, María-Teresa | University of Leon |
Benítez-Andrades, José Alberto | Universidad De León |
Alaiz-Moreton, Hector | Universidad De Leon |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: Software Architecture for Robotic and Automation, Computational Geometry, Mapping
Abstract: Robotic mobile platforms are key building blocks for many applications, and cooperation between robots and humans is essential for enhancing productivity and reducing labor costs. To operate safely, robots usually rely on a custom map of the environment that depends on the sensor configuration of the platform. In contrast, blueprints stand as an abstract representation of the environment. The use of both CAD and SLAM maps allows robots to collaborate using the blueprint as a common language, while also easing control for non-robotics experts. In this work, we present an adaptive system to project a 2D pose in the blueprint to the SLAM map and vice versa. Previous work in the literature aims at morphing a SLAM map to a previously available map. In contrast, CAD2SLAM does not alter the internal map representation used by the SLAM and localization algorithms running on the robot, preserving its original properties. We believe that our system is beneficial for the control and supervision of multiple heterogeneous robotic platforms that are monitored and controlled through the CAD map. We present a set of experiments that support our claims as well as an open-source implementation.
|
|
13:30-13:35, Paper TuBT24.3 | |
Cloud-Native Fog Robotics: Model-Based Deployment and Evaluation of Real-Time Applications |
|
Wen, Long | Technical University of Munich |
Zhang, Yu | Technical University of Munich |
Rickert, Markus | University of Bamberg |
Lin, Jianjie | Technische Universität München |
Pan, Fengjunjie | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Software Architecture for Robotic and Automation, Software, Middleware and Programming Environments, Distributed Robot Systems
Abstract: As the field of robotics evolves, robots become increasingly multi-functional and complex. Currently, there is a need for solutions that enhance flexibility and computational power without compromising real-time performance. The emergence of fog computing and cloud-native approaches addresses these challenges. In this paper, we integrate a microservice-based architecture with cloud-native fog robotics to investigate its performance in managing complex robotic systems and handling real-time tasks. Additionally, we apply model-based systems engineering (MBSE) to achieve automatic configuration of the architecture and to manage resource allocation efficiently. To demonstrate the feasibility and evaluate the performance of this architecture, we conduct comprehensive evaluations using both bare-metal and cloud setups, focusing particularly on real-time and machine-learning-based tasks. The experimental results indicate that a microservice-based cloud-native fog architecture offers a more stable computational environment compared to a bare-metal one, achieving over 20% reduction in the standard deviation for complex algorithms across both CPU and GPU. It delivers improved startup times, along with a 17% (wireless) and 23% (wired) faster average message transport time. Nonetheless, it exhibits a 37% slower execution time for simple CPU tasks and 3% for simple GPU tasks, though this impact is negligible in cloud-native environments where such tasks are typically deployed on bare-metal systems.
|
|
13:35-13:40, Paper TuBT24.4 | |
Introducing Novice Operators to Collaborative Robots: A Hands-On Approach for Learning and Training (I) |
|
Kornmaaler Hansen, Andreas | Aalborg University |
Villani, Valeria | University of Modena and Reggio Emilia |
Pupa, Andrea | University of Modena and Reggio Emilia |
Heidemann Lassen, Astrid | Aalborg University |
Keywords: Software Tools for Robot Programming, Human-Robot Collaboration, Human-Centered Robotics
Abstract: Collaborative robots (cobots) have seen widespread adoption in industrial applications over the last decade. Cobots can be placed outside protective cages and are generally regarded as much more intuitive and easy to program compared to larger classical industrial robots. However, despite the cobots’ widespread adoption, their collaborative potential and opportunity to aid flexible production processes seem hindered by a lack of training and understanding from shop floor workers. Researchers have focused on technical solutions, which allow novice robot users to more easily train collaborative robots. However, most of this work has yet to leave research labs. Therefore, training methods are needed with the goal of transferring skills and knowledge to shop floor workers about how to program collaborative robots. We identify general basic knowledge and skills that a novice must master to program a collaborative robot. We present how to structure and facilitate cobot training based on cognitive apprenticeship and test the training framework on a total of 20 participants using a UR10e and UR3e robot. We considered two conditions: adaptive and self-regulated training. We found that the facilitation was effective in transferring knowledge and skills to novices; however, we found no conclusive difference between the adaptive and self-regulated approaches. The results demonstrate that, thanks to the proposed training method, both groups are able to significantly reduce task time, achieving a reduction of 40%, while maintaining the same level of performance in terms of position error. Note to Practitioners—This paper was motivated by the fact that the adoption of smaller, so-called collaborative robots is increasing within manufacturing but the potential for a single robot to be used flexibly in multiple places of a production seems unfulfilled. If more unskilled workers understood the collaborative robots and received structured training, they would be capable of programming the robots independently. This could change the current landscape of stationary collaborative robots towards more flexible robot use and thereby increase companies’ internal overall equipment efficiency and competencies. To this end, we identify general skills and knowledge for programming a collaborative robot, which helps increase the transparency of what novices need to know. We show how such knowledge and skills may be facilitated in a structured training framework, which effectively transfers necessary programming knowledge and skills to novices. This framework may be applied to a wider scope of knowledge and skills as the learner progresses. The skills and knowledge that we identify are general across robot platforms; however, collaborative robot interfaces differ. Therefore, a practical limitation to the approach includes the need for a knowledgeable person on the specific collaborative robot in question in order to create training material in areas specific to that model.
|
|
13:40-13:45, Paper TuBT24.5 | |
Orchestrating Method Ensembles to Adapt to Resource Requirements and Constraints During Robotic Task Execution |
|
Lay, Florian Samuel | DLR |
Dömel, Andreas | German Aerospace Center (DLR) |
Lii, Neal Y | German Aerospace Center (DLR) |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Keywords: Space Robotics and Automation, Methods and Tools for Robot System Design
Abstract: Robot behavior designers commonly select one method -- e.g. A* or RRT -- that is assumed to have the appropriate trade-off for a given domain between computational load, computation time, and the quality of the result of the method. We propose ensemble orchestration patterns, which evaluate multiple methods, and select the best result, thus exploiting the complementary advantages that alternative methods often have. By implementing different termination, preemption, constraint enforcement and selection schemes, different patterns lead to different (predictable) resource trade-offs. Thus, rather than selecting and committing to only one method, a designer chooses the appropriate pattern and constraints for the desired trade-off, and the pattern then realizes the selection on-line. We apply these patterns to various subtasks that are prevalent in our Surface Avatar ISS Technology Demonstration Mission, such as navigation, motion planning, and registration. In our evaluation, we demonstrate that these patterns can effectively exploit increased resource budgets or relaxed constraints to find better solutions, and adapt the selection to different situations.
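One of the simpler orchestration patterns, running all methods in parallel, preempting at the budget, and selecting the best constraint-satisfying result, might look like the sketch below; the paper defines several such termination, preemption, and selection schemes, of which this shows only one possible shape:

```python
import concurrent.futures as cf

def orchestrate(methods, problem, budget_s, cost, satisfies):
    """Run alternative methods (e.g., A* and RRT) concurrently and return
    the best feasible result found within a time budget."""
    pool = cf.ThreadPoolExecutor(max_workers=len(methods))
    futures = [pool.submit(m, problem) for m in methods]
    done, not_done = cf.wait(futures, timeout=budget_s)
    for f in not_done:
        f.cancel()               # preempts only not-yet-started work; true
                                 # preemption needs cooperative checks
    pool.shutdown(wait=False)    # do not block on stragglers
    best = None
    for f in done:
        r = f.result()
        if r is not None and satisfies(r):       # enforce constraints
            if best is None or cost(r) < cost(best):
                best = r                         # select the best result
    return best
```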
|
|
13:45-13:50, Paper TuBT24.6 | |
Unified Control of an Orbital Manipulator for the Approach and Grasping of a Free-Floating Satellite (I) |
|
Vijayan, Ria | German Aerospace Center (DLR) |
De Stefano, Marco | German Aerospace Center (DLR) |
Dietrich, Alexander | German Aerospace Center (DLR) |
Ott, Christian | TU Wien |
Keywords: Space Robotics and Automation, Motion Control, Compliance and Impedance Control
Abstract: In robotic on-orbit servicing (OOS) missions, an orbital manipulator approaches and grasps a faulty client satellite. The approach phase and postgrasp phase pose different challenges and hence impose different requirements on the design of a controller for the orbital manipulator. In the approach phase, the foremost requirement is to track the client satellite with the end-effector in Cartesian space. In addition, it is desirable to have the servicing satellite and arm in a suitable safe configuration during grasp and stabilization. In the postgrasp phase, a crucial requirement is to limit the interaction forces at the manipulator’s end-effector. This is to ensure that the grasping interface is not damaged during stabilization. In this article, we develop a unified control framework for an orbital manipulator in the approach phase and postgrasp phase. The controller hierarchically fulfills various requirements of each phase. In the approach phase, the proposed controller tracks the Cartesian pose of the grasp point on the client. It also simultaneously tracks a joint-space trajectory in the nullspace to achieve a suitable servicer pose for grasping. The proposed postgrasp controller stabilizes the client with limited interaction forces while bringing the servicer to a safe configuration with respect to the client. Furthermore, the unified controller redistributes torques between thrusters and reaction wheels so as to save thruster energy in the approach phase and reduce external momentum in the postgrasp phase. Results of simulation and experiments performed on the hardware-in-the-loop facility OOS-Sim at DLR validate the proposed method.
|
|
13:50-13:55, Paper TuBT24.7 | |
DR-MPC: Deep Residual Model Predictive Control for Real-World Social Navigation |
|
Han, James | University of Toronto |
Thomas, Hugues | University of Toronto |
Zhang, Jian | Apple |
Rhinehart, Nicholas | University of Toronto |
Barfoot, Timothy | University of Toronto |
Keywords: Social HRI, Reinforcement Learning, Autonomous Agents
Abstract: How can a robot safely navigate around people with complex motion patterns? Deep Reinforcement Learning (DRL) in simulation holds some promise, but much prior work relies on simulators that fail to capture the nuances of real human motion. Thus, we propose Deep Residual Model Predictive Control (DR-MPC) to enable robots to quickly and safely perform DRL from real-world crowd navigation data. By blending MPC with model-free DRL, DR-MPC overcomes the DRL challenges of large data requirements and unsafe initial behavior. DR-MPC is initialized with MPC-based path tracking, and gradually learns to interact more effectively with humans. To further accelerate learning, a safety component estimates out-of-distribution states to guide the robot away from likely collisions. In simulation, we show that DR-MPC substantially outperforms prior work, including traditional DRL and residual DRL models. Hardware experiments show our approach successfully enables a robot to navigate a variety of crowded situations with few errors using less than 4 hours of training data (video: https://youtu.be/GUZlGBk60uY, code: https://github.com/James-R-Han/DR-MPC).
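The core residual blending can be sketched as below; the gate, residual cap, and out-of-distribution fallback are our reading of the abstract, not the released DR-MPC interface:

```python
import numpy as np

def dr_mpc_action(mpc_action, policy, obs, ood_score, alpha, ood_thresh=0.8):
    """Blend an MPC path-tracking action with a learned residual.

    policy(obs) -> (residual, gate in [0, 1]); alpha caps the residual.
    ood_score(obs) estimates how far the state is from the training data;
    when it is high we fall back toward plain MPC for safety.
    """
    residual, gate = policy(obs)
    if ood_score(obs) > ood_thresh:
        return mpc_action                   # likely-unsafe state: pure MPC
    # start near pure MPC (gate ~ 0) and let RL take over as it learns
    return mpc_action + gate * np.clip(residual, -alpha, alpha)
```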
|
|
13:55-14:00, Paper TuBT24.8 | |
Stability Criterion and Stability Enhancement for a Thruster-Assisted Underwater Hexapod Robot |
|
Chen, Lepeng | Northwestern Polytechnical University |
Cui, Rongxin | Northwestern Polytechnical University |
Yan, Weisheng | Northwestern Polytechnical University |
Yang, Chenguang | University of Liverpool |
Li, Zhijun | University of Science and Technology of China |
Xu, Hui | Northwestern Polytechnical University |
Yu, Haitao | Northwestern Polytechnical University |
Keywords: Stability criterion and stability enhancement, Marine Robotics, Legged Robots, Motion Control
Abstract: The stability criterion is critical for the design of legged robots’ motion planning and control algorithms. If these algorithms cannot theoretically ensure legged robots’ stability, we need many trials to identify suitable parameters for stable locomotion. However, most existing stability criteria are tailored to robots driven solely by legs and cannot be applied to thruster-assisted legged robots. Here, we propose a stability criterion for a thruster-assisted underwater hexapod robot by finding maximum and minimum allowable thruster forces and comparing them with the current thrusts to check its stability. On this basis, we propose a method to increase the robot’s stability margin by adjusting the value of thrusts. This process is called stability enhancement. The criterion uses the optimization method to transform multiple variables such as attitude, velocity, acceleration of the robot body, and the angle and angular velocity of leg joints into one kind of variable (thrust) to judge the stability directly. In addition, the stability enhancement method is straightforward to implement because it only needs to adjust the thrusts. These provide insights into how multiclass forces such as inertia force, fluid force, thrust, gravity, and buoyancy affect the robot’s stability.
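If the attitude, velocity, acceleration, and joint-state terms are folded into linear constraints on the thrust vector u (written here abstractly as A u <= b, a placeholder for the paper's derivation), both the criterion and the enhancement step admit a compact sketch:

```python
import numpy as np
from scipy.optimize import linprog

def stability_margin(A, b, u):
    """Criterion: smallest constraint slack of the current thrusts u;
    a positive value means the robot is stable."""
    return float(np.min(b - A @ u))

def enhance_stability(A, b, u_min, u_max):
    """Enhancement: choose thrusts that maximize the worst-case slack m,
    i.e. solve  max m  s.t.  A u + m <= b,  u_min <= u <= u_max."""
    n = len(u_min)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                    # maximize m
    A_ub = np.hstack([A, np.ones((A.shape[0], 1))])
    bounds = list(zip(u_min, u_max)) + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
    return (res.x[:-1], res.x[-1]) if res.success else (None, -np.inf)
```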
|
|
TuBT25 |
103A |
Dexterous Manipulation 2 |
Regular Session |
|
13:20-13:25, Paper TuBT25.1 | |
Temporal-Spatial Representation Fusion for Dexterous Manipulation Learning with Unpaired Visual-Action Data |
|
Han, Guwen | Zhejiang University |
Sun, Zhengnan | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Cui, Yu | Zhejiang University |
Chen, Anjun | Zhejiang University |
Chen, Huajin | Zhejiang University |
Xiong, Rong | Zhejiang University |
Chen, Jiming | Zhejiang University |
Ye, Qi | Zhejiang University |
Keywords: Dexterous Manipulation, Deep Learning in Grasping and Manipulation, Manipulation Planning
Abstract: Supervised behavioral cloning using robot visual-action data has been widely investigated in robot manipulation. However, these methods typically require simultaneous acquisition of visual and action data, which makes it difficult for them to utilize unpaired visual-action datasets: e.g., videos on the Internet, or action-only data, which raises fewer privacy and security concerns. To take advantage of action data without synchronized visual observation, we propose UnVALe, a novel dexterous robotic manipulation RL framework which utilizes action data without paired images to learn priors of human dexterous manipulation skills. Specifically, an LSTM-based network is designed to learn the temporal action prior by reconstructing the input trajectories, and a VAE network is designed to learn the spatial action prior by reconstructing the input action. Novel rewards are proposed to incorporate the priors into reinforcement learning, which encourage actions output by RL policies to maintain low reconstruction errors in the VAE and LSTM networks. We perform extensive validation on three dexterous robot manipulation tasks. The experimental results show that UnVALe can effectively improve robot manipulation performance. Compared with existing visual pretraining methods, our method achieves a more than 30% increase in success rates.
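The prior-based reward shaping can be pictured as penalizing actions the pretrained autoencoders cannot reconstruct well. A sketch with assumed shapes and weights; the paper's exact reward design may differ:

```python
import torch

def prior_rewards(action_seq, vae, lstm, w_s=1.0, w_t=1.0):
    """Reward term keeping RL actions close to learned human priors.

    vae(a) and lstm(seq) are pretrained autoencoders reconstructing a
    single action and an action trajectory, respectively; low
    reconstruction error means the action 'looks like' human manipulation.
    """
    a = action_seq[-1]
    spatial_err = torch.mean((vae(a) - a) ** 2)            # per-action prior
    temporal_err = torch.mean((lstm(action_seq) - action_seq) ** 2)
    return -(w_s * spatial_err + w_t * temporal_err)       # higher is better
```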
|
|
13:25-13:30, Paper TuBT25.2 | |
VTAO-BiManip: Masked Visual-Tactile-Action Pre-Training with Object Understanding for Bimanual Dexterous Manipulation |
|
Sun, Zhengnan | Zhejiang University |
Shi, Zhaotai | Zhejiang University |
Chen, Jiaying | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Cui, Yu | Zhejiang University |
Chen, Jiming | Zhejiang University |
Ye, Qi | Zhejiang University |
Keywords: Dexterous Manipulation, Deep Learning in Grasping and Manipulation, Bimanual Manipulation
Abstract: Bimanual dexterous manipulation remains a significant challenge in robotics due to the high DoFs of each hand and their coordination. Existing single-hand manipulation techniques often leverage human demonstrations to guide RL methods but fail to generalize to complex bimanual tasks involving multiple sub-skills. In this paper, we propose VTAO-BiManip, a novel framework that integrates visual-tactile-action pre-training with object understanding, aiming to enable human-like bimanual manipulation via curriculum RL. We improve prior learning by incorporating hand motion data, providing more effective guidance for dual-hand coordination. Our pretraining model predicts future actions as well as object pose and size using masked multimodal inputs, facilitating cross-modal regularization. To address the multi-skill learning challenge, we introduce a two-stage curriculum RL approach to stabilize training. We evaluate our method on a bimanual bottle-cap twisting task, demonstrating its effectiveness in both simulated and real-world environments. Our approach achieves a success rate that surpasses existing visual-tactile pretraining methods by over 20%.
|
|
13:30-13:35, Paper TuBT25.3 | |
Learning to Throw-Flip |
|
Liu, Yang | EPFL |
Da Costa, Bruno | EPFL |
Billard, Aude | EPFL |
Keywords: Dexterous Manipulation, Learning from Experience, Transfer Learning
Abstract: Dynamic manipulation, such as robot tossing or throwing objects, has recently gained attention as a novel paradigm to speed up logistic operations. However, the focus has predominantly been on the object's landing location, irrespective of its final orientation. In this work, we present a method enabling a robot to accurately "throw-flip" objects to a desired landing pose (position and orientation). Conventionally, objects thrown by revolute robots suffer from parasitic rotation, resulting in highly restricted and uncontrollable landing poses. Our approach is based on two key design choices: first, leveraging the impulse-momentum principle, we design a family of throwing motions that effectively decouple the parasitic rotation, significantly expanding the feasible set of landing poses. Second, we combine a physics-based model of free flight with regression-based learning methods to account for unmodeled effects. Real robot experiments demonstrate that our framework can learn to throw-flip objects to a pose target within a (±5 cm, ±45°) threshold in dozens of trials. Thanks to data assimilation, incorporating projectile dynamics reduces sample complexity by an average of 40% when throw-flipping to unseen poses compared to end-to-end learning methods. Additionally, we show that past knowledge on in-hand object spinning can be effectively reused, accelerating learning by 70% when throwing a new object with a Center of Mass (CoM) shift.
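The physics prior amounts to ballistic free flight plus a decoupled spin; learning then only has to fit the residual between prediction and observation. A sketch under those assumptions (drag deliberately ignored in the prior):

```python
import numpy as np

def landing_pose(p0, v0, omega, t_land, g=9.81):
    """Nominal free-flight prediction used as the physics prior.

    p0, v0: release position/velocity (3,); omega: spin rate about the
    decoupled axis; t_land: time of flight.
    """
    p = p0 + v0 * t_land - 0.5 * np.array([0.0, 0.0, g]) * t_land ** 2
    theta = omega * t_land                # net rotation during flight
    return p, theta

# Residual learning then fits (observed - predicted) with any regressor,
# e.g. (illustrative, not the paper's model):
#   from sklearn.gaussian_process import GaussianProcessRegressor
#   gp = GaussianProcessRegressor().fit(throw_params, observed - predicted)
```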
|
|
13:35-13:40, Paper TuBT25.4 | |
PACR: Point-Axis Constraint Reasoning for Enhanced Robotic Manipulation with Dexterity and Compliance |
|
Haowen, Xiong | Harbin Institute of Technology |
Mu, Yao | The University of Hong Kong |
Liu, Zhuang | Harbin Institute of Technology |
Yusi, Fan | Harbin Institute of Technology |
Huang, Yi | Harbin Institute of Technology |
Liu, Jianxing | Harbin Institute of Technology |
Keywords: Dexterous Manipulation, Bimanual Manipulation, Agent-Based Systems
Abstract: Developing robotic systems for unstructured and contact-rich environments presents significant challenges, necessitating advanced dexterous motion planning, compliant interaction control, and spatio-temporal coordination. To address these, we introduce PACR (Point-Axis Constraint Reasoning), an unified framework that encodes robot trajectories and impedance profiles via constraint functions parameterized by point-axis primitives, extracted from multi-view RGB-D camera observations. This enables joint optimization of motion and impedance within a shared mathematical framework. For enhanced robustness, we implement a dual-agent Vision-Language Model (VLM) system: a Generator employs Chain-of-Thought reasoning to formulate constraints, while an adversarial Critic validates them, significantly mitigating hallucination risks. Integrated with the dual-agent system, the framework also features an error backtracking mechanism, enabling dynamic adaptation by learning from failures. Extensive experiments across diverse manipulation tasks reveal that PACR achieves a 61% success rate (compared to 37% for baseline methods) and reduces the average contact forces, demonstrating broad applicability through zero-shot generalization without task-specific training.
|
|
13:40-13:45, Paper TuBT25.5 | |
Flipping Manipulation with a Two-Fingered Parallel-Jaw Gripper |
|
Liao, Wenxi | Harbin Institute of Technology |
Hu, Shao | Harbin Institute of Technology, ShenZhen |
Liu, Zhitong | Harbin Institute of Technology, Shenzhen |
Jiang, Xin | Harbin Institute of Technology, Shenzhen |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Contact Modeling
Abstract: Industrial part reorientation remains a critical challenge in automated manufacturing workflows, particularly with parallel-jaw grippers lacking the dexterity for complex manipulations. This paper presents a systematic flipping strategy for structured environments. A quasi-static force equilibrium model is developed to characterize multi-contact manipulation systems, and stability criteria are derived through wrench space analysis, enabling A*-based optimal trajectory generation within the derived stable configuration space. To ensure persistent fingertip-object contact, adaptive impedance control dynamically adjusts gripper stiffness based on real-time force thresholds, preventing unintended detachment. Experimental validation demonstrates robust performance in two representative scenarios:1) cube flipping on a compliant surface(84.3g, 90% success over 50 trials), 2) vision-free continuous pivoting of an irregular part on a rigid substrate(56g, 88% success over 50 trials). The methodology requires neither environmental modification nor expensive tactile sensing, showing promise for practical deployment in structured manufacturing systems.
|
|
13:45-13:50, Paper TuBT25.6 | |
Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation |
|
Li, Jinzhou | Cornell University |
Wu, Tianhao | Peking University |
Zhang, Jiyao | Peking University |
Chen, Zeyuan | Peking University |
Jin, Haotian | Beijing University of Chemical Technology |
Mingdong Wu, Aaron | Peking University |
Shen, Yujun | Ant Group |
Yang, Yaodong | Peking University |
Dong, Hao | Peking University |
Keywords: Dexterous Manipulation, Imitation Learning, Force and Tactile Sensing
Abstract: Effectively utilizing multi-sensory data is important for robots to generalize across diverse tasks. However, the heterogeneous nature of these modalities makes fusion challenging. Existing methods propose strategies to obtain comprehensively fused features but often ignore the fact that each modality requires different levels of attention at different manipulation stages. To address this, we propose a force-guided attention fusion module that adaptively adjusts the weights of visual and tactile features without human labeling. We also introduce a self-supervised future force prediction auxiliary task to reinforce the tactile modality, improve data imbalance, and encourage proper adjustment. Our method achieves an average success rate of 93% across three fine-grained, contact-rich tasks in real-world experiments. Further analysis shows that our policy appropriately adjusts attention to each modality at different manipulation stages.
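The force-guided attention can be pictured as a softmax gate over the two modalities driven by the measured wrench, so tactile features dominate once contact forces rise. Shapes and the gating network below are our assumptions, not the paper's architecture:

```python
import torch

def force_guided_fusion(vis_feat, tac_feat, force, net):
    """Fuse visual and tactile features with force-derived weights.

    net maps the measured wrench to two logits, one per modality.
    """
    w = torch.softmax(net(force), dim=-1)           # (..., 2) attention
    return w[..., :1] * vis_feat + w[..., 1:] * tac_feat
```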
|
|
13:50-13:55, Paper TuBT25.7 | |
Human–Robot Intrinsic Skill Transfer and Programming by Demonstration System (I) |
|
Wang, Fei | Northeastern University |
Wu, Jinxiu | Northeastern University |
Lian, Siyi | Northeastern University |
Guo, Yi | Northeastern University |
Hu, Kaiyin | Northeastern University |
Keywords: Learning from Demonstration, Dexterous Manipulation, Transfer Learning
Abstract: Robots can often learn skills from human demonstrations. However, robot operation via visual perception is commonly influenced by the external environment and is relatively demanding in terms of external conditions; manual segmentation of tasks is time-consuming and labor-intensive; and robots do not perform complex tasks with sufficient accuracy and naturalness in their movements. In this work, we propose a programming by demonstration framework to facilitate autonomous task segmentation and flexible execution. We acquire surface electromyography (sEMG) signals from the forearm and train the gesture datasets by transfer learning from a sign language dataset to achieve action classification. Meanwhile, the inertial information of the forearm is collected and combined with the sEMG signals to autonomously segment operational skills into discrete task units; comparison with ground truth demonstrates that multi-modal information leads to higher segmentation accuracy. To make the robot movements more natural, we add arm stiffness information to this system and estimate the arm stiffness of different individuals by creating a muscle force map of the demonstrator. Finally, human manipulation skills are mapped onto the UR5e robot to validate the results of human-robot skill transfer.
|
|
TuBT26 |
103B |
Soft Robot Materials and Design 2 |
Regular Session |
Chair: Wang, Dong | Shanghai Jiao Tong University |
|
13:20-13:25, Paper TuBT26.1 | |
Multi-Material 3D-Printed Magnetic Millirobot for Quadrupedal Locomotion in Endoluminal Spaces |
|
Wang, Ruichen | Shanghai Jiao Tong University |
Wang, Jinqiang | Shanghai Jiao Tong University |
Wang, Dong | Shanghai Jiao Tong University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Additive Manufacturing
Abstract: Quadrupedal locomotion has the advantages of a low center of gravity, a broad support base, and four-legged coordination, enabling outstanding stability in complex terrains. Drawing inspiration from this, researchers have developed robots that emulate such locomotion through the control of multiple actuators and structural reconfiguration. However, complex control sequences and slow tethered actuation limit their locomotion in confined endoluminal spaces. Here, we present a quadrupedal magnetic millirobot (beamrobot) fabricated via multi-material direct ink writing (DIW) for medical applications in complex endoluminal spaces. The millirobot combines a soft body with magnetic feet, enabling controlled shape-morphing and locomotion under external magnetic fields. The printing parameters are optimized, and numerical simulations and experiments validate the static deformation and dynamic locomotion modes. Experimental results demonstrate the versatility of the beamrobots, including following “U”-shaped trajectories, moving through a branch vessel model, and clearing an obstruction in a vessel model. The proposed quadrupedal magnetic millirobot and hard-magnetic actuation approach open up new possibilities for medical applications.
|
|
13:25-13:30, Paper TuBT26.2 | |
PNEUmorph: A Shape-Morphing Interface Comprising a Pneumatic Membrane Constrained by Variable-Length Tendons |
|
Soana, Valentina | University College London |
Bosi, Federico | University College London |
Wurdemann, Helge Arne | University College London |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft shape-morphing technologies are explored in fields such as soft robotics, metamaterials and design, enabling systems to adapt dynamically through elastic deformations. Applied in mobile devices, actuators, interactive objects and environments, these systems can respond to functional needs and environmental stimuli, communicating information and enhancing human experiences. A primary design goal for these systems is achieving extensive and complex shape transformations. Traditionally, soft robotics employs a pneumatic active layer constrained by a passive layer, limiting the deformation range. However, using dual active layers can expand deformation potential. Expanding on these principles, this work introduces PNEUmorph: a pneumatic surface constrained by a network of variable-length tendons, allowing broader shape transformations than traditional single-layer systems. PNEUmorph’s dual-layer actuation overcomes fixed deformation limits, significantly enhancing shape-morphing capabilities. This paper presents PNEUmorph’s design and preliminary geometrical characterization, achieved through an interdisciplinary approach that merges design and soft robotics methods. This study details methods for simulation, fabrication, operation and evaluation, offering insights into experimental results and directions for advancing surface-based soft shape-morphing systems.
|
|
13:30-13:35, Paper TuBT26.3 | |
Design and Performance Analysis of a Pipeline Crawling Robot Based on Spring-Roll Dielectric Elastomer Actuators |
|
Zhang, Qinghai | Fudan University |
Yu, Wei | Hebei University of Technology |
Zhang, Ziqi | Hebei University of Technology |
Jianghua, Zhao | Hebei University of Technology |
Guo, Shijie | Hebei University of Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Modeling, Control, and Learning for Soft Robots
Abstract: With the increasing complexity of pipeline systems in various industrial and environmental applications, there is a critical need for flexible and efficient robotic solutions that can navigate and inspect confined spaces. This paper introduces a lightweight pipeline crawling robot based on spring-roll dielectric elastomer actuators (DEAs). Inspired by the adaptability of caterpillars, the robot combines anisotropic friction feet with a spring-roll DEA structure to achieve high-speed movement. It operates effectively in pipes with diameters ranging from 16 mm to 20 mm, reaching a maximum speed of 357 mm/s (5.95 BL/s) under a 3.5 kV driving voltage. The optimized design enhances actuator performance and friction distribution, significantly outperforming existing soft crawling robots. This innovation demonstrates great potential for high-speed, lightweight pipeline inspection applications and advances the field of soft robotics for diverse industrial tasks.
|
|
13:35-13:40, Paper TuBT26.4 | |
Eversion Robot for Endoscopic Vein Harvesting |
|
Pi, Xinyi | University College London |
Yao, Junke | King's College London |
Adams, Benjamin | University College London |
Gerontati, Antonia | University College London |
Wurdemann, Helge Arne | University College London |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft eversion robots have demonstrated significant advantages in navigating within confined spaces with minimal friction, making them promising candidates for various intraluminal applications in medical, industrial, and exploratory domains. These growing robots enable frictionless movement within hollow structures. This paper introduces a novel growing robotic manipulator whose target medical application is endoscopic vein harvesting. We present the design, implementation, and experimental validation of this eversion robot, investigating its growth behavior under varying pressures and diameters, its ability to navigate along defined trajectories, and its operation with a tool mounted at its tip. This eversion robot could enable vein dissection while preserving the surrounding fat layer, making it a promising innovation for minimally invasive vascular surgery and beyond.
|
|
13:40-13:45, Paper TuBT26.5 | |
Mechanically Programming the Cross-Sectional Shape of Soft Growing Robotic Structures for Patient Transfer |
|
Osele, Obumneme Godson | Stanford University |
Barhydt, Kentaro | Massachusetts Institute of Technology |
Sullivan, Catherine | Massachusetts Institute of Technology |
Asada, Harry | MIT |
Okamura, Allison M. | Stanford University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Modeling, Control, and Learning for Soft Robots
Abstract: Pneumatic soft everting robotic structures have the potential to facilitate human transfer tasks due to their ability to grow underneath humans without sliding friction and their utility as a flexible sling when deflated. Tubular structures naturally yield circular cross-sections when inflated, whereas a robotic sling must be both thin enough to grow between a human and their resting surface and wide enough to cradle the human. Recent works have achieved flattened cross-sections by including rigid components into the structure, but this reduces conformability to the human. We present a method of mechanically programming the cross-section of soft everting robotic structures using flexible strips that constrain radial expansion between points along the outer membrane. Our method enables simultaneously wide and thin inflated profiles, and maintains the full multi-axis flexibility of traditional slings when deflated. We develop and validate a model relating geometric design specifications to fabrication parameters, and experimentally characterize their effects on growth rate. Finally, we prototype a soft growing robotic sling system and demonstrate its use for assisting a single caregiver in bed-to-chair patient transfer.
|
|
13:45-13:50, Paper TuBT26.6 | |
Enhancing Continuum Robot Mobility: Design and Control with Integrated Dual Rotational DOFs |
|
Yuan, Peikang | Tianjin University |
Sun, Changchao | Tianjin University |
Chang, Xiang | Tianjin University |
Zhang, Xu | Tianjin University |
Kang, Rongjie | Tianjin University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Modeling, Control, and Learning for Soft Robots
Abstract: Continuum robots, known for their compliance in unstructured environments, face limitations due to the lack of rotational degrees of freedom (DOFs) about the backbone. This prevents them from compensating for undesired torsional deformation and performing 6-DOF control of the end-effector, thereby restricting their mobility. This paper presents a continuum robot with integrated dual rotational DOFs. One is integrated at the arm base to compensate for torsional deformation caused by external loads, while the other, located at the arm tip, enables full 6-DOF control of the end-effector. To control the robot, a screw-theory-based kinematic model and a kinematic control framework are proposed to enable real-time, simultaneous control of the end-effector's position and orientation. Experimental results show that the arm base rotational joint can fully compensate for undesired torsional deformation caused by a 1000 g payload. Thanks to the arm tip's DOF and the proposed kinematic control framework, the robot's end-effector can maintain a constant orientation while achieving open-loop path-tracking errors of only 3.3% of the arm's length (930 mm), and successfully executing valve-closing tasks with coordinated 6-DOF motion, demonstrating the robot's potential for industrial maintenance, human-robot interaction, and confined-space manipulation.
|
|
13:50-13:55, Paper TuBT26.7 | |
Bioinspired Directional Adhesives Enable High Stiffness Layer Jamming in Soft Actuators |
|
Wang, Zhihuan | Hohai University |
Xu, Linsen | Hohai University |
Wang, Mingming | Hohai University |
Ye, Liangzhi | Hohai University |
Zhang, Zhihua | Changzhou Institute of Advanced Manufacturing Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft actuators are inherently flexible and compliant, traits that enhance their adaptability to diverse environments and tasks. However, their low structural stiffness can lead to unpredictable and uncontrollable complex deformations when substantial force is required, thereby compromising their load-bearing capacity. This work proposes a novel layer jamming method that uses bioinspired directional adhesives as interlayer films to adjust the stiffness of soft actuators. The mechanical behavior of a single tilted fibril was analyzed using the energy method to determine the adhesion force of the adhesives. The directional adhesive was designed under the guidance of the adhesion force model. Testing under various loads and directions revealed that the tilted characteristic of the fibrils enhances the adhesion force in the grasping direction. A tunable stiffness actuator using directional adhesives (TSADA) was developed with these adhesives serving as interlayer films. The stiffness model of TSADA was derived by analyzing its axial compression force. The results of stiffness experiments indicate that the adhesives, serving as interlayer films, can adjust the stiffness in response to the applied load. TSADA was compared with other typical soft actuators to evaluate stiffness performance, and the results indicate that TSADA exhibits the highest stiffness and the widest tunable stiffness range. This demonstrates the superior performance of the directional adhesives as interlayer films in terms of stiffness adjustment.
|
|
13:55-14:00, Paper TuBT26.8 | |
Origami-Inspired Soft Gripper with Tunable Constant Force Output |
|
Ni, Zhenwei | National University of Singapore |
Xu, Chang | National University of Singapore |
Qin, Zhihang | National University of Singapore |
Zhang, Ceng | National University of Singapore |
Tang, Zhiqiang | National University of Singapore |
Wang, Peiyi | National University of Singapore (NUS) |
Laschi, Cecilia | National University of Singapore |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: Soft robotic grippers gently and safely manipulate delicate objects due to their inherent adaptability and softness. Limited by insufficient stiffness and imprecise force control, conventional soft grippers are not suitable for applications that require stable grasping force. In this work, we propose a soft gripper that utilizes an origami-inspired structure to achieve tunable constant force output over a wide strain range. The geometry of each taper panel is established to provide necessary parameters such as protrusion distance, taper angle, and crease thickness required for 3D modeling and FEA analysis. Simulations and experiments show that by optimizing these parameters, our design can achieve a tunable constant force output. Moreover, the origami-inspired soft gripper dynamically adapts to different shapes while preventing excessive forces, with potential applications in logistics, manufacturing, and other industrial settings that require stable and adaptive operations.
|
|
TuBT27 |
103C |
Service Robotics |
Regular Session |
Co-Chair: Zhu, Jihong | University of York |
|
13:20-13:25, Paper TuBT27.1 | |
A Whole-Body Unified Force-Impedance Control for Non-Holonomic Service Robots |
|
Forouhar, Moein | Technische Universität München |
Sadeghian, Hamid | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Koenig, Alexander | Technische Universität München |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Service Robotics
Abstract: In this paper, we extend the Unified Force-Impedance Control (UFIC) framework for the whole-body control of service mobile robots subject to nonholonomic constraints. This enables the robot to execute complex service tasks that demand both force and impedance control. The task space of the robot is defined as the pose of both end-effectors. Following the concept of UFIC, both impedance and force tracking commands are applied in the task space of the whole-body controller, with augmented energy tanks incorporated to ensure passivity. To enable smooth transitions between force tracking and impedance control—particularly in cases of contact loss—a shaping function is used to modulate the force control command. Additionally, the robot's redundancy is exploited to shape the posture, while satisfying joint limits, avoiding singularities, and preventing self-collisions between the arms. The effectiveness of the proposed whole-body UFIC controller is validated through simulations and real-world experiments with the service robot GARMI performing several daily tasks.
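The shaping function's role can be illustrated with a smoothstep ramp on the measured contact force: as contact is lost, the force command fades to zero and the controller degrades gracefully into pure impedance behavior. The specific ramp below is our choice, not UFIC's:

```python
import numpy as np

def shaped_force_command(f_des, f_meas, eps=0.5):
    """Smoothly fade the force-tracking command near contact loss.

    rho in [0, 1] scales the commanded force toward zero when the measured
    normal force f_meas drops below the threshold eps (illustrative value).
    """
    x = np.clip(f_meas / eps, 0.0, 1.0)
    rho = 3 * x ** 2 - 2 * x ** 3          # smoothstep: C1 at both ends
    return rho * f_des
```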
|
|
13:25-13:30, Paper TuBT27.2 | |
A Safe and Convenient Feeding Assistive Robotic Based on Multi-Modal Interaction Method |
|
Ding, Jiahui | Shenyang University of Technology |
Zhao, Donghui | Tsinghua University |
Liu, Baixue | Shenyang University of Technology |
Yang, Junyou | Shenyang University of Technology |
Wang, Shuoyu | Kochi University of Technology |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Fukuda, Toshio | Nagoya University |
Keywords: Service Robotics, Rehabilitation Robotics
Abstract: For individuals with limited mobility who are bedridden for extended periods, providing comfortable assisted feeding services is one of the most significant ways to enhance their quality of life. Despite the development of various feeding assistive robots, there remain limitations in terms of interaction convenience and safety, which restrict the overall feeding experience for users. To address these challenges, this study first establishes a feeding assistive robot system that integrates multimodal interaction methods. Furthermore, we propose an interactive feeding method that combines safety and comfort. This method utilizes visual recognition to detect the user's active meal intent, food selection preferences, and chewing status. Additionally, based on a Large Language Model (LLM), a monitoring thread is designed to conduct voice interactions regarding the user's ambiguous intentions, temporary changes in intent, emergency situations, and risky behaviors throughout the feeding process. Comprehensive experimental results demonstrate that the proposed multimodal interaction method aligns with natural eating patterns and incorporates both language and visual interaction, the two most convenient forms for users. It also couples force sensing with pose control during the feeding stages, thereby enhancing the flexibility and safety of the assisted feeding system.
|
|
13:30-13:35, Paper TuBT27.3 | |
Robot-Mediated Gesture-Based Memory Game for Older Adult Psychophysical Stimulation |
|
Pozzi, Luca | Politecnico Di Milano |
Braghin, Francesco | Politecnico Di Milano |
Gandolla, Marta | Politecnico Di Milano |
Keywords: Service Robotics
Abstract: The rapid growth of the aging population in developed countries makes healthcare an important social challenge. In this context, service robots can play a key role. This work presents a software application for a service robot (TIAGo, PAL Robotics) implementing a motor-cognitive game. The activity combines cognitive and physical stimulation, a design that is relatively uncommon in the literature. The embodied interaction design of the game distinguishes it from classical touchscreen-based games. In the game, the robot mimes letters with its arm, and the user has to recognize and then imitate them. The letter sequence increases in length each turn to train memory. User gestures are tracked using an ArUCo marker and classified via a neural network. The application was tested on 10 young subjects and 4 community-dwelling older adults (82.3 ± 3.5 years). Recognition accuracy reached 92.2% for young adults and 80.5% for older adults. Post-session questionnaires highlighted high engagement and perceived usefulness, especially among older users who appreciated the memory and physical training aspects. This pilot project demonstrates the potential of integrating service robots into eldercare to support both patients and caregivers.
|
|
13:35-13:40, Paper TuBT27.4 | |
Bimanual Robot-Assisted Dressing: A Spherical Coordinate-Based Strategy for Tight-Fitting Garments |
|
Zhao, Jian | University of York |
Lian, Yunlong | The University of York |
Tyrrell, Andy | University of York |
Gienger, Michael | Honda Research Institute Europe |
Zhu, Jihong | University of York |
Keywords: Service Robotics, Imitation Learning, Physically Assistive Devices
Abstract: Robot-assisted dressing is a popular but challenging topic in the field of robotic manipulation, offering significant potential to improve the quality of life for individuals with mobility limitations. Currently, the majority of research on robot-assisted dressing focuses on how to put on loose-fitting clothing, with little attention paid to tight garments. For the former, since the armscye is larger, a single robotic arm can usually complete the dressing task successfully. However, for the latter, dressing with a single robotic arm often fails due to the narrower armscye and the property of diminishing rigidity in the armscye, which eventually causes the armscye to get stuck. This paper proposes a bimanual dressing strategy suitable for tight-fitting clothing. To facilitate the encoding of dressing trajectories that adapt to different human arm postures, a spherical coordinate system for dressing is established. We use the azimuthal angle of this spherical coordinate system as a task-relevant feature for bimanual manipulation. Based on this new coordinate, we employ a Gaussian Mixture Model (GMM) and Gaussian Mixture Regression (GMR) for imitation learning of bimanual dressing trajectories, generating dressing strategies that adapt to different human arm postures. The effectiveness of the proposed method is validated through various experiments.
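The GMM/GMR step conditions a joint model over [azimuthal angle, trajectory point] on the current angle. A compact sketch using scikit-learn; the feature choice follows the abstract, but the code is our illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmr(phi, Y, n_components=6):
    """Learn trajectories conditioned on the azimuthal angle phi.

    Fits a joint GMM over [phi, y]; GMR then returns E[y | phi].
    phi: (N,) angles from demonstrations; Y: (N, d) trajectory points.
    """
    data = np.hstack([phi.reshape(-1, 1), Y])        # joint samples
    gmm = GaussianMixture(n_components).fit(data)

    def gmr(q):                                      # condition on phi = q
        mu, S, pi = gmm.means_, gmm.covariances_, gmm.weights_
        h = np.array([p * np.exp(-0.5 * (q - m[0]) ** 2 / s[0, 0])
                      / np.sqrt(s[0, 0]) for p, m, s in zip(pi, mu, S)])
        h /= h.sum()                                 # component weights
        ys = [m[1:] + s[1:, 0] / s[0, 0] * (q - m[0]) for m, s in zip(mu, S)]
        return sum(hk * yk for hk, yk in zip(h, ys))
    return gmr
```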
|
|
13:40-13:45, Paper TuBT27.5 | |
Target Localization and Following Based on LiDAR and Ultra-Wideband Ranging with Consideration of Target Visibility |
|
Guo, Lin | Southwest University of Science and Technology |
Liu, Ran | Nanyang Technological University |
Cao, Zhiqiang | Singapore University of Technology and Design |
Lau, Billy Pik Lik | Nanyang Technological University |
Tan, U-Xuan | Singapore University of Techonlogy and Design |
Yuen, Chau | Nanyang Technological University |
Keywords: Sensor-based Control, Human Detection and Tracking, Localization
Abstract: To perform target-following tasks in unknown environments, a robot must identify the target's position and plan an efficient path to reach it. Traditional LiDAR-based localization systems face challenges in distinguishing the target from objects with similar appearances. Meanwhile, existing target-following approaches often neglect target visibility during path planning, leading to target occlusion by obstacles and ultimately resulting in following failure. In this paper, we propose a sequence matching method for target-localization using LiDAR and Ultra-Wideband (UWB) ranging. We determine the position of the target by analyzing the similarities between UWB ranging sequence and LiDAR cluster trajectories. To achieve visibility-aware target-following, we incorporate a visibility objective function into the Dynamic Window Approach (DWA) to generate a following path that minimizes the risk of target loss. This function evaluates the target loss risk based on the positional relationships between the robot, the target, and the nearest obstacle to the target. Extensive experiments were conducted using both human and robot as targets. The results show that our approach achieves higher completion rates when compared to the target-following using traditional DWA.
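The visibility objective can be sketched as a penalty on candidate poses whose line of sight to the target passes near the obstacle closest to the target; the exact function and weights in the paper differ:

```python
import numpy as np

def visibility_cost(robot, target, obstacle, safe_margin=0.5):
    """Penalize candidate poses from which the target may be occluded.

    Uses the positional relationship between robot, target, and the
    obstacle nearest the target: cost grows as the obstacle approaches
    the robot-target line of sight. Returns 0 (clear) to 1 (blocked).
    """
    r, t, o = map(np.asarray, (robot, target, obstacle))
    los = t - r
    s = np.clip(np.dot(o - r, los) / np.dot(los, los), 0.0, 1.0)
    d = np.linalg.norm(o - (r + s * los))   # obstacle-to-sightline distance
    return max(0.0, safe_margin - d) / safe_margin

# Total DWA score per candidate velocity (weights are tuning parameters):
#   G = w_h*heading + w_c*clearance + w_v*velocity - w_o*visibility_cost(...)
```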
|
|
13:45-13:50, Paper TuBT27.6 | |
L3M+P: Lifelong Planning with Large Language Models |
|
Agarwal, Krish | University of Texas at Austin |
Jiang, Yuqian | University of Texas at Austin |
Hu, Jiaheng | UT Austin |
Liu, Bo | University of Texas at Austin |
Stone, Peter | The University of Texas at Austin |
Keywords: Task Planning, Service Robotics, Long term Interaction
Abstract: By combining classical planning methods with large language models (LLMs), recent research such as LLM+P has enabled agents to plan for general tasks given in natural language. However, scaling these methods to general-purpose service robots remains challenging: (1) classical planning algorithms generally require a detailed and consistent specification of the environment, which is not always readily available; and (2) existing frameworks mainly focus on isolated planning tasks, whereas robots are often meant to serve in long-term continuous deployments, and therefore must maintain a dynamic memory of the environment which can be updated with multi-modal inputs and extracted as planning knowledge for future tasks. To address these two issues, this paper introduces L3M+P (Lifelong LLM+P), a framework that uses an external knowledge graph as a representation of the world state. The graph can be updated from multiple sources of information, including sensory input and natural language interactions with humans. L3M+P enforces rules for the expected format of the absolute world state graph to maintain consistency between graph updates. At planning time, given a natural language description of a task, L3M+P retrieves context from the knowledge graph and generates a problem definition for classical planners. Evaluated on household robot simulators and on a real-world service robot, L3M+P achieves significant improvement over baseline methods both on accurately registering natural language state changes and on correctly generating plans, thanks to the knowledge graph retrieval and verification.
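Conceptually, planning time reduces to retrieving task-relevant facts from the graph and serializing them into a PDDL problem. A toy sketch, where `graph.retrieve` and the fact encoding are our assumptions about such a pipeline:

```python
def build_pddl_problem(graph, task_goal, domain="service-robot"):
    """Turn retrieved world-state facts into a PDDL problem string.

    graph.retrieve(task_goal) -> iterable of (subject, relation, object)
    triples relevant to the task (hypothetical interface).
    """
    facts = graph.retrieve(task_goal)
    init = " ".join(f"({rel} {s} {o})" for s, rel, o in facts)
    return (f"(define (problem auto) (:domain {domain})\n"
            f"  (:init {init})\n"
            f"  (:goal {task_goal}))")
```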
|
|
13:50-13:55, Paper TuBT27.7 | |
AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environments |
|
Sun, Nan | Tsinghua University |
Mao, Bo | Beijing University of Posts and Telecommunications |
Li, Yongchang | Yantai University |
Guo, Di | Beijing University of Posts and Telecommunications |
Liu, Huaping | Tsinghua University |
Keywords: Service Robotics, Autonomous Agents, Human-Centered Automation
Abstract: Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments. This results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in real-world scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of 4 specialized LLM agents, each dedicated to perception, planning, decision-making, and reflective review, facilitating advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant. We built a dataset of 210 real-world tasks to validate AssistantX, which includes instruction content and status information on whether relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. Our experiments demonstrate the effectiveness of the proposed architecture, showing that AssistantX can reactively respond to user instructions, actively adjust strategies to adapt to contingencies, and proactively seek assistance from humans to ensure successful task completion.
|
|
TuBT28 |
104 |
Marine Robotics 2 |
Regular Session |
Co-Chair: Navarro-Alarcon, David | The Hong Kong Polytechnic University |
|
13:20-13:25, Paper TuBT28.1 | |
ZS-Puffin: Design, Modeling and Implementation of an Unmanned Aerial-Aquatic Vehicle with Amphibious Wings |
|
Wang, Zhenjiang | Sun Yat-Sen University |
Jiang, Yunhua | Sun Yat-Sen University |
Zhen, Zikun | Sun Yat-Sen University |
Jiang, Yifan | Sun Yat-Sen University |
Tan, Yubin | Sun Yat-Sen University |
Wang, Wubin | Sun Yat-Sen University |
Keywords: Marine Robotics, Aerial Systems: Mechanics and Control, Biologically-Inspired Robots
Abstract: Unmanned aerial-aquatic vehicles (UAAVs) can operate both in the air and underwater, giving them broad application prospects. Inspired by the dual-function wings of puffins, we propose a UAAV with amphibious wings to address the challenge posed by medium differences on the vehicle's propulsion system. The amphibious wing, redesigned based on a fixed-wing structure, features a single degree of freedom in pitch and requires no additional components. It can generate lift in the air and function as a flapping wing for propulsion underwater, reducing disturbance to marine life and making it environmentally friendly. Additionally, an artificial central pattern generator (CPG) is introduced to enhance the smoothness of the flapping motion. This paper presents the prototype, design details, and practical implementation of this concept.
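CPGs of the kind mentioned above are commonly realized as limit-cycle oscillators. The following sketch shows a generic Hopf-oscillator CPG producing a smooth flapping angle; it illustrates the general idea only, and all gains and the mapping to wing angle are made-up values rather than ZS-Puffin's parameters:

```python
import numpy as np

# A generic Hopf-oscillator CPG of the kind often used to smooth flapping
# motion; parameter values are illustrative, not ZS-Puffin's actual ones.
mu, omega, alpha, dt = 1.0, 2*np.pi*1.5, 10.0, 1e-3  # amp^2, rad/s, gain, s
x, y = 0.1, 0.0  # small nonzero start so the limit cycle is attracting

angles = []
for _ in range(int(4.0/dt)):          # 4 s of flapping
    r2 = x*x + y*y
    dx = alpha*(mu - r2)*x - omega*y  # converges to a circle of radius sqrt(mu)
    dy = alpha*(mu - r2)*y + omega*x
    x, y = x + dx*dt, y + dy*dt
    angles.append(30.0*x)             # map oscillator state to wing angle (deg)

print(f"steady amplitude ~ {max(angles[-2000:]):.1f} deg")
```

Because the limit cycle is globally attracting, frequency or amplitude changes blend smoothly instead of producing discontinuous wing commands, which is the property the abstract attributes to the CPG.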
|
|
13:25-13:30, Paper TuBT28.2 | |
Enhancing Navigational Scene Understanding Using Integrated Language Models in Maritime Environments |
|
Shin, Yeongha | Korea Advanced Institute of Science and Technology |
Kim, Jinwhan | KAIST |
Keywords: Marine Robotics, Autonomous Vehicle Navigation, AI-Based Methods
Abstract: In this study, we introduce an innovative algorithm for enhanced navigational scene understanding in complex maritime environments by utilizing large language models (LLMs) and visual language models (VLMs) to achieve autonomous maritime situational awareness. The proposed algorithm interprets the meanings of various features and marks on detected objects in maritime contexts. By combining this information with radar and camera data, the algorithm generates cost maps for safe navigation. This approach offers two key benefits: (1) the ability to identify navigable areas considering obstacles, maritime marks, rules, and ship intentions, and (2) decision-making support based on reasoning, bridging the information gap between human operators and perception results. The performance of the proposed approach is demonstrated using a real-world dataset. Detailed information can be found at: https://yeongha-shin.github.io/vlmllm-maritime/
|
|
13:30-13:35, Paper TuBT28.3 | |
Distributional Reinforcement Learning Based Integrated Decision Making and Control for Autonomous Surface Vehicles |
|
Lin, Xi | Stevens Institute of Technology |
Szenher, Paul | Stevens Institute of Technology |
Huang, Yewei | Dartmouth College |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Marine Robotics, Autonomous Vehicle Navigation, Reinforcement Learning
Abstract: With the growing demands for Autonomous Surface Vehicles (ASVs) in recent years, the number of ASVs being deployed for various maritime missions is expected to increase rapidly in the near future. However, it is still challenging for ASVs to perform sensor-based autonomous navigation in obstacle-filled and congested waterways, where perception errors, closely gathered vehicles and limited maneuvering space near buoys may cause difficulties in following the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). To address these issues, we propose a novel Distributional Reinforcement Learning based navigation system that can work with onboard LiDAR and odometry sensors to generate arbitrary thrust commands in continuous action space. Comprehensive evaluations of the proposed system in high-fidelity Gazebo simulations show its ability to decide whether to follow COLREGs or take other beneficial actions based on the scenarios encountered, offering superior performance in navigation safety and efficiency compared to systems using state-of-the-art Distributional RL, non-Distributional RL and classical methods.
|
|
13:35-13:40, Paper TuBT28.4 | |
Development of an Efficient Stiffness Modulation Mechanism in Fish-Like Robots for Enhanced Swimming Performance |
|
Chao, Xu | City University of Hong Kong |
Yu, Bohan | City University of Hong Kong |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Jing, Xingjian | City University of Hong Kong |
Keywords: Marine Robotics, Biomimetics, Mechanism Design
Abstract: Drawing inspiration from the ability of fish to maintain efficient swimming over a wide range of speeds by tuning the stiffness of their tails, researchers have explored stiffness adjustment mechanisms in fish-like robots. Typically, existing mechanisms require extra actuators or power sources only for tuning stiffness, resulting in additional energy consumption and more complex structures. To address this, our study introduces an innovative fishtail featuring an online stiffness modulation mechanism that does not require additional actuators or power sources solely for stiffness adjustment. Through model-based simulations and experimental testing, we evaluated the effectiveness of the proposed method. The results demonstrate that the designed mechanism enables efficient swimming across a broader frequency range (0–4 Hz) compared to most servo-actuated platforms with adjustable stiffness reported in existing studies. The robot achieves a maximum average speed of 1.4 BL/s and a minimum cost of transport of 9.5 J/(m·kg).
|
|
13:40-13:45, Paper TuBT28.5 | |
ACSim: A Novel Acoustic Camera Simulator with Recursive Ray Tracing, Artifact Modeling and Ground Truthing |
|
Wang, Yusheng | The University of Tokyo |
Ji, Yonghoon | JAIST |
Tsuchiya, Hiroshi | Wakachiku Construction Co., Ltd |
Ota, Jun | The University of Tokyo |
Asama, Hajime | The University of Tokyo |
Yamashita, Atsushi | The University of Tokyo |
Keywords: Marine Robotics, Computer Vision for Other Robotic Applications, Simulation and Animation, Underwater Perception
Abstract: We introduce ACSim, a novel acoustic camera simulator that generates realistic sonar images using recursive ray tracing and sonar artifact modeling, providing various ground truth labels for benchmarking and learning purposes. Real-world underwater experiments are challenging, making realistic sonar simulation a crucial alternative. Existing simulators often lack realism or are limited to specific scenes, hindering sim-to-real applications in deep learning. Our simulator addresses this by incorporating complex physical phenomena and providing ground truth data for dataset creation. We employ recursive ray tracing to model multipath reflections in arbitrary scenes and propose physics-based shading for intensity computation. We introduce an anti-aliasing method, and model key artifacts such as rolling shutter distortions and cross-talk noise. We validated our simulator on several tasks, showing that models trained on synthetic images perform well on real data. Additionally, we developed a Blender add-on for an enhanced user interface and plan to release the simulator in the future.
|
|
13:45-13:50, Paper TuBT28.6 | |
Stability Enhancement in Variable Morphing Multi-Body AUVs for Underwater Structure Maintenance |
|
Kang, Shuai | Beijing University of Chemical Technology |
He, Yuan | Beijing University of Chemical Technology |
Zhang, Jin | Liaoning University |
Gao, Yuxi | Beijing University of Chemical Technology |
Bai, Yunfei | Shenyang Institute of Automation Chinese Academy of Sciences |
Li, Longchuan | Beijing University of Chemical Technology |
Keywords: Marine Robotics, Dynamics, Body Balancing
Abstract: This paper presents Variable Morphing Multi-Body AUVs (VMMAUVs) designed for underwater structure maintenance. These robots are capable of dynamically adjusting their structure to adapt to varying operational scenarios. The study explores two key stability mechanisms: buoyancy adjustment and aperture angle control, both aimed at optimizing the metacentric height. Through simulations and experiments with different buoyancy configurations and aperture angles, the results show that the proposed methods significantly enhance the system's stability, enabling faster convergence and better posture retention. The feasibility of the control strategies is validated through various numerical simulations, demonstrating the effectiveness of angle tracking control and buoyancy adjustment in maintaining stability under dynamic oceanic conditions.
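For a fully submerged vehicle, the metacentric height the authors optimize reduces to the vertical separation between the centers of buoyancy and gravity. A minimal numeric check of this quantity, using made-up hull values rather than the paper's, might look like:

```python
import numpy as np

# For a fully submerged body, the metacentric height reduces to the
# separation between the centers of buoyancy and gravity; buoyancy
# adjustment effectively raises z_B relative to z_G. All numbers below
# are illustrative, not from the paper.
rho, g = 1025.0, 9.81        # seawater density (kg/m^3), gravity (m/s^2)
V = 8.0e-3                   # displaced volume (m^3)
z_B, z_G = 0.12, 0.10        # heights of buoyancy/gravity centers (m)

GM = z_B - z_G               # > 0 means B above G: statically stable
theta = np.deg2rad(10.0)     # heel angle
restoring = rho * g * V * GM * np.sin(theta)  # righting moment (N*m)
print(f"GM = {GM*100:.1f} cm, righting moment at 10 deg = {restoring:.3f} N*m")
```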
|
|
13:50-13:55, Paper TuBT28.7 | |
Enhancing Joint Dynamics Modeling for Underwater Robotics through Stochastic Extension |
|
Ding, Mingxuan | Harbin Engineering University |
Wang, Gang | Harbin Engineering University |
Meng, Lingzhe | Harbin Engineering University |
Jixin, Wang | Harbin Engineering University |
Wang, Li-Quan | Harbin Engineering University |
Lu, Dake | Harbin Engineering University |
Wang, Junlong | Harbin Engineering University |
Yun, Feihong | Harbin Engineering University |
Jia, Peng | Harbin Engineering University |
Keywords: Marine Robotics, Dynamics, Contact Modeling
Abstract: Accurate joint dynamics models are essential for the compliance and robustness of robot control, especially for robots operating in complex underwater environments. To improve the precision of joint dynamics models, much research focuses on refining specific parameters or incorporating previously overlooked parameters through theoretical deductions and simulations. However, the effectiveness of these advancements can only be determined through empirical validation using the new model. This letter delineates a methodology that facilitates the assessment of potential avenues for enhancing the model, without necessitating prior theoretical derivation. Specifically, a methodology based on stochastic extension is proposed for evaluating directions of model improvement, applied to enhancing the LuGre model for underwater sealed joints. This approach employs the coefficient of variation in LuGre model parameters to assess the direction of model enhancement, with the comparison of coefficients of variation before and after improvement elucidating the superiority of the enhancements. Experimental outcomes corroborate that the LuGre model, refined using this evaluative technique, can precisely estimate friction forces across diverse typical conditions in underwater joint applications. The sealed joints utilizing the improved model demonstrated enhanced response times and precision in underwater environments.
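For readers unfamiliar with the baseline being extended, the LuGre friction model and the coefficient-of-variation criterion can be sketched in a few lines. All parameter values below are illustrative, and the "fitted parameter samples" are synthetic stand-ins for repeated identification runs:

```python
import numpy as np

# Minimal LuGre friction simulation plus the coefficient-of-variation
# criterion described above; all parameter values are illustrative, not
# identified from the sealed joints.
sigma0, sigma1, sigma2 = 1e4, 80.0, 2.0   # bristle stiffness/damping, viscous
Fc, Fs, vs = 5.0, 8.0, 0.01               # Coulomb, stiction, Stribeck velocity

def g(v):  # Stribeck curve, divided by sigma0 as in the standard form
    return (Fc + (Fs - Fc) * np.exp(-(v / vs) ** 2)) / sigma0

def lugre_force(v_traj, dt=1e-4):
    z, F = 0.0, []                        # z: internal bristle deflection
    for v in v_traj:
        zdot = v - abs(v) * z / g(v)
        z += zdot * dt
        F.append(sigma0 * z + sigma1 * zdot + sigma2 * v)
    return np.array(F)

t = np.arange(0.0, 2.0, 1e-4)
F = lugre_force(0.05 * np.sin(2 * np.pi * t))
print(f"peak friction force: {F.max():.2f} N")

# Stochastic-extension idea: fit a parameter over many noisy runs and use
# its coefficient of variation (std/mean) to rank candidate refinements.
fitted_Fc = np.random.default_rng(0).normal(Fc, 0.4, size=100)  # synthetic fits
print(f"CV(Fc) = {fitted_Fc.std() / fitted_Fc.mean():.3f}")
```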
|
|
13:55-14:00, Paper TuBT28.8 | |
Nezha-Morphing: Design and Experiments of a Seabird-Inspired Hybrid Aerial Underwater Vehicle |
|
Aili, Muxierepu | Shanghai Jiao Tong University |
Song, Xuwang | Shanghai Jiao Tong University |
Wang, Yingqiang | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiao Tong University |
Keywords: Marine Robotics, Field Robots, Aerial Systems: Applications
Abstract: Hybrid aerial underwater vehicles (HAUVs) hold great promise but face challenges like air-water integration and high underwater resistance. This paper presents Nezha-Morphing, a bio-inspired HAUV that emulates seabirds’ adaptive wing extension and retraction mechanisms for efficient movement in both aerial and aquatic domains. It has a servo-driven foldable-arm mechanism and is made of high-strength, low-mass materials with a double-system architecture for aerial and underwater control. This paper details the mechanical and electronic system design, dynamic analysis, and experimental results of Nezha-Morphing. The experimental findings are highly impressive: underwater, the folded-arm configuration effectively reduces resistance, enabling a maximum speed of 0.620 m/s and achieving a high average acceleration, significantly enhancing motion efficiency. In the air, with its arms unfolded, the vehicle exhibits exceptional stability and strong wind resistance, maintaining steady flight even under level-4 wind conditions. Moreover, it completes the water-to-air cross-domain transition in just 1.5 seconds. Nezha-Morphing successfully integrates flight stability, cross-domain adaptability, and hydrodynamic efficiency, showcasing substantial potential for diverse applications.
|
|
TuBT29 |
105 |
SLAM 2 |
Regular Session |
|
13:20-13:25, Paper TuBT29.1 | |
Bag-Of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method under Perceptual Aliasing |
|
Fei, Xiang | Carnegie Mellon University, Robotics Institute |
Tian, Yu | Carnegie Mellon University |
Choset, Howie | Carnegie Mellon University |
Li, Lu | Carnegie Mellon University |
Keywords: SLAM, Localization, Computer Vision for Automation
Abstract: Loop closure is critical in Simultaneous Localization and Mapping (SLAM) systems to reduce accumulative drift and ensure global mapping consistency. However, conventional methods struggle in perceptually aliased environments, such as narrow pipes, due to vector quantization, feature sparsity, and repetitive textures, while existing solutions often incur high computational costs. This paper presents Bag-of-Word-Groups (BoWG), a novel loop closure detection method that achieves superior precision-recall, robustness, and computational efficiency. The core innovation lies in the introduction of word groups, which capture the spatial co-occurrence and proximity of visual words to construct an online dictionary. Additionally, drawing inspiration from probabilistic transition models, we incorporate temporal consistency directly into similarity computation with an adaptive scheme, substantially improving precision-recall performance. The method is further strengthened by a feature distribution analysis module and dedicated post-verification mechanisms. To evaluate the effectiveness of our method, we conduct experiments on both public datasets and a confined-pipe dataset we constructed. Results demonstrate that BoWG surpasses state-of-the-art methods, including both traditional and learning-based approaches, in terms of precision-recall and computational efficiency. Our approach also exhibits excellent scalability, achieving an average processing time of 16 ms per image across 17,565 images in the Bicocca25b dataset. The source code is available at: https://github.com/EdgarFx/BoWG.
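The temporal-consistency idea is the easiest piece to illustrate in isolation. The toy sketch below blends each candidate's appearance score with those of its temporal neighbors; the word-group dictionary itself is omitted, and the blending weight is an arbitrary choice, not BoWG's adaptive scheme:

```python
import numpy as np

# Toy sketch of temporal consistency in loop-closure scoring: raw
# appearance similarity is blended with the scores of neighboring
# candidate frames, mimicking a probabilistic transition prior.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def temporally_consistent_scores(query, db, beta=0.4):
    raw = np.array([cosine(query, d) for d in db])
    smoothed = raw.copy()
    smoothed[1:-1] = (1-beta)*raw[1:-1] + beta*0.5*(raw[:-2] + raw[2:])
    return smoothed  # a true loop closure keeps support from its neighbors

rng = np.random.default_rng(1)
db = [rng.random(64) for _ in range(50)]     # stand-in frame descriptors
query = db[20] + 0.1*rng.random(64)          # revisit of frame 20
scores = temporally_consistent_scores(query, db)
print("best match:", int(scores.argmax()))
```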
|
|
13:25-13:30, Paper TuBT29.2 | |
GSO-SLAM: Robust Monocular SLAM with Global Structure Optimization |
|
Jiang, Bingzheng | Huazhong University of Science and Technology |
Wang, Jiayuan | Wuhan University of Technology |
Zhu, Lijun | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Keywords: SLAM, Localization, Mapping
Abstract: This paper presents a robust monocular visual SLAM system that simultaneously utilizes point, line, and vanishing point features for accurate camera pose estimation and mapping. To address the critical challenge of achieving reliable localization in low-texture environments, where traditional point-based systems often fail due to insufficient visual features, we introduce a novel approach leveraging Global Primitives structural information to improve the system's robustness and accuracy. Our key innovation lies in constructing vanishing points from line features and proposing a weighted fusion strategy to build Global Primitives in the world coordinate system. This strategy associates multiple frames with non-overlapping regions and formulates a multi-frame reprojection error optimization, significantly improving tracking accuracy in texture-scarce scenarios. Evaluations on various datasets show that our system outperforms state-of-the-art methods in trajectory precision, particularly in challenging environments.
|
|
13:30-13:35, Paper TuBT29.3 | |
A Novel LiDAR Odometry Based on Surface Distributed Point Feature with Dual Feature Fusion |
|
Li, Jianjie | Institute of Automation, Chinese Academy of Sciences |
Guan, Peiyu | CASIA |
Gong, Xurong | Institute of Automation, Chinese Academy of Sciences |
Liu, Zhicheng | Institute of Automation, Chinese Academy of Sciences |
Cao, Zhiqiang | Institute of Automation, Chinese Academy of Sciences |
Keywords: SLAM, Localization, Mapping
Abstract: LiDAR odometry has gained popularity due to the LiDAR sensor’s accurate depth measurement and robustness to varying illumination conditions. The feature-based methods achieve the advantage of efficiency through feature extraction, while the distribution-based ones attain better accuracy by modeling the point cloud as distributions. Combining the strengths of both methods is anticipated to yield superior performance. However, existing combination schemes typically extract and process features and distributions separately, and such a loosely integrated approach cannot fully leverage their complementarity. To address this problem, we propose a novel LiDAR odometry method based on surface distributed point (SDP) feature with dual feature fusion. Specifically, the SDP feature is introduced to tightly integrate features and distributions, facilitating efficient feature association and map maintenance. On this basis, the associated source and target features are then effectively integrated through dual feature fusion to form the dual-fusion (DF) associated plane. This plane serves as the basis for constructing the point-to-DF associated plane constraint for pose optimization. As a result, the local planar structure is more accurately reflected, thereby enhancing the accuracy of pose estimation. The SDP feature and the resultant constraint are employed in both scan-to-map matching and fixed-lag smoothing, which are hierarchically organized to achieve accurate pose estimation. Experiments on KITTI dataset and large-scale KITTI-360 dataset demonstrate the effectiveness of the proposed method.
|
|
13:35-13:40, Paper TuBT29.4 | |
LI-SLAM: Lightweight and Incremental Semantic Visual Localization and Mapping for Autonomous Valet Parking |
|
Wu, Huateng | Southeast University |
Yang, Tingran | Zhejiang University |
Zhao, Song | Deepblue College |
Keywords: SLAM, Localization, Mapping
Abstract: Autonomous valet parking enables vehicles to identify parking spaces and park without human intervention, with accurate localization being a fundamental prerequisite. Existing methods typically rely on visual feature maps or semantic maps for localization. However, visual feature maps often lack robustness in underground parking environments due to similar structures, weak textures, and fluctuating lighting conditions. Semantic maps require complex post-processing and suffer from heterogeneous data association problems. In this paper, we propose to directly regress the semantic corner points to build the semantic map. Furthermore, we introduce a novel map update and merge method, which is unaffected by environmental and temporal changes, allowing continuous update and refinement of the map. To establish a global semantic map, we use four fisheye cameras to synthesize surround-view images, combined with an IMU (Inertial Measurement Unit) and wheel encoders. Real-world experiments validate the localization accuracy of the proposed system. The experimental results demonstrate the robustness and practicability of the proposed system.
|
|
13:40-13:45, Paper TuBT29.5 | |
Convex Hull-Based Algebraic Constraint for Visual Quadric SLAM |
|
Yu, Xiaolong | Tongji University |
Zhao, Junqiao | Tongji University |
Song, Shuangfu | Tongji University |
Zhu, Zhongyang | Tongji University |
Yuan, Zihan | Tongji University |
Ye, Chen | Tongji University |
Feng, Tiantian | Tongji University |
Yu, Qiankun | SAIC Intelligent Technology (Shanghai) Co. Ltd |
Keywords: SLAM, Localization, Mapping
Abstract: Using Quadrics as the object representation has the benefits of both generality and closed-form projection derivation between image and world spaces. Although numerous constraints have been proposed for dual quadric reconstruction, we found that many of them are imprecise and provide minimal improvements to localization. After scrutinizing the existing constraints, we introduce a concise yet more precise convex hull-based algebraic constraint for object landmarks, which is applied to object reconstruction, frontend pose estimation, and backend bundle adjustment. This constraint is designed to fully leverage precise semantic segmentation, effectively mitigating mismatches between complex-shaped object contours and dual quadrics. Experiments on public datasets demonstrate that our approach is applicable to both monocular and RGB-D SLAM and achieves better object mapping and localization than existing quadric SLAM methods.
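The closed-form projection that quadric SLAM relies on is compact enough to show directly: a dual quadric Q* projects to a dual conic via C* = P Q* P^T. The sketch below uses made-up camera and ellipsoid parameters and shows only this projection step, not the paper's convex-hull constraint:

```python
import numpy as np

# Projection of a dual quadric to a dual conic, C* = P Q* P^T, with
# illustrative camera intrinsics and ellipsoid; only the projection step
# is shown, not the paper's convex-hull constraint.
a, b, c = 0.3, 0.2, 0.5                     # ellipsoid semi-axes (m)
center = np.array([0.0, 0.0, 4.0])          # 4 m in front of the camera

Q_local = np.diag([1/a**2, 1/b**2, 1/c**2, -1.0])   # x^T Q x = 0 at origin
T = np.eye(4); T[:3, 3] = center                     # local -> world
Ti = np.linalg.inv(T)
Q_world = Ti.T @ Q_local @ Ti                        # quadric in world frame
Q_star = np.linalg.inv(Q_world)                      # dual quadric (up to scale)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # camera at world origin

C_star = P @ Q_star @ P.T                            # dual conic in the image
C = np.linalg.inv(C_star)                            # conic: u^T C u = 0
print(np.round(C / C[2, 2], 6))                      # normalized image ellipse
```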
|
|
13:45-13:50, Paper TuBT29.6 | |
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU |
|
Zhang, Yuhao | University of Cambridge |
Bujanca, Mihai | University of Manchester |
Luján, Mikel | University of Manchester |
Keywords: SLAM, Localization, Mapping
Abstract: Many existing visual SLAM methods can achieve high localization accuracy in dynamic environments by leveraging deep learning to mask moving objects. However, these methods incur significant computational overhead as the camera tracking needs to wait for the deep neural network to generate a mask for each frame, and they typically require GPUs for real-time operation, which restricts their practicality in real-world robotic applications. Therefore, this paper proposes a real-time dynamic SLAM system that runs exclusively on a CPU. Our approach incorporates a mask propagation mechanism that decouples camera tracking and deep learning-based masking for each frame. We also introduce a hybrid tracking strategy that integrates ORB features with optical flow methods, enhancing both robustness and efficiency by selectively allocating computational resources to input frames. Compared to previous methods, our system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 60 FPS on a laptop CPU. These results demonstrate the feasibility of utilizing deep learning for dynamic SLAM without GPU support. Since most existing dynamic SLAM systems are not open-source, we make our code publicly available at: https://github.com/yuhaozhang7/NGD-SLAM
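The mask-propagation idea can be illustrated with a toy backward warp: the previous frame's dynamic-object mask is carried forward along optical flow while the network runs asynchronously. The flow field below is synthetic and the function is our own sketch, not NGD-SLAM's API:

```python
import numpy as np

# Sketch of mask propagation: warp the previous frame's dynamic-object
# mask forward along a dense flow field (here a uniform 3-px right shift).

def propagate_mask(mask, flow):
    """Warp a binary mask forward along a dense flow field (nearest lookup)."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return mask[src_y, src_x]  # backward warp: sample where each pixel came from

mask = np.zeros((60, 80), dtype=np.uint8)
mask[20:40, 10:30] = 1                             # moving object in frame t-1
flow = np.zeros((60, 80, 2)); flow[..., 0] = 3.0   # object moved 3 px right
warped = propagate_mask(mask, flow)
print("new mask column span:", np.where(warped.any(0))[0][[0, -1]])
```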
|
|
13:50-13:55, Paper TuBT29.7 | |
Fusion Scene Context: Robust and Efficient LiDAR Place Recognition in Changing Environments |
|
Cao, Fengkui | Shenyang Institute of Automation |
Jia, Yanpeng | Shenyang Institute of Automation, University of Chinese Academy |
Wang, Ting | Robotics Lab., Shenyang Institute of Automation, CAS |
Wang, Hesheng | Shanghai Jiao Tong University |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: SLAM, Localization, Range Sensing
Abstract: Place recognition is an important component for autonomous robot navigation. Many existing LiDAR-based place recognition methods encode the structural information of 3D LiDAR data into 2D image representations. However, most of these intermediates only exploit the projection in a single view, ignoring a great amount of useful information. In this paper, a compact fusion-view image representation of the LiDAR point cloud is proposed to extract important structural information from different views. Our proposed method generates such fusion-view images using the corresponding geometric information among points, highlighting the edges of objects. It then extracts texture features encoding the shapes and layouts of scene elements into global descriptors, where regional features are designed to adapt to local discrepancies caused by seasonal changes. Extensive experiments on the Oxford RobotCar, NCLT, UTBM datasets and our cross-season dataset validate the proposed method and demonstrate its superior generalization performance under different LiDAR sensors and season shifts. Moreover, our proposed method can operate online with a single CPU, making it suitable for resource-limited real robot platforms. To benefit the community, our source code and cross-season dataset are available at https://github.com/Cao-DUT/MVSC.
|
|
13:55-14:00, Paper TuBT29.8 | |
Edge-Assisted Multi-Robot Visual-Inertial SLAM with Efficient Communication (I) |
|
Liu, Xin | Yanshan University |
Wen, Shuhuan | Yanshan University |
Zhao, Jing | Yanshan University |
Z. Qiu, Tony | University of Alberta |
Zhang, Hong | Southern University of Science and Technology |
Keywords: SLAM
Abstract: The integration of cloud computing and edge computing is an effective way to achieve globally consistent and real-time multi-robot Simultaneous Localization and Mapping (SLAM). Cloud computing effectively solves the problem of limited computing, communication and storage capacity of terminal equipment. However, limited bandwidth and extremely long communication links between terminal devices and the cloud result in serious performance degradation of multi-robot SLAM systems. To reduce the computational cost of feature tracking and improve the real-time performance of the robot, a lightweight SLAM method of optical flow tracking based on pyramid IMU prediction is proposed. On this basis, a centralized multi-robot SLAM system based on a robot-edge-cloud layered architecture is proposed to realize real-time collaborative SLAM. It avoids the problems of limited on-board computing resources and the low execution efficiency of a single robot. In this framework, only the feature points and keyframe descriptors are transmitted, with lossless encoding and compression applied to realize real-time remote information transmission under limited bandwidth resources. This design reduces the actual bandwidth occupied during data transmission without sacrificing SLAM accuracy to compression. Experimental verification on the EuRoC dataset shows that, compared with the most advanced local feature compression methods, our method achieves feature transmission with a lower data volume, and compared with advanced centralized multi-robot SLAM schemes, it achieves the same or better positioning accuracy under a low computational load.
|
|
TuBT30 |
106 |
Aerial Perception 2 |
Regular Session |
|
13:20-13:25, Paper TuBT30.1 | |
Multimodal Obstacle Detection and Adaptive Neural Control for Autonomous Drones |
|
Phetpoon, Theerawath | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Jaiton, Vatsanai | Vidyasirimedhi Institute of Science and Technology |
Rothomphiwat, Kongkiat | VidyasirimedhiInstitute of Science and Technology (VISTEC) |
Manawakul, Matas | VISTEC |
Chirathanyanon, Pannapat | AI and Robotics Ventures Company Limited |
Ritmetee, Pawarit | AI and Robotics Ventures Company Limited |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Neural and Fuzzy Control
Abstract: Achieving reliable navigation for autonomous drones in complex environments remains a significant challenge, particularly in low-light conditions. To address this, we propose an integrated multimodal obstacle detection and adaptive neural control system with online learning to enable drones to navigate autonomously both during the day and at night. The proposed multimodal obstacle detection system integrates two ranging LiDAR sensors and a depth camera with sensory processing techniques, including the iKD-Tree interested area search algorithm, sensor fusion, and neuro-obstacle directional feature extraction. This ensures robust obstacle detection across various conditions without requiring sensor reconfiguration. The adaptive neural control system applies Hebbian correlation-based learning and synaptic scaling plasticity principles to continuously update the control weights, allowing the drone to dynamically adapt its speed and maneuver around obstacles in real time. We evaluate the system’s performance in both simulation and real-world environments, demonstrating its effectiveness under diverse lighting conditions and obstacle types.
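The learning rules named above are classical and easy to sketch: a Hebbian correlation update grows weights for co-active inputs, while synaptic scaling renormalizes them for stability. The sizes and gains below are illustrative, not the drone controller's actual parameters:

```python
import numpy as np

# Minimal sketch of Hebbian correlation learning with synaptic scaling,
# the rule family named in the abstract; all values are illustrative.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=8)        # weights from 8 obstacle features
eta, target_norm = 0.05, 1.0

for _ in range(500):
    x = rng.random(8)                 # obstacle-direction features
    y = float(w @ x)                  # neuron output (steering command)
    w += eta * x * y                  # Hebb: strengthen correlated inputs
    w *= target_norm / np.linalg.norm(w)   # synaptic scaling (homeostasis)

print("final weights:", np.round(w, 3))
```

The scaling step is what keeps pure Hebbian growth bounded, which is the "plasticity principles" pairing the abstract refers to.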
|
|
13:25-13:30, Paper TuBT30.2 | |
TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-Based Scenes |
|
Maxey, Christopher | University of Maryland, Army Research Laboratory |
Choi, Jaehoon | University of Maryland, College Park |
Lee, Yonghan | University of Maryland |
Kwon, Heesung | DEVCOM Army Research Laboratory |
Lee, Hyungtae | US Army Research Laboratory |
Manocha, Dinesh | University of Maryland |
Keywords: Aerial Systems: Perception and Autonomy, Human Detection and Tracking
Abstract: In this paper, we present a new approach to improve the neural rendering fidelity of in-the-wild unmanned aerial vehicle (UAV)-based scenes. Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions in particular. We propose an extension of K-Planes Neural Radiance Field (NeRF), wherein our algorithm stores a set of tiered high dimensional feature vectors. The tiered feature vectors are generated to effectively model conceptual information about a scene as well as to be processed by an image decoder that transforms output feature maps into RGB images. Our technique leverages the information among both static and dynamic objects within a scene and is able to capture salient scene attributes of high altitude videos. We evaluate its performance on challenging datasets, including Okutama Action and UG2, and observe considerable improvement in accuracy over state-of-the-art neural rendering methods.
|
|
13:30-13:35, Paper TuBT30.3 | |
UAV Video Deblurring Via Motion-Aware Diffusion: A Path to Robust Target Detection |
|
Hu, Zhiqiang | Tokyo University of Science |
Huang, Shouren | Tokyo University of Science |
Ishikawa, Masatoshi | Tokyo University of Science |
Keywords: Aerial Systems: Perception and Autonomy, Object Detection, Segmentation and Categorization
Abstract: Unmanned Aerial Vehicles (UAVs) play a crucial role in scenarios ranging from disaster response to traffic surveillance. However, aerial video footage often suffers from severe motion blur due to rapid flight maneuvers, vibrations, and camera panning, which can significantly degrade downstream tasks such as target detection. In this paper, we tackle UAV video deblurring via a motion-aware diffusion framework tailored for high-dynamic environments. To reduce computational cost, we first propose an Adaptive Latent Scale Selector that dynamically adjusts the latent space resolution according to the intensity of UAV motion, thus balancing detail preservation with efficiency. We then introduce a Multi-Frame Alignment and Learnable Gating module to warp and gate preceding frames, enabling the model to fuse only relevant temporal information and suppress misaligned or uninformative features. Our method enforces temporal consistency and effectively recovers sharp details from UAV video streams. Extensive experiments on real UAV benchmarks demonstrate that our method not only yields superior deblurring performance but also significantly boosts target detection accuracy, making it highly applicable to robust aerial vision tasks.
|
|
13:35-13:40, Paper TuBT30.4 | |
Three-DOF Controlled Flight in Palm-Scale Micro Robotic Blimp Driven by Flapping Wings |
|
Chen, Jie | National University of Defense Technology |
Lu, Xiang | National University of Defense Technology |
Wu, Yulie | National University of Defense Technology |
Chen, Yang | National University of Defense Technology |
Xiao, Dingbang | National University of Defense Technology |
Wu, Xuezhong | National University of Defense Technology |
Keywords: Aerial Systems: Perception and Autonomy, Biomimetics
Abstract: Micro blimps exhibit significant potential for applications in environmental monitoring and disaster rescue. Nonetheless, traditional propulsion methods for micro blimps encounter challenges such as complex mechanical structures, intricate attitude control, and large volumes. This paper presents a novel compact and lightweight bio-inspired flapping-wing-driven micro robotic blimp actuated by piezoelectric (PZT) elements, featuring a simplified structure and achieving three-degree-of-freedom (DOF) motion control with only two flapping-wing thruster units. We present a high-voltage drive-sense-control circuit and an adaptive control strategy, enabling wireless remote control, onboard attitude sensing, and closed-loop yaw control. The proposed micro robotic blimp, powered by an onboard battery, measures 15 cm along its major axis, weighs 1.53 g, achieves a maneuvering speed of 17 cm/s, and reaches an angular velocity of 12°/s with a yaw angle control accuracy of 0.5°. As the smallest and lightest known self-powered micro blimp capable of stable yaw control, the platform demonstrates excellent endurance and environmental stealth characteristics and advances the design of micro aerial vehicles by offering a novel and efficient approach.
|
|
13:40-13:45, Paper TuBT30.5 | |
VisLanding: Monocular 3D Perception for UAV Safe Landing Via Depth-Normal Synergy |
|
Tan, Zhuoyue | Xiamen University |
He, Boyong | Xiamen University |
Ji, Yuxiang | Xiamen University |
Wu, Liaoni | Xiamen University |
Keywords: Aerial Systems: Perception and Autonomy, Vision-Based Navigation, Deep Learning Methods
Abstract: This paper presents VisLanding, a monocular 3D perception-based framework for safe UAV (Unmanned Aerial Vehicle) landing. Addressing the core challenge of autonomous UAV landing in complex and unknown environments, this study innovatively leverages the depth-normal synergy prediction capabilities of the Metric3D V2 model to construct an end-to-end safe landing zones (SLZ) estimation framework. By introducing a safe zone segmentation branch, we transform the landing zone estimation task into a binary semantic segmentation problem. The model is fine-tuned and annotated using the WildUAV dataset from a UAV perspective, while a cross-domain evaluation dataset is constructed to validate the model's robustness. Experimental results demonstrate that VisLanding significantly enhances the accuracy of safe zone identification through a depth-normal joint optimization mechanism, while retaining the zero-shot generalization advantages of Metric3D V2. The proposed method exhibits superior generalization and robustness in cross-domain testing compared to other approaches. Furthermore, it enables the estimation of landing zone area by integrating predicted depth and normal information, providing critical decision-making support for practical applications.
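A stripped-down version of the depth-normal synergy for landing-zone labeling can be written as two per-pixel tests, slope and roughness. The synthetic scene and thresholds below are our own illustration, not VisLanding's trained segmentation branch:

```python
import numpy as np

# Toy depth-normal landing-zone test: a pixel is "safe" if its surface
# normal is near-vertical and its local depth is smooth. The scene and
# thresholds are illustrative only.
h, w = 64, 64
depth = np.full((h, w), 20.0)
depth[:, 32:] += np.linspace(0, 6, 32)            # sloped half of the scene
normals = np.zeros((h, w, 3)); normals[..., 2] = 1.0
normals[:, 32:] = [0.0, 0.42, 0.91]               # tilted normals on the slope

up = np.array([0.0, 0.0, 1.0])
flat = (normals @ up) > np.cos(np.deg2rad(15))    # slope below 15 degrees
dzdx = np.abs(np.gradient(depth, axis=1))
smooth = dzdx < 0.1                                # low local roughness
safe = flat & smooth
print(f"safe-pixel ratio: {safe.mean():.2f}")      # ~ the flat half
```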
|
|
13:45-13:50, Paper TuBT30.6 | |
Perception-Aware Planning for Quadrotor Flight in Unknown and Feature-Limited Environments |
|
Yu, Chenxin | Harbin Institute of Technology, Shenzhen |
Lu, Zihong | Harbin Institute of Technology, Shenzhen |
Mei, Jie | Harbin Insitute of Technology, Shenzhen |
Zhou, Boyu | Southern University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, Motion and Path Planning, Vision-Based Navigation
Abstract: Various studies on perception-aware planning have been proposed to enhance the state estimation accuracy of quadrotors in visually degraded environments. However, many existing methods heavily rely on prior environmental knowledge and face significant limitations in previously unknown environments with sparse localization features, which greatly limits their practical application. In this paper, we present a perception-aware planning method for quadrotor flight in unknown and feature-limited environments that properly allocates perception resources among environmental information during navigation. We introduce a viewpoint transition graph that allows for the adaptive selection of local target viewpoints, which guide the quadrotor to efficiently navigate to the goal while maintaining sufficient localizability and without being trapped in feature-limited regions. During the local planning, a novel yaw trajectory generation method that simultaneously considers exploration capability and localizability is presented. It constructs a localizable corridor via feature co-visibility evaluation to ensure localization robustness in a computationally efficient way. Through validations conducted in both simulation and real-world experiments, we demonstrate the feasibility and real-time performance of the proposed method. The source code will be released to benefit the community.
|
|
13:50-13:55, Paper TuBT30.7 | |
FASTEX: Fast UAV Exploration in Large-Scale Environments Using Dynamically Expanding Grids and Coverage Paths |
|
Zhang, Xiaoxun | Sun Yat-Sen University |
Duan, Peiming | Sun Yat-Sen University |
Zheng, Lanxiang | Sun Yat-Sen University |
Huang, Junlong | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Motion and Path Planning
Abstract: Autonomous exploration is essential for the effective deployment of quadrotors in various applications. However, existing approaches face significant challenges in large-scale environments, particularly in balancing global coverage efficiency and computational overhead. These limitations often result in poor adaptability to environmental changes and redundant revisits to previously explored areas, reducing overall exploration efficiency. To address these issues, we propose FASTEX, a fast UAV exploration framework designed for large-scale environments, using dynamically expanding grids and coverage paths to improve exploration efficiency. To support efficient exploration planning in large-scale scenarios, we introduce an efficient environment preprocessing method, including a dynamic grid expansion mechanism and a sparse roadmap. Furthermore, we present a hierarchical exploration planning framework that integrates an incremental global planner with a local planner, ensuring high coverage and computational efficiency. Extensive simulation tests demonstrate the superior performance and robustness of the proposed method compared to the state-of-the-art methods. In addition, we conduct various real-world experiments to validate the feasibility of our autonomous exploration system.
|
|
TuCT1 |
401 |
Award Finalists 3 |
Regular Session |
Chair: Laschi, Cecilia | National University of Singapore |
|
15:00-15:05, Paper TuCT1.1 | |
AgiBot World Colosseo: Large-Scale Manipulation Platform for Scalable and Intelligent Embodied Systems |
|
Bu, Qingwen | The University of Hong Kong |
Ren, Guanghui | Agibot |
Liu, Chiming | Agibot |
Shi, Modi | Beihang University |
Xie, Chengen | Shanghai Jiaotong University |
He, Xindong | AgiBot |
Song, Jianheng | AgiBot |
Lu, Yuxiang | The University of Hong Kong |
Feng, Siyuan | Agibot |
Mu, Yao | The University of Hong Kong |
Zhao, Chengyue | AgiBot |
Yang, Shukai | Shanghai AgiBot Innovation Technology Co., Ltd |
Xiong, Ziyu | AgiBot |
Huang, Xu | Agibot |
Wei, Dafeng | Agibot |
Xu, Guo | AgiBot |
Liu, Yi | Beihang University |
Jiang, Yuxin | AgiBot |
Cui, Xiuqi | FuZhou University |
Ruan, Cheng | AGIBOT |
Zeng, Jia | AGIBOT |
Yang, Lei | AGIBOT |
Chen, Li | Shanghai AI Laboratory, the University of Hong Kong |
Ding, Yan | SUNY Binghamton |
Cai, Jisong | Shanghai AI Laboratory |
Sima, Chonghao | Purdue University |
Wang, Huijie | OpenDriveLab |
Shen, Yongjian | AgiBot |
Li, Jialu | AgiBot |
Jing, Cheng | AgiBot |
Shi, Mingkang | AgiBot |
Zhang, Chi | AgiBot |
Zhang, Qinglin | AgiBot |
CunBiao, Yang | AgiBot |
Wang, Wenhao | University of Pennsylvania |
Hu, Xuan | AgiBot |
Zhao, Bin | Northwestern Polytechnical University |
Qiao, Yu | Shenzhen Institutes of Advanced Technology, ChineseAcademyof Sci |
Yan, Junchi | Shanghai Jiao Tong University |
Luo, Jianlan | UC Berkeley |
Luo, Ping | The University of Hong Kong |
Yao, Maoqing | AgiBot |
Li, Hongyang | The University of Hong Kong |
Keywords: Deep Learning in Grasping and Manipulation, Bimanual Manipulation, Data Sets for Robot Learning
Abstract: We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loop verification, AgiBot World guarantees high-quality and diverse data distribution. It is extensible from grippers to dexterous hands and visuo-tactile sensors for fine-grained skill acquisition. Building on top of data, we introduce Genie Operator-1 (GO-1), a novel generalist policy that leverages latent action representations to maximize data utilization, demonstrating predictable performance scaling with increased data volume. Policies pre-trained on our dataset achieve an average performance improvement of 30% over those trained on Open X-Embodiment, in both in-domain and out-of-distribution scenarios. GO-1 exhibits exceptional capability in real-world dexterous and long-horizon tasks, achieving over 60% success rate on complex tasks and outperforming the prior RDT approach by 32%. By open-sourcing the dataset, tools, and models, we aim to democratize access to large-scale, high-quality robot data, advancing the pursuit of scalable and general-purpose intelligence.
|
|
15:05-15:10, Paper TuCT1.2 | |
DiFuse-Net: RGB and Dual-Pixel Depth Estimation Using Window Bi-Directional Parallax Attention and Cross-Modal Transfer Learning |
|
Swami, Kunal | Samsung R&D Institute India Bangalore |
Gupta, Debtanu | Samsung |
Muduli, Amrit Kumar | Google India Pvt. Ltd |
Jaiswal, Chirag | Samsung R&D Institute Bangalore |
Bajpai, Pankaj Kumar | Samsung R&D Institute India Bangalore |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, Deep Learning Methods
Abstract: Depth estimation is crucial for intelligent systems, enabling applications from autonomous navigation to augmented reality. While traditional stereo and active depth sensors have limitations in cost, power, and robustness, dual-pixel (DP) technology, ubiquitous in modern cameras, offers a compelling alternative. This paper introduces DiFuse-Net, a novel modality decoupled network design for disentangled RGB and DP based depth estimation. DiFuse-Net features a window bi-directional parallax attention mechanism (WBiPAM) specifically designed to capture the subtle DP disparity cues unique to smartphone cameras with small aperture. A separate encoder extracts contextual information from the RGB image, and these features are fused to enhance depth prediction. We also propose a Cross-modal Transfer Learning (CmTL) mechanism to utilize large-scale RGB-D datasets in the literature to cope with the limitations of obtaining large-scale RGB-DP-D dataset. Our evaluation and comparison of the proposed method demonstrates its superiority over the DP and stereo-based baseline methods. Additionally, we contribute a new, high-quality, real-world RGB-DP-D training dataset, named Dual-Camera Dual-Pixel (DCDP) dataset, created using our novel symmetric stereo camera hardware setup, stereo calibration and rectification protocol, and AI stereo disparity estimation method.
|
|
15:10-15:15, Paper TuCT1.3 | |
Like Playing a Video Game: Spatial-Temporal Optimization of Foot Trajectories for Controlled Football Kicking in Bipedal Robots |
|
Li, Wanyue | The University of Hong Kong |
Ma, Ji | The University of Hong Kong |
Lu, Minghao | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Humanoid and Bipedal Locomotion, Optimization and Optimal Control, Legged Robots
Abstract: Humanoid robot soccer poses several challenges, particularly in maintaining system stability during aggressive kicking motions while achieving precise ball trajectory control. Current solutions, whether traditional position-based control methods or reinforcement learning (RL) approaches, exhibit significant limitations. Model predictive control (MPC) is a prevalent approach for ordinary quadruped and biped robots. While MPC has demonstrated advantages in dynamic motion control for legged robots, existing studies often oversimplify the leg swing process, relying merely on simple trajectory interpolation methods. This severely constrains the foot's environmental interaction capability, which is particularly detrimental for tasks such as ball kicking. This study innovatively adapts the spatial-temporal trajectory planning method, which has been successful in drone applications, to bipedal robotic systems. The proposed approach autonomously generates foot trajectories that satisfy constraints on target kicking position, velocity, and acceleration while simultaneously optimizing swing phase duration. Experimental results demonstrate that the optimized trajectories closely mimic human kicking behavior, featuring a backswing motion. Simulation and hardware experiments confirm the algorithm's efficiency, with trajectory planning times under 1 ms, and its reliability, achieving nearly 100% task completion accuracy when the soccer goal is within the range of -90° to 90°.
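Trajectory planners of this kind typically build on polynomials that meet position, velocity, and acceleration boundary conditions; a quintic per axis is the textbook instance. The sketch below solves for such a polynomial with made-up boundary values, illustrating the constraint structure rather than the paper's actual spatial-temporal optimizer:

```python
import numpy as np

# Quintic foot-swing profile meeting position/velocity/acceleration
# boundary conditions at lift-off and at ball contact. Boundary values
# and the swing duration T are illustrative, not the paper's.

def quintic(p0, v0, a0, pT, vT, aT, T):
    A = np.array([[1, 0, 0,    0,      0,       0],
                  [0, 1, 0,    0,      0,       0],
                  [0, 0, 2,    0,      0,       0],
                  [1, T, T**2, T**3,   T**4,    T**5],
                  [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4],
                  [0, 0, 2,    6*T,    12*T**2, 20*T**3]], dtype=float)
    return np.linalg.solve(A, [p0, v0, a0, pT, vT, aT])

T = 0.35                                  # swing duration (s)
cx = quintic(0.0, 0, 0, 0.25, 2.0, 0, T)  # foot x: 25 cm swing, 2 m/s at impact
ts = np.linspace(0, T, 5)
pos = np.polyval(cx[::-1], ts)            # coeffs are low->high order
print(np.round(pos, 3))
```

In a spatial-temporal scheme, T itself becomes a decision variable, which is what allows the swing duration to be optimized alongside the path.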
|
|
15:15-15:20, Paper TuCT1.4 | |
Resilient Multi-Robot Target Tracking with Sensing and Communication Danger Zones |
|
Li, Peihan | Drexel University |
Wu, Yuwei | University of Pennsylvania |
Liu, Jiazhen | Georgia Institute of Technology |
Sukhatme, Gaurav | University of Southern California |
Kumar, Vijay | University of Pennsylvania |
Zhou, Lifeng | Drexel University |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: Multi-robot collaboration for target tracking in adversarial environments poses significant challenges, including system failures, dynamic priority shifts, and other unpredictable factors. These challenges become even more pronounced when the environment is unknown. In this paper, we propose a resilient coordination framework for multi-robot, multi-target tracking in environments with unknown sensing and communication danger zones. We consider scenarios where failures caused by these danger zones are probabilistic and temporary, allowing robots to escape from danger zones to minimize the risk of future failures. We formulate this problem as a nonlinear optimization with soft chance constraints, enabling real-time adjustments to robot behaviors based on varying types of dangers and failures. This approach dynamically balances target tracking performance and resilience, adapting to evolving sensing and communication conditions in real-time. To validate the effectiveness of the proposed method, we assess its performance across various tracking scenarios, benchmark it against methods without resilient adaptation and collaboration, and conduct several real-world experiments.
|
|
15:20-15:25, Paper TuCT1.5 | |
LLM-CBT: LLM-Driven Closed-Loop Behavior Tree Planning for Heterogeneous UAV-UGV Swarm Collaboration |
|
Tian, Yuanyuan | Zhejiang University |
Song, Weilong | Beijing Institute of Technology |
Fu, Jinna | Zhejiang University |
Li, Zhenhui | Zhejiang University |
Fang, Chenyu | Zhejiang University |
Wang, Linbo | Zhejiang University |
Hu, Wanyang | Zhejiang University |
Liu, Yabo | Zhejiang University |
Keywords: Task Planning, Multi-Robot Systems, Distributed Robot Systems
Abstract: The heterogeneous cluster system holds significant application potential in scenarios such as collaborative logistics, disaster response operations, and precision agriculture, but achieving effective task planning for its subsystems remains a challenging issue due to specialized robotic hardware and distinct action spaces. To this end, an innovative framework called LLM-driven Closed-Loop Behavior Tree (LLM-CBT) is proposed. LLMs and behavior trees (BTs) are integrated for task planning in heterogeneous unmanned clusters, including Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs). Particularly, a novel mechanism, Generation-Refinement-Execution-Feedback (GREF), is introduced, in which an initial behavior tree is generated by LLM and iteratively refined. The refined behavior tree is then executed, and adjustments are made based on the execution results, forming a closed-loop process that ultimately achieves the task objectives. In this way, the executability of BTs is improved, and the robustness of task execution in dynamic environments is enhanced. Experiments were conducted across three scenarios with varying task complexity. The results show that the GREF closed-loop mechanism is essential for the effective operation of heterogeneous unmanned clusters.
|
|
15:25-15:30, Paper TuCT1.6 | |
DRACo-SLAM2: Distributed Robust Acoustic Communication-Efficient SLAM for Imaging Sonar Equipped Underwater Robot Teams with Object Graph Matching |
|
Huang, Yewei | Dartmouth College |
McConnell, John | United States Naval Academy |
Lin, Xi | Stevens Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Marine Robotics, Multi-Robot SLAM, Range Sensing
Abstract: We present DRACo-SLAM2, a distributed SLAM framework for underwater robot teams equipped with multibeam imaging sonar. This framework improves upon the original DRACo-SLAM by introducing a novel representation of sonar maps as object graphs and utilizing object graph matching to achieve time-efficient inter-robot loop closure detection without relying on prior geometric information. To better accommodate the needs and characteristics of underwater scan matching, we propose incremental Group-wise Consistent Measurement Set Maximization (GCM), a modification of Pairwise Consistent Measurement Set Maximization (PCM), which effectively handles scenarios where nearby inter-robot loop closures share similar registration errors. The proposed approach is validated through extensive comparative analyses on simulated and real-world datasets.
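The consistency-checking idea behind PCM, which this work extends to group-wise and incremental form, can be shown on a one-dimensional toy problem: mutually agreeing loop closures form a clique in a consistency graph, and outliers fall outside it. The greedy search below stands in for an exact maximum-clique solver:

```python
from itertools import combinations

# Toy pairwise-consistency check in the spirit of PCM. 1-D "poses" keep
# the sketch short; real PCM compares relative-pose measurements under
# their covariances.
closures = {0: 2.0, 1: 2.1, 2: 1.95, 3: 5.0}   # implied offsets; 3 is an outlier
tau = 0.3                                       # consistency threshold

consistent = {(i, j) for i, j in combinations(closures, 2)
              if abs(closures[i] - closures[j]) < tau}

best = set()
for seed in closures:                           # grow a clique from each seed
    clique = {seed}
    for k in closures:
        if all((min(k, c), max(k, c)) in consistent
               for c in clique if c != k):
            clique.add(k)
    best = max(best, clique, key=len)
print("accepted loop closures:", sorted(best))  # outlier 3 is rejected
```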
|
|
TuCT2 |
402 |
Modeling, Control, and Learning for Soft Robots 1 |
Regular Session |
|
15:00-15:05, Paper TuCT2.1 | |
Static Analysis and Modeling of a Trunk-Like Robot Capable of Adjustable Multi-Turn Helical Deformation |
|
Long, Zeyu | Osaka University |
Wakamatsu, Hidefumi | Grad. School of Eng., Osaka Univ |
Iwata, Yoshiharu | Osaka University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Grasping
Abstract: Current trunk-like continuum robots face limitations in actuator capabilities, hindering the realization of adjustable multi-turn helical deformations. In our previous research [1,2], we proposed the Twisted String and Spiral Hose (TSSH) mechanism, which utilizes both the tensile and torsional forces of twisted string actuators (TSAs) to generate helical deformations. However, the deformation principles of the TSSH mechanism are not yet fully understood, motivating further investigation. The main contribution of this study is the analysis of TSSH deformation principles using a conventional mathematical model of twisted strings, supported by experimental validation. Building on this analysis, we developed a static simulation model based on potential energy minimization to predict TSSH deformation. The proposed model provides insights into the deformation behavior of the TSSH mechanism, enabling parameter optimization for enhanced performance. Furthermore, this static analysis can be extended to other TSA-based and tendon-driven systems, providing a valuable reference for string-actuated robotic mechanisms.
|
|
15:05-15:10, Paper TuCT2.2 | |
Modeling the States of Liquid Phase Change Pouch Actuators by Reservoir Computing |
|
Caremel, Cedric | The University of Tokyo |
Nguyen, Khang | University of Texas at Arlington |
Nguyen, Anh | University of Liverpool |
Huber, Manfred | University of Texas at Arlington |
Kawahara, Yoshihiro | The University of Tokyo |
Ta, Tung D. | The University of Tokyo |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Liquid phase change pouch actuators (liquid pouch motors) hold great promise for a wide range of robotic applications, from artificial organs to pneumatic manipulators for dexterous manipulation. However, the usability of liquid pouch motors remains challenging due to the nonlinear intrinsic properties of liquids and their highly dynamic implications for liquid-gas phase changes, which complicate state modeling and estimation. To address these issues, we propose a reservoir computing-based method for modeling the inflation states of a customized liquid pouch motor, which serves as an actuator, featuring four Peltier heating junctions. We use a motion capture system to track the landmark movements on the pouch as a proxy for its volumetric profile. These movements represent the internal liquid-gas phase changes of the pouch at stable room temperature, atmospheric pressure, and in the presence of electrical noise. The motion coordinates are thus learned by our reservoir computing framework, PhysRes, to model the states based on prior observations. Through training, our model achieves excellent results on the test set, with a normalized root mean squared error of 0.0041 in estimating the states and a corresponding volumetric error of 0.0160%. To further demonstrate how such actuators could be implemented in the future, we also design a dual-pouch actuator-based robotic gripper to control the grasping of soft objects.
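Reservoir computing itself is compact enough to sketch: a fixed random recurrent network is driven by the input, and only a linear readout is trained. The echo state network below is a generic instance with a toy target, not the PhysRes architecture or its pouch data:

```python
import numpy as np

# A minimal echo state network: fixed random reservoir, ridge-regressed
# linear readout. Sizes, spectral radius, and the target are illustrative.
rng = np.random.default_rng(0)
n_in, n_res = 1, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def run(u_seq):
    x, states = np.zeros(n_res), []
    for u in u_seq:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

t = np.linspace(0, 20, 2000)
u, y = np.sin(t), np.sin(t + 0.3)                # predict a phase-shifted copy
X = run(u)[200:]                                  # drop washout transient
W_out = np.linalg.solve(X.T @ X + 1e-6*np.eye(n_res), X.T @ y[200:])
print(f"train RMSE: {np.sqrt(np.mean((X @ W_out - y[200:])**2)):.4f}")
```

Only W_out is learned, which is what makes the approach attractive for modeling nonlinear pouch dynamics with little training cost.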
|
|
15:10-15:15, Paper TuCT2.3 | |
Kinetostatic Modeling of Retractable and Prismatic Spring Body for Continuum Climbing Robots in Discontinuous Terrains |
|
Yang, Pengpeng | Harbin Institute of Technology |
Zang, Jialin | CNNP Nuclear Power Operations Management Co., Ltd |
Jin, Ge | CNNP Nuclear Power Operations Management Co., Ltd |
Long, Junliang | Harbin Institute of Technology, Weihai |
Huang, Bo | Harbin Institute of Technology |
Zhao, Jianwen | Harbin Institute of Technology, Weihai |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Flexible Robotics
Abstract: There are few studies on the mechanics of retractable backbones for continuum climbing robots, especially those with non-circular cross-sections. A retractable non-circular structure endows the robot with a more compact structure, adjustable initial stiffness, and dexterous mobility in narrow spaces. Consequently, a retractable prismatic spring backbone is proposed. To capture its rectangular helical geometry and coupled deformation, the backbone is modeled as an equivalent elastic beam, whose equivalent stiffness is solved via the projection principle applied to micro-segment deformation. A finite piecewise method and a continuous differential method are then used to establish its mechanical model. The piecewise method uses the linear superposition principle to decouple compression and bending deformation, and the rotation angle is solved using the projection principle of bending deformation. The continuous method uses Cosserat rod theory to establish variable-curvature mechanics based on the equivalent beam, whose boundary-value problem is solved by gradually extending the integration region. Finally, both theoretical methods agree well with FEA and experimental results; the continuous method achieves higher accuracy, while the piecewise method has a lower computational cost. A multipurpose continuum climbing robot composed of the spring backbone, a rotatable joint, and a flexible claw is applied to inspection inside enclosed equipment.
|
|
15:15-15:20, Paper TuCT2.4 | |
Multiphysics Model and Hysteresis Compensation for Control of Electroactive Actuators |
|
Ferradj, Imane | Arts Et Métiers ParisTech |
Monteiro, Eric | Arts Et Metiers Paristech, PIMM |
Roland, Sébastien | Arts Et Métiers ParisTech |
Mechbal, Nazih | Arts Et Métiers Paritech, Paris |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators
Abstract: Electroactive polymer actuators based on PVDF-TrFE-CTFE terpolymers are rapidly gaining prominence in applications requiring compliant, high-precision actuation, notably in soft robotics for minimally invasive surgical tools. However, the relaxor ferroelectric properties of these materials inherently introduce significant nonlinear hysteresis, which can severely impair control accuracy. This paper presents a model-based control architecture that overcomes these challenges by integrating an analytically inverted generalized Prandtl-Ishlinskii hysteresis model with model predictive control (MPC). The proposed framework effectively linearizes the hysteresis behavior while optimizing control inputs under stringent actuator constraints, ensuring rapid response and zero steady-state error in quasi-static regimes. The developed model has been successfully validated against experimental measurements of actuator deflection and electric displacement, demonstrating a strong correlation between predicted and observed behavior. Comprehensive numerical simulations demonstrate that the MPC strategy achieves near-instantaneous settling times and markedly improved trajectory tracking compared to conventional proportional-integral-derivative control. These promising results underscore the potential of the integrated approach for precision-critical urological applications, where precise and compliant actuation is essential for safe, minimally invasive procedures.
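The play-operator construction at the heart of Prandtl-Ishlinskii models is worth seeing concretely. The sketch below uses the classical (non-generalized) form with illustrative thresholds and weights, not parameters identified from the actuator:

```python
import numpy as np

# Classical Prandtl-Ishlinskii model: a weighted sum of play (backlash)
# operators reproduces rate-independent hysteresis and admits the
# analytical inverse that hysteresis compensation exploits.

def play(u_seq, r, y0=0.0):
    """Play operator with threshold r: output stays within u +/- r."""
    y, out = y0, []
    for u in u_seq:
        y = min(u + r, max(u - r, y))   # clamp state into the band [u-r, u+r]
        out.append(y)
    return np.array(out)

r = np.array([0.0, 0.2, 0.4, 0.6])      # operator thresholds (illustrative)
w = np.array([0.4, 0.3, 0.2, 0.1])      # operator weights (illustrative)

t = np.linspace(0, 2*np.pi, 400)
u = np.sin(t)                            # normalized drive signal
y = sum(wi * play(u, ri) for wi, ri in zip(w, r))
# Same input value on rising vs falling branches gives different outputs,
# which is exactly what an inverse-PI compensator cancels before MPC.
print(f"output at u=0 rising vs falling: {y[0]:.3f} vs {y[200]:.3f}")
```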
|
|
15:20-15:25, Paper TuCT2.5 | |
On the Design, Analysis, and Experimental Validation of Pneumatically-Actuated, Soft Robotic, Telescopic Structures |
|
Busby, Bryan | The University of Auckland |
Jiang, Haodan | The University of Auckland |
Thompson, Marcus | Whanauka Limited |
Liarokapis, Minas | The University of Auckland |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft robotics leverages highly elastic, deformable materials to enable sensitive human-centered applications in medicine, rehabilitation, and assistance, as well as industrial applications like robotic grasping and load-bearing. One particular type of soft robotic actuator, the pneumatically-actuated, soft robotic, telescopic structure (PASTS), is a relatively new concept that utilises compliant material and geometry reminiscent of traditional telescopes to produce linear motion and force exertion. Previous works on telescopic soft actuators have focused on specific applications rather than fundamental mechanics, creating a clear knowledge gap in understanding their design, dynamics, and dependencies. This paper provides an in-depth study of the fundamental design parameters of soft telescopic actuators and examines the impacts of certain critical dimensions and geometries on the physical behaviour and capabilities of these soft actuators. It explores the influence of a telescopic structure's length, wall thickness, and number of rings on its inflation and deflation behaviour, lifting and lowering speed, and motion smoothness. By experimentally verifying these relationships using several design variations, it demonstrates that telescopic structures are capable of repeatable linear extension with a precision of ±0.85 mm. It also determines the varying degrees by which each critical parameter affects the desirable properties of the actuator, allowing telescopic structures to be tailored and optimised for specific applications.
|
|
15:25-15:30, Paper TuCT2.6 | |
Soft Synergies: Model Order Reduction of Hybrid Soft-Rigid Robots Via Optimal Strain Parameterization |
|
Alkayas, Abdulaziz Y. | Khalifa University |
Mathew, Anup Teejo | Khalifa University |
Feliu, Daniel | Delft University of Technology (TU Delft) |
Deng, Ping | The University of Hong Kong |
George Thuruthel, Thomas | University College London |
Renda, Federico | Khalifa University of Science and Technology |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Soft Sensors and Actuators, Reduced Order Modeling
Abstract: Soft robots offer remarkable adaptability and safety advantages over rigid robots, but modeling their complex, nonlinear dynamics remains challenging. Strain-based models have recently emerged as a promising candidate to describe such systems; however, they tend to be high-dimensional and computationally expensive. This paper presents a novel model order reduction approach for soft and hybrid robots that combines strain-based modeling with Proper Orthogonal Decomposition (POD). The method identifies optimal coupled strain basis functions, or mechanical synergies, from simulation data, enabling the description of soft robot configurations with a minimal number of generalized coordinates. The reduced order model (ROM) achieves substantial dimensionality reduction in the configuration space while preserving accuracy. Rigorous testing demonstrates the interpolation and extrapolation capabilities of the ROM for soft manipulators under static and dynamic conditions. The approach is further validated on a snake-like hyper-redundant rigid manipulator and a closed-chain system with soft and rigid components, illustrating its broad applicability. Moreover, the approach is leveraged for shape estimation of a real six-actuator soft manipulator using only two position markers, showcasing its practical utility. Finally, the ROM's dynamic and static behavior is validated experimentally against a parallel hybrid soft-rigid system, highlighting its effectiveness in representing the high-order model (HOM) and the real system. This POD-based ROM offers significant computational speed-ups, paving the way for real-time simulation and control of complex soft and hybrid robots.
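As an illustration of the POD step described above, the sketch below extracts a reduced basis from a snapshot matrix via SVD; the snapshot data are synthetic low-rank stand-ins, not the paper's strain simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
modes = rng.standard_normal((120, 5))                  # hidden low-rank structure
coeffs = rng.standard_normal((5, 500))
Q = modes @ coeffs + 0.01 * rng.standard_normal((120, 500))  # snapshots: DOFs x time

U, s, _ = np.linalg.svd(Q, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1            # modes capturing 99.9% energy
Phi = U[:, :r]                                         # coupled basis ("synergies")

q_full = Q[:, 0]                 # one full configuration
q_red = Phi.T @ q_full           # r generalized coordinates, r << 120
q_rec = Phi @ q_red              # reconstruction in the full space
```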
|
|
15:30-15:35, Paper TuCT2.7 | |
Data-Driven Methods Applied to Soft Robot Modeling and Control: A Review (I) |
|
Chen, Zixi | Scuola Superiore Sant'Anna |
Renda, Federico | Khalifa University of Science and Technology |
Le Gall, Alexia | Scuola Superiore Sant'Anna |
Mocellin, Lorenzo | The BioRobotics Institute - Scuola Superiore Sant’Anna |
Bernabei, Matteo | Scuola Superiore Sant'Anna |
Dangel, Théo | Scuola Superiore Sant'Anna |
Ciuti, Gastone | Scuola Superiore Sant'Anna |
Cianchetti, Matteo | Scuola Superiore Sant'Anna |
Stefanini, Cesare | Scuola Superiore Sant'Anna |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft robots show compliance and have infinite degrees of freedom. Thanks to these properties, such robots can be leveraged for surgery, rehabilitation, biomimetics, unstructured environment exploration, and industrial grippers, and they therefore attract scholars from a variety of areas. However, nonlinearity and hysteresis effects also burden robot modeling. Moreover, owing to their flexibility and adaptability, soft robot control is more challenging than rigid robot control. To model and control soft robots, a large number of data-driven methods are utilized, in pairs or separately. This review first briefly introduces two foundations for data-driven approaches, namely physical models and the Jacobian matrix, and then summarizes three kinds of data-driven approaches: statistical methods, neural networks, and reinforcement learning. The review compares the modeling and controller features, e.g., model dynamics, data requirements, and target tasks, within and among these categories. Finally, we summarize the features of each method, discuss the advantages and limitations of the existing modeling and control approaches, and forecast the future of data-driven approaches in soft robots. A website (https://sites.google.com/view/23zcb) has been built for this review and will be updated frequently.
|
|
15:35-15:40, Paper TuCT2.8 | |
Two-Dimensional Trajectory Tracking of a Magnetic Continuum Robot by Optimal Magnet Manipulation |
|
Hao, Lijun | Beijing Jiaotong University |
Yang, Tangwen | Beijing Jiaotong University |
Liu, Sheng | Fuwai Hospital |
Zheng, Zhe | Fuwai Hospital |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion and Path Planning
Abstract: The steerability of a catheter is critical to the success of an interventional procedure. In this paper, a magnetic continuum robot is assumed to be mounted at the distal end of a catheter to pull it through the narrow, bifurcated, tortuous pathways of the blood vessels. The continuum robot is actuated by a permanent magnet. Its slender, soft body complicates the interaction with the magnet, but it enables omni-directional deflection at the tip and thereby improves steerability. In the non-uniform field generated by a permanent magnet, a magnetic force acts on the robot in addition to the magnetic torque, and both are used to derive the dynamic equations governing the interaction between the robot and the magnet in terms of Euler-Bernoulli beam theory. An iterative algorithm for calculating the magnet pose is proposed to generate an optimal moving magnetic field and actuate the robot to follow the planned trajectory at its tip. A robot prototype is fabricated, and the experimental results show that this prototype can accurately track the planned trajectories in 2D space through magnet manipulation with a robot arm.
|
|
TuCT3 |
403 |
Automation at Micro-Nano Scales |
Regular Session |
Chair: Yang, Liangjing | Zhejiang University |
Co-Chair: Li, Xiang | Tsinghua University |
|
15:00-15:05, Paper TuCT3.1 | |
A Macro-Micro Vision Integrated Micromanipulation System for Self-Initialization and Resilient Control (I) |
|
Wang, Tiexin | Zhejiang University |
Long, Yun | Zhejiang University |
Weng, Tianle | Zhejiang University |
Yang, Liangjing | Zhejiang University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Micro/Nano Robots
Abstract: Robotic micromanipulation systems (RMS) enable precise and repeatable operations under a microscope. Traditional RMS rely solely on microscopic visual feedback, necessitating time-consuming manual positioning to bring the tool tip within the microscope field-of-view (Micro-FOV), which limits efficiency and depends heavily on operator skill. This paper proposes an innovative RMS that integrates macro and micro vision to automate tool tip positioning and facilitate resilient control. The system utilizes an external camera to obtain the macro field-of-view (Macro-FOV), containing the tool and fiducial markers, and estimates the tool tip's 3D position by triangulation. Visual servoing is then used to guide the tool tip towards the Micro-FOV. Under the Micro-FOV, a tool-sweep detector based on partitioned difference images sequentially locates the tool's shaft and tip. After auto-focusing, the system executes resilient control of the tool tip and Petri dish based on our developed self-calibration and self-recalibration mechanisms. During operation, the system provides an intuitive user interface that includes both macro and micro information, improving the visualization and productivity of micromanipulation. Experiments show that the self-initialization scheme can be implemented across different macro camera viewpoints, reducing the average tip positioning time from 65.70 to 50.08 seconds compared to manual operation, thereby decreasing manual labor intensity and improving efficiency. The self-recalibration mechanism achieves precise and resilient control, with an average error of 0.95 μm over 25 continuous trials. Additionally, the system exhibits robustness against vibration and visual interference, underscoring its potential for diverse biomedical applications.
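The 3D tip estimation by triangulation mentioned above can be illustrated with standard two-view linear (DLT) triangulation; the projection matrices below are toy values, not the system's calibrated macro-camera parameters.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two pixel observations."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]              # dehomogenize

# toy setup: two normalized cameras, the second translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])
X_true = np.array([0.05, -0.03, 1.0, 1.0])
x1h, x2h = P1 @ X_true, P2 @ X_true
x1, x2 = x1h[:2] / x1h[2], x2h[:2] / x2h[2]
print(triangulate(P1, P2, x1, x2))   # ~ [0.05, -0.03, 1.0]
```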
|
|
15:05-15:10, Paper TuCT3.2 | |
Cell Cryopreservation in a Microfluidic Chip with Vision-Based Fluid Control and Region Reaching (I) |
|
Miao, Shu | Tsinghua University |
Jia, Yongyi | Tsinghua University |
Jiang, Ze | Hong Kong Centre for Logistics Robotics |
Xu, Jiehuan | Institute of Animal Husbandry and Veterinary Science, Shanghai A |
Li, Xiang | Tsinghua University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Micro/Nano Robots
Abstract: The solution exchange process is crucial in cell cryopreservation, an assistive reproductive technique that enhances reproductive autonomy and helps women overcome infertility challenges. Such a task is time-critical in the sense that the duration of cell exposure to the solutions significantly impacts cell viability. In this paper, a new micromanipulation system has been developed to automate such a task, where the contributions can be summarized as follows. This paper addresses the challenge of tracking cell positions within a microfluidic chip, due to the limitations of the microscope's field of view (FOV). By utilizing a region control approach with visual feedback, the proposed method ensures that the cell remains centered in the image. Additionally, a real-time tracking method based on correlation filtering is presented, which precisely localizes cells and controls the syringe pump flow rate to mitigate delays and prevent cell loss. Experimental results illustrate the consistent positioning of the micro objects/cells at each step, the satisfactory success rate, and the robustness to the loss of vision feature. Integrated with novel manipulation strategies, our intelligent manipulation system offers a promising solution for in vitro fertilization (IVF), characterized by an embryologist-centered configuration and standardized robotic manipulation. This study aims to standardize clinical cryopreservation by transitioning from manual operation to a more efficient and reliable automated process. Furthermore, this approach can simplify procedures and enhance oocyte viability.
|
|
15:10-15:15, Paper TuCT3.3 | |
Selective, Robust and Precision Manipulation of Particles in Complex Environments with Ultrasonic Phased Transducer Array and Microscope |
|
Wang, Mingyue | ShanghaiTech University |
An, Siyuan | Shanghaitech University |
Sun, Zhenhuan | Shanghaitech University |
Li, Jiaqi | ShanghaiTech University |
Wang, Yang | Shanghaitech University |
Liu, Song | ShanghaiTech University |
Keywords: Automation at Micro-Nano Scales, Grippers and Other End-Effectors, Micro/Nano Robots, Noncontact micromanipulation
Abstract: The noncontact acoustic manipulation of particles, bio-samples, droplets, and air bubbles has emerged as a promising technology in the fields of biology, chemistry, and medicine. The noncontact nature offers significant advantages in terms of bio-compatibility, freedom from contamination, and material versatility. However, current noncontact acoustic manipulation techniques still lack adequate selectivity, robustness, and precision controllability in complex environments. To this end, this paper proposes an automated noncontact manipulation system that leverages a high-density ultrasonic phased transducer array in combination with a microscope to further optimize and enhance the controllability and flexibility of noncontact particle manipulation. This work presents several notable contributions. First, we realized selective particle manipulation, allowing instantaneous interaction with users to perform user-designated and objective-oriented manipulation tasks. Second, we integrated a closed-loop control strategy into the system that effectively mitigates misalignment errors induced by the trapping stiffness heterogeneity of the acoustic trap and enables automated precision position control of particles in complex environments (in a 30 mm wide workspace, the positioning precision is one-fortieth of the wavelength). Third, we proposed a reconfigurable acoustic trap design method, named the pseudo-vortex trap, featuring real-time computation and the trapping of particles larger than the wavelength. The system setup, the calibration specifics, the acoustic trap design methodology, and the corresponding visual servo control scheme (in terms of selective trapping, precision positioning, and dynamic trajectory planning) are given in detail in the paper. Meanwhile, the trapping stiffness and the manipulation stability are also analyzed. Experimental results demonstrate the effectiveness of the proposed system.
|
|
15:15-15:20, Paper TuCT3.4 | |
A Metal Film Thickness Measurement System with a Large Range Based on High-Performance ME Sensors (I) |
|
Qiu, Yang | China Jiliang University |
Shi, Lingshan | China Jiliang University |
Guoliang, Yu | China Jiliang University |
Zhu, Mingmin | China Jiliang University |
Li, Yan | China Jiliang University |
Wang, Jiawei | China Jiliang University |
Zhang, Shulin | Shanghai Institute of Microsystem and Information Technology, C |
Ren, Kun | Zhejiang University |
Zhou, Haomiao | China Jiliang University |
Keywords: Automation at Micro-Nano Scales, Nanomanufacturing, Semiconductor Manufacturing
Abstract: The majority of traditional eddy current-based metal film thickness measurement systems measure the thickness of the metal film by detecting changes in the impedance or voltage of the detection coil, leading to a limited measurement range and susceptibility to measurement errors caused by variations in lift-off distance. This article proposes a method for metal film thickness measurement across a wide range using a high-performance magnetoelectric (ME) sensor to directly detect the total magnetic field of the excitation field and the induced field. Numerical calculations based on the eddy current magnetic field model reveal a highly linear relationship, within a specific range, between this total magnetic field and the logarithm of the metal film thickness. If the detectable magnetic field range of the sensor covers 32–51 nT with a bandwidth of up to 1.5 MHz, the measurement of copper film thickness within the range of 0.65 nm–1000 µm can theoretically be achieved by adjusting the excitation magnetic field frequency, and the measurement error introduced by the lift-off distance can be ignored. The experimentally prepared ME sensor, covering a range of 30 pT–100 nT within a bandwidth of 1.5 kHz–1.5 MHz, was used to construct a metal film thickness measurement system. Measurements on copper films with thicknesses ranging from nanometers to micrometers, under excitation fields at six different frequencies within the range of 1.5 kHz–1.5 MHz, exhibited measurement errors within 1%, confirming the feasibility of this method for a wide range of copper film thicknesses from 60 nm to 350 µm.
|
|
15:20-15:25, Paper TuCT3.5 | |
Off-Focus Image Restoration Based Three-Dimensional Particle Localization for Improved Measurement Resolution (I) |
|
Wang, Yuliang | Beihang University |
Keywords: Automation at Micro-Nano Scales, Visual Servoing
Abstract: Ultrahigh-precision 3-D particle localization and tracking in microscopic vision systems play an important role in numerous micro/nanoscale applications. However, they remain a challenge due to image degradation caused by lens defocusing. Here, we propose an off-focus-image-restoration-based approach to enhance measurement resolution in 3-D particle localization. In this method, a point spread function corresponding to the off-focus position is established and serves as the deconvolution kernel. By employing an improved nonlinear filtering algorithm, high-quality restored particle images are reconstructed effectively. The principle of the proposed method is presented in detail, followed by experimental validation in particle localization and 3-D trajectory tracking. With the proposed approach, results demonstrate that measurement resolution along the lateral and axial directions is improved by 38% and 31%, respectively. The proposed method provides a practical solution for 3-D particle tracking within a relatively large axial range, which is believed to be significant for micro/nanoscale spatial-localization-related applications.
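The paper's specific nonlinear restoration filter is not given in the abstract; as a stand-in, the sketch below applies classical Richardson-Lucy deconvolution with a Gaussian point spread function, one common nonlinear scheme for PSF-based defocus restoration.

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(size=15, sigma=3.0):
    """Normalized Gaussian kernel used as a stand-in defocus PSF."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

def richardson_lucy(image, psf, iters=30):
    """Iteratively restore a blurred image given its (defocus) PSF."""
    est = np.full_like(image, image.mean())
    psf_m = psf[::-1, ::-1]                            # mirrored kernel
    for _ in range(iters):
        conv = fftconvolve(est, psf, mode="same")
        est *= fftconvolve(image / (conv + 1e-12), psf_m, mode="same")
    return est

sharp = np.zeros((64, 64)); sharp[30:34, 30:34] = 1.0   # toy bright particle
blurred = fftconvolve(sharp, gaussian_psf(), mode="same")
restored = richardson_lucy(blurred, gaussian_psf())
```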
|
|
15:25-15:30, Paper TuCT3.6 | |
Automated Sperm Immobilization with Compensation for Sperm Intrinsic and Fluid Flow-Induced Movements (I) |
|
Gu, Haoyuan | Shanghai Jiao Tong University |
Zhai, Rongan | Shanghai University |
Yue, Chunfeng | Kagawa University |
Sun, Yu | University of Toronto |
Dai, Changsheng | Dalian University of Technology |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Visual Servoing
Abstract: In automated sperm immobilization, motorized stages are the conventional means for transporting sperm, yet they are not typical devices in clinical setups. This study designs a new robotic sperm immobilization method that leverages a micromanipulator to position a tooltip above a sperm and immobilize it without a motorized stage. The challenge arises when tooltip motion agitates the medium, leading to the passive displacement of the sperm, while the sperm's intrinsic movement also complicates tooltip positioning. To address these concerns, first, a predictor is introduced to predict the sperm's intrinsic movement based on its identified motion pattern. Second, a model is established to describe the fluid flow and characterize the passive displacement of the sperm as a function of tooltip velocity and medium parameters. Finally, an adaptive control algorithm that accounts for uncalibrated medium parameters is designed to compensate for the sperm's intrinsic movement and tooltip motion-induced sperm displacement. The stability of the control algorithm is analyzed, and the boundedness of the positioning error is proven. Experiments verified the effectiveness of the proposed method both in predicting and feedforward-compensating the sperm's intrinsic movement and in compensating for tooltip motion-induced sperm displacement. The performance of the proposed method is comparable to existing robotic sperm immobilization methods that rely on motorized stages, while being easier to implement in clinics. With the proposed method, the system achieved a success rate of 90.9% in sperm immobilization, representing an improvement of 28.2% over the method not factoring in the sperm's intrinsic movement and 16.3% over the method not accounting for tooltip motion-induced sperm displacement, respectively.
|
|
15:30-15:35, Paper TuCT3.7 | |
Learning-Based Auto-Focus and 3D Pose Identification of Moving Micro and Nanowires in Fluid Suspensions (I) |
|
Song, Jiaxu | Binghamton University |
Wu, Juan | Binghamton University |
Yu, Kaiyan | Binghamton University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, AI-Based Methods
Abstract: Precise manipulation of micro- and nano-objects through visual feedback is challenging because of the difficulty of observing their motion along the line-of-sight of microscopes. This paper presents an efficient learning-based auto-focus (AF) and visual posture estimation scheme for tracking the three-dimensional (3D) poses of multiple moving micro- and nanowires in fluid suspensions under bright-field microscopes. The proposed AF and 3D pose estimation methods integrate convolutional neural networks (CNNs) to precisely identify the focal distances and inclination angles of multiple moving wires through a single region-of-interest (ROI) image for each wire. Furthermore, we demonstrate the versatility of the proposed AF method by adapting it for wires of other materials through transfer learning (TL), using a limited dataset. Extensive experimental results validate the high accuracy and efficiency of the AF and 3D pose estimation compared to traditional methods. This work lays the foundation for the automated control of micro- and nano-objects in 3D microfluidic environments.
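The CNN-based focal-distance and angle regression described above can be sketched with a toy network like the one below; the architecture, input size, and two-value output head are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class AFPoseNet(nn.Module):
    """Regress defocus distance and inclination angle from a single-wire ROI crop."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)      # outputs: [focal distance, tilt angle]

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = AFPoseNet()
roi = torch.randn(8, 1, 64, 64)           # batch of grayscale ROI crops
pred = model(roi)                          # shape (8, 2)
```

Transfer learning to a new wire material would then amount to reusing the trained `features` stack and fine-tuning only the small head on the limited new dataset.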
|
|
15:35-15:40, Paper TuCT3.8 | |
Design of a Flexible XYZ Micropositioner with Active Vertical Crosstalk Compensation (I) |
|
Lyu, Zekui | University of Macau |
Wu, Zehao | University of Macau |
Xu, Qingsong | University of Macau |
Keywords: Automation at Micro-Nano Scales
Abstract: This paper presents the design and development of a new flexible XYZ micropositioner with a hybrid kinematic configuration. A piezoelectric-driven Z stage is embedded into a parallel-kinematic XY stage actuated by two voice coil motors. The XYZ micropositioner features a sizeable workspace with a compact architecture, which benefits from employing deployable mechanisms and mixed actuators. A unique feature is that the Z-axis crosstalk error of the XYZ micropositioner is compensated by closed-loop motion control of the Z stage, which maintains a constant vertical position of the center platform when performing planar motion tasks. Experimental results indicate that the vertical crosstalk error is dramatically reduced from 7.333 to 1.719 µm by the active control of the Z-axis position. The proposed design provides a promising approach to enable pure planar motion for ultrahigh precision applications requiring optical or electron focusing, such as electron beam lithography.
|
|
TuCT4 |
404 |
Robot Control |
Regular Session |
|
15:00-15:05, Paper TuCT4.1 | |
DA-MPPI: Disturbance-Aware Model Predictive Path Integral Via Active Disturbance Estimation and Compensation |
|
Zhang, Haodi | Southeast University |
Su, Jinya | Southeast University |
Yang, Jun | Loughborough University |
Li, Shihua | Southeast University |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: Model Predictive Path Integral (MPPI) controllers are drawing increasing attention for their ability to efficiently handle complex systems by leveraging GPU acceleration while supporting flexible prediction models and cost functions. However, their performance generally degrades with low-quality prediction models and unknown external disturbances. Existing methods that rely solely on feedforward disturbance compensation are limited by the assumption of matched disturbances, which rarely holds in practice due to complex lumped disturbances. To this end, we propose a novel Disturbance-Aware (DA-) MPPI framework, which seamlessly integrates an Extended high-order Sliding Mode Observer (ESMO) into MPPI. The ESMO provides accurate estimates of uncertainties and external disturbances, which are directly incorporated into the MPPI rolling dynamics to improve prediction and therefore tracking control performance. The proposed algorithm is verified against the baseline MPPI in the AirSim simulation environment through stochastic simulations. Comparative statistical experiments show that incorporating the ESMO within the MPPI framework significantly enhances tracking performance, with RMSE reductions in terms of the median of 8.0%, 17.7%, 6.17%, and 12.9%, and in terms of the standard deviation of 11.5%, 26.0%, 10.4%, and 9.2%, in four representative scenarios. The effects of target velocity and prediction horizon on control performance are also systematically evaluated. These results validate the robustness and accuracy of the DA-MPPI controller in complex and uncertain environments.
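As background for the framework above, a bare-bones MPPI update (without the paper's ESMO disturbance compensation) looks like the following; the double-integrator dynamics and quadratic cost are toy stand-ins.

```python
import numpy as np

def mppi_step(x0, u_nom, dynamics, cost, lam=1.0, sigma=0.5, K=256):
    """One MPPI update: sample perturbed control sequences, weight by rollout cost."""
    H = len(u_nom)
    noise = sigma * np.random.randn(K, H)
    costs = np.zeros(K)
    for k in range(K):
        x = np.array(x0, dtype=float)
        for t in range(H):
            x = dynamics(x, u_nom[t] + noise[k, t])
            costs[k] += cost(x)
    w = np.exp(-(costs - costs.min()) / lam)   # path-integral importance weights
    w /= w.sum()
    return u_nom + w @ noise                   # weighted perturbation update

dt = 0.05
dyn = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u])  # double integrator
cost = lambda x: (x[0] - 1.0) ** 2 + 0.1 * x[1] ** 2            # reach position 1.0

u, state = np.zeros(30), np.zeros(2)
for _ in range(100):
    u = mppi_step(state, u, dyn, cost)
    state = dyn(state, u[0])               # apply first control
    u = np.append(u[1:], 0.0)              # receding-horizon shift
```

In the DA- variant, the disturbance estimate would enter `dyn` as an additive term during the rollouts.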
|
|
15:05-15:10, Paper TuCT4.2 | |
Optimization-Based Path-Velocity Control for Time-Optimal Path Tracking under Uncertainties |
|
Jia, Zheng | Lund University |
Olofsson, Bjorn | Lund University |
Karayiannidis, Yiannis | Lund University |
Keywords: Motion Control, Robust/Adaptive Control
Abstract: This work addresses the path-tracking problem for time-optimal trajectories under model uncertainties by proposing a real-time predictive scaling algorithm. The algorithm is formulated as a convex optimization problem designed to balance the trade-off between feasibility and time optimality of a trajectory. The predicted trajectory is scaled based on the presence of path segments within the prediction horizon that are particularly sensitive to model uncertainties. Numerical simulations and experiments demonstrate that the proposed scaling algorithm reduces the path traversal time while preserving similar path-tracking accuracy compared to an existing non-predictive method.
|
|
15:10-15:15, Paper TuCT4.3 | |
Robust and Modular Multi-Limb Synchronization in Motion Stack for Space Robots with Trajectory Clamping Via Hypersphere |
|
Neppel, Elian | Tohoku University |
Mishra, Ashutosh | Tohoku University |
Karimov, Shamistan | Tohoku University |
Uno, Kentaro | Tohoku University |
Santra, Shreya | Tohoku University |
Yoshida, Kazuya | Tohoku University |
Keywords: Motion Control, Robust/Adaptive Control, Cellular and Modular Robots
Abstract: Modular robotics holds immense potential for space exploration, where reliability, repairability, and reusability are critical for cost-effective missions. Coordination between heterogeneous units is paramount for precision tasks, whether in manipulation, legged locomotion, or multi-robot interaction. Such modular systems introduce challenges far exceeding those in monolithic robot architectures. This study presents a robust method for synchronizing the trajectories of multiple heterogeneous actuators, adapting dynamically to system variations with minimal system knowledge. This design makes it inherently robot-agnostic and thus highly suited for modularity. To ensure smooth trajectory adherence, the multidimensional state is constrained within a hypersphere representing the allowable deviation. The distance metric can hence be adapted: depending on the task and system under control, the constraint region can be deformed. This approach is compatible with a wide range of robotic platforms and serves as a core interface for Motion Stack, our new open-source universal framework for limb coordination (available at https://github.com/2lian/Motion-Stack). The method is validated by synchronizing the end-effectors of six highly heterogeneous robotic limbs, evaluating both trajectory adherence and recovery from significant external disturbances.
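The trajectory-clamping idea can be pictured as projecting the multidimensional reference deviation back into an allowed ball; this sketch, including optional per-axis weights that deform the region into an ellipsoid, is a simplified reading of the concept, not code from Motion Stack.

```python
import numpy as np

def clamp_to_hypersphere(target, center, radius, weights=None):
    """Project a multi-actuator target state back inside an allowed ball.

    center: current synchronized reference; radius: allowable deviation.
    weights: optional positive per-axis scaling, deforming the ball into an
    ellipsoid (a simple way to adapt the distance metric to the task).
    """
    w = np.ones_like(target) if weights is None else np.asarray(weights, float)
    d = (target - center) * w                 # deviation in the scaled metric
    n = np.linalg.norm(d)
    if n <= radius:
        return target                         # already within tolerance
    return center + (d * (radius / n)) / w    # pull back onto the boundary

ref = np.array([0.0, 0.0, 0.0])               # synchronized reference state
cmd = np.array([0.3, -0.1, 0.2])              # one actuator running ahead
print(clamp_to_hypersphere(cmd, ref, radius=0.2))
```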
|
|
15:15-15:20, Paper TuCT4.4 | |
Reference-Steering Via Data-Driven Predictive Control for Hyper-Accurate Robotic Flying-Hopping Locomotion |
|
Zeng, Yicheng | University of Wisconsin - Madison |
Huang, Yuhao | University of Wisconsin-Madison |
Xiong, Xiaobin | University of Wisconsin Madison |
Keywords: Motion Control, Legged Robots, Optimization and Optimal Control
Abstract: State-of-the-art model-based control designs have been shown to be successful in realizing dynamic locomotion behaviors for robotic systems. The precision of the realized behaviors, whether flying, hopping, or walking, has not yet been well investigated, even though the mismatch between the robot model and the physical hardware inevitably produces inaccurate trajectory tracking. To address this inaccuracy, we propose a reference-steering method that bridges the model-to-real gap by establishing a data-driven input-output (DD-IO) model on top of the existing model-based design. The DD-IO model takes the reference tracking trajectories as the input and the realized tracking trajectory as the output. By utilizing data-driven predictive control, we steer the reference input trajectories online so that the realized output trajectories match the desired ones. We demonstrate our method on the robot PogoX to realize hyper-accurate hopping and flying behaviors in both simulation and hardware. This data-driven reference-steering approach is straightforward to apply to general robotic systems for performance improvement via hyper-accurate trajectory tracking.
|
|
15:20-15:25, Paper TuCT4.5 | |
Active Training Data Selection for Gaussian Process-Based Robot Dynamics Learning and Control |
|
Han, Feng | New York Institute of Technology |
Yi, Jingang | Rutgers University |
Huang, Yi | Rutgers University |
Keywords: Motion Control, Machine Learning for Robot Control
Abstract: Model-based robot control requires an accurate dynamics model, and machine learning methods can extract robot dynamics from motion data collected in simulation and experiments. The Gaussian process (GP) is one such learning method for obtaining robot dynamics. To avoid large training data sets when learning robot dynamics, we propose an active training data selection strategy. The data sampling criterion is to minimize the probability density difference between the actual model and the GP-based estimate. Under this criterion, the active training data strategy identifies where to sample the next data point for model training. We demonstrate the proposed active learning strategy with a 3-link robot arm in both fully actuated and underactuated motion modes. With a selected data set containing 150 data points, the integrated probability density error compared with the entire dataset (over 30,000 data points) is less than 0.3. The experimental results confirm that the GP-based controller outperforms the model-based one.
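The paper's criterion minimizes a probability-density difference; the sketch below instead uses maximum predictive variance, a common proxy criterion for active GP sampling, on a toy 1-D function standing in for joint-torque data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda q: np.sin(3 * q) + 0.5 * q              # toy "dynamics" to learn
pool = np.linspace(-2, 2, 400).reshape(-1, 1)      # candidate sample sites

X = pool[[0, -1]]                                  # two seed points
y = f(X).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))

for _ in range(20):                                # active selection loop
    gp.fit(X, y)
    _, std = gp.predict(pool, return_std=True)
    nxt = pool[np.argmax(std)]                     # most uncertain site next
    X = np.vstack([X, nxt])
    y = np.append(y, f(nxt))
```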
|
|
15:25-15:30, Paper TuCT4.6 | |
Adaptive Wall-Following Control for Unmanned Ground Vehicles Using Spiking Neural Networks |
|
Yang, Hengye | Midea Group |
Chen, Yanxiao | Midea Group |
Fan, Zexuan | Midea Group |
Shao, Lin | Midea Group |
Sun, Tao | Massachusetts Institute of Technology |
Keywords: Motion Control, Neural and Fuzzy Control, Robust/Adaptive Control
Abstract: Unmanned ground vehicles operating in complex environments must adaptively adjust to modeling uncertainties and external disturbances to perform tasks such as wall following and obstacle avoidance. This paper introduces an adaptive control approach based on spiking neural networks for wall fitting and tracking, which learns and adapts to unforeseen disturbances. We propose real-time wall-fitting algorithms to model unknown wall shapes and generate corresponding trajectories for the vehicle to follow. A discretized linear quadratic regulator is developed to provide a baseline control signal based on an ideal vehicle model. Point matching algorithms then identify the nearest matching point on the trajectory to generate feedforward control inputs. Finally, an adaptive spiking neural network controller, which adjusts its connection weights online based on error signals, is integrated with the aforementioned control algorithms. Numerical simulations demonstrate that this adaptive control framework outperforms the traditional linear quadratic regulator in tracking complex trajectories and following irregular walls, even in the presence of partial actuator failures and state estimation errors.
|
|
15:30-15:35, Paper TuCT4.7 | |
Deep Reinforcement Learning-Based Trajectory Tracking Framework for 4WS Robots Considering Switch of Steering Modes |
|
Bao, Runjiao | Beijing Institute of Technology |
Xu, Yongkang | Beijing Institute of Technology |
Zhang, Lin | Beijing Institute of Technology |
Yuan, Haoyu | Beijing Institute of Technology |
Si, Jinge | Beijing Institute of Technology |
Wang, Shoukun | Beijing Institute of Technology |
Niu, Tianwei | Beijing Institute of Technology |
Keywords: Motion Control, Reinforcement Learning, Wheeled Robots
Abstract: The application scenarios of automated robots are undergoing a paradigm shift from structured environments to unstructured, complex settings. In highly constrained settings such as factory inspections or disaster rescue, conventional steering systems show clear drawbacks, whereas a four-wheel independent drive and independent steering (4WS) robot provides a variety of steering modes that can effectively meet the needs of complex environments. However, how a 4WS robot autonomously selects among steering modes based on trajectory point information during trajectory tracking remains a challenging problem. This paper proposes a multi-modal trajectory tracking method that considers the switching of steering modes, decomposing the trajectory tracking task into two parts: mode decision-making and tracking control. The corresponding method is designed based on deep reinforcement learning. Additionally, a random target trajectory generator and a corresponding training interaction environment are designed to train the model in a data-driven manner. In the designed scenario, our tracker achieves more than a 30% improvement in average tracking error across all motion modes compared with model predictive control, and the decider's average decision position error is less than 2 cm. Extensive experiments demonstrate that our method achieves superior tracking performance and real-time capability compared to current methods.
|
|
15:35-15:40, Paper TuCT4.8 | |
Pursuit-Evasion for Car-Like Robots with Sensor Constraints |
|
Gonultas, Burak M | University of Minnesota |
Isler, Volkan | The University of Texas at Austin |
Keywords: Motion Control, Dynamics, Reinforcement Learning
Abstract: We study a pursuit-evasion game between two players with car-like dynamics and sensing limitations by formalizing it as a partially observable stochastic zero-sum game. The partial observability caused by the sensing constraints is particularly challenging. For example, in a situation where the agents have no visibility of each other, they need to extract information from their sensor coverage history to reason about the potential locations of their opponents. However, keeping historical information greatly increases the size of the state space. To mitigate the challenges encountered with such partially observable problems, we develop a new learning-based method that encodes historical information into a belief state and uses it to generate agent actions. Through experiments we show that the learned strategies improve over existing multiagent RL baselines by up to 16% in terms of capture rate for the pursuer. Additionally, we present experimental results showing that learned belief states are strong state estimators that can extend existing game theory solvers, and we demonstrate our method's competitiveness on problems where existing fully observable game theory solvers are computationally feasible. Finally, we deploy the learned policies on physical robots for a game between the F1TENTH and JetRacer platforms moving as fast as 2 m/s in indoor environments, showing that they can be executed on real robots.
|
|
TuCT5 |
407 |
Dynamics |
Regular Session |
|
15:00-15:05, Paper TuCT5.1 | |
A Stepwise Identification Framework for Determining the Physical Feasibility Parameters of Robot Dynamics |
|
Liu, Guanghui | Shenyang University of Technology |
Tan, Fuyuan | Shenyang University of Technology |
Fang, Lijin | Northeastern University |
Zhang, Hualiang | Shenyang Institute of Automation, Chinese Academy of Sciences |
Li, Qiang | Shenzhen Technology University |
Keywords: Dynamics, Industrial Robots, Optimization and Optimal Control
Abstract: This paper introduces a systematic approach to identifying a physically feasible set of robot dynamics parameters. The framework consists of four steps: 1) identification of the robot dynamics parameters using least squares combined with a linear friction model; 2) construction of a weighting matrix based on the least squares identification error, followed by weighted least squares identification combined with the linear friction model; 3) introduction of a nonlinear friction model to fit joint friction; 4) optimization of the remaining robot dynamics parameters to adhere to physical feasibility constraints. Various combinations of identification methods with linear or nonlinear friction models are analyzed experimentally, using a 6-DoF industrial robot and a 7-DoF collaborative robot, respectively, to demonstrate the effectiveness of the proposed identification framework. Experimental results affirm that the proposed method provides accurate estimates of the robot joint torques while maintaining the physical feasibility of the dynamics.
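Steps 1 and 2 rest on the standard linear regressor form tau = Y(q, dq, ddq) theta; the sketch below illustrates ordinary then weighted least squares on random stand-in data, with the per-sample residual weighting being an illustrative choice rather than the paper's weighting matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((2000, 12))            # stacked regressor matrix (stand-in)
theta_true = rng.standard_normal(12)
tau = Y @ theta_true + 0.05 * rng.standard_normal(2000)   # noisy joint torques

theta_ls, *_ = np.linalg.lstsq(Y, tau, rcond=None)        # step 1: ordinary LS

resid = tau - Y @ theta_ls                                # identification error
w = 1.0 / (np.abs(resid) + 1e-3)                          # residual-based weights
Yw, tauw = Y * w[:, None], tau * w
theta_wls, *_ = np.linalg.lstsq(Yw, tauw, rcond=None)     # step 2: weighted LS
```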
|
|
15:05-15:10, Paper TuCT5.2 | |
Real-Time Position-Based Deformable Human Body Dynamics for Disaster Rescue Simulation: A Stress-Driven Approach Using a Practical Neo-Hookean Constraint |
|
Wang, Xu | Hokkaido University |
Aoki, Daichi | Hokkaido University |
Zhu, Zechen | Hokkaido University |
Murakami, Soichi | Hokkaido University Hospital |
Shimoe, Takashi | Department of Orthopaedic Surgery, Wakayama Medical University |
Senoo, Taku | Hokkaido University |
Date, Hiroaki | Hokkaido University |
Shichinohe, Toshiaki | Gastroenterological Surgery II, Hokkaido University Graduate Sch |
Abe, Takashige | Department of Urology, Hokkaido University Graduate School of Me |
Kanai, Satoshi | Hokkaido University |
Konno, Atsushi | Hokkaido University |
Keywords: Dynamics, Simulation and Animation
Abstract: Position Based Dynamics (PBD) has been widely adopted for interactive simulation, particularly in applications such as virtual surgery and elastodynamics. However, many existing frameworks focus exclusively on interactive deformation, often neglecting the comprehensive analysis of stress distribution, which is a critical factor in engineering assessments. In remote disaster rescue scenarios, real-time stress visualization can provide vital insights that enable rescue teams to make informed decisions when interacting with deformable objects. In this work, we introduce a fully GPU-parallel, stress-driven simulation framework for real-time deformable human body dynamics, specifically designed for disaster rescue applications. Our approach computes the von Mises stress for each tetrahedral element using a practical Neo-Hookean material model, projects the stress onto the corresponding mesh vertices, and maps these values onto the surface for intuitive rendering. In particular, we address the convergence limitations of general Neo-Hookean constraints under a Jacobi parallel scheme by developing a robust, improved approach. This improved approach avoids the typical volume loss observed in conventional methods and better replicates the qualitative behavior of the Gauss-Seidel scheme. Quantitative experiments using 100 human body models with diverse shapes, heights, and weights demonstrate that our framework effectively maintains the original pose while delivering enhanced physical realism and informative stress visualization. This capability provides disaster rescue teams with critical insights to optimize decision-making during emergencies.
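To make the stress pipeline concrete, the sketch below computes per-tetrahedron von Mises stress from one practical Neo-Hookean Cauchy-stress expression; the material constants and this particular constitutive form are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

mu, lam = 1.0e4, 4.0e4                       # illustrative Lame-like parameters (Pa)

def von_mises(Xm, xs):
    """Von Mises stress of one tetrahedron from rest (Xm) and deformed (xs) vertices."""
    Dm = (Xm[1:] - Xm[0]).T                  # rest-shape matrix (3x3, edge vectors)
    Ds = (xs[1:] - xs[0]).T                  # deformed-shape matrix
    F = Ds @ np.linalg.inv(Dm)               # deformation gradient
    J = np.linalg.det(F)
    B = F @ F.T                              # left Cauchy-Green tensor
    sigma = (mu / J) * (B - np.eye(3)) + lam * (J - 1.0) * np.eye(3)  # Cauchy stress
    dev = sigma - np.trace(sigma) / 3.0 * np.eye(3)                   # deviatoric part
    return np.sqrt(1.5 * np.sum(dev * dev))

X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)  # rest tetrahedron
x = X.copy(); x[:, 0] *= 1.1                                       # 10% stretch in x
print(von_mises(X, x))
```

Per-vertex values for rendering would then be obtained by averaging this scalar over the tetrahedra incident to each vertex.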
|
|
15:10-15:15, Paper TuCT5.3 | |
Reservoir Computing for Torque-Restricted Pendulum Control |
|
Ye, Fan | University of Cambridge |
Bonner, Timothy | Cambridge University |
Abdulali, Arsen | Cambridge University |
Chu, Kai-Fung | University of Cambridge |
Iida, Fumiya | University of Cambridge |
Keywords: Dynamics, Body Balancing, Natural Machine Motion
Abstract: Underactuated control remains a significant challenge in robotics, often necessitating precise modeling or large amounts of data for effective controller design. To address this problem, we introduce a novel training method that utilises a Reservoir Computing (RC) framework as a model-free controller able to control a nonlinear robot with minimal training. This paper explores the application of the proposed framework to an underactuated single pendulum and achieves control performance similar to that of model-free reinforcement learning controllers while utilising just 0.5% of the data and a simple passive data collection method. We analyze 1,000 unique successful reservoir structures, examining their internal connectivity and memory properties, and identify key structural features that enhance control performance. Finally, this paper also explores the proposed controller's robustness to changes in pendulum dimensions and torque limit, with successful control achieved over a large range of varying properties without any additional training.
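For orientation, a minimal echo state network, the standard reservoir-computing building block, is sketched below on a toy prediction task; the reservoir size, spectral radius, and ridge readout are illustrative choices, not the paper's trained controller.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 1000
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state)
W_in = 0.5 * rng.standard_normal((N, 1))

u = np.sin(np.linspace(0, 20, T)).reshape(-1, 1)  # stand-in input signal
target = np.roll(u, -1)                           # task: predict the next sample

X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])              # fixed random reservoir update
    X[t] = x

reg = 1e-6 * np.eye(N)                            # ridge-regression readout:
W_out = np.linalg.solve(X.T @ X + reg, X.T @ target)  # only these weights are trained
pred = X @ W_out
```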
|
|
15:15-15:20, Paper TuCT5.4 | |
Markov Parameters Generation for Data-Based Modeling of Tensegrity Robots Considering Finite Word-Length Effects |
|
Shi, Linxuan | Soochow University |
Cao, Weizhi | Soochow University |
Chen, Muhao | University of Houston |
Shen, Yuling | Soochow University |
Keywords: Dynamics, Model Learning for Control, Flexible Robotics
Abstract: This paper studies the impact of finite word-length effects on the Markov parameters of tensegrity robots during digital simulations. First, round-off noise models are introduced, where round-off noise is applied to the system's inputs, outputs, and states. The deterministic and stochastic definitions of the Markov parameters are then presented. It is proven that stochastic Markov parameters remain invariant under finite word-length effects in linear time-invariant (LTI) systems, regardless of the round-off noise in the inputs, outputs, or states. The nonlinear tensegrity dynamics and a linearization approach are introduced, with a tensegrity morphing airfoil studied as an illustrative example. The results indicate that, using twisted input and output signals, the Markov parameters converge correctly in white noise experiments, supporting the theoretical findings. The proposed approach allows accurate Markov parameter generation via simulation tests, which can be further used for model reduction or linearization of tensegrity robots to eliminate the distortions caused by round-off errors.
|
|
15:20-15:25, Paper TuCT5.5 | |
Hybrid Data-Model-Driven External Force Estimation for Manipulators Via Generalized Momentum-Based Third-Order Observer |
|
Zhang, Haohao | Ningxia University |
Li, Yi | Ningxia University |
Wang, Yixin | Ningxia University |
Li, Chong | LPS Laboratory |
Tian, Xuhang | Liupanshan Laboratory |
Han, Yulan | Ningxia University |
Ren, ZhongYi | Ningxia University |
Keywords: Dynamics, Physical Human-Robot Interaction, Calibration and Identification
Abstract: Accurate dynamic modeling and external force estimation are crucial for high-precision robot control and applications. However, model incompleteness and external disturbances inevitably lead to a residual between the actual joint torque and the torque calculated by the identified dynamic model. To address this issue, this paper proposes a hierarchical fusion framework. First, a multi-layer perceptron neural network (MLPNN) is employed to systematically compensate for these joint torque residuals. Subsequently, a generalized momentum-based third-order external force observer is designed to enhance the accuracy of estimating external forces acting on the manipulator. This approach retains the interpretability inherent in physics-based models while augmenting generalization capability through data-driven correction. The advantages of the third-order external force observer are substantiated via comparative analysis with first- and second-order observers on a Simulink simulation platform using a 2-DOF planar manipulator. Furthermore, the effectiveness of the proposed method is validated through a dragging experiment conducted on a 6-DOF manipulator without an end-effector force/torque sensor, demonstrating its performance in practical applications.
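For context, the classical first-order generalized-momentum observer that the paper's third-order design extends is sketched below for a single joint; the inertia, gain, and gravity model are hypothetical, and the residual r converges to the external torque as a first-order filter.

```python
import numpy as np

# 1-DOF model: M * ddq = tau + tau_ext - g(q); generalized momentum p = M * dq.
dt, T = 1e-3, 5000
M, K = 0.8, 50.0                                  # inertia, observer gain (illustrative)
g = lambda q: 9.81 * 0.3 * np.cos(q)              # gravity torque of a hypothetical link

q = dq = integ = r = 0.0
r_log = []
for k in range(T):
    tau = 2.0 * np.sin(2 * np.pi * k * dt)        # commanded joint torque
    tau_ext = 1.5 if k > 2500 else 0.0            # external torque step at t = 2.5 s
    ddq = (tau + tau_ext - g(q)) / M              # plant integration (Euler)
    dq += ddq * dt
    q += dq * dt
    integ += (tau - g(q) + r) * dt                # observer's internal integral
    r = K * (M * dq - integ)                      # residual: dr/dt = K*(tau_ext - r)
    r_log.append(r)                               # r_log[-1] approaches 1.5
```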
|
|
15:25-15:30, Paper TuCT5.6 | |
On the Fully Decoupled Rigid-Body Dynamics Identification of Serial Industrial Robots |
|
Hu, Jinfei | Hong Kong Centre for Logistics Robotics |
Chen, Zelong | Zhejiang University |
Lin, Yinjie | Zhejiang University |
Chen, Zheng | Zhejiang University |
Yao, Bin | Purdue University |
Ma, Xin | Chinese University of Hong Kong |
Keywords: Dynamics Identification, Industrial Robots, Dynamics, Kinematics
Abstract: Accurate rigid-body dynamics is crucial for serial industrial robot applications such as force control and physical human-robot interaction. Despite decades of research, the precise identification of dynamic parameters, particularly low-magnitude inertia parameters, remains a challenge for serial industrial robots. Researchers usually focus on developing various parameter estimation methods, while optimizing exciting trajectories in similar ways, typically minimizing the condition number of the information matrix. However, such optimization usually fails to ensure sufficient excitation for each parameter, due to non-convex coupling effects. To address this limitation, we propose a fully decoupled rigid-body dynamics identification (FDRDI) method in this article. This approach innovatively eliminates coupling effects by using novel symmetrical exciting trajectories based on a reciprocating S-curve (RSC). This innovation enables the independent identification of dynamic parameters associated with joint friction, as well as the gravity and inertia of links and payloads. Comparative experiments show that FDRDI achieves superior identification accuracy, evidenced by re
|
|
TuCT6 |
301 |
Micro/Nano Robots 3 |
Regular Session |
Chair: Li, Xiang | Tsinghua University |
Co-Chair: Liu, Xiaoming | Beijing Institute of Technology |
|
15:00-15:05, Paper TuCT6.1 | |
Selective Motion Control of Cell Microrobots in Three-Dimensional Space |
|
Zhang, Haoyu | Southeast University |
Sun, Yimin | Souteast University |
An, Xuanyu | Southeast University |
Du, Jiansheng | Southeast University |
Luo, Shengming | Southeast University |
Cao, Ying | Southeast University |
Yu, Jiangfan | Chinese University of Hong Kong, Shenzhen |
Wang, Qianqian | Southeast University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Magnetic microrobots show great potential in micromanipulation due to their capability for motion control under external fields. However, achieving selective control of magnetic microrobots in three-dimensional (3D) space using global magnetic fields remains a challenge. In this work, we propose a selective control strategy based on a movable electromagnetic coil system, incorporating a mass-spring-damping model to achieve precise control of cell microrobots in 3D space. By combining theoretical analysis with vision-based feedback, experiments in different scenarios, including step climbing and ring traversal, validate the control capability in different environments. Furthermore, by utilizing the differences in magnetic responses among soft magnetic cell microrobots, this strategy enables the selective manipulation of multiple cell microrobots, demonstrating real-time sorting in 3D space. Our work presents a strategy that can be applied to selectively manipulate magnetic microrobots in complex environments.
|
|
15:05-15:10, Paper TuCT6.2 | |
Design and Modeling of a Micro-Coil Array Platform for the Smooth Movement of Multiple Micro-Robots |
|
Tang, Xinzhe | Jiangnan University |
Liu, Yueyue | Jiangnan University |
Fan, Qigao | Jiangnan University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Nanomanufacturing
Abstract: In recent years, local magnetic field actuation technology has garnered significant attention in the field of collaborative control of multiple micro-robots. By optimizing the design of coil array structures, gradient magnetic fields can be generated in target areas, enabling independent control of multi-robot systems. However, existing research is largely limited to step-by-step actuation modes, which often cause noticeable jitter during robot movement. Moreover, they make it difficult to achieve omnidirectional and continuous motion, severely limiting both motion smoothness and positioning accuracy. To address these issues, this study proposes a coil array actuation platform and introduces a differential current actuation strategy, effectively achieving smooth motion control of multi-robot systems. The research first analyzes the spatial magnetic field distribution characteristics through modeling; then, based on the magnetic force model, a differential current actuation strategy for multiple robots is proposed; finally, an experimental platform is constructed and a series of experiments are conducted. The experimental results show that this actuation platform can achieve independent and smooth control of multiple micro-robots, demonstrating promising potential in applications such as automated microscopic manipulation.
|
|
15:10-15:15, Paper TuCT6.3 | |
Asynchronous Rectification-Based Fast Local Imaging and Estimation Scheme for High-Speed Rotating States Observation of MM |
|
Sun, Zhiyong | The University of Hong Kong |
Cheng, Yu | Michigan State University |
Chen, Liangliang | Michigan State University |
Cheng, Erkang | Nullmax Inc |
Lei, Hong | Hainan University |
Song, Bo | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Sensor Fusion
Abstract: Magnetic microrobots (MMs) have emerged as promising tools for targeted therapies, including non-invasive in vivo treatments and precise drug delivery, owing to their untethered controllability and biocompatibility. Current actuation strategies for MMs primarily rely on two magnetic field (MF) generation approaches: gradient-based and rotational methods. Unlike the gradient method, rotational actuation enables efficient manipulation of MMs under significantly weaker magnetic fields. To fully leverage the potential of rotationally driven MMs, a comprehensive understanding of their fundamental spin motility is essential. Achieving accurate characterization of these MMs necessitates the development of an MF generation system equipped with rapid motion-tracking and broad-range measurement capabilities. This study proposes a high-speed rotating-state observation scheme built on tracking-based optimal local imaging and estimation, simultaneously meeting the requirements of broad-range observation and high imaging speed. Specifically, the CSR-DCF tracking method is adopted to detect the MM's location, based on which the observation system adjusts the imaging region optimally. An estimation scheme based on the asynchronous rectification method is derived to measure the MM's rotating states consistently using measured MF data and local optical images of the target. Experimental studies are carried out to validate the effectiveness of the proposed scheme.
|
|
15:15-15:20, Paper TuCT6.4 | |
Flow-Aware Navigation of Magnetic Micro-Robots in Complex Fluids Via PINN-Based Prediction |
|
Jia, Yongyi | Tsinghua University |
Miao, Shu | Tsinghua University |
Wu, JiaYu | Tsinghua University |
Yang, Ming | Southern University of Science and Technology |
Hu, Chengzhi | Southern University of Science and Technology |
Li, Xiang | Tsinghua University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Vision-Based Navigation
Abstract: While magnetic micro-robots have demonstrated significant potential across various applications, including drug delivery and microsurgery, precise navigation and control in complex fluid environments remains an open issue that is crucial for in vivo implementation. This paper introduces a novel flow-aware navigation and control strategy for magnetic micro-robots that explicitly accounts for the impact of fluid flow on their movement. First, the proposed method employs a Physics-Informed U-Net (PI-UNet) to refine the numerically predicted fluid velocity using local observations. Then, the predicted velocity is incorporated into a flow-aware A* path planning algorithm, ensuring efficient navigation while mitigating flow-induced disturbances. Finally, a control scheme is developed to compensate for the predicted fluid velocity, thereby optimizing the micro-robot's performance. A series of simulation studies and real-world experiments are conducted to validate the efficacy of the proposed approach. This method enhances both planning accuracy and control precision, expanding the potential applications of magnetic micro-robots in fluid-affected environments typical of many medical scenarios.
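One simple way to make A* flow-aware, as described above, is to penalize grid moves that oppose the predicted fluid velocity; in this sketch the flow field is a made-up uniform drift standing in for the PI-UNet output, and the penalty weight alpha is an illustrative parameter.

```python
import heapq
import numpy as np

flow = lambda p: np.array([0.3, 0.0])          # stand-in for the predicted velocity

def plan(start, goal, shape, alpha=2.0):
    """Grid A* whose edge cost grows when a step moves upstream."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), start)]
    gscore, came = {start: 0.0}, {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:                          # reconstruct path
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        for dx, dy in moves:
            nxt = (cur[0] + dx, cur[1] + dy)
            if not (0 <= nxt[0] < shape[0] and 0 <= nxt[1] < shape[1]):
                continue
            step = np.array([dx, dy], float)
            cost = 1.0 + alpha * max(0.0, -step @ flow(cur))  # upstream penalty
            g = gscore[cur] + cost
            if g < gscore.get(nxt, np.inf):
                gscore[nxt], came[nxt] = g, cur
                heapq.heappush(open_set, (g + h(nxt), nxt))
    return None

path = plan((0, 0), (19, 10), (20, 20))
```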
|
|
15:20-15:25, Paper TuCT6.5 | |
Non-Contact Dexterous Micromanipulation with Multiple Optoelectronic Robots |
|
Jia, Yongyi | Tsinghua University |
Miao, Shu | Tsinghua University |
Wang, Ao | Beihang University |
Ni, Caiding | Beihang University |
Feng, Lin | Beihang University |
Wang, Xiaowo | Tsinghua University |
Li, Xiang | Tsinghua University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales
Abstract: Micromanipulation systems leverage automation and robotic technologies to improve the precision, repeatability, and efficiency of various tasks at the microscale. However, current approaches are typically limited to specific objects or tasks, which necessitates the use of custom tools and specialized grasping methods. This paper proposes a novel non-contact micromanipulation method based on optoelectronic technologies. The proposed method utilizes repulsive dielectrophoretic forces generated in the optoelectronic field to drive a microrobot, enabling the microrobot to push the target object in a cluttered environment without physical contact. The non-contact nature minimizes the risks of damage, contamination, or adhesion while largely improving the flexibility of manipulation. It also enables the use of a general tool for indirect object manipulation, eliminating the need for specialized tools. A series of simulation studies and real-world experiments, including non-contact trajectory tracking, obstacle avoidance, and reciprocal avoidance between multiple microrobots, are conducted to validate the performance of the proposed method. The proposed formulation provides a general and dexterous solution for a range of objects and tasks at the micro scale.
|
|
15:25-15:30, Paper TuCT6.6 | |
Contactless and Economical Chemical Reaction Platform Based on Ultrasonic Field |
|
Li, Yunsheng | Beijing Institute of Technology |
Wang, Qiao | Beijing Institute of Technology |
Liu, Yuyan | Beijing Institute of Technology |
Yuan, Bo | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales
Abstract: Chemical reactions constitute a cornerstone of fundamental scientific inquiry, yet traditional methodologies and platforms are encumbered by excessive reagent and consumable demands. Emerging alternatives, such as microfluidic systems, while innovative, suffer from intricate fabrication processes and elevated costs associated with operator training. Other contemporary approaches face limitations including reagent compatibility constraints and prohibitively expensive instrumentation. To address these challenges, this study introduces a contactless chemical reaction platform leveraging an ultrasonic vortex field to achieve stable capture, microscale droplet transport, and sequential multi-droplet mixing without direct contact. This platform substantially reduces contamination risks, minimizes reagent and consumable usage, accommodates a broad spectrum of reagent types, and imposes minimal demands on operator expertise. Demonstrating robust performance in microdose reaction control, the system offers significant potential for advancing chemical research and its applications.
|
|
15:30-15:35, Paper TuCT6.7 | |
Dual-Bubble Coordinated Acoustic Micromanipulator for Multidirectional Object Rotation |
|
Li, Yuyang | Jiangsu University |
Zhang, Zhong-Qiang | Jiangsu University |
Miao, Chenglin | Jiangsu University |
Du, Xu | Jiangsu University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots, Biological Cell Manipulation
Abstract: Micromanipulation techniques struggle to achieve three-dimensional rotational control at the microscale without compromising biocompatibility or spatial flexibility. Conventional methods based on mechanical contact, optical forces, or confined microfluidics constrain dynamic reconfiguration and surgical accessibility. Here, we introduce a dual-bubble acoustic micromanipulator that enables multidirectional rotation through controlled hydrodynamic fields. By placing oscillating microbubbles at the tips of micropipettes, this system creates adjustable vortex patterns: a single microbubble generates toroidal flows for out-of-plane rotation, while two microbubbles produce shear forces for in-plane spinning. This approach uses simple mechanical adjustments to control rotational axes in open fluid environments, without needing frequency modulation or phase synchronization. Flow-field simulations and experiments with polystyrene microspheres confirm deterministic orientation control, and tests with shrimp embryos demonstrate rotation at clinically relevant speeds. The open architecture integrates seamlessly with standard microscopy and robotic injection systems, offering a non-contact, precise tool for applications such as polar body alignment, intracellular surgery, and 3-D imaging.
|
|
15:35-15:40, Paper TuCT6.8 | |
DMPBot: A High-Speed, High-Precision, Omnidirectional, Insect-Scale Piezoelectric Robot |
|
Chen, Yan | Beijing Institute of Technology |
Chen, Shu | Beijing Institute of Technology |
Yang, Zheyu | Beijing Institute of Technology |
Liu, Pengyu | Beijing Institute of Technology |
Zhang, Sicheng | Beijing Institute of Technology |
Deng, Ziru | Beijing Institute of Technology |
An, Junqi | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Liu, Xiaoming | Beijing Institute of Technology |
Keywords: Micro/Nano Robots
Abstract: Microrobots have garnered significant attention due to their vast potential applications across various fields. Among various types of microrobots, piezoelectric robots stand out due to their exceptional motion accuracy, low power consumption, and simple structural design. This work introduces a novel piezoelectric microrobot, the Dual-Modal Piezoelectric Robot (DMPBot), which is fabricated from an innovative carbon-fiber substrate through a heat-pressing process, with a compact size of 6 mm × 9 mm × 1.1 mm and a weight of only 0.05 g. DMPBot can achieve both high-speed and high-precision motion in non-resonant mode, as well as omnidirectional movement by integrating non-resonant and resonant modes. In non-resonant mode, the robot can reach a speed of 33 mm/s (3.67 body lengths per second) and a sub-micron resolution of 0.4 μm by adjusting the applied signal. This work presents an analysis of the design, fabrication, and performance of DMPBot, focusing on its dynamic response, motion mechanisms, high-speed and high-precision motion, and omnidirectional movement capabilities. Experimental results validate the ability of DMPBot to perform high-speed, high-precision, and omnidirectional motion, demonstrating its promising potential in the field of micromanipulation.
|
|
TuCT7 |
307 |
Motion and Path Planning 3 |
Regular Session |
|
15:00-15:05, Paper TuCT7.1 | |
Multi-Robot Coordination in an Adversarial Graph-Traversal Game |
|
Berneburg, James | George Mason University |
Wang, Xuan | George Mason University |
Xiao, Xuesu | George Mason University |
Shishika, Daigo | George Mason University |
Keywords: Motion and Path Planning, Multi-Robot Systems, Optimization and Optimal Control
Abstract: This paper studies coordinated behaviors which arise when a team of robots must traverse hazardous environments in the presence of an adversary. We formulate the scenario as a novel non-cooperative stochastic game in which the "blue" team of robots moves in an environment modeled by a time-varying graph, attempting to reach some goal with minimum cost, while the "red" player controls how the graph changes to maximize the cost. In addition to a numerical method to compute the Nash equilibrium, we also present novel theoretical analysis of security strategies that provides performance bounds in a more computationally efficient way. Through numerical simulations, we demonstrate the emergence of beneficial coordinated behavior, where the robots split up and/or synchronize to traverse risky edges.
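The security strategies mentioned above have a standard computational core: in a zero-sum matrix game, the cost-minimizing player's maximin (security) strategy is the solution of one small linear program. A minimal sketch, assuming a cost matrix C[i, j] for blue action i against red action j (how such a matrix would be built from the graph game is not taken from the paper):

import numpy as np
from scipy.optimize import linprog

def security_strategy(C):
    # minimize v  s.t.  C^T x <= v * 1, sum(x) = 1, x >= 0
    # The returned x guarantees expected cost <= v against ANY red strategy.
    m, n = C.shape
    c = np.r_[np.zeros(m), 1.0]                  # variables: x (m,), bound v (scalar)
    A_ub = np.c_[C.T, -np.ones(n)]               # row j: sum_i C[i, j] x_i - v <= 0
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]

# toy game: blue mixes over two routes, red over three edge changes
C = np.array([[3.0, 1.0, 2.0],
              [0.0, 4.0, 1.0]])
x, v = security_strategy(C)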
|
|
15:05-15:10, Paper TuCT7.2 | |
TrajFlow: Multi-Modal Motion Prediction Via Flow Matching |
|
Yan, Qi | UBC |
Zhang, Brian Zhaoning | University of Waterloo |
Zhang, Yutong | Georgia Institute of Technology |
Yang, Hsueh-Han Daniel | Carnegie Mellon University |
White, Joshua | University of British Columbia |
Chen, Di | Xsense.ai |
Liu, Jiachao | Zhejiang Leapmotor Technology Co., Ltd |
Liu, Langechuan | Nvidia |
Zhuang, Binnan | Nvidia |
Shi, Shaoshuai | Max Planck Institute for Informatics |
Liao, Renjie | University of British Columbia |
Keywords: Motion and Path Planning, Deep Learning Methods
Abstract: Efficient and accurate motion prediction is crucial for ensuring safety and informed decision-making in autonomous driving, particularly under dynamic real-world conditions that necessitate multi-modal forecasts. We introduce TrajFlow, a novel flow matching-based motion prediction framework that addresses the scalability and efficiency challenges of existing generative trajectory prediction methods. Unlike conventional generative approaches that employ i.i.d. sampling and require multiple inference passes to capture diverse outcomes, TrajFlow predicts multiple plausible future trajectories in a single pass, significantly reducing computational overhead while maintaining coherence across predictions. Moreover, we propose a ranking loss based on the Plackett-Luce distribution to improve uncertainty estimation of predicted trajectories. Additionally, we design a self-conditioning training technique that reuses the model’s own predictions to construct noisy inputs during a second forward pass, thereby improving generalization and accelerating inference. Extensive experiments on the large-scale Waymo Open Motion Dataset (WOMD) demonstrate that TrajFlow achieves state-of-the-art performance across various key metrics, underscoring its effectiveness for safety-critical autonomous driving applications. The code and other details are available on the project website https://traj-flow.github.io/.
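The Plackett-Luce ranking loss has a compact closed form. A minimal PyTorch sketch, written from the standard PL likelihood rather than the authors' code, scoring K trajectory modes against a best-to-worst ordering (e.g., by distance to the ground truth):

import torch

def plackett_luce_loss(scores, ranking):
    # P(ranking) = prod_k exp(s_{r_k}) / sum_{j >= k} exp(s_{r_j})
    s = scores[ranking]                                   # utilities in ranked order
    suffix_lse = torch.flip(torch.logcumsumexp(torch.flip(s, [0]), 0), [0])
    return -(s - suffix_lse).sum()                        # negative log-likelihood

scores = torch.randn(6, requires_grad=True)   # per-mode confidence logits
ranking = torch.tensor([2, 0, 5, 1, 4, 3])    # modes sorted best-to-worst
plackett_luce_loss(scores, ranking).backward()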
|
|
15:10-15:15, Paper TuCT7.3 | |
Active Probing with Multimodal Predictions for Motion Planning |
|
Gadginmath, Darshan | University of California Riverside |
Savvas Sadiq Ali, Farhad Nawaz | University of Pennsylvania |
Sung, Minjun | University of Illinois Urbana Champaign |
Tariq, Faizan M. | Honda Research Institute USA, Inc |
Bae, Sangjae | Honda Research Institute, USA |
Isele, David | University of Pennsylvania, Honda Research Institute USA |
Pasqualetti, Fabio | University of California, Riverside |
D'sa, Jovin | Honda Research Institute, USA |
Keywords: Motion and Path Planning, Intention Recognition, Probabilistic Inference
Abstract: Navigation in dynamic environments requires autonomous systems to reason about uncertainties in the behavior of other agents. In this paper, we introduce a unified framework that combines trajectory planning with multimodal predictions and active probing to enhance decision-making under uncertainty. We develop a novel risk metric that seamlessly integrates multimodal prediction uncertainties through mixture models. When these uncertainties follow a Gaussian mixture distribution, we prove that our risk metric admits a closed-form solution, and is always finite, thus ensuring analytical tractability. To reduce prediction ambiguity, we incorporate an active probing mechanism that strategically selects actions to improve its estimates of behavioral parameters of other agents, while simultaneously handling multimodal uncertainties. We extensively evaluate our framework in autonomous navigation scenarios using the MetaDrive simulation environment. Results demonstrate that our active probing approach successfully navigates complex traffic scenarios with uncertain predictions. Additionally, our framework shows robust performance across diverse traffic agent behavior models, indicating its broad applicability to real-world autonomous navigation challenges.
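The abstract does not state the risk metric itself; one closed-form instance of the idea, assuming another agent's predicted position follows a Gaussian mixture, is the expected quadratic proximity cost, which is finite for any valid mixture (identifiers here are illustrative):

import numpy as np

def gmm_quadratic_risk(p, weights, means, covs, Q=None):
    # E[(x - p)^T Q (x - p)] under x ~ sum_k w_k N(mu_k, S_k)
    #   = sum_k w_k [ (mu_k - p)^T Q (mu_k - p) + tr(Q S_k) ]
    Q = np.eye(len(p)) if Q is None else Q
    risk = 0.0
    for w, mu, S in zip(weights, means, covs):
        d = mu - p
        risk += w * (d @ Q @ d + np.trace(Q @ S))
    return risk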
|
|
15:15-15:20, Paper TuCT7.4 | |
Collision-Free Trajectory Planning in Cluttered Environments for Efficient Bin Picking |
|
Xu, Xiaomei | Saarland University, ZeMA - Center for Mechatronics and Automation Technology |
Bashir, Attique | ZeMA gGmbH |
Bhagiya, Jaykumar | ZeMA gGmbH |
Müller, Rainer | ZeMA gGmbH |
Keywords: Motion and Path Planning, Localization, Object Detection, Segmentation and Categorization
Abstract: In cluttered, reconfigurable environments with varied objects, materials, and sizes—where obstacles can be added, removed, or moved—planning robot arm actions between grasps is a key challenge for traditional bin picking. We propose a trajectory planner that quickly determines how to grasp and precisely assemble customized products in a reconfigurable environment. Inspired by voting and global registration methods, our strategy reduces false positives and enhances object identification accuracy, even with vastly different point cloud scales. We present an image processing technique combining best-fit and iterative closest point methods to improve system robustness, adaptability, and stability for objects of varying shapes, sizes, and materials, achieving millimeter-level 3D localization with millions of scene points. An enhanced tree method manages uneven 3D space distribution, enabling fast collision-free planning. A space configuration algorithm minimizes computational load, supporting complex tasks like liquid handling. After digital twin verification of collision-free paths, the robotic arm executes assembly tasks. This approach is validated through Lego assembly and other industrial use cases.
|
|
15:20-15:25, Paper TuCT7.5 | |
GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric |
|
Lin, Yue | Dalian University of Technology |
Zhang, Xiaoxuan | Dalian University of Technology |
Liu, Yang | Dalian University of Technology |
Wang, Dong | Dalian University of Technology |
Lu, Huchuan | Dalian University of Technology |
Keywords: Motion and Path Planning, Localization
Abstract: Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the geometric feature metric, which enhances LiDAR localization accuracy by guiding the robot to avoid degraded areas. First, we derive the Geometric Feature Metric (GFM) from the fundamental LiDAR localization problem. Next, we design a 2D grid-based Metric Encoding Map (MEM) to efficiently store GFM values across the environment. A constant-time decoding algorithm is further proposed to retrieve GFM values for arbitrary poses from the MEM. Finally, we develop a perception-aware trajectory planning algorithm that improves LiDAR localization capabilities by guiding the robot in selecting trajectories through feature-rich areas. Both simulation and real-world experiments demonstrate that our approach enables the robot to actively select trajectories that significantly enhance LiDAR localization accuracy.
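The constant-time decoding claim is natural once the metric lives on a grid: a lookup is index arithmetic plus a bilinear blend. A minimal sketch (position-only, ignoring heading; all names are hypothetical, not the paper's implementation):

import numpy as np

class MetricEncodingMap:
    def __init__(self, values, origin, resolution):
        self.v = np.asarray(values, dtype=float)   # (H, W) precomputed metric values
        self.ox, self.oy = origin                  # world coordinates of cell (0, 0)
        self.res = float(resolution)

    def decode(self, x, y):                        # O(1) for any query position
        fx = (x - self.ox) / self.res
        fy = (y - self.oy) / self.res
        j0 = int(np.clip(np.floor(fx), 0, self.v.shape[1] - 2))
        i0 = int(np.clip(np.floor(fy), 0, self.v.shape[0] - 2))
        tx = float(np.clip(fx - j0, 0.0, 1.0))
        ty = float(np.clip(fy - i0, 0.0, 1.0))
        top = (1 - tx) * self.v[i0, j0] + tx * self.v[i0, j0 + 1]
        bot = (1 - tx) * self.v[i0 + 1, j0] + tx * self.v[i0 + 1, j0 + 1]
        return (1 - ty) * top + ty * bot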
|
|
15:25-15:30, Paper TuCT7.6 | |
A Constrained Motion Planning Method Exploiting Learned Latent Space for High-Dimensional State and Constraint Spaces (I) |
|
Park, Suhan | Kwangwoon University |
Jeon, Suhyun | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Motion and Path Planning, Manipulation Planning, AI-Based Methods
Abstract: This article presents a novel approach to address high-dimensional constrained motion planning problems. The proposed method exploits learned latent spaces to efficiently find a feasible constrained path. Although recent data-driven methods have reduced planning time, two major challenges arise as the space dimensionality increases, such as in multi-arm manipulation. First, the configuration space search employed by existing data-driven approaches becomes computationally expensive as the complexity of constraints increases. Second, preparing datasets for high-dimensional problems is a time-consuming task. To address these challenges, this article introduces a novel approach: the latent motion method. Instead of exploring the configuration space, the latent motion method explores the latent space with a latent jump method that mitigates topological problems. In addition, a tangent space dataset augmentation technique is employed to approximate the manifold using a sparse dataset. Experimental evaluations on benchmark problems indicate that the proposed approach outperforms existing methods in high-dimensional constrained motion planning problems.
|
|
15:30-15:35, Paper TuCT7.7 | |
Multi-Robot Motion Planning with Cooperative Localization |
|
Theurkauf, Anne | University of Colorado Boulder |
Ahmed, Nisar | University of Colorado Boulder |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Motion and Path Planning, Multi-Robot Systems, Cooperating Robots
Abstract: We consider the uncertain multi-robot motion planning (MRMP) problem with cooperative localization (CL-MRMP), under both motion and measurement noise, where each robot can act as a sensor for its nearby teammates. We formalize CL-MRMP as a chance-constrained motion planning problem, and propose a safety-guaranteed algorithm that explicitly accounts for robot-robot correlations. Our approach extends a sampling-based planner to solve CL-MRMP while preserving probabilistic completeness. To improve efficiency, we introduce novel biasing techniques. We evaluate our method on diverse benchmarks, showing effective planning and significant gains from biasing strategies.
|
|
15:35-15:40, Paper TuCT7.8 | |
EffiTune: Diagnosing and Mitigating Training Inefficiency for Parameter Tuner in Robot Navigation System |
|
Feng, Shiwei | Purdue University |
Chen, Xuan | Purdue University |
Xiong, Zikang | Deeproute.ai |
Cheng, Zhiyuan | Purdue University |
Gao, Yifei | Purdue University |
Cheng, Siyuan | Purdue University |
Kate, Sayali | Purdue University |
Zhang, Xiangyu | Purdue University |
Keywords: Motion and Path Planning, Deep Learning Methods
Abstract: Robot navigation systems are critical for various real-world applications such as delivery services, hospital logistics, and warehouse management. Although classical navigation methods provide interpretability, they rely heavily on expert manual tuning, limiting their adaptability. Conversely, purely learning-based methods offer adaptability but often lead to instability and erratic robot behaviors. Recently introduced parameter tuners aim to balance these approaches by integrating data-driven adaptability into classical navigation frameworks. However, the parameter tuning process currently suffers from training inefficiencies and redundant sampling, with critical regions in the environment often underrepresented in training data. In this paper, we propose EffiTune, a novel framework designed to diagnose and mitigate training inefficiency for parameter tuners in robot navigation systems. EffiTune first performs robot-behavior-guided diagnostics to pinpoint critical bottlenecks and underrepresented regions. It then employs a targeted up-sampling strategy to enrich the training dataset with critical samples, significantly reducing redundancy and enhancing training efficiency. Our comprehensive evaluation demonstrates that EffiTune achieves more than a 13.5% improvement in navigation performance, enhanced robustness in out-of-distribution scenarios, and a 4× improvement in training efficiency within the same computational budget.
|
|
TuCT8 |
308 |
Medical Robots and Systems 3 |
Regular Session |
|
15:00-15:05, Paper TuCT8.1 | |
DARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale |
|
Liu, Yihao | Johns Hopkins University |
Ku, Yu-Chun | Johns Hopkins University |
Zhang, Jiaming | Johns Hopkins University |
Ding, Hao | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Keywords: Medical Robots and Systems, AI-Enabled Robotics, Imitation Learning
Abstract: Data scarcity has long been an issue in the robot learning community. Particularly, in safety-critical domains like surgical applications, obtaining high-quality data can be especially difficult. It poses challenges to researchers seeking to exploit recent advancements in reinforcement learning and imitation learning, which have greatly improved generalizability and enabled robots to conduct tasks autonomously. We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks. By eliminating the need for a physical robot setup and providing flexibility in terms of time, space, and hardware resources—such as multiview sensors and actuators—specialized simulation is a viable alternative. At the same time, AR allows the robot data collection to be more egocentric, supported by its body tracking and content overlaying capabilities. Our user study confirms the proposed system's efficiency and usability, where we use widely-used primitive tasks for training teleoperation with da Vinci surgical robots. Data throughput improves across all tasks compared to real robot settings by 41% on average. The total experiment time is reduced by an average of 10%. The temporal demand in the task load survey is improved. These gains are statistically significant. Additionally, the collected data is over 400 times smaller in size, requiring far less storage while achieving double the frequency. The source code for this project can be accessed at https://dartvinci.finite-state.com/.
|
|
15:05-15:10, Paper TuCT8.2 | |
RoboNurse-VLA: Robotic Scrub Nurse System Based on Vision-Language-Action Model |
|
Li, Shunlei | The Chinese University of Hong Kong |
Wang, Jin | Italian Institute of Technology |
Dai, Rui | Istituto Italiano Di Tecnologia |
Ma, Wanyu | The Hong Kong Polytechnic University |
Ng, Wing Yin | The Chinese University of Hong Kong |
Hu, Yingbai | Technische Universität München |
Li, Zheng | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, AI-Enabled Robotics, Perception for Grasping and Manipulation
Abstract: In modern healthcare, the demand for autonomous robotic assistants has grown significantly, particularly in the operating room, where surgical tasks require precision and reliability. Robotic scrub nurses have emerged as a promising solution to improve efficiency and reduce human error during surgery. However, challenges remain in terms of accurately grasping and handing over surgical instruments, especially when dealing with complex objects in dynamic environments. In this work, we introduce RoboNurse-VLA, a novel robotic scrub nurse system based on a Vision-Language-Action (VLA) model. RoboNurse-VLA integrates Segment Anything Model 2 (SAM 2) and Llama 2, leveraging an LLM head to enhance reasoning capabilities. By combining SAM 2’s mask generation with Llama 2’s advanced reasoning, RoboNurse-VLA can accurately interpret task requirements, identify optimal grasping points, and determine appropriate handover poses. Designed for real-time operation, RoboNurse-VLA enables precise grasping and seamless handover of surgical instruments based on voice commands from the surgeon. Utilizing state-of-the-art vision and language models, it effectively addresses challenges related to object detection, pose optimization, and handling difficult-to-grasp instruments. Extensive evaluations demonstrate that RoboNurse-VLA outperforms existing models, achieving high success rates in surgical instrument handovers, even for previously unseen tools and complex objects. This work represents a significant advancement in autonomous surgical assistance, highlighting the potential of VLA models for real-world medical applications. More details can be found at https://robonurse-vla.github.io.
|
|
15:10-15:15, Paper TuCT8.3 | |
Autonomous Image-To-Grasp Robotic Suturing Using Reliability-Driven Suture Thread Reconstruction |
|
Joglekar, Neelay | University of California, San Diego |
Liu, Fei | University of Tennessee Knoxville |
Richter, Florian | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Surgical Robotics: Planning
Abstract: Automating suturing during robotically-assisted surgery reduces the burden on the operating surgeon, enabling them to focus on making higher-level decisions rather than fatiguing themselves in the numerous intricacies of a surgical procedure. Accurate suture thread reconstruction and grasping are vital prerequisites for suturing, particularly for avoiding entanglement with surgical tools and performing complex thread manipulation. However, such methods must be robust to heavy perceptual degradation resulting from noise and thread feature sparsity in endoscopic images. We develop a reconstruction algorithm that utilizes quadratic programming optimization to fit smooth splines to thread observations, satisfying reliability bounds estimated from measured observation noise. Additionally, we craft a grasping policy that generates gripper trajectories that maximize the probability of a successful grasp. Our full image-to-grasp pipeline is rigorously evaluated with over 400 grasping trials, exhibiting state-of-the-art accuracy. We show that this strategy can be applied to the various techniques in autonomous suture needle manipulation to achieve autonomous surgery in a generalizable way.
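The reconstruction step is a quadratic program. The paper's spline basis and reliability constraints are not given in the abstract, so the sketch below substitutes a simple hat-function basis and folds the per-point noise estimates into weights; the resulting smoothness-regularized weighted least squares is a QP with the closed-form solution shown.

import numpy as np

def fit_smooth_polyline(obs, sigma, n_ctrl=20, smooth=1.0):
    # obs: (N, 3) ordered thread observations; sigma: (N,) per-point noise std.
    # Minimizes sum_i ||c(t_i) - obs_i||^2 / sigma_i^2 + smooth * ||D2 c||^2.
    N = len(obs)
    t = np.linspace(0.0, 1.0, N)
    knots = np.linspace(0.0, 1.0, n_ctrl)
    # hat-function basis: A[i, k] = influence of control point k at parameter t_i
    A = np.maximum(0.0, 1.0 - np.abs((t[:, None] - knots[None, :]) * (n_ctrl - 1)))
    W = 1.0 / np.asarray(sigma) ** 2               # reliability weights
    D2 = np.diff(np.eye(n_ctrl), n=2, axis=0)      # second-difference smoothness operator
    lhs = A.T @ (W[:, None] * A) + smooth * D2.T @ D2
    ctrl = np.linalg.solve(lhs, A.T @ (W[:, None] * obs))
    return knots, ctrl                             # (n_ctrl, 3) smooth control polygon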
|
|
15:15-15:20, Paper TuCT8.4 | |
Quality-Driven Adaptive Control Framework for Robotic Ultrasound Imaging of Vascular Anatomies |
|
Wang, Bo | Politecnico Di Milano |
Fu, Junling | Politecnico Di Milano |
Ferrigno, Giancarlo | Politecnico Di Milano |
De Momi, Elena | Politecnico Di Milano |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: This paper proposes a quality-driven adaptive control framework for robotic vascular anatomies scanning to facilitate the acquisition of high-quality ultrasound (US) images. Specifically, a novel probability-based US image quality evaluation metric for vascular anatomies is introduced, leveraging an image segmentation network to establish a mapping between the controlled variables of the robotic US system (e.g., pose and force) and US image quality. Furthermore, an adaptive US probe control strategy driven by US image quality is developed to optimize real-time image acquisition, with its stability rigorously proven. To assess the effectiveness of the proposed framework, two experiments were conducted on a human tissue-mimicking phantom, encompassing both static and dynamic scenarios. The experimental results demonstrate that the proposed framework ensures stable contact force and significantly enhances US image quality for robot-assisted vascular anatomy imaging, even in the presence of external disturbances.
|
|
15:20-15:25, Paper TuCT8.5 | |
Robotic Ultrasound-Guided Femoral Artery Reconstruction of Anatomically-Representative Phantoms |
|
Al-Zogbi, Lidia | Johns Hopkins University |
Raina, Deepak | Indian Institute of Technology Mandi |
Pandian, Vinciya | Johns Hopkins University |
Fleiter, Thorsten | University of Maryland School of Medicine, R Adams Cowley Shock Trauma Center |
Krieger, Axel | Johns Hopkins University |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Femoral artery access is essential for numerous clinical procedures, including diagnostic angiography, therapeutic catheterization, and emergency interventions. Despite its critical role, successful vascular access remains challenging due to anatomical variability, overlying adipose tissue, and the need for precise ultrasound (US) guidance. Needle placement errors can result in severe complications, thereby limiting the procedure to highly skilled clinicians operating in controlled hospital environments. While robotic systems have shown promise in addressing these challenges through autonomous scanning and vessel reconstruction, clinical translation remains limited due to reliance on simplified phantom models that fail to capture human anatomical complexity. In this work, we present a method for autonomous robotic US scanning of bifurcated femoral arteries, and validate it on five vascular phantoms created from real patient computed tomography (CT) data. Additionally, we introduce a video-based deep learning US segmentation network tailored for vascular imaging, enabling improved 3D arterial reconstruction. The proposed network achieves a Dice score of 89.21% and an Intersection over Union of 80.54% on a new vascular dataset. The reconstructed artery centerline is evaluated against ground truth CT data, showing an average L2 error of 0.91 ± 0.70 mm, with an average Hausdorff distance of 4.36 ± 1.11 mm. This study is the first to validate an autonomous robotic system for US scanning of the femoral artery on a diverse set of patient-specific phantoms, introducing a more advanced framework for evaluating robotic performance in vascular imaging and intervention.
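For reference, the two segmentation metrics reported above are computed from binary masks as follows (a minimal sketch):

import numpy as np

def dice_iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-9)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-9)
    return dice, iou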
|
|
15:25-15:30, Paper TuCT8.6 | |
A Surgical State Identifying Method Based on BiLSTM with Vibration Processing for Improving Safety of Bone Milling System |
|
Liu, Jinyu | Nankai University |
Zhan, Yuanzhu | Pennsylvania State University |
Jia, Wenduo | Nankai University |
Dai, Yu | Nankai University |
Zhang, Jianxun | Nankai University |
Keywords: Medical Robots and Systems, Failure Detection and Recovery, Deep Learning Methods
Abstract: In spinal surgery, ensuring surgical precision and safety is paramount. Traditionally, surgeons have relied on their experience to determine when to cease milling as the cutter approaches the spinal cord. However, improper technique during this process can lead to complications, such as vertebral plate fractures and spinal cord injuries. This paper investigates the development of a robot capable of high-precision recognition of the milling state. Initially, we identify vibration signals as the basis for state recognition, establishing their feasibility through theoretical analysis, which provides a foundation for the creation of datasets for subsequent milling experiments. We then conducted milling experiments using pig scapulae and designed neural networks for state identification. Vibration signals corresponding to varying milling depths and proportions of the cortical and cancellous bone layers were collected. A BiLSTM-based neural network was developed to identify the milling depth and the proportion of the bone layers, achieving the desired outcomes within an acceptable error range. The results demonstrate that the proposed system achieves high accuracy in state recognition, with errors falling within an acceptable range. This research highlights the potential of integrating advanced neural networks and vibration analysis into robotic systems to enhance precision and safety in spinal surgery.
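A minimal PyTorch sketch of such a BiLSTM state identifier; the layer sizes, window length, and four-class output are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class MillingStateBiLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # class logits from the final time step

model = MillingStateBiLSTM()
logits = model(torch.randn(8, 1000, 3))   # e.g., 1 s of 3-axis vibration at 1 kHz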
|
|
15:30-15:35, Paper TuCT8.7 | |
Design and Kinematics for the Cystoscope of a Transurethral Continuum Surgical Robotic System |
|
Kuang, Haomin | Shanghai Jiao Tong University |
Guo, Jiaxin | The Chinese University of Hong Kong |
Chen, Wei | The Chinese University of Hong Kong |
Xu, Kai | Shanghai Jiao Tong University |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Kinematics, Compliant Joints and Mechanisms
Abstract: To achieve en bloc resection of bladder tumor and the anterior tumor resection in transurethral resection of bladder tumor (TURBT), a transurethral continuum robotic system with a cystoscope has been proposed. The continuum cystoscope in the system needs to bend more than 180°, and its base has translation, axial rotation, and tilt degrees of freedom to achieve full bladder accessibility. Under the constant-curvature assumption, the analytical solution of inverse kinematics already exists for multi-segment continuum robots with variable segment lengths and continuum robots with two inextensible segments. However, there is a lack of an analytical inverse kinematics solution for continuum robots with the features of the continuum cystoscope. Therefore, this paper proposes a novel and efficient inverse kinematics solving algorithm for the continuum cystoscope used in TURBT. The proposed method simplifies the inverse kinematics problem by constructing a robot plane coordinate frame and uses geometric relationships to derive a non-linear constraint equation containing only one intermediate variable. By solving this non-linear equation, the solution to the entire inverse kinematics problem is obtained. Additionally, based on this inverse kinematics algorithm, the length of the continuum segment is designed to ensure full bladder accessibility. In comparative experiments with the Jacobian-based method, involving 12,500 target poses, the proposed method solves 100% of the inverse kinematics problems with much greater computational efficiency.
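The paper's key computational idea, collapsing the full IK to one nonlinear scalar equation, can be illustrated on a much simpler toy geometry (this planar example is an assumption, not the cystoscope's kinematics): a constant-curvature segment of arc length L on a translating base, where the bending angle theta is the single root-finding variable and the remaining degree of freedom follows in closed form.

import numpy as np
from scipy.optimize import brentq

L = 0.05   # segment arc length in metres (illustrative)

def planar_ik(x, z):
    # Tip of a constant-curvature segment bent by theta on a base translation d:
    #   x = (L / theta) * (1 - cos theta),   z = d + (L / theta) * sin theta
    # One scalar constraint g(theta) = 0 fixes theta; d then follows in closed form.
    g = lambda th: (L / th) * (1.0 - np.cos(th)) - x
    theta = brentq(g, 1e-9, np.pi)        # bracket valid for 0 < x < 2 * L / np.pi
    d = z - (L / theta) * np.sin(theta)
    return theta, d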
|
|
15:35-15:40, Paper TuCT8.8 | |
Compact Modular Surgical System with a Novel RCM Mechanism for Laparo-Endoscopic Single-Site Surgery |
|
Zhang, Ziwei | National University of Singapore |
Chng, Chin-Boon | National University of Singapore |
Chui, Chee Kong | National University of Singapore |
Keywords: Medical Robots and Systems, Mechanism Design, Dual Arm Manipulation
Abstract: This paper presents a novel robotic system designed to address the challenges of a robotic-assisted Laparo-Endoscopic single-site (LESS) surgery. While numerous studies have explored mechanisms enabling Remote Center of Motion (RCM) for minimally invasive procedures, few have investigated the interactions of multiple (≥2) manipulators within an integrated system. The proposed robotic system features an arc-shaped mainframe for positioning and mounting of five-bar Spherical Parallel Mechanism (SPM). A conventional straight-shaft surgical tool is inserted through a linear guide rail aligned with the pointing axis leading into the RCM. To account for the physical dimensions of the closed-chain linkages, a modified Denavit-Hartenberg parameterization is adopted to assign spherical linkage frames. This design approach ensures self-collision avoidance within the parallel mechanism and enables systematic evaluation of the distal end-effector’s linear motion characteristics. Furthermore, we investigate the basic functions of SPM manipulators through a prototype experiment, providing preliminary insights that could inform future enhancement of surgical techniques for robotic-assisted LESS procedures.
|
|
TuCT9 |
309 |
Semantic Scene Understanding: Visual Learning |
Regular Session |
|
15:00-15:05, Paper TuCT9.1 | |
MinkOcc: Towards Real-Time Label-Efficient Semantic Occupancy Prediction |
|
Sze, Samuel | University of Oxford |
De Martini, Daniele | University of Oxford |
Kunze, Lars | UWE Bristol |
Keywords: Semantic Scene Understanding, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Developing 3D semantic occupancy prediction models often relies on dense 3D annotations for supervised learning, a process that is both labor and resource-intensive, underscoring the need for label-efficient or even label-free approaches. To address this, we introduce MinkOcc, a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs that proposes a two-step semi-supervised training procedure. Here, a small dataset of explicit 3D annotations warm-starts the training process; then, the supervision is continued by simpler-to-annotate accumulated LiDAR sweeps and images -- semantically labelled through vision foundation models. MinkOcc effectively utilizes these sensor-rich supervisory cues and reduces reliance on manual labeling by 90% while maintaining competitive accuracy. In addition, the proposed model incorporates information from LiDAR and camera data through early fusion and leverages sparse convolution networks for real-time prediction. With its efficiency in both supervision and computation, we aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving.
|
|
15:05-15:10, Paper TuCT9.2 | |
Learning Upright and Forward-Facing Object Poses Using Category-Level Canonical Representations |
|
Han, Bing | Xi'an Jiaotong University |
Pan, Ruitao | Xi'an Jiaotong University |
Zhang, Xinyu | Xi'an Jiaotong University |
Wang, Chenxi | Xi'an Jiaotong University |
Zhai, Zhi | Xi'an Jiaotong University |
Zhao, Zhibin | Xi'an Jiaotong University |
Chen, Xuefeng | Xi'an Jiaotong University |
Keywords: Semantic Scene Understanding, Data Sets for Robotic Vision, Perception for Grasping and Manipulation
Abstract: Constructing a unified canonical pose representation for 3D object categories is essential for pose estimation and robotic scene understanding. Previous unified pose representations typically rely on manual alignment, as in ShapeNet and ModelNet. Recently, self-supervised canonicalization methods have been proposed; however, they are sensitive to intra-class shape variations, and their canonical pose representations fail to align with object-centric coordinate systems. In this paper, we propose a category-level canonicalization method that mitigates the impact of shape variations and extends the canonical pose representation to upright and forward-facing states. First, we design a Siamese VN Module (SVNM) to achieve SE(3)-equivariant modeling and self-supervised disentanglement of 3D shape and pose attributes. Next, we introduce a Siamese equivariance constraint to address the errors caused by shape deformation. Finally, we propose a method that generates upright surface labels from pose-unknown in-the-wild data and uses upright and symmetry losses to correct the canonical pose. Experimental results show that our method not only achieves SOTA consistency performance but also aligns with object-centric coordinate systems.
|
|
15:10-15:15, Paper TuCT9.3 | |
Reducing Scene Graph Generation Parameters towards UAV Understanding of Structured Environments |
|
Li, Xudong | National University of Defense Technology |
Wang, Chang | National University of Defense Technology |
Niu, Yifeng | National University of Defense Technology |
Wu, Lizhen | National University of Defense Technology |
Yuan, Man | National University of Defence Technology |
Keywords: Semantic Scene Understanding, Recognition, Deep Learning for Visual Perception
Abstract: Scene graph generation (SGG) is a structured approach to understanding real-world scenes with complex relations, which can enhance UAV autonomy in unfamiliar environments. However, SGG typically has numerous model parameters that require considerable computational resources. This paper proposes a refined SGG model and reduces the model parameters for its UAV applications. First, we use subject-object query pairs to predict triplets directly, eliminating the need for separate entity predictions. Additionally, the cross-attention mechanism enhances the model's ability to query triplets. We use a single decoder to process subject and object entities simultaneously, enhancing computational speed and reducing the number of parameters. Then, we map the entities to the relational semantic space before performing relations classification, which improves the model performance by adding a small number of parameters. Finally, the set prediction loss function is designed for relation prediction to strengthen the role of relation prediction in triplets. Real-world UAV experiments show that our model can extract more triplets per second with fewer parameters than the benchmarks. Github: https://github.com/SupersPig/myLGTR
|
|
15:15-15:20, Paper TuCT9.4 | |
Do Visual-Language Grid Maps Capture Latent Semantics? |
|
Pekkanen, Matti | Aalto University |
Mihaylova, Tsvetomila | Aalto University |
Verdoja, Francesco | Aalto University |
Kyrki, Ville | Aalto University |
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent semantics in the map. They allow moving from a limited set of human-created labels toward open-vocabulary scene understanding, which is required for service robots operating in complex real-world environments and interacting with humans. While there is anecdotal evidence that maps built this way support downstream tasks, such as navigation, rigorous analysis of the quality of the maps using these embeddings is missing. In this paper, we propose a way to analyze the quality of maps created using VLMs. We investigate two critical properties of map quality: queryability and distinctness. The evaluation of queryability addresses the ability to retrieve information from the embeddings. We investigate intra-map distinctness to study the ability of the embeddings to represent abstract semantic classes, and inter-map distinctness to evaluate the generalization properties of the representation. We propose metrics to evaluate these properties and evaluate two state-of-the-art mapping methods, VLMaps and OpenScene, with two encoders, LSeg and OpenSeg, on real-world data from the Matterport3D dataset. Our findings show that while 3D features improve queryability, they are not scale invariant, whereas image-based embeddings generalize to multiple map resolutions. This allows the image-based methods to maintain smaller map sizes, which can be crucial for using these methods in real-world deployments. Furthermore, we show that the choice of the encoder has an effect on the results. The results imply that properly thresholding open-vocabulary queries is an open problem.
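In its simplest form, the queryability being evaluated reduces to ranking map-cell embeddings by cosine similarity against a text query embedding and thresholding, which is also where the thresholding problem noted above arises. A minimal sketch with hypothetical inputs:

import numpy as np

def query_map(cell_embeddings, text_embedding, threshold=0.25):
    # cell_embeddings: (M, D) VLM embeddings stored in the map;
    # text_embedding: (D,) embedding of the open-vocabulary query.
    C = cell_embeddings / np.linalg.norm(cell_embeddings, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    sims = C @ t
    return np.where(sims >= threshold)[0], sims   # matched cells and raw scores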
|
|
15:20-15:25, Paper TuCT9.5 | |
PanopticSplatting: End-To-End Panoptic Gaussian Splatting |
|
Xie, Yuxuan | Zhejiang University |
Yu, Xuan | Zhejiang University |
Jiang, Changjian | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | Huawei |
Fan, Rui | Tongji University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Semantic Scene Understanding, RGB-D Perception
Abstract: Open-vocabulary panoptic reconstruction is a challenging task for simultaneous scene reconstruction and understanding. Recently, methods have been proposed for 3D scene understanding based on Gaussian splatting. However, these methods are multi-staged, suffering from accumulated errors and dependence on hand-designed components. To streamline the pipeline and achieve global optimization, we propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction. Our method introduces query-guided Gaussian segmentation with local cross attention, lifting 2D instance masks without cross-frame association in an end-to-end way. The local cross attention within view frustum effectively reduces the training memory, making our model more accessible to large scenes with more Gaussians and objects. In addition, to address the challenge of noisy labels in 2D pseudo masks, we propose label blending to promote consistent 3D segmentation with less noisy floaters, as well as label warping on 2D predictions which enhances multi-view coherence and segmentation accuracy. Our method demonstrates strong performance in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets, compared with both NeRF-based and Gaussian-based panoptic reconstruction methods. Moreover, PanopticSplatting can be easily generalized to numerous variants of Gaussian splatting, and we demonstrate its robustness on different Gaussian base models.
|
|
15:25-15:30, Paper TuCT9.6 | |
GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction |
|
Li, Tianhao | Fudan University |
Li, Yang | East China Normal University |
Li, Mengtian | Shanghai University |
Deng, YiSheng | Fudan University |
Ge, Weifeng | Fudan University |
Keywords: Semantic Scene Understanding
Abstract: Accurately perceiving dynamic environments is a fundamental task for autonomous driving and robotic systems. Existing methods inadequately utilize temporal information, relying mainly on local temporal interactions between adjacent frames and failing to leverage global sequence information effectively. To address this limitation, we investigate how to effectively aggregate global temporal features from temporal sequences, aiming to achieve occupancy representations that efficiently utilize global temporal information from historical observations. For this purpose, we propose a global temporal aggregation denoising network named GTAD, introducing a global temporal information aggregation framework as a new paradigm for holistic 3D scene understanding. Our method employs an in-model latent denoising network to aggregate local temporal features from the current moment and global temporal features from historical sequences. This approach enables the effective perception of both fine-grained temporal information from adjacent frames and global temporal patterns from historical observations. As a result, it provides a more coherent and comprehensive understanding of the environment. Extensive experiments on the nuScenes and Occ3D-nuScenes benchmarks and ablation studies demonstrate the superiority of our method.
|
|
15:30-15:35, Paper TuCT9.7 | |
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction |
|
Rotondi, Dennis | University of Stuttgart |
Scaparro, Fabio | University of Stuttgart |
Blum, Hermann | Uni Bonn | Lamarr Institute |
Arras, Kai Oliver | University of Stuttgart |
Keywords: Semantic Scene Understanding, Perception for Grasping and Manipulation, RGB-D Perception
Abstract: The concept of 3D scene graphs is increasingly recognized as a powerful semantic and hierarchical representation of the environment. Current approaches often address this at a coarse, object-level resolution. In contrast, our goal is to develop a representation that enables robots to directly interact with their environment by identifying both the location of functional interactive elements and how these can be used. To achieve this, we focus on detecting and storing objects at a finer resolution, focusing on affordance-relevant parts. The primary challenge lies in the scarcity of data that extends beyond instance-level detection and the inherent difficulty of capturing detailed object features using robotic sensors. We leverage currently available 3D resources to generate 2D data and train a detector, which is then used to augment the standard 3D scene graph generation pipeline. Through our experiments, we demonstrate that our approach achieves functional element segmentation comparable to state-of-the-art 3D models and that our augmentation enables task-driven affordance grounding with higher accuracy than the current solutions. See our project page at https://fungraph.github.io.
|
|
15:35-15:40, Paper TuCT9.8 | |
GaussianGraph: 3D Gaussian-Based Scene Graph Generation for Open-World Scene Understanding |
|
Wang, Xihan | Beijing Institute of Technology |
Yang, Dianyi | Beijing Institute of Technology |
Gao, Yu | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Visual Learning
Abstract: Recent advancements in 3D Gaussian Splatting (3DGS) have significantly improved semantic scene understanding, enabling natural language queries to localize objects within a scene. However, existing methods primarily focus on embedding compressed CLIP features into 3D Gaussians, suffering from low object segmentation accuracy and lacking spatial reasoning capabilities. To address these limitations, we propose GaussianGraph, a novel framework that enhances 3DGS-based scene understanding by integrating adaptive semantic clustering and scene graph generation. We introduce a ‘Control-Follow’ clustering strategy, which dynamically adapts to scene scale and feature distribution, avoiding feature compression and significantly improving segmentation accuracy. Additionally, we enrich scene representation by integrating object attributes and spatial relations extracted from 2D foundation models. To address inaccuracies in spatial relationships, we propose 3D correction modules that filter implausible relations through spatial consistency verification, ensuring reliable scene graph construction. Extensive experiments on three datasets demonstrate that GaussianGraph outperforms state-of-the-art methods in both semantic segmentation and object grounding tasks, providing a robust solution for complex scene understanding and interaction.
|
|
TuCT10 |
310 |
Semantic Scene Understanding: Segmentation and Mapping |
Regular Session |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
15:00-15:05, Paper TuCT10.1 | |
MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving |
|
Monninger, Thomas | Mercedes-Benz AG, University of Stuttgart |
Zhang, Zihan | University of California San Diego |
Mo, Zhipeng | Mercedes-Benz Research and Development North America |
Anwar, Md Zafar | Penn State University |
Staab, Steffen | University of Stuttgart |
Ding, Sihao | Mercedes-Benz R&D North America |
Keywords: Semantic Scene Understanding, Sensor Fusion, Deep Learning for Visual Perception
Abstract: Autonomous driving requires an understanding of the static environment from sensor data. Learned Bird's-Eye View (BEV) encoders are commonly used to fuse multiple inputs, and a vector decoder predicts a vectorized map representation from the latent BEV grid. However, traditional map construction models provide deterministic point estimates, failing to capture uncertainty and the inherent ambiguities of real-world environments, such as occlusions and missing lane markings. We propose MapDiffusion, a novel generative approach that leverages the diffusion paradigm to learn the full distribution of possible vectorized maps. Instead of predicting a single deterministic output from learned queries, MapDiffusion iteratively refines randomly initialized queries, conditioned on a BEV latent grid, to generate multiple plausible map samples. This allows aggregating samples to improve prediction accuracy and deriving uncertainty estimates that directly correlate with scene ambiguity. Extensive experiments on the nuScenes dataset demonstrate that MapDiffusion achieves state-of-the-art performance in online map construction, surpassing the baseline by 5% in single-sample performance. We further show that aggregating multiple samples consistently improves performance along the ROC curve, validating the benefit of distribution modeling. Additionally, our uncertainty estimates are significantly higher in occluded areas, reinforcing their value in identifying regions with ambiguous sensor input. By modeling the full map distribution, MapDiffusion enhances the robustness and reliability of online vectorized HD map construction, enabling uncertainty-aware decision-making for autonomous vehicles in complex environments.
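The aggregation-and-uncertainty step admits a simple sketch: draw several map samples from the generative model, average them for the point estimate, and use the spread across samples as the uncertainty (high in occluded, ambiguous regions). The sketch assumes map elements are already matched across samples, which in practice requires an association step:

import numpy as np

def aggregate_map_samples(samples):
    # samples: (S, E, P, 2) - S sampled maps, E map elements, P polyline points each
    samples = np.asarray(samples)
    mean_map = samples.mean(axis=0)                   # aggregated prediction
    uncertainty = samples.std(axis=0).mean(axis=-1)   # (E, P) per-point spread
    return mean_map, uncertainty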
|
|
15:05-15:10, Paper TuCT10.2 | |
HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning |
|
Mei, Xiaodong | HKUST |
Wang, Sheng | Hong Kong University of Science and Technology |
Cheng, Jie | Hong Kong University of Science and Technology |
Chen, Yingbing | Huawei Hongkong Research Center |
Xu, Dan | Hong Kong University of Science and Technology |
Keywords: Semantic Scene Understanding, Human-Aware Motion Planning, Autonomous Vehicle Navigation
Abstract: Motion forecasting represents a critical challenge in autonomous driving systems, requiring accurate prediction of surrounding agents' future trajectories. While existing approaches predict future motion states with the scene context feature extracted from historical agent trajectories and road layouts, they suffer from information degradation during scene feature encoding. To address this limitation, we propose HAMF, a novel motion forecasting framework that learns future motion representations jointly with the scene context encoding, to coherently combine scene understanding and future motion state prediction. We first embed the observed agent states and map information into 1D token sequences, together with the target multi-modal future motion features as a set of learnable tokens. Then we design a unified Attention-based encoder, which synergistically combines self-attention and cross-attention mechanisms to model the scene context information and aggregate future motion features jointly. Complementing the encoder, we implement the Mamba module in the decoding stage to further preserve the consistency and correlations among the learned future motion representations, to generate accurate and diverse final trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that our hybrid Attention-Mamba model achieves state-of-the-art motion forecasting performance with a simple and lightweight architecture.
|
|
15:10-15:15, Paper TuCT10.3 | |
Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters |
|
Hindel, Julia | University of Freiburg |
Mohan, Rohit | University of Freiburg |
Bratulić, Jelena | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Brox, Thomas | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Semantic Scene Understanding, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: LiDAR semantic segmentation models are typically trained from random initialization as universal pre-training is hindered by the lack of large, diverse datasets. Moreover, most point cloud segmentation architectures incorporate custom network layers, limiting the transferability of advances from vision-based architectures. Inspired by recent advances in universal foundation models, we propose BALViT, a novel approach that leverages frozen vision models as amodal feature encoders for learning strong LiDAR encoders. Specifically, BALViT incorporates both range-view and bird’s-eye-view LiDAR encoding mechanisms, which we combine through a novel 2D-3D adapter. While the range-view features are processed through a frozen image backbone, our bird’s-eye-view branch enhances them through multiple cross-attention interactions. Thereby, we continuously improve the vision network with domain-dependent knowledge, resulting in a strong label-efficient LiDAR encoding mechanism. Extensive evaluations of BALViT on the SemanticKITTI and nuScenes benchmarks demonstrate that it outperforms state-of-the-art methods on small data regimes. We make the code and models publicly available at http://balvit.cs.uni-freiburg.de.
|
|
15:15-15:20, Paper TuCT10.4 | |
Dense Semantic Bird-Eye-View Map Generation from Sparse LiDAR Point Clouds Via Distribution-Aware Feature Fusion |
|
Li, Jinsong | City University of Hong Kong |
Peng, Kunyu | Karlsruhe Institute of Technology |
Sun, Yuxiang | City University of Hong Kong |
Keywords: Semantic Scene Understanding, Intelligent Transportation Systems
Abstract: Semantic scene understanding in bird-eye view (BEV) plays a crucial role in autonomous driving. A common approach to generating BEV maps from LiDAR point-cloud data involves constructing a pillar-level representation by projecting 3D point clouds onto a 2D plane. This process partially discards spatial geometric information, and produces sparse semantic maps. However, downstream tasks (e.g., trajectory planning and prediction), typically require dense grid-like semantic BEV maps rather than sparse segmentation outputs. To bridge this gap, we propose PointDenseBEV, an end-to-end, distribution-aware feature fusion framework. It takes as input sparse LiDAR point clouds and directly generates dense semantic BEV maps. Spatial geometric information and temporal context are embedded as auxiliary semantic cues within the BEV grid representation to enhance semantic density. Extensive experiments on the SemanticKITTI dataset demonstrate that our method achieves competitive performance compared to existing approaches.
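The pillar-level projection the paper starts from, and whose information loss it then compensates, looks roughly like this (illustrative Python; the choice of per-cell features is an assumption):

import numpy as np

def pillarize(points, xr=(-50.0, 50.0), yr=(-50.0, 50.0), res=0.5):
    # points: (N, >=3) LiDAR points (x, y, z, ...). Each BEV cell keeps only
    # simple statistics, discarding the finer 3D geometry inside the pillar.
    W = int((xr[1] - xr[0]) / res)
    H = int((yr[1] - yr[0]) / res)
    ix = ((points[:, 0] - xr[0]) / res).astype(int)
    iy = ((points[:, 1] - yr[0]) / res).astype(int)
    ok = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    max_z = np.full((H, W), -np.inf)
    count = np.zeros((H, W))
    np.maximum.at(max_z, (iy[ok], ix[ok]), points[ok, 2])
    np.add.at(count, (iy[ok], ix[ok]), 1.0)
    max_z[count == 0] = 0.0
    return np.stack([max_z, count], axis=-1)   # sparse (H, W, 2) BEV features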
|
|
15:20-15:25, Paper TuCT10.5 | |
Interpreting Behaviors and Geometric Constraints As Knowledge Graphs for Robot Manipulation Control |
|
Jiang, Chen | University of Alberta |
Wang, Allie | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Semantic Scene Understanding, Manipulation Planning, Visual Servoing
Abstract: In this paper, we investigate the feasibility of using knowledge graphs to interpret actions and behaviors for robot manipulation control. Equipped with an uncalibrated visual servoing controller, we propose to use robot knowledge graphs to unify behavior trees and geometric constraints, conceptualizing robot manipulation control as semantic events. The robot knowledge graphs not only preserve the advantages of behavior trees in scripting actions and behaviors, but also offer additional benefits of mapping natural interactions between concepts and events, which enable knowledgeable explanations of the manipulation contexts. Through real-world evaluations, we demonstrate the flexibility of the robot knowledge graphs to support explainable robot manipulation control.
|
|
15:25-15:30, Paper TuCT10.6 | |
TASeg: Text-Aware RGB-T Semantic Segmentation Based on Fine-Tuning Vision Foundation Models |
|
Yu, Meng | Beijing Institute of Technology |
Cui, Te | Beijing Institute of Technology |
Chu, Qitong | Beijing Institute of Technology |
Song, Wenjie | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Reliable semantic segmentation of open environments is essential for intelligent systems, yet significant problems remain: 1) Existing RGB-T semantic segmentation models mainly rely on low-level visual features and lack high-level textual information, which struggle with accurate segmentation when categories share similar visual characteristics. 2) While SAM excels in instance-level segmentation, integrating it with thermal images and text is hindered by modality heterogeneity and computational inefficiency. To address these issues, we propose TASeg, a text-aware RGB-T semantic segmentation framework by using Low-Rank Adaptation (LoRA) fine-tuning technology to adapt vision foundation models. Specifically, we propose a Dynamic Feature Fusion Module (DFFM) in the image encoder, which effectively merges features from multiple visual modalities while freezing SAM's original transformer blocks. Additionally, we incorporate CLIP-generated text embeddings in the mask decoder to enable semantic alignment, which further rectifies the classification error and improves the semantic understanding accuracy. Experimental results across diverse datasets demonstrate that our method achieves superior performance in challenging scenarios with fewer trainable parameters.
|
|
15:30-15:35, Paper TuCT10.7 | |
Domain-Conditioned Scene Graphs for State-Grounded Task Planning |
|
Herzog, Jonas | Zhejiang University |
Liu, Jiangpin | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Semantic Scene Understanding, Task Planning, Object Detection, Segmentation and Categorization
Abstract: Recent robotic task planning frameworks have integrated large multimodal models (LMMs) such as GPT-4o. To address grounding issues of such models, it has been suggested to split the pipeline into perceptional state grounding and subsequent state-based planning. As we show in this work, the state grounding ability of LMM-based approaches is still limited by weaknesses in granular, structured, domain-specific scene understanding. To address this shortcoming, we develop a more structured state grounding framework that features a domain-conditioned scene graph as its scene representation. We show that such representation is actionable in nature as it is directly mappable to a symbolic state in planning languages such as the Planning Domain Definition Language (PDDL). We provide an instantiation of our state grounding framework where the domain-conditioned scene graph generation is implemented with a lightweight vision-language approach that classifies domain-specific predicates on top of domain-relevant object detections. Evaluated across three domains, our approach achieves significantly higher state grounding accuracy and task planning success rates compared to LMM-based approaches.
|
|
15:35-15:40, Paper TuCT10.8 | |
SAFormer: Spatially Adaptive Transformer for Efficient and Multi-Resolution Occupancy Prediction |
|
Tang, Song | Hong Kong University of Science and Technology(Guangzhou) |
Wang, Qiang | Harbin Institute of Technology, Shenzhen |
Chu, Xiaowen | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Visual Learning
Abstract: Accurate and efficient 3D scene understanding from multi-view images remains a fundamental challenge in autonomous driving. Existing methods often struggle with high-dimensional features, leading to excessive computational costs and memory usage. In this paper, we present SAFormer, a novel transformer-based framework for efficient spatially adaptive occupancy prediction. SAFormer incorporates two key techniques to reduce resource consumption: Octree-based Multi-resolution Feature (OMRF) Learning and Spatial-Adaptive Progressive Query (SAPQ). First, OMRF introduces an Octree-based hierarchical structure to compress multi-resolution 3D feature volumes. Second, SAPQ facilitates efficient information flow across different scales while effectively addressing scene sparsity. It employs a region-aware query mechanism that intelligently allocates computational resources, processing safety-critical regions at high resolution while handling background elements at lower resolutions. Experiments on the nuScenes dataset demonstrate that our method achieves state-of-the-art performance while significantly reducing inference latency (up to 3×) and memory cost (up to 2.9×). Our approach excels in managing scene sparsity and recognizing small, safety-critical objects, highlighting its potential for practical applications in autonomous driving.
|
|
TuCT11 |
311A |
Reinforcement Learning 3 |
Regular Session |
|
15:00-15:05, Paper TuCT11.1 | |
AAOPL: Automated Articulated Object Parameter Learning for Open-World Robotics |
|
Feng, Ziyang | University of Science and Technology of China |
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Zhang, Silong | University of Science & Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Reinforcement Learning, Deep Learning in Grasping and Manipulation, Manipulation Planning
Abstract: Articulated objects are ubiquitous in daily environments, and effective manipulation of these objects is essential for advancing open-world robotics. Existing approaches, which rely heavily on large-scale data collection or simulation, often face limitations in real-world applications, including issues with generalization and the sim-to-real gap. In this paper, we introduce the Automated Articulated Object Parameter Learning (AAOPL) framework, which autonomously learns the articulation parameters of real-world articulated objects through direct interaction. This approach enables robots to generate precise manipulation trajectories without relying on predefined object models or extensive human demonstration data. To accelerate the learning process, we develop the Accelerated Single-Step Gradient (ASSG) algorithm, which efficiently refines the articulation parameters by leveraging real-time execution feedback. Experimental results demonstrate that AAOPL can learn accurate articulation parameters within 30 minutes (80 trials) and generate robust manipulation trajectories, outperforming baseline methods in terms of both task completion and force efficiency. Our approach eliminates the need for large-scale training datasets and can adapt to various articulated objects in real-world environments, offering a scalable solution for autonomous robotic manipulation in unstructured settings.
|
|
15:05-15:10, Paper TuCT11.2 | |
Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning |
|
Liu, Jiaqi | University of North Carolina at Chapel Hill |
Xu, Chengkai | Tongji University |
Hang, Peng | Tongji University |
Sun, Jian | Tongji University |
Zhan, Wei | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Ding, Mingyu | University of North Carolina at Chapel Hill |
Keywords: Reinforcement Learning, Deep Learning Methods, Cooperating Robots
Abstract: The cooperative driving technology of Connected and Autonomous Vehicles (CAVs) is crucial for improving the efficiency and safety of transportation systems. Learning-based methods, such as Multi-Agent Reinforcement Learning (MARL), have demonstrated strong capabilities in cooperative decision-making tasks. However, existing MARL approaches still face challenges in terms of learning efficiency and performance. In recent years, Large Language Models (LLMs) have rapidly advanced and shown remarkable abilities in various sequential decision-making tasks. To enhance the learning capabilities of cooperative agents while ensuring decision-making efficiency and cost-effectiveness, we propose LDPD, a language-driven policy distillation method for guiding MARL exploration. In this framework, a teacher agent based on LLM trains smaller student agents to achieve cooperative decision-making through its own decision-making demonstrations. The teacher agent enhances the observation information of CAVs and utilizes LLMs to perform complex cooperative decision-making reasoning, which also leverages carefully designed decision-making tools to achieve expert-level decisions, providing high-quality teaching experiences. The student agent then refines the teacher's prior knowledge into its own model through gradient policy updates. The experiments demonstrate that the students can rapidly improve their capabilities with minimal guidance from the teacher and eventually surpass the teacher's performance. Extensive experiments show that our approach demonstrates better performance and learning efficiency compared to baseline methods.
|
|
15:10-15:15, Paper TuCT11.3 | |
Closing the Intent-To-Reality Gap Via Fulfillment Priority Logic |
|
El Mabsout, Bassel | Boston University |
Abdelgawad, Abdelrahman | Boston University |
Mancuso, Renato | Boston University |
Keywords: Reinforcement Learning, Formal Methods in Robotics and Automation, Neural and Fuzzy Control
Abstract: Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formulae representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of a non-linear utility scalarization design, intended explicitly for continuous control problems.
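FPL's formal semantics are not spelled out in the abstract; purely as an illustration of non-linear utility scalarization in general (our assumption, not the paper's operators), one can map each objective's reward to a fulfillment degree in [0, 1] and compose degrees so that a high-priority objective gates a low-priority one:

```python
# Illustrative non-linear scalarization, NOT FPL's actual semantics:
# each objective becomes a fulfillment degree, and the high-priority
# degree gates how much the low-priority degree can contribute.
import numpy as np

def fulfillment(value, target):
    """Degree in [0, 1] to which an objective's value meets its target."""
    return np.clip(value / target, 0.0, 1.0)

def prioritized_and(f_high, f_low):
    """Low-priority term only matters as the high-priority term saturates."""
    return f_high * (1.0 + f_low) / 2.0

perf = fulfillment(8.0, target=10.0)    # performance objective (hypothetical numbers)
energy = fulfillment(3.0, target=4.0)   # energy-conservation objective
reward = prioritized_and(perf, energy)  # 0 if performance fails, regardless of energy
```

The contrast with linear composition is that no weight on the energy term can compensate for an unfulfilled performance objective.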
|
|
15:15-15:20, Paper TuCT11.4 | |
World Models for Anomaly Detection During Model-Based Reinforcement Learning Inference |
|
Domberg, Fabian | Universität Zu Lübeck |
Schildbach, Georg | University of Luebeck |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, AI-Based Methods
Abstract: Learning-based controllers are often purposefully kept out of real-world applications due to concerns about their safety and reliability. We explore how state-of-the-art world models in Model-Based Reinforcement Learning can be utilized beyond the training phase to ensure a deployed policy only operates within regions of the state-space it is sufficiently familiar with. This is achieved by continuously monitoring discrepancies between a world model’s predictions and observed system behavior during inference. It allows for triggering appropriate measures, such as an emergency stop, once an error threshold is surpassed. This does not require any task-specific knowledge and is thus universally applicable. Simulated experiments on established robot control tasks show the effectiveness of this method, recognizing changes in local robot geometry and global gravitational magnitude. Real-world experiments using an agile quadcopter further demonstrate the benefits of this approach by detecting unexpected forces acting on the vehicle. These results indicate how even in new and adverse conditions, safe and reliable operation of otherwise unpredictable learning-based controllers can be achieved.
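The monitoring idea described above is simple enough to sketch directly: compare world-model predictions against observed states during inference and trigger a stop once a smoothed discrepancy exceeds a threshold. The distance metric, smoothing, and threshold below are illustrative assumptions.

```python
# Sketch of inference-time anomaly monitoring with a world model: an
# exponential moving average of one-step prediction errors is compared
# against a fixed threshold (metric and constants are assumptions).
import numpy as np

def monitor_step(predicted_state, observed_state, ema, threshold=0.5, decay=0.9):
    """Returns (updated_ema, anomaly_flag). Flag -> e.g. trigger an emergency stop."""
    error = np.linalg.norm(predicted_state - observed_state)
    ema = decay * ema + (1.0 - decay) * error
    return ema, ema > threshold
```

Because the check only compares predictions to observations, it needs no task-specific knowledge, which is what makes the approach universally applicable.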
|
|
15:20-15:25, Paper TuCT11.5 | |
Imagine-2-Drive: Leveraging High-Fidelity World Models Via Multi-Modal Diffusion Policies |
|
Garg, Anant | International Institute of Information Technology, Hyderabad |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Learning from Experience
Abstract: World Model-based Reinforcement Learning (WMRL) enables sample efficient policy learning by reducing the need for online interactions which can potentially be costly and unsafe, especially for autonomous driving. However, existing world models often suffer from low prediction fidelity and compounding one-step errors, leading to policy degradation over long horizons. Additionally, traditional RL policies, often deterministic or single Gaussian-based, fail to capture the multi-modal nature of decision-making in complex driving scenarios. To address these challenges, we propose Imagine-2-Drive, a novel WMRL framework that integrates a high-fidelity world model with a multi-modal diffusion-based policy actor. It consists of two key components: DiffDreamer, a diffusion-based world model that generates future observations simultaneously, mitigating error accumulation, and DPA (Diffusion Policy Actor), a diffusion-based policy that models diverse and multi-modal trajectory distributions. By training DPA within DiffDreamer, our method enables robust policy learning with minimal online interactions. We evaluate our method in CARLA using standard driving benchmarks and demonstrate that it outperforms prior world model baselines, improving Route Completion and Success Rate by 15% and 20% respectively.
|
|
15:25-15:30, Paper TuCT11.6 | |
Safe Reinforcement Learning with a Predictive Safety Filter for Motion Planning and Control: A Drifting Vehicle Example |
|
Zhou, Bei | Zhejiang University |
Zarrouki, Baha | Technical University of Munich |
Piccinini, Mattia | Technical University of Munich |
Hu, Cheng | Zhejiang University |
Xie, Lei | Zhejiang University |
Betz, Johannes | Technical University of Munich |
Keywords: Reinforcement Learning, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Autonomous drifting is a complex and crucial maneuver for safety-critical scenarios like slippery roads and emergency collision avoidance, requiring precise motion planning and control. Traditional motion planning methods often struggle with the high instability and unpredictability of drifting, particularly when operating at high speeds. Recent learning-based approaches have attempted to tackle this issue but often rely on expert knowledge or have limited exploration capabilities. Additionally, they do not effectively address safety concerns during learning and deployment. To overcome these limitations, we propose a novel Safe Reinforcement Learning (RL)-based motion planner for autonomous drifting. Our approach integrates an RL agent with model-based drift dynamics to determine desired drift motion states, while incorporating a Predictive Safety Filter (PSF) that adjusts the agent's actions online to prevent unsafe states. This ensures safe and efficient learning, and stable drift operation. We validate the effectiveness of our method through simulations on a Matlab-Carsim platform, demonstrating significant improvements in drift performance, reduced tracking errors, and computational efficiency compared to traditional methods. This strategy promises to extend the capabilities of autonomous vehicles in safety-critical maneuvers.
|
|
15:30-15:35, Paper TuCT11.7 | |
Distributional Decision Transformer: Risk-Sensitive Offline RL Via Quantile-Based Critics and Stochastic Return |
|
Wei, Changxu | Tsinghua University |
Tang, Huaze | Tsinghua University |
Zhang, Yixian | Tsinghua University |
Wang, Chao | Tsinghua University |
Zhang, Xiao-Ping | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Keywords: Reinforcement Learning, Deep Learning Methods, Machine Learning for Robot Control
Abstract: Offline reinforcement learning faces a critical challenge in synthesizing high-reward trajectories from suboptimal datasets while robustly handling the stochasticity inherent in real-world decision-making. While the combination of return-conditioned sequence models, such as Decision Transformers (DT), and dynamic-programming critics shows great potential for trajectory synthesis, their deterministic action generation and scalar Q-value critics often fail to distinguish intentional behavioral variability from detrimental noise, leading to suboptimal policy collapse. To address this challenge, we propose the Distributional Decision Transformer (DDT), a novel framework that unifies probabilistic return distribution modeling with autoregressive action generation. DDT introduces two key innovations: (1) a Gaussian stochastic return mechanism that reparameterizes target returns as samplable distributions, enabling diverse action candidate generation; and (2) an Implicit Quantile Network (IQN) critic embedded within the decision loop, which evaluates actions across the full spectrum of return distributions (quantiles τ ∼ U(0,1)). On D4RL benchmarks, DDT achieves state-of-the-art performance, with a 91.6 average normalized score in MuJoCo locomotion and 69.3 in sparse-reward settings. The results establish DDT as a principled solution for risk-aware trajectory synthesis in offline RL.
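Since the abstract names an IQN critic with quantiles τ ∼ U(0,1), here is a generic sketch of the underlying quantile-regression (pinball) objective; the critic architecture and training details are assumptions, not taken from the paper.

```python
# Generic quantile-regression objective behind IQN-style critics: sample
# quantile fractions tau ~ U(0,1) and penalize asymmetrically depending on
# whether the critic over- or under-estimates the return.
import torch

def quantile_loss(pred, target, tau):
    """pred: (B, N) quantile estimates at fractions tau (B, N); target: (B, 1)."""
    diff = target - pred                              # positive -> underestimation
    return torch.max(tau * diff, (tau - 1.0) * diff).mean()

taus = torch.rand(32, 8)                              # tau ~ U(0,1), 8 quantiles per sample
pred = torch.randn(32, 8, requires_grad=True)         # stand-in for critic outputs
target = torch.randn(32, 1)                           # stand-in for bootstrapped returns
loss = quantile_loss(pred, target, taus)
```

Training against many sampled fractions is what lets the critic represent the full return distribution rather than a single scalar Q-value.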
|
|
15:35-15:40, Paper TuCT11.8 | |
Learning Efficient Flocking Control Based on Gibbs Random Fields |
|
Zhang, Dengyu | Sun Yat-Sen University |
Yu, Chenghao | Sun Yat-Sen University |
Xue, Feng | Sun Yat-Sen University |
Zhang, Qingrui | Sun Yat-Sen University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Multi-Robot Systems
Abstract: Flocking control is essential for multi-robot systems in diverse applications, yet achieving efficient flocking in congested environments poses challenges regarding computation burdens, performance optimality, and motion safety. This paper addresses these challenges through a multi-agent reinforcement learning (MARL) framework built on Gibbs Random Fields (GRFs). With GRFs, a multi-robot system is represented by a set of random variables conforming to a joint probability distribution, thus offering a fresh perspective on flocking reward design. A decentralized training and execution mechanism, which enhances the scalability of MARL concerning robot quantity, is realized using a GRF-based credit assignment method. An action attention module is introduced to implicitly anticipate the motion intentions of neighboring robots, consequently mitigating potential non-stationarity issues in MARL. The proposed framework enables learning an efficient distributed control policy for multi-robot systems in challenging environments with a success rate of around 90%, as demonstrated through thorough comparisons with state-of-the-art solutions in simulations and experiments. Ablation studies are also performed to validate the efficiency of different framework modules.
|
|
TuCT12 |
311B |
RGB-D Perception 3 |
Regular Session |
|
15:00-15:05, Paper TuCT12.1 | |
V2-SfMLearner: Learning Monocular Depth and Ego-Motion for Multimodal Wireless Capsule Endoscopy (I) |
|
Bai, Long | The Chinese University of Hong Kong |
Cui, Beilei | The Chinese University of Hong Kong |
Wang, Liangyu | The Chinese University of Hong Kong |
Li, Yanheng | City University of Hong Kong |
Yao, Shilong | City University of Hong Kong/Southern University of Science and Technology |
Yuan, Sishen | The Chinese University of Hong Kong |
Wu, Yanan | China Medical University |
Zhang, Yang | Hubei University of Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Li, Zhen | Qilu Hospital of Shandong University |
Ding, Weiping | Nantong University |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: RGB-D Perception, Sensor Fusion, Surgical Robotics: Planning
Abstract: Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding in 3D scene reconstruction and lesion localization. However, the collisions of the capsule endoscopies within the gastrointestinal tract cause vibration perturbations in the training data. Existing solutions focus solely on vision-based processing, neglecting other auxiliary signals like vibrations that could reduce noise and improve performance. Therefore, we propose V2-SfMLearner, a multimodal approach integrating vibration signals into vision-based depth and capsule motion estimation for monocular capsule endoscopy. We construct a multimodal capsule endoscopy dataset containing vibration and visual signals, and our artificial intelligence solution develops an unsupervised method using vision-vibration signals, effectively eliminating vibration perturbations through multimodal learning. Specifically, we carefully design a vibration network branch and a Fourier fusion module, to detect and mitigate vibration noises. The fusion framework is compatible with popular vision-only algorithms. Extensive validation on the multimodal dataset demonstrates superior performance and robustness against vision-only algorithms. Without the need for large external equipment, our V2-SfMLearner has the potential for integration into clinical capsule robots, providing real-time and dependable digestive examination tools. The findings show promise for practical implementation in clinical settings, enhancing the diagnostic capabilities of doctors.
|
|
15:05-15:10, Paper TuCT12.2 | |
MK-Pose: Category-Level Object Pose Estimation Via Multimodal-Based Keypoint Learning |
|
Yang, Yifan | Nankai University |
Song, Peili | Dalian University of Technology |
Lan, Enfan | Nankai University |
Liu, Dong | Nankai University |
Liu, Jingtai | Nankai University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Category-level object pose estimation, which predicts the pose of objects within a known category without prior knowledge of individual instances, is essential in applications like warehouse automation and manufacturing. Existing methods relying on RGB images or point cloud data often struggle with object occlusion and generalization across different instances and categories. This paper proposes a multimodal-based keypoint learning framework (MK-Pose) that integrates RGB images, point clouds, and category-level textual descriptions. The model uses a self-supervised keypoint detection module enhanced with attention-based query generation, soft heatmap matching and graph-based relational modeling. Additionally, a graph-enhanced feature fusion module is designed to integrate local geometric information and global context. MK-Pose is evaluated on the CAMERA25 and REAL275 datasets, and is further tested for cross-dataset capability on the HouseCat6D dataset. The results demonstrate that MK-Pose outperforms existing state-of-the-art methods in both IoU and average precision without shape priors. Codes are released at https://github.com/yangyifanYYF/MK-Pose.
|
|
15:10-15:15, Paper TuCT12.3 | |
Quaternion Approximate Networks for Enhanced Image Classification and Oriented Object Detection |
|
Grant, Bryce | Case Western Reserve University |
Wang, Peng | Case Western Reserve University |
Keywords: RGB-D Perception, Computer Vision for Automation, Deep Learning Methods
Abstract: This paper introduces Quaternion Approximate Networks (QUAN), a novel deep learning framework that leverages quaternion algebra for rotation-equivariant image classification and object detection. Quaternion networks are studied for their inherent rotation-equivariance, enabling them to capture richer spatial relationships and geometric structures compared to real-valued convolutions. Unlike conventional quaternion neural networks that operate entirely in the quaternion domain, QUAN utilizes the Hamilton product to approximate quaternion convolution while maintaining real-valued operations. This approach preserves both semantic and geometric information while reducing computational overhead, allowing for efficient implementation within standard deep learning frameworks. Additionally, quaternion operations are extended to spatial attention mechanisms, and Independent Quaternion Batch Normalization is introduced to enhance feature characterization stability. The effectiveness of QUAN is evaluated in two experimental settings: image classification and rotated object detection. In classification tasks on CIFAR-10 and CIFAR-100, QUAN achieves higher accuracy with fewer parameters and faster convergence compared to existing convolution and quaternion-based models. For rotated object detection, performance is assessed on a real-world robotic assembly dataset, where a UR5 robot executes manipulation tasks with oriented bounding box outputs. By integrating quaternion-based representations into both classification and detection tasks, QUAN demonstrates improved parameter efficiency, rotation handling, and overall accuracy over standard Convolutional Neural Networks (CNNs) or quaternion CNNs. These results highlight its potential for deployment in resource-constrained robotic systems and provide a foundation for further exploration on larger benchmark datasets.
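The Hamilton product at the core of quaternion convolution can indeed be computed entirely with real-valued operations, as the abstract notes; the sketch below shows the product itself, while how QUAN organizes feature channels into quaternion components is our assumption.

```python
# Hamilton product of two quaternion-valued feature tensors using only
# real-valued arithmetic; channel layout (r, i, j, k) is an assumption.
import torch

def hamilton_product(q, p):
    """q, p: (..., 4) tensors holding (r, i, j, k) components."""
    r1, i1, j1, k1 = q.unbind(-1)
    r2, i2, j2, k2 = p.unbind(-1)
    return torch.stack([
        r1 * r2 - i1 * i2 - j1 * j2 - k1 * k2,   # real part
        r1 * i2 + i1 * r2 + j1 * k2 - k1 * j2,   # i component
        r1 * j2 - i1 * k2 + j1 * r2 + k1 * i2,   # j component
        r1 * k2 + i1 * j2 - j1 * i2 + k1 * r2,   # k component
    ], dim=-1)
```

Because every term is an ordinary multiply-add, the operation drops straight into standard deep learning frameworks without complex-number or quaternion support.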
|
|
15:15-15:20, Paper TuCT12.4 | |
SPLATART: Articulated Gaussian Splatting with Estimated Object Structure |
|
Lewis, Stanley | University of Michigan |
Chandra, Vishal | University of Michigan |
Gao, Tom | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Visual Learning, Representation Learning, RGB-D Perception
Abstract: Representing articulated objects remains a difficult problem within the field of robotics. Objects such as pliers, clamps, or cabinets require representations that capture not only geometry and color information, but also part separation, connectivity, and joint parametrization. Furthermore, learning these representations becomes even more difficult with each additional degree of freedom. Complex articulated objects such as robot arms may have seven or more degrees of freedom, and the depth of their kinematic tree may be notably greater than the tools, drawers, and cabinets that are the typical subjects of articulated object research. To address these concerns, we introduce SPLATART - a pipeline for learning Gaussian splat representations of articulated objects from posed images, of which a subset contains image-space part segmentations. SPLATART disentangles the part separation task from the articulation estimation task, allowing for post-facto determination of joint estimation, as well as allowing for representing articulated objects with deeper kinematic trees than previously exhibited. In this work, we present data on the SPLATART pipeline as applied to the Paris dataset objects [1], as well as qualitative results on real-world objects. We additionally present results on more complex articulated serial-chain manipulators to demonstrate usage on deeper kinematic tree structures.
|
|
15:20-15:25, Paper TuCT12.5 | |
DualCLIP: Bridging 3D Geometry and Multimodal Semantics for Robotic Perception |
|
Liu, Yinghao | Southwest Jiaotong University |
Dai, Penglin | Southwest Jiaotong University |
Ding, Yan | SUNY Binghamton |
Cao, Nieqing | Xi'an Jiaotong-Liverpool University |
Keywords: RGB-D Perception, Deep Learning Methods
Abstract: Current approaches to integrating CLIP into language-driven robotics face a fundamental dilemma: While robotic implementations overlook cutting-edge 3D classification adaptations of CLIP, existing 3D-oriented CLIP methods prove inadequate for interpreting color-critical instructions prevalent in manipulation tasks. We resolve this through DualCLIP, a contrastive multimodal fusion framework that hierarchically integrates depth-aligned CLIP encoders. Our approach first aligns depth and CLIP RGB encoders using synthetic RGB-D pairs, then performs multimodal fusion via contrastive learning with language-triplet optimization. This joint training preserves 3D geometric coherence and color semantics. Evaluations demonstrate DualCLIP's combined strength — surpassing CLIP2Point in 3D classification while showing promising improvements for CLIPORT in color-sensitive robotic manipulation. This work establishes a paradigm for translating vision-language models into 3D-aware robotic systems without compromising task-specific modality sensitivity.
|
|
15:25-15:30, Paper TuCT12.6 | |
A Novel Event-Based Structured Light System for High-Precision and High-Speed Depth Sensing |
|
Su, Gongzhe | Midea Corporate Research Center |
Sun, Fulong | Midea Corporate Research Center |
Wang, Wei | Midea Group |
Xi, Wei | Midea |
Keywords: RGB-D Perception, Computer Vision for Automation, Computer Vision for Manufacturing
Abstract: This paper presents a novel event-based depth sensing system with line laser scan. Our main contribution involves both hardware and software improvements to previous state-of-the-art works. The polygon mirror scanner is designed to steer line laser with a constant velocity, which minimizes non-linearity of the projected time map to improve depth precision. A piecewise linear model is then proposed to model the behavior of the scanner, which is simple and easy to calibrate. The corresponding reconstruction pipeline achieves a high-speed depth map with an efficient plane-ray intersection-based depth calculation. Experimental results verify the approach is capable of realizing 0.6mm precision at a distance of 500mm and 8.3ms depth reconstruction runtime on embedded platforms.
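The plane-ray intersection underlying the reconstruction pipeline is standard triangulation geometry: each event pixel defines a camera ray, and each laser timestamp a calibrated light plane. A minimal sketch with placeholder calibration values follows.

```python
# Ray-plane intersection for laser-line triangulation. Camera at the origin;
# ray x = t * ray_dir; plane n.x + d = 0. Calibration values are placeholders.
import numpy as np

def intersect(ray_dir, plane_normal, plane_d):
    """Returns the 3D point where the pixel ray meets the laser plane."""
    t = -plane_d / float(plane_normal @ ray_dir)
    return t * ray_dir

point = intersect(
    ray_dir=np.array([0.02, -0.01, 1.0]),     # from pixel via intrinsics (placeholder)
    plane_normal=np.array([0.8, 0.0, -0.6]),  # laser plane at this timestamp (placeholder)
    plane_d=0.3,
)
```

Each depth value costs one dot product and one division, which is consistent with the low reconstruction runtimes the abstract reports on embedded platforms.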
|
|
15:30-15:35, Paper TuCT12.7 | |
CDIS : Cross-Dimensional Class-Agnostic 3D Instance Segmentation Via 2D Mask Tracking and 3D-2D Projection Merging |
|
Kim, Juno | Seoul National University |
Yoon, Hye Jung | Seoul National University |
Park, Yesol | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: Class-agnostic 3D instance segmentation is critical for robotic systems operating in unknown environments, enabling perception of previously unseen objects for reliable manipulation and navigation. Existing approaches typically project per-frame 2D instance masks into 3D and merge them, which often breaks object identities across time and yields fragmented 3D instances. We introduce Cross-Dimensional Class-Agnostic 3D Instance Segmentation (CDIS), a zero-shot framework that explicitly tracks 2D instance masks across frames and associates them with 3D superpoints, creating a feedback loop between 2D and 3D. This cross-dimensional reasoning links temporally stable 2D tracks with spatially coherent 3D regions, producing globally consistent 3D instance labels without any 3D-specific training. Experiments on benchmark datasets demonstrate that CDIS achieves higher accuracy and consistency than state-of-the-art zero-shot methods, while remaining efficient and scalable to diverse real-world environments.
|
|
TuCT13 |
311C |
Deep Learning for Visual Perception 3 |
Regular Session |
|
15:00-15:05, Paper TuCT13.1 | |
SDA-LLM: Spatial DisAmbiguation Via Multi-Turn Vision-Language Dialogues for Robot Navigation |
|
Chen, Kuan-Lin | National Yang Ming Chiao Tung University |
Wei, Tzu-Ti | National Yang-Ming Chiao-Tung University |
Lee, Ming-Lun | National Yang Ming Chiao Tung University, Taiwan |
Yeh, Li-Tzu | National Yang Ming Chiao Tung University |
Kao, Elaine | National Yang Ming Chiao Tung University Intern |
Tseng, Yu-Chee | National Yang Ming Chiao Tung University |
Chen, Jen-Jee | National Yang Ming Chiao Tung University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Vision-Based Navigation
Abstract: When users give natural language instructions to service robots, positional information is often referenced relative to objects in the environment rather than absolute coordinates. However, humans naturally use relative references. For example, in “Go to the chair and pick up empty bottles”, where the positional reference is the chair, ambiguity arises when multiple similar objects co-exist in the environment or when the robot's view is limited, resulting in multiple possible interpretations of the same command and affecting navigation decisions. To address this issue, we propose a two-level framework that integrates a large language model (LLM) and a vision-language model (VLM), allowing the robot to engage in multi-turn dialogues for spatial disambiguation. Our method first utilizes a VLM to map the semantic meanings of dialogues to a unique object ID in images and then further maps this object ID to a 3D depth map, enabling the robot to accurately determine its navigation target. To the best of our knowledge, this is the first work leveraging foundation models to address spatial ambiguity.
|
|
15:05-15:10, Paper TuCT13.2 | |
COARSE: Collaborative Pseudo-Labeling with Coarse Real Labels for Off-Road Semantic Segmentation |
|
Noca, Aurelio | Caltech |
Lei, Xianmei | NASA JPL |
Becktor, Jonathan | Technical University of Denmark |
Edlund, Jeffrey | Jet Propulsion Lab |
Sabel, Anna | NASA JPL |
Spieler, Patrick | JPL |
Padgett, Curtis | JPL |
Alahi, Alexandre | EPFL |
Atha, Deegan | Jet Propulsion Laboratory |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Autonomous Vehicle Navigation
Abstract: Autonomous off-road navigation faces challenges due to diverse, unstructured environments, requiring robust perception with both geometric and semantic understanding. However, scarce densely labeled semantic data limits generalization across domains. Simulated data helps, but introduces domain adaptation issues. We propose COARSE, a semi-supervised domain adaptation framework for off-road semantic segmentation, leveraging sparse, coarse in-domain labels and densely labeled out-of-domain data. Using pretrained vision transformers, we bridge domain gaps with complementary pixel-level and patch-level decoders, enhanced by a collaborative pseudo-labeling strategy on unlabeled data. Evaluations on RUGD and Rellis-3D datasets show significant improvements of 9.7% and 8.4% respectively, versus only using coarse data. Tests on real-world off-road vehicle data in a multi-biome setting further demonstrate COARSE’s applicability.
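One generic form of collaborative pseudo-labeling keeps only the pixels where complementary decoders agree confidently; the sketch below is our illustration of that idea with assumed thresholds, not the paper's exact strategy.

```python
# Illustrative collaborative pseudo-labeling: a pixel receives a pseudo-label
# only where the pixel-level and patch-level decoders agree and both are
# confident; disagreements are marked ignore. Thresholds are assumptions.
import torch

def collaborative_pseudo_labels(logits_pixel, logits_patch, conf_th=0.9, ignore=255):
    """logits_*: (B, C, H, W) raw class scores from each decoder."""
    prob_a, lab_a = logits_pixel.softmax(1).max(1)
    prob_b, lab_b = logits_patch.softmax(1).max(1)
    agree = (lab_a == lab_b) & (prob_a > conf_th) & (prob_b > conf_th)
    return torch.where(agree, lab_a, torch.full_like(lab_a, ignore))
```

Filtering by cross-decoder agreement is a common way to keep pseudo-label noise from compounding during semi-supervised training.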
|
|
15:10-15:15, Paper TuCT13.3 | |
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos |
|
Ma, Junyi | Beijing Institute of Technology |
Chen, Xieyuanli | National University of Defense Technology |
Xu, Jingyi | Shanghai Jiao Tong University |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Understanding how humans would behave during hand-object interaction (HOI) is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works simultaneously forecast hand trajectories and object affordances on human egocentric videos. The joint prediction serves as a comprehensive representation of future HOI in 2D space, indicating potential human motion and motivation. However, the existing approaches mostly adopt the autoregressive paradigm, which lacks bidirectional constraints within the holistic future sequence, and accumulates errors along the time axis. Meanwhile, they overlook the effect of camera egomotion on first-person view predictions. To address these limitations, we propose a novel diffusion-based HOI prediction method, namely Diff-IP2D, to forecast future hand trajectories and object affordances with bidirectional constraints in an iterative non-autoregressive manner on egocentric videos. Motion features are further integrated into the conditional denoising process to enable Diff-IP2D aware of the camera wearer's dynamics for more accurate interaction prediction. Extensive experiments demonstrate that Diff-IP2D significantly outperforms the state-of-the-art baselines on both the off-the-shelf and our newly proposed evaluation metrics. This highlights the efficacy of leveraging a generative paradigm for 2D HOI prediction. The code and the video have been released at https://github.com/IRMVLab/Diff-IP2D.
|
|
15:15-15:20, Paper TuCT13.4 | |
PAVLM: Advancing Point Cloud Based Affordance Understanding Via Vision-Language Model |
|
Liu, Shang-Ching | Universität Hamburg |
Tran, Van Nhiem | Hon Hai Research Institute (AI Research Center), Foxconn |
Chen, Wenkai | University of Hamburg |
Cheng, Wei-Lun | National Taiwan University |
Huang, Yen-Lin | National Tsing Hua University |
Liao, I-Bin | Foxconn |
Li, Yung-Hui | AI Research Center, Hon Hai Research Institute |
Zhang, Jianwei | University of Hamburg |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Visual Learning
Abstract: Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods for both full and partial point clouds, particularly excelling in its generalization to novel open-world affordance tasks of 3D objects. For more information, visit our project site: pavlm-source.github.io.
|
|
15:20-15:25, Paper TuCT13.5 | |
VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction |
|
Kim, Junsu | Ulsan National Institute of Science and Technology |
Lee, Junhee | Ulsan National Institute of Science and Technology |
Shin, Ukcheol | CMU(Carnegie Mellon University) |
Oh, Jean | Carnegie Mellon University |
Joo, Kyungdon | UNIST |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Object Detection, Segmentation and Categorization
Abstract: Understanding 3D scenes semantically and spatially is crucial for the safe navigation of robots and autonomous vehicles, aiding obstacle avoidance and trajectory planning. Camera-based 3D semantic occupancy prediction, which infers dense voxel grids from 2D images, is gaining importance in robot vision for its resource efficiency over 3D sensors. However, this task inherently suffers from a 2D-3D discrepancy problem, meaning that even for the same-sized objects in 3D space, an object closer to the camera looks larger than the other in a 2D image due to the camera perspective geometry. To tackle this issue, we propose a novel framework called VPOcc that leverages vanishing point (VP) to mitigate the 2D-3D discrepancy from pixel-level and feature-level perspectives. As a pixel-level solution, we introduce a VPZoomer module, which warps images by counteracting the perspective effect through a VP-based homography transform. In addition, as a feature-level solution, we propose a VP-guided cross-attention (VPCA) module that performs perspective-aware feature aggregation, utilizing 2D image features that are more suitable for 3D space. Lastly, we integrate two feature volumes extracted from original and warped images to compensate for each other through a spatial volume fusion (SVF) module. By effectively incorporating VP into the network, our framework achieves improved performance in both IoU and mIoU metrics on SemanticKITTI and SSCBench-KITTI360. Our video is available at https://vision3d-lab.github.io/vpocc/.
|
|
15:25-15:30, Paper TuCT13.6 | |
Learning Appearance and Motion Cues for Panoptic Tracking |
|
Juana Valeria, Hurtado | University of Freiburg |
Marvi, Mohammad Sajad | University of Freiburg, Mercedes Benz AG |
Mohan, Rohit | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Visual Learning
Abstract: Panoptic tracking enables pixel-level scene interpretation of videos by integrating instance tracking in panoptic segmentation. This provides robots with a spatio-temporal understanding of the environment, an essential attribute for their operation in dynamic environments. In this paper, we propose a novel approach for panoptic tracking that simultaneously captures general semantic information and instance-specific appearance and motion features. Unlike existing methods that overlook dynamic scene attributes, our approach leverages both appearance and motion cues through dedicated network heads. These interconnected heads employ multi-scale deformable convolutions that reason about scene motion offsets with semantic context and motion-enhanced appearance features to learn tracking embeddings. Furthermore, we introduce a novel two-step fusion module that integrates the outputs from both heads by first matching instances from the current time step with propagated instances from previous time steps and subsequently refines associations using motion-enhanced appearance embeddings, improving robustness in challenging scenarios. Extensive evaluations of our proposed model on two benchmark datasets demonstrate that it achieves state-of-the-art performance in panoptic tracking accuracy, surpassing prior methods in maintaining object identities over time. To facilitate future research, we make the code available at panoptictracking.cs.uni-freiburg.de.
|
|
15:30-15:35, Paper TuCT13.7 | |
ACP-MVS: Efficient Multi-View Stereo with Attention-Based Context Perception |
|
Jia, Hao | Huazhong University of Science and Technology |
Xu, Gangwei | Huazhong University of Science and Technology |
Feng, Miaojie | Huazhong University of Science and Technology |
Wang, Xianqi | Huazhong University of Science and Technology |
Cheng, Junda | Huazhong University of Science and Technology |
Lin, Min | Huazhong University of Science and Technology |
Yang, Xin | Huazhong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, SLAM
Abstract: The core of Multi-View Stereo (MVS) is to find corresponding pixels in neighboring images. However, due to challenging regions in input images such as untextured areas, repetitive patterns, or reflective surfaces, existing methods struggle to find precise pixel correspondence therein, resulting in inferior reconstruction quality. In this paper, we present an efficient context-perception MVS network, termed ACP-MVS. The ACP-MVS constructs a context-aware cost volume that can enhance pixels containing essential context information while suppressing irrelevant or noisy information via our proposed Context-stimulated Weighting Fusion module. Furthermore, we introduce a new Context-Guided Global Aggregation module, based on the insight that similar-looking pixels tend to have similar depths, which exploits global contextual cues to implicitly guide depth detail propagation from high-confidence regions to low-confidence ones. These two modules work in synergy to substantially improve reconstruction quality of ACP-MVS without incurring significant additional computational and time cost. Extensive experiments demonstrate that our approach not only achieves state-of-the-art performance but also offers the fastest inference speed and minimal GPU memory usage, providing practical value for practitioners working with high-resolution MVS image sets. Notably, our method ranks 2nd on the challenging Tanks and Temples advanced benchmark among all published methods. Code is available at https://github.com/HaoJia-mongh/ACP-MVS.
|
|
15:35-15:40, Paper TuCT13.8 | |
SLTNet: Efficient Event-Based Semantic Segmentation with Spike-Driven Lightweight Transformer-Based Networks |
|
Long, Xianlei | Chongqing University |
Zhu, Xiaxin | Chongqing University |
Guo, Fangming | Chongqing University |
Zhang, Wanyi | Chongqing University |
Gu, Qingyi | Institute of Automation, Chinese Academy of Sciences |
Chen, Chao | Chongqing University |
Gu, Fuqiang | Chongqing University |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, AI-Based Methods
Abstract: Event-based semantic segmentation has great potential in autonomous driving and robotics due to the advantages of event cameras, such as high dynamic range, low latency, and low power cost. Unfortunately, current artificial neural network (ANN)-based segmentation methods suffer from high computational demands, the requirements for image frames, and massive energy consumption, limiting their efficiency and application on resource-constrained edge/mobile platforms. To address these problems, we introduce SLTNet, a spike-driven lightweight transformer-based network designed for event-based semantic segmentation. Specifically, SLTNet is built on efficient spike-driven convolution blocks (SCBs) to extract rich semantic features while reducing the model's parameters. Then, to enhance the long-range contextual feature interaction, we propose novel spike-driven transformer blocks (STBs) with binary mask operations. Based on these basic blocks, SLTNet employs a high-efficiency single-branch architecture while maintaining the low energy consumption of the Spiking Neural Network (SNN). Finally, extensive experiments on the DDD17 and DSEC-Semantic datasets demonstrate that SLTNet outperforms state-of-the-art (SOTA) SNN-based methods by up to 9.06% and 9.39% mIoU, respectively, with 4.58× lower energy consumption and 114 FPS inference speed. Our code is open-sourced and available at https://github.com/longxianlei/SLTNet-v1.0.
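For context, spike-driven blocks are typically built on leaky integrate-and-fire dynamics; a one-timestep sketch follows, though SLTNet's actual neuron model and constants may differ.

```python
# One timestep of a leaky integrate-and-fire (LIF) neuron: leak, integrate
# the input current, emit a binary spike, hard-reset where a spike fired.
# beta and v_th are illustrative; training needs a surrogate gradient.
import torch

def lif_step(v, x, beta=0.9, v_th=1.0):
    """v: membrane potentials; x: input current. Returns (new_v, spikes)."""
    v = beta * v + x
    spike = (v >= v_th).float()    # binary spikes drive the next layer
    v = v * (1.0 - spike)          # reset membrane potential after spiking
    return v, spike
```

Because downstream layers only receive sparse binary spikes, multiply-accumulate work (and hence energy) scales with spiking activity rather than dense activations.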
|
|
TuCT14 |
311D |
Deep Learning Methods 3 |
Regular Session |
|
15:00-15:05, Paper TuCT14.1 | |
MetaFold: Language-Guided Multi-Category Garment Folding Framework Via Trajectory Generation and Foundation Model |
|
Chen, Haonan | National University of Singapore |
Li, Junxiao | Nanjing University |
Wu, Ruihai | Peking University |
Liu, Yiwei | National University of Singapore |
Hou, Yiwen | University of Science and Technology of China |
Xu, Zhixuan | National University of Singapore |
Guo, Jingxiang | National University of Singapore |
Gao, Chongkai | National University of Singapore |
Wei, Zhenyu | Shanghai Jiao Tong University |
Xu, Shensi | Nanjing University |
Huang, Jiaqi | Nanjing University |
Shao, Lin | National University of Singapore |
Keywords: Computer Vision for Automation, Model Learning for Control, Deep Learning Methods
Abstract: Garment folding is a common yet challenging task in robotic manipulation. The deformability of garments leads to a vast state space and complex dynamics, which complicates precise and fine-grained manipulation. Previous approaches often rely on predefined key points or demonstrations, limiting their generalization across diverse garment categories. This paper presents a framework, MetaFold, that disentangles task planning from action prediction and learns each independently to enhance model generalization. It employs language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure facilitates multi-category learning, enabling the model to adapt flexibly to various user instructions and folding tasks. Experimental results demonstrate the superiority of our proposed framework. Supplementary materials are available on our website: https://meta-fold.github.io/.
|
|
15:05-15:10, Paper TuCT14.2 | |
Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving through Multi-Preference Tuning |
|
Li, Yun | The University of Tokyo |
Javanmardi, Ehsan | The University of Tokyo |
Thompson, Simon | Tier IV |
Katsumata, Kai | The University of Tokyo |
Orsholits, Alex | The University of Tokyo, Graduate School of Information Science |
Tsukada, Manabu | The University of Tokyo |
Keywords: Deep Learning Methods, Reinforcement Learning, Simulation and Animation
Abstract: This paper introduces Multi-PrefDrive, a framework that significantly enhances LLM-based autonomous driving through multidimensional preference tuning. Aligning LLMs with human driving preferences is crucial yet challenging, as driving scenarios involve complex decisions where multiple incorrect actions can correspond to a single correct choice. Traditional binary preference tuning fails to capture this complexity. Our approach pairs each chosen action with multiple rejected alternatives, better reflecting real-world driving decisions. By implementing the Plackett-Luce preference model, we enable nuanced ranking of actions across the spectrum of possible errors. Experiments in the CARLA simulator demonstrate that our algorithm achieves an 11.0% improvement in overall score and an 83.6% reduction in infrastructure collisions, while showing perfect compliance with traffic signals in certain environments. Comparative analysis against DPO and its variants reveals that Multi-PrefDrive discriminates between chosen and rejected actions more sharply, achieving a margin value of 25, an ability that translates directly into enhanced driving performance. We implement memory-efficient techniques including LoRA and 4-bit quantization to enable deployment on consumer-grade hardware and will open-source our training code and multi-rejected dataset to advance research in LLM-based autonomous driving systems.
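The Plackett-Luce model named above assigns a likelihood to a full ranking of one chosen action over several rejected alternatives; here is a generic sketch of its negative log-likelihood, where how the per-action scores are derived from the policy is our assumption.

```python
# Generic Plackett-Luce ranking loss: given per-action scores ordered
# best-to-worst (chosen first, then rejected alternatives), the ranking
# likelihood is prod_i exp(s_i) / sum_{j >= i} exp(s_j).
import torch

def plackett_luce_nll(scores):
    """scores: (B, K) ordered best-to-worst. Returns mean negative log-likelihood."""
    # logsumexp over the suffix s_i..s_K for each position i
    rev_logsumexp = torch.logcumsumexp(scores.flip(-1), dim=-1).flip(-1)
    return (rev_logsumexp - scores).sum(-1).mean()

scores = torch.randn(4, 5, requires_grad=True)  # 1 chosen + 4 rejected per sample
loss = plackett_luce_nll(scores)
```

With K = 2 this reduces to the familiar pairwise (binary) preference loss, which is why PL is the natural generalization when several rejected actions accompany each chosen one.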
|
|
15:10-15:15, Paper TuCT14.3 | |
Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy |
|
Li, Bojian | Beihang University |
Liu, Bo | Beihang University |
Yao, Xinning | Beihang University |
Yue, Jinghua | Beihang University |
Zhou, Fugen | Beihang University |
Keywords: Vision-Based Navigation, Computer Vision for Medical Robotics, Deep Learning Methods
Abstract: Depth estimation is a cornerstone of 3D reconstruction and plays a vital role in minimally invasive endoscopic surgeries. However, most current depth estimation networks rely on traditional convolutional neural networks, which are limited in their ability to capture global information. Foundation models offer a promising approach to enhance depth estimation, but those models currently available are primarily trained on natural images, leading to suboptimal performance when applied to endoscopic images. In this work, we introduce a novel fine-tuning strategy for the Depth Anything Model and integrate it with an intrinsic-based unsupervised monocular depth estimation framework. Our approach includes a low-rank adaptation technique based on random vectors, which improves the model’s adaptability to different scales. Additionally, we propose a residual block built on depthwise separable convolution to compensate for the transformer’s limited ability to capture local features. Our experimental results on the SCARED dataset and Hamlyn dataset show that our method achieves state-of-the-art performance while minimizing the number of trainable parameters. Applying this method in minimally invasive endoscopic surgery can enhance surgeons’ spatial awareness, thereby improving the precision and safety of the procedures.
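A residual block built on depthwise separable convolution, as the abstract describes for compensating the transformer's weaker local features, typically looks like the following sketch; channel counts, normalization, and activation here are assumptions.

```python
# Sketch of a residual block using depthwise separable convolution: a
# per-channel spatial convolution followed by a 1x1 pointwise mix, added
# back onto the input. Norm/activation choices are assumptions.
import torch.nn as nn

class DWSepResidual(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise
            nn.Conv2d(channels, channels, 1),                              # pointwise
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )

    def forward(self, x):
        return x + self.block(x)   # residual path preserves the transformer features
```

The depthwise/pointwise split keeps the parameter count low, in line with the paper's emphasis on minimizing trainable parameters during fine-tuning.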
|
|
15:15-15:20, Paper TuCT14.4 | |
LaneMind: Seeing Lanes Like Human Drivers |
|
Qian, Zhengyan | Nanjing University of Science and Technology |
Ma, Qian | Nanjing University of Science and Technology |
Keywords: Deep Learning Methods, Computer Vision for Transportation, Object Detection, Segmentation and Categorization
Abstract: Accurate lane detection is critical for autonomous driving safety. In recent years, anchor-based detection methods have made significant progress. However, existing frameworks struggle in complex scenarios such as nighttime or dazzle light environments. Additionally, these methods exhibit limited geometric modeling and extrapolation capabilities for curvature variations in curved lanes. To tackle these challenges, we propose LaneMind, an innovative framework that combines human visual perception principles with advanced geometric modeling. Our approach features a dual-path architecture with cross-path attention mechanism, enabling simultaneous local feature extraction and global structure modeling. The network outputs a confidence heatmap, followed by a skeleton-guided regression module that extracts medial-axis skeletons from high-probability lane regions to precisely localize lanes while maintaining topological continuity. Experimental results demonstrate that LaneMind achieves competitive performance across various benchmarks, particularly excelling in challenging curved lane scenarios and adverse lighting conditions. The framework's robust performance and accurate detection quality highlight its potential for real-world autonomous driving applications.
|
|
15:20-15:25, Paper TuCT14.5 | |
Spatio-Temporal Hyperbolic Aggregation Neural Network for Human Action Recognition |
|
Akremi, Mohamed Sanim | PhD Student |
Neji, Najett | Universite Paris Saclay |
Tabia, Hedi | ETIS-ENSEA |
Keywords: Deep Learning Methods, Surveillance Robotic Systems, Machine Learning for Robot Control
Abstract: Human action recognition (HAR) is a critical task in the field of robotics. Traditionally, HAR methods rely on either perceptual features from RGB images or skeletal features. While RGB-based features are typically represented in 2D Euclidean space, few approaches differentiate between methods developed for RGB data and those for skeletal features, often treating both as Euclidean representations. This conventional approach, which typically leverages standard deep learning techniques, limits the descriptive power of skeletal data, which naturally exhibits a tree-like structure. In this paper, we introduce a novel framework that, for the first time, utilizes skeletal data while preserving its inherent structure to fully capture its descriptive potential. Our proposed deep neural network embeds skeletal joints into hyperbolic space, followed by a spatio-temporal processing framework that incorporates established transformations to optimize performance while maintaining the advantages of hyperbolic analysis. Extensive experiments on publicly available datasets, including UAV-Human, UAV-Gesture, and DHG 14/28, demonstrate that our approach achieves state-of-the-art results, underscoring its ability to enhance robotic systems' performance in dynamic environments.
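Embedding features into hyperbolic space is commonly done via the exponential map at the origin of the Poincaré ball; a minimal sketch follows, though the paper's exact embedding procedure may differ.

```python
# Exponential map at the origin of the Poincare ball with curvature c:
# exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||). Maps Euclidean
# (tangent) joint features into the ball, where tree-like structure embeds
# with low distortion.
import torch

def expmap0(v, c=1.0, eps=1e-6):
    """v: (..., D) tangent vectors; returns points inside the Poincare ball."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
```

The appeal for skeletal data is that distances near the ball's boundary grow exponentially, mirroring how the number of nodes in a tree grows with depth.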
|
|
15:25-15:30, Paper TuCT14.6 | |
MuSPaCSA: Multi-Scale Parallel-Channel Self-Attention Network for Point Cloud Classification and Segmentation |
|
Yao, Xuran | Shantou University |
Zheng, Xianwei | Foshan University |
Yao, Zheng | Shantou University |
Li, Xutao | Shantou University |
Keywords: Deep Learning Methods, Recognition, Deep Learning for Visual Perception
Abstract: Point cloud classification and segmentation are fundamental tasks in 3D computer vision. Recently, deep learning-based methods, particularly 3D Transformers, have demonstrated their effectiveness across a variety of point cloud tasks. However, transformer-based methods embed position information into feature vectors, which can introduce a significant computational cost. Additionally, these approaches often struggle to adaptively extract different features across varying receptive fields, which limits their performance in various tasks. To address these challenges, we propose a novel Multi-Scale Parallel-Channel Self-Attention (MuSPaCSA) network, designed with a multi-scale feature extraction architecture by stacking Parallel-Channel Self-Attention (PaCSA) layers for classification and segmentation tasks. Specifically, our MuSPaCSA employs the PaCSA module to extract essential semantic and spatial features. The core components of the PaCSA module include the Semantic-Spatial Integration (SSI) and Adaptive Self-Attention (ASA) modules. The SSI module employs a parallel-channel approach to integrate semantic and spatial information, enabling the representation of high-dimensional structural features in point clouds. The ASA module calculates adaptive weights to aggregate rich, high-dimensional structural features from neighboring nodes in a lightweight manner. Through the multi-scale feature fusion architecture of MuSPaCSA, local and global features, as well as semantic and spatial features, are effectively integrated, significantly enhancing the model's representational capacity. Extensive experiments demonstrate that our model achieves superior performance and results with lower computational cost compared to competing methods.
|
|
15:30-15:35, Paper TuCT14.7 | |
Label-Efficient LiDAR Panoptic Segmentation |
|
Canakci, Ahmet Selim | University of Freiburg |
Vödisch, Niclas | University of Freiburg |
Petek, Kürsat | University of Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Semantic Scene Understanding, Range Sensing
Abstract: A main bottleneck of learning-based robotic scene understanding methods is the heavy reliance on extensive annotated training data, which often limits their generalization ability. In LiDAR panoptic segmentation, this challenge becomes even more pronounced due to the need to simultaneously address both semantic and instance segmentation from complex, high-dimensional point cloud data. In this work, we address the challenge of LiDAR panoptic segmentation with very few labeled samples by leveraging recent advances in label-efficient vision panoptic segmentation. To this end, we propose a novel method, Limited-Label LiDAR Panoptic Segmentation (L3PS), which requires only a minimal amount of labeled data. Our approach first utilizes a label-efficient 2D network to generate panoptic pseudo-labels from a small set of annotated images, which are subsequently projected onto point clouds. We then introduce a novel 3D refinement module that capitalizes on the geometric properties of point clouds. By incorporating clustering techniques, sequential scan accumulation, and ground point separation, this module significantly enhances the accuracy of the pseudo-labels, improving segmentation quality by up to +10.6 PQ and +7.9 mIoU. We demonstrate that these refined pseudo-labels can be used to effectively train off-the-shelf LiDAR segmentation networks. Through extensive experiments, we show that L3PS not only outperforms existing methods but also substantially reduces the annotation burden. We release the code of our work at https://l3ps.cs.uni-freiburg.de.
|
|
15:35-15:40, Paper TuCT14.8 | |
Co-Adaptation of Embodiment and Control with Self-Imitation Learning |
|
Hernández-Gutiérrez, Sergio | University of Tübingen |
Kyrki, Ville | Aalto University |
Luck, Kevin Sebastian | Vrije Universiteit Amsterdam |
Keywords: Evolutionary Robotics, Imitation Learning, Deep Learning Methods
Abstract: The task of co-optimizing the body and behaviour of agents has been a long-standing problem in the fields of evolutionary robotics and embodied AI. Previous work has largely focused on the development of learning methods exploiting massive parallelization of agent evaluations with large population sizes, a paradigm which is applicable to simulated agents but cannot be transferred to the real world due to the costs associated with producing embodiments and robots. Furthermore, recent data-efficient approaches utilizing reinforcement learning can suffer from distributional shifts in transition dynamics as well as in state and action spaces when experiencing new body morphologies. In this work, we propose a new co-adaptation method combining reinforcement learning and State-Aligned Self-Imitation Learning to co-optimize embodiment and behavioural policies within a handful of design iterations. We show that the integration of a self-imitation signal improves the data-efficiency of the co-adaptation process as well as the behavioural recovery when adapting morphological parameters.
|
|
TuCT15 |
206 |
Simulation |
Regular Session |
Co-Chair: Wei, Dunwen | University of Electronic Science and Technology of China |
|
15:00-15:05, Paper TuCT15.1 | |
LSTM-MHSA-Enhanced Deep Reinforcement Learning for Accurate Gait Control in Human Musculoskeletal Model |
|
Mao, Shiyu | University of Electronic Science and Technology of China |
Tang, Zihao | University of Electronic Science and Technology of China |
Ficuciello, Fanny | Università Di Napoli Federico II
Peng, Bei | University of Electronic Science and Technology of China |
Wei, Dunwen | University of Electronic Science and Technology of China |
Keywords: Deep Learning Methods, Modeling and Simulating Humans, Prosthetics and Exoskeletons
Abstract: Modeling and controlling the musculoskeletal system are crucial for understanding human motor functions, optimizing human-robot interaction, and developing embodied intelligence. However, existing musculoskeletal models are mainly limited to specific body parts and muscle groups, and still face challenges in large-scale muscle coordination and the generation of diverse movements. In this study, we propose a musculoskeletal deep reinforcement learning (DRL) control model. This model integrates a Long Short-Term Memory (LSTM) network and a Multi-Head Self-Attention (MHSA) mechanism into the Proximal Policy Optimization (PPO) algorithm. The LSTM-MHSA-enhanced PPO control approach generates accurate muscle activation, motion trajectories, and torque control strategies to precisely control and replicate diverse human gaits based on target joint movements. Experimental results demonstrate that this LSTM-MHSA-enhanced PPO algorithm significantly improves the model accuracy compared to the traditional PPO algorithm, with a 43.75% reduction in Mean Absolute Error (MAE) for walking tasks and a 34.14% reduction for running tasks. Furthermore, for complex tasks such as striking and dancing, the MAE decreases by 46.97% and 41.78%, respectively. These findings highlight that integrating LSTM and MHSA into the PPO algorithm not only enhances gait simulation accuracy but also improves the model's generalization capability, particularly for complex motion patterns. This research provides an efficient tool for motion simulation and gait analysis, advancing the development of human musculoskeletal control systems.
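To make the architecture concrete, here is a minimal PyTorch sketch of how an LSTM trunk and a multi-head self-attention block could feed a PPO actor head. Dimensions, layer sizes, and the sigmoid output for muscle activations are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class LSTMMHSAActor(nn.Module):
    """Sketch of an LSTM + multi-head self-attention policy trunk for PPO.

    Takes a window of observations (B, T, obs_dim) and emits muscle
    activations (B, act_dim) in [0, 1].
    """
    def __init__(self, obs_dim, act_dim, hidden=128, heads=4):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                  nn.Linear(hidden, act_dim))

    def forward(self, obs_seq):
        h, _ = self.lstm(obs_seq)                   # temporal features (B, T, H)
        a, _ = self.attn(h, h, h)                   # self-attention over time
        return torch.sigmoid(self.head(a[:, -1]))  # last step -> activations

actor = LSTMMHSAActor(obs_dim=60, act_dim=22)
print(actor(torch.randn(8, 16, 60)).shape)          # torch.Size([8, 22])
```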
|
|
15:05-15:10, Paper TuCT15.2 | |
MIXPINN: Mixed-Material Simulations by Physics-Informed Neural Network |
|
Yuan, Xintian | ETH Zurich |
Ao, Yunke | ETH Zurich |
Chen, Boqi | ETH Zurich |
Fuernstahl, Philipp | University of Zurich |
Keywords: Medical Robots and Systems, Deep Learning Methods, Simulation and Animation
Abstract: Simulating the complex interactions between soft tissues and rigid anatomy is critical for applications in surgical training, planning, and robotic-assisted interventions. Traditional Finite Element Method (FEM)-based simulations, while accurate, are computationally expensive and impractical for real-time scenarios. Learning-based approaches have shown promise in accelerating predictions but have fallen short in modeling soft-rigid interactions effectively. We introduce MIXPINN, a physics-informed Graph Neural Network (GNN) framework for mixed-material simulations, explicitly capturing soft-rigid interactions using graph-based augmentations. Our approach integrates Virtual Nodes (VNs) and Virtual Edges (VEs) to enhance rigid body constraint satisfaction while preserving computational efficiency. By leveraging a graph-based representation of biomechanical structures, MIXPINN learns high-fidelity deformations from FEM-generated data and achieves real-time inference with sub-millimeter accuracy. We validate our method in a realistic clinical scenario, demonstrating superior performance compared to baseline GNN models and traditional FEM methods. Our results show that MIXPINN reduces computational cost by an order of magnitude while maintaining high physical accuracy, making it a viable solution for real-time surgical simulation and robotic-assisted procedures.
|
|
15:10-15:15, Paper TuCT15.3 | |
Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes |
|
Guo, Meijun | School of Mechatronical Engineering, Beijing Institute of Technology
Shi, Yongliang | Qiyuan Lab |
Liu, Caiyun | Institute for AI Industry Research, Tsinghua University |
Feng, Yixiao | Qiyuan Lab |
Ma, Ming | Qiyuan Lab |
Tinghai, Yan | Qiyuan Lab |
Lu, Weining | Tsinghua University |
Liang, Bin | Tsinghua University |
Keywords: Mapping, Sensor Fusion, Simulation and Animation
Abstract: 3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation due to its balance between efficiency and visual quality. To address the issues of unstable pose estimation and scene representation distortion caused by geometric texture inconsistency in large outdoor scenes with weak or repetitive textures, we approach the problem from two aspects: pose estimation and scene representation. For pose estimation, we leverage LiDAR-IMU Odometry to provide prior poses for cameras in large-scale environments. These prior pose constraints are incorporated into COLMAP’s triangulation process, with pose optimization performed via bundle adjustment. Ensuring consistency between pixel data association and prior poses helps maintain both robustness and accuracy. For scene representation, we introduce normal vector constraints and effective rank regularization to enforce consistency in the direction and shape of Gaussian primitives. These constraints are jointly optimized with the existing photometric loss to enhance the map quality. We evaluate our approach using both public and self-collected datasets. In terms of pose optimization, our method requires only one-third of the time while maintaining accuracy and robustness across both datasets. In terms of scene representation, the results show that our method significantly outperforms conventional 3DGS pipelines. Notably, on self-collected datasets characterized by weak or repetitive textures, our approach demonstrates enhanced visualization capabilities and achieves superior overall performance.
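The effective rank regularization admits a compact sketch. The snippet below shows one plausible formulation (assumptions: the regularizer acts on each Gaussian's three scale parameters and penalizes needle-like primitives; the paper's exact loss may differ).

```python
import torch

def effective_rank(scales, eps=1e-8):
    """Effective rank of a Gaussian primitive from its 3 scale parameters.

    scales: (N, 3) positive axis lengths. erank = exp(H(p)) with p the
    normalized variance shares, so erank lies in [1, 3].
    """
    var = scales ** 2
    p = var / (var.sum(dim=-1, keepdim=True) + eps)
    entropy = -(p * torch.log(p + eps)).sum(dim=-1)
    return torch.exp(entropy)

def erank_loss(scales, target=2.0):
    # Penalize needle-like primitives (erank near 1) that cause artifacts.
    return torch.relu(target - effective_rank(scales)).mean()

scales = torch.tensor([[1.0, 1.0, 0.01],    # disk-like, erank ~ 2
                       [1.0, 0.01, 0.01]])  # needle-like, erank ~ 1
print(effective_rank(scales), erank_loss(scales))
```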
|
|
15:25-15:30, Paper TuCT15.6 | |
CIT*: Context-Based Biased Batch-Sampling for Almost-Surely Asymptotically Optimal Motion Planning |
|
Zhang, Liding | Technical University of Munich |
Wei, Yankun | Technical University of Munich |
Cai, Kuanqi | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Meng, Yuan | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Motion and Path Planning, Manipulation Planning, Task and Motion Planning
Abstract: This paper introduces Context Informed Trees (CIT*), a sampling-based motion planning algorithm that enhances exploration efficiency by biasing sampling based on uncertainty estimation from local samples and connectivity information obtained during the search process. CIT* is based on Flexible Informed Trees (FIT*) and incorporates three key components: region-based sampling, uncertainty-driven weighting, and connection-greedy prioritization (CGP). It generates regions from sampled states based on local obstacle proximity, assigning weights to these regions using probability uncertainty estimation via kernel density estimation (KDE) classification. To further refine the sampling focus, CGP prioritizes regions that exhibit strong connectivity in previous searches, ensuring that exploration is directed toward unknown and critical areas that have a higher likelihood of contributing to feasible and efficient paths. The sampling process is then guided by a mixture of Gaussian distributions centered on weighted regions, where the weighting biases sampling toward more critical regions, thereby improving search efficiency and accelerating convergence. Benchmark evaluations demonstrate that CIT* improves efficiency by reducing reliance on random sampling, which often leads to slower solution discovery and higher path costs. With biased sampling, CIT* maintains strong performance in solving complex motion planning problems in R^4 to R^16 and has been demonstrated on a real-world manipulation task. A video showcasing our method and experimental results is available at: https://youtu.be/SG2cy9WmjD0.
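The KDE-based region weighting can be sketched in a few lines. The toy below is an assumption-laden stand-in, not the CIT* implementation: two KDEs fitted on collision-free and colliding samples classify region centers, regions are weighted by posterior uncertainty (peaking near the obstacle boundary), and a biased batch is drawn from a Gaussian mixture over the weighted regions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def region_weights(free_samples, coll_samples, regions, bw=0.3):
    """Weight candidate sampling regions by uncertainty from a KDE classifier.

    free_samples / coll_samples: (N, d) previously valid / invalid states.
    regions: (R, d) region centers. Weight peaks where P(free) ~ 0.5.
    """
    kde_free = gaussian_kde(free_samples.T, bw_method=bw)
    kde_coll = gaussian_kde(coll_samples.T, bw_method=bw)
    pf, pc = kde_free(regions.T), kde_coll(regions.T)
    p = pf / (pf + pc + 1e-12)        # posterior of being collision-free
    w = p * (1.0 - p)                 # max at p = 0.5 (most uncertain)
    return w / w.sum()

def biased_batch(regions, weights, n, sigma=0.2):
    # Gaussian-mixture sampling centered on the weighted regions.
    idx = np.random.choice(len(regions), size=n, p=weights)
    return regions[idx] + sigma * np.random.randn(n, regions.shape[1])

rng = np.random.default_rng(0)
free = rng.normal(0, 1, (200, 2)); coll = rng.normal(2, 1, (150, 2))
regions = rng.uniform(-2, 4, (20, 2))
print(biased_batch(regions, region_weights(free, coll, regions), 5))
```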
|
|
15:30-15:35, Paper TuCT15.7 | |
Control and Localization of Magnetic Nanorobot Swarms in Human-Sized Vascular Phantom |
|
Wang, Shengyuan | Beihang University |
Chen, Zaiyang | The Chinese University of Hong Kong |
Zeng, Zijin | Beihang University |
Hu, Yunhan | Beihang University |
Sun, Hongyan | Beihang University |
Li, Chan | Beihang University |
Wang, Chutian | Beihang University |
Feng, Lin | Beihang University |
Keywords: Micro/Nano Robots, Medical Robots and Systems, Simulation and Animation
Abstract: Magnetically controlled micro-nano robots hold revolutionary significance in the clinical targeted treatment of brain tumors. Imaging and tracking miniature robots can provide feedback for precise magnetic field control. The cooperation among micro-nano robots, the magnetic field control system, and the imaging system is a significant challenge for transitioning micro-nano robots from laboratory research to clinical applications. This study explores the control and spatial localization of magnetic nanorobot swarms in a highly realistic, human-sized vascular phantom manufactured from raw CT scan images. The cerebral arterial vessels are the key focus area, with four main inlets and twenty-six branch outlets. The simulation results show that, under the influence of a magnetic field, the nanorobots can accumulate at the target tumor site. The Kernelized Correlation Filter (KCF) algorithm was employed to achieve single-plane tracking of nanorobots. Furthermore, based on a biplanar imaging system, three-dimensional spatial trajectory tracking of nanorobots was realized. This study provides a reference for in vivo spatial localization and imaging of magnetic nanorobot swarms (MNRS) transported through the vascular system.
|
|
15:35-15:40, Paper TuCT15.8 | |
Uncertain-Aware Informative Task Planning and Assignment for Multiple-UUVs Cooperative Underwater Exploration |
|
Jia, Chengfeng | Nanyang Technological University |
Liu, Fen | Nanyang Technological University |
Lu, Yun | Nanyang Technological University |
Zhang, Na | Nanyang Technological University |
Su, Rong | Nanyang Technological University |
Keywords: Marine Robotics, Probabilistic Inference, Task Planning
Abstract: This paper presents an uncertainty-aware exploration framework for cooperative underwater operations using multiple unmanned underwater vehicles (UUVs). The proposed framework leverages prior environmental information to iteratively integrate task planning, task assignment, and belief updating, enabling efficient exploration in unknown underwater environments. An interest area selection strategy is introduced to balance the exploration of uncharted regions and the exploitation of areas with high target likelihood. To optimize task allocation, a simultaneous auction-based mechanism is developed, ensuring maximal information gain while minimizing operational costs. Additionally, to address the computational constraints of UUV systems, a Sparse Gaussian Process (SGP) with variationally optimized inducing points is employed, enabling rapid and accurate fusion of real-time observations with prior data. This approach facilitates dynamic updates of the probabilistic environment representation and interest point selection without compromising accuracy. Experimental results in the HoloOcean simulator demonstrate the framework's effectiveness in refining the probabilistic environment representation, achieving efficient exploration and accurate target detection in complex underwater scenarios. The results highlight the framework's capability to dynamically adapt to environmental uncertainties, showcasing its potential for underwater exploration applications.
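The auction step can be illustrated with a toy single-round assignment. This sketch is a hypothetical stand-in for the paper's simultaneous auction mechanism: each UUV bids information gain minus weighted travel cost on every task, and bids are resolved greedily, one task per vehicle.

```python
import numpy as np

def simultaneous_auction(info_gain, travel_cost, lam=0.5):
    """Toy single-round auction for UUV task assignment (a sketch, not the
    paper's mechanism). info_gain, travel_cost: (n_uuv, n_task) matrices.
    Returns {uuv: task} with each task awarded to its highest bidder.
    """
    bids = info_gain - lam * travel_cost
    order = np.dstack(np.unravel_index(np.argsort(-bids, axis=None),
                                       bids.shape))[0]
    taken_uuv, taken_task, out = set(), set(), {}
    for u, t in order:                       # walk bids from best to worst
        if u not in taken_uuv and t not in taken_task:
            out[int(u)] = int(t)
            taken_uuv.add(u); taken_task.add(t)
    return out

gain = np.array([[3.0, 1.0, 2.0], [2.5, 2.2, 0.5]])
cost = np.array([[1.0, 0.2, 0.8], [0.4, 1.5, 0.3]])
print(simultaneous_auction(gain, cost))      # -> {0: 0, 1: 1}
```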
|
|
TuCT16 |
207 |
Human-Robot Collaboration |
Regular Session |
Chair: Calinon, Sylvain | Idiap Research Institute |
|
15:00-15:05, Paper TuCT16.1 | |
Enhancing Tactile Sensing in Robotics Using Null-Space Diffusion Model with EIT-Based Sensors |
|
Zhang, Qilin | University of Science and Technology of China |
Chen, Haofeng | Czech Technical University in Prague |
Yang, Xuanxuan | Chinese Academy of Sciences |
Ma, Gang | University of Science and Technology of China |
Wang, Xiaojie | Chinese Academy of Sciences |
Keywords: Human-Robot Collaboration, Wearable Robotics, Physical Human-Robot Interaction
Abstract: Robotic tactile sensors based on Electrical Impedance Tomography (EIT) have gained great attention in robotic sensing applications due to their features such as no internal wiring, an "all-in-one" structure, and continuous sensing capabilities. However, the effectiveness of EIT-based tactile sensors is hampered by limited spatial resolution and artifacts in the reconstructed images. To address these challenges, various iterative optimization methods based on spatial regularizations and model-based methods have been proposed. In this study, a new EIT reconstruction method using null-space decomposition based on a diffusion model (NSDM) is proposed. Specifically, NSDM consists of a forward diffusion process that first gradually adds Gaussian noise to a clean conductivity image, followed by a backward process that learns to predict the noise that should be removed during each sampling step, utilizing a prior to ensure that the denoising process does not deviate from the correct direction. NSDM requires no additional training or optimization at reconstruction time, only a pre-prepared diffusion model. Experimental results (both simulation and actual tests) demonstrate that the proposed method outperforms existing generative methods and provides higher-quality reconstructions, offering a new solution for robotic tactile sensing in real-world scenarios.
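The range/null-space idea behind the sampler can be shown with a linearized forward operator. This is a hedged sketch of a generic range-null-space data-consistency step (not necessarily NSDM's exact update): the measurement-determined range component is fixed, while the diffusion prior only shapes the null-space component.

```python
import numpy as np

def null_space_projection_step(x_hat, A, y):
    """One data-consistency step of null-space guided sampling.

    x_hat: (n,) denoised conductivity estimate at the current step.
    A:     (m, n) linearized EIT forward operator, y: (m,) measured voltages.
    The range component is pinned to the measurement-consistent solution
    A^+ y; the learned prior fills the null space (I - A^+ A) x_hat.
    """
    A_pinv = np.linalg.pinv(A)
    range_part = A_pinv @ y                     # fixed by the measurements
    null_part = x_hat - A_pinv @ (A @ x_hat)    # free; shaped by the prior
    return range_part + null_part

# Toy check: the projected estimate reproduces the measurements exactly.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 32)); x_true = rng.normal(size=32)
y = A @ x_true
x_hat = rng.normal(size=32)                     # stand-in for a denoiser output
print(np.allclose(A @ null_space_projection_step(x_hat, A, y), y))  # True
```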
|
|
15:05-15:10, Paper TuCT16.2 | |
Human-Robot Collaborative SLAM-XR |
|
Yassine, Karim | American University of Beirut |
Sayour, Malak | American University of Beirut |
Manasfi, Adam | American University of Beirut |
Hachach, Maya | American University of Beirut |
Dib, Nadim | Idaho State University |
Elhajj, Imad | American University of Beirut |
Khoury, Elie | Idealworks |
Asmar, Boulos | Idealworks |
Asmar, Daniel | American University of Beirut |
Keywords: Human-Robot Collaboration, SLAM, Localization
Abstract: In this paper, we propose a collaborative centralized 3D mapping and localization framework that harnesses the capabilities of both SLAM (Simultaneous Localization And Mapping) and XR (eXtended Reality). On one hand, our framework allows for integrating local maps generated by a multitude of heterogeneous agents (e.g. robots) into a unified map. On the other hand, it allows human intervention at multiple levels: first, humans can inspect and intervene in the mapping process in situ to produce 3D maps, overlay virtual assets, and add annotations, all of which can contribute towards enhanced autonomy and navigation. Second, beyond the mapping aspect, a human can also intervene in the localization task of any collaborating robot by inspecting and correcting its generated paths, and, if necessary, enforcing a desired trajectory. Experiments in two real settings demonstrated the superiority of the proposed system.
|
|
15:10-15:15, Paper TuCT16.3 | |
GEAR: Gaze-Enabled Human-Robot Collaborative Assembly |
|
Shahid, Asad Ali | IDSIA |
Moroncelli, Angelo | Università Della Svizzera Italiana, IDSIA
Brscic, Drazen | Kyoto University |
Kanda, Takayuki | Kyoto University |
Roveda, Loris | SUPSI-IDSIA |
Keywords: Human-Robot Collaboration, Assembly, Human Factors and Human-in-the-Loop
Abstract: Recent progress in robot autonomy and safety has significantly improved human-robot interactions, enabling robots to work alongside humans on various tasks. However, complex assembly tasks still present significant challenges due to inherent task variability and the need for precise operations. This work explores deploying robots in an assistive role for such tasks, where the robot assists by fetching parts while the skilled worker provides high-level guidance and performs the assembly. We introduce GEAR, a gaze-enabled system designed to enhance human-robot collaboration by allowing robots to respond to the user's gaze. We evaluate GEAR against a touch-based interface where users interact with the robot through a touchscreen. The experimental study involved 30 participants working on two distinct assembly scenarios of varying complexity. Results demonstrated that GEAR enabled participants to accomplish the assembly with reduced physical demand and effort compared to the touchscreen interface, especially for complex tasks, while maintaining strong performance and receiving objects effectively. Participants also reported an enhanced user experience while performing assembly tasks.
|
|
15:15-15:20, Paper TuCT16.4 | |
That’s Iconic! Designing Augmented Reality Iconic Gestures to Enhance Multi-Modal Communication for Morphologically Limited Robots |
|
Zhu, Yifei | Colorado School of Mines |
Torres, Alexander | Colorado School of Mines |
Aloia, Zane | Fairview High School |
Williams, Tom | Colorado School of Mines |
Keywords: Human-Robot Collaboration, Gesture, Posture and Facial Expressions
Abstract: Robots that use gestures in conjunction with speech can achieve more effective and natural communication with human teammates; however, not all robots have capable and dexterous arms. Augmented Reality technology has effectively enabled deictic gestures for morphologically limited robots in prior work; however, the design space of AR-facilitated iconic gestures remains under-explored. Moreover, existing work largely focuses on closed-world contexts, where all referents are known a priori. In this work, we present a human-subject study situated in an open-world context and compare the task performance and subjective perception associated with three different iconic gesture designs (anthropomorphic, non-anthropomorphic, deictic-iconic) against a previously studied abstract gesture design. Our quantitative and qualitative results demonstrate that deictic iconic gestures (in which a robot hand is shown pointing to a visualization of a target referent) outperform all other gestures on all metrics, but that non-anthropomorphic iconic gestures (where a visualization of a target referent appears on its own) are overall most preferred by users. These results represent a significant step toward enabling effective human-robot interactions in realistic large-scale open-world environments.
|
|
15:20-15:25, Paper TuCT16.5 | |
Physics-Informed Learning for Human Whole-Body Kinematics Prediction Via Sparse IMUs |
|
Guo, Cheng | Italian Institute of Technology |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Human-Robot Collaboration, Intention Recognition, Human Detection and Tracking
Abstract: Accurate and physically feasible human motion prediction is crucial for safe and seamless human-robot collaboration. While recent advancements in human motion capture enable real-time pose estimation, the practical value of many existing approaches is limited by the lack of future predictions and consideration of physical constraints. Conventional motion prediction schemes rely heavily on past poses, which are not always available in real-world scenarios. To address these limitations, we present a physics-informed learning framework that integrates domain knowledge into both training and inference to predict human motion using inertial measurements from only 5 IMUs. We propose a network that accounts for the spatial characteristics of human movements. During training, we incorporate forward and differential kinematics functions as additional loss components to regularize the learned joint predictions. At the inference stage, we refine the prediction from the previous iteration to update a joint state buffer, which is used as extra inputs to the network. Experimental results demonstrate that our approach achieves high accuracy, smooth transitions between motions, and generalizes well to unseen subjects. The source code and data are available at https://github.com/ami-iit/paper_guo_2025_iros_human_kinematics_prediction.
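A planar toy makes the kinematics-regularized training loss concrete. The sketch below is a simplified 2-link chain, not the paper's whole-body model, and the names are assumptions; it differentiates through forward kinematics so joint-angle predictions are penalized in task space.

```python
import torch

def fk_planar(thetas, lengths):
    """Forward kinematics of a planar chain: joint angles -> joint positions.

    thetas:  (B, J) relative joint angles, lengths: (J,) segment lengths.
    Returns (B, J, 2) positions accumulated along the chain.
    """
    cum = torch.cumsum(thetas, dim=-1)                    # absolute link angles
    steps = torch.stack([torch.cos(cum), torch.sin(cum)], -1) * lengths[:, None]
    return torch.cumsum(steps, dim=1)

def fk_loss(pred_thetas, target_positions, lengths):
    # Penalize predictions whose FK-implied positions drift from the targets.
    return ((fk_planar(pred_thetas, lengths) - target_positions) ** 2).mean()

lengths = torch.tensor([0.45, 0.42])                      # thigh, shank (m)
target = fk_planar(torch.tensor([[0.3, -0.5]]), lengths)  # reference pose
pred = torch.tensor([[0.25, -0.40]], requires_grad=True)
loss = fk_loss(pred, target, lengths)
loss.backward()                                           # gradients reach angles
print(loss.item(), pred.grad)
```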
|
|
15:25-15:30, Paper TuCT16.6 | |
IDAGC: Adaptive Generalized Human-Robot Collaboration Via Human Intent Estimation and Multimodal Policy Learning |
|
Liu, Haotian | Institute of Automation, Chinese Academy of Sciences |
Tong, Yuchuang | The Institute of Automation of the Chinese Academy of Sciences |
Liu, Guanchen | Institute of Automation, Chinese Academy of Sciences |
Ju, Zhaojie | University of Portsmouth |
Zhang, Zhengtao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Human-Robot Collaboration, Intention Recognition, Multi-Modal Perception for HRI
Abstract: In Human-Robot Collaboration (HRC), which encompasses physical interaction and remote cooperation, accurate estimation of human intentions and seamless switching of collaboration modes to adjust robot behavior remain paramount challenges. To address these issues, we propose an Intent-Driven Adaptive Generalized Collaboration (IDAGC) framework that leverages multimodal data and human intent estimation to facilitate adaptive policy learning across multi-tasks in diverse scenarios, thereby facilitating autonomous inference of collaboration modes and dynamic adjustment of robotic actions. This framework overcomes the limitations of existing HRC methods, which are typically restricted to a single collaboration mode and lack the capacity to identify and transition between diverse states. Central to our framework is a predictive model that captures the interdependencies among vision, language, force, and robot state data to accurately recognize human intentions with a Conditional Variational Autoencoder (CVAE) and automatically switch collaboration modes. By employing dedicated encoders for each modality and integrating extracted features through a Transformer decoder, the framework efficiently learns multi-task policies, while force data optimizes compliance control and intent estimation accuracy during physical interactions. Experiments highlight our framework's practical potential to advance the comprehensive development of HRC.
|
|
15:30-15:35, Paper TuCT16.7 | |
Whole-Body Impedance Control of a Humanoid Robot Based on Human-Human Demonstration for Human-Robot Collaboration |
|
Li, Chenzui | The Chinese University of Hong Kong |
Liu, Junjia | The Chinese University of Hong Kong |
Teng, Tao | The Chinese University of Hong Kong & Hong Kong Centre for Logistics Robotics
Wang, Shixiong | The Chinese University of Hong Kong |
Calinon, Sylvain | Idiap Research Institute |
Chen, Fei | T-Stone Robotics Institute, the Chinese University of Hong Kong |
Keywords: Human-Robot Collaboration, Compliance and Impedance Control, Learning from Demonstration
Abstract: This paper proposes a novel whole-body impedance control method for the Collaborative dUal-arm Robot manIpulator (CURI) in Human-Robot Collaboration (HRC). The method enables CURI to adapt its physical behavior to human motion while following trajectories learned from human-human demonstrations. A whole-body impedance controller coordinates the robot joints to achieve desired Cartesian space impedance. Collaborative tasks are captured from human-human demonstrations and represented using a Task-parameterized Gaussian Mixture Model (TP-GMM). Electromyography (EMG) sensors record muscle activities to estimate human impedance profiles, which are then mimicked by a variable impedance controller. An adaptive parameter is introduced to adjust robot stiffness based on spatial displacement between the robot and human, ensuring safe and efficient interaction. Experimental validation through confrontational Tai Chi pulling/pushing tasks demonstrates the superiority of the proposed adaptive impedance method over the fixed impedance controller.
|
|
15:35-15:40, Paper TuCT16.8 | |
Enhancing Performance in Human-Robot Collaboration: A Modular Architecture for Task Scheduling and Safe Trajectory Planning (I) |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Comari, Simone | University of Bologna |
Arrfou, Mohammad | Datalogic |
Andreoni, Gildo | Datalogic |
Carapia, Alessandro | IMA SpA |
Carricato, Marco | University of Bologna |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Robot Safety, Task and Motion Planning
Abstract: The integration of robots into shared workspaces alongside humans is the basis of Human-Robot Collaboration (HRC). This field of research has changed the paradigm of the industrial context, making HRC of pivotal importance for both researchers and the industry. In this context, a suitable task scheduling and trajectory planning strategy are crucial to achieve good performances and create a synergy between the two actors. Indeed, the task scheduling should be able to optimally distribute the tasks between the actors and recover from possible failures, i.e. by rescheduling the tasks. The trajectory planning strategy must comply with the safety standards that impose a reduction of velocity based on human behaviour. To this end, the monitoring system must also be safe-certified; otherwise, safety cannot be guaranteed. This paper proposes a novel architecture that integrates a dynamic task scheduling module with a dynamic trajectory planning module that explicitly considers ISO/TS 15066. For this purpose, the framework exploits a secure and certified monitoring system capable of tracking the human operator even in case of occlusions. The overall platform has been extensively validated both in a real and complex industrial scenario within the context of the ROSSINI EU project, where a dual-arm mobile robot collaborates with a human operator in an automatic machine-tending operation, and in a mock-up scenario.
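The ISO/TS 15066 speed-and-separation idea reduces to a distance budget. The helper below is a deliberately simplified sketch in the spirit of the standard's protective-distance reasoning; the constants and the formula itself are illustrative assumptions, not the normative text or the paper's implementation.

```python
def ssm_speed_limit(d, v_h=1.6, t_react=0.1, t_stop=0.3, c=0.2):
    """Simplified speed-and-separation-monitoring cap.

    d:       current human-robot separation (m)
    v_h:     assumed human approach speed (m/s)
    t_react: robot reaction time (s), t_stop: braking time (s)
    c:       intrusion/uncertainty margin (m)
    Returns the maximum robot Cartesian speed (m/s) so the robot can stop
    before the remaining separation is consumed.
    """
    t_total = t_react + t_stop
    budget = d - c - v_h * t_total      # distance left for robot motion
    return max(0.0, budget / t_total)

for d in (2.0, 1.0, 0.6):
    print(f"d={d:.1f} m -> v_max={ssm_speed_limit(d):.2f} m/s")
```

As the separation shrinks, the allowed speed falls to zero, which is the behaviour the trajectory planning module must reproduce online.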
|
|
TuCT17 |
210A |
Autonomous Vehicles 2 |
Regular Session |
|
15:00-15:05, Paper TuCT17.1 | |
MAER-Nav: Bidirectional Motion Learning through Mirror-Augmented Experience Replay for Robot Navigation |
|
Wang, Shanze | The Hong Kong Polytechnic University |
Tan, Mingao | Eastern Institute of Technology, Ningbo |
Yang, Zhibo | National University of Singapore |
Huang, Biao | Harbin Institute of Technology, Shenzhen |
Shen, Xiaoyu | Eastern Institute of Technology, Ningbo, China |
Huang, Hailong | The Hong Kong Polytechnic University |
Zhang, Wei | Eastern Institute of Technology, Ningbo |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, AI-Enabled Robotics
Abstract: Deep Reinforcement Learning (DRL) based navigation methods have demonstrated promising results for mobile robots, but suffer from limited action flexibility in confined spaces. Conventional DRL approaches predominantly learn forward-motion policies, causing robots to become trapped in complex environments where backward maneuvers are necessary for recovery. This paper presents MAER-Nav (Mirror-Augmented Experience Replay for Robot Navigation), a novel framework that enables bidirectional motion learning without requiring explicit failure-driven hindsight experience replay or reward function modifications. Our approach integrates a mirror-augmented experience replay mechanism with curriculum learning to generate synthetic backward navigation experiences from successful trajectories. Experimental results in both simulation and real-world environments demonstrate that MAER-Nav significantly outperforms state-of-the-art methods while maintaining strong forward navigation capabilities. The framework effectively bridges the gap between the comprehensive action space utilization of traditional planning methods and the environmental adaptability of learning-based approaches, enabling robust navigation in scenarios where conventional DRL methods consistently fail.
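The mirroring trick can be sketched for a differential-drive robot. The transformation below is one plausible instance (the observation layout and exact symmetry are assumptions, not MAER-Nav's scheme): a 180-degree body-frame rotation circularly shifts a full-circle scan, negates the goal vector, and converts a forward command into the matching backward one, so each successful forward transition yields a synthetic backward-motion sample.

```python
import numpy as np

def mirror_transition(scan, goal, action):
    """Mirror one transition for bidirectional motion learning (hypothetical
    layout). scan: (N,) 360-degree ranges; goal: (2,) goal in the robot
    frame; action: (2,) [linear_v, angular_w].
    """
    n = len(scan)
    m_scan = np.roll(scan, n // 2)                 # scan after a 180-deg turn
    m_goal = -goal                                 # goal flips to the rear
    m_action = np.array([-action[0], action[1]])   # drive backward instead
    return m_scan, m_goal, m_action

scan = np.random.uniform(0.5, 5.0, 360)
print(mirror_transition(scan, np.array([2.0, 0.5]), np.array([0.6, 0.2]))[2])
```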
|
|
15:05-15:10, Paper TuCT17.2 | |
Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making |
|
Wang, Kaifeng | Beijing Institute of Technology |
Chen, Yinsong | Beijing Institute of Technology |
Liu, Qi | Beijing Institute of Technology |
Li, Xueyuan | Beijing Institute of Technology |
Gao, Xin | Beijing Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Robot Safety
Abstract: In multi-agent safety-critical scenarios, traditional autonomous driving frameworks face significant challenges in balancing safety constraints and task performance. These frameworks struggle to quantify dynamic interaction risks in real-time and depend heavily on manual rules, resulting in low computational efficiency and conservative strategies. To address these limitations, we propose a Dynamic Residual Safe Reinforcement Learning (DRS-RL) framework grounded in a safety-enhanced networked Markov decision process. To our knowledge, this is the first time the weak-to-strong theory has been introduced into multi-agent decision-making, enabling lightweight dynamic calibration of safety boundaries via a weak-to-strong safety correction paradigm. Based on the multi-agent dynamic conflict zone model, our framework accurately captures spatiotemporal coupling risks among heterogeneous traffic participants and surpasses the static constraints of conventional geometric rules. Moreover, a risk-aware prioritized experience replay mechanism mitigates data distribution bias by mapping risk to sampling probability. Experimental results reveal that the proposed method significantly outperforms traditional RL algorithms in safety, efficiency, and comfort. Specifically, it reduces the collision rate by up to 92.17%, while the safety model accounts for merely 27% of the main model's parameters.
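The risk-to-sampling-probability mapping is the easiest piece to sketch. Below is a minimal risk-weighted replay buffer, a generic prioritized-replay variant under assumed names rather than the DRS-RL code: priorities are risk scores raised to an exponent, and importance weights correct the induced sampling bias.

```python
import numpy as np

class RiskPrioritizedReplay:
    """Replay buffer whose sampling probability grows with a transition's
    risk score (e.g. proximity to the dynamic conflict zone), counteracting
    the rarity of near-collision data.
    """
    def __init__(self, capacity=10000, alpha=0.6):
        self.buf, self.risk = [], []
        self.capacity, self.alpha = capacity, alpha

    def add(self, transition, risk):
        if len(self.buf) >= self.capacity:          # drop the oldest entry
            self.buf.pop(0); self.risk.pop(0)
        self.buf.append(transition); self.risk.append(risk)

    def sample(self, batch_size):
        pri = (np.asarray(self.risk) + 1e-6) ** self.alpha
        p = pri / pri.sum()
        idx = np.random.choice(len(self.buf), batch_size, p=p)
        weights = (len(self.buf) * p[idx]) ** -1.0  # importance correction
        return [self.buf[i] for i in idx], weights / weights.max()

rb = RiskPrioritizedReplay()
for _ in range(100):
    rb.add(("s", "a", "r", "s2"), risk=np.random.rand())
batch, w = rb.sample(8)
print(len(batch), w.round(2))
```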
|
|
15:10-15:15, Paper TuCT17.3 | |
Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods |
|
Steiner, Emily | University of Auckland |
Van der Spuy, Daniel | University of Auckland |
Zhou, Futian | University of Auckland |
Pama, Afereti | University of Auckland |
Liarokapis, Minas | The University of Auckland |
Williams, Henry | University of Auckland |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Collision Avoidance
Abstract: While autonomous racing performance in Time-Trial scenarios has seen significant progress and development, autonomous wheel-to-wheel racing and overtaking are still severely limited. These limitations are particularly apparent in real-life driving scenarios where state-of-the-art algorithms struggle to safely or reliably complete overtaking manoeuvres. This is important, as reliable navigation around other vehicles is vital for safe autonomous wheel-to-wheel racing. The F1Tenth Competition provides a useful opportunity for developing wheel-to-wheel racing algorithms on a standardised physical platform. The competition format makes it possible to evaluate overtaking and wheel-to-wheel racing algorithms against the state-of-the-art. This research presents a novel racing and overtaking agent capable of learning to reliably navigate a track and overtake opponents in both simulation and reality. The agent was deployed on an F1Tenth vehicle and competed against opponents running varying competitive algorithms in the real world. The results demonstrate that the agent’s training against opponents enables deliberate overtaking behaviours with an overtaking rate of 87% compared to 56% for an agent trained just to race.
|
|
15:15-15:20, Paper TuCT17.4 | |
AVP Scene Graph: Hierarchical Visual Language Mapping and Navigation for Autonomous Valet Parking |
|
Mu, Xiangru | Shanghai Jiao Tong University |
Chen, Fengyi | Shanghai Jiao Tong University |
Wang, Runhan | Shanghai Jiao Tong University |
Chen, Siyuan | Shanghai JiaoTong University |
Cai, Jiyuan | Shanghai Jiao Tong University |
Cai, Jia | Northwestern Polytechnical University |
Yang, Ming | Shanghai Jiao Tong University |
Qin, Tong | Shanghai Jiao Tong University |
Keywords: Mapping, Autonomous Vehicle Navigation, Vision-Based Navigation
Abstract: Autonomous valet parking (AVP) aims to help human drivers navigate to the desired location in the parking lot. Currently, the AVP task is not flexible enough to perform open-vocabulary navigation tasks such as "navigate to the exit" or "park near the elevator". The map formats widely used for AVP, such as vectorized maps, have limitations including limited semantics, high cost, and poor human-machine interaction, restricting the flexible application of AVP in complex scenarios. To address these problems, we propose AVP Scene Graph (AVP-SG), a hierarchical visual language mapping and navigation framework for open-vocabulary AVP tasks, which enables autonomous navigation from multi-modal human instructions. Our framework consists of two parts: a bottom-up mapping module and a top-down navigation module. In the mapping pipeline, assisted by a vision-language model (VLM) and an optical character recognition (OCR) model, we first extract open-vocabulary conceptual semantics from images and project them onto the elements of the map. Next, through a bottom-up scheme performing feature fusion layer by layer, the scene graph is built hierarchically, consisting of slot, lane, block, and garage layers. In the top-down navigation pipeline, the navigation goal can be efficiently found by an LLM-enhanced graph retrieval approach. Experiments on real-world AVP tasks prove that the self-driving vehicle can perform open-vocabulary AVP tasks successfully utilizing AVP-SG.
|
|
15:20-15:25, Paper TuCT17.5 | |
HiTail: Hierarchical Neural Planner for Adaptive and Flexible Long-Tail Trajectory Planning |
|
Zhang, Shenghong | Shanghai Jiao Tong University |
Zhou, Xiangyu | Shanghai Jiao Tong University |
Li, Xiao | Shanghai Jiaotong University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: A planner for autonomous vehicles must be capable of operating in diverse and complex real-world environments. However, learning-based planners often struggle with limited generalization due to the long-tail distribution in datasets. Moreover, the black-box nature of neural networks limits their interpretability and complicates the integration of explicit rules. In this work, we propose a hierarchical neural trajectory planner that takes the bird’s-eye view (BEV) rasters as input. The planner operates in two hierarchical phases: first, spatial proposals are sampled from a policy generated from interpretable learned reward maps, and second, learnable temporal velocity profiles are assigned to the spatial proposals using clothoid curves. We conduct training and closed-loop simulation on the nuPlan dataset. The results demonstrate that our proposed planner outperforms other learning-based methods, exhibiting superior adaptability in long-tail scenarios. Additionally, we explore the flexibility of our planner in integrating manually defined rule sets. Project website: https://iunone.github.io/HiTail
|
|
15:25-15:30, Paper TuCT17.6 | |
MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving |
|
Wang, Xiyang | Chongqing University |
Qi, Shouzheng | National University of Defense Technology |
Zhao, Jieyou | Sichuan University |
Zhou, Hangning | Mach-Drive |
Zhang, Siyu | Mach |
Wang, Guoan | Mach Drive |
Kai, Tu | Mach |
Guo, Songlin | Mach Drive |
Zhao, Jianbo | University of Science and Technology of China |
Li, Jian | National University of Defense Technology |
Qin, Hailong | National University of Singapore |
Yang, Mu | Megvii Inc |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems, Sensor Fusion
Abstract: This paper introduces MCTrack, a new 3D multi-object tracking method that achieves state-of-the-art (SOTA) performance across KITTI, nuScenes, and Waymo datasets. Addressing the gap in existing tracking paradigms, which often perform well on specific datasets but lack generalizability, MCTrack offers a unified solution. Additionally, we have standardized the format of perceptual results across various datasets, termed BaseVersion, facilitating researchers in the field of multi-object tracking (MOT) to concentrate on core algorithmic development without the undue burden of data preprocessing. Finally, recognizing the limitations of current evaluation metrics, we introduce a novel set of metrics designed to evaluate the output of motion information, including velocity and acceleration, which are essential for subsequent tasks. The source code of the proposed method is available at this link: https://github.com/megvii-research/MCTrack
|
|
15:30-15:35, Paper TuCT17.7 | |
Decremental Dynamics Planning for Robot Navigation |
|
Lu, Yuanjie | George Mason University |
Xu, Tong | George Mason University |
Wang, Linji | George Mason University |
Hawes, Nick | University of Oxford |
Xiao, Xuesu | George Mason University |
Keywords: Autonomous Vehicle Navigation, Dynamics, Motion and Path Planning
Abstract: Most, if not all, robot navigation systems employ a decomposed planning framework that includes global and local planning. To trade-off onboard computation and plan quality, current systems have to limit all robot dynamics considerations only within the local planner, while leveraging an extremely simplified robot representation (e.g., a point-mass holonomic model without dynamics) in the global level. However, such an artificial decomposition based on either full or zero consideration of robot dynamics can lead to gaps between the two levels, e.g., a global path based on a holonomic point-mass model may not be realizable by a non-holonomic robot, especially in highly constrained obstacle environments. Motivated by such a limitation, we propose a novel paradigm, Decremental Dynamics Planning (DDP), that integrates dynamic constraints into the entire planning process, with a focus on high-fidelity dynamics modeling at the beginning and a gradual fidelity reduction as the planning progresses. To validate the effectiveness of this paradigm, we augment three different planners with DDP and show overall improved planning performance. We also develop a new DDP-based navigation system, which achieves the first place in the simulation phase of the 2025 BARN Challenge. Both simulated and physical experiments validate DDP's hypothesized benefits.
|
|
TuCT18 |
210B |
Multi-Robot Systems 3 |
Regular Session |
|
15:00-15:05, Paper TuCT18.1 | |
GNN-Based Decentralized Perception in Multirobot Systems for Predicting Worker Actions |
|
Imran, Ali | École De Technologie Supérieure ÉTS |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
St-Onge, David | Ecole De Technologie Superieure |
Keywords: Multi-Robot Systems, Deep Learning for Visual Perception, Intention Recognition
Abstract: In industrial environments, predicting human actions is essential for ensuring safe and effective collaboration between humans and robots. This paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. The framework first allows each robot to build a spatial graph representing its surroundings, which it then shares with other robots. This shared spatial data is combined with temporal information to track human behavior over time. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human’s actions. Results show that adding more robots and incorporating longer time sequences improve prediction accuracy. Additionally, the consensus mechanism increases system resilience, making the multi-robot setup more reliable in dynamic industrial settings.
|
|
15:05-15:10, Paper TuCT18.2 | |
Decentralized Admittance Control for a Multi–manipulator System: Implementation and Analysis |
|
Carriero, Graziano | University of Basilicata |
Sileo, Monica | University of Basilicata |
Fregnan, Sebastiano | Lund University |
Guberina, Marko | Lund University |
Pierri, Francesco | Università Della Basilicata
Caccavale, Fabrizio | Università Degli Studi Della Basilicata
Karayiannidis, Yiannis | Lund University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: A decentralized strategy for object transportation is presented, assuming that the object is grasped by a team of N cooperative manipulators. The proposed strategy consists of two steps. First, each robot estimates the wrenches applied to the object by all the other robots, even without all-to-all communication. Second, an admittance control scheme is used to limit internal wrenches, preventing excessive stresses that could affect manipulation stability and object integrity. Stability is proven under the assumption of a spring connection between each robot end-effector and its grasping point on the object. A work cell with two 7-degree-of-freedom (DOF) and one 6-DOF robotic manipulators was used to validate the strategy. Experimental results show that the controller effectively reduces internal wrenches, confirming the feasibility and robustness of the decentralized approach in cooperative manipulation.
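The admittance layer itself is a second-order filter driven by the estimated internal wrench. Below is a minimal sketch with illustrative gains and a translational-only model; the paper's controller and stability proof are richer.

```python
import numpy as np

def admittance_step(x, xd, f_int, dt=0.002,
                    M=np.diag([2.0] * 3), D=np.diag([40.0] * 3),
                    K=np.diag([100.0] * 3)):
    """One Euler step of a Cartesian admittance law:
    M dx'' + D dx' + K dx = f_int, where f_int is the estimated internal
    wrench each robot tries to relax. Returns updated offset and velocity.
    """
    xdd = np.linalg.solve(M, f_int - D @ xd - K @ x)
    xd = xd + xdd * dt
    x = x + xd * dt
    return x, xd

# A constant 10 N internal force along x lets the grasp point yield until
# the virtual spring balances it at f/K = 0.1 m.
x, xd = np.zeros(3), np.zeros(3)
for _ in range(5000):
    x, xd = admittance_step(x, xd, np.array([10.0, 0.0, 0.0]))
print(x.round(3))    # -> approximately [0.1, 0, 0]
```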
|
|
15:10-15:15, Paper TuCT18.3 | |
Distributed Coverage Control for Time-Varying Spatial Processes |
|
Pratissoli, Federico | Università Degli Studi Di Modena E Reggio Emilia
Mantovani, Mattia | University of Modena and Reggio Emilia |
Prorok, Amanda | University of Cambridge |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Networked Robots, Sensor Networks
Abstract: Multi-robot systems are essential for environmental monitoring, particularly for tracking spatial phenomena such as pollution, soil minerals, and water salinity. This study addresses the challenge of deploying a multi-robot team for optimal coverage in environments where the density distribution, describing areas of interest, is unknown and changes over time. We propose a fully distributed control strategy that uses Gaussian Processes (GPs) to model the spatial field and balance the trade-off between learning the field and optimally covering it. Unlike existing approaches, we address a more realistic scenario by handling time-varying spatial fields, where the exploration-exploitation trade-off is dynamically adjusted over time. Each robot operates locally, using only its own collected data and the information shared by the neighboring robots. To address the computational limits of GPs, the algorithm efficiently manages the volume of data by selecting only the most relevant samples for the process estimation. The performance of the proposed algorithm is evaluated through several simulations and experiments, incorporating real-world data phenomena to validate its effectiveness.
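One Lloyd-style iteration of a GP-driven coverage law can be sketched directly. The toy below is an illustrative reading of the exploration-exploitation density (symbols like beta and the exact update are assumptions): each robot moves toward the centroid of its Voronoi cell, weighted by the GP posterior mean plus an uncertainty bonus.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def coverage_step(positions, gp, grid, beta=1.0, gain=0.5):
    """Move each robot toward the weighted centroid of its Voronoi cell.

    positions: (N, 2) robot positions, grid: (M, 2) discretized workspace.
    Density = GP mean (exploit) + beta * GP std (explore).
    """
    mu, std = gp.predict(grid, return_std=True)
    phi = np.maximum(mu, 0.0) + beta * std
    owner = np.argmin(((grid[:, None] - positions[None]) ** 2).sum(-1), axis=1)
    new_pos = positions.copy()
    for i in range(len(positions)):
        cell, w = grid[owner == i], phi[owner == i]
        if w.sum() > 1e-9:
            centroid = (cell * w[:, None]).sum(0) / w.sum()
            new_pos[i] += gain * (centroid - positions[i])
    return new_pos

# Fit a toy GP field from sparse samples, then move 3 robots one step.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (30, 2))
y = np.exp(-((X - 0.7) ** 2).sum(1) / 0.05)       # hotspot near (0.7, 0.7)
gp = GaussianProcessRegressor(kernel=RBF(0.2)).fit(X, y)
g = np.stack(np.meshgrid(np.linspace(0, 1, 25),
                         np.linspace(0, 1, 25)), -1).reshape(-1, 2)
print(coverage_step(rng.uniform(0, 1, (3, 2)), gp, g))
```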
|
|
15:15-15:20, Paper TuCT18.4 | |
Merry-Go-Round: Safe Control of Decentralized Multi-Robot Systems with Deadlock Prevention |
|
Lee, Wonjong | Sogang University |
Sim, Joonyeol | Sogang University |
Kim, Joonkyung | Sogang University |
Jo, Siwon | University of North Carolina at Charlotte |
Luo, Wenhao | University of Illinois Chicago |
Nam, Changjoo | Sogang University |
Keywords: Multi-Robot Systems, Distributed Robot Systems
Abstract: We propose a hybrid approach for decentralized multi-robot navigation that ensures both safety and deadlock prevention. Building on a standard QP-based control formulation, we add a lightweight deadlock resolution mechanism by forming temporary "roundabouts" (circular reference paths). Each robot relies only on local, peer-to-peer communication and a controller for basic collision avoidance; a roundabout is generated or joined on demand to avert deadlocks. Robots in the roundabout travel counterclockwise until an escape condition is met, allowing them to return to goal-oriented motion. Unlike classical decentralized methods that lack explicit deadlock resolution, our roundabout maneuver ensures system-wide forward progress while preserving safety constraints. Extensive simulations and physical robot experiments show that our method consistently outperforms or matches the success and arrival rates of other decentralized control approaches, particularly in cluttered or high-density scenarios, all with minimal centralized coordination.
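The roundabout maneuver is geometrically simple to sketch. The helpers below are hypothetical (the paper's generation and escape conditions are more involved): one advances a robot counterclockwise along a circular reference path, the other tests whether the straight segment to the goal is clear so the robot may leave.

```python
import numpy as np

def roundabout_waypoint(center, radius, robot_pos, dt=0.1, speed=0.5):
    """Next reference point on a counterclockwise circular path."""
    ang = np.arctan2(robot_pos[1] - center[1], robot_pos[0] - center[0])
    ang += speed * dt / radius               # advance along the circle
    return center + radius * np.array([np.cos(ang), np.sin(ang)])

def can_escape(robot_pos, goal, others, clearance=0.6):
    # Leave the roundabout when the straight segment to the goal is clear.
    d = goal - robot_pos
    L = np.linalg.norm(d)
    for o in others:
        t = np.clip(np.dot(o - robot_pos, d) / (L ** 2 + 1e-9), 0.0, 1.0)
        if np.linalg.norm(robot_pos + t * d - o) < clearance:
            return False
    return True

pos = np.array([1.0, 0.0])
print(roundabout_waypoint(np.zeros(2), 1.0, pos))
print(can_escape(pos, np.array([3.0, 0.0]), [np.array([2.0, 0.1])]))  # False
```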
|
|
15:20-15:25, Paper TuCT18.5 | |
DIBNN: A Dual-Improved-BNN Based Algorithm for Multi-Robot Cooperative Area Search in Complex Obstacle Environments (I) |
|
Chen, Bo | Hunan University |
Zhang, Hui | Hunan University |
Zhang, Fangfang | Zhengzhou University |
Jiang, Yiming | Hunan University |
Miao, Zhiqiang | Hunan University |
Yu, Hongnian | Edinburgh Napier University |
Wang, Yaonan | Hunan University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Task and Motion Planning
Abstract: For the area search task of multi-robot systems in unknown complex obstacle environments, we propose a cooperative area search algorithm based on a dual-improved bionic neural network (DIBNN). First, we improve the BNN model to reduce the interference of complex obstacle environments with robot decision-making. Each robot selects the neuron with the largest activity value among its neighboring neurons as its next motion position. Then, we propose a cooperative search mechanism: when a robot falls into a local deadlock state in a complex obstacle environment, the mechanism guides it to quickly find unsearched areas. Finally, multi-robot area search simulation experiments are conducted in different obstacle environments, and the proposed algorithm is compared with three baseline algorithms in this field. The simulation results verify that the proposed algorithm can efficiently guide multiple robots to complete area search tasks in complex obstacle environments.
|
|
15:25-15:30, Paper TuCT18.6 | |
Autonomous 3D Moving Target Encirclement and Interception with Range Measurement |
|
Liu, Fen | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Nguyen, Thien-Minh | Nanyang Technological University |
Su, Rong | Nanyang Technological University |
Keywords: Multi-Robot Systems, Localization, Motion and Path Planning
Abstract: Small UAVs pose a growing threat, capable of carrying hazardous payloads or causing damage. To counter this, we introduce an autonomous 3D target encirclement and interception strategy. Unlike traditional ground-guided systems, this strategy employs autonomous drones to track and engage non-cooperative hostile UAVs, which is effective in non-line-of-sight conditions, GPS denial, and radar jamming, where conventional detection and neutralization from ground guidance fail. Using two real-time noisy distances measured by drones, guardian drones estimate the relative position from themselves to the target using observation and velocity compensation methods, based on anti-synchronization (AS) and an X-Y circular motion combined with vertical jitter. An encirclement control mechanism is proposed to enable UAVs to adaptively transition from encircling and protecting a target to encircling and monitoring a hostile target. Upon breaching a warning threshold, the UAVs may even employ a suicide attack to neutralize the hostile target. We validate this strategy through real-world UAV experiments and simulated analysis in MATLAB, demonstrating its effectiveness in detecting, encircling, and intercepting hostile drones. More details: https://youtu.be/L2435lyiBGo.
|
|
15:30-15:35, Paper TuCT18.7 | |
Distributed Fault-Tolerant Multi-Robot Cooperative Localization in Adversarial Environments |
|
Kargar Tasooji, Tohid | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Keywords: Localization, Multi-Robot Systems, Robust/Adaptive Control
Abstract: In multi-robot systems (MRS), cooperative localization is a crucial task for enhancing system robustness and scalability, especially in GPS-denied or communication-limited environments. However, adversarial attacks, such as sensor manipulation and communication jamming, pose significant challenges to the performance of traditional localization methods. In this paper, we propose a novel distributed fault-tolerant cooperative localization framework to enhance resilience against sensor and communication disruptions in adversarial environments. We introduce an adaptive event-triggered communication strategy that dynamically adjusts communication thresholds based on real-time sensing and communication quality. This strategy ensures optimal performance even in the presence of sensor degradation or communication failure. Furthermore, we conduct a rigorous analysis of the convergence and stability properties of the proposed algorithm, demonstrating its resilience against bounded adversarial zones and maintaining accurate state estimation. Robotarium-based experiment results show that our proposed algorithm significantly outperforms traditional methods in terms of localization accuracy and communication efficiency, particularly in adversarial settings. Our approach offers improved scalability, reliability, and fault tolerance for MRS, making it suitable for large-scale deployments in real-world, challenging environments.
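The adaptive trigger can be sketched as a single inequality. The rule below is a hypothetical instance of the idea (the threshold shape and names are assumptions): transmit when the innovation since the last broadcast exceeds a threshold that rises on poor links, to conserve a jammed channel, and falls when local sensing is good, since those updates are most valuable to neighbors.

```python
import numpy as np

def should_transmit(x_est, x_last_sent, sensor_quality, link_quality,
                    base_thresh=0.05):
    """Adaptive event-trigger sketch.

    sensor_quality, link_quality: scores in (0, 1].
    Returns True when the state innovation justifies a broadcast.
    """
    thresh = base_thresh * (2.0 - link_quality) / max(sensor_quality, 1e-3)
    return float(np.linalg.norm(x_est - x_last_sent)) > thresh

x, last = np.array([1.0, 2.0]), np.array([0.9, 2.0])
print(should_transmit(x, last, sensor_quality=0.9, link_quality=0.8))  # True
print(should_transmit(x, last, sensor_quality=0.3, link_quality=0.2))  # False
```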
|
|
15:35-15:40, Paper TuCT18.8 | |
Semi-Distributed Cross-Modal Air-Ground Relative Localization |
|
Lu, Weining | Tsinghua University |
Deer, Bin | Qi Yuan Laboratory |
Ma, Lian | QiyuanLab |
Ma, Ming | Qiyuan Lab |
Ma, Zhihao | QiYuanLab |
Chen, Xiangyang | JiangHuai Advanced Technology Center |
Wang, Longfei | Jianghuai Advanced Technology Center |
Feng, Yixiao | Qiyuan Lab |
Jiang, Zhouxian | Qiyuan Lab |
Shi, Yongliang | Qiyuan Lab |
Liang, Bin | Tsinghua University |
Keywords: Localization, Sensor Fusion, Multi-Robot Systems
Abstract: Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in the form of distributed multi-robot SLAM systems with the same sensor configuration, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we fully leverage the high capacity of the Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi-distributed cross-modal air-ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors, which decouples the relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: first, optimizing camera poses interpolated from LiDAR-Inertial Odometry (LIO), followed by estimating the relative camera poses between the UGV and UAV. Additionally, we implement an incremental loop closure detection algorithm using deep learning-based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi-robot SLAM approaches that transmit images or point clouds, our method only transmits keypoint pixels and their descriptors, effectively constraining the communication bandwidth under 0.3 Mbps. Code and data will be made publicly available at https://github.com/Ascbpiac/cross-model-relative-localization
|
|
TuCT19 |
210C |
Grasping 3 |
Regular Session |
|
15:00-15:05, Paper TuCT19.1 | |
FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models (I) |
|
Tang, Chao | Southern University of Science and Technology |
Huang, Dehao | Southern University of Science and Technology |
Dong, Wenlong | Southern University of Science and Technology |
Xu, Ruinian | Georgia Institute of Technology |
Zhang, Hong | SUSTech |
Keywords: Grasping, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Task-oriented grasping (TOG), which refers to synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, is the first milestone towards tool manipulation. Analogous to the activation of two brain regions responsible for semantic and geometric reasoning during cognitive processes, modeling the intricate relationship between objects, tasks, and grasps necessitates rich semantic and geometric prior knowledge about these elements. Existing methods typically restrict the prior knowledge to a closed-set scope, limiting their generalization to novel objects and tasks out of the training set. To address such a limitation, we propose FoundationGrasp, a foundation model-based TOG framework that leverages the open-ended knowledge from foundation models to learn generalizable TOG skills. Extensive experiments are conducted on the contributed Language and Vision Augmented TaskGrasp (LaViA-TaskGrasp) dataset, demonstrating the superiority of FoundationGrasp over existing methods when generalizing to novel object instances, object classes, and tasks out of the training set. Furthermore, the effectiveness of FoundationGrasp is validated in real-robot grasping and manipulation experiments on a 7-DoF robotic arm. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/foundationgrasp.
|
|
15:05-15:10, Paper TuCT19.2 | |
Learning Wrist Policies for Anthropomorphic Soft Power Grasping in Handle and Door Manipulation |
|
Voigt, Florian | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Grasping, Dexterous Manipulation, Service Robots, Modeling, Control, and Learning for Soft Robots
Abstract: In this work, we advance robotic grasping by incorporating wrist compliance in a unified hand–arm system inspired by human limb coordination. This integration improves grasping reliability and robustness through impedance and force learning in robotic arms. The compliant wrist system effectively compensates for uncertainties in object position and orientation. Employing a combined impedance-force control approach, we address diverse grasping and manipulation tasks in simulation. Successfully transferring the learned policy to a service humanoid mobile robot enables the seamless execution of grasping and opening tasks for various doors and handles without additional learning, using both fully actuated and underactuated robotic hands. Remarkably, our robust strategies yielded only one failure in 30 trials for the underactuated hand, even with up to 8 cm translation normal to the handle and 33° rotation errors, and no failures for the fully actuated one with up to 12 cm translation and 30° rotation. This significantly outperforms state-of-the-art end-to-end reinforcement learning approaches. Furthermore, we successfully tested and validated our approach across various constrained everyday tasks in different environments. Our proposed framework represents an advancement in the learning and execution of power grasping with compliant manipulation, achieving practically relevant performance.
|
|
15:10-15:15, Paper TuCT19.3 | |
Stochastic Force-Closure Grasp Synthesis for Unknown Objects Using Proximity Perception (I) |
|
Xu, Wei | Shanghai Jiao Tong University |
Zhao, Yanchao | Shanghai Jiao Tong Universtiy |
Guo, Weichao | Shanghai Jiao Tong University |
Sheng, Xinjun | Shanghai Jiao Tong University |
Keywords: Grasping, Perception for Grasping and Manipulation, Force and Tactile Sensing
Abstract: Proximity perception is a promising technology that provides near-field information valuable for robotics. To improve the functionality of proximity perception in anthropomorphic hands, we propose a novel stochastic force-closure grasp synthesis (SFCGS) that finds robust grasps insensitive to object uncertainty introduced by the lack of explorations and perception noise. Specifically, we propose a dual-mode perception system comprised of five flexible capacitive dual-mode (proximity and pressure) sensors. Using the Gaussian process, we explore unknown objects with proximity perception and approximate their signed distance function (SDF). The SFCGS formulates the problem of finding the probabilistically optimal grasps as minimizing the separation probability of the origin and stochastic grasp wrench space (SGWS). In addition, a dual-mode reactive controller is presented to improve the grasping success rate. The results from simulation experiments indicate that proximity perception can reduce the reconstruction error by 4.53cm compared to tactile perception. Furthermore, the newly introduced SFCGS can yield more uncertainty-insensitive grasps than the traditional force-closure approach. In real-world experiments, the proposed approach achieves a considerable 13.4% improvement in success rate over benchmark methods. The outcomes of this study are significant in promoting the application of proximity perception in robot hand-arm systems and upper-limb prostheses.
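The Gaussian-process SDF approximation at the heart of the exploration step can be sketched with off-the-shelf tools. The toy below is a scikit-learn stand-in, not the paper's implementation; the kernel and noise settings are assumptions. It regresses signed distance from proximity-derived surface points and off-surface clearances.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_sdf(surface_pts, off_pts, off_dists):
    """Approximate an object's signed distance function by GP regression.

    surface_pts: (N, 3) points where proximity sensing detected the surface
                 (SDF = 0); off_pts/off_dists: points with known clearance.
    """
    X = np.vstack([surface_pts, off_pts])
    y = np.concatenate([np.zeros(len(surface_pts)), off_dists])
    kernel = RBF(length_scale=0.05) + WhiteKernel(noise_level=1e-4)
    return GaussianProcessRegressor(kernel=kernel).fit(X, y)

# Toy sphere of radius 3 cm: query the GP mean and uncertainty anywhere.
rng = np.random.default_rng(3)
dirs = rng.normal(size=(60, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
surf = 0.03 * dirs
off, off_d = 0.05 * dirs[:20], np.full(20, 0.02)   # 2 cm outside the surface
gp = fit_sdf(surf, off, off_d)
mean, std = gp.predict(np.array([[0.04, 0.0, 0.0]]), return_std=True)
print(mean.round(4), std.round(4))                 # SDF roughly 0.01 m
```

The posterior standard deviation is what makes this representation useful for uncertainty-aware grasp synthesis: it quantifies how poorly explored each surface region still is.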
|
|
15:15-15:20, Paper TuCT19.4 | |
Structured Local Feature-Conditioned 6-DOF Variational Grasp Detection Network in Cluttered Scenes (I) |
|
Liu, Hongyang | Beijing Institute of Technology |
Li, Hui | Beijing Institute of Technology |
Jiang, Changhua | China Astronaut Research and Training Center |
Xue, Shuqi | Astronaut Research and Training Center of China |
Zhao, Yan | Beijing Institute of Technology |
Huang, Xiao | Beijing Institute of Technology |
Jiang, Zhihong | Beijing Institute of Technology |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: One of the most crucial abilities for robots is to grasp objects accurately in cluttered scenes. This article proposes a structured local feature-conditioned 6-DOF variational grasp detection network (LF-GraspNet) that can generate accurate grasp configurations in cluttered scenes end to end. First, we propose a network using a 3-D convolutional neural network with a conditional variational autoencoder (CVAE) as a backbone. The explorability of the VAE enhances the network's generalizability in grasp detection. Second, we jointly encode the truncated signed distance function (TSDF) of the scene and successful grasp configurations into the global feature as the prior of the latent space of the CVAE. The structured local feature of the TSDF volume is used as the condition of the CVAE, which can then skillfully fuse different modalities and scales of features. Simulation and real-world grasp experiments demonstrate that LF-GraspNet, trained on a grasp dataset with a limited number of primitive objects, achieves better success rates and declutter rates for unseen objects in cluttered scenes than baseline methods. Specifically, in real-world grasp experiments, LF-GraspNet achieves stable grasping of objects in cluttered scenes with single-view and multiview depth images as input, demonstrating its excellent grasp performance and generalization ability from simple primitive objects to complex and unseen objects.
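The CVAE backbone described above can be illustrated with a generic conditional-VAE training step. The sketch below is not the LF-GraspNet architecture; the layer sizes, KL weight, and the use of small MLPs in place of the paper's 3-D CNN are illustrative assumptions.

```python
# Generic conditional-VAE objective: reconstruct grasp configurations x given
# a conditioning feature c, with a reparameterized latent and a KL penalty.
import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    def __init__(self, x_dim=7, c_dim=64, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))   # -> mu, logvar
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim))

    def forward(self, x, c):
        mu, logvar = self.enc(torch.cat([x, c], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], -1)), mu, logvar

model = TinyCVAE()
x = torch.randn(16, 7)     # e.g. grasp configurations (placeholder)
c = torch.randn(16, 64)    # e.g. a local TSDF feature as the condition
recon, mu, logvar = model(x, c)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + 0.1 * kl  # KL weight is assumed
loss.backward()
```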
|
|
15:20-15:25, Paper TuCT19.5 | |
Adaptive, Rapid, and Stable Trident Robotic Gripper: A Bistable Tensegrity Structure Implementation (I) |
|
Zhang, Jie | Dalian University of Technology |
Yang, Hao | Sun Yat-Sen University |
Zhao, Yuwen | Sun Yat-Sen University |
Yang, Jinzhao | Sun Yat-Sen University |
Ozkan-Aydin, Yasemin | University of Notre Dame |
Li, Shengkai | Princeton University |
Rajabi, Hamed | London South Bank University |
Peng, Haijun | Dalian University of Technology |
Wu, Jianing | Sun Yat-Sen University |
Keywords: Grasping, Biologically-Inspired Robots, Grippers and Other End-Effectors
Abstract: Grasping and manipulating objects represent crucial functionalities for modern robotic grippers. Nonetheless, an enduring challenge persists in engineering a gripper capable of achieving adaptive, rapid, and stable grasping behavior simultaneously. Here, we propose a bistable mechanism derived from compliant tensegrity structures that performs a rapid shape change within 200 ms, based on which a Trident robotic gripper is developed without the need for additional actuation sources. This universal and unactuated paradigm showcases robust grasping abilities with adaptability and rapidity analogous to natural flytrap leaves. The tunable bistable properties, achieved by varying geometrical parameters, endow the gripper with extensive design flexibility. The morphing configuration of the gripper retains residual energy after transitioning between stable states, ensuring stable grasping without requiring a continuous energy supply. The benefits of the gripper empower the creation of a robotic system capable of handling objects with diverse profiles across various fields and an unactuated mechanism proficient in gently grasping and swiftly moving insects.
|
|
15:25-15:30, Paper TuCT19.6 | |
Grasp EveryThing (GET): 1-DoF, 3-Fingered Gripper with Tactile Sensing for Robust Grasping |
|
Burgess, Michael | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Grippers and Other End-Effectors, Force and Tactile Sensing, Hardware-Software Integration in Robotics
Abstract: We introduce the Grasp EveryThing (GET) gripper, a novel 1-DoF, 3-finger design for securely grasping objects of many shapes and sizes. Mounted on a standard parallel jaw actuator, the design features three narrow, tapered fingers arranged in a two-against-one configuration, where the two fingers converge into a V-shape. The GET gripper is more capable of conforming to object geometries and forming secure grasps than traditional designs with two flat fingers. Inspired by the principle of self-similarity, these V-shaped fingers enable secure grasping across a wide range of object sizes. Further to this end, fingers are parametrically designed for convenient resizing and interchangeability across robotic embodiments with a parallel jaw gripper. Additionally, we incorporate a rigid fingernail for ease in manipulating small objects. Tactile sensing can be integrated into the standalone finger via an externally-mounted camera. A neural network was trained to estimate normal force from tactile images with an average validation error of 1.3 N across a diverse set of geometries. In grasping 15 objects and performing 3 tasks via teleoperation, the GET fingers consistently outperformed standard flat fingers. All finger designs, compatible with multiple robotic embodiments, both incorporating and lacking tactile sensing, are available on GitHub.
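The tactile force estimation mentioned in the abstract amounts to image-to-scalar regression. Below is a minimal sketch of such a regressor; the layer sizes, 64x64 input, and placeholder data are assumptions, not the authors' trained network.

```python
# Sketch of a normal-force regressor from tactile images, in the spirit of the
# network described in the abstract (trained to ~1.3 N validation error).
import torch
import torch.nn as nn

force_net = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                 # predicted normal force in newtons
)

imgs = torch.randn(8, 3, 64, 64)      # batch of tactile images (placeholder)
forces = torch.rand(8, 1) * 20.0      # ground-truth forces (placeholder)
loss = nn.functional.l1_loss(force_net(imgs), forces)  # MAE objective
loss.backward()
```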
|
|
15:30-15:35, Paper TuCT19.7 | |
A Rigid-Flexible Coupled Bionic Robotic Finger with Perception Decoupling and Slip Detection Capabilities |
|
Xiang, Yuyaocen | Tsinghua Shenzhen International Graduate School |
Mao, Baijin | Tsinghua University |
Huang, Yedong | Tsinghua University |
Yuan, Qiangjing | Tsinghua University |
Qu, Juntian | Tsinghua University |
Keywords: Grippers and Other End-Effectors, In-Hand Manipulation, Perception for Grasping and Manipulation
Abstract: Human fingertips are densely distributed with sensory nerve endings, allowing them to perceive various physical characteristics, including pressure, roughness, etc. In this work, we develop a rigid-flexible coupled bionic robotic finger with perception decoupling and slip detection capabilities. In particular, slip perception is important in grasping operations: timely prediction of slippage and adjustment of gripping force can improve gripping stability. Fiber Bragg gratings (FBGs) are embedded within both the rigid skeleton and flexible shell of the bionic fingertip. The fibers within the flexible shell are capable of sensing slight pressure, while the optical fibers embedded in the rigid skeleton can measure temperature changes. Firstly, this paper introduces the principles of distributed fiber optic sensors and the morphological design of the bionic fingertip. Then, the fabrication process of the bionic fingertip is described. Finally, we verify the multimodal sensory capabilities of the bionic fingertip through a series of experiments. The results demonstrate that the bionic finger can successfully sense whether slip has occurred during the grasping process. In summary, this rigid-flexible bionic finger is expected to play a significant role in dexterous manipulation, fruit picking, and other applications.
|
|
15:35-15:40, Paper TuCT19.8 | |
SPARK Hand: Scooping-Pinching Adaptive Robotic Hand with Kempe Mechanism for Passive Grasp in Environmental Constraints |
|
Yin, Jiaqi | Harbin Institute of Technology |
Bi, TianYi | Southern University of Science and Technology |
Zhang, Wenzeng | Shenzhen X-Institute |
Keywords: Grippers and Other End-Effectors, Grasping, Mechanism Design
Abstract: This paper presents the SPARK Finger, an innovative passive adaptive robotic finger capable of executing both parallel pinching and scooping grasps. The SPARK Finger incorporates a multi-link mechanism with Kempe linkages to achieve a vertical linear fingertip trajectory. Furthermore, a parallelogram linkage ensures the fingertip maintains a fixed orientation relative to the base, facilitating precise and stable manipulation. By integrating these mechanisms with elastic elements, the design enables effective interaction with surfaces, such as tabletops, to handle challenging objects. The finger employs a passive switching mechanism that facilitates seamless transitions between pinching and scooping modes, adapting automatically to various object shapes and environmental constraints without additional actuators. To demonstrate its versatility, the SPARK Hand, equipped with two SPARK Fingers, has been developed. This system exhibits enhanced grasping performance and stability for objects of diverse sizes and shapes, particularly thin and flat objects that are traditionally challenging for conventional grippers. Experimental results validate the effectiveness of the SPARK design, highlighting its potential for robotic manipulation in constrained and dynamic environments.
|
|
TuCT20 |
210D |
Humanoid and Bipedal Locomotion 1 |
Regular Session |
|
15:00-15:05, Paper TuCT20.1 | |
DCM Modulation: A Three-Axis Rotation Stabilization Technique for Bipedal Locomotion Control |
|
Tazaki, Yuichi | Kobe University |
Keywords: Humanoid and Bipedal Locomotion, Body Balancing, Legged Robots
Abstract: This paper proposes a simple controller for bipedal locomotion that can stabilize three-axis (roll, pitch, and yaw) rotation without relying on ground reaction moment manipulation. Extra acceleration of the CoM (center-of-mass) from the nominal DCM (divergent component of motion) dynamics generates moment around the CoM. Based on this principle, the behavior of the desired DCM is modulated to carry signal for rotation stabilization. A robust walking controller is synthesized by combining the proposed rotation stabilizer with continuous step adaptation. Simulation study shows that the proposed controller is capable of robust disturbance rejection and yaw-regulated walking of a point-foot robot.
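The nominal DCM dynamics underlying this controller are standard and compact: xi = c + c_dot/omega, with xi_dot = omega*(xi - p) for ZMP location p, while the CoM c converges toward the DCM. The paper modulates the desired DCM to carry a rotation-stabilizing signal; the sketch below only integrates the nominal planar dynamics, and all numbers are illustrative assumptions.

```python
# Worked numeric sketch of nominal DCM (divergent component of motion) dynamics.
import numpy as np

omega = np.sqrt(9.81 / 0.8)            # LIP frequency for a 0.8 m CoM height
dt = 0.005
c = np.array([0.0, 0.0])               # CoM position
xi = np.array([0.02, 0.0])             # DCM, slightly offset from the CoM
p = np.array([0.05, 0.0])              # ZMP (stance foot) location

for _ in range(200):
    xi = xi + omega * (xi - p) * dt    # DCM is pushed away from the ZMP
    c = c + omega * (xi - c) * dt      # CoM chases the DCM: c_dot = omega*(xi - c)
    # (step adaptation would relocate p once the predicted DCM leaves a bound)

print(xi, c)
```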
|
|
15:05-15:10, Paper TuCT20.2 | |
Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies |
|
Chen, Zixuan | University of California San Diego |
He, Xialin | UIUC |
Wang, Yen-Jen | University of California, Berkeley |
Liao, Qiayuan | University of California, Berkeley |
Ze, Yanjie | Stanford University |
Li, Zhongyu | University of California, Berkeley |
Sastry, Shankar | University of California, Berkeley |
Wu, Jiajun | Stanford University |
Sreenath, Koushil | University of California, Berkeley |
Gupta, Saurabh | UIUC |
Peng, Xue Bin | Simon Fraser University |
Keywords: Humanoid and Bipedal Locomotion, Machine Learning for Robot Control
Abstract: Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and involve a large set of hyperparameters, they tend to require tedious manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated with automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP in both simulation and real-world humanoid robots, producing smooth and robust locomotion controllers.
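Since the abstract states that the Lipschitz constraint is implemented as a gradient penalty, a minimal version of that objective is easy to sketch. The policy size, penalty weight, and stand-in task loss below are illustrative assumptions, not the paper's training setup.

```python
# Sketch of a Lipschitz-style gradient penalty on a policy network: penalize
# the norm of the policy's input-output gradient, yielding a differentiable
# smoothness objective that composes with any autodiff RL framework.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(48, 256), nn.ELU(), nn.Linear(256, 12))
obs = torch.randn(64, 48, requires_grad=True)
act = policy(obs)

# Gradient of the summed actions w.r.t. observations approximates the Jacobian norm.
grad = torch.autograd.grad(act.sum(), obs, create_graph=True)[0]
grad_penalty = grad.norm(2, dim=-1).pow(2).mean()

task_loss = act.pow(2).mean()            # stand-in for the RL surrogate loss
loss = task_loss + 1e-3 * grad_penalty   # penalty weight is an assumption
loss.backward()
```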
|
|
15:10-15:15, Paper TuCT20.3 | |
Humanoid Whole-Body Locomotion on Narrow Terrain Via Dynamic Balance and Reinforcement Learning |
|
Xie, Weiji | Shanghai Jiao Tong University |
Bai, Chenjia | Institute of Artificial Intelligence (TeleAI), China Telecom |
Shi, Jiyuan | TeleAI |
Yang, Junkai | Shanghai Jiao Tong University |
Ge, Yunfei | TeleAI |
Zhang, Weinan | Shanghai Jiao Tong University |
Li, Xuelong | Northwestern Polytechnical University |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Humans possess delicate dynamic balance mechanisms that enable them to maintain stability across diverse terrains and under extreme conditions. However, despite significant recent advances, existing locomotion algorithms for humanoid robots still struggle to traverse extreme environments, especially in cases that lack external perception (e.g., vision or LiDAR). This is because current methods often rely on gait-based or perception-conditioned rewards, lacking effective mechanisms to handle unobservable obstacles and sudden balance loss. To address this challenge, we propose a novel whole-body locomotion algorithm based on dynamic balance and Reinforcement Learning (RL) that enables humanoid robots to traverse extreme terrains, particularly narrow pathways and unexpected obstacles, using only proprioception. Specifically, we introduce a dynamic balance mechanism by leveraging a novel Zero Moment Point (ZMP)-driven reward and task-driven rewards in a whole-body actor-critic framework, aiming to achieve coordinated actions of the upper and lower limbs for robust locomotion. Experiments conducted on a full-sized Unitree H1-2 robot verify the ability of our method to maintain balance on extremely narrow terrains and under external disturbances, demonstrating its effectiveness in enhancing the robot's adaptability to complex environments. The videos are given at https://whole-body-loco.github.io.
|
|
15:15-15:20, Paper TuCT20.4 | |
VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots |
|
Chen, Fu | Southeast University |
Wan, Rui | School of Automation, Southeast University |
Liu, Peidong | Southeast University |
Zheng, Nanxing | Southeast University |
Wang, Bingyi | Southeast University |
Zhou, Bo | Southeast University |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Visual Learning
Abstract: Bipedal robots, due to their anthropomorphic design, offer substantial potential across various applications, yet their control is hindered by the complexity of their structure. Currently, most research focuses on proprioception-based methods, which lack the capability to overcome complex terrain. While visual perception is vital for operation in human-centric environments, its integration complicates control further. Recent reinforcement learning (RL) approaches have shown promise in enhancing legged robot locomotion, particularly with proprioception-based methods. However, terrain adaptability, especially for bipedal robots, remains a significant challenge, with most research focusing on flat-terrain scenarios. In this paper, we introduce a novel mixture of experts teacher-student network RL strategy, which enhances the performance of teacher-student policies based on visual inputs through a simple yet effective approach. Our method combines terrain selection strategies with the teacher policy, resulting in superior performance compared to traditional models. Additionally, we introduce an alignment loss between the teacher and student networks, rather than enforcing strict similarity, to improve the student's ability to navigate diverse terrains. We validate our approach experimentally on the Limx Dynamic P1 bipedal robot, demonstrating its feasibility and robustness across multiple terrain types.
|
|
15:20-15:25, Paper TuCT20.5 | |
Achieving Precise and Reliable Locomotion with Differentiable Simulation-Based System Identification |
|
Kovalev, Vyacheslav | Moscow Institute of Physics and Technology |
Chaikovskaia, Ekaterina | Moscow Institute of Physics and Technology |
Davydenko, Egor | Moscow Institute of Physics and Technology |
Gorbachev, Roman | Moscow Institute of Physics and Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Reinforcement Learning
Abstract: Accurate system identification is crucial for reducing trajectory drift in bipedal locomotion, particularly in reinforcement learning and model-based control. In this paper, we present a novel control framework that integrates system identification into the reinforcement learning training loop using differentiable simulation. Unlike traditional approaches that rely on direct torque measurements, our method estimates system parameters using only trajectory data (positions, velocities) and control inputs. We leverage the differentiable simulator MuJoCo-XLA to optimize system parameters, ensuring that simulated robot behavior closely aligns with real-world motion. This framework enables scalable and flexible parameter optimization. It supports fundamental physical properties such as mass and inertia. Additionally, it handles complex system nonlinear behaviors, including advanced friction models, through neural network approximations. Experimental results show that our framework significantly improves trajectory following. It reduces rotational deviation by 75% and increases travel distance in the commanded direction by 46% compared to a baseline reinforcement learning method.
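The optimization pattern described above, fitting physical parameters by differentiating a trajectory-matching loss through rolled-out dynamics, can be shown on a toy 1-DoF system. The sketch below uses plain torch autodiff instead of MuJoCo-XLA, and the joint model and true parameter values are illustrative assumptions.

```python
# Toy differentiable-simulation system ID: recover mass and viscous friction of
# a forced, damped joint by backpropagating through the rollout.
import torch

def rollout(mass, friction, u, dt=0.01):
    q, qd, traj = torch.tensor(0.0), torch.tensor(0.0), []
    for ut in u:
        qdd = (ut - friction * qd) / mass   # simple forced, damped joint model
        qd = qd + qdd * dt
        q = q + qd * dt
        traj.append(q)
    return torch.stack(traj)

u = torch.sin(torch.linspace(0, 6.28, 300))                 # control inputs
target = rollout(torch.tensor(2.0), torch.tensor(0.5), u)   # "real" trajectory

mass = torch.tensor(1.0, requires_grad=True)
friction = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam([mass, friction], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((rollout(mass, friction, u) - target) ** 2).mean()
    loss.backward()                       # gradients flow through the rollout
    opt.step()
print(mass.item(), friction.item())       # should approach 2.0 and 0.5
```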
|
|
15:25-15:30, Paper TuCT20.6 | |
Fast Autolearning for Multimodal Walking in Humanoid Robots with Variability of Experience |
|
Figueroa, Nícolas F. | University Montpellier, Pontificia Universidad Católica Del Perú |
Tafur, Julio C. | Pontificia Universidad Católica Del Perú |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, AI-Based Methods
Abstract: Recent advancements in reinforcement learning (RL) and humanoid robotics are rapidly addressing the challenge of adapting to complex, dynamic environments in real time. This letter introduces a novel approach that integrates two key concepts: experience variability (a criterion for detecting changes in loco-manipulation) and experience accumulation (an efficient method for storing acquired experiences based on a selection criterion). These elements are incorporated into the development of RL agents and humanoid robots, with an emphasis on stability. This focus enhances adaptability and efficiency in unpredictable environments. Our approach enables more sophisticated modeling of such environments, significantly improving the system's ability to adapt to real-world complexities. By combining this method with advanced RL techniques, such as Proximal Policy Optimization (PPO) and Model-Agnostic Meta-Learning (MAML), and incorporating self-learning driven by stability, we improve the system's generalization capabilities. This facilitates rapid learning from novel and previously unseen scenarios. We validate our algorithm through both simulations and real-world experiments on the HRP-4 humanoid robot, utilizing an intrinsically stable model predictive controller.
|
|
15:30-15:35, Paper TuCT20.7 | |
Stabilizing Humanoid Robot Trajectory Generation Via Physics-Informed Learning and Control-Informed Steering |
|
D'Elia, Evelyn | Italian Institute of Technology |
Viceconte, Paolo Maria | Lab0 SRL |
Rapetti, Lorenzo | IIT |
Ferigo, Diego | Robotics and AI Institute |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Camoriano, Raffaello | Politecnico Di Torino |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Imitation Learning, Deep Learning Methods
Abstract: Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
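The second prong, applying a proportional-integral controller directly to the generated output state at inference time, can be sketched generically. The drift model, gains, and state dimension below are assumptions for illustration only.

```python
# Minimal sketch of control-informed steering: a PI correction applied to an
# autoregressively generated state suppresses accumulated drift.
import numpy as np

kp, ki, dt = 0.8, 0.2, 0.02
integral = np.zeros(3)

def generator_step(state):
    # placeholder for the learned generator's next-state prediction,
    # given a small systematic drift to correct
    return state + np.array([0.01, 0.0, 0.001])

state = np.zeros(3)
reference = np.zeros(3)                            # desired base state
for _ in range(500):
    state = generator_step(state)
    error = reference - state
    integral += error * dt
    state = state + kp * error + ki * integral     # PI steering at inference
print(state)   # remains bounded instead of drifting away
```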
|
|
15:35-15:40, Paper TuCT20.8 | |
Concerted Control: Modulating Joint Compliance Using GRF for Gait Generation at Different Speeds |
|
Koseki, Shunsuke | Tohoku University |
Mohseni, Omid | Technische Universität Darmstadt |
Owaki, Dai | Tohoku University |
Hayashibe, Mitsuhiro | Tohoku University |
Seyfarth, Andre | TU Darmstadt |
Ahmad Sharbafi, Maziar | Technical University of Darmstadt |
Keywords: Humanoid and Bipedal Locomotion, Modeling and Simulating Humans, Legged Robots
Abstract: This paper proposes a bio-inspired, simple, and easy-to-implement walking controller, termed Concerted Control, which leverages a shared common signal to coordinate movements across multiple joints without requiring predefined trajectories. The backbone of this controller is our previously developed Force Modulated Compliance (FMC) control concept, which modulates joint stiffness using ground reaction force (GRF). In Concerted Control, FMC is implemented across multiple joints, allowing implicit coordination through the shared GRF signal in absence of any centralized controller. We tested the performance of this controller on a simulated bipedal walker model and demonstrated that Concerted Control can generate human-like walking gaits across a wide range of speeds, from 0.7 to 1.8 m/s. Additionally, we assessed the robustness of these gaits against external angular momentum perturbations, and the results showed a high level of robustness. Concerted Control offers a promising approach for enhancing the control of bipedal robots and assistive systems.
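The FMC idea, joint stiffness scaled by the shared GRF signal, reduces to a compact torque law. The sketch below is a schematic reading of that concept; the gains, rest angles, and exact functional form are assumptions rather than the paper's tuned controller.

```python
# Sketch of Force Modulated Compliance (FMC): each joint follows a compliant
# torque law whose stiffness scales with the measured ground reaction force,
# so multiple joints coordinate implicitly through the one shared signal.
import numpy as np

def fmc_torque(q, q_rest, grf_normal, c_gain, d_gain, q_dot):
    stiffness = c_gain * grf_normal            # stiffness grows with normal GRF
    return -stiffness * (q - q_rest) - d_gain * q_dot

q = np.array([0.3, -0.6, 0.2])                 # hip, knee, ankle angles (rad)
q_rest = np.array([0.2, -0.5, 0.0])            # rest posture (assumed)
q_dot = np.zeros(3)
grf = 420.0                                    # N, from a foot force sensor
tau = fmc_torque(q, q_rest, grf,
                 c_gain=np.array([0.004, 0.006, 0.003]), d_gain=0.5, q_dot=q_dot)
print(tau)
```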
|
|
TuCT21 |
101 |
Optimization and Optimal Control 3 |
Regular Session |
|
15:00-15:05, Paper TuCT21.1 | |
Simultaneous System Identification and Model Predictive Control with No Dynamic Regret |
|
Zhou, Hongyu | University of Michigan |
Tzoumas, Vasileios | University of Michigan, Ann Arbor |
Keywords: Optimization and Optimal Control, Learning and Adaptive Systems, Model Learning for Control, Online Learning and Regret Minimization
Abstract: We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics/disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics/disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics/disturbances. The model is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in hardware experiments and physics-based simulations.
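The random-Fourier-feature approximation of the unknown dynamics, followed by a least-squares update from collected data, can be sketched as follows. The dimensions, kernel bandwidth, and synthetic disturbance are illustrative assumptions, not the paper's setup.

```python
# Sketch: approximate an unknown RKHS residual with random Fourier features
# (RFF) and fit the linear coefficients online by least squares.
import numpy as np

rng = np.random.default_rng(1)
d_in, n_feat, bandwidth = 4, 200, 1.0

# Features approximating an RBF kernel: z(x) = sqrt(2/D) * cos(Wx + b)
W = rng.normal(0.0, 1.0 / bandwidth, size=(n_feat, d_in))
b = rng.uniform(0.0, 2 * np.pi, size=n_feat)
def rff(x):
    return np.sqrt(2.0 / n_feat) * np.cos(W @ x + b)

# (state-input, residual) pairs collected while controlling the system
X = rng.normal(size=(300, d_in))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]            # unknown disturbance (synthetic)

Z = np.stack([rff(x) for x in X])
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares model update

x_test = rng.normal(size=d_in)
print(rff(x_test) @ theta, np.sin(x_test[0]) + 0.1 * x_test[1])
```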
|
|
15:05-15:10, Paper TuCT21.2 | |
Antidisturbance Distributed Lyapunov-Based Model Predictive Control for Quadruped Robot Formation Tracking (I) |
|
Nie, Yingxuan | Fudan University |
Li, Xiang | Fudan |
Keywords: Optimization and Optimal Control, Multi-Robot Systems
Abstract: This paper proposes a novel distributed Lyapunov-based model predictive control (DLMPC) algorithm for a team of autonomous quadruped robots (AQRs) to achieve formation tracking. We characterize the motion of AQR members operating in disturbed environments with limited inputs. Within the local framework, an antidisturbance strategy is designed to improve control performance, in which an observer assists in state prediction and auxiliary controller design. We explicitly consider the dynamic balance and velocity limits of the AQRs to enhance trajectory feasibility. By predicting the behavior of neighbors, a penalty on inter-AQR collisions with respect to buffer zones is introduced. We analytically address closed-loop robust stability through a worst-case contraction design. Real-world hardware experiments on the distributed control system demonstrate the superior real-time control performance and robustness of our algorithm.
|
|
15:10-15:15, Paper TuCT21.3 | |
A Parallel Fuzzy Nonlinear ADRC Framework for Robotic Machining with Vibrations |
|
Hao, Xueqi | School of Electrical and Information Engineering, Hunan University |
Gao, Changqing | National Engineering Research Center for Robot Visual Perception |
Fang, Qiu | Hunan University |
Zheng, Yan | Hunan University |
Wang, Yaonan | Hunan University |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Industrial Robots
Abstract: This paper presents a Parallel Fuzzy Nonlinear Active Disturbance Rejection Control (FNLADRC) strategy to enhance the precision and robustness of robotic manipulators in machining large, complex components. By decoupling the multi-degree-of-freedom manipulator dynamics and integrating Nonlinear Active Disturbance Rejection Control (NLADRC) with fuzzy logic, the proposed framework adaptively estimates and compensates for disturbances in real time, effectively addressing challenges such as machining vibrations and trajectory-tracking errors. A parallelized control architecture is designed to optimize multi-joint coordination, improving system adaptability and response speed. Furthermore, a fuzzy logic-based adaptive tuning mechanism dynamically adjusts control parameters, significantly enhancing system robustness. Simulation experiments using a UR5 robotic arm validate the proposed method’s superior performance in dynamic and uncertain environments. These results establish FNLADRC as a promising control solution for high-precision robotic machining applications.
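At the core of any ADRC scheme is an extended state observer (ESO) that estimates the "total disturbance" as an extra state and cancels it in the control law. The linear-ESO sketch below illustrates that mechanism only; the gains, plant, and input gain b0 are assumptions, not the paper's fuzzy nonlinear design.

```python
# Sketch of a linear extended state observer (ESO) for one joint: z1, z2 track
# position and velocity, z3 tracks the lumped disturbance, and the control law
# cancels it via u = (u0 - z3) / b0.
import numpy as np

dt, b0 = 0.001, 10.0
beta1, beta2, beta3 = 300.0, 3e4, 1e6          # linear ESO gains (assumed)
z1 = z2 = z3 = 0.0

y, yd = 0.0, 0.0                               # true joint position / velocity
for k in range(5000):
    u = (10.0 * (0.5 - z1) - 5.0 * z2 - z3) / b0   # PD on estimates + cancellation
    dist = 2.0 * np.sin(2 * np.pi * k * dt)        # unknown disturbance (synthetic)
    ydd = b0 * u + dist                            # plant the ESO must estimate
    yd += ydd * dt
    y += yd * dt
    e = z1 - y                                     # ESO update
    z1 += (z2 - beta1 * e) * dt
    z2 += (z3 + b0 * u - beta2 * e) * dt
    z3 += (-beta3 * e) * dt
print(y, z3, dist)   # y near the 0.5 rad setpoint; z3 tracks the disturbance
```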
|
|
15:15-15:20, Paper TuCT21.4 | |
Robust Model Predictive Control for Quadruped Locomotion under Model Uncertainties and External Disturbances |
|
Xia, Weipeng | The Chinese University of Hong Kong |
Yue, Linzhu | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Optimization and Optimal Control, Legged Robots, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Model Predictive Control (MPC) enables agile and robust locomotion in quadruped robots but is sensitive to model uncertainties and environmental variations. This paper presents a Tube-based Robust MPC (TR-MPC) framework for quadruped locomotion under uncertainties, modeled as parameter mismatches and additive disturbances. TR-MPC constructs an Invariant Ellipsoid to bound errors induced by uncertainties, ensuring convergent error trajectories. A Semi-Definite Programming (SDP) problem with Linear Matrix Inequality (LMI) constraints is solved offline to minimize the ellipsoid size, while a linear feedback term stabilizes error dynamics, guaranteeing stability within uncertainty bounds. Simulations and experiments demonstrate TR-MPC’s robustness: the robot achieves stable trotting under a 14 kg load (123% of its weight) and recovers from a 1.4 m/s impact while carrying 10 kg (88% of its weight). This framework significantly enhances robustness in dynamic and uncertain environments.
|
|
15:20-15:25, Paper TuCT21.5 | |
Precision Autonomous Landing of UAV on High-Speed Vehicles Based on Enhanced Gimbal Stabilization and Smooth Trajectory Generation |
|
Chen, Baijian | Beijing Institute of Technology |
Song, Tao | Beijing Institute of Technology |
Ye, Jianchuan | Beijing Institute of Technology |
Jiang, Tao | Chongqing University |
Jia, Kaixuan | Beijing Institute of Technology |
Hu, Kaikun | Beijing Institute of Technology |
Keywords: Optimization and Optimal Control, Motion Control, Visual Tracking
Abstract: This paper proposes a precision autonomous landing system for unmanned aerial vehicles (UAVs) targeting high-speed moving platforms. By integrating gimbal-based precise positioning, smooth trajectory generation, and dynamically robust control, the system addresses key challenges in high-speed landing scenarios, such as significant visual localization deviations and difficulties in dynamic trajectory planning and control. The study introduces the Comprehensive Coordinate System (CCS-3AG) to eliminate dynamic optical-axis misalignment errors in the gimbal, thereby enhancing the gimbal's ranging accuracy and control precision. We combine an enhanced single-stage minimum control (MINCO) trajectory framework (L-MINCO) with a bidirectional command update strategy to achieve fast and accurate trajectory planning that accounts for dynamic delays, and design an Incremental Nonlinear Dynamic Inversion (INDI) controller for high-dynamic command tracking. Simulation and real-flight experiments demonstrate that, at target speeds between 0 and 7.7 m/s, the system attains an average landing precision of 0.108 meters, with a success rate of 97.78% across 90 actual landing tests, outperforming existing landing methods. This work provides a highly robust solution for UAV logistics delivery and emergency landing scenarios.
|
|
15:25-15:30, Paper TuCT21.6 | |
Data-Driven Control for Magnetic Actuation Capsule: Dynamic Compensation and Input Constraints (I) |
|
Chen, Peng | Zhejiang University of Technology |
He, Xiongxiong | Zhejiang University of Technology |
Chen, Qiang | Zhejiang University of Technology |
Chen, Jiashu | Zhejiang Polytechnic University of Mechanical and Electrical Engineering |
Jiang, Qianru | Zhejiang University of Technology |
Li, Sheng | Zhejiang University of Technology |
Keywords: Medical Robots and Systems, Optimization and Optimal Control, Sensor-based Control
Abstract: In the magnetic actuation system, MAC (Magnetically Actuated Capsule) motion is disturbed by gastrointestinal resistance, and dynamic constraints exist between the MAC and EPM (External Permanent Magnet), leading to control difficulties. This paper presents a data-driven control method for the magnetic actuation system. Firstly, a data-driven modeling approach is proposed to address the modeling challenges, relying solely on MAC visual positioning input and EPM position output data. Secondly, to effectively track rapidly changing MAC desired trajectories, the upper bound of system inputs is determined through analysis of the magnetic field relationship and integrated with adaptive parameter reset conditions, establishing dynamic constraints for the MAC system. Finally, dynamic compensation is applied to account for nonlinear resistance terms and inaccuracies in the data model representation. A dual-visual positioning experimental platform simulating the gastrointestinal environment is established to validate the proposed algorithm's effectiveness in MAC trajectory tracking under different conditions.
|
|
TuCT22 |
102A |
Robotics and Automation in Construction |
Regular Session |
|
15:00-15:05, Paper TuCT22.1 | |
Real-Time Excavation Trajectory Modulation for Slip and Rollover Prevention |
|
Kim, ChangU | Seoul National University |
Son, Bukun | Seoul National University |
Lee, Minhyeong | Seoul National University |
Choi, Hyelim | Seoul National University |
Hong, Seokhyun | Seoul National University |
Kang, Minsung | HD Hyundai XiteSolution |
Moon, Ji Hyun | HD Hyundai XiteSolution |
Kim, Dongmok | Hyundai Doosan Infracore |
Lee, Dongjun | Seoul National University |
Keywords: Robotics and Automation in Construction, Motion and Path Planning, Deep Learning Methods
Abstract: We propose a novel real-time excavation trajectory modulation framework on a slope for an autonomous excavator with low-level digital kinematic control, as is common for hydraulic industrial excavators. Excavation on a slope is challenging because of a higher risk of slips and rollovers. To deal with this, we propose a real-time excavation trajectory modulation framework based on the slope tangential/normal force ratio and zero moment point. The slip and rollover prevention conditions are incorporated in a single linear inequality using the same fractional structure in the slope tangential/normal force ratio and zero moment point with the common denominator. However, due to the adoption of the low-level digital kinematic control, this prevention requires the prediction of the excavation force at the next timestamp, and, for this, we develop a data-driven excavation force difference prediction model utilizing a deep learning architecture, the Transformer. The remaining error of this prediction is then addressed by using the technique of robust optimization with box uncertainty of the developed excavation force difference model. Our proposed framework is validated experimentally with our customized scaled-down excavator.
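Although the paper's exact inequality is not reproduced here, the way the slip and rollover checks share a normal-force denominator can be illustrated with a simple feasibility test. All geometry and numbers below are assumptions for illustration.

```python
# Illustrative combined slip/rollover feasibility check: both conditions divide
# by the same normal force, so each reduces to a linear inequality in the
# predicted excavation reaction (e.g., from the Transformer force model).
import numpy as np

def safe_to_excavate(f_tangential, f_normal, moment_about_edge, mu, zmp_limit):
    if f_normal <= 0.0:
        return False
    slip_ok = abs(f_tangential) <= mu * f_normal     # friction-cone (slip) test
    zmp = moment_about_edge / f_normal               # ZMP about the support edge
    rollover_ok = abs(zmp) <= zmp_limit              # rollover test
    return slip_ok and rollover_ok

print(safe_to_excavate(f_tangential=8.0e3, f_normal=2.5e4,
                       moment_about_edge=4.0e3, mu=0.6, zmp_limit=0.9))
```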
|
|
15:05-15:10, Paper TuCT22.2 | |
Predictive Energy Stability Margin: Prediction of Heavy Machine Overturning Considering Rotation and Translation |
|
Kamezaki, Mitsuhiro | The University of Tokyo |
Kokudo, Yuya | Waseda University |
Uehara, Yusuke | Waseda University |
Itano, Shunya | Waseda University |
Iida, Tatsuhiro | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Robotics and Automation in Construction, Field Robots, Search and Rescue Robots
Abstract: Fatal accidents caused by the overturning of heavy machines still happen, so the prediction and prevention of overturns are urgently needed. Indicators to evaluate overturn, such as the energy stability margin (ESM), have been proposed but are limited to a non-slip ground surface. Even if ESM is above zero, the machine may overturn due to additional manipulator operation or hitting an obstacle while sliding down a slope. This study thus proposes a predictive energy stability margin, p-ESM, that focuses on kinetic energy in the translational and rotational directions for overturn prediction. Rotational kinetic energy E_R accelerates overturning, and the translational kinetic energy E_T in the slope direction is converted to E_R. Both are calculated from the mass and the position and acceleration of the center of gravity (COG) for each part of the machine. ESM U is defined as the difference between the height of COG just before overturn and the current height of the COG. Thus, p-ESM is defined as U minus the sum of E_R and E_T. We also developed an operation support system to limit the manipulator operation by using p-ESM. The results of experiments using a hydraulically driven scale model (1/14) with different combinations of operations, loading weights, and ground surfaces confirmed that p-ESM can predict overturns early and accurately, which conventional ESM cannot do. We also found that the support system using p-ESM can prevent inappropriate operations and avoid overturns.
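The indicator itself is a one-line energy balance, p-ESM = U − (E_R + E_T), which the sketch below transcribes directly; the machine parameters are illustrative assumptions.

```python
# Direct transcription of the paper's indicator: p-ESM = U - (E_R + E_T),
# where U is the classical energy stability margin and E_R, E_T are rotational
# and slope-direction translational kinetic energies.
import numpy as np

def p_esm(m, g, h_overturn, h_cog, inertia, omega, v_slope):
    U = m * g * (h_overturn - h_cog)     # ESM: energy needed to reach tipping
    E_R = 0.5 * inertia * omega**2       # rotational kinetic energy
    E_T = 0.5 * m * v_slope**2           # translational KE along the slope
    return U - (E_R + E_T)               # overturn predicted if this drops below 0

print(p_esm(m=1.2e4, g=9.81, h_overturn=1.9, h_cog=1.6,
            inertia=3.0e4, omega=0.15, v_slope=0.4))
```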
|
|
15:10-15:15, Paper TuCT22.3 | |
Dynamic Planning and Assembly for Constructing Mortar-Joint Multi-Leaf Stone Masonry Walls with a Robotic Arm |
|
Wang, Qianqing | EPFL |
Wang, Jingwen | EPFL |
Pantoja Rosero, Bryan German | University of Texas at Austin |
Beyer, Katrin | EPFL |
Parascho, Stefana | EPFL |
Keywords: Robotics and Automation in Construction, Building Automation
Abstract: The construction industry faces growing challenges in sustainability, labor shortages, and efficiency. Stone masonry, known for its durability and low environmental impact, has declined in modern construction due to high labor costs and slow building processes. This work presents a robotic system for automating the construction of mortar-joint, multi-leaf stone masonry walls. Our approach integrates stone layout optimization, sequence planning, vacuum-based grasping, automated pick-and-place motion, and vision- and sensor-guided trajectory correction. A digital twin system provides real-time feedback to improve accuracy and adaptability. To evaluate our method, we construct a 700 × 700 × 400 mm³ masonry wall and compare it to those built by skilled masons. The final structure demonstrates comparable strength, stone interlocking, and stone filling to manually built walls. As the first robotic approach to mortar-joint multi-leaf masonry construction, this work addresses key challenges such as robotic manipulation and dense packing of irregular stones with wet mortar. Our findings contribute to advancing robotic construction with natural materials and offer a scalable framework for future architectural applications.
|
|
15:15-15:20, Paper TuCT22.4 | |
Reinforcement Learning-Based Autonomous Control Methodology of Hydraulic Excavators |
|
Helian, Bobo | Karlsruhe Institute of Technology |
Liu, Xiyang | Karlsruhe Institute of Technology |
Liu, Zichen | Karlsruhe Institute of Technology |
Geimer, Marcus | Karlsruhe Institute of Technology |
Keywords: Robotics and Automation in Construction, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Autonomous hydraulic excavators are in high demand for tasks in industries, where operations often occur in harsh and unpredictable environments. Achieving autonomy requires advanced control strategies that account for system constraints, nonlinear hydraulic systems, and environmental interactions. This study proposes a reinforcement learning (RL)-based methodology to perform a complete excavation cycle by controlling proportional valves. A comprehensive joint simulation tool is developed, in which a hydraulic system model is detailed based on a real machine, and it is integrated with an excavator mechanism and working environment to create a realistic interaction environment for RL training. The RL agent, trained using Proximal Policy Optimization (PPO), incorporates a customized reward shaping method that ensures operational safety and accuracy, considering constraints such as pump flow saturation and geometric constraints. In addition, an Adaptive Control Frequency (ACF) method is developed to enhance training efficiency by dynamically adjusting the control frequency based on task complexity. Comparative validations demonstrate the RL agent’s ability to successfully complete a full excavation cycle, satisfy operational constraints, and generalize across varying initial conditions and valve responses. Furthermore, the controller operates effectively in a soil environment despite being trained without soil, demonstrating robustness to uncertain, time-varying loads.
|
|
15:20-15:25, Paper TuCT22.5 | |
Safety-Aware Optimal Scheduling for Autonomous Masonry Construction Using Collaborative Heterogeneous Aerial Robots |
|
Stamatopoulos, Marios-Nektarios | Luleå University of Technology |
Velhal, Shridhar | Lulea Technical University |
Banerjee, Avijit | Luleå University of Technology |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Robotics and Automation in Construction
Abstract: This paper presents a novel high-level task planning and optimal coordination framework for autonomous masonry construction using a team of heterogeneous aerial robotic workers, consisting of agents with separate skills for brick placement and mortar application. This introduces new challenges in scheduling and coordination, particularly due to the mortar curing deadline required for structural bonding and ensuring the safety constraints among UAVs operating in parallel. To address this, an automated pipeline generates the wall construction plan based on the available bricks while identifying static structural dependencies and potential conflicts for safe operation. The proposed framework optimizes UAV task allocation and execution timing by incorporating dynamically coupled precedence deadline constraints that account for the curing process and static structural dependency constraints, while enforcing spatio-temporal constraints to prevent collisions and ensure safety. The primary objective of the scheduler is to minimize the overall construction makespan while minimizing logistics, traveling time between tasks, and the curing time to maintain both adhesion quality and safe workspace separation. The effectiveness of the proposed method in achieving coordinated and time-efficient aerial masonry construction is extensively validated through Gazebo simulated missions. The results demonstrate the framework’s capability to streamline UAV operations, ensuring both structural integrity and safety during the construction process.
|
|
15:25-15:30, Paper TuCT22.6 | |
Human-Inspired Planning and Control of Shotcrete Robots Based on Dynamical Systems Mapping |
|
Wu, Rui | École Polytechnique Fédérale De Lausanne |
Gholami, Soheil | École Polytechnique Fédérale De Lausanne (EPFL) |
Bonato, Tristan | École Polytechnique Fédérale De Lausanne, Learning Algorithms An |
Billard, Aude | EPFL |
Keywords: Robotics and Automation in Construction, Learning from Demonstration, Motion and Path Planning
Abstract: Performing shotcrete operations at construction sites can be hazardous to humans and inefficient. Robots can offer a safer and more efficient alternative to assist in these tasks. We present a new planning strategy for shotcrete robots, including both the spraying and surface finishing phases, that can plan for a general target area, whether flat or complexly curved. Our method uses learning from demonstrations and dynamical systems concepts to enable reactive and adaptive planning for robots, allowing them to effectively handle disturbances. We evaluated the effectiveness of the proposed planning and control framework in a laboratory setup using a velocity-controlled robot and curved targets both in the spraying and polishing phases. The results demonstrate the effectiveness of the proposed approach.
|
|
15:30-15:35, Paper TuCT22.7 | |
Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution (I) |
|
Külz, Jonathan | Technical University of Munich |
Terzer, Michael | Fraunhofer Italia Research Scarl |
Magri, Marco | Fraunhofer Italia |
Giusti, Andrea | Fraunhofer Italia Research |
Althoff, Matthias | Technische Universität München |
Keywords: Robotics and Automation in Construction, Cellular and Modular Robots, Optimization and Optimal Control
Abstract: In situ robotic automation in construction is challenging due to constantly changing environments, a shortage of robotic experts, and a lack of standardized frameworks bridging robotics and construction practices. This work proposes a holistic framework for construction task specification, optimization of robot morphology, and mission execution using a mobile modular reconfigurable robot. Users can specify and monitor the desired robot behavior through a graphical interface. In contrast to existing, monolithic solutions, we automatically identify a new task-tailored robot for every task by integrating Building Information Modeling (BIM). Our framework leverages modular robot components that enable the fast adaption of robot hardware to the specific demands of the construction task. Other than previous works on modular robot optimization, we consider multiple competing objectives, which allow us to explicitly model the challenges of real-world transfer, such as calibration errors. We demonstrate our framework in simulation by optimizing robots for drilling and spray painting. Finally, experimental validation demonstrates that our approach robustly enables the autonomous execution of robotic drilling.
|
|
15:35-15:40, Paper TuCT22.8 | |
Stepping Locomotion for a Walking Excavator Robot Using Hierarchical Reinforcement Learning and Action Masking |
|
Babu, Ajish | German Research Center for Artificial Intelligence (DFKI) |
Kirchner, Frank | University of Bremen |
Keywords: Robotics in Hazardous Fields, Reinforcement Learning, Motion Control
Abstract: The employment of walking excavator robots, endowed with hybrid locomotion capabilities, holds considerable promise in facilitating the execution of intricate tasks in challenging terrain environments. A critical skill for such a system pertains to traversing obstacles through stepping locomotion, a process entailing the momentary disengagement of the end-effectors from the ground. Existing solutions are encumbered by two significant limitations. First, they are often too cumbersome to develop and implement due to the complexity of the problem formulations. Second, they present restrictions on the available avenues to influence the behavior, hindering the effective leveraging of domain knowledge to achieve the intended objective. This research proposes an alternative approach to learning the stepping locomotion. The proposed method employs a hierarchical reinforcement learning strategy, wherein the complex control task is decomposed into multiple subtasks, each aligned with a sub-objective defined as a reward function. The training of these subtasks is conducted individually, starting from the lowest level and progressing to the higher levels, predominantly utilizing deep reinforcement learning. Additionally, the masking of invalid actions is utilized to guide the controller during training, offering enhanced opportunities to influence behavior while using only simple formulations. Notably, the proposed approach has been successfully trained for three distinct stepping scenarios: obstacle, step, and gap, underscoring the versatility of the controller. The design of the controller, along with the results of training and evaluation in simulation, is presented herein.
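The invalid-action masking mentioned above is a small, standard mechanism: logits of invalid actions are suppressed before sampling, so the policy can only choose valid steps. The sketch below shows the mechanism in isolation; the action set and mask source are illustrative assumptions.

```python
# Sketch of invalid-action masking for a discrete policy head.
import torch

logits = torch.randn(1, 6)                     # policy logits over 6 actions
valid = torch.tensor([[1, 1, 0, 1, 0, 1]], dtype=torch.bool)  # from domain rules

masked_logits = logits.masked_fill(~valid, float("-inf"))
dist = torch.distributions.Categorical(logits=masked_logits)
action = dist.sample()                          # guaranteed to be a valid action
print(action.item(), dist.probs)                # invalid actions get probability 0
```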
|
|
TuCT23 |
102B |
Sensor Fusion 3 |
Regular Session |
|
15:00-15:05, Paper TuCT23.1 | |
Apple Detection Method Based on Fusion of Infrared Thermal Image and Visible-Light Image |
|
Li, Yuanchen | Xi'an Jiaotong University |
Wu, Zhichao | Xi'an Jiaotong University |
Jiang, Siqi | Xi'an Jiaotong University |
Dong, Xia | Xi'an Jiaotong University |
Xu, Haibo | School of Mechanical Engineering, Xi'an Jiaotong University |
Wang, Kedian | Xi'an Jiaotong University |
Keywords: Sensor Fusion, Recognition, Deep Learning Methods
Abstract: To address the challenges posed by illumination variation and fruit occlusion in open orchard environments, which significantly affect the performance of apple-picking robots, this study proposes a detection method based on the fusion of infrared thermal images and visible-light images. First, an edge-feature-based registration technique is used to align the infrared and visible-light images. Subsequently, an improved YOLOv8s model integrated with the SeAFusion framework is employed to enable efficient apple detection. Experimental results show that the proposed method achieves an average precision of 94.9% and an average recall of 89.5% across various illumination scenarios (normal/strong/backlight), surpassing visible-light-only detection in apple-counting accuracy by 0.6%, 4.8%, and 3.5% under the respective conditions. The proposed method establishes a robust framework for vision-based harvesting robots, significantly improving detection in complex orchard environments and providing a technical foundation for scalable agricultural automation.
|
|
15:05-15:10, Paper TuCT23.2 | |
CHADET: Cross-Hierarchical-Attention for Depth-Completion Using Unsupervised Lightweight Transformer |
|
Marsim, Kevin Christiansen | KAIST |
Jeong, Myeongwoo | KAIST |
Kim, Yeeun | KAIST |
Jeon, Jinwoo | KAIST |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Keywords: Sensor Fusion, RGB-D Perception, Deep Learning for Visual Perception
Abstract: Depth information, which specifies the distance between objects and the current position of the robot, is essential for many robot tasks such as navigation. Recently, researchers have proposed depth completion frameworks to provide dense depth maps that offer comprehensive information about the surrounding environment. However, existing methods show significant trade-offs between computational efficiency and accuracy during inference. The substantial memory and computational requirements make them unsuitable for real-time applications, highlighting the need to improve the completeness and accuracy of depth information while improving processing speed to enhance robot performance in various tasks. To address these challenges, in this paper, we propose CHADET (cross-hierarchical-attention depth-completion transformer), a lightweight depth-completion network that can generate accurate dense depth maps from RGB images and sparse depth points. For each pair, its feature is extracted from the depthwise blocks and passed to the equally lightweight transformer-based decoder. In the decoder, we utilize the novel cross-hierarchical-attention module that refines the image features from the depth information. Our approach improves the quality and reduces memory usage of the depth map prediction, as validated on the KITTI, NYUv2, and VOID datasets.
|
|
15:10-15:15, Paper TuCT23.3 | |
ELMAR: Enhancing LiDAR Detection with 4D Radar Motion Awareness and Cross-Modal Uncertainty |
|
Peng, Xiangyuan | Infineon Technologies AG / Technical University of Munich |
Tang, Miao | China University of Geosciences (Wuhan) |
Sun, Huawei | Technical University of Munich; Infineon Technologies AG |
Bierzynski, Kay | Infineon Technologies AG |
Servadei, Lorenzo | Technical University of Munich |
Wille, Robert | Technical University of Munich |
Keywords: Sensor Fusion, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: LiDAR and 4D radar are widely used in autonomous driving and robotics. While LiDAR provides rich spatial information, 4D radar offers velocity measurement and remains robust under adverse conditions. As a result, increasing studies have focused on the 4D radar-LiDAR fusion method to enhance the perception. However, the misalignment between different modalities is often overlooked. To address this challenge and leverage the strengths of both modalities, we propose a LiDAR detection framework enhanced by 4D radar motion status and cross-modal uncertainty. The object movement information from 4D radar is first captured using a Dynamic Motion-Aware Encoding module during feature extraction to enhance 4D radar predictions. Subsequently, the instance-wise uncertainties of bounding boxes are estimated to mitigate the cross-modal misalignment and refine the final LiDAR predictions. Extensive experiments on the View-of-Delft (VoD) dataset highlight the effectiveness of our method, achieving state-of-the-art performance with the mAP of 74.89% in the entire area and 88.70% within the driving corridor while maintaining a real-time inference speed of 30.02 FPS.
|
|
15:15-15:20, Paper TuCT23.4 | |
A New Unsupervised Infrared and Visible Image Fusion Method Based on Salient Object Segmentation under Poor Illumination |
|
Wang, Zheng | Zhejiang University |
Ji, Haifeng | Zhejiang University |
Wang, Baoliang | Zhejiang University |
Huang, Zhiyao | Zhejiang University |
Keywords: Sensor Fusion, Object Detection, Segmentation and Categorization
Abstract: This work aims to propose a new unsupervised infrared and visible image fusion method based on salient object segmentation, which can obtain a fused image with more information on the salient object and realize salient object segmentation under poor illumination. The new method can be divided into four steps: (1) A new superpixel segmentation method based on simple linear iterative clustering (SLIC) with K-means subdivision is used to initially process the infrared and visible images, which has better superpixel segmentation quality. (2) A new improved Density Peaks Clustering (DPC) based on superpixels is used to realize the salient object segmentation of the infrared image, improved to automatically select the cluster centers with lower computation cost. (3) A new GrabCut strategy using the eroded and dilated salient object regions of the infrared image to predetermine the foreground and background respectively is used to achieve the salient object segmentation of the visible image, which can be fully automatic with better salient object segmentation quality. (4) An image fusion strategy is used to realize the final image fusion, which treats the salient object region and background separately. Experiments were carried out under different poor-illumination scenes in the real world. The experimental results show that the new infrared and visible image fusion method is successful, with Q^(AB/F) greater than 0.69. In addition, the provided superpixel segmentation method, salient object segmentation method, and new GrabCut strategy are also effective. The research results provide an effective infrared and visible image fusion approach and three useful methods, which can serve as a good reference for researchers. Moreover, the research work reveals the application potential of DPC in image fusion and salient object segmentation, and broadens the application fields of DPC.
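Step (3)'s mask-initialized GrabCut can be illustrated directly with OpenCV: the eroded salient region seeds sure foreground, the complement of the dilated region seeds sure background, and the band in between is left probable. The synthetic image and structuring-element size below are assumptions for illustration.

```python
# Sketch of mask-initialized GrabCut seeded by eroded/dilated salient regions.
import cv2
import numpy as np

img = np.full((120, 160, 3), 40, np.uint8)
cv2.circle(img, (80, 60), 30, (180, 170, 160), -1)    # stand-in salient object

obj = np.zeros((120, 160), np.uint8)                  # salient region, e.g. from
cv2.circle(obj, (80, 60), 30, 255, -1)                # the infrared segmentation
kernel = np.ones((9, 9), np.uint8)
eroded = cv2.erode(obj, kernel)
dilated = cv2.dilate(obj, kernel)

mask = np.full(obj.shape, cv2.GC_PR_BGD, np.uint8)    # default: probable background
mask[dilated == 0] = cv2.GC_BGD                       # sure background
mask[eroded == 255] = cv2.GC_FGD                      # sure foreground

bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
segmented = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
print(segmented.sum())                                # pixels labeled as object
```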
|
|
15:20-15:25, Paper TuCT23.5 | |
RGB-Thermal Visual Place Recognition Via Vision Foundation Model |
|
Ye, Minghao | Harbin Institute of Technology, Shenzhen |
Liu, Xiao | Harbin Institute of Technology, Shenzhen |
Wang, Yu | University of Science and Technology of China |
Liu, Lu | City University of Hong Kong |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Sensor Fusion, Recognition, Deep Learning for Visual Perception
Abstract: Visual place recognition is a critical component of robust simultaneous localization and mapping systems. Conventional approaches primarily rely on RGB imagery, but their performance degrades significantly in extreme environments, such as those with poor illumination and airborne particulate interference (e.g., smoke or fog), which significantly degrade the performance of RGB-based methods. Furthermore, existing techniques often struggle with cross-scenario generalization. To overcome these limitations, we propose an RGB-thermal multimodal fusion framework for place recognition, specifically designed to enhance robustness in extreme environmental conditions. Our framework incorporates a dynamic RGB-thermal fusion module, coupled with dual fine-tuned vision foundation models as the feature extraction backbone. Experimental results on public datasets and our self-collected dataset demonstrate that our method significantly outperforms state-of-the-art RGB-based approaches, achieving generalizable and robust retrieval capabilities across day and night scenarios. The code is available at https://github.com/HITSZ-NRSL/RGB-Thermal-VPR.
|
|
15:25-15:30, Paper TuCT23.6 | |
Impact of Temporal Delay on Radar-Inertial Odometry |
|
Štironja, Vlaho-Josip | Faculty of Electrical Engineering and Computing |
Petrović, Luka | University of Zagreb Faculty of Electrical Engineering and Computing |
Persic, Juraj | University of Zagreb |
Markovic, Ivan | University of Zagreb Faculty of Electrical Engineering and Computing |
Petrovic, Ivan | University of Zagreb |
Keywords: Sensor Fusion, Localization, Calibration and Identification
Abstract: Accurate ego-motion estimation is a critical component of any autonomous system. Conventional ego-motion sensors, such as cameras and LiDARs, may be compromised in adverse environmental conditions, such as fog, heavy rain, or dust. Automotive radars, known for their robustness to such conditions, present themselves as complementary sensors or a promising alternative within the ego-motion estimation frameworks. In this paper we propose a novel Radar-Inertial Odometry (RIO) system that integrates an automotive radar and an inertial measurement unit. The key contribution is the integration of online temporal delay calibration within the factor graph optimization framework that compensates for potential time offsets between radar and IMU measurements. To validate the proposed approach we have conducted thorough experimental analysis on real-world radar and IMU data. The results show that, even without scan matching or target tracking, integration of online temporal calibration significantly reduces localization error compared to systems that disregard time synchronization, thus highlighting the important role of, often neglected, accurate temporal alignment in radar-based sensor fusion systems for autonomous navigation.
|
|
15:30-15:35, Paper TuCT23.7 | |
RAVES-Calib: Robust, Accurate and Versatile Extrinsic Self Calibration Using Optimal Geometric Features |
|
Zhang, Haoxin | Sun Yat-Sen University |
Li, Shuaixin | Academy of Military Science |
Zhu, Xiaozhou | Chinese Academy of Military Science |
Zhang, Xiao | Harbin Engineering University |
Chen, Hongbo | Sun Yat-Sen University |
Yao, Wen | Chinese Academy of Military Science |
Keywords: Sensor Fusion, Mapping, Calibration and Identification
Abstract: In this paper, we present a user-friendly LiDAR-camera calibration toolkit that is compatible with various LiDAR and camera sensors and requires only a single pair of laser points and a camera image in targetless environments. Our approach eliminates the need for an initial transform and remains robust even with large positional and rotational LiDAR-camera extrinsic parameters. We employ the Gluestick pipeline to establish 2D-3D point and line feature correspondences for a robust and automatic initial guess. To enhance accuracy, we quantitatively analyze the impact of feature distribution on calibration results and adaptively weight the cost of each feature based on these metrics. As a result, extrinsic parameters are optimized by filtering out the adverse effects of inferior features. We validated our method through extensive experiments across various LiDAR-camera sensors in both indoor and outdoor settings. The results demonstrate that our method provides superior robustness and accuracy compared to SOTA techniques. Our code is open-sourced on GitHub https://github.com/haoxinnihao/LiDAR_camera_calibration to benefit the community.
|
|
15:35-15:40, Paper TuCT23.8 | |
Consistent Pose Estimation of Unmanned Ground Vehicles through Terrain-Aided Multi-Sensor Fusion on Geometric Manifolds |
|
Raab, Alexander | AGILOX Services GmbH |
Weiss, Stephan | Universität Klagenfurt |
Fornasier, Alessandro | University of Klagenfurt |
Brommer, Christian | University of Klagenfurt |
Ibrahim, Abdalrahman | AGILOX Services GmbH |
Keywords: Sensor Fusion, Localization, Wheeled Robots
Abstract: Aiming to enhance the consistency and thus long-term accuracy of Extended Kalman Filters for terrestrial vehicle localization, this paper introduces the Manifold Error State Extended Kalman Filter (M-ESEKF). By representing the robot's pose in a space with reduced dimensionality, the approach ensures feasible estimates on generic smooth surfaces, without introducing artificial constraints or simplifications that may degrade a filter's performance. The accompanying measurement models are compatible with common loosely- and tightly-coupled sensor modalities and also implicitly account for the ground geometry. We extend the formulation by introducing a novel correction scheme that embeds additional domain knowledge into the sensor data, giving more accurate uncertainty approximations and further enhancing filter consistency. The proposed estimator is seamlessly integrated into a validated modular state estimation framework, demonstrating compatibility with existing implementations. Extensive Monte Carlo simulations across diverse scenarios and dynamic sensor configurations show that the M-ESEKF outperforms classical filter formulations in terms of consistency and stability. Moreover, it eliminates the need for scenario-specific parameter tuning, enabling its application in a variety of real-world settings.
|
|
TuCT24 |
102C |
Robust/Adaptive Control 1 |
Regular Session |
|
15:00-15:05, Paper TuCT24.1 | |
Primitive-Swarm: An Ultra-Lightweight and Scalable Planner for Large-Scale Aerial Swarms |
|
Hou, Jialiang | Fudan University |
Zhou, Xin | Zhejiang University |
Pan, Neng | Zhejiang University |
Li, Ang | Beihang University |
Guan, Yuxiang | Fudan University |
Xu, Chao | Zhejiang University |
Gan, Zhongxue | Fudan University |
Gao, Fei | Zhejiang University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Multi-Robot Systems, Aerial Systems: Perception and Autonomy
Abstract: Achieving large-scale aerial swarms is challenging due to the inherent contradiction between computational efficiency and scalability. This article introduces Primitive-Swarm, an ultra-lightweight and scalable planner designed specifically for large-scale autonomous aerial swarms. The proposed approach adopts a decentralized and asynchronous replanning strategy. At its core is a novel motion primitive library consisting of time-optimal and dynamically feasible trajectories, generated using a novel time-optimal path parameterization algorithm based on reachability analysis. A rapid collision checking mechanism is then developed by associating the motion primitives with the discretized surrounding space according to conflicts. By considering both spatial and temporal conflicts, the mechanism handles robot-obstacle and robot-robot collisions simultaneously. During each replanning step, every robot selects the safe, minimum-cost trajectory from the library based on user-defined requirements. Both the time-optimal motion primitive library and the occupancy information are computed offline, turning a time-consuming optimization problem into a linear-complexity selection problem. This enables the planner to comprehensively explore the nonconvex, discontinuous 3-D safe space filled with numerous obstacles and robots, effectively identifying the best hidden path. Benchmark comparisons demonstrate that our method achieves the shortest flight time and traveled distance with a computation time of less than 1 ms in dense environments. Super-large-scale swarm simulations, involving up to 1000 robots running in real time, verify the scalability of our method. Real-world experiments validate the feasibility and robustness of our approach. The code will be released to foster community collaboration.
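The linear-complexity selection step can be illustrated with a minimal sketch, assuming a precomputed primitive library whose swept voxels are stored as sets; the data layout and cost field are hypothetical, not the authors' code.

import numpy as np

def select_primitive(primitives, occupied_voxels):
    """primitives: list of dicts with 'voxels' (set of voxel ids swept by
    the primitive) and 'cost' (float). occupied_voxels: set of voxel ids
    currently blocked by obstacles or other robots' trajectories.
    Returns the minimum-cost collision-free primitive, or None."""
    best, best_cost = None, np.inf
    for prim in primitives:                   # linear in library size
        if prim["voxels"] & occupied_voxels:  # spatial/temporal conflict
            continue
        if prim["cost"] < best_cost:
            best, best_cost = prim, prim["cost"]
    return best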
|
|
15:05-15:10, Paper TuCT24.2 | |
Adaptive Bearing-Only Target Localization and Circumnavigation under Unknown Wind Disturbance: Theory and Experiments |
|
Sui, Donglin | The University of New South Wales |
Deghat, Mohammad | University of New South Wales |
Sun, Zhiyong | Eindhoven University of Technology |
Eskandari, Mohsen | University of New South Wales |
Keywords: Robust/Adaptive Control, Localization, Aerial Systems: Applications
Abstract: This paper addresses the problem of controlling an autonomous agent to localize and circumnavigate a stationary or slowly moving target in the presence of an unknown wind disturbance. First, we introduce a novel wind estimator that utilizes bearing-only measurements to adaptively estimate the wind velocity. Then, we develop an estimator-coupled circumnavigation controller to mitigate wind effects, enabling the agent to move in a circular orbit centered at the target with a predefined radius. We analytically prove that the estimation and control errors are locally exponentially convergent when the wind is constant and the target is stationary. The robustness of the method is then evaluated for a slowly moving target and time-varying wind: the circumnavigation errors converge to small neighborhoods of the origin whose sizes depend on the velocities of the wind and the target. Comprehensive simulations and experiments using unmanned aerial vehicles (UAVs) illustrate the efficacy of the proposed estimator and controller.
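A minimal sketch of a standard bearing-only circumnavigation law of the kind extended in this paper; the wind-estimator coupling is omitted, and the gains and variable names are illustrative assumptions.

import numpy as np

def circumnavigation_velocity(bearing, d_hat, radius, k_r=1.0, v_t=1.0):
    """bearing: unit 2D vector toward the estimated target position.
    d_hat: estimated distance to the target. radius: desired orbit radius.
    Returns a velocity command combining a radial correction with a
    tangential orbiting term."""
    tangent = np.array([-bearing[1], bearing[0]])  # 90-degree rotation
    radial = k_r * (d_hat - radius) * bearing      # close the radius error
    return radial + v_t * tangent                  # orbit at speed v_t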
|
|
15:10-15:15, Paper TuCT24.3 | |
Non-Parametric Neuro-Adaptive Formation Control (I) |
|
Verginis, Christos | Uppsala University |
Xu, Zhe | Arizona State University |
Topcu, Ufuk | The University of Texas at Austin |
Keywords: Robust/Adaptive Control, Multi-Robot Systems, Machine Learning for Robot Control
Abstract: We develop a learning-based algorithm for the distributed formation control of networked multi-agent systems governed by unknown, nonlinear dynamics. Most existing algorithms either assume certain parametric forms for the unknown dynamic terms or resort to unnecessarily large control inputs in order to provide theoretical guarantees. The proposed algorithm avoids these drawbacks by integrating neural network-based learning with adaptive control in a two-step procedure. In the first step of the algorithm, each agent learns a controller, represented as a neural network, using training data that correspond to a collection of formation tasks and agent parameters. These parameters and tasks are derived by varying the nominal agent parameters and a user-defined formation task to be achieved, respectively. In the second step of the algorithm, each agent incorporates the trained neural network into an online and adaptive control policy in such a way that the behavior of the multiagent closed-loop system satisfies the user-defined formation task. Both the learning phase and the adaptive control policy are distributed, in the sense that each agent computes its own actions using only local information from its neighboring agents. The proposed algorithm does not use any a priori information on the agents’ unknown dynamic terms or any approximation schemes. We provide formal theoretical guarantees on the achievement of the formation task.
|
|
15:15-15:20, Paper TuCT24.4 | |
Adaptive Control of Multi-Agent Systems with an Unstable High-Dimensional Leader and Switching Disconnected Topologies (I) |
|
Sun, Jian | Dalian Minzu University |
Li, Ruoqi | Dalian Minzu University |
Liu, Lei | Liaoning University of Technology |
Zhang, Jianxin | Dalian Minzu University |
Shan, Qihe | Dalian Maritime University |
Keywords: Robust/Adaptive Control
Abstract: This paper addresses an adaptive consensus control problem for heterogeneous multi-agent systems with switching disconnected topologies. Unlike existing works on switching disconnected topologies, an unstable high-dimensional leader is considered here for the first time. To tackle this problem, we propose a novel blockwise energy descent approach. This approach divides the switching periods into several time blocks and mines the cooperation laws of agents within these blocks. A descent phenomenon at the switching times can then be obtained, which can be used to counteract the divergence within the switching periods. Building upon this, we develop a time-varying Lyapunov function to describe the system's dynamics and establish conditions for achieving output consensus. Finally, we provide a simulation example to confirm the validity of our theoretical results.
|
|
15:20-15:25, Paper TuCT24.5 | |
Full Pose Tracking Via Robust Control for Over-Actuated Multirotors |
|
Hachem, Mohamad | Ecole Nationale De l'Aviation Civile |
Roos, Clement | Onera |
Miquel, Thierry | ENAC |
Bronz, Murat | ENAC, Université De Toulouse |
Keywords: Robust/Adaptive Control, Optimization and Optimal Control, Aerial Systems: Mechanics and Control
Abstract: This paper presents a robust cascaded control architecture for over-actuated multirotors. It extends Incremental Nonlinear Dynamic Inversion (INDI) control, combined with structured H-infinity control, originally proposed for under-actuated multirotors, to a broader range of multirotor configurations. Furthermore, it reduces the control law's dependency on the multirotor model compared to methods in the literature by shifting it to the actuator model, thereby improving closed-loop robustness to uncertainties. To achieve attitude and position tracking, we employ a weighted least-squares geometric guidance control allocation method, formulated as a quadratic optimization problem, enabling full-pose tracking. The proposed approach effectively addresses key challenges, such as preventing infeasible pose references, enhancing robustness to disturbances, and accounting for the multirotor's actual physical limitations. Numerical simulations with an over-actuated hexacopter validate the method's effectiveness, demonstrating its adaptability to diverse mission scenarios and its potential for real-world aerial applications.
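A minimal sketch of the incremental-inversion idea named above, assuming a known control effectiveness matrix G; this is not the paper's full cascaded architecture, and all names are hypothetical.

import numpy as np

def indi_increment(u_prev, nu_desired, accel_measured, G):
    """u_prev: previous actuator command. nu_desired: desired (angular)
    acceleration from the outer loop. accel_measured: filtered measured
    acceleration. G: control effectiveness matrix (d accel / d u).
    INDI commands an increment on the previous input rather than
    inverting a full vehicle model."""
    delta_u = np.linalg.pinv(G) @ (nu_desired - accel_measured)
    return u_prev + delta_u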
|
|
15:25-15:30, Paper TuCT24.6 | |
Adaptive Model-Based Control of Quadrupeds Via Online System Identification Using Kalman Filter |
|
Haack, Jonas | University of Bremen |
Stark, Franek | Robotics Innovation Center, DFKI GmbH |
Vyas, Shubham | Robotics Innovation Center, DFKI GmbH |
Kirchner, Frank | University of Bremen |
Kumar, Shivesh | Chalmers University of Technology |
Keywords: Robust/Adaptive Control, Legged Robots, Model Learning for Control
Abstract: Many real-world applications require legged robots to be able to carry variable payloads. Model-based controllers such as model predictive control (MPC) have become the de facto standard in research for controlling these systems. However, most model-based control architectures use fixed plant models, which limits their applicability to different tasks. In this paper, we present a Kalman filter (KF) formulation for online identification of the mass and center of mass (COM) of a four-legged robot. We evaluate our method on a quadrupedal robot carrying various payloads and find that it is more robust to strong measurement noise than classical recursive least squares (RLS) methods. Moreover, it improves the tracking performance of the model-based controller with varying payloads when the model parameters are adjusted at runtime.
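A minimal sketch of Kalman-filter parameter identification in this spirit, written generically: the parameter vector (e.g., mass and COM offsets) follows a random walk, and each sample supplies a linear regressor H derived from the robot dynamics. The regressor construction and noise levels are assumptions, not the paper's values.

import numpy as np

class ParameterKF:
    def __init__(self, theta0, P0, q=1e-6, r=1e-2):
        self.theta = np.asarray(theta0, dtype=float)  # parameter estimate
        self.P = np.asarray(P0, dtype=float)          # estimate covariance
        self.Q = q * np.eye(len(self.theta))          # random-walk noise
        self.r = r                                    # measurement variance

    def update(self, H, z):
        """H: (m, n) regressor from the dynamics; z: (m,) measurement."""
        self.P = self.P + self.Q               # predict (identity dynamics)
        R = self.r * np.eye(len(z))
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)    # Kalman gain
        self.theta = self.theta + K @ (z - H @ self.theta)
        self.P = (np.eye(len(self.theta)) - K @ H) @ self.P
        return self.theta

Unlike plain RLS, the process noise Q keeps the covariance from collapsing, which is one common explanation for better robustness to measurement noise and to slowly changing payloads.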
|
|
15:30-15:35, Paper TuCT24.7 | |
Safe Corridor-Based MPC for Follow-Ahead and Obstacle Avoidance of Mobile Robot in Cluttered Environments |
|
Zhang, Yikun | Huazhong University of Science and Technology |
Chen, Xinxing | Huazhong University of Science and Technology |
Huang, Jian | Huazhong University of Science and Technology |
Keywords: Physical Human-Robot Interaction, Collision Avoidance, Motion Control
Abstract: In cluttered environments, a human-following mobile robot must predict the motion intention of the followed human while taking environmental obstacles into consideration. This raises several challenges, such as predicting the human's detour direction and maintaining visibility during route planning. To overcome these problems, this paper proposes an integrated follow-ahead framework in which the human's detour behavior is predicted by a Leg Motion Model-based EKF (LMM-EKF) and an iterative human route search algorithm, followed by a Safe Corridor-based Model Predictive Controller (SCMPC) that computes the optimal control solution. The paper also offers a new perspective on visibility: by placing multiple obstacle-free safe regions along the human's intended direction, without any complex preprocessing of the point cloud, SCMPC simultaneously prevents collision and occlusion based on the basic properties of convex sets. The validity of the proposed method is comprehensively verified through real-world experiments.
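A minimal sketch of imposing convex safe regions as linear constraints, in the spirit of the safe-corridor idea above: each obstacle-free region is a polytope {p : A p <= b}, and every predicted position must lie in at least one region. Region extraction and the full SCMPC formulation are not reproduced.

import numpy as np

def in_safe_region(p, A, b, margin=0.0):
    """True if position p lies inside the polytope A p <= b, shrunk by a
    safety margin."""
    return bool(np.all(A @ p <= b - margin))

def trajectory_safe(traj, regions, margin=0.1):
    """traj: (T, 2) predicted positions; regions: list of (A, b) polytopes
    placed along the human's intended direction."""
    return all(any(in_safe_region(p, A, b, margin) for A, b in regions)
               for p in traj)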
|
|
15:35-15:40, Paper TuCT24.8 | |
Enabling On-Chip Adaptive Linear Optimal Control Via Linearized Gaussian Process |
|
Gao, Yuan | Nanyang Technological University |
Lai, Yinyi | Hohai University |
Wang, Jun | Nanyang Technological University |
Fang, Yini | Hong Kong University of Science and Technology |
Keywords: Robust/Adaptive Control, Machine Learning for Robot Control, Model Learning for Control
Abstract: Unpredictable and complex aerodynamic effects pose significant challenges to achieving precise flight control, emphasizing the necessity of adaptive control via data-driven models. Moreover, real hardware usually requires high-frequency control and has limited on-chip computation, making the adaptive controller design more challenging. To address these challenges, we incorporate a linearized Gaussian process (GP) to model the external aerodynamics within linear model predictive control, enabling real-time computability. More importantly, to compensate for the control performance sacrificed by GP linearization and to reduce on-chip GP computations, we design active data collection strategies using Bayesian optimization with an additive GP, reducing the performance sacrifice as much as possible. Specifically, we decompose the performance into force and trajectory partitions, where the force model is used for control and the trajectory model for data collection. Experimental results on both simulations and real quadrotors show that we achieve tracking errors comparable to a full GP (which is not real-time computable) while remaining real-time computable on Crazyflies.
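A minimal sketch of linearizing a GP posterior mean for use in a linear controller, assuming a scalar-output GP with an RBF kernel and a precomputed weight vector alpha; the paper's additive-GP and Bayesian-optimization machinery is not reproduced.

import numpy as np

def rbf(a, b, ell):
    d = a - b
    return np.exp(-0.5 * np.dot(d, d) / ell**2)

def gp_linearize(x0, X, alpha, ell):
    """X: (N, d) training inputs; alpha = (K + sn^2 I)^{-1} y, precomputed
    offline. Returns the posterior mean f(x0) and its Jacobian J, giving
    the local linear model f(x) ~ f(x0) + J (x - x0) for the MPC."""
    k = np.array([rbf(x0, xi, ell) for xi in X])        # (N,)
    f0 = k @ alpha                                      # posterior mean
    # d k(x0, xi)/d x0 = -k * (x0 - xi) / ell^2 for the RBF kernel
    dk = -(k[:, None] * (x0[None, :] - X)) / ell**2     # (N, d)
    J = dk.T @ alpha                                    # (d,)
    return f0, J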
|
|
TuCT25 |
103A |
Dexterous Manipulation 3 |
Regular Session |
Chair: Tao, Yong | Beijing University of Aeronautics and Astronautics |
|
15:00-15:05, Paper TuCT25.1 | |
Learning Dual-Arm Push and Grasp Synergy in Dense Clutter |
|
Wang, Yongliang | University of Groningen |
Kasaei, Hamidreza | University of Groningen |
Keywords: Dexterous Manipulation, Reinforcement Learning, Dual Arm Manipulation
Abstract: Robotic grasping in densely cluttered environments is challenging due to scarce collision-free grasp affordances. Non-prehensile actions can increase the number of feasible grasps in cluttered environments, but most research focuses on single-arm rather than dual-arm manipulation. Policies from single-arm systems fail to fully leverage the advantages of dual-arm coordination. We propose a target-oriented hierarchical deep reinforcement learning (DRL) framework that learns dual-arm push-grasp synergy for grasping objects and enhancing dexterous manipulation in dense clutter. Our framework maps visual observations to actions via a pre-trained deep learning backbone and a novel CNN-based DRL model, trained with Proximal Policy Optimization (PPO), to develop a dual-arm push-grasp strategy. The backbone enhances feature mapping in densely cluttered environments, and a novel fuzzy-based reward function is introduced to accelerate strategy learning. Our system is developed and trained in Isaac Gym and then tested in simulation and on a real robot. Experimental results show that our framework effectively maps visual data to dual push-grasp motions, enabling the dual-arm system to grasp target objects in complex environments. Compared to other methods, our approach generates 6-DoF grasp candidates and enables dual-arm push actions, mimicking human behavior, and efficiently completes tasks in densely cluttered environments.
|
|
15:05-15:10, Paper TuCT25.2 | |
FunGrasp: Functional Grasping for Diverse Dexterous Hands |
|
Huang, Linyi | The Hong Kong University of Science and Technology (Guangzhou) |
Zhang, Hui | ETH Zurich |
Wu, Zijian | The Hong Kong University of Science and Technology (Guangzhou) |
Christen, Sammy | Disney Research |
Song, Jie | ETH Zurich |
Keywords: Dexterous Manipulation, Grasping, Reinforcement Learning
Abstract: Functional grasping is essential for humans to perform specific tasks, such as grasping scissors by the finger holes to cut materials or by the blade to safely hand them over. Enabling dexterous robot hands with functional grasping capabilities is crucial for their deployment to accomplish diverse real-world tasks. Recent research in dexterous grasping, however, often focuses on power grasps while overlooking task- and object-specific functional grasping poses. In this paper, we introduce FunGrasp, a system that enables functional dexterous grasping across various robot hands and performs one-shot transfer to unseen objects. Given a single RGBD image of functional human grasping, our system estimates the hand pose and transfers it to different robotic hands via a human-to-robot (H2R) grasp retargeting module. Guided by the retargeted grasping poses, a policy is trained through reinforcement learning in simulation for dynamic grasping control. To achieve robust sim-to-real transfer, we employ several techniques including privileged learning, system identification, domain randomization, and gravity compensation. In our experiments, we demonstrate that our system enables diverse functional grasping of unseen objects using single RGBD images, and can be successfully deployed across various dexterous robot hands. The significance of the components is validated through comprehensive ablation studies.
|
|
15:10-15:15, Paper TuCT25.3 | |
A Hybrid Motion Optimization Framework for the Humanoid Upper-Body Robot: Safe and Dexterous Object Carrying |
|
Chen, Zhaoyang | Harbin Institute of Technology |
Min, Kang | Harbin Institute of Technology |
Fan, Xinyang | Harbin Institute of Technology |
Ni, Fenglei | State Key Laboratory of Robotics and System, Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Dexterous Manipulation, Redundant Robots, Motion and Path Planning
Abstract: In this paper, a novel hybrid motion optimization framework is proposed for a humanoid upper-body robot with two 7-DOF arms and a 2-DOF waist, enabling it to carry objects flexibly and safely in constrained and dynamic environments. The framework consists of three interconnected optimization layers dedicated to planning the optimal carrying configuration, the waist-arm motion trajectory, and the dual-arm nullspace trajectory, respectively. The top layer finds the most dexterous waist-arm carrying configuration that satisfies environmental constraints, achieved by effectively integrating a multi-objective evolutionary algorithm (EA), the proposed optimal manipulation index, and a parameterization method. The middle layer applies a non-linear optimization method to plan a collision-free trajectory that connects the initial and optimal carrying configurations. The bottom layer equips the top layer with an elite reproduction operator strategy and an adaptive evolution space strategy, generating dual-arm nullspace trajectories in real time for dynamic obstacle avoidance. Various carrying experiments demonstrate the effectiveness of the proposed framework.
|
|
15:15-15:20, Paper TuCT25.4 | |
Hierarchical Diffusion Policy: Manipulation Trajectory Generation Via Contact Guidance |
|
Wang, Dexin | Shandong University |
Liu, Chunsheng | Shandong University |
Chang, Faliang | Shandong University |
Xu, Yichen | Shandong University |
Keywords: Dexterous Manipulation, Deep Learning in Robotics and Automation, Learning from Demonstration, Hierarchical planning
Abstract: Decision-making in robotics using denoising diffusion processes has increasingly become a hot research topic, but end-to-end policies perform poorly in tasks with rich contact and have limited interactivity. This paper proposes Hierarchical Diffusion Policy (HDP), a new robot manipulation policy that uses contact points to guide the generation of robot trajectories. The policy is divided into two layers: the high-level policy predicts the contact for the robot's next object manipulation based on 3D information, while the low-level policy predicts the action sequence toward the high-level contact based on the latent variables of the observation and the contact. We represent both levels of the policy as conditional denoising diffusion processes, and combine behavioral cloning and Q-learning to optimize the low-level policy for accurately guiding actions toward contact. We benchmark Hierarchical Diffusion Policy across 6 different tasks and find that it significantly outperforms the existing state-of-the-art imitation learning method Diffusion Policy, with an average improvement of 20.8%. We find that contact guidance yields significant improvements, including superior performance and greater interpretability.
|
|
15:20-15:25, Paper TuCT25.5 | |
Non-Contact Manipulator for Sedimented/Floating Objects Via Laser-Induced Thermocapillary Convection |
|
Hui, Xusheng | Northwestern Polytechnical University |
Luo, Jianjun | Northwestern Polytechnical University (P.R. China) |
You, Haonan | Northwestern Polytechnical University |
Sun, Hao | Beijing Advanced Medical Technologies, Ltd. Inc |
Keywords: Dexterous Manipulation, Micro/Nano Robots, Motion Control, Thermocapillary convection
Abstract: Non-contact manipulation in liquid environments holds significant applications in micro/nanofluidics, microassembly, micromanufacturing, and microbotics. Achieving compatibility in manipulating both sedimented and floating objects, as well as independently and synergistically manipulating multiple targets, remains a significant challenge. Here, a non-contact manipulator is developed for both sedimented and floating objects using laser-induced thermocapillary convection. Various strategies are proposed based on the distinct responses of sedimented and floating objects. Predefined scanning and 'checkpoint' methods facilitate accurate movements of individual and multiple particles, respectively. Ultrafast programmed scanning and laser multiplexing enable independent manipulation of multiple particles. At the liquid-air interface, 'laser cage' and 'laser wall' are proposed to serve as effective tools for manipulating floating objects, especially with vision-based closed-loop control. Methods and strategies here are independent of the features of targets, solvents, and substrates. This work provides a versatile platform and a novel methodology for non-contact manipulation in liquid.
|
|
15:25-15:30, Paper TuCT25.6 | |
Design of a Reconfigurable Gripper with Rigid-Flexible Variable Fingers (I) |
|
Wang, Huan | Southeast University |
Gao, Bingtuan | Southeast University |
Hu, Anqing | Southeast University, College of Electric Engineering |
Xu, Wenxuan | Southeast University |
Shen, Huan | Nanjing University of Aeronautics and Astronautics |
He, Jiahong | Southeast University |
Keywords: Dexterous Manipulation, Multifingered Hands, Compliant Joints and Mechanisms
Abstract: Grippers with a single rigid or flexible state have corresponding limitations, such as poor safety of rigid grippers and low precision of flexible grippers, which require complex algorithms and additional sensors to perform tedious tasks. Here, we design a reconfigurable gripper based on three fingers that can switch among the rigid, rigid bending and flexible states. This adaptability enables the gripper to perform in diverse application scenarios. The flexible joints are locked and released by the back and forth movement of a slider mechanism, thus enabling switches between the rigid and flexible states of the fingers. When the fingers are in the rigid state, the structure resembles a conventional rigid gripper; when the fingers are in the flexible state, the spring supported joints are driven by four cables. We have developed the mathematical models for the three states, tested the performance of different grasping configurations, analyzed the factors that influence horizontal grasping errors, and finally explored the application of rigid-flexible cooperative operations. Numerical and empirical experimental results clearly demonstrate the viability of the proposed gripper and its wide adaptability to different tasks.
|
|
15:30-15:35, Paper TuCT25.7 | |
Hierarchical Framework for Constrained Dual-Arm Cooperative Manipulation with Whole-Body Collision Avoidance |
|
Zhang, Silong | University of Science and Technology of China |
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Ni, Yingtai | University of Science and Technology of China |
Shao, YueCheng | University of Science and Technology of China |
Feng, Ziyang | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Bimanual Manipulation, Collision Avoidance, Imitation Learning
Abstract: Dual-arm robotic systems hold great potential for complex bimanual tasks that require intricate and coordinated manipulation, such as holding and transporting a tray with a cup of coffee while navigating through cluttered environments. However, these tasks pose significant challenges due to the inherent closed-chain constraints between the arms and the object, as well as the need for real-time collision avoidance, especially in real-world applications. To address these challenges, we introduce a hierarchical framework that combines learning-based planning with classical control theory to ensure whole-body collision avoidance movement while maintaining the kinematic relationship. In addition, we present a novel, efficient, and cost-free data generation method specifically designed for dual-arm cooperative tasks, overcoming the lack of sufficient training data. Extensive experiments in both simulation and real-world scenarios demonstrate that our approach improves the success rate by 26.3% compared to existing planning methods and by 54.7% compared to end-to-end methods. These results highlight the advantages of our method in whole-body collision avoidance and environmental adaptability, making it a promising solution for dual-arm cooperative tasks.
|
|
TuCT26 |
103B |
Soft Robot Materials and Design 3 |
Regular Session |
Co-Chair: Yang, Laihao | Xi'an Jiaotong University |
|
15:00-15:05, Paper TuCT26.1 | |
Development of a Photo-Curing 3D Printer for Fabrication of Small-Scale Soft Robots with Programming Spatial Magnetization |
|
Li, Shishi | Harbin Institute of Technology |
Meng, Xianghe | Harbin Institute of Technology |
Shen, Xingjian | Harbin Institute of Technology |
Wang, Jinrong | Harbin Institute of Technology |
Xie, Hui | Harbin Institute of Technology |
Keywords: Soft Robot Materials and Design, Additive Manufacturing, Biologically-Inspired Robots
Abstract: Magnetic soft robots have potential applications in biomimetic, soft interaction, and biomedical fields. However, their functionality depends on deformation patterns and locomotion modes, making it challenging to fabricate complex structures with precise magnetization. We therefore developed a photo-curing 3D printer with programmable magnetization for the fabrication of small-scale soft robots. The printer integrates a three-dimensional magnetic field generator (3D-MFG) and a digital light processing photo-curing system. The 3D-MFG generates a high-strength (up to 80 mT) magnetic field through Halbach arrays (x-y plane) and a solenoid (z-axis), producing an arbitrary uniform magnetic field with low energy consumption. Adhesion between the printed structure and the substrate was analyzed, and a real-time force-based printing control method is presented for precise optimization of key parameters, including layer thickness, approaching force, and separation speed, enhancing overall print quality and reliability when stacking complex three-dimensional structures. Finally, a crawling robot mimicking an inchworm gait, a swimming robot with butterfly-inspired motion, and a capsule robot for targeted drug delivery were fabricated with the developed system. These experimental results validate the printer's capability to fabricate highly complex structures, advancing the practical application of small-scale soft robots in biomimetic and biomedical fields.
|
|
15:05-15:10, Paper TuCT26.2 | |
Hybrid Hard-Soft Robotic Joint and Robotic Arm Based on Pneumatic Origami Chambers (I) |
|
Oh, Namsoo | Dankook University |
Lee, Haneol | Sungkyunkwan University |
Shin, Jiseong | Sungkyunkwan University (SKKU) |
Choi, Youngjin | Hanyang University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Rodrigue, Hugo | Sungkyunkwan University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Soft robotic arms are being actively researched due to their high potential, but they still face controllability challenges arising from their infinite degrees of freedom (DOFs). Hybrid hard-soft robotic arms have been proposed as a solution, but their performance is limited by operating only under positive or negative pressures, since the pneumatic chambers collapse under negative pressure. In this paper, we propose an origami-based pneumatic robotic joint that can utilize both positive and negative pressures, with the origami structure guiding the motion through both expansion and contraction of the chamber. This origami chamber is combined with a hard rotational structure, forming a hybrid hard-soft robotic joint. A single-chamber bidirectional joint and a dual-chamber joint with two antagonistic chambers were developed. Their range of motion, load-lifting capacity, and control performance under positive and negative pressures were evaluated. These joints were assembled into a multi-DOF joint that was easily controlled through decoupled controllers for each DOF and implemented in a 3-DOF robot manipulator capable of pick-and-place operations with objects weighing over 1 kg.
|
|
15:10-15:15, Paper TuCT26.3 | |
Real-Time Shape Estimation of Tensegrity Structures Using Strut Inclination Angles |
|
Bhat, Tufail Ahmad | Kyushu Institute of Technology |
Yoshimitsu, Yuhei | Kyushu Institute of Technology |
Wada, Kazuki | Kyushu Institute of Technology |
Ikemoto, Shuhei | Kyushu Institute of Technology |
Keywords: Soft Robot Materials and Design, Flexible Robotics, Compliant Joints and Mechanisms
Abstract: Tensegrity structures are becoming widely used in robotics, such as continuously bending soft manipulators and mobile robots to explore unknown and uneven environments dynamically. Estimating their shape, which is the foundation of their state, is essential for establishing control. However, on-board sensor-based shape estimation remains difficult despite its importance, because tensegrity structures lack well-defined joints, which makes it challenging to use conventional angle sensors such as potentiometers or encoders for shape estimation. To our knowledge, no existing work has successfully achieved shape estimation using only onboard sensors such as Inertial Measurement Units (IMUs). This study addresses this issue by proposing a novel approach that uses energy minimization to estimate the shape. We validated our method through experiments on a simple Class 1 tensegrity structure, and the results show that the proposed algorithm can estimate the real-time shape of the structure using onboard sensors, even in the presence of external disturbances.
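A minimal sketch of shape estimation by energy minimization under measured strut inclinations, in the spirit of the abstract: node positions are the unknowns, stretched cables contribute elastic energy, and struts are penalized for deviating from their nominal length and IMU-measured direction. The data structures and weights are illustrative assumptions, not the authors' formulation.

import numpy as np
from scipy.optimize import minimize

def shape_energy(flat_p, cables, struts, meas_dirs, k_c=100.0, w=1e3):
    p = flat_p.reshape(-1, 3)
    E = 0.0
    for (i, j, rest_len) in cables:           # elastic energy of cables
        stretch = max(np.linalg.norm(p[i] - p[j]) - rest_len, 0.0)
        E += 0.5 * k_c * stretch**2
    for (i, j, length), d_meas in zip(struts, meas_dirs):
        v = p[j] - p[i]
        E += w * (np.linalg.norm(v) - length)**2             # rigid strut
        E += w * np.sum((v / np.linalg.norm(v) - d_meas)**2)  # IMU direction
    return E

def estimate_shape(p_init, cables, struts, meas_dirs):
    """p_init: (n_nodes, 3) initial guess; cables/struts: (i, j, length)
    tuples; meas_dirs: unit strut directions from the onboard IMUs."""
    res = minimize(shape_energy, p_init.ravel(),
                   args=(cables, struts, meas_dirs), method="L-BFGS-B")
    return res.x.reshape(-1, 3)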
|
|
15:15-15:20, Paper TuCT26.4 | |
Reconfigurable Soft Pneumatic Actuators Using Multi-Material Self-Healing Polymers |
|
Kosaka, Shota | Osaka University |
Kimura, Kentaro | Osaka University |
Yamamoto, Seiichi | Osaka University |
Ishizuka, Hiroki | Osaka University |
Masuda, Yoichi | Osaka University |
Punpongsanon, Parinya | Saitama University |
Ikeda, Sei | Osaka University |
Oshiro, Osamu | Osaka University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Self-healing polymers, which can autonomously repair damage or be repaired through simple treatments, have garnered attention due to their flexibility in addressing damage and enabling long-term use in soft robotics. Many studies on the application of self-healing polymers to soft robotics have focused on realizing healable systems. Such a mechanism makes it possible to construct soft robots. Despite previous research on reconfigurable soft robots, the reconfiguration of actuators, critical components for determining actuation, has not been explored extensively. We propose a method for constructing reconfigurable soft pneumatic actuators using multi-material self-healing polymers, thus enhancing the adaptability and functionality of soft robotic systems. The actuator comprises stretchable and less-stretchable self-healing polymers, which allow for durable and versatile configurations. By adjusting the placement of the less-stretchable polymer on the actuator surface, we can design and alter the deformation, thus achieving various forms of actuation. Our research demonstrates the feasibility of actuators capable of bending, contracting, twisting, and elongating. We provide examples of their application in actuator fabrication, thus showcasing the potential of our study in inspiring the design and fabrication of soft robots utilizing self-healing polymers.
|
|
15:20-15:25, Paper TuCT26.5 | |
Rigid-Soft Hybrid Suction Cups for Enhanced Anti-Torque and Energy-Efficient Attachment |
|
Qingkai, Guo | Xi'an Jiaotong University |
Sun, Yu | Xi'an Jiaotong University |
Zhao, Zipan | Xi'an Jiaotong University |
Ning, Jiajia | Xi'an Jiaotong University |
Ling, Wang | Xi'an Jiaotong University |
Yuxin, Lv | Xi'an Jiaotong University |
Xuefeng, Chen | State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University |
Yang, Laihao | Xi'an Jiaotong University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Climbing Robots
Abstract: In the realm of robotics, suction-based adhesion plays a pivotal role in applications ranging from object transfer to wall-climbing robots. To improve the sealing and attachment stability of suction cups, researchers have employed state-of-the-art techniques, including the use of soft materials with better conformal properties or jamming mechanisms. However, soft materials can cause undesired deformation of the suction cup under load, and jamming mechanisms have limitations in terms of size and weight. This letter introduces a novel Rigid-Soft Hybrid Suction Cup (RSH-SC) designed for enhanced stability and energy-efficient attachment in non-ideal conditions (i.e., suction on irregular and inclined surfaces). To emulate the octopus's dexterous suction and the limpet's robust adhesion capabilities, the RSH-SC integrates a rigid shell for better sealing and torque resistance. Notably, compared to a soft suction cup made from Ecoflex 00-50, the RSH-SC's torque deformation resistance is 550 times greater. The RSH-SC's unique structure also allows it to maintain secure attachment without continuous vacuum pressure, thus conserving energy. A crawling robot utilizing RSH-SCs demonstrated stable movement on a ceiling, which can significantly advance the capabilities of soft robots in complex environments, paving the way for broader applications in robotics and automated systems.
|
|
15:25-15:30, Paper TuCT26.6 | |
Soft Growing Robots Explore Unknown Environments through Obstacle Interaction |
|
Wu, Haoran | Beihang University |
Sun, Fuchun | Tsinghua University |
Huang, Canwei | Shenzhen University |
Huang, Haiming | Shenzhen University |
Chu, Zhongyi | Beihang University |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Mapping
Abstract: In low-light, unstructured, and confined environments, performing Simultaneous Localization and Mapping (SLAM) with conventional methods presents significant challenges. Soft growing robots, characterized by their compliance and extensibility, interact safely with the environment, making them well-suited for navigation in such environments. Through collision-based guidance, the robot can gather environmental data via morphological adaptations. Based on this, we developed the sensing capabilities of the soft growing robot, retaining its flexibility while enabling effective environmental interaction and perception. The robot employs a gyroscope combined with an encoder to track the end-effector trajectory and uses flexible proximity sensing to detect obstacles. By fusing the information from these sensors, we propose a multi-sensor fusion strategy for environmental exploration of the soft growing robot. The robot navigates unknown environments by employing pre-bending based on prior environmental data and utilizing pneumatic artificial muscles. In multi-obstacle environmental exploration, the path prediction error is less than 3.5% of the robot’s total length, enabling greater environmental coverage with fewer exploration attempts.
|
|
15:30-15:35, Paper TuCT26.7 | |
Soft Pneumatic Helical Actuators with Programmable Variable Curvatures |
|
Xu, Zefeng | South China University of Technology |
Liang, Jiaqiao | South China University of Technology |
Zhou, Yitong | South China University of Technology |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: This paper proposes novel variable-curvature soft helical actuators (VC-SHAs) using fluidic prestressed composites (FPCs) for enhanced grasping performance. A third-order three-dimensional chained composite model, 3D-CCM, is developed to pre-program the variable curvatures of SHAs and predict their actuation responses. The proposed actuator design increases the shape-matching area between objects and the soft actuators, greatly improving grasping performance. Various VC-SHAs are fabricated and tested to validate the model, and parametric studies are conducted to identify the effects of key parameters on actuator responses. A bioinspired passive dynamic gripper is developed with three VC-SHAs. Loading tests are conducted to evaluate its grasping success rates and maximum payload for various objects and object sizes, achieving a maximum payload of 26.56 N. The gripper is integrated into a drone to demonstrate its capability to perch on pipelines of different sizes and angles, as well as on various objects.
|
|
TuCT27 |
103C |
Parallel and Redundant Robots 1 |
Regular Session |
|
15:00-15:05, Paper TuCT27.1 | |
Design and Wrench-Feasible-Workspace Analysis for a Novel Cable-Driven Parallel Robot with Movable Anchor Winches |
|
An, Hao | Harbin Institute of Technology Shenzhen |
Zheng, Zongkai | Harbin Institute of Technology (Shenzhen) |
Yuan, Han | Harbin Institute of Technology |
Keywords: Parallel Robots, Mechanism Design, Kinematics
Abstract: Traditional cable-driven parallel robots (T-CDPRs) typically employ multiple driving cables extending from fixed anchor points to connect with the end-effector. Once assembled, the anchor positions and the wrench-feasible workspace of the CDPR are fixed and cannot be readily modified. This paper presents a spatial movable-anchor-winch cable-driven parallel robot (M-CDPR), whose distinctive feature lies in its ability to freely move anchor winches along enclosed circular tracks. Through modelling of the proposed CDPR, this study proposes an anchor winch reconfiguration methodology that significantly enhances the kinematic flexibility of the M-CDPR. Comparative simulation analyses of the wrench-feasible workspace and orientation variation range between the M-CDPR and a fixed-anchor counterpart of identical dimensions demonstrate that the incorporation of movable anchor winches expands both the wrench-feasible workspace and the orientation variation range of the M-CDPR. Finally, a prototype was constructed, and position control and impedance control experiments were conducted using the proposed anchor winch reconfiguration method.
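A minimal sketch of the standard wrench-feasibility test that underlies such workspace analyses (the movable-anchor reconfiguration itself is not reproduced): a pose is wrench-feasible for a wrench w if bounded cable tensions t can produce it, i.e., W t = w with t_min <= t <= t_max, which is a linear feasibility problem.

import numpy as np
from scipy.optimize import linprog

def wrench_feasible(W, w, t_min, t_max):
    """W: (6, m) wrench matrix at the pose; w: (6,) required wrench;
    t_min, t_max: scalar cable tension limits. Returns True if a valid
    tension distribution exists."""
    m = W.shape[1]
    res = linprog(c=np.zeros(m),               # pure feasibility, no cost
                  A_eq=W, b_eq=w,
                  bounds=[(t_min, t_max)] * m,
                  method="highs")
    return res.status == 0                     # 0 means a solution exists

Sweeping this test over a grid of poses (and, for the M-CDPR, over candidate anchor winch positions) yields the wrench-feasible workspace compared in the paper.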
|
|
15:05-15:10, Paper TuCT27.2 | |
Many-Objective Motion Generation Method for Redundant Manipulators by Solving Pathwise Inverse Kinematics |
|
Xie, Bin | Central South University |
Zhao, Jiaming | Central South University |
Wang, Qingfeng | Central South University |
Wu, Di | Central South University |
Keywords: Redundant Robots, Task and Motion Planning, Kinematics
Abstract: Modern robots are required to operate in complex environments and perform diverse tasks, resulting in redundant degrees of freedom (DoF) for flexibility. However, managing redundancy is challenging due to the high-dimensional and non-convex nature of robotic kinematics. When executing complex tracking tasks, redundant robots must handle non-convex constraints while maintaining many objectives, such as balancing and obstacle avoidance. This paper models the pathwise inverse kinematics of redundant mechanisms as a multi-objective nonlinear optimization problem. We propose an efficient gradient-free optimization method named MoeIK, which demonstrates strong multi-objective balance, rapid global convergence, and adaptability. Our approach enhances the method by integrating relaxation dominance, adaptive interval search strategies, and a restart strategy, significantly improving performance in overcoming many-objective optimization challenges. We compared MoeIK with RelaxedIK, Trac-IK, and BioIK across multiple trajectories on various redundant robots, and the experimental results demonstrate that our algorithm exhibits better multi-objective balance capabilities and supports real-time computation.
|
|
15:10-15:15, Paper TuCT27.3 | |
Human-Robot Co-Transportation Using Disturbance-Aware MPC with Pose Optimization |
|
Mahmud, Al Jaber | George Mason University |
Raj, Amir Hossain | George Mason University |
Nguyen, Duc | George Mason University |
Li, Weizi | University of Tennessee, Knoxville |
Xiao, Xuesu | George Mason University |
Wang, Xuan | George Mason University |
Keywords: Redundant Robots, Human-Robot Teaming, Optimization and Optimal Control
Abstract: This paper proposes a new control algorithm for human-robot co-transportation using a robot manipulator equipped with a mobile base and a robotic arm. We integrate the regular Model Predictive Control (MPC) with a novel pose optimization mechanism to more efficiently mitigate disturbances (such as human behavioral uncertainties or robot actuation noise) during the task. The core of our methodology involves a two-step iterative design: At each planning horizon, we determine the optimal pose of the robotic arm (joint angle configuration) from a candidate set, aiming to achieve the lowest estimated control cost. This selection is based on solving a disturbance-aware Discrete Algebraic Riccati Equation (DARE), which also determines the optimal inputs for the robot’s whole body control (including both the mobile base and the robotic arm). To validate the effectiveness of the proposed approach, we provide theoretical derivation for the disturbance-aware DARE and perform simulated experiments and hardware demos using a Fetch robot under varying conditions, including different trajectories and different levels of disturbances. The results reveal that our proposed approach outperforms baseline algorithms.
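A minimal sketch of the pose-selection step described above, assuming each candidate arm pose comes with a linearized whole-body model (A, B); the cost proxy x0' P x0 and the omission of the disturbance-aware terms are simplifying assumptions, not the paper's exact DARE.

import numpy as np
from scipy.linalg import solve_discrete_are

def select_pose(candidates, x0, Q, R):
    """candidates: list of (pose, A, B) tuples with pose-dependent
    linearizations. Q, R: state and input cost weights. Returns the
    candidate pose minimizing the Riccati cost-to-go from state x0."""
    best_pose, best_cost = None, np.inf
    for pose, A, B in candidates:
        P = solve_discrete_are(A, B, Q, R)   # steady-state Riccati solution
        cost = float(x0 @ P @ x0)            # estimated control cost
        if cost < best_cost:
            best_pose, best_cost = pose, cost
    return best_pose, best_cost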
|
|
15:15-15:20, Paper TuCT27.4 | |
Addressing Dimensional Scaling in Reinforcement Learning for Symbolic Locomotion Policies through Leveraging Inductive Priors |
|
Fransen, Rogier | University of Surrey |
Bowden, Richard | University of Surrey |
Hadfield, Simon | University of Surrey |
Keywords: Redundant Robots, Reinforcement Learning, Legged Robots
Abstract: We explore symbolic policy optimization for various legged locomotion challenges; specifically walker environments ranging from bipedal to highly redundant systems with 128 legs. These represent a broad range of action space dimensionalities. We find that state-of-the-art symbolic policy optimization approaches struggle to scale to these higher dimensional problems, due to the need to iterate over action dimensions, and their reliance on a neural network anchor policy. We thus propose Fast Symbolic Policy (FSP) to accelerate the training of symbolic locomotion policies. This approach avoids the need to iterate over the action dimensions, and does not require a pre-trained neural network anchor. We also propose Dim-X, a method for effectively reducing the action space dimensionality using the inductive priors of legged locomotion. We demonstrate that FSP with Dim-X can learn symbolic policies, with improved scaling performance compared to the baselines, vastly exceeding that possible with previous symbolic techniques. We further show that Dim-X on its own can also be integrated into neural network policies to shorten their training time and improve scaling performance.
|
|
15:20-15:25, Paper TuCT27.5 | |
LITHE-Joint: Variable Stiffness Compliant Spherical Contact Joint in an Under-Actuated System |
|
Punapanont, Sanpoom | Vidyasirimedhi Institute of Science and Technology |
Janna, Run | Vidyasirimedhi Institute of Science and Technology |
Sison, Harn | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Manoonpong, Poramate | Vidyasirimedhi Institute of Science and Technology (VISTEC) |
Keywords: Compliant Joints and Mechanisms, Actuation and Joint Mechanisms, Legged Robots
Abstract: The concept of morphological computation (MC) is applied in robotics to improve design and reduce the complexity of control systems. MC exploits mechanical intelligence, in which stiffness properties play an important role as constraints that enhance system flexibility and store elastic energy, thereby reducing the number of required actuators. Following the MC principle, this work proposes the LITHE-joint: a variable-stiffness compliant spherical contact joint in an under-actuated system. This compact design for a 2-degree-of-freedom (DOF) compliant spherical contact joint with controllable stiffness uses a pneumatic artificial muscle (PAM). The joint requires only one PAM actuator to control stiffness in a 2-DOF system, achieving a stiffness of up to 0.38 Nm/rad with a bandwidth of 0.1967 Nm/rad. With its variable stiffness, the joint can adapt its bending behavior, enabling energy redistribution of torque and angle. The modulation of torque and bending angle is governed by the joint stiffness and the passive body dynamics. The benefits of the passive, compliant joint with variable stiffness are demonstrated by using it as the spine of an under-actuated robot, controlling the passive bending of the body and the robot's walking direction through the adjustable stiffness.
|
|
15:25-15:30, Paper TuCT27.6 | |
Morphological Computation in Robotic Hopping: The Role of Monoarticular and Biarticular Muscle Configurations |
|
Murcia i Matute, Marc | TU Darmstadt |
Mohseni, Omid | Technische Universität Darmstadt |
Seyfarth, Andre | TU Darmstadt |
Sawicki, Gregory | Georgia Institute of Technology |
Sharbafi, Maziar | Technische Universität Darmstadt |
Keywords: Compliant Joints and Mechanisms, Biologically-Inspired Robots, Actuation and Joint Mechanisms
Abstract: Human locomotion exhibits extraordinary adaptability and robustness, yet the mechanisms by which lower limbs adjust to sudden environmental disruptions remain poorly understood. To address this, we employed the bioinspired human-sized EPA-Hopper II robot to examine how lower-limb joints recover from an abrupt drop in ground height, mimicking unexpected perturbations encountered in natural settings. Our study investigates the roles of the monoarticular soleus (SOL) and biarticular gastrocnemius (GAS) muscle configurations, focusing on how their compliance influences the robot’s hopping stability. Experiments reveal that a coordinated interplay between SOL and GAS markedly improves recovery from disturbances, enhancing energy distribution and joint synchronization. Detailed kinematic and power analyses show that GAS facilitates energy transfer across joints, while SOL’s spring-like properties support rapid recovery. These results highlight how bioinspired muscle arrangements enable robust locomotion through intrinsic mechanical interactions. By leveraging a robotic platform to probe these dynamics, this work deepens our understanding of biological locomotion and informs the design of bioinspired bipedal robots and prosthetics capable of thriving in unpredictable environments.
|
|
15:30-15:35, Paper TuCT27.7 | |
Abdominal Undulation with Compliant Mechanism Improves Flight Performance of a Biomimetic Robotic Butterfly |
|
Lian, Xuyi | Zhejiang University |
Luo, MingYu | Zhejiang University |
Lin, Te | Zhejiang University |
Qian, Chen | University of Science and Technology of China |
Li, Tiefeng | Zhejiang University |
Keywords: Biologically-Inspired Robots, Compliant Joints and Mechanisms, Flexible Robotics
Abstract: This paper presents the design, modeling, and experimental validation of a biomimetic robotic butterfly (BRB) that integrates a compliant mechanism to achieve coupled wing-abdomen motion. Drawing inspiration from the natural flight dynamics of butterflies, a theoretical model is developed to investigate the impact of abdominal undulation on flight performance. To validate the model, motion capture experiments are conducted on three configurations: a BRB without an abdomen, with a fixed abdomen, and with an oscillating abdomen. The results demonstrate that abdominal undulation enhances lift generation, extends flight duration, and stabilizes pitch oscillations, thereby improving overall flight efficiency. These findings underscore the significance of abdomen-wing interaction in flapping-wing aerial vehicles (FWAVs) and lay the groundwork for future advancements in energy-efficient biomimetic flight designs.
|
|
15:35-15:40, Paper TuCT27.8 | |
The Parallel Pneumatic Artificial Muscle Platform Based on RBF Neural Network Compensation |
|
Li, Jun | Chinese Academy of Sciences |
Dai, YuanQuan | Haixi Institutes, Chinese Academy of Sciences |
Zhang, Dongdong | Fuzhou University |
Yu, Ruidong | Fuzhou University |
Zi, MingKang | Haixi Institutes, Chinese Academy of Sciences |
Liu, Shuaicheng | Haixi Institutes, Chinese Academy of Sciences |
Xie, Yinhui | CAS |
Keywords: Biologically-Inspired Robots, Motion Control, Compliant Joints and Mechanisms
Abstract: A two-degree-of-freedom parallel mechanism control system based on an adaptive learning rate and a radial basis function (RBF) neural network controller is studied in this paper. The mechanism is composed of four pneumatic artificial muscles (PAMs), forming two pairs of antagonistic single-degree-of-freedom joints that enable two-degree-of-freedom motion along the X and Y axes. The core objective of the system is to automatically output the air pressure values for the X and Y axes from the desired input angle, driving the joints to precisely reach the specified angle. We conduct dynamic modeling of the two pairs of driving joints composed of four pneumatic muscles and analyze the motion characteristics of the system. An RBF neural network is then employed to approximate system modeling errors and external disturbances, combined with a PID controller to optimize the driving performance of the pneumatic muscles. The stability of the controller is proven by constructing a Lyapunov function, ensuring that the system remains stable during dynamic changes. Finally, simulation experiments conducted in MATLAB/Simulink verify the effectiveness of the proposed algorithm. The experimental results demonstrate that the control algorithm enables the actual angle to track the desired angle in real time with high accuracy and stability. This research provides a new solution for the precise control of pneumatic-muscle-driven parallel joint systems, with broad application prospects, effectively addressing the limitations of traditional PAM control methods, which require precise modeling and suffer from poor robustness.
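A minimal sketch of RBF-network disturbance compensation wrapped around a PID loop, as outlined above; the Gaussian features, centers, and adaptation rate are illustrative assumptions rather than the paper's design.

import numpy as np

class RBFCompensator:
    def __init__(self, centers, width=1.0, gamma=0.05):
        self.c = np.asarray(centers, float)   # (N, d) RBF centers
        self.width = width                    # shared Gaussian width
        self.w = np.zeros(len(self.c))        # adaptive output weights
        self.gamma = gamma                    # adaptation rate

    def features(self, x):
        d2 = np.sum((self.c - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width**2))

    def output(self, x):
        """Estimated lumped disturbance, added to the PID command."""
        return self.w @ self.features(x)

    def adapt(self, x, error, dt):
        # Gradient-style weight update driven by the tracking error
        self.w += self.gamma * error * self.features(x) * dt

In use, the pressure command would be the PID term plus output(x), with adapt(x, error, dt) called once per control step so the network absorbs the unmodeled PAM dynamics.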
|
|
TuCT28 |
104 |
Marine Robotics 3 |
Regular Session |
|
15:00-15:05, Paper TuCT28.1 | |
Design and Motion Performance of a Novel Variable-Area Tail for Underwater Gliders (I) |
|
Wang, Gongbo | Tianjin University |
Wang, Yanhui | Tianjin University |
Yang, Ming | Tianjin University |
Yang, Shaoqiong | Tianjin University |
Keywords: Marine Robotics, Dynamics, Performance Evaluation and Benchmarking
Abstract: There is a mutually exclusive relationship between the course stability and heading maneuverability of underwater gliders (UGs), making it challenging to improve the critical motion performance in the design of UGs. An effective measure to solve this issue is to optimize and modify the hydrodynamic shape of UGs. For better motion performance of UGs, this study proposes a novel variable-area tail (VAT) by introducing a multi-axis variable structure, which can adapt to the ocean currents automatically. We optimize the core parameters of the tail by approximate modeling. To evaluate the motion performance of the UG with the proposed tail, a dynamic model is established, which is verified by sea trial data. With fixed initial parameters, the novel tail improves the stability criterion of the UGs by about 6.88% in terms of course stability and reduces the turning radius by about 75.61% regarding heading maneuverability. Our design concept and performance verification have excellent scalability, and the proposed VAT provides a valuable reference for the shape design of other underwater vehicles.
|
|
15:05-15:10, Paper TuCT28.2 | |
UVS: A Novel Underwater Vehicle with Integrated VCMS-Thrusters Hybrid Architecture for Enhanced Attitude Regulation |
|
Zhang, Suohang | Zhejiang University |
Qian, Shipang | Zhejiang University |
Liu, Ruiheng | Zhejiang University |
Wang, Lu | Zhejiang University |
Fei, Xinyu | Zhejiang University |
Chen, Yanhu | Zhejiang University |
Keywords: Marine Robotics, Engineering for Robotic Systems, Mechanism Design
Abstract: Autonomous Underwater Vehicles (AUVs) require energy-efficient and responsive attitude control for underwater operations. We present UVS, a novel underwater vehicle that combines Variable Center of Mass System (VCMS) and thrusters for hybrid attitude regulation. Through multi-objective optimization of the VCMS structure, we achieved a 5.19% larger pitch angle range while reducing space occupation by 15.72%. Pool experiments demonstrated near-linear pitch control from 17.5° to 172.5° with stable horizontal-vertical mode transitions. Our proposed collaborative control method integrates VCMS and thruster advantages, enabling rapid convergence to target attitudes with long-term stability. The results show UVS's potential for energy-efficient, wide-range attitude control in mobile ocean sensing applications.
|
|
15:10-15:15, Paper TuCT28.3 | |
Design and Implementation of a Bone-Shaped Hybrid Aerial Underwater Vehicle |
|
Bi, Yuanbo | Shanghai Jiao Tong University |
Xu, Zhuxiu | Shanghai Jiao Tong University |
Shen, Yishu | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiao Tong University |
Keywords: Marine Robotics, Field Robots, Aerial Systems: Applications
Abstract: Hybrid aerial underwater vehicles (HAUVs) have promising applications due to their capability to operate in both air and water. Balancing rapid maneuverability in both media with stability during the cross-domain phase is a challenge. In this letter, Nezha-B, a bone-shaped HAUV prototype, is proposed. Its double-layered, staggered quadrotor layout shortens the width of the vehicle. Together with the streamlined buoyancy material and lightweight marine thrusters, the underwater speed reaches 1.67 m/s. This allows Nezha-B to reconcile the rapid maneuverability of a slender underwater vehicle with the stability of a quadrotor, without additional morphing mechanisms. Meanwhile, Nezha-B weighs only 900 g and can pass through a circular gap whose diameter (124 mm) equals that of its aerial propeller, making it portable and capable of operating in narrow environments. Simulations in the air domain evaluate the responses of the attitude controller, underwater dynamic calculations demonstrate the hydrodynamic properties, and the integrated amphibious locomotion capability is verified through field tests.
|
|
15:15-15:20, Paper TuCT28.4 | |
Application of Soft Constraints on Mirror Position to Improve Robustness of Optical Target Positioning in Shallow Water |
|
Zhang, Xiangjie | Beijing Jiaotong University |
Hou, Taogang | Beijing Jiaotong University |
Qin, Hong-De | Harbin Engineering University |
Li, Haojie | Beijing Jiaotong University |
Wang, Xiangxing | Beijing Jiaotong University, School of Electronics and Informati |
Wang, Hao | Beihang University |
Cao, Yong | Northwestern Polytechnical University |
Keywords: Marine Robotics, Field Robots, Calibration and Identification
Abstract: The unique optical characteristics of the underwater environment, such as light refraction and the loss of salient features, pose a significant challenge to traditional vision sensors, especially in swarm operation scenarios where multiple autonomous underwater vehicles (AUVs) cooperate with a mother ship for positioning. To address these challenges, this study proposes the application of soft constraints on mirror position to improve the robustness of optical target positioning in shallow water. During the mapping phase, we establish and optimize the pose relationships between ArUco markers and their mirror images, thereby expanding the locatable space for AUVs. With the arrangement of ArUco markers unchanged, the number of usable markers doubles. Surface mirror-assisted positioning provides more visual features and additional computed corner points, enhancing the reliability of camera observations and improving positioning accuracy. Experimental results demonstrate that, compared to classical algorithms from the Artificial Vision Applications (A.V.A) laboratory, our method improves position accuracy by 25.8% in single-marker scenarios and by 14.7% in multi-marker scenarios. Therefore, our method provides enhanced mapping and positioning for AUVs in shallow-water areas where optical markers can be deployed. This study provides a new mapping paradigm and multi-body localization solution for optical marker-guided underwater swarm operations.
|
|
15:20-15:25, Paper TuCT28.5 | |
Safe Motion Planning and Control Using Predictive and Adaptive Barrier Methods for Autonomous Surface Vessels |
|
Gonzalez-Garcia, Alejandro | KU Leuven |
Xiao, Wei | MIT |
Wang, Wei | University of Wisconsin-Madison |
Astudillo, Alejandro | KU Leuven |
Decré, Wilm | Katholieke Universiteit Leuven |
Swevers, Jan | KU Leuven |
Ratti, Carlo | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Marine Robotics, Field Robots, Collision Avoidance
Abstract: Safe motion planning is essential for autonomous vessel operations, especially in challenging spaces such as narrow inland waterways. However, conventional motion planning approaches are often computationally intensive or overly conservative. This paper proposes a safe motion planning strategy combining Model Predictive Control (MPC) and Control Barrier Functions (CBFs). We introduce a time-varying inflated ellipse obstacle representation, where the inflation radius is adjusted depending on the relative position and attitude between the vessel and the obstacle. The proposed adaptive inflation reduces the conservativeness of the controller compared to traditional fixed-ellipsoid obstacle formulations. The MPC solution provides an approximate motion plan, and high-order CBFs ensure the vessel's safety using the varying inflation radius. Simulation and real-world experiments demonstrate that the proposed strategy enables the fully-actuated autonomous robot vessel to navigate through narrow spaces in real time and resolve potential deadlocks, all while ensuring safety.
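
To make the adaptive-inflation idea concrete, here is a minimal sketch of a discrete-time ellipse barrier with a hypothetical bearing-based inflation rule. The paper itself uses high-order CBFs coupled with MPC; the function names, gains, and the specific inflation rule below (`adaptive_inflation`, `gamma`, `r_min`, `r_max`) are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def ellipse_barrier(p, obs_c, a, b, inflation):
    """h(p) >= 0 outside the inflated ellipse centered at obs_c."""
    d = p - obs_c
    return (d[0] / (a + inflation))**2 + (d[1] / (b + inflation))**2 - 1.0

def adaptive_inflation(p, heading, obs_c, r_min=0.2, r_max=1.0):
    """Hypothetical rule: inflate more when the obstacle lies ahead of the vessel."""
    d = obs_c - p
    bearing = np.arctan2(d[1], d[0]) - heading
    ahead = max(0.0, np.cos(bearing))          # 1 if dead ahead, 0 if abeam/behind
    return r_min + (r_max - r_min) * ahead

def cbf_satisfied(p, v, heading, obs_c, a, b, dt=0.1, gamma=0.5):
    """Discrete-time first-order CBF condition: h(x_next) - h(x) >= -gamma * h(x)."""
    infl = adaptive_inflation(p, heading, obs_c)
    h_now = ellipse_barrier(p, obs_c, a, b, infl)
    h_next = ellipse_barrier(p + v * dt, obs_c, a, b, infl)
    return h_next - h_now >= -gamma * h_now
```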
|
|
15:25-15:30, Paper TuCT28.6 | |
EDeformNet: Estimating Fishing Net Deformations from Sparse Observations |
|
Wijegunawardana, Isira Damsith | University of Technology Sydney |
Valls Miro, Jaime | University of Technology Sydney |
Quincoces, Inaki | AZTI Foundation |
Zhao, Liang | The University of Edinburgh |
Huang, Shoudong | University of Technology, Sydney |
Keywords: Marine Robotics, Field Robots, Computational Geometry
Abstract: This paper introduces EDeformNet, a novel method for real-time 3D reconstruction of fishing nets using sparse positional measurements. Currently, net deployment during large-scale fishing operations is challenging, as the submerged lattice deformations that occur in response to various environmental factors are not visible to the vessel operator. EDeformNet extends Embedded Deformation Graphs (EDGs), a commonly used technique in template-based non-rigid 3D reconstruction that allows control of embedded spaces through sparse control-point correspondences. These can be suitably derived from acoustic tracking beacons attached to the net. EDeformNet enhances the standard EDG optimization scheme by including constraints that preserve surface normals at control points and distances between vertices in the template mesh. These improvements are shown to enable an accurate representation of the complex deformations and movements typical of purse seine nets, the fishing technique on which the algorithm has been tested, which standard EDG is unable to attain. Moreover, EDeformNet also introduces a tailored strategy that dynamically adjusts the net template according to the known length of the deployed portion of the fishing net. This approach reconstructs exclusively the submerged portion of the fishing net, avoiding extraneous data from above-water sections and enhancing accuracy under realistic fishing conditions. The proposed method is validated using realistic 3D physics simulations in Blender, where quantifiable comparisons demonstrate that EDeformNet effectively captures the spatial dynamics of purse seining. Compared to standard EDG, EDeformNet achieves superior performance, with at least a 25% improvement across the array of challenging temporal scenarios studied.
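
As a rough illustration of the kind of distance-preserving constraint described (the surface-normal term is analogous), a minimal penalty over template edges might look as follows. The variable names and the simple quadratic form are assumptions for illustration, not EDeformNet's actual objective.

```python
import numpy as np

def edge_length_penalty(verts, edges, rest_len):
    """Penalty keeping template edge lengths near their rest values.
    verts: (V, 3) deformed vertex positions,
    edges: (E, 2) integer vertex-index pairs,
    rest_len: (E,) edge lengths of the undeformed template."""
    vi, vj = verts[edges[:, 0]], verts[edges[:, 1]]
    lengths = np.linalg.norm(vi - vj, axis=1)
    return np.mean((lengths - rest_len)**2)
```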
|
|
15:30-15:35, Paper TuCT28.7 | |
Experimental Open-Source Framework for Underwater Pick-And-Place Studies with Lightweight UVMS - an Extensive Quantitative Analysis |
|
Bauschmann, Nathalie | Hamburg University of Technology |
Lenz, Vincent | Hamburg University of Technology |
Seifried, Robert | Hamburg University of Technology |
Duecker, Daniel Andre | Technical University of Munich (TUM) |
Keywords: Marine Robotics, Field Robots, Mobile Manipulation
Abstract: The rise of lightweight, low-cost underwater vehicle-manipulator systems (UVMS) has made autonomous underwater manipulation increasingly accessible. Yet, most current research remains limited to isolated tasks, such as trajectory tracking or compensation of unknown payloads. Detailed experimental analyses that go beyond a proof-of-concept are particularly rare. We present a comprehensive open-source software framework for fully automated pick-and-place studies. We build upon our previous work on a task-priority control framework and extend it to enable fully autonomous manipulation. This includes a high-level decision-making process to coordinate the pick-and-place sequence and a grasp detection method to verify the successful pick-up of the object. We demonstrate this framework on the widely-used platform of a BlueROV2 and an Alpha5 manipulator. Extensive quantitative experimental studies (100+ trials) show the picking and placing to be highly accurate, with mean position errors of <5mm and <10mm, respectively. We additionally validate our grasp detection approach and analyze trajectory tracking sensitivity to varying payloads and speeds. These results provide a baseline of what accuracy is currently achievable with state-of-the-art lightweight hardware under ideal research conditions. The code is available at https://github.com/HippoCampusRobotics/uvms.
|
|
15:35-15:40, Paper TuCT28.8 | |
Design and Development of the MeCO Open-Source Autonomous Underwater Vehicle |
|
Widhalm, David | University of Minnesota |
Ohnsted, Cory | University of Minnesota |
Knutson, Corey | University of Minnesota - Twin Cities |
Kutzke, Demetrious T. | University of Minnesota - Twin Cities |
Singh, Sakshi | University of Minnesota |
Mukherjee, Rishi | University of Minnesota |
Schwidder, Grant | University of Minnesota |
Wu, Ying-Kun | University of Minnesota |
Sattar, Junaed | University of Minnesota |
Keywords: Marine Robotics, Field Robots
Abstract: We present MeCO, the Medium Cost Open-source autonomous underwater vehicle (AUV), a versatile autonomous vehicle designed to support research and development in underwater human-robot interaction (UHRI) and marine robotics in general. An inexpensive platform to build compared to similarly-capable AUVs, the MeCO design and software are released under open-source licenses, making it a cost-effective, extensible, and open platform. It is equipped with UHRI-focused systems, such as front and side facing displays, light-based communication devices, a transducer for acoustic interaction, and stereo vision, in addition to typical AUV sensing and actuation components. Additionally, MeCO is capable of real-time deep learning inference using the latest edge computing devices, while maintaining low-latency, closed-loop control through high-performance microcontrollers. MeCO is designed from the ground up for modularity in internal electronics, external payloads, and software architecture, exploiting open-source robotics and containerization tools. We demonstrate the diverse capabilities of MeCO through simulated, closed-water, and open-water experiments. All resources necessary to build and run MeCO, including software and hardware design, have been made publicly available.
|
|
TuCT29 |
105 |
SLAM 3 |
Regular Session |
|
15:00-15:05, Paper TuCT29.1 | |
MeGS-SLAM: Memory Efficient Gaussian Splatting SLAM with Graph Signal Processing |
|
Zhang, Sude | Sun Yat-Sen University |
Zhang, Zhiyong | Sun Yat-Sen University |
Keywords: SLAM, Mapping, RGB-D Perception
Abstract: Recent 3D Gaussian Splatting simultaneous localization and mapping (3DGS-SLAM) has achieved high-fidelity reconstruction from RGB-D images. However, the input lacks edge information about the scene, so the Gaussians cannot accurately model edges and artifacts appear at object boundaries. Moreover, 3DGS-SLAM relies on a point-based representation that produces massive numbers of Gaussians to obtain a detailed map, causing low rendering speed and high storage cost. To overcome these shortcomings, we propose a novel SLAM framework with an edge-prior constraint: edge attributes are added to the Gaussians to express edge information, and an edge loss is further introduced so that the Gaussians accurately reconstruct edges and suppress artifacts. Furthermore, we propose graph signal processing over local Gaussians to establish relationships among irregularly distributed Gaussians and to extract geometric features from the Gaussian scene representation, which are used to efficiently remove redundant Gaussians without sacrificing tracking or reconstruction performance. Experiments on synthetic and real-world datasets show that our method achieves over 2× compression in memory usage and increases rendering speed by nearly 250% while maintaining tracking and mapping performance. Additional information can be found on our project page: yoona12.github.io/MeGSSLAM.github.io.
|
|
15:05-15:10, Paper TuCT29.2 | |
Embracing Dynamics: Dynamics-Aware 4D Gaussian Splatting SLAM |
|
Sun, Zhicong | Hong Kong Polytechnic University |
Lo, Jacqueline Tsz Yin | The Hong Kong Polytechnic University |
Hu, Jinxing | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: SLAM, Mapping, Visual Tracking
Abstract: Simultaneous localization and mapping (SLAM) technology has recently achieved photorealistic mapping capabilities thanks to the real-time, high-fidelity rendering enabled by 3D Gaussian Splatting (3DGS). However, due to the static representation of scenes, current 3DGS-based SLAM encounters issues with pose drift and failure to reconstruct accurate maps in dynamic environments. To address this problem, we present D4DGS-SLAM, the first SLAM method based on 4DGS map representation for dynamic environments. By incorporating the temporal dimension into scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes. Utilizing the dynamics-aware InfoModule, we can obtain the dynamics, visibility, and reliability of scene points, and filter out unstable dynamic points for tracking accordingly. When optimizing Gaussian points, we apply different isotropic regularization terms to Gaussians with varying dynamic characteristics. Experimental results on real-world dynamic scene datasets demonstrate that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.
|
|
15:10-15:15, Paper TuCT29.3 | |
Affine-SLAM: A Closed-Form Solution to Landmark-SLAM Via Affine Relaxation
|
Yang, Shaoran | Tongji University |
Dong, Yi | Nanjing University of Science and Technology |
Keywords: SLAM, Mapping, Localization
Abstract: We propose a closed-form solution to the landmark simultaneous localization and mapping (landmark-SLAM) problem. The core idea is to extend the recent advancement in the generalized Procrustes analysis (GPA) research by incorporating an affine-relaxed odometry term. We show that the resulting affine relaxed landmark-SLAM formulation, termed affine-SLAM, can be solved globally in closed-form. Through numerical experiments, we demonstrate that the affine-SLAM solution is rather close to the optimal solution of the standard nonlinear least squares (NLS) optimization, and thus can be used either as a stand-alone approximate solution or as a high-quality initialization for NLS solvers.
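
The paper's GPA-based formulation with an affine-relaxed odometry term is not reproduced here, but the reason affine relaxation admits closed-form solutions can be seen in a plain least-squares affine registration between corresponding point sets, sketched below. The function name `fit_affine` and the setup are illustrative assumptions.

```python
import numpy as np

def fit_affine(P, Q):
    """Closed-form least-squares affine map Q ≈ A @ p + t for each row p of P.
    P, Q: (n, d) corresponding point sets.
    Because the affine map is linear in its unknowns, the problem reduces to
    a linear least-squares solve with a closed-form solution."""
    n, d = P.shape
    X = np.hstack([P, np.ones((n, 1))])        # homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, Q, rcond=None)  # (d+1, d) stacked [A^T; t]
    A, t = M[:d].T, M[d]
    return A, t
```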
|
|
15:15-15:20, Paper TuCT29.4 | |
Intensity-Augmented LiDAR-Visual-Inertial Odometry and Meshing |
|
Hua, YunFeng | ZheJiang University |
Liu, QinYu | Shining 3d Tech Co., Ltd |
Lin, Zhong Wei | Shining 3d Tech Co., Ltd |
Gong, Xiao | Shining 3d Tech Co., Ltd |
Zhao, Bintao | Shining 3D Tech Co., Ltd |
Zhang, Jian | Shining 3D Tech Co., Ltd |
Jiang, Tengfei | Shining3d Tech Co., Ltd |
Xu, Weiwei | Zhejiang University |
Keywords: SLAM, Mapping, Computer Vision for Automation
Abstract: This paper presents a tightly-coupled LiDAR-Visual-Inertial Odometry (LIVO) system that integrates both LIO and VIO subsystems. The system jointly estimates the state by fusing LiDAR or visual data with Inertial Measurement Units (IMUs). It employs point-to-mesh tracking to optimize LiDAR poses and leverages intensity information from LiDAR point clouds to refine camera pose estimation. The optimized camera pose, derived from VIO, plays a crucial role in texture mapping and 3D geometry synthesis (3DGS) rendering. Our experiments demonstrate a significant improvement in average Peak Signal-to-Noise Ratio (PSNR) compared to existing methods, including R3LIVE, SR-LIVO, and FAST-LIVO. Furthermore, the system features a real-time mapping module implemented on the GPU, utilizing Truncated Signed Distance Function (TSDF) fields for global map maintenance and the Marching Cubes algorithm for mesh extraction. This approach ensures rapid and precise tracking and reconstruction capabilities. Additionally, our system supports real-time remeshing of the global map upon detecting loop closures, thereby enhancing the robustness and accuracy of the overall SLAM process.
|
|
15:20-15:25, Paper TuCT29.5 | |
FGS-SLAM: Fourier-Based Gaussian Splatting for Real-Time SLAM with Sparse and Dense Map Fusion |
|
Xu, Yansong | University of Chinese Academy of Sciences |
Li, Junlin | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Wei | State Key Laboratory of Robotics, Shenyang Institute of Automati |
Chen, Siyu | University of Chinese Academy of Sciences |
Zhang, Shengyong | Chinese Academy of Sciences |
Leng, Yuquan | Harbin Institute of Technology (Shenzhen) |
Zhou, Weijia | Shenyang Institute of Automation Chinese Academy of Sciences |
Keywords: SLAM, Mapping, Visual Tracking
Abstract: 3D gaussian splatting has advanced simultaneous localization and mapping (SLAM) technology by enabling real-time positioning and the construction of high-fidelity maps. However, the uncertainty in gaussian position and initialization parameters introduces challenges, often requiring extensive iterative convergence and resulting in redundant or insufficient gaussian representations. To address this, we introduce a novel adaptive densification method based on Fourier frequency domain analysis to establish gaussian priors for rapid convergence. Additionally, we propose constructing independent and unified sparse and dense maps, where a sparse map supports efficient tracking via Generalized Iterative Closest Point (GICP) and a dense map creates high-fidelity visual representations. This is the first SLAM system leveraging frequency domain analysis to achieve high-quality gaussian mapping in real-time. Experimental results demonstrate an average frame rate of 36 FPS on Replica and TUM RGB-D datasets, achieving competitive accuracy in both localization and mapping. The source code is publicly available at https://github.com/3DV-Coder/FGS-SLAM.
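
As a loose sketch of how a frequency-domain criterion could steer densification (the paper's actual prior construction is not reproduced here), one might score image patches by their high-frequency spectral energy and seed more Gaussians where detail is high. The `cutoff` value and the seeding rule below are assumptions for illustration.

```python
import numpy as np

def high_freq_ratio(patch, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff (0..0.5).
    patch: 2-D grayscale array."""
    F = np.fft.fftshift(np.fft.fft2(patch))
    mag2 = np.abs(F)**2
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)  # normalized radius
    return mag2[r > cutoff].sum() / mag2.sum()

def num_gaussians(patch, base=4, extra=28):
    """Hypothetical rule: seed more Gaussians in detail-rich patches."""
    return base + int(extra * high_freq_ratio(patch))
```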
|
|
15:25-15:30, Paper TuCT29.6 | |
KISS-SLAM: A Simple, Robust, and Accurate 3D LiDAR SLAM System with Enhanced Generalization Capabilities |
|
Guadagnino, Tiziano | University of Bonn |
Mersch, Benedikt | University of Bonn |
Gupta, Saurabh | University of Bonn |
Vizzo, Ignacio | Dexory |
Grisetti, Giorgio | Sapienza University of Rome |
Stachniss, Cyrill | University of Bonn |
Keywords: SLAM, Mapping, Localization
Abstract: Robust and accurate localization and mapping of an environment using laser scanners, so-called LiDAR SLAM, is essential to many robotic applications. Early 3D LiDAR SLAM methods often exploited additional information from IMU or GNSS sensors to enhance localization accuracy and mitigate drift. Later, advanced systems further improved the estimation at the cost of a higher runtime and complexity. This paper explores the limits of what can be achieved with a LiDAR-only SLAM approach while following the “Keep It Small and Simple” (KISS) principle. By leveraging this minimalist design principle, our system, KISS-SLAM, achieves state-of-the-art performance in pose accuracy while requiring little to no parameter tuning for deployment across diverse environments, sensors, and motion profiles. We follow best practices in graph-based SLAM and build upon LiDAR odometry to compute the relative motion between scans and construct local maps of the environment. To correct drift, we match local maps and optimize the trajectory in a pose graph optimization step. The experimental results demonstrate that this design achieves competitive performance while reducing complexity and reliance on additional sensor modalities. By prioritizing simplicity, this work provides a new strong baseline for LiDAR-only SLAM and a high-performing starting point for future research. Furthermore, our pipeline builds consistent maps that can be used directly for downstream tasks like navigation. Our open-source system operates faster than the sensor frame rate in all presented datasets and is designed for real-world scenarios.
|
|
15:30-15:35, Paper TuCT29.7 | |
SDF-Guided Keyframe Selection: Novel Boost for NeRF SLAM Loop Closure |
|
Hui, Ma | Beijing Normal-Hong Kong Baptist University(BNBU), Shenzhen Inst |
Yu, Liu | Shenzhen Institute of Advanced Technology Chinese Academy of Sci |
Cheng, Jun | Shenzhen Institutes of Advanced Technology |
Keywords: SLAM, Mapping, View Planning for SLAM
Abstract: In the domain of Simultaneous Localization and Mapping (SLAM), loop closure is a linchpin for achieving accurate and consistent 3D environment mapping. However, the process is hindered by abrupt lighting changes and motion blur. These elements introduce uncertainties and inaccuracies into the data captured by sensors, severely undermining the system's robustness. To address this critical challenge, we present a novel SDF-guided keyframe selection algorithm tailored for loop closure. Our approach capitalizes on the geometric insights provided by the Signed Distance Function (SDF) to meticulously choose keyframes, effectively mitigating the impact of noisy data. By doing so, we enhance the reliability of loop closure, refine the accuracy of 3D map reconstructions, and fortify the overall stability of the system. Our algorithm's efficacy is substantiated through comprehensive experiments on datasets such as Replica, ScanNet, and TUM RGB-D. Notably, it can be easily integrated as a plug-and-play module into diverse existing methods, enhancing their performance across different scenarios. Real-world trials using a hand-held LeTMC-520 camera for indoor scene reconstruction further validate its practicality and effectiveness.
|
|
15:35-15:40, Paper TuCT29.8 | |
VoxEKF-RIO: A 4D Radar Inertial Odometry Based on Incremental Voxel Map and Iterated Kalman Filter |
|
Shen, Jiawei | Nankai University |
Shen, Chenyu | Nankai University |
Deng, Zishun | Nankai University |
Lin, Wanbiao | Nankai University |
Shi, Bohan | Nankai University |
Sun, Lei | Nankai University |
Keywords: SLAM, Localization, Mapping
Abstract: 4D mmWave radar provides point clouds with range, azimuth, elevation, and Doppler velocity, and operates reliably in severe weather conditions. However, due to its wavelength characteristics, the noisy and sparse point clouds that 4D radar collects pose great challenges for SLAM research. In this paper, we propose VoxEKF-RIO, a 4D radar inertial odometry. VoxEKF-RIO filters out noisy points and estimates ego-velocity through a preprocessing module, and maintains an incremental voxel map to represent probabilistic models of the environment. To improve accuracy, a reliable scan-to-submap matching method is designed based on the voxel map, using a point filter to obtain valid points with reliable matches and adopting a distribution-to-distribution matching distance. An iterated Kalman filter is used to fuse radar velocity, point cloud registration, and IMU data to estimate the platform's motion. Experiments on publicly available 4D radar datasets demonstrate the reliability and high accuracy of VoxEKF-RIO. Ablation studies reveal the benefit of the voxel map in describing environment characteristics and of the reliable matching method.
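
Ego-velocity estimation from Doppler returns is a standard building block for radar odometry: each static target's radial velocity equals the negative projection of the sensor velocity onto its direction. A minimal RANSAC-plus-least-squares sketch is shown below; this is a generic formulation, not VoxEKF-RIO's specific preprocessing module, and the thresholds are placeholders.

```python
import numpy as np

def ego_velocity(dirs, v_rad, iters=50, tol=0.2):
    """Estimate sensor velocity from per-point Doppler readings.
    dirs: (n, 3) unit direction vectors toward targets,
    v_rad: (n,) measured radial velocities.
    Static targets satisfy v_rad ≈ -dirs @ v_ego; moving targets are outliers."""
    n = len(v_rad)
    best_inliers = np.zeros(n, dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)          # minimal sample
        v, *_ = np.linalg.lstsq(-dirs[idx], v_rad[idx], rcond=None)
        inliers = np.abs(-dirs @ v - v_rad) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    v, *_ = np.linalg.lstsq(-dirs[best_inliers], v_rad[best_inliers], rcond=None)
    return v, best_inliers
```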
|
|
TuCT30 |
106 |
Aerial Systems |
Regular Session |
|
15:00-15:05, Paper TuCT30.1 | |
Online Motion Planning for Quadrotor Multi-Point Navigation Using Efficient Imitation Learning-Based Strategy |
|
Zhou, Jin | Zhejiang University |
Mei, Jiahao | Zhejiang University of Technology |
Zhao, Fangguo | Zhejiang University |
Chen, Jiming | Zhejiang University |
Li, Shuo | Zhejiang University |
Keywords: Motion and Path Planning, Aerial Systems: Applications
Abstract: Over the past decade, there has been a remarkable surge in utilizing quadrotors for various purposes due to their simple structure and aggressive maneuverability. One of the key challenges is online time-optimal trajectory generation and control. This paper proposes an imitation learning-based online solution to efficiently navigate the quadrotor through multiple waypoints with near-time-optimal performance. The neural networks (WN&CNets) are trained to learn the control law from a dataset generated by the time-consuming CPC algorithm and are then deployed to generate optimal control commands online to guide the quadrotor. To address the challenge of limited training data and the hover maneuver at the final waypoint, we propose a transition-phase strategy that utilizes MINCO trajectories to help the quadrotor 'jump over' the stop-and-go maneuver when switching waypoints. Our method is demonstrated in both simulation and real-world experiments, achieving a maximum speed of 5.6 m/s while navigating through 7 waypoints in a confined space of 5.5 m × 5.5 m × 2.0 m. The results show that, with a slight loss in optimality, the WN&CNets significantly reduce the processing time and enable online control for multi-point flight tasks.
|
|
15:05-15:10, Paper TuCT30.2 | |
Prescribed-Time Safe Pursuit Control with Dynamic Obstacle and Occlusion Avoidance |
|
Li, Zheng | Beihang University |
Shao, Xiaodong | Beihang University |
Haoran, Li | Beihang University |
Li, Dongyu | Beihang University |
Hu, Qinglei | Beihang University |
Keywords: Motion and Path Planning, Aerial Systems: Applications, Task and Motion Planning
Abstract: Performing target tracking and surveillance in dynamic obstacle environments requires maintaining continuous visual focus on the target while ensuring collision avoidance. This paper presents a safety-critical tracking control method that ensures dynamic obstacles remain outside the camera's line of sight while simultaneously avoiding collisions between the chaser vehicle and obstacles. A novel real-time occlusion detection function is developed, and motion constraints are systematically integrated using a hybrid framework combining the artificial potential field (APF) method with an observer-based control strategy. To address time-sensitive tasks, a prescribed-time controller (PTC) based on a time-scale transformation technique is proposed. Furthermore, a prescribed-time linear extended state observer (PTESO) is proposed, featuring a simplified structure that enables rapid and accurate estimation of unknown environmental disturbances and nonlinear terms. Finally, the effectiveness of the proposed method is verified via simulation in a simplified physical scenario.
|
|
15:10-15:15, Paper TuCT30.3 | |
Parameterized Motion Planning for Aerial Manipulators in Contact with Unstructured Surfaces |
|
Zhang, Zhixing | Hunan University |
Zhong, Hang | Hunan University |
Lin, Chaoquan | Hunan University |
Wang, Weizheng | Hunan University |
Hua, Hean | Hunan University |
Zhang, Hui | Hunan University |
Wang, Yaonan | Hunan University |
Keywords: Motion and Path Planning, Aerial Systems: Applications, Manipulation Planning
Abstract: Motion planning for continuous contact-based aerial manipulators on complex unstructured surfaces remains a substantial challenge due to the sophisticated topology of unstructured surfaces. While direct planning in the high-dimensional configuration-space manifolds faces efficiency limitations, simplified planning in the parametric space sacrifices trajectory quality. Therefore, this paper proposes a sampling-based motion planning method, namely the parameter-configuration space fast marching tree (PCS-FMT*), which integrates both configuration- and parameter-space information. The proposed PCS-FMT* introduces a reparameterization strategy that compresses the planning space into a low-dimensional parameter manifold while preserving metric consistency with the original configuration space. Thus, PCS-FMT* can efficiently plan in the parameter space and optimize the motion trajectory. Simulations on challenging unstructured surfaces validate the effectiveness of PCS-FMT* for aerial manipulators in contact with unstructured surfaces.
|
|
15:15-15:20, Paper TuCT30.4 | |
Uncertainty-Aware Multi-Robot Flocking Via Learned State Estimation and Control Barrier Functions |
|
Catellani, Mattia | University of Modena and Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Multi-Robot Systems, Aerial Systems: Perception and Autonomy, Optimization and Optimal Control
Abstract: Information exchange is crucial for optimal coordination of robots, but a communication link may not always be available among agents to share data. For this reason, this paper presents a decentralized solution for flocking control, leveraging state and uncertainty estimation of undetected robots. A neural network is trained to mimic a state estimator while also providing information about the uncertainty of the estimate. This uncertainty is used to weigh the contribution of the estimate when taking coordination actions. Using Control Barrier Functions and Control Lyapunov Functions, we define an optimization problem to find an optimal control input that reproduces collective motion observed in nature. We evaluate both the learned estimator and the control strategy with extensive simulations.
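
A minimal CLF-CBF quadratic program for a single-integrator agent, of the general form the abstract describes, might be written as follows using cvxpy. The weights, the single-integrator dynamics, and the slack handling are assumptions for illustration, not the paper's exact problem.

```python
import numpy as np
import cvxpy as cp

def clf_cbf_control(x, x_goal, x_obs, r_safe, gamma=1.0, alpha=1.0):
    """CLF-CBF QP for single-integrator dynamics x_dot = u (2-D sketch)."""
    u = cp.Variable(2)
    delta = cp.Variable(nonneg=True)            # CLF slack (safety stays hard)
    V = np.sum((x - x_goal)**2)                 # CLF: squared goal distance
    gradV = 2 * (x - x_goal)
    h = np.sum((x - x_obs)**2) - r_safe**2      # CBF: squared clearance
    gradh = 2 * (x - x_obs)
    cons = [gradV @ u <= -gamma * V + delta,    # CLF decrease (softened)
            gradh @ u >= -alpha * h]            # CBF safety (hard)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + 10 * delta), cons)
    prob.solve()
    return u.value
```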
|
|
15:20-15:25, Paper TuCT30.5 | |
PI-WAN: A Physics-Informed Wind-Adaptive Network for Quadrotor Dynamics Prediction in Unknown Environments |
|
Wang, Mengyun | National University of Defense Technology |
Wang, Bo | National University of Defense Technology |
Niu, Yifeng | National University of Defense Technology |
Wang, Chang | National University of Defense Technology |
Keywords: Model Learning for Control, Machine Learning for Robot Control, Aerial Systems: Mechanics and Control
Abstract: Accurate dynamics modeling is essential for quadrotors to achieve precise trajectory tracking in various applications. Traditional physical knowledge-driven modeling methods face substantial limitations in unknown environments characterized by variable payloads, wind disturbances, and external perturbations. On the other hand, data-driven modeling methods suffer from poor generalization when handling out-of-distribution (OoD) data, restricting their effectiveness in unknown scenarios. To address these challenges, we introduce the Physics-Informed Wind-Adaptive Network (PI-WAN), which combines knowledge-driven and data-driven modeling methods by embedding physical constraints directly into the training process for robust quadrotor dynamics learning. Specifically, PI-WAN employs a Temporal Convolutional Network (TCN) architecture that efficiently captures temporal dependencies from historical flight data, while a physics-informed loss function applies physical principles to improve model generalization and robustness across previously unseen conditions. By incorporating real-time prediction results into a model predictive control (MPC) framework, we achieve improvements in closed-loop tracking performance. Comprehensive simulations and real-world flight experiments demonstrate that our approach outperforms baseline methods in terms of prediction accuracy, tracking precision, and robustness to unknown environments.
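
A physics-informed training loss of the general flavor described, combining a data term with a physics-consistency residual, could be sketched as below. The model signature, the state layout (position then velocity), and the particular kinematic residual are assumptions for illustration, not PI-WAN's actual constraints.

```python
import torch

def physics_informed_loss(model, x_hist, u, x_next, dt, lam=0.1):
    """Data loss plus a physics-residual loss (a minimal sketch).
    x_hist: (B, T, dx) state history, u: (B, du) input, x_next: (B, dx) target.
    Assumes states are laid out as [position(3), velocity(3), ...]."""
    pred = model(x_hist, u)                             # predicted next state
    data_loss = torch.mean((pred - x_next)**2)
    # Illustrative physics prior: finite-difference position change should
    # agree with the predicted velocity (kinematic consistency).
    v_fd = (pred[:, :3] - x_hist[:, -1, :3]) / dt
    phys_loss = torch.mean((v_fd - pred[:, 3:6])**2)
    return data_loss + lam * phys_loss
```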
|
|
15:25-15:30, Paper TuCT30.6 | |
Design and Control of an Actively Morphing Quadrotor with Vertically Foldable Arms |
|
Yeh, Tingyu | Shanghai Jiaotong University |
Xu, Mengxin | Shanghai Jiao Tong University |
Han, Lijun | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: In this work, we propose a novel quadrotor design capable of folding its arms vertically to grasp objects and navigate through narrow spaces. The transformation is actively controlled by a central servomotor, gears, and racks. The arms connect the motor bases to the central frame, forming a parallelogram structure that ensures the propellers maintain a constant orientation during morphing. In its stretched state, the quadrotor resembles a conventional design; when contracted, it functions as a gripper, with grasping components emerging from the motor bases. To mitigate disturbances during transformation and payload grasping, we employ an adaptive sliding mode controller with a disturbance observer. When fully folded, the quadrotor frame shrinks to 67% of its original size. The control performance and versatility of the morphing quadrotor are validated through real-world experiments.
|
|
15:30-15:35, Paper TuCT30.7 | |
TRACER: Thrust Auto-Calibration and Ground Effect Estimation Using Onboard Force Sensitive Resistor Array for Multirotors |
|
Lou, Baichuan | Nanyang Technological University |
Deng, Lingxiao | Nanyang Technological University |
Ji, Yuan | Nanyang Technological University |
Yanxin, Zhou | Nanyang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Aerial Systems: Perception and Autonomy
Abstract: Auto-calibration of the rotor thrust coefficient and estimation of ground effect are both challenging aspects of multirotor dynamics control and planning. Conventional approaches address these issues separately and typically rely on experimental rigs for bench testing. In this paper, we propose a low-cost onboard sensor array and a well-designed unified algorithm to enable fast auto-calibration of the rotor thrust coefficient alongside ground effect estimation (TRACER). Our sensor array consists of four force-sensitive resistors compactly placed under the quadrotor landing gear to measure contact force during the lift-off phase, capturing changes in thrust. The joint calibration and estimation problem is formulated to rapidly decouple ground effect influence from free-air rotor thrust based on sensor readings. Furthermore, our approach is adaptable to various flight control input formats (duty cycle, rotor throttle, or rpm), ensuring general applicability across different multirotor operations. Experimental results demonstrate that the proposed method provides reliable joint calibration and estimation of rotor thrust and ground effect in a short lift-off process, achieving less than 10% mean absolute percentage error compared to the ground truth.
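
The lift-off calibration idea can be illustrated with a simple force balance: before takeoff, total rotor thrust plus the measured contact force equals the vehicle weight, so a thrust coefficient can be fit by least squares as the command ramps up. The quadratic thrust model and the neglect of ground effect below are simplifying assumptions; TRACER's joint estimation is more complete.

```python
import numpy as np

def calibrate_thrust_coeff(contact_force, cmd, mass, g=9.81):
    """Fit k in T = k * u^2 per rotor from lift-off data (minimal sketch).
    Force balance during slow lift-off: m*g ≈ F_contact(t) + 4*k*u(t)^2,
    assuming four identical rotors and ignoring ground effect.
    contact_force: (n,) summed FSR readings, cmd: (n,) rotor command u(t)."""
    thrust_total = mass * g - contact_force            # inferred total thrust
    phi = 4.0 * cmd**2                                 # regressor for 4 rotors
    k = np.dot(phi, thrust_total) / np.dot(phi, phi)   # scalar least squares
    return k
```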
|
|
TuDT1 |
401 |
Award Finalists 4 |
Regular Session |
Co-Chair: Liu, Lu | City University of Hong Kong |
|
16:40-16:45, Paper TuDT1.1 | |
Zero-Shot Semantic Segmentation for Robots in Agriculture |
|
Chong, Yue Linn | University of Bonn |
Nunes, Lucas | University of Bonn |
Magistri, Federico | University of Bonn |
Zhong, Xingguang | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Conventional crop production, essential for providing food, feed, fuel, and fiber for our society, relies heavily on harmful herbicides to remove weeds. Instead, agricultural robots could remove weeds more sustainably. However, these robots require a generalizable perception system that can locate weeds, enabling their automatic removal. Specifically, they need to perform crop-weed semantic segmentation, which locates and distinguishes between the crop and the weed plants with pixel-level resolution. However, most existing crop-weed semantic segmentation methods are fully supervised and require expensive and labor-intensive pixel-wise labeling of the training data. To avoid the costly labeling process, we address the problem of unsupervised crop-weed segmentation in this paper. Unlike previous approaches, we leverage the idea that weeds are "weird" plants that occur less frequently and are highly variable in appearance, and we reframe the problem as an anomaly segmentation problem. We propose an approach to segment weeds as anomalous plants by categorizing plants in the feature space of a pretrained foundation model. Our approach curates a bag-of-features representation of crop features and models the manifold of crop plants as hyperspheres. During inference, it classifies vegetation segments of the image with features within this manifold as crop plants and all other plants as weeds. Our experiments show that our zero-shot anomaly segmentation method can perform crop-weed segmentation on several datasets from real crop fields.
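
A minimal stand-in for the hypersphere idea: cluster curated crop features into prototypes, set each radius from its members, and flag features outside every sphere as anomalous (weed). The clustering choice (k-means) and the radius rule below are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_crop_spheres(crop_feats, k=16):
    """Cluster curated crop features into k prototypes; each hypersphere
    radius is the farthest member distance (illustrative rule).
    crop_feats: (n, d) foundation-model features of known crop segments."""
    km = KMeans(n_clusters=k, n_init=10).fit(crop_feats)
    centers = km.cluster_centers_
    radii = np.array([
        np.linalg.norm(crop_feats[km.labels_ == i] - c, axis=1).max()
        for i, c in enumerate(centers)])
    return centers, radii

def is_weed(feat, centers, radii):
    """Anomaly rule: a segment is weed if it lies outside every crop sphere."""
    d = np.linalg.norm(centers - feat, axis=1)
    return not np.any(d <= radii)
```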
|
|
16:45-16:50, Paper TuDT1.2 | |
Scalable Outdoors Autonomous Drone Flight with Visual-Inertial SLAM and Dense Submaps Built without LiDAR |
|
Barbas Laina, Sebastián | TU Munich |
Boche, Simon | Technical University of Munich |
Papatheodorou, Sotiris | Imperial College London |
Tzoumanikas, Dimos | Imperial College London |
Schaefer, Simon | Technical University of Munich |
Chen, Hanzhi | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Aerial Systems: Perception and Autonomy, Field Robots, Robotics and Automation in Agriculture and Forestry
Abstract: Autonomous navigation is needed for several robotics applications. In this paper we present an autonomous Micro Aerial Vehicle (MAV) system which purely relies on cost-effective and light-weight passive visual and inertial sensors to perform large-scale autonomous navigation in outdoor, unstructured and cluttered environments. We leverage visual-inertial simultaneous localization and mapping (VI-SLAM) for accurate MAV state estimates and couple it with a volumetric occupancy submapping system to achieve a scalable mapping framework which can be directly used for path planning. To ensure the safety of the MAV during navigation, we also propose a novel reference trajectory anchoring scheme that deforms the reference trajectory the MAV is tracking upon state updates from the VI-SLAM system in a consistent way, even upon large state updates due to loop-closures. We thoroughly validate our system in both real and simulated forest environments and at peak velocities up to 3 m/s -- while not encountering a single collision or system failure. To the best of our knowledge, this is the first system which achieves this level of performance in such an unstructured environment using low-cost passive visual sensors and fully on-board computation, including VI-SLAM. Code available at https://github.com/ethz-mrl/mrl_navigation
|
|
16:50-16:55, Paper TuDT1.3 | |
Side Scan Sonar-Based SLAM for Autonomous Algae Farm Monitoring |
|
Valdez, Julian | KTH Royal Institute of Technology |
Torroba Balmori, Ignacio | KTH Royal Institute of Technology |
Folkesson, John | KTH |
Stenius, Ivan | KTH |
Keywords: Marine Robotics, SLAM, Robotics and Automation in Agriculture and Forestry
Abstract: The transition of seaweed farming to an alternative food source on an industrial scale relies on automating its processes through smart farming, equivalent to land agriculture. Key to this process are autonomous underwater vehicles (AUVs) via their capacity to automate crop and structural inspections. However, the current bottleneck for their deployment is ensuring safe navigation within farms, which requires an accurate, online estimate of the AUV pose and map of the infrastructure. To enable this, we propose an efficient side scan sonar-based (SSS) simultaneous localization and mapping (SLAM) framework that exploits the geometry of kelp farms via modeling structural ropes in the back-end as sequences of individual landmarks from each SSS ping detection, instead of combining detections into elongated representations. Our method outperforms state of the art solutions in hardware in the loop (HIL) experiments on a real AUV survey in a kelp farm.
|
|
16:55-17:00, Paper TuDT1.4 | |
Efficient Learning of a Unified Policy for Whole-Body Manipulation and Locomotion Skills |
|
Hou, Dianyong | Zhejiang University |
Zhu, Chengrui | Zhejiang University |
Zhang, Zhen | Zhejiang University |
Li, Zhibin (Alex) | University College London |
Guo, Chuang | Zhejiang University |
Liu, Yong | Zhejiang University |
Keywords: Legged Robots, Machine Learning for Robot Control, Kinematics
Abstract: Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities and unlocks a wide range of practical applications. This integration creates a more complex system with increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control strategies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides real-time feedback on the mapping of body postures to the manipulator's workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior control performance of this approach.
|
|
17:00-17:05, Paper TuDT1.5 | |
FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models |
|
Song, Yufan | Zhejiang University |
Zhang, Jiatao | Zhejiang University |
Gu, Zeng | University of Chinese Academy of Sciences |
Liang, QingMiao | University of Chinese Academy of Sciences |
Hu, Tuocheng | Zhejiang University |
Song, Wei | Zhejiang Lab |
Zhu, Shiqiang | Zhejiang University |
Keywords: Task Planning, AI-Enabled Robotics, AI-Based Methods
Abstract: Autonomous error correction is critical for domestic robots to achieve reliable execution of complex long-horizon tasks. Prior work has explored self-reflection in Large Language Models (LLMs) for task planning error correction; however, existing methods are constrained by inflexible self-reflection mechanisms that limit their effectiveness. Motivated by these limitations and inspired by human cognitive adaptation, we propose the Flexible Constructivism Reflection Framework (FCRF), a novel Mentor-Actor architecture that enables LLMs to perform flexible self-reflection based on task difficulty, while constructively integrating historical valuable experience with failure lessons. We evaluated FCRF on diverse domestic tasks through simulation in AlfWorld and physical deployment in the real-world environment. Experimental results demonstrate that FCRF significantly improves overall performance and self-reflection flexibility in complex long-horizon robotic tasks.
|
|
17:05-17:10, Paper TuDT1.6 | |
Multiple-Scale Augmented Reality Markers for Positioning of Robotic Micromanipulation |
|
Liang, Shuzhang | The University of Tokyo |
Rabette, Vincent | INSA Lyon |
Sugiura, Hirotaka | The University of Tokyo |
Wang, Rixin | Guangdong Key Laboratory of Precision Equipment and Manufacturin |
Amaya, Satoshi | The University of Tokyo |
Dai, Yuguo | The University of Tokyo |
Mo, Hao | The University of Tokyo |
Arai, Fumihito | The University of Tokyo |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Calibration and Identification
Abstract: This study proposes a novel strategy for cross-scale positioning in robotic micromanipulation. The strategy utilizes multiple-scale augmented reality (AR) markers to locate the robotic manipulator at different scales. The macro-marker (3.0 cm per side, 5 mm × 5 mm per square) is used to position the robot within the microscopic manipulation area. The micro-marker (2.4 mm per side, 400 μm × 400 μm per square) is used for positioning the end-effector under microscopic view. After fabricating the markers, the camera's intrinsic parameter matrix was first calibrated. Subsequently, we evaluated the detection performance of the macro- and micro-markers. Since micro-markers are observed differently under the microscope, the detection distance of the micro-marker was corrected and compensated, and a fixed reference marker was introduced for correction at different focus heights. Finally, based on marker detection, a robotic manipulator integrated with a microfluidic chip as an end-effector was employed to demonstrate the micromanipulation of loading oocytes. The proposed strategy has potential applications in biology laboratory automation.
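
For orientation, marker detection and pose estimation of this kind is commonly done with OpenCV's ArUco module; the sketch below uses the pre-4.7 contrib interface (newer OpenCV versions wrap this in cv2.aruco.ArucoDetector), with placeholder calibration values rather than this paper's calibrated parameters.

```python
import cv2
import numpy as np

aruco = cv2.aruco
dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)

img = cv2.imread("view.png")                       # microscope or overview image
corners, ids, _ = aruco.detectMarkers(img, dictionary)

K = np.eye(3)          # intrinsic camera matrix (placeholder; must be calibrated)
dist = np.zeros(5)     # distortion coefficients (placeholder)
marker_len = 0.0024    # micro-marker side length in meters (2.4 mm)

if ids is not None:
    # One rotation/translation vector per detected marker
    rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(corners, marker_len, K, dist)
```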
|
|
17:10-17:15, Paper TuDT1.7 | |
Detection for Harvesting with an Active Illumination Camera System and DUTU2-Net+ |
|
Pan, Qinghui | Dalian University of Technology |
Lian, Jie | Dalian University of Technology |
Zhao, Yadong | Dalian University of Technology |
Qiu, Chaochao | Dalian University of Technology |
Wang, Dong | Dalian University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: Robots operating in agricultural environments require a robust, fast perception system to accurately identify picking points. This paper proposes a lightweight method for detecting sweet pepper peduncles, which uses an active illumination camera system and DUTU2-Net+ to achieve efficient and accurate peduncle detection. The camera system overcomes the influence of ambient light through flash and no-flash (FNF) image pairs, achieving more robust color detection and quickly locating the peduncle's region of interest (ROI). The improved DUTU2-Net+ is then used for accurate peduncle ROI detection. It uses an encoder with depthwise separable convolution (DSC), dilated convolution, and a feature enhancement structure with a triple attention module (TAM) to reduce the computational load and parameter count while ensuring detection accuracy. Experimental results show that the proposed method can effectively identify the peduncle position. The DUTU2-Net+ model achieves an average absolute error of 0.002, a maximum F1 score of 0.992, a frame rate of 36.3 FPS, and a model size of 6.9 MB.
|
|
TuDT2 |
402 |
Modeling, Control, and Learning for Soft Robots 2 |
Regular Session |
|
16:40-16:45, Paper TuDT2.1 | |
Offline Reinforcement Learning with Koopman Operators for Control of Soft Robots |
|
Jiang, Yue | National University of Defense Technology |
Li, Cong | Technical University of Munich |
Yang, Yihe | National University of Defence Technology |
Cao, Wenyu | National University of Defence Technology |
Xu, Xin | National University of Defense Technology |
Liu, Jinze | National University of Defense Technology |
Jiang, Wei | National University of Defense Technology |
Zhang, Xinglong | National University of Defense Technology |
Keywords: Modeling, Control, and Learning for Soft Robots
Abstract: Soft robots promise flexibility in environmental interaction tasks through compliant deformation. However, their infinite degrees of freedom and highly nonlinear dynamics pose significant challenges for dynamic modeling and control. While online reinforcement learning (RL) is promising for designing policies directly from data, its black-box policy learning process suffers from data inefficiency and the sim-to-real gap, limiting its applications in soft robots. To address these challenges, we propose a novel offline RL with Koopman operators (KORL) framework to generate control policies for soft robots without using physical simulators or real-world interactions. In particular, we first utilize a deep neural network to map the dynamics of soft robots to a lifted Koopman observable space, which is inherently linear. Then, an offline RL algorithm with a control-informed actor is designed to learn the robotic policy in the linear observable space. This differs significantly from the black-box policy design in existing offline RL paradigms. The designed Koopman observable enables efficient model-free policy learning with linear control theory, improving control performance while preserving interpretability in policy learning. The effectiveness of our KORL framework is validated on a real-world soft robotic system. Comparative experimental results demonstrate that our method outperforms state-of-the-art methods in target-reaching and trajectory-tracking tasks.
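
The lifted-linear idea at the core of Koopman-based control can be illustrated by an EDMD-style least-squares fit of the dynamics in observable space. The sketch below assumes the lifting (e.g., the learned network's output) has already been applied to the data, and the variable names are illustrative; once (A, B) are in hand, linear techniques such as LQR apply directly.

```python
import numpy as np

def fit_koopman(Z, Z_next, U):
    """EDMD-style fit of lifted linear dynamics z' ≈ A z + B u.
    Z, Z_next: (n, dz) lifted states at steps k and k+1; U: (n, du) inputs."""
    W = np.hstack([Z, U])                           # (n, dz + du)
    M, *_ = np.linalg.lstsq(W, Z_next, rcond=None)  # (dz + du, dz)
    A = M[:Z.shape[1]].T                            # (dz, dz)
    B = M[Z.shape[1]:].T                            # (dz, du)
    return A, B
```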
|
|
16:45-16:50, Paper TuDT2.2 | |
Composite Locally Weighted Learning Position and Stiffness Control of Articulated Soft Robots with Disturbance Observers |
|
Zou, Zhigang | Sun Yat-Sen University |
Li, Zhiwen | Sun Yat-Set University |
Li, Weibing | Sun Yat-Sen University |
Pan, Yongping | Southeast University |
Keywords: Modeling, Control, and Learning for Soft Robots, Neural and Fuzzy Control, Compliant Joints and Mechanisms
Abstract: Articulated soft robots (ASRs) driven by variable stiffness actuators (VSAs) are challenging to control well due to their highly nonlinear dynamics and difficulties in accurate modeling. The paper proposes a locally weighted learning (LWL)-based robust composite learning control (RCLC) solution for ASRs with agonistic-antagonistic (AA)-VSAs to enable the favorable tracking of both joint position and stiffness without exact robot models. In our solution, two LWL models are adopted online to estimate uncertainties in the link-side and stiffness dynamics, respectively, a nonlinear disturbance observer (DOB) is applied to improve tracking robustness at the link side, and a composite learning law is developed to achieve parameter convergence under a condition of interval excitation strictly weaker than persistent excitation so as to improve online modeling speed and accuracy. A distinctive feature of the proposed LWL-RCLC framework lies in the fact that the estimation of the DOB and the learning of LWL are independent yet work in a synergistic manner, which enables exact robot modeling online while improving tracking robustness. Experiments on a multi-DoF ASR with AA-VSAs have verified the superiority of the proposed method.
|
|
16:50-16:55, Paper TuDT2.3 | |
Exploring Stiffness Gradient Effects in Magnetically Induced Metamorphic Materials Via Continuum Simulation and Validation |
|
Shi, Wentao | The Chinese University of Hong Kong |
Yang, Yang | The Chinese University of Hong Kong |
Huang, Yiming | The Chinese University of Hong Kong |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design
Abstract: Magnetic soft continuum robots are capable of bending performance under remote control in confined environments, and they have been applied in various bioengineering contexts. As one type of ferromagnetic soft continuum, the Magnetically Induced Metamorphic Materials (MIMMs)-based continuum (MC) exhibits similar bending behaviors. Owing to the characteristics of its base material, the MC is flexible in modifying unit stiffness and convenient for molded fabrication. However, recent studies on magnetic continuum robots have primarily focused on one or two design parameters, limiting the development of a comprehensive magnetic continuum bending model. In this work, we constructed graded-stiffness MCs (GMCs) and developed a numerical model for GMCs' bending performance, incorporating four key parameters that determine their performance. The simulated bending results were validated with real bending experiments in four different categories: varying magnetic field, cross-section, unit stiffness, and unit length. The graded-stiffness design strategy applied to GMCs prevents sharp bending at the fixed end and results in a more circular curvature. We also trained an expansion model for GMCs' bending performance that is highly efficient and accurate compared to the simulation process. An extensive bending-prediction library for GMCs was built using the trained model.
|
|
16:55-17:00, Paper TuDT2.4 | |
Multimodal Deformation Estimation of Soft Pneumatic Gripper During Operation |
|
Cai, Changheng | The Chinese University of Hong Kong, Shenzhen |
Xiao, Fei | The Chinese University of Hong Kong, Shenzhen |
Vanza, Marcellus | The Chinese University of Hongkong, Shenzhen |
Wang, Taoyang | Chinese University of Hong Kong (Shenzhen) |
Zhou, Fangbing | The Chinese University of Hong Kong, Shenzhen |
Xu, Xuanyang | The Chinese University of Hong Kong, Shenzhen |
Zhu, Jian | Chinese University of Hong Kong, Shenzhen |
Gao, Yuan | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Keywords: Modeling, Control, and Learning for Soft Robots, Grippers and Other End-Effectors, Deep Learning in Grasping and Manipulation
Abstract: Soft pneumatic robots are gaining significant attention due to their compliance and adaptability in unstructured environments. While emerging dual-chamber soft pneumatic robots can achieve complex 3D deformations beyond conventional single-axis bending, real-time proprioception remains challenging due to the high degrees of freedom and the complex interaction between chambers. To address this issue, we propose a multimodal sensing method that combines camera and inertial measurement unit (IMU) and then extracts full-body shape information using deep learning algorithms. Our method enhances proprioception by effectively processing high-dimensional sensor data, providing real-time feedback on the gripper shape. The average error of key points was found to be 3.67mm when comparing our method's outputs with an external motion capture system, with the average error for using the camera alone being 4.36mm and for using the IMU alone being 9.32mm. This approach enables soft pneumatic robots to be seamlessly integrated into the interaction pipeline of embodied AI, improving their reliability and paving the way for applications in handling fragile objects, rehabilitation robotics, and human-robot collaboration.
|
|
17:00-17:05, Paper TuDT2.5 | |
Sampling-Based Model Predictive Control for Dexterous Manipulation on a Biomimetic Tendon-Driven Hand |
|
Hess, Adrian | ETH Zurich |
Kübler, Alexander M. | ETH Zürich |
Forrai, Benedek | ETH Zürich |
Dogar, Mehmet R | University of Leeds |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Modeling, Control, and Learning for Soft Robots, Dexterous Manipulation, Machine Learning for Robot Control
Abstract: Biomimetic and compliant robotic hands offer the potential for human-like dexterity, but controlling them is challenging due to high dimensionality, complex contact interactions, and uncertainties in state estimation. Sampling-based model predictive control (MPC), using a physics simulator as the dynamics model, is a promising approach for generating contact-rich behavior. However, sampling-based MPC has yet to be evaluated on physical (non-simulated) robotic hands, particularly on compliant hands with state uncertainties. We present the first successful demonstration of in-hand manipulation on a physical biomimetic tendon-driven robot hand using sampling-based MPC. While sampling-based MPC does not require lengthy training cycles like reinforcement learning approaches, it still necessitates adapting the task-specific objective function to ensure robust behavior execution on physical hardware. To adapt the objective function, we integrate a visual language model (VLM) with a real-time optimizer (MuJoCo MPC). We provide the VLM with a high-level human language description of the task and a video of the hand's current behavior. The VLM gradually adapts the objective function, allowing for efficient behavior generation, with each iteration taking less than two minutes. We show the feasibility of ball rolling, flipping, and catching using both simulated and physical robot hands. Our results demonstrate that sampling-based MPC is a promising approach for generating dexterous manipulation skills on biomimetic hands without extensive training cycles (see video with experiments: https://youtu.be/u4d6v3ohsOI ).
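
Sampling-based MPC in its simplest "predictive sampling" form perturbs a nominal control sequence, rolls each candidate out through the simulator, and keeps the best; a minimal sketch is below. MuJoCo MPC implements more sophisticated variants, and `simulate`/`cost` are assumed callables supplied by the user.

```python
import numpy as np

def predictive_sampling(simulate, cost, u_nom, n_samples=64, sigma=0.1):
    """One planning step of predictive sampling MPC (minimal sketch).
    simulate: maps a control sequence (H, du) to a rollout (states/outputs),
    cost: maps a rollout to a scalar, u_nom: (H, du) nominal sequence."""
    H, du = u_nom.shape
    best_u, best_c = u_nom, cost(simulate(u_nom))
    for _ in range(n_samples):
        cand = u_nom + sigma * np.random.randn(H, du)  # Gaussian perturbation
        c = cost(simulate(cand))
        if c < best_c:
            best_u, best_c = cand, c
    return best_u   # apply best_u[0], shift, then re-plan (receding horizon)
```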
|
|
17:05-17:10, Paper TuDT2.6 | |
Integrating Contact-Aware CPG System for Learning-Based Soft Snake Robot Locomotion Controllers |
|
Liu, Xuan | Zhejiang University |
Onal, Cagdas | WPI |
Fu, Jie | University of Florida |
Keywords: Modeling, Control, and Learning for Soft Robots, Neurorobotics, Biologically-Inspired Robots, Learning and Adaptive Systems
Abstract: Contact-awareness poses a significant challenge in the locomotion control of soft snake robots. This paper develops bio-inspired contact-aware locomotion controllers, grounded in a novel theory pertaining to the feedback mechanism of the Matsuoka oscillator. This mechanism enables the Matsuoka central pattern generator (CPG) system to function analogously to a "spinal cord" within the entire contact-aware control framework. Specifically, it concurrently integrates stimuli such as tonic input signals originating from the "brain" (a goal-tracking locomotion controller) and sensory feedback signals from the "reflex arc" (the contact reactive controller), generating different types of rhythmic signals to orchestrate the movement of the soft snake robot traversing densely populated obstacles and even narrow aisles. Within the "reflex arc" design, we have designed two distinct types of contact reactive controllers: 1) a reinforcement learning (RL)-based sensor regulator that learns to modulate the sensory feedback inputs of the CPG system, and 2) a local reflexive controller that establishes a direct connection between sensor readings and the CPG's feedback inputs, adhering to a specific topological configuration. These two reactive controllers, when combined with the goal-tracking locomotion controller and the Matsuoka CPG system, facilitate the implementation of two contact-aware locomotion control schemes. Both control schemes have been rigorously tested and evaluated in both simulated and real-world soft snake robots, demonstrating commendable performance in contact-aware locomotion tasks. These experimental outcomes further validate the benefits of the modified Matsuoka CPG system, augmented by a novel sensory feedback mechanism, for the design of bio-inspired robot controllers.
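
For reference, the classic two-neuron Matsuoka oscillator that such CPG systems build on can be integrated in a few lines. The gains below are typical illustrative values, and `fb` stands in for the sensory-feedback inputs that the paper's "reflex arc" modulates.

```python
def matsuoka_step(state, dt, s=1.0, beta=2.5, w=2.5, tau=0.25, tau_p=0.5,
                  fb=(0.0, 0.0)):
    """One Euler step of a two-neuron Matsuoka oscillator.
    state = (x1, x2, v1, v2): membrane and adaptation states,
    s: tonic input from the "brain", fb: sensory feedback per neuron.
    Returns the new state and the oscillator output y1 - y2."""
    x1, x2, v1, v2 = state
    y1, y2 = max(0.0, x1), max(0.0, x2)          # rectified firing rates
    dx1 = (-x1 - beta * v1 - w * y2 + s + fb[0]) / tau
    dx2 = (-x2 - beta * v2 - w * y1 + s + fb[1]) / tau
    dv1 = (-v1 + y1) / tau_p                     # slow self-adaptation
    dv2 = (-v2 + y2) / tau_p
    new_state = (x1 + dt * dx1, x2 + dt * dx2, v1 + dt * dv1, v2 + dt * dv2)
    return new_state, y1 - y2
```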
|
|
17:10-17:15, Paper TuDT2.7 | |
Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot |
|
Avtges, James | Northwestern University |
Ketchum, Jake | Northwestern University |
Schlafly, Millicent | Northwestern University |
Young, Helena | Northwestern University |
Kim, Taekyoung | Northwestern University, Center for Robotics and Biosystems |
Pinosky, Allison | Northwestern University |
Truby, Ryan | Northwestern University |
Murphey, Todd | Northwestern University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Reinforcement Learning
Abstract: Closed-loop control remains an open challenge in soft robotics. The nonlinear responses of soft actuators under dynamic loading conditions limit the use of analytic models for soft robot control. Traditional methods of controlling soft robots underutilize their configuration spaces to avoid nonlinearity, hysteresis, large deformations, and the risk of actuator damage. Furthermore, episodic data-driven control approaches such as reinforcement learning (RL) are traditionally limited by sample efficiency and inconsistency across initializations. In this work, we demonstrate RL for reliably learning control policies for dynamic balancing tasks in real-time single-shot hardware deployments. We use a deformable Stewart platform constructed using parallel, 3D-printed soft actuators based on motorized handed shearing auxetic (HSA) structures. By introducing a curriculum learning approach based on expanding neighborhoods of a known equilibrium, we achieve reliable single-deployment balancing at arbitrary coordinates. In addition to benchmarking the performance of model-based and model-free methods, we demonstrate that in a single deployment, Maximum Diffusion RL is capable of learning dynamic balancing after half of the actuators are effectively disabled, by inducing buckling and by breaking actuators with bolt cutters. Training occurs with no prior data, in as fast as 15 minutes, with performance nearly identical to the fully-intact platform. Single-shot learning on hardware facilitates soft robotic systems reliably learning in the real world and will enable more diverse and capable soft robots.
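The expanding-neighborhood curriculum described above admits a compact sketch: balancing goals are drawn from a disk around the known equilibrium, and the disk grows when the policy succeeds. A hypothetical Python version (the growth rule, radii, and the rollout stub are assumptions, not the paper's schedule):

    import numpy as np

    rng = np.random.default_rng(1)
    radius, r_max = 0.01, 0.10                # meters, illustrative

    def sample_goal(equilibrium, radius):
        theta = rng.uniform(0.0, 2.0 * np.pi)
        r = radius * np.sqrt(rng.uniform())   # uniform over the disk
        return equilibrium + r * np.array([np.cos(theta), np.sin(theta)])

    def run_episode(goal):
        # Placeholder: a real rollout would run the RL policy on hardware.
        return True

    for episode in range(100):
        goal = sample_goal(np.zeros(2), radius)
        if run_episode(goal):
            radius = min(1.1 * radius, r_max) # expand the neighborhood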
|
|
17:15-17:20, Paper TuDT2.8 | |
Toward Dynamic Control of Tendon-Driven Continuum Robots Using Clarke Transform |
|
Muhmann, Christian | Leibniz University Hannover |
Grassmann, Reinhard M. | University of Toronto |
Bartholdt, Max | Institute of Mechatronic Systems, Leibniz Universität Hannover |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Flexible Robotics
Abstract: In this paper, we propose a dynamic model and control framework for tendon-driven continuum robots (TDCRs) with multiple segments and an arbitrary number of tendons per segment. Our approach leverages the Clarke transform, the Euler-Lagrange formalism, and the piecewise constant curvature assumption to formulate a dynamic model on a two-dimensional manifold embedded in the joint space that inherently satisfies tendon constraints. We present linear and constraint-informed controllers that operate directly on this manifold, along with practical methods for preventing negative tendon forces without compromising control fidelity. This opens up new design possibilities for overactuated TDCRs with improved force distribution and stiffness without increasing controller complexity. We validate these approaches in simulation and on a physical prototype with one segment and five tendons, demonstrating accurate dynamic behavior and robust trajectory tracking under real-time conditions.
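For orientation, the (generalized) Clarke transform referenced here maps the n tendon displacements ρ_1, ..., ρ_n of a segment, with tendons evenly spaced at angles ψ_i = 2π(i−1)/n, to two manifold coordinates. In the form we believe is used (consult the paper for the exact convention and scaling):

    \begin{pmatrix} q_\alpha \\ q_\beta \end{pmatrix}
      = \frac{2}{n}
        \begin{pmatrix}
          \cos\psi_1 & \cos\psi_2 & \cdots & \cos\psi_n \\
          \sin\psi_1 & \sin\psi_2 & \cdots & \sin\psi_n
        \end{pmatrix}
        \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_n \end{pmatrix}

Intuitively, both rows of this matrix are orthogonal to the all-ones vector, so actuation patterns that merely lengthen all tendons equally are projected out; this is why the dynamics can live on a two-dimensional manifold regardless of the number of tendons per segment.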
|
|
TuDT3 |
403 |
Biomimetics |
Regular Session |
|
16:40-16:45, Paper TuDT3.1 | |
Design of a Module for a Modular Snake Robot with 3D Locomotion |
|
Shimizu, Yuya | Okayama University |
Kita, Yosuke | Okayama University |
Shimooka, So | Okayama University |
Kamegawa, Tetsushi | Okayama University |
Keywords: Biomimetics, Multi-Robot Systems, Search and Rescue Robots
Abstract: Although modular robots with snake-like structures have been proposed in previous studies, very few have achieved separation and docking without direct human intervention. Based on prior research on conventional snake robots and modular robots, we propose a modular snake robot composed of a minimum module consisting of five links and four joints. This robot is capable of rotating and translating in a two-dimensional plane even as a single module. Each module is equipped with a unique detachable and dockable structure using magnets and mechanical hooks, and reconfiguration between modules is realized through these motions. Through field experiments, we verified the reproducibility of the detachment and docking actions, as well as the locomotion capability of each module in the detached state.
|
|
16:45-16:50, Paper TuDT3.2 | |
Cockroach's Turning Strategy Enhanced Hexapod Robot with Flexible Torso |
|
Li, Yiming | Harbin Institute of Technology, Shenzhen |
Li, Xingyu | Harbin Institute of Technology, Shenzhen |
Zhou, Jie | Harbin Institute of Technology, Shenzhen |
Xie, Chenfeng | Harbin Institute of Technology, Shenzhen |
Li, Yao | Harbin Institute of Technology, Shenzhen |
Li, Bing | Harbin Institute of Technology (Shenzhen) |
Keywords: Biomimetics, Biologically-Inspired Robots, Flexible Robotics
Abstract: The design and control of hexapod robots have become an active research field due to the ability to achieve adaptive and stable multi-terrain locomotion. However, existing hexapod robots focus on the integration of flexible pitch joints to enhance their obstacle-crossing and slope-climbing abilities, and few biological observations have been made to gain insight into the agile steering mechanisms of hexapod insects. Herein, we observed the steering movements of Madagascar cockroaches. Observations showed that cockroaches exhibited specific phase relationships in addition to regular tripod gait pattern during steering. Moreover, we also found that a smaller steering radius resulted in a larger lateral bending angle of the thoracic segments. Inspired by this, a flexible hexapod robot (F-RHex) with a flexible torso was designed and fabricated. Bio-inspired gait patterns were abstracted and simplified into two steering strategies: gait-based and mix-based. Compared to the purely gait-based strategy, the F-RHex testing results demonstrated a ~27.4% reduction in turning radius and ~40% enhancement in steering velocity, implying that the mix-based strategy offers superior steering capability.
|
|
16:50-16:55, Paper TuDT3.3 | |
Development of an Under-Actuated Tendon-Driven Planar Elephant Robot Based on Synergistic Motion Analysis |
|
Kitabayashi, Koki | Meiji University
Aotani, Takumi | Meiji University |
Ozawa, Ryuta | Meiji University |
Keywords: Biomimetics, Mechanism Design, Tendon/Wire Mechanism
Abstract: Elephants cannot intentionally control individual trunk muscles but instead generate coordinated movements. This coordination, known as kinematic synergy, allows muscles to work together, reducing the controlled degrees of freedom (DOF) while enhancing motion efficiency. In this paper, we develop an elephant trunk robot that generates coordinated motion with only a few actuators inspired by kinematic synergy. First, we analyze elephant trunk motion using video data and extract the principal components that closely approximate actual movements. Next, we design a robot based on an underactuated tendon-driven transmission to reproduce the selected principal components and a control system to reduce tracking errors. Finally, we implement the control system and evaluate the robot's performance in replicating trunk motion with a limited number of actuators.
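The synergy-extraction step is, at its core, a principal component analysis of recorded trunk configurations. A minimal sketch of the idea in Python (the data layout and the choice of k are assumptions):

    import numpy as np

    def extract_synergies(Q, k=2):
        # Q: (n_frames, n_joints) joint angles digitized from video.
        Qc = Q - Q.mean(axis=0)
        U, S, Vt = np.linalg.svd(Qc, full_matrices=False)
        synergies = Vt[:k]                  # (k, n_joints) basis motions
        scores = Qc @ synergies.T           # per-frame low-dim coordinates
        explained = (S[:k] ** 2).sum() / (S ** 2).sum()
        return synergies, scores, explained

    syn, sc, ev = extract_synergies(np.random.rand(100, 12))

Each retained component then corresponds to one tendon-routing pattern, so k actuators suffice to replay the dominant motions.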
|
|
16:55-17:00, Paper TuDT3.4 | |
Artificial Muscle: A Sarcomere-Inspired Magnetic Approach |
|
Li, Ning | Shenyang Institute of Automation, Chinese Academy of Sciences |
Cheng, Zengdong | Northeastern University |
Zhang, Kaihan | Dalian Maritime University |
Sang, Yuqing | Northeastern University |
Yu, Zhuoheng | Shenyang Jianzhu University |
Xi, Ning | The University of Hong Kong |
Liu, Lianqing | Shenyang Institute of Automation |
Zhao, Xingang | Shenyang Institute of Automation, Chinese Academy of Sciences |
Keywords: Biomimetics, Biologically-Inspired Robots
Abstract: Soft artificial muscle actuators have gained attention in robotics for their remote control, fast response, and high compliance. However, replicating the intricate and efficient motions of natural muscles remains a challenge. Existing designs often lack the hierarchical and anisotropic properties of muscle sarcomeres, limiting their ability to achieve biomimetic movements. We developed a novel Biomimetic Magnetic Artificial Actuator (BMAA) inspired by muscle sarcomeres. Using a soft magnetic composite material arranged in a hierarchical structure, the actuator mimics the arrangement of actin and myosin filaments. External magnetic fields enable precise control of contraction and relaxation, emulating natural muscle motion. The actuator achieves muscle-like motion characteristics, and its driving ability is verified through crawling (reptile-like) and elbow-motion experiments. The actuator demonstrates significant deformation, fast response, and excellent controllability, enabling complex and precise movements. This research advances the development of biomimetic soft actuators, offering potential applications in soft robotics, biomedical devices, and artificial muscles, and paving the way for more versatile and intelligent machines.
|
|
17:00-17:05, Paper TuDT3.5 | |
Observation of Snails and a Bionic Snail Robot Crawling with Distributed Suction |
|
Ji, Qinjie | Southeast University |
Song, Aiguo | Southeast University |
Wang, Shaohu | Southeast University |
Kim, Sareum | EPFL |
Hughes, Josie | EPFL |
Keywords: Biologically-Inspired Robots, Soft Robot Materials and Design
Abstract: Slow-speed animals can also exhibit remarkable capabilities, as seen in snails that crawl while maintaining adhesion. Snails have inspired researchers to develop traveling wave-based robots and suction robots; however, the combination of traveling wave propulsion with suction ability remains a challenge. In this paper, we propose a snail-inspired robot that integrates a corkscrew propulsion mechanism with distributed suction cups, enabling it to crawl upside down on the ceiling. The propulsion model of the corkscrew generating the traveling wave is derived, and a temporal-spatial decomposition method is applied to validate the high efficiency of traveling wave generation. The trade-off between wave amplitude and suction cup depth is investigated to determine an optimized configuration. The results show that the robot’s speed aligns well with the propulsion model. The traveling wave ratio calculated from experiments is 0.938. The optimized configuration consists of a corkscrew with a 14 mm diameter and suction cups with a 2.5 mm depth, achieving a crawling speed of 3.02 ± 0.28 mm/s while moving upside down. The combination of the proposed smooth traveling wave generation method and distributed suction cups enables the robot to crawl upside down while carrying a 200 g load and to climb a vertical wall, like a natural snail.
|
|
17:05-17:10, Paper TuDT3.6 | |
Topo-Field: Topometric Mapping with Brain-Inspired Hierarchical Layout-Object-Position Fields |
|
Hou, Jiawei | Fudan University |
Guan, Wenhao | Fudan University |
Liang, Longfei | Shanghai Neuhelium Neuromorphic Intelligence Tech. Co.Ltd |
Feng, Jianfeng | Fudan University |
Xue, Xiangyang | Fudan University |
Zeng, Taiping | Fudan University |
Keywords: Bioinspired Robot Learning, Mapping, Representation Learning
Abstract: Mobile robots require comprehensive scene understanding to operate effectively in diverse environments, enriched with contextual information such as layouts, objects, and their relationships. Although advances like neural radiance fields (NeRFs) offer high-fidelity 3D reconstructions, they are computationally intensive and often lack efficient representations of traversable spaces essential for planning and navigation. In contrast, topological maps are computationally efficient but lack the semantic richness necessary for a more complete understanding of the environment. Inspired by a population code in the postrhinal cortex (POR) that is strongly tuned to spatial layouts over scene content and rapidly forms a high-level cognitive map, this work introduces Topo-Field, a framework that integrates Layout-Object-Position (LOP) associations into a neural field and constructs a topometric map from this learned representation. LOP associations are modeled by explicitly encoding object and layout information, while a Large Foundation Model (LFM) technique allows for efficient training without extensive annotations. The topometric map is then constructed by querying the learned neural representation, offering both semantic richness and computational efficiency. Empirical evaluations in multi-room environments demonstrate the effectiveness of Topo-Field in tasks such as position attribute inference, query localization, and topometric planning, successfully bridging the gap between high-fidelity scene understanding and efficient robotic navigation. The open-source code is available at https://github.com/fudan-birlab/Topo-Field.
|
|
17:10-17:15, Paper TuDT3.7 | |
Bioinspired Sensing of Undulatory Flow Fields Generated by Leg Kicks in Swimming (I) |
|
Wang, Jun | Peking University |
Shen, Tongsheng | National Innovation Institute of Defense Technology |
Zhao, DeXin | National Innovation Institute of Defense Technology |
Zhang, Feitian | Peking University |
Keywords: Bioinspired Robot Learning, Marine Robotics, Human-Robot Collaboration
Abstract: The artificial lateral line (ALL) is a bioinspired flow sensing system for underwater robots, comprising distributed flow sensors. The ALL has been successfully applied to detect the undulatory flow fields generated by body undulation and tail-flapping of bioinspired robotic fish. However, its feasibility and performance in sensing the undulatory flow fields produced by human leg kicks during swimming have not been systematically tested and studied. This paper presents a novel sensing framework to investigate the undulatory flow field generated by a swimmer's leg kicks, leveraging bioinspired ALL sensing. To evaluate the feasibility of using the ALL system for sensing the undulatory flow fields generated by swimmer leg kicks, this paper designs an experimental platform integrating an ALL system and a lab-fabricated human leg model. To enhance the accuracy of flow sensing, this paper proposes a feature extraction method that dynamically fuses time-domain and time-frequency characteristics. Specifically, time-domain features are extracted using one-dimensional convolutional neural networks and bidirectional long short-term memory networks (1DCNN-BiLSTM), while time-frequency features are extracted using short-term Fourier transform and two-dimensional convolutional neural networks (STFT-2DCNN). These features are then dynamically fused based on attention mechanisms to achieve accurate sensing of the undulatory flow field. Furthermore, extensive experiments are conducted to test various scenarios inspired by human swimming, such as leg kick pattern recognition and kicking leg localization, achieving satisfactory results.
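The described fusion architecture (a 1DCNN-BiLSTM time branch, an STFT-2DCNN time-frequency branch, and an attention-weighted combination) can be sketched in PyTorch. Layer sizes, the toy STFT settings, and the scalar per-branch attention below are illustrative guesses, not the authors' configuration:

    import torch
    import torch.nn as nn

    class DualBranchFlowNet(nn.Module):
        def __init__(self, n_sensors=8, n_classes=4, n_fft=64):
            super().__init__()
            self.n_fft = n_fft
            self.conv1d = nn.Sequential(
                nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU())
            self.bilstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
            self.conv2d = nn.Sequential(
                nn.Conv2d(n_sensors, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)))
            self.fc_tf = nn.Linear(16 * 4 * 4, 128)
            self.attn = nn.Linear(256, 2)     # one score per branch
            self.head = nn.Linear(128, n_classes)

        def forward(self, x):                 # x: (B, n_sensors, T)
            h = self.conv1d(x).transpose(1, 2)
            h, _ = self.bilstm(h)
            f_time = h[:, -1]                 # (B, 128) time-domain feature
            spec = torch.stft(x.flatten(0, 1), self.n_fft,
                              window=torch.hann_window(self.n_fft),
                              return_complex=True).abs()
            spec = spec.view(x.size(0), x.size(1), *spec.shape[-2:])
            f_freq = self.fc_tf(self.conv2d(spec).flatten(1))
            w = torch.softmax(self.attn(torch.cat([f_time, f_freq], -1)), -1)
            fused = w[:, :1] * f_time + w[:, 1:] * f_freq
            return self.head(fused)

    logits = DualBranchFlowNet()(torch.randn(2, 8, 256))  # smoke test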
|
|
17:15-17:20, Paper TuDT3.8 | |
An Untethered Tripodal Miniature Piezoelectric Robot with Strong Load Capacity Inspired by Land Motion of Seals (I) |
|
Li, Jing | Harbin Institute of Technology |
Deng, Jie | Harbin Institute of Technology |
Zhang, Shijing | Harbin Institute of Technology |
Wang, Weiyi | Harbin Institute of Technology |
Liu, Yingxiang | Harbin Institute of Technology |
Keywords: Biologically-Inspired Robots, Biomimetics, Legged Robots
Abstract: Wireless motion and large load capacity are two important characteristics for practical applications of miniature robots, and both become more challenging as robot size decreases. Here, we propose a tripodal miniature piezoelectric robot (TMPR) with a size of 31 × 44 × 20 mm³ and a weight of 9.8 g. Its prominent features are the square-section piezo-leg arranged at a 45° angle and the supporting structure with a passive wheel. The former enables the robot to achieve bidirectional motions with only one signal; the latter, inspired by the land motion of seals, helps the robot increase its load capacity. Then, a small onboard power supply was developed, and the TMPR achieved wireless motions with maximum velocities of 146.0 mm/s (forward) and 93.1 mm/s (backward). In wireless motion, the tested maximum load was 331.6 g, the load-to-weight ratio exceeded 33.8, and the costs of transport in the bidirectional motions were only 1.33 and 1.56, respectively; besides, the endurance time was more than 120 min powered by a 400 mAh battery. Moreover, the robot could operate in sloping or shaking pipelines and collect information by carrying a small camera, which shows much potential for application in small-pipeline detection.
|
|
TuDT4 |
404 |
Robot Learning |
Regular Session |
|
16:40-16:45, Paper TuDT4.1 | |
High-Precision and High-Efficiency Trajectory Tracking for Excavators Based on Closed-Loop Dynamics |
|
Zou, Ziqing | Zhejiang University |
Wang, Cong | NetEase Fuxi Robot Department |
Hu, Yue | NetEase Fuxi AI Lab |
Liu, Xiao | Zhejiang University |
Xu, Bowen | Zhejiang University |
Xiong, Rong | Zhejiang University |
Fan, Changjie | NetEase |
Chen, Yingfeng | Netease Inc |
Wang, Yue | Zhejiang University |
Keywords: Machine Learning for Robot Control, Robotics and Automation in Construction
Abstract: The complex nonlinear dynamics of hydraulic excavators, such as time delays and control coupling, pose significant challenges to achieving high-precision trajectory tracking. Traditional control methods often fall short in such applications due to their inability to effectively handle these nonlinearities, while commonly used learning-based methods require extensive interactions with the environment, leading to inefficiency. To address these issues, we introduce EfficientTrack, a trajectory tracking method that integrates model-based learning to manage nonlinear dynamics and leverages closed-loop dynamics to improve learning efficiency, ultimately minimizing tracking errors. We validate our method through comprehensive experiments both in simulation and on a real-world excavator. Comparative experiments in simulation demonstrate that our method outperforms existing learning-based approaches, achieving the highest tracking precision and smoothness with the fewest interactions. Real-world experiments further show that our method remains effective under load conditions and possesses the ability for continual learning, highlighting its practical applicability. For implementation details and source code, please refer to https://github.com/ZiqingZou/EfficientTrack.
|
|
16:45-16:50, Paper TuDT4.2 | |
Empirical Analysis of Sim-And-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels |
|
Wei, Adam | Massachusetts Institute of Technology |
Agarwal, Abhinav | Massachusetts Institute of Technology |
Chen, Boyuan | Massachusetts Institute of Technology |
Bosworth, Rohan | MIT CSAIL
Pfaff, Nicholas Ezra | Massachusetts Institute of Technology |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Machine Learning for Robot Control, Imitation Learning
Abstract: Cotraining with demonstration data generated both in simulation and on real hardware has emerged as a promising recipe for scaling imitation learning in robotics. This work seeks to elucidate basic principles of this sim-and-real cotraining to inform simulation design, sim-and-real dataset creation, and policy training. Our experiments confirm that cotraining with simulated data can dramatically improve performance, especially when real data is limited. We show that these performance gains scale with additional simulated data up to a plateau; adding more real-world data increases this performance ceiling. The results also suggest that reducing physical domain gaps may be more impactful than visual fidelity for non-prehensile or contact-rich tasks. Perhaps surprisingly, we find that some visual gap can help cotraining – binary probes reveal that high-performing policies must learn to distinguish simulated domains from real. We conclude by investigating this nuance and mechanisms that facilitate positive transfer between sim-and-real. Focusing narrowly on the canonical task of planar pushing from pixels allows us to be thorough in our study. In total, our experiments span 50+ real-world policies (evaluated on 1000+ trials) and 250 simulated policies (evaluated on 50,000+ trials). Videos and code can be found at https://sim-and-real-cotraining.github.io/.
|
|
16:50-16:55, Paper TuDT4.3 | |
Diffusion-Based Approximate MPC: Fast and Consistent Imitation of Multi-Modal Action Distributions |
|
Marquez Julbe, Pau | Max Planck Institute for Intelligent Systems |
Nubert, Julian | ETH Zürich |
Hose, Henrik | RWTH Aachen |
Trimpe, Sebastian | RWTH Aachen University |
Kuchenbecker, Katherine J. | Max Planck Institute for Intelligent Systems |
Keywords: Machine Learning for Robot Control, Motion Control, Learning from Demonstration
Abstract: Approximating model predictive control (MPC) using imitation learning (IL) allows for fast control without solving expensive optimization problems online. However, methods that use neural networks in a simple L2-regression setup fail to approximate multi-modal (set-valued) solution distributions caused by local optima found by the numerical solver or non-convex constraints, such as obstacles, significantly limiting the applicability of approximate MPC in practice. We solve this issue by using diffusion models to accurately represent the complete solution distribution (i.e., all modes) up to kilohertz sampling rates. This work shows that diffusion-based AMPC significantly outperforms L2-regression-based approximate MPC for multi-modal action distributions. In contrast to most earlier work on IL, we also focus on running the diffusion-based controller at a higher rate and in joint space instead of end-effector space. Additionally, we propose the use of gradient guidance during the denoising process to consistently pick the same mode in closed loop to prevent switching between solutions. We propose using the cost and constraint satisfaction of the original MPC problem during parallel sampling of solutions from the diffusion model to pick a better mode online. We evaluate our method on the fast and accurate control of a 7-DoF robot manipulator both in simulation and on hardware deployed at 250 Hz, achieving a speedup of more than 70 times compared to solving the MPC problem online and also outperforming the numerical optimization (used for training) in success ratio.
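The two tricks the abstract highlights, parallel sampling scored by the MPC objective and mode consistency across control steps, fit in a short sketch. Everything here is schematic: `denoise` stands in for the trained noise predictor, the update rule is a simplified Euler-style reverse step rather than an exact DDPM posterior, and all weights are illustrative:

    import numpy as np

    def sample_actions(denoise, obs, K=64, steps=20, dim=7, rng=None):
        rng = rng or np.random.default_rng(0)
        x = rng.standard_normal((K, dim))     # K parallel candidates
        for t in reversed(range(steps)):
            x = x - (1.0 / steps) * denoise(x, t, obs)
            if t > 0:                         # keep modes separated
                x = x + 0.1 * np.sqrt(1.0 / steps) * rng.standard_normal(x.shape)
        return x

    def pick_mode(cands, mpc_cost, feasible, prev, w_switch=0.2):
        # Score with the original MPC objective; penalize mode switching.
        ok = [a for a in cands if feasible(a)]
        score = lambda a: mpc_cost(a) + w_switch * np.linalg.norm(a - prev)
        return min(ok, key=score) if ok else prev

    toy_denoise = lambda x, t, obs: x          # stand-in for the trained net
    cands = sample_actions(toy_denoise, obs=None)
    best = pick_mode(cands, mpc_cost=lambda a: float(a @ a),
                     feasible=lambda a: True, prev=np.zeros(7))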
|
|
16:55-17:00, Paper TuDT4.4 | |
Personalized Re-Identification through Unsupervised Continual Learning and Parallel Training |
|
Rollo, Federico | Leonardo S.p.A |
Zunino, Andrea | Leonardo |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Kashiri, Navvab | Leonardo Labs |
Keywords: Recognition, AI-Based Methods, Continual Learning
Abstract: Object re-identification and tracking lay the foundation for various computer vision and robotics applications. In this study, we propose a method for personalizing a neural network to enhance and continuously adapt the re-identification of a specific target. Employing an unsupervised continual learning approach in conjunction with an intelligent image pool collection, we can effectively track the target and mitigate the issue of catastrophic forgetting, a challenge prevalent in this research domain. Our primary goal is to provide a robust person re-identification approach to extend the capabilities of recent tracking frameworks employed in robotics, which we have adopted as our baselines for evaluation. Our results demonstrate our approach's efficacy in successfully re-identifying the target, even when the target drastically changes his clothing appearance and the baseline frameworks struggle. To optimally tune the framework parameters, we conducted an ablation study and substantiated our findings with saliency maps to elucidate the reasons behind the effectiveness of our approach.
|
|
17:00-17:05, Paper TuDT4.5 | |
LoopSR: Looping Sim-And-Real for Lifelong Policy Adaptation of Legged Robots |
|
Wu, Peilin | Shanghai Jiao Tong University |
Xie, Weiji | Shanghai Jiao Tong University |
Cao, Jiahang | Shanghai Jiao Tong University |
Lai, Hang | Shanghai Jiao Tong University |
Zhang, Weinan | Shanghai Jiao Tong University |
Keywords: Continual Learning, Legged Robots, Reinforcement Learning
Abstract: Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to enhance policy robustness across diverse environments, they potentially compromise the policy's performance in any specific environment, leading to suboptimal real-world deployment due to the No Free Lunch theorem. To address this, we propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space and reconstruct a digital twin of the real world for further improvement. An autoencoder architecture and contrastive learning methods are adopted to enhance feature extraction of real-world dynamics. Simulation parameters for continual training are derived by combining predicted values from the decoder with retrieved parameters from a pre-collected simulation trajectory dataset. By leveraging simulated continual training, LoopSR achieves superior data efficiency compared with strong baselines, yielding strong performance with limited data in both sim-to-sim and sim-to-real experiments.
|
|
17:05-17:10, Paper TuDT4.6 | |
Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception |
|
Hindel, Julia | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Continual Learning, Incremental Learning, Deep Learning for Visual Perception
Abstract: Semantic segmentation models are typically trained on a fixed set of classes, limiting their applicability in open-world scenarios. Class-incremental semantic segmentation aims to update models with emerging new classes while preventing catastrophic forgetting of previously learned ones. However, existing methods impose strict rigidity on old classes, reducing their effectiveness in learning new incremental classes. In this work, we propose Taxonomy-Oriented Poincaré-regularized Incremental-Class Segmentation (TOPICS) that learns feature embeddings in hyperbolic space following explicit taxonomy-tree structures. This supervision provides plasticity for old classes, updating ancestors based on new classes while integrating new classes at fitting positions. Additionally, we maintain implicit class relational constraints on the geometric basis of the Poincaré ball. This ensures that the latent space can continuously adapt to new constraints while maintaining a robust structure to combat catastrophic forgetting. We also establish eight realistic incremental learning protocols for autonomous driving scenarios, where novel classes can originate from known classes or the background. Extensive evaluations of TOPICS on the Cityscapes and Mapillary Vistas 2.0 benchmarks demonstrate that it achieves state-of-the-art performance. We make the code and trained models publicly available at http://topics.cs.uni-freiburg.de.
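As background for the hyperbolic embedding: the Poincaré ball is used precisely because distances grow without bound toward the boundary, letting tree-structured taxonomies embed with low distortion (ancestors near the origin, leaves near the rim). The distance between points u and v in the unit ball is the standard formula

    d_{\mathbb{B}}(u, v) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\lVert u - v\rVert^{2}}{\left(1 - \lVert u\rVert^{2}\right)\left(1 - \lVert v\rVert^{2}\right)}\right)

How TOPICS regularizes class embeddings with this geometry is described in the paper itself.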
|
|
17:10-17:15, Paper TuDT4.7 | |
Online Context Learning for Socially Compliant Navigation |
|
Okunevich, Iaroslav | University of Technology of Belfort-Montbéliard |
Lombard, Alexandre | Université De Technologie De Belfort-Montbéliard, Laboratoire Co |
Krajnik, Tomas | Czech Technical University |
Ruichek, Yassine | University of Technology of Belfort-Montbeliard - France |
Yan, Zhi | École Nationale Supérieure De Techniques Avancées (ENSTA) |
Keywords: Continual Learning, Human-Centered Robotics, Incremental Learning
Abstract: Robot social navigation needs to adapt to different human factors and environmental contexts. However, since these factors and contexts are difficult to predict and cannot be exhaustively enumerated, traditional learning-based methods have difficulty in ensuring the social attributes of robots in long-term and cross-environment deployments. This letter introduces an online context learning method that aims to empower robots to adapt to new social environments online. The proposed method adopts a two-layer structure. The bottom layer is built using a deep reinforcement learning-based method to ensure the output of basic robot navigation commands. The upper layer is implemented using an online robot learning-based method to socialize the control commands suggested by the bottom layer. Experiments using a community-wide simulator show that our method outperforms the state-of-the-art ones. Experimental results in the most challenging scenarios show that our method improves the performance of the state-of-the-art by 8%. The source code of the proposed method, the data used, and the tools for the pre-training step are publicly available at https://github.com/Nedzhaken/SOCSARL-OL.
|
|
17:15-17:20, Paper TuDT4.8 | |
Haptic-Informed ACT with a Soft Gripper and Recovery-Informed Training for Pseudo Oocyte Manipulation |
|
Uriguen Eljuri, Pedro Miguel | Kyoto University |
Shibata, Hironobu | Ritsumeikan University |
Maeyama, Katsuyoshi | Ritsumeikan University |
Jia, Yuanyuan | Kyoto University |
Taniguchi, Tadahiro | Kyoto University |
Keywords: Learning from Demonstration, Imitation Learning, AI-Enabled Robotics
Abstract: In this paper, we introduce Haptic-Informed ACT, an advanced robotic system for pseudo oocyte manipulation, integrating multimodal information and Action Chunking with Transformers (ACT). Traditional automation methods for oocyte transfer rely heavily on visual perception, often requiring human supervision due to biological variability and environmental disturbances. Haptic-Informed ACT enhances ACT by incorporating haptic feedback, enabling real-time grasp failure detection and adaptive correction. Additionally, we introduce a 3D-printed TPU soft gripper to facilitate delicate manipulations. Experimental results demonstrate that Haptic-Informed ACT improves the task success rate, robustness, and adaptability compared to conventional ACT, particularly in dynamic environments. These findings highlight the potential of multimodal learning in robotics for biomedical automation.
|
|
TuDT5 |
407 |
Additive Manufacturing |
Regular Session |
|
16:40-16:45, Paper TuDT5.1 | |
Object Packing and Scheduling for Sequential 3D Printing: A Linear Arithmetic Model and a CEGAR-Inspired Optimal Solver |
|
Surynek, Pavel | Czech Technical University in Prague |
Bubník, Vojtěch | Prusa Research |
Matěna, Lukáš | Prusa Research
Kubiš, Petr | Prusa Research |
Keywords: Additive Manufacturing, Planning, Scheduling and Coordination, Collision Avoidance
Abstract: We address the problem of object arrangement and scheduling for sequential 3D printing. Unlike standard 3D printing, where all objects are printed slice by slice at once, in sequential 3D printing, objects are completed one after the other. In the sequential case, it is necessary to ensure that the moving parts of the printer do not collide with previously printed objects. We look at the sequential printing problem from the perspective of combinatorial optimization. We propose to express the problem as a linear arithmetic formula, which is then solved using a solver for satisfiability modulo theories (SMT). However, we do not solve the formula expressing the problem of object arrangement and scheduling directly; instead, we propose a technique inspired by counterexample-guided abstraction refinement (CEGAR), which turned out to be a key innovation for efficiency.
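To make the encoding style concrete, here is a toy one-dimensional version of the arrangement constraints in Python with the z3 SMT solver: integer object positions on a bed, with pairwise non-overlap expressed as a linear-arithmetic disjunction. The paper's full model also schedules print order and printer-head clearance, and its CEGAR-inspired solver adds violated collision constraints lazily rather than asserting them all upfront; none of that is shown here:

    from z3 import Ints, Solver, Or, sat

    def pack_1d(widths, bed):
        xs = Ints(" ".join(f"x{i}" for i in range(len(widths))))
        s = Solver()
        for x, w in zip(xs, widths):
            s.add(x >= 0, x + w <= bed)       # stay on the print bed
        for i in range(len(widths)):
            for j in range(i + 1, len(widths)):
                # Non-overlap in linear arithmetic: i left of j, or j left of i.
                s.add(Or(xs[i] + widths[i] <= xs[j],
                         xs[j] + widths[j] <= xs[i]))
        return s.model() if s.check() == sat else None

    print(pack_1d([4, 3, 5], bed=13))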
|
|
16:45-16:50, Paper TuDT5.2 | |
Stress-Driven Algorithm for Fiber Alignment in Smart Materials for Controlled Deformation in 4D-Printed Soft Robotics |
|
Choi, Won Bin | Pohang University of Science and Technology |
Jang, Jinah | Pohang University of Science and Technology |
Chung, Wan Kyun | POSTECH |
Keywords: Additive Manufacturing, Soft Robot Materials and Design, Soft Sensors and Actuators
Abstract: This work proposes a path generation policy for self-actuating soft grippers by converting external deformation conditions into intrinsic load conditions. This transformation enables anisotropic material orientation control of functional materials (which can deform under stimuli), aligning with the deformation requirements of soft grippers to enhance controllability. Given a desired deformation for an arbitrary geometry, finite element method (FEM) analysis is used to determine the internal stress distribution. The second-order stress tensor is transformed into a traction vector field, guiding the alignment of material anisotropy for optimal deformation. A computational framework is developed to generate smooth, continuous printing paths by integrating along the vector field, ensuring internal morphology control of the target geometry. The proposed method is validated through FEM analysis, demonstrating a positional deviation rate of under 5% relative to the largest geometric feature in each test case across various tested shapes and deformation conditions. The results demonstrate that the algorithm effectively generates 4D printing paths that enable soft grippers to achieve target deformations with a high matching rate.
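The pipeline from stress field to printing path reduces to tracing streamlines of the principal-stress direction. A schematic Python version in 2-D (the FEM-provided stress function, step size, and step count are placeholders):

    import numpy as np

    def principal_direction(sigma):
        # Unit eigenvector of a 2x2 stress tensor for the largest eigenvalue.
        w, v = np.linalg.eigh(sigma)
        return v[:, np.argmax(w)]

    def trace_path(stress_at, seed, step=0.5, n_steps=200):
        p = np.asarray(seed, dtype=float)
        path, d_prev = [p.copy()], None
        for _ in range(n_steps):
            d = principal_direction(stress_at(p))
            if d_prev is not None and d @ d_prev < 0.0:
                d = -d                        # eigenvectors carry no sign
            p = p + step * d
            path.append(p.copy())
            d_prev = d
        return np.array(path)

    stress = lambda p: np.array([[1.0 + p[0], 0.2], [0.2, 0.5]])  # toy field
    path = trace_path(stress, seed=(0.0, 0.0))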
|
|
16:50-16:55, Paper TuDT5.3 | |
A General Approach for Constrained Robotic Coverage Path Planning on 3D Freeform Surfaces (I) |
|
McGovern, Sean | Worcester Polytechnic Institute |
Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Keywords: Manipulation Planning, Additive Manufacturing, Industrial Robots
Abstract: There are many industrial robotic applications which require a robot manipulator’s end-effector to fully cover a 3D surface region in a constrained motion. Constrained surface coverage in this context is focused on placing commonly used coverage patterns (such as raster, spiral, or dual-spiral) onto the surface for the manipulator to follow. The manipulator must continuously satisfy surface task constraints imposed on the end-effector while maintaining manipulator joint constraints. While there is substantial research for coverage on planar surfaces, methods for constrained coverage of 3D (spatial) surfaces are limited to certain (parametric or spline) surfaces and do not consider feasibility systematically given manipulator and task constraints. There is a lack of fundamental research to address the general problem: given a manipulator, a 3D freeform surface, and task constraints, whether there exists a feasible continuous motion plan to cover the surface, and if so, how to produce a uniform coverage path that best satisfies task constraints. In this paper, we introduce a general approach to address this fundamental but largely open coverage problem. We have applied our approach to example 3D freeform surface coverage tasks in simulation and real-world environments with a 7-DOF robotic manipulator to demonstrate its effectiveness. Note to Practitioners: This paper was motivated by the constrained coverage path planning problem on 3D freeform surfaces for many industrial applications, such as painting, spray coating, abrasive blasting, polishing, shotcreting, etc. It provides a principled and general approach that includes an automatic robotic system to find feasible robotic end-effector paths for covering a 3D freeform surface with some interaction from a human worker who provides key parameters related to the specific task without being an expert in robotics. Therefore, the approach enables a human worker who only has the domain knowledge of a specific coverage task to operate the general and automatic robotic system effectively for completing the task.
|
|
16:55-17:00, Paper TuDT5.4 | |
Snuggle-Pack: Speeding up Multi-Heuristic Pack Planning of Complex Objects |
|
Nickel, Tim | Fraunhofer IPA |
Bormann, Richard | Fraunhofer IPA |
Arras, Kai Oliver | University of Stuttgart |
Keywords: Logistics, Computer Vision for Automation
Abstract: Efficient object packing is a fundamental challenge in logistics and industrial automation. This work introduces Snuggle-Pack, a novel 3D packing algorithm that integrates Fast Fourier Transform (FFT)-based spatial analysis with a multi-heuristic optimization framework to achieve real-time, high-density packing. Unlike traditional heuristic-based approaches that rely on 2D simplifications, our method operates in a fully 3D volumetric space, ensuring collision-free, stable, and physically feasible placements. At its core, our approach employs a proximity-aware and support-sensitive placement strategy, which encourages objects to fit snugly within their surroundings (hence the name), optimizing space utilization through fine-grained collision metrics. We evaluate our method on the YCB and IPA-3D1K datasets in both previewed and ad-hoc packing scenarios. Our experiments show that Snuggle-Pack significantly outperforms the state of the art, achieving up to 25% higher packing densities or, alternatively, accelerating computation by up to 10×. Moreover, our framework allows for dynamic adaptation to custom constraints, such as balanced center of mass, weight limitations on fragile items, and safety proximity constraints. These results highlight Snuggle-Pack as an efficient, flexible, and scalable solution for industrial robotic packing tasks.
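The FFT ingredient can be illustrated in two dimensions: correlating the object's footprint against the container's occupancy grid evaluates the overlap of every candidate placement at once, and zero-response cells are collision-free anchors. This is a sketch of the general FFT-correlation idea (2-D, axis-aligned, no rotations, shapes illustrative), not Snuggle-Pack's full volumetric formulation:

    import numpy as np

    def feasible_placements(container_occ, obj_mask):
        H, W = container_occ.shape
        h, w = obj_mask.shape
        shape = (H + h - 1, W + w - 1)
        F = np.fft.rfft2(container_occ, s=shape)
        G = np.fft.rfft2(obj_mask[::-1, ::-1], s=shape)  # flip => correlation
        overlap = np.fft.irfft2(F * G, s=shape)
        valid = overlap[h - 1:H, w - 1:W]     # placements fully inside
        return np.argwhere(valid < 0.5)       # ~zero overlap cells

    occ = np.zeros((20, 20)); occ[5:9, 5:9] = 1.0   # one box already placed
    print(len(feasible_placements(occ, np.ones((4, 4)))))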
|
|
17:00-17:05, Paper TuDT5.5 | |
DSFormer-RTP: Dynamic-Stream Transformers for Real-Time Deterministic Trajectory Prediction |
|
Chen, Xun | Nanyang Technological University |
Wen, Mingxing | China-Singapore International Joint Research Center |
Deng, Tianchen | Shanghai Jiao Tong University |
Zhou, Yichen | Nanyang Technological University |
Zhang, Haoyuan | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Logistics, Computer Vision for Transportation
Abstract: As delivery robots are increasingly integrated into our daily lives, their ability to navigate through crowded spaces demands swift and accurate prediction of pedestrian trajectories, which is crucial for autonomous functionality. However, existing methods face challenges of unstable accuracy and inefficiency in real-world deployment. Trajectory prediction involves both temporal and social dimensions. Recent methods have achieved better results by modeling temporal and social dimensions simultaneously, preventing information loss compared to modeling them separately, which significantly increases computational costs, posing challenges for practical deployment. In this paper, we conceptualize the trajectory prediction task as a deterministic sequence-to-sequence model that produces one precise forecast, aligning with real-world needs while reducing complexity. To improve efficiency and reduce latency for real-time applications, we propose a novel dynamic-stream transformer architecture that categorizes layers into multi-stream and single-stream based on the number of dimensions involved in computation. The single-stream modules attend to all dimensions simultaneously, providing comprehensive information fusion but with higher computational complexity. The multi-stream modules focus on only one dimension, enabling parallel and batched computation, crucial for improving the model’s real-time performance. By combining them strategically, we achieve a balance between accuracy and speed. Extensive experiments on real datasets show that our dynamic-stream transformer architecture significantly reduces computational complexity, achieving a speed increase of 180% to 3180% compared to similar approaches, while also attaining performance close to the state-of-the-art (SOTA) for deterministic trajectory prediction.
|
|
TuDT6 |
301 |
Micro/Nano Robots 4 |
Regular Session |
|
16:40-16:45, Paper TuDT6.1 | |
Feedback Control of a Single-Tail Bioinspired 59-Mg Swimmer |
|
Trygstad, Conor | Washington State University |
Longwell, Cody | Washington State University |
Gonçalves, Francisco | Washington State University |
Blankenship, Elijah | Washington State University |
Perez-Arancibia, Nestor O | Washington State University (WSU) |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Marine Robotics
Abstract: We present an evolved controllable version of the single-tail Fish-&-Ribbon–Inspired Small Swimming Harmonic roBot (FRISSHBot), a 59-mg biologically inspired swimmer driven by a new shape-memory alloy (SMA)-based bimorph actuator. The new FRISSHBot is controllable in the two-dimensional (2D) space, which enabled the first demonstration of feedback-controlled trajectory tracking of a single-tail aquatic robot with onboard actuation at the subgram scale. These new capabilities are the result of a physics-informed design with an enlarged head and shortened tail relative to those of the original platform. Enhanced by its design, this new platform achieves forward swimming speeds of up to 13.6 mm/s (0.38 Bl/s), which is over four times that of the original platform. Furthermore, when following 2D references in closed loop, the tested FRISSHBot prototype achieves forward swimming speeds of up to 9.1 mm/s, root-mean-square (RMS) tracking errors as low as 2.6 mm, turning rates of up to 13.1 °/s, and turning radii as small as 10 mm.
|
|
16:45-16:50, Paper TuDT6.2 | |
Development of an Untethered Ultrasonic Robot with Fast and Load-Carriable Movement Imitating Rotatory Galloping Gait (I) |
|
Wu, Jiang | Shandong University |
Ding, Zhaochun | Shandong University |
Wang, Lipeng | Yanshan University |
Zhang, Ranxu | Shandong University |
Zhang, Yanhu | Jiangsu University |
Rong, Xuewen | Shandong University |
Song, Rui | Shandong University |
Dong, Huijuan | Harbin Institute of Technology |
Zhao, Jie | State Key Laboratory of Robotics and System, Harbin Institute of Technology
Li, Yibin | Shandong University |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots
Abstract: An untethered ultrasonic robot (U2sonobot) operating in resonant vibration is developed by integrating dual transducers, an onboard circuit, and a battery. Here, the longitudinal and bending vibrations lead to the out-of-phase swing motion and the alternating acceleration, respectively; these imitate the rotatory galloping gait in terms of the driving feet’s movement pattern and the operating sequence. First, the transducers were designed to gather the resonant frequencies of two vibrations and produce the same node for steadily supporting the other components. Second, an onboard circuit was devised to convert the 3.7 V battery’s dc signal into multiple channels of ultrasonic signals via multilevel amplification. Third, a prototype 54 × 52 × 46 mm³ in size and 76.5 g in weight was fabricated to assess its moving/carrying performance. At 59.3 kHz frequency, U2sonobot yielded a maximal speed of 221 mm/s and a minimal step displacement of 0.3 µm. According to wirelessly received commands, it produced various types of flexible movements (e.g., those with adjustable speed/steering radii and in situ rotations) and climbed an 8.9° slope. Moreover, it carried a maximal payload of 520 g and provided a minimal cost of transport of 3.9. U2sonobot accomplishes fast and load-carrying movements, implying its potential applicability to optical focusing/scanning systems.
|
|
16:50-16:55, Paper TuDT6.3 | |
A Physics-Informed Neural Network for the Calibration of Electromagnetic Navigation Systems |
|
Ernst, Pascal | ETH Zurich, MagnebotiX AG |
Gervasoni, Simone | ETH Zurich, Multi Scale Robotics Laboratory |
Sivakumaran, Derick | Swiss Federal Institute of Technology (ETH Zurich) |
Masina, Enea | Magnebotix AG |
Sargent, David Fisher | Inst. Molecular Biology and Biophysics, ETHZ
Nelson, Bradley J. | ETH Zurich |
Boehler, Quentin | ETH Zurich |
Keywords: Micro/Nano Robots, Deep Learning Methods, Calibration and Identification
Abstract: Electromagnetic Navigation Systems enable remote actuation of untethered micro and nanorobots, as well as the precise control of magnetic surgical tools for minimally invasive medical procedures. Accurate modeling of the magnetic fields generated by the electromagnets composing these systems is essential for achieving reliable and precise navigation. Existing modeling approaches either neglect nonlinear effects such as electromagnet saturation or fail to ensure that the field predictions are physically consistent. These limitations can lead to significant prediction errors, particularly in the estimation of field gradients, which directly impacts force calculations. As a result, inaccurate gradient predictions degrade force control performance, limiting the precision of magnetic actuation. In this work, we investigate physics-informed and data-driven modeling techniques to improve the accuracy of magnetic field and gradient predictions. Additionally, we introduce an approach for solving the inverse problem, developing models capable of predicting the required electromagnet currents to generate a desired magnetic field and gradient based on this approach. By incorporating physical constraints into the models, we enhance the predictive accuracy and physical consistency of the field estimates. In the experimental section, we demonstrate the benefits of these methods to enable improved force control in open-loop for untethered robots using a small-scale Electromagnetic Navigation System.
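One standard way to make a learned field "physically consistent" in the sense discussed here is to penalize violations of Gauss's law for magnetism (div B = 0) with automatic differentiation; the curl condition in current-free regions can be penalized the same way. A minimal PyTorch sketch of that penalty (network size, sample points, and loss weight are illustrative, and this is not the authors' architecture or their inverse-problem models):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 3))  # B(p)

    def divergence(p):                        # p: (3,) point in space
        J = torch.autograd.functional.jacobian(net, p, create_graph=True)
        return torch.einsum('ii->', J)        # trace of dB/dp

    def physics_loss(pts, B_meas, w_div=0.1):
        fit = ((net(pts) - B_meas) ** 2).sum(dim=-1).mean()
        div = torch.stack([divergence(p) for p in pts]).pow(2).mean()
        return fit + w_div * div

    pts = torch.randn(16, 3)
    physics_loss(pts, torch.zeros(16, 3)).backward()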
|
|
16:55-17:00, Paper TuDT6.4 | |
Multi-DoF Optothermal Microgripper for Micromanipulation Applications |
|
Chen, Kaiwen | Imperial College London |
Thompson, Alexander James | Imperial College London |
Ahmad, Belal | Imperial College London |
Keywords: Micro/Nano Robots, Grippers and Other End-Effectors, Automation at Micro-Nano Scales
Abstract: Microgrippers have emerged in minimally invasive surgery and biomedical applications, enabling tasks such as gripping, micro-assembly, and cell manipulation. Realizing microgrippers with multiple degrees-of-freedom (DoFs) enables higher dexterity and multi-functionality. However, due to their small size and limited working space, the development of microgrippers with multi-DoF faces great challenges, requiring complex fabrication technologies and actuation mechanisms. Here we report a novel optothermally-actuated multi-DoF microgripper with multiple functionalities for micromanipulation. For this, three types of optothermal microactuators, namely bimaterial microjoints, chevron-shaped microactuators, and hot-cold arm microactuators are considered. The suitability of the chevron-shaped and hot-cold arm microactuators for mechanical pushing tasks is evaluated through modeling and simulation. Then, a 3-DoF microgripper incorporating two spiral bimaterial microjoints and one chevron-shaped microactuator is designed and fabricated. The selective and individual actuation of these microactuators is facilitated using a fiber bundle and a digital micromirror device. Finally, the performance and multi-functionality of the microgripper are demonstrated by performing multiple micromanipulation tasks of mechanical pushing and pick-and-release of microbeads. This work provides a proof of concept of an optothermal multi-DoF microgripper with multiple functionalities, opening the way for advanced dexterity at the microscale that is difficult to achieve using the current technology.
|
|
17:00-17:05, Paper TuDT6.5 | |
Oscillation Suppression of Acoustic Trapping: A Disturbance Observer-Based Approach |
|
Jia, Yuyu | Shanghaitech University |
Gong, Yizhou | ShanghaiTech University |
Wang, Mingyue | Shanghaitech Univerisity |
Sun, Zhenhuan | Shanghaitech University |
Shi, Yalin | Shandong University |
Wang, Yang | Shanghaitech University |
Liu, Song | ShanghaiTech University |
Keywords: Micro/Nano Robots, Grippers and Other End-Effectors
Abstract: Acoustic tweezers have been a valuable tool across various fields, from nano-microfabrication to biology. Their unique characteristics enable three-dimensional particle manipulation, where acoustic trapping serves as a fundamental requirement. However, traditional methods struggle to maintain steady particle positioning due to nonlinear forces and complex dynamic coupling effects. As a result, particle oscillations are inevitable and cannot be effectively compensated by predesigned acoustic trapping. To address these challenges, this study introduces a novel visual feedback control approach that dynamically adjusts the acoustic field distribution to mitigate oscillations along the z-axis of the acoustic trapping. A binocular microscopic vision system is employed for precise particle localization, while a disturbance observer estimates the effects of strong nonlinearity and uncertainties of the acoustic trapping. The proposed methodology is validated through simulations and experiments, demonstrating a significant reduction in z-axis oscillations from 1.33× the wavelength to within 0.03× the wavelength. This advancement marks a step forward in achieving precise and complex acoustic manipulation using traveling-wave acoustic tweezers.
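A disturbance observer of the kind described estimates the lumped unmodeled force as the low-pass-filtered residual between measured and nominal dynamics, and the estimate is then subtracted from the command. A generic first-order sketch for a point-mass nominal model (the paper's observer and trap dynamics are richer; the cutoff, mass, and rates below are illustrative):

    import numpy as np

    class DisturbanceObserver:
        def __init__(self, mass, cutoff_hz, dt):
            self.m, self.dt = mass, dt
            a = 2.0 * np.pi * cutoff_hz * dt
            self.alpha = a / (1.0 + a)        # discrete low-pass gain
            self.d_hat = 0.0

        def update(self, accel_meas, force_cmd):
            d_raw = self.m * accel_meas - force_cmd   # residual force
            self.d_hat += self.alpha * (d_raw - self.d_hat)
            return self.d_hat                 # subtract from next command

    dob = DisturbanceObserver(mass=1e-6, cutoff_hz=20.0, dt=1e-3)
    d_hat = dob.update(accel_meas=0.02, force_cmd=1e-8)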
|
|
17:05-17:10, Paper TuDT6.6 | |
Simultaneous 6-DOF Localization and Scanning Angle Detection of Magnetic Ultrasound Capsule Endoscope (MUSCE) with Internal Sensors |
|
Yang, Zhengxin | Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences
Liu, Lihao | University of Science and Technology of China |
Jiao, Yang | Suzhou Institute of Biomedical Engineering and Technology (SIBET)
Cui, Yaoyao | Suzhou Institute of Biomedical Engineering and Technology (SIBET)
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: Localization of magnetic capsule endoscope (MCE) is essential for accurate actuation. Despite extensive progress in pose estimation using internal magnetic field sensors and external magnetic sources, it remains challenging to achieve localization when a time-varying internal magnetic field (IMF) exists. This study presents a compound sensing method of magnetic ultrasound capsule endoscope (MUSCE) based on an internal magnetic field sensor array and external permanent magnet source, achieving simultaneous 6-degree of freedom (DOF) localization for magnetic navigation and real-time ultrasound (US) beam scanning angle detection for distortion-free US imaging reconstruction. Firstly, a MUSCE consisting of an internal magnet, US transducer, and Hall sensors is designed, enabling simultaneous spiral structure-based locomotion and high-quality endoluminal US imaging. Then, a compound sensing strategy is presented, realizing the separation of time-varying IMF and external magnetic field (EMF), allowing synchronous 6-DOF MUSCE localization and US beam scanning angle detection. Finally, the effectiveness of the presented method is validated by tests. The demonstrated static localization error is 4.08 ± 1.91 mm in position norm and 2.46 ± 1.31° in orientation, across a workspace shared with the robotic manipulator. Also, the scanning angle detection can rectify the US image distortion, showing potential clinical applications.
|
|
17:10-17:15, Paper TuDT6.7 | |
Design and Control of a Miniaturized Magnetic Driven Deformable Capsule Robot for Targeted Drug Delivery (I) |
|
Cai, Zhuocong | Nankai University |
Qin, Yanding | Nankai University |
Han, Jianda | Nankai University |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: A magnetically driven deformable capsule robot is proposed for delivering targeted drugs to the continuously peristaltic human intestine. The robot is remotely controlled using an embedded radially magnetized permanent-magnet ring and an external permanent magnet operated by a robotic manipulator. The uniqueness of this design lies in its deformation mechanism, which is easily controlled through the external permanent magnet. While moving through the human intestine, the robot can switch between a locomotion mode and a drug-delivery mode, making it safer and less invasive. The proposed robot can actively deform in the radial direction, making it promising for various advanced functions such as compliant locomotion, anchoring, and drug delivery. The design and fabrication of the robot are presented. An analytical model of its motion under external magnetic torque is established. The feasibility of the proposed design is studied computationally and then experimentally tested and validated on a prototype.
|
|
17:15-17:20, Paper TuDT6.8 | |
VALP: Vision-Based Adaptive Laser Propulsor for Noncontact Manipulation at the Air-Liquid Interface (I) |
|
Hui, Xusheng | Northwestern Polytechnical University |
Luo, Jianjun | Northwestern Polytechnical University(P.R.China) |
Keywords: Micro/Nano Robots, Motion Control, Dexterous Manipulation
Abstract: Noncontact manipulation at the air-liquid interface holds vast potential for applications in biochemistry analysis, flexible electronics, micromanufacturing, and microrobotics. The universal and versatile control of passive floating objects remains a great challenge. Here, we present a vision-based adaptive laser propulsor (VALP) system for the motion control of generalized millimeter-scale floating objects. The VALP system actuates the floating objects directionally through parallel thermocapillary flows induced by ultrafast laser scanning. A simplified kinetic model is developed to simulate the dynamic response of floating objects in the VALP system, and corresponding gains are proven effective in closed-loop control experiments for stationary target positioning and complex trajectory replication. The maximum velocity of the floating object reaches 13.9 mm/s, while its position-holding error for the intended target remains within 0.48 mm. The trajectory replication error for a typical Lissajous curve is below 0.4 mm. Multiple objects can be manipulated simultaneously through the ultrafast scanning and multiplexing of the laser beam. Adaptability is validated in multiple generalization experiments for floating objects with different sizes, materials, shapes, and other characteristics. With the capability of highly directional propulsion, the VALP system enables smooth, fast, precise, and adaptive closed-loop motion control of generalized floating objects without the need for their prior information, positioning it as a universal and versatile platform for noncontact manipulation at the air–liquid interface.
|
|
TuDT7 |
307 |
Motion and Path Planning 4 |
Regular Session |
|
16:40-16:45, Paper TuDT7.1 | |
SEB-Naver: A SE(2)-Based Local Navigation Framework for Car-Like Robots on Uneven Terrain |
|
Li, Xiaoying | HEU |
Xu, Long | Zhejiang University |
Huang, Xiaolin | Huzhou Institute of Zhejiang University |
Xue, Donglai | Huzhou Institute of Zhejiang University
Zhang, Zhihao | Huzhou Institute of Zhejiang University |
Han, Zhichao | Zhejiang University |
Xu, Chao | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, Collision Avoidance
Abstract: Autonomous navigation of car-like robots on uneven terrain poses unique challenges compared to flat terrain, particularly in traversability assessment and terrain-associated kinematic modelling for motion planning. This paper introduces SEB-Naver, a novel SE(2)-based local navigation framework designed to overcome these challenges. First, we propose an efficient traversability assessment method for SE(2) grids, leveraging GPU parallel computing to enable real-time updates and maintenance of local maps. Second, inspired by differential flatness, we present an optimization-based trajectory planning method that integrates terrain-associated kinematic models, significantly improving both planning efficiency and trajectory quality. Finally, we unify these components into SEB-Naver, achieving real-time terrain assessment and trajectory optimization. Extensive simulations and real-world experiments demonstrate the effectiveness and efficiency of our approach. The code is open-sourced at https://github.com/ZJU-FAST-Lab/seb_naver.
|
|
16:45-16:50, Paper TuDT7.2 | |
An Unsupervised C-Uniform Trajectory Sampler with Applications to Model Predictive Path Integral Control |
|
Poyrazoglu, Oguzhan Goktug | University of Minnesota |
Mahesh, Rahul Moorthy | University of Minnesota - Twin Cities |
Cao, Yukang | University of Minnesota |
Chastek, William | University of Minnesota |
Isler, Volkan | The University of Texas at Austin |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, Motion Control
Abstract: Sampling-based model predictive controllers generate trajectories by sampling control inputs from a fixed, simple distribution such as the normal or uniform distributions. This sampling method yields trajectory samples that are tightly clustered around a mean trajectory. This clustering behavior in turn, limits the exploration capability of the controller and reduces the likelihood of finding feasible solutions in complex environments. Recent work has attempted to address this problem by either reshaping the resulting trajectory distribution or increasing the sample entropy to enhance diversity and promote exploration. In our recent work, we introduced the concept of C-Uniform trajectory generation [1], which allows the computation of control input probabilities to generate trajectories that sample the configuration space uniformly. In this work, we first address the main limitation of this method: the lack of scalability due to computational complexity. We introduce Neural C-Uniform, an unsupervised C-Uniform trajectory sampler that mitigates scalability issues by computing control input probabilities without relying on a discretized configuration space. Furthermore, we present CU-MPPI, which integrates Neural C-Uniform sampling into existing MPPI variants. We analyze the performance of CU-MPPI in both simulation and real-world experiment scenarios. Our results indicate that CU-MPPI achieves drastic improvements in performance in settings where the optimal solution has high-curvature trajectories. Additionally, it performs as well as or better than baseline methods in dynamic environments. Additional results can be found at the project website.
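For readers outside the MPPI literature: once rollouts and their costs exist, the control update is an exponentially weighted average, and the C-Uniform idea changes only where the rollouts come from. A sketch of the standard weighting step (array shapes and the temperature λ are illustrative):

    import numpy as np

    def mppi_update(costs, controls, lam=1.0):
        # costs: (K,) rollout costs; controls: (K, T, m) sampled inputs.
        beta = costs.min()                    # shift for numerical stability
        w = np.exp(-(costs - beta) / lam)
        w /= w.sum()
        return (w[:, None, None] * controls).sum(axis=0)   # (T, m) plan

    K, T, m = 256, 30, 2
    plan = mppi_update(np.random.rand(K), np.random.randn(K, T, m))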
|
|
16:50-16:55, Paper TuDT7.3 | |
Fast Shortest Path Polyline Smoothing with G^1 Continuity and Bounded Curvature |
|
Pastorelli, Patrick | University of Trento |
Dagnino, Simone | University of Trento |
Saccon, Enrico | University of Trento |
Frego, Marco | Free University of Bolzano |
Palopoli, Luigi | University of Trento |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning, Wheeled Robots
Abstract: In this work, we propose the Dubins Path Smoothing (DPS) algorithm, a novel and efficient method for smoothing polylines in motion planning tasks. DPS applies to motion planning of vehicles with bounded curvature. In the paper, we show that the generated path: 1) has minimal length, 2) is G^1 continuous, and 3) is collision-free by construction, under mild hypotheses. We compare our solution with the state of the art and show its advantages both in terms of computation time and of the length of the computed path.
|
|
16:55-17:00, Paper TuDT7.4 | |
KRRF: Kinodynamic Rapidly-Exploring Random Forest Algorithm for Multi-Goal Motion Planning |
|
Ježek, Petr | Faculty of Electrical Engineering, Czech Technical University in Prague |
Minařík, Michal | Czech Technical University in Prague |
Vonasek, Vojtech | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Keywords: Motion and Path Planning, Nonholonomic Motion Planning
Abstract: The problem of kinodynamic multi-goal motion planning is to find a trajectory over multiple target locations with an a priori unknown sequence of visits. The objective is to minimize the cost of the trajectory planned in a cluttered environment for a robot with a kinodynamic motion model. This problem has yet to be efficiently solved, as it combines two NP-hard problems: the Traveling Salesman Problem (TSP) and the kinodynamic motion planning problem. We propose a novel approximate method called Kinodynamic Rapidly-exploring Random Forest (KRRF) to find a collision-free multi-goal trajectory that satisfies the motion constraints of the robot. KRRF simultaneously grows kinodynamic trees from all targets towards all other targets while using the other trees as a heuristic to boost the growth. Once the target-to-target trajectories are planned, their cost is used to solve the TSP to find the sequence of targets. The final multi-goal trajectory satisfying kinodynamic constraints is planned by guiding the RRT-based planner along the target-to-target trajectories in the TSP sequence. Compared with existing approaches, KRRF provides shorter target-to-target trajectories and final multi-goal trajectories with 1.1–2 times lower costs while being computationally faster in most test cases. The method will be published as open-source software.
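As a small illustration of the sequencing step described above: once the target-to-target trajectories are planned, their costs form a matrix over which a TSP is solved. The brute-force sketch below (the function name, the open-tour convention, and the fixed start at target 0 are assumptions) is adequate for the handful of goals typical in multi-goal benchmarks.

import itertools
import numpy as np

def target_sequence(cost):
    # cost[i, j]: cost of the planned trajectory from target i to j.
    # Returns the cheapest visiting order starting at target 0,
    # without returning to the start (open tour).
    n = cost.shape[0]
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        c = sum(cost[a, b] for a, b in zip(order, order[1:]))
        if c < best_cost:
            best, best_cost = order, c
    return list(best), best_cost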
|
|
17:00-17:05, Paper TuDT7.5 | |
FSDP: Fast and Safe Data-Driven Overtaking Trajectory Planning for Head-To-Head Autonomous Racing Competitions |
|
Hu, Cheng | Zhejiang University |
Huang, Jihao | Zhejiang University |
Mao, Wule | Zhejiang University |
Fu, Yonghao | Zhejiang University |
Chi, Xuemin | Zhejiang University |
Qin, Haotong | ETH Zurich |
Baumann, Nicolas | ETH |
Liu, Zhitao | Zhejiang University |
Magno, Michele | ETH Zurich |
Xie, Lei | Zhejiang University |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Collision Avoidance
Abstract: Generating overtaking trajectories in autonomous racing remains a challenge, as the trajectory must satisfy the vehicle's dynamics and ensure safety and real-time performance. We propose the Fast and Safe Data-Driven Planner in this work to address this challenge. Sparse Gaussian predictions are introduced to improve both the efficiency and accuracy of opponent predictions. Building upon these predictions, we employ a bi-level quadratic programming framework to generate an overtaking trajectory. The first level uses polynomial fitting to generate a rough trajectory, from which reference states and control inputs are derived for the second level. The second level formulates a model predictive control optimization problem in the Frenet frame, generating a trajectory that satisfies both kinematic feasibility and safety. Experimental results show that our method outperforms the state of the art, achieving an 8.93% higher overtaking success rate, accommodating a higher maximum opponent speed, ensuring a smoother ego trajectory, and reducing computation time by 74.04% compared to the Predictive Spliner method. The code is available at: https://github.com/ZJU-DDRX/FSDP.
|
|
17:05-17:10, Paper TuDT7.6 | |
Stochastic Trajectory Optimization for Robotic Skill Acquisition from a Suboptimal Demonstration |
|
Ming, Chenlin | Shanghai Jiao Tong University |
Wang, Zitong | Shanghai Jiao Tong University |
Zhang, Boxuan | Technical University of Munich |
Cao, Zhanxiang | Shanghai Jiao Tong University |
Duan, Xiaoming | Shanghai Jiao Tong University |
He, Jianping | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Learning from Demonstration
Abstract: Learning from Demonstration (LfD) has emerged as a crucial method for robots to acquire new skills. However, when given suboptimal task trajectory demonstrations with shape characteristics reflecting human preferences but subpar dynamic attributes such as slow motion, robots not only need to mimic the behaviors but also optimize the dynamic performance. In this work, we leverage optimization-based methods to search for a superior-performing trajectory whose shape is similar to that of the demonstrated trajectory. Specifically, we use Dynamic Time Warping (DTW) to quantify the difference between two trajectories and combine it with additional performance metrics, such as collision cost, to construct the cost function. Moreover, we develop a multi-policy version of the Stochastic Trajectory Optimization for Motion Planning (STOMP), called MSTOMP, which is more stable and robust to parameter changes. To deal with the jitter in the demonstrated trajectory, we further utilize the gain-controlling method in the frequency domain to denoise the demonstration and propose a computationally more efficient metric, called Mean Square Error in the Spectrum (MSES), that measures the trajectories’ differences in the frequency domain. We also theoretically highlight the connections between the time domain and the frequency domain methods. Finally, we verify our method in both simulation experiments and real-world experiments, showcasing its improved optimization performance and stability compared to existing methods. The source code can be found at https://ming-bot.github.io/MSTOMP.github.io.
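The abstract's MSES metric compares trajectories in the frequency domain; the sketch below reads it as the mean squared error between magnitude spectra of time-resampled trajectories. The resampling length and the exact normalization are assumptions and may differ from the paper's definition.

import numpy as np

def mses(traj_a, traj_b, n=256):
    # Mean Square Error in the Spectrum between two (T_i, D) trajectories,
    # interpreted here as the MSE between magnitude spectra after
    # resampling both trajectories to a common length.
    def spectrum(traj):
        traj = np.asarray(traj, dtype=float)
        t_old = np.linspace(0.0, 1.0, len(traj))
        t_new = np.linspace(0.0, 1.0, n)
        # Resample each dimension to a common length before the FFT.
        rs = np.stack([np.interp(t_new, t_old, traj[:, d])
                       for d in range(traj.shape[1])], axis=1)
        return np.abs(np.fft.rfft(rs, axis=0))
    sa, sb = spectrum(traj_a), spectrum(traj_b)
    return float(np.mean((sa - sb) ** 2))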
|
|
17:10-17:15, Paper TuDT7.7 | |
Reactive 3D Motion Planning in Dynamic Environments Using Efficient Model Predictive Control Via Circular Fields |
|
Zeug, Fabrice | Gottfried Wilhelm Leibniz University Hanover |
Kleinjohann, Sarah | Leibniz University Hannover |
Lilge, Torsten | Leibniz Universität Hannover |
Becker, Marvin | Gottfried Wilhelm Leibniz Universität Hannover |
Müller, Matthias A. | Gottfried Wilhelm Leibniz Universität Hannover |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Reactive and Sensor-Based Planning
Abstract: In this paper, we present a novel online global reactive motion planner that synergizes the benefits of reactive control and model predictive control (MPC). By applying circular fields, the planner significantly simplifies the problem of determining control inputs for mobile robots and manipulators, making real-time MPC feasible even in complex and dynamic three-dimensional environments. This approach utilizes the performance advantages of optimal control while maintaining reactivity to environmental changes and computational efficiency. The proposed motion planner is evaluated in various simulated scenarios, including complex dynamic environments with up to 100 moving obstacles, and is compared to different state-of-the-art approaches.
|
|
17:15-17:20, Paper TuDT7.8 | |
Time-Optimal Path Parameterization with Viscous Friction and Jerk Constraints Based on Reachability Analysis |
|
Dio, Maximilian | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Wahrburg, Arne | ABB AG, Corporate Research Germany |
Enayati, Nima | ABB AG |
Graichen, Knut | Friedrich Alexander University Erlangen-Nürnberg |
Völz, Andreas | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: This paper presents a novel approach for time-optimal path parameterization based on reachability analysis for robotic systems with viscous friction in the dynamics and jerk constraints. The main step is the backward propagation of controllable sets through a linear second-order system. In order to avoid the unbounded growth of the number of constraints, the sets are approximated by a ray shooting algorithm. Using a convex relaxation, the required set expansion can be solved with second-order cone programming. Evaluation results for a 6-degree-of-freedom (DOF) robot arm highlight the advantages of the method for computing jerk-limited trajectories.
|
|
TuDT8 |
308 |
Medical Robots and Systems 4 |
Regular Session |
|
16:40-16:45, Paper TuDT8.1 | |
Mechanism Design, Optimization, and Experimental Validation of an Ultrasound-Guided Series-Parallel Hybrid Robot for Prostate Transperineal Puncture |
|
Li, Haiyuan | Beijing University of Posts and Telecommunications |
Li, Yanbo | Beijing University of Posts and Telecommunications |
Liu, Yuchen | School of Intelligent Engineering and Automation |
He, Tian | Beijing University of Posts and Telecommunications |
Shi, Yilun | Beijing Healinno Medical Technology Co |
Keywords: Medical Robots and Systems, Mechanism Design
Abstract: Transperineal prostate puncture requires the physician to place a needle manually, which is challenging and presents a steep learning curve. This paper proposes a novel ultrasound-guided series-parallel hybrid robot with the aim of enhancing transperineal procedures. For maximum prostate coverage with flexibility and accuracy in needle placement, a 5-degree-of-freedom series-parallel hybrid mechanism with two serial manipulators, a linear feeding unit, and a US probe positioning mechanism is designed. In addition to the mechanical design and kinematics modeling, a QPSO algorithm is proposed to optimize the mechanical parameters. Upon comprehensive comparison with alternative algorithms, the optimization outcomes fully align with clinical requirements. The prototype was fabricated and verified through needle insertion experiments in different scenarios to assess its feasibility. The absolute positioning error of the robot is 1.47 mm in water and 1.75 mm in gel phantom.
|
|
16:45-16:50, Paper TuDT8.2 | |
A Precise and High-Load Capacity Miniature 6-DOF Manipulator for Microsurgery (I) |
|
Li, Liang | School of Biomedical Engineering and Informatics, Nanjing Medical University |
Li, Heng | Tsinghua University |
Sui, Chaoye | Tsinghua University |
Ding, Hui | Tsinghua University |
Liu, Bin | Nanjing Medical University |
Jianqing, Li | School of Biomedical Engineering and Informatics, Nanjing Medical University |
Wang, Guangzhi | Tsinghua University |
Keywords: Medical Robots and Systems, Micro/Nano Robots, Flexible Robotics
Abstract: Miniaturized manipulators are invaluable for microsurgery; however, they often face limitations in precision, load capacity, and degrees of freedom (DOFs) due to size and weight constraints. In this work, we introduce a 6-DOF mini manipulator designed to address these challenges. Powered by cost-effective stepper linear motion modules, the proposed device incorporates a dual-plane mechanism for 4-DOF XY motion. We also innovatively integrated a compliant 2-DOF rod drive system for Z-axis rotation and translation. The manipulator's forward/inverse and remote center of motion (RCM) kinematics models are provided. Furthermore, we developed a microsurgical platform equipped with microvision sensing and remote-control functions. This manipulator, which weighs less than 72 g, enables precise 6-DOF control for tools with diameters less than 0.5 mm. It delivers more than 3 N of thrust force along the compliant rod drive axis, achieving at least 10 μm motion resolution and a 9 N maximum output force per DOF. Rigorous testing was performed, demonstrating the system's ability to successfully perform puncture and drilling operations on porcine skin and bone. Additionally, we showed its effectiveness in ocular surgeries, demonstrating its potential for use as a cost-effective, portable device in minimally invasive procedures.
|
|
16:50-16:55, Paper TuDT8.3 | |
Model Predictive Control for 3D Steerable Needles: A Hierarchical Approach to Reduce Tissue Trauma |
|
Hussain, Sajjad | University of Naples Federico II |
Tavakoli, Mahdi | University of Alberta |
Ficuciello, Fanny | Università di Napoli Federico II |
Siciliano, Bruno | Univ. Napoli Federico II |
Keywords: Medical Robots and Systems, Optimization and Optimal Control
Abstract: This paper presents a three-dimensional (3D) control framework for bevel-tip steerable needles that combines Model Predictive Control (MPC) with a hierarchical supervisory logic. The MPC layer uses a reduced-order two-mode switching model to generate continuous control actions, while the supervisory logic adaptively prioritizes in-plane and out-of-plane corrections based on real-time error magnitudes. This hierarchical approach smoothly modulates the axial rotation commands to minimize abrupt needle flips, thereby reducing the so-called “drilling effect,” a key source of tissue trauma. Simulation results show that the proposed approach reduces tissue trauma by more than 50% compared to conventional pulse-width-modulated sliding-mode controllers while achieving mean absolute and targeting errors in the submillimeter range.
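A minimal sketch of the supervisory layer described above: the controller prioritizes in-plane versus out-of-plane correction from real-time error magnitudes, with hysteresis so the axial-rotation command is not toggled by noise (a proxy for limiting the drilling effect). The thresholds and mode interface are illustrative assumptions, not values from the paper.

def supervisory_mode(in_plane_err, out_plane_err, prev_mode,
                     enter=2.0, exit_=1.0):
    # Switch to out-of-plane correction only when its error clearly
    # dominates; switch back only once it clearly subsides. The
    # enter/exit ratios implement the hysteresis band.
    ratio = out_plane_err / max(in_plane_err, 1e-9)
    if prev_mode == "in_plane" and ratio > enter:
        return "out_of_plane"
    if prev_mode == "out_of_plane" and ratio < exit_:
        return "in_plane"
    return prev_mode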
|
|
16:55-17:00, Paper TuDT8.4 | |
An Improved A-Star Algorithm for Path Planning in Robot-Assisted Long Bone Fracture Reduction |
|
Gao, Qin | Chongqing University of Technology |
Wu, Xiaoyong | Chongqing University of Technology |
Ding, Jun | Chongqing University of Technology |
Yuan, Bo | Chongqing University of Technology |
Wang, Yujin | Chongqing University of Technology |
Shu, Ruizhi | Chongqing University of Technology |
Keywords: Medical Robots and Systems, Parallel Robots, Motion and Path Planning
Abstract: Long bone fractures are a common clinical condition, but the development of robotic systems for closed reduction surgery is still at an early stage. A key challenge in this area is the lack of efficient and accurate path planning algorithms. To address this problem, this study proposes an improved A-star (A*) algorithm for path planning to enhance the accuracy and efficiency of fracture reduction. The algorithm first generates an initial path using the basic A* algorithm. An artificial potential field (APF) algorithm is then incorporated to optimize the generated sample nodes and enhance obstacle avoidance. In addition, a cylindrical bounding box method is adopted for collision detection, and B-spline curves are used to smooth the generated path. Experimental validation was carried out on a fracture reduction robot system, demonstrating that the optimized path achieves clinically acceptable accuracy and significantly improves the precision and reliability of the reduction procedure.
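To make the A*-plus-APF pipeline concrete, the sketch below folds an APF-style repulsive term into the A* traversal cost on a 2D grid, so expanded paths bend away from obstacles before any B-spline smoothing. The grid representation, gains, and helper names are assumptions for illustration, not the paper's implementation.

import heapq
import itertools
import numpy as np

def astar_apf(grid, start, goal, obstacles, k_rep=5.0, d0=3.0):
    # grid: 0 = free, 1 = occupied; start/goal/obstacles: (row, col) tuples.
    def repulsion(p):
        # Classic APF repulsive potential, active within range d0.
        d = min(np.hypot(p[0] - o[0], p[1] - o[1]) for o in obstacles)
        return k_rep * (1.0 / d - 1.0 / d0) ** 2 if 0 < d < d0 else 0.0

    h = lambda p: np.hypot(p[0] - goal[0], p[1] - goal[1])  # admissible heuristic
    tie = itertools.count()                                  # heap tie-breaker
    frontier = [(h(start), next(tie), 0.0, start, None)]
    came, seen = {}, set()
    while frontier:
        _, _, g, cur, parent = heapq.heappop(frontier)
        if cur in seen:
            continue
        seen.add(cur)
        came[cur] = parent
        if cur == goal:
            path = []
            while cur is not None:          # walk parents back to start
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nxt = (cur[0] + dx, cur[1] + dy)
                if (dx or dy) and 0 <= nxt[0] < grid.shape[0] \
                        and 0 <= nxt[1] < grid.shape[1] \
                        and grid[nxt] == 0 and nxt not in seen:
                    g2 = g + np.hypot(dx, dy) + repulsion(nxt)
                    heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt, cur))
    return None  # no path found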
|
|
17:00-17:05, Paper TuDT8.5 | |
Safe Learning by Constraint-Aware Policy Optimization for Robotic Ultrasound Imaging (I) |
|
Duan, Anqing | Mohamed Bin Zayed University of Artificial Intelligence |
Yang, Chenguang | University of Liverpool |
Zhao, Jingyuan | Hong Kong Polytechnic University |
Huo, Shengzeng | The Hong Kong Polytechnic University |
Zhou, Peng | Great Bay University |
Ma, Wanyu | The Hong Kong Polytechnic University |
Zheng, Yong Ping | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Medical Robots and Systems, Reinforcement Learning, Safety in HRI
Abstract: Ultrasound-based medical examination usually requires establishing proper contact between an ultrasound probe and a human body that ensures the quality of ultrasound images. The scanning skills are quite challenging for a robot to learn, primarily due to the complex coupling between the applied force profile and the resulting ultrasound image quality. While reinforcement learning appears as a powerful tool for learning complex robot skills, the deployment of these algorithms in medical robots demands special attention due to the evident safety concerns that arise from physical probe-tissue interactions. In this paper, we explicitly consider external constraints on the force magnitude when searching for the optimal policy parameters to enhance safety during ultrasound-guided robotic interventions. In particular, we study policy optimization under the framework of a constrained Markov decision process. The resulting gradient-based policy update is then subject to the involved constraints, which can be readily addressed by the primal-dual interior-point technique. In addition, upon the observation that policy update requires consecutive policies to be close to each other to have stable and robust performance with reinforcement learning algorithms, we design the learning rate of the policy gradient from an imitation perspective. The performance of the proposed constraint-aware policy optimization method is validated with experiments of robotic ultrasound imaging for spinal diagnosis. Note to Practitioners—This paper was motivated by the problem of safely learning the optimal interaction force strategy to facilitate robotic ultrasound imaging. Existing approaches to robotic ultrasound imaging usually empirically set a constant value for the scanning force, despite the fact that the force strategy plays an important role in the quality of the ultrasound images. This paper suggests the usage of reinforcement learning to identify the optimal interaction force due to the complex acoustic coupling between the force and the ultrasound image quality. Specifically, we propose constraint-aware reinforcement learning in view of the safety-critical issues that result from physical human-probe interaction. We then conduct a theoretical analysis of the proposed safe reinforcement learning, including monotonic improvement and policy value bound under mild assumptions. Preliminary real experiments on ultrasound imaging of the spine of a phantom for scoliosis assessment suggest that the proposed approach can safely learn the optimal scanning force without violating the prescribed force threshold. In the future, we would like to apply our approach to learning the optimal scanning force on different organs of interest of human subjects.
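The following sketch illustrates the constrained-MDP idea in its simplest primal-dual form: policy parameters follow the Lagrangian gradient while a multiplier rises whenever the average contact force exceeds the threshold. The paper solves the constrained step with a primal-dual interior-point technique; plain dual ascent and all symbols here are simplifying assumptions.

import numpy as np

def primal_dual_step(theta, lam, grad_return, grad_cost,
                     avg_force, f_max, lr_theta=1e-3, lr_lam=1e-2):
    # Primal step: ascend the Lagrangian J(theta) - lam * C(theta),
    # i.e. reward gradient minus lam times the constraint gradient.
    theta = theta + lr_theta * (grad_return - lam * grad_cost)
    # Dual step: the multiplier grows while the force constraint is
    # violated and decays (clipped at zero) once it is satisfied.
    lam = max(0.0, lam + lr_lam * (avg_force - f_max))
    return theta, lam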
|
|
17:05-17:10, Paper TuDT8.6 | |
Non-Contact Hand-Guided Coarse Positioning of Neurosurgical Instrument Insertion End Effector Based on Magnetic Sensing |
|
Xian, Yitian | The Chinese University of Hong Kong |
Sun, Yichong | The Chinese University of Hong Kong |
Luo, Xiao | The Chinese University of Hong Kong |
Hu, Yingbai | Technische Universität München |
Zou, Limin | The Chinese University of Hong Kong |
Chan, Tat-Ming | Prince of Wales Hospital |
Chan, David Yuen Chung | Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong |
Li, Zheng | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Sensor-based Control
Abstract: Despite the advantages of neurosurgical systems, achieving intuitive and safe collaboration with the robot during coarse positioning of the instrument insertion end effector (IIEE) remains a critical issue. In this paper, we propose a novel non-contact hand-guided method for this task based on magnetic sensing. First, a wearable magnet band and a magnetic sensor are designed, enabling magnetic localization of the surgeon's hand. Second, a quadratic-programming-based controller is implemented to guarantee pose-based servo performance, higher rotational manipulability for IIEE fine alignment, and avoidance of joint position and velocity limits. For evaluation, two experiments are designed and conducted. Results show that the magnetic localization algorithm achieves < 4.7 mm and 2.6° errors in a dynamic path tracking test, providing an accurate magnet location for hand guidance. Moreover, the workflow of the proposed solution in a brain biopsy scenario demonstrates enhanced IIEE rotational manipulability (an 11.6% increase at the final configuration) and improved collision-avoidance safety when another surgeon approaches for cannula delivery. This research contributes to enhanced intuitiveness and safety for surgeon-robot collaborative coarse positioning of the IIEE in neurosurgery.
|
|
17:10-17:15, Paper TuDT8.7 | |
Continuous Renal Calculi Tracking for Autonomous Robotic Ureteroscopic Lithotripsy |
|
Zhang, Quan | Tongji University |
Nie, Xiao Hang | TongJi University |
Sun, Jingqian | Tongji University |
Tang, Ruichao | Tongji University |
Tang, Yichao | Tongji University |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Machine Learning for Robot Control
Abstract: Renal calculi, while not inherently life-threatening, can induce excruciating pain during acute episodes. The predominant clinical treatment, ureteroscopic lithotripsy (URS), currently faces challenges including restricted maneuverability, frequent manual adjustments during dynamic calculi movement, and tissue damage risk, highlighting the need for robotic assistance. This study proposes an autonomous lithotripsy system through three integrated technological advancements: 1) a robotic ureteroscope with sub-millimeter-scale positioning accuracy; 2) a concatenated Quenching-net semantic segmentation visual processing framework based on convolutional neural networks, achieving a segmentation accuracy of 98.5% (validated on 12,768 endoscopic images); 3) a control strategy developed through collaborative deep reinforcement learning (DRL), enabling a 93% success rate in tracking randomly moving calculi. This system's autonomous calculi localization capability reduces operator fatigue and may mitigate cognitive bias in calculi targeting. It demonstrates how embodied AI enhances medical procedural precision while preserving human oversight in critical decisions.
|
|
17:15-17:20, Paper TuDT8.8 | |
A Wireless 6-DoF Pose Tracking System Using a Triaxially Anisotropic Soft Magnet (I) |
|
Liu, Suqi | South China University of Technology |
Wang, Heng | South China University of Technology |
Keywords: Localization, Medical Robots and Systems, Sensor Fusion
Abstract: Magnetic tracking, as a noncontacting and occlusion-free pose tracking method, has been increasingly employed to track various intracorporeal medical devices such as endoscopes and medical robots for monitoring, guidance, and automation of medical procedures. Existing electromagnet-based tracking systems cannot achieve wireless tracking, while permanent magnet-based systems are prone to magnetic disturbances and are only able to estimate 5-Degrees of Freedom (DoF) pose (3-DoF position and 2-DoF orientation). In this article, a new wireless 6-DoF magnetic pose tracking system is proposed using a stationary electromagnet as the primary magnetic source and a high-magnetic-permeability soft magnet as the sensitive element attached to the moving target. The soft magnet experiences a pose-dependent magnetization by the local field from the electromagnet, and its magnetic field is then measured by stationary sensors for pose estimation. The geometry of the soft magnet is designed with triaxially unequal dimensions (e.g., triaxial ellipsoids) to enable anisotropic magnetization and thereby 3-DoF orientation tracking. The magnetic response to the 6-DoF motion of the soft magnet is analytically modeled and experimentally validated. An artificial neural network is trained to invert the nonlinear measurement model and directly estimate the pose from magnetic measurements. Pose tracking experiments are conducted with simultaneous translation and rotation of the soft magnet, showing that the position error is below 5 mm and the orientation error is below 4°.
|
|
TuDT9 |
309 |
Semantic Scene Understanding: Visual Perception |
Regular Session |
|
16:40-16:45, Paper TuDT9.1 | |
3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering |
|
Xu, Rongtao | Spatialtemporal AI, China |
Gao, Han | Beijing University of Posts and Telecommunications |
Yu, Ming-Ming | BeiHang University |
An, Dong | Institute of Automation, Chinese Academy of Sciences |
Chen, Shunpeng | Beijing University of Posts and Telecommunications |
Wang, Changwei | Sdas |
Guo, Li | BUPT |
Liang, Xiaodan | Sun Yat-Sen University |
Xu, Shibiao | Beijing University of Posts and Telecommunications |
Keywords: Semantic Scene Understanding, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: With the growing need for diverse and scalable data in indoor scene tasks, such as question answering and dense captioning, we propose 3D-MoRe, a novel paradigm designed to generate large-scale 3D-language datasets by leveraging the strengths of foundational models. The framework integrates key components, including multi-modal embedding, cross-modal interaction, and a language model decoder, to process natural language instructions and 3D scene data. This approach facilitates enhanced reasoning and response generation in complex 3D environments. Using the ScanNet 3D scene dataset, along with text annotations from ScanQA and ScanRefer, 3D-MoRe generates 62,000 question-answer (QA) pairs and 73,000 object descriptions across 1,513 scenes. We also employ various data augmentation techniques and implement semantic filtering to ensure high-quality data. Experiments on ScanQA demonstrate that 3D-MoRe significantly outperforms state-of-the-art baselines, with the CIDEr score improving by 2.15%. Similarly, on ScanRefer, our approach achieves a notable increase in CIDEr@0.5 by 1.84%, highlighting its effectiveness in both tasks. Our code and generated datasets will be publicly released to benefit the community, and both can be accessed at https://3D-MoRe.github.io.
|
|
16:45-16:50, Paper TuDT9.2 | |
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration |
|
Alama, Omar | Carnegie Mellon University |
Bhattacharya, Avigyan | Carnegie Mellon University |
He, Haoyang | Carnegie Mellon University |
Kim, Seungchan | Carnegie Mellon University |
Qiu, Yuheng | Carnegie Mellon University |
Wang, Wenshan | Carnegie Mellon University |
Ho, Cherie | Carnegie Mellon University |
Keetha, Nikhil Varma | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Semantic Scene Understanding, Mapping, Deep Learning for Visual Perception
Abstract: Open-set semantic mapping is crucial for robotic systems to reason, search, and navigate in real-world environments. However, existing open-vocabulary semantic mapping systems often rely on computationally expensive, multi-stage and multi-model pipelines to extract pixel-level language-aligned features, limiting their deployment. Additionally, most approaches rely on dense depth maps to reconstruct their metric representations, making them incapable of leveraging non-metric semantic information. We address these challenges with RayFronts -- an open-set, task-agnostic 3D mapping system that encodes all available semantic information in voxels and ray-based frontiers, adapting to the availability of metric depth information. This enables object-search algorithms to significantly reduce their search volume. RayFronts introduces a novel image encoding approach, which surpasses state-of-the-art results in 3D semantic segmentation while improving throughput by 16.5 times. Furthermore, this work proposes a novel planner-agnostic, online-search evaluation framework for mapping systems, allowing researchers to assess their representations independently of global planner performance. Using this evaluation, we demonstrate that RayFronts outperforms existing semantic maps in semantic search efficiency.
|
|
16:50-16:55, Paper TuDT9.3 | |
Cross-Modal State Space Model for Real-Time RGB-Thermal Wild Scene Semantic Segmentation |
|
Guo, Xiaodong | Beijing Institute of Technology |
Lin, Zi'ang | Beijing Institute of Technology |
Hu, Luwen | Beijing Institute of Technology |
Deng, Zhihong | Beijing Institute of Technology |
Liu, Tong | Beijing Institute of Technology |
Zhou, Wujie | Zhejiang University of Science and Technology |
Keywords: Semantic Scene Understanding, Sensor Fusion, Field Robots
Abstract: The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. Nevertheless, multi-source data processing (e.g., Transformer-based approaches) imposes significant computational overhead, presenting challenges for resource-constrained systems. To resolve this critical limitation, we introduce CM-SSM, an efficient RGB-thermal semantic segmentation architecture leveraging a cross-modal state space modeling (SSM) approach. Our framework comprises two key components. First, we introduce a cross-modal 2D-selective-scan (CM-SS2D) module to establish SSM between RGB and thermal modalities, which constructs cross-modal visual sequences and derives hidden state representations of one modality from the other. Second, we develop a cross-modal state space association (CM-SSA) module that effectively integrates global associations from CM-SS2D with local spatial features extracted through convolutional operations. In contrast with Transformer-based approaches, CM-SSM achieves linear computational complexity with respect to image resolution. Experimental results show that CM-SSM achieves state-of-the-art performance on the CART dataset with fewer parameters and lower computational cost. Further experiments on the PST900 dataset demonstrate its generalizability. Codes are available at https://github.com/xiaodonguo/CMSSM.
|
|
16:55-17:00, Paper TuDT9.4 | |
HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation |
|
Abou Haidar, Samir | Mines Paris - PSL, and CEA |
Chariot, Alexandre | CEA |
Darouich, Mehdi | Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) |
Joly, Cyril | Mines ParisTech, PSL Research University |
Deschaud, Jean-Emmanuel | ARMINES |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Embedded Systems for Robotic and Automation
Abstract: LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots, requiring high accuracy and real-time processing, especially on resource-constrained embedded systems. Previous state-of-the-art methods often face a trade-off between accuracy and speed. Point-based and sparse convolution-based methods are accurate but slow due to the complexity of neighbor searching and 3D convolutions. Projection-based methods are faster but lose critical geometric information during the 2D projection. Additionally, many recent methods rely on test-time augmentation (TTA) to improve performance, which further slows the inference. Moreover, the pre-processing phase across all methods increases execution time and is demanding on embedded platforms. Therefore, we introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network. We first propose a novel pre-processing methodology that significantly reduces computational overhead. Then, we design the Conv-SE-NeXt feature extraction block to efficiently capture representations without deep layer stacking per network stage. We also employ a multi-scale range-point fusion backbone that leverages information at multiple abstraction levels to preserve essential geometric details, thereby enhancing accuracy. Experiments on the nuScenes and SemanticKITTI benchmarks show that HARP-NeXt achieves a superior speed-accuracy trade-off compared to all state-of-the-art methods, and, without relying on ensemble models or TTA, is comparable to the top-ranked PTv3, while running 24× faster. The code is available at https://github.com/SamirAbouHaidar/HARP-NeXt
|
|
17:00-17:05, Paper TuDT9.5 | |
MEFusion: Memory-Efficient Data Fusion for Real-Time 3D Reconstruction on Resource-Constrained Devices |
|
Cao, Ruizhi | Beihang University |
Wang, Rui | Beihang University |
Wen, Yu | University of Houston |
Xie, Chenhao | Beihang University |
Keywords: Semantic Scene Understanding, RGB-D Perception
Abstract: Online semantic 3D modeling from streaming RGB-D data fundamentally requires consistent fusion of 2D segmentation. Popular approaches address segmentation inconsistencies through histogram-based label aggregation, where each 3D element (point/voxel) maintains the frequency of candidate labels, which introduces prohibitive memory and computational overhead for resource-constrained devices. In response to this challenge, we propose MEFusion, a memory-efficient probabilistic fusion framework that avoids element-wise histogram aggregation. Specifically, we propose an element-wise probability update algorithm based on Bayesian estimation, where each voxel stores only one instance label and updates it based on a posterior probability to maintain segmentation consistency. Following 3D segmentation, we establish a segment-wise voting framework to aggregate semantic labels from historical data, where co-segment voxels share the semantic voting histogram, for semantic consistency. Our experiments demonstrate that our method achieves a memory reduction of 77% (85%) and a speed improvement of 58% (6.12×) on the desktop (embedded) platform while maintaining reconstruction accuracy comparable to the state-of-the-art point-cloud-based method.
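A minimal sketch of the per-voxel update described above, in which a voxel keeps a single (label, confidence) pair instead of a full class histogram: agreeing observations raise the posterior, disagreeing ones lower it, and the stored label is swapped once it becomes the less likely hypothesis. The binary-hypothesis likelihood model is an assumption; MEFusion's exact update may differ.

def fuse_voxel(label, p, new_label, p_new=0.9):
    # label, p: currently stored instance label and its confidence.
    # new_label: incoming 2D segmentation label; p_new: assumed
    # observation reliability.
    if label is None:                      # first observation
        return new_label, p_new
    if new_label == label:                 # evidence for the stored label
        p = p * p_new / (p * p_new + (1 - p) * (1 - p_new))
    else:                                  # evidence against it
        p = p * (1 - p_new) / (p * (1 - p_new) + (1 - p) * p_new)
        if p < 0.5:                        # challenger is now more likely
            label, p = new_label, 1.0 - p
    return label, p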
|
|
17:05-17:10, Paper TuDT9.6 | |
LuSeg: Efficient Negative and Positive Obstacles Segmentation Via Contrast-Driven Multi-Modal Feature Fusion on the Lunar |
|
Jiao, Shuaifeng | National University of Defense Technology |
Zeng, Zhiwen | National University of Defense Technology |
Su, Zhuoqun | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Zhou, Zongtan | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, RGB-D Perception
Abstract: As lunar exploration missions grow increasingly complex, ensuring safe and autonomous rover-based surface exploration has become one of the key challenges in lunar exploration tasks. In this work, we have developed a lunar surface simulation system called the Lunar Exploration Simulator System (LESS) and the LunarSeg dataset, which provides RGB-D data for lunar obstacle segmentation that includes both positive and negative obstacles. Additionally, we propose a novel two-stage segmentation network called LuSeg. Through contrastive learning, it enforces semantic consistency between the RGB encoder from Stage I and the depth encoder from Stage II. Experimental results on our proposed LunarSeg dataset and additional public real-world NPO road obstacle dataset demonstrate that LuSeg achieves state-of-the-art segmentation performance for both positive and negative obstacles while maintaining a high inference speed of approximately 57 Hz. We have released the implementation of our LESS system, LunarSeg dataset, and the code of LuSeg at: https://github.com/nubot-nudt/LuSeg.
|
|
17:10-17:15, Paper TuDT9.7 | |
MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation |
|
Liu, Rui | University of Maryland |
Wang, Zikang | North Carolina A&T State University |
Gao, Peng | North Carolina State University |
Shen, Yu | University of Maryland |
Tokekar, Pratap | University of Maryland |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Cooperating Robots, Semantic Scene Understanding, Intelligent Transportation Systems
Abstract: Autonomous systems have advanced significantly, but challenges persist in accident-prone environments where robust decision-making is crucial. A single vehicle's limited sensor range and obstructed views increase the likelihood of accidents. Multi-vehicle connected systems and multi-modal approaches, leveraging RGB images and LiDAR point clouds, have emerged as promising solutions. However, existing methods often assume the availability of all data modalities and connected vehicles during both training and testing, which is impractical due to potential sensor failures or missing connected vehicles. To address these challenges, we introduce a novel framework MMCD (Multi-Modal Collaborative Decision-making) for connected autonomy. Our framework fuses multi-modal observations from ego and collaborative vehicles to enhance decision-making under challenging conditions. To ensure robust performance when certain data modalities are unavailable during testing, we propose an approach based on cross-modal knowledge distillation with a teacher-student model structure. The teacher model is trained with multiple data modalities, while the student model is designed to operate effectively with reduced modalities. In experiments on connected autonomous driving with ground vehicles and aerial-ground vehicles collaboration, our method improves driving safety by up to 20.7%, surpassing the best-existing baseline in detecting potential accidents and making safe driving decisions. More information can be found on our website https://ruiiu.github.io/mmcd.
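A hedged sketch of the cross-modal knowledge distillation objective described above: a student operating on reduced modalities matches a full-modality teacher through a soft-label KL term plus feature matching. The loss weights, temperature, and tensor interfaces are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, T=2.0, alpha=0.5):
    # Soft-label term: student mimics the teacher's tempered class
    # distribution (the T*T factor restores gradient scale).
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    # Feature term: pull intermediate student features toward the
    # teacher's multi-modal representation.
    feat = F.mse_loss(student_feat, teacher_feat)
    return alpha * kl + (1 - alpha) * feat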
|
|
17:15-17:20, Paper TuDT9.8 | |
Memory-Efficient Real Time Many-Class 3D Metric-Semantic Mapping |
|
Nadgir, Vallabh | University of Illinois Urbana-Champaign |
Correia Marques, Joao Marcos | University of Illinois at Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Mapping, Sensor Fusion, Semantic Scene Understanding
Abstract: Metric-semantic 3D mapping is the process of creating class-labeled 3D maps by fusing the information from images captured by a moving camera. The memory usage required by standard solutions grows linearly with the number of semantic classes being considered, which can pose a bottleneck in large and many-class scenes. This paper proposes two novel methods for compressing the memory used by semantic fusion: calibrated top-k histogram and encoded fusion. The first method maintains, for each voxel, only the counts of the k most likely classes, while the second method uses a neural network to encode all-class probability vectors into a k-dimensional latent space in which per-voxel fusion is performed. The fused result is then decoded, at query time, using another neural network. Experiments show that both methods preserve map accuracy and calibration even at low values of k, and per-voxel memory usage is linear in k. The proposed methods can achieve real-time semantic fusion with 150 classes on commodity GPUs in building-scale scenes where prior approaches run out of memory.
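A minimal sketch of the top-k histogram idea, under the assumption that evicted class counts are pooled into a reserved "other" bucket so per-voxel memory stays O(k); the calibration step the paper applies to the resulting probabilities is omitted here.

def topk_update(counts, label, k=4):
    # counts: dict {class_id: count} for one voxel, with the reserved
    # key -1 holding the aggregate mass of all evicted classes.
    counts[label] = counts.get(label, 0) + 1
    tracked = [c for c in counts if c != -1]
    if len(tracked) > k:
        # Evict the least frequent tracked class into the "other" bucket.
        victim = min(tracked, key=lambda c: counts[c])
        counts[-1] = counts.get(-1, 0) + counts.pop(victim)
    return counts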
|
|
TuDT10 |
310 |
Semantic Recognition and Scene Understanding |
Regular Session |
|
16:40-16:45, Paper TuDT10.1 | |
PCMF2-Net: A Pyramid Cross-Modal Feature Fusion Network for Off-Road Freespace Detection |
|
Gao, Ming | Nanjing University of Science and Technology |
Lu, Chunpeng | Nanjing University of Science and Technology |
Gu, Shuo | Nanjing University of Science and Technology |
Zhang, Yigong | Nankai University |
Zhang, Chenyang | Changzhou Institute of Technology |
Kong, Hui | University of Macau |
Keywords: Semantic Scene Understanding, Sensor Fusion, Intelligent Transportation Systems
Abstract: Freespace detection plays an important role in autonomous driving. In recent years, deep learning based freespace detection methods have performed well in urban scenes. However, for off-road scenes, freespace detection still poses significant challenges due to the complexity of scenes and the lack of clear edges. Existing methods have not effectively fused LiDAR data and camera images. In this paper, we propose a pyramid cross-modal feature fusion network (PCMF2-Net) for off-road freespace detection. The dense depth maps are concatenated with RGB images and used as inputs along with surface normal maps. The dual-branch CNN-Transformer encoder combines convolutional neural networks and transformers to extract local and global features from RGBD images and surface normal maps, respectively. Then, in the pyramid cross-modal feature fusion module, the multi-scale and multi-modal encoder features are fused in a top-down manner. In addition, we also use an edge segmentation task and a two-step training strategy to further improve the performance. Experiments on the off-road freespace detection dataset (ORFD) demonstrate that the proposed PCMF2-Net achieves a competitive result of 93.9% IoU at a speed of 23 Hz.
|
|
16:45-16:50, Paper TuDT10.2 | |
Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation |
|
Chen, Junjie | Southern University of Science and Technology |
Xu, Yuecong | National University of Singapore |
Li, Haosheng | Southern University of Science and Technology |
Ding, Kemi | Southern University of Science and Technology |
Keywords: Transfer Learning, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: 3D point cloud semantic segmentation (PCSS) is a cornerstone for environmental perception in robotic systems and autonomous driving, enabling precise scene understanding through point-wise classification. While unsupervised domain adaptation (UDA) mitigates label scarcity in PCSS, existing methods critically overlook the inherent vulnerability to real-world perturbations (e.g., snow, fog, rain) and adversarial distortions. This work first identifies two intrinsic limitations that undermine current PCSS-UDA robustness: (a) unsupervised feature overlap arising from unaligned boundaries in shared-class regions and (b) feature structure erosion caused by domain-invariant learning that suppresses target-specific patterns. To address these problems, we propose a tripartite framework consisting of: 1) a robustness evaluation model quantifying resilience against adversarial attack/corruption types through robustness metrics; 2) an invertible attention alignment module (IAAM) enabling bidirectional domain mapping while preserving discriminative structure via attention-guided overlap suppression; and 3) a contrastive memory bank with quality-aware contrastive learning that progressively refines pseudo-labels with feature density and cleanliness for more discriminative representations. Extensive experiments on SynLiDAR-to-SemanticPOSS adaptation demonstrate a maximum mIoU improvement of 14.3% under adversarial attack.
|
|
16:50-16:55, Paper TuDT10.3 | |
Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding |
|
Steinke, Tim | University of Freiburg |
Büchner, Martin | University of Freiburg |
Vödisch, Niclas | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Mapping
Abstract: Mapping and scene representation are fundamental to reliable planning and navigation in mobile robots. While purely geometric maps using voxel grids allow for general navigation, obtaining up-to-date spatial and semantically rich representations that scale to dynamic large-scale environments remains challenging. In this work, we present CURB-OSG, an open-vocabulary dynamic 3D scene graph engine that generates hierarchical decompositions of urban driving scenes via multi-agent collaboration. By fusing the camera and LiDAR observations from multiple perceiving agents with unknown initial poses, our approach generates more accurate maps compared to a single agent while constructing a unified open-vocabulary semantic hierarchy of the scene. Unlike previous methods that rely on ground truth agent poses or are evaluated purely in simulation, CURB-OSG alleviates these constraints. We evaluate the capabilities of CURB-OSG on real-world multi-agent sensor data obtained from multiple sessions of the Oxford Radar RobotCar dataset. We demonstrate improved mapping and object prediction accuracy through multi-agent collaboration as well as evaluate the environment partitioning capabilities of the proposed approach. To foster further research, we release our code and supplementary material at https://ov-curb.cs.uni-freiburg.de.
|
|
16:55-17:00, Paper TuDT10.4 | |
TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Localization and Mapping |
|
Kim, Jeewon | School of Electrical Engineering, KAIST |
Oh, Minho | KAIST |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Keywords: Semantic Scene Understanding, SLAM, Mapping
Abstract: Scene graphs have emerged as a powerful tool for robots, providing a structured representation of spatial and semantic relationships for advanced task planning. Despite their potential, conventional 3D indoor scene graphs face critical limitations, particularly under- and over-segmentation of room layers in structurally complex environments. Under-segmentation misclassifies non-traversable areas as part of a room, often in open spaces, while over-segmentation fragments a single room into overlapping segments in complex environments. These issues stem from naive voxel-based map representations that rely solely on geometric proximity, disregarding the structural constraints of traversable spaces and resulting in inconsistent room layers within scene graphs. To the best of our knowledge, this work is the first to tackle segmentation inconsistency as a challenge and address it with Traversability-Aware Consistent Scene Graphs (TACS-Graphs), a novel framework that integrates ground robot traversability with room segmentation. By leveraging traversability as a key factor in defining room boundaries, the proposed method achieves a more semantically meaningful and topologically coherent segmentation, effectively mitigating the inaccuracies of voxel-based scene graph approaches in complex environments. Furthermore, the enhanced segmentation consistency improves loop closure detection efficiency in the proposed Consistent Scene Graph-leveraging Loop Closure Detection (CoSG-LCD) leading to higher pose estimation accuracy. Experimental results confirm that the proposed approach outperforms state-of-the-art methods in terms of scene graph consistency and pose graph optimization performance.
|
|
17:00-17:05, Paper TuDT10.5 | |
Soft-Rigid Coupled Blade Leg Achieves Spatio-Temporal Terrain Classification with Minimal Sensor Configuration |
|
Sirithunge, Chapa | University of Cambridge |
Chandiramani, Vijay | University of Bristol |
Xie, Yue | University of Cambridge |
Hauser, Helmut | University of Bristol |
Conn, Andrew | University of Bristol |
Iida, Fumiya | University of Cambridge |
Keywords: Recognition, Soft Sensors and Actuators, Legged Robots
Abstract: Fast-legged humanoid robots are transforming industries from manufacturing to medical robotics, with the global market projected to grow from 0.67 billion in 2024 to 2.27 billion by 2033 at a 14.3% CAGR. Despite rapid advancements, challenges remain in navigating complex terrains, especially uneven, deformable, and high-friction surfaces. This paper presents the first minimally sensorised blade leg for robots, made by coupling soft and rigid materials: an alternative to multimodal sensing and advanced control algorithms for terrain navigation. The design is a passive leg embedded with barometric pressure sensors shown to retain high-dimensional spatio-temporal data. We hypothesized that barometric pressure sensors can capture multidimensional terrain data and subtle surface compliance changes through spatiotemporal pressure patterns. The blade was mounted on a UR5 robotic arm and tested on terrains of varied textures, including aluminium, pebble, coir, and sandpaper, materials spanning a diverse range of stiffness. Spatiotemporal data from the sensors were recorded and analyzed to assess terrain characteristics and leg-terrain interactions under different conditions. The results demonstrated that barometric pressure sensors could accurately recognize different terrains with as few as three sensors in a 2-second time frame. Recognition accuracy improved with more sensors, demonstrating the effectiveness of morphologically adapted composite structures with optimally placed minimal sensors.
|
|
17:05-17:10, Paper TuDT10.6 | |
SAMap: Semantic Alignment for HD Map Detection Domain Generalization under Varying Weather and Lighting |
|
Gao, Wenjie | Xi'an Jiaotong University |
Jing, Haodong | Xi'an Jiaotong University |
Fu, Jiawei | The Chinese University of Hong Kong |
Zhu, Ziyu | Xi'an Jiaotong University |
Chen, Shitao | Xi'an Jiaotong University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: High-definition (HD) maps are crucial for autonomous driving systems. Despite recent advances in learning-based HD map prediction methods, these approaches experience significant performance degradation when encountering unseen weather or lighting conditions due to feature distribution discrepancies (domain gaps) in input images. To address this issue, we propose SAMap, a novel map learning framework that enhances the domain generalization capabilities of existing models by reducing domain discrepancies in input images. SAMap introduces a Semantic Aligner, an image-to-image transformation module that aligns images from different domains into a unified domain space while preserving semantic consistency. To train this aligner, we leverage Vision-Language Models (VLMs) that have acquired image-text alignment capabilities. Specifically, we first train a Prompt Learner that combines handcrafted and learnable prompts to capture domain-invariant semantic information. We then train the Semantic Aligner through dual supervision mechanisms: a content preservation loss that maintains feature consistency across transformations and a semantic alignment loss that leverages the VLM's encoders to align transformed images with domain-invariant textual representations. Extensive experiments on the nuScenes dataset demonstrate that when integrated with three existing HD map prediction methods, SAMap achieves a performance improvement of up to 11.6% on unseen domains (rain or night conditions), effectively validating its generalization capabilities across domains.
|
|
TuDT11 |
311A |
Reinforcement Learning 4 |
Regular Session |
|
16:40-16:45, Paper TuDT11.1 | |
Dual Agent Learning Based Aerial Trajectory Tracking |
|
Garg, Shaswat | University of Waterloo |
Masnavi, Houman | Toronto Metropolitan University |
Fidan, Baris | University of Waterloo |
Janabi-Sharifi, Farrokh | Ryerson University |
Keywords: Reinforcement Learning, Aerial Systems: Perception and Autonomy, Motion and Path Planning
Abstract: This paper presents a novel reinforcement learning framework for trajectory tracking of unmanned aerial vehicles in cluttered environments using a dual-agent architecture. Traditional optimization methods for trajectory tracking face significant computational challenges and lack robustness in dynamic environments. Our approach employs deep reinforcement learning (RL) to overcome these limitations, leveraging 3D point cloud data to perceive the environment without relying on memory-intensive obstacle representations like occupancy grids. The proposed system features two RL agents: one for predicting UAV velocities to follow a reference trajectory and another for managing collision avoidance in the presence of obstacles. This architecture ensures real-time performance and adaptability to uncertainties. We demonstrate the efficacy of our approach through simulated and real-world experiments, highlighting improvements over state-of-the-art RL and optimization-based methods. Additionally, a curriculum learning paradigm is employed to scale the algorithms to more complex environments, ensuring robust trajectory tracking and obstacle avoidance in both static and dynamic scenarios.
|
|
16:45-16:50, Paper TuDT11.2 | |
Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation |
|
Dawood, Murad | University of Bonn |
Pan, Sicong | University of Bonn |
Dengler, Nils | University of Bonn |
Zhou, Siqi | Technical University of Munich |
Schoellig, Angela P. | TU Munich |
Bennewitz, Maren | University of Bonn |
Keywords: Reinforcement Learning, Robot Safety, Multi-Robot Systems
Abstract: In this paper, we address the problem of behavior-based cooperative navigation of mobile robots using safe multi-agent reinforcement learning (MARL). Our work is the first to focus on cooperative navigation without individual reference targets for the robots, using a single target for the formation's centroid. This eliminates the complexities involved in having several path planners to control a team of robots. To ensure safety, our MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution. We demonstrate the effectiveness of our method in simulation and on real robots, achieving safe behavior-based cooperative navigation without individual reference targets, with zero collisions, and with faster target reaching compared to baselines. Finally, we study the impact of MPC safety filters on the learning process, revealing faster convergence during training, and show that our approach can be safely deployed on real robots even during early stages of training.
|
|
16:50-16:55, Paper TuDT11.3 | |
Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications |
|
Wang, Jiangwei | University of Connecticut |
Yang, Shuo | University of Pennsylvania |
An, Ziyan | Vanderbilt University |
Han, Songyang | Amazon AWS |
Zhang, Zhili | University of Connecticut |
Mangharam, Rahul | University of Pennsylvania |
Ma, Meiyi | Vanderbilt University |
Miao, Fei | University of Connecticut |
Keywords: Reinforcement Learning, Autonomous Agents, Planning, Scheduling and Coordination
Abstract: Reward design is a key component of deep reinforcement learning (DRL), yet some tasks and designer’s objectives may be unnatural to define as a scalar cost function. Among the various techniques, formal methods integrated with DRL have garnered considerable attention due to their expressiveness and flexibility in defining the reward and requirements for different states and actions of the agent. Nevertheless, the exploration of leveraging Signal Temporal Logic (STL) for guiding multi-agent reinforcement learning (MARL) reward design is still limited. The presence of complex interactions, heterogeneous goals, and critical safety requirements in multi-agent systems exacerbates this challenge. In this paper, we propose a novel STL-guided multi-agent reinforcement learning framework. The STL requirements are designed to include both task specifications according to the objective of each agent and safety specifications. The robustness values from checking the states against STL specifications are leveraged to generate rewards. We validate our approach by conducting experiments across various testbeds. The experimental results demonstrate significant performance improvements compared to MARL without STL guidance, along with a remarkable increase in the overall safety rate of the multi-agent systems.
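As a concrete illustration of robustness-as-reward, the sketch below scores a safety specification of the form G (distance >= d_safe) by its quantitative robustness (the worst-case margin over a window of recent inter-agent distances) and mixes it with an agent-specific task term. The weights, predicate, and windowing are illustrative assumptions, not the paper's specifications.

import numpy as np

def stl_robustness_reward(dists, d_safe=1.0, w_task=1.0, w_safe=2.0,
                          goal_progress=0.0):
    # Robustness of the "globally" formula G (dist >= d_safe) over the
    # window: the minimum margin. Positive means the spec is satisfied,
    # and the size of the margin scales the reward.
    rho_safe = float(np.min(dists) - d_safe)
    return w_task * goal_progress + w_safe * rho_safe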
|
|
16:55-17:00, Paper TuDT11.4 | |
Transformer-Based Multi-Agent Reinforcement Learning Method with Credit-Oriented Strategy Differentiation |
|
Huang, Kaixuan | Dalian University of Technology |
Jin, Bo | Dalian University of Technology |
Zhang, Kun | Northwestern Polytechnical University |
Piao, Haiyin | Northwestern Polytechnical University |
Wei, Ziqi | Chinese Academy of Sciences |
Keywords: Reinforcement Learning, Multi-Robot Systems, Cooperating Robots
Abstract: Multi-Agent Reinforcement Learning (MARL) involves both high environmental complexity and intricate coordination between agents. To scale algorithms to large-scale agent scenarios, neural networks designed for MARL are typically implemented with parameter sharing. These characteristics result in the challenges of partial observability, credit assignment, and strategy homogenization. In this paper, a Transformer-Based Multi-Agent Reinforcement Learning Method with Credit-Oriented Strategy Differentiation (TMRC) is presented to address each of these challenges. First, we design a Temporal-Spatial Encoding module and an Attention-Based Value Decomposition module based on the Transformer architecture. The former leverages both temporal and spatial observation information, compensating for the missing environmental perspectives due to partial observability. The latter is designed to identify each agent's individual contribution in complex interactions, effectively optimizing the credit assignment process. Then, we propose a Credit-Oriented Strategy Differentiation module that differentiates the entity representations of each agent based on their current task differences, allowing agents to have distinct real-time strategies and effectively mitigating the issue of strategy homogenization. We evaluate the proposed method on the SMAC benchmark. It demonstrates better final performance, faster convergence, and greater stability compared to other methods. Additionally, a series of experiments validate the effectiveness of the proposed modules. Our code is available at https://github.com/Hkxuan/TMRC.git.
|
|
17:00-17:05, Paper TuDT11.5 | |
Decision Transformer-Based Drone Trajectory Planning with Dynamic Safety–Efficiency Trade-Offs |
|
Ji, Chang-Hun | Korea University of Technology and Education |
Song, SiWoon | Chungbuk National University |
Han, Youn-Hee | Korea University of Technology and Education |
Moon, SungTae | Chungbuk National University |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Machine Learning for Robot Control
Abstract: A drone trajectory planner should be able to dynamically adjust the safety–efficiency trade-off according to varying mission requirements in unknown environments. Although traditional polynomial-based planners offer computational efficiency and smooth trajectory generation, they require expert knowledge to tune multiple parameters to adjust this trade-off. Moreover, even with careful tuning, the resulting adjustment may fail to achieve the desired trade-off. Similarly, although reinforcement learning-based planners are adaptable in unknown environments, they do not explicitly address the safety–efficiency trade-off. To overcome this limitation, we introduce a Decision Transformer–based trajectory planner that leverages a single parameter, Return-to-Go (RTG), as a temperature parameter to dynamically adjust the safety–efficiency trade-off. In our framework, since RTG intuitively measures the safety and efficiency of a trajectory, RTG tuning does not require expert knowledge. We validate our approach using Gazebo simulations in both structured grid and unstructured random environments. The experimental results demonstrate that our planner can dynamically adjust the safety–efficiency trade-off by simply tuning the RTG parameter. Furthermore, our planner outperforms existing baseline methods across various RTG settings, generating safer trajectories when tuned for safety and more efficient trajectories when tuned for efficiency. Real-world experiments further confirm the reliability and practicality of our proposed planner.
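A minimal sketch of how a single RTG scalar can steer a Decision Transformer rollout (our paraphrase, not the paper's code; `model.predict_action` and `env` are hypothetical stand-ins for the trained planner and the Gazebo environment):

```python
import torch

# Hypothetical sketch: a higher initial RTG requests a safer/higher-return
# trajectory, a lower one trades safety for efficiency. Names are ours.

@torch.no_grad()
def rollout(model, env, rtg_init: float, horizon: int):
    states, actions, rtgs = [env.reset()], [], [rtg_init]
    for _ in range(horizon):
        # Autoregressive prediction from the (RTG, state, action) history.
        a = model.predict_action(states, actions, rtgs)
        s, r, done = env.step(a)
        actions.append(a)
        states.append(s)
        rtgs.append(rtgs[-1] - r)   # remaining return-to-go shrinks as reward accrues
        if done:
            break
    return states, actions

# Tuning the trade-off is then a one-liner, e.g.:
#   rollout(model, env, rtg_init=0.9, horizon=200)   # prioritize safety
#   rollout(model, env, rtg_init=0.4, horizon=200)   # prioritize efficiency
```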
|
|
17:05-17:10, Paper TuDT11.6 | |
Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations |
|
Mehrdad, Sarmad | New York University |
Meduri, Avadesh | New York University |
Righetti, Ludovic | New York University |
Keywords: Reinforcement Learning, Optimization and Optimal Control
Abstract: We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criterion, our approach iteratively computes a weight improvement step and selects an appropriate step size that ensures learned cost function features remain similar to the demonstrated trajectory features. In contrast to similar approaches, our algorithm can individually tune the effectiveness of each observation for the partition function based on the current estimate of the cost function parameters, guiding the algorithm towards better estimates in the following iterations. In addition, it does not need a large sample set, enabling faster learning. We generate sample trajectories by solving an optimal control problem instead of random sampling, leading to more informative trajectories. The performance of our method is compared to two state-of-the-art algorithms to demonstrate its benefits in several simulated environments.
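The core max-ent update the abstract describes can be sketched as follows (our paraphrase under a linear-cost assumption, c(tau) = w·f(tau); the paper's per-observation tuning and step-size rule are omitted):

```python
import numpy as np

# Our paraphrase of a maximum-entropy IRL step (not the authors' code):
# sampled trajectories give a softmax estimate of the partition function,
# and gradient ascent matches expected features to demonstration features.

def maxent_irl_step(w, f_demo, f_samples, lr=0.1):
    logits = -f_samples @ w                 # low cost => high likelihood
    p = np.exp(logits - logits.max())
    p /= p.sum()
    expected_f = p @ f_samples              # model's expected feature counts
    grad = expected_f - f_demo              # ascent direction of log-likelihood
    return w + lr * grad, np.linalg.norm(grad)

rng = np.random.default_rng(0)
f_samples = rng.normal(size=(50, 2))        # features of sampled trajectories
f_demo = f_samples[:5].mean(axis=0)         # demo features inside the sample hull
w = np.zeros(2)
for _ in range(200):
    w, gnorm = maxent_irl_step(w, f_demo, f_samples)
print(w, gnorm)                             # gnorm shrinks as features match
```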
|
|
17:10-17:15, Paper TuDT11.7 | |
CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains |
|
Seneviratne, Gershom Devake | University of Maryland, College Park |
Kulathun Mudiyanselage, Kasun Weerakoon | University of Maryland, College Park |
Elnoor, Mohamed | University of Maryland |
Rajagopal, Vignesh | University of Maryland, College Park |
Varatharajan, Harshavarthan | University of Maryland |
M Jaffar, Mohamed Khalid | University of Maryland, College Park |
Pusey, Jason | U.S. Army Research Laboratory (ARL) |
Manocha, Dinesh | University of Maryland |
Keywords: Representation Learning, Perception-Action Coupling, AI-Enabled Robotics
Abstract: We present CROSS-GAiT, a novel algorithm for quadruped robots that uses Cross Attention to fuse terrain representations derived from visual and time-series inputs, including linear accelerations, angular velocities, and joint efforts. These fused representations are used to continuously adjust two critical gait parameters (step height and hip splay), enabling adaptive gaits that respond dynamically to varying terrain conditions. To generate terrain representations, we process visual inputs through a masked Vision Transformer (ViT) encoder and time-series data through a dilated causal convolutional encoder. The Cross Attention mechanism then selects and integrates the most relevant features from each modality, combining terrain characteristics with robot dynamics for informed gait adaptation. This fused representation allows CROSS-GAiT to continuously adjust gait parameters in real time in response to unpredictable terrain conditions. We train CROSS-GAiT on a diverse set of terrains including asphalt, concrete, brick pavements, grass, dense vegetation, pebbles, gravel, and sand, and validate its generalization ability on unseen environments. Our hardware implementation on the Ghost Robotics Vision 60 demonstrates superior performance in challenging terrains, such as high-density vegetation, unstable surfaces, sandbanks, and deformable substrates. We observe at least a 7.04% reduction in IMU energy density and a 27.3% reduction in total joint effort, which directly correlates with increased stability and reduced energy usage when compared to state-of-the-art methods. Furthermore, CROSS-GAiT demonstrates at least a 64.5% increase in success rate and a 4.91% reduction in time to reach the goal in four complex scenarios. Additionally, the learned representations perform 4.48% better than the state-of-the-art on a terrain classification task.
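A minimal PyTorch sketch of the cross-attention fusion idea (dimensions, names, and the two-parameter head are our assumptions, not the released model):

```python
import torch
import torch.nn as nn

# Our schematic: a proprioceptive summary token queries visual terrain
# tokens, and the fused token regresses (step_height, hip_splay).

class CrossModalFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)   # -> (step_height, hip_splay)

    def forward(self, ts_feat, vis_feat):
        # ts_feat: (B, 1, D) time-series summary token
        # vis_feat: (B, N, D) patch tokens from a ViT encoder
        fused, _ = self.attn(query=ts_feat, key=vis_feat, value=vis_feat)
        return self.head(fused.squeeze(1))

m = CrossModalFusion()
out = m(torch.randn(2, 1, 128), torch.randn(2, 196, 128))
print(out.shape)   # torch.Size([2, 2])
```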
|
|
17:15-17:20, Paper TuDT11.8 | |
Informative Trajectory Planning for Air-Ground Cooperative Monitoring of Spatiotemporal Fields (I) |
|
Li, Zhuo | Beijing Institute of Technology, School of Automation |
Guo, Yunlong | Beijing Institute of Technology |
Wang, Gang | Beijing Institute of Technology |
Sun, Jian | Beijing Institute of Technology |
You, Keyou | Tsinghua University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Reinforcement Learning
Abstract: This paper investigates an air-ground cooperative monitoring problem for spatiotemporal fields, such as air pollution, forest fires, oil spills, etc., with unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). To fully exploit complementarities of these heterogeneous vehicles and improve efficiency of the cooperative monitoring, we design a novel cooperation scheme: each UAV is assigned to loiter over and transmit its observations to a pre-allocated UGV, and the UGV provides guidance on informative trajectories for the UAV and aims to reach a target position as fast as possible. Such a scheme brings challenges to informative trajectory planning of the UGVs, lying in the delayed observations from the UAVs and the cumulative information constraint depending on the unknown field. To overcome them, this work proposes a model-free reinforcement learning (RL)-based trajectory planning method to learn continuous policies for the UGVs, where a field estimator is designed for each UGV to recover observability of the field. In addition, we derive model predictive control (MPC)-based trajectory planners for the UAVs with tailored reference positions, where the uncertain tracking errors can be handled by the RL-based method of the UGVs. Thus, a performance coupling problem of the heterogeneous vehicles is tackled. Simulations illustrate the effectiveness of the proposed trajectory planning methods and the efficiency of the air-ground cooperative monitoring scheme. Note to Practitioners—This article is motivated by cooperative monitoring tasks with UAVs and UGVs in practical applications, such as environmental monitoring, search and rescue after disasters, etc. Due to the complex dynamics of spatiotemporal fields in these tasks, trajectory planning for the cooperative monitoring system is challenging and requires substantial computation. To resolve these issues, we propose a novel cooperation scheme in this article, where the large computational capability of the UGVs is utilized to solve a minimum-time trajectory planning problem under a cumulative information constraint, and the UAVs only loiter over and transmit measurements about the field to the UGVs. To achieve this scheme, RL-based and MPC-based trajectory planning methods are proposed for the UGVs and the UAVs, respectively. Simulations have validated the effectiveness of the proposed trajectory planning methods and good performance of the cooperative monitoring system.
|
|
TuDT12 |
311B |
Robot Audition |
Regular Session |
|
16:40-16:45, Paper TuDT12.1 | |
Robots Have Been Seen and Not Heard: Effects of Consequential Sounds on Human-Perception of Robots |
|
Allen, Aimee | Monash University |
Drummond, Tom | University of Melbourne |
Kulic, Dana | Monash University |
Keywords: Robot Audition, Human-Centered Robotics, Social HRI
Abstract: Robots make compulsory machine sounds, known as ‘consequential sounds’, as they move and operate. As robots become more prevalent in workplaces, homes and public spaces, understanding how sounds produced by robots affect human perceptions of these robots is becoming increasingly important to creating positive human-robot interactions (HRI). This paper presents the results from 182 participants (858 trials) investigating how human perception of robots is changed by consequential sounds. In a between-participants study, participants in the sound condition were shown 5 videos of different robots and asked their opinions on the robots and the sounds they made. This was compared to participants in the control condition who viewed silent videos. Consequential sounds correlated with significantly more negative perceptions of robots, including increased negative ‘associated affects’, feeling more distracted, and being less willing to colocate in a shared environment with robots.
|
|
16:45-16:50, Paper TuDT12.2 | |
Swarm Active Audition with Robots and Drones: Real-World Performance Validation |
|
Nakadai, Kazuhiro | Institute of Science Tokyo |
Hoshiba, Kotaro | Tokyo Institute of Technology |
Yen, Benjamin | Institute of Science Tokyo |
Kumon, Makoto | Kumamoto University |
Sasaki, Yoko | National Institute of Advanced Industrial Science and Technology |
Keywords: Robot Audition, Search and Rescue Robots, Cooperating Robots
Abstract: Search and rescue (SAR) operations in large-scale disaster sites, such as earthquakes, require rapid victim detection. While drones equipped with cameras are commonly used for SAR, their effectiveness is limited in visually obstructed environments, such as those with debris, smoke, or fog. In such scenarios, auditory information can play a crucial role in locating victims who are not visible. Existing drone audition research has demonstrated the feasibility of detecting sound sources using onboard microphone arrays. However, most studies focus on single-drone systems, which face limitations in coverage and accessibility, particularly in complex environments such as collapsed buildings or urban canyons. Additionally, real-world validation of multi-drone audition systems remains limited, with prior studies relying primarily on simulations or controlled environments. To address these challenges, we propose and evaluate a Multi-Drone and Robot-Based Active Audition System (SAAS-RD: Swarm Active Audition System with Robots and Drones) that integrates multiple drones and ground robots to enhance acoustic search capabilities. Our work focuses on real-world performance validation, conducting field experiments in outdoor environments and analyzing system feasibility through case studies. The results demonstrate the potential of SAAS-RD as a practical solution for large-scale SAR operations.
|
|
16:50-16:55, Paper TuDT12.3 | |
SAGENet: Acoustic Echo-Based 3D Depth Estimation with Sparse Angular Queries and Refined Geometric Cues |
|
Liu, GuangYao | Zhejiang University |
Cui, Weimeng | Zhejiang University |
Xi, Yuzhang | Zhejiang University |
Yang, Liu | Zhejiang University |
Hu, Peixuan | Zhejiang University |
Kong, He | Southern University of Science and Technology |
Wang, Zhi | Zhejiang University |
Keywords: Robot Audition, Range Sensing
Abstract: Inspired by the echo-based perception systems of animals, recent approaches have utilized acoustic echoes for scene depth estimation. Unlike previous methods that implicitly learn spatial features from echoes, which may cause shape and scale drift, we explicitly extract spatial cues, effectively enhancing depth estimation accuracy. First, we leverage signal processing to generate coarse 2D geometric cues, which contain scene scale and shape information, as additional input for the 3D depth estimation network. This approach aids the network in better reconstructing depth information from the scene. Given the substantial noise in the 2D geometric cues, we design a self-supervised denoising loss function to help the network accurately interpret the scale and shape information embedded in the features. Second, we initialize learnable queries with angular spectrum peaks and fuse them with audio features via self-attention, guiding the network to focus on the dominant features of the first few echo reflections while effectively suppressing interference from reverberation. Finally, our experimental results on the Replica and real-world BatVision datasets show that the proposed method outperforms the state-of-the-art binaural echo-based method by 5.6% and 11.3% in AbsRel, respectively. To benefit the community, we open-source the code at https://github.com/zjuersdsd/SAGENet.git.
|
|
16:55-17:00, Paper TuDT12.4 | |
Sound Source Localization for Human-Robot Interaction in Outdoor Environments |
|
Liu, Victor | University of Toronto, Defence Research and Development Canada |
Du, Junbo | University of Toronto |
Sehn, Jordy | University of Toronto |
Collier, Jack | Defence R&D Canada |
Grondin, Francois | Université De Sherbrooke |
Keywords: Robot Audition
Abstract: This paper presents a sound source localization strategy that relies on a microphone array embedded in an unmanned ground vehicle and an asynchronous close-talking microphone near the operator. A signal coarse alignment strategy is combined with a time-domain acoustic echo cancellation algorithm to estimate a time-frequency ideal ratio mask to isolate the target speech from interferences and environmental noise. This allows selective sound source localization, and provides the robot with the direction of arrival of sound from the active operator, which enables rich interaction in noisy scenarios. Results demonstrate an average angle error of 4 degrees and 95% accuracy within 5 degrees at a signal-to-noise ratio of 1 dB, which is significantly superior to state-of-the-art localization methods.
|
|
17:00-17:05, Paper TuDT12.5 | |
3-D Multiple Sound Source Localization Based on a Five-Element Microphone Array |
|
Qiu, Yizhen | Zhejiang University |
Jing, Xiaoyun | Zhejiang University |
Ji, Haifeng | Zhejiang University |
Wang, Baoliang | Zhejiang University |
Huang, Zhiyao | Zhejiang University |
Keywords: Robot Audition, Localization, Range Sensing
Abstract: A 3-D multiple sound source localization (SSL) method is proposed in this work, which combines independent vector analysis (IVA) with an elaborately designed five-element microphone array. The proposed method separates the source signals by IVA. Then, for each separated signal, four time-difference-of-arrival (TDOA) values are obtained. With the five-element microphone array, the four TDOAs are used to localize each source separately by analytical solution. To meet the prerequisite of IVA, the properties of the mixing matrix with the five-element microphone array are analyzed and studied for two and three sound sources. It is proved that the configuration of the five-element microphone array avoids the ill condition in which the mixing matrix at each frequency bin has linearly dependent columns. To reduce the computation cost of IVA, a subset of microphones equal in number to the sound sources is selected from the array, and their signals are used for audio source separation. Meanwhile, considering calculation stability, the selected microphones are required to minimize the condition number of the microphone signal covariance matrix. To investigate the effectiveness and the localization performance of the proposed method, a practical five-element microphone array is used and 3-D multiple SSL experiments are carried out. The TDOA values are obtained by the generalized cross correlation based on the phase transform (GCC-PHAT). The experimental results show that the proposed method is effective and the maximum root mean square error of localization is less than 3 cm. Compared with conventional methods, the proposed method has the advantages of lower computation cost and fewer microphones, and can locate sources close to each other.
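GCC-PHAT, which the abstract names for TDOA estimation, is standard; a textbook implementation looks like this (our code, not the authors'):

```python
import numpy as np

# Textbook GCC-PHAT TDOA estimator between two microphone signals.

def gcc_phat(sig, ref, fs, interp=16):
    n = sig.size + ref.size
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=interp * n)     # upsampled cross-correlation
    max_shift = interp * n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)      # TDOA in seconds

fs = 16000
rng = np.random.default_rng(0)
x = rng.normal(size=fs)                    # broadband reference signal
y = np.roll(x, 40)                         # delayed copy: 40 samples = 2.5 ms
print(gcc_phat(y, x, fs))                  # ~0.0025
```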
|
|
17:05-17:10, Paper TuDT12.6 | |
Single-Microphone-Based Sound Source Localization for Mobile Robots in Reverberant Environments |
|
Wang, Jiang | Institute of Science Tokyo |
Shi, Runwu | Tokyo Institute of Technology |
Yen, Benjamin | Institute of Science Tokyo |
Kong, He | Southern University of Science and Technology |
Nakadai, Kazuhiro | Institute of Science Tokyo |
Keywords: Robot Audition, Localization, Deep Learning Methods
Abstract: Accurately estimating sound source positions is crucial for robot audition. However, existing sound source localization methods typically rely on a microphone array with at least two spatially preconfigured microphones. This requirement hinders the applicability of microphone-based robot audition systems and technologies. To alleviate these challenges, we propose an online sound source localization method that uses a single microphone mounted on a mobile robot in reverberant environments. Specifically, we develop a lightweight neural network model with only 43k parameters to perform real-time distance estimation by extracting temporal information from reverberant signals. The estimated distances are then processed using an extended Kalman filter to achieve online sound source localization. To the best of our knowledge, this is the first work to achieve online sound source localization using a single microphone on a moving robot. Extensive experiments demonstrate the effectiveness and merits of our approach. To benefit the broader research community, we have open-sourced our code at https://github.com/JiangWAV/single-mic-SSL.
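The distance-to-position step can be illustrated with a toy EKF (our construction, not the paper's: a static 2-D source, known robot poses, and range-only measurements with made-up noise levels):

```python
import numpy as np

# Toy EKF: estimate a static source position from noisy distance
# measurements taken at known robot poses along a circular path.

def ekf_distance_update(x, P, robot_pos, z, R=0.05**2, Q=1e-6):
    P = P + Q * np.eye(2)                # static source, small process noise
    diff = x - robot_pos
    d = np.linalg.norm(diff)
    H = (diff / d)[None, :]              # Jacobian of h(x) = ||x - robot_pos||
    S = H @ P @ H.T + R
    K = P @ H.T / S                      # Kalman gain (2, 1)
    x = x + (K * (z - d)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

rng = np.random.default_rng(1)
source = np.array([2.0, 1.0])
x, P = np.zeros(2) + 0.5, np.eye(2)
for k in range(60):                      # robot moves on a unit circle
    pose = np.array([np.cos(k / 10), np.sin(k / 10)])
    z = np.linalg.norm(source - pose) + rng.normal(0, 0.05)
    x, P = ekf_distance_update(x, P, pose, z)
print(x)                                 # should approach [2, 1]
```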
|
|
TuDT13 |
311C |
Deep Learning for Visual Perception 4 |
Regular Session |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
16:40-16:45, Paper TuDT13.1 | |
CrowdQuery: Density-Guided Query Module for Enhanced 2D and 3D Detection in Crowded Scenes |
|
Dähling, Marius | Mercedes Benz AG, Karlsruhe Institute of Technology |
Krebs, Sebastian | Mercedes-Benz AG |
Zöllner, Johann Marius | FZI Forschungszentrum Informatik |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: This paper introduces a novel method for end-to-end crowd detection that leverages object density information to enhance existing transformer-based detectors. We present CrowdQuery (CQ), whose core component is our CQ module that predicts and subsequently embeds an object density map. The embedded density information is then systematically integrated into the decoder. Existing density map definitions typically depend on head positions or object-based spatial statistics. Our method extends these definitions to include individual bounding box dimensions. By incorporating density information into object queries, our method utilizes density-guided queries to improve detection in crowded scenes. CQ is universally applicable to both 2D and 3D detection without requiring additional data. Consequently, we are the first to design a method that effectively bridges 2D and 3D detection in crowded environments. We demonstrate the integration of CQ into both a general 2D and 3D transformer-based object detector, introducing the architectures CQ2D and CQ3D. CQ is not limited to the specific transformer models we selected. Experiments on the STCrowd dataset for both 2D and 3D domains show significant performance improvements compared to the base models, outperforming most state-of-the-art methods. When integrated into a state-of-the-art crowd detector, CQ can help to further improve performance on the challenging CrowdHuman dataset, demonstrating its generalizability. The code is released at https://github.com/mdaehl/CrowdQuery.
|
|
16:45-16:50, Paper TuDT13.2 | |
UDSH: An Unsupervised Deep Image Stitching and De-Occlusion Method for Heavy Occlusion Scene |
|
Chen, Kaixin | Beijing Institute of Technology |
Li, Hao | Beijing Institute of Technology |
Sun, Rundong | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Image stitching in heavy occlusion scenarios faces the dual challenges of accurate alignment and occlusion removal. On one hand, occlusion causes the loss of key texture and structural information in the image; on the other hand, it affects the image’s integrity. Existing stitching methods perform well in cases with small occlusion coverage, but they often fail in heavy occlusion. This failure is mainly due to three reasons: 1) they cannot accurately identify occluded regions, 2) they cannot suppress interference from the occluded regions, 3) they cannot completely remove the occluded regions. To address these issues, we propose an unsupervised deep image stitching method based on occlusion awareness and content inpainting. First, to solve the issue of occluded region identification, we design an Occlusion-Aware Feature Weighted module (OAFW) that explicitly distinguishes between occluded and non-occluded regions by learning the occlusion masks of the images. Second, to address the issue of interference from occlusion, we use the learned occlusion masks to filter out features from the occluded regions. To further suppress the impact of occlusion-induced errors, we design a Mask-Guided Dual-Granularity Alignment loss function (MGDGA) that only calculates alignment errors for non-occluded regions, effectively reducing occlusion error interference during network training. Finally, to resolve the content loss in the occluded regions, we replace the pixels in the occluded areas with those from the aligned overlapping regions and incorporate a Progressive Content Inpainting module (PCI) to recover the missing content in the non-overlapping regions caused by occlusion, ultimately achieving a complete and natural occlusion-free stitched image. Experimental results show that our method improves the mean squared error metric by 17.45% compared to the state-of-the-art stitching method.
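The mask-guided alignment idea can be sketched in a few lines (our simplification of MGDGA, not the released loss: a photometric L1 error computed only where neither image is occluded):

```python
import torch

# Our sketch: alignment error is averaged over jointly non-occluded pixels.

def masked_alignment_loss(warped, target, occ_mask_w, occ_mask_t):
    # warped/target: (B, 3, H, W); occ masks: (B, 1, H, W), 1 = occluded
    valid = (1 - occ_mask_w) * (1 - occ_mask_t)
    err = (warped - target).abs().mean(dim=1, keepdim=True)
    return (err * valid).sum() / (valid.sum() + 1e-6)

loss = masked_alignment_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                             torch.zeros(1, 1, 64, 64), torch.zeros(1, 1, 64, 64))
print(loss)
```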
|
|
16:50-16:55, Paper TuDT13.3 | |
TEM3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving |
|
Liu, Wenzhuo | Beijing Institute of Technology |
Qiao, Yicheng | Tsinghua University |
Wang, Zhen | Beijing Institute of Technology, Zhuhai |
Guo, Qiannan | Tsinghua University |
Chen, Zilong | Tsinghua University |
Zhou, Meihua | Wannan Medical College |
Li, Xinran | Yale University |
Wang, Letian | University of Toronto |
Li, Zhiwei | Beijing University of Chemical Technology |
Liu, Huaping | Tsinghua University |
Wang, Wenshuo | Beijing Institute of Technology |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Computer Vision for Transportation
Abstract: Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM3-Learning (Time-Efficient Multimodal Multi-task Learning), a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition through a two-stage architecture. The first component, the mamba-based multi-view temporal-spatial feature extraction subnetwork (MTS-Mamba), introduces a forward-backward temporal scanning mechanism and global-local spatial attention to efficiently extract low-cost temporal-spatial features from multi-view sequential images. The second component, the MTL-based gated multimodal feature integrator (MGMI), employs task-specific multi-gating modules to adaptively highlight the most relevant modality features for each task, effectively alleviating the negative transfer problem in MTL. Evaluated on the AIDE dataset, our proposed model achieves state-of-the-art accuracy across all four tasks, maintaining a lightweight architecture with fewer than 6 million parameters and delivering an impressive 142.32 FPS inference speed. Rigorous ablation studies further validate the effectiveness of the proposed framework and the independent contributions of each module. The code is available on https://github.com/Wenzhuo-Liu/TEM3-Learning.
|
|
16:55-17:00, Paper TuDT13.4 | |
Visual Loop Closure Detection through Deep Graph Consensus |
|
Büchner, Martin | University of Freiburg |
Dahiya, Liza | Honda R&D Co., Ltd |
Dorer, Simon | University of Freiburg |
Ramtekkar, Vipul Vijay | Honda R&D Co., Ltd |
Nishimiya, Kenji | Honda R&D Co., Ltd |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning for Visual Perception, AI-Enabled Robotics, SLAM
Abstract: Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. As false positive loop closures significantly degrade downstream pose graph estimates, verifying a large number of candidates in online simultaneous localization and mapping scenarios is constrained by limited time and compute resources. While most deep loop closure detection approaches only operate on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among nodes of the clique, our method yields high precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust, regardless of the type of deep feature encodings used, and exhibits higher computational efficiency compared to classical geometric verification baselines. We release our code, supplementary material, and keyframe data at https://loopgnn.cs.uni-freiburg.de.
|
|
17:00-17:05, Paper TuDT13.5 | |
HeightAware-BEV: Height-Aware Feature Mapping for Efficient Bird's-Eye-View Perception |
|
Zhou, Renjie | Zhejiang Dahua Technology Co., Ltd |
Li, Jiachen | Zhejiang Dahua Technology Co., Ltd |
Su, Zhen | Zhejiang Dahua Technology Co., Ltd |
Lu, Chao | Zhejiang Dahua Technology Co., Ltd |
Wang, Zhengjun | Zhejiang Dahua Technology Co., Ltd |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Computer Vision for Transportation
Abstract: Bird’s-Eye View (BEV) perception has gained significant attention in autonomous driving and robotics due to its advantages in simplifying modality alignment and feature fusion. Addressing the challenge of jointly optimizing performance and efficiency in 2D-3D view transformation, we identify that, compared to depth information, which is viewpoint-dependent and requires camera intrinsics for estimation, height information maintains prediction consistency across different camera perspectives. Based on this insight, we propose the HeightAware-BEV framework, which achieves efficient and accurate view transformation through height-aware feature mapping: (1) building on an efficient projection-based view transformation approach, 3D voxels directly query the height probability distribution predicted from images according to grid height, weighting the corresponding features to enable precise and efficient feature projection; (2) a dynamic feature filtering mechanism filters out task-irrelevant features during the view transformation process. Additionally, a weakly-supervised training strategy is designed to improve model performance in scenarios with limited samples. The HeightAware-BEV (R50@448×800) achieves an IOU of 47.8% on the nuScenes validation set and 60 FPS on a 2080Ti, outperforming advanced methods such as SimpleBEV and PointBEV. The code is available at https://github.com/Zhou-Renjie/HeightAware-BEV.
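A schematic of the height-aware lifting step (shapes, bin count, and names are our assumptions, not the released code):

```python
import torch

# Our sketch: each BEV voxel queries the image's predicted height
# distribution at its projected pixel and weights the image feature by the
# probability of the voxel's own height bin.

def height_aware_lift(img_feat, height_prob, pix_uv, voxel_hbin):
    # img_feat:    (C, H, W) image features
    # height_prob: (K, H, W) per-pixel distribution over K height bins
    # pix_uv:      (N, 2) integer pixel coords of N voxels' projections
    # voxel_hbin:  (N,)   height-bin index of each voxel
    u, v = pix_uv[:, 0], pix_uv[:, 1]
    w = height_prob[voxel_hbin, v, u]      # (N,) height weights
    return img_feat[:, v, u] * w           # (C, N) weighted voxel features

feat = height_aware_lift(torch.randn(64, 32, 88),
                         torch.softmax(torch.randn(8, 32, 88), dim=0),
                         torch.tensor([[10, 5], [40, 20]]),
                         torch.tensor([2, 6]))
print(feat.shape)   # torch.Size([64, 2])
```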
|
|
17:05-17:10, Paper TuDT13.6 | |
VCADNet: Vision-Based Circular Accessible Depth Prediction for UGV Perception |
|
Zhang, Tao | Shandong University |
Zhao, Yuenan | Shandong University |
Xu, Xiaoyu | Shandong University |
Song, Ran | Shandong University |
Han, Lei | Tencent Robotics X |
Zhang, Wei | Shandong University |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Deep Learning Methods
Abstract: Circular accessible depth (CAD) provides a lightweight and robust traversability representation for autonomous navigation of unmanned ground vehicles (UGV). Aiming at the limitations of existing LiDAR-based methods in detecting low-thickness targets and executing semantic reasoning, we propose VCADNet, a vision-based neural network for circular accessible depth prediction. VCADNet comprises three core components: a geometry-based query module for multi-view bird's eye view feature extraction, a polar coordinate transformation for CAD alignment, and a multi-scale U-Net architecture for depth prediction. In addition, we present a cross-modal contrastive learning scheme to enhance the spatial reasoning of VCADNet, which transfers knowledge from LiDAR-based encoders to vision-based counterparts. Extensive experiments demonstrate the superior performance of VCADNet in various UGV perception tasks.
|
|
17:10-17:15, Paper TuDT13.7 | |
Diffusion-FS: Multimodal Free-Space Prediction Via Diffusion for Autonomous Driving |
|
Gupta, Keshav | International Institute of Information Technology, Hyderabad |
Stanley, Tejas Stephen | IIIT Hyderabad |
Paul, Pranjal | International Institute of Information Technology |
Singh, Arun Kumar | University of Tartu |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: Drivable freespace prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road region as the freespace. In contrast, our aim is to estimate the driving corridors that are a navigable subset of the entire road region. Unfortunately, existing corridor estimation methods directly assume a BEV-centric representation, which is hard to obtain. In contrast, we frame drivable freespace corridor prediction as a pure image perception task, using only monocular camera input. However, such a formulation poses several challenges, as no ground-truth annotations exist for such freespace corridor segments in the image. Consequently, we develop a novel self-supervised approach for freespace sample generation by leveraging future ego trajectories and front-view camera images, making the process of visual corridor estimation dependent on the ego trajectory. We then employ a diffusion process to model the distribution of such segments in the image. However, the existing binary-mask-based representation for a segment poses many limitations. Therefore, we introduce ContourDiff, a specialized diffusion-based architecture that denoises over contour points rather than relying on binary mask representations, enabling structured and interpretable freespace predictions. We evaluate our approach qualitatively and quantitatively on both NuScenes and CARLA, demonstrating its effectiveness in accurately predicting safe multimodal navigable corridors in the image.
|
|
17:15-17:20, Paper TuDT13.8 | |
Real-Time Consistent Monocular Depth Recovery System for Dynamic Environments |
|
Huang, Gan | Zhejiang University |
Pan, Xiaokun | Zhejiang University |
Lin, Hengxu | Zhejiang University |
Zhang, Ziyang | Zhejiang University |
Xu, Weiwei | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Deep Learning for Visual Perception, Mapping, RGB-D Perception
Abstract: Monocular depth estimation is essential for applications such as autonomous navigation and 3D reconstruction. However, achieving accurate and temporally consistent depth estimation in dynamic environments remains challenging due to scale ambiguity, sensitivity to dynamic objects, and inconsistent depth predictions. Traditional SLAM-based methods ensure global consistency but perform poorly in dynamic scenes, while deep learning-based approaches suffer from the absence of absolute scale and temporal stability. To overcome these limitations, we propose a Real-Time Consistent Monocular Depth Recovery System that integrates ORB-SLAM3 for sparse depth estimation, a ViT-based depth completion network, and a motion segmentation module to enhance robustness in dynamic environments. Additionally, we introduce a dual-weight fusion module that adaptively balances RGB semantic features and geometric depth priors, ensuring high accuracy and consistency. Our system jointly optimizes both static and dynamic regions, providing globally scale-consistent dense depth maps with temporal stability. Extensive experiments on benchmark datasets demonstrate that our approach outperforms existing methods in terms of depth accuracy, temporal consistency, and robustness in dynamic scenes, while maintaining real-time efficiency.
|
|
TuDT14 |
311D |
Deep Learning Methods 4 |
Regular Session |
Co-Chair: Rasouli, Amir | Huawei Technologies Canada |
|
16:40-16:45, Paper TuDT14.1 | |
RoboSwap: A GAN-Driven Video Diffusion Framework for Unsupervised Robot Arm Swapping |
|
Bai, Yang | Ludwig Maximilian University of Munich |
Yang, Liudi | University of Freiburg |
Eskandar, George | University of Stuttgart |
Shen, Fengyi | Technical University of Munich |
Chen, Dong | Technische Universität München |
Altillawi, Mohammad | Huawei, Autonomous University of Barcelona, |
Liu, Ziyuan | Huawei Group |
Kutyniok, Gitta | The Ludwig Maximilian University of Munich |
Keywords: AI-Based Methods, Visual Learning
Abstract: Recent advancements in generative models have revolutionized video synthesis and editing. However, the scarcity of diverse, high-quality datasets continues to hinder video-conditioned robotic learning, limiting cross-platform generalization. In this work, we address the challenge of swapping a robotic arm in one video with another—a key step for cross-embodiment learning. Unlike previous methods that depend on paired video demonstrations in the same environmental settings, our proposed framework, RoboSwap, operates on unpaired data from diverse environments, alleviating the data collection needs. RoboSwap introduces a novel video editing pipeline integrating both GANs and diffusion models, combining their isolated advantages. Specifically, we segment robotic arms from their backgrounds and train an unpaired GAN model to translate one robotic arm to another. The translated arm is blended with the original video background and refined with a diffusion model to enhance coherence, motion realism and object interaction. The GAN and diffusion stages are trained independently. Our experiments demonstrate that RoboSwap outperforms state-of-the-art video and image editing models on three benchmarks in terms of both structural coherence and motion consistency, thereby offering a robust solution for generating reliable, cross-embodiment data in robotic learning.
|
|
16:45-16:50, Paper TuDT14.2 | |
Stochasticity in Motion: An Information-Theoretic Approach to Trajectory Prediction |
|
Distelzweig, Aron | Albert-Ludwigs-Universität Freiburg |
Andreas, Look | Bosch |
Kosman, Eitan | Bosch |
Janjoš, Faris | Robert Bosch GmbH |
Wagner, Jörg | Robert Bosch GmbH |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Probability and Statistical Methods, Intelligent Transportation Systems
Abstract: In autonomous driving, accurate motion prediction is crucial for safe and efficient motion planning. To ensure safety, planners require reliable uncertainty estimates of the predicted behavior of surrounding agents, yet this aspect has received limited attention. In particular, decomposing uncertainty into its aleatoric and epistemic components is essential for distinguishing between inherent environmental randomness and model uncertainty, thereby enabling more robust and informed decision-making. This paper addresses the challenge of uncertainty modeling in trajectory prediction with a holistic approach that emphasizes uncertainty quantification, decomposition, and the impact of model composition. Our method, grounded in information theory, provides a theoretically principled way to measure uncertainty and decompose it into aleatoric and epistemic components. Unlike prior work, our approach is compatible with state-of-the-art motion predictors, allowing for broader applicability. We demonstrate its utility by conducting extensive experiments on the nuScenes dataset, which show how different architectures and configurations influence uncertainty quantification and model robustness.
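The aleatoric/epistemic decomposition the abstract refers to is commonly computed from an ensemble; a minimal version (our illustration for a discrete set of predicted modes, not the paper's implementation):

```python
import numpy as np

# Standard ensemble-based decomposition: total entropy of the mixture splits
# into expected member entropy (aleatoric) plus mutual information (epistemic).

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def decompose(member_probs):
    # member_probs: (M, K) — M ensemble members, K discrete modes/trajectories
    mixture = member_probs.mean(axis=0)
    total = entropy(mixture)
    aleatoric = entropy(member_probs).mean()   # E_m[H(p_m)]
    epistemic = total - aleatoric              # mutual information I(y; m)
    return total, aleatoric, epistemic

agree = np.array([[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]])
disagree = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]])
print(decompose(agree))     # epistemic ~ 0: members agree
print(decompose(disagree))  # epistemic large: members disagree
```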
|
|
16:50-16:55, Paper TuDT14.3 | |
CoCoL: A Communication Efficient Decentralized Collaborative Learning Method for Multi-Robot Systems |
|
Huang, Jiaxi | Zhejiang University |
Huang, Yan | KTH - Kungliga Tekniska Högskolan |
Zhao, Yixian | Zhejiang University |
Meng, Wenchao | Zhejiang University |
Xu, Jinming | Zhejiang University |
Keywords: Distributed Robot Systems, Deep Learning Methods, Multi-Robot Systems
Abstract: Collaborative learning enhances the performance and adaptability of multi-robot systems in complex tasks but faces significant challenges due to high communication overhead and data heterogeneity inherent in multi-robot tasks. To this end, we propose CoCoL, a Communication efficient decentralized Collaborative Learning method tailored for multi-robot systems with heterogeneous local datasets. Leveraging a mirror descent framework, CoCoL achieves remarkable communication efficiency with approximate Newton-type updates by capturing the similarity between objective functions of robots, and reduces computational costs through inexact sub-problem solutions. Furthermore, the integration of a gradient tracking scheme ensures its robustness against data heterogeneity. Experimental results on three representative multi-robot collaborative learning tasks show that the proposed CoCoL can significantly reduce both the number of communication rounds and total bandwidth consumption while maintaining state-of-the-art accuracy. These benefits are particularly evident in challenging scenarios involving non-IID (non-independent and identically distributed) data distribution, streaming data, and time-varying network topologies.
|
|
16:55-17:00, Paper TuDT14.4 | |
RA-DP: Rapid Adaptive Diffusion Policy for Training-Free High-Frequency Robotics Replanning |
|
Ye, Xi | Polytechnique Montreal |
Yang, Rui Heng | Huawei Technologies Canada |
Jin, Jun | Huawei Canada |
Li, Yinchuan | Huawei |
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Deep Learning Methods, Sensorimotor Learning, Task and Motion Planning
Abstract: Diffusion models exhibit impressive scalability in robotic task learning, yet they struggle to adapt to novel, highly dynamic environments. This limitation primarily stems from their constrained replanning ability: they either operate at a low frequency due to a time-consuming iterative sampling process, or are unable to adapt to unforeseen feedback in case of rapid replanning. To address these challenges, we propose RA-DP, a novel diffusion policy framework with training-free high-frequency replanning ability that solves the above limitations by adapting to unforeseen dynamic environments. Specifically, our method integrates guidance signals, which are often easily obtained in the new environment during the diffusion sampling process, and utilizes a novel action queue mechanism to generate replanned actions at every denoising step without retraining, thus forming a complete training-free framework for robot motion adaptation in unseen environments. We conduct extensive evaluations in both common simulation benchmarks and real-world environments. Our results indicate that RA-DP outperforms the state-of-the-art diffusion-based methods in terms of replanning frequency and success rate. At the end, we show that our framework is theoretically compatible with any training-free guidance signal, hence increasing its applicability to a wide range of robotics tasks.
|
|
17:00-17:05, Paper TuDT14.5 | |
STACKGEN: Generating Stable Structures from Silhouettes Via Diffusion |
|
Sun, Luzhe | Toyota Technological Institute at Chicago |
Yoneda, Takuma | Toyota Technological Institute at Chicago |
Wheeler, Samuel | Argonne National Laboratory |
Jiang, Tianchong | Toyota Technological Institute at Chicago |
Walter, Matthew | Toyota Technological Institute at Chicago |
Keywords: Deep Learning Methods, Deep Learning in Grasping and Manipulation, Representation Learning
Abstract: Humans naturally obtain intuition about the interactions between and the stability of rigid objects by observing and interacting with the world. It is this intuition that governs the way in which we regularly configure objects in our environment, allowing us to build complex structures from simple, everyday objects. Robotic agents, on the other hand, traditionally require an explicit model of the world that includes the detailed geometry of each object and an analytical model of the environment dynamics, which are difficult to scale and preclude generalization. Instead, robots would benefit from an awareness of intuitive physics that enables them to similarly reason over the stable interaction of objects in their environment. Towards that goal, we propose STACKGEN—a diffusion model that generates diverse stable configurations of building blocks matching a target silhouette. To demonstrate the capability of the method, we evaluate it in a simulated environment and deploy it in the real setting using a robotic arm to assemble structures generated by the model.
|
|
17:05-17:10, Paper TuDT14.6 | |
MambaGCN: Synergistic Integration of Graph Convolutional Networks and State Space Models for Point Cloud Processing |
|
Rao, Zhifeng | Southern University of Science and Technology |
Lin, Zhiyun | Southern University of Science and Technology |
Keywords: Deep Learning Methods, Autonomous Agents
Abstract: Graph Neural Networks (GNNs) have rapidly emerged as a formidable tool for analyzing point clouds, leveraging their capacity to aggregate local features across multiple spatial scales via layered structures. However, a significant challenge lies in effectively and selectively integrating these multi-scale features to maximize overall performance. To tackle this integration challenge, we design a novel model, MambaGCN, which employs a state space model to dynamically adjust feature weights across spatial scales during aggregation, enabling more refined feature integration while ensuring computational efficiency. Unlike transformers with their quadratic complexity, MambaGCN achieves linear complexity, substantially reducing GPU memory usage and computational cost. Moreover, we have enhanced the architectural depth by designing a density-based farthest point sampling algorithm, which allows us to selectively downsample the input data to achieve varying levels of point density. This innovation facilitates the seamless concatenation of multiple MambaGCN layers, significantly deepening the structure of the network and enhancing its ability to tackle complex point cloud tasks effectively. Through these strategic developments, MambaGCN has demonstrated outstanding performance in tasks such as point cloud classification and part segmentation, affirming its robustness and efficiency in processing point cloud data.
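A compact sketch of a density-weighted farthest point sampling variant (our reading of the idea; the exact weighting in MambaGCN may differ):

```python
import numpy as np

# Our sketch: classic FPS scores points by distance to the chosen set; here
# the score is scaled down in dense regions so sparse structures keep coverage.

def density_fps(pts, m, k=8):
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    knn = np.sort(d2, axis=1)[:, 1:k + 1]       # k nearest squared distances
    density = 1.0 / (knn.mean(axis=1) + 1e-9)   # high in crowded regions
    weight = 1.0 / density                      # favor sparse regions
    chosen = [0]
    mind = d2[0].copy()                         # distance to chosen set
    for _ in range(1, m):
        nxt = int(np.argmax(mind * weight))
        chosen.append(nxt)
        mind = np.minimum(mind, d2[nxt])
    return pts[chosen]

pts = np.random.default_rng(0).normal(size=(256, 3))
print(density_fps(pts, 32).shape)   # (32, 3)
```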
|
|
17:10-17:15, Paper TuDT14.7 | |
Hyperbolic Transformers with LLMs for Multimodal Human Activity Recognition |
|
Soleimani, Farnaz | LISSI, Université Paris-Est Créteil (UPEC) |
Khodabandelou, Ghazaleh | University of Paris-Est Créteil |
Chibani, Abdelghani | Lissi Lab Paris EST University |
Amirat, Yacine | University of Paris Est Créteil (UPEC) |
Keywords: Deep Learning Methods, AI-Based Methods, Optimization and Optimal Control
Abstract: Human Activity Recognition (HAR) plays a crucial role in applications such as healthcare, smart environments, and human-robot interaction. This study proposes a novel hyperbolic optimization strategy that improves model generalization by utilizing the geometric structure of the parameter space. To evaluate its adaptability and effectiveness, Transformer and GPT-2 models are fine-tuned on both unimodal (UCI-HAR, Opportunity) and multimodal (UTD-MHAD, NTU RGB+D) datasets. Unlike prior work that typically leverages only one or two modalities, this study utilizes all available modalities—RGB, depth, skeleton, and inertial—for multimodal evaluation. The Transformer achieves 95.86% and 93.40% accuracy on UCI-HAR and Opportunity, respectively, and 99.08% and 89.93% on UTD-MHAD and NTU RGB+D. GPT-2 also performs competitively, achieving 86.33% and 83.57% on the unimodal datasets, and 83.23% and 81.49% on the multimodal ones. These results highlight the potential of hyperbolic optimization for HAR across diverse sensor modalities and architectures.
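One common way to realize geometry-aware optimization is Riemannian SGD on the Poincaré ball; the sketch below is our generic illustration, not necessarily the paper's exact scheme:

```python
import numpy as np

# Minimal Riemannian SGD step on the Poincaré ball: the Euclidean gradient is
# rescaled by the inverse metric and the iterate is retracted into the ball.

def poincare_sgd_step(x, egrad, lr=0.1, eps=1e-5):
    lam = 2.0 / (1.0 - np.dot(x, x))        # conformal factor of the ball metric
    rgrad = egrad / lam**2                  # Riemannian gradient
    x_new = x - lr * rgrad                  # first-order retraction
    norm = np.linalg.norm(x_new)
    if norm >= 1.0:                         # project back into the open unit ball
        x_new = x_new / norm * (1.0 - eps)
    return x_new

x = np.array([0.3, 0.4])
target = np.array([0.6, -0.2])
for _ in range(200):
    x = poincare_sgd_step(x, 2 * (x - target))   # grad of ||x - target||^2
print(x)                                         # approaches target
```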
|
|
17:15-17:20, Paper TuDT14.8 | |
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications |
|
Gao, Andrew | San Jose State University |
Liu, Jun | SJSU |
Keywords: Deep Learning Methods, AI-Based Methods, Big Data in Robotics and Automation
Abstract: This paper presents a new method for anomaly detection in automated systems with time and compute sensitive requirements with unparalleled efficiency. As systems like autonomous driving become increasingly popular, ensuring their safety has become more important than ever. With this motivation, this paper focuses on how to quickly and effectively detect various anomalies in the aforementioned systems. Many detection systems have been developed with great success under spatial contexts. However, there is still significant room for improvement when it comes to temporal context. While there is substantial prior work on this task, little of it addresses model efficiency or applicability to scenarios that require real-time inference. To address this gap, we propose STEAD (Spatio-Temporal Efficient Anomaly Detection), whose backbone is developed using (2+1)D Convolutions and Performer Linear Attention, which ensures computational efficiency without sacrificing performance. When evaluated on the UCF-Crime benchmark, our base model achieves an AUC of 91.34%, outperforming the previous SOTA (state of the art), and our fast version achieves an AUC of 88.87%, while having 99.70% fewer parameters and outperforming the previous SOTA as well. The code and pretrained models are made publicly available at https://github.com/agao8/STEAD.
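The (2+1)D factorization named in the abstract replaces a full 3D convolution with a spatial 2D convolution followed by a temporal 1D convolution; a minimal PyTorch block (channel sizes are our choice, not STEAD's):

```python
import torch
import torch.nn as nn

# (2+1)D convolution: factorize a 3x3x3 kernel into 1x3x3 (space) then
# 3x1x1 (time), which is cheaper than the full 3D convolution.

class Conv2Plus1D(nn.Module):
    def __init__(self, cin, cout, mid=None):
        super().__init__()
        mid = mid or cout
        self.spatial = nn.Conv3d(cin, mid, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        self.temporal = nn.Conv3d(mid, cout, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                # x: (B, C, T, H, W)
        return self.act(self.temporal(self.act(self.spatial(x))))

y = Conv2Plus1D(3, 32)(torch.randn(1, 3, 16, 64, 64))
print(y.shape)   # torch.Size([1, 32, 16, 64, 64])
```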
|
|
TuDT15 |
206 |
Simulation and Animation |
Regular Session |
|
16:40-16:45, Paper TuDT15.1 | |
Industrial-Grade Sensor Simulation Via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation |
|
Zeng, Xianming | Cainiao Inc., Alibaba Group |
Du, Sicong | Institute of Automation,Chinese Academy of Sciences |
Chen, Qifeng | CaiNiao Inc., Alibaba Group |
Liu, Lizhe | Cainiao |
Shu, Haoyu | Unmanned Vehicle Department of CaiNiao Inc., Alibaba Group |
Gao, Jiaxuan | CaiNiao Inc., Alibaba Group |
Liu, Jiarun | Alibaba Inc |
Xu, Jiulong | CaiNiao |
Xu, Jianyun | Alibaba |
Chen, Mingxia | Alibaba Group |
Zhao, Yiru | Alibaba Group |
Chen, Peng | Cainiao Group |
Xue, Yapeng | Alibaba |
Zhao, Chunming | Alibaba |
Yang, Sheng | CaiNiao Inc., Alibaba Group |
Li, Qiang | Cainiao Group |
Keywords: Simulation and Animation, Deep Learning for Visual Perception, Sensor Fusion
Abstract: Sensor simulation is pivotal for scalable validation of autonomous driving systems, yet existing Neural Radiance Fields (NeRF) based methods face applicability and efficiency challenges in industrial workflows. This paper introduces a Gaussian Splatting (GS) based system to address these challenges: We first break down sensor simulator components and analyze the possible advantages of GS over NeRF. Then in practice, we refactor three crucial components through GS, to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline to leverage Gaussian primitives library for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization. We implement this framework on a proprietary autonomous driving dataset supporting cameras and LiDAR sensors. We demonstrate through ablation studies that our approach reduces frame-wise simulation latency, achieves better geometric and photometric consistency, and enables interpretable explicit scene editing and expansion. Furthermore, we showcase how integrating such a GS-based sensor simulator with traffic and dynamic simulators enables full-stack testing of end-to-end autonomy algorithms. Our work provides both algorithmic insights and practical validation, establishing GS as a cornerstone for industrial-grade sensor simulation.
|
|
16:45-16:50, Paper TuDT15.2 | |
DISCOVERSE: Efficient Robot Simulation in Complex High-Fidelity Environments |
|
Jia, Yufei | Department of Electronic Engineering, Tsinghua University, China |
Wang, Guangyu | Tsinghua University |
Dong, Yuhang | Zhejiang University |
Wu, Junzhe | Beihang University |
Zeng, Yupei | Zhejiang University |
Lin, Haonan | Huazhong University of Science and Technology |
Wang, Zifan | The Hong Kong University of Science and Technology (Guangzhou) |
Ge, Haizhou | Tsinghua University |
Gu, Weibin | Politecnico Di Torino |
Ding, Kairui | Tsinghua University |
Yan, Zike | Tsinghua University |
Cheng, Yunjie | Xi'an Jiaotong University |
Li, Yue | DISCOVER Robotics |
Wang, Ziming | TongJi University |
Li, Chuxuan | Tsinghua University |
Sui, Wei | Soochow University |
Shi, Lu | Tsinghua University |
Tian, Guanzhong | Zhejiang University |
Huang, Ruqi | Tsinghua Shenzhen International Graduate School |
Zhou, Guyue | Tsinghua University |
Keywords: Simulation and Animation, Imitation Learning, Data Sets for Robot Learning
Abstract: We present DISCOVERSE, the first unified, modular, open-source 3DGS-based simulation framework for Real2Sim2Real robot learning. It features a holistic Real2Sim pipeline that synthesizes hyper-realistic geometry and appearance of complex real-world scenarios, paving the way for analyzing and bridging the Sim2Real gap. Powered by Gaussian Splatting and MuJoCo, DISCOVERSE enables massively parallel simulation of multiple sensor modalities and accurate physics, with inclusive support for existing 3D assets, robot models, and ROS plugins, empowering large-scale robot learning and complex robotic benchmarks. Through extensive experiments on imitation learning, DISCOVERSE demonstrates state-of-the-art zero-shot Sim2Real transfer performance compared to existing simulators. For code and demos: https://air-discoverse.github.io/.
|
|
16:50-16:55, Paper TuDT15.3 | |
A Convex Formulation of Material Points and Rigid Bodies with GPU-Accelerated Async-Coupling for Interactive Simulation |
|
Yu, Chang | University of California, Los Angeles |
Du, Wenxin | University of California, Los Angeles |
Zong, Zeshun | University of California, Los Angeles |
Castro, Alejandro | Toyota Research Institute |
Jiang, Chenfanfu | University of California, Los Angeles |
Han, Xuchen | Toyota Research Institute |
Keywords: Simulation and Animation
Abstract: We present a novel convex formulation that weakly couples the Material Point Method (MPM) with rigid body dynamics through frictional contact, optimized for efficient GPU parallelization. Our approach features an asynchronous time-splitting scheme to integrate MPM and rigid body dynamics under different time step sizes. We develop a globally convergent quasi-Newton solver tailored for massive parallelization, achieving up to 500× speedup over previous convex formulations without sacrificing stability. Our method enables interactive-rate simulations of robotic manipulation tasks with diverse deformable objects including granular materials and cloth, with strong convergence guarantees. We detail key implementation strategies to maximize performance and validate our approach through rigorous experiments, demonstrating superior speed, accuracy, and stability compared to state-of-the-art MPM simulators for robotics. We make our method available in the open-source robotics toolkit, Drake.
|
|
16:55-17:00, Paper TuDT15.4 | |
Controllable Traffic Simulation through LLM-Guided Hierarchical Reasoning and Refinement |
|
Liu, Zhiyuan | Tsinghua University |
Li, Leheng | HKUST(GZ) |
Wang, Yuning | Tsinghua University |
Lin, Haotian | Tsinghua University |
Cheng, Hao | Tsinghua University |
Liu, Zhizhe | University of Wisconsin-Madison |
He, Lei | Tsinghua University |
Wang, Jianqiang | Tsinghua University |
Keywords: Simulation and Animation, Motion and Path Planning, Deep Learning Methods
Abstract: Evaluating autonomous driving systems in complex and diverse traffic scenarios through controllable simulation is essential to ensure their safety and reliability. However, existing traffic simulation methods face challenges in their controllability. To address this, we propose a novel diffusion-based and LLM-enhanced traffic simulation framework. Our approach incorporates a high-level understanding module and a low-level refinement module, which systematically examines the hierarchical structure of traffic elements, guides LLMs to thoroughly analyze traffic scenario descriptions step by step, and refines the generation by self-reflection, enhancing their understanding of complex situations. Furthermore, we propose a Frenet-frame-based cost function framework that provides LLMs with geometrically meaningful quantities, improving their grasp of spatial relationships in a scenario and enabling more accurate cost function generation. Experiments on the Waymo Open Motion Dataset (WOMD) demonstrate that our method can handle more intricate descriptions and generate a broader range of scenarios in a controllable manner.
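The Frenet-frame quantities such a cost function consumes reduce to longitudinal progress s and signed lateral offset d relative to a reference centerline; a minimal polyline-based conversion (our sketch, not the paper's code):

```python
import numpy as np

# Our sketch: project a point onto a polyline centerline and return
# (s, d) = (arc length along the line, signed lateral offset; left positive).

def to_frenet(point, centerline):
    seg = centerline[1:] - centerline[:-1]
    rel = point - centerline[:-1]
    t = np.clip((rel * seg).sum(1) / (seg * seg).sum(1), 0.0, 1.0)
    proj = centerline[:-1] + t[:, None] * seg
    dists = np.linalg.norm(point - proj, axis=1)
    i = int(np.argmin(dists))                        # closest segment
    arc = np.r_[0.0, np.cumsum(np.linalg.norm(seg, axis=1))]
    s = arc[i] + t[i] * np.linalg.norm(seg[i])
    cross = seg[i, 0] * rel[i, 1] - seg[i, 1] * rel[i, 0]
    d = np.sign(cross) * dists[i]                    # left of centerline => +d
    return s, d

cl = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
print(to_frenet(np.array([3.0, 1.2]), cl))           # (3.0, 1.2)
```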
|
|
17:00-17:05, Paper TuDT15.5 | |
Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-And-Place Setups |
|
Pfaff, Nicholas Ezra | Massachusetts Institute of Technology |
Fu, Evelyn | Massachusetts Institute of Technology |
Binagia, Jeremy | Amazon Robotics |
Isola, Phillip | MIT |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Simulation and Animation, Calibration and Identification
Abstract: Simulating object dynamics from real-world perception shows great promise for digital twins and robotic manipulation but often demands labor-intensive measurements and expertise. We present a fully automated Real2Sim pipeline that generates simulation-ready assets for real-world objects through robotic interaction. Using only a robot’s joint torque sensors and an external camera, the pipeline identifies visual geometry, collision geometry, and physical properties such as inertial parameters. Our approach introduces a general method for extracting high-quality, object-centric meshes from photometric reconstruction techniques (e.g., NeRF, Gaussian Splatting) by employing alpha-transparent training while explicitly distinguishing foreground occlusions from background subtraction. We validate the full pipeline through extensive experiments, demonstrating its effectiveness across diverse objects. By eliminating the need for manual intervention or environment modifications, our pipeline can be integrated directly into existing pick-and-place setups, enabling scalable and efficient dataset creation. Project page (with code and data): https://scalable-real2sim.github.io/.
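The torque-based identification idea reduces, in the simplest static case, to a linear least-squares problem; a toy version (ours, with a random stand-in regressor in place of a real gravity-torque model):

```python
import numpy as np

# Toy illustration: with the object held statically at several known poses,
# gravity torques are linear in theta = (m, m*cx, m*cy), so least squares
# recovers the inertial parameters from noisy joint-torque readings.

rng = np.random.default_rng(0)
theta_true = np.array([0.8, 0.8 * 0.03, 0.8 * (-0.01)])   # m, m*cx, m*cy
Y = rng.normal(size=(30, 3))          # stand-in regressor rows, one per pose
tau = Y @ theta_true + rng.normal(0, 1e-3, size=30)       # measured torques
theta_hat, *_ = np.linalg.lstsq(Y, tau, rcond=None)
print(theta_hat)                      # ~ [0.8, 0.024, -0.008]
```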
|
|
17:05-17:10, Paper TuDT15.6 | |
RadaRays: Real-Time Simulation of Rotating FMCW Radar for Mobile Robotics Via Hardware-Accelerated Ray Tracing |
|
Mock, Alexander | Osnabrück University |
Magnusson, Martin | Örebro University |
Hertzberg, Joachim | University of Osnabrueck |
Keywords: Simulation and Animation, Range Sensing, Software, Middleware and Programming Environments
Abstract: RadaRays allows for the accurate modeling and simulation of rotating FMCW radar sensors in complex environments, including the simulation of reflection, refraction, and scattering of radar waves. Our software is able to handle large numbers of objects and materials, making it suitable for use in a variety of mobile robotics applications. We demonstrate the effectiveness of RadaRays through a series of experiments and show that it can more accurately reproduce the behavior of FMCW radar sensors in a variety of environments, compared to the ray casting-based lidar-like simulations that are commonly used in simulators for autonomous driving such as CARLA. Our experiments additionally serve as a valuable reference point for researchers to evaluate their own radar simulations. By using RadaRays, developers can significantly reduce the time and cost associated with prototyping and testing FMCW radar-based algorithms. We also provide a Gazebo plugin that makes our work accessible to the mobile robotics community.
|
|
17:10-17:15, Paper TuDT15.7 | |
GRIP: A General Robotic Incremental Potential Contact Simulation Dataset for Unified Deformable-Rigid Coupled Grasping |
|
Ma, Siyu | University of California, Los Angeles |
Du, Wenxin | University of California, Los Angeles |
Yu, Chang | University of California, Los Angeles |
Jiang, Ying | University of California, Los Angeles |
Zong, Zeshun | University of California, Los Angeles |
Xie, Tianyi | University of California, Los Angeles |
Chen, Yunuo | University of California, Los Angeles |
Yang, Yin | University of Utah |
Han, Xuchen | Toyota Research Institute |
Jiang, Chenfanfu | University of California, Los Angeles |
Keywords: Simulation and Animation, Data Sets for Robot Learning, Grasping
Abstract: Grasping is fundamental to robotic manipulation, and recent advances in large-scale grasping datasets have provided essential training data and evaluation benchmarks, accelerating the development of learning-based methods for robust object grasping. However, most existing datasets exclude deformable bodies due to the lack of scalable, robust simulation pipelines, limiting the development of generalizable models for compliant grippers and soft manipulands. To address these challenges, we present GRIP, a General Robotic Incremental Potential contact simulation dataset for universal grasping. GRIP leverages an optimized Incremental Potential Contact (IPC)-based simulator for multi-environment data generation, achieving up to 48× speedup while ensuring efficient, intersection- and inversion-free simulations for compliant grippers and deformable objects. Our fully automated pipeline generates and evaluates diverse grasp interactions across 1,200 objects and 100,000 grasp poses, incorporating both soft and rigid grippers. The GRIP dataset enables applications such as neural grasp generation and stress field prediction.
|
|
TuDT16 |
207 |
Safety in HRI |
Regular Session |
|
16:40-16:45, Paper TuDT16.1 | |
Collision Mass Map for Safe and Efficient Human-Robot Interaction |
|
Balletshofer, Julian | Technical University of Munich |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Althoff, Matthias | Technische Universität München |
Keywords: Safety in HRI, Human-Robot Collaboration, Industrial Robots
Abstract: Efficient and safe integration of robots into human workspaces remains a significant challenge. The ISO 10218-2 standard defines permissible force thresholds that a robot is allowed to exert on humans, along with a simple model to estimate the impact force based on the impact velocity, the involved human body part, and the effective mass of the robot. In this work, we experimentally demonstrate that state-of-the-art approaches fail to compute the effective robot mass accurately, leading to unsafe or overly restrictive robot behavior. We address this shortcoming by presenting a data-driven collision mass map that accurately predicts the effective mass perceived at the end effector at any given collision location across the entire workspace. These maps are trained using a limited set of impact data selected by our proposed measurement procedure and can serve as valuable references for safety-critical applications. We validate our method on two robots, demonstrating accurate force predictions in compliance with ISO 10218-2. In our experiments, we show that our approach greatly reduces the number of required force measurements compared to state-of-the-art data-driven methods for risk assessment. Furthermore, our approach allows one to easily integrate different payloads, making it highly adaptable to various collaborative tasks. The proposed collision mass map can be standardized and deployed for any collaborative robot, enabling simple integration of robots for safe and more efficient human-robot interaction.
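For context, the two-body impact model commonly used in such PFL-style risk assessments takes the form F = v·sqrt(μk), where μ is the reduced mass of the robot/body-part pair; a collision mass map supplies the effective robot mass term. A minimal sketch with illustrative, non-normative constants:

```python
import math

def impact_force(v_rel, m_robot_eff, m_body, k_body):
    """Two-body impact force F = v_rel * sqrt(mu * k), with reduced
    mass mu = (1/m_robot_eff + 1/m_body)^-1. The constants used
    below are illustrative, not normative values from the standard."""
    mu = 1.0 / (1.0 / m_robot_eff + 1.0 / m_body)
    return v_rel * math.sqrt(mu * k_body)

# Example: 0.5 m/s impact with a 12 kg effective robot mass
print(impact_force(v_rel=0.5, m_robot_eff=12.0, m_body=0.6, k_body=75_000))
```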
|
|
16:45-16:50, Paper TuDT16.2 | |
Deep Learning-Based Proactive Hazard Prediction for Human-Robot Collaboration with Sensor Malfunctions |
|
Ma, Yuliang | University of Stuttgart |
Jin, Zilin | University of Stuttgart |
Liu, Qi | University of Stuttgart |
Mamaev, Ilshat | Proximity Robotics & Automation GmbH |
Morozov, Andrey | University of Stuttgart |
Keywords: Safety in HRI, Deep Learning Methods, Robot Safety
Abstract: Safety is a critical concern in human-robot collaboration (HRC). As collaborative robots take on increasingly complex tasks in human environments, their systems have become more sophisticated through the integration of multimodal sensors, including force-torque sensors, cameras, LiDARs, and IMUs. However, existing studies on HRC safety primarily focus on ensuring safety under normal operating conditions, overlooking scenarios where internal sensor faults occur. While anomaly detection modules can help identify sensor errors and mitigate hazards, two key challenges remain: (1) no anomaly detector is flawless, and (2) not all sensor malfunctions directly threaten human safety. Relying solely on anomaly detection can lead to missed errors or excessive false alarms. To enhance safety in real-world HRC applications, this paper introduces a deep learning-based method that proactively predicts hazards following the detection of sensory anomalies. We simulate two common types of faults—bias and noise—affecting joint sensors and monitor abnormal manipulator behaviors that could pose risks in fenceless HRC environments. A dataset of 2,400 real-world samples is collected to train the proposed hazard prediction model. The approach leverages multimodal inputs, including RGB-D images, human pose, joint states, and planned robot paths, to assess whether sensor malfunctions could lead to hazardous events. Experimental results show that the proposed method outperforms state-of-the-art models, while offering faster inference speed. Additionally, cross-scenario testing confirms its strong generalization capabilities. The code and datasets are available at: DL-based-Hazard-Prediction.
|
|
16:50-16:55, Paper TuDT16.3 | |
The Art of Not Getting Smacked: ISO/TS 15066-Compliant Variable Admittance Control for Safe Human-Robot Interaction |
|
Nini, Matteo | University of Modena and Reggio Emilia (UNIMORE) |
Pupa, Andrea | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Fantuzzi, Cesare | Università Di Modena E Reggio Emilia |
Ferraguti, Federica | Università Degli Studi Di Modena E Reggio Emilia |
Keywords: Safety in HRI, Physical Human-Robot Interaction, Human-Robot Collaboration
Abstract: Ensuring safe and effective physical human-robot interaction (pHRI) remains a critical challenge in industrial robotics, particularly with respect to compliance with the ISO/TS 15066 safety standard. This paper proposes a novel framework for safe and robust pHRI. The framework adapts the parameters of a variable admittance controller online in order to guarantee both passivity and compliance with ISO/TS 15066. Passivity is guaranteed using an energy tank, while a safety constraint explicitly handles the Power and Force Limiting (PFL) energy limit. Experimental validation on an industrial robot demonstrates the effectiveness of the framework.
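A heavily simplified 1-DoF sketch of the energy-tank idea, assuming a pure mass-damper admittance; the paper's actual tank dynamics, parameter adaptation, and PFL energy constraint are more involved:

```python
def admittance_step(x_dot, f_ext, m, d_var, d_nom, tank, e_min, dt):
    """One Euler step of m*x_ddot + d*x_dot = f_ext. Damping below
    the nominal passive value d_nom effectively injects energy; the
    shortfall is drawn from the tank, and the passive fallback
    applies once the tank would drop below e_min."""
    deficit = max(d_nom - d_var, 0.0) * x_dot ** 2 * dt
    if tank - deficit > e_min:
        d_eff, tank = d_var, tank - deficit   # draw the shortfall from the tank
    else:
        d_eff = d_nom                         # tank too low: passive fallback
    tank += d_eff * x_dot ** 2 * dt           # dissipated energy refills the tank
    x_ddot = (f_ext - d_eff * x_dot) / m
    return x_dot + x_ddot * dt, tank
```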
|
|
16:55-17:00, Paper TuDT16.4 | |
Analyzing Human Perceptions of a MEDEVAC Robot in a Simulated Evacuation Scenario |
|
Jordan, Tyson | University of Georgia |
Pandey, Pranav Kumar | University of Georgia |
Doshi, Prashant | University of Georgia |
Parasuraman, Ramviyas | University of Georgia |
Goodie, Adam | University of Georgia |
Keywords: Search and Rescue Robots, Safety in HRI, Human-Centered Robotics
Abstract: The use of autonomous systems in medical evacuation (MEDEVAC) scenarios is promising, but existing implementations overlook key insights from human-robot interaction (HRI) research. Studies on human-machine teams demonstrate that human perceptions of a machine teammate are critical in governing the machine's performance. Consequently, it is essential to identify the factors that contribute to positive human perceptions in human-machine teams. Here, we present a mixed factorial design to assess human perceptions of a MEDEVAC robot in a simulated evacuation scenario. Participants were assigned to the role of casualty (CAS) or bystander (BYS) and subjected to three within-subjects conditions based on the MEDEVAC robot's operating mode: autonomous-slow (AS), autonomous-fast (AF), and teleoperation (TO). During each trial, a MEDEVAC robot navigated an 11-meter path, acquiring a casualty and transporting them to an ambulance exchange point while avoiding an idle bystander. Following each trial, subjects completed a questionnaire measuring their emotional states, perceived safety, and social compatibility with the robot. Results indicate a consistent main effect of operating mode on reported emotional states and perceived safety. Pairwise analyses suggest that the employment of the AF operating mode negatively impacted perceptions along these dimensions. There were no persistent differences between CAS and BYS responses.
|
|
17:00-17:05, Paper TuDT16.5 | |
LiHRA: A LiDAR-Based HRI Dataset for Automated Risk Monitoring Methods |
|
Plahl, Frederik | Proximity Robotics & Automation GmbH, University of Stuttgart |
Katranis, Georgios | University of Stuttgart |
Mamaev, Ilshat | Proximity Robotics & Automation GmbH |
Morozov, Andrey | University of Stuttgart |
Keywords: Safety in HRI, Human-Robot Collaboration
Abstract: We present LiHRA, a novel dataset designed to facilitate the development of automated, learning-based, or classical risk monitoring (RM) methods for Human-Robot Interaction (HRI) scenarios. The growing prevalence of collaborative robots in industrial environments has increased the need for reliable safety systems. However, the lack of high-quality datasets that capture realistic human-robot interactions, including potentially dangerous events, slows development. LiHRA addresses this challenge by providing a comprehensive, multi-modal dataset combining 3D LiDAR point clouds, human body keypoints, and robot joint states, capturing the complete spatial and dynamic context of human-robot collaboration. This combination of modalities allows for precise tracking of human movement, robot actions, and environmental conditions, enabling accurate RM during collaborative tasks. The LiHRA dataset covers six representative HRI scenarios involving collaborative and coexistent tasks, object handovers, and surface polishing, with safe and hazardous versions of each scenario. In total, the data set includes 4,431 labeled point clouds recorded at 10 Hz, providing a rich resource for training and benchmarking classical and AI-driven RM algorithms. Finally, to demonstrate LiHRA's utility, we introduce an RM method that quantifies the risk level in each scenario over time. This method leverages contextual information, including robot states and the dynamic model of the robot. With its combination of high-resolution LiDAR data, precise human tracking, robot state data, and realistic collision events, LiHRA offers an essential foundation for future research into real-time RM and adaptive safety strategies in human-robot workspaces.
|
|
17:05-17:10, Paper TuDT16.6 | |
Advancing Robot Interaction Safety: A Teleoperated Shared-Control Approach Using a Lightweight Force-Feedback Exoskeleton |
|
Wang, Ruohan | Zhejiang University, Hangzhou, China |
Zhang, Guangwei | Zhejiang University |
Zhu, Zhengjie | Zhejiang University |
Lyu, Honghao | Zhejiang University |
Huang, Xiaoyan | Zhejiang University |
Dong, Na | Dongfang Electric (Hangzhou) Innovation Institute Company |
Chen, Lipeng | Tencent |
Deen, Mohamed Jamal | McMaster University |
Yang, Geng | Zhejiang University |
Keywords: Safety in HRI, Prosthetics and Exoskeletons, Telerobotics and Teleoperation
Abstract: Tele-homecare has become a promising approach to meet the growing demand for elderly and disability care. In such a context, ensuring human-robot interaction safety during teleoperation poses a critical challenge. Existing teleoperation control approaches focus solely on the robot's end-effector trajectory, failing to handle inevitable or even desirable contacts on other robot links. This paper proposes a teleoperated shared-control strategy to deal with this challenge. A lightweight exoskeleton is developed to teleoperate the robot and give force feedback to the operator. Additionally, an exoskeleton-based shared-control strategy is proposed to integrate operator commands with real-time proximity sensing information, allowing the robot to avoid collisions while executing tasks. To react to inevitable contact, the force feedback function is incorporated into the proposed strategy to enable the operator to experience intuitive contact. Comparative experiments and a demonstration are designed to evaluate the feasibility and reliability of the proposed strategy in a tele-homecare scenario. Compared to the traditional teleoperation strategy, the proposed method can greatly reduce the contact forces on the robot's links, indicating the potential of the proposed strategy in advancing safety in tele-homecare systems.
|
|
17:10-17:15, Paper TuDT16.7 | |
Dynamic Walking Corridor Generation for Visually Impaired Navigation Using Social Force Models and Convex Optimization |
|
Na, Qingquan | Nanjing University of Science and Technology |
Zhou, Hui | Nanjing University of Science and Technology |
Fu, Zhenyu | Nanjing University of Science and Technology |
Yang, Li | Nanjing University of Science and Technology |
Frisoli, Antonio | TeCIP Institute, Scuola Superiore Sant'Anna |
Keywords: Safety in HRI, Motion and Path Planning, Human-Robot Collaboration
Abstract: This paper presents a dynamic walking corridor generation (DWCG) algorithm designed to enhance navigation safety for visually impaired individuals in crowded pedestrian environments. Current physical human-robot interaction (pHRI) systems struggle with random pedestrian movements and interaction disturbances in such settings. To address these limitations, we propose a safety-critical framework that integrates Safe Flight Corridor concepts with pedestrian dynamics modeling. The method constructs time-varying Safe Walking Corridors (SWCs) through convex polyhedra decomposition, constrained by social force model predictions. Simulation experiments demonstrate a 100% success rate in moderate crowds (50 pedestrians or fewer) with 10.1 ms average computation time, and 86.3% success in high-density environments (100 pedestrians), establishing a foundation for reliable assistive navigation systems in complex urban settings.
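The pedestrian-dynamics constraint rests on the classic social force model, in which pedestrian j exerts an exponentially decaying repulsive force on pedestrian i. A minimal sketch with illustrative constants (the paper's exact parameterization is not given in the abstract):

```python
import numpy as np

def repulsive_force(p_i, p_j, A=2.0, B=0.3, r=0.6):
    """Helbing-style social force on pedestrian i from pedestrian j:
    magnitude A*exp((r - d)/B) along the line joining them, where d
    is their distance and r the sum of their radii. A, B, and r are
    illustrative constants."""
    diff = p_i - p_j
    d = np.linalg.norm(diff)
    return A * np.exp((r - d) / B) * diff / d

# Force on a pedestrian 1 m away from another, pointing away from them
print(repulsive_force(np.array([0.0, 0.0]), np.array([1.0, 0.0])))
```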
|
|
17:15-17:20, Paper TuDT16.8 | |
Social Robot Haru Assisting Dynamic Group Discussion with Autonomous Eye Gaze Behavior
|
Tang, Fei | Ocean University of China |
Hu, Mingyang | Ocean University of China |
Fang, Yu | Honda Research Institute Japan Co., Ltd |
Yu, Hongqi | Ocean University of China |
Nichols, Eric | Honda Research Institute Japan |
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Li, Guangliang | Ocean University of China |
Keywords: Robot Companions, Human-Centered Automation, Human-Centered Robotics
Abstract: Owing to recent advances in large language models and robotics, social robots may soon play an important role in people's daily lives and are expected to improve dynamic multi-party group discussions in social scenarios. In this paper, we develop a system to assist dynamic group discussion with our social robot Haru. Our system is composed of three modules: a Dialogue Assistance module that integrates Haru with large language models, enabling Haru to act as an embodied chatbot; a Balancing and Welcoming Behavior module that uses verbal behaviors to improve users' engagement and welcome new users into the discussion; and an Autonomous Eye Gazing module that conveys politeness during group discussion, e.g., gazing at the talking user, gazing at a less-engaged user to encourage them, looking at a newcomer when they join the discussion, and gazing via eyeball movement when the current speaker is close to the previous one. The autonomous eye gazing behavior was first trained via deep reinforcement learning in simulation and then transferred to the physical Haru in the real world. Results of our user study with 50 subjects demonstrate the strong performance of our system in assisting dynamic group discussion.
|
|
TuDT17 |
210A |
Autonomous Vehicles 3 |
Regular Session |
Co-Chair: Zhu, Yilong | HKUST |
|
16:40-16:45, Paper TuDT17.1 | |
Landing-Aware Multi-Drone Routing in Last-Mile Delivery Services |
|
Kwon, JiHyun | DGIST |
Chen, Yi-Ying | National Taiwan University |
Lee, GaHyun | DGIST |
Lin, Chung-Wei | National Taiwan University |
Kim, BaekGyu | DGIST |
Keywords: Logistics, Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: We propose a framework to compute optimal routes for multiple drones to minimize delivery time in last-mile delivery services. We focus on the notion of a landing exclusion zone that appears during the landing phase: an area around the drop-off site is blocked until a drop-off is completed. Such zones affect the delivery time, as other drones must detour or hover around the site unnecessarily. We formulate a Mixed-Integer Linear Programming (MILP) problem that explicitly models the landing phase. We then present a heuristic algorithm that iteratively solves a sequence of single-drone delivery problems according to delivery priorities. A delivery priority is determined by the spatiotemporal occupancy, which quantifies the size of the landing exclusion zone and its blocking period. We designed experiments covering 48 urban delivery scenarios with varying density and distribution of delivery destinations, departure points, and order quantities. Our experimental results show that the heuristic computes routes significantly faster than the original MILP, with a delivery time 5% higher than the optimal solution (lower bound) and 60% lower than the general requirement of a single package per round trip (upper bound).
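The priority-based decomposition reads naturally as a greedy loop; the sketch below is a hypothetical rendering in which solve_single and the returned route object are assumed helpers, not the authors' code:

```python
def plan_routes(orders, occupancy, solve_single):
    """Serve orders in decreasing spatiotemporal occupancy (landing
    exclusion zone size x blocking period). Each order becomes a
    single-drone subproblem that treats the landing exclusion zones
    reserved by earlier drones as spatiotemporal obstacles."""
    routes, reserved_zones = [], []
    for order in sorted(orders, key=occupancy, reverse=True):
        route = solve_single(order, reserved_zones)  # hypothetical solver
        reserved_zones.append(route.landing_zone)    # reserve zone + interval
        routes.append(route)
    return routes
```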
|
|
16:45-16:50, Paper TuDT17.2 | |
Learning to Generate Vectorized Maps at Intersections with Multiple Roadside Cameras |
|
Zheng, Quanxin | NavInfo Co. Ltd |
Fan, Miao | Beijing Institute of Graphic Communication |
Xu, Shengtong | Autohome Inc |
Kong, Linghe | Shanghai Jiao Tong University |
Xiong, Haoyi | Baidu |
Keywords: Mapping, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: Vectorized maps are indispensable for precise navigation and the safe operation of autonomous vehicles. Traditional methods for constructing these maps fall into two categories: offline techniques, which rely on expensive, labor-intensive LiDAR data collection and manual annotation, and online approaches that use onboard cameras to reduce costs but suffer from limited performance, especially at complex intersections. To bridge this gap, we introduce MRC-VMap, a cost-effective, vision-centric, end-to-end neural network designed to generate high-definition vectorized maps directly at intersections. Leveraging existing roadside surveillance cameras, MRC-VMap directly converts time-aligned, multi-directional images into vectorized map representations. This integrated solution lowers the need for additional intermediate modules--such as separate feature extraction and Bird’s-Eye View (BEV) conversion steps--thus reducing both computational overhead and error propagation. Moreover, the use of multiple camera views enhances mapping completeness, mitigates occlusions, and provides robust performance under practical deployment constraints. Extensive experiments conducted on 4,000 intersections across 4 major metropolitan areas in China demonstrate that MRC-VMap not only outperforms state-of-the-art online methods but also achieves accuracy comparable to high-cost LiDAR-based approaches, thereby offering a scalable and efficient solution for modern autonomous navigation systems.
|
|
16:50-16:55, Paper TuDT17.3 | |
On Learning Racing Policies with Reinforcement Learning |
|
Czechmanowski, Grzegorz | Poznan University of Technology, IDEAS Research Institute, IDEAS |
Węgrzynowski, Jan | IDEAS NCBR, Poznan University of Technology |
Kicki, Piotr | Poznan University of Technology |
Walas, Krzysztof, Tadeusz | Poznan University of Technology |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Autonomous Vehicle Navigation
Abstract: Fully autonomous vehicles promise enhanced safety and efficiency. However, ensuring reliable operation in challenging corner cases requires control algorithms capable of performing at the vehicle limits. We address this requirement by considering the task of autonomous racing and propose solving it by learning a racing policy using Reinforcement Learning (RL). Our approach leverages domain randomization, actuator dynamics modeling, and policy architecture design to enable reliable and safe zero-shot deployment on a real platform. Evaluated on the F1TENTH race car, our RL policy not only surpasses a state-of-the-art Model Predictive Control (MPC), but, to the best of our knowledge, also represents the first instance of an RL policy outperforming expert human drivers in RC racing. This work identifies the key factors driving this performance improvement, providing critical insights for the design of robust RL-based control strategies for autonomous vehicles.
|
|
16:55-17:00, Paper TuDT17.4 | |
Scanning Bot: Efficient Scan Planning Using Panoramic Cameras |
|
Lee, Euijeong | Ewha Womans University |
Han, Kyung Min | Ewha Womans Univeristy |
Kim, Young J. | Ewha Womans University |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, View Planning for SLAM
Abstract: Panoramic RGB-D cameras enable high-quality 3D scene reconstruction but require manual viewpoint selection and physical camera transportation, making the process time-consuming and tedious—especially for novice users. Key challenges include ensuring sufficient feature overlap between camera views and planning collision-free paths. We propose a fully autonomous scan planner that generates efficient and collision-free tours with adequate viewpoint overlap to address these issues. Experiments in both synthetic and real-world environments show that our method achieves up to 99% scan coverage and is up to 3× faster than state-of-the-art view planners.
|
|
17:00-17:05, Paper TuDT17.5 | |
Online Navigation Method for Mobile Robot Based on Thermal Compliance (I) |
|
Wan, Shaoke | Xi'an Jiaotong University |
Wang, Yunlong | Xi'an Jiaotong University |
Qi, Pengyuan | Xi'an Jiaotong University |
Fang, Yuanyang | Xi'an Jiaotong University |
Li, Xiaohu | Xi'an Jiaotong University |
Keywords: Motion and Path Planning
Abstract: This paper proposes an online navigation method for mobile robots based on thermal compliance. The basic concept is that the optimal navigation path corresponds to the path of minimal resistance to heat conduction during steady-state heating. This objective can be achieved by continuously adding highly thermally conductive material from the heat source toward the heat sink. The optimal direction in which to lay the highly conductive material, which also represents the optimal direction of motion for the mobile robot, is determined using a bisection method. To generate feasible real-time trajectories, the state space of the mobile robot is discretized, and a set of motion primitives is generated by solving two-point boundary-value optimal control problems. Simulations and experiments demonstrate that the proposed method is robust in finding paths in unknown and dynamic environments without becoming trapped in local minima.
|
|
17:05-17:10, Paper TuDT17.6 | |
Finite-Time Guiding Vector Fields for Accelerated Path Following of Nonholonomic Robots |
|
Yang, Jian | South China University of Technology |
Wu, Junlong | South China University of Technology |
Ouyang, Yuan | South China University of Technology |
Yao, Weijia | Hunan University |
Keywords: Motion Control, Wheeled Robots, Motion and Path Planning
Abstract: Guiding vector fields (GVFs) have been widely applied in robotic path-following control. However, most, if not all, existing studies derive control algorithms that only render the path-following error asymptotically converging to zero, while more stringent time constraints on the path-following error convergence have not been fully studied. In this paper, by introducing a signum-based function, we propose a finite-time GVF that enables a nonholonomic robot to follow an arbitrary smooth nD desired path within a finite time. Note that the finite time depends on the initial condition and can be computed in advance. For practical applications, we design a controller based on the proposed GVF for the unicycle model. This controller drives a nonholonomic robot's velocity to align with that of the GVF within a finite time. In addition, we introduce an extension of the proposed GVF to distributed motion coordination among an arbitrary number of robots. Finally, we conduct two experiments using unmanned ground vehicles to validate the effectiveness of the proposed algorithms.
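To see the role of the signum-based function, the sketch below reduces the idea to the simplest path, the x-axis: replacing the usual proportional error term -k·e with -k·sign(e)·|e|^α (0 < α < 1) is what upgrades asymptotic convergence to finite-time convergence. This is an illustrative reduction, not the paper's general nD construction:

```python
import numpy as np

def finite_time_gvf(pos, k=1.0, alpha=0.5):
    """Guiding vector field for the path phi(x, y) = y = 0. The
    signum-based correction -k*sign(e)*|e|**alpha (0 < alpha < 1)
    yields finite-time rather than asymptotic error convergence."""
    e = pos[1]                       # path-following error phi(pos)
    tangent = np.array([1.0, 0.0])   # propagation along the path
    normal = np.array([0.0, 1.0])    # gradient of phi
    return tangent - k * np.sign(e) * abs(e) ** alpha * normal
```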
|
|
TuDT18 |
210B |
Multi-Robot Systems 4 |
Regular Session |
|
16:40-16:45, Paper TuDT18.1 | |
Swarming without an Anchor (SWA): Robot Swarms Adapt Better to Localization Dropouts Than a Single Robot
|
Horyna, Jiri | Czech Technical University in Prague |
Jung, Roland | University of Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Ferrante, Eliseo | Vrije Universiteit Amsterdam |
Saska, Martin | Czech Technical University in Prague |
Keywords: Multi-Robot Systems, Localization, Swarm Robotics
Abstract: In this paper, we present the Swarming Without an Anchor (SWA) approach to state estimation in swarms of Unmanned Aerial Vehicles (UAVs) experiencing ego-localization dropout, where individual agents are laterally stabilized using relative information only. We propose to fuse decentralized state estimation with robust mutual perception and onboard sensor data to maintain accurate state awareness despite intermittent localization failures. Thus, the relative information used to estimate the lateral state of each UAV enables the identification of its unambiguous state with respect to the local constellation. The resulting behavior reaches velocity consensus, as the task can be viewed as a double-integrator synchronization problem. All disturbances and performance degradations are attenuated, except for a uniform translational drift of the swarm as a whole, which opens new opportunities for using tight cooperation to increase the reliability and resilience of multi-UAV systems. Simulations and real-world experiments validate the effectiveness of our approach, demonstrating its capability to sustain cohesive swarm behavior in challenging conditions of unreliable or unavailable primary localization.
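The double-integrator synchronization view has a compact form: each agent accelerates toward its neighbors' velocities, u_i = k·Σ_j a_ij (v_j − v_i), and on a connected graph all velocities converge to a common value, leaving only the uniform drift noted above. A minimal sketch, assuming a fixed adjacency matrix:

```python
import numpy as np

def velocity_consensus(V, A, k=0.5, dt=0.02, steps=500):
    """Laplacian velocity consensus for double integrators:
    u_i = k * sum_j a_ij * (v_j - v_i). V is an (n, 2) array of
    velocities, A the (n, n) adjacency matrix."""
    for _ in range(steps):
        U = k * (A @ V - A.sum(axis=1, keepdims=True) * V)
        V = V + U * dt
    return V

# Three agents on a line graph converge to their common mean velocity
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
V0 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
print(velocity_consensus(V0, A))
```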
|
|
16:45-16:50, Paper TuDT18.2 | |
MR-COGraphs: Communication-Efficient Multi-Robot Open-Vocabulary Mapping System Via 3D Scene Graphs |
|
Gu, Qiuyi | Tsinghua University |
Ye, Zhaocheng | Tsinghua University |
Yu, Jincheng | Tsinghua University |
Tang, Jiahao | Tsinghua University |
Yi, Tinghao | University of Science and Technology of China |
Dong, Yuhan | Tsinghua University |
Wang, Jian | Tsinghua Univ |
Cui, Jinqiang | Peng Cheng Laboratory |
Chen, Xinlei | Tsinghua University |
Wang, Yu | Tsinghua University |
Keywords: Multi-Robot Systems, Mapping, RGB-D Perception
Abstract: Collaborative perception in unknown environments is crucial for multi-robot systems. With the emergence of foundation models, robots can now not only perceive geometric information but also achieve open-vocabulary scene understanding. However, existing map representations that support open-vocabulary queries often involve large data volumes, which becomes a bottleneck for multi-robot transmission in communication-limited environments. To address this challenge, we develop a method to construct a graph-structured 3D representation called COGraph, where nodes represent objects with semantic features and edges capture their spatial adjacency relationships. Before transmission, a data-driven feature encoder is applied to compress the feature dimensions of the COGraph. Upon receiving COGraphs from other robots, the semantic features of each node are recovered using a decoder. We also propose a feature-based approach for place recognition and translation estimation, enabling the merging of local COGraphs into a unified global map. We validate our framework on two realistic datasets and in a real-world environment. The results demonstrate that, compared to existing baselines for open-vocabulary map construction, our framework reduces the data volume by over 80% while maintaining mapping and query performance without compromise. For more details, please visit our website at https://github.com/efc-robot/MR-COGraphs.
|
|
16:50-16:55, Paper TuDT18.3 | |
Density Adaptive Registration of Large-Scale Point Clouds in Diverse Outdoor Environments |
|
Wang, Jiawei | Dalian University of Technology |
Zhuang, Yan | Dalian University of Technology |
Yan, Fei | Dalian University of Technology |
Zhang, Hong | SUSTech |
Keywords: Multi-Robot Systems, Mapping
Abstract: Point cloud registration is the foundation of collaborative multi-robot mapping tasks in outdoor environments. Due to the dynamic changes in communication bandwidth, the density of point clouds transmitted from the robot to the server will also change simultaneously, which will significantly affect the point cloud registration accuracy and even lead to the failure of collaborative mapping. To address this problem, we propose a density adaptive registration method for large-scale point clouds with varying densities in diverse outdoor environments. To extract robust point features and establish correspondences between point clouds to be registered, we use an improved MinkowskiUNet32 with a high-resolution point-based branch, which can provide high-resolution point information to supplement the coarse-grained voxel information. Then we propose an outlier rejection algorithm for point correspondences based on the relative height difference of the laser points between point clouds to be registered, which can eliminate wrong matches in the process of coarse registration. To adapt to point density variations in multi-robot outdoor mapping, a novel probability distribution-based point filter is presented to filter out points with dissimilar normal distribution, which can establish accurate point correspondences in fine-tuning. Extensive experiments on large-scale outdoor point cloud datasets KITTI, ETH, and Wild-Places demonstrate that the proposed method achieves state-of-the-art performance in accuracy and efficiency under the condition of varying densities of point clouds. In particular, our method achieves the best generalization on unseen domains between urban and field scenarios. Code is released at https://github.com/dutwjw/darls.
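The height-based outlier rejection admits a one-line filter, assuming both clouds are gravity-aligned with z as height and that corr is an (N, 2) array of correspondence index pairs; the 0.5 m threshold is an assumption:

```python
import numpy as np

def reject_by_relative_height(src, dst, corr, max_dz=0.5):
    """Drop correspondences whose z (height) difference exceeds
    max_dz. For ground vehicles sharing a gravity-aligned frame,
    large height gaps indicate wrong matches during coarse
    registration."""
    dz = np.abs(src[corr[:, 0], 2] - dst[corr[:, 1], 2])
    return corr[dz < max_dz]
```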
|
|
16:55-17:00, Paper TuDT18.4 | |
Homotopy-Aware Multi-Agent Navigation Via Distributed Model Predictive Control |
|
Dong, Haoze | Peking University |
Guo, Meng | Peking University |
He, Chengyi | Beihang University |
Li, Zhongkui | Peking University |
Keywords: Multi-Robot Systems, Motion and Path Planning, Cooperating Robots
Abstract: Multi-agent trajectory planning ensures safety and efficiency, but deadlocks remain a challenge, especially in obstacle-dense environments. These frequently occur when multiple agents attempt to traverse the same long and narrow corridor simultaneously. To address this, we propose a novel distributed trajectory planning framework that bridges the gap between global path and local trajectory cooperation. At the global level, a homotopy-aware optimal path planning algorithm is proposed, which fully leverages the topological structure of the environment. A reference path is chosen from distinct homotopy classes by considering both its spatial and temporal properties, leading to improved coordination among agents globally. At the local level, a model predictive control-based trajectory optimization method is employed to generate dynamically feasible and collision-free trajectories. Additionally, an online replanning strategy ensures its adaptability to dynamic environments. Simulations and experiments validate the effectiveness of our approach in mitigating deadlocks. Ablation studies demonstrate that by incorporating time-aware homotopic properties into the underlying global paths, our method can significantly reduce deadlocks and improve the average success rate from 4%-13% to over 90% in randomly generated dense scenarios.
|
|
17:00-17:05, Paper TuDT18.5 | |
Affine Formation Maneuver Control for NUSVs: An Anti-Competing Interaction Solution with Random Packet Losses (I) |
|
Zhou, Xiaotao | Harbin Engineering University |
Huang, Bing | Northwestern Polytechnical University |
Zhu, Cheng | Northwestern Polytechnical University |
Zhou, Bin | Northwestern Polytechnical University |
Qin, Hong-De | Harbin Engineering University |
Miao, Jianming | Sun Yat-Sen University |
Keywords: Multi-Robot Systems, Networked Robots, Motion Control
Abstract: A network's communication mechanism and reliability are the main factors that affect the maneuverability and robustness of networked unmanned surface vehicles (NUSVs). This article investigates event-driven affine formation maneuver control (AFMC) of NUSVs. Two synchronously occurring configuration maneuvers, i.e., position formation and attitude consensus, can be achieved under the event-driven AFMC through discrete neighboring communication. Nevertheless, the communication channel will be occupied by multiple packets simultaneously when different nodes trigger their events at the same time. Worse still, broadcast packets may be lost when inter-vehicle Euclidean distances are maneuvered beyond the allowable communication range, so the pre-specified maneuvering requirement fails to be achieved. To address this, by introducing a reference system for each vehicle, a novel dynamic interleaved periodic event-triggered mechanism (DIPETM) is explored to prevent communication competition among NUSVs. Based on this framework, an inner dynamic variable determined by the conditionally prescribed event-detecting period is first constructed, which is not only designed to optimize the triggering frequency but is also responsible for evaluating the secondary damage caused by random packet losses. Numerical simulations are conducted to illustrate the efficacy of this work.
|
|
17:05-17:10, Paper TuDT18.6 | |
Multi-Robot Reliable Navigation in Uncertain Topological Environments with Graph Attention Networks |
|
Yu, Zhuoyuan | National University of Singapore |
Guo, Hongliang | Agency for Science Technology and Research |
Chew, Chee Meng | National University of Singapore |
Adiwahono, Albertus Hendrawan | I2R A-STAR |
Chan, Jianle | Institute for Infocomm Research |
Shong, Brina Wey Tynn | Institute for Infocomm Research (I2R), Agency for Science, Techn |
Yau, Wei-Yun | I2R |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Agent-Based Systems
Abstract: This paper studies the multi-robot reliable navigation problem in uncertain topological networks, which aims at maximizing the robot team's on-time arrival probabilities in the face of road network uncertainties. The uncertainty in these networks stems from the unknown edge traversability, which is only revealed to the robot upon its arrival at the edge's starting node. Existing approaches often struggle to adapt to real-time network topology changes, making them unsuitable for varying topological environments. To address the challenge, we reformulate the problem into a Partially Observable Markov Decision Process (POMDP) framework and introduce the Dynamic Adaptive Graph Embedding method to capture the evolving nature of the navigation task. We further enhance each robot's policy learning process by integrating deep reinforcement learning with Graph Attention Networks (GATs), leveraging self-attention to focus on critical graph features. The proposed approach, namely Multi-robot Adaptive Navigation via Graph Attention-based Reinforcement learning (MANGAR) employs the generalized policy gradient algorithm to optimize the robots' real-time decision-making process iteratively. We compare the performance of MANGAR with state-of-the-art reliable navigation algorithms as well as Canadian traveller problem solutions in a range of canonical transportation networks, demonstrating improved adaptability and performance in uncertain topological networks. Additionally, real-world experiments with two robots navigating within a self-constructed indoor environment with uncertain topological structures demonstrate MANGAR's practicality.
|
|
17:10-17:15, Paper TuDT18.7 | |
CBTMP: Optimizing Multi-Agent Path Finding in Heterogeneous Cooperative Environments |
|
Gao, Jianqi | Harbin Institute of Technology (Shenzhen) |
Li, Yanjie | Harbin Institute of Technology (Shenzhen) |
Mu, Yongjin | Harbin Institute of Technology, Shenzhen |
Liu, Qi | Northeastern University |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Lou, Yunjiang | Harbin Institute of Technology, Shenzhen |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance
Abstract: This paper introduces the Conflict-Based Three-agent Meeting with Pickup (CBTMP), a near-optimal algorithm tailored for cooperative multi-agent path finding in heterogeneous environments, specifically to boost the operational efficiency of intelligent warehouses. CBTMP is a two-level algorithm. The high-level policy identifies the meeting positions for heterogeneous agents by reformulating the cooperative multi-agent path finding problem as a multi-group, three-agent meeting with pickup problem. Using the meeting positions and predefined task positions, the low-level policy utilizes the proposed conflict-based search with time-step alignment algorithm to plan conflict-free paths for all heterogeneous agents. Extensive evaluations on six two-dimensional grid benchmark maps reveal that CBTMP not only significantly bolsters solution success rates but also attains near-optimal sum-of-costs and makespan values. To confirm its real-world applicability, we also validate CBTMP through experiments with physical Turtlebot3 robots.
|
|
17:15-17:20, Paper TuDT18.8 | |
CAT-ORA: Collision-Aware Time-Optimal Formation Reshaping for Efficient Robot Coordination in 3D Environments |
|
Kratky, Vit | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Horyna, Jiri | Czech Technical University in Prague |
Stibinger, Petr | Czech Technical University in Prague |
Baca, Tomas | Czech Technical University in Prague FEE |
Petrlik, Matej | Czech Technical University in Prague, Faculty of Electrical Engi |
Stepan, Petr | Czech Technical University in Prague, Faculty of Electrical Engi |
Saska, Martin | Czech Technical University in Prague |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance, Formation Reshaping
Abstract: In this paper, we introduce an algorithm designed to address the problem of time-optimal formation reshaping in 3D environments while preventing collisions between agents. The utility of the proposed approach is particularly evident in mobile robotics, where agents benefit from being organized and navigated in formation for a variety of real-world applications requiring frequent alterations in formation shape for efficient navigation or task completion. Given the constrained operational time inherent to battery-powered mobile robots, the time needed to complete the formation reshaping process is crucial for their efficient operation. The proposed Collision-Aware Time-Optimal formation Reshaping Algorithm (CAT-ORA) builds upon the Hungarian algorithm for the solution of the robot-to-goal assignment implementing the inter-agent collision avoidance through direct constraints on mutually exclusive robot-goal pairs combined with a trajectory generation approach minimizing the duration of the reshaping process. Theoretical validations confirm the optimality of CAT-ORA, with its efficacy further showcased through simulations, and a real-world outdoor experiment involving 19 UAVs. Thorough numerical analysis shows the potential of CAT-ORA to decrease the time required to perform complex formation reshaping tasks by up to 49%, and 12% on average compared to commonly used methods in randomly generated scenarios.
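The assignment core builds on the Hungarian algorithm, which in Python is a one-liner via scipy. The sketch below minimizes the summed distances and reports the longest leg, the quantity that drives reshaping time; CAT-ORA's actual min-max (time-optimal) objective and its mutual-exclusion collision constraints are omitted here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def reshaping_assignment(starts, goals):
    """Hungarian robot-to-goal assignment on Euclidean distances.
    starts, goals: (n, 3) arrays of robot and goal positions."""
    cost = np.linalg.norm(starts[:, None, :] - goals[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # min-sum assignment
    return cols, cost[rows, cols].max()       # goal per robot, longest leg
```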
|
|
TuDT19 |
210C |
Grippers and End-Effectors |
Regular Session |
|
16:40-16:45, Paper TuDT19.1 | |
A Novel Robot Hand with Hoeckens Linkages and Soft Phalanges for Scooping and Self-Adaptive Grasping in Environmental Constraints |
|
Guo, Wentao | Beijing Institute of Technology |
Wang, Yizhou | Southern University of Science and Technology |
Zhang, Wenzeng | Shenzhen X-Institute |
Keywords: Grippers and Other End-Effectors, Compliant Joints and Mechanisms, Compliance and Impedance Control
Abstract: This paper presents a novel underactuated adaptive robotic hand, Hoeckens-A Hand, which integrates the Hoeckens mechanism, a double-parallelogram linkage, and a specialized four-bar linkage to achieve three adaptive grasping modes: parallel pinching, asymmetric scooping, and enveloping grasping. Hoeckens-A Hand requires only a single linear actuator, leveraging passive mechanical intelligence to ensure adaptability and compliance in unstructured environments. Specifically, the vertical motion of the Hoeckens mechanism introduces compliance, the double-parallelogram linkage ensures line contact at the fingertip, and the four-bar amplification system enables natural transitions between different grasping modes. Additionally, the inclusion of a mesh-textured silicone phalanx further enhances the ability to envelop objects of various shapes and sizes. This study employs detailed kinematic analysis to optimize the push angle and design the linkage lengths for optimal performance. Simulations validated the design by analyzing the fingertip motion and ensuring smooth transitions between grasping modes. Furthermore, the grasping force was analyzed using power equations to enhance the understanding of the system's performance. Experimental validation using a 3D-printed prototype demonstrates the three grasping modes of the hand in various scenarios under environmental constraints, verifying its grasping stability and broad applicability.
|
|
16:45-16:50, Paper TuDT19.2 | |
Kangaroo Tail-Inspired Variable Stiffness Executive Mechanism for Rescue Robots |
|
Lu, Maoshi | Yanshan University |
Zhao, Yanzhi | Yanshan University |
Ma, Feixiang | Yanshan University |
Shan, Yu | Dalian University of Technology |
Zhang, Bowen | Yanshan University |
Keywords: Grippers and Other End-Effectors, Biomimetics, Search and Rescue Robots
Abstract: In the field of casualty extraction and rescue, a key challenge is avoiding secondary injuries to the human body during rescue operations. Currently, most rescue robot executive mechanisms are rigid, which increases the risk of contact-related injuries. To address this issue, a rescue executive mechanism with variable stiffness, inspired by the kangaroo tail, is proposed. This mechanism can adapt to the human body's contours with flexible contact while providing sufficient rigidity and load-bearing capacity. By applying the layer jamming principle, the mechanism achieves variable stiffness, enabling it to self-adapt to the human body profile and support large loads. Drawing inspiration from the rigid support and shape adaptability of the kangaroo tail, a 3D model and a prototype of the executive mechanism were developed using bionic principles. The mechanism's performance was validated through variable stiffness tests, contour adaptation tests, and human model holding and loading experiments. The results demonstrate that the rescue robot executive mechanism exhibits high load-bearing capacity (load 32.17 kg), strong adaptability, and enhanced safety, effectively addressing the limitations of rigid mechanisms that may harm the human body.
|
|
16:50-16:55, Paper TuDT19.3 | |
Passive Actuator-Less Gripper for Pick-And-Place of a Piece of Fabric (I) |
|
Seino, Akira | Centre for Transformative Garment Production |
Tokuda, Fuyuki | Centre for Transformative Garment Production |
Kobayashi, Akinari | Centre for Transformative Garment Production |
Kosuge, Kazuhiro | The University of Hong Kong |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: In this article, we propose a Passive Actuator-Less Gripper (PALGRIP) for picking a piece of fabric from a stack of fabric parts and placing the picked fabric part. The picking of a piece of fabric from a stack is a simple but difficult process to automate. The proposed gripper can pick a piece of fabric from the stack by simply pressing the fingertips of the gripper against the stack. The fingers are closed and opened by the relative motion between the fingers and the housing of the gripper. The grasping motion of the gripper is generated by two mechanisms: a passive pinching mechanism and a self-locking mechanism. These mechanisms allow the fingers to perform opening and closing movements and to maintain the fingers in either the open or closed state. The kinematics of the mechanisms are analyzed to design the gripper. The relation between the movement of the fingers and the force required to operate the gripper is also investigated through static force analysis and experiments. Finally, experiments using PALGRIP are conducted, and the experimental results illustrate how the pick-and-place operations are carried out using the prototype of PALGRIP. The proposed gripper allows the robot to automate fabric pick-and-place operations easily by attaching it to the robot's endpoint.
|
|
16:55-17:00, Paper TuDT19.4 | |
TetraGrip: Sensor-Driven Multi-Suction Reactive Object Manipulation in Cluttered Scenes |
|
Torrado, Paolo | University of Washington |
Levin, Joshua | University of Washington |
Grotz, Markus | University of Washington (UW) |
Smith, Joshua R. | University of Washington |
Keywords: Grippers and Other End-Effectors, Sensor-based Control, Grasping
Abstract: Warehouse robotic systems equipped with vacuum grippers must reliably grasp a diverse range of objects from densely packed shelves. However, these environments present significant challenges, including occlusions, diverse object orientations, stacked and obstructed items, and surfaces that are difficult to suction. We introduce TetraGrip, a novel vacuum based grasping strategy featuring four suction cups mounted on linear actuators. Each actuator is equipped with an optical time-of-flight (ToF) proximity sensor, enabling reactive grasping. We evaluate TetraGrip in a warehouse-style setting, demonstrating its ability to manipulate objects in stacked and obstructed configurations. Our results show that reinforcement learning (RL) strategies improve picking success in stacked object scenarios by 22.9% compared to a single-suction gripper. Additionally, we demonstrate that TetraGrip can successfully grasp objects in scenarios where a single-suction gripper fails due to physical limitations, specifically in two cases: (1) picking an object occluded by another object and (2) retrieving an object in a complex scenario. These findings highlight the advantages of multi-actuated, suction-based grasping in unstructured warehouse environments. The project website is available at: https://tetragrip.github.io/.
|
|
17:00-17:05, Paper TuDT19.5 | |
A Bionic Robotic Hand Designed with Multiple Grasping Modes and Magnetic Tactile Perception |
|
Wang, Shixian | Jiangnan University |
Yang, Shaobo | Beijing University of Posts and Telecommunications |
Wang, Junfeng | Beijing University of Posts and Telecommunications |
Li, Boao | Wuhan University of Science and Technology |
Sun, Fuchun | Tsinghua University |
Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua Un |
Yan, Junxia | Jiangnan University |
Keywords: Grippers and Other End-Effectors, Grasping, Force and Tactile Sensing
Abstract: This paper presents a novel multi-mode bionic robotic hand. Its bionic finger (BIF) combines a magnetic-silica-gel skin with a rigid skeletal framework and integrates a vacuum suction cup at the fingertip. This design enables the bionic manipulator to execute multiple grasping modes, namely enveloping, parallel, and suction grasping. The proposed BIF emulates the skeletal structure of human fingers and equips the fingertip with suction-based grasping functionality, achieving both formal bionics and functional superiority. The overall grasping workspace of the bionic manipulator can be determined by computing the offset of the steel wire, which corresponds to the bending angles of the finger's three joints. Furthermore, by identifying the four phases of the bionic manipulator's object-grasping process, we explore in depth the distinctive data characteristics of the magnetic tactile sensing unit during grasping, and on this basis achieve an accurate prediction of the grasped object's diameter. We constructed an autonomous grasping platform by integrating an external depth camera with the robotic arm to assess the fundamental performance of this robotic hand in grasping diverse objects.
|
|
17:05-17:10, Paper TuDT19.6 | |
A Soft Active Surface Gripper for Safe in Hand Manipulation of Fragile Objects |
|
Xiang, Sheng | Nanjing University of Information Science and Technology |
Li, Jiahao | Nanjing University of Information Science and Technology |
Zhang, Yinqi | Nanjing University of Information Science and Technology |
Wei, Zhong | Nanjing University of Information Science and Technology |
Liu, Jia | Nanjing University of Information Science & Technology |
Yang, Yang | Nanjing University of Information Science and Technology |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Soft Robot Applications
Abstract: This paper introduces a soft active surface gripper designed to manipulate fragile objects safely. This gripper consists of two fingers, each equipped with two compliant pneumatic actuators and a soft active surface. The gripper utilizes the elastic belt as its soft active surface, which is driven by a motor, and the opening angle of the elastic band is controlled by pneumatic actuators. The novel design allows for the passive deformation of both the soft active surface and the compliant pneumatic actuator, enabling adaptation to various object shapes and demonstrating superior handling capabilities for delicate items. By synchronizing the opening and closing of the pneumatic fingers with the conveying motion of the active surface, the active surface gripper realizes three degrees of freedom (DOF) for in-plane manipulation, specifically two translational movements and one rotational movement. A prototype gripper has been designed and fabricated for in-plane manipulation experiments with fragile objects, including strawberries, miniature cupcakes, and pears. Experimental results demonstrate that the gripper can execute in-plane in-hand manipulation of fragile objects with varying geometries and dimensions while maintaining secure and robust handling, preventing object slippage and preserving surface integrity without causing damage.
|
|
17:10-17:15, Paper TuDT19.7 | |
Dynamic Layer Detection of Thin Materials Using DenseTact Optical Tactile Sensors |
|
Dhawan, Ankush | Stanford University |
Chungyoun, Camille | Stanford University |
Ting, Karina | Stanford University |
Kennedy, Monroe | Stanford University |
Keywords: Grippers and Other End-Effectors, Grasping, Force and Tactile Sensing
Abstract: Manipulation of thin materials is critical for many everyday tasks and remains a significant challenge for robots. While existing research has made strides in tasks like material smoothing and folding, many studies struggle with common failure modes (crumpled corners/edges, incorrect grasp configurations) that a preliminary step of layer detection can solve. We present a novel method for classifying the number of grasped material layers using a custom gripper equipped with DenseTact 2.0 optical tactile sensors. After grasping a thin material, the gripper performs an anthropomorphic rubbing motion while collecting optical flow, 6-axis wrench, and joint state data. Using this data in a transformer-based network achieves a test accuracy of 98.21% in correctly classifying the number of grasped cloth layers, and 81.25% accuracy in classifying layers of grasped paper, showing the effectiveness of our dynamic rubbing method. Evaluating different inputs and model architectures highlights the usefulness of tactile sensor information and a transformer model for this task. A comprehensive dataset of 568 labeled trials (368 for cloth and 200 for paper) was collected and made open-source along with this paper.
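A minimal sketch of the classification architecture described here, assuming the per-timestep optical-flow, wrench, and joint-state features are already fused into a 64-dimensional vector; all dimensions are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class LayerClassifier(nn.Module):
    """Transformer encoder over a rubbing-motion time series,
    classifying the number of grasped layers (e.g., 0-3)."""
    def __init__(self, feat_dim=64, n_classes=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        z = self.encoder(x).mean(dim=1)    # average-pool over time
        return self.head(z)                # class logits

logits = LayerClassifier()(torch.randn(8, 50, 64))  # 8 trials, 50 steps
```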
|
|
17:15-17:20, Paper TuDT19.8 | |
Development and Evaluation of a Quasi-Passive Stiffness Display |
|
Shi, Ke | Southeast University |
Gao, Yingchen | National University of Singapore |
Xiang, Yichen | Southeast University |
Zhang, Maozeng | Southeast University |
Zhu, Lifeng | Southeast University |
Song, Aiguo | Southeast University |
Keywords: Haptics and Haptic Interfaces, Mechanism Design, Virtual Reality and Interfaces
Abstract: In the force feedback of physical human-robot interaction, stiffness is rendered to represent the hardness of the manipulated object, aiding users in executing accurate maneuvers. Active feedback based on electric motors usually cannot simultaneously achieve both high force and high backdrivability, leading to limitations in stiffness rendering. Therefore, this letter explores a quasi-passive stiffness display based on variable stiffness mechanism (VSM). It can provide controllable stiffness feedback theoretically ranging from zero to infinity, while the feedback force mainly comes from the VSM's reaction to the user's press force. First, a VSM that decouples the output stiffness from the press displacement is proposed. Then, a one-degree-of-freedom stiffness display prototype based on the VSM is developed and evaluated through quantitative experiments. The experimental results demonstrate that the quasi-passive stiffness display can meet the requirements of diverse tasks within a wide stiffness/force range.
|
|
TuDT20 |
210D |
Humanoid and Bipedal Locomotion 2 |
Regular Session |
|
16:40-16:45, Paper TuDT20.1 | |
Learning Perceptive Humanoid Locomotion Over Challenging Terrain |
|
Sun, Wandong | Harbin Institute of Technology |
Cao, Baoshi | Harbin Institute of Technology |
Chen, Long | Harbin Institute of Technology |
Su, Yongbo | Harbin Institute of Technology |
Liu, Yang | Harbin Institute of Technology |
Xie, Zongwu | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Machine Learning for Robot Control
Abstract: Humanoid robots are engineered to navigate terrains akin to those encountered by humans, which necessitates human-like locomotion and perceptual abilities. Currently, the most reliable controllers for humanoid motion rely exclusively on proprioception, a reliance that becomes both dangerous and unreliable when coping with rugged terrain. Although the integration of height maps into perception can enable proactive gait planning, robust utilization of this information remains a significant challenge, especially when exteroceptive perception is noisy. To surmount these challenges, we propose a solution based on a teacher-student distillation framework. In this paradigm, an oracle policy accesses noise-free data to establish an optimal reference policy, while the student policy not only imitates the teacher's actions but also simultaneously trains a world model with a variational information bottleneck for sensor denoising and state estimation. Extensive evaluations demonstrate that our approach markedly enhances performance in scenarios characterized by unreliable terrain estimation. Moreover, in rigorous testing across both challenging urban settings and off-road environments, the model successfully traversed 2 km of varied terrain without external intervention.
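A minimal sketch of the training objective such a teacher-student scheme implies: imitate the oracle's action while training a denoising world model under an information-bottleneck penalty. Reducing the variational bottleneck to an L2 latent penalty, and all weights, are assumptions rather than the paper's loss:

```python
import torch.nn.functional as F

def student_loss(student_act, teacher_act, latent, recon_obs, clean_obs,
                 beta=1e-3):
    """Imitation term + world-model reconstruction of noise-free
    observations + bottleneck penalty on the latent state."""
    imitation = F.mse_loss(student_act, teacher_act)
    denoising = F.mse_loss(recon_obs, clean_obs)
    bottleneck = beta * latent.pow(2).mean()   # stand-in for the KL term
    return imitation + denoising + bottleneck
```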
|
|
16:45-16:50, Paper TuDT20.2 | |
The Duke Humanoid: Design and Control for Energy Efficient Bipedal Locomotion Using Passive Dynamics |
|
Xia, Boxi | Duke University |
Li, Bokuan | Duke University |
Lee, Jacob | Duke University |
Scutari, Michael | Duke University |
Chen, Boyuan | Duke University |
Keywords: Humanoid and Bipedal Locomotion, Humanoid Robot Systems, Passive Walking
Abstract: We present the Duke Humanoid, an open-source 10-degree-of-freedom humanoid, as an extensible platform for locomotion research. The design mimics human physiology, with symmetrical body alignment in the frontal plane to maintain static balance with straight knees. We develop a reinforcement learning policy that can be deployed zero-shot on the hardware for velocity-tracking walking tasks. Additionally, to enhance energy efficiency in locomotion, we propose an end-to-end reinforcement learning algorithm that encourages the robot to leverage passive dynamics. Our experimental results show that our passive policy reduces the cost of transport by up to 50% in simulation and 31% in real-world tests. Our website is url{http://generalroboticslab.com/DukeHumanoidv1/}.
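Cost of transport, the metric reported above, is conventionally CoT = E / (m g d). A minimal sketch for evaluating it from logged joint data (array shapes are our assumptions):

```python
# Sketch: dimensionless cost of transport from logged torques and velocities.
import numpy as np

def cost_of_transport(joint_torques, joint_velocities, mass, distance, dt, g=9.81):
    """joint_torques, joint_velocities: arrays of shape (T, n_joints)."""
    mech_power = np.sum(np.abs(joint_torques * joint_velocities), axis=1)  # W per step
    energy = np.sum(mech_power) * dt                                       # J
    return energy / (mass * g * distance)                                  # lower is better
```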
|
|
16:50-16:55, Paper TuDT20.3 | |
Transporting Heavy Payloads with a Humanoid Riding a Hoverboard |
|
Armleder, Simon | Technische Universität München |
Zhang, Yupeng | Technical University of Munich |
Guadarrama-Olvera, J. Rogelio | Technical University of Munich |
Cheng, Gordon | Technical University of Munich |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Force and Tactile Sensing
Abstract: Driven by the need for rapid and reliable heavy payload transport in logistics and manufacturing, researchers are increasingly exploring early applications of humanoid robotics in these domains. Although bipedal locomotion excels on challenging terrain, wheeled modes of transportation remain significantly more energy-efficient on flat surfaces. In this work, we develop a control system that enables a humanoid robot to achieve fast transportation - by riding a two-wheeled hoverboard - and robust heavy payload handling through whole-body grasping, where the robot uses its chest and arms to stabilize bulky objects. Our approach models payload-induced disturbances using a Linear Inverted Pendulum Mode extended with external forces and leverages tactile feedback from an integrated robotic skin to estimate the payload’s weight and center of mass. Feeding these estimates into the hoverboard controller reduces drift and enhances stability. Experimental evaluations on a full-sized real humanoid robot show that our system can withstand strong disturbances and autonomously navigate to deliver payloads of up to 20 kg.
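The abstract's disturbance model admits a compact sketch: the Linear Inverted Pendulum Mode extended with an external payload force on the horizontal CoM dynamics, x_ddot = omega^2 (x - p) + f_ext / m with omega = sqrt(g / z_c), where the tactile-skin estimate of the payload feeds f_ext (variable names are our assumptions):

```python
# Sketch: one Euler step of the payload-extended LIP dynamics, per horizontal axis.
import numpy as np

def lip_step(x, xd, p, f_ext, m, z_c, dt, g=9.81):
    omega2 = g / z_c
    xdd = omega2 * (x - p) + f_ext / m   # payload-induced disturbance enters here
    return x + xd * dt, xd + xdd * dt    # explicit Euler integration of (x, xd)
```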
|
|
16:55-17:00, Paper TuDT20.4 | |
Trajectory Generation for Humanoid Backflips and Jumps Based on Whole-Body Dynamics Optimization with Consideration of KKT Residual Convergence |
|
Konishi, Masanori | The University of Tokyo |
Hiraoka, Takuma | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control, Optimization and Optimal Control
Abstract: For trajectory generation of whole-body jumping motions such as humanoid backflips, it is crucial to simultaneously optimize the takeoff, flight, and landing phases while considering full-body dynamics and kinematics. Although such methods have been proposed for standard jumping motions, they have not been applied to more dynamic actions such as frontflips, backflips, and yaw twist jumps, where strong nonlinearity and high sensitivity to certain parameters (e.g., rotor inertia and torque cost weights) pose significant challenges. To address these challenges, we apply a two-stage optimization strategy to an existing full-body dynamics optimization method that simultaneously optimizes the takeoff, flight, and landing phases. In our approach, the same initialization and reference trajectory generation rules are shared across motions, and the solution from the first optimization is used not only as an initial guess but also as a reference in the second optimization. This strategy improves the convergence of the KKT residuals across various jump types and mitigates sensitivity to parameters such as rotor inertia and torque cost weights. As a result, our method achieves unified trajectory generation for frontflips, backflips, yaw twist jumps, and standard jumps using the same initialization, cost weights, and constraints. We also analyze the sensitivity to rotor inertia and show that exceeding a certain threshold can lead to a sharp deterioration in KKT residual convergence.
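The two-stage strategy described above follows a generic pattern that can be sketched solver-agnostically: solve once, then re-solve warm-started at the first solution while also regularizing toward it as a reference (the regularization weight and solver choice below are our assumptions, not the paper's setup):

```python
# Sketch: two-stage optimization where stage 1's solution is both the initial
# guess and the reference trajectory for stage 2.
import numpy as np
from scipy.optimize import minimize

def two_stage_optimize(cost, x0, w_ref=1e-2):
    stage1 = minimize(cost, x0, method="SLSQP")
    x_ref = stage1.x
    # Stage 2: same cost, warm-started at x_ref and pulled gently toward it,
    # which is the mechanism credited with improving KKT residual convergence.
    cost2 = lambda x: cost(x) + w_ref * np.sum((x - x_ref) ** 2)
    return minimize(cost2, x_ref, method="SLSQP")
```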
|
|
17:00-17:05, Paper TuDT20.5 | |
Preferenced Oracle Guided Multi-Mode Policies for Dynamic Bipedal Loco-Manipulation |
|
Ravichandar, Prashanth | University of Southern California |
Rajan, Lokesh Krishna | University of Southern California |
Sobanbabu, Nikhil | Carnegie Mellon University |
Nguyen, Quan | University of Southern California |
Keywords: Humanoid and Bipedal Locomotion, Multi-Contact Whole-Body Motion Planning and Control, Reinforcement Learning
Abstract: Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis relies on training low-level skill policies and explicitly switching with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate to an optimal approach to dribble, and eventually kick a goal—a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy mastering all the required modes and preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automatons as oracles to generate references with continuous dynamics and discrete mode jumps to perform a guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
|
|
17:05-17:10, Paper TuDT20.6 | |
Adaptive Step Duration for Accurate Foot Placement: Achieving Robust Bipedal Locomotion on Terrains with Restricted Footholds |
|
Xiang, Zhaoyang | The Ohio State University |
Paredes, Victor | The Ohio State University |
Castillo, Guillermo A. | The Ohio State University |
Hereid, Ayonga | Ohio State University |
Keywords: Humanoid and Bipedal Locomotion, Passive Walking, Whole-Body Motion Planning and Control
Abstract: Traditional one-step preview planning algorithms for bipedal locomotion struggle to generate viable gaits when walking across terrains with restricted footholds, such as stepping stones. To overcome such limitations, this paper introduces a novel multi-step preview foot placement planning algorithm based on the step-to-step discrete evolution of the Divergent Component of Motion (DCM) of walking robots. Our proposed approach adaptively changes the step duration and the swing foot trajectory for optimal foot placement under constraints, thereby enhancing the long-term stability of the robot and significantly improving its ability to navigate environments with tight constraints on viable footholds. We demonstrate its effectiveness through various simulation scenarios with complex stepping-stone configurations and external perturbations. These tests underscore its improved performance for navigating foothold-restricted terrains, even with external disturbances.
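The step-to-step DCM evolution the method builds on has a closed form: with xi = x + xd / omega and a constant stance-foot VRP p over a step of duration T, xi(T) = p + exp(omega T) (xi0 - p). When the terrain fixes the foothold, the step duration can be adapted instead of the placement; a per-axis sketch (duration bounds are illustrative assumptions):

```python
# Sketch: step-to-step DCM propagation and adaptive step timing.
import numpy as np

def dcm_after_step(xi0, p, omega, T):
    return p + np.exp(omega * T) * (xi0 - p)

def adapt_step_duration(xi0, p, xi_desired, omega, T_min=0.2, T_max=0.8):
    """Solve exp(omega*T) = (xi_desired - p) / (xi0 - p) for T, then clamp."""
    ratio = (xi_desired - p) / (xi0 - p)
    if ratio <= 1.0:        # desired DCM not reachable by waiting: step as fast as allowed
        return T_min
    return float(np.clip(np.log(ratio) / omega, T_min, T_max))
```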
|
|
17:10-17:15, Paper TuDT20.7 | |
DSE: Denoising State Estimator for RL-Based Bipedal Robot Locomotion |
|
Du, Yidong | Beijing Institute of Technology |
Zhou, Zishun | Beijing Institute of Technology |
Chen, Xuechao | Beijing Insititute of Technology |
Yu, Zhangguo | Beijing Institute of Technology |
Wu, Jiahao | Beijing Institute of Technology |
Zhang, YuanXi | Beijing Institute of Technology |
Zhao, Qingrui | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Reinforcement Learning
Abstract: Recent advancements in legged robot locomotion and reinforcement learning have demonstrated significant potential for the development of bipedal robots. However, the state estimation accuracy and locomotion robustness of RL-based controllers are significantly influenced by the IMU's measurement noise. High-precision IMUs can provide accurate information, but their manufacturing cost is high, while robots equipped with low-cost IMUs may face large noise and bias inconsistency. In this paper, we propose a novel denoising autoencoder-based state estimator (DSE) to address the sensor noise cancellation and state estimation problems in RL-based bipedal robot locomotion control. The DSE architecture learns a compact representation of the robot's system dynamics underlying the low-cost IMU's noisy data, and provides noise-reduced measurements and accurate state estimation for the learning-based controller. We demonstrate the effectiveness of the DSE architecture in reducing noise and enhancing the robustness of both state estimation and locomotion control in various indoor and outdoor experiments. The results highlight the potential of the DSE framework in handling the noise distribution differences between simulation and reality.
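A hedged sketch of a denoising-autoencoder state estimator in the spirit of DSE (architecture, sizes, and heads are our assumptions, not the paper's): the encoder maps noisy IMU observations to a latent code, from which one head reconstructs the clean signal and another regresses the state.

```python
# Sketch: denoising autoencoder with an auxiliary state-estimation head.
import torch
import torch.nn as nn

class DenoisingStateEstimator(nn.Module):
    def __init__(self, obs_dim, latent_dim, state_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ELU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Linear(latent_dim, obs_dim)       # reconstructs clean IMU signal
        self.estimator = nn.Linear(latent_dim, state_dim)   # e.g. base-velocity estimate

    def forward(self, obs_noisy):
        z = self.encoder(obs_noisy)
        return self.decoder(z), self.estimator(z)

# Training would pair noisy simulated IMU data with its clean counterpart:
#   recon, state = model(obs_noisy)
#   loss = mse(recon, obs_clean) + mse(state, state_ground_truth)
```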
|
|
17:15-17:20, Paper TuDT20.8 | |
Natural Humanoid Robot Locomotion with Generative Motion Prior |
|
Zhang, Haodong | Zhejiang University |
Zhang, Liang | Zhejiang Humanoid Robot Innovation Center |
Chen, Zhenghan | Zhejiang University |
Chen, Lu | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Humanoid and Bipedal Locomotion, Human and Humanoid Motion Analysis and Synthesis, Natural Machine Motion
Abstract: Natural and lifelike locomotion remains a fundamental challenge for humanoid robots to interact with human society. However, previous methods either neglect motion naturalness or rely on unstable and ambiguous style rewards. In this paper, we propose a novel Generative Motion Prior (GMP) that provides fine-grained motion-level supervision for the task of natural humanoid robot locomotion. To leverage natural human motions, we first employ whole-body motion retargeting to effectively transfer them to the robot. Subsequently, we train a generative model offline to predict future natural reference motions for the robot based on a conditional variational auto-encoder. During policy training, the generative motion prior serves as a frozen online motion generator, delivering precise and comprehensive supervision at the trajectory level, including joint angles and keypoint positions. The generative motion prior significantly enhances training stability and improves interpretability by offering detailed and dense guidance throughout the learning process. Experimental results in both simulation and real-world environments demonstrate that our method achieves superior motion naturalness compared to existing approaches. Project page can be found at https://sites.google.com/view/humanoid-gmp
|
|
TuDT21 |
101 |
Optimization and Optimal Control 4 |
Regular Session |
|
16:40-16:45, Paper TuDT21.1 | |
A Receding Horizon Online Trajectory Generation Method for Time-Efficient Path Tracking with Dynamic Factors (I) |
|
Zhang, Shiyu | Beijing University of Posts and Telecommunications |
Sun, Da | University of Science and Technology of China |
Liao, Qianfang | University of Science and Technology of China |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Optimization and Optimal Control
Abstract: Current path-tracking trajectory planning methods intrinsically suffer from a heavy computational burden, making them unsuitable for modern autonomous robotic applications requiring real-time reactivity. To achieve flexible, accurate, and efficient motions, this article proposes a real-time trajectory planning framework for time-efficient path tracking. First, a receding horizon method is designed that generates local trajectories online while considering the global kinematic constraints. Second, an analytical closed-form method is developed for calculating the local time-minimal trajectory. Third, the models and solutions are established for handling dynamic factors. Compared to existing methods, this method exhibits lower computational complexity and can deal with path changes during motion executions while maintaining tracking accuracy and time efficiency. Experiments using Franka-Emika Panda robots are conducted in handover scenarios involving unpredictable dynamic obstacles and human interventions. The results demonstrate the low computational overhead. The robot flexibly reacts to online path changes, maintaining tracking accuracy while marginally compromising time efficiency.
|
|
16:45-16:50, Paper TuDT21.2 | |
Experimental Investigation of Time-Delayed Control for Enhanced Performance in a High-Static-Low-Dynamic-Stiffness Vibration Isolation System (I) |
|
Cai, Jiazhi | Harbin Institute of Technology, Shenzhen |
Gao, Qingbin | Harbin Institute of Technology (Shenzhen) |
Zhu, Shihao | Harbin Institute of Technology, ShenZhen |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Sensor-based Control
Abstract: This article explores the dynamic effects of implementing time-delayed feedback in a passive high-static-low-dynamic-stiffness (HSLDS) isolator and its potential to enhance low-frequency isolation performance. An inherent delay of 7–16 ms and a filter delay of 53 ms are identified to demonstrate that the active feedback in HSLDS systems intrinsically involves delay effects. We then clarify the effects of delayed acceleration feedback on the stability and steady-state response of the isolator. Finally, we develop an optimization strategy employing the Broyden-Fletcher-Goldfarb-Shanno algorithm to improve the isolation performance while ensuring the stability margin. Experimental results indicate that optimizing the feedback parameters enhances the isolation by over 50% and reduces the initial isolation frequency from 8.48 to 6.2 Hz. These findings prove that the feedback delay, when effectively harnessed, can serve as a valuable control resource, particularly when combined with the negative stiffness structure to maximize low-frequency vibration suppression effect.
|
|
16:50-16:55, Paper TuDT21.3 | |
Optimizing for Ride Comfort: A Model Predictive Control Framework with Frequency-Domain Analysis of the Acceleration Sequence |
|
Hsiao, Chun-Chien | University of Illinois at Urbana-Champaign |
An, Gihyeob | University of Illinois at Urbana-Champaign |
Talebpour, Alireza | University of Illinois at Urbana-Champaign |
Keywords: Optimization and Optimal Control, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Improving ride comfort can help accelerate the adoption of autonomous vehicles (AVs). Unfortunately, very few studies directly consider comfort in the controller design, and most of the existing ones focus solely on instantaneous acceleration and jerk. Such an approach cannot fully capture ride comfort. In fact, the International Organization for Standardization (ISO) emphasizes that ride comfort should be evaluated based on acceleration patterns over time. To bridge this research gap, this study proposes a comfort-centric Model Predictive Control (MPC) framework that optimizes both tangential and lateral acceleration patterns for optimal 2D maneuvers, including longitudinal acceleration and steering rate. The framework is subsequently tested against turning trajectories from the Waymo Open Dataset. Results demonstrate that our approach improves ride comfort compared to the original Waymo trajectories. Here, more comfort improvement can be achieved at higher lateral acceleration, implying that the proposed MPC framework can lead to more gentle turning behaviors. These findings highlight the effectiveness of the proposed MPC framework in enhancing ride comfort.
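The ISO view referenced above scores acceleration patterns, typically as a frequency-weighted RMS of the acceleration trace. A simplified sketch follows; the true ISO 2631-1 weighting filters are more involved, so the weighting callable below is a placeholder assumption the reader would replace:

```python
# Sketch: frequency-weighted RMS acceleration as a comfort score.
import numpy as np

def weighted_rms_acceleration(accel, dt, weighting):
    """accel: 1-D acceleration trace; weighting: callable w(f) -> gain array."""
    n = len(accel)
    freqs = np.fft.rfftfreq(n, dt)
    spectrum = np.fft.rfft(accel) * weighting(freqs)   # apply comfort weighting
    weighted = np.fft.irfft(spectrum, n)
    return float(np.sqrt(np.mean(weighted ** 2)))      # lower = more comfortable
```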
|
|
16:55-17:00, Paper TuDT21.4 | |
Analytic Conditions for Differentiable Collision Detection in Trajectory Optimization |
|
Jaitly, Akshay | Worcester Polytechnic Institute |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Ota, Kei | Mitsubishi Electric |
Shirai, Yuki | Mitsubishi Electric Research Laboratories |
Keywords: Optimization and Optimal Control, Collision Avoidance, Computational Geometry
Abstract: Optimization-based methods are widely used for computing fast, diverse solutions for complex tasks such as collision-free movement or planning in the presence of contacts. However, most of these methods require enforcing non-penetration constraints between objects, resulting in a non-trivial and computationally expensive problem. This makes the use of optimization-based methods for planning and control challenging. In this paper, we present a method to efficiently enforce non-penetration of sets while performing optimization over their configuration, which is directly applicable to problems like collision-aware trajectory optimization. We introduce novel differentiable conditions with analytic expressions to achieve this. To enforce non-collision between non-smooth bodies using these conditions, we introduce a method to approximate polytopes as smooth semi-algebraic sets. We present several numerical experiments to demonstrate the performance of the proposed method and compare the performance with other baseline methods recently proposed in the literature.
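One common way to approximate a polytope by a smooth semi-algebraic set, sketched here to illustrate the general idea (not necessarily the paper's exact construction), is to replace the non-smooth maximum over halfspace margins with a log-sum-exp:

```python
# Sketch: smooth over-approximation of a polytope {x : A x <= b}.
import numpy as np
from scipy.special import logsumexp

def smooth_polytope_margin(x, A, b, k=50.0):
    """Smooth upper bound of max_i (A[i] @ x - b[i]); <= 0 means inside.
    Larger k tightens the approximation toward the true polytope."""
    margins = A @ x - b
    return logsumexp(k * margins) / k   # differentiable everywhere, unlike max()
```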
|
|
17:00-17:05, Paper TuDT21.5 | |
Improving Drone Racing Performance through Iterative Learning MPC |
|
Zhao, Haocheng | Technical University of Munich |
Schlüter, Niklas | Technical University Munich |
Brunke, Lukas | Technical University of Munich |
Schoellig, Angela P. | TU Munich |
Keywords: Optimization and Optimal Control, Incremental Learning, Learning from Experience
Abstract: Autonomous drone racing presents a challenging control problem, requiring real-time decision-making and robust handling of nonlinear system dynamics. While iterative learning model predictive control (LMPC) offers a promising framework for iterative performance improvement, its direct application to drone racing faces challenges like real-time compatibility or the trade-off between time-optimal and safe traversal. In this paper, we enhance LMPC with three key innovations: (1) an adaptive cost function that dynamically weights time-optimal tracking against centerline adherence, (2) a shifted local safe set to prevent excessive shortcutting and enable more robust iterative updates, and (3) a Cartesian-based formulation that accommodates safety constraints without the singularities or integration errors associated with Frenet-frame transformations. Results from extensive simulation and real-world experiments demonstrate that our improved algorithm can optimize initial trajectories generated by any controller with any level of tuning, for an improvement in lap time of up to 60.85%. Even applied to the most aggressively tuned state-of-the-art model-based controller, MPCC++, on a real drone, a 6.05% improvement is still achieved. Overall, the proposed method pushes the drone toward faster traversal and avoids collisions in simulation and real-world experiments, making it a practical solution to improve the peak performance of drone racing.
|
|
17:05-17:10, Paper TuDT21.6 | |
Priority-Based Energy Allocation in Buildings through Distributed Model Predictive Control (I) |
|
Li, Hongyi | Harbin Institute of Technology, Shenzhen |
Xu, Jun | Harbin Institute of Technology, Shenzhen |
Zhao, Qianchuan | Tsinghua University |
Keywords: Building Automation, Optimization and Optimal Control
Abstract: Many countries are facing energy shortages today, and HVAC systems account for a large share of building energy consumption. For scenarios where the energy supply to HVAC systems is insufficient, a priority-based allocation scheme based on distributed model predictive control is proposed in this paper, which distributes the energy rationally based on priority order. According to the scenarios, two distributed allocation strategies, i.e., a one-to-one priority strategy and a multi-to-one priority strategy, are developed in this paper and validated by simulation in a building containing three zones and a building containing 36 rooms, respectively. Both priority-based strategies fully exploit the potential of predictive control solutions. The experiments show that our scheme has good scalability and achieves the performance of the centralized strategy while keeping the calculation tractable.
|
|
17:10-17:15, Paper TuDT21.7 | |
BoundPlanner: A Convex-Set-Based Approach to Bounded Manipulator Trajectory Planning |
|
Oelerich, Thies | TU Wien |
Hartl-Nesic, Christian | TU Wien |
Beck, Florian | TU Wien |
Kugi, Andreas | TU Wien |
Keywords: Constrained Motion Planning, Optimization and Optimal Control, Industrial Robots
Abstract: Online trajectory planning enables robot manipulators to react quickly to changing environments or tasks. Many robot trajectory planners exist for known environments but are often too slow for online computations. Current methods in online trajectory planning do not find suitable trajectories in challenging scenarios that respect the limits of the robot and account for collisions. This work proposes a trajectory planning framework consisting of the novel Cartesian path planner based on convex sets, called BoundPlanner, and the online trajectory planner BoundMPC. BoundPlanner explores and maps the collision-free space using convex sets to compute a reference path with bounds. BoundMPC is extended in this work to handle convex sets for path deviations, which allows the robot to optimally follow the path within the bounds while accounting for the robot's kinematics. Collisions of the robot's kinematic chain are considered by a novel convex-set-based collision avoidance formulation independent of the number of obstacles. Simulations and experiments with a 7-DoF manipulator show the performance of the proposed planner compared to state-of-the-art methods. The source code is available at github.com/TU-Wien-ACIN-CDS/BoundPlanner and videos of the experiments can be found at www.acin.tuwien.ac.at/42d4.
|
|
TuDT22 |
102A |
Robotics in Harsh Environment |
Regular Session |
|
16:40-16:45, Paper TuDT22.1 | |
Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments |
|
Zheng, Chunxin | The Hong Kong University of Science and Technology(Guangzhou) |
Li, Yulin | Hong Kong University of Science and Technology(HKUST) |
Song, Zhiyuan | The Hong Kong University of Science and Technology(Guangzhou) |
Bi, Zhihai | Hong Kong University of Science and Technology (Guangzhou) |
Zhou, Jinni | Hong Kong University of Science and Technology (Guangzhou) |
Zhou, Boyu | Southern University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Mobile Manipulation
Abstract: Mobile manipulators typically encounter significant challenges in navigating narrow, cluttered environments due to their high-dimensional state spaces and complex kinematics. While reactive methods excel in dynamic settings, they struggle to efficiently incorporate complex, coupled constraints across the entire state space. In this work, we present a novel local reactive controller that reformulates the time-domain single-step problem into a multi-step optimization problem in the spatial domain, leveraging the propagation of a serial kinematic chain. This transformation facilitates the formulation of customized, decoupled link-specific constraints, which is further solved efficiently with augmented Lagrangian differential dynamic programming (AL-DDP). Our approach naturally absorbs spatial kinematic propagation in the forward pass and processes all link-specific constraints simultaneously during the backward pass, enhancing both constraint management and computational efficiency. Notably, in this framework, we formulate collision avoidance constraints for each link using accurate geometric models with extracted free regions, and this improves the maneuverability of the mobile manipulator in narrow, cluttered spaces. Experimental results showcase significant improvements in safety, efficiency, and task completion rates. These findings underscore the robustness of the proposed method, particularly in narrow, cluttered environments where conventional approaches could falter. The open-source project can be found at https://github.com/Chunx1nZHENG/MM-with-Whole-Body-Safety-Release.git.
|
|
16:45-16:50, Paper TuDT22.2 | |
Deformation Control and Thrust Analysis of a Flexible Fishtail with Muscle-Like Actuation |
|
Gu, Junwen | Institute of Automation, Chinese Academy of Sciences |
Wang, Jian | Institute of Automation, Chinese Academy of Sciences |
Liu, Zhijie | Beihang University |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Yu, Junzhi | Chinese Academy of Sciences |
Wu, Zhengxing | Institute of Automation, Chinese Academy of Sciences |
Keywords: Robotic Fish, Biologically-Inspired Robots, Biomimetics, Motion Control
Abstract: In nature, fish have evolved sophisticated muscular systems that enable them to dynamically regulate their body movements for efficient and agile swimming, which has inspired the development of compact and fast flexibility regulation mechanisms in robotic fish. While existing robotic fish have primarily relied on passive flexible mechanisms and tunable stiffness mechanisms, these approaches often lack the dynamic adjustment capabilities that are characteristic of living fish. This article proposes a novel biomimetic flexible fishtail capable of dynamically controlling its deformation through artificial muscles made from macrofiber composite. In detail, the fishtail is equipped with a servo motor as the sole driving joint, while the artificial muscles regulate the deformation to indirectly adjust stiffness. A dynamic model considering both flexibility and hydrodynamics is established, and a partial differential equation observer is particularly developed to estimate the tail's full states. Subsequently, a deformation control framework incorporating a deep reinforcement learning strategy is constructed and successfully deployed on an embedded platform via lightweight design. Simulation and experimental results validate the accuracy and effectiveness of the dynamic model, observer, and control strategy. In particular, the proposed fishtail demonstrates the ability to enhance propulsion in fishlike swimming modes across various frequencies, by 15% to 203%. When assembled into an untethered robotic prototype, deformation control allows the prototype's swimming speed to vary, achieving up to 42% slower or 37% faster speeds compared to passive compliance. Its rapid adjustability and adaptability to different frequencies represent significant advancements not widely reported in previous studies. The obtained results offer significant insights for flexible robotic systems seeking to enhance their agility and interactivity.
|
|
16:50-16:55, Paper TuDT22.3 | |
Analysis and Design of a Bistable Tail for a Hybrid Throwbot in a Step-Overcoming Scenario |
|
Ju, Insung | DGIST |
Kim, MinSeop | University of Seoul |
Keum, Jaeyeong | DGIST |
Lim, Seunghyun | DGIST |
Yun, Dongwon | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Product Design, Development and Prototyping, Field Robots, Wheeled Robots
Abstract: In this study, we propose a bistable tail based on a reconfigurable laminate mechanism for a Throwbot that transforms between a ball type and a wheel type. Various robots, such as snake robots, drones, and throwing robots, have been studied for life-saving missions on behalf of humans at disaster sites. In particular, the hybrid-type throwing robot can combine the throwing ease of the ball type with the driving stability of the wheel type. However, it requires the tail to be stored inside when being thrown and to be rigidly deployed when driving. To satisfy these requirements, we developed a foldable tail based on a scissor-lift structure in our previous study. However, that structure was composed solely of rigid parts, which caused interference with other parts when stored and made it difficult to change the maximum deployed tail length. To overcome these limitations, we develop a bistable tail suitable for the hybrid type that can maintain both a bendable state and a rigid state. Before the actual development, we calculate the minimum tail length for overcoming an obstacle through statics analysis. Then, we design a bistable structure utilizing a reconfigurable laminate mechanism. Next, we calculate the design constraints to mount it on the actual robot. Finally, the developed tail is mounted on the actual Throwbot to perform obstacle-overcoming experiments. We confirm that it secures both ease of throwing and a stable obstacle-overcoming ability. Through this, we propose a bistable tail suitable for hybrid-type throwing robots.
|
|
16:55-17:00, Paper TuDT22.4 | |
Crawler Robot with Movable Bending Point for Enhanced Traversability |
|
Uda, Yuki | Kyushu University |
Kanada, Ayato | Kyushu University |
Nakashima, Yasutaka | Kyushu University |
Yamamoto, Motoji | Kyushu University |
Keywords: Search and Rescue Robots, Mechanism Design, Underactuated Robots
Abstract: This paper introduces a crawler robot equipped with a movable bending point mechanism that allows it to bend anywhere along its trunk. The robot features a movable motor unit that travels along its body, dynamically adjusting the bending initiation point. This enables the robot to move forward and backward, bend its body, and shift the bending position as needed. By integrating these capabilities, the robot enhances its ability to navigate obstacles and confined spaces. We develop three statics-based models to predict the robot's performance in escaping narrow spaces, climbing steps, and crossing ditches. These models demonstrate that adjusting the bending point improves traversal capabilities, a finding corroborated by experiments with the physical robot.
|
|
17:00-17:05, Paper TuDT22.5 | |
Efficient Navigation for Quadruped Robots in Post-Disaster Scenarios |
|
Cruz Ulloa, Christyan | Centro De Automática Y Robótica (UPM-CSIC), Universidad Politécn |
Guijarro Tolón, Jorge | Universidad Politécnica De Madrid |
del Cerro, Jaime | Universidad Politécnica De Madrid |
Barrientos, Antonio | UPM |
Keywords: Search and Rescue Robots, Robotics in Hazardous Fields, Legged Robots
Abstract: Search and rescue (SAR) operations require optimal solutions to maximize efficiency and minimize risks for human responders. Quadruped robots have emerged as viable agents due to their locomotion and agility capabilities. This work presents a navigation framework designed to enhance stability and adaptability of legged robots in complex environments. A high-fidelity simulation environment was developed using NVIDIA IsaacSim, incorporating realistic disaster conditions such as unstructured terrain, fire, and smoke. Multiple stochastic routes were generated and analyzed in simulation in terms of energy consumption, stability, traversal time, and fall occurrences, to categorize them and determine the safest and most efficient path before real-world deployment, reducing the likelihood of failures. A fuzzy logic controller was proposed to regulate speed and improve locomotion adaptability. The proposed approach was validated in both simulation and real-world experiments. The results demonstrate the effectiveness of the proposed strategy in enabling safe and efficient navigation for quadruped robots in SAR missions. The code is publicly available at https://github.com/Robcib-GIT/IsaacSim_LeggedRobot. Supplementary Video:https://youtu.be/cxEP5YVy8Qo
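A minimal sketch of a fuzzy speed regulator in the spirit described above (membership shapes, inputs, and the rule table are our assumptions, not the paper's controller): terrain roughness maps through triangular membership functions to a commanded speed scale.

```python
# Sketch: Sugeno-style fuzzy speed scaling from a terrain-roughness score.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_speed_scale(roughness):
    """roughness in [0, 1] from terrain analysis -> speed scaling in (0, 1]."""
    low = tri(roughness, -0.5, 0.0, 0.5)
    med = tri(roughness, 0.0, 0.5, 1.0)
    high = tri(roughness, 0.5, 1.0, 1.5)
    # Weighted average of rule outputs: smooth terrain -> full speed, rough -> slow.
    w = np.array([low, med, high])
    return float(w @ np.array([1.0, 0.6, 0.3]) / (w.sum() + 1e-9))
```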
|
|
17:05-17:10, Paper TuDT22.6 | |
Robot Teleoperation Design Requirements from End Users in Nuclear Facilities |
|
Kenan, Alperen | University of the West of England |
Bremner, Paul | University of the West of England |
Giuliani, Manuel | Kempten University of Applied Sciences |
Keywords: Robotics in Hazardous Fields, Human-Centered Robotics, Telerobotics and Teleoperation
Abstract: Despite the nuclear industry's reliance on advanced robots being operated by humans, much of the existing research overlooks the operator’s perspective in the context of nuclear decommissioning. This study aims to address this gap by identifying the specific needs and requirements of robot operators in nuclear environments. Three focus groups of experienced robot operators from the UK Atomic Energy Authority and Sellafield Ltd. were conducted to explore key themes, including the operator’s role, tasks where robots are employed, and the risks associated with robot use. Findings reveal that: (1) robots in critical tasks are typically controlled by a team of operators; (2) for human-robot interfaces safety and reliability are the most important features, before effectiveness, intuitiveness and task focus; (3) due to high task variety operators see a need for various types of robots; and (4) operator error is regarded as the most significant and unpredictable risk. Based on these insights, a comprehensive set of 10 robot-specific requirements and 10 overall user requirements has been formulated. The paper provides recommendations for robot operators and designers, detailing how these identified requirements can inform the development of future teleoperated robots for nuclear decommissioning tasks.
|
|
17:10-17:15, Paper TuDT22.7 | |
Robotic Grasping for Automated Sorting of Complex, Highly Contaminated Industrial Food Waste: A Benchmark Study |
|
Thilakarathna, Moniesha | University of Canberra |
Wang, Xing | CSIRO |
Asitha, Wijesinghe | University of Peradeniya |
Hinwood, David Ryan | University of Canberra |
Herath, Damith | University of Canberra |
Keywords: Robotics and Automation in Agriculture and Forestry, Performance Evaluation and Benchmarking, Grasping
Abstract: Food waste management plays a vital role in maintaining a sustainable ecosystem; however, the presence of inorganic contaminants within food waste significantly hinders this potential. Robotic automation offers a promising solution to accelerate waste sorting, yet the diverse and unpredictable nature of contaminants poses major challenges to robotic perception and grasping. This benchmark study explores the feasibility and limitations of conventional robotic grasping systems, replicating real-world industrial conditions to highlight the complexities of food waste sorting. A comprehensive automated robotic grasping pipeline is introduced, integrating advanced 6D grasping pose detection, collision-free robotic arm motion planning, and effective grasping with three top-performing robotic end-effectors. Extensive experimental evaluations (up to 1500 robotic grasps) compare the performance of different gripper designs and the corresponding grasping strategies under three high-fidelity environmental scenes, providing valuable insights into the limitations of the current robotic system. Experiment results demonstrate the significant strengths of each gripper when dealing with objects of varying types or in different environments. This is critical for enhancing robotic sorting capabilities, particularly in advancing multimodal gripper technology.
|
|
TuDT23 |
102B |
Sensor Fusion 4 |
Regular Session |
|
16:40-16:45, Paper TuDT23.1 | |
VISC: MmWave Radar Scene Flow Estimation Using Pervasive Visual-Inertial Supervision |
|
Liu, Kezhong | Wuhan University of Technology |
Zhou, Yiwen | Wuhan University of Technology |
Chen, Mozi | Wuhan University of Technology |
He, Jianhua | University of Essex |
Xu, Jingao | Carnege Mellon University |
Yang, Zheng | Tsinghua University |
Lu, Chris Xiaoxuan | University College London |
Zhang, Shengkai | Wuhan University of Technology |
Keywords: Sensor Fusion, Visual-Inertial SLAM, Multi-Modal Perception for HRI
Abstract: This work proposes a mmWave radar scene flow estimation framework supervised by data from a widespread visual-inertial (VI) sensor suite, allowing crowdsourced training data from smart vehicles. Current scene flow estimation methods for mmWave radar are typically supervised by dense point clouds from 3D LiDARs, which are expensive and not widely available in smart vehicles. While VI data are more accessible, visual images alone cannot capture the 3D motions of moving objects, making it difficult to supervise their scene flow. Moreover, the temporal drift of the VI rigid transformation also degrades the scene flow estimation of static points. To address these challenges, we propose a drift-free rigid transformation estimator that fuses kinematic model-based ego-motions with neural network-learned results. It provides strong supervision signals to radar-based rigid transformation and infers the scene flow of static points. Then, we develop an optical-mmWave supervision extraction module that extracts the supervision signals of radar rigid transformation and scene flow. It strengthens the supervision by learning the scene flow of dynamic points with the joint constraints of optical and mmWave radar measurements. Extensive experiments demonstrate that, in smoke-filled environments, our method even outperforms state-of-the-art (SOTA) approaches using costly LiDARs.
|
|
16:45-16:50, Paper TuDT23.2 | |
High-Dynamic Tactile Sensing for Tactile Servo Manipulation: Let Robots Swing a Hammer |
|
Xu, Yingtian | Chinese University of HongKong, Shenzhen |
Sun, Zhenglong | Chinese University of Hong Kong, Shenzhen |
Wang, Ziya | Shenzhen University |
Keywords: Sensor-based Control, Force and Tactile Sensing, Machine Learning for Robot Control
Abstract: High-dynamic tactile sensing and tactile servo control present challenges in robustness and real-time performance. This paper proposes a closed-loop tactile servo control strategy for robotic nail hammering, by allowing controlled hammer sliding within a rigid robotic two-finger gripper. The proposed approach detects tactile information of continuous sliding and sliding-induced vibrations in real time and modulates gripping force. The control encourages rotational sliding to enhance impact and reduce recoil while restricting parallel slippage to maintain grip stability. To achieve real-time processing and effective sliding feature extraction, we employ Short-Time Fourier Transform (STFT) and a dual-stream Physics-Informed Machine Learning (PIML) model, processing tactile data at 1 kHz with an average latency of 1.04 ms. Experimental results show that, compared to conventional methods, controlling hammer slippage reduces arm joint recoil by 64.26% (223.30 N → 79.81 N) while increasing hammer impact force by 179.97% (28.06 N → 78.56 N). The method adapts to hammers with varying mass distributions, significantly improving impact resilience and manipulation performance in high-dynamic interactions. These advancements pave the way for more dexterous and robust robotic systems with embodied intelligence.
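The STFT front end for extracting sliding and vibration features from a 1 kHz tactile stream admits a compact sketch (window length is an illustrative assumption):

```python
# Sketch: magnitude spectrogram of a tactile channel for sliding-feature extraction.
import numpy as np
from scipy.signal import stft

def tactile_spectrogram(signal, fs=1000, window_ms=32):
    nperseg = int(fs * window_ms / 1000)
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    return f, t, np.abs(Z)   # fed to the downstream learning model
```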
|
|
16:50-16:55, Paper TuDT23.3 | |
Building Hybrid Omnidirectional Visual-Lidar Map for Visual-Only Localization |
|
Huang, Jingyang | Zhejiang University |
Wei, Hao | Zhejiang University |
Li, Changze | Shanghai Jiao Tong University |
Qin, Tong | Shanghai Jiao Tong University |
Gao, Fei | Zhejiang University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Sensor Fusion
Abstract: Recently, there has been growing interest in using low-cost sensor combinations, such as cameras and IMUs, to achieve accurate localization within pre-built point cloud maps. In this paper, we propose a novel hybrid visual-LiDAR mapping and visual-only re-localization framework, specifically designed for UAVs with limited computational resources operating in challenging environments. Keyframes function as a bridge in our system, associating images with the point cloud to facilitate efficient and accurate pose estimation. Besides, our system creates omnidirectional keyframes at the mapping stage, enabling effective re-localization from any orientation, which enhances the robustness and practicability of our system. Experiments show that the proposed algorithm achieves high localization accuracy on pre-built maps and is capable of running in real-time on UAVs for autonomous navigation tasks. The source code will be made publicly available soon at https://github.com/jingyang-huang/global_loop.
|
|
16:55-17:00, Paper TuDT23.4 | |
Spatiotemporal Motion Prediction of Intraocular Microsurgical Robot in Non-Visible Regions |
|
Deng, Yawen | Beijing Institute of Technology |
Li, Zhen | Institute of Automation, Chinese Academy of Sciences |
Ye, Qiang | Institute of Automation, Chinese Academy of Sciences |
Zhai, Yu-Peng | School of Automation, Beijing Information Science and Technology |
Yu, Weihong | Peking Union Medical College Hospital |
Yu, Zhangguo | Beijing Institute of Technology |
Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
Keywords: Sensor Fusion, Robot Safety, Medical Robots and Systems
Abstract: In intraocular microsurgery with minute operational scales, instruments pass through non-visible regions of the anterior segment, where robot-assisted surgery, which heavily relies on visual perception, fails to determine the instrument's attitude relative to the eyeball. This compromises surgical flexibility, increases risks, and hinders autonomous surgery development. Therefore, a framework for predicting instrument trajectories in non-visible regions during robot-assisted microsurgery has been proposed to mitigate the risks of retinal and lens injuries caused by blind operations and enhance surgical procedures' intelligence and autonomy. First, a lightweight reconstruction of the anterior segment environment is performed under controlled knowledge guidance to construct a global map. Second, the tip position of the surgical instrument is detected through multi-sensor fusion, enabling the perception of instrument-environment interactions under visual constraints. Based on this, a long short-term spatiotemporal aggregation algorithm for instrument trajectory prediction is proposed, which enhances surgical safety by providing high-precision predictions of the instrument tip's motion trajectory. Experiments show that the framework achieved a 0.0435 mm average prediction error in non-visible regions, corresponding to 0.03% of the region in a single dimension and 7.25% of the surgical instrument's diameter. This significantly enhances the precision of robot-assisted surgery under visual constraints and provides robust technical support for safe, intelligent, and autonomous intraocular robotic surgery.
|
|
17:00-17:05, Paper TuDT23.5 | |
Positioning with Respect to a Cylinder Using Proximity-Based Control |
|
Thomas, John | Institut Pascal |
Chaumette, Francois | Inria Center at University of Rennes |
Keywords: Sensor-based Control, Robust/Adaptive Control
Abstract: In this paper, we present how a proximity array attached as an end-effector can perform a positioning task with respect to a cylinder. We develop the complete modeling of the system from which a classical control law is designed. Positioning with respect to the outside of a cylinder can be used for non-contact inspection of the exterior of pipes in chemical plants or in oil/gas industries, while positioning with respect to the inside of a hollow cylinder can be used for guidance inside a pipe for non-contact inspection. We provide simulation and experimental results to validate the presented theory.
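The classical sensor-based control law that such positioning tasks build on can be sketched generically: with feature error e = s - s* and interaction matrix L relating proximity-reading rates to end-effector velocity (de/dt = L v), commanding v = -lambda L^+ e drives the error to decay exponentially. A minimal sketch (variable names are ours):

```python
# Sketch: classical sensor-based control law with a pseudo-inverse of the
# interaction matrix.
import numpy as np

def proximity_control(s, s_star, L, lam=0.5):
    e = s - s_star                       # proximity-reading error vector
    return -lam * np.linalg.pinv(L) @ e  # 6-DoF end-effector velocity command
```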
|
|
17:05-17:10, Paper TuDT23.6 | |
LPVIMO-SAM: Tightly-Coupled LiDAR/Polarization Vision/Inertial/Magnetometer/Optical Flow Odometry Via Smoothing and Mapping |
|
Shan, Derui | North China University of Technology |
Guo, Peng | North China University of Technology |
Li, Wenshuo | Beihang University |
Du, Tao | North China University of Technology, Beijing |
Keywords: Sensor Fusion, SLAM, Visual-Inertial SLAM
Abstract: We propose a tightly-coupled LiDAR/Polarization Vision/Inertial/Magnetometer/Optical Flow Odometry via Smoothing and Mapping (LPVIMO-SAM) framework, which integrates LiDAR, polarization vision, an inertial measurement unit, a magnetometer, and optical flow in a tightly-coupled fusion. It enables high-precision and robust real-time state estimation and map construction in challenging environments, such as LiDAR-degraded, low-texture, and feature-scarce areas. LPVIMO-SAM comprises two subsystems: a polarized vision-inertial system and a LiDAR/Inertial/Magnetometer/Optical Flow system. The polarized vision enhances the robustness of the visual/inertial odometry in low-feature and low-texture scenarios by extracting the polarization information of the scene. The magnetometer acquires the heading angle, and the optical flow obtains the speed and height to reduce the accumulated error. A magnetometer heading prior factor, an optical flow speed observation factor, and a height observation factor are designed to eliminate the cumulative errors of the LiDAR/inertial odometry through factor graph optimization. Meanwhile, LPVIMO-SAM can maintain stable positioning even when one of the two subsystems fails, further expanding its applicability in LiDAR-degraded, low-texture, and low-feature environments. Our comparative experiments against existing algorithms reveal that LPVIMO-SAM achieves a 43.4% reduction in localization root mean square error relative to LVI-SAM, demonstrating significant precision improvements.
|
|
17:10-17:15, Paper TuDT23.7 | |
A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping |
|
He, Jialei | Nanjing University |
Zhan, Zhihao | TopXGun Robotics |
Tu, ZhiTuo | Nanjing University |
Zhu, Xiang | TopXGun Robotics |
Yuan, Jie | Nanjing University |
Keywords: Sensor Fusion, Robotics and Automation in Agriculture and Forestry, Aerial Systems: Applications
Abstract: Rapid generation of large-scale orthoimages from Unmanned Aerial Vehicles (UAVs) has been a long-standing focus of research in the field of aerial mapping. A multi-sensor UAV system, integrating the Global Positioning System (GPS), Inertial Measurement Unit (IMU), 4D millimeter-wave radar and camera, can provide an effective solution to this problem. In this paper, we utilize multi-sensor data to overcome the limitations of conventional orthoimage generation methods in terms of temporal performance, system robustness, and geographic reference accuracy. A prior-pose-optimized feature matching method is introduced to enhance matching speed and accuracy, reducing the number of required features and providing precise references for the Structure from Motion (SfM) process. The proposed method exhibits robustness in low-texture scenes like farmlands, where feature matching is difficult. Experiments show that our approach achieves accurate feature matching and orthoimage generation in a short time. The proposed drone system effectively aids in farmland management.
|
|
TuDT24 |
|
Robust/Adaptive Control 2 |
Regular Session |
|
16:40-16:45, Paper TuDT24.1 | |
Multi-Query Robotic Manipulator Task Sequencing with Gromov-Hausdorff Approximations |
|
Sukkar, Fouad | University of Technology Sydney |
Wakulicz, Jennifer | University of Technology Sydney, Robotics Institute |
Lee, Ki Myung Brian | University of California San Diego |
Zhi, Weiming | Carnegie Mellon University |
Fitch, Robert | University of Technology Sydney |
Keywords: Planning, Scheduling and Coordination, Motion and Path Planning, Industrial Robots, Task Sequencing
Abstract: Robotic manipulator applications often require efficient online motion planning. When completing multiple tasks, sequence order and choice of goal configuration can have a drastic impact on planning performance. This is well known as the robot task sequencing problem (RTSP). Existing RTSP algorithms are susceptible to producing poor-quality solutions or failing entirely when available computation time is restricted. We propose a new multi-query task sequencing method designed to operate in semi-structured environments with a combination of static and non-static obstacles. Our method trades off workspace generality for planning efficiency. Given a user-defined task space with static obstacles, we compute a subspace decomposition. The key idea is to establish approximate isometries known as Gromov-Hausdorff approximations that identify points that are close to one another in both task and configuration space. We prove bounded suboptimality guarantees on the lengths of paths within these subspaces. These bounding relations imply that paths within the same subspace can be smoothly concatenated, which we show is useful for determining efficient task sequences.
|
|
16:45-16:50, Paper TuDT24.2 | |
Adaptive Tracking and Anti-Swing Control of Quadrotors Carrying Suspended Payload under State-Dependent Uncertainty (I) |
|
Dantu, Swati | Czech Technical University |
Yadav, Rishabh Dev | The University of Manchester |
Rachakonda, Ananth | International Institute of Information Technology Hyderabad |
Roy, Spandan | International Institute of Information Technology, Hyderabad (II |
Baldi, Simone | Southeast University |
Keywords: Robust/Adaptive Control, Aerial Systems: Mechanics and Control, Motion Control
Abstract: Transportation of a suspended payload using quadrotor demands a controller to simultaneously track the desired path and stabilize the planar payload swing angles. Such a control objective is made challenging by the need to orchestrate the coupling between fully actuated dynamics (quadrotor attitude) and underactuated dynamics (quadrotor position and planar payload swings). State-of-the-art controllers cannot orchestrate such coupled dynamics in the presence of unmodeled and state-dependent terms, such as aerodynamic drags and rotor downwash. This article solves this control challenge by adaptively estimating the state-dependent uncertainty. Closed-loop stability for the coupled underactuated and fully actuated dynamics is established analytically. Extensive real-time experiments confirm significant improvements over the state-of-the-art under various scenarios, such as path tracking with suspended payload and anti-swing control against externally induced payload swings.
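A hedged sketch of the adaptive idea the abstract describes (the polynomial gain structure and rates below are our assumptions): bound the state-dependent uncertainty as ||d|| <= k0 + k1 ||x|| + k2 ||x||^2 and adapt the coefficients online from the tracking error.

```python
# Sketch: online adaptation of a state-dependent uncertainty bound and the
# resulting robust compensation term.
import numpy as np

def adaptive_gain_update(k, x, s, alphas, dt):
    """k = [k0, k1, k2]; s = sliding/tracking error; alphas = adaptation rates."""
    phi = np.array([1.0, np.linalg.norm(x), np.linalg.norm(x) ** 2])
    k_dot = alphas * phi * np.linalg.norm(s)     # grow gains while error persists
    return k + k_dot * dt

def robust_term(k, x, s, eps=1e-3):
    rho = k @ np.array([1.0, np.linalg.norm(x), np.linalg.norm(x) ** 2])
    return -rho * s / (np.linalg.norm(s) + eps)  # smoothed sign(s) compensation
```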
|
|
16:50-16:55, Paper TuDT24.3 | |
Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces (I) |
|
Yadav, Rishabh Dev | The University of Manchester |
Dantu, Swati | Czech Technical University |
Pan, Wei | The University of Manchester |
Sun, Sihao | Delft University of Technology |
Roy, Spandan | International Institute of Information Technology, Hyderabad (II |
Baldi, Simone | Southeast University |
Keywords: Robust/Adaptive Control, Aerial Systems: Mechanics and Control, Mobile Manipulation
Abstract: Successful aerial manipulation largely depends on how effectively a controller can tackle the coupling dynamic forces between the aerial vehicle and the manipulator. However, this control problem has remained largely unsolved as the existing control approaches either require precise knowledge of the aerial vehicle/manipulator inertial couplings, or neglect the state-dependent uncertainties especially arising during the interaction phase. This work proposes an adaptive control solution to overcome this long-standing control challenge without any a priori knowledge of the coupling dynamic terms. In addition, in contrast to the existing adaptive control solutions, the proposed control framework is modular, that is, it allows independent tuning of the adaptive gains for the vehicle position subdynamics, the vehicle attitude subdynamics, and the manipulator subdynamics. Stability of the closed loop under the proposed scheme is derived analytically, and real-time experiments validate the effectiveness of the proposed scheme over the state-of-the-art approaches.
|
|
16:55-17:00, Paper TuDT24.4 | |
Anti-Slip AI-Driven Model-Free Control with Global Exponential Stability in Skid-Steering Robots |
|
Shahna, Mehdi Heydari | Tampere University |
Mustalahti, Pauli | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Robust/Adaptive Control, Motion Control, Industrial Robots
Abstract: Undesired lateral and longitudinal wheel slippage can disrupt a mobile robot's heading angle, traction, and, eventually, desired motion. This issue makes the robotization and accurate modeling of heavy-duty machinery very challenging because the application primarily involves off-road terrains, which are susceptible to uneven motion and severe slippage. As a step toward the robotization of skid-steering heavy-duty robots (SSHDRs), this paper aims to design an innovative robust model-free control system developed by neural networks to strongly stabilize the robot dynamics in the presence of a broad range of potential wheel slippages. Before the control design, the dynamics of the SSHDR are first investigated by mathematically incorporating slippage effects, assuming that all functional modeling terms of the system are unknown to the control system. Then, a novel tracking control framework to guarantee global exponential stability of the SSHDR is designed as follows: 1) the unknown modeling of wheel dynamics is approximated using radial basis function neural networks (RBFNNs); and 2) a new adaptive law is proposed to compensate for slippage effects and tune the weights of the RBFNNs online during execution. Simulation and experimental results verify the proposed tracking control performance of a 4,836 kg SSHDR operating on slippery terrain.
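A hedged sketch of the RBFNN approximation with an online weight-adaptation law, as the abstract outlines (basis placement, widths, and the adaptation rate are our assumptions):

```python
# Sketch: RBF network approximating unknown wheel dynamics, with a gradient-like
# weight update driven by the tracking error.
import numpy as np

class RBFNNCompensator:
    def __init__(self, centers, sigma, n_out, gamma=5.0):
        self.c, self.sigma, self.gamma = centers, sigma, gamma  # centers: (n_rbf, n_in)
        self.W = np.zeros((centers.shape[0], n_out))            # adapted online

    def phi(self, x):
        d2 = np.sum((self.c - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))            # Gaussian basis outputs

    def output(self, x):
        return self.phi(x) @ self.W        # approximation of the unknown dynamics

    def adapt(self, x, s, dt):
        """Update weights from tracking error s (one entry per output channel)."""
        self.W += self.gamma * np.outer(self.phi(x), s) * dt
```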
|
|
17:00-17:05, Paper TuDT24.5 | |
Model-Driven Development of Distributed Controllers Using Petri Nets and Low-Code Strategy |
|
Gomes, Luis | Universidade Nova De Lisboa |
Costa, Aniko | NOVA University Lisbon |
Moutinho, Filipe | NOVA University Lisbon |
Pereira, Fernando | ISEL/IPL |
Barros, Joao Paulo | Polytechnic Institute of Beja, UNINOVA-CTS, Center of Technology |
Keywords: Petri Nets for Automation Control, Embedded Systems for Robotic and Automation, Hardware-Software Integration in Robotics
Abstract: This paper presents a model-driven distributed controller development approach, supported by Petri nets modeling that relies on low-code strategies. The proposed approach addresses distributed controller systems having automation and/or cyber-physical systems as targeted application areas. The distributed controller system can be viewed as a globally asynchronous locally synchronous (GALS) system and its development is fully supported by a web-based tool framework offering comprehensive support for integrating hardware-software co-design techniques when mapping components to specific execution platforms. All development phases are supported, starting with the specification by editing the Petri net model and ending up with the deployment to specific implementation platforms, including FPGAs and microcontroller-based ones, adopting a low-code strategy. The framework also supports simulation and behavioral property verification. An example of an automation system application is presented to illustrate the adequacy of the approach.
|
|
17:05-17:10, Paper TuDT24.6 | |
Novel Data-Driven Repetitive Motion Control Scheme for Redundant Manipulators with Zeroing Neurodynamics |
|
Yang, Min | Hunan University |
Chen, Kaixu | Hunan University |
Zhang, Hui | Hunan University |
Keywords: Robust/Adaptive Control, Redundant Robots, Motion Control
Abstract: Repetitive motion control of redundant manipulators typically requires precise kinematic models to construct Jacobian matrices. However, model-based approaches are inherently limited when manipulator parameters are unavailable or only partially known. This paper introduces a novel data-driven discrete zeroing neurodynamics (DDZN) model for repetitive motion control. Specifically, a Jacobian matrix estimation method based on data-driven technology is proposed, which eliminates the need for prior models by leveraging historical input-output information. By integrating the Jacobian matrix estimation with a discrete zeroing neurodynamics (DZN) model, the approach enables simultaneous trajectory tracking and repeatable configuration recovery without relying on structural parameters. Theoretical analysis verifies the performance of the DDZN model in noisy environments. Furthermore, extensive experimental results validate its reliability and superior performance compared with various existing models.
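Data-driven Jacobian estimation from input-output samples is commonly done with a Broyden-style rank-one update, sketched below as one standard instance of the idea (the paper's exact estimator may differ): given a joint increment delta_theta and the observed task-space increment delta_x, the running estimate is corrected so that it reproduces the latest observation.

```python
# Sketch: Broyden rank-one update of a Jacobian estimate from motion data only.
import numpy as np

def broyden_update(J, dtheta, dx, lam=1.0):
    """J: current Jacobian estimate; dtheta: joint increment; dx: task-space increment."""
    denom = dtheta @ dtheta
    if denom < 1e-12:          # ignore negligibly small motions
        return J
    return J + lam * np.outer(dx - J @ dtheta, dtheta) / denom
```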
|
|
17:10-17:15, Paper TuDT24.7 | |
Sequen-Sync Contact Force/Torque Control Using Nested Fast Terminal Sliding Mode Control Approach |
|
Xu, Yilan | National University of Singapore |
Liang, Wenyu | Institute for Infocomm Research, A*STAR |
Xue, Junyuan | National University of Singapore |
Wu, Yan | A*STAR Institute for Infocomm Research |
Lee, Tong Heng | National University of Singapore |
Keywords: Robust/Adaptive Control, Force Control
Abstract: As one of the most fundamental control modes in robotics, force/torque (F/T) control plays an essential role in a wide range of applications. However, classical F/T control fails to offer effective means to regulate the convergence sequence of the controlled states, which is beneficial in many real-world tasks, e.g., unknown surface contact, where the force should preferably converge later than the alignment angles to ensure sufficient contact and avoid dangerous misalignment. In this work, a novel nested fast terminal sliding mode control approach is proposed. This approach establishes a hierarchical structure for the controlled states, such that the stability of the controlled states can be achieved in both a sequential and a time-synchronized manner within finite time, which is defined as `Sequen-Sync'. Extensive experiments are conducted for various tasks in different environments. The experimental results show that the proposed approach successfully achieves Sequen-Sync stability, which leads to improved contact quality and enhanced safety.
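For reference, one common fast terminal sliding surface from the sliding-mode literature (shown here as background, not as the paper's nested design) is

    s = \dot{e} + \alpha e + \beta |e|^{q/p} \operatorname{sign}(e),
    \qquad \alpha, \beta > 0, \quad 0 < q/p < 1,

which drives the tracking error e to zero in finite time; a nested design stacks such surfaces hierarchically so that outer states can be made to converge only after (or synchronously with) inner ones.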
|
|
17:15-17:20, Paper TuDT24.8 | |
Output Consensus of Multi-Agent Systems with Switching Networks and Incomplete Leader Measurement (I) |
|
Sun, Jian | Dalian Minzu University |
Wang, Fuhao | Dalian Minzu University |
Zhang, Jianxin | Dalian Minzu University |
Liu, Lei | Liaoning University of Technology |
Shan, Qihe | Dalian Maritime University |
Keywords: Robust/Adaptive Control
Abstract: This paper investigates an output consensus control problem for heterogeneous multi-agent systems with switching disconnected networks. Compared to similar works, each follower can measure only partial information of the leader's output, which lightens the measurement burden of simple agents when the dimension of the leader's output is large. In this case, due to the coexistence of incomplete measurements of the leader's output and disconnected networks, the outputs of some agents can deviate from the leader even under observer-based control. To overcome this difficulty, we utilize the theory of switching unstable systems and propose a novel segmented time unit method, with which the switching intervals are segmented into time units. Then, by analyzing the cooperative control rule within the time units, the stabilizing characteristics of switching behaviors can be obtained to offset the divergence during the switching intervals. On this basis, a novel segmented time-varying Lyapunov function is developed to analyze the error states, and sufficient criteria for the output consensus are derived. Finally, a numerical simulation is shown to verify the theoretical results.
|
|
TuDT25 |
103A |
Soft Robot Materials and Applications |
Regular Session |
|
16:40-16:45, Paper TuDT25.1 | |
Control the Soft Robot Arm with Its Physical Twin |
|
Guan, Qinghua | EPFL |
Cheng, Hung Hon | EPFL |
Dai, Benhui | EPFL |
Hughes, Josie | EPFL |
Keywords: Soft Robot Materials and Design, Physical Human-Robot Interaction, Soft Robot Applications
Abstract: To exploit the compliant capabilities of soft robot arms, we require controllers that can exploit their physical capabilities. Teleoperation, leveraging a human in the loop, is a key step towards achieving more complex control strategies. Whilst teleoperation is widely used for rigid robots, soft robots require teleoperation methods in which the configuration of the whole body is considered. We propose a method of using an identical 'physical twin', or demonstrator, of the robot. This tendon-driven robot can be back-driven, with the tendon lengths providing configuration perception and enabling a direct mapping of tendon lengths to the executing robot. We demonstrate how this teleoperation across the entire configuration of the robot enables complex interactions that exploit the environment, such as squeezing into gaps. We also show how this method can generalize to robots of a larger scale than the physical twin, and how the tunability of the stiffness properties of the physical twin simplifies its use.
|
|
16:45-16:50, Paper TuDT25.2 | |
Development of Wearable Assistive Robots Using Artificial Muscle for Older Adults |
|
Zhao, Yafei | The University of Hong Kong |
Ma, Xin | The University of Hong Kong |
Zhang, Qingqing | The University of Hong Kong |
Zhou, Changqiu | The University of Hong Kong |
Ling, Zi-qin | Shenzhen University |
Yuan, Wenbo | The University of Hong Kong |
Zou, Kehan | The University of Hong Kong |
Hong, Jie | The University of Hong Kong |
Chen, Jiangcheng | The University of Hong Kong |
Lou, Vivian Weiqun | The University of Hong Kong |
Xi, Ning | The University of Hong Kong |
Keywords: Wearable Robotics, Body Balancing, Soft Robot Applications
Abstract: Age-related sarcopenia—a progressive decline in muscle mass and strength—compromises postural stability in older adults, underscoring the urgent need for effective assistive robotic solutions. However, existing assistive technologies often suffer from mechanical, kinematic, and control incompatibilities with the human body. In response, we present a wearable assistive robotic system that integrates muscle-like actuation and control algorithms to directly compensate for impaired muscle force generation at the physiological level. To address mechanical incompatibility, the system employs artificial muscle actuators that replicate the contraction dynamics of human muscles. By mimicking the natural direction of muscle force generation and the anatomical anchor points of muscle attachment, it also resolves kinematic incompatibility. Furthermore, the system uses real-time electromyography (EMG) signals for neuromuscular sensing and implements a muscle-like recruitment and rate coding algorithm to control twisted fiber soft actuators, thereby addressing control incompatibility through human-in-the-loop signal integration. A comprehensive evaluation using the Functional Reach Test (FRT) with ten older adults demonstrated a 16.6 average increase in total ankle joint torque and a statistically significant improvement in forward reach distance (from 66.4 ± 17.4 cm in the OFF condition to 77.2 ± 16.6 cm in the ON condition). These findings highlight the system’s potential to mitigate age-related muscle decline and establish a novel muscle-like actuation–motion–control paradigm for wearable assistive robotics.
|
|
16:50-16:55, Paper TuDT25.3 | |
Active Prostate Phantom with Multiple Chambers |
|
Tian, Sizhe | Inria, Université De Lille |
Adagolodjo, Yinoussa | INRIA France |
Dequidt, Jeremie | University of Lille 1 |
Keywords: Soft Robot Applications, Medical Robots and Systems, Simulation and Animation
Abstract: Prostate cancer is a major global health concern, requiring advancements in robotic surgery and diagnostics to improve patient outcomes. A phantom is a specially designed object that simulates human tissues or organs. It can be used for calibrating and testing a medical process, as well as for training and research purposes. Existing prostate phantoms fail to simulate dynamic scenarios. This paper presents a pneumatically actuated prostate phantom with multiple independently controlled chambers, allowing for precise volumetric adjustments to replicate asymmetric and symmetric benign prostatic hyperplasia (BPH). The phantom is designed based on shape analysis of magnetic resonance imaging (MRI) datasets, modeled with the finite element method (FEM), and validated through 3D reconstruction. The simulation results showed strong agreement with physical measurements, achieving average errors of 3.47% in forward modeling and 1.41% in inverse modeling. These results demonstrate the phantom's potential as a platform for validating robotic-assisted systems and for further development toward realistic simulation-based medical training.
|
|
16:55-17:00, Paper TuDT25.4 | |
Data-Driven Fault Detection for Wafer Scanner Cable Slabs Using Koopman Operators |
|
Pumphrey, Michael Joseph | University of Guelph |
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Al-Rawashdeh, Yazan | Al-Zaytoonah University of Jordan |
Alatawneh, Natheer | University of Guelph |
Aljanaideh, Khaled | Jordan University of Science and Technology |
Boker, Almuatazbellah | Virginia Tech |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Soft Robot Applications
Abstract: The reliability of precision motion systems, such as semiconductor wafer scanners, is often influenced by nonlinear dynamics originating from components such as cable slabs. This paper introduces a data-driven framework for early fault diagnosis in these systems. Koopman operator theory is employed to derive a linear state-space model from experimental data, capturing the complex, hysteretic behavior of the cable slab. This model serves as a digital twin, and by comparing its predictions with real-time sensor measurements, operational anomalies can be detected. A systematic process for selecting observable functions yields a high-fidelity model with a tracking error of approximately ±1% across the operational range. When the proposed approach is tested against a state-of-the-art neural network model, it demonstrates a 75.4% reduction in reaction force prediction error. The framework successfully identifies an injected sensor noise fault (SNR of 20) in just 0.35 s using only force data, validating its potential to improve wafer scanner reliability.
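A minimal sketch of the core idea, fitting a linear Koopman operator by extended dynamic mode decomposition (EDMD) on lifted snapshot data; the observable set below is an arbitrary assumption, whereas the paper selects its observables systematically.

    import numpy as np

    def edmd_fit(X, Y, lift):
        # X, Y: snapshot arrays (n_samples, n_state), with Y[k] the state
        # one step after X[k]. Returns K with lift(Y) ~= lift(X) @ K.
        PsiX = np.array([lift(x) for x in X])
        PsiY = np.array([lift(y) for y in Y])
        K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
        return K

    # Hypothetical polynomial observables for a 2-state system (an assumption):
    lift = lambda x: np.array([1.0, x[0], x[1], x[0]**2, x[0]*x[1], x[1]**2])

Fault detection then reduces to monitoring the residual between the linear model's one-step prediction and the measured output, flagging an anomaly when it exceeds a calibrated threshold.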
|
|
17:00-17:05, Paper TuDT25.5 | |
A Self-Sensing Phase-Change Buoyancy System for Miniaturized Deep-Sea Robotics |
|
Zuo, Zonghao | Beihang University |
He, Xia | Beihang University |
Wang, Haoxuan | Beihang University |
Zhang, Qiyi | Beihang University |
Shao, Zhuyin | Beihang University |
Wen, Li | Beihang University |
Keywords: Soft Sensors and Actuators, Marine Robotics, Soft Robot Materials and Design
Abstract: Buoyancy systems play a vital role in the efficient movement and control of deep-sea robots. Traditional buoyancy systems for these robots rely on high-pressure hydraulic pumps and heavy, pressure-resistant shells, leading to notable increases in size, mass, and cost. Consequently, there is an urgent need for a lightweight, high-pressure-resistant, self-sensing buoyancy adjustment device that can accommodate the multi-modal movement requirements of miniaturized deep-sea robots. Drawing inspiration from the buoyancy regulation mechanism of sperm whales, we have developed a self-sensing phase-change buoyancy regulation system tailored for miniaturized robots, designed to operate in extreme deep-sea environments. Each module weighs only 35 g while providing 4 g of buoyancy adjustment. Experimental results demonstrate that the system maintains stable operation under hydrostatic pressures of 30 MPa. When integrated into a miniaturized deep-sea robot, this module successfully enables controlled buoyancy for multi-modal movement in aquatic environments.
|
|
17:05-17:10, Paper TuDT25.6 | |
Soft Actuators with Integrated Electrohydrodynamic Pump and Intrinsic Electroadhesion |
|
Sato, Yuki | The University of Electro-Communications |
Shibahara, Yuya | The University of Electro-Communications |
Shintake, Jun | The University of Electro-Communications |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Soft Robot Materials and Design
Abstract: Electrohydrodynamic (EHD) pumps made from flexible or stretchable materials are a promising pumping element for fluidically driven soft robots. In most soft robotic systems, EHD pumps are used separately or connected in series with their target components, such as actuators, which can limit design flexibility and complicate implementation. To address this issue, this paper presents an EHD soft actuator that integrates a pump, actuator, and reservoir into a single device. In this design, the EHD pump is implemented as a flexible PCB, which also serves as a strain-limiting layer, enhancing bending actuation. Additionally, the interdigitated electrodes on the flexible PCB generate fringe electric fields, introducing electroadhesion as an unprecedented functionality for EHD-driven soft actuators. Experimental results from the fabricated actuators demonstrate voltage-controllable actuation, achieving a maximum bending angle of 56.0° and a force of 31.0 mN. The actuators are then incorporated into a soft gripper, where electroadhesion enhances the holding force, with a 1.3× increase for a dielectric object and a 2.9× increase for a conductive object. These enhancements are observed in comparison to a control experiment in which gripping is performed using non-electric, fluidic actuation alone. The results validate the successful implementation of the highly integrated, multifunctional EHD soft actuator, highlighting its potential for soft robotic applications.
|
|
17:10-17:15, Paper TuDT25.7 | |
Design and Development of a Deformable Spherical Robot for Amphibious Applications |
|
Xu, Le | Harbin Institute of Technology |
Ren, Ruoyu | Harbin Institute of Technology |
Wei, Xiaojie | Harbin Institute of Technology |
Lee, Hao | Harbin Institute of Technology |
Zhang, HanK | Harbin Institute of Technology |
Yu, Kaicheng | Harbin Institute of Technology |
Li, Ye | Harbin Institute of Technology |
Liu, Chenyu | Harbin Institute of Technology |
Gao, Siyu | Harbin Institute of Technology |
Lu, Lihua | Harbin Institute of Technology |
Keywords: Soft Robot Applications, Marine Robotics, Dynamics
Abstract: This paper presents a deformable spherical robot based on a six-strut topological structure, capable of achieving multimodal locomotion in complex amphibious environments. The robot realizes isotropic rolling and asymmetric jumping through its innovative geometric configuration, while integrating an airbag-driven module for underwater buoyancy control. Based on collision dynamics analysis, a prototype of the deformable spherical robot is developed. Experiments conducted on land, in transition zones, and underwater validated the feasibility of the robot's multimodal locomotion in multi-medium environments.
|
|
TuDT26 |
103B |
Model Learning for Control |
Regular Session |
|
16:40-16:45, Paper TuDT26.1 | |
Context-Aware Deep Lagrangian Networks for Model Predictive Control |
|
Schulze, Lucas | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Arenz, Oleg | TU Darmstadt |
Keywords: Model Learning for Control, Optimization and Optimal Control, Machine Learning for Robot Control
Abstract: Controlling a robot based on physics-consistent dynamic models, such as Deep Lagrangian Networks (DeLaN), can improve the generalizability and interpretability of the resulting behavior. However, in complex environments, the number of objects to potentially interact with is vast, and their physical properties are often uncertain. This complexity makes it infeasible to employ a single global model. Therefore, we need to resort to online system identification of context-aware models that capture only the currently relevant aspects of the environment. While physical principles such as the conservation of energy may not hold across varying contexts, ensuring physical plausibility for any individual context-aware model can still be highly desirable, particularly when using it for receding horizon control methods such as model predictive control (MPC). Hence, in this work, we extend DeLaN to make it context-aware, combine it with a recurrent network for online system identification, and integrate it with an MPC for adaptive, physics-consistent control. We also combine DeLaN with a residual dynamics model to leverage the fact that a nominal model of the robot is typically available. We evaluate our method on a 7-DOF robot arm for trajectory tracking under varying loads. Our method reduces the end-effector tracking error by 39%, compared to a 21% improvement achieved by a baseline that uses an extended Kalman filter.
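For background, DeLaN's physical consistency comes from parameterizing the dynamics in Lagrangian form; schematically (standard DeLaN structure, with the context variable c added here only to suggest the paper's extension):

    M(q; c)\,\ddot{q} + C(q, \dot{q}; c)\,\dot{q} + g(q; c) = \tau,
    \qquad M(q; c) = L(q; c)\,L(q; c)^{\top},

where L is a network-predicted lower-triangular matrix with positive diagonal, so the mass matrix M is positive definite for every context.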
|
|
16:45-16:50, Paper TuDT26.2 | |
ICODE: Modeling Dynamical Systems with Extrinsic Input Information (I) |
|
Li, Zhaoyi | Southeast University |
Mei, Wenjie | Southeast University |
Yu, Ke | Southeast University |
Bai, Yang | Hiroshima University |
Li, Shihua | Southeast University |
Keywords: Model Learning for Control, Dynamics, AI-Based Methods
Abstract: Learning models of dynamical systems with external inputs, which may be, for example, nonsmooth or piecewise, is crucial for studying complex phenomena and predicting future state evolution, which is essential for applications such as safety guarantees and decision-making. In this work, we introduce Input Concomitant Neural ODEs (ICODEs), which incorporate precise real-time input information into the learning process of the models, rather than treating the inputs as hidden parameters to be learned. The sufficient conditions to ensure the model's contraction property are provided to guarantee that system trajectories of the trained model converge to a fixed point, regardless of initial conditions across different training processes. We validate our method through experiments on several representative real dynamics: Single-link robot, DC-to-DC converter, motion dynamics of a rigid body, Rabinovich-Fabrikant equation, Glycolytic-glycogenolytic pathway model, and heat conduction equation. The experimental results demonstrate that our proposed ICODEs efficiently learn the ground truth systems, achieving superior prediction performance under both typical and atypical inputs. This work offers a valuable class of neural ODE models for understanding physical systems with explicit external input information, with potentially promising applications in fields such as physics and robotics. Our code is available online at https://github.com/EEE-ai59/ICODE.git.
|
|
16:50-16:55, Paper TuDT26.3 | |
Cutting Sequence Diffuser: Sim-To-Real Transferable Planning for Object Shaping by Grinding |
|
Hachimine, Takumi | Nara Institute of Science and Technology |
Morimoto, Jun | Kyoto University |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Model Learning for Control, Manipulation Planning, Machine Learning for Robot Control
Abstract: Automating object shaping by grinding with a robot is a crucial industrial process that involves removing material with a rotating grinding belt. This process generates removal resistance depending on process conditions such as material type, removal volume, and robot grinding posture, all of which complicate the analytical modeling of shape transitions. Additionally, a data-driven approach based on real-world data is challenging due to high data collection costs and the irreversible nature of the process. This paper proposes a Cutting Sequence Diffuser (CSD) for object shaping by grinding. The CSD, which only requires simple simulation data for model learning, offers an efficient way to plan long-horizon action sequences transferable to the real world. Our method designs a smooth action space with constrained small removal volumes to suppress the complexity of the shape transitions caused by removal resistance, thus reducing the reality gap in simulations. Moreover, by using a diffusion model to generate long-horizon action sequences, our approach reduces the planning time and allows for grinding the target shape while adhering to the constraint of a small removal volume per step. Through evaluations in both simulation and real-robot experiments, we confirmed that our CSD is effective for grinding different materials into various target shapes in a short time.
|
|
16:55-17:00, Paper TuDT26.4 | |
The Input-Mapping-Based Online Learning Sliding Mode Control Strategy with Low Computational Complexity (I) |
|
Yu, Yaru | Shanghai Jiao Tong University |
Ma, Aoyun | Shanghai Jiao Tong University |
Li, Dewei | Shanghai Jiao Tong University |
Xi, Yugeng | Shanghai Jiao Tong University |
Gao, Furong | The Hong Kong University of Science and Technology |
Keywords: Model Learning for Control, Optimization and Optimal Control
Abstract: The data-driven sliding mode control (SMC) method proves to be highly effective in addressing uncertainties and enhancing system performance. In our previous work, we implemented a co-design approach based on an input-mapping data-driven technique, which effectively improves the convergence rate through historical data compensation. However, this approach increases computational complexity in multi-input and multi-output (MIMO) systems due to the dependency of the number of online optimization variables on system dimensions. To improve applicability, this paper introduces a novel input-mapping based online learning SMC strategy with low computational complexity. First, a new sliding mode surface is established through online convex combination of pre-designed offline surfaces. Then, an input-mapping-based online learning sliding mode control (IML-SMC) strategy is designed, utilizing a reaching law with adaptively adjusted convergence and switching coefficients to minimize chattering. The input-mapping technique employs the mapping relationship between historical input and output data for predicting future system dynamics. Accordingly, an optimization problem is formulated to learn from the past dynamics of the uncertain system online, thereby enhancing system performance. The optimization problem in this paper features fewer variables and is independent of system dimension. Additionally, the stability of the proposed method is theoretically validated, and the advantages are demonstrated through a MIMO system.
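The dimension-independence claim can be illustrated with a toy sketch: only the convex-combination weights of the pre-designed surfaces are optimized online, so the number of decision variables is fixed by the number of surfaces rather than by the system size. The crude simplex projection below is an assumption for illustration.

    import numpy as np

    def combined_surface(S, lam):
        # S: (n_states, n_surfaces) values of pre-designed sliding surfaces;
        # lam: (n_surfaces,) online-optimized convex-combination weights.
        lam = np.clip(lam, 0.0, None)
        lam = lam / max(lam.sum(), 1e-12)   # normalize onto the simplex
        return S @ lam                      # s(k) = sum_i lam_i * s_i(k)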
|
|
17:00-17:05, Paper TuDT26.5 | |
Dual Control of Exploration and Exploitation for Auto-Optimization Control with Active Learning (I) |
|
Li, Zhongguo | University of Manchester |
Chen, Wen-Hua | Loughborough University |
Yang, Jun | Loughborough University |
Yan, Yunda | University College London |
Keywords: Model Learning for Control, Robotics in Hazardous Fields, Autonomous Agents
Abstract: The quest for optimal operation in environments with unknowns and uncertainties is highly desirable but critically challenging across numerous fields. This paper develops a dual control framework for exploration and exploitation (DCEE) to solve an auto-optimization problem in such complex settings. In general, there is a fundamental conflict between tracking an unknown optimal operational condition and parameter identification. The DCEE framework stands out by eliminating the need for additional perturbation signals, a common requirement in existing adaptive control methods. Instead, it inherently incorporates an exploration mechanism, actively probing the uncertain environment to diminish belief uncertainty. An ensemble based multi-estimator approach is developed to learn the environmental parameters and in the meanwhile quantify the estimation uncertainty in real time. The control action is devised with dual effects, which not only minimizes the tracking error between the current state and the believed unknown optimal operational condition but also reduces belief uncertainty by proactively exploring the environment. Formal properties of the proposed DCEE framework like convergence are established. A numerical example is used to validate the effectiveness of the proposed DCEE. Simulation results for maximum power point tracking are provided to further demonstrate the potential of this new framework in real world applications.
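The dual effect described here is often written as an expected-cost decomposition; schematically (a common DCEE formulation, in our own notation rather than the paper's):

    J(u_k) = \mathbb{E}\big[ \|x_{k+1} - x^{*}(\theta)\|^{2} \mid \mathcal{I}_k, u_k \big]
           = \underbrace{\|x_{k+1} - \hat{x}^{*}_{k}\|^{2}}_{\text{exploitation}}
           + \underbrace{\operatorname{tr} P^{*}_{k}}_{\text{exploration}},

where \hat{x}^{*}_k and P^{*}_k are the mean and covariance of the believed optimal operational condition under the information set \mathcal{I}_k, so a single control action both tracks the current belief and reduces its uncertainty.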
|
|
17:05-17:10, Paper TuDT26.6 | |
Adaptive Control Based Friction Estimation for Tracking Control of Robot Manipulators |
|
Huang, Junning | Intelligent Autonomous Systems |
Tateo, Davide | Technische Universität Darmstadt |
Liu, Puze | Technische Universität Darmstadt |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Model Learning for Control, Robust/Adaptive Control, Calibration and Identification
Abstract: Adaptive control is often used for friction compensation in trajectory tracking tasks because it does not require torque sensors. However, it has some drawbacks: first, the most common certainty-equivalence adaptive control design is based on linearized parameterization of the friction model, therefore nonlinear effects, including the stiction and Stribeck effect, are usually omitted. Second, the adaptive control-based estimation can be biased due to non-zero steady-state error. Third, neglecting unknown model mismatch could result in non-robust estimation. This paper proposes a novel linear parameterized friction model capturing the nonlinear static friction phenomenon. Subsequently, an adaptive control-based friction estimator is proposed to reduce the bias during estimation based on backstepping. Finally, we propose an algorithm to generate excitation for robust estimation. Using a KUKA iiwa 14, we conducted trajectory tracking experiments to evaluate the estimated friction model, including random Fourier and drawing trajectories, showing the effectiveness of our methodology in different control schemes.
|
|
TuDT27 |
103C |
Parallel and Redundant Robots 2 |
Regular Session |
|
16:40-16:45, Paper TuDT27.1 | |
A Learning Based Method for Computing Self-Motion Manifolds of Redundant Robots for Real-Time Fault-Tolerant Motion Planning |
|
Clark, Landon | University of Kentucky |
Xie, Biyun | University of Kentucky |
Keywords: Redundant Robots, Kinematics, Motion and Path Planning, Fault tolerance
Abstract: The focus of this research is to develop a learning-based method that computes self-motion manifolds (SMMs) efficiently and accurately to enable real-time global fault-tolerant motion planning. The proposed method first develops a learnable, closed-form representation of SMMs based on Fourier series. A cellular automaton is then applied to cluster workspace locations having the same number of SMMs and group SMMs with similar shape by homotopy classes, such that the SMMs of each homotopy class can be accurately learned by a neural network. To approximate the SMMs of an arbitrary workspace location, a neural network is first trained to predict the set of homotopy classes belonging to this workspace location. For each set of homotopy classes, another neural network is trained to approximate the Fourier series coefficients of the SMMs, and the joint configurations along the SMMs can be retrieved using the inverse Fourier transform. The proposed method is validated on planar 3R positioning, spatial 4R positioning, and spatial 7R positioning and orienting robots, using 10,000 randomly sampled workspace locations each.
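A small sketch of the retrieval step: given the Fourier coefficients predicted by the second network for one homotopy class, joint configurations along the SMM are synthesized by inverse-Fourier evaluation. The truncation order and normalization convention here are assumptions.

    import numpy as np

    def smm_joint_angles(coeffs, t):
        # coeffs: (n_joints, n_harmonics) complex Fourier coefficients of a
        # closed self-motion manifold; t in [0, 1) parameterizes the manifold.
        k = np.arange(coeffs.shape[1])
        basis = np.exp(2j * np.pi * k * t)   # inverse-Fourier synthesis at t
        return np.real(coeffs @ basis)       # joint configuration theta(t)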
|
|
16:45-16:50, Paper TuDT27.2 | |
Autonomous Obstacle Avoidance for a Snake Robot with Surface Pressure Sensing |
|
Sun, Yongjun | Harbin Institute of Technology |
Xue, Zhao | Harbin Institute of Technology |
Bao, Liming | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Redundant Robots, Force and Tactile Sensing, Motion Control
Abstract: A sixteen-joint snake robot with whole-body surface pressure sensing capability has been developed. A total of 64 thin-film pressure sensors are evenly distributed over the robot's surface. Four intelligent obstacle-avoidance motions integrating surface pressure sensing are studied. They are as follows: a rolling obstacle-avoidance motion that can be performed in both the regular rolling gait and the hump rolling gait; an autonomous crawling obstacle-avoidance motion under unknown obstacle parameters; an intelligent coiling-and-climbing motion on objects of unknown or variable diameter; and a gap-crossing motion that can autonomously detect the gap position and cross gaps on horizontal pipes. Finally, experiments are conducted in different scenarios to verify the feasibility of these four intelligent motions.
|
|
16:50-16:55, Paper TuDT27.3 | |
CaRoSaC: A Reinforcement Learning-Based Kinematic Control of Cable Driven Parallel Robots by Addressing Cable Sag through Simulation |
|
Dhakate, Rohit | University of Klagenfurt |
Jantos, Thomas | University of Klagenfurt |
Allak, Eren | University of Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Steinbrener, Jan | Universität Klagenfurt |
Keywords: Parallel Robots, Reinforcement Learning, Model Learning for Control
Abstract: This paper introduces the Cable Robot Simulation and Control (CaRoSaC) Framework, which integrates a realistic simulation environment with a model-free reinforcement learning control methodology for suspended Cable-Driven Parallel Robots (CDPRs), accounting for the effects of cable sag. Our approach seeks to bridge the knowledge gap regarding the intricacies of CDPRs, such as cable sag and precision control requirements, which are often overlooked in existing research and traditional models, by establishing a simulation platform that captures the real-world behaviors of CDPRs, including the impacts of cable sag. The framework offers researchers and developers a tool to further develop estimation and control strategies within the simulation, for understanding and predicting performance nuances, especially in complex operations where cable sag can be significant. Using this simulation framework, we train a model-free control policy rooted in reinforcement learning (RL), chosen for its capability to adaptively learn from the complex dynamics of CDPRs. The policy is trained to discern optimal cable control inputs, ensuring precise end-effector positioning. Unlike traditional feedback-based control methods, our RL control policy focuses on kinematic control and addresses cable sag without being tethered to predefined mathematical models. We also demonstrate that our RL-based controller, coupled with the flexible cable simulation, significantly outperforms the classical kinematics approach, particularly in dynamic conditions and near the boundary regions of the workspace. The combined strength of the described simulation and control approach offers an effective solution for manipulating suspended CDPRs even at workspace boundary conditions where the traditional approach fails, as shown in our experiments, ensuring that CDPRs function optimally in various applications while accounting for the often neglected but critical factor of cable sag.
|
|
16:55-17:00, Paper TuDT27.4 | |
An Online Reconfiguration Strategy of the Cable-Driven Parallel Robot for pHRI Via APF-Adjusted Linear Approximation |
|
Li, Gengxi | University of Science and Technology of China |
Zhang, Bin | University of Science and Technology of China |
Shang, Weiwei | University of Science and Technology of China |
Keywords: Parallel Robots, Physical Human-Robot Interaction
Abstract: The simple and modular structure of cable-driven parallel robots (CDPRs) can enable effective real-time reconfiguration. In this paper, an online reconfiguration strategy is proposed for a 3-DOF point-mass CDPR to adjust the cable anchor positions and enhance its performance in physical human-robot interaction (pHRI). The reconfiguration problem, inclusive of all relevant constraints such as the wrench feasible condition (WFC) and the structural constraint on the cable anchors, is formulated as a non-convex optimization problem to determine the optimal positions of cable anchors. However, such original formulation poses a serious challenge to real-time determination, primarily due to the non-convex constraint imposed by the WFC and the non-convex objective function. To address this issue, the characteristics of the CDPR are considered, and a linear approximation method is employed to simplify the original optimization problem into a linear one, allowing it to be efficiently solved by the dual simplex method. Additionally, an artificial potential field (APF) is designed, considering both the inherent workspace properties and the interaction force, to adjust the solution of the linear optimization problem, which ensures that the optimal solution remains within a safe distance from the boundary of the solution space. Simulations validate the effectiveness of the strategy in improving the interaction metric while satisfying constraints.
|
|
17:00-17:05, Paper TuDT27.5 | |
Efficient Hitting with Different Links of a Redundant Robotic Manipulator |
|
Khurana, Harshit | EPFL |
Billard, Aude | EPFL |
Keywords: Redundant Robots, Manipulation Planning, Machine Learning for Robot Control
Abstract: This paper develops the skill of impact-aware non-prehensile manipulation through a hitting motion of a redundant robot arm, allowing the arm to come into contact with the environment with the appropriate link according to the requirements of the hitting task. In tasks where the directional effective inertia of a robot is important at the contact point, it is useful to understand the inertia at different links, so as to select the appropriate one. Hitting with different links allows us to manipulate a wider range of object masses, since the robot's effective inertia differs from link to link. We propose a learning-based methodology for selecting a hitting link based on the hitting task specifications, generating an impact posture for the robot, and automatically generating desired directional inertia values throughout the hitting motion.
|
|
17:05-17:10, Paper TuDT27.6 | |
Optimal Control of Walkers with Parallel Actuation |
|
De Matteïs, Ludovic | LAAS-CNRS |
Batto, Virgile | LAAS-CNRS |
Carpentier, Justin | INRIA |
Mansard, Nicolas | CNRS |
Keywords: Parallel Robots, Optimization and Optimal Control, Contact Modeling
Abstract: Legged robots with closed-loop kinematic chains are increasingly prevalent due to their increased mobility and efficiency. Yet, most motion generation methods rely on serial-chain approximations, sidestepping their specific constraints and dynamics. This leads to suboptimal motions and limits the adaptability of these methods to diverse kinematic structures. We propose a comprehensive motion generation method that explicitly incorporates closed-loop kinematics and their associated constraints in an optimal control problem (OCP), integrating kinematic closure conditions and their analytical derivatives. This allows the solver to leverage the non-linear transmission effects inherent to closed-chain mechanisms, reducing peak actuator efforts and expanding their effective operating range. Unlike previous methods, our framework does not require serial approximations, enabling more accurate and efficient motion strategies. We are also able to generate motions for more complex robots for which no approximate serial chain exists. We validate our approach through simulations and experiments, demonstrating superior performance in complex tasks such as rapid locomotion and stair negotiation. This method enhances the capabilities of current closed-loop robots and broadens the design space for future kinematic architectures.
|
|
TuDT28 |
104 |
Marine Robotics 4 |
Regular Session |
|
16:40-16:45, Paper TuDT28.1 | |
Localization of an Unmanned Underwater Vehicle Using a Tethered Cooperative Surface Vehicle and Hybrid EKF/Grid-Based Method |
|
Oxford, A. Malori | University of Virginia |
Vu, Nathan | University of Virginia |
Furukawa, Tomonari | University of Virginia |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Marine Robotics, Localization, Cooperating Robots
Abstract: This paper presents an approach for the localization of an Unmanned Underwater Vehicle (UUV) in a cooperative team with a tethered Unmanned Surface Vehicle (USV). For the localization, the UUV and the USV carry a camera and a sonar respectively to observe each other. The vehicle states are split between Extended Kalman Filter (EKF) and grid-based estimators based on which sensors provide Gaussian or non-Gaussian observations of each state. Specifically, the horizontal position of the UUV is estimated using a grid-based method because the camera and sonar that observe these states provide non-Gaussian observations when they cannot detect their target. Additionally, the tether to the USV is treated as a non-Gaussian observation that prevents unbounded error growth. Validation of the technique was performed in simulations using sensor models developed based on testing in a lake and pool.
|
|
16:45-16:50, Paper TuDT28.2 | |
BESTAnP: Bi-Step Efficient and Statistically Optimal Estimator for Acoustic-N-Point Problem |
|
Sheng, Wenliang | East China University of Science and Technology |
Zhao, Hongxu | The Chinese University of Hong Kong, Shenzhen |
Chen, Lingpeng | Chinese University of Hong Kong, Shenzhen |
Zeng, Guangyang | The Chinese University of Hong Kong, Shenzhen |
Shao, Yunling | The Chinese University of Hong Kong, Shenzhen |
Hong, Yuze | The Chinese University of Hong Kong,Shenzhen |
Yang, Chao | East China University of Science and Technology |
Hong, Ziyang | Heriot-Watt University |
Wu, Junfeng | The Chinese Unviersity of Hong Kong, Shenzhen |
Keywords: Marine Robotics, Localization, Optimization and Optimal Control
Abstract: We consider the acoustic-n-point (AnP) problem, which estimates the pose of a 2D forward-looking sonar (FLS) according to n 3D-2D point correspondences. We explore the nature of the measured partial spherical coordinates and reveal their inherent relationships to translation and orientation. Based on this, we propose a bi-step efficient and statistically optimal AnP (BESTAnP) algorithm that decouples the estimation of translation and orientation. Specifically, in the first step, the translation estimation is formulated as the range-based localization problem based on distance-only measurements. In the second step, the rotation is estimated via eigendecomposition based on azimuth-only measurements and the estimated translation. BESTAnP is the first AnP algorithm that gives a closed-form solution for the full 6 degree-of-freedom (DoF) pose. In addition, we conduct bias elimination for BESTAnP such that it owns the statistical property of consistency. Through simulation and real-world experiments, we demonstrate that compared with the state-of-the-art (SOTA) methods, BESTAnP is over ten times faster and features real-time capacity in resource-constrained platforms while exhibiting comparable accuracy. Moreover, we embed BESTAnP into a single sonar-based odometry which shows its effectiveness for trajectory estimation.
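A rough sketch of the first (translation) step, posed as range-based localization from the distance component of the sonar measurements; the iterative solver and initialization below are assumptions, and the paper's consistent closed-form treatment and the eigendecomposition-based rotation step are omitted.

    import numpy as np
    from scipy.optimize import least_squares

    def sonar_translation(points_world, ranges):
        # Solve r_i = ||p_i - t|| for the sonar position t, given the 3D
        # points p_i (n, 3) and their measured ranges r_i (n,).
        def residual(t):
            return np.linalg.norm(points_world - t, axis=1) - ranges
        return least_squares(residual, x0=points_world.mean(axis=0)).x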
|
|
16:50-16:55, Paper TuDT28.3 | |
VIMS: A Visual-Inertial-Magnetic-Sonar SLAM System in Underwater Environments |
|
Zhang, Bingbing | Zhejiang University |
Yin, Huan | Hong Kong University of Science and Technology |
Liu, Shuo | Zhejiang University |
Zhang, Fumin | Hong Kong University of Science and Technology |
Xu, Wen | Zhejiang University |
Keywords: Marine Robotics, Localization, Sensor Fusion
Abstract: In this study, we present a novel simultaneous localization and mapping (SLAM) system, VIMS, designed for underwater navigation. Conventional visual-inertial state estimators encounter significant practical challenges in perceptually degraded underwater environments, particularly in scale estimation and loop closing. To address these issues, we first propose leveraging a low-cost single-beam sonar to improve scale estimation. Then, VIMS integrates a high-sampling-rate magnetometer for place recognition by utilizing magnetic signatures generated by an economical magnetic field coil. Building on this, a hierarchical scheme is developed for visual-magnetic place recognition, enabling robust loop closure. Furthermore, VIMS achieves a balance between local feature tracking and descriptor-based loop closing, avoiding additional computational burden on the front end. Experimental results highlight the efficacy of the proposed VIMS, demonstrating significant improvements in both the robustness and accuracy of state estimation within underwater environments.
|
|
16:55-17:00, Paper TuDT28.4 | |
Underwater Target Tracking with Unknown Maneuver by Remotely Operated Vehicles: A Digital Twin-Driven Strategy |
|
Zhang, Tianyi | Yanshan University |
Yan, Jing | Yanshan University |
Yang, Xian | Yanshan University |
Chen, Cailian | Shanghai Jiao Tong University |
Guan, Xinping | Shanghai Jiaotong University |
Keywords: Marine Robotics, Machine Learning for Robot Control, Motion Control
Abstract: Underwater target tracking is a critical challenge in marine exploration and defense applications due to the unknown maneuvers of the target and the complex marine environment. To overcome this challenge, this paper develops a digital twin (DT)-driven unknown-maneuver target tracking strategy via remotely operated vehicles (ROVs). To capture the maneuver characteristics of the target, a state-prediction-based DT framework is constructed, in which a neural network learning strategy is designed to estimate the unknown state transition matrix of the target. Based on the predicted target state, a reinforcement learning (RL)-based tracking controller is designed for the virtual ROVs in the DT model, such that the optimal tracking policy from the DT model can be implemented on physical ROVs. To reduce the matching error between virtual and physical ROVs, an RL-based optimization algorithm is conducted using the data interaction between the DT model and the ROVs. Note that the DT-driven target tracking strategy not only reduces communication energy consumption by periodically feeding back the real data of the ROVs to the DT model, but also relaxes the dependence on a target maneuver model via the state prediction method. Finally, experimental results are provided to verify the effectiveness of our strategy.
|
|
17:00-17:05, Paper TuDT28.5 | |
From Extended Environment Perception towards Real-Time Dynamic Modeling for Long-Range Underwater Robot |
|
Lei, Lei | City University of Hong Kong |
Yu, Zhou | Fuzhou University |
Jian-Xing, Zhang | Huazhong University of Science and Technology |
Keywords: Marine Robotics, Mechanism Design, Dynamics, Environment Monitoring and Management
Abstract: Underwater robots are critical observation platforms for diverse ocean environments. However, existing robotic designs often lack long-range and deep-sea observation capabilities and overlook the effects of environmental uncertainties on robotic operations. This paper presents a novel long-range underwater robot for extreme ocean environments, featuring a low-power dual-circuit buoyancy adjustment system, an efficient mass-based attitude adjustment system, flying wings, and an open sensor cabin. After that, an extended environment perception strategy with incremental updating is proposed to understand and predict full hydrological dynamics based on sparse observations. On this basis, a real-time dynamic modeling approach integrates multibody dynamics, perceived hydrological dynamics, and environment-robot interactions to provide accurate dynamics predictions and enhance motion efficiency. Extensive simulations and field experiments covering 600 km validated the reliability and autonomy of the robot in long-range ocean observations, highlighting the accuracy of the extended perception and real-time dynamics modeling methods.
|
|
17:05-17:10, Paper TuDT28.6 | |
RS-ModCubes: Self-Reconfigurable, Scalable, Modular Cubic Robots for Underwater Operations |
|
Zheng, Jiaxi | Carnegie Mellon University |
Dai, Guangmin | Westlake University |
He, Botao | University of Maryland |
Mu, Zhaoyang | Dalian Maritime University |
Meng, Zhaochen | Dalian Maritime University |
Zhang, Tianyi | Carnegie Mellon University |
Zhi, Weiming | Carnegie Mellon University |
Fan, Dixia | Westlake University |
Keywords: Marine Robotics, Mechanism Design, Multi-Robot Systems
Abstract: This paper introduces a reconfigurable underwater robot system, RS-ModCubes, which allows scalable multi-robot configuration. An RS-ModCubes system consists of multiple ModCube modules that can travel underwater with 6 DoFs and assemble with each other into larger structures using onboard electromagnets. This system is designed to support diverse underwater applications through modularity and reconfigurability, eliminating the need to customize mechanical designs for a specific task. We present a modeling framework tailored for such reconfigurable robot systems, with hydrodynamics integrated based on Monte Carlo approximation. A model-based feedforward PD controller serves as the baseline for control. Inspired by dexterous manipulation, we evaluated the robot's maximum task wrench space and power efficiency, compared against four commercial underwater robots. RS-ModCubes is validated via both real-world experiments and simulations, including individual and multi-module trajectory tracking and hovering docking. We open-source the design and code to facilitate future research: https://jiaxi-zheng.github.io/ModCube.github.io/
|
|
17:10-17:15, Paper TuDT28.7 | |
Learning Flow-Adaptive Dynamic Model for Robotic Fish Swimming in Unknown Background Flow |
|
Chao, Kaitian | ShanghaiTech University |
Lin, Xiaozhu | ShanghaiTech University |
Liu, Xiaopei | ShanghaiTech University |
Wang, Yang | ShanghaiTech University |
Keywords: Marine Robotics, Model Learning for Control, Biologically-Inspired Robots
Abstract: Robotic fish face considerable challenges in natural environments due to the absence of comprehensive and precise models that can depict the intricate fluid-structure interactions, particularly in the presence of background flows. This paper presents a novel data-driven dynamic modeling framework capable of characterizing the swimming motions of a robotic fish under various background flow conditions without the need for explicit flow information. The model is synthesized from an internal model combined with an adaptive residual acceleration model that effectively isolates and addresses external flow effects. Notably, the residual model employs the innovative Domain Adversarially Invariant Meta-Learning (DAIML) approach, allowing the framework to adapt to fluctuating and previously unseen flow scenarios and enhancing its robustness and scalability. Validation through high-fidelity CFD simulations demonstrates the framework's effectiveness in improving the performance of robotic fish across diverse real-world aquatic environments.
|
|
17:15-17:20, Paper TuDT28.8 | |
Incremental Sparse Gaussian Process-Based Model Predictive Control for Trajectory Tracking of Unmanned Underwater Vehicles |
|
Dang, Yukun | University of Shanghai for Science and Technology |
Huang, Yao | University of Shanghai for Science and Technology |
Shen, XuYu | University of Shanghai for Science and Technology |
Zhu, Daqi | USST |
Chu, Zhenzhong | University of Shanghai for Science and Technology |
Keywords: Marine Robotics, Model Learning for Control, Motion Control
Abstract: In this paper, a Model Predictive Control (MPC) approach based on the Incremental Sparse Gaussian Process (ISGP) is designed for trajectory tracking of Unmanned Underwater Vehicles (UUVs). The performance of MPC depends on the accuracy of system modeling. However, building an accurate dynamic model of a UUV is challenging due to imprecise hydrodynamic coefficients and strong nonlinearities. Thus, a Gaussian Process (GP) is employed to regress the deviating parts of the system model. A sparsification rule is proposed to reduce the training dataset size by removing less valuable data, thereby simplifying the complexity of GP regression training. Additionally, a method for incrementally updating the training data is provided, along with a rigorous stability proof. Finally, simulations are conducted in a third-party ROS environment to demonstrate the efficiency and accuracy of the proposed method.
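As a toy illustration of one plausible sparsification rule: admit a new training point only when the active set does not already explain it well (low kernel similarity). The criterion, kernel, and threshold below are assumptions, not the paper's exact value-based rule.

    import numpy as np

    def admit_point(active_X, x_new, kernel, thresh=0.95):
        # Keep the GP training set small: reject x_new if some stored
        # point is already highly similar to it under the kernel.
        if len(active_X) == 0:
            return True
        return max(kernel(x, x_new) for x in active_X) < thresh

    # Hypothetical squared-exponential kernel with unit length-scale:
    rbf = lambda a, b: float(np.exp(-0.5 * np.sum((np.asarray(a) - np.asarray(b)) ** 2)))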
|
|
TuDT29 |
105 |
SLAM 4 |
Regular Session |
|
16:40-16:45, Paper TuDT29.1 | |
EAROL: Environmental Augmented Perception-Aware Planning and Robust Odometry Via Downward-Mounted Tilted LiDAR |
|
Liang, Xinkai | Beijing Institute of Technology |
Ge, Yigu | Beijing Institute of Technology |
Shi, Yangxi | Beijing Institute of Technology |
Yang, Haoyu | Beijing Institute of Technology |
Cao, Xu | Beijing Institute of Technology |
Fang, Hao | Beijing Institute of Technology |
Keywords: SLAM, Motion and Path Planning
Abstract: To address the challenges of localization drift and perception-planning coupling in unmanned aerial vehicles (UAVs) operating in open-top scenarios (e.g., collapsed buildings, roofless mazes), this paper proposes EAROL, a novel framework with a downward-mounted tilted LiDAR configuration (20° inclination), integrating a LiDAR-Inertial Odometry (LIO) system and a hierarchical trajectory-yaw optimization algorithm. The hardware innovation enables constraint enhancement via dense ground point cloud acquisition and forward environmental awareness for dynamic obstacle detection. A tightly-coupled LIO system, empowered by an Iterative Error-State Kalman Filter (IESKF) with dynamic motion compensation, achieves high 6-DoF localization accuracy in feature-sparse environments. The environment-augmented planner balances environmental exploration, target tracking precision, and energy efficiency. Physical experiments demonstrate an 81% reduction in tracking error, a 22% improvement in perceptual coverage, and near-zero vertical drift across indoor maze and 60-meter-scale outdoor scenarios. This work pioneers a hardware-algorithm co-design paradigm, offering a robust solution for UAV autonomy in post-disaster search and rescue missions. We will release our software and hardware as an open-source package for the community. Video: https://youtu.be/7av2ueLSiYw
|
|
16:45-16:50, Paper TuDT29.2 | |
Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction |
|
Wang, Haolin | Institute of Automation, Chinese Academy of Sciences |
Lv, Zeren | Beijing University of Chemical Technology |
Wei, Hao | University of Chinese Academy of Sciences |
Zhu, Haijiang | Beijing University of Chemical Technology |
Wu, Yihong | National Laboratory of Pattern Recognition, Institute of Automation |
Keywords: SLAM, Mapping
Abstract: Floorplan reconstruction provides structural priors essential for reliable indoor robot navigation and high-level scene understanding. However, existing approaches either require time-consuming offline processing with a complete map, or rely on expensive sensors and substantial computational resources. To address the problems, we propose Floorplan-SLAM, which incorporates floorplan reconstruction tightly into a multi-session SLAM system by seamlessly interacting with plane extraction, pose estimation, back-end optimization, and loop & map merging, achieving real-time, high-accuracy, and long-term floorplan reconstruction using only a stereo camera. Specifically, we present a robust plane extraction algorithm that operates in a compact plane parameter space and leverages spatially complementary features to accurately detect planar structures, even in weakly textured scenes. Furthermore, we propose a floorplan reconstruction module tightly coupled with the SLAM system, which uses continuously optimized plane landmarks and poses to formulate and solve a novel optimization problem, thereby enabling real-time and high-accuracy floorplan reconstruction. Note that by leveraging the map merging capability of multi-session SLAM, our method supports long-term floorplan reconstruction across multiple sessions without redundant data collection. Experiments on the VECtor and the self-collected datasets indicate that Floorplan-SLAM significantly outperforms state-of-the-art methods in terms of plane extraction robustness, pose estimation accuracy, and floorplan reconstruction fidelity and speed, achieving real-time performance at 25–45 FPS without GPU acceleration, which reduces the floorplan reconstruction time for a 1000-square-meter scene from 16 hours and 44 minutes to just 9.4 minutes.
|
|
16:50-16:55, Paper TuDT29.3 | |
CM-LIUW-Odometry: Robust and High-Precision LiDAR-Inertial-UWB-Wheel Odometry for Extreme Degradation Coal Mine Tunnels |
|
Hu, Kun | China University of Mining and Technology |
Li, Menggang | China University of Mining and Technology |
Jin, Zhiwen | China University of Mining and Technology |
Tang, Chaoquan | China University of Mining and Technology |
Hu, Eryi | Information Institute, Ministry of Emergency Management of the P |
Zhou, Gongbo | China University of Mining and Technology |
Keywords: SLAM, Search and Rescue Robots, Mining Robotics
Abstract: Simultaneous Localization and Mapping (SLAM) in large-scale, complex, and GPS-denied underground coal mine environments presents significant challenges. Sensors must contend with abnormal operating conditions: GPS unavailability impedes scene reconstruction and absolute geographic referencing, uneven or slippery terrain degrades wheel odometer accuracy, and long, feature-poor tunnels reduce LiDAR effectiveness. To address these issues, we propose CoalMine-LiDAR-IMU-UWB-Wheel-Odometry (CM-LIUW-Odometry), a multimodal SLAM framework based on the Iterated Error-State Kalman Filter (IESKF). First, LiDAR-inertial odometry is tightly fused with UWB absolute positioning constraints to align the SLAM system with a global coordinate frame. Next, the wheel odometer is integrated through tight coupling, enhanced by nonholonomic constraints (NHC) and vehicle lever arm compensation, to address performance degradation in areas beyond the UWB measurement range. Finally, an adaptive motion mode switching mechanism dynamically adjusts the robot's motion mode based on the UWB measurement range and environmental degradation levels. Experimental results validate that our method achieves superior accuracy and robustness in real-world underground coal mine scenarios, outperforming state-of-the-art approaches.
|
|
16:55-17:00, Paper TuDT29.4 | |
Task-Driven SLAM Benchmarking for Robot Navigation |
|
Du, Yanwei | Georgia Institute of Technology |
Feng, Shiyu | Georgia Institute of Technology |
Cort, Carlton | Georgia Institute of Technology |
Vela, Patricio | Georgia Institute of Technology |
Keywords: SLAM, Performance Evaluation and Benchmarking, Software Tools for Benchmarking and Reproducibility
Abstract: A critical use case of SLAM for mobile assistive robots is to support localization during a navigation-based task. Current SLAM benchmarks overlook the significance of repeatability (precision), despite its importance in real-world deployments. To address this gap, we propose a task-driven approach to SLAM benchmarking, TaskSLAM-Bench. It employs precision as a key metric, accounts for SLAM’s mapping capabilities, and has easy-to-meet implementation requirements. Simulated and real-world testing scenarios of SLAM methods provide insights into the navigation performance properties of modern visual and LiDAR SLAM solutions. The outcomes show that passive stereo SLAM operates at a level of precision comparable to LiDAR SLAM in typical indoor environments. TaskSLAM-Bench complements existing benchmarks and offers richer assessment of SLAM performance in navigation-focused scenarios. Publicly available code permits in-situ SLAM testing in custom environments with properly equipped robots.
|
|
17:00-17:05, Paper TuDT29.5 | |
Correspondence-Free Multiview Point Cloud Registration Via Depth-Guided Joint Optimisation |
|
Zhou, Yiran | University of Technology Sydney |
Wang, Yingyu | University of Technology Sydney |
Huang, Shoudong | University of Technology, Sydney |
Zhao, Liang | The University of Edinburgh |
Keywords: SLAM, RGB-D Perception
Abstract: Multiview point cloud registration is a fundamental task for constructing globally consistent 3D models. Existing approaches typically rely on feature extraction and data association across multiple point clouds; however, it is challenging for these processes to obtain globally optimal solutions in complex environments. In this paper, we introduce a novel correspondence-free multiview point cloud registration method. Specifically, we represent the global map as a depth map and leverage raw depth information to formulate a non-linear least squares optimisation that jointly estimates the poses of the point clouds and the global map. Unlike traditional feature-based bundle adjustment methods, which rely on explicit feature extraction and data association, our method bypasses these steps by associating multi-frame point clouds with a global depth map through their corresponding poses. This data association is implicitly incorporated and dynamically refined during the optimisation process. Extensive evaluations on real-world datasets demonstrate that our method outperforms state-of-the-art approaches in accuracy, particularly in challenging environments where feature extraction and data association are difficult.
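The joint estimation described above can be stated schematically as a robust non-linear least squares problem over the point cloud poses and the global depth map (our own notation, not the paper's exact formulation):

    \min_{\{T_i\},\, D} \sum_{i}\sum_{j}
    \rho\Big( D\big(\pi(T_i\,p_{ij})\big) - d\big(T_i\,p_{ij}\big) \Big),

where T_i is the pose of point cloud i, p_{ij} its raw points, \pi the projection into the depth-map grid, d(\cdot) the depth of a transformed point, and \rho a robust kernel; data association is implicit in the projection rather than computed from features.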
|
|
17:05-17:10, Paper TuDT29.6 | |
SOLO-SMap: Semantic-Aided Online LiDAR Odometry and 3D Static Mapping for Dynamic Scenes |
|
Li, Ruyi | Nankai University |
Zhang, Shiyong | Nankai University |
Zhang, Xuebo | Nankai University, |
Yuan, Jing | Nankai University |
Wang, Youwei | Nankai University |
Keywords: SLAM, Mapping
Abstract: Accurate and reliable online real-time localization and mapping are crucial for the autonomous navigation of robots. Dynamic objects within the perception field can degrade the accuracy of registration and localization, and also introduce ghost-trail artifacts into the map, hindering robot planning and decision-making. While semantic segmentation can assist in perceiving object categories, it struggles to accurately segment moving objects. In this paper, we present SOLO-SMap, a real-time localization and static map construction framework based solely on LiDAR point clouds. We leverage semantic inference to identify potential dynamic points. Instance-level removal of truly dynamic points is then achieved in the pre-alignment stage by applying geometric rules based on moving-point occlusion relationships and multi-object tracking (MOT) within a nearby temporal window. This design preserves stable static constraints while adhering to the static-world assumption of SLAM systems, improving accuracy and reducing drift, particularly at busy intersections. We evaluated the performance of SOLO-SMap in dynamic scenes on the KITTI dataset and our self-made datasets, and conducted a comprehensive comparison with other methods, validating the effectiveness and robustness of the proposed method. A supplementary video can be accessed at https://www.youtube.com/watch?v=kiXVf_bvRBg.
|
|
17:10-17:15, Paper TuDT29.7 | |
Rejecting Outliers in 2D-3D Point Correspondences from 2D Forward-Looking Sonar Observations |
|
Su, Jiayi | The Hong Kong University of Science and Technology (Guangzhou) |
Zou, Shaofeng | Tsinghua University |
Qian, Jingyu | The Hong Kong University of Science and Technology (Guangzhou) |
Wei, Yan | Zhejiang University |
Qu, Fengzhong | Zhejiang University |
Yang, Liuqing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: SLAM, Marine Robotics
Abstract: Rejecting outliers before applying classical robust methods is a common way to increase the success rate of estimation, particularly when the outlier ratio is extremely high (e.g., 90%). However, this approach often relies on sensor- or task-specific characteristics that may not transfer across scenarios. In this paper, we focus on rejecting 2D-3D point correspondence outliers from 2D forward-looking sonar (2D FLS) observations; the 2D FLS is one of the most popular perception devices in the underwater domain but has a significantly different imaging mechanism from widely used perspective cameras and LiDAR. We fully leverage the narrow elevation field of view of 2D FLS and develop two compatibility tests for different 3D point configurations: (1) in general cases, we design a pairwise length in-range test to filter out overly long or short edges formed from point sets; (2) in coplanar cases, we design a coplanarity test to check whether any four correspondences are compatible under a coplanar setting. Both tests are integrated into outlier rejection pipelines, where they are followed by maximum clique search to identify the largest consistent measurement set as inliers. Extensive simulations demonstrate that the proposed methods for the general and coplanar cases perform effectively under outlier ratios of 80% and 90%, respectively.
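The overall pipeline shape, pairwise compatibility tests followed by maximum clique search, can be sketched generically as below. The `length_bounds` placeholder only gestures at the paper's in-range test, using the narrow elevation aperture to bound the feasible 3D distance between two sonar returns; the paper's exact bounds may differ. Here `meas2d` holds the sonar returns as Cartesian (x, y) points in the zero-elevation plane.

```python
import itertools
import numpy as np
import networkx as nx

def length_bounds(uv_i, uv_j, elev_aperture=np.deg2rad(12)):
    # 2D FLS measures (range, bearing); the unknown elevation is confined to a
    # narrow aperture, which bounds the true 3D distance between two returns.
    d2 = np.linalg.norm(uv_i - uv_j)
    slack = (np.linalg.norm(uv_i) + np.linalg.norm(uv_j)) * np.tan(elev_aperture / 2)
    return max(d2 - slack, 0.0), d2 + slack

def inliers_by_max_clique(pts3d, meas2d):
    # Correspondences whose pairwise 3D distances are mutually consistent with
    # the distances implied by the 2D measurements form a consistency graph;
    # the maximum clique is returned as the inlier set.
    G = nx.Graph()
    G.add_nodes_from(range(len(pts3d)))
    for i, j in itertools.combinations(range(len(pts3d)), 2):
        lo, hi = length_bounds(meas2d[i], meas2d[j])
        if lo <= np.linalg.norm(pts3d[i] - pts3d[j]) <= hi:
            G.add_edge(i, j)
    clique, _ = nx.max_weight_clique(G, weight=None)  # max-cardinality clique
    return sorted(clique)
```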
|
|
17:15-17:20, Paper TuDT29.8 | |
3DS-SLAM: A 3D Object Detection Based Semantic SLAM towards Dynamic Indoor Environments |
|
Ghanta, Sai Krishna | University of Louisville |
Kundrapu, Supriya | University of Louisville |
Baidya, Sabur | University of Louisville |
Keywords: SLAM, Mapping, Deep Learning for Visual Perception
Abstract: Variable factors within the environment can degrade camera localization accuracy, as they violate the fundamental static-environment assumption of Simultaneous Localization and Mapping (SLAM) algorithms. Recent semantic SLAM systems for dynamic environments either rely solely on 2D semantic information, rely solely on geometric information, or combine the two in a loosely integrated manner. In this paper, we introduce 3DS-SLAM, a 3D Semantic SLAM system tailored for dynamic scenes with visual 3D object detection. 3DS-SLAM is a tightly coupled algorithm that resolves semantic and geometric constraints sequentially. We design a 3D part-aware hybrid transformer for point cloud-based object detection to identify dynamic objects. Subsequently, we propose a dynamic feature filter based on HDBSCAN clustering and weighted cluster selection to extract objects with significant absolute depth differences. When compared against ORB-SLAM2, CFP-SLAM, and DYNA-SLAM, 3DS-SLAM exhibits an average improvement of 98.01%, 28.54%, and 50.92%, respectively, across the dynamic sequences of the TUM RGB-D dataset. Furthermore, it surpasses the performance of four other leading SLAM systems designed for dynamic environments.
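A rough sketch of how such a dynamic feature filter might look, using scikit-learn's HDBSCAN. The paper's clustering parameters and weighted selection rule are not given in the abstract, so the thresholds and scoring below are illustrative.

```python
import numpy as np
from sklearn.cluster import HDBSCAN

# Illustrative dynamic feature filter: cluster candidate feature points in 3D
# with HDBSCAN, score each cluster by the mean absolute depth change of its
# points between consecutive frames, and mask clusters whose score exceeds a
# threshold (standing in for the paper's weighted cluster selection).
def dynamic_feature_mask(pts3d, depth_change, min_cluster_size=10, thresh=0.15):
    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(pts3d)
    mask = np.zeros(len(pts3d), dtype=bool)
    for lab in set(labels) - {-1}:            # -1 marks HDBSCAN noise points
        idx = labels == lab
        if np.abs(depth_change[idx]).mean() > thresh:
            mask[idx] = True                  # drop these features as dynamic
    return mask
```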
|
|
TuDT30 |
|
Aerial Autonomy |
Regular Session |
|
16:40-16:45, Paper TuDT30.1 | |
SFExplorer: A Surface-Frontier-Based Efficient UAV Exploration Method for Large-Scale Unknown Environments |
|
Duan, Peiming | Sun Yat-Sen University |
Zhang, Xiaoxun | Sun Yat-Sen University |
Zheng, Lanxiang | Sun Yat-Sen University |
Huang, Junlong | Sun Yat-Sen University |
Liang, Jiahui | School of Computer Science and Engineering, Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Keywords: Aerial Systems: Perception and Autonomy, Motion and Path Planning, Aerial Systems: Applications
Abstract: Autonomous exploration in unknown environments is a crucial challenge for various applications of unmanned aerial vehicles (UAVs). However, in large-scale scenarios, existing methods suffer from inefficient environmental information acquisition, computationally expensive exploration planning, and inconsistent motion. In this work, we present a novel method for rapid UAV autonomous exploration in large-scale environments. We develop a surface-frontier-guided viewpoint generation strategy that supports efficient coverage of the scene. In addition, we introduce an incremental viewpoint clustering method that approximates distant viewpoints with fewer anchor points, decreasing the computational cost of exploration tour planning. Building upon this, we propose a history-informed tour planning method that incorporates information from previous tours into the optimization process, maintaining motion consistency. Extensive simulation experiments validate that our method outperforms existing state-of-the-art methods in terms of exploration time, travel distance, and run time. Various real-world experiments demonstrate the practicality of our approach. The source code will be released to benefit the community.
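The incremental clustering idea, replacing many distant viewpoints with a few anchor points for tour planning, can be sketched with a simple greedy radius-based scheme; the class below is a hypothetical illustration, not SFExplorer's algorithm.

```python
import numpy as np

# Greedy incremental clustering sketch: each new distant viewpoint either
# joins an existing anchor within `radius` or founds a new one, so the tour
# planner only reasons about a handful of representatives at distance.
class ViewpointClusters:
    def __init__(self, radius=5.0):
        self.radius = radius
        self.anchors = []          # anchor position per cluster
        self.members = []          # viewpoints represented by each anchor

    def insert(self, vp: np.ndarray) -> int:
        for k, anchor in enumerate(self.anchors):
            if np.linalg.norm(vp - anchor) < self.radius:
                self.members[k].append(vp)
                # Recenter the anchor on its members (incremental mean).
                self.anchors[k] = np.mean(self.members[k], axis=0)
                return k
        self.anchors.append(vp.copy())
        self.members.append([vp])
        return len(self.anchors) - 1
```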
|
|
16:45-16:50, Paper TuDT30.2 | |
Seamless Transition Control in Spring-Legged Quadrotors: A Hybrid Dynamics Perspective with Guaranteed Feasibility |
|
Li, Hongli | Sun Yat-Sen University |
Zhang, Botao | Sun Yat-Sen University |
Mao, Rui | Sun Yat-Sen University |
Wang, Tao | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Optimization and Optimal Control
Abstract: Legged aerial-terrestrial robots have garnered significant research attention in recent years due to their enhanced environmental adaptability through combined aerial and terrestrial locomotion. However, existing passive spring-legged aerial robots exhibit limited motion versatility, demonstrating only a single stance gait during ground impacts, which constrains their task adaptability and creates substantial challenges in hybrid trajectory optimization and switching control. To address these difficulties, this work presents a systematic solution for achieving diverse hybrid locomotion. We establish the differential flatness property for spring-legged quadrotors in both the aerial and terrestrial domains, and propose a unified hybrid trajectory optimization framework that generates smooth, agile, and dynamically feasible multi-modal trajectories incorporating diverse stance gait patterns. Furthermore, a hybrid nonlinear model predictive controller with a trajectory extension strategy is developed to improve hybrid tracking precision and mode transition execution. Compared to existing methods, we achieve a 30% reduction in tracking error during hybrid locomotion while maintaining high-precision foot placement. The source code will be released to benefit the community.
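To see why planning through stance contacts is hard, consider a minimal 1-D hybrid model: ballistic flight switches to spring-leg stance at touchdown and back at liftoff. The sketch below (made-up parameters, far simpler than the paper's quadrotor model) integrates across the mode switches with event detection.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy 1-D spring hopper: flight and stance phases stitched together by
# touchdown/liftoff events. Parameters are invented for illustration.
m, k, l0, g = 1.0, 400.0, 0.3, 9.81

def flight(t, s):   # s = [body height, vertical velocity]
    return [s[1], -g]

def stance(t, s):   # spring leg compressed: force = k * (l0 - height)
    return [s[1], k * (l0 - s[0]) / m - g]

touchdown = lambda t, s: s[0] - l0        # body descends to rest leg length
touchdown.terminal, touchdown.direction = True, -1
liftoff = lambda t, s: s[0] - l0          # spring returns to rest length
liftoff.terminal, liftoff.direction = True, 1

s, t, mode = [1.0, 0.0], 0.0, "flight"
for _ in range(6):                        # alternate modes across events
    f, ev = (flight, touchdown) if mode == "flight" else (stance, liftoff)
    sol = solve_ivp(f, (t, t + 2.0), s, events=ev, max_step=1e-3)
    s, t = sol.y[:, -1], sol.t[-1]
    mode = "stance" if mode == "flight" else "flight"
    print(f"t={t:.3f}s  switch to {mode}  height={s[0]:.3f} m")
```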
|
|
16:50-16:55, Paper TuDT30.3 | |
Quadrotor Morpho-Transition: Learning vs Model-Based Control Strategies |
|
Mandralis, Ioannis | Caltech |
Murray, Richard | California Institute of Technology |
Gharib, Morteza | Caltech |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Reinforcement Learning
Abstract: Quadrotor Morpho-Transition, or the act of transitioning from air to ground through mid-air transformation, involves complex aerodynamic interactions and a need to operate near actuator saturation, complicating controller design. In recent work, morpho-transition has been studied from a model-based control perspective, but these approaches remain limited due to unmodeled dynamics and the requirement for planning through contacts. Here, we train an end-to-end Reinforcement Learning (RL) controller to learn a morpho-transition policy and demonstrate successful transfer to hardware. We find that the RL control policy achieves agile landing, but only transfers to hardware if motor dynamics and observation delays are taken into account. On the other hand, a baseline MPC controller transfers out-of-the-box without knowledge of the actuator dynamics and delays, at the cost of reduced recovery from disturbances in the event of unknown actuator failures. Our work opens the way for more robust control of agile in-flight quadrotor maneuvers that require mid-air transformation.
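The abstract's transfer finding, that the RL policy only works on hardware once motor dynamics and observation delays are modeled, corresponds to two small simulator modifications. The wrapper below is a generic sketch under assumed names (`step_dynamics` is a placeholder for any quadrotor simulator step); it is not the authors' code.

```python
import numpy as np
from collections import deque

class RealismWrapper:
    """Adds a first-order motor lag and a fixed observation delay to a sim."""
    def __init__(self, step_dynamics, n_motors=4, dt=0.002,
                 motor_tau=0.02, obs_delay_steps=5):
        self.step_dynamics = step_dynamics
        self.alpha = dt / (motor_tau + dt)       # first-order motor response
        self.omega = np.zeros(n_motors)          # actual rotor speeds
        self.obs_buf = deque(maxlen=obs_delay_steps + 1)

    def step(self, state, motor_cmd):
        # Rotors cannot track commands instantly: low-pass the command.
        self.omega += self.alpha * (motor_cmd - self.omega)
        state = self.step_dynamics(state, self.omega)
        self.obs_buf.append(state.copy())
        # The policy sees the state as it was obs_delay_steps ago.
        return state, self.obs_buf[0]
```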
|
|
16:55-17:00, Paper TuDT30.4 | |
Estimation of Aerodynamic Forces in Dynamic Morphing Wing Flight |
|
Gupta, Bibek | Northeastern University |
Kim, Mintae | University of California, Berkeley |
Park, Albert | Rice University |
Sihite, Eric | California Institute of Technology |
Sreenath, Koushil | University of California, Berkeley |
Ramezani, Alireza | Northeastern University |
Keywords: Aerial Systems: Mechanics and Control, Biologically-Inspired Robots, Biomimetics
Abstract: Accurate estimation of aerodynamic forces is essential for advancing the control, modeling, and design of flapping-wing aerial robots with dynamic morphing capabilities. In this paper, we investigate two distinct methodologies for force estimation on Aerobat, a bio-inspired flapping-wing platform designed to emulate the inertial and aerodynamic behaviors observed in bat flight. Our goal is to quantify aerodynamic force contributions during tethered flight, a crucial step toward closed-loop flight control. The first method is a physics-based observer derived from Hamiltonian mechanics that leverages the concept of conjugate momentum to infer the external aerodynamic forces acting on the robot. This observer builds on the system's reduced-order dynamic model and uses real-time sensor data to estimate forces without requiring training data. The second method employs a neural network-based regression model, specifically a multi-layer perceptron (MLP), to learn a mapping from joint kinematics, flapping frequency, and environmental parameters to aerodynamic force outputs. We evaluate both estimators using a 6-axis load cell in a high-frequency data acquisition setup that enables fine-grained force measurements during periodic wingbeats. The conjugate momentum observer and the regression model demonstrate strong agreement across all three force components (Fx, Fy, Fz).
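The conjugate-momentum observer is a standard construction in the literature, so its skeleton can be sketched generically: with inertia matrix M(q), Coriolis matrix C(q, q̇), gravity vector g(q), and applied torques τ, the residual r converges to the external generalized force. The discrete-time version below is illustrative; the paper applies the idea to Aerobat's reduced-order model.

```python
import numpy as np

class MomentumObserver:
    """Generic generalized-momentum (conjugate-momentum) disturbance observer.

    Uses dp/dt = tau + C^T q_dot - g + f_ext with p = M(q) q_dot, so the
    residual r = K (p - p0 - integral of the known terms) tracks f_ext.
    """
    def __init__(self, n_dof, gain=50.0, dt=1e-3):
        self.K = gain * np.eye(n_dof)
        self.dt = dt
        self.integral = np.zeros(n_dof)
        self.r = np.zeros(n_dof)
        self.p0 = None

    def update(self, M, C, g, q_dot, tau):
        p = M @ q_dot                              # conjugate momentum
        if self.p0 is None:
            self.p0 = p.copy()
        # Integrate the modeled part of the momentum dynamics (Euler step).
        self.integral += (tau + C.T @ q_dot - g + self.r) * self.dt
        self.r = self.K @ (p - self.p0 - self.integral)
        return self.r          # estimated external (aerodynamic) force
```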
|
|
17:00-17:05, Paper TuDT30.5 | |
FLOAT Drone: A Fully-Actuated Coaxial Aerial Robot for Close-Proximity Operations |
|
Lin, Junxiao | Zhejiang University |
Ji, Shuhang | ZheJiang University |
Wu, Yuze | Zhejiang University |
Wu, Tianyue | Zhejiang University |
Han, Zhichao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Mechanism Design
Abstract: How to endow aerial robots with the ability to operate in close proximity remains an open problem. The core challenge lies in the propulsion system's dual-task requirement: generating manipulation forces while simultaneously counteracting gravity. These competing demands create dynamic coupling effects during physical interactions. Furthermore, rotor-induced airflow disturbances critically undermine operational reliability. Although fully-actuated unmanned aerial vehicles (UAVs) alleviate dynamic coupling effects via six-degree-of-freedom (6-DoF) force-torque decoupling, existing implementations fail to address the aerodynamic interference between the drone and its environment. They also suffer from oversized designs, which compromise maneuverability and limit their applicability across operational scenarios. To address these limitations, we present FLOAT Drone (FuLly-actuated cOaxial Aerial roboT), a novel fully-actuated UAV featuring two key structural innovations. By integrating control surfaces into a fully-actuated system for the first time, we significantly suppress lateral airflow disturbances during operations. Furthermore, a coaxial dual-rotor configuration enables a compact size while maintaining high hovering efficiency. Through dynamic modeling, we develop hierarchical position and attitude controllers that support both fully-actuated and underactuated modes. Comprehensive real-world experiments confirm the system's capabilities in close-proximity operations.
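For intuition about the fully-actuated versus underactuated distinction, a minimal allocation sketch is shown below: a fully-actuated vehicle can request an arbitrary 6-DoF wrench from its actuators, while an underactuated mode constrains the request to body-z thrust plus torques. The allocation matrix and saturation handling are generic placeholders; FLOAT Drone's actual rotor and control-surface mixing is not described at this level in the abstract.

```python
import numpy as np

# Generic wrench allocation: A maps actuator commands u to the body wrench
# w = [f; tau]; a least-squares inverse recovers u from the 6-DoF wrench
# requested by the hierarchical position/attitude controllers.
def allocate(A, wrench, u_min, u_max):
    u = np.linalg.pinv(A) @ wrench        # minimum-norm actuator commands
    return np.clip(u, u_min, u_max)       # crude saturation handling

def underactuated_wrench(thrust_z, torques):
    # Underactuated mode: only collective z-thrust plus three body torques.
    return np.array([0.0, 0.0, thrust_z, *torques])
```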
|
|
17:05-17:10, Paper TuDT30.6 | |
Thermal Updraft Profiling with an Array of Show Drones |
|
Lacerda, Pedro | Eötvös Loránd University |
Vásárhelyi, Gábor | Eötvös University |
Nagy, Mate | MTA-ELTE Lendület Collective Behaviour Research Group, Eötvös University |
Keywords: Aerial Systems: Applications, Distributed Robot Systems, Environment Monitoring and Management
Abstract: Localized thermal convective updrafts in the atmosphere, commonly referred to as thermals, serve as a significant source of energy-efficient flight for birds, human pilots, and autonomous aircraft. Measuring the vertical airspeed distribution within these updrafts to estimate their strength and characteristics is a significant empirical challenge. In this study, we introduce a proof-of-concept distributed thermal measurement system that uses small, cost-effective multirotor drones equipped with standard sensors, eliminating the need for specialized airspeed instruments. These drones estimate updraft strength by analyzing performance parameters such as power consumption or rotor rotation speed. First, we conducted extensive investigations in a simulated environment incorporating varying wind conditions to establish the relationship between the updraft speed (the vertical speed of the air) and the performance parameters of the drones. Following this, we conducted outdoor experiments involving up to 49 multirotor drones to demonstrate the effectiveness of the distributed measurement system in action. By advancing our understanding of thermal updrafts, this research provides valuable information for analyzing avian flight behavior and facilitates the development of realistic simulation environments. These advancements can improve the design of thermal-exploiting autonomous flight algorithms for unmanned aerial vehicles (UAVs), paving the way for more efficient and adaptive robotic systems.
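The calibration-then-inversion idea can be illustrated with made-up numbers: record hover power at imposed vertical air speeds (in simulation or calm air), fit the relationship, then invert it in the field. All values below are invented for illustration; the paper establishes the actual relationship in simulation under varying wind.

```python
import numpy as np

# Hypothetical calibration data: hover power P (W) at vertical air speed w (m/s).
w_cal = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
P_cal = np.array([182.0, 171.0, 160.0, 150.0, 141.0, 133.0])

coeffs = np.polyfit(w_cal, P_cal, deg=1)     # power drops as the updraft helps

def updraft_from_power(P_measured: float) -> float:
    slope, intercept = coeffs
    return (P_measured - intercept) / slope  # invert the linear fit

print(f"estimated updraft: {updraft_from_power(146.0):.2f} m/s")
```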
|
|
17:10-17:15, Paper TuDT30.7 | |
Flying on Point Clouds with Reinforcement Learning |
|
Xu, Guangtong | Zhejiang University |
Wu, Tianyue | Zhejiang University |
Wang, Zihan | North China Electric Power University (Baoding) |
Wang, Qianhao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Reinforcement Learning
Abstract: A long-cherished vision for drones is to autonomously traverse clutter and reach every corner of the world using onboard sensing and computation. In this paper, we combine onboard 3D lidar sensing with sim-to-real reinforcement learning (RL) to enable autonomous flight in cluttered environments. Compared to vision sensors, lidars are more direct and accurate for geometric modeling of the surroundings, which is one of the most important cues for successful obstacle avoidance. The sim-to-real RL approach, in turn, facilitates low-latency control without a hierarchy of trajectory generation and tracking. We demonstrate that, with design choices of practical significance, we can effectively combine the advantages of 3D lidar sensing and RL to control a quadrotor through a low-level control interface at 50 Hz. The key to learning the policy in a lightweight way lies in a specialized surrogate of the lidar's raw point clouds, which simplifies learning while retaining fine-grained perception to detect narrow free space and thin obstacles. Simulation statistics demonstrate the advantages of the proposed system over alternatives, such as executing maneuvers more easily and achieving higher success rates under different speed constraints. With lightweight simulation techniques, the policy trained in the simulator can control a physical quadrotor, where the system dodges thin obstacles and safely traverses randomly distributed obstacles.
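The abstract does not specify the surrogate representation, but one common way to compress raw lidar returns into a fixed-size policy observation while keeping thin obstacles visible is a nearest-return polar range image; the sketch below is a guess at that flavor, not the paper's design.

```python
import numpy as np

# Compress a raw point cloud (N, 3) into a small polar range image by keeping
# the nearest return per azimuth/elevation bin, giving the policy a fixed-size,
# low-dimensional observation that still registers thin obstacles.
def range_surrogate(points, n_az=64, n_el=16, r_max=20.0):
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(points[:, 1], points[:, 0])            # [-pi, pi)
    el = np.arcsin(np.clip(points[:, 2] / np.maximum(r, 1e-6), -1.0, 1.0))
    i = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    j = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    img = np.full((n_el, n_az), r_max)
    np.minimum.at(img, (j, i), np.minimum(r, r_max))
    return img / r_max                                     # normalized obs
```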
|
| |