Last updated on May 14, 2025. This conference program is tentative and subject to change.
Technical Program for Tuesday May 20, 2025
|
TuLB1R Poster Session, Hall A1/A2 |
Add to My Program |
Late Breaking Results 1 |
|
|
|
09:30-09:55, Paper TuLB1R.1 | Add to My Program |
Human Factors Characterization and Robot Task Scheduling in Human-Robot Collaborative Tasks |
|
Wang, Weitian | Montclair State University |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Robotics, Human-Centered Automation
Abstract: Collaborative robots have been widely used in emerging working contexts such as smart manufacturing. However, enabling robots to understand human factors (HF, e.g., trust, comfort, acceptance) remains a gap in the research community. In this Late Breaking poster presentation, we will show the preliminary results of our work on human factors characterization and robot task scheduling in human-robot collaborative tasks. We found that human factors-related physiological and physical information can be used to characterize HF levels in human-robot partnerships. This work has the potential to improve human-robot collaboration safety and productivity in smart manufacturing contexts.
|
|
09:30-09:55, Paper TuLB1R.2 | Add to My Program |
Generative Fuzzy Rules from Expert Demonstration for Robust Imitation Learning |
|
Lee, Sangmoon | Kyungpook National University |
Kim, Joonsu | Kyungpook National University |
Park, Ju H. | Yeungnam University |
Keywords: Imitation Learning, Neural and Fuzzy Control, Learning from Demonstration
Abstract: Imitation learning allows robots to acquire complex actions by imitating expert demonstrations, removing the need for manual control policy design or reward shaping. It has been successfully applied in manipulation, locomotion, and navigation tasks. However, existing approaches often rely on deep neural networks, which require large amounts of expert data and suffer from training instability, high computational costs, and poor interpretability. We propose a fuzzy rule-based imitation learning method that generates interpretable IF-THEN rules from expert demonstrations. Using the Mamdani product and a weighted average mechanism, actions are produced with fewer learnable parameters. This improves learning efficiency, enhances transparency, and enables dynamic adaptation to new conditions without additional expert data. The approach is validated on a 7-DoF robotic manipulator, showing robust and generalizable action generation under varied scenarios.
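A minimal sketch of the inference step the abstract describes, Mamdani product firing strengths followed by weighted-average defuzzification, assuming Gaussian membership functions and hypothetical rule parameters (not the authors' implementation):

    import numpy as np

    def fuzzy_policy(x, centers, widths, consequents):
        # Membership of each state dimension to each rule's antecedent (Gaussian MFs)
        mu = np.exp(-0.5 * ((x[None, :] - centers) / widths) ** 2)
        # Mamdani product: firing strength of each rule is the product over dimensions
        w = mu.prod(axis=1)                                   # (n_rules,)
        # Weighted average of the per-rule consequent actions (THEN parts)
        return (w[:, None] * consequents).sum(axis=0) / (w.sum() + 1e-9)

    # Hypothetical sizes: 5 rules, 3-D state, 2-D action
    rng = np.random.default_rng(0)
    centers = rng.uniform(-1, 1, (5, 3))       # antecedent centers (learned from demonstrations)
    widths = np.full((5, 3), 0.5)              # antecedent widths
    consequents = rng.uniform(-1, 1, (5, 2))   # per-rule actions
    action = fuzzy_policy(np.array([0.2, -0.1, 0.4]), centers, widths, consequents)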
|
|
09:30-09:55, Paper TuLB1R.3 | Add to My Program |
EffiDynaMix: A Novel Efficient Gaussian Mixture Model for Robot Inertia Model Learning with Dynamics-Motivated Optimal Excitation |
|
Kim, Taehoon | DGIST (Daegu Gyeongbuk Institute of Science & Technology) |
Choi, Kiyoung | Daegu Gyeongbuk Institute of Science and Technology |
Kong, Taejune | DGIST |
Samuel, Kangwagye | Technical University of Munich |
Lee, Hyunwook | Gyeongsang National University |
Oh, Sehoon | DGIST |
Keywords: Data Sets for Robot Learning, Model Learning for Control, Motion Control
Abstract: In this paper, a novel and efficient non-parametric dynamics modeling method, called EffiDynaMix, is proposed. This gray-box, data-driven technique combines the Gaussian Mixture Model (GMM) with the mathematical inertia model of robot behavior. This integration simplifies the creation of training datasets and enhances the model's generalization capabilities. Extensive testing via simulations and experiments on a robot manipulator indicates that EffiDynaMix outperforms conventional data-driven methods such as non-Optimal GMM (nonOpt-GMM), chirp-trained GMM (chirp-GMM), Sparse Gaussian Processes (SGP), Long Short-Term Memory (LSTM) networks, and Radial Basis Function Neural Networks (RBFNN) in both training efficiency and accuracy, particularly in new data scenarios. By incorporating system dynamics, EffiDynaMix not only streamlines the learning process but also ensures smooth adaptation to unfamiliar situations.
|
|
09:30-09:55, Paper TuLB1R.4 | Add to My Program |
Nonlinear Disturbance Observer-Based Robust Control for Flapping-Wing Aerial Manipulators |
|
Asignacion, Abner Jr | Chiba University |
Suzuki, Satoshi | Chiba University |
Keywords: Aerial Systems: Mechanics and Control, Robust/Adaptive Control, Aerial Systems: Applications
Abstract: Flapping-wing aerial manipulators offer promising capabilities for efficient and lightweight robotic applications but remain highly susceptible to modeling uncertainties and external disturbances due to their nonlinear, time-varying dynamics and low Reynolds number operation. These challenges significantly hinder their ability to perform stable manipulation tasks. Although robust control presents a viable solution, its implementation is nontrivial for flapping-wing platforms due to complex dynamics and susceptibility to uncertainties and external disturbances. This work proposes a parallel nonlinear disturbance observer-based (NDOB) robust controller targeting the altitude and longitudinal dynamics of an X-shaped flapping-wing UAV. The controller is designed to suppress undesired motion during manipulation, improving stability and disturbance rejection. Experimental results on the Flapper Nimble+ platform validate the effectiveness of the proposed method, demonstrating accurate trajectory tracking and robustness under wind disturbances and varying payloads.
|
|
09:30-09:55, Paper TuLB1R.5 | Add to My Program |
Adaptive Exafference Via eXtended Reality for Functional Movement Disorder Rehabilitation |
|
Dutta, Anirban | University of Birmingham, UK |
Keywords: Rehabilitation Robotics, Design and Human Factors, Perception-Action Coupling
Abstract: Functional Movement Disorders (FMD) involve impaired voluntary motor control despite no structural neurological damage and are increasingly understood within a predictive coding framework. Disruptions in sensory attenuation and prediction error updating impair the sense of agency in FMD. We present an eXtended Reality (XR)-based biofeedback platform that integrates haptic exafference, cued motor imagery, and adaptive reinforcement to recalibrate sensorimotor control through operant conditioning. Using a Kalman filter and Linear-Quadratic-Gaussian (LQG) control model, we simulate disrupted internal modeling in FMD and demonstrate how motor imagery suggestions for preconditioning and haptic postconditioning can restore control by enhancing state estimation and reducing prediction error. Our co-designed platform incorporates real-time visual and force feedback, and wearable brain imaging to monitor neural correlates of agency. Usability studies and simulation results highlighted the potential of XR-based adaptive control for personalized rehabilitation in FMD, bridging computational neuroscience, robotics, and clinical neurorehabilitation.
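A minimal sketch of the Kalman filter / LQG loop referenced above, with a scalar plant and an innovation-scaling parameter standing in for altered sensory attenuation; all gains and noise values are illustrative assumptions, not the authors' model:

    # Illustrative 1-D discrete-time plant: x' = a*x + b*u + w,  y = x + v
    a, b, q, r = 1.0, 0.1, 0.01, 0.05   # dynamics and process/measurement noise (assumed)
    L_gain = 0.8                        # precomputed LQR feedback gain (assumed)
    attenuation = 0.5                   # scales the Kalman innovation; values < 1 mimic
                                        # reduced updating from prediction error

    def lqg_step(x_hat, P, y):
        u = -L_gain * x_hat             # control computed from the current estimate
        x_pred = a * x_hat + b * u      # predict forward one step
        P_pred = a * P * a + q
        K = P_pred / (P_pred + r)       # Kalman gain
        x_new = x_pred + attenuation * K * (y - x_pred)   # (possibly attenuated) update
        P_new = (1.0 - K) * P_pred
        return x_new, P_new, u

    x_hat, P = 0.0, 1.0                 # initial estimate and covariance
    x_hat, P, u = lqg_step(x_hat, P, y=0.3)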
|
|
09:30-09:55, Paper TuLB1R.6 | Add to My Program |
From Observation to Correction: EEG-fNIRS Microstate Transitions in Surgical Skill Acquisition |
|
Dutta, Anirban | University of Birmingham, UK |
Keywords: Perception-Action Coupling, Surgical Robotics: Planning, Sensorimotor Learning
Abstract: Understanding how the brain transitions between cognitive states during complex tasks is a central challenge in neuroscience for neuroengineering. This study investigates EEG microstate dynamics during the “suturing and intracorporeal knot-tying” task—the most complex skill in the Fundamentals of Laparoscopic Surgery (FLS) program. Using multimodal neuroimaging data (EEG and fNIRS), we examine expertise-dependent brain activation patterns through the lens of a Kalman Filter-based human motor control framework. Microstates—quasi-stable spatial EEG patterns—are mapped to functional brain networks and modeled as elements of internal state prediction, sensory feedback, and error correction. Six EEG-fNIRS microstates were extracted using K-Means clustering and aligned with canonical microstates (A–F), capturing transitions linked to salience, attention, and motor planning. Experts demonstrated efficient microstate transitions during both task initiation and error response, engaging networks like the frontoparietal, salience, and default mode networks, while novices exhibited variable and reactive patterns. The Kalman-based control model provided a unified framework linking neurophysiological microstates with adaptive motor control, highlighting how experience shapes cognitive and sensorimotor dynamics during skill learning.
|
|
09:30-09:55, Paper TuLB1R.7 | Add to My Program |
Efficient 6-DoF Grasp Prediction from RGB Via Hybrid 3D Representation |
|
Li, Yiming | Swansea University |
Ren, Hanchi | Swansea University |
Yang, Yue | Swansea University |
Deng, Jingjing | Durham University |
Xie, Xianghua | Swansea University |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Reliable object grasping is one of the fundamental tasks in robotics. However, determining grasping pose based on single-image input has long been a challenge due to limited visual information and the complexity of real-world objects. In this work, we propose Triplane Grasping, a fast grasping decision-making method that relies solely on a single RGB image as input. Triplane Grasping creates a hybrid Triplane-Gaussian 3D representation through a point decoder and a triplane decoder, which produce an efficient and high-quality reconstruction of the object to be grasped to meet real-time grasping requirements. We propose to use an end-to-end network to generate 6-DoF parallel-jaw grasp distributions directly from 3D points in the point cloud as potential grasp contacts and anchor the grasp pose in the observed data. Experiments on the OmniObject3D and GraspNet-1Billion datasets demonstrate that our method achieves rapid modeling and grasping pose decision-making for daily objects, and exhibits strong generalization capability.
|
|
09:30-09:55, Paper TuLB1R.8 | Add to My Program |
Towards Improving Open-Source and Benchmarking for Robot Manipulation |
|
Norton, Adam | University of Massachusetts Lowell |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Calli, Berk | Worcester Polytechnic Institute |
Dollar, Aaron | Yale University |
Sun, Yu | University of South Florida |
Yanco, Holly | UMass Lowell |
Keywords: Grasping, Performance Evaluation and Benchmarking, Deep Learning in Grasping and Manipulation
Abstract: The Collaborative Open-source Manipulation Performance Assessment for Robotics Enhancement (COMPARE) Ecosystem aims to create greater cohesion among open-source products and improve the modularity of open-source components in the robot manipulation pipeline, enabling more effective performance benchmarking through community-driven standards for software pipeline components, benchmarking practices, objects, datasets, and hardware designs. Online resources for COMPARE include robot-manipulation.org (a landing page for open-source and benchmarking in robot manipulation, including repositories of open-source products and benchmarking assets, event listings for workshops, competitions, and webinars, leaderboards of benchmarking results from across the community, and more) and the COMPARE Slack workspace (the primary online discussion platform for COMPARE, with channels corresponding to each of the relevant open-source and benchmarking thrusts). As part of activating the COMPARE Ecosystem, we are holding a full-day workshop at RSS 2025, Benchmarking Robot Manipulation: Improving Interoperability and Modularity, and hosting a 1-week COMPARE Summer School Cooperative Hackathon and Benchmarking Exercise. This poster provides an overview of the COMPARE Ecosystem project, information about upcoming events, available resources, and a call for engagement with the international robot manipulation community.
|
|
09:30-09:55, Paper TuLB1R.9 | Add to My Program |
Data-Efficient Fine-Tuning for Ultrasound Needle Tracking with Motion Prefix and Tunable Register |
|
Zhang, Yuelin | CUHK |
Tang, Longxiang | Tsinghua University |
Fang, Chengyu | Tsinghua University |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Computer Vision for Medical Robotics, Visual Tracking, Deep Learning Methods
Abstract: Ultrasound (US)-guided needle insertion is crucial for minimally invasive interventions, yet robust needle tracking remains challenging due to the scarcity of labeled training data. This paper introduces a data-efficient fine-tuning (DEFT) framework incorporating a motion prefix and a tunable register to reduce dependency on labeled US needle tracking data while maintaining accuracy. The proposed plug-and-play motion prefix enables task-oriented adaptation of historical motion with minimal dependency on training data. By further incorporating the lightweight tunable register, the global context can be adapted to small US datasets without losing behavior learned from the pretraining domain. Evaluations were conducted on collected datasets with motorized and manual insertions using different US machines. By pretraining on large-scale open-world datasets and fine-tuning on limited US data, our model demonstrates state-of-the-art performance in various data-insufficient and out-of-distribution scenarios. To the best of our knowledge, this is the first US needle tracker trained with DEFT, offering a practical solution for clinical deployment where labeled data is scarce.
|
|
09:30-09:55, Paper TuLB1R.10 | Add to My Program |
SemNav: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation |
|
Flor Rodríguez-Rabadán, Rafael | Alcalá University |
Gutiérrez Álvarez, Carlos | Universidad De Alcalá |
Acevedo, Francisco Javier | Universidad De Alcalá |
Lafuente-Arroyo, Sergio | University of Alcalá |
López-Sastre, Roberto J. | University of Alcalá |
Keywords: Vision-Based Navigation, Data Sets for Robot Learning, Imitation Learning
Abstract: Visual Semantic Navigation (VSN) is a key problem in robotics where an agent must find a target object in an unknown environment using mostly visual cues. Most current VSN models are trained in simulated environments, often using rendered scenes that resemble the real world. These models generally rely on raw RGB images, which limits their ability to generalize to real-world scenarios due to domain adaptation issues. To address this, we propose SemNav, a novel approach that uses semantic segmentation as the main visual input. By incorporating high-level semantic information, our model enhances the agent’s perception and decision-making abilities. This allows it to learn more robust navigation policies that generalize better to unseen environments, both in simulation and in the real world. We also introduce a new dataset—the SemNav dataset—specifically designed to train navigation models that use semantic segmentation. We evaluate our method extensively in simulated environments (using Habitat 2.0 and the HM3D dataset) and with real-world robotic platforms. Experiments show that SemNav outperforms current state-of-the-art VSN methods, achieving higher success rates. Additionally, our real-world tests confirm that semantic segmentation significantly helps bridge the sim-to-real gap, making our method a strong candidate for practical VSN applications. We will release the dataset, code, and trained models to the public.
|
|
09:30-09:55, Paper TuLB1R.11 | Add to My Program |
Gait Optimization for Underwater Legged Robots Using Data-Driven Hydrodynamic Modeling and Reinforcement Learning |
|
Song, Seokyong | POSCO Holdings |
Kim, Taesik | Pohang University of Science and Technology (POSTECH) |
Kim, Seungmin | Pohang University of Science and Technology |
Lee, Joonho | Neuromeka |
Yu, Son-Cheol | Pohang University of Science and Technology (POSTECH) |
Keywords: Marine Robotics, Legged Robots, Machine Learning for Robot Control
Abstract: Precise close-contact inspections are critical in underwater environments, where complex dynamics and biofouling pose significant challenges to traditional vehicles. To address these issues, this study develops a Reinforcement Learning-based framework to optimize the gait of an underwater legged robot for accurate and stable locomotion. The simulation environment integrates hydrodynamic forces, buoyancy effects, added mass, and seabed interactions to train robust gait control policies. By designing the action space to regulate phase and amplitude differences within a legged oscillation model, the framework ensures predictable gait patterns. Experimental validation was conducted both in simulation and on a real robot, confirming that the RL-trained policy effectively mitigates hydrodynamic disturbances. The results demonstrated reduced altitude and pitch instability during fast forward walking, as well as improved heading accuracy in curved trajectories.
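A minimal sketch of the oscillator-based action space described above: the policy is assumed to output per-leg amplitudes and phase offsets that modulate a sinusoidal gait generator; the leg count, frequency, and amplitudes are illustrative, not the paper's robot:

    import numpy as np

    def oscillator_gait(t, base_freq, amplitudes, phase_offsets):
        # Per-leg joint targets from a simple sinusoidal oscillation model;
        # the RL action modulates `amplitudes` and `phase_offsets`.
        return amplitudes * np.sin(2 * np.pi * base_freq * t + phase_offsets)

    # Hypothetical hexapod with a tripod-like phase pattern chosen by the policy
    phase_offsets = np.array([0.0, np.pi, 0.0, np.pi, 0.0, np.pi])
    amplitudes = np.full(6, 0.3)        # rad, assumed swing amplitude per leg
    targets = oscillator_gait(t=0.5, base_freq=1.0,
                              amplitudes=amplitudes, phase_offsets=phase_offsets)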
|
|
09:30-09:55, Paper TuLB1R.12 | Add to My Program |
Uncertainty-Aware Planning Using Deep Ensembles and Constrained Trajectory Optimization for Social Navigation |
|
Nayak, Anshul | Virginia Tech |
Eskandarian, Azim | Virginia Tech |
Keywords: Human-Aware Motion Planning, Planning under Uncertainty, Social HRI
Abstract: Human motion is stochastic, and ensuring safe robot navigation in a pedestrian-rich environment requires proactive decision-making. Past research relied on incorporating deterministic future states of surrounding pedestrians, which can be overconfident and lead to unsafe robot behaviour. This paper proposes a predictive uncertainty-aware planner that integrates neural network-based probabilistic trajectory prediction into planning. Our method uses a deep ensemble-based network for probabilistic forecasting of surrounding humans and integrates the predictive uncertainty as constraints into the planner. We compare numerous constraint satisfaction methods within the planner and evaluate its performance on real-world pedestrian datasets. Further, offline robot navigation was carried out on out-of-distribution pedestrian trajectories inside a narrow corridor.
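A minimal sketch of how a deep ensemble's predictions can be collapsed into a mean trajectory and an uncertainty radius that the planner treats as a constraint; the 2-sigma radius and safety margin are illustrative choices, not the paper's exact formulation:

    import numpy as np

    def ensemble_uncertainty(predictions):
        # predictions: (n_models, horizon, 2) predicted pedestrian xy positions
        mean = predictions.mean(axis=0)                 # (horizon, 2)
        var = predictions.var(axis=0).sum(axis=-1)      # total positional variance per step
        radius = 2.0 * np.sqrt(var)                     # ~2-sigma uncertainty radius
        return mean, radius

    def collision_constraint(robot_xy, mean, radius, safety_margin=0.3):
        # Signed constraint per step: values >= 0 are treated as safe waypoints
        dists = np.linalg.norm(robot_xy - mean, axis=-1)
        return dists - (radius + safety_margin)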
|
|
09:30-09:55, Paper TuLB1R.13 | Add to My Program |
Visual Explanation of DRL-Based Mobile Robot Via Augmented Reality |
|
Itaya, Hidenori | Chubu University |
Hirakawa, Tsubasa | Chubu University |
Yamashita, Takayoshi | Chubu University |
Fujiyoshi, Hironobu | Chubu University |
Sugiura, Komei | Keio University |
Keywords: Human-Centered Robotics, Human-Robot Collaboration
Abstract: Deep reinforcement learning (DRL) is one of the methods used to achieve autonomous mobility in robots. Although it effectively achieves high performance, the decision-making process behind robot actions remains unclear. To address this, many studies have been conducted to enable visual analysis through images. However, these images lack a direct connection to physical space, which limits user understanding of robot actions. Therefore, we present a Transformer-based encoder-decoder model that enables a visual explanation of DRL-based robot autonomous movement and projects it onto physical space using augmented reality (AR). In this paper, we demonstrate that the proposed method provides a direct visual explanation for different robot actions and that the visualization using AR effectively enhances users’ understanding of robot actions.
|
|
09:30-09:55, Paper TuLB1R.14 | Add to My Program |
A Multimodal Emotion Interaction Interface for Friendly Collaborative Robots |
|
Murphy, Jordan | Montclair State University |
Li, Rui | Montclair State University |
|
09:30-09:55, Paper TuLB1R.15 | Add to My Program |
Hysteresis-Assisted Shape Morphing for Soft Continuum Robots |
|
Bi, Zheyuan | The University of Sheffield |
Ji, Tianchen | University of Sheffield |
Dogramadzi, Sanja | University of Sheffield |
Cao, Lin | University of Sheffield |
Keywords: Soft Sensors and Actuators, Actuation and Joint Mechanisms, Soft Robot Materials and Design
Abstract: Conventional robot actuation often relies on more actuators for greater dexterity, complicating design and control. While soft robots with passive under-actuation or multi-stable metamaterials simplify actuation, they rely on external forces or limited multi-stable structures. Here, we introduce a Hysteresis-Assisted Shape-Morphing (HasMorph) paradigm for soft continuum robots and integrate it with tip-everting soft-growing robots (SGRs) to enable scalable shape-morphing in confined spaces. HasMorph exploits shape hysteresis—differences in configuration during loading and unloading—to achieve multi-section bending with minimal actuators through an Inverted Zigzag Tendon-Sheath Mechanism. By alternating just two actuators, this mechanism enables billions of stable shape changes in soft continuum robots, vastly expanding their configuration space and workspace without added hardware complexity. By leveraging friction-induced hysteresis, the SGRs can reversibly morph their shape while maintaining frictionless tip growth. Experiments were conducted to demonstrate reversible shape morphing, tip steering, and follow-the-leader tip growth in unstructured spaces. These findings underscore the potential of hysteresis-assisted actuation for dexterous manipulation and navigation of soft continuum robots, paving the way for advanced applications in medical intervention, industrial inspection, and exploration.
|
|
09:30-09:55, Paper TuLB1R.16 | Add to My Program |
Four Principles for Physically Interpretable World Models |
|
Peper, Jordan | University of Florida |
Mao, Zhenjiang | University of Florida |
Geng, Yuang | University of Florida |
Pan, Siyuan | University of Florida |
Ruchkin, Ivan | University of Florida |
Keywords: Deep Learning Methods, Representation Learning, Machine Learning for Robot Control
Abstract: As autonomous systems are increasingly deployed in open and uncertain settings, there is a growing need for trustworthy world models that can reliably predict future high-dimensional observations. The learned latent representations in world models lack direct mapping to meaningful physical quantities and dynamics, limiting their utility and interpretability in downstream planning, control, and safety verification. In this paper, we argue for a fundamental shift from physically informed to physically interpretable world models, and crystallize four principles that leverage symbolic knowledge to achieve these ends: (1) structuring latent spaces according to the physical intent of variables, (2) learning aligned invariant and equivariant representations of the physical world, (3) adapting training to the varied granularity of supervision signals, and (4) partitioning generative outputs to support scalability and verifiability. We experimentally demonstrate the value of each principle on two benchmarks. This paper opens several intriguing research directions to achieve and capitalize on full physical interpretability in world models.
|
|
09:30-09:55, Paper TuLB1R.17 | Add to My Program |
Improving Camera-LiDAR BEV Fusion for Long-Range 3D Object Detection Via Accurate Depth Projection and Feature Balancing |
|
Sagong, Sungpyo | Seoul National University |
Lee, Minhyeong | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Sensor Fusion, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: This paper proposes a Bird’s Eye View (BEV)-based camera-LiDAR fusion algorithm to enhance long-range 3D object detection by leveraging LiDAR-to-image depth projection and voxel-wise average pooling. Accurate detection of distant objects is essential for autonomous driving safety, yet existing BEV-based fusion methods suffer from inaccurate depth estimation and imbalanced feature intensities. To address these limitations, the proposed method leverages accurate depth maps generated by projecting LiDAR points onto images for accurate geometric transformations of camera features into 3D space, and employs voxel-wise average pooling to balance feature intensities across distances, producing consistent BEV representations. Evaluations on the nuScenes dataset, using a distance-scaled linear matching threshold, demonstrate significant improvements, notably a 49.2% increase in mean average precision for objects beyond 60 m.
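A minimal sketch of the LiDAR-to-image depth projection step that replaces estimated depth in the camera branch; the intrinsics, extrinsics, and nearest-return handling are generic placeholders, not the paper's exact pipeline:

    import numpy as np

    def lidar_to_depth_map(points_lidar, T_cam_lidar, K, h, w):
        # Project LiDAR points (N, 3) into the camera to build a sparse depth map
        pts = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
        pts_cam = (T_cam_lidar @ pts.T).T[:, :3]        # LiDAR frame -> camera frame
        pts_cam = pts_cam[pts_cam[:, 2] > 0]            # keep points in front of the camera
        uvw = (K @ pts_cam.T).T
        u = (uvw[:, 0] / uvw[:, 2]).astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).astype(int)
        depth = np.zeros((h, w))
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for uu, vv, z in zip(u[inside], v[inside], pts_cam[inside, 2]):
            if depth[vv, uu] == 0 or z < depth[vv, uu]:  # keep the nearest return per pixel
                depth[vv, uu] = z
        return depth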
|
|
09:30-09:55, Paper TuLB1R.18 | Add to My Program |
A High-Torque-Density Robotic Wrist with Embedded Torque Sensing for Peg-In-Hole Tasks |
|
Tsai, Yi-Shian | National Cheng Kung University |
Chen, Yi-Hung | National Cheng Kung University |
Lan, Chao-Chieh | National Cheng Kung University |
Keywords: Compliant Assembly, Force and Tactile Sensing, Compliant Joints and Mechanisms
Abstract: This paper presents the design and experimental validation of a torque-controlled robotic wrist with high torque density. The proposed wrist features a serial pitch-yaw joint configuration that enhances dexterity while maintaining compactness. The design integrates stepper motors, harmonic geartrains, and compliant mechanisms to optimize torque output and control accuracy. A compliant pulley and a compliant cap are introduced, enabling embedded torque sensing without the need for external sensors, thereby reducing system complexity and improving response time. Experimental results demonstrate the effectiveness of the wrist in torque accuracy, backdrivability, impedance regulation, and misalignment correction during peg-in-hole assembly, highlighting the benefits of the compliant-driven torque sensing approach. Compared to existing robotic wrists, the proposed design achieves a higher torque density. The findings contribute to advancing robotic wrist technology, particularly in applications requiring precise force modulation, high dexterity, and adaptable compliance.
|
|
09:30-09:55, Paper TuLB1R.19 | Add to My Program |
Robot Particle Herding: Fundamental Concepts and Experiments |
|
Lee, Hoi-Yin | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Dual Arm Manipulation, Manipulation Planning
Abstract: In this paper, we address the problem of manipulating multi-particle aggregates using a bimanual robotic system. Our approach enables the autonomous transport of dispersed particles through a series of shaping and pushing actions using robotically-controlled tools. Achieving this advanced manipulation capability presents two key challenges: high-level task planning and trajectory execution. For task planning, we leverage Vision Language Models (VLMs) to enable primitive actions such as tool affordance grasping and non-prehensile particle pushing. For trajectory execution, we represent the evolving particle aggregate's contour using truncated Fourier series, providing efficient parametrization of its closed shape. We adaptively compute trajectory waypoints based on group cohesion and the geometric centroid of the aggregate, accounting for its spatial distribution and collective motion. Through real-world experiments, we demonstrate the effectiveness of our methodology in actively shaping and manipulating multi-particle aggregates while maintaining high system cohesion.
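A minimal sketch of the truncated Fourier series contour parametrization mentioned above, fit to uniformly sampled boundary points of the aggregate; the number of harmonics is an illustrative choice, and the zeroth-order terms give a simple centroid estimate of the sampled contour:

    import numpy as np

    def fit_fourier_contour(points, n_harmonics=5):
        # points: (N, 2) closed contour samples, assumed uniformly spaced in parameter
        theta = np.linspace(0, 2 * np.pi, len(points), endpoint=False)
        coeffs = []
        for dim in range(2):
            a0 = points[:, dim].mean()
            a = [2 * np.mean(points[:, dim] * np.cos(k * theta)) for k in range(1, n_harmonics + 1)]
            b = [2 * np.mean(points[:, dim] * np.sin(k * theta)) for k in range(1, n_harmonics + 1)]
            coeffs.append((a0, np.array(a), np.array(b)))
        return coeffs                      # centroid estimate: (coeffs[0][0], coeffs[1][0])

    def eval_contour(coeffs, theta):
        out = []
        for a0, a, b in coeffs:
            k = np.arange(1, len(a) + 1)
            out.append(a0 + (a * np.cos(np.outer(theta, k))).sum(1)
                          + (b * np.sin(np.outer(theta, k))).sum(1))
        return np.stack(out, axis=-1)      # reconstructed closed shape, (len(theta), 2)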
|
|
09:30-09:55, Paper TuLB1R.20 | Add to My Program |
The Shape Awakens: Estimating Dynamic Soft Robot States from the Outer Rim |
|
Zheng, Tongjia | University of Toronto |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: State estimation for soft continuum robots is challenging due to their infinite-dimensional states (poses, strains, velocities) resulting from continuous deformability, while conventional sensors provide only discrete data. A recent method, called a boundary observer, uses Cosserat rod theory to estimate all robot states by measuring only tip velocity. In this work, we propose a novel boundary observer that instead measures the internal wrench at the robot’s base, leveraging the duality between velocity and internal wrench. Both observers are inspired by energy dissipation, but the base-based approach offers a key advantage: it uses only a 6-axis force/torque sensor at the base, avoiding the need for external sensing systems. Combining tip- and base-based methods further enhances energy dissipation, speeds up convergence, and improves estimation accuracy. We validate the proposed algorithms in experiments where all boundary observers converge to the ground truth within 3 seconds, even with large initial deviations, and they recover from unknown disturbances while effectively tracking high-frequency vibrations.
|
|
09:30-09:55, Paper TuLB1R.21 | Add to My Program |
Neural Correspondence of Impaired Decision-Making in Multiple Cue Judgment System with Decision Support |
|
Chang, Yoo-Sang | North Carolina Agricultural and Technical State University |
Seong, Younho | North Carolina Agricultural and Technical State University |
Yi, Sun | North Carolina A&T State University |
|
09:30-09:55, Paper TuLB1R.22 | Add to My Program |
Good Deep Feature to Track: Self-Supervised Feature Extraction and Tracking in Visual Odometry |
|
Gottam, Sai Puneeth Reddy | RWTH Aachen University |
Zhang, Haoming | RWTH Aachen University |
Keras, Eivydas | RWTH Aachen University |
Keywords: Visual-Inertial SLAM, Deep Learning for Visual Perception, Localization
Abstract: Visual-based localization has made significant progress, yet its performance often drops in large-scale, outdoor, and long-term settings due to factors like lighting changes, dynamic scenes, and low-texture areas. These challenges degrade feature extraction and tracking, which are critical for accurate motion estimation. While learning-based methods such as SuperPoint and SuperGlue show improved feature coverage and robustness, they still face generalization issues with out-of-distribution data. We address this by enhancing deep feature extraction and tracking through self-supervised learning with task-specific loss function design. Our method promotes stable and informative features, improving generalization and reliability in challenging environments.
|
|
09:30-09:55, Paper TuLB1R.23 | Add to My Program |
Uncertainty Aware Ankle Exoskeleton Control |
|
Tourk, Fatima Mumtaza | Northeastern University |
Galoaa, Bishoy | Northeastern University |
Shajan, Sanat | Northeastern University |
Young, Aaron | Georgia Tech |
Everett, Michael | Northeastern University |
Shepherd, Max | Northeastern University |
Keywords: Prosthetics and Exoskeletons, Machine Learning for Robot Control, Rehabilitation Robotics
Abstract: Current exoskeleton controllers are constrained to predefined, cyclic tasks (e.g., flat-ground walking) and struggle to adapt to the variability of real-world environments, with only 4% of rehabilitation studies validated in home/community settings. This limitation poses safety risks, as preprogrammed torque profiles may fail during novel movements. We propose a deep learning-based framework to enable task-agnostic control: a high-level uncertainty classifier-based controller. The classifier detects whether user actions align with "in-distribution" trained tasks (walking, jogging on inclines/declines) or represent novel "out-of-distribution" scenarios, switching between task-specific assistance and safety-optimized torque. Leveraging biomechanical time-series data from the Dephy Exoboot, three temporal convolutional network (TCN) based architectures—ensembles, autoencoders, and GANs—were trained on actuated/unactuated locomotion data. Evaluated offline on a test set (N=3 users) containing trained tasks (treadmill/overground walking/jogging) and novel actions (stairs, jumping, sitting), the model achieved an F1-score of 0.99 and a J-statistic of 0.985. Results demonstrated clear separation between in-distribution (assisted) and out-of-distribution (safety mode) tasks, even during dynamic transitions. Future work will validate the system in online, continuous environments outside the lab, advancing toward deployable exoskeletons for diverse daily activities.
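A minimal sketch of how one of the named architectures (an autoencoder) can act as the in/out-of-distribution switch: reconstruction error on a recent sensor window is compared against a threshold calibrated on in-distribution data; the window shape, model interface, and threshold rule are assumptions, not the authors' tuned system:

    import numpy as np

    def ood_gate(window, autoencoder, threshold):
        # window: (T, n_channels) recent biomechanical samples; `autoencoder` is any
        # model exposing .predict(); `threshold` is, e.g., a high percentile of
        # reconstruction errors measured on held-out in-distribution locomotion data
        recon = autoencoder.predict(window[None])[0]
        err = float(np.mean((window - recon) ** 2))
        return "safety_torque" if err > threshold else "task_assistance"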
|
|
09:30-09:55, Paper TuLB1R.24 | Add to My Program |
Monocular Vision-Based Autonomous Docking Considering Estimation and Motion Capabilities |
|
Im, Jinho | Keimyung University |
Hong, Seonghun | Keimyung University |
Keywords: Autonomous Vehicle Navigation, SLAM
Abstract: Autonomous docking is a crucial capability for achieving long-term and ultimate autonomy in unmanned vehicle operations. Monocular cameras can be considered as one of the most efficient and promising sensing options for autonomous docking because many existing unmanned vehicles have at least one onboard camera for teleoperation or monitoring. However, monocular cameras possess a fundamental limitation as a stand-alone position-sensing device in the state estimation for autonomous docking due to their nature of bearings-only sensing and the lack of range observability. This study proposes an autonomous docking method for nonholonomic vehicles using monocular vision-based bearing-only measurements. The proposed approach computes the control inputs that enable docking while drawing trajectories that simultaneously consider the vehicle’s motion constraints and state observability. Simulation results are shown to validate the proposed method under vehicle docking scenarios with a single-point landmark for the dock and bearing-only sensing.
|
|
09:30-09:55, Paper TuLB1R.25 | Add to My Program |
An Underactuated Mechanism Enabling Two-DOF Thumb Motion with a Single Actuator |
|
Sin, MinKi | Korea Institute of Machinery & Materials |
An, Bohyeon | Korea Institute of Machinery & Materials |
Chu, Jun-Uk | Korea Institute of Machinery & Materials |
Keywords: Prosthetics and Exoskeletons, Tendon/Wire Mechanism, Underactuated Robots
Abstract: Structural lightness and functional versatility are key requirements in robotic prosthetic hands. To achieve both characteristics, this study proposes a novel underactuated thumb mechanism that enables independent 2-DOF motion using a single actuator. The design employs the Geneva mechanism’s intermittent motion to control ab/adduction, while its non-engaged range of motion drives flexion/extension via a wire pulley system. A shape-adaptive grasp mechanism is also implemented to enable both power and pinch grips. The mechanism achieves independent motion on different planes with minimal components. Experimental results demonstrate that the proposed system effectively realizes 2-DOF thumb motion using only one actuator, offering a compact and modular solution for next-generation prosthetic hands.
|
|
09:30-09:55, Paper TuLB1R.26 | Add to My Program |
Proactive Vortex Ring State Management for High Speed Descent UAVs in Mountain Rescue: Vibration Modeling and SVM Based Detection |
|
Sun, Jiawei | Guangxi University |
Zhou, Xiang | Guangxi University |
Zhao, Jiannan | Guangxi University |
Shuang, Feng | Guangxi University |
Keywords: Search and Rescue Robots, Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: In mountain rescue operations, unmanned aerial vehicles (UAVs) must repeatedly ascend and descend. However, most UAVs face limits on their vertical descent speed; exceeding this limit can trigger a dangerous airflow condition known as the vortex ring state (VRS). The VRS degrades rotor speed response and increases fuselage vibration, making the UAV uncontrollable. Because pilots or operators cannot perceive the onset of VRS, timing the exit from the VRS is critical yet challenging. To address these issues, we propose a strategy that allows UAVs to maintain controlled descent within the VRS or recover to a hover state. We developed a VRS vibration model by analyzing the relationship between descent rate, rotor speed, and vibration to calculate the vibration under VRS. Computational Fluid Dynamics (CFD) simulations were employed to confirm the VRS flow field and validate the vibration model. In real-flight tests, we descended the UAV at 10 m/s, recording the descent rate and vibration amplitude with an Inertial Measurement Unit (IMU). We assume the high-speed-descent UAV can detect and respond to VRS, and apply Support Vector Machine (SVM) techniques to analyze vibration signal changes, distinguishing VRS characteristics from those observed during hover and climb phases in real-flight data. With this approach verified, we further tested high-speed descents at 12 m/s and 15 m/s. This study reveals the relationship between VRS vibration, rotor speed,
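A minimal sketch of the SVM-based detection step with scikit-learn, using hypothetical per-window features (RMS vertical vibration and descent rate) and illustrative labels rather than the paper's flight data:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical features per 1-s window: [RMS vertical vibration, descent rate (m/s)]
    X = np.array([[0.80, 10.0], [0.90, 12.0], [0.10, 0.0], [0.15, -2.0], [0.12, 1.0]])
    y = np.array([1, 1, 0, 0, 0])        # 1 = VRS, 0 = hover/climb (illustrative labels)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, y)
    print(clf.predict([[0.85, 11.0]]))   # -> [1], window flagged as VRS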
|
|
09:30-09:55, Paper TuLB1R.27 | Add to My Program |
Real-Time Robot Base Placement Based on 3D Inverse Reachability Map |
|
Choi, JungHyun | University of Seoul |
Lee, Taegyeom | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Mobile Manipulation
Abstract: Optimal base placement is a critical problem in mobile manipulation, as the position of the mobile base directly influences task feasibility and manipulability. In this work, we propose a real-time base placement method using a 3D Inverse Reachability Map (IRM). By slicing the 3D IRM into 2D layers and leveraging manipulability-based metrics, our method efficiently identifies optimal base poses and trajectories for multiple task goals. The proposed approach supports dynamic task scenarios and outperforms prior methods in computation time. Experimental results using a 7-DOF Franka Emika Panda robot demonstrate the effectiveness of the method for both static task poses and continuous end-effector trajectories.
|
|
09:30-09:55, Paper TuLB1R.28 | Add to My Program |
Hierarchical Robotic Intelligence Via LLM and RL |
|
Huynh, Truong Nhut | Florida Institute of Technology |
Pham, Tan Hanh | Florida Institute of Technology |
Gutierrez, Hector | Florida Institute of Technology |
Nguyen, Kim-Doang | Florida Institute of Technology |
Keywords: AI-Enabled Robotics, Object Detection, Segmentation and Categorization, Deep Learning in Grasping and Manipulation
Abstract: Achieving robust robotic intelligence for real-world tasks remains a challenge due to the need for precise control and contextual adaptability. This work introduces an integrative AI framework that synergizes GPT-4’s contextual reasoning with reinforcement learning (RL) for enhanced task planning and object manipulation. Using Hello Robot’s Stretch 2, our approach interprets natural language instructions, generates strategic plans, and executes actions in dynamic environments. Experiments across static, dynamic, and complex scenarios demonstrate superior performance, with success rates of 90%, 85%, and 80%, respectively, outperforming standalone RL, GPT-4, and state-of-the-art methods like TidyBot (85%) and OK-Robot (75%). Our framework bridges high-level reasoning and low-level control, reducing completion times and retries, and paves the way for more autonomous, efficient robotic systems.
|
|
09:30-09:55, Paper TuLB1R.29 | Add to My Program |
Robotic Assistant for Image-Guided Treatment of Ankle Joint Dislocations |
|
Li, Gang | Children's National Medical Center |
Fooladi Talari, Hadi | Children's National Medical Center |
China, Debarghya | Johns Hopkins University |
Uneri, Ali | Johns Hopkins University |
Ghanem, Diane | Johns Hopkins Medicine |
Shafiq, Babar | Johns Hopkins Medicine |
Cleary, Kevin | Children's National Medical Center |
Monfaredi, Reza | Children's National Medical Center |
Keywords: Medical Robots and Systems, Mechanism Design, Surgical Robotics: Planning
Abstract: Trauma to the ankle is a common injury and a major source of long-term disability. Each year, over half a million ankle injuries require surgical intervention. In cases involving syndesmotic sprains, surgical manipulation of the tibia and fibula is necessary to properly align and reduce the syndesmosis space. However, current manual reduction techniques—whether open or percutaneous—often result in inaccurate reduction. We propose a novel system that integrates intraoperative low-dose cone-beam computed tomography (CBCT) with 3D-2D image registration and robotic manipulation of the fibula to precisely restore its anatomical alignment with the tibial incisura, while minimizing radiation exposure to both patients and surgical staff. This study reports on the development of the robotic assistant and the accompanying image-based 3D planning and guidance method. Experiments were conducted to evaluate the positioning accuracy of the robotic system and the accuracy of the multi-body 3D-2D registration. The results demonstrate that the robotic system can achieve the required motion with a free-space positioning accuracy of 0.26 ± 0.10 mm. The 3D-2D registration method achieved high accuracy of 0.2 ± 0.8 mm, enabling tracking of the robot and bones to dynamically correct the trajectory during the procedure.
|
|
09:30-09:55, Paper TuLB1R.30 | Add to My Program |
Adaptive Communication Based on Estimated Situation Awareness Improves Performance in Human-Robot Teams |
|
Ali, Arsha | University of Michigan |
Robert, Lionel | University of Michigan |
Tilbury, Dawn | University of Michigan |
Keywords: Human-Robot Teaming
Abstract: Situation awareness is important for decision making and performance in human-robot teams, yet both estimating and improving situation awareness in real-time is an open research area. We present a situation awareness system with a real-time situation awareness estimator and adaptive robot interventions. The effectiveness of the situation awareness system in improving situation awareness and performance is demonstrated through a human-robot teaming experiment.
|
|
TuAT1 Regular Session, 302 |
Add to My Program |
Award Finalists 1 |
|
|
Chair: Walter, Matthew | Toyota Technological Institute at Chicago |
Co-Chair: Corke, Peter | Queensland University of Technology |
|
09:55-10:00, Paper TuAT1.1 | Add to My Program |
Achieving Human Level Competitive Robot Table Tennis |
|
D'Ambrosio, David | Google |
Abeyruwan, Saminda Wishwajith | Google Inc |
Graesser, Laura | Google |
Iscen, Atil | Google |
Ben Amor, Heni | Arizona State University |
Bewley, Alex | Google |
Reed, Barney J. | Stickman Skills Center LLC |
Reymann, Krista | Google Research |
Takayama, Leila | University of California, Santa Cruz |
Tassa, Yuval | University of Washington |
Choromanski, Krzysztof | Google DeepMind Robotics |
Coumans, Erwin | Google Inc |
Jain, Deepali | Robotics at Google |
Jaitly, Navdeep | Google Research |
Jaques, Natasha | Google |
Kataoka, Satoshi | Google LLC |
Kuang, Yuheng | Google DeepMind |
Lazic, Nevena | Deepmind |
Mahjourian, Reza | Waymo |
Moore, Sherry | Google DeepMind |
Oslund, Kenneth | Google |
Shankar, Anish | Google |
Sindhwani, Vikas | Google Brain, NYC |
Vanhoucke, Vincent | Google |
Vesom, Grace | Google DeepMind |
Xu, Peng | Google |
Sanketi, Pannag | Google |
Keywords: Reinforcement Learning, Deep Learning Methods, Physical Human-Robot Interaction
Abstract: Achieving human-level performance on real world tasks is a north star for the robotics community. We present the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport that takes humans years to master. We contribute (1) a hierarchical and modular policy architecture consisting of (i) low level controllers with their skill descriptors that model their capabilities and (ii) a high level controller that chooses the low level skills, (2) techniques for enabling zero-shot sim-to-real and curriculum building, including an iterative approach (train in sim, deploy in real), and (3) real time adaptation to unseen opponents. Policy performance was assessed through 29 robot vs. human matches of which the robot won 45% (13/29). All humans were unseen players and their skill level varied from beginner to tournament level. Whilst the robot lost all matches vs. the most advanced players, it won 100% of matches vs. beginners and 55% of matches vs. intermediate players, demonstrating solidly amateur human-level performance.
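A minimal sketch of the hierarchical idea described above: low-level skills carry descriptors of their capabilities, and a high-level controller filters and scores them against the incoming ball; the descriptor fields and scoring rule are illustrative, not the authors' policy:

    from dataclasses import dataclass

    @dataclass
    class SkillDescriptor:
        name: str
        returns_topspin: bool
        returns_underspin: bool
        reach_m: float                   # lateral reach the skill was trained for (assumed)

    SKILLS = [SkillDescriptor("forehand_topspin", True, False, 0.6),
              SkillDescriptor("backhand_push", False, True, 0.4)]

    def choose_skill(ball_spin, lateral_offset_m):
        # Keep skills whose descriptor matches the incoming ball, then pick the
        # one with the largest reach margin at the predicted interception point
        ok = [s for s in SKILLS
              if (ball_spin == "topspin" and s.returns_topspin)
              or (ball_spin == "underspin" and s.returns_underspin)]
        ok = [s for s in ok if s.reach_m >= abs(lateral_offset_m)]
        return max(ok, key=lambda s: s.reach_m - abs(lateral_offset_m), default=None)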
|
|
10:00-10:05, Paper TuAT1.2 | Add to My Program |
Robo-DM: Data Management for Large Robot Datasets |
|
Chen, Kaiyuan | University of California, Berkeley |
Fu, Letian | UC Berkeley |
Huang, David | University of California, Berkeley |
Zhang, Yanxiang | University of California, Berkeley |
Chen, Lawrence Yunliang | UC Berkeley |
Huang, Huang | University of California at Berkeley |
Hari, Kush | UC Berkeley |
Balakrishna, Ashwin | Toyota Research Institute |
Xiao, Ted | Google DeepMind |
Sanketi, Pannag | Google |
Kubiatowicz, John | UC Berkeley |
Goldberg, Ken | UC Berkeley |
Keywords: Big Data in Robotics and Automation, Methods and Tools for Robot System Design, Engineering for Robotic Systems
Abstract: Recent work suggests that very large datasets of teleoperated robot demonstrations can train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories, which typically consist of video, textual, and numerical modalities - including streams from multiple cameras - remains challenging. We propose Robo-DM, an efficient cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format with Extensible Binary Meta Language (EBML). Robo-DM reduces the size of robot trajectory data, transfer costs, and data load time during training. In particular, compared to the RLDS format used in OXE datasets, Robo-DM’s compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster. In fine-tuning Octo, a transformer-based robot policy with 73k episodes with RT-1 data, Robo-DM does not incur any loss in training performance. We physically evaluate a model trained by Robo-DM with lossy compression, a pick-and-place task, and In-Context Robot Transformer. Robo-DM uses 75x compression of the original dataset and does not suffer any reduction in downstream task accuracy. Code and evaluation scripts can be found on the website.
|
|
10:05-10:10, Paper TuAT1.3 | Add to My Program |
No Plan but Everything under Control: Robustly Solving Sequential Tasks with Dynamically Composed Gradient Descent |
|
Mengers, Vito | Technische Universität Berlin |
Brock, Oliver | Technische Universität Berlin |
Keywords: Integrated Planning and Control, Reactive and Sensor-Based Planning, Optimization and Optimal Control
Abstract: We introduce a novel gradient-based approach for solving sequential tasks by dynamically adjusting the underlying myopic potential field in response to feedback and the world's regularities. This adjustment implicitly considers subgoals encoded in these regularities, enabling the solution of long sequential tasks, as demonstrated by solving the traditional planning domain of Blocks World, without any planning. Unlike conventional planning methods, our feedback-driven approach adapts to uncertain and dynamic environments, as demonstrated by one hundred real-world trials involving drawer manipulation. These experiments highlight the robustness of our method compared to planning and show how interactive perception and error recovery naturally emerge from gradient descent without explicitly implementing them. This offers a computationally efficient alternative to planning for a variety of sequential tasks, while aligning with observations on biological problem-solving strategies.
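A minimal sketch of a feedback-driven gradient step on a composed potential, a quadratic pull toward the goal plus a penalty whose weight is re-estimated from feedback when a regularity (for example, a blocking object) applies; the potentials, weights, and numerical gradient are illustrative, not the authors' controller:

    import numpy as np

    def composed_potential(x, goal, obstacle, blocking_weight):
        # Myopic potential: goal attraction plus a term that grows near a blocking object;
        # `blocking_weight` is updated from feedback, which re-shapes the descent direction
        goal_term = 0.5 * np.sum((x - goal) ** 2)
        block_term = blocking_weight * np.exp(-np.sum((x - obstacle) ** 2))
        return goal_term + block_term

    def gradient_step(x, goal, obstacle, blocking_weight, lr=0.1, eps=1e-5):
        grad = np.zeros_like(x)
        for i in range(len(x)):          # simple central-difference gradient
            dx = np.zeros_like(x); dx[i] = eps
            grad[i] = (composed_potential(x + dx, goal, obstacle, blocking_weight)
                       - composed_potential(x - dx, goal, obstacle, blocking_weight)) / (2 * eps)
        return x - lr * grad

    x_next = gradient_step(np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5]), 2.0)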
|
|
10:10-10:15, Paper TuAT1.4 | Add to My Program |
MiniVLN: Efficient Vision-And-Language Navigation by Progressive Knowledge Distillation |
|
Zhu, Junyou | University of Chinese Academy of Sciences |
Qiao, Yanyuan | The University of Adelaide |
Zhang, Siqi | Tongji University |
He, Xingjian | Institute of Automation Chinese Academy of Sciences |
Wu, Qi | University of Adelaide |
Liu, Jing | Institute of Automation, Chinese Academy of Science |
Keywords: Deep Learning Methods, Transfer Learning
Abstract: In recent years, Embodied Artificial Intelligence (Embodied AI) has advanced rapidly, yet the increasing size of models conflicts with the limited computational capabilities of Embodied AI platforms. To address this challenge, we aim to achieve both high model performance and practical deployability. Specifically, we focus on Vision-and-Language Navigation (VLN), a core task in Embodied AI. This paper introduces a two-stage knowledge distillation framework, producing a student model, MiniVLN, and showcasing the significant potential of distillation techniques in developing lightweight models. The proposed method aims to capture fine-grained knowledge during the pretraining phase and navigation-specific knowledge during the fine-tuning phase. Our findings indicate that the two-stage distillation approach is more effective in narrowing the performance gap between the teacher model and the student model compared to single-stage distillation. On the public R2R and REVERIE benchmarks, MiniVLN achieves performance on par with the teacher model while having only about 12% of the teacher model's parameter count.
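A minimal sketch of the standard soft-target distillation loss that knowledge-distillation frameworks of this kind typically build on (temperature-softened KL divergence between teacher and student logits, scaled by T^2); the temperature and the NumPy formulation are assumptions, not MiniVLN's exact objective:

    import numpy as np

    def softmax(z, T=1.0):
        z = z / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        # KL(teacher || student) on temperature-softened distributions, scaled by T^2
        p_t = softmax(teacher_logits, T)
        p_s = softmax(student_logits, T)
        kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
        return (T ** 2) * kl.mean()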
|
|
10:15-10:20, Paper TuAT1.5 | Add to My Program |
PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-Rich Manipulation Using Tactile-Diffusion Policies |
|
Zhao, Jialiang | Massachusetts Institute of Technology |
Kuppuswamy, Naveen | Toyota Research Institute |
Feng, Siyuan | Toyota Research Institute |
Burchfiel, Benjamin | Toyota Research Institute |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Sensorimotor Learning, Learning from Demonstration
Abstract: Achieving robust dexterous manipulation in unstructured domestic environments remains a significant challenge in robotics. Even with state-of-the-art robot learning methods, haptic-oblivious control strategies (i.e. those relying only on external vision and/or proprioception) often fall short due to occlusions, visual complexities, and the need for precise contact interaction control. To address these limitations, we introduce PolyTouch, a novel robot finger that integrates camera-based tactile sensing, acoustic sensing, and peripheral visual sensing into a single design that is compact and durable. PolyTouch provides high-resolution tactile feedback across multiple temporal scales, which is essential for efficiently learning complex manipulation tasks. Experiments demonstrate an at least 20-fold increase in lifespan over commercial tactile sensors, with a design that is both easy to manufacture and scalable. We then use this multi-modal tactile feedback along with visuo-proprioceptive observations to synthesize a tactile-diffusion policy from human demonstrations; the resulting contact-aware control policy significantly outperforms haptic-oblivious policies in multiple contact-aware manipulation policies. This paper highlights how effectively integrating multi-modal contact sensing can hasten the development of effective contact-aware manipulation policies, paving the way for more reliable and versatile domestic robots. More information can be found at https://polytouch.alanz.info.
|
|
10:20-10:25, Paper TuAT1.6 | Add to My Program |
A New Stereo Fisheye Event Camera for Fast Drone Detection and Tracking |
|
Rodrigues Da Costa, Daniel | Université De Picardie Jules Verne |
Robic, Maxime | Université De Picardie Jules Verne |
Vasseur, Pascal | Université De Picardie Jules Verne |
Morbidi, Fabio | Université De Picardie Jules Verne |
Keywords: Omnidirectional Vision, Visual Tracking, Aerial Systems: Applications
Abstract: In this paper, we present a new compact vision sensor consisting of two fisheye event cameras mounted back-to-back, which offers a full 360-degree view of the surrounding environment. We describe the optical design, projection model and practical calibration using the incoming stream of events, of the novel stereo camera, called SFERA. The potential of SFERA for real-time target tracking is evaluated using a Bayesian estimator adapted to the geometry of the sphere. Real-world experiments with a prototype of SFERA, including two synchronized Prophesee EVK4 cameras and a DJI Mavic Air 2 quadrotor, show the effectiveness of the proposed system for aerial surveillance.
|
|
10:25-10:30, Paper TuAT1.7 | Add to My Program |
Learning-Based Adaptive Navigation for Scalar Field Mapping and Feature Tracking |
|
Fuentes, Jose | Florida International University |
Padrao, Paulo | Florida International University |
Redwan Newaz, Abdullah Al | University of New Orleans |
Bobadilla, Leonardo | Florida International University |
Keywords: Marine Robotics, Environment Monitoring and Management, Field Robots
Abstract: Scalar field features such as extrema, contours, and saddle points are essential for applications in environmental monitoring, search and rescue, and resource exploration. Traditional navigation methods often rely on predefined trajectories, leading to inefficient and resource-intensive mapping. This paper introduces a new adaptive navigation framework that leverages learning techniques to enhance exploration efficiency and effectiveness in scalar fields, even under noisy data and obstacles. The framework employs Partial Differential Equations to model scalar fields and a Gaussian Process Regressor to estimate the fields and their gradients, enabling real-time path adjustments and obstacle avoidance. We provide a theoretical foundation for the approach and address several limitations found in existing methods. The effectiveness of our framework is demonstrated through simulation benchmarks and field experiments with an Autonomous Surface Vehicle, showing improved efficiency and adaptability compared to traditional methods and offering a robust solution for real-time environmental monitoring.
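A minimal sketch of the scalar-field estimation step: Gaussian Process regression with an RBF kernel, whose posterior mean has a closed-form gradient the vehicle can follow toward features such as extrema; the kernel, hyperparameters, and sample data are illustrative assumptions:

    import numpy as np

    def rbf(A, B, ls=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)

    def gp_fit(X, y, ls=1.0, noise=1e-2):
        K = rbf(X, X, ls) + noise * np.eye(len(X))
        return np.linalg.solve(K, y)                    # alpha = K^{-1} y

    def gp_mean_and_grad(x_query, X, alpha, ls=1.0):
        k = rbf(x_query[None], X, ls)[0]                # (N,)
        mean = k @ alpha
        # d/dx exp(-||x - xi||^2 / (2 ls^2)) = -(x - xi) / ls^2 * k_i
        grad = (-(x_query[None] - X) / ls ** 2 * k[:, None] * alpha[:, None]).sum(0)
        return mean, grad

    # Illustrative field samples (e.g., temperature) at visited locations
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y = np.array([1.0, 2.0, 0.5])
    alpha = gp_fit(X, y)
    mean, grad = gp_mean_and_grad(np.array([0.5, 0.5]), X, alpha)  # follow grad to climb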
|
|
TuAT2 Regular Session, 301 |
Add to My Program |
SLAM 1 |
|
|
Chair: Indelman, Vadim | Technion - Israel Institute of Technology |
Co-Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
09:55-10:00, Paper TuAT2.1 | Add to My Program |
Measurement Simplification in Rho-POMDP with Performance Guarantees |
|
Yotam, Tom | Technion |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: SLAM, Motion and Path Planning, Autonomous Agents, Foundations of Automation
Abstract: Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information. The cost of solving the decision making problem is exponential in the action and observation spaces, thus rendering it unfeasible for many online systems. This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space. Using the partitioned observation space, we formulate analytical bounds on the expected information-theoretic reward, for general belief distributions. These bounds are then used to plan efficiently while keeping performance guarantees. We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution. We extend the partitioning paradigm and present a hierarchy of partitioned spaces that allows greater efficiency in planning. We then propose a specific variant of these bounds for Gaussian beliefs and show a theoretical performance improvement of at least a factor of 4. Finally, we compare our novel method to other state of the art algorithms in active SLAM scenarios, in simulation and in real experiments. In both cases we show a sign
|
|
10:00-10:05, Paper TuAT2.2 | Add to My Program |
VSS-SLAM: Voxelized Surfel Splatting for Geometrically Accurate SLAM |
|
Chen, Xuanhua | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Zhang, Zhiyao | Northeastern University |
Wang, Guoqing | Northeastern University |
Zhao, Bin | Northeastern University |
Wang, Xingshuo | Northeastern University |
Keywords: Deep Learning Methods, Mapping, SLAM
Abstract: Visual Simultaneous Localization and Mapping (SLAM) helps robots estimate their poses and perceive the environment in unknown settings. Recent work has demonstrated that implicit neural radiance fields and 3D Gaussian Splatting (3DGS) offer higher fidelity scene representation than traditional map representations. We propose VSS-SLAM, which utilizes voxelized surfels as the map representation for incremental mapping in unknown environments. This representation effectively addresses the issue of redundant and disordered primitives encountered in previous methods, thereby enhancing geometric accuracy during reconstruction. Specifically, our approach divides the scene using voxels and stores geometric and appearance information in feature vectors at the voxel vertices. Before rendering, these feature vectors are decoded to generate the corresponding surfels. Additionally, we align camera poses through image and depth rendering. Extensive experiments on the Replica and TUM RGBD datasets demonstrate that VSS-SLAM delivers high-fidelity reconstruction and accurate pose estimation in both simulated and real-world environments. Source code will soon be available.
|
|
10:05-10:10, Paper TuAT2.3 | Add to My Program |
New Graph Distance Measures and Matching of Topological Maps for Robotic Exploration |
|
Morbidi, Fabio | Université De Picardie Jules Verne |
Keywords: Mapping, Autonomous Agents, SLAM
Abstract: Comparing graph-structured maps is a task of paramount importance in robotic exploration and cartography, but unfortunately the computational cost of the existing similarity measures, such as the graph edit distance (GED), is prohibitive for large graphs. In this paper, we introduce and characterize three new graph distance measures which satisfy the requirements for a metric. The first one, "LogEig", computes the square root of the sum of the squared logarithms of the generalized eigenvalues of the shifted Laplacian matrices associated with the two graphs, while the second calculates the Bures distance between these positive definite matrices. The third distance, "Rank", computes the rank of the difference of the graph shift operators associated with the two graphs, e.g. the adjacency or the Laplacian matrix. Examples and numerical experiments with graphs from a publicly available dataset show the accuracy and computational efficiency of the new metrics for 2D topological-map matching, compared to the GED. The effect of spectral sparsification on the new graph distance measures is examined as well.
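To make the "LogEig" definition above concrete, here is a minimal sketch assuming the shifted Laplacian means L + I (any shift making the Laplacian positive definite serves the illustration); the paper's exact shift, normalization, and implementation may differ.

```python
# Hedged sketch of the "LogEig" graph distance: the square root of the sum of
# squared logarithms of the generalized eigenvalues of the two shifted
# Laplacians. The shift L + I is an illustrative assumption.
import numpy as np
from scipy.linalg import eigh


def shifted_laplacian(adjacency: np.ndarray) -> np.ndarray:
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency + np.eye(adjacency.shape[0])


def logeig_distance(adj_a: np.ndarray, adj_b: np.ndarray) -> float:
    la, lb = shifted_laplacian(adj_a), shifted_laplacian(adj_b)
    # Generalized eigenvalues of the pencil (la, lb); both matrices are SPD.
    gen_eigvals = eigh(la, lb, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(gen_eigvals) ** 2)))


if __name__ == "__main__":
    # Two small 4-node graphs differing by one edge.
    a = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
    b = a.copy(); b[0, 3] = b[3, 0] = 1.0
    print(logeig_distance(a, a), logeig_distance(a, b))  # 0.0 for identical graphs
```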
|
|
10:10-10:15, Paper TuAT2.4 | Add to My Program |
EnvoDat: A Large-Scale Multisensory Dataset for Robotic Spatial Awareness and Semantic Reasoning in Heterogeneous Environments |
|
Nwankwo, Linus Ebere | University of Leoben |
Ellensohn, Björn | Montanuniversitaet Leoben |
Dave, Vedant | Montanuniversität Leoben |
Hofer, Peter | Theresianische Militarakademie |
Forstner, Jan | Montanuniversität Leoben |
Villneuve, Marlene | Montanuniversität Leoben |
Galler, Robert | Montanuniversität Leoben |
Rueckert, Elmar | Montanuniversitaet Leoben |
Keywords: Data Sets for SLAM, Data Sets for Robotic Vision, Semantic Scene Understanding
Abstract: To ensure the efficiency of robot autonomy under diverse real-world conditions, a high-quality heterogeneous dataset is essential to benchmark the operating algorithms' performance and robustness. Current benchmarks predominantly focus on urban terrains, specifically for on-road autonomous driving, leaving multi-degraded, densely vegetated, dynamic and feature-sparse environments, such as underground tunnels, natural fields, and modern indoor spaces underrepresented. To fill this gap, we introduce EnvoDat, a large-scale, multi-modal dataset collected in diverse environments and conditions, including high illumination, fog, rain, and zero visibility at different times of the day. Overall, EnvoDat contains 26 sequences from 13 scenes, 10 sensing modalities, over 1.9TB of data, and over 89K fine-grained polygon-based annotations for more than 82 object and terrain classes. We post-processed EnvoDat in different formats that support benchmarking SLAM and supervised learning algorithms, and fine-tuning multimodal vision models. With EnvoDat, we contribute to environment-resilient robotic autonomy in areas where the conditions are extremely challenging. The datasets and other relevant resources can be accessed through https://linusnep.github.io/EnvoDat/.
|
|
10:15-10:20, Paper TuAT2.5 | Add to My Program |
Probabilistic Degeneracy Detection for Point-To-Plane Error Minimization |
|
Hatleskog, Johan | Norwegian University of Science and Technology |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: SLAM, Probability and Statistical Methods
Abstract: Degeneracies arising from uninformative geometry are known to deteriorate LiDAR-based localization and mapping. This work introduces a new probabilistic method to detect and mitigate the effect of degeneracies in point-to-plane error minimization. The noise on the Hessian of the point-to-plane optimization problem is characterized by the noise on points and surface normals used in its construction. We exploit this characterization to quantify the probability of a direction being degenerate. The degeneracy-detection procedure is used in a new real-time degeneracy-aware iterative closest point algorithm for LiDAR registration, in which we smoothly attenuate updates in degenerate directions. The method's parameters are selected based on the noise characteristics provided in the LiDAR's datasheet. We validate the approach in four real-world experiments, demonstrating that it outperforms state-of-the-art methods at detecting and mitigating the adverse effects of degeneracies. For the benefit of the community, we release the code for the method at: github.com/ntnu-arl/drpm.
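A minimal sketch of the deterministic quantity behind this kind of degeneracy analysis is given below: the 6x6 Gauss-Newton Hessian of the point-to-plane cost, built from points and surface normals, whose weak eigen-directions correspond to poorly constrained motions. The probabilistic test that is the paper's contribution is not reproduced; the fixed eigenvalue threshold here is only a stand-in.

```python
# Build the point-to-plane Hessian and inspect its eigenvalues to find
# degenerate (poorly constrained) directions. Rows of the Jacobian are
# J_i = [(p_i x n_i)^T, n_i^T] for the state ordering [rotation, translation].
import numpy as np


def point_to_plane_hessian(points: np.ndarray, normals: np.ndarray) -> np.ndarray:
    """points, normals: (N, 3). Returns the 6x6 Hessian for [rotation, translation]."""
    jac = np.hstack([np.cross(points, normals), normals])  # (N, 6)
    return jac.T @ jac


def degenerate_directions(hessian: np.ndarray, eig_threshold: float = 1e-3):
    eigvals, eigvecs = np.linalg.eigh(hessian)
    weak = eigvals < eig_threshold
    return eigvals, eigvecs[:, weak]  # columns are poorly constrained directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Points sampled from a single plane z = 0: x/y translation and yaw are degenerate.
    pts = np.column_stack([rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500), np.zeros(500)])
    nrm = np.tile([0.0, 0.0, 1.0], (500, 1))
    vals, weak_dirs = degenerate_directions(point_to_plane_hessian(pts, nrm))
    print(np.round(vals, 4), weak_dirs.shape)  # three near-zero eigenvalues
```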
|
|
10:20-10:25, Paper TuAT2.6 | Add to My Program |
SCE-LIO: An Enhanced Lidar Inertial Odometry by Constructing Submap Constraints |
|
Sun, Chao | Beijing Institute of Technology |
Huang, Zhishuai | Beijing Institute of Technology |
Wang, Bo | Shenzhen Automotive Research Institute, BIT |
Xiao, Mancheng | ShenZhen Boundless Sensor Technology Co., Ltd |
Leng, Jianghao | Beijing Institute of Technology |
Li, Jiajun | Shenzhen Automotive Research Institute, Beijing Institute of Tec |
Keywords: SLAM, Mapping, Autonomous Vehicle Navigation
Abstract: In lidar-based Simultaneous Localization and Mapping (SLAM) systems, loop closure detection is crucial for enhancing the accuracy of odometry. However, constraints from loop closure detection are only provided when a loop is detected and can only enhance odometry accuracy at specific moments. Therefore, this paper proposes a lidar inertial odometry system that periodically provides submap constraints to the pose graph and enhances odometry accuracy through pose graph optimization. In the process of creating submap constraints, the system represents lidar keyframes as a collection of submaps containing overlapping information. The optimal pose transformations between submaps, determined using the Iterative Closest Point (ICP) algorithm with point-to-line and point-to-plane methods, are recognized as submap constraints. During the backend optimization phase, submap constraints and adjacent lidar keyframe constraints are integrated into the pose graph. The pose graph is then optimized to achieve the optimal lidar pose estimation. Additionally, to further enhance pose estimation, point-to-plane correspondences are established by considering the differences in normal vectors of feature points between the scan and the map, and an integrated initial positioning module is created by incorporating preintegration and scan-to-scan matching. Results from simulation, public datasets, and vehicle experiments show that the accuracy of the proposed algorithm is significantly improved compared to advanced SLAM algorithms.
|
|
10:25-10:30, Paper TuAT2.7 | Add to My Program |
HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks |
|
Liang, Jingsong | National University of Singapore |
Cao, Yuhong | National University of Singapore |
Ma, Yixiao | National University of Singapore |
Zhao, Hanqi | Georgia Institute of Technology |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: AI-Based Methods, View Planning for SLAM, Motion and Path Planning
Abstract: In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical attention networks to empower the robot to reason about its belief across multiple spatial scales and sequence collaborative decisions, where our networks decompose long-term objectives into short-term informative task assignments and informative path planning. We further propose a contrastive learning-based joint optimization to enhance the robustness of HDPlanner. We empirically demonstrate that HDPlanner significantly outperforms state-of-the-art conventional and learning-based baselines on an extensive set of simulations, including hundreds of test maps and large-scale, complex Gazebo environments. Notably, HDPlanner achieves real-time planning with travel distances reduced by up to 35.7% compared to exploration benchmarks and by up to 16.5% compared to navigation benchmarks. Furthermore, we validate our approach on hardware, where it generates high-quality, adaptive trajectories in both indoor and outdoor environments, highlighting its real-world applicability without additional training.
|
|
TuAT3 Regular Session, 303 |
Add to My Program |
3D Content Capture and Generation 1 |
|
|
Chair: Schieber, Hannah | Human-Centered Computing and Extended Reality, Technical University of Munich, School of Medicine and Health, Klinikum Rechts De |
Co-Chair: Zhu, Minghan | University of Michigan |
|
09:55-10:00, Paper TuAT3.1 | Add to My Program |
WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions Via Gaussian Splatting |
|
Qian, Chenghao | University of Leeds |
Guo, Yuhu | Carnegie Mellon University |
Li, Wenjing | University of Leeds |
Markkula, Gustav | University of Leeds |
Keywords: Computer Vision for Automation, Computer Vision for Transportation, Visual Learning
Abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene reconstruction, but still suffers from complex outdoor environments, especially under adverse weather. This is because 3DGS treats the artifacts caused by adverse weather as part of the scene and will directly reconstruct them, largely reducing the clarity of the reconstructed scene. To address this challenge, we propose WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view images under different weather conditions. Specifically, we explicitly categorize the multi-weather artifacts into dense particles and lens occlusions, which have very different characteristics: the former are caused by snowflakes and rain streaks in the air, while the latter arise from precipitation on the camera lens. In light of this, we propose a dense-to-sparse preprocess strategy, which sequentially removes the dense particles by an Atmospheric Effect Filter (AEF) and then extracts the relatively sparse occlusion masks with a Lens Effect Detector (LED). Finally, we train a set of 3D Gaussians by the processed images and generated masks for excluding occluded areas, and accurately recover the underlying clear scene by Gaussian splatting. We construct a diverse and challenging benchmark to facilitate the evaluation of 3D reconstruction under complex weather scenarios. Extensive experiments on this benchmark demonstrate that our WeatherGS consistently produces high-quality, clean scenes across various weather scenarios, outperforming existing state-of-the-art methods.
|
|
10:00-10:05, Paper TuAT3.2 | Add to My Program |
RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning |
|
Wu, Yuxuan | Shanghai Jiao Tong University |
Pan, Lei | China University of Mining and Technology |
Wu, Wenhua | Shanghai Jiao Tong University |
Wang, Guangming | University of Cambridge |
Miao, Yanzi | China University of Mining and Technology |
Xu, Fan | Shanghai Jiao Tong University |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: Computer Vision for Automation, Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: Sim-to-Real refers to the process of transferring policies learned in simulation to the real world, which is crucial for achieving practical robotics applications. However, recent Sim2real methods either rely on a large amount of augmented data or large learning models, which is inefficient for specific tasks. In recent years, with the emergence of radiance field reconstruction methods, especially 3D Gaussian splatting, it has become possible to construct realistic real-world scenes. To this end, we propose RL-GSBridge, a novel real-to-sim-to-real framework which incorporates 3D Gaussian Splatting into the conventional RL simulation pipeline, enabling zero-shot sim-to-real transfer for vision-based deep reinforcement learning. We introduce a mesh-based 3D GS method with soft binding constraints, enhancing the rendering quality of mesh models. Then, utilizing a GS editing approach to synchronize the rendering with the physics simulator, RL-GSBridge can accurately reflect the visual interactions of the physical robot. Through a series of sim-to-real experiments, including grasping and pick-and-place tasks, we demonstrate that RL-GSBridge maintains a satisfactory success rate in real-world task completion during sim-to-real transfer. Furthermore, a series of rendering metrics and visualization results indicate that our proposed mesh-based 3D GS reduces artifacts in unstructured objects, demonstrating more realistic rendering performance.
|
|
10:05-10:10, Paper TuAT3.3 | Add to My Program |
High-Quality 3D Creation from a Single Image Using Subject-Specific Knowledge Prior |
|
Huang, Nan | Peking University |
Zhang, Ting | Beijing Normal University |
Yuan, Yuhui | Microsoft Research Asia |
Chen, Dong | Microsoft Research Asia |
Zhang, Shanghang | Peking University |
Keywords: Computer Vision for Automation
Abstract: In this paper, we address the critical bottleneck in robotics caused by the scarcity of diverse 3D data by presenting a novel two-stage approach for generating high-quality 3D models from a single image. This method is motivated by the need to efficiently expand 3D asset creation, particularly for robotics datasets, where the variety of object types is currently limited compared to general image datasets. Unlike previous methods that primarily rely on general diffusion priors, which often struggle to align with the reference image, our approach leverages subject-specific prior knowledge. By incorporating subject-specific priors in both geometry and texture, we ensure precise alignment between the generated 3D content and the reference object. Specifically, we introduce a shading mode-aware prior into the NeRF optimization process, enhancing the geometry and refining texture in the coarse outputs to achieve superior quality. Extensive experiments demonstrate that our method significantly outperforms prior approaches. Our approach is well-suited for applications such as novel view synthesis, text-to-3D, and image-to-3D, particularly in the robotics field where diverse 3D data is essential.
|
|
10:10-10:15, Paper TuAT3.4 | Add to My Program |
DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes |
|
Li, Hao | Northwestern Polytechnical University |
Gao, Yuanyuan | Northwestern Polytechnical University |
Peng, Haosong | Beijing Institute of Technology |
Wu, Chenming | Baidu Research |
Ye, Weicai | Zhejiang University |
Zhan, Yufeng | Beijing Institute of Technology |
Zhao, Chen | Baidu Inc |
Zhang, Dingwen | Northwestern Polytechnical University |
Wang, Jingdong | Baidu |
Han, Junwei | Northwestern Polytechnical University |
Keywords: Computer Vision for Transportation
Abstract: Novel view synthesis (NVS) methods play a key role in large-scale scene reconstruction. However, these methods rely heavily on dense image inputs and prolonged training times, making them unsuitable where computational resources are limited. Moreover, few-shot methods often struggle with poor reconstruction quality in vast environments. This paper introduces DGTR, a novel distributed framework for efficient Gaussian reconstruction of sparse-view vast scenes. Our method partitions the scene into multiple regions, each processed independently by drones with sparse image inputs. Using a feed-forward Gaussian model, we predict high-quality Gaussian primitives, followed by a global alignment algorithm to ensure geometric consistency. Synthetic views and depth priors are incorporated to further enhance training, while a distillation-based model aggregation mechanism enables efficient
|
|
10:15-10:20, Paper TuAT3.5 | Add to My Program |
LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes |
|
Ho, Shing-Hei | University of Utah |
Thach, Bao | University of Utah |
Zhu, Minghan | University of Michigan |
Keywords: Computer Vision for Transportation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: We present LiDAR-EDIT, a novel paradigm for generating synthetic LiDAR data for autonomous driving. Our framework edits real-world LiDAR scans by introducing new object layouts while preserving the realism of the background environment. Compared to end-to-end frameworks that generate LiDAR point clouds from scratch, LiDAR-EDIT offers users full control over the object layout, including the number, type, and pose of objects, while keeping most of the original real-world background. Our method also provides object labels for the generated data. Compared to novel view synthesis techniques, our framework allows for the creation of counterfactual scenarios with object layouts significantly different from the original real-world scene. LiDAR-EDIT uses spherical voxelization to enforce correct LiDAR projective geometry in the generated point clouds by construction. During object removal and insertion, generative models are employed to fill the unseen background and object parts that were occluded in the original real Lidar scans. Experimental results demonstrate that our framework produces realistic LiDAR scans with practical value for downstream tasks. Project website with open-sourced code: https://sites.google.com/view/lidar-edit
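The sketch below illustrates spherical voxelization in isolation, i.e., binning points by range, azimuth, and elevation so that generated points respect LiDAR projective geometry by construction. The bin sizes and field of view are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of spherical voxelization for LiDAR point clouds. Bin sizes,
# bin counts, and the vertical field of view below are illustrative only.
import numpy as np


def spherical_voxel_indices(points: np.ndarray,
                            range_res: float = 0.5,        # metres per radial bin
                            azimuth_bins: int = 1024,
                            elevation_bins: int = 64,
                            elevation_fov=(-0.43, 0.17)):   # radians, roughly (-25, +10) deg
    """Map (N, 3) Cartesian points to integer (range, azimuth, elevation) voxel indices."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                  # [-pi, pi)
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))

    r_idx = (r / range_res).astype(int)
    az_idx = ((azimuth + np.pi) / (2 * np.pi) * azimuth_bins).astype(int) % azimuth_bins
    lo, hi = elevation_fov
    el_idx = np.clip(((elevation - lo) / (hi - lo) * elevation_bins).astype(int),
                     0, elevation_bins - 1)
    return np.stack([r_idx, az_idx, el_idx], axis=1)


if __name__ == "__main__":
    pts = np.random.default_rng(0).normal(scale=20.0, size=(5, 3))
    print(spherical_voxel_indices(pts))
```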
|
|
10:20-10:25, Paper TuAT3.6 | Add to My Program |
TICMapNet: A Tightly Coupled Temporal Fusion Pipeline for Vectorized HD Map Learning |
|
Qiu, Wenzhao | Xi'an Jiaotong University |
Pang, Shanmin | Xi'an Jiaotong University |
Zhang, Hao | Xi'an Jiaotong University |
Fang, Jianwu | Xian Jiaotong University |
Xue, Jianru | Xi'an Jiaotong University |
Keywords: Mapping, Deep Learning for Visual Perception, Visual Learning
Abstract: High-Definition (HD) map construction is essential for autonomous driving to accurately understand the surrounding environment. Most existing methods rely on single-frame inputs to predict the local map, which often fail to effectively capture the temporal correlations between frames. This limitation results in discontinuities and instability in the generated map. To tackle this limitation, we propose a Tightly Coupled temporal fusion Map Network (TICMapNet). TICMapNet breaks down the fusion process into three sub-problems: PV feature alignment, BEV feature adjustment, and Query feature fusion. By doing so, we effectively integrate temporal information at different stages through three plug-and-play modules, using the proposed tightly coupled strategy. Unlike traditional methods, our approach does not rely on camera extrinsic parameters, offering a new perspective for addressing the visual fusion challenge in the field of object detection. Experimental results show that TICMapNet significantly improves upon its single-frame baseline model, achieving at least a 7.0% increase in mAP using just two consecutive frames on the nuScenes dataset, while also showing generalizability across other tasks. The code and demos are available at https://github.com/adasfag/TICMapNet.
|
|
10:25-10:30, Paper TuAT3.7 | Add to My Program |
DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields |
|
Schischka, Nicolas | Technical University of Munich |
Schieber, Hannah | Human-Centered Computing and Extended Reality, Technical Univers |
Karaoglu, Mert Asim | Technical University of Munich, ImFusion GmbH |
Görgülü, Melih | Technical University of Munich |
Grötzner, Florian | Technical University of Munich |
Ladikos, Alexander | ImFusion |
Navab, Nassir | TU Munich |
Roth, Daniel | Technical University of Munich, Klinikum Rechts Der Isar |
Busam, Benjamin | Technical University of Munich |
Keywords: Localization, Mapping
Abstract: The accurate reconstruction of dynamic scenes with neural radiance fields is significantly dependent on the estimation of camera poses. Widely used structure-from-motion pipelines encounter difficulties in accurately tracking the camera trajectory when faced with separate dynamics of the scene content and the camera movement. To address this challenge, we propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN). DynaMoN utilizes semantic segmentation and generic motion masks to handle dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis. Our novel iterative learning scheme switches between training the NeRF and updating the pose parameters for an improved reconstruction and trajectory estimation quality. The proposed pipeline shows significant acceleration of the training process. We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset. DynaMoN improves over the state-of-the-art both in terms of reconstruction quality and trajectory accuracy. We plan to make our code public to enhance research in this area.
|
|
TuAT4 Regular Session, 304 |
Add to My Program |
Vision-Based Tactile Sensing 1 |
|
|
Chair: Wang, Dongyi | University of Arkansas |
Co-Chair: Luo, Shan | King's College London |
|
09:55-10:00, Paper TuAT4.1 | Add to My Program |
TransForce: Transferable Force Prediction for Vision-Based Tactile Sensors with Sequential Image Translation |
|
Chen, Zhuo | King's College London |
Ou, Ni | Beijing Institute of Technology |
Zhang, Xuyang | University of Bristol |
Luo, Shan | King's College London |
Keywords: Force and Tactile Sensing
Abstract: Vision-based tactile sensors (VBTSs) provide high-resolution tactile images crucial for robot in-hand manipulation. However, force sensing in VBTSs is underutilized due to the costly and time-intensive process of acquiring paired tactile images and force labels. In this study, we introduce a transferable force prediction model, TransForce, designed to leverage collected image-force paired data for new sensors under varying illumination colors and marker patterns while improving the accuracy of predicted forces, especially in the shear direction. Our model effectively achieves translation of tactile images from the source domain to the target domain, ensuring that the generated tactile images reflect the illumination colors and marker patterns of the new sensors while accurately aligning the elastomer deformation observed in existing sensors, which is beneficial to force prediction of new sensors. As such, a recurrent force prediction model trained with generated sequential tactile images and existing force labels is employed to estimate higher-accuracy forces for new sensors, with the lowest average errors of 0.69 N (5.8% of the full working range) in the x-axis, 0.70 N (5.8%) in the y-axis, and 1.11 N (6.9%) in the z-axis compared with models trained with single images. The experimental results also reveal that the pure marker modality is more helpful than the RGB modality in improving the accuracy of force in the shear direction, while the RGB modality shows better performance in the normal direction.
|
|
10:00-10:05, Paper TuAT4.2 | Add to My Program |
HumanFT: A Human-Like Fingertip Multimodal Visuo-Tactile Sensor |
|
Wu, Yifan | ShanghaiTech University |
Chen, Yuzhou | ShanghaiTech University |
Zhu, Zhengying | ShanghaiTech University |
Qin, Xuhao | Shanghaitech University |
Xiao, Chenxi | ShanghaiTech University |
Keywords: Force and Tactile Sensing, Multi-Modal Perception for HRI, Soft Sensors and Actuators
Abstract: Tactile sensors play a crucial role in enabling robots to interact effectively and safely with objects in everyday tasks. In particular, visuotactile sensors have seen increasing usage in two- and three-fingered grippers due to their high-quality feedback. However, a significant gap remains in the development of sensors suitable for humanoid robots, especially five-fingered dexterous hands. One reason is the challenge of designing and manufacturing sensors that are sufficiently compact. In this paper, we propose HumanFT, a multimodal visuotactile sensor that replicates the shape and functionality of a human fingertip. To bridge the gap between human and robotic tactile sensing, our sensor features real-time force measurements, high-frequency vibration detection, and overtemperature alerts. To achieve this, we developed a suite of fabrication techniques for a new type of elastomer optimized for force propagation and temperature sensing. In addition, our sensor integrates circuits capable of sensing pressure and vibration. These capabilities have been validated through experiments. The proposed design is simple and cost-effective to fabricate. We believe HumanFT can enhance humanoid robots' perception by capturing and interpreting multimodal tactile information.
|
|
10:05-10:10, Paper TuAT4.3 | Add to My Program |
FeelAnyForce: Estimating Contact Force Feedback from Tactile Sensation for Vision-Based Tactile Sensors |
|
Shahidzadeh, Amir Hossein | University of Maryland |
Caddeo, Gabriele Mario | Istituto Italiano Di Tecnologia |
Alapati, Koushik | University of Maryland, College-Park |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: In this paper, we tackle the problem of estimating 3D contact forces using vision-based tactile sensors. In particular, our goal is to estimate contact forces over a large range (up to 15 N) on any object while generalizing across different vision-based tactile sensors. Thus, we collected a dataset of over 200K indentations using a robotic arm that pressed various indenters onto a GelSight Mini sensor mounted on a force sensor and then used the data to train a multi-head transformer for force regression. Strong generalization is achieved via accurate data collection and multi-objective optimization that leverages depth contact images. Despite being trained only on primitive shapes and textures, the regressor achieves a mean absolute error of 4% on a dataset of unseen real-world objects. We further evaluate our approach's generalization capability to other GelSight Mini and DIGIT sensors, and propose a reproducible calibration procedure for adapting the pre-trained model to other vision-based sensors. Furthermore, the method was evaluated on real-world tasks, including weighing objects and controlling the deformation of delicate objects, which rely on accurate force feedback.
|
|
10:10-10:15, Paper TuAT4.4 | Add to My Program |
VITaL Pretraining: Visuo-Tactile Pretraining for Tactile and Non-Tactile Manipulation Policies |
|
George, Abraham | Carnegie Mellon University |
Gano, Selam | Carnegie Mellon University |
Katragadda, Pranav | Carnegie Mellon University |
Barati Farimani, Amir | Carnegie Mellon University |
Keywords: Force and Tactile Sensing, Deep Learning in Grasping and Manipulation, Imitation Learning
Abstract: Tactile information is a critical tool for dexterous manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on manipulation tasks. We show that incorporating visuo-tactile pretraining improves imitation learning performance, not only for tactile agents (policies that use tactile information at inference), but also for non-tactile agents (policies that do not use tactile information at inference). For these non-tactile agents, pretraining with tactile information significantly improved performance (for example, improving the accuracy on USB plugging from 20% to 85%), reaching a level on par with visuo-tactile agents, and even surpassing them in some cases. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-pretraining
|
|
10:15-10:20, Paper TuAT4.5 | Add to My Program |
EasyCalib: Simple and Low-Cost In-Situ Calibration for Force Reconstruction with Vision-Based Tactile Sensors |
|
Li, Mingxuan | Tsinghua University |
Zhang, Lunwei | Tsinghua University |
Zhou, Yen Hang | Tsinghua University |
Li, Tiemin | Tsinghua University |
Jiang, Yao | Tsinghua University |
Keywords: Force and Tactile Sensing, Contact Modeling, Haptics and Haptic Interfaces
Abstract: For elastomer-based tactile sensors, represented by vision-based tactile sensors, routine calibration of mechanical parameters (Young's modulus and Poisson's ratio) has been shown to be important for force reconstruction. However, the reliance on existing in-situ calibration methods for accurate force measurements limits their cost-effective and flexible applications. This article proposes a new in-situ calibration scheme that relies only on comparing contact deformation. Based on the detailed derivations of the normal contact and torsional contact theories, we designed a simple and low-cost calibration device, EasyCalib, and validated its effectiveness through extensive finite element analysis. We also explored the accuracy of EasyCalib in the practical application and demonstrated that accurate contact distributed force reconstruction can be realized based on the mechanical parameters obtained. EasyCalib balances low hardware cost, ease of operation, and low dependence on technical expertise and is expected to provide the necessary accuracy guarantees for wide applications of visuotactile sensors.
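As a rough illustration of how a normal-contact law ties force, indentation, and elastic modulus together, the sketch below inverts the classical Hertz relation for a rigid spherical indenter. This is not EasyCalib's procedure, which calibrates by comparing contact deformations against a reference; the measurement values here are made up for illustration.

```python
# Hertzian normal contact for a rigid sphere on an elastic half-space:
# F = (4/3) * E_eff * sqrt(R) * d**1.5. Inverting a single (force, indentation)
# measurement gives the effective modulus; Young's modulus follows for an
# assumed Poisson's ratio and a rigid indenter.
import numpy as np


def hertz_force(indentation_m: float, radius_m: float, e_eff_pa: float) -> float:
    return (4.0 / 3.0) * e_eff_pa * np.sqrt(radius_m) * indentation_m ** 1.5


def effective_modulus_from_measurement(force_n: float, indentation_m: float,
                                        radius_m: float) -> float:
    """Invert the Hertz relation for one (force, indentation) measurement."""
    return 3.0 * force_n / (4.0 * np.sqrt(radius_m) * indentation_m ** 1.5)


def youngs_modulus(e_eff_pa: float, poisson_ratio: float) -> float:
    """Assuming a rigid indenter, 1/E_eff = (1 - nu^2) / E."""
    return e_eff_pa * (1.0 - poisson_ratio ** 2)


if __name__ == "__main__":
    e_eff = effective_modulus_from_measurement(force_n=0.5, indentation_m=0.5e-3,
                                               radius_m=2e-3)
    print(f"E_eff ~ {e_eff/1e6:.2f} MPa, E ~ {youngs_modulus(e_eff, 0.45)/1e6:.2f} MPa")
```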
|
|
10:20-10:25, Paper TuAT4.6 | Add to My Program |
NormalFlow: Fast, Robust, and Accurate Contact-Based Object 6DoF Pose Tracking with Vision-Based Tactile Sensors |
|
Huang, Hung-Jui | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Yuan, Wenzhen | University of Illinois |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipulation, in-hand manipulation, and 3D reconstruction. In this work, we introduce NormalFlow, a fast, robust, and real-time tactile-based 6DoF tracking algorithm. Leveraging the precise surface normal estimation of vision-based tactile sensors, NormalFlow determines object movements by minimizing discrepancies between the tactile-derived surface normals. Our results show that NormalFlow consistently outperforms competitive baselines and can track low-texture objects like table surfaces. For long-horizon tracking, we demonstrate when rolling the sensor around a bead for 360 degrees, NormalFlow maintains a rotational tracking error of 2.5 degrees. Additionally, we present state-of-the-art tactile-based 3D reconstruction results, showcasing the high accuracy of NormalFlow. We believe NormalFlow unlocks new possibilities for high-precision perception and manipulation tasks that involve interacting with objects using hands. Please also check our supplementary video to see our method in action.
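The rotational core of normal-based alignment can be illustrated with the classic Wahba/Kabsch least-squares problem solved via SVD, sketched below. NormalFlow itself estimates the full 6DoF pose with its own iterative scheme; this snippet only shows why dense, accurate surface normals constrain rotation so strongly.

```python
# Find the rotation that best maps one set of unit normals onto another in the
# least-squares sense (Wahba/Kabsch problem, closed-form via SVD).
import numpy as np


def best_rotation(normals_src: np.ndarray, normals_dst: np.ndarray) -> np.ndarray:
    """normals_*: (N, 3) unit vectors. Returns R minimizing sum ||R n_src - n_dst||^2."""
    h = normals_src.T @ normals_dst           # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_src = rng.normal(size=(200, 3))
    n_src /= np.linalg.norm(n_src, axis=1, keepdims=True)
    angle = np.deg2rad(10.0)
    r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    n_dst = n_src @ r_true.T
    print(np.allclose(best_rotation(n_src, n_dst), r_true, atol=1e-6))  # True
```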
|
|
TuAT5 Regular Session, 305 |
Add to My Program |
Aerial Robots 1 |
|
|
Chair: Cheng, Bo | Pennsylvania State University |
Co-Chair: Scaramuzza, Davide | University of Zurich |
|
09:55-10:00, Paper TuAT5.1 | Add to My Program |
Nezha-MB: Design and Implementation of a Morphing Hybrid Aerial-Underwater Vehicle |
|
Xu, Zhuxiu | Shanghai Jiao Tong University |
Shen, Yishu | Shanghai Jiao Tong University |
Bi, Yuanbo | Shanghai Jiao Tong University |
Zeng, Baichuan | The Chinese University of Hong Kong |
Zeng, Zheng | Shanghai Jiao Tong University |
Keywords: Marine Robotics, Field Robots, Aerial Systems: Applications
Abstract: Hybrid aerial-underwater vehicles (HAUVs) show great potential thanks to their ability to operate seamlessly in both air and water. However, achieving rapid operability in both media and maintaining stability during the cross-domain phase remains a significant challenge. Inspired by retractable limbs, this paper presents a novel morphing HAUV, Nezha-MB. During the transition phase, Nezha-MB uses a linear actuator combined with a rack-and-pinion system for arm transformation, replacing conventional servo systems. The transformation mechanism accounts for 11% of the total weight. In aerial mode, Nezha-MB exhibits flight performance comparable to a quadrotor configuration. In underwater mode, Nezha-MB retracts its quadrotor arms into a bullet-shaped hull, significantly reducing drag and energy consumption while allowing it to pass through narrow gaps as small as 134 mm in diameter. Simulations and field tests conducted in both the aerial and underwater domains demonstrate
|
|
10:00-10:05, Paper TuAT5.2 | Add to My Program |
From Ceilings to Walls: Universal Dynamic Perching of Small Aerial Robots on Surfaces with Variable Orientations |
|
Habas, Bryan | The Pennsylvania State University |
Brown, Aaron C. | The Pennsylvania State University |
Lee, Donghyeon | The Pennsylvania State University |
Goldman, Mitchell | Penn State University |
Cheng, Bo | Pennsylvania State University |
Keywords: Aerial Systems: Applications, Surveillance Robotic Systems, AI-Enabled Robotics
Abstract: This work demonstrates universal dynamic perching capabilities for quadrotors of various sizes and on surfaces with different orientations. By employing a non-dimensionalization framework and deep reinforcement learning, we systematically assessed how robot size and surface orientation affect landing capabilities. We hypothesized that maintaining geometric proportions across different robot scales ensures consistent perching behavior, which was validated in both simulation and experimental tests. Additionally, we investigated the effects of joint stiffness and damping in the landing gear on perching behaviors and performance. While joint stiffness had minimal impact, joint damping ratios influenced landing success under vertical approaching conditions. The study also identified a critical velocity threshold necessary for successful perching, determined by the robot's maneuverability and leg geometry. Overall, this research advances robotic perching capabilities, offering insights into the role of mechanical design and scaling effects, and lays the groundwork for future drone autonomy and operational efficiency in unstructured environments.
|
|
10:05-10:10, Paper TuAT5.3 | Add to My Program |
Towards Perpetually-Deployable Ubiquitous Aerial Robotics: An Amphibious Self-Sustainable Solar Small-UAS |
|
Carlson, Stephen | University of Nevada, Reno |
Arora, Prateek | University of Nevada, Reno |
Papachristos, Christos | University of Nevada Reno |
Keywords: Field Robots, Aerial Systems: Applications
Abstract: This work deals with the problem of unlocking perpetual deployment capabilities for small-UAS robotics across the diverse settings of the real world and their challenges, encompassing considerations for marine environments alongside the more common terrestrial ones. Via the progress made within this scope, a step towards truly ubiquitous and self-sustainable aerial robotics is accomplished. The work consists of the development of the Gannet Solar-VTOL, a waterproof small-UAS that is capable of resting on the surface of water for prolonged periods of time and over varying temperature ranges, while harvesting solar power to recharge itself. Equally importantly, it integrates a field-proven Self-Sustainable Autonomous System architecture that allows it to hibernate and sustain its battery charge overnight or during periods of solar illumination scarcity, as well as to assess mission-critical parameters (e.g., water surface turbulence, ambient temperature of battery compartment) on the low-power side of the Power Management Stack, and react appropriately. Finally, the robot is equipped with an onboard camera and a Neural Processing Unit that allows it to perform in-field environmental monitoring operations (e.g., wildfire detection). This paper experimentally demonstrates the aforementioned capabilities, and concludes with a presentation of the amphibious small-UAS' long-term deployment within a marine environment in the N.Nevada region, spanning over 3 consecutive days.
|
|
10:10-10:15, Paper TuAT5.4 | Add to My Program |
Autonomous Drone for Dynamic Smoke Plume Tracking |
|
Pal, Srijan Kumar | University of Minnesota |
Sharma, Shashank | University of Minnesota |
Krishnakumar, Nikil | University of Minnesota |
Hong, Jiarong | University of Minnesota |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning, Vision-Based Navigation
Abstract: This paper presents a novel autonomous drone-based smoke plume tracking system capable of navigating and tracking plumes in highly unsteady atmospheric conditions. The system integrates advanced hardware and software and a comprehensive simulation environment to ensure robust performance in controlled and real-world settings. The quadrotor, equipped with a high-resolution imaging system and an advanced onboard computing unit, performs precise maneuvers while accurately detecting and tracking dynamic smoke plumes under fluctuating conditions. Our software implements a two-phase flight operation: descending into the smoke plume upon detection and continuously monitoring the smoke's movement during in-plume tracking. Leveraging Proportional Integral–Derivative (PID) control and a Proximal Policy Optimization (PPO) based Deep Reinforcement Learning (DRL) controller enables adaptation to plume dynamics. Unreal Engine simulation evaluates performance under various smoke-wind scenarios, from steady flow to complex, unsteady fluctuations, showing that while the PID controller performs adequately in simpler scenarios, the DRL-based controller excels in more challenging environments. Field tests corroborate these findings. This system opens new possibilities for drone-based monitoring in areas like wildfire management and air quality assessment. The successful integration of DRL for real-time decision-making advances autonomous drone control for dynamic environments.
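For context, the baseline controller mentioned above is a standard PID loop; a minimal discrete implementation is sketched below. The controlled variable (horizontal pixel offset of the plume centroid mapped to a yaw-rate command) and the gains are illustrative assumptions, not the authors' values.

```python
# Minimal discrete PID controller for centring a detected plume in the image.
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    yaw_pid = PID(kp=0.004, ki=0.0005, kd=0.002)   # illustrative gains
    pixel_offset = 120.0                            # plume centroid 120 px right of centre
    for _ in range(5):
        yaw_rate_cmd = yaw_pid.update(pixel_offset, dt=0.05)
        pixel_offset *= 0.6                         # pretend the drone turns toward the plume
        print(f"yaw rate command: {yaw_rate_cmd:.3f} rad/s")
```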
|
|
10:15-10:20, Paper TuAT5.5 | Add to My Program |
EvMAPPER: High-Altitude Orthomapping with Event Cameras |
|
Cladera, Fernando | University of Pennsylvania |
Chaney, Kenneth | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Taylor, Camillo Jose | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Mapping, Field Robots, Aerial Systems: Applications
Abstract: Traditionally, unmanned aerial vehicles (UAVs) rely on CMOS-based cameras to collect images about the world below. One of the most successful applications of UAVs is to generate orthomosaics or orthomaps, in which a series of images are integrated to develop a larger map. However, using CMOS-based cameras with global or rolling shutters means that orthomaps are vulnerable to challenging light conditions, motion blur, and high-speed motion of independently moving objects (IMOs) under the camera. Event cameras are less sensitive to these issues, as their pixels trigger asynchronously on brightness changes. This work introduces the first orthomosaic approach using event cameras. We focus on addressing high-dynamic range and low-light problems in orthomosaics. In contrast to existing methods relying only on CMOS cameras, our approach enables map generation even in challenging light conditions, including direct sunlight and after sunset.
|
|
10:20-10:25, Paper TuAT5.6 | Add to My Program |
Survey of Simulators for Aerial Robots: An Overview and In-Depth Systematic Comparisons |
|
Dimmig, Cora | Johns Hopkins University |
Silano, Giuseppe | Ceske Vysoke Uceni Technicke V Praze, FEL |
McGuire, Kimberly | Bitcraze AB |
Gabellieri, Chiara | University of Twente |
Hoenig, Wolfgang | TU Berlin |
Moore, Joseph | Johns Hopkins University |
Kobilarov, Marin | Johns Hopkins University |
Keywords: Aerial Systems: Perception and Autonomy, Simulation and Animation, Software, Middleware and Programming Environments
Abstract: Uncrewed Aerial Vehicle (UAV) research faces challenges with safety, scalability, costs, and ecological impact when conducting hardware testing. High-fidelity simulators offer a vital solution by replicating real-world conditions to enable the development and evaluation of novel perception and control algorithms. However, the large number of available simulators poses a significant challenge for researchers to determine which simulator best suits their specific use-case, based on each simulator’s limitations and customization readiness. In this paper we present an overview of 44 UAV simulators, including in-depth, systematic comparisons for 14 of the simulators. Additionally, we present a set of decision factors for selection of simulators, aiming to enhance the efficiency and safety of research endeavors.
|
|
10:25-10:30, Paper TuAT5.7 | Add to My Program |
Robotics Meets Fluid Dynamics: A Characterization of the Induced Airflow below a Quadrotor As a Turbulent Jet |
|
Bauersfeld, Leonard | University of Zurich (UZH) |
Muller, Koen | ETH Zürich |
Ziegler, Dominic | IFD, ETH Zürich |
Coletti, Filippo | ETH Zürich |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Applications, Calibration and Identification, Robust/Adaptive Control
Abstract: The widespread adoption of quadrotors for diverse applications, from agriculture to public safety, necessitates an understanding of the aerodynamic disturbances they create. This paper introduces a computationally lightweight model for estimating the time-averaged magnitude of the induced flow below quadrotors in hover. Unlike related approaches that rely on expensive computational fluid dynamics (CFD) simulations or drone-specific, time-consuming empirical measurements, our method leverages classical theory from turbulent flows. By analyzing over 16 hours of flight data from drones of varying sizes within a large motion capture system, we show for the first time that the combined flow from all drone propellers is well-approximated by a turbulent jet after 2.5 drone diameters below the vehicle. Using a novel normalization and scaling, we experimentally identify model parameters that describe a unified mean velocity field below differently sized quadrotors. The model, which requires only the drone's mass, propeller size, and drone size for calculations, accurately describes the far-field airflow over a long range in a very large volume that is impractical to simulate using CFD. Our model offers a practical tool for ensuring safer operations near humans, optimizing sensor placements, and drone control in multi-agent scenarios. We demonstrate the latter by designing a controller that compensates for the downwash of another drone, leading to a four times lower altitude deviation when passing below.
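A hedged sketch of the kind of model described above follows: momentum-theory induced velocity at the rotor plane, a centreline downwash decaying like 1/z below the vehicle, and a self-similar Gaussian radial profile. The decay and spreading constants are placeholders; the paper identifies its own parameters from flight data.

```python
# Classical round-turbulent-jet style model of quadrotor downwash (far field).
import numpy as np

RHO_AIR = 1.225  # kg/m^3
G = 9.81         # m/s^2


def hover_induced_velocity(mass_kg: float, prop_diameter_m: float, n_props: int = 4) -> float:
    """Momentum-theory induced velocity at the rotor plane in hover."""
    disk_area = n_props * np.pi * (prop_diameter_m / 2.0) ** 2
    return np.sqrt(mass_kg * G / (2.0 * RHO_AIR * disk_area))


def jet_velocity(z_below_m, r_radial_m, v0: float, drone_diameter_m: float,
                 decay_b: float = 5.0, spread_s: float = 0.1):
    """Time-averaged downwash magnitude at depth z and radial offset r.
    decay_b and spread_s are placeholder constants, not the paper's values."""
    z = np.maximum(np.asarray(z_below_m, dtype=float), 1e-6)
    centreline = decay_b * v0 * drone_diameter_m / z        # ~1/z decay
    return centreline * np.exp(-(np.asarray(r_radial_m) / (spread_s * z)) ** 2)


if __name__ == "__main__":
    v0 = hover_induced_velocity(mass_kg=0.8, prop_diameter_m=0.127)
    print(f"induced velocity at rotor plane ~ {v0:.1f} m/s")
    print(jet_velocity(z_below_m=[1.0, 2.0, 4.0], r_radial_m=0.0,
                       v0=v0, drone_diameter_m=0.25))
```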
|
|
TuAT6 Regular Session, 307 |
Add to My Program |
Perception for Mobile Robots 1 |
|
|
Chair: Everett, Michael | Northeastern University |
Co-Chair: Liang, Claire Yilan | Cornell University |
|
09:55-10:00, Paper TuAT6.1 | Add to My Program |
CoDynTrust: Robust Asynchronous Collaborative Perception Via Dynamic Feature Trust Modulus |
|
Xu, Yunjiang | Soochow University |
Li, Lingzhi | Soochow University |
Wang, Jin | Soochow University |
Yang, Benyuan | Xidian University |
Wu, ZhiWen | Soochow University |
Chen, Xinhong | City University of Hong Kong |
Wang, Jianping | City University of Hong Kong |
Keywords: Object Detection, Segmentation and Categorization, Multi-Robot Systems, Intelligent Transportation Systems
Abstract: Collaborative perception, which fuses information from multiple agents, can extend the perception range and thus improve perception performance. However, temporal asynchrony in real-world environments, caused by communication delays, clock misalignment, or differences in sampling configuration, can lead to information mismatches. If this is not handled well, collaborative performance degrades and, worse, safety accidents may occur. To tackle this challenge, we propose CoDynTrust, an uncertainty-encoded asynchronous fusion perception framework that is robust to the information mismatches caused by temporal asynchrony. CoDynTrust generates a dynamic feature trust modulus (DFTM) for each region of interest by modeling aleatoric and epistemic uncertainty and by selectively suppressing or retaining single-vehicle features, thereby mitigating information mismatches. We then design a multi-scale fusion module to handle multi-scale feature maps processed by DFTM. Compared to existing works that also consider asynchronous collaborative perception, CoDynTrust combats various forms of low-quality information in temporally asynchronous scenarios and allows uncertainty to be propagated to downstream tasks such as planning and control. Experimental results demonstrate that CoDynTrust significantly reduces the performance degradation caused by temporal asynchrony across multiple datasets, achieving state-of-the-art detection performance even under temporal asynchrony. The code is available at https://github.com/CrazyShout/CoDynTrust.
|
|
10:00-10:05, Paper TuAT6.2 | Add to My Program |
The Devil Is in the Quality: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection |
|
Zhang, Zhipeng | KargoBot |
Li, Zhenyu | KAUST |
Wang, Hanshi | CASIA |
Yuan, He | KargoBot |
Wang, Ke | Kargobot.AI |
Fan, Heng | University of North Texas |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Recognition
Abstract: This paper tackles the challenging problem of semi-supervised monocular 3D object detection with a general framework. Specifically, having observed that the bottleneck of this task lies in the lack of reliable and informative samples from unlabeled data for detector learning, we introduce a simple yet effective 'Augment and Criticize' pipeline that mines abundant informative samples for robust detection. In the 'Augment' stage, we present Augmentation-based Prediction aGgregation (APG), which applies automatically learned transformations to unlabeled images and aggregates detections from various augmented views as pseudo labels. Since not all the pseudo labels from APG are beneficially informative, the subsequent 'Criticize' phase is introduced. In particular, we present the Critical Retraining Strategy (CRS) which, unlike simply filtering pseudo labels using a fixed threshold, employs a learnable network to evaluate the contribution of unlabeled images at different training timestamps. This way, noisy samples detrimental to model evolution can be effectively suppressed. To validate 'Augment-Criticize', we apply it to MonoDLE and MonoFlex, and the two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with consistent improvements, evidencing the effectiveness and generality of our approach.
|
|
10:05-10:10, Paper TuAT6.3 | Add to My Program |
MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models |
|
Meier, Johannes | Technical University of Munich, DeepScenario |
Inchingolo, Louis | Technical University of Munich |
Dhaouadi, Oussema | Technical University of Munich |
Xia, Yan | Technical University of Munich |
Kaiser, Jacques | DeepScenario |
Cremers, Daniel | Technical University of Munich |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups. In this paper, we introduce a novel unsupervised domain adaptation approach, MonoCT, that generates highly accurate pseudo labels for self-supervision. Inspired by our observation that accurate depth estimation is critical to mitigating domain shifts, MonoCT introduces a novel Generalized Depth Enhancement (GDE) module with an ensemble concept to improve depth estimation accuracy. Moreover, we introduce a novel Pseudo Label Scoring (PLS) module by exploring inner-model consistency measurement and a Diversity Maximization (DM) strategy to further generate high-quality pseudo labels for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing SOTA domain adaptation methods by large margins (∼21% minimum for AP Mod.) and generalizes well to car, traffic camera and drone views.
|
|
10:10-10:15, Paper TuAT6.4 | Add to My Program |
LiDAR Inertial Odometry and Mapping Using Learned Registration-Relevant Features |
|
Dong, Zihao | Northeastern University |
Pflueger, Jeff | Northeastern University |
Jung, Leonard | Northeastern University |
Thorne, David | University of California, Los Angeles |
Osteen, Philip | U.S. Army Research Laboratory |
Robison, Christopher, Christa | Army Research Laboratory |
Lopez, Brett | University of California, Los Angeles |
Everett, Michael | Northeastern University |
Keywords: AI-Based Methods, Localization, SLAM
Abstract: SLAM is an important capability for many autonomous systems, and modern LiDAR-based methods offer promising performance. However, for long-duration missions, existing works that either take the full pointclouds directly or use extracted features face key tradeoffs in accuracy and computational efficiency (e.g., memory consumption). To address these issues, this paper presents DFLIOM with several key innovations. Unlike previous methods that rely on handcrafted heuristics and hand-tuned parameters for feature extraction, we propose a learning-based approach that selects points relevant to LiDAR SLAM pointcloud registration. Furthermore, we extend our prior work DLIOM with the learned feature extractor and observe that our method enables similar or even better localization performance using only about 20% of the points in the dense point clouds. We demonstrate that DFLIOM performs well on multiple public benchmarks, achieving a 2.4% decrease in localization error and a 57.5% decrease in memory usage compared to the state-of-the-art method (DLIOM). Although extracting features with the proposed network requires extra time, it is offset by faster processing downstream, thus maintaining real-time performance using a 20 Hz LiDAR on our hardware setup. The effectiveness of our learning-based feature extraction module is further demonstrated through comparison with several handcrafted feature extractors.
|
|
10:15-10:20, Paper TuAT6.5 | Add to My Program |
DreamDrive: Generative 4D Scene Modeling from Street View Images |
|
Mao, Jiageng | University of Southern California |
Li, Boyi | UC Berkeley |
Ivanovic, Boris | NVIDIA |
Chen, Yuxiao | Nvidia Research |
Wang, Yan | NVIDIA |
You, Yurong | Cornell University |
Xiao, Chaowei | University of Wisconsin, Madison |
Xu, Danfei | Georgia Institute of Technology |
Pavone, Marco | Stanford University |
Wang, Yue | USC |
Keywords: Computer Vision for Automation, Autonomous Vehicle Navigation, Virtual Reality and Interfaces
Abstract: Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and in-the-wild driving data demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.
|
|
10:20-10:25, Paper TuAT6.6 | Add to My Program |
DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots |
|
Li, Chunlin | University of Toronto |
Fan, Hanrui | University of Toronto |
Huang, Xiaorui | University of Toronto |
Liang, Ruofan | University of Toronto |
Durvasula, Sankeerth | University of Toronto |
Vijaykumar, Nandita | University of Toronto |
Keywords: Visual Learning, Incremental Learning, Mapping
Abstract: We present a framework, DISORF, to enable online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices. To address the limited computing capabilities of edge devices and potentially limited network availability, we design a framework that efficiently distributes computation between the edge device and the remote server. We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers that can perform high-quality 3D reconstruction and visualization at runtime by leveraging recent advances in neural 3D methods. We identify a key challenge with online training where naive image sampling strategies can lead to significant degradation in rendering quality. We propose a novel shifted exponential frame sampling method that addresses this challenge for online training. We demonstrate the effectiveness of our framework in enabling high-quality real-time reconstruction and visualization of unknown scenes as they are captured and streamed from cameras in mobile robots and edge devices.
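One plausible reading of the shifted exponential frame sampling idea, recent keyframes drawn with exponentially higher probability while a constant shift keeps older frames from being starved, is sketched below. The exact weighting used by DISORF may differ; this is only a conceptual illustration.

```python
# Conceptual sketch: sample training keyframes with a shifted exponential
# weighting over frame age, so recent frames dominate but old frames are not starved.
import numpy as np


def shifted_exponential_weights(n_keyframes: int, rate: float = 0.1, shift: float = 0.05):
    age = np.arange(n_keyframes)[::-1]            # newest keyframe has age 0
    weights = np.exp(-rate * age) + shift
    return weights / weights.sum()


def sample_training_frames(n_keyframes: int, batch: int, rng=np.random.default_rng()):
    p = shifted_exponential_weights(n_keyframes)
    return rng.choice(n_keyframes, size=batch, p=p)


if __name__ == "__main__":
    print(sample_training_frames(n_keyframes=50, batch=8))
```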
|
|
10:25-10:30, Paper TuAT6.7 | Add to My Program |
Key-Scan-Based Mobile Robot Navigation: Integrated Mapping, Planning, and Control Using Graphs of Scan Regions |
|
Bashkaran Latha, Dharshan | Eindhoven University of Technology |
Arslan, Omur | Eindhoven University of Technology |
Keywords: Reactive and Sensor-Based Planning, Integrated Planning and Control, Motion and Path Planning
Abstract: Safe autonomous navigation in a priori unknown environments is an essential skill for mobile robots to reliably and adaptively perform diverse tasks (e.g., delivery, inspection, and interaction) in unstructured cluttered environments. Hybrid metric-topological maps, constructed as a pose graph of local submaps, offer a computationally efficient world representation for adaptive mapping, planning, and control at the regional level. In this paper, we consider a pose graph of locally sensed star-convex scan regions as a metric-topological map, with star convexity enabling simple yet effective local navigation strategies. We design a new family of safe local scan navigation policies and present a perception-driven feedback motion planning method through the sequential composition of local scan navigation policies, enabling provably correct and safe robot navigation over the union of local scan regions. We introduce a new concept of frontier and bridging scans for automated key scan selection and exploration for integrated mapping and navigation in unknown environments. We demonstrate the effectiveness of our key-scan-based navigation and mapping framework using a mobile robot equipped with a 360° laser range scanner in 2D cluttered environments through numerical ROS-Gazebo simulations and real hardware experiments.
|
|
TuAT7 Regular Session, 309 |
Add to My Program |
Legged Locomotion: Novel Methods |
|
|
Chair: Lynch, Kevin | Northwestern University |
Co-Chair: Kim, Joohyung | University of Illinois Urbana-Champaign |
|
09:55-10:00, Paper TuAT7.1 | Add to My Program |
Angular Divergent Component of Motion: A Step towards Planning Spatial DCM Objectives for Legged Robots |
|
Herron, Connor | Virginia Tech |
Schuller, Robert | German Aerospace Center (DLR) |
Beiter, Benjamin | Virginia Polytechnic Institute and State University |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Leonessa, Alexander | Virginia Tech |
Englsberger, Johannes | DLR (German Aerospace Center) |
Keywords: Humanoid and Bipedal Locomotion, Body Balancing, Whole-Body Motion Planning and Control
Abstract: In this work, the Divergent Component of Motion (DCM) method is expanded to include angular coordinates for the first time. This work introduces the idea of spatial DCM, which adds an angular objective to the existing linear DCM theory. To incorporate the angular component into the framework, a discussion is provided on extending beyond the linear motion of the Linear Inverted Pendulum model (LIPM) towards the Single Rigid Body model (SRBM) for DCM. This work presents the angular DCM theory for a 1D rotation, simplifying the SRBM rotational dynamics to a flywheel to satisfy necessary linearity constraints. The 1D angular DCM is mathematically identical to the linear DCM and defined as an angle which is ahead of the current body rotation based on the angular velocity. This theory is combined into a 3D linear and 1D angular DCM framework, with discussion on the feasibility of simultaneously achieving both sets of objectives. A simulation in MATLAB and hardware results on the TORO humanoid are presented to validate the framework's performance.
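As a quick numeric companion to the abstract, the sketch below evaluates the standard linear DCM of the LIP, xi = c + c_dot/omega0 with omega0 = sqrt(g/z0), and its 1D angular analogue described above, an angle ahead of the current rotation by theta_dot/omega. The flywheel frequency used is an arbitrary illustrative value.

```python
# Linear DCM of the LIP and the 1D angular analogue described in the abstract.
import numpy as np

G = 9.81


def linear_dcm(com_pos, com_vel, com_height: float):
    """xi = c + c_dot / omega0, with omega0 = sqrt(g / z0)."""
    omega0 = np.sqrt(G / com_height)
    return np.asarray(com_pos) + np.asarray(com_vel) / omega0


def angular_dcm(theta: float, theta_dot: float, omega: float) -> float:
    """Angle 'ahead' of the current body rotation by theta_dot / omega."""
    return theta + theta_dot / omega


if __name__ == "__main__":
    print(linear_dcm(com_pos=[0.05, 0.00], com_vel=[0.30, 0.10], com_height=0.9))
    print(angular_dcm(theta=0.1, theta_dot=0.5, omega=3.3))  # illustrative omega
```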
|
|
10:00-10:05, Paper TuAT7.2 | Add to My Program |
Finite-Step Capturability and Recursive Feasibility for Bipedal Walking in Constrained Regions |
|
Kumbhar, Shubham | University of Delaware |
Kulkarni, Abhijeet Mangesh | University of Delaware |
Poulakakis, Ioannis | University of Delaware |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Humanoid Robot Systems
Abstract: This paper presents a Model Predictive Control (MPC) formulation for bipedal footstep planning based on the Linear Inverted Pendulum (LIP) model, ensuring recursive feasibility when navigating restricted regions. The proposed approach incorporates capturability and introduces a new constraint that forces the Divergent Component of Motion (DCM) into a finite-step capture region, adjusted between consecutive MPC calls. This constraint enables the MPC to anticipate beyond its prediction horizon, preventing collisions with the walking surface boundaries. We validate the approach through high-fidelity simulations with the bipedal robot Digit, demonstrating recursively feasible MPC footstep planning in restricted regions. Future efforts will extend the approach to general polytopic constraints, thereby facilitating footstep planning in cluttered environments while preserving the MPC's recursive feasibility.
|
|
10:05-10:10, Paper TuAT7.3 | Add to My Program |
Realtime Limb Trajectory Optimization for Humanoid Running through Centroidal Angular Momentum Dynamics |
|
Sovukluk, Sait | TU Wien |
Schuller, Robert | German Aerospace Center (DLR) |
Englsberger, Johannes | DLR (German Aerospace Center) |
Ott, Christian | TU Wien |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control, Optimization and Optimal Control
Abstract: One of the essential aspects of humanoid robot running is determining the limb-swinging trajectories. During the flight phases, where the ground reaction forces are not available for regulation, the limb swinging trajectories are significant for the stability of the next stance phase. Due to the conservation of angular momentum, improper leg and arm swinging results in highly tilted and unsustainable body configurations at the next stance phase landing. In such cases, the robotic system fails to maintain locomotion independent of the stability of the center of mass trajectories. This problem is more apparent for fast and high flight time trajectories. This paper proposes a real-time nonlinear limb trajectory optimization problem for humanoid running. The optimization problem is tested on two different humanoid robot models, and the generated trajectories are verified using a running algorithm for both robots in a simulation environment.
|
|
10:10-10:15, Paper TuAT7.4 | Add to My Program |
Pitching Motion in a Humanoid Robot Using Human-Inspired Shoulder Elastic Energy and Motor Torque Optimization |
|
Nakazawa, Yuri | Waseda University |
Iwamoto, Masaki | Waseda University |
Watanabe, Ryuhya | Waseda University |
Aoki, Riku | Waseda University |
Mineshita, Hiroki | Waseda University |
Otani, Takuya | Shibaura Institute of Technology |
Kawakami, Yasuo | Waseda University |
Lim, Hun-ok | Kanagawa University |
Takanishi, Atsuo | Waseda University |
Keywords: Modeling and Simulating Humans, Humanoid Robot Systems, Human and Humanoid Motion Analysis and Synthesis
Abstract: Humanoid robots that mimic human movement have garnered significant attention in recent years. This study focuses on mimicking the efficient pitching motion of humans by incorporating two main approaches into a humanoid robot: (1) the use of elastic elements to assist joint torque, and (2) the optimization of motor torque to minimize energy consumption. This robot is intended to emulate human physical characteristics, such as mass, link length, and center of gravity, with a particular focus on utilizing the elastic energy generated during shoulder internal and external rotation. A leaf spring is attached in parallel with the motor at the shoulder pitch joint to release the elastic energy stored during shoulder external rotation, thereby assisting internal rotation in a manner similar to human biomechanics. Additionally, motor torque optimization is performed using Fujitsu's Digital Annealer to generate energy-efficient motions. Experiments conducted through simulations and with an actual pitching robot assessed the effectiveness of these technologies in mimicking human-like pitching motion. The results suggest that combining elastic elements with motion optimization techniques enables robots to achieve more efficient human-like movements.
|
|
10:15-10:20, Paper TuAT7.5 | Add to My Program |
Single-Stage Optimization of Open-Loop Stable Limit Cycles with Smooth, Symbolic Derivatives |
|
Saud Ul Hassan, Muhammad | Advanced Micro Devices, Inc |
Hubicki, Christian | Florida State University |
Keywords: Legged Robots, Optimization and Optimal Control, Passive Walking
Abstract: Open-loop stable limit cycles are foundational to legged robotics, providing inherent self-stabilization that minimizes the need for computationally intensive feedback-based gait correction. While previous methods have primarily targeted specific robotic models, this paper introduces a general framework for rapidly generating limit cycles across various dynamical systems, with the flexibility to impose arbitrarily tight stability bounds. We formulate the problem as a single-stage constrained optimization problem and use Direct Collocation to transcribe it into a nonlinear program with closed-form expressions for constraints, objectives, and their gradients. Our method supports multiple stability formulations. In particular, we tested two popular formulations for limit cycle stability in robotics: (1) based on the spectral radius of a discrete return map, and (2) based on the spectral radius of the monodromy matrix, and tested five different constraint-satisfaction formulations of the eigenvalue problem to bound the spectral radius. We compare the performance and solution quality of the various formulations on a robotic swing-leg model, highlighting the Schur decomposition of the monodromy matrix as a method with broader applicability due to weaker assumptions and stronger numerical convergence properties. As a case study, we apply our method on a hopping robot model, generating open-loop stable gaits in under 2 seconds on an Intel Core i7-6700K, while simultaneously minimizing energy consumption even under tight stability constraints.
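As an illustration of the stability quantity being constrained, the sketch below numerically estimates the Jacobian of a Poincaré return map at a fixed point and its spectral radius; the paper instead bounds this quantity inside the optimization with closed-form symbolic derivatives, so the function return_map and the fixed point x_star here are placeholders rather than the paper's models.

    import numpy as np

    def spectral_radius_of_return_map(return_map, x_star, eps=1e-6):
        """Finite-difference Jacobian of a Poincare return map at a fixed point
        x_star; a spectral radius below 1 indicates a locally stable limit cycle."""
        n = len(x_star)
        J = np.zeros((n, n))
        fx = return_map(x_star)
        for i in range(n):
            dx = np.zeros(n)
            dx[i] = eps
            J[:, i] = (return_map(x_star + dx) - fx) / eps
        return np.max(np.abs(np.linalg.eigvals(J)))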
|
|
10:20-10:25, Paper TuAT7.6 | Add to My Program |
Iterative Periodic Running Control through Swept Angle Adjustment with Modified SLIP Model |
|
Kang, Woosong | Korea Institute of Machinery & Materials (KIMM) |
Jeong, Jeil | Korea Advanced Institute of Science and Technology |
Hong, Jeongwoo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Yeo, Changmin | DGIST |
Park, Dongil | Korea Institute of Machinery and Materials (KIMM) |
Oh, Sehoon | DGIST |
Keywords: Legged Robots, Dynamics, Humanoid and Bipedal Locomotion
Abstract: This paper presents a periodic running control strategy based on a modified Spring-Loaded Inverted Pendulum (SLIP) model to achieve stable running at various velocities. While the traditional SLIP model is valued for its simplicity and intuitive representation of running dynamics, its limitations impede its extension and integration with feedback control systems. To address this, we introduce a novel Quasi-Linearized SLIP model (QLSLIP) that incorporates additional forces in the radial and angular directions to enable stable running across various velocities. This model simplifies the analytical representation of the stance phase and defines the required swept angle for maintaining periodic motion during the flight phase. Using this model, we develop a feedback control system that ensures the stability of QLSLIP-based periodic locomotion, even in the presence of external disturbances. This control framework optimizes trajectories and sustains periodic motion in real-time across diverse scenarios. Additionally, we propose an algorithm to extend this approach to articulated leg mechanisms. The effectiveness of the proposed algorithm is validated through simulations under various conditions, demonstrating improvements in the stability and performance of running.
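For context, the sketch below integrates the stance phase of the classical (unmodified) SLIP model that the QLSLIP above builds on; the QLSLIP adds extra radial and angular forces that are not reproduced here, and all parameter values are illustrative only.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Classical SLIP stance dynamics with the foot at the origin and the leg
    # angle theta measured from the vertical; parameters are illustrative.
    m, k, r0, g = 80.0, 20e3, 1.0, 9.81

    def slip_stance(t, s):
        r, dr, th, dth = s
        ddr = r * dth**2 - g * np.cos(th) + (k / m) * (r0 - r)
        ddth = (-2.0 * dr * dth + g * np.sin(th)) / r
        return [dr, ddr, dth, ddth]

    # touchdown at full leg length, descending CoM, leg swept back by 0.3 rad
    sol = solve_ivp(slip_stance, (0.0, 0.5), [r0, -0.8, -0.3, 1.5], max_step=1e-3)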
|
|
10:25-10:30, Paper TuAT7.7 | Add to My Program |
Efficient, Responsive, and Robust Hopping on Deformable Terrain |
|
Lynch, Daniel | Northwestern University |
Pusey, Jason | U.S. Army Research Laboratory (ARL) |
Gart, Sean | US Army Research Lab |
Umbanhowar, Paul | Northwestern University |
Lynch, Kevin | Northwestern University |
Keywords: Legged Robots, Dynamics, Compliance and Impedance Control, Granular Media
Abstract: Legged robot locomotion is hindered by a mismatch between applications featuring deformable substrates, where legs can outperform wheels or treads, and existing planners and controllers, most of which assume flat, rigid substrates. In this study we focus on the effects of plastic ground deformation on the hop-to-hop energy dynamics of a hopping robot driven by a switched-compliance energy injection controller. We derive a hop-to-hop energy return map, and we use experiments and simulations to validate this map for a real robot hopping on a real deformable substrate. By analyzing the map’s fixed points and eigenvalues, we identify constant-fixed-point surfaces in parameter space that suggest it is possible to tune control parameters for efficiency or responsiveness while targeting a desired gait energy level. We also identify conditions for which the map’s fixed points are globally stable, and we characterize the basins of attraction of fixed points when these conditions are not satisfied. We conclude by discussing the implications of this energy map for planning, control, and estimation for efficient, agile, and robust legged locomotion on deformable terrain.
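The kind of scalar return-map analysis described above can be sketched compactly; the energy map f below is a placeholder for the derived hop-to-hop map, and the bracketing interval is an assumption.

    import numpy as np
    from scipy.optimize import brentq

    def energy_fixed_point(f, lo, hi, eps=1e-6):
        """For a hop-to-hop energy map E_{k+1} = f(E_k): locate a fixed point
        f(E*) = E* in [lo, hi] and test local stability via |f'(E*)| < 1."""
        E_star = brentq(lambda E: f(E) - E, lo, hi)
        slope = (f(E_star + eps) - f(E_star - eps)) / (2.0 * eps)
        return E_star, abs(slope) < 1.0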
|
|
TuAT8 Regular Session, 311 |
Add to My Program |
Medical Robotics 1 |
|
|
Chair: Kuntz, Alan | University of Utah |
Co-Chair: Nanayakkara, Thrishantha | Imperial College London |
|
09:55-10:00, Paper TuAT8.1 | Add to My Program |
Accounting for Hysteresis in the Forward Kinematics of Nonlinearly-Routed Tendon-Driven Continuum Robots Via a Learned Deep Decoder Network |
|
Cho, Brian Y | University of Utah |
Esser, Daniel | Vanderbilt University |
Thompson, Jordan | University of Utah |
Thach, Bao | University of Utah |
Webster III, Robert James | Vanderbilt University |
Kuntz, Alan | University of Utah |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Deep Learning Methods
Abstract: Tendon-driven continuum robots have been gaining popularity in medical applications due to their ability to curve around complex anatomical structures, potentially reducing the invasiveness of surgery. However, accurate modeling is required to plan and control the movements of these flexible robots. Physics-based models have limitations due to unmodeled effects, leading to mismatches between model prediction and actual robot shape. Recently proposed learning-based methods have been shown to overcome some of these limitations but do not account for hysteresis, a significant source of error for these robots. To overcome these challenges, we propose a novel deep decoder neural network that predicts the complete shape of tendon-driven robots using point clouds as the shape representation, conditioned on prior configurations to account for hysteresis. We evaluate our method on a physical tendon-driven robot and show that our network model accurately predicts the robot's shape, significantly outperforming a state-of-the-art physics-based model and a learning-based model that does not account for hysteresis.
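A minimal PyTorch-style sketch of the central idea: conditioning a point-cloud decoder on the previous configuration as well as the current one so that hysteresis can be captured. Layer sizes, names, and the input parameterization are hypothetical and do not reproduce the paper's architecture.

    import torch
    import torch.nn as nn

    class HysteresisAwareDecoder(nn.Module):
        """Maps (previous tendon configuration, current configuration) to an
        N x 3 point cloud representing the robot shape."""
        def __init__(self, n_tendons=4, n_points=512):
            super().__init__()
            self.n_points = n_points
            self.net = nn.Sequential(
                nn.Linear(2 * n_tendons, 256), nn.ReLU(),
                nn.Linear(256, 1024), nn.ReLU(),
                nn.Linear(1024, n_points * 3),
            )

        def forward(self, q_prev, q_curr):
            x = torch.cat([q_prev, q_curr], dim=-1)  # conditioning on history
            return self.net(x).view(-1, self.n_points, 3)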
|
|
10:00-10:05, Paper TuAT8.2 | Add to My Program |
Graph-Based Spatial Reasoning for Tracking Landmarks in Dynamic Laparoscopic Environments |
|
Zhang, Jie | Huazhong University of Science and Technology |
Wang, Yiwei | Huazhong University of Science and Technology |
Zhou, Song | Huazhong University of Science and Technology |
Zhao, Huan | Huazhong University of Science and Technology |
Wan, Chidan | Huazhong University of Science and Technology |
Cai, Xiong | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Keywords: Surgical Robotics: Laparoscopy, Semantic Scene Understanding, Medical Robots and Systems
Abstract: Accurate anatomical landmark tracking is crucial yet challenging in laparoscopic surgery due to the changing appearance of landmarks during dynamic tool-anatomy interactions and visual domain shifts between cases. Unlike appearance-based detection methods, this work proposes a novel graph-based approach to reconstruct the entire target landmark area by explicitly modeling the evolving spatial relations over time among scenario entities, including observable regions, surgical tools, and landmarks. Considering tool-anatomy interactions, we present the Tool-Anatomy Interaction Graph (TAI-G), a spatio-temporal graph that captures spatial dependencies among entities, attribute interactions within entities, and temporal dependencies of spatial relations. To mitigate domain shifts, geometric segmentation features are designated as node attributes, representing domain-invariant image information in the graph space. Message passing with attention helps propagate information across TAI-G, enhancing robust tracking by reconstructing landmark data. Evaluated on laparoscopic cholecystectomy, our framework demonstrates effective handling of complex tool-anatomy interactions and visual domain gaps to accurately track landmarks, showing promise in enhancing the stability and reliability of intricate surgical tasks.
|
|
10:05-10:10, Paper TuAT8.3 | Add to My Program |
A Robust Deep Reinforcement Learning Framework for Image-Based Autonomous Guidewire Navigation |
|
Yoo, Sangbaek | KAIST |
Kwon, Hojun | KAIST |
Choi, Jaesoon | Asan Medical Center |
Chang, Dong Eui | KAIST |
Keywords: Reinforcement Learning, Medical Robots and Systems, Vision-Based Navigation
Abstract: Percutaneous coronary intervention (PCI) involves the insertion of a catheter or guidewire into a blood vessel of a patient, which poses a problem as a doctor is exposed to radiation during the procedure. The use of assistive robots has been proposed to address this issue. Furthermore, recent research is progressing toward complete autonomous navigation using deep reinforcement learning (DRL). Nevertheless, existing algorithms face limitations when operating in numerous unseen environments close to real PCI. This study proposes a robust DRL framework for image-based guidewire navigation to overcome these limitations. We introduce a subtasks strategy and domain randomization to improve robustness in various environments. The subtasks strategy consistently addresses complex global tasks by breaking them into subtasks designed using local maps, allowing them to be robustly solved by a single agent. Domain randomization is applied to handle real PCI issues, including variations in vessel geometry, guidewire deformation, and camera settings. By integrating the two novel methods, our DRL algorithm demonstrates superior performance compared to existing methods across various challenging simulation and phantom environments, validating its effectiveness in real-world scenarios. A video of our experiment is available at https://youtu.be/93Q88gESzOY.
|
|
10:10-10:15, Paper TuAT8.4 | Add to My Program |
CTS: A Consistency-Based Medical Image Segmentation Model |
|
Zhang, Kejia | Harbin Engineering University |
Zhang, Lan | Harbin Engineering University |
Pan, Haiwei | Harbin Engineering University |
Keywords: Deep Learning Methods, Computer Vision for Medical Robotics
Abstract: In medical image segmentation tasks, diffusion models have shown great potential. However, mainstream diffusion models exhibit drawbacks including multiple sampling passes and slow prediction. Recently, consistency models, as an independent class of generative networks, have addressed these issues. Compared with diffusion models, consistency models can reduce sampling to a single step, not only achieving comparable generation quality but also significantly accelerating training and prediction. However, they are not directly suited to image segmentation tasks, and their application in the medical imaging field has not yet been investigated. Therefore, this study adopts a consistency model to perform medical image segmentation tasks, and designs a multi-scale feature signal supervision scheme and a loss function to guide the model to convergence. Experiments show that the CTS model achieves better medical image segmentation results.
|
|
10:15-10:20, Paper TuAT8.5 | Add to My Program |
An Adversarial Learning Framework for Reliable Myoelectric Force Estimation under Fatigue |
|
Pan, Huiming | Shanghai Jiao Tong University |
Li, Dongxuan | Shanghai Jiao Tong University |
Chen, Chen | Shanghai Jiao Tong University |
Jiang, Shuo | Tongji University |
Shull, Peter B. | Shanghai Jiao Tong University |
Keywords: Prosthetics and Exoskeletons, Deep Learning Methods, Force and Tactile Sensing
Abstract: Electromyography (EMG) signals are widely used as control inputs for myoelectric exoskeletons. However, muscle fatigue, which can result from prolonged use or heavy loads, significantly affects muscle activation patterns, leading to reduced estimation accuracy. To address this challenge, we propose an adversarial learning framework to enhance grip force estimation under fatigue conditions. The framework consists of three key components: a domain-invariant feature extractor to mitigate domain shifts between non-fatigue and fatigue states, a force estimator to predict grip forces from these domain-invariant features, and a domain discriminator to distinguish between the two domains. The proposed method was evaluated on a dataset collected from eight participants performing gripping tasks under both non-fatigue and fatigue conditions, during which high-density EMG signals and grip forces were recorded simultaneously. Experimental results demonstrated that our method significantly reduced the root mean square error (RMSE) from 0.264 to 0.127, outperforming a baseline model consisting of only the feature extractor and force estimator ( p < 0.01). Additionally, the proposed approach exhibited consistent performance across all participants, highlighting its robustness and generalizability. These findings suggest that the proposed adversarial learning framework effectively enhances grip force estimation accuracy under muscle fatigue, offering a promising solution for improving the reliability and usability of myoelectric exoskeletons.
|
|
10:20-10:25, Paper TuAT8.6 | Add to My Program |
An Origami-Inspired Endoscopic Capsule with Tactile Perception for Early Tissue Anomaly Detection |
|
Ge, Yukun | Imperial College London |
Zong, Rui | Imperial College London |
Chen, Xiaoshuai | Imperial College London |
Nanayakkara, Thrishantha | Imperial College London |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing, Medical Robots and Systems
Abstract: Video Capsule Endoscopy (VCE) is currently one of the most effective methods for detecting intestinal diseases. However, it is challenging to detect early-stage small nodules with this method because they lack obvious color or shape features. In this letter, we present a new origami capsule endoscope to detect early small intestinal nodules using tactile sensing. Four soft tactile sensors made out of piezoresistive material feed four channels of phase-shifted data that are processed using a particle filter. The particle filter uses an importance assignment template designed using experimental data from six known sizes of nodules. Moreover, the proposed capsule can use shape changes to passively move forward or backward under peristalsis, enabling it to reach any position in the intestine for detection. Experimental results show that the proposed capsule can detect nodules of more than 3 mm in diameter with 100% accuracy.
|
|
10:25-10:30, Paper TuAT8.7 | Add to My Program |
Exploring the Limitations and Implications of the JIGSAWS Dataset for Robot-Assisted Surgery |
|
Hendricks, Antonio | Univeristy of Florida |
Panoff, Maximillian | University of Florida |
Xiao, Kaiwen | University of Florida |
Wang, Zhaoqi | University of Florida |
Wang, Shuo | University of Florida |
Bobda, Christophe | University of Florida |
Keywords: Surgical Robotics: Laparoscopy, Medical Robots and Systems, Performance Evaluation and Benchmarking
Abstract: The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset has proven to be a foundational component of modern work on the skill analysis of robotic surgeons. In particular, methods using either the system's kinematics or video data have shown to be able to classify operators into distinct experience levels, and recent approaches have even ventured to recover numeric skill ratings assigned to assessment sessions. Although prior works have achieved positive results in these directions, challenges still remain with classification across all three levels of operator training amounts and objective skill rating regressions. To this end, we perform the first statistical analysis of the dataset itself and compile the results here. We find limited relationships between the amount of experience or training of an operator and their performance in JIGSAWS. Moreover, as operator-side kinematics have well-known relationships with their skill, previous works have used both robot and operator-side kinematics to classify operator skill; we find the first explicit relationships between pure robot-side kinematics and surgical performance. Finally, we analyze the robotic kinematic trends associated with high performance in JIGSAWS tasks and present how they may be used as indicators in human and automated surgeon training.
|
|
TuAT9 Regular Session, 312 |
Add to My Program |
Motion Planning 1 |
|
|
Chair: Alonso-Mora, Javier | Delft University of Technology |
Co-Chair: Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
|
09:55-10:00, Paper TuAT9.1 | Add to My Program |
Path Planning Using Instruction-Guided Probabilistic Roadmaps |
|
Bao, Jiaqi | CyberAgent |
Yonetani, Ryo | CyberAgent |
Keywords: Integrated Planning and Learning, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: This work presents a novel data-driven path planning algorithm named Instruction-Guided Probabilistic Roadmap (IG-PRM). Despite the recent development and widespread use of mobile robot navigation, safe and effective travel of mobile robots still requires significant engineering effort to account for the constraints of robots and their tasks. With IG-PRM, we aim to address this problem by allowing robot operators to specify such constraints through natural language instructions, such as "aim for wider paths" or "mind small gaps". The key idea is to convert such instructions into embedding vectors using large language models (LLMs) and use the vectors as a condition to predict instruction-guided cost maps from occupancy maps. By constructing a roadmap based on the predicted costs, we can find instruction-guided paths via a standard shortest-path search. Experimental results demonstrate the effectiveness of our approach on both synthetic and real-world indoor navigation environments.
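A minimal sketch of the pipeline described above; embed_instruction and cost_model stand in for the paper's LLM embedding and learned cost-map predictor, points are assumed to be given in grid-cell coordinates (x, y), and collision checking is omitted for brevity.

    import numpy as np
    import networkx as nx

    def ig_prm_plan(occupancy, instruction, samples, radius,
                    embed_instruction, cost_model, start, goal):
        """Build a roadmap whose edge weights follow an instruction-conditioned
        cost map, then run a standard shortest-path query from start to goal."""
        z = embed_instruction(instruction)            # text -> embedding vector
        cost_map = cost_model(occupancy, z)           # per-cell traversal cost
        nodes = [np.asarray(p, float) for p in [start, goal] + list(samples)]
        G = nx.Graph()
        for i, p in enumerate(nodes):
            G.add_node(i, pos=p)
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = np.linalg.norm(nodes[i] - nodes[j])
                if d <= radius:
                    mid = ((nodes[i] + nodes[j]) / 2).astype(int)
                    G.add_edge(i, j, weight=d * cost_map[mid[1], mid[0]])
        idx = nx.shortest_path(G, 0, 1, weight="weight")   # 0 = start, 1 = goal
        return [nodes[i] for i in idx]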
|
|
10:00-10:05, Paper TuAT9.2 | Add to My Program |
Pushing through Clutter with Movability Awareness of Blocking Obstacles |
|
Weeda, Joris J. | TU Delft |
Bakker, Saray | Delft University of Technology |
Chen, Gang | Delft University of Technology |
Alonso-Mora, Javier | Delft University of Technology |
Keywords: Motion and Path Planning, Collision Avoidance, Integrated Planning and Control
Abstract: Navigation Among Movable Obstacles (NAMO) poses a challenge for traditional path-planning methods when obstacles block the path, requiring push actions to reach the goal. We propose a framework that enables movability-aware planning to overcome this challenge without relying on explicit obstacle placement. Our framework integrates a global Semantic Visibility Graph and a local Model Predictive Path Integral (SVG-MPPI) approach to efficiently sample rollouts, taking into account the continuous range of obstacle movability. A physics engine is adopted to simulate how the rollouts interact with the environment and to generate trajectories with lower contact forces. Qualitative and quantitative experiments suggest that SVG-MPPI outperforms an existing paradigm that uses only binary movability for planning, achieving higher success rates with reduced cumulative contact forces. Our code is available at: https://github.com/tud-amr/SVG-MPPI
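A minimal numpy sketch of the MPPI update at the core of such a planner; in the paper the rollout costs come from physics-engine simulations with movability-aware contact penalties, whereas here costs and noise are generic placeholders.

    import numpy as np

    def mppi_update(u_nominal, noise, costs, lam=1.0):
        """Exponentially weight K sampled rollouts by cost and fold their control
        perturbations (K x T x m) back into the nominal control sequence (T x m)."""
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        return u_nominal + np.tensordot(w, noise, axes=(0, 0))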
|
|
10:05-10:10, Paper TuAT9.3 | Add to My Program |
Improving Efficiency in Path Planning: Tangent Line Decomposition Algorithm |
|
Tian, Yu | The Chinese University of Hong Kong |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Motion and Path Planning, Collision Avoidance
Abstract: This paper introduces a tangent line decomposition (TLD) algorithm that efficiently finds near-optimal collision-free paths in both 2D and 3D environments. Compared with existing visibility-line-based algorithms, the proposed algorithm introduces the concept of tangent line decomposition, which decomposes complicated planning into many simple steps. In each step, only one key obstacle is taken into consideration. In addition, instead of constructing a complete graph, a best-first search algorithm is used to avoid searching redundant edges. The path planned by the algorithm is not the optimal path; however, following the idea of the Informed RRT* algorithm, the path length planned by TLD can be used as a precondition for other optimal algorithms. In this way, the overall efficiency can be significantly improved. Simulations show that the proposed method outperforms existing methods in planning efficiency and solution quality.
|
|
10:10-10:15, Paper TuAT9.4 | Add to My Program |
Gradient Guided Search for Aircraft Contingency Landing Planning |
|
Tekaslan, Huseyin Emre | Virginia Tech |
Atkins, Ella | University of Michigan |
Keywords: Motion and Path Planning, Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: This paper presents a three-dimensional discrete search path planner for fixed-wing aircraft emergency landing planning that manages state-space complexity by incorporating cost gradients to assure descent flight path angle and runway heading alignment constraints are met. Our approach incorporates steady wind and maximizes margin from flight envelope boundaries to accommodate wind variation in a manner commensurate with a loss of thrust condition. A novel multi-objective cost function that combines gradient-based path guidance and population risk metrics is implemented to efficiently enable discrete search to find a robust solution. The proposed method is demonstrated through use cases with population data for a region of Long Island, New York that highlight our algorithm's effectiveness.
|
|
10:15-10:20, Paper TuAT9.5 | Add to My Program |
Search-Based Path Planning in Interactive Environments among Movable Obstacles |
|
Ren, Zhongqiang | Shanghai Jiao Tong University |
Suvonov, Bunyod | Shanghai Jiao Tong University |
Chen, Guofei | Carnegie Mellon University |
He, Botao | University of Maryland |
Liao, Yijie | Shanghai Jiao Tong University |
Fermuller, Cornelia | University of Maryland |
Zhang, Ji | Carnegie Mellon University |
Keywords: Motion and Path Planning
Abstract: This paper investigates Path planning Among Movable Obstacles (PAMO), which seeks a minimum cost collision-free path among static obstacles from start to goal while allowing the robot to push away movable obstacles (i.e., objects) along its path when needed. To develop planners that are complete and optimal for PAMO, the planner has to search a giant state space involving both the location of the robot as well as the locations of the objects, which grows exponentially with respect to the number of objects. This paper leverages a simple yet under-explored idea that, only a small fraction of this giant state space needs to be searched during planning as guided by a heuristic, and most of the objects far away from the robot are intact, which thus leads to runtime efficient algorithms. Based on this idea, this paper introduces two PAMO formulations, i.e., bi-objective and resource constrained problems in an occupancy grid, and develops PAMO*, a planning method with completeness and solution optimality guarantees, to solve the two problems. We then further extend PAMO* to hybrid-state PAMO* to plan in continuous spaces with high-fidelity interaction between the robot and the objects. Our results show that, PAMO* can often find optimal solutions within a second in cluttered maps with up to 400 objects.
|
|
10:20-10:25, Paper TuAT9.6 | Add to My Program |
Neural Encodings for Energy-Efficient Motion Planning |
|
Shah, Deval | The University of British Columbia |
Zhao, Jocelyn | University of British Columbia |
Aamodt, Tor Michael | University of British Columbia |
Keywords: Motion and Path Planning, Energy and Environment-Aware Automation, Deep Learning Methods
Abstract: Neural motion planners can increase motion planning quality and, by reducing collision detection computations, improve runtime. However, when profiled on an accelerator-rich hardware system, neural planning contributes more than 50% of the runtime and 33% of the computation energy consumption, motivating the design of compute- and energy-efficient neural planners. In this work, we propose a neural planner using Binary Encoded Labels (BEL), where a set of binary classifiers is used instead of a typical regression network. Compared to conventional regression-based neural planners, the proposed BEL neural planner reduces neural planning (inference) computation and collision detection checks while maintaining an equal or higher motion planning success rate across various motion planning benchmarks. This computation reduction can improve the energy efficiency of neural planning by 1.4×-21.4×. Finally, we demonstrate the trade-offs between collision detection and neural planning computation to maximize energy efficiency for different hardware configurations.
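A minimal sketch of the binary-encoding idea: the continuous target is quantized and its index written as bits, so a bank of binary classifiers replaces a single regressor. The bit width and decoding rule here are illustrative assumptions, not the paper's exact scheme.

    import numpy as np

    def encode_bel(y, lo, hi, n_bits=8):
        """Quantize y in [lo, hi] and return its binary code as training targets."""
        y = np.asarray(y, dtype=float)
        levels = 2**n_bits - 1
        idx = np.clip(np.round((y - lo) / (hi - lo) * levels).astype(int), 0, levels)
        return ((idx[..., None] >> np.arange(n_bits)) & 1).astype(np.float32)

    def decode_bel(bit_probs, lo, hi):
        """Turn per-bit classifier outputs back into a continuous prediction."""
        n_bits = bit_probs.shape[-1]
        idx = ((bit_probs > 0.5).astype(int) * (2 ** np.arange(n_bits))).sum(axis=-1)
        return lo + idx / (2**n_bits - 1) * (hi - lo)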
|
|
10:25-10:30, Paper TuAT9.7 | Add to My Program |
Rigid Body Path Planning Using Mixed-Integer Linear Programming |
|
Yu, Mingxin | Massachusetts Institute of Technology |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Motion and Path Planning
Abstract: Navigating rigid body objects through crowded environments can be challenging, especially when narrow passages are present. Existing sampling-based planners and optimization-based methods, such as mixed-integer linear programming (MILP) formulations, suffer from limited scalability with respect to either the size of the workspace or the number of obstacles. To address the scalability issue, we propose a three-stage algorithm that first generates a graph of collision-free convex polytopes in the workspace, then poses a large set of small MILPs to generate viable paths between polytopes, and finally queries a pair of start and end configurations for a feasible path online. The graph of convex polytopes serves as a decomposition of the free workspace, and the number of decision variables in each MILP is limited by restricting each subproblem to two or three free polytopes rather than the entire free region. Our simulation results demonstrate shorter online computation time compared to baseline methods and better scaling with the size of the environment and tunnel width than sampling-based planners in both 2D and 3D environments.
|
|
TuAT10 Regular Session, 313 |
Add to My Program |
Multi-Robot Swarms 1 |
|
|
Chair: Lu, Qi | The University of Texas Rio Grande Valley |
Co-Chair: Hauert, Sabine | University of Bristol |
|
09:55-10:00, Paper TuAT10.1 | Add to My Program |
Strain-Coordinated Formation, Migration, and Encapsulation Behaviors in a Tethered Robot Collective |
|
Cutler, Sadie | Cornell University |
Ma, Danna | Cornell University |
Petersen, Kirstin Hagelskjaer | Cornell University |
Keywords: Distributed Robot Systems, Robust/Adaptive Control, Sensor-based Control
Abstract: Tethers are an underutilized tool in multi-robot systems: tethers can provide power, facilitate retrieval and sensing, and be used to manipulate and gather objects. Starting with the simplest possible configuration, our work explores how agents linked in series by flexible, passive, fixed-length tethers, can use those tethers as sensors to achieve distributed formation control. In this study, we extend upon previous work to show the applicability of strain-coordinated formation control for encapsulation and migration along a global gradient as well as the trade-offs between formation control and taxis in an obstacle-laden environment. Our results indicate significant potential for tethered robot collectives: versatile behaviors that can work on simple, resource-constrained robots or serve as a fallback mechanism in case more sophisticated means of coordination fail.
|
|
10:00-10:05, Paper TuAT10.2 | Add to My Program |
Deep Learning-Enhanced Visual Monitoring in Hazardous Underwater Environments with a Swarm of Micro-Robots |
|
Chen, Shuang | Durham University |
He, Yifeng | The University of Manchester |
Lennox, Barry | The University of Manchester |
Arvin, Farshad | Durham University |
Atapour-Abarghouei, Amir | Durham University |
Keywords: Robotics in Hazardous Fields, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: Long-term monitoring and exploration of extreme environments, such as underwater storage facilities, is costly, labor-intensive, and hazardous. Automating this process with low-cost, collaborative robots can greatly improve efficiency. These robots capture images from different positions, which must be processed simultaneously to create a spatio-temporal model of the facility. In this paper, we propose a novel approach that integrates data simulation, a multi-modal deep learning network for coordinate prediction, and image reassembly to address the challenges posed by environmental disturbances causing drift and rotation in the robots’ positions and orientations. Our approach enhances the precision of alignment in noisy environments by integrating visual information from snapshots, global positional context from masks, and noisy coordinates. We validate our method through extensive experiments using synthetic data that simulate real-world robotic operations in underwater settings. The results demonstrate very high coordinate prediction accuracy and plausible image assembly, indicating the real-world applicability of our approach. The assembled images provide clear and coherent views of the underwater environment for effective monitoring and inspection, showcasing the potential for broader use in extreme settings, further contributing to improved safety, efficiency, and cost reduction in hazardous field monitoring.
|
|
10:05-10:10, Paper TuAT10.3 | Add to My Program |
CapBot: Enabling Battery-Free Swarm Robotics |
|
Liu, Mengyao | KU Leuven |
Deferme, Lowie | KU Leuven |
Van Eyck, Tom | KU Leuven |
Yang, Fan | KU Leuven |
Abadie, Alexandre | Inria |
Alvarado-Marin, Said | INRIA |
Maksimovic, Filip | INRIA |
Miyauchi, Genki | The University of Sheffield |
Jayakumar, Jessica | University of Sheffield |
Talamali, Mohamed S. | University of Sheffield |
Watteyne, Thomas | Inria |
Gross, Roderich | Technical University of Darmstadt |
Hughes, Danny | KU Leuven |
Keywords: Swarm Robotics, Embedded Systems for Robotic and Automation, Hardware-Software Integration in Robotics
Abstract: Swarm robotics focuses on designing and coordinating large groups of relatively simple robots to perform tasks in a decentralised and collective manner. The swarm provides a resilient and flexible solution for many applications. However, contemporary swarm robots have a significant power problem in that secondary (i.e. rechargeable) batteries are slow to charge and offer lifetimes of only a few years, increasing maintenance costs and pollution due to battery replacement. We imagine a different future, wherein battery-free robots powered by supercapacitors can be recharged in seconds, offer long-life autonomous operation and can rapidly pass charge between one another using trophallaxis. In pursuit of this vision, we contribute the CapBot, a battery-free swarm robot equipped with Mecanum wheels, a Cortex M4F application processor and Bluetooth Low Energy networking. The CapBot fully recharges in 16 s, offers 51 min of autonomous operation at top speed, and can transfer up to 50% of its available charge to a peer via trophallaxis in under 20 s. The CapBot is fully open source, and all software and hardware sources are available online.
|
|
10:10-10:15, Paper TuAT10.4 | Add to My Program |
Express Yourself: Enabling Large-Scale Public Events Involving Multi-Human-Swarm Interaction for Social Applications with MOSAIX |
|
Alhafnawi, Merihan | Princeton University |
Gomez-Gutierrez, Maca | We the Curios |
Hunt, Edmund Robert | University of Bristol |
Lemaignan, Séverin | PAL Robotics |
O'Dowd, Paul Jason | University of Bristol |
Hauert, Sabine | University of Bristol |
Keywords: Swarm Robotics, Social HRI, Art and Entertainment Robotics
Abstract: Robot swarms have the potential to help groups of people with social tasks, given their ability to scale to large numbers of robots and users. Developing multi-human-swarm interaction is therefore crucial to support multiple people interacting with the swarm simultaneously - which is an area that is scarcely researched, unlike single-human, single-robot or single-human, multi-robot interaction. Moreover, most robots are still confined to laboratory settings. In this paper, we present our work with MOSAIX, a swarm of robot Tiles, that facilitated ideation at a science museum. 63 robots were used as a swarm of smart sticky notes, collecting input from the public and aggregating it based on themes, providing an evolving visualization tool that engaged visitors and fostered their participation. Our contribution lies in creating a large-scale (63 robots and 294 attendees) public event, with a completely decentralized swarm system in real-life settings. We also discuss learnings we obtained that might help future researchers create multi-human-swarm interaction with the public.
|
|
10:15-10:20, Paper TuAT10.5 | Add to My Program |
MochiSwarm: A Testbed for Robotic Micro-Blimps in Realistic Environments |
|
Xu, Jiawei | Lehigh University |
Vu, Thong | Lehigh University |
S. D'Antonio, Diego | Lehigh University |
Saldaña, David | Lehigh University |
Keywords: Software-Hardware Integration for Robot Systems, Aerial Systems: Applications, Swarm Robotics
Abstract: Efficient energy management and scalability are critical for aerial robots in tasks such as pickup-and-delivery and surveillance. This paper introduces MochiSwarm, an open-source testbed of light-weight micro robotic blimps designed for multi-robot operation without external localization. We propose a modular system architecture that integrates adaptable hardware, a flexible software framework, and a detachable perception module. The hardware is designed to allow for rapid modifications and sensor integration, while the software supports multiple actuation models and robust communication between a base station and multiple blimps. We showcase a differential-drive module as an example, in which autonomy is enabled by visual servoing using the perception module. A case study of pickup-and-delivery tasks with up to 12 blimps highlights the autonomy of the MochiSwarm without relying on external infrastructures.
|
|
10:20-10:25, Paper TuAT10.6 | Add to My Program |
Exploring Unstructured Environments Using Minimal Sensing on Cooperative Nano-Drones |
|
Arias-Perez, Pedro | Universidad Politécnica De Madrid |
Gautam, Alvika | Texas a & M University |
Fernandez-Cortizas, Miguel | Universidad Politécnica De Madrid |
Perez Saura, David | Computer Vision and Aerial Robotics Group (CVAR), Universidad Politécnica de Madrid |
Saripalli, Srikanth | Texas A&M |
Campoy, Pascual | Computer Vision & Aerial Robotics Group, Universidad Politécnica de Madrid |
Keywords: Aerial Systems: Perception and Autonomy, Micro/Nano Robots, Multi-Robot Systems
Abstract: Recent advances have improved autonomous navigation and mapping under payload constraints, but current multi-robot inspection algorithms are unsuitable for nano-drones, due to their need for heavy sensors and high computational resources. To address these challenges, we introduce ExploreBug, a novel hybrid frontier range-bug algorithm designed to handle limited sensing capabilities for a swarm of nano-drones. This system includes three primary components: a mapping subsystem, an exploration subsystem, and a navigation subsystem. Additionally, an intra-swarm collision avoidance system is integrated to prevent collisions between drones. We validate the efficacy of our approach through extensive simulations and real-world exploration experiments, involving up to seven drones in simulations and three in real-world settings, across various obstacle configurations and with a maximum navigation speed of 0.75 m/s. Our tests prove that the algorithm efficiently completes exploration tasks, even with minimal sensing, across different swarm sizes and obstacle densities. Furthermore, our frontier allocation heuristic ensures an equal distribution of explored areas and paths traveled by each drone in the swarm. We publicly release the source code of the proposed system to foster further developments in mapping and exploration using autonomous nano drones.
|
|
10:25-10:30, Paper TuAT10.7 | Add to My Program |
Continuous Sculpting: Persistent Swarm Shape Formation Adaptable to Local Environmental Changes |
|
Curtis, Andrew | Northwestern |
Yim, Mark | University of Pennsylvania |
Rubenstein, Michael | Northwestern University |
Keywords: Swarms, Path Planning for Multiple Mobile Robots or Agents, Distributed Robot Systems, Shape Formation
Abstract: Despite their growing popularity, swarms of robots remain limited by the operating time of each individual. We present algorithms which allow a human to sculpt a swarm of robots into a shape that persists in space perpetually, independent of onboard energy constraints such as batteries. Robots generate a path through a shape such that robots cycle in and out of the shape. Robots inside the shape react to human initiated changes and adapt the path through the shape accordingly. Robots outside the shape recharge and return to the shape so that the shape can persist indefinitely. The presented algorithms communicate shape changes throughout the swarm using message passing and robot motion. These algorithms enable the swarm to persist through any arbitrary changes to the shape. We describe these algorithms in detail and present their performance in simulation and on a swarm of mobile robots. The result is a swarm behavior more suitable for extended duration, dynamic shape-based tasks in applications such as entertainment, agriculture, and emergency response.
|
|
TuAT11 Regular Session, 314 |
Add to My Program |
Calibration 1 |
|
|
Chair: Mueller, Andreas | Johannes Kepler University |
Co-Chair: Lee, Min Cheol | Pusan National University |
|
09:55-10:00, Paper TuAT11.1 | Add to My Program |
Kinematic Calibration of a Redundant Robot in Closed-Loop System Using Indicated Competitive Swarm Method |
|
Kim, Jaehyung | Pusan National Univ |
Lee, Min Cheol | Pusan National University |
Keywords: Calibration and Identification, Redundant Robots, Kinematics
Abstract: Previous calibration techniques often relied on specialized end-effector tracking devices, such as laser trackers, which can be expensive and impractical in specific environments. Furthermore, research on the calibration of redundant manipulators has been relatively scarce compared to that on non-redundant counterparts. To overcome these limitations, this article introduces a novel method for kinematic calibration of a damaged redundant serial robot, employing indicated competitive swarm optimization with a finite-screw deviation model. The proposed kinematic calibration method utilizes a kinematic closed-loop approach, which identifies axis deviations without using expensive end-effector tracking equipment. Moreover, a competitive-swarm-inspired optimization model is introduced to efficiently identify axis deviations, significantly reducing the required calibration points compared to prior studies and thereby facilitating calibration for redundant manipulators. Both simulations and experiments were conducted to validate the proposed method using a seven-degree-of-freedom redundant serial robot. The results demonstrate the proposed calibration method's effectiveness and practicality, and it can be readily applied to redundant robot calibration.
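For orientation, the sketch below shows one generation of a standard competitive swarm optimizer, in which particles are paired and each loser learns from its winner and the swarm mean; the paper's indicated variant and its finite-screw deviation model are not reproduced, and fitness is a placeholder for the calibration residual.

    import numpy as np

    def cso_step(X, V, fitness, phi=0.1, rng=np.random):
        """One generation of competitive swarm optimization over particles X (n x d)
        with velocities V; returns the updated swarm (minimization)."""
        n, d = X.shape
        f = fitness(X)                       # fitness of every particle
        mean = X.mean(axis=0)
        order = rng.permutation(n)
        for a, b in zip(order[0::2], order[1::2]):
            w, l = (a, b) if f[a] <= f[b] else (b, a)
            r1, r2, r3 = rng.rand(d), rng.rand(d), rng.rand(d)
            V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (mean - X[l])
            X[l] = X[l] + V[l]
        return X, V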
|
|
10:00-10:05, Paper TuAT11.2 | Add to My Program |
KFCalibNet: A KansFormer-Based Self-Calibration Network for Camera and LiDAR |
|
Xu, Zejing | Traffic Control Technology Co., Ltd |
Liu, Yiqing | University of Birmingham |
Gao, Ruipeng | Beijing Jiaotong University |
Tao, Dan | Beijing Jiaotong University |
Qi, Peng | Beijing Jiaotong University |
Zhao, Ning | University of Birmingham |
Fu, Zhe | Traffic Control Technology Co., Ltd., Beijing, China |
Keywords: Calibration and Identification, Sensor Fusion, Deep Learning Methods
Abstract: In autonomous driving and robotic navigation, multi-sensor fusion technology has become increasingly mainstream, with precise sensor calibration as its foundation. Traditional calibration methods rely on manual effort or specific targets, limiting adaptability to complex environments. Learning-based calibration methods still face challenges, such as insufficient overlap between the fields of view (FoV) of multiple sensors and suboptimal cross-modal feature association, which hinder accurate parameter regression. Unlike traditional CNN-based networks, we propose a KansFormer-based self-Calibration Network for camera and LiDAR (KFCalibNet) that replaces fixed activation functions and linear transformations with learnable nonlinear activation functions. This enables the extraction of more fine-grained features from both image and point cloud, significantly enhancing the network's robustness in scenarios with limited FoV overlap. We also employ a multihead attention (MHA) module to compute correlations between image and point cloud features, significantly enhancing cross-modal feature association. To reduce learning complexity, we designed KansFormer with FastKAN as the feedforward network, enabling deep fusion and regression of fine-grained cross-modal features for accurate extrinsic calibration. KFCalibNet achieves an absolute average calibration error of 0.0965 cm in translation and 0.0234° in rotation on the KITTI Odometry dataset, outperforming existing state-of-the-art calibration methods. Moreover, its accuracy and generalization capability have been validated across multiple real-world railway lines.
|
|
10:05-10:10, Paper TuAT11.3 | Add to My Program |
Inducing Matrix Sparsity Bias for Improved Dynamic Identification of Parallel Kinematic Manipulators Using Deep Learning |
|
Lahoud, Marcel | Italian Institute of Technology |
Gnad, Daniel | Johannes Kepler University Linz |
Marchello, Gabriele | Istituto Italiano Di Tecnologia |
D'Imperio, Mariapaola | Istituto Italiano Di Tecnologia |
Mueller, Andreas | Johannes Kepler University |
Cannella, Ferdinando | Istituto Italiano Di Tecnologia |
Keywords: Dynamics, Deep Learning Methods, Calibration and Identification
Abstract: Among the many challenges of parallel kinematic manipulators, achieving high-speed and accurate control remains crucial. Estimating their dynamic properties is essential for designing precise and efficient control schemes. Conventional methods for dynamic model identification have been effective, though deep learning approaches have historically faced limitations due to data inefficiencies. However, recent advancements in physics-informed neural networks (PINNs) offer a way to improve both control and the extraction of interpretable physical properties from these robots. In this work, we propose and validate a PINN-based dynamic model for a Delta parallel robot, specifically the ABB IRB 360-6/1600. Our approach incorporates known physical properties, such as mass matrix sparsity, to improve accuracy and computational efficiency in dynamic model identification. To the best of our knowledge, this is the first study applying PINNs to model parallel robots. The method is validated experimentally, and its performance is compared to a validated identification technique for physically consistent identification, demonstrating the effectiveness of this approach for real-world applications in parallel robots.
|
|
10:10-10:15, Paper TuAT11.4 | Add to My Program |
Infield Self-Calibration of Intrinsic Parameters for Two Rigidly Connected IMUs |
|
Huang, Can | XREAL, Inc |
Lai, Wenqian | XREAL |
Guo, Ruonan | XREAL |
Wu, Kejian | XREAL |
Keywords: Calibration and Identification, Sensor Fusion, Localization
Abstract: This paper presents a study on the infield self-calibration of two rigidly connected IMUs' intrinsic parameters, without the aid of any external sensors, equipment, or specialized procedures. Specifically, we consider the calibration of gyroscope biases, gyroscope scale factors, and accelerometer biases, using only IMU data and known extrinsics between the two IMUs. We focus on the observability analysis of this system, and show that all gyroscope intrinsic parameters and a portion of accelerometer biases are observable, with information from both IMUs and sufficient motion. Moreover, we identify the additional unobservable directions in the intrinsic parameters that arise from various degenerate motions. Finally, we validate our observability findings through numerical simulations, and assess our system's calibration accuracy using real-world data.
|
|
10:15-10:20, Paper TuAT11.5 | Add to My Program |
PlaneHEC: Efficient Hand-Eye Calibration for Multi-View Robotic Arm Via Any Point Cloud Plane Detection |
|
Wang, Ye | Xi'an Jiaotong University |
Jing, Haodong | Xi'an Jiaotong University |
Liao, Yang | Xi'an Jiaotong University |
Ma, Yongqiang | Xi'an Jiaotong University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Calibration and Identification, RGB-D Perception, Perception for Grasping and Manipulation
Abstract: Hand-eye calibration is an important task in vision-guided robotic systems and is crucial for determining the transformation matrix between the camera coordinate system and the robot end-effector. Existing methods for multi-view robotic systems usually rely on accurate geometric models or manual assistance, generalize poorly, and can be very complicated and inefficient. Therefore, in this study, we propose PlaneHEC, a generalized hand-eye calibration method that does not require complex models and can be accomplished using only depth cameras, which achieves optimal and fast calibration results using arbitrary planar surfaces such as walls and tables. PlaneHEC introduces hand-eye calibration equations based on planar constraints, which makes it strongly interpretable and generalizable. PlaneHEC also uses a comprehensive solution that starts with a closed-form solution and improves it with iterative optimization, which greatly improves accuracy. We comprehensively evaluated the performance of PlaneHEC in both simulated and real-world environments and compared the results with other point-cloud-based calibration methods, proving its superiority. Our approach achieves universal and fast calibration with an innovative design of computational models, providing a strong contribution to the development of multi-agent systems and embodied intelligence.
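A minimal sketch of the planar constraint that drives such a calibration: depth points sampled on one physical plane, observed from several robot poses, must become coplanar once mapped through the candidate hand-eye transform into the robot base frame. The paper's closed-form initialization and full formulation are not reproduced here; the residual below is only one way to express the constraint.

    import numpy as np
    from scipy.spatial.transform import Rotation as R
    from scipy.optimize import least_squares

    def plane_residuals(x, robot_poses, cam_points):
        """x = [rotvec (3), translation (3)] encodes the camera-to-end-effector
        transform; residuals are point-to-plane distances in the base frame."""
        Rx, tx = R.from_rotvec(x[:3]).as_matrix(), x[3:]
        pts = []
        for (Rb, tb), P in zip(robot_poses, cam_points):   # base<-ee pose, (N,3) points
            P_ee = P @ Rx.T + tx                           # camera frame -> end-effector frame
            pts.append(P_ee @ Rb.T + tb)                   # end-effector frame -> base frame
        pts = np.vstack(pts)
        centered = pts - pts.mean(axis=0)
        normal = np.linalg.svd(centered, full_matrices=False)[2][-1]  # best-fit plane normal
        return centered @ normal

    # sol = least_squares(plane_residuals, x0, args=(robot_poses, cam_points))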
|
|
10:20-10:25, Paper TuAT11.6 | Add to My Program |
Bayesian Optimal Experimental Design for Robot Kinematic Calibration |
|
Das, Ersin | Caltech |
Touma, Thomas | Caltech |
Burdick, Joel | California Institute of Technology |
Keywords: Calibration and Identification, Kinematics
Abstract: This paper develops a Bayesian optimal experimental design for robot kinematic calibration on $\mathbb{S}^3 \times \mathbb{R}^3$. Our method builds upon a Gaussian process approach that incorporates a geometry-aware kernel based on Riemannian Matérn kernels over $\mathbb{S}^3$. To learn the forward kinematics errors via Bayesian optimization with a Gaussian process, we define a geodesic distance-based objective function. Pointwise values of this function are sampled via noisy measurements taken with a camera using fiducial markers on the end-effector and the pose computed with the nominal kinematics. The corrected Denavit-Hartenberg parameters are obtained using an efficient quadratic program that operates on the collected data sets. The effectiveness of the proposed method is demonstrated via simulations and calibration experiments on NASA's Ocean Worlds Lander Autonomy Testbed (OWLAT).
|
|
10:25-10:30, Paper TuAT11.7 | Add to My Program |
Automatic Target-Less Camera-LiDAR Calibration from Motion and Deep Point Correspondences |
|
Petek, Kürsat | University of Freiburg |
Vödisch, Niclas | University of Freiburg |
Meyer, Johannes | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Calibration and Identification, Deep Learning Methods, Sensor Fusion
Abstract: Sensor setups of robotic platforms commonly include both camera and LiDAR as they provide complementary information. However, fusing these two modalities typically requires a highly accurate calibration between them. In this paper, we propose MDPCalib which is a novel method for camera-LiDAR calibration that requires neither human supervision nor any specific target objects. Instead, we utilize sensor motion estimates from visual and LiDAR odometry as well as deep learning-based 2D-pixel-to-3D-point correspondences that are obtained without in-domain retraining. We represent camera-LiDAR calibration as an optimization problem and minimize the costs induced by constraints from sensor motion and point correspondences. In extensive experiments, we demonstrate that our approach yields highly accurate extrinsic calibration parameters and is robust to random initialization. Additionally, our approach generalizes to a wide range of sensor setups, which we demonstrate by employing it on various robotic platforms including a self-driving perception car, a quadruped robot, and a UAV. To make our calibration method publicly accessible, we release the code on our project website at https://calibration.cs.uni-freiburg.de.
|
|
TuAT12 Regular Session, 315 |
Add to My Program |
Identifcation and Estimation for Legged Robots |
|
|
Chair: Boularias, Abdeslam | Rutgers University |
Co-Chair: Bekris, Kostas E. | Rutgers, the State University of New Jersey |
|
09:55-10:00, Paper TuAT12.1 | Add to My Program |
Legged Robot State Estimation with Invariant Extended Kalman Filter Using Neural Measurement Network |
|
Youm, Donghoon | Korea Advanced Institute of Science and Technology |
Oh, Hyunsik | Korea Advanced Institute of Science and Technology |
Choi, Suyoung | KAIST |
Kim, HyeongJun | Korea Advanced Institute of Science and Technology |
Jeon, Seunghun | KAIST |
Hwangbo, Jemin | Korean Advanced Institute of Science and Technology |
Keywords: Legged Robots, Deep Learning Methods
Abstract: This paper introduces a novel proprioceptive state estimator for legged robots that combines model-based filters with deep neural networks. In environments where vision systems are not reliable, proprioceptive state estimators become indispensable. Traditionally, proprioceptive state estimators are based on model-based approaches, which rely solely on contact foot kinematics as measurements. In contrast, learning-based approaches have obtained new measurements, such as displacement and covariance, by leveraging real-world data in a supervised manner. In this work, we develop a state estimation framework that trains a neural measurement network (NMN) to estimate the base's linear velocity and foot contact probability, which are then employed as measurements in an invariant extended Kalman filter. Our approach relies solely on simulation data for training, as it allows us to obtain extensive data easily. We address the sim-to-real gap by adapting existing learning techniques and regularization. To validate our proposed method, we conduct hardware experiments using a quadruped robot on four types of terrain: flat, debris, soft, and slippery. In our experiments, the proposed method demonstrates significant improvements over the model-based state estimator, achieving an average reduction in Absolute Trajectory Error (ATE) by 61.8% for position and 8.5% for velocity.
|
|
10:00-10:05, Paper TuAT12.2 | Add to My Program |
Physically-Consistent Parameter Identification of Robots in Contact |
|
Khorshidi, Shahram | University of Bonn |
Elnagdi, Murad | University of Bonn |
Nederkorn, Benno | Technical University of Munich |
Bennewitz, Maren | University of Bonn |
Khadiv, Majid | Technical University of Munich |
Keywords: Legged Robots, Model Learning for Control, Calibration and Identification
Abstract: Accurate inertial parameter identification is crucial for the simulation and control of robots encountering intermittent contacts with the environment. Classically, robots' inertial parameters are obtained from CAD models that are not precise (and sometimes not available, e.g., Spot from Boston Dynamics), hence requiring identification. To do that, existing methods require access to contact force measurement, a modality not present in modern quadruped and humanoid robots. This paper presents an alternative technique that utilizes joint current/torque measurements —a standard sensing modality in modern robots— to identify inertial parameters without requiring direct contact force measurements. By projecting the whole-body dynamics into the null space of contact constraints, we eliminate the dependency on contact forces and reformulate the identification problem as a linear matrix inequality that can handle physical and geometrical constraints. We compare our proposed method against a common black-box identification method using a deep neural network and show that incorporating physical consistency significantly improves the sample efficiency and generalizability of the model. Finally, we validate our method on the Spot quadruped robot across various locomotion tasks, showcasing its accuracy and generalizability in real-world scenarios over different gaits.
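A minimal sketch of the null-space projection idea described above, assuming the dynamics are available in a standard regressor form linear in the inertial parameters. It uses plain least squares rather than the paper's linear-matrix-inequality formulation with physical-consistency constraints, and all names are illustrative.

```python
import numpy as np
from scipy.linalg import null_space

def projected_identification(Y_list, tau_list, Jc_list, S):
    """Stack null-space-projected regressor equations and solve for theta.

    Y_list   : list of (nv, p) regressor matrices, linear in inertial params
    tau_list : list of (na,) joint torques (e.g., mapped from motor currents)
    Jc_list  : list of (nc, nv) contact Jacobians at each sample
    S        : (na, nv) actuation selection matrix
    """
    A_rows, b_rows = [], []
    for Y, tau, Jc in zip(Y_list, tau_list, Jc_list):
        N = null_space(Jc)             # columns span the null space of Jc
        P = N.T                        # projector that eliminates Jc^T * F_c
        A_rows.append(P @ Y)
        b_rows.append(P @ (S.T @ tau))
    A = np.vstack(A_rows)
    b = np.concatenate(b_rows)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta
```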
|
|
10:05-10:10, Paper TuAT12.3 | Add to My Program |
Contact Force Estimation for a Leg-Wheel Transformable Robot with Varying Contact Points |
|
Shen, Yi-Syuan | National Taiwan University |
Yu, Wei-Shun | National Taiwan University |
Lin, Pei-Chun | National Taiwan University |
Keywords: Legged Robots, Dynamics, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Accurate estimation of contact forces is crucial for the effective control of quadrupedal robots, especially in complex locomotion scenarios. In this paper, we introduce a novel force estimation technique for robots equipped with transformable leg-wheels. Unlike conventional methods that focus on forces at specific contact points, our approach expresses varying contact points through a simplified kinematic model and derives the corresponding Jacobian matrices. This allows us to apply the virtual work method to evaluate contact forces across the entire surface of the leg-wheel, including the tips, sides, and other contact regions. This adaptability is particularly advantageous in hybrid locomotion modes, where different parts of the leg-wheel interact with the terrain. The proposed method is highly efficient, relying solely on motor current and position feedback without the need for additional sensors. We validate our approach through simulations and real-world experiments, demonstrating its accuracy, robustness, and applicability under diverse operational conditions.
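For a rigid model, the virtual work relation mentioned above reduces to solving tau = J^T F for the contact force. A small illustrative sketch, with hypothetical torque-constant and friction terms:

```python
import numpy as np

def estimate_contact_force(J_contact, motor_current, kt, tau_friction=0.0):
    """Estimate the contact force from joint torques via the virtual work principle.

    J_contact     : (m, n) Jacobian of the assumed contact point on the leg-wheel
    motor_current : (n,) measured motor currents
    kt            : (n,) torque constants mapping current to joint torque
    tau_friction  : modeled friction torque to subtract (scalar or (n,))
    """
    tau = kt * motor_current - tau_friction      # joint torques from current feedback
    # Virtual work: tau = J^T F  ->  solve for F in the least-squares sense
    F, *_ = np.linalg.lstsq(J_contact.T, tau, rcond=None)
    return F
```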
|
|
10:10-10:15, Paper TuAT12.4 | Add to My Program |
Simultaneous Collision Detection and Force Estimation for Dynamic Quadrupedal Locomotion |
|
Zhou, Ziyi | Georgia Institute of Technology |
Di Cairano, Stefano | Mitsubishi Electric Research Laboratories |
Wang, Yebin | Mitsubishi Electric Research Laboratories |
Berntorp, Karl | Mitsubishi Electric Research Labs |
Keywords: Legged Robots, Motion Control
Abstract: In this paper, we address the simultaneous collision detection and force estimation problem for quadrupedal locomotion using joint encoder information and the robot dynamics only. We design an interacting multiple-model Kalman filter (IMM-KF) that estimates the external force exerted on the robot and multiple possible contact modes. The method is invariant to any gait pattern design. Our approach leverages pseudo-measurement information of the external forces based on the robot dynamics and encoder information. Based on the estimated contact mode and external force, we design a reflex motion and an admittance controller for the swing leg to avoid collisions by adjusting the leg’s reference motion. Additionally, we implement a force-adaptive model predictive controller to enhance balancing. Simulation ablation studies and experiments show the efficacy of the approach.
|
|
10:15-10:20, Paper TuAT12.5 | Add to My Program |
PROBE: Proprioceptive Obstacle Detection and Estimation While Navigating in Clutter |
|
Metha Ramesh, Dhruv | Rutgers University |
Sivaramakrishnan, Aravind | Amazon Fulfillment Technology & Robotics |
Keskar, Shreesh | Rutgers University |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Yu, Jingjin | Rutgers University |
Boularias, Abdeslam | Rutgers University |
Keywords: Legged Robots, Sensorimotor Learning, Mapping
Abstract: In critical applications, including search-and-rescue in degraded environments, blockages can be prevalent and prevent the effective deployment of certain sensing modalities, particularly vision, due to occlusion and the constrained range of view of onboard camera sensors. To enable robots to tackle these challenges, we propose a new approach, Proprioceptive Obstacle Detection and Estimation while navigating in clutter (PROBE), which instead relies only on the robot’s proprioception to infer the presence or absence of occluded rectangular obstacles while predicting their dimensions and poses in SE(2). The proposed approach is a Transformer neural network that receives as input a history of applied torques and sensed whole-body movements of the robot and returns a parameterized representation of the obstacles in the environment. The effectiveness of PROBE is evaluated on simulated environments in Isaac Gym and with a real Unitree Go1 quadruped robot. The project webpage can be found at https://dhruvmetha.github.io/legged-probe/
|
|
10:20-10:25, Paper TuAT12.6 | Add to My Program |
Fast Decentralized State Estimation for Legged Robot Locomotion Via EKF and MHE |
|
Xiong, Xiaobin | University of Wisconsin Madison |
Kang, Jiarong | University of Wisconsin Madison |
Wang, Yi | Columbia University |
Keywords: Legged Robots, Sensor Fusion
Abstract: In this paper, we present a fast and decentralized state estimation framework for the control of legged locomotion. The nonlinear estimation of the floating base states is decentralized to an orientation estimation via Extended Kalman Filter (EKF) and a linear velocity estimation via Moving Horizon Estimation (MHE). The EKF fuses the inertial sensor with vision to estimate the floating base orientation. The MHE uses the estimated orientation with all the sensors within a time window in the past to estimate the linear velocities based on a time-varying linear dynamics formulation of the states of interest with state constraints. More importantly, a marginalization method based on the optimization structure of the full information filter (FIF) is proposed to convert the equality-constrained FIF to an equivalent MHE. This decoupling of state estimation promotes the desired balance of computation efficiency, accuracy of estimation, and the inclusion of state constraints. The proposed method is shown to be capable of providing accurate state estimation for several legged robots, including the highly dynamic hopping robot PogoX, the bipedal robot Cassie, and the quadrupedal robot Unitree Go1, at a frequency of 200 Hz with a window interval of 0.1 s.
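To give a feel for the MHE step, here is a sketch of an unconstrained sliding-window least-squares problem for the linear velocity. The paper's formulation additionally handles state constraints and an FIF-based marginalization (arrival cost), so this is only a simplified stand-in with illustrative weights.

```python
import numpy as np

def mhe_velocity(a_world, v_meas, dt, w_dyn=1.0, w_meas=1.0):
    """Unconstrained sliding-window least squares for base linear velocity.

    a_world : (N, 3) gravity-compensated accelerations in the world frame
    v_meas  : (N + 1, 3) velocity pseudo-measurements (e.g., leg odometry)
    Decision variables are v_0 ... v_N, stacked as one vector of length 3*(N+1).
    """
    N = a_world.shape[0]
    n = 3 * (N + 1)
    rows, rhs = [], []
    for k in range(N):                       # dynamics: v_{k+1} - v_k = a_k * dt
        M = np.zeros((3, n))
        M[:, 3 * k:3 * k + 3] = -np.eye(3)
        M[:, 3 * (k + 1):3 * (k + 1) + 3] = np.eye(3)
        rows.append(w_dyn * M)
        rhs.append(w_dyn * a_world[k] * dt)
    for k in range(N + 1):                   # measurements: v_k ~ v_meas[k]
        M = np.zeros((3, n))
        M[:, 3 * k:3 * k + 3] = np.eye(3)
        rows.append(w_meas * M)
        rhs.append(w_meas * v_meas[k])
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return sol.reshape(N + 1, 3)[-1]         # most recent velocity estimate
```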
|
|
TuAT13 Regular Session, 316 |
Add to My Program |
Assistive Robotics 1 |
|
|
Chair: Cabrera, Maria Eugenia | University of Massachusetts Lowell |
Co-Chair: Xiao, Chenzhang | University of Illinois at Urbana-Champaign |
|
09:55-10:00, Paper TuAT13.1 | Add to My Program |
Elderly Bodily Assistance Robot (E-BAR): A Robot System for Body-Weight Support, Ambulation Assistance, and Fall Catching, without the Use of a Harness |
|
Bolli, Roberto | MIT |
Asada, Harry | MIT |
Keywords: Physically Assistive Devices, Domestic Robotics, Mechanism Design
Abstract: As over 11,000 people turn 65 each day in the U.S., our country, like many others, is facing growing challenges in caring for elderly persons, further exacerbated by a major shortfall of care workers. To address this, we introduce an eldercare robot (E-BAR) capable of lifting a human body, assisting with postural changes/ambulation, and catching a user during a fall, all without the use of any wearable device or harness. Our robot is the first to integrate these 3 tasks, and is capable of lifting the full weight of a human outside of the robot’s base of support (across gaps and obstacles). In developing E-BAR, we interviewed nurses and care professionals and conducted user-experience tests with elderly persons. Based on their functional requirements, the design parameters were optimized using a computational model and trade-off analysis. We developed a novel 18-bar linkage to lift a person from a floor to a standing position along a natural trajectory, while providing maximal mechanical advantage at key points. An omnidirectional, non-holonomic drive base, in which the wheels could be oriented to passively maximize floor grip, enabled the robot to resist lateral forces without active compensation. With a minimum width of 38 cm, the robot’s small footprint allowed it to navigate the typical home environment. Four airbags were used to catch and stabilize a user during a fall in less than 250 ms. We demonstrate E-BAR's utility in multiple typical home scenarios, including getting into/out of a bathtub, bending to reach for objects, sit-to-stand transitions, and ambulation.
|
|
10:00-10:05, Paper TuAT13.2 | Add to My Program |
A Cane-Mounted System for Dynamic Orientation Prediction for Correcting Incorrect Cane-Tapping by Visually Challenged Persons |
|
Singh, Gagandeep | Yardi School of Artificial Intelligence, Indian Institute of Tec |
Nadir, Mohd | Indian Institute of Technology |
Chanana, Piyush | School of Information Technology, Indian Institute of Technology |
Paul, Rohan | Indian Institute of Technology Delhi |
Keywords: Wearable Robotics, AI-Based Methods, Health Care Management
Abstract: People with visual impairments rely on Electronic Travel Aids (ETAs), such as sensor-equipped guide canes, for safe and effective navigation. Misalignment or improper handling of these devices can reduce their effectiveness, increasing the risk of collisions and injuries. This paper presents an AI-based embedded system designed to predict and correct the orientation of a guide cane in real time. By integrating an Inertial Measurement Unit (IMU) with a neural network, the system continuously monitors the cane's lateral angle and orientation while providing feedback to help the user self-correct. The feedback is proportional to the degree of error, guiding users to maintain proper cane positioning during mobility. The device logs data that can be visualized remotely, offering mobility trainers valuable insights into the user's navigation patterns. Evaluation by visually impaired users demonstrated that the system effectively aided in real-time orientation correction. This system represents a significant advancement in improving the safety and independence of individuals with visual impairments through wearable ETAs.
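A toy sketch of error-proportional feedback of the kind described above; the deadband, gain, and output scaling are invented for illustration and are not the authors' tuned values.

```python
def cane_feedback(predicted_angle_deg, target_angle_deg=0.0,
                  deadband_deg=5.0, gain=0.02):
    """Map the predicted lateral cane angle error to a feedback level in [0, 1]."""
    error = predicted_angle_deg - target_angle_deg
    if abs(error) <= deadband_deg:
        return 0.0                            # within tolerance: no feedback
    level = gain * (abs(error) - deadband_deg)
    return min(level, 1.0)                    # intensity proportional to error
```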
|
|
10:05-10:10, Paper TuAT13.3 | Add to My Program |
SRL-Gym: A Morphology and Controller Co-Optimization Framework for Supernumerary Robotic Limbs in Load-Bearing Locomotion |
|
Meng, Lingyi | University of Chinese Academy of Sciences |
Zheng, Enhao | Institute of Automation, Chinese Academy of Sciences |
Li, Xiong | Tencent |
Zhang, Zhong | City University of Hong Kong |
Keywords: Wearable Robotics, Physically Assistive Devices, Human and Humanoid Motion Analysis and Synthesis
Abstract: Supernumerary Robotic Limbs (SRLs) can assist human motions by providing extra degrees of freedom (DoFs) and body support. The extra DoFs lead to a larger design space in structure and control policies, which is complex and time-consuming to explore with the traditional manual design process. In this pilot study, we proposed a novel morphology-controller co-optimization framework to automatically generate and optimize the SRL structure based on the locomotion task input. There are two layers, with the inner layer optimizing the controller to achieve human-robot synchronization, and the outer layer optimizing the morphology parameters for performance enhancement. We validated the proposed framework through simulations using SRLs in a load-bearing locomotion task. The results demonstrate that the controller optimization can automatically generate realistic gait patterns and stable human-robot synchronization, while the SRLs significantly improve the user’s load-bearing capability. Additionally, the co-optimization process reduces both the manufacturing cost of the SRL and the torque on the joints. This approach shows potential for exhaustive exploration of the design space and acceleration of the design process. Future work will focus on a more realistic SRL generative design model and on achieving Sim2Real transfer for practical use.
|
|
10:10-10:15, Paper TuAT13.4 | Add to My Program |
Adaptive Walker: User Intention and Terrain Aware Intelligent Walker with High-Resolution Tactile and IMU Sensor |
|
Choi, Yunho | Gwangju Institute of Science and Technology |
Hwang, Seokhyun | University of Washington |
Moon, Jaeyoung | Gwangju Institute of Science and Technology |
Lee, Hosu | Gyeongsang National University |
Yeo, Dohyeon | Gwangju Institute of Science and Technology |
Seong, Minwoo | Gwangju Institute of Science and Technology |
Luo, Yiyue | University of Washington |
Kim, SeungJun | Gwangju Institute of Science and Technology |
Matusik, Wojciech | MIT |
Rus, Daniela | MIT |
Kim, Kyung-Joong | Gwangju Institute of Science and Technology |
Keywords: Physically Assistive Devices, Rehabilitation Robotics, Machine Learning for Robot Control
Abstract: In this paper, we present an adaptive walker system designed to address limitations in current intelligent walker technologies. While recent advancements have been made in this field, existing systems often struggle to seamlessly interpret user intent for speed control and lack adaptability across diverse scenarios and terrain. Our proposed solution incorporates high-resolution tactile sensors, deep learning algorithms, IMU sensors, and linear motors to dynamically adjust to the user’s intentions and terrain changes. The system is capable of predicting the user’s desired speed with an error margin of only 20.99%, relying solely on tactile input from hand and arm contact points. Additionally, it maintains the walker’s horizontal stability with an error of less than 1 degree by adjusting leg lengths in response to variations in ground angle. This adaptive walker enhances user safety and comfort, particularly for individuals with reduced strength or cognitive abilities, and offers reliable assistance on uneven terrain such as uphill and downhill paths.
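As a rough geometric illustration of leveling by leg-length adjustment, the sketch below maps measured pitch and roll to per-leg offsets. The sign conventions, leg ordering, and planar geometry are assumptions for illustration, not the authors' mechanism model.

```python
import numpy as np

def leveling_leg_offsets(pitch_rad, roll_rad, wheelbase, track_width):
    """Compute per-leg length offsets that level the walker frame.

    Positive offsets extend a leg. Legs ordered: front-left, front-right,
    rear-left, rear-right. Simple planar geometry only.
    """
    dz_pitch = 0.5 * wheelbase * np.tan(pitch_rad)    # front/rear correction
    dz_roll = 0.5 * track_width * np.tan(roll_rad)    # left/right correction
    return np.array([
        -dz_pitch - dz_roll,   # front-left
        -dz_pitch + dz_roll,   # front-right
        +dz_pitch - dz_roll,   # rear-left
        +dz_pitch + dz_roll,   # rear-right
    ])
```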
|
|
10:15-10:20, Paper TuAT13.5 | Add to My Program |
IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition |
|
Liu, Rui | University of Maryland |
Mahammad, Zahiruddin | University of Maryland College Park |
Bhaskar, Amisha | University of Maryland, College Park |
Tokekar, Pratap | University of Maryland |
Keywords: Human-Centered Robotics, Representation Learning, Imitation Learning
Abstract: Robotic assistive feeding holds significant promise for improving the quality of life for individuals with eating disabilities. However, acquiring diverse food items under varying conditions and generalizing to unseen food presents unique challenges. Existing methods that rely on surface-level geometric information (e.g., bounding box and pose) derived from visual cues (e.g., color, shape, and texture) often lack adaptability and robustness, especially when foods share similar physical properties but differ in visual appearance. We employ imitation learning (IL) to learn a policy for food acquisition. Existing methods employ IL or Reinforcement Learning (RL) to learn a policy based on off-the-shelf image encoders such as ResNet-50. However, such representations are not robust and struggle to generalize across diverse acquisition scenarios. To address these limitations, we propose a novel approach, IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of IL for food acquisition. Our approach captures food types and physical properties (e.g., solid, semi-solid, granular, liquid, and mixture), models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points and assess bowl fullness. IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot’s capability to handle diverse food acquisition scenarios. Experiments on a real robot demonstrate our approach’s robustness and adaptability across various foods and bowl configurations, including zero-shot generalization to unseen settings. Our approach achieves an improvement of up to 35% in success rate compared with the best-performing baseline. More details can be found on our website https://ruiiu.github.io/imrl.
|
|
10:20-10:25, Paper TuAT13.6 | Add to My Program |
An Interactive Hands-Free Controller for a Riding Ballbot to Enable Simple Shared Control Tasks |
|
Xiao, Chenzhang | University of Illinois at Urbana-Champaign |
Song, Seung Yun | University of Illinois at Urbana-Champaign |
Chen, Yu | University of Illinois at Urbana-Champaign |
Mansouri, Mahshid | University of Illinois at Urbana-Champaign |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Norris, William | University of Illinois Urbana-Champaign |
Hsiao-Wecksler, Elizabeth T. | University of Illinois at Urbana-Champaign |
Keywords: Physically Assistive Devices, Physical Human-Robot Interaction, Human-Centered Robotics
Abstract: Our team developed a riding ballbot (called PURE) that is dynamically stable, omnidirectional, and driven by lean-to-steer control. A hands-free admittance control scheme (HACS) was previously integrated to allow riders with different torso functions to control the robot's movements via torso leaning and twisting. Such an interface requires motor coordination skills and could result in collisions with obstacles due to low proficiency. Hence, a shared controller (SC) that limits the speed of PURE could be helpful to ensure the safety of riders. However, the self-balancing dynamics of PURE could result in a weak control authority of its motion, in which the torso motion of the rider could easily result in poor tracking of the command speed dictated by the shared controller. Thus, we proposed an interactive hands-free admittance control scheme (iHACS), which added two modules to the HACS to improve the speed-tracking performance of PURE: control gain personalization module and interaction compensation module. Human riding tests of simple tasks, idle-keeping and speed-limiting, were conducted to compare the performance of HACS and iHACS. Two manual wheelchair users and two able-bodied individuals participated in this study. They were instructed to use "adversarial" torso motions that would tax the SC's ability to keep the ballbot idling or below a set speed, i.e., competing objectives between rider and robot. In the idle-keeping tasks, iHACS demonstrated minimal translational motion and low command speed tracking RMSE, even with significant torso lean angles. During the speed-limiting task, where the commanded speed was saturated at 0.5 m/s, the system achieved an average maximum speed of 1.1 m/s with iHACS, compared with that of over 1.9 m/s with HACS. These results suggest that iHACS can enhance PURE's control authority over the rider, which enables PURE to provide physical interactions back to the rider and results in a collaborative rider-robot synergy.
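A compact sketch of an admittance law with a shared-control speed limit, in the spirit of HACS/iHACS as described above; the virtual mass, damping, and saturation values are placeholders, and the interaction-compensation and gain-personalization modules of iHACS are not modeled.

```python
import numpy as np

def admittance_speed_command(lean_input, v_prev, dt, m_virtual=60.0,
                             b_virtual=25.0, v_limit=0.5):
    """One step of a hands-free admittance law with a shared-control speed limit.

    lean_input : torso lean/twist interpreted as a virtual force input
    v_prev     : previous commanded speed (m/s)
    """
    v_dot = (lean_input - b_virtual * v_prev) / m_virtual   # M*v_dot + B*v = F
    v_cmd = v_prev + v_dot * dt
    return float(np.clip(v_cmd, -v_limit, v_limit))         # shared-control saturation
```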
|
|
10:25-10:30, Paper TuAT13.7 | Add to My Program |
Garment Diffusion Models for Robot-Assisted Dressing |
|
Kotsovolis, Stelios | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Physical Human-Robot Interaction, Model Learning for Control, Human-Centered Robotics
Abstract: Robots have the potential to assist people with disabilities and the elderly. One of the most common and burdensome tasks for caregivers is dressing. Two challenges of robot-assisted dressing are modeling the dynamics of garments and handling visual occlusions that obstruct the perception of the full state of the garment due to the proximity between the garment, the robot, and the human. In this paper, we propose a diffusion-based dynamics model for garments during robot-assisted dressing that can deal with partial point cloud observations. The diffusion model, conditioned on the observation and the robot's action, is used to predict a full point cloud of the garment's opening of the future state. The model is utilized in a model predictive controller that is trained iteratively with model-based reinforcement learning. In our experiments, we examine a common problem of dressing: the insertion of a garment's sleeve on an arm. As demonstrated by the performed experiments, the proposed diffusion-based model predictive controller can be effectively used for robot-assisted dressing and handle visual occlusions. Moreover, our approach is highly sample-efficient. Specifically, the controller achieved 91.2% success rate in the examined dressing task with less than 100 sampled trajectories. Real-world experiments demonstrate that the proposed method can adapt to the sim-to-real gap and generalize well to novel garments and configurations of the body.
|
|
TuAT14 Regular Session, 402 |
Add to My Program |
Tracking and Prediction 1 |
|
|
Chair: Dionigi, Alberto | University of Perugia |
Co-Chair: Tang, Chen | University of California Berkeley |
|
09:55-10:00, Paper TuAT14.1 | Add to My Program |
Pedestrian Intention and Trajectory Prediction in Unstructured Traffic Using IDD-PeD |
|
Bokkasam, Ruthvik | IIIT Hyderabad |
Gangisetty, Shankar | IIIT Hyderabad |
Abdul Hafez, A. H. | Hasan Kalyoncu University |
Jawahar, C.V. | IIIT, Hyderabad |
Keywords: Data Sets for Robotic Vision, Vision-Based Navigation, Intelligent Transportation Systems
Abstract: With the rapid advancements in autonomous driving, accurately predicting pedestrian behavior has become essential for ensuring safety in complex and unpredictable traffic conditions. The growing interest in this challenge highlights the need for comprehensive datasets that capture unstructured environments, enabling the development of more robust prediction models to enhance pedestrian safety and vehicle navigation. In this paper, we introduce an Indian driving pedestrian dataset designed to address the complexities of modeling pedestrian behavior in unstructured environments, such as illumination changes, occlusion of pedestrians, unsignalized scene types and vehicle-pedestrian interactions. The dataset provides high-level and detailed low-level comprehensive annotations focused on pedestrians requiring the ego-vehicle’s attention. Evaluation of the state-of-the-art intention prediction methods on our dataset shows a significant performance drop of up to 15%, while trajectory prediction methods underperform with an increase of up to 1208 MSE, compared with their performance on standard pedestrian datasets. Additionally, we present exhaustive quantitative and qualitative analysis of intention and trajectory baselines. We believe that our dataset will open new challenges for the pedestrian behavior research community to build robust models. Project Page: https://cvit.iiit.ac.in/research/projects/cvit-projects/iddped
|
|
10:00-10:05, Paper TuAT14.2 | Add to My Program |
Visual-Linguistic Reasoning for Pedestrian Trajectory Prediction |
|
Shenkut, Dereje | Carnegie Mellon University |
Vijaya Kumar, B.V.K | Carnegie Mellon University |
Keywords: Intelligent Transportation Systems
Abstract: Accurate prediction of pedestrian trajectories is crucial as autonomous vehicles become more prevalent on roads. The dynamic nature of urban environments and the less predictable behavior of pedestrians present significant challenges in developing reliable prediction models. Earlier methods relying on recurrent neural networks (RNNs) and long short-term memory (LSTM) networks have shown promise, but often fail to fully take advantage of the rich visual and contextual information available in real-world scenarios. Recent advances in vision-language models (VLMs) offer new opportunities to improve pedestrian trajectory prediction by incorporating multimodal reasoning capabilities. This paper introduces a novel approach that uses a powerful pre-trained VLM to improve the estimation of pedestrian trajectories. Specifically, we first enable learning of semantically useful scene context and high-level reasoning features via vision-language model fine-tuning on specific prompts using road scenes with pedestrians. Next, with the learned VLM features and the pedestrian's past trajectory history, we predict future trajectories using an encoder-decoder head. Through experiments with the first-person datasets JAAD and PIE, we show that utilizing visual-linguistic semantics via a pre-trained vision-language model outperforms previous methods in both deterministic and stochastic trajectory prediction setups.
|
|
10:05-10:10, Paper TuAT14.3 | Add to My Program |
Curb Your Attention: Causal Attention Gating for Robust Trajectory Prediction in Autonomous Driving |
|
Ahmadi, Ehsan | University of Alberta |
Mercurius, Ray Coden | University of Toronto |
Mohamad Alizadeh Shabestary, Soheil | Huawei Technologies Canada |
Rezaee, Kasra | Huawei Technologies |
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation, Autonomous Agents
Abstract: Trajectory prediction models in autonomous driving are vulnerable to perturbations from non-causal agents whose actions should not affect the ego-agent’s behavior. Such perturbations can lead to incorrect predictions of other agents’ trajectories, potentially compromising the safety and efficiency of the ego-vehicle’s decision-making process. Motivated by this challenge, we propose Causal tRajecTory predICtion (CRiTIC), a novel model that utilizes a causal discovery network to identify inter-agent causal relations over a window of past time steps. To incorporate discovered causal relationships, we propose a novel Causal Attention Gating mechanism to selectively filter information in the proposed Transformer-based architecture. We conduct extensive experiments on two autonomous driving benchmark datasets to evaluate the robustness of our model against non-causal perturbations and its generalization capacity. Our results indicate that the robustness of predictions can be improved by up to 54% without a significant detriment to prediction accuracy. Lastly, we demonstrate the superior domain generalizability of the proposed model, which achieves up to 29% improvement in cross-domain performance. These results underscore the potential of our model to enhance both robustness and generalization capacity for trajectory prediction in diverse autonomous driving domains.
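The gating idea can be pictured as ordinary attention whose weights are multiplied by per-agent causal scores before renormalization. A small numpy sketch, with the gate assumed to come from some causal discovery module that is not reproduced here; this is an illustration of the mechanism, not the CRiTIC architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(q, K, V, causal_gate):
    """Single-query attention over agent tokens with a causal gate in [0, 1].

    q           : (d,) query for the target agent
    K, V        : (n_agents, d) keys and values for surrounding agents
    causal_gate : (n_agents,) per-agent scores from a causal discovery module;
                  agents judged non-causal receive gates near zero.
    """
    scores = K @ q / np.sqrt(q.size)
    attn = softmax(scores)
    attn = attn * causal_gate                  # suppress non-causal agents
    attn = attn / (attn.sum() + 1e-9)          # renormalize
    return attn @ V
```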
|
|
10:10-10:15, Paper TuAT14.4 | Add to My Program |
Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking |
|
Ishaq, Ayesha | Mohamed Bin Zayed University of Artificial Intelligence |
Boudjoghra, Mohamed El Amine | Mohamed Bin Zayed University of Artificial Intelligence |
Lahoud, Jean | MBZUAI |
Khan, Fahad | Linkoping University |
Khan, Salman | CSIRO |
Cholakkal, Hisham | MBZUAI |
Anwer, Rao | MBZUAI |
Keywords: Visual Tracking, Visual Learning, Deep Learning for Visual Perception
Abstract: 3D multi-object tracking plays a critical role in autonomous driving by enabling the real-time monitoring and prediction of multiple objects' movements. Traditional 3D tracking systems are typically constrained by predefined object categories, limiting their adaptability to novel, unseen objects in dynamic environments. To address this limitation, we introduce open-vocabulary 3D tracking, which extends the scope of 3D tracking to include objects beyond predefined categories. We formulate the problem of open-vocabulary 3D tracking and introduce dataset splits designed to represent various open-vocabulary scenarios. We propose a novel approach that integrates open-vocabulary capabilities into a 3D tracking framework, allowing for generalization to unseen object classes. Our method effectively reduces the performance gap between tracking known and novel objects through strategic adaptation. Experimental results demonstrate the robustness and adaptability of our method in diverse outdoor driving scenarios. To the best of our knowledge, this work is the first to address open-vocabulary 3D tracking, presenting a significant advancement for autonomous systems in real-world settings.
|
|
10:15-10:20, Paper TuAT14.5 | Add to My Program |
Asynchronous Multi-Object Tracking with an Event Camera |
|
Apps, Angus | Australian National University |
Wang, Ziwei | Australian National University |
Perejogin, Vladimir | Defence Science and Technology Organisation |
Molloy, Timothy L. | Australian National University |
Mahony, Robert | Australian National University |
Keywords: Object Detection, Segmentation and Categorization, Visual Tracking, Data Sets for Robotic Vision
Abstract: Event cameras are ideal sensors for enabling robots to detect and track objects in highly dynamic environments due to their low latency output, high temporal resolution, and high dynamic range. In this paper, we present the Asynchronous Event Multi-Object Tracking (AEMOT) algorithm for detecting and tracking multiple objects by processing individual raw events asynchronously. AEMOT detects salient event blob features by identifying regions of consistent optical flow using a novel Field of Active Flow Directions built from the Surface of Active Events. Detected features are tracked as candidate objects using the recently proposed Asynchronous Event Blob (AEB) tracker in order to construct small intensity patches of each candidate object. A novel learnt validation stage promotes or discards candidate objects based on classification of their intensity patches, with promoted objects having their position, velocity, size, and orientation estimated at their event rate. We evaluate AEMOT on a new Bee Swarm Dataset, where it tracks dozens of small bees with precision and recall performance exceeding that of alternative event-based detection and tracking algorithms by over 37%. Source code and the labelled event Bee Swarm Dataset will be open sourced.
|
|
10:20-10:25, Paper TuAT14.6 | Add to My Program |
Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving |
|
Zhang, Xinyu | Tongji University |
Zhou, Zewei | University of California, Los Angeles |
Wang, Zaoyi | Tongji University |
Ji, Yangjie | Tongji University, College of Automotive Studies |
Huang, Yanjun | Tongji University |
Chen, Hong | Tongji University |
Keywords: Computer Vision for Transportation, Sensor Fusion, Deep Learning Methods
Abstract: Vehicle-to-everything (V2X) technologies have become an ideal paradigm to extend the perception range and see through occlusion. Existing efforts focus on single-frame cooperative perception; however, how to capture temporal cues between frames with V2X to facilitate the prediction task, and even the planning task, is still underexplored. In this paper, we introduce Co-MTP, a general cooperative trajectory prediction framework with multi-temporal fusion for autonomous driving, which leverages the V2X system to fully capture the interaction among agents in both the history and future domains to benefit planning. In the history domain, V2X can complement the incomplete history trajectory in single-vehicle perception, and we design a heterogeneous graph transformer to learn the fusion of the history features from multiple agents and capture the history interaction. Moreover, the goal of prediction is to support future planning. Thus, in the future domain, V2X can provide the prediction results of surrounding objects, and we further extend the graph transformer to capture the future interaction among the ego planning and the other vehicles' intentions and obtain the final future scenario state under a certain planning action. We evaluate the Co-MTP framework on the real-world dataset V2X-Seq, and the results show that Co-MTP achieves state-of-the-art performance and that both history and future fusion can greatly benefit prediction. Our code is available on our project website: https://xiaomiaozhang.github.io/Co-MTP/
|
|
10:25-10:30, Paper TuAT14.7 | Add to My Program |
Predictive Spliner: Data-Driven Overtaking in Autonomous Racing Using Opponent Trajectory Prediction |
|
Baumann, Nicolas | ETH |
Ghignone, Edoardo | ETH |
Hu, Cheng | Zhejiang University |
Hildisch, Benedict | ETH Zurich |
Hämmerle, Tino | ETH Zürich |
Bettoni, Alessandro | University |
Carron, Andrea | ETH Zurich |
Xie, Lei | State Key Laboratory of Industrial Control Technology, Zhejiang |
Magno, Michele | ETH Zurich |
Keywords: Wheeled Robots, Collision Avoidance, Embedded Systems for Robotic and Automation
Abstract: Head-to-head racing against opponents is a challenging and emerging topic in the domain of autonomous racing. We propose Predictive Spliner, a data-driven overtaking planner designed to enhance competitive performance by anticipating opponent behavior. Using GP regression, the method learns and predicts the opponent’s trajectory, enabling the ego vehicle to calculate safe and effective overtaking maneuvers. Experimentally validated on a 1:10 scale autonomous racing platform, Predictive Spliner outperforms commonly employed overtaking algorithms by overtaking opponents at up to 83.1% of its own speed, being on average 8.4% faster than the previous best-performing method. Additionally, it achieves an average success rate of 84.5%, which is 47.6% higher than the previous best-performing method. The proposed algorithm maintains computational efficiency, making it suitable for real-time robotic applications. These results highlight the potential of Predictive Spliner to enhance the performance and safety of autonomous racing vehicles. The code for Predictive Spliner is available at: https://github.com/ForzaETH/predictive-spliner.
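A minimal sketch of GP-based opponent prediction using scikit-learn; the kernel choice and the one-dimensional progress parameterization are simplifying assumptions rather than the Predictive Spliner implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

def predict_opponent(t_hist, s_hist, t_future):
    """Fit a GP to the opponent's observed progress and extrapolate it.

    t_hist   : (N,) past timestamps
    s_hist   : (N,) observed opponent track progress (or lateral offset)
    t_future : (M,) query times over the planner's overtaking horizon
    """
    kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(1e-3)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(t_hist.reshape(-1, 1), s_hist)
    mean, std = gp.predict(t_future.reshape(-1, 1), return_std=True)
    return mean, std      # std can inflate safety margins around the opponent
```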
|
|
TuAT15 Regular Session, 403 |
Add to My Program |
Surgical Robotics: Continuum Robots |
|
|
Chair: Rodrigue, Hugo | Seoul National University |
Co-Chair: Park, Sukho | DGIST |
|
09:55-10:00, Paper TuAT15.1 | Add to My Program |
Workspace Expansion of Magnetic Soft Continuum Robot Using Movable Opposite Magnet |
|
Park, Joo-Won | DGIST |
Kee, Hyeonwoo | DGIST |
Park, Sukho | DGIST |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Soft Robot Materials and Design, Micro/Nano Robots
Abstract: Recently, in the minimally invasive surgery field, magnetic soft continuum robots (MSCRs) have been actively studied, which are driven by an external magnetic field acting on a small magnet attached to the robot tip. In addition, an MSCR with opposite magnets (MSCR-OM) has been reported for high dexterity, which has a small permanent magnet attached to the end of the MSCR and an additional opposite magnet fixed in the middle. To overcome the limitations of the existing MSCR and MSCR-OM and improve the workspace, we proposed a magnetic soft continuum robot with a movable opposite magnet (MSCR-MOM) with a 2.2 mm diameter and 10 cm length, that can change the position of the opposite magnet. In this study, an analytical model of the proposed MSCR-MOM was presented, and through simulation and various experiments, its characteristics were analyzed and the workspace expansion was validated. In addition, the clinical applicability of the proposed MSCR-MOM was verified through phantom experiments. In the future, we expect that the proposed MSCR-MOM will be developed into a medical catheter that can be applied in various procedures through miniaturization and various clinical application studies.
|
|
10:00-10:05, Paper TuAT15.2 | Add to My Program |
Sim4EndoR: A Reinforcement Learning Centered Simulation Platform for Task Automation of Endovascular Robotics |
|
Yao, Tianliang | Tongji University |
Ban, Madaoji | The University of Hong Kong |
Lu, Bo | Soochow University |
Pei, Zhiqiang | University of Shanghai for Science and Technology |
Qi, Peng | Tongji University |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Modeling, Control, and Learning for Soft Robots
Abstract: Robotic-assisted percutaneous coronary intervention (PCI) holds considerable promise for elevating precision and safety in cardiovascular procedures. Nevertheless, current systems heavily depend on human operators, resulting in variability and the potential for human error. To tackle these challenges, Sim4EndoR, an innovative reinforcement learning (RL) based simulation environment, is first introduced to bolster task-level autonomy in PCI. This platform offers a comprehensive and risk-free environment for the development, evaluation, and refinement of potential autonomous systems, enhancing data collection efficiency and minimizing the need for costly hardware trials. A notable aspect of the groundbreaking Sim4EndoR is its reward function, which takes into account the anatomical constraints of the vascular environment, utilizing the geometric characteristics of vessels to steer the learning process. By seamlessly integrating advanced physical simulations with neural network-driven policy learning, Sim4EndoR fosters efficient sim-to-real translation, paving the way for safer, more consistent robotic interventions in clinical practice, ultimately improving patient outcomes.
|
|
10:05-10:10, Paper TuAT15.3 | Add to My Program |
Design and Implementation of a Snake Robot for Cranial Surgery |
|
Law, Jones | University of Toronto |
Stickley, Emma | The Hospital for Sick Children |
Gondokaryono, Radian | University of Toronto |
Looi, Thomas | Hospital for Sick Children |
Diller, Eric D. | University of Toronto |
Podolsky, Dale | University of Toronto |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Tendon/Wire Mechanism
Abstract: Craniosynostosis involves premature fusion of the cranial sutures resulting in abnormal skull morphology and elevated intracranial pressure. Surgical intervention is necessary to correct the skull shape and to allow for unrestricted brain growth. This study presents a novel snake robot designed for minimally invasive cranial osteotomies featuring two articulating bending segments. The end-effector comprises a bone-punch for bone-cutting, a dural and scalp retractor, as well as channels for an endoscope and an instrument. The robot’s bending mechanism is driven by tendons and utilizes geared linkages to facilitate a smooth curved shape. Pre-tensioned antagonistic tendons allow the robot to modulate its stiffness to adapt to external loads. A follow-the-leader algorithm was implemented to guide the robot along a skull cutting path. Experimental results demonstrated that at maximum bending of 60° for segment 1 and 90° for segment 2 there was a 15.9° and 11.5° error, respectively. Position errors ranged from 2.5 to 21.5 mm when tracing a curved path. The tool increased stiffness with tendon pre-tensioning from 20-100 N during bent configurations q1 and q2 for segments 1 and 2, respectively, at [q1, q2] = [0°, 30°] and [30°, 60°]. Tip deflection reduced from 0.42 to 0.03 cm and 0.37 to 0.10 cm during axial loading and from 11.40 to 3.88 cm and 3.62 to 0.48 cm during radial loading for each configuration, respectively. Ex vitro trials demonstrated the robot's ability to perform simulated osteotomies on skull models to 68-73% of desired path lengths with a maximum deviation of 8 mm.
|
|
10:10-10:15, Paper TuAT15.4 | Add to My Program |
Single-Fiber Optical Frequency Domain Reflectometry (OFDR) Shape Sensing of Continuum Manipulators with Planar Bending |
|
Tavangarifard, Mobina | The University of Texas at Austin |
Rodriguez Ovalle, Wendy | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Soft Sensors and Actuators
Abstract: To address the challenges associated with shape sensing of continuum manipulators (CMs) using Fiber Bragg Grating (FBG) optical fibers, we present a unique shape sensing assembly utilizing solely a single Optical Frequency Domain Reflectometry (OFDR) fiber attached to a flat nitinol wire (NiTi). Integrating this easy-to-manufacture unique sensor with a long and soft CM with 170 mm length, we performed different experiments to evaluate its C-, J-, and S-shape reconstruction ability. Results demonstrate phenomenal shape reconstruction accuracy for the performed C-shape (< 3.14 mm tip error, < 2.54 mm shape error), J-shape (< 1.91 mm tip error, < 1.11 mm shape error), and S-shape (< 1.74 mm tip error, < 1.40 mm shape error) experiments.
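For planar bending, shape reconstruction from a measured curvature profile amounts to integrating curvature into heading and heading into position. A short sketch under that assumption, ignoring strain-to-curvature calibration and sensor placement details:

```python
import numpy as np

def reconstruct_planar_shape(kappa, ds):
    """Integrate a sampled curvature profile into a planar centerline.

    kappa : (N,) curvature measured along the fiber at arc-length spacing ds
    Returns (N+1, 2) x-y points of the reconstructed shape, starting at the base.
    """
    theta = np.concatenate([[0.0], np.cumsum(kappa * ds)])   # bending angle
    x = np.concatenate([[0.0], np.cumsum(np.cos(theta[:-1]) * ds)])
    y = np.concatenate([[0.0], np.cumsum(np.sin(theta[:-1]) * ds)])
    return np.stack([x, y], axis=1)
```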
|
|
10:15-10:20, Paper TuAT15.5 | Add to My Program |
Learning-Based Tip Contact Force Estimation for FBG-Embedded Continuum Robots |
|
Roshanfar, Majid | Postdoctoral Research Fellow at the Hospital for Sick Children ( |
Fekri, Pedram | Concordia University |
Nguyen, Robert Hideki | The Hospital for Sick Children |
He, Changyan | University of Newcastle, Australia |
Kang, Paul Hoseok | University of Toronto |
Drake, James | Hospital for Sick Children, University of Toronto |
Diller, Eric D. | University of Toronto |
Looi, Thomas | Hospital for Sick Children |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Haptics and Haptic Interfaces
Abstract: Knowledge of the tip contact force in continuum robots, which are often used as medical instruments, is critical for clinical applications. It enhances the interventionalist's decision-making, navigation efficiency, and procedural safety. However, accurately determining the tip contact force in conventionally sized instruments remains challenging. This study introduces a learning-based method for estimating the external contact force at the tip of a continuum robot. By leveraging curvature and bending angle data from a multi-core fiber equipped with fiber Bragg gratings (FBGs) embedded inside the Nitinol tube, the method maps these inputs to the corresponding tip force in 3D. Experiments conducted on an FBG-embedded Nitinol rod validate the feasibility of the proposed method, yielding Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) values of 20.9 mN², 2.7 mN, and 4.6 mN, respectively, which represent a 26% improvement compared to the learning-based vision methodology.
|
|
10:20-10:25, Paper TuAT15.6 | Add to My Program |
Three-Dimension Tip Force Perception and Axial Contact Location Identification for Flexible Endoscopy Using Tissue-Compliant Soft Distal Attachment Cap Sensors |
|
Zhang, Tao | Chinese University of Hong Kong |
Yang, Yang | Sichuan University |
Yang, Yang | The Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Lai, Jiewen | The Chinese University of Hong Kong |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Keywords: Medical Robots and Systems, Force and Tactile Sensing
Abstract: In endoluminal surgeries, inserting a flexible endoscope is one of the fundamental procedures. During this process, vision remains the primary feedback, while the perception of tactile magnitude and location is insufficient. This limitation can hinder the clinician’s efficiency when navigating the endoscope through various segments of the natural lumens. To address this issue, we propose a fiber Bragg grating (FBG)–based tissue-compliant sensor cap with multi-mode sensing capabilities, including contact location identification at the terminal surface and the three-dimensional contact force perception at the tip. The soft sensor cap can be affixed to the standard endoscope tip, like a distal attachment cap, for easy installation. Utilizing the relative contact location information, operators can adjust the steerable segment of the endoscope when transitioning from one segment of a natural orifice to a narrower segment, which may be obstructed by constricted lumens. A finite element analysis simulation and the corresponding calibration process based on learning-based approaches have been carried out. The FBG-based sensor can perceive the tip contact force and identify the axial contact location with high precision, where the force perception error is less than 3%, and the contact location identification accuracy is 98.8%. The experimental results demonstrate the potential of the proposed sensing mechanism to be applied in surgeries requiring endoscope insertions.
|
|
10:25-10:30, Paper TuAT15.7 | Add to My Program |
MPC Design of a Continuum Robot for Pulmonary Interventional Surgery Using Koopman Operators |
|
Song, Yuhua | Southeast University |
Zhu, Lifeng | Southeast University |
Li, Jinfeng | Zhuhai Hanglok Medical Technology Co., Ltd |
Deng, Jiawei | Hanglok-Tech Co., Ltd |
Wang, Cheng | Hanglok-Tech Co. Ltd |
Song, Aiguo | Southeast University |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Modeling, Control, and Learning for Soft Robots, Optimization and Optimal Control
Abstract: This study focuses on the flexible tube of a bronchoscope robot used in pulmonary intervention surgery, which is treated as a continuum robot. The dynamics model is proposed based on the Koopman operator, leveraging real data to solve for the system matrix parameters accurately. To enhance control precision, we designed a model predictive control (MPC) algorithm aimed at tracking the desired curvature and deflection angles of the flexible tube. The MPC controller uses real-time data from electromagnetic sensors to adjust the tube shape, ensuring accurate and responsive manipulation. The effectiveness of the proposed algorithm is validated through extensive experiments conducted on the Binary experimental platform, demonstrating significant improvements in tracking performance and operational reliability compared to traditional open-loop control methods.
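A sketch of how a Koopman-style (EDMD) linear model could be fit from data for use inside an MPC. The polynomial lifting functions and the plain least-squares fit are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

def lift(x):
    """Example polynomial lifting of the tube state (curvature, deflection angle)."""
    k, a = x
    return np.array([k, a, k * a, k**2, a**2, 1.0])

def fit_koopman(X, U, X_next):
    """Fit lifted linear dynamics  z_{k+1} = A z_k + B u_k  by least squares (EDMD).

    X, X_next : (N, 2) state samples at steps k and k+1;  U : (N, m) inputs.
    """
    Z = np.array([lift(x) for x in X])            # (N, nz)
    Zn = np.array([lift(x) for x in X_next])
    G = np.hstack([Z, U])                         # (N, nz + m)
    AB, *_ = np.linalg.lstsq(G, Zn, rcond=None)   # (nz + m, nz)
    nz = Z.shape[1]
    A = AB[:nz].T                                 # (nz, nz)
    B = AB[nz:].T                                 # (nz, m)
    return A, B             # linear model usable as the prediction model in an MPC
```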
|
|
TuAT16 Regular Session, 404 |
Add to My Program |
Manipulation 1 |
|
|
Chair: Holladay, Rachel | University of Pennsylvania |
Co-Chair: Saveriano, Matteo | University of Trento |
|
09:55-10:00, Paper TuAT16.1 | Add to My Program |
A Perturbation-Robust Framework for Admittance Control of Robotic Systems with High-Stiffness Contacts and Heavy Payload |
|
Samuel, Kangwagye | Technical University of Munich |
Haninger, Kevin | Fraunhofer IPK |
Oboe, Roberto | University of Padova |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Oh, Sehoon | DGIST |
Keywords: Compliance and Impedance Control, Human-Robot Collaboration, Motion Control
Abstract: Applications involving serial manipulators, in both co-manipulation with humans and autonomous operation tasks, require the robot to render high admittance so as to minimize contact forces and maintain stable contacts with high-stiffness surfaces. This can be achieved through admittance control; however, inner loop dynamics limit the bandwidth within which the desired admittance can be rendered from the outer loop. Moreover, perturbations affect the admittance control performance, whereas other system-specific limitations, such as “black box” PD position control in typical industrial manipulators, hinder the implementation of more advanced control methods. To address these challenges, this paper introduces a perturbation-robust multisensor framework designed for serial manipulators engaged in contact-rich tasks involving heavy payloads. Within this framework, we introduce a generalized perturbation-robust observer (PROB) that exploits the joint velocity measurements and the inner loop velocity control model, and accommodates the varying stiffness of contacts through contact force measurements. Three PROBs, including a novel Combined Dynamics Observer (CDYOB), are presented. The CDYOB can render wide-range admittance without bandwidth limitations from the inner loop. Theoretical analyses and experiments with an industrial robot validate the effectiveness of the proposed method.
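As one concrete way to picture a perturbation observer on a single joint, the sketch below implements a first-order generalized-momentum disturbance observer; this is a textbook construction for illustration and is not the paper's CDYOB.

```python
class MomentumDisturbanceObserver:
    """Disturbance observer on a nominal inertia model J_n * dv/dt = u + d.

    Uses the generalized-momentum form to avoid differentiating the velocity signal.
    """
    def __init__(self, J_n, gain, dt):
        self.J_n, self.K, self.dt = J_n, gain, dt
        self.integral = 0.0          # running integral of (u + d_hat)
        self.d_hat = 0.0

    def update(self, u, v_meas):
        self.integral += (u + self.d_hat) * self.dt
        p = self.J_n * v_meas                      # measured generalized momentum
        self.d_hat = self.K * (p - self.integral)  # first-order disturbance estimate
        return self.d_hat
```

The estimate d_hat would then be fed back (e.g., subtracted from the torque command) so that the outer admittance loop sees dynamics closer to the nominal model.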
|
|
10:00-10:05, Paper TuAT16.2 | Add to My Program |
Tension Maintenance Mechanism for Control Consistency of Twisted String Actuation-Based Hyper-Redundant Manipulator |
|
Cho, Minjae | KAIST |
Yi, Yesung | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Tendon/Wire Mechanism, Redundant Robots, Mechanism Design
Abstract: Hyper-redundant manipulators have been developed for hazardous environment exploration due to their flexibility and high agility in the workplace. In this research, we designed a hyper-redundant manipulator by integrating Twisted String Actuators (TSAs) and Rolling Contact Joints (RCJs) to overcome the limitations of traditional cable-driven systems, such as difficulties with long-distance power transmission, and to achieve high payload capability with a compact design. To prevent instantaneous tension loss caused by slack and to enhance control consistency of the manipulator by preserving the relationship between the contraction ratio of the TSA and motor rotations, we proposed a tension maintenance mechanism using compression springs at the distal end of the manipulator. Additionally, to reduce losses from string contact friction, spring sheaths were inserted along the joint holes. Our approaches enhance the repeatability and position controllability of the manipulator. We noted a 33.5% reduction of error in the repeatability test, along with 35.9% and 38.8% improvements in piecewise position control accuracy and precision compared to a conventional manipulator, respectively, leading to enhanced controllability. We also experimentally verified that the proposed manipulator can maintain its trajectory with a variance of less than 2.83% up to a 1600 g payload. Overall, our manipulator has the potential to expand the exploration environments in which robots can be used by simultaneously demonstrating large payload capacity and controllability.
|
|
10:05-10:10, Paper TuAT16.3 | Add to My Program |
The Franka Emika Robot: A Standard Platform in Robotics Research |
|
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Compliance and Impedance Control, Force Control, Performance Evaluation and Benchmarking
Abstract: Over the last decade, industrial robots have evolved from well-established position-controlled systems to collaborative and soft robots. In 2017 we introduced the tactile lightweight robot system Franka Emika Robot, characterized by advanced safety control, force sensing, joint torque and force control, and hand-guiding performance. In the meantime, the system has become a well-adopted reference platform for robotics research in AI and machine learning, manipulation, control, human-robot interaction, and motion planning. It features multiple functional and widely used interfaces, including 1kHz real-time joint torque control access or precise kinematic and dynamic models. Furthermore, it became a crystallization point of a research ecosystem since the system's affordability further lowered the entrance barrier to high-performance joint torque-controlled robots. In this article, a quantitative analysis and discussion of the use of the system in worldwide research labs over the last five years, its impact on the creation of a compatible software ecosystem, and examples of milestone experiments made possible with the robot are given. The robotics community benefits from understanding
|
|
10:10-10:15, Paper TuAT16.4 | Add to My Program |
MeshDMP: Motion Planning on Discrete Manifolds Using Dynamic Movement Primitives |
|
Dalle Vedove, Matteo | University of Trento |
Abu-Dakka, Fares | New York University Abu Dhabi |
Palopoli, Luigi | University of Trento |
Fontanelli, Daniele | University of Trento |
Saveriano, Matteo | University of Trento |
Keywords: Learning from Demonstration, Constrained Motion Planning
Abstract: An open problem in industrial automation is to reliably perform tasks requiring in-contact movements with complex workpieces, as current solutions lack the ability to seamlessly adapt to the workpiece geometry. In this paper, we propose a Learning from Demonstration approach that allows a robot manipulator to learn and generalise motions across complex surfaces by leveraging differential mathematical operators on discrete manifolds to embed information on the geometry of the workpiece extracted from triangular meshes, and extend the Dynamic Movement Primitives (DMPs) framework to generate motions on the mesh surfaces. We also propose an effective strategy to adapt the motion to different surfaces, by introducing an isometric transformation of the learned forcing term. The resulting approach, namely MeshDMP, is evaluated both in simulation and real experiments, showing promising results in typical industrial automation tasks like car surface polishing.
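For readers unfamiliar with DMPs, the sketch below shows a minimal single-DoF Euclidean DMP with a radial-basis forcing term (the weights would normally be fit from a demonstration, e.g., by locally weighted regression). MeshDMP replaces the Euclidean operators with discrete-manifold counterparts, which this toy version does not capture; all parameter values are illustrative.

```python
import numpy as np

class DMP1D:
    """Minimal single-DoF discrete DMP with a radial-basis forcing term."""
    def __init__(self, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
        self.alpha, self.beta, self.alpha_x = alpha, beta, alpha_x
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))   # basis centers
        self.h = 1.0 / np.diff(self.c, append=self.c[-1] * 0.5)**2
        self.w = np.zeros(n_basis)    # forcing-term weights (fit from a demo)

    def _psi(self, x):
        return np.exp(-self.h * (x - self.c)**2)

    def rollout(self, y0, g, tau, dt, T):
        y, yd, x = y0, 0.0, 1.0
        traj = []
        for _ in range(int(T / dt)):
            psi = self._psi(x)
            f = (psi @ self.w) / (psi.sum() + 1e-9) * x * (g - y0)  # forcing term
            ydd = (self.alpha * (self.beta * (g - y) - tau * yd) + f) / tau**2
            yd += ydd * dt
            y += yd * dt
            x += -self.alpha_x * x / tau * dt       # canonical system
            traj.append(y)
        return np.array(traj)
```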
|
|
10:15-10:20, Paper TuAT16.5 | Add to My Program |
Robotic Sim-To-Real Transfer for Long-Horizon Pick-And-Place Tasks in the Robotic Sim2Real Competition |
|
Yang, Ming | University of Chinese Academy of Sciences |
Cao, Hongyu | Tianjin University |
Zhao, Lixuan | Tianjin University |
Zhang, Chenrui | Tianjin University |
Chen, Yaran | Institute of Automation, Chinese Academy of Sciences |
Keywords: Engineering for Robotic Systems, Mobile Manipulation, Perception for Grasping and Manipulation
Abstract: This paper presents a fully autonomous robotic system that performs sim-to-real transfer in complex long-horizon tasks involving navigation, recognition, grasping, and stacking in an environment with multiple obstacles. The key feature of the system is the ability to overcome typical sensing and actuation discrepancies during sim-to-real transfer and to achieve consistent performance without any algorithmic modifications. To accomplish this, a lightweight noise-resistant visual perception system and a nonlinearity-robust servo system are adopted. We conduct a series of tests in both simulated and real-world environments. The visual perception system achieves a processing time of 11 ms per frame due to its lightweight nature, and the servo system achieves sub-centimeter accuracy with the proposed controller. Both exhibit high consistency during sim-to-real transfer. Our robotic system took first place in the mineral searching task of the Robotic Sim2Real Challenge hosted at ICRA 2024. The simulator is available from the competition committee at https://github.com/AIR-DISCOVER/ICRA2024-Sim2Real-RM, and all code and competition videos can be accessed via our GitHub repository at https://github.com/Bob-Eric/rmus2024_solution_ZeroBug.
|
|
10:20-10:25, Paper TuAT16.6 | Add to My Program |
Towards Autonomous Data Annotation and System-Agnostic Robotic Grasping Benchmarking with 3D-Printed Fixtures |
|
Boerdijk, Wout | German Aerospace Center (DLR) |
Durner, Maximilian | German Aerospace Center DLR |
Sakagami, Ryo | German Aerospace Center (DLR) |
Lehner, Peter | German Aerospace Center (DLR) |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Deep Learning for Visual Perception, Grasping, Data Sets for Robotic Vision
Abstract: The interaction of robots with their environment requires robust object-centric perception capabilities, typically achieved using learning-based methods trained on synthetic data. However, real-world deployment demands evaluating these capabilities in relevant environments, often involving extensive manual annotation for a quantitative analysis. Additionally, standardized evaluations for robotic tasks, such as grasping, need reproducible object scene configurations and performance benchmarks. We propose a solution to both problems by temporarily employing 3D-printed components, so-called fixtures, which can be designed for any rigid object. Once the scene is set up and object poses are extracted, the fixtures are removed, leaving the natural scene without any artificial distractions. The presented approach is seamlessly applicable to pre-determined configurations of multiple objects, which enables precise re-building of scenes with consistent object-to-object relations. Our suggested annotation procedure achieves strong pose accuracy solely on RGB images without any manual involvement. We evaluate and show the usability of the proposed fixtures for automated real-world data annotation to fine-tune a detector and for benchmarking object pose estimation algorithms for robotic grasping. Code and fixture meshes for 3D printing are available at https://github.com/DLR-RM/fixture_generation.
|
|
10:25-10:30, Paper TuAT16.7 | Add to My Program |
From Instantaneous to Predictive Control: A More Intuitive and Tunable MPC Formulation for Robot Manipulators |
|
Ubbink, Johan Bernard | KU Leuven |
Viljoen, Ruan Matthys | KU Leuven |
Aertbelien, Erwin | KU Leuven |
Decré, Wilm | Katholieke Universiteit Leuven |
De Schutter, Joris | KU Leuven |
Keywords: Optimization and Optimal Control, Sensor-based Control, Motion Control
Abstract: Model predictive control (MPC) has become increasingly popular for the control of robot manipulators due to its improved performance compared to instantaneous control approaches. However, tuning these controllers remains a significant hurdle. To address this, we propose a practical MPC formulation which retains the more interpretable tuning parameters of the instantaneous control approach while enhancing performance through a prediction horizon. The formulation is motivated by a simple example, highlighting the practical tuning challenges associated with typical MPC approaches and showing how the proposed formulation alleviates these challenges. Furthermore, the formulation is validated on a surface-following task, illustrating its applicability to industrially relevant scenarios. Although the research is presented in the context of robot manipulator control, we anticipate that the formulation is more broadly applicable.
|
|
TuAT17 Regular Session, 405 |
Add to My Program |
Prosthetics and Physically Assistive Devices |
|
|
Chair: Hirata, Yasuhisa | Tohoku University |
Co-Chair: Thomas, Gray | Texas A&M University |
|
09:55-10:00, Paper TuAT17.1 | Add to My Program |
A Control Framework for Accurate Mechanical Impedance Rendering with Series-Elastic Joints in Prosthetic Actuation Applications |
|
Harris, Isaac | University of Michigan |
Rouse, Elliott | University of Michigan |
Gregg, Robert D. | University of Michigan |
Thomas, Gray | Texas A&M University |
Keywords: Compliance and Impedance Control, Compliant Joints and Mechanisms, Prosthetics and Exoskeletons
Abstract: In addition to lifting up the body during gait, human legs provide stabilizing torques that can be modeled as a spring-damper mechanical impedance. While powered prosthetic leg actuators can also imitate spring-damper behaviors, the rendered impedance can be quite different from the desired impedance, stemming from unmodeled torques in the transmission (e.g., sliding friction, bearing damping, gear inefficiency, etc.). Moreover, for powered prostheses to mimic human joint impedance, they will need actuators that accurately render a wide range of mechanical impedances in a variety of ground contact conditions, including nearly free-swinging behavior in swing phase and stiff spring-like behavior in stance phase. For series-elastic prosthetic leg actuators, as in the Open-Source Leg (OSL), these sudden output inertia changes present a challenge for traditional cascaded impedance control. In this paper, we propose a solution based on disturbance observers and full state feedback (FSF) impedance control. With transmission disturbances attenuated, the FSF controller can use pole-zero placement to specify the actuator impedance that couples to the uncertain joint inertia. We validate our control framework on an OSL-like two-actuator dynamometry testbed.
|
|
10:00-10:05, Paper TuAT17.2 | Add to My Program |
Concept and Prototype Development of Adaptive Touch Walking Support Robot for Maximizing Human Physical Potential |
|
Terayama, Junya | Tohoku University |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Tafrishi, Seyed Amir | Cardiff University |
Hirata, Yasuhisa | Tohoku University |
Keywords: Human-Centered Robotics, Physically Assistive Devices, Motion Control
Abstract: We propose a new walking support robot concept, "Nimbus Guardian," designed to enhance the mobility of both healthy and frail elderly individuals who can walk independently. The proposed robot differs from traditional walker-type or cane-type aids by offering adaptive, minimal touch support based on the user's walking dynamics. Our goal is to realize versatile touch support for the user as a preliminary step toward developing the adaptive touch walking support robot. To achieve this, we established a categorization system for walking support touch, outlining the specific types of assistance required for our robot. Based on this categorization, we developed a prototype that improves the versatility of touch support (touch point, force, and initiator), adapting to the user's body. Our prototype is equipped to offer multiple touch support parts, adjusting to the user's physique. For versatile touch capabilities, we designed a motion control algorithm that includes a controller which directs the robot's wheel movements according to the chosen support points, and a state machine that provides multiple arm placements and movements. We have experimentally implemented this motion control algorithm in our prototype. Through experiments, we verified the touch versatility and discussed the prototype's utility and potential for further development.
|
|
10:05-10:10, Paper TuAT17.3 | Add to My Program |
Learning and Online Replication of Grasp Forces from Electromyography Signals for Prosthetic Finger Control |
|
Arbaud, Robin | HRI2 Lab., Istituto Italiano Di Tecnologia ; Dept. of Informatic |
Motta, Elisa | Italian Institute of Technology |
Avaro, Marco | Airworks S.r.l |
Picinich, Stefano | Airworks |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Prosthetics and Exoskeletons, Human Factors and Human-in-the-Loop, Force Control
Abstract: Partial hand amputations significantly affect the physical and psychosocial well-being of individuals, yet intuitive control of externally powered prostheses remains an open challenge. To address this gap, we developed a force-controlled prosthetic finger activated by electromyography (EMG) signals. The prototype, constructed around a wrist brace, functions as a supernumerary finger placed near the index, allowing for early-stage evaluation on unimpaired subjects. A neural network-based model was then implemented to estimate fingertip forces from EMG inputs, allowing for online adjustment of the prosthetic finger grip strength. The force estimation model was validated through experiments with ten participants, demonstrating its effectiveness in predicting forces. Additionally, online trials with four users wearing the prosthesis exhibited precise control over the device. Our findings highlight the potential of using EMG-based force estimation to enhance the functionality of prosthetic fingers.
|
|
10:10-10:15, Paper TuAT17.4 | Add to My Program |
Integrated Motion State Prediction for Sit-To-Stand and Stand-To-Sit Motions Toward Effective Power Assist Control |
|
Ren, Kai | Kyoto University |
Nakamura, Yuichi | Kyoto University |
Kondo, Kazuaki | Kyoto University |
Shimonishi, Kei | Kyoto University |
Ito, Takahide | RIKEN |
Furukawa, Jun-ichiro | Guardian Robot Project, RIKEN |
An, Qi | The University of Tokyo |
Keywords: Intention Recognition, Behavior-Based Systems, Physically Assistive Devices
Abstract: Sit-to-stand and stand-to-sit motions are important in daily activities. However, elderly individuals often find these motions difficult to perform with declining lower limb strength, which considerably reduces their quality of life. In this study, a sensing method for controlling robotic assistive devices was proposed. This method utilizes electromyographic measurements and a deep neural network to predict motion initiation, and it estimates the timing of triggering assistive devices. Experimental results indicate that four muscle synergy patterns are required to represent the sit-to-stand and stand-to-sit motions together, with two of them being shared between both movements. Subsequently, a long short-term memory network was designed to forecast these two motions, and the result indicates that the prediction accuracy reached 92.95% ± 0.83% with a forecasting time of 300 ms.
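For readers unfamiliar with muscle synergy extraction, the sketch below shows one common way such patterns are obtained from EMG envelopes using non-negative matrix factorization; the synthetic data, channel count, and use of scikit-learn NMF are illustrative assumptions and not taken from the paper.

```python
# Illustrative sketch only: extracting four muscle synergies from EMG envelopes
# with non-negative matrix factorization. Data are synthetic; the choice of NMF
# is an assumption for illustration, not the authors' pipeline.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
emg = np.abs(rng.normal(size=(1000, 8)))   # time samples x muscle channels (synthetic envelopes)

nmf = NMF(n_components=4, init="nndsvda", max_iter=500)
activations = nmf.fit_transform(emg)       # time-varying synergy activations, shape (1000, 4)
synergies = nmf.components_                # synergy weights over muscles, shape (4, 8)
print(activations.shape, synergies.shape)
```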
|
|
10:15-10:20, Paper TuAT17.5 | Add to My Program |
On Chain Driven, Adaptive, Underactuated Fingers for the Development of Affordable, Robust Humanlike Prosthetic Hands |
|
Heinemann, Trevor | University of Auckland |
Wallace, Raymond | The University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Prosthetics and Exoskeletons, Medical Robots and Systems
Abstract: Amputations and limb loss can have detrimental effects on personal well-being. Although prosthetic devices can offer significant benefits, helping amputees regain some of their lost dexterity, they often lack the required affordability and durability. Current affordable prosthetic designs have trended towards underactuation, which leads to stable grasps but is often characterized by low durability. In this paper, a new chain-driven, adaptive, underactuated finger design is proposed for the development of affordable and highly durable prosthetic hands. The transmission mechanism employed is in the form of a steel roller chain. The finger phalanges are constructed of 3D-printed PLA, and finger flexion is produced by pulling the internally routed roller chain that is rerouted using sprockets. In total, six 3D-printed PLA sprockets are used for chain routing, with an emphasis on high force transmission. The performance of the proposed chain-driven finger was experimentally validated and compared with an analogous tendon-driven version. The metrics employed for this comparison were longevity, pinch grasp efficiency, force response, and maximum force capability. The chain-driven finger was shown to have a higher maximum transmissible force, better long-term durability, and no elongation issues (unlike tendons, which are prone to elongation). The cost to manufacture the chain-driven robotic finger is only 91 USD, making it an excellent solution for affordable prostheses.
|
|
10:20-10:25, Paper TuAT17.6 | Add to My Program |
Force Myography Based Torque Estimation in Human Knee and Ankle Joints |
|
Marquardt, Charlotte | Karlsruhe Institute of Technology (KIT) |
Schulz, Arne | Karlsruhe Institute of Technology (KIT) |
Dezman, Miha | Karlsruhe Institute of Technology |
Kurz, Gunther | Karlsruhe Institute of Technology (KIT) |
Stein, Thorsten | Karlsruhe Institute of Technology, Center |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Prosthetics and Exoskeletons, Physically Assistive Devices, Wearable Robotics
Abstract: The online adaptation of exoskeleton control based on muscle activity sensing offers a promising approach to personalizing exoskeleton behavior using the user's biosignals. While electromyography (EMG)-based methods have demonstrated improvements in joint torque estimation, EMG sensors require direct skin contact and extensive post-processing. In contrast, force myography (FMG) measures normal forces resulting from changes in muscle volume due to muscle activity. We propose an FMG-based method to estimate knee and ankle joint torques by integrating joint angles and velocities with muscle activity data. We learn a model for joint torque estimation using Gaussian process regression (GPR). The effectiveness of the proposed FMG-based method is validated on isokinetic motions performed by ten participants. The model is compared to a baseline model that uses only joint angle and velocity, as well as a model augmented by EMG data. The results indicate that incorporating FMG into exoskeleton control can improve joint torque estimation for the ankle and knee under novel task characteristics within a single participant. Although the findings suggest that this approach may not improve the generalizability of estimates between multiple participants, they highlight the need for further research into its potential applications in exoskeleton control.
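As a rough illustration of Gaussian process regression in this role, a minimal sketch follows; the input layout (joint angle, velocity, FMG channels), kernel choice, and synthetic data are assumptions made for illustration, not the authors' model.

```python
# Minimal sketch, assuming a generic scikit-learn setup with synthetic data:
# regressing joint torque from joint angle, joint velocity, and FMG channels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                               # [angle, velocity, 4 FMG channels] (synthetic)
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=200)     # synthetic torque targets

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

tau_mean, tau_std = gpr.predict(X[:5], return_std=True)     # torque estimate with uncertainty
print(tau_mean, tau_std)
```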
|
|
10:25-10:30, Paper TuAT17.7 | Add to My Program |
Adaptive Ankle-Foot Prosthesis with Passive Agonist-Antagonist Design |
|
Crotti, Matteo | Istituto Italiano Di Tecnologia |
Pace, Anna | Istituto Italiano Di Tecnologia |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Keywords: Prosthetics and Exoskeletons
Abstract: The development of prosthetic feet that closely replicate the natural biomechanics of the human foot remains a significant challenge in prosthetics engineering. This paper presents the design and testing of a novel agonist-antagonist architecture for the ankle joint of a passive prosthetic foot featuring an adaptive sole. The ankle mechanism, inspired by the dynamics of the human leg-ankle-foot complex, utilizes compliant elements in an agonist-antagonist configuration to passively achieve an ankle torque close to that of a sound ankle without the need for external actuation. Concurrently, the adaptive sole adjusts its shape in response to different terrains, potentially improving stability and comfort for the user. The theoretical model underlying the proposed design is presented, followed by a preliminary validation through simulations. Finally, a prototype based on the new architecture is tested by a healthy subject using customized walking boots, demonstrating its potential to improve the functional performance of prosthetic feet in diverse environments.
|
|
TuAT18 Regular Session, 406 |
Add to My Program |
Intelligent Transportation and Smart Cities |
|
|
Chair: Fanti, Maria Pia | Politecnico Di Bari |
Co-Chair: Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
|
09:55-10:00, Paper TuAT18.1 | Add to My Program |
V2X-DGW: Domain Generalization for Multi-Agent Perception under Adverse Weather Conditions |
|
Li, Baolu | Cleveland State University |
Li, Jinlong | Cleveland State University |
Liu, Xinyu | Cleveland State University |
Xu, Runsheng | UCLA |
Tu, Zhengzhong | Texas A&M University |
Guo, Jiacheng | Cleveland State University |
Zou, Qin | Wuhan University |
Li, Xiaopeng | University of Wisconsin-Madison |
Yu, Hongkai | Cleveland State University |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation, Cooperating Robots
Abstract: Current LiDAR-based Vehicle-to-Everything (V2X) multi-agent perception systems have shown significant success on 3D object detection. While these models perform well in the clean weather they were trained on, they struggle in unseen adverse weather conditions due to the domain gap. In this paper, we propose a Domain Generalization based approach, named V2X-DGW, for LiDAR-based 3D object detection on multi-agent perception systems under adverse weather conditions. Our research aims not only to maintain favorable multi-agent performance in clean weather but also to promote performance in unseen adverse weather conditions by learning only on clean weather data. To realize Domain Generalization, we first introduce Adaptive Weather Augmentation (AWA) to mimic unseen adverse weather conditions, and then propose two alignments for generalizable representation learning: Trust-region Weather-invariant Alignment (TWA) and Agent-aware Contrastive Alignment (ACA). To evaluate this research, we add fog, rain, and snow conditions to two public multi-agent datasets using physics-based models, resulting in two new datasets: OPV2V-w and V2XSet-w. Extensive experiments demonstrate that our V2X-DGW achieves significant improvements under unseen adverse weather conditions.
|
|
10:00-10:05, Paper TuAT18.2 | Add to My Program |
The Automation of Uncrewed Aircraft Systems Traffic Management Calibration Based on Experimental Platform Data |
|
Henderson, Thomas C. | University of Utah |
Sacharny, David | University of Utah |
Mello, Chad | US Air Force Academy |
Raley, William | University of Utah |
Keywords: Automation Technologies for Smart Cities, Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: Many countries are developing an Urban Air Mobility (UAM) capability defining an Uncrewed Aircraft Systems (UAS) Traffic Management (UTM) architecture to allow safe UAS services in urban environments (e.g., delivery, inspection, air taxis, etc.). The main considerations are airworthiness, operator certification, air traffic management, C2 Link, detect and avoid (DAA), safety management, and security. In addition, if thousands of simultaneous UAS flights are to be achieved, it is not possible for them to be controlled individually by human operators. This makes it necessary to have a rigorous and safe automation methodology to handle such a number of flights. A lane-based airspace structure has been proposed which reduces the complexity of strategic deconfliction by providing UAS agents with a set of pre-defined airway corridors called lanes. This yields collateral benefits including UAS information privacy, robust contingency handling exploiting the lane structure, as well as improved observability and control of the airspace. A robust set of UTM parameters and policies must be determined based on the performance characteristics of the deployed UAS platforms, and a methodology which constitutes a first step toward this end is proposed and demonstrated here. In order to realize this approach, a set of initial experiments has been performed to determine the constraints imposed by the UTM on UAS platform capabilities and vice versa. Initial implementation parameters and policies are defined. The major contribution here is a methodology to calibrate UTM safety parameters (e.g., headway, platform speed) in terms of specific platform models' operational characteristics. That is, UTM parameters are a function of the platform and not some arbitrarily imposed values. Safety uncertainty is then characterized by the calibration method.
|
|
10:05-10:10, Paper TuAT18.3 | Add to My Program |
TS-DETR: Traffic Sign Detection Based on Positive and Negative Sample Augmentation |
|
Lin, Ching-Lun | National Chung Cheng University |
Lin, Huei-Yung | National Taipei University of Technology |
Wang, Chieh-Chih | National Yang Ming Chiao Tung University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: Traffic sign detection plays an essential role in advanced driver assistance systems (ADAS) and self-driving vehicles. Typically, deep neural networks are employed to analyze road scene images captured by an onboard camera. However, due to the significant variation in appearance of different traffic signs, the classification of highly similar patterns is still a challenging task. To address these issues, this paper presents an end-to-end traffic sign detection framework based on DETR. The proposed network incorporates data augmentation and negative sample learning to mitigate the problem of data imbalance and effectively enhance the model's recognition capability. A UASPP (Upsample Atrous Pyramid Pooling) module is introduced to integrate multi-scale features and global information. In the experiments, the performance evaluation demonstrates an improvement in mAP of 3.9% on TT100K and 36.3% on GTSDB compared to state-of-the-art methods. The code and datasets are available at https://github.com/chinglun/TS-DETR.
|
|
10:10-10:15, Paper TuAT18.4 | Add to My Program |
A User Based HVAC System Management through Blockchain Technology and Model Predictive Control (I) |
|
Olivieri, Giuseppe | Politecnico Di Bari |
Volpe, Gaetano | Politecnico Di Bari |
Mangini, Agostino Marcello | Politecnico Di Bari |
Fanti, Maria Pia | Politecnico Di Bari |
Keywords: Automation Technologies for Smart Cities, Building Automation, Energy and Environment-Aware Automation
Abstract: This paper introduces an innovative approach to designing a user-based Heating, Ventilation, and Air-Conditioning (HVAC) system management connected with the District Energy Management System. By classifying users into dynamic energy consumption classes that reward energy efficiency and penalize excessive use, users can modify their behavior to move to a less expensive and more virtuous consumption class. To this aim, a blockchain platform determines the rewards and penalties and, via a K-means clustering algorithm, categorizes users into their respective groups. Then, a Class Follower Problem is formulated and solved by a Model Predictive Control (MPC) strategy integrated with a Long Short-Term Memory network as a predictive model. If the users follow the suggestions proposed by the controller, i.e., the thermostat set-points and the time intervals in which the HVAC system must be switched off or on, the users can be placed in a more virtuous consumption class. A case study conducted within an energy district in Bari (Italy) shows how the proposed architectural framework tuned thermal regulation in intelligent buildings while concurrently achieving energy optimization.
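The consumption-class assignment step can be pictured with a small K-means sketch like the one below; the hourly-profile features, number of classes, and synthetic data are assumptions for illustration rather than the paper's configuration.

```python
# Hedged sketch: grouping users into energy consumption classes with K-means.
# Features, class count, and data are assumptions made for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
profiles = rng.gamma(shape=2.0, scale=5.0, size=(100, 24))   # synthetic hourly kWh profiles per user

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)
classes = km.labels_             # consumption class assigned to each user
centroids = km.cluster_centers_  # representative consumption profile per class
print(np.bincount(classes), centroids.shape)
```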
|
|
10:15-10:20, Paper TuAT18.5 | Add to My Program |
Non-Parametric GNSS Integer Ambiguity Estimation Via Positional Likelihood Field Marginalization |
|
Takanose, Aoki | National Institute of Advanced Industrial Science and Technology |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Keywords: Localization, Autonomous Agents, Intelligent Transportation Systems
Abstract: In this paper, we propose a non-parametric method for estimating the posterior distribution of GNSS integer ambiguity. It is difficult to estimate the posterior probability of discrete integer ambiguities directly from carrier phase observations due to the unclear domain definition. We thus introduce a positional likelihood field that accumulates the ambiguity function method values in the positional space and then estimate the integer ambiguity distributions by marginalizing the likelihood over the entire position. Because the positional likelihood field is defined in the positional space, it enables straightforward accumulation of the carrier phase likelihood. To correctly estimate the posterior distribution, however, a sufficient sample density is required, which results in a large computational cost. The proposed method enables large-scale sampling by taking advantage of GPU parallel processing. Experimental results demonstrate that the proposed method enables accurate and robust estimation of integer ambiguity distributions, contributing to improved centimeter-level position estimation accuracy. In addition, the histograms provided quantitative evidence of events in urban environments where the integer ambiguity is not uniquely determined.
|
|
10:20-10:25, Paper TuAT18.6 | Add to My Program |
Whenever, Wherever: Towards Orchestrating Crowd Simulations with Spatio-Temporal Spawn Dynamics |
|
Kreutz, Thomas | Technical University Darmstadt |
Mühlhäuser, Max | Technical University of Darmstadt |
Sanchez Guinea, Alejandro | TU Darmstadt |
Keywords: Automation Technologies for Smart Cities, Modeling and Simulating Humans, Simulation and Animation
Abstract: Realistic crowd simulations are essential for immersive virtual environments, relying on both individual behaviors (microscopic dynamics) and overall crowd patterns (macroscopic characteristics). While recent data-driven methods like deep reinforcement learning improve microscopic realism, they often overlook critical macroscopic features such as crowd density and flow, which are governed by spatio-temporal spawn dynamics, namely, when and where agents enter a scene. Traditional methods, like random spawn rates, stochastic processes, or fixed schedules, are not guaranteed to capture the underlying complexity and often lack diversity and realism. To address this issue, we propose a novel approach called nTPP-GMM that models spatio-temporal spawn dynamics using Neural Temporal Point Processes (nTPPs) coupled with a spawn-conditional Gaussian Mixture Model (GMM) for agent spawn and goal positions. We evaluate our approach by orchestrating crowd simulations of three diverse real-world datasets with nTPP-GMM. Our experiments demonstrate that orchestration with nTPP-GMM leads to realistic simulations that reflect real-world crowd scenarios and allow crowd analysis.
|
|
10:25-10:30, Paper TuAT18.7 | Add to My Program |
RMP-YOLO: A Robust Motion Predictor for Partially Observable Scenarios Even If You Only Look Once |
|
Sun, Jiawei | National University of Singapore |
Li, Jiahui | National University of Singapore |
Liu, Tingchen | National University of Singapore |
Yuan, Chengran | National Universtiy of Singapore |
Sun, Shuo | National University of Singapore |
Huang, Zefan | National University of Singapore |
Wong, Anthony | Moovita |
Tee, Keng Peng | Moovita |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Autonomous Vehicle Navigation, Integrated Planning and Learning, Intelligent Transportation Systems
Abstract: We introduce RMP-YOLO, a unified framework designed to provide robust motion predictions even with incomplete input data. Our key insight stems from the observation that complete and reliable historical trajectory data plays a pivotal role in ensuring accurate motion prediction. Therefore, we propose a new paradigm that prioritizes the reconstruction of intact historical trajectories before feeding them into the prediction modules. Our approach introduces a novel scene tokenization module to enhance the extraction and fusion of spatial and temporal features. Following this, our proposed recovery module reconstructs agents' incomplete historical trajectories by leveraging local map topology and interactions with nearby agents. The reconstructed, clean historical data is then integrated into the downstream prediction modules. Our framework is able to effectively handle missing data of varying lengths and remains robust against observation noise, while maintaining high prediction accuracy. Furthermore, our recovery module is compatible with existing prediction models, ensuring seamless integration. Extensive experiments validate the effectiveness of our approach, and deployment in real-world autonomous vehicles confirms its practical utility. In the 2024 Waymo Motion Prediction Competition, our method, RMP-YOLO, achieves state-of-the-art performance, securing third place. Our code is open-source at https://github.com/ggosjw/RMP-YOLO.
|
|
TuAT19 Regular Session, 407 |
Add to My Program |
Visual-Inertial Odometry |
|
|
Chair: Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Co-Chair: Sanchez-Lopez, Jose Luis | University of Luxembourg |
|
09:55-10:00, Paper TuAT19.1 | Add to My Program |
Leg Exoskeleton Odometry Using a Limited FOV Depth Sensor |
|
Elnecave Xavier, Fabio | MINES Paris / Wandercraft |
Viozelange, Matis | Wandercraft |
Burger, Guillaume | Wandercraft |
Petriaux, Marine | Wandercraft |
Deschaud, Jean-Emmanuel | ARMINES |
Goulette, François | MINES ParisTech |
Keywords: Sensor Fusion, Mapping, Prosthetics and Exoskeletons
Abstract: For leg exoskeletons to operate effectively in real-world environments, they must be able to perceive and understand the terrain around them. However, unlike other legged robots, exoskeletons face specific constraints on where depth sensors can be mounted due to the presence of a human user. These constraints lead to a limited Field Of View (FOV) and greater sensor motion, making odometry particularly challenging. To address this, we propose a novel odometry algorithm that integrates proprioceptive data from the exoskeleton with point clouds from a depth camera to produce accurate elevation maps despite these limitations. Our method builds on an extended Kalman filter (EKF) to fuse kinematic and inertial measurements, while incorporating a tailored iterative closest point (ICP) algorithm to register new point clouds with the elevation map. Experimental validation with a leg exoskeleton demonstrates that our approach reduces drift and enhances the quality of elevation maps compared to a purely proprioceptive baseline, while also outperforming a more traditional point cloud map-based variant.
|
|
10:00-10:05, Paper TuAT19.2 | Add to My Program |
Improving Monocular Visual-Inertial Initialization with Structureless Visual-Inertial Bundle Adjustment |
|
Song, Junlin | University of Luxembourg |
Richard, Antoine | University of Luxembourg |
Olivares-Mendez, Miguel A. | Interdisciplinary Centre for Security, Reliability and Trust - U |
Keywords: Localization
Abstract: Monocular visual inertial odometry (VIO) has facilitated a wide range of real-time motion tracking applications, thanks to the small size of the sensor suite and low power consumption. To successfully bootstrap VIO algorithms, the initialization module is extremely important. Most initialization methods rely on the reconstruction of 3D visual point clouds. These methods suffer from high computational cost as the state vector contains both motion states and 3D feature points. To address this issue, some researchers recently proposed a structureless initialization method, which can solve the initial state without recovering the 3D structure. However, this method potentially compromises performance due to the decoupled estimation of rotation and translation, as well as linear constraints. To improve its accuracy, we propose a novel structureless visual-inertial bundle adjustment to further refine the previous structureless solution. Extensive experiments on real-world datasets show that our method significantly improves VIO initialization accuracy, while maintaining real-time performance.
|
|
10:05-10:10, Paper TuAT19.3 | Add to My Program |
ORB-SfMLearner: ORB-Guided Self-Supervised Visual Odometry with Selective Online Adaptation |
|
Jin, Yanlin | Sichuan University, Rice University |
Ju, Rui-Yang | National Taiwan University |
Liu, Haojun | Carnegie Mellon University |
Zhong, Yuzhong | Sichuan University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, SLAM
Abstract: Deep visual odometry, despite extensive research, still faces limitations in accuracy and generalizability that prevent its broader application. To address these challenges, we propose an Oriented FAST and Rotated BRIEF (ORB)-guided visual odometry with selective online adaptation named ORB-SfMLearner. We present a novel use of ORB features for learning-based ego-motion estimation, leading to more robust and accurate results. We also introduce a cross-attention mechanism to enhance the explainability of PoseNet and reveal that the driving direction of the vehicle can be explained through the attention weights. To improve generalizability, our selective online adaptation allows the network to rapidly and selectively adjust to the optimal parameters across different domains. Experimental results on the KITTI and vKITTI datasets show that our method outperforms previous state-of-the-art deep visual odometry methods in terms of ego-motion accuracy and generalizability.
|
|
10:10-10:15, Paper TuAT19.4 | Add to My Program |
QVIO2: Quantized MAP-Based Visual-Inertial Odometry |
|
Peng, Yuxiang | University of Delaware |
Chen, Chuchu | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: Visual-Inertial SLAM, Localization, SLAM
Abstract: Energy-efficient visual-inertial motion tracking on SWAP-constrained edge devices (e.g., drones and AR glasses) is essential but challenging. Our previous work introduced the first-of-its-kind quantized visual-inertial odometry (QVIO), utilizing either raw measurement quantization (zQVIO) or single-bit residual quantization (rQVIO). While QVIO has demonstrated significant data transfer reduction with competitive performance, it has limitations. Specifically, zQVIO directly quantizes raw measurements into multi-bit values but requires ad-hoc inflation of the measurement noise to account for quantization errors. On the other hand, rQVIO is limited to single-bit measurements with a certain accuracy loss. This work introduces QVIO2 to address these issues. The proposed QVIO2 improves the data quantization strategies and derives a Maximum A Posteriori (MAP) quantized estimator that rigorously handles both multi-bit and single-bit, raw and residual quantized measurements in a unified manner. These improvements lead to more communication-efficient and accurate systems. Additionally, we optimize the communication protocol to further reduce data transfer by eliminating unnecessary transmissions. Extensive numerical and experimental results demonstrate reduced communication requirements and improved accuracy. Compared to the previous QVIO system, zQVIO2 achieves the same accuracy with a 30% reduction in data transfer, while rQVIO2 improves accuracy without increasing data communication. In real-world scenarios, our new zQVIO2 and rQVIO2 have demonstrated nearly no accuracy loss with only 4.6 bits and 3.5 bits of data communication, achieving compression rates of 7× and 9.1×.
|
|
10:15-10:20, Paper TuAT19.5 | Add to My Program |
Is Iteration Worth It? Revisit Its Impact in Sliding-Window VIO |
|
Chen, Chuchu | University of Delaware |
Peng, Yuxiang | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Localization
Abstract: Visual-inertial odometry (VIO), which fuses noisy inertial readings and camera measurements to provide 3D motion tracking, is a foundational component in many autonomous applications. With the increasing use of next-generation edge devices (e.g., IoT devices, nano drones, and mobile robotics) that are constrained by limited power, resources, and multi-tasking demands, balancing computational efficiency and accuracy in VIO estimators has become more critical than ever. Historically, state estimation algorithms have been developed using either optimization-based or filtering-based methods, with the key distinction being the ability to relinearize measurements and correct state estimates iteratively. It has been widely claimed that iterative methods improve accuracy by reducing error through relinearization, at a higher computational demand; conversely, filtering methods are more efficient but may suffer from significant linearization errors. However, these trade-offs have not been thoroughly examined in the context of visual-inertial motion tracking. In this paper, we conduct the first comprehensive study on the impact of iterative algorithms in sliding-window VIO. We analyze the relinearization of IMU and camera measurements separately, providing insights into how each affects system performance. By considering key factors such as system observability and measurement processes, we offer a deeper understanding of VIO estimator behavior. Our findings, supported by proof-of-concept real-world tests, provide practical guidelines for balancing accuracy and efficiency, helping practitioners determine when to prioritize iterative methods or simpler filtering approaches while encouraging researchers and engineers to rethink VIO design for optimal resource allocation.
|
|
10:20-10:25, Paper TuAT19.6 | Add to My Program |
EAR-SLAM: Environment-Aware Robust Localization System for Terrestrial-Aerial Bimodal Vehicles |
|
He, Wenjun | Harbin Engineering University |
Wang, XingPeng | ZheJiang University |
Wang, Pengfei | Huzhou Institute of Zhejiang University |
Zhang, Tianfu | Huzhou Institute of Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: SLAM, Visual-Inertial SLAM
Abstract: Terrestrial-aerial bimodal vehicles (TABVs) have attracted great attention because of their advantages over single-modal robots. TABVs can provide superior obstacle avoidance capability (flying mode) and safe mobility with long duration (ground mode), offering enhanced adaptability and flexibility in various challenging environments. However, a robust localization approach remains the bottleneck to stably applying TABVs in real-world tasks. In this paper, we present an environment-aware robust localization system specifically designed for passive-wheel-based TABVs, which feature two passive wheels alongside a standard quadrotor. The localization system tightly integrates data from multiple sensors, including a camera, Inertial Measurement Units (IMUs), encoders, and single-point laser distance sensors. First, we introduce a terrain-aware odometer model that accurately estimates the terrain slope and the vehicle's velocity by fusing gyroscope, encoder, and single-point laser measurements. Then, we propose an anomaly-aware method that senses anomalous sensors and dynamically adjusts the optimization weights accordingly. By explicitly estimating the environmental conditions, such as ground terrain slopes and visual information quality, the robot can achieve accurate and robust localization results on the ground. To validate our localization approach, we conducted extensive experiments across various challenging scenarios, demonstrating the effectiveness and reliability of our system for real-world applications.
|
|
10:25-10:30, Paper TuAT19.7 | Add to My Program |
DynaVINS++: Robust Visual-Inertial State Estimator in Dynamic Environments by Adaptive Truncated Least Squares and Stable State Recovery |
|
Song, Seungwon | Hyundai Motor Company |
Lim, Hyungtae | Massachusetts Institute of Technology |
Lee, Alex | Sookmyung Women’s University |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Keywords: Visual-Inertial SLAM, Sensor Fusion, SLAM
Abstract: Despite extensive research on robust visual-inertial navigation systems (VINS) in dynamic environments, many approaches remain vulnerable to objects that suddenly start moving, which are referred to as abruptly dynamic objects. In addition, most approaches have considered the effect of dynamic objects only at the feature association level. In this study, we observed that the state estimation diverges when errors from false correspondences owing to moving objects incorrectly propagate into the IMU bias terms. To overcome these problems, we propose a robust VINS framework called DynaVINS++, which employs a) an adaptive truncated least squares method that adaptively adjusts the truncation range using both feature association and IMU preintegration to effectively minimize the effect of dynamic objects while reducing the computational cost, and b) stable state recovery with a bias consistency check to correct misestimated IMU bias and to prevent the divergence caused by abruptly dynamic objects. As verified on both public and real-world datasets, our approach shows promising performance in dynamic environments, including scenes with abruptly dynamic objects.
|
|
TuAT20 Regular Session, 408 |
Add to My Program |
Teleoperation |
|
|
Chair: Fiorini, Paolo | University of Verona |
Co-Chair: Cui, Yuchen | University of California, Los Angeles |
|
09:55-10:00, Paper TuAT20.1 | Add to My Program |
A Pragmatic Approach to Bi-Directional Impedance Reflection Telemanipulation Control: Design and User Study |
|
Lieftink, Robin | TNO, University of Twente |
Falcone, Sara | University of Twente |
Van Der Walt, Christophe | University of Twente |
Van Erp, Jan | TNO |
Dresscher, Douwe | University of Twente |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Human Factors and Human-in-the-Loop
Abstract: Force feedback generally increases the effectiveness of execution and the sense of embodiment in telemanipulation systems. However, systems with force feedback are vulnerable to time delays, reducing their transparency and stability. In this paper, we implement a bi-directional impedance reflection controller, a concept that was presented as early as 1989 by Blake Hannaford [1] but was never fully implemented. In this method, the simplified impedances of the operator and the environment are estimated and reflected back to the remote robot and haptic interface, respectively. A trajectory predictor is added to compensate for the delayed motion. We then evaluated the effectiveness of the system in a user study, comparing it to a system with a classical bilateral impedance controller with passivity layers. Three time-delay groups (0, 10, and 20 ms one-way delay) of 10 participants each executed different tasks with both controllers. The results show that the bi-directional impedance reflection controller performs significantly better in the 10 ms and 20 ms time-delay groups in terms of task performance, user experience, and sense of embodiment. We conclude that this study is the first to show that bi-directional impedance reflection is robust to time delays of at least 20 ms.
|
|
10:00-10:05, Paper TuAT20.2 | Add to My Program |
3D Whole-Body Pose Estimation Using Graph High-Resolution Network for Humanoid Robot Teleoperation |
|
Zhang, Mingyu | Sun Yat-Sen University |
Gao, Qing | Sun Yat-Sen University |
Lai, Yuanchuan | Sun Yat-Sen University |
Zhang, Ye | Sun Yat-Sen University |
Chang, Tao | National University of Defense Technology |
Guo, Yulan | Sun Yat-Sen University |
Keywords: Deep Learning for Visual Perception, Gesture, Posture and Facial Expressions, Human Detection and Tracking
Abstract: In the realm of robotics, teleoperation plays a pivotal role in performing high-risk or intricate tasks, and obtaining precise 3D whole-body poses is crucial for this purpose. Traditional two-stage methods have limitations in estimating different body parts, leading to complex systems and higher estimation errors. To address these issues, this paper introduces a novel framework called Graph High-Resolution Network (GraphHRNet) for accurate 3D whole-body pose estimation, which is essential for the teleoperation of humanoid robots. GraphHRNet effectively captures global structural information and local details by integrating a High-Resolution Module and a Multi-branch Regression Module. The High-Resolution Module utilizes an enhanced graph convolution kernel to fuse multi-scale features, capturing global information, while the Multi-branch Regression Module focuses on refining and predicting accurate 3D coordinates for intricate body parts such as the hands and face. Experimental results on the H3WB dataset demonstrate that GraphHRNet surpasses state-of-the-art (SOTA) methods in 3D whole-body pose estimation, significantly improving performance. Furthermore, the paper explores the potential application of this approach in a teleoperation system for humanoid robots, providing an intuitive and high-fidelity solution for remotely executing complex tasks. The code is publicly available at https://github.com/Z-mingyu/GraphHRNet.git
|
|
10:05-10:10, Paper TuAT20.3 | Add to My Program |
Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation |
|
Chakraborty, Neeloy | University of Illinois at Urbana-Champaign |
Fang, Yixiao | University of Illinois at Urbana-Champaign |
Schreiber, Andre | University of Illinois Urbana-Champaign |
Ji, Tianchen | University of Illinois at Urbana-Champaign |
Huang, Zhe | University of Illinois at Urbana-Champaign |
Mihigo, Aganze | University of Illinois at Urbana-Champaign |
Wall, Cassidy | University of Illinois at Urbana Champaign |
Almana, Abdulrahman | University of Illinois Urbana-Champaign |
Driggs-Campbell, Katherine | University of Illinois at Urbana-Champaign |
Keywords: Field Robots, Telerobotics and Teleoperation, Deep Learning for Visual Perception
Abstract: Teleoperation is an important technology to enable supervisors to control agricultural robots remotely. However, environmental factors in dense crop rows and limitations in network infrastructure hinder the reliability of data streamed to teleoperators. These issues result in delayed and variable frame rate video feeds that often deviate significantly from the robot's actual viewpoint. We propose a modular learning-based vision pipeline to generate delay-compensated images in real-time for supervisors. Our extensive offline evaluations demonstrate that our method generates more accurate images compared to state-of-the-art approaches in our setting. Additionally, ours is one of the few works to evaluate a delay-compensation method in outdoor field environments with complex terrain on data from a real robot in real-time. Resulting videos and code are provided at https://sites.google.com/illinois.edu/comp-teleop.
|
|
10:10-10:15, Paper TuAT20.4 | Add to My Program |
ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation |
|
Liu, Wenhai | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Wang, Yiming | Shanghai Jiao Tong University |
Wang, Weiming | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Imitation Learning, Force Control, Deep Learning in Grasping and Manipulation
Abstract: In most contact-rich manipulation tasks, humans apply time-varying forces to the target object, compensating for inaccuracies in the vision-guided hand trajectory. However, current robot learning algorithms primarily focus on trajectory-based policies, with limited attention given to learning force-related skills. To address this limitation, we introduce ForceMimic, a force-centric robot learning system, providing a natural, force-aware and robot-free robotic demonstration collection system, along with a hybrid force-motion imitation learning algorithm for robust contact-rich manipulation. Using the proposed ForceCapture system, an operator can peel a zucchini in 5 minutes, while force-feedback teleoperation takes over 13 minutes and struggles with task completion. With the collected data, we propose HybridIL to train a force-centric imitation learning model, equipped with a hybrid force-position control primitive to fit the predicted wrench-position parameters during robot execution. Experiments demonstrate that our approach enables the model to learn a more robust policy under the contact-rich task of vegetable peeling, increasing the success rate by 54.5% relative to state-of-the-art pure-vision-based imitation learning. Hardware, code, data and more results can be found on the project website at https://forcemimic.github.io.
|
|
10:15-10:20, Paper TuAT20.5 | Add to My Program |
How to Train Your Robots? the Impact of Demonstration Modality on Imitation Learning |
|
Li, Haozhuo | Stanford University |
Cui, Yuchen | University of California, Los Angeles |
Sadigh, Dorsa | Stanford University |
Keywords: Imitation Learning, Learning from Demonstration, Data Sets for Robot Learning
Abstract: Imitation learning is a promising approach for learning robot policies with user-provided data. The way demonstrations are provided, i.e., the demonstration modality, influences the quality of the data. While existing research shows that kinesthetic teaching (physically guiding the robot) is preferred by users for its intuitiveness and ease of use, the majority of existing manipulation datasets were collected through teleoperation via a VR controller or spacemouse. In this work, we investigate how different demonstration modalities impact downstream learning performance as well as user experience. Specifically, we compare low-cost demonstration modalities including kinesthetic teaching, teleoperation with a VR controller, and teleoperation with a spacemouse controller. We experiment with three table-top manipulation tasks with different motion constraints. We evaluate and compare imitation learning performance using data from different demonstration modalities, and collect subjective feedback on user experience. Our results show that kinesthetic teaching is rated the most intuitive for controlling the robot and provides the cleanest data for the best downstream learning performance. However, it is not preferred for large-scale data collection due to the physical load. Based on this insight, we propose a simple data collection scheme that relies on a small number of kinesthetic demonstrations mixed with data collected through teleoperation to achieve the best overall learning performance while maintaining low data-collection effort.
|
|
10:20-10:25, Paper TuAT20.6 | Add to My Program |
The Impact of Stress and Workload on Human Performance in Robot Teleoperation Tasks |
|
Yi Ting, Sam | Georgia Institute of Technology |
Hedlund-Botti, Erin | Georgia Institute of Technology |
Natarajan, Manisha | Georgia Institute of Technology |
Heard, Jamison | Rochester Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Telerobotics and Teleoperation, Cognitive Human-Robot Interaction, Human Factors and Human-in-the-Loop, Human-Centered Robotics
Abstract: Advances in robot teleoperation have enabled groundbreaking innovations in many fields, such as space exploration, healthcare, and disaster relief. The human operator's performance plays a key role in the success of any teleoperation task, with prior evidence suggesting that operator stress and workload can impact task performance. As robot teleoperation is currently deployed in safety-critical domains, it is essential to analyze how different stress and workload levels impact the operator. We are unaware of any prior work investigating how both stress and workload impact teleoperation performance. We conducted a novel study (n=24) to jointly manipulate users' stress and workload and analyze the user's performance through objective and subjective measures. Our results indicate that, as stress increased, over 70% of our participants performed better up to a moderate level of stress; yet, the majority of participants performed worse as the workload increased. Importantly, our experimental design elucidated that stress and workload have related yet distinct impacts on task performance, with workload mediating the effects of distress on performance (p<.05).
|
|
10:25-10:30, Paper TuAT20.7 | Add to My Program |
Adaptive User Interface with Parallel Neural Networks for Robot Teleoperation |
|
SharafianArdakani, Payman | UofL |
Hanafy, Mohamed A. | University of Louisville |
Kondaurova, Irina | UofL |
Ashary, Ali | University of Louisville |
Rayguru, Madan Mohan | Delhi Technological University |
Popa, Dan | University of Louisville |
Keywords: Telerobotics and Teleoperation, Human Performance Augmentation, Virtual Reality and Interfaces
Abstract: In recent years, human-robot interaction (HRI) has become an increasingly important field of research. The human experience during HRI tasks like teleoperation or turn-taking largely depends on the interface design between the robot and the user. Designing an intuitive user interface (UI) between an arbitrary M-dimensional input device and an N-degree of freedom (DOF) robot remains a significant challenge. This paper proposes a novel UI design approach named the Parallel Neural Networks Adaptive User Interface (PNNUI). PNNUI utilizes two parallel neural networks to learn and then improve the teleoperation performance of users by minimizing task completion time and maximizing motion smoothness. Our method is designed to learn an unintuitive input-output map between user interface hardware and the robot by minimizing task completion time in an offline unsupervised learning scheme based on Neural Networks (NNs) and Genetic Algorithms. Secondly, PNNUI minimizes teleoperation jerk online by adapting the weights of a parallel neural network. We experimentally evaluated the resulting UI for teleoperating a 3-DOF nonholonomic robot through a conventional joystick with three inputs. Twenty human subjects operated the robot along an obstacle course in several conditions. The statistical analysis of the user trial data shows that PNNUI improves the human experience in robot teleoperation by maximizing smoothness while maintaining the completion time of the offline learning scheme. Furthermore, the abstract nature of our formulation enables the customization of performance measures, which extends its applicability to other interface devices and HRI tasks, particularly those that are not intuitive to start with.
|
|
TuAT21 Regular Session, 410 |
Add to My Program |
Reinforcement Learning 1 |
|
|
Chair: Song, WenZhan | University of Georgia |
Co-Chair: Kantaros, Yiannis | Washington University in St. Louis |
|
09:55-10:00, Paper TuAT21.1 | Add to My Program |
Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes |
|
Wang, Hecheng | Fudan University |
Qi, Lizhe | Fudan University |
Wang, Ziheng | Academy for Engineering & Technology, Fudan University |
Ren, Jiankun | Fudan University |
Li, Wei | Fudan University |
Sun, Yunquan | Fudan University |
Keywords: Reinforcement Learning, Imitation Learning, Deep Learning in Grasping and Manipulation
Abstract: In this work, we focus on addressing the long-horizon packing tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce precise actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to select and instantiate three parameterized action primitives: push, pick, and place. We first train the two-stream pick and place options by behavior cloning (BC). Subsequently, we use hierarchical reinforcement learning (HRL) to train the high-level policy and push option. During HRL, we propose a Spatially Extended Q-update (SEQ) to augment the updates for the push option and a Two-Stage Update Scheme (TSUS) to alleviate the non-stationary transition problem in updating the high-level policy. We demonstrate that HCLM significantly outperforms baseline methods in terms of success rate and efficiency in diverse tasks both in simulation and real world. The ablation studies also validate the key roles of SEQ and TSUS in HRL.
|
|
10:00-10:05, Paper TuAT21.2 | Add to My Program |
AERAS: Adaptive Experience Replay with Attention-Based Sequence Embedding for Improved Multi-Agent Reinforcement Learning |
|
Xie, Zaipeng | Hohai University |
Shen, Sitong | Hohai University |
Wang, Yaowu | Hohai University |
Fang, Wenhao | Hohai University |
Song, WenZhan | University of Georgia |
Keywords: Reinforcement Learning, Agent-Based Systems, Autonomous Agents
Abstract: Multi-agent systems in non-stationary environments face challenges due to rapidly changing dynamics, leading to quick obsolescence of experiences in the replay buffer. To address this, we propose the Adaptive Experience Replay with Attention-Based Sequence Embedding (AERAS) framework, which integrates sequence embedding with an attention mechanism to prioritize experiences based on their relevance. By assigning adaptive weights, AERAS emphasizes relevant experiences while diminishing the impact of outdated ones, enhancing efficiency and learning performance in multi-agent reinforcement learning. Evaluations on the StarCraft II Multi-Agent Challenge and Google Research Football environments show that AERAS consistently outperforms state-of-the-art methods, achieving faster convergence and higher win rates. Ablation studies confirm the essential roles of sequence embedding and attention mechanisms in boosting AERAS's robustness and adaptability, underscoring its effectiveness in managing non-stationary environments within multi-agent systems.
|
|
10:05-10:10, Paper TuAT21.3 | Add to My Program |
Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences |
|
Liu, Ziang | East China Normal University |
Xu, Junjie | East China Normal University |
Wu, XingJiao | East China Normal University |
Yang, Jing | East China Normal University |
He, Liang | East China Normal University |
Keywords: Reinforcement Learning, Human Factors and Human-in-the-Loop, Deep Learning Methods
Abstract: Preference-based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors, without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily from explicit preferences, neglecting the possibility that teachers may choose equal preferences. This neglect may hinder the agent's understanding of the teacher's perspective on the task, leading to the loss of important information. To address this issue, we introduce the Equal Preference Learning Task, which optimizes the neural network by promoting similar reward predictions when the behaviors of two agents are labeled as equal preferences. Building on this task, we propose a novel PBRL method, Multi-Type Preference Learning (MTPL), which allows simultaneous learning from equal preferences while leveraging existing methods for learning from explicit preferences. To validate our approach, we design experiments applying MTPL to four existing state-of-the-art baselines across ten locomotion and robotic manipulation tasks in the DeepMind Control Suite. The experimental results indicate that simultaneous learning from both equal and explicit preferences enables the PBRL method to more comprehensively understand the feedback from teachers, thereby enhancing feedback efficiency. Project page: https://github.com/FeiCuiLengMMbb/paper_MTPL
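One way to picture learning from equal preferences is a Bradley-Terry-style preference loss with soft 0.5/0.5 targets, as in the hedged sketch below; this is a generic illustration of the concept, not the MTPL implementation.

```python
# Minimal sketch (not the authors' code): a preference loss over predicted
# segment returns that also accepts "equally preferred" labels (label = 0.5),
# which pushes the two reward predictions toward each other.
import torch

def preference_loss(ret_a, ret_b, label):
    """ret_a, ret_b: predicted returns for two behavior segments, shape [B].
    label: 1.0 if A is preferred, 0.0 if B is preferred, 0.5 if equal."""
    logits = torch.stack([ret_a, ret_b], dim=-1)
    log_p = torch.log_softmax(logits, dim=-1)
    target = torch.stack([label, 1.0 - label], dim=-1)
    return -(target * log_p).sum(dim=-1).mean()

ret_a = torch.randn(8, requires_grad=True)
ret_b = torch.randn(8, requires_grad=True)
labels = torch.full((8,), 0.5)        # all pairs labeled as equally preferred
loss = preference_loss(ret_a, ret_b, labels)
loss.backward()
print(loss.item())
```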
|
|
10:10-10:15, Paper TuAT21.4 | Add to My Program |
Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning |
|
McCutcheon, Luc Harold Lucien | University of Surrey |
Gharesifard, Bahman | UCLA |
Fallah, Saber | University of Surrey |
Keywords: Reinforcement Learning, Robot Safety, Machine Learning for Robot Control
Abstract: Control Lyapunov functions are traditionally used to design a controller which ensures convergence to a desired state, yet deriving these functions for nonlinear systems remains a complex challenge. This paper presents a novel, sample-efficient method for neural approximation of nonlinear Lyapunov functions, leveraging self-supervised Reinforcement Learning (RL) to enhance training data generation, particularly for inaccurately represented regions of the state space. The proposed approach employs a data-driven World Model to train Lyapunov functions from off-policy trajectories. The method is validated on both standard and goal-conditioned robotic tasks, demonstrating faster convergence and higher approximation accuracy compared to the state-of-the-art neural Lyapunov approximation baseline. The code is available at: https://github.com/CAV-Research-Lab/SACLA.git
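The Lyapunov conditions that such a neural approximator must satisfy can be turned into training losses; the following is a generic sketch (the margin, decrease rate, and use of off-policy transition pairs are our assumptions, not the paper's exact formulation):

import torch

def lyapunov_losses(V, x, x_next, x_goal, margin=1e-3, alpha=0.1):
    # V: neural network mapping states to scalar values
    v, v_next, v_goal = V(x).squeeze(-1), V(x_next).squeeze(-1), V(x_goal).squeeze(-1)
    loss_goal = v_goal.pow(2).mean()                                # V(x*) should be ~0 at the goal
    loss_positive = torch.relu(margin - v).mean()                   # V(x) > 0 away from the goal
    loss_decrease = torch.relu(v_next - (1.0 - alpha) * v).mean()   # V decreases along transitions
    return loss_goal + loss_positive + loss_decrease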
|
|
10:15-10:20, Paper TuAT21.5 | Add to My Program |
SuPLE: Robot Learning with Lyapunov Rewards |
|
Nguyen, Phu | San Jose State University |
Polani, Daniel | University of Hertfordshire |
Tiomkin, Stas | Texas Tech University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control
Abstract: The reward function is an essential component in robot learning. Reward directly affects the sample and computational complexity of learning, and the quality of a solution. The design of informative rewards requires domain knowledge, which is not always available. We use the properties of the dynamics to produce a system-appropriate reward without adding external assumptions. Specifically, we explore an approach that utilizes the Lyapunov exponents of the system dynamics to generate a system-immanent reward. We demonstrate that the Sum of the Positive Lyapunov Exponents (SuPLE) is a strong candidate for the design of such a reward. We develop a computational framework for the derivation of this reward, and demonstrate its effectiveness on classical benchmarks for sample-based stabilization of various dynamical systems. It eliminates the need to start the training trajectories at arbitrary states, also known as auxiliary exploration. While the latter is a common practice in simulated robot learning, it is impractical in real robotic systems, since these typically start from natural rest states, such as a pendulum at the bottom or a robot on the ground, and cannot be easily initialized at arbitrary states. Comparing the performance of SuPLE to commonly-used reward functions, we observe that the latter fail to find a solution without auxiliary exploration, even for the task of swinging up the double pendulum and keeping it stable at the upright position, a prototypical scenario for multi-linked robots. SuPLE-induced rewards offer a novel route for effective robot learning in typical, as opposed to highly specialized or fine-tuned, scenarios. Our code is publicly available for reproducibility and further research.
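For reference, the quantities behind such a reward can be written as follows (standard definitions of finite-time Lyapunov exponents; the exact way the paper turns them into a per-step reward may differ):

\[
\lambda_i(T) \;=\; \frac{1}{T}\,\ln \sigma_i\!\Big(\prod_{t=0}^{T-1} J_t\Big),
\qquad
r_{\mathrm{SuPLE}} \;=\; \sum_{i:\,\lambda_i>0} \lambda_i ,
\]

where \(J_t = \partial f/\partial x\,\big|_{x_t}\) is the Jacobian of the dynamics along the trajectory and \(\sigma_i(\cdot)\) denotes the \(i\)-th singular value of the accumulated Jacobian.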
|
|
10:20-10:25, Paper TuAT21.6 | Add to My Program |
SpeedTuning: Speeding up Policy Execution with Lightweight Reinforcement Learning |
|
Yuan, David D. | Stanford University |
Zhao, Zihao | Stanford University |
Burns, Kaylee | Stanford University |
Finn, Chelsea | Stanford University |
Keywords: Reinforcement Learning, Imitation Learning, Deep Learning Methods
Abstract: While learned robotic policies hold promise for advancing generalizable manipulation, their practical deployment is often hindered by suboptimal execution speeds. Imitation learning policies are inherently limited by hardware constraints and the speed of the operator during data collection. In addition, there are no established methods for accelerating policies learned via imitation, and the empirical relationship between execution speed and task success remains underexplored. To address these issues, we introduce SpeedTuning, a reinforcement learning framework specifically designed to enhance the speed of manipulation policies. SpeedTuning learns to predict the optimal execution speed for actions, thereby complementing a base policy without necessitating additional data collection. We provide empirical evidence that SpeedTuning achieves substantial improvements in execution speed, exceeding 2.4x speed-up, while preserving an adequate success rate compared to both the original task policy and straightforward speed-up methods such as linear interpolation at a fixed speed. We evaluate our approach across a diverse set of dynamic and precise tasks, including pouring, throwing, and picking, demonstrating its effectiveness and robustness in enhancing real-world robotic manipulation. Videos and code are available at https://github.com/DaivdYuan/SpeedTuning
|
|
10:25-10:30, Paper TuAT21.7 | Add to My Program |
Simplifying Reward Design in Complex Robotics: Average-Reward Maximum Entropy Reinforcement Learning |
|
Choe, Jean Seong Bjorn | Korea University |
Choi, Bumkyu | Korea University |
Kim, Jong-kook | Korea University |
Keywords: Reinforcement Learning, Underactuated Robots, Robust/Adaptive Control
Abstract: This paper presents a novel approach to addressing the control challenges of underactuated systems, focusing on the swing-up and stabilisation tasks on the double pendulum system. We propose the Average-Reward Entropy Advantage Policy Optimisation (AR-EAPO), a model-free reinforcement learning (RL) algorithm that integrates the strengths of average-reward RL and maximum entropy RL (MaxEnt RL). The average-reward criterion allows the use of a simple reward function by naturally promoting long-term goals, while MaxEnt RL encourages the robustness of the policy. We validate our approach through simulations, consistently outperforming standard RL baselines and traditional control methods. Also, we provide preliminary test results on real double pendulum hardware. Additional experiments on MuJoCo environments further demonstrate AR-EAPO's efficacy on general continuous control tasks. This work underscores the potential of the average-reward criterion in simplifying control design while achieving superior results.
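In its generic form, the combined criterion referred to above is (a reference sketch of the average-reward MaxEnt objective; the paper's algorithmic details such as the advantage estimator are not reproduced):

\[
\rho(\pi) \;=\; \lim_{T\to\infty}\frac{1}{T}\,
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} r(s_t,a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\right],
\]

i.e. the policy maximizes the long-run average of reward plus entropy rather than a discounted sum, which is what allows a simple reward function to still capture long-term goals.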
|
|
TuAT22 Regular Session, 411 |
Add to My Program |
Learning Based Planning for Manipulation 1 |
|
|
Chair: Hermans, Tucker | University of Utah |
Co-Chair: Pompili, Dario | Rutgers University |
|
09:55-10:00, Paper TuAT22.1 | Add to My Program |
Multi-Stage Reinforcement Learning for Non-Prehensile Manipulation |
|
Wang, Dexin | Shandong University |
Liu, Chunsheng | Shandong University |
Chang, Faliang | Shandong University |
Huan, Hengqiang | Shandong University |
Cheng, Kun | Shandong University |
Keywords: Grasping, Manipulation Planning, Reinforcement Learning
Abstract: Manipulating objects without grasping them facilitates complex tasks, known as non-prehensile manipulation. Most previous methods are limited to learning a single skill to manipulate objects with primitive shapes, and cannot handle flexible object manipulation that requires a combination of multiple skills. We explore skill-unconstrained non-prehensile manipulation, and propose Multi-stage Reinforcement Learning for Non-prehensile Manipulation (MRLNM), which computes an intermediate state between the initial and goal states and divides the task into multiple stages for sequential learning. At each stage, the policy takes the desired 6-DOF object pose as the goal and proposes a spatially-continuous action, allowing the robot to explore arbitrary skills to accomplish the task. To handle objects with different shapes, we propose a State-Goal Fusion Representation (SGF-Representation) to represent observations and goals as point clouds with motion, which improves the policy's perception of scene layout and task goal. To improve sample efficiency, we propose a Spatially-Reachable Distance Metric (SR-Distance) to approximately measure the shortest distance between two points without intersecting the scene. We evaluate MRLNM on an occluded grasping task which aims to grasp the object in initially occluded configurations. MRLNM demonstrates strong generalization to unseen objects with shapes outside the training distribution and can be transferred to the real world with zero-shot transfer, achieving a 95% success rate.
|
|
10:00-10:05, Paper TuAT22.2 | Add to My Program |
Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics |
|
Huang, Yixuan | University of Utah |
Agia, Christopher George | Stanford University |
Wu, Jimmy | Princeton University |
Hermans, Tucker | University of Utah |
Bohg, Jeannette | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation, Mobile Manipulation, Manipulation Planning
Abstract: We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%.
|
|
10:05-10:10, Paper TuAT22.3 | Add to My Program |
Retrieval-Augmented Hierarchical In-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs |
|
Sun, Chuanneng | Rutgers University |
Huang, Songjun | Rutgers University |
Liu, Haiqiao | Rutgers University |
Gong, Jie | Rutgers University |
Pompili, Dario | Rutgers University |
Keywords: AI-Based Methods, Reinforcement Learning, Agent-Based Systems
Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Retrieval-Augmented Hierarchical in-context reinforcement Learning (RAHL), a novel framework in which an LLM-based high-level policy decomposes complex tasks into sub-tasks on-the-fly. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. To improve the agent's performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR), where, instead of reflecting on the full trajectory, we let the agent reflect on shorter sub-trajectories to improve reflection efficiency. We evaluate the decision-making ability of the proposed RAHL in three benchmark environments--ALFWorld, Webshop, and HotpotQA. Results show that RAHL can achieve 9%, 42%, and 10% performance improvement in 5 episodes of execution over strong baselines. Furthermore, we also implemented RAHL on the Boston Dynamics SPOT robot. The experiment shows that the robot is able to scan the environment, find doorways, and navigate to new rooms controlled by the LLM policy.
|
|
10:10-10:15, Paper TuAT22.4 | Add to My Program |
Automatic Behavior Tree Expansion with LLMs for Robotic Manipulation |
|
Styrud, Jonathan | ABB |
Iovino, Matteo | ABB Corporate Research |
Norrlöf, Mikael | Linköping University |
Björkman, Mårten | KTH |
Smith, Claes Christian | KTH Royal Institute of Technology |
Keywords: AI-Enabled Robotics, AI-Based Methods, Behavior-Based Systems
Abstract: Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks or unpredictable environments, while keeping a transparent policy that is readable and verifiable by humans. We propose the method BEhavior TRee eXPansion with Large Language Models to dynamically and automatically expand and configure Behavior Trees as policies for robot control. The method utilizes an LLM to resolve errors outside the task planner's capabilities, both during planning and execution. We show that the method is able to solve a variety of tasks and failures and permanently update the policy to handle similar problems in the future.
|
|
10:15-10:20, Paper TuAT22.5 | Add to My Program |
LLM-As-BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning |
|
Ao, Jicong | Technical University Munich |
Wu, Fan | Technical University of Munich |
Wu, Yansong | Technische Universität München |
Swikir, Abdalla | Mohamed Bin Zayed University of Artificial Intelligence |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Keywords: Behavior-Based Systems, AI-Enabled Robotics, Assembly
Abstract: Robotic assembly tasks remain an open challenge due to their long horizon nature and complex part relations. Behavior trees (BTs) are increasingly used in robot task planning for their modularity and flexibility, but creating them manually can be effort-intensive. Large language models (LLMs) have recently been applied to robotic task planning for generating action sequences, yet their ability to generate BTs has not been fully investigated. To this end, we propose LLM-as-BT-Planner, a novel framework that leverages LLMs for BT generation in robotic assembly task planning. Four in-context learning methods are introduced to utilize the natural language processing and inference capabilities of LLMs for producing task plans in BT format, reducing manual effort while ensuring robustness and comprehensibility. Additionally, we evaluate the performance of fine-tuned smaller LLMs on the same tasks. Experiments in both simulated and real-world settings demonstrate that our framework enhances LLMs' ability to generate BTs, improving success rate through in-context learning and supervised fine-tuning.
|
|
10:20-10:25, Paper TuAT22.6 | Add to My Program |
Enhancing Multi-Agent Systems Via Reinforcement Learning with LLM-Based Planner and Graph-Based Policy |
|
Jia, Ziqi | Tsinghua University |
Li, Junjie | Huazhong University of Science and Technology |
Qu, Xiaoyang | Ping An Technology (Shenzhen) |
Wang, Jianzong | Ping An Technology (Shenzhen) |
Keywords: AI-Based Methods, AI-Enabled Robotics, Multi-Robot Systems
Abstract: Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LGC-MARL, a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
|
|
10:25-10:30, Paper TuAT22.7 | Add to My Program |
A Black-Box Physics-Informed Estimator Based on Gaussian Process Regression for Robot Inverse Dynamics Identification |
|
Giacomuzzo, Giulio | University of Padova |
Carli, Ruggero | University of Padova |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Dalla Libera, Alberto | University of Padova |
Keywords: Dynamics, Calibration and Identification, Model Learning for Control, Gaussian Process Regression
Abstract: Learning the inverse dynamics of robots directly from data, adopting a black-box approach, is interesting for several real-world scenarios where limited knowledge about the system is available. In this paper, we propose a black-box model based on Gaussian Process (GP) Regression for the identification of the inverse dynamics of robotic manipulators. The proposed model relies on a novel multidimensional kernel, called the Lagrangian Inspired Polynomial (LIP) kernel. The LIP kernel is based on two main ideas. First, instead of directly modeling the inverse dynamics components, we model as GPs the kinetic and potential energy of the system. The GP prior on the inverse dynamics components is derived from those on the energies by applying the properties of GPs under linear operators. Second, as regards the energy prior definition, we prove a polynomial structure of the kinetic and potential energy, and we derive a polynomial kernel that encodes this property. As a consequence, the proposed model can also estimate the kinetic and potential energy without requiring any labels on these quantities. Results on simulation and on two real robotic manipulators, including a 7 DOF Franka Emika Panda, confirm the effectiveness of the proposed approach.
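The mechanism described above can be summarized as follows (a sketch based on standard Lagrangian mechanics and the stated property of GPs under linear operators; the notation and the independence assumption on the two energy priors are ours):

\[
\mathcal{L}(q,\dot q) \;=\; \mathcal{T}(q,\dot q) - \mathcal{V}(q),
\qquad
\tau_i \;=\; \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q_i} \;-\; \frac{\partial \mathcal{L}}{\partial q_i},
\]

so the map from the energies to the joint torques is a linear differential operator \(\mathcal{G}\). Placing GP priors with kernels \(k_{\mathcal{T}}\) and \(k_{\mathcal{V}}\) on the kinetic and potential energy then induces a GP prior on \(\tau\) with covariance of the form \(k_\tau(x,x') = \mathcal{G}_x \mathcal{G}_{x'}\big[k_{\mathcal{T}}(x,x') + k_{\mathcal{V}}(x,x')\big]\), which is why the energies can be estimated without labels on them.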
|
|
TuAT23 Regular Session, 412 |
Add to My Program |
Autonomous Vehicle Navigation 1 |
|
|
Chair: Kunze, Lars | UWE Bristol |
Co-Chair: Otte, Michael W. | University of Maryland |
|
09:55-10:00, Paper TuAT23.1 | Add to My Program |
Annealed Winner-Takes-All for Motion Forecasting |
|
Xu, Yihong | Valeo.ai |
Letzelter, Victor | Telecom ParisTech, Valeo AI |
Chen, Mickaël | Valeo |
Zablocki, Eloi | Valeo |
Cord, Matthieu | Sorbonne Université, Valeo.ai |
Keywords: Autonomous Vehicle Navigation, Computer Vision for Automation, Vision-Based Navigation
Abstract: In autonomous driving, motion prediction aims at forecasting the future trajectories of nearby agents, helping the ego vehicle to anticipate behaviors and drive safely. A key challenge is generating a diverse set of future predictions, commonly addressed using data-driven models with Multiple Choice Learning (MCL) architectures and Winner-Takes-All (WTA) training objectives. However, these methods face initialization sensitivity and training instabilities. Additionally, to compensate for limited performance, some approaches rely on training with a large set of hypotheses, requiring a post-selection step during inference to significantly reduce the number of predictions. To tackle these issues, we take inspiration from annealed MCL, a recently introduced technique that improves the convergence properties of MCL methods through an annealed Winner-Takes-All loss (aWTA). In this paper, we demonstrate how the aWTA loss can be integrated with state-of-the-art motion forecasting models to enhance their performance using only a minimal set of hypotheses, eliminating the need for the cumbersome post-selection step. Our approach can be easily incorporated into any trajectory prediction model normally trained using WTA and yields significant improvements. To facilitate the application of our approach to future motion forecasting models, the code will be made publicly available upon acceptance.
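A minimal sketch of an annealed Winner-Takes-All loss (our simplification of the general annealed-MCL idea; the paper's exact distance, weighting, and temperature schedule may differ): instead of updating only the closest hypothesis, every hypothesis receives a softmin weight whose temperature is annealed toward zero, recovering plain WTA in the limit.

import torch

def awta_loss(hypotheses, target, temperature):
    # hypotheses: (B, K, D) predicted futures, target: (B, D) ground truth
    errors = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)     # (B, K) per-hypothesis error
    weights = torch.softmax(-errors.detach() / temperature, dim=-1)    # soft winner assignment
    return (weights * errors).sum(dim=-1).mean()

# temperature is typically decayed over training, e.g. temperature = t0 * decay ** epoch (assumed schedule)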
|
|
10:00-10:05, Paper TuAT23.2 | Add to My Program |
Causal Contrastive Learning with Data Augmentations for Imitation-Based Planning |
|
Xin, Haojie | Xi'an Jiaotong University |
Zhang, Xiaodong | Xidian University |
Yan, Songyang | Xi'an Jiaotong University |
Sun, Jun | Singapore Management University |
Yang, Zijiang | University of Science and Technology of China |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Motion planning is a difficult task, especially when generating feasible future trajectories in complex and interactive scenarios. While recent advancements in imitation-based planning have shown significant progress, this approach often encounters causal confusion in dynamic traffic environments. This confusion will cause the planner to incorrectly associate certain actions with outcomes, leading to suboptimal or unsafe plans. To address this, we introduce a novel framework called C2L, which improves the planner’s latent Causal understanding by incorporating Contrastive Learning and counterfactual data augmentation. Additionally, we propose a shortcut eliminator to extract copycat-free features from history states, reducing the impact of temporal spurious correlations. We validate our method on the nuPlan and interPlan benchmarks, with extensive experiments demonstrating that C2L delivers highly competitive performance compared to state-of-the-art methods.
|
|
10:05-10:10, Paper TuAT23.3 | Add to My Program |
Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving |
|
Xiao, Lingyu | Southeast University |
Liu, Jiang-Jiang | Baidu |
Yang, Sen | Baidu |
Li, Xiaofan | Baidu |
Ye, Xiaoqing | Baidu Inc |
Yang, Wankou | Southeast University |
Wang, Jingdong | Baidu |
Keywords: Autonomous Vehicle Navigation, Integrated Planning and Control, Intelligent Transportation Systems
Abstract: The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. By incorporating mixture modeling, the stochastic nature of decision-making is captured. Additionally, the self-delusion problem is mitigated by providing intermediate actions sampled from a distribution to the world model. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
|
|
10:10-10:15, Paper TuAT23.4 | Add to My Program |
Autonomous Wheel Loader Navigation Using Goal-Conditioned Actor-Critic MPC |
|
Mäki-Penttilä, Aleksi | Tampere University |
Toulkani, Naeim Ebrahimi | Tampere University |
Ghabcheloo, Reza | Tampere University |
Keywords: Autonomous Vehicle Navigation, Optimization and Optimal Control, Motion and Path Planning
Abstract: This paper proposes a novel control method for an autonomous wheel loader, enabling time-efficient navigation to an arbitrary goal pose. Unlike prior works which combine high-level trajectory planners with Model Predictive Control (MPC), we directly enhance the planning capabilities of MPC by incorporating a cost function derived from Actor-Critic Reinforcement Learning (RL). Specifically, we first train an RL agent to solve the pose reaching task in simulation, then transfer the learned planning knowledge to an MPC by incorporating the trained neural network critic as both the stage and terminal cost. We show through comprehensive simulations that the resulting MPC inherits the time-efficient behavior of the RL agent, generating trajectories that compare favorably against those found using trajectory optimization. We also deploy our method on a real-world wheel loader, where we demonstrate successful navigation in various scenarios.
|
|
10:15-10:20, Paper TuAT23.5 | Add to My Program |
Unlock the Power of Unlabeled Data in Language Driving Model |
|
Wang, Chaoqun | The Chinese University of Hong Kong, Shenzhen |
Yang, Jie | The Chinese University of Hong Kong, Shenzhen |
Hong, Xiaobin | Nanjing University |
Zhang, Ruimao | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Autonomous Vehicle Navigation, Vision-Based Navigation, Intelligent Transportation Systems
Abstract: Recent Vision-based Large Language Models (VisionLLMs) for autonomous driving have seen rapid advancements. However, such progress depends heavily on large-scale, high-quality annotated data, which is costly and labor-intensive. To address this issue, we propose unlocking the value of abundant yet unlabeled data to improve the language-driving model in a semi-supervised learning manner. Specifically, we first introduce a series of template-based prompts to extract scene information, generating questions that create pseudo-answers for the unlabeled data based on a model trained with limited labeled data. Next, we propose a Self-Consistency Refinement method to improve the quality of these pseudo-annotations, which are later used for further training. By utilizing a pre-trained VisionLLM (e.g., InternVL), we build a strong Language Driving Model (LDM) for driving scene question-answering, outperforming previous state-of-the-art methods. Extensive experiments on the DriveLM benchmark show that our approach performs well with just 5% labeled data, achieving competitive performance against models trained with full datasets. In particular, our LDM achieves 44.85% performance with limited labeled data, increasing to 54.27% when using unlabeled data, while models trained with full datasets reach 60.68% on the DriveLM benchmark.
|
|
10:20-10:25, Paper TuAT23.6 | Add to My Program |
CAFE-AD: Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving |
|
Zhang, Junrui | University of Science & Technology of China |
Wang, Chenjie | Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
Peng, Jie | University of Science and Technology of China |
Li, Haoyu | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yu | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Imitation learning based planning tasks on the nuPlan dataset have gained great interest due to their potential to generate human-like driving behaviors. However, open-loop training on the nuPlan dataset tends to cause causal confusion during closed-loop testing, and the dataset also presents a long-tail distribution of scenarios. These issues introduce challenges for imitation learning. To tackle these problems, we introduce CAFE-AD, a Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving method, designed to enhance feature representation across various scenario types. We develop an adaptive feature pruning module that ranks feature importance to capture the most relevant information while reducing the interference of noisy information during training. Moreover, we propose a cross-scenario feature interpolation module that enhances scenario information to introduce diversity, enabling the network to alleviate over-fitting in dominant scenarios. We evaluate our method CAFE-AD, on the challenging public nuPlan Test14-Hard closed-loop simulation benchmark. The results demonstrate that CAFE-AD outperforms state-of-the-art methods including rule-based and hybrid planners, and exhibits the potential in mitigating the impact of long-tail distribution within the dataset. Additionally, we further validate its effectiveness in real-world environments. The code and models will be made available at https://github.com/AlniyatRui/CAFE-AD.
|
|
10:25-10:30, Paper TuAT23.7 | Add to My Program |
Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving |
|
Schofield, Hunter | York University |
Elmahgiubi, Mohammed | Huawei Technologies Inc |
Rezaee, Kasra | Huawei Technologies |
Shan, Jinjun | York University |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, Motion and Path Planning
Abstract: World models have become increasingly popular in acting as learned traffic simulators. Recent work has explored replacing traditional traffic simulators with world models for policy training. In this work, we explore the robustness of existing metrics to evaluate world models as traffic simulators to see if the same metrics are suitable for evaluating a world model as a pseudo-environment for policy training. Specifically, we analyze the metametric employed by the Waymo Open Sim-Agents Challenge (WOSAC) and compare world model predictions on standard scenarios where the agents are fully or partially controlled by the world model (partial replay). Furthermore, since we are interested in evaluating the ego action-conditioned world model, we extend the standard WOSAC evaluation domain to include agents that are causal to the ego vehicle. Our evaluations reveal a significant number of scenarios where top-ranking models perform well under no perturbation but fail when the ego agent is forced to replay the original trajectory. To address these cases, we propose new metrics to highlight the sensitivity of world models to uncontrollable objects and evaluate the performance of world models as pseudo-environments for policy training and analyze some state-of-the-art world models under these new metrics.
|
|
TuAT24 Regular Session, 401 |
Add to My Program |
Testing and Validation |
|
|
Chair: Hollis, Ralph | Carnegie Mellon University |
Co-Chair: Heckman, Christoffer | University of Colorado at Boulder |
|
09:55-10:00, Paper TuAT24.1 | Add to My Program |
Enhancing Repeatability and Reliability of Accelerated Risk Assessment in Robot Testing |
|
Capito, Linda | Transportation Research Center Inc. C/o NHTSA |
Castillo, Guillermo A. | The Ohio State University |
Weng, Bowen | Iowa State University |
Keywords: Probability and Statistical Methods, Performance Evaluation and Benchmarking, Legged Robots
Abstract: Risk assessment of a robot in controlled environments, such as laboratories and proving grounds, is a common means to assess, certify, validate, verify, and characterize the robots' safety performance before, during, and even after their commercialization in the real-world. A standard testing program that acquires the risk estimate is expected to be (i) repeatable, such that it obtains similar risk assessments of the same testing subject among multiple trials or attempts with the similar testing effort by different stakeholders, and (ii) reliable against a variety of testing subjects produced by different vendors and manufacturers. Both repeatability and reliability are fundamental and crucial for a testing algorithm's validity, fairness, and practical feasibility, especially for standardization. However, these properties are rarely satisfied or ensured, especially as the subject robots become more complex, uncertain, and varied. This issue was present in traditional risk assessments through Monte-Carlo sampling, and remains a bottleneck for the recent accelerated risk assessment methods, primarily those using importance sampling. This study aims to enhance existing accelerated testing frameworks by proposing a new algorithm that provably integrates repeatability and reliability with the already established formality and efficiency. It also features demonstrations assessing the risk of instability from frontal impacts, initiated by push-over disturbances on a controlled inverted pendulum and a 7-DoF planar bipedal robot Rabbit managed by various control algorithms.
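For context, the importance-sampling risk estimate that such accelerated testing builds on has the generic form below (background material, not the repeatability/reliability algorithm proposed in the paper):

\[
\hat{P}_{\mathrm{fail}} \;=\; \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\{\text{run } x_i \text{ fails}\}\,\frac{p(x_i)}{q(x_i)},
\qquad x_i \sim q,
\]

where \(p\) is the nominal distribution over test conditions and \(q\) is a proposal that over-samples risky conditions; the variance of this estimator across repeated test campaigns is what makes repeatability nontrivial.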
|
|
10:00-10:05, Paper TuAT24.2 | Add to My Program |
Learning-Based Bayesian Inference for Testing of Autonomous Systems |
|
Parashar, Anjali | MIT |
Yin, Ji | Georgia Institute of Technology |
Dawson, Charles | MIT |
Tsiotras, Panagiotis | Georgia Tech |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Robot Safety, Hybrid Logical/Dynamical Planning and Verification
Abstract: For the safe operation of robotic systems, it is important to accurately understand their failure modes using prior testing. Hardware testing of robotic infrastructure is known to be slow and costly. Instead, failure prediction in simulation can help to analyze the system before deployment. Conventionally, large-scale naive Monte Carlo simulations are used for testing; however, this method is only suitable for testing average system performance. For safety-critical systems, worst-case performance is more crucial as failures are often rare events, and the size of test batches increases substantially as failures become more rare. Rare-event sampling methods can be helpful; however, they exhibit slow convergence and cannot handle constraints. This research introduces a novel sampling-based testing framework for autonomous systems which bridges these gaps by utilizing a discretized gradient-based second-order Langevin algorithm combined with learning-based techniques for constrained sampling of failure modes. Our method can predict more diverse failures by exploring the search space efficiently and ensures feasibility with respect to temporal and implicit constraints. We demonstrate the use of our testing methodology on two categories of testing problems, via simulations and hardware experiments. Our method discovers up to 2X as many failures as naive Random Walk sampling, with only half of the sample size.
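As background for the sampler mentioned above, a discretized second-order (underdamped) Langevin update over a scenario parameter x with potential U (e.g. low U near failures) takes the generic form below; the constraint handling and learned components of the paper are not reproduced here:

\[
v_{k+1} \;=\; (1-\gamma\eta)\,v_k \;-\; \eta\,\nabla_x U(x_k) \;+\; \sqrt{2\gamma\eta}\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I),
\]
\[
x_{k+1} \;=\; x_k + \eta\, v_{k+1},
\]

with step size \(\eta\) and friction coefficient \(\gamma\).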
|
|
10:05-10:10, Paper TuAT24.3 | Add to My Program |
Foundation Models for Rapid Autonomy Validation |
|
Farid, Alec | Princeton |
Schleede, Peter | Zoox |
Huang, Aaron | Zoox |
Heckman, Christoffer | University of Colorado at Boulder |
Keywords: Performance Evaluation and Benchmarking, Deep Learning Methods, Representation Learning
Abstract: We are motivated by the problem of autonomous vehicle performance validation. A key challenge is that an autonomous vehicle requires testing in every kind of driving scenario it could encounter, including rare events, to provide a strong case for safety and show there is no edge-case pathological behavior. Autonomous vehicle companies rely on potentially millions of miles driven in realistic simulation to expose the driving stack to enough miles to estimate rates and severity of collisions. To address scalability and coverage, we propose the use of a behavior foundation model, specifically a masked autoencoder (MAE), trained to reconstruct driving scenarios. We leverage the foundation model in two complementary ways: we (i) use the learned embedding space to group qualitatively similar scenarios together and (ii) fine-tune the model to label scenario difficulty based on the likelihood of a collision upon simulation. We use the difficulty scoring as importance weighting for the groups of scenarios. The result is an approach which can more rapidly estimate the rates and severity of collisions by prioritizing hard scenarios while ensuring exposure to every kind of driving scenario.
|
|
10:10-10:15, Paper TuAT24.4 | Add to My Program |
The Mini Wheelbot: A Testbed for Learning-Based Balancing, Flips, and Articulated Driving |
|
Hose, Henrik | Institute for Data Science in Mechanical Engineering (DSME), RWTH Aachen University |
Weisgerber, Jan Luca | RWTH Aachen |
Trimpe, Sebastian | RWTH Aachen University |
Keywords: Wheeled Robots, Underactuated Robots, Machine Learning for Robot Control
Abstract: The Mini Wheelbot is a balancing, reaction wheel unicycle robot designed as a testbed for learning-based control. It is an unstable system with highly nonlinear yaw dynamics, non-holonomic driving, and discrete contact switches in a small, powerful, and rugged form factor. The Mini Wheelbot can use its wheels to stand up from any initial orientation - enabling automatic environment resets in repetitive experiments and even challenging half flips. We illustrate the effectiveness of the Mini Wheelbot as a testbed by implementing two popular learning-based control algorithms. First, we showcase Bayesian optimization for tuning the balancing controller. Second, we use imitation learning from an expert nonlinear MPC that uses gyroscopic effects to reorient the robot and can track higher-level velocity and orientation commands. The latter allows the robot to drive around based on user commands - for the first time in this class of robots. The Mini Wheelbot is not only compelling for testing learning-based control algorithms, but it is also just fun to work with, as demonstrated in the video of our experiments.
|
|
10:15-10:20, Paper TuAT24.5 | Add to My Program |
The Impact of Sensor Faults on Connected Autonomous Vehicle Localization |
|
Kuwada, Shinsaku | Illinois Institute of Technology |
Joerger, Mathieu | Virginia Tech |
Spenko, Matthew | Illinois Institute of Technology |
Keywords: Probability and Statistical Methods, Localization, Multi-Robot Systems
Abstract: Connected autonomous vehicles (CAVs) can provide benefits over individual vehicles for precise navigation, especially in GNSS-denied environments. CAV collaboration can enhance estimation accuracy, but the safety of collaborative localization in the presence of undetected sensor faults remains underexplored. This paper introduces an integrity monitoring method for CAV collaborative localization in both centralized and decentralized implementations. Fault models for landmark and relative measurements are described, and the probability of hazardous misleading information, or integrity risk, is derived. Simulation and experimental results for notional two-CAV scenarios indicate that collaborative localization reduces integrity risk and enhances navigation safety.
|
|
10:20-10:25, Paper TuAT24.6 | Add to My Program |
Realistic Extreme Behavior Generation for Improved AV Testing |
|
Dyro, Robert | Stanford University |
Foutter, Matthew | Stanford University |
Li, Ruolin | Stanford |
Di Lillo, Luigi | Swiss Reinsurance Company, Ltd; Autonomous Systems Lab, Stanford |
Schmerling, Edward | Stanford University |
Zhou, Xilin | Swiss Re |
Pavone, Marco | Stanford University |
Keywords: Performance Evaluation and Benchmarking, Optimization and Optimal Control
Abstract: This work introduces a framework to diagnose the strengths and shortcomings of Autonomous Vehicle (AV) collision avoidance technology with synthetic yet realistic potential collision scenarios adapted from real-world, collision-free data. Our framework generates counterfactual collisions with diverse crash properties, e.g., crash angle and velocity, between an adversary and a target vehicle by adding perturbations to the adversary's predicted trajectory from a learned AV behavior model. Our main contribution is to ground these adversarial perturbations in realistic behavior as defined through the lens of data-alignment in the behavior model's parameter space. Then, we cluster these synthetic counterfactuals to identify plausible and representative collision scenarios to form the basis of a test suite for downstream AV system evaluation. We demonstrate our framework using two state-of-the-art behavior prediction models as sources of realistic adversarial perturbations, and show that our scenario clustering evokes interpretable failure modes from a baseline AV policy under evaluation.
|
|
10:25-10:30, Paper TuAT24.7 | Add to My Program |
Limits of Specifiability for Sensor-Based Robotic Planning Tasks |
|
Sakcak, Basak | University of Oulu |
Shell, Dylan | Texas A&M University |
O'Kane, Jason | Texas A&M University |
Keywords: Formal Methods in Robotics and Automation, Reactive and Sensor-Based Planning, Task and Motion Planning
Abstract: There is now a large body of techniques, many based on formal methods, for describing and realizing complex robotics tasks, including those involving a variety of rich goals and time-extended behavior. This paper explores the limits of what sorts of tasks are specifiable, examining how the precise grounding of specifications, that is, whether the specification is given in terms of the robot's states, its actions and observations, its knowledge, or some other information, is crucial to whether a given task can be specified. While prior work included some description of particular choices for this grounding, our contribution treats this aspect as a first-class citizen: we introduce notation to deal with a large class of problems, and examine how the grounding affects what tasks can be posed. The results demonstrate that certain classes of tasks are specifiable under different combinations of groundings.
|
|
TuBT1 Regular Session, 302 |
Add to My Program |
Award Finalists 2 |
|
|
Chair: Smart, William D. | Oregon State University |
Co-Chair: Asada, Harry | MIT |
|
11:15-11:20, Paper TuBT1.1 | Add to My Program |
Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition |
|
Luo, Shengcheng | Shanghai Jiao Tong University |
Peng, Quanquan | Shanghai Jiao Tong University |
Lv, Jun | Shanghai Jiao Tong University |
Hong, Kaiwen | University of Illinois at Urbana Champaign |
Driggs-Campbell, Katherine | University of Illinois at Urbana-Champaign |
Lu, Cewu | Shanghai Jiao Tong University |
Li, Yong-Lu | Shanghai Jiao Tong University |
Keywords: AI-Based Methods, Deep Learning in Grasping and Manipulation, Human-Robot Collaboration
Abstract: Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper presents inherent challenges due to the task's high dimensionality, complexity of motion, and differences between physiological structures. In this study, we introduce a novel system for joint learning between human operators and robots that enables human operators to share control of a robot end-effector with a learned assistive agent, simplifying the data collection process and facilitating simultaneous human demonstration collection and robot manipulation training. As data accumulates, the assistive agent gradually learns. Consequently, less human effort and attention are required, enhancing the efficiency of the data collection process. It also allows the human operator to adjust the control ratio to achieve a trade-off between manual and automated control. We conducted experiments in both simulated environments and physical real-world settings. Through user studies and quantitative evaluations, it is evident that the proposed system could enhance data collection efficiency and reduce the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks. For more details, please refer to our project page: https://norweig1an.github.io/HAJL.github.io/.
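The control sharing described above can be pictured with a simple blending rule (an illustrative sketch; the function, the linear blend, and the meaning of the ratio are our assumptions rather than the authors' implementation):

import numpy as np

def blended_command(human_cmd, agent_cmd, control_ratio):
    # control_ratio in [0, 1]: 1.0 -> fully manual teleoperation, 0.0 -> fully autonomous assistance
    human_cmd = np.asarray(human_cmd, dtype=float)
    agent_cmd = np.asarray(agent_cmd, dtype=float)
    return control_ratio * human_cmd + (1.0 - control_ratio) * agent_cmd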
|
|
11:20-11:25, Paper TuBT1.2 | Add to My Program |
To Ask or Not to Ask: Human-In-The-Loop Contextual Bandits with Applications in Robot-Assisted Feeding |
|
Banerjee, Rohan | Cornell University |
Jenamani, Rajat Kumar | Cornell University |
Vasudev, Sidharth | Cornell University |
Nanavati, Amal | University of Washington |
Dimitropoulou, Katherine | Columbia University |
Dean, Sarah | Cornell University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Human Factors and Human-in-the-Loop, Reinforcement Learning, Physically Assistive Devices
Abstract: Robot-assisted bite acquisition involves picking up food items with varying shapes, compliance, sizes, and textures. Fully autonomous strategies may not generalize efficiently across this diversity. We propose leveraging feedback from the care recipient when encountering novel food items. However, frequent queries impose a workload on the user. We formulate human-in-the-loop bite acquisition within a contextual bandit framework and introduce LinUCB-QG, a method that selectively asks for help using a predictive model of querying workload based on query types and timings. This model is trained on data collected in an online study involving 14 participants with mobility limitations, 3 occupational therapists simulating physical limitations, and 89 participants without limitations. We demonstrate that our method better balances task performance and querying workload compared to autonomous and always-querying baselines and adjusts its querying behavior to account for higher workload in users with mobility limitations. This is validated through experiments in a simulated food dataset and a user study with 19 participants, including one with severe mobility limitations. Please check out our project website at: https://emprise.cs.cornell.edu/hilbiteacquisition/.
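For reference, the base LinUCB rule that LinUCB-QG extends selects the action with the highest upper confidence bound on expected reward. The sketch below is the standard algorithm only (the querying-workload model and query-gating logic of the paper are not shown), and the names are our own:

import numpy as np

def linucb_select(arm_features, A, b, alpha=1.0):
    # arm_features: dict arm -> context feature vector x for this round
    # A[arm] = I + sum of x x^T, b[arm] = sum of r * x observed so far for that arm
    best_arm, best_ucb = None, -np.inf
    for arm, x in arm_features.items():
        A_inv = np.linalg.inv(A[arm])
        theta = A_inv @ b[arm]                              # ridge-regression reward estimate
        ucb = theta @ x + alpha * np.sqrt(x @ A_inv @ x)    # mean + exploration bonus
        if ucb > best_ucb:
            best_arm, best_ucb = arm, ucb
    return best_arm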
|
|
11:25-11:30, Paper TuBT1.3 | Add to My Program |
Point and Go: Intuitive Reference Frame Reallocation in Mode Switching for Assistive Robotics |
|
Wang, Allie | University of Alberta |
Jiang, Chen | University of Alberta |
Przystupa, Michael | University of Alberta |
Valentine, Justin | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Rehabilitation Robotics, Kinematics, Physically Assistive Devices
Abstract: Operating high degree of freedom robots can be difficult for users of wheelchair mounted robotic manipulators. Mode switching in Cartesian space has several drawbacks such as unintuitive control reference frames, separate translation and orientation control, and limited movement capabilities that hinder performance. We propose Point and Go mode switching, which reallocates the Cartesian mode switching reference frames into a more intuitive action space comprised of new translation and rotation modes. We use a novel sweeping motion to point the gripper, which defines the new translation axis along the robot base frame's horizontal plane. This creates an intuitive 'point and go' translation mode that allows the user to easily perform complex, human-like movements without switching control modes. The system's rotation mode combines position control with a refined end-effector oriented frame that provides precise and consistent robot actions in various end-effector poses. We verified its effectiveness through initial experiments, followed by a three-task user study that compared our method to Cartesian mode switching and a state of the art learning method. Results show that Point and Go mode switching reduced completion times by 31%, pauses by 41%, and mode switches by 33%, while receiving significantly favorable responses in user surveys.
|
|
11:30-11:35, Paper TuBT1.4 | Add to My Program |
RoboCrowd: Scaling Robot Data Collection through Crowdsourcing |
|
Mirchandani, Suvir | Stanford University |
Yuan, David D. | Stanford University |
Burns, Kaylee | Stanford University |
Islam, Md Sazzad | Stanford University |
Zhao, Zihao | Stanford University |
Finn, Chelsea | Stanford University |
Sadigh, Dorsa | Stanford University |
Keywords: Telerobotics and Teleoperation, Data Sets for Robot Learning, Human Factors and Human-in-the-Loop
Abstract: In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizing crowdsourcing principles and incentive design. RoboCrowd helps enable scalable data collection and facilitates more efficient learning of robot policies. We build RoboCrowd on top of ALOHA (Zhao et al. 2023)---a bimanual platform that supports data collection via puppeteering---to explore the design space for crowdsourcing in-person demonstrations in a public environment. We propose three classes of incentive mechanisms to appeal to users' varying sources of motivation for interacting with the system: material rewards, intrinsic interest, and social comparison. We instantiate these incentives through tasks that include physical rewards, engaging or challenging manipulations, as well as gamification elements such as a leaderboard. We conduct a large-scale, two-week field experiment in which the platform is situated in a university cafe. We observe significant engagement with the system---over 200 individuals independently volunteered to provide a total of over 800 interaction episodes. Our findings validate the proposed incentives as mechanisms for shaping users' data quantity and quality. Further, we demonstrate that the crowdsourced data can serve as useful pre-training data for policies fine-tuned on expert demonstrations---boosting performance up to 20% compared to when this data is not available. These results suggest the potential for RoboCrowd to reduce the burden of robot data collection by carefully implementing crowdsourcing and incentive design principles. Videos are available at https://robocrowd.github.io.
|
|
11:35-11:40, Paper TuBT1.5 | Add to My Program |
How Sound-Based Robot Communication Impacts Perceptions of Robotic Failure |
|
Crider, Jai'La Lee | Oregon State University |
Preston, Rhian | Oregon State University |
Fitter, Naomi T. | Oregon State University |
Keywords: Social HRI, Robot Companions, Natural Dialog for HRI
Abstract: One challenge in human-robot interaction is selecting communication methods that fit a given robotic system and avoid overpromising. For example, verbal speech provides a clear and easy-to-understand communication method, but can inflate expectations of robot abilities. Is verbal speech the ultimate option? Might other tactics provide similar advantages with fewer downsides? The presented work focuses on addressing these important questions by 1) quantifying any inflated opinions of robots that use verbal speech and 2) gathering perspectives on alternative nonverbal sound-based communication tactics (as a means to potentially shrink gaps between expected and actual robot performance). We conducted a within-subjects online study that varied robot communication modes in videos of successful and unsuccessful mock tasks by a modern commercial robot. Assessments of robot competence and trust after an observed robot failure were higher for verbal robots, but we observed less decline in competence and trust ratings due to the failure for a nonverbal robot using character-like sound (compared to a robot using verbal communication). Human-robot interaction practitioners can use our results to design effective and robust communication strategies for robots.
|
|
11:40-11:45, Paper TuBT1.6 | Add to My Program |
Obstacle-Avoidant Leader Following with a Quadruped Robot |
|
Scheidemann, Carmen | ETH Zurich |
Werner, Lennart | ETH Zürich |
Reijgwart, Victor | ETH Zurich |
Cramariuc, Andrei | ETHZ |
Chomarat, Joris | ETH Zurich |
Chiu, Jia-Ruei | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Human-Centered Robotics, Human Detection and Tracking, Legged Robots
Abstract: Personal mobile robotic assistants are expected to find wide applications in industry and healthcare. For example, people with limited mobility can benefit from robots helping with daily tasks, or construction workers can have robots perform precision monitoring tasks on-site. However, manually steering a robot while in motion requires significant concentration from the operator, especially in tight or crowded spaces. This reduces walking speed, and the constant need for vigilance increases fatigue and, thus, the risk of accidents. This work presents a virtual leash with which a robot can naturally follow an operator. We use a sensor fusion based on a custom-built RF transponder, RGB cameras, and a LiDAR. In addition, we customize a local avoidance planner for legged platforms, which enables us to navigate dynamic and narrow environments. We successfully validate on the ANYmal platform the robustness and performance of our entire pipeline in real-world experiments.
|
|
TuBT2 Regular Session, 301 |
Add to My Program |
Transfer and Continual Learning |
|
|
Chair: Gupta, Abhishek | University of Washington |
Co-Chair: Nemlekar, Heramb | Virginia Tech |
|
11:15-11:20, Paper TuBT2.1 | Add to My Program |
Semantic Cross-Pose Correspondence from a Single Example |
|
Hadjivelichkov, Denis | University College London |
Zwane, Sicelukwanda Njabuliso Tunner | University College London |
Deisenroth, Marc Peter | University College London |
Agapito, Lourdes | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Representation Learning, Transfer Learning, Learning from Demonstration
Abstract: This article focuses on predicting how an object can be transformed to a semantically meaningful pose relative to another object, given only one or a few examples. Current pose correspondence methods rely on vast 3D object datasets and do not actively consider semantic information, which limits the objects to which they can be applied. We present a novel method for learning cross-object pose correspondence. The proposed method detects interacting object parts, performs one-shot part correspondence, and uses geometric and visual-semantic features. Given one example of two objects posed relative to each other, the model can learn how to transfer the demonstrated relations to unseen object instances.
|
|
11:20-11:25, Paper TuBT2.2 | Add to My Program |
H2O+: An Improved Framework for Hybrid Offline-And-Online RL with Dynamics Gaps |
|
Niu, Haoyi | Tsinghua University |
Ji, Tianying | Tsinghua University |
Bingqi, Liu | Beihang University |
Zhao, Haocheng | Tsinghua University |
Zhu, Xiangyu | Tsinghua University |
Zheng, Jianying | Beihang University |
Huang, Pengfei | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Hu, Jianming | Tsinghua University |
Zhan, Xianyuan | Tsinghua University |
Keywords: Transfer Learning, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches, although they bypass the need for simulators, often impose demanding requirements on the size and quality of the offline datasets. The recently emerged hybrid offline-and-online RL provides an attractive framework that enables joint use of limited offline data and an imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environments. Through extensive simulation and real-world robotics experiments, we demonstrate the superior performance and flexibility of H2O+ over advanced cross-domain online and offline RL algorithms.
|
|
11:25-11:30, Paper TuBT2.3 | Add to My Program |
M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning |
|
Roy, Kaushik | CSIRO |
Dissanayake, Akila | Commonwealth Scientific and Industrial Research Organisation |
Tidd, Brendan | CSIRO |
Moghadam, Peyman | CSIRO |
Keywords: Continual Learning, Incremental Learning, Imitation Learning
Abstract: Lifelong imitation learning for manipulation tasks poses significant challenges due to distribution shifts that occur in incremental learning steps. Existing methods often focus on unsupervised skill discovery to construct an ever-growing skill library or distillation from multiple policies, which can lead to scalability issues as diverse manipulation tasks are continually introduced and may fail to ensure a consistent latent space throughout the learning process, leading to catastrophic forgetting of previously learned skills. In this paper, we introduce M2Distill, a multi-modal distillation-based method for lifelong imitation learning focusing on preserving consistent latent space across vision, language, and action distributions throughout the learning process. By regulating the shifts in latent representations across different modalities from previous to current steps, and reducing discrepancies in Gaussian Mixture Model (GMM) policies between consecutive learning steps, we ensure that the learned policy retains its ability to perform previously learned tasks while seamlessly integrating new skills. Extensive evaluations on the LIBERO lifelong imitation learning benchmark suites, including LIBERO-OBJECT, LIBERO-GOAL, and LIBERO-SPATIAL, demonstrate that our method consistently outperforms prior state-of-the-art methods across all evaluated metrics.
|
|
11:30-11:35, Paper TuBT2.4 | Add to My Program |
Expert-Enhanced Masked Point Modeling for Point Cloud Self-Supervised Learning |
|
Liu, Yujun | Tsinghua University |
Zha, Yaohua | Tsinghua University |
Li, Naiqi | Tsinghua University |
Tao, Dai | Shenzhen University |
Chen, Bin | Harbin Institute of Technology, Shenzhen |
Xia, Shu-Tao | Tsinghua University |
Keywords: Transfer Learning, Object Detection, Segmentation and Categorization, Deep Learning Methods
Abstract: Recently, learning-based point cloud analysis has played a crucial role in robotic perception. Masked Point Modeling (MPM), owing to its powerful representational capabilities, has become the mainstream point cloud self-supervised learning method. However, existing MPM-based methods often suffer from the problem of negative transfer, due to the disparity in semantic distribution between upstream and downstream data. To address this issue, we propose an expert enhancement strategy for existing MPM-based methods. Specifically, we insert a Sparse Mixture of Experts (SMoE) layer after each block of the backbone network, which utilizes a multi-branch expert architecture with routers that allocate data of different semantics to the appropriate experts for analysis. During the pre-training phase, our expert-enhanced model not only learns universal 3D representations for the backbone network but also acquires powerful semantic routing capabilities for all expert layers. In the fine-tuning phase, we freeze all backbones and conduct end-to-end fine-tuning solely on our expert layers to adaptively select the experts most relevant to the semantics of each downstream sample. Extensive downstream experiments demonstrate the superiority of our method, which outperforms the baseline (Point-MAE) by 5.16%, 5.86%, and 4.62% on three variants of ScanObjectNN while utilizing only 12% of its trainable parameters. Our code is released at https://github.com/chenchen1104/point_e2mae.
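The sketch below shows a generic top-k sparse mixture-of-experts layer of the kind described, inserted after a backbone block with a residual connection; the dimensions, expert structure, and routing details are assumptions, not the released code.

```python
# Generic top-k Sparse Mixture-of-Experts layer (illustrative).
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim, n_experts=8, k=2, hidden=256):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)  # routing probabilities
        topv, topi = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # send each token to its top-k experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return x + out                           # residual connection around the MoE
```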
|
|
11:35-11:40, Paper TuBT2.5 | Add to My Program |
3D Dense Captioning Via Prototypical Momentum Distillation |
|
Mi, Jinpeng | USST |
Wang, Ying | University of Shanghai for Science and Technology |
Jin, Shaofei | University of Shanghai for Science and Technology |
Zhang, Shiming | University of Shanghai for Science and Technology |
Wei, Xian | East China Normal University |
Zhang, Jianwei | Hamburg University |
Keywords: Transfer Learning, Deep Learning for Visual Perception
Abstract: 3D dense captioning aims to describe the crucial regions in 3D visual scenes in the form of natural language. Recent prevailing approaches achieve promising results by leveraging complicated structures incorporated with large-scale models, which require abundant parameters and pose challenges for practical application. Besides, with limited training data, 3D dense captioners are often susceptible to overfitting, directly degrading caption generation performance. Drawing inspiration from the recent advancements in knowledge distillation, we propose a novel approach termed Prototypical Momentum Distillation (PMD) to prompt the model to generate more detailed captions. PMD incorporates Momentum Distillation (MD) with an Uncertainty-aware Prototype-anchored Clustering (UPC) strategy to transfer knowledge while considering the uncertainty of the teacher knowledge. Specifically, we employ the original captioner as the student model and maintain an Exponential Moving Average (EMA) copy of the captioner as the teacher model to impart knowledge as auxiliary supervision of the student. To mitigate the misleading effect of uncertain knowledge, the UPC strategy clusters the distilled knowledge according to its confidence. We then transfer the rearranged knowledge from the teacher to guide the training route of the student. We conduct extensive experiments and ablation studies on two widely used benchmark datasets, ScanRefer and Nr3D. Experimental results demonstrate that PMD outperforms all state-of-the-art approaches on the benchmarks with MLE training, highlighting its effectiveness.
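The EMA teacher update that momentum distillation relies on is a standard recipe; a generic sketch follows, with parameter names assumed for illustration rather than taken from the PMD code.

```python
# Generic exponential-moving-average teacher update for momentum distillation.
import torch

@torch.no_grad()
def update_ema_teacher(student, teacher, momentum=0.999):
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```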
|
|
11:40-11:45, Paper TuBT2.6 | Add to My Program |
DUOLINGO: Dynamics Utilization for Online Translation of Actions |
|
Vemuri, Karthikeya | University of Washington |
Wu, Alan | University of Washington |
Thareja, Arnav | University of Washington |
Chen, Zoey | University of Washington |
Good, Ian | University of Washington |
Lipton, Jeffrey | Northeastern University |
Gupta, Abhishek | University of Washington |
Keywords: Continual Learning, Transfer Learning, Robust/Adaptive Control
Abstract: Robots in the real world experience wear and tear, leading to changing system dynamics. This challenge is particularly exacerbated for non-rigid systems such as soft robots or robotic systems made of meta-materials with hysteresis. This setting results in a challenging problem for most learning-based controllers, which typically rely on the assumption that the system dynamics remain fixed over time. In the absence of explicit mechanisms to account for this change in dynamics, learning-based control algorithms show considerable degradation in performance over time. In this work, we consider a particular class of dynamics shift in under-actuated systems that is localized to the dynamics of the fully actuated robot itself while leaving the dynamics of the environment unchanged. This captures real-world phenomena such as fatigue or hysteresis in robotic systems. In this setting, we propose an efficient algorithm that can account for dynamics shift. Using a simple calibration procedure, we propose a technique for learning a non-linear "action-translation" model that can capture the localized shift in dynamics. This enables continual learning and transfer despite considerable dynamics shift during the learning process. We demonstrate the efficacy of this procedure on several tasks in simulation, as well as a real-world robotic system - a 4 DoF electrically driven handed shearing auxetic (HSA) platform.
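A heavily simplified sketch of what an action-translation model can look like: from a short calibration run, fit a mapping that converts actions valid under the original dynamics into equivalent actions under the worn dynamics. The polynomial least-squares form and all names below are assumptions for illustration, not the paper's model.

```python
# Sketch: fit an action-translation map from paired calibration actions.
import numpy as np

def fit_action_translation(a_nominal, a_equivalent, degree=2):
    """a_nominal: (N, d) actions under the original dynamics;
    a_equivalent: (N, d) actions found via calibration to have the same effect now."""
    def features(a):
        return np.hstack([a ** p for p in range(1, degree + 1)]
                         + [np.ones((len(a), 1))])
    W, *_ = np.linalg.lstsq(features(a_nominal), a_equivalent, rcond=None)
    return lambda a: features(a) @ W   # translate nominal actions online
```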
|
|
TuBT3 Regular Session, 303 |
Add to My Program |
Field Robotics: Forestry and Mining |
|
|
Chair: Vu, Minh Nhat | TU Wien, Austria |
Co-Chair: Sharf, Inna | McGill University |
|
11:15-11:20, Paper TuBT3.1 | Add to My Program |
DigiForests: A Longitudinal LiDAR Dataset for Forestry Robotics |
|
Malladi, Meher Venkata Ramakrishna | University of Bonn |
Chebrolu, Nived | University of Oxford |
Scacchetti, Irene | University of Bonn |
Lobefaro, Luca | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Casseau, Benoit | University of Oxford |
Oh, Haedam | University of Oxford |
Freißmuth, Leonard | Technical University Munich |
Karppinen, Markus | PreFor Ltd |
Schweier, Janine | Swiss Federal Institute for Forest, Snow and Landscape Research |
Leutenegger, Stefan | Technical University of Munich |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Agriculture and Forestry, Data Sets for Robotic Vision
Abstract: Forests are vital to our ecosystems, acting as carbon sinks, climate stabilizers, biodiversity centers, and wood sources. Due to their scale, monitoring and managing forests is labor-intensive. Forestry robotics offers the potential for enabling efficient and sustainable forestry practices through automation. Despite increasing interest in this field, the scarcity of robotics datasets and benchmarks in forest environments is hampering progress in this domain. In this paper, we present a real-world, longitudinal dataset for forestry robotics that enables the development and comparison of approaches for various relevant applications, ranging from semantic interpretation to estimating traits relevant to forestry management. The dataset consists of multiple recordings of the same plots in a forest in Switzerland during three different growth periods. We recorded the data with a mobile 3D LiDAR scanning setup. Additionally, we provide semantic annotations of trees, shrubs, and ground, instance-level annotations of trees, as well as more fine-grained annotations of tree stems and crowns. Furthermore, we provide reference field measurements of traits relevant to forestry management for a subset of the trees. Together with the data, we also provide open-source baseline panoptic segmentation and tree trait estimation approaches to enable the community to bootstrap further research and simplify comparisons in this domain.
|
|
11:20-11:25, Paper TuBT3.2 | Add to My Program |
Near Time-Optimal Hybrid Motion Planning for Timber Cranes |
|
Ecker, Marc-Philip | TU Wien, Austrian Institute of Technology |
Bischof, Bernhard | Austrian Institute of Technology |
Vu, Minh Nhat | TU Wien, Austria |
Froehlich, Christoph | Austrian Institute of Technology |
Glück, Tobias | AIT Austrian Institute of Technology GmbH |
Kemmetmueller, Wolfgang | TU Wien |
Keywords: Robotics and Automation in Agriculture and Forestry, Motion and Path Planning
Abstract: Efficient, collision-free motion planning is essential for automating large-scale manipulators like timber cranes. They come with unique challenges such as hydraulic actuation constraints and passive joints—factors that are seldom addressed by current motion planning methods. This paper introduces a novel approach for time-optimal, collision-free hybrid motion planning for a hydraulically actuated timber crane with passive joints. We enhance the via-point-based stochastic trajectory optimization (VP-STO) algorithm to include pump flow rate constraints and develop a novel collision cost formulation to improve robustness. The effectiveness of the enhanced VP-STO as an optimal single-query global planner is validated by comparison with an informed RRT* algorithm using a time-optimal path parameterization (TOPP). The overall hybrid motion planner combines this global planner with a gradient-based local planner that follows the global reference and systematically accounts for the passive joint dynamics in both collision avoidance and sway damping.
|
|
11:25-11:30, Paper TuBT3.3 | Add to My Program |
An Ultra-Light Seedling Planting Mechanism for Use in Aerial Reforestation |
|
Lloyd, Steffan | Norwegian Institute of Bioeconomy Research (NIBIO) |
Astrup, Rasmus | Norwegian Institute for Bioeconomy Research (NIBIO) |
Keywords: Robotics and Automation in Agriculture and Forestry, Aerial Systems: Applications, Mechanism Design
Abstract: This article presents a novel, ultralight tree planting mechanism for use on an aerial vehicle. Current tree planting operations are typically performed manually, and existing automated solutions use large land-based vehicles or excavators which cause significant site damage and are limited to open, clear-cut plots. Our device uses a high-pressure compressed air power system and a novel double-telescoping design to achieve a weight of only 8 kg: well within the payload capacity of medium to large drones. This article describes the functionality and key components of the device and validates its feasibility through experimental testing. We propose this mechanism as a cost-effective, highly scalable solution that avoids ground damage, produces minimal emissions, and can operate equally well on open clear-cut sites as in denser, selectively-harvested forests.
|
|
11:30-11:35, Paper TuBT3.4 | Add to My Program |
Towards Autonomous Wood-Log Grasping with a Forestry Crane: Simulator and Benchmarking |
|
Vu, Minh Nhat | TU Wien, Austria |
Wachter, Alexander | TU Wien |
Ebmer, Gerald | TU Wien |
Ecker, Marc-Philip | TU Wien, Austrian Institute of Technology |
Glück, Tobias | AIT Austrian Institute of Technology GmbH |
Nguyen, Anh | University of Liverpool |
Kemmetmueller, Wolfgang | TU Wien |
Kugi, Andreas | TU Wien |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: Forestry machines operated in forest production environments face challenges when performing manipulation tasks, especially regarding the complicated dynamics of underactuated crane systems and the different sizes of logs to be grasped. This study investigates the feasibility of using reinforcement learning for forestry crane manipulators in grasping and lifting a varying-diameter wood log in a simulation environment. The MuJoCo physics engine creates realistic scenarios, including modeling a forestry crane with 8 degrees of freedom from CAD data and wood logs of different sizes. Our results show the successful implementation of a velocity controller for log grasping by deep reinforcement learning using a curriculum strategy. Given the six degree-of-freedom (6-DoF) pose of the wood log, i.e., its 3D Cartesian position and orientation, the proposed control strategy exhibits a success rate of 96% when grasping logs of different diameters and under random initial configurations of the forestry crane. In addition, reward functions and reinforcement learning baselines are investigated to provide an open-source benchmark for the community in large-scale manipulation tasks. A video with several demonstrations can be seen at https://www.acin.tuwien.ac.at/en/d18a/.
|
|
11:35-11:40, Paper TuBT3.5 | Add to My Program |
Designing Experimental Setup Emulating Log-Loader Manipulator and Implementing Anti-Sway Trajectory Planner |
|
Jebellat, Iman | McGill University |
Sideris, George | McGill University |
Saif, Rafid | McGill University |
Sharf, Inna | McGill University |
Keywords: Robotics and Automation in Agriculture and Forestry, Manipulation Planning, Mechanism Design
Abstract: Forestry machines are not easily accessible for experimentation or demonstration of research results. These mobile robots are massive, very expensive, and require a large outdoor space and permits to operate. These factors hinder conducting experiments on real forestry robots. Thus, it is essential to design experimental setups utilizing easily accessible robots in indoor labs that can effectively replicate the behavior of interest of a forestry machine. We design a setup to resemble log-loader crane and grapple motions using a Kinova Jaco2 arm by manufacturing a specialized end-effector that attaches passively to the Jaco2 arm. The passively attached grapple causes undesirable sway, which is problematic and dangerous in forestry operations. To address the sway problem, we employ dynamic programming to develop an anti-sway motion planner and validate its performance for different point-to-point maneuvers in our experimental setup. We also repeat each experiment at least 6 times to ensure the repeatability and reliability of the experiments. The experimental results showcase the excellent sway-damping performance of our planner and also the very good repeatability of our experiments.
|
|
11:40-11:45, Paper TuBT3.6 | Add to My Program |
FRAME: A Modular Framework for Autonomous Map Merging: Advancements in the Field (I) |
|
Stathoulopoulos, Nikolaos | Luleå University of Technology |
Lindqvist, Björn | Luleå University of Technology |
Koval, Anton | Luleå University of Technology |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Multi-Robot Systems, Field Robots
Abstract: In this article, a novel approach for merging 3-D point cloud maps in the context of egocentric multirobot exploration is presented. Unlike traditional methods, the proposed approach leverages state-of-the-art place recognition and learned descriptors to efficiently detect overlap between maps, eliminating the need for the time-consuming global feature extraction and feature matching process. The estimated overlapping regions are used to calculate a homogeneous rigid transform, which serves as an initial condition for the general iterative closest point (GICP) point cloud registration algorithm to refine the alignment between the maps. The advantages of this approach include faster processing time, improved accuracy, and increased robustness in challenging environments. Furthermore, the effectiveness of the proposed framework is successfully demonstrated through multiple field missions of robot exploration in a variety of different underground environments.
|
|
TuBT4 Regular Session, 304 |
Add to My Program |
Vision-Based Tactile Sensors 2 |
|
|
Chair: Moghadam, Peyman | CSIRO |
Co-Chair: Jenkin, Michael | York University |
|
11:15-11:20, Paper TuBT4.1 | Add to My Program |
Evetac: An Event-Based Optical Tactile Sensor for Robotic Manipulation |
|
Funk, Niklas Wilhelm | TU Darmstadt |
Helmut, Erik | Technische Universität Darmstadt |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Calandra, Roberto | TU Dresden |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Deep Learning in Robotics and Automation, Event-based Sensing
Abstract: Optical tactile sensors have recently become popular. They provide high spatial resolution, but struggle to offer fine temporal resolutions. To overcome this shortcoming, we study the idea of replacing the RGB camera with an event-based camera and introduce a new event-based optical tactile sensor called Evetac. Along with hardware design, we develop touch processing algorithms to process its measurements online at 1000 Hz. We devise an efficient algorithm to track the elastomer’s deformation through the imprinted markers despite the sensor’s sparse output. Benchmarking experiments demonstrate Evetac’s capabilities of sensing vibrations up to 498 Hz, reconstructing shear forces, and significantly reducing data rates compared to RGB optical tactile sensors. Moreover, Evetac’s output and the marker tracking provide meaningful features for learning data-driven slip detection and prediction models. The learned models form the basis for a robust and adaptive closed-loop grasp controller capable of handling a wide range of objects. We believe that fast and efficient event-based tactile sensors like Evetac will be essential for bringing human-like manipulation capabilities to robotics.
|
|
11:20-11:25, Paper TuBT4.2 | Add to My Program |
Shape-Space Deformer: Unified Visuo-Tactile Representations for Robotic Manipulation of Deformable Objects |
|
Collins, Sean Michael Varian | CSIRO |
Tidd, Brendan | CSIRO |
Baktashmotlagh, Mahsa | UQ |
Moghadam, Peyman | CSIRO |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Representation Learning
Abstract: Accurate modeling of object deformations is crucial for a wide range of robotic manipulation tasks, where interacting with soft or deformable objects is essential. Current methods struggle to generalize to unseen forces or adapt to new objects, limiting their utility in real-world applications. We propose Shape-Space Deformer, a unified representation for encoding a diverse range of object deformations using template augmentation to achieve robust, fine-grained reconstructions that are resilient to outliers and unwanted artifacts. Our method improves generalization to unseen forces and can rapidly adapt to novel objects, significantly outperforming existing approaches. We perform extensive experiments to test a range of force generalisation settings and evaluate our method's ability to reconstruct unseen deformations, demonstrating significant improvements in reconstruction accuracy and robustness. Our approach is suitable for real-time performance, making it ready for downstream manipulation applications.
|
|
11:25-11:30, Paper TuBT4.3 | Add to My Program |
Depth Estimation through Translucent Surfaces |
|
Dai, Siyu | Amazon |
Lou, Xibai | Amazon.com LLC |
Nilsson, Petter | Amazon |
Thakar, Shantanu | Amazon.com |
Meeker, Cassie | Columbia University |
Gordon, Ariel | Amazon |
Kong, Xiangxin | Amazon |
Zhang, Jenny | Amazon Robotics |
Knoerlein, Benjamin | Amazon |
Ruguan, Liu | Amazon.com |
Chandrashekhar, Bhavana mysore | Amazon |
Karumanchi, Sisir | Amazon |
Keywords: Perception for Grasping and Manipulation, Data Sets for Robotic Vision, Logistics
Abstract: In this paper, we tackle the novel computer vision problem of depth estimation through a translucent barrier. This is an important problem for robotics when manipulating objects through plastic wrapping, or when predicting the depth of items behind a translucent barrier for manipulation. We propose two approaches for providing depth prediction models the ability to see through translucent barriers: removing translucent barriers through image inpainting before passing to standard depth prediction models as input, and directly training depth models with images with translucent barriers. We show that image inpainting allows standard learned monocular and stereo depth estimation models to achieve 3 cm MAE for predicting depth of shelved items behind plastic, whereas training with real images with translucent barriers allows them to achieve centimeter or sub-centimeter MAE. We demonstrate in real robot experiments that depth-aided space estimation allows the robot to place 46% additional items into shelves with translucent barriers. This paper also provides a publicly available dataset of objects occluded by translucent barriers in a tabletop environment and a shelf environment which will allow others to contribute to this novel problem that's critical for many robotic manipulation applications including suction gripping and item packing.
|
|
11:30-11:35, Paper TuBT4.4 | Add to My Program |
Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor |
|
Ablett, Trevor | University of Toronto |
Limoyo, Oliver | University of Toronto |
Sigal, Adam | McGill University |
Jilani, Affan | McGill University |
Kelly, Jonathan | University of Toronto |
Siddiqi, Kaleem | McGill University |
Hogan, Francois | Massachusetts Institute of Technology |
Dudek, Gregory | McGill University |
Keywords: Force and Tactile Sensing, Learning from Demonstration, Deep Learning in Robotics and Automation, Imitation Learning
Abstract: Contact-rich tasks continue to present many challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (e.g., slipping and sliding) between the end-effector and the manipulated object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door-opening tasks with a variety of observation and algorithm configurations to study the utility of multimodal visuotactile sensing and our proposed improvements. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to enable accurate task feedback.
|
|
11:35-11:40, Paper TuBT4.5 | Add to My Program |
DotTip: Enhancing Dexterous Robotic Manipulation with a Tactile Fingertip Featuring Curved Perceptual Morphology |
|
Zheng, Haoran | Zhejiang University |
Shi, Xiaohang | Zhejiang University |
Bao, Ange | Zhejiang University |
Jin, Yongbin | ZJU-Hangzhou Global Scientific and Technological Innovation Center |
Zhao, Pei | Zhejiang University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Dexterous Manipulation
Abstract: Tactile sensing technologies enable robots to interact with the environment in increasingly nuanced and dexterous ways. A significant gap in this domain is the absence of curved tactile sensors, which are essential for performing sophisticated manipulation tasks. In this study, we present DotTip, a tactile fingertip featuring a three-dimensional curved perceptual surface that closely mimics human fingertip morphology. A convolutional neural network-based deep learning framework precisely calculates the contact angles and forces from the sensor tactile images, achieving mean errors of 1.56 degrees and 0.28 N, respectively. DotTip's performance is evaluated in real-world tasks, demonstrating its efficacy in tactile servoing, slip prevention, and grasping, along with the more challenging benchmark task of controlling a joystick. These findings demonstrate that DotTip possesses superior 3D tactile sensing capabilities necessary for fine-grained dexterous manipulations compared to its flat counterparts.
|
|
11:40-11:45, Paper TuBT4.6 | Add to My Program |
Visual-Tactile Inference of 2.5D Object Shape from Marker Texture |
|
Jilani, Affan | McGill University |
Hogan, Francois | Massachusetts Institute of Technology |
Morissette, Charlotte | McGill University |
Dudek, Gregory | McGill University |
Jenkin, Michael | York University |
Siddiqi, Kaleem | McGill University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Recognition
Abstract: Visual-tactile sensing affords abundant capabilities for contact-rich object manipulation tasks including grasping and placing. Here we introduce a shape-from-texture inspired contact shape estimation approach for visual-tactile sensors equipped with visually distinct membrane markers. Under a perspective projection camera model, measurements related to the change in marker separation upon contact are used to recover surface shape. Our approach allows for shape sensing in real time, without requiring network training or complex assumptions related to lighting, sensor geometry or marker placement. Experiments show that the surface contact shape recovered is qualitatively and quantitatively consistent with those obtained through the use of photometric stereo, the current state of the art for shape recovery in visual-tactile sensors. Importantly, our approach is applicable to a large family of sensors not equipped with photometric stereo hardware, and also to those with semi-transparent membranes. The recovery of surface shape affords new capabilities to these sensors for robotic applications, such as the estimation of contact and slippage in object manipulation tasks and the use of force matching for kinesthetic teaching using multimodal visual-tactile sensing.
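The core geometric relation the abstract appeals to is that, under a pinhole camera, the membrane depth at a marker pair is proportional to the known physical marker spacing divided by its apparent pixel spacing. A tiny worked example follows; the numbers and names are illustrative assumptions, not the paper's estimator.

```python
# Worked example: depth from apparent marker spacing under a pinhole model.
def depth_from_marker_spacing(focal_px, marker_spacing_m, pixel_spacing_px):
    # Z = f * d_world / d_image for a roughly fronto-parallel marker pair.
    return focal_px * marker_spacing_m / pixel_spacing_px

# e.g. f = 600 px and a 1.5 mm marker pitch observed 18 px apart -> Z = 0.05 m
print(depth_from_marker_spacing(600.0, 0.0015, 18.0))
```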
|
|
TuBT5 Regular Session, 305 |
Add to My Program |
Aerial Robots 2 |
|
|
Chair: Smeur, Ewoud | Delft University of Technology |
Co-Chair: Weiss, Stephan | Universität Klagenfurt |
|
11:15-11:20, Paper TuBT5.1 | Add to My Program |
STHN: Deep Homography Estimation for UAV Thermal Geo-Localization with Satellite Imagery |
|
Xiao, Jiuhong | New York University |
Zhang, Ning | TII |
Tortei, Daniel | Technology Innovation Institute |
Loianno, Giuseppe | New York University |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Applications, Localization
Abstract: Accurate geo-localization of Unmanned Aerial Vehicles (UAVs) is crucial for outdoor applications including search and rescue operations, power line inspections, and environmental monitoring. The vulnerability of Global Navigation Satellite Systems (GNSS) signals to interference and spoofing necessitates the development of additional robust localization methods for autonomous navigation. Visual Geo-localization (VG), leveraging onboard cameras and reference satellite maps, offers a promising solution for absolute localization. Specifically, Thermal Geo-localization (TG), which relies on image-based matching between thermal imagery and satellite databases, stands out by utilizing infrared cameras for effective nighttime localization. However, the efficiency and effectiveness of current TG approaches are hindered by dense sampling on satellite maps and geometric noise in thermal query images. To overcome these challenges, we introduce STHN, a novel UAV thermal geo-localization approach that employs a coarse-to-fine deep homography estimation method. This method attains reliable thermal geo-localization within a 512-meter radius of the UAV's last known location even with a challenging 11% size ratio between thermal and satellite images, despite the presence of indistinct textures and self-similar patterns. We further show how our research significantly enhances UAV thermal geo-localization performance and robustness against geometric noise under low-visibility conditions in the wild. The code is made publicly available.
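One way to picture the final localization step, assuming a homography H mapping the thermal query image into the satellite reference crop has already been estimated: project the query-image centre to obtain the UAV position in satellite-crop pixels. This is a hedged illustration, not the STHN pipeline itself.

```python
# Sketch: UAV position from an estimated thermal-to-satellite homography.
import numpy as np

def localize_with_homography(H, query_w, query_h):
    center = np.array([query_w / 2.0, query_h / 2.0, 1.0])  # query centre, homogeneous
    p = H @ center
    return p[:2] / p[2]   # position in satellite-crop pixel coordinates
```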
|
|
11:20-11:25, Paper TuBT5.2 | Add to My Program |
Vision Transformers for End-To-End Vision-Based Quadrotor Obstacle Avoidance |
|
Bhattacharya, Anish | University of Pennsylvania, GRASP |
Rao, Nishanth Arun | University of Pennsylvania |
Parikh, Dhruv Ketan | University of Pennsylvania |
Kunapuli, Pratik | University of Pennsylvania |
Wu, Yuwei | University of Pennsylvania |
Tao, Yuezhan | University of Pennsylvania |
Matni, Nikolai | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art learning architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end vision-to-control networks have been shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer (ViT) models for depth image-to-control in high-fidelity simulation, observing that ViT models are more effective than others as quadrotor speeds increase and in generalization to unseen environments, while the addition of recurrence further improves performance while reducing quadrotor energy cost across all speeds. We assess performance at speeds of 1-7 m/s in simulation and hardware. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
|
|
11:25-11:30, Paper TuBT5.3 | Add to My Program |
DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models |
|
Das, Avirup | University of Manchester |
Yadav, Rishabh Dev | The University of Manchester |
Sun, Sihao | Delft University of Technology |
Sun, Mingfei | The University of Manchester |
Kaski, Samuel | Aalto University, University of Manchester |
Pan, Wei | The University of Manchester |
Keywords: Machine Learning for Robot Control, Model Learning for Control, Robust/Adaptive Control
Abstract: An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
|
|
11:30-11:35, Paper TuBT5.4 | Add to My Program |
FlightForge: Advancing UAV Research with Procedural Generation of High-Fidelity Simulation and Integrated Autonomy |
|
Čapek, David | Czech Technical University in Prague |
Hrnčíř, Jan | Czech Technical University in Prague |
Baca, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Jirkal, Jakub | Czech Technical University in Prague |
Vonasek, Vojtech | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Keywords: Software Tools for Benchmarking and Reproducibility, Aerial Systems: Perception and Autonomy, Software, Middleware and Programming Environments
Abstract: Robotic simulators play a crucial role in the development and testing of autonomous systems, particularly in the realm of Uncrewed Aerial Vehicles (UAVs). However, existing simulators often lack high-level autonomy, hindering their immediate applicability to complex tasks such as autonomous navigation in unknown environments. This limitation stems from the challenge of integrating realistic physics, photorealistic rendering, and diverse sensor modalities into a single simulation environment. At the same time, existing photorealistic UAV simulators mostly use hand-crafted environments of limited size, which prevents the testing of long-range missions. This restricts the usage of existing simulators to only low-level tasks such as control and collision avoidance. To this end, we propose FlightForge, a novel open-source UAV simulator. FlightForge offers advanced rendering capabilities, diverse control modalities, and, foremost, procedural generation of environments. Moreover, the simulator is already integrated with a fully autonomous UAV system capable of long-range flights in cluttered unknown environments. The key innovation lies in novel procedural environment generation and seamless integration of high-level autonomy into the simulation environment. Experimental results demonstrate superior sensor rendering capability compared to existing simulators, as well as the ability to navigate autonomously in almost infinite environments.
|
|
11:35-11:40, Paper TuBT5.5 | Add to My Program |
AIVIO: Closed-Loop, Object-Relative Navigation of UAVs with AI-Aided Visual Inertial Odometry |
|
Jantos, Thomas | University of Klagenfurt |
Scheiber, Martin | University of Klagenfurt |
Brommer, Christian | University of Klagenfurt |
Allak, Eren | University of Klagenfurt |
Weiss, Stephan | Universität Klagenfurt |
Steinbrener, Jan | Universität Klagenfurt |
Keywords: AI-Based Methods, Vision-Based Navigation, Autonomous Vehicle Navigation
Abstract: Object-relative mobile robot navigation is essential for a variety of tasks, e.g. autonomous critical infrastructure inspection, but requires the capability to extract semantic information about the objects of interest from raw sensory data. While deep learning-based (DL) methods excel at inferring semantic object information from images, such as class and relative 6 degree of freedom (6-DoF) pose, they are computationally demanding and thus often not suitable for payload constrained mobile robots. In this letter we present a real-time capable unmanned aerial vehicle (UAV) system for object-relative, closed-loop navigation with a minimal sensor configuration consisting of an inertial measurement unit (IMU) and RGB camera. Utilizing a DL-based object pose estimator, solely trained on synthetic data and optimized for companion board deployment, the object-relative pose measurements are fused with the IMU data to perform object-relative localization. We conduct multiple real-world experiments to validate the performance of our system for the challenging use case of power pole inspection. An example closed-loop flight is presented in the supplementary video.
|
|
11:40-11:45, Paper TuBT5.6 | Add to My Program |
Unified Incremental Nonlinear Controller for the Transition Control of a Hybrid Dual-Axis Tilting Rotor Quad-Plane |
|
Mancinelli, Alessandro | Delft University of Technology |
Remes, Bart | Delft University of Technology |
de Croon, Guido | Delft University of Technology |
Smeur, Ewoud | Delft University of Technology |
Keywords: Tilt rotor UAVs, Optimization and Optimal Control, Control Architectures and Programming, Aerial Systems: Mechanics and Control
Abstract: Overactuated Tilt Rotor Unmanned Aerial Vehicles are renowned for exceptional wind resistance and a broad operational range, which poses complex control challenges due to non-affine dynamics. Traditional solutions employ multi-state switched logic controllers for transitions. Our study introduces a novel unified incremental nonlinear controller for overactuated dual-axis tilting rotor quad-planes, seamlessly managing pitch, roll, and physical actuator commands. The nonlinear control allocation problem is addressed using a sequential quadratic programming iterative optimization algorithm, well-suited for nonlinear actuator effectiveness in thrust vectoring vehicles. The controller design integrates desired roll and pitch angle inputs as an additional degree of freedom during slow airspeed phases. At high airspeed, the roll and pitch angles cannot be chosen freely and are set by the controller. We incorporate an angle of attack protection logic to prevent wing stall and a yaw rate reference model for coordinated turns. Flight tests confirm the controller's effectiveness in transitioning from hovering to forward flight, achieving desired vertical and lateral accelerations, and reverting to hover.
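The nonlinear control-allocation step described above can be pictured as an iterative constrained least-squares problem. The sketch below uses SciPy's SLSQP solver (an SQP method) with a hypothetical actuator-effectiveness model and bounds; it is an illustration of the idea, not the flight-code implementation.

```python
# Sketch: nonlinear control allocation for a thrust-vectoring vehicle.
import numpy as np
from scipy.optimize import minimize

def allocate(wrench_cmd, effectiveness, u0, bounds):
    """Find actuator commands u (e.g. rotor thrusts and tilt angles) whose nonlinear
    effectiveness best reproduces the commanded wrench, within actuator bounds."""
    cost = lambda u: float(np.sum((effectiveness(u) - wrench_cmd) ** 2))
    return minimize(cost, u0, method="SLSQP", bounds=bounds).x
```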
|
|
TuBT6 Regular Session, 307 |
Add to My Program |
Perception for Mobile Robots 2 |
|
|
Chair: O'Kane, Jason | Texas A&M University |
Co-Chair: Wang, Wenshan | Carnegie Mellon University |
|
11:15-11:20, Paper TuBT6.1 | Add to My Program |
Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation |
|
Shan, Tixiao | SRI International |
Rajvanshi, Abhinav | SRI International |
Mithun, Niluthpol Chowdhury | SRI International |
Chiu, Han-Pang | SRI International |
Keywords: Semantic Scene Understanding, AI-Enabled Robotics
Abstract: We propose Graph2Nav, a real-time 3D object-relation graph generation framework, for autonomous navigation in the real world. Our framework fully generates and exploits both 3D objects and a rich set of semantic relationships among objects in a 3D layered scene graph, which is applicable to both indoor and outdoor scenes. It learns to generate 3D semantic relations among objects by leveraging and advancing state-of-the-art 2D panoptic scene graph works into the 3D world via 3D semantic mapping techniques. This approach avoids the training-data constraints of prior methods that learn 3D scene graphs directly from 3D data. We conduct experiments to validate the accuracy in locating 3D objects and labeling object-relations in our 3D scene graphs. We also evaluate the impact of Graph2Nav by integrating it with SayNav, a state-of-the-art planner based on large language models, on an unmanned ground robot performing object search tasks in real environments. Our results demonstrate that modeling object relations in our scene graphs improves search efficiency in these navigation tasks.
|
|
11:20-11:25, Paper TuBT6.2 | Add to My Program |
Transferring Visual Knowledge: Semi-Supervised Instance Segmentation for Object Navigation across Varying Height Viewpoints |
|
Zheng, Qiu | The Chinese University of HongKong, Shenzhen |
Hu, Junjie | The Chinese University of Hong Kong, Shenzhen |
Liu, Yuming | The Chinese University of Hong Kong, Shenzhen |
Zeng, Zengfeng | Baidu Inc., Beijing, China |
Wang, Fan | Baidu International Technology (Shenzhen) Co., Ltd |
Lam, Tin Lun | The Chinese University of Hong Kong, Shenzhen |
Keywords: Object Detection, Segmentation and Categorization, Vision-Based Navigation, Autonomous Vehicle Navigation
Abstract: The object navigation task requires robots to understand the semantic regularities in their environments. However, existing modular object navigation frameworks rely on instance segmentation models trained at fixed camera height viewpoints, limiting generalization performance and increasing labeling costs for new height viewpoints. To tackle this issue, we propose a semi-supervised method that transfers knowledge from a source height to a target height, minimizing the need for additional labels. Our approach introduces three key innovations: i) a projection policy to enhance the teacher model's detection capabilities at the target height, ii) a dynamic weight mechanism that emphasizes high-confidence pseudo-labels to reduce overfitting, and iii) a prototype contrast transferring method to transfer knowledge effectively. Experiments on the Habitat-Matterport 3D (HM3D) dataset show our method outperforms state-of-the-art semi-supervised techniques, improving both segmentation accuracy and navigation performance. The code is available at: https://github.com/FreeformRobotics/TransferKnowledge.
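One way to implement the "dynamic weight" idea of emphasizing high-confidence pseudo-labels is sketched below; the focal-style weighting and all names are assumptions for illustration, not the released code.

```python
# Sketch: confidence-weighted pseudo-label loss for semi-supervised segmentation.
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(student_logits, teacher_probs, gamma=2.0):
    conf, pseudo = teacher_probs.max(dim=1)               # per-pixel confidence / label
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (conf.pow(gamma) * ce).mean()                  # down-weight uncertain labels
```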
|
|
11:25-11:30, Paper TuBT6.3 | Add to My Program |
An Algorithm for Geometric Navigation Planning under Uncertainty Using Terrain Boundary Detection |
|
Carley, Bennett | Texas A&M University |
Bamgbelu, Adeolayemi | Texas A&M University |
Zhang, XiMing | Texas A&M University |
O'Kane, Jason | Texas A&M University |
Keywords: Reactive and Sensor-Based Planning, Planning under Uncertainty, Marine Robotics
Abstract: We explore a navigation planning problem under uncertainty for a simple robot with extremely limited sensing. Our robot can turn subject to significant proportional error and move forward. As it moves in an environment with a known terrain map, the robot can detect changes in the terrain at its current position. Given an initial pose and a goal segment, the robot should find some sequence of actions to travel reliably from start to goal, if such a sequence exists. The resulting plan should guarantee the robot reaches the goal segment despite any movement errors experienced within some known error bound. In this paper, we propose an algorithm to find such an action sequence, implement and evaluate this algorithm, and present evidence for the feasibility of such an algorithm in an underwater navigation setting.
|
|
11:30-11:35, Paper TuBT6.4 | Add to My Program |
ProMi: An Efficient Prototype-Mixture Baseline for Few-Shot Segmentation with Bounding-Box Annotations |
|
Chiaroni, Florent | Thales |
Ayub, Ali | Concordia University |
Ahmad, Ola | Thales Canada |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Learning Categories and Concepts
Abstract: In robotics applications, few-shot segmentation is crucial because it allows robots to perform complex tasks with minimal training data, facilitating their adaptation to diverse, real-world environments. However, pixel-level annotation of even a small number of images is highly time-consuming and costly. In this paper, we present a novel few-shot binary segmentation method based on bounding-box annotations instead of pixel-level labels. We introduce ProMi, an efficient prototype-mixture-based method that treats the background class as a mixture of distributions. Our approach is simple, training-free, and effective, accommodating coarse annotations with ease. Compared to existing baselines, ProMi achieves the best results across different datasets with significant gains, demonstrating its effectiveness. Furthermore, we present qualitative experiments tailored to real-world mobile robot tasks, demonstrating the applicability of our approach in such scenarios. Our code: https://github.com/ThalesGroup/promi.
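In the spirit of the prototype-mixture idea, a training-free baseline can build one foreground prototype from pixels inside the support boxes and a small mixture of background prototypes from pixels outside, then assign each query pixel to its nearest prototype. The sketch below is an assumption-laden illustration, not the ProMi code.

```python
# Illustrative prototype-mixture few-shot segmentation from box-level supervision.
import numpy as np

def segment_with_prototypes(support_feats, fg_mask, query_feats, n_bg=3, iters=10):
    """support_feats, query_feats: (H*W, C) L2-normalized features;
    fg_mask: (H*W,) bool marking pixels inside the support bounding boxes."""
    fg_proto = support_feats[fg_mask].mean(0, keepdims=True)
    bg = support_feats[~fg_mask]
    bg_protos = bg[np.random.choice(len(bg), n_bg, replace=False)]
    for _ in range(iters):                                  # simple k-means for the bg mixture
        assign = np.argmax(bg @ bg_protos.T, axis=1)
        bg_protos = np.stack([bg[assign == k].mean(0) if np.any(assign == k) else bg_protos[k]
                              for k in range(n_bg)])
    protos = np.vstack([fg_proto, bg_protos])
    return np.argmax(query_feats @ protos.T, axis=1) == 0   # True where foreground wins
```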
|
|
11:35-11:40, Paper TuBT6.5 | Add to My Program |
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes |
|
Zhang, Haochen | Carnegie Mellon University |
Zantout, Nader | Carnegie Mellon University |
Kachana, Pujith | Carnegie Mellon University |
Zhang, Ji | Carnegie Mellon University |
Wang, Wenshan | Carnegie Mellon University |
Keywords: Semantic Scene Understanding, Data Sets for Robotic Vision, Vision-Based Navigation
Abstract: With the recent rise of large language models, vision-language models, and other general foundation models, there is growing potential for multimodal, multi-task robotics that can operate in diverse environments given natural language input. One such application is indoor navigation using natural language instructions. However, despite recent progress, this problem remains challenging due to the 3D spatial reasoning and semantic understanding required. Additionally, the language used may be imperfect or misaligned with the scene, further complicating the task. To address this challenge, we curate a benchmark dataset, IRef-VLA, for Interactive Referential Vision and Language-guided Action in 3D Scenes with imperfect references. IRef-VLA is the largest real-world dataset for the referential grounding task, consisting of over 11.5K scanned 3D rooms from existing datasets, 7.6M heuristically generated semantic relations, and 4.7M referential statements. Our dataset also contains semantic object and room annotations, scene graphs, navigable free space annotations, and is augmented with statements where the language has imperfections or ambiguities. We verify the generalizability of our dataset by evaluating with state-of-the-art models to obtain a performance baseline and also develop a graph-search baseline to demonstrate the performance bound and generation of alternatives using scene-graph knowledge. With this benchmark, we aim to provide a resource for 3D scene understanding that aids the development of robust, interactive navigation systems. The dataset and all source code are publicly released.
|
|
11:40-11:45, Paper TuBT6.6 | Add to My Program |
PTS-Map: Probabilistic Terrain State Map for Uncertainty-Aware Traversability Mapping in Unstructured Environments |
|
Kim, Dong-Wook | Seoul National University |
Son, E-In | Seoul National University |
Kim, Chan | Seoul National University |
Hwang, Ji-Hoon | Seoul National University |
Seo, Seung-Woo | Seoul National University |
Keywords: Field Robots, Autonomous Vehicle Navigation, Probability and Statistical Methods
Abstract: Traversability mapping for autonomous navigation in unstructured environments has been widely investigated for decades. However, it remains challenging due to the uncertainty in geometry perception and the simplified representation of traversability maps that fail to capture detailed structures of environments. We propose PTS-Map, a 2.5D probabilistic terrain state map to address these issues. PTS-Map sequentially updates the ground surface state and above-ground elevation state, explicitly distinguishing the geometric features of ground and obstacles. During state updates, we introduce a novel ground uncertainty estimation to mitigate the effects of unreliable feature measurements. By effectively designing the terrain states and addressing the uncertainty of the ground surface, PTS-Map constructs a temporally consistent traversability map that provides precise ground conditions and vertical features relevant to navigation. Experiments are conducted in various large-scale unstructured environments with distinct characteristics. PTS-Map outperforms other state-of-the-art methods in success rate and efficiency by constructing a precise traversability map of the environments.
|
|
TuBT7 Regular Session, 309 |
Add to My Program |
Marine Robotics 1 |
|
|
Chair: Fischer, Tobias | Queensland University of Technology |
Co-Chair: Clement, Benoit | ENSTA, Institut Polytechnique De Paris |
|
11:15-11:20, Paper TuBT7.1 | Add to My Program |
Efficient Non-Myopic Layered Bayesian Optimization for Large-Scale Bathymetric Informative Path Planning |
|
Wallén Kiessling, Alexander | Royal Institute of Technology (KTH) |
Torroba Balmori, Ignacio | KTH Royal Institute of Technology |
Sidrane, Chelsea | KTH Royal Institute of Technology |
Stenius, Ivan | KTH |
Tumova, Jana | KTH Royal Institute of Technology |
Folkesson, John | KTH |
Keywords: Marine Robotics, Reactive and Sensor-Based Planning, Mapping
Abstract: Informative path planning (IPP) applied to bathymetric mapping allows AUVs to focus on feature-rich areas to quickly reduce uncertainty and increase mapping efficiency. Existing methods based on Bayesian optimization (BO) over Gaussian Process (GP) maps work well in small scenarios, but they are short-sighted and computationally heavy when mapping larger areas, hindering deployment in real applications. To overcome this, we present a 2-layered BO IPP method that performs non-myopic, real-time planning in a tree-search fashion over large Stochastic Variational GP maps, while respecting the AUV motion constraints and accounting for localization uncertainty. Our framework outperforms the standard industrial lawn-mowing pattern and a myopic baseline in a set of hardware-in-the-loop (HIL) experiments on an embedded platform over real bathymetry.
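For reference, the myopic building block that such a planner extends is a Bayesian-optimization-style acquisition over a GP depth map: pick the next sounding location where the map is most uncertain. The sketch below uses a plain scikit-learn GP rather than the large stochastic variational GP maps described above, and all names are illustrative assumptions.

```python
# Minimal myopic acquisition over a GP bathymetry map (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_waypoint(X_obs, depth_obs, candidates):
    gp = GaussianProcessRegressor().fit(X_obs, depth_obs)
    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std)]   # survey where predictive uncertainty is highest
```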
|
|
11:20-11:25, Paper TuBT7.2 | Add to My Program |
Visual Lidar Recursive Online Tracker (ViLiROT) for Autonomous Surface Vessels |
|
Hilmarsen, Henrik | NTNU |
Dalhaug, Nicholas | Norwegian University of Science and Technology |
Nygård, Trym Anthonsen | NTNU |
Brekke, Edmund | NTNU |
Stahl, Annette | Norwegian University of Science and Technology (NTNU) |
Mester, Rudolf | NTNU Trondheim |
Keywords: Marine Robotics, Visual Tracking, Collision Avoidance
Abstract: We propose a multi-sensor fusion pipeline for multiple object tracking in autonomous surface vessels using lidar and camera data. Our approach follows the tracking-by-detection paradigm, leveraging the precision of lidar for accurate state estimation and camera data for robust association. The method addresses issues with false tracks from lidar returns by suppressing non-moving objects on the basis of optical flow. We compare the proposed pipeline against prior work, particularly in the use of lidar and stereo cameras as depth modalities, demonstrating its effectiveness in improving tracking performance.
|
|
11:25-11:30, Paper TuBT7.3 | Add to My Program |
Open-Set Semantic Uncertainty Aware Metric-Semantic Graph Matching |
|
Singh, Kurran | Massachusetts Institute of Technology |
Leonard, John | MIT |
Keywords: Marine Robotics
Abstract: Underwater object-level mapping requires incorporating visual foundation models to handle the uncommon and often previously unseen object classes encountered in marine scenarios. In this work, a metric of semantic uncertainty for open-set object detections produced by visual foundation models is calculated and then incorporated into an object-level uncertainty tracking framework. Object-level uncertainties and geometric relationships between objects are used to enable robust object-level loop closure detection for unknown object classes. The above loop closure detection problem is formulated as a graph matching problem. While graph matching, in general, is NP-Complete, a solver for an equivalent formulation of the proposed graph matching problem as a graph editing problem is tested on multiple challenging underwater scenes. Results for this solver as well as three other solvers demonstrate that the proposed methods are feasible for real-time use in marine environments for robust, open-set, multi-object, semantic-uncertainty-aware loop closure detection. Further experimental results on the KITTI dataset demonstrate that the method generalizes to large-scale terrestrial scenes.
|
|
11:30-11:35, Paper TuBT7.4 | Add to My Program |
AI-Enhanced Automatic Design of Efficient Underwater Gliders |
|
Chen, Peter Yichen | MIT |
Ma, Pingchuan | MIT CSAIL |
Hagemann, Niklas | Massachusetts Institute of Technology |
Romanishin, John | MIT |
Wang, Wei | University of Wisconsin-Madison |
Rus, Daniela | MIT |
Matusik, Wojciech | MIT |
Keywords: Deep Learning Methods, Machine Learning for Robot Control, Marine Robotics
Abstract: The development of novel autonomous underwater gliders has been hindered by limited shape diversity, primarily due to the reliance on traditional design tools that depend heavily on manual trial and error. Building an automated design framework is challenging due to the complexities of representing glider shapes and the high computational costs associated with modeling complex solid-fluid interactions. In this work, we introduce an AI-enhanced automated computational framework designed to overcome these limitations by enabling the creation of underwater robots with non-trivial hull shapes. Our approach involves an algorithm that co-optimizes both shape and control signals, utilizing a reduced-order geometry representation and a differentiable neural-network-based fluid surrogate model. This end-to-end design workflow facilitates rapid iteration and evaluation of hydrodynamic performance, leading to the discovery of optimal and complex hull shapes across various control settings. We validate our method through wind tunnel experiments and swimming pool gliding tests, demonstrating that our computationally designed gliders surpass manually designed counterparts in terms of energy efficiency. By addressing challenges in efficient shape representation and neural fluid surrogate models, our work paves the way for the development of highly efficient underwater gliders, with significant implications for ocean exploration and environmental monitoring.
|
|
11:35-11:40, Paper TuBT7.5 | Add to My Program |
EnKode: Active Learning of Unknown Flows with Koopman Operators |
|
Li, Alice Kate | University of Pennsylvania |
Costa Silva, Thales | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Keywords: Environment Monitoring and Management, Marine Robotics, Dynamics
Abstract: In this letter, we address the task of adaptive sampling to model vector fields. When modeling environmental phenomena with a robot, gathering high resolution information can be resource intensive. Actively gathering data and modeling flows with the data is a more efficient alternative. However, in such scenarios, data is often sparse and thus requires flow modeling techniques that are effective at capturing the relevant dynamical features of the flow to ensure high prediction accuracy of the resulting models. To accomplish this effectively, regions with high informative value must be identified. We propose EnKode, an active sampling approach based on Koopman Operator theory and ensemble methods that can build high quality flow models and effectively estimate model uncertainty. For modeling complex flows, EnKode provides comparable or better estimates of unsampled flow regions than Gaussian Process Regression models with hyperparameter optimization. Additionally, our active sensing scheme provides more accurate flow estimates than comparable strategies that rely on uniform sampling. We evaluate EnKode using three common benchmarking systems: the Bickley Jet, Lid-Driven Cavity flow with an obstacle, and real ocean currents from the National Oceanic and Atmospheric Administration (NOAA).
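A minimal sketch of the Koopman building block (extended dynamic mode decomposition, EDMD) follows: lift paired flow snapshots with a feature map and fit a linear operator K by least squares; an ensemble of such fits can then supply the uncertainty used for active sampling. The feature map and names are illustrative assumptions, not the EnKode implementation.

```python
# Sketch: EDMD fit of an approximate Koopman operator from snapshot pairs.
import numpy as np

def edmd(X, Y, lift):
    """X, Y: (N, d) snapshots, with Y the one-step successor of X."""
    PX, PY = lift(X), lift(Y)                    # (N, D) lifted observables
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)  # PY ≈ PX @ K
    return K

# Example lifting with trigonometric observables plus a constant term.
lift = lambda Z: np.hstack([Z, np.sin(Z), np.cos(Z), np.ones((len(Z), 1))])
```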
|
|
TuBT8 Regular Session, 311 |
Add to My Program |
Medical Robotics 2 |
|
|
Chair: Doulgeri, Zoe | Aristotle University of Thessaloniki |
Co-Chair: Liu, Fei | University of Tennessee Knoxville |
|
11:15-11:20, Paper TuBT8.1 | Add to My Program |
A Cylindrical Halbach Array Magnetic Actuation System for Longitudinal Robot Actuation across 2D Workplane |
|
Sun, Hongzhe | The Chinese University of Hong Kong |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Automation at Micro-Nano Scales, Mechanism Design
Abstract: Magnetic actuation has been widely investigated for miniature robot control due to its wireless control capability. As a permanent magnet (PM) actuation system, the Halbach array can provide strong and controllable magnetic fields with a large motion workspace. However, existing cylindrical Halbach array systems can only generate axial force along their central axis and require the workspace (i.e., patient anatomy) to be manipulated inside the system for any useful robot manipulation, severely limiting their application in robotic surgery. In this work, we introduce a cylindrical Halbach array actuation system capable of generating a magnetic field with longitudinal gradients across a 2-dimensional (2D) workplane instead of only along the central axis, effectively extending the longitudinal force actuation coverage from 1D to a 2D plane. This is achieved by optimizing the magnet sizes and roll angles of the Halbach rows arranged circumferentially around the system. Co-alignment between the field and gradient directions is also achieved through proper configuration of the magnet pitch angles along each Halbach row, resulting in tip-leading robot motion capability. A series of model-based simulations were performed during the optimization process and later verified experimentally. The actuation system was experimentally demonstrated to stably drive a 2 mm-diameter magnetic robot longitudinally at different locations within the workplane and at different velocities. This represents a significant advancement towards deploying cylindrical Halbach array systems for robot manipulation in clinical cases.
|
|
11:20-11:25, Paper TuBT8.2 | Add to My Program |
Towards Autonomous Verification: Integrating Cognitive AI and Semantic Digital Twins in Medical Robotics |
|
Mania, Patrick | University of Bremen |
Neumann, Michael | Uni Bremen |
Kenghagho Kenfack, Franklin | University of Bremen |
Beetz, Michael | University of Bremen |
Keywords: Medical Robots and Systems, Service Robotics, Computer Vision for Medical Robotics
Abstract: In medical laboratory environments, where precision and safety are critical, the deployment of autonomous robots requires not only accurate object manipulation but also the ability to verify task success to comply with regulatory requirements. This paper introduces a novel imagination-enabled perception framework that integrates cognitive AI with semantic digital twins to allow medical robots to simulate task outcomes, compare them with real-world results, and autonomously verify the success of their actions. Our approach addresses challenges related to handling small and transparent objects commonly found in sterility testing kits and other related consumables. By enhancing the RoboKudo perception system with parthood-based reasoning, we enable more accurate task verification through focused attention on object subparts. Experiments show that our system significantly improves performance compared to traditional object-centric methods, increasing accuracy in complex environments without the need for extensive retraining. This work demonstrates a novel concept in making robotic systems more adaptable and reliable for critical tasks in medical laboratories.
|
|
11:25-11:30, Paper TuBT8.3 | Add to My Program |
SC-Former: A Segmentation Convolution Transformer for Lung Surgery Robots |
|
Li, Nanyu | Broncus Medical |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: For lung surgery robots, the precise segmentation of pulmonary fissures is very important. Damaging the interlobar fissures during surgery can have serious consequences. Accurately segmenting weak and abnormal fissures commonly found in clinical CT scans remains a challenging task. To solve this problem, we developed a novel Convolution Transformer for accurate fissure segmentation (SC-Former). The proposed SC-Former adopts an encoder, attention block, and decoder structure. First, we designed an encoder with a hybrid CNN-transformer block that ingeniously amalgamates coordinate convolution and coordinate transformer modules to effectively capture both local and global feature information. Second, we introduced long skip connections through our designed attention block at each layer of the encoder-decoder structure to emphasize the field of view for fissures. Third, we added a distance map strategy to alleviate the challenge of training the network to suppress false positives arising from the complex textures in the lung. Fourth, we developed a multi-scale supervision strategy for independent prediction at various decoder levels, effectively integrating multi-scale semantic information to facilitate the segmentation of weak and abnormal fissures. Because of the lack of open-source inter-pulmonary fissure datasets, we collected 3D CT scans from 400 participants in a clinical trial and created a new high-quality dataset, the BMI dataset. Extensive experiments on this dataset demonstrated the clear superiority of our method over several state-of-the-art competitors. An ablation study also validated the effectiveness and robustness of each part of SC-Former.
|
|
11:30-11:35, Paper TuBT8.4 | Add to My Program |
Passive Bilateral Surgical Teleoperation with RCM and Spatial Constraints in the Presence of Time Delays |
|
Kastritsi, Theodora | Istituto Italiano Di Tecnologia |
Prapavesis Semetzidis, Theofanis | Aristotle University of Thessaloniki |
Doulgeri, Zoe | Aristotle University of Thessaloniki |
Keywords: Surgical Robotics: Laparoscopy, Telerobotics and Teleoperation, Physical Human-Robot Interaction, Passivity, Stability and Performance
Abstract: The primary issue in bilateral teleoperation setups is the existence of communication delays, which can destabilize the system. We address this challenge in the case of a bilateral leader-follower surgical setup, where the surgeon uses a haptic device as the leader robot to manipulate the surgical instrument held by a general-purpose manipulator, the follower robot. The follower robot is equipped with an elongated tool that passes through a small incision into the patient's body, where sensitive structures may exist. These structures may include organs, arteries, or veins that require protection during surgery. To address this challenge, we propose a bilateral control framework that is proven to maintain passivity, ensure bounded tracking errors between the leader and follower robots, and impose remote center of motion and spatial constraints related to the sensitive structures, all in the presence of constant and variable communication delays. Experimental results in a virtual intraoperative environment, using a point cloud of a kidney and its surrounding vessels, demonstrate the effectiveness of our control scheme under various communication delay scenarios.
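For background on why delays threaten passivity, the sketch below shows the classical wave-variable (scattering) transformation, a standard device for keeping a delayed bilateral channel passive. It is shown only for context under an assumed wave impedance b; it is not the constraint-based framework proposed in this paper.

```python
# Background sketch: classical wave-variable (scattering) transformation.
import numpy as np

def encode_wave(x_dot, force, b=50.0):
    """Encode velocity and force into forward/backward waves (impedance b)."""
    u = (b * x_dot + force) / np.sqrt(2.0 * b)
    v = (b * x_dot - force) / np.sqrt(2.0 * b)
    return u, v

def decode_wave(u, v, b=50.0):
    """Recover velocity and force from the wave pair."""
    x_dot = (u + v) / np.sqrt(2.0 * b)
    force = np.sqrt(b / 2.0) * (u - v)
    return x_dot, force

# the transmitted power F * x_dot equals (u^2 - v^2) / 2, so delaying u and v
# cannot inject energy into the channel: this is the passivity argument
x_dot, force = 0.02, 1.5
u, v = encode_wave(x_dot, force)
assert abs(force * x_dot - 0.5 * (u ** 2 - v ** 2)) < 1e-12
```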
|
|
TuBT9 Regular Session, 312 |
Add to My Program |
Motion Planning 2 |
|
|
Chair: Salzman, Oren | Technion |
Co-Chair: Kousik, Shreyas | Georgia Institute of Technology |
|
11:15-11:20, Paper TuBT9.1 | Add to My Program |
Direction Informed Trees (DIT*): Optimal Path Planning Via Direction Filter and Direction Cost Heuristic |
|
Zhang, Liding | Technical University of Munich |
Chen, Kejia | Technical University of Munich |
Cai, Kuanqi | Technical University of Munich |
Zhang, Yu | Technical University of Munich |
Dang, Yixuan | Technische Universität München |
Wu, Yansong | Technische Universität München |
Bing, Zhenshan | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Motion and Path Planning, Manipulation Planning, Task and Motion Planning
Abstract: Optimal path planning requires finding a series of feasible states from the starting point to the goal to optimize objectives. Popular path planning algorithms, such as Effort Informed Trees (EIT*), employ effort heuristics to guide the search. Effective heuristics are accurate and computationally efficient, but achieving both can be challenging due to their conflicting nature. This paper proposes Direction Informed Trees (DIT*), a sampling-based planner that focuses on optimizing the search direction for each edge, resulting in goal bias during exploration. We define edges as generalized vectors and integrate similarity indexes to establish a directional filter that selects the nearest neighbors and estimates direction costs. The estimated direction cost heuristics are utilized in edge evaluation. This strategy allows the exploration to share directional information efficiently. DIT* converges faster than existing single-query, sampling-based planners on tested problems in R^4 to R^16 and has been demonstrated in real-world environments with various planning tasks. A video showcasing our experimental results is available at: https://youtu.be/2SX6QT2NOek.
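The sketch below illustrates the general idea of a direction filter and a direction-weighted edge cost using cosine similarity. Thresholds, weights, and function names are assumptions for illustration, not the DIT* implementation.

```python
# Illustrative sketch: keep only neighbors whose edge direction roughly agrees
# with the goal direction, then rank them with a direction-inflated edge cost.
import numpy as np

def direction_filter(x, neighbors, goal, cos_threshold=0.3):
    """Discard neighbors whose edge points too far away from the goal."""
    to_goal = (goal - x) / (np.linalg.norm(goal - x) + 1e-9)
    kept = []
    for n in neighbors:
        edge = (n - x) / (np.linalg.norm(n - x) + 1e-9)
        if float(edge @ to_goal) >= cos_threshold:     # cosine similarity index
            kept.append(n)
    return kept

def direction_cost(x, n, goal, w_dir=0.5):
    """Edge length inflated when the edge points away from the goal."""
    to_goal = (goal - x) / (np.linalg.norm(goal - x) + 1e-9)
    edge = n - x
    length = np.linalg.norm(edge)
    cos_sim = float(edge @ to_goal) / (length + 1e-9)
    return length * (1.0 + w_dir * (1.0 - cos_sim))

x, goal = np.zeros(4), np.ones(4)                      # a state in R^4, as in the benchmarks
neighbors = [np.random.uniform(-1, 2, size=4) for _ in range(20)]
candidates = sorted(direction_filter(x, neighbors, goal),
                    key=lambda n: direction_cost(x, n, goal))
```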
|
|
11:20-11:25, Paper TuBT9.2 | Add to My Program |
Optimal Motion Planning for a Class of Dynamical Systems |
|
Rousseas, Panagiotis | National Technical University of Athens |
Bechlioulis, Charalampos | University of Patras |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: A novel method for optimal motion planning for a class of dynamical systems is proposed in this work. Our approach is based on the design of a provably safe and convergent actor structure, which is optimized via a policy iteration method. The proposed actor has wide applications, from control of mechanical systems to providing acceleration commands for more complex robotic platforms. Extra care is taken to provide theoretical guarantees, and the scheme is validated against an existing sampling-based planner.
|
|
11:25-11:30, Paper TuBT9.3 | Add to My Program |
Asymptotically-Optimal Multi-Query Path Planning for a Polygonal Robot |
|
Zhang, Duo | Rutgers University |
Ye, Zihe | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: Motion and Path Planning, Constrained Motion Planning, Computational Geometry
Abstract: Shortest-path roadmaps, also known as reduced visibility graphs, provide a highly efficient multi-query method for computing optimal paths in two-dimensional environments. Combined with Minkowski sum computations, shortest-path roadmaps can compute optimal paths for a translating robot in 2D. In this study, we explore the intuitive idea of stacking up a set of reduced visibility graphs at different orientations for a polygonal holonomic robot to support the fast computation of near-optimal paths, allowing simultaneous 2D translation and rotation. The resulting algorithm, rotation-stacked visibility graph (RVG), is shown to be resolution-complete and asymptotically optimal. Extensive computational experiments show RVG significantly outperforms state-of-the-art single- and multi-query sampling-based methods on both computation time and solution optimality fronts.
|
|
11:30-11:35, Paper TuBT9.4 | Add to My Program |
Asymptotically Optimal Sampling-Based Motion Planning through Anytime Incremental Lazy Bidirectional Heuristic Search |
|
Wang, Yi | University of New Hampshire |
Mu, Bingxian | University of New Hampshire |
Salzman, Oren | Technion |
Keywords: Motion and Path Planning
Abstract: This paper introduces Bidirectional Lazy Informed Trees (BLIT*), the first algorithm to incorporate anytime incremental lazy bidirectional heuristic search (Bi-HS) into batch-wise sampling-based motion planning (Bw-SBMP). BLIT* operates on batches of informed states (states that can potentially improve the cost of the incumbent solution) structured as an implicit random geometric graph (RGG). The computational cost of collision detection is mitigated via a new lazy edge-evaluation strategy by focusing on states near obstacles. Experimental results, especially in high dimensions, show that BLIT* outperforms existing Bw-SBMP planners by efficiently finding an initial solution and effectively improving the quality as more computational resources are available.
|
|
11:35-11:40, Paper TuBT9.5 | Add to My Program |
Propagative Distance Optimization for Motion Planning |
|
Chen, Yu | Carnegie Mellon University |
Xu, Jinyun | Carnegie Mellon University |
Cai, Yilin | Georgia Institute of Technology |
Wong, Ting-Wei | Carnegie Mellon University |
Ren, Zhongqiang | Shanghai Jiao Tong University |
Choset, Howie | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Keywords: Motion and Path Planning, Constrained Motion Planning
Abstract: This paper focuses on the motion planning problem for serial articulated robots with revolute joints under kinematic constraints. Many motion planners leverage iterative local optimization methods but are often trapped in local minima due to non-convexity of the problem. A key reason for the non-convexity is the trigonometric term when parameterizing the kinematics using joint angles. Recent distance-based formulation can eliminate these trigonometric terms by formulating the kinematics based on distances, and has shown superior performance against classic joint angle based formulations in domains like inverse kinematics (IK). However, distance-based kinematics formulations have not yet been studied for motion planning, and naively applying them for motion planning may lead to poor computational efficiency. In particular, IK seeks one configuration while motion planning seeks a sequence of configurations, which greatly increases the scale of the underlying optimization problem. This paper proposes Propagative Distance Optimization for Motion Planning (PDOMP), which addresses the challenge by (i) introducing a new compact representation that reduces the number of variables in the distance-based formulation, and (ii) leveraging the chain structure to efficiently compute forward kinematics and Jacobians of the robot among waypoints along a path.
|
|
11:40-11:45, Paper TuBT9.6 | Add to My Program |
Dynamically Feasible Path Planning in Cluttered Environments Via Reachable Bézier Polytopes |
|
Csomay-Shanklin, Noel | California Institute of Technology |
Compton, William | California Institute of Technology |
Ames, Aaron | California Institute of Technology |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Legged Robots
Abstract: The deployment of robotic systems in real world environments requires the ability to quickly produce paths through cluttered, non-convex spaces. These planned trajectories must be both kinematically feasible (i.e., collision free) and dynamically feasible (i.e., satisfy the underlying system dynamics), necessitating a consideration of both the free space and the dynamics of the robot in the path planning phase. In this work, we explore the application of reachable Bezier polytopes as an efficient tool for generating trajectories satisfying both kinematic and dynamic requirements. Furthermore, we demonstrate that by offloading specific computation tasks to the GPU, such an algorithm can meet tight real time requirements. We propose a layered control architecture that efficiently produces collision free and dynamically feasible paths for nonlinear control systems, and demonstrate the framework on the tasks of 3D hopping in a cluttered environment.
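The containment argument behind Bezier-based planning can be illustrated briefly: a Bezier curve lies in the convex hull of its control points, so checking a linear polytope constraint at the control points certifies the whole trajectory. The sketch below is a minimal, conservative version of that test, not the authors' GPU implementation.

```python
# Minimal sketch: certify that a Bezier trajectory stays inside a polytope
# {p : A p <= b} by checking only its control points (convex hull property).
import numpy as np

def bezier_point(ctrl, t):
    """De Casteljau evaluation of a Bezier curve at parameter t."""
    pts = np.array(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def curve_in_polytope(ctrl, A, b):
    """Sufficient (conservative) containment test via the control points."""
    return bool(np.all(A @ np.asarray(ctrl, float).T <= b[:, None] + 1e-12))

# unit box polytope {p : -1 <= p_i <= 1} written as A p <= b
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)
ctrl = [np.array([-0.8, -0.5, 0.0]), np.array([0.0, 0.9, 0.2]), np.array([0.7, 0.1, -0.3])]
assert curve_in_polytope(ctrl, A, b)
assert np.all(np.abs(bezier_point(ctrl, 0.5)) <= 1.0)
```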
|
|
TuBT10 Regular Session, 313 |
Add to My Program |
Multi-Robot Systems 1 |
|
|
Chair: Quattrini Li, Alberto | Dartmouth College |
Co-Chair: Grosu, Radu | TU Wien |
|
11:15-11:20, Paper TuBT10.1 | Add to My Program |
CoPeD--Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments |
|
Zhou, Yang | New York University |
Quang, Long | U.S. DEVCOM Army Research Laboratory |
Nieto-Granda, Carlos | DEVCOM U.S. Army Research Laboratory |
Loianno, Giuseppe | New York University |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Multi-Robot Systems
Abstract: In the past decade, single-robot perception has made significant advancements, yet multi-robot collaborative perception remains largely unexplored. It involves fusing compressed, intermittent, limited, heterogeneous, and asynchronous environmental information across multiple robots to enhance overall perception, despite challenges like sensor noise, occlusions, and sensor failures. One major hurdle has been the lack of real-world datasets. This paper presents a pioneering and comprehensive real-world multi-robot collaborative perception dataset to boost research in this area. Our dataset leverages the untapped potential of air-ground robot collaboration featuring distinct spatial viewpoints, complementary robot mobilities, coverage ranges, and sensor modalities. It features raw sensor inputs, pose estimation, and optional high-level perception annotation, thus accommodating diverse research interests. Compared to existing datasets predominantly designed for Simultaneous Localization and Mapping (SLAM), our setup ensures a diverse range and adequate overlap of sensor views to facilitate the study of multi-robot collaborative perception algorithms. We demonstrate the value of this dataset qualitatively through multiple collaborative perception tasks. We believe this work will unlock the potential research of high-level scene understanding through multi-modal collaborative perception in multi-robot settings.
|
|
11:20-11:25, Paper TuBT10.2 | Add to My Program |
Generalized Synchronized Active Learning for Multi-Agent-Based Data Selection on Mobile Robotic Systems |
|
Schmidt, Sebastian | BMW |
Stappen, Lukas | BMW Group Research and Technology |
Schwinn, Leo | Technical University Munich |
Günnemann, Stephan | Technical University of Munich |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Deep Learning Methods
Abstract: In mobile robotics, perception in uncontrolled environments like autonomous driving is a central hurdle. Existing active learning frameworks can help enhance perception by efficiently selecting data samples for labeling but are often constrained by the necessity of full data availability in data centers, hindering real-time on-field adaptations. To address this, our work unveils a novel active learning formulation optimized for multi-robot settings. It harnesses the collaborative power of several robotic agents, considerably enhancing data acquisition and synchronization processes. Experimental evidence indicates that our approach markedly surpasses traditional active learning frameworks by up to 2.5 percentage points with 90% fewer data uploads, delivering new possibilities for advancements in the realms of mobile robotics and autonomous systems.
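A minimal sketch of the selection step such a framework relies on is shown below: each robot scores locally collected frames by predictive entropy and uploads only the most informative ones under a fixed budget. The scoring rule and budget are illustrative assumptions, not the paper's synchronized multi-agent method.

```python
# Illustrative sketch: entropy-based on-robot data selection under an upload budget.
import numpy as np

def predictive_entropy(probs):
    """probs: (N, C) softmax outputs for N candidate frames."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_upload(probs, budget):
    """Indices of the most uncertain frames, limited to the upload budget."""
    scores = predictive_entropy(probs)
    return np.argsort(scores)[::-1][:budget]

rng = np.random.default_rng(1)
probs = rng.dirichlet(alpha=np.ones(10), size=500)   # fake softmax outputs for 500 frames
upload_idx = select_for_upload(probs, budget=50)     # the other 90% of frames stay on the robot
```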
|
|
11:25-11:30, Paper TuBT10.3 | Add to My Program |
Scenario-Based Curriculum Generation for Multi-Agent Driving |
|
Brunnbauer, Axel | TU Wien |
Berducci, Luigi | TU Wien |
Priller, Peter | AVL List GmbH |
Nickovic, Dejan | AIT Austrian Institute of Technology |
Grosu, Radu | TU Wien |
Keywords: Reinforcement Learning, Intelligent Transportation Systems, Software Tools for Benchmarking and Reproducibility
Abstract: The automated generation of diversified training scenarios has been an important ingredient in many complex learning tasks, especially in real-world application domains such as autonomous driving, where auto-curriculum generation is considered vital for obtaining robust and general policies. However, crafting traffic scenarios with multiple, heterogeneous agents is typically considered a tedious and time-consuming task, especially in more complex simulation environments. To this end, we introduce MATS-Gym, a multi-agent training framework for autonomous driving that uses partial-scenario specifications to generate traffic scenarios with a variable number of agents which are executed in CARLA, a high-fidelity driving simulator. MATS-Gym reconciles scenario execution engines, such as Scenic and ScenarioRunner, with established multi-agent training frameworks where the interaction between the environment and the agents is modeled as a partially observable stochastic game. Furthermore, we integrate MATS-Gym with techniques from unsupervised environment design to automate the generation of adaptive auto-curricula, which is the first application of such algorithms to the domain of autonomous driving. The code is available at https://github.com/AutonomousDrivingExaminer/mats-gym.
|
|
11:30-11:35, Paper TuBT10.4 | Add to My Program |
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation Via Large Language Models |
|
Venkatesh, L.N Vishnunandan | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Learning Categories and Concepts, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: Incorporating language comprehension into robotic operations unlocks significant advancements in robotics, but also presents distinct challenges, particularly in executing spatially oriented tasks like pattern formation. This paper introduces ZeroCAP, a novel system that integrates large language models with multi-robot systems for zero-shot context-aware pattern formation. Grounded in the principles of language-conditioned robotics, ZeroCAP leverages the interpretative power of language models to translate natural language instructions into actionable robotic configurations. This approach combines vision-language models, cutting-edge segmentation techniques, and shape descriptors, enabling the realization of complex, context-driven pattern formations in the realm of multi-robot coordination. Through extensive experiments, we demonstrate the system's proficiency in executing complex context-aware pattern formations across a spectrum of tasks, from surrounding and caging objects to infilling regions. This not only validates the system's capability to interpret and implement intricate context-driven tasks but also underscores its adaptability and effectiveness across varied environments and scenarios. The experimental videos and additional information about this work can be found at https://sites.google.com/view/zerocap/home.
|
|
11:35-11:40, Paper TuBT10.5 | Add to My Program |
Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation |
|
Labiosa, Adam | University of Wisconsin-Madison |
Hanna, Josiah | University of Wisconsin -- Madison |
Keywords: Reinforcement Learning, Cooperating Robots, Machine Learning for Robot Control
Abstract: Teams of people coordinate to perform complex tasks by forming abstract mental models of world and agent dynamics. The use of abstract models contrasts with much recent work in robot learning that uses a high-fidelity simulator and reinforcement learning (RL) to obtain policies for physical robots. Motivated by this difference, we investigate the extent to which so-called abstract simulators can be used for multi-agent reinforcement learning (MARL) and the resulting policies successfully deployed on teams of physical robots. An abstract simulator models the robot's target task at a high level of abstraction and discards many details of the world that could impact optimal decision-making. Policies are trained in an abstract simulator and then transferred to the physical robot by making use of separately obtained low-level perception and motion control modules. We identify three key categories of modifications to the abstract simulator that enable policy transfer to physical robots: simulation fidelity enhancements, training optimizations, and simulation stochasticity. We then run an empirical study with extensive ablations to determine the value of each modification category for enabling policy transfer in cooperative robot soccer tasks. We also compare the performance of policies produced by our method with a well-tuned non-learning-based behavior architecture from the annual RoboCup competition and find that our approach leads to a similar level of performance. Broadly, we show that MARL can be used to train cooperative physical robot behaviors using highly abstract models of the world.
|
|
11:40-11:45, Paper TuBT10.6 | Add to My Program |
Graph-Based Decentralized Task Allocation for Multi-Robot Target Localization |
|
Peng, Juntong | Purdue University |
Viswanath, Hrishikesh | Purdue University |
Bera, Aniket | Purdue University |
Keywords: Machine Learning for Robot Control, Deep Learning Methods, Constrained Motion Planning
Abstract: We introduce a new graph neural operator-based approach for task allocation in a system of heterogeneous robots composed of Unmanned Ground Vehicles (UGVs) and Unmanned Aerial Vehicles (UAVs). The proposed model, GATAR, or Graph Attention Task AllocatoR, aggregates information from neighbors in the multi-robot system, with the aim of achieving globally optimal target localization. Being decentralized, our method is highly robust and adaptable to situations where the number of robots and the number of tasks may change over time. We also propose a heterogeneity-aware preprocessing technique to model the heterogeneity of the system. The experimental results demonstrate the effectiveness and scalability of the proposed approach in a range of simulated scenarios generated by varying the number of UGVs and UAVs and the number and location of the targets. We show that a single model can handle a heterogeneous robot team with the number of robots ranging between 2 and 12, while outperforming the baseline architectures.
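Below is a minimal sketch of attention-weighted neighbor aggregation, the core operation behind graph-attention task allocators of this kind. Dimensions, the scoring nonlinearity, and variable names are assumptions for illustration, not the GATAR architecture.

```python
# Illustrative sketch: decentralized graph-attention aggregation over robot neighbors.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(h, adj, W, a):
    """h: (N, F) robot features, adj: (N, N) adjacency, W: (F, Fp), a: (2*Fp,)."""
    z = h @ W                                           # projected features
    out = np.zeros_like(z)
    for i in range(len(h)):
        nbrs = np.flatnonzero(adj[i])                   # each robot uses only local neighbors
        scores = np.array([a @ np.concatenate([z[i], z[j]]) for j in nbrs])
        scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU, as in standard graph attention
        alpha = softmax(scores)                          # attention weights over neighbors
        out[i] = alpha @ z[nbrs]
    return out

rng = np.random.default_rng(0)
N, F, Fp = 6, 8, 16                                      # e.g. 6 robots (UGVs and UAVs mixed)
h = rng.normal(size=(N, F))
adj = (rng.random((N, N)) > 0.5).astype(float)
np.fill_diagonal(adj, 1.0)
W = rng.normal(size=(F, Fp))
a = rng.normal(size=2 * Fp)
agg_features = attention_aggregate(h, adj, W, a)         # per-robot features for task scoring
```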
|
|
TuBT11 Regular Session, 314 |
Add to My Program |
Human-Robot Interaction 1 |
|
|
Chair: Del Bue, Alessio | Istituto Italiano Di Tecnologia |
Co-Chair: Choi, Sungjoon | Korea University |
|
11:15-11:20, Paper TuBT11.1 | Add to My Program |
Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematic (MASK) |
|
Park, Jeongeun | Korea University |
Jeong, Taemoon | Korea University |
Kim, Hyeonseong | Korea University |
Byun, Taehyun | Korea University |
Shin, Seungyoun | Korea University |
Choi, Keunjun | Rainbow Robotics |
Kwon, Jaewoon | NAVER LABS |
Lee, Taeyoon | Boston Dynamics AI Institute |
Pan, Matthew | Queen's University |
Choi, Sungjoon | Korea University |
Keywords: Social HRI, Gesture, Posture and Facial Expressions, Design and Human Factors
Abstract: This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent’s application to the physical realm, employing robots to provide a more captivating and interactive experience. The proposed system, named the Masquerading Animated Social Kinematic (MASK), leverages an anthropomorphic robot which interacts with guests using non-verbal interactions, including facial expressions and gestures. A behavior generation system based upon a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Throughout the user subject studies, we examined whether the users could recognize the intended character in both personality- and film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.
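To illustrate the kind of structure a finite-state behavior generator provides, here is a toy persona-conditioned state machine. States, events, and action names are invented for the example and are not taken from MASK.

```python
# Toy sketch: finite-state machine whose action library is conditioned on a persona.
import random

class PersonaFSM:
    def __init__(self, persona):
        self.persona = persona              # e.g. "cheerful" or "grumpy" (illustrative)
        self.state = "idle"
        # (state, perceived event) -> next state
        self.transitions = {
            ("idle", "person_detected"): "greeting",
            ("greeting", "person_left"): "idle",
            ("greeting", "person_close"): "engaged",
            ("engaged", "person_left"): "idle",
        }
        # persona-specific non-verbal action library per state
        self.actions = {
            "cheerful": {"greeting": ["wave_big", "smile"], "engaged": ["nod", "lean_in"]},
            "grumpy":   {"greeting": ["glance"],            "engaged": ["shrug"]},
        }

    def step(self, event):
        self.state = self.transitions.get((self.state, event), self.state)
        choices = self.actions[self.persona].get(self.state, ["stay_still"])
        return random.choice(choices)       # behavior command sent to the robot

robot = PersonaFSM("cheerful")
print(robot.step("person_detected"))        # e.g. "wave_big"
```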
|
|
11:20-11:25, Paper TuBT11.2 | Add to My Program |
Simultaneous Dialogue Services Using Multiple Semiautonomous Robots in Multiple Locations by a Single Operator: A Field Trial on Souvenir Recommendation |
|
Sakai, Kazuki | Osaka University |
Kawata, Megumi | Osaka University |
Meneses, Alexis | Osaka University |
Ishiguro, Hiroshi | Osaka University |
Yoshikawa, Yuichiro | Osaka University |
Keywords: Social HRI, Human-Robot Collaboration
Abstract: Recently, systems have emerged enabling a single operator to engage with users across multiple locations simultaneously. However, under such systems, a potential challenge exists where the operator, upon switching locations, may need to join ongoing conversations without a complete understanding of their history. Consequently, a seamless transition and the development of high-quality conversations may be impeded. This study directs its attention to the utilization of multiple robots, aiming to create a semiautonomous teleoperation system. This system enables an operator to switch between twin robots at each location as needed, thereby facilitating the provision of higher-quality dialogue services simultaneously. As an initial phase, a field experiment was conducted to assess user satisfaction with recommendations made by the operator using twin robots. Results collected from 391 participants over 13 days revealed heightened user satisfaction when the operator intervened and provided recommendations through multiple robots compared with autonomous recommendations by the robots. These findings contribute significantly to the formulation of a teleoperation system that allows a single operator to deliver multipoint conversational services.
|
|
11:25-11:30, Paper TuBT11.3 | Add to My Program |
Safety and Naturalness Perceptions of Robot-To-Human Handovers Performed by Data-Driven Robotic Mimicry of Human Givers |
|
Megyeri, Ava | Wright State University |
Wiederhold, Noah | Clarkson University |
Liu, Yu | Clarkson University |
Banerjee, Sean | Wright State University |
Banerjee, Natasha Kholgade | Wright State University |
Keywords: Human-Centered Robotics, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: We study human perceptions of a robot that performs robot-to-human (R2H) handovers controlled to grasp, transport, and transfer 34 objects by mimicking human givers in human-human (H2H) handover data. Recognizing the importance of human-like robotic behavior for successful collaboration, R2H studies use models of human behavior or observations of H2H data to plan robot giver motion. However, R2H studies have been limited in object counts. In this work, we use the Human-Object-Human (HOH) dataset, consisting of H2H interactions performed by 20 giver-receiver pairs with 136 objects, to conduct an R2H study with 34 objects. We teleoperate a Kinova Gen3 manipulator to grip an object as grasped by an HOH human giver, and program it to automatically transport and orient the object to a participant by mimicking the HOH giver's trajectory and transfer pose. We survey participants on safety, naturalness, and preferred choice over linear trajectory and random orientation baselines. We find that transfer pose influences perceptions of naturalness, with HOH poses showing higher naturalness ratings. Participants prefer handovers with HOH end poses when asked to pick their preferred interaction.
|
|
11:30-11:35, Paper TuBT11.4 | Add to My Program |
Integrating Human-Robot Teaming Dynamics into Mission Planning Tools for Transparent Tactics in Multi-Robot Human Integrated Teams |
|
Aldridge, Audrey L. | Mississippi State University |
Errico, Tyler | United States Military Academy |
Morrell, Mitchell | United States Military Academy |
Bethel, Cindy L. | Mississippi State University |
James, John | United States Military Academy |
Chewar, Christa | United States Military Academy |
Novitzky, Michael | United States Military Academy |
Keywords: Human-Robot Teaming, Integrated Planning and Control, Human-Robot Collaboration
Abstract: This research aims to demonstrate how integrating human-robot teaming dynamics into mission planning tools impacts the abilities of robot operators as they coordinate multiple robot agents during a mission. This was investigated in a pilot study using two inter-robot collaboration modalities and interface tools, which required different human-robot interaction techniques to execute a mission with a team of four robots. In the first modality, the operator manually inserted waypoints for each robot, as they acted as individual agents. In the second modality, the operator used the Planning Execution to After-Action Review (PETAAR) toolset to plot a single waypoint for the team of robots, as the robots coordinated their movement as a group. One novel component of this study is the investigation of how human-robot teaming dynamics and the PETAAR toolset impacted robot operators' real-time situation awareness and perceived cognitive load as well as team performance. Although the teaming modalities differed greatly with respect to the level of operator input needed, the time required to complete the simulation, the participant’s perceived cognitive load, and interface usability were very similar for both modalities. In contrast, the results revealed statistically significant differences between the two teaming modalities related to participants’ abilities to maintain a wedge formation while remaining situationally aware. Results from this work will be used to guide development of PETAAR along with the design of future studies investigating more complex teaming scenarios and for creating a baseline for comparing future results.
|
|
11:35-11:40, Paper TuBT11.5 | Add to My Program |
XBG: End-To-End Imitation Learning for Autonomous Behaviour in Human-Robot Interaction and Collaboration |
|
Cardenas Perez, Carlos Andres | Italian Institute of Technology |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Elobaid, Mohamed | Fondazione Istituto Italiano Di Tecnologia |
Dafarra, Stefano | Istituto Italiano Di Tecnologia |
L'Erario, Giuseppe | Istituto Italiano Di Tecnologia |
Traversaro, Silvio | Istituto Italiano Di Tecnologia |
Morerio, Pietro | Istituto Italiano Di Tecnologia |
Del Bue, Alessio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: AI-Enabled Robotics, Imitation Learning, Humanoid Robot Systems
Abstract: This paper presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system for a whole-body autonomous humanoid robot used in real-world Human-Robot Interaction (HRI) scenarios. The main contribution is an architecture for learning HRI behaviours using a data-driven approach. A diverse dataset is collected via teleoperation, covering multiple HRI scenarios, such as handshaking, handwaving, payload reception, walking, and walking with a payload. After synchronizing, filtering, and transforming the data, Deep Neural Networks (DNN) are trained, integrating exteroceptive and proprioceptive information to help the robot understand both its environment and its actions. The robot takes in sequences of images (RGB and depth) and joint state information to react accordingly. By fusing multimodal signals over time, the model enables autonomous capabilities in a robotic platform. The models are evaluated based on the success rates in the mentioned HRI scenarios, and they are deployed on the ergoCub humanoid robot. XBG achieves success rates between 60% and 100% even when tested in unseen environments.
|
|
TuBT12 Regular Session, 315 |
Add to My Program |
Calibration 2 |
|
|
Chair: Yip, Michael C. | University of California, San Diego |
Co-Chair: Hwang, Hyoseok | Kyung Hee University |
|
11:15-11:20, Paper TuBT12.1 | Add to My Program |
CoL3D: Collaborative Learning of Single-View Depth and Camera Intrinsics for Metric 3D Shape Recovery |
|
Zhang, Chenghao | Alibaba Cloud |
Fan, Lubin | Alibaba Cloud |
Cao, Shen | Alibaba Cloud |
Wu, Bojian | Independent Researcher |
Ye, Jieping | Alibaba Cloud |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Calibration and Identification
Abstract: Recovering the metric 3D shape from a single image is particularly relevant for robotics and embodied intelligence applications, where accurate spatial understanding is crucial for navigation and interaction with environments. Usually, the mainstream approaches achieve it through monocular depth estimation. However, without camera intrinsics, the 3D metric shape can not be recovered from depth alone. In this study, we theoretically demonstrate that depth serves as a 3D prior constraint for estimating camera intrinsics and uncover the reciprocal relations between these two elements. Motivated by this, we propose a collaborative learning framework for jointly estimating depth and camera intrinsics, named CoL3D, to learn metric 3D shapes from single images. Specifically, CoL3D adopts a unified network and performs collaborative optimization at three levels: depth, camera intrinsics, and 3D point clouds. For camera intrinsics, we design a canonical incidence field mechanism as a prior that enables the model to learn the residual incident field for enhanced calibration. Additionally, we incorporate a shape similarity measurement loss in the point cloud space, which improves the quality of 3D shapes essential for robotic applications. As a result, when training and testing on a single dataset with in-domain settings, CoL3D delivers outstanding performance in both depth estimation and camera calibration across several indoor and outdoor benchmark datasets, which leads to remarkable 3D shape quality for the perception capabilities of robots.
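The dependence of metric shape on intrinsics can be seen directly in the standard pinhole back-projection, sketched below: converting a depth map into a 3D point cloud requires the focal lengths and principal point, so a wrong calibration rescales the recovered geometry. This is textbook geometry for context, not the CoL3D network itself; the numbers are illustrative.

```python
# Sketch: back-projecting a metric depth map into a point cloud needs intrinsics.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth map -> (H*W, 3) points in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    return np.stack([X, Y, depth], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0)                         # a flat wall 2 m away
cloud = backproject(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
# a wrong focal length rescales X and Y, distorting the recovered 3D shape
wrong = backproject(depth, fx=400.0, fy=400.0, cx=319.5, cy=239.5)
```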
|
|
11:20-11:25, Paper TuBT12.2 | Add to My Program |
Non-Destructive 3D Root Structure Modeling |
|
Lu, Guoyu | University of Georgia |
Keywords: Deep Learning for Visual Perception, Visual Learning, Sensor Fusion
Abstract: Deep neural networks (DNNs) have gained significant attention in 3D object reconstruction. However, detecting and reconstructing hidden or buried objects underground remains a challenging task. Ground Penetrating Radar (GPR) has emerged as a cost-effective and non-destructive technology for subsurface object detection, including soil structures, pipelines, and plant roots. In this study, we present a deep convolutional neural network-based method for detecting target signals and performing curve parameter regression using multiple B-scans from GPR data. By leveraging the detection and regression outcomes, we further generate fitted curves that represent underground structures. To reconstruct a comprehensive and detailed 3D root structure, we design a shape reconstruction network that takes sparse sliced 3D points as input. The proposed approach is extensively trained and validated using synthetic 3D root datasets and simulated GPR data generated with gprMax. Additionally, the trained model demonstrates strong generalization capabilities when applied to real-world GPR data, ensuring its practical applicability.
|
|
11:25-11:30, Paper TuBT12.3 | Add to My Program |
PTZ-Calib: Robust Pan-Tilt-Zoom Camera Calibration |
|
Guo, Jinhui | Alibaba Cloud |
Fan, Lubin | Alibaba Cloud |
Wu, Bojian | Independent Researcher |
Gu, Jiaqi | Alibaba Cloud |
Cao, Shen | Alibaba Cloud |
Ye, Jieping | Alibaba Cloud |
Keywords: Surveillance Robotic Systems, Calibration and Identification, SLAM
Abstract: In this paper, we present PTZ-Calib, a robust two-stage PTZ camera calibration method, that efficiently and accurately estimates camera parameters for arbitrary viewpoints. Our method includes an offline and an online stage. In the offline stage, we first uniformly select a set of reference images that sufficiently overlap to encompass a complete 360° view. We then utilize the novel PTZ-IBA (PTZ Incremental Bundle Adjustment) algorithm to automatically calibrate the cameras within a local coordinate system. Additionally, for practical application, we can further optimize camera parameters and align them with the geographic coordinate system using extra global reference 3D information. In the online stage, we formulate the calibration of any new viewpoints as a relocalization problem. Our approach balances the accuracy and computational efficiency to meet real-world demands. Extensive evaluations demonstrate our robustness and superior performance over state-of-the-art methods on various real and synthetic datasets.
|
|
11:30-11:35, Paper TuBT12.4 | Add to My Program |
CtRNet-X: Camera-To-Robot Pose Estimation in Real-World Conditions Using a Single Camera |
|
Lu, Jingpei | University of California San Diego |
Liang, Zekai | Univeristy of California, San Diego |
Xie, Tristin | University of California San Diego |
Richter, Florian | University of California, San Diego |
Lin, Shan | Arizona State University |
Liu, Sainan | Intel |
Yip, Michael C. | University of California, San Diego |
Keywords: Visual Tracking, Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: Camera-to-robot calibration is crucial for vision-based robot control and requires effort to make it accurate. Recent advancements in markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration. While the existing markerless pose estimation methods have demonstrated impressive accuracy without the need for cumbersome setups, they rely on the assumption that all the robot joints are visible within the camera's field of view. However, in practice robots usually move in and out of view and some portion of the robot may stay out-of-frame during the whole manipulation task due to real-world constraints, leading to a lack of sufficient visual features and subsequent failure of these approaches. To address this challenge and enhance the applicability to vision-based robot control, we propose a novel framework capable of estimating the robot pose with partially visible robot manipulators. Our approach leverages the Vision-Language Models for fine-grained robot components detection, and integrates it into a keypoint-based pose estimation network, which enables more robust performance in varied operational conditions. The framework is evaluated on both public robot datasets and self-collected partial-view datasets to demonstrate our robustness and generalizability. As a result, this method is effective for robot pose estimation in a wider range of real-world manipulation scenarios.
|
|
11:35-11:40, Paper TuBT12.5 | Add to My Program |
Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach |
|
Wu, Tianshu | Peking University |
Zhang, Jiyao | Peking University |
Liang, Sheldon | Carnegie Mellon University, Peking University |
Han, Zhengxiao | Northwestern University |
Dong, Hao | Peking University |
Keywords: Visual Servoing, Calibration and Identification, Visual Tracking
Abstract: Accurate transformation estimation between camera space and robot space is essential. Traditional methods using markers for hand-eye calibration require offline image collection, limiting their suitability for online self-calibration. Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible. This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, characterized by its training-free and cross-end-effector generalization capabilities. Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences derived from CAD models and reference images, enabling 6D pose estimation via the PnP algorithm. To resolve ambiguities from partial observations and symmetry, a multihistorical key frame enhanced pose optimization algorithm is introduced, utilizing temporal information for improved accuracy. Compared to traditional hand-eye calibration, FEEPE enables marker-free online calibration. Unlike robot pose estimation, it generalizes across robots and end-effectors in a training-free manner. Extensive experiments demonstrate its superior flexibility, generalization, and performance.
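The final pose-recovery step described above can be illustrated in isolation: given 2D-3D correspondences (however they were obtained), PnP yields the 6D pose. The sketch below uses synthetic correspondences and does not reproduce the foundation-feature matching stage; intrinsics and points are made up for the example.

```python
# Sketch: recover a 6D pose from 2D-3D correspondences with PnP (OpenCV).
import numpy as np
import cv2

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0,   0.0,   1.0]])
# CAD-frame 3D points (here: corners of a small box, purely illustrative)
obj_pts = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [0, 0, 0.1],
                    [0.1, 0.1, 0], [0.1, 0, 0.1], [0, 0.1, 0.1], [0.1, 0.1, 0.1]],
                   dtype=np.float64)
# synthesize image observations from a known ground-truth pose
R_gt = cv2.Rodrigues(np.array([0.2, -0.1, 0.3]))[0]
t_gt = np.array([[0.05], [0.02], [0.6]])
proj = (K @ (R_gt @ obj_pts.T + t_gt)).T
img_pts = (proj[:, :2] / proj[:, 2:]).astype(np.float64)

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
print(ok, tvec.ravel())   # tvec recovers t_gt up to numerical precision
```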
|
|
11:40-11:45, Paper TuBT12.6 | Add to My Program |
Camera-LiDAR Extrinsic Calibration Using Constrained Optimization with Circle Placement |
|
Kim, Daeho | Kyung Hee University |
Shin, Seunghui | Kyung Hee University |
Hwang, Hyoseok | Kyung Hee University |
Keywords: Calibration and Identification, Sensor Fusion, Intelligent Transportation Systems
Abstract: Monocular camera-LiDAR data fusion has demonstrated remarkable environmental perception capabilities in various fields. The success of data fusion relies on the accurate matching of correspondence features from images and point clouds. In this letter, we propose a target-based camera-LiDAR extrinsic calibration method that matches correspondences in both modalities. Specifically, to extract accurate features from the point cloud, we propose a novel method that estimates the circle centers by optimizing their probability distribution from an initial position. This optimization involves generating the probability distribution of circle centers from circle edge points and using the Lagrangian multiplier method to estimate the optimal positions of the circle centers. We conduct two types of experiments: simulations for quantitative results and real-system evaluations for qualitative assessment. Our method demonstrates a 21% improvement in simulation calibration performance for 20 target poses with LiDAR noise of 0.03 m compared to existing methods, and also shows high visual quality when reprojecting the point cloud onto images in real-world scenarios. Codes are available at https://github.com/AIRLABkhu/SquareCalib.
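A simplified, 2D version of constrained circle-center estimation is sketched below: with the target radius known, the center is found by minimizing the radial residual of noisy edge points. This mirrors the spirit of the constrained optimization described above but is not the paper's exact probabilistic formulation; the noise level and geometry are illustrative.

```python
# Sketch: estimate a circle center from noisy LiDAR edge points with a known radius.
import numpy as np
from scipy.optimize import minimize

def fit_circle_center(edge_pts, radius, c0):
    """edge_pts: (N, 2) points on the circular target boundary; c0: initial guess."""
    def residual(c):
        d = np.linalg.norm(edge_pts - c, axis=1)
        return np.sum((d - radius) ** 2)
    return minimize(residual, c0, method="BFGS").x

rng = np.random.default_rng(0)
true_center, radius = np.array([1.2, -0.4]), 0.15
theta = rng.uniform(0, 2 * np.pi, 120)
pts = true_center + radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
pts += rng.normal(scale=0.03, size=pts.shape)            # range noise of ~0.03 m, as in the evaluation
center = fit_circle_center(pts, radius, c0=pts.mean(axis=0))
```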
|
|
TuBT13 Regular Session, 316 |
Add to My Program |
Assistive Robotics 2 |
|
|
Chair: Gregg, Robert D. | University of Michigan |
Co-Chair: Ha, Sehoon | Georgia Institute of Technology |
|
11:15-11:20, Paper TuBT13.1 | Add to My Program |
A Modeling and Control Strategy for the Gaze-Guided Teleoperation of Robotic Manipulators Via Smart Glasses |
|
Lawson, Andrew | University of North Carolina Wilmington |
Saeidi, Hamed | University of North Carolina Wilmington |
Keywords: Human-Centered Robotics, Human Performance Augmentation, Telerobotics and Teleoperation
Abstract: Object manipulation is a high-frequency task required in assistive robotic systems in order to aid the elderly or those with disabilities that impact motor control. In the instance where arms cannot be used to command a robot, gaze-tracking via smart glasses is a suitable candidate. In this work, we develop a modeling method and model-based filtering and control strategy for direct gaze-guided teleoperation of robotic manipulators. We demonstrate the feasibility of this control strategy in an object manipulation case study with six participants. The results indicate that a model-based gaze filtering and control strategy produces smooth commands for the robot that are easy for the participants to use. These methods can reduce the perceived workload of the user by 37.51% and lower the gripper positioning error by 39.09% compared to using unfiltered gaze data.
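The sketch below shows the kind of filtering such an interface needs: raw gaze fixations are noisy, so the gaze target is low-pass filtered and the commanded gripper velocity is saturated to produce smooth motion. This is a generic, hedged illustration; the gains, limits, and filter structure are assumptions, not the paper's model.

```python
# Sketch: smooth a noisy gaze target and convert it into a rate-limited gripper command.
import numpy as np

def filter_gaze(raw_gaze, state, alpha=0.1):
    """First-order low-pass filter on the 3D gaze target."""
    return (1 - alpha) * state + alpha * raw_gaze

def gripper_command(target, gripper_pos, kp=1.0, v_max=0.05, dt=0.02):
    """Proportional command toward the filtered target, with a velocity limit."""
    v = kp * (target - gripper_pos)
    speed = np.linalg.norm(v)
    if speed > v_max:
        v *= v_max / speed
    return gripper_pos + v * dt

target = np.zeros(3)
gripper = np.array([0.4, 0.0, 0.2])
for _ in range(500):                                      # 10 s at 50 Hz
    raw = np.array([0.6, 0.1, 0.3]) + np.random.normal(scale=0.02, size=3)  # noisy gaze point
    target = filter_gaze(raw, target)
    gripper = gripper_command(target, gripper)
```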
|
|
11:20-11:25, Paper TuBT13.2 | Add to My Program |
Unlocking Potential: Gaze-Based Interfaces in Assistive Robotics for Users with Severe Speech and Motor Impairment |
|
Vishwakarma, Himanshu | Indian Institute of Science |
Mitra, Mukund | IISc Bangalore |
Vinay Krishna Sharma, Vinay Krishna | Indian Institute of Science |
Sulthan, Jabeen | IIT Kanpur |
Atulkar, Aniruddha | Indian Institute of Science Bangalore |
Bhathad, Dinesh | Indiana University |
Biswas, Pradipta | Indian Institute of Science |
Keywords: Virtual Reality and Interfaces, Human-Robot Collaboration, Product Design, Development and Prototyping
Abstract: Individuals with Severe Speech and Motor Impairment (SSMI) struggle to interact with their surroundings due to physical and communicative limitations. To address these challenges, this paper presents a gaze-controlled robotic system that helps SSMI users perform stamp printing tasks. The system includes gaze-controlled interfaces and a robotic arm with a gripper, designed specifically for SSMI users to enhance accessibility and interaction. User studies with gaze-controlled interfaces such as video see-through (VST), video pass-through (VPT), and optical see-through (OST) displays demonstrated the system's effectiveness. Results showed that VST had an average stamping time of 28.45 s (SD = 15.44 s) and an average stamp count of 7.36 (SD = 3.83), outperforming VPT and OST.
|
|
11:25-11:30, Paper TuBT13.3 | Add to My Program |
Do Looks Matter? Exploring Functional and Aesthetic Design Preferences for a Robotic Guide Dog |
|
Cohav, Aviv | Georgia Institute of Technology |
Gong, Xinran | Georgia Institute of Technology |
Kim, Joanne Taery | Georgia Institute of Technology |
Zeagler, Clint | Georgia Tech |
Ha, Sehoon | Georgia Institute of Technology |
Walker, Bruce | Georgia Tech |
Keywords: Design and Human Factors, Human-Centered Robotics, Physically Assistive Devices
Abstract: Dog guides offer an effective mobility solution for blind or visually impaired (BVI) individuals, but conventional dog guides have limitations including the need for care, potential distractions, societal prejudice, high costs, and limited availability. To address these challenges, we seek to develop a robot dog guide capable of performing the tasks of a conventional dog guide, enhanced with additional features. In this work, we focus on design research to identify functional and aesthetic design concepts to implement into a quadrupedal robot. The aesthetic design remains relevant even for BVI users due to their sensitivity toward societal perceptions and the need for smooth integration into society. We collected data through interviews and surveys to answer specific design questions pertaining to the appearance, texture, features, and method of controlling and communicating with the robot. Our study identified essential and preferred features for a future robot dog guide, which are supported by relevant statistics aligning with each suggestion. These findings will inform the future development of user-centered designs to effectively meet the needs of BVI individuals.
|
|
11:30-11:35, Paper TuBT13.4 | Add to My Program |
Comparison of Three Interface Approaches for Gaze Control of Assistive Robots for Individuals with Tetraplegia |
|
Nunez Sardinha, Emanuel | Bristol Robotics Lab, University of the West of England |
Zook, Nancy | University of the West of England |
Ruiz Garate, Virginia | University of Mondragon |
Western, David | University of Bristol |
Munera, Marcela | University of West England |
Keywords: Physically Assistive Devices, Grasping, Telerobotics and Teleoperation
Abstract: Individuals with tetraplegia have their independence and quality of life severely affected. Assistive robotic arms can enhance their autonomy, but effective control interfaces are essential for optimizing their usability and performance. This study aims to evaluate the performance and user experience of three control interfaces for an assistive robotic arm: a Graphical User Interface (GUI), an Embedded Interface, and Directional Gaze. Thirty-three able-bodied participants were recruited to control an assistive robotic arm through the three different interfaces in a between-subjects experiment. Performance was measured using the Yale-CMU-Berkeley (YCB) Block Pick and Place Protocol. Usability (SUS) and task workload (NASA-TLX) were measured through subjective questionnaires. Additionally, we report saccades per minute and fixation duration. The results revealed statistically significant differences showing that the Embedded and GUI interfaces, when compared to the Directional Gaze interface, can lead to lower workloads and higher performance in pick-up tasks.
|
|
11:35-11:40, Paper TuBT13.5 | Add to My Program |
A Laser-Guided Interaction Interface for Providing Effective Robot Assistance to People with Upper Limbs Impairments |
|
Torielli, Davide | Humanoids and Human Centered Mechatronics (HHCM), Istituto Italiano Di Tecnologia |
Bertoni, Liana | Italian Institute of Technology |
Muratore, Luca | Istituto Italiano Di Tecnologia |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Keywords: Physically Assistive Devices, Human-Robot Collaboration, Visual Servoing
Abstract: Robotics has shown significant potential in assisting people with disabilities to enhance their independence and involvement in daily activities. Indeed, a long-term societal impact is expected in home-care assistance with the deployment of intelligent robotic interfaces. This work presents a human-robot interface developed to help people with upper limb impairments, such as those affected by stroke injuries, in activities of everyday life. The proposed interface leverages a visual servoing guidance component, which utilizes an inexpensive but effective laser emitter device. By projecting the laser on a surface within the workspace of the robot, the user is able to guide the robotic manipulator to desired locations in order to reach, grasp, and manipulate objects. Considering the targeted users, the laser emitter is worn on the head, enabling the user to intuitively control the robot motions with head movements that point the laser in the environment, whose projection is detected with a neural-network-based perception module. The interface implements two control modalities: the first allows the user to select specific locations directly, commanding the robot to reach those points; the second employs a paper keyboard with buttons that can be virtually pressed by pointing the laser at them. These buttons enable more direct control of the Cartesian velocity of the end-effector and provide additional functionalities such as commanding the action of the gripper. The proposed interface is evaluated in a series of manipulation tasks involving a 6-DOF assistive robot manipulator equipped with a 1-DOF beak-like gripper. The two interface modalities are combined to successfully accomplish tasks requiring bimanual capacity, which is usually affected in people with upper limb impairments.
|
|
TuBT14 Regular Session, 402 |
Add to My Program |
Wearable Robotics 1 |
|
|
Chair: Masia, Lorenzo | Technische Universität München (TUM) |
Co-Chair: Zhang, Haohan | University of Utah |
|
11:15-11:20, Paper TuBT14.1 | Add to My Program |
Gravity Compensation Method for Whole Body-Mounted Robot with Contact Force Distribution Sensor |
|
Masaoka, Shinichi | Nagoya University |
Funabora, Yuki | Nagoya University |
Doki, Shinji | Nagoya University |
Keywords: Wearable Robotics, Force Control, Physically Assistive Devices
Abstract: The emergence of sheet-type force distribution sensors has allowed direct measurement of contact force. We developed a wearable assistive robot that can directly measure contact force and investigated the gravity compensation effect of contact-force-based control. For conventional robots that do not measure the force acting between the robot and the human body (contact force) directly, a precise robot model is required for gravity compensation, which is difficult to implement in software. In the first experiment, we examined a method of gravity compensation using only joint sensors in torque-based control, which is a common conventional method, and assessed the difficulty of this method. In the next experiment, which involved one healthy subject, we confirmed that contact-force-based control has a significant gravity compensation effect without requiring a rigorous robot model. Experiments with two additional healthy subjects using the same parameters revealed that even rough parameter tuning can produce a gravity compensation effect. This study not only proposes a simplified gravity compensator for wearable assistive robots but also demonstrates the robustness of parameter tuning in contact-force-based control under static conditions. Based on the findings of this study, we will further study the possibility of other kinds of disturbance compensation and dynamic conditions in the future.
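The contrast drawn above can be sketched with a toy single-joint example: model-based compensation needs the limb mass and geometry, while contact-force-based compensation only servos the measured contact force toward zero. The 1-DoF limb model, gains, and numbers below are illustrative assumptions, not the robot or controller in the paper.

```python
# Toy sketch: model-based vs. contact-force-based gravity compensation on one joint.
import numpy as np

def model_based_tau(q, mass_est, com_len_est, g=9.81):
    """Needs an (error-prone) estimate of limb mass and center-of-mass length."""
    return mass_est * g * com_len_est * np.cos(q)

def contact_force_tau(tau_prev, f_contact_sum, ki=0.5, dt=0.002):
    """Integral law on the summed sheet-sensor force: drive the residual load to zero."""
    return tau_prev + ki * f_contact_sum * dt

# simulate one elbow-like joint: true gravity load vs. what each controller applies
mass_true, com_true, q = 1.8, 0.17, np.deg2rad(30)
tau_gravity = mass_true * 9.81 * com_true * np.cos(q)
tau_model = model_based_tau(q, mass_est=1.5, com_len_est=0.20)   # deliberately wrong model estimates
tau_cf = 0.0
for _ in range(5000):
    f_residual = (tau_gravity - tau_cf) / com_true               # unsupported load felt at the sensor
    tau_cf = contact_force_tau(tau_cf, f_residual)
# tau_cf converges to tau_gravity without knowing mass_true or com_true
```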
|
|
11:20-11:25, Paper TuBT14.2 | Add to My Program |
Unsupervised Domain Adaptation for Gait State Estimation |
|
Medrano, Roberto | University of Michigan |
Thomas, Gray | Texas A&M University |
Rouse, Elliott | University of Michigan |
Keywords: Prosthetics and Exoskeletons, Machine Learning for Robot Control, Sensor-based Control
Abstract: Exoskeleton controllers have recently employed machine learning (ML) techniques to provide appropriate assistance throughout the terrains of the real world. One successful approach has been to learn a mapping between an exoskeleton wearer's kinematic measurements and a gait state vector that encodes how the wearer is currently walking (i.e. gait phase, speed), and then dynamically update the assistance based on the gait state. However, these methods require paired datasets of input kinematics to output gait states, which usually involves manual, time-consuming labeling of data from participants wearing specific exoskeletons and thus limits the scalability of these ML methods. A prior solution to this challenge---leveraging large pre-labeled datasets of normative human walking---introduces another problem, in that networks trained on these datasets learn only normative locomotion patterns, and thus may deteriorate when the data are changed by wearing the exoskeleton itself. In this context, we present an unsupervised-learning-based approach to both bypass the requirement of labeled data for gait state prediction and address the difficulty of domain adaptation from normative to exoskeleton-assisted walking. We validate our method in a set of walking simulations that featured exoskeleton data from 14 participants. This model showed significant improvements in state estimation relative to a model trained solely on pre-labeled normative walking, while also not requiring ground truth labels. This work presents a foundation that demonstrates labeled, device-specific data may not be required for predicting walking behavior in real time.
|
|
11:25-11:30, Paper TuBT14.3 | Add to My Program |
Anti-Sensing: Defense against Unauthorized Radar-Based Human Vital Sign Sensing with Physically Realizable Wearable Oscillators |
|
Tasnim Oshim, Md Farhan | University of Massachusetts Amherst |
Doering, Nigel | University of California San Diego |
Islam, Bashima | Worcester Polytechnic Institute |
Weng, Tsui-Wei | UCSD |
Rahman, Tauhidur | University of California San Diego |
Keywords: Wearable Robotics, Physically Assistive Devices, Human-Centered Robotics
Abstract: Recent advancements in Ultra-Wideband (UWB) radar technology have enabled contactless, non-line-of-sight vital sign monitoring, making it a valuable tool for healthcare. However, UWB radar's ability to capture sensitive physiological data, even through walls, raises significant privacy concerns, particularly in human-robot interactions and autonomous systems that rely on radar for sensing human presence and physiological functions. In this paper, we present Anti-Sensing, a novel defense mechanism designed to prevent unauthorized radar-based sensing. Our approach introduces physically realizable perturbations, such as oscillatory motion from wearable devices, to disrupt radar sensing by mimicking natural cardiac motion, thereby misleading heart rate (HR) estimations. We develop a gradient-based algorithm to optimize the frequency and spatial amplitude of these oscillations for maximal disruption while ensuring physiological plausibility. Through both simulations and real-world experiments with radar data and neural network-based HR sensing models, we demonstrate the effectiveness of Anti-Sensing in significantly degrading model accuracy, offering a practical solution for privacy preservation.
|
|
11:30-11:35, Paper TuBT14.4 | Add to My Program |
Vision-Based Fuzzy Control System with Intention Detection for Smart Walkers: Enhancing Usability for Stroke Survivors with Unilateral Upper Limb Impairments |
|
Abdollah Chalaki, Mahdi | University of Alberta |
Zakerimanesh, Amir | University of Alberta |
Soleymani, Abed | University of Alberta |
Mushahwar, Vivian K. | University of Alberta |
Tavakoli, Mahdi | University of Alberta |
Keywords: Rehabilitation Robotics, Medical Robots and Systems, Physical Human-Robot Interaction
Abstract: Mobility impairments, particularly those caused by stroke-induced hemiparesis, significantly impact independence and quality of life. Current smart walker controllers operate by using input forces from the user to control linear motion and input torques to dictate rotational movement; however, because they predominantly rely on user-applied torque exerted on the device handle as an indicator of user intent to turn, they fail to adequately accommodate users with unilateral upper limb impairments. This leads to increased physical strain and cognitive load. This paper introduces a novel smart walker equipped with a fuzzy control algorithm that leverages shoulder abduction angles to intuitively interpret user intentions using just one functional hand. By integrating a force sensor and stereo camera, the system enhances walker responsiveness, usability, and safety. Experimental evaluations with five participants demonstrated that the fuzzy controller significantly reduced wrist torque and improved user comfort compared to traditional admittance controllers. Results confirmed a strong correlation between shoulder abduction angles and directional intent, with users reporting decreased effort and enhanced ease of use. This study contributes to assistive robotics by providing an adaptable control mechanism for smart walkers, suggesting a pathway towards enhancing mobility and independence for individuals with mobility impairments. Project page: https://tbs-ualberta.github.io/fuzzy-sw/
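As a rough illustration of how a shoulder abduction angle could be mapped to a turning command through fuzzy rules, the sketch below implements a generic Mamdani-style rule base with weighted-average defuzzification; the membership ranges and output centroids are made-up values, not the calibrated rules from the paper.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_turn_rate(abduction_deg):
    """Map a shoulder abduction angle (deg) to a turn-rate command (rad/s).

    Membership ranges and output centroids are illustrative guesses, not the
    values used in the paper.
    """
    mu = {
        "neutral": tri(abduction_deg, -10, 0, 10),
        "small":   tri(abduction_deg, 5, 20, 35),
        "large":   tri(abduction_deg, 25, 50, 90),
    }
    centroids = {"neutral": 0.0, "small": 0.3, "large": 0.8}  # rad/s
    num = sum(mu[k] * centroids[k] for k in mu)
    den = sum(mu.values()) + 1e-9
    return num / den  # weighted-average defuzzification

print(fuzzy_turn_rate(30.0))
```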
|
|
11:35-11:40, Paper TuBT14.5 | Add to My Program |
A Lower Limb Wearable Exosuit for Improved Sitting, Standing, and Walking Efficiency |
|
Zhang, Xiaohui | Heidelberg University |
Tricomi, Enrica | Heidelberg University |
Ma, Xunju | Beijing Institute of Technology |
Gomez-Correa, Manuela | Instituto Politecnico Nacional |
Ciaramella, Alessandro | Università Di Pisa |
Missiroli, Francesco | Heidelberg University |
Miskovic, Luka | Jožef Stefan Institute |
Su, Huimin | Heidelberg University |
Masia, Lorenzo | Technische Universität München (TUM) |
Keywords: Wearable Robots, Modeling, Control, and Learning for Soft Robots, Human Performance Augmentation, Adaptive Lower Limb Assistance Control
Abstract: Sitting, standing, and walking are fundamental activities crucial for maintaining independence in daily life. However, aging or lower limb injuries can impede these activities, posing obstacles to individuals' autonomy. In response to this challenge, we developed the LM-Ease, a compact and soft wearable robot designed to provide hip assistance. Its purpose is to aid users in carrying out essential daily activities such as sitting, standing, and walking. The LM-Ease features a fully-actuated tendon-driven system that seamlessly transitions between assistance actuation profiles tailored for sitting, standing, and walking movements. This device provides the user with gravity support during stand-to-sit, and offers a hip-extension assistive pulling force during sit-to-stand and walking. Our preliminary results show that with the LM-Ease, healthy young adults (n = 8) had significantly lower muscle activation: an average reduction of 15.6% during stand-to-sit and 17.8% during sit-to-stand. Furthermore, with LM-Ease, participants demonstrated a 12.7% reduction in metabolic cost during ground walking.
|
|
11:40-11:45, Paper TuBT14.6 | Add to My Program |
Towards Shape-Adaptive Attachment Design for Wearable Devices Using Granular Jamming |
|
Brignone, Joseph | University of Utah |
Lancaster, Logan | University of Utah |
Battaglia, Edoardo | University of Utah |
Zhang, Haohan | University of Utah |
Keywords: Wearable Robotics, Soft Robot Applications
Abstract: Attaching a wearable device to the user's body comfortably and functionally while accommodating differences and changes in body shape is often a challenge. In this paper, we propose an approach that addresses this problem through granular jamming, where a granule-filled membrane stiffens by rapidly decreasing the internal air pressure (e.g., vacuum), causing the granule material to be jammed together due to friction. In the soft state, this structure conforms to complex shapes of the human body, while jamming the granules via vacuum switches it to the rigid state required for proper robot function. We performed an experiment to systematically investigate the effect of multiple design parameters on the ability of such jamming-based interfaces to hold against a lateral force. Specifically, we developed a bench prototype where modular granular-jamming structures are attached to objects of different sizes and shapes via a downward suspension force. Our data showed that jamming is necessary, increasing the overall structural stability by 1.73 to 2.16 N. Furthermore, three modules, a high suspension force, and a low membrane infill (~25%) also contribute to high resistance to lateral force. Our results lay a foundation for future implementation of wearable attachments using granular-jamming structures.
|
|
TuBT15 Regular Session, 403 |
Add to My Program |
Robot Mapping 1 |
|
|
Chair: Mangelson, Joshua | Brigham Young University |
Co-Chair: Henderson, Thomas C. | University of Utah |
|
11:15-11:20, Paper TuBT15.1 | Add to My Program |
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video |
|
Zhou, Zhen | Chinese Academy of Sciences Institute of Automation |
Ma, Yunkai | Institute of Automation, Chinese Academy of Sciences |
Fan, Junfeng | Institute of Automation, Chinese Academy of Sciences |
Zhang, Shaolin | Institute of Automation, Chinese Academy of Sciences |
Jing, Fengshui | Institute of Automation, CAS |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Keywords: Mapping, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To address this issue, we propose a lightweight module to directly estimate scene depth priors in a 3D volume for reconstruction quality improvement by generating occupancy probabilities of all voxels. In addition, compared with existing panoptic segmentation methods, EPRecon extracts panoptic features from both voxel features and corresponding image features, obtaining more detailed and comprehensive instance-level semantic information and achieving more accurate segmentation results. Experimental results on the ScanNetV2 dataset demonstrate the superiority of EPRecon over current state-of-the-art methods in terms of both panoptic 3D reconstruction quality and real-time inference. Code is available at https://github.com/zhen6618/EPRecon.
|
|
11:20-11:25, Paper TuBT15.2 | Add to My Program |
Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems |
|
Liu, Jianheng | The University of Hong Kong |
Zheng, Chunran | The University of Hong Kong |
Wan, YunFei | The University of Hong Kong |
Wang, Bowen | University of Hong Kong |
Cai, Yixi | KTH Royal Institute of Technology |
Zhang, Fu | University of Hong Kong |
Keywords: Mapping, Visual Learning, RGB-D Perception
Abstract: This paper presents a unified surface reconstruction and rendering framework for LiDAR-visual systems, integrating Neural Radiance Fields (NeRF) and Neural Distance Fields (NDF) to recover both appearance and structural information from posed images and point clouds. We address the structural visible gap between NeRF and NDF by utilizing a visible-aware occupancy map to classify space into the free, occupied, visible unknown, and background regions. This classification facilitates the recovery of a complete appearance and structure of the scene. We unify the training of the NDF and NeRF using a spatially-varying-scale SDF-to-density transformation for levels of detail for both structure and appearance. The proposed method leverages the learned NDF for structure-aware NeRF training by an adaptive sphere tracing sampling strategy for accurate structure rendering. In return, NeRF further helps the NDF recover missing or fuzzy structures. Extensive experiments demonstrate the superior quality and versatility of the proposed method across various scenarios. To benefit the community, the codes will be released at https://github.com/hku-mars/M2Mapping.
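The spatially-varying-scale SDF-to-density transformation mentioned above is in the spirit of common SDF-based volume rendering conversions; below is a minimal sketch using an assumed Laplace-CDF form (as in VolSDF) with a per-point scale, which may differ from the exact transformation used in the paper.

```python
import torch

def sdf_to_density(sdf, scale):
    """Convert signed distance to volume density via a Laplace CDF (VolSDF-style).

    sdf:   (N,) signed distances queried from the neural distance field
    scale: (N,) spatially-varying sharpness; smaller values concentrate density
           around the zero level set. The per-point scale is an illustrative
           reading of the paper's level-of-detail idea.
    """
    alpha = 1.0 / scale
    density = torch.where(
        sdf >= 0,
        0.5 * torch.exp(-sdf / scale),
        1.0 - 0.5 * torch.exp(sdf / scale),
    ) * alpha
    return density

sdf = torch.linspace(-0.1, 0.1, 5)
print(sdf_to_density(sdf, torch.full_like(sdf, 0.02)))
```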
|
|
11:25-11:30, Paper TuBT15.3 | Add to My Program |
LVBA: LiDAR-Visual Bundle Adjustment for RGB Point Cloud Mapping |
|
Li, Rundong | University of Hong Kong |
Liu, Xiyuan | The University of Hong Kong |
Li, Haotian | The University of Hong Kong |
Liu, Zheng | University of Hong Kong |
Lin, Jiarong | The University of Hong Kong |
Cai, Yixi | KTH Royal Institute of Technology |
Zhang, Fu | University of Hong Kong |
Keywords: Mapping, SLAM
Abstract: Point cloud maps with accurate color are crucial in robotics and mapping applications. Existing approaches for producing RGB-colorized maps are primarily based on real-time localization using filter-based estimation or sliding window optimization, which may lack accuracy and global consistency. In this work, we introduce a novel global LiDAR-Visual bundle adjustment (BA) named LVBA to improve the quality of RGB point cloud mapping beyond existing baselines. LVBA first optimizes LiDAR poses via a global LiDAR BA, followed by a photometric visual BA incorporating planar features from the LiDAR point cloud for camera pose optimization. Additionally, to address the challenge of map point occlusions in constructing optimization problems, we implement a novel LiDAR-assisted global visibility algorithm in LVBA. To evaluate the effectiveness of LVBA, we conducted extensive experiments by comparing its mapping quality against existing state-of-the-art baselines (i.e., R3LIVE and FAST-LIVO). Our results prove that LVBA can proficiently reconstruct high-fidelity, accurate RGB point cloud maps, outperforming its counterparts.
|
|
11:30-11:35, Paper TuBT15.4 | Add to My Program |
LiDAR-Enhanced 3D Gaussian Splatting Mapping |
|
Shen, Jian | WuHan University |
Yu, Huai | Wuhan University |
Wu, Ji | Wuhan University |
Yang, Wen | Wuhan University |
Xia, Gui-Song | Wuhan University |
Keywords: Mapping, SLAM
Abstract: This paper introduces LiGSM, a novel LiDAR-enhanced 3D Gaussian Splatting (3DGS) mapping framework that improves the accuracy and robustness of 3D scene mapping by integrating LiDAR data. LiGSM constructs a joint loss from images and LiDAR point clouds to estimate the poses and optimize their extrinsic parameters, enabling dynamic adaptation to variations in sensor alignment. Furthermore, it leverages LiDAR point clouds to initialize 3DGS, providing denser and more reliable starting points compared to sparse SfM points. In scene rendering, the framework augments standard image-based supervision with depth maps generated from LiDAR projections, ensuring an accurate scene representation in both geometry and photometry. Experiments on public and self-collected datasets demonstrate that LiGSM outperforms comparative methods in pose tracking and scene rendering.
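A minimal sketch of the kind of joint supervision described above (a photometric term plus a LiDAR-projected depth term) is given below; the plain L1 terms, the weighting, and the zero-as-invalid depth convention are illustrative assumptions rather than the paper's exact loss.

```python
import torch

def joint_rgb_depth_loss(rendered_rgb, gt_rgb, rendered_depth, lidar_depth, lam=0.5):
    """Joint image + LiDAR-depth supervision in the spirit of the abstract.

    lidar_depth is a depth map obtained by projecting the LiDAR point cloud into the
    camera; zeros mark pixels without a LiDAR return. The weight `lam` and the use of
    plain L1 terms are illustrative assumptions.
    """
    photo = (rendered_rgb - gt_rgb).abs().mean()
    valid = lidar_depth > 0
    depth = (rendered_depth[valid] - lidar_depth[valid]).abs().mean()
    return photo + lam * depth
```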
|
|
11:35-11:40, Paper TuBT15.5 | Add to My Program |
Depth-Visual-Inertial (DVI) Mapping System for Robust Indoor 3D Reconstruction |
|
Hamesse, Charles | Royal Military Academy |
Vlaminck, Michiel | Ghent University |
Luong, Hiep | Ghent University |
Haelterman, Rob | Royal Military Academy |
Keywords: RGB-D Perception, Mapping, Search and Rescue Robots
Abstract: We propose the Depth-Visual-Inertial (DVI) Mapper: a robust multi-sensor fusion framework for dense 3D mapping using time-of-flight cameras equipped with RGB and IMU sensors. Inspired by recent developments in real-time LiDAR-based odometry and mapping, our system uses an error-state iterative Kalman filter for state estimation: it processes the inertial sensor's data for state propagation, followed by a state update first using visual-inertial odometry, then depth-based odometry. This sensor fusion scheme makes our system robust to degenerate scenarios (e.g. lack of visual or geometrical features, fast rotations) and to noisy sensor data, like those that can be obtained with off-the-shelf time-of-flight DVI sensors. For evaluation, we propose the new Bunker DVI Dataset, featuring data from multiple DVI sensors recorded in challenging conditions reflecting search-and-rescue operations. We show the superior robustness and precision of our method against previous work. Following the open science principle, we make both our source code and dataset publicly available.
|
|
11:40-11:45, Paper TuBT15.6 | Add to My Program |
MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors |
|
Du, Zhenhua | National University of Defense Technology |
Xu, Binbin | University of Toronto |
Zhang, Haoyu | National University of Defense Technology |
Huo, Kai | National University of Defense Technology |
Zhi, Shuaifeng | National University of Defense Technology |
Keywords: Semantic Scene Understanding, Representation Learning, Deep Learning for Visual Perception
Abstract: Accurately reconstructing dense and semantically annotated 3D meshes from monocular images remains a challenging task due to the lack of geometry guidance and imperfect view-dependent 2D priors. Though we have witnessed recent advancements in implicit neural scene representations enabling precise 2D rendering simply from multi-view images, there have been few works addressing 3D scene understanding with monocular priors alone. In this paper, we propose MOSE, a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D, producing accurate semantics and geometry in both 3D and 2D space. The key motivation for our method is to leverage generic class-agnostic segment masks as guidance to promote local consistency of rendered semantics during training. With the help of semantics, we further apply a smoothness regularization to texture-less regions for better geometric quality, thus achieving mutual benefits of geometry and semantics. Experiments on the ScanNet dataset show that our MOSE outperforms relevant baselines across all metrics on tasks of 3D semantic segmentation, 2D semantic segmentation and 3D surface reconstruction.
|
|
TuBT16 Regular Session, 404 |
Add to My Program |
Manipulation 2 |
|
|
Chair: Zhou, Jianshu | University of California, Berkeley |
Co-Chair: Tiomkin, Stas | Texas Tech University |
|
11:15-11:20, Paper TuBT16.1 | Add to My Program |
Pushing in the Dark: A Reactive Pushing Strategy for Mobile Robots Using Tactile Feedback |
|
Ozdamar, Idil | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Sirintuna, Doganay | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Arbaud, Robin | HRI2 Lab., Istituto Italiano Di Tecnologia ; Dept. of Informatic |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Mobile Manipulation, Manipulation Planning
Abstract: For mobile robots, navigating cluttered or dynamic environments often necessitates non-prehensile manipulation, particularly when faced with objects that are too large, irregular, or fragile to grasp. The unpredictable behavior and varying physical properties of these objects significantly complicate manipulation tasks. To address this challenge, this manuscript proposes a novel Reactive Pushing Strategy. This strategy allows a mobile robot to dynamically adjust its base movements in real-time to achieve successful pushing maneuvers towards a target location. Notably, our strategy adapts the robot's motion based on changes in contact location obtained through the tactile sensor covering the base, avoiding dependence on object-related assumptions or a model of the object's behavior. The effectiveness of the Reactive Pushing Strategy was initially evaluated in the simulation environment, where it significantly outperformed the compared baseline approaches. Following this, we validated the proposed strategy through real-world experiments, demonstrating the robot's capability to push objects to target points located anywhere in its vicinity. In both simulation and real-world experiments, the object-specific properties (shape, mass, friction, inertia) were altered along with the changes in target locations to assess the robustness of the proposed method comprehensively.
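To illustrate the general idea of adapting base motion from tactile contact location, here is a toy reactive law; the structure (push toward the target, rotate to keep the contact aligned with the pushing direction) and the gains are guesses for illustration, not the controller proposed in the paper.

```python
import numpy as np

def reactive_push_cmd(contact_angle, target_dir, v_push=0.15, k_align=1.0):
    """Toy reactive pushing law driven by tactile contact location.

    contact_angle: bearing (rad) of the contact point on the base, robot frame
    target_dir:    bearing (rad) of the target location, robot frame
    Returns (vx, vy, wz). This is an illustrative guess at a reactive strategy,
    not the method proposed in the paper.
    """
    # Translate so the object is pushed toward the target.
    vx = v_push * np.cos(target_dir)
    vy = v_push * np.sin(target_dir)
    # Rotate the base so the contact location drifts toward the pushing direction.
    err = np.arctan2(np.sin(target_dir - contact_angle),
                     np.cos(target_dir - contact_angle))
    wz = k_align * err
    return vx, vy, wz
```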
|
|
11:20-11:25, Paper TuBT16.2 | Add to My Program |
Foresee and Act Ahead: Task Prediction and Pre-Scheduling Enabled Efficient Robotic Warehousing |
|
Cao, Bo | MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai J |
Liu, Zhe | Shanghai Jiao Tong University |
Han, Xingyao | Shanghai Jiao Tong University |
Zhou, Shunbo | Huawei |
Zhang, Heng | Huawei |
Han, Lijun | Shanghai Jiao Tong University |
Wang, Lin | Shanghai Jiao Tong University |
Wang, Hesheng | Shanghai Jiao Tong University |
Keywords: Manipulation Planning, Intelligent Transportation Systems
Abstract: In warehousing systems, to enhance efficiency amid surging demand volumes, much attention has been placed on how to reasonably allocate delivery tasks to robots. However, robot labor is still inevitably wasted to some extent. In this paper, we propose a pre-scheduling enhanced warehousing framework aiming to foresee and act in advance, which consists of task flow prediction and hybrid task allocation. For task prediction, we design the spatio-temporal representations of the task flow and introduce a periodicity-decoupled mechanism tailored for the generation patterns of aggregated orders, and then further extract spatial features of task distribution with a novel combination of graph structures. In hybrid task allocation, we consider the known tasks and predicted future tasks simultaneously to optimize the task allocation. In addition, we consider factors such as predicted task uncertainty and sector-level efficiency to realize more balanced and rational allocations. We validate our task prediction model across datasets derived from factories, achieving SOTA performance. Furthermore, we implement our system in a real-world robotic warehouse, demonstrating more than 30% improvements in efficiency.
|
|
11:25-11:30, Paper TuBT16.3 | Add to My Program |
Embodiment-Agnostic Action Planning Via Object-Part Scene Flow |
|
Tang, Weiliang | The Chinese University of Hong Kong |
Pan, Jia-Hui | The Chinese University of Hong Kong |
Zhan, Wei | Univeristy of California, Berkeley |
Zhou, Jianshu | University of California, Berkeley |
Yao, Huaxiu | UNC-Chapel Hill |
Liu, Yunhui | Chinese University of Hong Kong |
Tomizuka, Masayoshi | University of California |
Ding, Mingyu | UC Berkeley |
Fu, Chi-Wing | The Chinese University of Hong Kong |
Keywords: AI-Based Methods, Deep Learning in Grasping and Manipulation, Manipulation Planning
Abstract: Observing that the key for robotic action planning is to understand the target-object motion when its associated part is manipulated by the end effector, we propose to generate the 3D object-part scene flow and extract its transformations to solve the action trajectories for diverse embodiments. The advantage of our approach is that it derives the robot action explicitly from object motion prediction, yielding a more robust policy by understanding the object motions. Also, beyond policies trained on embodiment-centric data, our method is embodiment-agnostic, generalizable across diverse embodiments, and able to learn from human demonstrations. Our method comprises three components: an object-part predictor to locate the part for the end effector to manipulate, an RGBD video generator to predict future RGBD videos, and a trajectory planner to extract embodiment-agnostic transformation sequences and solve the trajectory for diverse embodiments. Trained on videos even without trajectory data, our method still outperforms existing works significantly by 27.7% and 26.2% on the prevailing virtual environments MetaWorld and Franka-Kitchen, respectively. Furthermore, we conducted real-world experiments, showing that our policy, trained only with human demonstrations, can be deployed to various embodiments.
|
|
11:30-11:35, Paper TuBT16.4 | Add to My Program |
Acoustic Wave Manipulation through Sparse Robotic Actuation |
|
Shah, Tristan | San Jose State University |
Smilovich, Noam | San Jose State University |
Amirkulova, Feruza | San Jose State University |
Gerges, Samer | San Jose State University |
Tiomkin, Stas | Texas Tech University |
Keywords: Manipulation Planning, Model Learning for Control
Abstract: Recent advancements in robotics, control, and machine learning have facilitated progress in the challenging area of object manipulation. These advancements include, among others, the use of deep neural networks to represent dynamics that are partially observed by robot sensors, as well as effective control using sparse control signals. In this work, we explore a more general problem: the manipulation of acoustic waves, which are partially observed by a robot capable of influencing the waves through spatially sparse actuators. This problem holds great potential for the design of new artificial materials, ultrasonic cutting tools, energy harvesting, and other applications. We develop an efficient data-driven method for robot learning that is applicable to either focusing scattered acoustic energy in a designated region or suppressing it, depending on the desired task. The proposed method is better in terms of solution quality and computational complexity compared to a state-of-the-art learning-based method for manipulation of dynamical systems governed by partial differential equations. Furthermore, our proposed method is competitive with a classical semi-analytical method in acoustics research on the demonstrated tasks. We have made the project code publicly available, along with a web page featuring video demonstrations: https://gladisor.github.io/waves/
|
|
11:35-11:40, Paper TuBT16.5 | Add to My Program |
Integrating Model-Based Control and RL for Sim2Real Transfer of Tight Insertion Policies |
|
Marougkas, Isidoros | Rutgers University |
Metha Ramesh, Dhruv | Rutgers University |
Doerr, Joe | Rutgers University |
Granados, Edgar | Rutgers University |
Sivaramakrishnan, Aravind | Amazon Fulfillment Technology & Robotics |
Boularias, Abdeslam | Rutgers University |
Bekris, Kostas E. | Rutgers, the State University of New Jersey |
Keywords: Integrated Planning and Learning, Reinforcement Learning, Manipulation Planning
Abstract: Object insertion under tight tolerances (< 1 mm) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL) often depend on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug’s SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach. For more information, please refer to the corresponding website.
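The composition of the model-based and residual components can be pictured as in the sketch below, which handles only the translational part of the SE(3) action for brevity; the gain, residual scale, and observation layout are hypothetical, and `residual_policy` stands in for the trained residual RL component.

```python
import numpy as np

def combined_action(plug_pose, socket_pose, residual_policy, k_p=1.0, res_scale=0.005):
    """Compose a potential-field base action with a learned residual correction.

    Poses are 7-D (x, y, z, qx, qy, qz, qw); only the translational part is handled
    here, whereas the real system acts on full SE(3). `residual_policy` is assumed
    to return a 3-vector; the gain and residual scale are hypothetical values.
    """
    # Model-based term: attract the plug toward the socket (potential-field style).
    base = k_p * (socket_pose[:3] - plug_pose[:3])
    # Learned residual, bounded to a small magnitude around the model-based action.
    obs = np.concatenate([plug_pose, socket_pose])
    residual = res_scale * np.clip(residual_policy(obs), -1.0, 1.0)
    return base + residual
```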
|
|
11:40-11:45, Paper TuBT16.6 | Add to My Program |
Generative Graphical Inverse Kinematics |
|
Limoyo, Oliver | University of Toronto |
Maric, Filip | University of Toronto Institute for Aerospace Studies |
Giamou, Matthew | McMaster University |
Alexson, Petra | University of Toronto |
Petrovic, Ivan | University of Zagreb |
Kelly, Jonathan | University of Toronto |
Keywords: Deep Learning in Robotics and Automation, Kinematics, Manipulation Planning, Redundant Robots
Abstract: Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. More recent learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we propose a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the sample efficiency of Euclidean equivariant functions and the generalizability of graph neural networks (GNNs). Our approach is generative graphical inverse kinematics (GGIK), the first learned IK solver able to accurately and efficiently produce a large number of diverse solutions in parallel while also displaying the ability to generalize—a single learned model can be used to produce IK solutions for a variety of different robots. When compared to several other learned IK methods, GGIK provides more accurate solutions with the same amount of data. GGIK can generalize reasonably well to robot manipulators unseen during training. Additionally, GGIK can learn a constrained distribution that encodes joint limits and scales efficiently to larger robots and a high number of sampled solutions. Finally, GGIK can be used to complement local IK solvers by providing reliable initializations for a local optimization process.
|
|
TuBT17 Regular Session, 405 |
Add to My Program |
Localization 1 |
|
|
Chair: Urbann, Oliver | Fraunhofer IML |
Co-Chair: Dümbgen, Frederike | ENS, PSL University |
|
11:15-11:20, Paper TuBT17.1 | Add to My Program |
GNSS/Multi-Sensor Fusion Using Continuous-Time Factor Graph Optimization for Robust Localization |
|
Zhang, Haoming | RWTH Aachen University |
Chen, Chih-Chun | RWTH Aachen University |
Vallery, Heike | TU Delft |
Barfoot, Timothy | University of Toronto |
Keywords: Sensor Fusion, Localization, Autonomous Vehicle Navigation, Factor Graph Optimization
Abstract: Accurate and robust vehicle localization in highly urbanized areas is challenging. Sensors are often corrupted in those complicated and large-scale environments. This paper introduces GNSS-FGO, an online trajectory estimator that fuses GNSS observations alongside multiple sensor measurements for robust vehicle localization. In GNSS-FGO, we fuse asynchronous sensor measurements into the graph with a continuous-time trajectory representation. This enables querying states at arbitrary timestamps so that sensor observations are fused without strict state and measurement synchronization. We employed datasets from measurement campaigns in Aachen, Düsseldorf, and Cologne in experimental studies and presented comprehensive discussions on sensor observations, smoother types, and hyperparameter tuning. Our results show that the proposed approach enables robust trajectory estimation in dense urban areas, where the classic multi-sensor fusion method fails. In a test sequence containing a 17km route through Aachen, our method results in a mean 2-D positioning error of 0.48m while fusing raw GNSS observations with lidar odometry in tight coupling.
|
|
11:20-11:25, Paper TuBT17.2 | Add to My Program |
Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry |
|
Tao, Anbo | Wuhan University |
Luo, Yarong | Wuhan University |
Xia, Chunxi | Wuhan University |
Guo, Chi | Wuhan University |
Li, Xingxing | Wuhan University |
Keywords: Localization, SLAM
Abstract: Pose estimation is a crucial problem in simultaneous localization and mapping (SLAM). However, developing a robust and consistent state estimator remains a significant challenge, as the traditional extended Kalman filter (EKF) struggles to handle the model nonlinearity, especially for inertial measurement unit (IMU) and light detection and ranging (LiDAR). To provide a consistent and efficient solution for pose estimation, we propose Eq-LIO, a robust state estimator for tightly coupled LIO systems based on an equivariant filter (EqF). Compared with the invariant Kalman filter based on the SE_2(3) group structure, the EqF uses the symmetry of the semi-direct product group to couple the system state including IMU bias, navigation state, and LiDAR extrinsic calibration state, thereby suppressing linearization error and improving the behavior of the estimator in the event of unexpected state changes. The proposed Eq-LIO offers natural consistency and higher robustness, which is theoretically proven with mathematical derivation and experimentally verified through a series of tests on both public and private datasets.
|
|
11:25-11:30, Paper TuBT17.3 | Add to My Program |
Monocular Visual Place Recognition in LiDAR Maps Via Cross-Modal State Space Model and Multi-View Matching |
|
Yao, Gongxin | Zhejiang University |
Li, Xinyang | Zhejiang University |
Fu, Luowei | Zhejiang University |
Pan, Yu | Zhejiang University |
Keywords: Localization, Deep Learning for Visual Perception, Recognition
Abstract: Achieving monocular camera localization within pre-built LiDAR maps can bypass the simultaneous mapping process of visual SLAM systems, potentially reducing the computational overhead of autonomous localization. To this end, one of the key challenges is cross-modal place recognition, which involves retrieving 3D scenes (point clouds) from a LiDAR map according to online RGB images. In this paper, we introduce an efficient framework to learn descriptors for both RGB images and point clouds. It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy for cross-modal contrastive learning. To address the field-of-view differences, independent descriptors are generated from multiple evenly distributed viewpoints for point clouds. A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision. Additionally, when generating descriptors from pixel-level features using NetVLAD, we compensate for the loss of geometric information, and introduce an efficient scheme for multi-view generation. Experimental results on the KITTI and KITTI-360 datasets demonstrate the effectiveness and generalization of our method. The code is available at https://github.com/y2w-oc/I2P-CMPR.
|
|
11:30-11:35, Paper TuBT17.4 | Add to My Program |
Learning IMU Bias with Diffusion Model |
|
Zhou, Shenghao | University of Delaware |
Katragadda, Saimouli | University of Delaware |
Huang, Guoquan (Paul) | University of Delaware |
Keywords: Visual-Inertial SLAM, SLAM, Localization
Abstract: Motion sensing and tracking with IMU data is essential for spatial intelligence, but it is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU readings. However, these methods often treat the task as a regression problem, overlooking the stochastic nature of bias. In contrast, we model bias, conditioned on IMU readings, as a probabilistic distribution and design a conditional diffusion model to approximate this distribution. Through this approach, we achieve improved performance and make predictions that align more closely with the known behavior of bias.
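A conditional diffusion model of this kind is typically sampled with a reverse denoising loop; the sketch below shows a generic DDPM-style sampler conditioned on a window of IMU readings, where `eps_model`, the linear beta schedule, and the 6-D bias dimension (gyro plus accelerometer) are assumptions for illustration, not the paper's exact configuration.

```python
import torch

@torch.no_grad()
def sample_bias(eps_model, imu_window, steps=50, dim=6):
    """DDPM-style sampling of an IMU bias conditioned on a window of IMU readings.

    eps_model(b_t, t, cond) is assumed to be the trained noise predictor.
    """
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    b = torch.randn(1, dim)                          # start from Gaussian noise
    for t in reversed(range(steps)):
        eps = eps_model(b, torch.tensor([t]), imu_window)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (b - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(b) if t > 0 else torch.zeros_like(b)
        b = mean + torch.sqrt(betas[t]) * noise      # one reverse denoising step
    return b                                          # one posterior sample of the bias
```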
|
|
11:35-11:40, Paper TuBT17.5 | Add to My Program |
On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics |
|
Holmes, Connor | University of Toronto |
Dümbgen, Frederike | ENS, PSL University |
Barfoot, Timothy | University of Toronto |
Keywords: SLAM, Localization, Optimization and Optimal Control, Certifiable
Abstract: In recent years, there has been remarkable progress in the development of so-called certifiable perception methods, which leverage semidefinite, convex relaxations to find global optima of perception problems in robotics. However, many of these relaxations rely on simplifying assumptions that facilitate the problem formulation, such as an isotropic measurement noise distribution. In this paper, we explore the tightness of the semidefinite relaxations of matrix-weighted (anisotropic) state-estimation problems and reveal the limitations lurking therein: matrix-weighted factors can cause convex relaxations to lose tightness. In particular, we show that the semidefinite relaxations of localization problems with matrix weights may be tight only for low noise levels. To better understand this issue, we introduce a theoretical connection between the posterior uncertainty of the state estimate and the certificate matrix obtained via convex relaxation. With this connection in mind, we empirically explore the factors that contribute to this loss of tightness and demonstrate that redundant constraints can be used to regain it. As a second technical contribution of this paper, we show
|
|
11:40-11:45, Paper TuBT17.6 | Add to My Program |
Drift-Free Visual SLAM Using Digital Twins |
|
Merat, Roxane | ETH Zurich |
Cioffi, Giovanni | University of Zurich |
Bauersfeld, Leonard | University of Zurich (UZH) |
Scaramuzza, Davide | University of Zurich |
Keywords: SLAM, Localization
Abstract: Globally-consistent localization in urban environments is crucial for autonomous systems such as self-driving vehicles and drones, as well as assistive technologies for visually impaired people. Traditional Visual-Inertial Odometry (VIO) and Visual Simultaneous Localization and Mapping (VSLAM) methods, though adequate for local pose estimation, suffer from drift in the long term due to reliance on local sensor data. While GPS counteracts this drift, it is unavailable indoors and often unreliable in urban areas. An alternative is to localize the camera to an existing 3D map using visual-feature matching. This can provide centimeter-level accurate localization but is limited by the visual similarities between the current view and the map. This paper introduces a novel approach that achieves accurate and globally-consistent localization by aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching; no visual data association is needed. The proposed method provides a 6-DoF global measurement tightly integrated into the VIO/VSLAM system. Experiments run on a high-fidelity GPS simulator and real-world data collected from a drone demonstrate that our approach outperforms state-of-the-art VIO-GPS systems and offers superior robustness against viewpoint changes compared to the state-of-the-art Visual SLAM systems.
|
|
TuBT18 Regular Session, 406 |
Add to My Program |
Place Recognition 1 |
|
|
Chair: Bogoslavskyi, Igor | Robotics and AI Institute |
Co-Chair: Malone, Connor | Queensland University of Technology |
|
11:15-11:20, Paper TuBT18.1 | Add to My Program |
TDFANet: Encoding Sequential 4D Radar Point Clouds Using Trajectory-Guided Deformable Feature Aggregation for Place Recognition |
|
Lu, Shouyi | Tongji University |
Zhuo, Guirong | Tongji University, Shanghai |
Wang, Haitao | The Shanghai Geometrical Perception and Learning Co., Ltd |
Zhou, Quan | Tongji University |
Zhou, Huanyu | Tongji University |
Huang, Renbo | Tongji University |
Huang, Minqing | Tongji University |
Zheng, Lianqing | TONGJI University |
Shu, Qiang | The Shanghai Tongyu Automotive Technology Co., Ltd |
Keywords: Localization, Autonomous Vehicle Navigation
Abstract: Place recognition is essential for achieving closed-loop or global positioning in autonomous vehicles and mobile robots. Despite recent advancements in place recognition using 2D cameras or 3D LiDAR, it remains to be seen how to use 4D radar for place recognition - an increasingly popular sensor for its robustness against adverse weather and lighting conditions. Compared to LiDAR point clouds, radar data are drastically sparser, noisier and in much lower resolution, which hampers their ability to effectively represent scenes, posing significant challenges for 4D radar-based place recognition. This work addresses these challenges by leveraging multi-modal information from sequential 4D radar scans and effectively extracting and aggregating spatio-temporal features. Our approach follows a principled pipeline that comprises (1) dynamic points removal and ego-velocity estimation from velocity property, (2) bird's eye view (BEV) feature encoding on the refined point cloud, (3) feature alignment using BEV feature map motion trajectory calculated by ego-velocity, (4) multi-scale spatio-temporal features of the aligned BEV feature maps are extracted and aggregated. Real-world experimental results validate the feasibility of the proposed method and demonstrate its robustness in handling dynamic environments. Source codes are available.
|
|
11:20-11:25, Paper TuBT18.2 | Add to My Program |
HeLiOS: Heterogeneous LiDAR Place Recognition Via Overlap-Based Learning and Local Spherical Transformer |
|
Jung, Minwoo | Seoul National University |
Jung, Sangwoo | Seoul National University |
Gil, Hyeonjae | SNU |
Kim, Ayoung | Seoul National University |
Keywords: Localization, Range Sensing, SLAM
Abstract: LiDAR place recognition is a crucial module in localization that matches the current location with previously observed environments. Most existing approaches in LiDAR place recognition dominantly focus on the spinning type LiDAR to exploit its large FOV for matching. However, with the recent emergence of various LiDAR types, the importance of matching data across different LiDAR types has grown significantly—a challenge that has been largely overlooked for many years. To address these challenges, we introduce HeLiOS, a deep network tailored for heterogeneous LiDAR place recognition, which utilizes small local windows with spherical transformers and optimal transport-based cluster assignment for robust global descriptors. Our overlap-based data mining and guided-triplet loss overcome the limitations of traditional distance-based mining and discrete class constraints. HeLiOS is validated on public datasets, demonstrating performance in heterogeneous LiDAR place recognition while including an evaluation for long-term recognition, showcasing its ability to handle unseen LiDAR types. We release the HeLiOS code as an open source for the robotics community at https://github.com/minwoo0611/HeLiOS.
|
|
11:25-11:30, Paper TuBT18.3 | Add to My Program |
InsCMPR: Efficient Cross-Modal Place Recognition Via Instance-Aware Hybrid Mamba-Transformer |
|
Jiao, Shuaifeng | National University of Defense Technology |
Su, Zhuoqun | National University of Defense Technology |
Luo, Lun | Zhejiang University |
Yu, Hongshan | Hunan University |
Zhou, Zongtan | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: Localization, Deep Learning for Visual Perception, Visual Learning
Abstract: Place recognition is an important technique for autonomous mobile robotic applications. While single-modal sensor-based approaches have shown satisfactory performance, cross-modal place recognition remains underexplored due to the challenge of bridging the cross-modal heterogeneity gap. In this work, we introduce an instance-aware cross-modal place recognition approach, named InsCMPR. We design a novel instance-aware modality alignment module, which aligns multi-modal data at both pixel-level and instance-level by leveraging a pre-trained vision foundation model SAM. Then a novel dual-branch hybrid Mamba-Transformer network is proposed to efficiently enhance the distinctiveness of the produced descriptors by integrating global features with local instance features. Experimental results on the KITTI, NCLT, and HAOMO datasets show that our proposed methods achieve state-of-the-art performance while operating in real time. We will open source the implementation of our method at: https://github.com/nubot-nudt/InsCMPR.
|
|
11:30-11:35, Paper TuBT18.4 | Add to My Program |
Adaptive Thresholding for Sequence-Based Place Recognition |
|
Vysotska, Olga | ETH Zurich |
Bogoslavskyi, Igor | Magic Leap |
Hutter, Marco | ETH Zurich |
Stachniss, Cyrill | University of Bonn |
Keywords: SLAM, Localization, Mapping
Abstract: Robots need to know where they are in the world to operate effectively without human support. One common first step for precise robot localization is visual place recognition. It is a challenging problem, especially when the output is required in an online fashion, and the current state-of-the-art approaches that tackle it usually require either large amounts of labeled training data or rely on parameters that need to be tuned manually, often per dataset. One such parameter often used for sequence-based place recognition is the image similarity threshold that allows to differentiate between pairs of images that represent the same place even in the presence of severe environmental and structural changes, and those that represent different places even if they share a similar appearance. Currently, selecting this threshold is a manual procedure and requires human expertise. We propose an automatic similarity threshold selection technique and integrate it into a complete sequence-based place recognition system. The experiments on a broad range of real-world and simulated data show that our approach is capable of matching image sequences under various illumination, viewpoint and underlying structural changes, runs online, and requires no manual parameter tuning while yielding performance comparable to a manual, dataset-specific parameter tuning. Thus, this paper substantially increases the ease of use of visual place recognition in real-world settings.
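The abstract does not spell out the selection rule, so purely as an illustration of automating the choice, the sketch below picks a similarity threshold from the score distribution itself with an Otsu-style criterion; this is a generic stand-in, not the method proposed in the paper.

```python
import numpy as np

def auto_similarity_threshold(scores, bins=256):
    """Pick a similarity threshold from the score distribution (Otsu-style).

    `scores` are image-pair similarity values collected online. Separating the
    "same place" and "different place" modes by maximizing between-class variance
    is only one generic way to automate the choice; the paper's own rule may differ.
    """
    hist, edges = np.histogram(scores, bins=bins)
    p = hist.astype(float) / max(hist.sum(), 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:i] * centers[:i]).sum() / w0
        m1 = (p[i:] * centers[i:]).sum() / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t
```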
|
|
11:35-11:40, Paper TuBT18.5 | Add to My Program |
RE-TRIP : Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition |
|
Park, Yechan | Yonsei University |
Pak, Gyuhyeon | Yonsei University |
Kim, Euntai | Yonsei University |
Keywords: SLAM, Localization, Mapping
Abstract: While most people associate LiDAR primarily with its ability to measure distances and provide geometric information about the environment (via point clouds), LiDAR also captures additional data, including reflectivity or intensity values. Unfortunately, when LiDAR is applied to Place Recognition (PR) in mobile robotics, most previous works on LiDAR-based PR rely only on geometric measurements, neglecting the additional reflectivity information that LiDAR provides. In this paper, we propose a novel descriptor for 3D PR, named RE-TRIP (REflectivity-instance augmented TRIangle descriPtor). This new descriptor leverages both geometric measurements and reflectivity to enhance robustness in challenging scenarios such as geometric degeneracy, high geometric similarity, and the presence of dynamic objects. To implement RE-TRIP in real-world applications, we further propose (1) a keypoint extraction method, (2) a key instance segmentation method, (3) a RE-TRIP matching method, and (4) a reflectivity-combined loop verification method. Finally, we conduct a series of experiments to demonstrate the effectiveness of RE-TRIP. Applied to public datasets (i.e., HELIPR, FusionPortable) containing diverse scenarios—including long corridors, bridges, large-scale urban areas, and highly dynamic environments—our experimental results show that the proposed method outperforms existing state-of-the-art methods, including Scan Context, Intensity Scan Context, and STD. Our code is available at: https://github.com/pyc5714/RE-TRIP.
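One way to picture a reflectivity-augmented triangle descriptor is sketched below: three keypoints contribute sorted side lengths (a pose-invariant geometric signature) together with sorted reflectivity values; this layout is an illustrative reading of the abstract, not the exact RE-TRIP encoding.

```python
import numpy as np

def reflectivity_triangle_descriptor(p1, p2, p3, r1, r2, r3):
    """Build a triangle descriptor from three keypoints plus their reflectivities.

    p1..p3 are 3D keypoint coordinates, r1..r3 their (instance-level) reflectivity
    values. Sorting makes the signature invariant to vertex ordering and rigid motion.
    """
    sides = sorted([
        np.linalg.norm(p1 - p2),
        np.linalg.norm(p2 - p3),
        np.linalg.norm(p3 - p1),
    ])
    refl = sorted([r1, r2, r3])
    return np.array(sides + refl)   # 6-D signature: geometry first, reflectivity second
```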
|
|
11:40-11:45, Paper TuBT18.6 | Add to My Program |
Context Graph-Based Visual-Language Place Recognition |
|
Woo, Soojin | Seoul National University |
Kim, Seong-Woo | Seoul National University |
Keywords: Localization, AI-Enabled Robotics, Object Detection, Segmentation and Categorization
Abstract: In vision-based robot localization and SLAM, Visual Place Recognition (VPR) is essential. This paper addresses the problem of VPR, which involves accurately recognizing the location corresponding to a given query image. A popular approach to vision-based place recognition relies on low-level visual features. Despite significant progress in recent years, place recognition based on low-level visual features remains challenging in scenarios with changes in scene appearance. To address this, end-to-end training approaches have been proposed to overcome the limitations of hand-crafted features. However, these approaches still fail under drastic changes and require large amounts of labeled data for model training, presenting a significant limitation. To handle variations in appearance, methods that leverage high-level semantic information, such as objects or categories, have been introduced. In this paper, we introduce a novel VPR approach that does not require additional training and remains robust to scene changes. Our method constructs semantic image descriptors by extracting pixel-level embeddings from a zero-shot, language-driven semantic segmentation model. We validate our approach in challenging place recognition scenarios using a real-world public dataset. The experiments demonstrate that our method outperforms non-learned image representation techniques and off-the-shelf convolutional neural network (CNN) descriptors. Our code is available at https://github.com/woo-soojin/context-based-vlpr.git.
|
|
TuBT19 Regular Session, 407 |
Add to My Program |
Tactile Sensing 1 |
|
|
Chair: Chin, Lillian | UT Austin |
Co-Chair: Li, Monica | Yale University |
|
11:15-11:20, Paper TuBT19.1 | Add to My Program |
Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks |
|
Ou, Ni | Beijing Institute of Technology |
Chen, Zhuo | King's College London |
Luo, Shan | King's College London |
Keywords: Force and Tactile Sensing, Grasping
Abstract: Optical tactile sensors play a pivotal role in robot perception and manipulation tasks. The membrane of these sensors can be painted with markers or remain markerless, enabling them to function in either marker or markerless mode. However, this uni-modal selection means the sensor is only suitable for either manipulation or perception tasks. While markers are vital for manipulation, they can also obstruct the camera, thereby impeding perception. The dilemma of selecting between marker and markerless modes presents a significant obstacle. To address this issue, we propose a novel mode-switchable optical tactile sensing approach that facilitates transitions between the two modes. The marker-to-markerless transition is achieved through a generative model, whereas its inverse transition is realized using a sparsely supervised regressive model. Our approach allows a single-mode optical sensor to operate effectively in both marker and markerless modes without the need for additional hardware, making it well-suited for both perception and manipulation tasks. Extensive experiments validate the effectiveness of our method. For perception tasks, our approach decreases the number of categories that include misclassified samples by 2 and improves contact area segmentation IoU by 3.53%. For manipulation tasks, our method attains a high success rate of 92.59% in slip detection. Code, dataset and demo videos are available at the project website https://gitouni.github.io/Marker-Markerless-Transition/
|
|
11:20-11:25, Paper TuBT19.2 | Add to My Program |
Self-Mixing Laser Interferometry for Robotic Tactile Sensing |
|
Proesmans, Remko | Ghent University |
Ward, Goossens | Ghent University |
Van den Stockt, Lowiek | OTIV |
Christiaen, Lowie | Ugent |
Wyffels, Francis | Ghent University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Embedded Systems for Robotic and Automation
Abstract: Self-mixing interferometry (SMI) has been lauded for its sensitivity in detecting microvibrations, while requiring no physical contact with its target. In robotics, microvibrations have traditionally been interpreted as a marker for object slip, and recently as a salient indicator of extrinsic contact. We present the first-ever robotic fingertip making use of SMI for slip and extrinsic contact sensing. The design is validated through measurement of controlled vibration sources, both before and after encasing the readout circuit in its fingertip package. Then, the SMI fingertip is compared to acoustic sensing through four experiments. The results are distilled into a technology decision map. SMI was found to be more sensitive to subtle slip events and significantly more resilient against ambient noise. We conclude that the integration of SMI in robotic fingertips offers a new, promising branch of tactile sensing in robotics. Design and data files are available at https://github.com/RemkoPr/icra2025-SMI-tactile-sensing.
|
|
11:25-11:30, Paper TuBT19.3 | Add to My Program |
Estimating High-Resolution Neural Stiffness Fields Using Visuotactile Sensors |
|
Han, Jiaheng | University of Illinois Urbana-Champaign |
Yao, Shaoxiong | University of Illinois Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Force and Tactile Sensing
Abstract: High-resolution visuotactile sensors provide detailed contact information that is promising to infer the physical properties of objects in contact. This paper introduces a novel technique for high-resolution stiffness estimation of heterogeneous deformable objects using the Punyo bubble sensor. We developed an observation model for dense contact forces to estimate object stiffness using a visuotactile sensor and a dense force estimator. Additionally, we propose a neural volumetric stiffness field (VSF) formulation that represents stiffness as a continuous function, which allows dynamic point sampling at visuotactile sensor observation resolution. The neural VSF significantly reduces artifacts commonly associated with traditional point-based methods, particularly in stiff inclusion estimation and heterogeneous stiffness estimation. We further apply our method in a blind localization task, where objects within opaque bags are accurately modeled and localized, demonstrating the superior performance of neural VSF compared to existing techniques.
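A neural stiffness field can be pictured as a small coordinate MLP with a non-negative output, fitted so that Hooke-like point forces match the dense force estimate from the visuotactile sensor; the sketch below follows that simplified reading, and the network size and observation model are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class StiffnessField(nn.Module):
    """Continuous volumetric stiffness field k(x) >= 0, represented by a small MLP."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # stiffness must be non-negative
        )

    def forward(self, points):
        return self.net(points).squeeze(-1)        # (N,) stiffness values

def stiffness_fit_loss(field, contact_points, displacements, observed_forces):
    """Fit the field so predicted point forces match the sensor's dense force estimate.

    The Hooke-like per-point force model f_i = k(x_i) * d_i is a simplification
    used here only for illustration.
    """
    k = field(contact_points)                      # (N,)
    predicted = k.unsqueeze(-1) * displacements    # (N, 3)
    return ((predicted - observed_forces) ** 2).mean()
```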
|
|
11:30-11:35, Paper TuBT19.4 | Add to My Program |
High-Resolution Reconstruction of Non-Planar Tactile Patterns from Low-Resolution Taxel-Based Tactile Sensors |
|
Zhou, Chen | The University of Hong Kong |
Zhao, He | Dalian University of Technology |
Liu, Qian | Dalian University of Technology |
Keywords: Force and Tactile Sensing, Contact Modeling
Abstract: Over the past decades, tactile sensors have gained increasing attention and have gradually become fundamental devices for robots. Especially in today's context where human-robot interaction demands are growing and the requirements for tactile perception are becoming stricter, how to enable robots to better perceive their environment has become a topic worth discussing. Tactile sensors, after years of development, have evolved into two main types: taxel-based and vision-based sensors, where the former provides relatively low-resolution (LR) tactile patterns compared with the latter. Both of them have seen significant enhancements in their tactile perception capabilities on flat and regular surfaces. However, as application scenarios expand, current flat tactile perception can no longer meet the robots' needs for multi-dimensional and complex perception capabilities. Therefore, we investigate the high-resolution (HR) reconstruction of non-planar tactile patterns captured by LR taxel-based sensors in this paper. We first develop a new dataset, where the ground-truth non-planar tactile patterns are obtained with a vision-based GelSight Mini tactile sensor, and the LR data are collected via a commercial taxel-based Xela sensor. In addition, we propose to adapt the state-of-the-art CNN- and GAN-based tactile super-resolution model of flat/planar surfaces to the non-planar scenario, and also develop a diffusion-based model for the non-planar HR reconstruction. Experimental results confirm the efficiency of the proposed models.
|
|
11:35-11:40, Paper TuBT19.5 | Add to My Program |
Blind Tactile Exploration for Surface Reconstruction |
|
Sinha, Yashaswi | Indian Institute of Science, Bengaluru |
Bhattacharya, Soumojit | Indian Institute of Technology Kharagpur |
Sahu, Yash Kumar | Indian Institute of Science, Bengaluru |
Biswas, Pradipta | Indian Institute of Science |
Keywords: Force and Tactile Sensing, Manipulation Planning
Abstract: Accurate 3D reconstruction capturing the fine details of an object’s shape is essential for tasks such as automated assembly, inspection, and quality control. Monocular cameras provide broad visual structure but often miss critical surface details and depth accuracy in underexposed or occluded environments. Tactile sensors offer precise, localized depth information, capturing fine textures, yet exploring varied curvature surfaces with only tactile input remains challenging. To address this, the paper proposes a blind surface exploration method for convex objects using a set of sequential controllers to efficiently guide the manipulator's interaction with surfaces featuring sharp edge changes. This approach ensures precise tactile exploration, leading to highly detailed surface reconstruction. With the controller employed, the algorithm was able to move along the surface while maintaining contact along the surface normal and reconstruct the object with IoU as high as 91% for objects with sharp edges.
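As a toy picture of contact-following exploration, the sketch below computes the next probe direction by sliding along the local tangent while biasing slightly into the surface to keep contact; the actual sequential controllers handling sharp edge changes are not reproduced here, and the step size and bias factor are illustrative.

```python
import numpy as np

def next_probe_direction(surface_normal, travel_dir, step=0.002, press=0.1):
    """One step of a simple contact-following exploration (illustrative only).

    surface_normal: estimated outward surface normal at the current contact
    travel_dir:     desired direction of travel along the object
    Returns a small displacement that slides along the tangent plane while
    pressing slightly into the surface to maintain contact.
    """
    n = surface_normal / np.linalg.norm(surface_normal)
    t = travel_dir - np.dot(travel_dir, n) * n      # project onto the tangent plane
    t = t / (np.linalg.norm(t) + 1e-9)
    return step * (t - press * n)                    # mostly tangential, slightly inward
```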
|
|
11:40-11:45, Paper TuBT19.6 | Add to My Program |
Graph-Structured Super-Resolution for Geometry-Generalized Tomographic Tactile Sensing: Application to Humanoid Faces |
|
Park, Hyunkyu | Samsung Advanced Institute of Technology |
Kim, Woojong | KAIST |
Jeon, Sangha | Korea Advanced Institute of Science and Technology(KAIST) |
Na, Youngjin | Sookmyung Women's University |
Kim, Jung | KAIST |
Keywords: Force and Tactile Sensing, Deep Learning in Robotics and Automation, Sensor-based Control, Tomographic Reconstruction
Abstract: Electrical impedance tomographic (EIT) tactile sensing holds great promise for whole-body coverage of contact-rich robotic systems, offering extensive flexibility in sensor geometry. However, low spatial resolution restricts its practical use, despite the existing deep-learning-based reconstruction methods. This study introduces EIT-GNN, a graph-structured data-driven EIT reconstruction framework that achieves super-resolution in large-area tactile perception on unbounded form factors of robots. EIT-GNN represents the arbitrary sensor shape as mesh connections, then employs a two-fold architecture of a transformer encoder and a graph convolutional neural network to best exploit this geometric prior knowledge, resulting in an accurate, generalized, and parameter-efficient reconstruction procedure. As a proof-of-concept, we demonstrate its application using large-area face-shaped sensor hardware, which represents one of the most complex geometries in human/humanoid anatomy. An extensive set of experiments, including a simulation study, ablation analysis, a single-touch indentation test, and latent feature analysis, confirms its superiority over alternative models. The beneficial features of the approach are demonstrated through its application in active tactile-servo control of humanoid head motion, paving a new way for integrating tactile sensors with intricate designs into robotic systems.
|
|
TuBT20 Regular Session, 408 |
Add to My Program |
Design and Robust Control |
|
|
Chair: Isler, Volkan | University of Minnesota |
Co-Chair: Tan, Xiaobo | Michigan State University |
|
11:15-11:20, Paper TuBT20.1 | Add to My Program |
Control Reallocation Using Deep Reinforcement Learning for Actuator Fault Recovery of an Autonomous Underwater Vehicle |
|
Lagattu, Katell | ENSTA |
Artusi, Eva | Naval Group |
Santos, Paulo E. | Priori Analytica |
Sammut, Karl | Flinders University |
Le Chenadec, Gilles | ENSTA |
Clement, Benoit | ENSTA, Institut Polytechnique De Paris |
Keywords: Robust/Adaptive Control, Reinforcement Learning, Autonomous Agents
Abstract: Actuator faults in dynamic systems pose significant challenges, particularly for robotic systems operating in hostile environments such as Autonomous Underwater Vehicles (AUVs), risking loss of stability and performance degradation. Fault Tolerant Control (FTC) strategies, including Control Reallocation (CR), have been developed to mitigate such risks. However, these strategies extensively depend on explicit fault diagnosis, which may present challenges regarding computational demands and efficiency, particularly when dealing with unknown faults. This paper presents a novel method that performs CR with Deep Reinforcement Learning (DRL) for actuator fault recovery without explicit fault diagnosis. The approach is implemented on a BlueROV2 underwater vehicle and demonstrates improved performance for fault recovery compared to a standard Proportional-Integral-Derivative (PID) controller and a variable gain PID controller, both in simulation and in real-world conditions. The DRL-based CR method demonstrates generalisability by successfully handling faults not encountered during training, highlighting its adaptability to unforeseen circumstances.
|
|
11:20-11:25, Paper TuBT20.2 | Add to My Program |
A New Framework for Repetitive Control of Robot Manipulators Via Operator-Theoretic Robust Stabilization |
|
Song, Geun Il | Postech |
Kim, Jung Hoon | Pohang University of Science and Technology |
Keywords: Robust/Adaptive Control, Motion Control, Industrial Robots
Abstract: This paper establishes a new framework for repetitive control of uncertain robot manipulators via operator-theoretic robust stabilization. After applying the inverse dynamics approach to robot manipulators, by which the relevant nonlinear input/output behavior is converted to a linear time-invariant (LTI) equation, we take the repetitive control approach. Even though such a repetitive controller is known to achieve high performance for periodic reference inputs, it is quite difficult to derive the stability analysis of the resulting closed-loop systems in a rigorous fashion. To resolve this difficulty, we construct an operator-theoretic approach to the repetitive control treatment and show that the closed-loop systems are exponentially stable if and only if the spectral radius of the relevant monodromy operator is less than 1. Based on this necessary and sufficient condition, we develop a guideline for selecting the relevant control parameters. Finally, experimental results are given to demonstrate the overall arguments developed in this paper.
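For readers unfamiliar with the stability test mentioned in the abstract, the following minimal sketch shows how, in a finite-dimensional approximation, the exponential-stability condition reduces to checking that the spectral radius of the state-transition map over one reference period is below 1. The closed-loop matrix and period length are placeholders, not values from the paper.

```python
import numpy as np

def spectral_radius(M):
    # Largest eigenvalue magnitude of a (finite-dimensional) monodromy approximation.
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical discretized closed-loop step map and reference-period length.
A_cl = np.array([[0.9, 0.1],
                 [0.0, 0.8]])
N = 50                                          # samples per reference period
monodromy = np.linalg.matrix_power(A_cl, N)     # state transition over one period

rho = spectral_radius(monodromy)
print(f"spectral radius over one period: {rho:.3e} -> exponentially stable: {rho < 1.0}")
```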
|
|
11:25-11:30, Paper TuBT20.3 | Add to My Program |
Learning Robust Policies Via Interpretable Hamilton-Jacobi Reachability-Guided Disturbances |
|
Hu, Hanyang | Simon Fraser University |
Zhang, Xilun | Carnegie Mellon University |
Lyu, Xubo | Simon Fraser University |
Chen, Mo | Simon Fraser University |
Keywords: Robust/Adaptive Control, Reinforcement Learning
Abstract: Deep Reinforcement Learning (RL) has shown remarkable success in robotics with complex and heterogeneous dynamics. However, its vulnerability to unknown disturbances and adversarial attacks remains a significant challenge. In this paper, we propose a robust policy training framework that integrates model-based control principles with adversarial RL training to improve robustness without the need for external black-box adversaries. Our approach introduces a novel Hamilton-Jacobi reachability-guided disturbance for adversarial RL training, where we use interpretable worst-case or near-worst-case disturbances as adversaries against the robust policy. We evaluated its effectiveness across three distinct tasks: a reach-avoid game in both simulation and real-world settings, and a highly dynamic quadrotor stabilization task in simulation. We validate that our learned critic network is consistent with the ground-truth HJ value function, while the policy network shows comparable performance with other learning-based methods.
|
|
11:30-11:35, Paper TuBT20.4 | Add to My Program |
Optimal Fault-Tolerant Control for Tugboats Robust Path Following in Nearshore |
|
Shi, Jiangteng | Hainan University |
Zhang, Jun | Graz University of Technology |
Chen, Yujing | HaiNan University |
Ren, Jia | Hainan University |
Keywords: Robust/Adaptive Control, Marine Robotics, Reinforcement Learning
Abstract: External ocean disturbances (EODs) and internal thruster loss-of-effectiveness faults (ITLEFs) are key factors influencing the accuracy of an autonomous tugboat's path following, as well as the stability and safety of the tugboat's hull during maritime operations. To achieve robust path following for the autonomous tugboat, this paper proposes an optimal fault-tolerant control scheme. First, we formulate the robust path following of the tugboat as an optimal fault-tolerant control problem. A matrixed error system for the control scheme is constructed to uniformly account for both EODs and ITLEFs. Second, considering the time and economic costs associated with algorithm deployment and tuning on real-world tugboats, we present an adaptive dynamic programming algorithm, characterized by ease of tuning, to solve the proposed optimal fault-tolerant control problem. The stability of the control system is then proven based on the Lyapunov criterion. Finally, the proposed control scheme is evaluated under practical conditions with EODs and ITLEFs. Comparative results against a backstepping-based control scheme demonstrate that the proposed scheme exhibits greater robustness for path following under EODs and ITLEFs.
|
|
11:35-11:40, Paper TuBT20.5 | Add to My Program |
Neural L1 Adaptive Control of Vehicle Lateral Dynamics |
|
Mukherjee, Pratik | Florida Atlantic University |
Gonultas, Burak M | University of Minnesota |
Poyrazoglu, Oguzhan Goktug | University of Minnesota |
Isler, Volkan | University of Minnesota |
Keywords: Robust/Adaptive Control, Machine Learning for Robot Control, Autonomous Vehicle Navigation
Abstract: We address the problem of stable and robust control of vehicles with lateral error dynamics for the application of lane keeping. Lane departure is the primary reason for half of the fatalities in road accidents, making the development of stable, adaptive and robust controllers a necessity. Traditional linear feedback controllers achieve satisfactory tracking performance; however, they exhibit unstable behavior when uncertainties are induced into the system. Any disturbance or uncertainty introduced to the steering-angle input can be catastrophic for the vehicle. Therefore, controllers must be developed to actively handle such uncertainties. In this work, we introduce a Neural L1 Adaptive controller which learns the uncertainties in the lateral error dynamics of a front-steered Ackermann vehicle and guarantees stability and robustness. Our contributions are threefold: i) we extend the theoretical results for guaranteed stability and robustness of conventional L1 Adaptive controllers to the Neural L1 Adaptive controller; ii) we implement a Neural L1 Adaptive controller for the lane keeping application which accurately learns uncertainties in the dynamics; iii) we evaluate the performance of the Neural L1 Adaptive controller on a physics-based simulator, PyBullet, and conduct extensive real-world experiments with the F1TENTH platform to demonstrate its superior reference trajectory tracking performance compared to other state-of-the-art controllers in the presence of uncertainties.
|
|
11:40-11:45, Paper TuBT20.6 | Add to My Program |
Mechanical Design and Data-Enabled Predictive Control of a Planar Soft Robot |
|
Wang, Huanqing | Michigan State University |
Zhang, Kaixiang | Michigan State University |
Lee, Kyungjoon | University of California Riverside |
Mei, Yu | Michigan State University |
Zhu, Keyi | Michigan State University |
Srivastava, Vaibhav | Michigan State University |
Sheng, Jun | University of California Riverside |
Li, Zhaojian | Michigan State University |
Keywords: Modeling, Control, and Learning for Soft Robots, Optimization and Optimal Control, Soft Sensors and Actuators
Abstract: Soft robots offer a unique combination of flexibility, adaptability, and safety, making them well-suited for a diverse range of applications. However, the inherent complexity of soft robots poses great challenges in their modeling and control. In this paper, we present the mechanical design and data-driven control of a pneumatic-driven soft planar robot. Specifically, we employ a data-enabled predictive control (DeePC) strategy that directly utilizes system input/output data to achieve safe and optimal control, eliminating the need for tedious system identification or modeling. In addition, a dimension reduction technique is introduced into the DeePC framework, resulting in significantly enhanced computational efficiency with minimal to no degradation in control performance. Comparative experiments are conducted to validate the efficacy of DeePC in the control of the fabricated soft robot.
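As background for the data-enabled predictive control (DeePC) strategy mentioned in the abstract, the sketch below shows the standard construction of block-Hankel matrices from recorded input/output data, which is the data structure DeePC optimizes over in place of a model. The dimensions, names (block_hankel, T_ini, N_pred), and random data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def block_hankel(w, L):
    """Block-Hankel matrix with L block rows from a data sequence w of shape (T, m)."""
    T, m = w.shape
    cols = T - L + 1
    H = np.zeros((L * m, cols))
    for i in range(L):
        H[i * m:(i + 1) * m, :] = w[i:i + cols, :].T
    return H

# Hypothetical recorded input/output data from the soft robot (T samples).
T, n_u, n_y = 200, 3, 2
u_data = np.random.randn(T, n_u)
y_data = np.random.randn(T, n_y)

T_ini, N_pred = 5, 20                        # past window and prediction horizon
U = block_hankel(u_data, T_ini + N_pred)     # rows split into "past" and "future" blocks
Y = block_hankel(y_data, T_ini + N_pred)
Up, Uf = U[:T_ini * n_u], U[T_ini * n_u:]
Yp, Yf = Y[:T_ini * n_y], Y[T_ini * n_y:]
# DeePC then optimizes a vector g so that [Up; Yp; Uf] g matches the measured
# initial trajectory and the planned inputs, and predicts outputs as Yf @ g.
```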
|
|
TuBT21 Regular Session, 410 |
Add to My Program |
Reinforcement Learning 2 |
|
|
Chair: Qi, Xinda | Michigan State University |
Co-Chair: Jia, Yunyi | Clemson University |
|
11:15-11:20, Paper TuBT21.1 | Add to My Program |
Efficient Imitation without Demonstrations Via Value-Penalized Auxiliary Control from Examples |
|
Ablett, Trevor | University of Toronto |
Chan, Bryan | University of Alberta |
Wang, Jayce Haoran | University of Toronto |
Kelly, Jonathan | University of Toronto |
Keywords: Reinforcement Learning, Imitation Learning, Learning from Demonstration
Abstract: Common approaches to providing feedback in reinforcement learning are the use of hand-crafted rewards or full-trajectory expert demonstrations. Alternatively, one can use examples of completed tasks, but such an approach can be extremely sample inefficient. We introduce value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks and an above-success level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards. Project site: https://papers.starslab.ca/vpace/.
|
|
11:20-11:25, Paper TuBT21.2 | Add to My Program |
QuasiNav: Asymmetric Cost-Aware Navigation Planning with Constrained Quasimetric Reinforcement Learning |
|
Hossain, Jumman | University of Maryland Baltimore County |
Faridee, Abu-Zaher | University of Maryland Baltimore County, USA |
Asher, Derrik | DEVCOM Army Research Lab, USA |
Freeman, Jade | DEVCOM Army Research Lab, USA |
Gregory, Timothy | DEVCOM Army Research Lab, USA |
Trout, Theron T. | Stormfish Scientific Corp |
Roy, Nirmalya | University of Maryland Baltimore County, USA |
Keywords: Reinforcement Learning, Autonomous Vehicle Navigation, Motion and Path Planning
Abstract: Autonomous navigation in unstructured outdoor environments is inherently challenging due to the presence of asymmetric traversal costs, such as varying energy expenditures for uphill versus downhill movement. Traditional reinforcement learning methods often assume symmetric costs, which can lead to suboptimal navigation paths and increased safety risks in real-world scenarios. In this paper, we introduce QuasiNav, a novel reinforcement learning framework that integrates quasimetric embeddings to explicitly model asymmetric costs and guide efficient, safe navigation. QuasiNav formulates the navigation problem as a constrained Markov decision process (CMDP) and employs quasimetric embeddings to capture directionally dependent costs, allowing for a more accurate representation of the terrain. We combine this approach with adaptive constraint tightening. This ensures that safety constraints are dynamically enforced during learning. We validate QuasiNav on a Clearpath Jackal robot in three challenging navigation scenarios—undulating terrains, asymmetric hill traversal, and directionally dependent terrain traversal—demonstrating its effectiveness in both simulated and real-world environments. Experimental results show that QuasiNav significantly outperforms conventional methods, achieving higher success rates, improved energy efficiency (13.6% reduction in energy consumption compared to baseline methods), and better adherence to safety constraints.
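A quasimetric, as used in the abstract above, is a distance that keeps the triangle inequality but drops symmetry, so uphill and downhill traversal can be priced differently. The snippet below is a minimal illustration of one standard way to build such an asymmetric distance from embeddings; the embedding values are invented for illustration and are not taken from QuasiNav.

```python
import numpy as np

def quasimetric(fx, fy):
    # d(x, y) = sum_i max(0, f_i(x) - f_i(y)) is zero on the diagonal and satisfies
    # the triangle inequality, but is not symmetric: d(x, y) != d(y, x) in general.
    return float(np.maximum(fx - fy, 0.0).sum())

f_hill = np.array([1.0, 0.2])    # hypothetical embedding of a hilltop state
f_valley = np.array([0.3, 0.1])  # hypothetical embedding of a valley state
print(quasimetric(f_hill, f_valley), quasimetric(f_valley, f_hill))  # asymmetric costs
```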
|
|
11:25-11:30, Paper TuBT21.3 | Add to My Program |
Learning a High-Quality Robotic Wiping Policy Using Systematic Reward Analysis and Visual-Language Model Based Curriculum |
|
Liu, Yihong | Georgia Institute of Technology |
Kang, Dongyeop | ETRI |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Reinforcement Learning, AI-Enabled Robotics, Force Control
Abstract: Autonomous robotic wiping is an important task in various industries, ranging from industrial manufacturing to sanitization in healthcare. Deep reinforcement learning (Deep RL) has emerged as a promising approach; however, it often suffers from a high demand for repetitive reward engineering. Instead of relying on manual tuning, we first analyze the convergence of quality-critical robotic wiping, which requires both high-quality wiping and fast task completion, show the poor convergence of the problem, and propose a new bounded reward formulation to make the problem feasible. Then, we further improve the learning process by proposing a novel visual-language model (VLM) based curriculum, which actively monitors progress and suggests hyperparameter tuning. We demonstrate that the combined method can find a desirable wiping policy on surfaces with various curvatures, frictions, and waypoints, which cannot be learned with the baseline formulation. The demo of this project can be found at: https://sites.google.com/view/highqualitywiping
|
|
11:30-11:35, Paper TuBT21.4 | Add to My Program |
Actor-Critic Cooperative Compensation to Model Predictive Control for Off-Road Autonomous Vehicles under Unknown Dynamics |
|
Gupta, Prakhar | Clemson University |
Smereka, Jonathon M. | U.S. Army TARDEC |
Jia, Yunyi | Clemson University |
Keywords: Machine Learning for Robot Control, Motion Control, Autonomous Vehicle Navigation
Abstract: This study presents an Actor-Critic Cooperative Compensated Model Predictive Controller (AC3MPC) designed to address unknown system dynamics. To avoid the difficulty of modeling highly complex dynamics and to ensure real-time control feasibility and performance, this work uses deep reinforcement learning with a model predictive controller in a cooperative framework to handle unknown dynamics. The model-based controller takes on the primary role, and both controllers are provided with predictive information about the other. This improves tracking performance and retains the inherent robustness of the model predictive controller. We evaluate this framework for off-road autonomous driving on unknown deformable terrains that represent sandy deformable soil, sandy and rocky soil, and cohesive clay-like deformable soil. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers by up to 29.2% and 10.2%, respectively. The framework generalizes well over varied and previously unseen terrain characteristics to track longitudinal reference speeds with lower errors. Furthermore, it requires significantly less training data than a purely learning-based controller, while delivering better performance even when under-trained.
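The cooperative structure described above can be summarized schematically as a model-based action plus a learned compensation term. The fragment below is only a schematic of that composition with assumed interfaces (mpc_solve, rl_policy); it is not the authors' code.

```python
def cooperative_control(x, ref, mpc_solve, rl_policy):
    # The model-based controller is primary; both controllers see predictive
    # information about the other, and the learned term compensates for the
    # unmodeled (e.g., deformable-terrain) dynamics.
    u_mpc, predicted_traj = mpc_solve(x, ref)           # nominal model-based action
    u_comp = rl_policy(x, ref, u_mpc, predicted_traj)   # learned compensation term
    return u_mpc + u_comp
```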
|
|
11:35-11:40, Paper TuBT21.5 | Add to My Program |
Soft Actor-Critic-Based Control Barrier Adaptation for Robust Autonomous Navigation in Unknown Environments |
|
Mohammad, Nicholas | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Machine Learning for Robot Control, Motion and Path Planning, Collision Avoidance
Abstract: Motion planning failures during autonomous navigation often occur when safety constraints are either too conservative, leading to deadlocks, or too liberal, resulting in collisions. To improve robustness, a robot must dynamically adapt its safety constraints to ensure it reaches its goal while balancing safety and performance measures. To this end, we propose a Soft Actor-Critic (SAC)-based model for adapting Control Barrier Function (CBF) constraint parameters at runtime, ensuring safe yet non-conservative motion. The proposed approach is designed for a general high-level motion planner, low-level controller, and target system model, and is trained in simulation only. Through extensive simulations and physical experiments, we demonstrate that our framework effectively adapts CBF constraints, enabling the robot to reach its final goal without compromising safety.
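As context for the control barrier function (CBF) parameters that the SAC agent adapts in the abstract above, the following toy example shows a one-constraint CBF safety filter for a single-integrator robot and a circular obstacle. The class-K gain alpha is the kind of parameter such an adaptation scheme would tune (larger alpha is less conservative near the obstacle, smaller alpha is safer but risks deadlock). The scenario and numbers are invented for illustration, not the paper's setup.

```python
import numpy as np

def cbf_filter(x, u_nom, x_obs, r, alpha):
    # Single-integrator robot, circular obstacle: h(x) = ||x - x_obs||^2 - r^2 >= 0.
    h = float(np.dot(x - x_obs, x - x_obs)) - r ** 2
    grad_h = 2.0 * (x - x_obs)
    # Enforce grad_h . u + alpha * h >= 0 with the smallest change to u_nom
    # (closed-form solution of the one-constraint CBF quadratic program).
    slack = float(np.dot(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        return u_nom
    return u_nom - slack * grad_h / float(np.dot(grad_h, grad_h))

x = np.array([1.0, 0.0])                                  # robot position
u_nom = np.array([-1.0, 0.0])                             # nominal command toward obstacle
u_safe = cbf_filter(x, u_nom, np.array([0.0, 0.0]), 0.5, alpha=1.0)
print(u_safe)                                             # filtered, less aggressive command
```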
|
|
11:40-11:45, Paper TuBT21.6 | Add to My Program |
Multi-Task Reinforcement Learning for Quadrotors |
|
Xing, Jiaxu | University of Zurich |
Geles, Ismail | Robotics and Perception Group, University of Zurich |
Song, Yunlong | University of Zurich |
Aljalbout, Elie | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Aerial Systems: Perception and Autonomy
Abstract: Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop outstanding, even human-level, performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. This limitation is particularly challenging in various real-world applications such as search and rescue or infrastructure inspection, where quick and efficient adaptation to diverse tasks is crucial. To address this limitation, we propose a novel multi-task reinforcement learning (MTRL) framework for multiple quadrotor control tasks. Quadrotor control tasks have fundamental similarities based on the consistent physical properties and dynamics of the platform itself. We leverage these similarities and propose an MTRL approach based on an efficient knowledge-sharing framework. Our approach significantly improves the sample efficiency compared to learning tasks individually without compromising task performance. As a result, our approach produces a single high-performance policy capable of executing complex maneuvers such as stabilizing from high speed, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance.
|
|
TuBT22 Regular Session, 411 |
Add to My Program |
Learning for Navigation |
|
|
Chair: Song, Daeun | George Mason University |
Co-Chair: Kuipers, Benjamin | University of Michigan |
|
11:15-11:20, Paper TuBT22.1 | Add to My Program |
Watch Your STEPP: Semantic Traversability Estimation Using Pose Projected Features |
|
Aegidius, Sebastian | University College London |
Hadjivelichkov, Denis | University College London |
Jiao, Jianhao | University College London |
Embley-Riches, Jonathan | University College London |
Kanoulas, Dimitrios | University College London |
Keywords: Vision-Based Navigation, Motion and Path Planning, Legged Robots
Abstract: Understanding the traversability of terrain is essential for autonomous robot navigation, particularly in unstructured environments such as natural landscapes. Although traditional methods, such as occupancy mapping, provide a basic framework, they often fail to account for the complex mobility capabilities of some platforms such as legged robots. In this work, we propose a method for estimating terrain traversability by learning from demonstrations of human walking. Our approach leverages dense, pixel-wise feature embeddings generated using the DINOv2 vision Transformer model, which are processed through an encoder-decoder MLP architecture to analyze terrain segments. The averaged feature vectors, extracted from the masked regions of interest, are used to train the model in a reconstruction-based framework. By minimizing reconstruction loss, the network distinguishes between familiar terrain with a low reconstruction error and unfamiliar or hazardous terrain with a higher reconstruction error. This approach facilitates the detection of anomalies, allowing a legged robot to navigate more effectively through challenging terrain. We run real-world experiments on the ANYmal legged robot, both indoors and outdoors, to validate our proposed method. The code is open-source, while video demonstrations can be found on our website: https://rpl-cs-ucl.github.io/STEPP
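The reconstruction-based scoring described above boils down to a reconstruction-error test on a feature vector. The snippet below is a toy illustration of that idea, with an identity placeholder standing in for the trained encoder-decoder MLP and an assumed 384-dimensional feature; it is not the authors' model.

```python
import numpy as np

def reconstruction_error(feature, reconstruct):
    # `reconstruct` stands for the encoder-decoder MLP trained only on terrain the
    # human demonstrator actually walked on; familiar terrain reconstructs well,
    # so a low error indicates traversable terrain and a high error an anomaly.
    recon = reconstruct(feature)
    return float(np.mean((feature - recon) ** 2))

familiar = np.ones(384)                                  # placeholder DINOv2-style feature
print(reconstruction_error(familiar, lambda f: f))       # identity model -> zero error
```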
|
|
11:20-11:25, Paper TuBT22.2 | Add to My Program |
GND: Global Navigation Dataset with Multi-Modal Perception and Multi-Category Traversability in Outdoor Campus Environments |
|
Liang, Jing | University of Maryland |
Das, Dibyendu | George Mason University |
Song, Daeun | George Mason University |
Shuvo, Md Nahid Hasan | George Mason University |
Durrani, Mohammad | University of Maryland |
Taranath, Karthik | University of Maryland |
Penskiy, Ivan | University of Maryland, College Park |
Manocha, Dinesh | University of Maryland |
Xiao, Xuesu | George Mason University |
Keywords: Data Sets for Robot Learning, Motion and Path Planning, Integrated Planning and Learning
Abstract: Navigating large-scale outdoor environments, e.g., university campuses, requires complex reasoning in terms of geometric structures, environmental semantics, and terrain characteristics using onboard sensors like LiDARs and cameras. Although existing mobile robots can navigate such environments using pre-defined, high-precision maps based on hand-crafted rules catered for every environment, they lack commonsense reasoning capabilities that most humans possess when navigating unknown outdoor spaces. To equip robots with such capabilities, we propose a large-scale Global Navigation Dataset, GND, which incorporates multi-modal sensory data (3D LiDAR point clouds and RGB and 360° images) and multi-category traversability maps (pedestrian walkways, vehicle roadways, stairs, off-road terrain, and obstacles) from ten university campuses. We also present a set of novel use cases of GND to showcase its utility to enable global robot navigation. GND's website can be found at https://cs.gmu.edu/~xiao/Research/GND/.
|
|
11:25-11:30, Paper TuBT22.3 | Add to My Program |
VLM-GroNav: Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments |
|
Elnoor, Mohamed | University of Maryland |
Kulathun Mudiyanselage, Kasun Weerakoon | University of Maryland, College Park |
Seneviratne, Gershom Devake | University of Maryland, College Park |
Xian, Ruiqi | University of Maryland-College Park |
Guan, Tianrui | University of Maryland |
M Jaffar, Mohamed Khalid | University of Maryland, College Park |
Rajagopal, Vignesh | University of Maryland, College Park |
Manocha, Dinesh | University of Maryland |
Keywords: Vision-Based Navigation, Motion and Path Planning, Perception-Action Coupling
Abstract: We present a novel autonomous robot navigation algorithm for outdoor environments that is capable of handling diverse terrain traversability conditions. Our approach, VLM-GroNav, uses vision-language models (VLMs) and integrates them with physical grounding that is used to assess intrinsic terrain properties such as deformability and slipperiness. We use proprioceptive-based sensing, which provides direct measurements of these physical properties, and enhances the overall semantic understanding of the terrains. Our formulation uses in-context learning to ground the VLM’s semantic understanding with proprioceptive data to allow dynamic updates of traversability estimates based on the robot’s real-time physical interactions with the environment. We use the updated traversability estimations to inform both the local and global planners for real-time trajectory replanning. We validate our method on a legged robot (Ghost Vision 60) and a wheeled robot (Clearpath Husky), in diverse real-world outdoor environments with different deformable and slippery terrains. In practice, we observe significant improvements over state-of-the-art methods by up to 50% increase in navigation success rate.
|
|
11:30-11:35, Paper TuBT22.4 | Add to My Program |
TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals |
|
Podgorski, Stefan | University of Adelaide |
Garg, Sourav | University of Adelaide |
Hosseinzadeh, Mehdi | The Australian Institute for Machine Learning (AIML) -- the Univ |
Mares, Lachlan | University of Adelaide |
Dayoub, Feras | The University of Adelaide |
Reid, Ian | University of Adelaide |
Keywords: Learning from Demonstration, Machine Learning for Robot Control, Vision-Based Navigation
Abstract: Visual navigation in robotics traditionally relies on globally-consistent 3D maps or learned controllers, which can be computationally expensive and difficult to generalize across diverse environments. In this work, we present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation without requiring 3D maps or pre-trained controllers. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level subgoals while avoiding obstacles. We address key limitations of previous methods by continuously predicting local trajectory rollout using monocular depth and traversability estimation, and incorporating an auto-switching mechanism that falls back to a baseline controller when necessary. The system operates using foundational models, ensuring open-set applicability without the need for domain-specific fine-tuning. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability. Our approach outperforms existing state-of-the-art methods, offering a more adaptable and effective solution for visual navigation in open-set environments. The source code is made publicly available: https://github.com/podgorki/TANGO
|
|
11:35-11:40, Paper TuBT22.5 | Add to My Program |
NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments |
|
Wang, Haitong | University of Toronto |
Tan, Aaron Hao | University of Toronto |
Nejat, Goldie | University of Toronto |
Keywords: Vision-Based Navigation, Search and Rescue Robots, AI-Enabled Robotics
Abstract: In unknown cluttered and dynamic environments such as disaster scenes, mobile robots need to perform target-driven navigation in order to find people or objects of interest, where the only information provided about these targets are images of the individual targets. In this paper, we introduce NavFormer, a novel end-to-end transformer architecture developed for robot target-driven navigation in unknown and dynamic environments. NavFormer leverages the strengths of both 1) transformers for sequential data processing and 2) self-supervised learning (SSL) for visual representation to reason about spatial layouts and to perform collision-avoidance in dynamic settings. The architecture uniquely combines dual-visual encoders consisting of a static encoder for extracting invariant environment features for spatial reasoning, and a general encoder for dynamic obstacle avoidance. The primary robot navigation task is decomposed into two sub-tasks for training: single robot exploration and multi-robot collision avoidance. We perform cross-task training to enable the transfer of learned skills to the complex primary navigation task. Simulated experiments demonstrate that NavFormer can effectively navigate a mobile robot in diverse unknown environments, outperforming existing state-of-the-art methods. A comprehensive ablation study is performed to evaluate the impact of the main design choices of NavFormer. Furthermore, real-world experiments validate the generalizability of NavFormer.
|
|
11:40-11:45, Paper TuBT22.6 | Add to My Program |
Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy |
|
Kim, Yunho | Neuromeka |
Lee, Jeong Hyun | Korea Advanced Institute of Science & Technology (KAIST) |
Lee, Choongin | KAIST |
Mun, Juhyeok | Korea Advanced Institute of Science and Technology |
Youm, Donghoon | Korea Advanced Institute of Science and Technology |
Park, Jeongsoo | KAIST |
Hwangbo, Jemin | Korean Advanced Institute of Science and Technology |
Keywords: Vision-Based Navigation, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA.
|
|
TuBT23 Regular Session, 412 |
Add to My Program |
Autonomous Vehicle Navigation 2 |
|
|
Chair: Zhou, Lifeng | Drexel University |
Co-Chair: Yang, Yi | Beijing Institute of Technology |
|
11:15-11:20, Paper TuBT23.1 | Add to My Program |
Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent |
|
Chen, Yuxiao | Nvidia Research |
Tonkens, Sander | University of California - San Diego |
Pavone, Marco | Stanford University |
Keywords: Autonomous Vehicle Navigation, AI-Based Methods, Intelligent Transportation Systems
Abstract: Adept traffic models are critical to both real-time prediction/planning and closed-loop simulation for autonomous vehicles (AV). Key design objectives include accuracy, diverse multimodal behaviors, interpretability, and compatibility with other modules in the autonomy stack, e.g., the downstream planner. We present Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and categorical predictions with clear semantic meanings (lane modes, homotopies, etc.). The most outstanding feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variables from the ground truth during training and avoids mode collapse completely. As a result, CTT can generate diverse behaviors conditioned on different semantic modes while significantly beating SOTA on prediction accuracy. In addition, CTT's ability to input and output tokens enables direct integration with semantic-heavy modules such as behavior planners and language models, bridging the tokenized representation and the continuous trajectory space.
|
|
11:20-11:25, Paper TuBT23.2 | Add to My Program |
LACNS: Language-Assisted Continuous Navigation in Structured Spaces |
|
Peng, RuTong | Beijing Institute of Technology |
Zhang, Yiqing | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Current autonomous driving technology typically relies on high-definition (HD) maps to ensure safe, reliable, and accurate navigation in urban environments. While these maps provide essential road information, their creation and maintenance are costly, limiting their widespread application. To mitigate this reliance, we propose a novel system, Language-Assisted Continuous Navigation in Structured Spaces (LACNS). LACNS facilitates autonomous driving without the need for HD maps by integrating vehicle-centric local perception with real-time language commands from map software or human navigators. LACNS begins by generating a BEV map using the vehicle's front-facing camera. Simultaneously, a pre-trained Visual Language Model (VLM) detects intersections from the camera images, assigning a score to each. Road elements are then extracted from the BEV map and combined with the intersection scores to identify potential navigation frontiers. Language instructions, processed by a pre-trained Large Language Model (LLM), are used to select the most suitable frontier. Finally, the chosen frontier and BEV map are employed to plan a safe route and control the vehicle's movement. We evaluated LACNS using the Carla simulator to validate its navigation capabilities in continuous spaces. Initial tests involved navigating through four intersections with varying directional commands, where LACNS demonstrated high and consistent success rates across multiple trials. Further simulations in real-time navigation scenarios revealed that LACNS consistently maintained a high success rate across three progressively challenging routes. These results highlight the effectiveness of our novel HD map-independent autonomous driving navigation method.
|
|
11:25-11:30, Paper TuBT23.3 | Add to My Program |
Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset and Consensus-Based Models |
|
Wu, Fangyu | UC Berkeley |
Wang, Dequan | UC Berkeley |
Hwang, Minjune | Stanford University |
Hao, Chenhui | UC Berkeley |
Lu, Jiawei | UC Berkeley |
Zhang, Jiamu | UC Berkeley |
Chou, Christopher | UC Berkeley |
Darrell, Trevor | UC Berkeley |
Bayen, Alexandre | UC Berkeley |
Keywords: Autonomous Vehicle Navigation, Collision Avoidance, Distributed Robot Systems
Abstract: A significant portion of roads, particularly in densely populated developing countries, lacks explicitly defined right-of-way rules. These understructured roads pose substantial challenges for autonomous vehicle motion planning, where efficient and safe navigation relies on understanding decentralized human coordination for collision avoidance. This coordination, often termed "social driving etiquette," remains underexplored due to limited open-source empirical data and suitable modeling frameworks. In this paper, we present a novel dataset and modeling framework designed to study motion planning in these understructured environments. The dataset includes 20 aerial videos of representative scenarios, an image dataset for training vehicle detection models, and a development kit for vehicle trajectory estimation. We demonstrate that a consensus-based modeling approach can effectively explain the emergence of priority orders observed in our dataset, and is therefore a viable framework for decentralized collision avoidance planning.
|
|
11:30-11:35, Paper TuBT23.4 | Add to My Program |
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction |
|
Suhwan, Choi | Maum.AI |
Cho, Yongjun | MaumAI |
Kim, Minchan | Seoul National University |
Jung, Jaeyoon | MAUM.AI |
Joe, Myunchul | MAUM.AI, Inc |
Park, Yu Been | MaumAI |
Kim, Minseo | Yonsei University |
Kim, Sungwoong | Yonsei University |
Lee, Sungjae | Yonsei University |
Park, Whiseong | Maumai |
Chung, Jiwan | Yonsei University |
Yu, Youngjae | Yonsei University |
Keywords: Autonomous Vehicle Navigation, Human-Robot Collaboration, Imitation Learning
Abstract: Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. An intuitive way for humans to express these goals is through abstract cues like verbal commands or rough sketches. Such human guidance may lack details or be noisy. Nonetheless, we expect robots to navigate as intended. For robots to interpret and execute these abstract instructions in line with human expectations, they must share a common understanding of basic navigation concepts with humans. To this end, we introduce CANVAS, a novel framework that combines visual and linguistic instructions for commonsense-aware navigation. Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior. We present COMMAND, a comprehensive dataset with human-annotated navigation results, spanning over 48 hours and 219 km, designed to train commonsense-aware navigation systems in simulated environments. Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments, demonstrating superior performance with noisy instructions. Notably, in the orchard environment, where ROS NavStack records a 0% total success rate, CANVAS achieves a total success rate of 67%. CANVAS also closely aligns with human demonstrations and commonsense constraints, even in unseen environments. Furthermore, real-world deployment of CANVAS showcases impressive Sim2Real transfer with a total success rate of 69%, highlighting the potential of learning from human demonstrations in simulated environments for real-world applications.
|
|
11:35-11:40, Paper TuBT23.5 | Add to My Program |
BETTY Dataset: A Multi-Modal Dataset for Full-Stack Autonomy |
|
Nye, Micah | Carnegie Mellon University |
Raji, Ayoub | University of Modena and Reggio Emilia |
Saba, Andrew | Carnegie Mellon University |
Erlich, Eidan | University of Waterloo |
Exley, Robert | University of Pittsburgh |
Goyal, Aragya | University of Pittsburgh |
Matros, Alexander | University of Waterloo |
Misra, Ritesh | University of Pittsburgh |
Sivaprakasam, Matthew | Carnegie Mellon University |
Marko, Bertogna | Unimore |
Ramanan, Deva | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Autonomous Vehicle Navigation, Data Sets for Robot Learning, Dynamics
Abstract: We present the BETTY dataset, a large-scale, multi-modal dataset collected on several autonomous racing vehicles, targeting supervised and self-supervised state estimation, dynamics modeling, motion forecasting, perception, and more. Existing large-scale datasets, especially autonomous vehicle datasets, focus primarily on supervised perception, planning, and motion forecasting tasks. Our work enables multi-modal, data-driven methods by including all sensor inputs and the outputs from the software stack, along with semantic metadata and ground truth information. The dataset encompasses 4 years of data, currently comprising over 13 hours and 32 TB, collected on autonomous racing vehicle platforms. This data spans 6 diverse racing environments, including high-speed oval courses, for single and multi-agent algorithm evaluation in feature-sparse scenarios, as well as high-speed road courses with high longitudinal and lateral accelerations and tight, GPS-denied environments. It captures highly dynamic states, such as 63 m/s crashes, loss of tire traction, and operation at the limit of stability. By offering a large breadth of cross-modal and dynamic data, the BETTY dataset enables the training and testing of full autonomy stack pipelines, pushing the performance of all algorithms to the limits. The current dataset is available at https://pitt-mit-iac.github.io/betty-dataset/.
|
|
11:40-11:45, Paper TuBT23.6 | Add to My Program |
LiCS: Navigation Using Learned-Imitation on Cluttered Space |
|
Damanik, Joshua Julian | KAIST |
Jung, Jaewon | KAIST |
Deresa, Chala Adane | KAIST |
Choi, Han-Lim | KAIST |
Keywords: Imitation Learning, Constrained Motion Planning, Autonomous Vehicle Navigation
Abstract: This work proposes a robust and fast navigation system for UGVs (Unmanned Ground Vehicles) in narrow indoor environments using 2D LiDAR. We use behavior cloning with a Transformer neural network to learn an optimization-based baseline algorithm. We inject Gaussian noise during expert demonstration to increase the robustness of the learned policy and evaluate the performance of LiCS using both simulation and hardware experiments. It outperforms all other baselines in terms of navigation performance, achieving a 100% success rate in highly cluttered simulated environments. During the hardware experiments, LiCS maintains safe navigation at a maximum speed of 1.5 m/s.
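The noise-injection trick mentioned above (perturb the expert's applied command, but label each observation with the clean expert action) can be sketched as follows. The gym-style environment interface and parameter values are assumptions for illustration, not the authors' setup.

```python
import numpy as np

def collect_noisy_demo(env, expert, sigma=0.1, steps=500):
    # Roll out the optimization-based expert while perturbing the applied command,
    # so the dataset also covers states slightly off the expert's trajectory and
    # the cloned policy learns how to recover from them.
    data = []
    obs = env.reset()
    for _ in range(steps):
        u_expert = expert(obs)
        u_applied = u_expert + np.random.normal(0.0, sigma, size=u_expert.shape)
        data.append((obs, u_expert))               # label with the clean action
        obs, _, done, _ = env.step(u_applied)      # gym-style interface assumed
        if done:
            obs = env.reset()
    return data
```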
|
|
TuBT24 Regular Session, 401 |
Add to My Program |
Rehabiliation and Ergonomics |
|
|
Chair: Clark, Janelle | UMBC |
Co-Chair: Mimnaugh, Katherine J. | University of Oulu |
|
11:15-11:20, Paper TuBT24.1 | Add to My Program |
Towards Industry 5.0 - a Neuroergonomic Workstation for Human-Centered Cobot-Supported Manual Assembly Process |
|
Knezevic, Nikola | University of Belgrade - School of Electrical Engineering |
Savić, Andrej | University of Belgrade, School of Electrical Engineering |
Gordić, Zaviša | University of Belgrade, School of Electrical Engineering |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Jovanovic, Kosta | University of Belgrade |
Keywords: Human-Centered Automation, Human-Robot Collaboration, Assembly
Abstract: This paper introduces the concept of a neuroergonomic workcell with its essential components (psychological and physical assessment, and non-physical, physical, and strategic support) for improving the well-being and productivity of workers at their workplaces. A proof-of-concept neuroergonomic, human-centered workstation is demonstrated in a real factory environment for a typical laborious industrial task: assembly. The pilot workstation introduces a fully portable, non-invasive EEG-based assessment of users' mental workload, a non-obtrusive human-machine interface, illustrative graphical assembly guidelines, a cobot assistant, and an intelligent task scheduler. The subjects' performance and workload were assessed using a NASA-TLX questionnaire, three EEG workload indices, hand gesture detection accuracy, the number of errors, and task duration. We identified a notable correlation between multiple EEG workload indices and NASA-TLX scores. The new workstation boosts productivity with better performance and fewer errors on the assembly line while reducing mental demand. Its modular design ensures easy integration and adaptation into factory settings, optimizing manual assembly processes.
|
|
11:20-11:25, Paper TuBT24.2 | Add to My Program |
Remote Extended Reality with Markerless Motion Tracking for Sitting Posture Training |
|
Ai, Xupeng | Columbia University in the City of New York |
Agrawal, Sunil | Columbia University |
Keywords: Virtual Reality and Interfaces, Human Performance Augmentation, Rehabilitation Robotics
Abstract: Dynamic postural control during sitting is essential for functional mobility and daily activities. Extended reality (XR) presents a promising solution for posture training in addressing conventional training limitations related to patient accessibility and ecological validity. We developed a remote XR rehabilitation system with markerless motion tracking for sitting posture training. Forty-two healthy subjects participated in this proof-of-concept pilot study. Each subject completed 24 rounds of multi-directional reach tasks using the system and 24 rounds without it. Motion data were collected via online meetings using the built-in camera of the user's laptop. Functional reach test scores were analyzed to assess the impact of the system on motor performance. Four standard questionnaires were used to assess the effects of this system on presence, simulator sickness, engagement, and enjoyment. Our results indicate that the remote XR training system significantly improved functional reach performance and proved highly effective for telerehabilitation. XR interaction also enhanced training engagement and enjoyment. By bridging the spatial gap between patients and therapists, this system enables personalized and engaging home-based intervention. Additionally, it facilitates more natural movements by eliminating body marker constraints and laboratory limitations. This study should serve as a stepping stone to advancing novel remote XR rehabilitation systems.
|
|
11:25-11:30, Paper TuBT24.3 | Add to My Program |
Error-Subspace Transform Kalman Filter Based Real-Time Gait Prediction for Rehabilitation Exoskeletons |
|
Zeng, Haozhou | Zhejiang University |
Li, Jiaxing | Zhejiang University |
Gu, Yu | Zhejiang University |
Yi, Jingang | Rutgers University |
Ouyang, Xiaoping | Zhejiang University |
Liu, Tao | Zhejiang University |
Keywords: Prosthetics and Exoskeletons, Physical Human-Robot Interaction, Rehabilitation Robotics
Abstract: With the rapid development of rehabilitation robotics, there is a pressing need for efficient and accurate gait prediction methods. However, due to the complexity and variability of individual gait characteristics and external disturbances, accurately predicting gait in real time remains a significant challenge. This paper proposes an innovative Bayesian-inference-based method for real-time gait prediction while a subject walks with a lower-limb exoskeleton. Periodic gait information is represented using von Mises basis functions, and the weight parameters serve as real-time updated state variables. The error-subspace transform Kalman filter (ESTKF) is applied for gait trajectory prediction. A fully connected neural network (FCNN) is used to estimate the walking speeds in real time based on predicted trajectories. Comparative experiments based on an open-source database demonstrate the advantages of the ESTKF compared with other Bayesian filters. Walking experiments are conducted to estimate phase and speed in real time, and to predict the joint angle, total joint torque, and lower-limb muscle surface electromyography (sEMG) values. Experimental results validate the method’s prediction performance across different speeds and demonstrate its resilience to external interference.
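To make the representation concrete, the sketch below writes a periodic joint trajectory as a weighted sum of von Mises basis functions of the gait phase, with the weight vector playing the role of the state that a filter such as the ESTKF would update. The number of basis functions, the concentration parameter kappa, and the weight values are illustrative assumptions, not those of the paper.

```python
import numpy as np
from scipy.special import i0   # modified Bessel function of the first kind, order 0

def von_mises_basis(phase, centers, kappa=10.0):
    # Periodic basis functions of the gait phase; a trajectory is their weighted sum.
    return np.exp(kappa * np.cos(phase - centers)) / (2.0 * np.pi * i0(kappa))

centers = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)  # 8 basis centers over one cycle
weights = np.zeros(8)                      # weight vector = the filter's state variables
weights[2] = 0.6                           # placeholder weight values

phase = 0.3 * 2.0 * np.pi                  # current gait phase in radians
joint_angle = float(weights @ von_mises_basis(phase, centers))
print(joint_angle)                         # predicted joint angle at this phase
```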
|
|
11:30-11:35, Paper TuBT24.4 | Add to My Program |
A Comparative Study of Pulley and Bowden Transmissions in a Novel Cable-Driven Exosuit, the Stillsuit |
|
Jammot, Matthias | ETH Zürich |
Esser, Adrian | ETH Zurich |
Wolf, Peter | ETH Zurich, Institute of Robotics and Intelligent Systems |
Riener, Robert | ETH Zurich |
Basla, Chiara | ETH Zurich |
Keywords: Prosthetics and Exoskeletons, Tendon/Wire Mechanism, Mechanism Design
Abstract: Cable-driven exosuits assist users in ambulatory activities by transmitting assistive torques from motors to the actuated joints. State-of-the-art exosuits typically use Bowden cable transmissions, despite their limited efficiencies (40–60%) and non-linear response along curved paths. This paper evaluates the efficiency and responsiveness of a new cable-pulley transmission compared to a Bowden transmission, using both steel and Dyneema cables. The analysis includes three experiments: a test bench simulating a curved transmission path, followed by a static and dynamic experiment where six unimpaired participants donned an exosuit featuring both transmissions across the hips and knees. Our findings demonstrate that the pulley transmission consistently outperformed the Bowden’s efficiency by absolute margins of 18.77 ± 7.29% using a steel cable and by 40.60 ± 6.76% using a Dyneema cable across all experiments. Additionally, the steel cable was on average 19.19 ± 5.29% more efficient than the Dyneema cable in the pulley transmission and 41.02 ± 6.34% in the Bowden tube. These results led to the development of the Stillsuit, a novel lower-limb cable-driven exosuit that uses a pulley transmission and steel cable. The Stillsuit sets a new benchmark for exosuits with 87.56 ± 3.92% transmission efficiency, generating similar biological torques to those found in literature (16.4% and 19.0% of the biological knee and hip torques, respectively) while using smaller motors, resulting in a lighter actuation unit (1.92 kg).
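Transmission efficiency as compared above is the ratio of mechanical work delivered at the output (joint-side) cable to the work supplied at the input (motor-side) cable. The helper below states that ratio explicitly; the variable names and the assumption of a fixed sample period dt are illustrative, not taken from the paper.

```python
import numpy as np

def transmission_efficiency(f_out, v_out, f_in, v_in, dt):
    # Ratio of work delivered at the joint-side cable to work supplied at the
    # motor-side cable over one assistance cycle, from sampled force/velocity data.
    work_out = float(np.sum(f_out * v_out)) * dt
    work_in = float(np.sum(f_in * v_in)) * dt
    return work_out / work_in
```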
|
|
11:35-11:40, Paper TuBT24.5 | Add to My Program |
Rapid Online Learning of Hip Exoskeleton Assistance Preferences |
|
Ramella, Giulia | EPFL |
Ijspeert, Auke | EPFL |
Bouri, Mohamed | EPFL |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Physically Assistive Devices
Abstract: Hip exoskeletons are increasing in popularity due to their effectiveness across various scenarios and their ability to adapt to different users. However, personalizing the assistance often requires lengthy tuning procedures and computationally intensive algorithms, and most existing methods do not incorporate user feedback. In this work, we propose a novel approach for rapidly learning users' preferences for hip exoskeleton assistance. We perform pairwise comparisons of distinct randomly generated assistive profiles, and collect participants' preferences through active querying. Users' feedback is integrated into a preference-learning algorithm that updates its belief, learns a user-dependent reward function, and changes the assistive torque profiles accordingly. Results from eight healthy subjects display distinct preferred torque profiles, and users' choices remain consistent when compared to a perturbed profile. A comprehensive evaluation of users' preferences reveals a close relationship with individual walking strategies. The tested torque profiles do not disrupt kinematic joint synergies, and participants favor assistive torques that are synchronized with their movements, resulting in lower negative power from the device. This straightforward approach enables the rapid learning of users' preferences and rewards, grounding future studies on reward-based human-exoskeleton interaction.
|
|
11:40-11:45, Paper TuBT24.6 | Add to My Program |
A Human-In-The-Loop Simulation Framework for Evaluating Control Strategies in Gait Assistive Robots |
|
Wang, Yifan | Nanyang Technological University |
Chan, Sherwin Stephen | Nanyang Technological University |
Lei, Mingyuan | Nanyang Technological University |
Lim, Lek Syn | Nanyang Technological University |
Johan, Henry | Nanyang Technological University |
Zuo, Bingran | Nanyang Technological University |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Physical Human-Robot Interaction, Human Factors and Human-in-the-Loop, Simulation and Animation
Abstract: As the global population ages, effective rehabilitation and mobility aids will become increasingly critical. Gait assistive robots are promising solutions, but designing adaptable controllers for various impairments poses a significant challenge. This paper presents a Human-In-The-Loop (HITL) simulation framework tailored specifically for gait-assistive robots, addressing unique challenges posed by passive support systems. We incorporate a realistic physical human-robot interaction (pHRI) model to enable a quantitative evaluation of control strategies, highlighting the performance of a speed-adaptive controller compared to a conventional PID controller in maintaining compliance and reducing gait distortion. We assess the accuracy of the simulated interactions against that of the real-world data and reveal discrepancies in the adaptation strategies taken and their effect on the human's gait. This work provides valuable insights into optimizing and evaluating system parameters, emphasizing the potential of HITL simulation as a versatile tool for developing and fine-tuning personalized control policies for various users.
|
|
TuLB2R Poster Session, Hall A1/A2 |
Add to My Program |
Late Breaking Results 2 |
|
|
|
14:45-15:15, Paper TuLB2R.1 | Add to My Program |
Voice Control in Mobile Home Assistive Robots for Older Adults with MCI |
|
Fayzullin, Timofey | University of Massachusetts Lowell |
Reig, Samantha | University of Massachusetts Lowell |
Cabrera, Maria Eugenia | University of Massachusetts Lowell |
Keywords: Human-Centered Automation, Domestic Robotics, Human-Robot Collaboration
Abstract: Assistive technologies for home environments are a growing field. The wide availability of voice assistants that allow people to complete tasks in a “hands-busy, eyes-busy” manner is especially useful for people who have trouble using computers and smartphones. As robotic platforms advance and become more affordable, they are expected to make a similar impact, supporting people with tasks that require a physical embodiment. While both of these technological components show great potential to alleviate the burden of certain at-home tasks, there are technical and usability-related challenges that need to be explored in order to develop and deploy them so that the populations that need the most assistance can benefit on the shortest timelines. Older adults, particularly those with mild cognitive impairment (MCI), may be more likely to benefit from voice as an input and from the physical capabilities of home robots, but less able or willing to learn and reliably use the input structures required to successfully interact with them. In this research, we equip a Hello Robot Stretch with a voice control interface and explore older adults’ responses to the robot’s completion of various tasks. We analyze the results of a formative survey involving videos of the interface and discuss future plans for a between-subject study to compare different voice command structures.
|
|
14:45-15:15, Paper TuLB2R.2 | Add to My Program |
Late Breaking Results on Generation of Metric-Semantic Scene Graphs with Factors Based on Graph Neural Networks |
|
Millan Romera, Jose Andres | University of Luxembourg |
Bavle, Hriday | University of Luxembourg |
Shaheer, Muhammad | University of Luxembourg |
Oswald, Martin R. | University of Amsterdam |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Voos, Holger | University of Luxembourg |
Keywords: SLAM, Learning Categories and Concepts, Deep Learning for Visual Perception
Abstract: Robots can improve their environmental reasoning by identifying high-level spatial concepts like rooms and walls from observable planes. These concepts enhance:
- geometrical consistency in mapping [1],
- situational awareness in complex environments [2,3],
- explainability through human-aligned knowledge [4],
- richer input for tasks like planning [5].
Current SLAM systems rely on ad-hoc methods for detecting and defining spatial concepts, limiting the recognition of complex structures. The proposed approach uses two GNNs for the generation of:
- a semantic graph: edge classification + stable community detection + epistemic noise,
- a geometric graph: inference of geometric attributes (centroids),
- a factor graph: uncertainty-aware geometric factors,
- real-time complex concept integration in SLAM.
|
|
14:45-15:15, Paper TuLB2R.3 | Add to My Program |
Beyond Violation Types: The Influence of Dispositional Trust on Human-Robot Collaboration after Trust Violations |
|
Mélot-Chesnel, Joséphine | Utrecht University |
Nagy, Timea | Utrecht University |
de Graaf, Maartje | Utrecht University |
Keywords: Acceptability and Trust, Design and Human Factors, Human-Robot Collaboration
Abstract: Robots, like humans, make mistakes that undermine trust, potentially threatening successful collaboration in human-robot teams. Our lab study evaluates three trust repair strategies -- apology, denial, and compensation -- following two types of trust violations -- competence-based and integrity-based. Results show that integrity-based violations cause greater damage to trust and social perception of the robot than competence-based violations, leading to reduced collaboration. This underlines the importance of prioritizing the resolution of integrity-based violations in a robot’s programming, as competence-based violations may be more easily overlooked by users. Building on findings from our previous research, dispositional trust significantly influenced the effectiveness of repair strategies, whereas trust violation types did not. Notably, denial was the most effective strategy for repairing performance trust among individuals with high dispositional trust, while apologies were most effective for repairing honesty trust in individuals with low dispositional trust. These combined results highlight the importance of adaptive trust repair approaches that account for individual human differences, rather than violation types, to foster continued successful human-robot collaboration.
|
|
14:45-15:15, Paper TuLB2R.4 | Add to My Program |
Compute Reliability Maps and Optimize Reliable Workspace for Redundant Robots Experiencing Locked Joint Failures |
|
Xing, Yuchen | University of Kentucky |
Xie, Biyun | University of Kentucky |
Keywords: Redundant Robots, Kinematics, Failure Detection and Recovery
Abstract: A robot’s failure-tolerant workspace is defined as the reachable workspace both before and after an arbitrary joint is locked at an arbitrary angle during a motion. Once the task is located within the failure-tolerant workspace, task completion can be easily guaranteed, and only a simple failure-recovery strategy is needed to complete the remaining path. The existing methods for computing and optimizing failure-tolerant workspace have the following limitations. First, these methods can only be used to compute the 3D positional failure-tolerant workspace, but not the 3D orientational failure-tolerant workspace. Second, all the existing studies optimized the failure-tolerant workspace only considering its volume, while other morphological features of the failure-tolerant workspace are completely ignored. To address these limitations, this work develops a method to compute the 6D reliability map based on the joint failure probabilities and then optimize the reliable workspace of a robot considering its shape and size. The main contributions of this work are as follows: (1) The concept of reliability maps is introduced to show the reliability of various regions within the workspace. (2) A computationally efficient method is developed to compute the 6D reliability map, including both position and orientation workspace. (3) A new metric is proposed to evaluate and optimize the computed reliable workspace, considering both its volume and connectivity.
|
|
14:45-15:15, Paper TuLB2R.5 | Add to My Program |
FACETS: Efficient Constrained Iterative NAS for Object Detection in Robotic Perception |
|
Tran, Tony | University of Houston |
Hu, Bin | University of Houston |
Keywords: Computer Vision for Automation, Embedded Systems for Robotic and Automation, Deep Learning for Visual Perception
Abstract: Neural Architecture Search (NAS) for object detection frameworks typically involves multiple interdependent modules, contributing to a vast search space and high computational cost. Joint optimization across modules is both challenging and expensive, especially under device-specific constraints. We propose FACETS—eFficient Architecture Search for Constrained itEraTive Search—a unified NAS approach that refines all modules iteratively. FACETS alternates between fixing one module’s architecture and optimizing the others, leveraging feedback across cycles to reduce the search space while maintaining inter-module dependencies. This strategy allows FACETS to satisfy computational constraints and reach high-quality solutions more efficiently. In experiments, FACETS finds architectures with up to 4.75% higher accuracy twice as fast as progressive methods and yields a refined search space with candidates up to 27% more accurate than global search and 5% over progressive search via random sampling. Its generality and efficiency make FACETS especially promising for real-time, resource-constrained robotic perception systems across diverse platforms.
|
|
14:45-15:15, Paper TuLB2R.6 | Add to My Program |
Multiple Sensors Based Perception and Limitation for Autonomous Ships |
|
Choi, Hyun-Taek | Korea Research Institute of Ships and Oceans Engineering |
Park, Jeonghong | KRISO |
Choi, Jinwoo | KRISO, Korea Research Institute of Ships & Ocean Engineering |
Kang, Minju | Korea Research Institute of Ships & Ocean Engineering |
Choo, Ki-Beom | Korea Research Institute of Ships & Ocean Engineering(kriso) |
Kim, Jinwhan | KAIST |
Keywords: Intelligent Transportation Systems, Sensor Fusion, Vision-Based Navigation
Abstract: With advances in probabilistic inference, AI, and high-performance computing, autonomous navigation has made great strides. Yet, ships remain uniquely challenged by dynamic ocean environments and long offshore operations. These conditions place high demands on maritime situational awareness systems, which must ensure robust object detection and tracking while remaining cost-effective. This paper presents a multi-object tracking system specifically designed for autonomous ships, grounded in a deep understanding of their distinctive operational characteristics. The proposed architecture integrates multiple detection and navigation sensors with an AI-based perception algorithm and a probabilistic data fusion framework. With modularity and scalability at its core, the system processes sensor data through two specialized stages to ensure optimal performance. Its effectiveness is demonstrated through real-world trials under realistic maritime conditions. Beyond system performance, this study highlights the limitations of relying solely on local awareness systems in commercial maritime operations. It underscores the necessity for a standardized, systematic approach to generating and sharing positional data among vessels—via Next-Generation AIS and inter-ship communication. Drawing inspiration from structured environments in robotics, we suggest that such an approach offers a practical path to balancing operational safety with economic viability.
|
|
14:45-15:15, Paper TuLB2R.7 | Add to My Program |
UAV Path Planning for Multi-Target Wildlife Localization |
|
Mohammadi, Mahsa | Northern Arizona University |
Shafer, Michael | Northern Arizona University |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Field Robots
Abstract: In recent years, Uncrewed Aerial Vehicles (UAVs) have become essential tools for the high-precision and rapid localization of radio-tagged wildlife, supporting ecological monitoring and conservation. This research presents an online planning algorithm that intelligently selects waypoints for UAVs tasked with localizing multiple animals equipped with Very High Frequency (VHF) tags. The UAV collects bearing measurements at each waypoint and predicts future waypoints to minimize localization uncertainty and total mission time. The algorithm accounts for constraints such as communication range and maintaining a safe distance from animals to reduce disturbance. Performance is evaluated using Monte Carlo simulations, measuring uncertainty reduction, localization accuracy, and mission time. Simulation results demonstrate the potential of our method for fast and precise multi-target tracking under field conditions. This work contributes to the field of autonomous UAV operations, offering valuable insights into efficient robotic path planning and optimization for wildlife tracking applications.
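For illustration only (not the authors' algorithm), the waypoint-selection idea above can be sketched as a greedy search that picks the candidate whose predicted bearing measurement most shrinks the target covariance while respecting a standoff distance; the function names, 2D target state, and noise level are our assumptions:

```python
# Illustrative sketch: greedy next-waypoint selection for bearing-only localization.
import numpy as np

def bearing_jacobian(waypoint, target_est):
    """Jacobian of the bearing measurement w.r.t. the 2D target position."""
    dx, dy = target_est - waypoint
    r2 = dx**2 + dy**2
    return np.array([[-dy / r2, dx / r2]])

def next_waypoint(candidates, target_est, P, sigma_bearing=np.deg2rad(5.0),
                  min_standoff=30.0):
    """Pick the candidate waypoint whose bearing measurement most reduces
    the trace of the target covariance P, subject to a standoff distance."""
    best, best_trace = None, np.inf
    for w in candidates:
        if np.linalg.norm(target_est - w) < min_standoff:
            continue  # keep a safe distance from the animal
        H = bearing_jacobian(w, target_est)
        S = H @ P @ H.T + sigma_bearing**2
        K = P @ H.T / S
        P_post = (np.eye(2) - K @ H) @ P  # EKF-style covariance update
        if np.trace(P_post) < best_trace:
            best, best_trace = w, np.trace(P_post)
    return best
```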
|
|
14:45-15:15, Paper TuLB2R.8 | Add to My Program |
Development of an Automated Jig-Assisted Calibration Framework for UAV IMU Alignment in 3D SLAM |
|
Jung, Eui-Jung | Korea Institute of Robot and Convergence |
Kim, Tae-Hwan | Korea Institute of Robotics & Technology Convergence |
Jeon, Kwang Woo | Korea Institute of Robotics and Technology Convergence |
Chung, Hyun-Joon | Korea Institute of Robotics and Technology Convergence |
Keywords: Calibration and Identification, Kinematics, Engineering for Robotic Systems
Abstract: Accurate coordinate alignment of Inertial Measurement Units (IMUs) is essential for ensuring robust and reliable 3D Simultaneous Localization and Mapping (SLAM) performance in Unmanned Aerial Vehicles (UAVs). However, manual calibration methods are often time-consuming, error-prone, and lack repeatability. This study presents a jig-assisted calibration strategy that enables precise and reproducible alignment of IMU coordinate systems to enhance 3D SLAM accuracy in UAVs. The proposed method involves the design and implementation of a mechanical jig that constrains the UAV and reference sensor into a known geometric relationship. By leveraging rigid-body transformations and optimization techniques, the system effectively estimates and compensates for misalignments between the IMU and the UAV’s navigation frame. Experimental validation was conducted using simulated environments. The results demonstrate significant improvements in SLAM trajectory consistency, localization precision, and map reconstruction fidelity. This jig-based calibration strategy offers a practical, scalable, and low-cost solution for UAV systems requiring high-accuracy sensor integration in GPS-denied environments.
|
|
14:45-15:15, Paper TuLB2R.9 | Add to My Program |
A High-Payload Robotic Hopper Powered by Bidirectional Thrusters |
|
Li, Song | City University of Hong Kong |
Bai, Songnan | City University of Hong Kong |
Jia, Ruihan | City University of Hong Kong |
Cai, Yixi | KTH Royal Institute of Technology |
Ding, Runze | City University of Hongkong |
Shi, Yu | City University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Chirarattananon, Pakpong | City University of Hong Kong |
Keywords: Aerial Systems: Mechanics and Control, Legged Robots, Biologically-Inspired Robots
Abstract: Mobile robots have revolutionized various fields, offering solutions for manipulation, environmental monitoring, and exploration. However, payload capacity remains a limitation. This paper presents a novel thrust-based robotic hopper capable of carrying payloads up to 9 times its own weight while maintaining agile mobility over less structured terrain. The 220-gram robot carries up to 2 kg while hopping—a capability that bridges the gap between high-payload ground robots and agile aerial platforms. Key advancements that enable this high-payload capacity include the integration of bidirectional thrusters, allowing for both upward and downward thrust generation to enhance energy management while hopping. Additionally, we present a refined model of dynamics that accounts for heavy payload conditions, particularly for large jumps. To address the increased computational demands, we employ a neural network compression technique, ensuring real-time onboard control. The robot's capabilities are demonstrated through a series of experiments, including autonomous navigation while carrying a 730-g LiDAR payload. This showcases the robot's potential for applications such as mobile sensing and mapping in challenging environments.
|
|
14:45-15:15, Paper TuLB2R.10 | Add to My Program |
SADRNN: Spatial Attention Distribution with Recurrent Neural Network for Imitation Learning |
|
Hara, Takumi | Kyoto University |
Sato, Takashi | Kyoto University |
Awano, Hiromitsu | Kyoto University |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: This paper proposes Spatial Attention Distribution with Recurrent Neural Network (SADRNN), a novel attention mechanism for neural networks in imitation learning, which models spatial attention using a two-dimensional Gaussian distribution. Unlike conventional methods that rely on discrete spatial attention points, SADRNN generates richer spatial attention distributions by extending the attention mechanism to continuous 2D Gaussian distributions. Simulation results using a Franka Panda robotic arm demonstrate that SADRNN achieves a success rate of 88.2% in a lifting task with fewer attention distributions than existing methods. Furthermore, SADRNN reduces the number of MAC (Multiply–Accumulate) operations during inference by 28.9% compared to baseline approaches. Owing to its use of flexible 2D Gaussian distributions, SADRNN is also capable of producing spatial attention with varying widths and orientations, including diagonal patterns, at inference time.
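As a rough illustration of the continuous attention described above (a sketch under our own assumptions, not the SADRNN implementation), a single oriented 2D Gaussian can be rendered into a normalized attention map as follows:

```python
# Minimal sketch of an oriented 2D Gaussian spatial-attention map.
import numpy as np

def gaussian_attention_map(h, w, mu, sigmas, theta):
    """Attention weights over an h x w feature map from one 2D Gaussian.

    mu     : (2,) center in pixel coordinates (x, y)
    sigmas : (2,) standard deviations along the Gaussian's principal axes
    theta  : rotation of the principal axes (radians), enabling diagonal patterns
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    cov = R @ np.diag(np.square(sigmas)) @ R.T
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)             # (h, w, 2)
    m = np.einsum('hwi,ij,hwj->hw', d, np.linalg.inv(cov), d)   # squared Mahalanobis distance
    att = np.exp(-0.5 * m)
    return att / att.sum()  # normalized attention weights
```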
|
|
14:45-15:15, Paper TuLB2R.11 | Add to My Program |
Collaborative Large Language Models for Task Allocation in Construction Robots |
|
Prieto, Samuel A. | New York University Abu Dhabi |
Garcia de Soto, Borja | New York University Abu Dhabi |
Keywords: Robotics and Automation in Construction, Building Automation, AI-Enabled Robotics
Abstract: The construction industry, with its dynamic environments and resource constraints, poses unique challenges for task allocation and management, particularly in integrating robotic automation. This paper introduces a multi-agent framework leveraging Large Language Models (LLMs) to address these challenges. The proposed approach utilizes two collaborative LLM agents, a Planner agent responsible for producing task instructions and a Supervisor agent that validates these instructions for feasibility, aiming to reduce hallucinations and planning mistakes. Through simulated experiments, we demonstrate the system’s effectiveness in coordinating multiple robots for a high-level construction task, with essential considerations for robot battery management and logical task sequencing. Our results indicate that the multi-agent LLM system improves task reliability and accuracy compared to a single-agent configuration. This study advances the application of LLMs in construction robotics and underscores the potential for multi-agent LLM architectures to enhance task allocation in complex, automated workflows.
|
|
14:45-15:15, Paper TuLB2R.12 | Add to My Program |
Gait Optimization for Legged Systems through Mixed Distribution Cross-Entropy Optimization |
|
Tsikelis, Ioannis | Inria |
Chatzilygeroudis, Konstantinos | University of Patras |
Keywords: Legged Robots, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Legged robots are well-suited for real-world tasks, offering strong load-bearing capabilities, autonomy, and mobility over rough terrain. They strike an effective balance between agility and payload capacity, performing well in diverse environments. However, planning and optimizing gaits and contact sequences is highly challenging due to the complexity of their dynamics and the many optimization variables involved. Traditional trajectory optimization methods address this by minimizing cost functions and discovering contact sequences automatically. Yet, they often struggle with the resulting nonlinear and hard-to-solve problems. To overcome these issues, we propose CrEGOpt, a bi-level optimization method that merges trajectory optimization with black-box optimization. At the higher level, CrEGOpt uses the Mixed Distribution Cross-Entropy Method to optimize gait sequences and phase durations, easing the lower-level trajectory optimization. This approach enables fast solutions for complex gait problems. Experiments in simulation show that CrEGOpt can optimize gaits for bipedal, quadrupedal, and hexapod robots in under 10 seconds. This novel bi-level framework opens new possibilities for efficient, automatic contact scheduling in legged robotics.
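A minimal sketch of a mixed-distribution cross-entropy loop in the spirit of the higher level described above; the cost function, duration bounds, and parameterization are placeholders rather than CrEGOpt's actual design:

```python
# Illustrative cross-entropy method over discrete contact modes and continuous durations.
import numpy as np

def cem_mixed(cost, n_phases, n_contacts, iters=20, pop=64, n_elite=8, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.full((n_phases, n_contacts), 1.0 / n_contacts)   # categorical per phase
    mu, sigma = np.full(n_phases, 0.3), np.full(n_phases, 0.1)  # phase durations [s]
    for _ in range(iters):
        gaits = np.stack([[rng.choice(n_contacts, p=probs[i]) for i in range(n_phases)]
                          for _ in range(pop)])                  # (pop, n_phases) contact modes
        durs = np.clip(rng.normal(mu, sigma, size=(pop, n_phases)), 0.05, 1.0)
        costs = np.array([cost(g, d) for g, d in zip(gaits, durs)])
        elite = np.argsort(costs)[:n_elite]                      # lowest-cost samples
        for i in range(n_phases):                                # refit the categoricals
            counts = np.bincount(gaits[elite, i], minlength=n_contacts) + 1e-3
            probs[i] = counts / counts.sum()
        mu = durs[elite].mean(axis=0)                            # refit the Gaussians
        sigma = durs[elite].std(axis=0) + 1e-3
    return gaits[elite[0]], durs[elite[0]]                       # best gait and durations
```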
|
|
14:45-15:15, Paper TuLB2R.13 | Add to My Program |
Model-Structured Neural Networks to Model and Control Robots |
|
Piccinini, Mattia | Technical University of Munich |
Mungiello, Aniello | University of Naples Federico II |
Betz, Johannes | Technical University of Munich |
Papini, Gastone Pietro | University of Trento |
Keywords: AI-Enabled Robotics, Machine Learning for Robot Control, Neural and Fuzzy Control
Abstract: This poster presents model-structured neural networks (MS-NNs), an emerging approach to model and control robotic systems by embedding physical laws directly into neural architectures. MS-NNs extend traditional physics-based models through targeted neuralization, preserving fundamental principles such as the superposition of forces while enhancing the learning potential. We introduce nnodely, an open-source PyTorch library designed to streamline the development of MS-NNs, while supporting other emerging paradigms like physics-informed neural networks (PINNs) and neural ordinary differential equations, making these tools accessible even to non-experts. The library consolidates our work in MS-NNs for autonomous driving, and allows for rapid prototyping of interpretable, domain-informed models. We demonstrate the effectiveness of MS-NNs through two applications: learning the longitudinal dynamics of a RoboRacer car and controlling the steering of a full-scale autonomous race car. In both cases, MS-NNs outperform conventional neural networks in generalization to unseen scenarios with limited training data, showcasing their potential for real-world applications in robotics.
|
|
14:45-15:15, Paper TuLB2R.14 | Add to My Program |
Towards Efficient Navigation in Dense Forests Using Reinforcement Learning and 3D LiDAR |
|
Cancelliere, Francesco | University of Catania |
Keywords: Field Robots, Reinforcement Learning, Vision-Based Navigation
Abstract: In search and rescue operations, a fleet of small ground robots could cover more area than one larger robot, at the expense of battery autonomy. To mitigate this issue, we developed a compact end-to-end navigation network that efficiently navigates dense forest environments. The network takes relatively sparse 3D LiDAR data (100x20 points) as input and is trained with reinforcement learning in MIDGARD, a photorealistic simulation environment, exploiting curriculum learning to improve training efficiency.
|
|
14:45-15:15, Paper TuLB2R.15 | Add to My Program |
LEGS-POMDP: A Language and Gesture Conditioned Framework for Robot Object Search |
|
He, Ivy | Brown University |
Tellex, Stefanie | Brown |
Liu, Jason Xinyu | Brown University |
Keywords: Gesture, Posture and Facial Expressions, Multi-Modal Perception for HRI, Social HRI
Abstract: Natural human-robot interaction depends on accurately interpreting multimodal instructions, particularly those combining language and gesture. However, existing approaches often assume constrained settings and struggle with ambiguity in human intent. We present LEGS-POMDP, a probabilistic framework for robot object search that integrates multimodal instructions into a joint observation model within a Partially Observable Markov Decision Process. The robot maintains a belief over both object pose and identity, updating it through visual grounding, language input, and a robust gesture cone representation that fuses multiple pose vectors to infer pointing direction. In simulation across environments of varying ambiguity, fusing gesture and language achieved a 76.6% task success rate, significantly outperforming language-only (53.3%), gesture-only (46.7%), and no-instruction (20.0%) baselines. These results demonstrate that interpreting joint observations enables more accurate belief updates, reduces ambiguity, and improves decision-making for goal-directed search. We also deploy the system on the Spot robot in a 20×20 ft real-world indoor environment, where the robot successfully interprets human instructions to navigate, identify, and localize target objects among distractors—confirming the system’s robustness and generalizability beyond tabletop settings.
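To make the joint observation model concrete, here is a hedged sketch (ours, not the LEGS-POMDP code) of a Bayes belief update that fuses a language likelihood with a soft gesture-cone likelihood over candidate object locations:

```python
# Minimal sketch of fusing language and gesture observations into one belief update.
import numpy as np

def update_belief(belief, lang_like, gesture_like):
    """Bayes update; belief and both per-cell likelihood arrays share one shape.
    lang_like    : P(utterance | object at cell)
    gesture_like : P(pointing cone covers cell | object at cell)
    """
    posterior = belief * lang_like * gesture_like  # joint observation model
    return posterior / posterior.sum()

def gesture_cone_likelihood(cells, origin, direction, half_angle):
    """Soft likelihood that each 2D cell lies inside the pointing cone."""
    v = cells - origin
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    d = direction / np.linalg.norm(direction)
    ang = np.arccos(np.clip(v @ d, -1.0, 1.0))
    return np.exp(-0.5 * (ang / half_angle) ** 2)
```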
|
|
14:45-15:15, Paper TuLB2R.16 | Add to My Program |
Learning by Teaching: Enhancing Mathematical Skills by Teaching a Robot |
|
Boucenna, Sofiane | CNRS - Cergy-Pontoise University |
Keywords: Education Robotics, Human-Robot Collaboration
Abstract: In this study, we explore the use of human-robot interaction to strengthen students' mathematical skills. We develop an active pedagogical approach based on the principle of learning by teaching, where the student teaches a robot to solve mathematical exercises. By guiding the robot through the solutions, the student consolidates their own knowledge and reinforces their learning. The experiment was conducted with around ten participants and evaluated using pre- and post-tests as well as a questionnaire measuring the quality of the interaction. Preliminary results indicate an improvement in students' performance on the concepts covered, as well as overall positive feedback regarding the robot-assisted learning experience.
|
|
14:45-15:15, Paper TuLB2R.17 | Add to My Program |
Unreal Engine Based SONAR Simulator for Marine Robot Research |
|
Kweon, Heekyu | Kyungpook National University |
Joe, Hangil | Kyungpook National University |
Keywords: Marine Robotics, Field Robots, Software, Middleware and Programming Environments
Abstract: There is a consensus that using simulations can reduce the costs associated with developing sensing technology in marine robotics. Recently developed marine simulators have focused on enhancing sensor simulation quality by increasing the rendering quality of environmental details. In this context, we developed a SONAR simulator based on Unreal Engine 5 (UE5), which supports high-fidelity environments and sensor data. UE5 offers high-quality rendered images and facilitates the implementation of large virtual environments. The robust built-in functions in UE5 also facilitated the development of a custom sensor module. We developed a Forward-Looking SONAR (FLS) sensor module using a render-based approach, which exploits high-level rendering quality. For the FLS, we capture the depth image and normal map simultaneously and convert them into range, azimuth, and intensity values to generate a sonar image. To improve the quality of the simulated sensor data, appropriate noise models were applied to the sensor module. To validate the sensor modules, we compared the simulated sensor data with real-world data. We also conducted an experiment and presented an application using ROS. For future work, we are developing additional sensor modules and robot models, aiming to utilize them for various field robot research purposes.
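The render-based conversion described above can be sketched roughly as follows; the binning scheme, intensity model, and speckle noise here are illustrative assumptions, not the simulator's actual implementation:

```python
# Illustrative conversion of a depth image and normal map into a range/azimuth sonar image.
import numpy as np

def depth_normal_to_sonar(depth, normals, rays, r_max=30.0, n_r=512, n_az=256,
                          fov_az=np.deg2rad(60), seed=0):
    """depth: (H, W) metres; normals: (H, W, 3) unit; rays: (H, W, 3) unit view rays."""
    rng = np.random.default_rng(seed)
    az = np.arctan2(rays[..., 0], rays[..., 2])                    # horizontal angle per pixel
    r_idx = np.clip((depth / r_max * n_r).astype(int), 0, n_r - 1)
    a_idx = np.clip(((az / fov_az + 0.5) * n_az).astype(int), 0, n_az - 1)
    # Lambertian-like return: stronger when the surface faces the sensor
    intensity = np.clip(-np.einsum('hwc,hwc->hw', normals, rays), 0.0, 1.0)
    img = np.zeros((n_r, n_az))
    np.add.at(img, (r_idx.ravel(), a_idx.ravel()), intensity.ravel())
    img *= rng.rayleigh(scale=1.0, size=img.shape) * 0.2 + 0.9     # simple multiplicative speckle
    return img / max(img.max(), 1e-6)
```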
|
|
14:45-15:15, Paper TuLB2R.18 | Add to My Program |
Unsupervised Identification of Motion Primitives Based on Physical Constraints in Contact-Rich Tasks |
|
Oishi, Ryoga | Saitama University |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Learning from Demonstration, Task and Motion Planning, Motion and Path Planning
Abstract: This work proposes an automatic method for identifying motion primitives from human motion data by focusing on physical constraints. Motion primitives—fundamental units of movement in robotics—are particularly useful for dividing contact-rich tasks according to physical constraints. The challenge lies in detecting these constraints as they change direction and presence over time. The solution identifies coordinate systems that represent these continuously changing constraints, extracts features along these axes, and clusters them using BOCPD with PCA and GMM. Applied to human insertion tasks, this approach achieved over 90% identification accuracy across different insertion directions, advancing imitation learning for complex robotic tasks.
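A minimal sketch (ours, under stated assumptions) of the clustering back-end mentioned above, reducing per-window features with PCA and labeling them with a GMM; the BOCPD change-point stage and the constraint-aligned coordinate systems are omitted:

```python
# Illustrative PCA + GMM labeling of per-window contact features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def label_primitives(features, n_components=3, n_primitives=4, seed=0):
    """features: (n_windows, n_dims) force/velocity statistics per time window."""
    z = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    gmm = GaussianMixture(n_components=n_primitives, random_state=seed).fit(z)
    return gmm.predict(z)  # one candidate motion-primitive label per window
```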
|
|
14:45-15:15, Paper TuLB2R.19 | Add to My Program |
Design and Control of a Bench-Top Left Ventricular Simulator Reproducing Physiological Pressure-Volume Dynamics |
|
Lee, Junheon | Seoul National University |
Kim, Jiyeop | Seoul National University |
Han, Amy Kyungwon | Seoul National University |
Keywords: Medical Robots and Systems, Mechanism Design, Assembly
Abstract: Cardiovascular disease accounts for nearly 25% of all U.S. deaths and was responsible for 18.6 million global fatalities in 2019. In response, many cardiac assistive devices have been developed, yet their reliance on animal testing raises ethical concerns, and is costly, time-consuming, and frequently faces translational failures. Consequently, ventricular simulators have emerged as promising alternatives. However, due to their method of direct fluid injection into the heart phantom, simultaneously increasing volume and pressure, traditional simulators fail to replicate physiological pressure–volume (PV) dynamics of a functioning heart. We introduce a novel ventricular simulator designed with precise actuation control mechanisms to accurately mimic in vivo PV loops by controlling both pressure and fluid flow externally to the heart phantom. This integrated system features a silicone heart phantom, a syringe motor pump for controlled fluid actuation, a compliance chamber to modulate pressure dynamics, and solenoid valves for directional flow control. The optimized system design facilitates precise parameter adjustments, replicating PV loops of both normal and pathological heart conditions. Our simulator offers a promising alternative for cardiac function evaluation and medical device testing, potentially reducing animal testing reliance and enhancing clinical outcomes.
|
|
14:45-15:15, Paper TuLB2R.20 | Add to My Program |
GLASS: A Global and Local Adaptive Segmentation System for Robust Transparent Object Segmentation |
|
Cheng, Yanchun | National University of Singapore |
Wang, Rundong | National University of Singapore |
Lee, Christina Dao Wen | National University of Singapore |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Recognition
Abstract: Transparent object segmentation is a critical yet challenging task in computer vision and autonomous perception systems, primarily due to the absence of distinct textures and ambiguous boundaries. To address these difficulties, we present GLASS, a Global and Local Adaptive Segmentation System that effectively integrates convolutional and Transformer-based representations through a hierarchical dual-path architecture. This design enables the model to simultaneously capture fine-grained local cues and long-range contextual dependencies. To further enhance the model’s discriminative capability around complex contours and transparent edges, we introduce two key components: the Multi-Scale Detail Enhancement Module (MS-DEM), which enriches local structural features at multiple resolutions, and the Bi-directional Adaptive Feature Fusion Module (Bi-AFFM), which dynamically balances global and local representations through a content-aware fusion mechanism. Extensive experiments on two challenging benchmarks, Trans10K-V2 and GSD, demonstrate that GLASS not only achieves state-of-the-art segmentation performance but also exhibits strong robustness to various real-world scenes and efficiency in inference, making it a practical solution for downstream tasks involving transparent object understanding in robotics and augmented reality.
|
|
14:45-15:15, Paper TuLB2R.21 | Add to My Program |
Long-Horizon Locomotion and Manipulation on a Quadrupedal Robot with Multiple LLM Agents |
|
Ouyang, Yutao | Tsinghua University |
Li, Jinhan | Tsinghua University |
Li, Yunfei | Tsinghua University |
Li, Zhongyu | University of California, Berkeley |
Yu, Chao | Tsinghua University |
Sreenath, Koushil | University of California, Berkeley |
Wu, Yi | Tsinghua University |
Keywords: Agent-Based Systems, AI-Enabled Robotics, Deep Learning Methods
Abstract: We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner that sketches a plan, a parameter calculator that predicts arguments in the plan, a code generator that converts the plan into executable robot code, and a replanner that handles execution failures or human interventions. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help.
|
|
14:45-15:15, Paper TuLB2R.22 | Add to My Program |
Optimal Sonar Image Selection through Information Analysis for AUV Applications |
|
Sim, Hyeonmin | Kyungpook National University |
Joe, Hangil | Kyungpook National University |
Keywords: Marine Robotics, Field Robots, Software, Middleware and Programming Environments
Abstract: This paper proposes an optimal image selection method for forward-looking sonar (FLS) images aimed at improving odometry estimation for AUV applications. Selecting optimal sonar images is crucial for odometry estimation and seafloor mapping in autonomous underwater vehicles (AUVs) using image registration. However, the inherent characteristics of sonar imagery—such as low signal-to-noise ratio (SNR), shadow effects, and perceptual ambiguity—limit the effectiveness of conventional, feature-based keyframe extraction techniques commonly used in optical imagery. To address these challenges, we propose representing sonar images as information matrices by dividing each image into grids, calculating the entropy of each grid region, arranging the entropy values row-wise, and constructing the information matrices using the Kullback-Leibler (KL) divergence. The change between frames is quantified using cosine similarity between the information matrices, and frames are selected as optimal images when the similarity falls below a predefined threshold. The proposed method was validated through indoor water tank experiments using three objects with different shapes and materials, which demonstrated a clear decreasing trend in similarity. Additionally, both simulation-based evaluations and real-world seafloor data were used to estimate odometry, confirming the practical applicability and effectiveness of the proposed method.
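For orientation, the selection rule described above can be sketched as per-grid entropy followed by a cosine-similarity threshold test; the grid size, histogram bins, threshold, and the omission of the KL-divergence construction are our simplifications, not the paper's settings:

```python
# Illustrative sketch: per-grid entropy matrix and cosine-similarity keyframe test.
import numpy as np

def grid_entropy(img, rows=8, cols=8, bins=32):
    """Shannon entropy of pixel intensities in each grid cell (img assumed in [0, 1])."""
    h, w = img.shape
    ent = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = img[i*h//rows:(i+1)*h//rows, j*w//cols:(j+1)*w//cols]
            p, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = p / max(p.sum(), 1)
            p = p[p > 0]
            ent[i, j] = -(p * np.log(p)).sum()
    return ent

def is_keyframe(info_prev, info_curr, threshold=0.95):
    """Select the current frame when cosine similarity to the last keyframe drops."""
    a, b = info_prev.ravel(), info_curr.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return cos < threshold
```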
|
|
14:45-15:15, Paper TuLB2R.23 | Add to My Program |
Approximating Human Joint Motion Using a Simple Four-Bar Linkage Mechanism |
|
Kang, Jihun | UST Graduate School |
Park, Seungtae | Korea National University of Science and Technology |
Shin, Wonseok | Korea Institute of Industrial Technology(KITECH) |
Kwon, Suncheol | KITECH |
Ahn, Bummo | Korea Institute of Industrial Technology |
Keywords: Legged Robots, Actuation and Joint Mechanisms, Biologically-Inspired Robots
Abstract: Recent studies have explored human-like walking to achieve natural, efficient, and adaptive motion in robots. Among various mechanical approaches, linkage-based mechanisms have been utilized to approximate human joint trajectories due to their advantages in efficient force transmission, reduced control complexity, and high repeatability under external disturbances. Notable examples include Hoeken’s linkage, the Stephenson III six-bar linkage, and the Theo Jansen mechanism. Although multi-bar linkage systems with more than four bars generate human-like joint trajectories, their integration of multiple joints—such as the hip and knee or the knee and ankle—into a single structure complicates independent control and optimization of each joint. Therefore, this study proposes the four-bar linkage, the simplest form of a linkage mechanism, to approximate each human joint motion individually. Four-bar link lengths were optimized to align the joint trajectory with the target angular range. To improve alignment with the desired trajectory, the slope of the joint angle profile was adjusted during simulation. The effectiveness of this adjustment was then verified through experimental comparison with the simulated motion. This study demonstrates the potential of the proposed four-bar linkage system as a structurally efficient solution for approximating human gait, with applicability to robotic lower-limb systems including assistive devices and bipedal robots.
|
|
14:45-15:15, Paper TuLB2R.24 | Add to My Program |
Adaptive Gaits of TARS: A Sci-Fi Inspired Multimodal Robot |
|
Sripada, Aditya | Nimlbe.ai |
Keywords: Legged Robots, Humanoid and Bipedal Locomotion, Actuation and Joint Mechanisms
Abstract: Science fiction has long inspired robotic design, often depicting unconventional locomotion strategies that challenge real-world feasibility. One such example is TARS, the quadrilateral robot from Interstellar, which demonstrates three distinct gaits: (1) a bipedal gait where the inner and outer legs move together, (2) an upright quadrupedal gait, and (3) a high-speed rolling gait. This study explores how these three gaits could enhance mobility in space exploration and whether a robotic system can be designed to transition between them seamlessly. Two robotic designs have been developed to investigate the feasibility of these gaits: TARS v1.0, a 1:20 scale prototype that successfully implements a passive dynamic-inspired bipedal gait, and TARS v2.0, a four-legged version capable of transitioning between all three gaits. Although conventional legged systems such as quadrupeds and bipeds have often been touted as the primary solution for unmanned space exploration that overcomes the limitations of wheeled systems, this narrow perspective might overlook alternative robotic forms and locomotion strategies. This study explores the feasibility of unconventional gaits through rapid prototyping (e.g., 3D printing). Preliminary results confirm the viability of the bipedal gait, with ongoing work refining the quadrupedal and rolling modes. This research advances multimodal locomotion and expands robotic design beyond conventional paradigms.
|
|
14:45-15:15, Paper TuLB2R.25 | Add to My Program |
Shape Memory Alloy Based Implantable Device for Heart Failure |
|
Kim, Yongjin | Seoul National University |
Lee, Jeonghyeon | Seoul National University |
Kim, Jeongwon | Seoul National University College of Medicine, SMG-SNU Boramae M |
Cho, Domin | Seoul National University |
Oh, Se Jin | Seoul National University College of Medicine, SMG-SNU Boramae M |
Han, Amy Kyungwon | Seoul National University |
Keywords: Medical Robots and Systems, Soft Robot Applications, Health Care Management
Abstract: Heart failure (HF), particularly acute decompensated heart failure (ADHF), remains a major clinical burden with limited long-term therapeutic strategies. Pharmacological treatments primarily aim to reduce congestion, yet many patients are discharged with residual symptoms, resulting in poor prognosis. Existing device-based therapies, such as balloon catheters, offer temporary relief but require hospitalization, involve direct blood contact, and are unsuitable for long-term use. This study introduces a Shape-Memory-Alloy (SMA) based implantable device designed to externally compress the superior vena cava (SVC) to modulate venous return and alleviate ADHF symptoms. The biocompatible SMA actuators dynamically adjust the cross-sectional area of the SVC without direct blood contact, thereby minimizing thrombosis and endothelial injury. Benchtop testing demonstrated effective blood flow regulation with a 68% reduction in flow rate while maintaining a safe surface temperature (<40 °C). In-vivo validation using a porcine HF model showed a significant reduction in left ventricular preload (18.6% decrease in end-diastolic volume), confirmed by a leftward shift in the pressure-volume loop. These results highlight the potential of this SMA-based device as a minimally invasive, long-term solution for treating ADHF.
|
|
14:45-15:15, Paper TuLB2R.26 | Add to My Program |
Pose-Free Dynamic Reconstruction from Sparse Multi-Camera Sequences |
|
Noh, Jeongho | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Deep Learning for Visual Perception, Multi-Robot SLAM, Semantic Scene Understanding
Abstract: Accurate 3D reconstruction and camera pose tracking are critical for robotic applications such as navigation and manipulation. However, existing methods often degrade in dynamic environments where moving objects introduce pose estimation errors or rely on restrictive assumptions such as fixed extrinsic calibration. While some recent approaches address dynamic scenes, most rely on monocular inputs, which can lead to geometric ambiguity. To address these challenges, we propose a pose-free, vision-based pipeline for reconstructing dynamic scenes from sparse multi-camera sequences. Our approach leverages the foundation model MASt3R to align RGB images across time and viewpoints without requiring ground-truth camera parameters or depth measurements. To address dynamic environments, we utilize the discrepancy between ego-motion flow and optical flow to segment moving regions and decouple the point cloud into static and dynamic components. We then refine camera poses for static regions using ICP, improving the overall alignment. We validate our pipeline on a custom indoor dataset, where time-synchronized frames from dual cameras are selectively sampled to form sparse sequences reflecting real-world motion. The results demonstrate accurate camera pose estimation and clean scene reconstruction, even in the presence of motion blur and occlusions. This pipeline offers practical feasibility for multi-agent systems in real-world settings where calibration and synchronization are limited.
|
|
14:45-15:15, Paper TuLB2R.27 | Add to My Program |
Embodied Artificial Intelligence for Autonomous Perception, Planning and Control of Soft Grippers |
|
Rao, Varshith | Singapore University of Technology and Design |
Truong, Van Tien | School of Materials Science and Engineering, Nanyang Technologic |
Tan, John | SUTD |
Stalin, Thileepan | Singapore University of Technology and Design |
Kanhere, Elgar | Singapore University of Technology and Design |
Aurobindo, Aaditya | Singapore University of Technology and Design |
Valdivia y Alvarado, Pablo | Singapore University of Technology and Design, MIT |
Keywords: Soft Robot Applications, AI-Enabled Robotics, Perception for Grasping and Manipulation
Abstract: Integrating AI with soft robotics offers a novel approach to enhancing robotic planning and execution. This work explores embodied AI for generalized planning in manipulation tasks using soft grippers. By leveraging large language models (LLMs) and open-set object detection, robots can autonomously generate high-level plans from natural language input. The system features two robotic arms, each with a soft gripper and RGBD camera. Built upon the ROS 1 framework, it includes modules for perception, planning, and control. The perception model performs 2D object detection and 3D localization using image grounding. Multistep perception expands the workspace and improves accuracy, while depth post-processing like Navier-Stokes inpainting preserves edge continuity. User input is processed via NLP to generate a concise prompt. This, along with an annotated image, is sent to a remote vision-enabled LLM, which returns high-level parameterized plans. These are translated into 6-axis joint trajectories using inverse kinematics and Cartesian path planning with MoveIt. Each joint position trajectory is sent to the corresponding joint controller through the robotic arm’s ROS driver. By linking high-level AI planning with real-world execution via soft robotics, this approach minimizes the need for extensive low-level motion planning, boosts automation in dynamic environments, and offers a more user-friendly experience.
|
|
14:45-15:15, Paper TuLB2R.28 | Add to My Program |
Perching and Grasping by a Dual-Purpose Hybrid Gripper for Aerial Robots |
|
Han, Ziyin | University of Illinois at Urbana Champaign |
Cheng, Sheng | University of Illinois Urbana-Champaign |
Gao, Junjie | University of Illinois Urbana-Champaign |
Pham, Tien Hung | Japan Advanced Institute of Science and Technology |
Ho, Van | Japan Advanced Institute of Science and Technology |
Hovakimyan, Naira | University of Illinois at Urbana-Champaign |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Soft Sensors and Actuators
Abstract: This poster presents a dual-function hybrid gripper that enables aerial robots to have both perching and grasping functionalities, addressing the major limitations of previous solutions that focused on either functionality exclusively. Traditional perching end-effectors lack manipulation capabilities, while aerial grasping platforms often struggle with stable perching. Our approach integrates both functionalities, allowing the drone to land on various shapes of poles and irregular landing spots using a grasped connector. The hybrid gripper preserves the compliance of soft material for adaptive contact while incorporating a rigid shell to ensure sufficient load-bearing capacity. Experimental validation demonstrates successful perching on complex geometries and securely grasping diverse objects, highlighting the advantages of a unified perching-grasping mechanism.
|
|
14:45-15:15, Paper TuLB2R.29 | Add to My Program |
Development of SAM^3: A Cable-Suspended Aerial Manipulator with a Moving Mass |
|
Yun, Tae Ho | KAIST |
Han, Wonjun | Korea Advanced Institute of Science and Technology |
Kim, Min Jun | KAIST |
Keywords: Aerial Systems: Mechanics and Control, Mechanism Design, Aerial Systems: Applications
Abstract: Cable-suspended aerial manipulators (SAMs), whose bases are supported by tensioned cables and whose manipulators perform designated tasks, overcome the limitations of many aerial manipulators, such as restricted operation time and limited payload capacity, and enable aerial manipulation tasks. However, most SAM designs are confined to rotor-based actuation for oscillation damping and attitude control, which generates loud noise and strong airflow during operation, hindering integration into daily environments. Moreover, rotor-based systems require continuous base moments depending on attitude, posing challenges in terms of energy efficiency. This study proposes a SAM system that incorporates internal actuation mechanisms, namely moving masses and reaction wheels, to mitigate the drawbacks of rotor-based designs and reduce the energy required for attitude control.
|
|
14:45-15:15, Paper TuLB2R.30 | Add to My Program |
Enhanced TEB Planner for Narrow Passages |
|
Lee, Hahjin | Ewha Womans University |
Kim, Young J. | Ewha Womans University |
Keywords: Motion and Path Planning
Abstract: Traditional path-planning methods often struggle to find optimal trajectories in narrow passages, leading to inefficient or failed maneuvers. In this study, we propose an enhanced Timed-Elastic-Band (TEB) algorithm that significantly improves obstacle avoidance in constrained environments. Our pipeline begins by randomly sampling robot configurations, which are filtered based on their proximity to the global path, generated using the A* algorithm. Next, we evaluate the clearance of each sampled configuration from nearby obstacles. If the configuration has low clearance from obstacles, it proceeds to the Medial Axis Searcher; otherwise, it is directly optimized through hyper-graph optimization. For configurations located near obstacles, a grid search is conducted to identify the local maximum in the distance map, effectively locating a sample on the medial axis of the free space. The medial-axis configurations are then used as positional constraints in a hyper-graph optimization framework for the final local path. We evaluate our pipeline in a simulated environment across multiple narrow passage scenarios. Results show that the proposed algorithm improves the success rate by 27.5% with only an 8.1% increase in average planning time compared to TEB, demonstrating its effectiveness.
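A small sketch (assumptions ours) of the Medial Axis Searcher step described above: from a low-clearance sample, grid-search the local maximum of the obstacle distance map to land on the medial axis of the free space:

```python
# Illustrative grid search for a locally maximal-clearance cell around a sample.
import numpy as np

def snap_to_medial_axis(dist_map, start, radius=10):
    """dist_map: 2D array of distances to the nearest obstacle (cells);
    start: (row, col) sample; returns the cell with locally maximal clearance."""
    r0, c0 = start
    h, w = dist_map.shape
    rs = slice(max(r0 - radius, 0), min(r0 + radius + 1, h))
    cs = slice(max(c0 - radius, 0), min(c0 + radius + 1, w))
    window = dist_map[rs, cs]
    dr, dc = np.unravel_index(np.argmax(window), window.shape)
    return rs.start + dr, cs.start + dc
```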
|
|
14:45-15:15, Paper TuLB2R.31 | Add to My Program |
Assistive Grasp: Smart Object Tracking for Robotic Manipulator |
|
Shahria, Md Tanzil | University of Wisconsin-Milwaukee |
Rahman, Mohammad | University of Wisconsin-Milwaukee |
Keywords: AI-Based Methods, Human-Centered Automation, Computer Vision for Automation
Abstract: In the U.S., over 6.8 million people live with mobility-related disabilities, and nearly one-third of wheelchair users rely on daily assistance for essential tasks. With caregivers typically needed for 4–6 hours a day, families face growing physical and financial strain — all while the caregiver workforce continues to shrink. As a result, assistive technologies have emerged as alternatives, aiming to restore independence for individuals with disabilities. However, many solutions remain underutilized due to their operational complexity. For instance, wheelchair-mounted assistive robotic manipulators offer promising potential for tasks like object grasping. Yet, they demand significant cognitive and physical effort from users, which can be a frustrating and time-consuming process. To address this gap, we introduce a vision-guided system to make this easier. The robot detects and tracks the target object, aligns itself, and leaves the user with a simple task: command the gripper to grasp. Users can still fine-tune the position. The system also adapts to target motion mid-task. The system combines YOLOv8m for detection, DeepSORT for tracking, RealSense depth data for 3D localization, and PID-based planning for smooth control. It achieved an 85% success rate in trials, with a 40% reduction in user control time compared to manual operation. This work contributes toward making assistive manipulation more accessible, efficient, and user-friendly for wheelchair-bound individuals.
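As a hedged illustration of the PID-based planning stage mentioned above (not the authors' controller; gains and interfaces are placeholders), a 3D PID loop driving the end-effector toward the tracked object's position might look like this:

```python
# Illustrative 3D PID controller producing a velocity command toward the target.
import numpy as np

class PID3D:
    def __init__(self, kp=1.2, ki=0.0, kd=0.1, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(3)
        self.prev_err = np.zeros(3)

    def step(self, target_xyz, ee_xyz):
        err = np.asarray(target_xyz) - np.asarray(ee_xyz)   # 3D position error
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv  # velocity command
```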
|
|
TuCT1 Regular Session, 302 |
Add to My Program |
Award Finalists 3 |
|
|
Chair: Vasudevan, Ram | University of Michigan |
Co-Chair: Tapia, Lydia | University of New Mexico |
|
15:15-15:20, Paper TuCT1.1 | Add to My Program |
Individual and Collective Behaviors in Soft Robot Worms Inspired by Living Worm Blobs |
|
Kaeser, Carina | Student |
Kwon, Junghan | Pusan National University |
Challita, Elio | Harvard University |
Tuazon, Harry | Georgia Institute of Technology |
Wood, Robert | Harvard University |
Bhamla, Saad | Georgia Institute of Technology |
Werfel, Justin | Harvard University |
Keywords: Swarm Robotics, Biologically-Inspired Robots, Soft Robot Applications
Abstract: California blackworms constitute a recently identified animal system exhibiting unusual collective behaviors, in which dozens to thousands of worms entangle to form a "blob" capable of actions like locomotion as an aggregate. In this paper we describe a system of pneumatic soft robots inspired by the blackworms, intended for the study of collective behaviors enabled and mediated by such physical entanglement. Both the robots and worms have high aspect ratio (≳1:50), intertwine in complex 3D configurations, operate both in air and underwater, and can locomote both individually and as a collective. We demonstrate and characterize locomotion for both individual robots and entangled blobs, explore the tunability of entanglement strength, and compare these to the analogous versions in living worms. The robots provide a testbed for studying mechanisms underlying behaviors observed in worm blobs, as well as serving as a platform for studies of novel collective behaviors based on physical entanglement.
|
|
15:20-15:25, Paper TuCT1.2 | Add to My Program |
Informed Repurposing of Quadruped Legs for New Tasks |
|
Chen, Fuchen | Arizona State University |
Aukes, Daniel | Arizona State University |
Keywords: Mechanism Design, Legged Robots, Compliant Joints and Mechanisms
Abstract: Redesigning and remanufacturing robots are infeasible for resource-constrained environments like space or undersea. This work thus studies how to evaluate and repurpose existing, complementary, quadruped legs for new tasks. We implement this approach on 15 robot designs generated from combining six pre-selected leg designs. The performance maps for force-based locomotion tasks like pulling, pushing, and carrying objects are constructed via a learned policy that works across all designs and adapts to the limits of each. Performance predictions agree well with real-world validation results. The robot can locomote at 0.5 body lengths per second while exerting a force that is almost 60% of its weight.
|
|
15:25-15:30, Paper TuCT1.3 | Add to My Program |
Intelligent Self-Healing Artificial Muscle: Mechanisms for Damage Detection and Autonomous Repair of Puncture Damage in Soft Robotics |
|
Krings, Ethan | University of Nebraska-Lincoln |
McManigal, Patrick | University of Nebraska-Lincoln |
Markvicka, Eric | University of Nebraska-Lincoln |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Soft robotics are characterized by their high deformability, mechanical robustness, and inherent resistance to damage. These unique properties present exciting new opportunities to enhance both emerging and existing fields such as healthcare, manufacturing, and exploration. However, to function effectively in unstructured environments, these technologies must be able to withstand the same real-world conditions that human skin and other soft biological materials are typically subjected to. Here, we present a novel soft material architecture designed for active detection of material damage and autonomous repair in soft robotic actuators. By integrating liquid metal (LM) microdroplets within a silicone elastomer, the system can detect and localize damage through the formation of conductive pathways that arise from extreme pressure or puncture events. These newly formed conductive networks function as in situ Joule heating elements, facilitating the reprocessing and healing of the material. The architecture allows for the reconfiguration of the newly formed electrical network using high current densities, employing electromigration and thermal mechanisms to restore functionality without manual intervention. This innovative approach not only enhances the resilience and performance of soft materials but also supports a wide range of applications in soft robotics and wearable technologies, where adaptive and autonomous systems are crucial for operation in dynamic and unpredictable environments.
|
|
15:30-15:35, Paper TuCT1.4 | Add to My Program |
SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models |
|
Wu, Yi | Purdue University |
Xiong, Zikang | Deeproute.ai |
Hu, Yiran | Purdue University |
Iyengar, Shreyash Sridhar | Purdue University |
Jiang, Nan | Purdue University |
Bera, Aniket | Purdue University |
Tan, Lin | Purdue University |
Jagannathan, Suresh | Purdue University |
Keywords: AI-Based Methods, Autonomous Agents, Agent-Based Systems
Abstract: Despite significant advancements in large language models (LLMs) that enhance robot agents’ understanding and execution of natural language (NL) commands, ensuring the agents adhere to user-specified constraints remains challenging, particularly for complex commands and long-horizon tasks. To address this challenge, we present three key insights, equivalence voting, constrained decoding, and domain-specific fine-tuning, which significantly enhance LLM planners’ capability in handling complex tasks. Equivalence voting ensures consistency by generating and sampling multiple Linear Temporal Logic (LTL) formulas from NL commands, grouping equivalent LTL formulas, and selecting the majority group of formulas as the final LTL formula. Constrained decoding then uses the generated LTL formula to enforce the autoregressive inference of plans, ensuring the generated plans conform to the LTL. Domain-specific fine-tuning customizes LLMs to produce safe and efficient plans within specific task domains. Our approach, Safe Efficient LLM Planner (SELP), combines these insights to create LLM planners to generate plans adhering to user commands with high confidence. We demonstrate the effectiveness and generalizability of SELP across different robot agents and tasks, including drone navigation and robot manipulation. For drone navigation tasks, SELP outperforms state-of-the-art planners by 10.8% in safety rate (i.e., finishing tasks conforming to NL commands) and by 19.8% in plan efficiency. For robot manipulation tasks, SELP achieves 20.4% improvement in safety rate. Our datasets for evaluating NL-to-LTL and robot task planning will be released in github.com/lt-asset/selp.
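The equivalence-voting step can be illustrated with a short sketch; the equivalence check is a placeholder callback, and nothing here reproduces SELP's actual grouping procedure:

```python
# Illustrative majority voting over equivalence classes of sampled LTL formulas.
def equivalence_vote(formulas, equivalent):
    """formulas: list of LTL strings; equivalent(f, g) -> bool (assumed given,
    e.g. backed by an LTL equivalence checker)."""
    groups = []                      # list of lists of mutually equivalent formulas
    for f in formulas:
        for g in groups:
            if equivalent(f, g[0]):
                g.append(f)
                break
        else:
            groups.append([f])
    best = max(groups, key=len)      # majority equivalence class
    return best[0], len(best) / len(formulas)   # representative formula + vote share
```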
|
|
15:35-15:40, Paper TuCT1.5 | Add to My Program |
Marginalizing and Conditioning Gaussians Onto Linear Approximations of Smooth Manifolds with Applications in Robotics |
|
Guo, Zi Cong | University of Toronto |
Forbes, James Richard | McGill University |
Barfoot, Timothy | University of Toronto |
Keywords: Probability and Statistical Methods, SLAM, Probabilistic Inference
Abstract: We present closed-form expressions for marginalizing and conditioning Gaussians onto linear manifolds, and demonstrate how to apply these expressions to smooth nonlinear manifolds through linearization. Although marginalization and conditioning onto axis-aligned manifolds are well-established procedures, doing so onto non-axis-aligned manifolds is not as well understood. We demonstrate the utility of our expressions through three applications: 1) approximation of the projected normal distribution, where the quality of our linearized approximation increases as problem nonlinearity decreases; 2) covariance extraction in Koopman SLAM, where our covariances are shown to be consistent on a real-world dataset; and 3) covariance extraction in constrained GTSAM, where our covariances are shown to be consistent in simulation.
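For orientation, the textbook closed forms for a linear manifold $\{x : Ax = b\}$, with $A$ full row rank and $x \sim \mathcal{N}(\mu, \Sigma)$, are reproduced below; the paper's expressions for general non-axis-aligned manifolds and their linearized extension to smooth manifolds may be stated differently:

$$
y = A x \sim \mathcal{N}\!\left(A\mu,\; A \Sigma A^{\top}\right) \quad \text{(marginalization onto the manifold coordinates)},
$$
$$
\mu_{x \mid Ax=b} = \mu + \Sigma A^{\top}\!\left(A \Sigma A^{\top}\right)^{-1}\!\left(b - A\mu\right), \qquad
\Sigma_{x \mid Ax=b} = \Sigma - \Sigma A^{\top}\!\left(A \Sigma A^{\top}\right)^{-1}\! A \Sigma .
$$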
|
|
15:40-15:45, Paper TuCT1.6 | Add to My Program |
Dynamic Tube MPC: Learning Error Dynamics with Massively Parallel Simulation for Robust Safety in Practice |
|
Compton, William | California Institute of Technology |
Csomay-Shanklin, Noel | California Institute of Technology |
Johnson, Cole | Georgia Institute of Technology |
Ames, Aaron | California Institute of Technology |
Keywords: Deep Learning Methods, Motion and Path Planning, Robot Safety
Abstract: Safe navigation of cluttered environments is a critical challenge in robotics. It is typically approached by separating the planning and tracking problems, with planning executed on a reduced order model to generate reference trajectories, and control techniques used to track these trajectories on the full order dynamics. Inevitable tracking error necessitates robustification of the nominal plan to ensure safety; in many cases, this is accomplished via worst-case bounding, which ignores the fact that some trajectories of the planning model may be easier to track than others. In this work, we present a novel method leveraging massively parallel simulation to learn a dynamic tube representation, which characterizes tracking performance as a function of actions taken by the planning model. Planning model trajectories are then optimized such that the dynamic tube lies in the free space, allowing a balance between performance and safety to be traded off in real time. The resulting Dynamic Tube MPC is applied to the 3D hopping robot ARCHER, enabling agile and performant navigation of cluttered environments, and safe collision-free traversal of narrow corridors.
|
|
TuCT2 Regular Session, 301 |
Add to My Program |
SLAM 2 |
|
|
Chair: Wang, Chen | University at Buffalo |
Co-Chair: Fallon, Maurice | University of Oxford |
|
15:15-15:20, Paper TuCT2.1 | Add to My Program |
ISLAM: Imperative SLAM |
|
Fu, Taimeng | University at Buffalo |
Su, Shaoshu | State University of New York at Buffalo |
Lu, Yiren | Case Western Reserve University |
Wang, Chen | University at Buffalo |
Keywords: SLAM, Deep Learning Methods
Abstract: Simultaneous Localization and Mapping (SLAM) stands as one of the critical challenges in robot navigation. A SLAM system often consists of a front-end component for motion estimation and a back-end system for eliminating estimation drifts. Recent advancements suggest that data-driven methods are highly effective for front-end tasks, while geometry-based methods continue to be essential in the back-end processes. However, such a decoupled paradigm between the data-driven front-end and geometry-based back-end can lead to sub-optimal performance, consequently reducing the system's capabilities and generalization potential. To solve this problem, we propose a novel self-supervised imperative learning framework, named imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate the SLAM problem as a bilevel optimization so that the front-end and back-end are bidirectionally connected. As a result, the front-end model can learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end component. We showcase the effectiveness of this new framework through an application of stereo-inertial SLAM. The experiments show that the iSLAM training strategy achieves an accuracy improvement of 22% on average over a baseline model. To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end components can mutually correct each other in a self-supervised manner.
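A conceptual sketch (ours, not the iSLAM code) of the bilevel coupling described above: a learned front-end predicts relative poses, a differentiable geometric back-end returns a consistency residual, and that residual is back-propagated to the front-end with no external labels:

```python
# Illustrative self-supervised training step coupling a learned front-end
# with a differentiable geometric back-end residual.
import torch

def self_supervised_step(front_end, back_end_residual, batch, optimizer):
    """front_end: torch.nn.Module mapping sensor data -> relative pose estimates.
    back_end_residual: differentiable function of those estimates (e.g. a
    pose-graph consistency term), returning a scalar loss."""
    optimizer.zero_grad()
    rel_poses = front_end(batch)           # upper level: learned motion estimation
    loss = back_end_residual(rel_poses)    # lower level: geometric optimization residual
    loss.backward()                        # reciprocal correction via gradients
    optimizer.step()
    return loss.item()
```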
|
|
15:20-15:25, Paper TuCT2.2 | Add to My Program |
DGS-SLAM: A Visual Dense SLAM Based on Gaussian Splatting in Dynamic Environments |
|
Chen, Yushi | Beijing University of Posts and Telecommunications |
Liu, Haosong | Beijing University of Posts and Telecommunications |
Zhao, Fang | Beijing University of Posts and Telecommunications |
Hong, Yunhan | Beijing University of Posts and Telecommunications |
Yan, Jiaquan | Beijing University of Posts and Telecommunications |
Luo, Haiyong | Institute of Computing Technology, Chinese Academy of Sciences |
Keywords: SLAM, RGB-D Perception, Mapping
Abstract: Visual dense SLAM can facilitate pose estimation and map reconstruction for sensor carriers in unknown environments. However, in uncontrolled environments such as offices, shopping malls, and train stations, people frequently walk back and forth and objects are temporarily moved within the scene. Most existing visual dense SLAM systems do not account for these dynamic factors, leading to localization drift and map distortion. In this paper, we propose DGS-SLAM, a system capable of achieving robust localization and high-fidelity static map reconstruction in dynamic environments. We utilize semantic 3D Gaussians for scene representation, effectively eliminating interference from dynamic objects and refining the reconstruction of the static background. We enhance the tracking accuracy and mapping quality of dense SLAM by using a distance distribution-based Gaussian pruning algorithm and implementing a coarse-to-fine tracking strategy with bundle adjustment and differentiable rendering. We perform qualitative and quantitative evaluations on two publicly available dynamic environment datasets. The results indicate that our method effectively reduces the interference caused by dynamic objects, enabling visual dense SLAM to maintain competitive tracking accuracy and mapping performance in dynamic environments.
|
|
15:25-15:30, Paper TuCT2.3 | Add to My Program |
ARS-SLAM: Accurate Robust Spinning LiDAR SLAM for a Quadruped Robot in Large-Scale Scenario |
|
Li, Jiehao | South China University of Technology |
Li, Chenglin | South China Agricultural University |
Chen, Hongkai | South China Agricultural University |
Guo, Haijun | South China Agricultural University |
Luo, Xiwen | South China Agricultural University |
Chen, C. L. Philip | South China University of Technology |
Yang, Chenguang | University of Liverpool |
Keywords: SLAM, Legged Robots, Mapping
Abstract: Employing a quadruped robot for real-time mapping and positioning over large-scale scenes is challenging. Strong vibration and instability during locomotion, together with the heavy computation required to represent large, dense scenes, degrade both mapping accuracy and real-time performance. Therefore, we propose an accurate and robust spinning LiDAR SLAM (ARS-SLAM) algorithm for a quadruped robot in large-scale scenes. The tightly coupled iterated Kalman filter from FAST-LIO2 is introduced into the front end of the Cartographer framework to improve the accuracy and robustness of robot pose estimation. To reduce the computational complexity of the original Cartographer framework, a pose threshold optimization algorithm is introduced to effectively remove redundant information from loop detection and improve computational efficiency and real-time performance. We tested the system against state-of-the-art point-cloud-based methods, LIO-SAM and FAST-LIO2, on a large dataset covering science parks and underground parking lots, and the results show that the proposed system achieves equal or better accuracy and real-time performance.
|
|
15:30-15:35, Paper TuCT2.4 | Add to My Program |
Tightly Coupled Range Inertial Odometry and Mapping with Exact Point Cloud Downsampling |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Takanose, Aoki | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Keywords: SLAM, Localization, Mapping
Abstract: In this work, to facilitate the real-time processing of multi-scan registration error minimization on factor graphs, we devise a point cloud downsampling algorithm based on coreset extraction. This algorithm extracts a subset of the residuals of input points such that the subset yields exactly the same quadratic error function as that of the original set for a given pose. This enables a significant reduction in the number of residuals to be evaluated without approximation errors at the sampling point. Using this algorithm, we devise a complete SLAM framework that consists of odometry estimation based on sliding window optimization and global trajectory optimization based on registration error minimization over the entire map, both of which can run in real time on a standard CPU. The experimental results demonstrate that the proposed framework outperforms state-of-the-art CPU-based SLAM frameworks without the use of GPU acceleration.
|
|
15:35-15:40, Paper TuCT2.5 | Add to My Program |
Scalable Multi-Session Visual SLAM in Large-Scale Scenes with Subgraph Optimization |
|
Pan, Xiaokun | Zhejiang University |
Li, Zhenzhe | Zhejiang University |
Fan, Tianxing | Zhejiang University |
Zhai, Hongjia | Zhejiang University |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: SLAM, Localization, Mapping
Abstract: Multi-session visual SLAM systems enable 6-DoF camera localization along with long-term maintenance and expansion of the global map by utilizing image data from different sessions. However, in large-scale environments, these systems often suffer from severe scale drift. While modern SLAM systems attempt to maintain global map consistency through loop detection and correction, they still face challenges in terms of convergence and accuracy. In this paper, we propose a robust large-scale multi-session SLAM system for long-term localization and mapping that achieves global consistency. Furthermore, to address the backend optimization problem in large-scale environments, we introduce a hierarchical optimization strategy based on the graph structure. More specifically, a subgraph structure is introduced to reduce the size of the problem while effectively propagating scale correction information. In addition, a hierarchical strategy enables coarse-to-fine updates of the graph states. Experimental results not only demonstrate that our method efficiently optimizes the pose graph and maintains map consistency in large-scale environments, but also highlight the effectiveness and scalability of the proposed approach.
|
|
15:40-15:45, Paper TuCT2.6 | Add to My Program |
Accurate and Rapidly-Convergent GNSS/INS/LiDAR Tightly-Coupled Integration Via Invariant EKF Based on Two-Frame Group (I) |
|
Xia, Chunxi | Wuhan University |
Li, Xingxing | Wuhan University |
He, Feiyang | China Ship Development and Design Center |
Li, Shengyu | Wuhan University |
Zhou, Yuxuan | Wuhan University |
Keywords: SLAM, Autonomous Vehicle Navigation
Abstract: Nowadays, increasing attention has been directed toward the integration of the global navigation satellite system (GNSS), inertial navigation system (INS), and light detection and ranging (LiDAR) for intelligent system navigation. However, the existing systems, which generally adopt estimators of the extended Kalman filter (EKF) or factor graph optimization (FGO), still face challenges regarding consistency and convergence. Such methods could provide optimal navigation solutions only if the initial guess of the state is sufficiently close to the true trajectory; otherwise, the systems might undergo accuracy loss or, even worse, divergence. To address this issue, we derive an invariant extended Kalman filter (IEKF) based on the two-frame group (TFG) in the left-invariant form, and integrate raw GNSS double-differenced observations, inertial measurements, and LiDAR plane features within this framework. By designing a unified group structure that simultaneously maintains both the navigation states and inertial measurement unit (IMU) biases, TFG contributes to the approximate log-linearity and invariance of the system dynamics model, expected to effectively resolve the convergence issue. A set of real-world experiments was conducted to evaluate the system, with results indicating its potential to achieve submeter to centimeter-level positioning accuracy, surpassing state-of-the-art methods in terms of accuracy, availability, and convergence.
|
|
TuCT3 Regular Session, 303 |
Add to My Program |
3D Content Capture and Generation 2 |
|
|
Chair: Elghazaly, Gamal | University of Luxembourg |
Co-Chair: Bano, Sophia | University College London |
|
15:15-15:20, Paper TuCT3.1 | Add to My Program |
Winding Number-Guided Edge-Preserving Implicit Neural Representation of CAD Surfaces |
|
Cheng, Yuhang | Southwest University |
Wang, Zhiyuan | Southwest University |
He, Jialan | Southwest University |
Wang, Xiaogang | Southwest University |
Keywords: Computer Vision for Manufacturing, Semantic Scene Understanding
Abstract: Implicit neural representations have emerged as a powerful tool for 3D reconstruction due to their excellent performance. However, existing methods cannot achieve ideal results on CAD models, mainly because such models are usually constructed from piecewise smooth surfaces and have sharp edge structures. To this end, we propose a winding number-guided implicit surface reconstruction method, which mainly consists of a winding number-guided regularizer and a dynamic edge sampling strategy. The winding number-guided regularizer can effectively constrain the global normal consistency of the input raw data, as well as improve the unsatisfactory implicit surface reconstruction results caused by the unavailability of normal information. Meanwhile, in order to reduce excessive smoothing at the sharp edges of the implicit surface, we propose a dynamic edge sampling strategy that samples near the sharp-edge regions of the 3D shape, which effectively prevents the regularizer from smoothing all regions. Finally, we combine them with a simple data term for robust implicit surface reconstruction. Experimental results show that our method significantly improves the quality of 3D reconstruction compared with state-of-the-art methods.
|
|
15:20-15:25, Paper TuCT3.2 | Add to My Program |
A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction |
|
Zhang, Bin | Guangdong University of Technology |
Zeng, Bi | Guangdong University of Technology |
Peng, Zexin | Guangdong University of Technology |
Keywords: Visual Learning
Abstract: In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. While this shift has notably elevated the rendering quality and speed of radiance fields, it has inevitably led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. Firstly, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points and express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Subsequently, we introduce a learnable denoising mask coupled with a denoising loss to eliminate noise points from the scene, thereby further compressing the 3D Gaussian model. Finally, motion noise of points is mitigated through static constraints and motion consistency constraints. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS, making it highly suitable for various tasks such as novel view synthesis and dynamic mapping.
|
|
15:25-15:30, Paper TuCT3.3 | Add to My Program |
GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction |
|
Xiang, Haodong | Tsinghua University |
Li, Xinghui | Tsinghua University |
Cheng, Kai | USTC |
Lai, Xiansong | Tsinghua University |
Zhang, Wanting | Tsinghua University |
Liao, Zhichao | Tsinghua University |
Zeng, Long | Tsinghua University |
Liu, Xueping | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: Embodied intelligence requires precise reconstruction and rendering to simulate large-scale real-world data. Although 3D Gaussian Splatting (3DGS) has recently demonstrated high-quality results with real-time performance, it still faces challenges in indoor scenes with large, textureless regions, resulting in incomplete and noisy reconstructions due to poor point cloud initialization and underconstrained optimization. Inspired by the continuity of signed distance field (SDF), which naturally has advantages in modeling surfaces, we propose a unified optimization framework that integrates neural signed distance fields (SDFs) with 3DGS for accurate geometry reconstruction and real-time rendering. This framework incorporates a neural SDF field to guide the densification and pruning of Gaussians, enabling Gaussians to model scenes accurately even with poor initialized point clouds. Simultaneously, the geometry represented by Gaussians improves the efficiency of the SDF field by piloting its point sampling. Additionally, we introduce two regularization terms based on normal and edge priors to resolve geometric ambiguities in textureless areas and enhance detail accuracy. Extensive experiments in ScanNet and ScanNet++ show that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
|
|
15:30-15:35, Paper TuCT3.4 | Add to My Program |
Hide-In-Motion: Embedding Steganographic Copyright Information into 4D Gaussian Splatting Assets |
|
Liu, Hengyu | The Chinese University of Hong Kong |
Li, Chenxin | Chinese University of Hong Kong |
Pan, Wentao | The Chinese University of Hong Kong |
Yang, Zhiqin | The Chinese University of Hong Kong |
Yang, Yifeng | Shanghai Jiao Tong University |
Liu, Yifan | Chinese University of Hong Kong |
Li, Wuyang | City University of Hong Kong |
Yuan, Yixuan | Chinese University of Hong Kong |
Keywords: Visual Learning, Deep Learning for Visual Perception, RGB-D Perception
Abstract: As 4D extensions of 3D Gaussian Splatting (4D-GS) emerge as groundbreaking techniques for dynamic scene reconstruction and novel view synthesis in robotics and computer vision, ensuring the security and trustworthiness of these assets becomes crucial. While steganography has advanced significantly in 2D and 3D media, existing methods are inadequate for the complex, dynamic nature of 4D-GS representations. To address this gap, we propose Hide-in-Motion, a novel 4D steganography method for hiding information through deformation in Gaussian splatting. Our approach introduces a composite attribute and a Decouple Feature Field for coarse-to-fine deformation modeling and embedding implicit information, along with an Opacity-Guided Adaptive strategy. Hide-in-Motion overcomes the limitations of previous techniques, enhancing both the robustness of embedded information and the quality of 4D reconstruction. Extensive evaluations demonstrate that our method successfully embeds and recovers implicit information across various modalities while maintaining high rendering quality in dynamic scenes. This work not only advances copyright protection and secure data transmission for 4D assets but also paves the way for enhancing the security and integrity of 4D digital assets. Code is available at https://github.com/CUHK-AIM-Group/Hide-in-Motion.
|
|
15:35-15:40, Paper TuCT3.5 | Add to My Program |
DENSER: 3D Gaussian Splatting for Scene Reconstruction of Dynamic Urban Environments |
|
Mohamad, Mahmud Ali | University of Luxembourg |
Elghazaly, Gamal | University of Luxembourg |
Hubert, Arthur | University of Luxembourg |
Frank, Raphael | University of Luxembourg |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: This paper presents DENSER, a framework leveraging 3D Gaussian splatting (3DGS) for the reconstruction of dynamic urban environments. While several methods for photorealistic scene representations, both implicitly using neural radiance fields (NeRF) and explicitly using 3DGS, have shown promising results in scene reconstruction of relatively complex dynamic scenes, modeling the dynamic appearance of foreground objects tends to be challenging, limiting the applicability of these methods to capture subtleties and details of the scenes, especially for dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models the appearance of dynamic objects in the driving scene. Instead of directly using Spherical Harmonics (SH) to model the appearance of dynamic objects, we introduce and integrate a new method aiming at dynamically estimating SH bases using wavelets, resulting in a better representation of dynamic objects' appearance in both space and time. Besides object appearance, DENSER enhances object shape representation through densification of its point cloud across multiple scene frames, resulting in faster convergence of model training. Extensive evaluations on the KITTI dataset show that the proposed approach outperforms state-of-the-art methods by a wide margin. Source codes and models will be uploaded to this repository: https://github.com/sntubix/denser
|
|
15:40-15:45, Paper TuCT3.6 | Add to My Program |
Factorized Multi-Resolution HashGrid for Efficient Neural Radiance Fields: Execution on Edge-Devices |
|
Jun-Seong, Kim | POSTECH |
Kim, Mingyu | UBC |
Kim, GeonU | POSTECH |
Oh, Tae-Hyun | POSTECH |
Kim, Jin-Hwa | NAVER Cloud |
Keywords: Computational Geometry, Mapping
Abstract: We introduce Fact-Hash, a novel parameter-encoding method for training on-device neural radiance fields. Neural Radiance Fields (NeRF) have proven pivotal in 3D representations, but their applications are limited by large computational requirements. On-device training can open up broad application fields, providing resilience to communication limitations and privacy concerns, and fast adaptation to frequently changing scenes. However, challenges such as limited resources (GPU memory, storage, and power) impede their deployment. To handle this, we introduce Fact-Hash, a novel parameter encoding that merges tensor factorization and hash-encoding techniques. This integration offers two benefits: the use of rich high-resolution features and few-shot robustness. In Fact-Hash, we project 3D coordinates into multiple lower-dimensional forms (2D or 1D) before applying the hash function and then aggregate them into a single feature. Comparative evaluations against state-of-the-art methods demonstrate Fact-Hash's superior memory efficiency, preserving quality and rendering speed. Fact-Hash reduces memory usage by over one-third while maintaining PSNR values compared to previous encoding methods. The on-device experiment validates the superiority of Fact-Hash compared to alternative positional encoding methods in computational efficiency and energy consumption. These findings highlight Fact-Hash as a promising solution to improve feature grid representation, address memory constraints, and improve quality in various applications.
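A rough sketch of the projection-then-hash idea described above, assuming a nearest-vertex lookup instead of interpolation; the table size, grid resolution, feature width, and hash primes are illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of a factorized hash encoding: 3-D points are projected onto the
# xy / yz / xz planes, each 2-D projection indexes a small hash table of
# learnable features, and the per-plane features are summed into one vector.
rng = np.random.default_rng(0)

TABLE_SIZE = 2 ** 14          # hash table entries per plane (illustrative)
FEAT_DIM = 4                  # feature channels per entry (illustrative)
RESOLUTION = 64               # grid resolution per axis (illustrative)
PRIMES = np.array([1, 2654435761], dtype=np.uint64)   # spatial-hash primes

tables = rng.normal(0, 1e-2, size=(3, TABLE_SIZE, FEAT_DIM))  # one table per plane

def hash2d(ij):
    """Hash integer 2-D grid coordinates into a table index."""
    ij = ij.astype(np.uint64)
    return int((ij[0] * PRIMES[0]) ^ (ij[1] * PRIMES[1])) % TABLE_SIZE

def encode(xyz):
    """Encode a point in [0,1]^3 by summing features from the three planes."""
    grid = np.floor(np.asarray(xyz) * (RESOLUTION - 1)).astype(np.int64)
    planes = [grid[[0, 1]], grid[[1, 2]], grid[[0, 2]]]        # xy, yz, xz projections
    feature = np.zeros(FEAT_DIM)
    for p, proj in enumerate(planes):
        feature += tables[p, hash2d(proj)]
    return feature

print(encode([0.2, 0.7, 0.4]))   # one aggregated feature vector for the query point
```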
|
|
15:45-15:50, Paper TuCT3.7 | Add to My Program |
3D Uncertain Implicit Surface Mapping Using GMM and GP |
|
Zou, Qianqian | Leibniz University Hannover |
Sester, Monika | Leibniz University Hannover, Institute of Cartography and Geoinf |
Keywords: Mapping, Probability and Statistical Methods, Range Sensing
Abstract: In this study, we address the challenge of constructing continuous three-dimensional (3D) models that accurately represent uncertain surfaces, derived from noisy and incomplete LiDAR scanning data. Building upon our prior work, which utilized the Gaussian Process (GP) and Gaussian Mixture Model (GMM) for structured building models, we introduce a more generalized approach tailored for complex surfaces in urban scenes, where GMM Regression and GP with derivative observations are applied. A Hierarchical GMM (HGMM) is employed to optimize the number of GMM components and speed up the GMM training. With the prior map obtained from HGMM, GP inference is followed for the refinement of the final map. Our approach models the implicit surface of the geo-object and enables the inference of the regions that are not completely covered by measurements. The integration of GMM and GP yields well-calibrated uncertainty estimates alongside the surface model, enhancing both accuracy and reliability. The proposed method is evaluated on real data collected by a mobile mapping system. Compared to the performance in mapping accuracy and uncertainty quantification of other methods, such as Gaussian Process Implicit Surface map (GPIS) and log-Gaussian Process Implicit Surface map (Log-GPIS), the proposed method achieves lower RMSEs, higher log-likelihood values and lower computational costs for the evaluated datasets.
|
|
TuCT4 Regular Session, 304 |
Add to My Program |
Object Detection 1 |
|
|
Chair: Brandt, Laura Eileen | Massachusetts Institute of Technology |
Co-Chair: Martinson, Eric | Lawrence Technological University |
|
15:15-15:20, Paper TuCT4.1 | Add to My Program |
Mono-Camera-Only Target Chasing for a Drone in a Dense Environment by Cross-Modal Learning |
|
Yoo, Seungyeon | Seoul National University |
Jung, Seungwoo | Seoul National University |
Lee, Yunwoo | Seoul National University |
Shim, Dongseok | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Visual Learning, Deep Learning Methods, Vision-Based Navigation
Abstract: Chasing a dynamic target in a dense environment is one of the challenging applications of autonomous drones. The task requires multi-modal data, such as RGB and depth, to accomplish safe and robust maneuvers. However, using different types of modalities can be difficult due to the limited capacity of drones in terms of hardware complexity and sensor cost. Our framework resolves such restrictions in the target chasing task by using only a monocular camera instead of multiple sensor inputs. From an RGB input, the perception module can extract a cross-modal representation containing information from multiple data modalities. To learn cross-modal representations at training time, we employ variational auto-encoder (VAE) structures and the joint objective function across heterogeneous data. Subsequently, using latent vectors acquired from the pre-trained perception module, the planning module generates a proper next-time-step waypoint by imitation learning of the expert, which performs a numerical optimization using the privileged RGB-D data. Furthermore, the planning module considers temporal information of the target to improve tracking performance through consecutive cross-modal representations. Ultimately, we demonstrate the effectiveness of our framework through the reconstruction results of the perception module, the target chasing performance of the planning module, and the zero-shot sim-to-real deployment of a drone.
|
|
15:20-15:25, Paper TuCT4.2 | Add to My Program |
CoopDETR: A Unified Cooperative Perception Framework for 3D Detection Via Object Query |
|
Wang, Zhe | Institute for AI Industry Research, Tsinghua University |
Xu, Shaocong | Xiamen University |
Xucai, Zhuang | Institute for Al Industry Research, Tsinghua University |
Xu, Tongda | Tsinghua University |
Wang, Yan | Tsinghua University |
Liu, Jingjing | Institute for AI Industry Research (AIR), Tsinghua University |
Chen, Yilun | Tsinghua University |
Zhang, Ya-Qin | Institute for AI Industry Research(AIR), Tsinghua University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Autonomous Agents
Abstract: Cooperative perception enhances the individual perception capabilities of autonomous vehicles (AVs) by providing a comprehensive view of the environment. However, balancing perception performance and transmission costs remains a significant challenge. Current approaches that transmit region-level features across agents are limited in interpretability and demand substantial bandwidth, making them unsuitable for practical applications. In this work, we propose CoopDETR, a novel cooperative perception framework that introduces object-level feature cooperation via object query. Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, reducing transmission cost while preserving essential information for detection; and cross-agent query fusion, which includes Spatial Query Matching (SQM) and Object Query Aggregation (OQA) to enable effective interaction between queries. Our experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and significantly reduces transmission costs to 1/782 of previous methods.
|
|
15:25-15:30, Paper TuCT4.3 | Add to My Program |
Learning Better Representations for Crowded Pedestrians in Offboard LiDAR-Camera 3D Tracking-By-Detection |
|
Li, Shichao | Zhuoyu Technology |
Li, Peiliang | HKUST, Robotics Institute |
Lian, Qing | HKUST |
Yun, Peng | The Hong Kong University of Science and Technology |
Chen, Xiaozhi | DJI |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Representation Learning
Abstract: Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground truth generation for such challenging scenes is performance-critical yet very challenging. The difficulties include the sparsity of the captured pedestrian point cloud and a lack of suitable benchmarks for a specific system design study. To tackle the challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point cloud and multi-view images. To improve the generalization power for crowded scenes and the performance for small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves the 3D pedestrian tracking performance towards higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.
|
|
15:30-15:35, Paper TuCT4.4 | Add to My Program |
Bi-Stream Knowledge Transfer for Semi-Supervised 3D Point Cloud Object Detection |
|
Zheng, Jilai | Shanghai Jiao Tong University |
Tang, Pin | Shanghai Jiao Tong University, China |
Ren, Xiangxuan | Shanghai Jiao Tong University |
Wang, Zhongdao | Noah's Ark Laboratory |
Ma, Chao | Shanghai Jiao Tong University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: 3D point cloud object detection plays an important role in autonomous driving. However, labeling 3D object boxes is expensive and time-consuming, limiting the number of annotated point clouds used in fully-supervised training. This has led to a rise in semi-supervised 3D object detection research, which aims to improve model performance by leveraging both labeled and unlabeled point clouds. Existing methods typically rely on the Mean Teacher (MT) paradigm, which uses unlabeled instances discovered by the teacher with confidence scores higher than certain thresholds to train the student. However, this leads to a loss of information as it overlooks ambiguous instances from the teacher that could also contain valuable knowledge. To address this issue, we propose a Bi-Stream Knowledge Transfer (BiKT) framework that fully exploits and transfers knowledge from both confident and ambiguous instances to the student network. Specifically, all pseudo labels are allocated into two knowledge streams, the deterministic stream and the noisy stream, and then subsequently guide the student network through bi-level supervision. We also introduce a Dynamic Stream Switching (DSS) algorithm that sets the stream boundary tailored for the current learning status. To further improve the quality of pseudo labels in the knowledge streams, we propose a Diffusive Label Denoising (DLD) module, which is trained by explicitly generating noised instances and then learning to denoise them, as in diffusion models. Experiments show the state-of-the-art performance of our BiKT on the ONCE validation and testing sets, as well as the robust generalization capability when confronted with diverse base detectors, increased amount of unlabeled data, and distinct datasets (e.g., Waymo), unveiling the power of semi-supervised learning in 3D object detection.
|
|
15:35-15:40, Paper TuCT4.5 | Add to My Program |
Semantic-Supervised Spatial-Temporal Fusion for LiDAR-Based 3D Object Detection |
|
Wang, Chaoqun | The Chinese University of Hong Kong, Shenzhen |
Hong, Xiaobin | Nanjing University |
Li, Wenzhong | Nanjing University |
Zhang, Ruimao | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Sensor Fusion
Abstract: LiDAR-based 3D object detection presents significant challenges due to the inherent sparsity of LiDAR points. A common solution involves long-term temporal LiDAR data to densify the inputs. However, efficiently leveraging spatial-temporal information remains an open problem. In this paper, we propose a novel Semantic-Supervised Spatial-Temporal Fusion (ST-Fusion) method, which introduces a novel fusion module to relieve the spatial misalignment caused by object motion over time and a feature-level semantic supervision to sufficiently unlock the capacity of the proposed fusion module. Specifically, the ST-Fusion consists of a Spatial Aggregation (SA) module and a Temporal Merging (TM) module. The SA module employs a convolutional layer with progressively expanding receptive fields to aggregate the object features from the local regions to alleviate the spatial misalignment, while the TM module dynamically extracts object features from the preceding frames based on the attention mechanism for a comprehensive sequential representation. Besides, in the semantic supervision, we propose a Semantic Injection method to enrich the sparse LiDAR data by injecting point-wise semantic labels, using it for training a teacher model and providing a reconstruction target at the feature level supervised by the proposed object-aware loss. Extensive experiments on various LiDAR-based detectors demonstrate the effectiveness and universality of our proposal, yielding an improvement of approximately +2.8% in NDS on the nuScenes benchmark.
|
|
15:40-15:45, Paper TuCT4.6 | Add to My Program |
OoDIS: Anomaly Instance Segmentation and Detection Benchmark |
|
Nekrasov, Alexey | RWTH Aachen University |
Zhou, Rui | Rheinisch-Westfälische Technische Hochschule |
Ackermann, Miriam | Ruhr Universität Bochum |
Hermans, Alexander | RWTH Aachen University |
Leibe, Bastian | RWTH Aachen University |
Rottmann, Matthias | University of Wuppertal |
Keywords: Data Sets for Robotic Vision, Failure Detection and Recovery, Object Detection, Segmentation and Categorization
Abstract: Safe navigation of self-driving cars and robots requires a precise understanding of their environment. Training data for perception systems cannot cover the wide variety of objects that may appear during deployment. Thus, reliable identification of unknown objects, such as wild animals and untypical obstacles, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been facilitated by the availability of out-of-distribution (OOD) benchmarks. However, a comprehensive understanding of scene dynamics requires the segmentation of individual objects, and thus the segmentation of instances is essential. Development in this area has been lagging, largely due to the lack of dedicated benchmarks. The situation is similar in object detection. While there is interest in detecting and potentially tracking every anomalous object, the availability of dedicated benchmarks is clearly limited. To address this gap, this work extends some commonly used anomaly segmentation benchmarks to include the instance segmentation and object detection tasks. Our evaluation of anomaly instance segmentation and object detection methods shows that both of these challenges remain unsolved problems. We provide a competition and benchmark website under https://vision.rwth-aachen.de/oodis.
|
|
15:45-15:50, Paper TuCT4.7 | Add to My Program |
Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions |
|
Rathinam, Arunkumar | University of Luxembourg |
Pauly, Leo | City, University of London |
Shabayek, Abd El Rahman | SnT, University of Luxembourg, Luxembourg |
Rharbaoui, Wassim | University of Le Mans |
Kacem, Anis | University of Luxembourg |
Gaudillière, Vincent | Inria, CNRS, Université De Lorraine |
Aouada, Djamila | SnT, University of Luxembourg |
Keywords: Object Detection, Segmentation and Categorization, Human Detection and Tracking, Deep Learning for Visual Perception
Abstract: Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adversarial illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. These assumptions often do not hold in real-world applications, where only partial overlap between images can occur due to the sensor configuration. Moreover, sensor failure can cause loss of information in one modality. In this paper, we propose a novel module called the Hybrid Attention (HA) mechanism as our main contribution to mitigate performance degradation caused by partial overlap and sensor failure, i.e. when at least part of the scene is acquired by a single sensor. We propose an improved RGB-T fusion algorithm, robust against partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with resource constraints in embedded systems. We conducted experiments by simulating various partial overlap and sensor failure scenarios to evaluate the performance of our proposed method. The results demonstrate that our approach outperforms state-of-the-art methods, showcasing its superiority in handling real-world challenges.
|
|
TuCT5 Regular Session, 305 |
Add to My Program |
Aerial Robots: Trajectory Planning and Control |
|
|
Chair: Cheng, Sheng | University of Illinois Urbana-Champaign |
Co-Chair: Zhang, Hong | SUSTech |
|
15:15-15:20, Paper TuCT5.1 | Add to My Program |
Safe Interval Motion Planning for Quadrotors in Dynamic Environments |
|
Huang, Songhao | University of Pennsylvania |
Wu, Yuwei | University of Pennsylvania |
Tao, Yuezhan | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Constrained Motion Planning
Abstract: Trajectory generation in dynamic environments presents a significant challenge for quadrotors, particularly due to the non-convexity in the spatial-temporal domain. Many existing methods either assume simplified static environments or struggle to produce optimal solutions in real-time. In this work, we propose an efficient safe interval motion planning framework for navigation in dynamic environments. A safe interval refers to a time window during which a specific configuration is safe. Our approach addresses trajectory generation through a two-stage process: a front-end graph search step followed by a back-end gradient-based optimization. We ensure completeness and optimality by constructing a dynamic connected visibility graph and incorporating low-order dynamic bounds within safe intervals and temporal corridors. To avoid local minima, we propose a Uniform Temporal Visibility Deformation (UTVD) for the complete evaluation of spatial-temporal topological equivalence. We represent trajectories with B-Spline curves and apply gradient-based optimization to navigate around static and moving obstacles within spatial-temporal corridors. Through simulation and real-world experiments, we show that our method can achieve a success rate of over 95% in environments with different density levels, exceeding the performance of other approaches, demonstrating its potential for practical deployment in highly dynamic environments.
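A minimal sketch of how a safe interval can be computed for a single configuration, assuming obstacle sweeps are given as discrete blocked time stamps; the discretization, blocking duration, and horizon are illustrative, not the paper's construction.

```python
def safe_intervals(occupied_times, horizon, dt=0.1):
    """Compute maximal safe time intervals for one configuration.

    occupied_times: time stamps at which the configuration is predicted to be
    blocked by a moving obstacle (illustrative discretization: each blocking
    lasts one dt step). Returns a list of (t_start, t_end) collision-free windows.
    """
    intervals, start = [], 0.0
    for t in sorted(occupied_times):
        if t - start >= dt:              # a non-empty free window before this block
            intervals.append((start, t))
        start = t + dt                   # configuration is blocked for one step
    if start < horizon:
        intervals.append((start, horizon))
    return intervals

# Example: a cell is swept by obstacles at t = 1.2 s and t = 3.0 s over a 5 s horizon;
# the free windows are roughly (0, 1.2), (1.3, 3.0), and (3.1, 5.0).
print(safe_intervals([1.2, 3.0], horizon=5.0))
```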
|
|
15:20-15:25, Paper TuCT5.2 | Add to My Program |
Towards Safe and Energy-Efficient Real-Time Motion Planning in Windy Urban Environments |
|
Folk, Spencer | University of Pennsylvania |
Melton, John | NASA Ames Research Center |
Margolis, Benjamin W. L. | NASA Ames Research Center |
Yim, Mark | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Perception and Autonomy, Energy and Environment-Aware Automation, Autonomous Vehicle Navigation
Abstract: Urban winds are a serious hazard for low-altitude autonomous aerial operations in urban airspaces. Previous methods for motion planning in urban winds require global knowledge of the obstacles and flow field and do not lend themselves to real-time application. In this paper, a planning and control framework is proposed for safe and energy-efficient navigation through urban flow fields that strictly relies on onboard sensing. The algorithm incorporates predictions of local wind flow fields into a receding horizon optimal controller, balancing energy consumption with obstacle avoidance on the fly to reach a goal destination. Simulation studies on a procedurally generated urban map with diverse wind conditions demonstrate that the energy-aware motion planner reduces energy consumption by as much as 30% and results in 32% fewer crashes on average compared to the wind-agnostic baseline. Comparisons to a global wind-aware planner indicate only minor trade-offs associated with planning on a local horizon.
|
|
15:25-15:30, Paper TuCT5.3 | Add to My Program |
Dynamic Perception-Enhanced Motion Planning and Control for UAVs Flights in Challenging Dynamic Environments |
|
Liu, Luyao | Southern University of Science and Technology |
Xu, Jiarui | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Motion and Path Planning
Abstract: The autonomous flights of unmanned aerial vehicles (UAVs) in unknown environments have garnered significant attention. However, most existing methods only achieve safe navigation in static environments or spacious scenes with few moving obstacles. Motivated by this open problem, this paper presents a complete system for safe and autonomous UAV flights in unknown clustered environments with multiple dynamic obstacles. To properly represent complex dynamic environments, we develop a 3D dynamic Euclidean Signed Distance Field (ESDF) mapping method that initially segments and tracks dynamic obstacles using a novel feature-based association strategy, while fusing the remaining static obstacles into the ESDF map. Then, we propose a joint trajectory planning and motion control framework for safely avoiding surrounding obstacles. Specifically, the gradient-based B-spline trajectory optimization algorithm is employed to generate a collision-free static trajectory with respect to static obstacles. To avoid dynamic obstacles while adaptively tracking the static trajectory, we utilize time-adaptive model predictive control combined with Dynamic Control Barrier Function (D-CBF), which maps the collision avoidance constraints of dynamic obstacles onto the control inputs. Extensive simulated and real-world experiments confirm that our proposed method outperforms previous approaches for UAV flights in challenging dynamic environments.
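A tiny sketch of a discrete-time dynamic control barrier function check of the kind referenced above, assuming a simple distance-based barrier; the decay rate, geometry, and numbers are illustrative and do not reproduce the paper's formulation.

```python
import numpy as np

def dcbf_satisfied(p_robot, p_robot_next, p_obs, p_obs_next, safe_radius, gamma=0.3):
    """Check a discrete-time dynamic CBF condition for one moving obstacle.

    Barrier: h(x, t) = ||p_robot - p_obs|| - safe_radius.
    Condition: h_next >= (1 - gamma) * h_now, i.e. the barrier may not decay
    too fast even while the obstacle itself moves; gamma in (0, 1] is illustrative.
    """
    h_now = np.linalg.norm(np.asarray(p_robot) - np.asarray(p_obs)) - safe_radius
    h_next = np.linalg.norm(np.asarray(p_robot_next) - np.asarray(p_obs_next)) - safe_radius
    return h_next >= (1.0 - gamma) * h_now

# A candidate control step that lets an approaching obstacle close in too quickly
# violates the condition (prints False), so the MPC must choose a different input.
print(dcbf_satisfied(p_robot=(0, 0, 1), p_robot_next=(0.2, 0, 1),
                     p_obs=(2, 0, 1), p_obs_next=(1.4, 0, 1), safe_radius=0.5))
```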
|
|
15:30-15:35, Paper TuCT5.4 | Add to My Program |
Real-Time Sampling-Based Online Planning for Drone Interception |
|
Ryou, Gilhyun | Massachusetts Institute of Technology |
Lao Beyer, Lukas | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Aerial Systems: Perception and Autonomy, Planning under Uncertainty, AI-Based Methods
Abstract: This paper studies high-speed online planning in dynamic environments. The problem requires finding time-optimal trajectories that conform to system dynamics, meeting computational constraints for real-time adaptation, and accounting for uncertainty from environmental changes. To address these challenges, we propose a sampling-based online planning algorithm that leverages neural network inference to replace time-consuming nonlinear trajectory optimization, enabling rapid exploration of multiple trajectory options under uncertainty. The proposed method is applied to the drone interception problem, where a defense drone must intercept a target while avoiding collisions and handling imperfect target predictions. The algorithm efficiently generates trajectories toward multiple potential target drone positions in parallel. It then assesses trajectory reachability by comparing traversal times with the target drone's predicted arrival time, ultimately selecting the minimum-time reachable trajectory. Through extensive validation in both simulated and real-world environments, we demonstrate our method's capability for high-rate online planning and its adaptability to unpredictable movements in unstructured settings.
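A simple sketch of the reachability-and-selection step outlined above, with the learned time-of-flight predictor replaced by a straight-line placeholder; the candidate positions, arrival times, speed, and safety margin are hypothetical values for illustration.

```python
import numpy as np

def predicted_traversal_time(start, goal, avg_speed=8.0):
    """Placeholder for a learned time-of-flight predictor; a straight-line
    estimate stands in for the network inference described in the abstract."""
    return np.linalg.norm(np.asarray(goal) - np.asarray(start)) / avg_speed

def select_interception(start, candidate_positions, target_arrival_times, margin=0.2):
    """Pick the candidate the defender can reach before the target, minimizing time."""
    best = None
    for pos, t_target in zip(candidate_positions, target_arrival_times):
        t_ours = predicted_traversal_time(start, pos)
        if t_ours + margin <= t_target:              # reachable before the target arrives
            if best is None or t_ours < best[1]:
                best = (pos, t_ours)
    return best                                       # None if no candidate is reachable

# Example: three sampled future target positions with predicted arrival times;
# only the farthest (latest) one is reachable in time and is therefore selected.
candidates = [(20.0, 0.0, 5.0), (28.0, 4.0, 5.0), (35.0, 8.0, 5.0)]
arrivals = [2.0, 3.5, 5.0]
print(select_interception((0.0, 0.0, 2.0), candidates, arrivals))
```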
|
|
15:35-15:40, Paper TuCT5.5 | Add to My Program |
Optimal Trajectory Planning for Cooperative Manipulation with Multiple Quadrotors Using Control Barrier Functions |
|
Pallar, Arpan | Ap7538@nyu.edu |
Li, Guanrui | Worcester Polytechnic Institute |
Sarvaiya, Mrunal | Agile Robotics and Perception Lab, NYU |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Multi-Robot Systems
Abstract: In this paper, we present a novel trajectory planning algorithm for cooperative manipulation with multiple quadrotors using control barrier functions (CBFs). Our approach addresses the complex dynamics of a system in which a team of quadrotors transports and manipulates a cable-suspended rigid-body payload in environments cluttered with obstacles. The proposed algorithm ensures obstacle avoidance for the entire system, including the quadrotors, cables, and the payload in all six degrees of freedom (DoF). We introduce the use of CBFs to enable safe and smooth maneuvers, effectively navigating through cluttered environments while accommodating the system’s nonlinear dynamics. To simplify complex constraints, the system components are modeled as convex polytopes, and the Duality theorem is employed to reduce the computational complexity of the optimization problem. We validate the performance of our planning approach both in simulation and real-world environments using multiple quadrotors. The results demonstrate the effectiveness of the proposed approach in achieving obstacle avoidance and safe trajectory generation for cooperative transportation tasks.
|
|
15:40-15:45, Paper TuCT5.6 | Add to My Program |
Servo Integrated Nonlinear Model Predictive Control for Overactuated Tiltable-Quadrotors |
|
Li, Jinjie | The University of Tokyo |
Sugihara, Junichiro | The University of Tokyo |
Zhao, Moju | The University of Tokyo |
Keywords: Aerial Systems: Mechanics and Control, Motion Control
Abstract: Utilizing a servo to tilt each rotor transforms quadrotors from underactuated to overactuated systems, allowing for independent control of both attitude and position, which provides advantages for aerial manipulation. However, this enhancement also introduces model nonlinearity, sluggish servo response, and limited operational range into the system, posing challenges to dynamic control. In this study, we propose a control approach for tiltable-quadrotors based on nonlinear model predictive control (NMPC). Unlike conventional cascade methods, our approach preserves the full dynamics without simplification. It directly uses rotor thrust and servo angle as control inputs, where their limited working ranges are considered input constraints. Notably, we incorporate a first-order servo model within the NMPC framework. Simulation reveals that integrating the servo dynamics is not only an enhancement to control performance but also a critical factor for optimization convergence. To evaluate the effectiveness of our approach, we fabricate a tiltable-quadrotor and deploy the algorithm onboard at 100Hz. Extensive real-world experiments demonstrate rapid, robust, and smooth pose-tracking performance.
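A small sketch of the state augmentation implied by a first-order servo model, shown on a planar toy system rather than the full tiltable-quadrotor dynamics; the mass, time constant, and Euler integration are assumptions for illustration only.

```python
import numpy as np

def servo_augmented_step(state, u_cmd, dt=0.01, tau_servo=0.1):
    """One Euler step of a planar toy model with a first-order servo lag.

    state = [x, vx, alpha]       -- position, velocity, actual tilt angle
    u_cmd = [thrust, alpha_cmd]  -- rotor thrust and commanded tilt angle
    The servo does not jump to alpha_cmd; it follows
        d(alpha)/dt = (alpha_cmd - alpha) / tau_servo,
    which is the kind of first-order response an NMPC must reason about.
    """
    x, vx, alpha = state
    thrust, alpha_cmd = u_cmd
    mass = 1.5                                   # illustrative vehicle mass [kg]
    ax = thrust * np.sin(alpha) / mass           # lateral force from the tilted rotor
    alpha_dot = (alpha_cmd - alpha) / tau_servo  # first-order servo response
    return np.array([x + vx * dt,
                     vx + ax * dt,
                     alpha + alpha_dot * dt])

# Commanding a 0.5 rad tilt: the realized angle lags the command.
state = np.zeros(3)
for _ in range(20):                              # 0.2 s of simulated time
    state = servo_augmented_step(state, u_cmd=(15.0, 0.5))
print(state)   # realized tilt is still well below the 0.5 rad command (about 0.44 rad)
```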
|
|
15:45-15:50, Paper TuCT5.7 | Add to My Program |
Geometric Tracking Control of Omnidirectional Multirotors for Aggressive Maneuvers |
|
Lee, Hyungyu | University of Illinois Urbana-Champaign |
Cheng, Sheng | University of Illinois Urbana-Champaign |
Wu, Zhuohuan | University of Illinois Urbana-Champaign |
Lim, Jaeyoung | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Hovakimyan, Naira | University of Illinois at Urbana-Champaign |
Keywords: Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: An omnidirectional multirotor has the maneuverability of decoupled translational and rotational motions, superseding the traditional multirotors' motion capability. Such maneuverability is achieved due to the ability of the omnidirectional multirotor to frequently alter the thrust amplitude and direction. In doing so, the rotors' settling time, which is induced by inherent rotor dynamics, significantly affects the omnidirectional multirotor's tracking performance, especially in aggressive flights. To resolve this issue, we propose a novel tracking controller that takes the rotor dynamics into account and does not require additional rotor state measurement. This is achieved by integrating a linear rotor dynamics model into the vehicle's equations of motion and designing a PD controller to compensate for the effects introduced by rotor dynamics. We prove that the proposed controller yields almost global exponential stability. The proposed controller is validated in experiments, where we demonstrate significantly improved tracking performance in multiple aggressive maneuvers compared with a baseline geometric PD controller.
|
|
TuCT6 Regular Session, 307 |
Add to My Program |
Perception for Mobile Robots 3 |
|
|
Chair: Chiu, Han-Pang | SRI International |
Co-Chair: Xiao, Jing | Worcester Polytechnic Institute (WPI) |
|
15:15-15:20, Paper TuCT6.1 | Add to My Program |
Topology-Based Visual Active Room Segmentation |
|
Bao, Chenyu | The Chinese University of Hong Kong, Shenzhen |
Hu, Junjie | The Chinese University of Hong Kong, Shenzhen |
Zheng, Qiu | The Chinese University of HongKong, Shenzhen |
Lam, Tin Lun | The Chinese University of Hong Kong, Shenzhen |
Keywords: Semantic Scene Understanding, Perception-Action Coupling, Vision-Based Navigation
Abstract: Room segmentation plays a significant role in scene understanding, semantic mapping, and scene coverage for robots navigating in real-world indoor environments. However, most previous works adopt passive segmentation, which requires a complete and uncluttered grid map as input, often resulting in lower segmentation accuracy, and which cannot be deployed in unknown environments. In this paper, we propose an active room segmentation framework that enables a robot to incrementally and autonomously perform room segmentation in cluttered indoor environments. Our framework consists of three key components: i) a door extraction module in which a visual semantic feature, specifically doors, is extracted to better identify rooms in cluttered environments, ii) a within-room exploration module that detects frontiers within the room currently being explored, and iii) a topological module that represents connectivity between rooms and determines the next room for exploration. We show through experiments that the proposed method offers two distinct advantages over existing methods, in segmentation accuracy and autonomy. The code is available at https://github.com/FreeformRobotics/Active_room_segmentation.
|
|
15:20-15:25, Paper TuCT6.2 | Add to My Program |
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation |
|
Anwar, Abrar | University of Southern California |
John, Welsh, John Bradford | NVIDIA |
Biswas, Joydeep | University of Texas at Austin |
Pouya, Soha | Stanford University |
Chang, Yan | Nvidia |
Keywords: AI-Enabled Robotics, Semantic Scene Understanding, Vision-Based Navigation
Abstract: Navigating and understanding complex environments over extended periods of time is a significant challenge for robots. People interacting with the robot may want to ask questions like where something happened, when it occurred, or how long ago it took place, which would require the robot to reason over a long history of their deployment. To address this problem, we introduce a Retrieval-augmented Memory for Embodied Robots, or ReMEmbR, a system designed for long-horizon video question answering for robot navigation. To evaluate ReMEmbR, we introduce the NaVQA dataset where we annotate spatial, temporal, and descriptive questions to long-horizon robot navigation videos. ReMEmbR employs a structured approach involving a memory building and a querying phase, leveraging temporal information, spatial information, and images to efficiently handle continuously growing robot histories. Our experiments demonstrate that ReMEmbR outperforms LLM and VLM baselines, allowing ReMEmbR to achieve effective long-horizon reasoning with low latency. Additionally, we deploy ReMEmbR on a robot and show that our approach can handle diverse queries. The dataset, code, videos, and other material can be found at the following link: https://nvidia-ai-iot.github.io/remembr.
|
|
15:25-15:30, Paper TuCT6.3 | Add to My Program |
Online Diffusion-Based 3D Occupancy Prediction at the Frontier with Probabilistic Map Reconciliation |
|
Reed, Alec | University of Colorado Boulder |
Achey, Lorin | University of Colorado Boulder |
Crowe, Brendan | University of Colorado Boulder |
Hayes, Bradley | University of Colorado Boulder |
Heckman, Christoffer | University of Colorado at Boulder |
Keywords: AI-Enabled Robotics, Deep Learning Methods, AI-Based Methods
Abstract: Autonomous navigation and exploration in unmapped environments remains a significant challenge in robotics due to the difficulty robots face in making commonsense inference of unobserved geometries. Recent advancements have demonstrated that generative modeling techniques, particularly diffusion models, can enable systems to infer these geometries from partial observation. In this work, we present implementation details and results for real-time, online occupancy prediction using a modified diffusion model. By removing attention-based visual conditioning and visual feature extraction components, we achieve a 73% reduction in runtime with minimal accuracy reduction. These modifications enable occupancy prediction across the entire map, rather than limiting it to the area around the robot where sensor data can be collected. We introduce a probabilistic update method for merging predicted occupancy data into running occupancy maps, resulting in a 71% improvement in predicting occupancy at map frontiers compared to previous methods. Finally, our code and a ROS node for on-robot operation can be found on our website: https://arpg.github.io/scenesense/.
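A brief sketch of one way predicted occupancy can be folded into a running log-odds map; the confidence weighting and clamping bounds are assumptions for this sketch, not the paper's exact update rule.

```python
import numpy as np

def logodds(p):
    """Convert an occupancy probability to log-odds."""
    return np.log(p / (1.0 - p))

def merge_predicted_occupancy(map_logodds, predicted_p, confidence=0.5,
                              l_min=-4.0, l_max=4.0):
    """Fuse a predicted occupancy probability grid into a running log-odds map.

    `confidence` down-weights generative predictions relative to direct sensor
    updates (an assumption for this sketch); log-odds are clamped to avoid saturation.
    """
    update = confidence * logodds(np.clip(predicted_p, 1e-3, 1.0 - 1e-3))
    return np.clip(map_logodds + update, l_min, l_max)

# Example: a frontier cell with no prior evidence (log-odds 0) receives a
# prediction of 0.8 occupancy from the generative model.
running_map = np.zeros((4, 4))
prediction = np.full((4, 4), 0.5)        # 0.5 means "no information", log-odds 0
prediction[1, 2] = 0.8
running_map = merge_predicted_occupancy(running_map, prediction)
print(1.0 / (1.0 + np.exp(-running_map[1, 2])))   # posterior occupancy, roughly 0.67
```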
|
|
15:30-15:35, Paper TuCT6.4 | Add to My Program |
Point2Graph: An End-To-End Point Cloud-Based 3D Open-Vocabulary Scene Graph for Robot Navigation |
|
Xu, Yifan | University of Michigan |
Luo, Ziming | University of Michigan |
Wang, Qianwei | University of Michigan, Ann Arbor |
Kamat, Vineet | University of Michigan |
Menassa, Carol | University of Michigan-Ann Arbor |
Keywords: Object Detection, Segmentation and Categorization, Human Factors and Human-in-the-Loop, Physically Assistive Devices
Abstract: Current open-vocabulary scene graph generation algorithms highly rely on both 3D scene point cloud data and posed RGB-D images and thus have limited applications in scenarios where RGB-D images or camera poses are not readily available. To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework in which the requirement of posed RGB-D image series is eliminated. This hierarchical framework contains room and object detection/segmentation and open-vocabulary classification. For the room layer, we leverage the advantage of merging the geometry-based border detection algorithm with the learning-based region detection to segment rooms and create a “Snap-Lookup” framework for open-vocabulary room classification. In addition, we create an end-to-end pipeline for the object layer to detect and classify 3D objects based solely on 3D point cloud data. Our evaluation results show that our framework can outperform the current state-of-the-art (SOTA) open-vocabulary object and room segmentation and classification algorithm on widely used real-scene datasets. We provide code and videos at: https://point2graph.github.io/
|
|
15:35-15:40, Paper TuCT6.5 | Add to My Program |
Estimating Commonsense Scene Composition on Belief Scene Graphs |
|
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Kottayam Viswanathan, Vignesh | Lulea University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Semantic Scene Understanding, Learning Categories and Concepts, AI-Enabled Robotics
Abstract: This work establishes the concept of commonsense scene composition, with a focus on extending Belief Scene Graphs by estimating the spatial distribution of unseen objects. Specifically, the commonsense scene composition capability refers to the understanding of the spatial relationships among related objects in the scene, which in this article is modeled as a joint probability distribution for all possible locations of the semantic object class. The proposed framework includes two variants of a Correlation Information (CECI) model for learning probability distributions: (i) a baseline approach based on a Graph Convolutional Network, and (ii) a neuro-symbolic extension that integrates a spatial ontology based on Large Language Models (LLMs). Furthermore, this article provides a detailed description of the dataset generation process for such tasks. Finally, the framework has been validated through multiple runs on simulated data, as well as in a real-world indoor environment, demonstrating its ability to spatially interpret scenes across different room types. For a video of the article, showcasing the experimental demonstration, please refer to the following link: https://youtu.be/f0tqtPVFZ2A
|
|
15:40-15:45, Paper TuCT6.6 | Add to My Program |
VLM-Vac: Enhancing Smart Vacuums through VLM Knowledge Distillation and Language-Guided Experience Replay |
|
Mirjalili, Reihaneh | University of Technology Nuremberg |
Krawez, Michael | University of Technology Nuremberg |
Walter, Florian | University of Technology Nuremberg |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: AI-Enabled Robotics, Domestic Robotics, Learning from Experience
Abstract: In this paper, we propose VLM-Vac, a novel framework designed to enhance the autonomy of smart robot vacuum cleaners. Our approach integrates the zero-shot object detection capabilities of a Vision-Language Model (VLM) with a Knowledge Distillation (KD) strategy. By leveraging the VLM, the robot can categorize objects into actionable classes---either to avoid or to suck---across diverse backgrounds. However, frequently querying the VLM is computationally expensive and impractical for real-world deployment. To address this issue, we implement a KD process that gradually transfers the essential knowledge of the VLM to a smaller, more efficient model. Our real-world experiments demonstrate that this smaller model progressively learns from the VLM and requires significantly fewer queries over time. Additionally, we tackle the challenge of continual learning in dynamic home environments by exploiting a novel experience replay method based on language-guided sampling. Our results show that this approach not only reduces energy consumption by 53% compared to cumulative learning but also surpasses conventional vision-based clustering methods, particularly in detecting small objects across diverse backgrounds.
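A compact sketch of the distillation loop, with the VLM call, the student network, and the replay sampling all reduced to placeholders (uniform rather than language-guided sampling); it only illustrates how pseudo-labels from a teacher can supervise a small on-device classifier, not the authors' pipeline.

```python
import torch
import torch.nn as nn

def query_vlm(image_batch):
    """Placeholder for the expensive zero-shot VLM call: returns pseudo-labels
    0 ('avoid') or 1 ('suck'). In the paper the student gradually replaces it."""
    return torch.randint(0, 2, (image_batch.shape[0],))

# A deliberately tiny student classifier standing in for the distilled model.
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                        nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

replay_buffer = []                       # (image, pseudo_label) pairs kept for replay
for step in range(100):
    images = torch.rand(8, 3, 32, 32)    # stand-in for camera crops of floor objects
    with torch.no_grad():
        pseudo = query_vlm(images)       # teacher supervision from the VLM
    replay_buffer.extend(zip(images, pseudo))

    # Mix fresh samples with replayed ones (language-guided sampling in the paper;
    # uniform sampling here for brevity).
    idx = torch.randint(0, len(replay_buffer), (8,)).tolist()
    old_imgs = torch.stack([replay_buffer[i][0] for i in idx])
    old_lbls = torch.stack([replay_buffer[i][1] for i in idx])

    logits = student(torch.cat([images, old_imgs]))
    loss = criterion(logits, torch.cat([pseudo, old_lbls]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```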
|
|
15:45-15:50, Paper TuCT6.7 | Add to My Program |
Simultaneously Search and Localize Semantic Objects in Unknown Environments |
|
Qian, Zhentian | WPI |
Fu, Jie | University of Florida |
Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Keywords: Reactive and Sensor-Based Planning, SLAM, Planning under Uncertainty
Abstract: For a robot in an unknown environment to find a target semantic object, it must perform simultaneous localization and mapping (SLAM) at both geometric and semantic levels using its onboard sensors while planning and executing its motion based on the ever-updated SLAM results. In other words, the robot must simultaneously conduct localization, semantic mapping, motion planning, and execution online in the presence of sensing and motion uncertainty. This is an open problem as it intertwines semantic SLAM and adaptive online motion planning and execution under uncertainty based on perception. Moreover, the goals of the robot's motion change on the fly depending on whether and how the robot can detect the target object. We propose a novel approach to tackle the problem, leveraging semantic SLAM, Bayesian Networks, and online probabilistic motion planning. The results demonstrate our approach's effectiveness and efficiency.
|
|
TuCT7 Regular Session, 309 |
Add to My Program |
Legged Locomotion: Novel Platforms |
|
|
Chair: Ye, Keran | University of California, Riverside |
Co-Chair: Goldman, Daniel | Georgia Institute of Technology |
|
15:15-15:20, Paper TuCT7.1 | Add to My Program |
Integrated Barometric Pressure Sensors on Legged Robots for Enhanced Tactile Exploration of Edges |
|
Van Hauwermeiren, Thijs | Ghent University |
Sianov, Anatolii | University of Gent, EELAB |
Coene, Annelies | Ghent University |
Crevecoeur, Guillaume | Ghent University |
Keywords: Legged Robots, Soft Sensors and Actuators, Soft Robot Applications
Abstract: This paper presents a new tactile sensor that utilizes an array of barometric pressure sensors encased in a deformable rubber sphere. Designed as an end effector foot of the legged quadruped robot Unitree Go1, the proposed sensor is able to withstand repeated impacts of at least 40 N at the feet. Tactile sensing extends the utility of legged robots, specifically in the context of edge detection and exploration. The presented tactile contact framework processes the pressure data to classify the type of contact (no contact, flat, or edge) and the orientation of the edges relative to the robot base. To assess the performance of the sensors and their suitability for tactile edge exploration, we extensively test the legged robot under varied conditions, including different terrains, changing payloads, exposure to dynamic disturbances, and sloped edges. The edge detection is compared against the original scalar force sensors. Experiments demonstrate a mean absolute error of 2° for predicted edge orientation at a detection range of 14 mm and robust operation in realistic operating conditions for medium-sized quadruped robots. With this contribution, we aim to enhance the capabilities and safety of legged robots in various applications.
|
|
15:20-15:25, Paper TuCT7.2 | Add to My Program |
Development of a New Biped Robot with Adaptive Suction Modules for Climbing on Curved Surfaces |
|
Li, Zikang | University of Macau |
Zhang, Weijian | University of Macau |
Wu, Zehao | University of Macau |
Xu, Qingsong | University of Macau |
Keywords: Climbing Robots, Legged Robots, Robotics and Automation in Construction
Abstract: Regular cleaning and maintenance of high-altitude pipes and curved surfaces on high-rise buildings are high-risk tasks for human workers due to the difficulty of working on curved planes. To address this challenge, automated robots are widely used for cleaning buildings with flat walls, but they cannot climb on curved surfaces, limiting their practical applications. This paper proposes a novel biped curved-surface climbing robot (BCCR) with five-degree-of-freedom (5-DOF) motion. The BCCR features adaptive vacuum suction modules that can adhere to both curved and flat surfaces, allowing seamless movement of the BCCR across various surfaces. Each terminal suction module is composed of three small suction cups, which are capable of rotating in all directions to achieve adaptive adhesion on various surfaces. The 5-DOF structure enables the robot to cross obstacles and makes it highly versatile for various cleaning tasks on a wide range of surfaces, including large curved pipes. The mechanism design and analytical modeling of the BCCR are carried out, demonstrating its robust curved-surface climbing capabilities. Moreover, a prototype is fabricated for experimental investigation. The results indicate that the proposed 5-DOF BCCR can achieve stable climbing on curved surfaces.
|
|
15:25-15:30, Paper TuCT7.3 | Add to My Program |
Berkeley Humanoid: A Research Platform for Learning-Based Control |
|
Liao, Qiayuan | University of California, Berkeley |
Zhang, Bike | University of California, Berkeley |
Huang, Xuanyu | The Hong Kong University of Science and Technology (Guangzhou) |
Huang, Xiaoyu | Georgia Institute of Technology |
Li, Zhongyu | University of California, Berkeley |
Sreenath, Koushil | University of California, Berkeley |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Humanoid Robot Systems
Abstract: We introduce Berkeley Humanoid, a reliable and low-cost mid-scale humanoid research platform for learning-based control. Our lightweight, in-house-built robot is designed specifically for learning algorithms with accurate simulation, low simulation complexity, anthropomorphic motion, and high reliability against falls. The narrow sim-to-real gap enables agile and robust locomotion across various terrains in outdoor environments, achieved with a simple reinforcement learning controller using light domain randomization. Furthermore, we demonstrate the robot traversing for hundreds of meters, walking on a steep unpaved trail, and hopping with single and double legs as a testimony to its high performance in dynamic walking. Capable of omnidirectional locomotion and withstanding large perturbations with a compact setup, our system aims for rapid sim-to-real deployment of learning-based humanoid systems. Please check our website https://berkeley-humanoid.com/ and code https://github.com/HybridRobotics/isaac_berkeley_humanoid/.
|
|
15:30-15:35, Paper TuCT7.4 | Add to My Program |
Zippy: The Smallest Power-Autonomous Bipedal Robot |
|
Man, Steven | Carnegie Mellon University |
Narita, Soma | Carnegie Mellon University |
Macera, Josef | Carnegie Mellon University |
Oke, Naomi | Carnegie Mellon University |
Johnson, Aaron M. | Carnegie Mellon University |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Humanoid and Bipedal Locomotion, Passive Walking, Underactuated Robots
Abstract: Miniaturizing legged robot platforms is challenging due to hardware limitations that constrain the number, power density, and precision of actuators at that size. By leveraging design principles of quasi-passive walking robots at any scale, stable locomotion and steering can be achieved with simple mechanisms and open-loop control. Here, we present the design and control of “Zippy”, the smallest self-contained bipedal walking robot at only 3.6 cm tall. Zippy has rounded feet, a single motor without feedback control, and is capable of turning, skipping, and ascending steps. At its fastest pace, the robot achieves a forward walking speed of 25 cm/s, which is 10 leg lengths per second, making it the fastest bipedal robot of any size by that metric. This work explores the design and performance of the robot and compares it to similar dynamic walking robots at larger scales.
|
|
15:35-15:40, Paper TuCT7.5 | Add to My Program |
Exploration and Analysis of Torso-Limb Coordination of Quadruped Walkers with Compliant Torso |
|
Xiang, Yuxuan | Japan Advanced Institute of Science and Technology |
Sedoguchi, Taiki | Japan Advanced Institute of Science and Technology |
Zheng, Yanqiu | Ritsumeikan University |
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Keywords: Legged Robots, Underactuated Robots
Abstract: Quadrupeds exhibit remarkable locomotion performance through the coordination between their limbs and torso. From past biological knowledge, it is understood that during walking, the forelimbs primarily contribute to braking, while the hindlimbs are responsible for propulsion. However, in the field of quadruped robot dynamics, effectively leveraging this coordination remains a challenge. To investigate the torso-limb coordination, this study explores the walking performance of a quadruped walker with a compliant torso, driven by either the forelimbs or the hindlimbs. Through numerical simulations, we analyze the walking behavior under different control drive methods. The findings provide insights into the design of compliant-bodied robots and the optimal distribution of propulsion forces between the forelimbs and hindlimbs.
|
|
15:40-15:45, Paper TuCT7.6 | Add to My Program |
Effective Self-Righting Strategies for Elongate Multi-Legged Robots |
|
Teder, Erik | Hillsdale College |
Chong, Baxi | Georgia Institute of Technology |
He, Juntao | Georgia Institute of Technology |
Wang, Tianyu | Georgia Institute of Technology |
Iaschi, Massimiliano | Georgia Institute of Technology |
Soto, Daniel | Georgia Institute of Technology |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Legged Robots, Field Robots, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Centipede-like robots offer an effective and robust solution to navigation over complex terrain with minimal sensing. However, when climbing over obstacles, such multi-legged robots often elevate their center-of-mass into unstable configurations, where even moderate terrain uncertainty can cause tipping over. Robust mechanisms for such elongate multi-legged robots to self-right remain unstudied. Here, we developed a comparative biological and robophysical approach to investigate self-righting strategies. We first released S. polymorpha upside down from a 10 cm height and recorded their self-righting behaviors using top and side view high-speed cameras. Using kinematic analysis, we hypothesize that these behaviors can be prescribed by two traveling waves superimposed in the body’s lateral and vertical planes, respectively. We tested our hypothesis on an elongate robot with static (non-actuated) limbs, and we successfully reconstructed these self-righting behaviors. We further evaluated how wave parameters affect self-righting effectiveness. We identified two key wave parameters: the spatial frequency, which characterizes the sequence of body-rolling, and the wave amplitude, which characterizes body curvature. By empirically obtaining a behavior diagram of spatial frequency and amplitude, we identify effective and versatile self-righting strategies for general elongate multi-legged robots, which greatly enhances these robots' mobility and robustness in practical applications such as agricultural terrain inspection and search-and-rescue.
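For readers unfamiliar with the two-wave idea mentioned above, the sketch below illustrates the general construction: lateral and vertical joint angles follow sinusoids that share a spatial frequency but are phase-shifted by a quarter period, producing a rolling sequence along the body. The joint count, amplitudes, and frequencies here are illustrative assumptions, not the parameters identified in the paper.

```python
import numpy as np

def two_wave_template(t, n_joints=8, amp_lat=0.4, amp_vert=0.3,
                      spatial_freq=1.0, omega=2.0 * np.pi * 0.5):
    """Joint angles (rad) for superimposed lateral and vertical traveling waves.

    amp_* are wave amplitudes, spatial_freq is the number of wave periods
    along the body, omega the temporal frequency.  All values here are
    illustrative, not identified robot parameters.
    """
    s = np.arange(n_joints) / n_joints                 # normalized position along the body
    phase = 2.0 * np.pi * spatial_freq * s - omega * t
    lateral = amp_lat * np.sin(phase)                  # wave in the lateral plane
    vertical = amp_vert * np.sin(phase + np.pi / 2)    # vertical wave, 90 deg shifted
    return lateral, vertical

# Example: sample joint targets over one second at 50 Hz.
for t in np.linspace(0.0, 1.0, 50):
    lat, vert = two_wave_template(t)
```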
|
|
15:45-15:50, Paper TuCT7.7 | Add to My Program |
Addition of a Peristaltic Wave Improves Multi-Legged Locomotion Performance on Complex Terrains |
|
Iaschi, Massimiliano | Georgia Institute of Technology |
Chong, Baxi | Georgia Institute of Technology |
Wang, Tianyu | Georgia Institute of Technology |
Lin, Jianfeng | Georgia Institute of Technology |
Xu, Zhaochen | Columbia University |
Soto, Daniel | Georgia Institute of Technology |
He, Juntao | Georgia Institute of Technology |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Legged Robots, Search and Rescue Robots, Biologically-Inspired Robots
Abstract: Characterized by their elongate bodies and relatively simple legs, multi-legged robots have the potential to locomote through complex terrains for applications such as search-and-rescue and terrain inspection. Prior work has developed effective and reliable locomotion strategies for multi-legged robots by propagating the two waves of lateral body undulation and leg stepping, which we will refer to as the two-wave template. However, these robots have limited capability to climb over obstacles with sizes comparable to their heights. We hypothesize that such limitations stem from the two-wave template that we used to prescribe the multi-legged locomotion. Seeking effective alternative waves for obstacle-climbing, we designed a five-segment robot with static (non-actuated) legs, where each cable-driven joint has a rotational degree-of-freedom (DoF) in the sagittal plane (vertical wave) and a linear DoF (peristaltic wave). We tested robot locomotion performance on a flat terrain and a rugose terrain. While the benefit of peristalsis on flat-ground locomotion is marginal, the inclusion of a peristaltic wave substantially improves the locomotion performance in rugose terrains: it not only enables obstacle-climbing capabilities with obstacles having a similar height as the robot, but it also significantly improves the traversing capabilities of the robot in such terrains. Our results demonstrate an alternative actuation mechanism for multi-legged robots, paving the way towards all-terrain multi-legged robots.
|
|
TuCT8 Regular Session, 311 |
Add to My Program |
Medical Robotics 3 |
|
|
Chair: Diaz-Mercado, Yancy | University of Maryland |
Co-Chair: Dogramadzi, Sanja | University of Sheffield |
|
15:15-15:20, Paper TuCT8.1 | Add to My Program |
In Vivo Feasibility Study: Evaluating Autonomous Data-Driven Robotic Needle Trajectory Correction in MRI-Guided Transperineal Procedures |
|
Bernardes, Mariana C. | Brigham and Women's Hospital / Harvard Medical School |
Moreira, Pedro | Brigham and Women's Hospital / Harvard Medical School |
Lezcano, Dimitri A. | Johns Hopkins University |
Foley, Lori | Brigham and Women's Hospital |
Tuncali, Kemal | BWH |
Tempany, Clare | Brigham & Women's Hospital, Harvard Medical School |
Kim, Jin Seob | Johns Hopkins University |
Hata, Nobuhiko | Brigham and Women's Hospital |
Iordachita, Ioan Iulian | Johns Hopkins University |
Tokuda, Junichi | Brigham and Women's Hospital and Harvard Medical School |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles
Abstract: This study addresses the targeting challenges in MRI-guided transperineal needle placement for prostate cancer diagnosis and treatment, a procedure where accuracy is crucial for effective outcomes. We introduce a parameter-agnostic trajectory correction approach incorporating a data-driven closed-loop strategy based on radial displacement and FBG-based shape sensing to enable autonomous needle steering. In an animal study designed to emulate clinical complexity and assess MRI compatibility through a mock biopsy procedure, our approach demonstrated a significant improvement in targeting accuracy (p < 0.05), with a mean target error of only 2.2 ± 1.9 mm on first insertion attempts, without needle reinsertions. To the best of our knowledge, this work represents the first in vivo evaluation of robotic needle steering with FBG-sensor feedback, marking a significant step towards its clinical translation.
|
|
15:20-15:25, Paper TuCT8.2 | Add to My Program |
Pre-Surgical Planner for Robot-Assisted Vitreoretinal Surgery: Integrating Eye Posture, Robot Position and Insertion Point |
|
Inagaki, Satoshi | NSK.Ltd |
Alikhani, Alireza | Augen Klinik Und Poliklinik, Klinikum Rechts Der Isar Der Techn |
Navab, Nassir | TU Munich |
Issa, Peter Charbel | Klinikum Rechts Der Isar, Technical University of Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Medical Robots and Systems, Surgical Robotics: Planning
Abstract: Several robotic frameworks have been recently developed to assist ophthalmic surgeons in performing complex vitreoretinal procedures such as subretinal injection of advanced therapeutics. These surgical robots show promising capabilities; however, most of them have to limit their working volume to achieve maximum accuracy. Moreover, the visible area seen through the surgical microscope is limited and solely depends on the eye posture. If the eye posture, trocar position, and robot configuration are not correctly arranged, the instrument may not reach the target position, and the preparation will have to be redone. Therefore, this paper proposes an optimization framework for eye tilting and robot positioning to reach various target areas for different patients. Our method was validated with an adjustable phantom eye model, and the error of this workflow was 0.13 ± 1.65 deg (rotational joint around Y axis), -1.40 ± 1.13 deg (around X axis), and 1.80 ± 1.51 mm (depth, Z). The potential error sources are also analyzed in the discussion section.
|
|
15:25-15:30, Paper TuCT8.3 | Add to My Program |
Suture Thread Modeling Using Control Barrier Functions for Autonomous Surgery |
|
Forghani, Kimia | University of Maryland College Park |
Raval, Suraj | University of Maryland, College Park |
Mair, Lamar | Weinberg Medical Physics, Inc |
Krieger, Axel | Johns Hopkins University |
Diaz-Mercado, Yancy | University of Maryland |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Distributed Robot Systems, Collision Avoidance
Abstract: Automating surgical systems enhances precision and safety while reducing human involvement in high-risk environments. A major challenge in automating surgical procedures like suturing is accurately modeling the suture thread, a highly flexible and compliant component. Existing models either lack the accuracy needed for safety-critical procedures or are too computationally intensive for real-time execution. In this work, we introduce a novel approach for modeling suture thread dynamics using control barrier functions (CBFs), achieving both realism and computational efficiency. Thread-like behavior, collision avoidance, stiffness, and damping are all modeled within a unified CBF and control Lyapunov function (CLF) framework. Our approach eliminates the need to calculate complex forces or solve differential equations, significantly reducing computational overhead while maintaining a realistic model suitable for both automation and virtual-reality surgical training systems. The framework also allows visual cues to be provided based on the thread’s interaction with the environment, enhancing user experience when performing suture or ligation tasks. The proposed model is tested on the MagnetoSuture system, a minimally invasive robotic surgical platform that uses magnetic fields to manipulate suture needles, offering a less invasive solution for surgical procedures.
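As background on the CBF machinery the abstract refers to, the sketch below shows a generic CBF-style safety filter for a single thread node avoiding a spherical obstacle; with one affine constraint, the usual quadratic program has a closed-form half-space projection. Function names, gains, and geometry are illustrative assumptions, and this is not the authors' unified CBF/CLF thread model.

```python
import numpy as np

def cbf_filter_velocity(x, u_des, obstacle, radius, alpha=5.0):
    """Minimally modify a desired node velocity so the node keeps a safe
    distance from a spherical obstacle, with barrier h(x) = ||x - o|| - r >= 0.

    Enforces the standard CBF condition  grad_h(x) . u >= -alpha * h(x).
    With a single affine constraint, the QP  min ||u - u_des||^2  reduces to
    the closed-form half-space projection used below.
    """
    d = x - obstacle
    dist = np.linalg.norm(d)
    h = dist - radius
    grad_h = d / dist                      # gradient of h w.r.t. x
    slack = grad_h @ u_des + alpha * h     # constraint residual for the desired velocity
    if slack >= 0.0:                       # already safe: keep the desired velocity
        return u_des
    # Project u_des onto the half-space {u : grad_h . u >= -alpha * h}.
    return u_des + (-slack) * grad_h / (grad_h @ grad_h)

x = np.array([0.10, 0.00, 0.05])           # thread node position (m), illustrative
u = cbf_filter_velocity(x, np.array([-1.0, 0.0, 0.0]),
                        obstacle=np.zeros(3), radius=0.08)
```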
|
|
15:30-15:35, Paper TuCT8.4 | Add to My Program |
Robotic Colonoscopy: Can High Fidelity Simulation Optimize Robot Design and Validation? |
|
Evans, Michael | The University of Sheffield |
Du, Jiayang | The University of Sheffield |
Cao, Lin | University of Sheffield |
Dogramadzi, Sanja | University of Sheffield |
Keywords: Medical Robots and Systems, Simulation and Animation, Surgical Robotics: Steerable Catheters/Needles
Abstract: This paper presents the use of a simulation environment as an accurate, ethical and sustainable alternative to testing robotic prototypes in animal models and simplified phantom models. The simulation is specifically developed for robotic colonoscopy devices inside the human colon. A virtual simulation of the locomotion mechanism of a prototype robotic colonoscope and the colon was created in Ansys, and robot/colon experiments were conducted on different colon surfaces to validate simulation results. The successfully simulated propulsion force generated by the prototype produced an RMSE of 7% when compared at the optimal operating condition of the device, and 25-30% when compared to a full range of device velocities. The larger RMSE is due to physical phenomena that were not present in the simulation due to the constraints applied. The simulation, however, allowed evaluation of quantities that are difficult to measure in real settings, such as the normal interaction force between the device and the tissue wall and the stress distribution across the locomotion mechanism, as well as a phenomenon of oscillating propulsion force resulting from the device design. This work demonstrates the feasibility of using finite element simulation to shape the design and optimization of a robotic colonoscope and to understand its interaction with highly complex human anatomy.
|
|
15:35-15:40, Paper TuCT8.5 | Add to My Program |
Robotic Tissue Manipulation in Endoscopic Submucosal Dissection Via Visual Feedback |
|
Zhang, Tao | Arizona State University |
Jue, Terry | Mayo Clinic |
Marvi, Hamidreza | Arizona State University |
Keywords: Medical Robots and Systems
Abstract: Colorectal cancer is the third most commonly diagnosed cancer and the second leading cause of cancer-related deaths in the United States. Despite advancements in screening and treatment, there remains a critical need for more effective and minimally invasive methods to manage complex polyps and early-stage colorectal cancers. This study introduces a novel approach to magnetic tissue manipulation for Endoscopic Submucosal Dissection (ESD), leveraging visual feedback to enhance precision and control. We develop and evaluate the proposed system within a ROS Gazebo simulation environment, integrating a small magnetic endoscopic clip affixed to tissue, which is manipulated by an external large magnet mounted on a robotic arm. A key challenge in ESD is achieving adequate tissue exposure for precise cutting, particularly in the confined space of the colon where the endoscope is manually controlled. To address this, our system enables controlled manipulation of the magnetic clip to optimize tissue retraction. The robotic arm, guided by real-time visual feedback, dynamically adjusts the internal clip’s orientation. Multiple virtual cameras were used to validate the proposed method. The simulation results demonstrated that the robot arm successfully manipulated the internal magnetic clip to the desired tilt angle within an average of 8.4 seconds (range 5.3 to 15.2 s). Our findings suggest that robotic-assisted magnetic tissue manipulation has the potential to improve ESD success rates while reducing procedure time, paving the way for further advancements in minimally invasive endoscopic surgery.
|
|
15:40-15:45, Paper TuCT8.6 | Add to My Program |
Learning Based Estimation of Tool-Tissue Interaction Forces for Stationary and Moving Environments |
|
Nowakowski, Lukasz | Western University |
Patel, Rajnikant V. | The University of Western Ontario |
Keywords: Medical Robots and Systems, Deep Learning Methods, Haptics and Haptic Interfaces
Abstract: Accurately estimating tool-tissue interaction forces during robotics-assisted minimally invasive surgery is an important aspect of enabling haptics-based teleoperation. By collecting data regarding the state of a robot in a variety of configurations, neural networks can be trained to predict this interaction force. This paper extends existing work in this domain by collecting one of the largest known ground truth force datasets for stationary as well as moving phantoms that replicate tissue motions found in clinical procedures. Existing methods, and a new transformer-based architecture, are evaluated to demonstrate the domain gap between stationary and moving phantom tissue data and the impact that data scaling has on each architecture's ability to generalize the force estimation task. It was found that temporal networks were more sensitive to the moving domain than single-sample Feed Forward Networks (FFNs) that were trained on stationary tissue data. However, the transformer approach results in the lowest Root Mean Square Error (RMSE) when evaluating networks trained on examples of both stationary and moving phantom tissue samples. The results demonstrate the domain gap between stationary and moving surgical environments and the effectiveness of scaling datasets for increased accuracy of interaction force prediction.
|
|
15:45-15:50, Paper TuCT8.7 | Add to My Program |
RASEC: Rescaling Acquisition Strategy with Energy Constraints under Fusion Kernel for Active Incision Recommendation in Tracheotomy (I) |
|
Yue, Wenchao | The Chinese University of Hong Kong |
Bai, Fan | The Chinese University of Hong Kong |
Liu, Jianbang | The Chinese University of Hong Kong |
Ju, Feng | Nanjing University of Aeronautics and Astronautics |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Lim, Chwee Ming | National University of Singapore |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Medical Robots and Systems, Surgical Robotics: Planning, Telerobotics and Teleoperation
Abstract: Tracheotomy is critical for patients needing prolonged intubation or airway management, where accurate incision placement is essential to avoid complications. Current techniques rely on palpating cartilage landmarks, which can be imprecise. This paper presents RASEC, an autonomous palpation-based strategy that enhances robot-assisted tracheotomy by interactively predicting acquisition points to maximize information and minimize palpation costs. We employ a Gaussian Process (GP) to model the distribution of tissue hardness, integrating anatomical data as prior input to guide palpation. A dynamic tactile sensor, based on resonant frequency, measures tissue hardness with millimeter-scale contact for secure interaction. We use kernel fusion, combining the Squared Exponential (SE) and Ornstein-Uhlenbeck (OU) kernels, to optimize Bayesian searches using laryngeal anatomical knowledge. The acquisition strategy also considers the tactile sensor’s movement distance and the robotic base link's rotation during incision localization. Simulations and physical experiments demonstrate a 53.1% reduction in sensor movement distance and a 75.2% improvement in rotation angle. The results yield an average precision of 0.932, recall of 0.973, and F1 score of 0.952, showcasing RASEC’s efficacy in exploration efficiency, cost awareness, and localization accuracy for tracheotomy procedures.
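To illustrate the kernel-fusion idea in isolation, the sketch below runs a plain GP regression whose covariance is a weighted sum of a Squared Exponential and an Ornstein-Uhlenbeck kernel. The length scales, weights, noise level, and 1-D palpation data are illustrative assumptions rather than the values or the acquisition logic used in RASEC.

```python
import numpy as np

def k_se(a, b, ell=0.02):
    """Squared Exponential kernel on 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell ** 2)

def k_ou(a, b, ell=0.02):
    """Ornstein-Uhlenbeck (exponential) kernel on 1-D inputs."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-d / ell)

def fused_gp_posterior(x_train, y_train, x_query, w_se=0.5, w_ou=0.5, noise=1e-3):
    """GP posterior mean/variance under a weighted sum of SE and OU kernels."""
    def k(a, b):
        return w_se * k_se(a, b) + w_ou * k_ou(a, b)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = k(x_query, x_train)
    sol = np.linalg.solve(K, y_train)
    mean = Ks @ sol
    var = np.diag(k(x_query, x_query) - Ks @ np.linalg.solve(K, Ks.T))
    return mean, var

x = np.array([0.00, 0.01, 0.02, 0.035])   # palpated positions (m), illustrative
y = np.array([0.2, 0.8, 0.9, 0.3])        # measured tissue hardness (arbitrary units)
mu, sigma2 = fused_gp_posterior(x, y, np.linspace(0.0, 0.04, 50))
```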
|
|
TuCT9 Regular Session, 312 |
Add to My Program |
Motion Planning 3 |
|
|
Chair: Kallmann, Marcelo | Amazon Robotics |
Co-Chair: Vundurthy, Bhaskar | Carnegie Mellon University |
|
15:15-15:20, Paper TuCT9.1 | Add to My Program |
Locally Homotopic Paths: Ensuring Consistent Paths in Hierarchical Path Planning |
|
Wongpiromsarn, Tichakorn | Iowa State University |
Kallmann, Marcelo | Amazon Robotics |
Kolling, Andreas | Amazon |
Keywords: Motion and Path Planning, Integrated Planning and Control, Optimization and Optimal Control
Abstract: We consider a local planner that utilizes model predictive control to locally deviate from a prescribed global path in response to dynamic environments, taking into account the system dynamics. To ensure the consistency between the local and global paths, we introduce the concept of locally homotopic paths for paths with different origins and destinations. We then formulate a hard constraint to ensure that local paths are locally homotopic to a given global path. Additionally, we propose a cost function to penalize any violation of this requirement rather than completely prohibiting it. Experimental results show that both variants of our approach are more resilient to localization errors compared to existing methods that represent the homotopy class constraint as an envelope around the global path.
|
|
15:20-15:25, Paper TuCT9.2 | Add to My Program |
Multi-Covering a Point Set by M Disks with Minimum Total Area |
|
Guitouni, Mariem | University of Houston |
Loi, Chek-Manh | Technische Universität Braunschweig |
Perk, Michael | TU Braunschweig |
Fekete, Sándor | Technische Universität Braunschweig |
Becker, Aaron | University of Houston |
Keywords: Computational Geometry, Aerial Systems: Applications, Optimization and Optimal Control
Abstract: A common robotics sensing problem is to place sensors to robustly monitor a set of assets, where robustness is assured by requiring asset p to be monitored by at least κ(p) sensors. Given n assets that must be observed by m sensors, each with a disk-shaped sensing region, where should the sensors be placed to minimize the total area observed? We provide and analyze a fast heuristic for this problem. We then use the heuristic to initialize an exact Integer Programming solution. Subsequently, we enforce separation constraints between the sensors by modifying the integer program formulation and by changing the disk candidate set.
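For intuition about the multi-cover objective, the sketch below implements a naive greedy heuristic that repeatedly adds the candidate disk with the smallest area paid per unit of still-needed coverage until every point p is covered κ(p) times. The candidate set and data are toy assumptions; the paper's heuristic and exact integer program are more refined.

```python
import numpy as np

def greedy_multicover(points, kappa, candidates):
    """Greedy heuristic: pick the candidate disk (center, radius) whose area
    per unit of still-needed coverage is smallest, until every point p is
    covered by at least kappa[p] chosen disks.  A toy version of the idea only.
    """
    need = np.array(kappa, dtype=float)
    chosen = []
    while need.max() > 0:
        best, best_score = None, np.inf
        for c, r in candidates:
            covered = np.linalg.norm(points - c, axis=1) <= r
            gain = np.minimum(covered.astype(float), need).sum()
            if gain > 0:
                score = np.pi * r ** 2 / gain      # area paid per coverage gained
                if score < best_score:
                    best, best_score = (c, r), score
        if best is None:
            raise RuntimeError("candidate set cannot satisfy the coverage demands")
        c, r = best
        chosen.append(best)
        covered = np.linalg.norm(points - c, axis=1) <= r
        need = np.maximum(need - covered, 0)
    return chosen

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])      # assets to monitor
cands = [(p, r) for p in pts for r in (0.6, 1.2)]          # toy candidate disks
disks = greedy_multicover(pts, kappa=[2, 1, 1], candidates=cands)
```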
|
|
15:25-15:30, Paper TuCT9.3 | Add to My Program |
Non-Conservative Obstacle Avoidance for Multi-Body Systems Leveraging Convex Hulls and Predicted Closest Points |
|
Rassaerts, Lotte | Eindhoven University of Technology |
Suichies, Eke Janke | Eindhoven University of Technology |
van de Vrande, Bram | Philips |
Alonso, Marco | Company |
Meere, Bastiaan Guillermo Lorenzo | Eindhoven University of Technology |
Chong, Michelle S. | Eindhoven University of Technology |
Torta, Elena | Eindhoven University of Technology |
Keywords: Collision Avoidance, Constrained Motion Planning, Computational Geometry
Abstract: This paper introduces a novel approach that integrates future closest point predictions into the distance constraints of a collision avoidance controller, leveraging convex hulls with closest point distance calculations. By addressing abrupt shifts in closest points, this method effectively reduces collision risks and enhances controller performance. Applied to an Image Guided Therapy robot and validated through simulations and user experiments, the framework demonstrates improved distance prediction accuracy, smoother trajectories, and safer navigation near obstacles.
|
|
15:30-15:35, Paper TuCT9.4 | Add to My Program |
Adaptive Distance Functions Via Kelvin Transformation |
|
Cabral Muchacho, Rafael Ignacio | KTH Royal Institute of Technology |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Computational Geometry, Robot Safety
Abstract: The term safety in robotics is often understood as a synonym for avoidance. Although this perspective has led to progress in path planning and reactive control, a generalization of this perspective is necessary to include task semantics relevant to contact-rich manipulation tasks, especially during teleoperation and to ensure the safety of learned policies. We introduce the semantics-aware distance function and a corresponding computational method based on the Kelvin Transformation. This allows us to compute smooth distance approximations in an unbounded domain by instead solving a Laplace equation in a bounded domain. The semantics-aware distance generalizes signed distance functions by allowing the zero level set to lie inside of the object in regions where contact is allowed, effectively incorporating task semantics, such as object affordances, in an adaptive implicit representation of safe sets. In numerical experiments we show the computational viability of our method for real applications and visualize the computed function on a wrench with various semantic regions.
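As a reminder of the Kelvin transformation this abstract builds on: inversion about the unit sphere maps the unbounded exterior of a ball onto its bounded interior, and the transform of a harmonic function remains harmonic, which is what allows an unbounded Laplace problem to be solved in a bounded domain. The sketch below checks this numerically with a finite-difference Laplacian; the test function and evaluation points are arbitrary, and the sketch is unrelated to the paper's semantics-aware distance implementation.

```python
import numpy as np

def kelvin_point(x):
    """Inversion about the unit sphere: maps the unbounded exterior of the
    ball onto its (bounded) interior and vice versa."""
    return x / np.dot(x, x)

def kelvin_transform(u, x):
    """Kelvin transform (K u)(x) = |x|^{-1} u(x / |x|^2) in 3-D.
    If u is harmonic, K u is harmonic away from the origin."""
    return u(kelvin_point(x)) / np.linalg.norm(x)

def laplacian_fd(f, x, h=1e-3):
    """Finite-difference Laplacian, used only to sanity-check harmonicity."""
    val = -2.0 * len(x) * f(x)
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = h
        val += f(x + e) + f(x - e)
    return val / h ** 2

u = lambda x: 1.0 / np.linalg.norm(x - np.array([3.0, 0.0, 0.0]))  # harmonic away from (3,0,0)
x0 = np.array([0.4, 0.2, 0.1])
print(laplacian_fd(lambda x: kelvin_transform(u, x), x0))          # approximately zero
```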
|
|
15:35-15:40, Paper TuCT9.5 | Add to My Program |
A Quantum Annealing Approach to Target Tracking |
|
Barbeau, Michel | Carleton University |
Janabi-Sharifi, Farrokh | Ryerson University |
Masnavi, Houman | Toronto Metropolitan University |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Nonholonomic Motion Planning
Abstract: This paper delves into the fusion of quantum computing and robotics, focusing on motion planning in cluttered environments. Traditional algorithms struggle with complex problems where many constraints need to be satisfied. Hence, optimization-based approaches such as Constrained Quadratic Models (CQM) have become increasingly popular. Our work presents a 3D tracking algorithm based on CQM uniquely adapted for quantum computers to address computational challenges. With their parallel processing capabilities, quantum computers offer a groundbreaking approach to optimizing complex problems. We formulate the CQM problem for efficient resolution on the D-Wave quantum computer, showcasing its superiority over classical counterparts. Our application centers on real-time planning in a target-chaser tracking scenario, highlighting the quantum advantage in handling the computation complexity of constrained problems. This paper bridges the quantum-robotics gap and sets the stage for future research in quantum-enhanced robotic motion planning.
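For readers who have not met CQMs before, the sketch below assembles a toy constrained quadratic model with D-Wave's open-source dimod package: binary variables select one of a few candidate chaser waypoints, the objective is the squared distance to the target, and a single constraint enforces that exactly one waypoint is chosen. The variables, objective, and constraint are purely illustrative and are not the formulation in the paper; sampling on quantum hardware would additionally require a CQM-capable solver such as the Leap hybrid sampler.

```python
import dimod

# Binary variable x_k = 1 if the chaser moves to candidate waypoint k (toy data).
waypoints = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
target = (2.0, 0.8)
x = [dimod.Binary(f"wp_{k}") for k in range(len(waypoints))]

cqm = dimod.ConstrainedQuadraticModel()

# Objective: squared distance from the chosen waypoint to the target.
cqm.set_objective(sum(((wx - target[0]) ** 2 + (wy - target[1]) ** 2) * xk
                      for (wx, wy), xk in zip(waypoints, x)))

# Constraint: exactly one waypoint is selected in this planning step.
cqm.add_constraint(sum(x) == 1, label="one_waypoint")

# `cqm` can now be passed to a CQM-capable solver, e.g. dwave.system's
# LeapHybridCQMSampler (requires D-Wave Leap access).
```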
|
|
15:40-15:45, Paper TuCT9.6 | Add to My Program |
Provable Methods for Searching with an Imperfect Sensor |
|
Kasthurirangan, Prahlad Narasimhan | Stony Brook University |
Nguyen, Linh | Stony Brook University |
Perk, Michael | TU Braunschweig |
Chakraborty, Nilanjan | Stony Brook University |
Mitchell, Joseph | State University of New York at Stony Brook |
Keywords: Motion and Path Planning, Computational Geometry, Planning, Scheduling and Coordination
Abstract: Assume that a target is known to be present at an unknown point among a finite set of locations in the plane. We search for it using a mobile robot that has imperfect sensing capabilities. It takes time for the robot to move between locations and search a location; we have a total time budget within which to conduct the search. We study the problem of computing a search path/strategy for the robot that maximizes the probability of detection of the target. Considering non-uniform travel times between points (e.g., based on the distance between them) is crucial for search and rescue applications; such problems have been investigated to a limited extent due to their inherent complexity. In this paper, we describe fast algorithms with performance guarantees for this search problem and some variants, complement them with complexity results, and perform experiments to observe their performance.
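To make the objective concrete, the sketch below is a naive greedy baseline that repeatedly visits the location with the best expected detection probability gained per unit of travel-plus-search time until the budget runs out. Travel times, the detection model, and the greedy rule are toy assumptions; the paper's contribution is provable algorithms for this setting, not this heuristic.

```python
import numpy as np

def greedy_search_plan(locations, prior, p_detect, search_time, budget, start):
    """Pick the next location by expected detection probability gained per
    unit time (travel + search), until the time budget runs out."""
    pos, t_left, mass = np.array(start, float), budget, np.array(prior, float)
    plan, p_found = [], 0.0
    remaining = set(range(len(locations)))
    while remaining:
        best, best_rate = None, 0.0
        for i in remaining:
            cost = np.linalg.norm(np.array(locations[i]) - pos) + search_time
            if cost <= t_left:
                rate = mass[i] * p_detect / cost
                if rate > best_rate:
                    best, best_rate = i, rate
        if best is None:
            break
        cost = np.linalg.norm(np.array(locations[best]) - pos) + search_time
        p_found += mass[best] * p_detect
        mass[best] *= (1.0 - p_detect)   # posterior mass left after a missed look
        pos, t_left = np.array(locations[best], float), t_left - cost
        plan.append(best)
        remaining.discard(best)
    return plan, p_found

plan, p = greedy_search_plan([(0, 0), (3, 1), (1, 4)], prior=[0.5, 0.3, 0.2],
                             p_detect=0.8, search_time=1.0, budget=8.0, start=(0, 0))
```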
|
|
TuCT10 Regular Session, 313 |
Add to My Program |
Multi-Robot Swarms 2 |
|
|
Chair: Artemiadis, Panagiotis | University of Delaware |
Co-Chair: Pimenta, Luciano | Universidade Federal De Minas Gerais |
|
15:15-15:20, Paper TuCT10.1 | Add to My Program |
Emergence of Collective Behaviors for the Swarm Robotics through Visual Attention-Based Selective Interaction |
|
Zheng, Zhicheng | Northwestern Polytechnical University |
Zhou, Yongjian | Northwestern Polytechnical University |
Xiang, Yalun | Northwestern Polytechnical University |
Lei, Xiaokang | Northwestern Polytechnical University |
Peng, Xingguang | Northwestern Polytechnical University |
Keywords: Swarm Robotics, Biologically-Inspired Robots, Probability and Statistical Methods
Abstract: Numerous local interaction mechanisms have been proposed to achieve collective behaviors in swarm robotics. However, these mechanisms require robots to explicitly obtain the velocity of their neighbors as the sensory input to make motion decisions. This further poses great challenges in real-world applications of swarm robotics. In this letter, inspired by the chasing behavior in large-scale migrating locusts, we propose a visual attention-based model to achieve collective behaviors with positional interaction. Through numerical simulations, we find the emergence of three typical collective behaviors: flocking, milling and swarming. To gain deep insights into the newly proposed model, we investigate the impact of group size and sensory blindness on the emergence of collective behaviors. Moreover, by using the mean field analysis framework, we present the theoretical proof of the emergence of flocking and milling behavior. Furthermore, to validate the feasibility of our proposed model, we reproduce the flocking and milling behavior with up to 50 physical robots. Robotic experiments demonstrate the promising ability of the newly proposed model to achieve collective behaviors in the absence of velocity alignment.
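The flavor of a position-only (velocity-free) interaction rule can be illustrated with the minimal loop below: each agent attends to a single neighbor, here simply the nearest one inside its field of view, and turns toward it. The selection rule, gains, and parameters are assumptions for illustration and do not reproduce the published attention model.

```python
import numpy as np

def step(pos, heading, speed=0.05, fov=np.deg2rad(120), turn_gain=0.5, dt=0.1):
    """One update of a position-only 'chase the attended neighbor' rule.
    Each agent picks the nearest neighbor inside its field of view and turns
    toward it; no neighbor velocities are sensed or shared."""
    n = len(pos)
    new_heading = heading.copy()
    for i in range(n):
        rel = pos - pos[i]
        dist = np.linalg.norm(rel, axis=1)
        dist[i] = np.inf
        bearing = np.arctan2(rel[:, 1], rel[:, 0]) - heading[i]
        bearing = (bearing + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
        visible = np.abs(bearing) <= fov / 2
        visible[i] = False
        if visible.any():
            j = np.argmin(np.where(visible, dist, np.inf))  # attended neighbor
            new_heading[i] += turn_gain * bearing[j] * dt
    pos = pos + speed * dt * np.c_[np.cos(new_heading), np.sin(new_heading)]
    return pos, new_heading

rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, (20, 2))
heading = rng.uniform(-np.pi, np.pi, 20)
for _ in range(500):
    pos, heading = step(pos, heading)
```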
|
|
15:20-15:25, Paper TuCT10.2 | Add to My Program |
Safe Radial Segregation Algorithm for Swarms of Dubins-Like Robots |
|
Bernardes Ferreira Filho, Edson | Royal Holloway University of London |
Brochero Giraldo, David Felipe | Universidade Federal De Minas Gerais |
Dias Nunes, Arthur Henrique | Universidade Federal De Minas Gerais |
Pimenta, Luciano | Universidade Federal De Minas Gerais |
Keywords: Swarm Robotics, Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems
Abstract: This work addresses the problem of radially segregating heterogeneous robotic swarms. Such swarms are those composed of different groups of robots. Unlike other works on segregation in the literature, we propose a controller for Dubins-like robots, motivated by autonomous aerial, wheeled, and underwater vehicles. Our controller can drive the robots individually to converge to circles that are shared only by robots of the same group. We present a heuristic and a collision avoidance scheme in which the information required is locally acquired. We present several simulations widely varying the number of robots per group and the number of groups in which segregation is always reached and collisions between robots are always avoided.
|
|
15:25-15:30, Paper TuCT10.3 | Add to My Program |
Impossibility of Self-Organized Aggregation without Computation |
|
Steinberg, Roy | Technion--Israel Institute of Technology |
Solovey, Kiril | Technion--Israel Institute of Technology |
Keywords: Swarm Robotics, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: In their seminal work, Gauci et al. (2014) studied the fundamental task of aggregation, wherein multiple robots need to gather without an a priori agreed-upon meeting location, using extremely limited hardware. That paper considered differential-drive robots that are memoryless and unable to compute. Moreover, the robots cannot communicate with one another and are only equipped with a simple sensor that determines whether another robot is directly in front of them. Despite those severe limitations, Gauci et al. introduced a controller and proved mathematically that it aggregates a system of two robots for any initial state. Unfortunately, for larger systems, the same controller aggregates empirically in many cases but not all. Thus, the question of whether there exists a controller that aggregates for any number of robots remains open. In this paper, we show that no such controller exists by investigating the geometric structure of controllers. In addition, we disprove the aggregation proof of the aforementioned paper for two robots and present an alternative controller alongside a simple and rigorous aggregation proof.
|
|
15:30-15:35, Paper TuCT10.4 | Add to My Program |
Learning Adversarial Policies for Swarm Leader Identification Using a Probing Agent |
|
Bachoumas, Stergios | University of Delaware |
Artemiadis, Panagiotis | University of Delaware |
Keywords: Swarm Robotics
Abstract: This study introduces a novel approach to swarm leader identification (SLI) in multi-agent robot systems by employing a physical adversary interacting with the swarm in the same environment. We develop a new simulation environment to study the SLI problem and train an adversary, which we term the prober, to solve the SLI problem using forceful interactions with the swarm as its guiding information source. The prober's policy is modeled using the simplified structure state space sequence (S5) model and trained with the Proximal Policy Optimization (PPO) algorithm. The prober only has access to the information on the relative positions of the other agents. We evaluate our approach through extensive simulations using two performance metrics and validate the sim-to-real transfer through robot experiments. Results on evaluating the performance in 10,000 different testing scenarios demonstrate that our method finds the leader's identity in the vast majority (95.7%) of the cases, regardless of the initial leader selection during training. The proposed system represents the first instance of learning-based automatic identification of leader agents in a swarm. This capability is crucial for enabling efficient and robust human-swarm interaction, understanding artificial swarm behaviors, and analyzing latent behaviors in biological swarms in nature.
|
|
15:35-15:40, Paper TuCT10.5 | Add to My Program |
Realizing Emergent Collective Behaviors through Robotic Swarmalators |
|
Beattie, Richard | MIT |
Ceron, Steven | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Cellular and Modular Robots, Swarm Robotics, Multi-Robot Systems
Abstract: Swarmalators move as a function of their pairwise phase interactions, and control their phase as a function of their relative position or motion to other agents. This enables dual sync and swarm behaviors that mimic those exhibited by diverse natural and artificial swarms; these behaviors have almost entirely been explored only through computational simulations. Here, we realize through a 15-robot collective many of the predicted swarmalator behaviors when agents are chiral and non-chiral, when there is frequency coupling, and when the natural frequency distribution is homogeneous and heterogeneous. This work presents an experimental platform that can realize many theoretically predicted collective behaviors, it sheds light on the differences between the simulations and experiments, and it will serve in future studies to realize swarmalator and active matter collective behaviors.
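For context, swarmalator dynamics are commonly written with a phase-dependent spatial attraction and a distance-weighted phase coupling. The sketch below integrates one common form of these equations (in the style of the original swarmalator model) with illustrative parameters; it is a simulation sketch, not the robot controller used in this work.

```python
import numpy as np

def swarmalator_step(x, theta, omega, J=0.5, K=-0.2, A=1.0, B=1.0, dt=0.05):
    """One Euler step of a standard swarmalator model: positions attract and
    repel with a phase-dependent gain, while phases synchronize through a
    distance-weighted Kuramoto-like term."""
    n = len(x)
    dx = np.zeros_like(x)
    dtheta = np.zeros_like(theta)
    for i in range(n):
        rel = x - x[i]
        dist = np.linalg.norm(rel, axis=1)
        dist[i] = np.inf                                   # exclude self-interaction
        dphi = theta - theta[i]
        attract = (A + J * np.cos(dphi))[:, None] * rel / dist[:, None]
        repel = B * rel / (dist ** 2)[:, None]
        dx[i] = np.sum(attract - repel, axis=0) / n
        dtheta[i] = omega[i] + K / n * np.sum(np.sin(dphi) / dist)
    return x + dt * dx, theta + dt * dtheta

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, (15, 2))
theta = rng.uniform(-np.pi, np.pi, 15)
omega = np.zeros(15)                                       # homogeneous natural frequencies
for _ in range(2000):
    x, theta = swarmalator_step(x, theta, omega)
```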
|
|
15:40-15:45, Paper TuCT10.6 | Add to My Program |
Speed and Density Planning for a Speed-Constrained Robot Swarm through a Virtual Tube |
|
Song, Wenqi | Beihang University |
Gao, Yan | School of Automation Science and Electrical Engineering, Beihang |
Quan, Quan | Beihang University |
Keywords: Constrained Motion Planning, Motion Control, Multi-Robot Systems
Abstract: The planning and control of a robot swarm in a complex environment has attracted increasing attention. To this end, the concept of virtual tubes has been taken up in our previous work. Specifically, a virtual tube with varying widths has been planned to avoid collisions with obstacles in a complex environment. Based on the planned virtual tube for a large number of speed-constrained robots, the average forward speed and density along the virtual tube are further planned in this paper to improve safety and efficiency. Compared with the existing methods, the proposed method is founded upon global information and is applicable to traversing confined spaces with speed-constrained robot swarms. Numerical simulations and experiments are conducted to show that the safety and efficiency of the passing-through process are improved. A video about the simulations and experiments is available at https://youtu.be/F3Xg1vUcxws.
|
|
TuCT11 Regular Session, 314 |
Add to My Program |
Human-Robot Collaboration 1 |
|
|
Chair: Rocco, Paolo | Politecnico Di Milano |
Co-Chair: Stepputtis, Simon | Virginia Tech |
|
15:15-15:20, Paper TuCT11.1 | Add to My Program |
Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation |
|
Bhagat, Sarthak | Scaled Foundations |
Li, Samuel | Carnegie Mellon University |
Campbell, Joseph | Purdue University |
Xie, Yaqi | Carnegie Mellon University |
Sycara, Katia | Carnegie Mellon University |
Stepputtis, Simon | Carnegie Mellon University |
Keywords: Intention Recognition, Human-Robot Collaboration, Visual Learning
Abstract: In an era where robots are becoming available to the general public, the applicability of assistive robotics extends across numerous aspects of daily life, including in-home robotics. This work presents a novel approach for such systems, leveraging long-horizon action anticipation from short-observation contexts. In an assistive cooking task, we demonstrate that predicting human intention leads to effective collaboration between humans and robots. Compared to prior approaches, our method halves the required observation time of human behavior before accurate future predictions can be made, thus allowing for quick and effective task support from short contexts. To provide sufficient context in such scenarios, our proposed method analyzes the human user and their interaction with surrounding scene objects by imbuing the system with additional domain knowledge, encoding the scene objects' affordances. We integrate this knowledge into a transformer-based action anticipation architecture, which alters the attention mechanism between different visual features by either boosting or attenuating the attention between them. Through this approach, we achieve an up to 9% improvement on two common action anticipation benchmarks, namely 50Salads and Breakfast. After predicting a sequence of future actions, our system selects an appropriate assistive action that is subsequently executed on a robot for a joint salad preparation task between a human and a robot.
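The boosting and attenuation of attention described above can be illustrated in a few lines: adding the log of an affordance-derived factor to the attention logits multiplies the softmax weights by that factor before renormalization. The sketch below uses hypothetical token roles and factors and is not the paper's architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def affordance_biased_attention(Q, K, V, boost):
    """Scaled dot-product attention whose weights are boosted (>1) or
    attenuated (<1) by an affordance prior between query and key tokens.
    Adding log(boost) to the logits multiplies the softmax weights by
    `boost` before renormalization."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + np.log(boost)
    return softmax(logits) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 16)) for _ in range(3))   # 4 tokens, dim 16
boost = np.ones((4, 4))
boost[0, 2] = 3.0   # e.g. a hand token attends more to a graspable object token
boost[0, 1] = 0.3   # attenuate attention to an irrelevant object token
out = affordance_biased_attention(Q, K, V, boost)
```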
|
|
15:20-15:25, Paper TuCT11.2 | Add to My Program |
Evaluating Robotic Performative Autonomy in Collaborative Contexts Impacted by Latency |
|
Sousa Silva, Rafael | Colorado School of Mines |
Smith, Cailyn | Colorado School of Mines |
Ferreira Bezerra, Lara | Colorado School of Mines |
Williams, Tom | Colorado School of Mines |
Keywords: Human-Robot Collaboration, Space Robotics and Automation, Human-Centered Automation
Abstract: Maintaining Situational Awareness (SA) is critical in space exploration contexts, yet made particularly difficult due to the presence of communication latency. In order to increase human SA without inducing cognitive overload, researchers have proposed Performative Autonomy (PA), in which robots intentionally interact at a lower level of autonomy than they are capable of. While researchers have demonstrated positive impacts of PA on team performance even under high latency, previous work on PA has not examined how the benefits of PA might be mediated by latency. In this work, we thus evaluate the impact of latency and PA on trust, SA, and human perceptions of robot intelligence and autonomy. Our results suggest that lower performed autonomy leads to increased cognitive load, especially when robot communication happens frequently and latency is present. In addition, we observe no effect of the PA strategies used within our experimental paradigm on SA, and instead find evidence that operating under high latency leads to negative perceptions of robots regardless of choice of PA strategy.
|
|
15:25-15:30, Paper TuCT11.3 | Add to My Program |
SYNERGAI: Perception Alignment for Human-Robot Collaboration |
|
Chen, Yixin | National Key Laboratory of General Artificial Intelligence, BIGA |
Zhang, Guoxi | Beijing Institute of General Artificial Intelligence |
Zhang, Yaowei | BIGAI |
Xu, Hongming | Beijing Institute for General Artificial Intelligence |
Zhi, Peiyuan | Beijing Institute for General Artificial Intelligence |
Li, Qing | Beijing Institute for General Artificial Intelligence |
Huang, Siyuan | Beijing Institute for General Artificial Intelligence |
Keywords: Domestic Robotics
Abstract: Recently, large language models (LLMs) have shown strong potential in facilitating human-robotic interaction and collaboration. However, existing LLM-based systems often overlook the misalignment between human and robot perceptions, which hinders their effective communication and real-world robot deployment. To address this issue, we introduce SYNERGAI, a unified system designed to achieve both perceptual alignment and human-robot collaboration. At its core, SYNERGAI employs 3D Scene Graph (3DSG) as its explicit and innate representation. This enables the system to leverage LLM to break down complex tasks and allocate appropriate tools in intermediate steps to extract relevant information from the 3DSG, modify its structure, or generate responses. Importantly, SYNERGAI incorporates an automatic mechanism that enables perceptual misalignment correction with users by updating its 3DSG in real-time. In a zero-shot manner, SYNERGAI achieves comparable performance with the data-driven models in ScanQA. Through comprehensive experiments across 10 real-world scenes, SYNERGAI demonstrates its effectiveness in establishing common ground with humans, realizing a success rate of 61.9% in alignment tasks. It also significantly improves the success rate from 3.7% to 45.68% on novel tasks by transferring the knowledge acquired during alignment.
|
|
15:30-15:35, Paper TuCT11.4 | Add to My Program |
Digital Model-Driven Genetic Algorithm for Optimizing Layout and Task Allocation in Human-Robot Collaborative Assemblies |
|
Cella, Christian | Politecnico Di Milano |
Robin, Matteo Bruce | Politecnico Di Milano |
Faroni, Marco | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Rocco, Paolo | Politecnico Di Milano |
Keywords: Human-Robot Collaboration, Design and Human Factors
Abstract: This paper addresses the optimization of human-robot collaborative work-cells before their physical deployment. Most of the time, such environments are designed based on the experience of the system integrators, often leading to sub-optimal solutions. Accurate simulators of the robotic cell, accounting for the presence of the human as well, are available today and can be used in the pre-deployment phase. We propose an iterative optimization scheme where a digital model of the work-cell is updated based on a genetic algorithm. The methodology focuses on the layout optimization and task allocation, encoding both problems simultaneously in the design variables handled by the genetic algorithm, while the task scheduling problem depends on the result of the upper-level one. The final solution balances conflicting objectives in the fitness function and is validated to show the impact of the objectives with respect to a baseline, which represents possible initial choices selected based on human judgement.
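A compact way to picture the joint encoding is a chromosome that concatenates layout variables with task-allocation genes and is evolved by standard selection, crossover, and mutation. In the sketch below the fitness is a stand-in; in the actual framework this value would come from simulating the digital model of the work-cell. All variables and operators are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TASKS, POP, GENS = 6, 40, 100

def random_individual():
    # chromosome = [robot base x, y (layout)] + [task allocation genes: 1 = robot, 0 = human]
    layout = rng.uniform(0.0, 1.5, size=2)
    alloc = rng.integers(0, 2, size=N_TASKS)
    return np.concatenate([layout, alloc])

def fitness(ind):
    """Stand-in objective; in the real framework this would come from the
    digital model of the work-cell (cycle time, ergonomics, reachability)."""
    layout, alloc = ind[:2], ind[2:]
    robot_load = alloc.sum()
    balance = abs(robot_load - (N_TASKS - robot_load))        # workload balance
    reach = np.linalg.norm(layout - np.array([0.8, 0.6]))     # distance to a workpiece
    return -(balance + 2.0 * reach)                           # higher is better

pop = [random_individual() for _ in range(POP)]
for _ in range(GENS):
    scores = np.array([fitness(ind) for ind in pop])
    parents = [pop[i] for i in np.argsort(scores)[-POP // 2:]]     # truncation selection
    children = []
    while len(children) < POP - len(parents):
        a, b = rng.choice(len(parents), 2, replace=False)
        cut = rng.integers(1, len(parents[a]))
        child = np.concatenate([parents[a][:cut], parents[b][cut:]])   # one-point crossover
        if rng.random() < 0.2:                                         # mutation
            child[:2] += rng.normal(0, 0.05, 2)
            flip = 2 + rng.integers(N_TASKS)
            child[flip] = 1 - child[flip]
        children.append(child)
    pop = parents + children
best = max(pop, key=fitness)
```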
|
|
15:35-15:40, Paper TuCT11.5 | Add to My Program |
Context-Aware Collaborative Pushing of Heavy Objects Using Skeleton-Based Intention Prediction |
|
Solak, Gokhan | Italian Institute of Technology, Genoa |
Giardini Lahr, Gustavo Jose | Hospital Israelita Albert Einstein |
Ozdamar, Idil | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human-Robot Collaboration, Intention Recognition, Physical Human-Robot Interaction
Abstract: In physical human-robot interaction, force feedback has been the most common sensing modality to convey the human intention to the robot. It is widely used in admittance control to allow the human to direct the robot. However, it cannot be used in scenarios where direct force feedback is not available since manipulated objects are not always equipped with a force sensor. In this work, we study one such scenario: the collaborative pushing and pulling of heavy objects on frictional surfaces, a prevalent task in industrial settings. When humans do it, they communicate through verbal and non-verbal cues, where body poses, and movements often convey more than words. We propose a novel context-aware approach using Directed Graph Neural Networks to analyze spatio-temporal human posture data to predict human motion intention for non-verbal collaborative physical manipulation. Our experiments demonstrate that robot assistance significantly reduces human effort and improves task efficiency. The results indicate that incorporating posture-based context recognition, either together with or as an alternative to force sensing, enhances robot decision-making and control efficiency.
|
|
15:40-15:45, Paper TuCT11.6 | Add to My Program |
Interactive Distance Field Mapping and Planning to Enable Human-Robot Collaboration |
|
Ali, Usama | Technische Hochschule Würzburg Schweinfurt |
Wu, Lan | University of Technology Sydney |
Mueller, Adrian | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Sukkar, Fouad | University of Technology Sydney |
Kaupp, Tobias | Technical University of Applied Sciences Würzburg-Schweinfurt |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Keywords: Mapping, Human-Robot Collaboration
Abstract: Human-robot collaborative applications require scene representations that are kept up-to-date and facilitate safe motions in dynamic scenes. In this letter, we present an interactive distance field mapping and planning (IDMP) framework that handles dynamic objects and collision avoidance through an efficient representation. We define interactive mapping and planning as the process of creating and updating the representation of the scene online while simultaneously planning and adapting the robot's actions based on that representation. The key aspect of this work is an efficient Gaussian Process field that performs incremental updates and handles dynamic objects reliably by identifying moving points via a simple and elegant formulation based on queries from a temporary latent model. In terms of mapping, IDMP is able to fuse point cloud data from single and multiple sensors, query the free space at any spatial resolution, and deal with moving objects without semantics. In terms of planning, IDMP allows seamless integration with gradient-based reactive planners facilitating dynamic obstacle avoidance for safe human-robot interactions. Our mapping performance is evaluated on both real and synthetic datasets. A comparison with similar state-of-the-art frameworks shows superior performance when handling dynamic objects and comparable or better performance in the accuracy of the computed distance and gradient field. Finally, we show how the framework can be used for fast motion planning in the presence of moving objects both in simulated and real-world scenes. An accompanying video, code, and datasets are made publicly available.
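One common way to obtain a distance field and its gradient from a Gaussian Process over surface points is the log-GPIS construction sketched below, where an exponential-kernel regression of an occupancy field is converted to distance via a log transform. The kernel choice, parameters, and finite-difference gradient are illustrative and not necessarily the exact IDMP formulation.

```python
import numpy as np

class LogGPDistanceField:
    """Log-GPIS-style distance field: regress an occupancy field over surface
    points with an exponential kernel and recover distance as
    d(x) ~= -log(mu(x)) / lam.  Gradients come from finite differences."""
    def __init__(self, surface_pts, lam=30.0, noise=1e-4):
        self.X, self.lam = surface_pts, lam
        K = self._k(surface_pts, surface_pts) + noise * np.eye(len(surface_pts))
        self.alpha = np.linalg.solve(K, np.ones(len(surface_pts)))

    def _k(self, a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-self.lam * d)

    def distance(self, x):
        mu = (self._k(np.atleast_2d(x), self.X) @ self.alpha)[0]
        return -np.log(max(mu, 1e-12)) / self.lam

    def gradient(self, x, h=1e-3):
        g = np.zeros(len(x))
        for i in range(len(x)):
            e = np.zeros(len(x)); e[i] = h
            g[i] = (self.distance(x + e) - self.distance(x - e)) / (2 * h)
        return g

# Toy surface: a circle of points at height 0.5 m.
pts = np.array([[np.cos(a), np.sin(a), 0.5] for a in np.linspace(0, 2 * np.pi, 60)])
field = LogGPDistanceField(pts)
print(field.distance(np.array([0.0, 0.0, 0.5])), field.gradient(np.array([2.0, 0.0, 0.5])))
```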
|
|
TuCT12 Regular Session, 315 |
Add to My Program |
Calibration 3 |
|
|
Chair: Krovi, Venkat | Clemson University |
Co-Chair: Yuan, Shenghai | Nanyang Technological University |
|
15:15-15:20, Paper TuCT12.1 | Add to My Program |
A Stochastic Cloning Square-Root Information Filter with Accurate Feature Tracking for Visual-Inertial Odometry |
|
Hu, Deshun | Harbin Institute of Technology |
Keywords: Visual-Inertial SLAM, SLAM, Localization
Abstract: In this work, we introduce an enhanced square-root information filter for visual-inertial odometry. This filter utilizes stochastic cloning, implemented via Gaussian elimination, to facilitate time offset calibration and feature anchor changes. By using single-precision numbers within the filter, we significantly reduce computational load and memory requirements. In addition, we employ a fast Mahalanobis distance test and block Householder triangulation to accelerate the calculations. To mitigate feature drift from frame-to-frame optical flow, we create keyframes at regular intervals and refine long-tracked features between them. We use affine optical flow to compensate for patch deformations induced by possible large spatial transformations between keyframes. An analytical approach to computing the affine transformation is proposed. Experiments conducted on real-world data show that the proposed method achieves state-of-the-art performance at a much faster speed.
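The Mahalanobis gating mentioned above is the standard chi-square innovation test sketched below: a measurement is accepted only if its whitened residual norm falls under the quantile for the residual dimension. This is the textbook version, not the paper's accelerated implementation.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_gate(residual, S, confidence=0.95):
    """Accept a measurement iff  r^T S^{-1} r  is below the chi-square
    quantile for the residual dimension (standard innovation gating)."""
    d2 = residual @ np.linalg.solve(S, residual)
    return d2 < chi2.ppf(confidence, df=len(residual)), d2

r = np.array([0.8, -1.1])                       # reprojection residual (pixels), illustrative
S = np.array([[2.0, 0.1], [0.1, 2.5]])          # innovation covariance, illustrative
accept, d2 = mahalanobis_gate(r, S)
```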
|
|
15:20-15:25, Paper TuCT12.2 | Add to My Program |
Large-Scale UWB Anchor Calibration and One-Shot Localization Using Gaussian Process |
|
Yuan, Shenghai | Nanyang Technological University |
Lou, Boyang | Beijing University of Posts and Telecommunications |
Nguyen, Thien-Minh | Nanyang Technological University |
Yin, Pengyu | Nanyang Technological University |
Li, Jianping | Nanyang Technological University |
Xu, Xinhang | Nanyang Technological University |
Cao, Muqing | Carnegie Mellon University |
Xu, Jie | Harbin Institute of Technology |
Chen, Siyu | Nanyang Technological University |
Xie, Lihua | Nanyang Technological University |
Keywords: Range Sensing, Localization, Factory Automation
Abstract: Ultra-wideband (UWB) is gaining popularity with devices like AirTags for precise home item localization but faces significant challenges when scaled to large environments like seaports. The main challenges are calibration and localization in obstructed conditions, which are common in logistics environments. Traditional calibration methods, dependent on line-of-sight (LoS), are slow, costly, and unreliable in seaports and warehouses, making large-scale localization a significant pain point in the industry. To overcome these challenges, we propose a UWB-LiDAR fusion-based calibration and one-shot localization framework. Our method uses Gaussian Processes to estimate anchor positions from continuous-time LiDAR-inertial odometry with sampled UWB ranges. This approach ensures accurate and reliable calibration with just one round of sampling in large-scale areas, i.e., 600 x 450 m². Due to LoS issues, UWB-only localization can be problematic, even when anchor positions are known. We demonstrate that by applying a UWB-range filter, the search range for LiDAR loop closure descriptors is significantly reduced, improving both accuracy and speed. This concept can be applied to other loop closure detection methods, enabling cost-effective localization in large-scale warehouses and seaports. It significantly improves precision in challenging environments where UWB-only and LiDAR-inertial methods fall short. We will open-source our datasets and calibration codes for community use.
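To illustrate the geometry of anchor calibration from ranges, the sketch below runs a Gauss-Newton least-squares fit of a single anchor position given robot (tag) positions from an odometry trajectory and the corresponding UWB ranges. The paper's continuous-time, Gaussian-process formulation is richer; this is only a minimal stand-in with synthetic data.

```python
import numpy as np

def estimate_anchor(robot_positions, ranges, iters=20):
    """Gauss-Newton fit of a single UWB anchor position p minimizing
    sum_k (||x_k - p|| - r_k)^2, with x_k the tag positions along the
    trajectory and r_k the sampled UWB ranges."""
    p = robot_positions.mean(axis=0)             # crude initialization
    for _ in range(iters):
        diff = robot_positions - p
        dist = np.linalg.norm(diff, axis=1)
        res = dist - ranges                      # range residuals
        J = -diff / dist[:, None]                # Jacobian d(dist)/dp
        delta = np.linalg.lstsq(J, -res, rcond=None)[0]
        p = p + delta
    return p

rng = np.random.default_rng(0)
anchor = np.array([12.0, -4.0, 3.0])                               # synthetic ground truth
traj = rng.uniform([-20, -20, 0], [20, 20, 2], size=(200, 3))      # synthetic tag positions
meas = np.linalg.norm(traj - anchor, axis=1) + rng.normal(0, 0.1, 200)
print(estimate_anchor(traj, meas))                                 # close to the true anchor
```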
|
|
15:25-15:30, Paper TuCT12.3 | Add to My Program |
Online Identification of Skidding Modes with Interactive Multiple Model Estimation |
|
Salvi, Ameya | Clemson University |
Ala, Pardha Sai Krishna | Clemson University |
Smereka, Jonathon M. | U.S. Army TARDEC |
Brudnak, Mark | US Army DEVCOM-GVSC |
Gorsich, David | The U.S. Army Ground Vehicle Systems Center |
Schmid, Matthias | Clemson University |
Krovi, Venkat | Clemson University |
Keywords: Field Robots, Failure Detection and Recovery, Calibration and Identification
Abstract: Skid-steered wheel mobile robots (SSWMRs) operate in a variety of outdoor environments exhibiting motion behaviors dominated by the effects of complex wheel-ground interactions. Characterizing these interactions is crucial both from the immediate robot autonomy perspective (for motion prediction and control) as well as a long-term predictive maintenance and diagnostics perspective. An ideal solution entails capturing precise state measurements for decisions and controls, which is considerably difficult, especially in increasingly unstructured outdoor regimes of operations for these robots. In this milieu, a framework to identify pre-determined discrete modes of operation can considerably simplify the motion model identification process. To this end, we propose an interactive multiple model (IMM) based filtering framework to probabilistically identify predefined robot operation modes that could arise due to traversal in different terrains or loss of wheel traction.
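For readers unfamiliar with interactive multiple model (IMM) estimation, its core bookkeeping is the Markov mixing of mode probabilities followed by a likelihood-weighted update. The sketch below shows only that textbook mode-probability step, not the authors' full filter bank for skidding modes.

```python
import numpy as np

def imm_mode_probabilities(mu_prev, transition, likelihoods):
    """One IMM cycle of mode-probability bookkeeping.

    mu_prev     : (M,)   previous mode probabilities
    transition  : (M, M) Markov mode-transition matrix, rows sum to 1
    likelihoods : (M,)   measurement likelihood of each mode's filter
    Returns (mixing_weights, mu_new).
    """
    # Predicted probability of being in mode j before the measurement.
    c_j = transition.T @ mu_prev                                  # (M,)
    # Mixing weights w[i, j]: probability of having been in mode i given mode j now.
    mixing = (transition * mu_prev[:, None]) / c_j[None, :]
    # Posterior mode probabilities after incorporating the measurement.
    mu_new = likelihoods * c_j
    mu_new /= mu_new.sum()
    return mixing, mu_new
```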
|
|
15:30-15:35, Paper TuCT12.4 | Add to My Program |
RLCNet: A Novel Deep Feature-Matching-Based Method for Online Target-Free Radar-LiDAR Calibration |
|
Luan, Kai | Intelligence Science and Technology, National University of Defense Technology |
Shi, Chenghao | NUDT |
Chen, Xieyuanli | National University of Defense Technology |
Fan, Rui | Tongji University |
Zheng, Zhiqiang | National University of Defense Technology |
Lu, Huimin | National University of Defense Technology |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: While millimeter-wave radars are widely used in robotics and autonomous driving, extrinsic calibration with other sensors remains challenging due to the sparsity and uncertainty of radar point clouds. In this paper, we propose a novel deep feature-matching-based online extrinsic calibration approach for a 4D millimeter-wave radar and 3D LiDAR system. We formulate the calibration problem as a cross-modal point cloud registration task, initiating with keypoint-level matching followed by dense matching refinement. Efficient yet powerful neural networks are employed to extract prior keypoint matches, which are then expanded to surrounding regions, establishing dense point correspondences. Our approach effectively leverages the majority of the information from millimeter-wave radar, mitigating the impact of radar point cloud sparsity. We evaluate our approach on two datasets, and experimental results demonstrate that it outperforms state-of-the-art baseline methods and achieves an average improvement of 66.96% in calibration success rate, while reducing translational error and rotational error by 23.84% and 30.31%, respectively. Our implementation will be made open-source at https://github.com/nubot-nudt/RLCNet.
|
|
15:35-15:40, Paper TuCT12.5 | Add to My Program |
Universal Online Temporal Calibration for Optimization-Based Visual-Inertial Navigation Systems |
|
Fan, Yunfei | ByteDance Inc |
Zhao, Tianyu | Bytedance |
Guo, Linan | China University of Mining and Technology (Beijing) |
Chen, Chen | Bytedance Inc |
Wang, Xin | Bytedance |
Zhou, Fengyi | ByteDance Inc |
Keywords: Visual-Inertial SLAM, Localization, Sensor Fusion
Abstract: 6-Degree of Freedom (6DoF) motion estimation with a combination of visual and inertial sensors is a growing area with numerous real-world applications. However, precise calibration of the time offset between these two sensor types is a prerequisite for accurate and robust tracking. To address this, we propose a universal online temporal calibration strategy for optimization-based visual-inertial navigation systems. Technically, we incorporate the time offset as a state parameter in the optimization residual model to align the IMU state to the corresponding image timestamp using the time offset, angular velocity, and translational velocity. This allows the temporal misalignment to be optimized alongside other tracking states during the process. As our method only modifies the structure of the residual model, it can be applied to various optimization-based frameworks with different tracking frontends. We evaluate our calibration method with both EuRoC and simulation data, and extensive experiments demonstrate that our approach provides more accurate time offset estimation and faster convergence, particularly in the presence of noisy sensor data. The experimental code is available at https://github.com/bytedance/Ts_Online_Optimization.
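The key alignment idea (shifting the IMU state to the image timestamp by the unknown offset using the current angular and translational velocity) can be written as a first-order correction. The snippet below is a generic sketch of that alignment, not the paper's residual model; the small-angle rotation update and all names are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def align_state_to_image(p, R, v, omega, t_d):
    """First-order alignment of an IMU state to the image timestamp.

    p, v   : position and translational velocity in the world frame, shape (3,)
    R      : body-to-world rotation (scipy Rotation)
    omega  : body angular velocity, shape (3,)
    t_d    : estimated camera-IMU time offset in seconds
    """
    p_cam_time = p + v * t_d                                  # translate along current velocity
    R_cam_time = R * Rotation.from_rotvec(omega * t_d)        # small rotation about omega
    return p_cam_time, R_cam_time
```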
|
|
15:40-15:45, Paper TuCT12.6 | Add to My Program |
Multi-Camera Hand-Eye Calibration for Human-Robot Collaboration in Industrial Robotic Workcells |
|
Allegro, Davide | University of Padova |
Terreran, Matteo | University of Padova |
Ghidoni, Stefano | University of Padova |
Keywords: Calibration and Identification, Sensor Networks, Human-Robot Collaboration
Abstract: In industrial scenarios, effective human-robot collaboration relies on multi-camera systems to robustly monitor human operators despite the occlusions that typically show up in a robotic workcell. In this scenario, precise localization of the person in the robot coordinate system is essential, making the hand-eye calibration of the camera network critical. This process presents significant challenges when high calibration accuracy must be achieved in a short time to minimize production downtime, and when dealing with extensive camera networks used for monitoring wide areas, such as industrial robotic workcells. Our paper introduces an innovative and robust multi-camera hand-eye calibration method, designed to optimize each camera’s pose relative to both the robot’s base and to each other camera. This optimization integrates two types of key constraints: i) a single board-to-end-effector transformation, and ii) the relative camera-to-camera transformations. We demonstrate the superior performance of our method through comprehensive experiments employing the METRIC dataset and real-world data collected in industrial scenarios, showing notable advancements over state-of-the-art techniques even when using fewer than 10 images. Additionally, we release an open-source version of our multi-camera hand-eye calibration algorithm at https://github.com/davidea97/Multi-Camera-Hand-Eye-Calibration.git.
|
|
15:45-15:50, Paper TuCT12.7 | Add to My Program |
EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration |
|
Li, Xingchen | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Wang, Beibei | Hefei Comprehensive National Science Center |
Ren, Haojie | University of Science and Technology of China |
You, Guoliang | University of Science and Technology of China |
Sheng, Yu | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Calibration and Identification, Sensor Fusion, Multi-Modal Perception for HRI
Abstract: In multimodal perception systems, achieving precise extrinsic calibration between LiDAR and camera is of critical importance. However, the pre-calibrated extrinsic parameters may gradually drift during operation, leading to a decrease in the accuracy of the perception system. It is challenging to address this issue using methods based on artificial targets. In this article, we introduce an edge-based approach for automatic targetless calibration of LiDARs and cameras in real-world scenarios. The edge features, which are prevalent in various environments, are used to establish reliable correspondences in images and point clouds. Specifically, we leverage the Segment Anything Model to facilitate the extraction of stable and reliable image edge features. Then a multi-frame weighting strategy is used for feature filtering while alleviating the dependence on the environment. Finally, we estimate accurate extrinsic parameters based on edge correspondence constraints. Our method achieves a mean rotation error of 0.069° and a mean translation error of 1.037 cm on the KITTI dataset, outperforming existing edge-based calibration methods and demonstrating strong robustness, accuracy, and generalization capabilities.
|
|
TuCT13 Regular Session, 316 |
Add to My Program |
Radiance Fields for Manipulation |
|
|
Chair: Fazeli, Nima | University of Michigan |
Co-Chair: Shkurti, Florian | University of Toronto |
|
15:15-15:20, Paper TuCT13.1 | Add to My Program |
Gaussian Splatting Visual MPC for Granular Media Manipulation |
|
Tseng, Wei-Cheng | University of Toronto |
Zhang, Ellina | University of Toronto |
Jatavallabhula, Krishna Murthy | MIT |
Shkurti, Florian | University of Toronto |
Keywords: Manipulation Planning, AI-Based Methods, AI-Enabled Robotics
Abstract: Recent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice remains challenging due to the intricate physics of particle interactions, the high-dimensional and partially observable state, the inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation due to a lack of inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via Model-Predictive Control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and generalize to new environments in a zero-shot transfer. We also show significant prediction and manipulation performance improvements compared to existing granular media manipulation methods.
|
|
15:20-15:25, Paper TuCT13.2 | Add to My Program |
LE-Object: Language Embedded Object-Level Neural Radiance Fields for Open-Vocabulary Scene |
|
Wang, Mengting | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Wang, Xingshuo | Northeastern University |
Zhang, Zhiyao | Northeastern University |
Li, Zhiteng | Northeastern University |
Keywords: Semantic Scene Understanding, Deep Learning Methods, Mapping
Abstract: Recent advancements in Visual Language Models (VLMs) have significantly driven research in open-vocabulary 3D scene reconstruction, showcasing strong potential in open-set retrieval and semantic understanding. However, existing approaches face challenges in open-world environments: they either suffer from insufficient precision in semantic segmentation, leading to inadequate fine-grained scene understanding, or they are limited to object-level reconstruction, failing to capture intricate object details and lack applicability in open-world settings. To address these issues, we introduce LE-Object, an object-centric Neural Implicit Radiance Field (NeRF) method designed for open-world scenarios, aimed at achieving fine-grained scene understanding and high-fidelity object reconstruction. LE-Object integrates spatial features (SF) from object point clouds with visual features (VF) from VLMs to perform object association, ensuring spatiotemporal consistency in object mask segmentation, and extends VLM features from 2D images into 3D space, enabling precise open-world semantic inference and detailed object reconstruction. Experimental results demonstrate that LE-Object excels in zero-shot semantic segmentation and open-world object reconstruction, offering innovative solutions for global navigation and local object manipulation in open-world applications.
|
|
15:25-15:30, Paper TuCT13.3 | Add to My Program |
TranSplat: Surface Embedding-Guided 3D Gaussian Splatting for Transparent Object Manipulation |
|
Kim, Jeongyun | SNU |
Noh, Jeongho | Seoul National University |
Lee, DongGuw | Seoul National University (SNU) |
Kim, Ayoung | Seoul National University |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation
Abstract: Transparent object manipulation remains a significant challenge in robotics due to the difficulty of acquiring accurate and dense depth measurements. Conventional depth sensors often fail with transparent objects, resulting in incomplete or erroneous depth data. Existing depth completion methods struggle with interframe consistency and incorrectly model transparent objects as Lambertian surfaces, leading to poor depth reconstruction. To address these challenges, we propose TranSplat, a surface embedding-guided 3D Gaussian Splatting method tailored for transparent objects. TranSplat uses a latent diffusion model to generate surface embeddings that provide consistent and continuous representations, making it robust to changes in viewpoint and lighting. By integrating these surface embeddings with input RGB images, TranSplat effectively captures the complexities of transparent surfaces, enhancing the splatting of 3D Gaussians and improving depth completion. Evaluations on synthetic and real-world transparent object benchmarks, as well as robot grasping tasks, show that TranSplat achieves accurate and dense depth completion, demonstrating its effectiveness in practical applications. We open-source our synthetic dataset and model: https://github.com/jeongyun0609/TranSplat
|
|
15:30-15:35, Paper TuCT13.4 | Add to My Program |
NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection |
|
Fan, Qingyu | University of Chinese Academy of Sciences |
Cai, Yinghao | Institute of Automation, Chinese Academy of Sciences |
Li, Chao | Qiyuan Lab |
He, Wenzhe | Chongqing University |
Zheng, Xudong | Qiyuan Lab |
Lu, Tao | Institute of Automation, Chinese Academy of Sciences |
Liang, Bin | Qiyuan Lab |
Wang, Shuo | Chinese Academy of Sciences |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Grasping
Abstract: Robotic grasping in cluttered environments with diverse materials, including transparent and specular surfaces, poses challenges for conventional depth-sensing methods. We introduce NeuGrasp, a neural surface reconstruction method that leverages background priors for material-agnostic grasp detection. NeuGrasp integrates transformers and global prior volumes to aggregate multi-view features with spatial encoding, enabling robust surface reconstruction even in highly narrow and sparse viewing conditions. Our innovative use of background priors enhances focus on foreground objects via residual feature enhancement and refines spatial perception with an occupancy-prior volume, particularly for transparent and specular objects. Extensive experiments in both simulated and real-world settings show NeuGrasp significantly outperforms state-of-the-art methods in grasping while maintaining comparable reconstruction quality. Moreover, NeuGrasp-RA (Reality Augmentation), a fine-tuned variant with small-scale real-world data, demonstrates strong domain adaptation, proving its robustness in practical scenarios.
|
|
15:35-15:40, Paper TuCT13.5 | Add to My Program |
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting |
|
Strong, Matthew | University of Colorado Boulder |
Lei, Boshu | University of Pennsylvania |
Swann, Aiden | Stanford |
Jiang, Wen | University of Pennsylvania |
Daniilidis, Kostas | University of Pennsylvania |
Kennedy, Monroe | Stanford University |
Keywords: Perception for Grasping and Manipulation, Reactive and Sensor-Based Planning, Semantic Scene Understanding
Abstract: We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at https://arm.stanford.edu/next-best-sense.
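The underlying selection rule (pick the candidate view or touch pose whose rendered depth is most uncertain) can be summarized in a few lines. This is a simplified stand-in for the FisherRF information-gain criterion described above; the function names are hypothetical.

```python
import numpy as np

def next_best_view(candidate_views, render_depth_variance):
    """Pick the candidate whose rendered depth map is most uncertain.

    candidate_views        : list of candidate camera (or touch) poses
    render_depth_variance  : callable mapping a pose to a per-pixel depth-variance map
    """
    scores = [render_depth_variance(view).sum() for view in candidate_views]
    return candidate_views[int(np.argmax(scores))]
```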
|
|
15:40-15:45, Paper TuCT13.6 | Add to My Program |
Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects |
|
Yu, Justin | University of California Berkeley |
Hari, Kush | UC Berkeley |
El-Refai, Karim | University of California, Berkeley |
Dalal, Arnav | University of California - Berkeley |
Kerr, Justin | University of California, Berkeley |
Kim, Chung Min | University of California, Berkeley |
Cheng, Richard | California Institute of Technology |
Irshad, Muhammad Zubair | Toyota Research Institute |
Goldberg, Ken | UC Berkeley |
Keywords: Perception for Grasping and Manipulation, Visual Tracking, Visual Servoing
Abstract: Tracking and manipulating irregularly-shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats efficiently model object geometry, but lack persistent state estimation for task-oriented manipulation. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation that can be continuously updated to estimate the pose of scanned objects. POGS updates object states without requiring expensive rescanning or prior CAD models of objects. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to integrate depth estimates along with self-supervised vision encoder features for object pose estimation. POGS supports grasping, reorientation, and natural language-driven manipulation by refining object pose estimates, facilitating sequential object reset operations with human-induced object perturbations and tool servoing, where robots recover tool pose despite tool perturbations of up to 30°. POGS achieves up to 12 consecutive successful object resets and recovers from 80% of in-grasp tool perturbations.
|
|
15:45-15:50, Paper TuCT13.7 | Add to My Program |
Tactile Functasets: Neural Implicit Representations of Tactile Datasets |
|
Li, Sikai | University of Michigan |
Rodriguez, Samanta | University of Michigan - Ann Arbor |
Dou, Yiming | University of Michigan |
Owens, Andrew | University of Michigan |
Fazeli, Nima | University of Michigan |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Force and Tactile Sensing
Abstract: Modern incarnations of tactile sensors produce raw, high-dimensional data such as images, making it challenging to efficiently process and generalize across sensors. In this paper, we introduce a novel representation for tactile sensor feedback based on neural implicit functions. Rather than directly using raw tactile images, we propose neural implicit functions trained to reconstruct the tactile dataset, producing compact neural representations that capture the underlying structure of the sensory inputs. These representations offer several advantages over their raw counterparts: they are compact, enable probabilistically interpretable inference, and facilitate generalization across different sensors. We demonstrate the efficacy of this representation on the downstream task of in-hand object pose estimation, achieving improved performance over image-based methods while simplifying downstream models.
|
|
TuCT14 Regular Session, 402 |
Add to My Program |
Tracking and Prediction 2 |
|
|
Chair: Saska, Martin | Czech Technical University in Prague |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
15:15-15:20, Paper TuCT14.1 | Add to My Program |
I2D-Loc++: Camera Pose Tracking in LiDAR Maps with Multi-View Motion Flows |
|
Yu, Huai | Wuhan University |
Chen, Kuangyi | Graz University of Technology |
Yang, Wen | Wuhan University |
Scherer, Sebastian | Carnegie Mellon University |
Xia, Gui-Song | Wuhan University |
Keywords: Localization, SLAM
Abstract: Camera localization in LiDAR maps has become increasingly popular due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing approaches mostly focus on addressing the cross-modal 2D-3D gaps while overlooking the relationship between adjacent image frames, which results in fluctuations and unreliability of camera poses. To alleviate this, we introduce a novel camera pose tracking framework in LiDAR maps by coupling the 2D-3D correspondences with 2D-2D feature matching (I2D-Loc++), which establishes the multi-view geometric constraints to improve localization stability and trajectory smoothness. Specifically, the framework consists of a front-end hybrid flow estimation network and a non-linear least square pose optimization module. We further design a cross-modal consistency loss to integrate the multi-view motion flows for the network training and the back-end pose optimization. The pose tracking model is trained on the KITTI odometry dataset, and tested on the KITTI odometry, Argoverse, Waymo and Lyft5 datasets, which demonstrates that I2D-Loc++ has superior performance and good generalization ability in improving the accuracy and robustness of camera pose tracking. Our code, pre-trained models, and online demos are available at https://github.com/EasonChen99/2D3DPoseTracking
|
|
15:20-15:25, Paper TuCT14.2 | Add to My Program |
LoFSORT: Sample Online and Real-Time Tracking in Low Frame Rate Scenarios |
|
Wang, Jiabao | Korea Advanced Institute of Science & Technology |
Chang, Dong Eui | KAIST |
Keywords: Visual Tracking, Computer Vision for Automation
Abstract: We propose a novel motion-based tracker specifically designed for tracking multiple people in low frame rate scenarios. While previous studies have predominantly focused on scenarios with high frame rates (exceeding 10 frames per second), tracking in low frame rate conditions is significant for robotic platforms with limited computational resources. Our tracker optimizes the cost function, cascade structure and Kalman filter correction to better adapt to the characteristics of low frame rate environments. First, we enhance the cost function by incorporating stable variables through the introduction of height-based and displacement-based cost terms. Second, we prioritize handling occlusion among individuals during association, which reduces ambiguity in subsequent tracking processes. Third, we utilize the error-compensated observation to correct the Kalman filter, thereby improving tracking accuracy. Experimental results demonstrate that our proposed tracker, LoFSORT, outperforms other motion model-based trackers across various frame rate scenarios. Ablation studies further confirm that each component of our tracker enhances tracking performance in low frame rate scenarios.
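To make the cost-function idea concrete, a motion-based tracker typically builds a detection-to-track cost matrix from several terms and solves the assignment with the Hungarian algorithm. The sketch below loosely echoes the height-based and displacement-based terms mentioned in the abstract but is a generic example, not LoFSORT itself; weights and term definitions are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, w_iou=1.0, w_h=0.5, w_d=0.5):
    """Build a combined cost matrix and solve the detection-to-track assignment."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            h_t, h_d = t[3] - t[1], d[3] - d[1]
            height_cost = abs(h_t - h_d) / max(h_t, h_d)            # height-based term
            ct = np.array([(t[0] + t[2]) / 2, (t[1] + t[3]) / 2])
            cd = np.array([(d[0] + d[2]) / 2, (d[1] + d[3]) / 2])
            disp_cost = np.linalg.norm(ct - cd) / max(h_t, h_d)     # displacement-based term
            cost[i, j] = w_iou * (1 - iou(t, d)) + w_h * height_cost + w_d * disp_cost
    return linear_sum_assignment(cost)                              # (track_idx, det_idx) pairs
```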
|
|
15:25-15:30, Paper TuCT14.3 | Add to My Program |
Multirotor Target Tracking through Policy Iteration for Visual Servoing |
|
Aspragkathos, Sotiris | SingularLogic S.A |
Rousseas, Panagiotis | National Technical University of Athens |
Karras, George | University of Thessaly |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Visual Servoing, Visual Tracking, Optimization and Optimal Control
Abstract: This paper presents a novel vision-based approach for tracking deformable contour targets using Unmanned Aerial Vehicles (UAVs) by combining an image-moments descriptor with a Policy Iteration scheme, ensuring stability and generalization of knowledge to new tasks. This computationally efficient and optimal control scheme is suitable for diverse dynamic environments such as the surveillance and tracking of targets with evolving features. Because the control sequence is generated from an offline, successively approximated policy, the online optimization process is less challenging. The proposed methodology is validated through extensive simulations and real-world experiments on environmental target surveillance using an octorotor UAV.
|
|
15:30-15:35, Paper TuCT14.4 | Add to My Program |
BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data |
|
Huang, Kemiao | Southern University of Science and Technology |
Chen, Yinqi | Southern University of Science and Technology |
Zhang, Meiying | Southern University of Science and Technology |
Hao, Qi | Southern University of Science and Technology |
Keywords: Visual Tracking, Sensor Fusion
Abstract: Compared with real-time multi-object tracking (MOT), offline multi-object tracking (OMOT) has the advantages to perform 2D-3D detection fusion, erroneous link correction, and full track optimization but has to deal with the challenges from bounding box misalignment and track evaluation, editing, and refinement. This paper proposes BiTrack, a 3D OMOT framework that includes modules of 2D-3D detection fusion, initial trajectory generation, and bidirectional trajectory re-optimization to achieve optimal tracking results from camera-LiDAR data. The novelty of this paper is threefold: (1) development of a point-level object registration technique that employs a density-based similarity metric to achieve accurate fusion of 2D-3D detection results; (2) development of a set of data association and track management skills that utilizes a vertex-based similarity metric as well as false alarm rejection and track recovery mechanisms to generate reliable bidirectional object trajectories; (3) development of a trajectory re-optimization scheme that re-organizes track fragments of different fidelities in a greedy fashion, as well as refines each trajectory with completion and smoothing techniques. The experiment results on the KITTI dataset demonstrate that BiTrack achieves the state-of-the-art performance for 3D OMOT tasks in terms of accuracy and efficiency.
|
|
15:35-15:40, Paper TuCT14.5 | Add to My Program |
ConTrack3D: Contrastive Learning Contributes Concise 3D Multi-Object Tracking |
|
Du, Ruibin | Fudan University |
Ding, Ziheng | Fudan University |
Zhang, Xiaze | Fudan University |
Wang, Zhuoyao | Fudan University |
Cheng, Ying | Fudan University |
Feng, Rui | Fudan University |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Online object detection and tracking are crucial for embodied intelligence systems, including autonomous vehicles and robotics. Traditional approaches employ a pipeline structure to perform detection and tracking separately, which can not fully leverage the information from the detector. Moreover, most prior tracking methods rely on motion models such as constant velocity for state updates, which can lead to incorrect associations when the velocity estimates are inaccurate. To address these limitations, we propose ConTrack3D, an end-to-end framework that jointly performs detection and tracking in a fully online manner. Specifically, ConTrack3D incorporates a Joint Encoder module to capture detection embeddings and a Temporal Extender module for data-driven state updates. By employing contrastive learning, ConTrack3D learns discriminative tracking representations for more accurate associations. ConTrack3D is evaluated on the nuScenes benchmark, and the experimental results demonstrate its significant improvements in tracking performance.
|
|
15:40-15:45, Paper TuCT14.6 | Add to My Program |
LMH-MOT : A Light Multiple Hypothesis Framework for 3D Multi-Object Tracking |
|
Yuan, Tanghu | Tongji University |
Yang, Mengxiang | Northeastern University |
Keywords: Visual Tracking
Abstract: 3D multi-object tracking (3D MOT) is a key area in the field of autonomous driving. In tracking-by-detection systems, the detection results of deep learning models inevitably contain FP (false positives) and FN (false negatives), and the detector cannot always continuously and accurately detect targets in the presence of obstacle occlusion and sensor blind spots. The task of 3D MOT is to combine the discrete and disordered target detection results in time sequence into continuous and reliable tracks for use by downstream planning modules. At present, multi-target tracking algorithms in the field of autonomous driving are all based on a single hypothesis. In crowded scenarios, both false negatives (FN) and false positives (FP) increase significantly, making it difficult for single-hypothesis tracking algorithms to accurately output tracks. Towards this end, we propose LMH-MOT, a light multiple hypothesis framework for 3D MOT. Specifically, LMH-MOT effectively handles complex data association problems in autonomous driving scenarios by generating and maintaining multiple sets of hypotheses. Recognizing the possibility of switching between different motion states of the object, we use multiple motion models to more accurately estimate the motion state of the same object at the same time, and select the best estimation result for output. Additionally, we introduce a data association method based on decision trees, making full use of various features of the track and greatly reducing false matches and missing matches. To ensure the real-time performance of the entire algorithm framework, we also use Gibbs sampling to significantly reduce the computation time. On the NuScenes dataset, our proposed method achieves state-of-the-art performance with 76.2% AMOTA.
|
|
15:45-15:50, Paper TuCT14.7 | Add to My Program |
Towards Safe Mid-Air Drone Interception: Strategies for Tracking & Capture |
|
Pliska, Michal | Czech Technical University in Prague, Faculty of Electrical Engineering |
Vrba, Matous | Faculty of Electrical Engineering, Czech Technical University in Prague |
Baca, Tomas | Ceske Vysoke Uceni Technicke V Praze, FEL |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Perception and Autonomy, Reactive and Sensor-Based Planning, Field Robots
Abstract: A unique approach for the mid-air autonomous aerial interception of non-cooperating Unmanned Aerial Vehicles by a flying robot equipped with a net is presented in this paper. A novel interception guidance method dubbed Fast Response Proportional Navigation (FRPN) is proposed, designed to catch agile maneuvering targets while relying on onboard state estimation and tracking. The proposed method is compared with state-of-the-art approaches in simulations using 100 different trajectories of the target with varying complexity comprising almost 14 hours of flight data, and FRPN demonstrates the shortest response time and the highest number of interceptions, which are key parameters of agile interception. To enable robust transfer from theory and simulation to a real-world implementation, we aim to avoid overfitting to specific assumptions about the target and to tackle interception of a target following an unknown general trajectory. Furthermore, we identify several often overlooked problems related to tracking and estimation of the target's state that can have a significant influence on the overall performance of the system. We propose the use of a novel state estimation filter based on the Interacting Multiple Model filter and a new measurement model. Simulated experiments show that the proposed solution provides significant improvements in estimation accuracy over the commonly employed Kalman Filter approaches when considering general trajectories. Based on these results, we employ the proposed filtering and guidance methods to implement a complete autonomous interception system, which is thoroughly evaluated in realistic simulations and tested in real-world experiments with a maneuvering target, going far beyond the performance of any state-of-the-art solution.
|
|
TuCT15 Regular Session, 403 |
Add to My Program |
Robot Mapping 2 |
|
|
Chair: Ghaffari, Maani | University of Michigan |
Co-Chair: Guo, Yuliang | Bosch Research North America |
|
15:15-15:20, Paper TuCT15.1 | Add to My Program |
RISED: Accurate and Efficient RGB-Colorized Mapping Using Image Selection and Point Cloud Densification |
|
Jiang, Changjian | Zhejiang University |
Wang, Lijie | Zhejiang University |
Wan, Zeyu | Zhejiang University |
Gao, Ruilan | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Zhang, Yu | Zhejiang University |
Keywords: Mapping, SLAM, Sensor Fusion
Abstract: Recent advances in robotics have underscored the critical role of colorized point clouds in enhancing environmental perception accuracy. However, conventional multi-sensor fusion Simultaneous Localization and Mapping (SLAM) systems typically employ all available images indiscriminately for point cloud colorization, resulting in suboptimal outcomes with blurred textures. Notably, achieving precise texture-to-geometry alignment remains a challenge despite the availability of accurate pose estimation. This study introduces RISED, an advanced colorized mapping system that tackles this challenge from two perspectives: projection accuracy and distribution uniformity. For projection accuracy, we analyze the influence of camera poses on colorization and carefully select the optimal viewpoint to minimize errors. Regarding distribution uniformity, point cloud densification is applied to eliminate LiDAR scanning traces. Furthermore, a novel evaluation method is introduced to provide comprehensive assessment of colorized point clouds, filling a gap in this field. Experimental results show that our method outperforms traditional approaches in RGB-colorized mapping. Specifically, our method achieves notable improvements in projection accuracy (55.2%), geometric accuracy (63.1%), and surface coverage (30.8%).
|
|
15:20-15:25, Paper TuCT15.2 | Add to My Program |
Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting |
|
Wilson, Joseph | University of Michigan |
Almeida, Marcelino | Amazon Lab126 |
Mahajan, Sachit | Amazon |
Sun, Min | National Tsing Hua University |
Ghaffari, Maani | University of Michigan |
Ewen, Parker | University of Michigan |
Ghasemalizadeh, Omid | Amazon Lab126 |
Kuo, Cheng-Hao | Amazon |
Sen, Arnab | Amazon |
Keywords: Mapping, Probabilistic Inference, Deep Learning for Visual Perception
Abstract: In this paper, we present a novel algorithm for probabilistically updating and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although previous methods have introduced algorithms which learn to rasterize features in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning which presents a challenge for safety-critical robotic applications. To address this gap, we propose a method which advances the literature of continuous semantic mapping from voxels to ellipsoids, combining the precise structure of 3D-GS with the ability to quantify uncertainty of probabilistic robotic maps. Given a set of images, our algorithm performs a probabilistic semantic update directly on the 3D ellipsoids to obtain an expectation and variance through the use of conjugate priors. We also propose a probabilistic rasterization which returns per-pixel segmentation predictions with quantifiable uncertainty. We compare our method with similar probabilistic voxel-based methods to verify our extension to 3D ellipsoids, and perform ablation studies on uncertainty quantification and temporal smoothing.
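The conjugate-prior update referred to above is, in the standard categorical/Dirichlet case, simply an accumulation of per-class evidence followed by reading off the posterior mean and variance. Below is that textbook update in a few lines; it is not the paper's per-ellipsoid implementation.

```python
import numpy as np

def dirichlet_update(alpha, class_counts):
    """Conjugate update for a categorical semantic label with a Dirichlet prior.

    alpha        : (K,) current Dirichlet concentration parameters
    class_counts : (K,) new per-class evidence (e.g. soft segmentation scores)
    Returns (alpha_new, mean, variance) of the class probabilities.
    """
    alpha_new = alpha + class_counts
    a0 = alpha_new.sum()
    mean = alpha_new / a0
    variance = alpha_new * (a0 - alpha_new) / (a0**2 * (a0 + 1))
    return alpha_new, mean, variance
```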
|
|
15:25-15:30, Paper TuCT15.3 | Add to My Program |
OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving |
|
Shen, Yedong | University of Science & Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Zhang, Shiqi | USTC |
Li, Heng | University of Science and Technology of China |
Wu, Yilong | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Jin, Huiqing | National Center of Engineering and Technology for Vehicle Drivin |
Keywords: Mapping, Computer Vision for Automation
Abstract: Accurate and realistic 3D scene reconstruction enables the lifelike creation of autonomous driving simulation environments. With advancements in 3D Gaussian Splatting (3DGS), previous studies have applied it to reconstruct complex dynamic driving scenes. These methods typically require expensive LiDAR sensors and pre-annotated datasets of dynamic objects. To address these challenges, we propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images using Occupancy Prediction Network (ONet). Our method leverages the semantic information in OGs to separate dynamic vehicles from static street background, converting these grids into two distinct sets of initial point clouds for reconstructing both static and dynamic objects. Additionally, we estimate the trajectories and poses of dynamic objects through a learning-based approach, eliminating the need for complex manual annotations. Experiments on Waymo Open dataset demonstrate that OG-Gaussian is on par with the current state-of-the-art in terms of reconstruction quality and rendering speed, achieving an average PSNR of 35.13 and a rendering speed of 143 FPS, while significantly reducing computational costs and economic overhead.
|
|
15:30-15:35, Paper TuCT15.4 | Add to My Program |
SMART: Advancing Scalable Map Priors for Driving Topology Reasoning |
|
Ye, Junjie | University of Southern California |
Paz, David | University of California, San Diego |
Zhang, Hengyuan | University of California, San Diego |
Guo, Yuliang | Bosch Research North America |
Huang, Xinyu | Robert Bosch LLC |
Christensen, Henrik Iskov | UC San Diego |
Wang, Yue | USC |
Ren, Liu | Robert Bosch North America Research Technology Center |
Keywords: Mapping, Computer Vision for Transportation
Abstract: Topology reasoning is crucial for autonomous driving as it enables comprehensive understanding of connectivity and relationships between lanes and traffic elements. While recent approaches have shown success in perceiving driving topology using vehicle-mounted sensors, their scalability is hindered by the reliance on training data captured by consistent sensor configurations. We identify that the key factor in scalable lane perception and topology reasoning is the elimination of this sensor-dependent feature. To address this, we propose SMART, a scalable solution that leverages easily available standard-definition (SD) and satellite maps to learn a map prior model, supervised by large-scale geo-referenced high-definition (HD) maps independent of sensor settings. Attributing to scaled training, SMART alone achieves superior offline lane topology understanding using only SD and satellite inputs. Extensive experiments further demonstrate that SMART can be seamlessly integrated into any online topology reasoning method, yielding significant improvements by up to 28% on the OpenLane-V2 benchmark. Project page: https://jay-ye.github.io/smart.
|
|
15:35-15:40, Paper TuCT15.5 | Add to My Program |
DynORecon: Dynamic Object Reconstruction for Navigation |
|
Wang, Yiduo | University of Sydney |
Morris, Jesse | University of Sydney |
Wu, Lan | University of Technology Sydney |
Vidal-Calleja, Teresa A. | University of Technology Sydney |
Ila, Viorela | The University of Sydney |
Keywords: Mapping, Vision-Based Navigation, Motion and Path Planning
Abstract: This paper presents DynORecon, a Dynamic Object Reconstruction system that leverages the information provided by Dynamic SLAM to simultaneously generate a volumetric map of observed moving entities while estimating free space to support navigation. By capitalising on the motion estimations provided by Dynamic SLAM, DynORecon continuously refines the representation of dynamic objects to eliminate residual artefacts from past observations and incrementally reconstructs each object, seamlessly integrating new observations to capture previously unseen structures. Our system is highly efficient (∼20 FPS) and produces accurate (∼10 cm) object reconstructions using simulated and real-world outdoor datasets.
|
|
15:40-15:45, Paper TuCT15.6 | Add to My Program |
Ephemerality Meets LiDAR-Based Lifelong Mapping |
|
Gil, Hyeonjae | SNU |
Lee, Dongjae | Seoul National University |
Kim, Giseop | DGIST (Daegu Gyeongbuk Institute of Science and Technology) |
Kim, Ayoung | Seoul National University |
Keywords: Mapping, SLAM, Range Sensing
Abstract: Lifelong mapping is crucial for the long-term deployment of robots in dynamic environments. In this paper, we present ELite, an ephemerality-aided LiDAR-based lifelong mapping framework which can seamlessly align multiple session data, remove dynamic objects, and update maps in an end-to-end fashion. Map elements are typically classified as static or dynamic, but cases like parked cars indicate the need for more detailed categories than binary. Central to our approach is the probabilistic modeling of the world into two-stage ephemerality, which represents the transiency of points in the map over two different time scales. By leveraging the spatiotemporal context encoded in ephemeralities, ELite can accurately infer transient map elements, maintain a reliable up-to-date static map, and improve robustness in aligning the new data in a more fine-grained manner. Extensive real-world experiments on long-term datasets demonstrate the robustness and effectiveness of our system. The source code is publicly available for the robotics community: https://github.com/dongjae0107/ELite.
|
|
15:45-15:50, Paper TuCT15.7 | Add to My Program |
Addressing Diverging Training Costs Using BEVRestore for High-Resolution Bird's Eye View Map Construction |
|
Kim, Minsu | KAIST |
Kim, Giseop | DGIST (Daegu Gyeongbuk Institute of Science and Technology) |
Choi, Sunwook | NAVER LABS Corp |
Keywords: Sensor Fusion, Mapping, Range Sensing
Abstract: Recent advancements in Bird’s Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause significant increases in costs including GPU memory consumption and computing latency, termed the diverging training costs issue. Affected by the problem, most existing methods adopt low-resolution (LR) BEV and struggle to estimate the precise locations of urban scene components like road lanes and sidewalks. As this imprecision leads to risky motion planning, e.g., in collision avoidance, the diverging training costs issue has to be resolved. In this paper, we address the issue with our novel BEVRestore mechanism. Specifically, our proposed model encodes the features of each sensor to LR BEV space and restores them to HR space to establish a memory-efficient map constructor. To this end, we introduce the BEV restoration strategy, which repairs aliasing and blocky artifacts in the up-scaled BEV features, and narrows down the width of the labels. Our extensive experiments show that the proposed mechanism provides a plug-and-play, memory-efficient pipeline, enabling HR map construction with a broad BEV scope. Our code is available at https://github.com/minshu-kim/BEVRestore.
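The encode-in-LR-then-restore-to-HR pattern the abstract describes can be illustrated with a toy module: process features cheaply at low BEV resolution, upsample, and apply a small restoration layer to suppress upsampling artifacts. This is an illustrative sketch, not the BEVRestore architecture; layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRThenRestore(nn.Module):
    """Toy illustration: encode in low-resolution BEV, then restore to high resolution."""

    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.scale = scale
        self.encoder = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.restore = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, bev_lr):
        feat = F.relu(self.encoder(bev_lr))                      # cheap processing in LR space
        feat = F.interpolate(feat, scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        return self.restore(feat)                                # smooth out upsampling artifacts
```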
|
|
TuCT16 Regular Session, 404 |
Add to My Program |
Manipulation 3 |
|
|
Chair: Bhirangi, Raunaq Mahesh | New York University |
Co-Chair: Desingh, Karthik | University of Minnesota |
|
15:15-15:20, Paper TuCT16.1 | Add to My Program |
Smaller and Faster Robotic Grasp Detection Model Via Knowledge Distillation and Unequal Feature Encoding |
|
Nie, Hong | Shanxi University |
Zhao, Zhou | Central China Normal University |
Chen, Lu | Shanxi University |
Lu, Zhenyu | South China University of Technology |
Li, Zhuomao | Shanxi University |
Yang, Jing | Shanxi University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: In order to achieve higher accuracy, the complexity of grasp detection networks increases accordingly, with complicated model structures and tremendous numbers of parameters. Although various light-weight strategies have been adopted, directly designing a compact network can be sub-optimal, making it difficult to strike a balance between accuracy and model size. To solve this problem, we explore a more efficient grasp detection model from two aspects: elaborately designing a light-weight network and performing knowledge distillation on the designed network. Specifically, based on the designed light-weight backbone, the features from RGB and D images with unequal effective grasping information rates are fully utilized, and information compensation strategies are adopted to make the model small enough while maintaining its accuracy. Then, the grasping features contained in the large teacher model are adaptively and effectively learned by our proposed method via knowledge distillation. Experimental results indicate that the proposed method achieves performance comparable to more complicated models (98.9%, 93.1%, 82.3%, and 90.0% on the Cornell, Jacquard, GraspNet, and MultiObj datasets, respectively) while reducing the parameters from MBs to KBs. A real-world robotic grasping experiment on an embedded AI computing device also proves the effectiveness of this approach.
|
|
15:20-15:25, Paper TuCT16.2 | Add to My Program |
ViViDex: Learning Vision-Based Dexterous Manipulation from Human Videos |
|
Chen, Zerui | ENS Paris, France |
Chen, Shizhe | Inria |
Arlaud, Etienne | INRIA |
Laptev, Ivan | INRIA |
Schmid, Cordelia | Inria |
Keywords: Dexterous Manipulation, Learning from Demonstration, Sensor-based Control
Abstract: In this work, we aim to learn a unified vision-based policy for multi-fingered robot hands to manipulate a variety of objects in diverse poses. Though prior work has shown benefits of using human videos for policy learning, performance gains have been limited by the noise in estimated trajectories. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then rollout successful episodes from state-based policies and train a unified visual policy without using any privileged information. We propose coordinate transformation to further enhance the visual point cloud representation, and compare behavior cloning and diffusion policy for the visual policy training. Experiments both in simulation and on the real robot demonstrate that ViViDex outperforms state-of-the-art approaches on three dexterous manipulation tasks.
|
|
15:25-15:30, Paper TuCT16.3 | Add to My Program |
HuDOR: Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards |
|
Guzey, Irmak | New York University |
Dai, Yinlong | NYU |
Savva, Georgy | New York University |
Bhirangi, Raunaq Mahesh | New York University |
Pinto, Lerrel | New York University |
Keywords: Dexterous Manipulation, Imitation Learning, Reinforcement Learning
Abstract: Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks without teleoperation remains a difficult problem for multi-fingered robot hands. A key reason for this difficulty is that a policy trained on human hands may not directly transfer to a robot hand with a different morphology. In this work, we present HuDOR, a technique that enables online finetuning of the policy by constructing a reward function from the human video. Importantly, this reward function is built using object-oriented rewards derived from off-the-shelf point trackers, which allows for meaningful learning signals even when the robot hand is in the visual observation, while the human hand is used to construct the reward. Given a single video of a human solving a task, such as gently opening a music box, HuDOR allows our four-fingered Allegro hand to learn this task with just 30 minutes of online interaction. Our experiments across four tasks show that HuDOR outperforms alternatives with an average of 4x improvement. Code and videos are available on our website: https://object-rewards.github.io/.
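One simple way to realize an object-oriented reward from off-the-shelf point trackers is to penalize the distance between the object's tracked points in the robot rollout and in the human reference video. The snippet below is a simplified stand-in for that idea, not HuDOR's actual reward; the alignment of trajectories and all names are assumptions.

```python
import numpy as np

def object_track_reward(robot_tracks, human_tracks):
    """Per-timestep reward from object point tracks.

    robot_tracks, human_tracks : (T, N, 2) pixel trajectories of N tracked object points,
                                 assumed to be time-aligned between rollout and reference.
    Returns a (T,) array of rewards (negative mean point-to-point distance per frame).
    """
    per_frame = np.linalg.norm(robot_tracks - human_tracks, axis=-1).mean(axis=-1)
    return -per_frame
```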
|
|
15:30-15:35, Paper TuCT16.4 | Add to My Program |
Hand-Object Interaction Pre-Training from Videos |
|
Singh, Himanshu Gaurav | University of California Berkeley |
Loquercio, Antonio | University of Pennsylvania |
Sferrazza, Carmelo | UC Berkeley |
Wu, Jane | University of California, Berkeley |
Qi, Haozhi | UC Berkeley |
Abbeel, Pieter | UC Berkeley |
Malik, Jitendra | UC Berkeley |
Keywords: Representation Learning, Learning from Demonstration, Dexterous Manipulation
Abstract: We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to alternate approaches.
|
|
15:35-15:40, Paper TuCT16.5 | Add to My Program |
SuperQ-GRASP: Superquadrics-Based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation |
|
Tu, Xun | University of Minnesota, Twin Cities |
Desingh, Karthik | University of Minnesota |
Keywords: Perception for Grasping and Manipulation, Grasping, RGB-D Perception
Abstract: Grasp planning and estimation have been a long-standing research problem in robotics, with two main approaches to find graspable poses on the objects: 1) the geometric approach, which relies on 3D models of objects and the gripper to estimate valid grasp poses, and 2) the data-driven, learning-based approach, with models trained to identify grasp poses from raw sensor observations. The latter assumes comprehensive geometric coverage during the training phase. However, the data-driven approach is typically biased toward tabletop scenarios and struggles to generalize to out-of-distribution scenarios with larger objects (e.g. a chair). Additionally, raw sensor data (e.g. RGB-D data) from a single view of these larger objects is often incomplete and necessitates additional observations. In this paper, we take a geometric approach, leveraging advancements in object modeling (e.g. NeRF) to build an implicit model by taking RGB images from views around the target object. This model enables the extraction of an explicit mesh model while also capturing the visual appearance from novel viewpoints that is useful for perception tasks like object detection and pose estimation. We further decompose the NeRF-reconstructed 3D mesh into superquadrics (SQs) - parametric geometric primitives, each mapped to a set of precomputed grasp poses, allowing grasp composition on the target object based on these primitives. Our proposed pipeline addresses: a) noisy depth and incomplete views of the object, via the modeling step, and b) generalization to objects of any size. For more qualitative results, refer to the supplementary video and webpage https://bit.ly/3ZrOanU.
|
|
15:40-15:45, Paper TuCT16.6 | Add to My Program |
Collaborative Motion Planning for Multi-Manipulator Systems through Reinforcement Learning and Dynamic Movement Primitives |
|
Singh, Siddharth | University of Virginia |
Xu, Tian | University of Virginia |
Chang, Qing | University of Virginia |
Keywords: Dual Arm Manipulation, Multi-Robot Systems, Manipulation Planning
Abstract: Robotic tasks often require multiple manipulators to enhance task efficiency and speed, but this increases complexity in terms of collaboration, collision avoidance, and the expanded state-action space. To address these challenges, we propose a multi-level approach combining Reinforcement Learning (RL) and Dynamic Movement Primitives (DMP) to generate adaptive, real-time trajectories for new tasks in dynamic environments using a demonstration library. This method ensures collision-free trajectory generation and efficient collaborative motion planning. We validate the approach through experiments in the PyBullet simulation environment with UR5e robotic manipulators.
|
|
TuCT17 Regular Session, 405 |
Add to My Program |
Soft Actuators 1 |
|
|
Chair: Blumenschein, Laura | Purdue University |
Co-Chair: La, Hung | University of Nevada at Reno |
|
15:15-15:20, Paper TuCT17.1 | Add to My Program |
Soft Robot Employing a Series of Pneumatic Actuators and Distributed Balloons: Modeling, Evaluation, and Applications |
|
Ho, Van | Japan Advanced Institute of Science and Technology |
Nguyen, Tuan | Japan Advanced Institute of Science and Technology |
Nguyen, Dinh | VNU University of Engineering and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Mechanism Design, Modeling, Control, and Learning for Soft Robots
Abstract: Tasks involving exploration and inspection of narrow environments demand a robot to have a flexible body. Such a robot is especially preferred if the integrity of its surroundings is crucial, as in endoscopy procedures. We propose the design of a small, self-propelled soft robot that can operate in a constrained environment. By periodic activation of a series of pneumatic actuators fabricated using a casting technique, sinusoidal locomotion is achieved. The wave-like locomotive strategy with an additional support mechanism enabled movement in multiple scenarios, including traveling horizontally and vertically in environments of different characteristics. Two analytical models are presented to highlight the design characteristics. The first predicts the velocity of the robot in relation to the working conditions, while the second calculates the force that the robot body exerts on its surroundings. Its mobility was tested in simple and complex routes under rigid and elastic environments. The resulting percent errors for the predictions of velocity and lateral force are 7.89% and 16.86%, respectively. In terms of performance, the robot can move horizontally in rigid tubes even if
|
|
15:20-15:25, Paper TuCT17.2 | Add to My Program |
Compliance Control with Dynamic and Self-Sensing Hydraulic Artificial Muscles for Wearable Assistive Devices |
|
Bibhu, Sharma | UNSW Sydney |
Emanuele, Nicotra | UNSW Sydney |
Davies, James J. | University of New South Wales |
Nguyen, Chi Cong | University of New South Wales |
Phan, Phuoc Thien | University of New South Wales |
Ji, Adrienne | University of New South Wales |
Zhu, Kefan | UNSW Sydney |
Wan, Jingjing | University of New South Wales |
Ngo, Trung Dung | University of Prince Edward Island |
La, Hung | University of Nevada at Reno |
Ho, Van | Japan Advanced Institute of Science and Technology |
Lovell, Nigel Hamilton | University of New South Wales |
Do, Thanh Nho | University of New South Wales |
Keywords: Soft Robot Applications, Physically Assistive Devices, Wearable Robotics
Abstract: While wearable robots that utilize intrinsically soft materials for actuation offer enhanced safety and biological compatibility, the challenges of sensing and control significantly affect their performance. The control problem in such systems is inherently complex, and the inclusion of ‘softness’ introduces additional nonlinearities, hysteresis, and uncertainties. Furthermore, the effectiveness of control strategies is highly dependent on sensor selection and integration, which presents its own challenges. Most robotic systems require separate sensors for control purposes. In this study, a new sensing and control scheme is introduced for soft wearable robots, leveraging the intrinsic soft-sensing capability of fluidic filament actuators without adding computational complexity. This method enables simultaneous sensing and actuation with 96% position accuracy, even under physical disturbances. The approach is demonstrated with a soft assistive device for elbow flexion/extension, achieving 70.5% tracking accuracy and a 0.09 s response delay to human intention, ensuring the system provides minimal resistance when assistance is not needed, while delivering the required support when necessary.
|
|
15:25-15:30, Paper TuCT17.3 | Add to My Program |
Braided Artificial Muscle with Programmable Body Morphing and Its Application to Elbow Joint Flexion |
|
Wu, Changchun | The University of Hong Kong |
Liu, Hao | The University of Hong Kong |
Lin, Senyuan | The University of Hong Kong |
Yuan, Wenbo | The University of Hong Kong |
Li, Yunquan | South China University of Technology |
Lam, James | University of Hong Kong |
Xi, Ning | The University of Hong Kong |
Chen, Yonghua | The University of Hong Kong |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: For pneumatic artificial muscles (PAMs), a larger maximum contraction ratio is usually considered better. For human joint assistance applications, however, PAMs with a configurable maximum contraction ratio are more suitable because of their advantages in safety and adaptability. This work proposes a PAM based on a planar-to-specific-wave body shape morph. Shape-morphing-based braided artificial muscles (SBAMs) offer programmability of initial elasticity and maximum contraction ratio, which suits the requirements of human joint assistance applications. The basic structure and contraction mechanism of SBAMs are explained, and their mathematical model is established. According to the experimental results, a SBAM prototype generates a force of more than 140 times its weight under an easily accessible pressure of 150 kPa. A mannequin wearing the SBAM can actively flex its elbow by over 120°.
|
|
15:30-15:35, Paper TuCT17.4 | Add to My Program |
Physics-Informed Hybrid Modeling of Pneumatic Artificial Muscles |
|
Wang, Genmeng | Institut National Des Sciences Appliquees De Lyon |
Chalard, Rémi | Université D'Évry Paris-Saclay |
Jenny Alexandra, Cifuentes | Comillas Pontifical University |
Pham, Minh Tu | INSA Lyon (Institut National Des Sciences Appliquees) |
Keywords: Modeling, Control, and Learning for Soft Robots, Model Learning for Control, Calibration and Identification
Abstract: Pneumatic Artificial Muscles (PAMs) are complex nonlinear systems characterized by hysteresis, making them challenging to model with classical system identification methods. While deep learning has emerged as a powerful tool for modeling nonlinear systems from data, purely neural network-based models often lack interpretability and are prone to overfitting. To address these challenges, this study explores several hybrid approaches that combine analytical models with neural networks to model PAM behavior more effectively. The results demonstrate that hybrid models significantly outperform both purely analytical and black-box neural network models, particularly in terms of generalization and dynamic accuracy. Among the approaches, the Physics-Informed Neural Network (PINN) unsupervised model shows the most robust performance, capturing complex PAM dynamics while maintaining computational efficiency. These findings suggest that hybrid modeling is a promising and scalable solution for accurately representing the intricate behavior of PAMs.
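As a rough illustration of one hybrid-modeling variant in the spirit of the abstract above (a neural network learning the residual on top of an analytical model), the sketch below combines a placeholder quasi-static PAM force expression with a learned correction; the analytical term, class name, and input dimensions are assumptions, not the paper's actual models.

```python
import torch
import torch.nn as nn

class HybridPAM(nn.Module):
    """Analytical PAM force model plus a learned residual correction."""
    def __init__(self):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

    def analytical_force(self, pressure, contraction):
        # Placeholder quasi-static model: force rises with pressure and
        # falls with contraction ratio (illustrative, not the paper's model).
        return pressure * (1.0 - 3.0 * contraction ** 2)

    def forward(self, pressure, contraction):
        x = torch.stack([pressure, contraction], dim=-1)
        return self.analytical_force(pressure, contraction) + self.residual(x).squeeze(-1)

# Training would fit only the residual to measured force data, keeping the
# analytical term as a physics prior that constrains extrapolation.
```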
|
|
15:35-15:40, Paper TuCT17.5 | Add to My Program |
Anisotropic Stiffness and Programmable Actuation for Soft Robots Enabled by an Inflated Rotational Joint |
|
Wang, Sicheng | Purdue University |
Frias-Miranda, Eugenio | Purdue University |
Alvarez Valdivia, Antonio | Purdue University |
Blumenschein, Laura | Purdue University |
Keywords: Soft Robot Materials and Design, Mechanism Design, Modeling, Control, and Learning for Soft Robots
Abstract: Soft robots are known for their ability to perform tasks with great adaptability, enabled by their distributed, non-uniform stiffness and actuation. Bending is the most fundamental motion in soft robot design, but creating robust, easy-to-fabricate soft bending joints with tunable properties remains an active research problem. In this work, we demonstrate an inflatable actuation module for soft robots with a defined bending plane enabled by forced partial wrinkling. This lowers the structural stiffness in the bending direction, with the final stiffness easily designed by the ratio of wrinkled and unwrinkled regions. We present models and experimental characterization showing the stiffness properties of the actuation module, as well as its ability to maintain the kinematic constraint over a large range of loading conditions. We demonstrate the potential for complex actuation in a soft continuum robot and for decoupling actuation force and efficiency from load capacity. The module provides a novel method for embedding intelligent actuation into soft pneumatic robots.
|
|
15:40-15:45, Paper TuCT17.6 | Add to My Program |
Enhancement of Thin McKibben Muscle Durability under Repetitive Actuation in a Bent State |
|
Kobayashi, Ryota | Tokyo Institute of Technology |
Nabae, Hiroyuki | Institute of Science Tokyo |
Mao, Zebing | Yamaguchi University |
Endo, Gen | Institute of Science Tokyo |
Suzumori, Koichi | Tokyo Institute of Technology |
Keywords: Soft Sensors and Actuators, Soft Robot Applications
Abstract: The McKibben muscle can produce a high force-to-mass ratio, beneficial for various applications in the soft mechatronics field. The thin McKibben muscle, which has a small diameter, offers a high force-to-mass ratio and sufficient flexibility for use in a bent state. This flexibility permits the realization of flexible mechatronics. However, the thin McKibben muscle is easily broken in a bent state, while it is very durable in a straight state. Over repetitive operations, the fibers within the sleeve gradually shift, causing the rubber tube inside to protrude and ultimately leading to cracking. This study investigates improvements in the durability of artificial muscles using adhesives to prevent this fiber-to-fiber misalignment. The durability test showed that the adhesive could provide durability up to 10,000 times greater than that of a normal artificial muscle in the best case. Using thin McKibben muscles with the proposed method, tensegrity modules were fabricated. The durability test revealed a 500-fold increase under an applied pressure of 0.5 MPa. Furthermore, the durability of the adhesive-applied artificial muscles was also confirmed to be enhanced during the dynamic movements of a soft tensegrity robot that throws a ball at 0.7 MPa.
|
|
TuCT18 Regular Session, 406 |
Add to My Program |
Intelligent Transportation Systems |
|
|
Chair: Li, Jiachen | University of California, Riverside |
Co-Chair: Likhachev, Maxim | Carnegie Mellon University |
|
15:15-15:20, Paper TuCT18.1 | Add to My Program |
Camera-Based Online Vectorized HD Map Construction with Incomplete Observation |
|
Liu, Hui | Shandong University |
Chang, Faliang | Shandong University |
Liu, Chunsheng | Shandong University |
Lu, Yansha | Shandong University |
Liu, Minhang | Shandong University |
Keywords: Intelligent Transportation Systems, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Camera-based online map construction focuses on learning map elements from surround-view images. In contrast to previous methods that rely on complete observations, we explore a new map construction problem under incomplete observations, where one or more perspectives of the surround view are missing due to camera damage or occlusion. Incomplete observations lead to inferior performance and may even result in failure. Map construction based on incomplete observations faces two challenges: supplementing missing perspective features and reducing the complexity of high-dimensional feature learning. To address these issues, we propose a novel Panoramic Observation Prior Network (POP-Net). Firstly, based on an observation switch training mechanism, we propose a Panoramic Learning Module (PL-Module). It establishes a learnable panoramic feature space, facilitating the extraction of panoramic features from incomplete observations and thus supplementing missing perspective features. Secondly, based on a feature decomposition mechanism, we design a Panoramic Decomposition-Aggregation Operation (PDA-Operation), which decomposes high-dimensional panoramic features into low-dimensional local scene features. This allows limited local scene features to represent diverse panoramic features, alleviating the computational and memory burdens of high-dimensional feature learning. Experimental results demonstrate that our method surpasses existing approaches under incomplete observation scenarios.
|
|
15:20-15:25, Paper TuCT18.2 | Add to My Program |
Online Aggregation of Trajectory Predictors |
|
Tong, Alex | Harvard University |
Sharma, Apoorva | NVIDIA |
Veer, Sushant | NVIDIA |
Pavone, Marco | Stanford University |
Yang, Heng | Harvard University |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Continual Learning
Abstract: Trajectory prediction, the task of forecasting future agent behavior from past data, is central to safe and efficient autonomous driving. A diverse set of methods (e.g., rule-based or learned with different architectures and datasets) have been proposed, yet it is often the case that the performance of these methods is sensitive to the deployment environment (e.g., how well the design rules model the environment, or how accurately the test data match the training data). Building upon the principled theory of online convex optimization, but also going beyond convexity and stationarity, we present a lightweight and model-agnostic method to aggregate different trajectory predictors online. We propose to treat each single trajectory predictor as an “expert” and maintain a probability vector to mix the outputs of different experts. Then, the key technical approach lies in leveraging online data (the true agent behavior revealed at the next time step) to form a convex-or-nonconvex, stationary-or-dynamic loss function whose gradient steers the probability vector towards choosing the best mixture of experts. We instantiate this method to aggregate trajectory predictors trained on different cities in the nuScenes dataset and show that it performs just as well, if not better than, any singular model, even when deployed on the Lyft dataset.
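To make the mixture update above concrete, here is a minimal sketch of one online aggregation step using an exponentiated-gradient (multiplicative-weights) update on the expert probability vector; the loss definition, array shapes, and step size are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def aggregate_step(weights, expert_preds, observed, eta=0.5):
    """One online update of the mixture over trajectory predictors.

    weights      : (K,) probability vector over experts
    expert_preds : (K, H, 2) predicted future positions per expert
    observed     : (H, 2) agent positions revealed at the next step
    """
    losses = np.linalg.norm(expert_preds - observed, axis=-1).mean(axis=-1)  # (K,)
    weights = weights * np.exp(-eta * losses)       # exponentiated-gradient step
    weights /= weights.sum()
    mixed_pred = np.einsum('k,khd->hd', weights, expert_preds)
    return weights, mixed_pred
```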
|
|
15:25-15:30, Paper TuCT18.3 | Add to My Program |
Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-Tuning |
|
Huang, Zhiyu | Nanyang Technological University |
Weng, Xinshuo | NVIDIA Corporation |
Igl, Maximilian | Waymo LLC |
Chen, Yuxiao | Nvidia Research |
Cao, Yulong | NVIDIA |
Ivanovic, Boris | NVIDIA |
Pavone, Marco | Stanford University |
Lv, Chen | Nanyang Technological University |
Keywords: Intelligent Transportation Systems, Autonomous Agents, AI-Based Methods
Abstract: Autonomous driving necessitates the ability to reason about future interactions between traffic agents and to make informed evaluations for planning. This paper introduces the Gen-Drive framework, which shifts from the traditional prediction and deterministic planning framework to a generation-then-evaluation planning paradigm. The framework employs a behavior diffusion model as a scene generator to produce diverse possible future scenarios, thereby enhancing the capability for joint interaction reasoning. To facilitate decision-making, we propose a scene evaluator (reward) model, trained with pairwise preference data collected through VLM assistance, thereby reducing human workload and enhancing scalability. Furthermore, we utilize an RL fine-tuning framework to improve the generation quality of the diffusion model, rendering it more effective for planning tasks. We conduct training and closed-loop planning tests on the nuPlan dataset, and the results demonstrate that employing such a generation-then-evaluation strategy outperforms other learning-based approaches. Additionally, the fine-tuned generative driving policy shows significant enhancements in planning performance. We further demonstrate that utilizing our learned reward model for evaluation or RL fine-tuning leads to better planning performance compared to relying on human-designed rewards. Project website: https://mczhi.github.io/GenDrive.
|
|
15:30-15:35, Paper TuCT18.4 | Add to My Program |
Optimizing Efficiency of Mixed Traffic through Reinforcement Learning: A Topology-Independent Approach and Benchmark |
|
Xiao, Chuyang | ShanghaiTech University |
Wang, Dawei | The University of Hong Kong |
Tang, Xinzheng | The University of Hong Kong |
Pan, Jia | University of Hong Kong |
Ma, Yuexin | ShanghaiTech University |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Multi-Robot Systems
Abstract: This paper presents a mixed traffic control policy designed to optimize traffic efficiency across diverse road topologies, addressing issues of congestion prevalent in urban environments. A model-free reinforcement learning (RL) approach is developed to manage large-scale traffic flow, using data collected by autonomous vehicles to influence human-driven vehicles. A real-world mixed traffic control benchmark is also released, which includes 444 scenarios from 20 countries, representing a wide geographic distribution and covering a variety of scenarios and road topologies. This benchmark serves as a foundation for future research, providing a realistic simulation environment for the development of effective policies. Comprehensive experiments demonstrate the effectiveness and adaptability of the proposed method, achieving better performance than existing traffic control methods in both intersection and roundabout scenarios. To the best of our knowledge, this is the first project to introduce a real-world, complex-scenario mixed traffic control benchmark. Videos and code of our work are available at https://sites.google.com/berkeley.edu/mixedtrafficplus/home.
|
|
15:35-15:40, Paper TuCT18.5 | Add to My Program |
Internal-Stably Energy-Saving Cooperative Control of Articulated Wheeled Robot with Distributed Drive Units |
|
Yang, Yi | Beijing Institute of Technology |
Peng, Huishuai | Beijing Institute of Technology |
Hu, Zhexi | Beijing Institute of Technology |
Li, Haoyu | Beijing Institute of Technology |
Xie, Shanshan | Beijing Institute of Technology |
Keywords: Intelligent Transportation Systems, Motion Control, Wheeled Robots
Abstract: Articulated wheeled robots play a crucial role in the logistics industry. However, conventional tractor-driven articulated wheeled robots exhibit poor internal stability and are prone to jackknifing, while also consuming a significant amount of energy. By deploying distributed drives and coordinating control among multiple drives, these issues can be effectively addressed. However, the flexible connections between the bodies of articulated vehicles pose significant challenges to the coordinated control of distributed drives. This paper proposes a multi-drive unit coordinated control algorithm based on driving force equivalence and allocation. A neural network is used to predict the driving force, and through nonlinear driving force equivalence, a feedforward driving force is obtained. This is combined with a closed-loop feedback compensation controller to form a control architecture that integrates feedforward and feedback, resulting in the equivalent total driving force for the vehicle queue. Subsequently, an equivalent distribution strategy allocates the required driving force to each drive, enabling the vehicle bodies to achieve accurate and stable speed tracking while allowing each drive to operate near its efficient operating point, thereby reducing total energy consumption. Experiments demonstrate that our algorithm significantly lowers the total energy consumption of the vehicle queue under standard operating conditions while ensuring speed-tracking accuracy and improving internal stability.
|
|
15:40-15:45, Paper TuCT18.6 | Add to My Program |
Fast-Poly: A Fast Polyhedral Algorithm for 3D Multi-Object Tracking |
|
Li, Xiaoyu | Harbin Institute of Technology |
Liu, Dedong | Harbin Institute of Technology |
Wu, Yitao | Harbin Institute of Technology |
Wu, Xian | Harbin Institute of Technology |
Zhao, Lijun | Harbin Institute of Technology |
Gao, Jinghan | Harbin Institute of Technology |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: 3D Multi-Object Tracking (MOT) captures stable and comprehensive motion states of surrounding obstacles, essential for robotic perception. However, current 3D trackers face issues with accuracy and latency consistency. In this paper, we propose Fast-Poly, a fast and effective filter-based method for 3D MOT. Building upon our previous work Poly-MOT, Fast-Poly addresses object rotational anisotropy in 3D space, enhances local computation densification, and leverages parallelization techniques, improving inference speed and precision. Fast-Poly is extensively tested on two large-scale tracking benchmarks with a Python implementation. On the nuScenes dataset, Fast-Poly achieves new state-of-the-art performance with 75.8% AMOTA among all methods and can run at 34.2 FPS on a personal CPU. On the Waymo dataset, Fast-Poly exhibits competitive accuracy with 63.6% MOTA and impressive inference speed (35.5 FPS). The source code is publicly available at https://github.com/lixiaoyu2000/FastPoly.
|
|
15:45-15:50, Paper TuCT18.7 | Add to My Program |
Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments |
|
Arief, Mansur | Stanford University |
Timmerman, Mike | Stanford University |
Li, Jiachen | University of California, Riverside |
Isele, David | University of Pennsylvania, Honda Research Institute USA |
Kochenderfer, Mykel | Stanford University |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Planning under Uncertainty
Abstract: Training intelligent agents to navigate highly interactive environments presents significant challenges. While the guided meta reinforcement learning (RL) approach, which first trains a guiding policy to supervise training of the ego agent, has proven effective in improving generalizability across scenarios with various levels of interaction, the state-of-the-art method tends to be overly sensitive to extreme cases, impairing the agents' performance in the more common scenarios. This study introduces a novel training framework that integrates guided meta RL with importance sampling (IS) to optimize training distributions iteratively for navigating highly interactive driving scenarios, such as T-intersections or roundabouts. Unlike traditional methods that may underrepresent critical interactions or overemphasize extreme cases during training, our approach strategically adjusts the training distribution towards more challenging driving behaviors using IS proposal distributions and applies the importance ratio to de-bias the result. By estimating a naturalistic distribution from real-world datasets and employing a mixture model for iterative training refinements, the framework ensures a balanced focus across common and extreme driving scenarios. Experiments conducted with both synthetic and naturalistic datasets demonstrate both accelerated training and performance improvements under highly interactive driving tasks.
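For the importance-ratio de-biasing mentioned above, a minimal sketch of self-normalized importance-sampling re-weighting is shown below, assuming scenarios are drawn from an IS proposal while evaluation targets the naturalistic distribution; the density callables and the quantity being estimated are placeholders, not the paper's implementation.

```python
import numpy as np

def debiased_return(returns, scenarios, p_natural, q_proposal):
    """Estimate the naturalistic expected return from IS-sampled scenarios.

    returns    : (N,) episode returns collected under the proposal
    scenarios  : (N, d) sampled scenario parameters
    p_natural  : callable, density under the naturalistic distribution
    q_proposal : callable, density under the IS proposal
    """
    w = p_natural(scenarios) / q_proposal(scenarios)   # importance ratios
    w = w / w.sum()                                    # self-normalized IS
    return float(np.dot(w, returns))
```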
|
|
TuCT19 Regular Session, 407 |
Add to My Program |
Medical Robot Systems |
|
|
Chair: Webster III, Robert James | Vanderbilt University |
Co-Chair: Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
|
15:15-15:20, Paper TuCT19.1 | Add to My Program |
Design and Modeling of a Compact Spooling Mechanism for the COAST Guidewire Robot |
|
Brumfiel, Timothy A. | Georgia Institute of Technology |
Grinberg, Jared | Georgia Institute of Technology |
Siopongco, Betina | Georgia Institute of Technology |
Desai, Jaydev P. | Georgia Institute of Technology |
Keywords: Medical Robots and Systems, Mechanism Design, Tendon/Wire Mechanism
Abstract: Many intravascular procedures begin with a clinician manually placing a guidewire at the target lesion to aid in placing other devices. Manually steering the guidewire is challenging due to the lack of direct tip control and the high tortuosity of vessel structures, potentially resulting in vessel perforation or guidewire fracture. These challenges can be alleviated through the use of robotically steerable guidewires that can improve guidewire tip control, provide force feedback, and, similar to commercial guidewires, are inherently safe due to their compliant structure. However, robotic guidewires are not yet clinically viable due to small robot lengths or large actuation systems. In this paper, we develop a highly compact spooling mechanism for the COaxially Aligned STeerable (COAST) guidewire robot, capable of dispensing a clinically viable length of 1.5 m of the robotic guidewire. The mechanism utilizes a spool with several interior armatures to actuate each component of the COAST guidewire. The kinematics of the robotic guidewire are then modeled considering additional friction forces caused by interactions within the mechanism. The actuating mechanisms of the compact spooling mechanism are calibrated, and the kinematics of the guidewire are validated, resulting in an average curvature RMSE of 0.24 m⁻¹.
|
|
15:20-15:25, Paper TuCT19.2 | Add to My Program |
Motion-Guided Dual-Camera Tracker for Endoscope Tracking and Motion Analysis in a Mechanical Gastric Simulator |
|
Zhang, Yuelin | CUHK |
Yan, Kim | The Chinese University of Hong Kong |
Lam, Chun Ping | The Chinese University of Hong Kong |
Fang, Chengyu | Tsinghua University |
Xie, Wenxuan | The Chinese University of Hong Kong |
Qiu, Yufu | The Chinese University of HongKong |
Tang, Raymond Shing-Yan | The Chinese University of Hong Kong, Department of Medicine And |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Deep Learning Methods, Visual Tracking, Computer Vision for Medical Robotics
Abstract: Flexible endoscope motion tracking and analysis in mechanical simulators have proven useful for endoscopy training. Common motion tracking methods based on electromagnetic trackers are, however, limited by their high cost and material susceptibility. In this work, a motion-guided dual-camera vision tracker is proposed to provide robust and accurate tracking of the endoscope tip's 3D position. The tracker addresses several unique challenges of tracking a flexible endoscope tip inside a dynamic, life-sized mechanical simulator. To address appearance variation and keep dual-camera tracking consistent, a cross-camera mutual template strategy (CMT) is proposed, introducing dynamic transient mutual templates. To alleviate large occlusion and light-induced distortion, a Mamba-based motion-guided prediction head (MMH) is presented to aggregate historical motion with visual tracking. The proposed tracker achieves superior performance against state-of-the-art vision trackers, with 42% and 72% improvements over the second-best method in average error and maximum error, respectively. Further motion analysis involving novice and expert endoscopists also shows that the tip 3D motion provided by the proposed tracker enables more reliable motion analysis and more substantial differentiation between different expertise levels, compared with other trackers. Project page: https://github.com/PieceZhang/MotionDCTrack
|
|
15:25-15:30, Paper TuCT19.3 | Add to My Program |
A System for Endoscopic Submucosal Dissection Featuring Concentric Push-Pull Manipulators |
|
Connor, Peter | Vanderbilt University |
Hatch, Carter | University of Tennessee |
Dang, Khoa | University of Tennessee, Knoxville |
Qin, Tony | University of North Carolina at Chapel Hill |
Alterovitz, Ron | University of North Carolina at Chapel Hill |
Rucker, Caleb | University of Tennessee |
Webster III, Robert James | Vanderbilt University |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Flexible Robotics
Abstract: Endoscopic Submucosal Dissection (ESD) is an effective minimally invasive approach to removing colon cancer, yet it is underutilized, since it is challenging to learn and perform. To promote the adoption of ESD by making it easier, we propose a system in which two small, flexible robotic manipulators are delivered through a colonoscope. Our system differs from prior robotic systems aimed at this application in that our manipulators are small enough to fit through a clinically used colonoscope. By not re-engineering the colonoscope, we maintain overall system diameter at the current clinical gold standard, and streamline the path to eventual clinical deployment. Our concentric push-pull robot (CPPR) manipulators offer dexterity and simultaneously provide a conduit for grasper or cutting tool deployment. Each manipulator in our system consists of two push-pull tube pairs, and we describe how they are actuated. We describe for the first time our approach to compensating for undesirable CPPR tip motion induced by differences in the tubes' transmission stiffness. We also evaluate the workspace of the manipulators and demonstrate teleoperation in a point-touching experiment. Lastly, we demonstrate the ability of the system to resect tissue via ex vivo animal experiments.
|
|
15:30-15:35, Paper TuCT19.4 | Add to My Program |
Quantitative Evaluation of Curved BioPrinted Constructs of an in Situ Robotic System towards Treatment of Volumetric Muscle Loss |
|
Rezayof, Omid | University of Texas at Austin |
Huang, Xinyuan | The University of Texas at Austin |
Kamaraj, Meenakshi | Terasaki Institute for Biomedical Innovation, Los Angeles, Calif |
John, Johnson V. | Terasaki Institute for Biomedical Innovation |
Alambeigi, Farshid | University of Texas at Austin |
Keywords: Medical Robots and Systems, Robotics and Automation in Life Sciences, Hardware-Software Integration in Robotics
Abstract: Tissue engineering techniques and particularly in situ bioprinting using handheld devices and robotic systems have recently demonstrated promising outcomes to address volumetric muscle loss injuries. Nevertheless, these approaches suffer from insufficient printing precision and/or lack of quantitative analysis of the thickness and uniformity of bioprinted constructs (BPCs) - which are critical for ensuring cell viability and growth. To address these limitations, in this study, we present a framework for robotic bioprinting and complementary vision-based algorithms to quantitatively analyze thickness and uniformity of BPCs with curved geometries. The performance of the proposed robotic bioprinting and complementary algorithms has been thoroughly evaluated using various simulation and experimental studies on BPCs with constant and variable thicknesses. The results clearly demonstrate the remarkable and accurate performance of the proposed method in calculating the thickness and its variations along the geometry of the BPCs.
|
|
15:35-15:40, Paper TuCT19.5 | Add to My Program |
Design and Hysteresis Compensation of a Telerobotic System for Transesophageal Echocardiography |
|
Zhang, Xiu | Politecnico Di Milano |
Tamadon, Izadyar | University of Twente |
Fortuno Jara, Benjamín Ignacio | Politecnico Di Milano |
Cannizzaro, Vanessa | Politecnico Di Milano |
Peloso, Angela | Politecnico Di Milano |
Bicchi, Anna | Politecnico Di Milano |
Aliverti, Andrea | Politecnico Di Milano |
Votta, Emiliano | Politecnico Di Milano |
Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
De Momi, Elena | Politecnico Di Milano |
Keywords: Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Tendon/Wire Mechanism
Abstract: Transesophageal echocardiography (TEE) plays an important role in diagnosing cardiac conditions such as valvular diseases and cardiac embolism, as well as in guiding various cardiac interventions. It provides detailed cardiac imaging by inserting a probe into the esophagus, which offers an unobstructed view of the heart’s chambers and valves. Addressing the operational challenges and the health risks to the sonographer associated with the manual procedure, a novel robotic TEE system is developed to teleoperate the TEE probe across all four degrees of freedom (4-DoFs). This actuation device features an easily assembled design for post-operative cleaning and sanitization. Moreover, the system enhances the precision of tip bending angles through an optimization technique for offline calibration of the actuation plane. The hysteresis effect inherent in the tendon-driven mechanism is characterized and compensated using a free-knots B-spline method and a look-up table. Experiments are conducted in a realistic human cardiovascular phantom for preclinical evaluation. Repeatability experiments validate the system’s robustness. Furthermore, compared with a piecewise linear model, the proposed method achieves high accuracy with a median bending angle error of less than 0.8°. The results demonstrate the system’s potential to significantly improve the autonomy of TEE in cardiac diagnostic and therapeutic procedures.
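As a simplified illustration of spline-based hysteresis compensation in the spirit of the abstract above, the sketch below fits a smoothing B-spline to placeholder calibration data and builds an inverse look-up table; it uses SciPy's fixed-smoothing splrep rather than the paper's free-knot spline, and the data, names, and monotonicity assumption are all illustrative.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Placeholder calibration data: commanded tendon displacement vs. measured
# tip bending angle for one loading branch of the hysteresis loop.
cmd_load = np.linspace(0.0, 10.0, 50)
angle_load = 8.0 * np.sqrt(cmd_load) + np.random.normal(0.0, 0.2, 50)

# Fit a smoothing B-spline to the loading branch (a stand-in for the
# free-knot spline used in the paper).
tck = splrep(cmd_load, angle_load, s=1.0)

# Build an inverse look-up table (assumes the calibrated branch is monotonic):
# desired bending angle -> tendon command.
angles = splev(cmd_load, tck)

def command_for(desired_angle):
    return float(np.interp(desired_angle, angles, cmd_load))
```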
|
|
15:40-15:45, Paper TuCT19.6 | Add to My Program |
A Magnetic Capsule Robot with an Exoskeleton to Withstand Esophageal Pressure and Delivery Drug in Stomach |
|
Liu, Ruomao | City University of Hong Kong |
Chen, Yujun | Tongji University |
Yin, Zhen | Tongji University |
Zhang, Jiachen | City University of Hong Kong |
Keywords: Soft Robot Applications, Compliant Joints and Mechanisms, Robot Safety
Abstract: Capsule medicine is one of the most widely used methods of drug delivery into the human digestive tract. Packaging drugs into capsules not only prevents contamination of the drug before reaching the destination, but also protects the digestive organs and respiratory tract from potential damage caused by drug reactions. After reaching the targeted digestive organs, the drugs in orally taken capsules usually can only be released passively. Most capsule robots that have been proposed to release drugs actively do not consider the compressive pressure they experience when passing through the esophagus, which could lead to premature drug release. This letter proposes a magnetic capsule robot that can withstand intra-esophageal pressure and also offers the advantages of active locomotion and on-demand drug release. The proposed robot consists of two permanent magnets, an exoskeleton, and a soft non-magnetic container. Thus, it can withstand intra-esophageal pressure as it passes through the esophagus. This capsule robot can enter the stomach for targeted drug release without leaking liquid drugs along the path. The behavior of the robot is controllable using an external magnetic field thanks to the ring-shaped magnets mounted on the robot's top and bottom sections. The non-magnetic drug container is not influenced by the external magnetic field during locomotion, preventing leakage. The experiments show that the proposed capsule robot is more relevant to real-world medical applications thanks to its unique capability to withstand esophageal pressure.
|
|
15:45-15:50, Paper TuCT19.7 | Add to My Program |
MINRob: A Large Force-Outputting Miniature Robot Based on a Triple-Magnet System |
|
Xiang, Yuxuan | City University of Hong Kong |
Liu, Ruomao | City University of Hong Kong |
Wei, Zihan | City University of Hong Kong |
Wang, Xinliang | City University of Hong Kong |
Kang, Weida | Harbin Institute of Technology, Shenzhen |
Wang, Min | City University of Hong Kong |
Liu, Jun | The University of Hong Kong |
Liang, Xudong | Harbin Institute of Technology, Shenzhen |
Zhang, Jiachen | City University of Hong Kong |
Keywords: Medical Robots and Systems, Mechanism Design, Mobile Manipulation, Force Control
Abstract: Magnetically actuated miniature robots are limited in their mechanical output capability, because magnetic forces decrease significantly with decreasing robot size and increasing actuation distance. Hence, the output force of these robots can hardly meet the demands of specific biomedical applications (e.g., tissue penetration). This article proposes a tetherless magnetic impact needle robot (MINRob) based on a triple-magnet system with reversible and repeatable magnetic collisions to overcome this constraint on output force. The working procedure of the proposed system is divided into several states, and a mathematical model is developed to predict and optimize the force output. Measured force values indicate a 10-fold increase compared with existing miniature robots that only utilize magnetic attractive force. Finally, MINRob is integrated with a teleoperation system, enabling remote and precise control of the robot's position and orientation. The triple-magnet system offers promising locomotion patterns and penetration capacity via the notably increased force output, showing great potential in robot-assisted tissue penetration in minimally invasive healthcare.
|
|
TuCT20 Regular Session, 408 |
Add to My Program |
Mechanism Design and Control |
|
|
Chair: Della Santina, Cosimo | TU Delft |
Co-Chair: Luo, Wenhao | University of Illinois Chicago |
|
15:15-15:20, Paper TuCT20.1 | Add to My Program |
Model-Free Safety Filter for Soft Robots: A Q-Learning Approach |
|
Sue, Guo Ning (Andrew) | Carnegie Mellon University |
Choudhary, Yogita | Carnegie Mellon University |
Desatnik, Richard | Carnegie Mellon University |
Majidi, Carmel | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Keywords: Robot Safety, Reinforcement Learning, Modeling, Control, and Learning for Soft Robots
Abstract: Ensuring safety via safety filters in real-world robotics presents significant challenges, particularly when the system dynamics are complex or unavailable. To handle this issue, learning-based safety filters have recently gained popularity and can be classified as model-based or model-free methods. Existing model-based approaches require various assumptions on the system model (e.g., control-affine dynamics), which limits their application to complex systems, while existing model-free approaches need substantial modifications to standard RL algorithms and lack versatility. This paper proposes a simple, plug-and-play, and effective model-free safety filter learning framework. We introduce a novel reward formulation and use Q-learning to learn Q-value functions that safeguard arbitrary task-specific nominal policies by filtering out their potentially unsafe actions. Due to its model-free nature and simplicity, our framework can be seamlessly integrated with various RL algorithms. We validate the proposed approach through simulations on double integrator and Dubins car systems and demonstrate its effectiveness in real-world experiments with a soft robotic limb.
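A minimal sketch of the filtering step described above: a learned safety Q-function vets the nominal action and, if it looks unsafe, substitutes the safest candidate; the threshold, callable signature, and discrete fallback set are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def filter_action(q_safety, state, nominal_action, candidate_actions, threshold=0.0):
    """Override the nominal action when its learned safety Q-value is too low.

    q_safety          : callable (state, action) -> scalar safety value
    nominal_action    : action proposed by the task-specific policy
    candidate_actions : list of fallback actions to search over
    """
    if q_safety(state, nominal_action) >= threshold:
        return nominal_action                         # nominal action deemed safe
    values = [q_safety(state, a) for a in candidate_actions]
    return candidate_actions[int(np.argmax(values))]  # safest fallback action
```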
|
|
15:20-15:25, Paper TuCT20.2 | Add to My Program |
Reachability Analysis for Black-Box Dynamical Systems |
|
Chilakamarri, Vamsi Krishna | Indian Institute of Technology Madras |
Feng, Zeyuan | Stanford University |
Bansal, Somil | Stanford University |
Keywords: Robot Safety, Machine Learning for Robot Control, Optimization and Optimal Control
Abstract: Hamilton-Jacobi (HJ) reachability analysis is a powerful framework for ensuring safety and performance in autonomous systems. However, existing methods typically rely on a white-box dynamics model of the system, limiting their applicability in many practical robotics scenarios where only a black-box model of the system is available. In this work, we propose a novel reachability method to compute reachable sets and safe controllers for black-box dynamical systems. Our approach efficiently approximates the Hamiltonian function using samples from the black-box dynamics. This Hamiltonian is then used to solve the HJ Partial Differential Equation (PDE), providing the reachable set of the system. The proposed method can be applied to general nonlinear systems and can be seamlessly integrated with existing reachability toolboxes for white-box systems to extend their use to black-box systems. Through simulation studies on a black-box slip-wheel car and a quadruped robot, we demonstrate the effectiveness of our approach in accurately obtaining the reachable sets for black-box dynamical systems.
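To illustrate the sample-based Hamiltonian approximation described above, the sketch below estimates H(x, ∇V) by querying a black-box dynamics function at sampled controls; the signature and the choice of maximizing (rather than minimizing) over controls are assumptions made for illustration.

```python
import numpy as np

def sample_hamiltonian(black_box_dynamics, x, grad_V, control_samples):
    """Approximate H(x, dV/dx) = max_u <dV/dx, f(x, u)> from dynamics queries.

    black_box_dynamics : callable (x, u) -> xdot, queried as a black box
    grad_V             : spatial gradient of the value function at x
    control_samples    : (M, m) candidate controls drawn from the control set
    """
    values = [float(np.dot(grad_V, black_box_dynamics(x, u))) for u in control_samples]
    return max(values)   # use min(values) when the control minimizes the value
```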
|
|
15:25-15:30, Paper TuCT20.3 | Add to My Program |
SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems |
|
Ciftci, Yusuf Umut | University of Southern California |
Chiu, Darren | University of Southern California |
Feng, Zeyuan | Stanford University |
Sukhatme, Gaurav | University of Southern California |
Bansal, Somil | Stanford University |
Keywords: Robot Safety, Machine Learning for Robot Control, Imitation Learning
Abstract: Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors and might lead to safety violations, which limits their utility in safety-critical robotics applications. While prior works have tried improving a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focus on reducing the BC policy's safety violations during training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbance in the system during data collection to guide the expert towards safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter during the test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach results in safer policies despite policy errors during the test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low data regimes where the likelihood of learning errors, and therefore safety violations, is higher. See our website here: https://y-u-c.github.io/safegil/
|
|
15:30-15:35, Paper TuCT20.4 | Add to My Program |
Computationally and Sample Efficient Safe Reinforcement Learning Using Adaptive Conformal Prediction |
|
Zhou, Hao | University of Illinois Chicago |
Zhang, Yanze | University of Illinois Chicago |
Luo, Wenhao | University of Illinois Chicago |
Keywords: Robot Safety, Model Learning for Control, Integrated Planning and Learning
Abstract: Safety is a critical concern in learning-enabled autonomous systems, especially when deploying these systems in real-world scenarios. An important challenge is accurately quantifying the uncertainty of unknown models to generate provably safe control policies that facilitate the gathering of informative data, thereby achieving both safe and optimal policies. Additionally, the selection of the data-driven model can significantly impact both the real-time implementation and the uncertainty quantification process. In this paper, we propose a provably sample-efficient episodic safe learning framework that remains robust across various model choices with quantified uncertainty for online control tasks. Specifically, we first employ Quadrature Fourier Features (QFF) for kernel function approximation of Gaussian Processes (GPs) to enable efficient approximation of unknown dynamics. Then, Adaptive Conformal Prediction (ACP) is used to quantify the uncertainty from online observations and is combined with Control Barrier Functions (CBFs) to characterize uncertainty-aware safe control constraints under the learned dynamics. Finally, an optimism-based exploration strategy is integrated with ACP-based CBFs for safe exploration and near-optimal safe nonlinear control. Theoretical proofs and simulation results are provided to demonstrate the effectiveness and efficiency of the proposed framework.
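As a rough sketch of the adaptive conformal prediction step mentioned above, the snippet below applies the standard online miscoverage update and recomputes an uncertainty radius from past prediction errors; how this radius enters the CBF constraint is not shown, and the names, step size, and quantile construction are illustrative assumptions.

```python
import numpy as np

def acp_update(alpha_t, scores, new_score, target_alpha=0.1, gamma=0.05):
    """One adaptive conformal prediction step for an online uncertainty bound.

    alpha_t   : current (adapted) miscoverage level
    scores    : array of past nonconformity scores (e.g., prediction errors)
    new_score : score observed at the current step
    Returns the updated miscoverage level and the new uncertainty radius.
    """
    q = np.quantile(scores, 1.0 - np.clip(alpha_t, 1e-3, 1.0 - 1e-3))
    err = float(new_score > q)                        # 1 if the bound was violated
    alpha_next = alpha_t + gamma * (target_alpha - err)
    radius = np.quantile(scores, 1.0 - np.clip(alpha_next, 1e-3, 1.0 - 1e-3))
    return alpha_next, radius
```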
|
|
15:35-15:40, Paper TuCT20.5 | Add to My Program |
Guaranteed Reach-Avoid for Black-Box Systems through Narrow Gaps Via Neural Network Reachability |
|
Chung, Long Kiu | Georgia Institute of Technology |
Jung, Wonsuhk | Georgia Institute of Technology |
Pullabhotla, Srivatsank | Georgia Institute of Technology |
Shinde, Parth Kishor | Georgia Institute of Technology |
Sunil, Yadu Krishna | Georgia Institute of Technology |
Kota, Saihari | Georgia Institute of Technology |
Batista, Luis F. W. | Georgia Instutue of Technology and Universite De Lorraine |
Pradalier, Cedric | GeorgiaTech Lorraine |
Kousik, Shreyas | Georgia Institute of Technology |
Keywords: Robot Safety, Model Learning for Control, Motion Control
Abstract: In the classical reach-avoid problem, autonomous mobile robots are tasked to reach a goal while avoiding obstacles. However, it is difficult to provide guarantees on the robot's performance when the obstacles form a narrow gap and the robot is a black-box (i.e. the dynamics are not known analytically, but interacting with the system is cheap). To address this challenge, this paper presents NeuralPARC. The method extends the authors' prior Piecewise Affine Reach-avoid Computation (PARC) method to systems modeled by rectified linear unit (ReLU) neural networks, which are trained to represent parameterized trajectory data demonstrated by the robot. NeuralPARC computes the reachable set of the network while accounting for modeling error, and returns a set of states and parameters with which the black-box system is guaranteed to reach the goal and avoid obstacles. NeuralPARC is shown to outperform PARC, generating provably-safe extreme vehicle drift parking maneuvers in simulations and in real life on a model car, as well as enabling safety on an autonomous surface vehicle (ASV) subjected to large disturbances and controlled by a deep reinforcement learning (RL) policy.
|
|
15:40-15:45, Paper TuCT20.6 | Add to My Program |
RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution |
|
Jung, Wonsuhk | Georgia Institute of Technology |
Anthony, Dennis | Georgia Institute of Technology |
Mishra, Utkarsh | Georgia Institute of Technology |
Ranawaka Arachchige, Nadun | Georgia Institute of Technology |
Bronars, Matthew | Carnegie Mellon University |
Xu, Danfei | Georgia Institute of Technology |
Kousik, Shreyas | Georgia Institute of Technology |
Keywords: Robot Safety, Imitation Learning, Motion and Path Planning
Abstract: Imitation learning (IL) has shown great success in learning complex robot manipulation tasks. However, there remains a need for practical safety methods to justify widespread deployment. In particular, it is important to certify that a system obeys hard constraints on unsafe behavior in settings when it is unacceptable to design a tradeoff between performance and safety via tuning the policy (i.e. soft constraints). This leads to the question, how does enforcing hard constraints impact the performance (meaning safely completing tasks) of an IL policy? To answer this question, this paper builds a reachability-based safety filter to enforce hard constraints on IL, which we call Reachability-Aided Imitation Learning (RAIL). Through evaluations with state-of-the-art IL policies in mobile robots and manipulation tasks, we make two key findings. First, the highest-performing policies are sometimes only so because they frequently violate constraints, and significantly lose performance under hard constraints. Second, surprisingly, hard constraints on the lower-performing policies can occasionally increase their ability to perform tasks safely. Finally, hardware evaluation confirms the method can operate in real time. More results can be found at our website: https://safe-robotics-lab-gt.github.io/rail/.
|
|
15:45-15:50, Paper TuCT20.7 | Add to My Program |
Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles |
|
Kiemel, Jonas | Karlsruhe Institute of Technology |
Righetti, Ludovic | New York University |
Kroeger, Torsten | Intrinsic Innovation LLC |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Reinforcement Learning, Robot Safety, Motion Control
Abstract: In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.
|
|
TuCT21 Regular Session, 410 |
Add to My Program |
Reinforcement Learning 3 |
|
|
Chair: Jiang, Chao | University of Wyoming |
Co-Chair: Subosits, John | Toyota Research Institute |
|
15:15-15:20, Paper TuCT21.1 | Add to My Program |
Decision Making for Multi-Robot Fixture Planning Using Multi-Agent Reinforcement Learning (I) |
|
Canzini, Ethan | University of Sheffield |
Auledas-Noguera, Marc | University of Sheffield |
Pope, Simon A. | The University of Sheffield |
Tiwari, Ashutosh | University of Sheffield |
Keywords: Intelligent and Flexible Manufacturing, Multi-Robot Systems, Reinforcement Learning
Abstract: Within the realm of flexible manufacturing, fixture layout planning allows manufacturers to rapidly deploy optimal fixturing plans that can reduce the surface deformation that leads to crack propagation in components during manufacturing tasks. The role of fixture layout planning has evolved from being performed by experienced engineers to computational methods due to the number of possible configurations for components. Current optimisation methods commonly fall into sub-optimal positions due to the existence of local optima, while data-driven machine learning techniques rely on labelled training data that is costly to collect. In this paper, we present a framework for multi-agent reinforcement learning with team decision theory to find optimal fixturing plans for manufacturing tasks. We demonstrate our approach on two representative aerospace components with complex geometries across a set of drilling tasks, illustrating the capabilities of our method, and compare it against state-of-the-art methods, showing a 3-fold improvement in deformation control within tolerance bounds.
|
|
15:20-15:25, Paper TuCT21.2 | Add to My Program |
Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies |
|
Djeumou, Franck | University of Texas, Austin |
Thompson, Michael | Toyota Research Institute |
Suminaka, Makoto | Toyota Research Institute |
Subosits, John | Toyota Research Institute |
Keywords: Reinforcement Learning, Model Learning for Control, Planning under Uncertainty
Abstract: The skill to drift a car, i.e., to operate in a state of controlled oversteer like professional drivers, could give future autonomous cars maximum flexibility when they need to retain control in adverse conditions or avoid collisions. We investigate real-time drifting strategies that put the car where needed while bypassing expensive trajectory optimization. To this end, we design a reinforcement learning agent that builds on the concept of tire energy absorption to autonomously drift through changing and complex waypoint configurations while safely staying within track bounds. We achieve zero-shot deployment on the car by training the agent in a simulation environment built on top of a neural stochastic differential equation vehicle model learned from pre-collected driving data. Experiments on a Toyota GR Supra and Lexus LC 500 show that the agent is capable of drifting smoothly through varying waypoint configurations with tracking error as low as 10 cm while stably pushing the vehicles to sideslip angles of up to 63°.
|
|
15:25-15:30, Paper TuCT21.3 | Add to My Program |
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning |
|
Hu, Jiaheng | UT Austin |
Hendrix, Rose | Allen Institute for AI |
Farhadi, Ali | University of Washington |
Kembhavi, Aniruddha | Allen Institute for AI |
Martín-Martín, Roberto | University of Texas at Austin |
Stone, Peter | University of Texas at Austin |
Zeng, Kuo-Hao | Allen Institute for AI |
Ehsani, Kiana | Allen Institute for Artificial Intelligence |
Keywords: Reinforcement Learning, Mobile Manipulation, Vision-Based Navigation
Abstract: In recent years, the Robotics field has initiated several efforts toward building generalist robot policies through large-scale multi-task Behavior Cloning (BC). However, direct deployments of these policies have led to unsatisfactory performance, where the policy struggles with unseen states and tasks. How can we break through the performance plateau of these models and elevate their capabilities to new heights? In this paper, we propose FLaRe, a large-scale RL fine-tuning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques. Our method aligns pre-trained policies towards task completion, achieving state-of-the-art performance on both previously demonstrated and entirely novel tasks and embodiments. Specifically, on a set of long-horizon mobile manipulation tasks, FLaRe achieves an average success rate of 79.5%, with absolute improvements of +23.6% in simulation and +30.7% in real-world settings over prior state-of-the-art methods. By utilizing only sparse rewards, our approach can enable generalizing to new capabilities beyond the pretraining data with minimal human effort. Moreover, we demonstrate rapid adaptation to new embodiments and behaviors with less than a day of fine-tuning. Code and website at https://robot-flare.github.io/.
|
|
15:30-15:35, Paper TuCT21.4 | Add to My Program |
Suite-IN: Aggregating Motion Features from Apple Suite for Robust Inertial Navigation |
|
Sun, Lan | Shanghai Jiao Tong University |
Xia, Songpengcheng | Shanghai Jiao Tong University |
Deng, Junyuan | Shanghai Jiao Tong University |
Yang, Jiarui | Shanghai Jiao Tong University |
Lai, Zengyuan | Shanghai Jiao Tong University |
Wu, Qi | Shanghai Jiao Tong University |
Pei, Ling | Shanghai Jiao Tong University |
Keywords: Localization, Datasets for Human Motion, Sensor Fusion
Abstract: With the rapid development of wearable technology, devices like smartphones, smartwatches, and headphones equipped with IMUs have become essential for applications such as pedestrian positioning. However, traditional pedestrian dead reckoning (PDR) methods struggle with diverse motion patterns, while recent data-driven approaches, though improving accuracy, often lack robustness due to reliance on a single device. In our work, we attempt to enhance the positioning performance using the low-cost commodity IMUs embedded in the wearable devices. We propose a multi-device deep learning framework named Suite-IN, aggregating motion data from Apple Suite for inertial navigation. Motion data captured by sensors on different body parts contains both local and global motion information, making it essential to reduce the negative effects of localized movements and extract global motion representations from multiple devices. Our model innovatively introduces a contrastive learning module to disentangle motion-shared and motion-private latent representations, enhancing positioning accuracy. We validate our method on a self-collected dataset consisting of Apple Suite: iPhone, Apple Watch and Airpods, which supports a variety of movement patterns and flexible device configurations. Experimental results demonstrate that our approach outperforms state-of-the-art models while maintaining robustness across diverse sensor configurations.
|
|
15:35-15:40, Paper TuCT21.5 | Add to My Program |
Sample-Efficient Unsupervised Policy Cloning from Ensemble Self-Supervised Labeled Videos |
|
Liu, Xin | Institute of Automation, Chinese Academy of Sciences |
Chen, Yaran | Institute of Automation, Chinese Academy of Sciense |
Li, Haoran | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Automation, Reinforcement Learning
Abstract: Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information. However, their requirements, including task-specific rewards, action-labeled expert trajectories, and huge numbers of environmental interactions, can be expensive or even unavailable in many scenarios. In contrast, humans can efficiently acquire skills within a few trials and errors by imitating easily accessible internet videos, in the absence of any other supervision. In this paper, we try to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV), a novel framework to efficiently learn policies from action-free videos without rewards or any other expert supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several organically combined self-supervised tasks. Each task plays its part, and together they enable the model to make full use of both action-free videos and reward-free interactions for robust dynamics understanding and advanced action prediction. Simultaneously, UPESV clones a policy from the labeled expert videos, in turn collecting environmental interactions for the self-supervised tasks. After a sample-efficient, unsupervised, and iterative training process, UPESV obtains an advanced policy based on a robust video labeling model. Extensive experiments in sixteen challenging procedurally generated environments demonstrate that the proposed UPESV achieves state-of-the-art interaction-limited policy learning performance (outperforming five current advanced baselines on 12/16 tasks) without exposure to any supervision other than videos.
|
|
15:40-15:45, Paper TuCT21.6 | Add to My Program |
PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies |
|
Byrd, Morgan | Georgia Institute of Technology |
Crandell, Jackson | Georgia Institute of Technology |
Das, Mili | Georgia Institute of Technology |
Inman, Jessica | Georgia Tech Research Institute |
Wright, Robert | Georgia Tech Research Institute |
Ha, Sehoon | Georgia Institute of Technology |
Keywords: Reinforcement Learning
Abstract: Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module. PrivilegedDreamer features its novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data and enables us to condition the model, actor, and critic networks on these estimated parameters. Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms. Additionally, we conduct ablation studies to justify the inclusion of each component in the proposed architecture.
|
|
15:45-15:50, Paper TuCT21.7 | Add to My Program |
Dynamic Non-Prehensile Object Transport Via Model-Predictive Reinforcement Learning |
|
Jawale, Neel Anand | University of Washington |
Boots, Byron | University of Washington |
Sundaralingam, Balakumar | NVIDIA Corporation |
Bhardwaj, Mohak | University of Washington |
Keywords: Reinforcement Learning, Optimization and Optimal Control, Learning from Demonstration
Abstract: We investigate the problem of teaching a robot manipulator to perform dynamic non-prehensile object transport, also known as the ‘robot waiter’ task, from a limited set of real-world demonstrations. We propose an approach that combines batch reinforcement learning (RL) with model-predictive control (MPC) by pretraining an ensemble of value functions from demonstration data, and utilizing them online within an uncertainty-aware MPC scheme to ensure robustness to limited data coverage. Our approach is straightforward to integrate with off-the-shelf MPC frameworks and enables learning solely from task space demonstrations with sparsely labeled transitions, while leveraging MPC to ensure smooth joint space motions and constraint satisfaction. We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task and demonstrate robust deployment of value functions learned from 50-100 demonstrations. Furthermore, our approach enables generalization to novel objects not seen during training and can improve upon suboptimal demonstrations. We believe that such a framework can reduce the burden of providing extensive demonstrations and facilitate rapid training of robot manipulators to perform non-prehensile manipulation tasks.
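The combination of pretrained value-function ensembles with uncertainty-aware MPC can be illustrated compactly. Below is a minimal PyTorch sketch of scoring MPC rollouts by the ensemble's mean value minus a disagreement penalty; the function name, the standard-deviation penalty form, and the beta weight are assumptions for illustration, not the paper's exact formulation.

import torch

def terminal_value_with_uncertainty(value_ensemble, terminal_states, beta=1.0):
    # value_ensemble: list of torch.nn.Module, each mapping (N, state_dim) -> (N, 1)
    # terminal_states: (N, state_dim) terminal states of N candidate MPC rollouts
    with torch.no_grad():
        values = torch.stack([v(terminal_states).squeeze(-1) for v in value_ensemble], dim=0)  # (E, N)
    mean = values.mean(dim=0)
    std = values.std(dim=0)   # disagreement grows where demonstration coverage is poor
    return mean - beta * std  # higher is better; plug in as the MPC terminal cost-to-go

Penalizing ensemble disagreement discourages rollouts that leave the coverage of the limited demonstration data, which is the robustness mechanism the abstract alludes to.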
|
|
TuCT22 Regular Session, 411 |
Add to My Program |
Deep Learning for Visual Perception 1 |
|
|
Chair: Chung, Jen Jen | The University of Queensland |
Co-Chair: Jenkins, Odest Chadwicke | University of Michigan |
|
15:15-15:20, Paper TuCT22.1 | Add to My Program |
Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies |
|
Wang, Ruiyu | KTH Royal Institute of Technology |
Zhuang, Zheyu | KTH Royal Institute of Technology |
Jin, Shutong | KTH Royal Institute of Technology |
Ingelhag, Nils | KTH Royal Institute of Technology |
Kragic, Danica | KTH |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: An end-to-end (E2E) visuomotor policy is typically treated as a unified whole, but recent approaches using out-of-domain (OOD) data to pretrain the visual encoder have cleanly separated the visual encoder from the network, with the remainder referred to as the policy. We propose Visual Alignment Testing, an experimental framework designed to evaluate the validity of this functional separation. Our results indicate that in E2E-trained models, visual encoders actively contribute to decision-making resulting from motor data supervision, contradicting the assumed functional separation. In contrast, OOD-pretrained models, where encoders lack this capability, experience an average performance drop of 42% in our benchmark results, compared to the state-of-the-art performance achieved by E2E policies. We believe this initial exploration of visual encoders' role can provide a first step towards guiding future pretraining methods to address their decision-making ability, such as developing task-conditioned or context-aware encoders.
|
|
15:20-15:25, Paper TuCT22.2 | Add to My Program |
JRN-Geo: A Joint Perception Network Based on RGB and Normal Images for Cross-View Geo-Localization |
|
Zhou, Hongyu | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Huang, Tingsong | University of Sheffield |
Ge, Fawei | Northeastern University |
Qi, Man | Northeastern University |
Zhang, Xichen | Northeastern University |
Zhang, Yizhong | Northeastern University |
Keywords: Representation Learning, Localization, Deep Learning for Visual Perception
Abstract: Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. However, significant challenges arise from the drastic viewpoint differences and appearance variations between images. Existing methods predominantly rely on semantic features from RGB images, often neglecting the importance of spatial structural information in capturing viewpoint-invariant features. To address this issue, we incorporate geometric structural information from normal images and introduce a Joint perception network to integrate RGB and Normal images (JRN-Geo). Our approach utilizes a dual-branch feature extraction framework, leveraging a Difference-Aware Fusion Module (DAFM) and Joint-Constrained Interaction Aggregation (JCIA) strategy to enable deep fusion and joint-constrained semantic and structural information representation. Furthermore, we propose a 3D geographic augmentation technique to generate potential viewpoint variation samples, enhancing the network’s ability to learn viewpoint-invariant features. Extensive experiments on the University-1652 and SUES-200 datasets validate the robustness of our method against complex viewpoint variations, achieving state-of-the-art performance.
|
|
15:25-15:30, Paper TuCT22.3 | Add to My Program |
U^2Frame: A Unified and Unsupervised Learning Framework for LiDAR-Based Loop Closing |
|
Yixin, Zhang | Sun Yat-Sen University |
Ao, Sheng | Sun Yat-Sen University |
Zhang, Ye | Sun Yat-Sen University |
Song, Zhuo | University of Chinese Academy of Sciences |
Qingyong, Hu | University of Oxford |
Chang, Tao | National University of Defense Technology |
Guo, Yulan | Sun Yat-Sen University |
Keywords: Deep Learning Methods, Localization, Mapping
Abstract: Loop closing is critically important in Simultaneous Localization and Mapping (SLAM) due to its ability to correct accumulated localization errors. However, existing methods are hindered by the difficulty of acquiring pose labels and the unreliability of ground truth data. In this paper, we propose U^2Frame, a unified and unsupervised learning framework for LiDAR-based loop closing. Specifically, the natural temporal-spatial correlation in point cloud sequences is first leveraged to supervise the network training, where temporally nearby scans are treated as positives and distant scans as negatives. A new neural architecture is then constructed to jointly learn highly discriminative local and global features for loop closure detection. Additionally, an effective candidate verification module that exploits high-order geometric information is presented to further filter out false loop closures and estimate precise poses. We extensively evaluate U^2Frame on multiple datasets according to two tasks derived from loop closing: place recognition and loop pose estimation. Comparative experiments demonstrate that our method outperforms existing state-of-the-art supervised techniques and has a strong generalization ability across unseen scenarios. Code will be released soon.
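The temporal-spatial self-supervision described above (nearby scans as positives, distant scans as negatives) can be mined directly from scan timestamps. The short NumPy sketch below illustrates the idea; the function name and the two time thresholds are illustrative assumptions, not the paper's settings.

import numpy as np

def mine_temporal_pairs(timestamps, pos_window=2.0, neg_window=30.0):
    # Scans recorded within pos_window seconds of each other are assumed to overlap (positives);
    # scans more than neg_window seconds apart are assumed not to (negatives).
    # Pairs in between are left out as ambiguous.
    t = np.asarray(timestamps)
    dt = np.abs(t[:, None] - t[None, :])           # (N, N) pairwise time gaps
    positives = np.argwhere((dt > 0) & (dt <= pos_window))
    negatives = np.argwhere(dt >= neg_window)
    return positives, negatives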
|
|
15:30-15:35, Paper TuCT22.4 | Add to My Program |
Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data |
|
Chen, Chao | New York University |
Cheng, Zegang | New York University |
Liu, Xinhao | New York University |
Li, Yiming | New York University |
Ding, Li | Amazon |
Wang, Ruoyu | New York University |
Feng, Chen | New York University |
Keywords: Deep Learning for Visual Perception, Recognition, Localization
Abstract: Visual place recognition (VPR) using deep networks has achieved state-of-the-art performance. However, most of them require a training set with ground truth sensor poses to obtain positive and negative samples of each observation’s spatial neighborhood for supervised learning. When such information is unavailable, temporal neighborhoods from a sequentially collected data stream could be exploited for self-supervised training, although we find its performance suboptimal. Inspired by noisy label learning, we propose a novel self-supervised framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods. Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification. We conduct auto-labeling and generalization tests on both simulated and real datasets, with either RGB images or point clouds as inputs. The results show that our method outperforms self-supervised baselines in recall rate, robustness, and heading diversity, a novel metric we propose for VPR. Our code and datasets can be found at https://ai4ce.github.io/TF-VPR/.
|
|
15:35-15:40, Paper TuCT22.5 | Add to My Program |
AiSDF: Structure-Aware Neural Signed Distance Fields in Indoor Scenes |
|
Jang, Jaehoon | Ulsan National Institute of Science and Technology |
Lee, Inha | Ulsan National Institute of Science & Technology |
Kim, Minje | Ulsan National Institute of Science & Technology |
Joo, Kyungdon | UNIST |
Keywords: Deep Learning for Visual Perception, Mapping, Incremental Learning
Abstract: The indoor scenes we live in are often visually homogeneous or textureless, yet they inherently have structural forms and provide enough structural priors for 3D scene reconstruction. Motivated by this fact, we propose a structure-aware online signed distance field (SDF) reconstruction framework in indoor scenes, especially under the Atlanta world (AW) assumption. Thus, we dub this incremental SDF reconstruction for AW as AiSDF. Within the online framework, we infer the underlying Atlanta structure of a given scene and then estimate planar surfel regions supporting the Atlanta structure. This Atlanta-aware surfel representation provides an explicit planar map for a given scene. In addition, based on these Atlanta planar surfel regions, we adaptively sample and constrain the structural regularity in the SDF reconstruction, which enables us to improve the reconstruction quality by maintaining a high-level structure while enhancing the details of a given scene. We evaluate the proposed AiSDF on the ScanNet and ReplicaCAD datasets, where we demonstrate that the proposed framework is capable of reconstructing fine details of objects implicitly, as well as structures explicitly in room-scale scenes.
|
|
15:40-15:45, Paper TuCT22.6 | Add to My Program |
Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation |
|
Li, Alan | University of Toronto |
Schoellig, Angela P. | TU Munich |
Keywords: Computer Vision for Automation, RGB-D Perception, Deep Learning for Visual Perception
Abstract: 6D object pose estimation is a fundamental component in robotics enabling efficient interaction with the environment. It is particularly challenging in bin-picking applications, where objects may be in difficult poses, and occlusion between objects of the same type can cause confusion even in well-trained models. We propose a novel method of hard example synthesis that is model-agnostic, using existing simulators and the modelling of pose error in both the camera-to-object view-sphere and occlusion space. Through evaluation of the model performance with respect to the distribution of object poses and occlusions, we discover regions of high error and generate realistic training samples to specifically target these regions. We demonstrate an improvement in correct detection rate of up to 20% across several ROBI-dataset objects, as well as across state-of-the-art pose estimation models.
|
|
15:45-15:50, Paper TuCT22.7 | Add to My Program |
Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation |
|
Opipari, Anthony | University of Michigan |
Krishnan, Aravindhan | Amazon Lab126 |
Gayaka, Shreekant | Amazon |
Sun, Min | National Tsing Hua University |
Kuo, Cheng-Hao | Amazon |
Sen, Arnab | Amazon |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Object Detection, Segmentation and Categorization, Data Sets for Robotic Vision, RGB-D Perception
Abstract: This paper presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g. from HM3DSem[1]) to generate segmented videos that are configurable based on a robot’s embodiment (e.g. sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate that using MVPd for finetuning can lead to performance improvements when transferring foundation models to certain robot embodiments, such as specific camera placements. These experiments also show that using 3D modalities (depth images and camera pose) can lead to improvements in video segmentation accuracy and consistency. Project page: https://topipari.com/projects/MVPd
|
|
TuCT23 Regular Session, 412 |
Add to My Program |
Autonomous Vehicle Perception 1 |
|
|
Chair: Laugier, Christian | INRIA |
Co-Chair: Zhao, Hang | Tsinghua University |
|
15:15-15:20, Paper TuCT23.1 | Add to My Program |
Characterizing and Optimizing the Tail Latency for Autonomous Vehicle Systems |
|
Liu, Haolan | University of California San Diego |
Wang, Zixuan | University of California San Diego |
Zhao, Jishen | UC San Diego |
Keywords: Engineering for Robotic Systems, Software-Hardware Integration for Robot Systems, Methods and Tools for Robot System Design
Abstract: Autonomous vehicle (AV) systems are envisioned to revolutionize our lives by providing safe, relaxing, and convenient ground transportation. To ensure safety, AV systems need to make timely driving decisions in response to complicated and highly dynamic real-world driving environments. We present a systematic study to understand the causes of tail latency in AV systems and their impact on safety. We empirically analyze the design of two open-source industrial AV systems, Baidu Apollo and Autoware. We explore how pipelined computation design (such as module dependency and execution patterns), traffic factors (the AV's surrounding environment), and system factors (such as cache contention) impact AV systems' tail latency. Inspired by these insights, we propose a set of systematic designs that lead to performance and safety improvements of up to 1.65× and 14×, respectively.
|
|
15:20-15:25, Paper TuCT23.2 | Add to My Program |
MORDA: A Synthetic Dataset to Facilitate Adaptation of Object Detectors to Unseen Real-Target Domain While Preserving Performance on Real-Source Domain |
|
Lim, Hojun | MORAI |
Yoo, Heecheol | MORAI |
Lee, Jinwoo | MORAI |
Jeon, Seungmin | MORAI Inc |
Jeon, Hyeongseok | MORAI Inc |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Deep neural network (DNN) based perception models are indispensable in the development of autonomous vehicles (AVs). However, their reliance on large-scale, high-quality data is broadly recognized as a burdensome necessity due to the substantial cost of data acquisition and labeling. Further, the issue is not a one-time concern as AVs might need a new dataset if they are to be deployed to another region (real-target domain) that the in-hand dataset within the real-source domain cannot incorporate. To mitigate this burden, we propose leveraging synthetic environments as an auxiliary domain where the characteristics of real domains are reproduced. This approach could enable indirect experience about the real-target domain in a time- and cost-effective manner. As a practical demonstration of our methodology, nuScenes and South Korea are employed to represent real-source and real-target domains, respectively. To that end, we construct digital twins for several regions of South Korea, and the data-acquisition framework of nuScenes is reproduced. Blending the aforementioned components within a simulator allows us to obtain a synthetic-fusion domain in which we forge our novel driving dataset, MORDA: Mixture Of Real-domain characteristics for synthetic-data-assisted Domain Adaptation. To verify the value of synthetic features that MORDA provides in learning about the driving environments of South Korea, 2D/3D detectors are trained solely on a combination of nuScenes and MORDA. Afterward, their performance is evaluated on the unforeseen real-world dataset (AI-Hub) collected in South Korea. Our experiments show that MORDA can significantly improve mean Average Precision (mAP) on the AI-Hub dataset while that on nuScenes is retained or slightly enhanced. Details on MORDA can be accessed at https://morda-e8d07e.gitlab.io.
|
|
15:25-15:30, Paper TuCT23.3 | Add to My Program |
Towards Latency-Aware 3D Streaming Perception for Autonomous Driving |
|
Peng, Jiaqi | Tsinghua University |
Wang, Tai | Shanghai AI Laboratory |
Pang, Jiangmiao | Shanghai AI Laboratory |
Shen, Yuan | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Sensor Fusion
Abstract: Although existing 3D perception algorithms have demonstrated significant improvements in performance, their deployment on edge devices continues to encounter critical challenges due to substantial runtime latency. We propose a new benchmark tailored for online evaluation by considering runtime latency. Based on the benchmark, we build a Latency-Aware 3D Streaming Perception (LASP) framework that addresses the latency issue through two primary components: 1) latency-aware history integration, which extends query propagation into a continuous process, ensuring the integration of historical data regardless of varying latency; 2) latency-aware predictive detection, a mechanism that compensates the detection results with the predicted trajectory and the posterior accessed latency. By incorporating the latency-aware mechanism, our method shows generalization across various latency levels, achieving an online performance that closely aligns with 80% of its offline evaluation on the Jetson AGX Orin without any acceleration techniques.
|
|
15:30-15:35, Paper TuCT23.4 | Add to My Program |
DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving |
|
Lai, Songning | The Hong Kong University of Science and Technology (Guangzhou) |
Xue, Tianlang | Hong Kong University of Science and Technology (Guangzhou) |
Xiao, Hongru | Tongji University |
Hu, Lijie | KAUST |
Wu, Jiemin | Hong Kong University of Science and Technology (Guangzhou) |
Feng, Ninghui | The Hong Kong University of Science and Technology (Guangzhou) |
Guan, Runwei | University of Liverpool |
Haicheng, Liao | University of Macau |
Li, Zhenning | University of Macau |
Yue, Yutao | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Transportation
Abstract: Recent advancements in autonomous driving have seen a paradigm shift towards end-to-end learning, which maps sensory inputs directly to driving actions, thereby enhancing the robustness and adaptability of autonomous vehicles. However, these models often sacrifice interpretability, posing significant challenges to trust, safety, and regulatory compliance. To address these issues, we introduce DRIVE -- Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving, a comprehensive framework designed to improve the dependability and stability of explanations in end-to-end unsupervised autonomous driving models. Our work specifically targets the inherent instability problems observed in the Driving through the Concept Gridlock (DCG) model, which undermine the trustworthiness of its explanations and decision-making processes. We define four key attributes of DRIVE: consistent interpretability, stable interpretability, consistent output, and stable output. These attributes collectively ensure that explanations remain reliable and robust across different scenarios and perturbations. Through extensive empirical evaluations, we demonstrate the effectiveness of our framework in enhancing the stability and dependability of explanations, thereby addressing the limitations of current models. Our contributions include an in-depth analysis of the dependability issues within the DCG model, a rigorous definition of DRIVE with its fundamental properties, a framework to implement DRIVE, and novel metrics for evaluating the dependability of concept-based explainable autonomous driving models. These advancements lay the groundwork for the development of more reliable and trusted autonomous driving systems, paving the way for their broader acceptance and deployment in real-world applications.
|
|
15:35-15:40, Paper TuCT23.5 | Add to My Program |
Dur360BEV: A Real-World 360-Degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving |
|
E, Wenke | Durham University |
Yuan, Chao | Durham University |
Li, Li | Durham University |
Sun, Yixin | Durham University |
A. Gaus, Yona Falinie | Durham University |
Atapour-Abarghouei, Amir | Durham University |
Breckon, Toby | Durham University |
Keywords: Data Sets for Robotic Vision, Omnidirectional Vision, Deep Learning for Visual Perception
Abstract: We present Dur360BEV, a novel spherical camera autonomous driving dataset equipped with a high-resolution 128-channel 3D LiDAR and an RTK-refined GNSS/INS system, along with a benchmark architecture designed to generate Bird-Eye-View (BEV) maps using only a single spherical camera. This dataset and benchmark address the challenges of BEV generation in autonomous driving, particularly by reducing hardware complexity through the use of a single 360-degree camera instead of multiple perspective cameras. Within our benchmark architecture, we propose a novel spherical-image-to-BEV module that leverages spherical imagery and a refined sampling strategy to project features from 2D to 3D. Our approach also includes an innovative application of focal loss, specifically adapted to address the extreme class imbalance often encountered in BEV segmentation tasks, that demonstrates improved segmentation performance on the Dur360BEV dataset. The results show that our benchmark not only simplifies the sensor setup but also achieves competitive performance. Code + Dataset: https://github.com/Tom-E-Durham/Dur360BEV
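The abstract mentions a focal loss adapted to the extreme class imbalance of BEV segmentation. For reference, a standard binary focal loss in PyTorch is sketched below; the alpha and gamma defaults are common values, not the paper's tuned settings or its specific adaptation.

import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits:  (B, H, W) raw scores for each BEV cell
    # targets: (B, H, W) float {0., 1.} ground-truth occupancy masks
    # Down-weights easy, well-classified cells so the rare occupied cells dominate the gradient.
    prob = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()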
|
|
15:40-15:45, Paper TuCT23.6 | Add to My Program |
MVCTrack: Boosting 3D Point Cloud Tracking Via Multimodal-Guided Virtual Cues |
|
Hu, Zhaofeng | Stony Brook University |
Zhou, Sifan | Southeast University |
Yuan, Zhihang | Houmo AI |
Yang, Dawei | Houmo |
Zhao, Shibo | Carnegie Mellon University |
Liang, Ci-Jyun | Stony Brook University |
Keywords: Visual Tracking, Human Detection and Tracking, Sensor Fusion
Abstract: 3D single object tracking plays a crucial role in autonomous driving and robotics. Existing methods often struggle with sparse and incomplete point cloud scenarios. To overcome these limitations, we propose a Multimodal-guided Virtual Cue Projection (MVCP) scheme to generate virtual cues for sparse point clouds. Furthermore, we construct an enhanced tracker called MVCTrack based on the generated virtual cues. Specifically, the MVCP scheme seamlessly integrates RGB sensors into LiDAR-based systems, leveraging a set of 2D detections to generate dense 3D virtual points that enhance the originally sparse 3D point cloud. These virtual points can naturally integrate with existing LiDAR-based 3D detectors, resulting in significant performance improvements. Extensive experiments demonstrate that our method achieves competitive performance on the NuScenes dataset. Code is available at https://github.com/StiphyJay/MVCTrack
|
|
15:45-15:50, Paper TuCT23.7 | Add to My Program |
Chameleon: Fast-Slow Neuro-Symbolic Lane Topology Extraction |
|
Zhang, Zongzheng | Tsinghua University |
Li, Xinrun | Newcastle University |
Zou, Sizhe | Beijing Jiaotong University |
Chi, Guoxuan | Tsinghua University |
Li, Siqi | Zhejiang University |
Qiu, Xuchong | Bosch |
Wang, Guoliang | Institute for AI Industry Research (AIR), Tsinghua University |
Zheng, Guantian | Huazhong University of Science and Technology |
Wang, LeiChen | Robert Bosch CN |
Zhao, Hang | Tsinghua University |
Zhao, Hao | Tsinghua University |
Keywords: Cognitive Modeling, Object Detection, Segmentation and Categorization
Abstract: Lane topology extraction involves detecting lanes and traffic elements and determining their relationships, a key perception task for mapless autonomous driving. This task requires complex reasoning, such as determining whether it is possible to turn left into a specific lane. To address this challenge, we introduce neuro-symbolic methods powered by vision-language foundation models (VLMs). Existing approaches have notable limitations: (1) Dense visual prompting with VLMs can achieve strong performance but is costly in terms of both financial resources and carbon footprint, making it impractical for robotics applications. (2) Neuro-symbolic reasoning methods for 3D scene understanding fail to integrate visual inputs when synthesizing programs, making them ineffective in handling complex corner cases. To this end, we propose a fast-slow neuro-symbolic lane topology extraction algorithm, named Chameleon, which alternates between a fast system that directly reasons over detected instances using synthesized programs and a slow system that utilizes a VLM with a chain-of-thought design to handle corner cases. Chameleon leverages the strengths of both approaches, providing an affordable solution while maintaining high performance. We evaluate the method on the OpenLane-v2 dataset, showing consistent improvements across various baseline detectors. Our code, data, and models are publicly available at https://github.com/XR-Lee/neural-symbolic.
|
|
TuCT24 Regular Session, 401 |
Add to My Program |
Novel Sensors |
|
|
Chair: Sinapov, Jivko | Tufts University |
Co-Chair: Draelos, Mark | University of Michigan |
|
15:15-15:20, Paper TuCT24.1 | Add to My Program |
ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement |
|
Guo, Xuejian | Xi'an Jiaotong University |
Tian, Zhiqiang | Xi'an Jiaotong University |
Wang, Yuehang | Jilin University |
Li, Siqi | Tsinghua University |
Jiang, Yu | Jilin University |
Du, Shaoyi | Xi'an Jiaotong University |
Gao, Yue | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Sensor Fusion
Abstract: Low-light image enhancement aims to restore the under-exposure image captured in dark scenarios. Under such scenarios, traditional frame-based cameras may fail to capture the structure and color information due to the exposure time limitation. Event cameras are bio-inspired vision sensors that respond to pixel-wise brightness changes asynchronously. Event cameras’ high dynamic range is pivotal for visual perception in extreme low-light scenarios, surpassing traditional cameras and enabling applications in challenging dark environments. In this paper, inspired by the success of Retinex theory for traditional frame-based low-light image restoration, we introduce the first method that combines Retinex theory with event cameras and propose a novel Retinex-based low-light image restoration framework named ERetinex. Among our contributions, the first is developing a new approach that leverages the high temporal resolution data from event cameras with traditional image information to estimate scene illumination accurately. This method outperforms traditional image-only techniques, especially in low-light environments, by providing more precise lighting information. Additionally, we propose an effective fusion strategy that combines the high dynamic range data from event cameras with the color information of traditional images to enhance image quality. Through this fusion, we can generate clearer and more detail-rich images, maintaining the integrity of visual information even under extreme lighting conditions. The experimental results indicate that our proposed method outperforms state-of-the-art (SOTA) methods, achieving a gain of 1.0613 dB in PSNR while reducing FLOPS by 84.28%. The code is available at https://github.com/lodew920/ERetinex.
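As background, Retinex theory models an observed image as the pixel-wise product of reflectance and illumination, I = R ⊙ L, so enhancement amounts to estimating the illumination map (here aided by event data) and recovering the reflectance. The PyTorch sketch below only illustrates this classical decomposition in log space; it is not the ERetinex network.

import torch

def retinex_decompose(image, illumination, eps=1e-4):
    # image:        (B, 3, H, W) low-light RGB image in [0, 1]
    # illumination: (B, 1, H, W) estimated illumination map in (0, 1]
    # Working in log space turns the product I = R * L into a difference and the
    # clamping avoids amplifying noise where the illumination estimate is near zero.
    log_r = torch.log(image.clamp(min=eps)) - torch.log(illumination.clamp(min=eps))
    return torch.exp(log_r).clamp(0.0, 1.0)   # reflectance, i.e. the enhanced image content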
|
|
15:20-15:25, Paper TuCT24.2 | Add to My Program |
ThermoStereoRT: Thermal Stereo Matching in Real Time Via Knowledge Distillation and Attention-Based Refinement |
|
Hu, Anning | Shanghai Jiao Tong University |
Li, Ang | Shanghai Jiao Tong University |
Jin, Xirui | Shanghai Jiao Tong University |
Zou, Danping | Shanghai Jiao Tong University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception
Abstract: We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. Addressing the challenge of sparse ground truth data in thermal imagery, we utilize knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time capability and robust accuracy, making it a promising solution for real-world deployment in various challenging environments. Our code will be released on https://github.com/SJTU-ViSYS-team/ThermoStereoRT.
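Distilling a heavier teacher into the real-time student is a natural fit when dense thermal ground truth is scarce. The PyTorch sketch below shows a generic recipe that mixes a sparse supervised disparity term with a dense teacher-imitation term; the function name, loss choices, and the w_kd weight are assumptions about the general approach, not the paper's exact losses.

import torch
import torch.nn.functional as F

def distilled_disparity_loss(student_disp, teacher_disp, gt_disp, gt_mask, w_kd=0.5):
    # student_disp, teacher_disp: (B, H, W) predicted disparity maps
    # gt_disp: (B, H, W) sparse ground truth, valid only where gt_mask (bool) is True
    sup = F.smooth_l1_loss(student_disp[gt_mask], gt_disp[gt_mask])   # sparse supervision
    kd = F.smooth_l1_loss(student_disp, teacher_disp.detach())        # dense teacher imitation
    return sup + w_kd * kd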
|
|
15:25-15:30, Paper TuCT24.3 | Add to My Program |
Tool-Mediated Robot Perception of Granular Substances Using Multiple Sensory Modalities |
|
Liu, Si | TUFTS |
Sinapov, Jivko | Tufts University |
Keywords: Recognition, Learning Categories and Concepts, Robot Audition
Abstract: People use tools to interact with and perceive the world, with multimodal sensory inputs forming the basis of how we understand our environment. For example, a blind person uses a walking cane to tap the road and detect obstacles, and a builder uses a hammer to strike a wall to assess its structural integrity. Using tools extends our sensory capabilities during exploratory behaviors, enabling us to perceive object properties that are otherwise inaccessible. Inspired by this cognitive process, we propose a framework in which a multi-sensory robot employs exploratory behaviors using various tools to recognize granular substances. Our framework effectively integrates multiple non-visual sensory inputs (e.g., audio, haptic, and tactile) gathered through multiple tools (e.g., spoon, fork) and behaviors (e.g., stirring, poking) to perceive object properties. The framework segments interactions into time windows and aligns different modalities, enhancing data efficiency and interactive perception. Additionally, we conducted tool-transfer experiments to evaluate similarities between tools. Our experiments demonstrate that combining multiple tools and behaviors outperforms single-tool and single-behavior approaches. While the audio modality dominates the non-visual multimodal system, other modalities contribute. We further demonstrate that tool similarities vary depending on the behavior, and notably, the robot does not need to complete entire interactions to achieve optimal recognition accuracy.
|
|
15:30-15:35, Paper TuCT24.4 | Add to My Program |
FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera |
|
Zhao, Guoyang | HKUST(GZ) |
Liu, Yuxuan | Hong Kong University of Science and Technology |
Qi, Weiqing | HKUST |
Ma, Fulong | The Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Autonomous Vehicle Navigation, Vision-Based Navigation
Abstract: Accurate depth estimation is crucial for 3D scene comprehension in robotics and autonomous vehicles. Fisheye cameras, known for their wide field of view, have inherent geometric benefits. However, their use in depth estimation is restricted by a scarcity of ground truth data and image distortions. We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras. We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions, thereby improving depth estimation accuracy and training stability. Furthermore, we incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network. Essentially, this method offers the necessary physical depth for robotic tasks, and also streamlines the training and inference procedures. Additionally, we devise a multi-channel output strategy to improve robustness by adaptively fusing features at various scales, which reduces the noise from real pose data. We demonstrate the superior performance and robustness of our model in fisheye image depth estimation through evaluations on public datasets and real-world scenarios. The project website is available at: https://github.com/guoyangzhao/FisheyeDepth.
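Replacing the pinhole model with a fisheye model in the projection and reprojection steps is the core idea above. The NumPy sketch below shows one common fisheye model (equidistant, r = f·theta) used to project camera-frame points to pixels; the paper's calibrated camera model may differ, so treat this purely as an illustration of where the fisheye geometry enters a self-supervised reprojection loss.

import numpy as np

def equidistant_fisheye_project(points_cam, fx, fy, cx, cy):
    # points_cam: (N, 3) points in the camera frame, Z pointing forward.
    # Equidistant model: image radius is proportional to the angle from the optical axis.
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r_xy = np.sqrt(x**2 + y**2)
    theta = np.arctan2(r_xy, z)                                   # angle from the optical axis
    scale = np.where(r_xy > 1e-9, theta / np.maximum(r_xy, 1e-9), 0.0)
    u = fx * (x * scale) + cx
    v = fy * (y * scale) + cy
    return np.stack([u, v], axis=1)                               # (N, 2) pixel coordinates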
|
|
15:35-15:40, Paper TuCT24.5 | Add to My Program |
Geometry-Aware Volumetric Data Stitching Using Local Surface Mapping and Robot Optical Coherence Tomography |
|
Ma, Guangshen | Duke University |
Draelos, Mark | University of Michigan |
Keywords: Sensor-based Control, Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Optical coherence tomography (OCT) has been widely used for high-fidelity biological tissue scanning but is traditionally limited to small lateral fields of view that preclude large-area scanning. To overcome this problem, we propose integrating an OCT sensor onto a 6-DOF robot arm end-effector combined with a geometry-aware stitching model for surface and volumetric data stitching. We first develop a simple but efficient Robot-OCT calibration method by using a three-marker calibration pattern and implement an optimization solver. Given a pre-defined trajectory, a local planner is developed to update the sensor pose by using the OCT point cloud information in order to maintain the effective imaging depth based on the distance and orientation constraints. The system calibration method is verified through repeated experiments with the three-marker targets and the result shows an average testing error of 0.132 ± 0.071 mm. The geometry-aware OCT stitching framework is demonstrated through experiments with different scanning trajectories and 3D-printed phantoms for large-area scanning. The OCT stitched point cloud is compared with the ground truth from the phantom CAD model and the results show an average surface alignment error of 0.441 ± 0.241 mm for the path-following tasks.
|
|
15:40-15:45, Paper TuCT24.6 | Add to My Program |
Thermal Chameleon: Task-Adaptive Tone-Mapping for Radiometric Thermal-Infrared Images |
|
Lee, DongGuw | Seoul National University (SNU) |
Kim, Jeongyun | SNU |
Cho, Younggun | Inha University |
Kim, Ayoung | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Deep Learning for Visual Perception
Abstract: Thermal Infrared (TIR) imaging provides robust perception for navigating in challenging outdoor environments but faces issues with poor texture and low image contrast due to its 14/16-bit format. Conventional methods utilize various tone-mapping methods to enhance contrast and photometric consistency of TIR images; however, choosing a tone-mapping that works well largely depends on knowing the task and temperature-dependent priors in advance. In this paper, we present the Thermal Chameleon Network (TCNet), a task-adaptive tone-mapping approach for RAW 14-bit TIR images. Given the same image, TCNet tone-maps different representations of TIR images tailored for each specific task, eliminating the heuristic image rescaling preprocessing and reliance on extensive prior knowledge of the scene temperature or task-specific characteristics. TCNet exhibits improved generalization performance across object detection and monocular depth estimation, with minimal computational overhead and modular integration into existing architectures for various tasks.
|
|
TuDT1 Regular Session, 302 |
Add to My Program |
Award Finalists 4 |
|
|
Chair: Chli, Margarita | ETH Zurich & University of Cyprus |
Co-Chair: Kosuge, Kazuhiro | The University of Hong Kong |
|
16:35-16:40, Paper TuDT1.1 | Add to My Program |
MAC-VO: Metrics-Aware Covariance for Learning-Based Stereo Visual Odometry |
|
Qiu, Yuheng | Carnegie Mellon University |
Chen, Yutian | Carnegie Mellon University |
Zhang, Zihao | Shanghai Jiao Tong University |
Wang, Wenshan | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: SLAM, Localization, Mapping
Abstract: We propose MAC-VO, a novel learning-based stereo VO that leverages the learned metrics-aware matching uncertainty for dual purposes: selecting keypoints and weighting the residuals in pose graph optimization. Compared to traditional geometric methods prioritizing texture-affluent features like edges, our keypoint selector employs the learned uncertainty to filter out the low-quality features based on global inconsistency. In contrast to the learning-based algorithms that model the scale-agnostic diagonal weight matrix for covariance, we design a metrics-aware covariance model to capture the spatial error during keypoint registration and the correlations between different axes. Integrating this covariance model into pose graph optimization enhances the robustness and reliability of pose estimation, particularly in challenging environments with varying illumination, feature density, and motion patterns. On public benchmark datasets, MAC-VO outperforms existing VO algorithms and even some SLAM algorithms in challenging environments. The covariance map also provides valuable information about the reliability of the estimated poses, which can benefit decision-making for autonomous systems.
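The way a full, learned covariance enters pose-graph optimization can be shown in a few lines: the residual is whitened so that its squared norm equals the Mahalanobis distance. The NumPy sketch below is the textbook construction, not MAC-VO's implementation.

import numpy as np

def mahalanobis_residual(predicted_pt, observed_pt, covariance):
    # predicted_pt, observed_pt: (3,) registered keypoint positions
    # covariance: (3, 3) learned, metrics-aware covariance (correlated axes allowed,
    # unlike a scale-agnostic diagonal model)
    r = predicted_pt - observed_pt
    # Cholesky factor L with covariance = L @ L.T; solving L w = r whitens the residual,
    # so that ||w||^2 = r^T covariance^{-1} r, the weight used by the pose-graph optimizer.
    L = np.linalg.cholesky(covariance)
    return np.linalg.solve(L, r)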
|
|
16:40-16:45, Paper TuDT1.2 | Add to My Program |
Ground-Optimized 4D Radar-Inertial Odometry Via Continuous Velocity Integration Using Gaussian Process |
|
Yang, Wooseong | Seoul National University |
Jang, Hyesu | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Range Sensing, SLAM, Localization
Abstract: Radar ensures robust sensing capabilities in adverse weather conditions, yet challenges remain due to its high inherent noise level. Existing radar odometry has overcome these challenges with strategies such as filtering spurious points, exploiting Doppler velocity, or integrating with inertial measurements. This paper presents two novel improvements beyond the existing radar-inertial odometry: ground-optimized noise filtering and continuous velocity preintegration. Despite the widespread use of ground planes in LiDAR odometry, imprecise ground point distributions of radar measurements cause naive plane fitting to fail. Unlike plane fitting in LiDAR, we introduce a zone-based uncertainty-aware ground modeling specifically designed for radar. Secondly, we note that radar velocity measurements can be better combined with IMU for a more accurate preintegration in radar-inertial odometry. Existing methods often ignore temporal discrepancies between radar and IMU by simplifying the complexities of asynchronous data streams with discretized propagation models. Tackling this issue, we leverage Gaussian processes (GP) and formulate a continuous preintegration method for tightly integrating 3-DOF linear velocity with IMU, facilitating full 6-DOF motion directly from the raw measurements. Our approach demonstrates remarkable performance (less than 1% vertical drift) in public datasets with meticulous conditions, illustrating substantial improvement in elevation accuracy. The code will be released as open source for the community: https://github.com/wooseongY/Go-RIO.
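Continuous velocity preintegration boils down to integrating an interpolated ego-velocity profile between two times. The NumPy sketch below uses simple linear interpolation as a stand-in for the paper's Gaussian-process model, so it only illustrates the integration step, not the GP machinery or the coupling with IMU rotation.

import numpy as np

def integrate_velocity(times, velocities, t0, t1, n=100):
    # times: (K,) timestamps of sparse radar ego-velocity samples
    # velocities: (K, 3) the corresponding 3-DOF linear velocities
    # Returns the (3,) displacement accumulated between t0 and t1.
    ts = np.linspace(t0, t1, n)
    v = np.stack([np.interp(ts, times, velocities[:, i]) for i in range(3)], axis=1)
    return np.trapz(v, ts, axis=0)   # trapezoidal integration of the velocity profile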
|
|
16:45-16:50, Paper TuDT1.3 | Add to My Program |
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation |
|
Tang, Yihe | Stanford University |
Huang, Wenlong | Stanford University |
Wang, Yingke | Stanford University |
Li, Chengshu | Stanford University |
Yuan, Roy | Stanford University |
Zhang, Ruohan | Stanford University |
Wu, Jiajun | Stanford University |
Fei-Fei, Li | Stanford University |
Keywords: Representation Learning, Deep Learning for Visual Perception, Sensorimotor Learning
Abstract: Understanding fine-grained object affordances is imperative for robots to manipulate objects in unstructured environments given open-ended task instructions. However, existing methods for visual affordance prediction often rely on manually annotated data or condition only on a predefined set of tasks. We introduce Unsupervised Affordance Distillation (UAD), a method for distilling affordance knowledge from foundation models into a task-conditioned affordance model without any manual annotations. By leveraging the complementary strengths of large vision models and vision-language models, UAD automatically annotates a large-scale dataset with detailed pairs. Training only a lightweight task-conditioned decoder atop frozen features, UAD exhibits notable generalization to in-the-wild robotic scenes as well as to various human activities despite only being trained on rendered objects in simulation. Using affordance provided by UAD as the observation space, we show an imitation learning policy that demonstrates promising generalization to unseen object instances, object categories, and even variations in task instructions after training on as few as 10 demonstrations.
|
|
16:50-16:55, Paper TuDT1.4 | Add to My Program |
Bat-VUFN: Bat-Inspired Visual-And-Ultrasound Fusion Network for Robust Perception in Adverse Conditions |
|
Lim, Gyeongrok | KAIST |
Hong, Jeong-ui | KAIST |
Bae, Hyeon Min | Kaist |
Keywords: Sensor Fusion, Localization
Abstract: Environmental factors like weather and road conditions significantly impact object recognition in autonomous vehicles. While cameras provide rich semantic information, their reliance on electromagnetic waves makes them vulnerable to performance degradation in adverse conditions such as low light and rain. In contrast, ultrasonic sensors offer reliable short-range detection, unaffected by such conditions. We introduce Bat-VUFN, a bio-inspired multi-sensory system that merges camera and ultrasonic data using an Input Quality Score (IQS)-based fusion technique to enhance near-field perception in challenging environments. Bat-VUFN dynamically adjusts sensor contributions based on prevailing conditions, achieving impressive results on the K-Bat dataset (average precision: 0.95, MAE: 0.52m, RMSE: 0.55m), demonstrating its robustness in adverse scenarios.
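Quality-score-based fusion of the two sensors can be illustrated with a simple weighted average. The NumPy sketch below is a toy version; how the Input Quality Scores are actually computed and combined in Bat-VUFN is not detailed here, so the normalization scheme is an assumption.

import numpy as np

def quality_weighted_fusion(camera_est, ultrasonic_est, camera_iqs, ultrasonic_iqs, eps=1e-6):
    # camera_est, ultrasonic_est: (N,) per-object range estimates from each sensor
    # camera_iqs, ultrasonic_iqs: (N,) non-negative quality scores, e.g. low for the camera
    # in darkness or rain, low for the ultrasonic sensor at long range
    w_cam = camera_iqs / (camera_iqs + ultrasonic_iqs + eps)
    w_us = 1.0 - w_cam
    return w_cam * camera_est + w_us * ultrasonic_est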
|
|
16:55-17:00, Paper TuDT1.5 | Add to My Program |
TinySense: A Lighter Weight and More Power-Efficient Avionics System for Flying Insect-Scale Robots |
|
Yu, Zhitao | University of Washington |
Tran, Josh | University of Washington |
Li, Claire | University of Washington |
Weber, Aaron | University of Washington |
Talwekar, Yash P. | University of Washington |
Fuller, Sawyer | University of Washington |
Keywords: Biologically-Inspired Robots, Micro/Nano Robots, Sensor Fusion
Abstract: In this paper, we introduce advances in the sensor suite of an autonomous flying insect robot (FIR) weighing less than a gram. FIRs, because of their small weight and size, offer unparalleled advantages in terms of material cost and scalability. However, their size introduces considerable control challenges, notably high-speed dynamics, restricted power, and limited payload capacity. While there have been advancements in developing lightweight sensors, often drawing inspiration from biological systems, no sub-gram aircraft has been able to attain sustained hover without relying on feedback from external sensing such as a motion capture system. The lightest vehicle capable of sustained hovering, the first level of "sensor autonomy", is the much larger 28 g Crazyflie. Previous work reported a reduction in size of that vehicle's avionics suite to 187 mg and 21 mW. Here, we report a further reduction in mass and power to only 78.4 mg and 15 mW. We replaced the laser rangefinder with a lighter and more efficient pressure sensor, and built a smaller optic flow sensor around a global-shutter imaging chip. A Kalman Filter (KF) fuses these measurements to estimate the state variables that are needed to control hover: pitch angle, translational velocity, and altitude. Our system achieved performance comparable to that of the Crazyflie's estimator while in flight, with root mean squared errors of 1.573 deg, 0.186 m/s, and 0.136 m, respectively, relative to motion capture.
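Fusing the pressure-based altitude and optic-flow measurements relies on a Kalman filter. The NumPy sketch below is a generic linear Kalman measurement update for reference; the vehicle's actual state layout (pitch angle, translational velocity, altitude) and noise parameters are not reproduced here.

import numpy as np

def kalman_update(x, P, z, H, R):
    # x: (n,) state estimate, e.g. [altitude, vertical velocity, forward velocity]
    # P: (n, n) state covariance
    # z: (m,) measurement, e.g. [pressure altitude, optic-flow velocity]
    # H: (m, n) measurement model, R: (m, m) measurement noise covariance
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new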
|
|
17:00-17:05, Paper TuDT1.6 | Add to My Program |
TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition |
|
Zhao, Guoyang | HKUST(GZ) |
Ma, Fulong | The Hong Kong University of Science and Technology |
Qi, Weiqing | HKUST |
Zhang, Chenguang | Wuhan Polytechnic University |
Liu, Yuxuan | Hong Kong University of Science and Technology |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Recognition, Computer Vision for Transportation, Autonomous Vehicle Navigation
Abstract: Traffic signs are critical map features for navigation and traffic control. Nevertheless, current methods for traffic sign recognition rely on traditional deep learning models, which typically suffer from significant performance degradation under variations in data distribution across different regions. In this paper, we propose TSCLIP, a robust fine-tuning approach with the contrastive language-image pre-training (CLIP) model for worldwide cross-regional traffic sign recognition. We first curate a cross-regional traffic sign benchmark dataset by combining data from ten different sources. Then, we propose a prompt engineering scheme tailored to the characteristics of traffic signs, which involves specific scene descriptions and corresponding rules to generate targeted text descriptions for optimizing the model training process. During the TSCLIP fine-tuning process, we implement adaptive dynamic weight ensembling (ADWE) to seamlessly incorporate outcomes from each training iteration with the zero-shot CLIP model. This approach ensures that the model retains its ability to generalize while acquiring new knowledge about traffic signs. Our method surpasses conventional classification benchmark models in cross-regional traffic sign evaluations, and it achieves state-of-the-art performance compared to existing CLIP fine-tuning techniques (Fig. 1). To the best of the authors' knowledge, TSCLIP is the first contrastive language-image model used for the worldwide cross-regional traffic sign recognition task. The project website is available at: https://github.com/guoyangzhao/TSCLIP.
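Weight-space ensembling of a fine-tuned checkpoint with the zero-shot CLIP weights, which ADWE performs adaptively at each iteration, can be sketched as a parameter-wise interpolation. The snippet below uses a fixed coefficient purely for illustration; the adaptive schedule is the paper's contribution and is not shown.

import torch

def ensemble_state_dicts(zero_shot_sd, finetuned_sd, alpha=0.5):
    # zero_shot_sd, finetuned_sd: state_dicts of the same CLIP image encoder.
    # alpha is the share given to the fine-tuned weights; the paper adapts this
    # per training iteration, a constant is used here only for illustration.
    return {k: (1.0 - alpha) * zero_shot_sd[k] + alpha * finetuned_sd[k]
            for k in zero_shot_sd}

The interpolated state_dict can then be loaded back into the encoder before the next training iteration, preserving zero-shot generalization while absorbing traffic-sign knowledge.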
|
|
TuDT2 Regular Session, 301 |
Add to My Program |
Integrating Motion Planning and Learning 1 |
|
|
Chair: Mao, Jiayuan | MIT |
Co-Chair: Righetti, Ludovic | New York University |
|
16:35-16:40, Paper TuDT2.1 | Add to My Program |
Adaptive Abrupt Disturbance Rejection Tracking Control for Wheeled Mobile Robots |
|
Wu, Hao | Huazhong University of Science and Technology |
Wang, Shuting | Huazhong University of Science and Technology |
Xie, Yuanlong | Huazhong University of Science and Technology |
Li, Hu | Huazhong University of Science and Technology |
Zheng, Shiqi | China University of Geosciences Wuhan Campus |
Jiang, Liquan | Wuhan Textile University |
Keywords: Motion Control, Wheeled Robots, Robust/Adaptive Control
Abstract: Uncertain disturbances increase the difficulty of robust tracking control for wheeled mobile robots (WMRs) in industrial scenarios, especially when the disturbances exhibit abrupt changes. This letter proposes an adaptive abrupt disturbance-rejection sliding mode controller (SMC). To address the increased variability in the disturbance boundaries caused by abrupt transitions, a new adaptive disturbance observer (ADOB) is designed to improve the tracking robustness and attenuate the chattering of the SMC by generating auxiliary system variables without depending on any prior boundary information about the disturbance and its change rate. Then, a novel barrier function (BF)-based switching law is constructed to suppress the residual-disturbance estimation error of the ADOB at the transient state, which achieves a tradeoff between a sufficiently large gain and chattering by avoiding gain overestimation. The finite-time Lyapunov stability of the sliding variables and the estimation errors is proved theoretically. The practical effectiveness is illustrated in experiments with the custom-developed WMRs.
|
|
16:40-16:45, Paper TuDT2.2 | Add to My Program |
OPPA: Online Planner's Parameter Adaptation for Enhanced Mobile Robot Navigation |
|
Chang, Minsu | Samsung Electronics |
Jang, Junwon | Samsung Electronics Co., Ltd |
Han, Daewoong | Samsung Electronics Co. Ltd |
Choi, Wonje | Samsung Electronics |
Kim, Seungyeon | Graduate School of Convergence Science and Technology, Seoul National University |
Park, Hyunkyu | Samsung Advanced Institute of Technology |
Choi, Hyundo | Samsung Electronics |
Keywords: Collision Avoidance, Integrated Planning and Learning, AI-Based Methods
Abstract: Autonomous navigation in mobile robots has made significant advancements; however, traditional methods often struggle to adapt in real-time to dynamic or unstructured environments. This paper presents the Online Planner’s Parameter Adaptation (OPPA) framework, which enhances both adaptability and safety in mobile robot navigation by dynamically adjusting planner parameters. OPPA integrates a rule-based system for estimating tunnel width using 2D LiDAR and path data with a learning-based approach utilizing a shallow transformer model. By incorporating a human-in-the-loop process to refine training data, OPPA improves accuracy and reliability in complex environments. Designed for real-time efficiency on resource-constrained platforms, OPPA has been validated through simulation and real-world experiments, demonstrating its ability to enhance both safety and performance. These results highlight OPPA as a viable solution for dynamic and complex robotic applications.
|
|
16:45-16:50, Paper TuDT2.3 | Add to My Program |
Learning to Refine Input Constrained Control Barrier Functions Via Uncertainty-Aware Online Parameter Adaptation |
|
Kim, Taekyung | University of Michigan |
Kee, Robin Inho | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Integrated Planning and Learning, Integrated Planning and Control, Machine Learning for Robot Control
Abstract: Control Barrier Functions (CBFs) have become powerful tools for ensuring safety in nonlinear systems. However, finding valid CBFs that guarantee persistent safety and feasibility remains an open challenge, especially in systems with input constraints. Traditional approaches often rely on manually tuning the parameters of the class K functions of the CBF conditions a priori. The performance of CBF-based controllers is highly sensitive to these fixed parameters, potentially leading to overly conservative behavior or safety violations. To overcome these issues, this paper introduces a learning-based optimal control framework for online adaptation of Input Constrained CBF (ICCBF) parameters in discrete-time nonlinear systems. Our method employs a probabilistic ensemble neural network to predict the performance and risk metrics, as defined in this work, for candidate parameters, accounting for both epistemic and aleatoric uncertainties. We propose a two-step verification process using Jensen-Rényi Divergence and distributionally-robust Conditional Value at Risk to identify valid parameters. This enables dynamic refinement of ICCBF parameters based on current state and nearby environments, optimizing performance while ensuring safety within the verified parameter set. Experimental results demonstrate that our method outperforms both fixed-parameter and existing adaptive methods in robot navigation scenarios across safety and performance metrics.
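Reading out performance and risk predictions from a probabilistic ensemble, including the split into aleatoric and epistemic uncertainty, follows a standard pattern. The PyTorch sketch below assumes each ensemble member returns a (mean, log-variance) pair; this interface is a hypothetical placeholder, not the paper's network or its Jensen-Rényi/CVaR verification steps.

import torch

def ensemble_prediction(ensemble, inputs):
    # ensemble: list of networks, each mapping (N, d_in) -> ((N, d_out), (N, d_out))
    # Returns predictive mean, aleatoric variance (average predicted noise), and
    # epistemic variance (disagreement between ensemble members).
    with torch.no_grad():
        means, log_vars = zip(*[net(inputs) for net in ensemble])
    means = torch.stack(means)                       # (E, N, d_out)
    aleatoric = torch.stack(log_vars).exp().mean(0)  # mean of the predicted noise variances
    epistemic = means.var(0)                         # spread of the member means
    return means.mean(0), aleatoric, epistemic

Candidate ICCBF parameters whose predicted risk or epistemic uncertainty exceeds a threshold would then be excluded from the verified set before the online adaptation step.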
|
|
16:50-16:55, Paper TuDT2.4 | Add to My Program |
GA-TEB: Goal-Adaptive Framework for Efficient Navigation Based on Goal Lines |
|
Zhang, Qianyi | Nankai University |
Luo, Wentao | Huawei |
Zhang, Ziyang | Huawei, China |
Wang, Yaoyuan | Huawei |
Liu, Jingtai | Nankai University |
Keywords: Motion and Path Planning, Human-Aware Motion Planning
Abstract: In crowd navigation, the local goal plays a crucial role in trajectory initialization, optimization, and evaluation. Recognizing that when the global goal is distant, the robot's primary objective is avoiding collisions, making it less critical to pass through the exact local goal point, this work introduces the concept of goal lines, which extend the traditional local goal from a single point to multiple candidate lines. Coupled with a topological map construction strategy that groups obstacles to be as convex as possible, a goal-adaptive navigation framework is proposed to efficiently plan multiple candidate trajectories. Simulations and experiments demonstrate that the proposed GA-TEB framework effectively prevents deadlock situations, where the robot becomes frozen due to a lack of feasible trajectories in crowded environments. Additionally, the framework greatly increases planning frequency in scenarios with numerous non-convex obstacles, enhancing both robustness and safety.
|
|
16:55-17:00, Paper TuDT2.5 | Add to My Program |
Reinforcement Learning for Adaptive Planner Parameter Tuning: A Perspective on Hierarchical Architecture |
|
Lu, Wangtao | Zhejiang University |
Wei, Yufei | Zhejiang University |
Xu, Jiadong | Zhejiang University |
Jia, Wenhao | Zhejiang University of Technology |
Li, Liang | Zhejiang University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Motion and Path Planning, Reinforcement Learning
Abstract: Automatic parameter-tuning methods for planning algorithms, which integrate pipeline approaches with learning-based techniques, are considered promising because of their stability and their ability to handle unstructured environments. Although existing parameter-tuning methods have demonstrated considerable success, further performance improvements require a more structured approach. In this paper, we propose a hierarchical architecture for reinforcement learning-based parameter tuning. The architecture introduces a hierarchical structure with low-frequency parameter tuning, mid-frequency planning, and high-frequency control, enabling the simultaneous enhancement of upper-level parameter tuning and lower-level control through iterative training. Experimental evaluations in both simulated and real-world environments show that our method surpasses existing parameter-tuning approaches. In addition, our method is further validated in the Benchmark Autonomous Robot Navigation (BARN) Challenge.
|
|
17:00-17:05, Paper TuDT2.6 | Add to My Program |
Integrating One-Shot View Planning with a Single Next-Best View Via Long-Tail Multiview Sampling |
|
Pan, Sicong | University of Bonn |
Hu, Hao | Fudan University |
Wei, Hui | Fudan University |
Dengler, Nils | University of Bonn |
Zaenker, Tobias | University of Bonn |
Elnagdi, Murad | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: View Planning, Deep Learning in Robotics and Automation, Motion and Path Planning, Computer Vision for Automation
Abstract: Existing view planning systems either adopt an iterative paradigm using next-best views (NBV) or a one-shot pipeline relying on the set-covering view-planning (SCVP) network. However, neither of these methods can concurrently guarantee both high-quality and high-efficiency reconstruction of 3D unknown objects. To tackle this challenge, we introduce a crucial hypothesis: with the availability of more information about the unknown object, the prediction quality of the SCVP network improves. There are two ways to provide extra information: (1) leveraging perception data obtained from NBVs, and (2) training on an expanded dataset of multiview inputs. In this work, we introduce a novel combined pipeline that incorporates a single NBV before activating the proposed multiview-activated (MA-)SCVP network. The MA-SCVP is trained on a multiview dataset generated by our long-tail sampling method, which addresses the issue of unbalanced multiview inputs and enhances the network performance. Extensive simulated experiments substantiate that our system demonstrates a significant surface coverage increase and a substantial 45% reduction in movement cost compared to state-of-the-art systems. Real-world experiments justify the capability of our system for generalization and deployment.
|
|
TuDT3 Regular Session, 303 |
Add to My Program |
Verification and Formal Methods |
|
|
Chair: Luo, Xusheng | Carnegie Mellon University |
Co-Chair: Liu, Wenliang | Amazon |
|
16:35-16:40, Paper TuDT3.1 | Add to My Program |
Decomposition-Based Hierarchical Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications |
|
Luo, Xusheng | Carnegie Mellon University |
Xu, Shaojun | Zhejiang University |
Liu, Ruixuan | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Keywords: Formal Methods in Robotics and Automation, Planning, Scheduling and Coordination, Multi-Robot Systems
Abstract: Past research into robotic planning with temporal logic specifications, notably Linear Temporal Logic (LTL), has largely been based on single formulas for individual robots or groups of robots. However, with increasing task complexity, LTL formulas unavoidably grow lengthy, complicating interpretation and specification generation and straining the computational capacities of planners. A recent development is the hierarchical representation of LTL [Luo et al., 2024] that contains multiple temporal logic specifications, providing a more interpretable framework. However, the associated planning algorithm assumes the independence of robots within each specification, limiting its application to multi-robot coordination with complex temporal constraints. In this work, we formulate a decomposition-based hierarchical framework. At the high level, each specification is first decomposed into a set of atomic sub-tasks. We further infer the temporal relations among the sub-tasks of different specifications to construct a task network. Subsequently, a Mixed Integer Linear Program is utilized to assign sub-tasks to the robots. At the lower level, domain-specific controllers are employed to execute the sub-tasks. Our approach was experimentally applied to the domains of robotic navigation and manipulation. The simulations demonstrate that our approach finds better solutions with shorter runtimes.
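As a purely illustrative stand-in for the MILP-based sub-task assignment mentioned above, the brute-force sketch below enumerates assignments and minimizes a hypothetical cost matrix; the actual formulation with inferred temporal relations is the paper's contribution and is not reproduced here.

    from itertools import permutations

    # Hypothetical costs[r][t]: cost for robot r to execute atomic sub-task t.
    costs = [
        [4.0, 2.5, 6.0],
        [3.0, 5.0, 2.0],
        [5.5, 3.5, 4.5],
    ]

    def best_assignment(costs):
        # Exhaustive stand-in for the MILP: one sub-task per robot, minimizing total cost.
        n = len(costs)
        best, best_perm = float("inf"), None
        for perm in permutations(range(n)):        # perm[r] = sub-task given to robot r
            total = sum(costs[r][perm[r]] for r in range(n))
            if total < best:
                best, best_perm = total, perm
        return best_perm, best

    assignment, total_cost = best_assignment(costs)
    print("robot -> sub-task:", list(enumerate(assignment)), "total cost:", total_cost)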
|
|
16:40-16:45, Paper TuDT3.2 | Add to My Program |
Hand It to Me Formally! Data-Driven Control for Human-Robot Handovers with Signal Temporal Logic |
|
Khanna, Parag | KTH Royal Institute of Technology |
Fredberg, Jonathan | KTH Royal Institute of Technology |
Björkman, Mårten | KTH |
Smith, Claes Christian | KTH Royal Institute of Technology |
Linard, Alexis | KTH Royal Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Human-Aware Motion Planning, Motion and Path Planning
Abstract: To facilitate human-robot interaction (HRI), we aim for robot behavior that is efficient, transparent, and closely resembles human actions. Signal Temporal Logic (STL) is a formal language that enables the specification and verification of complex temporal properties in robotic systems, helping to ensure their correctness. STL can be used to generate explainable robot behaviour, the degree of satisfaction of which can be quantified by checking its STL robustness. In this work, we use data-driven STL inference techniques to model human behavior in human-human interactions, on a handover dataset. We then use the learned model to generate robot behavior in human-robot interactions. We present a handover planner based on inferred STL specifications to command robotic motion in human-robot handovers. We also validate our method in a human-to-robot handover experiment.
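As a purely illustrative aside, the STL robustness mentioned above can be computed in a few lines for a simple reach specification; the specification and numbers below are hypothetical examples, not the specifications inferred in the paper.

    def robustness_eventually_close(distances, d_max):
        # rho(F (dist <= d_max)) = max over time of (d_max - dist_t):
        # positive means the specification is satisfied, with larger values giving more margin.
        return max(d_max - d for d in distances)

    # Example handover trace: end-effector-to-hand distance in metres over time.
    trace = [0.80, 0.55, 0.31, 0.12, 0.05, 0.09]
    print(robustness_eventually_close(trace, d_max=0.10))  # 0.05 > 0, spec satisfied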
|
|
16:45-16:50, Paper TuDT3.3 | Add to My Program |
Forward Invariance in Trajectory Spaces for Safety-Critical Control |
|
Vahs, Matti | KTH Royal Institute of Technology, Stockholm |
Cabral Muchacho, Rafael Ignacio | KTH Royal Institute of Technology |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Robot Safety
Abstract: Useful robot control algorithms should not only achieve performance objectives but also adhere to hard safety constraints. Control Barrier Functions (CBFs) have been developed to provably ensure system safety through forward invariance. However, they often unnecessarily sacrifice performance for safety since they are purely reactive. Receding horizon control (RHC), on the other hand, considers planned trajectories to account for the future evolution of a system. This work provides a new perspective on safety-critical control by introducing Forward Invariance in Trajectory Spaces (FITS). We lift the problem of safe RHC into the trajectory space and describe the evolution of planned trajectories as a controlled dynamical system. Safety constraints defined over states can be converted into sets in the trajectory space, which we render forward invariant via a CBF framework. We derive an efficient quadratic program (QP) to synthesize trajectories that provably satisfy safety constraints. Our experiments show that FITS improves adherence to safety specifications without sacrificing performance relative to alternative CBF and NMPC methods.
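For context, the safety filter in CBF frameworks is typically a small QP; with a single affine constraint the minimally invasive correction has a closed form, sketched below on a generic nominal input. This is a generic CBF-QP illustration, not the trajectory-space formulation of the paper.

    import numpy as np

    def cbf_qp_single_constraint(u_nom, a, b):
        # Solve min ||u - u_nom||^2  s.t.  a^T u >= b  (one affine CBF constraint).
        # Closed form: project u_nom onto the half-space only if it violates the constraint.
        slack = a @ u_nom - b
        if slack >= 0.0:
            return u_nom
        return u_nom + (-slack / (a @ a)) * a

    u_nom = np.array([1.0, 0.0])   # nominal (performance) input
    a = np.array([0.5, 1.0])       # hypothetical constraint row from the CBF condition
    b = 1.2
    print(cbf_qp_single_constraint(u_nom, a, b))  # satisfies a^T u = b exactly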
|
|
16:50-16:55, Paper TuDT3.4 | Add to My Program |
Scalable Multi-Robot Task Allocation and Coordination under Signal Temporal Logic Specifications |
|
Liu, Wenliang | Amazon |
Majcherczyk, Nathalie | Worcester Polytechnic Institute |
Pecora, Federico | Amazon Robotics |
Keywords: Formal Methods in Robotics and Automation, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: Motion planning with simple objectives, such as collision-avoidance and goal-reaching, can be solved efficiently using modern planners. However, the complexity of the allowed tasks for these planners is limited. On the other hand, signal temporal logic (STL) can specify complex requirements, but STL-based motion planning and control algorithms often face scalability issues, especially in large multi-robot systems with complex dynamics. In this paper, we propose an algorithm that leverages the best of the two worlds. We first use a single-robot motion planner to efficiently generate a set of alternative reference paths for each robot. Then coordination requirements are specified using STL, which is defined over the assignment of paths and robots' progress along those paths. We use a Mixed Integer Linear Program (MILP) to compute task assignments and robot progress targets over time such that the STL specification is satisfied. Finally, a local controller is used to track the target progress. Simulations demonstrate that our method can handle tasks with complex constraints and scales to large multi-robot teams and intricate task allocation scenarios.
|
|
16:55-17:00, Paper TuDT3.5 | Add to My Program |
Planning with Linear Temporal Logic Specifications: Handling Quantifiable and Unquantifiable Uncertainty |
|
Yu, Pian | University College London |
Li, Yong | University of Liverpool |
Parker, David | University of Oxford |
Kwiatkowska, Marta | University of Oxford |
Keywords: Formal Methods in Robotics and Automation, Planning under Uncertainty, Task Planning
Abstract: This work studies the planning problem for robotic systems under both quantifiable and unquantifiable uncertainty. The objective is to enable the robotic systems to optimally fulfill high-level tasks specified by Linear Temporal Logic (LTL) formulas. To capture both types of uncertainty in a unified modelling framework, we utilise Markov Decision Processes with Set-valued Transitions (MDPSTs). We introduce a novel solution technique for optimal robust strategy synthesis of MDPSTs with LTL specifications. To improve efficiency, our work leverages limit-deterministic Büchi automata (LDBAs) as the automaton representation for LTL to take advantage of their efficient constructions. To tackle the inherent nondeterminism in MDPSTs, which presents a significant challenge for reducing the LTL planning problem to a reachability problem, we introduce the concept of a Winning Region (WR) for MDPSTs. Additionally, we propose an algorithm for computing the WR over the product of the MDPST and the LDBA. Finally, a robust value iteration algorithm is invoked to solve the reachability problem. We validate the effectiveness of our approach through a case study involving a mobile robot operating in the hexagonal world, demonstrating promising efficiency gains.
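For intuition, robust value iteration over set-valued transitions takes the worst case over each transition set before maximizing over actions; the toy reachability example below uses hypothetical states and probabilities, not the product construction of the paper.

    # Robust value iteration for a reachability objective on a toy MDP with set-valued
    # transitions: each (state, action) maps to a set of possible distributions, and the
    # adversary picks the worst one. All numbers are illustrative.
    transitions = {
        ("s0", "a"): [{"s1": 1.0}, {"s0": 0.5, "s1": 0.5}],
        ("s0", "b"): [{"s0": 1.0}],
        ("s1", "a"): [{"s1": 1.0}],
    }
    goal = {"s1"}
    states = {"s0", "s1"}

    V = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(50):
        new_V = {}
        for s in states:
            if s in goal:
                new_V[s] = 1.0
                continue
            acts = [a for (st, a) in transitions if st == s]
            # max over actions, min over the distributions the adversary may pick
            new_V[s] = max(
                min(sum(p * V[t] for t, p in dist.items()) for dist in transitions[(s, a)])
                for a in acts
            )
        V = new_V
    print(V)  # worst-case probability of eventually reaching the goal from each state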
|
|
17:00-17:05, Paper TuDT3.6 | Add to My Program |
Lyapunov-Certified Trajectory Tracking for Mobile Robot with a Tail Wheel: Differential-Flatness and Adaptive Backstepping Design |
|
Nishizawa, Yuta | Honda R&D Co., Ltd |
Koga, Shumon | Honda Research and Development |
Aizawa, Koki | Honda R&D Co., Ltd |
Yasui, Yuji | Honda R&D Co., Ltd |
Keywords: Nonholonomic Mechanisms and Systems, Robust/Adaptive Control, Wheeled Robots
Abstract: This paper proposes a trajectory tracking control law for a mobile robot with two front differential wheels and a tail wheel. The dynamics are given by mimicking the Ackermann steering model for the position and orientation, combined with the actuator dynamics of the tail wheel's angle, modeled as a first-order response with respect to the robot's angular velocity. First, we develop a nominal trajectory tracking control law to track a given desired trajectory by applying the differential-flatness property of the unicycle model and a backstepping approach to handle the actuator dynamics. The effectiveness of the trajectory tracking is demonstrated through hardware robot experiments conducted after system identification, which illustrate superior performance over a benchmark method. The design is also extended to adaptive tracking control under parameter uncertainty in the tail wheel dynamics by introducing an adaptation law for the parameters, and the performance is demonstrated in numerical simulation.
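As background, the differential-flatness property of the unicycle model mentioned above lets the forward speed, heading, and angular rate be recovered from a smooth position reference and its derivatives; the sketch below shows the standard flatness relations only, not Honda's full tail-wheel model.

    import numpy as np

    def flat_outputs_to_unicycle_inputs(xd, yd, xdd, ydd):
        # Unicycle flatness: from (x_dot, y_dot, x_ddot, y_ddot) recover forward speed v,
        # heading theta, and angular rate omega.
        v = np.hypot(xd, yd)
        theta = np.arctan2(yd, xd)
        omega = (xd * ydd - yd * xdd) / max(v**2, 1e-9)  # guard against v = 0
        return v, theta, omega

    # Example: a circular reference of radius 2 m traversed at 1 rad/s, evaluated at t = 0.3 s.
    t, R, w = 0.3, 2.0, 1.0
    xd, yd = -R * w * np.sin(w * t), R * w * np.cos(w * t)
    xdd, ydd = -R * w**2 * np.cos(w * t), -R * w**2 * np.sin(w * t)
    print(flat_outputs_to_unicycle_inputs(xd, yd, xdd, ydd))  # v = 2.0, omega = 1.0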
|
|
TuDT4 Regular Session, 304 |
Add to My Program |
Object Detection 2 |
|
|
Chair: Li, Yingke | Massachusetts Institute of Technology |
Co-Chair: Joffe, Benjamin | Georgia Institute of Technology |
|
16:35-16:40, Paper TuDT4.1 | Add to My Program |
OPRNet: Object-Centric Point Reconstruction Network for Multimodal 3D Object Detection in Adverse Weathers |
|
Yoon, Jaehyun | Chonnam National University |
Jung, JongWon | CHONNAM University |
Lee, Eungi | Chonnam National University |
Yoo, Seok Bong | Chonnam National University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Automation, Sensor Fusion
Abstract: The development of multimodal fusion techniques utilizing LiDAR-camera data has enabled precise 3D object detection for self-driving vehicles, particularly in ideal conditions with clear weather. Nevertheless, adverse weather conditions such as fog, snow, and rain remain a challenge for existing multimodal methods. These conditions reduce the density of point clouds as a result of laser signal occlusion and attenuation. Additionally, as the distance grows, the point cloud becomes sparser, further challenging object detection tasks. To address these problems, we introduce a point reconstruction network employing equirectangular projection tailored for multimodal 3D object detection. This network incorporates a range-constrained noise filter to remove noise caused by adverse weather and an object-centric point generator designed to flexibly generate points for distant objects. Moreover, we propose a dual 2D auxiliary module to enhance image features and support the point reconstruction. Experimental evaluations conducted on adverse weather datasets demonstrate that the suggested approach surpasses current techniques. The implementation can be accessed at https://github.com/jhyoon964/oprnet.
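For illustration, an equirectangular projection of a LiDAR point cloud maps each point's azimuth and elevation to pixel coordinates of a range image; the image size and field of view below are hypothetical and not the network's actual configuration.

    import numpy as np

    def equirectangular_project(points, width=1024, height=64, fov_up=15.0, fov_down=-15.0):
        # Map 3D points (N, 3) to (row, col) pixels of a range image via azimuth/elevation.
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        azimuth = np.arctan2(y, x)  # [-pi, pi]
        elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
        fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
        col = ((azimuth + np.pi) / (2 * np.pi)) * width
        row = (1.0 - (elevation - fov_down_r) / (fov_up_r - fov_down_r)) * height
        return np.clip(row, 0, height - 1).astype(int), np.clip(col, 0, width - 1).astype(int), r

    pts = np.array([[10.0, 0.0, 1.0], [5.0, 5.0, -0.5]])
    print(equirectangular_project(pts))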
|
|
16:40-16:45, Paper TuDT4.2 | Add to My Program |
Hierarchical Spatiotemporal Fusion for Event-Visible Object Detection |
|
Jhong, Sin-Ye | Tamkang University |
Lin, Hsin-Chun | National Taiwan University of Science and Technology |
Liu, Tzu-Chi | National Taiwan University of Science and Technology |
Hua, Kai-Lung | National Taiwan University of Science and Technology |
Chen, Yung-Yao | National Taiwan University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods, Visual Learning
Abstract: Traditional visible light cameras are prone to performance degradation under varying weather and lighting conditions. To address this challenge, we introduce an event-based camera and propose a novel hierarchical spatiotemporal fusion approach for event-visible object detection. Our method enhances detection performance by integrating data from both event-based and visible light cameras. We have designed three key modules: the Gated Event Accumulation Representation (GEAR) module, the Temporal Feature Selection (TFS) module, and the Adaptive Fusion (AF) module. GEAR and TFS enhance temporal feature fusion at both the image and feature levels, while AF effectively integrates multi-modal features with low computational complexity. Our approach has been trained and validated on the publicly available DSEC-Detection dataset, achieving mAP50 and mAP50-95 scores of 67.2% and 45.6%, respectively, demonstrating superior detection performance and validating the effectiveness of the proposed method.
|
|
16:45-16:50, Paper TuDT4.3 | Add to My Program |
Dark-DENet: A Lightweight Enhancement Network for Low-Light Object Detection |
|
Wu, Xiaoyu | China University of Geoscience |
Shao, Yuxiang | China University of Geoscience |
Jin, Xinyu | China University of Geoscience |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Recognition
Abstract: Deep learning-based object detection methods have shown significant success, particularly in robotic vision tasks like autonomous navigation and object manipulation. However, their performance drops sharply in low-light conditions, challenging robots in poorly lit environments. To address this, we propose Dark-DENet, a lightweight detection-driven enhancement network specifically designed for low-light conditions. Dark-DENet introduces an Improved Global Enhancement Module for low-frequency components to capture multiscale features, and a multi-layer convolutional structure in the Detail Enhancement Module to enhance high-frequency components. Additionally, the Scale-Aware Pooling Fusion Module enriches the semantic information of the high-frequency components. Dark-DENet is a plug-and-play network that can be easily integrated into the backbone of various detectors for joint training. Integrated with YOLOv5 as DD-YOLO, and combined with other models including the YOLO series, RT-DETR, RetinaNet, and Faster R-CNN, experimental results show that Dark-DENet consistently improves detection performance across all models. It effectively enhances latent features under limited runtime, making it a robust solution for robotic vision in low-light environments.
|
|
16:50-16:55, Paper TuDT4.4 | Add to My Program |
CubeDN: Real-Time Drone Detection in 3D Space from Dual mmWave Radar Cubes |
|
Fang, Yuan | University College London |
Shi, Fangzhan | University College London |
Wei, Xijia | University College London |
Chen, Qingchao | Peking University |
Chetty, Kevin | University College London |
Julier, Simon | University College London |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Surveillance Robotic Systems
Abstract: As drone use has become more widespread, there is a critical need to ensure safety and security. A key element of this is robust and accurate drone detection and localization. While cameras and other optical sensors like LiDAR are commonly used for object detection, their performance degrades under adverse lighting and environmental conditions. Therefore, this has generated interest in finding more reliable alternatives, such as millimeter-wave (mmWave) radar. Recent research on mmWave radar object detection has predominantly focused on 2D detection of road users. Although these systems demonstrate excellent performance for 2D problems, they lack the sensing capability to measure elevation, which is essential for 3D drone detection. To address this gap, we propose CubeDN, a single-stage end-to-end radar object detection network specifically designed for flying drones. CubeDN overcomes challenges such as poor elevation resolution by utilizing a dual radar configuration and a novel deep learning pipeline. It simultaneously detects, localizes, and classifies drones of two sizes, achieving decimeter-level tracking accuracy at closer ranges with overall 95% average precision (AP) and 85% average recall (AR). Furthermore, CubeDN completes data processing and inference at 10Hz, making it highly suitable for practical applications.
|
|
16:55-17:00, Paper TuDT4.5 | Add to My Program |
CA-IoU: Central-Gaussian Angle-IoU for Robust Bounding Box Regression |
|
Jang, Junbo | Chung-Ang University |
Kim, Dohoon | Chung-Ang University |
Paik, Joonki | Chung-Ang University |
Keywords: Object Detection, Segmentation and Categorization
Abstract: Accurate object detection depends on the precise refinement of bounding box regression. Recent advancements in bounding box regression have introduced a variety of methodologies aimed at reducing the disparity between predicted and ground truth bounding boxes. The prevailing objective functions for bounding box regression typically encompass three key perspectives: 1) Intersection over Union (IoU), 2) distance between central points, and 3) aspect ratio alignment. Nonetheless, these existing loss functions encounter two primary challenges: slow convergence of the distance term and aspect ratio variation that is irrelevant to bounding box localization. This paper presents two novel loss terms to address these challenges. Firstly, we introduce the concept of the Integral of Central-Gaussian, a novel approach that leverages the cumulative distribution function (CDF) derived from a closed-form Gaussian distribution based on the central points of bounding boxes. Secondly, we introduce an alternative aspect ratio representation by minimizing the angle between two bounding boxes in direct proportion to their IoU. We term this comprehensive loss function "Central-Gaussian Angle-IoU" (CA-IoU), seamlessly incorporating the Integral of Central-Gaussian with angle-based IoU. Extensive experiments on various models and benchmarks for object detection highlight the superior performance of the CA-IoU loss compared to existing bounding box regression methods. The source code and the corresponding trained models will be made available.
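The sketch below combines, in a simplified form, the two ingredients named in the abstract: a Gaussian CDF over the centre distance and an angle term scaled by IoU. The weights and normalizations are hypothetical and do not reproduce the paper's exact loss.

    import math

    def center_gaussian_term(cx_pred, cy_pred, cx_gt, cy_gt, sigma=10.0):
        # CDF of a zero-mean Gaussian evaluated at the centre distance: grows smoothly
        # from 0.5 towards 1 as the predicted centre drifts away from the ground truth.
        d = math.hypot(cx_pred - cx_gt, cy_pred - cy_gt)
        return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))

    def angle_term(w_pred, h_pred, w_gt, h_gt):
        # Angle between the two boxes' diagonal directions as an aspect-ratio mismatch proxy.
        return abs(math.atan2(h_pred, w_pred) - math.atan2(h_gt, w_gt))

    def ca_iou_style_loss(iou, box_pred, box_gt, lam=1.0):
        (cx_p, cy_p, w_p, h_p), (cx_g, cy_g, w_g, h_g) = box_pred, box_gt
        center = center_gaussian_term(cx_p, cy_p, cx_g, cy_g) - 0.5
        angle = iou * angle_term(w_p, h_p, w_g, h_g)  # angle penalty scaled by IoU
        return (1.0 - iou) + center + lam * angle

    print(ca_iou_style_loss(0.6, (50, 52, 20, 40), (48, 50, 22, 36)))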
|
|
17:00-17:05, Paper TuDT4.6 | Add to My Program |
On Onboard LiDAR-Based Flying Object Detection |
|
Vrba, Matous | Faculty of Electrical Engineering, Czech Technical University In |
Walter, Viktor | Czech Technical University |
Pritzl, Vaclav | Czech Technical University in Prague |
Pliska, Michal | Czech Technical University in Prague, Faculty of Electrical Engi |
Baca, Tomas | Czech Technical University in Prague FEE |
Spurny, Vojtech | Ceske Vysoke Uceni Technicke V Praze, FEL |
Hert, Daniel | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Perception and Autonomy, Multi-Robot Systems, Object Detection, Segmentation and Categorization, Autonomous Aerial Interception
Abstract: A new robust and accurate approach for the detection and localization of flying objects for the purpose of highly dynamic aerial interception and agile multi-robot interaction is presented in this paper. The approach is proposed for use on board autonomous aerial vehicles equipped with a 3D LiDAR sensor. It relies on a novel 3D occupancy voxel mapping method for target detection that provides high localization accuracy and robustness with respect to varying environments and appearance changes of the target. In combination with a proposed cluster-based multi-target tracker, sporadic false positives are suppressed, state estimation of the target is provided, and the detection latency is negligible. This makes the system suitable for tasks of agile multi-robot interaction, such as autonomous aerial interception or formation control, where fast, precise, and robust relative localization of other robots is crucial. We evaluate the viability and performance of the system in simulated and real-world experiments, which demonstrate that at a range of 20 m, our system is capable of reliably detecting a micro-scale UAV with almost 100% recall, 0.2 m accuracy, and 20 ms delay.
|
|
TuDT5 Regular Session, 305 |
Add to My Program |
Aerial Robots: Planning and Control |
|
|
Chair: Zheng, Minghui | Texas A&M University |
Co-Chair: Faigl, Jan | Czech Technical University in Prague |
|
16:35-16:40, Paper TuDT5.1 | Add to My Program |
Improving Disturbance Estimation and Suppression Via Learning among Systems with Mismatched Dynamics |
|
Modi, Harsh Jashvantbhai | Texas A&M University |
Chen, Zhu | University at Buffalo |
Liang, Xiao | Texas A&M University |
Zheng, Minghui | Texas A&M University |
Keywords: Aerial Systems: Applications, Motion Control
Abstract: Iterative learning control (ILC) is a method for reducing system tracking or estimation errors over multiple iterations by using information from past iterations. The disturbance observer (DOB) is used to estimate and mitigate disturbances within the system, while the system is being affected by them. ILC enhances system performance by introducing a feedforward signal in each iteration. However, its effectiveness may diminish if the conditions change during the iterations. On the other hand, although DOB effectively mitigates the effects of new disturbances, it cannot entirely eliminate them as it operates reactively. Therefore, neither ILC nor DOB alone can ensure sufficient robustness in challenging scenarios. This study focuses on the simultaneous utilization of ILC and DOB to enhance system robustness. The proposed methodology specifically targets dynamically different linearized systems performing repetitive tasks. The systems share similar forms but differ in dynamics (e.g. sizes, masses, and controllers). Consequently, the design of learning filters must account for these differences in dynamics. To validate the approach, the study establishes a theoretical framework for designing learning filters in conjunction with DOB. The validity of the framework is then confirmed through numerical studies and experimental tests conducted on unmanned aerial vehicles (UAVs). Although UAVs are nonlinear systems, the study employs a linearized controller as they operate in proximity to the hover condition.
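For intuition, the core of ILC is the iteration-domain feedforward update u_{k+1} = Q(u_k + L e_k), which can be combined with a reactive disturbance estimate inside each iteration. The scalar sketch below uses illustrative gains and a trivial plant, not the learning filters designed in the paper.

    import numpy as np

    # Scalar sketch of ILC with a reactive disturbance estimate. The plant y = u + d is a
    # stand-in; L, Q, and the observer gain are illustrative, not the designed filters.
    N, iterations = 50, 8
    L_gain, Q_gain, dob_gain = 0.8, 1.0, 0.5
    reference = np.ones(N)
    disturbance = 0.3 * np.sin(np.linspace(0, 2 * np.pi, N))

    u_ff = np.zeros(N)  # iteration-domain feedforward (ILC)
    for k in range(iterations):
        d_hat = 0.0
        y = np.zeros(N)
        for t in range(N):
            u = u_ff[t] - d_hat                        # disturbance compensation within the iteration
            y[t] = u + disturbance[t]
            d_hat += dob_gain * ((y[t] - u) - d_hat)   # crude reactive disturbance estimate
        error = reference - y
        u_ff = Q_gain * (u_ff + L_gain * error)        # ILC update between iterations
        print(f"iteration {k}: RMS error = {np.sqrt(np.mean(error**2)):.4f}")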
|
|
16:40-16:45, Paper TuDT5.2 | Add to My Program |
Learning Speed Adaptation for Flight in Clutter |
|
Zhao, Guangyu | Zhejiang University |
Wu, Tianyue | Zhejiang University |
Chen, Yeke | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Reinforcement Learning
Abstract: Animals learn to adapt the speed of their movements to their capabilities and the environment they observe. Mobile robots should likewise be able to trade off aggressiveness and safety to accomplish tasks efficiently. The aim of this work is to endow flight vehicles with the ability to adapt their speed in previously unknown, partially observable cluttered environments. We propose a hierarchical learning and planning system that combines well-established model-based trajectory generation with trial-and-error learning of a policy that dynamically configures the speed constraint. Technically, we use online reinforcement learning to obtain the deployable policy. Statistical results in simulation demonstrate the advantages of our method over constant-speed-constraint baselines and an alternative method in terms of flight efficiency and safety. In particular, the policy exhibits perception awareness, which distinguishes it from alternative approaches. By deploying the policy to hardware, we verify that these advantages carry over to the real world.
|
|
16:45-16:50, Paper TuDT5.3 | Add to My Program |
Design, Contact Modeling, and Collision-Inclusive Planning of a Dual-Stiffness Aerial RoboT (DART) |
|
Kumar, Yogesh | Arizona State University |
Patnaik, Karishma | Arizona State University |
Zhang, Wenlong | Arizona State University |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Motion Control
Abstract: Collision-resilient quadrotors have gained significant attention for operating in cluttered environments and leveraging impacts to perform agile maneuvers. However, existing designs are typically single-mode: either safeguarded by propeller guards that prevent deformation or deformable but lacking rigidity, which is crucial for stable flight in open environments. This paper introduces DART, a Dual-stiffness Aerial RoboT, that adapts its post-collision response by either engaging a locking mechanism for a rigid mode or disengaging it for a flexible mode, respectively. Comprehensive characterization tests highlight the significant difference in post-collision responses between its rigid and flexible modes, with the rigid mode offering seven times higher stiffness compared to the flexible mode. To understand and harness the collision dynamics, we propose a novel collision response prediction model based on the linear complementarity system theory. We demonstrate the accuracy of predicting collision forces for both the rigid and flexible modes of DART. Experimental results confirm the accuracy of the model and underscore its potential to advance collision-inclusive trajectory planning in aerial robotics.
|
|
16:50-16:55, Paper TuDT5.4 | Add to My Program |
Learning Quadrotor Control from Visual Features Using Differentiable Simulation |
|
Heeg, Johannes | University of Zürich |
Song, Yunlong | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Aerial Systems: Mechanics and Control, Machine Learning for Robot Control, Reinforcement Learning
Abstract: The sample inefficiency of reinforcement learning (RL) remains a significant challenge in robotics. RL requires large-scale simulation and can still lead to long training times, slowing research and innovation. This issue is particularly pronounced in vision-based control tasks where reliable state estimates are not accessible. Differentiable simulation offers an alternative by enabling gradient back-propagation through the dynamics model, providing low-variance analytical policy gradients and, hence, higher sample efficiency. However, its use for real-world robotic tasks has so far been limited. This work demonstrates the great potential of differentiable simulation for learning quadrotor control. We show that training in differentiable simulation significantly outperforms model-free RL in terms of both sample efficiency and training time, allowing a policy to learn to recover a quadrotor in seconds when vehicle states are provided and in minutes when relying solely on visual features. The key to our success is two-fold. First, the use of a simple surrogate model for gradient computation greatly accelerates training without sacrificing control performance. Second, combining state representation learning with policy learning enhances convergence speed in tasks where only visual features are observable. These findings highlight the potential of differentiable simulation for real-world robotics and offer a compelling alternative to conventional RL approaches.
|
|
16:55-17:00, Paper TuDT5.5 | Add to My Program |
Real-Time Planning of Minimum-Time Trajectories for Agile UAV Flight |
|
Teissing, Krystof | Czech Technical University in Prague |
Novosad, Matej | Czech Technical University in Prague |
Penicka, Robert | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Keywords: Aerial Systems: Applications, Motion and Path Planning
Abstract: We address the challenge of real-time planning of minimum-time trajectories over multiple waypoints onboard multirotor UAVs. Previous works demonstrated that achieving a truly time-optimal trajectory is computationally too demanding to enable frequent replanning during agile flight, especially on less powerful flight computers. Our approach overcomes this stumbling block by utilizing a point-mass model with a novel iterative thrust decomposition algorithm, enabling the UAV to use all of its collective thrust, something previous point-mass approaches could not achieve. The approach enables the integration of gravity and drag modeling, significantly reducing tracking errors in high-speed trajectories, which is proven through an ablation study. When combined with a new multi-waypoint optimization algorithm, which uses a gradient-based method to converge to optimal velocities at the waypoints, the proposed method generates minimum-time multi-waypoint trajectories within milliseconds. The proposed approach, which we provide as an open-source package, is validated both in simulation and in the real world using Nonlinear Model Predictive Control. With accelerations of up to 3.5 g and speeds over 100 km/h, trajectories generated by the proposed method yield similar or even smaller tracking errors than the trajectories generated for a full multirotor model.
|
|
17:00-17:05, Paper TuDT5.6 | Add to My Program |
Variable Time-Step MPC for Agile Multi-Rotor UAV Interception of Dynamic Targets |
|
Ghotavadekar, Atharva | BITS Pilani K.K.Birla Goa Campus |
Nekovar, Frantisek | Czech Technical University in Prague |
Saska, Martin | Czech Technical University in Prague |
Faigl, Jan | Czech Technical University in Prague |
Keywords: Aerial Systems: Applications, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: Agile trajectory planning can improve the efficiency of multi-rotor Uncrewed Aerial Vehicles (UAVs) in scenarios with combined task-oriented and kinematic trajectory planning, such as monitoring spatio-temporal phenomena or intercepting dynamic targets. Agile planning using existing non-linear model predictive control methods is limited by the number of planning steps as it becomes increasingly computationally demanding. This reduces the prediction horizon length, which leads to a decrease in solution quality. Besides, the fixed time-step length limits the utilization of the available UAV dynamics in the target neighbourhood. In this paper, we propose to address these limitations by introducing variable time-steps and coupling them with the prediction horizon length. A simplified point-mass motion primitive is used to leverage the differential flatness of quadrotor dynamics and the generation of feasible trajectories in the flat output space. Based on evaluation results and experimentally validated deployment, the proposed method increases the solution quality by enabling planning for long flight segments but allowing tightly sampled maneuvering.
|
|
TuDT6 Regular Session, 307 |
Add to My Program |
Perception for Medical Robotics |
|
|
Chair: Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Co-Chair: Valdastri, Pietro | University of Leeds |
|
16:35-16:40, Paper TuDT6.1 | Add to My Program |
REMOTE: Real-Time Ego-Motion Tracking for Various Endoscopes Via Multimodal Visual Feature Learning |
|
Shao, Liangjing | Fudan University |
Chen, Benshuang | Fudan University |
Zhao, Shuting | Fudan University |
Chen, Xinrong | Fudan University |
Keywords: Computer Vision for Medical Robotics, Deep Learning for Visual Perception, Visual Tracking
Abstract: Real-time ego-motion tracking for endoscopes is a significant task for efficient navigation and robotic automation of endoscopy. In this paper, a novel framework is proposed to perform real-time ego-motion tracking for endoscopes. Firstly, a multi-modal visual feature learning network is proposed to perform relative pose prediction, in which the motion feature from the optical flow, the scene features, and the joint feature from two adjacent observations are all extracted for the prediction. Because the channel dimension of the concatenated image carries more correlation information, a novel feature extractor is designed based on an attention mechanism to integrate multi-dimensional information from the concatenation of two continuous frames. To extract a more complete feature representation from the fused features, a novel pose decoder is proposed to predict the pose transformation from the concatenated feature map at the end of the framework. Finally, the absolute pose of the endoscope is calculated from the relative poses. Experiments are conducted on three datasets covering various endoscopic scenes, and the results demonstrate that the proposed method outperforms state-of-the-art methods. Besides, the inference speed of the proposed method is over 30 frames per second, which meets the real-time requirement. The project page is here: https://remote-bmxs.netlify.app.
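The final step described above, turning relative poses into an absolute trajectory, is a chain of homogeneous-transform compositions; the sketch below uses hypothetical frame-to-frame motions and is not the paper's network output.

    import numpy as np

    def compose_trajectory(T_init, relative_poses):
        # Chain relative 4x4 homogeneous transforms into absolute endoscope poses:
        # T_k = T_{k-1} @ dT_k, starting from a known initial pose.
        poses = [T_init]
        for dT in relative_poses:
            poses.append(poses[-1] @ dT)
        return poses

    def make_T(yaw, t):
        # Build a simple planar-rotation transform with translation t (illustrative only).
        c, s = np.cos(yaw), np.sin(yaw)
        T = np.eye(4)
        T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
        T[:3, 3] = t
        return T

    # Two hypothetical frame-to-frame motions predicted by a relative-pose network.
    rel = [make_T(0.05, [0.002, 0.0, 0.01]), make_T(0.03, [0.001, 0.0, 0.008])]
    for T in compose_trajectory(np.eye(4), rel):
        print(T[:3, 3])  # absolute positions along the trajectory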
|
|
16:40-16:45, Paper TuDT6.2 | Add to My Program |
Intraoperative Trocar-Based Eyeball Rotation Estimation Using Only 2D Microscope Images |
|
Yang, Junjie | TUM |
Inagaki, Satoshi | NSK.Ltd |
Zhao, Zhihao | Technische Universität München |
Zapp, Daniel | Klinikum Rechts Der Isar Der TU München |
Maier, Mathias | Klinikum Rechts Der Isar Der TU München |
Issa, Peter Charbel | Klinikum Rechts Der Isar, Technical University of Munich |
Huang, Kai | Sun Yat-Sen University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Computer Vision for Medical Robotics, Visual Tracking, Recognition
Abstract: In ophthalmic surgery, surgeons or robots manipulate a light probe and an instrument around two separate trocars following sclerotomy to achieve orbital control for eyeball pose adjustment and subsequent surgical tasks referenced to microscope frames. However, current methods face significant challenges in directly extracting the eyeball pose from real-time microscope frames due to the limited microscope perspective and the darkened operating room (OR). This paper decomposes eyeball rotations along only the x and y axes. A method for calculating eyeball poses using eyeball geometry and microscopic trocar positions is then presented. The method is tested in simulation and on a phantom system, currently achieving an error within [-2.0, 2.8] degrees, and provides assistive intraoperative eyeball status in the dark OR, together with an extended discussion of the method.
|
|
16:45-16:50, Paper TuDT6.3 | Add to My Program |
Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots |
|
Wu, Renkai | Shanghai University |
Wang, Xianjin | Ruijin Hospital, Shanghai Jiaotong University School of Medicine |
Liang, Pengchen | Ruijin Hospital, Shanghai Jiaotong University School of Medicine |
Zhang, Zhenyu | Nanjing University |
Chang, Qing | Ruijin Hospital, Shanghai Jiao Tong University School of Medicin |
Tang, Hao | Peking University |
Keywords: Computer Vision for Medical Robotics, Data Sets for Robotic Vision, Robotics and Automation in Life Sciences
Abstract: Robot-assisted surgery has profoundly influenced current forms of minimally invasive surgery. However, transurethral urological surgical robots need to work in a liquid environment. This causes vaporization of the liquid when shearing and heating are performed, resulting in bubble atomization that affects the visual perception of the robot. This can require repeated pauses in the surgical procedure, which prolongs the surgery. To address the atomization characteristics of liquids under urological surgical robotic vision, we propose an unsupervised zero-shot dehazing method (RSF-Dehaze). Specifically, the proposed Region Similarity Filling Module (RSFM) of RSF-Dehaze significantly improves the recovery of blurred tissue regions. In addition, we organize and propose a dehazing dataset for robotic vision in urological surgery (the USRobot-Dehaze dataset). In particular, this dataset contains the three most common urological surgical robot operation scenarios. To the best of our knowledge, we are the first to organize and release a publicly available dehazing dataset for urological surgical robot vision. Extensive comparative experiments against 20 classical and state-of-the-art dehazing and image restoration algorithms demonstrate the effectiveness of RSF-Dehaze in the three urological surgical robot operation scenarios. The proposed source code and dataset are available at https://github.com/wurenkai/RSF-Dehaze.
|
|
16:50-16:55, Paper TuDT6.4 | Add to My Program |
Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images |
|
Wu, Junyang | Shanghai Jiao Tong University |
Gu, Yun | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Laparoscopy, Surgical Robotics: Planning
Abstract: Robot-assisted endoluminal intervention is an emerging technique for both benign and malignant luminal lesions. With vision-based navigation, when combined with pre-operative imaging data as priors, it is possible to recover the position and pose of the endoscope without the need for additional sensors. In practice, however, aligning the pre-operative and intra-operative domains is complicated by significant texture differences. Although methods such as style transfer can be used to address this issue, they require large datasets from both the source and target domains and prolonged training times. This paper proposes an efficient domain transfer method based on stylized Gaussian splatting that requires only a few real images (10 images) and a very short training time. Specifically, the transfer process includes two phases. In the first phase, 3D models reconstructed from CT scans are represented as differential Gaussian point clouds. In the second phase, only color-appearance-related parameters are optimized to transfer the style and preserve the visual content. A novel structure consistency loss is applied to latent features and depth levels to enhance the stability of the transferred images. Detailed validation was performed to demonstrate the performance advantages of the proposed method compared to the current state-of-the-art, highlighting its potential for intra-operative surgical navigation.
|
|
16:55-17:00, Paper TuDT6.5 | Add to My Program |
Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-Driven Surface Normal-Aware Tracking and Mapping |
|
Huang, Yiming | The Chinese University of Hong Kong |
Cui, Beilei | The Chinese University of Hong Kong |
Bai, Long | The Chinese University of Hong Kong |
Chen, Zhen | Centre for Artificial Intelligence and Robotics (CAIR), Hong Kon |
Wu, Jinlin | Institute of Automation, Chinese Academy of Sciences |
Li, Zhen | Qilu Hospital of Shandong University |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: SLAM, Surgical Robotics: Laparoscopy, Computer Vision for Medical Robotics
Abstract: Simultaneous Localization and Mapping (SLAM) is essential for precise surgical interventions and robotic tasks in minimally invasive procedures. While recent advancements in 3D Gaussian Splatting (3DGS) have improved SLAM with high-quality novel view synthesis and fast rendering, these systems struggle with accurate depth and surface reconstruction due to multi-view inconsistencies. Simply incorporating SLAM and 3DGS leads to mismatches between the reconstructed frames. In this work, we present Endo-2DTAM, a real-time endoscopic SLAM system with 2D Gaussian Splatting (2DGS) to address these challenges. Endo-2DTAM incorporates a surface normal-aware pipeline, which consists of tracking, mapping, and bundle adjustment modules for geometrically accurate reconstruction. Our robust tracking module combines point-to-point and point-to-plane distance metrics, while the mapping module utilizes normal consistency and depth distortion to enhance surface reconstruction quality. We also introduce a pose-consistent strategy for efficient and geometrically coherent keyframe sampling. Extensive experiments on public endoscopic datasets demonstrate that Endo-2DTAM achieves an RMSE of 1.87±0.63 mm for depth reconstruction of surgical scenes while maintaining computationally efficient tracking, high-quality visual appearance, and real-time rendering. Our code will be released at github.com/lastbasket/Endo-2DTAM.
|
|
17:00-17:05, Paper TuDT6.6 | Add to My Program |
HFUS-NeRF: Hybrid Representation for Fast Ultrasound Reconstruction in Robotic Ultrasound System |
|
Zhang, Shuai | Hefei University of Technology |
Zhao, Cancan | Hefei University of Technology |
Ouyang, Bo | Hefei University of Technology |
Keywords: Health Care Management, Computer Vision for Medical Robotics, AI-Enabled Robotics
Abstract: Telemedicine is promising in digital healthcare management, for example in supporting the response to the coronavirus disease 2019 (COVID-19) pandemic. Three-dimensional (3D) ultrasound reconstruction and novel view image synthesis, which can assist in diagnosis and reexamination, have significant potential in tele-ultrasound, especially when integrated with robotic ultrasound systems (RUSS). The Neural Radiance Field (NeRF), an impressive reconstruction method, requires long training times, limiting its practicality in ultrasound. Although NeRF variants achieve faster optimization, their performance remains confined to natural scene reconstruction. To address this limitation, we propose HFUS-NeRF, a hybrid representation method designed for fast and accurate ultrasound reconstruction. HFUS-NeRF integrates multi-resolution hash-grid and tri-plane representations to represent each sampling point of the ultrasonic wave. A unified model for sampling points from different ultrasonic probes is presented to simulate the wave's propagation through tissues, and the final ultrasound image is rendered using volume rendering. Compared with NeRF-based ultrasound reconstruction, both the hash-grid and tri-plane resolutions can be scaled up more efficiently, improving reconstruction speed. Experimental results demonstrate that HFUS-NeRF enhances reconstruction quality while significantly reducing reconstruction time to mere minutes. Furthermore, we validated the adaptability of HFUS-NeRF by reconstructing from images acquired with different types of ultrasound probes, and real-world experiments confirmed its feasibility and transferability, enabling fast ultrasound reconstruction on human subjects.
|
|
TuDT7 Regular Session, 309 |
Add to My Program |
Marine Robotics 2 |
|
|
Chair: Sukhatme, Gaurav | University of Southern California |
Co-Chair: Englot, Brendan | Stevens Institute of Technology |
|
16:35-16:40, Paper TuDT7.1 | Add to My Program |
Mission-Oriented Gaussian Process Motion Planning for UUVs Over Complex Seafloor Terrain and Current Flows |
|
Huang, Yewei | Stevens Institute of Technology |
Lin, Xi | Stevens Institute of Technology |
Hernandez-Rocha, Mariana | Stevens Institute of Technology |
Narain, Sanjai | Peraton Labs |
Pochiraju, Kishore | Stevens Institute of Technology |
Englot, Brendan | Stevens Institute of Technology |
Keywords: Marine Robotics, Motion and Path Planning, Constrained Motion Planning
Abstract: We present a novel motion planning framework for unmanned underwater vehicles (UUVs) - the first framework that applies Gaussian process motion planning to solve a 3D path planning problem for a 6-DoF robot in underwater environments. We address missions requiring UUVs to remain in close proximity to seafloor terrain, which must be achieved alongside collision avoidance. Our framework also considers the influence of current flows as part of the cost function, allowing for more accurate planning. To evaluate the performance of our proposed framework, we compare it with the widely used RRT* and STOMP algorithms over a range of underwater environments. Our experimental results demonstrate the stability and time efficiency of our framework.
|
|
16:40-16:45, Paper TuDT7.2 | Add to My Program |
Word2Wave: Language Driven Mission Programming for Efficient Subsea Deployments of Marine Robots |
|
Chen, Ruo | University of Florida |
Blow, David | University of Florida |
Abdullah, Adnan | University of Florida |
Islam, Md Jahidul | University of Florida |
Keywords: Marine Robotics, Field Robots
Abstract: This paper explores the design and development of a language-based interface for dynamic mission programming of autonomous underwater vehicles (AUVs). The proposed "Word2Wave" (W2W) framework enables interactive programming and parameter configuration of AUVs for remote subsea missions. The W2W framework includes: (i) a set of novel language rules and command structures for efficient language-to-mission mapping; (ii) a GPT-based prompt engineering module for training data generation; (iii) a small language model (SLM)-based sequence-to-sequence learning pipeline for mission command generation from human speech or text; and (iv) a novel user interface for 2D mission map visualization and human-machine interfacing. The proposed learning pipeline adapts an SLM named T5-Small that can learn language-to-mission mapping from processed language data effectively, providing robust and efficient performance. In addition to a benchmark evaluation with state-of-the-art, we conduct a user interaction study to demonstrate the effectiveness of W2W over commercial AUV programming interfaces. Across participants, W2W-based programming required less than 10% time for mission programming compared to traditional interfaces; it is deemed to be a simpler and more natural paradigm for subsea mission programming with a usability score of 76.25. W2W opens up promising future research opportunities on hands-free AUV mission programming for efficient subsea deployments.
|
|
16:45-16:50, Paper TuDT7.3 | Add to My Program |
Three-Dimensional Obstacle Avoidance and Path Planning for Unmanned Underwater Vehicles Using Elastic Bands (I) |
|
Amundsen, Herman Biørn | NTNU |
Føre, Martin | NTNU |
Ohrem, Sveinung Johan | SINTEF Ocean AS |
Haugaløkken, Bent | SINTEF Ocean |
Kelasidi, Eleni | NTNU |
Keywords: Collision Avoidance, Path Planning for Multiple Mobile Robots or Agents, Field Robots
Abstract: Unmanned underwater vehicles (UUVs) have become indispensable tools for inspection, maintenance, and repair (IMR) operations in the underwater domain. The major focus and novelty of this work is collision-free autonomous navigation of UUVs in dynamically changing environments. Path planning and obstacle avoidance are fundamental concepts for enabling autonomy for mobile robots. This remains a challenge, particularly for underwater vehicles operating in complex and dynamically changing environments. The elastic band method has been a suggested method for planning collision-free paths and is based on modeling the path as a dynamic system that will continuously be reshaped based on its surroundings. This article proposes adaptations to the method for underwater applications and presents a thorough investigation of the method for 3-D path planning and obstacle avoidance, both through simulations and extensive lab and field experiments. In the experiments, the method was used by a UUV operating autonomously at an industrial-scale fish farm and demonstrated that the method was able to successfully guide the vehicle through a challenging and constantly changing environment. The proposed work has broad applications for field deployment of marine robots in environments that require the vehicle to quickly react to changes in its surroundings.
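For illustration, one deformation step of an elastic band balances an internal tension force with an external obstacle repulsion; the 2D sketch below uses a single circular obstacle and hypothetical gains, not the 3D underwater adaptation proposed in the paper.

    import numpy as np

    def elastic_band_step(path, obstacle, safe_dist, k_int=0.5, k_ext=1.0, step=0.2):
        # One deformation step: interior nodes feel a tension force pulling them toward the
        # midpoint of their neighbours and a repulsive force pushing them away from obstacles.
        new_path = path.copy()
        for i in range(1, len(path) - 1):
            tension = k_int * (0.5 * (path[i - 1] + path[i + 1]) - path[i])
            diff = path[i] - obstacle
            dist = np.linalg.norm(diff)
            repulsion = np.zeros(2)
            if dist < safe_dist:
                repulsion = k_ext * (safe_dist - dist) * diff / max(dist, 1e-9)
            new_path[i] = path[i] + step * (tension + repulsion)
        return new_path

    path = np.linspace([0.0, 0.0], [10.0, 0.0], 11)  # straight initial band
    obstacle = np.array([5.0, 0.2])
    for _ in range(30):
        path = elastic_band_step(path, obstacle, safe_dist=1.5)
    print(path[5])  # the middle node has been pushed away from the obstacle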
|
|
16:50-16:55, Paper TuDT7.4 | Add to My Program |
A Data-Driven Velocity Estimator for Autonomous Underwater Vehicles Experiencing Unmeasurable Flow and Wave Disturbance |
|
Cai, Jinzhi | Hong Kong University of Science and Technology |
Mayberry, Scott | Georgia Institute of Technology |
Yin, Huan | Hong Kong University of Science and Technology |
Zhang, Fumin | Hong Kong University of Science and Technology |
Keywords: Marine Robotics, AI-Based Methods, Software-Hardware Integration for Robot Systems
Abstract: Autonomous Underwater Vehicles (AUVs) encounter significant challenges in confined spaces like ports and testing tanks, where vehicle-environment interactions, such as wave reflections and unsteady flows, introduce complex, time-varying disturbances. Model-based state estimation methods can struggle to handle these dynamics, leading to localization errors. To address this, we propose a data-driven velocity estimation approach using Inertial Measurement Units (IMUs) and a Gated Recurrent Unit (GRU) neural network, capturing temporal dependencies and rejecting external disturbances. This velocity estimator is then integrated into a sensor fusion framework using an asynchronous Kalman filter to improve localization by fusing on-board and off-board sensor information. Experimental validation on miniature AUVs demonstrates the effectiveness of the proposed method in enhancing accuracy for velocity and position estimation in environments with significant disturbances due to interactions between the vehicle and the environment.
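Fusing a learned velocity and its predicted uncertainty into a Kalman filter amounts to a standard measurement update in which the network's variance enters as measurement noise; the small sketch below uses hypothetical numbers and a two-state model, not the paper's asynchronous filter.

    import numpy as np

    def kf_velocity_update(x, P, v_meas, v_var):
        # Measurement update for a [position, velocity] state when a learned velocity
        # estimate arrives with its predicted variance. H selects the velocity component.
        H = np.array([[0.0, 1.0]])
        R = np.array([[v_var]])
        y = np.array([v_meas]) - H @ x           # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x_new = x + (K @ y).flatten()
        P_new = (np.eye(2) - K @ H) @ P
        return x_new, P_new

    x = np.array([1.2, 0.4])                     # position, velocity
    P = np.diag([0.5, 0.3])
    print(kf_velocity_update(x, P, v_meas=0.55, v_var=0.02))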
|
|
16:55-17:00, Paper TuDT7.5 | Add to My Program |
Dynamic End Effector Trajectory Tracking for Small-Scale Underwater Vehicle-Manipulator Systems (UVMS): Modeling, Control, and Experimental Validation |
|
Trekel, Niklas | University of Bonn |
Bauschmann, Nathalie | Hamburg University of Technology |
Alff, Thies Lennart | Technische Universität Hamburg |
Duecker, Daniel Andre | Technical University of Munich (TUM) |
Haddadin, Sami | Mohamed Bin Zayed University of Artificial Intelligence |
Seifried, Robert | Hamburg University of Technology |
Keywords: Marine Robotics, Field Robots
Abstract: With ongoing miniaturization, lightweight commercial underwater vehicle-manipulator systems (UVMSs) have recently emerged that massively lower the entry barrier to underwater manipulation. Within this research field, dynamic and accurate end effector trajectory tracking is a crucial first step towards developing autonomous capabilities. In this context, coupling effects between the manipulator and vehicle dynamics are expected to pose a considerable challenge. However, UVMS control strategies analyzed in detailed experimental studies are particularly rare. We present a holistic approach based on task-priority control that we describe and discuss from modeling to extensive experimental studies, which are crucial for the notoriously hard-to-simulate underwater domain. We demonstrate this framework on the widely used platform of a BlueROV2 and an Alpha 5 manipulator. The end effector trajectory tracking is shown to be highly accurate, with a median position error of < 4 cm. Moreover, our experimental findings on the consideration of dynamic coupling within UVMS control motivate further research. The code is available at https://github.com/HippoCampusRobotics/uvms. A video of the results is available at https://youtu.be/IDMlI5KqlVI.
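As background on task-priority control, a secondary task is projected into the null space of the primary task's Jacobian so it cannot disturb the primary objective; the sketch below uses hypothetical Jacobians rather than the UVMS-specific kinematics of the paper.

    import numpy as np

    def task_priority_velocities(J1, dx1, J2, dx2):
        # Primary task tracked in a least-squares sense; the secondary task acts only in
        # the null space of the primary Jacobian so it cannot disturb the primary task.
        J1_pinv = np.linalg.pinv(J1)
        N1 = np.eye(J1.shape[1]) - J1_pinv @ J1  # null-space projector
        q_dot = J1_pinv @ dx1 + N1 @ np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ J1_pinv @ dx1)
        return q_dot

    J1 = np.array([[1.0, 0.0, 0.5, 0.2]])        # hypothetical end-effector task row
    J2 = np.array([[0.0, 1.0, 0.0, 1.0]])        # hypothetical posture/vehicle task row
    q_dot = task_priority_velocities(J1, np.array([0.1]), J2, np.array([0.05]))
    print(q_dot, J1 @ q_dot)                     # primary task velocity is matched exactly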
|
|
17:00-17:05, Paper TuDT7.6 | Add to My Program |
DeepVL: Dynamics and Inertial Measurements-Based Deep Velocity Learning for Underwater Odometry |
|
Singh, Mohit | NTNU: Norwegian University of Science and Technology |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Marine Robotics, Visual-Inertial SLAM
Abstract: This paper presents a learned model to predict the robot-centric velocity of an underwater robot through dynamics-aware proprioception. The method exploits a recurrent neural network using as inputs inertial cues, motor commands, and battery voltage readings alongside the hidden state of the previous time-step to output robust velocity estimates and their associated uncertainty. An ensemble of networks is utilized to enhance the velocity and uncertainty predictions. Fusing the network's outputs into an Extended Kalman Filter, alongside inertial predictions and barometer updates, the method enables long-term underwater odometry without further exteroception. Furthermore, when integrated into visual-inertial odometry, the method assists in enhanced estimation resilience when dealing with an order of magnitude fewer total features tracked (as few as 1) as compared to conventional visual-inertial systems. Tested onboard an underwater robot deployed both in a laboratory pool and the Trondheim Fjord, the method takes less than 5ms for inference either on the CPU or the GPU of an NVIDIA Orin AGX and demonstrates less than 4% relative position error in novel trajectories during complete visual blackout, and approximately 2% relative error when a maximum of 2 visual features from a monocular camera are available.
|
|
TuDT8 Regular Session, 311 |
Add to My Program |
Representation Learning 1 |
|
|
Chair: Katz, Sydney | Stanford University |
Co-Chair: Pinto, Lerrel | New York University |
|
16:35-16:40, Paper TuDT8.1 | Add to My Program |
A Frequency-Based Attention Neural Network and Subject-Adaptive Transfer Learning for sEMG Hand Gesture Classification |
|
Nguyen, Phuc Thanh-Thien | National Taiwan University of Science and Technology |
Su, Shun-Feng | National Taiwan University of Science and Technology |
Kuo, Chung-Hsien | National Taiwan University |
Keywords: Gesture, Posture and Facial Expressions, Transfer Learning
Abstract: This study introduces a novel approach for real-time hand gesture classification through the integration of a Frequency-based Attention Neural Network (FANN) with subject-adaptive transfer learning, specifically tailored for surface electromyography (sEMG) data. By utilizing the Fourier transform, the proposed methodology leverages the inherent frequency characteristics of sEMG signals to enhance the discriminative features for accurate gesture recognition. Additionally, a subject-adaptive transfer learning strategy is employed to improve model generalization across different individuals. The combination of these techniques results in an effective and versatile system for sEMG-based hand gesture classification, demonstrating promising performance in adapting to individual variability and improving classification accuracy. The proposed method's performance is evaluated and compared with established approaches using the publicly available NinaPro DB5 dataset. Notably, the proposed simple model, coupled with frequency-based attention modules, achieves an accuracy of 89.56% with a quick prediction time of 5 ms, showcasing its potential for dexterous control of robots and bionic hands. The findings of this research contribute to the advancement of gesture recognition systems, particularly in the domains of human-computer interaction and prosthetic control.
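For illustration, frequency-domain descriptors of an sEMG window can be obtained with an FFT before any attention weighting; the sampling rate and window length below are hypothetical and are not the NinaPro DB5 settings.

    import numpy as np

    def frequency_features(window, fs=200.0):
        # Magnitude spectrum of one sEMG window plus two classic descriptors:
        # mean frequency and median frequency. fs and window length are illustrative.
        spectrum = np.abs(np.fft.rfft(window))
        freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
        power = spectrum ** 2
        mean_freq = np.sum(freqs * power) / np.sum(power)
        cumulative = np.cumsum(power)
        median_freq = freqs[np.searchsorted(cumulative, 0.5 * cumulative[-1])]
        return spectrum, mean_freq, median_freq

    rng = np.random.default_rng(0)
    window = rng.standard_normal(256)            # stand-in for a 256-sample sEMG window
    _, mf, mdf = frequency_features(window)
    print(f"mean frequency {mf:.1f} Hz, median frequency {mdf:.1f} Hz")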
|
|
16:40-16:45, Paper TuDT8.2 | Add to My Program |
P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies |
|
Levy, Mara | University of Maryland, College Park |
Haldar, Siddhant | New York University |
Pinto, Lerrel | New York University |
Shrivastava, Abhinav | University of Maryland, College Park |
Keywords: Deep Learning Methods, Representation Learning, Visual Learning
Abstract: Developing generalizable robot policies that can robustly handle varied environmental conditions and object instances remains a fundamental challenge in robot learning. While considerable efforts have focused on collecting large robot datasets and developing policy architectures to learn from such data, naively learning from visual inputs often results in brittle policies that fail to transfer beyond the training data. This work presents Prescriptive Point Priors for Policies or P3-PO, a novel framework that constructs a unique state representation of the environment leveraging recent advances in computer vision and robot learning to achieve improved out-of-distribution generalization for robot manipulation. This representation is obtained through two steps. First, a human annotator prescribes a set of semantically meaningful points on a single demonstration frame. These points are then propagated through the dataset using off-the-shelf vision models. The derived points serve as an input to state-of-the-art policy architectures for policy learning. Our experiments across four real-world tasks demonstrate an overall 43% absolute improvement over prior methods when evaluated in identical settings as training. Further, P3-PO exhibits 58% and 80% gains across tasks for new object instances and more cluttered environments respectively. Videos illustrating the robot's performance are best viewed at point-priors.github.io.
|
|
16:45-16:50, Paper TuDT8.3 | Add to My Program |
APA-BI: Adaptive Partition Aggregation and Bidirectional Integration for UAV-View Geo-Localization |
|
Zhang, Xichen | Northeastern University |
Zhao, Shuying | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Ge, Fawei | Northeastern University |
Zhao, Bin | Northeastern University |
Zhang, Yizhong | Northeastern University |
Keywords: Representation Learning, Localization, Deep Learning Methods
Abstract: The task of UAV-view geo-localization is to match a query image with database images to estimate the current geographic location of the query image. This is particularly useful in environments where GPS is not available or when the device fails. Although deep learning methods have made substantial progress in UAV-view geo-localization, they still face challenges in improving the distinguishability of features. For instance, some feature aggregation methods do not consider semantic integrity, and robust elements in the image are not given enough attention. This paper proposes a UAV-view geo-localization method (APA-BI) to tackle the above issues. Specifically, we propose an adaptive partition aggregation method to ensure feature integrity at the semantic level by increasing the receptive field of the classifier module. At the same time, we design a bidirectional integration module to further enhance feature distinguishability by extracting robust tubular topological structures from images. Experimental results on public datasets demonstrate that APA-BI achieves impressive retrieval accuracy and outperforms most state-of-the-art methods. Moreover, the test results of APA-BI in real-world scenarios also show excellent performance.
|
|
16:50-16:55, Paper TuDT8.4 | Add to My Program |
Robo-MUTUAL: Robotic Multimodal Task Specification Via Unimodal Learning |
|
Li, Jianxiong | Tsinghua University |
Wang, Zhihao | Peking University |
Zheng, Jinliang | Tsinghua University |
Zhou, Xiaoai | University of Toronto |
Wang, Guanming | University College London |
Song, Guanglu | SenseTime Research |
Liu, Yu | SenseTime Group Limited |
Liu, Jingjing | Institute for AI Industry Research (AIR), Tsinghua University |
Zhang, Ya-Qin | Institute for AI Industry Research(AIR), Tsinghua University |
Yu, Junzhi | Peking University |
Zhan, Xianyuan | Tsinghua University |
Keywords: Representation Learning, Imitation Learning
Abstract: Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions. Directly annotating multimodal instructions for model training proves impractical, due to the sparsity of paired multimodal data. In this study, we demonstrate that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications. First, we endow the robot with strong Cross-modality Alignment capabilities, by pretraining a robotic multimodal encoder using extensive out-of-domain data. Then, we employ two Collapse and Corrupt operations to further bridge the remaining modality gap in the learned multimodal representation. This approach projects different modalities of the same task goal into interchangeable representations, thus enabling accurate robotic operations within a well-aligned multimodal latent space. Evaluation across more than 130 tasks and 4000 evaluations on both the simulated LIBERO benchmark and real robot platforms showcases the superior capabilities of our proposed framework, demonstrating significant potential in overcoming data constraints in robotic learning. Website: zh1hao.wang/Robo MUTUAL
|
|
16:55-17:00, Paper TuDT8.5 | Add to My Program |
Model Free Method of Screening Training Data for Adversarial Datapoints through Local Lipschitz Quotient Analysis |
|
Kamienski, Emily | Massachusetts Institute of Technology |
Asada, Harry | MIT |
Keywords: Deep Learning Methods, Data Sets for Robot Learning, Lipschitz Quotient, Data Preparation, Adversarial Data
Abstract: It is often challenging to pick suitable data features for learning problems. Sometimes certain regions of the data are harder to learn because they are not well characterized by the data features. The challenge is amplified when resources for sensing and computation are limited and time-critical yet reliable decisions must be made. For example, a robotic system for preventing falls of elderly people needs a real-time fall predictor, with low false positive and false negative rates, using a simple wearable sensor to activate a fall prevention mechanism. Here we present a methodology for assessing the learnability of data based on the Lipschitz quotient. We develop a procedure for determining which regions of the dataset contain adversarial data points, input data that look similar but belong to different target classes. Regardless of the learning model, it will be hard to learn such data. We then present a method for determining which additional feature(s) are most effective in improving the predictability of each of these regions. This is a model-independent data analysis that can be executed before constructing a prediction model through machine learning or other techniques. We demonstrate this method on two synthetic datasets and a dataset of human falls, which uses inertial measurement unit signals. For the fall dataset, we were able to identify two groups of adversarial data points and improve the predictability of each group over the baseline dataset, as assessed by the Lipschitz quotient, by using two different sets of features. This work offers a valuable tool for assessing data learnability that can be applied not only to fall prediction problems, but also to other robotics applications that learn from data.
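The screening criterion is the local Lipschitz quotient, the ratio of target distance to input distance between nearby samples: points that look alike but carry different targets produce large quotients and are hard to learn regardless of the model. Below is a minimal numpy sketch of this scoring step; the neighbourhood size and percentile cut-off are arbitrary illustration choices, not values from the paper.

```python
import numpy as np

def local_lipschitz_quotients(X, y, k=5):
    """For each point, return the largest quotient |y_i - y_j| / ||x_i - x_j||
    over its k nearest neighbours (a rough, model-free learnability score)."""
    n = X.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the point itself
        nbrs = np.argsort(d)[:k]
        scores[i] = np.max(np.abs(y[nbrs] - y[i]) / d[nbrs])
    return scores

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] > 0.5).astype(float)              # toy binary target

scores = local_lipschitz_quotients(X, y)
adversarial = np.where(scores > np.percentile(scores, 95))[0]  # arbitrary cut-off
print(f"{len(adversarial)} candidate adversarial points")
```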
|
|
17:00-17:05, Paper TuDT8.6 | Add to My Program |
3D Space Perception Via Disparity Learning Using Stereo Images and an Attention Mechanism: Real-Time Grasping Motion Generation for Transparent Objects |
|
Cai, Xianbo | Waseda University |
Ito, Hiroshi | Hitachi, Ltd. / Waseda University |
Hiruma, Hyogo | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Representation Learning, Perception-Action Coupling
Abstract: Object grasping in 3D space is crucial for robotic applications. Such tasks are performed by utilizing depth map data acquired from RGB-D images or 3D point cloud data. However, these methods struggle when dealing with transparent objects, as transparency limits sensor performance when predicting depth maps. Additionally, the grasping motions are predicted without incorporating the relationship between depth data and motion information, which limits the motion’s flexibility. In this paper, to address these problems, we propose an end-to-end motion generation model using stereo RGB images, a deep-learning model that incorporates image and motion information. Furthermore, visual attention mechanisms are used for extracting task-related attention points, which is essential for building spatial cognition constructs. Real-robot experimental results confirmed that the proposed model is able to grasp transparent objects under various situations, including unseen positions, heights, and backgrounds. It was also found that the model self-organized a spatial cognition representation within its hidden states, suggesting that the integrated learning of robot motion and stable spatial attention points is important for spatial perception. Such explicit feature representations cannot be obtained via learning motion alone.
|
|
TuDT9 Regular Session, 312 |
Add to My Program |
Motion Planning 4 |
|
|
Chair: Uwacu, Diane | Texas A&M University |
Co-Chair: Lee, Dongjun | Seoul National University |
|
16:35-16:40, Paper TuDT9.1 | Add to My Program |
Making a Complete Mess and Getting Away with It: Traveling Salesperson Problems with Circle Placement Variants |
|
Woller, David | Czech Technical University in Prague |
Mansouri, Masoumeh | Birmingham University |
Kulich, Miroslav | Czech Technical University in Prague |
Keywords: Task and Motion Planning, Constrained Motion Planning, Computational Geometry
Abstract: This paper explores a variation of the Traveling Salesperson Problem, where the agent places a circular obstacle next to each node once it visits it. In this variant, referred to as the Traveling Salesperson Problem with Circle Placement (TSP-CP), the aim is to maximize the obstacle radius for which a valid closed tour exists and then minimize the tour cost. The TSP-CP finds relevance in various real-world applications, such as harvesting, quarrying, and open-pit mining. We propose several novel solvers to address the TSP-CP, its variant tailored for Dubins vehicles, and a crucial subproblem known as the Traveling Salesperson Problem on self-deleting graphs (TSP-SD). Our extensive experimental results show that the proposed solvers outperform the current state-of-the-art on related problems in solution quality.
|
|
16:40-16:45, Paper TuDT9.2 | Add to My Program |
Narrow Passage Path Planning Using Collision Constraint Interpolation |
|
Lee, Minji | Seoul National University |
Lee, Jeongmin | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Motion and Path Planning, Constrained Motion Planning, Manipulation Planning
Abstract: Narrow passage path planning is a prevalent problem from industrial to household sites, often facing difficulties in finding feasible paths or requiring excessive computational resources. Given that deep penetration into the environment can cause optimization failure, we propose a framework to ensure feasibility throughout the process using a series of subproblems tailored to the narrow passage problem. We begin by decomposing the environment into convex objects and initializing collision constraints with a subset of these objects. By continuously interpolating the collision constraints through the process of sequentially introducing the remaining objects, our proposed framework generates subproblems that guide the optimization toward solving the narrow passage problem. Several examples are presented to demonstrate how the proposed framework addresses narrow passage path planning problems.
|
|
16:45-16:50, Paper TuDT9.3 | Add to My Program |
Trajectory Planning with Signal Temporal Logic Costs Using Deterministic Path Integral Optimization |
|
Halder, Patrick | ZF Friedrichshafen AG |
Homburger, Hannes | HTWG Konstanz, Institute of System Dynamics |
Kiltz, Lothar | ZF Friedrichshafen AG |
Reuter, Johannes | University of Applied Sciences Constance |
Althoff, Matthias | Technische Universität München |
Keywords: Task and Motion Planning, Optimization and Optimal Control, Motion and Path Planning
Abstract: Formulating the intended behavior of a dynamic system can be challenging. Signal temporal logic (STL) is frequently used for this purpose due to its suitability in formalizing comprehensible, modular, and versatile spatio-temporal specifications. Due to scaling issues with respect to the complexity of the specifications and the potential occurrence of non-differentiable terms, classical optimization methods often solve STL-based problems inefficiently. Smoothing and approximation techniques can alleviate these issues but require changing the optimization problem. This paper proposes a novel sampling-based method based on model predictive path integral control to solve optimal control problems with STL cost functions. We demonstrate the effectiveness of our method on benchmark motion planning problems and compare its performance with state-of-the-art methods. The results show that our method efficiently solves optimal control problems with STL costs.
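In model predictive path integral control, sampled input sequences are weighted by the exponential of their (shifted) costs, which is why non-differentiable STL robustness terms can be used directly as costs. The compressed numpy sketch below applies that weighting to a 1-D double integrator with an "eventually reach the goal region" style cost; the dynamics, horizon, temperature, and cost are illustrative stand-ins and omit the paper's full STL machinery.

```python
import numpy as np

rng = np.random.default_rng(2)
H, K, dt, lam = 30, 256, 0.1, 1.0          # horizon, samples, step, temperature

def rollout(u_seq, x0=0.0, v0=0.0):
    """Integrate a 1-D double integrator and return the position trace."""
    x, v, xs = x0, v0, []
    for u in u_seq:
        v += u * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)

def stl_like_cost(xs, goal_lo=0.9, goal_hi=1.1):
    # Cost of "eventually be inside [goal_lo, goal_hi]": distance of the
    # closest visit to the region (0 if satisfied), a crude robustness proxy.
    dist = np.maximum.reduce([goal_lo - xs, xs - goal_hi, np.zeros_like(xs)])
    return dist.min()

u_nom = np.zeros(H)
for _ in range(20):                         # a few MPPI iterations
    noise = rng.normal(scale=0.5, size=(K, H))
    costs = np.array([stl_like_cost(rollout(u_nom + n)) for n in noise])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u_nom = u_nom + (w[:, None] * noise).sum(axis=0)

print("final miss distance:", stl_like_cost(rollout(u_nom)))
```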
|
|
16:50-16:55, Paper TuDT9.4 | Add to My Program |
Multi-Agent Path Finding Using Conflict-Based Search and Structural-Semantic Topometric Maps |
|
Fredriksson, Scott | Luleå University of Technology |
Bai, Yifan | Luleå University of Technology |
Saradagi, Akshit | Luleå University of Technology, Luleå, Sweden |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Multi-Robot Systems
Abstract: As industries increasingly adopt large robotic fleets, there is a pressing need for computationally efficient, practical, and optimal conflict-free path planning for multiple robots. Conflict-Based Search (CBS) is a popular method for multi-agent path finding (MAPF) due to its completeness and optimality; however, it is often impractical for real-world applications, as it is computationally intensive to solve and relies on assumptions about agents and operating environments that are difficult to realize. This article proposes a solution to overcome computational challenges and practicality issues of CBS by utilizing structural-semantic topometric maps. Instead of running CBS over large grid-based maps, the proposed solution runs CBS over a sparse topometric map containing structural-semantic cells representing intersections, pathways, and dead ends. This approach significantly accelerates the MAPF process and reduces the number of conflict resolutions handled by CBS while operating in continuous time. In the proposed method, robots are assigned time ranges to move between topometric regions, departing from the traditional CBS assumption that a robot can move to any connected cell in a single time step. The approach is validated through real-world multi-robot path-finding experiments and benchmarking simulations. The results demonstrate that the proposed MAPF method can be applied to real-world non-holonomic robots and yields significant improvement in computational efficiency compared to traditional CBS methods while improving conflict detection and resolution in cases of corridor symmetries.
|
|
16:55-17:00, Paper TuDT9.5 | Add to My Program |
Topo-Geometrically Distinct Path Computation Using Neighborhood-Augmented Graph, and Its Application to Path Planning for a Tethered Robot in 3D |
|
Sahin, Alp | Lehigh University |
Bhattacharya, Subhrajit | Lehigh University |
Keywords: Motion and Path Planning, Optimization and Optimal Control, Foundations of Automation, Multi Path Planning
Abstract: Many robotics applications benefit from being able to compute multiple geodesic paths in a given configuration space. The existing paradigm is to use topological path planning, which can compute optimal paths in distinct topological classes. However, these methods usually require non-trivial geometric constructions which are prohibitively expensive in 3D, and are unable to distinguish between distinct topologically equivalent geodesics that are created due to high-cost/curvature regions or prismatic obstacles in 3D. In this paper, we propose an approach to compute k geodesic paths using the concept of a novel neighborhood-augmented graph, on which graph search algorithms can compute multiple optimal paths that are topo-geometrically distinct. Our approach does not require complex geometric constructions, and the resulting paths are not restricted to distinct topological classes, making the algorithm suitable for problems where finding and distinguishing between geodesic paths are of interest. We demonstrate the application of our algorithm to planning shortest traversable paths for a tethered robot in 3D with a cable-length constraint.
|
|
17:00-17:05, Paper TuDT9.6 | Add to My Program |
Homotopy-Aware Efficiently Adaptive State Lattices for Mobile Robot Motion Planning in Cluttered Environments |
|
Menon, Ashwin | University of Rochester |
Damm, Eric | University of Rochester |
Howard, Thomas | University of Rochester |
Keywords: Field Robots, Motion and Path Planning
Abstract: Mobile robot navigation architectures that employ a planning algorithm to provide a single optimal path to follow are flawed in the presence of unstructured, rapidly changing environments. As the environment updates, optimal plans often oscillate around discrete obstacles, which is problematic for path following controllers that are biased to follow the planned route. A potentially better approach involves the generation of multiple plans, each optimal within their own homotopy class, to provide a more comprehensive approximation of cost to goal for a path-following controller. In this paper, we present Homotopy-Aware Efficiently Adaptive State Lattices (HAEASL), which uses multiple open lists to bias search towards routes with distinct homotopy classes. Experiments are presented that measure the number, the optimality, and the diversity of solutions generated across 3,200 planning problems in 80 randomly generated environments. The performance of HAEASL is benchmarked against two previous approaches: Search-Based Path Planning with Homotopy Class Constraints (A*HC) and Homotopy-Aware RRT* (HARRT*). Experimental results demonstrate that HAEASL can generate a greater number of paths and more diverse paths than A*HC without a significant reduction of optimality. Additionally, results demonstrate that HAEASL generates a greater number of paths and ones with lower costs than HARRT*. A final demonstration of HAEASL generating multiple solutions subject to temporal, resource, and kinodynamic constraints using data collected from an off-road mobile robot illustrates the suitability of the approach for the motivating example.
|
|
TuDT10 Regular Session, 313 |
Add to My Program |
Multi-Robot and Human-Robot Teams |
|
|
Chair: Min, Byung-Cheol | Purdue University |
Co-Chair: Sevil, Hakki Erhan | University of West Florida |
|
16:35-16:40, Paper TuDT10.1 | Add to My Program |
Initial Task Allocation in Multi-Human Multi-Robot Teams: An Attention-Enhanced Hierarchical Reinforcement Learning Approach |
|
Wang, Ruiqi | Purdue University |
Zhao, Dezhong | Beijing University of Chemical Technology |
Gupte, Arjun | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Human-Robot Teaming, Human-Robot Collaboration, Design and Human Factors
Abstract: Multi-human multi-robot teams (MH-MR) hold tremendous potential in tackling intricate and massive missions by merging distinct strengths and expertise of individual members. The inherent heterogeneity of these teams necessitates advanced initial task allocation (ITA) methods that align tasks with the intrinsic capabilities of team members from the outset. While existing reinforcement learning approaches show encouraging results, they might fall short in addressing the nuances of long-horizon ITA problems, particularly in settings with large-scale MH-MR teams or multifaceted tasks. To bridge this gap, we propose an attention-enhanced hierarchical reinforcement learning approach that decomposes the complex ITA problem into structured sub-problems, facilitating more efficient allocations. To bolster sub-policy learning, we introduce a hierarchical cross-attribute attention (HCA) mechanism, encouraging each sub-policy within the hierarchy to discern and leverage the specific nuances in the state space that are crucial for its respective decision-making phase. Through an extensive environmental surveillance case study, we demonstrate the benefits of our model and the HCA mechanism within it.
|
|
16:40-16:45, Paper TuDT10.2 | Add to My Program |
Enabling Multi-Robot Collaboration from Single-Human Guidance |
|
Ji, Zhengran | Duke University |
Zhang, Lingyu | Duke University |
Sajda, Paul | Columbia University |
Chen, Boyuan | Duke University |
Keywords: Human Factors and Human-in-the-Loop, Learning from Demonstration, Multi-Robot Systems
Abstract: Learning collaborative behaviors is essential for multi-agent systems. Traditionally, multi-agent reinforcement learning solves this implicitly through a joint reward and centralized observations, assuming collaborative behavior will emerge. Other studies propose to learn from demonstrations of a group of collaborative experts. We instead propose an efficient and explicit way of learning collaborative behaviors in multi-agent systems by leveraging expertise from only a single human. Our insight is that humans have the natural ability to take on various roles in a team. We show that by allowing a human operator to dynamically switch between controlling agents for a short period of time and incorporating a human-like theory-of-mind model of teammates, agents can effectively learn to collaborate. Our experiments showed that our method improves the success rate of a challenging collaborative hide-and-seek task by up to 58% with only 40 minutes of human guidance. We further demonstrate our findings transfer to the real world by conducting multi-robot experiments.
|
|
16:45-16:50, Paper TuDT10.3 | Add to My Program |
Fan-Out Revisited: The Impact of the Human Element on Scalability of Human Multi-Robot Teams |
|
Perkins, Lawrence Dale | University of West Florida |
Johnson, Matthew | Inst. for Human & Machine Cognition |
Sevil, Hakki Erhan | University of West Florida |
Goodrich, Michael A. | Brigham Young University |
Keywords: Human-Robot Teaming, Human-Robot Collaboration, Multi-Robot Systems
Abstract: This paper introduces a novel fan-out model that improves accuracy over previous models. The commonly used models rely on neglect time, the time an agent operates independently, which confounds both human and robot abilities. The proposed model separates neglect time into two functionally distinct concepts: the time a robot can operate self-sufficiently, and the time a human estimates the robot can do so. Previous research indicates fan-out is often overestimated. This work explains why robot ability provides an upper bound on fan-out, while the actually achieved fan-out is influenced by both the human and robot abilities. We conduct a study to validate this new model and show improved performance over the two most common fan-out models. The results show that both previous models overestimate as predicted. Using the new fan-out model, we show that as the difference between human estimation and robot abilities grows, the actual fan-out falls further below the upper-bound potential fan-out. By including assessments of both the robotic and human elements, the new model provides a more nuanced understanding of the dynamics at play and the factors involved in scaling Human Multi-Robot Teams.
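For context, the classical fan-out estimate is neglect time divided by interaction time, plus one. The toy calculation below only illustrates the abstract's argument that an operator's estimate of neglect time can exceed the robot's true self-sufficiency time and thus inflate the predicted fan-out; the numbers and the exact split are invented and do not reproduce the paper's model.

```python
# Classic fan-out estimate: FO = NT / IT + 1, where NT is neglect time and
# IT is interaction time.  The split below is only an illustration of the
# abstract's argument, not the paper's actual formulation.

robot_self_sufficiency_time = 40.0   # s the robot can truly operate alone (assumed)
human_estimate_of_neglect = 60.0     # s the operator believes it can (assumed)
interaction_time = 20.0              # s needed to service one robot (assumed)

upper_bound_fan_out = robot_self_sufficiency_time / interaction_time + 1   # 3.0
naive_fan_out = human_estimate_of_neglect / interaction_time + 1           # 4.0

print(f"upper bound: {upper_bound_fan_out:.1f}, naive estimate: {naive_fan_out:.1f}")
# The larger the gap between the estimate and the true self-sufficiency time,
# the further the achieved fan-out falls below the naive prediction.
```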
|
|
16:50-16:55, Paper TuDT10.4 | Add to My Program |
HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning |
|
Hu, Huawen | Northwestern Polytechnical University |
Shi, Enze | Northwestern Polytechnical University |
Yue, Chenxi | Northwestern Polytechnical University |
Yang, Shuocun | Northwestern Polytechnical University |
Wu, Zihao | University of Georgia |
Li, Yiwei | UGA |
Zhong, Tianyang | Northwestern Polytechnical University |
Zhang, Tuo | Northwestern Polytechnical University |
Liu, Tianming | University of Georgia |
Zhang, Shu | Northwestern Polytechnical University |
Keywords: Human Factors and Human-in-the-Loop, Reinforcement Learning, Human-Robot Collaboration
Abstract: Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during the training process, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, allowing non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and utilize the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. In multiple collaboration scenarios, our approach is able to leverage limited guidance from non-experts and enhance performance. The project can be found at https://github.com/huawen-hu/HARP.
|
|
16:55-17:00, Paper TuDT10.5 | Add to My Program |
Training Human-Robot Teams by Improving Transparency through a Virtual Spectator Interface |
|
Dallas, Sean | Oakland University |
Qiang, Hongjiao | University of Michigan |
AbuHijleh, Motaz | Oakland University |
Jo, Wonse | University of Michigan |
Riegner, Kayla | Ground Vehicle Systems Center (GVSC) |
Smereka, Jonathon M. | U.S. Army TARDEC |
Robert, Lionel | University of Michigan |
Louie, Wing-Yue Geoffrey | Oakland University |
Tilbury, Dawn | University of Michigan |
Keywords: Human-Robot Teaming, Human Factors and Human-in-the-Loop, Human-Centered Robotics
Abstract: After-action reviews (AARs) are professional discussions that help operators and teams enhance their task performance by analyzing completed missions with peers and professionals. Previous studies comparing different formats of AARs have focused mainly on human teams. However, the inclusion of robotic teammates brings along new challenges in understanding teammate intent and communication. Traditional AAR between human teammates may not be satisfactory for human-robot teams. To address this limitation, we propose a new training review (TR) tool, called the Virtual Spectator Interface (VSI), to enhance human-robot team performance and situational awareness (SA) in a simulated search mission. The proposed VSI primarily utilizes visual feedback to review subjects’ behavior. To examine the effectiveness of VSI, we took elements from AAR to conduct our own TR, and designed a 1 × 3 between-subjects experiment with experimental conditions: TR with (1) VSI, (2) screen recording, and (3) non-technology (only verbal descriptions). The results of our experiments demonstrated that the VSI did not result in significantly better team performance than other conditions. However, the TR with VSI led to more improvement in the subjects’ SA over the other conditions.
|
|
17:00-17:05, Paper TuDT10.6 | Add to My Program |
Adaptive Task Allocation in Multi-Human Multi-Robot Teams under Team Heterogeneity and Dynamic Information Uncertainty |
|
Yuan, Ziqin | Purdue University |
Wang, Ruiqi | Purdue University |
Kim, Taehyeon | Purdue University |
Zhao, Dezhong | Beijing University of Chemical Technology |
Obi, Ike | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Human-Robot Teaming, Task Planning, Reinforcement Learning
Abstract: Task allocation in multi-human multi-robot (MH-MR) teams presents significant challenges due to the inherent heterogeneity of team members, the dynamics of task execution, and the information uncertainty of operational states. Existing approaches often fail to address these challenges simultaneously, resulting in suboptimal performance. To tackle this, we propose an adaptive task allocation method using hierarchical reinforcement learning (HRL), incorporating initial task allocation (ITA) that leverages team heterogeneity and conditional task reallocation in response to dynamic operational states. Additionally, we introduce an auxiliary state representation learning task to manage information uncertainty and enhance task execution. Through an extensive case study in large-scale environmental monitoring tasks, we demonstrate the benefits of our approach. More details are available on our website: https://sites.google.com/view/ata-hrl.
|
|
TuDT11 Regular Session, 314 |
Add to My Program |
Human-Robot Interaction 2 |
|
|
Chair: Dogan, Fethiye Irmak | University of Cambridge |
Co-Chair: Alves-Oliveira, Patrícia | Amazon Lab126 |
|
16:35-16:40, Paper TuDT11.1 | Add to My Program |
“Don’t Forget to Put the Milk Back!” Dataset for Enabling Embodied Agents to Detect Anomalous Situations |
|
Mullen, James | University of Maryland |
Goyal, Prasoon | Amazon |
Piramuthu, Robinson | Amazon |
Johnston, Michael | Amazon |
Manocha, Dinesh | University of Maryland |
Ghanadan, Reza | Amazon |
Keywords: Robot Companions, AI-Based Methods, Semantic Scene Understanding
Abstract: Home robots are intended to make their users' lives easier. Our work aims to assist in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene --- especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a classification technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect dataset. Additionally, we conduct real world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect dataset and code will be released to the public upon this paper's publication.
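At its core, the described classification step hands scene-graph relations to an LLM and asks for one of four labels per relation. The sketch below shows one plausible prompt construction; call_llm is a placeholder for a GPT-4 client, and the example relations and canned answers are invented for illustration.

```python
LABELS = ["normal", "dangerous", "unsanitary", "dangerous for children"]

def build_prompt(relations):
    """Turn scene-graph edges into a per-relation classification prompt."""
    lines = [f"- {subj} {pred} {obj}" for subj, pred, obj in relations]
    return (
        "You are a household safety assistant. For each object relation below, "
        f"answer with exactly one of: {', '.join(LABELS)}.\n" + "\n".join(lines)
    )

def call_llm(prompt):
    # Placeholder for an actual GPT-4 API call; returns canned answers here.
    return ["unsanitary", "dangerous for children", "unsanitary"]

relations = [
    ("milk", "on", "kitchen counter"),
    ("bleach", "on", "floor"),
    ("raw chicken", "next to", "salad"),
]
for rel, label in zip(relations, call_llm(build_prompt(relations))):
    print(rel, "->", label)
```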
|
|
16:40-16:45, Paper TuDT11.2 | Add to My Program |
Development of Contactless Delivery Service Robot with Modular Working Platform in Isolation Wards |
|
Yang, Kyon-Mo | Korea Institute of Robot and Convergence |
Koo, Jaewan | Korea Institute of Robotics and Technology Convergence ; KIRO |
Seo, Kap-Ho | Korea Institute of Robot and Convergence |
Keywords: Human-Centered Automation, Medical Robots and Systems, Human-Centered Robotics
Abstract: Preventing cross-infection is crucial for robots designed to assist medical staff in isolation wards during outbreaks of infectious diseases like COVID-19. This paper proposes a modular robotic system with a working platform and a mobile base to prevent cross-infection during item delivery and waste transport. An alignment structure for combining the two platforms is introduced, and a marker map and barcode-based destination input system were developed to allow medical staff without specialized robotics knowledge to use the system without additional training. The effectiveness of this robot's service was evaluated through a System Usability Scale (SUS) test with twenty medical staff working in isolation wards, achieving an average score of 77.12. This indicates a high level of usability, suggesting that this robot can significantly contribute to safe and efficient hospital operations during pandemic situations.
|
|
16:45-16:50, Paper TuDT11.3 | Add to My Program |
RACCOON: Grounding Embodied Question-Answering with State Summaries from Existing Robot Modules |
|
Bustamante, Samuel | German Aerospace Center (DLR), Robotics and Mechatronics Center |
Knauer, Markus | German Aerospace Center (DLR) |
Thun, Jeremias | University Bremen |
Schneyer, Stefan | German Aerospace Center (DLR) |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Weber, Bernhard | German Aerospace Center |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Keywords: Human-Centered Robotics
Abstract: Explainability is vital for establishing user trust, including in robotics. Recently, foundation models (e.g. vision-language models, VLMs) fostered a wave of embodied agents that answer arbitrary queries about their environment and their interactions with it. However, naively prompting VLMs to answer queries based on camera images does not take into account existing robot architectures which represent the robot's tasks, skills, and beliefs about the state of the world. To overcome this limitation, we propose RACCOON, a framework that combines foundation models' responses with a robot's internal knowledge. Inspired by Retrieval-Augmented Generation (RAG), RACCOON selects relevant context, retrieves information from the robot's state, and utilizes it to refine prompts for an LLM to answer questions accurately. This bridges the gap between the model's adaptability and the robot's domain expertise.
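The RAG-inspired loop can be pictured as: retrieve the robot-state summaries most relevant to the query, then prepend them to the LLM prompt. The toy sketch below uses naive word overlap for retrieval; the state entries, scoring function, and answer_with_llm placeholder are assumptions rather than components of RACCOON itself.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, state_summaries, top_k=2):
    """Rank state summaries by naive word overlap with the user query."""
    q = tokens(query)
    return sorted(state_summaries,
                  key=lambda s: len(q & tokens(s)), reverse=True)[:top_k]

def answer_with_llm(prompt):
    return "(LLM response would go here)"   # placeholder for a VLM/LLM call

state_summaries = [
    "Current task: set the table; step 3 of 5 completed.",
    "Gripper holds: blue mug, grasp confidence 0.92.",
    "Navigation: localized in kitchen, battery at 64%.",
]
query = "Why are you holding the mug?"
context = "\n".join(retrieve(query, state_summaries))
prompt = f"Robot state:\n{context}\n\nUser question: {query}\nAnswer briefly."
print(answer_with_llm(prompt))
```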
|
|
16:50-16:55, Paper TuDT11.4 | Add to My Program |
GRACE: Generating Socially Appropriate Robot Actions Leveraging LLMs and Human Explanations |
|
Dogan, Fethiye Irmak | University of Cambridge |
Ozyurt, Umut | Middle East Technical University |
Çınar, Gizem | Bilkent University |
Gunes, Hatice | University of Cambridge |
Keywords: Human Factors and Human-in-the-Loop, Human-Centered Robotics, Deep Learning Methods
Abstract: When operating in human environments, robots need to handle complex tasks while both adhering to social norms and accommodating individual preferences. For instance, based on common sense knowledge, a household robot can predict that it should avoid vacuuming during a social gathering, but it may still be uncertain whether it should vacuum before or after having guests. In such cases, integrating common-sense knowledge with human preferences, often conveyed through human explanations, is fundamental yet a challenge for existing systems. In this paper, we introduce GRACE, a novel approach addressing this while generating socially appropriate robot actions. GRACE leverages common sense knowledge from LLMs, and it integrates this knowledge with human explanations through a generative network. The bidirectional structure of GRACE enables robots to refine and enhance LLM predictions by utilizing human explanations and makes robots capable of generating such explanations for human-specified actions. Our evaluations show that integrating human explanations boosts GRACE's performance, where it outperforms several baselines and provides sensible explanations.
|
|
16:55-17:00, Paper TuDT11.5 | Add to My Program |
Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant |
|
Xiao, Anxing | National University of Singapore |
Janaka, Nuwan | National University of Singapore |
Hu, Tianrun | National University of Singapore |
Gupta, Anshul | National University of Singapore |
Li, Kaixin | National University of Singapore |
Yu, Cunjun | NUS |
Hsu, David | National University of Singapore |
Keywords: Human-Centered Robotics, AI-Enabled Robotics, Virtual Reality and Interfaces
Abstract: Imagine a future when we can Zoom-call a robot to manage household chores remotely. This work takes one step in this direction. Robi Butler is a new household robot assistant that enables seamless multimodal remote interaction. It allows the human user to monitor its environment from a first-person view, issue voice or text commands, and specify target objects through hand-pointing gestures. At its core, a high-level behavior module, powered by Large Language Models (LLMs), interprets multimodal instructions to generate multistep action plans. Each plan consists of open-vocabulary primitives supported by vision-language models, enabling the robot to process both textual and gestural inputs. Zoom provides a convenient interface to implement remote interactions between the human and the robot. The integration of these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We evaluated the system on various household tasks, demonstrating its ability to execute complex user commands with multimodal inputs. We also conducted a user study to examine how multimodal interaction influences user experiences in remote human-robot interaction. These results suggest that with the advances in robot foundation models, we are moving closer to the reality of remote household robot assistants.
|
|
17:00-17:05, Paper TuDT11.6 | Add to My Program |
AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-To-Specific Task Decomposition and Knowledge Refinement |
|
Singh, Shivam | International Institute of Information Technology Hyderabad |
Swaminathan, Karthik | International Institute of Information Technology - Hyderabad ( |
Dash, Nabanita | International Institute of Information Technology, Hyderabad |
Singh, Ramandeep | International Institute of Information Technology, Hyderabad |
Banerjee, Snehasis | Iiit-H / Tcs |
Sridharan, Mohan | University of Edinburgh |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Human Factors and Human-in-the-Loop, AI-Based Methods, Task Planning
Abstract: An embodied agent assisting humans is often asked to complete new tasks, and there may not be sufficient time or labeled examples to train the agent to perform these new tasks. Large Language Models (LLMs) trained on considerable knowledge across many domains can be used to predict a sequence of abstract actions for completing such tasks, although the agent may not be able to execute this sequence due to task-, agent-, or domain-specific constraints. Our framework addresses these challenges by leveraging the generic predictions provided by the LLM and the prior domain knowledge encoded in a Knowledge Graph (KG), enabling an agent to quickly adapt to new tasks. The robot also solicits and uses human input as needed to refine its existing knowledge. Based on experimental evaluation in the context of cooking and cleaning tasks in simulation domains, we demonstrate that the interplay between LLM, KG, and human input leads to substantial performance gains compared with just using the LLM. Project website: https://sssshivvvv.github.io/adaptbot/
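One way to picture the LLM-KG-human interplay is: the LLM proposes generic steps, each step is checked against the agent's known actions (standing in for the knowledge-graph content), and unknown steps trigger a human query whose answer is stored for reuse. The sketch below is a deliberately simplified illustration of that loop; the action set, string matching, and the lambda standing in for human input are all invented.

```python
KNOWN_ACTIONS = {"pick up pot", "fill pot with water", "add pasta"}   # toy stand-in for the KG

def llm_plan(task):
    # Placeholder for an LLM call that returns generic abstract steps.
    return ["pick up pot", "fill pot with water", "boil water", "add pasta"]

def refine(steps, ask_human):
    """Ground each generic step; fall back to human input for unknown ones."""
    grounded = []
    for step in steps:
        if step in KNOWN_ACTIONS:
            grounded.append(step)
        else:
            fix = ask_human(step)        # solicit a substitute action
            KNOWN_ACTIONS.add(fix)       # remember it for future tasks
            grounded.append(fix)
    return grounded

plan = refine(llm_plan("cook pasta"),
              ask_human=lambda step: "turn on stove under pot")
print(plan)
```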
|
|
TuDT12 Regular Session, 315 |
Add to My Program |
Information Gathering, Planning and Control in Challenging Environments |
|
|
Chair: Oh, Hyondong | UNIST |
Co-Chair: Bobadilla, Leonardo | Florida International University |
|
16:35-16:40, Paper TuDT12.1 | Add to My Program |
LCD-RIG: Limited Communication Decentralized Robotic Information Gathering Systems |
|
Redwan Newaz, Abdullah Al | University of New Orleans |
Padrao, Paulo | Florida International University |
Fuentes, Jose | Florida International University |
Alam, Tauhidul | Lamar University |
Govindarajan, Ganesh | Florida International University |
Bobadilla, Leonardo | Florida International University |
Keywords: Environment Monitoring and Management, Planning, Scheduling and Coordination, Distributed Robot Systems
Abstract: Effective data collection in collaborative information-gathering systems relies heavily on maintaining uninterrupted connectivity. Yet, real-world communication disruptions often pose challenges to information-gathering processes. To address this issue, we introduce a novel method: a limited communication decentralized information gathering system for multiple robots to explore environmental phenomena characterized as unknown spatial fields. Our method leverages quadtree structures to ensure comprehensive workspace coverage and efficient exploration. Unlike traditional systems that depend on global and synchronous communication, our method enables robots to share local experiences within a limited transmission range and coordinate their tasks through pairwise and asynchronous communication. Information estimation is facilitated by a Gaussian Process with an Attentive Kernel, allowing adaptive capturing of crucial behavior and data patterns. Our proposed system undergoes validation through simulated scalar field studies in non-stationary environments where multiple robots explore spatial fields. Theoretical guarantees ensure the convergence of distributed area coverage and the regret bounds of distributed online scalar field mapping. We also validate the applicability of our method empirically in a water quality monitoring scenario featuring two Autonomous Surface Vehicles, tasked with constructing a spatial field.
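Information estimation in the abstract is a Gaussian Process over the scalar field; the minimal scikit-learn sketch below fits a GP to sparse samples and queries predictive mean and uncertainty on a grid, where the uncertainty could drive where to sample next. A plain RBF kernel is used purely for illustration, whereas the paper's Attentive Kernel is a different, learned construction.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(40, 2))                   # sampled locations
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)    # toy scalar field readings

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2),
                              normalize_y=True)
gp.fit(X, y)

# Query a coarse grid; the predictive std could drive where to sample next.
gx, gy = np.meshgrid(np.linspace(0, 10, 20), np.linspace(0, 10, 20))
grid = np.column_stack([gx.ravel(), gy.ravel()])
mean, std = gp.predict(grid, return_std=True)
print("most uncertain cell:", grid[std.argmax()])
```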
|
|
16:40-16:45, Paper TuDT12.2 | Add to My Program |
Multi-Agent Path Planning in Complex Environments Using Gaussian Belief Propagation with Global Path Finding |
|
Jensen, Jens Høigaard | Aarhus University |
Plagborg Bak Sørensen, Kristoffer | Aarhus University |
le Fevre Sejersen, Jonas | Aarhus University |
Sarabakha, Andriy | Aarhus University |
Keywords: Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance
Abstract: Multi-agent path planning is a critical challenge in robotics, requiring agents to navigate complex environments while avoiding collisions and optimizing travel efficiency. This work addresses the limitations of existing approaches by combining Gaussian belief propagation with path integration and introducing a novel tracking factor to ensure strict adherence to global paths. The proposed method is tested with two different global path-planning approaches: rapidly exploring random trees and a structured planner, which leverages predefined lane structures to improve coordination. A simulation environment was developed to validate the proposed method across diverse scenarios, each posing unique challenges in navigation and communication. Simulation results demonstrate that the tracking factor reduces path deviation by 28% in single-agent and 16% in multi-agent scenarios, highlighting its effectiveness in improving multi-agent coordination, especially when combined with structured global planning.
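The tracking factor can be thought of as a quadratic penalty pulling every waypoint toward its reference on the global path, balanced against smoothness and inter-agent separation terms. The numpy sketch below minimizes such a sum by plain gradient descent as a stand-in; the actual method performs Gaussian belief propagation over a factor graph, which this illustration does not implement, and all weights are made up.

```python
import numpy as np

def plan(path_ref, other_agent, w_track=1.0, w_smooth=4.0, w_sep=2.0,
         safe_dist=0.5, iters=300, lr=0.05):
    """Gradient-descent stand-in for the factor-graph optimization."""
    x = path_ref.copy()                      # initialize on the global path
    for _ in range(iters):
        g = np.zeros_like(x)
        # Tracking factor: pull every waypoint toward its global-path reference.
        g += w_track * (x - path_ref)
        # Smoothness factor on consecutive waypoints.
        g[1:-1] += w_smooth * (2 * x[1:-1] - x[:-2] - x[2:])
        # Separation factor against the other agent's current plan.
        d = x - other_agent
        dist = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        viol = np.maximum(safe_dist - dist, 0.0)
        g -= w_sep * viol * d / dist
        x -= lr * g
    return x

t = np.linspace(0, 1, 20)[:, None]
global_path = np.hstack([t * 10.0, np.zeros_like(t)])          # straight reference
other = np.hstack([t * 10.0, 0.3 * np.ones_like(t)])           # nearby second agent
print(plan(global_path, other)[:3])
```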
|
|
16:45-16:50, Paper TuDT12.3 | Add to My Program |
Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control |
|
Olsen, Jørgen Anker | Norwegian University of Science and Technology |
Malczyk, Grzegorz | NTNU |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Space Robotics and Automation, Legged Robots
Abstract: Exploring planetary bodies with lower gravity, such as the moon and Mars, allows legged robots to utilize jumping as an efficient form of locomotion, thus giving them a valuable advantage over traditional rovers for exploration. Motivated by this fact, this paper presents the design, simulation, and learning-based "in-flight" attitude control of Olympus, a jumping legged robot tailored to the gravity of Mars. First, the design requirements are outlined, followed by a description of how simulation enabled optimizing the robot's design - from its legs to the overall configuration - towards high vertical jumping, forward jumping distance, and in-flight attitude reorientation. Subsequently, the reinforcement learning policy used to track desired in-flight attitude maneuvers is presented. Successfully crossing the sim2real gap, the learned policy is validated through extensive experimental studies of attitude reorientation maneuvers.
|
|
16:50-16:55, Paper TuDT12.4 | Add to My Program |
THAMP-3D: Tangent-Based Hybrid A* Motion Planning for Tethered Robots in Sloped 3D Terrains |
|
Kumar, Rahul | Northeastern University |
Chipade, Vishnu S. | University of Michigan |
Yong, Sze Zheng | Northeastern University |
Keywords: Motion and Path Planning, Constrained Motion Planning, Nonholonomic Motion Planning
Abstract: This paper introduces a novel motion planning algorithm designed for a team of curvature-constrained tethered robots operating on sloped 3D terrains. Our approach addresses the critical issues of tether-terrain interaction, robot stability, and tether entanglement avoidance. The study focuses on a two-robot system, where stability is primarily dependent on tether tension, which is in turn limited by wheel traction. We propose a path-planning method that strategically utilizes terrain features (e.g., rocks) to augment tether tension through additional friction, thereby enhancing overall system stability. Our algorithm employs a modified tangent graph as the underlying structure for a hybrid A* search, incorporating stability constraints throughout the planning process. The proposed method is extensively evaluated through various simulation experiments, demonstrating its effectiveness in planning safe and efficient paths.
|
|
16:55-17:00, Paper TuDT12.5 | Add to My Program |
Deep Learning Based Topography Aware Gas Source Localization with Mobile Robot |
|
Tian, Changhao | Nanyang Technological University |
Wang, Annan | Nanyang Technological University |
Fan, Han | Örebro University |
Wiedemann, Thomas | German Aerospace Center (DLR) |
Luo, Yifei | Institute of Materials Research and Engineering (IMRE), Agency F |
Yang, Le | Institute of Materials Research and Engineering, Agency for Scie |
Lin, Weisi | Nanyang Technological University |
Lilienthal, Achim J. | Orebro University |
Chen, Xiaodong | Nanyang Technological University |
Keywords: Environment Monitoring and Management, Sensor Fusion, Deep Learning Methods
Abstract: Gas source localization in complex environments is critical for applications such as environmental monitoring, industrial safety, and disaster response. Traditional methods often struggle with the challenges posed by a lack of environmental topography integration, especially when interactions between wind and obstacles distort gas dispersion patterns. In this paper, we propose a deep learning-based approach, which leverages spatial context and environmental mapping to enhance gas source localization. By integrating Simultaneous Localization and Mapping (SLAM) with a U-Net-based model, our method predicts the likelihood of gas source locations by analyzing gas sensor data, wind flow, and topography of the environment represented by a 2D occupancy map. We demonstrate the efficacy of our approach using a wheeled robot equipped with a photoionization detector, a LIDAR, and an anemometer, in various scenarios with dynamic wind fields and multiple obstacles. The results show that our approach can robustly locate gas sources, even in challenging environments with fluctuating wind directions, outperforming conventional methods by utilizing topography contextual information. This study underscores the importance of topographical context in gas source localization and offers a flexible and robust solution for real-world applications. Data and code are publicly available.
|
|
17:00-17:05, Paper TuDT12.6 | Add to My Program |
Gas Source Localization in Unknown Indoor Environments Using Dual-Mode Information-Theoretic Search |
|
Kim, Seunghwan | UNIST |
Seo, Jaemin | UNIST |
Jang, Hongro | UNIST |
Kim, Changseung | Ulsan National Institute of Science and Technology |
Kim, Murim | Korea Institute of Robot and Convergence |
Pyo, Juhyun | Korea Institute of Robotics & Technology Convergence |
Oh, Hyondong | UNIST |
Keywords: Planning under Uncertainty, Environment Monitoring and Management, Robotics in Hazardous Fields
Abstract: This paper proposes a dual-mode planner for localizing gas sources using a mobile sensor in unknown indoor spaces. The complexity of indoor environments creates constraints on search paths, leading to situations where no valid paths can be generated, which are termed dead ends in this paper. The proposed dual-mode planner is designed to effectively address the dead end problem while maintaining efficient search paths. In addition, the absence of analytical dispersion models that can be used in unknown indoor environments presents another critical issue for indoor gas source localization (GSL). To address this, we present an indoor Gaussian dispersion model (IGDM) that can analytically model indoor gas dispersion without a complete map. Finally, we establish a GSL framework for indoor environments along with real-time mapping, utilizing the dual-mode planner and IGDM. This framework is validated in indoor scenarios with a realistic gas dispersion simulator. The simulation results show the high success rate of the proposed method, its ability to reduce search time, and its computational efficiency. Furthermore, through real-world experiments, we demonstrate the potential of the proposed approach as a practical solution, evidenced by its satisfactory performance.
|
|
TuDT13 Regular Session, 316 |
Add to My Program |
Wearable Robotics 2 |
|
|
Chair: Rouse, Elliott | University of Michigan |
Co-Chair: Torielli, Davide | Humanoids and Human Centered Mechatronics (HHCM), Istituto Italiano Di Tecnologia |
|
16:35-16:40, Paper TuDT13.1 | Add to My Program |
Online Design Optimization of Passive Exoskeletons Using Fast Biomechanics Simulation and Reinforcement Learning |
|
Vatsal, Vighnesh | Tata Consultancy Services |
Keywords: Prosthetics and Exoskeletons, Reinforcement Learning, Modeling and Simulating Humans
Abstract: Exoskeletons are being adopted as assistive devices in industries such as manufacturing, logistics, and construction, aimed at reducing musculoskeletal loads in workers. Presently, their design process assumes the user to be quasi-static, optimizing the design parameters for reduction of human joint torques followed by fine-tuning through usability studies and physical prototyping. We present a method for optimizing passive exoskeleton designs before the physical prototyping stage for muscle effort reduction in dynamic tasks such as arm reaching and walking. We employ fast MuJoCo-based simulations of human biomechanics to compute the joint torques, muscle forces and muscle activations while executing task trajectories using pre-trained reinforcement learning models from the literature. We train another set of reinforcement learning models that minimize joint torques and muscle effort rates by varying the exoskeleton's design parameters online during the task motions. Baselines for comparison include the default designs of shoulder and walking assist exoskeletons from the literature, and designs obtained through conventional optimization techniques. In terms of muscle effort rates, the RL-based designs improved upon these baselines by an average of 3.42% and 1.96% respectively in the arm reaching task, and 6.28% and 5.81% in the walking task. Our method can be adapted to evaluate exoskeletons in real-time through motion capture, and for muscle-aware online control of powered exoskeletons.
|
|
16:40-16:45, Paper TuDT13.2 | Add to My Program |
Accurately Modeling the Output Torque and Stiffness of Ankle-Foot Orthoses with a Compliant Linkage Model |
|
Lam, David | University of Michigan - Ann Arbor |
Van Crey, Nikko | University of Michigan Ann Arbor |
Rouse, Elliott | University of Michigan |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Physically Assistive Devices
Abstract: The stiffness of passive lower-limb exoskeletons and orthoses governs their assistance. A common practice in the design of these systems is to assume the stiffness of the device is determined only by the intended elastic element (e.g., spring), while the structural components, human attachments, and soft tissues are considered rigid. In practice, the mechanical behavior of orthoses is significantly affected by the compliance of these elements, which drastically impacts the assistance provided. In this work, we present a linkage model with compliant elements that can accurately predict the applied stiffness of ankle-foot orthoses, and retroactively estimate the stiffness of unintended spring elements from published data. The compliant model accurately predicted the torque trajectories of two published passive orthoses with modeled peak torques within 4% to 7% of measured values. In contrast, the rigid model greatly overestimated the peak torques, predicting 203% to 376% of the measured values. The compliant model also indicated that an onboard joint encoder could only measure 52% to 69% of the peak ankle angle recorded with motion capture. The compliant model was also used to reassess the stiffness range of a variable-stiffness orthosis, indicating that its adjustable range is likely 69% of rigid model predictions. Overall, this work highlights the need to consider how unmodeled compliance affects the mechanical behavior of orthoses and provides a foundation for further exploration.
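The central observation is that the intended spring acts in series with structural, attachment, and soft-tissue compliance, so compliances add, the delivered stiffness drops below the spring's nominal value, and a joint encoder across the spring sees only part of the total deflection. The short calculation below illustrates that arithmetic with invented stiffness values; it is not the paper's linkage model.

```python
k_spring = 250.0      # N·m/rad, the intended elastic element (assumed value)
k_structure = 600.0   # N·m/rad, frame and attachment compliance (assumed)
k_tissue = 400.0      # N·m/rad, soft-tissue interface compliance (assumed)

# Springs in series: compliances (1/k) add.
total_compliance = 1.0 / k_spring + 1.0 / k_structure + 1.0 / k_tissue
k_applied = 1.0 / total_compliance

# Under a common torque each element deflects by tau / k, so the fraction of
# the total deflection seen by an encoder spanning only the spring is:
encoder_fraction = (1.0 / k_spring) / total_compliance

print(f"applied stiffness: {k_applied:.1f} N·m/rad "
      f"(vs. {k_spring:.0f} if assumed rigid); "
      f"encoder sees {encoder_fraction:.0%} of the deflection")
```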
|
|
16:45-16:50, Paper TuDT13.3 | Add to My Program |
Towards Neurorobotic Interface for Finger Joint Angle Estimation: A Multi-Stage CNN-LSTM Network with Transfer Learning |
|
Chen, Yun | The University of Alabama |
Zhang, Xinyu | The University of Alabama |
Li, Hui | University of Alabama |
He, Hongsheng | The University of Alabama |
Shou, Wan | University of Arkansas |
Zhang, Qiang | The University of Alabama |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Sensor Fusion
Abstract: To maximize the autonomy of individuals with upper limb amputations in daily activities, leveraging forearm muscle information to infer movement intent is a promising research direction. While current prosthetic hand technologies can utilize forearm muscle data to achieve basic movements such as grasping, accurately estimating finger joint angles remains a significant challenge. Therefore, we propose a Multi-Stage Cascade Convolutional Neural Network with a Long Short-Term Memory Network, where an upsampling module is introduced before the downsampling module to enhance model generalization. Additionally, we designed a transfer learning (TL) framework based on parameter freezing, where the pre-trained downsampling module is fixed, and only the upsampling module is updated with a small amount of out-of-distribution data to achieve TL. Furthermore, we compared the performance of unimodal and multimodal models, collecting surface electromyography (sEMG) signals, brightness mode ultrasound images (B-mode US images), and motion capture data simultaneously. The results show that on the validation set, the US image had the lowest error, while on the prediction set, the four-channel sEMG achieved the lowest error. The performance of the multimodal model in both datasets was intermediate between the unimodal models. On the prediction set, the average normalized root mean square error values for the four-channel sEMG, US images, and sensor fusion models across three subjects were 0.170, 0.203, and 0.186, respectively. By utilizing advanced sensor fusion techniques and TL, our approach can reduce the need for extensive data collection and training for new users, making prosthetic control more accessible and adaptable to individual needs.
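The transfer-learning scheme freezes the pre-trained downsampling module and updates only the upsampling module on a small amount of new data. A minimal PyTorch sketch of that freezing pattern is given below; the module shapes, names, and the commented-out checkpoint path are placeholders rather than the paper's network.

```python
import torch
import torch.nn as nn

class AngleEstimator(nn.Module):
    """Placeholder stand-in for the multi-stage CNN-LSTM."""
    def __init__(self):
        super().__init__()
        self.upsample = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        self.downsample = nn.Sequential(nn.Linear(128, 32), nn.ReLU(),
                                        nn.Linear(32, 5))     # 5 joint angles
    def forward(self, x):
        return self.downsample(self.upsample(x))

model = AngleEstimator()
# model.load_state_dict(torch.load("pretrained.pt"))  # pre-trained weights (assumed path)

for p in model.downsample.parameters():       # freeze the pre-trained part
    p.requires_grad = False

optim = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One toy adaptation step on out-of-distribution data.
x, target = torch.randn(8, 64), torch.randn(8, 5)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optim.step()
```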
|
|
16:50-16:55, Paper TuDT13.4 | Add to My Program |
Design, Characterization, and Validation of a Variable Stiffness Prosthetic Elbow |
|
Milazzo, Giuseppe | Istituto Italiano Di Tecnologia |
Lemerle, Simon | University of Pisa |
Grioli, Giorgio | Istituto Italiano Di Tecnologia |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Catalano, Manuel Giuseppe | Istituto Italiano Di Tecnologia |
Keywords: Prosthetics and Exoskeletons, Variable Stiffness Actuators, Compliant Joint/Mechanism, Mechanism Design
Abstract: Intuitively, prostheses with user-controllable stiffness could mimic the intrinsic behavior of the human musculoskeletal system, promoting safe and natural interactions and task adaptability in real-world scenarios. However, prosthetic design often disregards compliance because of the additional complexity, weight, and needed control channels. This article focuses on designing a variable stiffness actuator (VSA) with weight, size, and performance compatible with prosthetic applications, addressing its implementation for the elbow joint. While a direct biomimetic approach suggests adopting an agonist-antagonist (AA) layout to replicate the biceps and triceps brachii with elastic actuation, this solution is not optimal to accommodate the varied morphologies of residual limbs. Instead, we employed the AA layout to craft an elbow prosthesis fully contained in the user’s forearm, catering to individuals with distal transhumeral amputations. In addition, we introduce a variant of this design where the two motors are split between the upper arm and forearm to distribute mass and volume more evenly along the bionic limb, enhancing comfort for patients with more proximal amputation levels. We characterize and validate our approach, demonstrating that both architectures meet the target requirements for an elbow prosthesis. The system attains the desired 120° range of motion, achieves the target stiffness range of [2, 60] N·m/rad, and can actively lift up to 3 kg. Our novel design reduces weight by up to 50% compared to existing VSAs for elbow prostheses while achieving performance comparable to the state of the art. Case studies suggest that passive and variable compliance could enable robust and safe interactions and task adaptability in the real world.
|
|
16:55-17:00, Paper TuDT13.5 | Add to My Program |
Long-Term Upper-Limb Prosthesis Myocontrol Via High-Density sEMG and Incremental Learning |
|
Di Domenico, Dario | Italian Institute of Technology |
Boccardo, Nicolò | IIT - Istituto Italiano Di Tecnologia |
Marinelli, Andrea | University of Genova, Italian Institute of Technologies |
Canepa, Michele | Italian Institute of Technology |
Gruppioni, Emanuele | INAIL Prosthesis Center |
Laffranchi, Matteo | Istituto Italiano Di Tecnologia |
Camoriano, Raffaello | Politecnico Di Torino |
Keywords: Prosthetics and Exoskeletons, Intention Recognition, Incremental Learning
Abstract: Noninvasive human-machine interfaces such as surface electromyography (sEMG) have long been employed for controlling robotic prostheses. However, classical controllers are limited to a few degrees of freedom (DoF). More recently, machine learning methods have been proposed to learn personalized controllers from user data. While promising, they often suffer from distribution shift during long-term usage, requiring costly model re-training. Moreover, most prosthetic sEMG sensors have low spatial density, which limits accuracy and the number of controllable motions. In this work, we address both challenges by introducing a novel myoelectric prosthetic system integrating a high-density sEMG (HD-sEMG) setup and incremental learning methods to accurately control 7 motions of the Hannes prosthesis. First, we present a newly designed, compact HD-sEMG interface equipped with 64 dry electrodes positioned over the forearm. Then, we introduce an efficient incremental learning system enabling model adaptation on a stream of data. We thoroughly analyze multiple learning algorithms across 7 subjects, including one with limb absence, and 6 sessions held on different days covering an extended period of several months. The size and time span of the collected data represent a relevant contribution for studying long-term myocontrol performance. Therefore, we release the DELTA dataset together with our experimental code.
|
|
17:00-17:05, Paper TuDT13.6 | Add to My Program |
ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke |
|
Xu, Jingxi | Columbia University |
Wang, Runsheng | Columbia University |
Shang, Siqi | Columbia University |
Chen, Ava | Columbia University |
Winterbottom, Lauren | Columbia University |
Hsu, To-Liang | Columbia University |
Chen, Wenxi | Columbia University |
Ahmed, Khondoker | Columbia University |
La Rotta, Pedro Leandro | Columbia University |
Zhu, Xinyue | Columbia University |
Nilsen, Dawn | Columbia University |
Stein, Joel | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Wearable Robotics
Abstract: Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor.
|
|
TuDT14 Regular Session, 402 |
Add to My Program |
Large Models for Manipulation |
|
|
Chair: Ugur, Emre | Bogazici University |
Co-Chair: Mehr, Negar | University of California Berkeley |
|
16:35-16:40, Paper TuDT14.1 | Add to My Program |
Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation |
|
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Büchner, Martin | University of Freiburg |
Despinoy, Fabien | Toyota Motor Europe |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Mobile Manipulation, Integrated Planning and Learning, Domestic Robotics
Abstract: To fully leverage the capabilities of mobile manipulation robots, it is imperative that they are able to autonomously execute long-horizon tasks in large unexplored environments. While large language models (LLMs) have shown emergent reasoning skills on arbitrary tasks, existing work primarily concentrates on explored environments, typically focusing on either navigation or manipulation tasks in isolation. In this work, we propose MoMa-LLM, a novel approach that grounds language models within structured representations derived from open-vocabulary scene graphs, dynamically updated as the environment is explored. We tightly interleave these representations with an object-centric action space. The resulting approach, given object detections, is zero-shot, open-vocabulary, and readily extendable to a spectrum of mobile manipulation and household robotic tasks. We demonstrate the effectiveness of MoMa-LLM in a novel semantic interactive search task in large realistic indoor environments. In extensive experiments in both simulation and the real world, we show substantially improved search efficiency compared to conventional baselines and state-of-the-art approaches, as well as its applicability to more abstract tasks. We make the code publicly available at https://moma-llm.cs.uni-freiburg.de.
|
|
16:40-16:45, Paper TuDT14.2 | Add to My Program |
Here's Your PDDL Problem File! On Using VLMs for Generating Symbolic PDDL Problem Files |
|
Aregbede, Victor | Örebro University |
Forte, Paolo | Örebro University |
Gupta, Himanshu | Örebro University |
Andreasson, Henrik | Örebro University |
Köckemann, Uwe | Orebro Universitet |
Lilienthal, Achim J. | Orebro University |
Keywords: AI-Enabled Robotics, Task and Motion Planning
Abstract: Large Language Models (LLMs) excel at generating contextually relevant text but lack logical reasoning abilities. They rely on statistical patterns rather than logical inference, making them unreliable for structured decision-making. Integrating LLMs with task planning can address this limitation by combining their natural language understanding with the precise, goal-oriented reasoning of planners. This paper introduces ViPlan, a hybrid system that leverages Vision Language Models (VLMs) to extract high-level semantic information from visual and textual inputs while integrating classical planners for logical reasoning. ViPlan utilizes VLMs to generate syntactically correct and semantically meaningful PDDL problem files from images and natural language instructions, which are then processed by a task planner to generate an executable plan. The entire process is embedded within a behavior tree framework, enhancing efficiency, reactivity, replanning, modularity, and flexibility. The generation and planning capabilities of ViPlan are empirically evaluated with simulated and real-world experiments.
|
|
16:45-16:50, Paper TuDT14.3 | Add to My Program |
MuST: Multi-Head Skill Transformer for Long-Horizon Dexterous Manipulation with Skill Progress |
|
Gao, Kai | Rutgers University |
Wang, Fan | Amazon Robotics |
Aduh, Erica | Amazon Robotics |
Randle, Dylan Labatt | Amazon Robotics |
Shi, Jane | Amazon |
Keywords: Industrial Robots, Dexterous Manipulation, Learning from Demonstration
Abstract: Robot picking and packing tasks require dexterous manipulation skills, such as rearranging objects to establish a good grasping pose, or placing and pushing items to achieve tight packing. These tasks are challenging for robots due to the complexity and variability of the required actions. To tackle the difficulty of learning and executing long-horizon tasks, we propose a novel framework called the Multi-Head Skill Transformer (MuST). This model is designed to learn and sequentially chain together multiple motion primitives (skills), enabling robots to perform complex sequences of actions effectively. MuST introduces a "progress value" for each skill, guiding the robot on which skill to execute next and ensuring smooth transitions between skills. Additionally, our model is capable of expanding its skill set and managing various sequences of sub-tasks efficiently. Extensive experiments in both simulated and real-world environments demonstrate that MuST significantly enhances the robot's ability to perform long-horizon dexterous manipulation tasks. The accompanying video is available online.
|
|
16:50-16:55, Paper TuDT14.4 | Add to My Program |
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills Using Large Language Models |
|
Ryu, Kanghyun | University of California, Berkeley |
Liao, Qiayuan | University of California, Berkeley |
Li, Zhongyu | University of California, Berkeley |
Delgosha, Payam | UIUC |
Sreenath, Koushil | University of California, Berkeley |
Mehr, Negar | University of California Berkeley |
Keywords: Incremental Learning, Continual Learning, Transfer Learning
Abstract: Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates the achievement of complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across various domains. Our core idea is that large language models (LLMs), with their extensive training on diverse language data and ability to encapsulate world knowledge, present significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enhancing the efficient learning of complex target tasks. CurricuLLM consists of: (Step 1) generating a sequence of subtasks, expressed in natural language, that aid target task learning; (Step 2) translating the natural language descriptions of subtasks into executable task code, including the reward code and goal distribution code; and (Step 3) evaluating trained policies based on trajectory rollouts and the subtask descriptions. We evaluate CurricuLLM in various robotics simulation environments, spanning manipulation, navigation, and locomotion, to show that CurricuLLM can aid the learning of complex robot control tasks. In addition, we validate the humanoid locomotion policy learned through CurricuLLM in the real world. The project website is https://iconlab.negarmehr.com/CurricuLLM/
|
|
16:55-17:00, Paper TuDT14.5 | Add to My Program |
PUGS: Zero-Shot Physical Understanding with Gaussian Splatting |
|
Shuai, Yinghao | Tongji University |
Yu, Ran | Tsinghua University |
Chen, Yuantao | Xi'an University of Architecture and Technology |
Jiang, Zijian | Tongji University |
Song, Xiaowei | Tongji University |
Wang, Nan | Tongji University |
Zheng, Jv | Lightwheel.AI |
Ma, Jianzhu | Tsinghua University |
Yang, Meng | MGI |
Wang, Zhicheng | Tongji University |
Ding, Wenbo | Tsinghua University |
Zhao, Hao | Tsinghua University |
Keywords: Contact Modeling, Semantic Scene Understanding
Abstract: Current robotic systems can understand the categories and poses of objects well, but understanding physical properties such as mass, friction, and hardness in the wild remains challenging. We propose a new method that reconstructs 3D objects using the Gaussian splatting representation and predicts various physical properties in a zero-shot manner. We propose two techniques during the reconstruction phase: a geometry-aware regularization loss function to improve the shape quality and a region-aware feature contrastive loss function to promote region affinity. Two other new techniques are designed during inference: a feature-based property propagation module and a volume integration module tailored for the Gaussian representation. Our framework is named zero-shot physical understanding with Gaussian splatting, or PUGS. PUGS achieves new state-of-the-art results on the standard ABO-500 mass prediction benchmark. We provide extensive quantitative ablations and qualitative visualizations to demonstrate the mechanism of our designs. We show that the proposed methodology can help address challenging real-world grasping tasks. Our codes, data, and models are available at https://github.com/EverNorif/PUGS
|
|
17:00-17:05, Paper TuDT14.6 | Add to My Program |
ViewInfer3D: 3D Visual Grounding Based on Embodied Viewpoint Inference |
|
Geng, Liang | Beijing University of Posts and Telecommunications, Shijiazhuang |
Yin, Jianqin | Beijing University of Posts and Telecommunications |
Keywords: Embodied Cognitive Science, Human-Robot Collaboration, Intention Recognition
Abstract: 3D Visual Grounding (3D VG) is a fundamental task in embodied intelligence that entails robots interpreting natural language descriptions to locate objects within 3D environments. The complexity of this task emerges as robots perceive the spatial relationships of objects differently depending on their observational viewpoints. In this work, we propose ViewInfer3D, a framework that leverages Large Language Models (LLMs) to infer embodied viewpoints, thereby avoiding incorrect observational viewpoints. To enhance the reliability and speed of reasoning from embodied viewpoints, we have designed three sub-strategies: constructing a hierarchical 3D scene graph, implementing embodied viewpoint parsing, and applying scene graph reasoning. Through extensive experiments, we demonstrate that this framework can improve performance in 3D Visual Grounding tasks through embodied viewpoint reasoning. Our framework achieves the best performance among all zero-shot methods on the ScanRefer and Nr3D/Sr3D datasets, without significantly increasing inference time.
|
|
TuDT15 Regular Session, 403 |
Add to My Program |
Surgical Robotics: Planning |
|
|
Chair: Howe, Robert D. | Harvard University |
Co-Chair: Bano, Sophia | University College London |
|
16:35-16:40, Paper TuDT15.1 | Add to My Program |
Image-Guided Surgical Planning for Percutaneous Nephrolithotomy Using CTRs: A Phantom-Based Study |
|
Pedrosa, Filipe | Western University |
Feizi, Navid | Brigham and Women's Hospital |
Sacco, Dianne | Harvard University |
Patel, Rajni | University of Western Ontario |
Jayender, Jagadeesan | Harvard Medical School, Brigham and Women's Hospital |
Keywords: Surgical Robotics: Planning, Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems
Abstract: In this paper, we validate the effectiveness of the optimal planning algorithms we have developed for devising surgical plans for Percutaneous Nephrolithotomy (PCNL) using patient-specific Concentric-Tube Robots (CTRs). To do so, we built a life-sized phantom model of the right hemithorax, replicating the anatomy of a patient who suffered from kidney stones and underwent conventional PCNL. Two-dimensional CT scans of the phantom model and its 3D reconstruction enabled the creation of a surgical plan using our planning algorithms based on a puncture into the mid-pole of the kidney. This plan was compared with two other percutaneous tracts involving punctures into the lower and upper calyces. The optimal mid-pole plan achieved 84% stone coverage, significantly outperforming the lower pole (58%) and upper pole (45%) plans. These results validate the effectiveness of the algorithms and align with simulation-based findings from previous studies, which reported an average volume coverage of 81.6±19.6% in clinical cases.
|
|
16:40-16:45, Paper TuDT15.2 | Add to My Program |
Vision-Based Automatic Control of a Surgical Robot for Posterior Segment Ophthalmic Surgery (I) |
|
Wang, Ning | Xi'an Jiaotong University |
Zhang, Xiaodong | Xi'an Jiaotong University |
Bano, Sophia | University College London |
Stoyanov, Danail | University College London |
Zhang, Hongbing | The First Affiliated Hospital of Northwestern University |
Stilli, Agostino | University College London |
Keywords: Medical Robots and Systems, Surgical Robotics: Planning, Vision-Based Navigation
Abstract: In ophthalmic surgery, especially in posterior segment procedures, clinicians face significant challenges, such as the inherent tremor of the surgeon’s arm, restricted visibility, and heavy reliance on the surgeon’s skills for precise control of hand-held tools during micro-surgical movements. Automatic control of robotic-assisted ophthalmic surgical systems has the potential to overcome these challenges, simplifying complex surgical procedures. This paper proposes a novel image-guided automatic control method for an Ophthalmic micro-Surgical Robot (OmSR), specifically designed for posterior segment eye surgery. The method relies on forceps shadow tracking. The paper introduces a tip detection network (Net-SR), which accurately calculates the coordinates of the Tips of Surgical Forceps (ToSF) and Tips of Shadow (ToS) to enable automatic navigation. Additionally, through Non-Uniform Rational B-Spline (NURBS) curve interpolation and a speed look-ahead algorithm, dense and time-continuous data points are obtained to improve control accuracy and smoothness. The accuracy of the Net-SR network, the motion of the ToSF, and the effectiveness of the proposed automatic controller are experimentally evaluated. Results demonstrate a significant 98.21% improvement in Net-SR network accuracy over a standard keypoint detection network. The use of the speed look-ahead algorithm leads to a notable 41.7% improvement in optimal speed, and the ToSF successfully reaches the target lesion.
|
|
16:45-16:50, Paper TuDT15.3 | Add to My Program |
ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-Assisted Endoscopic Submucosal Dissection |
|
Xu, Mengya | National University of Singapore |
Mo, Wenjin | Sun Yat-Sen University |
Wang, Guankun | The Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Wang, An | The Chinese University of Hong Kong |
Bai, Long | The Chinese University of Hong Kong |
Li, Zhen | Qilu Hospital of Shandong University |
Yang, Xiaoxiao | Qilu Hospital of Shandong University |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore(NUS) |
Keywords: Surgical Robotics: Planning, Data Sets for Robotic Vision, AI-Enabled Robotics
Abstract: Robot-assisted Endoscopic Submucosal Dissection (ESD) improves the surgical procedure by providing a more comprehensive view through advanced robotic instruments and bimanual operation, thereby enhancing dissection efficiency and accuracy. Accurate prediction of dissection trajectories is crucial for better decision-making, reducing intraoperative errors, and improving surgical training. Nevertheless, predicting these trajectories is challenging due to variable tumor margins and dynamic visual conditions. To address this issue, we create the ESD Trajectory and Confidence Map-based Safety Margin (ETSM) dataset with 1849 short clips, focusing on submucosal dissection with a dual-arm robotic system. We also introduce a framework that combines optimal dissection trajectory prediction with a confidence map-based safety margin, providing a more secure and intelligent decision-making tool to minimize surgical risks for ESD procedures. Additionally, we propose the Regression-based Confidence Map Prediction Network (RCMNet), which utilizes a regression approach to predict confidence maps for dissection areas, thereby delineating various levels of safety margins. We evaluate our RCMNet using three distinct experimental setups: in-domain evaluation, robustness assessment, and out-of-domain evaluation. Experimental results show that our approach excels in the confidence map-based safety margin prediction task, achieving a mean absolute error (MAE) of only 3.18. To the best of our knowledge, this is the first study to apply a regression approach for visual guidance concerning delineating varying safety levels of dissection areas. Our approach bridges gaps in current research by improving prediction accuracy and enhancing the safety of the dissection process, showing great clinical significance in practice. The dataset and code will be made available.
|
|
16:50-16:55, Paper TuDT15.4 | Add to My Program |
Partial-To-Full Registration Based on Gradient-SDF for Computer-Assisted Orthopedic Surgery |
|
Li, Tiancheng | University of Technology Sydney |
Walker, Peter | Concord Repatriation General Hospital |
Danial, Hammoud | Concord Repatriation General Hospital |
Zhao, Liang | The University of Edinburgh |
Huang, Shoudong | University of Technology, Sydney |
Keywords: Surgical Robotics: Planning
Abstract: In computer-assisted orthopedic surgery (CAOS), accurate pre-operative to intra-operative bone registration is an essential and critical requirement for providing navigational guidance. This registration process is challenging since the intra-operative 3D points are sparse, only partially overlapped with the pre-operative model, and disturbed by noise and outliers. The method commonly used in current state-of-the-art orthopedic robotic systems is bony landmark-based registration, but it is very time-consuming for surgeons. To address these issues, we propose a novel partial-to-full registration framework based on gradient-SDF for CAOS. Simulation experiments using bone models from publicly available datasets and phantom experiments performed under both optical tracking and electromagnetic tracking systems demonstrate that the proposed method provides more accurate results than standard benchmarks and is robust to 90% outliers. Importantly, our method achieves convergence in less than 1 second in real scenarios and mean target registration error values as low as 2.198 mm for the entire bone model. Finally, it only requires random acquisition of points for registration by moving a surgical probe over the bone surface, without correspondence with any specific bony landmarks, thus showing significant potential clinical value. The code of the framework is available.
|
|
16:55-17:00, Paper TuDT15.5 | Add to My Program |
Sampling-Based Model Predictive Control for Volumetric Ablation in Robotic Laser Surgery |
|
Wang, Vincent | Duke University |
Prakash, Ravi | Duke University |
Oca, Siobhan | Duke University |
LoCicero, Ethan | Duke University |
Codd, Patrick | Duke University |
Bridgeman, Leila | Duke University |
Keywords: Surgical Robotics: Planning, Constrained Motion Planning, Integrated Planning and Control
Abstract: Laser-based surgical ablation relies heavily on surgeon involvement, restricting precision to the limits of human error and perception. The interaction between laser and tissue is governed by various laser parameters that control the laser irradiance on the tissue, including the power, distance, spot size, orientation, and exposure time. This complex interaction lends itself to robotic automation, allowing the surgeon to focus on high-level tasks, such as choosing the region and method of ablation, while the lower-level ablation plan can be handled autonomously. This paper describes a sampling-based model predictive control (MPC) scheme to plan ablation sequences for arbitrary tissue volumes. Using a steady-state point ablation model to simulate a single laser-tissue interaction, a random search technique explores the reachable state space while preserving sensitive tissue regions. The sampled MPC strategy provides an ablation sequence that accounts for parameter uncertainty without violating constraints, such as avoiding nerve bundles.
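As a purely illustrative sketch of the random-shooting flavor of sampling-based MPC referred to in the abstract above, the Python fragment below samples candidate laser-power sequences, rolls them out through a model, discards infeasible ones, and applies the first action of the best sequence. The one-dimensional ablation-depth model, cost, and constraint here are invented placeholders, not the paper's steady-state laser-tissue model.

import numpy as np

rng = np.random.default_rng(0)

def step(depth, power, dt=0.1, k=0.05):
    # Placeholder point-ablation model: ablated depth grows with applied laser power.
    return depth + k * power * dt

def cost(depth, target_depth):
    # Penalize deviation from the desired ablation depth.
    return (depth - target_depth) ** 2

def violates(depth, max_depth):
    # Constraint placeholder, e.g. never ablate past sensitive tissue.
    return depth > max_depth

def random_shooting_mpc(depth0, target_depth, max_depth,
                        horizon=10, n_samples=256, power_max=5.0):
    # One receding-horizon step: sample power sequences, roll out the model,
    # discard infeasible sequences, and keep the lowest-cost feasible one.
    best_cost, best_seq = np.inf, None
    for _ in range(n_samples):
        powers = rng.uniform(0.0, power_max, size=horizon)
        depth, total, feasible = depth0, 0.0, True
        for p in powers:
            depth = step(depth, p)
            if violates(depth, max_depth):
                feasible = False
                break
            total += cost(depth, target_depth)
        if feasible and total < best_cost:
            best_cost, best_seq = total, powers
    # Apply only the first action, then replan at the next step.
    return None if best_seq is None else best_seq[0]

print(random_shooting_mpc(depth0=0.0, target_depth=1.0, max_depth=1.2))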
|
|
17:00-17:05, Paper TuDT15.6 | Add to My Program |
SuFIA-BC: Generating High Quality Demonstration Data for Visuomotor Policy Learning in Surgical Subtasks |
|
Moghani, Masoud | University of Toronto |
Nelson, Nigel | NVIDIA |
Ghanem, Mohamed | Georgia Institute of Technology |
Diaz-Pinto, Andres | NVIDIA |
Hari, Kush | UC Berkeley |
Azizian, Mahdi | Intuitive Surgical |
Goldberg, Ken | UC Berkeley |
Huver, Sean | NVIDIA |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Surgical Robotics: Planning, Surgical Robotics: Laparoscopy, Medical Robots and Systems
Abstract: Behavior cloning facilitates the learning of dexterous manipulation skills, yet the complexity of surgical environments, the difficulty and expense of obtaining patient data, and robot calibration errors present unique challenges for surgical robot learning. We provide an enhanced surgical digital twin with photorealistic human anatomical organs, integrated into a comprehensive simulator designed to generate high-quality synthetic data to solve fundamental tasks in surgical autonomy. We present SuFIA-BC: visual Behavior Cloning policies for Surgical First Interactive Autonomy Assistants. We investigate visual observation spaces including multi-view cameras and 3D visual representations extracted from a single endoscopic camera view. Through systematic evaluation, we find that the diverse set of photorealistic surgical tasks introduced in this work enables a comprehensive evaluation of prospective behavior cloning models for the unique challenges posed by surgical environments. We observe that current state-of-the-art behavior cloning techniques struggle to solve the contact-rich and complex tasks evaluated in this work, regardless of their underlying perception or control architectures. These findings highlight the importance of customizing perception pipelines and control architectures, as well as curating larger-scale synthetic datasets that meet the specific demands of surgical tasks. Project website: orbit-surgical.github.io/sufia-bc/
|
|
TuDT16 Regular Session, 404 |
Add to My Program |
Manipulation 4 |
|
|
Chair: Agrawal, Pulkit | MIT |
Co-Chair: Bauza Villalonga, Maria | Massachusetts Institute of Technology |
|
16:35-16:40, Paper TuDT16.1 | Add to My Program |
Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation |
|
Chen, Tao | Massachusetts Institute of Technology |
Cousineau, Eric | Toyota Research Institute |
Kuppuswamy, Naveen | Toyota Research Institute |
Agrawal, Pulkit | MIT |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Reinforcement Learning
Abstract: Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various constraints on the reorientation controller, such as the requirement for the hand to securely hold the object after reorientation for peeling. We propose a simple system for learning a reorientation controller that facilitates the subsequent peeling task.
|
|
16:40-16:45, Paper TuDT16.2 | Add to My Program |
Prompt-Responsive Object Retrieval with Memory-Augmented Student-Teacher Learning |
|
Mosbach, Malte | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Reinforcement Learning, Dexterous Manipulation, Grasping
Abstract: Building models responsive to input prompts represents a transformative shift in machine learning. This paradigm holds significant potential for robotics problems, such as targeted manipulation amidst clutter. In this work, we present a novel approach to combine promptable foundation models with reinforcement learning (RL), enabling robots to perform dexterous manipulation tasks in a prompt-responsive manner. Existing methods struggle to link high-level commands with fine-grained dexterous control. We address this gap with a memory-augmented student-teacher learning framework. We use the Segment-Anything 2 model as a perception backbone to infer an object of interest from user prompts. While detections are imperfect, their temporal sequence provides rich information for implicit state estimation by memory-augmented models. Our approach successfully learns prompt-responsive policies, demonstrated in picking objects from cluttered scenes. Videos and code are available at https://memory-student-teacher.github.io.
|
|
16:45-16:50, Paper TuDT16.3 | Add to My Program |
Implicit Articulated Robot Morphology Modeling with Configuration Space Neural Signed Distance Functions |
|
Chen, Yiting | Rice University |
Gao, Xiao | École Polytechnique Fédérale De Lausanne |
Yao, Kunpeng | Massachusetts Institute of Technology |
Niederhauser, Loïc | EPFL |
Bekiroglu, Yasemin | Chalmers University of Technology, University College London |
Billard, Aude | EPFL |
Keywords: Manipulation Planning, Collision Avoidance, Grasping
Abstract: In this paper, we introduce a novel approach to implicitly encode precise robot morphology using forward kinematics based on a configuration space signed distance function. Our proposed Robot Neural Distance Function (RNDF) optimizes the balance between computational efficiency and accuracy for signed distance queries conditioned on the robot's configuration for each link. Compared to the baseline method, the proposed approach achieves an 81.1% reduction in distance error while utilizing only 47.6% of model parameters. Its parallelizable and differentiable nature provides direct access to joint-space derivatives, enabling a seamless connection between robot planning in Cartesian task space and configuration space. These features make RNDF an ideal surrogate model for general robot optimization and learning in 3D spatial planning tasks. Specifically, we apply RNDF to robotic arm-hand modeling and demonstrate its potential as a core platform for whole-arm, collision-free grasp planning in cluttered environments. The code and model are available at https://github.com/robotic-manipulation/RNDF.
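For readers unfamiliar with configuration-conditioned signed distance networks, the PyTorch sketch below shows the general input/output shape of such a model: joint angles and a Cartesian query point go in, one signed distance per link comes out, and joint-space gradients are available by backpropagation. The layer sizes and the 7-joint/8-link assumption are arbitrary; this is not the released RNDF architecture.

import torch
import torch.nn as nn

class ConfigSDF(nn.Module):
    # Toy configuration-conditioned SDF: (joint angles, query point) -> per-link distances.
    def __init__(self, n_joints=7, n_links=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joints + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_links),   # one signed distance per link
        )

    def forward(self, q, p):
        # q: (B, n_joints) joint configuration, p: (B, 3) Cartesian query point
        return self.net(torch.cat([q, p], dim=-1))

model = ConfigSDF()
q = torch.zeros(1, 7, requires_grad=True)
p = torch.tensor([[0.3, 0.0, 0.5]])
d = model(q, p)                      # (1, 8) signed distances
d.min().backward()                   # differentiable w.r.t. the configuration
print(d.shape, q.grad.shape)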
|
|
16:50-16:55, Paper TuDT16.4 | Add to My Program |
A Data-Efficient Progressive Learning Framework for Robot Scooping Task |
|
Wang, Shuai | Tencent |
Entang, Wang | Saarland University |
Huang, Bidan | Tencent |
Zhang, Chong | Tencent |
Wang, Wei | Harbin Institute of Technology, Shenzhen |
Zheng, Yu | Tencent |
Keywords: Manipulation Planning, Grippers and Other End-Effectors
Abstract: Robot scooping is a challenging and important task in robotic tool manipulation research due to the complex relationship between the robot, the tool, and the target objects and environment. Across different tools, different target objects, and varying environments, the required scooping manipulation strategy usually varies greatly. Even for a specific type of spoon, the question of how to obtain a policy model that requires less demonstration data but shows better generalization capabilities deserves further exploration. In this paper, we propose a progressive learning framework for general robot scooping tasks, which requires a limited number of demonstrations but shows promising generalization capability. We first learn a scooping policy via human demonstrations with a specific setup. We then use this as a pre-trained model for reinforcement learning in a curriculum manner to achieve a scooping strategy that generalizes to different task setups. Finally, we evaluate the capabilities of the policy with a series of experiments both in simulation and on a real robot.
|
|
16:55-17:00, Paper TuDT16.5 | Add to My Program |
Manipulability Transfer and Tracking Control: Bridging Domain Adaptation with Predictive Feasibility |
|
Gong, Yuhe | University of Nottingham |
Xing, Hao | Technical University of Munich (TUM) |
Guo, Yu | Technical University of Munich |
Figueredo, Luis | University of Nottingham (UoN) |
Keywords: Manipulation Planning, Learning from Demonstration, Human Factors and Human-in-the-Loop
Abstract: This paper introduces a novel framework for improving human-to-robot manipulability transfer and tracking in Learning by Demonstration. Our approach addresses key challenges, including manipulability ellipsoid (ME) domain adaptation between different kinematic structures, ME-IK feasibility checks and optimization across trajectories accounting for the robot's redundancy, and introducing a manipulability-aware control strategy. Leveraging a unified quadratic programming control with vector-field inequalities, our method enables robust tracking and optimization of manipulability, accommodating multiple demonstrations and the inherent variability in task execution. Experimental results demonstrate superior performance in precise tracking and force generation compared to traditional methods, highlighting the advantages of incorporating human implicit information for more effective robot control.
|
|
17:00-17:05, Paper TuDT16.6 | Add to My Program |
Adaptive Contact-Rich Manipulation through Few-Shot Imitation Learning with Force-Torque Feedback and Pre-Trained Object Representations |
|
Tsuji, Chikaha | The University of Tokyo |
Coronado, Enrique | National Institute of Advanced Industrial Science and Technology |
Osorio, Pablo | Tokyo University of Agriculture and Technology |
Venture, Gentiane | The University of Tokyo |
Keywords: Deep Learning in Grasping and Manipulation, Imitation Learning, Force Control
Abstract: Imitation learning offers a pathway for robots to perform repetitive tasks, allowing humans to focus on more engaging and meaningful activities. However, challenges arise from the need for extensive demonstrations and the disparity between training and real-world environments. This paper focuses on contact-rich tasks like wiping with soft and deformable objects, requiring adaptive force control to handle variations in wiping surface height and the sponge's physical properties. To address these challenges, we propose a novel method that integrates real-time force-torque (FT) feedback with pre-trained object representations. This approach allows robots to dynamically adjust to previously unseen changes in surface heights and sponges' physical properties. In real-world experiments, our method achieved 96% accuracy in applying the average reference force, significantly outperforming the previous method that lacked an FT feedback loop, which only achieved 4% accuracy. To evaluate the adaptability of our approach, we conducted experiments under different conditions from the training setup, involving 40 scenarios using 10 sponges with varying physical properties and 4 types of wiping surface heights, demonstrating significant improvements in the robot's adaptability by analyzing force trajectories.
|
|
TuDT17 Regular Session, 405 |
Add to My Program |
Localization 2 |
|
|
Chair: Dellaert, Frank | Georgia Institute of Technology |
Co-Chair: Kim, Jinwhan | KAIST |
|
16:35-16:40, Paper TuDT17.1 | Add to My Program |
GS-EVT: Cross-Modal Event Camera Tracking Based on Gaussian Splatting |
|
Liu, Tao | ShanghaiTech University |
Yuan, Runze | Shanghaitech University |
Ju, Yiang | Shanghaitech |
Xu, Xun | ShanghaiTech University |
Yang, Jiaqi | ShanghaiTech University |
Meng, Xiangting | ShanghaiTech University |
Lagorce, Xavier | ShanghaiTech University |
Kneip, Laurent | ShanghaiTech University |
Keywords: Localization, SLAM, Deep Learning for Visual Perception
Abstract: Reliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking, thereby providing a solution with inherent robustness under difficult dynamics and illumination. In order to circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way: it tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of Gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key to our approach is a novel pose parametrization that uses a reference pose plus first-order dynamics for local differential image rendering. The latter is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of Gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.
|
|
16:40-16:45, Paper TuDT17.2 | Add to My Program |
A Coarse-To-Fine Event-Based Framework for Camera Pose Relocalization with Spatio-Temporal Retrieval and Refinement Network |
|
Song, Yuhang | Northeastern University - China |
Zhuang, Hao | Northeastern University |
Jiang, Junjie | Northeastern University |
Liu, Zuntao | Northeastern University of China |
Fang, Zheng | Northeastern University |
Keywords: Localization, SLAM, Deep Learning for Visual Perception
Abstract: Most existing event-based camera pose relocalization (CPR) learning methods implicitly encode environmental information into network parameters to achieve end-to-end mapping from event streams to poses. However, these end-to-end CPR methods fail to utilize prior environmental information effectively. As the scale of the environment increases, the difficulty of this mapping grows significantly, reducing the robustness of end-to-end methods across different scenarios. To address these issues, this paper proposes the first coarse-to-fine event-based CPR framework, which shifts the paradigm from end-to-end pose regression networks to a hierarchical approach. In the coarse localization stage, we effectively encode similarity features by incorporating fine-grained temporal information, achieving accurate retrieval of nearby event streams. In the pose refinement stage, we present an Event Spatio-temporal Pose Refinement Network (ESPR-Net) based on the Recurrent Convolutional Neural Network (RCNN) architecture, which is capable of learning more nuanced spatio-temporal features to achieve accurate regression of the relative pose. Finally, we conducted a comprehensive comparison on the IJRR and M3ED datasets, achieving state-of-the-art (SOTA) performance on both. Notably, our method attains a significant 83% performance improvement on the outdoor M3ED dataset.
|
|
16:45-16:50, Paper TuDT17.3 | Add to My Program |
Digital Beamforming Enhanced Radar Odometry |
|
Jiang, Jingqi | Imperial College London |
Xu, Shida | Imperial College London |
Zhang, Kaicheng | Imperial College London |
Wei, Jiyuan | Imperial College London |
Wang, Jingyang | Tsinghua University |
Wang, Sen | Imperial College London |
Keywords: Localization, Mapping, SLAM
Abstract: Radar has become an essential sensor for autonomous navigation, especially in challenging environments where camera and LiDAR sensors fail. 4D single-chip millimeter-wave radar systems, in particular, have drawn increasing attention thanks to their ability to provide spatial and Doppler information with low hardware cost and power consumption. However, most single-chip radar systems using traditional signal processing, such as Fast Fourier Transform, suffer from limited spatial resolution in radar detection, significantly limiting the performance of radar-based odometry and Simultaneous Localization and Mapping (SLAM) systems. In this paper, we develop a novel radar signal processing pipeline that integrates spatial domain beamforming techniques, and extend it to 3D Direction of Arrival estimation. Experiments using public datasets are conducted to evaluate and compare the performance of our proposed signal processing pipeline against traditional methodologies. These tests specifically focus on assessing structural precision across diverse scenes and measuring odometry accuracy in different radar odometry systems. This research demonstrates the feasibility of achieving more accurate radar odometry by simply replacing the standard FFT-based processing with the proposed pipeline. The codes are available at GitHub.
|
|
16:50-16:55, Paper TuDT17.4 | Add to My Program |
Fast Global Localization on Neural Radiance Field |
|
Kong, Mangyu | Yonsei University |
Lee, Jaewon | Yonsei University |
Lee, Seongwon | Kookmin University |
Kim, Euntai | Yonsei University |
Keywords: Localization, Mapping, SLAM
Abstract: Neural Radiance Fields (NeRF) presented a novel way to represent scenes, allowing for high-quality 3D reconstruction from 2D images. Following its remarkable achievements, global localization within NeRF maps is an essential task for enabling a wide range of applications. Recently, Loc-NeRF demonstrated a localization approach that combines traditional Monte Carlo Localization with NeRF, showing promising results for using NeRF as an environment map. However, despite its advancements, Loc-NeRF encounters the challenge of a time-intensive ray rendering process, which can be a significant limitation in practical applications. To address this issue, we introduce Fast Loc-NeRF, which enhances efficiency and accuracy in NeRF map-based global localization. We propose a particle rejection weighting strategy that estimates the uncertainty of particles by leveraging NeRF’s inherent characteristics and incorporates them into the particle weighting process to reject abnormal particles. Additionally, Fast Loc-NeRF employs a coarse-to-fine approach, matching rendered pixels and observed images across multiple resolutions from low to high. As a result, it speeds up the costly particle update process while enhancing precise localization results. Our Fast Loc-NeRF establishes new state-of-the-art localization performance on several benchmarks, demonstrating both its accuracy and efficiency.
|
|
16:55-17:00, Paper TuDT17.5 | Add to My Program |
Continuous-Time Radar-Inertial and Lidar-Inertial Odometry Using a Gaussian Process Motion Prior |
|
Burnett, Keenan | University of Toronto |
Schoellig, Angela P. | TU Munich |
Barfoot, Timothy | University of Toronto |
Keywords: Localization, Mapping, Range Sensing, Continuous-Time
Abstract: In this work, we demonstrate continuous-time radar-inertial and lidar-inertial odometry using a Gaussian process motion prior. Using a sparse prior, we demonstrate improved computational complexity during preintegration and interpolation. We use a white-noise-on-acceleration motion prior and treat the gyroscope as a direct measurement of the state while preintegrating accelerometer measurements to form relative velocity factors. Our odometry is implemented using sliding-window batch trajectory estimation. To our knowledge, our work is the first to demonstrate radar-inertial odometry with a spinning mechanical radar using both gyroscope and accelerometer measurements. We improve the performance of our radar odometry by 43% by incorporating an IMU. Our approach is efficient and we demonstrate real-time performance. Code for this paper can be found at: github.com/utiasASRL/steam_icp
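For reference, the white-noise-on-acceleration prior mentioned above has a simple closed form in a plain vector space: the state stacks position and velocity, acceleration is modeled as zero-mean white noise with power spectral density Qc, and the resulting transition matrix and process-noise covariance over a step dt are written out in the numpy sketch below. The paper itself applies the prior to the full SE(3) state with gyroscope and preintegrated accelerometer factors, so treat this only as the textbook vector-space form.

import numpy as np

def wnoa_prior(dt, Qc):
    # White-noise-on-acceleration prior for state x = [position, velocity].
    # Continuous model: d(position)/dt = velocity, d(velocity)/dt = w(t),
    # with w(t) zero-mean white noise of power spectral density Qc.
    # Returns the transition matrix Phi and process-noise covariance Q over dt.
    d = Qc.shape[0]                     # dimension of the position/velocity blocks
    I = np.eye(d)
    Phi = np.block([[I, dt * I],
                    [np.zeros((d, d)), I]])
    Q = np.block([[dt**3 / 3 * Qc, dt**2 / 2 * Qc],
                  [dt**2 / 2 * Qc, dt * Qc]])
    return Phi, Q

# Example: propagate a 3D position/velocity mean and covariance over 0.1 s.
Qc = 0.01 * np.eye(3)
Phi, Q = wnoa_prior(0.1, Qc)
x = np.concatenate([np.zeros(3), np.array([1.0, 0.0, 0.0])])   # moving at 1 m/s along x
P = 1e-4 * np.eye(6)
x_next = Phi @ x
P_next = Phi @ P @ Phi.T + Q
print(x_next[:3])   # predicted position after 0.1 s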
|
|
17:00-17:05, Paper TuDT17.6 | Add to My Program |
NV-LIOM: LiDAR-Inertial Odometry and Mapping Using Normal Vectors towards Robust SLAM in Multifloor Environments |
|
Chung, Dongha | Stradvision |
Kim, Jinwhan | KAIST |
Keywords: Localization, Mapping, SLAM
Abstract: Over the last few decades, numerous LiDAR-inertial odometry (LIO) algorithms have been developed, demonstrating satisfactory performance across diverse environments. Most of these algorithms have predominantly been validated in open outdoor environments; however, they often encounter challenges in confined indoor settings. In such indoor environments, reliable point cloud registration becomes problematic due to the rapid changes in LiDAR scans and repetitive structural features like walls and stairs, particularly in multifloor buildings. In this paper, we present NV-LIOM, a normal vector-based LiDAR-inertial odometry and mapping framework focused on robust point cloud registration designed for indoor multifloor environments. Our approach extracts the normal vectors from the LiDAR scans and utilizes them for correspondence search to enhance the point cloud registration performance. To ensure robust registration, the distribution of the normal vector directions is analyzed, and situations of degeneracy are examined to adjust the matching uncertainty. Additionally, a viewpoint-based loop closure module is implemented to avoid wrong correspondences that are blocked by the walls. The proposed method is tested through public datasets and our own dataset. To contribute to the community, the code will be made public on https://github.com/dhchung/nv_liom.
|
|
TuDT18 Regular Session, 406 |
Add to My Program |
Place Recognition 2 |
|
|
Chair: Smith, Stephen L. | University of Waterloo |
Co-Chair: Aravecchia, Stephanie | Georgia Tech Lorraine - IRL 2958 GT-CNRS |
|
16:35-16:40, Paper TuDT18.1 | Add to My Program |
SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition |
|
Goswami, Raktim | New York University |
Patel, Naman | New York University Tandon School of Engineering |
Krishnamurthy, Prashanth | New York University Tandon School of Engineering |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Localization, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: Large-scale LiDAR mapping and localization leverage place recognition techniques to mitigate odometry drift, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real time.
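Purely as an illustration of the "pool local descriptors into tokens, then aggregate them into a scene descriptor" pattern described above, and not SALSA's actual layers, a minimal PyTorch pooling head might look like the following sketch; the learned-query attention mechanism and all dimensions are assumptions.

import torch
import torch.nn as nn

class AttentionPoolDescriptor(nn.Module):
    # Toy pooling of per-point local descriptors into a fixed-size scene descriptor.
    def __init__(self, dim=64, n_tokens=8, out_dim=256):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_tokens, dim))   # learned pooling queries
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(n_tokens * dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, local_desc):
        # local_desc: (B, N_points, dim) per-point descriptors from a backbone
        B = local_desc.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        tokens, _ = self.attn(q, local_desc, local_desc)           # (B, n_tokens, dim)
        scene = self.mlp(tokens.flatten(1))                        # (B, out_dim)
        return nn.functional.normalize(scene, dim=-1)              # unit-norm scene descriptor

pool = AttentionPoolDescriptor()
desc = pool(torch.randn(2, 1024, 64))   # 2 scans, 1024 local descriptors each
print(desc.shape)                       # torch.Size([2, 256])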
|
|
16:40-16:45, Paper TuDT18.2 | Add to My Program |
HeRCULES: Heterogeneous Radar Dataset in Complex Urban Environment for Multi-Session Radar SLAM |
|
Kim, Hanjun | Seoul National University |
Jung, Minwoo | Seoul National University |
Noh, Chiyun | Seoul National University |
Jung, Sangwoo | Seoul National University |
Song, Hyunho | Seoul National University |
Yang, Wooseong | Seoul National University |
Jang, Hyesu | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Data Sets for SLAM, SLAM, Range Sensing
Abstract: Recently, radars have been widely featured in robotics for their robustness in challenging weather conditions. Two commonly used radar types are spinning radars and phased-array radars, each offering distinct sensor characteristics. Existing datasets typically feature only a single type of radar, leading to the development of algorithms limited to that specific kind. In this work, we highlight that combining different radar types offers complementary advantages, which can be leveraged through a heterogeneous radar dataset. Moreover, this new dataset fosters research in multi-session and multi-robot scenarios where robots are equipped with different types of radars. In this context, we introduce the HeRCULES dataset, a comprehensive, multi-modal dataset with heterogeneous radars, FMCW LiDAR, IMU, GPS, and cameras. This is the first dataset to integrate 4D radar and spinning radar alongside FMCW LiDAR, offering unparalleled localization, mapping, and place recognition capabilities. The dataset covers diverse weather and lighting conditions and a range of urban traffic scenarios, enabling a comprehensive analysis across various environments. The sequence paths with multiple revisits and ground truth pose for each sensor enhance its suitability for place recognition research. We expect the HeRCULES dataset to facilitate odometry, mapping, place recognition, and sensor fusion research. The dataset and development tools are available at https://sites.google.com/view/herculesdataset.
|
|
16:45-16:50, Paper TuDT18.3 | Add to My Program |
NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments |
|
Pan, Taiyi | New York University |
He, Junyang | University of Virginia |
Chen, Chao | New York University |
Li, Yiming | New York University |
Feng, Chen | New York University |
Keywords: Data Sets for Robotic Vision, Localization, Computer Vision for Transportation
Abstract: Visual place recognition (VPR) enables autonomous robots to identify previously visited locations, which contributes to tasks like simultaneous localization and mapping (SLAM). VPR faces challenges such as accurate image neighbor retrieval and appearance change in scenery. Event cameras, also known as dynamic vision sensors, are a new sensor modality for VPR and offer a promising solution to the challenges with their unique attributes: high temporal resolution (1MHz clock), ultra-low latency (in μs), and high dynamic range (>120dB). These attributes make event cameras less susceptible to motion blur and more robust in variable lighting conditions, making them suitable for addressing VPR challenges. However, the scarcity of event-based VPR datasets, partly due to the novelty and cost of event cameras, hampers their adoption. To fill this data gap, our paper introduces the NYC-Event-VPR dataset to the robotics and computer vision communities, featuring the Prophesee IMX636 HD event sensor (1280x720 resolution), combined with RGB camera and GPS module. It encompasses over 13 hours of geotagged event data, spanning 260 kilometers across New York City, covering diverse lighting and weather conditions, day/night scenarios, and multiple visits to various locations. Furthermore, our paper employs three frameworks to conduct generalization performance assessments, promoting innovation in event-based VPR and its integration into robotics applications.
|
|
16:50-16:55, Paper TuDT18.4 | Add to My Program |
ZeroSCD: Zero-Shot Street Scene Change Detection |
|
Kannan, Shyam Sundar | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Mapping
Abstract: Scene Change Detection is a challenging task in computer vision and robotics that aims to identify differences between two images of the same scene captured at different times. Traditional change detection methods rely on training models that take these image pairs as input and estimate the changes, which requires large amounts of annotated data, a costly and time-consuming process. To overcome this, we propose ZeroSCD, a zero-shot scene change detection framework that eliminates the need for training. ZeroSCD leverages pre-existing models for place recognition and semantic segmentation, utilizing their features and outputs to perform change detection. In this framework, features extracted from the place recognition model are used to estimate correspondences and detect changes between the two images. These are then combined with segmentation results from the semantic segmentation model to precisely delineate the boundaries of the detected changes. Extensive experiments on benchmark datasets demonstrate that ZeroSCD outperforms several state-of-the-art methods in change detection accuracy, despite not being trained on any of the benchmark datasets, proving its effectiveness and adaptability across different scenarios.
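As a rough, training-free sketch of the recipe described above, with the place-recognition backbone and semantic segmenter stubbed out (they are not the models ZeroSCD actually uses) and the two views assumed to be already aligned (the paper additionally estimates correspondences from the features), one possible composition in Python is:

import numpy as np

def dense_features(image):
    # Stub for a pre-trained place-recognition backbone returning (H/8, W/8, C) features.
    h, w = image.shape[:2]
    return np.random.rand(h // 8, w // 8, 128)

def segment(image):
    # Stub for a pre-trained semantic segmenter returning an (H, W) label map.
    return np.zeros(image.shape[:2], dtype=np.int32)

def change_mask(img_t0, img_t1, sim_threshold=0.6, overlap=0.3):
    # Flag feature cells whose descriptors disagree across time, then refine with segments.
    f0, f1 = dense_features(img_t0), dense_features(img_t1)
    f0 = f0 / np.linalg.norm(f0, axis=-1, keepdims=True)
    f1 = f1 / np.linalg.norm(f1, axis=-1, keepdims=True)
    coarse = (f0 * f1).sum(axis=-1) < sim_threshold   # low cosine similarity -> candidate change
    # Upsample the coarse mask back to image resolution (features are at 1/8 scale here).
    full = np.kron(coarse, np.ones((8, 8), dtype=bool)).astype(bool)
    full = full[:img_t1.shape[0], :img_t1.shape[1]]
    # Keep whole segments whose area is sufficiently covered by the coarse change mask.
    labels = segment(img_t1)
    refined = np.zeros_like(full)
    for lab in np.unique(labels):
        region = labels == lab
        if full[region].mean() > overlap:
            refined |= region
    return refined

mask = change_mask(np.zeros((256, 256, 3)), np.ones((256, 256, 3)))
print(mask.shape, mask.sum())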
|
|
16:55-17:00, Paper TuDT18.5 | Add to My Program |
SPR: Single-Scan Radar Place Recognition |
|
Casado Herraez, Daniel | University of Bonn & CARIAD SE |
Chang, Le | University of Stuttgart |
Zeller, Matthias | CARIAD SE |
Wiesmann, Louis | University of Bonn |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Stachniss, Cyrill | University of Bonn |
Keywords: Localization, SLAM, Autonomous Vehicle Navigation
Abstract: Localization is a crucial component for the navigation of autonomous vehicles. It encompasses global localization and place recognition, allowing a system to identify locations that have been mapped or visited before. Place recognition is commonly approached using cameras or LiDARs. However, these sensors are affected by bad weather or low lighting conditions. In this paper, we exploit automotive radars to address the problem of localizing a vehicle within a map using single radar scans. The effectiveness of radars is not dependent on environmental conditions, and they provide additional information not present in LiDARs such as Doppler velocity and radar cross section. However, the sparse and noisy radar measurement makes place recognition a challenge. Recent research in automotive radars addresses the sensor's limitations by aggregating multiple radar scans and using high-dimensional scene representations. We, in contrast, propose a novel neural network architecture that focuses on each point of single radar scans, without relying on an additional odometry input for scan aggregation. We extract pointwise local and global features, resulting in a compact scene descriptor vector. Our model improves local feature extraction by estimating the importance of each point for place recognition and enhances the global descriptor by leveraging the radar cross section information provided by the sensor. We evaluate our model using nuScenes and the 4DRadarDataset, which involve 2D and 3D automotive radar sensors. Our findings illustrate that our approach achieves state-of-the-art results for single-scan place recognition using automotive radars.
|
|
17:00-17:05, Paper TuDT18.6 | Add to My Program |
Improving Visual Place Recognition Based Robot Navigation by Verifying Localization Estimates |
|
Claxton, Owen Thomas | Queensland University of Technology |
Malone, Connor | Queensland University of Technology |
Carson, Helen | Queensland University of Technology |
Ford, Jason | Queensland University of Technology |
Bolton, Gabriel Joseph | Australian National University |
Shames, Iman | The Australian National University |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization, Acceptability and Trust, Vision-Based Navigation
Abstract: Visual Place Recognition (VPR) systems often have imperfect performance, affecting the `integrity' of position estimates and subsequent robot navigation decisions. Previously, SVM classifiers have been used to monitor VPR integrity. This research introduces a novel Multi-Layer Perceptron (MLP) integrity monitor which demonstrates improved performance and generalizability, removing per-environment training and reducing manual tuning requirements. We test our proposed system in extensive real-world experiments, presenting two real-time integrity-based VPR verification methods: a single-query rejection method for robot navigation to a goal zone (Experiment 1); and a history-of-queries method that takes a best, verified, match from its recent trajectory and uses an odometer to extrapolate a current position estimate (Experiment 2). Noteworthy results for Experiment 1 include a decrease in aggregate mean along-track goal error from ≈9.8m to ≈3.1m, and an increase in the aggregate rate of successful mission completion from ≈41% to ≈55%. Experiment 2 showed a decrease in aggregate mean along-track localization error from ≈2.0m to ≈0.5m, and an increase in the aggregate localization precision from ≈97% to ≈99%. Overall, our results demonstrate the practical usefulness of a VPR integrity monitor in real-world robotics to improve VPR localization and consequent navigation performance.
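A minimal PyTorch sketch of an MLP-style integrity monitor as described above: a small classifier maps per-query match statistics to a probability that the VPR match is correct, and low-confidence estimates are rejected. The input feature set, network size, and 0.5 acceptance threshold are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class IntegrityMonitor(nn.Module):
    """Small MLP predicting whether a VPR localization estimate is trustworthy.
    Inputs are per-query statistics (e.g. best-match distance, ratio to the
    second-best match, spatial-consistency scores); these are placeholders."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # probability the match is correct

# Usage: reject a localization estimate when the monitor is not confident.
monitor = IntegrityMonitor()
features = torch.randn(1, 8)               # per-query match statistics
accept = monitor(features).item() > 0.5
```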
|
|
TuDT19 Regular Session, 407 |
Add to My Program |
Tactile Sensing and Manipulation |
|
|
Chair: Roehrbein, Florian | Chemnitz University of Technology |
Co-Chair: Wang, Chunpeng | The Robotics and AI Institute |
|
16:35-16:40, Paper TuDT19.1 | Add to My Program |
Shared Control for Cable Routing with Tactile Sensing |
|
Bao, Ange | Zhejiang University |
Zheng, Haoran | Zhejiang University |
Shi, Xiaohang | Zhejiang University |
Zhao, Pei | Zhejiang University |
Keywords: Telerobotics and Teleoperation, Dexterous Manipulation, Force and Tactile Sensing
Abstract: Multi-stage deformable linear object manipulation, such as cable routing, is a common and necessary part of daily life and industry. However, autonomous robots still lack the dexterity and generalization required for these complex tasks. Direct teleoperation is an alternative approach, but the absence of reliable force and haptic feedback methods undermines its robustness and efficiency. This paper proposes a shared control method based on tactile sensing to address a multi-stage, contact-rich cable routing task. The proposed method allows human and robotic autonomy to share control of the robot platform. An action primitive vocabulary is constructed, incorporating adaptive authority allocation between human and autonomy, to generate motions for specific task stages. These allocations modulate the control weights of human and autonomy in accordance with the requirements of each task stage. The method selects primitives from this vocabulary based on tactile data and human intention. The effectiveness of our approach is demonstrated through a task involving straightening a cable and slotting it into a clip. We compare its performance with alternative methods and show that our method achieves a higher success rate and takes less time than direct teleoperation.
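A minimal Python sketch of stage-dependent authority blending, the general mechanism behind adaptive allocation between human and autonomy. The stage names, weights, and command format below are illustrative assumptions rather than the authors' primitive vocabulary.

```python
import numpy as np

# Authority allocation per task stage: alpha is the weight on the human command.
STAGE_ALPHA = {"approach": 0.8, "straighten": 0.4, "slot_into_clip": 0.2}

def shared_command(stage, u_human, u_auto):
    """Blend the human teleoperation command and the autonomous command
    according to the authority assigned to the current task stage."""
    alpha = STAGE_ALPHA[stage]
    return alpha * np.asarray(u_human) + (1.0 - alpha) * np.asarray(u_auto)

# During insertion the autonomy, guided by tactile feedback, dominates.
u = shared_command("slot_into_clip",
                   u_human=[0.02, 0.0, 0.0],
                   u_auto=[0.01, 0.0, -0.005])
```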
|
|
16:40-16:45, Paper TuDT19.2 | Add to My Program |
Whisker-Based Active Tactile Perception for Contour Reconstruction |
|
Dang, Yixuan | Technische Universität München |
Xu, Qinyang | TU München |
Zhang, Yu | Technical University of Munich |
Yao, Xiangtong | Technical University of Munich |
Zhang, Liding | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Roehrbein, Florian | Chemnitz University of Technology |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Biologically-Inspired Robots, Force and Tactile Sensing, Sensor-based Control
Abstract: Perception using whisker-inspired tactile sensors currently faces a major challenge: the lack of active control in robots based on direct contact information from the whisker. To accurately reconstruct object contours, it is crucial for the whisker sensor to continuously follow and maintain an appropriate relative touch pose on the surface. This is especially important for localization based on tip contact, which has a low tolerance for sharp surfaces and must avoid slipping into tangential contact. In this paper, we first construct a magnetically transduced whisker sensor featuring a compact and robust suspension system composed of three flexible spiral arms. We develop a method that leverages a characterized whisker deflection profile to directly extract the tip contact position using gradient descent, with a Bayesian filter applied to reduce fluctuations. We then propose an active motion control policy to maintain the optimal relative pose of the whisker sensor against the object surface. A B-Spline curve is employed to predict the local surface curvature and determine the sensor orientation. Results demonstrate that our algorithm can effectively track objects and reconstruct contours with sub-millimeter accuracy. Finally, we validate the method in simulations and real-world experiments where a robot arm drives the whisker sensor to follow the surfaces of three different objects.
|
|
16:45-16:50, Paper TuDT19.3 | Add to My Program |
CDM: Contact Diffusion Model for Multi-Contact Point Localization |
|
Han, Seo Wook | Korean Advanced Institute of Science and Technology |
Kim, Min Jun | KAIST |
Keywords: Physical Human-Robot Interaction, Probabilistic Inference
Abstract: In this paper, we propose a Contact Diffusion Model (CDM), a novel learning-based approach for multi-contact point localization. We consider a robot equipped with joint torque sensors and a force/torque sensor at the base. By leveraging a diffusion model, CDM addresses the singularity where multiple pairs of contact points and forces produce identical sensor measurements. We formulate CDM to be conditioned on past model outputs to account for the time-dependent characteristics of the multi-contact scenarios. Moreover, to effectively address the complex shape of the robot surfaces, we incorporate the signed distance field in the denoising process. Consequently, CDM can localize contacts at arbitrary locations with high accuracy. Simulation and real-world experiments demonstrate the effectiveness of the proposed method. In particular, CDM operates at 15.97ms and, in the real world, achieves an error of 0.44cm in single-contact scenarios and 1.24cm in dual-contact scenarios.
|
|
16:50-16:55, Paper TuDT19.4 | Add to My Program |
Force Admittance Control of an Underactuated Gripper with Full-State Feedback |
|
Wang, Chunpeng | Northeastern University |
Nguyen, David | Massachusetts Institute of Technology |
Teoh, Zhi Ern | Harvard University |
O'Neill, Ciarán Tomás | Massachusetts Institute of Technology |
Odhner, Lael | Boston Dynamics AI Institute |
Whitney, John Peter | Northeastern University |
Estrada, Matthew | École Polytechnique Fédérale De Lausanne |
Keywords: Haptics and Haptic Interfaces, Grippers and Other End-Effectors, Force Control
Abstract: We present admittance control and fingertip contact detection with a linkage gripper remotely driven by a pneumatic rolling diaphragm actuator. The gripper is driven by underactuated mechanisms sensorized by joint encoders in order to fully determine the gripper state. We present the modelling of the linkage and fluidic transmission, validate its ability to regulate pinch force within an RMS error well under 0.5 Newtons via admittance control, and show the ability to detect contact at targeted locations on the linkage. In addition, we demonstrate simple grasping behaviors: blindly searching for an unobstructed object and detecting object loss. Our results show that an integrative approach of instrumenting underactuated gripper mechanisms can result in a lightweight gripper that is not only mechanically adaptive but sensitive enough to react to contact events without distal sensors or vision.
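A minimal Python sketch of a force-regulating admittance loop of the kind described above: a virtual mass-damper moves the commanded fingertip position until the measured pinch force reaches the target. The gains, loop rate, and the linear-spring object model used in the closed-loop example are illustrative assumptions, not the paper's identified transmission model.

```python
def admittance_step(x, v, f_meas, f_des, dt=0.001, m=0.5, b=20.0):
    """One step of a mass-damper admittance law regulating pinch force:
    the commanded fingertip position x moves until f_meas reaches f_des."""
    a = (f_des - f_meas - b * v) / m
    v += a * dt
    x += v * dt
    return x, v

# Closed-loop example against a simulated object of stiffness k (N/m).
k, f_des = 500.0, 2.0
x_cmd, v = 0.0, 0.0
for _ in range(2000):
    f_meas = max(0.0, k * x_cmd)     # stand-in for the real force sensing
    x_cmd, v = admittance_step(x_cmd, v, f_meas, f_des)
print(f"steady-state force: {max(0.0, k * x_cmd):.2f} N")   # approx. 2.00 N
```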
|
|
16:55-17:00, Paper TuDT19.5 | Add to My Program |
GenTact Toolbox: A Computational Design Pipeline to Procedurally Generate Context-Driven 3D Printed Whole-Body Artificial Skins |
|
Kohlbrenner, Carson | University of Colorado Boulder |
Escobedo, Caleb | University of Colorado - Boulder |
Bae, S. Sandra | CU Boulder |
Dickhans, Alexander | University of Colorado Boulder |
Roncone, Alessandro | University of Colorado Boulder |
Keywords: Physical Human-Robot Interaction, Touch in HRI, Multi-Contact Whole-Body Motion Planning and Control
Abstract: Developing whole-body tactile skins for robots remains a challenging task, as existing solutions often prioritize modular, one-size-fits-all designs, which, while versatile, fail to account for the robot’s specific shape and the unique demands of its operational context. In this work, we introduce GenTact Toolbox, a computational pipeline for creating versatile whole-body tactile skins tailored to both robot shape and application domain. Our method includes procedural mesh generation for conforming to a robot’s topology, task-driven simulation to refine sensor distribution, and multi-material 3D printing for shape-agnostic fabrication. We validate our approach by creating and deploying six capacitive sensing skins on a Franka Research 3 robot arm in a human-robot interaction scenario. This work represents a shift from “one-size-fits-all” tactile sensors toward context-driven, highly adaptable designs that can be customized for a wide range of robotic systems and applications. The project website is available at https://hiro-group.ronc.one/gentacttoolbox
|
|
17:00-17:05, Paper TuDT19.6 | Add to My Program |
Human-Robot Collaborative Cable-Suspended Manipulation with Contact Distinction |
|
Cortigiani, Giovanni | University of Siena |
Malvezzi, Monica | University of Siena |
Prattichizzo, Domenico | University of Siena |
Pozzi, Maria | University of Siena |
Keywords: Human-Centered Robotics, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: The collaborative transportation of objects between humans and robots is a fundamental task in physical human-robot interaction. Most of the literature considers the rigid co-grasping of non-deformable items in which both the human and the robot directly hold the transported object with their hands. In this paper, we implement a control strategy for the collaborative manipulation of a cable-suspended platform. The latter is an articulated and partially deformable object that can be used as a base on which to place the transported object. In this way, the human and the robot are not rigidly coupled, ensuring greater flexibility in the partners' motions and a safer interaction. However, the uncertain dynamics of the platform introduces a greater possibility of unintended collisions with external objects, which must be distinguished from contacts arising when a load is placed on or removed from the platform. This paper proposes a contact detection and distinction strategy to address this challenge. The proposed cable-suspended manipulation framework is based only on force sensing at the robot end-effector, and was tested with ten users.
|
|
TuDT20 Regular Session, 408 |
Add to My Program |
Robot Interaction Interfaces |
|
|
Chair: Kazanzides, Peter | Johns Hopkins University |
Co-Chair: Rettinger, Maximilian | Technical University of Munich |
|
16:35-16:40, Paper TuDT20.1 | Add to My Program |
Interactive Motion Planning for a 7-DOF Robot |
|
Greene, Nicholas | Johns Hopkins University |
Pryor, Will | Johns Hopkins University |
Wang, Liam | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Keywords: Telerobotics and Teleoperation, Motion and Path Planning, Human Factors and Human-in-the-Loop
Abstract: The use of robots in high-risk and extreme environments is crucial for tasks that are dangerous or inaccessible to humans and require high precision. Particularly in scenarios where the cost of failure is high, remote human teleoperation can be the preferred method of robot control due to the adaptability and high-level decision making of humans. Teleoperation brings many challenges, including a lack of accurate prior knowledge about the environment, limited views of the environment from on-board sensors, and especially inconsistent latency. 7-DOF (degrees of freedom) manipulators provide redundancy which can be utilized for increased flexibility in manipulation, and may be preferred to 6-DOF manipulators in many scenarios. The redundancy, however, must be considered by the teleoperation system. We present an extension to an existing Interactive Planning and Supervised Execution (IPSE) system that enables full teleoperation of a 7-DOF robot by encoding the redundant degree of freedom with a Shoulder-Elbow-Wrist (SEW) angle, which is user-manipulable via an SEW angle graph. Additionally, we introduce a novel user interface feature that encodes robot state information into a 2D image which is displayed directly on the SEW angle graph. We conduct a user study which demonstrates that the addition of this SEW graph significantly reduces task completion time.
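A minimal NumPy sketch of one common way to compute a shoulder-elbow-wrist angle from joint positions: the rotation of the elbow about the shoulder-to-wrist axis, measured from a fixed reference direction. The reference convention and sign may differ from the paper's definition; this is an illustrative assumption.

```python
import numpy as np

def sew_angle(S, E, W, ref=np.array([0.0, 0.0, 1.0])):
    """Shoulder-Elbow-Wrist (SEW) angle parameterizing the self-motion of a
    7-DOF arm. S, E, W are 3D positions of shoulder, elbow and wrist; ref is
    a fixed reference direction that must not be parallel to the S->W axis."""
    u = W - S
    u = u / np.linalg.norm(u)                  # shoulder->wrist axis
    p = (E - S) - np.dot(E - S, u) * u         # elbow offset normal to the axis
    r = ref - np.dot(ref, u) * u               # reference direction in that plane
    p, r = p / np.linalg.norm(p), r / np.linalg.norm(r)
    # Signed angle from r to p about u.
    return np.arctan2(np.dot(np.cross(r, p), u), np.dot(r, p))

# Example: elbow rotated out of the vertical reference plane.
print(sew_angle(S=np.array([0.0, 0.0, 0.3]),
                E=np.array([0.2, 0.1, 0.3]),
                W=np.array([0.4, 0.0, 0.3])))
```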
|
|
16:40-16:45, Paper TuDT20.2 | Add to My Program |
A Hybrid User Interface Combining AR, Desktop, and Mobile Interfaces for Enhanced Industrial Robot Programming |
|
Krieglstein, Jan | University of Stuttgart |
Kolberg, Jan | Fraunhofer IPA |
Sousa Calepso, Aimée | University of Stuttgart |
Kraus, Werner | Fraunhofer IPA |
Sedlmair, Michael | University of Stuttgart |
Keywords: Virtual Reality and Interfaces, Software Tools for Robot Programming, Assembly
Abstract: Robot programming for complex assembly tasks is challenging and demands expert knowledge. With Augmented Reality (AR), immersive 3D visualization can be placed in the robot’s intrinsic coordinate system to support robot programming. However, AR interfaces introduce usability challenges. To address these, we introduce a hybrid user interface (HUI) that combines a 2D desktop, a smartphone, and an AR head-mounted display (HMD) application, enabling operators to choose the most suitable device for each sub-task. The evaluation with an expert user study shows that an HUI can enhance efficiency and user experience by selecting the appropriate device for each sub-task. Generally, the HMD is preferred for tasks involving 3D content, the desktop for creating the program structure and parametrization, and the smartphone for mobile parametrization. However, the device selection depends on individual user characteristics and their familiarity with the devices.
|
|
16:45-16:50, Paper TuDT20.3 | Add to My Program |
Enhancing AR-To-Robot Registration Accuracy: A Comparative Study of Marker Detection Algorithms and Registration Parameters |
|
Mielke, Tonia | Otto-Von-Guericke University Magdeburg |
Heinrich, Florian | Otto-Von-Guericke University Magdeburg |
Hansen, Christian | Otto-Von-Guericke University Magdeburg |
Keywords: Virtual Reality and Interfaces, Visual Tracking
Abstract: Augmented Reality (AR) offers potential for enhancing human-robot collaboration by enabling intuitive interaction and real-time feedback. A crucial aspect of AR-robot integration is accurate spatial registration to align virtual content with the physical robotic workspace. This paper systematically investigates the effects of different tracking techniques and registration parameters on AR-to-robot registration accuracy, focusing on paired-point methods. We evaluate four marker detection algorithms - ARToolkit, Vuforia, ArUco, and retroreflective tracking - analyzing the influence of viewing distance, angle, marker size, point distance, distribution, and quantity. Our results show that ARToolkit provides the highest registration accuracy. While larger markers and positioning registration point centroids close to target locations consistently improved accuracy, other factors such as point distance and quantity were highly dependent on the tracking techniques used. Additionally, we propose an effective refinement method using point cloud registration, significantly improving accuracy by integrating data from points recorded between registration locations. These findings offer practical guidelines for enhancing AR-robot registration, with future work needed to assess the transferability to other AR devices and robots.
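For context, a paired-point registration of the kind evaluated above is typically solved with the SVD-based least-squares rigid alignment (Kabsch/Horn). The sketch below is a generic NumPy implementation of that classic solution, not the paper's specific pipeline; the refinement via point clouds mentioned in the abstract is not shown.

```python
import numpy as np

def paired_point_registration(P, Q):
    """Least-squares rigid transform (R, t) mapping points P (e.g. AR/HMD
    frame) onto corresponding points Q (robot frame): q_i ~ R @ p_i + t."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

# Sanity check with a known transform (90-degree rotation about z).
rng = np.random.default_rng(0)
P = rng.random((6, 3))
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
Q = P @ R_true.T + np.array([0.1, 0.2, 0.3])
R, t = paired_point_registration(P, Q)
print(np.allclose(R, R_true), np.allclose(t, [0.1, 0.2, 0.3]))
```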
|
|
16:50-16:55, Paper TuDT20.4 | Add to My Program |
Sketch-MoMa: Teleoperation for Mobile Manipulator Via Interpretation of Hand-Drawn Sketches |
|
Tanada, Kosei | Toyota Motor Corporation |
Iwanaga, Yuka | Toyota Motor Corporation |
Tsuchinaga, Masayoshi | Toyota Motor Corporation |
Nakamura, Yuji | Toyota Motor Corporation |
Mori, Takemitsu | Toyota Motor Corporation |
Sakai, Remi | Aichi Institute of Technology |
Yamamoto, Takashi | Aichi Institute of Technology |
Keywords: Telerobotics and Teleoperation, Mobile Manipulation, Task Planning
Abstract: To use assistive robots in everyday life, a remote control system with common devices, such as 2D devices, is helpful to control the robots anytime and anywhere as intended. Hand-drawn sketches are one of the intuitive ways to control robots with 2D devices. However, since similar sketches have different intentions from scene to scene, existing work requires additional modalities to set the sketches’ semantics. This requires complex operations for users and leads to decreasing usability. In this paper, we propose Sketch-MoMa, a teleoperation system using user-given hand-drawn sketches as instructions to control a robot. We use Vision-Language Models (VLMs) to understand the user-given sketches superimposed on an observation image and infer drawn shapes and low-level tasks of the robot. We utilize sketches and the generated shapes for recognition and motion planning of the generated low-level tasks for precise and intuitive operations. We validate our approach using state-of-the-art VLMs with 7 tasks and 5 sketch shapes. We also demonstrate that our approach effectively specifies more detailed intentions, such as how to grasp and how much to rotate. Moreover, we show the competitive usability of our approach compared with the existing 2D interface through a user experiment with 14 participants.
|
|
16:55-17:00, Paper TuDT20.5 | Add to My Program |
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V |
|
Zhi, Peiyuan | Beijing Institute for General Artificial Intelligence |
Zhang, Zhiyuan | Tsinghua University |
Zhao, Yu | Beijing Institute for General Artificial Intelligence |
Han, Muzhi | Hillbot, Inc |
Zhang, Zeyu | Beijing Institute for General Artificial Intelligence |
Li, Zhitian | Beijing Institute for General Artificial Intelligence |
Jiao, Ziyuan | Beijing Institute for General Artificial Intelligence |
Jia, Baoxiong | Beijing Institute for General Artificial Intelligence |
Huang, Siyuan | Beijing Institute for General Artificial Intelligence |
Keywords: Domestic Robotics, Task Planning, Failure Detection and Recovery
Abstract: Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~25%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.
|
|
17:00-17:05, Paper TuDT20.6 | Add to My Program |
Optimizing Robot Programming: Mixed Reality Gripper Control |
|
Rettinger, Maximilian | Technical University of Munich |
Hacker, Leander | Technical University of Munich (TUM) |
Wolters, Philipp | Technical University of Munich |
Rigoll, Gerhard | Technische Universität München |
Keywords: Virtual Reality and Interfaces, Industrial Robots, Design and Human Factors
Abstract: Conventional robot programming methods are complex and time-consuming for users. In recent years, alternative approaches such as mixed reality have been explored to address these challenges and optimize robot programming. While the findings of the mixed reality robot programming methods are convincing, most existing methods rely on gesture interaction for robot programming. Since controller-based interactions have proven to be more reliable, this paper examines three controller-based programming methods within a mixed reality scenario: 1) Classical Jogging, where the user positions the robot's end effector using the controller's thumbsticks, 2) Direct Control, where the controller's position and orientation directly correspond to the end effector's, and 3) Gripper Control, where the controller is enhanced with a 3D-printed gripper attachment to grasp and release objects. A within-subjects study (n=30) was conducted to compare these methods. The findings indicate that the Gripper Control condition outperforms the others in terms of task completion time, user experience, mental demand, and task performance, while also being the preferred method. Therefore, it demonstrates promising potential as an effective and efficient approach for future robot programming. Video available at https://youtu.be/83kWr8zUFIQ.
|
|
TuDT21 Regular Session, 410 |
Add to My Program |
Reinforcement Learning 4 |
|
|
Chair: Biza, Ondrej | Robotics and AI Institute |
Co-Chair: Scheutz, Matthias | Tufts University |
|
16:35-16:40, Paper TuDT21.1 | Add to My Program |
MJPR: Multi-Modal Joint Predictive Representation in Deep Reinforcement Learning |
|
Wang, Zehan | Northwestern Polytechnical University |
He, Ziming | Northwestern Polytechnical University |
Wang, ZiJia | Northwestern Polytechnical University |
He, Hua | Northwestern Polytechnical University |
Yang, Beiya | University of Strathclyde |
Shi, Hao-Bin | Northwestern Polytechnical University, School of Computer Science |
Keywords: Reinforcement Learning, Representation Learning, Sensor Fusion
Abstract: Multi-modal reinforcement learning (RL) has been brought into focus due to its ability to provide complementary information from different sensors, enriching observations of agents. However, the introduction of multi-modal high-dimensional observations brings challenges to sample efficiency. There is a lack of research on how to efficiently obtain multi-modal latent states while encouraging them to generate complementary information. To address this, we propose a representation learning method, Multi-modal Joint Predictive Representation (MJPR), which utilizes multi-modal interactive information to predict future latent states. The joint prediction method achieves the representation training for modalities and promotes each modality to generate complementary information related to predictions of each other. In addition, we introduce multi-modal loss balancing to promote training equilibrium and cross-modal contrastive learning (CMCL) to align the modalities for effective modal interaction. We establish multi-modal environments in the DeepMind Control Suite (DMC) and Webots and compare our method with current RL representation methods. Experimental results show that MJPR outperforms state-of-the-art methods by an average of 12.0% on six subtasks in DMC environments. It outperforms advanced methods by 16.7% and 55.4% in simple and complex tasks of the Webots environment, respectively. Moreover, ablation experiments are conducted in the DMC environment to verify the importance of each module to MJPR.
|
|
16:40-16:45, Paper TuDT21.2 | Add to My Program |
FLEX: A Framework for Learning Robot-Agnostic Force-Based Skills Involving Sustained Contact Object Manipulation |
|
Fang, Shijie | Tufts University |
Gao, Wenchang | Tufts University |
Goel, Shivam | Tufts University |
Thierauf, Christopher | Woods Hole Oceanographic Institution |
Scheutz, Matthias | Tufts University |
Sinapov, Jivko | Tufts University |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Deep Learning in Grasping and Manipulation
Abstract: Learning to manipulate objects efficiently, particularly those involving sustained contact (e.g., pushing, sliding) and articulated parts (e.g., drawers, doors), presents significant challenges. Traditional methods, such as robot-centric reinforcement learning (RL), imitation learning, and hybrid techniques, require massive training and often struggle to generalize across different objects and robot platforms. We propose a novel framework for learning object-centric manipulation policies in force space, decoupling the robot from the object. By directly applying forces to selected regions of the object, our method simplifies the action space, reduces unnecessary exploration, and decreases simulation overhead. This approach, trained in simulation on a small set of representative objects, captures object dynamics—such as joint configurations—allowing policies to generalize effectively to new, unseen objects. Decoupling these policies from robot-specific dynamics enables direct transfer to different robotic platforms (e.g., Kinova, Panda, UR5) without retraining. Our evaluations demonstrate that the method significantly outperforms baselines, achieving over an order of magnitude improvement in training efficiency compared to other state-of-the-art methods. Additionally, operating in force space enhances policy transferability across diverse robot platforms and object types. We further showcase the applicability of our method in a real-world robotic setting. Link: https://tufts-ai-robotics-group.github.io/FLEX/
|
|
16:45-16:50, Paper TuDT21.3 | Add to My Program |
FLoRA: Sample-Efficient Preference-Based RL Via Low-Rank Style Adaptation of Reward Functions |
|
Marta, Daniel | KTH Royal Institute of Technology |
Holk, Simon | KTH Royal Institute of Technology |
Vasco, Miguel | KTH Royal Institute of Technology |
Lundell, Jens | Royal Institute of Technology |
Homberger, Timon | KTH Royal Institute of Technology |
Busch, Finn Lukas | KTH Royal Institute of Technology |
Andersson, Olov | KTH Royal Institute |
Kragic, Danica | KTH |
Leite, Iolanda | KTH Royal Institute of Technology |
Keywords: Reinforcement Learning, Human Factors and Human-in-the-Loop, Learning from Demonstration
Abstract: Preference-based reinforcement learning (PbRL) is a suitable approach for style adaptation of pre-trained robotic behavior: adapting the robot's policy to follow human user preferences while still being able to perform the original task. However, collecting preferences for the adaptation process in robotics is often challenging and time-consuming. In this work we explore the adaptation of pre-trained robots in the low-preference-data regime. We show that, in this regime, recent adaptation approaches suffer from catastrophic reward forgetting (CRF), where the updated reward model overfits to the new preferences, leading the agent to become unable to perform the original task. To mitigate CRF, we propose to enhance the original reward model with a small number of parameters (low-rank matrices) responsible for modeling the preference adaptation. Our evaluation shows that our method can efficiently and effectively adjust robotic behavior to human preferences across simulation benchmark tasks and multiple real-world robotic tasks. We provide videos of our results and source code at https://sites.google.com/view/preflora/.
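A minimal PyTorch sketch of the low-rank adaptation idea referenced above: a frozen, pre-trained reward layer is augmented with a small trainable low-rank residual so that new preference data can be absorbed without overwriting the original reward model. Layer sizes, rank, and optimizer settings are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank residual (A @ B),
    used here as an adapter for preference-based reward fine-tuning."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # keep the original reward
        self.A = nn.Parameter(torch.zeros(base.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)

    def forward(self, x):
        # A is zero-initialized, so the adapter starts as an identity change.
        return self.base(x) + x @ (self.A @ self.B).T

# Wrap the reward head and train only the adapter on the new preferences.
reward_head = LoRALinear(nn.Linear(128, 1), rank=4)
opt = torch.optim.Adam([reward_head.A, reward_head.B], lr=1e-3)
```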
|
|
16:50-16:55, Paper TuDT21.4 | Add to My Program |
On-Robot Reinforcement Learning with Goal-Contrastive Rewards |
|
Biza, Ondrej | Robotics and AI Institute |
Weng, Thomas | Boston Dynamics AI Institute |
Sun, Lingfeng | University of California, Berkeley |
Schmeckpeper, Karl | University of Pennslyvania |
Kelestemur, Tarik | Northeastern University |
Ma, Yecheng Jason | University of Pennsylvania |
Platt, Robert | Northeastern University |
van de Meent, Jan-Willem | University of Amsterdam |
Wong, Lawson L.S. | Northeastern University |
Keywords: Reinforcement Learning
Abstract: Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. Unfortunately, RL can be prohibitively expensive, in terms of on-robot runtime, due to inefficient exploration when learning from a sparse reward signal. Designing dense reward functions is labour-intensive and requires domain expertise. In our work, we propose Goal-Contrastive Rewards (GCR), a dense reward function learning method that can be trained on passive video demonstrations. By using videos without actions, our method is easier to scale, as we can use arbitrary videos. GCR combines two loss functions, an implicit value loss function that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories. We perform experiments in simulated manipulation environments across RoboMimic and MimicGen tasks, as well as in the real world using a Franka arm and a Spot quadruped. We find that GCR leads to more sample-efficient RL, enabling model-free RL to solve about twice as many tasks as our baseline reward learning methods. We also demonstrate positive cross-embodiment transfer from videos of people and of other robots performing a task.
|
|
16:55-17:00, Paper TuDT21.5 | Add to My Program |
Watch Less, Feel More: Sim-To-Real RL for Generalizable Articulated Object Manipulation Via Motion Adaptation and Impedance Control |
|
Do, Tan-Dzung | Peking University |
Nandiraju, Gireesh | Peking University |
Wang, Jilong | Galaxy General Robot Co., Ltd |
Wang, He | Peking University |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Compliance and Impedance Control
Abstract: Articulated object manipulation poses a unique challenge compared to rigid object manipulation, as the object itself represents a dynamic environment. In this work, we present a novel RL-based pipeline equipped with variable impedance control and motion adaptation leveraging observation history for generalizable articulated object manipulation, focusing on smooth and dexterous motion during zero-shot sim-to-real transfer. To mitigate the sim-to-real gap, our pipeline reduces reliance on vision by not feeding raw vision data (RGB-D/point clouds) directly to the policy, but instead first extracting useful low-dimensional quantities via off-the-shelf modules. Additionally, we further reduce the sim-to-real gap by inferring object motion and its intrinsic properties from observation history, as well as by utilizing impedance control both in simulation and in the real world. Furthermore, we develop a well-designed training setting with extensive randomization and a specialized reward system (task-aware and motion-aware) that enables multi-staged, end-to-end manipulation without heuristic motion planning. To the best of our knowledge, our policy is the first to report an 84% success rate in the real world via extensive experiments with various unseen objects. Webpage: https://watch-less-feel-more.github.io/
|
|
17:00-17:05, Paper TuDT21.6 | Add to My Program |
From Imitation to Refinement -- Residual RL for Precise Assembly |
|
Ankile, Lars | Massachusetts Institute of Technology |
Simeonov, Anthony | Massachusetts Institute of Technology |
Shenfeld, Idan | MIT |
Torne Villasevil, Marcel | Stanford University |
Agrawal, Pulkit | MIT |
Keywords: Reinforcement Learning, Learning from Demonstration, Deep Learning in Grasping and Manipulation
Abstract: Recent advances in Behavior Cloning (BC) have made it easy to teach robots new tasks. However, we find that the ease of teaching comes at the cost of unreliable performance that saturates with increasing data for tasks requiring precision. The performance saturation can be attributed to two critical factors: (a) distribution shift resulting from the use of offline data and (b) the lack of closed-loop corrective control caused by action chunking (predicting a set of future actions executed open-loop), which is critical for BC performance. Our key insight is that by predicting action chunks, BC policies function more like trajectory "planners" than closed-loop controllers necessary for reliable execution. To address these challenges, we devise a simple yet effective method, ResiP (Residual for Precise Manipulation), that overcomes the reliability problem while retaining BC’s ease of teaching and long-horizon capabilities. ResiP augments a frozen, chunked BC model with a fully closed-loop residual policy trained with reinforcement learning (RL) that addresses distribution shifts and introduces closed-loop corrections over open-loop execution of action chunks predicted by the BC trajectory planner.
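A minimal PyTorch sketch of the residual-over-chunks structure described above: a frozen chunk predictor proposes an open-loop plan and a closed-loop residual network corrects each step from the current observation. The tiny linear networks, dimensions, and interfaces below are stand-ins for illustration, not the actual BC or RL policies.

```python
import torch
import torch.nn as nn

class ResidualController(nn.Module):
    """Frozen BC chunk 'planner' plus a closed-loop residual corrector."""
    def __init__(self, obs_dim=16, act_dim=7, chunk_len=8):
        super().__init__()
        self.chunk_len, self.act_dim = chunk_len, act_dim
        self.bc = nn.Linear(obs_dim, chunk_len * act_dim)        # frozen planner
        self.residual = nn.Linear(obs_dim + act_dim, act_dim)    # RL-trained corrector
        for p in self.bc.parameters():
            p.requires_grad = False

    def plan(self, obs):
        # Open-loop action chunk predicted once per observation.
        return self.bc(obs).view(self.chunk_len, self.act_dim)

    def act(self, obs_now, planned_action):
        # Closed-loop correction applied at every execution step.
        return planned_action + self.residual(torch.cat([obs_now, planned_action]))

ctrl = ResidualController()
chunk = ctrl.plan(torch.randn(16))
a0 = ctrl.act(torch.randn(16), chunk[0])
```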
|
|
TuDT22 Regular Session, 411 |
Add to My Program |
Imitation Learning 1 |
|
|
Chair: Kuo, Yen-Ling | University of Virginia |
Co-Chair: Ramirez-Amaro, Karinne | Chalmers University of Technology |
|
16:35-16:40, Paper TuDT22.1 | Add to My Program |
Fast Policy Synthesis with Variable Noise Diffusion Models |
|
Høeg, Sigmund Hennum | Norwegian University of Science and Technology |
Du, Yilun | MIT |
Egeland, Olav | NTNU |
Keywords: Imitation Learning, Learning from Demonstration, AI-Based Methods
Abstract: Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex dexterous tasks. However, action synthesis is often slow, requiring many steps of iterative denoising, limiting the extent to which models can be used in tasks that require fast reactive policies. To sidestep this, recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis. However, distillation is computationally expensive and can hurt both the accuracy and diversity of synthesized actions. We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis, leveraging the insight that generating a partially denoised action trajectory is substantially faster than a full output action trajectory. At each observation, our approach outputs a partially denoised action trajectory with variable levels of noise corruption, where the immediate action to execute is noise-free, with subsequent actions having increasing levels of noise and uncertainty. The partially denoised action trajectory for a new observation can then be quickly generated by applying a few steps of denoising to the previously predicted noisy action trajectory (rolled over by one timestep). We illustrate the efficacy of this approach, dramatically speeding up policy synthesis while preserving performance across both simulated and real-world settings. Project website: https://streaming-diffusion-policy.github.io
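A schematic Python sketch of the rolling partial-denoising idea described above: the policy keeps a buffer of future actions whose noise level increases along the horizon, executes the clean head action, rolls the buffer by one step, appends fresh noise, and applies a few denoising steps conditioned on the new observation. The dummy denoiser and noise schedule below are placeholders for the learned model, included only so the loop runs.

```python
import torch

def stream_step(buffer, noise_levels, denoise, obs, steps=1):
    """One control cycle of a streaming-style diffusion policy.

    buffer:       (H, action_dim) partially denoised action trajectory;
                  entry 0 is (nearly) noise-free, later entries are noisier.
    noise_levels: (H,) per-entry noise scales, increasing along the horizon.
    denoise:      callable (actions, noise_levels, obs) -> cleaner actions,
                  standing in for the learned denoiser.
    """
    action = buffer[0].clone()                                   # execute the clean head
    buffer = torch.cat([buffer[1:], torch.randn(1, buffer.shape[1])])  # roll + fresh noise
    for _ in range(steps):                                       # a few cheap updates
        buffer = denoise(buffer, noise_levels, obs)
    return action, buffer

denoise = lambda a, s, o: a * 0.8        # toy stand-in for the learned denoiser
buf = torch.randn(8, 7)
levels = torch.linspace(0.0, 1.0, 8)
for _ in range(5):
    act, buf = stream_step(buf, levels, denoise, obs=None)
```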
|
|
16:40-16:45, Paper TuDT22.2 | Add to My Program |
Adaptive Compliance Policy: Learning Approximate Compliance for Diffusion Guided Control |
|
Hou, Yifan | Stanford University |
Liu, Zeyi | Stanford University |
Chi, Cheng | Columbia University |
Cousineau, Eric | Toyota Research Institute |
Kuppuswamy, Naveen | Toyota Research Institute |
Feng, Siyuan | Toyota Research Institute |
Burchfiel, Benjamin | Toyota Research Institute |
Song, Shuran | Stanford University |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Bimanual Manipulation
Abstract: Compliance plays a crucial role in manipulation, as it balances between the concurrent control of position and force under uncertainties. Yet compliance is often overlooked by today's visuomotor policies that solely focus on position control. This paper introduces Adaptive Compliance Policy (ACP), a novel framework that learns to dynamically adjust system compliance both spatially and temporally for given manipulation tasks from human demonstrations, improving upon previous approaches that rely on pre-selected compliance parameters or assume uniform constant stiffness. However, computing full compliance parameters from human demonstrations is an ill-defined problem. Instead, we estimate an approximate compliance profile with two useful properties: avoiding large contact forces and encouraging accurate tracking. Our approach enables robots to handle complex contact-rich manipulation tasks and achieves over 50% performance improvement compared to state-of-the-art visuomotor policy methods.
|
|
16:45-16:50, Paper TuDT22.3 | Add to My Program |
Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning |
|
Wu, Zixuan | Georgia Institute of Technology |
Zaidi, Zulfiqar | Georgia Institute of Technology |
Patil, Adithya | Georgia Institute of Technology |
Xiao, Qingyu | Georgia Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Learning from Demonstration, Transfer Learning, Vision-Based Navigation
Abstract: In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transfer constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework in the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieve a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
|
|
16:50-16:55, Paper TuDT22.4 | Add to My Program |
Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation |
|
Lee, Sung-Wook | University of Virginia |
Kang, Xuhui | University of Virginia |
Kuo, Yen-Ling | University of Virginia |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: Recently, diffusion policy has shown impressive results in handling multi-modal tasks in robotic manipulation. However, it has fundamental limitations in out-of-distribution failures that persist due to compounding errors and its limited capability to extrapolate. One way to address these limitations is robot-gated DAgger, an interactive imitation learning with a robot query system to actively seek expert help during policy rollout. While robot-gated DAgger has high potential for learning at scale, existing methods like Ensemble-DAgger struggle with highly expressive policies: They often misinterpret policy disagreements as uncertainty at multi-modal decision points. To address this problem, we introduce Diff-DAgger, an efficient robot-gated DAgger algorithm that leverages the training objective of diffusion policy. We evaluate Diff-DAgger across different robot tasks including stacking, pushing, and plugging, and show that Diff-DAgger improves the task failure prediction by 39.0%, the task completion rate by 20.6%, and reduces the wall-clock time by a factor of 7.8. We hope that this work opens up a path for efficiently incorporating expressive yet data-hungry policies into interactive robot learning settings. The project website is available at: https://diffdagger.github.io
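A hedged Python sketch of how a diffusion policy's own training objective can serve as an uncertainty signal for robot-gated querying, as described above: the denoising loss is estimated by Monte Carlo for the current observation-action pair, and the expert is queried when it exceeds a threshold. The `add_noise`, `noise_pred`, and `sample_action` callables are assumed interfaces, not the authors' API.

```python
import torch

def diffusion_uncertainty(add_noise, noise_pred, obs, action,
                          num_timesteps=100, n_samples=16):
    """Monte-Carlo estimate of the diffusion training loss for one
    (obs, action) pair; a high value is treated as high uncertainty."""
    losses = []
    for _ in range(n_samples):
        t = torch.randint(0, num_timesteps, (1,))
        eps = torch.randn_like(action)
        pred = noise_pred(add_noise(action, eps, t), t, obs)   # forward process + predictor
        losses.append(torch.mean((pred - eps) ** 2))
    return torch.stack(losses).mean()

def gated_action(sample_action, add_noise, noise_pred, obs, threshold):
    """Execute the policy's action when its own loss is low; otherwise
    flag the step so control can be handed to the expert."""
    action = sample_action(obs)
    uncertain = diffusion_uncertainty(add_noise, noise_pred, obs, action) > threshold
    return action, bool(uncertain)
```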
|
|
16:55-17:00, Paper TuDT22.5 | Add to My Program |
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation |
|
Hsu, Cheng-Chun | The University of Texas at Austin |
Wen, Bowen | NVIDIA |
Xu, Jie | NVIDIA |
Narang, Yashraj | NVIDIA |
Wang, Xiaolong | UC San Diego |
Zhu, Yuke | The University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Birchfield, Stan | NVIDIA Corporation |
Keywords: Learning from Demonstration, Imitation Learning
Abstract: We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, as well as cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We show improvement compared to prior work on RLBench simulated tasks. In real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
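A minimal NumPy sketch of the object-centric representation named above: the object's SE(3) pose at each timestep expressed relative to the target, which decouples the demonstration from the embodiment. Frame names and the toy example values are illustrative.

```python
import numpy as np

def relative_pose_trajectory(T_obj_list, T_target):
    """Object pose trajectory in the target's frame: T_rel = T_target^{-1} T_obj.
    All poses are 4x4 homogeneous matrices in a common (e.g. camera) frame."""
    T_target_inv = np.linalg.inv(T_target)
    return [T_target_inv @ T for T in T_obj_list]

# One-step example: object 10 cm above the target, same orientation.
T_obj = np.eye(4); T_obj[2, 3] = 0.5
T_tgt = np.eye(4); T_tgt[2, 3] = 0.4
print(relative_pose_trajectory([T_obj], T_tgt)[0][:3, 3])   # -> [0. 0. 0.1]
```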
|
|
17:00-17:05, Paper TuDT22.6 | Add to My Program |
Imitation Learning with Limited Actions Via Diffusion Planners and Deep Koopman Controllers |
|
Bi, Jianxin | National University of Singapore |
Lin, Kelvin | National University of Singapore |
Chen, Kaiqi | National University of Singapore |
Huang, Yifei | National University of Singapore |
Soh, Harold | National University of Singapore |
Keywords: Imitation Learning, Learning from Demonstration, Machine Learning for Robot Control
Abstract: Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action labels, creating a substantial data collection burden. In this work, we propose a plan-then-control framework aimed at improving the action-data efficiency of inverse dynamics controllers by leveraging observational demonstration data. Specifically, we adopt a Deep Koopman Operator framework to model the dynamical system and utilize observation-only trajectories to learn a latent action representation. This latent representation can then be effectively mapped to real high-dimensional continuous actions using a linear action decoder, requiring minimal action-labeled data. Through experiments on simulated robot manipulation tasks and a real robot experiment with multi-modal expert demonstrations, we demonstrate that our approach significantly enhances action-data efficiency and achieves high task success rates with limited action data.
|
|
TuDT23 Regular Session, 412 |
Add to My Program |
Autonomous Vehicle Perception 2 |
|
|
Chair: Steckel, Jan | University of Antwerp |
Co-Chair: Waslander, Steven | University of Toronto |
|
16:35-16:40, Paper TuDT23.1 | Add to My Program |
H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision |
|
Shi, Yunxiao | Qualcomm AI Research |
Cai, Hong | Qualcomm Technologies Inc |
Ansari, Amin | Qualcomm Technologies, Inc |
Porikli, Fatih | Australian National University |
Keywords: Semantic Scene Understanding, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: 3D occupancy prediction has recently emerged as a new paradigm for holistic 3D scene understanding and provides valuable information for downstream planning in autonomous driving. Most existing methods, however, are computationally expensive, requiring costly attention-based 2D-3D transformation and 3D feature processing. In this paper, we present a novel 3D occupancy prediction approach, named H3O, which features highly efficient architecture designs and incurs a significantly lower computational cost as compared to the current state-of-the-art methods. In addition, to compensate for the ambiguity in ground-truth 3D occupancy labels, we advocate leveraging auxiliary tasks to complement the direct 3D supervision. In particular, we integrate multi-camera depth estimation, semantic segmentation, and surface normal estimation via differentiable volume rendering, supervised by corresponding 2D labels, which introduces rich and heterogeneous supervision signals. We conduct extensive experiments on the Occ3D-nuScenes and SemanticKITTI benchmarks that demonstrate the superiority of our proposed H3O.
|
|
16:40-16:45, Paper TuDT23.2 | Add to My Program |
TrackOcc: Camera-Based 4D Panoptic Occupancy Tracking |
|
Chen, Zhuoguang | Tsinghua University |
Li, Kenan | New York University |
Yang, Xiuyu | Tsinghua University |
Jiang, Tao | Tsinghua |
Li, Yiming | New York University |
Zhao, Hang | Tsinghua University |
Keywords: Autonomous Agents, Deep Learning for Visual Perception, Semantic Scene Understanding
Abstract: Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks like 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a brand-new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input. Furthermore, we propose TrackOcc, a cutting-edge approach that processes image inputs in a streaming, end-to-end manner with 4D panoptic queries to address the proposed task. Leveraging the localization-aware loss, TrackOcc enhances the accuracy of 4D panoptic occupancy tracking without bells and whistles. Experimental results demonstrate that our method achieves state-of-the-art performance on the Waymo dataset. The code will be released for future research.
|
|
16:45-16:50, Paper TuDT23.3 | Add to My Program |
OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction |
|
Heidrich, Severin | RWTH Aachen University |
Beemelmanns, Till | RWTH Aachen University |
Nekrasov, Alexey | RWTH Aachen University |
Leibe, Bastian | RWTH Aachen University |
Eckstein, Lutz | Institute for Automotive Engineering, RWTH Aachen University |
Keywords: Semantic Scene Understanding, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Autonomous driving has the potential to significantly enhance productivity and provide numerous societal benefits. Ensuring robustness in these safety-critical systems is essential, particularly when vehicles must navigate adverse weather conditions and sensor corruptions that may not have been encountered during training. Current methods often overlook uncertainties arising from adversarial conditions or distributional shifts, limiting their real-world applicability. We propose an efficient adaptation of an uncertainty estimation technique for 3D occupancy prediction. Our method dynamically calibrates model confidence using epistemic uncertainty estimates. Our evaluation under various camera corruption scenarios, such as fog or missing cameras, demonstrates that our approach effectively quantifies epistemic uncertainty by assigning higher uncertainty values to unseen data. We introduce region-specific corruptions to simulate defects affecting only a single camera and validate our findings through both scene-level and region-level assessments. Our results show superior performance in Out-of-Distribution (OoD) detection and confidence calibration compared to common baselines such as Deep Ensembles and MC-Dropout. Our approach consistently demonstrates reliable uncertainty measures, indicating its potential for enhancing the robustness of autonomous driving systems in real-world scenarios. Code and dataset are available at https://github.com/ika-rwth-aachen/OCCUQ.
|
|
16:50-16:55, Paper TuDT23.4 | Add to My Program |
RadarMask: A Novel End-To-End Sparse Millimeter-Wave Radar Sequence Panoptic Segmentation and Tracking Method |
|
Guo, Yubo | School of Artificial Intelligence and Automation, Huazhong University of Science and Technology |
Peng, Gang | Huazhong University of Science and Technology |
Gao, Qiang | Huazhong University of Science and Technology |
|
|
16:55-17:00, Paper TuDT23.5 | Add to My Program |
LiDAR-BIND: Multi-Modal Sensor Fusion through Shared Latent Embeddings |
|
Balemans, Niels | University of Antwerp - Imec, Faculty of Applied Engineering - I |
Anwar, Ali | University of Antwerp-Imec |
Steckel, Jan | University of Antwerp |
Mercelis, Siegfried | University of Antwerp - Imec IDLab |
Keywords: Deep Learning Methods, Sensor Fusion, SLAM
Abstract: This paper presents LiDAR-BIND, a novel sensor fusion framework aimed at enhancing the reliability and safety of autonomous vehicles (AVs) through a shared latent embedding space. With this method, the addition of different modalities, such as sonar and radar, into existing navigation setups becomes possible. These modalities offer robust performance even in challenging scenarios where optical sensors fail. Leveraging a shared latent representation space, LiDAR-BIND enables accurate modality prediction, allowing for the translation of one sensor's observations into another, thereby overcoming the limitations of depending solely on LiDAR for dense point-cloud generation. Through this, the framework facilitates the alignment of multiple sensor modalities without the need for large synchronized datasets across all sensors. We demonstrate its usability in SLAM applications, outperforming traditional LiDAR-based approaches under degraded optical conditions.
|
|
17:00-17:05, Paper TuDT23.6 | Add to My Program |
Enhancing Autonomous Navigation by Imaging Hidden Objects Using Single-Photon LiDAR |
|
Young, Aaron | MIT |
Batagoda Mudiyanselage, Nevindu | University of Wisconsin - Madison |
Zhang, Harry | University of Wisconsin-Madison |
Dave, Akshat | MIT |
Pediredla, Adithya | Dartmouth College |
Negrut, Dan | University of Wisconsin |
Raskar, Ramesh | MIT |
Keywords: Deep Learning for Visual Perception
Abstract: Robust autonomous navigation in environments with limited visibility remains a critical challenge in robotics. We present a novel approach that leverages Non-Line-of-Sight (NLOS) sensing using single-photon LiDAR to improve visibility and enhance autonomous navigation. Our method enables mobile robots to "see around corners" by utilizing multi-bounce light information, effectively expanding their perceptual range without additional infrastructure. We propose a three-module pipeline: (1) Sensing, which captures multi-bounce histograms using SPAD-based LiDAR; (2) Perception, which estimates occupancy maps of hidden regions from these histograms using a convolutional neural network; and (3) Control, which allows a robot to follow safe paths based on the estimated occupancy. We evaluate our approach through simulations and real-world experiments on a mobile robot navigating an L-shaped corridor with hidden obstacles. Our work represents the first experimental demonstration of NLOS imaging for autonomous navigation, paving the way for safer and more efficient robotic systems operating in complex environments. We also contribute a novel dynamics-integrated transient rendering framework for simulating NLOS scenarios, facilitating future research in this domain.
|
|
TuDT24 Regular Session, 401 |
Add to My Program |
Industrial Robots |
|
|
Chair: Vanderborght, Bram | VUB |
Co-Chair: Larranaga Amilibia, Jon | Mondragon Unibertsitatea |
|
16:35-16:40, Paper TuDT24.1 | Add to My Program |
Visual-Based Forklift Learning System Enabling Zero-Shot Sim2Real without Real-World Data |
|
Oishi, Koshi | Toyota Central R&d Labs., Inc |
Kato, Teruki | Toyota Central R&D Labs., Inc |
Makino, Hiroya | Toyota Central R&D Labs., Inc |
Ito, Seigo | Toyota Central R&D Labs., Inc |
Keywords: Industrial Robots, AI-Enabled Robotics, Vision-Based Navigation
Abstract: Forklifts are used extensively in various industrial settings and are in high demand for automation. In particular, counterbalance forklifts are highly versatile and are employed in diverse scenarios. However, efforts to automate these processes are lacking, primarily owing to the absence of a safe and performance-verifiable development environment. This study proposes a learning system that combines a photorealistic digital learning environment with a 1/14-scale robotic forklift environment to address this challenge. Inspired by the training-based learning approach adopted by forklift operators, we employ an end-to-end vision-based deep reinforcement learning approach. The learning is conducted in a digitalized environment created from CAD data, making it safe and eliminating the need for real-world data. In addition, we safely validate the method in a physical setting using a 1/14-scale robotic forklift with a configuration similar to that of a real forklift. We achieved a 60% success rate in pallet loading tasks in real experiments using a robotic forklift. Our approach demonstrates zero-shot sim2real with a simple method that does not require heuristic additions. This learning-based approach is considered a first step towards the automation of counterbalance forklifts.
|
|
16:40-16:45, Paper TuDT24.2 | Add to My Program |
Strategic System Design for High Precision in Assembly Processes of CPU |
|
Yiu, Cheuk Tung Shadow | The Hong Kong University of Science and Technology |
Woo, Kam Tim | The Hong Kong University of Science and Technology |
Keywords: Computer Vision for Automation, Computer Vision for Manufacturing, Industrial Robots
Abstract: Robotic picking and placing play an essential role in Industry 4.0 and have long been recognized as significant contributions to industrial processes. Various scenarios involve picking and placing parts for assembly in industrial production, such as assembling different electronic components in the manufacturing process. These tasks require high precision. However, achieving high precision in the assembly of CPUs poses a significant challenge, particularly when dealing with reflective surfaces. This paper presents a strategic system design tailored to address these challenges effectively. We focus on system device choice and on optimizing the key parameters of the sensor system to strike a balance between device cost and the required precision. We construct the whole robot manipulation system using methods such as geometric segmentation, binocular vision with structured light projection, and 6D pose estimation based on the resulting 3D information. The results of our study demonstrate the practical applicability and benefits of this strategic system design in industrial settings. By meeting strict system accuracy requirements, our approach contributes to advancing industry practices and growing its impact on society.
|
|
16:45-16:50, Paper TuDT24.3 | Add to My Program |
The Influence of Counterbalance System on the Dynamic Characterization of Heavy Industrial Robots |
|
Urrutia, Julen | Aldakin Automation S.L |
Izquierdo, Mikel | Mondragon Unibertsitatea |
Ulacia Garmendia, Ibai | Mondragon Unibertsitatea |
Agirre, Nora | Aldakin Automation S.L |
Inziarte, Ibai | Aldakin Automation |
Larranaga Amilibia, Jon | Mondragon Unibertsitatea |
Keywords: Industrial Robots, Dynamics, Hydraulic/Pneumatic Actuators
Abstract: The precision of industrial robots is often limited by the relatively low stiffness of their joints, leading to positioning errors influenced by factors such as the mass and inertia of robotic links, external forces, and the counterbalance system (CBS). Counterbalance systems, typically consisting of hydropneumatic cylinders, are designed to reduce motor torque and assist in supporting heavier links. Traditionally, positioning errors in industrial robots have been corrected statically by determining pose-dependent stiffness values. However, recent numerical models incorporate inertial effects to improve positioning error correction, making accurate inertial parameter identification essential. These parameters are typically unknown and must be determined experimentally. While methodologies for inertial parameter estimation have been extensively studied, none have accounted for the effect of the counterbalance system in this process. To address this gap, a methodology for estimating inertial parameters was applied to a heavy industrial robot, considering the influence of the counterbalance system. A comparative analysis with and without the counterbalance system showed that its inclusion improved joint torque calculation accuracy, demonstrating the necessity of considering it in dynamic parameter characterization methodologies.
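For context, inertial parameter identification typically exploits the fact that rigid-body robot dynamics are linear in the parameters, tau = Y(q, dq, ddq) theta, and solves for theta by least squares; a counterbalance torque model can be subtracted from the measured torques so it does not bias the estimate. The NumPy sketch below illustrates that generic procedure with a synthetic regressor and is not the paper's specific methodology.

```python
import numpy as np

def identify_inertial_parameters(Y, tau_measured, tau_cbs=None):
    """Least-squares identification of inertial parameters theta from the
    linear dynamics tau = Y @ theta.

    Y:            stacked regressor matrices over all samples, (N*dof, p)
    tau_measured: stacked measured joint torques, (N*dof,)
    tau_cbs:      optional counterbalance-system torque model, subtracted
                  so the CBS does not bias the estimated parameters.
    """
    tau = tau_measured if tau_cbs is None else tau_measured - tau_cbs
    theta, *_ = np.linalg.lstsq(Y, tau, rcond=None)
    return theta

# Toy example with a known parameter vector and measurement noise.
rng = np.random.default_rng(1)
Y = rng.standard_normal((600, 10))
theta_true = rng.standard_normal(10)
tau = Y @ theta_true + 0.01 * rng.standard_normal(600)
print(np.allclose(identify_inertial_parameters(Y, tau), theta_true, atol=0.01))
```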
|
|
16:50-16:55, Paper TuDT24.4 | Add to My Program |
Deep Learning-Based Friction Compensation in Low Velocity for Enhanced Direct Teaching in Collaborative Manipulators |
|
Choi, Seohyun | UMass Amherst |
Kim, Jonghyeok | POSTECH |
Chung, Wan Kyun | POSTECH |
Keywords: Industrial Robots
Abstract: Direct teaching in collaborative manipulators, an essential method for intuitive trajectory control, faces significant challenges due to friction in robot joints. To address this, we present a novel friction compensation framework to improve direct teaching methods for robots. Our approach focuses on mitigating friction in the joints most susceptible to frictional effects, ensuring smoother and more precise motion. The proposed framework uses deep neural networks (DNN) to model the complex friction behavior. This approach circumvents the difficulties associated with traditional friction compensation model selection. We develop specific data input preprocessing algorithms that optimize friction estimation when paired with the standard encoders commonly used in collaborative robots. In addition, our custom loss function is specifically designed to improve DNN training in the low-velocity region, where friction effects are most pronounced. To evaluate the effectiveness of our framework, we conduct comprehensive ablation studies assessing the impact of two critical components: the preprocessing algorithms and the custom loss function. These studies provide insight into the contributions of each element to overall performance. Experimental validation using two 6-DoF collaborative robots demonstrates the practical applicability and effectiveness of our approach.
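As an illustration of a loss that emphasizes the low-velocity region, the following is an assumed form, not the authors' loss function: a velocity-weighted squared error in which samples near zero velocity receive larger weights. The weighting scale is a placeholder value.

```python
# Minimal sketch (assumed loss form): up-weight friction-estimation error
# at low joint velocities, where direct teaching is most affected.
import torch

def low_velocity_weighted_loss(tau_pred, tau_true, velocity, scale=0.1):
    # Samples near zero velocity receive weights approaching 2;
    # fast-motion samples decay toward a weight of 1.
    weights = 1.0 + torch.exp(-(velocity / scale) ** 2)
    return torch.mean(weights * (tau_pred - tau_true) ** 2)

# Toy usage with placeholder torque predictions and targets
vel = torch.tensor([0.01, 0.05, 0.5, 1.0])
pred = torch.tensor([0.9, 1.1, 2.0, 2.1])
true = torch.tensor([1.0, 1.0, 2.0, 2.0])
print(low_velocity_weighted_loss(pred, true, vel))
```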
|
|
16:55-17:00, Paper TuDT24.5 | Add to My Program |
Fixture-Free 2D Sewing Using a Dual-Arm Manipulator System (I) |
|
Tokuda, Fuyuki | Centre for Transformative Garment Production |
Murakami, Ryo | Tohoku University |
Seino, Akira | Centre for Transformative Garment Production |
Kobayashi, Akinari | Centre for Transformative Garment Production |
Hayashibe, Mitsuhiro | Tohoku University |
Kosuge, Kazuhiro | The University of Hong Kong |
Keywords: Industrial Robots, Sensor-based Control, Dual Arm Manipulation
Abstract: We propose a fixture-free 2D sewing system using a dual-arm manipulator; in 2D sewing, the seam lines of the top and bottom fabric parts are identical. The proposed system sews two stacked fabric parts together along a desired seam line printed on the top fabric part without the use of a fixture. In the proposed system, the set of aligned and stacked fabric parts is held by the end-effectors of the dual-arm manipulator in coordination. The dual-arm manipulator controls the motion of the fabric parts on the flat sewing table stitch by stitch, while keeping the manipulated fabric parts flat using the internal force applied to the set of fabric parts. A novel vision-based seam line tracking control is proposed to control the motion of the set of fabric parts along the printed seam line on the top fabric part. The convergence of the tracking error is analyzed for sewing along both straight and curved seam lines and is shown to be specified by the control parameters. Sewing experiments show that the tracking error converges to zero as analyzed. The sewing experiments also show that the newly proposed trajectory generation method, which synchronizes the coordinated motion of the manipulators and the motion of the sewing needle, is essential for achieving accurate sewing.
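The stitch-by-stitch correction loop can be illustrated with a minimal sketch, not the paper's control law: a proportional correction of the fabric motion from the measured offset between the printed seam line and the needle, whose error decays geometrically at a rate set by the gain. The gain, initial offset, and stitch count are assumed values.

```python
# Minimal sketch (illustrative controller): per-stitch proportional correction
# of the fabric position based on the seam line's pixel offset at the needle.
import numpy as np

def simulate_seam_tracking(initial_offset_px=20.0, gain=0.4, stitches=30):
    offset = initial_offset_px
    history = []
    for _ in range(stitches):
        correction = -gain * offset        # lateral fabric correction per stitch
        offset += correction               # offset re-measured at the next stitch
        history.append(offset)
    return np.array(history)

errors = simulate_seam_tracking()
print("tracking error decays geometrically:", np.round(errors[:5], 2), "...")
```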
|
|
17:00-17:05, Paper TuDT24.6 | Add to My Program |
Improving the Collision Tolerance of High-Speed Industrial Robots Via Impact-Aware Path Planning and Series Clutched Actuation |
|
Ostyn, Frederik | Ghent University |
Vanderborght, Bram | VUB |
Crevecoeur, Guillaume | Ghent University |
Keywords: Collision tolerance assessment, Motion and Path Planning, Compliant Joint/Mechanism, Industrial Robots
Abstract: Robots are increasingly deployed in unstructured or unpredictable environments. Collisions at high speed, in particular, can severely damage the drivetrains and joint bearings of robots. To avoid such collisions, path planners exist that adapt the robot's original trajectory online if a collision hazard is detected. These methods require additional sensors such as cameras, are computationally costly, and are never flawless due to occlusions. Another approach is to incorporate a cost function that promotes collision tolerance while planning the initial trajectory. The resulting impact-aware path plan minimizes the chance of robot hardware damage should a collision occur. Two algorithms are presented to assess collision tolerance in high-speed robots, taking into account factors such as robot pose, impact direction, and the maximum intermittent loading of the gearboxes and bearings. The first algorithm is more general, while the second assumes the presence of joint overload clutches that decouple upon impact. These algorithms are applied to plan an impact-aware path for a custom 6-axis series-clutched actuated robot that serves as a use case.
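As a rough illustration of an impact-aware planning cost, and not the paper's two algorithms, the following sketch scores candidate joint-space paths by length plus a penalty whenever an assumed pose- and direction-dependent impact load exceeds a permissible intermittent load; the load model, rating, and weights are placeholders.

```python
# Minimal sketch (assumed cost structure): score candidate paths by length
# plus a penalty on overload of an assumed gearbox/bearing rating.
import numpy as np

MAX_INTERMITTENT_LOAD = 100.0   # assumed permissible peak load [Nm]

def impact_load(q, impact_direction):
    """Toy pose- and direction-dependent impact load at the most exposed joint."""
    lever = 1.0 + 0.5 * np.cos(q[1])           # effective lever arm vs. pose
    return 80.0 * lever * abs(impact_direction)

def path_cost(waypoints, impact_direction=1.0, w_tolerance=5.0):
    length = sum(np.linalg.norm(b - a) for a, b in zip(waypoints, waypoints[1:]))
    overloads = [max(0.0, impact_load(q, impact_direction) - MAX_INTERMITTENT_LOAD)
                 for q in waypoints]
    return length + w_tolerance * sum(overloads)

# Two candidate paths between the same start and goal joint configurations:
# candidate A passes a low-tolerance pose, candidate B stays in tolerant poses.
start, goal = np.array([0.0, 3.0]), np.array([1.0, 3.0])
candidate_a = [start, np.array([0.5, 0.0]), goal]
candidate_b = [start, np.array([0.5, 2.0]), goal]
print("cost A:", round(path_cost(candidate_a), 2),
      " cost B:", round(path_cost(candidate_b), 2))
```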
|
| |