Last updated on June 12, 2025. This conference program is tentative and subject to change.
Technical Program for Tuesday, July 1, 2025

TuAT1 Regular, Room T1
Computer Vision & SLAM

10:15-10:30, Paper TuAT1.1
Development of CNN-Based Terrain Classifier Using Depth Camera
Jeon, Sangha | Korea Advanced Institute of Science and Technology (KAIST)
Kim, Jung | KAIST
Keywords: Computer Vision and Visual Servoing
Abstract: Humans adjust their lower limb control strategies based on the terrain they are walking on. This principle also applies to users of robotic orthoses or prostheses. However, currently commercialized robotic orthoses and prostheses do not adapt their control strategies according to terrain; instead, they apply the same walking control as if on flat ground. This leads to awkward and uncomfortable walking patterns for the wearer. The root cause of this issue is that the robot is unable to recognize and classify different terrains. To address this, we propose a terrain classification system based on stereo depth cameras. The proposed classifier categorizes the data from the stereo depth camera into five distinct terrain types: level ground, ramp ascent, ramp descent, stair ascent, and stair descent. By leveraging the 3D information provided by the stereo depth camera, the system is able to effectively differentiate between flat ground and slopes, which are challenging to distinguish using an RGB camera. The terrain classification system achieved an accuracy of 87.06%. This demonstrates that effective terrain classification can be accomplished using only the 3D data from the stereo depth camera.

10:30-10:45, Paper TuAT1.2
Uncalibrated Visual Servoing with Recursive Least Squares: Adaptive Control for Cable-Driven Parallel Robots
Jenny, Jarrett-Scott | Kennesaw State University
Marshall, Matthew | Kennesaw State University
Keywords: Computer Vision and Visual Servoing, Dynamics and Control, Modeling, Identification, Calibration
Abstract: Uncalibrated visual servoing (UVS) aims to adaptively control robots without explicit calibration, promising performance in unstructured or dynamic environments. In principle, integrating recursive least squares to estimate the image Jacobian online can enhance tracking accuracy and resilience to uncertainties. This study implements an RLS-based UVS framework on a three-cable parallel robot with an eye-in-hand camera. Experiments and simulations compare the adaptive approach to a baseline using a static, globally calibrated Jacobian. While the RLS method demonstrated the ability to fine-tune the Jacobian over time, steady-state accuracy and convergence speeds were often similar or inferior to the static approach, particularly under large initial offsets or poor initial conditions. These findings highlight the sensitivity of adaptive UVS to initialization quality and the complexity of achieving practical gains in noisy, nonlinear settings. They underscore the need for improved strategies, such as refined initialization or hybrid methods, to fully realize the theoretical potential of RLS-based uncalibrated servoing in challenging real-world scenarios.
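For readers unfamiliar with the technique, the online Jacobian refinement described above follows the standard recursive least squares recursion; the sketch below is a generic illustration (the forgetting factor, dimensions, and toy Jacobian are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def rls_jacobian_update(J, P, dq, ds, lam=0.98):
    """One RLS step: refine image Jacobian J so that ds ≈ J @ dq.

    J  : (m, n) current Jacobian estimate (image features vs. joints)
    P  : (n, n) inverse correlation matrix of past joint motions
    dq : (n,)  joint displacement over the last control step
    ds : (m,)  observed image-feature displacement
    lam: forgetting factor (< 1 discounts old data)
    """
    dq = dq.reshape(-1, 1)
    # Gain vector for this sample
    k = P @ dq / (lam + dq.T @ P @ dq)          # (n, 1)
    # Prediction error in image space
    e = ds.reshape(-1, 1) - J @ dq              # (m, 1)
    # Rank-1 corrections to the estimate and covariance
    J = J + e @ k.T
    P = (P - k @ dq.T @ P) / lam
    return J, P

# Toy usage: recover a constant 2x2 Jacobian from noisy motions
rng = np.random.default_rng(0)
J_true = np.array([[2.0, -1.0], [0.5, 3.0]])
J_est = np.eye(2)                # deliberately poor initial guess
P = 1e3 * np.eye(2)              # large P = low confidence in J_est
for _ in range(200):
    dq = rng.normal(size=2) * 0.01
    ds = J_true @ dq + rng.normal(size=2) * 1e-5
    J_est, P = rls_jacobian_update(J_est, P, dq, ds)
```

As the abstract notes, this recursion is sensitive to the initial guess: a poor `J_est` or badly scaled `P` can slow or destabilize convergence in practice.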

10:45-11:00, Paper TuAT1.3
Autonomous Integration of Bench-Top Wet Lab Equipment
Logan, Zachary | Pennsylvania State University
Goli, Mohammad | Noblis
Undieh, Kam | Noblis
Keywords: Computer Vision and Visual Servoing, Mechanism and Design, Object Recognition
Abstract: Laboratory automation is an expensive and complicated endeavor with limited, inflexible options for small-scale labs. We developed a prototype system for tending a bench-top centrifuge, using computer vision methods for color detection and circular Hough transforms to detect and localize centrifuge buckets. Initial results showed that the prototype is capable of automating the use of regular bench-top lab equipment. Experimental results showed the computer vision system to have a successful detection rate of 98%, a 70% success rate for removing test tubes from a centrifuge, and a 95% success rate for inserting them.
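The circular Hough transform mentioned above votes each edge pixel onto candidate circle centers; a minimal generic sketch for a known radius (the synthetic image, radius, and grid size are illustrative assumptions, not details from the paper):

```python
import numpy as np

def hough_circle_center(edge, radius):
    """Vote for circle centers of a known radius in a binary edge image.

    Each edge pixel votes for every point at distance `radius` from it;
    the accumulator peak is the most likely circle center.
    """
    h, w = edge.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
    ys, xs = np.nonzero(edge)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)  # accumulate duplicate votes
    return np.unravel_index(np.argmax(acc), acc.shape)

# Toy usage: a synthetic ring of radius 10 centered at (24, 30)
img = np.zeros((50, 60), dtype=bool)
t = np.linspace(0, 2 * np.pi, 200)
img[np.round(24 + 10 * np.sin(t)).astype(int),
    np.round(30 + 10 * np.cos(t)).astype(int)] = True
center = hough_circle_center(img, radius=10)
```

Production systems would typically use an optimized implementation such as OpenCV's `HoughCircles` with a gradient-based accumulator rather than this brute-force voting loop.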

11:00-11:15, Paper TuAT1.4
Calibration of a Three-Axis Robot Manipulator with an Intel RealSense LiDAR Camera for Tomato Pruning
Nethala, Prasad | Texas A&M University Corpus Christi
Um, Dugan | Texas A&M University - CC
Keywords: Modeling, Identification, Calibration, Computer Vision and Visual Servoing, Actuation and Actuators
Abstract: This paper presents the design, calibration, and error evaluation of a compact three-axis (X-Y-Z) robot equipped with an Intel RealSense LiDAR L515 camera. The camera’s role is to detect and provide the (x, y, z) coordinates of source and target points, so the manipulator can move its end-effector to those points. The X-axis is laid out horizontally, the Y-axis is mounted on the X-axis, and the Z-axis (end-effector) is mounted on the Y-axis, enabling it to reach a workspace of 50 cm (X-direction), 35 cm (Y-direction), and 30 cm vertically (Z-direction). The LiDAR camera offers depth perception and point cloud data to accurately locate features of the plant in three-dimensional space. We outline how we achieved an efficient camera-robot calibration process and how the system uses coordinate transformations to command the manipulator from a source to a destination with minimal error. Results show that the end-effector can consistently reach target points in the environment with a relatively small positioning error, demonstrating the feasibility of our setup.

11:15-11:30, Paper TuAT1.5
VFT-LIO: Visual Feature Tracking for Robust LiDAR Inertial Odometry under Repetitive Patterns
Choi, Donghyun | Korea Advanced Institute of Science and Technology
Lee, Sangmin | Korea Advanced Institute of Science and Technology
Lee, Handong | Korea Advanced Institute of Science and Technology
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology
Keywords: Simultaneous Localization and Mapping (SLAM), Intelligent Robotic Vehicles, Autonomous Vehicle Navigation
Abstract: Recent advancements in autonomous vehicle odometry estimation have been largely driven by the integration of various sensor technologies. Among these, Light Detection and Ranging (LiDAR) and cameras play a crucial role; however, both exhibit inherent limitations. In particular, cameras, which are widely utilized, are highly susceptible to illumination changes. In contrast, LiDAR is robust to such variations, making it a powerful tool in Simultaneous Localization and Mapping (SLAM). To enhance LiDAR performance, numerous sensor fusion approaches incorporating Inertial Measurement Units (IMUs) have been proposed. Nonetheless, LiDAR-based methods still face challenges in accurately estimating vehicle states in environments with repetitive patterns. This paper introduces a novel framework to improve LiDAR odometry estimation accuracy in repetitive pattern environments by leveraging the complementary strengths of both cameras and LiDAR. Specifically, we employ a visual feature tracking-based approach that utilizes 2D range images generated from 3D point cloud data. The use of 2D projected range images enables robust feature extraction while maintaining resilience to illumination changes. The proposed method is evaluated on real-time vehicle state estimation tasks using datasets containing repetitive patterns. Experimental results demonstrate that our approach outperforms traditional LiDAR-based methods, validating the effectiveness of incorporating LiDAR vision techniques in such challenging environments.

11:30-11:45, Paper TuAT1.6
Semantic Loop Closure for Reducing False Matches in SLAM
Kim, Junhyung | KAIST
Kim, Jinwhan | KAIST
Keywords: Simultaneous Localization and Mapping (SLAM), Object Recognition
Abstract: This study presents a novel semantic loop closure method that emphasizes the uniqueness of objects to enhance robustness against false matching in simultaneous localization and mapping (SLAM). Loop closure, a critical technique for detecting revisits during SLAM, has traditionally relied on image feature comparison methods, such as visual bag of words. However, these approaches are highly sensitive to viewpoint changes and often fail to identify revisits. Semantic segmentation-based methods offer improved robustness but have predominantly focused on the geometric distribution of semantic objects, which increases susceptibility to false positives in environments with repetitive patterns. By incorporating object uniqueness into loop closure detection, the proposed method addresses these limitations effectively. Experimental results show that it achieves greater robustness against false positives than conventional semantic loop closure methods while also reducing false negatives compared to image feature-based approaches.

TuAT2 Regular, Room T2
Multimodal Sensing and Haptics

10:15-10:30, Paper TuAT2.1
Inception CNN-Transformer for Robust PPG-To-ECG Reconstruction
Kim, Sung Woo | ZTACOM
Lee, Jae Young | Korea Advanced Institute of Science and Technology
Kim, Jongsuk | Korea Advanced Institute of Science and Technology (KAIST)
Kim, Junmo | KAIST
Keywords: Foundations of Sensing and Estimation, AI Reasoning Methods for Robotics, Rehabilitation and Healthcare Robotics
Abstract: In recent years, wearable healthcare devices and robots have become increasingly crucial in the face of population aging and the rising prevalence of chronic diseases, including cardiovascular disease, which remains a leading cause of mortality worldwide. These conditions amplified the demand for advanced, continuous monitoring of cardiovascular health. While electrocardiogram (ECG) signals offer comprehensive diagnostic insights, their reliance on multiple electrodes often limits practicality. In contrast, photoplethysmogram (PPG) signals are more convenient to acquire but lack the rich detail of ECG. In this paper, we propose a novel PPG-to-ECG reconstruction method that combines an Inception CNN for multi-scale feature extraction with a Transformer architecture for capturing global dependencies. Our proposed method achieves robust ECG signal reconstruction even under high-noise conditions by effectively preserving local morphological details and leveraging long-range contextual information. We validate the proposed approach on diverse datasets spanning everyday life and intensive care unit (ICU) settings, demonstrating high accuracy and generalizability. Experimental results indicate an RMSE of 0.26, corresponding to a 10% improvement over state-of-the-art methods. These findings highlight the feasibility of reliable, real-time ECG reconstruction from PPG signals alone, paving the way for scalable and accessible healthcare monitoring solutions in clinical and wearable contexts.

10:30-10:45, Paper TuAT2.2
Extrinsic Line Contact Sensing from Visuo-Tactile Measurements
Kim, Yoonjin | Korea Advanced Institute of Science and Technology (KAIST)
Kim, Won Dong | Korea Advanced Institute of Science & Technology (KAIST)
Kim, Jung | KAIST
Keywords: Force and Tactile Sensing, Object Recognition, Contact: Modeling, Sensing and Control
Abstract: As robots increasingly interact with unstructured environments, understanding extrinsic contact is crucial for precise manipulation. In particular, estimating the direction vector of an object in contact with an external surface enables tasks such as alignment, insertion, and controlled motion. We propose a novel approach to estimating the direction vector of extrinsic line contact using vision-based tactile sensors. By tracking the relative motion of a grasped object and leveraging kinematic and frictional constraints, the proposed method accurately infers the contact direction without requiring prior geometric information about the object. The system employs a high-resolution tactile sensor and a motion-tracking algorithm to extract object displacement data, which is then transformed into the world frame for robust estimation. Experimental validation with a robotic manipulator demonstrates that the proposed method achieves an overall mean angular error of 8.71°, confirming its effectiveness in real-world applications. This approach enhances robotic perception and manipulation capabilities, making it a reliable solution for tasks that require precise handling of extrinsic line contact, such as assembly, insertion, and surface interaction.

10:45-11:00, Paper TuAT2.3
Olfactory Inertial Odometry: Methodology for Effective Robot Navigation by Scent
France, Kordel | University of Texas at Dallas
Keywords: Foundations of Sensing and Estimation, Range, Sonar, GPS and Inertial Sensing, Modeling, Identification, Calibration
Abstract: Olfactory navigation is one of the most primitive mechanisms of exploration used by organisms. Navigation by machine olfaction (artificial smell) is a very difficult task to both simulate and solve. With this work, we define olfactory inertial odometry (OIO), a framework for using inertial kinematics and fast-sampling olfaction sensors to enable navigation by scent, analogous to visual inertial odometry (VIO). We establish how principles from SLAM and VIO can be extrapolated to olfaction to enable real-world robotic tasks. We demonstrate OIO with three different odour localization algorithms on a real 5-DoF robot arm over an odour-tracking scenario that resembles real applications in agriculture and food quality control. Our results indicate success in establishing a baseline framework for OIO from which other research in olfactory navigation can build, and we note performance enhancements that can be made to address more complex tasks in the future.

11:00-11:15, Paper TuAT2.4
Soft Haptic Display Toolkit: A Low-Cost, Open-Source Approach to High Resolution Tactile Feedback
Yu, Pijuan | Texas A&M University
Urquhart, Alexis | Texas A&M University
Kawazoe, Anzu | Texas A&M University
Ferris, Thomas | University of Michigan
Hipwell, M Cynthia | Texas A&M University
Friesen, Rebecca F. | Texas A&M University
Keywords: Haptics, Soft Robotics, Actuation and Actuators
Abstract: High-spatial-resolution wearable tactile arrays have drawn interest from both industry and research, thanks to their capacity for delivering detailed tactile sensations. However, investigations of human tactile perception with high-resolution tactile displays remain limited, primarily due to the high costs of multi-channel control systems and the complex fabrication required for fingertip-sized actuators. In this work, we introduce the Soft Haptic Display (SHD) toolkit, designed to enable students and researchers from diverse technical backgrounds to explore high-density tactile feedback in extended reality (XR), robotic teleoperation, braille displays, navigation aids, MR-compatible somatosensory stimulation, and remote palpation. The toolkit provides a rapid prototyping approach and real-time wireless control for a low-cost, 4×4 soft wearable fingertip tactile display with a spatial resolution of 4 mm. We characterized the display’s performance: a maximum vertical displacement of 1.8 mm, a rise time of 0.25 seconds, and a maximum refresh rate of 8 Hz. All materials and code are open-sourced to foster broader research on human tactile perception with high-resolution haptic displays.

11:15-11:30, Paper TuAT2.5
Humans and Robots, Hand-In-Hand: Using Bilateral Telepresence to Turn Robotic Hands into Wearable Haptic Exoskeletons
Kosanovic, Nicolas | University of Louisville
Chagas Vaz, Jean | University of Louisville
Keywords: Haptics, Telerobotics, Robotic Hands
Abstract: Haptic feedback is pivotal to telemanipulation; it equips human operators with intuitive physical information about a distant environment. Anthropomorphic robotic hands demonstrate a similar aptitude for complex remote manipulation. Past efforts to enrich robotic hands with haptic feedback often required prohibitively expensive specialized hardware (>50,000 USD grippers and >5,000 USD gloves) that only displayed unidirectional force feedback. In this work, we present the Hand-in-Hand (HiH): an inexpensive system to realize hand telepresence with 3D force feedback via bilateral robot control. By transforming an inexpensive robotic hand (<500 USD each) into a wearable haptic exoskeleton, users can intuitively control and feel what a distant physical agent touches in real-time. Experimentation reveals: an average motion latency of 63 ms over WLAN; an average joint position tracking RMSE of 3.73 deg; and the force feedback magnitude peaking at ~6 N. Safety is passively ensured via sacrificial exoskeleton parts that prevent excessive loads from harming wearers. Issues regarding stability and transparency are partially addressed using saturated virtual friction. Nonetheless, the HiH presents a novel, intuitive, and low-cost approach to haptic telemanipulation with humanoid robotic hands.

11:30-11:45, Paper TuAT2.6
Automatic LiDAR-Camera Online Calibration Monitoring and Refinement for Ground Platforms
Song, Wonho | KAIST
Kang, DongWan | Hanwha Aerospace
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology)
Keywords: Modeling, Identification, Calibration, Multisensor Data Fusion, Wheeled Mobile Robots
Abstract: Accurate LiDAR-camera extrinsic calibration is essential for perception in autonomous ground vehicles, but even a precisely known initial transformation can drift over time due to minor collisions or mechanical shifts. In this paper, we present an online calibration method that detects and corrects such drift without relying on special targets. Our approach continuously checks the existing extrinsic parameters by comparing newly estimated ground-plane parameters and edge-based reprojection errors. Whenever the reprojection error exceeds a preset threshold, a joint optimization refines the LiDAR-camera transform by incorporating both plane and edge constraints. Experimental validation in a simulation environment shows that the proposed framework detects small extrinsic misalignments promptly and effectively restores accurate sensor fusion over extended operation.
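The monitoring step described above — refine the extrinsics only when the reprojection error exceeds a preset threshold — can be sketched generically; the pinhole intrinsics, threshold value, and toy point data below are illustrative assumptions, not the authors' setup:

```python
import numpy as np

def reprojection_error(pts_lidar, px_obs, R, t, K):
    """Mean pixel error between observed image features and LiDAR
    points projected through extrinsics (R, t) and intrinsics K."""
    cam = R @ pts_lidar.T + t.reshape(3, 1)   # LiDAR frame -> camera frame
    uv = K @ cam                               # pinhole projection
    uv = (uv[:2] / uv[2]).T                    # normalize to pixel coords
    return float(np.mean(np.linalg.norm(uv - px_obs, axis=1)))

def needs_refinement(err_px, threshold=2.0):
    # Trigger the joint plane/edge optimization only past the threshold
    return err_px > threshold

# Toy usage: a small translation drift inflates the error and trips the check
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 4.0], [-1.0, 0.5, 6.0]])
px_true = K @ pts.T
px_true = (px_true[:2] / px_true[2]).T
err_ok = reprojection_error(pts, px_true, R, t, K)
err_drift = reprojection_error(pts, px_true, R, t + np.array([0.05, 0, 0]), K)
```

A 5 cm translation drift at roughly 5 m depth with a 500 px focal length projects to about 5 px of error, comfortably above a 2 px threshold, which is why even small mechanical shifts are detectable this way.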

TuPOS Interactive, Ballroom
Poster Session - Tuesday

11:45-13:00, Paper TuPOS.1
Utilizing Generative Artificial Intelligence for Robot Task Planning and Improved Human-Robot Collaboration
Perkins, Spencer | National Yang Ming Chiao Tung University
Khoirurizka, Nurdin | National Yang Ming Chiao Tung University
Prajogo, Joy Chrissetyo | National Yang Ming Chiao Tung University
Kuo, Chao-hsiang | National Yang Ming Chiao Tung University
Shodiq, Muhammad Ahsan Fatwaddin | National Yang Ming Chiao Tung University
Lin, Hsien-I | National Yang Ming Chiao Tung University
Keywords: AI Reasoning Methods for Robotics, Robotic Systems Architectures and Programming, Manipulation Planning and Control
Abstract: Advancements in task planning and human-robot collaboration are driving innovation in robotics. The emergence of sophisticated artificial intelligence (AI), particularly large language models (LLMs), presents significant opportunities for enhanced robotic autonomy and flexible collaboration. In this work, we propose a system that leverages an LLM to interpret natural language prompts and generate task plans. This process integrates environmental data from a vision-language model (VLM) and utilizes an action-function library defining the robot’s capabilities. In addition, we develop an intuitive graphical user interface (GUI) that not only connects to the AI task planner, but allows for user oversight throughout the planning and execution process. To validate our approach, we conducted experiments using a dual-arm robotic system to perform a complex, multi-step task: installing a wire onto a power supply. Our system demonstrates significant potential for improving task flexibility and adaptability in human-robot collaborative settings. These findings pave the way for more autonomous and versatile robotic systems in industrial and collaborative applications.

11:45-13:00, Paper TuPOS.2
4D Printable Self-Aligned Structures for Prosthetic Hands
Park, Jong Hoo | Seoul National University
Lee, Haemin | Mand.ro Co., Ltd
Ahn, Sang-Joon | Seoul National University
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory
Keywords: Robotic Hands, Rehabilitation and Healthcare Robotics, Mechanism and Design
Abstract: This paper presents a novel approach to the design and fabrication of powered prosthetic hands using Fused Deposition Modeling (FDM) enhanced by 4D printing principles. To overcome limitations in conventional methods, such as high part count, weight, and assembly complexity, and inherent drawbacks of FDM, such as large joint clearances and anisotropic strength, the study introduces thermally responsive self-adaptive mechanisms that act after printing. Two key mechanisms are proposed: a self-tightening RCJ that minimizes joint clearance and a self-aligning bending/twisting unit that modifies print orientation via controlled post-print deformation. These mechanisms are integrated into a fully 3D-printed prosthetic hand, eliminating the need for assembly. Experimental validation demonstrates improved mechanical precision and structural adaptability, highlighting the potential of this strategy to enable low-cost, customizable, and functionally robust prosthetic devices through single-step fabrication.

11:45-13:00, Paper TuPOS.3
A Hyperelastic Torque Reversal Mechanism for Soft Joints with Compression-Responsive Transient Bistability
Choi, Woo-Young | Seoul National University
Kim, Woongbae | Korea Institute of Science and Technology
Choi, Jae-Ryeong | Seoul National University
Yu, Sung Yol | Seoul National University
Moon, Seunguk | Seoul National University
Park, Yong-Jai | Kangwon National University
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory

11:45-13:00, Paper TuPOS.4
Proposal of Performance Evaluation Standard for Care Robot Safety
Jung, Sungbae | Rehabilitation Engineering Research Institute
Lee, Mihyun | Korea Orthopedics & Rehabilitation Engineering Center
Yuk, Sunwoo | Korea Orthopedics & Rehabilitation Engineering Center
Keywords: Performance Evaluation and Optimization, Rehabilitation and Healthcare Robotics
Abstract: Care robots are defined as robots or devices that use robot technology to provide physical and mental assistance to the elderly or disabled who have difficulty maintaining their daily lives. Currently, care robots are being developed to provide various forms of daily life assistance to care recipients (disabled people, including those disabled by industrial accidents, the elderly, etc.) and caregivers (professional caregivers, family members, etc.). From 2019 to 2021, our research institute conducted safety-related research projects on four types of care robots (transfer, defecation, bedsores and posture change, meals) as part of a Ministry of Health and Welfare project, and the projects were successfully completed. However, as time passes and technological development advances, the products need to be improved. In addition, five types of care robots (indoor movement, bathing assistance, flexible wearable, communication, and integrated monitoring) have been added for projects to be conducted from 2023, and research and development has begun. Therefore, it is time to improve existing products and evaluate the additionally developed products. In this study, we derive performance test items and apply test methods for the performance evaluation of care robots, so that they can be used as a standard to confirm the performance and safety of care robot functions. Performance test items are intended to provide test results by setting evaluation criteria, including the functions of each robot and the intended use scenario. Companies developing care robots will be able to identify product improvements through feedback based on the results of this study. Since the test items were applied through pilot tests at the current prototype stage, this study cannot cover all performance aspects of commercialized products or care robots with new functions. However, we believe it can serve as a good reference for developing performance test methods for the relevant performance.

11:45-13:00, Paper TuPOS.5
Extended Abstract: Autonomous Soil Collection in Environments with Heterogeneous Terrain
Dudash, Andrew | Noblis
Andrades, Beyonce | Capital One
Rubel, Ryan | University of Southern California
Goli, Mohammad | Noblis
Clark, Nathan | Noblis, Inc
Ewald, William | Noblis
Keywords: Industrial Robots, Robotic Systems Architectures and Programming, Contact: Modeling, Sensing and Control
Abstract: To autonomously collect soil in uncultivated terrain, robotic arms must distinguish between different granular materials and penetrate the correct material. We develop a prototype that collects soil in heterogeneous terrain. If mounted to a mobile robot, it can be used to perform soil collection and analysis without human intervention. Unique among soil sampling robots, we use a general-purpose robotic arm rather than a soil core sampler.

11:45-13:00, Paper TuPOS.6
Gesture Design Development for Advanced Expression of “Loving” Emotion in Social Robots
Jo, Sujin | Tech University of Korea
Hong, Seong Soo | Tech University of Korea

11:45-13:00, Paper TuPOS.7
Enhancing Worker Safety in Harbors Using Quadruped Robots
Betta, Zoe | University of Genova
Corongiu, Davide | Autorità Di Sistema Portuale Del Mar Ligure Occidentale
Recchiuto, Carmine Tommaso | University of Genova
Sgorbissa, Antonio | University of Genova
Keywords: Robot Surveillance and Security, Robotics in Hazardous Applications, Legged Robots
Abstract: Infrastructure inspection is becoming increasingly relevant in the field of robotics due to its significant impact on ensuring workers’ safety. The harbor environment presents various challenges in designing a robotic solution for inspection, given the complexity of daily operations. This work introduces an initial phase to identify critical areas within the port environment. Following this, a preliminary solution using a quadruped robot for inspecting these critical areas is analyzed.

11:45-13:00, Paper TuPOS.8
Heterogeneous Multi-Robot Coordination for Lavender Harvesting Automation
Lee, Hyeseon | Michigan Technological University
Patil, Abhishek | Michigan Technological University
Park, Myoungkuk | Michigan Technological University
Nguyen, Vinh | Michigan Technological University
Bae, Jungyun | Michigan Technological University
Keywords: Multi-Robot Systems, Wheeled Mobile Robots
Abstract: Task allocation and path planning are critical challenges in coordinating heterogeneous multi-robot systems for agricultural applications. This research focuses on automating lavender harvesting, where robots with varying capabilities must collaboratively navigate complex field layouts to efficiently complete harvesting tasks. We propose two heuristic approaches to address the specific problem of multi-robot coordination for lavender harvesting. The first approach utilizes a Large Language Model (LLM) to generate harvesting plans. By providing the LLM with small-scale examples and iteratively refining prompts with detailed descriptions of task attributes, robot capabilities, and environmental conditions, it produces feasible task allocations and paths for each robot while minimizing overall operational time. The second approach employs a greedy heuristic algorithm, which starts with an initial feasible solution and iteratively improves it by optimizing task allocation and robot paths while ensuring all constraints are satisfied. This method guarantees feasibility by directly incorporating task requirements and robot constraints into its optimization process. Both approaches are validated through simulations of lavender fields with varying sizes, layouts, and numbers of robots. The results demonstrate the effectiveness of these methods in achieving efficient task allocation and path planning for heterogeneous multi-robot systems. This work highlights the potential for these approaches to advance automation in lavender harvesting, contributing to increased efficiency and sustainability in agricultural operations, further considering fuel and payload constraints as well as required harvesting techniques.

11:45-13:00, Paper TuPOS.9
Bio-Inspired Water Jet Propulsor: Design and Experimental Validation
Lee, Juhye | Jeju National University
Jeong, Dasom | Jeju National University
Ko, Jin Hwan | Jeju National University

11:45-13:00, Paper TuPOS.10
Experimental Study on Pose Estimation and Swimming Performance of a Biomimetic Fish Robot
Han, Soochan | Jeju National University
Kim, Dong-Geon | Jeju National University
Ko, Jin Hwan | Jeju National University

11:45-13:00, Paper TuPOS.11
Balloid: Miniature Humanoid with Hybrid Design for Increased Mobility
Sohn, Kenneth | Kingswood Oxford School
Gerber, Antonio | Watkinson School
Keywords: Humanoids, Mechanism and Design, Wheeled Mobile Robots
Abstract: Balloid, a compact ball-shaped humanoid robot for teaching robotics to middle and high school students, is introduced in this study. With its hybrid design, it can switch between walking on its two legs and driving on its separately driven wheels, allowing it to navigate a variety of surfaces better than today’s educational robots. This paper covers Balloid's mechanics and control system. The mechanical design and building section describes the construction of Balloid's legs and shoulder-mounted wheels. The control system development section explains the developed software, which uses trigonometry to calculate Balloid’s lower-body actions and differential drive control for its upper-body movements. In addition, experimental results for standing and driving are presented. Future plans include exploring a momentum-based rolling mode for better energy efficiency. This project aims to provide students with hands-on experience that bridges the gap between simple wheeled robots and humanoids. Balloid’s mechanical design and construction details will be shared openly for free use and modification in STEM education.

11:45-13:00, Paper TuPOS.12
Classification of Floor Materials under Driving Motion Using Piezoelectric Actuator–Sensor Pair
Min, Jiyong | Korea University
Park, Heon Ick | Korea University
Cha, Youngsu | Korea University
Keywords: Contact: Modeling, Sensing and Control, Wheeled Mobile Robots
Abstract: In this study, we propose a floor material classification method under driving motion using a piezoelectric actuator–sensor pair. The pair consists of a piezoelectric actuator and a piezoelectric sensor. When the pair contacts the floor during driving motion, the actuator is driven while the sensor simultaneously collects signals through the floor. The collected signals were preprocessed into input data for machine learning. With this method, we classified six floor materials and one no-contact situation with a high accuracy of 95.4%.
|
|
11:45-13:00, Paper TuPOS.13 | Add to My Program |
Memory-Augmented MPC for Human-Following Robot in Cluttered Environments |
|
Chidananda, Sukruthi | University of Michigan |
Keywords: Autonomous Vehicle Navigation, Physical and Cognitive Human-Robot Interaction, Dynamics and Control
Abstract: This study introduces the Memory-Augmented Model Predictive Controller (MAMPC), which enhances safe navigation in cluttered environments in human-following scenarios by utilizing a buffer of previous optimal control inputs and their associated costs. By strategically reusing partial solutions and refining critical segments of the prediction horizon in real time, MAMPC demonstrates superior obstacle anticipation and collision avoidance compared to traditional Model Predictive Controllers across various complex scenes.
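The reuse of previous optimal inputs resembles the standard MPC warm start, in which the last solved input sequence is shifted one step and its tail repeated to initialize the next solve (a generic sketch, not the MAMPC implementation; the input bounds are illustrative):

```python
import numpy as np

def warm_start(prev_u, u_min=-1.0, u_max=1.0):
    """Shift the last optimal input sequence by one step, repeat the
    final input, and clip to bounds; a standard MPC warm start."""
    u0 = np.roll(prev_u, -1, axis=0)
    u0[-1] = prev_u[-1]
    return np.clip(u0, u_min, u_max)

u_prev = np.array([0.2, 0.4, 0.6])
u_init = warm_start(u_prev)
```

Initializing the solver this way typically cuts solve time because consecutive MPC problems differ only slightly.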
|
|
11:45-13:00, Paper TuPOS.14 | Add to My Program |
Design Strategy of SPMSM with Field Weakening Control for High-Speed Quadruped Robots |
|
Song, Tae-Gyu | Korea Advanced Institute of Science and Technology, KAIST |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Actuation and Actuators, Legged Robots, Mechanism and Design
Abstract: This study proposes a Surface-mounted Permanent Magnet Synchronous Machine (SPMSM) design strategy for applying Field Weakening Control (FWC), a technique commonly used in electric vehicles, to actuators in quadruped robots. By extending the motor's speed range, FWC enables higher locomotion speeds without exceeding the battery's voltage limits. We analyze the key motor characteristics required for effective FWC implementation in legged robot actuators and validate our approach through RAISIM simulations with reinforcement learning (RL). The results demonstrate that FWC can significantly enhance the maximum locomotion speed of HOUND2 from 11.0 m/s to 14.0 m/s in simulation, providing a foundation for high-performance legged robot actuation.
|
|
11:45-13:00, Paper TuPOS.15 | Add to My Program |
Multi-Modal Vision-Language-Navigation Model for Autonomous Flight and Obstacle Avoidance of Flying Robot |
|
Doukhi, Oualid | Jeonbuk National University |
Wang, Linfeng | JEONBUK NATIONAL UNIVERSITY |
Lim, Dongwon | University of Suwon |
Lee, Deok-jin | Jeonbuk National University |
|
11:45-13:00, Paper TuPOS.16 | Add to My Program |
Conveying 3D Surface Information on 2D Haptic Displays |
|
Harnett, MacKenzie | Texas A&M University |
Friesen, Rebecca F. | Texas A&M University |
Keywords: Haptics
Abstract: Although research into the use of shape displays as 3D design tools exists, the most appropriate and versatile haptic technologies for such a task are costly and require a high level of peripheral electronics that act as a barrier to scalability, making their integration into appropriate settings difficult. Additionally, the resultant niche nature of this technology means that the different tools and methods for conveying 3D information using haptic feedback are largely unrealized and under-reviewed. This work presents the results of the first phase of a broader set of works concerning surface haptic displays. We evaluated a 'tactile height map' method of conveying 3D information between two different pin array configurations, consisting of low- and high-density arrays. A user study exploring how users perform when assembling 3D objects using this 'tactile height map' method and these pin arrays found that pin density resulted in a significant difference in the time it took to reassemble a 3D model; however, it did not significantly affect assembly accuracy. These results can inform how future commercial surface displays can be leveraged to support complex design tasks, such as 3D modeling, effectively. Our future research adds a rendering method and expands the number of haptic surface displays to determine how 3D information can best be conveyed using tactile feedback as the main feedback mechanism.
|
|
11:45-13:00, Paper TuPOS.17 | Add to My Program |
Deep Reinforcement Learning for Snake Robot Locomotion: Achieving Natural Gaits through Tailored Reward Functions |
|
Seo, Sangryeong | University of Science and Technology (UST), Korea Atomic Energy |
Ryu, Dongseok | Texas A&M University-Corpus Christi |
Lee, Wonseo | Korea Atomic Energy Research Institute (KAERI) |
Shin, Hocheol | Korea Atomic Research Institute |
Keywords: Robot Surveillance and Security, Robotics in Hazardous Applications, Search and Rescue Robotics
Abstract: An end-to-end learning approach for snake robot locomotion using deep reinforcement learning is proposed in this paper. The reward functions tailored to each gait of a snake robot were designed by leveraging the natural locomotion patterns of snakes. A comparative analysis between reinforcement learning-based control and conventional cyclic control is discussed in this research.
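The conventional cyclic control used as a baseline for snake robots is typically a serpenoid gait of phase-shifted sinusoids; a generic sketch follows (all parameter values are illustrative, not taken from the paper):

```python
import math

def serpenoid_joint_angles(t, n_joints=8, amp=0.5, omega=2.0,
                           beta=0.6, gamma=0.0):
    """Cyclic (serpenoid) gait: each joint tracks a sinusoid with a
    constant phase offset beta; gamma biases the body for turning."""
    return [amp * math.sin(omega * t + i * beta) + gamma
            for i in range(n_joints)]

angles = serpenoid_joint_angles(t=0.0)
```

Varying omega changes gait speed, while changing the per-joint phase offset beta changes the number of body waves.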
|
|
11:45-13:00, Paper TuPOS.18 | Add to My Program |
CLAW II: Cyclorotor-Inspired Novel Wheel-Leg Mechanism for Multi-Terrain Robots |
|
Wei, Yuan | Texas A&M University |
Han, Donghoon | Texas A&M University |
Lee, Kiju | Texas A&M University |
Keywords: Mechanism and Design, Wheeled Mobile Robots
Abstract: This work-in-progress extended abstract introduces CLAW II, a novel wheel-less legged mechanism that maintains smooth-rolling motion and obstacle-climbing capabilities without a conventional wheel. Building on CLAW I, which integrated a circular wheel with leg segments, CLAW II eliminates the wheel entirely, relying solely on the curved leg geometry for continuous rolling-like motion while reducing weight and mechanical complexity. To validate CLAW II, we are developing a mobile robot equipped with CLAW II mechanisms. This robot will be tested for obstacle climbing, multi-terrain mobility, and seamless rolling motions.
|
|
11:45-13:00, Paper TuPOS.19 | Add to My Program |
Development of a Bio-Inspired Tail Mechanism for Wall-Climbing Quadruped Robots |
|
Jung, Myungwoo | KAIST |
Um, Yong | Korea Advanced Institute of Science and Technology |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Biomimetic and Bioinspired Robots, Dynamics and Control, Legged Robots
Abstract: Wall-climbing robots face significant challenges in maintaining stability during climbing. This study presents a bio-inspired tail mechanism that enables a quadrupedal robot to self-right instead of falling when encountering instability. Inspired by the biomechanics of lizards, the proposed mechanism leverages dynamic tail actuation to reorient the robot’s body against vertical surfaces to prevent it from falling. In this work, the mechanism for this tail and the inverse kinematics calculations for position control are discussed.
|
|
11:45-13:00, Paper TuPOS.20 | Add to My Program |
Exploring Dynamic Locomotion through Rolling in the Variable Topology Truss System |
|
Subedi, Rakshya | University of Nevada Las Vegas |
Bae, Andrew | University of Nevada, Las Vegas |
Keywords: Dynamics and Control, Modular Robots
Abstract: This paper introduces the dynamic rolling locomotion of the Variable Topology Truss (VTT) system. While existing research has explored motion planning and control of truss systems, including our previous work on rolling locomotion and path planning, these studies primarily focused on quasi-static motion - a methodology that inherently limits locomotion speed and efficiency. We are developing a rolling locomotion method that can maintain the VTT system's momentum during locomotion. A preliminary version of the rolling algorithm was tested in a simulation environment and the results were analyzed. Our findings establish the foundation for enhancing the locomotion capabilities of the VTT system and provide critical insights for future hardware implementation.
|
|
11:45-13:00, Paper TuPOS.21 | Add to My Program |
Work-In-Progress: Estimating Spatially-Dependent GPS Errors Using a Swarm of Robots |
|
Somisetty, Praneeth | Texas A&M University |
Griffin, Robert | University of Houston |
Montano, Victor | University of Houston |
Arevalo-Castiblanco, Miguel Felipe | University of Houston |
Becker, Aaron | University of Houston |
O'Kane, Jason | Texas A&M University |
Keywords: Multi-Robot Systems, Range, Sonar, GPS and Inertial Sensing, Aerial and Flying Robots
Abstract: External factors, including urban canyons and adversarial interference, can lead to Global Positioning System (GPS) inaccuracies that vary as a function of the position in the environment. This study addresses the challenge of estimating a static, spatially-varying error function using a team of robots. The central idea is to use sensed estimates of the range and bearing to the other robots in the team to estimate changes in bias across the environment. This abstract describes a work-in-progress algorithm for this problem that uses a quadratic optimization formulation to find a self-consistent set of pointwise bias estimates, followed by a Gaussian Process Regression (GPR) to form a bias map estimate across the full environment. We also describe an approach that uses informative path planning techniques to plan movements for the robots to improve the accuracy of these estimates. Preliminary results in simulation show the promise of the approach.
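The second stage described in the abstract, fitting a smooth bias map from pointwise estimates, can be sketched with a plain RBF-kernel Gaussian process regression in NumPy (a generic illustration, not the authors' implementation; kernel and noise parameters are arbitrary):

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, sigma=1.0):
    """Squared-exponential kernel between two sets of 2D points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma**2 * np.exp(-0.5 * d2 / length**2)

def gpr_predict(X, y, Xq, noise=1e-2):
    """Posterior mean of a zero-mean GP at query points Xq."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    return Ks @ np.linalg.solve(K, y)

# Pointwise GPS bias estimates (meters) at four robot positions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.5, 0.7, 0.4, 0.6])
bias_at_center = gpr_predict(X, y, np.array([[0.5, 0.5]]))[0]
```

The posterior variance (omitted here) is what an informative path planner would use to decide where the robots should sample next.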
|
|
11:45-13:00, Paper TuPOS.22 | Add to My Program |
Toward a Deep Learning-Guided Air-To-Ground Fire Extinguishing System for Wildfire Response |
|
Park, Gyeongphil | Yonsei University |
Kim, Dongbin | University of Hartford |
Davis, Jacob | University of Hartford |
Marchetti, Cristina | University of Hartford |
Yook, Jong-Gwan | Yonsei University |
Keywords: Search and Rescue Robotics, Aerial and Flying Robots, Deep Learning for Visual Perception
Abstract: This extended abstract presents a deep learning-guided fire extinguishing system aimed at mitigating wildfires under adverse conditions such as strong winds, nighttime, drought, and smoke. The system combines a YOLO-based object detection algorithm with a unique Nona Filter to enable real-time recognition and priority-based target tracking of fires and smoke. In controlled experiments, the GFED system successfully identified and tracked fire sources at distances up to 30 meters, maintaining consistent performance even when multiple fire and smoke instances appeared in the same frame. Ongoing work will target aerial deployment under challenging conditions such as strong winds, nighttime, and high-altitude operations. Additional enhancements include computational fluid dynamics, 6-degree-of-freedom analysis, sensor integration, and further optimization of the Nona Filter. With continued development, GFED shows strong potential to evolve into a fully autonomous wildfire suppression solution.
|
|
11:45-13:00, Paper TuPOS.23 | Add to My Program |
Learning Robotics in Augmented Reality: Design and Development of RAISE App |
|
Mohanty, Soumya | Texas A&M University |
Lee, Kiju | Texas A&M University |
Keywords: Human-Robot Augmentation
Abstract: Introductory robotics courses often rely on theoretical lectures, occasionally supplemented by physical labs that offer hands-on experience. However, such labs are costly and resource-intensive, making them impractical for many classroom settings. To overcome these limitations, we introduce RAISE (Robotics with Augmented Instruction for Student Engagement), an Augmented Reality (AR) application designed for standard mobile devices. RAISE overlays 3D robot models onto real-world environments, enabling students to interactively explore core robotics concepts--such as rigid-body motion and forward kinematics--while visualizing coordinate frames and screw axes in real time. By leveraging AR, the platform aims to enhance conceptual understanding and engagement beyond traditional methods without the cost and resources required for physical labs. Future work will evaluate its educational impact by comparing RAISE-enhanced instruction with conventional lecture-based approaches using a range of learning metrics. Planned technical updates include support for additional robot models, user-designed assemblies, advanced analytics, and expanded coverage of more complex robotics topics, positioning RAISE as a comprehensive, accessible, and adaptable educational tool.
|
|
11:45-13:00, Paper TuPOS.24 | Add to My Program |
Underwater Image Focus Determination and Calibration Using the Laplace Operator |
|
Allen, Nolan | University of Massachusetts Lowell |
Garg, Navya | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Keywords: Underwater Robotics
Abstract: Relying on RGB cameras for underwater robotics presents challenges, particularly due to varying light conditions that reduce image processing effectiveness. This work-in-progress paper explores image focus detection methods designed for dynamic underwater environments. We use the Laplacian operator to measure focus and evaluate its effectiveness through lab and underwater experiments with the Reach Alpha 5 manipulator arm. Our goal is to enable underwater robots to dynamically adjust camera positioning for clearer imaging. While effective in many scenarios, the method requires manual intervention due to the lack of standardized thresholds and reliance on raw Laplacian values. Future improvements, such as adaptive thresholding and normalization, could enhance robustness and applicability. This approach lays the foundation for real-time focus optimization in underwater robotics, benefiting autonomous inspection, manipulation, and exploration.
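A common way to reduce the Laplacian to a scalar focus score is the "variance of Laplacian" measure; a minimal NumPy sketch is below (an illustrative stand-in for the paper's pipeline, using a 4-neighbour Laplacian stencil):

```python
import numpy as np

def laplacian_focus_score(img):
    """Variance of the 4-neighbour discrete Laplacian over the image
    interior. Sharp images have strong edges, hence a large variance."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))        # high-frequency content
blurry = np.full((64, 64), 0.5)     # perfectly flat image
score_sharp = laplacian_focus_score(sharp)
score_blurry = laplacian_focus_score(blurry)
```

The raw score is scene-dependent, which matches the paper's observation that fixed thresholds on raw Laplacian values require manual intervention.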
|
|
11:45-13:00, Paper TuPOS.25 | Add to My Program |
Cooperative Target Tracking Using Heterogeneous Agents |
|
Sivaram, Bharath | Bush Combat Development Complex |
Krakow, Lucas | Texas A&M University |
Keywords: Multi-Robot Systems, Robot Surveillance and Security, Multisensor Data Fusion
Abstract: The Multiple Object Trajectory Estimation (MOTE) system, based on multi-hypothesis tracking (MHT), is designed to provide a unified target state estimate for a heterogeneous fleet of autonomous agents, addressing the need for real-time situational awareness and collaborative perception. By utilizing sensor fusion, our perception systems generate observations consisting of object class and 3D positions for Objects of Interest (OOI). These observations are shared between agents via radio communications and ingested by independent instances of MOTE, enabling each agent to maintain multi-target state estimates. The prototype was verified via deployment on a multi-vehicle autonomous fleet, and enabled consistent tracking for a dynamic target across varying fields of view.
|
|
11:45-13:00, Paper TuPOS.26 | Add to My Program |
Congestion Mitigation for Foraging Robot Swarms Using Adaptive Spiral Path Strategies |
|
Gonzalez, Arturo | University of Texas at Rio Grande Valley |
Trevino, Artemisa | University of Texas Rio Grande Valley |
Lu, Qi | The University of Texas Rio Grande Valley |
Keywords: Multi-Robot Systems, Search and Rescue Robotics
Abstract: Swarm robotics offers scalable and robust solutions for tasks such as foraging, yet congestion near central collection zones remains a critical challenge, especially with increasing swarm sizes. Traditional solutions, such as static path planning or local repulsion-based methods, often fail to prevent inter-robot collisions or bottlenecks near the collection zones. This research presents a comparative study of three strategies to mitigate congestion when returning resources to the central collection zone. The research herein focuses on tightly packed environments where, in theory, robots should follow a pre-planned spiral, either square or circular, with congestion detection as described in the first two strategies. The third strategy introduces an adaptive path that allows robots to make reactive movements in response to congestion. Furthermore, we explore the use of a deep reinforcement learning (DRL) approach that trains policies in a centralized manner but executes them in a decentralized fashion, thereby preserving swarm robotic principles. Both spiral strategies integrate multiple entry points into the central collection zone and dynamic re-routing upon congestion detection. Experimental evaluation in the ARGoS physics-based simulation environment demonstrates significant improvements in task completion time, collision reduction, and system throughput. These results indicate that structured congestion-aware trajectories can significantly improve swarm foraging efficiency.
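A pre-planned circular spiral of the kind described can be generated as Archimedean-spiral waypoints around the collection zone (a minimal sketch with illustrative parameters, not the strategies evaluated in the paper):

```python
import math

def spiral_waypoints(n, radius0=0.5, pitch=0.1, step=0.5):
    """Waypoints on an Archimedean spiral centered on the collection
    zone: the radius grows by `pitch` per radian, sampled every
    `step` radians."""
    pts = []
    theta = 0.0
    for _ in range(n):
        r = radius0 + pitch * theta
        pts.append((r * math.cos(theta), r * math.sin(theta)))
        theta += step
    return pts

wps = spiral_waypoints(8)
r_first = math.hypot(*wps[0])
r_last = math.hypot(*wps[-1])
```

Assigning robots different entry angles on such a spiral is one way to realize the multiple entry points mentioned in the abstract.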
|
|
11:45-13:00, Paper TuPOS.27 | Add to My Program |
Bridging Fiction and Reality: Evaluating the Feasibility of Adaptive Gaits Inspired by TARS from Interstellar |
|
Sripada, Aditya | Carnegie Mellon University |
|
11:45-13:00, Paper TuPOS.28 | Add to My Program |
A Multimodal Data Collection Platform Over a Ground Robot for Deep Learning-Based Estimation of Cover Crop Biomass in Field Conditions |
|
Johnson, Joe | Texas A&M University |
Chalasani, Phanender | Texas A&M University |
Shah, Arnav | Texas A&M University |
Ray, Ram | Prairie View A&M University |
Bagavathiannan, Muthukumar | Texas A&M University |
Keywords: Wheeled Mobile Robots, Multisensor Data Fusion, Mechanism and Design
Abstract: Accurate weed management is essential for mitigating significant crop yield losses, necessitating effective weed suppression strategies in agricultural systems. Integrating cover crops (CC) offers multiple benefits, including soil erosion reduction, weed suppression, decreased nitrogen requirements, and enhanced carbon sequestration, all of which are closely tied to the aboveground biomass (AGB) they produce. However, biomass production varies significantly due to microsite variability, making accurate estimation and mapping essential for identifying zones of poor weed suppression and optimizing targeted management strategies. To address this challenge, developing a comprehensive CC map, including its AGB distribution, will enable informed decision-making regarding weed control methods and optimal application rates. Manual visual inspection is impractical and labor-intensive, especially given the extensive field size and the wide diversity and variation of weed species and sizes. In this context, optical imagery and Light Detection and Ranging (LiDAR) data are two prominent sources with unique characteristics that enhance AGB estimation. This study introduces a ground robot-mounted multimodal sensor system designed for agricultural field AGB mapping. The system integrates optical and LiDAR data, leveraging machine learning methods for data fusion to improve biomass predictions. The best machine learning-based model for dry AGB estimation achieved an R^2 of 0.88, demonstrating robust performance in diverse field conditions. This approach offers valuable insights for site-specific management, enabling precise weed suppression strategies and promoting sustainable farming practices. The integration of high-resolution optical and LiDAR data from a robot-mounted system, combined with machine learning techniques, establishes a scalable framework for automated biomass estimation in large-scale agricultural field conditions, enhancing decision-making in precision agriculture.
|
|
11:45-13:00, Paper TuPOS.29 | Add to My Program |
Preliminary Design of Chain of Thought with Multimodal Large Language Model for Analog Gauge Reading in Robotic Surveillance |
|
Cho, Yongho | University of Science and Technology (UST), Korea Atomic Energy |
Lee, Wonseo | Korea Atomic Energy Research Institute (KAERI) |
Shin, Hocheol | Korea Atomic Research Institute |
Ryu, Dongseok | Texas A&M University-Corpus Christi |
Keywords: Robot Surveillance and Security, AI Reasoning Methods for Robotics, Industrial Robots
Abstract: Analog gauges remain widely used in industrial facilities, requiring routine manual monitoring that increases workload and exposes workers to hazardous environments. Recently, mobile robots have been increasingly deployed for automated surveillance tasks, reducing human intervention in hazardous environments. Traditional gauge reading methods rely on classical computer vision or deep learning models, but they face limitations such as sensitivity to lighting conditions and high data collection costs. To address these challenges, this study proposes a gauge reading approach utilizing a Multimodal Large Language Model (MLLM) combined with Chain-of-Thought (CoT) reasoning to improve accuracy without requiring extensive training data. Preliminary experimental results show that the CoT-based model achieves high accuracy in recognizing gauge panel elements such as unit labels and major markings but exhibits lower performance in needle position detection and final value estimation. These findings highlight both the strengths and limitations of CoT-based approaches, emphasizing the need for improved accuracy in needle position detection as a key focus for future research.
|
|
11:45-13:00, Paper TuPOS.30 | Add to My Program |
Insect-Like Wall Climbing Robot Capable of Flying and Walking |
|
Lee, Junseok | Korea Advanced Institute of Science and Technology (KAIST) |
Kim, Taewan | KAIST(Korea Advanced Institute of Science and Technology) |
Park, Jaewon | Korea Advanced Institute of Science and Technology (KAIST) |
Lee, Jun | KAIST |
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology) |
Keywords: Aerial and Flying Robots, Biomimetic and Bioinspired Robots, Mechanism and Design
Abstract: Exterior wall tasks (e.g., inspection, cleaning, and painting) are still predominantly performed manually, leading to significant risks and high costs due to the need for additional equipment. The growing construction of high-rise buildings, bridges, and irregularly shaped structures, along with the increasing use of diverse materials such as glass and metal, has further escalated the complexity and risks associated with exterior wall operations. Existing wall-climbing robots have been developed to address these challenges; however, they often rely on magnetic, vacuum, or pneumatic systems, which suffer from limitations such as material dependency, low energy efficiency, slow mobility, and difficulty in overcoming obstacles. To overcome these constraints, this paper presents a hybrid wall-climbing robot platform that integrates the rapid maneuverability of drones with the stability of six-legged walking robots. By leveraging the thrust of drone propellers and the contact forces of robotic legs, the proposed system achieves stable and energy-efficient adhesion and movement on walls, regardless of the surface material. Inspired by the perching and take-off behaviors of insects, the robot operates efficiently without the need for advanced control algorithms or high computational resources. The developed robot has been experimentally validated for its performance on walls with various materials and shapes, demonstrating key capabilities such as stable adhesion, efficient walking speed, and optimized energy consumption. Furthermore, it has been tested in real-world environments, demonstrating its potential for practical deployment. This technology is expected to significantly reduce operational costs and accident risks, providing substantial socio-economic benefits for exterior wall applications.
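The statics behind thrust-based wall adhesion can be summarized in one line: the propellers must press the robot against the wall hard enough that friction carries its weight (a simplified planar model with hypothetical mass and friction values, not the paper's design calculation):

```python
def min_press_thrust(mass, mu, g=9.81):
    """Minimum thrust [N] pressing the robot against a vertical wall
    so static friction can support its weight: mu * F >= m * g."""
    return mass * g / mu

# Hypothetical 2 kg robot on a surface with friction coefficient 0.5.
thrust = min_press_thrust(mass=2.0, mu=0.5)
```

The required thrust scales inversely with the friction coefficient, which is why leg-contact forces that raise effective friction reduce the thrust (and energy) needed.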
|
|
11:45-13:00, Paper TuPOS.31 | Add to My Program |
VR-Based Design and Simulation Framework for AR-Assisted Human-Robot Interaction |
|
Halverson, Travis | Texas A&M University |
Yan, Wei | Texas A&M University |
Yasskin, Philip | Texas A&M University |
Feng, Guanxi | Texas A&M University |
Van Huyck, Carl | Texas A&M University |
Keywords: Physical and Cognitive Human-Robot Interaction, Human-Robot Augmentation
Abstract: This paper presents a novel framework and describes initial progress towards designing and simulating Augmented Reality (AR) assisted Human-Robot Interaction (HRI) in industrial environments using Virtual Reality (VR). While AR has shown promise for robotic control systems, its potential benefits for non-operator stakeholders in construction and manufacturing remain largely unexplored. We propose a Virtual Reality (VR) simulation environment that enables the design and testing of AR visualizations before physical implementation. Our system provides a platform for creating context-aware AR visualizations that communicate robot intentions, such as movement previews and operational boundaries, to improve situational awareness and safety for workers in shared spaces. The framework aims to contribute to the advancement of ubiquitous robotics by bridging the gap between robotics engineers and end-users through intuitive visual communication systems.
|
|
11:45-13:00, Paper TuPOS.32 | Add to My Program |
Safety Assurance for Quadrotor Fault-Tolerant Control |
|
Tavoulareas, Theodoros | University of Houston |
de Albuquerque Gleizer, Gabriel | Delft University of Technology |
Cescon, Marzia | University of Houston |
Keywords: Dynamics and Control, Aerial and Flying Robots, Modeling, Identification, Calibration
Abstract: As the presence of autonomous drones in civilian operations continues to grow, ensuring their safe operation is crucial, as failures can lead to loss of control, system damage, property destruction, environmental harm, and even human injury. At the same time, the inherently unstable and underactuated dynamics of quadrotors make them particularly vulnerable to system faults, especially rotor failures. In this paper, we introduce a fault-tolerant control strategy using a run-time safety assurance filter based on model predictive control (MPC) to provide safety guarantees for the control input of a linear quadratic regulator (LQR) designed for trajectory following. Our method incorporates a real-time fault detection and isolation system while backup trajectories are created for different fault modes. We demonstrate the performance of our proposed approach in a 3D simulation environment using a model of the Crazyflie 2.0 drone.
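The LQR tracking controller mentioned in the abstract can be sketched by iterating the discrete-time Riccati recursion to a fixed point; here a 1D double integrator stands in for the quadrotor model (a generic sketch, not the paper's controller; weights are arbitrary):

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR gain via fixed-point Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])    # 1D double integrator
B = np.array([[0.5 * dt**2], [dt]])
K = dlqr(A, B, Q=np.eye(2), R=np.array([[1.0]]))

# Closed-loop matrix A - B K should be stable (spectral radius < 1).
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

In a safety-filter architecture, such an LQR produces the nominal input and the MPC layer only modifies it when a constraint would otherwise be violated.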
|
|
11:45-13:00, Paper TuPOS.33 | Add to My Program |
Progress and Challenges in Multiple Sensors Based Perception for Maritime Autonomous Navigation |
|
Choi, Hyun-Taek | Korea Research Institute of Ships and Oceans Engineering |
Park, Jeonghong | KRISO |
Choi, Jinwoo | KRISO, Korea Research Institute of Ships & Ocean Engineering |
Kang, Minju | Korea Research Institute of Ships & Ocean Engineering |
Choo, Ki-Beom | Korea Research Institute of Ships & Ocean Engineering(kriso) |
Kim, Jinwhan | KAIST |
Keywords: Multisensor Data Fusion, Object Recognition, Autonomous Vehicle Navigation
Abstract: With the rapid advancement of probabilistic inference methods, diverse artificial intelligence technologies, and high-performance computing hardware, technologies related to autonomous navigation have achieved considerable development. Compared to other types of vehicles, ships operate in environments with significant variability and must sustain long-duration missions at sea. Consequently, maritime situational awareness systems for detecting objects around the vessel must demonstrate high performance and reliability, while also ensuring cost-effectiveness in terms of system development and maintenance. This paper proposes a multi-object tracking system designed for autonomous ships, taking into account their unique operational characteristics. The system is composed of multiple detection sensors and navigation sensors, and features an AI-based detection algorithm integrated with a probabilistic data fusion architecture. The structure consists of two processing stages based on the purpose of data handling, and it is designed with scalability in mind. The performance of the proposed architecture and algorithm is demonstrated through two types of experimental results. Furthermore, this paper identifies the limitations of relying solely on situational awareness systems for commercial operations and underscores the inevitability of introducing a systematic and standardized method for generating and sharing positional information in autonomous ships. Taking insights from structured environments in robotics, we suggest that this approach offers a practical pathway toward achieving both economic viability and safety, thereby accelerating the commercialization of autonomous ships given the current level of technological maturity.
|
|
11:45-13:00, Paper TuPOS.34 | Add to My Program |
Marine Object Detection and Tracking Using Memory-Attention-Based Radar Processing |
|
An, Hongkyun | Korea Maritime and Ocean University |
Woo, Joohyun | Korea Maritime and Ocean University |
|
11:45-13:00, Paper TuPOS.35 | Add to My Program |
Reinforcement Learning for Robust Locomotion Over Diverse Soft Terrains |
|
Lee, Yonghoon | Korea Advanced Institute of Science and Technology, KAIST |
Kim, Keuntae | The George Washington University |
Park, Jaehyun | Korea Advanced Institute of Science & Technology (KAIST) |
Park, Chung Hyuk | George Washington University |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Legged Robots, World Modelling, Contact: Modeling, Sensing and Control
Abstract: We present a soft contact model to simulate diverse soft terrains, enabling robust legged locomotion through reinforcement learning. The model extends a standard spring-damper formulation with Stribeck-Coulomb friction and introduces randomized parameters, such as stiffness, damping, and friction coefficients, to capture the variability of real-world soft surfaces, including soil and mattresses. By replacing the default contact model in the simulator with our formulation, we train a locomotion policy using an existing learning framework. The resulting policy demonstrates stable walking on both flat and inclined soft terrains with the Unitree Go1 robot in simulation. Notably, it generalizes to rigid ground without explicit training, highlighting improved robustness across contact conditions. This work offers a lightweight and flexible alternative to high-fidelity contact modeling for scalable locomotion training.
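The extended contact model can be sketched as a penalty spring-damper normal force combined with a Stribeck-Coulomb friction coefficient (a generic formulation; all parameter values are illustrative, not those randomized in the paper):

```python
import math

def normal_force(penetration, pen_rate, k=5000.0, c=50.0):
    """Spring-damper (penalty) normal force; contact cannot pull,
    so the force is clamped at zero."""
    return max(0.0, k * penetration + c * pen_rate)

def stribeck_mu(slip_speed, mu_c=0.4, mu_s=0.7, v_s=0.1):
    """Stribeck-Coulomb friction coefficient: near the static value
    mu_s at low slip speed, decaying to the Coulomb value mu_c."""
    return mu_c + (mu_s - mu_c) * math.exp(-(slip_speed / v_s) ** 2)

fn = normal_force(penetration=0.002, pen_rate=0.0)
mu_stick = stribeck_mu(0.0)    # near the static coefficient
mu_slide = stribeck_mu(1.0)    # decays toward the Coulomb value
```

Randomizing k, c, and the friction parameters per episode is what lets a learned policy generalize across soft surfaces of different stiffness.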
|
|
11:45-13:00, Paper TuPOS.36 | Add to My Program |
Robust Collision Avoidance for ASVs Using Deep Reinforcement Learning with Sim2Real Methods |
|
Han, ChangGyu | Korea Maritime & Ocean University |
Woo, Joohyun | Korea Maritime and Ocean University |
|
11:45-13:00, Paper TuPOS.37 | Add to My Program |
Probability-Based Manipulability Score for Comparing Redundant Manipulators with Different Degrees of Freedom |
|
Kim, Juhyun | Seoul National University |
You, Seungbin | Seoul National University |
Sung, Eunho | Seoul National University |
Kim, Dongjun | Seoul National University |
Park, Jaeheung | Seoul National University |
Keywords: Performance Evaluation and Optimization
Abstract: This paper presents the Probability-Based Manipulability Score (PBMS), a new metric for comparing articulated manipulators with different degrees of freedom. PBMS uses a log-scaled score in a voxelized workspace to capture the effects of kinematic redundancy. This overcomes the upper-bound limitation of conventional indicators, which constrains the performance index even when the degrees of freedom increase, and enables comparison across manipulators with different degrees of freedom. A simulation comparing TOCABI's arm with a test manipulator in a common workspace was performed to validate the approach. The results demonstrate that PBMS can effectively guide the task-specific design of redundant manipulators.
|
|
TuBT1 Regular, Room T1 |
Add to My Program |
AI & Deep Learning |
|
|
|
14:10-14:25, Paper TuBT1.1 | Add to My Program |
Stability Ensured Deep Reinforcement Learning for Online Bin Packing |
|
Gao, Ziyan | Japan Advanced Institute of Science and Technology |
Chong, Nak Young | Japan Advanced Institute of Science and Technology |
Keywords: AI Reasoning Methods for Robotics, Industrial Robots, Motion Planning and Obstacle Avoidance
Abstract: The Online Bin Packing Problem (OBPP) aims to determine the optimal loading position for each incoming item to maximize bin utilization, a critical challenge in various industrial applications. While many studies have focused on learning-based policies and heuristic approaches to enhance packing efficiency, stability constraints have largely been overlooked. In this work, we propose a computationally efficient method to validate stable loading positions for incoming items without requiring exact knowledge of their physical properties, such as mass. Our approach leverages the concept of Load-Bearable Convex Polygons (LBCPs), which provide substantial support forces to ensure structural stability. We further integrate our static stability validation framework into a state-of-the-art deep reinforcement learning (DRL) model, guiding it to learn physically feasible packing strategies. Experimental results demonstrate that our stability-aware DRL model achieves comparable packing efficiency while ensuring robust bin stability, offering a significant advancement in practical OBPP applications.
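The LBCP construction itself is not spelled out in the abstract, but the static-stability test it serves, that a placed item's centre of mass must project inside a supporting convex region, can be sketched as a point-in-convex-polygon check (a simplified illustration, not the paper's method):

```python
def com_inside_support(com_xy, polygon):
    """True if the centre-of-mass projection lies inside a convex
    support polygon given as counter-clockwise (x, y) vertices."""
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The cross product must be non-negative for every CCW edge.
        if (x2 - x1) * (com_xy[1] - y1) - (y2 - y1) * (com_xy[0] - x1) < 0:
            return False
    return True

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
stable = com_inside_support((0.5, 0.5), square)
unstable = com_inside_support((1.5, 0.5), square)
```

Because only the support geometry is needed, such a test works without knowing item masses, in the spirit of the mass-free validation the abstract describes.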
|
|
14:25-14:40, Paper TuBT1.2 | Add to My Program |
RoboCSKBench: Benchmarking Embodied Commonsense Capabilities of Large Language Models |
|
Töberg, Jan-Philipp | Bielefeld University |
Kenneweg, Svenja | University of Bielefeld |
Cimiano, Philipp | CITEC |
Keywords: AI Reasoning Methods for Robotics, Performance Evaluation and Optimization
Abstract: Robots and intelligent assistants are increasingly performing tasks autonomously in household settings. While navigation-based tasks are straightforward, open-ended tasks require reasoning on the basis of commonsense knowledge. Towards fostering the development of systems that can use and reason on commonsense knowledge to tackle open-ended tasks, we propose RoboCSKBench, a natural language-based multi-task benchmark to assess embodied commonsense knowledge capabilities of agents and systems interacting in dynamic household environments. Our benchmark combines various resources (e.g. knowledge graphs, manipulation benchmarks, crowdsourcing) to provide data for five different, commonly encountered household tasks: Tidy Up, Tool Usage, Meta-Reasoning, Table Setting and Procedural Knowledge. Each task comprises data and evaluation metrics supporting the evaluation of systems incorporating embodied commonsense knowledge. While the benchmark consists of five tasks at the time of writing, it can be extended with further tasks in the future. Building on the benchmark, we assess the capabilities of three state-of-the-art large language models on the various tasks of the benchmark. Our results show that model performance varies across tasks, with no single model clearly outperforming the others; all models exhibit limitations, leaving room for further optimization and improvement.
|
|
14:40-14:55, Paper TuBT1.3 | Add to My Program |
Monte Carlo Beam Search for Actor-Critic Reinforcement Learning in Continuous Control |
|
Alzorgan, Hazim | Clemson University |
Razi, Abolfazl | Clemson University |
Keywords: AI Reasoning Methods for Robotics, Performance Evaluation and Optimization, Robotic Systems Architectures and Programming
Abstract: Actor-critic methods, like Twin Delayed Deep Deterministic Policy Gradient (TD3), depend on basic noise-based exploration, which can result in suboptimal policy convergence. In this study, we introduce Monte Carlo Beam Search (MCBS), a new hybrid method that combines beam search and Monte Carlo rollouts with TD3 to improve exploration and action selection. MCBS produces several candidate actions around the policy’s output and assesses them through short-horizon rollouts, enabling the agent to make better-informed choices. We test MCBS across various continuous-control benchmarks, including HalfCheetah-v4, Walker2d-v5, and Swimmer-v5, showing enhanced sample efficiency and performance compared to standard TD3 and other baseline methods like SAC, PPO, and A2C. Our findings emphasize MCBS’s capability to enhance policy learning through structured look-ahead search while ensuring computational efficiency. Additionally, we offer a detailed analysis of crucial hyperparameters, such as beam width and rollout depth, and explore adaptive strategies to optimize MCBS for complex control tasks. Our method shows a higher convergence rate across different environments compared to TD3, SAC, PPO, and A2C. For instance, we achieved 90% of the maximum achievable reward within around 200 thousand timesteps, compared to 400 thousand timesteps for the second-best method.
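The candidate-generation and rollout-scoring loop the abstract describes can be sketched generically. This is not the authors' TD3 integration: the Gaussian perturbation, the `step_model` one-step simulator, and the default beam width and depth are all illustrative assumptions.

```python
import random

def mcbs_action(state, policy, step_model, beam_width=8, depth=5,
                noise=0.1, gamma=0.99, rng=random):
    """Pick the best of several perturbed candidate actions via short rollouts.

    policy: state -> action (list of floats); step_model (assumed simulator):
    (state, action) -> (next_state, reward).
    """
    base = policy(state)
    # beam of candidates: the policy's action plus noisy perturbations of it
    candidates = [base] + [
        [a + rng.gauss(0.0, noise) for a in base] for _ in range(beam_width - 1)
    ]
    best_a, best_ret = None, float("-inf")
    for a0 in candidates:
        s, ret, disc, a = state, 0.0, 1.0, a0
        for _ in range(depth):          # short-horizon Monte Carlo rollout
            s, r = step_model(s, a)
            ret += disc * r
            disc *= gamma
            a = policy(s)               # follow the base policy after step one
        if ret > best_ret:
            best_a, best_ret = a0, ret
    return best_a
```

Because the unperturbed policy action is always in the beam, the selected candidate's rollout return can never be worse than the policy's own, which is the intuition behind the improved convergence the authors report.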
|
|
14:55-15:10, Paper TuBT1.4 | Add to My Program |
Integrating Depth Priors into PixelNeRF for Enhanced Global Guidance |
|
You, Eunyoung | KIST |
Hu, Sumin | StradVision, Inc |
Kim, Jeewon | KIST |
Seo, Hyunseok | Korea Institute of Science and Technology (KIST) |
Keywords: Deep Learning for Visual Perception, Computer Vision and Visual Servoing
Abstract: We present an improved pixelNeRF adaptation that integrates depth features, image global features, and the SIREN module to improve novel view synthesis (NVS) in sparse-view settings. While pixelNeRF effectively reconstructs scenes from limited views, it struggles with blurriness and insufficient high-frequency details, particularly in occluded regions. To address this, depth features provide structural guidance, global features enhance consistency in sparsely observed areas, and SIREN improves fine-grained detail capture. Experiments on ILSH and DTU datasets show that the proposed method reduces blurriness and improves occlusion reconstruction, validated through qualitative and quantitative evaluations. Future work should refine feature aggregation to better handle complex environments, demonstrating the potential of multi-scale feature integration for NVS in sparse-view scenarios.
|
|
15:10-15:25, Paper TuBT1.5 | Add to My Program |
Evaluating Data Collection Methods for Vision-Based Learning in Humanoid Robot Soccer |
|
Hong, Ethan | Geffen Academy at UCLA |
Ahn, Ji Sung | University of California, Los Angeles |
Lee, Sangjoon | University of California, Los Angeles |
Flores Alvarez, Arturo Moises | University of California, Los Angeles |
Wang, Shiqi | UCLA |
Hong, Dennis | UCLA |
Keywords: Deep Learning for Visual Perception, Humanoids, Object Recognition
Abstract: Humanoid robots competing in RoboCup (an international robot soccer competition) must perceive their environments under highly dynamic and often unpredictable conditions. Requiring a vision system for localization and path planning, teams typically need to collect training image data for the machine learning object detection model. This paper presents an empirical comparison of three commonly used methods: handheld cameras, gimbal-mounted systems, and rollable tripods. Experimental results show that the gimbal-mounted approach consistently outperforms the other two, yielding superior precision and recall when detecting crucial soccer field landmarks and the game ball. These results highlight that data collection methods which effectively simulate the robot’s actual visual experience during gameplay lead to more robust and reliable vision models. Inspired by these findings, we implemented a revised vision pipeline in our latest humanoid robot, ARTEMIS, capturing data directly from its onboard stereo camera system. This approach proved instrumental in achieving reliable object detection in real-time, even under severe motion blur and degraded field conditions during the dynamic matches, resulting in our eventual victory. We discuss the advantages and limitations of each data collection method, emphasizing the critical role of matching the robot’s real-world visual experience to achieve champion-level performance.
|
|
15:25-15:40, Paper TuBT1.6 | Add to My Program |
Efficient and Robust Pallet Detection Using RGB-D Sensors and Synthetic Data Augmentation |
|
Son, Jungho | NAVIFRA Corp |
Maeng, Woohyun | NAVIFRA Corp |
Kim, Yeongsoo | NAVIFRA Corp |
Jung, Minkuk | NAVIFRA Corp |
Keywords: Deep Learning for Visual Perception, Multisensor Data Fusion, Object Recognition
Abstract: This paper proposes a deep learning-based approach for pallet detection and pose estimation using RGB-D sensors, addressing key challenges in forklift-operated logistics and manufacturing environments. Existing research often faces limitations such as environmental constraints and data scarcity. To overcome these issues, our method combines synthetic data generation with Diffusion model-based data augmentation techniques, generating diverse pallet datasets with varying sizes and shapes using advanced 3D simulation tools. A modified YOLOv11 network is introduced to detect pallet bounding boxes and estimate the center and corner points of cuboids. The network is trained on the generated data and evaluated using real-world RGB-D data in real-time. The proposed approach significantly improves the precision of pallet detection and pose estimation in complex environments, contributing to logistics automation and offering broader implications for various industrial applications.
|
|
15:40-15:55, Paper TuBT1.7 | Add to My Program |
A Walk to Remember: MLLM Memory-Driven Visual Navigation |
|
Vitharana, Sandun Sampath | Texas A&M University |
Mallikarachchi, Sanjaya | Texas A&M University |
Hatharasin Gamage, Chamika Wijayagrahi | Coventry University |
Abizov, Nuralem | International Engineering and Technological University |
Amanzhol, Bektemessov | International Engineering and Technological University |
Ibrayev, Aidos | International Engineering Technological University |
Godage, Isuru S. | Texas A&M University |
Keywords: Motion Planning and Obstacle Avoidance, Robotic Systems Architectures and Programming, AI Reasoning Methods for Robotics
Abstract: This paper presents a novel framework for memory-based navigation for terrestrial robots, utilizing a customized multimodal large language model (MLLM) to interpret visual inputs and generate navigation commands. The system employs a Unitree GO1 robot equipped with a camera to capture environmental images, which are processed by the customized MLLM for navigation. By leveraging a memory-based approach, the robot efficiently reuses previously traversed paths, reducing the need for re-exploration and enhancing navigation efficiency. The hybrid controller in this work features a deliberation unit and a reactive controller for high-level commands and robot alignment. Experimental validation in a hallway-like environment demonstrates that memory-driven navigation improves path retracing and overall performance.
|
|
TuBT2 Regular, Room T2 |
Add to My Program |
Autonomous System/Vehicle Navigation
|
|
|
14:10-14:25, Paper TuBT2.1 | Add to My Program |
Navigation and Optimized Support Rover |
|
Hidalgo, Daniel | Texas A&M |
Keywords: Autonomous Vehicle Navigation
Abstract: This paper presents the design and development of a low-maintenance, modular lunar rover for long-term service and maintenance operations on the Moon. The rover is engineered for extended durability (~1 year), using low-backlash cycloidal drive systems and solar-resistant materials to minimize wear and maintenance. Key features include a 6-degree-of-freedom (DOF) robotic arm capable of lifting 50 kg (81 N under lunar gravity). To ensure autonomous operation in a dynamic lunar environment, the rover integrates advanced sensors, including a Zed Mini Camera, Micro Lidar sensors, and IMU modules, controlled by a Jetson Nano-based system. Autonomous navigation and payload manipulation are enabled through computer vision models trained on convolutional neural networks (CNNs), with PID-controlled dynamic adjustments. The rover is powered by a Lithium Iron Phosphate battery, allowing for hot-swappable operation and extended activity. Additionally, Gazebo simulations will be used to refine control algorithms before deployment. This design aims to enhance long-term operational efficiency and reduce logistical costs for sustained lunar exploration.
|
|
14:25-14:40, Paper TuBT2.2 | Add to My Program |
Passive Camera-Based Vehicle Orientation Estimation for Autonomous Systems |
|
Boncek, John | United States Military Academy |
Engel, Ronan | United States Military Academy |
Lowrance, Christopher John | United States Military Academy |
Salmento, Joseph | United States Military Academy |
Keywords: Autonomous Vehicle Navigation, Deep Learning for Visual Perception, Computer Vision and Visual Servoing
Abstract: A critical task for autonomous vehicles is not only detecting surrounding objects but also predicting their orientation and heading, particularly for nearby vehicles. Understanding the direction a neighboring vehicle is facing and its potential trajectory enables autonomous systems to make informed navigation decisions, avoid collisions, and operate safely in traffic. This paper develops and evaluates two passive-sensing methods that leverage machine learning to predict vehicle orientation from two-dimensional (2D) images. The first approach employs deep learning to classify a vehicle’s orientation into one of eight general directions. The second method utilizes a cascaded approach with object detection, bounding box area analysis, and regression to predict a more precise orientation of the vehicle. A dataset of 1,424 labeled images, each annotated with the relative heading difference of the distant vehicle with respect to the observing vehicle, was collected and used in both approaches. The findings of this research indicate that the second, cascaded approach is particularly effective, achieving a Mean Absolute Error of 5.06 degrees, demonstrating its potential for robust vehicle tracking applications.
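The first approach's output space (eight general directions) amounts to quantizing a relative heading into 45-degree sectors. The sketch below shows one such binning; the label names and the clockwise-positive sign convention are illustrative assumptions, not the paper's annotation scheme.

```python
# Eight sector labels ordered clockwise from straight ahead (assumed convention).
LABELS = ["front", "front-right", "right", "rear-right",
          "rear", "rear-left", "left", "front-left"]

def heading_to_class(deg):
    """Map a relative heading in degrees to one of 8 orientation classes.

    Sectors are 45 degrees wide and centered on the eight directions,
    so e.g. anything in [-22.5, 22.5) maps to "front".
    """
    sector = int(((deg % 360) + 22.5) // 45) % 8
    return LABELS[sector]
```

The paper's second, cascaded approach replaces this coarse binning with a regression that achieves a 5.06-degree mean absolute error.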
|
|
14:40-14:55, Paper TuBT2.3 | Add to My Program |
Robust and Precise Autonomous Mobile Robot System for Misplaced Rack Center Navigation in Small to Medium-Sized Warehouses |
|
Park, Jin Ho | KAIST |
Lee, Jeong tae | Korea Advanced Institute of Science and Technology |
Yang, Seunghoon | KAIST |
Choi, Keun Ha | Korea Advanced Institute of Science and Technology |
Kim, Kyung-Soo | KAIST(Korea Advanced Institute of Science and Technology) |
Keywords: Autonomous Vehicle Navigation, Industrial Robots, Wheeled Mobile Robots
Abstract: This paper presents a robust and precise Autonomous Mobile Robot (AMR) system tailored for small to medium-sized warehouses, specifically addressing the challenges of navigating misplaced racks in constrained environments. We developed a compact AMR capable of effective operation in spatially limited environments, optimizing it for the smaller racks commonly found in these warehouses. The key contribution is the development of a precise rack center navigation algorithm using 2D LiDAR, allowing the AMR to detect rack legs, navigate between them, and accurately position itself for collision-free and efficient lifting operations, even when racks are misaligned. Experimental validation in real warehouse settings demonstrates the system's superior performance in terms of path stability, error tolerance, and adaptability, proving its potential to significantly enhance logistics automation in small to medium-sized warehouses.
|
|
14:55-15:10, Paper TuBT2.4 | Add to My Program |
Autonomous Multi-Floor and Narrow Indoor Exploration Using Multi-Criteria Decision-Making Approach |
|
Roh, Juhyeong | Korea Advanced Institute of Science and Technology (KAIST) |
Kim, Jinwon | KRM |
Park, Chanwoo | KAIST |
Shim, David Hyunchul | KAIST |
Keywords: Autonomous Vehicle Navigation, Search and Rescue Robotics, Robotic Systems Architectures and Programming
Abstract: Exploring narrow and multi-floor indoor environments presents significant challenges due to their confined spaces and structural complexity. This paper introduces a novel exploration strategy based on Multi-Criteria Decision-Making (MCDM) to address these challenges effectively. The proposed algorithm dynamically manages exploration coverage and utilizes ray-casting techniques tailored to the size of the environment to identify exploration candidates efficiently. Additionally, it incorporates a robust staircase detection and traversal mechanism using 3D LiDAR sensors, enabling seamless exploration across multiple floors. Experimental validation in real-world maze-like environments demonstrated the algorithm's capability to thoroughly explore confined spaces, detect and overcome staircases, and resume exploration on new floors. The results confirmed the algorithm's effectiveness in achieving comprehensive exploration and robust performance, validated through experiments conducted under diverse and challenging environmental conditions.
|
|
15:10-15:25, Paper TuBT2.5 | Add to My Program |
Auditory Perception in Open-Source Driving Simulator CARLA |
|
Priest, Erik | Texas A&M University |
Cassity, Alyssa | Texas A&M University |
Nina, Kang | Goldman Sachs |
Tao, Jian | Texas A&M University |
Keywords: Autonomous Vehicle Navigation, World Modelling, Foundations of Sensing and Estimation
Abstract: This paper presents a proof of concept for integrating real-time audio classification into autonomous vehicle systems using the open-source autonomous driving simulator, CARLA. With the increasing need for autonomous vehicles to operate safely in their environment, the addition of auditory signal perception (e.g., emergency sirens) can improve navigation in urban settings. Using support vector machines, we developed a binary classification model capable of identifying sirens within the simulated environment, allowing simulated autonomous vehicles to detect and respond to emergency signals. Using our open-source framework built on CARLA, we synthesize realistic urban driving scenarios while collecting and processing audio data. Results demonstrate the potential of auditory perception systems in autonomous vehicle development, improving the vehicle’s situational awareness and paving the way for further developments in audio-responsive autonomous driving technology. This research showcases the flexibility of CARLA for auditory simulation and highlights the potential of audio detection to improve autonomous vehicle safety and environmental awareness. To facilitate further research and development in this area, we have made our implementation open-source.
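The SVM itself is not shown in the abstract, but a classifier like this typically consumes spectral features. As a hedged illustration only (the band limits and naive DFT are assumptions, not the paper's feature pipeline), one such feature is the fraction of signal energy inside a presumed siren band:

```python
import math

def band_energy_ratio(samples, rate, lo_hz=500.0, hi_hz=1800.0):
    """Fraction of spectral energy inside [lo_hz, hi_hz], via a naive DFT.

    O(n^2); fine for short analysis windows, illustrative only.
    """
    n = len(samples)
    total, band = 0.0, 0.0
    for k in range(1, n // 2):          # skip DC, use positive frequencies
        freq = k * rate / n
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag2 = re * re + im * im
        total += mag2
        if lo_hz <= freq <= hi_hz:
            band += mag2
    return band / total if total else 0.0
```

A siren-band tone pushes the ratio toward 1, while low-frequency engine hum stays near 0; feeding such per-window ratios (among other features) to an SVM is a standard way to build the kind of binary siren detector the paper describes.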
|
|
15:25-15:40, Paper TuBT2.6 | Add to My Program |
Navigation in Underground Parking Lot by Semantic Occupancy Grid Map Prediction |
|
Lee, Handong | Korea Advanced Institute of Science and Technology |
Choi, Donghyun | Korea Advanced Institute of Science and Technology |
Lee, Sangmin | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Song, Heejin | Korea Advanced Institute of Science and Technology |
Keywords: Intelligent Robotic Vehicles, Autonomous Vehicle Navigation
Abstract: Autonomous navigation in complex environments, such as underground parking lots, poses significant challenges due to the absence of prior maps and reliance on real-time perception. This paper proposes a comprehensive framework for mapless exploration and navigation, integrating a Semantic Occupancy Grid Network (SoCNet) and a Navigator module. SoCNet predicts local semantic occupancy grids from sensor data, achieving an average pixel accuracy of 0.8234 on test maps. The Navigator constructs and updates a global semantic occupancy grid using a Bayesian approach, incorporating distance-based weighting to mitigate uncertainties in distant predictions. Exploration targets, termed Topology Nodes, are sampled and scored based on proximity and semantic likelihood, guiding the robot via an A* planner. Evaluated in the Isaac Sim environment across multiple trials, the framework successfully explored all spaces in 11 out of 12 trials (91.7% success rate), despite occasional revisits to known areas. Our results demonstrate robust adaptability and efficiency, offering a practical solution for autonomous navigation in unmapped, dynamic settings.
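Bayesian occupancy fusion of the kind the Navigator performs is usually implemented in log-odds form. The sketch below shows that standard technique; the linear distance down-weighting is an assumed stand-in for the paper's (unspecified) weighting scheme.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def update_cell(log_odds, p_pred, distance, max_range=10.0):
    """Fuse one predicted occupancy probability into a cell's log-odds.

    Far-away predictions are down-weighted (assumed linear falloff),
    so uncertain distant observations move the cell belief less.
    """
    w = max(0.0, 1.0 - distance / max_range)
    return log_odds + w * logit(p_pred)

def occupancy(log_odds):
    """Log-odds -> probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Log-odds addition makes repeated fusion a cheap accumulation per cell, which is why this form is the default for occupancy grids updated from streaming predictions.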
|
|
15:40-15:55, Paper TuBT2.7 | Add to My Program |
H* Algorithm: Enhancing A* for Smoother and More Feasible Robot Navigation |
|
Gabrielov, Sergei | University of Houston-Downtown |
Izadi, Azadeh | UHD |
Keywords: Motion Planning and Obstacle Avoidance, Autonomous Vehicle Navigation, Intelligent Robotic Vehicles
Abstract: This paper presents H*, an enhanced A* algorithm designed to improve the realism and efficiency of pathfinding for real-world agents, especially in robotics. H* uses hexagonal grid decomposition and an improved geometric heuristic to model traversable space more effectively. By incorporating the agent’s velocity and turning radius, H* generates smoother, more realistic paths and detects sharp turns tailored to kinematic constraints. These enhancements aim to reduce travel time while maintaining feasibility for physical agents. Experimental results demonstrate the algorithm’s practical benefits in realistic scenarios.
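The baseline H* builds on, A* over a hexagonal grid, can be sketched with standard axial coordinates and the usual hex-distance heuristic. H*'s velocity and turning-radius refinements are not reproduced here; this is only the conventional foundation.

```python
import heapq

# The six neighbor offsets of a hex cell in axial (q, r) coordinates.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_dist(a, b):
    """Admissible heuristic: minimum number of hex steps between two cells."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def astar_hex(start, goal, blocked=frozenset()):
    """A* over unit-cost hex cells; returns the path as a list of cells."""
    frontier = [(hex_dist(start, goal), 0, start, None)]
    came, cost = {}, {start: 0}
    while frontier:
        _, g, cur, parent = heapq.heappop(frontier)
        if cur in came:
            continue
        came[cur] = parent
        if cur == goal:
            path = [cur]
            while came[path[-1]] is not None:
                path.append(came[path[-1]])
            return path[::-1]
        for dq, dr in HEX_DIRS:
            nxt = (cur[0] + dq, cur[1] + dr)
            if nxt in blocked:
                continue
            ng = g + 1
            if ng < cost.get(nxt, float("inf")):
                cost[nxt] = ng
                heapq.heappush(frontier, (ng + hex_dist(nxt, goal), ng, nxt, cur))
    return None
```

Six equally spaced neighbor directions are what make hex grids attractive for smoother paths than square grids: heading changes between steps are at most 60 degrees, which is the property H* then exploits with kinematic constraints.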
|
|
TuCT1 Regular, Room T1 |
Add to My Program |
Cognitive Human-Robot Interaction & Learning from Humans |
|
|
|
16:10-16:25, Paper TuCT1.1 | Add to My Program |
Parameter-Free Segmentation of Robot Movements with Cross-Correlation Using Different Similarity Metrics |
|
Carvalho, Wendy | University of Massachusetts Lowell |
Elkoudi, Meriem | University of Massachusetts Lowell |
Hertel, Brendan | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Keywords: Learning from Humans, Cognitive Human Robot Interaction
Abstract: Often, robots are asked to execute primitive movements, whether as a single action or in a series of actions representing a larger, more complex task. These movements can be learned in many ways, but a common one is from demonstrations presented to the robot by a teacher. However, these demonstrations are not always simple movements themselves, and complex demonstrations must be broken down, or segmented, into primitive movements. In this work, we present a parameter-free approach to segmentation using techniques inspired by autocorrelation and cross-correlation from signal processing. In cross-correlation, a representative signal is found in some larger, more complex signal by correlating the representative signal with the larger signal. This same idea can be applied to segmenting robot motion and demonstrations, provided with a representative motion primitive. This results in a fast and accurate segmentation that requires no parameters. One of the main contributions of this paper is the modification of the cross-correlation process by employing similarity metrics that can capture features specific to robot movements. To validate our framework, we conduct several experiments on complex tasks both in simulation and in the real world. We also evaluate the effectiveness of our segmentation framework by comparing various similarity metrics.
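The core cross-correlation idea, sliding a representative primitive along a longer trajectory and scoring each offset with a pluggable similarity metric, can be sketched in a few lines. The negated sum-of-squared-errors metric below is a generic placeholder, not one of the paper's robot-specific metrics.

```python
def sse_similarity(window, primitive):
    """Placeholder metric: higher is more similar (negated squared error)."""
    return -sum((w - p) ** 2 for w, p in zip(window, primitive))

def best_match(trajectory, primitive, similarity=sse_similarity):
    """Slide the primitive over the trajectory; return the best offset.

    Offsets whose score peaks indicate where the primitive occurs,
    which is the basis for parameter-free segmentation.
    """
    m = len(primitive)
    scores = [similarity(trajectory[i:i + m], primitive)
              for i in range(len(trajectory) - m + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Swapping `similarity` for a metric sensitive to, say, velocity profiles or shape is exactly the kind of modification the paper investigates.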
|
|
16:25-16:40, Paper TuCT1.2 | Add to My Program |
Visuotactile Diffusion Policy: Automated Failure Recovery in Assistive Tasks with Tactile Manipulation Using Imitation Learning |
|
Sharma, Sagar | George Washington University |
Kim, Yonghyun | George Washington University |
Park, Chung Hyuk | George Washington University |
Keywords: Learning from Humans, Grasping, Manipulation Planning and Control
Abstract: Imitation learning is a powerful technique for teaching autonomous agents a variety of tasks. However, many imitation learning algorithms suffer from the fundamental problem of error propagation. Typically, imitation learning is framed as a supervised learning problem; however, during live operation, the agent operates in a state space generated by its own actions instead of expert demonstrations. This co-variate shift can lead to agents either failing to complete designated tasks, taking dangerous actions which can lead to damage or harm, or simply copying expert behavior without completing the task (copycat problem). The problems of co-variate shift and expert copying are especially important in safety-critical environments, such as assistive robotics, where simple errors can bear high costs. While algorithmic solutions exist for this problem, these often rely on constraining the agent policy or simply improving decision making without resolving the issue of error propagation. To address these challenges, we present an efficient solution for resolving error propagation by introducing the tactile modality in fine-grained grasping and manipulation tasks. To this end, we present Visuotactile Diffusion Policy, a policy learning framework which allows for automated failure recovery. The purpose of this study was to explore an out-of-the-box technique for preventing co-variate shift and the copycat problem, especially in grasping tasks for assistive robotic systems. The primary contribution of this study was to develop a robotic system and policy learning framework capable of automated failure recovery in grasping tasks. Along with this, we demonstrate how tactile sensing can lead to more robust robotic control policies and provide a generalizable solution for co-variate shift in robotic control tasks.
|
|
16:40-16:55, Paper TuCT1.3 | Add to My Program |
Robot Learning Using Multi-Coordinate Elastic Maps |
|
Hertel, Brendan | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Keywords: Learning from Humans, Physical and Cognitive Human-Robot Interaction
Abstract: To learn manipulation skills, robots need to understand the features of those skills. An easy way for robots to learn is through Learning from Demonstration (LfD), where the robot learns a skill from an expert demonstrator. While the main features of a skill might be captured in one differential coordinate (i.e., Cartesian), they could have meaning in other coordinates. For example, an important feature of a skill may be its shape or velocity profile, which are difficult to discover in the Cartesian differential coordinate. In this work, we present a method which enables robots to learn skills from human demonstrations via encoding these skills into various differential coordinates, then determines the importance of each coordinate to reproduce the skill. We also introduce a modified form of Elastic Maps that includes multiple differential coordinates, combining statistical modeling of skills in these differential coordinate spaces. Elastic Maps, which are flexible and fast to compute, allow for the incorporation of several different types of constraints and the use of any number of demonstrations. Additionally, we propose methods for auto-tuning several parameters associated with the modified Elastic Map formulation. We validate our approach in several simulated experiments and a real-world writing task with a UR5e manipulator arm.
|
|
16:55-17:10, Paper TuCT1.4 | Add to My Program |
Learning Dexterous Robot Hand Control by Imitating Human Hands |
|
Yan, Yashuai | Vienna University of Technology |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Learning from Humans, Robotic Hands, Grasping
Abstract: This paper presents an unsupervised deep-learning method for controlling dexterous robotic hands by mimicking human hand motions. We introduce a cross-domain similarity metric to capture the spatial and kinematic relationships between human and robot hands. Using this metric, our approach learns a shared latent space that aligns motion features across the two embodiments. The framework consists of two separate encoders that map human and robot hand data into the latent space, along with a robot decoder that generates feasible robot hand motions. During inference, only the human hand encoder and the robot hand decoder are needed to seamlessly retarget human hand movements to the robot hand, enabling scalable and flexible motion retargeting without requiring paired human-robot data. To demonstrate real-world applicability, we integrate our motion retargeting system with Mediapipe, a human hand pose estimator, enabling real-time robotic hand control from RGB video input.
|
|
17:10-17:25, Paper TuCT1.5 | Add to My Program |
Put a Lid on It! a Learning-Free Method to Cap a Container Via Physical Simulations |
|
Su, Wan | National University of Singapore |
Zhu, Rong | National University of Singapore |
Chen, Ziao | National University of Singapore |
Li, Wanze | National University of Singapore |
Chirikjian, Gregory | University of Delaware |
Keywords: Manipulation Planning and Control, Learning from Humans
Abstract: Putting a lid on a container is a very common and crucial task in daily life. In this paper, we propose a novel learning-free method for robots to 'imagine' the matching of unseen open containers and lids via physical simulation. After reconstructing the objects with the Gaussian process distance field, open container imagination is conducted initially to generate the footprint. The footprint is analyzed to determine the relative pose between the container and lid. Then the optimal matching pose is identified by carrying out matching imagination. Experiments were conducted in real-world scenarios. Our method outperforms an LLM-based method, reaching a success rate of 90% when the robot autonomously puts lids on containers. The code is available on our GitHub page.
|
|
17:25-17:40, Paper TuCT1.6 | Add to My Program |
The Role of Drone Appearance and Capability in Human Trust: A Comparative vs. Isolated Analysis |
|
Rezaei Khavas, Zahra | UMass Lowell |
Majdi, Amin | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Robinette, Paul | University of Massachusetts Lowell |
Keywords: Physical and Cognitive Human-Robot Interaction, Cognitive Human Robot Interaction, Search and Rescue Robotics
Abstract: Advancements in autonomy, navigation, and sensor systems have led to the increased deployment of drones in high-risk applications, such as mapping operations. While drones can mitigate the dangers associated with these missions, human trust in drones is essential for their effective use. This study explores the influence of key factors, including drone appearance, capabilities, protective cage, and noise on human trust. We implemented two different methodologies: (1) an isolated approach, in which the effects of each drone’s appearance and capabilities were studied independently, and (2) a comparative approach, where participants evaluated two drones with different appearances and capabilities in direct comparison. The experiment results indicate that while drone appearance influences human trust, drone capabilities have a significantly greater impact. Additionally, comparing the two methodologies revealed that the comparative approach directs participants’ attention more effectively to the studied factors. One of the primary contributions of this work is the introduction of a tested and effective method to investigate the effects of different factors on trust between humans and drones. Our findings can help robot designers develop drones suited for diverse scenarios by identifying the features most valued by human operators.
|
|
TuCT2 Regular, Room T2 |
Add to My Program |
Industrial & Field Robotics |
|
|
|
16:10-16:25, Paper TuCT2.1 | Add to My Program |
Soft Rod-Like Robot Crawling: Overcoming Tube Boundaries for Enhanced Navigation |
|
Wang, Zhengguang | Southern Methodist University |
Khedewy, Amira | Southern Methodist University (SMU) |
Lee, Sangwon | Southern Methodist University |
Duygu, Yasin Cagatay | Southern Methodist University |
Kim, MinJun | Southern Methodist University |
Keywords: Soft Robotics, Actuation and Actuators, Biomimetic and Bioinspired Robots
Abstract: This paper presents a motion control strategy for a magnetically actuated rod-like soft robot, enabling it to transition from free space into a tube and overcome structural boundaries. Although soft robots have shown promise in navigating constrained environments, initiating entry into narrow channels and transitioning across sudden changes in geometry, such as the boundary between open space and a confined tube, remains a significant challenge. To address this limitation, we introduce a crawling-based transition mechanism that allows the soft robot to actively engage with the tube entrance, facilitating smooth entry without relying on external guiding structures. We developed a modeling framework to analyze the propulsion dynamics, considering elastic energy storage, frictional interaction, and magnetic actuation. Experiments confirmed that propulsion efficiency depends on the stored elastic energy and how it is released. Our results suggest that controlled oscillatory actuation enables the robot to overcome boundary constraints, which could be an available approach for navigation in confined environments. This work advances magnetically driven soft robotic locomotion, with potential applications in minimally invasive procedures and targeted drug delivery.
|
|
16:25-16:40, Paper TuCT2.2 | Add to My Program |
Design and Implementation of an Intelligent Local Delivery Robot System: A Reinforcement Learning Approach |
|
Kim, Seungmin | Korea University |
Cho, Taehee | Fieldro |
Song, Young Eun | Korea University |
|
16:40-16:55, Paper TuCT2.3 | Add to My Program |
R.I.P.T.I.D.E: Robot Inspecting Parts to Increase Development Efficiency |
|
Stiles, Bradley | Texas A&M University |
Torck, David | Texas A&M University |
Duron, Angela | Independent |
Keywords: Industrial Robots, Modular Robots, Grasping
Abstract: Additive manufacturing introduces challenges in quality assurance due to the high variability and volume of produced components. Traditional manual inspection methods, such as measurements taken with calipers, are time-consuming, labor-intensive, and prone to human error. This paper presents RIPTIDE, a quality confirmation system designed to enhance inspection accuracy and efficiency. RIPTIDE integrates a robotic pick-and-place mechanism with a scanning procedure to generate high-fidelity three-dimensional models of manufactured parts. By eliminating human intervention, this system improves consistency, reduces inspection time, and streamlines the validation process for high-volume production. The proposed approach demonstrates significant potential in optimizing additive manufacturing workflows by ensuring reliable and scalable quality control.
|
|
16:55-17:10, Paper TuCT2.4 | Add to My Program |
Enhanced Robotic Gripping Accuracy through the Integration of RGB-D and Palm-Type Line Sensors |
|
Cho, Min-Young | Korea Electronics Technology Institute |
Seo, Myeongin | Korea Electronics Technology Institute |
Shin, Dongin | KETI |
Jun, Se-Woong | Korea Electronics Technology Institute |
Keywords: Multisensor Data Fusion, Grasping, Industrial Robots
Abstract: This paper introduces a novel method for precise object grasping point estimation using a palm-type line laser sensor. Active stereo sensors have difficulty in accurately determining object positions and spatial distances, which poses a challenge in robotic grasping. The proposed approach improves object recognition by accurately detecting object positions, widths, and spacing, which significantly improves the accuracy of contact points and of distance measurements between objects. This enables a robot gripper to insert tool tips without collision, enhancing overall operability. Particularly in complex environments, this method substantially improves robotic manipulation capabilities, making it more effective in industrial automation and smart manufacturing.
|
|
17:10-17:25, Paper TuCT2.5 | Add to My Program |
Challenges for Expeditionary Robotic Manufacturing Systems |
|
Guzman, Alina | Texas A&M University |
Patterson, Albert | Texas A&M University |
Keywords: Robotics in Hazardous Applications, Industrial Robots
Abstract: The ability to manufacture spare parts, complete repairs, and carry out other important manufacturing activities is a major concern for users in expeditionary environments (battlefields, remote research stations, or disaster relief areas). The challenges that arise include a limited source of energy, security concerns, an unreliable supply chain, poor local infrastructure, harsh weather, and urgency not typically encountered in regular manufacturing environments. This article develops a conceptual model for the challenges encountered in expeditionary manufacturing, with a focus on applications that use robotic systems to complete or assist in the fabrication. A case study demonstrates the concepts for a realistic scenario. This work is useful for designers and system planners who wish to use robotic systems (including CNC machines and 3D printers) to support manufacturing activities within an expeditionary environment.
|
|
17:25-17:40, Paper TuCT2.6 | Add to My Program |
Harnessing Robotic Scouts for Resilient Evacuation Policies in Disaster Scenarios |
|
Alam, Tauhidul | Lamar University |
Quader, Sufi | Lamar University |
Islam, Sadman | Lamar University |
Redwan Newaz, Abdullah Al | University of New Orleans |
Keywords: Search and Rescue Robotics, Intelligent Robotic Vehicles, Autonomous Vehicle Navigation
Abstract: Efficient evacuation route planning is critical for enhancing emergency response systems in disaster scenarios. Unlike traditional navigation systems that rely on pre-existing data and provide traffic-based routing under normal conditions, our method integrates robotic scouts, comprising drones and ground vehicles, to dynamically assist in evacuation planning during disasters. We propose an effective method for synthesizing evacuation policies that enable robotic scouts to guide evacuees through disaster-affected areas. By leveraging real-world disaster assessment data mapped onto a roadmap, we model the disaster environment and formulate the problem of generating evacuation policies in a stochastic framework using a Markov Decision Process (MDP). Within this framework, we assign location-specific costs on the roadmap based on the degree of structural damage in surrounding areas. Through policy iteration, we solve the MDP to synthesize the optimal evacuation policy for robotic scouts, ensuring effective routes from impacted zones to safe locations. Simulation results based on real-world data from previous disaster assessments, together with our performance analysis, demonstrate the effectiveness of our method and validate its potential to significantly improve disaster management and emergency response strategies.
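The abstract above describes assigning damage-based costs to roadmap nodes and solving the resulting MDP by policy iteration. As a minimal illustration of that general setup (not the authors' implementation: the roadmap, cost values, and discount factor below are all hypothetical), a deterministic shortest-path-style MDP over a tiny four-node roadmap can be solved like this:

```python
# Hypothetical sketch of an evacuation MDP: roadmap nodes are states, moves
# along edges are actions, each node carries a damage-derived cost, and node 3
# is an absorbing safe location. Values and topology are illustrative only.

edges = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [3]}   # adjacency; 3 is absorbing
cost  = {0: 5.0, 1: 3.0, 2: 1.0, 3: 0.0}            # location-specific damage cost
gamma = 0.95                                         # discount factor

def policy_iteration(edges, cost, gamma, tol=1e-8):
    states = sorted(edges)
    policy = {s: edges[s][0] for s in states}        # arbitrary initial policy
    while True:
        # Policy evaluation: iterate V(s) = c(s) + gamma * V(pi(s)) to a fixed point.
        V = {s: 0.0 for s in states}
        for _ in range(10_000):
            delta = 0.0
            for s in states:
                v = cost[s] + gamma * V[policy[s]]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: the immediate cost is the same for every move out
        # of s, so the greedy action simply picks the cheapest successor value.
        stable = True
        for s in states:
            best = min(edges[s], key=lambda nxt: V[nxt])
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

policy, V = policy_iteration(edges, cost, gamma)
```

On this toy roadmap the synthesized policy routes node 0 through the lightly damaged node 2 rather than the heavily damaged node 1, mirroring the paper's idea of steering evacuees around high-damage zones toward the safe location.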
|