Last updated on December 24, 2025. This conference program is tentative and subject to change.
Technical Program for Thursday December 18, 2025

Th1A | Great Hall
AI & ML & Deep RL | Regular Session
Chair: Liu, Chenyu | Cranfield University

11:10-11:25, Paper Th1A.1
Constrained Optimization Formulation of Bellman Optimality Equation for Online Reinforcement Learning
Lee, Hyochan | Korea Advanced Institute of Science and Technology
Choi, Kyunghwan | Korea Advanced Institute of Science and Technology
Keywords: AI & ML & Deep RL, Dynamics and Control
Abstract: This paper proposes an online reinforcement learning algorithm that directly solves the Bellman optimality equation by casting it as a constrained optimization problem. Unlike policy or value iteration, which incrementally approximate the Bellman (optimality) equation, the method treats the value function and control policy as joint decision variables and solves them simultaneously. The formulation also permits systematic incorporation of additional constraints, such as input or safety limits. Direct solution of the Bellman optimality equation enables coordinated value–policy updates that stabilize online adaptation. Explicit constraint handling ensures per-step feasibility in online settings. The problem is addressed using a Lagrangian-based primal–dual approach, resulting in online update laws that drive the Bellman error toward zero while satisfying all constraints. The effectiveness of the method is demonstrated on a constrained nonlinear optimal control task.
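The Lagrangian-based primal-dual approach named in this abstract can be illustrated on a scalar toy problem. This is a minimal sketch with an invented objective, constraint, and step sizes, not the paper's algorithm:

```python
def primal_dual(grad_f, g, grad_g, x0, alpha=0.05, beta=0.05, steps=5000):
    """Primal-dual iteration for min f(x) s.t. g(x) <= 0:
    gradient descent on the Lagrangian in x, projected gradient
    ascent in the multiplier lam (kept non-negative)."""
    x, lam = x0, 0.0
    for _ in range(steps):
        x = x - alpha * (grad_f(x) + lam * grad_g(x))  # primal descent
        lam = max(0.0, lam + beta * g(x))              # projected dual ascent
    return x, lam

# Toy problem: min (x - 3)^2 subject to x <= 1.
# The constrained optimum is x* = 1 with multiplier lam* = 4.
x_star, lam_star = primal_dual(
    grad_f=lambda x: 2.0 * (x - 3.0),
    g=lambda x: x - 1.0,
    grad_g=lambda x: 1.0,
    x0=0.0,
)
print(x_star, lam_star)  # converges toward x = 1, lam = 4
```

The same pattern, joint updates that drive a stationarity residual to zero while respecting constraints (here the KKT conditions, in the paper the Bellman error), is what the abstract's online update laws generalize to function approximation.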

11:25-11:40, Paper Th1A.2
Task-Adaptive Inverse Kinematics through LLM Guidance: Bridging Semantic Understanding and Numerical Optimisation
Yang, Shibao | University of York
Liu, Pengcheng | University of York
Keywords: Robot Manipulation, Language Models for Robotics, Dynamics and Control
Abstract: Inverse kinematics (IK) remains central to robotic manipulation, yet most solvers use fixed weighting and priorities that do not adapt to task context, limiting precision, safety, and efficiency in real settings. We propose LLM-AWQP, a framework that uses a large language model (LLM) as a semantic-to-control adapter: compact task descriptions covering factors such as object fragility, environmental constraints, and manipulation phase are mapped to IK solver configurations in an adaptive weighted quadratic programming (AWQP) formulation. The resulting policy emphasises fast approach, precise and stable grasping (with additional care for fragile objects), and safe, orientation-stable lifting, while preserving the stability and practicality of standard IK optimisation. The approach is modular (supporting different LLM backends and IK solvers) and lightweight enough for real-time use. In experiments across diverse manipulation scenarios, LLM-AWQP reduces iterations to convergence and improves task success and efficiency, demonstrating that semantic guidance can effectively shape low-level control.
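The task-adaptive weighting idea can be caricatured with a task-weighted damped-least-squares IK step on a 2-link planar arm. The two weight/damping "profiles" below are illustrative stand-ins for the configurations an LLM would select; this is not the paper's AWQP formulation:

```python
import numpy as np

def weighted_dls_step(J, err, W, damping):
    """One weighted damped-least-squares IK step:
    dq = (J^T W J + damping^2 I)^(-1) J^T W err.
    W weights task-space error components; damping trades speed for stability."""
    n = J.shape[1]
    A = J.T @ W @ J + (damping ** 2) * np.eye(n)
    return np.linalg.solve(A, J.T @ W @ err)

# 2-link planar arm with unit link lengths.
def fk(q):
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

def jac(q):
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-s1 - s12, -s12], [c1 + c12, c12]])

target = np.array([1.5, 0.8])

def solve(W, damping, iters=200):
    q = np.array([0.3, 0.4])
    for _ in range(iters):
        q = q + weighted_dls_step(jac(q), target - fk(q), W, damping)
    return q

# Illustrative task profiles (in the paper these would come from the LLM):
q_fast = solve(W=np.eye(2), damping=0.01)              # "approach": light damping
q_careful = solve(W=np.diag([5.0, 5.0]), damping=0.3)  # "fragile grasp": heavier weighting/damping
print(np.linalg.norm(fk(q_fast) - target))
```

Both profiles reach the target here; in practice the weights and damping shape the transient (speed vs. conservatism), which is the lever the semantic adapter pulls.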

11:40-11:55, Paper Th1A.3
Residual Reinforcement Learning for Robust Path Following in Unstructured Terrains
Yang, Taegeun | Korea Advanced Institute of Science and Technology
Hwang, Jiwoo | Korea Advanced Institute of Science and Technology
Son, Hyunsik | Hanwha Aerospace Co., Ltd., Korea
Yoon, Sung-eui | KAIST
Keywords: AI & ML & Deep RL
Abstract: Robust path tracking for autonomous vehicles in unstructured off-road environments is a significant challenge due to unpredictable wheel-ground interactions. While model-based controllers lack adaptability, purely learning-based methods often lack stability guarantees. To address this, we propose a novel hybrid control framework that combines a nominal Model Predictive Controller (MPC) with a residual reinforcement learning (RL) policy. Our key contribution is a terrain-aware adaptation mechanism that distills privileged physical properties, such as ground friction, from velocity history, enabling the policy to proactively compensate for terrain-induced discrepancies. Comprehensive experiments in a high-fidelity simulation demonstrate that our method not only achieves substantially lower tracking errors compared to conventional MPC and end-to-end RL, but also exhibits superior training efficiency. This work presents an effective solution for precise path following in complex, unstructured environments.
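The residual-control structure (nominal controller plus learned correction) can be sketched on a 1-D velocity-tracking example with unmodelled friction. The fixed residual gain below stands in for the trained RL policy; the plant, gains, and friction value are invented for illustration:

```python
def simulate(residual_gain, friction=0.8, dt=0.05, steps=400, v_ref=1.0):
    """Velocity tracking under an unmodelled friction term.
    A friction-unaware nominal P controller is augmented by a residual
    correction on the tracking error (a stand-in for a learned policy)."""
    v = 0.0
    for _ in range(steps):
        e = v_ref - v
        u_nominal = 2.0 * e               # nominal controller (ignores friction)
        u_residual = residual_gain * e    # residual term compensating model mismatch
        v += dt * (u_nominal + u_residual - friction * v)  # true dynamics
    return abs(v_ref - v)

err_nominal = simulate(residual_gain=0.0)   # nominal only
err_residual = simulate(residual_gain=3.0)  # nominal + residual
print(err_nominal, err_residual)  # steady-state error shrinks with the residual
```

The nominal controller alone settles with a persistent offset caused by the unmodelled friction; the residual term reduces that offset while the nominal controller keeps providing the baseline stabilizing action, which is the division of labor the abstract describes between MPC and the residual RL policy.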

11:55-12:10, Paper Th1A.4
Deep Learning for Semantic Segmentation of 3D Ultrasound Data
Liu, Chenyu | Calyo
Cecotti, Marco | Cranfield University
Vijayakumar, Harikrishnan | Cranfield University
Robinson, Patrick James | Calyo
Barson, James | Calyo
Caleap, Mihai | Calyo
Keywords: AI & ML & Deep RL, Navigation, Perception & SLAM, Advances in Sensor Technology
Abstract: Developing cost-efficient and reliable perception systems remains a central challenge for automated vehicles. LiDAR and camera-based systems dominate, yet they present trade-offs in cost, robustness and performance under adverse conditions. This work introduces a novel framework for learning-based 3D semantic segmentation using Calyo Pulse, a modular, solid-state 3D ultrasound sensor system for use in harsh and cluttered environments. A 3D U-Net architecture is introduced and trained on the spatial ultrasound data for volumetric segmentation. Results demonstrate robust segmentation performance from Calyo Pulse sensors, with potential for further improvement through larger datasets, refined ground truth, and weighted loss functions. Importantly, this study highlights 3D ultrasound sensing as a promising complementary modality for reliable autonomy.

12:10-12:25, Paper Th1A.5
E-SDS – Environment-Aware See It, Do It, Sorted: Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion
Yalcin, Enis | University College London
O'hara, Josh | University College London
Stamatopoulou, Maria | University College London
Zhou, Chengxu | University College London
Kanoulas, Dimitrios | University College London
Keywords: Navigation, Perception & SLAM, Language Models for Robotics, AI & ML & Deep RL
Abstract: Automated reward design using vision-language models (VLMs) promises to alleviate the bottleneck of manual engineering in humanoid locomotion, yet existing methods are typically 'blind', lacking the environment perception required for complex terrain. We present E-SDS (Environment-aware See it, Do it, Sorted), a framework that closes this perception gap. E-SDS integrates VLMs with real-time terrain sensor analysis to automatically generate reward functions that facilitate the training of robust perceptive locomotion policies, grounded by example videos. Evaluated on a Unitree G1 humanoid across four distinct terrains (simple, gaps, obstacles, stairs), E-SDS uniquely enabled successful stair descent, while policies trained with manually-designed rewards or a non-perceptive automated baseline were unable to complete the task. Across all terrains, E-SDS also reduced velocity tracking error by 51.9–82.6%. Our framework reduces the human effort of reward design from days to under two hours while simultaneously yielding more robust and capable locomotion policies.

12:25-12:40, Paper Th1A.6
Mitigating Attention Collapse Via Mean-Deviation Constrained Optimization
Kim, Jiyun | Gwangju Institute of Science and Technology
Choi, Kyunghwan | Korea Advanced Institute of Science and Technology
Keywords: AI & ML & Deep RL, Safe Decision Making under Uncertainty, Risk-Aware Autonomy
Abstract: Attention mechanisms are widely used in deep learning to compute contextual representations, but they are prone to collapse when attention weights concentrate excessively on a few tokens, potentially degrading model performance. We propose Mean-Deviation Constrained Attention (MDCA), an optimization-based attention mechanism that constrains the mean deviation of attention weights to mitigate attention collapse. The constraint is formulated as an inequality condition and is efficiently handled using the Augmented Lagrangian Method (ALM), enabling explicit control over attention concentration. Unlike heuristic approaches such as dropout or temperature scaling, our method introduces a principled regularization framework grounded in constrained optimization. We evaluate the proposed method on two tasks: (i) selective attention for handwriting classification using the Badge-MNIST dataset, in comparison with standard baselines including vanilla attention, entropy regularization, and temperature scaling; and (ii) imitation learning on the nuPlan dataset, compared with a representative state-of-the-art planner. On Badge-MNIST, our method improves attention selectivity and accuracy across seeds. On nuPlan, it yields safe driving in reactive closed-loop and open-loop evaluation.
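A toy version of the ALM-constrained attention idea: minimize a collapse-inducing loss over logits while capping the mean absolute deviation of the softmax weights. Finite-difference gradients keep the sketch self-contained; all sizes and coefficients are invented for illustration and this is not the paper's MDCA layer:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mad(w):
    return np.mean(np.abs(w - w.mean()))  # mean absolute deviation

def num_grad(f, z, eps=1e-5):
    g = np.zeros_like(z)
    for i in range(len(z)):
        d = np.zeros_like(z)
        d[i] = eps
        g[i] = (f(z + d) - f(z - d)) / (2 * eps)
    return g

def mdca_toy(c=0.10, rho=20.0, outer=30, inner=400, lr=0.05):
    """Minimize a collapse-inducing loss -w[0] over logits z subject to
    mad(softmax(z)) <= c, via an augmented Lagrangian outer loop."""
    z, mu = np.zeros(4), 0.0
    for _ in range(outer):
        def aug(zz):
            w = softmax(zz)
            viol = max(0.0, mad(w) - c + mu / rho)
            return -w[0] + 0.5 * rho * viol ** 2       # loss + AL penalty
        for _ in range(inner):
            z = z - lr * num_grad(aug, z)              # inner minimization
        mu = max(0.0, mu + rho * (mad(softmax(z)) - c))  # multiplier update
    return softmax(z)

w = mdca_toy()
print(w.round(3), mad(w))  # token 0 dominates, but mad stays near the cap
```

Unconstrained, the loss drives the weights to a one-hot distribution (full collapse, mad = 0.375 here); the inequality constraint lets the dominant token stay dominant while keeping the distribution spread, which is the behavior the abstract attributes to MDCA.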

Th2A | Great Hall
Navigation, Perception & SLAM I | Regular Session
Chair: Xu, Gangyan | The Hong Kong Polytechnic University

14:20-14:35, Paper Th2A.1
Surveillance System Evaluation in Occlusion and Moving Persons Scenarios Using Mobile Robot
Ejaz, Sumiya | University of Tsukuba
Yorozu, Ayanori | University of Tsukuba
Ohya, Akihisa | University of Tsukuba
Keywords: Safe Decision Making under Uncertainty, Human Robot Interaction, AI & ML & Deep RL
Abstract: Developing intelligent surveillance systems remains an active research focus. A mobile robot-based surveillance system equipped with an omnidirectional camera was designed [2, 3] to detect people, track their movements, verify face mask usage, and perform full-area monitoring in indoor environments. The system continuously detects, classifies, and localizes individuals to enable real-time monitoring without relying on traditional distance sensors. The development included a person position estimation method and robot path planning for complete area coverage and targeted person approach. This paper evaluates the system’s performance by addressing challenges related to occlusion caused by person intervals and the continuous tracking of moving individuals across various scenarios. The evaluation experiments verify whether the robot handles occlusion effectively and accurately approaches moving targets while maintaining consistent tracking. Key findings indicate that occlusion can be mitigated as the robot moves but also reveal challenges in maintaining continuous recognition in certain situations. In addition, the results confirm the system’s ability to maintain consistent tracking while accurately approaching moving individuals, while also highlighting its limitations in a specific scenario.

14:35-14:50, Paper Th2A.2
MC-BEVPlace++: Multi-Channel Bird’s-Eye-View Description for LiDAR Place Recognition
Rhee, Jeongmin | KETI (Korea Electronics Technology Institute)
K.E.T.I., Seokjun | Korea Electronics Technology Institute
Sung, Nak-Myoung | Korea Electronics Technology Institute
Jung, Sungwook | KETI (Korea Electronics Technology Institute)
Ahn, Il-Yeop | Korea Electronics Technology Institute
Choe, Chungjae | Korea Electronics Technology Institute
Keywords: AI & ML & Deep RL, Navigation, Perception & SLAM, Advances in Sensor Technology
Abstract: LiDAR place recognition (LPR) in large, cluttered, and seasonally varying environments remains challenging. Recent BEV-based methods such as BEVPlace++ improve rotation robustness via rotation-invariant training, yet their single-channel BEV inputs may not utilize the rich signals available in LiDAR. We propose MC-BEVPlace++, a multi-channel enhancement that augments the BEVPlace++ input with three complementary cues (i.e., local elevation gradient, absolute height, and reflectance intensity) to enrich geometric and appearance evidence without changing the core backbone or loss. This colored BEV representation yields more discriminative global descriptors. Experiments on the NCLT benchmark dataset demonstrate around 6% improvement over the baseline (BEVPlace++), with consistent gains across cluttered scenes and diverse weather and illumination. These results indicate that simple, physically grounded channels can substantially boost BEV-based LPR while preserving rotation invariance and implementation simplicity.
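The multi-channel BEV idea, augmenting a single-channel BEV with height, intensity, and elevation-gradient cues, can be sketched as a simple rasterizer. The per-cell statistics below are illustrative approximations of the three channels named in the abstract, not the paper's exact encoding:

```python
import numpy as np

def multi_channel_bev(points, intensity, cell=0.5, size=8.0):
    """Rasterize an (N, 3) point cloud into a 3-channel BEV image:
    channel 0: max height per cell,
    channel 1: max reflectance intensity per cell,
    channel 2: elevation spread (max - min height) as a gradient proxy."""
    n = int(size / cell)
    h_max = np.full((n, n), -np.inf)
    h_min = np.full((n, n), np.inf)
    i_max = np.zeros((n, n))
    ix = np.floor((points[:, 0] + size / 2) / cell).astype(int)
    iy = np.floor((points[:, 1] + size / 2) / cell).astype(int)
    ok = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    for x, y, z, r in zip(ix[ok], iy[ok], points[ok, 2], intensity[ok]):
        h_max[y, x] = max(h_max[y, x], z)
        h_min[y, x] = min(h_min[y, x], z)
        i_max[y, x] = max(i_max[y, x], r)
    empty = np.isinf(h_max)          # cells that received no points
    h_max[empty], h_min[empty] = 0.0, 0.0
    return np.stack([h_max, i_max, h_max - h_min])

pts = np.array([[1.0, 1.0, 0.2], [1.1, 1.1, 1.7], [-2.0, 0.5, 0.1]])
bev = multi_channel_bev(pts, intensity=np.array([0.3, 0.9, 0.5]))
print(bev.shape)  # (3, 16, 16)
```

Because the extra channels only change the input tensor, a backbone that consumed a single-channel BEV can consume this representation unchanged, which is the "no change to the core backbone or loss" property the abstract highlights.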

14:50-15:05, Paper Th2A.3
A Target-Based Multi-LiDAR Multi-Camera Extrinsic Calibration System
Gentilini, Lorenzo | University of Bologna
Serio, Pierpaolo | University of Pisa
Donzella, Valentina | Queen Mary University of London
Pollini, Lorenzo | University of Pisa
Keywords: Navigation, Perception & SLAM, Advances in Sensor Technology, Mechatronic systems
Abstract: Extrinsic Calibration represents the cornerstone of autonomous driving. Its accuracy plays a crucial role in the perception pipeline, as any errors can have implications for the safety of the vehicle. Modern sensor systems collect different types of data from the environment, making it harder to align the data. To this end, we propose a target-based extrinsic calibration system tailored for a multi-LiDAR and multi-camera sensor suite. This system enables cross-calibration between LiDARs and cameras with limited prior knowledge using a custom ChArUco board and a tailored nonlinear optimization method. We test the system with real-world data gathered in a warehouse. Results demonstrated the effectiveness of the proposed method, highlighting the feasibility of a unique pipeline tailored for various types of sensors.
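At the core of any target-based extrinsic pipeline sits a rigid-transform fit between corresponding points. A closed-form SVD (Kabsch) solution is sketched below on synthetic correspondences; the paper's ChArUco detection and tailored nonlinear refinement are not reproduced here:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t,
    via the SVD-based Kabsch method on centered point sets."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic check: recover a known sensor-to-sensor transform
# from simulated target-corner correspondences.
rng = np.random.default_rng(0)
src = rng.normal(size=(10, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true
R_est, t_est = fit_rigid_transform(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With noisy, heterogeneous observations (LiDAR corners vs. camera reprojections) this closed-form fit typically serves as the initial guess that a nonlinear optimizer, like the one the abstract describes, then refines.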

15:05-15:20, Paper Th2A.4
Efficient 3D Scene Graph Update Based on Scene Change Detection
Park, Chanyoung | Gwangju Institute of Science and Technology (GIST)
Jang, Suji | Gwangju Institute of Science and Technology
Kim, Ue-Hwan | Gwangju Institute of Science and Technology (GIST)
Keywords: AI & ML & Deep RL, Dynamics and Control
Abstract: 3D Scene Graphs (3DSGs) provide a compact semantic structure. However, existing methods typically assume static environments, even though real-world environments are dynamic. When dynamic environments are considered, these methods often rely on a dense 3D map, which leads to substantial memory and computation costs. To overcome these limitations, we propose an efficient 3DSG update framework incorporating a novel scene change detection method and a dynamic 3DSG representation. This framework operates directly on a 3DSG without reconstructing a dense 3D map. Evaluations demonstrate that our framework achieves accurate updates with significantly reduced memory and runtime, enabling practical applicability in dynamic environments.

15:20-15:35, Paper Th2A.5
Raymoval: Raycasting-Based Dynamic Object Removal for Static 3D Mapping
Kim, Daebeom | Korea Advanced Institute of Science and Technology
Lee, Seungjae | Korea Advanced Institute of Science and Technology
Jang, Seoyeon | Korea Advanced Institute of Science and Technology
Marsim, Kevin Christiansen | KAIST
Myung, Hyun | KAIST (Korea Advanced Institute of Science and Technology)
Keywords: Navigation, Perception & SLAM
Abstract: Static mapping is fundamental to robot navigation, providing a persistent geometric prior and a consistent reference for long-term autonomy. However, dynamic objects leave residual traces and cause surface loss, which reduces map consistency. We propose a raycasting-based module for dynamic object removal in static 3D mapping. Each scan is projected onto an azimuth-elevation grid, and for every viewing direction we compare the bin-wise minimum range with the map’s first-hit distance computed by raycasting. Furthermore, we apply a raycast consistency test that separates dynamic from static points. Finally, a spatial consistency validation step refines labels, producing static maps with lower residual dynamics and reduced over-removal. We evaluate our approach quantitatively and qualitatively on SemanticKITTI and a challenging custom dataset, and show consistent static mapping results.
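The raycast consistency test can be approximated in 2-D: bin the scan by azimuth, take the per-bin minimum range as free-space evidence, and flag map points that a current ray demonstrably passes through. This is an azimuth-only simplification of the paper's azimuth-elevation grid, with an invented margin:

```python
import numpy as np

def flag_dynamic(map_pts, scan_pts, n_bins=360, margin=0.2):
    """Flag a map point as dynamic when a scan ray in the same azimuth bin
    travels farther than that point: the ray passed through where the map
    claims a surface, so the surface is no longer there."""
    def bin_and_range(pts):
        az = np.arctan2(pts[:, 1], pts[:, 0])
        b = ((az + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
        return b, np.linalg.norm(pts, axis=1)

    sb, sr = bin_and_range(scan_pts)
    scan_min = np.full(n_bins, np.inf)
    np.minimum.at(scan_min, sb, sr)      # per-bin minimum scan range
    mb, mr = bin_and_range(map_pts)
    return scan_min[mb] > mr + margin    # ray went past the map point

map_pts = np.array([[5.0, 0.0], [0.0, 5.0]])   # one stale, one still present
scan_pts = np.array([[9.0, 0.0], [0.0, 5.0]])  # the +x ray now reaches 9 m
dyn = flag_dynamic(map_pts, scan_pts)
print(dyn)  # first map point is flagged dynamic, second is kept
```

The margin plays the role of the abstract's consistency thresholds: too small and sensor noise causes over-removal of static surfaces, too large and dynamic traces survive, which is the trade-off the spatial validation step is there to refine.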

15:35-15:50, Paper Th2A.6
Absolute Depth Estimation Using Fisheye Stereo Camera and Relative Depth Estimation Model
Sadamitsu, Shusuke | University of Tsukuba
Bernard, Jean-Charles David | FORVIA - FSVAP Japan
Koga, Masashi | FSVAP Japan
Ohya, Akihisa | University of Tsukuba
Yorozu, Ayanori | University of Tsukuba
Keywords: Navigation, Perception & SLAM, Advances in Sensor Technology, AI & ML & Deep RL
Abstract: High-density and high-precision depth information at close ranges of several meters is essential for various automotive applications, such as automated parking and collision avoidance systems. Although fisheye cameras are widely adopted for automotive surround view perception, existing deep learning-based depth estimation methods output only relative depth, requiring camera-specific fine-tuning to obtain absolute depth. This study proposes a depth estimation method that integrates sparse absolute depth from fisheye stereo cameras with dense relative depth from monocular depth estimation models to achieve high-density absolute depth estimation. Efficient absolute depth conversion is achieved through the selection of high-quality reference points based on bidirectional disparity calculation and texture confidence, along with optimal scaling factor estimation using distance-based uniform sampling. The effectiveness of the proposed method is validated through experiments in parking lot environments.
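The absolute-depth conversion step (fitting a scale and shift that map dense relative depth onto sparse stereo-derived absolute depth) reduces to a small least-squares problem. The sketch below omits the paper's bidirectional-disparity reference-point selection and distance-based sampling:

```python
import numpy as np

def align_depth(rel, abs_sparse, mask):
    """Fit absolute = s * relative + b on sparse reference pixels
    (least squares), then apply the fit to the dense relative-depth map."""
    r = rel[mask]
    A = np.stack([r, np.ones_like(r)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, abs_sparse[mask], rcond=None)
    return s * rel + b

# Synthetic: a dense relative-depth map, with absolute depth known
# (e.g. from fisheye stereo) at only a few reference pixels.
rng = np.random.default_rng(1)
rel = rng.uniform(0.1, 1.0, size=(4, 4))
abs_true = 8.0 * rel + 0.5                   # unknown scale/shift to recover
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True  # sparse stereo reference points
dense_abs = align_depth(rel, abs_true, mask)
print(np.allclose(dense_abs, abs_true))
```

In the synthetic case the fit is exact; with real monocular relative depth the residuals are nonzero, which is why the paper's selection of high-quality reference points matters: the fit is only as good as the sparse anchors it is solved on.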