Last updated on May 1, 2024. This conference program is tentative and subject to change.
Technical Program for Tuesday May 14, 2024
TuAA1-CC Award Session, CC-Main Hall
Automation

Chair: Wang, Michael Yu | Mywang@gbu.edu.cn
Co-Chair: Nishi, Tatsushi | Okayama University

10:30-12:00, Paper TuAA1-CC.1
TinyMPC: Model-Predictive Control on Resource-Constrained Microcontrollers
Nguyen, Khai | Carnegie Mellon University
Schoedel, Samuel | Carnegie Mellon University
Alavilli, Anoushka | Carnegie Mellon University
Plancher, Brian | Barnard College, Columbia University
Manchester, Zachary | Carnegie Mellon University
Keywords: Optimization and Optimal Control, Embedded Systems for Robotic and Automation
Abstract: Model-predictive control (MPC) is a powerful tool for controlling highly dynamic robotic systems subject to complex constraints. However, MPC is computationally demanding, and is often impractical to implement on small, resource-constrained robotic platforms. We present TinyMPC, a high-speed MPC solver with a low memory footprint targeting the microcontrollers common on small robots. Our approach is based on the alternating direction method of multipliers (ADMM) and leverages the structure of the MPC problem for efficiency. We demonstrate TinyMPC’s effectiveness by benchmarking against the state-of-the-art solver OSQP, achieving nearly an order of magnitude speed increase, as well as through hardware experiments on a 27 gram quadrotor, demonstrating high-speed trajectory tracking and dynamic obstacle avoidance.
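The ADMM splitting that TinyMPC-style solvers build on can be illustrated on a generic box-constrained QP. This is a minimal sketch under our own naming, not the paper's solver, which additionally pre-computes LQR-style factorizations offline to exploit the MPC problem structure:

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, iters=100):
    """Minimize 0.5*x'Px + q'x subject to lo <= x <= hi via ADMM.

    Each iteration is an unconstrained quadratic solve, a cheap
    projection onto the box, and a dual update -- the structure that
    makes the method fit on a microcontroller.
    """
    n = len(q)
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    K = np.linalg.inv(P + rho * np.eye(n))  # factor once, reuse every iteration
    for _ in range(iters):
        x = K @ (rho * (z - y) - q)   # primal step: quadratic solve
        z = np.clip(x + y, lo, hi)    # slack step: projection onto the box
        y = y + x - z                 # scaled dual ascent
    return z
```

Because the factorization `K` is fixed across iterations, the per-iteration cost is a matrix-vector product plus a clip, which is what keeps the memory and compute footprint small.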

10:30-12:00, Paper TuAA1-CC.2
A Movable Microfluidic Chip with Gap Effect for Manipulation of Oocytes
Liang, Shuzhang | The University of Tokyo
Amaya, Satoshi | The University of Tokyo
Sugiura, Hirotaka | The University of Tokyo
Mo, Hao | The University of Tokyo
Dai, Yuguo | The University of Tokyo
Arai, Fumihito | The University of Tokyo
Keywords: Biological Cell Manipulation, Mobile Manipulation, Automation at Micro-Nano Scales
Abstract: This study proposes a novel movable microfluidic chip in which a microfluidic chip is integrated into a robotic manipulator for manipulating oocytes. The microfluidic device has the ability to release a single cell with the gap effect. The robotic manipulator can control the position of the microfluidic chip. The microfluidic chip with a pipette tip is directly fabricated using 3D printing. Xenopus oocytes were used in the experiments. When oocytes move from the bottom side of the channel to the tip side, they generate gaps between each other. The gap distance can reach about 16 times the diameter of an oocyte. In addition, a capacitive sensor was used to detect oocytes in the manipulation processes. The results showed that oocytes were successfully released one by one with no deformation in shape using the movable microfluidic chip. The method has significant advantages in biomedical engineering and micro-nano-manipulation.

10:30-12:00, Paper TuAA1-CC.3
Under Pressure: Learning-Based Analog Gauge Reading in the Wild
Reitsma, Maurits | ETH Zurich
Keller, Julian | ETH Zurich
Blomqvist, Kenneth | ETH Zurich
Siegwart, Roland | ETH Zurich
Keywords: Robotics in Hazardous Fields, Industrial Robots, Computer Vision for Automation
Abstract: We propose an interpretable framework for reading analog gauges that is deployable on real world robotic systems. Our framework splits the reading task into distinct steps, such that we can detect potential failures at each step. Our system needs no prior knowledge of the type of gauge or the range of the scale and is able to extract the units used. We show that our gauge reading algorithm is able to extract readings with a relative reading error of less than 2%.
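The last step of any such staged pipeline, mapping a detected needle angle onto the recovered scale, reduces to linear interpolation. A hedged sketch with our own function and parameter names, not the paper's code:

```python
def gauge_value(needle_deg, start_deg, end_deg, v_min, v_max):
    """Interpolate a detected needle angle onto the gauge scale.

    start_deg/end_deg are the angles of the first and last scale tick,
    v_min/v_max the values read off those ticks. Splitting the task
    this way lets failures (bad tick detection, bad unit extraction)
    be caught before this final, trivial step.
    """
    frac = (needle_deg - start_deg) / (end_deg - start_deg)
    return v_min + frac * (v_max - v_min)
```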

10:30-12:00, Paper TuAA1-CC.4
Efficient Composite Learning Robot Control under Partial Interval Excitation
Shi, Tian | Sun Yat-Sen University
Li, Weibing | Sun Yat-Sen University
Yu, Haoyong | National University of Singapore
Pan, Yongping | Sun Yat-Sen University
Keywords: Robust/Adaptive Control
Abstract: Parameter convergence in adaptive control is crucial for improving the stability and robustness of robotic systems. Nevertheless, a stringent condition named persistent excitation (PE) needs to be satisfied to ensure parameter convergence in conventional adaptive robot control. Composite learning robot control (CLRC) is an innovative methodology that guarantees parameter convergence under a condition of interval excitation (IE) that is strictly weaker than PE. This paper puts forward a time-division multi-channel (TDMC) CLRC strategy such that parameter convergence is achieved even without the IE condition. In the TDMC mechanism, a filtered regressor is integrated with multiple time intervals to generate a generalized prediction error for parameter update, such that excitation information of regressor channels at different instants is exploited more effectively and efficiently to achieve fast and accurate parameter estimation. Global exponential stability with parameter convergence of the closed-loop system is achieved under a partial IE condition that is much weaker than IE. Experiments on a collaborative robot with 7 degrees of freedom have demonstrated the superiority of the proposed approach in both parameter estimation and trajectory tracking compared to state-of-the-art approaches.

10:30-12:00, Paper TuAA1-CC.5
MORALS: Analysis of High-Dimensional Robot Controllers Via Topological Tools in a Latent Space
Vieira, Ewerton | Rutgers University
Sivaramakrishnan, Aravind | Rutgers University
Tangirala, Sumanth | Rutgers University, New Brunswick
Granados, Edgar | Rutgers
Mischaikow, Konstantin | Rutgers University
Bekris, Kostas E. | Rutgers, the State University of New Jersey
Keywords: Dynamics, Hybrid Logical/Dynamical Planning and Verification, Motion Control
Abstract: Estimating the region of attraction (RoA) for a robot controller is essential for safe application and controller composition. Many existing methods require a closed-form expression, which limits their applicability to data-driven controllers. Methods that operate only over trajectory rollouts tend to be data-hungry. In prior work, we have demonstrated that topological tools based on Morse Graphs (directed acyclic graphs that combinatorially represent the underlying nonlinear dynamics) offer data-efficient RoA estimation without needing an analytical model. They struggle, however, with high-dimensional systems as they operate over a state-space discretization. This paper presents Morse Graph-aided discovery of Regions of Attraction in a learned Latent Space (MORALS). The approach combines auto-encoding neural networks with Morse Graphs. MORALS shows promising predictive capabilities in estimating attractors and their RoAs for data-driven controllers operating over high-dimensional systems, including a 67-dim humanoid robot and a 96-dim 3-fingered manipulator. It first projects the dynamics of the controlled system into a learned latent space. Then, it constructs a reduced form of Morse Graphs representing the bistability of the underlying dynamics, i.e., detecting when the controller results in a desired versus an undesired behavior. The evaluation on high-dimensional robotic datasets indicates data efficiency in RoA estimation.

TuAA2-CC Award Session, CC-301
Cognitive Robotics

Chair: Ogata, Tetsuya | Waseda University
Co-Chair: Beetz, Michael | University of Bremen

10:30-12:00, Paper TuAA2-CC.1
Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-To-End
Zhang, Chong | ETH Zurich
Jin, Jin | Tongji University
Frey, Jonas | ETH Zurich
Rudin, Nikita | ETH Zurich, NVIDIA
Mattamala, Matias | University of Oxford
Cadena Lerma, Cesar | ETH Zurich
Hutter, Marco | ETH Zurich
Keywords: Legged Robots, Sensorimotor Learning, Task and Motion Planning
Abstract: Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate by over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.

10:30-12:00, Paper TuAA2-CC.2
Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
Yokoyama, Naoki | Georgia Institute of Technology
Ha, Sehoon | Georgia Institute of Technology
Batra, Dhruv | Georgia Tech / Facebook AI Research
Wang, Jiuguang | Boston Dynamics AI Institute
Bucher, Bernadette | University of Michigan
Keywords: Vision-Based Navigation, AI-Enabled Robotics, Semantic Scene Understanding
Abstract: Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real world deployment can be viewed at naoki.io/vlfm.
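The frontier-selection step can be sketched as scoring each frontier against a text prompt and taking the argmax. This is a simplified stand-in: in the paper the scores come from a pre-trained vision-language model over RGB observations, whereas here the embeddings are plain vectors supplied by the caller:

```python
import numpy as np

def best_frontier(frontiers, frontier_embeds, text_embed):
    """Pick the frontier whose embedding best matches the target prompt.

    frontiers: list of (row, col) frontier cells from the occupancy map;
    frontier_embeds: one vector per frontier (VLM image features in the
    paper, arbitrary vectors here); text_embed: target-object prompt
    embedding. Scoring uses cosine similarity.
    """
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = [cos(e, text_embed) for e in frontier_embeds]
    return frontiers[int(np.argmax(scores))]
```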

10:30-12:00, Paper TuAA2-CC.3
Learning Continuous Control with Geometric Regularity from Robot Intrinsic Symmetry
Yan, Shengchao | University of Freiburg
Zhang, Baohe | University of Freiburg
Zhang, Yuan | University of Freiburg
Boedecker, Joschka | University of Freiburg
Burgard, Wolfram | University of Technology Nuremberg
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Deep Learning Methods
Abstract: Geometric regularity, which leverages data symmetry, has been successfully incorporated into deep learning architectures such as CNNs, RNNs, GNNs, and Transformers. While this concept has been widely applied in robotics to address the curse of dimensionality when learning from high-dimensional data, the inherent reflectional and rotational symmetry of robot structures has not been adequately explored. Drawing inspiration from cooperative multi-agent reinforcement learning, we introduce novel network structures for single-agent control learning that explicitly capture these symmetries. Moreover, we investigate the relationship between the geometric prior and the concept of Parameter Sharing in multi-agent reinforcement learning. Last but not least, we implement the proposed framework in online and offline learning methods to demonstrate its ease of use. Through experiments conducted on various challenging continuous control tasks on simulators and real robots, we highlight the significant potential of the proposed geometric regularity in enhancing robot learning capabilities.

10:30-12:00, Paper TuAA2-CC.4
Learning Vision-Based Bipedal Locomotion for Challenging Terrain
Duan, Helei | Oregon State University
Pandit, Bikram | Oregon State University
Gadde, Mohitvishnu S. | Oregon State University
van Marum, Bart Jaap | Oregon State University
Dao, Jeremy | Oregon State University
Kim, Chanho | Oregon State University
Fern, Alan | Oregon State University
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Reinforcement Learning
Abstract: Reinforcement learning (RL) for bipedal locomotion has recently demonstrated robust gaits over moderate terrains using only proprioceptive sensing. However, such blind controllers will fail in environments where robots must anticipate and adapt to local terrain, which requires visual perception. In this paper, we propose a fully-learned system that allows bipedal robots to react to local terrain while maintaining commanded travel speed and direction. Our approach first trains a controller in simulation using a heightmap expressed in the robot's local frame. Next, data is collected in simulation to train a heightmap predictor, whose input is the history of depth images and robot states. We demonstrate that with appropriate domain randomization, this approach allows for successful sim-to-real transfer with no explicit pose estimation and no fine-tuning using real-world data. To the best of our knowledge, this is the first example of sim-to-real learning for vision-based bipedal locomotion over challenging terrains.

10:30-12:00, Paper TuAA2-CC.5
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
Sridhar, Ajay | University of California, Berkeley
Shah, Dhruv | University of California, Berkeley
Glossop, Catherine | University of California, Berkeley
Levine, Sergey | UC Berkeley
Keywords: Deep Learning Methods, Vision-Based Navigation
Abstract: Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches.
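The goal-masking idea itself is small enough to sketch: when no goal is given, the goal slot of the policy input is zeroed, so one network serves both modes. The names here are ours, and the real model conditions a Transformer policy with a diffusion decoder rather than a simple concatenation:

```python
import numpy as np

def goal_masked_obs(obs_embed, goal_embed, goal_available):
    """Build a policy input with an optionally masked goal slot.

    During exploration (no goal located yet), the goal embedding is
    replaced by zeros; during goal-directed navigation it is passed
    through. The policy therefore never needs to switch models.
    """
    g = goal_embed if goal_available else np.zeros_like(goal_embed)
    return np.concatenate([obs_embed, g])
```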

TuAT1-CC Oral Session, CC-303
Planning under Uncertainty I

Chair: Scherer, Sebastian | Carnegie Mellon University
Co-Chair: Indelman, Vadim | Technion - Israel Institute of Technology

10:30-12:00, Paper TuAT1-CC.1
MUI-TARE: Cooperative Multi-Agent Exploration with Unknown Initial Position
Yan, Jingtian | Carnegie Mellon University
XingQiao, Lin | Carnegie Mellon University
Ren, Zhongqiang | Shanghai Jiao Tong University
Zhao, Shiqi | University of California San Diego
Yu, Jieqiong | Carnegie Mellon University
Cao, Chao | Carnegie Mellon University
Yin, Peng | City University of Hong Kong
Zhang, Ji | Carnegie Mellon University
Scherer, Sebastian | Carnegie Mellon University
Keywords: Planning under Uncertainty, Integrated Planning and Learning, Multi-Robot SLAM
Abstract: Multi-agent exploration of a bounded 3D environment with unknown initial positions of agents is a challenging problem. It requires quickly exploring the environments and robustly merging the sub-maps built by the agents. Most of the current exploration strategies directly merge two sub-maps built by different agents when a single frame of overlap is detected, which can lead to incorrect merging due to the false-positive detection of the overlap and is thus not robust. Meanwhile, some state-of-the-art place recognition methods use sequence matching for more robust data association. However, naively applying these sequence-based matching methods to multi-agent exploration may require one agent to repeat a large amount of another agent's history trajectory so that a sequence of matched observation can be established, which reduces the overall exploration time efficiency. To intelligently balance the robustness of sub-map merging and exploration efficiency, we develop a new approach for lidar-based multi-agent exploration, which can direct one agent to repeat another agent's trajectory adaptively based on the quality indicator of the sub-map merging process. Additionally, our approach extends the recent single-agent hierarchical exploration strategy to multiple agents cooperatively by planning for agents with merged sub-maps to further improve exploration efficiency. Our experiments show that our approach is up to 50% more efficient than the baselines on average while merging

10:30-12:00, Paper TuAT1-CC.2
Safe Planning in Dynamic Environments Using Conformal Prediction
Lindemann, Lars | University of Southern California
Cleaveland, Matthew | University of Pennsylvania
Shim, Gihyun | University of Pennsylvania
Pappas, George J. | University of Pennsylvania
Keywords: Planning under Uncertainty, Constrained Motion Planning, Collision Avoidance
Abstract: We propose a framework for planning in unknown dynamic environments with probabilistic safety guarantees using conformal prediction. Particularly, we design a model predictive controller (MPC) that uses i) trajectory predictions of the dynamic environment, and ii) prediction regions quantifying the uncertainty of the predictions. To obtain prediction regions, we use conformal prediction, a statistical tool for uncertainty quantification, that requires availability of offline trajectory data, a reasonable assumption in many applications such as autonomous driving. The prediction regions are valid, i.e., they hold with a user-defined probability, so that the MPC is provably safe. We illustrate the results in the self-driving car simulator CARLA at a pedestrian-filled intersection. The strength of our approach is compatibility with state-of-the-art trajectory predictors, e.g., RNNs and LSTMs, while making no assumptions on the underlying trajectory-generating distribution. To the best of our knowledge, these are the first results that provide valid safety guarantees in such a setting.
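The conformal step itself is compact: given held-out prediction errors, the region radius is a finite-sample-corrected empirical quantile. A sketch in the split-conformal setting the abstract describes; function and variable names are ours:

```python
import numpy as np

def conformal_radius(errors, alpha=0.1):
    """Prediction-region radius from calibration nonconformity scores.

    errors: nonconformity scores on held-out trajectory data (e.g.
    distance between predicted and realized pedestrian positions).
    Returns a radius that covers a fresh error with probability at
    least 1 - alpha, with no assumption on the error distribution.
    """
    n = len(errors)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample correction
    if k > n:
        return float('inf')  # too few calibration points for this alpha
    return float(np.sort(np.asarray(errors))[k - 1])
```

The returned radius inflates each predicted obstacle position into a ball that the MPC must avoid, which is what makes the controller provably safe at the chosen coverage level.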

10:30-12:00, Paper TuAT1-CC.3
Wasserstein Distributionally Robust Chance Constrained Trajectory Optimization for Mobile Robots within Uncertain Safe Corridor
Xu, Shaohang | Huazhong University of Science and Technology
Ruan, Haolin | City University of Hong Kong
Zhang, Wentao | Huazhong University of Science and Technology
Wang, Yi'an | Huazhong University of Science and Technology
Zhu, Lijun | Huazhong University of Science and Technology
Ho, Chin Pang | City University of Hong Kong
Keywords: Planning under Uncertainty, Optimization and Optimal Control, Robot Safety
Abstract: Safe corridor-based Trajectory Optimization (TO) presents an appealing approach for collision-free path planning of autonomous robots, offering global optimality through its convex formulation. The safe corridor is constructed based on the perceived map, however, the non-ideal perception induces uncertainty, which is rarely considered in trajectory generation. In this paper, we propose Distributionally Robust Safe Corridor Constraints (DRSCCs) to consider the uncertainty of the safe corridor. Then, we integrate DRSCCs into the trajectory optimization framework using Bernstein basis polynomials. Theoretically, we rigorously prove that the trajectory optimization problem incorporating DRSCCs is equivalent to a computationally efficient, convex quadratic program. Compared to the nominal TO, our method enhances navigation safety by significantly reducing the infeasible motions in presence of uncertainty. Moreover, the proposed approach is validated through two robotic applications, a micro Unmanned Aerial Vehicle (UAV) and a quadruped robot Unitree A1.
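Bernstein bases are a natural fit here because a Bernstein polynomial stays inside the convex hull of its coefficients, so corridor constraints need only bind the coefficients. A minimal evaluator, not the paper's optimization code:

```python
from math import comb

def bernstein(coeffs, t):
    """Evaluate a Bernstein-basis polynomial at t in [0, 1].

    coeffs are the control coefficients; by the convex-hull property,
    the curve value always lies between min(coeffs) and max(coeffs),
    which is what lets corridor (and DRSCC-style) constraints be
    imposed on the coefficients alone.
    """
    n = len(coeffs) - 1
    return sum(c * comb(n, i) * t**i * (1 - t)**(n - i)
               for i, c in enumerate(coeffs))
```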

10:30-12:00, Paper TuAT1-CC.4
Shield Model Predictive Path Integral: A Computationally Efficient Robust MPC Method Using Control Barrier Functions
Yin, Ji | Georgia Institute of Technology
Dawson, Charles | MIT
Fan, Chuchu | Massachusetts Institute of Technology
Tsiotras, Panagiotis | Georgia Tech
Keywords: Planning under Uncertainty, Collision Avoidance, Constrained Motion Planning
Abstract: Model Predictive Path Integral (MPPI) control is a type of sampling-based model predictive control that simulates thousands of trajectories and uses these trajectories to synthesize optimal controls on-the-fly. In practice, however, MPPI encounters problems limiting its application. For instance, it has been observed that MPPI tends to make poor decisions if unmodeled dynamics or environmental disturbances exist, preventing its use in safety-critical applications. Moreover, the multi-threaded simulations used by MPPI require significant onboard computational resources, making the algorithm inaccessible to robots without modern GPUs. To alleviate these issues, we propose a novel (Shield-MPPI) algorithm that provides robustness against unpredicted disturbances and achieves real-time planning using a much smaller number of parallel simulations on regular CPUs. The novel Shield-MPPI algorithm is tested on an aggressive autonomous racing platform both in simulation and in hardware. The results show that the proposed controller greatly reduces the number of constraint violations compared to state-of-the-art robust MPPI variants and stochastic MPC methods.
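The core MPPI update that both the baseline and Shield-MPPI share is a softmin-weighted average of sampled control perturbations. A minimal sketch with our own names; the shielded variant additionally penalizes rollouts that violate a control barrier function condition:

```python
import numpy as np

def mppi_update(u_nominal, noise, costs, lam=1.0):
    """One MPPI control update from sampled rollout costs.

    u_nominal: (T, m) nominal control sequence; noise: (K, T, m)
    sampled perturbations; costs: (K,) rollout costs. Lower-cost
    rollouts get exponentially larger weight (the path-integral
    softmin), and the weighted perturbation shifts the nominal plan.
    """
    beta = costs.min()                    # subtract min for numerical stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()
    return u_nominal + np.einsum('k,ktm->tm', w, noise)
```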

10:30-12:00, Paper TuAT1-CC.5
Distributionally Robust CVaR-Based Safety Filtering for Motion Planning in Uncertain Environments
Safaoui, Sleiman | The University of Texas at Dallas
Summers, Tyler | University of Texas at Dallas
Keywords: Planning under Uncertainty, Robot Safety, Collision Avoidance
Abstract: Safety is a core challenge of autonomous robot motion planning, especially in the presence of dynamic and uncertain obstacles. Many recent results use learning and deep learning-based motion planners and prediction modules to predict multiple possible obstacle trajectories and generate obstacle-aware ego robot plans. However, planners that ignore the inherent uncertainties in such predictions incur collision risks and lack formal safety guarantees. In this paper, we present a computationally efficient safety filtering solution to reduce the collision risk of ego robot motion plans using multiple samples of obstacle trajectory predictions. The proposed approach reformulates the collision avoidance problem by computing safe halfspaces based on obstacle sample trajectories using distributionally robust optimization (DRO) techniques. The safe halfspaces are used in a model predictive control (MPC)-like safety filter to apply corrections to the reference ego trajectory thereby promoting safer planning. The efficacy and computational efficiency of our approach are demonstrated through numerical simulations.
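The safe-halfspace idea can be sketched from samples alone. This is a much-simplified stand-in for the paper's distributionally robust construction: the normal points from the obstacle sample mean toward the ego, and the offset clears the worst sample plus a margin, whereas the paper computes the offset via DRO over the sample distribution:

```python
import numpy as np

def safe_halfspace(ego, obs_samples, margin=0.1):
    """Halfspace {x : a.x >= b} containing the ego, excluding samples.

    ego: ego position; obs_samples: sampled obstacle positions from a
    trajectory predictor. Every sample lands strictly on the unsafe
    side (a.x < b); the MPC-style filter would then keep the ego on
    the safe side.
    """
    obs_samples = np.asarray(obs_samples, dtype=float)
    a = ego - obs_samples.mean(axis=0)
    a = a / np.linalg.norm(a)                      # unit normal toward ego
    b = float(np.max(obs_samples @ a)) + margin    # clear the worst sample
    return a, b
```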

10:30-12:00, Paper TuAT1-CC.6
Monte Carlo Planning in Hybrid Belief POMDPs
Barenboim, Moran | Technion - Israel Institute of Technology
Shienman, Moshe | Israel Institute of Technology
Indelman, Vadim | Technion - Israel Institute of Technology
Keywords: Planning under Uncertainty, Autonomous Agents
Abstract: Real-world problems often require reasoning about hybrid beliefs, over both discrete and continuous random variables. Yet, such a setting has hardly been investigated in the context of planning. Moreover, existing online Partially Observable Markov Decision Processes (POMDPs) solvers do not support hybrid beliefs directly. In particular, these solvers do not address the added computational burden due to an increasing number of hypotheses with the planning horizon, which can grow exponentially. As part of this work, we present a novel algorithm, Hybrid Belief Monte Carlo Planning (HB-MCP) that utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP while maintaining a hybrid belief. We illustrate how the upper confidence bound (UCB) exploration bonus can be leveraged to guide the growth of hypotheses trees alongside the belief trees. We then evaluate our approach in highly aliased simulated environments where unresolved data association leads to multi-modal belief hypotheses.
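The UCB bonus the abstract leverages is the standard UCB1 score; untried actions score infinity so they are expanded first, and the same trade-off guides which hypothesis branches get grown. A generic sketch with our own names, not HB-MCP's implementation:

```python
import math

def ucb_score(q, n_parent, n_action, c=1.414):
    """UCB1 score used to select actions during MCTS tree growth.

    q: mean return estimate of the action; n_parent and n_action are
    visit counts of the node and the action edge. The second term is
    the exploration bonus that shrinks as an action is tried more.
    """
    if n_action == 0:
        return float('inf')  # always expand untried actions first
    return q + c * math.sqrt(math.log(n_parent) / n_action)
```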

10:30-12:00, Paper TuAT1-CC.7
Data Association Aware POMDP Planning with Hypothesis Pruning Performance Guarantees
Barenboim, Moran | Technion - Israel Institute of Technology
Lev Yehudi, Idan | Technion - Israel Institute of Technology
Indelman, Vadim | Technion - Israel Institute of Technology
Keywords: Planning under Uncertainty, Autonomous Agents
Abstract: Autonomous agents that operate in the real world must often deal with partial observability, which is commonly modeled as partially observable Markov decision processes (POMDPs). However, traditional POMDP models rely on the assumption of complete knowledge of the observation source, known as fully observable data association. To address this limitation, we propose a planning algorithm that maintains multiple data association hypotheses, represented as a belief mixture, where each component corresponds to a different data association hypothesis. However, this method can lead to an exponential growth in the number of hypotheses, resulting in significant computational overhead. To overcome this challenge, we introduce a pruning-based approach for planning with ambiguous data associations. Our key contribution is to derive bounds between the value function based on the complete set of hypotheses and the value function based on a pruned subset of the hypotheses, enabling us to establish a trade-off between computational efficiency and performance. We demonstrate how these bounds can both be used to certify any pruning heuristic in retrospect and propose a novel approach to determine which hypotheses to prune in order to ensure a predefined limit on the loss. We evaluate our approach in simulated environments and demonstrate its efficacy in handling multi-modal belief hypotheses with ambiguous data associations.

10:30-12:00, Paper TuAT1-CC.8
Safe POMDP Online Planning Via Shielding
Sheng, Shili | University of Virginia
Parker, David | University of Oxford
Feng, Lu | University of Virginia
Keywords: Planning under Uncertainty, Formal Methods in Robotics and Automation
Abstract: Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. But the resulting policies cannot provide safety guarantees which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability to reach a set of goal states is one and the probability to reach a set of unsafe states is zero). We compute shields that restrict unsafe actions which would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
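At selection time, a shield simply restricts the planner's argmax to actions it certifies as safe. A minimal sketch of that integration point, assuming a precomputed set of allowed actions; the paper's contribution is in how that set is computed from the almost-sure reach-avoid specification:

```python
def shielded_argmax(q_values, allowed):
    """Pick the highest-value action among those the shield permits.

    q_values: dict mapping action -> estimated return (e.g. from
    POMCP's root node); allowed: set of actions the shield certifies
    as satisfying the reach-avoid specification.
    """
    safe = {a: q for a, q in q_values.items() if a in allowed}
    if not safe:
        raise RuntimeError("shield blocked every action")
    return max(safe, key=safe.get)
```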

10:30-12:00, Paper TuAT1-CC.9
Generating Sparse Probabilistic Graphs for Efficient Planning in Uncertain Environments
Veys, Yasmin | Massachusetts Institute of Technology
Stadler, Martina | Massachusetts Institute of Technology
Roy, Nicholas | Massachusetts Institute of Technology
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: Environments with regions of uncertain traversability can be modeled as roadmaps with probabilistic edges for efficient planning under uncertainty. We would like to generate roadmaps that enable planners to efficiently find paths with expected low costs through uncertain environments. The roadmap must be sparse so that the planning problem is tractable, but still contain edges that are likely to contribute to low-cost plans under various realizations of the environmental uncertainty. Determining the optimal set of edges to add to the roadmap without considering an exponential number of traversability scenarios is challenging. We propose the use of a heuristic that bounds the ratio between the expected path cost in our graph and the expected path cost in an optimal graph to determine whether a given edge should be added to the roadmap. We test our approach in several environments, demonstrating that our uncertainty-aware roadmaps effectively trade off between plan quality and planning efficiency for uncertainty-aware agents navigating in the graph.

TuAT2-CC Oral Session, CC-311
Mechanism Design I

Chair: Tadakuma, Riichiro | Yamagata University
Co-Chair: Jeong, Seokhwan | Mechanical Eng., Sogang University

10:30-12:00, Paper TuAT2-CC.1
Magnetic Gear-Based Actuator: A Framework of Design, Optimization, and Disturbance Observer-Based Torque Control
Song, Hangyeol | Sogang University
Lee, Edgar | Sogang University
Seo, Hyung-Tae | Kyonggi University
Jeong, Seokhwan | Mechanical Eng., Sogang University
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Force Control
Abstract: This letter presents a design framework and novel control strategy for a compact coaxial magnetic-gear-based actuation module suitable for small-to-mid-sized mechanical and robotic applications. The proposed actuation module adopts a non-contact magnetic coupling mechanism to transmit rotational power with a predetermined gear ratio, in contrast to traditional mechanical gear-based transmissions. This approach offers several advantages such as enhanced backdrivability, hardware safety, and transparency when compared to conventional contact-based transmissions. Furthermore, the magnetic coupling effect provides a spring-like characteristic that can be utilized to implement a series elastic actuation enabling sensorless torque control. The design of the magnetic gear was optimized using a differential evolution method, and a dynamic model was formulated to specify its dynamic characteristics. Finally, a composite disturbance observer-based torque control algorithm was developed, which capitalizes on the features of the magnetic spring. The proposed control algorithm was validated through several experiments.
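The disturbance-observer idea can be sketched as a first-order filter driving the estimate toward the torque residual. A generic DOB step with our own names, not the paper's composite observer, which additionally exploits the magnetic-spring torque measurement:

```python
def dob_step(d_hat, tau_cmd, tau_meas, gain, dt):
    """One Euler step of a first-order disturbance observer.

    d_hat: current disturbance estimate; tau_cmd: torque the model
    expects to act on the joint; tau_meas: torque actually observed.
    The estimate is low-pass filtered toward the residual, so a
    constant disturbance is recovered exactly in steady state.
    """
    residual = tau_meas - tau_cmd   # unmodeled torque acting on the joint
    return d_hat + gain * dt * (residual - d_hat)
```

The recovered estimate is then subtracted from the control torque, which is what lets the module track a desired torque without a dedicated torque sensor.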
|
|
10:30-12:00, Paper TuAT2-CC.2 | Add to My Program |
Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch |
|
Amish, Timothy | University of Washington |
Auletta, Jeffrey | US DEVCOM Army Research Laboratory |
Kessens, Chad C. | United States Army Research Laboratory |
Smith, Joshua R. | University of Washington |
Lipton, Jeffrey | Northeastern University |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Underactuated Robots
Abstract: In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses, which restrict generated holding torque and limit many applications. Here we show how combining the Johnsen-Rahbek (JR) effect with the exponential tension scaling of the capstan effect can produce clutches with the highest specific shear stress in the literature. Our system generated 31.3 N/cm2 shear stress and a total holding torque of 7.1 N·m while consuming only 2.5 mW/cm2 at 500 V. We present a theoretical model of an electrostatic adhesive capstan clutch and demonstrate how large-angle (θ > 2π) designs increase efficiency over planar or small-angle (θ < π) clutch designs. We also report the first unfilled polymeric material, polybenzimidazole (PBI), to exhibit the JR effect.
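The exponential tension scaling mentioned here is the classical capstan equation, T_hold = T_in · e^(μθ), and a quick sketch shows why wrap angles beyond 2π pay off. The friction coefficient below is a made-up illustrative value, not the paper's measured JR-film property.

```python
import math

def capstan_amplification(mu: float, theta: float) -> float:
    """Tension ratio T_hold / T_in from the capstan equation e^(mu * theta)."""
    return math.exp(mu * theta)

# Hypothetical friction coefficient, chosen only to show the exponential growth.
mu = 0.3
for theta in (0.5 * math.pi, math.pi, 2 * math.pi, 3 * math.pi):
    print(f"wrap {theta / math.pi:.1f}*pi -> amplification "
          f"{capstan_amplification(mu, theta):.1f}x")
```

The exponential means each extra half-turn multiplies the holding capacity, which is why the large-angle (θ > 2π) designs outperform planar ones.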
|
|
10:30-12:00, Paper TuAT2-CC.3 | Add to My Program |
Research on Bionic Foldable Wing for Flapping Wing Micro Air Vehicle |
|
Xiao, Shengjie | Beijing University of Aeronautics and Astronautics |
Hu, Kai | Beihang University |
Sun, Yuhong | Beihang University |
Wang, Yun | Beihang University |
Qin, Bo | Beihang University |
Deng, Huichao | Beihang University |
Wu, Xuan | China Nanhu Academy of Electronics and Information Technology |
Ding, Xilun | Beijing University of Aeronautics and Astronautics |
Keywords: Mechanism Design, Biomimetics, Biologically-Inspired Robots
Abstract: This paper presents a bionic foldable wing that imitates the hind wing of ladybirds. Based on the folding mechanism of the ladybird hind wing and the theory of origami, the motion model of the bionic foldable wing is established, yielding the motion law of the crease angles and the variation relationship between the panels. The bionic foldable wings use shape memory alloy to drive the wings to fold, and embedded torsion springs release energy to unfold the wings. In experiments on a vehicle equipped with the foldable wings, the lift and attitude torque of the bionic foldable wings were measured with an F/T sensor. The experimental results indicate that the aerodynamic performance is close to that of our optimized non-foldable wings. Moreover, the vehicle with foldable wings was able to overcome gravity and achieve flight, providing a novel concept for research on flapping wings.
|
|
10:30-12:00, Paper TuAT2-CC.4 | Add to My Program |
Magnetic Field-Driven Bristle-Bots |
|
Supik, Lukáš | Czech Technical University in Prague |
Stránská, Kateřina | Czech Institute of Informatics, Robotics and Cybernetics Czech T |
Kulich, Miroslav | Czech Technical University in Prague |
Preucil, Libor | Czech Technical University in Prague, CIIRC |
Somr, Michael | Czech Technical University in Prague, Faculty of Civil Engineeri |
Kosnar, Karel | Czech Technical University in Prague |
Keywords: Mechanism Design, Calibration and Identification, Kinematics
Abstract: Widespread applications for mobile robots are creating a large demand for new driving mechanisms that can handle diverse environments. Bristle-bot-like robot designs, mainly studied over the past decade, are based on vibration mechanisms built on flexible legs that enable motion on the ground. However, creating scalable and steerable bristle-bots remains a challenge. Here, we focus on developing a new kind of magnetically driven bristle-bot with wireless control and power supply that can be steered and downscaled. In experiments, we verified our concept with 3D-printed bristle-bot units equipped with body-embedded permanent magnets actuated via torque imposed by an external magnetic field. An AC-powered Helmholtz coil generated the bristle-bot's driving field, providing 2D input control over field amplitude and frequency. A variable number of legs on each side of the bristle-bot's body ensures that each side has a different frequency response. This asymmetry enables steering through a rich set of control commands: rotations with simultaneous forward and backward locomotion. We also observed and controlled a new side-locomotion phenomenon not yet described in previous studies. The results are supported by data from numerous experiments and thorough statistical analysis, indicating promising directions for future development.
|
|
10:30-12:00, Paper TuAT2-CC.5 | Add to My Program |
A Scalable Monolithic 3D Printable Variable Stiffness Mechanism |
|
Baisamy, Paul | The University of Edinburgh |
Stokes, Adam Andrew | University of Edinburgh |
Giorgio-Serchi, Francesco | University of Edinburgh |
Keywords: Mechanism Design, Compliance and Impedance Control, Additive Manufacturing
Abstract: Variable Stiffness Mechanisms (VSMs) are becoming ubiquitous in mechatronics given the benefits they provide in terms of safety and performance. Despite these assets, VSMs remain fairly complex mechanical devices lacking compactness, ease of manufacturing, and accessibility. In addition, the scarcity of commercially available VSMs means that such systems are mostly designed in-house. We propose a new type of VSM that improves on the pre-existing Jack Spring concept by making it more compact and robust. The new concept, which we refer to as the Compact Modifier of Active Coils (C-MAC) mechanism, is specifically designed to be manufactured as a monolithic 3D print. This approach makes it possible to modify a minimal set of design features, namely the spring diameter and the coil diameter, to achieve the desired range of stiffness variation. We test the proposed design in six configurations; these show hysteretic energy losses no larger than 35% over the stiffness variation and confirm that stiffness scales according to theory. Stiffnesses ranging from 0.15 N/mm to 1.02 N/mm were measured for an overall device length of 140 mm, including a maximal stroke length of 22 mm. The results confirm the excellent scalability and manufacturability of the proposed design, providing a versatile mechanism for fast prototyping.
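The Jack Spring principle that C-MAC builds on changes stiffness by changing the number of active coils, per the standard helical-spring rate formula k = G d^4 / (8 D^3 n). The material and geometry values below are hypothetical, chosen only to show the scaling; they are not the paper's design parameters.

```python
def spring_rate(G, d, D, n_active):
    """Helical spring rate k = G*d^4 / (8*D^3*n_active).
    With G in N/mm^2 and d, D in mm, k comes out in N/mm."""
    return G * d ** 4 / (8 * D ** 3 * n_active)

# Made-up printed-polymer values, for illustration only.
G = 1000.0   # shear modulus [N/mm^2]
d = 3.0      # wire diameter [mm]
D = 20.0     # mean coil diameter [mm]
for n in (10, 5, 2):
    print(f"{n} active coils -> k = {spring_rate(G, d, D, n):.3f} N/mm")
```

Halving the number of active coils doubles the stiffness, which is why locking coils in or out of the active length gives a wide, predictable stiffness range.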
|
|
10:30-12:00, Paper TuAT2-CC.6 | Add to My Program |
Modular Growing Mechanism with Multi-Axis Deformation |
|
Du, Dongdong | Zhejiang University |
Del Dottore, Emanuela | Istituto Italiano Di Tecnologia |
Mondini, Alessio | Istituto Italiano Di Tecnologia |
Sinibaldi, Edoardo | Istituto Italiano Di Tecnologia |
Mazzolai, Barbara | Istituto Italiano Di Tecnologia |
Keywords: Mechanism Design, Compliant Joints and Mechanisms, Biologically-Inspired Robots
Abstract: Plant cells expand and elongate, and their cumulative actuation defines organ morphing. Inspired by this modular transformability, this study proposes a modular concept for growing robots that grow by adding Transformable Modules (TMs) at their tip. We provide a two-module implementation to evaluate the concept's viability. We designed and characterized the Shape-Retention Bellows (SRBs) that constitute the TM and are used to maintain the shape once the extension force is relaxed. We demonstrate module radial expansion and axial elongation in straight and bent configurations (up to ∼4°). This is the first growing-robot concept to exploit the robot's modularity and transferability for future deployment in distributed growing systems capable of acting in various scenarios.
|
|
10:30-12:00, Paper TuAT2-CC.7 | Add to My Program |
Design and Experimental Characterisation of a Novel Quasi-Direct Drive Actuator for Highly Dynamic Robotic Applications |
|
Perez Diaz, Carlos Adrian | Arquimea Research Center |
Muñoz Planelles, Ignacio | Arquimea Research Center |
Martin Hernandez, Luis Daniel | Arquimea Research Center |
Candelo Zuluaga, Carlos Andres | Arquimea Research Center |
Torres-Rodríguez, Iván Jesús | Institut De Robòtica I Informàtica Industrial, CSIC-UPC |
Marsa, Jordi | ARC |
Sanz-Merodio, Daniel | Arquimea Research Center |
López Estévez, Miguel | Arquimea Research Center |
Keywords: Mechanism Design, Dynamics, Actuation and Joint Mechanisms
Abstract: This paper presents the design and experimental results of a proprioceptive, high-bandwidth quasi-direct drive (QDD) actuator for highly dynamic robotic applications. A comprehensive review of the mechanical design of the PULSE115-60 actuator is presented, with particular focus on the design parameters affecting the dynamic performance of the actuator, and a full specification is provided. Fundamental parameters describing the dynamic behaviour of an actuator are discussed, and an experimental method to determine the speed and torque bandwidth of the actuator is presented. A rigorous method to determine backdrive torque is also explained. Finally, experimental results quantifying the dynamic performance of the PULSE115-60 actuator are discussed. The PULSE115-60 actuator has a highly dynamic response, surpassing the torque bandwidth at low torque amplitudes showcased in the state-of-the-art literature. The differences between current and torque bandwidth, two concepts often conflated in the literature, are elucidated. Experimental procedures detailed in previous work are discussed, and a novel standardised procedure is proposed for robust characterisation and fair comparison of different actuation systems. Finally, performance results for PULSE115-60 are presented, demonstrating a torque bandwidth of 66.3 Hz at an amplitude of 6 N·m, ±0.11° of backlash, and 0.37 N·m of backdrive torque.
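Torque bandwidth is conventionally read off a frequency sweep as the point where the measured torque amplitude falls 3 dB below its low-frequency value. A sketch of that read-out on synthetic first-order data; the corner frequency is borrowed from the reported 66.3 Hz purely for illustration, not reconstructed from the paper's measurements.

```python
import math

def bandwidth_3db(freqs, mags):
    """First frequency where the magnitude drops 3 dB below its low-frequency
    value, found by linear interpolation between adjacent sweep points."""
    thresh = mags[0] / math.sqrt(2.0)   # -3 dB in linear scale
    for (f0, m0), (f1, m1) in zip(zip(freqs, mags), zip(freqs[1:], mags[1:])):
        if m0 >= thresh > m1:
            return f0 + (thresh - m0) * (f1 - f0) / (m1 - m0)
    return None

# Synthetic first-order magnitude response with a 66.3 Hz corner.
fc = 66.3
freqs = [float(f) for f in range(1, 201)]
mags = [1.0 / math.sqrt(1.0 + (f / fc) ** 2) for f in freqs]
print(round(bandwidth_3db(freqs, mags), 1))  # 66.3
```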
|
|
10:30-12:00, Paper TuAT2-CC.8 | Add to My Program |
Design and Evaluation of a Reconfigurable 7-DOF Upper Limb Rehabilitation Exoskeleton with Gravity Compensation |
|
Zheng, Linliang | Nanjing University of Aeronautics and Astronautics |
Wu, Qingcong | Nanjing University of Aeronautics and Astronautics |
Zhu, Yanghui | Nanjing University of Aeronautics and Astronautics |
Zhang, Qiang | Nanjing University of Aeronautics and Astronautics |
Keywords: Mechanism Design, Engineering for Robotic Systems, Kinematics
Abstract: With the aging of society, the number of stroke patients is increasing year by year. Rehabilitation exoskeletons can help patients carry out rehabilitation training and improve their activities of daily living (ADL). First, a reconfigurable exoskeleton for upper limb rehabilitation is designed in this paper. The exoskeleton combines gravity compensation with a left-right arm switching function through its reconfigurability. Second, the motion space and singular configurations of the exoskeleton are analyzed. By changing the working mode of the gravity compensation device, a motor control experiment is carried out, and the influence of the gravity compensation device on motor driving torque and energy consumption is analyzed. Finally, the experimental results show that, in the best case, the gravity compensation device can reduce the energy consumption of the driving element by 41.15% and the maximum motor current by 33.56%.
|
|
10:30-12:00, Paper TuAT2-CC.9 | Add to My Program |
Flexible Omnidirectional Driving Gear Mechanism with Adaptation Over Arbitrary Curvatures |
|
Selvamuthu, Moses Gladson | Yamagata University |
Abe, Kazuki | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Tadakuma, Riichiro | Yamagata University |
Keywords: Mechanism Design, Flexible Robotics, Haptics and Haptic Interfaces
Abstract: A support structure for flexible displays such as OLEDs or flexible LEDs was developed using the flexible omnidirectional driving gear mechanism, a gear mechanism having two degrees of freedom on one surface. This flexible display mechanism is expected to be placed inside a car dashboard as a human interface and for workspace optimization. In this study, we propose a novel flexible omnidirectional driving gear for supporting flexible displays, discussing its design, motion range, repeatability, positional accuracy, and adaptability to any guide surface through magnetic coupling. The experiments showed satisfactory results for positional accuracy and repeatability, with adaptability over a wide range of curvatures.
|
|
TuAT3-CC Oral Session, CC-313 |
Add to My Program |
Formal Methods in Robotics and Automation I |
|
|
Chair: Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Co-Chair: Vasile, Cristian Ioan | Lehigh University |
|
10:30-12:00, Paper TuAT3-CC.1 | Add to My Program |
Tactile Robot Programming: Transferring Task Constraints into Constraint-Based Unified Force-Impedance Control |
|
Karacan, Kübra | Technical University of Munich |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Sadeghian, Hamid | Technical University of Munich |
Wu, Fan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Formal Methods in Robotics and Automation, Force Control, Disassembly
Abstract: Flexible manufacturing lines are required to meet the demand for customized and small-batch-size products. Even though state-of-the-art tactile robots may provide the versatility for increased adaptability and flexibility, their potential is yet to be fully exploited. To support robot deployment in manufacturing, we propose a task-based tactile robot programming paradigm that uses an object-centric tactile skill definition to directly link the identified object constraints of a task to the definition of constraint-based unified force-impedance control. In this study, we first explain the basic concept of abstracting the task constraints experienced by the object and transferring them to the robot's operational space frame. Second, using the object-centric tactile skill definition, we synthesize unified force-impedance control with formalized holonomic constraints to enable flexible task execution. Third, we propose quantified analysis metrics for the process, analyzing typical flexible-manipulation disassembly skills, e.g., levering and unscrew-driving, with respect to their object requirements. Supported by a realistic experimental evaluation using a Franka Emika robot, our tactile robot programming approach for the direct translation between task-level constraints and robot control parameter design is shown to be a viable solution for increased robot deployment in flexible manufacturing lines.
|
|
10:30-12:00, Paper TuAT3-CC.2 | Add to My Program |
Online Modifications for Event-Based Signal Temporal Logic Specifications |
|
Gundana, David | Cornell University |
Kress-Gazit, Hadas | Cornell University |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification
Abstract: In this paper we present a grammar and control synthesis framework for online modification of Event-based Signal Temporal Logic (STL) specifications, during execution. These modifications allow a user to change the robots' task in response to potential future violations, changes to the environment, or user-defined task design changes. In cases where a modification is not possible, we provide feedback to the user and suggest alternative modifications. We demonstrate our task modification process using a Hello Robot Stretch satisfying an Event-based STL specification.
|
|
10:30-12:00, Paper TuAT3-CC.3 | Add to My Program |
Sampling-Based Reactive Synthesis for Nondeterministic Hybrid Systems |
|
Ho, Qi Heng | University of Colorado Boulder |
Sunberg, Zachary | University of Colorado |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification, Motion and Path Planning
Abstract: This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We model the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. The approach is based on growing a (search) game-tree in the hybrid space by combining a sampling-based planning method with a novel bandit-based technique to select and improve on partial strategies. We provide conditions under which the algorithm is probabilistically complete, i.e., if a winning strategy exists, the algorithm will almost surely find it. The case studies and benchmark results show that the algorithm is general and consistently outperforms the state of the art.
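The bandit-based selection the abstract mentions can be illustrated with plain UCB1, the generic exploration rule such tree searches typically build on. The three "partial strategies" and their hidden success probabilities below are toy stand-ins, not the paper's algorithm.

```python
import math
import random

def ucb1_select(stats, c=1.4):
    """Pick the arm maximizing mean + c*sqrt(ln(total)/n); unvisited arms first."""
    total = sum(n for n, _ in stats.values())
    best, best_score = None, -float("inf")
    for arm, (n, reward_sum) in stats.items():
        if n == 0:
            return arm                       # try every arm at least once
        score = reward_sum / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = arm, score
    return best

# Toy: three candidate partial strategies with hidden success probabilities.
random.seed(0)
p_true = {"s1": 0.2, "s2": 0.8, "s3": 0.5}
stats = {arm: (0, 0.0) for arm in p_true}
for _ in range(2000):
    arm = ucb1_select(stats)
    reward = 1.0 if random.random() < p_true[arm] else 0.0
    n, s = stats[arm]
    stats[arm] = (n + 1, s + reward)
print(max(stats, key=lambda a: stats[a][0]))  # "s2": the best strategy dominates play
```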
|
|
10:30-12:00, Paper TuAT3-CC.4 | Add to My Program |
Safety Verification of Closed-Loop Control System with Anytime Perception |
|
Gupta, Lipsy | Kansas State University |
Choton, Jahid Chowdhury | Kansas State University |
Prabhakar, Pavithra | Kansas State University |
Keywords: Formal Methods in Robotics and Automation, Hybrid Logical/Dynamical Planning and Verification, Robot Safety
Abstract: In this paper, we consider the problem of safety analysis of a closed-loop control system with anytime perception sensor. We formalize the framework and present a general procedure for safety analysis using reachable set computation. We instantiate the procedure for two concrete classes, namely, the classical discrete-time linear system with linear state feedback controller and an extension with variable update rates. We present an exact computational method based on polyhedral manipulations for the first class and an over-approximate method for the second class. Our experimental results demonstrate the feasibility of the approach.
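The exact polyhedral computation is more involved, but the core loop, propagating a set through x_{k+1} = A x_k + B u_k with bounded input and checking it against a safety bound, can be over-approximated with intervals in a few lines. The stable closed-loop matrix, disturbance bound, and safety threshold below are invented for illustration; this is a coarse stand-in for the paper's polyhedral method.

```python
def interval_reach_step(A, B, x_lo, x_hi, u_lo, u_hi):
    """One-step interval over-approximation of the set {A x + B u}."""
    n = len(A)
    lo, hi = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            a = A[i][j]
            lo[i] += min(a * x_lo[j], a * x_hi[j])
            hi[i] += max(a * x_lo[j], a * x_hi[j])
        for j in range(len(B[0])):
            b = B[i][j]
            lo[i] += min(b * u_lo[j], b * u_hi[j])
            hi[i] += max(b * u_lo[j], b * u_hi[j])
    return lo, hi

# Hypothetical stable closed-loop matrix and a small input disturbance.
A = [[0.9, 0.1], [-0.1, 0.9]]
B = [[0.0], [1.0]]
x_lo, x_hi = [-0.1, -0.1], [0.1, 0.1]
for k in range(20):
    x_lo, x_hi = interval_reach_step(A, B, x_lo, x_hi, [-0.01], [0.01])
unsafe = any(h > 1.0 for h in x_hi)  # check against a safety bound |x_i| <= 1
print(x_hi, unsafe)
```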
|
|
10:30-12:00, Paper TuAT3-CC.5 | Add to My Program |
Model Predictive Robustness of Signal Temporal Logic Predicates |
|
Lin, Yuanfei | Technical University of Munich |
Li, Haoxuan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Formal Methods in Robotics and Automation, Integrated Planning and Learning, Model Learning for Control
Abstract: The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach for the use case of autonomous driving with predicates used in formalized traffic rules on a recorded dataset, which highlights the advantage of our approach compared to traditional approaches in terms of expressiveness. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than human drivers in the dataset.
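For reference, the standard model-free robustness the paper improves upon evaluates predicates pointwise and propagates min/max through the temporal operators. A sketch for "always" and "eventually" over a discrete-time signal, with a made-up speed-limit rule standing in for a formalized traffic rule:

```python
def rob_always(signal, t, a, b, f):
    """rho(G_[a,b] (f(x) >= 0), t): worst predicate margin over the window."""
    return min(f(signal[tau]) for tau in range(t + a, t + b + 1))

def rob_eventually(signal, t, a, b, f):
    """rho(F_[a,b] (f(x) >= 0), t): best predicate margin over the window."""
    return max(f(signal[tau]) for tau in range(t + a, t + b + 1))

# Toy speed signal and the rule "always within [0,3], speed <= 2.0".
# The predicate margin is 2.0 - v: positive means satisfied, negative violated.
speed = [1.0, 1.5, 1.8, 1.2, 2.5]
print(round(rob_always(speed, 0, 0, 3, lambda v: 2.0 - v), 2))  # 0.2 (satisfied)
print(round(rob_always(speed, 0, 0, 4, lambda v: 2.0 - v), 2))  # -0.5 (violated at t=4)
```

The paper's point is that this definition ignores the system dynamics; its model predictive robustness replaces these raw margins with prediction-informed ones.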
|
|
10:30-12:00, Paper TuAT3-CC.6 | Add to My Program |
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning |
|
Jaquier, Noémie | Karlsruhe Institute of Technology |
Rozo, Leonel | Bosch Center for Artificial Intelligence |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Formal Methods in Robotics and Automation, Learning from Demonstration, Probability and Statistical Methods
Abstract: In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the "single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.
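The fallacy is easy to exhibit numerically on the sphere S^2: map two points into the tangent space of a third (the "single tangent space") and the Euclidean distance there no longer matches the true geodesic distance. A minimal sketch using the standard log map; the base point and sample points are arbitrary choices for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def geodesic(p, q):
    """Great-circle distance between unit vectors on S^2."""
    return math.acos(max(-1.0, min(1.0, dot(p, q))))

def log_map(p, q):
    """Riemannian log of q at base point p: a tangent vector in R^3."""
    theta = geodesic(p, q)
    if theta < 1e-12:
        return (0.0, 0.0, 0.0)
    u = [q[i] - math.cos(theta) * p[i] for i in range(3)]
    norm = math.sqrt(dot(u, u))
    return tuple(theta * ui / norm for ui in u)

base = (0.0, 0.0, 1.0)   # north pole as the single tangent base point
a = (1.0, 0.0, 0.0)      # two points on the equator, 90 degrees apart
b = (0.0, 1.0, 0.0)
va, vb = log_map(base, a), log_map(base, b)
flat = math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
print(round(geodesic(a, b), 4), round(flat, 4))  # 1.5708 2.2214: flattening distorts
```

Distances (and hence any learned model built on them) are only faithful near the base point, which is the crux of the single tangent space fallacy.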
|
|
10:30-12:00, Paper TuAT3-CC.7 | Add to My Program |
Optimal Control Synthesis with Relaxed Global Temporal Logic Specifications for Homogeneous Multi-Robot Teams |
|
Kamale, Disha | Lehigh University |
Vasile, Cristian Ioan | Lehigh University |
Keywords: Formal Methods in Robotics and Automation, Multi-Robot Systems
Abstract: In this work, we address the problem of control synthesis for a homogeneous team of robots given a global temporal logic specification and formal user preferences for relaxation in case of infeasibility. The relaxation preferences are represented as a Weighted Finite-state Edit System and are used to compute a relaxed specification automaton that captures all allowable relaxations of the mission specification and their costs. For synthesis, we introduce a Mixed Integer Linear Programming (MILP) formulation that combines the motion of the team of robots with the relaxed specification automaton. Our approach combines automata-based and MILP-based methods and leverages the strengths of both approaches, while avoiding their shortcomings. Specifically, the relaxed specification automaton explicitly accounts for the progress towards satisfaction, and the MILP-based optimization approach avoids the state-space explosion associated with explicit product-automata construction, thereby efficiently solving the problem. The case studies highlight the efficiency of the proposed approach.
|
|
10:30-12:00, Paper TuAT3-CC.8 | Add to My Program |
An Iterative Approach for Heterogeneous Multi-Agent Route Planning with Temporal Logic Goals and Travel Duration Uncertainty |
|
Liang, Kaier | Lehigh University |
Cardona, Gustavo A. | Lehigh University |
Vasile, Cristian Ioan | Lehigh University |
Keywords: Formal Methods in Robotics and Automation, Multi-Robot Systems
Abstract: This paper introduces an iterative approach to multi-agent route planning under chance constraints. A heterogeneous team of agents with various capabilities is tasked with a Capability Temporal Logic (CaTL) mission, a fragment of Signal Temporal Logic. The agents' motion is modeled as a finite weighted graph, where the weights represent travel durations. Given the probability distribution over the durations of each edge's traversal, we want to find paths for all agents such that (a) the specification robustness is maximized, (b) travel time is minimized, and (c) the success probability is maximized. We tackle the problem using an iterative approach. In each stage, it selects edges' traversal duration and success probabilities and then solves a multi-agent route planning problem. We use an efficient Mixed-Integer Linear Programming (MILP) encoding for the latter. Our method provides a framework for agents to make informed decisions in choosing the most suitable edge attributes (travel durations and success probabilities) that consider agents' capabilities to perform tasks in the environment. The proposed iterative method leverages graph structure to generate a more efficient search space. The effectiveness of our method is demonstrated through simulated case studies where obtaining the optimal solution would otherwise be computationally expensive. Our approach efficiently explores the solution space, generating better solutions and improving the performance of multi-agent route planning with uncertain travel durations.
|
|
10:30-12:00, Paper TuAT3-CC.9 | Add to My Program |
Safe Networked Robotics with Probabilistic Verification |
|
Narasimhan, Sai Shankar | The University of Texas at Austin |
Bhat, Sharachchandra | University of Texas at Austin |
Chinchali, Sandeep | The University of Texas at Austin |
Keywords: Formal Methods in Robotics and Automation, Networked Robots, Telerobotics and Teleoperation
Abstract: Autonomous robots must utilize rich sensory data to make safe control decisions. To process this data, compute-constrained robots often require assistance from remote computation, or the cloud, that runs compute-intensive deep neural network perception or control models. However, this assistance comes at the cost of a time delay due to network latency, resulting in past observations being used in the cloud to compute the control commands for the present robot state. Such communication delays could potentially lead to the violation of essential safety properties, such as collision avoidance. This paper develops methods to ensure the safety of robots operated over communication networks with stochastic latency. To do so, we use tools from formal verification to construct a shield, i.e., a run-time monitor, that provides a list of safe actions for any delayed sensory observation, given the expected and maximum network latency. Our shield is minimally intrusive and enables networked robots to satisfy key safety constraints, expressed as temporal logic specifications, with desired probability. We demonstrate our approach on a real F1/10th autonomous vehicle that navigates in indoor environments and transmits rich LiDAR sensory data over congested WiFi links.
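The shield's core bookkeeping, maintaining the set of states consistent with a delayed observation and permitting only actions safe from all of them, can be sketched on a toy 1-D corridor. The dynamics, obstacle position, and delay below are hypothetical; the paper's shield is synthesized by probabilistic verification over temporal logic specifications, not this brute-force enumeration.

```python
def possible_states(obs_state, actions_since, step):
    """States the robot could occupy now, given the last (delayed) observation
    and the actions already sent whose effects are not yet observed."""
    states = {obs_state}
    for a in actions_since:
        states = {step(s, a) for s in states}
    return states

def shield(states, candidate_actions, step, is_safe):
    """Allow an action only if it is safe from every possible current state."""
    return [a for a in candidate_actions
            if all(is_safe(step(s, a)) for s in states)]

# Toy 1-D corridor: positions 0..9, obstacle at 5, actions move -1/0/+1.
step = lambda s, a: max(0, min(9, s + a))
is_safe = lambda s: s != 5
states = possible_states(obs_state=3, actions_since=[1], step=step)  # now at 4
print(shield(states, [-1, 0, 1], step, is_safe))  # [-1, 0]: +1 would hit the obstacle
```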
|
|
TuAT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems I |
|
|
Chair: Sukhatme, Gaurav | University of Southern California |
Co-Chair: Kanezaki, Asako | Tokyo Institute of Technology |
|
10:30-12:00, Paper TuAT4-CC.1 | Add to My Program |
Phasic Diversity Optimization for Population-Based Reinforcement Learning |
|
Jiang, Jingcheng | Dalian University of Technology |
Piao, Haiyin | Northwestern Polytechnical University |
Fu, Yu | Dalian University of Technology |
Hao, Yihang | Yangzhou Collaborative Innovation Research Institute CO., LTD |
Jiang, Chuanlu | Dalian University of Technology |
Wei, Ziqi | Chinese Academy of Sciences |
Yang, Xin | Dalian University of Technology |
Keywords: Reinforcement Learning, Aerial Systems: Perception and Autonomy, Aerial Systems: Applications
Abstract: In prior work on diversity in reinforcement learning, diversity is often achieved through an augmented loss function that trades off reward against diversity. Typically, a multi-armed bandit (MAB) algorithm selects the trade-off coefficient from a predefined space. However, the dynamic distribution of the reward signal and the quality of the MAB's diversity selection limit the performance of these methods. We introduce the Phasic Diversity Optimization (PDO) algorithm, a population-based training framework that separates reward and diversity training into distinct phases instead of optimizing a multi-objective function. In the auxiliary phase, agents with poor performance that are diversified via determinants do not replace the better agents in the archive. Decoupling reward and diversity allows us to apply aggressive diversity optimization in the auxiliary phase without degrading performance. Furthermore, we build agents for an aerial melee scenario.
|
|
10:30-12:00, Paper TuAT4-CC.2 | Add to My Program |
VO-Safe Reinforcement Learning for Drone Navigation |
|
Lin, Feiqiang | Cardiff University |
Wei, Changyun | Hohai University |
Grech, Raphael | Spirent Communications |
Ji, Ze | Cardiff University |
Keywords: Reinforcement Learning, Aerial Systems: Perception and Autonomy, Vision-Based Navigation
Abstract: This work focuses on reinforcement learning (RL)-based navigation for drones whose localisation is based on visual odometry (VO). Such drones should avoid flying into areas with poor visual features, as this can lead to deteriorated localisation or complete loss of tracking. To achieve this, we propose a hierarchical control scheme, which uses an RL-trained policy as the high-level controller to generate waypoints for the next control step and a low-level controller to guide the drone to each subsequent waypoint. For high-level policy training, unlike other RL-based navigation approaches, we build awareness of VO performance into our policy by introducing a pose-estimation-related punishment. To help robots distinguish between perception-friendly areas and unfavoured zones, we provide semantic scenes as input for decision-making instead of raw images. This approach also helps minimise the sim-to-real gap.
|
|
10:30-12:00, Paper TuAT4-CC.3 | Add to My Program |
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models |
|
Zhao, Mandi | Stanford University |
Jain, Shreeya | Columbia University |
Song, Shuran | Columbia University |
Keywords: Multi-Robot Systems, Deep Learning Methods, Human-Robot Collaboration
Abstract: We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset that evaluates LLMs’ agent representation and reasoning capability. We experimentally demonstrate the effectiveness of our approach — it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility — in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. Project website: https://project-roco.github.io
|
|
10:30-12:00, Paper TuAT4-CC.4 | Add to My Program |
Collision Avoidance and Navigation for a Quadrotor Swarm Using End-To-End Deep Reinforcement Learning |
|
Huang, Zhehui | University of Southern California |
Yang, Zhaojing | University of Southern California |
Krupani, Rahul | University of Southern California |
Şenbaşlar, Baskın | NVIDIA |
Batra, Sumeet | USC |
Sukhatme, Gaurav | University of Southern California |
Keywords: Multi-Robot Systems, Reinforcement Learning, Collision Avoidance
Abstract: End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits -- easy deployment, task generalization and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, the addition of obstacles increases the number of possible interactions exponentially, thereby increasing the difficulty of training RL policies. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents a curriculum and a replay buffer of clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to neighbor robots and obstacle interactions - the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Ours is the first work to demonstrate the possibility of learning neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL that transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Website: https://sites.google.com/view/obst-avoid-swarm-rl.
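In its simplest single-head form, the attention mechanism over neighbors reduces to scaled dot-product attention. A dependency-free sketch with made-up two-dimensional neighbor embeddings; the actual policy network's feature sizes and learned projections are not given in the abstract.

```python
import math

def attention(query, keys, values):
    """Single-head scaled dot-product attention in plain Python."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# Toy: the ego-drone's query attends over two neighbor embeddings.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]       # neighbor 1 is aligned with the query
values = [[10.0, 0.0], [0.0, 10.0]]
weights, out = attention(query, keys, values)
print([round(w, 3) for w in weights])  # neighbor 1 receives the larger weight
```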
|
|
10:30-12:00, Paper TuAT4-CC.5 | Add to My Program |
C3F: Constant Collaboration and Communication Framework for Graph-Representation Dynamic Multi-Robotic Systems |
|
Jia, Hongda | National University of Defense Technology |
Gao, Zijian | National University of Defense Technology |
Yang, Cheng | National University of Defense Technology |
Ding, Bo | National University of Defense Technology |
Zhai, Yuanzhao | National University of Defense Technology |
Wang, Huaimin | National University of Defense Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Distributed Robot Systems
Abstract: Deep reinforcement learning (DRL) methods have been widely applied in distributed multi-robotic systems and have successfully realized autonomous learning in many fields. In these fields, robots need to communicate and collaborate with other robots in real time and reach agreed cognition for task assignment, which places high requirements on efficiency and stability. However, robots may get damaged or even crash in complex environments and have to be dynamically substituted. Most existing DRL approaches are not robust enough to make new robots adapt quickly to the current team policies, causing performance degradation. In this work, inspired by the genetic mechanism behind social animals' instincts, we propose a robust multi-robotic collaboration and communication framework, C3F. It introduces a graph-based representation to discover more features of the relevance among robots and leverages a meta-learning mechanism to derive a general meta policy. When some robots crash and are replaced by new ones, this meta policy is reused to guide the new robots in quickly following the existing collaboration and communication rules and adapting to their roles in the team. Experiments on both the Webots simulator and the StarCraft II platform indicate that our method outperforms several state-of-the-art methods, showing strong robustness and remarkable adaptability to dynamic substitution in multi-robotic systems.
|
|
10:30-12:00, Paper TuAT4-CC.6 | Add to My Program |
Multi-Level Action Tree Rollout (MLAT-R): Efficient and Accurate Online Multiagent Policy Improvement |
|
Henshall, Andrea | MIT |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Multi-Robot Systems, Reinforcement Learning, Optimization and Optimal Control
Abstract: Rollout algorithms are renowned for their ability to correct the suboptimality of offline-trained base policies. In the multiagent setting, performing online rollout can require a number of optimizations that is exponential in the number of agents. One-agent-at-a-time algorithms offer computationally efficient approaches with guaranteed policy improvement; however, this improvement is with respect to a state value estimate derived from a potentially poor base policy. Monte Carlo tree search (MCTS) provably converges to the true state values; however, the exponentially large search space often limits its online use. Here, we present the Multi-Level Action Tree Rollout (MLAT-R) algorithm. MLAT-R provides 1) provable improvement over a base policy, 2) policy improvement with respect to the true state value, 3) applicability to any number of agents, and 4) an action space that grows linearly with the number of agents rather than exponentially. In this paper, we outline the algorithm, sketch a proof of its improvement over a base policy, and evaluate its performance on a challenging problem for which the base policy cannot reach a terminal state. Despite the challenging experimental setup, our algorithm reached a terminal state in 86% of all experiments, compared to 31% for state-of-the-art one-agent-at-a-time algorithms. In experiments involving MCTS, MLAT-R reached a terminal state in 99% of experiments compared to 92% for MCTS. MLAT-R achieved these results while considering an exponentially smaller action space than MCTS.
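The contrast between exhaustive joint search and one-agent-at-a-time improvement can be illustrated on a toy joint value function. The rewards and penalty below are hypothetical; this sketches the classical one-agent-at-a-time rollout the abstract builds on, not MLAT-R itself, which adds a multi-level action tree on top of this idea.

```python
import itertools

def joint_value(actions, Q):
    """Toy joint value: per-agent reward minus a collision-style penalty
    when two agents pick the same action (hypothetical numbers)."""
    return (sum(Q[i][a] for i, a in enumerate(actions))
            - 2.0 * (len(actions) - len(set(actions))))

def one_agent_at_a_time(Q, base):
    """Agent i re-optimizes its own action while earlier agents keep their
    new choices and later agents follow the base policy: |A|*n evaluations
    instead of the |A|**n needed for exhaustive joint search."""
    n_actions = len(Q[0])
    chosen = list(base)
    for i in range(len(Q)):
        chosen[i] = max(range(n_actions),
                        key=lambda a: joint_value(chosen[:i] + [a] + chosen[i + 1:], Q))
    return chosen

Q = [[1.0, 0.9, 0.0],   # agent 0's per-action reward
     [1.0, 0.8, 0.0],   # agent 1
     [0.9, 1.0, 0.1]]   # agent 2
base = [max(range(3), key=lambda a: q[a]) for q in Q]  # independent greedy base policy
seq = one_agent_at_a_time(Q, base)
```

Each sequential step can only keep or raise the current joint value, which is the source of the improvement guarantee over the base policy.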
|
|
10:30-12:00, Paper TuAT4-CC.7 | Add to My Program |
Stimulate the Potential of Robots Via Competition |
|
Huang, Kangyao | Tsinghua University |
Guo, Di | Beijing University of Posts and Telecommunications |
Zhang, Xinyu | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Liu, Huaping | Tsinghua University |
Keywords: Multi-Robot Systems, Reinforcement Learning, Transfer Learning
Abstract: It is common to feel pressure in a competitive environment, arising from the desire to succeed relative to other individuals or opponents. Although we may become anxious under this pressure, it can also drive us to push our potential to the limit in order to keep up with others. Inspired by this, we propose a competitive learning framework that helps an individual robot acquire knowledge from the competition, fully stimulating its dynamic potential in the race. Specifically, the competition information among competitors is introduced as an additional auxiliary signal for learning advantageous actions. We further build a Multiagent-Race environment, and extensive experiments demonstrate that robots trained in competitive environments outperform those trained with state-of-the-art algorithms in a single-robot environment.
|
|
10:30-12:00, Paper TuAT4-CC.8 | Add to My Program |
Multi-Agent Visual Coordination Using Optical Wireless Communication |
|
Nakagawa, Haruyuki | Tokyo Institute of Technology |
Kanezaki, Asako | Tokyo Institute of Technology |
Keywords: Multi-Robot Systems, Vision-Based Navigation, Reinforcement Learning
Abstract: Communication is a key element in applying multi-agent reinforcement learning to a wide range of real-world scenarios. We focus on optical wireless communication (OWC), a practical solution for situations where radio communication is unavailable, such as underwater or in environments with heavy radio noise. OWC uses light to communicate only with other agents in visual range, unlike the radio-style communication mostly assumed in existing research on multi-agent reinforcement learning. Due to this limited communication, overall performance with OWC generally degrades relative to full communication. In this paper, we propose a reinforcement learning method that learns visual coordination behavior using OWC. The proposed visually cooperative behavior enables agents equipped with limited field-of-view (FOV) cameras to efficiently comprehend and imagine their surrounding environment through cooperative communication. Simulation results demonstrate that, with the proposed visual coordination method, agents using OWC with a typical FOV perform comparably to those with radio-style full communication. Additionally, the method improves performance across various multi-agent reinforcement learning algorithms. We also implement OWC devices on real mobile robots and demonstrate the proposed multi-agent operation.
|
|
TuAT5-CC Oral Session, CC-411 |
Add to My Program |
Sensors and Audition |
|
|
Chair: Tahara, Kenji | Kyushu University |
Co-Chair: Chaumette, Francois | Inria Center at University of Rennes |
|
10:30-12:00, Paper TuAT5-CC.1 | Add to My Program |
Multi-Modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer |
|
Xin, Shuo | Zhejiang University |
Zhang, Zhen | Zhejiang University |
Wang, Mengmeng | Zhejiang University |
Hou, Xiaojun | Zhejiang University |
Guo, Yaowei | Zhejiang University |
Kang, Xiao | China North Vehicle Research Institute |
Liu, Liang | Zhejiang University |
Liu, Yong | Zhejiang University |
Keywords: Sensor Fusion, Human Detection and Tracking, Legged Robots
Abstract: Tracking a specific person in a 3D scene is gaining momentum due to its numerous applications in robotics. Currently, most 3D trackers focus on driving scenarios with negligible jitter and uncomplicated surroundings, which results in severe degeneration in complex environments, especially on jolting robot platforms (only a 20-60% success rate). To improve accuracy, a Point-Video-based Transformer Tracking model (PVTrack) is presented for robots. It is the first multi-modal 3D human tracking work that incorporates point clouds together with RGB videos to achieve information complementarity. Moreover, PVTrack proposes the Siamese Point-Video Transformer for feature aggregation to overcome dynamic environments, capturing more target-aware information adaptively through a hierarchical attention mechanism. Considering the violent shaking on robots and rugged terrains, a lateral Human-aware Proposal Network is designed together with an Anti-shake Proposal Compensation module. It alleviates the disturbance caused by complex scenes as well as the particularity of the robot platform. Experiments show that our method achieves state-of-the-art performance on both the KITTI/Waymo datasets and a quadruped robot in various indoor and outdoor scenes.
|
|
10:30-12:00, Paper TuAT5-CC.2 | Add to My Program |
Efficient Gesture Recognition on Spiking Convolutional Networks through Sensor Fusion of Event-Based and Depth Data |
|
Steffen, Lea | FZI Research Center for Information Technology, 76131 Karlsruhe, |
Trapp, Thomas | FZI Research Center for Information Technology |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Sensor Fusion, Neurorobotics, Multi-Modal Perception for HRI
Abstract: As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are often impractical or inconvenient. Gesture recognition is an alternative, but it is often not reactive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network that processes event and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluated on an embedded system. Three open-source data sets are used for evaluation. Since these do not provide the required bi-modality, a new data set with synchronized event and depth data was recorded. The results show that temporal encoding of depth information and modality fusion, even of differently encoded data, are viable and benefit network performance and generalization.
|
|
10:30-12:00, Paper TuAT5-CC.3 | Add to My Program |
Smoothly Connected Preemptive Impact Reduction and Contact Impedance Control |
|
Arita, Hikaru | Kyushu University |
Nakamura, Hayato | Kyushu University |
Fujiki, Takuto | Kyushu University |
Tahara, Kenji | Kyushu University |
Keywords: Sensor-based Control, Optical Proximity Sensor, Reactive and Sensor-Based Planning, Physical Human-Robot Interaction
Abstract: This study proposes novel control methods that lower impact force by preemptive movement and smoothly transition to conventional contact-based impedance control. These techniques are suggested for application in force control-based robots and position/velocity control-based robots. Strong impact forces have a negative influence on multiple robotic tasks. Recently, preemptive impact reduction techniques that expand conventional contact impedance control using proximity sensors have been examined. However, a seamless transition from impact reduction to contact impedance control has yet to be demonstrated. In contrast, our proposed methods solve this problem. The preemptive impact reduction feature can be added to an already-implemented impedance controller because the parameter design is divided into impact reduction and contact impedance control. There is no abrupt alteration in the contact force during the transition. Furthermore, although the preemptive impact reduction uses a crude optical proximity sensor, the influence of reflectance is minimized. Analyses and real-world experiments confirm these features, which are useful for many robots performing contact tasks.
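The smooth handover can be illustrated in one dimension: a preemptive decelerating term ramps in over the proximity-sensing range so that the commanded force is continuous at the instant contact impedance takes over. The gains, activation distance, and linear ramp below are illustrative choices, not the paper's controller.

```python
def commanded_force(x, v, x_surf, K=200.0, D=30.0, d_act=0.05):
    """1-D sketch of a smooth preemptive-to-contact transition.
    x, v: end-effector position and velocity; x_surf: surface position;
    K, D: contact stiffness and damping; d_act: proximity activation range."""
    d = x_surf - x                      # gap reported by the proximity sensor
    if d > d_act:                       # far away: no interaction force
        return 0.0
    if d > 0.0:                         # preemptive phase: ramp in damping
        ramp = (d_act - d) / d_act      # 0 -> 1 as the gap closes
        return -ramp * D * max(v, 0.0)  # decelerate the approach only
    # contact phase: full impedance on penetration depth
    return -K * (-d) - D * v
```

At the contact boundary (gap zero, no penetration) both branches reduce to the same damping force, so there is no step in the commanded force during the transition.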
|
|
10:30-12:00, Paper TuAT5-CC.4 | Add to My Program |
Point Cloud-Based Control Barrier Function Regression for Safe and Efficient Vision-Based Control |
|
de Sa, Massimiliano | University of California, Berkeley |
Kotaru, Venkata Naga Prasanth | University of California Berkeley |
Sreenath, Koushil | University of California, Berkeley |
Keywords: Sensor-based Control, Robot Safety, Aerial Systems: Perception and Autonomy
Abstract: Control barrier functions have become an increasingly popular framework for safe real-time control. In this work, we present a computationally low-cost framework for synthesizing barrier functions over point cloud data for safe vision-based control. We take advantage of surface geometry to locally define and synthesize a quadratic CBF over a point cloud. This CBF is used in a CBF-QP for control and verified in simulation on quadrotors and in hardware on quadrotors and the TurtleBot3. This technique enables safe navigation through unstructured and dynamically changing environments and is shown to be significantly more efficient than current methods.
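With a single affine constraint, the CBF-QP admits a closed-form solution, which is part of what keeps such filters cheap enough for real-time use. Below is a minimal single-integrator sketch of the generic CBF-QP idea; the paper's barrier is synthesized from point-cloud geometry and is not reproduced here.

```python
import numpy as np

def cbf_safety_filter(u_nom, grad_h, h, alpha=1.0):
    """Single-constraint CBF-QP for a single integrator (u = velocity):
        min ||u - u_nom||^2   s.t.   grad_h . u >= -alpha * h.
    The solution is a Euclidean projection onto the safe half-space."""
    a = np.asarray(grad_h, dtype=float)
    b = -alpha * h
    slack = a @ u_nom - b
    if slack >= 0.0:                    # nominal input already satisfies the CBF
        return np.asarray(u_nom, dtype=float)
    # otherwise project onto the constraint boundary a . u = b
    return u_nom - (slack / (a @ a)) * a
```

The filter leaves safe nominal commands untouched and minimally modifies unsafe ones, which is the defining behavior of CBF-QP safety filters.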
|
|
10:30-12:00, Paper TuAT5-CC.5 | Add to My Program |
Stability Analysis of Plane-To-Plane Positioning by Proximity-Based Control |
|
Thomas, John | Inria Rennes |
Chaumette, Francois | Inria Center at University of Rennes |
Keywords: Sensor-based Control, Robust/Adaptive Control
Abstract: In this paper, we discuss the stability analysis of Plane-to-Plane positioning task when the task is designed in proximity sensor space. We utilize a multi-sensor arrangement of proximity sensors that forms a proximity array to obtain the necessary information in sensor space. For the task considered, we provide closed-form equations for the closed-loop system by obtaining the analytical form of pseudo-inverse for the interaction matrix involved. This further enables us to suggest a new control law producing a decoupled exponential decrease of the sensor errors in perfect conditions, while being more robust to estimation errors in the surface normal. By applying Gershgorin’s theorem to the closed-form matrices, we are able to provide conditions for stability with respect to errors in extrinsic parameters and surface normal. Simulation results are provided to discuss the robustness of the task with respect to these modeling parameters.
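Gershgorin's theorem yields such conditions because every eigenvalue of a matrix lies in a disc centered at a diagonal entry with radius equal to the row's off-diagonal absolute sum. A minimal sufficient-condition check for a real matrix (generic, not the paper's specific closed-loop matrices):

```python
import numpy as np

def gershgorin_stable(A):
    """Sufficient (not necessary) stability test: if every Gershgorin disc
    lies strictly in the open left half-plane, so do all eigenvalues."""
    A = np.asarray(A, dtype=float)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return bool(np.all(centers + radii < 0.0))
```

Because the discs depend only on matrix entries, the condition can be evaluated symbolically on closed-form matrices, which is what makes it useful for robustness bounds on modeling parameters.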
|
|
10:30-12:00, Paper TuAT5-CC.6 | Add to My Program |
An Image Acquisition Scheme for Visual Odometry Based on Image Bracketing and Online Attribute Control |
|
Zhang, Shuyang | The Hong Kong University of Science and Technology |
He, Jinhao | The Hong Kong University of Science and Technology (Guangzhou) |
Xue, Bohuan | HKUST |
Jin, Wu | UESTC |
Yin, Pengyu | Nanyang Technological University |
Jiao, Jianhao | University College London |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Sensor-based Control, Vision-Based Navigation, SLAM
Abstract: Visual odometry systems are challenged by complex illumination environments. Image quality and its consistency over time directly determine feature detection and tracking performance, which in turn affect the robustness and accuracy of the entire system. In this paper, an image acquisition scheme with image bracketing patterns is proposed. Images with different exposure levels are continuously captured to sufficiently explore the scene under varying illumination. An attribute control method is designed to adjust image exposures within the brackets online. Gaussian process regression fits the relationship between an image quality metric and exposure via an image synthesis technique, and the optimal exposures for the next bracket are obtained directly, without trial exposures, to ensure a quick response. Experiments show our acquisition system's effectiveness and its performance improvement for VO tasks in complex illumination scenes.
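The quality-vs-exposure fit can be sketched with a plain RBF-kernel GP posterior mean, maximized over candidate exposures. The quality values here are synthetic and the kernel hyperparameters are arbitrary; the paper's specific metric and image-synthesis step are not reproduced.

```python
import numpy as np

def gp_pick_exposure(t_tried, q_tried, t_candidates, ell=0.5, noise=1e-4):
    """Fit a 1-D GP (RBF kernel) of quality over log-exposure and return
    the candidate exposure with the highest posterior-mean quality."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(t_tried, t_tried) + noise * np.eye(len(t_tried))
    alpha = np.linalg.solve(K, q_tried)          # GP weights
    mu = k(t_candidates, t_tried) @ alpha        # posterior mean at candidates
    return t_candidates[int(np.argmax(mu))]
```

Because the maximization is over the fitted mean rather than over actual captures, the next bracket's exposures come out in a single step, which matches the abstract's point about avoiding trial exposures.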
|
|
10:30-12:00, Paper TuAT5-CC.7 | Add to My Program |
MagicTip: A Novel High-Resolution 3D Multi-Layer Grid-Based Tactile Sensor |
|
Fan, Wen | University of Bristol |
Li, Haoran | University of Bristol |
Zhang, Dandan | Imperial College London |
Keywords: Additive Manufacturing
Abstract: Accurate robotic control over interactions with the environment is fundamentally grounded in understanding tactile contacts. In this paper, we introduce MagicTac, a novel high-resolution grid-based tactile sensor. This sensor employs a 3D multi-layer grid-based design, inspired by the Magic Cube structure. This structure can help increase the spatial resolution of MagicTac to perceive external interaction contacts. Moreover, the sensor is produced using the multi-material additive manufacturing technique, which simplifies the manufacturing process while ensuring repeatability of production. Compared to traditional vision-based tactile sensors, it offers the advantages of i) high spatial resolution, ii) significant affordability, and iii) fabrication-friendly construction that requires minimal assembly skills. We evaluated the proposed MagicTac in the tactile reconstruction task using the deformation field and optical flow. Results indicated that MagicTac could capture fine textures and is sensitive to dynamic contact information. Through the grid-based multi-material additive manufacturing technique, the affordability and productivity of MagicTac can be enhanced with a minimum manufacturing cost of £4.76 and a minimum manufacturing time of 24.6 minutes.
|
|
10:30-12:00, Paper TuAT5-CC.8 | Add to My Program |
Microphone Pair Training for Robust Sound Source Localization with Diverse Array Configurations |
|
An, Inkyu | ETRI |
An, Guoyuan | KAIST |
Kim, Taeyoung | KAIST |
Yoon, Sung-eui | KAIST |
Keywords: Robot Audition, Localization
Abstract: We present a novel sound source localization method that leverages microphone pair training, designed to deliver robust performance in various real-world environments. Existing deep learning (DL)-based approaches face scalability issues when dealing with various types of microphone arrays. To address these issues, our approach has been structured into two training steps: the first step focuses on microphone pair training, while the second step is designed for array geometry-aware training. The first training step enables our model to learn from multiple datasets covering various real-world situations, allowing it to robustly estimate the time difference of arrival (TDoA). Our robust-TDoA model incorporates a Mel scale learnable filter bank (MLFB) and a hierarchical frequency-to-time attention network (HiFTA-net). This allows it to effectively learn from various situations in multiple datasets, including those involving simultaneous sources and various sound events. The second training step enables our approach to estimate the direction of arrival (DoA) of sound based on TDoA information computed by our robust-TDoA model, which begins with parameters acquired during the first training step. During this process, our approach can be trained to accommodate geometry information of the target microphone array, which can span diverse array types. As a result, our method demonstrates robust performance across two DoA estimation tasks using three different types of arrays.
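The TDoA quantity targeted by the first training stage is classically estimated with GCC-PHAT on a microphone pair; the compact DSP baseline below shows the quantity being learned (the paper's network replaces this estimator and is not shown).

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """GCC-PHAT time difference of arrival between two microphone signals.
    Returns the delay (seconds) of `sig` relative to `ref`."""
    n = 2 * max(len(sig), len(ref))               # zero-pad to avoid wrap-around
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.maximum(np.abs(S), 1e-12)             # PHAT weighting: keep phase only
    cc = np.fft.irfft(S, n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center the zero lag
    lag = np.argmax(np.abs(cc)) - n // 2
    return lag / fs
```

Pairwise TDoAs from several microphone pairs are what a geometry-aware stage can then combine into a direction-of-arrival estimate for a given array layout.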
|
|
10:30-12:00, Paper TuAT5-CC.9 | Add to My Program |
Mobile Bot Rotation Using Sound Source Localization and Distant Speech Recognition |
|
Sontakke, Swapnil | Indian Institute of Information Technology Dharwad |
Hegde, Pradyoth | Indian Institute of Information Technology Dharwad |
Bannulmath, Prashant | Indian Institute of Information Technology Dharwad |
K T, Deepak | Indian Institute of Information Technology Dharwad |
Keywords: Robot Audition, Software-Hardware Integration for Robot Systems, Physical Human-Robot Interaction
Abstract: In the last few years, mobile robots such as floor cleaners, assistive robots, and home telepresence devices have become an essential part of our day-to-day activities. In human-computer interaction, speech is the preferred way of communication, especially in indoor environments. This paper proposes a speech module to rotate a mobile robot. It has two components: a distant automatic speech recognizer and a sound source localizer. To build the distant speech recognizer, far-field speech data is collected at 1, 3, and 5-meter distances. The model performs well even at a 5-meter distance, with a Word Error Rate of 40.38% and a Character Error Rate of 28.85%. The direction of arrival of the speech signal is computed from a 4-mic circular microphone array. The speech module is integrated with the Robot Operating System and physically demonstrated on a TurtleBot3 Waffle Pi. The speech recognizer and sound source localizer are observed to work well in a reverberant indoor environment on a small single-board computer.
|
|
TuAT6-CC Oral Session, CC-414 |
Add to My Program |
2D/3D Visual Perception |
|
|
Chair: Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Co-Chair: Pathak, Sarthak | Chuo University |
|
10:30-12:00, Paper TuAT6-CC.1 | Add to My Program |
Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes |
|
Das, Alloy | Indian Statistical Institute, Kolkata |
Biswas, Sanket | Computer Vision Center, Universitat Autònoma De Barcelona |
Pal, Umapada | Indian Statistical Institute, Kolkata |
Lladós, Josep | Computer Vision Center, Universitat Autònoma De Barcelona |
Keywords: Recognition, Computer Vision for Automation, Transfer Learning
Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present to the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we design an efficient super-resolution-based end-to-end transformer baseline called DA-TextSpotter, which achieves comparable or superior performance over existing text spotting architectures on both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code, and pre-trained models have been released on our GitHub.
|
|
10:30-12:00, Paper TuAT6-CC.2 | Add to My Program |
Masked Local-Global Representation Learning for 3D Point Cloud Domain Adaptation |
|
Xing, Bowei | Peking University |
Ying, Xianghua | Peking University |
Wang, Ruibin | Peking University |
Keywords: Recognition, Deep Learning for Visual Perception, Visual Learning
Abstract: The point cloud is a popular and widely used geometric representation that has attracted significant attention in 3D vision. However, the geometric variability of point cloud representations across different datasets can cause domain discrepancies, which hinder knowledge transfer and model generalization, resulting in degraded performance in the target domain. In this paper, we present a novel approach to improve point cloud domain adaptation by employing masked representation learning in a self-supervised manner. Specifically, our method combines masked feature prediction and masked sample consistency to encode both local structure and global semantic information for learning invariant point cloud representations across domains. Moreover, to learn domain-specific representations and transfer knowledge from source to target, we propose prototype-calibrated self-training. By exploiting class-wise prototypes in the shared feature space, soft pseudo labels can be adaptively denoised, which benefits decision boundary learning in the target domain. We conduct experiments on PointDA-10 and PointSegDA for 3D point cloud shape classification and semantic segmentation, respectively. The results demonstrate the effectiveness of our method and show that it achieves new state-of-the-art performance on point cloud domain adaptation.
|
|
10:30-12:00, Paper TuAT6-CC.3 | Add to My Program |
Continuous Adaptation in Person Re-Identification for Robotic Assistance |
|
Rollo, Federico | Leonardo S.p.A |
Zunino, Andrea | Leonardo |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Mingo Hoffman, Enrico | INRIA Nancy - Grand Est |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Recognition, Human Detection and Tracking, AI-Enabled Robotics
Abstract: In scenarios of Human-Robot Interaction (HRI), it is often assumed that the robot should cooperate with the closest individual or that only one person is present. However, in real-life situations, such as shop floor operations, this assumption may not hold. It thus becomes necessary for a robot to recognize a specific target in a crowded environment. To address this problem, we propose a person re-identification module that uses continuous visual adaptation techniques. This module ensures that the robot can seamlessly cooperate with the appropriate individual despite appearance changes or partial or total occlusions. We tested our framework in both a laboratory environment and an HRI scenario where the robot followed a person. During the tests, the targets were asked to change their appearance and to disappear from the camera's field of view, probing the module's ability to handle challenging cases of occlusion and outfit variation. We compared our framework with a state-of-the-art Multi-Object Tracking (MOT) method; the results showed that our module, named CARPE-ID for short, accurately tracked each selected target throughout the experiments in all but two cases, whereas the MOT averaged 4 tracking errors per video.
|
|
10:30-12:00, Paper TuAT6-CC.4 | Add to My Program |
Spectral Geometric Verification: Re-Ranking Point Cloud Retrieval for Metric Localization |
|
Vidanapathirana, Kavisha | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Sridharan, Sridha | Queensland University of Technology |
Fookes, Clinton | Queensland University of Technology |
Keywords: Recognition, Localization, Computer Vision for Automation
Abstract: In large-scale metric localization, an incorrect result during retrieval will lead to an incorrect pose estimate or loop closure. Re-ranking methods propose to take into account all the top retrieval candidates and re-order them to increase the likelihood of the top candidate being correct. However, state-of-the-art re-ranking methods are inefficient when re-ranking many potential candidates due to their need for resource intensive point cloud registration between the query and each candidate. In this work, we propose an efficient spectral method for geometric verification (named SpectralGV) that does not require registration. We demonstrate how the optimal inter-cluster score of the correspondence compatibility graph of two point clouds represents a robust fitness score measuring their spatial consistency. This score takes into account the subtle geometric differences between structurally similar point clouds and therefore can be used to identify the correct candidate among potential matches retrieved by global similarity search. SpectralGV is deterministic, robust to outlier correspondences, and can be computed in parallel for all potential candidates. We conduct extensive experiments on 5 large-scale datasets to demonstrate that SpectralGV outperforms other state-of-the-art re-ranking methods and show that it consistently improves the recall and pose estimation of 3 state-of-the-art metric localization architectures while having a negligible effect on their runtime.
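The registration-free consistency idea can be sketched with the classic spectral-matching score: the leading eigenvalue of the correspondence-compatibility matrix, which is large only when putative correspondences agree on pairwise distances. SpectralGV's inter-cluster score refines this; the code below is only the generic spectral baseline.

```python
import numpy as np

def spectral_fitness(corr_a, corr_b, sigma=0.1):
    """Spatial-consistency score for N putative correspondences between two
    point clouds (corr_a[i] <-> corr_b[i]), with no registration step.
    Returns the leading eigenvalue of the compatibility matrix, normalized."""
    da = np.linalg.norm(corr_a[:, None] - corr_a[None, :], axis=-1)
    db = np.linalg.norm(corr_b[:, None] - corr_b[None, :], axis=-1)
    M = np.exp(-((da - db) ** 2) / (2 * sigma ** 2))  # pairwise compatibility
    np.fill_diagonal(M, 0.0)
    w = np.linalg.eigvalsh(M)
    return w[-1] / len(corr_a)
```

Because the score is a single eigenvalue computation per candidate, it can be evaluated in parallel across all retrieval candidates, unlike pairwise registration.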
|
|
10:30-12:00, Paper TuAT6-CC.5 | Add to My Program |
Incorporating Scene Graphs into Pre-Trained Vision-Language Models for Multimodal Open-Vocabulary Action Recognition |
|
Wei, Chao | Tsinghua University |
Deng, Zhidong | Tsinghua University |
Keywords: Recognition, Semantic Scene Understanding, Multi-Modal Perception for HRI
Abstract: This paper presents Action-SGFA, a novel action feature alignment approach to learn unified joint embeddings across four action modalities incorporating scene graph (SG) comprehension. A new training paradigm for Action-SGFA is also devised to improve pre-trained VL models using datasets with SG annotation. When learning from image-SG pairs, it captures structure-associated action knowledge for visual and textual encoders. SG supervision generates fine-grained captions based on various graph augmentations highlighting different compositional aspects of action scenes. Furthermore, our research reveals that all combinations of paired data are unnecessary to train such unified embeddings, and only image-paired data is sufficient to bind all action modalities together. Our Action-SGFA can leverage existing large VL models, enhancing their zero-shot capabilities of new modalities due to their natural pairings with images. The open-vocabulary zero-shot performance improves with the strength of the pre-trained VL model and the SG comprehension. We establish a new state-of-the-art in several zero-shot action recognition tasks across modalities, significantly surpassing the vanilla skeleton zero-shot method by 27.0% and 19.7% on NTU-60 and NTU-120, respectively. Additionally, in the context of RGB videos, we surpass the state-of-the-art method on Kinetics-400 by 2.1%.
|
|
10:30-12:00, Paper TuAT6-CC.6 | Add to My Program |
LPS-Net: Lightweight Parameter-Shared Network for Point Cloud-Based Place Recognition |
|
Liu, Chengxin | Shandong University |
Chen, Guiyou | ShanDong University |
Song, Ran | Shandong University |
Keywords: Recognition, Vision-Based Navigation, Computer Vision for Transportation
Abstract: With innovation in fields such as autonomous driving and augmented reality, point cloud-based place recognition has gained significant attention. Many methods try to address this problem by extracting and matching global descriptors in a database, but they often must balance the extraction of comprehensive contextual information and large model sizes. To overcome this challenge, we propose a lightweight parameter-shared network (LPS-Net), which includes multiple bidirectional perception units (BPUs) to extract multi-scale long-range contextual information and parameter-shared NetVLADs (PS-VLADs) to aggregate descriptors. A BPU includes a parameter-shared convolution module (SharedConv) that significantly compresses the model and enhances its ability to capture informative features. In PS-VLADs, we replace half the parameters used in the original NetVLAD with trainable scalars, which further reduces the model size, and theoretically prove their equivalence. Experimental results demonstrate that LPS-Net achieves state-of-the-art performance at the task of point cloud-based place recognition while maintaining a small model size. Code and supplementary materials can be found at https://github.com/Yavinr/LPS-Net.
|
|
10:30-12:00, Paper TuAT6-CC.7 | Add to My Program |
Joint Response and Background Learning for UAV Visual Tracking |
|
Wang, Biao | Beihang University |
Li, Wenling | Beihang University |
Zhang, Bin | Beijing University of Posts and Telecommunications |
Liu, Yang | Beihang University |
Keywords: Visual Tracking, Computer Vision for Automation, Aerial Systems: Perception and Autonomy
Abstract: Correlation filter (CF)-based approaches have gained widespread attention in the field of unmanned aerial vehicle (UAV) visual tracking due to their lightweight nature. However, CFs are prone to generating low-quality responses in challenging UAV scenarios, e.g., fast motion and background clutter. In this paper, to model the tracker more robustly, we first conduct an effective regularization analysis from the perspectives of response learning and background learning. Specifically, to address response degradation, we propose a module for learning the temporal consistency and reversibility of the response, supplemented by a novel background-aware module that enhances the ability to learn from negative samples. In addition, we propose a fast coarse-to-fine scale search strategy, which alleviates the challenge of estimating bounding boxes under non-uniform aspect ratios. We develop two tracker versions, RBLT and DeepRBLT, based on the depth of the features. Comprehensive experiments on four UAV benchmarks and one generic benchmark indicate the superiority of our trackers over other state-of-the-art trackers, with enough speed for real-time applications.
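The correlation-filter core that such trackers build on admits a closed-form Fourier-domain solution (MOSSE-style). This single-channel sketch shows why CF trackers are lightweight; the paper's response- and background-learning regularizers are not included.

```python
import numpy as np

def mosse_filter(patches, target, lam=1e-2):
    """Closed-form filter minimizing sum_i |F_i H - G|^2 + lam |H|^2
    elementwise in the Fourier domain (lam guards near-empty spectrum bins)."""
    G = np.fft.fft2(target)
    A = np.zeros_like(G)
    B = np.zeros(G.shape)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += (F * np.conj(F)).real
    return A / (B + lam)

def respond(H, patch):
    """Correlation response map; its peak gives the target translation."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
```

Training and inference are both a handful of FFTs per frame, which is the source of the real-time speeds cited for CF trackers.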
|
|
10:30-12:00, Paper TuAT6-CC.8 |
ZS6D: Zero-Shot 6D Object Pose Estimation Using Vision Transformers |
|
Ausserlechner, Philipp | TU Wien |
Haberger, David Dylan | TU Wien |
Thalhammer, Stefan | TU Wien |
Weibel, Jean-Baptiste | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a demand to recognize diverse objects. The state-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent novel object pose estimation methods address this issue using task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose, for example, is trained on a dataset consisting of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming, we introduce ZS6D for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of objects and for establishing local correspondences. These local correspondences enable deriving geometric correspondences and are used for estimating the object's 6D pose with RANSAC-based PnP. This approach showcases that the image descriptors extracted by pre-trained ViTs are well-suited to achieve a notable improvement over two state-of-the-art novel object 6D pose estimation methods, without the need for task-specific fine-tuning. Experiments are performed on LMO, YCBV, and TLESS. In comparison to MegaPose, we improve the Average Recall on all three datasets, and compared to OSOP we improve on two datasets. The code is available at https://github.com/PhilippAuss/ZS6D.
|
|
10:30-12:00, Paper TuAT6-CC.9 |
Fluxformer: Flow-Guided Duplex Attention Transformer Via Spatio-Temporal Clustering for Action Recognition |
|
Hong, Younggi | Chonnam National University |
Kim, Min Ju | Chonnam National University |
Lee, Isack | Chonnam National University |
Yoo, Seok Bong | Chonnam National University |
Keywords: Recognition, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Vision transformers have demonstrated impressive performance in various robotics and automation applications, such as classification automation and action recognition. However, the drawback of transformers is their quadratic increase in computing resources with larger inputs and dependence on considerable data for training. Most action recognition models using the transformer structure rely on a few frames from the original video to reduce computation, so temporal information is compromised by low frame rates. Spatial information is also compromised by reducing the number of embeddings as the transformer layer iterates. The letter proposes a robust model for action recognition that overcomes the limitations of most action recognition models with the transformer structure using the duplex attention function, flow-guided information, RGB information, and spatial support tokens. The proposed duplex attention mechanism leverages optical flow and RGB to address the lack of temporal information. The method employs spatial interest clustering to convert input data into tokens, improving the preservation of spatial information. Finally, meaningful action event frames are extracted by analyzing the flow and clustering to distinguish scenes. The experimental results reveal that the proposed model outperforms state-of-the-art methods in action recognition accuracy.
|
|
TuAT7-CC Oral Session, CC-416 |
Continual Learning |
|
|
Chair: Ariki, Yuka | Sony Group Corporation |
Co-Chair: Agrawal, Pulkit | MIT |
|
10:30-12:00, Paper TuAT7-CC.1 |
LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation |
|
Cheng, Shuo | Gatech |
Xu, Danfei | Georgia Institute of Technology |
Keywords: Reinforcement Learning, Task and Motion Planning, Continual Learning
Abstract: To assist with everyday human activities, robots must solve complex long-horizon tasks and generalize to new settings. Recent deep reinforcement learning (RL) methods show promise in fully autonomous learning, but they struggle to reach long-term goals in large environments. On the other hand, Task and Motion Planning (TAMP) approaches excel at solving and generalizing across long-horizon tasks, thanks to their powerful state and action abstractions. But they assume predefined skill sets, which limits their real-world applications. In this work, we combine the benefits of these two paradigms and propose an integrated task planning and skill learning framework named LEAGUE (Learning and Abstraction with Guidance). LEAGUE leverages the symbolic interface of a task planner to guide RL-based skill learning and creates an abstract state space to enable skill reuse. More importantly, LEAGUE learns manipulation skills in situ within the task planning system, continuously growing its capability and the set of tasks that it can solve. We evaluate LEAGUE on four challenging simulated task domains and show that LEAGUE outperforms baselines by large margins. We also show that the learned skills can be reused to accelerate learning in new task domains and transfer to a physical robot platform.
|
|
10:30-12:00, Paper TuAT7-CC.2 |
Test-Time Adaptation in the Dynamic World with Compound Domain Knowledge Management |
|
Song, Junha | KAIST |
Park, Kwanyong | KAIST |
Shin, InKyu | KAIST |
Woo, Sanghyun | KAIST |
Zhang, Chaoning | KAIST |
Kweon, In So | KAIST |
Keywords: Continual Learning, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Prior to the deployment of robotic systems, pre-training deep recognition models on all potential visual cases is infeasible in practice. Hence, test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time (i.e., lifelong adaptation). Several works on TTA have shown promising adaptation performance in continuously changing environments. However, our investigation reveals that existing methods are vulnerable to dynamic distributional changes and often lead to overfitting of TTA models. To address this problem, this paper first presents a robust TTA framework with compound domain knowledge management. Our framework helps the TTA model harvest the knowledge of multiple representative domains (i.e., the compound domain) and conduct TTA based on this compound domain knowledge. In addition, to prevent overfitting of the TTA model, we devise a novel regularization that modulates the adaptation rates using the domain similarity between the source and the current target domain. With the synergy of the proposed framework and regularization, we achieve consistent performance improvements in diverse TTA scenarios, especially on dynamic domain shifts. We demonstrate the generality of our proposals via extensive experiments including image classification on ImageNet-C and semantic segmentation on GTA5, C-driving, and corrupted Cityscapes datasets.
|
|
10:30-12:00, Paper TuAT7-CC.3 |
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference |
|
Banerjee, Soumya | IIT Kanpur |
Verma, Vinay Kumar | IIT Kanpur |
Mukherjee, Avideep | IIT Kanpur |
Gupta, Deepak | AMAZON |
Namboodiri, Vinay | University of Bath |
Rai, Piyush | University of Utah |
Keywords: Continual Learning, Incremental Learning, Deep Learning for Visual Perception
Abstract: Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming (observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these goals, we propose a novel virtual-gradients-based approach for continual representation learning which adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
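The exponential-moving-average semantic memory mentioned above amounts to a per-class running mean of feature embeddings. A minimal sketch (the class name, decay value, and first-example initialisation are my assumptions, not the paper's exact design):

```python
import numpy as np

class EMASemanticMemory:
    """Per-class exponential-moving-average of feature embeddings."""

    def __init__(self, num_classes, dim, decay=0.999):
        self.mem = np.zeros((num_classes, dim))
        self.decay = decay
        self.seen = np.zeros(num_classes, dtype=bool)

    def update(self, label, feature):
        if not self.seen[label]:          # first example initialises the slot
            self.mem[label] = feature
            self.seen[label] = True
        else:                             # slow EMA drift thereafter
            self.mem[label] = self.decay * self.mem[label] + (1 - self.decay) * feature

    def retrieve(self, label):
        return self.mem[label]

mem = EMASemanticMemory(num_classes=3, dim=4, decay=0.9)
mem.update(1, np.ones(4))
mem.update(1, np.zeros(4))
```

Because the memory changes slowly, it preserves information about past examples even as the streaming learner adapts to each new one.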
|
|
10:30-12:00, Paper TuAT7-CC.4 |
Experience Consistency Distillation Continual Reinforcement Learning for Robotic Manipulation Tasks |
|
Zhao, Chao | Xi'an Jiaotong University |
Xu, Jie | Xi'an Jiaotong University |
Peng, Ru | Xi'an Jiaotong University |
Chen, Xingyu | Xi'an Jiaotong University |
Mei, Kuizhi | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Continual Learning, Reinforcement Learning, Incremental Learning
Abstract: Continual reinforcement learning aims to help robots acquire skills without catastrophic forgetting, obviating the need to re-learn all tasks from scratch. To enable the lifelong acquisition of skills in robots, replay-based continual reinforcement learning has emerged as a promising research direction. These techniques replay data from previous tasks to mitigate forgetting when learning new skills. However, existing replay-based methods store poorly representative experience, and their utilization of old-task experience is inefficient. To address these issues, we propose an experience consistency distillation method for robot continual reinforcement learning that improves the data efficiency of the experience. Specifically, the experience of old tasks is distilled to obtain Markov Decision Process (MDP) data with a high compression ratio and information content. To ensure consistent data distributions before and after distillation, we further utilize a Fréchet Inception Distance (FID) loss as a regularization constraint. To improve experience utilization efficiency, the policy is then trained using both the distilled data and current task data, with policy distillation performed based on uncertainty metrics. Our method is validated on a continual reinforcement learning simulation platform and in a real scene with a UR5e robot arm. Experimental results indicate that our method achieves higher success rates and lower buffer size requirements compared to other methods.
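The FID regularizer used here compares feature distributions via the Fréchet distance between Gaussians, d² = ||μ₁−μ₂||² + Tr(Σ₁+Σ₂−2(Σ₁Σ₂)^{1/2}). A minimal sketch of the diagonal-covariance special case, where the matrix square root becomes elementwise (the function name and the diagonal restriction are my simplifying assumptions):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    General form: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2});
    with diagonal covariances the matrix square root is elementwise,
    so no scipy.linalg.sqrtm is needed.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

# Identical distributions have zero distance.
d = frechet_distance_diag(np.zeros(3), np.ones(3), np.zeros(3), np.ones(3))
```

Minimizing this quantity between pre- and post-distillation feature statistics is one way to keep the distilled replay buffer distributionally consistent with the original experience.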
|
|
10:30-12:00, Paper TuAT7-CC.5 |
Adapting to the “Open World”: The Utility of Hybrid Hierarchical Reinforcement Learning and Symbolic Planning |
|
Lorang, Pierrick | AIT Austrian Institute of Technology GmbH - Tufts University |
Horvath, Helmut | Technische Universität Wien, TUW |
Kietreiber, Tobias | University of Applied Sciences St. Poelten |
Zips, Patrik | AIT Austrian Institute of Technology GmbH |
Heitzinger, Clemens | TU Wien |
Scheutz, Matthias | Tufts University |
Keywords: Integrated Planning and Learning, Continual Learning, Reinforcement Learning
Abstract: Open-world robotic tasks such as autonomous driving pose significant challenges to robot control due to unknown and unpredictable events that disrupt task performance. Neural network-based reinforcement learning (RL) techniques (like DQN, PPO, SAC, etc.) struggle to adapt in large domains and suffer from catastrophic forgetting. Hybrid planning and RL approaches have shown some promise in handling environmental changes but lack efficiency in accommodation speed. To address this limitation, we propose an enhanced hybrid system with a nested hierarchical action abstraction that can utilize previously acquired skills to effectively tackle unexpected novelties. We show that it can adapt faster and generalize better compared to state-of-the-art RL and hybrid approaches, significantly improving robustness when multiple environmental changes occur at the same time.
|
|
10:30-12:00, Paper TuAT7-CC.6 |
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models |
|
Tziafas, Georgios | University of Groningen |
Kasaei, Hamidreza | University of Groningen |
Keywords: Continual Learning, Learning from Experience, Learning from Demonstration
Abstract: Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior work fixes the skill library and steers the LLM with carefully hand-crafted prompt engineering, limiting the agent to a stationary range of addressable tasks. In this work, we introduce LRLL, an LLM-based lifelong learning agent that continuously grows the robot skill library to tackle manipulation tasks of ever-growing complexity. LRLL achieves this with four novel contributions: 1) a soft memory module that allows dynamic storage and retrieval of past experiences to serve as context, 2) a self-guided exploration policy that proposes new tasks in simulation, 3) a skill abstractor that distills recent experiences into new library skills, and 4) a lifelong learning algorithm that enables human users to bootstrap new skills with minimal online interaction. LRLL continuously transfers knowledge from the memory to the library, building composable, general, and interpretable policies, while bypassing gradient-based optimization, thus relieving the learner from catastrophic forgetting. Empirical evaluation in a simulated tabletop environment shows that LRLL outperforms end-to-end and vanilla LLM approaches in the lifelong setup, while learning skills that are transferable to the real world. Project material will become available at the webpage https://gtziafas.github.io/LRLL_project/
|
|
10:30-12:00, Paper TuAT7-CC.7 |
Lifelong Robot Learning with Human Assisted Language Planners |
|
Parakh, Meenal | Princeton University |
Fong, Alisha | Massachusetts Institute of Technology |
Simeonov, Anthony | Massachusetts Institute of Technology |
Chen, Tao | Massachusetts Institute of Technology |
Gupta, Abhishek | University of Washington |
Agrawal, Pulkit | MIT |
Keywords: Continual Learning, Manipulation Planning, Integrated Planning and Learning
Abstract: Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/view/halp-submission
|
|
10:30-12:00, Paper TuAT7-CC.8 |
Probabilistic Spiking Neural Network for Robotic Tactile Continual Learning |
|
Fang, Senlin | Shenzhen Institute of Advanced Technology |
Liu, Yi Wen | Shenzhen Institute of Advanced Technology, University of Chinese |
Liu, Chengliang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Wang, Jingnan | Shenzhen Institute of Advanced Technology |
Su, Yuanzhe | University of Chinese Academy of Sciences |
Zhang, Yupo | Southern University of Science and Technology |
Kong, Hoiio | City University of Macau |
Yi, Zhengkun | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Wu, Xinyu | CAS |
Keywords: Continual Learning, Probabilistic Inference, Incremental Learning
Abstract: The sense of touch is essential for robots to perform various daily tasks. Artificial Neural Networks (ANNs) have shown significant promise in advancing robotic tactile learning. However, because the tactile data distribution changes as robots encounter new tasks, ANN-based robotic tactile learning suffers from catastrophic forgetting. To solve this problem, we introduce a novel continual learning (CL) framework called the Probabilistic Spiking Neural Network with Variational Continual Learning (PSNN-VCL). In this framework, the PSNN introduces uncertainty during spike emission and can apply fast Variational Inference by optimizing the uncertainty through backpropagation, which significantly reduces the model parameters required for VCL. We establish a robotic tactile CL benchmark using publicly available datasets to evaluate our method. Experimental results demonstrate that, compared to other CL methods, PSNN-VCL not only achieves superior performance in terms of widely used CL metrics but also achieves at least a 50% reduction in model parameters on the robotic tactile CL benchmark.
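The idea of injecting uncertainty into spike emission can be illustrated with a leaky integrate-and-fire step whose firing is drawn from a Bernoulli rather than a hard threshold. A hedged sketch (the sigmoid firing rate, reset-to-zero, and all constants are illustrative assumptions, not the PSNN-VCL formulation):

```python
import numpy as np

def probabilistic_lif_step(v, x, rng, leak=0.9, threshold=1.0, beta=5.0):
    """One step of a leaky integrate-and-fire neuron with stochastic firing.

    Instead of a hard threshold, each spike is a Bernoulli draw whose
    rate is a sigmoid of (membrane potential - threshold); the sharpness
    beta controls how close the unit is to deterministic firing.
    """
    v = leak * v + x                      # leaky integration of input current
    p_spike = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
    spike = rng.random(v.shape) < p_spike
    v = np.where(spike, 0.0, v)           # reset neurons that fired
    return v, spike.astype(float), p_spike

rng = np.random.default_rng(0)
v, s, p = probabilistic_lif_step(np.zeros(4), np.array([5.0, 5.0, -5.0, -5.0]), rng)
```

Because the firing probability is a differentiable function of the membrane potential, gradients can flow through the rate, which is what makes backpropagation-based variational inference tractable for such units.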
|
|
10:30-12:00, Paper TuAT7-CC.9 |
LOTUS: Continual Imitation Learning for Robot Manipulation through Unsupervised Skill Discovery |
|
Wan, Weikang | Peking University |
Zhu, Yifeng | The University of Texas at Austin |
Shah, Rutav | The University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Continual Learning, Deep Learning in Grasping and Manipulation
Abstract: We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks with a small number of corresponding task demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns presented in unstructured demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to exhibit novel behaviors. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in average success rates, showing its superior knowledge transfer ability compared to prior methods. More experimental videos and results can be found on the project website: https://ut-austin-rpl.github.io/Lotus/
|
|
TuAT8-CC Oral Session, CC-418 |
Learning |
|
|
Chair: Aksoy, Eren Erdal | Halmstad University |
Co-Chair: Ramirez-Amaro, Karinne | Chalmers University of Technology |
|
10:30-12:00, Paper TuAT8-CC.1 |
Synthesize Efficient Safety Certificates for Learning-Based Safe Control Using Magnitude Regularization |
|
Zheng, Haotian | Tsinghua University |
Ma, Haitong | Harvard University |
Zheng, Sifa | Tsinghua University |
Li, Shengbo Eben | Tsinghua University |
Wang, Jianqiang | Tsinghua University |
Keywords: Robot Safety, Reinforcement Learning, Autonomous Agents
Abstract: Safety certificates based on energy functions can provide demonstrable safety for complex robotic systems. However, all recent studies on learning-based energy function synthesis only consider the feasibility of the control policy, which might cause over-conservativeness and even failure to achieve the control goal. To solve the problem of over-conservative controllers, we propose a magnitude regularization technique that improves the performance of safe controllers by reducing the conservativeness inside the energy function, while preserving the provable safety guarantees. Specifically, we quantify the conservativeness by the magnitude of the energy function, and we reduce it by adding a magnitude regularization term to the synthesis loss. We propose an algorithm using reinforcement learning (RL) for synthesis to unify the learning process of safe controllers and energy functions. We conducted simulation experiments on Safety Gym and real-robot experiments using small quadrotors. Simulation results show that the proposed algorithm does reduce the conservativeness of the energy function and outperforms baselines in terms of controller performance while maintaining safety. Real-robot experiments show that the proposed algorithm indeed reduces conservativeness on the small quadrotors.
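The magnitude-regularized synthesis loss described above can be sketched as a feasibility term plus a penalty on the energy function's magnitude. A minimal sketch (the mean-based form, the weight, and all names are my assumptions, not the paper's exact loss):

```python
import numpy as np

def certificate_loss(energy_values, violations, reg_weight=0.1):
    """Feasibility loss plus a magnitude-regularization term.

    energy_values: energy function evaluated on sampled states
    violations:    per-sample constraint-violation penalties (>= 0)
    The second term penalises large |energy|, discouraging the
    over-conservative certificates the abstract describes.
    """
    feasibility = np.mean(violations)
    magnitude = np.mean(np.abs(energy_values))
    return feasibility + reg_weight * magnitude

loss = certificate_loss(np.array([2.0, -4.0]), np.array([0.0, 1.0]), reg_weight=0.5)
```

Tuning `reg_weight` trades feasibility of the learned certificate against how tightly it hugs the true safe set.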
|
|
10:30-12:00, Paper TuAT8-CC.2 |
On the Optimality, Stability, and Feasibility of Control Barrier Functions: An Adaptive Learning-Based Approach |
|
Chriat, Alaa Eddine | Mississippi State University |
Sun, Chuangchuang | Mississippi State University |
Keywords: Robot Safety, Reinforcement Learning, Optimization and Optimal Control
Abstract: Safety has been a critical issue for the deployment of learning-based approaches in real-world applications. To address this issue, the control barrier function (CBF) and its variants have attracted extensive attention for safety-critical control. However, due to the myopic one-step nature of CBFs and the lack of principled methods to design the class-K functions, there are still fundamental limitations of current CBFs: optimality, stability, and feasibility. In this paper, we propose a novel and unified approach to address these limitations with the Adaptive Multi-step Control Barrier Function (AM-CBF), where we parameterize the class-K function by a neural network and train it together with the reinforcement learning policy. Moreover, to mitigate the myopic nature, we propose a novel multi-step training and single-step execution paradigm that makes the CBF farsighted while the execution remains solving a single-step convex quadratic program. Our method is evaluated on first- and second-order systems in various scenarios, where our approach outperforms the conventional CBF both qualitatively and quantitatively.
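The single-step convex QP executed at run time has a closed form when only one CBF constraint is active. A minimal sketch under that single-constraint assumption (names are mine; real deployments solve a full QP with input bounds):

```python
import numpy as np

def cbf_qp_filter(u_ref, a, b):
    """Single-constraint CBF safety filter in closed form.

    Solves  min ||u - u_ref||^2  s.t.  a @ u >= b,
    the one-constraint special case of the per-step convex QP.
    For a barrier h(x), a = Lg h(x) and b = -(Lf h(x) + alpha(h(x))),
    where alpha is the (possibly learned) class-K function.
    """
    slack = a @ u_ref - b
    if slack >= 0.0:                      # reference input already safe
        return u_ref
    # Otherwise project u_ref onto the half-space boundary.
    return u_ref + (-slack / (a @ a)) * a

u = cbf_qp_filter(np.array([0.0, 0.0]), np.array([1.0, 0.0]), 1.0)
```

Parameterizing alpha with a neural network, as the paper proposes, changes `b` at each state but leaves this per-step projection structure intact.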
|
|
10:30-12:00, Paper TuAT8-CC.3 |
Learning Failure Prevention Skills for Safe Robot Manipulation |
|
Ak, Abdullah Cihan | Istanbul Technical University |
Aksoy, Eren Erdal | Halmstad University |
Sariel, Sanem | Istanbul Technical University |
Keywords: Robot Safety, Reinforcement Learning, Robust/Adaptive Control
Abstract: Robots are more capable than before of achieving manipulation tasks for everyday activities. But the safety of the manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and hinders learning an optimal policy. Nonetheless, safety-focused modularity in the acquisition of skills has not been adequately addressed in previous works. To that end, we reformulate skills as base and failure prevention skills, where base skills aim at completing tasks and failure prevention skills aim at reducing the risk of failures. Then, we propose a modular and hierarchical method for safe robot manipulation that augments base skills by learning failure prevention skills with reinforcement learning and forms a skill library to address different safety risks. Furthermore, a skill selection policy that considers estimated risks is used for the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that, with the proposed method, skill learning is feasible and our safe manipulation tools can be transferred to the real environment.
|
|
10:30-12:00, Paper TuAT8-CC.4 |
GG-LLM: Geometrically Grounding Large Language Models for Zero-Shot Human Activity Forecasting in Human-Aware Task Planning |
|
Graule, Moritz A. | Harvard University |
Isler, Volkan | University of Minnesota |
Keywords: Perception-Action Coupling, Deep Learning Methods
Abstract: A robot in a human-centric environment needs to account for the human’s intent and future motion in its task and motion planning to ensure safe and effective operation. This requires symbolic reasoning about probable future actions and the ability to tie these actions to specific locations in the physical environment. While one can train behavioral models capable of predicting human motion from past activities, this approach requires large amounts of data to achieve acceptable long-horizon predictions. More importantly, the resulting models are constrained to specific data formats and modalities. Moreover, connecting predictions from such models to the environment at hand to ensure the applicability of these predictions is an unsolved problem. We present a system that utilizes a Large Language Model (LLM) to infer a human’s next actions from a range of modalities without fine-tuning. A novel aspect of our system that is critical to robotics applications is that it links the predicted actions to specific locations in a semantic map of the environment. Our method leverages the fact that LLMs, trained on a vast corpus of text describing typical human behaviors, encode substantial world knowledge, including probable sequences of human actions and activities. We demonstrate how these localized activity predictions can be incorporated in a human-aware task planner for an assistive robot to reduce the occurrences of undesirable human-robot interactions by 29.2% on average.
|
|
10:30-12:00, Paper TuAT8-CC.5 |
Modality Attention for Prediction-Based Robot Motion Generation: Improving Interpretability and Robustness of Using Multi-Modality |
|
Ichiwara, Hideyuki | Hitachi, Ltd. / Waseda University |
Ito, Hiroshi | Hitachi, Ltd |
Yamamoto, Kenjiro | Hitachi, Ltd |
Mori, Hiroki | Waseda University |
Ogata, Tetsuya | Waseda University |
Keywords: Learning from Experience, Learning from Demonstration, Sensorimotor Learning
Abstract: We developed a modality attention motion generation model on the basis of multi-modality prediction. This model provides interpretability about modality usage and demonstrates robustness against disturbances. We used a hierarchical model consisting of low-level recurrent neural networks (RNNs) that process each modality individually and a high-level RNN that integrates the multi-modality. This integration is achieved by efficiently gating the multi-modality input to the high-level RNN. We verified the interpretability and robustness on the task of inserting a furniture part, which consists of an "approach" phase to bring the wooden dowel closer to the hole and an "insertion" phase. While the proposed model achieves the same task success rate as the conventional model, it shows that it refers to vision during "approach" and force during "insertion," providing interpretability regarding modality use. Furthermore, in contrast to the non-modality attention model, whose task success rate drops significantly under disturbance, the proposed model enhances robustness against disturbances to modalities it does not attend to during the task, resulting in a consistently high success rate (approximately 90%).
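The modality gating described above can be sketched as a softmax attention over per-modality features before the high-level RNN. A minimal sketch with two modalities (the fixed gate logits and all names are illustrative assumptions; in the model the gate is produced by the network itself):

```python
import numpy as np

def gated_fusion(vision_feat, force_feat, gate_logits):
    """Attention-gate two modality features before integration.

    gate_logits: (2,) unnormalised attention over [vision, force].
    The softmax weights are what make modality usage inspectable:
    reading them off during a task shows which modality dominates.
    """
    e = np.exp(gate_logits - gate_logits.max())  # stable softmax
    w = e / e.sum()
    fused = w[0] * vision_feat + w[1] * force_feat
    return fused, w

# A gate strongly favouring vision, as during the "approach" phase.
fused, w = gated_fusion(np.ones(4), np.zeros(4), np.array([5.0, -5.0]))
```

Because a near-zero weight scales a modality's contribution toward zero, disturbances on unattended modalities have little effect on the fused feature, which is the robustness mechanism the abstract reports.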
|
|
10:30-12:00, Paper TuAT8-CC.6 |
Adaptive Whole-Body Robotic Tool-Use Learning on Low-Rigidity Plastic-Made Humanoids Using Vision and Tactile Sensors |
|
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Learning from Experience, Modeling, Control, and Learning for Soft Robots, AI-Based Methods
Abstract: Various robots have been developed so far; however, we face challenges in modeling the low-rigidity bodies of some robots. In particular, the deflection of the body changes during tool-use due to object grasping, resulting in significant shifts in the tool-tip position and the body's center of gravity. Moreover, this deflection varies depending on the weight and length of the tool, making these models exceptionally complex. However, there is currently no control or learning method that takes all of these effects into account. In this study, we propose a method for constructing a neural network that describes the mutual relationship among joint angle, visual information, and tactile information from the feet. We aim to train this network using the actual robot data and utilize it for tool-tip control. Additionally, we employ Parametric Bias to capture changes in this mutual relationship caused by variations in the weight and length of tools, enabling us to understand the characteristics of the grasped tool from the current sensor information. We apply this approach to the whole-body tool-use on KXR, a low-rigidity plastic-made humanoid robot, to validate its effectiveness.
|
|
10:30-12:00, Paper TuAT8-CC.7 |
Generating and Transferring Priors for Causal Bayesian Network Parameter Estimation in Robotic Tasks |
|
Diehl, Maximilian | Chalmers University of Technology |
Ramirez-Amaro, Karinne | Chalmers University of Technology |
Keywords: Learning from Experience, Probability and Statistical Methods, Transfer Learning
Abstract: Robots acting in human environments will often face new situations and can benefit from transferring prior experience. Priors could enable robots to handle new tasks zero-shot and help prevent failures, which can be particularly costly in real robot applications. Due to their interpretable nature, causal Bayesian Networks (CBN) are popular for modeling cause-effect relations between semantically meaningful environment features and their effects on action success. While the CBN structure is often intuitively transferable to a new context, its probability distribution might change, requiring data-intensive relearning. In this work, we propose three strategies that utilize semantic similarity and relatedness between the variables of two CBNs to generate and transfer informed CBN distribution priors. We evaluate the parameter prior accuracy in five different transfer scenarios, including sim-2-real, transferring parameters to more complex tasks with a larger number of parameters and even between two different tasks, which is particularly challenging. We show that the priors lead to better distribution estimates, particularly under a limited amount of new experiments, and improve the robot’s ability to predict and prevent action failures by up to 50%.
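For a binary action-success variable, a CBN parameter follows a Beta-Binomial model, so a transferred prior can be folded in as pseudo-counts and refined with only a few new experiments. A minimal sketch (the function and variable names are my assumptions, not the paper's transfer strategies):

```python
def posterior_success_rate(prior_alpha, prior_beta, successes, failures):
    """Beta-Binomial update: a transferred prior acts as pseudo-counts
    (prior_alpha pseudo-successes, prior_beta pseudo-failures) that
    stabilise the estimate when few new trials are available."""
    a = prior_alpha + successes
    b = prior_beta + failures
    return a / (a + b)

# An informed prior (8 pseudo-successes, 2 pseudo-failures) tempered by
# one observed success and one observed failure in the new context.
rate = posterior_success_rate(8, 2, 1, 1)
```

With a good prior the estimate stays close to the true rate after one or two trials, whereas relearning from scratch would swing wildly, which is the data-efficiency benefit the abstract quantifies.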
|
|
10:30-12:00, Paper TuAT8-CC.8 |
Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing |
|
Tjanaka, Bryon | University of Southern California |
Fontaine, Matthew | University of Southern California |
Lee, David H. | University of Southern California |
Kalkar, Aniruddha | University of Southern California |
Nikolaidis, Stefanos | University of Southern California |
Keywords: Evolutionary Robotics, Reinforcement Learning
Abstract: Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. On the other hand, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with or exceeding state-of-the-art deep reinforcement learning-based quality diversity algorithms.
|
|
10:30-12:00, Paper TuAT8-CC.9 | Add to My Program |
Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery |
|
Kawaharazuka, Kento | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Learning from Demonstration, Surgical Robotics: Laparoscopy, Learning from Experience
Abstract: In this study, we present an implementation strategy for a robot that performs peg transfer tasks in Fundamentals of Laparoscopic Surgery (FLS) via imitation learning, aimed at the development of an autonomous robot for laparoscopic surgery. Robotic laparoscopic surgery presents two main challenges: (1) the need to manipulate forceps using ports established on the body surface as fulcrums, and (2) difficulty in perceiving depth information when working with a monocular camera that displays its images on a monitor. In particular, regarding issue (2), most prior research has assumed the availability of depth images or models of a target to be operated on. Therefore, in this study, we achieve more accurate imitation learning with only monocular images by extracting motion constraints from one exemplary motion of skilled operators, collecting data based on these constraints, and conducting imitation learning based on the collected data. We implemented an overall system using two Franka Emika Panda Robot Arms and validated its effectiveness.
|
|
TuAT9-CC Oral Session, CC-419 |
Add to My Program |
Datasets for Robot Learning |
|
|
Chair: Zeng, Long | Tsinghua University |
Co-Chair: Caesar, Holger | Delft University of Technology |
|
10:30-12:00, Paper TuAT9-CC.1 | Add to My Program |
Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding |
|
Tang, Yifan | Tsinghua University |
Tai, Cong | Tsinghua University |
Chen, Fang-xing | Tsinghua Shenzhen International Graduate School |
Zhang, Wanting | Tsinghua University |
Zhang, Tao | Pudu Technology Ltd |
Liu, Xueping | Tsinghua University |
Liu, Yong-Jin | Tsinghua University |
Zeng, Long | Tsinghua University |
Keywords: Data Sets for Robot Learning, Service Robotics, Dynamics
Abstract: Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed, including its organization, acquisition, and annotation methods. It comprises both real-world and synthetic data, collected with a real robot platform and a physical simulation platform, respectively. Our current dataset includes 13 large-scale dynamic scenarios, 90K image frames, 20M 2D/3D bounding boxes of static and dynamic objects, camera poses, and IMU data, and it is still continuously expanding. The performance of mainstream indoor scene understanding tasks, e.g., 3D object detection, semantic segmentation, and robot relocalization, is then evaluated on our THUD dataset. These experiments reveal serious challenges for some robot scene understanding tasks in dynamic scenes. By sharing this dataset, we aim to foster the rapid development and iteration of new mobile robot algorithms for the dynamic environments in which robots actually work, i.e., complex, crowded dynamic scenes.
|
|
10:30-12:00, Paper TuAT9-CC.2 | Add to My Program |
InteRACT: Transformer Models for Human Intent Prediction Conditioned on Robot Actions |
|
Kedia, Kushal | Cornell University |
Bhardwaj, Atiksh | Cornell University |
Dan, Prithwish | Cornell University |
Choudhury, Sanjiban | Cornell University |
Keywords: Data Sets for Robot Learning, Intention Recognition, Human-Robot Collaboration
Abstract: In collaborative human-robot manipulation, a robot must predict human intents and adapt its actions accordingly to smoothly execute tasks. However, the human's intent in turn depends on actions the robot takes, creating a chicken-or-egg problem. Prior methods ignore such inter-dependency and instead train marginal intent prediction models independent of robot actions. This is because training conditional models is hard given a lack of paired human-robot interaction datasets. Can we instead leverage large-scale human-human interaction data that is more easily accessible? Our key insight is to exploit a correspondence between human and robot actions that enables transfer learning from human-human to human-robot data. We propose a novel architecture, InteRACT, that pre-trains a conditional intent prediction model on large human-human datasets and fine-tunes on a small human-robot dataset. We evaluate on a set of real-world collaborative human-robot manipulation tasks and show that our conditional model improves over various marginal baselines. We also introduce new techniques to tele-operate a 7-DoF robot arm and collect a diverse range of human-robot collaborative manipulation data which we open-source. We release our code and datasets at https://portal-cornell.github.io/interact/.
|
|
10:30-12:00, Paper TuAT9-CC.3 | Add to My Program |
Towards Learning-Based Planning: The nuPlan Benchmark for Real-World Autonomous Driving |
|
Karnchanachari, Napat | Motional |
Geromichalos, Dimitris | Motional |
Tan, Kok Seang | Motional |
Li, Nanxiang | Motional |
Eriksen, Christopher | Motional AD |
Yaghoubi, Shakiba | Motional |
Mehdipour, Noushin | Motional |
Bernasconi, Gianmarco | Motional |
Fong, Whye Kit | Motional |
Guo, Yiluan | Motional |
Caesar, Holger | Delft University of Technology |
Keywords: Data Sets for Robot Learning, Integrated Planning and Learning, Deep Learning Methods
Abstract: Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world’s first real-world autonomous driving dataset and benchmark. The benchmark is designed to test the ability of ML-based planners to handle diverse driving situations and to make safe and efficient decisions. To that end, we introduce a new large-scale dataset that consists of 1282 hours of diverse driving scenarios from 4 cities (Las Vegas, Boston, Pittsburgh, and Singapore) and includes high-quality auto-labeled object tracks and traffic light data. We exhaustively mine and taxonomize common & rare driving scenarios which are used during evaluation to get fine-grained insights into the performance and characteristics of a planner. Beyond the dataset, we provide a simulation and evaluation framework that enables a planner’s actions to be simulated in closed-loop to account for interactions with other traffic participants. We present a detailed analysis of numerous baselines and investigate gaps between ML-based and traditional methods. Find the nuPlan dataset and code at nuplan.org.
|
|
10:30-12:00, Paper TuAT9-CC.4 | Add to My Program |
TBD Pedestrian Data Collection: Towards Rich, Portable, and Large-Scale Natural Pedestrian Data |
|
Wang, Allan | Carnegie Mellon University |
Sato, Daisuke | Carnegie Mellon University |
Corzo, Yasser | Carnegie Mellon University |
Simkin, Sonya | Carnegie Mellon University |
Biswas, Abhijat | Carnegie Mellon University |
Steinfeld, Aaron | Carnegie Mellon University |
Keywords: Data Sets for Robot Learning, Human-Aware Motion Planning, Data Sets for Robotic Vision
Abstract: Social navigation and pedestrian behavior research has shifted towards machine learning-based methods and converged on the topic of modeling inter-pedestrian interactions and pedestrian-robot interactions. For this, large-scale datasets that contain rich information are needed. We describe a portable data collection system, coupled with a semi-autonomous labeling pipeline. As part of the pipeline, we designed a label correction web application that facilitates human verification of automated pedestrian tracking outcomes. Our system enables large-scale data collection in diverse environments and fast trajectory label production. Compared with existing pedestrian data collection methods, our system contains three components: a combination of top-down and ego-centric views, natural human behavior in the presence of a socially appropriate "robot", and human-verified labels grounded in the metric space. To the best of our knowledge, no prior data collection system has a combination of all three components. We further introduce our ever-expanding dataset from the ongoing data collection effort -- the TBD Pedestrian Dataset and show that our collected data is larger in scale, contains richer information when compared to prior datasets with human-verified labels, and supports new research opportunities.
|
|
10:30-12:00, Paper TuAT9-CC.5 | Add to My Program |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics |
|
Sermanet, Pierre | Google |
Ding, Tianli | Google |
Zhao, Jeffrey | Google |
Xia, Fei | Google Inc |
Dwibedi, Debidatta | Google |
Gopalakrishnan, Keerthana | Google |
Chan, Christine | Google LLC |
Dulac-Arnold, Gabriel | Google |
Maddineni, Sharath | Google |
Joshi, Nikhil J | Google |
Florence, Peter | MIT |
Han, Wei | Google |
Robert, Baruch | Google.com |
Lu, Yao | Google |
Mirchandani, Suvir | Google |
Xu, Peng | Google |
Sanketi, Pannag | Google |
Hausman, Karol | Google Brain |
Shafran, Izhak | Google |
Ichter, Brian | Google Brain |
Cao, Yuan | Google |
Keywords: Data Sets for Robot Learning, Data Sets for Robotic Vision, Learning Categories and Concepts
Abstract: We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple embodiments (robot, human, human with grasping tool). With this data, we show that models trained on all embodiments perform better than ones trained on the robot data only, even when evaluated solely on robot episodes. We explore the economics of collection costs and find that for a fixed budget it is beneficial to take advantage of the cheaper human collection along with robot collection. We release a large and highly diverse (29,520 unique instructions) dataset dubbed RoboVQA containing 829,502 (video, text) pairs for robotics-focused visual question answering. We also demonstrate how evaluating real robot experiments with an intervention mechanism enables performing tasks to completion, making it deployable with human oversight even if imperfect, while also providing a single performance metric. We demonstrate a single video-conditioned model named RoboVQA-VideoCoCa trained on our dataset that is capable of performing a variety of grounded high-level reasoning tasks in broad realistic settings with a cognitive intervention rate 46% lower than the zero-shot state-of-the-art visual language model (VLM) baseline and is able to guide real robots through long-horizon tasks. The performance gap with zero-shot state-of-the-art models indicates that a lot of grounded data remains to be collected for real-world deployment, emphasizing the critical need for scalable data collection approaches. Finally, we show that video VLMs significantly outperform single-image VLMs, with an average error rate reduction of 19% across all VQA tasks. Data and videos are available at https://robovqa.github.io
|
|
10:30-12:00, Paper TuAT9-CC.6 | Add to My Program |
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot |
|
Fang, Hao-Shu | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Tang, Zhenyu | Shanghai Jiao Tong University |
Liu, Jirong | Shanghai Jiaotong University |
Wang, Chenxi | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Zhu, Haoyi | University of Science and Technology of China |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Data Sets for Robot Learning, Big Data in Robotics and Automation, Imitation Learning
Abstract: A key challenge for robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent progress in one-shot imitation learning and robotic foundation models has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improve their manipulative ability. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information. Moreover, we also provide a corresponding human demonstration video and a language description for each robot sequence. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. The dataset is made publicly available on our website: https://rh20t.github.io.
|
|
10:30-12:00, Paper TuAT9-CC.7 | Add to My Program |
SACSoN: Scalable Autonomous Control for Social Navigation |
|
Hirose, Noriaki | UC Berkeley / TOYOTA Motor North America |
Shah, Dhruv | University of California, Berkeley |
Sridhar, Ajay | University of California, Berkeley |
Levine, Sergey | UC Berkeley |
Keywords: Data Sets for Robot Learning, Machine Learning for Robot Control
Abstract: Machine learning provides a powerful tool for building socially compliant robotic systems that go beyond simple predictive models of human behavior. By observing and understanding human interactions from past experiences, learning can enable effective social navigation behaviors directly from data. In this paper, our goal is to develop methods for training policies for socially unobtrusive behavior, such that robots can navigate among humans in ways that do not disturb human behavior, using only onboard RGB observations for visual navigation. We introduce a definition for such behavior based on the counterfactual perturbation of the human: if the robot had not intruded into the space, would the human have acted in the same way? By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space. Instantiating this principle requires training policies to minimize their effect on human behavior, and this in turn requires data that allows us to model the behavior of humans in the presence of robots. Therefore, our approach is based on two key contributions. First, we collect a large dataset where an indoor mobile robot interacts with human bystanders. Second, we utilize this dataset to train policies that minimize counterfactual perturbation. We provide supplementary videos and make publicly available the visual navigation dataset on our project page.
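The counterfactual criterion can be written down compactly. In the paper the no-robot trajectory comes from a learned human model; in this hypothetical sketch both trajectories are simply inputs.

```python
import numpy as np

def counterfactual_perturbation(human_traj, human_traj_no_robot):
    """Mean waypoint deviation between the human's observed trajectory and
    the trajectory a model predicts they would have taken had the robot
    not intruded (the counterfactual). Illustrative sketch only: here the
    counterfactual trajectory is passed in rather than predicted."""
    h = np.asarray(human_traj, dtype=float)
    h_cf = np.asarray(human_traj_no_robot, dtype=float)
    return float(np.mean(np.linalg.norm(h - h_cf, axis=-1)))

# No deviation: the robot did not alter the human's natural path.
same = counterfactual_perturbation([[0, 0], [1, 0]], [[0, 0], [1, 0]])
# The robot pushed the human 1 m sideways at the second waypoint.
pushed = counterfactual_perturbation([[0, 0], [1, 1]], [[0, 0], [1, 0]])
```

A policy trained under this principle would drive such a quantity toward zero alongside its navigation objective.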
|
|
TuAT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Robot Materials and Design I |
|
|
Chair: Howard, Matthew | King's College London |
Co-Chair: Li, Shuguang | Tsinghua University |
|
10:30-12:00, Paper TuAT10-CC.1 | Add to My Program |
Lightweight Untethered Soft Robotic Fish |
|
Wang, Xiangxing | Beijing Jiaotong University, School of Electronics and Informati |
Pei, Xuan | Beijing Jiaotong University |
Wang, Xinyang | Beihang University,School of Mechanical Engineering and A |
Hou, Taogang | Beijing Jiaotong University |
Keywords: Soft Robot Materials and Design, Biologically-Inspired Robots
Abstract: Aquatic organisms, with their soft body structures and high agility, have inspired many biomimetic robots. However, owing to the issues of insulation and waterproofing, as well as the driving modules of soft materials, their control systems are usually large and heavy. Small underwater robots are therefore often tethered, i.e., they cannot integrate energy and control systems onto the body, which greatly limits their working range and activity modes. This paper presents a small untethered bionic manta ray. The robotic fish is driven by dielectric elastomer actuators (DEA), which control the double-wing structure on both sides through the central muscle part to simulate the process of the manta ray’s lateral fins fanning to propel itself forward. The flexible printed circuit board (FPC) constitutes the body of the fish and also serves as an independent energy and control system. The electronic components are evenly distributed on the double-wing structure of the robotic fish to realize the integration of the energy and control system. This circuit system can be powered by a small lithium battery and outputs a periodic voltage to drive the motion of the robotic fish. The masses of our tethered and untethered fish are 1.9 g and 5.1 g, respectively, and their swimming speeds reach 42.5 mm/s and 17.0 mm/s. This design principle can be extended to the research and design of various flexible devices and soft robots.
|
|
10:30-12:00, Paper TuAT10-CC.2 | Add to My Program |
Optimal Design of Flexible-Link Mechanisms with Desired Load-Displacement Profiles |
|
Maloisel, Guirec | Disney Research |
Knoop, Espen | The Walt Disney Company |
Schumacher, Christian | Disney Research |
Thomaszewski, Bernhard | Université De Montréal |
Bächer, Moritz | Disney Research |
Coros, Stelian | ETH Zurich |
Keywords: Soft Robot Materials and Design, Compliant Joints and Mechanisms, Optimization and Optimal Control
Abstract: Robot mechanisms that exploit compliance can perform complex tasks under uncertainty using simple control strategies, but it remains difficult to design mechanisms with a desired embodied intelligence. In this article, we propose an automated design technique that optimizes the desired load-displacement behavior of planar flexible-link mechanisms. To do so, we replace a subset of rigid links in an existing mechanism with flexible ones, and optimize their rest shapes. We demonstrate the efficacy of our approach on a set of examples, including two fabricated prototypes, illustrating applications for grasping and locomotion tasks.
|
|
10:30-12:00, Paper TuAT10-CC.3 | Add to My Program |
A Scalable, Light-Controlled, Individually Addressable, Non-Metal Actuator Array |
|
Paul, Sophie | University of California, Santa Barbara |
Devlin, Matthew | University of California, Santa Barbara |
Hawkes, Elliot Wright | University of California, Santa Barbara |
Keywords: Soft Robot Materials and Design, Flexible Robotics
Abstract: Research in the area of photo-actuation is growing rapidly, yet there are few examples of photo-actuators with practical use cases. One potential application is the control of intelligent electromagnetic surfaces, or two-dimensional arrays that could shape and control an incident electromagnetic field in ideally any manner. A promising concept to realize such a surface leverages signal refraction via antenna edges, but requires non-metal actuation, large antenna rotations, and high antenna angular accuracy for long periods of time. Here, we present a nonmetal, light-controlled, multi-position inchworm actuator array that can rotate an antenna 88 degrees in incremental steps of less than 3.4 degrees with zero-power shape-persistence. The design is modular and rapidly manufacturable via a layered laser-cutting technique, such that the actuator can be tiled into an array to control the rotation of many antennas. We control the array with a single focused IR light that rasters across the actuators to precisely control all antenna positions. We characterize the response time, accuracy, and repeatability of a single actuator, and demonstrate the array achieving diverse antenna configurations. This work advances the precision and scalability of photothermal actuation not only for use in intelligent electromagnetic surfaces but for any application benefitting from light-controlled actuation.
|
|
10:30-12:00, Paper TuAT10-CC.4 | Add to My Program |
A Passively Bendable, Compliant Tactile Palm with RObotic Modular Endoskeleton Optical (ROMEO) Fingers |
|
Liu, Sandra Q. | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Soft Robot Materials and Design, Force and Tactile Sensing, Mechanism Design
Abstract: Many robotic hands currently rely on extremely dexterous robotic fingers and a thumb joint to envelop themselves around an object. Few hands focus on the palm even though human hands greatly benefit from their central fold and soft surface. As such, we develop a novel structurally compliant soft palm, which enables more surface area contact for the objects that are pressed into it. Moreover, this design, along with the development of a new low-cost, flexible illumination system, is able to incorporate a high-resolution tactile sensing system inspired by the GelSight sensors. Concurrently, we design RObotic Modular Endoskeleton Optical (ROMEO) fingers, which are underactuated two-segment soft fingers that are able to house the new illumination system, and we integrate them into these various palm configurations. The resulting robotic hand is slightly bigger than a baseball and represents one of the first soft robotic hands with actuated fingers and a passively compliant palm, all of which have high-resolution tactile sensing. This design also potentially helps researchers discover and explore more soft-rigid tactile robotic hand designs with greater capabilities in the future.
|
|
10:30-12:00, Paper TuAT10-CC.5 | Add to My Program |
Shape-Conformable Suction Cups with Controllable Adaptive Suction on Complex Surfaces |
|
Yue, Tianqi | University of Bristol |
Bloomfield-Gadelha, Hermes | University of Bristol, UK |
Rossiter, Jonathan | University of Bristol |
Keywords: Soft Robot Materials and Design, Grippers and Other End-Effectors, Biomimetics
Abstract: Suction is widely used in industry, but the adaptation of state-of-the-art suction cups on complex surfaces (curved, cornered, uneven, rough, etc.) is still limited. In this paper, we present a novel shape-conformable suction mechanism to achieve highly-adaptive suction on complex surfaces. The shape-conformable adaptive suction is obtained by squeezing a soft multi-layer structure on the substrate, to form a shape-to-roughness sealed suction region. Based on this mechanism, two shape-conformable suction cups (SCSCs) - a displacement-driven shape-conformable suction cup (SDisp) and a force-driven shape-conformable suction cup (SForce) - are designed. They both achieve highly-adaptive suction on challenging surface topographies including highly-curved, cornered, textured, uneven and tilted surfaces. Particularly, SDisp has better adaptation (e.g., on a 90-degree corner and a balloon), while SForce is more lightweight (26 g) and compact (46×35 mm) and exhibits a quicker suction response (0.4 s). We analyse the underlying adaptive suction mechanism with a physical model, and demonstrate its adaptive suction capability by qualitatively comparing it with previous suction cups. We finally distill design principles for improving suction adaptation. We believe the proposed shape-conformable suction mechanism provides a novel solution to realize adaptive suction on complex surfaces in next-generation robotic gripping, anchoring and manipulation.
|
|
10:30-12:00, Paper TuAT10-CC.6 | Add to My Program |
A Phase-Change Emulsion Jamming Gripper for Manipulation of Micro-Scale Textured Surfaces |
|
Keller, Alexander | University of Bristol |
Yue, Tianqi | University of Bristol |
Qi, Qiukai | University of Bristol |
Conn, Andrew | University of Bristol |
Rossiter, Jonathan | University of Bristol |
Keywords: Soft Robot Materials and Design, Grippers and Other End-Effectors, Soft Robot Applications
Abstract: The inherent elasticity of soft materials can be used to create robotic grippers that deform and comply to a variety of irregular shapes. To date, several soft adaptive grasping strategies have been reported; however, most of them focus on adapting to the overall shape of the structure, while the adaptive grasping of small surface asperities is overlooked. In this paper, we propose a novel method to achieve adaptive grasping on surface asperities with a smart shape-memory silicone sponge. Heating above 60 °C makes the sponge soft and deformable, allowing it to penetrate surface asperities under a pressure normal to the surface. Cooling below 60 °C makes the sponge “jam” and retain its deformed shape. The interlocking force between the jammed sponge and the asperities, together with the increased area of contact, allows for adaptive grasping on asperities down to 0.4 mm with an adhesive force of up to 27.7 N over a 40 × 40 mm contact area. We introduce the design, working principle, fabrication, and optimization of a robotic gripper based on this shape-memory silicone sponge. This sponge-jamming gripper shows great potential for developing next-generation robotic grippers for the manipulation of textured and discontinuous surfaces.
|
|
10:30-12:00, Paper TuAT10-CC.7 | Add to My Program |
Design and Fabrication of String-Driven Origami Robots |
|
Yang, Peiwen | Tsinghua University |
Li, Shuguang | Tsinghua University / MIT / Harvard University |
Keywords: Soft Robot Materials and Design, Mechanism Design, Additive Manufacturing
Abstract: Origami designs and structures have been widely used in many fields, such as morphing structures, robotics, and metamaterials. However, the design and fabrication of origami structures rely on human experience and skill, making them both time-consuming and labor-intensive. In this paper, we present a rapid design and fabrication method for string-driven origami structures and robots. We developed origami design software to generate desired crease patterns based on analytical models and Evolution Strategies (ES). Additionally, the software can automatically produce 3D models of origami designs. We then used a dual-material 3D printer to fabricate those wrapping-based origami structures with the required mechanical properties. We utilized Twisted String Actuators (TSAs) to fold the target 3D structures from flat plates. To demonstrate the capability of these techniques, we built and tested an origami crawling robot and an origami robotic arm using 3D-printed origami structures driven by TSAs.
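For background on the TSAs used here, the standard twisted-string kinematic model (general to TSAs, not specific to this paper) gives the contraction produced by a given motor twist:

```python
import math

def tsa_contraction(L, r, theta):
    """Contraction of a twisted string actuator (standard kinematic model,
    not taken from the paper): a string of untwisted length L and radius r,
    twisted by theta radians at the motor end, shortens by
    L - sqrt(L^2 - (theta * r)^2). Valid while theta * r < L."""
    return L - math.sqrt(L * L - (theta * r) ** 2)

# A 100 mm string of radius 0.5 mm twisted by 100 rad contracts ~13.4 mm.
dx = tsa_contraction(0.1, 0.0005, 100.0)
```

The model shows why TSAs suit origami folding: a small, fast motor rotation is converted into a slow, high-force linear pull on the crease pattern.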
|
|
10:30-12:00, Paper TuAT10-CC.8 | Add to My Program |
Design and Implementation of a Ferrofluid-Based Liquid Robot for Small-Scale Manipulation |
|
Kong, Fanxing | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Cai, Hegao | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Soft Robot Materials and Design, Micro/Nano Robots, Soft Robot Applications
Abstract: Magnetic manipulation of miniature soft or liquid robots capable of deformation has gained increasing attention and is demonstrating great potential in small-scale applications, such as drug delivery, minimally invasive surgery, and manipulation of delicate objects. In this study, we introduce a liquid robot composed of ferrofluid that shows promise for small-scale magnetic manipulation applications. The objective of this work is to achieve more flexible manipulation capabilities for the robot. To this end, we utilize a redundant magnetic actuation system composed of five electromagnets and implement 4 degrees of freedom (4-DOF) control of the liquid robot in planar space. Based on the planar 4-DOF control, the liquid robot is able to perform various actions and implement versatile manipulation tasks, such as transporting objects, separating or assembling miniature parts, and operating customized tools. Furthermore, we suggest an automatic transportation method to enhance manipulation precision. A series of experiments is conducted to validate the effectiveness of the proposed method and the robot's capacity to accomplish diversified manipulation tasks. The proposed liquid robot demonstrates high flexibility and provides novel solutions for small-scale untethered manipulation.
|
|
10:30-12:00, Paper TuAT10-CC.9 | Add to My Program |
Towards Optimal Design of Dielectric Elastomer Actuators Using a Graph Neural Network Encoder |
|
Li, Yangfan | Institute of High Performance Computing, A*Star |
Liu, Jun | Institute of High Performance Computing |
Liang, Wenyu | Institute for Infocomm Research, A*STAR |
Liu, ZhuangJian | Institute of High Performance Computing |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Optimization and Optimal Control
Abstract: Dielectric elastomer actuators (DEAs), a type of "artificial muscles", can generate significant deformations and offer speedy responses when exposed to voltage. Owing to their high electromechanical conversion efficiency and great flexibility, they have been extensively used in soft robot applications, such as soft grippers, walking robots, crawling robots, climbing robots, swimming robots, etc. Although previous research has explored the use of DEAs in soft robot locomotion, achieving optimal behavior is challenging due to the complexity of the constituent materials and the highly nonlinear nature of the problem. In this study, a simulation-based design optimization approach is proposed to address this challenge. The proposed approach involves developing a computational modeling framework that evaluates the electromechanical behavior of the DEA. A graph neural network (GNN) is employed as an encoder to extract the latent representation of the geometry in a low dimensional space, which is further used to construct a surrogate model for fast prediction of target responses. To achieve an optimal actuation capability under design constraints, a multi-objective optimization function is formulated to balance the actuation distance and the actuator size, where the Pareto front demonstrates the trade-off between the actuation distance and design constraint. Finally, three optimized designs are fabricated and tested, demonstrating a performance improvement of over 140% compared to an
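The Pareto front mentioned above is a standard nondominated filter over (actuation distance, actuator size) pairs. A minimal sketch with made-up numbers, assuming distance is maximized and size minimized (names are hypothetical, not the authors' code):

```python
def pareto_front(designs):
    """Return the nondominated (distance, size) pairs: maximize actuation
    distance, minimize actuator size. Illustrative sketch only."""
    front = []
    for i, (d_i, s_i) in enumerate(designs):
        # A design is dominated if some other design is at least as good in
        # both objectives and strictly better in one.
        dominated = any(
            d_j >= d_i and s_j <= s_i and (d_j > d_i or s_j < s_i)
            for j, (d_j, s_j) in enumerate(designs) if j != i
        )
        if not dominated:
            front.append((d_i, s_i))
    return front

# Hypothetical (actuation distance in mm, actuator footprint in mm^2) pairs.
designs = [(1.0, 5.0), (2.0, 9.0), (1.5, 15.0), (2.5, 20.0)]
front = pareto_front(designs)  # the dominated (1.5, 15.0) design drops out
```

Each surviving point represents a different distance-versus-size trade-off; the optimizer picks among them according to the design constraints.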
|
|
TuAT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception I |
|
|
Chair: Shibata, Tomohiro | Kyushu Institute of Technology |
Co-Chair: Oishi, Takeshi | The University of Tokyo |
|
10:30-12:00, Paper TuAT11-CC.1 | Add to My Program |
Multi-Confidence Guided Source-Free Domain Adaption Method for Point Cloud Primitive Segmentation |
|
Wang, Shaohu | Institute of Automation, Chinese Academy of Sciences |
Tong, Yuchuang | The Institute of Automation of the Chinese Academy of Sciences |
Shang, Xiuqin | Institute of Automation, Chinese Academy of Sciences |
Zhang, Zhengtao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception
Abstract: Point cloud primitive segmentation aims to segment a surface point cloud into primitives of various geometric types, which plays a vital role in robot operation and industrial automation. However, differences in object structures and shapes across industrial datasets create domain shift issues, compounded by privacy concerns that prevent dataset sharing. To address these challenges, we propose a novel source-free domain adaptation method for point cloud primitive segmentation, which follows the popular pseudo-label based self-training framework. Unlike previous works using single-model uncertainty to refine pseudo-labels, our method leverages multi-confidence, including transformation consistency, task confidence, and geometric saliency, to provide more informative guidance. Specifically, transformation consistency is first utilized to vote on pseudo-labels and task confidences. Furthermore, to filter out high-confidence noise and obtain more reliable pseudo-labels, we investigate the geometric curvature properties of primitives and propose geometric-saliency-guided dynamic prototype matching and label graph aggregation strategies for pseudo-label reassignment under different task confidences. For this novel task, we construct several datasets and verify the effectiveness of the proposed methods through a series of experiments.
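Transformation-consistency voting of pseudo-labels can be sketched as follows (hypothetical interfaces, not the authors' implementation): predictions from several randomly rotated copies of a cloud are voted per point, and points without a sufficiently strong majority are left unlabeled.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation_z():
    """Random rotation about the z-axis (a stand-in for the paper's transforms)."""
    a = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def vote_pseudo_labels(points, predict, n_views=5, agree=0.8):
    """Vote per-point labels across randomly rotated copies of the cloud;
    points whose majority label wins fewer than `agree` of the votes get
    -1, i.e. are ignored during self-training."""
    votes = np.stack([predict(points @ random_rotation_z().T)
                      for _ in range(n_views)])          # (n_views, N)
    labels = np.full(points.shape[0], -1)
    for i in range(points.shape[0]):
        vals, counts = np.unique(votes[:, i], return_counts=True)
        best = counts.argmax()
        if counts[best] / n_views >= agree:
            labels[i] = vals[best]
    return labels

# A z-rotation-invariant toy predictor: label by radial distance from the z-axis.
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
radial = lambda p: (np.linalg.norm(p[:, :2], axis=1) > 1.0).astype(int)
labels = vote_pseudo_labels(pts, radial)
```

A predictor that flips its answer under rotation would fail the agreement threshold, which is exactly the unreliable-pseudo-label case the voting is meant to filter out.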
|
|
10:30-12:00, Paper TuAT11-CC.2 | Add to My Program |
FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization |
|
Ma, Nan | Beijing University of Technology, Beijing, China |
Wang, Mohan | Beijing University of Technology |
Han, Yiheng | Beijing University of Technology |
Liu, Yong-Jin | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Localization
Abstract: Cross-modality point cloud registration is confronted with significant challenges due to inherent differences in modalities between sensors. To deal with this problem, we propose FF-LOGO: a cross-modality point cloud registration framework with Feature Filtering and LOcal-Global Optimization. The cross-modality feature correlation filtering module extracts geometric transformation-invariant features from cross-modality point clouds and achieves point selection by feature matching. We also introduce a cross-modality optimization process, including a local adaptive key region aggregation module and a global modality consistency fusion optimization module. Experimental results demonstrate that our two-stage optimization significantly improves the registration accuracy of the feature association and selection module. Our method achieves a substantial increase in recall rate compared to the current state-of-the-art methods on the 3DCSR dataset, improving from 40.59% to 75.74%. Our code will be available at https://github.com/wangmohan17/FFLOGO.
|
|
10:30-12:00, Paper TuAT11-CC.3 | Add to My Program |
CAPT: Category-Level Articulation Estimation from a Single Point Cloud Using Transformer |
|
Fu, Lian | The University of Tokyo |
Ishikawa, Ryoichi | The University of Tokyo |
Sato, Yoshihiro | Kyoto University of Advanced Science |
Oishi, Takeshi | The University of Tokyo |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Recognition
Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using a Transformer. CAPT uses an end-to-end Transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed method estimates joint parameters and states for various articulated objects with high precision and robustness. The paper also introduces a motion-loss approach, which improves articulation estimation performance by emphasizing the dynamic features of articulated objects. Additionally, the paper presents a double voting strategy to provide the framework with coarse-to-fine parameter estimation. Experimental results on several category datasets demonstrate that our methods outperform existing alternatives for articulation estimation. Our research provides a promising solution for applying Transformer-based architectures in articulated object analysis.
|
|
10:30-12:00, Paper TuAT11-CC.4 | Add to My Program |
Energy-Based Detection of Adverse Weather Effects in LiDAR Data |
|
Piroli, Aldi | Universität Ulm |
Dallabetta, Vinzenz | BMW Group |
Kopp, Johannes | Ulm University |
Walessa, Marc | BMW Group |
Meissner, Daniel | BMW Group |
Dietmayer, Klaus | University of Ulm |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Recognition
Abstract: Autonomous vehicles rely on LiDAR sensors to perceive the environment. Adverse weather conditions like rain, snow, and fog negatively affect these sensors, reducing their reliability by introducing unwanted noise in the measurements. In this work, we tackle this problem by proposing a novel approach for detecting adverse weather effects in LiDAR data. We reformulate this problem as an outlier detection task and use an energy-based framework to detect outliers in point clouds. More specifically, our method learns to associate low energy scores with inlier points and high energy scores with outliers, allowing for robust detection of adverse weather effects. In extensive experiments, we show that our method performs better in adverse weather detection and has higher robustness to unseen weather effects than previous state-of-the-art methods. Furthermore, we show how our method can be used to perform simultaneous outlier detection and semantic segmentation. Finally, to help expand the research field of LiDAR perception in adverse weather, we release the SemanticSpray dataset, which contains labeled vehicle spray data in highway-like scenarios. The dataset will be available upon acceptance (see supplementary material for sample).
|
|
10:30-12:00, Paper TuAT11-CC.5 | Add to My Program |
EdgePoint: Efficient Point Detection and Compact Description Via Distillation |
|
Yao, Haodi | Harbin Institute of Technology |
Hao, Ning | Harbin Institute of Technology |
Xie, Chen | Harbin Institute of Technology |
He, Fenghua | Harbin Institute of Technology |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: Efficient interest point detection and description in images play a crucial role in many tasks such as multi-robot SLAM and collaborative localization. To facilitate fast detection and generate compact descriptions on edge devices, we introduce EdgePoint, a lightweight neural network. We design a new detection loss, UnfoldSoftmax, to improve inference speed. Furthermore, we propose an Ortho-Alignment loss combined with LocalPCA compression to learn compact 32-dimensional descriptors. To enable efficient storage and communication, we also quantize the generated descriptors into integer values. We evaluate EdgePoint on various datasets and show that it surpasses SuperPoint in performance while using only 1% of the parameters and achieving more than 10 times faster inference. By applying descriptor quantization, storage and communication requirements can be reduced by up to 97% without degrading performance.
|
|
10:30-12:00, Paper TuAT11-CC.6 | Add to My Program |
Fast and Robust Point Cloud Registration with Tree-Based Transformer |
|
Chen, Guangyan | Beijing Institute of Technology |
Wang, Meiling | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Yuan, Li | Peking University |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Deep Learning for Visual Perception, Visual Learning
Abstract: Point cloud registration is essential in computer vision and robotics. Recently, transformer-based methods have achieved advanced point cloud registration performance. However, the standard attention mechanism utilized in these methods considers many low-relevance points, and it has difficulty focusing its attention weights on sparse and meaningful points, leading to limited local structure modeling capabilities and quadratic computational complexity. To address these limitations, we present the Tree-based Transformer (TrT), which is able to extract abundant local and global features with linear computational complexity. Specifically, the TrT builds coarse-to-dense feature trees, and a novel Tree-based Attention (TrA) is proposed to guide the progressive convergence of the attended regions toward meaningful points and to structurize point clouds following tree structures. In each layer, the top S key points with the highest attention scores are selected, such that in the next layer, attention is evaluated only within the specified high-relevance regions, corresponding to the child points of these selected S points. Additionally, coarse features containing high-level semantic information are incorporated into the child points to guide the feature extraction process, facilitating local structure modeling and multiscale information integration. Consequently, TrA enables the model to focus on critical local structures and extract rich local information with linear computational complexity. Experiments demonstrate that our method achieves state-of-the-art performance on 3DMatch and KITTI benchmarks. The code for our method is publicly available at https://github.com/CGuangyan-BIT/TrT.
|
|
TuAT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning in Grasping and Manipulation I |
|
|
Chair: Kan, Zhen | University of Science and Technology of China |
Co-Chair: Qureshi, Ahmed H. | Purdue University |
|
10:30-12:00, Paper TuAT12-CC.1 | Add to My Program |
Uncertainty-Driven Exploration Strategies for Online Grasp Learning |
|
Shi, Yitian | Bosch Center for Artificial Intelligence |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Ziesche, Hanna | Bosch BCAI |
Feldman, Zohar | Bosch |
Anh Vien, Ngo | Bosch GmbH |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Existing grasp prediction approaches are mostly based on offline learning, ignoring exploratory grasp learning during online adaptation to new picking scenarios, i.e., objects that are unseen or out-of-domain (OOD), new camera and bin settings, etc. In this paper, we present an uncertainty-based approach for online learning of grasp predictions for robotic bin picking. Specifically, an online learning algorithm with an effective exploration strategy can significantly improve adaptation performance in unseen environment settings. To this end, we first propose to formulate online grasp learning as an RL problem, which allows us to adapt both grasp reward prediction and grasp poses. We propose various uncertainty estimation schemes based on Bayesian uncertainty quantification and distributional ensembles. We carry out evaluations on real-world bin-picking scenes of varying difficulty. The objects in the bin have various challenging physical and perceptual characteristics, such as semi- or total transparency and irregular or curved surfaces. The results of our experiments demonstrate a notable improvement in grasp performance compared to conventional online learning methods that incorporate only naive exploration strategies. Video: https://youtu.be/fPKOrjC2QrU
|
|
10:30-12:00, Paper TuAT12-CC.2 | Add to My Program |
Pseudo-Labeling and Contextual Curriculum Learning for Online Grasp Learning in Robotic Bin Picking |
|
Le, Huy | TU Dortmund |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Anh Vien, Ngo | Bosch GmbH |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: The prevailing grasp prediction methods predominantly rely on offline learning, overlooking the dynamic grasp learning that occurs during real-time adaptation to novel picking scenarios. These scenarios may involve previously unseen objects, variations in camera perspectives, and bin configurations, among other factors. In this paper, we introduce a novel approach, SSL-ConvSAC, that combines semi-supervised learning and reinforcement learning for online grasp learning. By treating pixels with reward feedback as labeled data and others as unlabeled, it efficiently exploits unlabeled data to enhance learning. In addition, we address the imbalance between labeled and unlabeled data by proposing a contextual curriculum-based method. We ablate the proposed approach on real-world evaluation data and demonstrate promise for improving online grasp learning on bin-picking tasks using a physical 7-DoF Franka Emika robot arm with a suction gripper.
|
|
10:30-12:00, Paper TuAT12-CC.3 | Add to My Program |
PoseFusion: Multi-Scale Keypoint Correspondence for Monocular Camera-To-Robot Pose Estimation in Robotic Manipulation |
|
Han, Xujun | University of Science and Technology of China |
Wang, Shaochen | University of Science and Technology of China |
Huang, Xiucai | Chongqing University |
Kan, Zhen | University of Science and Technology of China |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Visual-based robot pose estimation is a fundamental challenge, involving the determination of the camera’s pose with respect to a robot. Conventional methods for camera-to-robot pose calibration rely on fiducial markers to establish keypoint correspondences. However, these approaches exhibit significant variability in accuracy and robustness, particularly in 2D keypoint detection. In this work, we present an end-to-end pose estimation approach that achieves camera-to-robot calibration using monocular images and keypoint information. Our method employs a two-level nested U-shaped architecture, featuring a bottom-level residual U-block that extracts richer contextual information from diverse receptive fields to enhance keypoint refinement. By incorporating the perspective-n-point (PnP) algorithm and leveraging 3D robot joint keypoints, we establish correspondences of 3D coordinate points between the robot’s coordinate system and the camera’s coordinate system, facilitating accurate pose estimation. Experimental evaluations encompass real-world and synthetic datasets, demonstrating competitive results across three distinct robot manipulators.
|
|
10:30-12:00, Paper TuAT12-CC.4 | Add to My Program |
Online Fault Detection in Manipulation Tasks Via Generative Models |
|
Lanighan, Michael | TRACLabs, Inc |
Youngquist, Oscar | University of Massachusetts Amherst |
Keywords: Deep Learning in Grasping and Manipulation, Deep Learning Methods, Failure Detection and Recovery
Abstract: This paper introduces a method, Generative Adversarial Networks for Detecting Erroneous Results (GANDER), leveraging Generative Adversarial Networks to provide online error detection in manipulation tasks for autonomous robot systems. GANDER relies on mapping input images of a trained task to a learned manifold that contains only positive task executions and outcomes. When reconstructed through this manifold, the input images from successful task executions will remain largely unchanged, while the images from a failed task will change significantly. Using this insight, GANDER enables inspection and task outcome verification capabilities using a large number of positive examples but only a small set of negative examples, thus increasing the applicability of autonomous robot systems. We detail the design of GANDER and provide results of a proof-of-concept system, establishing its efficacy in an autonomous inspection, maintenance, and repair task. GANDER produces favorable results compared to baseline approaches and is capable of correctly identifying off-nominal behavior with 91.65% accuracy in our test task. Ablation studies were also performed to quantify the amount of data ultimately needed for this approach to succeed.
|
|
10:30-12:00, Paper TuAT12-CC.5 | Add to My Program |
One-Shot Learning for Task-Oriented Grasping |
|
Holomjova, Valerija | University of Aberdeen |
Starkey, Andrew | University of Aberdeen |
Yun, Bruno | Université Claude Bernard, Lyon 1 |
Meißner, Pascal | Wuerzburg-Schweinfurt Technical University of Applied Sciences |
Keywords: Deep Learning in Grasping and Manipulation, Computer Vision for Automation, Recognition
Abstract: Task-oriented grasping models aim to predict a suitable grasp pose on an object to fulfill a task. These systems have limited generalization capabilities to new tasks, but have shown the ability to generalize to novel objects by recognizing affordances. This object generalization comes at the cost of being unable to recognize the object category being grasped, which could lead to unpredictable or risky behaviors. To overcome these generalization limitations, we contribute a novel system for task-oriented grasping called the One-shot Task-oriented Grasping (OS-TOG) framework. OS-TOG comprises four interchangeable neural networks that interact through dependable reasoning components, resulting in a single system that predicts multiple grasp candidates for a specific object and task from multi-object scenes. Embedded one-shot learning models leverage references within a database for OS-TOG to generalize to novel objects and tasks more efficiently than existing alternatives. Additionally, the paper presents suitable candidates for the framework’s neural components, covering essential adjustments for their integration and evaluative comparisons to state-of-the-art. In physical experiments with novel objects, OS-TOG recognizes 69.4% of detected objects correctly and predicts suitable task-oriented grasps with 82.3% accuracy, having a physical grasp success rate of 82.3%.
|
|
10:30-12:00, Paper TuAT12-CC.6 | Add to My Program |
Multi-Level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection |
|
Zhu, Xinghao | University of California, Berkeley |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Sun, Lingfeng | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Cherian, Anoop | Mitsubishi Electric Research Labs |
Keywords: Deep Learning in Grasping and Manipulation, Assembly, Big Data in Robotics and Automation
Abstract: Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling. Unlike existing research, which is limited to target segmentation, pose regression, or using fixed target blueprints, our work presents a holistic multi-level framework for part assembly planning consisting of part assembly sequence inference, part motion planning, and robot contact optimization. We present the Part Assembly Sequence Transformer (PAST) -- a sequence-to-sequence neural network -- to infer assembly sequences recursively from a target blueprint. We then use a motion planner and optimization to generate part movements and contacts. To train PAST, we introduce D4PAS: a large-scale Dataset for Part Assembly Sequences consisting of physically valid sequences for industrial objects. Experimental results show that our approach generalizes better than prior methods while needing significantly less computational time for inference.
|
|
10:30-12:00, Paper TuAT12-CC.7 | Add to My Program |
Learning to Design 3D Printable Adaptations on Everyday Objects for Robot Manipulation |
|
Guo, Michelle | Stanford University |
Liu, Ziang | Cornell University |
Tian, Stephen | Stanford University |
Xie, Zhaoming | Stanford University |
Wu, Jiajun | Stanford University |
Liu, Karen | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation
Abstract: Advancements in robot learning for object manipulation have shown promising results, yet certain everyday objects remain challenging for robots to effectively interact with. This discrepancy arises from the fact that human-designed objects are optimized for human use rather than robot manipulation. To address this gap, we propose a framework to automatically design 3D printable adaptations that can be attached to hard-to-use objects, thus improving "robot ergonomics". Our learning-based framework formulates the adaptation design and control as a dual Markov decision process and is able to improve robot-object interactions for various robot end effectors and objects. We further validate our designs in the real world with a Franka Panda robot. Please see the supplementary video and https://object-adaptation.github.io for additional visualizations.
|
|
10:30-12:00, Paper TuAT12-CC.8 | Add to My Program |
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning |
|
Ku, Chahyon | University of Minnesota |
Winge, Carl | University of Minnesota |
Diaz, Ryan | University of Minnesota - Twin Cities |
Yuan, Wentao | University of Washington |
Desingh, Karthik | University of Minnesota |
Keywords: Deep Learning in Grasping and Manipulation, Dual Arm Manipulation, Transfer Learning
Abstract: This paper focuses on benchmarking the robustness of visual representations toward policy learning in the context of object assembly tasks, particularly the alignment and insertion of objects with geometrical extrusions, commonly known as a peg-in-hole task. The accuracy required to detect and orient the peg and the hole geometry in SE(3) space for successful assembly poses significant challenges. Addressing this, we employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders. Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations. Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features necessary for this task. However, a visual encoder trained from scratch consistently outperforms the frozen pretrained models. Moreover, we discuss rotation representations and associated loss functions that substantially improve policy learning. We present a novel task scenario designed to evaluate the progress in visuomotor policy learning, with a specific focus on improving the robustness of intricate assembly tasks that require both geometrical and spatial reasoning. Videos, data, and code for our simulator and evaluation are available at https://sites.google.com/view/geometric-peg-in-hole.
|
|
10:30-12:00, Paper TuAT12-CC.9 | Add to My Program |
DeRi-Bot: Learning to Collaboratively Manipulate Rigid Objects Via Deformable Objects |
|
Wang, Zixing | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning in Grasping and Manipulation, Cooperating Robots, Learning from Experience
Abstract: Recent research efforts have yielded significant advancements in manipulating objects under homogeneous settings where the robot is required to either manipulate rigid or deformable (soft) objects. However, the manipulation under heterogeneous setups that involve both 1D deformable and rigid objects remains an unexplored area of research. Such setups are common in various scenarios that involve the transportation of heavy objects via ropes, e.g., on factory floors, at disaster sites, and in forestry. To address this challenge, we introduce DeRi-Bot, the first framework that enables the collaborative manipulation of rigid objects with deformable objects. Our framework comprises an Action Prediction Network (APN) and a Configuration Prediction Network (CPN) to model the complex pattern and stochasticity of soft-rigid body systems. We demonstrate the effectiveness of DeRi-Bot in moving rigid objects to a target position with ropes connected to robotic arms. Furthermore, DeRi-Bot is a distributive method that can accommodate an arbitrary number of robots or human partners without reconfiguration or retraining. We evaluate our framework in both simulated and real-world environments and show that it achieves promising results with strong generalization across different types of objects and multi-agent settings, including human-robot collaboration.
|
|
TuAT13-AX Oral Session, AX-201 |
Add to My Program |
Physical Human-Robot Interaction I |
|
|
Chair: Mattila, Jouni | Tampere University of Technology |
Co-Chair: Kurita, Yuichi | Hiroshima University |
|
10:30-12:00, Paper TuAT13-AX.1 | Add to My Program |
Unified Power and Admittance Adaptation for Safe and Effective Physical Interaction with Unmodelled Dynamic Environments |
|
Benzi, Federico | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Human-Robot Collaboration
Abstract: When interacting with unmodelled dynamic systems, a robot controller should be capable of adapting its behavior online in order to be robust to changing environmental conditions. In the paradigm of passivity-based control, virtual energy tanks allow such adaptations to be performed in a robustly stable way by bounding the amount of energy allocated to the controller. Nevertheless, when the workspace is shared with human collaborators, additional limits have to be imposed on the power the system can exert in order to guarantee overall safety. These bounds are difficult to estimate a priori, may vary over time, and can significantly affect task execution. In this letter, we tackle this problem by simultaneously adapting the admittance and the power limits of the controller online, ensuring both safety and task performance. Experimental results with a collaborative manipulator validate the presented framework.
|
|
10:30-12:00, Paper TuAT13-AX.2 | Add to My Program |
Towards Robot to Human Skill Coaching: A ML-Powered IoT and HRI Platform for Martial Arts Training |
|
Bourahmoune, Katia | University of Technology Sydney |
Ishac, Karlos | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Keywords: Physical Human-Robot Interaction, AI-Based Methods, Human-Centered Robotics
Abstract: Advances in human sensing and machine learning are paving the way for new applications of robotics in sports and fitness, making skill coaching smarter, easier, and more accessible. Physical and social human-robot interaction in particular has received special attention as a feedback mechanism for human performance augmentation. A core challenge in deploying robots that interact physically with humans in dynamic environments such as sports relates to modeling human skills and designing appropriate interaction schemes. We present the first ML-based HRI platform for physical robot-to-human skill coaching in real time in martial arts, which can be extended to various sports. Our system comprises the Sawyer robot, our specially developed IoT katana, and a skill-training program for the martial art of Iaido. We built and deployed in real time an ML-based Iaido strike recognition model trained on expert and beginner data, achieving accuracies between 94.8% and 99.97%. We assessed the system's effectiveness in coaching skills through robot interaction in a sparring experiment and a survey involving 12 participants practicing key Iaido techniques with guided training from Sawyer. Our results demonstrated improvement in every participant's Iaido strike skill after training with Sawyer, and participants responded positively to robot-assisted skill coaching.
|
|
10:30-12:00, Paper TuAT13-AX.3 | Add to My Program |
Towards Robo-Coach: Robot Interactive Stiffness/Position Adaptation for Human Strength and Conditioning Training |
|
Li, Chenzui | The Chinese University of Hong Kong |
Wu, Xi | The Chinese University of Hong Kong |
Teng, Tao | Istituto Italiano Di Tecnologia & Università Cattolica Del Sacro |
Calinon, Sylvain | Idiap Research Institute |
Chen, Fei | The Chinese University of Hong Kong |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Learning from Demonstration
Abstract: Traditional strength and conditioning training relies on free weights, such as weighted implements, to elicit external stimuli. However, this approach poses a significant challenge when attempting to modify or adjust the loads within a single training set. This paper introduces an innovative method for achieving adjustable loads during resistance training by leveraging physical Human-Robot Interaction (pHRI). The primary objective is to regulate targeted muscle activation through the use of Robo-Coach (a robotic coach system). We first utilize a Task-Parameterized Gaussian Mixture Model (TP-GMM) to learn the motion from coach demonstrations, which can be generalized for the trainees. The 3D path extracted from the generated trajectory is then projected onto a 2D plane with respect to the direction of the load. Furthermore, we propose a hybrid stiffness/position generator for online task execution. This generator determines the desired positions in the 2D plane according to the contact point displacements in the stimulus direction and, simultaneously, sets the desired stiffness based on the muscle activation feedback. Finally, the Robo-Coach is implemented with a variable impedance controller to achieve load-adjustable resistance training with the trainee. Biceps curl exercises were conducted, and the results showed favorable performance, indicating the effectiveness of this approach.
|
|
10:30-12:00, Paper TuAT13-AX.4 | Add to My Program |
Predictive and Robust Robot Assistance for Sequential Manipulation |
|
Stouraitis, Theodoros | RoboPhren and Honda Research Institute Europe |
Gienger, Michael | Honda Research Institute Europe |
Keywords: Physical Human-Robot Interaction, Human-Aware Motion Planning, Optimization and Optimal Control
Abstract: This paper presents a novel concept to support physically impaired humans in daily object manipulation tasks with a robot. Given a user’s manipulation sequence, we propose a predictive model that uniquely casts the user’s sequential behavior as well as a robot support intervention into a hierarchical multi-objective optimization problem. A major contribution is the prediction formulation, which allows several different future paths to be considered concurrently. The second contribution is the encoding of a general notion of constancy constraints, which allows dependencies to be considered between consecutive or far-apart keyframes (in time or space) of a sequential task. We perform numerical studies, simulations, and robot experiments to analyse and evaluate the proposed method in several tabletop tasks where a robot supports impaired users by predicting their posture and proactively re-arranging objects.
|
|
10:30-12:00, Paper TuAT13-AX.5 | Add to My Program |
Nonlinear Subsystem-Based Adaptive Impedance Control of Physical Human-Robot-Environment Interaction in Contact-Rich Tasks |
|
Hejrati, Mahdi | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Physical Human-Robot Interaction, Compliance and Impedance Control, Robust/Adaptive Control
Abstract: Haptic upper-limb exoskeletons are robots that assist human operators during task execution while being able to render virtual or remote environments. Therefore, ensuring the stability of such robots in physical human-robot-environment interaction (pHREI) is crucial. Having a wide Z-width, which indicates the region of impedance passively renderable by a haptic display, is also important for rendering a broad range of virtual environments. To address these issues, this study designs a subsystem-based adaptive impedance control to achieve stable pHREI for a 7-degrees-of-freedom haptic exoskeleton. The presented controller decomposes the entire system into subsystems and designs controllers at the subsystem level. The stability of the controller in the presence of contact with the virtual environment and human arm force is proven by employing the concept of virtual stability. Additionally, the Z-width of the 7-DoF haptic exoskeleton is illustrated using experimental data and improved by exploiting a varying virtual mass element. Finally, experimental results are provided to demonstrate the performance of the proposed controller. The control results are also compared to state-of-the-art control methods, highlighting the strength of the designed controller.
|
|
10:30-12:00, Paper TuAT13-AX.6 | Add to My Program |
Model Predictive Control with Graph Dynamics for Garment Opening Insertion During Robot-Assisted Dressing |
|
Kotsovolis, Stelios | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Physical Human-Robot Interaction, Bimanual Manipulation, Human-Centered Robotics
Abstract: Robots have a great potential to help people with movement limitations in activities of daily living, such as dressing. A common problem in almost all dressing tasks is the insertion of a garment’s opening around a part of the human body. The rich contact environment and the deformations of the garment make the task a challenging problem for robots. In this paper, we propose a bi-manual control method for garment opening insertion during robot-assisted dressing. Specifically, we propose a model predictive controller that uses an Attention-based Relational Graph Convolutional Network (ARGCN) for modeling the dynamics of the opening in the presence of the body. We train the model entirely in simulation and validate our method in four real-world dressing scenarios of a medical training manikin. We show that our method generalizes well in the real-world opening insertion tasks achieving an overall success rate of 97.5%, even though the dynamics and the shapes vastly differ from the simulation setup.
|
|
10:30-12:00, Paper TuAT13-AX.7 | Add to My Program |
Force Feedback-Based Gamification: Performance Validation of Squat Exergame Using Pneumatic Gel Muscles and Dynamic Difficulty Adjustment |
|
Ramasamy, Priyanka | Hiroshima University |
Renganathan, Gunarajulu | Hiroshima University |
Kurita, Yuichi | Hiroshima University |
Keywords: Physical Human-Robot Interaction, Human Performance Augmentation, Virtual Reality and Interfaces
Abstract: Exergames are considered an advanced approach for enhancing the physical activity of the elderly community. Their advantages include greater immersion and enjoyment, which support performance validation and sustain the engagement needed to draw users away from sedentary lifestyles. An at-home game design combined with squat workouts is therefore a promising way to enhance lower-limb performance. We designed a virtual reality (VR) based exergame with dynamic difficulty adjustment (DDA), in which the speed of the moving objects and the air pressure of the pneumatic gel muscles (PGMs) vary with respect to the knee-shakiness feature. This letter aims to estimate the muscle loading and unloading effects of exergaming sessions at the onset and during the squat phase of the squat cycle. We acquired surface electromyography (sEMG) of five major lower limb muscles from seven subjects to evaluate the reduction in muscle activity during conventional and exergame-based squat sessions. In addition, we assessed knee indicators to identify variations in motion performance. The subjects performed 120 squats per session, followed by a maximum voluntary contraction (MVC) task. No adverse events, such as fatigue or dizziness, were reported during the study. Our results show a statistically significant (p<0.01) difference in muscle activity for all tested muscles. We also found a statistically significant (p<0.01) reduction in knee shakiness during exergaming sessions 2 and 3.
|
|
10:30-12:00, Paper TuAT13-AX.8 | Add to My Program |
External Dynamics Dependent Human Gait Adaptation Using a Cable-Driven Exoskeleton |
|
Nakka, S S Sanjeevi | Indian Institute of Technology Gandhinagar |
Vashista, Vineet | Indian Institute of Technology Gandhinagar |
Keywords: Physical Human-Robot Interaction, Human-Centered Robotics, Rehabilitation Robotics
Abstract: The emergence of exoskeleton technology has enabled new opportunities for gait rehabilitation, but effective methods to restore healthy gait patterns with exoskeletons are not yet clearly understood. Early research in robot-based gait rehabilitation offered little improvement over current standards of physical care and emphasized the need for a deeper understanding of the complex interaction between humans and the robot, i.e., physical human-robot interaction (pHRI). Studies have reported varied lower-limb responses to similar interventions with different exoskeletons, implying that the exoskeleton's external dynamics affect musculoskeletal adaptation outcomes. Accordingly, the current study showcases external-dynamics-dependent gait adaptation using a Cable-Driven Leg Exoskeleton (CDLE). A swing-phase gait intervention using three different CDLE cable-routing configurations that impose varied dynamics at the human anatomical joints is studied. Twenty-four healthy participants, eight for each CDLE configuration, were tested. Results showed varied gait adaptation among the three groups: subjects used predominantly their hip joint, their knee joint, or a combination of both, implying selective joint-strategy adaptations to different external dynamics conditions under the same intervention. The results of this study can provide insights into the optimal design of leg exoskeleton-based rehabilitation paradigms for effective gait rehabilitation.
|
|
10:30-12:00, Paper TuAT13-AX.9 | Add to My Program |
Comparison of Rating Scale and Pairwise Comparison Methods for Measuring Human Co-Worker Subjective Impression of Robot During Physical Human-Robot Collaboration |
|
Wang, Qiao | University of Technology Sydney |
Wang, Ziqi | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Liu, Dikai | University of Technology, Sydney |
Lin, Chin-Teng | UTS |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: The Rating Scale (RS) method has long been deemed the standard for measuring subjective perceptions. However, in the field of physical human-robot collaboration (pHRC), its aptness should be put under scrutiny due to inherent challenges such as response bias, between-subject variation, and scale granularity. Individual variance can introduce significant bias in rating scale results. A high-granularity scale could overwhelm participants, leading to unclear and biased responses, while a low-granularity scale may gloss over the fine nuances of human feelings. Additionally, there is a notable risk of receiving careless responses, which compromise data reliability. Recognizing these challenges, this paper proposes the application of Pairwise Comparison (PC) in pHRC: an alternative survey technique that emphasizes direct comparisons between items on defined criteria. Using the NASA Task Load Index (NASA-TLX) as a template, RS and PC questionnaires are designed and used in a series of pHRC experiments. Our preliminary findings suggest that PC is more precise and robust than RS. Compared to RS, PC fosters authentic participant interest in the experiment through intuitive question design and a reduced experimental duration. The accuracy and reliability of PC are also found to be consistent regardless of variations in our experimental procedure design.
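As an illustration of how pairwise-comparison outcomes can be turned into scale values (this is a common analysis choice, not necessarily the one used in the paper), the Bradley-Terry model fits a latent strength to each item from win counts; the example matrix below is hypothetical:

```python
def bradley_terry(wins, n_iters=200):
    """wins[i][j] = number of times item i was preferred over item j.
    Returns normalized strength scores via the classic MM iteration."""
    m = len(wins)
    p = [1.0 / m] * m
    for _ in range(n_iters):
        new_p = []
        for i in range(m):
            w_i = sum(wins[i])  # total wins of item i
            # MM update denominator: comparisons involving i, weighted by current strengths
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]  # renormalize so scores sum to 1
    return p

# Hypothetical comparison data for three questionnaire items
wins = [[0, 3, 4],
        [1, 0, 3],
        [0, 1, 0]]
scores = bradley_terry(wins)
```

With this data, item 0 (most often preferred) receives the highest strength score.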
|
|
TuAT14-AX Oral Session, AX-202 |
Add to My Program |
Prosthetics and Exoskeletons I |
|
|
Chair: Shimoda, Shingo | RIKEN |
Co-Chair: Choi, Junho | Korea Institute of Science and Technology |
|
10:30-12:00, Paper TuAT14-AX.1 | Add to My Program |
A Transtibial Prosthesis Using a Parallel Spring Mechanism |
|
Jung, Donggyu | Korea Institute of Science and Technology |
Park, Shinsuk | Korea University |
Choi, Junho | Korea Institute of Science and Technology |
Keywords: Prosthetics and Exoskeletons, Actuation and Joint Mechanisms, Mechanism Design
Abstract: Prosthetic legs have been used to restore lower-limb function lost due to amputation. Early designs, including prosthetic legs with a passive joint or no joint as well as Energy Storing and Releasing (ESR) feet, are deficient in push-off torque, which results in an asymmetric gait pattern, slower walking speed, and a higher cost of transport. Although powered prosthetic legs address these problems, they suffer from lower energy efficiency and greater volume and weight. In this paper, a powered transtibial prosthesis using a Parallel Elastic Actuator (PEA) is proposed to generate the joint torque needed for walking with a lower-powered actuator, enabling a lighter and more compact design. A non-linear spring mechanism is proposed to generate spring torque as needed. The implemented prosthetic leg is evaluated with three intact subjects. The experimental results show that a smaller motor torque is required with the spring mechanism. Consequently, less electrical power is consumed when the spring mechanism is used, which implies that a lower-powered actuator is sufficient to generate the joint torque needed for walking.
|
|
10:30-12:00, Paper TuAT14-AX.2 | Add to My Program |
Deep Learning Based Acoustic Measurement Approach for Robotic Applications on Orthopedics |
|
Lan, Bangyu | University of Twente |
Abayazid, Momen | University of Twente |
Verdonschot, Nico | Orthopaedic Research Lab; RadboudUMC |
Stramigioli, Stefano | University of Twente |
Niu, Kenan | University of Twente |
Keywords: Prosthetics and Exoskeletons, Computer Vision for Medical Robotics, Visual Tracking
Abstract: In Total Knee Replacement Arthroplasty (TKA), surgical robotics can provide image-guided navigation to fit implants with high precision. The tracking approach typically relies on bone pins inserted into the bones and tracked by an optical tracking system. This is normally done in an invasive, radiative manner (implantable markers and CT scans), which introduces unnecessary trauma and prolongs patient preparation time. To tackle this issue, ultrasound-based bone tracking could offer an alternative. In this study, we propose a novel deep-learning structure to improve the accuracy of bone tracking with A-mode ultrasound (US). We first obtained an ultrasound dataset from a cadaver experiment, in which the ground-truth bone locations were calculated using bone pins. The ground-truth bone locations and the US signals were recorded simultaneously, allowing us to label bone peaks in the raw US signals. These data were used to train the proposed CasAtt-UNet to predict bone locations automatically and robustly. As a result, our method achieved sub-millimeter precision across all eight bone areas, with the exception of one channel in the ankle. This method enables robust measurement of lower-extremity bone positions from 1D raw ultrasound signals and shows great potential for applying A-mode ultrasound in orthopedic surgery in a safe, convenient, and efficient manner.
|
|
10:30-12:00, Paper TuAT14-AX.3 | Add to My Program |
EMG-Based Intention Detection Using Deep Learning for Shared Control in Upper-Limb Assistive Exoskeletons |
|
Sedighi, Paniz | University of Alberta |
Li, Xingyu | University of Alberta |
Tavakoli, Mahdi | University of Alberta |
Keywords: Prosthetics and Exoskeletons, Deep Learning Methods, Physical Human-Robot Interaction
Abstract: In the field of human-robot interaction, surface electromyography (sEMG) provides a valuable tool for measuring active muscular effort. While numerous studies have investigated real-time control of upper extremity exoskeletons based on user intention and task-specific movements, the prediction of body joint positions from EMG features has remained largely unexplored. In this paper, we address this gap by proposing a novel approach that leverages Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) models to generate exoskeleton joint trajectories. Our methodology involves collecting data from three channels of EMG and three degrees-of-freedom (DoF) joint angles, and enables us to position-control a pneumatic cable-driven upper-limb exoskeleton, thereby assisting users in various tasks. Through extensive experimentation, our intention-based model demonstrates robust performance across different speeds and is capable of detecting variations in payload and electrode placement. The empirical results underscore the efficacy of our approach, particularly in reducing the user's EMG levels during different tasks by providing exoskeleton assistance as needed.
|
|
10:30-12:00, Paper TuAT14-AX.4 | Add to My Program |
Gait Phase Detection Based on LSTM-CRF for Stair Ambulation |
|
Wei, Haochen | Monash University |
Tong, Kai Yu | The Chinese University of Hong Kong |
Wang, Michael Yu | Mywang@gbu.edu.cn |
Chen, Chao | Monash University |
Keywords: Prosthetics and Exoskeletons, Deep Learning Methods
Abstract: It is essential to accurately identify gait phases when active exoskeleton devices assist the lower limbs. This work focuses on IMU-based phase detection for stair ambulation. To enhance the detection sensitivity of phase transitions, this work utilises an LSTM-CRF hybrid model. Four IMU sensors attached to the thighs and shanks of both legs were used to collect data during trials on ten healthy subjects for stair ascent and descent. The network's performance is evaluated by F1-score, recall (true positive rate), and precision, which average 96.3% with a standard deviation (std) of 1.9%, 96.6% with an std of 1.6%, and 95.9% with an std of 2.7%, respectively.
|
|
10:30-12:00, Paper TuAT14-AX.5 | Add to My Program |
Towards a Unified Approach for Continuously-Variable Impedance Control of Powered Prosthetic Legs Over Walking Speeds and Inclines |
|
Lee, Albert | University of Michigan |
Laubscher, Curt A. | University of Michigan |
Best, T. Kevin | University of Michigan |
Gregg, Robert D. | University of Michigan |
Keywords: Prosthetics and Exoskeletons, Human-Centered Robotics
Abstract: Research in powered prosthesis control has explored the use of impedance-based control algorithms due to their biomimetic capabilities and intuitive structure. Modern impedance controllers feature parameters that smoothly vary over gait phase and task according to a data-driven model. However, these recent efforts only use continuous impedance control during stance and instead utilize discrete transition logic to switch to kinematic control during swing, necessitating two separate models for the different parts of the stride. In contrast, this paper presents a controller that uses smooth impedance parameter trajectories throughout the gait, unifying the stance and swing periods under a single, continuous model. Furthermore, this paper proposes a basis model to represent intertask relationships in the impedance parameters—a strategy that has previously been shown to improve model accuracy over classic linear interpolation methods. In the proposed controller, a weighted sum of Fourier series is used to model the impedance parameters of each joint as continuous functions of gait cycle progression and task. Fourier series coefficients are determined via convex optimization such that the controller best reproduces the joint torques and kinematics in a reference able-bodied dataset. Experiments with a powered knee-ankle prosthesis show that this simpler, unified model produces competitive results when compared to a more complex hybrid impedance-kinematic model over varying walking speeds and inclines.
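A minimal sketch of the modeling idea described above: an impedance parameter (here a hypothetical stiffness profile) is represented as a Fourier series in gait-cycle phase, and the coefficients are fit to reference data by least squares, the simplest instance of the convex optimization the paper describes. The reference profile, harmonic count, and single-task setting are assumptions for illustration:

```python
import numpy as np

def fourier_basis(phase, n_harmonics):
    """Design matrix for phase in [0, 1): columns 1, cos(2πkφ), sin(2πkφ)."""
    cols = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * phase))
        cols.append(np.sin(2 * np.pi * k * phase))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 1.0, 200)
# Hypothetical reference joint stiffness (Nm/rad) over the gait cycle
k_ref = 3.0 + 1.5 * np.cos(2 * np.pi * phase) - 0.5 * np.sin(4 * np.pi * phase)

# Fit Fourier coefficients by least squares (a convex problem)
A = fourier_basis(phase, n_harmonics=3)
coef, *_ = np.linalg.lstsq(A, k_ref, rcond=None)

# Evaluate the continuous stiffness model at 25% of the gait cycle
k_fit = fourier_basis(np.array([0.25]), 3) @ coef
```

In the paper's formulation, a weighted sum of such series additionally varies the coefficients with task (speed and incline); this sketch shows only the single-task phase model.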
|
|
10:30-12:00, Paper TuAT14-AX.6 | Add to My Program |
Design, Simulation and Kinematic Verification of a Multi-Loop Ankle-Foot Prosthetic Mechanism |
|
Song, Majun | Hangzhou Innovation Institute, Beihang University |
Li, Zhongyi | Hangzhou Innovation Institute, Beihang University |
Chen, Weihai | Hangzhou Innovation Institute, Beihang University |
Zheng, Hao | Hangzhou Innovation Institute, Beihang University |
Guo, Sheng | Beijing Jiaotong University |
Rasmussen, John | Aalborg University |
Bai, Shaoping | Aalborg University |
Keywords: Prosthetics and Exoskeletons, Mechanism Design, Human-Centered Robotics
Abstract: Inspired by the bionic characteristics of the ankle and calf skeletal muscles, a novel ankle-foot prosthesis (AFP) with variable stiffness mechanisms (VSMs) is proposed to assist transtibial amputees in restoring ankle plantarflexion-dorsiflexion. The prosthesis is designed as a spring-loaded three-loop linkage that continuously absorbs and releases energy during the stance phase of gait, which facilitates ankle plantarflexion-dorsiflexion and keeps the human body moving forward steadily. A compliant crank-slider mechanism is also developed to power-assist the AFP mechanism and improve the adaptive compliant contact between the prosthesis and the ground. In this paper, mechanics models of the AFP are developed to reveal the variable moment of the ankle joint, which is verified by human-machine simulation. An AFP prototype is built to validate the design experimentally. The results demonstrate that the AFP mechanism has the advantages of low power consumption and a human-like joint moment profile. In particular, the AFP mechanism with 54 W of power provided in the toe-off phase reduces the peak power of the motor by 24%.
|
|
10:30-12:00, Paper TuAT14-AX.7 | Add to My Program |
Short Term After-Effects of Small Force Fields Applied by an Upper-Limb Exoskeleton on Inter-Joint Coordination |
|
Dubois, Océane | Sorbonne University |
Roby-Brami, Agnès | Université Pierre Et Marie Curie, Paris 6 |
Parry, Ross | Université Paris Nanterre |
Jarrassé, Nathanael | Sorbonne Université, ISIR UMR 7222 CNRS |
Keywords: Prosthetics and Exoskeletons, Physical Human-Robot Interaction, Physically Assistive Devices
Abstract: Exoskeleton technologies have numerous potential applications, ranging from improving human motor skills to aiding individuals in their daily activities. While exoskeletons are increasingly viewed, for example, as promising tools in industrial ergonomics, the effect of using them on human motor control, particularly on inter-joint coordination, remains relatively uncharted. This paper investigates the effects of generic low-amplitude force fields applied by an exoskeleton on motor strategies in asymptomatic users. The force fields mimic common perturbations encountered in exoskeletons, such as residual friction, over/under-tuned assistance, or structural elasticity. Fifty-five participants performed reaching tasks while connected to an arm exoskeleton, experiencing one of five tested force fields. Their movements before and after exposure to the exoskeleton force field were compared. The study focuses both on spatial and temporal changes in coordination using specific metrics. The results reveal that even brief exposure to a low-amplitude force field, or to uncompensated residual friction and dynamic forces, applied at the joint level can alter the inter-joint coordination, while task performance remains unaffected. The tested force fields induced varying degrees of changes in joint contributions and synchronization. This study highlights the importance of monitoring coordination changes to fully understand the impact of exoskeletons on human motor control and thus enable safe and widespread adoption of those devices.
|
|
10:30-12:00, Paper TuAT14-AX.8 | Add to My Program |
Real-Time Dexterous Prosthesis Hand Control by Decoding Neural Information Based on EMG Decomposition |
|
Ying, Zhenzhi | The University of Tokyo |
Zhang, Xianyu | The University of Tokyo |
Li, Shihao | The University of Tokyo |
Nakashima, Koki | The University of Tokyo |
Shu, Liming | Dalian University of Technology |
Sugita, Naohiko | The University of Tokyo |
Keywords: Prosthetics and Exoskeletons, Physically Assistive Devices, Neurorobotics
Abstract: The vague interpretation of myoelectric signals at the residual limb end still makes restoring dexterous hand function in amputees impossible. Understanding motor control between human motion intention and synaptic inputs to motor neurons also remains a significant challenge. Neural decoding of surface EMG signals remains difficult, which limits the application of robotic hands in real life. Herein, we propose and substantiate a human-machine interface for motor control that introduces the neural information of motor neurons in conjunction with the combination mechanism of muscle contraction. The interface first introduces a new concept of motor unit (MU) spike trains, which combines decoupling of the electrical activations on motor neuron axons with extraction of motion patterns from the discharge timings of the motor neuron pools. We realized a real-time implementation of the EMG decomposition algorithm on our prosthetic hand control system. The control scheme provides accurate classification of intuitive hand motions, enabling the amputee to perform versatile finger movements of the prosthetic hand. The concept of motor neuron discharge timings was evaluated through experiments on one amputee participant and six able-bodied participants. The results show that the neuroprosthetic hand control scheme based on MU spike trains is capable of generating accurate and intuitive hand movements for amputees in a physical environment.
|
|
10:30-12:00, Paper TuAT14-AX.9 | Add to My Program |
A Novel Back-Support Exoskeleton with a Differential Series Elastic Actuator for Lifting Assistance |
|
Ding, Shuo | National University of Singapore |
Anaya Reyes, Francisco | National University of Singapore |
Bhattacharya, Shounak | National University of Singapore |
Narayan, Ashwin | National University of Singapore |
Han, Shuaishuai | Nanjing University of Science and Technology |
Ofori, Seyram | National University of Singapore |
Yu, Haoyong | National University of Singapore |
Keywords: Prosthetics and Exoskeletons, Product Design, Development and Prototyping, Physical Human-Robot Interaction, Force Control
Abstract: Compared to conventional back-support exoskeletons (BSEs) with two motors, BSEs driven by a single motor have the advantage of light weight. However, current single-motor BSEs have difficulty accommodating asynchronous hip movements, achieving precise force control and efficient force transmission, and granting autonomy to users when walking. In this paper, we propose a novel BSE with a differential series elastic actuator (D-SEA) for lifting assistance. The unique differential working principle can accommodate the angular difference between the hip joints and provide the same assistive torque at both hip joints. The D-SEA achieves precise force control with a custom controller based on accurate spring deflection feedback, and drives the hip joints via an efficient cable-roller mechanism. Taking advantage of the active backdrivability of the D-SEA, we propose an intelligent assistive strategy that automatically provides adequate support for lifting tasks and grants autonomy to users during walking. In the experiments, the BSE reduced the activation level of the back muscles by up to 40% during lifting, without increasing muscle activation during walking.
|
|
TuAT15-AX Oral Session, AX-203 |
Add to My Program |
Multi-Modal Perception for HRI I |
|
|
Chair: Paolillo, Antonio | IDSIA USI-SUPSI |
Co-Chair: Ye, Qi | Zhejiang University |
|
10:30-12:00, Paper TuAT15-AX.1 | Add to My Program |
The Un-Kidnappable Robot: Acoustic Localization of Sneaking People |
|
Yang, Mengyu | Georgia Institute of Technology |
Grady, Patrick | Georgia Institute of Technology |
Brahmbhatt, Samarth Manoj | Intel Corporation |
Vasudevan, Arun Balajee | Carnegie Mellon University |
Kemp, Charles C. | Hello Robot Inc |
Hays, James | Georgia Institute of Technology, Argo AI |
Keywords: Human Detection and Tracking, Robot Audition
Abstract: How easy is it to sneak up on a robot? We examine whether we can detect where people are using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict whether there is a moving person nearby and then their location. We implement our method on a robot in real time, demonstrating the ability for robots to navigate populated indoor spaces in a passive manner. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
|
|
10:30-12:00, Paper TuAT15-AX.2 | Add to My Program |
Predicting the Intention to Interact with a Service Robot: The Role of Gaze Cues |
|
Arreghini, Simone | IDSIA USI-SUPSI |
Abbate, Gabriele | Istituto Dalle Molle Di Studi sull'Intelligenza Artificiale (IDS |
Giusti, Alessandro | IDSIA USI-SUPSI |
Paolillo, Antonio | IDSIA USI-SUPSI |
Keywords: Multi-Modal Perception for HRI, Social HRI
Abstract: For a service robot, it is crucial to perceive as early as possible that an approaching person intends to interact: in this case, it can proactively enact friendly behaviors that lead to an improved user experience. We solve this perception task with a sequence-to-sequence classifier of a potential user intention to interact, which can be trained in a self-supervised way. Our main contribution is a study of the benefit of features representing the person’s gaze in this context. Extensive experiments on a novel dataset show that the inclusion of gaze cues significantly improves the classifier performance (AUROC increases from 84.5 % to 91.2 %); the distance at which an accurate classification can be achieved improves from 2.4 m to 3.2 m. We also quantify the system’s ability to adapt to new environments without external supervision. Qualitative experiments show practical applications with a waiter robot.
|
|
10:30-12:00, Paper TuAT15-AX.3 | Add to My Program |
Non-Verbal Cues on Robot-Group Persuasion |
|
Esperança Gonçalves, Alexandra | Instituto Superior Técnico, University of Lisbon |
Moreno, Plinio | IST-ID |
Forlizzi, Jodi | Carnegie Mellon University |
Garcia Marques, Leonel | Faculty of Psychology, University of Lisbon |
Bernardino, Alexandre | IST - Técnico Lisboa |
Keywords: Gesture, Posture and Facial Expressions, Human-Robot Collaboration, Human-Robot Teaming
Abstract: When integrating robots into human daily life, persuasive power can be essential. However, group dynamics can complicate persuasion. This study focuses on how non-verbal cues, specifically gaze and hand gestures, affect the persuasiveness of a social robot. We designed a protocol to include non-verbal cues in the social robot Vizzy (head and eye gaze, hand gestures) and tested them in a series of experiments using the paradigm of the "Desert Survival Challenge". The goal of the robot is to persuade the participants to change their answers while avoiding negative feelings. It is hypothesized that the non-verbal cues will help avoid psychological reactance without diminishing compliance with the verbal requests issued by the robot. This phenomenon has been verified before for single-person persuasion, but it had yet to be tested on groups. Thus, the goal of this project is to verify the effect of non-verbal cues in group persuasion by a robot and compare it to single-person persuasion. The results showed that the robot's gestures increased group compliance and its gaze behaviour decreased psychological reactance.
|
|
10:30-12:00, Paper TuAT15-AX.4 | Add to My Program |
Naturalistic Robot-To-Human Bimanual Handover in Complex Environments through Multi-Sensor Fusion (I) |
|
Ovur, Salih Ertug | Imperial College London |
Demiris, Yiannis | Imperial College London |
Keywords: Multi-Modal Perception for HRI, Human-Robot Collaboration, Human-Centered Robotics
Abstract: Robot-human object handover has been extensively studied in recent years for a wide range of applications. However, it is still far from being as natural as human-human handovers, largely due to robots' limited sensing capabilities. Previous approaches in the literature typically simplify the handover scenario, including one or more of (a) conducting handovers at fixed locations, (b) not adapting to human preferences, or (c) only considering single-arm handovers of small objects due to the sensor occlusions caused by large objects. To advance the state of the art toward human-human levels of handover fluency, this paper investigates a bimanual handover scenario in a naturalistic, complex setup. Specifically, we target robot-to-human box transfer while the human partner is on a ladder, and ensure that the object is adaptively delivered based on human preferences. To address the occlusion problem that arises in a complex environment, we develop an onboard multi-sensor perception system for the bimanual robot, introduce a measurement confidence estimation technique, and propose an occlusion-resilient multi-sensor fusion technique by positioning visual perception sensors at distinct locations on the robot with different fields of view. The multi-sensor fusion approach achieved a handover success rate above 86.7% across all experiments by combining the strengths of both fields of view for human pose tracking under partial occlusion, without sacrificing handover duration.
|
|
10:30-12:00, Paper TuAT15-AX.5 | Add to My Program |
Language and Sketching: An LLM-Driven Interactive Multimodal Multitask Robot Navigation Framework |
|
Zu, Weiqin | ShanghaiTech University |
Song, Wenbin | ShanghaiTech University |
Chen, Ruiqing | ShanghaiTech University |
Guo, Ze | Harbin Institute of Technology |
Sun, Fanglei | ShangTech University |
Tian, Zheng | ShanghaiTech University |
Pan, Wei | The University of Manchester |
Wang, Jun | University College London |
Keywords: Multi-Modal Perception for HRI, Human-Centered Robotics, Autonomous Agents
Abstract: Socially-aware navigation systems have evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and human-guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), communicating commands to robots demands intricate mathematical formulations, and the transition between tasks lacks the intuitive control and user-centric interactivity one would desire. In this work, we propose an LLM-driven interactive multimodal multitask robot navigation framework, termed LIM2N, to address this new challenge in the navigation field. We achieve this by first introducing a multimodal interaction framework in which language and hand-drawn inputs can serve as navigation constraints and control objectives. Next, a reinforcement learning agent is built to handle multiple tasks with the received information. Crucially, LIM2N enables smooth cooperation among the reasoning over multimodal input, multitask planning, and the adaptation and processing of the intelligent sensing modules in this complicated system. Extensive experiments conducted in both simulation and the real world demonstrate that LIM2N offers a superior understanding of user needs alongside an enhanced interactive experience.
|
|
10:30-12:00, Paper TuAT15-AX.6 | Add to My Program |
Dual-Modal Tactile E-Skin: Enabling Bidirectional Human-Robot Interaction Via Integrated Tactile Perception and Feedback |
|
Mu, Shilong | Tsinghua University |
Zhao, Runze | Tsinghua University |
ZenanLin, Zenan | Tsinghua University |
Huang, Yan | Wuhan University |
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Li, Chenchang | Tsinghua University |
Zhang, Xiao-Ping | Ryerson University |
Ding, Wenbo | Tsinghua University |
Keywords: Multi-Modal Perception for HRI, Haptics and Haptic Interfaces, Touch in HRI
Abstract: To foster an immersive and natural human-robot interaction (HRI), the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced HRI. The dual-modal tactile e-skin offers multi-functional tactile sensing and programmable haptic feedback, underpinned by a layered structure comprised of flexible magnetic films, soft silicone elastomer, a Hall sensor and actuator array, and a microcontroller unit. The e-skin captures the magnetic field changes caused by subtle deformations through Hall sensors, employing deep learning for accurate tactile perception. Simultaneously, the actuator array generates mechanical vibrations to facilitate haptic feedback, delivering diverse mechanical stimuli. Notably, the dual-modal e-skin is capable of transmitting tactile information bidirectionally, enabling object recognition and fine-weighing operations. This bidirectional tactile interaction framework will enhance the immersion and efficiency of interactions between humans and robots.
|
|
10:30-12:00, Paper TuAT15-AX.7 | Add to My Program |
Close-Range Human Following Control on a Cane-Type Robot with Multi-Camera Fusion |
|
Liu, Haowen | Southern University of Science and Technology |
Wu, Fengxian | Tongji University |
Zhong, Bin | Southern University of Science and Technology |
Zhao, Yijun | Southern University of Science and Technology |
Zhang, Jiatong | Harbin Institute of Technology |
Niu, Wenxin | Tongji University |
Zhang, Mingming | Southern University of Science and Technology |
Keywords: Human Detection and Tracking, Sensor-based Control, Rehabilitation Robotics
Abstract: Cane-type robots have been utilized to assist the mobility-impaired population. The essential technique for cane-type robots is to follow the user's ambulation at a close range. This study developed a new cane-type wheeled robot and proposed a novel human-following control framework with multi-camera fusion. The proposed human-following control adopts a cascade control strategy consisting of two parts: 1) a human-following position controller that locates a user by detecting his/her legs' positions via multi-camera fusion and 2) a cane-robot velocity controller to steer the cane robot to the target position. The proposed strategy's effectiveness has been validated in outdoor experiments with six healthy subjects. The experimental scenarios included different terrains (i.e., straight, turning, and inclined paths), road conditions (i.e., flat and rough roads), and walking speeds. The obtained results showed that the average tracking errors in the X and Y directions were less than 4.1 cm and 4.4 cm, respectively, and the error in angle was less than 12.9° across all scenarios. Moreover, the cane robot can effectively adapt to a wide range of individual gait patterns and achieve stable human following at daily walking speeds (0.74 m/s - 1.47 m/s).
|
|
10:30-12:00, Paper TuAT15-AX.8 | Add to My Program |
CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction |
|
Han, Guwen | Zhejiang University |
Ye, Qi | Zhejiang University |
Chen, Anjun | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Gesture, Posture and Facial Expressions, Visual Learning, Deep Learning for Visual Perception
Abstract: Interactive hand mesh reconstruction from single-view images poses a significant challenge due to the severe occlusion and depth ambiguity inherent in interactive hand gestures. Recent approaches that employ probabilistic models and token-pruning techniques have shown decent results in multi-view human body reconstruction. Nevertheless, these methods have not fully utilized multi-scale semantic information from multi-view images and are not applicable in scenarios involving severe occlusion during dual-hand interactions. Meanwhile, current single-view methods reconstruct the left and right hands independently, which fails to capture the interaction between them. To address these challenges, we propose CAMInterHand, a cooperative attention-based method for multi-view interactive hand pose and mesh reconstruction. Specifically, CAMInterHand extracts local pyramid features and global vertex features from multi-scale feature maps of multi-view images, enabling the exploration of rich local semantic information and facilitating effective feature alignment. Furthermore, CAMInterHand employs a cooperative attention fusion module to fuse all features from multi-view images, enhancing interactions among vertices of the two hands within global and local contexts. We conduct extensive experiments on the large-scale multi-view dataset InterHand2.6M, and CAMInterHand achieves a substantial performance improvement over existing methods for multi-view and single-view interactive hand reconstruction.
|
|
10:30-12:00, Paper TuAT15-AX.9 | Add to My Program |
Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion |
|
Hyder, Syed Waleed | Retrocausal |
Usama, Muhammad | Retrocausal |
Zafar, Anas | Retrocausal, Inc |
Naufil, Muhammad | Retrocausal, Inc |
Javed Fateh, Fawad | Retrocausal |
Konin, Andrey | Retrocausal Inc |
Zia, M. Zeeshan | Retrocausal |
Tran, Quoc-Huy | Retrocausal, Inc |
Keywords: Multi-Modal Perception for HRI, Gesture, Posture and Facial Expressions, Deep Learning for Visual Perception
Abstract: This paper presents a 2D skeleton-based action segmentation method with applications in fine-grained human activity recognition. In contrast with state-of-the-art methods, which directly take sequences of 3D skeleton coordinates as inputs and apply Graph Convolutional Networks (GCNs) for spatiotemporal feature learning, our main idea is to use sequences of 2D skeleton heatmaps as inputs and employ Temporal Convolutional Networks (TCNs) to extract spatiotemporal features. Despite lacking 3D information, our approach yields comparable or superior performance and better robustness against missing keypoints than previous methods on action segmentation datasets. Moreover, we further improve performance by using both 2D skeleton heatmaps and RGB videos as inputs. To the best of our knowledge, this is the first work to utilize 2D skeleton heatmap inputs and the first work to explore 2D skeleton+RGB fusion for action segmentation.
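The 2D skeleton heatmap representation mentioned in the abstract can be illustrated with a generic sketch (this is the common Gaussian-heatmap encoding of keypoints, not the paper's implementation; the function name and parameters are assumptions):

```python
import numpy as np

def keypoint_heatmaps(keypoints, height, width, sigma=2.0):
    # One channel per keypoint: a Gaussian bump centered at (x, y),
    # peaking at 1.0 -- a common way to encode a 2D skeleton as an
    # image stack that convolutional networks can consume.
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.empty((len(keypoints), height, width))
    for k, (x, y) in enumerate(keypoints):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# A single keypoint at pixel (x=10, y=5) rendered into a 16x32 heatmap.
hm = keypoint_heatmaps([(10, 5)], height=16, width=32)
```

Per-frame stacks of such channels, concatenated over time, form the kind of input a TCN can process directly.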
|
|
TuAT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing I |
|
|
Chair: Pearson, Martin | Bristol Robotics Laboratory |
Co-Chair: Tumerdem, Ugur | Marmara University |
|
10:30-12:00, Paper TuAT16-AX.1 | Add to My Program |
ViTacTip: Design and Verification of a Novel Biomimetic Physical Vision-Tactile Fusion Sensor |
|
Fan, Wen | University of Bristol |
Li, Haoran | University of Bristol |
Si, Weiyong | University of Essex |
Luo, Shan | King's College London |
Lepora, Nathan | University of Bristol |
Zhang, Dandan | Imperial College London |
Keywords: Force and Tactile Sensing
Abstract: Tactile sensing is important for robotics since it provides physical contact information during manipulation. To capture multimodal contact information within a compact framework, we designed a novel sensor called ViTacTip, which seamlessly integrates both tactile and visual perception capabilities into a single, integrated sensor unit. ViTacTip features a transparent skin to capture fine features of objects during contact, a capability we refer to as the see-through-skin mechanism. Meanwhile, the biomimetic tips embedded in ViTacTip can amplify touch motions during tactile perception. For comparative analysis, we also fabricated a ViTac sensor devoid of biomimetic tips, as well as a TacTip sensor with opaque skin. Furthermore, we developed a Generative Adversarial Network (GAN)-based approach for modality switching between different perception modes, effectively alternating the emphasis between vision and tactile perception. We conducted a performance evaluation of the proposed sensor across three distinct tasks: i) grating identification, ii) pose regression, and iii) contact localization and force estimation. In the grating identification task, ViTacTip demonstrated an accuracy of 99.72%, surpassing TacTip, which achieved 94.60%. It also exhibited superior performance in both pose and force estimation tasks, with minimum errors of 0.08 mm and 0.03 N, respectively, in contrast to ViTac's 0.12 mm and 0.15 N. These results indicate that ViTacTip outperforms single-modality sensors.
|
|
10:30-12:00, Paper TuAT16-AX.2 | Add to My Program |
A Large-Area Tactile Sensor for Distributed Force Sensing Using Highly Sensitive Piezoresistive Sponge |
|
Zheng, Wendong | Tsinghua University |
Liu, Kun | Institute of Semiconductors, Chinese Academy of Sciences |
Guo, Di | Beijing University of Posts and Telecommunications |
Yang, Wuqiang | The University of Manchester |
Zhu, Jun | Nanjing University of Information Science and Technology |
Liu, Huaping | Tsinghua University |
Keywords: Force and Tactile Sensing
Abstract: Tactile sensing plays a critical role in enabling robots to interact safely with target objects in dynamic and unstructured environments. While various tactile sensors based on different sensing principles or different sensitive materials have been proposed, the development of flexible large-area tactile sensors for robots is still challenging. In this paper, a novel highly sensitive piezoresistive sponge based on multi-walled carbon nanotubes (MWCNTs) and polyurethane (PU) sponge is fabricated for pressure sensing. The sensing behavior of the piezoresistive sponge was experimentally evaluated, showing high sensitivity and fast response. Based on the piezoresistive sponge, a flexible large-area tactile sensor is designed for distributed force detection with electrical resistance tomography technology. The sensing performance of the sensor is validated by touch location, sensitivity analysis, real-time touch discrimination, and touch modality recognition. The experimental results indicate that the sensor performs well in detecting the position and force of contact in a large area. The sensor's performance shows promise in embodied tactile sensing and human–robot interaction.
|
|
10:30-12:00, Paper TuAT16-AX.3 | Add to My Program |
A Neuromorphic System for the Real-Time Classification of Natural Textures |
|
Brayshaw, George | University of Bristol |
Ward-Cherrier, Benjamin | University of Bristol |
Pearson, Martin | Bristol Robotics Laboratory |
Keywords: Force and Tactile Sensing, Bioinspired Robot Learning, Neurorobotics
Abstract: Tactile exploration of surfaces is a key component of everyday life, allowing us to make complex inferences about our environments even when vision is occluded. The emergence of biomimetic neuromorphic hardware in recent years has furthered our ability to create biologically plausible sensing solutions. While these platforms continue to improve with regard to latency and power consumption, recent literature on tactile texture classification emphasizes accuracy at the expense of real-time processing. For these tactile sensing systems to find use outside of experimental laboratory environments, it is key to design systems capable of capturing and processing data in real time. In this paper, we present a system for the real-time classification of texture using a neuromorphic tactile sensor, a spiking neural network, and a novel decision-making algorithm. Our real-time system achieves classification accuracies of 94% on a dataset of 11 natural textile textures. Furthermore, our system is capable of identifying textures at human-level performance in as little as 84 ms. Additionally, benchmarking our system across CPU, GPU, and Loihi 2 hardware platforms resulted in a 96% reduction in power consumption on the neuromorphic platform. This system outperformed previous work by the authors and the state of the art, both in terms of accuracy and classification speed.
|
|
10:30-12:00, Paper TuAT16-AX.4 | Add to My Program |
An Electromagnetism-Inspired Method for Estimating In-Grasp Torque from Visuotactile Sensors |
|
Fuchioka, Yuni | University of British Columbia |
Hamaya, Masashi | OMRON SINIC X Corporation |
Keywords: Force and Tactile Sensing, Compliant Assembly
Abstract: Tactile sensing has become a popular sensing modality for robot manipulators due to its promise of providing robots with the ability to measure the rich contact information transmitted through the sense of touch. Among the diverse range of information accessible from tactile sensors, torques transmitted from the grasped object to the fingers through extrinsic environmental contact may be particularly important for tasks such as object insertion. However, tactile torque estimation has received relatively little attention compared to other sensing modalities, such as force, texture, or slip identification. In this work, we introduce the notion of the Tactile Dipole Moment, which we use to estimate tilt torques from gel-based visuotactile sensors. This method does not rely on deep learning or on sensor-specific mechanical or optical modeling; instead, it takes inspiration from electromagnetism to analyze the vector field produced from 2D marker displacements. Despite the simplicity of our technique, we demonstrate its ability to provide accurate torque readings over two different tactile sensors and three object geometries, and highlight its practicality for the task of USB stick insertion with a compliant robot arm. These results suggest that simple analytical calculations based on dipole moments can sufficiently extract physical quantities from visuotactile sensors.
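The vector-field idea behind such moment-based torque estimates can be sketched generically (this is an illustrative analogy, not the authors' method; the function name and the choice of the field centroid as the reference point are assumptions): summing 2D cross products of marker lever arms with their displacements yields a torque-like scalar that responds to the rotational component of the field and ignores uniform translation.

```python
import numpy as np

def marker_field_moment(positions, displacements):
    # Sum of 2D cross products r x d about the field centroid: a
    # torque-like scalar that grows with the rotational component of a
    # marker displacement field and vanishes for pure translation.
    positions = np.asarray(positions, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    r = positions - positions.mean(axis=0)  # lever arms about the centroid
    return float(np.sum(r[:, 0] * displacements[:, 1]
                        - r[:, 1] * displacements[:, 0]))

# Markers on a unit circle: a tangential (rotational) field vs. a uniform shift.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
rot = 0.1 * np.stack([-np.sin(theta), np.cos(theta)], axis=1)   # rotation-like
shift = np.tile([0.05, 0.0], (8, 1))                            # translation-like
```

With these inputs, the rotational field produces a nonzero moment while the uniform shift produces (numerically) zero, which is the separation a tilt-torque estimator needs.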
|
|
10:30-12:00, Paper TuAT16-AX.5 | Add to My Program |
Sim-To-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing |
|
Yang, Max | University of Bristol |
Lin, Yijiong | University of Bristol |
Church, Alex | Cambrian |
Lloyd, John | University of Bristol |
Zhang, Dandan | Imperial College London |
Barton, David A. W. | University of Bristol |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Dexterous Manipulation, Reinforcement Learning
Abstract: Object pushing presents a key non-prehensile manipulation problem that is illustrative of more complex robotic manipulation tasks. While deep reinforcement learning (RL) methods have demonstrated impressive learning capabilities using visual input, a lack of tactile sensing limits their capability for fine and reliable control during manipulation. Here we propose a deep RL approach to object pushing using tactile sensing without visual input, namely tactile pushing. We present a goal-conditioned formulation that allows both model-free and model-based RL to obtain accurate policies for pushing an object to a goal. To achieve real-world performance, we adopt a sim-to-real approach. Our results demonstrate that it is possible to train on a single object and a limited sample of goals to produce precise and reliable policies that can generalize to a variety of unseen objects and pushing scenarios without domain randomization. We experiment with the trained agents in harsh pushing conditions, and show that with significantly more training samples, a model-free policy can outperform a model-based planner, generating shorter and more reliable pushing trajectories despite large disturbances. The simplicity of our training environment and the effective real-world performance highlight the value of rich tactile information for fine manipulation.
|
|
10:30-12:00, Paper TuAT16-AX.6 | Add to My Program |
Learning Contact for Haptic Feedback: Switching X-Lateral Teleoperators |
|
Yilmaz, Nural | Marmara University |
Tumerdem, Ugur | Marmara University |
Keywords: Force and Tactile Sensing, Haptics and Haptic Interfaces, Machine Learning for Robot Control
Abstract: In this paper, we propose X-lateral teleoperation: a novel hybrid unilateral-bilateral teleoperation framework. Bilateral teleoperation enables kinesthetic coupling between the operator and the remote environment with haptic feedback. However, in free motion, unlike unilateral teleoperators, bilateral teleoperators reflect undesirable operational forces to the operator. The proposed X-lateral teleoperation framework benefits from a learning-based contact detection algorithm which triggers switching from unilateral teleoperation in free motion to bilateral teleoperation in contact. We also present a neural network based two-class classification technique to detect contacts even with environments not seen in training. In experiments with linear motors and Phantom Omni devices, using sensorless force estimation, we show that the proposed method can decrease operational forces significantly over transparency-optimized bilateral architectures.
|
|
10:30-12:00, Paper TuAT16-AX.7 | Add to My Program |
L3 F-TOUCH: A Wireless GelSight with Decoupled Tactile and Three-Axis Force Sensing |
|
Li, Wanlin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Wang, Meng | Beijing Institute for General Artificial Intelligence |
Li, Jiarui | Peking University |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Qian, Xinyuan | University of Science and Technology Beijing |
Althoefer, Kaspar | Queen Mary University of London |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Keywords: Force and Tactile Sensing, Mechanism Design
Abstract: GelSight sensors, which estimate contact geometry and force by reconstructing the deformation of their soft elastomer from images, yield poor force measurements when the elastomer deforms uniformly or reaches deformation saturation. Here we present the L3 F-TOUCH sensor, which considerably enhances the three-axis force sensing capability of typical GelSight sensors. Specifically, the L3 F-TOUCH sensor comprises: (i) an elastomer structure resembling the classic GelSight sensor design for fine-grained contact geometry sensing; and (ii) a mechanically simple suspension structure that enables three-dimensional elastic displacement of the elastomer structure upon contact. Such displacement is tracked by detecting the displacement of an ARTag and is transformed into three-axis contact force via calibration. We further revamp the sensor's optical system by fixing the ARTag on the base and reflecting it, through a mirror, to the same camera viewing the elastomer. As a result, the tactile and force sensing modes can operate independently, while the entire L3 F-TOUCH remains Light-weight and Low-cost and facilitates wireLess deployment. Evaluations and experimental results demonstrate that the proposed L3 F-TOUCH sensor overcomes GelSight's limitation in force sensing and is more practical than equipping commercial three-axis force sensors. Thus, the L3 F-TOUCH could further empower existing Vision-based Tactile Sensors (VBTSs) in replication and deployment.
|
|
10:30-12:00, Paper TuAT16-AX.8 | Add to My Program |
GelLink: A Compact Multi-Phalanx Finger with Vision-Based Tactile Sensing and Proprioception |
|
Ma, Yuxiang | Massachusetts Institute of Technology |
Zhao, Jialiang | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Mechanism Design, Grippers and Other End-Effectors
Abstract: Compared to fully-actuated robotic end-effectors, underactuated ones are generally more adaptive, robust, and cost-effective. However, state estimation for underactuated hands is usually more challenging. Vision-based tactile sensors, like GelSight, can mitigate this issue by providing high-resolution tactile sensing and accurate proprioceptive sensing. As such, we present GelLink, a compact, underactuated, linkage-driven robotic finger with low-cost, high-resolution vision-based tactile sensing and proprioceptive sensing capabilities. To reduce the amount of embedded hardware, i.e., the cameras and motors, we optimize the linkage transmission with a planar linkage mechanism simulator and develop a planar reflection simulator to simplify the tactile sensing hardware. As a result, GelLink requires only one motor to actuate the three phalanges and one camera to capture tactile signals along the entire finger. Overall, GelLink is a compact robotic finger that shows adaptability and robustness when performing grasping tasks. The integration of vision-based tactile sensors can significantly enhance the capabilities of underactuated fingers and potentially broaden their future usage.
|
|
10:30-12:00, Paper TuAT16-AX.9 | Add to My Program |
RainbowSight: A Family of Generalizable, Curved, Camera-Based Tactile Sensors for Shape Reconstruction |
|
Tippur, Megha | Massachusetts Institute of Technology |
Adelson, Edward | MIT |
Keywords: Force and Tactile Sensing, Mechanism Design, Soft Sensors and Actuators
Abstract: Camera-based tactile sensors can provide high resolution positional and local geometry information for robotic manipulation. Curved and rounded fingers are often advantageous, but it can be difficult to derive illumination systems that work well within curved geometries. To address this issue, we introduce RainbowSight, a family of curved, compact, camera-based tactile sensors which use addressable RGB LEDs illuminated in a novel rainbow spectrum pattern. In addition to being able to scale the illumination scheme to different sensor sizes and shapes to fit on a variety of end effector configurations, the sensors can be easily manufactured and require minimal optical tuning to obtain high resolution depth reconstructions of an object deforming the sensor’s soft elastomer surface. Additionally, we show the advantages of our new hardware design and improvements in calibration methods for accurate depth map generation when compared to alternative lighting methods commonly implemented in previous camera-based tactile sensors. With these advancements, we make the integration of tactile sensors more accessible to roboticists by allowing them the flexibility to easily customize, fabricate, and calibrate camera-based tactile sensors to best fit the needs of their robotic systems.
|
|
TuAT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots I |
|
|
Chair: Sugihara, Tomomichi | OMRON Corporation |
Co-Chair: Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
|
10:30-12:00, Paper TuAT17-AX.1 | Add to My Program |
Walking-By-Logic: Signal Temporal Logic-Guided Model Predictive Control for Bipedal Locomotion Resilient to External Perturbations |
|
Gu, Zhaoyuan | Georgia Institute of Technology |
Guo, Rongming | Georgia Institute of Technology |
Yates, William | Georgia Institute of Technology |
Chen, Yipu | Georgia Institute of Technology |
Zhao, Yuntian | Southern University of Science and Technology |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Formal Methods in Robotics and Automation, Collision Avoidance
Abstract: This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantee and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recovery strategies often struggle with complex task logic reasoning and locomotion robustness evaluation, making them susceptible to failures due to inappropriate recovery strategies or insufficient robustness. To address this issue, the STL-guided framework generates optimal and safe recovery trajectories that simultaneously satisfy the task specification and maximize the locomotion robustness. Our framework outperforms a state-of-the-art locomotion controller in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Furthermore, it demonstrates versatility in tasks such as locomotion on stepping stones, where the robot must select from a set of disjointed footholds to maneuver successfully.
|
|
10:30-12:00, Paper TuAT17-AX.2 | Add to My Program |
Seamless Reaction Strategy for Bipedal Locomotion Exploiting Real-Time Nonlinear Model Predictive Control |
|
Choe, JongHun | KAIST |
Kim, Joon-Ha | Korea Advanced Institute of Science and Technology(KAIST) |
Hong, Seungwoo | MIT (Massachusetts Institute of Technology) |
Lee, Jinoh | German Aerospace Center (DLR) |
Park, Hae-Won | Korea Advanced Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Humanoid Robot Systems
Abstract: This paper presents a reactive locomotion method for bipedal robots that enhances robustness and external disturbance rejection by seamlessly blending several walking strategies: ankle, hip, and footstep adjustment. A Nonlinear Model Predictive Control (NMPC) problem is formulated to take into account the nonlinear Divergent Component of Motion (DCM) error dynamics, which predict the future states of the robot in response to the walking strategies. This formulation enables the seamless application of these strategies, improving push disturbance rejection performance. The proposed controller is validated in simulation and through an experiment on a bipedal robot platform, Gazelle, which confirms its effectiveness in real time.
|
|
10:30-12:00, Paper TuAT17-AX.3 | Add to My Program |
Synthesizing Robust Walking Gaits Via Discrete-Time Barrier Functions with Application to Multi-Contact Exoskeleton Locomotion |
|
Tucker, Maegan | Georgia Institute of Technology |
Li, Kejun | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Hybrid Logical/Dynamical Planning and Verification
Abstract: Successfully achieving bipedal locomotion remains challenging due to real-world factors such as model uncertainty, random disturbances, and imperfect state estimation. In this work, we propose a novel metric for locomotive robustness -- the estimated size of the hybrid forward invariant set associated with the step-to-step dynamics. Here, the forward invariant set can be loosely interpreted as the region of attraction for the discrete-time dynamics. We illustrate the use of this metric towards synthesizing nominal walking gaits using a simulation-in-the-loop learning approach. Further, we leverage discrete-time barrier functions and a sampling-based approach to approximate sets that are maximally forward invariant. Lastly, we experimentally demonstrate that this approach results in successful locomotion for both flat-foot walking and multi-contact walking on the Atalante lower-body exoskeleton.
|
|
10:30-12:00, Paper TuAT17-AX.4 | Add to My Program |
Efficient, Dynamic Locomotion through Step Placement with Straight Legs and Rolling Contacts |
|
Fasano, Stefan | Florida Institute for Human & Machine Cognition |
Foster, James Paul | University of West Florida |
Bertrand, Sylvain | Institute for Human and Machine Cognition |
DeBuys, Christian | Texas A&M University |
Griffin, Robert J. | Institute for Human and Machine Cognition (IHMC) |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Whole-Body Motion Planning and Control
Abstract: For humans, fast, efficient walking over flat ground represents the vast majority of locomotion that an individual experiences on a daily basis, and for an effective, real-world humanoid robot the same will likely be the case. In this work, we propose a locomotion controller for efficient walking over near-flat ground using a relatively simple, model-based controller that utilizes a novel combination of several interesting design features including an ALIP-based step adjustment strategy, stance leg length control as an alternative to center of mass height control, and rolling contact for heel-to-toe motion of the stance foot. We then present the results of this controller on our robot Nadia, both in simulation and on hardware. These results include validation of this controller’s ability to perform fast, reliable forward walking at 0.75 m/s along with backwards walking, side-stepping, turning in place, and push recovery. We also present an efficiency comparison between the proposed control strategy and our baseline walking controller over three steady-state walking speeds. Lastly, we demonstrate some of the benefits of utilizing rolling contact in the stance foot, specifically the reduction of necessary positive and negative work throughout the stride.
|
|
10:30-12:00, Paper TuAT17-AX.5 | Add to My Program |
Unified Motion Planner for Walking, Running, and Jumping Using the Three-Dimensional Divergent Component of Motion |
|
Mesesan, George | German Aerospace Center (DLR) |
Schuller, Robert | German Aerospace Center (DLR) |
Englsberger, Johannes | DLR (German Aerospace Center) |
Ott, Christian | TU Wien |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Keywords: Humanoid and Bipedal Locomotion, Motion and Path Planning, Humanoid Robots, Legged Robots
Abstract: Running and jumping are locomotion modes that allow legged robots to rapidly traverse great distances and overcome difficult terrain. In this article, we show that the 3-D Divergent Component of Motion (3D-DCM) framework, which was successfully used for generating walking trajectories in previous works, retains its validity and coherence during flight phases, and, therefore, can be used for planning running and jumping motions. We propose a highly efficient motion planner that generates stable center-of-mass (CoM) trajectories for running and jumping with arbitrary contact sequences and time parametrizations. The proposed planner constructs the complete motion plan as a sequence of motion phases that can be of different types: stance, flight, transition phases, etc. We introduce a unified formulation of the CoM and DCM waypoints at the start and end of each motion phase, which makes the framework extensible and enables the efficient waypoint computation in matrix and algorithmic form. The feasibility of the generated reference trajectories is demonstrated by extensive whole-body simulations with the humanoid robot TORO.
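As background for the abstract above, the standard DCM relations from the locomotion literature (a hedged summary of well-known definitions, not equations taken from this paper): with CoM position $x$, natural frequency $\omega$, and virtual repellent point $r_{\mathrm{vrp}}$,

```latex
\xi = x + \frac{\dot{x}}{\omega}, \qquad \omega = \sqrt{g / \Delta z},
\qquad \dot{x} = \omega\,(\xi - x), \qquad \dot{\xi} = \omega\,(\xi - r_{\mathrm{vrp}}).
```

The CoM converges toward the DCM $\xi$ while the DCM itself is pushed away from $r_{\mathrm{vrp}}$, so planning bounded DCM waypoints, as the planner above does phase by phase, is sufficient to obtain a stable CoM trajectory.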
|
|
10:30-12:00, Paper TuAT17-AX.6 | Add to My Program |
Data-Driven Latent Space Representation for Robust Bipedal Locomotion Learning |
|
Castillo, Guillermo A. | The Ohio State University |
Weng, Bowen | Iowa State University |
Zhang, Wei | Southern University of Science and Technology |
Hereid, Ayonga | Ohio State University |
Keywords: Humanoid and Bipedal Locomotion, Representation Learning, Legged Robots
Abstract: This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is then used as states for training a robust RL-based gait policy, eliminating the need for heuristic state selections or the use of template models for gait planning. The results demonstrate that the learned latent variables are disentangled and directly correspond to different gaits or speeds, such as moving forward, backward, or walking in place. Compared to traditional template model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and demonstrates good generalization capabilities to unseen scenarios.
|
|
10:30-12:00, Paper TuAT17-AX.7 | Add to My Program |
LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots |
|
Zhao, Qingrui | Beijing Institute of Technology |
Li, Mingyuan | Beijing Institute of Technology |
Shi, Yongliang | Tsinghua University |
Chen, Xuechao | Beijing Insititute of Technology |
Yu, Zhangguo | Beijing Institute of Technology |
Han, Lianqiang | Beijing Institute of Technology |
Fu, Zhenyuan | School of Mechatronic Engineering, Beijing Institute of Technology |
Zhang, Jintao | Beijing Institute of Technology |
Li, Chao | Beijing Institute of Technology |
Zhang, YuanXi | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Keywords: Humanoid Robot Systems, Legged Robots, Sensor Fusion
Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated, allowing for both position and velocity updates from kinematic measurements. Additionally, the use of kinematic measurements increases the output state frequency to about 1 kHz. This ensures temporal continuity of the estimated state and makes it practical for control purposes of biped robots. We also release a biped robot dataset consisting of LiDAR, inertial measurement unit (IMU), joint encoder, force/torque (F/T) sensor, and motion capture ground-truth data to evaluate the proposed method. The dataset was collected during robot locomotion, and our approach achieved the best quantitative results among LIO-based methods and biped robot state estimation algorithms. The dataset and source code will be available at https://github.com/Mr-Zqr/LIKO.
|
|
10:30-12:00, Paper TuAT17-AX.8 | Add to My Program |
Barry: A High-Payload and Agile Quadruped Robot |
|
Valsecchi, Giorgio | Robotic System Lab, ETH |
Rudin, Nikita | ETH Zurich, NVIDIA |
Nachtigall, Lennart | ETH Zurich |
Mayer, Konrad | ETH Zurich |
Tischhauser, Fabian | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Actuation and Joint Mechanisms, Mechanism Design
Abstract: This paper introduces Barry, a dynamically balancing quadruped robot optimized for high payload capacity and efficiency. It presents a new high-torque, low-inertia leg design, which includes custom-built high-efficiency actuators and transparent, sensorless transmissions. The robot’s reinforcement-learning-based controller is trained to fully leverage the new hardware capabilities to balance and steer the robot. The newly developed controller manages the non-linearities introduced by the new leg design and handles unmodeled payloads of up to 90 kg while operating at high efficiency. The approach’s efficacy is demonstrated by a high payload-to-weight ratio verified in multiple tests, with a maximum ratio of 2 on flat terrain. Experiments also characterize Barry’s power consumption and cost of transport, the latter converging to a value of 0.7 at 1.4 m/s regardless of the payload mass.
|
|
TuAT18-AX Oral Session, AX-206 |
Add to My Program |
Motion Control I |
|
|
Chair: Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Co-Chair: Ryll, Markus | Technical University Munich |
|
10:30-12:00, Paper TuAT18-AX.1 | Add to My Program |
A Robotic Manipulator Using Dual-Motor Joints: Prototype Design and Anti-Backlash Control |
|
Xu, Jiqian | Northeastern University |
Wang, Huaizhen | Inspur Group |
Zhao, Qiankun | Northeastern University |
Gao, Yue | Northeastern University |
Wan, Yingcai | Northeastern University |
Fang, Lijin | Northeastern University |
Keywords: Motion Control, Actuation and Joint Mechanisms, Redundant Robots
Abstract: This letter focuses on the design and control of a novel seven-degree-of-freedom (7-DOF) robotic manipulator (D-Arm) that addresses backlash nonlinearity coupled with unknown disturbances through dual-motor anti-backlash control. Specifically, the first three axes of the D-Arm near the base are implemented as dual-motor joints (DMJs), while the remaining four axes are single-motor joints (SMJs), achieving a more balanced overall performance. For the DMJs, which are over-actuated systems, we first identify an internal disturbance phenomenon named servo-conflict and account for it in the controller design. To mitigate the adverse effects of backlash-coupled disturbance, an admittance-control-based position compensator is proposed. Then, after backlash elimination, a dual-motor linear active disturbance rejection controller is developed for the load-tracking task of the DMJ. In the presence of unknown backlash and disturbance, the proposed strategy improves both transient and steady-state position tracking and reduces energy consumption, without requiring any backlash model information. The effectiveness and simplicity of the developed control strategy are verified through comparative experiments on the D-Arm.
|
|
10:30-12:00, Paper TuAT18-AX.2 | Add to My Program |
CoNi-MPC: Cooperative Non-Inertial Frame Based Model Predictive Control |
|
Zhang, Baozhe | The Chinese University of Hong Kong, Shenzhen |
Chen, Xinwei | Zhejiang University |
Li, Zhehan | Zhejiang University |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: Motion Control, Aerial Systems: Applications
Abstract: This paper presents a novel solution for UAV control in cooperative multi-robot systems, which can be used in various scenarios such as leader-following, landing on a moving base, or specific relative motion with a target. Unlike classical methods that tackle UAV control in the world frame, we directly control the UAV in the target coordinate frame, without making motion assumptions about the target. In detail, we formulate a nonlinear model predictive controller for a UAV, referred to as the agent, within a non-inertial frame (i.e., the target frame). The system requires the relative states (pose and velocity) as well as the angular velocity and accelerations of the target, which can be obtained by relative localization methods and ubiquitous MEMS IMU sensors, respectively. This framework eliminates dependencies that are vital in classical solutions, such as accurate state estimation for both the agent and target, prior knowledge of the target motion model, and continuous trajectory re-planning for some complex tasks. We have performed extensive simulations to investigate the control performance under varying motion characteristics of the target. Furthermore, we conducted real robot experiments, employing either simulated relative pose estimation from motion capture systems indoors or our previous relative pose estimation devices outdoors, to validate the applicability and feasibility of the proposed approach.
|
|
10:30-12:00, Paper TuAT18-AX.3 | Add to My Program |
Uniform Passive Fault-Tolerant Control of a Quadcopter with One, Two, or Three Rotor Failure |
|
Ke, Chenxu | Beihang University |
Cai, Kai-Yuan | Beijing University of Aeronautics and Astronautics |
Quan, Quan | Beihang University |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Failure Detection and Recovery, Passive Fault-Tolerant Control
Abstract: This study proposes a uniform passive fault-tolerant control (FTC) method for a quadcopter subject to one, two adjacent, two opposite, or three rotor failures that does not rely on fault information. Uniform control means the passive FTC covers conditions ranging from a fault-free quadcopter to rotor failures without controller switching. To achieve passive FTC, the rotor faults are modeled as a lumped disturbance acting on the virtual control of the quadcopter system, and the disturbance estimate is used directly for the passive FTC under rotor failure. At the same time, a modified controller structure is designed to achieve passive FTC for two and three rotor failures. To avoid switching the control allocation from fault-free control to FTC, a dynamic control allocation is used. In addition, closed-loop stability is analyzed under up to three rotor failures. To validate the proposed uniform passive FTC method, outdoor experiments are performed for the first time, demonstrating that the hovering quadcopter is able to recover from one rotor failure with the proposed controller and continue to fly even with two adjacent rotors failed.
|
|
10:30-12:00, Paper TuAT18-AX.4 | Add to My Program |
Geometric Slosh-Free Tracking for Robotic Manipulators |
|
Arrizabalaga, Jon | Technical University of Munich (TUM) |
Pries, Lukas | Technical University of Munich (TUM) |
Laha, Riddhiman | Technical University of Munich |
Li, Runkang | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Ryll, Markus | Technical University Munich |
Keywords: Motion Control, Aerial Systems: Mechanics and Control, Industrial Robots
Abstract: This work focuses on the agile transportation of liquids with robotic manipulators. In contrast to existing methods that are computationally heavy, system- or container-specific, or dependent on a singularity-prone pendulum model, we present a real-time slosh-free tracking technique. This method requires only the reference trajectory and the robot's kinematic constraints to output kinematically feasible joint-space commands. The crucial element underlying this approach is mimicking the end-effector's motion with a virtual quadrotor, which is inherently slosh-free and differentially flat, allowing us to calculate a slosh-free reference orientation. Through a cascaded proportional-derivative (PD) controller, this slosh-free reference is transformed into task-space acceleration commands, which, after solving a Quadratic Program (QP) based on Resolved Acceleration Control (RAC), are translated into a feasible joint configuration. The validity of the proposed approach is demonstrated by simulated and real-world experiments on a 7-DoF Franka Emika Panda robot.
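The slosh-free orientation follows from the quadrotor analogy: the container's symmetry axis must align with the total specific force (commanded acceleration minus gravity), so the liquid surface sees no lateral force. A minimal sketch of that alignment rule, assuming a world-frame z-up convention; the function name and gravity constant are illustrative, not from the paper:

```python
import math

GRAVITY = (0.0, 0.0, -9.81)  # world-frame gravity, z-up (m/s^2)

def slosh_free_direction(accel):
    """Unit vector the container's symmetry axis should align with so the
    net specific force (accel - gravity) has no lateral component."""
    f = tuple(a - g for a, g in zip(accel, GRAVITY))  # thrust-like vector
    norm = math.sqrt(sum(c * c for c in f))
    return tuple(c / norm for c in f)

# At rest the slosh-free axis points straight up.
print(slosh_free_direction((0.0, 0.0, 0.0)))  # -> (0.0, 0.0, 1.0)
```

Under a purely lateral acceleration the axis tilts into the direction of motion, exactly as a quadrotor's thrust axis would.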
|
|
10:30-12:00, Paper TuAT18-AX.5 | Add to My Program |
Development of a Four-Wheel Steering Scale Vehicle for Research and Education on Autonomous Vehicle Motion Control |
|
Rother, Christopher | Oakland University |
Zhou, Zhaodong | Oakland University |
Chen, Jun | Oakland University |
Keywords: Motion Control, Autonomous Agents, Optimization and Optimal Control
Abstract: Autonomous vehicle motion control development requires testing and evaluation at all stages of the process. The development phase involving the instrumentation and operation of a full-size vehicle can be especially costly. Scale vehicles have been developed in the literature to serve as a cost-effective transition from testing in a simulation environment to a physical system. However, the existing scale vehicle platforms do not support four-wheel steering and cannot isolate the performance of motion control algorithms from other modules such as perception and path planning. This paper closes this gap by proposing a new scale vehicle platform, called JetRacer-4WS, based on the open-source JetRacer autonomous vehicle with additional modifications to support four-wheel steering, model predictive control-based path following, and high-precision ultrasonic-based real-time positioning. The proposed JetRacer-4WS can be used as a low-cost platform for both research and education on motion control, path following, and vehicle dynamics. We describe the design of JetRacer-4WS, and experimentally demonstrate JetRacer-4WS’ ability to perform controller auto-tuning and illustrate the advantage of four-wheel steering. We also show that JetRacer-4WS can be used as a validation platform for testing advanced control algorithms such as event-triggered model predictive control. The associated code is open-sourced and available at: https://github.com/jchenee2015/jetracer-4WS.
|
|
10:30-12:00, Paper TuAT18-AX.6 | Add to My Program |
Online-Learning-Based Distributionally Robust Motion Control with Collision Avoidance for Mobile Robots |
|
Wang, Han | Shanghai Jiao Tong University |
Ning, Chao | Shanghai Jiao Tong University |
Li, Longyan | Shanghai Jiao Tong University |
Zhang, Weidong | Shanghai JiaoTong University |
Keywords: Motion Control, Collision Avoidance, Planning under Uncertainty
Abstract: Collision-free navigation is a critical issue for robotic systems because the environment is often dynamic and uncertain. This paper investigates a data-stream-driven motion control problem for mobile robots to avoid randomly moving obstacles when the probability distribution of the obstacle’s movement is only partially observable through data and can even be time-varying. A data-stream-driven ambiguity set is first constructed from movement data by leveraging a Dirichlet process mixture model and is updated online using real-time data. We then propose an Online-Learning-based Distributionally Robust Nonlinear Model Predictive Control (OL-DR-NMPC) approach that limits the risk of collision by considering the worst-case distribution within the ambiguity set. To facilitate solving the OL-DR-NMPC problem, we reformulate it as a finite-dimensional nonlinear optimization problem. To cope with the bilinear matrix inequality constraints in the nonlinear problem, we develop a parabolic relaxation and a sequential algorithm, by which the problem is further transformed into polynomial-time-solvable surrogates. Simulations using a quadrotor model demonstrate the effectiveness and advantages of the proposed method.
|
|
10:30-12:00, Paper TuAT18-AX.7 | Add to My Program |
Prediction of Pose Errors Implied by External Forces Applied on Robots: Towards a Metric for the Control of Collaborative Robots |
|
Fortineau, Vincent | Inria, Talence, France |
Padois, Vincent | Inria Bordeaux |
Daney, David | Inria Centre at the University of Bordeaux, F-33405 Talence, France |
Keywords: Motion Control, Compliance and Impedance Control, Physical Human-Robot Interaction
Abstract: The presented work tackles the question of quantifying the pose deviations of robots subject to external disturbance forces. While this question may not be central for large robots that perfectly reject disturbances through high controller gains, it is an important factor in collaborative settings where smaller robots may be deviated from their task by unmodeled physical interactions. This is all the more true in human-robot collaboration, where human capacities may fluctuate over time and have to be compensated by a proper adaptation of the robot control. To move forward in this direction, this work first derives a deviation prediction methodology and exemplifies it using three widely employed control approaches. The proposed prediction method is then validated in simulated and real robot experiments, both in single- and multi-robot cases. The obtained results constitute a stepping stone towards a quantitative metric for robots adapting their behaviour to human motor fluctuations.
|
|
10:30-12:00, Paper TuAT18-AX.8 | Add to My Program |
Iterative Learning Control for Deformable Open-Frame Cable-Driven Parallel Robots |
|
Cheng, Wuichung | The Chinese University of Hong Kong |
Chan, Ngo Foon | The Chinese University of Hong Kong |
Lau, Darwin | The Chinese University of Hong Kong |
Keywords: Motion Control, Flexible Robotics, Parallel Robots
Abstract: This paper proposes an iterative learning control (ILC) scheme for deformable open-frame cable-driven parallel robots (D-CDPRs). In contrast to the straightforward inverse kinematics of rigid-frame cable-driven parallel robots (CDPRs), accurate modeling of the deformable frame poses challenges due to errors and uncertainties. To address these issues, we propose the use of ILC, a control strategy that modifies the control input over iterations based on previous results; ILC has been successfully applied to traditional cable robots, particularly for handling model uncertainty. The paper presents a novel ILC scheme specifically designed for D-CDPRs, with a focus on reducing tracking errors over repetitive operations. Additionally, hardware experiments are conducted to validate the effectiveness and reliability of the proposed ILC approach. The results demonstrate the efficacy of ILC in mitigating tracking errors, even in scenarios where the dynamic model of the D-CDPRs is unknown.
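The core ILC idea of refining the input over repeated trials can be illustrated with a generic P-type update, u_{k+1}(t) = u_k(t) + L·e_k(t). This is a toy sketch of the principle only, not the paper's D-CDPR scheme; the plant, gain, and function names are invented for the example:

```python
def ilc_update(u_prev, e_prev, gain=0.4):
    """P-type ILC: next-iteration input = previous input + gain * previous error."""
    return u_prev + gain * e_prev

# Toy plant y = 2*u, unknown to the controller; track y_ref = 1.0.
y_ref, u = 1.0, 0.0
for _ in range(20):
    y = 2.0 * u                      # run one trial
    u = ilc_update(u, y_ref - y)     # learn from the trial's error
print(round(2.0 * u, 3))  # -> 1.0
```

The tracking error contracts by a factor |1 - gain * plant_gain| = 0.2 each iteration, so the output converges to the reference without any explicit plant model, which is exactly what makes ILC attractive when the frame deformation is hard to model.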
|
|
10:30-12:00, Paper TuAT18-AX.9 | Add to My Program |
A Study of Force-Free Control Framework for Industrial Manipulator Tasks Based on High-Pass Filter |
|
He, Guanwei | Shenzhen Campus of Sun Yat-Sen University |
Feng, Guodong | School of Information Science and Technology, Sun Yat-Sen University |
Ding, Beichen | Sun Yat-Sen University |
Keywords: Motion Control, Industrial Robots, Human-Robot Collaboration
Abstract: Force-free control (FFC) allows flexible manipulator motion in response to external forces, making it a vital component of human-robot interaction (HRI). Manual intervention may apply uneven forces to the manipulator, or forces at frequencies close to the natural frequency, and mechanical resonance can occur due to the inertia of the manipulator and the adjustable equivalent stiffness of the controller. This paper proposes an FFC approach for industrial manipulators using a six-axis force/torque (F/T) sensor, implemented through a three-layer control architecture consisting of a motion control layer, an admittance control layer, and a force decoupling layer. To mitigate the effects of mechanical resonance, a high-pass filter (HPF) is integrated with the F/T sensor and its impact is investigated. Experimental validation is conducted using both a simulation model and an industrial manipulator. Test results indicate that the proposed FFC architecture enables the manipulator not only to interact smoothly with external forces, but also to distinguish load forces at different frequencies and potentially address the issue of mechanical resonance between the manipulator and the applied load forces.
|
|
TuAT19-NT Oral Session, NT-G301 |
Add to My Program |
Medical Robots I |
|
|
Chair: Palopoli, Luigi | University of Trento |
Co-Chair: Nasseri, M. Ali | Technische Universitaet Muenchen |
|
10:30-12:00, Paper TuAT19-NT.1 | Add to My Program |
Uncertainty-Aware Contextual Visualization for Human Supervision of OCT-Guided Autonomous Robotic Subretinal Injection |
|
Sommersperger, Michael | Technical University of Munich |
Dehghani, Shervin | TUM |
Matten, Philipp | Medical University of Vienna |
Roodaki, Hessam | Technische Universität München |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Acceptability and Trust, Safety in HRI
Abstract: The injection of therapeutic agents into the subretinal space might allow improved treatment of age-related macular degeneration. Various robotic systems have been developed to achieve the required precision, and in combination with intraoperative Optical Coherence Tomography (iOCT) imaging, methods for autonomous robotic guidance have been proposed. In such systems, the robot’s cognition is often governed by machine learning algorithms, such as convolutional neural networks (CNNs), which provide semantic scene information from iOCT images. Although the robot performs a surgical task autonomously, human supervision is critical to monitor the robot’s execution and, if necessary, stop the robot or take control to avoid trauma to the patient. In this paper, we propose a novel visualization concept for improved human supervision of autonomous robotic subretinal injection that integrates uncertainty information of the data provided to the robot. We design a focus and context visualization that renders an automatically identified instrument-aligned B-scan in the context of the 3D OCT volume. Our visualization is enriched by augmenting the uncertainty information on the instrument-aligned B-scan. To dynamically model task-specific uncertainty, we introduce a weighting scheme to assign an importance factor to each pair of classes, controlling the impact of their confusion on the overall uncertainty. We demonstrate our visualization concept on iOCT volumes acquired at different stages during subretinal injection on ex-vivo porcine eyes. We show that our processing pipeline achieves sufficient update rates for surgical display and discuss the impact of our visualization concept on the acceptance of robotic task autonomy for subretinal injection procedures.
|
|
10:30-12:00, Paper TuAT19-NT.2 | Add to My Program |
A Track-Based Colon Endoscopic Robot with Depth Perception Stereo Cameras for Haustral Fold Detection During Colonic Navigation |
|
He, Shujing | Southern University of Science and Technology |
Zhang, Yujie | Southern University of Science and Technology |
Huang, Baoyi | Southern University of Science and Technology |
Lin, Jie | Guangzhou University of Chinese Medicine |
Shi, Chaoyang | Tianjin University |
Hu, Chengzhi | Southern University of Science and Technology |
Keywords: Medical Robots and Systems, Automation at Micro-Nano Scales
Abstract: Colon endoscopic robots represent a promising screening modality for visualizing colon cancers with high sensitivity. However, current colonoscopy robots often have intricate and bulky mechanical structures, which pose practical challenges when moving through the complex and narrow environment of the colon. Moreover, these robots are typically equipped with a single camera, limiting their ability to accurately estimate the depth of haustral folds in the colon, which is of great importance for active colonic navigation. To address these challenges, we develop a track-based stereoscopic endoscopic robot (TSER) equipped with four tracks positioned at the corners of its body. This design maximizes the contact between the tracks and the colon wall, enhancing maneuverability. The tracks are constructed from de-molded polydimethylsiloxane (PDMS) and incorporate micro-patterns on their outer surfaces. We propose a straightforward strategy for detecting haustral folds using the TSER's stereo camera, which allows precise identification of their position and depth. The TSER achieves an average motion speed of 9.8 mm/s in a bellows tube containing silicone oil and 5.2 mm/s in an ex-vivo porcine intestinal segment. Impressively, the TSER achieves an 88.11% accuracy rate in haustral fold depth estimation, surpassing existing geometric shape fitting methods. These results demonstrate that the TSER holds great potential for effective and efficient movement and inspection within the colon, offering a promising solution for improved colon cancer screening.
|
|
10:30-12:00, Paper TuAT19-NT.3 | Add to My Program |
Caveats on the First-Generation Da Vinci Research Kit: Latent Technical Constraints and Essential Calibrations |
|
Cui, Zejian | Imperial College London |
Cartucho, João | Imperial College London |
Giannarou, Stamatia | Imperial College London |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Medical Robots and Systems, Calibration and Identification, Kinematics
Abstract: Telesurgical robotic systems provide a well established form of assistance in the operating theater, with evidence of growing uptake in recent years. Until now, the da Vinci surgical system has been the most widely adopted robot of this kind. To accelerate research on robotic-assisted surgery, the retired first-generation da Vinci robots have been redeployed for research use as "da Vinci Research Kits" (dVRKs), which have been distributed to research institutions around the world. In the past ten years, a great amount of research on the dVRK has been carried out across a vast range of research topics. During this extensive and distributed process, common technical issues have been identified that are buried deep within the dVRK research and development architecture. This paper gathers and analyzes the most significant of these, with a focus on the technical constraints of the first-generation dVRK, which both existing and prospective users should be aware of before embarking on dVRK-related research. The hope is that this review will aid users in identifying and addressing common limitations of the systems promptly, thus helping to accelerate progress in the field.
|
|
10:30-12:00, Paper TuAT19-NT.4 | Add to My Program |
A Passive Variable Impedance Control Strategy with Viscoelastic Parameters Estimation of Soft Tissues for Safe Ultrasonography |
|
Beber, Luca | University of Trento |
Lamon, Edoardo | University of Trento |
Nardi, Davide | University of Trento |
Fontanelli, Daniele | University of Trento |
Saveriano, Matteo | University of Trento |
Palopoli, Luigi | University of Trento |
Keywords: Medical Robots and Systems, Compliance and Impedance Control, Physical Human-Robot Interaction
Abstract: In the context of telehealth, robotic approaches have proven a valuable alternative to in-person visits in remote areas, with decreased costs for patients and lower infection risks. In particular, in ultrasonography, robots have the potential to reproduce the skills required to acquire high-quality images while reducing the sonographer's physical effort. In this paper, we address the control of the interaction of the probe with the patient's body, a critical aspect of ensuring safe and effective ultrasonography. We introduce a novel approach based on variable impedance control, allowing the real-time optimisation of compliant controller parameters during ultrasound procedures. This optimisation is formulated as a quadratic programming problem and incorporates physical constraints derived from viscoelastic parameter estimates. Safety and passivity constraints, including an energy tank, are also integrated to minimise potential risks during human-robot interaction. The proposed method's efficacy is demonstrated through experiments on a patient dummy torso, highlighting its potential for achieving safe behaviour and accurate force control during ultrasound procedures, even in cases of contact loss.
|
|
10:30-12:00, Paper TuAT19-NT.5 | Add to My Program |
A Robotic System for Transanal Endoscopic Microsurgery: Design, Dexterity Optimization and Prototyping |
|
Li, Jichen | Tianjin University |
Wang, Shuxin | Tianjin University |
Zhang, Zhiqiang | University of Leeds |
Shi, Chaoyang | Tianjin University |
Keywords: Medical Robots and Systems, Compliant Joints and Mechanisms, Surgical Robotics: Steerable Catheters/Needles
Abstract: The paper introduces a novel robotic system for transanal endoscopic microsurgery (TEM) with a master-slave operated configuration. This slave manipulator features a modular distal continuum section, comprising two 7-DoF surgical instruments and a 5-DoF endoscopic arm designed to enhance hand-eye coordination and instrument triangulation in narrow and shallow rectal spaces. Key innovations include the hybrid coaxial continuum unit (HCCU) for improved bending characteristics and structural stiffness, and a design optimization for dexterity under anatomical constraints. Experimental validations demonstrate the system's precision and capability in simulated surgical tasks, highlighting its potential for advanced TEM applications with improved operational dexterity and reduced view obstruction.
|
|
10:30-12:00, Paper TuAT19-NT.6 | Add to My Program |
Implicit Neural Representations for Breathing-Compensated Volume Reconstruction in Robotic Ultrasound |
|
Velikova, Yordanka | TU Munich |
Azampour, Mohammad Farid | Technical University of Munich |
Simson, Walter | Technical University Munich |
Esposito, Marco | ImFusion GmbH |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Deep Learning Methods
Abstract: Ultrasound (US) imaging is widely used in diagnosing and staging abdominal diseases due to its lack of ionizing radiation and widespread availability. However, significant inter-operator variability and inconsistent image acquisition hinder the adoption of broader screening programs. Robotic ultrasound systems have emerged as a promising solution, offering standardized acquisition protocols and the possibility of automated acquisition. Additionally, robotic ultrasound systems enable access to 3D data via robotic tracking and incoherent compounding of ultrasound frames, resulting in improved interpretation and disease diagnosis. However, the interpretability of 3D ultrasound reconstructions of abdominal images can be affected by the patient's breathing motion. This study introduces a method to compensate for breathing motion in 3D ultrasound compounding by leveraging implicit neural representations. Our approach employs a robotic ultrasound system for automated screenings. To demonstrate the method's effectiveness, we evaluate it on the diagnosis and monitoring of abdominal aortic aneurysms as a representative use case. Our experiments demonstrate that the proposed pipeline facilitates robust automated robotic acquisition, mitigates artifacts from breathing motion, and yields smoother 3D reconstructions for enhanced screening and medical diagnosis.
|
|
10:30-12:00, Paper TuAT19-NT.7 | Add to My Program |
Shadow-Based 3D Pose Estimation of Intraocular Instrument Using Only 2D Images |
|
Yang, Junjie | TUM |
Zhao, Zhihao | Technische Universität München |
Maier, Mathias | Klinikum Rechts Der Isar Der TU München |
Huang, Kai | Sun Yat-Sen University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Localization
Abstract: In ophthalmic surgeries, such as vitreoretinal operations, surgeons rely on imaging systems, primarily microscopes, for real-time instrument monitoring and motion planning. However, novice surgeons struggle to estimate instrument positions from 2D microscope frames and must build up extensive trial-and-error experience, especially since additional imaging modalities such as iOCT remain inaccessible in most operating rooms. Targeting intraocular assessment within the current surgical setup, this paper presents an image-based pose estimation method that obtains real-time instrument positions in a standard 12 mm-radius spherical eyeball model by linking floating instruments to on-retina objects based on the intraocular shadowing principle. We validate this pose estimation method in a Unity simulator and verify its depth estimation capability using a specially designed eyeball phantom. Both simulator and phantom experiments demonstrate an average needle-tip estimation error within [1.0, 2.0] mm using only 2D microscope frames.
|
|
10:30-12:00, Paper TuAT19-NT.8 | Add to My Program |
Skeleton Graph-Based Ultrasound-CT Non-Rigid Registration |
|
Jiang, Zhongliang | Technical University of Munich |
Li, Xuesong | Technical University of Munich |
Zhang, Chenyu | Technical University of Munich |
Bi, Yuan | TUM |
Stechele, Walter | Technical University of Munich |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics, Robotics and Automation in Life Sciences
Abstract: Autonomous ultrasound (US) scanning has attracted increasing attention and is seen as a potential solution to overcome the limitations of conventional US examinations, such as inter-operator variation. However, it remains challenging to autonomously and accurately transfer a scan trajectory planned on a generic atlas to the current setup for different patients, particularly for thorax applications with limited acoustic windows. To address this challenge, we propose a skeleton graph-based non-rigid registration that adapts to patient-specific properties using subcutaneous bone surface features rather than the skin surface. To this end, self-organizing mapping is applied twice in succession, first to unify the input point cloud and then to extract key points. Afterward, a minimum spanning tree is employed to generate a tree graph connecting all extracted key points. To appropriately characterize the rib cartilage outline and match the source and target point clouds, the path extracted from the tree graph is optimized by maximally maintaining continuity throughout each rib. To validate the proposed approach, we manually extract a US cartilage point cloud from one volunteer and seven CT cartilage point clouds from different patients. The results demonstrate that the proposed graph-based registration is more effective and robust in adapting to inter-patient variations than ICP (distance error mean±SD: 5.0±1.9 mm vs. 8.6±6.7 mm on seven CTs).
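The tree-graph step in the abstract above is a standard minimum spanning tree over pairwise key-point distances. A minimal sketch using Prim's algorithm; the point data and function name are invented for illustration and are not from the paper:

```python
import math

def keypoint_mst(points):
    """Prim's algorithm: grow a minimum spanning tree over the key points,
    returning its edges as (i, j) index pairs."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Cheapest edge from the current tree to any point not yet in it.
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        in_tree.add(j)
        edges.append((i, j))
    return edges

# Four key points; a spanning tree on n points always has n - 1 edges.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
print(len(keypoint_mst(pts)))  # -> 3
```

In the paper's pipeline the nodes would be the SOM-extracted key points of the bone surface, and the rib outlines are then read off as continuity-optimized paths through this tree.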
|
|
10:30-12:00, Paper TuAT19-NT.9 | Add to My Program |
Automated Image Acquisition of Parasternal Long-Axis View with Robotic Echocardiography |
|
Shida, Yuuki | Waseda University |
Kumagai, Souto | Waseda University |
Tsumura, Ryosuke | National Institute of Advanced Industrial Science and Technology |
Iwata, Hiroyasu | Waseda University |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: This study proposes a method for autonomously finding the parasternal long-axis view in echocardiography with a robotic ultrasound (US) system. In obtaining this view, it is necessary to avoid the ribs and lungs because they reduce the clarity of the US image. Meanwhile, the anatomical position and size of the heart, lungs, and ribs differ between individuals, which makes it difficult to find the optimal position of the US probe. Our proposed system comprises the following three processes. First, an exhaustive scan of the chest wall region is performed, and the probe position that allows the mitral valve to be centrally positioned is estimated from this scan. Second, the probe is rotated once in the yaw direction while fixed at that position, and the yaw angle at which the image plane is parallel to the left ventricular long axis is estimated from the acquired images. Finally, the pitch angle of the probe is estimated so that the probe avoids the connection between the mitral valve and the papillary muscle and chordae. To validate the proposed method, we performed human trials with five healthy subjects and measured the detection rate of the observation points used to evaluate the image quality of the parasternal long-axis view. The median detection rate of the observation points was 63.3 ± 5.3%, which implies that the proposed method is valid.
|
|
TuAT20-NT Oral Session, NT-G302 |
Add to My Program |
Mobile Manipulation |
|
|
Chair: Chalvatzaki, Georgia | Technische Universität Darmstadt |
Co-Chair: Zhou, Boyu | Sun Yat-Sen University |
|
10:30-12:00, Paper TuAT20-NT.1 | Add to My Program |
Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark |
|
Li, Juncheng | Purdue University |
Cappelleri, David | Purdue University |
Keywords: Mobile Manipulation, Grasping, Deep Learning in Robotics and Automation, Suction Cup Gripper
Abstract: This paper presents Sim-Suction, a robust suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments. We address the lack of large-scale, accurately annotated suction grasp datasets by proposing a benchmark synthetic dataset, Sim-Suction-Dataset. It comprises 500 cluttered environments with 3.2 million annotated suction grasp poses. The dataset generation process combines analytical models and dynamic physical simulations to create fast and accurate suction grasp pose annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset, leveraging the synergy of zero-shot text-to-segmentation. Real-world experiments on picking up all objects demonstrate that Sim-Suction-Pointnet achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1 objects, cluttered level 2 objects, and cluttered mixed objects, respectively. The Sim-Suction policies outperform the tested state-of-the-art benchmarks by approximately 21% in cluttered mixed scenes.
|
|
10:30-12:00, Paper TuAT20-NT.2 | Add to My Program |
Robot Task Planning under Local Observability |
|
Merlin, Max | Brown University |
Parr, Shane | University of Massachusetts Amherst |
Parikh, Neev | Brown University |
Orozco, Sergio | Brown University |
Gupta, Vedant | Brown University |
Rosen, Eric | Brown University |
Konidaris, George | Brown University |
Keywords: Mobile Manipulation, Planning under Uncertainty, Task Planning
Abstract: Real-world robot task planning is intractable in part due to partial observability. A common approach to reducing complexity is introducing additional structure into the decision process, such as mixed-observability, factored states, or temporally-extended actions. We propose the locally observable Markov decision process (LOMDP), a novel formulation that models task-level planning where uncertainty pertains to object-level attributes and where a robot has subroutines for seeking and accurately observing objects. This models sensors that are range-limited and line-of-sight: objects occluded or outside sensor range are unobserved, but the attributes of objects that fall within sensor view can be resolved via repeated observation. Our model results in a three-stage planning process: first, the robot plans using only observed objects; if that fails, it generates a target object that, if observed, could result in a feasible plan; finally, it attempts to locate and observe the target, replanning after each newly observed object. By combining LOMDPs with off-the-shelf Markov planners, we outperform state-of-the-art solvers for both object-oriented POMDP and MDP analogues with the same task specification. We then apply the formulation to successfully solve a task on a mobile robot.
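The three-stage planning loop summarized in the abstract can be sketched as follows; all function names (`plan_with`, `propose_target`, `locate`) are hypothetical stand-ins for the paper's planner and seek-and-observe subroutines:

```python
def lomdp_plan(observed, plan_with, propose_target, locate):
    """Sketch of the three-stage process:
    1) plan with observed objects only;
    2) if that fails, propose a target object that could make a plan feasible;
    3) seek and observe the target, replanning after each new observation.
    Termination is assumed when no further target can be proposed."""
    plan = plan_with(observed)
    if plan is not None:
        return plan
    target = propose_target(observed)
    while target is not None:
        newly_seen = locate(target)        # seek-and-observe subroutine
        observed = observed | newly_seen
        plan = plan_with(observed)
        if plan is not None:
            return plan
        target = propose_target(observed)
    return None

# Toy illustration: the plan needs a cup that starts unobserved
plan_with = lambda obs: ["pick cup"] if "cup" in obs else None
propose_target = lambda obs: "cup" if "cup" not in obs else None
locate = lambda t: {t}
plan = lomdp_plan(set(), plan_with, propose_target, locate)
```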
|
|
10:30-12:00, Paper TuAT20-NT.3 | Add to My Program |
Real-Time Whole-Body Motion Planning for Mobile Manipulators Using Environment-Adaptive Search and Spatial-Temporal Optimization |
|
Wu, Chengkai | Harbin Institute of Technology, Shenzhen |
Wang, Ruilin | Sun Yat-Sen University |
Song, Mianzhi | Sun Yat-Sen University |
Gao, Fei | Zhejiang University |
Mei, Jie | City University of Hong Kong |
Zhou, Boyu | Sun Yat-Sen University |
Keywords: Mobile Manipulation, Manipulation Planning, Motion and Path Planning
Abstract: Mobile manipulators have recently gained significant attention in the robotics community due to their superior potential in industrial and service applications. However, the high degree of freedom associated with mobile manipulators poses challenges in achieving real-time whole-body motion planning. To bridge the gap, this paper presents a motion planning method capable of generating high-quality, safe, agile, and feasible trajectories for mobile manipulators in real time. First, we present a novel environment-adaptive path searching method, which can generate paths in real time in various environments by adaptively adjusting the search dimension based on environment complexity. Additionally, we propose a real-time spatial-temporal trajectory optimization method that takes into account the whole-body safety, agility, and dynamic feasibility of mobile manipulators. Moreover, task constraints are applied to ensure that the trajectory can fulfill specific task requirements. Simulation and real-world experiments demonstrate that our method is capable of generating whole-body trajectories in real time in challenging environments. We will release our code to benefit the community.
|
|
10:30-12:00, Paper TuAT20-NT.4 | Add to My Program |
Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation |
|
Schmalstieg, Fabian | University of Freiburg |
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Mobile Manipulation, AI-Enabled Robotics, Service Robotics
Abstract: Existing object-search approaches enable robots to search through free pathways; however, robots operating in unstructured, human-centered environments frequently also have to manipulate the environment to suit their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that, with accurate perception, the decision making of HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.
|
|
10:30-12:00, Paper TuAT20-NT.5 | Add to My Program |
Keep It Upright: Model Predictive Control for Nonprehensile Object Transportation with Obstacle Avoidance on a Mobile Manipulator |
|
Heins, Adam | University of Toronto |
Schoellig, Angela P. | TU Munich |
Keywords: Mobile Manipulation, Whole-Body Motion Planning and Control
Abstract: We consider a nonprehensile manipulation task in which a mobile manipulator must balance objects on its end effector without grasping them (known as the waiter's problem) and move to a desired location while avoiding static and dynamic obstacles. In contrast to existing approaches, our focus is on fast online planning in response to new and changing environments. Our main contribution is a whole-body constrained model predictive controller (MPC) for a mobile manipulator that balances objects and avoids collisions. Furthermore, we propose planning using the minimum statically-feasible friction coefficients, which provides robustness to frictional uncertainty and other force disturbances while also substantially reducing the compute time required to update the MPC policy. Simulations and hardware experiments on a velocity-controlled mobile manipulator with up to seven balanced objects, stacked objects, and various obstacles show that our approach can handle a variety of conditions that have not been previously demonstrated, with end effector speeds and accelerations up to 2.0 m/s and 7.9 m/s^2, respectively. Notably, we demonstrate a projectile avoidance task in which the robot avoids a thrown ball while balancing a tall bottle.
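The idea of a minimum statically-feasible friction coefficient can be illustrated with a simplified point-contact Coulomb model; the contact representation below is an assumption for illustration, not the paper's formulation:

```python
def min_static_friction(contacts):
    """Smallest Coulomb friction coefficient mu such that every contact
    satisfies |f_t| <= mu * f_n. Each contact is given as a
    (tangential force magnitude, normal force) pair; this simplified
    format is assumed here for illustration."""
    mu = 0.0
    for f_t, f_n in contacts:
        if f_n <= 0:
            raise ValueError("contact must press, not pull")
        mu = max(mu, abs(f_t) / f_n)   # binding constraint over all contacts
    return mu

# e.g., an object on a tray whose tangential load peaks at 20% of normal
mu_min = min_static_friction([(0.2, 1.0), (0.1, 1.0)])  # → 0.2
```

Planning against this worst-case minimal coefficient, rather than the true (uncertain) one, is what gives the robustness to frictional disturbances described above.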
|
|
10:30-12:00, Paper TuAT20-NT.6 | Add to My Program |
Gaussian Mixture Likelihood-Based Adaptive MPC for Interactive Mobile Manipulators |
|
Rakovitis, Dimitrios | DFKI |
Mronga, Dennis | University of Bremen, German Research Center for Artificial Intelligence |
Keywords: Mobile Manipulation, Optimization and Optimal Control, Model Learning for Control
Abstract: Mobile robots are nowadays frequently used for interaction tasks in the real world, e.g., for opening doors or for pick-and-place tasks. When used in real-world environments, adapting the robot controllers to uncertain contact dynamics is a significant challenge. Adaptive Model Predictive Control (AMPC) is an approach for controlling robot motions while adapting to uncertain or changing dynamics. However, most of the existing AMPC approaches used in mobile manipulation require either expert tuning or extensive training, making it very difficult to introduce novel or diverse tasks. In addition, the adjustment of several independent environment parameters is usually not considered in the AMPC formulation. In this work, we introduce a hierarchical approach that uses Gaussian Mixture Models (GMMs) and Gaussian Mixture Regression (GMR) to predict the dynamic model parameters of the MPC based on proprioceptive measurements and to perform tasks with multiple unknown environmental parameters. The approach is evaluated in simulation and in real experiments on a mobile manipulator and compared to several baseline methods. It is shown that it outperforms standard MPC and an existing AMPC approach on several tasks such as carrying, pushing, and door opening.
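The GMR step, conditioning a fitted GMM on a measurement to predict a model parameter, can be sketched for the scalar case; the mixture values below are illustrative, not a trained model:

```python
import math

def gmr_predict(x, components):
    """Gaussian Mixture Regression for scalar input/output: condition a
    fitted 2-D GMM on input x and return the expected output. This is
    the general GMR mechanism; the component format
    (weight, mean_x, mean_y, var_x, cov_xy) is assumed for illustration."""
    resp, preds = [], []
    for w, mx, my, vx, cxy in components:
        # responsibility of this component for the observed input x
        p = w * math.exp(-0.5 * (x - mx) ** 2 / vx) / math.sqrt(2 * math.pi * vx)
        resp.append(p)
        # conditional mean of y given x for a joint Gaussian component
        preds.append(my + cxy / vx * (x - mx))
    z = sum(resp)
    return sum(r / z * p for r, p in zip(resp, preds))

# Two hypothetical contact regimes: low stiffness near x=0, high near x=1
mix = [(0.5, 0.0, 10.0, 0.05, 0.0), (0.5, 1.0, 100.0, 0.05, 0.0)]
stiffness = gmr_predict(0.0, mix)   # dominated by the low-stiffness component
```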
|
|
10:30-12:00, Paper TuAT20-NT.7 | Add to My Program |
GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning Based on Online Grasping Pose Fusion |
|
Zhang, Jiazhao | Peking University |
Nandiraju, Gireesh | IIIT Hyderabad |
Wang, Jilong | University of California Santa Cruz |
Fang, Xiaomeng | Beijing Academy of Artificial Intelligence |
Xu, Chaoyi | BAAI |
Chen, Weiguang | Beijing University of Posts and Telecommunications |
Dai, Liu | Tongji University |
Wang, He | Peking University |
Keywords: Mobile Manipulation, Deep Learning in Grasping and Manipulation, AI-Based Methods
Abstract: Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are organized online to eliminate redundant and outlier grasping poses, and can be encoded as a grasping pose observation state for reinforcement learning. Moreover, fusing the grasping poses on the fly enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses. This assessment can subsequently serve as an observe-to-grasp reward, motivating the agent to prioritize actions that yield detailed observations while approaching the target object for grasping. Through extensive experiments conducted on the Habitat and Isaac Gym simulators, we find that our method attains a good balance between observation and manipulation, yielding high performance under various grasping metrics. Furthermore, we discover that the incorporation of temporal information from grasping poses aids in mitigating the sim-to-real gap, leading to robust performance in challenging real-world experiments.
|
|
10:30-12:00, Paper TuAT20-NT.8 | Add to My Program |
Dynamic Interaction Control in Legged Mobile Manipulators: A Decoupled Approach |
|
Li, Qikai | Beihang University |
Meng, Qinchen | Beihang University |
Qin, Yuxing | Beihang University |
Chen, Jiawei | Beihang University |
Ding, Xilun | Beijing University of Aeronautics and Astronautics |
Xu, Kun | Beijing University |
Keywords: Mobile Manipulation, Legged Robots, Compliance and Impedance Control
Abstract: Legged mobile manipulators are receiving increasing attention. Mobile platforms can greatly expand the workspace of robotic arms, providing more possibilities for robot application scenarios. Compared with wheeled mobile manipulators, legged mobile manipulators have higher requirements for the cooperative control of the legged robot and the robotic arm. This work decouples the control of the robotic arm and the legged robot. On the legged robot side, we explicitly estimate the wrench exerted by the robotic arm on the base and bring it into the legged robot's dynamics, and then use a nonlinear model predictive controller (NMPC) to control the legged robot. On the robotic arm side, we adopt an impedance controller to realize force control at the end effector; the introduction of impedance control improves the safety and interactivity of legged mobile manipulators. We conducted experiments on a physical robot to compare decoupled control with independent control, and the results show that the stability and robustness of the robot system improve with decoupled control.
|
|
10:30-12:00, Paper TuAT20-NT.9 | Add to My Program |
Active-Perceptive Motion Generation for Mobile Manipulation |
|
Jauhri, Snehal | TU Darmstadt |
Lueth, Sophie C. | Technical University of Darmstadt, Stanford University |
Chalvatzaki, Georgia | Technische Universität Darmstadt |
Keywords: Mobile Manipulation, Perception for Grasping and Manipulation, Reactive and Sensor-Based Planning
Abstract: Mobile Manipulation (MoMa) systems incorporate the benefits of mobility and dexterity, due to the enlarged space in which they can move and interact with their environment. However, even when equipped with onboard sensors, e.g., an embodied camera, extracting task-relevant visual information in unstructured and cluttered environments, such as households, remains challenging. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks, such as grasping in unknown, cluttered scenes. Our proposed approach, ActPerMoMa, generates robot paths in a receding-horizon fashion by sampling paths and computing path-wise utilities. These utilities trade off maximizing the visual Information Gain (IG) for scene reconstruction against the task-oriented objective, e.g., grasp success, by maximizing grasp reachability. We show the efficacy of our method in simulated experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes with obstacles. We empirically analyze the contribution of various utilities and parameters, and compare against representative baselines both with and without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, indicating a promising direction for active-perceptive MoMa.
|
|
TuAT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Locomotion |
|
|
Chair: Nakanishi, Jun | Meijo University |
Co-Chair: Ijspeert, Auke | EPFL |
|
10:30-12:00, Paper TuAT21-NT.1 | Add to My Program |
Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion |
|
Bellegarda, Guillaume | EPFL |
Shafiee, Milad | EPFL |
Ijspeert, Auke | EPFL |
Keywords: Biologically-Inspired Robots, Bioinspired Robot Learning, Legged Robots
Abstract: We present a framework for learning visually-guided quadruped locomotion by integrating exteroceptive sensing and central pattern generators (CPGs), i.e., systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Through both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands, while at the same time overriding these commands to avoid collisions with the environment. We investigate several open robotics and neuroscience questions: 1) What is the role of explicit interoscillator couplings, and can such coupling improve sim-to-real transfer for navigation robustness? 2) What are the effects of using a memory-enabled vs. a memory-free policy network with respect to robustness, energy efficiency, and tracking performance in sim-to-real navigation tasks? 3) How do animals manage to tolerate high sensorimotor delays, yet still produce smooth and robust gaits? To answer these questions, we train our perceptive locomotion policies in simulation and perform sim-to-real transfers to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations are all beneficial for energy efficiency, robustness to noise and sensory delays of 90 ms, and tracking performance for successful sim-to-real transfer for navigation tasks.
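A minimal version of the coupled-oscillator CPG underlying such frameworks is a set of diffusively coupled phase oscillators; the all-to-all coupling layout and gains below are illustrative assumptions, not the paper's network:

```python
import math

def cpg_step(phases, omega, K, dt):
    """One Euler step of Kuramoto-style coupled phase oscillators:
    each oscillator advances at frequency omega and is pulled toward
    its neighbors by the sin coupling term (gain K)."""
    n = len(phases)
    new = []
    for i in range(n):
        coupling = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        new.append(phases[i] + dt * (omega + K * coupling))
    return new

# Four oscillators (one per leg), desynchronized start
phases = [0.0, 1.0, 2.0, 3.0]
for _ in range(2000):
    phases = cpg_step(phases, omega=2 * math.pi, K=1.0, dt=0.005)

# With positive coupling, the phases are pulled toward synchrony
spread = max(1 - math.cos(a - b) for a in phases for b in phases)
```

In the paper's framework the oscillator states and couplings are modulated by a learned policy rather than fixed gains like these.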
|
|
10:30-12:00, Paper TuAT21-NT.2 | Add to My Program |
Form Closure for Fully Actuated and Robust Obstacle-Aided Locomotion in Snake Robots |
|
Løwer, Jostein | Norwegian University of Science and Technology |
Gravdahl, Irja | Norwegian University of Science and Technology |
Varagnolo, Damiano | Norwegian University of Science and Technology |
Stavdahl, Øyvind | Norwegian University of Science and Technology (NTNU) |
Keywords: Biologically-Inspired Robots, Biomimetics, Multi-Contact Whole-Body Motion Planning and Control
Abstract: In this paper we adapt the theory of form closure to define the form closed region, i.e., the subset of a snake robot's configuration space for which the constraints imposed by the obstacles in its environment render the system fully actuated. We show that the identification of form closed configurations is numerically feasible, and introduce the relaxed condition of form boundedness to achieve robustness in the presence of model uncertainties. We moreover show an example application where the concept of form closed region is used to produce predictable constrained motion in a cluttered environment using lateral undulation.
|
|
10:30-12:00, Paper TuAT21-NT.3 | Add to My Program |
Self-Righting Shell for Robotic Hexapod |
|
King, Katelyn | University of Michigan |
Revzen, Shai | University of Michigan |
Keywords: Biologically-Inspired Robots, Legged Robots
Abstract: Decimeter scale robots in human environments are small relative to obstacles they encounter, making them prone to flipping over and needing to self-right. We present a multi-faceted shell that by its geometry alone enables the hexapedal robot MediumANT to passively self-right without the need for additional sensory feedback. We designed the shell by specifying the cross-sectional geometry in the yz and xy planes such that the robot returns to an upright position by rolling around the longitudinal (x) axis, and then tweaked the design to reduce the number of faces. We then attached the shell to the robot by modifying some of its chassis structural plates to extend to and support the shell. We evaluated the effectiveness of the shell in two experimental scenarios: passive righting – balancing the robot on each face of the shell before releasing the robot – and an intentional fall – walking the robot off a ledge at various approach angles. As intended by our design, the robot recovered the upright orientation from all starting faces in the passive righting test and righted itself and continued walking in all falling trials. This work presents an example of using biologically inspired simplicity to solve what would otherwise be a technically challenging problem.
|
|
10:30-12:00, Paper TuAT21-NT.4 | Add to My Program |
Quadruped-Frog: Rapid Online Optimization of Continuous Quadruped Jumping |
|
Bellegarda, Guillaume | EPFL |
Shafiee, Milad | EPFL |
Özberk, Merih Ekin | École Polytechnique Fédérale De Lausanne |
Ijspeert, Auke | EPFL |
Keywords: Biologically-Inspired Robots, Legged Robots
Abstract: Legged robots are becoming increasingly agile in exhibiting dynamic behaviors such as running and jumping. Usually, such behaviors are optimized and engineered offline (i.e., the behavior is designed before it is needed), either through model-based trajectory optimization or through deep learning-based methods involving millions of timesteps of simulation interactions. Notably, such offline-designed locomotion controllers cannot perfectly model the true dynamics of the system, such as the motor dynamics. In contrast, in this paper, we consider a quadruped jumping task that we rapidly optimize online. We design foot force profiles parameterized by only a few parameters which we optimize directly on hardware with Bayesian Optimization. The force profiles are tracked at the joint level, and added to Cartesian PD impedance control and Virtual Model Control to stabilize the jumping motions. After optimization, which takes only a handful of jumps, we show that this control architecture is capable of diverse and omnidirectional jumps including forward, lateral, and twist (turning) jumps, even on uneven terrain, enabling the Unitree Go1 quadruped to jump 0.5 m high, 0.5 m forward, and jump-turn over 2 rad.
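A foot force profile parameterized by only a few values might look like the raised-cosine pulse below; this parameterization is a hypothetical stand-in, since the paper's exact profile is not given here:

```python
import math

def foot_force_profile(t, F_peak, T):
    """Raised-cosine vertical force pulse with two parameters, peak force
    F_peak and duration T. These two scalars would be the kind of
    low-dimensional input handed to Bayesian Optimization; this exact
    shape is an illustrative assumption."""
    if not 0.0 <= t <= T:
        return 0.0                      # no force outside the stance window
    return 0.5 * F_peak * (1.0 - math.cos(2.0 * math.pi * t / T))

# The pulse peaks at mid-stance and vanishes at both ends
f_mid = foot_force_profile(0.15, F_peak=120.0, T=0.3)   # → 120.0
```

Optimizing directly over such a small parameter vector is what makes hardware-in-the-loop tuning feasible in "only a handful of jumps."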
|
|
10:30-12:00, Paper TuAT21-NT.5 | Add to My Program |
AeroDima: Cheetah-Inspired Aerodynamic Tail Design for Rapid Maneuverability |
|
Bright, Daryn | University of Cape Town |
Shield, Stacey Leigh | University of Cape Town |
Patel, Amir | University of Cape Town |
Keywords: Biologically-Inspired Robots, Mechanism Design, Wheeled Robots
Abstract: Scientists have long theorized that the cheetah’s tail contributes to its impressive maneuverability at high speeds by stabilizing its body. This has inspired the design of several agile robots, including Dima, a wheeled platform that used cheetah-inspired inertial tail swings to better execute rapid acceleration and turning motions. Subsequent research suggests that the effectiveness of the cheetah’s tail might be enhanced by aerodynamic effects. In this paper, we introduce AeroDima: a follow-up to the original Dima design that uses aerodynamic drag on the tail as the primary mechanism for generating the stabilizing torque. The resulting sail-like tail is substantially lighter than the original, but still improves the performance of the platform, allowing it to enter turns at a higher speed without toppling. While the yaw rate of the robot was actually higher without the tail, the tail substantially reduced unwanted roll, confirming that this appendage increases maneuverability by increasing stability, rather than by directly contributing to lateral acceleration.
|
|
10:30-12:00, Paper TuAT21-NT.6 | Add to My Program |
Spined Torso Renders Advanced Mobility for Quadrupedal Locomotion |
|
Wang, Jichao | University of Science and Technology of China |
Cheng, Jinyu | University of Science and Technology of China |
Hu, Jiangtao | University of Science and Technology of China |
Gao, Wei | University of Science and Technology of China |
Zhang, Shiwu | University of Science and Technology of China |
Keywords: Biologically-Inspired Robots, Model Learning for Control, Reinforcement Learning
Abstract: Animals possessing spinal columns often exhibit exceptional agility for highly dynamic locomotion. The spine grants the trunk increased degrees of freedom, thereby enabling diverse postures. This paper presents the development of a robot, STRAY, for quadrupedal locomotion, featuring a four-degree-of-freedom spine design. Using trajectory-based reinforcement learning techniques, STRAY is able to trot and bound dynamically using its spine. Simulation results reveal the positive roles of spinal movements, such as twisting, extension, retraction, and rotation, in helping STRAY realize efficient locomotion. Preliminary results from experiments demonstrate that STRAY can achieve a trotting gait of approximately 0.6 m/s and a bounding gait of 0.7 m/s, with desired velocities of 0.8 m/s and 1.0 m/s, respectively. The results also indicate that reinforcement learning is a feasible way to investigate how the spine should be used in dynamic quadrupedal locomotion and to achieve more possibilities in the future.
|
|
10:30-12:00, Paper TuAT21-NT.7 | Add to My Program |
Pegasus: A Novel Bio-Inspired Quadruped Robot with Underactuated Wheeled-Legged Mechanism |
|
Pan, Yuzhen | Fudan University |
Khan, Rezwan Al Islam | Fudan University |
Zhang, Chenyun | Fudan University |
Anzheng, Zhang | Fudan University |
Shang, Huiliang | Fudan University |
Keywords: Biologically-Inspired Robots, Motion Control, Dynamics
Abstract: This paper presents the design and analysis of Pegasus, a quadrupedal wheeled robot grounded in biomimicry principles. Pegasus offers two distinct motion modes, a wheeled motion and a hybrid wheeled-legged motion, enabling adaptability across various tasks and environmental conditions. The robot draws inspiration from the joint structures of quadruped animals and incorporates biomimetic features. At the robot's ankle joint, we imitate the articulation of a radius-ulna joint to enhance the wheeled motion's agility. Additionally, we establish a comprehensive adaptive dynamics model, providing a robust theoretical foundation for subsequent motion planning and high-precision control. A novel telescopic vehicle mode is also proposed for complex wheel-leg hybrid motion, offering optimized solutions for intricate robot locomotion. Furthermore, we employ parallel underactuated MPC controllers for each leg at the control level, contributing to heightened motion precision and stability. Extensive validation through physical platform experiments highlights the effectiveness and feasibility of the proposed controllers, offering substantial support for real-world applications in robotics.
|
|
10:30-12:00, Paper TuAT21-NT.8 | Add to My Program |
LeapRun: A Dynamic Soft Robot with Running and Jumping Capabilities |
|
Lu, Jiangfeng | Tsinghua University |
Liang, Jiaming | Tencent |
Zhu, Dekuan | Tsinghua University |
Wang, Dongkai | Tsinghua University, Tsinghua Shenzhen International Graduate School |
Liu, Ying | Tsinghua University |
Chen, Huimin | Tsinghua University, Tsinghua Shenzhen International Graduate School |
Bai, Yunfei | Tsinghua University, Shenzhen International Graduate School |
Zhang, Haolong | Tsinghua University |
Zhang, Min | Tsinghua University |
Keywords: Biologically-Inspired Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In the natural world, insects exhibit remarkable locomotion capabilities through a combination of running and jumping. However, replicating this versatile locomotion in a soft robot poses technical and design complexities. Here, we propose a dynamic soft robot named LeapRun that possesses agile locomotion and the ability to perform continuous jumping. To achieve this, a prototype soft robot (weight of 300 mg, size of 30 mm × 15 mm × 5 mm), composed of piezoelectric thin film, shape memory alloy, a magnet-locking mechanism, and corresponding support structures, is fabricated. Experimental results demonstrate a maximum moving speed of 15 cm/s and a maximum jumping height of 8.7 cm. Continuous jumping up steps and crossing of complex rugged surfaces are realized. Besides, integrated with a power source, wireless communication module, and control module, untethered operation is also presented, showcasing the potential for multiple applications in search and rescue, exploration, and monitoring.
|
|
10:30-12:00, Paper TuAT21-NT.9 | Add to My Program |
Machine Learning-Driven Burrowing with a Snake-Like Robot |
|
Even, Sean | University of Notre Dame |
Gordon, Holden | Santa Clara University |
Yang, Hoeseok | Santa Clara University |
Ozkan-Aydin, Yasemin | University of Notre Dame |
Keywords: Machine Learning for Robot Control, Biomimetics, Bioinspired Robot Learning
Abstract: Subterranean burrowing is inherently difficult for robots because of the high forces experienced as well as the high amount of uncertainty in this domain. Because of the difficulty in modeling forces in granular media, we propose the use of a novel machine-learning control strategy to obtain optimal techniques for vertical self-burrowing. In this paper, we realize a snake-like bio-inspired robot that is equipped with an IMU and two triple-axis magnetometers. Utilizing magnetic field strength as an analog for depth, a novel deep learning architecture was proposed based on sinusoidal and random data in order to obtain a more efficient strategy for vertical self-burrowing. This strategy was able to outperform many other standard burrowing techniques and was able to automatically reach targeted burrowing depths. We hope these results will serve as a proof of concept for how optimization can be used to unlock the secrets of navigating in the subterranean world more efficiently.
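Using magnetic field strength as an analog for depth can be illustrated with a simple dipole falloff model; the 1/r^3 law and the calibration constant below are assumptions for illustration, not the authors' calibration:

```python
def depth_from_field(B, k):
    """Invert a dipole falloff model B = k / r**3 to estimate the distance r
    between the magnetometer and a surface-mounted magnet. k is a
    calibration constant fit from a known pose; both the model and the
    numbers here are illustrative stand-ins."""
    if B <= 0:
        raise ValueError("field strength must be positive")
    return (k / B) ** (1.0 / 3.0)

# Calibrate k from one known pose: B = 8 units measured at r = 0.5
k = 8.0 * 0.5 ** 3
r = depth_from_field(1.0, k)   # → 1.0
```

In practice the measured field would also include Earth's background field and sensor noise, which is presumably part of why the paper learns the depth relationship rather than fitting a closed-form model.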
|
|
TuAT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics I |
|
|
Chair: Jing, Xingjian | City University of Hong Kong |
Co-Chair: Zeng, Zheng | Shanghai Jiao Tong University |
|
10:30-12:00, Paper TuAT22-NT.1 | Add to My Program |
Towards Centimeter-Scale Underwater Mobile Robots: An Architecture for Capable µAUVs |
|
Spino, Pascal | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Marine Robotics, Actuation and Joint Mechanisms, Micro/Nano Robots
Abstract: Underwater robots are indispensable for aquatic exploration, yet their size and complexity often limit broader application. This research presents a pioneering micro autonomous underwater vehicle (µAUV) design. This robot is distinguished by its utilization of mass-produced drone components, novel jet propulsion mechanisms, and multifunctional spherical shell. Its architecture is modular, appendage-free, and largely seal-free. Preliminary tests highlight its motion capabilities and set new benchmarks for centimeter-scale µAUV advancements.
|
|
10:30-12:00, Paper TuAT22-NT.2 | Add to My Program |
Untethered Bimodal Robotic Fish with Tunable Bistability |
|
Chao, Xu | The Hong Kong Polytechnic University |
Hameed, Imran | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Jing, Xingjian | City University of Hong Kong |
Keywords: Marine Robotics, Mechanism Design, Biomimetics
Abstract: In nature, fish are excellent swimmers due to their flexible and precise control of the tail, which allows them to freely transition between smooth flapping and rapid-response motion so that they can move with dexterity. Here, inspired by the versatile motion abilities of fish, a novel robotic fish has been developed, featuring adaptable bistability. Through tuning the bistability, the robot can acquire two locomotion modes, namely monostable and bistable modes, and the energy barrier that must be overcome to realize the bistable motion can also be adjusted. Theoretical models are derived to facilitate the control of the robot and the understanding of its nonlinear behavior. The impact of the tunable bistability on the swimming and turning performance is investigated through extensive experiments. The study effectively demonstrates the robotic fish’s capability to swiftly and efficiently navigate through mode switches, enabled by its tunable bistability. This feature is essential for underwater robots performing tasks in intricate environments.
|
|
10:30-12:00, Paper TuAT22-NT.3 | Add to My Program |
Tendon-Driven Continuum Robot for Deep-Sea Application |
|
Sourkounis, Cora Maria | Leibniz University Hannover |
Kwasnitschka, Tom | GEOMAR Helmholtz Centre for Ocean Research Kiel |
Raatz, Annika | Leibniz Universität Hannover |
Keywords: Marine Robotics, Mechanism Design, Tendon/Wire Mechanism
Abstract: The extreme conditions of the deep sea require the use of large and expensive diving robots designed to withstand the high pressure at these depths. In order to reduce the costs of sediment sampling in the deep sea and thus facilitate the exploration of rare deep-sea ecosystems, the goal of this research is to design an alternative manipulator for deep-sea suction sampling. Instead of relying on the heavy hydraulic rigid manipulators that deep-sea diving robots are commonly equipped with, we introduce a new concept for a lightweight actuation system that can be used in combination with a traditional diving robot and a suction sampling system. The proposed concept consists of a series of rigid links connected by angled swivel joints. Each segment is actuated by tendons, which allows for continuous bending. The system can be adapted to various sizes of host systems, and the links and joints are printed in place, simplifying the manufacturing process.
|
|
10:30-12:00, Paper TuAT22-NT.4 |
WAVE: An Open-Source underWater Arm-Vehicle Emulator |
|
Rosette, Marcus | Oregon State University |
Kolano, Hannah | Oregon State University |
Holm, Chris | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Marburg, Aaron | University of Washington |
Pickett, Madison | University of Washington |
Davidson, Joseph | Oregon State University |
Keywords: Marine Robotics, Mobile Manipulation, Mechanism Design
Abstract: Underwater vehicle manipulator systems (UVMS) are increasingly popular platforms for performing subsea operations that require precision manipulation. While there is high demand for fully autonomous or even semi-autonomous systems, most UVMS still require human support teams. Developing new hardware and algorithms for autonomous underwater manipulation is challenging: simulations do not capture the full complexity of the underwater environment, and deploying a UVMS at sea for testing/validation is resource intensive and expensive. In this paper, we present a physical testbed for underwater manipulation that bridges the gap between simulation and full field trials. The underWater Arm-Vehicle Emulator (WAVE) is a 10-degree-of-freedom system designed to replicate an inspection-class UVMS. WAVE includes an underwater perception sensor and has two operating modes: rigid and passive. In passive mode, the ROV body can pitch similarly to how a dynamically coupled, underactuated UVMS without pitch control would rotate during manipulation tasks. To validate the overall design and the passive-pitch concept, we evaluated the testbed during underwater experiments in energetic conditions at a wave basin. To support continued research and development in underwater robotics, we make the design open-access and freely available to the community.
|
|
10:30-12:00, Paper TuAT22-NT.5 |
Nezha-F: Design and Analysis of a Foldable and Self-Deployable HAUV |
|
Bai, YuLin | Shanghai Jiao Tong University |
Jin, Yufei | Shanghai Jiao Tong University |
Liu, ChunHu | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiaotong University |
Keywords: Marine Robotics, Simulation and Animation, Mechanism Design
Abstract: This paper introduces a small hybrid aerial underwater vehicle (HAUV), named Nezha-F, that can fly in the air, perform vertical profiling underwater, and vertically take off and land from both the water surface and the ground. A foldable and self-deployable arm mechanism, linked to and driven by a piston variable buoyancy system (PVBS), is proposed to reduce the excessive underwater drag caused by aerial structures. With a compact size and successfully balanced aerial and underwater performance achieved without adding excessive actuators, this design provides a feasible approach to the miniaturization of amphibious floats. The dynamic characteristics of the small PVBS are linearly fitted and modeled, and the originally nonlinear actuator performance is linearized by the post-fitting mapping. Asymmetric dead zones of the actuator are removed by adding compensation to the algorithm. During a 10-day field test, the vehicle showed good aerial performance and underwater control performance. Several full-mission-cycle tests proved the vehicle’s ability in semi-autonomous operation and robust domain crossing, and verified the vehicle’s endurance during each mission stage.
|
|
10:30-12:00, Paper TuAT22-NT.6 |
Snapp: An Agile Robotic Fish with 3-D Maneuverability for Open Water Swim |
|
Ng, Timothy Ju Kin | The University of Hong Kong |
Chen, Nan | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Marine Robotics, Biologically-Inspired Robots, Biomimetics
Abstract: Fish exhibit impressive locomotive performance and agility in complex underwater environments, using their undulating tails and pectoral fins for propulsion and maneuverability. Replicating these abilities in robotic fish is challenging; existing designs focus on either fast swimming or directional control at limited speeds, mainly within confined environments. To address these limitations, we designed Snapp, an integrated robotic fish capable of swimming in open water at high speeds and with full 3-dimensional maneuverability. A novel cyclic-differential method is layered on the mechanism; it integrates propulsion and yaw-steering for fast course corrections. Two independent pectoral fins provide pitch and roll control. We evaluated Snapp in open water environments and demonstrated significant improvements in speed and maneuverability, achieving swimming speeds of 1.5 m/s (1.7 body lengths per second) and performing complex maneuvers, such as figure-8 and S-shaped trajectories. Instantaneous yaw changes of 15° in 0.4 s, a minimum turn radius of 0.85 m, and maximum pitch and roll rates of 3.5 and 1 rad/s, respectively, were recorded. Our results suggest that Snapp’s swimming capabilities have excellent practical prospects for the open seas and contribute significantly to developing agile robotic fishes.
|
|
10:30-12:00, Paper TuAT22-NT.7 |
A Novel Omnidirectional Swimming Robot with Articulated-Compliant Legs |
|
Xu, Yaohui | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Li, Hanlin | Yanshan University and Shenzhen Institute of Advanced Technology |
Yu, Furui | Shaanxi University of Science & Technology, Shenzhen Institute O |
Zuo, Qiyang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Xie, Fengran | Shenzhen Polytechnic |
Xie, Xiang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
He, Kai | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Keywords: Marine Robotics, Biologically-Inspired Robots, Dynamics
Abstract: Stability, adaptability and maneuverability are the most important performance indices for underwater biomimetic robots, especially when it comes to operation in narrow spaces. However, these aspects are sometimes contradictory. In this paper, we present an omnidirectional swimming robot inspired by the whirligig beetle, with the design goals of good stability, adaptability, and maneuverability. First, the design of the robot, which features four novel articulated-compliant robotic legs, is given. Second, its hydrodynamic model is formulated using Kirchhoff’s equations as well as the Lagrangian method, and the hydrodynamic force is calculated with a quasi-steady flow model. Third, extensive experiments are carried out to examine its thrust generation and speed. We find that the omnidirectional robot is a significant improvement over a conventional one with a single passive joint in each leg. More specifically, its swimming speed reaches 0.34 m/s at a frequency of 1.4 Hz, a 30.8% increase. Finally, multimodal swimming of the robot is demonstrated by configuring various locomotive patterns of the articulated-compliant legs, such as swimming forward, retreating, lateral swimming to the left or right, zero-radius turning, and non-zero-radius turning. Passing and collision experiments demonstrate the robot’s potential applications in narrow spaces. Overall, this omnidirectional swimming robot strikes a great balance among stability, adaptability and maneuverability.
|
|
10:30-12:00, Paper TuAT22-NT.8 |
Marine Sediment Sampling with an Underwater Legged Robot |
|
Astolfi, Anna | Scuola Superiore Sant'Anna |
Chellapurath, Mrudul | Scuola Superiore Sant'Anna |
Picardi, Giacomo | Instituto De Ciencias Del Mar (ICM)—Consejo Superior De Investig |
Capriotti, Martina | University of Camerino |
Mladinich, Kayla | University of Connecticut |
Laschi, Cecilia | National University of Singapore |
Stefanni, Sergio | Stazione Zoologica Anton Dohrn |
Calisti, Marcello | The University of Lincoln |
Keywords: Marine Robotics, Legged Robots, Mechanism Design
Abstract: We present a novel approach to marine sediment sampling that makes use of a hexapedal robotic platform, namely SILVER2, equipped with a sediment sampling system. This approach addresses the disadvantages of state-of-the-art sediment sampling methods by offering increased station-keeping capability, low disturbance of the substrate, and precise position control. The sediment sampling system has been designed according to user requirements for microplastics (MPs) analysis of sampled sediment, which include the sampling depth, the sampled volume per area, and the possibility of collecting replicates without returning to the boat or to the shore. We also defined a protocol for sediment collection and extensively tested the system both in a tank and in field experiments at different sites along the Tyrrhenian coast of Italy. Sediments collected throughout the tests have been analyzed, extracting information about the quantity and composition of MPs, in order to provide an overview of the complete procedure. This work represents an important step towards the use of legged robots in marine operations, and highlights the importance of multidisciplinary collaborations among roboticists and scientists to develop novel solutions and increase the sampling capabilities of end-users.
|
|
10:30-12:00, Paper TuAT22-NT.9 |
Terrain-Adaptive Locomotion Control for an Underwater Hexapod Robot: Sensing Leg-Terrain Interaction with Proprioceptive Sensors |
|
Chen, Lepeng | Northwestern Polytechnical University |
Cui, Rongxin | Northwestern Polytechnical University |
Yan, Weisheng | Northwestern Polytechnical University |
Xu, Hui | Northwestern Polytechnical University |
Zhang, Shouxu | Northwestern Polytechnical University |
Yu, Haitao | Northwestern Polytechnical University |
Keywords: Marine Robotics, Legged Robots, Motion Control
Abstract: An underwater hexapod robot driven by six C-shaped legs and eight thrusters has the potential to traverse diverse terrains with unknown deformable properties, which can lead to unknown leg-terrain interaction forces. However, it is hard to recognize an underwater terrain's deformable properties using exteroceptive sensors such as cameras and sonars. Here, we propose a method to perceive the interaction forces and feed them into a controller that determines the thrust inputs. The key idea lies in using supervised learning to obtain the properties from reliable proprioceptive sensory data. First, we propose a new quantity called the Zero Moment Point (ZMP) bias that can indirectly represent the leg-terrain interaction force, removing the effects caused by gravity, buoyancy, and thrust. Second, we gather a walking cycle's discrete ZMP biases and parameterize them as polynomials. Then, we use several previous walking cycles' parameterized biases to predict the current walking cycle's biases and generate the needed pitch and roll moments. Finally, we propose a terrain-adaptive locomotion controller for the robot, which uses thrust to compensate for the interaction force.
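The parameterize-then-predict idea in this abstract can be sketched as follows. This is a hedged illustration only: the paper uses supervised learning on proprioceptive data, while here past cycles' polynomial coefficients are simply averaged, and `fit_cycle`/`predict_cycle` and all values are hypothetical.

```python
import numpy as np

def fit_cycle(phases, zmp_bias, degree=3):
    """Parameterize one walking cycle's discrete ZMP biases as a polynomial
    over the gait phase in [0, 1]."""
    return np.polyfit(phases, zmp_bias, degree)

def predict_cycle(prev_coeffs, phases):
    """Predict the current cycle's ZMP-bias curve from previous cycles.
    Here we simply average past coefficients (a stand-in for the paper's
    learned predictor)."""
    coeffs = np.mean(np.asarray(prev_coeffs), axis=0)
    return np.polyval(coeffs, phases)

# Two synthetic past cycles with similar bias profiles (made-up data)
phases = np.linspace(0.0, 1.0, 20)
c1 = fit_cycle(phases, 0.05 * np.sin(2 * np.pi * phases))
c2 = fit_cycle(phases, 0.06 * np.sin(2 * np.pi * phases))
pred = predict_cycle([c1, c2], phases)  # bias prediction for the new cycle
```

The predicted bias curve would then feed the pitch/roll moment generation described in the abstract.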
|
|
TuAT23-NT Oral Session, NT-G401 |
Aerial Systems: Mechanics and Control I |
|
|
Chair: Suzuki, Satoshi | Chiba University |
Co-Chair: Ryll, Markus | Technical University Munich |
|
10:30-12:00, Paper TuAT23-NT.1 |
Rapid Resistography with Passive Overhead-Perching Mechanism in an Unmanned Aerial System for Wood Structure Inspection |
|
Lee, Shawndy Michael | Singapore University of Technology and Design |
Liu, Jingmin | Singapore University of Technology & Design |
Chien, Jer Luen | Singapore University of Technology & Design |
Ng, Wei Hien | Singapore University of Technology & Design |
Lim, Milven | Singapore University of Technology & Design |
Foong, Shaohui | Singapore University of Technology and Design |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: This paper presents an aerial robotic platform for rapid remote elevated overhead-perching drill operations for wood health inspection. The platform features an innovative passive prismatic-gripper mechanism affixed to the aerial robot’s top, facilitating overhead drilling. The primary aim is to enhance the safety and efficiency of elevated wood structure inspection using the resistography method, which involves drilling into wooden structures to identify internal voids. The research centers on two key enabling technologies: a gripper mechanism for secure attachment to target surfaces and a tethered drill configuration for drilling operations. The novel gripper mechanism enables drilling on large planar surfaces and even small beam-width structures. The paper concludes with discussions on design simulations and drill resistance experiments, highlighting the effectiveness of the proposed approach in detecting internal cavities within wooden structures.
|
|
10:30-12:00, Paper TuAT23-NT.2 |
Dual Quaternion Control of UAVs with Cable-Suspended Load |
|
Yuan, Yuxia | Technical University of Munich |
Ryll, Markus | Technical University Munich |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: Modeling the kinematics and dynamics of robotic systems with suspended loads using dual quaternions has not been explored so far. This paper introduces a novel control strategy using dual quaternions for UAVs with cable-suspended loads, focusing on the sling-load lifting and tracking problems. By utilizing the mathematical efficiency and compactness of dual quaternions, a unified representation of the dynamics and kinematics of the UAV and its suspended load is achieved, facilitating load lifting and trajectory tracking. Simulation results demonstrate the proposed strategy's accuracy, efficiency, and robustness. This study contributes a novel control strategy that harnesses the benefits of dual quaternions for cargo UAVs. Our work also holds promise for inspiring future innovations in the control of under-actuated systems using dual quaternions.
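To illustrate the compactness dual quaternions bring to combined rotation-translation bookkeeping, here is a generic sketch (not the paper's controller; all function names are hypothetical):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_from_pose(q, t):
    """Unit dual quaternion (real, dual) for rotation q and translation t."""
    tq = np.array([0.0, t[0], t[1], t[2]])
    return q, 0.5 * qmul(tq, q)

def dq_mul(A, B):
    """Dual-quaternion product: composes two rigid-body transforms in one go."""
    (ar, ad), (br, bd) = A, B
    return qmul(ar, br), qmul(ar, bd) + qmul(ad, br)

def dq_translation(D):
    """Recover the translation t from 2 * dual * conj(real)."""
    r, d = D
    rc = r * np.array([1.0, -1.0, -1.0, -1.0])
    return 2.0 * qmul(d, rc)[1:]

# Translate by (1, 0, 0), then yaw 90 degrees: net translation is (0, 1, 0)
T = dq_from_pose(np.array([1.0, 0.0, 0.0, 0.0]), [1.0, 0.0, 0.0])
R = dq_from_pose(np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)]),
                 [0.0, 0.0, 0.0])
combined = dq_mul(R, T)
```

A single dual-quaternion product replaces separate rotation and translation updates, which is the efficiency the abstract refers to.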
|
|
10:30-12:00, Paper TuAT23-NT.3 |
Design, Modeling and Control of a Top-Loading Fully-Actuated Cargo Transportation Multirotor |
|
Park, Wooyong | Seoul National University of Science and Technology |
Wu, Xiangyu | University of California, Berkeley |
Lee, Dongjae | Seoul National University |
Lee, Seung Jae | Seoul National University of Science and Technology |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Calibration and Identification
Abstract: Existing multirotor-based cargo transportation does not maintain a constant cargo attitude due to underactuation; however, fragile payloads may require a consistent posture. The conventional method is also cumbersome when loading cargo, and the size of the cargo that can be loaded is limited. To overcome these issues, we propose a new fully-actuated multirotor unmanned aerial vehicle platform capable of translational motion while maintaining a constant attitude. Our newly developed platform has a cubic exterior and can freely place cargo at any point on its flat top surface. However, the center-of-mass (CoM) position changes when cargo is loaded, leading to undesired attitudinal motion due to unwanted torque generation. To address this problem, we introduce a new model-free center-of-mass position estimation method named MOCE (Model-free Online Center-of-mass Estimation), which is inspired by the extremum-seeking control (ESC) technique. Experimental results validate the performance of the proposed estimation method, which effectively estimates the CoM position and shows satisfactory constant-attitude flight performance.
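The abstract only names extremum-seeking control (ESC) as the inspiration for MOCE. The following is a generic 1-D ESC loop on a made-up cost (think of attitude-error magnitude as a function of an assumed CoM offset); all gains and values are illustrative, not the paper's:

```python
import math

def cost(theta):
    """Unknown cost to minimize, e.g. attitude-error magnitude as a function
    of an assumed CoM offset; its true minimum (0.3 m) is a made-up value."""
    return (theta - 0.3) ** 2

def extremum_seek(theta0=0.0, steps=200000, dt=1e-3, a=0.05, omega=40.0, gain=2.0):
    """Classic extremum seeking: dither the estimate, demodulate the measured
    cost with the dither, and integrate toward the cost minimum."""
    theta = theta0
    for k in range(steps):
        s = math.sin(omega * k * dt)
        j = cost(theta + a * s)      # only cost measurements are available
        theta -= gain * dt * s * j   # demodulated-gradient descent step
    return theta

theta_hat = extremum_seek()  # converges toward 0.3
```

Averaging over the dither period, the update behaves like gradient descent on the cost even though no gradient is ever measured, which is why ESC suits model-free CoM estimation.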
|
|
10:30-12:00, Paper TuAT23-NT.4 |
Aerial Interaction with Tactile Sensing |
|
Guo, Xiaofeng | Carnegie Mellon University |
He, Guanqi | Carnegie Mellon University |
Mousaei, Mohammadreza | Carnegie Mellon University |
Geng, Junyi | Pennsylvania State University |
Shi, Guanya | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Force and Tactile Sensing
Abstract: While the field of autonomous Uncrewed Aerial Vehicles (UAVs) has grown rapidly, most applications only focus on passive visual tasks. Aerial interaction aims to execute tasks involving physical interactions, which offers a way to assist humans in high-altitude and high-risk operations. Tactile sensors, being both cost-effective and lightweight, are capable of sensing contact information including force distribution, as well as recognizing local textures. In this paper, we pioneer the use of vision-based tactile sensors on fully actuated UAVs in dynamic aerial manipulation tasks. We introduce a pipeline utilizing tactile feedback for force tracking via a hybrid motion-force controller and a method for wall texture detection during aerial interactions. Our experiments demonstrate that our system can effectively replace or complement traditional force/torque (F/T) sensors. Compared with only using the F/T sensor, our approach offers two solutions: substitution with tactile sensing, achieving comparable flight performance, or integration of tactile sensing with F/T sensor feedback, leading to around 16% improvement in position tracking accuracy. Our algorithm achieves 93.4% accuracy in real-time texture recognition, which further escalates to 100% in post-contact analysis. To the best of our knowledge, this is the first work to incorporate a vision-based tactile sensor into aerial interaction tasks.
|
|
10:30-12:00, Paper TuAT23-NT.5 |
A Meter-Scale Ornithopter Capable of Jumping Take-Off |
|
Yan, Wei | Shanghai Jiaotong University, Shanghai, China |
Chen, Genliang | Shanghai Jiao Tong University |
Zhang, Zhuang | Westlake University |
Wang, Hao | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Mechanism Design
Abstract: Flapping-wing air vehicles (FWAVs), or ornithopters, are bio-inspired aerial robots that mimic the flying principles of insects and birds. Autonomous take-off, a capability possessed by almost every kind of bird, is important for an FWAV to enhance its performance and extend its working time. As a common take-off method for birds, jumping take-off adapts well to different terrains and offers high energy efficiency compared with running and rotor-based take-off. Despite recent research, no FWAV to date has been capable of jumping take-off. In this paper, we present a process to realize the jumping take-off of a meter-scale FWAV from flat ground. To lower the mechanical complexity, we eliminate traditional robotic legs. Instead, we realize steady standing through a tripod-like structure consisting of the two wings and a jumping mechanism. The flapping wings are directly driven by two independent servos. Three carbon-fiber springs are employed to build a lightweight jumping module with high elastic energy. We build a dynamic model to analyze the aerodynamic effects during the jumping phase and realize a stable transition to flapping flight. This work lays the foundation for outdoor flight without human assistance.
|
|
10:30-12:00, Paper TuAT23-NT.6 |
Autonomous Aerial Perching and Unperching Using Omnidirectional Tiltrotor and Switching Controller |
|
Lee, Dongjae | Seoul National University |
Hwang, Sunwoo | Seoul National University |
Byun, Jeonghyun | Seoul National University |
Lee, Seung Jae | Seoul National University of Science and Technology |
Kim, H. Jin | Seoul National University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Motion Control
Abstract: Aerial unperching of multirotors has received little attention, as opposed to perching, which has been investigated to extend operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transitions between free flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight (approximately 1 kg), fully actuated tiltrotor that can hover at a 90-degree pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and to avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show the effectiveness of the proposed transition mode in the switching controller through ablation studies, in which large overshoot and even collision with the perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.
|
|
10:30-12:00, Paper TuAT23-NT.7 |
RotorTM: A Flexible Simulator for Aerial Transportation and Manipulation |
|
Li, Guanrui | New York University |
Xinyang, Liu | New York University |
Loianno, Giuseppe | New York University |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Simulation and Animation, Motion Control
Abstract: Low-cost autonomous Micro Aerial Vehicles (MAVs) have great potential to help humans by simplifying and speeding up complex tasks, such as construction, package delivery, and search and rescue. These systems, which may consist of single or multiple vehicles, can be equipped with passive connection mechanisms such as rigid links or cables for transportation and manipulation tasks. However, these systems are inherently complex. They are often underactuated and evolve in nonlinear manifold configuration spaces. In addition, the complexity escalates for systems with cable-suspended load due to the hybrid dynamics that vary with the cables' tension conditions. This paper presents the first aerial transportation and manipulation simulator incorporating different payloads and passive connection mechanisms with full system dynamics, planning, and control algorithms. Furthermore, it includes a novel general model accounting for the transient hybrid dynamics for aerial systems with cable-suspended load to closely mimic real-world systems. Comparisons between simulations and real-world experiments with different vehicle configurations show the fidelity of the simulator results with respect to real-world settings. The experiments also show the simulator's benefit for the rapid prototyping and transitioning of aerial transportation and manipulation systems to real-world deployment.
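A minimal sketch of the taut/slack hybrid cable behavior mentioned above. RotorTM models the transient hybrid dynamics in full; this only shows the basic mode switch, with a hypothetical spring-damper taut model and made-up constants:

```python
import numpy as np

def cable_tension(p_quad, p_load, l0, k, c, v_rel):
    """Hybrid cable force on the load: a spring-damper pull when the cable is
    taut (stretched past rest length l0), zero force when slack."""
    d = p_quad - p_load
    length = np.linalg.norm(d)
    if length <= l0:                 # slack mode: a cable cannot push
        return np.zeros(3)
    u = d / length                   # unit vector from load toward quadrotor
    return (k * (length - l0) + c * v_rel) * u

# Taut: 1.2 m separation, 1.0 m rest length -> 20 N pull along +z
f_taut = cable_tension(np.array([0.0, 0.0, 1.2]), np.zeros(3), 1.0, 100.0, 0.0, 0.0)
# Slack: separation below rest length -> no force
f_slack = cable_tension(np.array([0.0, 0.0, 0.5]), np.zeros(3), 1.0, 100.0, 0.0, 0.0)
```

The discontinuity between the two branches is exactly what makes the system dynamics hybrid and the simulation nontrivial.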
|
|
10:30-12:00, Paper TuAT23-NT.8 |
Simulation and Experimental Validation of an Autonomous Perching and Takeoff Method for a Multirotor UAV on Vertical Surfaces Using a Suction Cup |
|
Chapdelaine, Bruno | National Research Council Canada |
Celce, Mathis | Polytechnique Montréal |
Vidal, Charles | Aerospace Research Centre, National Research Council Canada |
Birglen, Lionel | Ecole Polytechnique De Montreal |
Monsarrat, Bruno | Aerospace Research Centre, National Research Council Canada |
Keywords: Aerial Systems: Applications, Aerial Systems: Mechanics and Control, Software, Middleware and Programming Environments
Abstract: This paper details the simulation and experimental validation of an autonomous perching and take-off method for a multirotor unmanned aerial vehicle (UAV) using a suction cup perching mechanism on vertical surfaces. The suction cup interaction with different surface types is characterized with experimental tests to accurately model the perching manoeuvre. The resulting model is used to develop a realistic hardware-in-the-loop (HIL) simulation of the perching and take-off manoeuvre of the UAV in Gazebo. A control method is developed to automate the perching and take-off manoeuvre. The method is tested in simulation and is experimentally validated with flight tests. Comparisons between simulation and experimental data demonstrate that the simulation is accurate and can be used to continue the development of autonomous perching methods.
|
|
TuAT24-NT Oral Session, NT-G402 |
Visual-Inertial SLAM |
|
|
Chair: Huang, Guoquan | University of Delaware |
Co-Chair: Zhang, Hong | Southern University of Science and Technology |
|
10:30-12:00, Paper TuAT24-NT.1 |
Field-VIO: Stereo Visual-Inertial Odometry Based on Quantitative Windows in Agricultural Open Fields |
|
Sun, Jianjing | Anhui University |
Wu, Shuang | Hefei Institutes of Physical Science, Chinese Academy of Science |
Dong, Jun | Hefei Institutes of Physical Science, Chinese Academy of Science |
He, JunMing | Hefei Institutes of Physical Science, Chinese Academy of Science |
Keywords: Visual-Inertial SLAM, Agricultural Automation, Field Robots
Abstract: In agricultural open fields, accurate autonomous localization of robots requires long-term data correlation to reduce cumulative error. Our article presents a Stereo Visual-Inertial Odometry (VIO) system based on ORB-SLAM3 to address the failure of Loop Closure Detection (LCD) methods in this environment. We first propose the concept of quantitative windows to describe the robot's trajectory along the crop rows. We design a driving-state quantification algorithm and accurately separate the quantitative windows between the crop rows. Our system constructs spatial constraints according to the parallelism between quantitative windows. We apply an anomaly-correction method to maintain the constructed parallel matching relationships and implement holistic pose correction for keyframes within abnormal quantitative windows. Our system demonstrated excellent performance over long distances in experiments on the Rosario dataset, verifying its effectiveness in reducing cumulative positioning error in agricultural open fields.
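The abstract does not give the exact form of the parallelism constraint between quantitative windows; one plausible residual, shown purely as an illustration, penalizes misalignment of the windows' direction vectors up to sign (adjacent crop rows are typically traversed in opposite directions):

```python
import numpy as np

def parallelism_residual(dir_a, dir_b):
    """Residual that is zero when two trajectory-window direction vectors are
    parallel (up to sign) and grows toward 1 as they become orthogonal."""
    a = dir_a / np.linalg.norm(dir_a)
    b = dir_b / np.linalg.norm(dir_b)
    return 1.0 - abs(float(np.dot(a, b)))

r_parallel = parallelism_residual(np.array([1.0, 0.0, 0.0]),
                                  np.array([-2.0, 0.0, 0.0]))
r_orthogonal = parallelism_residual(np.array([1.0, 0.0, 0.0]),
                                    np.array([0.0, 1.0, 0.0]))
```

Such a residual could be minimized alongside the usual VIO terms to pull matched windows back into parallel alignment.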
|
|
10:30-12:00, Paper TuAT24-NT.2 |
Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry |
|
Li, Haolong | Max Planck Institute for Intelligent Systems |
Stueckler, Joerg | Max Planck Institute for Intelligent Systems |
Keywords: Visual-Inertial SLAM, Calibration and Identification, Wheeled Robots
Abstract: Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach that tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled-vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but even improve tracking accuracy.
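For readers unfamiliar with single-track models: a minimal kinematic variant integrated by Euler steps looks like the following. The paper uses a singularity-free, differentiable dynamics formulation with online-calibrated parameters; this sketch and its constants are only illustrative:

```python
import math

def single_track_step(state, v, steer, dt, wheelbase=0.3):
    """One Euler step of a kinematic single-track ("bicycle") model.
    state = (x, y, yaw); v is forward speed, steer is the front-wheel angle."""
    x, y, yaw = state
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return (x, y, yaw)

# Forward-predict 1 s of a constant-steer arc (made-up control inputs)
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = single_track_step(state, v=0.5, steer=0.1, dt=0.01)
```

Forward prediction conditioned on future controls, as in the abstract, amounts to rolling such a model forward from the current VIO state with the planned inputs.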
|
|
10:30-12:00, Paper TuAT24-NT.3 |
VI-HSO: Hybrid Sparse Monocular Visual-Inertial Odometry |
|
Yang, Wenzhe | Dalian University of Technology |
Zhuang, Yan | Dalian University of Technology |
Luo, Dongting | Dalian University of Technology |
Zhang, Xuetao | Dalian University of Technology |
Wang, Wei | College of Control Science and Engineering, Dalian University Of |
Zhang, Hong | Southern University of Science and Technology |
Keywords: Visual-Inertial SLAM, Localization
Abstract: In this letter, we present VI-HSO, a hybrid sparse monocular visual-inertial odometry system based on two innovative techniques called adaptive interframe alignment (AIA) and dynamic inverse distance filter (DIDF). Although the sparse image alignment algorithm is efficient for calculating frame-to-frame motion, it tends to fail in cases of significant intensity changes and motion blur. To overcome these limitations, we propose an adaptive interframe alignment method that adaptively selects between the original Lucas-Kanade (LK) method and the inverse compositional method when constructing photometric errors, and adds inertial information to the process. This approach enables the tracking phase to utilize the full image and inertial information. During intense motion, the inverse distance of a new candidate point often fails to converge, leading to either scale drift or tracking failure. We present a dynamic inverse distance filter that adjusts the convergence range used to update candidate points' inverse distances. This adjustment is based on the convergence ratio of the inverse distances of keyframes, which yields more convergent map points, aiding robust tracking in texture-poor regions and during rapid rotation. We evaluate the performance of VI-HSO on public datasets and in real-world experiments, and our system outperforms state-of-the-art algorithms. The code is published at https://github.com/luodongting/VI-HSO.
|
|
10:30-12:00, Paper TuAT24-NT.4 |
Square-Root Inverse Filter-Based GNSS-Visual-Inertial Navigation |
|
Hu, Jun | Meituan Inc |
Lang, Xiaoming | Meituan |
Zhang, Feng | Meituan |
Mao, Yinian | Meituan-Dianping Group |
Huang, Guoquan | University of Delaware |
Keywords: Visual-Inertial SLAM, Localization, SLAM
Abstract: While the Global Navigation Satellite System (GNSS) is often used to provide global positioning when available, its intermittency and/or inaccuracy calls for fusion with other sensors. In this paper, we develop a novel GNSS-Visual-Inertial Navigation System (GVINS) that fuses visual, inertial, and raw GNSS measurements within the square-root inverse sliding window filtering (SRI-SWF) framework in a tightly coupled fashion, and is thus termed SRI-GVINS. In particular, for the first time, we deeply fuse the GNSS pseudorange, Doppler shift, single-differenced pseudorange, and double-differenced carrier phase measurements, along with the visual-inertial measurements. Inherited from the SRI-SWF, the proposed SRI-GVINS gains significant numerical stability and computational efficiency over the state-of-the-art methods. Additionally, we propose to use a filter to sequentially initialize the reference frame transformation until it converges, rather than collecting measurements for batch optimization. We also perform online calibration of the GNSS-IMU extrinsic parameters to mitigate possible extrinsic parameter degradation. The proposed SRI-GVINS is extensively evaluated on our own collected UAV datasets, and the results demonstrate that the proposed method is able to suppress VIO drift in real time; they also show the effectiveness of online GNSS-IMU extrinsic calibration. Experimental validation on public datasets further reveals that SRI-GVINS outperforms the state-of-the-art methods in terms of both accuracy and efficiency.
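The square-root inverse measurement update at the heart of an SRI-SWF can be sketched with a QR re-triangularization; this is a textbook-style illustration, not the paper's implementation, and the toy numbers are made up:

```python
import numpy as np

def sri_update(R, r, H, z, sigma):
    """One square-root inverse measurement update.
    (R, r) encode the prior in square-root information form (R x = r, with R
    upper triangular); H, z are the measurement Jacobian and residual with
    noise standard deviation sigma. Stacking the whitened measurement under R
    and re-triangularizing by QR yields the updated factor."""
    A = np.vstack([R, H / sigma])
    b = np.concatenate([r, z / sigma])
    Q, R_new = np.linalg.qr(A)      # reduced QR keeps the factor square
    return R_new, Q.T @ b

# Toy example: prior estimate [1, 2] with unit information, then a precise
# measurement x1 + x2 = 4
R_new, r_new = sri_update(np.eye(2), np.array([1.0, 2.0]),
                          np.array([[1.0, 1.0]]), np.array([4.0]), sigma=0.1)
x_map = np.linalg.solve(R_new, r_new)   # posterior, close to [1.5, 2.5]
```

Working with the triangular square-root factor rather than the full information matrix is the source of the numerical stability the abstract mentions.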
|
|
10:30-12:00, Paper TuAT24-NT.5 | Add to My Program |
Omnidirectional Dense SLAM for Back-To-Back Fisheye Cameras |
|
Xie, Weijian | Zhejiang University, SenseTime Research |
Chu, Guanyi | SenseTime |
Qian, Quanhao | SenseTime Research |
Yu, Yihao | Zhejiang Sensetime Technology Development Co., Ltd |
Zhai, Shangjin | Sensetime Research |
Chen, Danpeng | Zhejiang University, Sensetime Research and Tetras.AI |
Wang, Nan | Sensetime |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Visual-Inertial SLAM, Mapping, Omnidirectional Vision
Abstract: We propose a real-time visual-inertial dense SLAM system that utilizes the online data streams from a back-to-back dual-fisheye camera setup, providing 360 degree coverage of the environment. Firstly, we employ a sliding-window-based front-end to estimate real-time poses from the binocular fisheye images and IMU data. Then, we implement a lightweight panoramic depth completion network based on a multi-basis depth representation. The network takes panoramic images (obtained by stitching dual-fisheye images with extrinsic and intrinsic parameters) and sparse depths (generated by the front-end local tracking) as input and predicts multiple depth bases along with corresponding confidences as output. The final dense depth is a linear combination of the multiple depth bases. Thanks to the multi-basis depth representation, we can continuously optimize the 360 degree depth with a traditional optimizer to achieve higher global consistency in depth. We conducted experiments on both simulated and real-world datasets to evaluate our method. The results demonstrate that the proposed method outperforms SoTA methods in terms of depth prediction and 3D reconstruction. In addition, we develop a demo that can run on a mobile device to demonstrate the real-time capabilities of our method.
|
|
10:30-12:00, Paper TuAT24-NT.6 | Add to My Program |
Visual Inertial Odometry Using Focal Plane Binary Features (BIT-VIO) |
|
Lisondra, Matthew | Toronto Metropolitan University |
Kim, Junseo | Toronto Metropolitan University |
Murai, Riku | Imperial College London |
Zareinia, Kourosh | Ryerson University |
Saeedi, Sajad | Toronto Metropolitan University |
Keywords: Visual-Inertial SLAM, Sensor Fusion
Abstract: Focal-Plane Sensor-Processor Arrays (FPSPs) are an emerging technology that can execute vision algorithms directly on the image sensor. Unlike conventional cameras, FPSPs perform computation on the image plane – at individual pixels – enabling high frame rate image processing while consuming low power, making them ideal for mobile robotics. FPSPs, such as the SCAMP-5, use parallel processing and are based on the Single Instruction Multiple Data (SIMD) paradigm. In this paper, we present BIT-VIO, the first Visual Inertial Odometry (VIO) which utilises the SCAMP-5. BIT-VIO is a loosely-coupled iterated Extended Kalman Filter (iEKF) which fuses visual odometry running at 300 FPS with predictions from 400 Hz IMU measurements to provide accurate and smooth trajectories.
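A loosely-coupled filter of this kind interleaves high-rate IMU propagation with lower-rate visual-odometry updates. The sketch below is a much-simplified 1D, non-iterated Kalman filter that only illustrates that predict/update pattern; BIT-VIO's iEKF on the SCAMP-5 is far more involved, and all names and noise values here are hypothetical.

```python
import numpy as np

class LooselyCoupledKF:
    """Minimal 1D fusion loop: high-rate IMU propagation of [position,
    velocity] with lower-rate visual-odometry position updates."""
    def __init__(self, q=1e-3, r=1e-2):
        self.x = np.zeros(2)              # state: [position, velocity]
        self.P = np.eye(2)                # state covariance
        self.q, self.r = q, r             # process / measurement noise

    def propagate(self, accel, dt):
        """IMU step (e.g. at 400 Hz): integrate acceleration."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt ** 2, dt]) * accel
        self.P = F @ self.P @ F.T + self.q * np.eye(2)

    def update_position(self, z):
        """VO step (e.g. at 300 FPS): correct with a position measurement."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.r
        K = (self.P @ H.T) / S
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Propagating a constant 1 m/s^2 acceleration for one second yields position 0.5 m and velocity 1 m/s, after which a VO update pulls the position toward the measurement without overshooting it.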
|
|
10:30-12:00, Paper TuAT24-NT.7 | Add to My Program |
PL-EVIO: Robust Monocular Event-Based Visual Inertial Odometry with Point and Line Features (I) |
|
Guan, Weipeng | The University of Hong Kong |
Chen, Peiyu | The University of Hong Kong |
Xie, Yuhan | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Aerial Systems: Perception and Autonomy
Abstract: Robust state estimation in challenging situations is still an unsolved problem, especially achieving onboard pose feedback control for aggressive motion. In this paper, we propose a robust and real-time event-based visual-inertial odometry (VIO) that incorporates event, image, and inertial measurements. Our approach utilizes line-based event features to provide additional structure and constraint information in human-made scenes, while point-based event and image features complement each other through well-designed feature management. To achieve reliable state estimation, we tightly couple the point-based and line-based visual residuals from the event camera, the point-based visual residual from the standard camera, and the residual from IMU pre-integration using a keyframe-based graph optimization framework. Experiments on public benchmark datasets show that our method can achieve superior performance compared with state-of-the-art image-based or event-based VIO. Furthermore, we demonstrate the effectiveness of our pipeline through onboard closed-loop quadrotor aggressive flight and large-scale outdoor experiments. Videos of the evaluations can be found on our website: https://youtu.be/KnWZ4anBMK4.
|
|
10:30-12:00, Paper TuAT24-NT.8 | Add to My Program |
JacobiGPU: GPU-Accelerated Numerical Differentiation for Loop Closure in Visual SLAM |
|
Kumar, Dhruv | Simon Fraser University |
Gopinath, Shishir | Simon Fraser University |
Dantu, Karthik | University of Buffalo |
Ko, Steve | Simon Fraser University |
Keywords: Visual-Inertial SLAM, SLAM
Abstract: In this paper, we introduce JacobiGPU, a technique that uses a GPU to improve the efficiency of loop closure in visual-inertial SLAM systems, particularly when approximating Jacobians using the Finite Difference Method (FDM). Traditional FDM techniques often face computational overhead due to repeated perturbations in pose graphs. We address this overhead with a novel methodology, leveraging strategic graph partitioning and an optimized approach to Jacobian approximation. By integrating JacobiGPU into ORB-SLAM3’s g2o, we enhance the linearization process. Our evaluation, conducted on 12 sequences of varying lengths from the EuRoC and TUM-VI datasets, demonstrated a speedup of up to 4.23x in the linearization stage and up to 2.08x in the overall optimization process.
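The finite-difference Jacobian approximation that JacobiGPU accelerates amounts to perturbing each state variable and re-evaluating the residual; the per-perturbation evaluations are independent, which is what makes them amenable to GPU batching. A serial NumPy sketch of the central-difference version (names are hypothetical):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f: R^n -> R^m.
    Each column needs two perturbed evaluations of f; all 2n evaluations
    are independent of one another (here done serially)."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x + dx)) - np.atleast_1d(f(x - dx))) / (2 * eps)
    return J
```

For smooth residuals the central difference is second-order accurate, so it closely matches the analytic Jacobian at moderate step sizes.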
|
|
10:30-12:00, Paper TuAT24-NT.9 | Add to My Program |
MAVIS: Multi-Camera Augmented Visual-Inertial SLAM Using SE2(3) Based Exact IMU Pre-Integration |
|
Wang, Yifu | Tencent |
Ng, Yonhon | Tencent |
Sa, Inkyu | Tencent |
Parra, Alvaro | The University of Adelaide |
Rodriguez, Cristian | Australian Institute for Machine Learning |
Lin, Tao Jun | Australian National University |
Li, Hongdong | Australian National University and NICTA |
Keywords: Visual-Inertial SLAM, SLAM, Localization
Abstract: We present a novel optimization-based Visual-Inertial SLAM system designed for multiple partially overlapped camera systems, named MAVIS. Our framework fully exploits the benefits of the wide field-of-view from multi-camera systems and the metric scale measurements provided by an inertial measurement unit (IMU). We introduce an improved IMU pre-integration formulation based on the exponential function of the automorphism of SE_2(3), which can effectively enhance tracking performance under fast rotational motion and extended integration time. Furthermore, we extend the conventional front-end tracking and back-end optimization modules designed for monocular or stereo setups to multi-camera systems, and introduce implementation details that contribute to the performance of our system in challenging scenarios. The practical validity of our approach is supported by our experiments on public datasets. Our MAVIS won first place in all vision-IMU tracks (single and multi-session SLAM) of the Hilti SLAM Challenge 2023, with 1.7 times the score of the second-place entry.
|
|
TuAT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization I |
|
|
Chair: Cho, Younggun | Inha University |
Co-Chair: Oleynikova, Helen | ETH Zurich |
|
10:30-12:00, Paper TuAT25-NT.1 | Add to My Program |
Salience-Guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments |
|
Park, Jooyong | Inha University |
Lee, Jungwoo | Inha University |
Choi, Euncheol | Inha University |
Cho, Younggun | Inha University |
Keywords: Localization, Intelligent Transportation Systems, SLAM
Abstract: In urban environments for delivery robots, particularly in areas such as campuses and towns, many custom features defy standard road semantic categorizations. Addressing this challenge, our paper introduces a method leveraging Salient Object Detection (SOD) to extract these unique features, employing them as pivotal factors for enhanced robot loop closure and localization. Traditional geometric feature-based localization is hampered by fluctuating illumination and appearance changes. Our preference for SOD over semantic segmentation sidesteps the intricacies of classifying a myriad of non-standardized urban features. To achieve consistent ground features, the Motion Compensate IPM (MC-IPM) technique is implemented, capitalizing on motion for distortion compensation and subsequently selecting the most pertinent salient ground features through moment computations. For thorough evaluation, we validated saliency detection and localization performance in real urban scenarios. Project page: https://sites.google.com/view/salient-ground-factor/home.
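Moment computations over a binary saliency mask — area, centroid, and spatial spread — are the kind of statistics such a selection step can use to rank candidate ground features. A generic sketch (the paper's exact moment-based criterion may differ; names are hypothetical):

```python
import numpy as np

def mask_moments(mask):
    """Zeroth/first/second image moments of a binary saliency mask.
    Returns pixel area, centroid (x, y), and spatial spread (trace of the
    second central moments), usable as a crude feature-ranking statistic."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()
    mu20 = np.mean((xs - cx) ** 2)     # variance along x
    mu02 = np.mean((ys - cy) ** 2)     # variance along y
    return {"area": area, "centroid": (cx, cy), "spread": mu20 + mu02}
```

Compact, well-localized masks score a small spread relative to area, which is one plausible way to prefer stable ground features.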
|
|
10:30-12:00, Paper TuAT25-NT.2 | Add to My Program |
Block-Map-Based Localization in Large-Scale Environment |
|
Feng, Yixiao | University of New South Wales |
Jiang, Zhou | Beijing Institute of Technology |
Shi, Yongliang | Tsinghua University |
Feng, Yunlong | ShanghaiTech University |
Chen, Xiangyu | Liverpool John Moores University |
Zhao, Hao | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Keywords: Localization, Mapping
Abstract: Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization incur increasing computational load as the map size grows, which affects downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational load caused by maintaining large-scale maps. Firstly, we introduce a method for generating block maps and the corresponding switching strategies, ensuring that the robot can estimate its state in large-scale environments by loading only local map information. Secondly, global localization according to Branch-and-Bound Search (BBS) in the 3D map is introduced to provide the initial pose. Finally, a graph-based optimization method is adopted with a dynamic sliding window that determines which factors are marginalized depending on whether the robot remains within a BM or switches to another one, which maintains the accuracy and efficiency of pose tracking. Comparison experiments are performed on publicly available large-scale datasets. Results show that the proposed method can track the robot pose even when the map scale exceeds 6 kilometers, while efficient and accurate localization is still guaranteed on NCLT and M2DGR.
|
|
10:30-12:00, Paper TuAT25-NT.3 | Add to My Program |
Subsurface Feature-Based Ground Robot/Vehicle Localization Using a Ground Penetrating Radar |
|
Li, Haifeng | Civil Aviation University of China |
Guo, Jiajun | Civil Aviation University of China |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Localization, Mapping
Abstract: Robot localization using subsurface features captured by Ground-Penetrating Radar (GPR) complements and improves robustness over existing common sensor modalities, as subsurface features are less sensitive to weather, season, and surface scene changes. Here, we propose a novel subsurface feature-based localization method that uses only GPR measurements with a known subsurface map. An efficient feature descriptor, the dominant energy curve (DEC), is designed to identify different locations in cluttered conditions. Specifically, image processing techniques that involve background segmentation, energy point detection, and energy curve refinement are designed to extract DEC features from a 2D radargram. With DEC features obtained, a metric subsurface feature map is constructed. Finally, we perform robot localization by feature matching under a particle swarm optimization framework. We have implemented our method and tested it with the public CMU-GPR dataset. The results show that our algorithm improves accuracy and robustness with real-time performance for robot localization tasks. Specifically, the mean localization error is 0.503 m for all cases.
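The dominant energy curve idea — suppress the radargram background, then trace the strongest return in each along-track column and refine the result — can be sketched as follows. This is a simplified stand-in for the paper's segmentation, detection, and refinement pipeline; names are hypothetical.

```python
import numpy as np

def dominant_energy_curve(radargram, smooth=3):
    """Extract a toy dominant energy curve (DEC) from a 2D radargram
    (rows = depth samples, cols = along-track positions): remove the
    per-row mean trace as background, pick the strongest return per
    column, then median-filter the raw curve."""
    g = np.abs(radargram - radargram.mean(axis=1, keepdims=True))
    curve = g.argmax(axis=0).astype(float)
    k = smooth // 2
    padded = np.pad(curve, k, mode="edge")   # simple median smoothing
    return np.array([np.median(padded[i:i + smooth])
                     for i in range(len(curve))])
```

On a synthetic radargram with a bright dipping reflector over a constant background, the extracted curve follows the reflector's depth per column.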
|
|
10:30-12:00, Paper TuAT25-NT.4 | Add to My Program |
Colmap-PCD: An Open-Source Tool for Fine Image-To-Point Cloud Registration |
|
Bai, Chunge | AgiBot Technology Co. Ltd |
Fu, Ruijie | Carnegie Mellon University |
Gao, Xiang | AgiBot Technology Co. Ltd |
Keywords: Localization, Mapping, Methods and Tools for Robot System Design
Abstract: State-of-the-art techniques for monocular camera reconstruction predominantly rely on the Structure from Motion (SfM) pipeline. However, such methods often yield reconstruction outcomes that lack crucial scale information, and the accumulation of images over time leads to inevitable drift. In contrast, mapping methods based on LiDAR scans are popular in large-scale urban scene reconstruction due to their precise distance measurements, a capability fundamentally absent in visual-based approaches. Researchers have made attempts to utilize concurrent LiDAR and camera measurements in pursuit of precise scaling and color details within mapping outcomes. However, the outcomes are subject to extrinsic calibration and time synchronization precision. In this paper, we propose a novel cost-effective reconstruction pipeline that utilizes a pre-established LiDAR map as a fixed constraint to effectively address the inherent scale challenges present in monocular camera reconstruction. To our knowledge, our method is the first to register images onto the point cloud map without requiring synchronous capture of camera and LiDAR data, granting us the flexibility to manage reconstruction detail levels across various areas of interest. To facilitate further research in this domain, we have released Colmap-PCD, an open-source tool leveraging the Colmap algorithm, that enables precise fine-scale registration of images to the point cloud map.
|
|
10:30-12:00, Paper TuAT25-NT.5 | Add to My Program |
COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry |
|
Pfreundschuh, Patrick | ETH Zurich |
Oleynikova, Helen | ETH Zurich |
Cadena Lerma, Cesar | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Andersson, Olov | KTH Royal Institute of Technology |
Keywords: Localization, Mapping, SLAM
Abstract: We present COIN-LIO, a LiDAR Inertial Odometry pipeline that tightly couples information from LiDAR intensity with geometry-based point cloud registration. The focus of our work is to improve the robustness of LiDAR-inertial odometry in geometrically degenerate scenarios, like tunnels or flat fields. We project LiDAR intensity returns into an image, and present a novel image processing pipeline that produces filtered images with improved brightness consistency within the image as well as across different scenes. We effectively leverage intensity as an additional modality, using our new feature selection scheme that detects uninformative directions in the point cloud registration and explicitly selects patches with complementary image information. Photometric error minimization in the image patches is then fused with inertial measurements and point-to-plane registration in an iterated Extended Kalman Filter. The proposed approach improves accuracy and robustness on a public dataset. We additionally publish a new dataset, that captures five real-world environments in challenging, geometrically degenerate scenes. By using the additional photometric information, our approach shows drastically improved robustness against geometric degeneracy in environments where all compared baseline approaches fail.
|
|
10:30-12:00, Paper TuAT25-NT.6 | Add to My Program |
MegaParticles: Range-Based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Keywords: Localization, Range Sensing, SLAM
Abstract: This paper presents a 6-DoF range-based Monte Carlo localization method with a GPU-accelerated Stein particle filter. To update a massive number of particles, we propose a Gauss-Newton-based Stein variational gradient descent (SVGD) with iterative neighbor particle search. This method uses SVGD to collectively update particle states with gradient and neighborhood information, which provides efficient particle sampling. For an efficient neighbor particle search, it uses locality sensitive hashing and iteratively updates the neighbor list of each particle over time. The neighbor list is then used to propagate the posterior probabilities of particles over the neighbor particle graph. The proposed method is capable of evaluating one million particles in real-time on a single GPU and enables robust pose initialization and re-localization without an initial pose estimate. In experiments, the proposed method showed extreme robustness to complete sensor occlusion (i.e., kidnapping), and enabled pinpoint sensor localization without any prior information.
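The SVGD update at the heart of this family of methods combines a kernel-weighted attraction toward high-probability regions with a repulsive term that keeps particles spread out. Below is a plain-NumPy toy of one such update with an RBF kernel, without the Gauss-Newton formulation, neighbor search, or GPU batching of the paper; names and parameters are hypothetical.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step=0.5, h=1.0):
    """One Stein variational gradient descent update on an (n, d) particle
    set. The kernel term pulls particles toward high density; the
    kernel-gradient term repels them from each other."""
    n = particles.shape[0]
    diff = particles[:, None, :] - particles[None, :, :]   # x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)            # k(x_j, x_i)
    grads = np.stack([grad_log_p(x) for x in particles])   # (n, d)
    attract = K @ grads
    repel = -(2.0 / h) * (K @ particles - K.sum(axis=0)[:, None] * particles)
    return particles + step * (attract + repel) / n
```

Iterating this step drives a particle set toward the target density while retaining diversity, which is what makes it usable for multi-hypothesis pose estimation.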
|
|
10:30-12:00, Paper TuAT25-NT.7 | Add to My Program |
Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization |
|
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Keywords: Localization, Range Sensing, SLAM
Abstract: This paper presents a range inertial localization algorithm for a 3D prior map. The proposed algorithm tightly couples scan-to-scan and scan-to-map point cloud registration factors along with IMU factors on a sliding window factor graph. The tight coupling of the scan-to-scan and scan-to-map registration factors enables a smooth fusion of sensor ego-motion estimation and map-based trajectory correction that results in robust tracking of the sensor pose under severe point cloud degeneration and defective regions in a map. We also propose an initial sensor state estimation algorithm that robustly estimates the gravity direction and IMU state and helps perform global localization in 3- or 4-DoF for system initialization without prior position information. Experimental results show that the proposed method outperforms existing state-of-the-art methods in extremely severe situations where the point cloud data becomes degenerate, there are momentary sensor interruptions, or the sensor moves along the map boundary or into unmapped regions.
|
|
10:30-12:00, Paper TuAT25-NT.8 | Add to My Program |
SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints |
|
Carmichael, Spencer | University of Michigan |
Agrawal, Rahul | University of Michigan |
Vasudevan, Ram | University of Michigan |
Skinner, Katherine | University of Michigan |
Keywords: Localization, SLAM
Abstract: Recognizing places from an opposing viewpoint during a return trip is a common experience for human drivers. However, the analogous robotics capability, visual place recognition (VPR) with limited field of view cameras under 180 degree rotations, has proven to be challenging to achieve. To address this problem, this paper presents Same Place Opposing Trajectory (SPOT), a technique for opposing viewpoint VPR that relies exclusively on structure estimated through stereo visual odometry (VO). The method extends recent advances in lidar descriptors and utilizes a novel double (similar and opposing) distance matrix sequence matching method. We evaluate SPOT on a publicly available dataset with 6.7-7.6 km routes driven in similar and opposing directions under various lighting conditions. The proposed algorithm demonstrates remarkable improvement over the state-of-the-art, achieving up to 91.7% recall at 100% precision in opposing viewpoint cases, while requiring less storage than all baselines tested and running faster than all but one. Moreover, the proposed method assumes no a priori knowledge of whether the viewpoint is similar or opposing, and also demonstrates competitive performance in similar viewpoint cases.
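Matching a query sequence against a reference in both directions can be done by scoring forward and backward diagonals of the pairwise distance matrix; the toy below returns the best start index together with whether the traverse was similar or opposing. This is a bare-bones simplification of SPOT's double distance-matrix method, and all names are hypothetical.

```python
import numpy as np

def sequence_match(query, ref, seq_len=5):
    """Score every reference start index with the best forward (similar
    viewpoint) and backward (opposing viewpoint) diagonal sum through the
    pairwise descriptor-distance matrix; return (score, start, direction)."""
    D = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)  # (Q, R)
    R = D.shape[1]
    best = (np.inf, None, None)
    for r in range(R - seq_len + 1):
        fwd = sum(D[q, r + q] for q in range(seq_len))                # same direction
        bwd = sum(D[q, r + seq_len - 1 - q] for q in range(seq_len))  # opposing
        for score, direction in ((fwd, "similar"), (bwd, "opposing")):
            if score < best[0]:
                best = (score, r, direction)
    return best
```

With 1D toy descriptors, a query that re-traverses a stretch of the reference backwards is recovered on the anti-diagonal, with no a priori knowledge of the direction.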
|
|
TuAT26-NT Oral Session, NT-G404 |
Add to My Program |
Localization and Navigation |
|
|
Chair: Garg, Sourav | University of Adelaide |
Co-Chair: Fang, Guoxin | The University of Manchester |
|
10:30-12:00, Paper TuAT26-NT.1 | Add to My Program |
An Onboard Framework for Staircases Modeling Based on Point Clouds |
|
Qing, Chun | Shenzhen Technology University |
Zeng, Rongxiang | Shenzhen Technology University |
Wu, Xuan | Shenzhen Technology University |
Shi, Yongliang | Tsinghua University |
Ma, Gan | Shenzhen Technology University |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, Vision-Based Navigation
Abstract: The detection of traversable regions on staircases and the physical modeling constitute pivotal aspects of the mobility of legged robots. This paper presents an onboard framework tailored to the detection of traversable regions and the modeling of physical attributes of staircases from point cloud data. To mitigate the influence of illumination variations and overfitting due to dataset diversity, a series of data augmentations are introduced to enhance the training of the fundamental network. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce the ambiguity of prediction on the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on the pose estimation of stairs is introduced to calibrate the output of raw modeling that is influenced by tilted perspectives. Lastly, we collected a dataset pertaining to staircases and introduced new evaluation criteria. Through a series of rigorous experiments conducted on this dataset, we substantiate the superior accuracy and generalization capabilities of our proposed method. Codes, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modelling-project.
|
|
10:30-12:00, Paper TuAT26-NT.2 | Add to My Program |
V-STRONG: Visual Self-Supervised Traversability Learning for Off-Road Navigation |
|
Jung, Sanghun | University of Washington |
Lee, JoonHo | University of Washington |
Meng, Xiangyun | University of Washington |
Boots, Byron | University of Washington |
Lambert, Alexander | University of Washington |
Keywords: Deep Learning for Visual Perception, Learning from Experience, Field Robots
Abstract: Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly-supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability prediction, leveraging a state-of-the-art vision foundation model for improved out-of-distribution performance. Our method employs contrastive representation learning using both human driving data and instance-based segmentation masks during training. We show that this simple, yet effective, technique drastically outperforms recent methods in predicting traversability for both on- and off-trail driving scenarios. We compare our method with recent baselines on both a common benchmark as well as our own datasets, covering a diverse range of outdoor environments and varied terrain types. We also demonstrate the compatibility of resulting costmap predictions with a model-predictive controller. Finally, we evaluate our approach on zero- and few-shot tasks, demonstrating unprecedented performance for generalization to new environments. Videos and additional material can be found here: https://sites.google.com/view/visual-traversability-learning.
|
|
10:30-12:00, Paper TuAT26-NT.3 | Add to My Program |
Follow the Footprints: Self-Supervised Traversability Estimation for Off-Road Vehicle Navigation Based on Geometric and Visual Cues |
|
Jeon, Yurim | Seoul National University |
Son, E-In | Seoul National University |
Seo, Seung-Woo | Seoul National University |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation, Deep Learning Methods
Abstract: In this study, we address the off-road traversability estimation problem, which predicts the areas where a robot can navigate in off-road environments. An off-road environment is an unstructured environment comprising a combination of traversable and non-traversable spaces, which presents a challenge for estimating traversability. This study highlights three primary factors that affect a robot's traversability in an off-road environment: surface slope, semantic information, and the robot platform. We present two strategies for estimating traversability, using a guide filter network (GFN) and a footprint supervision module (FSM). The first strategy involves building a novel GFN using a newly designed guide filter layer. The GFN interprets the surface and semantic information from the input data and integrates them to extract features optimized for traversability estimation. The second strategy involves developing an FSM, a self-supervision module that utilizes the path traversed by the robot in pre-driving, also known as a footprint. This enables the prediction of traversability that reflects the characteristics of the robot platform. Based on these two strategies, the proposed method overcomes the limitations of existing methods, which require laborious human supervision and lack scalability. Extensive experiments in diverse conditions, covering automobiles and unmanned ground vehicles as well as herbfields, woodlands, and farmlands, demonstrate that the proposed method is compatible with various robot platforms and adaptable to a range of terrains.
|
|
10:30-12:00, Paper TuAT26-NT.4 | Add to My Program |
Learning to Predict Navigational Patterns from Partial Observations |
|
Karlsson, Robin | Nagoya University |
Carballo, Alexander | Gifu University |
Lepe-Salazar, Francisco | Ludolab |
Fujii, Keisuke | Nagoya University |
Ohtani, Kento | Nagoya University |
Takeda, Kazuya | Nagoya University |
Keywords: Vision-Based Navigation, Semantic Scene Understanding, Continual Learning
Abstract: Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This letter presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception.
|
|
10:30-12:00, Paper TuAT26-NT.5 | Add to My Program |
TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation |
|
Shen, Yehui | NorthEast University |
Liu, Mingmin | SIASUN Robot & Automation CO., Ltd |
Lu, Huimin | National University of Defense Technology |
Chen, Xieyuanli | National University of Defense Technology |
Keywords: Localization, Deep Learning Methods, Transfer Learning
Abstract: Visual place recognition (VPR) plays a pivotal role in the autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose a high-performance teacher and lightweight student distillation framework called TSCM. It exploits our devised cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining superior performance while enabling minimal computational load during deployment. We conduct comprehensive evaluations on large-scale datasets, namely Pittsburgh30k and Pittsburgh250k. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other counterparts. Implementation of our method has been released as open source at https://github.com/shenyehui/TSCM.
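One simple way to distill descriptor structure across models of different sizes — loosely in the spirit of cross-metric distillation, though not the paper's actual loss — is to align the student's pairwise cosine-similarity matrix with the teacher's. This works even when the two descriptor dimensionalities differ, since only the batch-level similarity structure is compared; all names are hypothetical.

```python
import numpy as np

def similarity_distill_loss(teacher_desc, student_desc):
    """Mean squared difference between teacher and student pairwise
    cosine-similarity matrices over a batch of place descriptors.
    Shapes (n, d_t) and (n, d_s) may have different descriptor dims."""
    def cos_sim(X):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return Xn @ Xn.T
    diff = cos_sim(teacher_desc) - cos_sim(student_desc)
    return np.mean(diff ** 2)
```

Because cosine similarity is invariant to per-descriptor scaling, a student that reproduces the teacher's similarity structure achieves zero loss regardless of descriptor magnitude or dimensionality.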
|
|
10:30-12:00, Paper TuAT26-NT.6 | Add to My Program |
3D-BBS: Global Localization for 3D Point Cloud Scan Matching Using Branch-And-Bound Algorithm |
|
Aoki, Koki | Meijo University |
Koide, Kenji | National Institute of Advanced Industrial Science and Technology |
Oishi, Shuji | National Institute of Advanced Industrial Science and Technology |
Yokozuka, Masashi | Nat. Inst. of Advanced Industrial Science and Technology |
Banno, Atsuhiko | National Institute of Advanced Industrial Science and Technology |
Meguro, Junichi | Meijo University |
Keywords: Localization
Abstract: This paper presents an accurate and fast 3D global localization method, 3D-BBS, that extends the existing branch-and-bound (BnB)-based 2D scan matching (BBS) algorithm. To reduce memory consumption, we utilize a sparse hash table for storing hierarchical 3D voxel maps. To improve the processing cost of BBS in 3D space, we propose an efficient roto-translational space branching. Furthermore, we devise a batched BnB algorithm to fully leverage GPU parallel processing. Through experiments in simulated and real environments, we demonstrated that the 3D-BBS enabled accurate global localization with only a 3D LiDAR scan roughly aligned in the gravity direction and a 3D pre-built map. This method required only 878 msec on average to perform global localization and outperformed state-of-the-art global registration methods in terms of accuracy and processing speed.
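The branch-and-bound principle behind BBS-style scan matching can be shown in one dimension: precomputed sliding-window maxima of the map give admissible upper bounds over whole intervals of candidate shifts, so a best-first search prunes most of the space and still returns the exact optimum. A toy sketch (the paper adds hierarchical voxel hashing, roto-translational branching, and GPU batching; names are hypothetical):

```python
import heapq
import numpy as np

def bnb_match_1d(grid, points, max_shift, depth=3):
    """Find the integer shift landing the most scan points on occupied
    cells, via best-first branch-and-bound over shift intervals."""
    grid = np.asarray(grid, dtype=float)
    n = len(grid)
    # dil[h][x] = max(grid[x : x + 2**h]): bound for any shift in [s, s + 2**h)
    dil = [grid]
    for h in range(1, depth + 1):
        w = 2 ** h
        dil.append(np.array([grid[x:x + w].max() for x in range(n)]))

    def bound(shift, h):
        idx = np.asarray(points) + shift
        ok = (idx >= 0) & (idx < n)
        return dil[h][idx[ok]].sum()

    best_score, best_shift = -1.0, None
    heap = [(-bound(s, depth), s, depth)
            for s in range(-max_shift, max_shift + 1, 2 ** depth)]
    heapq.heapify(heap)
    while heap:
        neg, shift, h = heapq.heappop(heap)
        if -neg <= best_score:
            break                      # best remaining bound cannot improve
        if h == 0:                     # leaf: the bound is the exact score
            best_score, best_shift = -neg, shift
        else:                          # branch into two half-intervals
            for child in (shift, shift + 2 ** (h - 1)):
                heapq.heappush(heap, (-bound(child, h - 1), child, h - 1))
    return best_shift, best_score
```

Because the window maxima never underestimate any leaf in an interval, pruned intervals provably contain no better solution than the incumbent.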
|
|
10:30-12:00, Paper TuAT26-NT.7 | Add to My Program |
DynaInsRemover: A Real-Time Dynamic Instance-Aware Static 3D LiDAR Mapping Framework for Dynamic Environment |
|
Zhao, Huanfeng | Jilin University |
Yao, Meibao | Jilin University |
Xiao, Xueming | Changchun University of Science and Technology |
Zheng, Bo | Shanghai Aerospace Control Technology Institute |
Keywords: Mapping, Range Sensing
Abstract: Dynamic objects contaminate the point cloud distribution in the map, degrading the performance of robotic downstream tasks. To address this problem, we present a novel real-time dynamic instance-aware static mapping framework called DynaInsRemover, which exploits the geometric discrepancies between instances to efficiently remove dynamic objects while preserving more details of the static map. It contains an Instance Occupancy Check module for initial dynamic instance proposals and an Instance Belief Update module for reverting false positives. We quantitatively evaluate our approach's performance on SemanticKITTI and validate it in a real-world environment. Experimental evaluations show that our method achieves very promising results in dynamic environments. The implementation of our method is available as open source at: https://github.com/Zhaohuanfeng/DynaInsRemover.git.
|
|
10:30-12:00, Paper TuAT26-NT.8 | Add to My Program |
Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation |
|
Wang, Hongcheng | Peking University |
Wang, Yuxuan | Peking University |
Fangwei Zhong, Zfw1993 | Peking University |
Mingdong Wu, Aaron | Peking University |
Zhang, Jianwei | University of Hamburg |
Wang, Yizhou | Peking University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Representation Learning, Reinforcement Learning
Abstract: Visual-audio navigation (VAN) is attracting more and more attention from the robotics community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source with egocentric visual and audio observations. However, existing methods are limited in two respects: 1) poor generalization to unheard sound categories; and 2) sample inefficiency in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks that respectively accelerate learning representations with the desired characteristics above. With these two auxiliary tasks, the agent learns a spatially correlated representation of visual and audio inputs that can be applied to environments with novel sounds and maps. Experimental results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.
|
|
10:30-12:00, Paper TuAT26-NT.9 | Add to My Program |
Efficient 3D Instance Mapping and Localization with Neural Fields |
|
Tang, George | MIT |
Jatavallabhula, Krishna Murthy | MIT |
Torralba, Antonio | MIT |
Keywords: Mapping, Localization, Semantic Scene Understanding
Abstract: We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a label field that may be rendered from novel viewpoints to produce view-consistent instance segmentation masks. 3DIML significantly improves upon the training and inference runtimes of existing implicit scene representation based methods. Unlike prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a front-end instance segmentation model and associates corresponding masks across images with 3D labels. These almost view-consistent pseudo-label masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near real-time localization of instance masks given a trained label field and an off-the-shelf image segmentation model by fusing outputs from both. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate its effectiveness under mild assumptions for the image sequences. We achieve a 14-24× speedup over existing implicit scene representation methods with comparable quality, showcasing its potential to facilitate faster and more effective 3D scene understanding.
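The InstanceMap step — associating per-frame 2D masks with persistent labels — can be sketched with a simple IoU-based greedy matcher. This is a toy stand-in, not the 3DIML implementation; `associate`, the 0.5 threshold, and the pixel-set mask representation are illustrative assumptions.

```python
# Toy mask-to-label association (illustrative only): each mask is a set of
# pixel indices; a mask inherits the previous-frame label it overlaps most,
# or receives a fresh label when no overlap passes the threshold.

def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    return len(a & b) / (len(a | b) or 1)

def associate(prev_labels, masks, thresh=0.5, next_id=0):
    """prev_labels: {label_id: pixel set} from the previous frame;
    masks: this frame's instance masks. Returns per-mask label ids."""
    out = []
    for m in masks:
        best = max(prev_labels, key=lambda k: iou(prev_labels[k], m), default=None)
        if best is not None and iou(prev_labels[best], m) >= thresh:
            out.append(best)          # reuse the existing 3D label
        else:
            out.append(next_id)       # unmatched mask -> new label
            next_id += 1
    return out, next_id
```

In the paper the resulting pseudo-labels are only "almost" view-consistent, which is exactly why the second phase (InstanceLift) exists: a neural label field smooths over association mistakes like the ones this greedy matcher would make.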
|
|
TuAT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping I |
|
|
Chair: Angelini, Franco | University of Pisa |
Co-Chair: Leutenegger, Stefan | Technical University of Munich |
|
10:30-12:00, Paper TuAT27-NT.1 | Add to My Program |
Grasp It Like a Pro 2.0: A Data-Driven Approach Exploiting Basic Shapes Decomposition and Human Data for Grasping Unknown Objects |
|
Palleschi, Alessandro | University of Pisa |
Angelini, Franco | University of Pisa |
Gabellieri, Chiara | University of Twente |
Park, Do Won | University of Pisa |
Pallottino, Lucia | Università Di Pisa |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Garabini, Manolo | Università Di Pisa |
Keywords: Grasping, Multifingered Hands, Perception for Grasping and Manipulation, Human-Driven Grasping
Abstract: With the improvements in their computational and physical intelligence, robots are now capable of operating in real-world environments. However, manipulation and grasping capabilities are still areas that require significant improvement. To address this, we introduce a new data-driven grasp planning algorithm called Grasp it Like a Pro 2.0. This algorithm utilizes a small number of human demonstrations to teach a robot how to grasp arbitrary objects. By decomposing objects into basic shapes, our algorithm generates candidate grasps that can generalize to different object geometries. The algorithm selects the grasp to execute based on a selection policy that maximizes a novel grasp quality metric introduced in this article. This metric considers the complex interdependencies between the predicted grasp, the local approximation produced by the basic shape decomposition, and the gripper used. We evaluate our approach against multiple baselines using different grippers and objects. The results demonstrate the effectiveness of our method in generating and selecting high-quality and reliable grasps. With a soft underactuated robotic hand, our algorithm achieves a 94.0% success rate in 150 grasps across 30 different objects. Similarly, with a rigid gripper, it achieves an 85.0% success rate in 80 grasps across 16 different objects.
|
|
10:30-12:00, Paper TuAT27-NT.2 | Add to My Program |
Visual-Tactile Fusion for Transparent Object Grasping in Complex Backgrounds |
|
Li, Shoujie | Tsinghua Shenzhen International Graduate School |
Yu, Haixin | Tsinghua Shenzhen International Graduate School |
Ding, Wenbo | Tsinghua University |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Ye, Linqi | Shanghai University |
Xia, Chongkun | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Zhang, Xiao-Ping | Ryerson University |
Keywords: Grasping, Perception for Grasping and Manipulation, Force and Tactile Sensing, Visual-tactile fusion
Abstract: The grasping of transparent objects is challenging but of significance to robots. In this article, a visual-tactile fusion framework for transparent object grasping in complex backgrounds is proposed, which synergizes the advantages of vision and touch and greatly improves the grasping efficiency for transparent objects. First, we propose a multi-scene synthetic grasping dataset named SimTrans12K together with a Gaussian-Mask annotation method. Next, based on the TaTa gripper, we propose a grasping network named the transparent object grasping convolutional neural network (TGCNN) for grasping position detection, which shows good performance in both synthetic and real scenes. Inspired by human grasping, a tactile calibration method and a visual-tactile fusion classification method are designed, which improve the grasping success rate by 36.7% compared to direct grasping and the classification accuracy by 39.1%. Furthermore, the Tactile Height Sensing (THS) module and the Tactile Position Exploration (TPE) module are added to solve the problem of grasping transparent objects in irregular and visually undetectable scenes. Experimental results demonstrate the validity of the framework.
|
|
10:30-12:00, Paper TuAT27-NT.3 | Add to My Program |
Variable Stiffness Soft Robotic Fingers Using Snap-Fit Kinematic Reconfiguration |
|
Bastien, Jérôme | Polytechnique Montréal |
Birglen, Lionel | Ecole Polytechnique De Montreal |
Keywords: Grasping, Soft Robot Applications, Compliant Joint/Mechanism, Variable Stiffness
Abstract: Versatile and secure grasping in robotic systems remains a difficult challenge to address when objects possess a wide range of different properties (size, weight, friction coefficient, etc.). The human hand is often the primary source of inspiration for many technologies addressing this challenge, and a notable feature of our hands is that they can vary their stiffness to match the requirements of the task, e.g. become stiffer or more compliant depending on specific requirements. Many robotic devices have been proposed in the literature mirroring this capability, either using an adjustable internal tension mechanism similar to what happens with human tendons or another physical phenomenon yielding the same effect. This paper proposes a new type of soft robotic finger using a novel method to produce variable stiffness by modifying the kinematic structure of the fingers with snap-fit joints, a very simple alternative to most variable stiffness mechanisms. The resulting modification of the geometry and kinematics of the fingers, including their number of degrees of freedom, makes it possible to greatly alter the intrinsic stiffness of the grasp produced by these fingers. A notable feature of the proposed new design is that one pair of fingers can be used to switch the stiffness of another pair if a dual-arm robot is used.
|
|
10:30-12:00, Paper TuAT27-NT.4 | Add to My Program |
Grasp Transfer Based on Self-Aligning Implicit Representations of Local Surfaces |
|
Tekden, Ahmet | Chalmers University of Technology |
Deisenroth, Marc Peter | University College London |
Bekiroglu, Yasemin | Chalmers University of Technology, University College London |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Objects we interact with and manipulate often share similar parts, e.g. handles, that allow us to transfer our actions flexibly due to their shared functionality. This work addresses the problem of transferring grasp experience or demonstration to a novel object that shares shape similarities with objects a robot has previously experienced. Existing approaches to solving this problem are typically restricted to a specific object category or a parametric shape. Our approach, however, can transfer grasps associated with implicit models of local shapes shared across object categories. Specifically, we employ a single expert grasp demonstration during training to learn a local implicit surface representation model. At inference time, this model can be utilized to transfer grasps to novel objects by identifying the most similar-looking surfaces to the one on which the expert grasp is demonstrated. Our model is trained entirely in simulation and is evaluated on simulated and real-world objects that are not seen during training. Simulation results show that our method acquires better spatial precision and grasp accuracy compared to the baselines. Moreover, our method can successfully perform grasp transfer to unseen object categories, as shown in both simulation and real-world experiments.
|
|
10:30-12:00, Paper TuAT27-NT.5 | Add to My Program |
Amortized Inference for Efficient Grasp Model Adaptation |
|
Noseworthy, Michael | Massachusetts Institute of Technology |
Shaw, Seiji | Massachusetts Institute of Technology |
Kessens, Chad C. | United States Army Research Laboratory |
Roy, Nicholas | Massachusetts Institute of Technology |
Keywords: Probabilistic Inference, Transfer Learning, Deep Learning in Grasping and Manipulation
Abstract: In robotic applications such as bin-picking or block-stacking, learned predictive models have been developed for manipulation of objects with varying but known dynamic properties (e.g., mass distributions and friction coefficients). When a robot encounters a new object, these properties are often difficult to observe and must be inferred through interaction, which can be expensive in both inference time and number of interactions. We propose an encoder/decoder action-feasibility model to efficiently adapt to new objects by estimating their unobserved properties through interaction. The encoder predicts a distribution over the unobserved parameters while the decoder predicts action feasibility, which can be used in an uncertainty-aware planner. An explicit representation of uncertainty in the encoder enables information-gathering heuristics to minimize adaptation interactions. The amortized distributions are efficient to compute and perform comparably to particle-based distributions in a grasping domain. Finally, we deploy our method on a Panda robot to grasp heavy objects.
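The encoder/decoder split described in the abstract — a distribution over unobserved object parameters feeding an action-feasibility score — can be illustrated with a closed-form stand-in. The paper learns both networks; here a 1-D conjugate Gaussian update plays the encoder and a normal-CDF feasibility score plays the decoder. All names, the mass parameter, and the numeric constants are assumptions for illustration.

```python
# Conceptual stand-in (not the paper's learned model): infer a Gaussian
# belief over an object's mass from noisy interaction measurements, then
# score feasibility as the probability the mass is within a payload limit.
import math

def encode(observations, prior_mu=1.0, prior_var=1.0, noise_var=0.25):
    """Conjugate Gaussian update of the mass belief from noisy force
    readings (one amortized forward pass in the paper's encoder)."""
    mu, var = prior_mu, prior_var
    for y in observations:
        k = var / (var + noise_var)          # Kalman-style gain
        mu, var = mu + k * (y - mu), (1 - k) * var
    return mu, var

def decode_feasibility(mu, var, payload_limit=2.0):
    """P(mass < limit) under the belief -> input to an uncertainty-aware planner."""
    z = (payload_limit - mu) / math.sqrt(2 * var)
    return 0.5 * (1 + math.erf(z))

mu, var = encode([1.6, 1.5, 1.7])            # three noisy lift measurements
feasible = decode_feasibility(mu, var) > 0.5
```

The explicit variance is what enables the information-gathering heuristics the abstract mentions: an interaction is worth performing only while `var` is large enough to change the feasibility decision.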
|
|
10:30-12:00, Paper TuAT27-NT.6 | Add to My Program |
Learning Realistic and Reasonable Grasps for Anthropomorphic Hand in Cluttered Scenes |
|
Duan, Haonan | Institute of Automation, Chinese Academy of Sciences |
Li, Yiming | Idiap Research Institute, École Polytechnique Fédérale De Lausan |
Li, Daheng | University of Chinese Academy of Sciences |
Wei, Wei | Institute of Automation, Chinese Academy of Sciences |
Huang, Yayu | Institute of Automation, Chinese Academy of Sciences |
Wang, Peng | Chinese Acdamy of Sciences |
Keywords: Grasping, Multifingered Hands, Deep Learning in Grasping and Manipulation
Abstract: Grasping is one of the most fundamental skills for humans to interact with objects. However, it remains a challenging problem for anthropomorphic hands, due to the lack of object affordance understanding and high-dimensional grasp planning. In this work, we propose an anthropomorphic hand grasping framework to learn realistic and reasonable grasps in cluttered scenes, which tackles the problem in three stages: 1) graspable point segmentation; 2) hand grasp generation; and 3) grasp optimization. Specifically, our method generates high-quality hand grasps efficiently without complete object models by learning graspable points and associated grasp configurations from the observed point cloud in a parallel manner and optimizing predicted grasps based on hand-object contacts. Simulation experiments show that our model effectively generates physically plausible grasps for the anthropomorphic hand with a success rate of over 70%. Real-world experiments demonstrate that the model trained in simulation performs satisfactorily in real-world scenarios on unseen objects.
|
|
10:30-12:00, Paper TuAT27-NT.7 | Add to My Program |
FuncGrasp: Learning Object-Centric Neural Grasp Functions from Single Annotated Example Object |
|
Chen, Hanzhi | Technical University of Munich (TUM) |
Xu, Binbin | University of Toronto |
Leutenegger, Stefan | Technical University of Munich |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: We present FuncGrasp, a framework that can infer dense yet reliable grasp configurations for unseen objects using one annotated object and single-view RGB-D observation via categorical priors. Unlike previous works that only transfer a set of grasp poses, FuncGrasp aims to transfer infinite configurations parameterized by an object-centric continuous grasp function across varying instances. To ease the transfer process, we propose Neural Surface Grasping Fields (NSGF), an effective neural representation defined on the surface to densely encode grasp configurations. Further, we exploit function-to-function transfer using sphere primitives to establish semantically meaningful categorical correspondences, which are learned in an unsupervised fashion without any expert knowledge. We showcase the effectiveness through extensive experiments in both simulators and the real world. Remarkably, our framework significantly outperforms several strong baseline methods in terms of density and reliability for generated grasps.
|
|
10:30-12:00, Paper TuAT27-NT.8 | Add to My Program |
Physical and Digital Adversarial Attacks on Grasp Quality Networks |
|
Alharthi, Naif | King's College London |
Brandao, Martim | King's College London |
Keywords: Grasping, Robot Safety
Abstract: Grasp Quality Networks are important components of grasping-capable autonomous robots, as they allow them to evaluate grasp candidates and select the one with the highest chance of success. The widespread use of pick-and-place robots and Grasp Quality Networks raises the question of whether such systems are vulnerable to adversarial attacks, as that could lead to large economic damage. In this paper we propose two kinds of attacks on Grasp Quality Networks: one assuming physical access to the workspace (to place or attach a new object) and another assuming digital access to the camera software (to inject a pixel-intensity change on a single pixel). We then use evolutionary optimization to obtain attacks that simultaneously minimize the noticeability of the attacks and the chance that selected grasps are successful. Our experiments show that both kinds of attack lead to drastic drops in algorithm performance, thus making them important attacks to consider in the cybersecurity of grasping robots. Source code can be found at https://github.com/Naif-W-Alharthi/Physical-and-Digital-Attacks-on-Grasping-Networks
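The digital single-pixel attack can be sketched as a small (1+1) evolution strategy over pixel position and intensity change. This is only an illustration of the search loop, not the paper's attack: the Grasp Quality Network is stubbed with a mean-intensity score, and `evolve_attack` and its mutation parameters are invented for the sketch (the paper additionally optimizes a noticeability objective).

```python
# Hedged sketch of evolutionary single-pixel attack search. The fitness
# stub below just returns mean image intensity; in the paper it would be
# the quality of the grasp the network selects on the perturbed image.
import random

def attacked_quality(image, pixel):
    """Apply a (x, y, intensity-delta) perturbation and score the image.
    Stand-in fitness: lower mean intensity = 'worse' predicted grasp."""
    x, y, delta = pixel
    img = [row[:] for row in image]
    img[y][x] = min(255, max(0, img[y][x] + delta))
    return sum(map(sum, img)) / (255.0 * len(img) * len(img[0]))

def evolve_attack(image, iters=200, seed=0):
    """(1+1) evolution strategy: mutate the best attack, keep improvements."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    best = (rng.randrange(w), rng.randrange(h), rng.randint(-255, 255))
    best_q = attacked_quality(image, best)
    for _ in range(iters):
        cand = ((best[0] + rng.choice([-1, 0, 1])) % w,
                (best[1] + rng.choice([-1, 0, 1])) % h,
                max(-255, min(255, best[2] + rng.randint(-64, 64))))
        q = attacked_quality(image, cand)
        if q < best_q:  # keep the mutant only if it degrades quality further
            best, best_q = cand, q
    return best, best_q

image = [[128] * 8 for _ in range(8)]      # dummy 8x8 grayscale frame
pixel, quality = evolve_attack(image)
```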
|
|
10:30-12:00, Paper TuAT27-NT.9 | Add to My Program |
Scaling Object-Centric Robotic Manipulation with Multimodal Object Identification |
|
Mitash, Chaitanya | Amazon Robotics |
Hussein, Mostafa | University of New Hampshire |
Vanbaar, Jeroen | MERL |
Terhuja, Vikedo | Amazon Robotics |
Katyal, Kapil | Johns Hopkins University |
Keywords: Computer Vision for Automation, Deep Learning in Grasping and Manipulation, Recognition
Abstract: Robotic manipulation is a key enabler for automation in the fulfillment logistics sector. Such robotic systems require perception and manipulation capabilities to handle a wide variety of objects. Existing systems either operate on a closed set of objects or perform object-agnostic manipulation, which lacks the capability for deliberate and reliable manipulation at scale. Object identification (ID) unlocks the ability for large-scale, object-centric manipulation by mapping object segments to one of the previously seen objects from a database. Nevertheless, it is often limited by the availability of reference data or coverage for objects in a database. In this work, we propose to perform object identification with multiple reference databases, including image and text references, each with a different coverage and matching challenge. We propose a training strategy that tackles the challenges of learning domain-invariant image embeddings, image-text matching, and fusing predictions from different sources. We perform experiments on a recent benchmark with over 190K unique objects, extend the dataset with the additional reference sources, and propose an evaluation strategy that simulates coverage for different reference sources. A model trained with the proposed learning pipeline shows robust performance over a range of simulation experiments.
|
|
TuAT28-NT Oral Session, NT-G4 |
Add to My Program |
Grippers and Other End-Effectors I |
|
|
Chair: Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua University |
Co-Chair: Li, Miao | Wuhan University |
|
10:30-12:00, Paper TuAT28-NT.1 | Add to My Program |
Single-Motor Robotic Gripper with Three Functional Modes for Grasping in Confined Spaces |
|
Nishimura, Toshihiro | Kanazawa University |
Watanabe, Tetsuyou | Kanazawa University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Grasping
Abstract: This study proposes a novel robotic gripper driven by a single motor. Its main task is to pick up objects in confined spaces. For this purpose, the developed gripper has three operating modes: grasping, finger-bending, and pull-in modes. Using these three modes, the developed gripper can rotate and translate a grasped object, i.e., it can perform in-hand manipulation. This in-hand manipulation is effective for grasping in extremely confined spaces, such as the inside of a box on a shelf, to avoid interference between the grasped object and obstacles. To achieve the three modes using a single motor, the developed gripper is equipped with two novel self-motion switching mechanisms. These mechanisms switch their motions automatically when the motion being generated is prevented. An analysis of the mechanism and the control methodology used to achieve the desired behavior are presented. Furthermore, the validity of the analysis and methodology is experimentally demonstrated. The gripper's performance is also evaluated through grasping tests.
|
|
10:30-12:00, Paper TuAT28-NT.2 | Add to My Program |
A Force-Controlled Gripper Capable of Measuring Mechanical Properties of an Object |
|
Tsai, Yi-Shian | National Cheng Kung University Mechanical Engineering Department |
Yeh, Pin-Chun | National Cheng Kung University |
Huang, Chun-Hung | National Cheng Kung University |
Hsueh, I-Cheng | National Cheng Kung University |
Lan, Chao-Chieh | National Cheng Kung University |
Keywords: Grippers and Other End-Effectors, Force Control, Compliant Joints and Mechanisms
Abstract: Various sensorized grippers have been developed to handle delicate objects safely. These grippers have sensors mounted on their fingers' surfaces that provide direct force measurements. However, multiple sensors are often required on one finger, leading to significant sensor placement and wire routing complexity. Finger-based sensors are limited to sensing external gripping force, and fingers cannot be easily replaced to meet the requirements of objects with specific geometries. To overcome the complexity and limitations of finger surface sensors, this paper proposes a force-controlled two-fingered gripper that relies on the deformation sensing of elastic elements in the drivetrain to obtain finger force. By using a minimum number of optical encoders placed in the drivetrain, accurate position and force sensing can be achieved at any location of each finger. When gripping an object, its size and stiffness can thus be accurately measured. Simulation and experimental results demonstrate the proposed gripper's merits. We expect this new gripper to provide a more competitive solution for robots that need to manipulate objects and verify their mechanical properties at the same time.
|
|
10:30-12:00, Paper TuAT28-NT.3 | Add to My Program |
Optimal Design of a Highly Self-Adaptive Gripper with Multi-Phalange Compliant Fingers for Grasping Irregularly Shaped Objects |
|
Liu, Chih-Hsing | National Cheng Kung University |
Yang, Sy-Yeu | NCKU |
Shih, Yi-Chieh | NCKU |
Keywords: Grippers and Other End-Effectors, Soft Robot Materials and Design, Compliant Joints and Mechanisms
Abstract: The development of a robotic gripper for handling objects of various sizes, shapes, weights, and degrees of hardness is a challenging problem in the field of robotics. In order to design a highly self-adaptive gripper capable of conforming to a wide range of objects, this article presents an innovative topology-optimized design of a compliant finger consisting of several multi-material phalanges connected by flexure hinges. The prototype was produced by means of a metamaterial approach, which utilizes 3D-printed infill structures (periodic cells) with different infill densities to represent regions with different equivalent mechanical properties. Adaptability tests were conducted to demonstrate the effectiveness of the proposed design in grasping circular, rectangular, trapezoidal, and concave objects. The results were compared with those of the fingers with single infill densities and a commercially available Festo MultiChoiceGripper, which features a Fin Ray structure. The total contact length between the fingers and the grasped object was used as a measure of the grippers’ adaptability. The test results demonstrate that this novel self-adaptive gripper is comparatively highly adaptable for grasping irregularly shaped objects and is able to carry a maximum payload of 6.76 kg.
|
|
10:30-12:00, Paper TuAT28-NT.4 | Add to My Program |
Accelerating Robotic Picking of Rigid Objects with a Compliant Pneumatic Gripper and an Impact-Aware Trajectory Plan |
|
Ostyn, Frederik | Ghent University |
Vanderborght, Bram | VUB |
Crevecoeur, Guillaume | Ghent University |
Keywords: Grippers and Other End-Effectors, Compliant Joints and Mechanisms, Industrial Robots
Abstract: Industrial robots are capable of moving at high speed. Each time they come into contact with their environment, e.g. to pick up an object, they decelerate to a near standstill. A solution involving a compliant pneumatic gripper and adapted trajectory plan is presented to initiate contact at a higher speed while remaining within hardware limits. By adding overload clutches in either the robot arm or gripper, tolerance to errors is provided. The key parameters such as gripper compliance and maximum allowed initial impact velocity are identified. Results show that by properly optimizing these parameters, robot picking of rigid objects can be accelerated. The complete high-speed picking solution is experimentally verified. A time reduction of 16% was obtained when making contact at 0.65 m/s.
|
|
10:30-12:00, Paper TuAT28-NT.5 | Add to My Program |
Vertical Vibratory Transport of Grasped Parts Using Impacts |
|
Yako, Connor | Stanford University |
Nowak, Jerome | Stanford University |
Yuan, Shenli | SRI International |
Salisbury, Kenneth | Stanford University |
Keywords: Grippers and Other End-Effectors, In-Hand Manipulation
Abstract: In this paper, we use impact-induced acceleration in conjunction with periodic stick-slip to successfully and quickly transport parts vertically against gravity. We show analytically that vertical vibratory transport is more difficult than its horizontal counterpart, and provide guidelines for achieving optimal vertical vibratory transport of a part. Namely, such a system must be capable of quickly realizing high accelerations, as well as supply normal forces at least several times that required for static equilibrium. We also show that for a given maximum acceleration, there is an optimal normal force for transport. To test our analytical guidelines, we built a vibrating surface using flexures and a voice coil actuator that can accelerate a magnetic ram into various surfaces to generate impacts. The surface was used to transport a part against gravity. Experimentally obtained motion tracking data confirmed the theoretical model. A series of grasping tests with a vibrating-surface equipped parallel jaw gripper confirmed the design guidelines.
|
|
10:30-12:00, Paper TuAT28-NT.6 | Add to My Program |
Bionic Soft Fingers with Hybrid Variable Stiffness Mechanisms for Multimode Grasping |
|
Wang, Xiangbo | College of Quality and Technology Supervising, Hebei University, |
Zhang, Tianran | Beihang University |
Yu, Hongze | Beijing University of Posts and Telecommunications |
Wen, Zhenwei | Beijing University of Aeronautics and Astronautic |
Fang, Lide | Hebei University |
Liu, Huaping | Tsinghua Univ |
Sun, Fuchun | Tsinghua University |
Lixue, Tang | Capital Medical University |
Fang, Bin | Beijing University of Posts and Telecommunications / Tsinghua Un |
Keywords: Grippers and Other End-Effectors, Grasping, Soft Sensors and Actuators
Abstract: This paper presents a novel Bionic Soft Finger (BSF) that aims to overcome the limitations of conventional rigid manipulators in terms of adaptability and safety, as well as the challenges faced by soft hands regarding carrying capacity and stability. The BSF design uses a hybrid variable stiffness mechanism combining memory alloy actuators with particle jamming to achieve the desired bending angle and actuator stiffness. Our innovative approach utilizes a bionic finger design that incorporates a memory alloy skeleton and a water-cooled recirculation system, leading to a substantial reduction in the time required for each operation. Through the integration of particle jamming, we have enhanced the overall stiffness and performance of the manipulator, enabling load capacities of up to 3 N per finger and more than twice the stiffness of the normal condition. Additionally, our design enables multimode grasping and incorporates a liquid metal strain sensor (METT) for real-time monitoring of finger bending angles. Comparative analyses demonstrate that our design exhibits superior stiffness and enables five-mode grasping in comparison to pneumatic actuators. We believe that bionic soft fingers present a promising solution for enhancing adaptability, safety, and performance in human-robot interaction applications.
|
|
10:30-12:00, Paper TuAT28-NT.7 | Add to My Program |
Design and Fabrication of a Novel Miniature Magnetic Gripper |
|
Li, Mengde | The Institute of Technological Sciences, Wuhan University, Hubei |
Zhao, Fuqiang | School of Power and Mechanical Engineering, Wuhan University, Hu |
Li, Xiangli | WuHan University |
Li, Mingchang | Wuhan University |
Liu, Sheng | Wuhan University |
Li, Miao | Wuhan University |
Keywords: Grasping
Abstract: Small-scale robots hold significant promise in the field of minimally invasive surgery (MIS). In this paper, we present a miniature magnetic gripper and develop a data-driven kinematic model. The gripper comprises four fingers, where each finger measures no more than 3 mm, 4 mm, and 5.5 mm in its three dimensions. By integrating permanent magnets and elastic ropes as internal actuation elements into the fingers, the gripper is able to open and close under an external magnetic field, facilitating the manipulation of small objects in confined spaces. Modeling and analysis of the magnetic gripper are undertaken, wherein the relationship between the opening angle and the external magnetic field is established. The average error between the experimentally observed opening angles and the model-predicted values is 2.31°. Subsequent experiments demonstrated the necessity of the magnetic gripper model for precise manipulation, verified its excellent sensitivity to magnetic fields, and demonstrated its potential for future applications in MIS.
|
|
10:30-12:00, Paper TuAT28-NT.8 | Add to My Program |
Design of Highly Repeatable and Multi-Functional Grippers for Precision Handling with Articulated Robots |
|
Gümbel, Philip | Technische Universität Braunschweig, Insitute for Machine Tools |
Dröder, Klaus | Technische Universität Braunschweig |
Keywords: Grippers and Other End-Effectors, Embedded Systems for Robotic and Automation, Industrial Robots
Abstract: This paper presents a novel approach to designing a low-cost gripper that is highly repeatable and functionally integrated. The gripper is optimized to compensate for gripping errors, with particular consideration of the potential challenges of articulated robots. The primary design goal is to achieve maximum repeatability during the gripping and releasing stages of a pick-and-place process for a chip-like silicon die. The design is centered around a custom printed circuit board that integrates functionality for vision-based error compensation, vacuum level monitoring, part contact detection, and detection of abnormal vibrations. We detail our design requirements and specific design choices for the mechanical and electronic design and provide qualitative and quantitative experimental validation of the achieved repeatability and the integrated functions.
|
|
10:30-12:00, Paper TuAT28-NT.9 | Add to My Program |
Generalized Partially Destructive Disassembly Planning for Robotic Disassembly |
|
Hansjosten, Malte | Karlsruhe Institute of Technology (KIT) |
Baumgärtner, Jan | Karlsruhe Institute of Technology |
Fleischer, Jürgen | Karlsruhe Institute of Technology (KIT) |
Keywords: Motion and Path Planning, Disassembly, Industrial Robots
Abstract: While robotic assembly is a well-researched topic, recycling and disassembly of products are also becoming ever more important as we transition to a more sustainable economy. In disassembly, we are typically only interested in a subset of product parts, which opens the possibility of using destructive processes such as tearing, cutting, or milling to speed up the disassembly. Currently, such destructive actions are only included as predefined case-specific actions such as milling away a screw head. By contrast, this paper presents a generalized approach to destructive disassembly planning that can automatically derive destructive disassembly actions from a symbolic representation of the disassembly state. Viable destructive actions are identified and verified only based on the underlying geometric model, circumventing the need for their explicit definition. We showcase the performance of this system both virtually on several test parts and physically by destructively and non-destructively disassembling a model of an electric motor using a robot manipulator with a multitool end effector.
|
|
TuAT29-NT Oral Session, NT-G5 |
Object Detection and Pose Estimation |
|
|
Chair: Wei, Jiaxin | Technical University of Munich |
Co-Chair: Del Bue, Alessio | Istituto Italiano Di Tecnologia |
|
10:30-12:00, Paper TuAT29-NT.1 |
IFFNeRF: Initialisation Free and Fast 6DoF Pose Estimation from a Single Image and a NeRF Model |
|
Bortolon, Matteo | Istituto Italiano Di Tecnologia; Fondazione Bruno Kessler; Unive |
Tsesmelis, Theodore | Istituto Italiano Di Tecnologia |
James, Stuart | Durham University |
Poiesi, Fabio | Fondazione Bruno Kessler |
Del Bue, Alessio | Istituto Italiano Di Tecnologia |
Keywords: Deep Learning for Visual Perception
Abstract: We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a least-squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method can improve the angular and translation error accuracy by 80.1% and 67.3%, respectively, compared to iNeRF, while performing at 34 fps on consumer hardware and without requiring an initial pose guess.
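The Metropolis-Hastings sampling the abstract mentions follows the standard accept/reject recipe. As a rough illustration only (not the authors' implementation; a toy 1D Gaussian stands in for the NeRF density field):

```python
import numpy as np

def metropolis_hastings(log_density, x0, n_samples, step=0.5, seed=0):
    """Generic random-walk Metropolis-Hastings sampler.

    Draws samples whose stationary distribution has the given
    (unnormalized) log density -- the same principle IFFNeRF uses
    to sample surface points from within a NeRF model.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: a standard normal (log density up to a constant)
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2),
                              x0=np.zeros(1), n_samples=5000)
```

In the paper, the target density would be the NeRF's learned volume density over candidate surface points rather than this toy Gaussian.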
|
|
10:30-12:00, Paper TuAT29-NT.2 |
VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR |
|
Ma, Tao | Shanghai AI Laboratory |
Zheng, Zhiwei | UC Berkeley |
Zhou, Hongbin | Shanghai AI Lab |
Cai, Xinyu | Shanghai AI Laboratory |
Yang, Xuemeng | Shanghai Artificial Intelligence Laboratory |
Li, Yikang | IDG Capital |
Shi, Botian | Shanghai AI Laboratory |
Li, Hongsheng | Chinese University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Combining motion prediction with LiDAR-based 3D object detection is an effective method for improving overall accuracy, especially for downstream autonomous driving tasks. The recent development of low-cost LiDARs (e.g., Livox LiDAR) enables us to explore such 4D perception systems with a lower budget and higher performance. In this paper, we propose a 4D object detector, VeloVox, to establish accurate object detection and velocity estimation with a single-frame point cloud from a Livox LiDAR. Based on the non-repetitive scanning pattern and point-level temporal nature, we propose a two-stage module to enhance spatial-temporal point feature interaction along the time dimension. The aggregated feature also enables more accurate proposal refinement. To demonstrate the performance, we evaluate VeloVox against several state-of-the-art detector baselines on our in-house dataset and a synthesized dataset built in the Carla simulator. Code will be released at https://github.com/PJLab-ADG/VeloVox.
|
|
10:30-12:00, Paper TuAT29-NT.3 |
MTRadSSD: A Multi-Task Single-Stage Detector for Object Detection and Free Space Analysis in Radar Point Clouds |
|
Li, Yinbao | Jiaxing Joospeed Electronics Technology Co. Ltd |
Yu, Songshan | Jiaxing Jospeed Electronics Technology Co. Ltd |
Dongfeng, Wang | Jiaxing Joospeed Electronics Technology Co. Ltd |
Jiao, Jingen | Jiaxing Joospeed Electronics Technology Co. Ltd |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Vision-Based Navigation
Abstract: Environmental perception tasks such as object detection and free space detection based on 3+1D radar severely suffer from the disorder and sparsity of the point cloud. To tackle this problem, we propose a novel multi-task radar-based single-stage detector, termed MTRadSSD, in which we adopt instance-aware sampling strategies to discover multi-class road users and propose an occupancy map tool based on kernel density estimation (KDE) to make predictions in bird’s-eye view (BEV). The denoised occupancy map also plays a key role in generating polygon-represented free space in the scene. As a result, our elaborated sampling strategies effectively retain useful semantic information and narrow the difference in detection performance across object categories. Meanwhile, MTRadSSD outperforms state-of-the-art approaches in terms of real-time requirements and detection accuracy. In detail, the proposed method achieves a satisfactory speed of ~16.7 ms per frame in experiments on the public radar point cloud dataset View-of-Delft (VOD). With IoU thresholds of 0.5/0.25/0.25, the average precision (AP) of easy-level objects (cars, pedestrians, and cyclists) reaches a competitive 52.2%, 61.1%, and 86.3%, respectively, while the mean IoU of free space is 87.8%. Notably, the occupancy map also dramatically improves the prediction precision of object orientation, to an average of 64.0%.
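The KDE-based occupancy map described above can be pictured as smearing each sparse radar return with a Gaussian kernel over a BEV grid. A minimal sketch under assumed parameters (grid size, extent, bandwidth, and the free-space threshold are illustrative, not from the paper):

```python
import numpy as np

def kde_occupancy_map(points_xy, grid_size=32, extent=10.0, bandwidth=0.5):
    """Build a BEV occupancy map from sparse 2D radar returns via
    Gaussian kernel density estimation; the grid spans
    [-extent, extent] in both axes and is normalized to [0, 1]."""
    xs = np.linspace(-extent, extent, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    grid = np.stack([gx, gy], axis=-1)              # (G, G, 2) cell centers
    diff = grid[None] - points_xy[:, None, None]    # (N, G, G, 2)
    sq = np.sum(diff**2, axis=-1)
    density = np.exp(-0.5 * sq / bandwidth**2).sum(axis=0)
    return density / density.max()

# Toy scene: a small cluster of radar returns near (3, 3)
pts = np.array([[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
occ = kde_occupancy_map(pts)
free_space = occ < 0.1   # boolean mask of (likely) free cells
```

In the paper, the denoised occupancy map is further converted into a polygon representation of free space; here a simple threshold mask stands in for that step.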
|
|
10:30-12:00, Paper TuAT29-NT.4 |
Toward Accurate Camera-Based 3D Object Detection Via Cascade Depth Estimation and Calibration |
|
Wang, Chaoqun | The Chinese University of Hong Kong, Shenzhen |
Qin, Yiran | CUHKsz |
Kang, Zijian | NIO. Inc |
Ma, Ningning | NIO |
Zhang, Ruimao | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Automation
Abstract: Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization. Different from previous methods, which directly predict depth distributions by using a supervised estimation model, we propose a cascade framework consisting of two depth-aware learning paradigms. First, a depth estimation (DE) scheme leverages relative depth information to realize effective feature lifting from 2D to 3D spaces. Furthermore, a depth calibration (DC) scheme introduces depth reconstruction to further adjust the 3D object localization perturbation along the depth axis. In practice, the DE is explicitly realized by using both the absolute and relative depth optimization loss to promote the precision of depth prediction, while the capability of DC is implicitly embedded into the detection Transformer through a depth denoising mechanism in the training phase. The entire model is trained in an end-to-end manner. We propose a baseline detector and evaluate the effectiveness of our proposal with +2.2%/+2.7% NDS/mAP improvements on the NuScenes benchmark, achieving a competitive performance of 55.9%/45.7% NDS/mAP. Furthermore, we conduct extensive experiments to demonstrate its generality based on various detectors, with about +2% NDS improvements.
|
|
10:30-12:00, Paper TuAT29-NT.5 |
DA-RAW: Domain Adaptive Object Detection for Real-World Adverse Weather Conditions |
|
Jeon, Minsik | Agency for Defense Development |
Seo, Junwon | Agency for Defense Development |
Min, Jihong | Agency for Defense Development |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Despite the success of deep learning-based object detection methods in recent years, it is still challenging to make object detectors reliable in adverse weather conditions such as rain and snow. For robust performance, unsupervised domain adaptation has been utilized to adapt a detection network trained on clear-weather images to adverse-weather images. Previous methods do not explicitly address weather corruption during adaptation; however, the domain gap between clear and adverse weather can be decomposed into two factors with distinct characteristics: a style gap and a weather gap. In this paper, we present an unsupervised domain adaptation framework for object detection that can more effectively adapt to real-world environments with adverse weather conditions by addressing these two gaps separately. Our method resolves the style gap by concentrating on style-related information of high-level features using an attention module. Using self-supervised contrastive learning, our framework then reduces the weather gap and acquires instance features that are robust to weather corruption. Extensive experiments demonstrate that our method outperforms other methods for object detection in adverse weather conditions.
|
|
10:30-12:00, Paper TuAT29-NT.6 |
SyMFM6D: Symmetry-Aware Multi-Directional Fusion for Multi-View 6D Object Pose Estimation |
|
Duffhauss, Fabian | Bosch Center for Artificial Intelligence |
Koch, Sebastian | Ulm University, Robert Bosch GmbH |
Ziesche, Hanna | Bosch BCAI |
Anh Vien, Ngo | Bosch GmbH |
Neumann, Gerhard | Karlsruhe Institute of Technology |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
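The least-squares fitting step that turns predicted keypoints into a 6D pose has a standard closed-form solution via SVD (the Kabsch/Umeyama method). A self-contained sketch of that step, not the paper's exact pipeline:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) minimizing ||R @ p + t - q||
    over corresponding points, via the standard SVD (Kabsch) solution --
    the kind of closed-form fit used to recover a 6D pose from
    predicted 3D keypoints."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Sanity check: recover a known rotation about z and a translation
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
keypoints = np.random.default_rng(0).normal(size=(8, 3))
R_est, t_est = fit_rigid_transform(keypoints, keypoints @ R_true.T + t_true)
```

With noisy multi-view keypoint predictions the same formula gives the least-squares-optimal pose rather than an exact recovery.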
|
|
10:30-12:00, Paper TuAT29-NT.7 |
Mutual Information-Calibrated Conformal Feature Fusion for Uncertainty-Aware Multimodal 3D Object Detection at the Edge |
|
Stutts, Alex Christopher | University of Illinois Chicago |
Erricolo, Danilo | University of Illinois at Chicago |
Ravi, Sathya | University of Illinois Chicago |
Tulabandhula, Theja | University of Illinois Chicago |
Trivedi, Amit Ranjan | University of Illinois at Chicago (UIC), Chicago, USA |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Probability and Statistical Methods
Abstract: In the expanding landscape of AI-enabled robotics, robust quantification of predictive uncertainties is of great importance. Three-dimensional (3D) object detection, a critical robotics operation, has seen significant advancements; however, the majority of current works focus only on accuracy and ignore uncertainty quantification. Addressing this gap, our novel study integrates the principles of conformal inference (CI) with information theoretic measures to perform lightweight, Monte Carlo-free uncertainty estimation within a multimodal framework. Through a multivariate Gaussian product of the latent variables in a Variational Autoencoder (VAE), features from RGB camera and LiDAR sensor data are fused to improve the prediction accuracy. Normalized mutual information (NMI) is leveraged as a modulator for calibrating uncertainty bounds derived from CI based on a weighted loss function. Our simulation results show an inverse correlation between inherent predictive uncertainty and NMI throughout the model's training. The framework demonstrates comparable or better performance in KITTI 3D object detection benchmarks to similar methods that are not uncertainty-aware, making it suitable for real-time edge robotics.
|
|
10:30-12:00, Paper TuAT29-NT.8 |
RGB-Based Category-Level Object Pose Estimation Via Decoupled Metric Scale Recovery |
|
Wei, Jiaxin | Technical University of Munich |
Song, Xibin | Baidu |
Liu, Weizhe | Tencent |
Kneip, Laurent | ShanghaiTech University |
Li, Hongdong | Australian National University and NICTA |
Ji, Pan | Tencent |
Keywords: Deep Learning for Visual Perception, Visual Learning, Computer Vision for Automation
Abstract: While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to their heavy reliance on depth sensors. RGB-only methods provide an alternative yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly facilitating the search for inlier 2D-3D correspondences. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for the 6D object pose. Extensive experiments have been conducted on both synthetic and real datasets, demonstrating the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy. Code: https://github.com/goldoak/DMSR.
|
|
10:30-12:00, Paper TuAT29-NT.9 |
Implicit Coarse-To-Fine 3D Perception for Category-Level Object Pose Estimation from Monocular RGB Image |
|
Li, Jia | Shandong University |
Jin, Li | Shandong University |
Song, Xibin | Baidu |
Chen, Yeheng | Zhejiang Lab |
Li, Nan | Zhejiang Lab |
Qin, Xueying | Shandong University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Deep Learning Methods
Abstract: Category-level object pose estimation demonstrates robust generalization capabilities that benefit robotics applications. However, exclusive reliance on RGB images without leveraging any 3D information introduces ambiguity in the translation and size of objects, leading to suboptimal performance. In this paper, we propose a framework for category-level pose estimation from a single RGB image in an end-to-end manner, i.e., Feature Auxiliary Perception Network (FAP-Net). To address inaccurate pose estimation caused by the inherent ambiguity of RGB images, we design a coarse-to-fine approach that first harnesses geometry supervision to facilitate coarse 3D feature perception and subsequently refines the features based on pose and size constraints. Experimental results on REAL275 and CAMERA25 demonstrate that FAP-Net achieves significant improvements (14.7% on 10°10cm and 11.4% on IoU50 on the real-scene REAL275 dataset) over the state-of-the-art and real-time inference (42 FPS).
|
|
TuAT30-NT Oral Session, NT-G6 |
AI-Based Methods |
|
|
Chair: Hamaya, Masashi | OMRON SINIC X Corporation |
Co-Chair: Dou, Qi | The Chinese University of Hong Kong |
|
10:30-12:00, Paper TuAT30-NT.1 |
Vision-Language Interpreter for Robot Task Planning |
|
Shirai, Keisuke | Kyoto University |
Beltran-Hernandez, Cristian Camilo | Omron Sinic X |
Hamaya, Masashi | OMRON SINIC X Corporation |
Hashimoto, Atsushi | Omron Sinic X |
Tanaka, Shohei | OMRON SINIC X Corporation |
Kawaharazuka, Kento | The University of Tokyo |
Tanaka, Kazutoshi | OMRON SINIC X Corporation |
Ushiku, Yoshitaka | OMRON SINIC X Corporation |
Mori, Shinsuke | Kyoto University |
Keywords: AI-Based Methods, Data Sets for Robot Learning
Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By generating PDs from language instruction and scene observation, we can drive symbolic planners in a language-guided framework. We propose a Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
|
|
10:30-12:00, Paper TuAT30-NT.2 |
Trust-Region Neural Moving Horizon Estimation for Robots |
|
Wang, Bingheng | National University of Singapore |
Chen, Xuyang | National University of Singapore |
Zhao, Lin | National University of Singapore |
Keywords: AI-Based Methods, Machine Learning for Robot Control, Aerial Systems: Mechanics and Control
Abstract: Accurate disturbance estimation is essential for safe robot operations. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, has shown promise in further pushing the accuracy and efficiency boundary. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by providing the second-order derivatives of MHE, referred to as the MHE Hessian. Remarkably, we show that many of the intermediate results used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. This offers linear computational complexity with respect to the MHE horizon. As a case study, we evaluate the proposed trust region NeuroMHE on real quadrotor flight data for disturbance estimation. Our approach demonstrates highly efficient training in under 5 min using only 100 data points. It outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy, utilizing only 1.4% of its network parameters. Furthermore, our method showcases enhanced robustness to network initialization compared to the gradient descent counterpart.
|
|
10:30-12:00, Paper TuAT30-NT.3 |
Multi-Category Decomposition Editing Network for the Accurate Visual Inspection of Texture Defects (I) |
|
Zhu, He | Hust |
Junyi, Li | Hust |
Yang, Hua | Huazhong University of Science and Technology |
Chen, Jiankui | Huazhong University of Science and Technology |
Yin, Zhouping | Professor, School of Mechanical Science and Engineering, Huazhong |
Keywords: AI-Based Methods, Computer Vision for Manufacturing
Abstract: Spotting blemished areas automatically on a textured surface is particularly challenging, as both nominal and defective surface samples are inconsistent in large-scale industrial manufacturing. The most efficient solutions use a memory bank extracted from nominal samples to detect outliers. We approach our strategy, the multi-category decomposition editing network (MCDEN), from a similar viewpoint. Notably, we do not use defect-free samples; instead, we use virtual results to construct a defect library. MCDEN decomposes abnormalities into basic elements from the library while editing outlier features to reconstruct the texture normality, offering a rational segmentation map through decomposition and reconstruction. Owing to this strategy, MCDEN is more interpretable than most neural network methods; interpretability is particularly important in industry to ensure stability. Experiments on texture surface samples from the MVTAD dataset confirm the efficacy of MCDEN with a pixel-level AUC score of 96.6%. In further experiments on samples collected from semi-manufactured inkjet-printed OLED panels, MCDEN demonstrates competitive results with a 99.2% detection rate and rapid real-time detection capability.
|
|
10:30-12:00, Paper TuAT30-NT.4 |
Kinematic-Aware Prompting for Generalizable Articulated Object Manipulation with LLMs |
|
Xia, Wenke | Renmin University of China |
Wang, Dong | Shanghai Artificial Intelligence Laboratory |
Pang, Xincheng | Renmin University of China |
Wang, Zhigang | Shanghai AI Laboratory |
Zhao, Bin | Northwestern Polytechnical University |
Hu, Di | Renmin University of China |
Li, Xuelong | Northwestern Polytechnical University |
Keywords: AI-Based Methods, Learning from Demonstration, Manipulation Planning
Abstract: Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation; however, due to the prohibitive costs of real-world data collection and precise object simulation, it remains challenging for these works to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulations. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact locations. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thoughts prompting method. Our evaluation spanned 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows a powerful zero-shot capability for 8 unseen articulated object categories with only 17 demonstrations. Moreover, real-world experiments on 7 different object categories prove our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM_articulated_object_manipulation.
|
|
10:30-12:00, Paper TuAT30-NT.5 |
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning |
|
Zhou, Zhehua | University of Alberta |
Song, Jiayang | University of Alberta |
Yao, Kunpeng | Massachusetts Institute of Technology |
Shu, Zhan | University of Alberta |
Ma, Lei | The University of Tokyo & University of Alberta |
Keywords: AI-Based Methods, Manipulation Planning, Task and Motion Planning
Abstract: Motivated by the substantial achievements of Large Language Models (LLMs) in the field of natural language processing, recent research has commenced investigations into the application of LLMs for complex, long-horizon sequential task planning challenges in robotics. LLMs are advantageous in offering the potential to enhance the generalizability as task-agnostic planners and facilitate flexible interaction between human instructors and planning systems. However, task plans generated by LLMs often lack feasibility and correctness. To address this challenge, we introduce ISR-LLM, a novel framework that improves LLM-based planning through an iterative self-refinement process. The framework operates through three sequential steps: preprocessing, planning, and iterative self-refinement. During preprocessing, an LLM translator is employed to convert natural language input into a Planning Domain Definition Language (PDDL) formulation. In the planning phase, an LLM planner formulates an initial plan, which is then assessed and refined in the iterative self-refinement step by a validator. We examine the performance of ISR-LLM across three distinct planning domains. Our experimental results show that ISR-LLM is able to achieve markedly higher success rates in sequential task planning compared to state-of-the-art LLM-based planners. Moreover, it also preserves the broad applicability and generalizability of working with natural language instructions.
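The three-step loop described above (draft a plan, validate it, feed the validator's error message back for refinement) can be sketched as follows, with the LLM and the PDDL validator stubbed out by hypothetical callables:

```python
def iterative_self_refine(instruction, llm, validator, max_iters=3):
    """Skeleton of an ISR-LLM-style loop. `llm` maps a prompt string to
    a plan string; `validator` returns (is_valid, error_message).
    Both are caller-supplied stand-ins, not the paper's actual components."""
    plan = llm(f"Plan for: {instruction}")
    for _ in range(max_iters):
        ok, error = validator(plan)
        if ok:
            return plan
        # Refine: include the validator's feedback in the next prompt
        plan = llm(f"Revise plan.\nError: {error}\nPlan: {plan}")
    return plan  # best effort after max_iters refinements

# Toy demo: a fake "LLM" that fixes the plan once it sees an error message
def fake_llm(prompt):
    return "fixed-plan" if "Error" in prompt else "draft-plan"

def fake_validator(plan):
    return (plan == "fixed-plan", "missing precondition")

result = iterative_self_refine("stack blocks", fake_llm, fake_validator)
```

In the paper, the preprocessing step would additionally translate natural language into a PDDL formulation before planning; that translation is omitted here.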
|
|
10:30-12:00, Paper TuAT30-NT.6 |
Efficient Hybrid Neuromorphic-Bayesian Model for Olfaction Sensing: Detection and Classification |
|
Kausar, Rizwana | Khalifa University |
Zayer, Fakhreddine | Khalifa University |
Viegas, Jaime | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: AI-Based Methods
Abstract: Olfaction sensing in autonomous robotics faces challenges in dynamic operations, energy efficiency, and edge processing. It necessitates a machine learning algorithm capable of managing real-world odor interference, ensuring resource efficiency for mobile robotics, and accurately estimating gas features for critical tasks such as odor mapping, localization, and alarm generation. This paper introduces a hybrid approach that exploits neuromorphic computing in combination with probabilistic inference to address these demanding requirements. Our approach implements a combination of a convolutional spiking neural network for feature extraction and a Bayesian spiking neural network for odor detection and identification. The developed algorithm is rigorously tested on a dataset for sensor drift compensation to evaluate robustness. Additionally, to evaluate efficiency, we compare the energy consumption of our model with that of a non-spiking machine learning algorithm under identical dataset and operating conditions. Our approach demonstrates superior efficiency alongside comparable accuracy.
|
|
10:30-12:00, Paper TuAT30-NT.7 |
Disentangled Neural Relational Inference for Interpretable Motion Prediction |
|
Dax, Victoria Magdalena | Stanford University |
Li, Jiachen | University of California, Riverside |
Sachdeva, Enna | Honda Research Institute |
Agarwal, Nakul | Honda Research Institute |
Kochenderfer, Mykel | Stanford University |
Keywords: AI-Based Methods, Behavior-Based Systems, Probabilistic Inference
Abstract: Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction models’ interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.
|
|
10:30-12:00, Paper TuAT30-NT.8 |
DeFlow: Decoder of Scene Flow Network in Autonomous Driving |
|
Zhang, Qingwen | KTH Royal Institute of Technology |
Yang, Yi | KTH Royal Institute of Technology |
Fang, Heng | KTH Royal Institute of Technology |
Geng, Ruoyu | Hong Kong University of Science and Technology |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: AI-Based Methods, Deep Learning Methods
Abstract: Scene flow estimation determines a scene's 3D motion field by predicting the motion of points in the scene, which is especially useful for aiding downstream tasks in autonomous driving. Many networks that take large-scale point clouds as input use voxelization to create a pseudo-image for real-time operation. However, the voxelization process often results in the loss of point-specific features, which poses a challenge for recovering those features in scene flow tasks. Our paper introduces DeFlow, which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task reveal that DeFlow achieves state-of-the-art results on large-scale point cloud data, demonstrating that our network has better performance and efficiency compared to others. The code is available at https://github.com/KTH-RPL/deflow.
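The GRU refinement mentioned above follows the standard gated-update equations; a minimal NumPy sketch of a single GRU step (weight shapes and feature roles are illustrative, not DeFlow's actual architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """A single Gated Recurrent Unit step: `h` is the running feature
    being refined (e.g. a per-point feature) and `x` the new evidence
    (e.g. a voxel feature). Biases are omitted for brevity."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde          # gated blend of old and new

# With all-zero weights both gates are 0.5 and the candidate is 0,
# so the output is simply half of the previous state.
Z = np.zeros((4, 4))
out = gru_cell(np.ones((1, 4)), np.zeros((1, 4)), Z, Z, Z, Z, Z, Z)
```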
|
|
10:30-12:00, Paper TuAT30-NT.9 |
Subequivariant Reinforcement Learning Framework for Coordinated Motion Control |
|
Wang, Haoyu | Shanghai University of Engineering Science |
Tan, Xiaoyu | National University of Singapore |
Qiu, Xihe | Shanghai University of Engineering Science |
Qu, Chao | Inftech |
Keywords: Behavior-Based Systems, Reinforcement Learning, Control Architectures and Programming
Abstract: Effective coordination is crucial for motion control with reinforcement learning, especially as the complexity of agents and their motions increases. However, many existing methods struggle to account for the intricate dependencies between joints. We introduce CoordiGraph, a novel architecture that leverages subequivariant principles from physics to enhance coordination of motion control with reinforcement learning. This method embeds the principles of equivariance as inherent patterns in the learning process under gravity influence, which aids in modeling the nuanced relationships between joints vital for motion control. Through extensive experimentation with sophisticated agents in diverse environments, we highlight the merits of our approach. Compared to current leading methods, CoordiGraph notably enhances generalization and sample efficiency.
|
|
TuAT31-NT Oral Session, NT-G7 |
Industrial Robotics and Automation |
|
|
Chair: Stoeckl, Florian | DHBW Karlsruhe |
Co-Chair: Harada, Kanako | The University of Tokyo |
|
10:30-12:00, Paper TuAT31-NT.1 |
A Large-Scale Suction-Based Climbing Parallel Robot for Wall Painting Application |
|
Rosyid, Abdur | Khalifa University |
El-Khasawneh, Bashar | Khalifa University |
Keywords: Field Robots, Industrial Robots, Parallel Robots
Abstract: This paper presents a large-scale climbing robot that employs a parallel mechanism with three translational degrees of freedom as its locomotion method. Using a robot frame shaped as a triangular pyramid, the robot provides good stability during locomotion and task execution. Three suction cups, called the perimeter cups, are attached to the vertices of the robot’s pyramid base, whereas three other suction cups, called the middle cups, are attached to the end-effector of the parallel mechanism. Climbing motion is achieved by attaching and releasing the perimeter and middle cups one after another. Synchronization between the parallel mechanism’s motion and the suction cups during locomotion, together with an improved gait trajectory, ensures successful climbing. The control scheme of the robot integrates servo control, suction control, and application control in a modular fashion. The successful climbing of the robot proves the scalability of the proposed climbing robot using active suction cups with an optimized design. Finally, a painting application demonstrates the robot’s capability to perform a wall painting task.
|
|
10:30-12:00, Paper TuAT31-NT.2 | Add to My Program |
A Collision-Aware Cable Grasping Method in Cluttered Environment |
|
Zhang, Lei | University of Hamburg |
Bai, Kaixin | University of Hamburg |
Li, Qiang | Shenzhen Technology University |
Chen, Zhaopeng | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Industrial Robots, Grasping, Transfer Learning
Abstract: We introduce a Cable Grasping-Convolutional Neural Network (CG-CNN) designed to facilitate robust cable grasping in cluttered environments. Utilizing physics simulations, we generate an extensive dataset that mimics the intricacies of cable grasping, factoring in potential collisions between cables and robotic grippers. We employ the Approximate Convex Decomposition technique to dissect the non-convex cable model, with grasp quality autonomously labeled based on simulated grasping attempts. The CG-CNN is refined using this simulated dataset and enhanced through domain randomization techniques. Subsequently, the trained model predicts grasp quality, and the optimal grasp pose is passed to the robot's controller for execution. Grasping efficacy is assessed across both synthetic and real-world settings. Given our model's implicit collision sensitivity, we achieved commendable success rates of 92.3% for known cables and 88.4% for unknown cables, surpassing contemporary state-of-the-art approaches. Supplementary materials can be found at https://leizhang-public.github.io/cg-cnn/.
|
|
10:30-12:00, Paper TuAT31-NT.3 | Add to My Program |
A Simple Computationally Efficient Path ILC for Industrial Robotic Manipulators |
|
Schwegel, Michael | TU Wien |
Kugi, Andreas | TU Wien |
Keywords: Industrial Robots, Motion Control, Incremental Learning
Abstract: In this paper, a numerically efficient flexible control scheme for the absolute accuracy of industrial robots is presented and experimentally validated. A model-based controller that leverages all typically available parameters is combined with an online path iterative learning controller (ILC). The ILC law is employed to compensate for the unknown residual error dynamics caused by elastic and transmission effects. The proposed approach combines several benefits, including the possibility of a continuous execution of trials, a straightforward generalization of the learned data to different execution speeds, and learning from partial trials. The experimental validations on a 6-axis industrial robot with a laser tracker absolute measurement system show a 95% improvement in absolute accuracy after two trials. When the laser tracker is removed, the learned feedforward controller can sustain the accuracy achieved even without trial-by-trial learning.
|
|
10:30-12:00, Paper TuAT31-NT.4 | Add to My Program |
RoboGrind: Intuitive and Interactive Surface Treatment with Industrial Robots |
|
Alt, Benjamin | ArtiMinds Robotics |
Stoeckl, Florian | DHBW Karlsruhe |
Müller, Silvan | Baden-Württemberg Cooperative State University |
Braun, Christopher | University of Stuttgart, Institute of Industrial Manufacturing A |
Raible, Julian | University of Stuttgart |
Alhasan, Saad | DHBW - Karlsruhe |
Rettig, Oliver | DHBW Karlsruhe |
Ringle, Lukas Daniel | ArtiMinds Robotics |
Katic, Darko | Karlsruhe Institute for Technology (KIT) |
Jäkel, Rainer | Karlsruhe Institute of Technology |
Beetz, Michael | University of Bremen |
Strand, Marcus | Baden-Wuerttemberg Cooperative State University Karlsruhe |
Huber, Marco F. | University of Stuttgart |
Keywords: Industrial Robots, Software Tools for Robot Programming, Software Architecture for Robotic and Automation
Abstract: Surface treatment tasks such as grinding, sanding or polishing are a vital step of the value chain in many industries, but are notoriously challenging to automate. We present RoboGrind, an integrated system for the intuitive, interactive automation of surface treatment tasks with industrial robots. It combines a sophisticated 3D perception pipeline for surface scanning and automatic defect identification, an interactive voice-controlled wizard system for AI-assisted bootstrapping and parameterization of robot programs, and an automatic planning and execution pipeline for force-controlled robotic surface treatment. RoboGrind is evaluated under both laboratory and real-world conditions in the context of refabricating fiberglass wind turbine blades.
|
|
10:30-12:00, Paper TuAT31-NT.5 | Add to My Program |
An LLM-Driven Framework for Multiple-Vehicle Dispatching and Navigation in Smart City Landscapes |
|
Chen, Ruiqing | ShanghaiTech University |
Song, Wenbin | ShanghaiTech University |
Zu, Weiqin | ShanghaiTech University |
Dong, Zixin | Shanghaitech University |
Guo, Ze | Harbin Institute of Technology |
Sun, Fanglei | ShanghaiTech University |
Tian, Zheng | ShanghaiTech University |
Wang, Jun | University College London |
Keywords: Automation Technologies for Smart Cities, Multi-Robot Systems, Autonomous Vehicle Navigation
Abstract: In the context of smart cities, autonomous vehicles, such as unmanned delivery vehicles and taxis, are gradually gaining acceptance. However, their application scenarios remain significantly fragmented. Typically, an Autonomous Multi-Functional Vehicle (AMFV) is not engaged in other scenarios when idle in a specific one. Currently, a unified system capable of coordinating and using these resources efficiently is lacking. Moreover, there is no advanced navigation algorithm for facilitating coordinated navigation among Heterogeneous Vehicles (HVs). To address these issues, we propose the LLM-driven Multi-vehicle Dispatching and navigation (LiMeda) framework. It comprises two modules: an LLM-driven scheduling module that facilitates efficient allocation considering task scenarios and vehicle information, addressing the issue of incompatible vehicle resources across various smart city scenarios; and a navigation module, founded on the Heterogeneous Agent Reinforcement Learning (HARL) framework we previously proposed, which can effectively perform cooperative navigation tasks among heterogeneous agents, assisting cooperative task completion by HVs in a smart city. Experimental results show that our method outperforms both traditional scheduling algorithms and reinforcement learning navigation algorithms across metrics. Additionally, it shows remarkable scalability and generalization under varying city scales, vehicle numbers, and task numbers.
|
|
10:30-12:00, Paper TuAT31-NT.6 | Add to My Program |
SCRNet: A Retinex Structure-Based Low-Light Enhancement Model Guided by Spatial Consistency |
|
Zhang, Miao | Tsinghua Shenzhen International Graduate School |
Shen, Yiqing | Johns Hopkins University |
Zhong, Shenghui | Zhongfa Aviation Institute, Beihang University |
Pan, Guofeng | Shenzhen Yijiahe Technologies |
Lu, Shuai | Tsinghua Shenzhen International Graduate School |
Keywords: Building Automation, Deep Learning Methods, Deep Learning for Visual Perception
Abstract: Images captured by robots under low-light conditions are often plagued by several challenges, including diminished contrast, increased noise, loss of fine details, and unnatural color reproduction. These factors can significantly hinder the performance of computer vision tasks such as object detection and image segmentation. As a result, improving the quality of low-light images is of paramount importance for practical applications in the computer vision domain. To effectively address these challenges, we present a novel low-light image enhancement model, termed Spatial Consistency Retinex Network (SCRNet), which leverages a Retinex-based structure and is guided by the principle of spatial consistency. Specifically, our proposed model incorporates three levels of consistency inspired by this principle: channel level, semantic level, and texture level. These levels of consistency enable our model to adaptively enhance image features, ensuring more accurate and visually pleasing results. Extensive experimental evaluations on various low-light image datasets demonstrate that our proposed SCRNet outperforms existing state-of-the-art methods, highlighting its potential as an effective solution for enhancing low-light images.
|
|
10:30-12:00, Paper TuAT31-NT.7 | Add to My Program |
Autonomous Field-Of-View Adjustment Using Adaptive Kinematic Constrained Control with Robot-Held Microscopic Camera Feedback |
|
Lin, Hung-Ching | University of Tokyo |
Marques Marinho, Murilo | The University of Manchester |
Harada, Kanako | The University of Tokyo |
Keywords: Robotics and Automation in Life Sciences, Robust/Adaptive Control
Abstract: Robotic systems for manipulation at the millimeter scale often use a camera with high magnification for visual feedback of the target region. However, the limited field-of-view (FoV) of the microscopic camera necessitates camera motion to capture a broader view of the workspace. In this work, we propose an autonomous robotic control method to constrain a robot-held camera within a designated FoV. Furthermore, we model the camera extrinsics as part of the kinematic model and use camera measurements coupled with U-Net-based tool tracking to adapt the complete robotic model during task execution. As a proof-of-concept demonstration, the proposed framework was evaluated in a bi-manual setup, where the microscopic camera was controlled to view a tool moving along a pre-defined trajectory. The proposed method allowed the camera to stay within the real FoV 94.1% of the time, compared to 54.4% without the proposed adaptive control.
|
|
10:30-12:00, Paper TuAT31-NT.8 | Add to My Program |
RoSSO: A High-Performance Python Package for Robotic Surveillance Strategy Optimization Using JAX |
|
John, Yohan | UC Santa Barbara |
Hughes, Connor | UC Santa Barbara |
Diaz-Garcia, Gilberto | University of California, Santa Barbara |
Marden, Jason | University of Colorado at Boulder |
Bullo, Francesco | UCSB |
Keywords: Surveillance Robotic Systems, Optimization and Optimal Control, Multi-Robot Systems
Abstract: To enable the computation of effective randomized patrol routes for single- or multi-robot teams, we present RoSSO, a Python package designed for solving Markov chain optimization problems. We exploit machine-learning techniques such as reverse-mode automatic differentiation and constraint parametrization to achieve superior efficiency compared to general-purpose nonlinear programming solvers. Additionally, we supplement a game-theoretic stochastic surveillance formulation in the literature with a novel greedy algorithm and multi-robot extension. We close with numerical results for a police district in downtown San Francisco that demonstrate RoSSO's capabilities on our new formulations and the prior work.
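The constraint-parametrization idea mentioned in this abstract can be illustrated with a minimal sketch (this is an assumption-laden illustration, not RoSSO's actual API; the `softmax_rows` helper and the NumPy-based setup are hypothetical). Mapping each row of an unconstrained real matrix through a softmax guarantees that any parameter value yields a valid row-stochastic Markov chain transition matrix, so an optimizer can search freely without explicit stochasticity constraints:

```python
import numpy as np

def softmax_rows(theta):
    """Map an unconstrained real matrix to a row-stochastic one.

    Hypothetical helper, not part of RoSSO: subtracting the row max
    before exponentiating keeps the computation numerically stable.
    """
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 4))   # free parameters, no constraints needed
P = softmax_rows(theta)           # valid transition matrix by construction

assert np.allclose(P.sum(axis=1), 1.0)  # rows sum to one
assert (P >= 0).all()                   # all entries nonnegative
```

Because every parameter matrix maps to a feasible transition matrix, gradient steps taken in the unconstrained space (e.g., via reverse-mode automatic differentiation, as the abstract describes) never leave the feasible set.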
|
|
10:30-12:00, Paper TuAT31-NT.9 | Add to My Program |
Semi-Autonomous Surface-Tracking Tasks Using Omnidirectional Mobile Manipulators |
|
Suarez Zapico, Carlos | Edinburgh Centre for Robotics |
Petillot, Yvan R. | Heriot-Watt University |
Erden, Mustafa Suphi | Heriot-Watt University |
Keywords: Redundant Robots, Force and Tactile Sensing, Whole-Body Motion Planning and Control
Abstract: Despite the potential of mobile manipulators and of applications where robots require force-controlled physical interaction with the environment, most robot automation today is still based on fixed manipulators performing free-motion tasks (e.g. welding, pick and place, or painting). In this work, we propose a control solution for omnidirectional mobile manipulators in force-tracking tasks, interacting with unknown surface geometries and with a human teleoperator in the control loop. Keeping a teleoperator in the loop makes the system widely applicable to unstructured environments. A human can take care, with little effort, of mobile base navigation, self-collisions, and collisions with the environment, as well as selecting the area of the asset surface to process. The teleoperator interfaces with the robot platform by commanding motion of the mobile base in order to increase the workspace and maneuverability of the arm. The operator can also command the movement of the end-effector, sliding on the surface geometry to process a specific area. Alternatively, the operator can let the controller execute a parametric trajectory (spiral or raster) for autonomous area coverage while commanding the base to keep the arm in configurations with good dexterity. The autonomous controller, in turn, is responsible for following the unknown contour of the manipulated surface using only observations from a force/torque sensor attached to the arm's wrist, exerting a prescribed force, and handling the motion control of the base and the arm so that both can follow their respective task requests. Overall, we have developed a user-friendly control scheme in which an operator with little training, using a joystick, can guide the robot system to perform a physically interactive task on the surface of an asset.
|
|
TuAT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems I |
|
|
Chair: Suzuki, Tatsuya | Nagoya University |
Co-Chair: Dolan, John M. | Carnegie Mellon University |
|
10:30-12:00, Paper TuAT32-NT.1 | Add to My Program |
Towards Optimal Lane-Changing Coordination of CAVs in Multi-Lane Mixed Traffic Scenarios |
|
Ding, Yan | Xi'an Jiaotong University |
Mao, Yijun | Xi'an Jiaotong University |
Jiao, Chongshan | Xi'an Jiaotong University |
Ren, Pengju | Xi'an Jiaotong University |
Keywords: Intelligent Transportation Systems
Abstract: Lane changing is a fundamental but challenging operation for moving vehicles. Connected and Automated Vehicles (CAVs) enable autonomous vehicles to cooperate with each other to accomplish lane-changing tasks, benefiting from their communication ability. However, dispatching CAVs in mixed traffic remains difficult due to the stochastic behaviors and uncertain intentions of Human-Driven Vehicles (HDVs). To tackle this issue, this paper devises a coordination approach based on Conflict-Based Search (CBS) theory. First, HDVs are accurately modeled as constraints to enable the use of CBS in mixed traffic. Additionally, virtual goals are introduced to search CAVs’ priorities and outlets along with path finding. Furthermore, we optimize the performance of CBS in dense traffic by defining the concept of following vehicles. Experiments show that performance is improved by the new conflict-prioritizing rules and a heuristic value calculation method derived from following vehicles. Finally, we introduce vehicle grouping to extend the proposed method to extremely dense and large instances, at a scale of more than one hundred vehicles, without significant loss in efficiency.
|
|
10:30-12:00, Paper TuAT32-NT.2 | Add to My Program |
Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss |
|
Do, Tuong | AIOZ |
Nguyen, Binh | AIOZ |
Tran, Quang | AIOZ |
Nguyen, Hien | AIOZ |
Tjiputra, Erman | AIOZ |
Chiu, Te-Chuan | National Tsing Hua University |
Nguyen, Anh | University of Liverpool |
Keywords: Intelligent Transportation Systems
Abstract: Federated learning has been widely applied in autonomous driving since it enables training a learning model among vehicles without sharing users' data. However, data from autonomous vehicles usually suffer from the non-independent-and-identically-distributed (non-IID) problem, which may cause negative effects on the convergence of the learning process. In this paper, we propose a new contrastive divergence loss to address the non-IID problem in autonomous driving by reducing the impact of divergence factors from transmitted models during the local learning process of each silo. We also analyze the effects of contrastive divergence in various autonomous driving scenarios, under multiple network infrastructures, and with different centralized/distributed learning schemes. Our intensive experiments on three datasets demonstrate that our proposed contrastive divergence loss significantly improves the performance over current state-of-the-art approaches.
|
|
10:30-12:00, Paper TuAT32-NT.3 | Add to My Program |
ODD-Based Query-Time Scenario Mutation Framework for Autonomous Driving Scenario Databases |
|
Tang, Yun | University of Warwick |
Raj, Dhanush | University of Warwick |
Zhao, Xingyu | University of Warwick |
Zhang, Xizhe | University of Warwick |
Bruto da Costa, Antonio | University of Warwick |
Khastgir, Siddartha | WMG, University of Warwick, UK |
Jennings, Paul | WMG, University of Warwick, UK |
Keywords: Intelligent Transportation Systems
Abstract: Large-scale scenario databases may contain hundreds of thousands of scenarios for the verification and validation (V&V) of autonomous vehicles (AV). Scenarios in the database are often labelled with semantic Operational Design Domain (ODD) tags (e.g., WeatherRainy, RoadTypeHighway and ActorTypeTruck) to be queried via exact tag matching. Such a scenario database design has two major limitations: combinatorial scenario generation inevitably leads to many redundant scenarios, and each ODD query matches only a small number of scenarios in the database (0.2% in our case study), rendering most of the database's wealth wasted. We propose a novel scenario database design and the first ODD-based query-time scenario mutation framework to address these limitations. Our case study results show that the proposed framework has the potential to fully utilize all the database scenarios at query time while eliminating scenario redundancy in the database (in our case study, given the same ODD query, the number of final matched scenarios increased by 36 times, diversity increased by 99 times, and the scenario database utilization rate increased from 0.2% to 36%).
|
|
10:30-12:00, Paper TuAT32-NT.4 | Add to My Program |
Cooperation for Scalable Supervision of Autonomy in Mixed Traffic |
|
Hickert, Cameron | Massachusetts Institute of Technology |
Li, Sirui | MIT |
Wu, Cathy | MIT |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Human Factors and Human-in-the-Loop, Scalable Supervision
Abstract: Advances in autonomy offer the potential for dramatic positive outcomes in a number of domains, yet enabling their safe deployment remains an open problem. This work’s motivating question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The work formalizes this scalable supervision problem by considering remotely located human supervisors and investigating how autonomous agents can cooperate to achieve safety. This article focuses on the safety-critical context of autonomous vehicles (AVs) merging into traffic consisting of a mixture of AVs and human drivers. The analysis establishes high reliability upper bounds on human supervision requirements. It further shows that AV cooperation can improve supervision reliability by orders of magnitude and counterintuitively requires fewer supervisors (per AV) as more AVs are adopted. These analytical results leverage queuing-theoretic analysis, order statistics, and a conservative, reachability-based approach. A key takeaway is the potential value of cooperation in enabling the deployment of autonomy at scale. While this work focuses on AVs, the scalable supervision framework may be of independent interest to a broader array of autonomous control challenges.
|
|
10:30-12:00, Paper TuAT32-NT.5 | Add to My Program |
Hierarchical Learned Risk-Aware Planning Framework for Human Driving Modeling |
|
Ludlow, Nathan | Brigham Young University |
Lyu, Yiwei | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation, Human-Aware Motion Planning
Abstract: This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in simulation environments. Our methodology leverages a hierarchical, forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. This approach is grounded in multimodal trajectory prediction, using a deep neural network with LSTM-based social pooling to predict the trajectories of surrounding vehicles. These trajectories are used to compute forward-looking risk assessments along the ego vehicle's path, guiding its navigation. Our method aims to replicate human driving behaviors by learning parameters that emulate human decision-making during driving. We ensure that our model exhibits robust generalization capabilities by conducting simulations, employing real-world driving data to validate the accuracy of our approach in modeling human behavior. The results reveal that our model effectively captures human behavior, showcasing its versatility in modeling human drivers in diverse highway scenarios.
|
|
10:30-12:00, Paper TuAT32-NT.6 | Add to My Program |
DESTINE: Dynamic Goal Queries with Temporal Transductive Alignment for Trajectory Prediction |
|
Karim, Rezaul | York University |
Mohamad Alizadeh Shabestary, Soheil | Huawei Technologies Canada |
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Intelligent Transportation Systems, Intention Recognition, Long term Interaction
Abstract: Predicting temporally consistent road users' trajectories in a multi-agent setting is a challenging task due to the unknown characteristics of agents and their varying intentions. Besides using semantic map information and modeling interactions, it is important to build an effective mechanism capable of reasoning about behaviors at different levels of granularity. To this end, we propose Dynamic goal quErieS with temporal Transductive alIgNmEnt (DESTINE) method. Unlike prior approaches, our approach 1) dynamically predicts agents' goals irrespective of particular road structures, such as lanes, allowing the method to produce a more accurate estimation of destinations; 2) achieves map-compliant predictions by generating future trajectories in a coarse-to-fine fashion, where the coarser predictions at a lower frame rate serve as intermediate goals; and 3) uses an attention module designed to temporally align predicted trajectories via a masked attention operation. Using the common Argoverse benchmark dataset, we show that our method achieves state-of-the-art performance on various metrics, and further investigate the contributions of proposed modules via comprehensive ablation studies.
|
|
10:30-12:00, Paper TuAT32-NT.7 | Add to My Program |
Parallel Optimization with Hard Safety Constraints for Cooperative Planning of Connected Autonomous Vehicles |
|
Huang, Zhenmin | The Hong Kong University of Science and Technology |
Liu, Haichao | The Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents, Optimization and Optimal Control
Abstract: The development of connected autonomous vehicles (CAVs) facilitates the enhancement of traffic efficiency in complicated scenarios, yet difficulties remain in developing an effective and efficient coordination strategy for CAVs. In this paper, we formulate the cooperative autonomous driving task of CAVs as an optimal control problem with safety conditions enforced as hard constraints, and propose a computationally efficient parallel optimization framework to generate strategies for CAVs that improve travel efficiency while satisfying the hard safety constraints. Specifically, all constraints involved are addressed appropriately with convex approximation, such that the reformulated optimization problem is convex. Then, a parallel optimization algorithm is presented to solve the reformulated problem, with an embedded iterative nearest-neighbor search strategy to determine the optimal passing sequence. Notably, travel efficiency is enhanced and the computation burden considerably alleviated with the proposed approach. We also examine the proposed method in the CARLA simulator and perform thorough comparisons to demonstrate its effectiveness and efficiency.
|
|
10:30-12:00, Paper TuAT32-NT.8 | Add to My Program |
Editing Driver Character: Socially-Controllable Behavior Generation for Interactive Traffic Simulation |
|
Chang, Wei-Jer | University of California, Berkeley |
Tang, Chen | University of California Berkeley |
Li, Chenran | University of California, Berkeley |
Hu, Yeping | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Zhan, Wei | University of California, Berkeley |
Keywords: Intelligent Transportation Systems, Simulation and Animation
Abstract: Traffic simulation plays a crucial role in evaluating and improving autonomous driving planning systems. After being deployed on public roads, autonomous vehicles need to interact with human road participants with different social preferences (e.g., selfish or courteous human drivers). To ensure that autonomous vehicles take safe and efficient maneuvers in different interactive traffic scenarios, we should be able to evaluate autonomous vehicles against reactive agents with different social characteristics in the simulation environment. We propose a socially-controllable behavior generation (SCBG) model for this purpose, which allows the users to specify the level of courtesy of the generated trajectory while ensuring realistic and human-like trajectory generation through learning from real-world driving data. Specifically, we define a novel and differentiable measure to quantify the level of courtesy of driving behavior, leveraging marginal and conditional behavior prediction models trained from real-world driving data. The proposed courtesy measure allows us to auto-label the courtesy levels of trajectories from real-world driving data and conveniently train an SCBG model generating trajectories based on the input courtesy values. We examine SCBG on the Waymo Open Motion Dataset (WOMD) and show that we are able to control the SCBG model to generate realistic behaviors with desired courtesy levels. SCBG is able to identify different motion patterns of courteous behaviors according to the scenarios.
|
|
10:30-12:00, Paper TuAT32-NT.9 | Add to My Program |
Cognitive-Digital-Twin-Based Driving Assistance |
|
Diao, Junyu | Shanghaitech University |
Tang, Renzhi | ShanghaiTech University |
Gu, Yi | Shanghaitech Technology |
Tian, Sen | Southwestern University of Finance and Economics |
Jiang, Zhihao | ShanghaiTech University |
Keywords: Cognitive Modeling, Human-Centered Automation, Intention Recognition
Abstract: Advanced driver assistance systems (ADAS) have been developed to enhance driving safety by issuing timely warnings to drivers. However, current ADAS do not take into account the driver's cognitive state when delivering warnings, which can result in false alarms and impact the driver's trust in the system. To address this issue, we propose a Cognitive Digital-twin-based Assistance System (CDAS) that issues warnings tailored to the driver's perception of the driving environment and driving style. In this paper, we present a model of the driver's decision-making process that explicitly captures their perception of the driving environment, their utility evaluation of predicted future environments, and their driving style in terms of minimum acceptable risk. The cognitive digital twin of the driver is then created and updated by minimizing the discrepancy between the predicted and actual behaviors of the driver. With the cognitive digital twin, the CDAS warns the driver when there is a significant discrepancy between the predicted driving strategy based on partial observation and that based on full observation. This approach can more accurately identify risks that the driver is not aware of and provide warnings only when necessary. We conducted human and simulated experiments in a virtual driving environment, and our results demonstrate that our proposed CDAS has a similar perception of risky behaviors compared to humans. Furthermore, the digital twin learning framework can identi
|
|
TuAL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster I |
|
|
|
10:30-12:00, Paper TuAL-EX.1 | Add to My Program |
Soft Actuators and Metamaterials Based on Liquid-Vapor Phase Change |
|
Zhong, Yiding | Zhejiang University |
Tang, Wei | Zhejiang University |
Xu, Huxiu | Zhejiang University |
Zou, Jun | Zhejiang University |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Soft Sensors and Actuators
Abstract: By eliminating bulky pumps and valves, liquid-vapor phase change composite actuators have become a promising alternative to traditional soft pneumatic/hydraulic actuators, yet they remain limited by a lack of complex programmable and distributed deformation designs, continuous energy consumption to maintain deformation, and slow actuation response. Here, we introduce a class of programmable thermochromic phase change soft actuators (PTSAs) and a class of phase change mechanical metamaterials (PMMs). PTSAs integrate low-boiling-point fluids for actuation and thermochromic microcapsules for color change. The initial shape and deformation type of PTSAs can be programmed into a variety of forms with the innovative “two-dimensional” architecture. PMMs combine temperature-responsive low-boiling-point fluids and magnetically responsive carbonyl iron powders. Owing to their periodic lattice structures, PMMs can be customized to diverse structures and deformations by designing the pattern configurations and positional relationships of a series of basic actuating units. Relying on their magnetic responsiveness, PMMs can achieve magnetically assisted shape locking and energy storage. Theoretical models and finite element simulations are developed to guide the design process. Using PTSAs and PMMs, we develop a series of applications that demonstrate their broad prospects in soft robotics, wearable devices, flexible electronics, and other fields.
|
|
10:30-12:00, Paper TuAL-EX.2 | Add to My Program |
Active Mechanical Haptics with High-Fidelity Perceptions for Immersive Virtual Reality |
|
Zhang, Zhuang | Westlake University |
Jiang, Hanqing | Westlake University |
Keywords: Haptics and Haptic Interfaces, Soft Robot Materials and Design, Virtual Reality and Interfaces
Abstract: Human-centered mechanical sensory perceptions enable us to immerse ourselves in the physical environment by actively touching or holding objects so that we may feel their existence and their fundamental properties (e.g., stiffness or hardness). In a virtual environment, replicating these active perceptions can create authentic haptic experiences, serving as an essential supplement to visual and auditory experiences. We present here a first-person, human-triggered haptic device enabled by curved origami that allows humans to actively experience touching objects with various stiffness perceptions, from soft to hard and from positive to negative ranges. This new device represents a significant shift away from the third-person, machine-triggered, passive haptics currently in practice. The device is synchronized with the virtual environment by changing its configuration to adapt to various interactions, emulating body-centered physical perceptions including hardness, softness, and sensations of crushing and weightlessness. The high-fidelity stiffness perceptions achieve an unprecedented experience of “what a user sees or is immersed in, is what the user feels or steps on”. Quantitative evaluations demonstrate that the active haptic device creates a highly immersive virtual environment, outperforming existing vibration-based passive devices. These concepts and resulting technologies create new opportunities and application potential for a more authentic virtual world.
|
|
10:30-12:00, Paper TuAL-EX.3 | Add to My Program |
The Flying Shovel Picker: A Drone-Mounted Shovel-Based Rotational Dual Arm System for Picking up Indeterminate Objects |
|
Senevirathna, Nilupul Nuwan | Shibaura Institute of Technology |
Premachandra, Chinthaka | Shibaura Institute of Technology |
Keywords: Aerial Systems: Mechanics and Control, RGB-D Perception, Dual Arm Manipulation
Abstract: This research delves into the dynamic synergy between drones and specialized gripping tools, envisioning a future where drones excel in skillfully retrieving objects from challenging and remote locations. Within this context, the core challenges associated with the deployment of drones for object retrieval are meticulously examined and addressed. These challenges encompass the accurate identification and approach of target objects, the achievement of precise landings, the delicate yet secure gripping of objects, the development of predictive capabilities, and the maintenance of airborne stability during transit. This study introduces innovative and pragmatic solutions to these multifaceted challenges, propelling us closer to a transformative era in remote object retrieval with drones. The culmination of this research not only identifies these challenges but also charts a course toward a future where drones become indispensable tools in seamlessly accessing, securing, and transporting objects from previously inaccessible realms. Through these groundbreaking advancements, the potential applications of drone technology are poised for unprecedented expansion, promising a new era of precision and efficiency in remote object retrieval.
|
|
10:30-12:00, Paper TuAL-EX.4 | Add to My Program |
Design and Evaluation on a Compliant Lower Limb Exoskeleton for Elderly Assistance |
|
Jin, Yinan | Zhejiang University of Technology |
Li, Zetong | Zhejiang University of Technology |
Cai, Shibo | Zhejiang University of Technology |
Bao, Guanjun | Zhejiang University of Technology, China |
Keywords: Human-Centered Robotics, Compliant Joints and Mechanisms, Performance Evaluation and Benchmarking
Abstract: The world's elderly population is increasing rapidly. Because of deterioration in gait-related parameters such as muscle activity, many elderly people face difficulty in walking, the most basic activity of daily life. Indeed, elderly people's quality of life can be significantly affected by impaired mobility. Developing wearable lower-limb assistive exoskeletons is therefore a feasible solution. Assistive powered exoskeletons can provide additional torque to support various activities of mobility-impaired subjects, such as walking, sitting to standing, or standing to sitting. Rigid exoskeletons, inspired by industrial robots, usually have fewer degrees of freedom (DOFs) and require a large power supply for actuation, making them complicated for daily use. Compliant exoskeletons may have better application prospects in rehabilitation and elderly assistance: they are lighter, mechanically simpler, can be compliantly actuated when aiding the elderly, and are more widely applicable to different individuals. Therefore, the development of compliant lower-limb assistive exoskeletons has great social value and has attracted interest from researchers.
|
|
10:30-12:00, Paper TuAL-EX.5 | Add to My Program |
TDEVO: Towards a Robust All-Day Visual Odometry by a Multimodal Fusion System |
|
Gu, Gong | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Visual Tracking, Sensor Fusion, Vision-Based Navigation
Abstract: We present TDEVO, a novel multi-modal visual odometry system consisting of an LWIR (thermal) camera, a depth camera, and an event camera. The integration of these sensors enhances the system's resilience to variations in environmental illumination and texture information. Based on our proposed alignment algorithm, the system is extensible to incorporate additional modalities of visual sensors.
|
|
10:30-12:00, Paper TuAL-EX.6 | Add to My Program |
Non-Prehensile Object Transport by Nonholonomic Robots Connected by Linear Deformable Elements |
|
Zhi, Hui | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Flexible Robotics, Mobile Manipulation, Task and Motion Planning
Abstract: This paper presents a new method to transport objects with mobile robots automatically via non-prehensile actions. Our approach utilizes a pair of nonholonomic robots connected by a deformable tube to manipulate objects of irregular shapes toward target locations efficiently. To autonomously perform this task, we developed a local integrated planning and control strategy that solves the problem in two steps (viz. enveloping and transport) based on the model predictive control (MPC) framework. The deformable underactuated system is simplified by a linear kinematic model. The enveloping problem is formulated as the minimization of multiple criteria that represent the enclosing error of the object by the variable morphology system. The transport problem is tackled by formulating the non-prehensile dragging action as an inequality constraint specified by the body frame of the deformable system along the path. Reactive obstacle avoidance is ensured by a maximum margin-based term that utilizes the system's geometry and the feedback proximity to the environment. To validate the performance of the proposed methodology, we report a detailed experimental study with vision-guided robotic prototypes conducting multiple autonomous object transport tasks.
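The enveloping-and-transport strategy above is built on receding-horizon (MPC) optimization. As a hedged illustration of the receding-horizon principle only, and not of the paper's actual formulation, the following toy sketch picks, from a few candidate controls, the one minimizing a terminal cost over a short horizon for a 1-D system (all names and the cost are hypothetical):

```python
def mpc_step(x, goal, candidates, horizon, step):
    """Pick the candidate control that minimizes terminal distance to the
    goal when held constant over the horizon (toy 1-D rollout)."""
    def terminal(u):
        xs = x
        for _ in range(horizon):
            xs += u * step          # forward-simulate the 1-D system
        return abs(goal - xs)       # terminal cost
    return min(candidates, key=terminal)

# Choose among a few constant controls for a unit-goal reaching task.
u = mpc_step(x=0.0, goal=1.0, candidates=[-1.0, 0.0, 0.5, 1.0],
             horizon=5, step=0.1)
```

A real MPC solver would re-optimize over full control sequences at every step, subject to the enveloping and non-prehensile dragging constraints described in the abstract.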
|
|
10:30-12:00, Paper TuAL-EX.7 | Add to My Program |
Non-Prehensile Tool-Object Manipulation by Integrating Large Language Model-Based Planning and Manoeuvrability-Driven Controls |
|
Lee, Hoi-Yin | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Dual Arm Manipulation, Task and Motion Planning, Manipulation Planning
Abstract: Our paper presents a Large Language Model-based tool-object manipulation system with non-prehensile control for a dual-arm robotic system. Our platform offers manoeuvrability-driven controls for motion and path planning in tool manipulation for object transportation. Extensive experimentation demonstrates its effectiveness in delivering accurate performance by integrating natural-language instructions from humans with visual data, even in long-horizon tasks or environmentally constrained situations.
|
|
10:30-12:00, Paper TuAL-EX.8 | Add to My Program |
Formation Control of Multiple Nonholonomic Robots Along Parametric Curves |
|
Zhang, Bin | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Nonholonomic Mechanisms and Systems, Multi-Robot Systems
Abstract: Formation control technology can be used in various areas, such as object transportation, rescue, and environmental surveillance. In previous literature, the desired patterns that agents need to form are usually specified by absolute or relative positions, and those patterns are usually morphologically simple, such as polygons and lines. This causes problems in complicated cases where the desired pattern has a complex structure, such as a curve. In this work, we adopt a more general representation, parametric curves, for the desired pattern and develop novel formation control methods to drive agents to form those parametric curves. In this way, our method has high flexibility to deal with formation tasks with complex desired patterns.
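Representing the desired pattern as a parametric curve makes target generation straightforward: sample the curve at per-agent parameter values. A minimal sketch under that assumption (the even-spacing scheme and all names are hypothetical, not the authors' controller):

```python
import math

def curve_targets(curve, n_agents, t0=0.0, t1=1.0):
    """Sample n_agents evenly spaced parameter values on [t0, t1] and
    return the corresponding target points on the parametric curve."""
    ts = [t0 + (t1 - t0) * i / (n_agents - 1) for i in range(n_agents)]
    return [curve(t) for t in ts]

# Example desired pattern: a circular arc.
circle = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
targets = curve_targets(circle, 4, 0.0, 0.75)
```

Each agent would then be driven toward its sampled target by the nonholonomic formation controller.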
|
|
10:30-12:00, Paper TuAL-EX.9 | Add to My Program |
Addressing Long-Horizon Sparse Reward Robotics Tasks: An Approach Leveraging Variational Autoencoders for Implicit Subgoal Planning |
|
Wang, Fangyuan | The Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Task and Motion Planning, Manipulation Planning, Autonomous Agents
Abstract: Humans perform long-term tasks by breaking them down into simpler subtasks that require planning and reasoning capabilities. Although robots can perform complex tasks, current methods are limited to short-term or human-guided tasks. Empowering robots to reason and plan for long-term tasks could expand their capabilities and advance the field of robotic automation. We propose an algorithm to tackle the challenges of long-horizon tasks in robotics. It divides tasks into simpler subgoals using a Variational Autoencoder (VAE)-based Subgoal Generator, a Hindsight Sampler, and a Value Selector. The Subgoal Generator has an explicit encoder model that generates subgoals and an implicit decoder model that predicts the final goal. The Hindsight Sampler selects valid subgoals from an offline dataset, and the Value Selector filters optimal subgoals from subgoal candidates. We tested VAESI on several long-horizon tasks in simulation and the real world and achieved promising results compared to other baseline methods.
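The Hindsight Sampler and Value Selector can be illustrated with a toy sketch: intermediate states of an offline trajectory serve as subgoal candidates, and the candidate with the highest estimated value is kept. The 1-D states, the even-spacing heuristic, and all names below are hypothetical simplifications of the VAE-based pipeline described above:

```python
def hindsight_subgoals(trajectory, k):
    """Hindsight-style sampling: pick k evenly spaced intermediate
    states from an offline trajectory as subgoal candidates."""
    n = len(trajectory)
    idx = [int((i + 1) * n / (k + 1)) for i in range(k)]
    return [trajectory[j] for j in idx]

def select_subgoal(candidates, value_fn):
    """Value Selector: keep the candidate with the highest value."""
    return max(candidates, key=value_fn)

traj = list(range(10))               # toy 1-D states 0..9
cands = hindsight_subgoals(traj, 3)  # intermediate states as candidates
best = select_subgoal(cands, lambda s: -abs(s - 7))  # value peaks near goal 7
```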
|
|
10:30-12:00, Paper TuAL-EX.10 | Add to My Program |
A Novel CNN-BiLSTM Ensemble Model with Attention Mechanism for Sit-To-Stand Phase Identification Using Wearable Inertial Sensors |
|
Chen, Xin | Zhejiang University of Technology |
Cai, Shibo | Zhejiang University of Technology |
Yu, Longjie | Zhejiang University of Technology |
Li, Xiaoling | Zhejiang University of Technology |
Fan, Bingfei | Zhejiang University of Technology |
Du, Mingyu | Zhejiang University of Technology |
Liu, Tao | Zhejiang University |
Bao, Guanjun | Zhejiang University of Technology, China |
Keywords: Rehabilitation Robotics, Intention Recognition, Physical Human-Robot Interaction
Abstract: Sit-to-stand transition phase identification is vital in the control of a wearable exoskeleton robot for assisting patients to stand. We propose a method for segmenting and identifying the sit-to-stand phases using two inertial sensors. First, we divided the sit-to-stand transition into five phases, namely, the initial sitting phase, the flexion momentum phase, the momentum transfer phase, the extension phase, and the stable standing phase, based on the preprocessed acceleration and angular velocity data. We then employed a threshold method to recognize the initial sitting and stable standing phases. Finally, we designed a novel CNN-BiLSTM-Attention algorithm to identify the three transition phases. Fifteen subjects were recruited to perform sit-to-stand transition experiments. Combining acceleration and angular-velocity features was shown to improve model performance for transition phase identification, and the integration of the CNN, Bi-LSTM, and Attention modules was validated. The experimental results showed that the proposed CNN-BiLSTM-Attention algorithm achieved the highest average classification accuracy of 99.5% for all five phases when compared to both traditional machine learning algorithms and deep learning algorithms on our customized dataset. The proposed sit-to-stand phase recognition algorithm could serve as a foundation for the control of wearable exoskeletons.
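The threshold step for detecting the initial sitting and stable standing phases can be sketched as follows: samples whose angular-velocity magnitude stays below a threshold are treated as static, and the leading and trailing static runs bound the transition. The threshold value and signal below are hypothetical:

```python
def threshold_segments(gyro_mag, thresh):
    """Label samples below thresh as static; the leading static run is
    taken as initial sitting and the trailing static run as stable
    standing. Returns (first_moving_index, last_moving_index)."""
    labels = ['static' if g < thresh else 'moving' for g in gyro_mag]
    first_move = next(i for i, l in enumerate(labels) if l == 'moving')
    last_move = len(labels) - 1 - next(
        i for i, l in enumerate(reversed(labels)) if l == 'moving')
    return first_move, last_move

# Toy angular-velocity magnitude trace for one sit-to-stand trial.
signal = [0.1, 0.2, 1.5, 2.0, 1.8, 0.9, 0.2, 0.1]
start, end = threshold_segments(signal, 0.5)
```

The samples between `start` and `end` would then be passed to the learned classifier for the three transition phases.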
|
|
10:30-12:00, Paper TuAL-EX.11 | Add to My Program |
Safe Hierarchical Reactive Control of a Magnetic Microrobot with Multiple Constraints and Unknown Dynamics |
|
Liu, Yueyue | Jiangnan University |
Wang, Haoyu | Jiangnan University |
Fan, Qigao | Jiangnan University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Nanomanufacturing
Abstract: In this presentation, three contributions are summarized as follows: 1. A hierarchical reactive control framework, i.e., a high-level kinematic trajectory-reshaping controller and a low-level dynamics tracking controller, is designed for magnetic microrobots, ensuring both safety and stability. 2. A control barrier function (CBF)-based kinematic planning method is designed to address various constrained environments for magnetic microrobots, achieving reactive motion within the limited constraint range. 3. To obtain robust and precise tracking performance for the magnetic microrobot dynamics, a control framework employing an adaptive neural network (NN) is crafted to accommodate model uncertainties and system nonlinearities. Besides, the stability and convergence of the low-level controller are analysed.
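The CBF-based planning idea can be shown in closed form for the simplest case: a 1-D single integrator x' = u with barrier h(x) = x_max - x, where the safety condition h' + alpha*h >= 0 reduces to an upper bound on u. This is a hedged simplification of the paper's kinematic planner, with hypothetical names:

```python
def cbf_filter(u_des, x, x_max, alpha=1.0):
    """Closed-form CBF safety filter for a 1-D single integrator x' = u
    with barrier h(x) = x_max - x. Enforcing h' + alpha*h >= 0 gives
    u <= alpha * (x_max - x); modify the desired control minimally."""
    return min(u_des, alpha * (x_max - x))

# Desired control 2.0 would violate the barrier near x_max = 1.0.
u = cbf_filter(u_des=2.0, x=0.9, x_max=1.0, alpha=1.0)  # clipped to alpha*h(x)
```

In higher dimensions with multiple constraints this minimal modification becomes a quadratic program rather than a scalar clip.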
|
|
10:30-12:00, Paper TuAL-EX.12 | Add to My Program |
3D Localization of Object Buried within Granular Materials Using a 3-Axis Tactile Sensor |
|
Chen, Zhengqi | Queen Mary University of London |
Versace, Elisabetta | Queen Mary University of London |
Jamone, Lorenzo | Queen Mary University London |
Keywords: Force and Tactile Sensing, Localization, Range Sensing
Abstract: In contrast to vision, which provides rich spatial information about objects, tactile sensing offers sparse data concentrated around the contact point. However, tactile sensing becomes crucial for reliable interaction with the environment, such as in the search for buried objects. Existing methods for this task rely solely on the drag force at a single point of the end-effector, posing challenges for 3D localization. This paper presents an alternative approach utilizing a distributed 3-axis tactile sensor to predict the real-time 3D localization (direction, depth, and distance) of an object buried within granular material. Our learning-based model leverages the symmetry of tactile array feedback, eliminating the requirement for manual measurements of granular media properties. The proposed model was evaluated on real-world data with objects buried in various positions, revealing that employing multiple distributed tactile units enhances prediction accuracy.
|
|
10:30-12:00, Paper TuAL-EX.13 | Add to My Program |
Dynamics Modeling and Trajectory Tracking Control of a 6-DOF Modular Serial Orthogonal Manipulator |
|
Wen, Shuhuan | Yanshan University |
Min, Jiatai | Yanshan University |
Yu, Zhanqi | Yanshan University |
Li, Yunxiao | Yanshan University |
|
|
10:30-12:00, Paper TuAL-EX.14 | Add to My Program |
Micro-X4: An Origami Robot for Biological Manipulation |
|
Feng, Bo | Zhejiang University |
Keywords: Micro/Nano Robots, Parallel Robots, Biological Cell Manipulation
Abstract: We demonstrate a 4-DoF micro origami parallel robot, Micro-X4, driven by 4 servo motors, which offers a large workspace, high stiffness, and excellent repeatability. Configurable tools, such as a knife, needle, or micro gripper (one or more at a time), can be assembled on its output platform, making cell puncture, cutting, and injection efficient. We believe that Micro-X4 has a promising future in microassembly and microsurgery.
|
|
10:30-12:00, Paper TuAL-EX.15 | Add to My Program |
Actual Shape-Based Obstacle Avoidance Synthesized by Velocity–Acceleration Minimization for Redundant Manipulators: An Optimization Perspective |
|
Ma, Boyu | Harbin Institute of Technology |
Liu, Yang | Harbin Institute of Technology |
Xie, Zongwu | Harbin Institute of Technology, China |
Li, Yuntao | Harbin Institute of Technology |
Shi, Jiaxiao | Harbin Institute of Technology |
Keywords: Collision Avoidance, Manipulation Planning, Redundant Robots
Abstract: From the optimization perspective, this article proposes a novel actual shape-based obstacle avoidance synthesized by velocity–acceleration minimization (ASOA-VAM) scheme that performs operational tasks safely in a complex environment utilizing redundant manipulators. Concretely, an actual shape-based obstacle avoidance (ASOA) strategy with a variable magnitude escape acceleration using the Gilbert–Johnson–Keerthi distance algorithm is presented. Trajectory tracking, the end-effector’s errors feedback, and the joint multilevel physical limits (joint angle, -velocity, and -acceleration limits) avoidance are also incorporated into this optimization scheme. Meanwhile, the velocity–acceleration minimization (VAM) measure is developed. Combining the ASOA strategy with the VAM measure, the ASOA-VAM scheme is formed and further reformulated as a quadratic program (QP). Moreover, a recurrent neural network with theoretically provable convergence is designed to solve the QP online. Finally, simulations, comparisons, and experiments of a 7-degree-of-freedom manipulator with engineering applications illustrate the ASOA-VAM scheme’s effectiveness, accuracy, superiority, and physical realizability.
|
|
10:30-12:00, Paper TuAL-EX.16 | Add to My Program |
Vision-Based Tactile Information Extraction and Localization for Dexterous Grasping |
|
Yan, Teng | Shenzhen University |
Cai, Yaobang | Shenzhen Technology University |
Xia, Tian | Shenzhen University |
Li, Wenxian | Shenzhen Technology University |
Zhang, Yang | Shenzhen Technology University |
Keywords: Perception for Grasping and Manipulation, Grasping
Abstract: Due to the difficulty of acquiring tactile perception information during dexterous robotic hand grasping and the complexity of multi-finger contact, robotic dexterous hand grasping has become a challenging problem. This study accomplishes two main tasks: 1) acquiring object surface tactile information using only vision, and 2) real-time estimation of fingertip contact coordinates during dexterous hand grasping. The implementation methods are: 1) proposing a point cloud texture feature extraction method based on normal vectors and grayscale value variance; 2) utilizing the NVIDIA Isaac Sim platform to build the Dexterous Hand (RH8D) model, ensuring the simulation environment matches the physical properties and operational conditions of the real world, and simulating human-like grasping with 2, 3, 4, and 5 fingers on everyday items of different materials and sizes, to accurately calculate the spatial coordinates of fingertip contact points. Experimental results show that this method can extract clear object surface texture information through vision alone and accurately locate the contact points of the dexterous hand (with precision up to 10^-3 m) in real time, providing a low-cost method for acquiring multimodal sensory information for robotic grasping technology. To promote scientific transparency and support the reproducibility of this study, the related source code and dataset have been made open source. Our project is available at https://github.com/Fenbid0605
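The grayscale-variance part of the texture feature can be sketched directly: for each point, compute the variance of grayscale values over its neighborhood. The neighborhood lists and values below are hypothetical, and the paper's method also uses normal vectors, which this sketch omits:

```python
def local_variance(gray, neighbors):
    """Per-point texture feature: variance of grayscale values over
    each point's neighborhood (lists of indices into gray)."""
    feats = []
    for nbrs in neighbors:
        vals = [gray[j] for j in nbrs]
        mean = sum(vals) / len(vals)
        feats.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return feats

# Toy per-point grayscale values and two hypothetical neighborhoods.
gray = [10.0, 10.0, 10.0, 30.0]
nbrs = [[0, 1, 2], [1, 2, 3]]
f = local_variance(gray, nbrs)  # flat region -> 0, textured region -> high
```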
|
|
10:30-12:00, Paper TuAL-EX.17 | Add to My Program |
Living Cells Actuated Bio-Syncretic Swimmer Based on Wireless Electric and Magnetic Control |
|
Yang, Lianchao | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Chuang | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Qi | Shenyang Institute of Automation, Chinese Academy of Sciences |
Wang, Wenxue | Shenyang Institute of Automation, CAS |
Xi, Ning | The University of Hong Kong |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Biologically-Inspired Robots, Biomimetics, Soft Robot Materials and Design
Abstract: Bio-syncretic robots actuated by living cells have attracted considerable attention due to their great potential to enhance robotic actuation performance, and significant progress has been made in recent years. However, most current actuation control methods used for bio-syncretic robots may restrict the robots' kinematic dexterity. In this work, a bio-syncretic bionic dolphin swimmer actuated by cultured living muscle tissue and steered by a remote magnetic field has been developed. The robot demonstrates wirelessly controllable swimming with desired speed and direction by adjusting the pulse stimulation of the electrodes and the remote magnetic field of the electromagnetic coil. This work may benefit the development not only of bio-syncretic robots, but also of micro and miniature robots consisting of electromechanical systems.
|
|
10:30-12:00, Paper TuAL-EX.18 | Add to My Program |
Design and Optimization of a Disinfection Robot System Framework Based on Clustering Model |
|
Ye, Jiajie | The University of Hong Kong |
Sheng, Yongji | The University of Hong Kong |
Wang, Siyu | The University of Hong Kong |
Ma, Ye | The University of Hong Kong |
Liu, Xinyu | The University of Hong Kong |
Tsoi, Tracy | The University of Hong Kong |
Xi, Ning | The University of Hong Kong |
Keywords: Mobile Manipulation, AI-Enabled Robotics, Service Robotics
Abstract: This work proposes a design and optimization method for a disinfection robot system framework based on a clustering model. The framework consists of a clustering module and a foundation framework. The clustering module performs clustering based on the Event-based Spatial Temporal and Logic Relationship (CESTLR), considering the spatial, temporal, and logical levels, which can effectively detect and associate normal/abnormal events in a dynamic environment. The foundation framework generates an initial task schedule based on the clustering results and interacts with the clustering module through real-time data to optimize the task schedule. This method can reliably cope with unexpected/uncertain events in a dynamic environment and improve service performance. Experimental results show that the proposed method outperforms baseline algorithms in terms of detection capability, service capability, and scheduling capability.
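The spatial and temporal levels of the event clustering can be illustrated with a greedy sketch: an event joins an existing cluster if it is close to that cluster's latest event in both space and time, and otherwise starts a new cluster. The logical level of CESTLR is omitted, and all thresholds and names are hypothetical:

```python
def cluster_events(events, d_max, t_max):
    """Greedy spatio-temporal clustering of (x, y, t) events: join a
    cluster whose last event is within d_max in space and t_max in
    time; otherwise open a new cluster."""
    clusters = []
    for x, y, t in events:
        for c in clusters:
            cx, cy, ct = c[-1]
            dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if abs(t - ct) <= t_max and dist <= d_max:
                c.append((x, y, t))
                break
        else:
            clusters.append([(x, y, t)])
    return clusters

# Three nearby events and one spatial outlier.
evs = [(0, 0, 0.0), (0.5, 0, 0.2), (5, 5, 0.3), (0.6, 0.1, 0.4)]
cl = cluster_events(evs, d_max=1.0, t_max=0.5)
```

In the framework above, cluster membership would then drive task scheduling for the disinfection robot.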
|
|
10:30-12:00, Paper TuAL-EX.19 | Add to My Program |
Multi-Task Learning for Monocular Depth Estimation and Semantic Edge Detection |
|
Wu, Deming | Shanghaitech |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Visual Learning
Abstract: Depth estimation and semantic edge detection are two critical tasks in computer vision that have progressed significantly. How to jointly predict depth and semantic edges has yet to be explored. In this work, we propose a flexible two-branch framework in which the two tasks take advantage of each other. Specifically, an Enhanced Edge Weighting strategy is designed for the semantic edge detection branch, which learns weight information from a by-product of the depth branch, the depth edge, to enhance edge perception in features. Meanwhile, depth estimation benefits from semantic edge detection through the Depth Edge Semantic Classification module. Furthermore, a double reconstruction approach and a semantic edge-guided disparity smoothing loss are presented to mitigate the ambiguities of self-supervised depth estimation. Experiments on the Cityscapes dataset demonstrate that our framework outperforms the state-of-the-art method in depth estimation and significantly improves semantic edge detection.
|
|
10:30-12:00, Paper TuAL-EX.20 | Add to My Program |
Semantic Segmentation Based on Feature Domain Adaptation |
|
Li, Jiao | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: To address the annotation cost, unsupervised domain adaptation is proposed to adapt the network trained on labeled synthetic data to unlabeled real-world data. However, most of these methods focus on domain distributions in the input and output stages. Therefore, a novel network named FeatDANet is presented to align feature-level domain distributions at each encoder layer. Specifically, two attention-based modules, IFAM and DFLM, are designed and implemented by mixing queries and keys between domains for advisable domain adaptation. Furthermore, FeatDANet is constructed as a self-training network with three weight-sharing branches, and an improved pseudo-labels learning strategy is suggested by identifying more confident pseudolabels and maximizing the use of pseudo-labels. Extensive experiments show that FeatDANet achieves state-of-the-art performances on the tasks of GTA→Cityscapes and Synthia→Cityscapes.
|
|
10:30-12:00, Paper TuAL-EX.21 | Add to My Program |
Self-Supervised Visual-Inertial Odometry with Scale Recovery |
|
Zhang, Tianyu | Chinese Academy of Sciences |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception
Abstract: Accurate localization for intelligent robots remains a significant challenge, and self-supervised visual-inertial odometry (VIO) has emerged as a promising solution. However, existing self-supervised VIO works treat inertial information as an ordinary data input, forfeiting its ability to recover absolute scale and ignoring the modality difference between acceleration and angular velocity in inertial data. We propose a novel self-supervised VIO framework that augments the odometry-related information implicit in inertial data. In the specific implementation, a self-attention-based IMU network is designed to denoise the raw IMU data, and poses are obtained from the denoised IMU data through an integrator. A self-attention-based Scale Recovery module is proposed to recover the absolute scale by constructing a pose consistency constraint between it and the visual-inertial fused pose. Additionally, to avoid the interference of acceleration with rotation estimation, we designed a Decoupled PoseNet that employs different inputs and networks to learn rotation and translation. Odometry, scale, and depth evaluations on the KITTI odometry and Malaga datasets all show that our framework achieves state-of-the-art performance.
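The integrator that turns denoised IMU data into poses can be sketched, for one axis with gravity already removed, as plain Euler integration (a hypothetical simplification of the learned pipeline above):

```python
def integrate_imu(p0, v0, accels, dt):
    """Euler-integrate denoised accelerations (gravity already removed)
    into velocity and position along one axis."""
    p, v = p0, v0
    for a in accels:
        v += a * dt   # velocity update
        p += v * dt   # position update
    return p, v

# Constant 1 m/s^2 acceleration for 2 s at 2 Hz sampling.
p, v = integrate_imu(0.0, 0.0, [1.0, 1.0, 1.0, 1.0], 0.5)
```

A full VIO integrator would also propagate orientation from the gyroscope and rotate accelerations into the world frame before integrating.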
|
|
10:30-12:00, Paper TuAL-EX.22 | Add to My Program |
Geometry-Based Efficient Solution for PnP Problem |
|
Sun, Qixuan | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Mengli | Shanghai Institute of Microsystem and Information Technology |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Vision-Based Navigation
Abstract: The Perspective-n-Point (PnP) problem aims to estimate pose from known 3D map points and their projections. Efficient PnP (EPnP), one of the classical PnP solvers, represents the camera pose with control points, which are easier to estimate using a least-squares (LS) formulation. However, the geometry refinement procedure performed by most EPnP-based methods is separated from the solution of the LS formulation, which makes it difficult to balance minimizing the loss function against preserving the essential geometric properties of the control points. To handle this problem, we integrated geometry constraints into the control-point formulation and reformulated the LS problem as a quadratically constrained quadratic program. We derived an innovative analytical solution to the constrained EPnP problem, which is faster than the customarily applied numerical methods. An uncertainty-aware least-squares registration procedure is designed to compute the camera pose from the control points. Experiments on synthetic and real data show that our methods outperform other state-of-the-art approaches.
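The control-point representation at the heart of EPnP expresses each 3-D point as an affine combination of four control points, p = sum_j alpha_j c_j with sum_j alpha_j = 1, which reduces to a 4x4 linear solve. A minimal sketch assuming NumPy is available (the control points chosen are hypothetical):

```python
import numpy as np

def barycentric_coords(ctrl_pts, p):
    """EPnP-style representation: express a 3-D point p as an affine
    combination of 4 control points, p = sum_j alpha_j c_j with
    sum_j alpha_j = 1, via a 4x4 linear solve."""
    C = np.vstack([np.asarray(ctrl_pts, dtype=float).T,  # 3x4 coordinates
                   np.ones(4)])                          # affine constraint row
    b = np.append(np.asarray(p, dtype=float), 1.0)
    return np.linalg.solve(C, b)

# Simplex control points: origin plus the three unit axes.
ctrl = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
alpha = barycentric_coords(ctrl, (0.2, 0.3, 0.1))
```

Once every map point is written this way, estimating the pose reduces to estimating the four control points in the camera frame.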
|
|
10:30-12:00, Paper TuAL-EX.23 | Add to My Program |
Development of a Controllable Damping Constant Force Suspended Backpack |
|
Ju, Haotian | Harbin Institute of Technology |
Zhao, Sikai | Harbin Institute of Technology |
Guo, Songhao | Harbin Institute of Technology |
Li, Hongwu | Harbin Institute of Technology |
Wang, Ziqi | Harbin Institute of Technology |
Liu, Junchen | Harbin Institute of Technology |
Xiong, Quan | National University of Singapore |
Zhao, Jie | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Prosthetics and Exoskeletons, Reinforcement Learning, Mechanism Design
Abstract: Suspended backpacks have been acknowledged for their advantages in load carriage. However, existing suspended backpacks cannot eliminate the accelerative vertical force due to their nonzero suspension stiffness, limiting their adaptability to different load carriage tasks. In this paper, a controllable-damping constant-force suspended backpack was developed. The constant force mechanism was designed to bring the suspension stiffness close to zero and minimize the inertia force of the load. The controllable damping device was developed to resolve the mismatch between the load and the constant force mechanism caused by the system's friction. A controllable damping factor control method based on a Q-learning algorithm was proposed and simulated. Calibration experiments verified the theory of the controllable damping device. Load displacement experiments showed that the backpack could return the load to the center from a slide-deviation position, thus preventing the load from colliding with the limit position. The variable damping factor control method obtained using the Q-learning algorithm was superior to the constant damping factor control method. The maximum accelerative vertical force of the load was reduced by 92% compared to an ordinary backpack.
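The Q-learning component can be illustrated with a single tabular update, Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a)). The states, actions, and reward below are hypothetical placeholders for the damping-factor control problem:

```python
def q_update(Q, s, a, r, s_next, actions, lr=0.1, gamma=0.9):
    """One tabular Q-learning update on a dict-backed Q table:
    Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lr * (r + gamma * best_next - old)
    return Q

# Toy transition: raising the damping factor re-centers a sliding load.
Q = {}
q_update(Q, 'slide', 'raise_damping', 1.0, 'centered',
         ['raise_damping', 'lower_damping'])
```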
|
|
10:30-12:00, Paper TuAL-EX.24 | Add to My Program |
Kinodynamic Motion Planning Via Funnel Control under High-Level Tasks with Deadlines in Obstacle-Cluttered Environments |
|
Verginis, Christos | Uppsala University |
Dimarogonas, Dimos V. | KTH Royal Institute of Technology |
Kavraki, Lydia | Rice University |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Formal Methods in Robotics and Automation
Abstract: We consider the problem of motion planning for high-dimensional robots under uncertain dynamics and timed temporal-logic specifications in obstacle-cluttered environments. Such specifications can describe complex, high-level objectives encoding both time and spatial constraints. Current approaches cannot address uncertainty in the dynamics while simultaneously accommodating time constraints for highly articulated robots, such as robotic manipulators. We propose an algorithm that combines sampling-based motion planning with feedback control and formal-verification techniques such that a robot with uncertain dynamics accomplishes a timed temporal-logic specification. In particular, we use Kinodynamic motion planning via Funnel control (KDF) to achieve timed robot navigation; KDF combines geometric sampling-based motion planning with funnel control to efficiently synthesize motion controllers that achieve safe robot navigation in the workspace in predetermined time intervals without using any information on the dynamics (a.k.a. differential constraints) or potential exogenous disturbances. This allows us to abstract the robot motion as a timed transition system and, using formal- verification methodologies, to synthesize controllers that achieve the timed temporal-logic specification. Experimental results on a 6-DOF robotic arm verify the efficiency of the proposed approach.
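Funnel control keeps the tracking error inside a prescribed, shrinking performance bound. A common choice (a hedged sketch, not necessarily the exact funnel used in KDF) is the exponential funnel rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf:

```python
import math

def funnel_bound(t, rho0=1.0, rho_inf=0.1, decay=2.0):
    """Exponentially shrinking performance funnel: starts at rho0 and
    converges to rho_inf with the given decay rate."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf

def inside_funnel(error, t):
    """True when the tracking error respects the funnel at time t."""
    return abs(error) < funnel_bound(t)
```

A funnel controller scales its feedback gain as the error approaches the bound, which is what permits navigation within predetermined time intervals without a dynamics model.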
|
|
10:30-12:00, Paper TuAL-EX.25 | Add to My Program |
A Variable Stiffness Modular Structure |
|
Li, Xiaozheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Gao, Xing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Cao, Chongjing | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Keywords: Mechanism Design, Actuation and Joint Mechanisms
Abstract: Hybridization of rigidity and flexibility has been a research focus in variable stiffness structures. However, existing technologies suffer from issues such as large volume, complex mechanisms, and a limited range of stiffness variation. In this study, we propose an electrostatically adsorbed variable stiffness module, which leverages the high adsorption capacity of electrostatics to achieve light weight, a wide stiffness-variation range, and rigidity-flexibility hybridization. This approach provides a novel solution for achieving rigidity-flexibility hybridization in flexible manipulators and holds promise for application in minimally invasive surgical robotic arms. By endowing the robotic arm with good compliance for safe human-robot interaction while maintaining sufficient rigidity to enhance operational precision and force output, this solution could significantly improve the performance of surgical robotic systems.
|
|
10:30-12:00, Paper TuAL-EX.26 | Add to My Program |
MRSRS: Modular Reconfigurable Space Robotic System for Future Space Exploration |
|
Li, Yuntao | Harbin Institute of Technology |
Zhao, Jingdong | Harbin Institute of Technology |
Guo, Chuangqiang | Harbin Institute of Technology |
Xu, Zichun | Harbin Institute of Technology, School of Mechatronics Engineeri |
Ma, Boyu | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Space Robotics and Automation, Product Design, Development and Prototyping, Field Robots
Abstract: Space robotic systems are the core equipment for on-orbit service missions. This work introduces the design, development, and experimental investigation of an advanced modular reconfigurable space robotic system (MRSRS) for future space exploration. The MRSRS achieves reconfiguration in two ways: changing its DOF through a standard interface (SI) and changing its link lengths through a passive telescopic arm (PTA). The SI and PTA are the core technologies of the MRSRS. First, a lightweight and reliable standard interface (PETLOCK) with load, power, and data transfer capacities is developed. PETLOCK features a genderless 3D face and an innovative locking mechanism, which give it a large misalignment tolerance and a high mechanical load capacity. PETLOCK achieves power and data transmission through POGO pins and can additionally obtain visual information through a camera. Meanwhile, a PTA is developed with a locking mechanism inside. When the two ends of the robotic manipulator are fixed, the locking mechanism unlocks, and the PTA's length change is achieved by joint trajectory planning. The PTA adopts the same locking mechanism as PETLOCK. Finally, reconfiguration experiments on the MRSRS were conducted. Compared to fixed-configuration space robots, the MRSRS offers high adaptability, high robustness, and long-term economy, and will open up a new field for future space exploration.
|
|
10:30-12:00, Paper TuAL-EX.27 | Add to My Program |
Co-Bot for Utility Solar Farms |
|
Santillan, Christopher | Iowa State University |
Bhattacharya, Sourabh | Iowa State University |
Daining, Stephen | Vermeer Corporation |
Keywords: Human Factors and Human-in-the-Loop, Human-Robot Collaboration, Human-Aware Motion Planning
Abstract: Can a co-bot help the solar industry with its labor shortage? The solar industry is expected to be the backbone of renewable energy, with exponential growth projected in the upcoming decades. However, the biggest challenge to this growth is a labor shortage. Installing solar panels can be an exhausting, repetitive, and tedious job that most people would avoid. We propose a collaborative robot that can work with a human to simplify this task: the robot does the heavy lifting, while the human does the fine-tuning of the installation.
|
|
TuBA1-CC Award Session, CC-Main Hall |
Add to My Program |
Human-Robot Interaction |
|
|
Chair: Laschi, Cecilia | National University of Singapore |
Co-Chair: Soh, Harold | National University of Singapore |
|
13:30-15:00, Paper TuBA1-CC.1 | Add to My Program |
POLITE: Preferences Combined with Highlights in Reinforcement Learning |
|
Holk, Simon | KTH Royal Institute of Technology |
Marta, Daniel | KTH Royal Institute of Technology |
Leite, Iolanda | KTH Royal Institute of Technology |
Keywords: Human Factors and Human-in-the-Loop, Reinforcement Learning, Representation Learning
Abstract: Many solutions to the challenge of robot learning have been devised, namely through exploring novel ways for humans to communicate complex goals and tasks in reinforcement learning (RL) setups. One line of work that has seen recent research interest addresses the problem directly by considering human feedback as preferences between pairs of trajectories (sequences of state-action pairs). However, when a single preference is simply attributed to a pair of trajectories that contain many agglomerated steps, key pieces of information are lost in the process. We extend the standard definition of preferences to account for highlights: state-action pairs of relatively high information (high/low reward) within a preferred trajectory. To incorporate this additional information, we design novel regularization methods within a preference learning framework. We present a method that greatly reduces the necessary amount of preferences by permitting the highlighting of favoured trajectories, in order to reduce the entropy of the credit assignment. We show the effectiveness of our work in both simulation and a user study, which analyzes the feedback given and its implications. We also use the total collected feedback to train a robot policy for socially compliant trajectories in a simulated social navigation environment. We release code and video examples at https://sites.google.com/view/rl-polite
|
|
13:30-15:00, Paper TuBA1-CC.2 | Add to My Program |
CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting |
|
Schaldenbrand, Peter | Carnegie Mellon University |
Parmar, Gaurav | Carnegie Mellon University |
Zhu, Jun-Yan | Carnegie Mellon University |
McCann, James | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Keywords: Human-Robot Collaboration, Art and Entertainment Robotics, Deep Learning Methods
Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but the interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models; however, pre-trained models in the context of real-world co-painting do not perform well because they (1) do not understand the constraints and abilities of the robot and (2) cannot perform co-painting without making unrealistic edits to the canvas and overwriting content. We propose a self-supervised fine-tuning procedure that can tackle both issues, allowing the use of pre-trained state-of-the-art text-image alignment models with robots to enable co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA, both from a blank canvas and one with human created work. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a pre-trained text-to-image model, showcasing promising results as an effective method for reducing sim-to-real gaps.
|
|
13:30-15:00, Paper TuBA1-CC.3 | Add to My Program |
MateRobot: Material Recognition in Wearable Robotics for People with Visual Impairments |
|
Zheng, Junwei | Karlsruhe Institute of Technology |
Zhang, Jiaming | Karlsruhe Institute of Technology |
Yang, Kailun | Hunan University |
Peng, Kunyu | Karlsruhe Institute of Technology |
Stiefelhagen, Rainer | Karlsruhe Institute of Technology |
Keywords: Human-Centered Robotics, Object Detection, Segmentation and Categorization
Abstract: People with Visual Impairments (PVI) typically recognize objects through haptic perception. Knowing objects and materials before touching them is desired by the target users but under-explored in the field of human-centered robotics. To fill this gap, in this work, a wearable vision-based robotic system, MateRobot, is established for PVI to recognize materials and object categories beforehand. To address the computational constraints of mobile platforms, we propose a lightweight yet accurate model, MateViT, to perform pixel-wise semantic segmentation, simultaneously recognizing both objects and materials. Our method achieves 40.2% and 51.1% mIoU on the COCOStuff-10K and DMS datasets respectively, surpassing the previous method with gains of +5.7% and +7.0%. Moreover, in a field test with participants, our wearable system reaches a score of 28 on the NASA Task Load Index, indicating low cognitive demand and ease of use. MateRobot demonstrates the feasibility of recognizing material properties through visual cues and offers a promising step towards improving the functionality of wearable robots for PVI. The source code has been made publicly available at https://junweizheng93.github.io/publications/MATERobot/MATERobot.html.
|
|
13:30-15:00, Paper TuBA1-CC.4 | Add to My Program |
Robot-Assisted Navigation for Visually Impaired through Adaptive Impedance and Path Planning |
|
Balatti, Pietro | Istituto Italiano Di Tecnologia |
Ozdamar, Idil | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Sirintuna, Doganay | HRI2 Lab., Istituto Italiano Di Tecnologia. Dept. of Informatics |
Fortini, Luca | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Gandarias, Juan M. | University of Malaga |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human-Centered Robotics, Human-Aware Motion Planning, Physical Human-Robot Interaction
Abstract: This paper presents a framework for navigating visually impaired people through unfamiliar environments by means of a mobile manipulator. The human-robot system consists of three key components: a mobile base, a robotic arm, and the human subject, who is guided by the robotic arm via physical coupling of their hand with the cobot's end-effector. These components, receiving a goal from the user, traverse a collision-free set of waypoints in a coordinated manner, while avoiding static and dynamic obstacles through an obstacle avoidance unit and a novel human guidance planner. To this end, we also present a leg-tracking algorithm that utilizes 2D LiDAR sensors integrated into the mobile base to monitor the human pose. Additionally, we introduce an adaptive pulling planner responsible for guiding the individual back to the intended path if they veer off course. This is achieved by establishing a target arm end-effector position and dynamically adjusting the impedance parameters in real time through an impedance tuning unit. To validate the framework, we present a set of experiments, both in laboratory settings with 12 healthy blindfolded subjects and as a proof-of-concept demonstration in a real-world scenario.
|
|
13:30-15:00, Paper TuBA1-CC.5 | Add to My Program |
Incremental Learning of Full-Pose Via-Point Movement Primitives on Riemannian Manifolds |
|
Daab, Tilman | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Dreher, Christian R. G. | Karlsruhe Institute of Technology (KIT) |
Meixner, Andre | Karlsruhe Institute of Technology (KIT) |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Incremental Learning, Learning from Demonstration
Abstract: Movement primitives (MPs) are compact representations of robot skills that can be learned from demonstrations and combined into complex behaviors. However, merely equipping robots with a fixed set of innate MPs is insufficient to deploy them in dynamic and unpredictable environments. Instead, the full potential of MPs remains to be attained via adaptable, large-scale MP libraries. In this paper, we propose a set of seven fundamental operations to incrementally learn, improve, and re-organize MP libraries. To showcase their applicability, we provide explicit formulations of the spatial operations for libraries composed of Via-Point Movement Primitives (VMPs). By building on Riemannian manifold theory, our approach enables the incremental learning of all parameters of position and orientation VMPs within a library. Moreover, our approach stores a fixed number of parameters, thus complying with the essential principles of incremental learning. We evaluate our approach by incrementally learning a VMP library from sequentially provided motion capture data.
|
|
13:30-15:00, Paper TuBA1-CC.6 | Add to My Program |
Supernumerary Robotic Limbs to Support Post-Fall Recoveries for Astronauts |
|
Ballesteros, Erik | Massachusetts Institute of Technology |
Lee, Sang-Yoep | Seoul National University |
Carpenter, Kalind | Jet Propulsion Laboratory |
Asada, Harry | MIT |
Keywords: Human Performance Augmentation, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: This paper proposes the use of Supernumerary Robotic Limbs (SuperLimbs) to augment astronauts during an Extra-Vehicular Activity (EVA) in a partial-gravity environment. We investigate the effectiveness of SuperLimbs in assisting astronauts to their feet following a fall. Based on preliminary observations from a pilot human study, we categorized post-fall recoveries into a sequence of statically stable poses called "waypoints". The paths between the waypoints can be modeled with a simplified kinetic motion applied about a specific point on the body. Following the characterization of post-fall recoveries, we designed a task-space impedance controller with high damping and low stiffness, in which the SuperLimbs assist an astronaut in post-fall recovery while keeping the human in the loop. To validate this control scheme, a full-scale wearable analog space suit was constructed and tested with a SuperLimbs prototype. The experiments showed that, without assistance, astronauts would impulsively exert themselves to perform a post-fall recovery, resulting in high energy consumption and instability in maintaining an upright posture, concurring with prior NASA studies. When the SuperLimbs provided assistance, the astronaut's energy consumption and tracking deviation during post-fall recovery were reduced considerably.
|
|
TuBA2-CC Award Session, CC-301 |
Add to My Program |
Mechanisms and Design |
|
|
Chair: Chen, I-Ming | Nanyang Technological University |
Co-Chair: Yang, Guilin | Ningbo Institute of Material Technology and Engineering, Chinese Academy of Sciences |
|
13:30-15:00, Paper TuBA2-CC.1 | Add to My Program |
Lissajous Curve-Based Vibrational Orbit Control of a Flexible Vibrational Actuator with a Structural Anisotropy |
|
Miyazaki, Yuto | Graduate School of Engineering, Osaka University |
Higashimori, Mitsuru | Osaka University |
Keywords: Flexible Robotics
Abstract: This paper proposes a novel flexible vibrational actuator with a structural anisotropy, together with a control method to diversify its vibrational behavior. First, the analytical model of the proposed actuator, which comprises a flexible beam with a rectangular cross-section and a rotational-type motor, is introduced. Regarding the structural anisotropy, the rotational axis of the motor is nonparallel to both principal axes of bending stiffness of the beam. Then, the vibrational phenomenon of the actuator is theoretically revealed. It is shown that, using a synthetic wave input composed of two sine waves based on the resonance frequencies of the beam's principal axes, the vibrational orbit of the beam's tip can be controlled in the same manner as a Lissajous curve. Finally, the proposed method is experimentally validated: the Lissajous curve-based vibrational orbit control is performed using a prototype actuator, and an application to an underactuated locomotor is demonstrated.
|
|
13:30-15:00, Paper TuBA2-CC.2 | Add to My Program |
Dynamic Modeling of Wing-Assisted Inclined Running with a Morphing Multi-Modal Robot |
|
Sihite, Eric | California Institute of Technology |
Ramezani, Alireza | Northeastern University |
Gharib, Morteza | California Institute of Technology |
Keywords: Biologically-Inspired Robots, Motion Control, Dynamics
Abstract: Robot designs can take many inspirations from nature, where there are many examples of highly resilient and fault-tolerant locomotion strategies to navigate complex terrains by using multi-functional appendages. For example, Chukar and Hoatzin birds can repurpose their wings for quadrupedal walking and wing-assisted incline running (WAIR) to climb steep surfaces. We took inspiration from nature and designed a morphing robot with multi-functional thruster-wheel appendages that allows the robot to change its mode of locomotion by transforming into a rover, quad-rotor, mobile inverted pendulum (MIP), and other modes. In this work, we derive a dynamic model and formulate a nonlinear model predictive controller to perform WAIR to showcase the unique capabilities of our robot. We implemented the model and controller in a numerical simulation and experiments to show their feasibility and the capabilities of our transforming multi-modal robot.
|
|
13:30-15:00, Paper TuBA2-CC.3 | Add to My Program |
Design and Modeling of a Nested Bi-Cavity-Based Soft Growing Robot for Grasping in Constrained Environments |
|
Yong, Haochen | Huazhong University of Science and Technology |
Xu, Fukang | Huazhong University of Science and Technology |
Li, Chenfei | Huazhong University of Science and Technology |
Ding, Han | Huazhong University of Science and Technology |
Wu, Zhigang | Huazhong University of Science and Technology |
Keywords: Soft Robot Applications, Soft Robot Materials and Design, Grasping
Abstract: Soft growing robots with unique navigation (tip extension by eversion) hold great promise in rescue, medical, and industrial applications. Equipping them with grasping capability would enhance their usefulness in constrained environments for various applications. However, in traditional designs, the tip’s eversion naturally conflicts with grasping, and the addition of grippers at the tip would limit navigation inevitably in constrained environments. To realize grasping in such scenes without extra devices, we propose a nested bi-cavity-based growing soft robot (BIBOT). The new design consists of two coaxially nested cavities, where the inner and outer cavities extend synchronously by inversion and eversion of the film rolls. Such a bi-cavity design enables the BIBOT to navigate and grasp without relative movements between the body and environment, and avoids contact between the object and its surroundings as well. Further, a kinematics model is established and verified to precisely control its lengthening and steering by a feed mechanism. Finally, its capability in a constrained environment is demonstrated by navigating and grasping an object in a curved pipe with a variable internal diameter.
|
|
13:30-15:00, Paper TuBA2-CC.4 | Add to My Program |
Optimized Design and Fabrication of Skeletal Muscle Actuators for Bio-Syncretic Robots |
|
Yang, Lianchao | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Chuang | Shenyang Institute of Automation Chinese Academy of Sciences |
Wang, Ruiqian | Shenyang Institute of Automation, Chinese Academy of Sciences |
Zhang, Yiwei | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Biologically-Inspired Robots, Soft Robot Materials and Design, Soft Sensors and Actuators
Abstract: In recent years, bio-syncretic robots actuated by living materials have received widespread attention. Among common living materials, engineered skeletal muscle tissue (eSKT) has been the focus of researchers due to its high contraction force and good controllability. However, the current performance of eSKT is far from that of natural skeletal muscle tissue. In this paper, we propose an optimized design method for eSKTs. By combining simulation analysis with experiments, eSKTs with multiple strips have been developed. The results show that, at a specific volume (250 μL), the optimized strip structures enhance the stability of the eSKT and facilitate the penetration of nutrients and oxygen, leading to improved fusion of myoblasts and directional arrangement of myotubes, and thus improved performance. The multi-strip eSKT exhibits a significant contraction force and has been successfully utilized in a bio-syncretic robot to demonstrate its actuation capability. This work may provide insights for the development of bio-syncretic robots and even tissue engineering.
|
|
TuBT1-CC Oral Session, CC-303 |
Add to My Program |
Planning under Uncertainty II |
|
|
Chair: Kavraki, Lydia | Rice University |
Co-Chair: Berenson, Dmitry | University of Michigan |
|
13:30-15:00, Paper TuBT1-CC.1 | Add to My Program |
Stochastic Implicit Neural Signed Distance Functions for Safe Motion Planning under Sensing Uncertainty |
|
Quintero-Peña, Carlos | Rice University |
Thomason, Wil | Rice University |
Kingston, Zachary | Rice University |
Kyrillidis, Anastasios | Rice University |
Kavraki, Lydia | Rice University |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: Motion planning under sensing uncertainty is critical for robots in unstructured environments, to guarantee safety for both the robot and any nearby humans. Most work on planning under uncertainty does not scale to high-dimensional robots such as manipulators, assumes simplified geometry of the robot or environment, or requires per-object knowledge of noise. Instead, we propose a method that directly models sensor-specific aleatoric uncertainty to find safe motions for high-dimensional systems in complex environments, without exact knowledge of environment geometry. We combine a novel implicit neural model of stochastic signed distance functions with a hierarchical optimization-based motion planner to plan low-risk motions without sacrificing path quality. Our method also explicitly bounds the risk of the path, offering trustworthiness. We empirically validate that our method produces safe motions and accurate risk bounds and is safer than baseline approaches.
|
|
13:30-15:00, Paper TuBT1-CC.2 | Add to My Program |
Constrained Hierarchical Monte Carlo Belief-State Planning |
|
Jamgochian, Arec | Stanford University |
Buurmeijer, Hugo | Stanford University |
Wray, Kyle | N/a |
Corso, Anthony | Stanford University |
Kochenderfer, Mykel | Stanford University |
Keywords: Planning under Uncertainty, Constrained Motion Planning, Integrated Planning and Control
Abstract: Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
|
|
13:30-15:00, Paper TuBT1-CC.3 | Add to My Program |
Estimating 3D Uncertainty Field: Quantifying Uncertainty for Neural Radiance Fields |
|
Shen, Jianxiong | IRI, CSIC-UPC |
Ren, Ruijie | IRI-CSIC |
Ruiz, Adrià | INRIA |
Moreno-Noguer, Francesc | CSIC |
Keywords: Planning under Uncertainty
Abstract: Current methods based on Neural Radiance Fields (NeRF) significantly lack the capacity to quantify uncertainty in their predictions, particularly for unseen space, including occluded and out-of-view scene content. This limitation hinders their extensive application in robotics, where the reliability of model predictions has to be considered for robotic exploration and planning in unknown environments. To address this, we propose a novel approach to estimate a 3D Uncertainty Field based on the learned incomplete scene geometry, which explicitly identifies these unseen regions of the scene. By considering the accumulated transmittance along each camera ray, our Uncertainty Field infers 2D pixel-wise uncertainty, exhibiting high values for rays cast directly towards occluded or external scene content. To quantify the uncertainty on the learned surface, we model a stochastic radiance field. Our experiments demonstrate that, compared with recent methods, our approach is the only one that can explicitly reason about high uncertainty both in 3D unseen regions and in the corresponding 2D rendered pixels. Furthermore, we illustrate that our designed uncertainty field is ideally suited for real-world robotics tasks, such as next-best-view selection.
|
|
13:30-15:00, Paper TuBT1-CC.4 | Add to My Program |
Online Adaptation of Sampling-Based Motion Planning with Inaccurate Models |
|
Faroni, Marco | Politecnico Di Milano |
Berenson, Dmitry | University of Michigan |
Keywords: Motion and Path Planning, Planning under Uncertainty, Integrated Planning and Learning
Abstract: Robotic manipulation relies on analytical or learned models to simulate the system dynamics. These models are often inaccurate and based on offline information, so that the robot planner is unable to cope with mismatches between the expected and the actual behavior of the system (e.g., the presence of an unexpected obstacle). These situations require the robot to use information gathered online to correct its planning strategy and adapt to the actual system response. We propose a sampling-based motion planning approach that uses an estimate of the model error and online observations to correct the planning strategy at each new replanning. Our approach adapts the cost function and the sampling distribution of a sampling-based kinodynamic motion planner when the outcome of the executed transitions is different from the expected one (e.g., when the robot unexpectedly collides with an obstacle) so that future trajectories will avoid unreliable motions. To infer the properties of a new transition, we introduce the notion of context-awareness, i.e., we store local environment information for each executed transition and avoid new transitions with context similar to previous unreliable ones. This is helpful for leveraging online information even if the simulated transitions are far (in the state-and-action space) from the executed ones. Simulation and experimental results show that the proposed approach increases the success rate in execution and reduces the number of replannings needed to reach the goal.
|
|
13:30-15:00, Paper TuBT1-CC.5 | Add to My Program |
Autonomous 3D Exploration in Large-Scale Environments with Dynamic Obstacles |
|
Wiman, Emil | Linköping University |
Widén, Ludvig | Linköping University |
Tiger, Mattias | AI and Integrated Computer Systems (AIICS), Linköping University |
Heintz, Fredrik | Linköping University |
Keywords: Planning under Uncertainty, Collision Avoidance, Task and Motion Planning
Abstract: Exploration in dynamic and uncertain real-world environments is an open problem in robotics, and it constitutes a foundational capability of autonomous systems operating in most of the real world. While 3D exploration planning has been extensively studied, environments are typically assumed to be static, or only reactive collision avoidance is carried out. We propose a novel approach that not only avoids dynamic obstacles but also includes them in the plan itself, deliberately exploiting the dynamic environment in the agent's favor. The proposed planner, the Dynamic Autonomous Exploration Planner (DAEP), extends AEP to explicitly plan with respect to dynamic obstacles. Furthermore, addressing prior errors in AEP has also enhanced exploration within static environments. To thoroughly evaluate exploration planners in such settings, we propose a new enhanced benchmark suite with several dynamic environments, including large-scale outdoor ones. DAEP outperforms state-of-the-art planners in dynamic and large-scale environments and is shown to be more effective at both exploration and collision avoidance.
|
|
13:30-15:00, Paper TuBT1-CC.6 | Add to My Program |
MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation |
|
Liang, Jing | University of Maryland |
Gao, Peng | University of Massachusetts Amherst |
Xiao, Xuesu | George Mason University |
Sathyamoorthy, Adarsh Jagan | University of Maryland |
Elnoor, Mohamed | University of Maryland |
Lin, Ming C. | University of Maryland at College Park |
Manocha, Dinesh | University of Maryland |
Keywords: Planning under Uncertainty, Task and Motion Planning, Motion and Path Planning
Abstract: We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging and large outdoor environments, including around buildings, across intersections, along trails, and off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions. Our video is here: https://youtu.be/3eJ2soAzXnU
|
|
13:30-15:00, Paper TuBT1-CC.7 | Add to My Program |
IBBT: Informed Batch Belief Trees for Motion Planning under Uncertainty |
|
Zheng, Dongliang | Georgia Tech |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Planning under Uncertainty, Motion and Path Planning, Constrained Motion Planning
Abstract: In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. First, we solve the deterministic planning problem using Rapidly-exploring Random Graph (RRG) to construct a nominal trajectory graph. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching the graph using the proposed heuristic. IBBT interleaves batch state sampling, nominal trajectory graph construction, heuristic computing, and searching over the graph to find belief space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds improved plans. IBBT is efficient by reusing results between sequential iterations. The belief tree search is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster compared with previous similar methods.
|
|
13:30-15:00, Paper TuBT1-CC.8 | Add to My Program |
Integrating Predictive Motion Uncertainties with Distributionally Robust Risk-Aware Control for Safe Robot Navigation in Crowds |
|
Ryu, Kanghyun | University of California, Berkeley |
Mehr, Negar | University of California Berkeley |
Keywords: Planning under Uncertainty, Robot Safety, Collision Avoidance
Abstract: Ensuring safe navigation in human-populated environments is crucial for autonomous mobile robots. Although recent advances in machine learning offer promising methods to predict human trajectories in crowded areas, it remains unclear how one can safely incorporate these learned models into a control loop due to the uncertain nature of human motion, which can make predictions of these models imprecise. In this work, we address this challenge and introduce a distributionally robust chance-constrained model predictive control (DRCC-MPC) which: (i) adopts a probability of collision as a pre-specified, interpretable risk metric, and (ii) offers robustness against discrepancies between actual human trajectories and their predictions. We consider the risk of collision in the form of a chance constraint, providing an interpretable measure of robot safety. To enable real-time evaluation of chance constraints, we consider conservative approximations of chance constraints in the form of distributionally robust Conditional Value at Risk constraints. The resulting formulation offers computational efficiency as well as robustness with respect to out-of-distribution human motion. With the parallelization of a sampling-based optimization technique, our method operates in real-time, demonstrating successful and safe navigation in a number of case studies with real-world pedestrian data.
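For readers unfamiliar with the Conditional Value at Risk surrogate mentioned in this abstract, a minimal sample-based sketch follows (hypothetical names, not the authors' implementation):

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Tail-average estimator of CVaR_alpha: the mean of the worst
    ceil(alpha * N) losses out of N samples."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * len(losses))))            # tail size
    return losses[:k].mean()

# Toy check: with a safety margin h(x) (negative = collision), requiring
# CVaR_alpha[-h] <= 0 is a conservative surrogate for P(h < 0) <= alpha,
# since CVaR upper-bounds the corresponding quantile of the loss.
rng = np.random.default_rng(0)
margins = rng.normal(loc=2.0, scale=1.0, size=10_000)  # sampled margins
print(empirical_cvar(-margins, alpha=0.1) <= 0.0)      # True for this tail
```

The distributionally robust version in the paper additionally maximizes this quantity over an ambiguity set of distributions around the predicted one.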
|
|
13:30-15:00, Paper TuBT1-CC.9 | Add to My Program |
A GP-Based Robust Motion Planning Framework for Agile Autonomous Robot Navigation and Recovery in Unknown Environments |
|
Mohammad, Nicholas | University of Virginia |
Higgins, Jacob | University of Virginia |
Bezzo, Nicola | University of Virginia |
Keywords: Planning under Uncertainty, Motion and Path Planning, Collision Avoidance
Abstract: For autonomous mobile robots, uncertainties in the environment and system model can lead to failure in the motion planning pipeline, resulting in potential collisions. In order to achieve a high level of robust autonomy, these robots should be able to proactively predict and recover from such failures. To this end, we propose a Gaussian Process (GP) based model for proactively detecting the risk of future motion planning failure. When this risk exceeds a certain threshold, a recovery behavior is triggered that leverages the same GP model to find a safe state from which the robot may continue towards the goal. The proposed approach is trained in simulation only and can generalize to real world environments on different robotic platforms. Simulations and physical experiments demonstrate that our framework is capable of both predicting planner failures and recovering the robot to states where planner success is likely, all while producing agile motion.
|
|
TuBT2-CC Oral Session, CC-311 |
Add to My Program |
Mechanism Design II |
|
|
Chair: Choi, Hyouk Ryeol | Sungkyunkwan University |
Co-Chair: Zhang, Hongying | National University of Singapore |
|
13:30-15:00, Paper TuBT2-CC.1 | Add to My Program |
ReC-Gripper: A Reconfigurable Combined Suction and Fingered Gripper for Various Logistics Picking and Stowing Tasks |
|
Um, Seunghwan | SungKyunKwan University |
Jeong, Heeyeon | Sungkyunkwan University |
Kim, ChunSoo | SKKU |
Rhee, Issac | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Mechanism Design, Grippers and Other End-Effectors, Grasping
Abstract: This article presents a reconfigurable gripper that combines a finger module and a suction module. Through reconfiguration, the proposed gripper can adopt a configuration suited to the different working environments of logistics order picking. The finger part of the gripper uses a parallelogram remote center of motion (RCM) mechanism to implement the reconfigurable features. With the RCM mechanism, the gripper provides a zeroed-offset function, which removes the gap between the finger and the suction gripper, as well as a supporting-finger function. The gripper shows higher grasping stability and practicality than existing grippers in order-picking tasks. First, the design of the mechanism and the model constituting the gripper are described. Afterward, a quantitative evaluation comparing the performance of this gripper with existing ones in bin and shelf environments is conducted, in which the gripper shows a 32.912% performance improvement in representative tasks. Finally, the practical aspects of this gripper are described through a quantitative evaluation.
|
|
13:30-15:00, Paper TuBT2-CC.2 | Add to My Program |
Development of the Assembling System for Structure Transformable Humanoid with Attach-Lock-Detachable Magnetic Coupling |
|
Makabe, Tasuku | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Mechanism Design, Humanoid Robot Systems, Assembly
Abstract: We propose a method that gives humanoids the body-structure-changing ability of modular robots by using Attach-Lock-Detachable Magnetic Couplings (ALDMag), which let robot body parts be detached and reattached with an arm-type robot, together with a system that manages the connection state of the modularized body elements. Both robots and humans can use the ALDMag to attach and detach mechanical and electrical connections without actuators. By writing the robot model description of each module in xacro, we construct a system that allows the robot to attach and detach modules during task operation. We demonstrated the effectiveness of the proposed method through assembly experiments of a small robot with a life-size arm and through experiments with environmental contacts performed by the small robot.
|
|
13:30-15:00, Paper TuBT2-CC.3 | Add to My Program |
Design of a Deployable Continuum Robot Using Elastic Kirigami-Origami |
|
Li, Yunong | Harbin Institute of Technology, Shenzhen |
Huang, Hailin | Harbin Institute of Technology, Shenzhen |
Li, Bing | Harbin Institute of Technology (Shenzhen) |
Keywords: Mechanism Design, Kinematics
Abstract: Inspired by Yoshimura origami, this study presents a novel deployable modular continuum robot that maintains its configuration by combining active cables with the passive elastic deformation of kirigami-origami. The synchronous motion of each module is improved by using slider-crank mechanisms. Using screw theory, the comprehensive kinematics of the proposed deployable kirigami-origami robot were analyzed, explaining how the elastic restoring force is generated in each module. A physical prototype was developed, and the performance of this origami-inspired continuum robot was evaluated by comparing the motion properties of the proposed robot with those of a robot without elastic rings and synchronism mechanisms. In addition, position accuracy, trajectory tracking, stiffness, and load capacity experiments were conducted. By integrating a pneumatic soft hand at the end of the proposed robot, an object-grasping experiment was conducted to verify its feasibility.
|
|
13:30-15:00, Paper TuBT2-CC.4 | Add to My Program |
Design and Control of a Transformable Multi-Mode Mobile Robot |
|
Li, Haoran | Guangdong University of Technology |
Bu, Yongzhong | Guangdong University of Technology |
Bu, Yongjian | Guangdong University of Technology |
Mao, Shixin | University of Science and Technology of China |
Guan, Yisheng | Guangdong University of Technology |
Zhu, Haifei | Guangdong University of Technology |
Keywords: Mechanism Design, Kinematics
Abstract: Conventional mobile robots typically offer a single locomotion mode and require additional arms to transport objects. To address the challenges of traversing diverse environments and transporting objects, a novel transformable multi-mode Mecanum-wheeled mobile robot is proposed in this paper. Owing to its unique foreleg design, the robot can operate in a quadrilateral four-wheel mode, a collinear four-wheel mode, or an upright two-wheel mode, and can smoothly switch between any two modes by re-arranging the foreleg wheels. When standing with its forelegs raised, the robot can carry objects and transport them to a predetermined destination. The design and operational modes of the robot are explored in detail. The kinematics and control of the different operational modes were analyzed and experimentally verified. The results indicate that the developed robot can perform versatile locomotion to accomplish object transportation in diverse environments by utilizing its foreleg-wheel mechanisms. Furthermore, because of the asymmetric arrangement of its front and rear Mecanum wheels, which differs from conventional symmetric arrangements, the robot experiences an additional angular velocity.
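For reference, the conventional symmetric Mecanum arrangement that this abstract contrasts against has a standard textbook inverse kinematics; a sketch with hypothetical names follows (sign conventions vary by source, and the paper's asymmetric robot adds an extra angular-velocity term not captured here):

```python
import numpy as np

def mecanum_ik(vx, vy, wz, r, lx, ly):
    """Wheel angular speeds [fl, fr, rl, rr] for a body twist (vx, vy, wz).

    Standard symmetric arrangement; r = wheel radius, lx/ly = half the
    wheelbase/track. One common sign convention is used below.
    """
    k = lx + ly
    J = np.array([[1, -1, -k],
                  [1,  1,  k],
                  [1,  1, -k],
                  [1, -1,  k]]) / r
    return J @ np.array([vx, vy, wz])

# Pure forward motion drives all four wheels equally.
print(mecanum_ik(1.0, 0.0, 0.0, 0.05, 0.2, 0.2))  # [20. 20. 20. 20.]
```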
|
|
13:30-15:00, Paper TuBT2-CC.5 | Add to My Program |
HyperLeg: Biomechanics-Inspired High-DOF Leg and Toe Mechanism for Highly Dynamic Motions |
|
Kim, Do-yun | KOREATECH University |
Yun, Seong-Ho | Koreatech |
Lee, Joong-Kyung | Korea University of Technology and Education (KOREATECH) |
Yoon, JongJun | Koreatech |
Nam, Dongyun | Korea University of Technology and Education (KOREATECH) |
Maeng, Chan-Young | Korea University of Technology and Education (KOREATECH) |
Kim, Yong-Jae | Korea University of Technology and Education |
Keywords: Mechanism Design, Legged Robots, Compliant Joints and Mechanisms
Abstract: The human foot, with its multiple-degree-of-freedom (DOF) toe joints and two-DOF ankle joint, provides multiple benefits, such as increased stride length and walking speed, impact mitigation, and enhanced balancing. However, creating such high-DOF mechanisms for legged robots has been challenging due to increased complexity, heavy weight, and vulnerability to impact. In this paper, a novel leg and toe mechanism inspired by human biomechanics, featuring a one-DOF knee joint, a two-DOF ankle joint, and a one-DOF toe joint, is developed. All actuators are located at the proximal part of the thigh frame to minimize the distal mass. High-payload timing belts and unique linkage mechanisms are used in the transmission to achieve high backdrivability and high joint stiffness. Inspired by human anatomy, the actuation torques are intentionally coupled, delivering the high propulsive force against the ground needed for dynamic movements such as jumping. The implemented leg and toe mechanisms weigh 8.16 kg, and the height from the ground to the hip center is 786 mm. The proposed mechanism has been proven effective through force tests and distance-jump experiments.
|
|
13:30-15:00, Paper TuBT2-CC.6 | Add to My Program |
Design of a Towing System by Multi Autonomous Sailboats |
|
Liang, Cheng | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Lin, Bairun | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Mechanism Design, Motion Control, Actuation and Joint Mechanisms
Abstract: For researchers or administrators who need to collect hydrological data over a body of water, using autonomous sailboats to tow floating detection equipment is an energy-saving and convenient way to deploy detectors. However, because the pulling force provided by a single autonomous sailboat is limited, this scheme is not suitable for floating equipment of large mass. This paper proposes a new approach in which multiple autonomous sailboats tow floating objects. A system of two autonomous sailboats arranged and connected in line is considered an appropriate solution for towing heavy floating objects because it can provide greater pulling force. The main part of the article introduces a new design for a multi-sailboat towing system that can tow floating objects with or against the wind. Repeated experiments were conducted at a test site equipped with a motion-capture system to find the best strategy for controlling the sails and rudder so as to increase the towing system's pulling force and tacking success rate. Three connection modes are proposed, compared, and tested; the best one is applied to the sailboat towing system and improves its performance.
|
|
13:30-15:00, Paper TuBT2-CC.7 | Add to My Program |
Non-Intrusive LiDAR Protection Module Emulating Bio-Inspired Wiping Motion for Outdoor Unmanned Vehicles |
|
Kim, Youngrae | Daegu Gyeongbuk Institute of Science and Technology (DGIST), Dae |
Lim, Seunghyun | DGIST |
Lee, Hanmin | Korea Institute of Machinery & Materials |
Kim, Seokchan | Korea Institute of Machinery & Materials |
Kim, Ji-Chul | Korea Institute of Machinery and Materials |
Yun, Dongwon | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Keywords: Mechanism Design, Range Sensing, Autonomous Vehicle Navigation
Abstract: In this paper, we develop a protection module for Light Detection and Ranging (LiDAR) sensors used on outdoor unmanned vehicles. A bio-inspired wiping motion was found to be more efficient and to wipe better than conventional cleaning methods for LiDAR sensors. A water-wiping experiment confirmed that the finger wiping motion removed 35% more water than a translational wiping motion. The theoretical analysis predicting an optimal rotational speed at which wiping performance is maximized was also verified to be consistent with the experiment. The LiDAR distortion experiments demonstrated no data distortion, showing an average error of at most 0.40% when detecting obstacles even while the acrylic cover rotates. Finally, a contamination protection experiment was conducted with water, powder, soil, and mud. Although the number of point-cloud points changed and the intensity of the sensor data decreased after contamination, it was validated that the number of point-cloud points and the average intensity could be restored to at least 97% and 67%, respectively, after cleaning.
|
|
13:30-15:00, Paper TuBT2-CC.8 | Add to My Program |
Lightweight Human-Friendly Robotic Arm Based on Transparent Hydrostatic Transmissions |
|
Bolignari, Marco | University of Trento |
Rizzello, Gianluca | Saarland University |
Zaccarian, Luca | LAAS-CNRS and University of Trento |
Fontana, Marco | Scuola Superiore Sant'Anna |
Keywords: Mechanism Design, Tendon/Wire Mechanism, Force Control, Compliance and Impedance Control
Abstract: We present theoretical and experimental results regarding the development and the control of a two-link robotic arm with remotized actuation via rolling diaphragm hydrostatic transmissions. We propose a dynamical model capturing the essential dynamics of the developed transmission/robot ensemble and implement a control strategy consisting of two nested loops, the inner one performing high-bandwidth joint torque regulation and the outer one producing various types of compliance responses for effective human–robot interactions. Extensive sets of experiments, testing both the low-level torque controller and the high-level compliance controller, confirm the effectiveness of the proposed hardware-software remotization architecture.
|
|
13:30-15:00, Paper TuBT2-CC.9 | Add to My Program |
OSCaR: An Origami-Inspired Shape-Changing Robot for Ground Coverage Tasks |
|
Fan, Zirui | National University of Singapore |
Zhang, Hongying | National University of Singapore |
Keywords: Mechanism Design, Wheeled Robots
Abstract: This paper introduces OSCaR, a novel origami-inspired shape-changing robot. The objective is to enhance the adaptability of vehicles engaged in ground coverage tasks, such as floor cleaning. The robot has two distinct configurations: it can fold itself for agile navigation through tight spaces and unfold to cover larger areas efficiently. The folding pattern has a deploy-to-stow ratio of 3 in the width dimension, and a kinematic model is established to simulate the pattern's deployment process. The hinge design employs rolling contact elements to mitigate collisions among the panels, particularly in regions with multiple collinear crease lines. Furthermore, the design has one degree of freedom and features pivots, making it easy to actuate with motors. The system design of the prototype is also presented, including its structure, embedded hardware, and host-computer software. The results show that the robot has great adaptability in complex environments.
|
|
TuBT3-CC Oral Session, CC-313 |
Add to My Program |
Formal Methods in Robotics and Automation II |
|
|
Chair: O'Kane, Jason | Texas A&M University |
Co-Chair: Kan, Zhen | University of Science and Technology of China |
|
13:30-15:00, Paper TuBT3-CC.1 | Add to My Program |
Robust MITL Planning under Uncertain Navigation Times |
|
Linard, Alexis | KTH Royal Institute of Technology |
Gautier, Anna | KTH Royal Institute of Technology |
Duberg, Daniel | KTH - Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Formal Methods in Robotics and Automation, Planning under Uncertainty
Abstract: In environments like offices, the duration of a robot's navigation between two locations may vary over time. For instance, reaching the kitchen may take longer during lunchtime, since the corridors are crowded with people heading the same way. In this work, we address the problem of routing in such environments with tasks expressed in Metric Interval Temporal Logic (MITL) - a rich robot task specification language that allows us to capture explicit time requirements. Our objective is to find a strategy that maximizes the temporal robustness of the robot's MITL task. As the first step towards a solution, we define a mixed-integer linear programming (MILP) approach to solving the task planning problem over a Varying Weighted Transition System, where navigation durations are deterministic but vary depending on the time of day. We then apply this planner to optimize MITL temporal robustness in Markov Decision Processes, where the navigation durations between physical locations are uncertain, but the time-dependent distribution over possible delays is known. Finally, we develop a receding-horizon planner for Markov Decision Processes that preserves guarantees on MITL temporal robustness. We show the scalability of our planning algorithms in simulations of robotic tasks.
|
|
13:30-15:00, Paper TuBT3-CC.2 | Add to My Program |
Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning |
|
Zhang, Hao | University of Science and Technology of China |
Wang, Hao | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Reinforcement Learning, Motion and Path Planning
Abstract: Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, limiting their applicability in deep reinforcement learning. To address this issue, by incorporating Transformers into reinforcement learning, we develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: the LTL instruction is first encoded via a Transformer module for efficient understanding of task instructions during training, and the context variable is then encoded via a second Transformer for improved task performance. In particular, the LTL instruction is specified in co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts non-Markovian reward decision processes into Markovian ones, but also improves sampling efficiency through the simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
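LTL progression, which this abstract uses to decompose tasks into sub-goals, rewrites a formula against the current observation (in the style of Bacchus and Kabanza's rules). A minimal sketch for a small fragment, using a hypothetical tuple encoding of formulas, might look like:

```python
def prog(phi, obs):
    """Progress formula `phi` one step, given the set `obs` of true
    propositions. Booleans denote satisfied/violated formulas."""
    if isinstance(phi, bool):
        return phi
    op = phi[0]
    if op == "prop":                      # atomic proposition
        return phi[1] in obs
    if op == "next":                      # X f  ->  f
        return phi[1]
    if op == "and":
        a, b = prog(phi[1], obs), prog(phi[2], obs)
        if a is False or b is False:
            return False
        if a is True:
            return b
        return a if b is True else ("and", a, b)
    if op == "or":
        a, b = prog(phi[1], obs), prog(phi[2], obs)
        if a is True or b is True:
            return True
        if a is False:
            return b
        return a if b is False else ("or", a, b)
    if op == "until":                     # f U g -> prog(g) | (prog(f) & f U g)
        g = prog(phi[2], obs)
        if g is True:
            return True
        f = prog(phi[1], obs)
        if f is False:
            return g
        rest = phi if f is True else ("and", f, phi)
        return rest if g is False else ("or", g, rest)
    if op == "eventually":                # F f -> prog(f) | F f
        f = prog(phi[1], obs)
        return True if f is True else (phi if f is False else ("or", f, phi))
    raise ValueError(f"unknown operator {op!r}")

# F(a & X b): after observing {a}, the task reduces to "b now, or retry".
task = ("eventually", ("and", ("prop", "a"), ("next", ("prop", "b"))))
step1 = prog(task, {"a"})
print(prog(step1, {"b"}))  # True: observing b next completes the task
```

Each progressed formula is a residual sub-goal, which is how non-Markovian LTL rewards can be turned into Markovian ones over (state, formula) pairs.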
|
|
13:30-15:00, Paper TuBT3-CC.3 | Add to My Program |
Stochastic Games for Interactive Manipulation Domains |
|
Muvvala, Karan | University of Colorado Boulder |
Wells, Andrew | Rice University |
Lahijanian, Morteza | University of Colorado Boulder |
Kavraki, Lydia | Rice University |
Vardi, Moshe | Rice University |
Keywords: Formal Methods in Robotics and Automation, Task and Motion Planning
Abstract: As robots become more prevalent, the complexity of robot-robot, robot-human, and robot-environment interactions increases. In these interactions, a robot needs to consider not only the effects of its own actions, but also the effects of other agents' actions and the possible interactions between agents. Previous works have considered reactive synthesis, where the human/environment is modeled as a deterministic, adversarial agent; as well as probabilistic synthesis, where the human/environment is modeled via a Markov chain. While they provide strong theoretical frameworks, there are still many aspects of human-robot interaction that cannot be fully expressed and many assumptions that must be made in each model. In this work, we propose stochastic games as a general model for human-robot interaction, which subsumes the expressivity of all previous representations. In addition, it allows us to make fewer modeling assumptions and leads to more natural and powerful models of interaction. We introduce the semantics of this abstraction and show how existing tools can be utilized to synthesize strategies to achieve complex tasks with guarantees. Further, we discuss the current computational limitations and improve the scalability by two orders of magnitude by a new way of constructing models for PRISM-Games.
|
|
13:30-15:00, Paper TuBT3-CC.4 | Add to My Program |
Active Inference for Reactive Temporal Logic Motion Planning |
|
Chen, Ziyang | University of Science and Technology of China |
Zhou, Zhangli | University of Science and Technology of China |
Li, Lin | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Task and Motion Planning
Abstract: Reactive planning enables robots to deal with dynamic events in uncertain environments. However, existing methods rely heavily on predefined, hard-coded robot behaviors, e.g., a pre-coded temporal logic formula that specifies how the robot should react. Little attention has been paid to the autonomous generation of reactive task specifications at runtime. As a first attempt towards this goal, this work develops a real-time decision-making and motion planning framework. It allows the robot to follow a global task planned offline while making proactive decisions and generating temporal logic specifications for local reactive tasks when dynamic events are encountered. Specifically, inspired by causal knowledge graphs, a proposition graph is developed, based on which the decision module encodes the environment as Boolean logic and the task as linear temporal logic (LTL). Based on the established proposition graph and the perceived environment, the agent can autonomously generate an LTL formula realizing the local temporary task. A joint sampling algorithm is then developed, in which the automaton states of the local and global tasks are jointly considered to generate a feasible plan that satisfies both. Experiments demonstrate the effectiveness of the proposed decision-making and motion planning framework.
|
|
13:30-15:00, Paper TuBT3-CC.5 | Add to My Program |
Fast Task Allocation of Heterogeneous Robots with Temporal Logic and Inter-Task Constraints |
|
Li, Lin | University of Science and Technology of China |
Chen, Ziyang | University of Science and Technology of China |
Wang, Hao | University of Science and Technology of China |
Kan, Zhen | University of Science and Technology of China |
Keywords: Formal Methods in Robotics and Automation, Task Planning, Multi-Robot Systems
Abstract: This work develops a fast task allocation framework for heterogeneous multi-robot systems subject to both temporal logic and inter-task constraints. The considered inter-task constraints include unrelated tasks, compatible tasks, and exclusive tasks. To specify such inter-task relationships, we extend conventional atomic propositions to batch atomic propositions, which give rise to the LTLT formula. The Task Batch Planning Decision Tree (TB-PDT) is then developed, a variant of the conventional decision tree specialized for temporal logic and inter-task constraints. The TB-PDT is built incrementally to represent task progress and does not require a sophisticated product automaton, which significantly reduces the search space. Based on the TB-PDT, a search algorithm, namely Intensive Inter-task Relationship Tree Search (IIRTS), is developed for fast task allocation in heterogeneous multi-robot systems. It is shown that the time to find a satisfactory task allocation grows almost quadratically with the number of automaton states. Extensive simulations and experiments demonstrate the validity, effectiveness, and transferability of IIRTS.
|
|
13:30-15:00, Paper TuBT3-CC.6 | Add to My Program |
Skill Transfer for Temporal Task Specification |
|
Liu, Jason Xinyu | Brown University |
Shah, Ankit | Brown University |
Rosen, Eric | Brown University |
Jia, Mingxi | Brown University |
Konidaris, George | Brown University |
Tellex, Stefanie | Brown |
Keywords: Formal Methods in Robotics and Automation, Transfer Learning, Integrated Planning and Learning
Abstract: Deploying robots in real-world environments, such as households and manufacturing lines, requires generalization across novel task specifications without violating safety constraints. Linear temporal logic (LTL) is a widely used task specification language with a compositional grammar that naturally induces commonalities among tasks while preserving safety guarantees. However, most prior work on reinforcement learning with LTL specifications treats every new task independently, thus requiring large amounts of training data to generalize. We propose LTL-Transfer, a zero-shot transfer algorithm that composes task-agnostic skills learned during training to safely satisfy a wide variety of novel LTL task specifications. Experiments in Minecraft-inspired domains show that after training on only 50 tasks, LTL-Transfer can solve over 90% of 100 challenging unseen tasks and 100% of 300 commonly used novel tasks without violating any safety constraints. We deployed LTL-Transfer at the task-planning level of a quadruped mobile manipulator to demonstrate its zero-shot transfer ability for fetch-and-deliver and navigation tasks.
|
|
13:30-15:00, Paper TuBT3-CC.7 | Add to My Program |
High Precision Paint Deposition Modeling Considering Variable Posture of Spray Painting Robot |
|
Tanaka, Genichiro | Waseda University |
Takahashi, Yoshinobu | Waseda University |
Iwata, Hiroyasu | Waseda University |
Keywords: Foundations of Automation, Computational Geometry, Factory Automation
Abstract: This study developed a high-precision paint deposition model that considers the position and orientation of a spray-painting gun. Our angle-specific paint deposition model focused on the change in paint deposition caused by changes in the painting angle; however, it had limited versatility. We analyzed this problem and solved it by separately modeling the changes in the film thickness distribution due to impact angle and spray distance, which had previously been modeled together. For higher accuracy, a special function was proposed to convert the three-dimensional direction vector into two-dimensional coordinates on the upper plane of the distribution function. To confirm the validity of our model, a painting test on an L-shaped surface was conducted, and the measured and predicted values were compared. The L-shaped surface is a typical shape in which the film thickness distribution changes with angle; a complex path with varying distances and angles was employed. The predicted values agreed well with the measured values in the L-shaped-surface painting test, validating the developed model.
|
|
13:30-15:00, Paper TuBT3-CC.8 | Add to My Program |
Verifiable Learned Behaviors Via Motion Primitive Composition: Applications to Scooping of Granular Media |
|
Benton, Andrew | Siemens |
Solowjow, Eugen | Siemens Corporation |
Akella, Prithvi | California Institute of Technology |
Keywords: Hybrid Logical/Dynamical Planning and Verification, Learning from Demonstration, Performance Evaluation and Benchmarking
Abstract: A robotic behavior model that can reliably generate behaviors from natural language inputs in real time would substantially expedite the adoption of industrial robots due to enhanced system flexibility. To facilitate these efforts, we construct a framework in which learned behaviors, created by a natural language abstractor, are verifiable by construction. Leveraging recent advancements in motion primitives and probabilistic verification, we construct a natural-language behavior abstractor that generates behaviors by synthesizing a directed graph over the provided motion primitives. If these component motion primitives are constructed according to the criteria we specify, the resulting behaviors are probabilistically verifiable. We demonstrate this verifiable behavior generation capacity in both simulation on an exploration task and on hardware with a robot scooping granular media.
|
|
13:30-15:00, Paper TuBT3-CC.9 | Add to My Program |
Knowledge Acquisition Plans: Generation, Combination, and Execution |
|
Shell, Dylan | Texas A&M University |
O'Kane, Jason | Texas A&M University |
Keywords: Reactive and Sensor-Based Planning, Formal Methods in Robotics and Automation, Planning under Uncertainty
Abstract: This paper contemplates the possibility of asking robots questions and having them use their ability to go out into the environment and probe it, in combination with what they already know of the world, to provide answers. We describe a method whereby a robot system efficiently answers such questions by reasoning about observations as they are made, the interrelationships between multiple pieces of evidence, and what they imply. A central idea in the approach is to maintain a separation of concerns, so that managing 'what is known' is decoupled from 'how it is learned'. This idea is realized in a graph-based representation well suited to algorithmic manipulation and composition, exposing synergies ripe for optimization. We show how to use this representation to leverage both the informational overlap between multiple simultaneous queries and the availability of multiple robots working in concert to answer those queries. We demonstrate these ideas in a simple case study and present data illustrating how plan quality (in terms of execution cost) can be improved through a robot-agnostic optimization operation.
|
|
TuBT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems II |
|
|
Co-Chair: Sabattini, Lorenzo | University of Modena and Reggio Emilia |
|
13:30-15:00, Paper TuBT4-CC.1 | Add to My Program |
Distributed Control of a Limited Angular Field-Of-View Multi-Robot System in Communication-Denied Scenarios: A Probabilistic Approach |
|
Catellani, Mattia | University of Modena and Reggio Emilia |
Sabattini, Lorenzo | University of Modena and Reggio Emilia |
Keywords: Multi-Robot Systems, Distributed Robot Systems
Abstract: Multi-robot systems are gaining popularity over single-agent systems for their advantages. Although they have been studied in agriculture, search and rescue, surveillance, and environmental exploration, real-world implementation is limited due to agent coordination complexities caused by communication and sensor limitations. In this work, we propose a probabilistic approach to allow coordination among robots in communication-denied scenarios, where agents can only rely on visual information from a camera with a limited angular field-of-view. Our solution utilizes a particle filter to analyze uncertainty in the location of neighbors, together with Control Barrier Functions to address the exploration-exploitation dilemma that arises when robots must balance the mission goal with seeking information on undetected neighbors. This technique was tested with virtual robots required to complete a coverage mission, analyzing how the number of deployed robots affects performance and comparing with the ideal case of isotropic sensors and communication. Despite an increase in the time required to fulfill the task, the results are comparable to the ideal scenario in terms of the final configuration achieved by the system.
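The neighbor-localization part of the abstract can be illustrated with a minimal range-only particle filter. This is a toy sketch, not the authors' implementation: the 2D setup, noise levels, and all names are hypothetical, and the Control Barrier Function layer is omitted.

```python
import math
import random

random.seed(0)

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood used to weight particles."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, measured_range, observer, sigma=0.3, motion_noise=0.1):
    """One predict-weight-resample cycle for a neighbour's 2D position."""
    # Predict: the neighbour's motion is unknown, so diffuse the particles.
    moved = [(x + random.gauss(0, motion_noise), y + random.gauss(0, motion_noise))
             for x, y in particles]
    # Weight: compare each particle's predicted range to the measured range.
    weights = []
    for x, y in moved:
        r = math.hypot(x - observer[0], y - observer[1])
        weights.append(gauss_pdf(r, measured_range, sigma))
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample proportionally to weight.
    return random.choices(moved, weights=weights, k=len(moved))

# Neighbour truly at range 5 from an observer at the origin.
particles = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(500)]
for _ in range(30):
    true_range = 5.0 + random.gauss(0, 0.1)
    particles = particle_filter_step(particles, true_range, observer=(0.0, 0.0))
mean_r = sum(math.hypot(x, y) for x, y in particles) / len(particles)
```

With range-only measurements the particles concentrate on a ring of radius 5, which is exactly the kind of residual uncertainty a downstream exploration objective would have to resolve.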
|
|
13:30-15:00, Paper TuBT4-CC.2 | Add to My Program |
Assessing Reputation to Improve Team Performance in Heterogeneous Multi-Robot Coverage |
|
Coffey, Mela | Boston University |
Pierson, Alyssa | Boston University |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Cooperating Robots
Abstract: When agents in a multi-robot team have limited knowledge about their relative performance, their teammates, or the environment, robots must observe individual performance variations and adapt accordingly. We propose robot reputation to assess the historical performance of agents and make future adaptations in a persistent coverage task. We consider a heterogeneous multi-robot team, where robots are equipped with different capabilities to serve discrete events in an environment. We utilize a heterogeneous coverage control approach to partition the space according to robot capabilities and the estimated probability density, such that the robot is responsible for serving the events in its assigned region. As the team serves events, we assign each robot a reputation, which is then used to adjust the size of a robot's region, thus adjusting the amount of space a robot is responsible for serving. Our simulations show that using reputation to weigh the size of the Voronoi cells outperforms the case where we neglect reputation.
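The region-resizing idea can be sketched with a multiplicatively weighted Voronoi partition on a grid, where a robot's reputation scales down its effective distance. This is an illustrative stand-in; the paper's actual weighting of the Voronoi cells may differ, and all names are hypothetical.

```python
import math

def reputation_partition(robots, reputations, grid=40):
    """Assign each grid point to the robot minimising distance / reputation,
    so higher-reputation robots claim larger regions (a multiplicatively
    weighted Voronoi partition over the unit square)."""
    cells = {i: [] for i in range(len(robots))}
    for gx in range(grid):
        for gy in range(grid):
            p = (gx / grid, gy / grid)
            best = min(range(len(robots)),
                       key=lambda i: math.dist(p, robots[i]) / reputations[i])
            cells[best].append(p)
    return cells

# Robot 0 has twice the reputation of robot 1 and ends up serving more space.
robots = [(0.25, 0.5), (0.75, 0.5)]
cells = reputation_partition(robots, reputations=[2.0, 1.0])
```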
|
|
13:30-15:00, Paper TuBT4-CC.3 | Add to My Program |
A Robot Web for Distributed Many-Device Localisation |
|
Murai, Riku | Imperial College London |
Ortiz, Joseph | Meta |
Saeedi, Sajad | Toronto Metropolitan University |
Kelly, Paul H J | Imperial College London |
Davison, Andrew J | Imperial College London |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Localization, Distributed Optimization
Abstract: We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localise via efficient ad-hoc peer-to-peer communication. Our Robot Web solution is based on Gaussian Belief Propagation on the fundamental non-linear factor graph describing the probabilistic structure of all of the observations robots make internally or of each other, and is flexible for any type of robot, motion or sensor. We define a simple and efficient communication protocol which can be implemented by the publishing and reading of web pages or other asynchronous communication technologies. We show in simulations with up to 1000 robots interacting in arbitrary patterns that our solution converges to global accuracy comparable to a centralised non-linear factor graph solver while operating with high distributed efficiency of computation and communication. Via the use of robust factors in GBP, our method is tolerant to a high percentage of faulty sensor measurements or dropped communication packets. Furthermore, we showcase that the system operates on real robots with limited onboard computational resources.
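The distributed-estimation idea can be shown on a 1D toy problem. The sketch below uses Gauss-Seidel relaxation rather than the paper's Gaussian Belief Propagation; on a linear-Gaussian problem both iterate toward the same solution, and all names and values here are invented for illustration.

```python
def distributed_localise(priors, edges, iters=100):
    """Gauss-Seidel relaxation on the 1D localisation normal equations.
    Each robot repeatedly moves to the precision-weighted mean implied by its
    prior and its neighbours' current estimates -- a simple stand-in for a
    GBP message schedule (both solve the same linear system here).
    priors: {robot: (mean, precision)}; edges: [(i, j, measured_offset, precision)]."""
    x = {r: m for r, (m, _) in priors.items()}
    for _ in range(iters):
        for r in x:
            num = priors[r][1] * priors[r][0]
            den = priors[r][1]
            for i, j, z, lam in edges:
                if i == r:        # z = x_j - x_i  =>  x_i should be x_j - z
                    num += lam * (x[j] - z); den += lam
                elif j == r:      # x_j should be x_i + z
                    num += lam * (x[i] + z); den += lam
            x[r] = num / den
    return x

# Three robots in a line; only robot 0 has an informative prior.
priors = {0: (0.0, 100.0), 1: (0.0, 1e-6), 2: (0.0, 1e-6)}
edges = [(0, 1, 1.0, 10.0), (1, 2, 1.0, 10.0)]
x = distributed_localise(priors, edges)
```

The uninformative robots are pulled to their globally consistent positions (about 1.0 and 2.0) purely through pairwise measurements, which is the effect the paper scales up to 1000 robots.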
|
|
13:30-15:00, Paper TuBT4-CC.4 | Add to My Program |
Learning Decentralized Flocking Controllers with Spatio-Temporal Graph Neural Network |
|
Chen, Siji | Virginia Tech |
Sun, Yanshen | Virginia Tech |
Li, Peihan | Drexel University |
Zhou, Lifeng | Drexel University |
Lu, Chang-Tien | Virginia Tech |
Keywords: Multi-Robot Systems, Distributed Robot Systems, Swarm Robotics
Abstract: Recently, a line of research has delved into the use of graph neural networks (GNNs) for decentralized control in swarm robotics. However, it has been observed that relying solely on the states of immediate neighbors is insufficient to imitate a centralized control policy. To address this limitation, prior studies proposed incorporating L-hop delayed states into the computation. While this approach shows promise, it can lead to a lack of consensus among distant flock members and the formation of small clusters, consequently resulting in the failure of cohesive flocking behaviors. Instead, our approach leverages a spatiotemporal GNN, named STGNN, that encompasses both spatial and temporal expansions. The spatial expansion collects delayed states from distant neighbors, while the temporal expansion incorporates previous states from immediate neighbors. The broader and more comprehensive information gathered from both expansions results in more effective and accurate predictions. We develop an expert algorithm for controlling a swarm of robots and employ imitation learning to train our decentralized STGNN model based on the expert algorithm. We simulate the proposed STGNN approach in various settings, demonstrating its decentralized capacity to emulate the global expert algorithm. Further, we implemented our approach to achieve cohesive flocking, leader following and obstacle avoidance by a group of Crazyflie drones. The performance of STGNN underscores its potential as an effective and reliable approach for achieving cohesive flocking, leader following and obstacle avoidance tasks.
|
|
13:30-15:00, Paper TuBT4-CC.5 | Add to My Program |
Simultaneous Time Synchronization and Mutual Localization for Multi-Robot System |
|
Wen, Xiangyong | Zhejiang University |
Wang, Yingjian | Zhejiang University |
Zheng, Xi | The Hong Kong Polytechnic University |
Wang, Kaiwei | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Keywords: Multi-Robot Systems, Localization, Swarm Robotics
Abstract: Mutual localization stands as a foundational component within various domains of multi-robot systems. Nevertheless, in relative pose estimation, time synchronization is usually underappreciated and rarely addressed, although it significantly influences the accuracy of estimation. In this paper, we introduce time synchronization into mutual localization, to recover the time offset and relative poses between robots simultaneously. Under a constant velocity assumption in a short time, we fuse time offset estimation with our previous bearing-based mutual localization by a novel error representation. Based on the error model, we formulate a joint optimization problem and utilize semi-definite relaxation (SDR) to furnish a lossless relaxation. By solving the relaxed problem, time synchronization and relative pose estimation can be achieved when time drift between robots is limited. To enhance the application range of time offset estimation, we further propose an iterative method to recover the time offset from coarse to fine. Comparisons between the proposed method and existing ones through extensive simulation tests demonstrate the clear benefits of time synchronization for mutual localization. Moreover, real-world experiments are conducted to show its practicality and robustness.
|
|
13:30-15:00, Paper TuBT4-CC.6 | Add to My Program |
Enabling Large-Scale Heterogeneous Collaboration with Opportunistic Communications |
|
Cladera, Fernando | University of Pennsylvania |
Ravichandran, Zachary | University of Pennsylvania |
Miller, Ian | University of Pennsylvania |
Hsieh, M. Ani | University of Pennsylvania |
Taylor, Camillo Jose | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Multi-Robot Systems, Field Robots, Networked Robots
Abstract: Multi-robot collaboration in large-scale environments with limited-sized teams and without external infrastructure is challenging, since the software framework required to support complex tasks must be robust to unreliable and intermittent communication links. In this work, we present MOCHA (Multi-robot Opportunistic Communication for Heterogeneous Collaboration), a framework for resilient multi-robot collaboration that enables large-scale exploration in the absence of continuous communications. MOCHA is based on a gossip communication protocol that allows robots to interact opportunistically whenever communication links are available, propagating information on a peer-to-peer basis. We demonstrate the performance of MOCHA through real-world experiments with commercial-off-the-shelf (COTS) communication hardware. We further explore the system's scalability in simulation, evaluating the performance of our approach as the number of robots increases and communication ranges vary. Finally, we demonstrate how MOCHA can be tightly integrated with the planning stack of autonomous robots. We show a communication-aware planning algorithm for a high-altitude aerial robot executing a collaborative task while maximizing the amount of information shared with ground robots.
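In its simplest form, a gossip exchange reduces to merging per-topic records by timestamp whenever two robots meet. The sketch below is a toy illustration of that idea only; the record format and all names are invented, not MOCHA's actual protocol.

```python
def gossip_exchange(db_a, db_b):
    """Opportunistic peer-to-peer sync: when two robots meet, each keeps,
    per topic, whichever record carries the newer timestamp."""
    merged = dict(db_a)
    for topic, (ts, msg) in db_b.items():
        if topic not in merged or ts > merged[topic][0]:
            merged[topic] = (ts, msg)
    return merged, dict(merged)   # both robots leave with the same view

# A UAV and a UGV meet after exploring separately.
uav = {"map/sector3": (12.0, "explored"), "target": (5.0, "none")}
ugv = {"map/sector3": (8.0, "unknown"), "target": (9.0, "spotted at gate")}
uav, ugv = gossip_exchange(uav, ugv)
```

Because every encounter only ever replaces older records with newer ones, information propagates transitively through the team without any robot needing continuous connectivity.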
|
|
13:30-15:00, Paper TuBT4-CC.7 | Add to My Program |
AG-CVG: Coverage Planning with a Mobile Recharging UGV and an Energy-Constrained UAV |
|
Karapetyan, Nare | Woods Hole Oceanographic Institution |
Asghar, Ahmad Bilal | University of Maryland |
Bhaskar, Amisha | University of Maryland, College Park |
Shi, Guangyao | University of Southern California |
Manocha, Dinesh | University of Maryland |
Tokekar, Pratap | University of Maryland |
Keywords: Multi-Robot Systems, Field Robots, Task and Motion Planning
Abstract: In this paper, we present an approach for coverage path planning for a team of an energy-constrained Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV). Both the UAV and the UGV have predefined areas that they have to cover. The goal is to perform complete coverage by both robots while minimizing the coverage time. The UGV can also serve as a mobile recharging station. The UAV and UGV need to occasionally rendezvous for recharging. We propose a heuristic method to address this NP-Hard planning problem. Our approach involves initially determining coverage paths without factoring in energy constraints. Subsequently, we cluster segments of these paths and employ graph matching to assign UAV clusters to UGV clusters for efficient recharging management. We perform numerical analysis on real-world coverage applications and show that compared with a greedy approach our method reduces rendezvous overhead on average by 11.33%. We demonstrate proof-of-concept with a team of a VOXL m500 drone and a Clearpath Jackal ground vehicle, providing a complete system from the offline algorithm to the field execution.
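The cluster-assignment step can be illustrated with an exhaustive minimum-cost matching, which coincides with graph matching for small instances. This is a toy stand-in for the paper's method; the coordinates, cost function, and names are hypothetical.

```python
import itertools
import math

def match_clusters(uav_clusters, ugv_clusters):
    """Exhaustive minimum-cost matching of UAV path clusters to UGV recharge
    clusters (fine for small teams; larger instances would use a proper
    graph-matching algorithm)."""
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(ugv_clusters))):
        cost = sum(math.dist(uav_clusters[i], ugv_clusters[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best, best_cost

# Cluster centroids: pairing each UAV cluster with the nearby UGV cluster
# minimises total rendezvous travel.
uav = [(0.0, 0.0), (5.0, 5.0)]
ugv = [(4.0, 5.0), (1.0, 0.0)]
pairs, cost = match_clusters(uav, ugv)
```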
|
|
13:30-15:00, Paper TuBT4-CC.8 | Add to My Program |
A Non-Cubic Space-Filling Modular Robot |
|
Hummer, Tyler | Northwestern University |
Kriegman, Sam | Northwestern University |
Keywords: Product Design, Development and Prototyping, Cellular and Modular Robots
Abstract: Space-filling building blocks of diverse shape permeate nature at all levels of organization, from atoms to honeycombs, and have proven useful in artificial systems, from molecular containers to clay bricks. But, despite the wide variety of space-filling polyhedra known to mathematics, only the cube has been explored in robotics. Thus, here we roboticize a non-cubic space-filling shape: the rhombic dodecahedron. This geometry offers an appealing alternative to cubes as it greatly simplifies rotational motion of one cell about the edge of another, and increases the number of neighbors each cell can communicate with and hold on to. To better understand the challenges and opportunities of these and other space-filling machines, we manufactured 48 rhombic dodecahedral cells and used them to build various superstructures. We report locomotive ability of some of the structures we built, and discuss the dis/advantages of the different designs we tested. We also introduce a strategy for genderless passive docking of cells that generalizes to any polyhedra with radially symmetrical faces. Future work will allow the cells to freely roll/rotate about one another so that they may realize the full potential of their unique shape.
|
|
13:30-15:00, Paper TuBT4-CC.9 | Add to My Program |
Optimal Containment Control of Multiple Quadrotors Via Reinforcement Learning |
|
Cheng, Ming | Beihang University |
Liu, Hao | Beihang University |
Liu, Deyuan | Beihang University |
Gu, Haibo | Beihang University |
Wang, Xiangke | National University of Defense Technology |
Keywords: Multi-Robot Systems, Networked Robots, Reinforcement Learning
Abstract: This paper explores the optimal containment control problem for nonlinear and underactuated quadrotors with multiple team leaders governed by nonlinear dynamics, employing the reinforcement learning. A cascade controller is formulated, comprising a position control component to ensure containment achievement and an attitude control component to govern rotational channel. The proposed optimal control protocols derived from historical data collected from quadrotor systems without requirement for exact knowledge of vehicle dynamics. The simulation illustrates the effectiveness of the proposed controller in managing a quadrotor team with multiple leaders.
|
|
TuBT5-CC Oral Session, CC-411 |
Add to My Program |
Vision Systems |
|
|
Chair: Oishi, Takeshi | The University of Tokyo |
Co-Chair: Ciarfuglia, Thomas Alessandro | Sapienza University of Rome |
|
13:30-15:00, Paper TuBT5-CC.1 | Add to My Program |
Ensemble Latent Space Roadmap for Improved Robustness in Visual Action Planning |
|
Lippi, Martina | University of Roma Tre |
Welle, Michael C. | KTH Royal Institute of Technology |
Gasparri, Andrea | Università Degli Studi Roma Tre |
Kragic, Danica | KTH |
Keywords: Visual Learning, Task Planning, AI-Based Methods
Abstract: Planning in learned latent spaces helps to decrease the dimensionality of raw observations. In this work, we propose to leverage the ensemble paradigm to enhance the robustness of latent planning systems. We rely on our Latent Space Roadmap (LSR) framework, which builds a graph in a learned structured latent space to perform planning. Given multiple LSR framework instances, that differ either on their latent spaces or on the parameters for constructing the graph, we use the action information as well as the embedded nodes of the produced plans to define similarity measures. These are then utilized to select the most promising plans. We validate the performance of our Ensemble LSR (ENS-LSR) on simulated box stacking and grape harvesting tasks as well as on a real-world robotic T-shirt folding experiment.
|
|
13:30-15:00, Paper TuBT5-CC.2 | Add to My Program |
Direct 3D Model-Based Object Tracking with Event Camera by Motion Interpolation |
|
Kang, Yufan | The University of Tokyo |
Caron, Guillaume | CNRS |
Ishikawa, Ryoichi | The University of Tokyo |
Escande, Adrien | INRIA |
Chappellet, Kevin | CNRS |
Sagawa, Ryusuke | National Institute of Advanced Industrial Science and Technology |
Oishi, Takeshi | The University of Tokyo |
Keywords: Visual Tracking
Abstract: Event cameras are recent sensors that measure intensity changes in each pixel asynchronously. They are increasingly used for their lower latency and higher temporal resolution compared to traditional frame-based cameras. We propose a method for 3D model-based object tracking directly from events captured by an event camera. To enable reliable and accurate tracking of objects, we use a new event representation and predict brightness-increment images with motion interpolation. Object tracking results show the new method significantly improves tracking duration and robustness, for both perspective and fisheye cameras. Our implementation succeeds in tracking objects at camera speeds of up to 2 m/s.
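In its simplest form, a brightness-increment image is a signed accumulation of event polarities over a time window. The sketch below shows only this basic representation, not the paper's motion interpolation; the event format, contrast value, and names are hypothetical.

```python
def brightness_increment(events, width, height, t0, t1, contrast=0.2):
    """Accumulate signed event polarities into a brightness-increment image
    over the window [t0, t1). Each event is (timestamp, x, y, polarity)
    with polarity +1 or -1."""
    img = [[0.0] * width for _ in range(height)]
    for t, x, y, polarity in events:
        if t0 <= t < t1:
            img[y][x] += contrast * polarity
    return img

# Two positive events at (2, 1) and one negative at (0, 0) fall in the window;
# the last event arrives after the window closes and is ignored.
events = [(0.001, 2, 1, +1), (0.002, 2, 1, +1), (0.003, 0, 0, -1), (0.02, 2, 1, +1)]
img = brightness_increment(events, width=4, height=3, t0=0.0, t1=0.01)
```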
|
|
13:30-15:00, Paper TuBT5-CC.3 | Add to My Program |
Using Specularities to Boost Non-Rigid Structure-From-Motion |
|
Sengupta, Agniva | Institut Pascal |
Karim, Makki | UCA |
Bartoli, Adrien | UCA |
Keywords: Visual Tracking, Computer Vision for Medical Robotics
Abstract: Non-rigid structure-from-motion reconstructs the time-varying 3D shape of a deforming object from 2D point correspondences in monocular images. Despite promising use-cases such as the grasping of deformable objects and visual navigation in a non-rigid environment, NRSfM has had limited applications in robotics due to a lack of sufficient accuracy. To remedy this, we propose a new method which boosts the accuracy of NRSfM using sparse surface normals. Surface normal information is available from many sources, including structured lighting, homography decomposition of infinitesimal planes and shape priors. However, these sources are not always available. We thus propose a widely available new source of surface normals: the specularities. Our first technical contribution is a method which detects specular highlights and reconstructs surface normals from them. It assumes that the light source is approximately localised, which is widely applicable in robotics applications such as endoscopy. Our second technical contribution is an NRSfM method which exploits a sparse surface normal set. For that, we propose a novel convex formulation and a globally optimal solution method. Experiments on photo-realistic synthetic data and real household and medical data show that the proposed method outperforms existing NRSfM methods.
|
|
13:30-15:00, Paper TuBT5-CC.4 | Add to My Program |
Tracking Snake-Like Robots in the Wild Using Only a Single Camera |
|
Lu, Jingpei | University of California San Diego |
Richter, Florian | University of California, San Diego |
Lin, Shan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Visual Tracking, Localization, Field Robots
Abstract: Robot navigation within complex environments requires precise state estimation and localization to ensure robust and safe operations. For ambulating mobile robots like robot snakes, traditional methods for sensing require multiple embedded sensors or markers, leading to increased complexity, cost, and increased points of failure. Alternatively, deploying an external camera in the environment is very easy to do, and marker-less state estimation of the robot from this camera's images is an ideal solution: both simple and cost-effective. However, the challenge in this process is in tracking the robot in larger environments where the cameras may be moved around without extrinsic calibration, or may even be in motion (e.g., a drone following the robot). The scenario itself presents a complex challenge: single-image reconstruction of robot poses under noisy observations. In this paper, we address the problem of tracking ambulatory mobile robots from a single camera. The method combines differentiable rendering with the Kalman filter. This synergy allows for simultaneous estimation of the robot's joint angle and pose while also providing state uncertainty which could be used later on for robust control. We demonstrate the efficacy of our approach on a snake-like robot in both stationary and non-stationary (moving) cameras, validating its performance in both structured and unstructured scenarios. The results achieved show an average error of 0.05 m in localizing the robot's base position and 6 degrees in joint state estimation. We believe this novel technique opens up possibilities for enhanced robot mobility and navigation in future exploratory and search-and-rescue missions.
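The Kalman-filter half of the method can be sketched for a single joint angle. In the paper, the measurement would come from the differentiable-rendering step; here it is just a noisy number, and the scalar model, noise values, and names are all hypothetical.

```python
def kalman_step(x, p, z, q=0.01, r=0.05):
    """One predict/update cycle of a scalar Kalman filter on a joint angle.
    The measurement z corrects the predicted state, and p tracks the state
    uncertainty that downstream robust control could consume."""
    # Predict (static joint model; a velocity model would extend the state).
    p = p + q
    # Update with measurement z.
    k = p / (p + r)                  # Kalman gain
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

x, p = 0.0, 1.0                      # very uncertain initial joint angle
for z in [0.52, 0.48, 0.50, 0.51]:   # noisy per-frame estimates (radians)
    x, p = kalman_step(x, p, z)
```

After only a few frames the estimate settles near the true angle while the variance p shrinks, which is exactly the uncertainty signal the abstract mentions for robust control.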
|
|
13:30-15:00, Paper TuBT5-CC.5 | Add to My Program |
Multi-Object Tracking by Hierarchical Visual Representations |
|
Cao, Jinkun | Carnegie Mellon University |
Pang, Jiangmiao | Shanghai AI Laboratory |
Kitani, Kris | Carnegie Mellon University |
Keywords: Visual Tracking, Recognition, Deep Learning for Visual Perception
Abstract: We propose a new visual hierarchical representation paradigm for multi-object tracking. It is more effective to discriminate between objects by attending to objects' compositional visual regions and contrasting with the background contextual information instead of relying only on semantic visual cues such as bounding boxes. This compositional-semantic-contextual hierarchy can be flexibly integrated into different appearance-based multi-object tracking methods. We also propose an attention-based visual feature module to fuse the hierarchical visual representations. The proposed method achieves state-of-the-art accuracy and time efficiency among query-based methods on multiple multi-object tracking benchmarks.
|
|
13:30-15:00, Paper TuBT5-CC.6 | Add to My Program |
AgriSORT: A Simple Online Real-Time Tracking-By-Detection Framework for Robotics in Precision Agriculture |
|
Saraceni, Leonardo | Sapienza University of Rome |
Motoi, Ionut Marian | Sapienza University of Rome |
Nardi, Daniele | Sapienza University of Rome |
Ciarfuglia, Thomas Alessandro | Sapienza University of Rome |
Keywords: Visual Tracking, Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT, we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons at: https://github.com/Lio320/AgriSORT
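Motion-only association in the spirit of SORT-style trackers can be sketched as constant-velocity prediction followed by greedy IoU matching. This is a toy stand-in, not AgriSORT's actual pipeline; all names, box formats, and thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def propagate_and_match(tracks, detections, min_iou=0.3):
    """Shift each track's box by its velocity, then greedily match the
    predicted boxes to detections by IoU -- no appearance features at all."""
    matches, used = {}, set()
    for tid, (box, vel) in tracks.items():
        pred = (box[0] + vel[0], box[1] + vel[1], box[2] + vel[0], box[3] + vel[1])
        best = max((d for d in range(len(detections)) if d not in used),
                   key=lambda d: iou(pred, detections[d]), default=None)
        if best is not None and iou(pred, detections[best]) >= min_iou:
            matches[tid] = best
            used.add(best)
    return matches

# Track 7 moves right at 5 px/frame; its prediction lands on detection 1.
tracks = {7: ((0, 0, 10, 10), (5.0, 0.0))}
detections = [(40, 40, 50, 50), (5, 0, 15, 10)]
matches = propagate_and_match(tracks, detections)
```

When most targets look identical, as with grapes on a vine, this motion-only association sidesteps the appearance ambiguity that defeats appearance-based trackers.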
|
|
13:30-15:00, Paper TuBT5-CC.7 | Add to My Program |
Tightly Coupled Visual-Inertial-UWB Indoor Localization System with Multiple Position-Unknown Anchors |
|
Hu, Chao | Harbin Engineering University |
Huang, Ping | Harbin Engineering University |
Wang, Wei | Harbin Engineering University |
Keywords: Visual-Inertial SLAM, Sensor Fusion, Localization
Abstract: In this letter, we perform a tightly-coupled fusion of a monocular camera, a 6-DoF IMU, and multiple position-unknown Ultra-wideband (UWB) anchors to construct an indoor localization system with both accuracy and robustness. Prior to this, there have been several works that have achieved satisfactory results by fusing UWB ranging measurements with a visual-inertial system. However, these approaches still have some limitations: 1) these approaches either require the UWB anchor position to be calibrated in advance or the UWB anchor position estimation method used is not robust enough; 2) these approaches do not allow for dynamic changes to the number of UWB anchors in a tightly coupled estimator. Our approach uses a visual object detection algorithm to provide the initial UWB anchor position and refines it in the factor graph, and uses a chi-square test to identify UWB ranging outliers. Based on these two ideas, we implement a tightly coupled estimator that dynamically adjusts the number of UWB anchors, i.e. adding them to the factor graph when their ranging measurements are available and discarding them when their ranging measurements are outliers. These ideas improve the efficiency and robustness of fusing UWB ranging measurements with the visual-inertial system, as well as easing the setup of UWB anchors. Experimental results show that the proposed method outperforms previous methods in terms of estimating anchor position and improving localization accuracy.
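Chi-square gating of range residuals can be sketched as a single 1-DoF test: a measurement is kept only if its normalised squared residual stays under the critical value (3.841 at 95% for one degree of freedom). The variable names and sample values below are hypothetical.

```python
def chi2_gate(residual, variance, threshold=3.841):
    """1-DoF chi-square test on a UWB range residual: accept the measurement
    when residual^2 / variance is below the 95% critical value, reject it as
    an outlier (e.g. an NLOS reflection) otherwise."""
    return (residual ** 2) / variance <= threshold

predicted_range, sigma = 4.0, 0.1
keep_close   = chi2_gate(4.05 - predicted_range, sigma ** 2)   # 0.5-sigma residual
keep_outlier = chi2_gate(6.00 - predicted_range, sigma ** 2)   # 20-sigma residual
```

In a factor-graph setting, the same test decides whether a ranging factor enters the graph at all, which is how an anchor can be dropped when its measurements turn bad.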
|
|
13:30-15:00, Paper TuBT5-CC.8 | Add to My Program |
Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints |
|
Wang, Weihan | Stevens Institute of Technology |
Chou, Chieh | InnoPeak Technology |
Sevagamoorthy, Ganesh | OPPO US Research Center |
Chen, Kevin | University of Michigan |
Chen, Zheng | Indiana University Bloomington |
Feng, Ziyue | Clemson University |
Xia, Youjie | OPPO US Research Center |
Cai, Feiyang | Stony Brook University |
Xu, Yi | OPPO US Research Center |
Mordohai, Philippos | Stevens Institute of Technology |
Keywords: Visual-Inertial SLAM, Sensor Fusion, SLAM
Abstract: We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We realize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.
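Why gyroscope bias matters for rotation accuracy can be shown with a toy yaw integration: an uncorrected bias grows the heading error linearly with time, which is why the paper estimates the bias before refining rotations. The helper name, sample rate, and bias value are hypothetical.

```python
def integrate_yaw(gyro_z, dt, bias):
    """Integrate yaw-rate samples with the estimated gyroscope bias removed.
    With bias=0 the raw (biased) samples are integrated as-is."""
    yaw = 0.0
    for w in gyro_z:
        yaw += (w - bias) * dt
    return yaw

# A stationary IMU (true yaw rate 0) whose gyro reads a constant 0.11 rad/s bias.
samples = [0.11] * 100
drift     = integrate_yaw(samples, dt=0.01, bias=0.0)    # spurious yaw from bias
corrected = integrate_yaw(samples, dt=0.01, bias=0.11)   # bias removed first
```

One second of integration already produces 0.11 rad (about 6 degrees) of spurious yaw, and the error keeps growing, so even a small bias estimate error compounds into large trajectory error.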
|
|
13:30-15:00, Paper TuBT5-CC.9 | Add to My Program |
Nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping |
|
Millane, Alexander | NVIDIA |
Oleynikova, Helen | ETH Zurich |
Wirbel, Emilie | Valeo |
Steiner, Remo | ETH Zurich |
Ramasamy, Vikram | NVIDIA |
Tingdahl, David | NVIDIA |
Siegwart, Roland | ETH Zurich |
Keywords: Mapping, RGB-D Perception, Vision-Based Navigation
Abstract: Dense, volumetric maps are essential to enable robot navigation and interaction with the environment. To achieve low latency, dense maps are typically computed on-board the robot, often on computationally constrained hardware. Previous works leave a gap between CPU-based systems for robotic mapping which, due to computation constraints, limit map resolution or scale, and GPU-based reconstruction systems which omit features that are critical to robotic path planning, such as computation of the Euclidean Signed Distance Field (ESDF). We introduce a library, nvblox, that aims to fill this gap, by GPU-accelerating robotic volumetric mapping. Nvblox delivers a significant performance improvement over the state of the art, achieving up to a 177× speed-up in surface reconstruction, and up to a 31× improvement in distance field computation, and is available open-source.
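For readers unfamiliar with ESDFs, the brute-force definition can be sketched on a tiny grid: each cell stores its Euclidean distance to the nearest cell of opposite occupancy, negated inside obstacles. Nvblox computes this incrementally on the GPU; the grid, names, and sign convention below are illustrative only.

```python
import math

def esdf(occupied, width, height):
    """Brute-force Euclidean Signed Distance Field on a small 2D grid:
    positive distance to the nearest obstacle cell outside obstacles,
    negative distance to the nearest free cell inside them."""
    field = {}
    for x in range(width):
        for y in range(height):
            inside = (x, y) in occupied
            opposite = [(ox, oy) for ox in range(width) for oy in range(height)
                        if ((ox, oy) in occupied) != inside]
            d = min(math.hypot(x - ox, y - oy) for ox, oy in opposite)
            field[(x, y)] = -d if inside else d
    return field

# A single obstacle cell in the middle of a 5x5 grid.
f = esdf(occupied={(2, 2)}, width=5, height=5)
```

A path planner queries this field to keep a clearance margin: any cell with value below the robot radius is unsafe. The O(n^2) scan here is exactly what incremental GPU methods avoid.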
|
|
TuBT6-CC Oral Session, CC-414 |
Add to My Program |
RGB-D Sensing and Perception I |
|
|
Chair: Navab, Nassir | TU Munich |
Co-Chair: Xiang, Yu | University of Texas at Dallas |
|
13:30-15:00, Paper TuBT6-CC.1 | Add to My Program |
Multi-Resolution Planar Region Extraction for Uneven Terrains |
|
Sun, Yinghan | Southern University of Science and Technology |
Zheng, Linfang | University of Birmingham, Southern University of Science and Technology |
Chen, Hua | Zhejiang University |
Zhang, Wei | Southern University of Science and Technology |
Keywords: RGB-D Perception, Computer Vision for Automation
Abstract: This paper studies the problem of extracting planar regions in uneven terrains from unordered point cloud measurements. Such a problem is critical in various robotic applications such as robotic perceptive locomotion. While existing approaches have shown promising results in effectively extracting planar regions from the environment, they often suffer from issues such as low computational efficiency or loss of resolution. To address these issues, we propose a multi-resolution planar region extraction strategy in this paper that balances the accuracy in boundaries and computational efficiency. Our method begins with a pointwise classification preprocessing module, which categorizes all sampled points according to their local geometric properties to facilitate multi-resolution segmentation. Subsequently, we arrange the categorized points using an octree, followed by an in-depth analysis of nodes to finish multi-resolution plane segmentation. The efficiency and robustness of the proposed approach are verified via synthetic and real-world experiments, demonstrating our method's ability to generalize effectively across various uneven terrains while maintaining real-time performance, achieving frame rates exceeding 35 FPS.
|
|
13:30-15:00, Paper TuBT6-CC.2 | Add to My Program |
RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction |
|
Kasahara, Isaac | Samsung Research America |
Agrawal, Shubham | Samsung Research America |
Engin, Kazim Selim | Samsung Research America |
Chavan-Dafle, Nikhil | Samsung Research America |
Song, Shuran | Columbia University |
Isler, Volkan | University of Minnesota |
Keywords: RGB-D Perception, Deep Learning for Visual Perception
Abstract: General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (Dalle-2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting for normals instead of depth directly, our method allows for robustness to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while providing generalization to novel objects and scenes. Code and data are available at https://samsunglabs.github.io/RIC-project-page/.
|
|
13:30-15:00, Paper TuBT6-CC.3 | Add to My Program |
Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction |
|
Yang, Yuhao | Chongqing University of Technology |
Wu, Jun | Zhejiang University |
Wang, Yue | Zhejiang University |
Zhang, Guangjian | Chongqing University of Technology |
Xiong, Rong | Zhejiang University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception
Abstract: Traditional geometric-registration-based estimation methods exploit the CAD model only implicitly, which makes them dependent on observation quality and vulnerable to occlusion. To address this, this paper proposes a bidirectional correspondence prediction network with a point-wise attention-aware mechanism that not only requires the model points to predict the correspondence, but also explicitly models the geometric similarities between observations and the model prior. Our key insight is that the correlations between each model point and scene point provide essential information for learning point-pair matches. To further tackle the correlation noise brought by feature distribution divergence, we design a simple but effective pseudo-siamese network to improve feature homogeneity. Experimental results on the public datasets of LineMOD, YCB-Video, and Occ-LineMOD show that the proposed method achieves better performance than other state-of-the-art methods under the same evaluation criteria. Its robustness in estimating poses is greatly improved, especially in an environment with severe occlusions.
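The key insight — correlating every model point with every scene point — can be sketched as a soft-assignment attention map: cosine correlations followed by a softmax over scene points (feature dimensions and temperature are illustrative, not the paper's architecture):

```python
import numpy as np

def soft_correspondence(model_feats, scene_feats, tau=0.1):
    """Point-wise attention: cosine correlations between every model point and
    every scene point, softmaxed over scene points into soft matches."""
    m = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    s = scene_feats / np.linalg.norm(scene_feats, axis=1, keepdims=True)
    logits = m @ s.T / tau                     # (M, S) correlation logits
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)    # each row sums to 1

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))
w = soft_correspondence(feats, feats[::-1])    # scene = model points, reversed
print(w.argmax(axis=1))  # each model point finds its reversed counterpart
```

The resulting soft matches are differentiable, which is what lets a correspondence network train end-to-end rather than relying on hard nearest-neighbour assignment.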
|
|
13:30-15:00, Paper TuBT6-CC.4 | Add to My Program |
Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion |
|
Li, Ang | Shanghai Jiao Tong University |
Hu, Anning | Shanghai Jiao Tong University |
Xi, Wei | Midea |
Zou, Danping | Shanghai Jiao Tong University |
Yu, Wenxian | Shanghai Jiao Tong University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for autonomous driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network. Our network includes a deformable propagation module for generating a semi-dense hint map and a confidence map by propagating sparse hints using a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. The experimental results on benchmark tests show its superior performance.
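The triangulation error the conversion module targets is a standard property of stereo geometry: under the pinhole model Z = fB/d, a fixed disparity error produces a depth error that grows quadratically with distance. A sketch with assumed KITTI-like intrinsics (f and B are illustrative values, not from the paper):

```python
import numpy as np

def depth_from_disparity(d, focal_px=721.5, baseline_m=0.54):
    """Pinhole stereo conversion Z = f * B / d (KITTI-like f and B, assumed)."""
    return focal_px * baseline_m / d

def depth_error(Z, disp_err_px=0.5, focal_px=721.5, baseline_m=0.54):
    """First-order triangulation error dZ ~ Z^2 / (f * B) * dd: a fixed
    disparity error grows quadratically with distance."""
    return Z ** 2 / (focal_px * baseline_m) * disp_err_px

Z = np.array([5.0, 20.0, 80.0])
print(depth_error(Z))  # the error at 80 m is 16x the error at 20 m
```

This quadratic blow-up in distant regions is why a learned disparity-depth conversion can pay off where the analytic formula is most fragile.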
|
|
13:30-15:00, Paper TuBT6-CC.5 | Add to My Program |
Leveraging Cycle-Consistent Anchor Points for Self-Supervised RGB-D Registration |
|
Tourani, Siddharth | IIIT Hyderabad |
Gurram, Jayaram | International Institute of Information Technology, Hyderabad |
Thakur, Sarvesh | IIIT Hyderabad |
Khan, Muhammad Haris | Mohamed Bin Zayed University of Artificial Intelligence |
Krishna, Madhava | IIIT Hyderabad |
Narapureddy, Dinesh Reddy | Amazon |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration methods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self-supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.
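Cycle consistency can be illustrated with the simplest instance of the idea: keep only matches that survive the A→B→A round trip, i.e. mutual nearest neighbours in feature space (the features below are synthetic stand-ins for learned keypoint descriptors):

```python
import numpy as np

def cycle_consistent_matches(feat_a, feat_b):
    """Keep only matches that survive the A->B->A cycle: i matches j iff j is
    i's nearest neighbour in B and i is j's nearest neighbour in A."""
    d = np.linalg.norm(feat_a[:, None] - feat_b[None, :], axis=-1)
    a2b = d.argmin(axis=1)   # best B index for each A point
    b2a = d.argmin(axis=0)   # best A index for each B point
    keep = b2a[a2b] == np.arange(len(feat_a))
    return [(int(i), int(a2b[i])) for i in np.flatnonzero(keep)]

rng = np.random.default_rng(2)
fa = rng.normal(size=(6, 8))                          # keypoint features in view A
fb = fa[[3, 1, 0]] + 0.01 * rng.normal(size=(3, 8))   # view B sees 3 of them
print(cycle_consistent_matches(fa, fb))  # [(0, 2), (1, 1), (3, 0)]
```

Matches that fail the round trip (here, the three A points with no counterpart in B) are discarded, which is exactly the self-supervised filtering that makes such keypoints usable as salient anchors.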
|
|
13:30-15:00, Paper TuBT6-CC.6 | Add to My Program |
MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats |
|
Yuan, Shenghai | Nanyang Technological University |
Yang, Yizhuo | Nanyang Technological University |
Nguyen, Thien Hoang | University of Sydney |
Nguyen, Thien-Minh | Nanyang Technological University |
Yang, Jianfei | Nanyang Technological University |
Liu, Fen | Guangdong University of Technology |
Li, Jianping | Nanyang Technological University |
Wang, Han | Nanyang Technological University, Singapore |
Xie, Lihua | Nanyang Technological University |
Keywords: Sensor Fusion, Data Sets for Robotic Vision, Representation Learning
Abstract: Introducing MMAUD: a Multi-Modal Anti-UAV Dataset, developed in response to the evolving challenges posed by small unmanned aerial vehicles (UAVs). These UAVs have the potential to transport harmful payloads or independently cause damage, necessitating comprehensive exploration of countermeasures. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on aerial detection, UAV-type classification, and trajectory estimation—a perspective often overlooked but laden with substantial risks. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique aerial perspective vital for addressing real-world scenarios with higher fidelity compared to datasets reliant on specific vantage points. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement UAV threat assessments. MMAUD's dataset collection process follows a methodical strategy, selecting industrial sites characterized by ambient machinery noise to mirror real-world scenarios. This approach enhances the dataset's applicability, capturing nuanced challenges faced during proximate vehicular operations. MMAUD emerges as an indispensable resource for scholarly investigation and practical research, facilitated by meticulous methodologies. It plays a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Explore the dataset, codes, and designs at https://github.com/ntu-aris/MMAUD.
|
|
13:30-15:00, Paper TuBT6-CC.7 | Add to My Program |
SupeRGB-D: Zero-Shot Instance Segmentation in Cluttered Indoor Environments |
|
Örnek, Evin Pinar | TU Munich |
Krishnan, Aravindhan | Amazon Lab126 |
Gayaka, Shreekant | Amazon |
Kuo, Cheng-Hao | Amazon |
Sen, Arnab | Amazon |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for the Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the objectness of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. We further show competitive results on the real dataset OCID. With its lightweight design (0.4 MB memory requirement), our method is well suited to mobile and robotic applications. Additional DINO features can increase the performance with a higher memory requirement. The dataset split and code are available under www.github.com/evinpinar/supergb-d.
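The agglomerative grouping of geometric patches can be sketched with a plain greedy merger. Here a descriptor distance stands in for the learned merge score, and the patch descriptors are made up for illustration:

```python
import numpy as np

def merge_patches(desc, thresh=0.5):
    """Greedy agglomerative merging: repeatedly fuse the closest pair of patch
    descriptors until the smallest pairwise distance exceeds thresh (a plain
    distance stands in for the learned merge score)."""
    clusters = [[i] for i in range(len(desc))]
    cents = [desc[i].copy() for i in range(len(desc))]
    while len(clusters) > 1:
        d = np.array([[np.linalg.norm(cents[i] - cents[j]) if i < j else np.inf
                       for j in range(len(cents))] for i in range(len(cents))])
        i, j = np.unravel_index(d.argmin(), d.shape)
        if d[i, j] > thresh:
            break
        clusters[i] += clusters[j]
        cents[i] = desc[clusters[i]].mean(axis=0)
        del clusters[j], cents[j]
    return clusters

# Hypothetical patch descriptors: centroid (x, y, z) + unit normal, one per patch
patches = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # three co-planar table-top patches
    [0.1, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.5, 1.0, 0.0, 0.0],   # two patches of a vertical object face
    [1.0, 1.1, 0.5, 1.0, 0.0, 0.0],
])
print(merge_patches(patches))  # [[0, 1, 2], [3, 4]]
```

The learned variant replaces the fixed threshold with a network's merge decision, which is what lets the grouping transfer to unseen object categories.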
|
|
13:30-15:00, Paper TuBT6-CC.8 | Add to My Program |
Mean Shift Mask Transformer for Unseen Object Instance Segmentation |
|
Lu, Yangxiao | The University of Texas at Dallas |
Chen, Yuqiao | University of Texas at Dallas |
Ruozzi, Nicholas | The University of Texas at Dallas |
Xiang, Yu | University of Texas at Dallas |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Recognition
Abstract: Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it can facilitate a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate it into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation.
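The classical vMF mean shift that MSMFormer's attention mechanism simulates can be sketched directly: points on the unit hypersphere repeatedly move to a kappa-weighted mean and are renormalized (a blurring-mean-shift sketch; kappa, iteration count, and the 2-D data are illustrative assumptions):

```python
import numpy as np

def vmf_mean_shift(feats, kappa=10.0, iters=30):
    """Blurring mean shift with a von Mises-Fisher kernel: every point moves to
    the kappa-weighted mean of all points, then is renormalized to the sphere."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    for _ in range(iters):
        w = np.exp(kappa * (x @ x.T))   # vMF kernel weights
        m = w @ x                       # weighted means
        x = m / np.linalg.norm(m, axis=1, keepdims=True)
    return x

# Two noisy clusters on the unit circle collapse onto two antipodal modes
rng = np.random.default_rng(3)
ang = np.concatenate([rng.normal(0.0, 0.05, 20), rng.normal(np.pi, 0.05, 20)])
pts = np.column_stack([np.cos(ang), np.sin(ang)])
modes = vmf_mean_shift(pts)
print(modes[0] @ modes[20])  # close to -1: the recovered modes are antipodal
```

The update is a sequence of matrix products and normalizations, which is why it can be unrolled into differentiable attention layers and trained jointly with the feature extractor.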
|
|
13:30-15:00, Paper TuBT6-CC.9 | Add to My Program |
SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion Using a 3D Recurrent U-Net |
|
Cao, Helin | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Visual Learning
Abstract: We introduce SLCF-Net, a novel approach for the Semantic Scene Completion (SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements. The images are semantically segmented by a pre-trained 2D U-Net and a dense depth prior is estimated from a depth-conditioned pipeline fueled by Depth Anything. To associate the 2D image features with the 3D scene volume, we introduce Gaussian-decay Depth-prior Projection (GDP). This module projects the 2D features into the 3D volume along the line of sight with a Gaussian-decay function, centered around the depth prior. Volumetric semantics is computed by a 3D U-Net. We propagate the hidden 3D U-Net state using the sensor motion and design a novel loss to ensure temporal consistency. We evaluate our approach on the SemanticKITTI dataset and compare it with leading SSC approaches. The SLCF-Net excels in all SSC metrics and shows great temporal consistency.
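The Gaussian-decay Depth-prior Projection can be pictured as a weighting profile over the voxels along one camera ray: the 2D feature is spread into 3D with most of its mass near the depth prior (bin layout and sigma below are assumptions for illustration):

```python
import numpy as np

def gaussian_decay_weights(z_bins, z_prior, sigma=0.5):
    """Spread a 2D feature along its line of sight into depth bins, with a
    Gaussian falloff centered on the depth prior."""
    w = np.exp(-0.5 * ((z_bins - z_prior) / sigma) ** 2)
    return w / w.sum()

z_bins = np.arange(0.25, 10.0, 0.5)   # voxel centers along one camera ray
w = gaussian_decay_weights(z_bins, z_prior=4.3)
print(z_bins[w.argmax()])  # 4.25, the voxel nearest the depth prior
```

Compared to copying the feature into a single voxel, the soft falloff keeps the projection differentiable and tolerant of error in the estimated depth prior.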
|
|
TuBT7-CC Oral Session, CC-416 |
Add to My Program |
Imitation Learning |
|
|
Chair: Johns, Edward | Imperial College London |
Co-Chair: Bıyık, Erdem | University of Southern California |
|
13:30-15:00, Paper TuBT7-CC.1 | Add to My Program |
Overparametrization Helps Offline-To-Online Generalization of Closed-Loop Control from Pixels |
|
Lechner, Mathias | Massachusetts Institute of Technology |
Hasani, Ramin | Massachusetts Institute of Technology (MIT) |
Amini, Alexander | Massachusetts Institute of Technology |
Wang, Tsun-Hsuan | Massachusetts Institute of Technology |
Henzinger, Thomas | IST Austria |
Rus, Daniela | MIT |
Keywords: Imitation Learning, Deep Learning Methods, Representation Learning
Abstract: There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional to vision transformers, from small to gigantic networks, have been extensively tested on offline image classification tasks. In this paper, we study these vision models with respect to the open-loop training to closed-loop generalization abilities, i.e., deployment realizes a causal feedback loop that is not present during training. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that, under proper training guidelines, all vision architectures perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, we observe that scale is the strongest factor in improving closed-loop generalization regardless of the choice of the model architecture. Our results predict the trend that in the future we will see larger and larger models being used in offline-training-online-deployment imitation learning tasks in robotic applications.
|
|
13:30-15:00, Paper TuBT7-CC.2 | Add to My Program |
Hierarchical Human-To-Robot Imitation Learning for Long-Horizon Tasks Via Cross-Domain Skill Alignment |
|
Lin, Zhenyang | University of Chinese Academy of Sciences |
Chen, Yurou | Chinese Academy of Sciences |
Liu, Zhiyong | Institute of Automation Chinese Academy of Sciences |
Keywords: Imitation Learning, Deep Learning Methods, Representation Learning
Abstract: For a general-purpose robot, it is desirable to imitate human demonstration videos that can effectively solve long-horizon tasks and perform novel ones. Recent advances in skill-based imitation learning have shown that extracting skill embedding from raw human videos is a promising paradigm to enable robots to cope with long-horizon tasks. However, generalization to unseen tasks in a different domain with a human prompt video poses a significant challenge due to the big embodiment and environment difference. To this end, we present Hierarchical Human-to-Robot Imitation Learning (H2RIL) that learns the mapping of cross-domain sensorimotor skills and utilizes it to generalize to unseen tasks given a human video in a different environment. To allow for generalizing zero-shot across environments and embodiments, H2RIL leverages task-agnostic play data for low-level policy training and paired human-robot data for both semantic and temporal skill embedding alignment. Extensive experiments in a simulated kitchen environment demonstrate that H2RIL significantly outperforms other prior baselines and is capable of generalizing to composable new tasks and adapting to Out-of-Distribution (OOD) tasks.
|
|
13:30-15:00, Paper TuBT7-CC.3 | Add to My Program |
Policy Optimization by Looking Ahead for Model-Based Offline RL |
|
Liu, Yang | The University of Hong Kong |
Hofert, Marius | The University of Hong Kong |
Keywords: Reinforcement Learning, Deep Learning Methods, Planning under Uncertainty
Abstract: Offline reinforcement learning (RL) aims to optimize the policy, based on pre-collected data, to maximize the cumulative rewards after performing a sequence of actions. Existing approaches learn a value function from historical data, then guide the update of policy parameters by maximizing the value function at a single time step. Driven by the gap between maximizing the cumulative rewards of RL and the greedy strategy of existing methods, we propose an approach of policy optimization by looking ahead (POLA) to mitigate the gap. Concretely, we optimize the policy on both current and future states, where the future states are predicted by a transition model. A trajectory contains numerous actions before the task is done, and performing the best action at each step does not guarantee an optimal trajectory in the end; sub-optimal or negative actions must occasionally be allowed. Existing methods, however, focus on generating the optimal action at each step according to the Q-value maximization principle. This motivates our looking-ahead approach. Besides, hidden confounding factors may affect the decision-making process. To that end, we incorporate the correlations among dimensions of the state into the policy, providing more information about the environment for the policy to make decisions. Empirical results on the Mujoco dataset show the effectiveness of the proposed approach.
|
|
13:30-15:00, Paper TuBT7-CC.4 | Add to My Program |
DINOBot: Robot Manipulation Via Retrieval and Alignment with Vision Foundation Models |
|
Di Palo, Norman | Imperial College London |
Johns, Edward | Imperial College London |
Keywords: Imitation Learning, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: We propose DINOBot, a novel imitation learning framework for robot manipulation, which leverages the image-level and pixel-level capabilities of features extracted from Vision Transformers trained with DINO. When interacting with a novel object, DINOBot first uses these features to retrieve the most visually similar object experienced during human demonstrations, and then uses this object to align its end-effector with the novel object to enable effective interaction. Through a series of real-world experiments on everyday tasks, we show that exploiting both the image-level and pixel-level properties of visual foundation models enables unprecedented learning efficiency and generalisation. Videos and code are available at https://www.robot-learning.uk/dinobot
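The retrieval half of the pipeline amounts to nearest-neighbour search over image-level embeddings. A sketch with random vectors standing in for the DINO features (the 384-dimensional size mirrors a ViT-S embedding, an assumption; the alignment half is not shown):

```python
import numpy as np

def retrieve_demo(query_feat, demo_feats):
    """Pick the most visually similar stored demonstration by cosine
    similarity of image-level embeddings."""
    q = query_feat / np.linalg.norm(query_feat)
    d = demo_feats / np.linalg.norm(demo_feats, axis=1, keepdims=True)
    sims = d @ q
    return int(sims.argmax()), sims

rng = np.random.default_rng(4)
demos = rng.normal(size=(10, 384))              # 10 stored demo embeddings
query = demos[7] + 0.1 * rng.normal(size=384)   # a noisy new view of demo 7
idx, sims = retrieve_demo(query, demos)
print(idx)  # 7
```

In the full framework, the retrieved demonstration then supplies pixel-level correspondences for end-effector alignment before the recorded interaction is replayed.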
|
|
13:30-15:00, Paper TuBT7-CC.5 | Add to My Program |
Rank2Reward: Learning Shaped Reward Functions from Passive Video |
|
Yang, Daniel | Massachusetts Institute of Technology |
Tjia, Davin | University of Washington |
Herman Berg, Jacob | University of Washington |
Damen, Dima | University of Bristol |
Agrawal, Pulkit | MIT |
Gupta, Abhishek | University of Washington |
Keywords: Imitation Learning, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both "what" to do and "how" to do it. A powerful way to encode both the "what" and the "how" is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. We propose a technique Rank2Reward for learning behaviors from videos of tasks being performed without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task by learning how to temporally rank the video frames in a demonstration. By inferring an appropriate ranking, the reward function is able to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme resulting in an algorithm that can learn behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors from raw video on a number of tabletop manipulation tasks in both simulations and on a real-world robotic arm. We also demonstrate how Rank2Reward can be easily extended to be applicable to web-scale video datasets.
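The ranking idea at the core of the method can be written down directly: a Bradley-Terry style loss over ordered frame pairs, which is lower when a reward function scores later frames higher than earlier ones (a toy sketch with scalar rewards, not the paper's network or training loop):

```python
import numpy as np

def ranking_loss(rewards):
    """Bradley-Terry loss over all ordered frame pairs (i < j) of one demo
    video: a learned reward should score later frames higher than earlier ones."""
    i, j = np.triu_indices(len(rewards), k=1)
    margins = rewards[j] - rewards[i]                      # want these positive
    return float(-np.log(1.0 / (1.0 + np.exp(-margins))).mean())

monotone = np.array([0.0, 0.5, 1.0, 1.5])   # reward increases with progress
shuffled = np.array([1.5, 0.0, 1.0, 0.5])   # same values, wrong temporal order
print(ranking_loss(monotone) < ranking_loss(shuffled))  # True
```

Minimizing this loss over demo videos yields a reward that measures incremental progress, which is the shaping signal the downstream RL agent consumes.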
|
|
13:30-15:00, Paper TuBT7-CC.6 | Add to My Program |
A Generalized Acquisition Function for Preference-Based Reward Learning |
|
Ellis, Evan | UC Berkeley |
Ghosal, Gaurav | Carnegie Mellon University |
Russell, Stuart Jonathan | University of California, Berkeley |
Dragan, Anca | University of California Berkeley |
Bıyık, Erdem | University of Southern California |
Keywords: Imitation Learning, Reinforcement Learning, Machine Learning for Robot Control
Abstract: Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task. Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency. The information gain criterion focuses on precisely identifying all parameters of the reward function. This can potentially be wasteful as many parameters may result in the same reward, and many rewards may result in the same behavior in the downstream tasks. Instead, we show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar. We introduce a tractable framework that can capture such definitions of similarity. Our experiments in a synthetic environment, an assistive robotics environment with domain transfer, and a natural language processing problem with real datasets demonstrate the superior performance of our querying method over the state-of-the-art information gain method.
|
|
13:30-15:00, Paper TuBT7-CC.7 | Add to My Program |
Human-Robot Deformation Manipulation Skill Transfer: Sequential Fabric Unfolding Method for Robots |
|
Fu, Tianyu | Shandong University |
Bai, Yunfeng | Shandong University |
Li, Cheng | Shandong University |
Li, Fengming | Shandong Jianzhu University |
Wang, Chaoqun | Shandong University |
Song, Rui | Shandong University |
Keywords: Imitation Learning, Learning from Demonstration, Manipulation Planning
Abstract: Deformable object manipulation has been considered a challenging task for robots due to its complex dynamics and infinite-dimensional configuration space. Fabric unfolding manipulation takes on critical significance in the textile industry and household services. Accordingly, equipping robots with this skill is a crucial and challenging task. In this study, a general framework is developed for transferring human skills to robots in fabric unfolding manipulation. The developed framework comprises two key components: behavior cloning to learn the human unfolding policy, and learning from demonstration to transfer unfolding actions. A mixture density network is introduced to address the multimodality in human policy. Moreover, task parameter weighting is considered during action generalization to adapt to a wide variety of unfolding scenarios. As revealed by the experimental results of this study, the framework can successfully unfold fabrics of different colors and sizes, with performance comparable to human-level operation. Furthermore, the framework can also be applied to garment unfolding, and experiments suggest that it generalizes well.
|
|
13:30-15:00, Paper TuBT7-CC.8 | Add to My Program |
Model Optimization in Deep Learning Based Robot Control for Autonomous Driving |
|
Paniego, Sergio | Universidad Rey Juan Carlos |
Paliwal, Nikhil | Saarland University, Germany |
Cañas, José M. | Universidad Rey Juan Carlos |
Keywords: Imitation Learning, Deep Learning for Visual Perception, Machine Learning for Robot Control
Abstract: Deep Learning (DL) has been successfully used in robotics for perception tasks and end-to-end robot control. In the context of autonomous driving, this work explores and compares a variety of alternatives for model optimization to solve the visual lane-follow application in urban scenarios with an imitation learning approach. The optimization techniques include quantization, pruning, fine-tuning (retraining), and clustering, covering all the options available in the most common DL frameworks. TensorRT optimization for specific cutting-edge hardware devices has also been explored. For the comparison, offline metrics such as mean squared error and inference time are used. In addition, the optimized models have been evaluated in an online fashion using the autonomous driving state-of-the-art simulator CARLA and an assessment tool called Behavior Metrics, which provides holistic quantitative fine-grain data about robot performance. Typically, the performance of robot applications depends on both the quality of the control decisions and their frequency. The studied optimized models significantly increase inference frequency without losing decision quality. The impact of each optimization alone has also been measured. This speed-up allows us to successfully run DL robot-control applications even on limited computing hardware. All the work presented here is open-source, including models, weights, assessment tool, and dataset, for easy replication and extension.
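Among the optimizations compared, post-training quantization is the easiest to sketch: map float weights to int8 with a per-tensor scale and check the reconstruction error. This is a framework-agnostic simulation of the idea, not any specific toolkit's API:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: map float weights to int8 with a
    per-tensor scale; dequantizing as q * scale recovers an approximation."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(5)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
mse = float(np.mean((w - q.astype(np.float32) * scale) ** 2))
print(mse)  # far below the weight variance of 0.01
```

The 4x memory reduction and faster int8 arithmetic are where the inference-frequency gains the abstract reports typically come from, with the MSE above as the price paid in decision quality.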
|
|
TuBT8-CC Oral Session, CC-418 |
Add to My Program |
Reinforcement Learning I |
|
|
Chair: Plancher, Brian | Barnard College, Columbia University |
Co-Chair: Mangharam, Rahul | University of Pennsylvania |
|
13:30-15:00, Paper TuBT8-CC.1 | Add to My Program |
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy |
|
Cao, Chenyang | Tsinghua University |
Yan, Zichen | Tsinghua University |
Lu, Renhao | Tsinghua University |
Tan, Junbo | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Reinforcement Learning, Robot Safety, Deep Learning Methods
Abstract: Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal attainment without considering safety, while others excessively focus on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals. To evaluate the method's performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle and use expert or random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm. As a result, our method outperforms the existing state-of-the-art methods to a large extent. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.
|
|
13:30-15:00, Paper TuBT8-CC.2 | Add to My Program |
Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization |
|
Yang, Fan | University of Michigan |
Zhou, Wenxuan | Carnegie Mellon University |
Liu, Zuxin | Carnegie Mellon University |
Zhao, Ding | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Reinforcement Learning, Robot Safety
Abstract: Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles. Further insights are available from the videos on our website: https://sites.google.com/view/safemdp
|
|
13:30-15:00, Paper TuBT8-CC.3 | Add to My Program |
Distributional Reinforcement Learning with Sample-Set Bellman Update |
|
Zhang, Weijian | Nanjing University |
Wang, Jianshu | Nanjing University |
Yang, Yu | National Key Laboratory for Novel Software Technology, Nanjing U |
Keywords: Reinforcement Learning, Representation Learning, Probability and Statistical Methods
Abstract: Distributional Reinforcement Learning (DRL) not only endeavors to optimize expected returns but also strives to accurately characterize the full distribution of these returns, a key aspect in enhancing risk-aware decision-making. Previous DRL implementations often inappropriately treat statistical estimations as concrete samples, which undermines the integrity of learning. While several studies have addressed this issue, they frequently give rise to new complications, including computational burdens and diminished stochastic behavior. In our work, we present a novel DRL framework that leverages the Gaussian mixture model to adeptly depict the distribution of returns. This approach ensures precise, authentic sampling critical for robust learning, while also preserving computational tractability. Through extensive evaluation of a diverse array of 59 Atari games, our method not only surpasses the efficacy of prior DRL algorithms but also presents formidable competition to contemporary top-tier RL algorithms, signifying a substantial advancement in the field.
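Representing returns with a Gaussian mixture lets the learner draw genuine samples rather than treating statistical estimates as samples. A sketch of sampling from such a mixture and forming the distributional Bellman target r + γZ(s') (weights, means, reward, and discount are made up for illustration):

```python
import numpy as np

def sample_gmm(weights, means, stds, n, rng):
    """Draw genuine samples from a Gaussian-mixture return distribution."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

rng = np.random.default_rng(6)
# Made-up next-state return distribution Z(s') and a one-step Bellman backup
w, mu, sd = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.2, 0.5])
z_next = sample_gmm(w, mu, sd, 100_000, rng)
target = 0.5 + 0.99 * z_next       # r + gamma * Z(s')
print(target.mean())               # close to 0.5 + 0.99 * (0.3*-1 + 0.7*2)
```

Because the affine map r + γz of a Gaussian mixture is again a Gaussian mixture, the target distribution stays in the same parametric family, which keeps the update tractable.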
|
|
13:30-15:00, Paper TuBT8-CC.4 | Add to My Program |
Learning Adaptive Safety for Multi-Agent Systems |
|
Berducci, Luigi | TU Wien |
Yang, Shuo | University of Pennsylvania |
Mangharam, Rahul | University of Pennsylvania |
Grosu, Radu | TU Wien |
Keywords: Reinforcement Learning, Robot Safety, Multi-Robot Systems
Abstract: Ensuring safety in dynamic multi-agent systems is challenging due to limited information about the other agents. Control Barrier Functions (CBFs) are showing promise for safety assurance but current methods make strong assumptions about other agents and often rely on manual tuning to balance safety, feasibility, and performance. In this work, we delve into the problem of adaptive safe learning for multi-agent systems with CBF. We show how emergent behaviour can be profoundly influenced by the CBF configuration, highlighting the necessity for a responsive and dynamic approach to CBF design. We present ASRL, a novel adaptive safe RL framework, to fully automate the optimization of policy and CBF coefficients, to enhance safety and long-term performance through reinforcement learning. By directly interacting with the other agents, ASRL learns to cope with diverse agent behaviours and maintains the cost violations below a desired limit. We evaluate ASRL in a multi-robot system and competitive multi-agent racing, against learning-based and control-theoretic approaches. We empirically demonstrate the efficacy of ASRL, and assess generalization and scalability to out-of-distribution scenarios.
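The CBF coefficients that ASRL tunes enter through a safety filter of roughly this shape. For a 1-D single integrator, the class-K condition ḣ ≥ -αh reduces to a clamp on the nominal input; α is the kind of coefficient the framework adapts online (a deliberately minimal sketch, not the paper's multi-agent formulation):

```python
def cbf_filter(x, u_nom, x_min=0.0, alpha=1.0):
    """Minimal CBF safety filter for a 1-D single integrator x' = u with
    barrier h(x) = x - x_min: safety requires h' = u >= -alpha * h, so the
    safe input closest to the nominal one is a simple clamp."""
    h = x - x_min
    return max(u_nom, -alpha * h)

# The nominal controller pushes toward the unsafe region; the filter clips it
print(cbf_filter(0.2, u_nom=-5.0))  # -0.2, the steepest descent keeping h >= 0
print(cbf_filter(0.2, u_nom=1.0))   # 1.0: already safe, passed through
```

A larger α permits faster approach to the boundary (more aggressive, better performance), while a smaller α is more conservative — exactly the safety/performance trade-off the RL layer learns to balance per situation.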
|
|
13:30-15:00, Paper TuBT8-CC.5 | Add to My Program |
Contrastive Initial State Buffer for Reinforcement Learning |
|
Messikommer, Nico | University of Zurich |
Song, Yunlong | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
|
|
13:30-15:00, Paper TuBT8-CC.6 | Add to My Program |
Safety Optimized Reinforcement Learning Via Multi-Objective Policy Optimization |
|
Honari, Homayoun | University of Victoria |
Ghafarian Tamizi, Mehran | University of Victoria |
Najjaran, Homayoun | University of Victoria |
Keywords: Reinforcement Learning, Learning from Experience
Abstract: Safe reinforcement learning (Safe RL) refers to a class of techniques that aim to prevent RL algorithms from violating constraints in the process of decision-making and exploration during trial and error. In this paper, a novel model-free Safe RL algorithm, formulated based on the multi-objective policy optimization framework, is introduced, where the policy is optimized towards optimality and safety simultaneously. Optimality is achieved by the environment reward function, which is subsequently shaped using a safety critic. The advantage of the Safety Optimized RL (SORL) algorithm compared to traditional Safe RL algorithms is that it omits the need to constrain the policy search space. This allows SORL to find a natural tradeoff between safety and optimality without compromising performance in terms of either safety or optimality due to strict search-space constraints. Through our theoretical analysis of SORL, we propose a condition for SORL's converged policy to guarantee safety and then use it to introduce an aggressiveness parameter that allows for fine-tuning the mentioned tradeoff. The experimental results obtained in seven different robotic environments indicate a considerable reduction in the number of safety violations along with higher, or competitive, policy returns in comparison to six different state-of-the-art Safe RL methods. The results demonstrate the significant superiority of the proposed SORL algorithm in safety-critical applications.
|
|
13:30-15:00, Paper TuBT8-CC.7 | Add to My Program |
Differentially Encoded Observation Spaces for Perceptive Reinforcement Learning |
|
Grossman, Lev | Berkshire Grey |
Plancher, Brian | Barnard College, Columbia University |
Keywords: Reinforcement Learning, Deep Learning Methods, Software Architecture for Robotic and Automation
Abstract: Perceptive deep reinforcement learning (DRL) has led to many recent breakthroughs for complex AI systems leveraging image-based input data. Applications of these results range from super-human level video game agents to dexterous, physically intelligent robots. However, training these perceptive DRL-enabled systems remains incredibly compute and memory intensive, often requiring huge training datasets and large experience replay buffers. This poses a challenge for the next generation of field robots that will need to be able to learn on the edge in order to adapt to their environments. In this paper, we begin to address this issue through differentially encoded observation spaces. By reinterpreting stored image-based observations as a video, we leverage lossless differential video encoding schemes to compress the replay buffer without impacting training performance. We evaluate our approach with three state-of-the-art DRL algorithms and find that differential image encoding reduces the memory footprint by as much as 14.2x and 16.7x across tasks from the Atari 2600 benchmark and the DeepMind Control Suite (DMC) respectively. These savings also enable large-scale perceptive DRL that previously required paging between flash and RAM to be run entirely in RAM, improving the latency of DMC tasks by as much as 27%.
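The principle behind the compression, reinterpreting a sequence of stored observations as a video and keeping only inter-frame differences, can be illustrated with a lossless delta codec over toy frames. This is a simplified sketch: the paper applies actual lossless video encoding schemes, not the raw deltas shown here.

```python
import numpy as np

def encode_diff(frames):
    """Store the first frame plus successive differences (lossless)."""
    base = frames[0].astype(np.int16)
    deltas = [frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)
              for i in range(1, len(frames))]
    return base, deltas

def decode_diff(base, deltas):
    """Reconstruct the original frames exactly from base + deltas."""
    frames = [base.copy()]
    for d in deltas:
        frames.append(frames[-1] + d)
    return [f.astype(np.uint8) for f in frames]

# Toy 4x4 grayscale "observations" that change slowly between steps,
# as consecutive frames in a replay buffer typically do
rng = np.random.default_rng(0)
obs = [rng.integers(0, 256, (4, 4), dtype=np.uint8)]
for _ in range(3):
    nxt = obs[-1].astype(np.int16) + rng.integers(-2, 3, (4, 4))
    obs.append(np.clip(nxt, 0, 255).astype(np.uint8))

base, deltas = encode_diff(obs)
recovered = decode_diff(base, deltas)
```

Because consecutive observations are nearly identical, the delta arrays are dominated by small values and compress far better than the raw frames, while reconstruction remains bit-exact.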
|
|
13:30-15:00, Paper TuBT8-CC.8 | Add to My Program |
Projected Task-Specific Layers for Multi-Task Reinforcement Learning |
|
Somerville Roberts, Josselin | Stanford University |
Di, Julia | Stanford University |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: Multi-task reinforcement learning could enable robots to scale across a wide variety of manipulation tasks in homes and workplaces. However, generalizing from one task to another and mitigating negative task interference still remain challenges. Addressing this challenge by successfully sharing information across tasks will depend on how well the structure underlying the tasks is captured. In this work, we introduce our new architecture, Projected Task-Specific Layers (PTSL), that leverages a common policy with dense task-specific corrections through task-specific layers to better express shared and variable task information. We then show that our model outperforms the state of the art on the MT10 and MT50 benchmarks of Meta-World consisting of 10 and 50 goal-conditioned tasks for a Sawyer arm.
|
|
13:30-15:00, Paper TuBT8-CC.9 | Add to My Program |
Bi²Lane: Bi-Directional Temporal Refinement with Bi-Level Feature Aggregation for 3D Lane Detection |
|
Li, Chengxin | South China Normal University |
Hu, Yihui | GAC R&D Center |
Zheng, Zewen | Guangdong University of Technology |
Gao, Xiang | Guangdong University of Technology |
Mou, Yongqiang | Guangzhou Automobile Group Co Ltd |
Nie, Peng | Guangzhou Automobile Group Co Ltd |
Li, Jun | South China Normal University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Recognition
Abstract: Monocular 3D lane detection has recently received increasing research attention in autonomous driving due to its application effectiveness and simplicity. However, depending solely on the limited semantic information from a single image makes current monocular detection methods unable to deal with complex scenarios, such as occluded, blurred, and unaligned scenes. In this study, we introduce an end-to-end framework named Bi²Lane which models temporal dependency in a continuous sequence. It recurrently utilizes detected lanes within historical frames as prior information to achieve robust lane detection. Additionally, Bi²Lane employs temporal reverse refinement together with temporal forward refinement to achieve bi-directional temporal refinement (BDTR) while maintaining a robust temporal dependency. For the refined features of different frames, we design a bi-level feature aggregation module (BLFA) to fuse them in both point-level and line-level manners, enabling a comprehensive feature representation to deal with complicated road scenes. Extensive experiments conducted on the OpenLane dataset demonstrate the superiority of Bi²Lane, achieving a notable F1 score of 63.8% using a simple ResNet50 backbone, surpassing the performance of existing state-of-the-art methods.
|
|
TuBT9-CC Oral Session, CC-419 |
Add to My Program |
Vision-Based Navigation |
|
|
Chair: Morimitsu, Henrique | University of Science and Technology Beijing |
Co-Chair: Fischer, Tobias | Queensland University of Technology |
|
13:30-15:00, Paper TuBT9-CC.1 | Add to My Program |
Exploitation-Guided Exploration for Semantic Embodied Navigation |
|
Wasserman, Justin | University of Illinois at Urbana-Champaign |
Chowdhary, Girish | University of Illinois at Urbana Champaign |
Gupta, Abhinav | Carnegie Mellon University |
Jain, Unnat | Indian Institute of Technology Kanpur |
Keywords: Vision-Based Navigation
Abstract: In the recent progress in embodied navigation and sim-to-robot transfer, modular policies have emerged as a de facto framework. However, there is more to compositionality beyond the decomposition of the learning load into modular components. In this work, we investigate a principled way to syntactically combine these components. Particularly, we propose Exploitation-Guided Exploration (XGX) where separate modules for exploration and exploitation come together in a novel and intuitive manner. We configure the exploitation module to take over in the deterministic final steps of navigation i.e. when the goal becomes visible. Crucially, an exploitation module teacher-forces the exploration module and continues driving an overridden policy optimization. XGX, with effective decomposition and novel guidance, improves the state-of-the-art performance on the challenging object navigation task from 70% to 73%. Along with better accuracy, through targeted analysis, we show that XGX is also more efficient at goal-conditioned exploration. Finally, we show sim-to-real transfer to robot hardware and XGX performs over two-fold better than the best baseline from simulation benchmarking.
|
|
13:30-15:00, Paper TuBT9-CC.2 | Add to My Program |
Teach and Repeat Navigation: A Robust Control Approach |
|
Nourizadeh, Payam | QUT Centre for Robotics |
Milford, Michael J | Queensland University of Technology |
Fischer, Tobias | Queensland University of Technology |
Keywords: Vision-Based Navigation
Abstract: Robot navigation requires an autonomy pipeline that is robust to environmental changes and effective in varying conditions. Teach and Repeat (T&R) navigation has shown high performance in autonomous repeated tasks under challenging circumstances, but research within T&R has predominantly focused on motion planning as opposed to robust motion control. In this paper, we propose a novel T&R system based on a robust motion control technique for a skid-steering mobile robot using sliding-mode control that effectively handles uncertainties due to sensor noises, parametric uncertainties, and wheel-terrain interaction. We theoretically demonstrate that the proposed T&R system is globally stable and robust while considering the uncertainties of the closed-loop system. When deployed on a Clearpath Jackal robot, we show the global stability of the proposed system in both indoor and outdoor environments covering different terrains, outperforming previous state-of-the-art methods that had a higher mean average trajectory error and became unstable in these challenging environments. This paper makes an important step towards long-term autonomous T&R navigation with ensured safety guarantees.
|
|
13:30-15:00, Paper TuBT9-CC.3 | Add to My Program |
Real-Time Localization for Closed-Loop Control of Assistive Furniture |
|
Tang, Lixuan | EPFL |
Ning, Chuanfang | EPFL |
Adaimi, George | École Polytechnique Fédérale De Lausanne (EPFL) |
Ijspeert, Auke | EPFL |
Alahi, Alexandre | EPFL |
Bolotnikova, Anastasia | EPFL |
Keywords: Vision-Based Navigation, Localization, Object Detection, Segmentation and Categorization
Abstract: For people with limited mobility, navigating in cluttered indoor environments is challenging. In this work, we propose a mobile assistive furniture suite that is designed to ease the life of people with special needs in indoor movement. To enable intelligent coordination of this system, a key component is the localization of each piece of mobile furniture. The challenge is to assess the state of an arbitrary living environment so that the estimation can be used as a real-time feedback signal for autonomous closed-loop control of mobile furniture. We propose a perception pipeline that addresses these challenges. A machine learning model is designed and trained to jointly achieve multi-object semantic keypoint detection and classification in camera images. Synthetic data generation is employed to augment the training set and boost the model performance. A robust point cloud registration uses the detected semantic keypoints and depth information to estimate poses of the furniture. Tracking is applied to achieve smooth estimation. A high-performance accelerator that optimizes the efficiency of using heterogeneous devices is applied to achieve real-time performance. This visual perception pipeline is used in closed-loop control to steer the mobile furniture from an initial to a desired location, as demonstrated in experiments on real hardware.
|
|
13:30-15:00, Paper TuBT9-CC.4 | Add to My Program |
Uncertainty-Aware Hybrid Paradigm of Nonlinear MPC and Model-Based RL for Offroad Navigation: Exploration of Transformers in the Predictive Model |
|
Lotfi, Faraz | McGill University |
Virji, Khalil | McGill University |
Faraji, Farnoosh | McGill University |
Berry, Lucas | McGill University |
Holliday, Andrew | McGill University |
Meger, David Paul | McGill University |
Dudek, Gregory | McGill University |
Keywords: Vision-Based Navigation, Planning under Uncertainty, Optimization and Optimal Control
Abstract: In this paper, we investigate a hybrid scheme that combines nonlinear model predictive control (MPC) and model-based reinforcement learning (RL) for navigation planning of an autonomous model car across offroad, unstructured terrains without relying on predefined maps. Our innovative approach takes inspiration from BADGR, an LSTM-based network that primarily concentrates on environment modeling, but distinguishes itself by substituting LSTM modules with transformers to greatly elevate the performance of our model. Addressing uncertainty within the system, we train an ensemble of predictive models and estimate the mutual information between model weights and outputs, facilitating dynamic horizon planning through the introduction of variable speeds. Further enhancing our methodology, we incorporate a nonlinear MPC controller that accounts for the intricacies of the vehicle's model and states. The model-based RL facet produces steering angles and quantifies inherent uncertainty. At the same time, the nonlinear MPC suggests optimal throttle settings, striking a balance between goal attainment speed and managing model uncertainty influenced by velocity. In the conducted studies, our approach excels over the existing baseline by consistently achieving higher metric values in predicting future events and seamlessly integrating the vehicle's kinematic model for enhanced decision-making.
|
|
13:30-15:00, Paper TuBT9-CC.5 | Add to My Program |
Robot Navigation in Unseen Environments Using Coarse Maps |
|
Xu, Chengguang | Northeastern University |
Amato, Christopher | Northeastern University |
Wong, Lawson L.S. | Northeastern University |
Keywords: Vision-Based Navigation, Localization
Abstract: Metric occupancy maps are widely used in autonomous robot navigation systems. However, when a robot is deployed in an unseen environment, building an accurate metric map is time-consuming. Can an autonomous robot directly navigate in previously unseen environments using coarse maps? In this work, we propose the Coarse Map Navigator (CMN), a navigation framework that can perform robot navigation in unseen environments using different coarse maps. To do so, CMN addresses two challenges: (1) novel and realistic visual observations; (2) error and misalignment on coarse maps. To tackle novel visual observations in unseen environments, CMN learns a deep perception model that maps the visual input from various pixel spaces to the local occupancy grid space. To tackle the error and misalignment on coarse maps, CMN extends the Bayesian filter and maintains a belief directly on coarse maps using the predicted local occupancy grids as observations. Using the latest belief, CMN extracts a global heuristic vector that guides the planner to find a local navigation action. Empirical results demonstrate that CMN achieves high navigation success rates in unseen environments, significantly outperforming baselines, and is robust to different coarse maps.
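The belief maintenance that CMN performs over the coarse map is, at heart, a Bayesian (histogram) filter. The following is a minimal 1-D sketch with invented probabilities and a toy 5-cell map; the actual system operates on 2-D coarse maps with predicted local occupancy grids as observations:

```python
import numpy as np

def motion_update(belief, p_move=0.8):
    """Shift belief one cell right with probability p_move; stay otherwise."""
    moved = np.roll(belief, 1)
    moved[0] = 0.0                      # no wrap-around at the map edge
    new = p_move * moved + (1 - p_move) * belief
    return new / new.sum()

def observe_update(belief, coarse_map, z, p_hit=0.9):
    """Up-weight cells whose map value matches the observation z."""
    likelihood = np.where(coarse_map == z, p_hit, 1 - p_hit)
    post = belief * likelihood
    return post / post.sum()

coarse_map = np.array([0, 1, 0, 0, 1])   # toy map: 1 = occupied, 0 = free
belief = np.full(5, 0.2)                 # uniform prior over cells
belief = observe_update(belief, coarse_map, z=1)  # we observe "occupied"
belief = motion_update(belief)                    # robot moves one cell
belief = observe_update(belief, coarse_map, z=0)  # we observe "free"
```

After these updates the belief concentrates on cell 2: it is the only free cell directly to the right of an occupied cell, which is exactly the kind of disambiguation a filter over a coarse map provides.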
|
|
13:30-15:00, Paper TuBT9-CC.6 | Add to My Program |
Bicode: A Hybrid Blinking Marker System for Event Cameras |
|
Kitade, Takuya | NTT DOCOMO, Inc. |
Yamada, Wataru | NTT DOCOMO, Inc. |
Ochiai, Keiichi | NTT DOCOMO, Inc. |
Imai, Michita | Keio University |
Keywords: Vision-Based Navigation, Recognition, Localization
Abstract: In the field of robotics, tag systems play an important role in various applications, such as object identification and robot control in real-world environments. While typical visual markers use two-dimensional (2D) patterns and RGB cameras for recognizing object IDs and poses, achieving long-distance recognition necessitates increasing marker size and camera magnification to ensure the required resolution. Furthermore, the growing adoption of event cameras in robotics captures rapid changes in pixel brightness but faces limitations in recognizing stationary 2D markers. Although compact blinker markers using blinking light-emitting diodes (LEDs) achieve long-distance recognition, they are constrained by the number of IDs or recognition speed when used with standard RGB cameras. In addition, recognizing object pose using only a single blinking LED presents challenges. To address these challenges, we introduce ‘Bicode,’ an indoor visual marker designed for event cameras. Bicode seamlessly integrates 2D and blinker markers within a single marker unit. We have developed prototypes of 2.5, 5, and 10 cm square acrylic 2D markers, each equipped with a single LED blinking at 1 kHz, enabling recognition with an event camera. Our experiments revealed the effects of marker size, LED light quantity, recognition distance and angle, external lighting conditions, and camera or marker movement on accuracy. Notably, using the 5 cm marker, we confirmed its capability to recognize IDs at distances exceeding 20 m, and pose recognition at 2.5 m was confirmed.
|
|
13:30-15:00, Paper TuBT9-CC.7 | Add to My Program |
RAPIDFlow: Recurrent Adaptable Pyramids with Iterative Decoding for Efficient Optical Flow Estimation |
|
Morimitsu, Henrique | University of Science and Technology Beijing |
Xiaobin, Zhu | University of Science and Technology Beijing |
Marcondes Cesar Junior, Roberto | University of São Paulo USP |
Ji, Xiangyang | Tsinghua University |
Yin, Xu-Cheng | University of Science and Technology Beijing |
Keywords: Vision-Based Navigation, Visual Tracking, Computer Vision for Automation
Abstract: Extracting motion information from videos with optical flow estimation is vital in multiple practical robot applications. Current optical flow approaches show remarkable accuracy, but top-performing methods have high computational costs and are unsuitable for embedded devices. Although some previous works have focused on developing low-cost optical flow strategies, their estimation quality has a noticeable gap with more robust methods. In this paper, we develop a novel method to efficiently estimate high-quality optical flow in embedded devices. Our proposed RAPIDFlow model combines efficient NeXt1D convolution blocks with a fully recurrent structure based on feature pyramids to decrease computational costs without significantly impacting estimation accuracy. The adaptable recurrent encoder produces multi-scale features with a single shared block, which allows us to adjust the pyramid length at inference time and make it more robust to changes in input size. Also, it enables our model to offer multiple tradeoffs between accuracy and speed to suit different applications. Experiments using a Jetson Orin NX embedded system on the MPI-Sintel and KITTI public benchmarks show that RAPIDFlow outperforms previous approaches by significant margins at faster speeds.
|
|
13:30-15:00, Paper TuBT9-CC.8 | Add to My Program |
LOC-ZSON: Language-Driven Object-Centric Zero-Shot Object Retrieval and Navigation |
|
Guan, Tianrui | University of Maryland |
Yang, Yurou | Amazon |
Cheng, Harry | Amazon |
Lin, Muyuan | Amazon.com LLC |
Kim, Richard | Amazon, Lab126 |
Madhivanan, Rajasimman | Amazon.com |
Sen, Arnab | Amazon |
Manocha, Dinesh | University of Maryland |
Keywords: Vision-Based Navigation, AI-Enabled Robotics, Computer Vision for Automation
Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stability during training and zero-shot inference. We implement our method on Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. We show that our proposed method can achieve an improvement of 1.38-13.38% in terms of text-to-image recall on different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and real world, showing 5% and 16.67% improvement in terms of navigation success rate, respectively.
|
|
TuBT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Robot Materials and Design II |
|
|
Chair: Gu, Guoying | Shanghai Jiao Tong University |
Co-Chair: Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
|
13:30-15:00, Paper TuBT10-CC.1 | Add to My Program |
Modeling and Design of Lattice-Reinforced Pneumatic Soft Robots |
|
Wang, Dong | Shanghai Jiao Tong University |
Jiang, Chengru | Shanghai Jiaotong University |
Gu, Guoying | Shanghai Jiao Tong University |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Lattice metamaterials exhibit diverse functions and complex spatial deformations by rational structural design. Here, lattice metamaterials are exploited to design pneumatic soft robots with programmable bending, twisting and elongation deformations. The system comprises an elastomeric tube reinforced by lattice metamaterials. We develop an analytical framework to model the twisting, bending and elongation finite deformation taking into account the geometric orthotropy and nonlinear elasticity. We experimentally validate our modeling approach and investigate the effects of geometric patterns and input loading on the soft actuators’ deformation. Theoretically guided design of lateral-climbing soft robots and exploration soft manipulators is demonstrated. The soft actuator could exhibit a combined twisting-bending-elongation deformation by lattice superimposition. The proposed structural design method paves the way for designing soft robots with complex and dexterous deformations.
|
|
13:30-15:00, Paper TuBT10-CC.2 | Add to My Program |
Design and Analysis of Soft Hybrid-Driven Manipulator with Variable Stiffness and Multiple Motion Patterns |
|
Fu, Xin | Shenyang Institute of Automation Chinese Academy of Sciences |
Zhang, Daohui | Shenyang Institute of Automation, Chinese Academy of Sciences |
Mo, Liyan | Shenyang Institute of Automation, Chinese Academy of Sciences |
Li, Kai | Chinese Academy of Sciences (CAS), University of Chinese Academy |
Zhao, Xingang | Shenyang Institute of Automation, Chinese Academy of Sciences |
Keywords: Soft Robot Materials and Design, Modeling, Control, and Learning for Soft Robots, Software-Hardware Integration for Robot Systems
Abstract: Soft manipulators offer the advantages of safety and adaptability. However, due to insufficient stiffness and single motion mode limitations, existing soft manipulators usually exhibit low load capacity and small working space. To address this problem, we propose a novel soft hybrid-driven manipulator with continuous stiffness control capability and multiple motion patterns (bending, rotation, and elongation). Furthermore, we develop kinematic and stiffness models based on the constant curvature assumption. The soft robot consists of a soft bellow actuator and inextensible rigid skeletons, which exhibits a high extension ratio and low input pressure. With the antagonistic actuation of tendon-pulling and air-pushing, the robot can achieve independent control over stiffness and position in three-dimensional space. The performance associated with the designed soft hybrid-driven manipulator is experimentally verified. The robot can achieve an elongation of 198% and a maximum bending angle of 240°. The robot can also increase stiffness by increasing internal air pressure to resist deformation caused by external loads. Additionally, tracking experiments with various trajectories in 3D space verify the accuracy of the kinematic model, which indicates that the soft manipulator possesses a large workspace and stable motion capabilities.
|
|
13:30-15:00, Paper TuBT10-CC.3 | Add to My Program |
Directly 3D Printed, Pneumatically Actuated Multi-Material Robotic Hand |
|
Matusik, Hanna | MIT |
Liu, Chao | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Soft Robot Materials and Design, Multifingered Hands, Compliant Joints and Mechanisms
Abstract: Soft robotic manipulators with many degrees of freedom can carry out complex tasks safely around humans. However, manufacturing of soft robotic hands with several degrees of freedom requires a complex multi-step manual process, which significantly increases their cost. We present a design of a multi-material 15 DoF robotic hand with five fingers including an opposable thumb. Our design has 15 pneumatic actuators based on a series of hollow chambers that are driven by an external pressure system. The thumb utilizes rigid joints and the palm features internal rigid structure and soft skin. The design can be directly 3D printed using a multi-material additive manufacturing process without any assembly process and therefore our hand can be manufactured for less than 300 dollars. We test the hand in conjunction with a low-cost vision-based teleoperation system on different tasks.
|
|
13:30-15:00, Paper TuBT10-CC.4 | Add to My Program |
Soft Hand Extension Glove with Thumb Abduction and Extension Assistance |
|
Xie, Disheng | The Chinese University of Hong Kong |
Su, Yujie | The Chinese Unverisity of Hong Kong |
Shi, Xiangqian | The Chinese University of Hong Kong |
Li, Zheng | The Chinese University of Hong Kong |
Tong, Kai Yu | The Chinese University of Hong Kong |
Keywords: Soft Robot Materials and Design, Rehabilitation Robotics, Soft Robot Applications
Abstract: Hand extension is crucial for stroke survivors with spasticity, where their fingers become rigid and their thumb remains curled within the palm. Due to the underactuated nature of the hand, the dominance of flexor muscles over extensors, and the limited surface area available, developing an extension glove with thumb assistance poses a challenge for researchers. This paper introduces a fully wearable soft hand extension glove based on the X-pouch and strap system, addressing the above challenges. The glove enables adequate finger extension, thumb abduction, and extension for high MAS score patients. Modelling and testing revealed extension torques of up to 2.7 Nm at the MCP joint and 0.67 Nm at the PIP and DIP joints. Performance evaluation, including comparison with existing methods, demonstrated the glove's superior extension capabilities using a model hand with realistic stiffness. Furthermore, the glove's effectiveness was confirmed through testing on a stroke patient with MAS = 2, validating its on-body functionality.
|
|
13:30-15:00, Paper TuBT10-CC.5 | Add to My Program |
Design and Characterization of a Soft Flat Tube Twisting Actuator |
|
Liu, Hao | The University of Hong Kong |
Wu, Changchun | The University of Hong Kong |
Lin, Senyuan | The University of Hong Kong |
Chen, Yonghua | The University of Hong Kong |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Grippers and Other End-Effectors
Abstract: Soft actuators have shown advantages of adaptiveness, large deformation, and safe human-robot interaction, making them suitable for various applications. Herein, a novel soft flat tube twisting actuator (SFTTA) is proposed. The SFTTA is composed of a folded flat tube sandwiched between two silicone rubber laminates. When inflated by compressed air, the folded corners of the flat tube tend to unfold, resulting in the twist of the actuator to a helical structure. The SFTTA has great scalability. It can be fabricated through simple processes with low-cost materials. A sample SFTTA with the size of a human finger can twist 540° at an air pressure of 300 kPa. In general, SFTTA-based actuators can twist 9.6 degrees per millimeter of length, which is significantly larger than previously reported soft twisting actuators. Additionally, the composite-like SFTTA allows mechanical property programming through the alteration of folding patterns of the flat tube and the material structure of the elastomer laminates. Finally, an extensible soft gripper based on flat tube actuators and a robotic wrist module are developed, and their rotation is realized by the proposed SFTTA actuator.
|
|
13:30-15:00, Paper TuBT10-CC.6 | Add to My Program |
Self-Retractable Soft Growing Robots for Reliable and Fast Retraction While Preserving Their Inherent Advantages |
|
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Seo, Dongoh | Korea Advanced Institute of Science and Technology |
Park, Shinwoo | KAIST |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Soft growing robots have garnered significant research interest owing to their unique locomotion. However, real-world applications of these robots are limited by challenges in achieving reversible and repeatable operations, particularly when faced with buckling during retraction. Although a variety of retraction mechanisms have been developed, many necessitate the installation of extra rigid hardware at the distal part, compromising the inherent benefits of soft growing robots. Existing soft retraction mechanisms that maintain these advantages tend to be relatively slow and rely on heavy driving fluids. This study introduces a soft retraction mechanism that depends exclusively on the existing pneumatic force, eliminating the need for additional rigid hardware, power sources, or complex control procedures. This mechanism enables rapid and reliable retraction of soft growing robots without sacrificing their inherent advantages or interfering with their inner channels during retraction. The proposed mechanism’s straightforward structure facilitates easy integration with a wide range of tip mounts, steering mechanisms, and other application-specific soft growing robots. This research offers an analysis and experimental examination of the operating principles and behaviors of the proposed mechanism. It also presents the design guidelines and fabrication details for the mechanism, as well as a demonstration of its swift and buckling-free retraction.
|
|
13:30-15:00, Paper TuBT10-CC.7 | Add to My Program |
High-Curvature, High-Force, Vine Robot for Inspection |
|
Mendoza Flores, Mijaíl Jaén | University of California Santa Barbara |
Naclerio, Nicholas | University of California, Santa Barbara |
Hawkes, Elliot Wright | University of California, Santa Barbara |
Keywords: Soft Robot Materials and Design, Soft Robot Applications
Abstract: Robot performance has advanced considerably both in and out of the factory; however, in tightly constrained, unknown environments such as inside a jet engine or the human heart, current robots are less adept. In such cases where a borescope or endoscope can’t reach, disassembly or surgery is costly. One promising class of inspection devices inspired by plant growth is “vine robots,” which can navigate cluttered environments by extending from their tip. Yet, these vine robots are currently limited in their ability to simultaneously steer into tight curvatures and apply substantial forces to the environment. Here, we propose a plant-inspired method of steering by asymmetrically lengthening one side of the vine robot to enable high curvature and large force application. Our key development is the introduction of an extremely anisotropic, composite, wrinkled film with elastic moduli 400x different in orthogonal directions. The film is used as the vine robot body, oriented such that it can stretch over 120% axially, but only 3% circumferentially. With the addition of controlled layer jamming, this film enables a steering method inspired by plants in which the circumference of the robot is inextensible, but the sides can stretch to allow turns. This steering method and body pressure do not work against each other, allowing the robot to exhibit higher forces and tighter curvatures than previous vine robot architectures. This work advances the abilities of vine robots, and robots more generally, to not only access tightly constrained environments, but perform useful work once accessed.
|
|
13:30-15:00, Paper TuBT10-CC.8 | Add to My Program |
Robotic Modules for a Continuum Manipulator with Variable Stiffness Joints |
|
Paterno, Linda | Scuola Superiore Sant'Anna |
Sozer, Canberk | The University of Sheffield |
Sahu, Sujit | Indian Institute of Technology Patna, Bihta |
Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Actuation and Joint Mechanisms
Abstract: This study introduces a novel robotic module that integrates three spring-reinforced soft actuators for positioning the module in 3D space. This is achieved by utilizing a ball joint as the rotation center and leveraging the spring elements not only as reinforcement structures but also as inductive sensors. Additionally, soft pads are strategically placed around the ball joint to adjust the module stiffness irrespective of its position. Both actuation and stiffening mechanisms are independently controlled by pressure. Design, experimental characterization, and closed-loop control of the module are reported. In addition, a multifunctional manipulator built by integrating three modules in series is demonstrated. A specific architecture has been pursued to reduce the overall number of fluidic tubes required when adding a new module. It resulted in a manipulator with continuum soft actuators, but independent variable stiffness joints, which are the key feature for guaranteeing different bending angles of each segment. Results show that a single module can bend up to 30° omnidirectionally, its stiffness can increase up to 95% in a controllable way, and the output voltage change of the springs can be employed for position sensing. This design offers a highly compact, lightweight, and low-cost solution exploitable in a wide range of applications, from medical to rescue missions, where actions behind obstacles in highly confined areas are needed.
|
|
13:30-15:00, Paper TuBT10-CC.9 | Add to My Program |
A Modular, Tendon Driven Variable Stiffness Manipulator with Internal Routing for Improved Stability and Increased Payload Capacity |
|
Walker, Kyle Liam | The National Robotarium |
Partridge, Alix James | The National Robotarium |
Chen, Hsing-Yu | University of Bristol |
Ramachandran, Rahul Ramakrishnan | The National Robotarium, Heriot-Watt University |
Stokes, Adam Andrew | University of Edinburgh |
Tadakuma, Kenjiro | Tohoku University |
Cruz da Silva, Lucas | SENAI CIMATEC |
Giorgio-Serchi, Francesco | University of Edinburgh |
Keywords: Soft Robot Materials and Design, Tendon/Wire Mechanism, Compliance and Impedance Control
Abstract: Stability and reliable operation under a spectrum of environmental conditions remain an open challenge for soft and continuum style manipulators. The inability to carry sufficient load and effectively reject external disturbances are two drawbacks which limit the scale of continuum designs, preventing widespread adoption of this technology. To tackle these problems, this work details the design and experimental testing of a modular, tendon driven bead-style continuum manipulator with tunable stiffness. By embedding the ability to independently control the stiffness of distinct sections of the structure, the manipulator can regulate its posture under greater loads of up to 1 kg at the end-effector, with reference to the flexible state. Likewise, an internal routing scheme vastly improves the stability of the proximal segment when operating the distal segment, reducing deviations by at least 70.11%. Operation is validated when gravity is both tangential and perpendicular to the manipulator backbone, a feature uncommon in previous designs. The findings presented in this work are key to the development of larger scale continuum designs, demonstrating that flexibility and tip stability under loading can co-exist without compromise.
|
|
TuBT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception II |
|
|
Chair: Su, Hao | North Carolina State University |
Co-Chair: Ji, Jingjing | Huazhong University of Science and Technology |
|
13:30-15:00, Paper TuBT11-CC.1 | Add to My Program |
EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction |
|
Fang, Irving | New York University |
Chen, Yuzhong | New York University |
Wang, Yifan | New York University |
Zhang, Jianghan | New York University |
Zhang, Qiushi | New York University |
Xu, Jiali | New York University |
He, Xibo | Xi'an Jiaotong University |
Gao, Weibo | North Carolina State University |
Su, Hao | North Carolina State University |
Li, Yiming | New York University |
Feng, Chen | New York University |
Keywords: Data Sets for Robotic Vision, Intention Recognition, Deep Learning for Visual Perception
Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstream robotics tasks, especially given the increasing prevalence of headset devices. This study expands EgoPAT3D, the sole dataset dedicated to egocentric 3D action target prediction. We augment both its size and diversity, enhancing its potential for generalization. Moreover, we substantially enhance the baseline algorithm by introducing a large pre-trained model and human prior knowledge. Remarkably, our novel algorithm can now achieve superior prediction outcomes using solely RGB images, eliminating the previous need for 3D point clouds and IMU input. Furthermore, we deploy our enhanced baseline algorithm on a real-world robotic platform to illustrate its practical utility in straightforward HRI tasks. The demonstrations showcase the real-world applicability of our advancements and may inspire more HRI use cases involving egocentric vision. All code and data are open-sourced and can be found on the project website.
|
|
13:30-15:00, Paper TuBT11-CC.2 | Add to My Program |
Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation |
|
Ni, Jiayi | Peking University |
Yang, Senqiao | Harbin Institute of Technology, Shenzhen |
Xu, Ran | Beijing University of Posts and Telecommunications |
Liu, Jiaming | Peking University |
Li, Xiaoqi | Peking University |
Jiao, Wenyu | University of Washington |
Chen, Zehui | University of Science and Technology of China |
Liu, Yi | Baidu Inc |
Zhang, Shanghang | Peking University |
Keywords: Deep Learning for Visual Perception, Continual Learning, Transfer Learning
Abstract: Since autonomous driving systems usually face dynamic and ever-changing environments, continual test-time adaptation (CTTA) has been proposed as a strategy for transferring deployed models to continually changing target domains. However, the pursuit of long-term adaptation often introduces catastrophic forgetting and error accumulation problems, which impede the practical implementation of CTTA in the real world. Recent CTTA methods mainly focus on utilizing a majority of parameters to fit target domain knowledge through self-training. Unfortunately, these approaches often amplify the challenge of error accumulation due to noisy pseudo-labels, and pose practical limitations stemming from the heavy computational costs associated with entire model updates. In this paper, we propose a distribution-aware tuning (DAT) method to make semantic segmentation CTTA efficient and practical in real-world applications. DAT adaptively selects and updates two small groups of trainable parameters based on data distribution during the continual adaptation process: domain-specific parameters (DSP) and task-relevant parameters (TRP). Specifically, DSP exhibits sensitivity to outputs with substantial distribution shifts, effectively mitigating the problem of error accumulation. In contrast, TRP are allocated to positions that are responsive to outputs with minor distribution shifts, which are fine-tuned to avoid the catastrophic forgetting problem. In addition, since CTTA is a temporal task, we introduce the Parameter Accumulation Update (PAU) strategy to collect the updated DSP and TRP in target domain sequences. We conducted extensive experiments on two widely-used semantic segmentation CTTA benchmarks, achieving competitive performance and efficiency compared to previous state-of-the-art methods.
|
|
13:30-15:00, Paper TuBT11-CC.3 | Add to My Program |
STNet: Spatio-Temporal Fusion-Based Self-Attention for Slip Detection in Visuo-Tactile Sensors |
|
Lu, Jin | Huazhong University of Science and Technology |
Niu, Bangyan | Huazhong University of Science and Technology |
Ma, Huan | Huazhong University of Science and Technology |
Jiafeng, Zhu | Huazhong University of Science and Technology |
Ji, Jingjing | Huazhong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Slip detection plays a pivotal role in the dexterity of robotics, improving the reliability and precision of manipulations while also contributing to safety, efficiency, and adaptability. Deep learning-based slip detection algorithms commonly find it difficult to concentrate on key features when faced with dense 3D shape data obtained by visuo-tactile sensors. Data from non-contact locations can interfere with slip judgements, and ignoring inter-frame linkage can also lead to slip detection failure. In this paper, a new spatio-temporal sequences fusion-based self-attention, STNet, is proposed to perform slip detection by allocating more attention to the object-sensor contact area when processing complex 3D shape data. A binocular visuo-tactile system (BVTS) is designed and fabricated for dataset construction. The 3D shape dataset contains four motion patterns: stationary, pressing, rolling, and slipping. Self-attention architectures with and without the spatio-temporal sequences fusion mechanism (denoted as STNet and TemNet, respectively) are trained on the same dataset. The experiments show the validity of STNet, which can reach 98.91% slip detection accuracy. Meanwhile, the ablation studies confirm the effectiveness of the spatio-temporal sequences fusion mechanism.
|
|
13:30-15:00, Paper TuBT11-CC.4 | Add to My Program |
Commonsense Spatial Knowledge-Aware 3-D Human Motion and Object Interaction Prediction |
|
Lee, Sang Uk | Motional |
Keywords: Deep Learning for Visual Perception, Human-Robot Collaboration, Deep Learning Methods
Abstract: We propose a novel 3-D human motion and object interaction prediction model that is aware of commonsense knowledge about human-object interaction. We jointly predict human joint motion and human-object interactions. The two prediction results are combined to make commonsense knowledge, such as "if the human right hand is predicted to be in contact with an object after 1 second, the distance between the right hand and an object should also be predicted to be small," explicit to the model. Our model uses the raw point cloud representation of the surrounding objects in the environment as input. Using raw point cloud representation allows us to model commonsense knowledge easily and improve accuracy. In particular, it does not require a separate perception system (e.g., object classification, object pose estimation, and so on), as in previous studies, and thus is robust to perception errors. Our model applies a cross-attention mechanism to fuse the environmental point cloud and past human joint poses. The surrounding environment context and past human joint poses are two heterogeneous inputs, and cross-attention can be a powerful approach to fuse them. Our model is validated on the KIT Whole-Body Human Motion (WBHM) dataset.
|
|
13:30-15:00, Paper TuBT11-CC.5 | Add to My Program |
High-Degrees-Of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning |
|
Schulze, Lennart | Columbia University |
Lipson, Hod | Columbia University |
Keywords: Deep Learning for Visual Perception, Machine Learning for Robot Control, AI-Enabled Robotics
Abstract: A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in the absence of a classical geometric kinematic model. In particular, when the latter is hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on motion planning tasks as an exemplary downstream application.
|
|
13:30-15:00, Paper TuBT11-CC.6 | Add to My Program |
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds |
|
Nguyen, Tien Toan | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Huang, Baoru | Imperial College London |
Van Vo, Tuan | FPT Software |
Truong, Thuy Tuong Vy | FPT Software |
Le, Ngan | University of Arkansas |
Vo, Thieu | Ton Duc Thang University |
Le, Hoai Bac | VNUHCM-University of Science |
Nguyen, Anh | University of Liverpool |
Keywords: Deep Learning for Visual Perception, Recognition
Abstract: Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3dapnet.github.io.
|
|
13:30-15:00, Paper TuBT11-CC.7 | Add to My Program |
Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter |
|
Lim, Seunghyeon | Seoul National University |
Yoo, Youngjae | Seoul National University |
Jun Ki, Lee | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Grasping
Abstract: In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in the Deep Plane Clustering (DPC) module, which are then merged into the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.
|
|
13:30-15:00, Paper TuBT11-CC.8 | Add to My Program |
Utilizing Inpainting for Training Keypoint Detection Algorithms towards Markerless Visual Servoing |
|
Chatterjee, Sreejani | Worcester Polytechnic Institute |
Doan, Duc | Worcester Polytechnic Institute |
Calli, Berk | Worcester Polytechnic Institute |
Keywords: Deep Learning for Visual Perception, Visual Servoing, Computer Vision for Automation
Abstract: This paper presents a novel strategy to train keypoint detection models for robotics applications. Our goal is to develop methods that can robustly detect and track natural features on robotic manipulators. Such features can be used for vision-based control and pose estimation purposes, when placing artificial markers (e.g. ArUco) on the robot’s body is not possible or practical in runtime. Prior methods require accurate camera calibration and robot kinematic models in order to label training images for the keypoint locations. In this paper, we remove these dependencies by utilizing inpainting methods: In the training phase, we attach ArUco markers along the robot’s body and then label the keypoint locations as the center of those markers. We, then, use an inpainting method to reconstruct the parts of the robot occluded by the ArUco markers. As such, the markers are artificially removed from the training images, and labeled data is obtained to train markerless keypoint detection algorithms without the need for camera calibration or robot models. Using this approach, we trained a model for realtime keypoint detection and used the inferred keypoints as control features for an adaptive visual servoing scheme. We obtained successful control results with this fully model-free control strategy, utilizing natural robot features in the runtime and not requiring camera calibration or robot models in any stage of this process.
|
|
TuBT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning in Grasping and Manipulation II |
|
|
Chair: Acar, Cihan | Institute for Infocomm Research (I2R), A*STAR |
Co-Chair: Kuntz, Alan | University of Utah |
|
13:30-15:00, Paper TuBT12-CC.1 | Add to My Program |
Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks |
|
Acar, Cihan | Institute for Infocomm Research (I2R), A*STAR |
Binici, Kuluhan | National University of Singapore |
Tekırdag, Alp | Nanyang Technological University |
Wu, Yan | A*STAR Institute for Infocomm Research |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Experience, Transfer Learning
Abstract: The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, using multiple cameras in real-world scenarios can be challenging. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a "teacher" policy pre-trained with multiple camera viewpoints guides a "student" policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated both in simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
|
|
13:30-15:00, Paper TuBT12-CC.2 | Add to My Program |
Symmetric Models for Visual Force Policy Learning |
|
Kohler, Colin | Northeastern University |
Srikanth, Anuj Shrivatsav | Northeastern University |
Arora, Eshan | Northeastern University |
Platt, Robert | Northeastern University |
Keywords: Deep Learning in Grasping and Manipulation, Force Control, Reinforcement Learning
Abstract: While it is generally acknowledged that force feedback is beneficial to robotic control, applications of policy learning to robotic manipulation typically only leverage visual feedback. Recently, symmetric neural models have been used to significantly improve the sample efficiency and performance of policy learning across a variety of robotic manipulation domains. This paper explores an application of symmetric policy learning to visual-force problems. We present Symmetric Visual Force Learning (SVFL), a novel method for robotic control which leverages visual and force feedback. We demonstrate that SVFL can significantly outperform state of the art baselines for visual force learning and report several interesting empirical findings related to the utility of learning force feedback control policies in both general manipulation tasks and scenarios with low visual acuity.
|
|
13:30-15:00, Paper TuBT12-CC.3 | Add to My Program |
Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models |
|
Huang, Yixuan | University of Utah |
Yuan, Jialin | Oregon State University |
Kim, Chanho | Oregon State University |
Pradhan, Pupul | University of Utah |
Chen, Bryan | Oregon State University |
Fuxin, Li | Oregon State University |
Hermans, Tucker | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks, including reasoning with occluded objects, novel object appearance, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well across varying numbers of objects and distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.
|
|
13:30-15:00, Paper TuBT12-CC.4 | Add to My Program |
Learning to Dexterously Pick or Separate Tangled-Prone Objects for Industrial Bin Picking |
|
Zhang, Xinyi | Osaka University |
Domae, Yukiyasu | The National Institute of Advanced Industrial Science and Technology |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Deep Learning in Grasping and Manipulation, Grasping
Abstract: Industrial bin picking for tangled-prone objects requires the robot to either pick up untangled objects or perform separation manipulation when the bin contains no isolated objects. The robot must be able to flexibly perform appropriate actions based on the current observation. It is challenging due to high occlusion in the clutter, elusive entanglement phenomena, and the need for skilled manipulation planning. In this paper, we propose an autonomous, effective and general approach for picking up tangled-prone objects for industrial bin picking. First, we learn PickNet - a network that maps the visual observation to pixel-wise possibilities of picking isolated objects or separating tangled objects and infers the corresponding grasp. Then, we propose two effective separation strategies: Dropping the entangled objects into a buffer bin to reduce the degree of entanglement; Pulling to separate the entangled objects in the buffer bin planned by PullNet - a network that predicts position and direction for pulling from visual input. To efficiently collect data for training PickNet and PullNet, we embrace the self-supervised learning paradigm using an algorithmic supervisor in a physics simulator. Real-world experiments show that our policy can dexterously pick up tangled-prone objects with success rates of 90%. We further demonstrate the generalization of our policy by picking a set of unseen objects. Supplementary material, code, and videos can be found at https://xinyiz093
|
|
13:30-15:00, Paper TuBT12-CC.5 | Add to My Program |
Learning Fabric Manipulation in the Real World with Human Videos |
|
Lee, Robert | Australian Centre for Robotic Vision |
Abou-Chakra, Jad | Queensland University of Technology |
Zhang, Fangyi | Queensland University of Technology |
Corke, Peter | Queensland University of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Perception for Grasping and Manipulation
Abstract: Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics. Learning approaches stand out as promising for this domain as they allow us to learn behaviours directly from data. Most prior methods, however, rely heavily on simulation, which is still limited by the large sim-to-real gap of deformable objects, or depend on large datasets. A promising alternative is to learn fabric manipulation directly from watching humans perform the task. In this work, we explore how demonstrations for fabric manipulation tasks can be collected directly by humans, providing an extremely natural and fast data collection pipeline. Then, using only a handful of such demonstrations, we show how a pick-and-place policy can be learned and deployed on a real robot, without any robot data collection at all. We demonstrate our approach on a fabric smoothing and folding task, showing that our policy can reliably reach folded states from crumpled initial configurations. Videos, code, dataset and trained models are available on the project website: https://sites.google.com/view/foldingbyhand
|
|
13:30-15:00, Paper TuBT12-CC.6 | Add to My Program |
HAGrasp: Hybrid Action Grasp Control in Cluttered Scenes Using Deep Reinforcement Learning |
|
Song, Kai-Tai | National Yang Ming Chiao Tung University |
Chen, Hsiang-Hsi | National Yang Ming Chiao Tung University |
Keywords: Deep Learning in Grasping and Manipulation, Grasping, Reinforcement Learning
Abstract: Robotic autonomous grasping requires the system to perform multiple functions such as gripper and robot control, making it a task with a hybrid output nature. Existing methods based on closed-loop deep reinforcement learning rely on external models for termination evaluation. To achieve more effective grasping of novel objects, we propose a new autonomous grasp control scheme termed HAGrasp that considers the complete point cloud of the workspace. It integrates grasp pose estimation, end-effector pose evaluation, and motion planning of the robotic arm into a single model, enhancing the success rate while reducing computational load. We present a closed-loop grasp control system based on deep reinforcement learning. This control system can perform grasp tasks while dynamically adjusting to avoid end-effector collisions. The hybrid-action reinforcement learning module is trained with a unified latent action space to further improve generalization, achieving real-time autonomous grasp control. Real robot experiments show that our method has a 74.2% success rate for grasping 7 unseen objects. Comparative experiments show that the proposed HAGrasp outperforms the open-loop baseline Contact-Graspnet in both success rate and inference time. It is demonstrated that with integrated multi-view input and sim-to-real training design, our method improves real-world applications of autonomous grasping.
|
|
13:30-15:00, Paper TuBT12-CC.7 | Add to My Program |
Dual-Critic Deep Reinforcement Learning for Push-Grasping Synergy in Cluttered Environment |
|
Zhong, Jiakang | Swinburne University of Technology |
Wong, Yew Wee | Swinburne University of Technology |
Jin, Jiong | Swinburne University of Technology |
Song, Yong | Shandong University |
Yuan, Xianfeng | Shandong University |
Chen, Xiaoqi | South China University of Technology |
Keywords: Deep Learning in Grasping and Manipulation, Grasping
Abstract: Robotic push-grasping in densely cluttered environments presents significant challenges due to unbalanced synergy and redundancy between both actions, leading to decreased grasp efficiency. In this paper, a novel dual-critic deep reinforcement learning framework is introduced to optimize the push-grasping synergy for robotic manipulation in such environments, aiming to significantly reduce pre-grasping redundancy. This framework incorporates two distinct Deep Q-learning critics: Critic I selects the best course of actions based on the current state derived from visual interpretation, whereas Critic II evaluates the success rate of the current state-action pairing. To further refine the push-grasping synergy, an active double-step learning mechanism is introduced to optimize the training reward function for the pushing action, thereby enhancing its effectiveness through increased intentionality. Simulations show that the proposed framework outperforms contemporary counterparts, notably in grasping success rate and action efficiency. Finally, the framework's generalization and adaptability are demonstrated by conducting real-world experiments using novel objects without the need for retraining.
|
|
13:30-15:00, Paper TuBT12-CC.8 | Add to My Program |
DefGoalNet: Contextual Goal Learning from Demonstrations for Deformable Object Manipulation |
|
Thach, Bao | University of Utah |
Watts, Tanner | University of Utah |
Ho, Shing-Hei | University of Utah |
Hermans, Tucker | University of Utah |
Kuntz, Alan | University of Utah |
Keywords: Deep Learning in Grasping and Manipulation, Learning from Demonstration, Surgical Robotics: Laparoscopy
Abstract: Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method’s effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.
|
|
13:30-15:00, Paper TuBT12-CC.9 |
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation |
|
Xie, Annie | Stanford University |
Lee, Lisa | Google |
Xiao, Ted | Google |
Finn, Chelsea | Stanford University |
Keywords: Deep Learning in Grasping and Manipulation, Imitation Learning
Abstract: What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We design a simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real-robot setup. Videos and code are available at: https://sites.google.com/stanford.edu/gengap-icra
|
|
TuBT13-AX Oral Session, AX-201 |
Physical Human-Robot Interaction II |
|
|
Chair: Zefran, Milos | University of Illinois at Chicago |
|
13:30-15:00, Paper TuBT13-AX.1 |
Transformer-Based Prediction of Human Motions and Contact Forces for Physical Human-Robot Interaction |
|
Fusco, Alessia | Politecnico Di Torino |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Rizzo, Alessandro | Politecnico Di Torino |
Cognetti, Marco | LAAS-CNRS and Université Toulouse III - Paul Sabatier |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Safety in HRI
Abstract: In this paper, we propose a transformer-based architecture for predicting contact forces during a physical human-robot interaction. Our Neural Network is composed of two main parts: a Multi-Layer Perceptron called Transducer and a Transformer. The former estimates, based on the kinematic data from a motion capture suit, the current contact forces. The latter predicts -- taking as input the same kinematic data and the output of the Transducer -- the human motions and the contact forces over a time window in the future. We validated our approach by testing the network on directions of motions that were not provided in the training set. We also compared our approach to a purely Transformer-based network, showing a better prediction accuracy of the contact forces.
|
|
13:30-15:00, Paper TuBT13-AX.2 |
SynH2R: Synthesizing Hand-Object Motions for Learning Human-To-Robot Handovers |
|
Christen, Sammy | ETH Zurich |
Feng, Lan | ETH ZURICH |
Yang, Wei | NVIDIA |
Chao, Yu-Wei | NVIDIA |
Hilliges, Otmar | ETH Zurich |
Song, Jie | ETHZ |
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Data Sets for Robot Learning
Abstract: Vision-based human-to-robot handover is an important and challenging task in human-robot interaction. Recent work has attempted to train robot policies by interacting with dynamic virtual humans in simulated environments, where the policies can later be transferred to the real world. However, a major bottleneck is the reliance on human motion capture data, which is expensive to acquire and difficult to scale to arbitrary objects and human grasping motions. In this paper, we introduce a framework that can generate plausible human grasping motions suitable for training the robot. To achieve this, we propose a hand-object synthesis method that is designed to generate handover-friendly motions similar to humans. This allows us to generate synthetic training and testing data with 100x more objects than previous work. In our experiments, we show that our method trained purely with synthetic data is competitive with state-of-the-art methods that rely on real human motion data both in simulation and on a real system. In addition, we can perform evaluations on a larger scale compared to prior work. With our newly introduced test set, we show that our model can better scale to a large variety of unseen objects and human motions compared to the baselines.
|
|
13:30-15:00, Paper TuBT13-AX.3 |
Proactive Robot Control for Collaborative Manipulation Using Human Intent |
|
Rysbek, Zhanibek | University of Illinois at Chicago |
Li, Siyu | University of Illinois at Chicago |
Mehri Shervedani, Afagh | University of Illinois Chicago |
Zefran, Milos | University of Illinois at Chicago |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Human-Robot Collaboration
Abstract: Collaborative manipulation tasks often require negotiation using explicit or implicit communication. An important example is determining where to move when the goal destination is not uniquely specified, and who should lead the motion. This work is motivated by the ability of humans to communicate the desired destination of motion through back-and-forth force exchanges. Inherent to these exchanges is also the ability to dynamically assign a role to each participant, either taking the initiative or deferring to the partner's lead. In this paper, we propose a hierarchical robot control framework that emulates human behavior in communicating a motion destination to a human collaborator and in responding to their actions. At the top level, the controller consists of a set of finite-state machines corresponding to different levels of commitment of the robot to its desired goal configuration. The control architecture is loosely based on the human strategy observed in human-human experiments, and its key component is a real-time intent recognizer that helps the robot respond to human actions. We describe the details of the control framework, the feature engineering, and the training process of the intent recognizer. The proposed controller was implemented on a UR10e robot (Universal Robots) and evaluated through human studies. The experiments show that the robot correctly recognizes and responds to human input, communicates its intent clearly, and resolves conflicts. We report success rates and draw comparisons with human-human experiments to demonstrate the effectiveness of the approach.
|
|
13:30-15:00, Paper TuBT13-AX.4 |
Human Modeling in Physical Human-Robot Interaction: A Brief Survey |
|
Fang, Cheng | University of Southern Denmark |
Peternel, Luka | Delft University of Technology |
Seth, Ajay | Delft University of Technology |
Sartori, Massimo | University of Twente |
Mombaur, Katja | Karlsruhe Institute of Technology |
Yoshida, Eiichi | Tokyo University of Science |
Keywords: Physical Human-Robot Interaction, Modeling and Simulating Humans, Human-Centered Robotics
Abstract: The advancement and development of human modeling have greatly benefited from principles used in robotics; for instance, multibody dynamics laid the foundations for physics engines of human movement simulation, and robotics and control theory were used to contextualize human sensorimotor control. There are many common interests and interconnections between the fields of human modeling and robotics. In recent years, as robots have become safer and smarter, they actively participate in our lives and help us in various scenarios. Roboticists need tools and data from human modeling to build next-generation robots that better assist humans. In this survey, we focus on the connections between physical human-robot interaction and human modeling. On one hand, human neuromusculoskeletal and sensorimotor control models provide novel insights into the human response that robots can utilize to improve human performance. On the other hand, robots are becoming instrumental in quantifying the performance of the (neuro)musculoskeletal system. Thus, the combined use of human modeling and robotic methods in physical human-robot interaction can lead to both improved human understanding and functional assistance.
|
|
13:30-15:00, Paper TuBT13-AX.5 |
Exploring Transformers and Visual Transformers for Force Prediction in Human-Robot Collaborative Transportation Tasks |
|
Dominguez-Vidal, Jose Enrique | Institut De Robòtica I Informàtica Industrial, CSIC-UPC |
Sanfeliu, Alberto | Universitat Politècnica De Cataluyna |
Keywords: Physical Human-Robot Interaction, Intention Recognition, Deep Learning Methods
Abstract: In this paper, we analyze the possibilities offered by state-of-the-art Deep Learning architectures such as Transformers and Visual Transformers in generating a prediction of the human's force in a Human-Robot collaborative object transportation task at a middle distance. We outperform our previous predictor, achieving a success rate of 93.8% on the test set and 90.9% in real experiments with 21 volunteers, in both cases predicting the force that the human will exert during the next 1 s. A modification of the architecture allows us to obtain a second output from the model with a velocity prediction, which improves the capabilities of our predictor when it is used to estimate the trajectory that the human-robot pair will follow. An ablation test is also performed to verify each input's relative contribution to performance.
|
|
13:30-15:00, Paper TuBT13-AX.6 |
Exploring the Effect of Base Compliance on Physical Human-Robot Collaboration |
|
Wang, Ziqi | University of Technology Sydney |
Carmichael, Marc | Centre for Autonomous Systems |
Keywords: Physical Human-Robot Interaction, Human-Robot Collaboration, Human Factors and Human-in-the-Loop
Abstract: Mobile physical human-robot collaboration (pHRC) using collaborative robots (cobots) and mobile robots has attracted much research attention. Many researchers have focused on improving control performance to comply with human intentions. However, a problem that generally exists in mobile pHRC but often gets neglected is the impact of non-rigid components, e.g., deformable tyres, suspension systems, and uneven terrain, on the human interaction experience and task performance. To fill this research gap, we investigated the above-mentioned problem by altering a cobot's base rigidity level (also referred to as the base compliance level, or BCL) during pHRC experiments. We explored how task performance is affected by base compliance as well as by the human operator's experience and the cobot's control parameters. Measurements include the human operator's physical effort, task velocity, and task error. From the experimental results, we find that base compliance has a significant impact on task accuracy, as it can easily excite the system if an inadequate control strategy is deployed. Furthermore, through ANOVA, we find that the influence of base compliance can be minimized and system excitation avoided through sufficient human operator training and appropriate selection of the cobot's control parameters.
|
|
13:30-15:00, Paper TuBT13-AX.7 |
Experimental and Simulation-Based Estimation of Interface Power During Physical Human-Robot Interaction in Hand Exoskeletons |
|
Yousaf, Saad | The University of Texas at Austin |
Mukherjee, Gaurav | University of Washington |
King, Raymond | Oculus VR |
Deshpande, Ashish | The University of Texas |
Keywords: Physical Human-Robot Interaction, Wearable Robotics, Design and Human Factors
Abstract: Even the best wearable robots face challenges with power losses in the system, especially at the physical attachment interface. While some sources for power loss are inherent to the system, such as human soft tissue or musculoskeletal joint damping, other sources such as soft padding materials and bias strap forces can be modulated to optimize interface power transmission. Few methods currently exist for estimating power loss at physical human-robot interfaces, especially for upper-body exoskeletons. This letter presents a novel method to estimate interface power from experimental data in a wearable hand device, along with a simulation model for predicting interaction behavior by incorporating viscoelastic properties at the attachment interface. The experimental method is implemented with the Maestro hand exoskeleton, and repeatability of the interface power estimation is confirmed with pilot human testing. Simulation results are compared with experimental estimation of interface power, showing agreement of trends and validating the use of a simulation model to predict physical human-robot interaction behavior. These findings highlight the advantages of multi-body simulations as a tool to perform modular, inexpensive, and predictive investigations in physical human-robot interaction, without affecting the real-world mechatronic system or hindering the subject’s safety. The proposed tools can optimize the design of wearable robots for seamless integration with the human body.
|
|
13:30-15:00, Paper TuBT13-AX.8 |
A Personalizable Controller for the Walking Assistive omNi-Directional Exo-Robot (WANDER) |
|
Fortuna, Andrea | Politecnico Di Milano |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Gandarias, Juan M. | University of Malaga |
Balatti, Pietro | Istituto Italiano Di Tecnologia |
Cho, Younggeol | Istituto Italiano Di Tecnologia (IIT) |
De Momi, Elena | Politecnico Di Milano |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Physically Assistive Devices, Optimization and Optimal Control, Physical Human-Robot Interaction
Abstract: Preserving and encouraging mobility in the elderly and adults with chronic conditions is of paramount importance. However, existing walking aids are either inadequate to provide sufficient support for users' stability or too bulky and poorly maneuverable to be used outside hospital environments. In addition, they all lack adaptability to individual requirements. To address these challenges, this paper introduces WANDER, a novel Walking Assistive omNi-Directional Exo-Robot. It consists of an omnidirectional platform and a robust aluminum structure mounted on top of it, which provides partial body-weight support. A comfortable and minimally restrictive coupling interface embedded with a force/torque sensor allows users' intentions to be detected and translated into command velocities by means of a variable admittance controller. An optimization technique based on users' preferences, i.e., Preference-Based Optimization (PBO), guides the choice of the admittance parameters (i.e., virtual mass and damping) to better fit subject-specific needs and characteristics. Experiments with twelve healthy subjects exhibited a significant decrease in energy consumption and jerk when using WANDER with PBO parameters, as well as improved user performance and comfort. The great interpersonal variability in the optimized parameters highlights the importance of personalized control settings when walking with an assistive device, aiming to enhance users' comfort and mobility while ensuring reliable physical support.
|
|
TuBT14-AX Oral Session, AX-202 |
Prosthetics and Exoskeletons II |
|
|
Chair: Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Co-Chair: Masia, Lorenzo | Heidelberg University |
|
13:30-15:00, Paper TuBT14-AX.1 |
Lightweight and Flexible Prosthetic Wrist with Shape Memory Alloy (SMA)-Based Artificial Muscle and Elliptic Rolling Joint |
|
Hyeon, Kyujin | KAIST |
Chung, Chongyoung | Korea Advanced Institute of Science and Technology (KAIST) |
Ma, Jihyeong | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Prosthetics and Exoskeletons, Soft Robot Applications, Biomimetics
Abstract: This paper proposes a novel prosthetic wrist that emulates the anatomical structure of the human wrist, specifically the wrist bones and muscles responsible for wrist movements. To achieve a range of motion (ROM) and load-bearing capacity comparable to the human wrist joint, we designed an elliptic rolling joint as an artificial wrist joint, mimicking the two-row structures of carpal bones. The joint offers two degrees of freedom (DOFs) and can support high loads while also providing adequate ROM. In addition, we designed the artificial muscles using the properties of human muscles, such as moment arm and displacement, and implemented them as shape memory alloy (SMA) spring-based actuators. The resulting prosthetic wrist, incorporating the artificial joint and artificial muscles, is lightweight at only 50g and can perform functional ranges of motion, including 53° for flexion, 50° for extension, 40° for radial deviation, and 42° for ulnar deviation. The use of SMA spring actuators confers restoring force and flexibility to the prosthetic wrist, allowing it to withstand external disturbances. Furthermore, the proposed wrist can be utilized as a robotic wrist, affording two additional DOFs, the ability to lift loads more than 20 times its weight, and variable joint stiffness.
|
|
13:30-15:00, Paper TuBT14-AX.2 |
Ankle Exoskeleton with a Symmetric 3 DoF Structure for Plantarflexion Assistance |
|
Dezman, Miha | Karlsruhe Institute of Technology |
Marquardt, Charlotte Dorothea | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Mechanism Design
Abstract: Ankle exoskeletons can assist the ankle joint and reduce the metabolic cost of walking. However, many existing ankle exoskeletons constrain the natural 3 degrees of freedom (DoF) of the ankle to limit the exoskeleton's weight and mechanical complexity, thereby compromising comfort and kinematic compatibility with the user. This paper presents a novel ankle exoskeleton frame design that allows for 3 DoF ankle motion using a symmetric parallel frame design principle resulting in a strong frame while weighing 1.8 kg. Furthermore, a cable routing method is proposed to actuate the plantarflexion of the ankle. The kinematic compatibility of the proposed exoskeleton frame is evaluated in straight- and curve-walking scenarios with four users. The study demonstrates that the exoskeleton frame adapts to the natural 3 DoF ankle motion and the range of motion (RoM) during walking. The actuation in plantarflexion is evaluated in a stationary torque experiment demonstrating the ability of the frame to transfer large torque loads of up to 57.4 Nm. This work contributes to the design and development of more flexible and adaptable ankle exoskeletons for walking assistance.
|
|
13:30-15:00, Paper TuBT14-AX.3 |
Design of a Front-Enveloping Powered Exoskeleton Considering Optimal Distribution of Actuating Torques and Center of Mass |
|
Park, Jeongsu | KAIST |
Shi, Kyeongsu | Korea Advanced Institute of Science and Technology |
An, Hyojun | Korea Advanced Institute of Science and Technology (KAIST) |
Lee, Gunhee | Korea Advanced Institute of Science and Technology |
Kim, Seunghwan | Korea Advanced Institute of Science and Technology |
Ko, Chanyoung | Korea Advanced Institute of Science and Technology |
Kim, Taeyeon | Korea Advanced Institute of Science and Technology |
Kim, Hyeongjun | Korea Advanced Institute of Science and Technology |
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Mechanism Design
Abstract: Traditionally, powered exoskeletons have predominantly featured a back-enveloping design due to its simplicity in both implementation and user donning. However, this design results in a backward shift of the center of mass (CoM) in the sagittal plane. This paper identifies the limitations of existing design approaches and determines the optimal anterior-posterior (A/P) CoM position considering factors like actuating power, balance in the neutral posture, and user's hand workspace. Our optimization analysis recommends placing the CoM in front of the user. We address historical constraints on front-enveloping designs and propose solutions. Furthermore, we validate the usability of our designed exoskeleton through testing with a complete paraplegic user.
|
|
13:30-15:00, Paper TuBT14-AX.4 |
Real-Time Locomotion Transitions Detection: Maximizing Performances with Minimal Resources |
|
Orhan, Zeynep Özge | EPFL |
Prete, Andrea Dal | Politecnico Di Milano |
Bolotnikova, Anastasia | EPFL |
Gandolla, Marta | Politecnico Di Milano |
Ijspeert, Auke | EPFL |
Bouri, Mohamed | EPFL |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Physical Human-Robot Interaction
Abstract: Assistive devices, such as exoskeletons and prostheses, have revolutionized the field of rehabilitation and mobility assistance. Efficiently detecting transitions between different activities, such as walking, stair ascending and descending, and sitting, is crucial for ensuring adaptive control and enhancing user experience. We present an approach for real-time transition detection aimed at optimizing processing-time performance. By establishing activity-specific threshold values through trained machine learning models, we effectively distinguish motion patterns and identify transition moments between locomotion modes. This threshold-based method improves real-time embedded processing time by up to 11 times compared to machine learning approaches. The efficacy of the developed finite-state machine is validated using data collected from three different measurement systems. Moreover, experiments with healthy participants were conducted on an active pelvis orthosis to validate the robustness and reliability of our approach. The proposed algorithm achieved high accuracy in detecting transitions between activities. These promising results show the robustness and reliability of the method, reinforcing its potential for integration into practical applications.
|
|
13:30-15:00, Paper TuBT14-AX.5 |
ExoRecovery: Push Recovery with a Lower-Limb Exoskeleton Based on Stepping Strategy |
|
Orhan, Zeynep Özge | EPFL |
Shafiee, Milad | EPFL |
Juillard, Vincent | EPFL |
Coelho Oliveira, Joel | EPFL |
Ijspeert, Auke | EPFL |
Bouri, Mohamed | EPFL |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics, Physically Assistive Devices
Abstract: Balance loss is a significant challenge in lower-limb exoskeleton applications, as it can lead to potential falls, thereby impacting user safety and confidence. We introduce a control framework for omnidirectional recovery-step planning through online optimization of step duration and position in response to external forces. We map the step duration and position to a human-like foot trajectory, which is then translated into joint trajectories using inverse kinematics. These trajectories are executed via an impedance controller, promoting cooperation between the exoskeleton and the user. Moreover, our framework is based on the concept of the divergent component of motion, also known as the Extrapolated Center of Mass, which has been established as a consistent dynamic for describing human movement. This real-time online optimization framework enhances the adaptability of exoskeleton users under unforeseen forces, thereby improving overall user stability and safety. To validate the effectiveness of our approach, simulations and experiments were conducted. Our push-recovery experiments with the exoskeleton in zero-torque mode (without assistance) exhibit an alignment with the exoskeleton's recovery-assistance mode, which shows the consistency of the control framework with human intention. To the best of our knowledge, this is the first cooperative push-recovery framework for a lower-limb exoskeleton that relies on the simultaneous adaptation of intra-stride parameters in both the frontal and sagittal directions. The proposed control scheme has been validated with human-subject experiments.
|
|
13:30-15:00, Paper TuBT14-AX.6 |
Pilot Comparison of Customized and Generalized Hip-Knee-Ankle Exoskeleton Torque Profiles |
|
Bryan, Gwendolyn | IHMC |
Franks, Patrick W. | Skip |
Song, Seungmoon | Northeastern |
Collins, Steven H. | Stanford University |
Keywords: Prosthetics and Exoskeletons, Wearable Robotics
Abstract: Optimized assistance patterns have produced the greatest exoskeleton benefits to energy expenditure of any strategy to date. This strategy may be effective due to the customization of the applied torque profiles to the user as well as the locomotion condition; however, it is currently unclear how sensitive participants are to their unique torque profile. To investigate, we applied previously optimized hip-knee-ankle torque profiles to expert users (N=3; 1.25 m/s; 0 deg incline). The participants walked with the profile optimized to them, the two profiles optimized to the other two participants, and the average of the three torque profiles while we measured their energy expenditure. Relative to walking with the device turned off, on average, participants experienced a 47.5% (range 12%) metabolic reduction when walking with the torque profile optimized to them and a 46% (range 15%) reduction when walking with the other profiles. Interestingly, within-subject performance was more consistent than across subjects (P1: 52% range 5%, P2: 49% range 6%, P3: 39% range 3%) suggesting that, for expert users of some devices, there may be a range of nearly equally effective torque profiles to reduce the metabolic cost of walking. The torque timing was remarkably similar across the four torque profiles while the torque magnitude varied; participants may be much more sensitive to torque timing than torque magnitude, and there may be a set of torque timing parameters that are generally effective.
|
|
13:30-15:00, Paper TuBT14-AX.8 |
Task-Space Control of a Powered Ankle Prosthesis |
|
Kelly, David | University of Notre Dame |
Posh, Ryan | University of Notre Dame |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Prosthetics and Exoskeletons
Abstract: Powered lower-limb prostheses have shown promise in helping individuals with amputation regain functionality that passive prostheses cannot provide. However, the best method for controlling these devices in coordination with their users is still an open research topic. While powered devices can replicate normative joint kinematics and kinetics, active control also holds the potential to shape system-level characteristics such as the center of mass (CoM) that play an important role in balance. Controlling the prosthesis based on these system-level, or task-space, variables would further represent a new way of coordinating the user and their device. This paper explores the initial implementation of task-space control for a powered ankle prosthesis, characterizing the emergent outcomes of this new coordination strategy. One able-bodied subject walked using a bypass adapter while prosthesis torques were commanded based on reference ground reaction force (GRF) and CoM trajectories. The subject could walk comfortably and continuously at their preferred walking speed, achieving normative ankle torques and joint trajectories despite not tracking explicit joint-level references in stance.
|
|
13:30-15:00, Paper TuBT14-AX.9 |
Integrating Computer Vision in Exosuits for Adaptive Support and Reduced Muscle Strain in Industrial Environments |
|
Missiroli, Francesco | Heidelberg University |
Mazzoni, Pietro | Politecnico Di Milano |
Lotti, Nicola | Heidelberg University |
Tricomi, Enrica | Heidelberg University |
Braghin, Francesco | Politecnico Di Milano |
Roveda, Loris | SUPSI-IDSIA |
Masia, Lorenzo | Heidelberg University |
Keywords: Embedded Systems for Robotic and Automation, Modeling, Control, and Learning for Soft Robots, Prosthetics and Exoskeletons
Abstract: Exosuits are wearable technologies that improve physical capabilities and mobility by providing support during various activities. Although primarily intended for medical rehabilitation, there is growing interest in utilizing exosuits in industrial environments to prevent work-related musculoskeletal disorders (WMSDs) by ensuring continuous joint support. However, achieving synchronization between the exosuit and human motion, as well as effectively controlling interactions with the surroundings, presents ongoing challenges. The integration of computer vision techniques, particularly object recognition algorithms, can greatly assist exosuits in understanding the user's environment and adapting their behaviour accordingly. To address this issue, we have developed a control strategy for a soft exosuit that employs computer vision to collaboratively offer tailored assistance to the elbow, alleviating joint stress during interactions with objects of various natures and weights. We conducted a study to assess the effectiveness of the integrated system, which merges object recognition and gravity compensation within a built-in structure of the robotic exosuit. The findings confirmed that the suggested solution notably minimized muscle strain during dynamic activities, exhibiting a consistent correlation with the mass of the object being lifted: biceps activity was reduced by 45% and 54% while lifting the MW and HW, respectively, compared to the 32% of the "Dynamic Arm".
|
|
TuBT15-AX Oral Session, AX-203 |
Multi-Modal Perception for HRI II |
|
|
Chair: Nakadai, Kazuhiro | Tokyo Institute of Technology |
Co-Chair: Kawanishi, Yasutomo | RIKEN |
|
13:30-15:00, Paper TuBT15-AX.1 |
Vision and Tactile-Based Continuous Multimodal Intention and Attention Recognition for Safer Physical Human-Robot Interaction (I) |
|
Wong, Christopher Yee | McGill University |
Vergez, Lucas | Arts Et Métiers Institute of Technology |
Suleiman, Wael | University of Sherbrooke |
Keywords: Touch in HRI, Multi-Modal Perception for HRI, Intention Recognition
Abstract: Employing skin-like sensors on robots enhances both the safety and usability of collaborative robots by adding the capability to detect human contact. Unfortunately, simple binary tactile sensors alone cannot determine the context of the human contact---whether it is a deliberate interaction or an unintended collision that requires safety manoeuvres. Many published methods classify discrete interactions using more advanced tactile sensors or by analysing joint torques. Instead, we propose to augment the intention recognition capabilities of simple binary tactile sensors by adding a robot-mounted camera for human analysis. Different interaction characteristics, including touch location, human pose, and gaze direction, are used to train a supervised machine learning algorithm to classify whether a touch is intentional or not with an F1-score of 86%. We demonstrate that multimodal intention recognition is significantly more accurate than monomodal analyses. Furthermore, our method continuously monitors interactions that fluidly change between intentional or unintentional. If deemed unintentional, the proposed intention and attention recognition algorithm can activate safety features to prevent unsafe interactions. We also employ a feature reduction technique that reduces the number of inputs to five to achieve a more generalized low-dimensional classifier. This simplification both reduces the amount of training data required and improves real-world classification accuracy.
|
|
13:30-15:00, Paper TuBT15-AX.2 | Add to My Program |
Towards Unified Interactive Visual Grounding in the Wild |
|
Xu, Jie | Xi'an Jiaotong University |
Zhang, Hanbo | Bytedance Research |
Si, Qingyi | Chinese Academy of Sciences |
Li, Yifeng | ByteDance |
Lan, Xuguang | Xi'an Jiaotong University |
Kong, Tao | ByteDance |
Keywords: Natural Dialog for HRI
Abstract: Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity in natural languages. It requires robots to disambiguate the user’s input by active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, resulting in performance reduction in realistic interactive scenarios. In this paper, we propose TiO, an end-to-end system for interactive visual grounding in human-robot interaction. Benefiting from a unified formulation of visual dialog and grounding, our method can be trained jointly on extensive public data and shows superior generality to diversified and challenging open-world scenarios. In the experiments, we validate TiO on the GuessWhat?! and InViG benchmarks, setting new state-of-the-art performance by a clear margin. Moreover, we conduct HRI experiments on 150 carefully selected challenging scenes as well as real-robot platforms. Results show that our method demonstrates superior generality to diversified visual and language inputs with a high success rate. Codes and demos are available at https://jxu124.github.io/TiO/.
|
|
13:30-15:00, Paper TuBT15-AX.3 | Add to My Program |
Think, Act and Ask: Open-World Interactive Personalized Robot Navigation |
|
Dai, Yinpei | University of Michigan |
Peng, Run | University of Michigan, Ann Arbor |
Li, Sikai | University of Michigan |
Chai, Joyce | University of Michigan |
Keywords: Natural Dialog for HRI
Abstract: Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents’ performance.
|
|
13:30-15:00, Paper TuBT15-AX.4 | Add to My Program |
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping |
|
Kang, Gi-Cheon | Seoul National University |
Kim, Junghyun | Seoul National University |
Kim, Jaein | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Natural Dialog for HRI, Multi-Modal Perception for HRI, Deep Learning Methods
Abstract: Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial). In our proposed task scenario, an intention-oriented utterance (e.g., "I am thirsty") is initially given to the robot. The robot should then identify the target object by interacting with a human user. Based on the task setup, we propose a new robotic system that can interpret the user's intention and pick up the target object, Pragmatic Object Grasping (PROGrasp). PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping, and most importantly, answer interpretation for pragmatic inference. Experimental results show that PROGrasp is effective in offline (i.e., target object discovery) and online (i.e., IOG with a physical robot arm) settings. Code and data are available at https://github.com/gicheonkang/prograsp.
|
|
13:30-15:00, Paper TuBT15-AX.5 | Add to My Program |
Enhancing Tactile Sensing in Robotics: Dual-Modal Force and Shape Perception with EIT-Based Sensors and MM-CNN |
|
Chen, Haofeng | University of Science and Technology of China |
Yang, Xuanxuan | Chinese Academy of Sciences |
Ma, Gang | University of Science and Technology of China |
Wang, Yucheng | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Wang, Xiaojie | Chinese Academy of Sciences |
Keywords: Multi-Modal Perception for HRI, Wearable Robotics, Soft Sensors and Actuators
Abstract: Electrical Impedance Tomography (EIT)-based tactile sensors offer durability, scalability, and cost-effective manufacturing. However, simultaneously reconstructing force and shape from boundary measurements remains challenging due to EIT’s inherent location dependencies and image artifacts. This study presents a model-driven multimodal convolutional neural network (MM-CNN) for joint EIT-based force and shape sensing. The hybrid approach combines physics-inspired voltage preprocessing with an attention-based network to overcome EIT’s limitations. The preprocessing network applies a linearized one-step inverse solution with Tikhonov regularization to convert raw boundary voltage into a noise-reduced 2D image. The image reconstruction network uses an attention mechanism to focus on salient features, addressing location dependency issues. Quantitative metrics show that MM-CNN outperforms traditional EIT algorithms like NOSER and TV, reducing location dependency and improving shape discrimination. MM-CNN enables unified force and shape modalities, validated through real-contact experiments, enhancing EIT tactile systems for human-robot interaction by incorporating physical knowledge with deep learning.
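The preprocessing step described in this abstract, a linearized one-step inverse with Tikhonov regularization, amounts to a single regularized least-squares solve. A minimal numpy sketch follows; the sensitivity matrix `J`, grid size, and regularization weight are illustrative stand-ins, not values from the paper:

```python
import numpy as np

def tikhonov_one_step(J, v, lam=0.05):
    """One-step linearized EIT inverse with Tikhonov regularization.

    J : (m, n) sensitivity (Jacobian) matrix mapping conductivity
        changes to boundary-voltage changes.
    v : (m,) measured boundary-voltage difference.
    Returns an (n,) conductivity-change image (flattened 2D grid),
    i.e. the minimizer of ||J x - v||^2 + lam^2 ||x||^2.
    """
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam**2 * np.eye(n), J.T @ v)

# Toy example: 16 boundary measurements, 8x8 pixel grid.
rng = np.random.default_rng(0)
J = rng.standard_normal((16, 64))
x_true = np.zeros(64)
x_true[27] = 1.0                       # a single contact "blob"
v = J @ x_true + 0.01 * rng.standard_normal(16)
img = tikhonov_one_step(J, v).reshape(8, 8)
```

The regularizer keeps the underdetermined solve well-posed; in the paper this noise-reduced image is then passed to the attention-based reconstruction network.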
|
|
13:30-15:00, Paper TuBT15-AX.6 | Add to My Program |
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents |
|
Park, Jeongeun | Korea University |
Lim, Seungwon | Yonsei University |
Lee, Joonhyung | Korea University |
Park, Sangbeom | Korea University |
Chang, Minsuk | Google |
Yu, Youngjae | Yonsei University |
Choi, Sungjoon | Korea University |
Keywords: Natural Dialog for HRI, AI-Enabled Robotics
Abstract: In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish between ambiguous and infeasible commands by leveraging LLMs with situationally aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands can decrease malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting of pairs of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and in a pick-and-place tabletop simulation environment. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments.
|
|
13:30-15:00, Paper TuBT15-AX.7 | Add to My Program |
Assisting Group Discussions Using Desktop Robot Haru |
|
Tang, Fei | Ocean University of China |
Zheng, Chuanxiong | Ocean University of China |
Yu, Hongqi | Ocean University of China |
Zhang, Lei | Ocean University of China |
Nichols, Eric | Honda Research Institute Japan |
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Li, Guangliang | Ocean University of China |
Keywords: Robot Companions, Human-Centered Automation, Human-Centered Robotics
Abstract: Socially assistive robots are likely to be integrated into human daily life in the near future and are expected to improve group dynamics when interacting with groups of people in social settings. In this paper, we developed a system with the desktop robot Haru to assist group discussions. The system consists of three modules: a dialogue assistance module, which enables Haru to speak to users and answer questions freely; a dialogue balance module, which uses verbal behaviors to encourage users to participate in the discussion; and an autonomous gazing behavior module, trained via deep reinforcement learning in simulation and deployed on the physical Haru, which displays politeness during group discussion, e.g., gazing at the speaking member, looking at the middle when both members are talking or silent, and looking at the least-spoken member when encouraging them. Results of a user study with 40 subjects show the significant effectiveness of our system in assisting group discussion.
|
|
13:30-15:00, Paper TuBT15-AX.8 | Add to My Program |
Assessment and Benchmarking of XoNLI: A Natural Language Processing Interface for Industrial Exoskeletons |
|
Moreno Franco, Olmo Alonso | Istituto Italiano Di Tecnologia |
Parameswari Neelakandan, Raajshekhar | Istituto Italiano Di Tecnologia |
Di Natali, Christian | Istituto Italiano Di Tecnologia (IIT) |
Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
Ortiz, Jesus | Istituto Italiano Di Tecnologia (IIT) |
Keywords: Natural Dialog for HRI, Product Design, Development and Prototyping, Wearable Robotics
Abstract: Industrial exoskeletons are a potential solution for reducing work-related musculoskeletal disorders during carrying or lifting tasks. Because they integrate sensors, electrical/pneumatic actuators, and control systems, active exoskeletons offer a more versatile control system, as different assistive strategies can be selected based on the performed task. From this perspective, human-machine interaction is required to safely open basic exoskeleton domains to the user and provide an adaptable setup system. This article presents the assessment and benchmarking of the novel XoLab Natural Language Interface, a voice user interface for the interaction with and configuration of industrial active exoskeletons. The evaluation of the novel interface was performed by 17 participants who completed setup and operational activities while wearing the XoTrunk exoskeleton. The benchmark compared the presented device with previous adaptable interfaces for the exoskeleton: the user command interface and the monitor system interface. The results showed that although the novel interface presented a considerable lag in its time response, it was more attractive than the standard one. However, the user command interface obtained favourable results over the standard interface in terms of perspicuity and efficiency.
|
|
13:30-15:00, Paper TuBT15-AX.9 | Add to My Program |
Advancing Virtual Reality Interaction: A Ring-Shaped Controller and Pose Tracking |
|
Zhang, Zhuqing | Zhejiang University |
Li, Dongxuan | Zhejiang University |
Ma, Jiayao | Peking University |
He, Yijia | Institute of Automation, Chinese Academy of Sciences |
Ji, Pan | Tencent |
Xiong, Rong | Zhejiang University |
Li, Hongdong | Australian National University and NICTA |
Wang, Yue | Zhejiang University |
Keywords: Virtual Reality and Interfaces, Sensor Fusion
Abstract: Ensuring robust tracking of controllers' movement is critical for human-robot interaction in virtual reality (VR) scenarios. This paper proposes a robust tracking algorithm based on a novel wearable ring-shaped controller equipped with an inertial measurement unit (IMU) and a light-emitting diode (LED). This novel controller design allows users to free up their hands for more immersive experiences. To track the controller's motion accurately and robustly, we resort to various forms of visual measurements, including 6 DoF and 5 DoF pose measurements from hand gesture detection, as well as 3 DoF position measurement and 2 DoF image measurement derived from the LED. We theoretically analyze the performances of these observation models and propose an optimal observation model combination scheme. Moreover, the necessity and rationale of online estimating system gravity are illustrated. The effectiveness of our tracking method is validated through extensive experiments.
|
|
TuBT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing II |
|
|
Chair: Roberge, Jean-Philippe | École De Technologie Supérieure |
Co-Chair: Sintov, Avishai | Tel-Aviv University |
|
13:30-15:00, Paper TuBT16-AX.1 | Add to My Program |
Tactile Embeddings for Multi-Task Learning |
|
Luo, Yiyue | Massachusetts Institute of Technology |
Wonsick, Murphy | Boston Dynamics AI Institute |
Hodgins, Jessica | Carnegie Mellon University |
Okorn, Brian | Boston Dynamics AI Institute |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Tactile sensing plays a pivotal role in human perception and manipulation tasks, allowing us to intuitively understand task dynamics and adapt our actions in real time. Transferring such tactile intelligence to robotic systems would help intelligent agents understand task constraints and accurately interpret the dynamics of both the objects they are interacting with and their own operations. While significant progress has been made in imbuing robots with this tactile intelligence, challenges persist in effectively utilizing tactile information due to the diversity of tactile sensor form factors, manipulation tasks, and learning objectives involved. To address this challenge, we present a unified tactile embedding space capable of predicting a variety of task-centric qualities over multiple manipulation tasks. We collect tactile data from human demonstrations across various tasks and leverage this data to construct a shared latent space for task stage classification, object dynamics estimation, and tactile dynamics prediction. Through experiments and ablation studies, we demonstrate the effectiveness of our shared tactile latent space for more accurate and adaptable tactile networks, showing an improvement of up to 84% over the single-task training.
|
|
13:30-15:00, Paper TuBT16-AX.2 | Add to My Program |
AllSight: A Low-Cost and High-Resolution round Tactile Sensor with Zero-Shot Learning Capability |
|
Azulay, Osher | Tel Aviv University |
Curtis, Nimrod | Tel-Aviv University |
Sokolovsky, Rotem | Tel-Aviv University |
Levistky, Guy | Tel-Aviv University |
Slomovik, Daniel | Tel-Aviv University |
Lilling, Guy | Tel-Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: Tactile sensing is a necessary capability for a robotic hand to perform fine manipulations and interact with the environment. Optical sensors are a promising solution for high-resolution contact estimation. Nevertheless, they are usually not easy to fabricate and require individual calibration in order to acquire sufficient accuracy. In this letter, we propose AllSight, an optical tactile sensor with a round 3D structure designed for robotic in-hand manipulation tasks. AllSight is mostly 3D printed, including a novel and simplified fabrication process. This makes it low-cost, modular, durable, and the size of a human thumb, while offering a large contact surface. We show the ability of AllSight to learn and estimate a full contact state, i.e., contact position, forces, and torsion. With that, an experimental benchmark between various configurations of illumination and contact elastomers is provided. Furthermore, the robust design of AllSight provides it with a unique zero-shot capability such that a practitioner can fabricate the open-source design and have a ready-to-use state estimation model. A set of experiments demonstrates the accurate state estimation performance of AllSight.
|
|
13:30-15:00, Paper TuBT16-AX.3 | Add to My Program |
9DTact: A Compact Vision-Based Tactile Sensor for Accurate 3D Shape Reconstruction and Generalizable 6D Force Estimation |
|
Lin, Changyi | Carnegie Mellon University |
Zhang, Han | Tsinghua University, Shanghai Qi Zhi Institute |
Xu, Jikai | Huazhong University of Science and Technology, Shanghai Qi Zhi Institute |
Wu, Lei | Huazhong University of Science and Technology |
Xu, Huazhe | Tsinghua University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation
Abstract: The advancements in vision-based tactile sensors have boosted the aptitude of robots to perform contact-rich manipulation, particularly when precise positioning and contact state of the manipulated objects are crucial for successful execution. In this work, we present 9DTact, a straightforward yet versatile tactile sensor that offers 3D shape reconstruction and 6D force estimation capabilities. Conceptually, 9DTact is designed to be highly compact, robust, and adaptable to various robotic platforms. Moreover, it is low-cost and easy-to-fabricate, requiring minimal assembly skills. Functionally, 9DTact builds upon the optical principles of DTact and is optimized to achieve 3D shape reconstruction with enhanced accuracy and efficiency. Remarkably, we leverage the optical and deformable properties of the translucent gel so that 9DTact can perform 6D force estimation without the participation of auxiliary markers or patterns on the gel surface. More specifically, we collect a dataset consisting of approximately 100,000 image-force pairs from 175 complex objects and train a neural network to regress the 6D force, which can generalize to unseen objects. To promote the development and applications of vision-based tactile sensors, we open-source both the hardware and software of 9DTact, along with a comprehensive video tutorial, all of which are available at https://linchangyi1.github.io/9DTact.
|
|
13:30-15:00, Paper TuBT16-AX.4 | Add to My Program |
GelFinger: A Novel Visual-Tactile Sensor with Multi-Angle Tactile Image Stitching |
|
Lin, Zhonglin | Fuzhou University |
Zhuang, JiaQuan | Fuzhou University |
Li, Yufeng | Fuzhou University |
Wu, Xianyu | Fuzhou University |
Luo, Shan | King's College London |
Fernandes Gomes, Daniel | King's College London |
Huang, Feng | Fuzhou University |
Yang, Zheng | Fuzhou University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Object Detection, Segmentation and Categorization
Abstract: Visual-tactile sensors that use a camera to capture the deformation of a soft gel layer have become popular in recent years. However, these sensors have a limited receptive field, which can hinder their ability to perceive tactile information effectively. In this paper, we propose a novel visual-tactile sensor named GelFinger that closely resembles the human finger and is well-suited for detecting various complex surfaces. The GelFinger sensor is equipped with an embedded miniature motor that allows for the adaptation of the camera pose and the scanning of a large contact area. During the detection process, the camera rotates to multiple angles to capture the tactile image of the contact area. To stitch together the tactile images obtained at different camera poses, we use an As-Projective-As-Possible image stitching algorithm to form a global view of the contact. We demonstrate the effectiveness of the GelFinger sensor in assessing large surfaces by using it to reconstruct curved crack outlines. Comparative experimental results show that the proposed sensor can effectively detect cracks and has the potential to assist humans in detecting defects on curved surfaces of infrastructure such as pipelines.
|
|
13:30-15:00, Paper TuBT16-AX.5 | Add to My Program |
StereoTac: A Novel Visuotactile Sensor That Combines Tactile Sensing with 3D Vision |
|
Roberge, Etienne | École De Technologie Supérieure |
Fornes, Guillaume | ENSEIRB-MATMECA, Bordeaux INP |
Roberge, Jean-Philippe | École De Technologie Supérieure |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, RGB-D Perception
Abstract: Combining 3D vision with tactile sensing could unlock a greater level of dexterity for robots and improve several manipulation tasks. However, obtaining a close-up 3D view of the location where manipulation contacts occur can be challenging, particularly in confined spaces, cluttered environments, or without installing more sensors on the end effector. In this context, this paper presents StereoTac, a novel vision-based sensor that combines tactile sensing with 3D vision. The proposed sensor relies on stereoscopic vision to capture a 3D representation of the environment before contact and uses photometric stereo to reconstruct the tactile imprint generated by an object during contact. To this end, two cameras were integrated in a single sensor, whose interface is made of a transparent elastomer coated with a thin layer of paint with a level of transparency that can be adjusted by varying the sensor’s internal lighting conditions. We describe the sensor’s fabrication and evaluate its performance for both tactile perception and 3D vision. Our results show that the proposed sensor can reconstruct a 3D view of a scene just before grasping and perceive the tactile imprint after grasping, allowing for monitoring of the contact during manipulation.
|
|
13:30-15:00, Paper TuBT16-AX.6 | Add to My Program |
An Investigation of Multi-Feature Extraction and Super-Resolution with Fast Microphone Arrays |
|
Chang, Eric T. | Columbia University |
Wang, Runsheng | Columbia University |
Ballentine, Peter | Columbia University |
Xu, Jingxi | Columbia University |
Smith, Trey | NASA Ames Research Center |
Coltin, Brian | Carnegie Mellon University |
Kymissis, Ioannis | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Soft Sensors and Actuators
Abstract: In this work, we use MEMS microphones as vibration sensors to simultaneously classify texture and estimate contact position and velocity. Vibration sensors are an important facet of both human and robotic tactile sensing, providing fast detection of contact and onset of slip. Microphones are an attractive option for implementing vibration sensing as they offer a fast response and can be sampled quickly, are affordable, and occupy a very small footprint. Our prototype sensor uses only a sparse array (8-9 mm spacing) of distributed MEMS microphones (<$1 each, 3.76 x 2.95 x 1.10 mm) embedded under an elastomer. We use transformer-based architectures for data analysis, taking advantage of the microphones' high sampling rate to run our models on time-series data as opposed to individual snapshots. This approach allows us to obtain 77.3% average accuracy on 4-class texture classification (84.2% when excluding the slowest drag velocity), 1.8 mm mean error on contact localization, and 5.6 mm/s mean error on contact velocity. We show that the learned texture and localization models are robust to varying velocity and generalize to unseen velocities. We also report that our sensor provides fast contact detection, an important advantage of fast transducers. This investigation illustrates the capabilities one can achieve with a MEMS microphone array alone, leaving valuable sensor real estate available for integration with complementary tactile sensing modalities.
|
|
13:30-15:00, Paper TuBT16-AX.7 | Add to My Program |
Model-Based Compliance Discrimination Via Soft Tactile Optical Sensing and Optical Flow Computation: A Biomimetic Approach |
|
Pagnanelli, Giulia | University of Pisa |
Ciotti, Simone | University of Pisa |
Lepora, Nathan | University of Bristol |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Bianchi, Matteo | University of Pisa |
Keywords: Force and Tactile Sensing, Perception for Grasping and Manipulation, Soft Sensors and Actuators
Abstract: Soft tactile optical sensors have opened up new possibilities for endowing artificial robotic hands with advanced touch-related properties; however, their use for compliance discrimination has been poorly investigated and mainly relies on data-driven methods. Discrimination of object compliance is crucial for enabling accurate and purposeful object manipulation. Humans retrieve this information primarily using the contact area spread rate (CASR) over their fingertips. CASR can be defined as the integral of tactile flow, which describes the movement of iso-strain surfaces within the fingerpad. This work presents the first attempt to discriminate compliance through soft optical tactile sensing based on a computational model of human tactile perception that relies on CASR and tactile flow concepts. To this aim, we used a soft optical biomimetic sensor that transduces surface deformation via movements of marked pins, similar to the function of intermediate ridges in the human fingertip. We acquired images of markers' movements during the interaction with silicone specimens with different compliance at different indenting forces. Then, we computed the optical flow as a tactile flow approximation and its divergence to estimate the CASR. Our model-based approach can accurately discriminate the compliance levels of the specimens, both when the sensor probed the surface perpendicularly and with different inclinations. Finally, we used the relation between specimen compliance and the
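The tactile-flow approximation described in this abstract, taking the divergence of an optical-flow field as a proxy for the contact area spread rate (CASR), can be sketched in a few lines of numpy. The radial toy flow and scale factors below are illustrative, not the paper's marker data:

```python
import numpy as np

def casr_proxy(u, v, dx=1.0):
    """Estimate a contact-area spread rate (CASR) proxy as the
    integral of the divergence of an optical-flow field.

    u, v : (H, W) horizontal/vertical flow components (pixels/frame).
    Returns the summed divergence div = du/dx + dv/dy, which grows
    as markers spread outward under an increasing contact area.
    """
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dx, axis=0)
    return float(np.sum(du_dx + dv_dy))

# Toy radially expanding flow: markers move away from the center,
# so the divergence (0.1 + 0.1 = 0.2 per pixel) is positive.
H = W = 32
y, x = np.mgrid[0:H, 0:W]
u = 0.1 * (x - W / 2)
v = 0.1 * (y - H / 2)
rate = casr_proxy(u, v)
```

In the paper this quantity, computed from tracked marker-pin motion, feeds the human-inspired compliance-discrimination model in place of a data-driven regressor.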
|
|
13:30-15:00, Paper TuBT16-AX.8 | Add to My Program |
Bi-Touch: Bimanual Tactile Manipulation with Sim-To-Real Deep Reinforcement Learning |
|
Lin, Yijiong | University of Bristol |
Church, Alex | Cambrian |
Yang, Max | University of Bristol |
Li, Haoran | University of Bristol |
Lloyd, John | University of Bristol |
Zhang, Dandan | Imperial College London |
Lepora, Nathan | University of Bristol |
Keywords: Force and Tactile Sensing, Reinforcement Learning
Abstract: Bimanual manipulation with tactile feedback will be key to human-level robot dexterity. However, this topic is less explored than single-arm settings, partly due to the availability of suitable hardware along with the complexity of designing effective controllers for tasks with relatively large state-action spaces. Here we introduce a dual-arm tactile robotic system (Bi-Touch) based on the Tactile Gym 2.0 setup that integrates two affordable industrial-level robot arms with low-cost high-resolution tactile sensors (TacTips). We present a suite of bimanual manipulation tasks tailored towards tactile feedback: bi-pushing, bi-reorienting and bi-gathering. To learn effective policies for challenging tasks in simulation, we contribute several efforts, such as introducing appropriate reward functions and proposing a novel goal-update mechanism with deep reinforcement learning. We also apply these policies to real-world settings with a zero-shot sim-to-real approach. Our analysis highlights and addresses some challenges met during the sim-to-real application, e.g. the learned policy tended to squeeze an object in the bi-reorienting task due to the sim-to-real gap. Finally, we demonstrate the generalizability and robustness of this system by experimenting with different unseen objects with applied perturbations in the real world. These tasks and our system can also serve as a benchmark for bimanual tactile manipulation. Code will be openly released at https://github.com/ac-93/tact
|
|
13:30-15:00, Paper TuBT16-AX.9 | Add to My Program |
AcTExplore: Active Tactile Exploration on Unknown Objects |
|
Shahidzadeh, Amir Hossein | University of Maryland |
Yoo, Seong Jong | University of Maryland |
Mantripragada, Pavan | University of Maryland, College Park |
Singh, Chahat Deep | University of Maryland, College Park |
Fermuller, Cornelia | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Force and Tactile Sensing, Reinforcement Learning, Perception for Grasping and Manipulation
Abstract: Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes.
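Coverage in this setting is typically scored as intersection-over-union (IoU) between the reconstructed and ground-truth occupancy grids. A minimal sketch of such a metric; the voxel grids below are toy data, not YCB objects:

```python
import numpy as np

def voxel_iou(recon, gt):
    """Intersection-over-union between two boolean occupancy grids,
    as commonly used to score the coverage of a 3D reconstruction."""
    recon, gt = np.asarray(recon, bool), np.asarray(gt, bool)
    inter = np.logical_and(recon, gt).sum()
    union = np.logical_or(recon, gt).sum()
    return inter / union if union else 1.0

# Toy check: 8 occupied ground-truth voxels; the reconstruction
# misses one of them and adds one false positive -> IoU = 7/9.
gt = np.zeros((4, 4, 4), bool)
gt[:2, :2, :2] = True
recon = gt.copy()
recon[0, 0, 0] = False
recon[3, 3, 3] = True
iou = voxel_iou(recon, gt)
```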
|
|
TuBT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots II |
|
|
Chair: Zhou, Chengxu | University College London |
Co-Chair: Ijspeert, Auke | EPFL |
|
13:30-15:00, Paper TuBT17-AX.1 | Add to My Program |
Terrestrial Locomotion of PogoX: From Hardware Design to Energy Shaping and Step-To-Step Dynamics Based Control |
|
Wang, Yi | Columbia University |
Kang, Jiarong | University of Wisconsin Madison |
Chen, Zhiheng | University of Wisconsin-Madison |
Xiong, Xiaobin | University of Wisconsin Madison |
Keywords: Legged Robots, Aerial Systems: Mechanics and Control, Underactuated Robots
Abstract: We present a novel controller design on a robotic locomotor that combines an aerial vehicle with a spring-loaded leg. The main motivation is to enable the terrestrial locomotion capability on aerial vehicles so that they can carry heavy loads: heavy enough that flying is no longer possible, e.g., when the thrust-to-weight ratio (TWR) is small. The robot is designed with a pogo-stick leg and a quadrotor, and is thus named PogoX. We show that with a simple and lightweight spring-loaded leg, the robot is capable of hopping with TWR <1. The control of hopping is realized via two components: a vertical height control via control Lyapunov function-based energy shaping, and a step-to-step (S2S) dynamics based horizontal velocity control that is inspired by the hopping of the Spring-Loaded Inverted Pendulum (SLIP). The controller is successfully realized on the physical robot, showing dynamic terrestrial locomotion of PogoX which can hop at variable heights and different horizontal velocities with robustness to ground height variations and external pushes.
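For context on the SLIP-inspired horizontal-velocity control mentioned in this abstract, the classic Raibert foot-placement heuristic places the foot ahead of the hip by half the stance travel plus a velocity-error correction. This sketch shows the textbook heuristic that S2S-dynamics controllers refine, not the paper's controller; the gain `k_v` is illustrative:

```python
def raibert_foot_placement(v, v_des, stance_time, k_v=0.05):
    """Raibert-style foot placement for SLIP-like hopping.

    v           : current horizontal velocity (m/s)
    v_des       : desired horizontal velocity (m/s)
    stance_time : expected stance duration (s)
    Returns the forward foot offset (m) relative to the hip:
    the "neutral point" v * T_s / 2 plus a velocity-error term.
    """
    return v * stance_time / 2.0 + k_v * (v - v_des)

# At the desired velocity the correction vanishes and only the
# neutral-point term remains: 1.0 * 0.2 / 2 = 0.1 m.
x_f = raibert_foot_placement(1.0, 1.0, 0.2)
```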
|
|
13:30-15:00, Paper TuBT17-AX.2 | Add to My Program |
Learning Emergent Gaits with Decentralized Phase Oscillators: On the Role of Observations, Rewards, and Feedback |
|
Zhang, Jenny | Massachusetts Institute of Technology |
Heim, Steve | Massachusetts Institute of Technology |
Jeon, Se Hwan | Massachusetts Institute of Technology |
Kim, Sangbae | Massachusetts Institute of Technology |
Keywords: Legged Robots, Bioinspired Robot Learning, Natural Machine Motion
Abstract: We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis available at https://youtu.be/1NKQ0rSV3jU.
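A decentralized phase oscillator with local ground-reaction-force feedback can be sketched as a one-dimensional ODE integrated per leg. The coupling form `k * grf * cos(phi)` below is an illustrative choice, not necessarily the paper's exact feedback law:

```python
import math

def step_phase(phi, omega, grf, k, dt):
    """One Euler step of a single leg's phase oscillator with local
    ground-reaction-force (GRF) feedback:

        phi_dot = omega + k * grf * cos(phi)

    With grf = 0 the oscillator free-runs at its intrinsic frequency
    omega; contact forces speed up or slow down the phase, which is
    what lets gaits emerge without prescribing them.
    """
    return (phi + (omega + k * grf * math.cos(phi)) * dt) % (2 * math.pi)

# With zero GRF, 100 steps of dt = 1 ms at omega = 2*pi rad/s
# advance the phase by exactly 2*pi * 0.1 rad.
phi = 0.0
for _ in range(100):
    phi = step_phase(phi, omega=2 * math.pi, grf=0.0, k=1.0, dt=0.001)
```

Each of the four legs runs its own copy of this update, coupled only through the mechanics of the shared body, as the abstract describes.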
|
|
13:30-15:00, Paper TuBT17-AX.3 |
Bio-Inspired Gait Transitions for Quadruped Locomotion |
|
Humphreys, Joseph Elliot | University of Leeds |
Li, Jun | Harbin Institute of Technology |
Wan, Yuhui | University of Leeds |
Gao, Haibo | Harbin Institute of Technology |
Zhou, Chengxu | University College London |
Keywords: Legged Robots, Biologically-Inspired Robots, Humanoid and Bipedal Locomotion
Abstract: Developing gaits inspired by animal locomotion for quadruped robots has become a prevalent approach to achieving dynamic locomotion. Like animal gaits, these gaits are most effective at specific velocities, necessitating transitions between them for enhanced locomotion proficiency. Despite the significance of these transitions, methods for achieving them have received comparatively limited attention. For successful gait transitions, stability and suitable velocities are essential to maintain efficiency. In this study, a bio-inspired gait transition method has been devised, capitalising on the Froude number, a parameter characterising the velocity at which different-sized quadrupeds alter their gaits. By formulating a set of governing equations contingent on the Froude number, stable gait transitions can be generated. A series of simulations were conducted to determine the optimal Froude number ranges for various gaits and to validate the generality of this method by applying it to four distinct quadrupeds. To assess the performance of the gait transitions, a series of hardware experiments were executed, demonstrating a variety of gait transitions, comparing the proposed transition method with existing alternatives, and testing its generality.
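The Froude number at the heart of the method is a dimensionless ratio of inertial to gravitational effects, Fr = v²/(gL), which is what lets the same thresholds transfer across different-sized quadrupeds. The gait thresholds below are placeholder values, not the optimal ranges identified in the paper:

```python
def froude(v, leg_length, g=9.81):
    """Dimensionless Froude number: v^2 / (g * leg_length)."""
    return v * v / (g * leg_length)

def select_gait(v, leg_length, thresholds=((0.5, "walk"), (1.0, "trot"))):
    """Pick a gait by comparing Fr against illustrative (assumed) thresholds."""
    fr = froude(v, leg_length)
    for upper, gait in thresholds:
        if fr < upper:
            return gait
    return "gallop"
```

A 0.5 m-legged robot and a 1.0 m-legged robot transition at different absolute speeds but at the same Froude number, which is the scale-invariance the method exploits.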
|
|
13:30-15:00, Paper TuBT17-AX.4 |
Optimizing Dynamic Balance in a Rat Robot Via the Lateral Flexion of a Soft Actuated Spine |
|
Huang, Yuhong | Technische Universität München |
Bing, Zhenshan | Technical University of Munich |
Zhang, Zitao | Sun Yat-Sen University |
Zhuang, Genghang | Technical University of Munich |
Huang, Kai | Sun Yat-Sen University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Legged Robots, Body Balancing, Motion Control
Abstract: For mammals, balancing with the spine is a physiological way of aligning body posture efficiently through muscular forces; this is why many disabled quadruped animals can still stand or walk even with three limbs. This paper investigates the optimization of dynamic balance during trot gait based on the spatial relationship between the center of mass (CoM) and the support area influenced by spinal flexion. During trotting, the robot's balance is significantly influenced by the distance of the CoM to the support area formed by the diagonal footholds. In this context, lateral spinal flexion, which is able to modify the position of the footholds, holds promise for optimizing balance during trotting. This paper explores this phenomenon using a rat robot equipped with a soft actuated spine. Based on the lateral flexion of the spine, we establish a kinematic model to quantify the impact of spinal flexion on robot balance during trot gait. Subsequently, we develop an optimized controller for spinal flexion, designed to enhance balance without altering the leg locomotion. The effectiveness of our proposed controller is evaluated through extensive simulations and physical experiments conducted on a rat robot. Compared to both a non-spine-based trot gait controller and a trot gait controller with lateral spinal flexion, our proposed optimized controller effectively improves the dynamic balance of the robot and retains the desired locomotion during trotting.
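The key geometric quantity here, the distance from the CoM to the diagonal support line through the two stance feet, is a plain point-to-line computation. A minimal sketch (planar ground-plane projection; the function name and frames are my assumptions, not the paper's kinematic model):

```python
import math

def com_to_support_distance(com, foot_a, foot_b):
    """Perpendicular distance from the ground-projected CoM to the line
    through the two diagonal stance footholds (all points as (x, y))."""
    (x0, y0), (x1, y1), (x2, y2) = com, foot_a, foot_b
    num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    den = math.hypot(y2 - y1, x2 - x1)
    return num / den
```

Lateral spinal flexion shifts the footholds, hence the support line, so an optimizer can drive this distance toward zero without touching the leg controller.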
|
|
13:30-15:00, Paper TuBT17-AX.5 |
SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos |
|
Zhang, John | Carnegie Mellon University |
Yang, Shuo | Carnegie Mellon University |
Yang, Gengshan | Meta |
Bishop, Arun | Carnegie Mellon University |
Gurumurthy, Swaminathan | Carnegie Mellon University |
Ramanan, Deva | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Legged Robots, Computer Vision for Automation
Abstract: We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured “in-the-wild” video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as a contact sequence that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion-capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain videos, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation).
|
|
13:30-15:00, Paper TuBT17-AX.6 |
Introducing the Carpal-Claw: A Mechanism to Enhance High-Obstacle Negotiation for Quadruped Robots |
|
Barasuol, Victor | Istituto Italiano Di Tecnologia |
Emre, Sinan | Istituto Italiano Di Tecnologia |
Suzano Medeiros, Vivian | University of São Paulo |
Bratta, Angelo | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Legged Robots, Mechanism Design, Robot Safety
Abstract: The capability of a quadruped robot to negotiate obstacles is tightly connected to its leg workspace and joint torque limits. When facing terrain where the height of obstacles is close to the leg length, locomotion robustness and safety are reduced since more dynamic motions are required to traverse it. In this paper, we introduce a new mechanism called the Carpal-Claw, which enables quadruped robots to negotiate higher obstacles and adds safety to the locomotion by allowing the robot to negotiate obstacles under static and quasi-static locomotion and regular joint torque demands. The design of the mechanism is detailed, as well as the methodology to exploit it in the locomotion control framework. The Carpal-Claw functionality is validated through various experiments on a very high obstacle and stair-like terrains using an Aliengo robot. We demonstrate how Aliengo can safely descend a step height of 40 cm, which is 80% of its leg length. To the best of the authors' knowledge, this is the first time a mechanism like the Carpal-Claw has been proposed for improving quadruped robot locomotion over high obstacles.
|
|
13:30-15:00, Paper TuBT17-AX.7 |
SpaceHopper: A Small-Scale Legged Robot for Exploring Low-Gravity Celestial Bodies |
|
Spiridonov, Alexander | ETH Zurich |
Buehler, Fabio | ETH Zurich |
Berclaz, Moriz | ETH Zurich |
Schelbert, Valerio Antonio | ETH Zurich |
Geurts, Jorit | ETH Zürich |
Krasnova, Elena | ETH Zürich |
Steinke, Emma | ETH Zürich |
Toma, Jonas | ZHAW School of Engineering |
Wüthrich, Joschua | ETH Zürich |
Polat, Recep | ETH Zürich |
Zimmermann, Wim | ZHAW |
Arm, Philip | ETH Zurich |
Rudin, Nikita | ETH Zurich, NVIDIA |
Kolvenbach, Hendrik | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Space Robotics and Automation, Engineering for Robotic Systems
Abstract: We present SpaceHopper, a three-legged, small-scale robot designed for future mobile exploration of asteroids and moons. The robot weighs 5.2 kg and has a body size of 245 mm while using space-qualifiable components. Furthermore, SpaceHopper's design and controls make it well-adapted for investigating dynamic locomotion modes with extended flight phases. Instead of gyroscopes or flywheels, the system uses its three legs to reorient the body during flight in preparation for landing. We control the leg motion for reorientation using Deep Reinforcement Learning policies. In a simulation of Ceres' gravity (0.029 g), the robot can reliably jump to commanded positions up to 6 m away. Our real-world experiments show that SpaceHopper can successfully reorient to a safe landing orientation to within 9.7 degrees inside a rotational gimbal and jump in a counterweight setup in Earth's gravity. Overall, we consider SpaceHopper an important step towards controlled jumping locomotion in low-gravity environments.
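As a sanity check on the low-gravity regime, a drag-free ballistic estimate shows why modest take-off speeds carry the robot meters on Ceres. This is a back-of-the-envelope sketch under my own simplifications (flat ground, point mass, no attitude dynamics), not the paper's controller:

```python
import math

G_CERES = 0.029 * 9.81  # Ceres surface gravity used in the abstract, m/s^2

def jump_range(v, angle_deg, g):
    """Ballistic range on flat ground: v^2 * sin(2a) / g (no drag)."""
    a = math.radians(angle_deg)
    return v * v * math.sin(2.0 * a) / g
```

The same take-off speed covers roughly 34 times the distance on Ceres as on Earth (the ratio of the gravities), which is why a small hopper can reach commanded positions several meters away.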
|
|
13:30-15:00, Paper TuBT17-AX.8 |
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots |
|
Shafiee, Milad | EPFL |
Bellegarda, Guillaume | EPFL |
Ijspeert, Auke | EPFL |
Keywords: Legged Robots, Biomimetics
Abstract: Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs, (i.e. 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot’s nominal mass.
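The rhythm-generation-to-pattern-formation split can be illustrated with a toy mapping: a CPG phase/amplitude pair becomes a foot target, and the per-robot stride scaling lives only in the PF layer, mirroring the abstract's claim that only that layer varies across robots. The trajectory shape and names below are my illustrative choices, not the learned policy:

```python
import math

def cpg_foot_target(theta, r, step_len, step_h):
    """Map CPG phase theta and amplitude r to a foot target (x, z).

    step_len and step_h are the PF-layer scalings that differ per robot;
    the vertical component is nonzero only during swing (sin(theta) > 0)."""
    x = -r * step_len * math.cos(theta)
    z = step_h * max(0.0, math.sin(theta)) * r
    return x, z
```

With this split, retargeting from a 2 kg to a 200 kg quadruped amounts to changing `step_len` and `step_h` while the rhythm-generating dynamics stay shared.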
|
|
TuBT18-AX Oral Session, AX-206 |
Motion Control II |
|
|
Chair: Kim, Min Jun | KAIST |
Co-Chair: Jin, Long | Lanzhou University |
|
13:30-15:00, Paper TuBT18-AX.1 |
Safety-Critical Coordination of Legged Robots Via Layered Controllers and Forward Reachable Set Based Control Barrier Functions |
|
Kim, Jeeseop | Caltech |
Lee, Jaemin | California Institute of Technology |
Ames, Aaron | Caltech |
Keywords: Motion Control, Legged Robots, Multi-Robot Systems
Abstract: This paper presents a safety-critical approach to the coordination of robots in dynamic environments. To this end, we leverage control barrier functions (CBFs) with the forward reachable set to guarantee the safe coordination of the robots while preserving a desired trajectory via a layered controller. The top-level planner generates a safety-ensured trajectory for each agent, accounting for the dynamic constraints in the environment. This planner leverages high-order CBFs based on the forward reachable set to ensure safety-critical coordination control, i.e., to guarantee the safe coordination of the robots during locomotion. The middle-level trajectory planner employs single rigid body (SRB) dynamics to generate optimal ground reaction forces (GRFs) to track the safety-ensured trajectories from the top-level planner. The low-level controller generates whole-body motions that adhere to the optimal GRFs while ensuring the friction cone condition at the end of each stance leg. The effectiveness of the approach is demonstrated through simulation and hardware experiments.
|
|
13:30-15:00, Paper TuBT18-AX.2 |
Safety-Critical Control of Quadrupedal Robots with Rolling Arms for Autonomous Inspection of Complex Environments |
|
Lee, Jaemin | California Institute of Technology |
Kim, Jeeseop | Caltech |
Ubellacker, Wyatt | California Institute of Technology |
Molnar, Tamas G. | Wichita State University |
Ames, Aaron | Caltech |
Keywords: Motion Control, Legged Robots, Robotics in Hazardous Fields
Abstract: This paper presents a safety-critical control framework tailored for quadruped robots equipped with a roller arm, particularly when performing locomotive tasks such as autonomous robotic inspection in complex, multi-tiered environments. In this study, we consider the problem of operating a quadrupedal robot in distillation columns, locomoting on column trays and transitioning between these trays with a roller arm. To address this problem, our framework encompasses the following key elements: 1) Trajectory generation for seamless transitions between columns, 2) Foothold re-planning in regions deemed unsafe, 3) Safety-critical control incorporating control barrier functions, 4) Gait transitions based on safety levels, and 5) A low-level controller. Our comprehensive framework, comprising these components, enables autonomous and safe locomotion across multiple layers. We incorporate reduced-order and full-body models to ensure safety, integrating safety-critical control and footstep re-planning approaches. We validate the effectiveness of our proposed framework through practical experiments involving a quadruped robot equipped with a roller arm, successfully navigating and transitioning between different levels within the column tray structure.
|
|
13:30-15:00, Paper TuBT18-AX.3 |
Robust and Remote Center of Cyclic Motion Control for Redundant Robots with Partially Unknown Structure |
|
Jin, Long | Lanzhou University |
Liu, Kun | Lanzhou University |
Liu, Mei | Lanzhou University |
Keywords: Motion Control, Optimization and Optimal Control
Abstract: Remote center of motion (RCM) describes a robot with a rod-like end-effector operating through a hole in the interface separating an internal space from the external space. Considering that RCM control may be influenced by perturbations (noises) and that the end-effector is frequently replaced to complete different tasks, the structural information related to the robot manipulator and its rod-like end-effector may contain errors. This paper proposes an acceleration-level remote center of cyclic motion (ARC^{2}M) control scheme, which takes into account the cyclic motion index and the physical limitations of robot manipulators to achieve repetitive motion planning and RCM control at the acceleration level. Additionally, a parameter calculation method is proposed to compute unknown parameters of the end-effector under the influence of noise. A Kalman filter and a neural dynamics-based method are employed to address noise effects, and related theoretical analyses are given. To validate the proposed ARC^2M scheme, simulations and physical experiments are carried out. The source code is available at https://github.com/LongJin-lab/ARCM.
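The Kalman-filter ingredient for estimating the end-effector's unknown parameters under noise reduces, in the scalar constant-parameter case, to a few lines. A generic textbook sketch, not the paper's ARC^2M formulation:

```python
def kalman_update(x, P, z, R, Q):
    """One predict + update step of a scalar Kalman filter estimating a
    constant parameter x from a noisy measurement z.

    P: estimate variance, R: measurement noise variance, Q: process noise."""
    P = P + Q            # predict: parameter modeled as (nearly) constant
    K = P / (P + R)      # Kalman gain
    x = x + K * (z - x)  # correct with the innovation
    P = (1.0 - K) * P    # updated estimate variance
    return x, P
```

Repeated updates shrink P, so the estimate of the replaced end-effector's geometry stiffens as more noisy measurements arrive.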
|
|
13:30-15:00, Paper TuBT18-AX.4 |
Safe Risk-Averse Bayesian Optimization for Controller Tuning |
|
König, Christopher | Inspire AG |
Ozols, Miks | ETH Zurich |
Makarova, Anastasiia | ETH Zurich |
Balta, Efe | Inspire AG |
Krause, Andreas | ETH Zurich |
Rupenyan, Alisa | Zurich University of Applied Sciences |
Keywords: Motion Control, Probabilistic Inference, Industrial Robots
Abstract: Controller tuning and parameter optimization are crucial in system design to improve both the controller and the underlying system performance. Bayesian optimization has been established as an efficient model-free method for controller tuning and adaptation. Standard methods, however, are not sufficient for high-precision systems that must be robust with respect to unknown input-dependent noise and stable under safety constraints. In this work, we present a novel data-driven approach, RAGoOSe, for safe controller tuning in the presence of heteroscedastic noise, combining safe learning with risk-averse Bayesian optimization. We demonstrate the method on a synthetic benchmark and compare its performance to established BO-based tuning methods. We further evaluate RAGoOSe's performance on a real precision-motion system used in semiconductor industry applications and compare it to the built-in auto-tuning routine.
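The risk-averse idea, rewarding exploration of model uncertainty while penalizing predicted input-dependent noise, can be caricatured with a mean-variance style acquisition score. This is a toy illustration under assumed weights, not RAGoOSe itself:

```python
def risk_averse_score(mu, sigma_f, sigma_n, beta=2.0, gamma=1.0):
    """Risk-averse acquisition value (to maximize): optimistic in epistemic
    model uncertainty sigma_f, penalized by the predicted heteroscedastic
    noise level sigma_n. beta and gamma are illustrative weights."""
    return mu + beta * sigma_f - gamma * sigma_n
```

Two candidate gains with the same predicted performance then rank differently if one sits in a noisier input region, which is exactly the behavior standard UCB-style tuning lacks.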
|
|
13:30-15:00, Paper TuBT18-AX.5 |
Phase Synthesis for Spatial Locomotion Control of Retractable Worm Robots |
|
Wang, Zhongcheng | Northwestern Polytechnical University |
Yuan, Shiwei | Northwestern Polytechnical University |
Dou, Manfen | Northwestern Polytechnical University |
Yang, Jianhua | Northwestern Polytechnical University |
Liang, Bin | Tsinghua University |
Keywords: Motion Control, Redundant Robots, Biologically-Inspired Robots
Abstract: Retractable worm robots possess hyper-flexibility, allowing them to work in confined spaces that are difficult for humans. However, the spatial locomotion control of these robots remains challenging due to the robots' large degrees of freedom. To address this challenge, we propose a phase synthesis (PS) scheme for retractable worm robots. The scheme combines an undulating gait inspired by caterpillars with three-dimensional movement commands. We first introduce the kinematics model and real-world prototype of our retractable worm robot, called RW-Robot, and then we introduce footstep phases to express the timing of segments' spatial movement. According to the length of movement periods, we classify the movement into short-term movements and long-term movements and compress their patterns in the frequency domain. Our PS scheme aligns the patterns according to the footstep phases to generate new gaits of spatial locomotion. We evaluate the scheme in real-world experiments, including steering and climbing a slope. The experimental results indicate that our scheme allows the RW-Robot to perform flexible spatial locomotion from simple user input.
|
|
13:30-15:00, Paper TuBT18-AX.6 |
Enhanced Robust Motion Control Based on Unknown System Dynamics Estimator for Robot Manipulators |
|
Jia, Xinyu | National University of Singapore |
Yang, Jun | National University of Singapore |
Kaixin, Lu | Faculty of Engineering, National University of Singapore |
Pan, Yongping | Sun Yat-Sen University |
Yu, Haoyong | National University of Singapore |
Keywords: Motion Control, Robust/Adaptive Control, Redundant Robots
Abstract: To achieve high-accuracy manipulation in the presence of unknown disturbances, we propose two novel efficient and robust motion control schemes for high-dimensional robot manipulators. Both controllers incorporate an unknown system dynamics estimator (USDE) to estimate disturbances without requiring acceleration signals and the inverse of inertia matrix. Then, based on the USDE framework, an adaptive-gain controller and a super-twisting sliding mode controller are designed to speed up the convergence of tracking errors and strengthen anti-perturbation ability. The former aims to enhance feedback portions through error-driven control gains, while the latter exploits finite-time convergence of discontinuous switching terms. We analyze the boundedness of control signals and the stability of the closed-loop system in theory, and conduct real hardware experiments on a robot manipulator with seven degrees of freedom (DoF). Experimental results verify the effectiveness and improved performance of the proposed controllers, and also show the feasibility of implementation on high-dimensional robots.
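The super-twisting term mentioned in the abstract has a compact standard form; one Euler step might look like the following sketch (gains and discretization are illustrative, not the paper's USDE-based design):

```python
import math

def super_twisting_step(s, v, k1, k2, dt):
    """One step of a super-twisting term on sliding variable s.

    u is continuous in s (the sqrt term), while the switching lives in the
    integrator state v, which is what attenuates chattering.
    Returns (u, v_next)."""
    sgn = (s > 0) - (s < 0)
    u = -k1 * math.sqrt(abs(s)) * sgn + v
    v_next = v - k2 * sgn * dt
    return u, v_next
```

Finite-time convergence of s to zero holds for suitable k1, k2 relative to the disturbance bound; the USDE's disturbance estimate would be added on top of this term in the full controller.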
|
|
13:30-15:00, Paper TuBT18-AX.7 |
Model-Free Control of a Class of High-Precision Scanning Motion Systems with Piezoceramic Actuators |
|
Al-Rawashdeh, Yazan | Memorial University of Newfoundland |
Al Saaideh, Mohammad | Memorial University of Newfoundland |
Heertjes, Marcel | Eindhoven University of Technology |
Al Janaideh, Mohammad | University of Guelph |
Keywords: Motion Control, Semiconductor Manufacturing, Motion and Path Planning
Abstract: To enhance the precision of coarse long-stroke motion axes, complementary short-stroke fine positioning stages are usually introduced. Being mechanically attached, the motion of the combined positioning stages needs to be controlled and synchronized. Therefore, suitable model-based controllers of fine stages are typically designed according to the sophisticated models and identification techniques used. Due to their appealing features, piezoceramic-based fine positioning stages have been successfully utilized in many applications, which recently sparked their use in high-acceleration motion found in wafer scanners, for example, where high-precision motion is required despite the resulting high inertial forces involved. Unfortunately, hard nonlinear behavior is associated with piezoelectric actuators, which adds to the complexity of the modeling, control, and synchronization processes. To overcome this burden, in this study, the design procedure of a model-free control and synchronization technique for piezoceramic-based fine positioning stages is introduced and verified experimentally using a representative precision motion system comprising a planar stage and a uni-axial fine stage under step-and-scan trajectories commonly used in wafer scanners. Despite its simplicity, the proposed design procedure can be seamlessly extended to other robotics and automation applications.
|
|
13:30-15:00, Paper TuBT18-AX.8 |
Constrained Nonlinear Disturbance Observer for Robotic Systems |
|
Han, Ji Wan | Korea Advanced Institute of Science and Technology |
Park, Daehyung | Korea Advanced Institute of Science and Technology, KAIST |
Kim, Min Jun | KAIST |
Keywords: Motion Control
Abstract: The disturbance observer (DOB) is a well-known two-loop control structure that imparts robustness to a controller with a simple implementation. As a nonlinear DOB for robotic systems, we proposed the so-called nonlinear robust internal-loop compensator (NRIC) framework in our previous work. In this paper, we further extend the NRIC in such a way that an optimization scheme can be embedded in the control structure. The proposed method is called constrained NRIC (C-NRIC), because the optimization allows us to impose constraints, by which the controller acquires additional properties. As a particular use case of the C-NRIC framework, we design contact-responsive motion controllers that enable a robot to react to unknown interactions while accurately tracking the desired trajectory in free motion. The effectiveness of such designs is validated through real-world experiments.
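The two-loop DOB structure can be sketched for a scalar plant m·v' = u + d with a momentum-style observer that needs no acceleration measurement. This is a generic textbook sketch, not the NRIC/C-NRIC formulation:

```python
def dob_step(p_hat, p_meas, u, L, dt):
    """One step of a momentum-style disturbance observer for m*v' = u + d.

    p is momentum; the feedback on the momentum prediction error serves
    directly as the disturbance estimate. Returns (d_hat, p_hat_next)."""
    d_hat = L * (p_meas - p_hat)           # inner-loop estimate of d
    p_hat_next = p_hat + (u + d_hat) * dt  # propagate the nominal model
    return d_hat, p_hat_next
```

Against a constant disturbance the estimate converges geometrically with rate set by the gain L, after which the outer-loop controller sees an (approximately) disturbance-free nominal plant.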
|
|
13:30-15:00, Paper TuBT18-AX.9 |
An Integrated Position-Velocity-Force Method for Safety-Enhanced Shared Control in Robot-Assisted Surgical Cutting |
|
Xiao, Xilin | Hefei University of Technology |
Li, Xiaojian | Hefei University of Technology |
Yudong, Shi | Hefei University of Technology |
Fang, Jin | Hefei University of Technology |
Li, Ling | Hefei University of Technology |
He, Pengfei | Hefei University of Technology |
Mo, Hangjie | City University of Hong Kong |
Keywords: Human-Robot Collaboration, Motion Control, Safety in HRI
Abstract: Numerous studies have emphasized the application of autonomous intelligence in human-robot shared control to enhance surgical convenience and efficiency. However, neglecting human dominance may reduce surgical safety. This paper develops a safety-enhanced human-robot shared control method that intelligently allocates control authority, with the surgeon remaining the leader during the surgical procedure. Three controllers are designed initially: a master hand position (MP) controller and a master hand velocity (MV) controller related to the surgeon's manipulation, and a planned trajectory tracking (PT) controller related to the robot. In precision surgical manipulation scenarios, precise tracking of the human's operation is achieved by combining the MP and MV controllers, while a combination of the MV and PT controllers is developed for high-efficiency surgical scenarios, which relaxes the requirement for precise tracking of hand position and enables precise robot assistance guided by the velocity of the human hand. Autonomous scenario and controller switching is accomplished through a motion fusion mechanism, which is achieved by optimizing evaluation functions that rely on future states. Furthermore, a force feedback mechanism is proposed to help the human understand the intent of autonomous control to improve safety. The feasibility and effectiveness of this method have been validated through simulations and experiments.
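The authority-allocation step can be caricatured as score-weighted blending of the surgeon's and the robot's commands. The normalization rule below is a hypothetical stand-in for the paper's evaluation-function optimization:

```python
def blend_commands(u_surgeon, u_robot, score_surgeon, score_robot):
    """Allocate control authority by normalized (non-negative) evaluation
    scores; when both scores vanish the surgeon and robot share equally.
    The scoring rule itself is an assumption for illustration."""
    total = score_surgeon + score_robot
    w = 0.5 if total == 0 else score_surgeon / total
    return w * u_surgeon + (1.0 - w) * u_robot
```

Keeping the surgeon's score floor above the robot's in safety-critical phases is one simple way to encode the "surgeon remains the leader" requirement.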
|
|
TuBT19-NT Oral Session, NT-G301 |
Medical Robots II |
|
|
Chair: Tsumura, Ryosuke | National Institute of Advanced Industrial Science and Technology (AIST) |
Co-Chair: Wang, Junchen | Beihang University |
|
13:30-15:00, Paper TuBT19-NT.1 |
DopUS-Net: Quality-Aware Robotic Ultrasound Imaging Based on Doppler Signal (I) |
|
Jiang, Zhongliang | Technical University of Munich |
Duelmer, Felix | Technical University of Munich |
Navab, Nassir | TU Munich |
Keywords: Medical Robots and Systems, Computer Vision for Medical Robotics
Abstract: Medical ultrasound (US) is widely used to evaluate and stage vascular diseases, in particular for the preliminary screening program, due to the advantage of being radiation-free. However, automatic segmentation of small tubular structures (e.g., the ulnar artery) from cross-sectional US images is still challenging. To address this challenge, this paper proposes the DopUS-Net and a vessel re-identification module that leverage the Doppler effect to enhance the final segmentation result. Firstly, the DopUS-Net combines the Doppler images with B-mode images to increase the segmentation accuracy and robustness of small blood vessels. It incorporates two encoders to exploit the maximum potential of the Doppler signal and recurrent neural network modules to preserve sequential information. Input to the first encoder is a two-channel duplex image representing the combination of the grey-scale Doppler and B-mode images to ensure anatomical spatial correctness. The second encoder operates on the pure Doppler images to provide a region proposal. Secondly, benefiting from the Doppler signal, this work first introduces an online artery re-identification module to qualitatively evaluate the real-time segmentation results and automatically optimize the probe pose for enhanced Doppler images. This quality-aware module enables the closed-loop control of robotic screening to further improve the confidence and robustness of image segmentation. The experimental results demonstrate that the prop
|
|
13:30-15:00, Paper TuBT19-NT.2 |
Robotic Craniomaxillofacial Osteotomy System Using Acoustic 3D Registration |
|
Zhu, Jiayu | Beihang University |
Han, Runzhe | Beihang University |
Yuan, Mengning | Peking University School and Hospital of Stomatology |
Jie, Bimeng | Peking University |
Du, Shanshan | Beihang University |
He, Yang | Peking University School and Hospital of Stomatology |
Zhang, Runshi | Beihang University |
Wang, Junchen | Beihang University |
Keywords: Medical Robots and Systems, Deep Learning Methods, Compliance and Impedance Control
Abstract: Osteotomy holds a pivotal position among the fundamental procedures in craniomaxillofacial (CMF) surgery. However, there are inherent challenges and risks associated with ensuring the recuperation of occlusion, safeguarding the facial nerves and blood vessels, as well as preserving facial aesthetics. In this study, a non-invasive image-to-patient registration method for navigation/robotic CMF surgery based on intraoperative freehand ultrasound (US) 3D reconstruction is proposed. Building upon this, a CMF osteotomy robotic system with compliant human-robot interaction and osteotomy trajectory planning was devised. In the freehand US 3D reconstruction and registration experiments, the registration errors for human volunteers and phantoms were consistently less than 1 mm. In robot osteotomy experiments based on the resulting registration, the average osteotomy error was below 1.5 mm. The proposed US 3D reconstruction-based registration method is non-invasive and radiation-free, and shows promising accuracy suitable for CMF robotic or navigation systems.
|
|
13:30-15:00, Paper TuBT19-NT.3 |
Elliptical Torus-Based Six-Axis FBG Force Sensor with In-Situ Calibration for Condition Monitoring of Orthopedic Surgical Robot |
|
Li, Tianliang | Wuhan University of Technology |
Zhao, Chen | Wuhan University of Technology |
Wen, Yuhang | Wuhan University of Technology |
Chen, Fayin | Wuhan University of Technology |
Tan, Yuegang | Wuhan University of Technology |
Zhou, Zude | Wuhan University of Technology |
Keywords: Medical Robots and Systems, Force and Tactile Sensing, Deep Learning Methods
Abstract: Six-axis force/moment (6-A F/M) sensors enable surgical robots to effectively sense intraoperative force feedback and drilling status information, reducing the operating challenges and psychological burden of doctors and improving the quality and safety of surgery. However, it is difficult for current commercial electrical 6-A F/M sensors to adapt to the electromagnetic environment in the operating room, and status changes after installation can also reduce accuracy. At the same time, there is strong vibration coupling in the low-frequency force information, leading to low identification accuracy and slow response speed for the drilling and milling status. To address these problems, an elliptical torus-based 6-A fiber optic F/M sensor and its in-situ calibration method for orthopedic surgical robot force sensing are proposed. Furthermore, combined with a multichannel one-dimensional convolutional gated recurrent unit (M1-DCGRU), fast and accurate identification of seven drilling stages was realized. The final force sensing error is less than 7.1%, and the drilling state identification accuracy is at least 93.9%. The designed sensor has higher accuracy, is compatible with magnetic resonance imaging (MRI), and accurately identifies finer drilling stages without relying on other sensors.
|
|
13:30-15:00, Paper TuBT19-NT.4 |
Vision-And-Force-Based Compliance Control for a Posterior Segment Ophthalmic Surgical Robot |
|
Wang, Ning | Xi'an Jiaotong University |
Zhang, Xiaodong | Xi’an Jiaotong University |
Stoyanov, Danail | University College London |
Zhang, Hongbing | The First Affiliated Hospital of Northwestern University |
Stilli, Agostino | University College London |
Keywords: Medical Robots and Systems, Force Control, Compliance and Impedance Control
Abstract: In ophthalmic surgery, particularly in procedures involving the posterior segment, clinicians face significant challenges in maintaining precise control of hand-held instruments without damaging the fundus tissue. Typical targets of this type of surgery are the internal limiting membrane (ILM) and the epiretinal membrane (ERM), which have an average thickness of only 60 μm and 2 μm, respectively, making it challenging, even for experienced clinicians utilising dedicated ophthalmic surgical robots, to peel these delicate membranes successfully without damaging the healthy tissue. Minimal intra-operative motion errors when driving both hand-held and robotic-assisted surgical tools may result in significant stress on the delicate tissue of the fundus, potentially causing irreversible damage to the eye. To address these issues, this work proposes an intra-operative vision-and-force-based compliance control method for a posterior segment ophthalmic surgical robot. This method aims to achieve compliance control of the surgical instrument in contact with the tissue to minimise the risk of tissue damage. In this work, we demonstrate that we can achieve a maximum motion error for the end effector (EE) of our ophthalmic robot of just 8 μm, resulting in a 64% increase in motion accuracy compared to our previous work where the system was first introduced. The results of the proposed compliance control demonstrate consistent performance in the force range of 40 mN during mem
|
|
13:30-15:00, Paper TuBT19-NT.5 | Add to My Program |
A Hybrid Admittance Control Algorithm for Automatic Robotic Cranium-Milling |
|
Qian, Chen | Institute of Automation, Chinese Academy of Sciences |
Li, Zhen | Institute of Automation, Chinese Academy of Sciences |
Ye, Qiang | Institute of Automation, Chinese Academy of Sciences |
Ge, Pei Cong | Beijing Tiantan Hospital, Capital Medical University |
Zhao, Jizong | Beijing Tiantan Hospital, Capital Medical University |
Bian, Gui-Bin | Institute of Automation, Chinese Academy of Sciences |
Keywords: Medical Robots and Systems, Force Control
Abstract: Prior robot-assisted cranium-milling studies only considered controlling the force in the skull's vertical direction and neglected the milling cutter's feed force. Additionally, achieving stable force control in multiple directions is challenging for robots due to the uneven skull surface. Here, a hybrid admittance control algorithm incorporating model-free adaptive nonlinear force control and fuzzy control is proposed to accomplish effective automatic cranium-milling tasks. First, a purely data-driven model-free adaptive control method based on partial-form dynamic linearization is used to control the vertical force. Second, fuzzy control minimizes the total error of both the vertical and feed forces by adaptively adjusting the milling cutter's velocity and position. Forty-two ex vivo animal skull-milling experiments conducted with the automatic robotic cranium-milling system indicate that, when using the proposed control algorithm, the force error percentage can be maintained below 5.0% within 3 s, and the maximal root-mean-square error percentages for the vertical and feed forces are 1.85% and 1.94%, respectively. Moreover, no instances of dura mater damage are observed, and the robotic system exhibits a high level of autonomy, performing the skull-milling task with minimal human involvement throughout the entire experiment. The results suggest the potential for advancing the intelligence level of neurosurgery in the future.
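The admittance half of such a hybrid scheme can be illustrated with a minimal single-axis sketch: a force error drives a virtual mass-damper-spring, and the resulting position offset is handed to an inner position controller. The gains, the 1-DOF form, and the Euler discretization below are illustrative assumptions, not the paper's parameters.

```python
def admittance_step(x, v, f_err, m=1.0, b=20.0, k=100.0, dt=0.001):
    """One semi-implicit Euler step of the admittance law
    m*a + b*v + k*x = f_err.  Returns the updated position offset
    and velocity for an inner position controller to track.
    Gains are illustrative, not taken from the paper."""
    a = (f_err - b * v - k * x) / m
    v = v + a * dt
    x = x + v * dt
    return x, v

# A constant 1 N force error should settle near x = f/k = 0.01 m.
x = v = 0.0
for _ in range(20000):
    x, v = admittance_step(x, v, 1.0)
```

With zero force error the offset stays at rest; with a constant error the offset converges to the spring equilibrium, which is the compliant behavior the milling task relies on.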
|
|
13:30-15:00, Paper TuBT19-NT.6 | Add to My Program |
Preliminary Study of Fingertip and Wrist Motion Based Haptic Controller for Robotically Assisted Micro and Supermicrosurgery |
|
Miyasaka, Muneaki | Riverfield Inc |
van Esch, Pepijn | Riverfield Inc |
Morikawa, Atsushi | Riverfield Inc |
Tadano, Kotaro | Tokyo Institute of Technology |
Keywords: Medical Robots and Systems, Haptics and Haptic Interfaces
Abstract: One issue of robotic microsurgery is that, compared to manual surgery, the operation time tends to be longer due to high motion scaling. To address this issue, we developed a new controller that can provide the accuracy required for microsurgery without a high scaling factor by utilizing fingertip and wrist motions. Also, to improve surgical outcomes, the proposed controller has a force-feedback function that is not available in existing controllers for microsurgical robots. A challenge in designing such a controller is the size requirement. In conventional microsurgery, surgeons perform surgical procedures while looking at the eyepieces of a surgical microscope, and the same applies to robotic microsurgery. The only space available to manipulate controllers is the narrow space between the patient/surgical bed and the surgeon. To satisfy this constraint, the proposed controller is integrated with a handrest and the controller's DOFs are strategically allocated. In this work, as the first step toward addressing the issue of prolonged operation time, we built a prototype controller and evaluated its accuracy and task space with simulations. The results indicated that by using fingertip and wrist motions with a scaling factor of 3x, 0.5 mm diameter circles could be traced with a mean bidirectional precision of 0.0485 mm. Also, 10.0 mm diameter circles were traceable with the same scaling factor.
|
|
13:30-15:00, Paper TuBT19-NT.7 | Add to My Program |
Haptic-Assisted Collaborative Robot Framework for Improved Situational Awareness in Skull Base Surgery |
|
Ishida, Hisashi | Johns Hopkins University |
Sahu, Manish | Johns Hopkins |
Munawar, Adnan | Johns Hopkins University |
Nagururu, Nimesh | Johns Hopkins University School of Medicine |
Galaiya, Deepa | Johns Hopkins |
Kazanzides, Peter | Johns Hopkins University |
Creighton, Francis | Johns Hopkins School of Medicine |
Taylor, Russell H. | The Johns Hopkins University |
Keywords: Medical Robots and Systems, Hardware-Software Integration in Robotics, Human-Robot Collaboration
Abstract: Skull base surgery is a demanding field in which surgeons operate in and around the skull while avoiding critical anatomical structures, including nerves and vasculature. While image-guided surgical navigation is the prevailing standard, limitations still exist, requiring personalized planning and recognizing the irreplaceable role of a skilled surgeon. This paper presents a collaboratively controlled robotic system tailored for assisted drilling in skull base surgery. Our central hypothesis posits that this collaborative system, enriched with haptic assistive modes to enforce virtual fixtures, holds the potential to significantly enhance surgical safety, streamline efficiency, and alleviate the physical demands on the surgeon. The paper describes the system development work required to enable these virtual fixtures through haptic assistive modes. To validate our system's performance and effectiveness, we conducted initial feasibility experiments involving a medical student and two experienced surgeons. The experiments focused on drilling around critical structures following cortical mastoidectomy, utilizing dental-stone phantom and cadaveric models. Our experimental results demonstrate that the proposed haptic feedback mechanism enhances the safety of drilling around critical structures compared to systems lacking haptic assistance. With the aid of our system, surgeons were able to safely skeletonize the critical structures without breaching any of them, even under an obstructed view of the surgical site.
|
|
13:30-15:00, Paper TuBT19-NT.8 | Add to My Program |
Intelligent Disinfection Robot with High-Touch Surface Detection and Dynamic Pedestrian Avoidance |
|
Luan, Yunfei | Shanghai Jiao Tong University |
He, Muhang | Shanghai Jiao Tong University |
Tian, Yudong | Shanghai Jiao Tong University |
Lin, Chengjie | Shanghai Jiao Tong University |
Fang, Yunhan | Shanghai Jiaotong University |
Zhao, Zihao | Shanghai Jiao Tong University |
Yang, Jianxin | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Keywords: Medical Robots and Systems, Human-Aware Motion Planning, Object Detection, Segmentation and Categorization
Abstract: The increasing awareness of public health issues has highlighted the need for effective disinfection of crowded indoor public areas, leading to the development of automated disinfection robots. However, most existing robots spray disinfectant indiscriminately over all areas, and they remain immature at navigating densely populated environments. Hence, in this paper, we design a new disinfection robotic system consisting of a mobile platform, an RGB-D camera, and a robotic arm with a spray disinfection device. To address the above challenges, we propose a vision-based method for accurately detecting high-touch areas in the surroundings, enabling the disinfection robot to achieve superior disinfection efficiency. In addition, we propose a dynamic pedestrian avoidance method, namely Socially Aware APF (SA-APF), which predicts the movement trend of pedestrians and plans the path in real time. Both simulated and real-world experiments are conducted to demonstrate the effectiveness of our disinfection robot system, especially highlighting the ability to detect high-touch areas and to navigate the environment while avoiding dynamic pedestrians.
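The classical artificial potential field (APF) at the core of such a planner can be sketched as an attractive pull toward the goal plus repulsive pushes from nearby pedestrians. The socially aware extension in the paper additionally biases repulsion by each pedestrian's predicted motion; that part, and all the gains below, are omitted or assumed here.

```python
import math

def apf_force(robot, goal, pedestrians, k_att=1.0, k_rep=0.5, d0=2.0):
    """Resultant 2D APF force: linear attraction to the goal plus the
    classical repulsive term for obstacles within influence radius d0.
    Gains are illustrative; SA-APF's velocity-aware weighting is omitted."""
    fx = k_att * (goal[0] - robot[0])
    fy = k_att * (goal[1] - robot[1])
    for px, py in pedestrians:
        dx, dy = robot[0] - px, robot[1] - py
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

With no pedestrians the force points straight at the goal; a pedestrian on the path reduces the forward component, steering the robot away.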
|
|
13:30-15:00, Paper TuBT19-NT.9 | Add to My Program |
Toward a Framework Integrating Augmented Reality and Virtual Fixtures for Safer Robot-Assisted Lymphadenectomy |
|
Chen, Ziyang | Politecnico Di Milano |
Fan, Ke | Politecnico Di Milano |
Cruciani, Laura | Politecnico Di Milano |
Fontana, Matteo | European Institute of Oncology |
Muraglia, Lorenzo | European Institute of Oncology |
Ceci, Francesco | European Institute of Oncology |
Travaini, Laura | European Institute of Oncology |
Ferrigno, Giancarlo | Politecnico Di Milano |
De Momi, Elena | Politecnico Di Milano |
Keywords: Medical Robots and Systems, Human-Robot Collaboration, Computer Vision for Medical Robotics
Abstract: Lymphadenectomy generally accompanies various oncology surgeries to remove infected cancer cells. However, there are two limitations in robot-assisted lymphadenectomy: 1) lymph nodes are not visible during the operation, since they are hidden by the superficial fat layer; 2) intra-operative bleeding may occur during lymph node removal, caused by collisions between surgical instruments and delicate blood vessels (arteries or veins) near the lymph nodes. Therefore, we propose a framework integrating augmented reality and virtual fixtures to address these limitations. Augmented reality intra-operatively visualizes the hidden lymph nodes by projecting the corresponding 3D pre-operative model, and virtual fixtures provide force feedback to surgeons to avoid possible collisions when they operate the surgical instruments to resect the lymph nodes surrounding the blood vessel. Ten human subjects were invited to perform an emulated lymphadenectomy based on the da Vinci robot in a dry lab. Experimental results demonstrated that the proposed framework can continuously localize the hidden lymph nodes and reduce the number of collisions between the instruments and the delicate blood vessel during lymph node resection (21% and 48% reduction rates using two different force models compared to the standard setup, respectively). This shows the potential to enhance the safety of robot-assisted lymphadenectomy.
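A forbidden-region virtual fixture of the kind described can be sketched as a spring force that pushes the instrument tip back out of a safety sphere around the vessel. The spherical geometry, radius, and stiffness below are illustrative assumptions, not the paper's force models.

```python
import math

def fixture_force(tip, vessel_center, r_safe=0.005, k=300.0):
    """Forbidden-region virtual fixture: zero force outside the safety
    sphere; inside, a spring force (N) proportional to penetration
    depth, directed radially outward.  Geometry and stiffness are
    illustrative assumptions."""
    dx = [t - c for t, c in zip(tip, vessel_center)]
    d = math.sqrt(sum(v * v for v in dx))
    if d >= r_safe or d == 0.0:
        return (0.0, 0.0, 0.0)
    depth = r_safe - d
    return tuple(k * depth * v / d for v in dx)
```

A tip 4 mm from the vessel center (1 mm penetration of a 5 mm safety sphere) feels a 0.3 N outward push; a tip outside the sphere feels nothing.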
|
|
TuBT20-NT Oral Session, NT-G302 |
Add to My Program |
Robotics and Automation in Construction |
|
|
Co-Chair: Osa, Takayuki | University of Tokyo |
|
13:30-15:00, Paper TuBT20-NT.1 | Add to My Program |
Hyblock: Hardware Realization and Control of Modular Hydraulic Robots with Dowel Connectors |
|
Hyon, Sang-Ho | Ritsumeikan University |
Ando, Ryo | Ritsumeikan University |
Sono, Eiji | Ritsumeikan University |
Sugimoto, Shunichi | Ritsumeikan University |
Saito, Yasushi | KYB-YS Co. Ltd |
Keywords: Robotics and Automation in Construction, Hydraulic/Pneumatic Actuators, Motion Control
Abstract: This paper presents the hardware design and development of Hyblock, a modular hydraulic robot for heavy-duty applications such as construction. The robot is equipped with a simple docking mechanism called a C-type expansion dowel and a novel hydraulic circuit, MHSB, that matches the modular structure. In this paper, we first report on the design of the robot hardware, including the dowel and hydraulic circuit, then present preliminary experiments on pressure-based torque control and docking control using proximal magnetic sensors. Next, we propose a framework for dynamic reconfiguration and task-space motion control built on the concept of dowel connectors. Simulation results demonstrate that a collective modular robot achieves the desired motion tasks while keeping all normal contact forces of the connectors above their lower bound. The results are also shown in the supplementary video.
|
|
13:30-15:00, Paper TuBT20-NT.2 | Add to My Program |
PLASTR: Planning for Autonomous Sampling-Based Trowelling |
|
Kuhlmann-Jørgensen, Mads Alber | ETH Zurich |
Pankert, Johannes | ETH Zuerich |
Pietrasik, Lukasz Leszek | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Robotics and Automation in Construction, Motion and Path Planning, Optimization and Optimal Control
Abstract: Plaster is commonly used in the construction industry to finish walls and ceilings, but the application is labor-intensive and physically strenuous, which motivates the need for automation. We present PLASTR, a receding horizon optimization-based planning algorithm for robotic plaster trowelling. It samples trowelling sequence rollouts from a new plaster simulator and weights them according to the flatness of the finished wall. The proposed simulator approximates the real-world plaster-trowel interaction adequately while allowing execution orders of magnitude faster than real-time. We evaluate PLASTR in simulation and on a real-world test setup and compare it to two handcrafted heuristic baseline algorithms. PLASTR performs equal to or better than the best heuristic in terms of material coverage for both simulated and real-world experiments while being 50% more efficient in terms of trowelled distance.
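The receding-horizon, sampling-based structure of such a planner can be sketched generically: sample candidate action sequences, roll each through a fast simulator, score the end state, and execute only the first action of the best rollout. The simulator and flatness objective are caller-supplied stand-ins here; sample counts and the action parameterization are assumptions, not PLASTR's.

```python
import random

def plan_stroke(state, simulate, cost, n_samples=16, horizon=3, rng=None):
    """Receding-horizon sampling planner sketch: sample `n_samples`
    candidate sequences of `horizon` scalar actions in [0, 1], roll
    each through `simulate`, and return the first action of the
    lowest-cost rollout.  `simulate(state, action) -> state` and
    `cost(state) -> float` stand in for the paper's plaster simulator
    and flatness objective."""
    rng = rng or random.Random(0)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.uniform(0.0, 1.0) for _ in range(horizon)]
        s = state
        for a in seq:
            s = simulate(s, a)
        c = cost(s)
        if c < best_cost:
            best, best_cost = seq[0], c
    return best
```

At the next control step the planner is called again from the newly observed state, which is what makes the scheme receding-horizon.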
|
|
13:30-15:00, Paper TuBT20-NT.3 | Add to My Program |
Self-Reconfigurable Robots for Collaborative Discrete Lattice Assembly |
|
Smith, Miana | MIT |
Abdel-Rahman, Amira | MIT |
Gershenfeld, Neil | Massachusetts Institute of Technology |
Keywords: Robotics and Automation in Construction, Multi-Robot Systems, Assembly
Abstract: We present a robotic system for the assembly of 3D discrete lattice structures in which the robots are able to self-reproduce, such that the assembly system may scale its own parallelization. Robots and structures are made from a set of compatible building blocks, or voxels, which can be assembled and reassembled into more complex structures. Robotic modules are made by combining actuators with a functional voxel, which routes electrical power and signals. Robotic modules then assemble into reconfigurable robots via a reversible solder joint. The robot assembles higher performance structures using a set of construction voxels, which do not contain electrical features. This paper describes the design, development, and evaluation of this assembly system, including the robotic hardware, lattice material, and planning and controls methods. We demonstrate the system through a set of fundamental assembly tasks: the robot assembling another robot, and the two robots collaborating to assemble a small structure.
|
|
13:30-15:00, Paper TuBT20-NT.4 | Add to My Program |
LiSTA: Geometric Object-Based Change Detection in Cluttered Environments |
|
Rowell, Joseph | University of Oxford, Oxford Robotics Institute |
Zhang, Lintong | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Robotics and Automation in Construction, Object Detection, Segmentation and Categorization, SLAM
Abstract: We present LiSTA (LiDAR Spatio-Temporal Analysis), a system to detect probabilistic object-level change over time using multi-mission SLAM. Many applications require such a system, including construction, robotic navigation, long-term autonomy, and environmental monitoring. We focus on the semi-static scenario where objects are added, subtracted, or changed in position over weeks or months. Our system combines multi-mission LiDAR SLAM, volumetric differencing, object instance description, and correspondence grouping using learned descriptors to keep track of an open set of objects. Object correspondences between missions are determined by clustering the object's learned descriptors. We demonstrate our approach using datasets collected in a simulated environment and a real-world dataset captured using a LiDAR system mounted on a quadruped robot monitoring an industrial facility containing static, semi-static, and dynamic objects. Our method demonstrates superior performance in detecting changes in semi-static environments compared to existing methods.
|
|
13:30-15:00, Paper TuBT20-NT.5 | Add to My Program |
Scalable Underwater Assembly with Reconfigurable Visual Fiducials |
|
Lensgraf, Samuel | Dartmouth College |
Sarkar, Ankita | Dartmouth College |
Pediredla, Adithya | Dartmouth College |
Balkcom, Devin | Dartmouth College |
Quattrini Li, Alberto | Dartmouth College |
Keywords: Robotics and Automation in Construction, Perception for Grasping and Manipulation, Marine Robotics
Abstract: We present a scalable combined localization-infrastructure deployment and task planning algorithm for underwater assembly. Infrastructure is autonomously modified to suit the needs of manipulation tasks, based on an uncertainty model of the infrastructure's positional accuracy. Our uncertainty model can incorporate the noise characteristics of multiple devices. For the task planning problem, we propose a layer-based clustering approach that completes the manipulation tasks one cluster at a time. We employ movable visual fiducial markers as infrastructure and an autonomous underwater vehicle (AUV) for the manipulation tasks. The proposed task planning algorithm is computationally simple, and we implement it on the AUV without any offline computation requirements. Combined hardware experiments and simulations over large datasets show that the proposed technique is scalable to large areas.
|
|
13:30-15:00, Paper TuBT20-NT.6 | Add to My Program |
Automatic Loading of Unknown Material with a Wheel Loader Using Reinforcement Learning |
|
Eriksson, Daniel | Tampere University |
Ghabcheloo, Reza | Tampere University |
Geimer, Marcus | Karlsruhe Institute of Technology |
Keywords: Robotics and Automation in Construction, Reinforcement Learning
Abstract: Loading multiple different materials with wheel loaders is a challenging task because various materials require different loading techniques. It is therefore difficult to find a single controller capable of handling them all. One solution is to use a base controller and fine-tune it for different materials. Reinforcement learning (RL) automates this process without the need to collect additional human-annotated data. We investigated the feasibility of this approach using a full-size 24-tonne wheel loader in the real world and demonstrated that it is possible to fine-tune a neural network controller, originally trained with imitation learning on blasted rock, for use with an unknown gravel material, requiring 20 bucket fillings. Additionally, we showcased the adaptability of a controller pre-trained on woodchips to an unknown gravel material, requiring 40 bucket fillings. We also proposed a novel reward function for the material loading task. Finally, we examined how the sampling time of the reinforcement learning algorithm affects convergence speed and adaptability. Our results demonstrate that it is optimal to match the sampling time of the RL algorithm to the delays of the wheel loader's hydraulic actuators.
|
|
13:30-15:00, Paper TuBT20-NT.7 | Add to My Program |
Learning Adaptive Policies for Autonomous Excavation under Various Soil Conditions by Adversarial Domain Sampling |
|
Osa, Takayuki | University of Tokyo |
Osajima, Naoto | Kyushu Institute of Technology |
Aizawa, Masanori | Komatsu Ltd |
Harada, Tatsuya | The University of Tokyo |
Keywords: Robotics and Automation in Construction, Reinforcement Learning
Abstract: Excavation is a frequent task in construction. In this context, automation is expected to reduce hazard risks and labor-intensive work. To this end, recent studies have investigated using reinforcement learning (RL) to automate construction machines. One of the challenges in applying RL to excavation tasks concerns obtaining skills adaptable to various conditions. When the conditions of soils differ, the optimal plans for efficiently excavating the target area will significantly differ. In existing meta-learning methods, the domain parameters are often uniformly sampled; this implicitly assumes that the difficulty of the task does not change significantly for different domain parameters. In this study, we empirically show that uniformly sampling the domain parameters is insufficient when the task difficulty varies according to the task parameters. Correspondingly, we develop a framework for learning a policy that can be generalized to various domain parameters in excavation tasks. We propose two techniques for improving the performance of an RL method in our problem setting: adversarial domain sampling and domain parameter estimation with a sensitivity-aware importance weight. In the proposed adversarial domain sampling technique, the domain parameters leading to low expected Q-values are actively sampled during the training phase. We empirically show that our approach outperforms existing meta-learning and domain adaptation methods for excavation tasks.
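One plausible realization of adversarial domain sampling is to draw domain parameters with probability increasing as their expected Q-value decreases, so training concentrates on the hardest soils. The softmax-over-negative-Q weighting and the temperature below are assumptions for illustration; the paper's exact scheme may differ.

```python
import math
import random

def sample_domain(candidates, q_value, beta=2.0, rng=None):
    """Adversarial domain sampling sketch: weight each candidate domain
    parameter by exp(-beta * Q), so low-Q (hard) domains are sampled
    more often during training.  `q_value(candidate) -> float` stands
    in for the critic's expected return; beta is an assumed temperature."""
    rng = rng or random.Random(0)
    ws = [math.exp(-beta * q_value(c)) for c in candidates]
    total = sum(ws)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, ws):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]
```

Uniform sampling would pick each soil equally often; here, a domain whose Q-value is much lower than the others dominates the draws, which is the intended "focus on the hard cases" behavior.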
|
|
13:30-15:00, Paper TuBT20-NT.8 | Add to My Program |
Robotic Inspection and Subsurface Defect Mapping Using Impact-Echo and Ground Penetrating Radar |
|
Hoxha, Ejup | The City College of New York |
Feng, Jinglun | The City College of New York |
Sanakov, Diar | New York University |
Xiao, Jizhong | The City College of New York |
Keywords: Robotics and Automation in Construction, Sensor-based Control, Sensor Fusion
Abstract: Concrete infrastructure often develops a variety of internal flaws that cannot be detected through visual inspection alone and must be regularly inspected with other methods to maintain structural integrity. Previous studies have demonstrated that relying solely on a single non-destructive evaluation (NDE) method can be insufficient to provide a comprehensive evaluation of a structure's condition. In addition, manual NDE data collection can be labor-intensive for on-site engineers. This paper presents a robotic inspection system that uses vision-based positioning and tags NDE measurements with pose information to reveal and map subsurface defects. The system consists of three modules: 1) an omni-directional robotic data collection platform, equipped with a RealSense D435i camera for localization, an impact-echo (IE) sensor, and a ground penetrating radar (GPR), to perform automatic NDE data collection; 2) an IE data processing module that utilizes both learning-based and classical methods to interpret the IE data and reveal subsurface objects; 3) a GPR data processing module to reconstruct underground targets and create a 3D map for better visualization. Field testing demonstrates that the robotic system significantly increases the data collection speed, and the correlation of findings from the IE and GPR sensors gives a comprehensive evaluation of concrete structures that will benefit the inspection and maintenance of civil infrastructure.
|
|
TuBT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Flight and Swimming |
|
|
Chair: Ollero, Anibal | AICIA. G41099946 |
Co-Chair: Liu, Chunbao | Jilin University |
|
13:30-15:00, Paper TuBT21-NT.1 | Add to My Program |
Ospreys-Inspired Self-Takeoff Strategy of an Eagle-Scale Flapping-Wing Robot: System Design and Flight Experiments |
|
Wang, Haoyu | Harbin Institute of Technology, Shenzhen |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Hou, Linpo | Harbin Institute of Technology, Shenzhen |
Pan, Erzhen | Harbin Institute of Technology, Shenzhen |
Keywords: Biologically-Inspired Robots, Aerial Systems: Applications, Dynamics
Abstract: In this work, we achieved self-takeoff of an eagle-scale flapping-wing robot for the first time. Inspired by the takeoff process of ospreys, we propose a bio-inspired takeoff strategy, then discuss the dynamic model and the requirements for self-takeoff. Based on the requirements of the flight strategy, we designed a two-part system comprising a flapping-wing aircraft with a wingspan of 1.8 m and a takeoff weight of 870 g, and an auxiliary platform with an initial pitch-angle adjustment function. To explore the differences in the takeoff process under different conditions, we conducted flight experiments under different time-averaged thrust-to-weight ratios (0.745-0.876) and launch angles (45°-90°). The results confirmed the theoretical analysis that the flapping-wing robot can achieve self-takeoff with no potential energy cost and maintain high maneuverability (the video shows a rapid climb immediately after takeoff) even when the time-averaged thrust-to-weight ratio is smaller than 1. This is significantly different from conventional rotary-wing and vertical take-off and landing (VTOL) UAVs. This work solves the challenge of self-takeoff for large-scale flapping-wing robots using a designable method and demonstrates the superior performance potential of flapping-wing robots compared to conventional UAVs.
|
|
13:30-15:00, Paper TuBT21-NT.2 | Add to My Program |
Design and Analysis of Adaptive Flipper with Origami Structure for Frog-Inspired Swimming Robot |
|
Wang, Shuqi | Harbin Institute of Technology |
Fan, Jizhuang | Robot Research Institute, Harbin Institute of Technology |
Pan, Yitao | Harbin Institute of Technology |
Liu, Gangfeng | Harbin Institute of Technology |
Liu, Yubin | Harbin Institute of Technology |
Keywords: Biologically-Inspired Robots, Biomimetics, Mechanism Design
Abstract: Flippers are important components for improving the locomotion efficiency and stability of bionic underwater robots. A novel origami-based adaptive flipper is presented to address the lack of environmental adaptability and the low efficiency caused by the structural design or the inherent characteristics of a flipper's main constituent materials. The design decision process and locomotion principle of the flipper are introduced in detail. It exhibits adaptive deformation under hydrodynamic loading without affecting propulsion efficiency. Kinematic and simulation analyses are performed to characterize the influence of the structural parameters on motion performance. Swimming experiments show that, compared with ordinary flippers, locomotion efficiency is greatly improved with the origami flippers. The origami flipper also shows good adaptability when in contact with the external environment and overcomes the inability of open-close flippers to cross a 90° corner, which demonstrates the rationality of the structural design and the feasibility of its application in underwater robots.
|
|
13:30-15:00, Paper TuBT21-NT.3 | Add to My Program |
Model-Based Approach for Lateral Maneuvers of Bird-Size Ornithopter |
|
Sanchez-Laulhe, Ernesto | University of Malaga |
Satué Crespo, Álvaro César | GRVC Robotics Lab., Universidad De Sevilla |
Rafee Nekoo, Saeed | GRVC Robotics Lab, Universidad De Sevilla |
Ollero, Anibal | AICIA. G41099946 |
Keywords: Biologically-Inspired Robots, Dynamics
Abstract: A model-based approach for lateral maneuvering of flapping-wing UAVs in closed spaces is presented. Bird-size ornithopters do not have asymmetric control variables in the wing due to mechanical complexity, so they rely on the tail for lateral maneuvering. The E-Flap prototype can deflect its vertical tail to make maneuvers out of the longitudinal plane. This work defines simplified equations for the steady turning maneuver based on the body roll angle, and states the relation between the velocity of the prototype and the turning radius. An approach to the attitude is then proposed, defining the relation between the deflection of the vertical tail and the roll angle. We show that, even though this deflection causes a yaw moment, the coupling between yaw and roll dynamics also generates a roll rate. To validate this simplified model, a simple controller is presented for continuous circular trajectory tracking inside an indoor flight zone. The objective is to track circular trajectories with a radius twice the wingspan at a constant height. Results show very good agreement between the theoretical and experimental turning radius. In addition, the direct relation between the vertical tail deflection and the roll rate of the ornithopter is identified. Even though the desired radius is not reached, the FWUAV is capable of maintaining a closed turning maneuver for several laps. The insight provided by the model therefore proves to be an appropriate approach for aggressive lateral maneuvers of bird-size ornithopters.
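The velocity-to-turning-radius relation the abstract refers to is, in the standard steady coordinated-turn approximation, R = v² / (g·tan φ) for roll angle φ. This is the generic fixed-wing relation, not the paper's flapping-specific model, which also captures the tail-deflection coupling.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def turn_radius(v, roll_deg):
    """Steady coordinated-turn radius R = v^2 / (g * tan(phi)).
    Standard fixed-wing approximation; flapping-specific effects
    from the paper's model are not captured."""
    return v ** 2 / (G * math.tan(math.radians(roll_deg)))
```

For example, at 4 m/s with a 30° roll the radius is about 2.8 m, and the radius grows quadratically with speed at a fixed roll angle.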
|
|
13:30-15:00, Paper TuBT21-NT.4 | Add to My Program |
A General Kinematic Model of Fish Locomotion Enables Robot Fish to Master Multiple Swimming Motions |
|
Zhong, Yong | South China University of Technology |
Hong, Zicun | South China University of Technology |
Li, Yuhan | South China University of Technology |
Yu, Junzhi | Chinese Academy of Sciences |
Keywords: Biologically-Inspired Robots, Kinematics, Dynamics, Robotic Fish
Abstract: Fish locomotion in the body and/or caudal fin swimming mode consists of different motions, such as cruising-straight, cruising-turn, and various fast turns. Currently, no single mathematical model describes all of these motions, so for scientists and engineers it is cumbersome and complicated to model and control the different motions with multiple principles. In this paper, we propose a general kinematic model that describes the kinematics of all the aforementioned swimming motions. The model is synthesized from a nonlinear oscillator and a traveling wave equation. By changing four parameters extracted from the model, the kinematic model can demonstrate all the aforementioned swimming motions with different amplitudes and frequencies. To verify the model, we built a multi-joint robotic fish and developed its dynamic model and control method to perform all the maneuvers under the guidance of the general kinematic model. Through this systematic methodology, one can easily study the principles of different swimming motions and design a multi-motion controller for a robotic fish from a single governing kinematic model.
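The traveling-wave half of such a model is classically written as y(s,t) = (c₁s + c₂s²)·sin(ks − ωt), the carangiform body wave with an amplitude envelope growing toward the tail. The sketch below implements only this classical wave; the nonlinear oscillator that the paper couples to it (and the specific envelope coefficients) are omitted or assumed.

```python
import math

def body_wave(s, t, c1=0.05, c2=0.1, k=2 * math.pi, omega=2 * math.pi):
    """Lateral displacement of the classical carangiform travelling wave
    y(s, t) = (c1*s + c2*s^2) * sin(k*s - omega*t), with s in [0, 1]
    measured along the body.  Envelope coefficients, wavenumber, and
    frequency are illustrative, not the paper's parameters."""
    return (c1 * s + c2 * s * s) * math.sin(k * s - omega * t)
```

The quadratic envelope keeps the head nearly still while the tail sweeps with the full amplitude c₁ + c₂ at s = 1, which is the hallmark of body/caudal-fin swimming.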
|
|
13:30-15:00, Paper TuBT21-NT.5 | Add to My Program |
Adaptation of Flipper-Mud Interactions Enables Effective Terrestrial Locomotion on Muddy Substrates |
|
Liu, Shipeng | University of Southern California |
Huang, Boyuan | University of Southern California |
Qian, Feifei | University of Southern California |
Keywords: Biologically-Inspired Robots, Legged Robots, Contact Modeling
Abstract: Moving on natural muddy terrains, where soil composition and water content vary significantly, is complex and challenging. To understand how mud properties and robot-mud interaction strategies affect locomotion performance on mud, we study the terrestrial locomotion of a mudskipper-inspired robot on synthetic mud with precisely controlled ratios of sand, clay, and water. We observed a non-monotonic dependence of robot speed on mud water content: speed was largest on mud with intermediate water content (25%-26%) but decreased significantly at higher or lower water content. Measurements of the mud reaction force revealed two distinct failure mechanisms. At high water content, the reduced mud shear strength led to large slippage of the robot appendages and a significantly reduced step length. At low water content, the increased mud suction force caused appendage entrapment, resulting in a large negative displacement of the robot body during the swing phase. A simple model successfully captured the observed robot performance and informed adaptation strategies that increased robot speed by more than 200%. Our study is a first step toward extending robot mobility beyond simple substrates to a wider range of complex, heterogeneous terrains.
|
|
13:30-15:00, Paper TuBT21-NT.6 | Add to My Program |
RoboTwin: A Platform to Study Hydrodynamic Interactions in Schooling Fish |
|
Li, Liang | Max-Planck Institute of Animal Behavior |
Chao, Li-Ming | Max Planck Institute of Animal Behavior |
Wang, Siyuan | Max Planck Institute of Animal Behavior |
Deussen, Oliver | University of Konstanz |
Couzin, Iain D. | Max Planck Institute of Animal Behavior |
Keywords: Biologically-Inspired Robots, Marine Robotics, Robotics and Automation in Life Sciences
Abstract: By living and moving in groups, fish can gain many benefits, such as heightened predator detection, greater hunting efficiency, more accurate environmental sensing, and energy savings. Although the benefits of hydrodynamic interactions in schooling fish have drawn growing interest in fields such as biology, physics, and engineering, and multiple hypotheses for how such benefits may arise have been proposed, it is still largely unknown which mechanisms fish employ to obtain hydrodynamic benefits such as increased thrust or improved movement efficiency. One main bottleneck has been the difficulty of collecting detailed sensory information, corresponding locomotory responses, and hydrodynamic information from real schooling fish. In this paper, we present the RoboTwin platform, designed to aid such data collection: it allows us to replay the dynamic movements and body-posture kinematics of real fish in fish-like robots, and to measure the power cost, thrust, and detailed flow fields, all of which are extremely challenging to obtain for real animals. We cross-validated our platform against our previously proposed energy-saving mechanism ('vortex phase matching') by flow visualization through particle image velocimetry (PIV). Our results demonstrate the effectiveness of our design and highlight the potential of RoboTwin for future applications exploring further hydrodynamic interactions among schooling fish.
|
|
13:30-15:00, Paper TuBT21-NT.7 | Add to My Program |
Real-Time Estimation for the Swimming Direction of Robotic Fish Based on IMU Sensors |
|
Li, Shikun | Peking University |
Zhai, Yufan | Peking University |
Wang, Chen | Peking University |
Xie, Guangming | Peking University |
Keywords: Biologically-Inspired Robots, Sensor Fusion, Marine Robotics
Abstract: An increasing number of underwater robots inspired by Carangidae, characterized by high efficiency and flexibility, have been developed. However, estimating the swimming direction of these robotic fish is challenging because the head swings constantly during movement, which complicates precise control. In this study, we installed two low-cost inertial measurement unit (IMU) sensors separately on the head and tail parts of a double-joint robotic fish and present a method for accurate and timely estimation of the swimming direction. First, we compensate for the yaw angle drift of the IMU sensors through a fused Kalman filter. Furthermore, we propose the Anti-Shake Estimation (ASE) algorithm to calculate the real-time swimming direction from the filtered yaw angles at a high update rate of 100 Hz. Finally, we applied the method to swimming-direction feedback control for evaluation and comparison. The results show that our ASE method outperforms existing methods in straight-line swimming experiments, and an S-curve swimming experiment further demonstrates its effectiveness in complex missions.
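To illustrate why head oscillation corrupts a raw yaw reading, the following sketch averages the yaw signal on the unit circle to recover the underlying swimming direction. This is a minimal complementary smoother written for illustration, not the authors' ASE algorithm; the sampling rate matches the paper's 100 Hz update rate, but all other constants are assumptions:

```python
import math

def swim_direction_ema(yaw_samples, alpha=0.01):
    """Exponential moving average of a yaw signal (radians).

    Averaging sin/cos on the unit circle keeps the estimate well
    behaved across the +/-pi wrap-around.
    """
    s = c = 0.0
    estimates = []
    for yaw in yaw_samples:
        s = (1.0 - alpha) * s + alpha * math.sin(yaw)
        c = (1.0 - alpha) * c + alpha * math.cos(yaw)
        estimates.append(math.atan2(s, c))
    return estimates

# Synthetic head yaw: true direction 0.5 rad plus a 2 Hz body-wave
# oscillation of +/-0.4 rad, sampled at 100 Hz (the paper's update rate).
true_dir = 0.5
yaws = [true_dir + 0.4 * math.sin(2.0 * math.pi * 2.0 * k / 100.0)
        for k in range(2000)]
direction = swim_direction_ema(yaws)[-1]
```

Because the averaging happens on sin/cos components, the same smoother works unchanged when the heading crosses the +/-pi boundary, which a naive average of raw angles does not.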
|
|
13:30-15:00, Paper TuBT21-NT.8 | Add to My Program |
Tunable Stiffness Caudal Peduncle Leads to Higher Swimming Speed without Extra Energy |
|
Liu, Sijia | Jilin University |
Liu, Chunbao | Jilin University |
Liang, Yunhong | Jilin University |
Ren, Luquan | Jilin University |
Ren, Lei | University of Manchester |
Keywords: Biologically-Inspired Robots, Soft Robot Applications, Soft Robot Materials and Design
Abstract: Tuning body stiffness like fish to improve swimming efficiency and speed has been adopted by many fish-inspired robots. However, it is unknown whether the energy saved through improved efficiency can compensate for the energy consumed by tuning the stiffness itself. To explore this issue, we develop a robotic fish with a tunable stiffness caudal peduncle (TSCP) that simultaneously allows untethered swimming and online stiffness tuning, and conduct a series of tests. We first apply interchangeable caudal peduncles to our robot to explore the effect of a tunable stiffness mechanism and determine the optimal stiffness interval. The results show that tunable stiffness can significantly improve the robot's response at different frequencies. We then build the TSCP by embedding shape memory alloy wire into a silicone matrix. The TSCP can adjust its stiffness in real time via applied current and increase the initial stiffness by up to 57.4%. More importantly, unlike previous robots, we incorporate the cost of tuning stiffness into the total cost of transport for the first time. The cost of maintaining medium and maximum stiffness accounts for 8.72% and 17.87% of the total cost of transport, respectively. As a result, the TSCP increases swimming speed by up to 35.5% and reduces the Strouhal number by up to 21.9% at high frequencies without extra power.
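The energy accounting described above follows the standard dimensionless cost-of-transport definition CoT = P / (m g v), with the stiffness-holding power included in the total. A minimal sketch, with illustrative numbers that are not the paper's measurements:

```python
def cost_of_transport(p_locomotion_w, p_stiffness_w, mass_kg, speed_mps, g=9.81):
    """Dimensionless cost of transport, CoT = P_total / (m * g * v),
    with the stiffness-holding power counted in the total."""
    return (p_locomotion_w + p_stiffness_w) / (mass_kg * g * speed_mps)

# Illustrative numbers (not the paper's): 10 W for swimming,
# 1 W to hold a medium stiffness, a 1.2 kg robot at 0.5 m/s.
cot = cost_of_transport(10.0, 1.0, 1.2, 0.5)
stiffness_share = 1.0 / (10.0 + 1.0)  # fraction of total power spent on stiffness
```

Comparing `stiffness_share` across stiffness levels is exactly the kind of bookkeeping the abstract reports (8.72% and 17.87% of the total cost of transport).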
|
|
13:30-15:00, Paper TuBT21-NT.9 | Add to My Program |
A Novel Fish-Inspired Self-Adaptive Approach to Collective Escape of Swarm Robots Based on Neurodynamic Models |
|
Li, Junfei | University of Guelph |
Yang, Simon X. | University of Guelph |
Keywords: Biologically-Inspired Robots, Swarm Robotics, Cooperating Robots
Abstract: Fish schools exhibit highly efficient group behaviors in collective migration and dynamic escape from predators through simple individual interactions. The purpose of this research is to infuse swarm robots with “fish-like” intelligence that enables safe navigation, efficient cooperation, and successful completion of escape tasks in changing environments. In this paper, a novel fish-inspired self-adaptive approach is proposed for the collective escape of swarm robots. A bio-inspired neural network (BINN) is introduced to generate collision-free escape trajectories through the dynamics of neural activity and the combination of attractive and repulsive forces. In addition, a neurodynamics-based self-adaptive mechanism is proposed to improve the self-adaptive performance of the swarm robots in dynamic environments. Similar to fish escape maneuvers, simulations and real-robot experiments show that the swarm robots can collectively move away from the threat and respond to sudden environmental changes. Several comparison studies demonstrate that the proposed approach significantly improves the effectiveness, efficiency, and flexibility of swarm robots in complex environments.
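The neural-activity dynamics behind BINN-style planners are commonly modeled with a Grossberg-type shunting equation, whose bounded activity is what keeps the generated trajectories well behaved. A minimal sketch under that assumption (the equation form is the standard shunting model used in this line of work; all parameter values are illustrative):

```python
def shunting_step(x, s_exc, s_inh, A=10.0, B=1.0, D=1.0, dt=0.001):
    """One Euler step of dx/dt = -A*x + (B - x)*S_e - (D + x)*S_i.

    The (B - x) and (D + x) gating terms bound the neural activity x
    within [-D, B] no matter how large the inputs S_e, S_i become.
    """
    return x + dt * (-A * x + (B - x) * s_exc - (D + x) * s_inh)

# A strongly excited neuron saturates below B = 1 instead of diverging.
x = 0.0
for _ in range(20000):
    x = shunting_step(x, s_exc=100.0, s_inh=0.0)
```

The bounded activity landscape is what lets attractive (excitatory) and repulsive (inhibitory) inputs be combined without any input ever dominating the others unboundedly.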
|
|
TuBT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics II |
|
|
Chair: Johnson-Roberson, Matthew | Carnegie Mellon University |
Co-Chair: Kelasidi, Eleni | SINTEF Ocean |
|
13:30-15:00, Paper TuBT22-NT.1 | Add to My Program |
RUMP: Robust Underwater Motion Planning in Dynamic Environments of Fast Moving Obstacles |
|
Amundsen, Herman Biørn | NTNU |
Olsen, Torben Falleth | NTNU |
Xanthidis, Marios | SINTEF Ocean AS |
Føre, Martin | NTNU |
Kelasidi, Eleni | SINTEF Ocean |
Keywords: Marine Robotics, Collision Avoidance, Autonomous Vehicle Navigation
Abstract: Robust underwater motion planning of autonomous underwater vehicles (AUVs) in dynamic cluttered environments is a problem that has yet to be addressed in depth. Due to advances in technology and computational capacity, AUVs are expected to operate safely and autonomously in increasingly challenging environments, necessitating methods that can safely navigate robots in real-time. However, most existing solutions remain overly cautious and conservative. This paper proposes RUMP, a novel locally-optimal motion planning framework for robust real-time autonomous underwater navigation in 3D cluttered environments consisting of observed static and dynamic obstacles. The problem is modeled as path optimization and can be solved in real-time with a common non-linear solver. The constructed objective function allows deciding the local goal during optimization to both maximize safety within a planning horizon and minimize the expected distance to the target position. Furthermore, path safety is considered for the entire transition between consecutive states, utilizing a novel approach for continuous spatiotemporal collision checks. The proposed formulation provides safe performance even in environments with obstacles that may move orders of magnitude faster than the AUV itself. Simulation experiments in different challenging scenarios showcase robustness and efficient real-time performance of more than 16 Hz.
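Continuous spatiotemporal collision checking rests on a standard geometric fact: for two points moving at constant velocity, squared separation is quadratic in time, so the minimum over a time interval has a closed form. A minimal sketch of that primitive (an illustrative building block, not the paper's full formulation):

```python
import math

def min_separation(p_r, v_r, p_o, v_o, t0=0.0, t1=1.0):
    """Minimum distance between two constant-velocity 3D points over [t0, t1].

    With relative state d(t) = dp + t*dv, ||d(t)||^2 is quadratic in t,
    so the unconstrained minimizer -dp.dv/||dv||^2 is simply clamped
    to the interval.
    """
    dp = [a - b for a, b in zip(p_r, p_o)]
    dv = [a - b for a, b in zip(v_r, v_o)]
    dv2 = sum(x * x for x in dv)
    if dv2 == 0.0:
        t = t0
    else:
        t = max(t0, min(t1, -sum(a * b for a, b in zip(dp, dv)) / dv2))
    return math.sqrt(sum((a + t * b) ** 2 for a, b in zip(dp, dv)))

# Head-on geometry: AUV at 1 m/s, obstacle closing at 1 m/s;
# the closest approach (1 m) occurs at t = 2.5 s inside the horizon.
d_min = min_separation((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                       (5.0, 1.0, 0.0), (-1.0, 0.0, 0.0), t0=0.0, t1=5.0)
```

Because the check covers the whole interval rather than sampled instants, fast obstacles cannot "tunnel" between discrete collision checks, which is the property the abstract emphasizes.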
|
|
13:30-15:00, Paper TuBT22-NT.2 | Add to My Program |
Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots |
|
Ebner, Luca | ETH Zurich |
Billings, Gideon | University of Sydney, Australian Center for Field Robotics |
Williams, Stefan B. | University of Sydney |
Keywords: Marine Robotics, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In this work, we address the problem of real-time dense depth estimation from monocular images for mobile underwater vehicles. We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions and solve the problem of scale ambiguity. To allow prior inputs of arbitrary sparsity, we apply a dense parameterization method. Our model extends recent state-of-the-art approaches to monocular image based depth estimation, using an efficient encoder-decoder backbone and modern lightweight transformer optimization stage to encode global context. The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea. Evaluation results on this dataset demonstrate significant improvement in depth prediction accuracy by the fusion of the sparse feature priors. In addition, without any retraining, our method achieves similar depth prediction accuracy on a downward-looking dataset we collected with a diver operated camera rig, conducting a survey of a coral reef. The method achieves real-time performance, running at 24 FPS on an NVIDIA Jetson Xavier NX, 160 FPS on an NVIDIA RTX 2080 GPU and 7 FPS on a single Intel i9-9900K CPU core, making it suitable for direct deployment on embedded GPU systems. The implementation of this work is made publicly available at https://github.com/ebnerluca/uw_depth.
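The scale ambiguity that sparse metric priors resolve can be illustrated with a closed-form least-squares scale-and-shift alignment, a common baseline rather than the paper's learned dense-parameterization fusion:

```python
def align_scale_shift(pred_depth, sparse_metric):
    """Closed-form least-squares scale s and shift b so that
    s * pred_depth[i] + b matches the metric depth at each sparse index."""
    xs = [pred_depth[i] for i in sparse_metric]
    ys = [sparse_metric[i] for i in sparse_metric]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = sxy / sxx
    return s, my - s * mx

# A scale-ambiguous relative prediction and two triangulated metric features.
pred = [0.1, 0.2, 0.3, 0.4]
s, b = align_scale_shift(pred, {0: 1.0, 2: 3.0})
metric_depth = [s * d + b for d in pred]
```

A learned fusion can improve on this global fit by correcting local depth errors near each prior, which a single (s, b) pair cannot do.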
|
|
13:30-15:00, Paper TuBT22-NT.3 | Add to My Program |
Model-Based Underwater 6D Pose Estimation from RGB |
|
Sapienza, Davide | Unimore |
Govi, Elena | Unimore |
Aldhaheri, Sara | TII |
Bertogna, Marko | Unimore |
Roura, Eloy | Technology Innovation Institute |
Pairet Artau, Èric | Technology Innovation Institute |
Verucchi, Micaela | University of Modena and Reggio Emilia |
Ardón, Paola | Technology Innovation Institute |
Keywords: Marine Robotics, Data Sets for Robotic Vision, Engineering for Robotic Systems
Abstract: Object pose estimation underwater allows an autonomous system to perform tracking and intervention tasks. Nonetheless, underwater target pose estimation is remarkably challenging due to, among many factors, limited visibility, light scattering, cluttered environments, and constantly varying water conditions. One approach is to employ sonar or laser sensing to acquire 3D data; however, the resulting data are noisy and the sensors are expensive. For this reason, the community has focused on extracting pose estimates from RGB input. In this work, we propose an approach that leverages 2D object detection to reliably compute 6D pose estimates in different underwater scenarios. We test our proposal on 4 objects with symmetrical shapes and poor texture spanning 33,920 synthetic and 10 real scenes. All objects and scenes are made available in an open-source dataset that includes annotations for object detection and pose estimation. When benchmarked against similar end-to-end methodologies for 6D object pose estimation, our pipeline provides estimates that are 8% more accurate. We also demonstrate the real-world usability of our pose estimation pipeline on an underwater robotic manipulator in a reaching task.
|
|
13:30-15:00, Paper TuBT22-NT.4 | Add to My Program |
SONIC: Sonar Image Correspondence Using Pose Supervised Learning for Imaging Sonars |
|
Gode, Samiran | Carnegie Mellon University |
Hinduja, Akshay | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets are made public on https://github.com/rpl-cmu/sonic to facilitate further development in the field.
|
|
13:30-15:00, Paper TuBT22-NT.5 | Add to My Program |
CVAE-SM: A Conditional Variational Autoencoder with Style Modulation for Efficient Uncertainty Quantification |
|
Ullah, Amin | Oregon State University |
Yan, Taiqing | Oregon State University |
Fuxin, Li | Oregon State University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: Deep learning has brought transformative advancements to object segmentation, especially in marine robotics contexts such as waste management and subaquatic infrastructure oversight. However, a central challenge persists: calibrating the prediction confidence of the model to ensure robust and reliable outcomes, especially within the demanding underwater environment. Existing solutions for estimating uncertainty are often computationally intensive and have largely centered around Bayesian neural networks or ensemble methods. In this paper, we present a Conditional Variational Autoencoder-based framework (CVAE-SM), which is capable of generating diverse latent codes for improved uncertainty quantification in image segmentation. Our method, enhanced by a style modulator, merges content features, and latent codes more effectively, leading to refined prediction of uncertainty levels. We further introduce a dataset of perturbed underwater images to benchmark uncertainty quantification in this domain. The proposed model not only surpasses peers in segmentation metrics but also matches ensemble models in uncertainty predictions, all while being 2.5 times faster.
|
|
13:30-15:00, Paper TuBT22-NT.6 | Add to My Program |
Beyond NeRF Underwater: Learning Neural Reflectance Fields for True Color Correction of Marine Imagery |
|
Zhang, Tianyi | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Marine Robotics, Deep Learning for Visual Perception
Abstract: Underwater imagery often exhibits distorted coloration as a result of light-water interactions, which complicates the study of benthic environments in marine biology and geography. In this research, we propose an algorithm to restore the true color (albedo) in underwater imagery by jointly learning the effects of the medium and neural scene representations. Our approach models water effects as a combination of light attenuation with distance and backscattered light. The proposed neural scene representation is based on a neural reflectance field model, which learns albedos, normals, and volume densities of the underwater environment. We introduce a logistic regression model to separate water from the scene and apply distinct light physics during training. Our method avoids the need to estimate complex backscatter effects in water by employing several approximations, enhancing sampling efficiency and numerical stability during training. The proposed technique integrates underwater light effects into a volume rendering framework with end-to-end differentiability. Experimental results on both synthetic and real-world data demonstrate that our method effectively restores true color from underwater imagery, outperforming existing approaches in terms of color consistency.
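The attenuation-plus-backscatter water model described above is, in its standard form, directly invertible once the medium parameters and range are known. A minimal round-trip sketch of that model on one color channel (the paper instead learns these quantities jointly with the neural scene representation; all parameter values below are illustrative):

```python
import math

def restore_albedo(observed, depth_m, backscatter, beta_d, beta_b):
    """Invert the standard underwater image-formation model
    I = J*exp(-beta_d*z) + B*(1 - exp(-beta_b*z))   (per color channel)
    for the true color (albedo) J, given range z and medium parameters."""
    direct = observed - backscatter * (1.0 - math.exp(-beta_b * depth_m))
    return direct * math.exp(beta_d * depth_m)

# Round trip: synthesize an observation from a known albedo, then invert.
J, z, B, bd, bb = 0.8, 3.0, 0.2, 0.4, 0.5
I = J * math.exp(-bd * z) + B * (1.0 - math.exp(-bb * z))
restored = restore_albedo(I, z, B, bd, bb)
```

The hard part in practice, and the paper's focus, is that z, the attenuation coefficients, and the backscatter are not given and must be estimated jointly with the scene.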
|
|
13:30-15:00, Paper TuBT22-NT.7 | Add to My Program |
CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration |
|
Abdullah, Adnan | University of Florida |
Barua, Titon | University of South Carolina |
Tibbetts, Reagan | University of South Carolina |
Chen, Zijie | Mississippi State University |
Islam, Md Jahidul | University of Florida |
Rekleitis, Ioannis | University of South Carolina |
Keywords: Marine Robotics, Deep Learning for Visual Perception, Data Sets for Robotic Vision
Abstract: In this paper, we present CaveSeg - the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g., caveline, arrows), obstacles (e.g., ground plain and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in the USA, Mexico, and Spain, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping.
|
|
13:30-15:00, Paper TuBT22-NT.8 | Add to My Program |
Discovering Biological Hotspots with a Passively Listening AUV |
|
McCammon, Seth | Woods Hole Oceanographic Institution |
Jamieson, Stewart | Massachusetts Institute of Technology |
Mooney, T. Aran | Woods Hole Oceanographic Institution |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Keywords: Marine Robotics, Environment Monitoring and Management, Robot Audition
Abstract: We present a novel system which blends multiple distinct sensing modalities in audio-visual surveys to assist marine biologists in collecting datasets for understanding the ecological relationship of fish and other organisms with their habitats on and around coral reefs. Our system, designed for the CUREE AUV, uses four hydrophones to determine the bearing to biological sound sources through beamforming. These observations are merged in a Bayesian Occupancy Grid to produce a 2D map of the acoustic activity of a coral reef. Simultaneously, the AUV uses unsupervised topic modeling to identify different benthic habitats. Combining these maps allows us to determine the level of acoustic activity within each habitat. We demonstrated the system in field trials on reefs in the U.S. Virgin Islands, where it was able to autonomously discover the favored habitats of snapping shrimp (genus Alpheus).
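The Bayesian occupancy grid accumulates beamformed bearing observations per cell; the standard log-odds update that such grids use can be sketched as follows (the hit probability is an illustrative assumption, not the paper's sensor model):

```python
import math

def logodds_update(logodds, p_hit):
    """Bayesian log-odds update of one grid cell for one observation."""
    return logodds + math.log(p_hit / (1.0 - p_hit))

def to_probability(logodds):
    """Convert accumulated log-odds back to a probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(logodds))

# A cell repeatedly implicated by beamformed bearings (illustrative
# hit probability of 0.7 per observation) converges toward "active".
l = 0.0  # prior probability 0.5
for _ in range(5):
    l = logodds_update(l, 0.7)
prob_active = to_probability(l)
```

Working in log-odds makes repeated fusion a simple sum, which is why the representation is the standard choice for accumulating many noisy bearing observations over a survey.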
|
|
13:30-15:00, Paper TuBT22-NT.9 | Add to My Program |
A Du-Octree Based Cross-Attention Model for LiDAR Geometry Compression |
|
Cui, Mingyue | Sun Yat-Sen University |
Feng, Mingjian | Sun Yat-Sen University |
Long, Junhua | Sun Yat-Sen University |
Hu, Daosong | Sun Yat-Sen University |
Zhao, Shuai | Sun Yat-Sen University |
Huang, Kai | Sun Yat-Sen University |
Keywords: Sensor Networks, Deep Learning for Visual Perception, Intelligent Transportation Systems
Abstract: Point cloud compression is an essential technology for the efficient storage and transmission of 3D data. Previous methods usually use hierarchical tree data structures for encoding the spatial sparseness of point clouds. However, the node context within the tree is not fully discovered since the feature space among nodes varies significantly. To address this problem, we innovatively represent the LiDAR points in a two-octree structure instead of the traditional single-octree coding, and then design a cross-attention model to capture the hierarchical features between the two octrees, each of which incorporates a transformer-based deep entropy model and an arithmetic encoder. Besides, we introduce an untied cross-aware position encoding with principal component analysis and different projection matrices, which enhances the correlations between the two octrees' attention feature embeddings. Experimental results show that our method outperforms previous state-of-the-art works, achieving up to 8.2% Bpp savings on point cloud benchmark datasets with different lasers.
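The raw symbol stream that an octree-based entropy model compresses comes from occupancy-byte serialization of the point set. A minimal single-octree sketch of that step (the paper's contribution, the dual-octree cross-attention entropy model, is not reproduced here):

```python
def encode_octree(points, center, half, depth):
    """Breadth-first occupancy-byte serialization of a 3D point set.

    Each node emits one byte whose bits mark the occupied octants;
    this byte stream is what a learned entropy model would then
    compress with an arithmetic coder.
    """
    stream = []
    queue = [(points, center, half)]
    for _ in range(depth):
        next_level = []
        for pts, c, h in queue:
            byte = 0
            buckets = {}
            for p in pts:
                idx = (p[0] >= c[0]) | ((p[1] >= c[1]) << 1) | ((p[2] >= c[2]) << 2)
                byte |= 1 << idx
                buckets.setdefault(idx, []).append(p)
            stream.append(byte)
            for idx, sub in buckets.items():
                nc = [c[k] + (h / 2 if (idx >> k) & 1 else -h / 2) for k in range(3)]
                next_level.append((sub, nc, h / 2))
        queue = next_level
    return stream

# Two opposite-corner points inside a unit cube, serialized to depth 2.
stream = encode_octree([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)],
                       (0.5, 0.5, 0.5), 0.5, 2)
```

The better the entropy model predicts each occupancy byte from its context (here, across two octrees via cross-attention), the fewer bits the arithmetic coder spends per symbol.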
|
|
TuBT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Mechanics and Control II |
|
|
Chair: Kocer, Basaran Bahadir | Imperial College London |
Co-Chair: Martinoli, Alcherio | EPFL |
|
13:30-15:00, Paper TuBT23-NT.1 | Add to My Program |
Energy Consumption Modelling of Coaxial-Rotor in Vortex Ring State for Controllable High-Speed Descending |
|
Sun, Jiawei | Guangxi University |
Zhou, Xiang | Guangxi University |
Ban, Taoze | Co., Ltd. Mystical Bow Technology |
Zhao, Jiannan | Guangxi University |
Shuang, Feng | Guangxi University |
Keywords: Aerial Systems: Mechanics and Control, Robotics in Hazardous Fields, Dynamics
Abstract: The ability to climb and descend quickly is crucial for Unmanned Aerial Vehicle (UAV) applications in the mountains, where a slow descent speed reduces the UAV's efficiency in reaching a rescue area. However, during fast descent a rotorcraft falls into its own wake and a chaotic flow field develops, a condition known as the vortex ring state. The safe descent velocity of consumer UAVs is therefore usually limited to approximately 3 m/s, which reduces their potential to execute tasks in mountainous and plateau regions. To relax the constraint imposed by the maximum descending speed, the flow field and the energy consumption during descent must be analyzed jointly. Existing research has mainly focused on how to avoid entering the vortex ring state rather than on supplying sufficient power to fly through it. In this paper, aiming at an efficient rotorcraft for rescue in mountainous and plateau regions, we push past the maximum descending speed of a coaxial-rotor UAV. To this end, a power-consumption management pipeline is proposed to extend the power tolerance of the UAV. Specifically, a theoretical model of the coaxial rotors is proposed to analyze the induced velocity and energy consumption during vertical descent. The model is then verified to be consistent with Computational Fluid Dynamics (CFD) and wind tunnel experiment results. Finally, we optimize the tolerance of the power and dynamic system according to the model. With this pipeline, our real-time flights achieved an 8 m/s controlled vertical-descent-speed (CVDS), a leading result among both quadrotor and coaxial UAVs.
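The theoretical starting point for rotor descent analysis is momentum theory, which also marks the regime where it fails: there is no valid momentum-theory solution inside the vortex ring state, which is why this line of work resorts to CFD and wind-tunnel validation. A minimal sketch of the hover/climb branch (rotor area and air density below are illustrative, and the model is for a single rotor rather than the paper's coaxial pair):

```python
import math

def hover_induced_velocity(thrust_n, rho=1.225, rotor_area_m2=0.05):
    """Momentum-theory induced velocity at hover: v_h = sqrt(T / (2*rho*A))."""
    return math.sqrt(thrust_n / (2.0 * rho * rotor_area_m2))

def ideal_power(thrust_n, climb_mps, rho=1.225, rotor_area_m2=0.05):
    """Ideal rotor power P = T * (Vc/2 + sqrt((Vc/2)^2 + v_h^2)).

    Valid for hover and climb (Vc >= 0); momentum theory has no valid
    solution inside the vortex ring state (roughly -2*v_h < Vc < 0),
    which is the regime addressed with CFD and wind-tunnel data.
    """
    vh = hover_induced_velocity(thrust_n, rho, rotor_area_m2)
    half = climb_mps / 2.0
    return thrust_n * (half + math.sqrt(half * half + vh * vh))

p_hover = ideal_power(10.0, 0.0)  # reduces to T * v_h at hover
```

The hover induced velocity v_h also sets the scale of the problem: descent rates on the order of v_h are exactly where the vortex ring state, and hence the power penalty the paper models, appears.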
|
|
13:30-15:00, Paper TuBT23-NT.2 | Add to My Program |
Equilibria, Stability, and Sensitivity for the Aerial Suspended Beam Robotic System Subject to Parameter Uncertainty |
|
Gabellieri, Chiara | University of Twente |
Tognon, Marco | Inria Rennes |
Sanalitro, Dario | University of Catania |
Franchi, Antonio | University of Twente / Sapienza University of Rome |
Keywords: Aerial Systems: Mechanics and Control, Motion Control, Dynamics, Cooperative Aerial Manipulation
Abstract: This work studies how parametric uncertainties affect the cooperative manipulation of a cable-suspended beam-shaped load by means of two aerial robots not explicitly communicating with each other. In particular, the work sheds light on the impact of the uncertain knowledge of the model parameters available to an established communication-less force-based controller. First, we find the closed-loop equilibrium configurations in the presence of the aforementioned uncertainties, and then we study their stability. Hence, we show the fundamental role played in the robustness of the load attitude control by the internal force induced in the manipulated object by non-vertical cables. Furthermore, we formally study the sensitivity of the attitude error to such parametric variations, and we provide a method to act on the load position error in the presence of the uncertainties. Eventually, we validate the results through an extensive set of numerical tests in a realistic simulation environment including underactuated aerial vehicles and sagging-prone cables, and through hardware experiments.
|
|
13:30-15:00, Paper TuBT23-NT.3 | Add to My Program |
Aerial Tensile Perching and Disentangling Mechanism for Long-Term Environmental Monitoring |
|
Lan, Tian | Technical University of Munich |
Romanello, Luca | TUM |
Kovac, Mirko | Imperial College London |
Armanini, Sophie Franziska | Technical University of Munich |
Kocer, Basaran Bahadir | Imperial College London |
Keywords: Aerial Systems: Applications, Mechanism Design, Environment Monitoring and Management
Abstract: Aerial robots show significant potential for forest canopy research and environmental monitoring by providing data collection capabilities at high spatial and temporal resolutions. However, limited flight endurance hinders their application. Inspired by natural perching behaviors, we propose a multi-modal aerial robot system that integrates tensile perching for energy conservation and a suspended actuated pod for data collection. The system consists of a quadrotor drone, a slewing ring mechanism allowing 360° tether rotation, and a streamlined pod with two ducted propellers connected via a tether. Winding and unwinding the tether allows the pod to move within the canopy, and activating the propellers allows the tether to be wrapped around branches for perching or disentangling. We experimentally determined the minimum counterweights required for stable perching under various conditions. Building on this, we devised and evaluated multiple perching and disentangling strategies. Comparisons of perching and disentangling maneuvers demonstrate energy savings that can be further increased by using the pod or tether winding: these approaches reduce energy consumption to only 22% and 1.5%, respectively, of that of a drone disentangling maneuver. We also calculated the minimum idle time required by the proposed system after perching and motor shutdown to save energy on a mission, which is 48.9% of the operating time. Overall, the integrated system expands the operational capabilities and enhances the energy efficiency of aerial robots for long-term monitoring tasks.
|
|
13:30-15:00, Paper TuBT23-NT.4 | Add to My Program |
Millimeter-Level Pick and Peg-In-Hole Task Achieved by Aerial Manipulator |
|
Wang, Meng | Beihang University |
Chen, Zeshuai | Beihang University |
Guo, Kexin | Beihang University |
Yu, Xiang | Beihang University |
Zhang, Youmin | Concordia University |
Guo, Lei | Beihang University |
Wang, Wei | China Aerospace Science and Technology Corporation, Beijing Inst |
Keywords: Aerial Systems: Applications, Dexterous Manipulation, Assembly, Disturbance Dissolution
Abstract: Achieving accurate control of the end-effector is critical for practical applications of aerial manipulators. However, due to the floating-base disturbance from the UAV platform and the kinematic error amplification introduced by the manipulator's multi-link structure, ensuring high-precision performance of an aerial manipulator is extremely challenging. Building on the philosophy of disturbance rejection, we propose a predictive optimization scheme that allows an aerial manipulator to successfully execute millimeter-level flying pick and peg-in-hole tasks. First, the error amplification effect of the floating base is quantitatively analyzed via the aerial manipulator kinematics. Intuitively, if the future motion of the UAV platform is well predicted, the manipulator can directly counteract the floating disturbance by following a modified reference trajectory. Hence, a learning-based prediction approach is leveraged to rapidly forecast the UAV platform motion online. Subsequently, an optimization controller is formulated to follow the reference trajectory while incorporating multiple practical constraints of the aerial manipulator.
|
|
13:30-15:00, Paper TuBT23-NT.5 | Add to My Program |
Lumped Drag Model Identification and Real-Time External Force Detection for Rotary-Wing Micro Aerial Vehicles |
|
Waelti, Lucas | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Aerial Systems: Applications, Environment Monitoring and Management, Calibration and Identification
Abstract: This work focuses on understanding and identifying the drag forces applied to a rotary-wing Micro Aerial Vehicle (MAV). We propose a lumped drag model that concisely describes the aerodynamic forces the MAV is subject to, with a minimal set of parameters. We only rely on commonly available sensor information onboard a MAV, such as accelerometer data, pose estimate, and throttle commands, which makes our method generally applicable. The identification uses an offline gradient-based method on flight data collected over specially designed trajectories. The identified model allows us to predict the aerodynamic forces experienced by the aircraft due to its own motion in real-time and, therefore, will be useful for distinguishing them from external perturbations, such as wind or physical contact with the environment. The results show that we are able to identify the drag coefficients of a rotary-wing MAV through onboard flight data and observe the close correlation between the motion of the MAV, the measured external forces, and the predicted drag forces.
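The identification idea can be illustrated on a single quadratic drag coefficient, where the least-squares fit is closed-form; the paper's lumped model has more parameters and uses an offline gradient-based method, so this is only a sketch with synthetic data:

```python
def fit_quadratic_drag(vels_mps, drag_accels):
    """Closed-form least-squares fit of k in a_drag = -k * v * |v|,
    given paired airspeed and drag-acceleration samples."""
    num = sum(-a * v * abs(v) for v, a in zip(vels_mps, drag_accels))
    den = sum((v * abs(v)) ** 2 for v in vels_mps)
    return num / den

# Synthetic flight samples generated with k = 0.3 (illustrative value,
# not the paper's identified coefficients).
k_true = 0.3
vels = [0.5, 1.0, 1.5, 2.0, -1.0, -2.0]
accels = [-k_true * v * abs(v) for v in vels]
k_hat = fit_quadratic_drag(vels, accels)
```

Once k is identified, subtracting the predicted drag from the measured external force leaves a residual attributable to wind or contact, which is the real-time detection use the abstract describes.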
|
|
13:30-15:00, Paper TuBT23-NT.6 | Add to My Program |
Flight Validation of a Global Singularity-Free Aerodynamic Model for Flight Control of Tail Sitters |
|
Murali, Krishna | ISAE-SUPAERO |
Ponce Moreno, Elena | ISAE-SUPAERO |
Lustosa, Leandro | ISAE-SUPAERO |
Keywords: Aerial Systems: Mechanics and Control
Abstract: This work validates through flight tests a previously developed wide-envelope singularity-free aerodynamic framework, called phi-theory, for modeling dual-engine tail-sitting flying-wing vehicles for optimization-based control. The phi-theory methodology imposes a specific geometry on aerodynamic coefficients that leads to polynomial differential equations of motion amenable to semidefinite programming optimization. Through phi-theory, we illustrate a typical predicted longitudinal and lateral flight envelope of a tail-sitting vehicle, which, while commonplace for fixed-wing aircraft in performance textbooks, is a novel figure that generalizes fixed-wing doghouse plots to tail-sitting vehicles. This flight envelope figure suggests a novel, natural and intuitive remote piloting interface that we validate in flight tests. Furthermore, we further validate phi-theory through the computation of flight features in simulation and their subsequent observation in flight tests.
|
|
13:30-15:00, Paper TuBT23-NT.7 | Add to My Program |
Empirical Study of Ground Proximity Effects for Small-Scale Electroaerodynamic Thrusters |
|
Nations, Grant | University of Utah |
Nelson, Charles Luke | University of Utah |
Drew, Daniel S. | University of Utah |
Keywords: Aerial Systems: Mechanics and Control, Micro/Nano Robots, Actuation and Joint Mechanisms
Abstract: Electroaerodynamic (EAD) propulsion, where thrust is produced by collisions between electrostatically-accelerated ions and neutral air, is a potentially transformative method for indoor flight owing to its silent and solid-state nature. Like rotors, EAD thrusters exhibit changes in performance based on proximity to surfaces. Unlike rotors, they have no fragile and quickly spinning parts that have to avoid those surfaces; taking advantage of the efficiency benefits from proximity effects may be a route towards longer-duration indoor operation of ion-propelled fliers. This work presents the first empirical study of ground proximity effects for EAD propulsors, both individually and as quad-thruster arrays. It focuses on multi-stage ducted centimeter-scale actuators suitable for use on small robots envisioned for deployment in human-proximal and indoor environments. Three specific effects (ground, suckdown, and fountain lift), each occurring with a different magnitude at a different spacing from the ground plane, are investigated and shown to have strong dependencies on geometric parameters including thruster-to-thruster spacing, thruster protrusion from the fuselage, and inclusion of flanges or strakes. Peak thrust enhancement ranging from 300 to 600% is found for certain configurations operated in close proximity (0.2 mm) to the ground plane and as much as a 20% increase is measured even when operated centimeters away.
|
|
13:30-15:00, Paper TuBT23-NT.8 | Add to My Program |
The Weighted Markov-Dubins Problem |
|
Kumar, Deepak Prakash | Texas A&M University |
Darbha, Swaroop | TAMU |
Manyam, Satyanarayana Gupta | Infoscitex Corp |
Casbeer, David | AFRL |
Keywords: Aerial Systems: Mechanics and Control, Motion and Path Planning
Abstract: In this letter, a variation of the classical Markov-Dubins problem is considered, which deals with curvature-constrained least-cost paths in a plane with prescribed initial and final configurations, different bounds for the sinistral and dextral curvatures, and penalties μL and μR for the sinistral and dextral turns, respectively. The addressed problem generalizes the classical Markov-Dubins problem and the asymmetric sinistral/dextral Markov-Dubins problem. The proposed formulation can be used to model an Unmanned Aerial Vehicle (UAV) with a penalty associated with a turn due to the additional thrust required to maintain altitude and airspeed while turning, or a UAV with different curvature bounds and costs for the sinistral and dextral turns due to hardware failures. Using optimal control theory, the main result of this letter shows that the optimal path belongs to a set of at most 21 candidate paths, each comprising at most five segments. Unlike in the classical Markov-Dubins problem, the CCC path, which is a candidate path for the classical Markov-Dubins problem, is not optimal for the weighted Markov-Dubins problem. Moreover, the obtained list of candidate paths for the weighted Markov-Dubins problem reduces to the standard CSC and CCC paths and the corresponding degenerate paths when μL and μR approach zero.
|
|
13:30-15:00, Paper TuBT23-NT.9 | Add to My Program |
The Price of a Safe Flight: Risk Cost Based Path Planning |
|
Pilko, Aliaksei | University of Southampton |
Oakey, Andy | University of Southampton |
Ferraro, Mario | University of Southampton |
Scanlan, James | University of Southampton |
Keywords: Aerial Systems: Mechanics and Control, Motion and Path Planning, Robot Safety
Abstract: A risk-aware UAS path planning methodology is proposed using monetary value as the sole cost metric. A third-party ground risk model is used to generate a non-uniform costmap for a modified A* heuristic search. The Value of a Prevented Fatality provides a basis to convert fatality risk to monetary terms as a Human Value at Risk (HVaR) measure. Additional operating and UAS Capital Value at Risk (CVaR) costs are modelled to provide a holistic monetary cost model for path cost minimisation. A number of future cost variants are investigated, based upon prior work, for a realistic urban-rural mixed logistics case study in Southern England. Results show increasingly risk-averse paths with decreasing future UAS operating costs.
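The planning step described above, an A* search over a costmap whose cells carry monetary traversal costs, can be illustrated with a minimal sketch (the grid, per-cell costs, and 4-connected moves are hypothetical values for illustration, not taken from the paper's risk model):

```python
import heapq

def a_star(costmap, start, goal):
    """A* over a grid where each cell carries a monetary traversal cost
    (e.g. operating cost plus risk cost). The heuristic is Manhattan
    distance scaled by the cheapest cell cost, which keeps it admissible."""
    rows, cols = len(costmap), len(costmap[0])
    cmin = min(min(row) for row in costmap)
    h = lambda p: cmin * (abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + costmap[nxt[0]][nxt[1]]  # pay the cost of the cell entered
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return float("inf"), []

# A high-cost (e.g. populated, high-HVaR) column in the middle of the grid
# pushes the cheapest path into a detour around it.
grid = [
    [1, 1, 9, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
]
cost, path = a_star(grid, (0, 0), (0, 4))  # cost 8, detours through row 2
```

Because the heuristic never overestimates the remaining cost, the first time the goal is popped the path is cost-optimal, and the expensive column is detoured around, in the same way the paper's risk-averse paths avoid high-risk ground.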
|
|
TuBT24-NT Oral Session, NT-G402 |
Add to My Program |
Multi-Robot SLAM |
|
|
Chair: Kim, Ayoung | Seoul National University |
Co-Chair: Beltrame, Giovanni | Ecole Polytechnique De Montreal |
|
13:30-15:00, Paper TuBT24-NT.1 | Add to My Program |
Tight Fusion of Odometry and Kinematic Constraints for Multiple Aerial Vehicles in Physical Interconnection |
|
Fan, Yingjun | Beijing Institute of Technology |
Shi, Chuanbeibei | University of Bristol |
Lai, Ganghua | Beijing Institute of Technology |
Zhang, Ruiheng | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Sun, Fuchun | Tsinghua University |
Dong, Yiqun | Nanyang Technological University |
Keywords: Multi-Robot SLAM, Aerial Systems: Perception and Autonomy, Visual-Inertial SLAM
Abstract: Integrated Aerial Platforms (IAPs), comprising multiple aircraft, are typically fully actuated and hold significant potential for aerial manipulation tasks. Differing from a conventional aerial swarm, the aircraft within an IAP are interconnected, presenting promising opportunities for enhancing localization. Incorporating the physical constraints of these interconnected aircraft to improve the accuracy and reliability of integrated aircraft positioning and navigation systems is a challenging yet highly significant problem. In this paper, we introduce a distributed multi-aircraft visual-inertial-range odometry system that analyzes the position, velocity, and attitude constraints within the IAP. Leveraging constraint relationships in the IAP, we propose corresponding methods that tightly fuse visual-inertial-range odometry and kinematic constraints to optimize odometry accuracy. Our system's performance is validated using a collected dataset, resulting in a notable 28.7% reduction in drift compared to the baseline.
|
|
13:30-15:00, Paper TuBT24-NT.2 | Add to My Program |
Robust Multi-Robot Global Localization with Unknown Initial Pose Based on Neighbor Constraints |
|
Zhang, Yaojie | Shenzhen Institute of Advanced Technology, Chinese Academy |
Luo, Haowen | Shenzhen Institute of Advanced Technology, Chinese Academy |
Wang, Weijun | Guangzhou Institute of Advanced Technology, Chinese Academy of Sc |
Feng, Wei | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: Multi-Robot SLAM, Localization, Mapping
Abstract: Multi-robot global localization (MR-GL) with unknown initial positions in a large-scale environment is a challenging task. The key difficulty is data association across different robots' viewpoints, which also renders traditional appearance-based localization methods unusable. Recently, researchers have utilized objects' semantic invariance to generate semantic graphs to address this issue. However, previous works lack robustness and are sensitive to the overlap rate of maps, resulting in unpredictable performance in real-world environments. In this paper, we propose a data association algorithm based on neighbor constraints to improve the robustness of the system. We demonstrate the effectiveness of our method on three different datasets, indicating a significant improvement in robustness compared to previous works.
|
|
13:30-15:00, Paper TuBT24-NT.3 | Add to My Program |
Swarm-SLAM: Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems |
|
Lajoie, Pierre-Yves | École Polytechnique De Montréal |
Beltrame, Giovanni | Ecole Polytechnique De Montreal |
Keywords: Multi-Robot SLAM, Multi-Robot Systems, SLAM
Abstract: Collaborative Simultaneous Localization And Mapping (C-SLAM) is a vital component for successful multi-robot operations in environments without an external positioning system, such as indoors, underground or underwater. In this paper, we introduce Swarm-SLAM, an open-source C-SLAM system that is designed to be scalable, flexible, decentralized, and sparse, which are all key properties in swarm robotics. Our system supports lidar, stereo, and RGB-D sensing, and it includes a novel inter-robot loop closure prioritization technique that reduces communication and accelerates convergence. We evaluated our ROS 2 implementation on five different datasets, and in a real-world experiment with three robots communicating through an ad-hoc network. Our code is publicly available: https://github.com/MISTLab/Swarm-SLAM
|
|
13:30-15:00, Paper TuBT24-NT.4 | Add to My Program |
AutoFusion: Autonomous Visual Geolocation and Online Dense Reconstruction for UAV Cluster |
|
Zhang, Yizhu | Northwestern Polytechnical University |
Bu, Shuhui | Northwestern Polytechnical University |
Dong, Yifei | Northwestern Polytechnical University |
Yu, Zhang | Northwestern Polytechnical University |
Li, Kun | Northwestern Polytechnical University |
Chen, Lin | Northwestern Polytechnical University |
Keywords: Multi-Robot SLAM, SLAM, Mapping
Abstract: Real-time dense reconstruction using Unmanned Aerial Vehicles (UAVs) is becoming increasingly popular in large-scale rescue and environmental monitoring tasks. However, due to the energy constraints of a single UAV, efficiency can be greatly improved through the collaboration of multiple UAVs. Nevertheless, when faced with unknown environments or the loss of Global Navigation Satellite System (GNSS) signals, most multi-UAV SLAM systems cannot operate, making it difficult to construct a globally consistent map. In this paper, we propose a real-time dense reconstruction system called AutoFusion for multiple UAVs, which robustly supports scenarios with lost global positioning and weak co-visibility. We propose a Visual Geolocation and Matching Network (VGMN), constructed with a graph convolutional neural network as a feature extractor, which can acquire geographical location information solely from images. We also present a real-time dense reconstruction framework for multi-UAV systems with autonomous visual geolocation. UAV agents send images and relative positions to the ground server, which processes the data using VGMN for multi-agent geolocation optimization, including initialization, pose graph optimization, and map fusion. Extensive experiments demonstrate that our system can efficiently and stably construct large-scale dense maps in real-time with high accuracy and robustness.
|
|
13:30-15:00, Paper TuBT24-NT.5 | Add to My Program |
CoLRIO: LiDAR-Ranging-Inertial Centralized State Estimation for Robotic Swarms |
|
Zhong, Shipeng | Sun Yat-Sen University |
Chen, Hongbo | Sun Yat-Sen University |
Qi, Yuhua | Sun Yat-Sen University |
Feng, Dapeng | Sun Yat-Sen University |
Chen, Zhiqiang | Sun Yat-Sen University |
Jin, Wu | UESTC |
Wen, Weisong | Hong Kong Polytechnic University |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Multi-Robot SLAM, SLAM, Range Sensing
Abstract: Collaborative state estimation using heterogeneous multi-sensors is a fundamental prerequisite for robotic swarms operating in GPS-denied environments, presenting a formidable research challenge. In this work, we propose a centralized system designed to facilitate collaborative LiDAR-ranging-inertial state estimation in expansive environments, enabling robotic swarms to operate without the need for anchor deployment. The system optimally distributes computationally intensive tasks to a potent central server, thereby alleviating the computational burden on individual robots for local odometry calculations. The server back-end establishes a global reference by harnessing the shared data, refining the joint pose graph optimization through place recognition, global optimization, and the removal of redundant data to ensure precise and robust collaborative state estimation. Extensive evaluations of our system using both public and our custom datasets showcase notable improvements in the accuracy of collaborative SLAM estimates. Furthermore, our system demonstrates its competence in large-scale missions, where ten robots collaborate seamlessly in performing SLAM tasks. To benefit the community, we will open-source our code at https://github.com/PengYu-team/Co-LRIO.
|
|
13:30-15:00, Paper TuBT24-NT.6 | Add to My Program |
Relative Localization Estimation for Multiple Robots Via the Rotating Ultra-Wideband Tag |
|
Liu, Jinxin | Nanyang Technological University |
Hu, Guoqiang | Nanyang Technological University |
Keywords: Range Sensing, Multi-Robot Systems, Localization
Abstract: Most distributed algorithms for robot coordination require relative location information, but how to obtain relative locations in a distributed manner remains a primary problem in multi-robot applications. To obtain the relative locations between robots, whether they are in relative motion or stationary, we design a rotating ultra-wideband tag that provides the persistency of excitation condition, together with two estimation algorithms that estimate the relative locations in a distributed manner. Moreover, our approach relies only on on-board sensors and requires only one ultra-wideband tag per robot, eliminating the need for any ground anchors and thus allowing deployment in GNSS-denied environments without range restrictions. The proposed approach is also tested in simulations and experiments to verify the theoretical findings and its effectiveness in practice.
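To illustrate why a rotating tag makes range-only relative localization observable, the following sketch recovers a peer robot's 2D position via Gauss-Newton from ranges measured at different tag offsets (the lever-arm length, sample count, and positions are invented for illustration; the paper's distributed estimators are more involved):

```python
import numpy as np

def estimate_relative_position(samples, x0, iters=50):
    """Gauss-Newton on range-only measurements. Each sample pairs the
    rotating tag's 2D lever-arm offset with the measured range; the
    changing offset supplies the persistency of excitation that makes
    the peer's relative position observable from ranges alone."""
    x = np.asarray(x0, dtype=float)
    offsets = np.array([o for o, _ in samples])
    ranges = np.array([m for _, m in samples])
    for _ in range(iters):
        u = x + offsets                    # candidate antenna-to-antenna vectors
        d = np.linalg.norm(u, axis=1)
        r = d - ranges                     # range residuals
        J = u / d[:, None]                 # d(range)/d(x): unit direction vectors
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x

# Hypothetical setup: peer robot at (3, 2), tag rotating on a 0.2 m arm.
p_true = np.array([3.0, 2.0])
angles = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
samples = [(0.2 * np.array([np.cos(a), np.sin(a)]),
            np.linalg.norm(p_true + 0.2 * np.array([np.cos(a), np.sin(a)])))
           for a in angles]
x_est = estimate_relative_position(samples, x0=(2.5, 1.5))
```

With a static tag (all offsets identical) the Jacobian rows would be identical and the normal matrix singular; the rotation is exactly what restores full rank.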
|
|
13:30-15:00, Paper TuBT24-NT.7 | Add to My Program |
Asynchronous Multiple LiDAR-Inertial Odometry Using Point-Wise Inter-LiDAR Uncertainty Propagation |
|
Jung, Minwoo | Seoul National University |
Jung, Sangwoo | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Range Sensing, SLAM, Mapping
Abstract: In recent years, multiple Light Detection and Ranging (LiDAR) systems have grown in popularity due to their enhanced accuracy and stability from the increased field of view (FOV). However, integrating multiple LiDARs can be challenging, attributable to temporal and spatial discrepancies. Common practice is to transform points among sensors while requiring strict time synchronization or approximating transformation among sensor frames. Unlike existing methods, we elaborate the inter-sensor transformation using continuous time (CT) inertial measurement unit (IMU) modeling and derive associated ambiguity as a point-wise uncertainty. This uncertainty, modeled by combining the state covariance with the acquisition time and point range, allows us to alleviate the strict time synchronization and to overcome FOV difference. The proposed method has been validated on both public and our datasets and is compatible with various LiDAR manufacturers and scanning patterns. We open-source the code for public access at https://github.com/minwoo0611/MA-LIO.
|
|
13:30-15:00, Paper TuBT24-NT.8 | Add to My Program |
AutoMerge: A Framework for Map Assembling and Smoothing in City-Scale Environments |
|
Yin, Peng | City University of Hong Kong |
Zhao, Shiqi | University of California San Diego |
Lai, Haowen | University of Pennsylvania |
Ge, Ruohai | Carnegie Mellon University |
Zhang, Ji | Carnegie Mellon University |
Choset, Howie | CMU |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Map Merging, Mapping, Multi-Robot Systems, SLAM
Abstract: In the era of advancing autonomous driving and increasing reliance on geospatial information, high-precision mapping demands not only accuracy but also flexible construction. Current approaches mainly rely on expensive mapping devices, which are time-consuming for city-scale map construction and vulnerable to erroneous data associations without accurate GPS assistance. We present AutoMerge, a novel framework for merging large-scale maps that surpasses these limitations, which (i) provides robust place recognition performance despite differences in both translation and viewpoint, (ii) is capable of identifying and discarding incorrect loop closures caused by perceptual aliasing, and (iii) effectively associates and optimizes large-scale and numerous map segments in real-world scenarios. AutoMerge utilizes multi-perspective fusion and adaptive loop closure detection for accurate data associations, and it uses incremental merging to assemble large maps from individual trajectory segments given in random order and with no initial estimations. Furthermore, AutoMerge performs pose-graph optimization after assembling the segments to smooth the merged map globally. We demonstrate AutoMerge
|
|
TuBT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization II |
|
|
Chair: Civera, Javier | Universidad De Zaragoza |
Co-Chair: Milford, Michael J | Queensland University of Technology |
|
13:30-15:00, Paper TuBT25-NT.1 | Add to My Program |
TP3M: Transformer-Based Pseudo 3D Image Matching with Reference Image |
|
Han, Liming | China Unicom |
Liu, Zhaoxiang | China Unicom |
Lian, Shiguo | China Unicom |
Keywords: Localization, SLAM
Abstract: Image matching remains challenging in scenes with large viewpoint or illumination changes, or with low texture. In this paper, we propose a Transformer-based pseudo-3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image, and matches them to the 2D features extracted from the destination image by coarse-to-fine 3D matching. Our key discovery is that by introducing the reference image, the source image's fine points are screened and their feature descriptors are further enriched from 2D to 3D, which improves matching performance with the destination image. Experimental results on multiple datasets show that the proposed method achieves state-of-the-art performance on the tasks of homography estimation, pose estimation and visual localization, especially in challenging scenes.
|
|
13:30-15:00, Paper TuBT25-NT.2 | Add to My Program |
Adaptive Outlier Thresholding for Bundle Adjustment in Visual SLAM |
|
Fontan, Alejandro | Queensland University of Technology |
Civera, Javier | Universidad De Zaragoza |
Milford, Michael J | Queensland University of Technology |
Keywords: Localization, SLAM
Abstract: State-of-the-art V-SLAM pipelines utilize robust cost functions and outlier rejection techniques to remove incorrect correspondences. However, these methods are typically fine-tuned to overfit certain benchmarks and struggle to adapt effectively to changes in the application domain or environmental conditions. This renders them impractical for many robotic applications in which robustness in a wide variety of conditions is essential. In this paper we introduce a novel distribution-based approach for online outlier rejection that reduces the necessity for scene-specific fine-tuning while simultaneously improving the overall SLAM performance. Through experiments across 3 different public datasets, we show that our approach consistently outperforms state-of-the-art methods in various real-world settings. Our code is available at https://github.com/alejandrofontan/ORB_SLAM2_Distribution
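As a generic sketch of the distribution-based idea (not the paper's actual estimator), the rejection threshold below is derived from a quantile of the residuals currently observed rather than from a fixed, benchmark-tuned constant; the quantile and inflation factor are hypothetical:

```python
import numpy as np

def adaptive_threshold(residuals, quantile=0.9, inflate=1.5):
    """Distribution-based outlier cutoff: the threshold adapts to the
    residual distribution actually observed in the current scene,
    instead of relying on a fixed, pre-tuned constant."""
    return inflate * np.quantile(residuals, quantile)

rng = np.random.default_rng(0)
inliers = np.abs(rng.normal(0.0, 1.0, 1000))   # well-behaved reprojection errors
outliers = rng.uniform(20.0, 40.0, 20)          # gross mismatches
residuals = np.concatenate([inliers, outliers])

t = adaptive_threshold(residuals, quantile=0.9)
kept = residuals[residuals <= t]                # all gross mismatches rejected
```

If the scene changes and the inlier noise level doubles, the quantile (and hence the cutoff) moves with it, which is the adaptivity a fixed chi-square-style threshold lacks.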
|
|
13:30-15:00, Paper TuBT25-NT.3 | Add to My Program |
From Satellite to Ground: Satellite Assisted Visual Localization with Cross-View Semantic Matching |
|
Guo, Xiyue | Zhejiang University |
Peng, Haocheng | Zhejiang University |
Hu, Junjie | The Chinese University of Hong Kong, Shenzhen |
Bao, Hujun | Zhejiang University |
Zhang, Guofeng | Zhejiang University |
Keywords: Localization, SLAM
Abstract: One of the key challenges of visual Simultaneous Localization and Mapping (SLAM) in large-scale environments is how to effectively use global localization to correct the cumulative errors from long-term tracking. This challenge presents itself in two main aspects: first, the difficulty for robots in revisiting previous locations to perform loop closure, and second, the considerable memory resources required to maintain point-cloud-based global maps. Recent solutions have resorted to neural networks, using satellite images as references for ground-level localization. However, most of these methods merely provide cross-view patch-matching results, which makes integration with a SLAM system infeasible. To address these issues, we present a semantic-based cross-view localization method. This approach combines semantic information with a reward and penalty mechanism, enabling us to obtain a global probability map and achieve precise 3-degree-of-freedom (3-DoF) localization. Based on that, we develop a SLAM system that capitalizes on satellite imagery for global localization. This strategy effectively bridges the gap between SLAM and real-world coordinates while also substantially reducing accumulated errors. Our experimental results demonstrate that our global localization method significantly outperforms existing satellite-based systems. Moreover, in scenarios where the robot struggles to find loop closures, employing our localization method improves SLAM accuracy.
|
|
13:30-15:00, Paper TuBT25-NT.4 | Add to My Program |
Self-Supervised Learning of Monocular Visual Odometry and Depth with Uncertainty-Aware Scale Consistency |
|
Wang, Changhao | Northwestern Polytechnical University |
Zhang, Guanwen | Northwestern Polytechnical University |
Zhou, Wei | Northwestern Polytechnical University |
Keywords: Localization, SLAM
Abstract: The inherent scale ambiguity issue greatly limits the performance of monocular visual odometry. In recent years, a variety of methods have been proposed for self-supervised learning of ego-motion and depth estimation, incorporating specifically designed scale-consistency constraints that utilize estimated depth as a reference. However, these existing methods neglect the influence of the depth uncertainty introduced by the dominant photometric loss, which leads to unreliable depth estimation in difficult regions and detrimentally affects scale alignment. To solve these problems, we introduce a feature-based visual odometry learning system with an effective scale recovery strategy in this paper. Additionally, we propose a learning method to estimate the photometric-sensitive depth uncertainty for guiding the scale recovery. The proposed method is evaluated on KITTI odometry, and the experimental results demonstrate that our system can predict scale-consistent trajectories from monocular videos and achieves state-of-the-art performance. Moreover, the proposed method achieves competitive performance on KITTI depth estimation.
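The role of uncertainty in scale recovery can be illustrated with a weighted least-squares alignment between a monocular depth prediction and a reference (a generic sketch on invented data; the paper's photometric-sensitive uncertainty is learned, not hand-set as here):

```python
import numpy as np

def recover_scale(pred_depth, ref_depth, uncertainty):
    """Weighted least-squares scale s minimizing sum w*(s*pred - ref)^2,
    with w = 1/uncertainty^2 so unreliable depths count less."""
    w = 1.0 / (uncertainty ** 2 + 1e-12)
    return np.sum(w * pred_depth * ref_depth) / np.sum(w * pred_depth ** 2)

rng = np.random.default_rng(1)
true_depth = rng.uniform(1.0, 10.0, 500)
pred = true_depth / 4.0                     # monocular prediction, off by scale 4
sigma = np.full(500, 0.1)
sigma[:50] = 5.0                            # low-confidence (e.g. textureless) region
pred[:50] += rng.normal(0.0, 3.0, 50)       # which is indeed corrupted

s = recover_scale(pred, true_depth, sigma)  # close to the true scale of 4
```

Without the weights, the corrupted region would bias the recovered scale; with them, the 10% unreliable pixels contribute almost nothing, which is the intuition behind uncertainty-guided scale alignment.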
|
|
13:30-15:00, Paper TuBT25-NT.5 | Add to My Program |
Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments |
|
García-Hernández, Alberto | Universidad De Zaragoza |
Giubilato, Riccardo | German Aerospace Center (DLR) |
Strobl, Klaus H. | German Aerospace Center (DLR) |
Civera, Javier | Universidad De Zaragoza |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Localization, SLAM, Mapping
Abstract: Perceptual aliasing and weak textures pose significant challenges to the task of place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, called UMF (standing for Unifying Local and Global Multimodal Features), that 1) leverages multi-modality through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that uses local feature matching to re-order the top-k candidates retrieved with a global representation. Our experiments, particularly on sequences captured in a planetary-analogue environment, show that UMF significantly outperforms previous baselines in these challenging, aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also explore its performance on the widely used RobotCar dataset for broader applicability. Code and models are available at https://github.com/DLR-RM/UMF.
|
|
13:30-15:00, Paper TuBT25-NT.6 | Add to My Program |
RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments |
|
Chen, Zhiqiang | Sun Yat-Sen University |
Chen, Hongbo | Sun Yat-Sen University |
Qi, Yuhua | Sun Yat-Sen University |
Zhong, Shipeng | Sun Yat-Sen University |
Feng, Dapeng | Sun Yat-Sen University |
Jin, Wu | UESTC |
Wen, Weisong | Hong Kong Polytechnic University |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Localization, SLAM, Range Sensing
Abstract: LiDAR-based localization is valuable for applications like mining surveys and underground facility maintenance. However, existing methods can struggle when dealing with uninformative geometric structures in challenging scenarios. This paper presents RELEAD, a LiDAR-centric solution designed to address scan-matching degradation. Our method enables degeneracy-free point cloud registration by solving constrained ESIKF updates in the front end and incorporates multisensor constraints, even when dealing with outlier measurements, through graph optimization based on Graduated Non-Convexity (GNC). Additionally, we propose a robust Incremental Fixed Lag Smoother (rIFL) for efficient GNC-based optimization. RELEAD has undergone extensive evaluation in degenerate scenarios and has outperformed existing state-of-the-art LiDAR-Inertial odometry and LiDAR-Visual-Inertial odometry methods.
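Graduated Non-Convexity (GNC) can be sketched in one dimension, here as a robust mean with a Geman-McClure cost rather than the paper's multisensor graph optimization (all constants are illustrative): the surrogate cost starts near-convex (large mu) and is annealed toward the robust cost, re-solving a weighted least-squares problem at each step.

```python
import numpy as np

def gnc_gm_mean(z, c=1.0):
    """Robust scalar estimate via GNC with a Geman-McClure cost.
    mu starts large (near-convex surrogate) and anneals toward 1,
    downweighting outliers gradually instead of needing a good init."""
    x = np.mean(z)                               # naive initialization
    r = z - x
    mu = max(2.0 * np.max(r ** 2) / c ** 2, 1.0)
    for _ in range(100):
        w = (mu * c ** 2 / (r ** 2 + mu * c ** 2)) ** 2   # GNC-GM weights in [0,1]
        x = np.sum(w * z) / np.sum(w)                     # weighted least squares
        r = z - x
        if mu <= 1.0:
            break
        mu = max(mu / 1.4, 1.0)                           # anneal the surrogate
    return x

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(5.0, 0.1, 80),   # inlier measurements
                    rng.normal(50.0, 1.0, 20)]) # gross outliers
x_robust = gnc_gm_mean(z, c=0.5)                # near 5.0; plain mean is ~14
```

The same weight-then-re-solve loop applies when the unknowns are poses and the residuals come from odometry factors; the scalar case just makes the annealing behavior easy to see.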
|
|
13:30-15:00, Paper TuBT25-NT.7 | Add to My Program |
Semantic-Focused Patch Tokenizer with Multi-Branch Mixer for Visual Place Recognition |
|
Xu, Zhenyu | CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems |
Ziliang, Ren | Dongguan University of Technology |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Jie, Lou | China Nuclear Power Operations Co., Ltd |
Tao, Dacheng | The University of Sydney |
Cheng, Jun | Shenzhen Institutes of Advanced Technology |
Keywords: Localization, SLAM, Recognition
Abstract: Visual Place Recognition (VPR) is critical for navigation and loop closure in autonomous driving tasks, mitigating the impact of shift errors caused by dynamic changes in the environment. Due to the limited ability of backbone networks and extreme environmental changes, current methods fail to capture the foundational semantic details that carry the distinctive attributes needed for unique place identification. To address this problem, we propose a new visual token-guided VPR framework that contains a semantic-focused patch tokenizer and a multi-branch Mixer. To mitigate interference from place-unrelated objects, the semantic-focused patch tokenizer exploits attention-based channel selection and spatial partitioning, which efficiently captures important semantic information within the channels and preserves spatial relationships among the backbone features. To extract abstract features with spatial structure information, the multi-branch Mixer utilizes a multi-branch structure to aggregate local and global position information, improving the robustness of global representations to environmental changes. Experimental results demonstrate that our method outperforms state-of-the-art methods, achieving 85.3% Recall@1 on the MSLS_val dataset and 59.1% Recall@1 on the Nordland dataset when using ResNet18 as the backbone.
|
|
13:30-15:00, Paper TuBT25-NT.8 | Add to My Program |
FF-LINS: A Consistent Frame-To-Frame Solid-State-LiDAR-Inertial State Estimator |
|
Tang, Hailiang | Wuhan University |
Zhang, Tisheng | Wuhan University |
Niu, Xiaoji | Wuhan University |
Wang, Liqiang | Wuhan University |
Wei, Linfu | Wuhan University |
Jingnan, Liu | Wuhan University |
Keywords: Localization, SLAM, Visual-Inertial SLAM
Abstract: Most existing LiDAR-inertial navigation systems are based on frame-to-map registration, leading to inconsistency in state estimation. The newest solid-state LiDARs with non-repetitive scanning patterns make it possible to achieve a consistent LiDAR-inertial estimator by employing frame-to-frame data association. In this letter, we propose a robust and consistent frame-to-frame LiDAR-inertial navigation system (FF-LINS) for solid-state LiDARs. With INS-centric LiDAR frame processing, the keyframe point-cloud map is built from the accumulated point clouds to construct the frame-to-frame data association. The LiDAR frame-to-frame and inertial measurement unit (IMU) preintegration measurements are tightly integrated using factor graph optimization, with online calibration of the LiDAR-IMU extrinsic and time-delay parameters. Experiments on public and private datasets demonstrate that the proposed FF-LINS achieves superior accuracy and robustness compared to state-of-the-art systems. Furthermore, the LiDAR-IMU extrinsic and time-delay parameters are estimated effectively, and the online calibration notably improves the pose accuracy.
|
|
13:30-15:00, Paper TuBT25-NT.9 | Add to My Program |
VioLA: Aligning Videos to 2D LiDAR Scans |
|
Chao, Jun-Jee | University of Minnesota |
Engin, Kazim Selim | Samsung Research America |
Chavan-Dafle, Nikhil | Samsung Research America |
Lee, Bhoram | SRI International |
Isler, Volkan | University of Minnesota |
Keywords: Localization, View Planning for SLAM, RGB-D Perception
Abstract: We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed point completion module improves the pose registration performance by up to 20%.
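Two of the steps above, slicing the reconstruction at a fixed height and registering the slice to a 2D map, can be sketched as follows (a toy example with known correspondences; VioLA's actual registration must also establish the matches, and the cloud, height, and pose are invented for illustration):

```python
import numpy as np

def slice_at_height(points, z0, tol=0.1):
    """Keep 3D points within a band around height z0 and drop z,
    yielding a 2D slice comparable to a 2D LiDAR scan."""
    mask = np.abs(points[:, 2] - z0) < tol
    return points[mask, :2]

def align_2d(src, dst):
    """Closed-form 2D rigid alignment (Kabsch/Procrustes) between
    corresponded point sets; real pipelines use e.g. ICP to find matches."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dc - R @ sc
    return R, t

rng = np.random.default_rng(3)
cloud = rng.uniform(-5.0, 5.0, (2000, 3))       # stand-in reconstructed scene
sl = slice_at_height(cloud, z0=1.0)

theta = 0.3                                      # ground-truth pose of the slice
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
scan = sl @ R_true.T + t_true                    # the "LiDAR map" view of the slice
R, t = align_2d(sl, scan)                        # recovers R_true, t_true
```

The recovered (R, t) is exactly the camera-to-map pose the paper seeks; the inpainting and depth-completion stages exist to make the slice dense enough for this registration to succeed.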
|
|
TuBT26-NT Oral Session, NT-G404 |
Add to My Program |
Mapping I |
|
|
Chair: Tan, U-Xuan | Singapore University of Techonlogy and Design |
|
13:30-15:00, Paper TuBT26-NT.1 | Add to My Program |
Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps |
|
Luo, Katie | Cornell University |
Weng, Xinshuo | NVIDIA Corporation |
Wang, Yan | NVIDIA |
Wu, Shuang | Nvidia |
Li, Jie | Toyota Research Institute |
Weinberger, Kilian | Cornell University |
Wang, Yue | USC |
Pavone, Marco | Stanford University |
Keywords: Mapping, Deep Learning for Visual Perception, Autonomous Vehicle Navigation
Abstract: Autonomous driving has traditionally relied heavily on costly and labor-intensive High Definition (HD) maps, hindering scalability. In contrast, Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative. In this work, we systematically explore the effect of SD maps for real-time lane-topology understanding. We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SMERF, to leverage priors in SD maps for the lane-topology prediction task. This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods without bells and whistles and can be immediately incorporated into any Transformer-based lane-topology method. Code is available at https://github.com/NVlabs/SMERF.
|
|
13:30-15:00, Paper TuBT26-NT.2 | Add to My Program |
3QFP: Efficient Neural Implicit Surface Reconstruction Using Tri-Quadtrees and Fourier Feature Positional Encoding |
|
Sun, Shuo | Orebro University |
Mielle, Malcolm | Schindler |
Lilienthal, Achim J. | Orebro University |
Magnusson, Martin | Örebro University |
Keywords: Mapping, Deep Learning for Visual Perception, SLAM
Abstract: Neural implicit surface representations are currently receiving a lot of interest as a means to achieve high-fidelity surface reconstruction at a low memory cost, compared to traditional explicit representations. However, state-of-the-art methods still struggle with excessive memory usage and non-smooth surfaces. This is particularly problematic in large-scale applications with sparse inputs, as is common in robotics use cases. To address these issues, we first introduce a sparse structure, tri-quadtrees, which represents the environment using learnable features stored in three planar quadtree projections. Second, we concatenate the learnable features with a Fourier feature positional encoding. The combined features are then decoded into signed distance values through a small multi-layer perceptron. We demonstrate that this approach facilitates smoother reconstruction with a higher completion ratio and fewer holes. Compared to two recent baselines, one implicit and one explicit, our approach requires only 10%–50% as much memory while achieving competitive quality.
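The Fourier feature positional encoding mentioned above can be sketched in a few lines (the band count and the [0,1] coordinate range are illustrative choices, not the paper's settings):

```python
import numpy as np

def fourier_features(x, num_bands=6):
    """Map coordinates x in [0,1]^d to [sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0..num_bands-1, giving a small MLP access to high-frequency
    detail that raw low-dimensional coordinates cannot express."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi     # (num_bands,)
    ang = x[..., None] * freqs                      # (..., d, num_bands)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)           # (..., d * 2 * num_bands)

pts = np.random.default_rng(4).uniform(0.0, 1.0, (8, 3))  # 3D query points
feat = fourier_features(pts, num_bands=6)                  # shape (8, 36)
```

In a pipeline like the one described, this encoding would be concatenated with the learnable quadtree features before the signed-distance MLP.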
|
|
13:30-15:00, Paper TuBT26-NT.3 | Add to My Program |
Towards Large-Scale Incremental Dense Mapping Using Robot-Centric Implicit Neural Representation |
|
Liu, Jianheng | Harbin Institute of Technology Shenzhen, P.R. China |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Mapping, Deep Learning Methods, Range Sensing
Abstract: Large-scale dense mapping is vital in robotics, digital twins, and virtual reality. Recently, implicit neural mapping has shown remarkable reconstruction quality. However, incremental large-scale mapping with implicit neural representations remains problematic due to low efficiency, limited video memory, and the catastrophic forgetting phenomenon. To counter these challenges, we introduce the Robot-centric Implicit Mapping (RIM) technique for large-scale incremental dense mapping. This method employs a hybrid representation, encoding shapes with implicit features via a multi-resolution voxel map and decoding signed distance fields through a shallow MLP. We advocate for a robot-centric local map to boost model training efficiency and curb the catastrophic forgetting issue. A decoupled scalable global map is further developed to archive learned features for reuse and maintain constant video memory consumption. Validation experiments demonstrate our method's exceptional quality, efficiency, and adaptability across diverse scales and scenes over advanced dense mapping methods using range sensors. Our system's code will be accessible at https://github.com/HITSZ-NRSL/RIM.git.
|
|
13:30-15:00, Paper TuBT26-NT.4 | Add to My Program |
Camera Relocalization in Shadow-Free Neural Radiance Fields |
|
Xu, Shiyao | Institute for AI Industry Research, Tsinghua University |
Liu, Caiyun | Institute for AI Industry Research, Tsinghua University |
Chen, Yuantao | Xi'an University of Architecture and Technology |
Zhu, Zhenxin | Beihang University |
Yan, Zike | Tsinghua University, Peking University |
Shi, Yongliang | Tsinghua University |
Zhao, Hao | Tsinghua University |
Zhou, Guyue | Tsinghua University |
Keywords: Mapping, Localization, Recognition
Abstract: Camera relocalization is a crucial problem in computer vision and robotics. Recent advancements in neural radiance fields (NeRFs) have shown promise in synthesizing photo-realistic images. Several works have utilized NeRFs for refining camera poses, but they do not account for lighting changes that can affect scene appearance and shadow regions, causing a degraded pose optimization process. In this paper, we propose a two-staged pipeline that normalizes images with varying lighting and shadow conditions to improve camera relocalization. We implement our scene representation upon a hash-encoded NeRF, which significantly speeds up the pose optimization process. To account for the noisy image gradient computation problem in grid-based NeRFs, we further propose a re-devised truncated dynamic low-pass filter (TDLF) and a numerical gradient averaging technique to smooth the process. Experimental results on several datasets with varying lighting conditions demonstrate that our method achieves state-of-the-art results in camera relocalization under varying lighting conditions. Code and data will be made publicly available.
|
|
13:30-15:00, Paper TuBT26-NT.5 | Add to My Program |
QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds |
|
Wu, Ji | Wuhan University |
Yu, Huai | Wuhan University |
Yang, Wen | Wuhan University |
Xia, Gui-Song | Wuhan University |
Keywords: Mapping, Object Detection, Segmentation and Categorization, Semantic Scene Understanding
Abstract: This paper presents a novel framework to learn a concise geometric primitive representation for 3D point clouds. Different from representing each type of primitive individually, we focus on the challenging problem of how to achieve a concise and uniform representation robustly. We employ quadrics to represent diverse primitives with only 10 parameters and propose the first end-to-end learning-based framework, namely QuadricsNet, to parse quadrics in point clouds. The relationships between the quadrics' mathematical formulation and geometric attributes, including the type, scale and pose, are insightfully integrated for effective supervision of QuadricsNet. Besides, a novel pattern-comprehensive dataset with quadrics segments and objects is collected for training and evaluation. Experiments demonstrate the effectiveness of our concise representation and the robustness of QuadricsNet. Our code is available at https://github.com/MichaelWu99-lab/QuadricsNet.
|
|
13:30-15:00, Paper TuBT26-NT.6 | Add to My Program |
ERASOR++: Height Coding Plus Egocentric Ratio Based Dynamic Object Removal for Static Point Cloud Mapping |
|
Zhang, Jiabao | Zhejiang University |
Zhang, Yu | Zhejiang University |
Keywords: Mapping, Range Sensing
Abstract: Mapping plays a crucial role in localization and navigation within automatic systems. However, the presence of dynamic objects in 3D point cloud maps generated from scan sensors can introduce map distortion and long traces, thereby posing challenges for accurate mapping and navigation. To address this issue, we propose ERASOR++, an enhanced approach based on the Egocentric Ratio of Pseudo Occupancy for effective dynamic object removal. To begin, we introduce the Height Coding Descriptor, which combines height difference and height layer information to encode the point cloud. Subsequently, we propose the Height Stack Test, Ground Layer Test, and Surrounding Point Test methods to precisely and efficiently identify the dynamic bins within the point cloud, thus overcoming the limitations of prior approaches. Through extensive evaluation on open-source datasets, our approach demonstrates superior performance in terms of precision and efficiency compared to existing methods. Furthermore, the techniques described in our work hold promise for addressing various challenging tasks through subsequent adaptation.
|
|
13:30-15:00, Paper TuBT26-NT.7 | Add to My Program |
H2-Mapping: Real-Time Dense Mapping Using Hierarchical Hybrid Representation |
|
Jiang, Chenxing | The Hong Kong University of Science and Technology |
Zhang, Hanwen | Sun Yat-Sen University |
Liu, Peize | The Hong Kong University of Science and Technology, Robotic Inst |
Yu, Zehuan | Hong Kong University of Science and Technology |
Cheng, Hui | Sun Yat-Sen University |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Mapping, RGB-D Perception, Visual Learning
Abstract: Constructing a high-quality dense map in real-time is essential for robotics, AR/VR, and digital twin applications. As Neural Radiance Fields (NeRFs) greatly improve mapping performance, in this paper, we propose a NeRF-based mapping method that enables higher-quality reconstruction and real-time capability even on edge computers. Specifically, we propose a novel hierarchical hybrid representation that leverages implicit multiresolution hash encoding aided by explicit octree SDF priors, describing the scene at different levels of detail. This representation allows for fast scene geometry initialization and makes scene geometry easier to learn. Besides, we present a coverage-maximizing keyframe selection strategy to address the forgetting issue and enhance mapping quality, particularly in marginal areas. To the best of our knowledge, our method is the first to achieve high-quality NeRF-based mapping on edge computers of handheld devices and quadrotors in real-time. Experiments demonstrate that our method outperforms existing NeRF-based mapping methods in geometry accuracy, texture realism, and time consumption.
|
|
13:30-15:00, Paper TuBT26-NT.8 | Add to My Program |
Uncertainty-Aware 3D Object-Level Mapping with Deep Shape Priors |
|
Liao, Ziwei | University of Toronto |
Yang, Jun | University of Toronto |
Qian, Jingxing | University of Toronto |
Schoellig, Angela P. | TU Munich |
Waslander, Steven Lake | University of Toronto |
Keywords: Mapping, Semantic Scene Understanding, RGB-D Perception
Abstract: 3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. We propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea is to leverage a learnt generative model for a category of object shapes as priors and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the estimated shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving substantial improvements over state-of-the-art methods. Our code is available at https://github.com/TRAILab/UncertainShapePose.
|
|
13:30-15:00, Paper TuBT26-NT.9 | Add to My Program |
RoboHop: Segment-Based Topological Map Representation for Open-World Visual Navigation |
|
Garg, Sourav | University of Adelaide |
Rana, Krishan | Queensland University of Technology |
Hosseinzadeh, Mehdi | The Australian Institute for Machine Learning (AIML) -- the Univ |
Mares, Lachlan | University of Adelaide |
Sünderhauf, Niko | Queensland University of Technology |
Dayoub, Feras | The University of Adelaide |
Reid, Ian | University of Adelaide |
Keywords: Mapping, Vision-Based Navigation, Semantic Scene Understanding
Abstract: Mapping is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on `image segments', which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a `continuous sense of a place', defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of `hops over segments' and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during mapping and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level `hopping' based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/.
|
|
TuBT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping II |
|
|
Chair: Lan, Xuguang | Xi'an Jiaotong University |
Co-Chair: Maeda, Yusuke | Yokohama National University |
|
13:30-15:00, Paper TuBT27-NT.1 | Add to My Program |
Grasp Manipulation Relationship Detection Based on Graph Sample and Aggregation |
|
Luo, Jiayuan | Xi'an Jiaotong University |
Liu, YaXin | Xi'an Jiaotong University |
Wang, Han | Xi'an Jiaotong University |
Ding, Mengyuan | Xi'an Jiaotong University |
Lan, Xuguang | Xi'an Jiaotong University |
Keywords: Grasping, Manipulation Planning
Abstract: In multi-object stacking scenarios, exploring the relationships among objects and determining the correct sequence of operations are crucial for robotic manipulation. However, previous algorithms inefficiently combine global and local information, often focusing solely on the local features of objects or the interactions of object features at a global level. This approach leads to an imbalanced distribution of features and the generation of redundant or missing relationships in complex scenes, such as multi-object stacking and partial occlusion. To address this issue, we have developed a grasp manipulation relationship detection algorithm called Graph Sampling Aggregation Network for Visual Manipulation Relationship Detection (GSAGED). This algorithm assists robots in detecting targets in complex scenes and determining the appropriate grasping order. Firstly, the Positional Encoding Module in GSAGED enhances object feature information by considering global contexts. Secondly, the Graph Sampling Aggregation method effectively integrates global and local information, alleviating the imbalanced distribution of features. Finally, we applied the developed algorithm to a physical robot for grasping. Experimental results on the Visual Manipulation Relationship Dataset (VMRD) and the large-scale relational grasp dataset named REGRAD demonstrate that our method significantly improves the accuracy of relationship detection in complex scenes and exhibits robust generalization capabilities in real-world applications.
|
|
13:30-15:00, Paper TuBT27-NT.2 | Add to My Program |
Acoustic Soft Tactile Skin (AST Skin) |
|
Rajendran, S. Vishnu | University of Lincoln |
Mandil, Willow | University of Lincoln |
Nazari, Kiyanoush | University of Lincoln |
Parsons, Simon | University of Lincoln |
Ghalamzan Esfahani, Amir Masoud | University of Surrey |
Keywords: In-Hand Manipulation, Force and Tactile Sensing, Force Control
Abstract: This paper presents a novel acoustic soft tactile (AST) skin technology operating with sound waves. In this innovative approach, the sound waves generated by a speaker travel in channels embedded in a soft membrane, get modulated by the deformation of the channel when pressed by an external force, and are received by a microphone at the end of the channel. The sensor leverages regression and classification methods for estimating the normal force and its contact location. Our sensor can be affixed to any robot part, e.g., end effectors or arm. We tested several regression and classifier methods to learn the relation between sound wave modulation, the applied force, and its location, and picked the best-performing models for force and location predictions. The best skin configurations yield more than 93% of force estimates within a ±1.5 N tolerance over a 0-30 N range, and contact locations with over 96% accuracy. We also demonstrated the performance of AST Skin technology for a real-time gripping force control application.
|
|
13:30-15:00, Paper TuBT27-NT.3 | Add to My Program |
Domain Randomization for Sim2real Transfer of Automatically Generated Grasping Datasets |
|
Huber, Johann | ISIR, Sorbonne Université |
Hélénon, François | Sorbonne Université |
Watrelot, Hippolyte Christian Sébastien | Sorbonne Université ISIR |
Ben Amar, Faiz | Université Pierre Et Marie Curie, Paris 6 |
Doncieux, Stéphane | Sorbonne University |
Keywords: Grasping, Evolutionary Robotics, Data Sets for Robot Learning
Abstract: Robotic grasping refers to making a robotic system pick an object by applying forces and torques on its surface. Many recent studies use data-driven approaches to address grasping, but the sparse reward nature of this task makes the learning process challenging to bootstrap. To avoid constraining the operational space, an increasing number of works propose grasping datasets to learn from. But most of them are limited to simulations. The present paper investigates how automatically generated grasps can be exploited in the real world. More than 7000 reach-and-grasp trajectories have been generated with Quality-Diversity (QD) methods on 3 different arms and grippers, including parallel fingers and a dexterous hand, and tested in the real world. Analysis of the collected measurements shows correlations between several Domain Randomization-based quality criteria and sim-to-real transferability. Key challenges regarding the reality gap for grasping have been identified, stressing matters on which researchers on grasping should focus in the future. A QD approach has finally been proposed for making grasps more robust to domain randomization, resulting in a transfer ratio of 84% on the Franka Research 3 arm.
|
|
13:30-15:00, Paper TuBT27-NT.4 | Add to My Program |
Kinematic Synergy Primitives for Human-Like Grasp Motion Generation |
|
Starke, Julia | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Grasping, Multifingered Hands, Human and Humanoid Motion Analysis and Synthesis
Abstract: Grasping with five-fingered humanoid hands is a complex control problem. Throughout the entire grasping motion, all finger joints need to be coordinated to achieve a stable grasp. Grasp synergies provide a simplified, low-dimensional representation of grasp postures and motions that can be used for the description of human grasps as well as the generation of novel, human-like grasps. However, the abstract synergy representation complicates the association of relevant high-level grasp parameters, such as the grasp type, final posture, or grasp speed. Therefore, it is difficult to control these grasp characteristics in the synergy space. This paper presents an adaptable representation for kinematic grasping motions in synergy space that allows the generation of novel, human-like grasps under direct control of high-level grasp parameters. It is based on via-point movement primitives trained on synergy trajectories of human grasping motions. The representation using synergy primitives allows for a straightforward adaptation of grasp characteristics while preserving the essential grasping motion learned from human demonstration. The kinematic synergy primitives have a low reproduction error of 3.9% of the maximum finger joint angle and are able to generate successful grasps on a simulated human hand and a real prosthetic hand.
|
|
13:30-15:00, Paper TuBT27-NT.5 | Add to My Program |
VFAS-Grasp: Closed Loop Grasping with Visual Feedback and Adaptive Sampling |
|
Piacenza, Pedro | Samsung Research America |
Yuan, Jiacheng | University of Minnesota |
Huh, Jinwook | Samsung |
Isler, Volkan | University of Minnesota |
Keywords: Grasping, Perception for Grasping and Manipulation
Abstract: We consider the problem of closed-loop robotic grasping and present a novel planner which uses Visual Feedback and an uncertainty-aware Adaptive Sampling strategy (VFAS) to close the loop. At each iteration, our method VFAS-Grasp builds a set of candidate grasps by generating random perturbations of a seed grasp. The candidates are then scored using a novel metric which combines a learned grasp-quality estimator, the uncertainty in the estimate and the distance from the seed proposal to promote temporal consistency. Additionally, we present two mechanisms to improve the efficiency of our sampling strategy: We dynamically scale the sampling region size and number of samples in it based on past grasp scores. We also leverage a motion vector field estimator to shift the center of our sampling region. We demonstrate that our algorithm can run in real time (20 Hz) and is capable of improving grasp performance for static scenes by refining the initial grasp proposal. We also show that it can enable grasping of slow moving objects, such as those encountered during human to robot handover.
|
|
13:30-15:00, Paper TuBT27-NT.6 | Add to My Program |
The Fractal Hand-II: Reviving a Classic Mechanism for Contemporary Grasping Challenges |
|
Tisdale, Malcolm | The California Institute of Technology |
Burdick, Joel | California Institute of Technology |
Keywords: Grasping, Grippers and Other End-Effectors, Multifingered Hands
Abstract: This paper and its companion propose a new fractal robotic gripper, drawing inspiration from the century-old Fractal Vise. Its unusual synergistic properties allow it to passively conform to diverse objects using only one actuator. Designed to be easily integrated with prevailing parallel jaw grippers, it alleviates the complexities tied to perception and grasp planning, especially when dealing with unpredictable object poses and geometries. We extend the foundational principles of the Fractal Vise to a broader class of gripping mechanisms and address the limitations that had led to its obscurity. Two Fractal Fingers, coupled with a closing actuator, can form an adaptive and synergistic Fractal Hand. We articulate a design methodology for low-cost, easy-to-fabricate, large workspace, and compliant Fractal Fingers. The companion paper delves into the kinematics and grasping properties of a specific class of Fractal Fingers and Hands.
|
|
13:30-15:00, Paper TuBT27-NT.7 | Add to My Program |
ICGNet: A Unified Approach for Instance-Centric Grasping |
|
Zurbrügg, René | ETH Zürich |
Liu, Yifan | ETH Zurich |
Engelmann, Francis | ETH Zurich |
Kumar, Suryansh | ETH Zurich |
Hutter, Marco | ETH Zurich |
Patil, Vaishakh | RSL ETH Zurich |
Yu, Fisher | ETH Zürich |
Keywords: Grasping, Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation
Abstract: Accurate grasping is the key to several robotic tasks including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps. These grasps need to be compliant with the local object geometry. Second, for each proposed grasp, the robot needs to reason about the interactions with other objects in the scene. Finally, the robot must compute a collision-free grasp trajectory while taking into account the geometry of the target object. Most grasp detection algorithms directly predict grasp poses in a monolithic fashion, which does not capture the composability of the environment. In this paper, we introduce an end-to-end architecture for object-centric grasping. The method uses pointcloud data from a single arbitrary viewing direction as an input and generates an instance-centric representation for each partially observed object in the scene. This representation is further used for object reconstruction and grasp detection in cluttered table-top scenes. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets, indicating superior performance for grasping and reconstruction. Additionally, we demonstrate real-world applicability by decluttering scenes with varying numbers of objects. Videos and Code at icgraspnet.github.io
|
|
13:30-15:00, Paper TuBT27-NT.8 | Add to My Program |
The Grasp Reset Mechanism: An Automated Apparatus for Conducting Grasping Trials |
|
DuFrene, Kyle | Oregon State University |
Nave, Keegan | Oregon State University |
Campbell, Joshua | Southwest Research Institute |
Balasubramanian, Ravi | Oregon State University |
Grimm, Cindy | Oregon State University |
Keywords: Grasping, Mechanism Design, Performance Evaluation and Benchmarking
Abstract: Advancing robotic grasping and manipulation requires the ability to test algorithms and/or train learning models on large numbers of grasps. Towards the goal of more advanced grasping, we present the Grasp Reset Mechanism (GRM), a fully automated apparatus for conducting large-scale grasping trials. The GRM automates the process of resetting a grasping environment, repeatably placing an object in a fixed location and controllable 1-D orientation. It also collects data and swaps between multiple objects enabling robust dataset collection with no human intervention. We also present a standardized state machine interface for control, which allows for integration of most manipulators with minimal effort. In addition to the physical design and corresponding software, we include a dataset of 1,020 grasps. The grasps were created with a Kinova Gen3 robot arm and Robotiq 2F-85 Adaptive Gripper to enable training of learning models and to demonstrate the capabilities of the GRM. The dataset includes ranges of grasps conducted across four objects and a variety of orientations. Manipulator states, object pose, video, and grasp success data are provided for every trial.
|
|
13:30-15:00, Paper TuBT27-NT.9 | Add to My Program |
Model-Based Runtime Monitoring with Interactive Imitation Learning |
|
Liu, Huihan | University of Texas, Austin |
Dass, Shivin | UT Austin |
Martín-Martín, Roberto | University of Texas at Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Deep Learning Methods
Abstract: Robot learning methods have recently made great strides but generalization and robustness challenges still hinder their widespread deployment. Failing to detect potential failures and learn to solve them renders state-of-the-art learning systems not combat-ready for high-stakes tasks. Recent advancements in interactive imitation learning have proposed a promising framework for human-robot teaming, enabling the robots to operate safely and to continually improve their performances through deployment data. Nonetheless, existing methods typically require constant human supervision and preemptive feedback, limiting their usability in realistic domains. In this work, we aim to endow a robot with the ability to monitor and detect errors during runtime task execution. We introduce MoMo, a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures. Unlike prior work that cannot foresee future failures or requires failure experiences for training, MoMo learns a latent-space dynamics model and a failure classifier that combined enable MoMo to simulate future action outcomes and detect out-of-distribution states and high-risk situations preemptively. We train MoMo within an interactive imitation learning framework, where it continually updates the model from the experiences of the human-robot team collected from trustworthy deployments. Consequently, our method reduces the human workload needed over time while ensuring reliable task execution. We demonstrate that MoMo outperforms the baselines across system-level and unit-test metrics, with on average 23% and 40% higher success rates in simulation and on physical hardware, respectively. More information at https://ut-austin-rpl.github.io/sirius-runtime-monitor/
|
|
TuBT28-NT Oral Session, NT-G4 |
Add to My Program |
Grippers and Other End-Effectors II |
|
|
Chair: Stuart, Hannah | UC Berkeley |
Co-Chair: Sakaino, Sho | University of Tsukuba |
|
13:30-15:00, Paper TuBT28-NT.1 | Add to My Program |
The Fractal Hand--I: A Non-Anthropomorphic, but Synergistic, Adaptable Gripper |
|
Burdick, Joel | California Institute of Technology |
Tisdale, Malcolm | The California Institute of Technology |
Keywords: Grasping, Grippers and Other End-Effectors, Multifingered Hands
Abstract: This paper introduces a novel Fractal Hand robotic gripper. The hand has only 1 actuator, but 2^{n+1}-1 joints, where n is a design parameter that defines the depth of the fingers' tree structures. The hand is synergistic in its operation (because its joint movements are highly coupled through the hand's interaction with the grasped object), but it is not anthropomorphic. The basic finger and hand geometry, governing kinematics, and quasi-static mechanics of rigid and compliant versions of the hand are developed. These analyses remarkably show that under very mild constraints on the hand design, the hand is compliantly stable at every equilibrium condition. Therefore, the Fractal Hand adapts to a very wide range of planar objects with a single design. A companion paper introduces a design methodology for this new class of robot hands, and multiple prototypes.
|
|
13:30-15:00, Paper TuBT28-NT.2 | Add to My Program |
The Double-Scoop Gripper: A Tendon-Driven Soft-Rigid End-Effector for Food Handling Exploiting Constraints in Narrow Spaces |
|
Franco, Leonardo | University of Siena |
Turco, Enrico | Istituto Italiano Di Tecnologia |
Bo, Valerio | Istituto Italiano Di Tecnologia |
Pozzi, Maria | University of Siena |
Malvezzi, Monica | University of Siena |
Prattichizzo, Domenico | University of Siena |
Salvietti, Gionata | University of Siena |
Keywords: Grippers and Other End-Effectors, Grasping, Soft Robot Applications
Abstract: Food handling is a challenging task for robotic grippers, which are required to manipulate highly deformable and fragile items that can be easily damaged. Moreover, ingredients for the preparation of the different dishes are usually stored in small containers that are often not easily accessible. This paper introduces an innovative soft-rigid, tendon-driven gripper: the Double-Scoop Gripper (DSG). Its two-fingered design exploits a specialized structure to cope with constrained spaces (e.g., containers in narrow shelves). The DSG can delicately grasp objects of various shapes by employing two scoop-shaped fingertips that can form a single plate when the fingers are flexed. Data obtained from an on-board camera are used to detect the food item features and plan the grasping strategy that best exploits the possible environmental constraints, regulating the opening of the two fingers and the approaching direction of the gripper. DSG capabilities are verified with experiments conducted using real food ingredients within a pick-and-place setup to evaluate both the grasping and the releasing capability of the gripper. Obtained results are promising and suggest that this approach could be particularly advantageous in the context of automated food serving.
|
|
13:30-15:00, Paper TuBT28-NT.3 | Add to My Program |
Co-Designing Manipulation Systems Using Task-Relevant Constraints |
|
Vaish, Apoorv | TU Berlin |
Brock, Oliver | Technische Universität Berlin |
Keywords: Grippers and Other End-Effectors, Hardware-Software Integration in Robotics, Methods and Tools for Robot System Design
Abstract: A robotic system's hardware and control policy must be co-optimized to ensure they complement each other to interact robustly with the environment. However, this combined search is extremely high-dimensional and intractable without a suitable underlying representation. This paper uses environmental constraints to structure the co-design space for manipulation. We show that task-relevant constraints encode regions of the search space containing reasonable co-design solutions. Furthermore, this underlying representation renders a co-design space amenable to gradient-based optimization. For efficient search, we present the co-design Jacobian that describes how the robot's motion varies with control as well as hardware design changes. This Jacobian exploits the structure induced by environmental constraints for iterative design updates in the co-design space. Using these two conceptual tools, we co-design manipulators, grippers, and multi-fingered hands, showing that environmental constraints are an effective representation for co-designing diverse manipulation systems. Our methodology scales well with increased co-design parameters, rendering the co-design of complex, high-dimensional manipulation systems feasible.
|
|
13:30-15:00, Paper TuBT28-NT.4 | Add to My Program |
Squirrel-Inspired Tendon-Driven Passive Gripper for Agile Landing |
|
Wang, Stanley | University of California, Berkeley |
Kuang, Duyi | University of California, Berkeley |
Lee, Sebastian | University of California, Berkeley |
Full, Robert | University of California at Berkeley |
Stuart, Hannah | UC Berkeley |
Keywords: Grippers and Other End-Effectors, Biologically-Inspired Robots, Actuation and Joint Mechanisms
Abstract: Squirrels exhibit agile leaping between tree branches, often using non-prehensile gripping with compliant and passively adaptive fingers. We aim to test the utility of such gripping in agile robotic maneuvering. In the present study, we first examine the parametric design of a squirrel-inspired underactuated gripper for passive landing on impact. We fix the geometry of the gripper and vary the joint stiffness and contact conditions. We find that stiffer fingers with soft foam pads enlarge the landing sufficiency region. Specifically, friction appears to enlarge horizontal error tolerance, while joint stiffness and pad damping allow for higher impact speeds. Thus, these features should be considered in the design of future agile robot hands and feet that include high impact landings on rods with pose inaccuracy.
|
|
13:30-15:00, Paper TuBT28-NT.5 | Add to My Program |
HASHI: Highly Adaptable Seafood Handling Instrument for Manipulation in Industrial Settings |
|
Allison, Austin | Northeastern University |
Hanson, Nathaniel | Massachusetts Institute of Technology |
Wicke, Sebastian | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Grippers and Other End-Effectors, Dexterous Manipulation, Kinematics
Abstract: The seafood processing industry provides fertile ground for robotics to impact the future-of-work from multiple perspectives including productivity, worker safety, and quality of work life. The robotics research challenge is the realization of flexible and reliable manipulation of soft, deformable, slippery, spiky and scaly objects. In this paper, we propose a novel robot end effector, called HASHI, that employs chopstick-like appendages for precise and dexterous manipulation. This gripper is capable of in-hand manipulation by rotating its two constituent sticks relative to each other and offers control of objects in all three axes of rotation by imitating human use of chopsticks. HASHI delicately positions and orients food through embedded 6-axis force-torque sensors. We derive and validate the kinematic model for HASHI, as well as demonstrate grip force and torque readings from the sensorization of each chopstick. We also evaluate the versatility of HASHI through grasping trials of a variety of real and simulated food items with varying geometry, weight, and firmness.
|
|
13:30-15:00, Paper TuBT28-NT.6 | Add to My Program |
All the Feels: A Dexterous Hand with Large-Area Tactile Sensing |
|
Bhirangi, Raunaq Mahesh | Carnegie Mellon University |
DeFranco, Abigail | Carnegie Mellon University |
Adkins, Jacob | University of Alberta |
Majidi, Carmel | Carnegie Mellon University |
Gupta, Abhinav | Carnegie Mellon University |
Hellebrekers, Tess | Meta AI Research |
Kumar, Vikash | Meta AI |
Keywords: Grippers and Other End-Effectors, Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: High cost and lack of reliability have precluded the widespread adoption of dexterous hands in robotics. Furthermore, the lack of a viable tactile sensor capable of sensing over the entire area of the hand impedes the rich, low-level feedback that would improve learning of dexterous manipulation skills. This paper introduces an inexpensive, modular, and robust platform - the D'Manus - aimed at resolving these challenges while satisfying the large-scale data collection demands of deep robot learning paradigms. Studies on human manipulation point to the criticality of low-level tactile feedback in performing everyday dexterous tasks. The D'Manus comes with ReSkin sensing on the entire surface of the palm as well as the fingertips. We also demonstrate the generalizability of tactile models trained with the fully integrated system in a tactile-aware task - bin-picking and sorting. Code, documentation, design files, detailed assembly instructions, trained models, task videos, and all supplementary materials required to recreate the setup can be found at https://sites.google.com/view/dmanus.
|
|
13:30-15:00, Paper TuBT28-NT.7 | Add to My Program |
Soft and Rigid Object Grasping with Cross-Structure Hand Using Bilateral Control-Based Imitation Learning |
|
Yamane, Koki | University of Tsukuba |
Saigusa, Yuki | University of Tsukuba |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Imitation Learning, Force Control, Grasping
Abstract: Object grasping is an important ability required for various robot tasks. In particular, tasks that require precise force adjustment during operation, such as grasping an unknown object or using a grasped tool, are difficult for humans to program in advance. Recently, AI-based algorithms that can imitate human force skills have been actively explored as a solution. In particular, bilateral control-based imitation learning achieves human-level motion speeds with environmental adaptability, requiring only human demonstrations and no programming. However, owing to hardware limitations, its grasping performance remains limited, and tasks that involve grasping various objects are yet to be achieved. Here, we developed a cross-structure hand to grasp various objects. We experimentally demonstrated that the integration of bilateral control-based imitation learning and the cross-structure hand is effective for grasping various objects and harnessing tools.
|
|
13:30-15:00, Paper TuBT28-NT.8 | Add to My Program |
GRASP: Grocery Robot’s Adhesion and Suction Picker |
|
Hajj-Ahmad, Amar | Stanford University |
Kaul, Lukas | Toyota Research Institute |
Matl, Carolyn | Toyota Research Institute |
Cutkosky, Mark | Stanford University |
Keywords: Grippers and Other End-Effectors, Mechanism Design, Biomimetics
Abstract: We present a solution to the separate challenges faced by suction cups and gecko adhesives for one-sided grasping of heavy, irregular items. The gripping technology combines suction with adhesion for grasping and placing a wide range of objects in packed spaces. Applications include shopping and restocking in retail and warehouse settings where products vary in size and weight and are packed tightly, which limits access. A single suction cup is compact enough to reach and grasp the smallest items (down to 5 cm in size) but cannot provide the shear force needed for handling bulky items. Gecko-inspired adhesion provides extra lifting capability for objects up to 2.3 kg, using a 7.6 x 12.7 cm adhesive swatch – 2.5x heavier than with suction alone. The adhesive is fabricated on a flexible nylon fabric. A small fan blows gently to help the fabric conform to irregular surfaces prior to lifting.
|
|
13:30-15:00, Paper TuBT28-NT.9 | Add to My Program |
Improved Generalization of Probabilistic Movement Primitives for Manipulation Trajectories |
|
Yao, Xueyang | University of Waterloo |
Chen, Yinghan | University of Waterloo |
Tripp, Bryan Patrick | University of Waterloo |
Keywords: Imitation Learning, Learning from Demonstration
Abstract: Imitation learning methods have proven effective in learning robotic tasks by leveraging multiple human-controlled demonstrations. However, existing approaches often struggle to generalize across a wide range of tasks, such as extrapolating to unseen object locations, incorporating via-point modulation, accurately modeling orientation, handling trajectories with multiple options, and capturing aiming actions. In this study, we propose a novel framework that combines ideas from task-parameterized Gaussian mixture models and probabilistic movement primitives to address these limitations and satisfy all the aforementioned properties within a single framework. We conduct comprehensive evaluations of our approach on four real-life tasks: pick-and-place, water pouring, shooting a hockey puck into a net, and sweeping.
|
|
TuBT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection I |
|
|
Chair: Martín-Martín, Roberto | University of Texas at Austin |
Co-Chair: Yang, Yongliang | Shenyang Institute of Automation, CAS |
|
13:30-15:00, Paper TuBT29-NT.1 | Add to My Program |
Road Obstacle Detection Based on Unknown Objectness Scores |
|
Noguchi, Chihiro | Toyota Motor Corporation |
Ohgushi, Toshiaki | Toyota Motor Corporation |
Yamanaka, Masao | Toyota Motor Corporation |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: The detection of unknown traffic obstacles is vital to ensure safe autonomous driving. Standard object-detection methods cannot identify unknown objects that are not included in the predefined categories. This is because object-detection methods are trained to assign a background label to pixels corresponding to unknown objects. To address this problem, the pixel-wise anomaly-detection approach has attracted increased research attention. Anomaly-detection techniques, such as uncertainty estimation and perceptual difference from reconstructed images, make it possible to identify pixels of unknown objects as out-of-distribution (OoD) samples. However, when applied to images with many unknowns and complex components, such as driving scenes, these methods often exhibit unstable performance. The purpose of this study is to achieve stable performance for detecting unknown objects by incorporating ideas from object detection into pixel-wise anomaly-detection methods. To achieve this goal, we adopt a semantic-segmentation network with a sigmoid head that simultaneously provides pixel-wise anomaly scores and objectness scores. Our experimental results show that the objectness scores play an important role in improving detection performance. Based on these results, we propose a novel anomaly score that integrates these two scores, which we term the unknown objectness score. Quantitative evaluations show that the proposed method outperforms state-of-the-art methods on publicly available datasets.
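The abstract integrates a pixel-wise anomaly score with an objectness score into a single unknown objectness score, but does not state the combination rule. The sketch below uses a simple per-pixel product purely for illustration (an assumption, not the paper's published formulation): a pixel is flagged only when it is both anomalous and object-like, which suppresses anomalies on background clutter.

```python
import numpy as np

def unknown_objectness_score(anomaly, objectness):
    """Combine a pixel-wise anomaly map with a pixel-wise objectness map.

    A product is one natural instantiation (illustrative assumption):
    background pixels with high anomaly but low objectness are damped,
    while anomalous object-like pixels keep a high score.
    """
    return anomaly * objectness

# A high anomaly score on a low-objectness (background) pixel ends up
# lower than the same anomaly score on an object-like pixel.
anomaly = np.array([[0.9, 0.9], [0.1, 0.1]])
objectness = np.array([[0.05, 0.95], [0.5, 0.5]])
score = unknown_objectness_score(anomaly, objectness)
```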
|
|
13:30-15:00, Paper TuBT29-NT.2 | Add to My Program |
PVTransformer: Point-To-Voxel Transformer for Scalable 3D Object Detection |
|
Leng, Zhaoqi | Waymo LLC |
Sun, Pei | Waymo |
He, Tong | Waymo LLC |
Anguelov, Dragomir | Waymo |
Tan, Mingxing | Waymo Research |
Keywords: Object Detection, Segmentation and Categorization
Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
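The core idea in the abstract is replacing the PointNet max-pool with an attention module for point-to-voxel aggregation while keeping permutation invariance. A minimal single-head sketch of that contrast, assuming a learned per-voxel query `query` and projections `Wk`, `Wv` (stand-ins for learned parameters; the paper's actual module is more elaborate):

```python
import numpy as np

def maxpool_aggregate(point_feats):
    """PointNet-style aggregation: a symmetric max over a voxel's points.
    This is the pooling step the abstract identifies as a bottleneck."""
    return point_feats.max(axis=0)

def attention_aggregate(point_feats, query, Wk, Wv):
    """Attention-based point-to-voxel aggregation (illustrative sketch).
    A query attends over the N points in the voxel; the softmax-weighted
    sum is permutation-invariant but more expressive than a max."""
    keys = point_feats @ Wk                        # (N, d)
    vals = point_feats @ Wv                        # (N, d)
    logits = keys @ query / np.sqrt(len(query))    # (N,)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                   # softmax attention weights
    return w @ vals                                # aggregated voxel feature (d,)
```

Both functions return the same result under any reordering of the input points, which is the invariance the abstract says the design respects.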
|
|
13:30-15:00, Paper TuBT29-NT.3 | Add to My Program |
Object-Centric Cross-Modal Feature Distillation for Event-Based Object Detection |
|
Li, Lei | ETH Zurich |
Liniger, Alexander | ETH Zurich |
Millhaeusler, Mario | Huawei Zurich |
Tsiminaki, Vagia | Huawei Zurich |
Li, Yuanyou | Huawei |
Dai, Dengxin | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we propose a cross-modality feature distillation method that can focus on regions where the knowledge distillation works best to shrink the detection performance gap between these two modalities. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as the teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.
|
|
13:30-15:00, Paper TuBT29-NT.4 | Add to My Program |
Hierarchical Point Attention for Indoor 3D Object Detection |
|
Shu, Manli | University of Maryland, College Park |
Xue, Le | Salesforce Research |
Yu, Ning | Netflix |
Martín-Martín, Roberto | University of Texas at Austin |
Xiong, Caiming | Salesforce Inc |
Goldstein, Tom | University of Maryland |
Niebles, Juan Carlos | Stanford University |
Xu, Ran | Salesforce |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: 3D object detection is an essential vision technique for various applications, such as augmented reality and domestic robots. Transformers, as versatile network architectures, have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer restrains its ability to learn features at different scales. This limitation makes transformer detectors perform worse on smaller objects and affects their reliability in indoor environments, where small objects are the majority. This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors. First, we propose Aggregated Multi-Scale Attention (MS-A), which builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning. Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals. Both attention operations are model-agnostic network modules that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor detection benchmarks. By plugging our proposed modules into state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.
|
|
13:30-15:00, Paper TuBT29-NT.5 | Add to My Program |
Frame Fusion with Vehicle Motion Prediction for 3D Object Detection |
|
Li, Xirui | Shanghai Jiao Tong University |
Wang, Feng | TuSimple |
Wang, Naiyan | TuSimple |
Ma, Chao | Shanghai Jiao Tong University |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Automation, AI-Based Methods
Abstract: In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction. In the same way, history detections should contribute to future detections. In this paper, we propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history detection frames. In FrameFusion, we "forward" history frames to the current frame and apply weighted Non-Maximum Suppression on dense bounding boxes to obtain a fused frame with merged boxes. To "forward" frames, we use vehicle motion models to estimate the future pose of the bounding boxes. Our method is flexible in motion model selection. We explore three motion models in our work and show how the unicycle model and the bicycle model improve turning cases. On the Waymo Open Dataset, our FrameFusion method consistently improves the performance of various 3D detectors by about 2.0 vehicle level 2 APH with negligible latency and slightly enhances the performance of the temporal fusion method MPPNet. We also conduct extensive experiments on motion model selection.
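The two mechanisms in the abstract, forwarding history boxes with a motion model and merging via weighted NMS, can be sketched in 2D. This is a minimal illustration, not the paper's implementation: it uses the simplest constant-velocity model (the paper also explores unicycle and bicycle models) and axis-aligned `[x1, y1, x2, y2]` boxes.

```python
import numpy as np

def forward_boxes(boxes, velocities, dt):
    """'Forward' history boxes to the current frame under a constant-
    velocity motion model (illustrative; the paper compares three models)."""
    moved = boxes.copy()
    moved[:, [0, 2]] += velocities[:, [0]] * dt  # shift x1, x2
    moved[:, [1, 3]] += velocities[:, [1]] * dt  # shift y1, y2
    return moved

def iou(box, boxes):
    """IoU of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def weighted_nms(boxes, scores, iou_thr=0.5):
    """Weighted NMS: overlapping boxes are merged by confidence-weighted
    averaging rather than discarded, producing the fused frame."""
    order = np.argsort(scores)[::-1]
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    merged = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        group = (iou(boxes[i], boxes) >= iou_thr) & ~used
        used |= group
        merged.append(np.average(boxes[group], axis=0, weights=scores[group]))
    return np.array(merged)
```

Two strongly overlapping detections collapse into a single box whose coordinates lean toward the higher-confidence member, while distant boxes survive untouched.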
|
|
13:30-15:00, Paper TuBT29-NT.6 | Add to My Program |
FG-PFE: Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection |
|
Park, Konyul | Hanyang University |
Kim, Yecheol | Hanyang University |
Koh, Junho | Hanyang University |
Park, Byungwoo | Hanyang University |
Choi, Jun Won | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, AI-Based Methods, Deep Learning Methods
Abstract: Autonomous vehicles require real-time, high-performance 3D object detectors to guarantee system robustness and safety. Recent point cloud-based 3D object detectors are mainly categorized into three types based on input representation: point-based, voxel-based, and pillar-based. Among these, pillar-based models are most suitable for onboard deployment due to their light architecture. Despite their advantages, pillar-based methods often underperform voxel-based and point-based methods, largely due to their coarse representation and simplistic architecture design. While most recent research has aimed to improve the backbone network to address this performance gap, we argue that there is still room for improvement in the Pillar Feature Encoding (PFE) stage. We demonstrate that with sufficient representational power, pillar-based methods can achieve performance comparable to other representations. To achieve this, we introduce fine-grained pillar feature encoding (FG-PFE), which utilizes spatio-temporal virtual (STV) grids for fine-grained representation. We also present an attentive pillar aggregation module designed to selectively aggregate essential pillar features. Extensive experiments conducted on the nuScenes dataset show that our FG-PFE not only requires less computational power but also achieves significant performance gains compared to the baseline.
|
|
13:30-15:00, Paper TuBT29-NT.7 | Add to My Program |
Efficient Semantic Segmentation for Compressed Video |
|
Cai, Jiaxin | Fuzhou University |
Li, Qi | Fuzhou University |
Shen, Yulin | Fuzhou University |
Pan, Jia | University of Hong Kong |
Liu, Wenxi | Fuzhou University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Robots, constrained by limited onboard computing resources, often encounter situations wherein high-resolution and high-bit-rate videos captured by their cameras necessitate compression before further analysis. In this paper, we propose a novel video semantic segmentation paradigm for compressed video. Specifically, our framework draws inspiration from the principle of the Wavelet Transform, and thus we design a network structure, WTDecomNet, approximating the decomposition of a high-resolution image into its low-resolution counterpart and axial details. The aim is to preserve the image content well through decomposition and maintain model efficiency by obtaining semantics from the low-resolution image. To this end, we propose an efficient axial subband approximation module for extracting axial details and a lightweight temporal alignment module for associating keyframes and non-keyframes of compressed video. Through comprehensive experiments, we show that our model achieves state-of-the-art performance on public benchmarks. In particular, on CamVid, compared to the baseline, our proposed model reduces the computational overhead by 70% while improving mIoU by 4%.
|
|
13:30-15:00, Paper TuBT29-NT.8 | Add to My Program |
Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving |
|
Zhili, Chen | Hong Kong University of Science and Technology |
Pham, Trung Kien | The Hong Kong University of Science and Technology |
Ye, Maosheng | HKUST |
Shen, Zhiqiang | MBZUAI |
Chen, Qifeng | HKUST |
Keywords: Object Detection, Segmentation and Categorization
Abstract: We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. Traditional point-based 3D object detectors often employ architectures that rely on a progressive downsampling of points. While this method effectively reduces computational demands and increases receptive fields, it compromises the preservation of crucial non-local information for accurate 3D object detection, especially in complex driving scenarios. To address this, we introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector by efficiently modeling longer-range inter-dependency while incurring only negligible overhead. Concretely, the Cross-Cluster Shifting operation enhances the conventional design by shifting partial channels from neighboring clusters, which enables richer interaction with non-local regions and thus enlarges the receptive field of clusters. We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime efficiency.
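The shifting mechanism the abstract describes, moving a fraction of channels in from neighboring clusters so each cluster sees non-local information, can be sketched minimally. Here `neighbor_idx` (one neighbor per cluster) and `shift_ratio` are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def cross_cluster_shift(cluster_feats, neighbor_idx, shift_ratio=0.25):
    """Replace the first `shift_ratio` fraction of each cluster's channels
    with the corresponding channels of a neighboring cluster, mixing
    non-local information into the representation (illustrative sketch).

    cluster_feats: (C, d) array, one feature vector per cluster.
    neighbor_idx:  (C,) index of a neighboring cluster for each cluster.
    """
    d = cluster_feats.shape[1]
    k = max(1, int(d * shift_ratio))           # number of shifted channels
    out = cluster_feats.copy()
    out[:, :k] = cluster_feats[neighbor_idx, :k]  # borrowed channels
    return out
```

The remaining channels are untouched, so the operation enlarges each cluster's effective receptive field at the cost of a single gather and copy.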
|
|
13:30-15:00, Paper TuBT29-NT.9 | Add to My Program |
BEE-Net: Bridging Semantic and Instance with Gated Encoding and Edge Constraint for Efficient Panoptic Segmentation |
|
Huang, Xinyang | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Sun, Yunpeng | Lotus Robotics |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Ye, Gang | Lotus Robotics |
Xiao, Yang | Lotus Technology Ltd |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Bo | Lotus Technology Ltd |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Panoptic segmentation is a challenging perception task that can help robots comprehensively perceive the surrounding environment. In this task, we notice that semantic, instance, and panoptic segmentation have rich relations, which, however, are rarely explored. In this work, we propose a novel network that bridges panoptic, instance, and semantic segmentation to delve into these reciprocal relations. To make the semantic and instance branches benefit from each other, we design a novel Gated Encoding (GE) module that incorporates complementary cues between the semantic and instance heads through a gated mechanism. In addition, a novel edge-aware consistency constraint among the edges of each task is presented, which exhaustively exploits geometric constraints to boost the segmentation quality of challenging edges. Experimental results on the Cityscapes and MS-COCO datasets demonstrate that our approach achieves state-of-the-art performance in an efficient CNN-based paradigm, attaining a balance between accuracy and efficiency.
|
|
TuBT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics I |
|
|
Chair: Gonzalez Arenas, Montserrat | Google |
Co-Chair: Busam, Benjamin | Technical University of Munich |
|
13:30-15:00, Paper TuBT30-NT.1 | Add to My Program |
Toward AI-Enabled Commercial Telepresence Robots to Combine Home Care Needs and Affordability |
|
Beraldo, Gloria | National Research Council of Italy |
De Benedictis, Riccardo | CNR-ISTC |
Cesta, Amedeo | CNR -- National Research Council of Italy, ISTC |
Fracasso, Francesca | National Research Council of Italy |
Cortellessa, Gabriella | CNR -- National Research Council of Italy, ISTC |
Keywords: AI-Enabled Robotics, Human-Centered Robotics, Social HRI
Abstract: As life expectancy increases, social and health assistance requires sustainable and affordable solutions, possibly usable from one's own domestic environment. In this article, we propose a transformer-based approach combined with a task-planning system and enhanced with several AI sub-modules, designed to run on low-cost telepresence robots in order to support more advanced and autonomous assistance services. The proposed system dynamically generates and autonomously adapts heterogeneous robotic actions according to the information that emerges during the interaction. The AI-enhanced telepresence robot was assessed in an unstructured domestic environment by 10 users. The results show an accuracy of more than 95% with respect to the robot's expected functioning. The participants judged the system efficient, useful, and intuitive, and showed a positive inclination to re-use the robot in the future. Such outcomes derive both from proper coordination among the heterogeneous AI sub-modules in the system and from its capability to rapidly co-adapt the interaction.
|
|
13:30-15:00, Paper TuBT30-NT.2 | Add to My Program |
SliceIt! - a Dual Simulator Framework for Learning Robot Food Slicing |
|
Beltran-Hernandez, Cristian Camilo | Omron Sinic X |
Erbetti, Nicolas | Omron Sinic X |
Hamaya, Masashi | OMRON SINIC X Corporation |
Keywords: AI-Enabled Robotics, Domestic Robotics, Hardware-Software Integration in Robotics
Abstract: Cooking robots can enhance the home experience by reducing the burden of daily chores. However, these robots must perform their tasks dexterously and safely in shared human environments, especially when handling dangerous tools such as kitchen knives. This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. More specifically, our goal is to enable a collaborative robot or industrial robot arm to perform food-slicing tasks by adapting to varying material properties using compliance control. Our approach involves using Reinforcement Learning (RL) to train a robot to compliantly manipulate a knife by reducing the contact forces exerted by the food items and by the cutting board. However, training the robot in the real world can be inefficient and dangerous, and can result in a lot of food waste. Therefore, we propose SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation. Following a real2sim2real approach, our framework consists of collecting a small amount of real food-slicing data, calibrating our dual simulation environment (a high-fidelity cutting simulator and a robotic simulator), learning compliant control policies in the calibrated simulation environment, and finally deploying the policies on the real robot.
|
|
13:30-15:00, Paper TuBT30-NT.3 | Add to My Program |
SG-Bot: Object Rearrangement Via Coarse-To-Fine Robotic Imagination on Scene Graphs |
|
Zhai, Guangyao | Technical University of Munich |
Cai, Xiaoni | Technical University of Munich |
Huang, Dianye | Technical University of Munich |
Di, Yan | Technical University of Munich |
Manhardt, Fabian | Google |
Tombari, Federico | Technische Universität München |
Navab, Nassir | TU Munich |
Busam, Benjamin | Technical University of Munich |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Object rearrangement is pivotal in robotic-environment interactions, representing a significant capability in embodied AI. In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation. Unlike previous methods that rely on either known goal priors or zero-shot large models, SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics, seamlessly blending the consideration of commonsense knowledge with automatic generation capabilities. SG-Bot employs a three-fold procedure of observation, imagination, and execution to adeptly address the task. Initially, objects are discerned and extracted from a cluttered scene during observation. These objects are first coarsely organized and depicted within a scene graph, guided by either commonsense or user-defined criteria. This scene graph then informs a generative model, which forms a fine-grained goal scene considering the shape information from the initial scene and object semantics. Finally, for execution, the initial and envisioned goal scenes are matched to formulate robotic action policies. Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.
|
|
13:30-15:00, Paper TuBT30-NT.4 | Add to My Program |
Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems? |
|
Chen, Yongchao | Harvard University |
Arkin, Jacob | Massachusetts Institute of Technology |
Zhang, Yang | IBM |
Roy, Nicholas | Massachusetts Institute of Technology |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: AI-Enabled Robotics, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. Please see project website for prompts, videos, and codes.
|
|
13:30-15:00, Paper TuBT30-NT.5 | Add to My Program |
Object-Centric Instruction Augmentation for Robotic Manipulation |
|
Wen, Junjie | East China Normal University |
Zhu, Yichen | Midea Group |
Zhu, MinJie | East China Normal University |
Li, Jinming | Shanghai University |
Xu, Zhiyuan | Midea Group |
Che, Zhengping | Midea Group |
Shen, Chaomin | East China Normal University |
Peng, Yaxin | Shanghai University |
Liu, Dong | Midea Group (Shanghai) Co., Ltd |
Feng, Feifei | Midea Group |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Learning from Demonstration
Abstract: Humans interpret scenes by recognizing both the identities and positions of objects in their observations. For a robot to perform tasks such as "pick and place", understanding both what the objects are and where they are located is crucial. While the former has been extensively discussed in the literature that uses the large language model to enrich the text descriptions, the latter remains underexplored. In this work, we introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instruction with position cues. We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction, thus aiding the policy network in mastering actions for versatile manipulation. Additionally, we present a feature reuse mechanism to integrate the vision-language features from off-the-shelf pre-trained MLLM into policy networks. Through a series of simulated and real-world robotic tasks, we demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
|
|
13:30-15:00, Paper TuBT30-NT.6 | Add to My Program |
Learning to Play Foosball: System and Baselines |
|
Moos, Janosch | TU Darmstadt, Institute for Mechatronic Systems |
Derstroff, Cedric | Technische Universität Darmstadt |
Schröder, Niklas | TU Darmstadt, Institute for Mechatronic Systems |
Clever, Debora | TU Darmstadt, Institute of Mechatronic Systems |
Keywords: AI-Enabled Robotics, Industrial Robots, Machine Learning for Robot Control
Abstract: This work stages Foosball as a versatile platform for advancing scientific research, particularly in the realm of robot learning. We present an automated Foosball table along with its simulated counterpart, showcasing a diverse range of challenges through example tasks within the Foosball environment. Initial findings are shared using a simple baseline approach. Foosball constitutes a versatile learning environment with the potential to yield cutting-edge research in various fields of artificial intelligence and machine learning, notably robust learning, while also extending its applicability to industrial robotics and automation setups. To transform our physical Foosball table into a research-friendly system, we augmented it with a two-degree-of-freedom kinematic chain controlling the goalkeeper rod as an initial setup, with the intention of extending it to the full game as soon as possible. Our experiments reveal that a realistic simulation is essential for mastering complex robotic tasks, yet translating these accomplishments to the real system remains challenging and is often accompanied by a performance decline. This emphasizes the critical importance of research in this direction. To this end, we spotlight the automated Foosball table as an invaluable tool, possessing numerous desirable attributes, to serve as a demanding learning environment for advancing robotics and automation research.
|
|
13:30-15:00, Paper TuBT30-NT.7 | Add to My Program |
Language-Conditioned Robotic Manipulation with Fast and Slow Thinking |
|
Zhu, MinJie | East China Normal University |
Zhu, Yichen | Midea Group |
Li, Jinming | Shanghai University |
Wen, Junjie | East China Normal University |
Xu, Zhiyuan | Midea Group |
Che, Zhengping | Midea Group |
Shen, Chaomin | East China Normal University |
Peng, Yaxin | Shanghai University |
Liu, Dong | Midea Group (Shanghai) Co., Ltd |
Feng, Feifei | Midea Group |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: AI-Enabled Robotics, Deep Learning in Grasping and Manipulation, Planning, Scheduling and Coordination
Abstract: Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, from simple "pick-and-place" operations to tasks requiring intent recognition and visual reasoning. Inspired by the dual-process theory in cognitive science, which suggests two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and route decisions to one of two systems based on instruction type. Our RFST consists of two key components: 1) an instruction discriminator that determines which system should be activated for the current user instruction, and 2) a slow-thinking system comprising a fine-tuned vision-language model aligned with the policy networks, which allows the robot to recognize user intention or perform reasoning tasks. To assess our methodology, we built a dataset featuring real-world trajectories, capturing actions ranging from spontaneous impulses to tasks requiring deliberate contemplation. Our results, in both simulation and real-world scenarios, confirm that our approach adeptly manages intricate tasks that demand intent recognition and reasoning.
|
|
13:30-15:00, Paper TuBT30-NT.8 | Add to My Program |
How to Prompt Your Robot: A Prompt Book for Manipulation Skills with Code As Policies |
|
Gonzalez Arenas, Montserrat | Google |
Xiao, Ted | Google |
Singh, Sumeet | Google |
Jain, Vidhi | Carnegie Mellon University |
Ren, Allen Z. | Princeton University |
Vuong, Quan | UC San Diego |
Varley, Jacob | Google |
Herzog, Alexander | X, Inc. (Google) |
Leal, Isabel | Google Deepmind |
Kirmani, Sean | Google DeepMind |
Prats, Mario | Google |
Sadigh, Dorsa | Stanford University |
Sindhwani, Vikas | Google Brain, NYC |
Rao, Kanishka | Google |
Liang, Jacky | Google |
Zeng, Andy | Google DeepMind |
Keywords: AI-Enabled Robotics, Motion Control, Mobile Manipulation
Abstract: Large Language Models (LLMs) have demonstrated the ability to perform semantic reasoning, planning, and code writing for robotics tasks. However, most methods rely on pre-existing primitives, which heavily limits their scalability to new scenarios. Additionally, they use an example-based prompting style in which the LLM is provided few-shot examples of robot code. This makes it challenging for LLMs to implicitly infer task information, constraints, and API usage from examples alone. Meanwhile, research outside robotics has successfully studied instruction-based prompting, where providing LLMs with API documentation and detailed descriptions can improve code synthesis capabilities. However, it is not clear how to document robotics tasks, and naively providing full robot APIs presents a challenge to context-length limits in LLMs. In this work, we discuss how to combine different LLM prompting styles to write code for new manipulation skills. First, we evaluate different prompting styles across 3 robots in a high-level sorting task, and present a collection of empirical observations: (i) including both instructions and examples improves performance, (ii) interleaving state predictions in the examples helps reasoning, and (iii) instruction-based prompting benefits from human feedback. Our observations lead to a prompt recipe we refer to as PromptBook that combines example-based, instruction-based, and chain-of-thought prompting to write robot code, as well as a method to build the prompt leveraging LLMs and human feedback. Second, we show PromptBook can write code for new low-level manipulation skills on the fly, zero-shot. The prompt extracts motion trajectories from LLMs that the robot can execute directly with an IK controller. Finally, we evaluate the new skills on a mobile manipulator, achieving an 83% success rate at picking and 50-71% at opening drawers.
|
|
13:30-15:00, Paper TuBT30-NT.9 | Add to My Program |
A Multifidelity Sim-To-Real Pipeline for Verifiable and Compositional Reinforcement Learning |
|
Neary, Cyrus | The University of Texas at Austin |
Ellis, Christian | University of Massachusetts |
Samyal, Aryaman Singh | The University of Texas at Austin |
Lennon, Craig | United States Army Research Laboratory |
Topcu, Ufuk | The University of Texas at Austin |
Keywords: AI-Enabled Robotics, Reinforcement Learning
Abstract: We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, in order to deploy reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows for the independent training and testing of the corresponding subtask policies, while simultaneously providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies using a multifidelity simulation pipeline, the framework not only allows for efficient RL training, but also for a refinement of the subtasks and their interfaces in response to challenges arising from discrepancies between simulation and reality. In an experimental case study we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.
|
|
TuBT31-NT Oral Session, NT-G7 |
Add to My Program |
Factory/Assembly Automation |
|
|
Chair: Liu, Fei | Chongqing University |
Co-Chair: Wan, Weiwei | Osaka University |
|
13:30-15:00, Paper TuBT31-NT.1 | Add to My Program |
Bridging the Sim-To-Real Gap with Dynamic Compliance Tuning for Industrial Insertion |
|
Zhang, Xiang | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Li, Hui | Autodesk Research |
Keywords: Assembly, Compliance and Impedance Control, Machine Learning for Robot Control
Abstract: Contact-rich manipulation tasks often exhibit a large sim-to-real gap. For instance, industrial assembly tasks frequently involve tight insertions where the clearance is less than 0.1 mm and can even be negative when dealing with a deformable receptacle. This narrow clearance leads to complex contact dynamics that are difficult to model accurately in simulation, making it challenging to transfer simulation-learned policies to real-world robots. In this paper, we propose a novel framework for robustly learning manipulation skills for real-world tasks using only simulated data. Our framework consists of two main components: the ``Force Planner'' and the ``Gain Tuner''. The Force Planner is responsible for planning both the robot motion and desired contact forces, while the Gain Tuner dynamically adjusts the compliance control gains to accurately track the desired contact forces during task execution. The key insight of this work is that by adaptively adjusting the robot's compliance control gains during task execution, we can modulate contact forces in the new environment, generating trajectories similar to those seen in simulation and thereby narrowing the sim-to-real gap. Experimental results show that our method, trained in simulation on a generic square peg-and-hole task, can generalize to a variety of real-world insertion tasks involving narrow or even negative clearances, all without requiring any fine-tuning.
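The gain-adaptation idea in the abstract above can be illustrated with a deliberately simplified sketch. This is not the paper's actual Gain Tuner; the function name, the integral-style adaptation law, and all constants below are hypothetical. A scalar compliance gain is nudged in proportion to the contact-force tracking error:

```python
def tune_gain(k, f_desired, f_measured, eta=0.05, k_min=0.1, k_max=10.0):
    """Nudge a scalar compliance gain so the realized contact force tracks
    the planned one: soften when force overshoots, stiffen when it
    undershoots. A hypothetical stand-in for a learned gain tuner.
    """
    k = k + eta * (f_desired - f_measured)   # integral-style adaptation
    return min(max(k, k_min), k_max)         # keep the gain in a safe range
```

In a toy environment where the measured force is proportional to the gain, repeated calls drive the gain toward the value that realizes the desired force.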
|
|
13:30-15:00, Paper TuBT31-NT.2 | Add to My Program |
Compliant Peg-In-Hole Assembly Using a Very Soft Wrist |
|
Zhang, Qi | Osaka University |
Hu, Zhengtao | Osaka University |
Wan, Weiwei | Osaka University |
Harada, Kensuke | Osaka University |
Keywords: Assembly, Compliant Assembly, Compliant Joints and Mechanisms
Abstract: This paper proposes using a highly compliant soft wrist to improve the performance of robotic peg-in-hole assembly in uncertain environments. In contrast to past research in this field, which has used force control with relatively low compliance, we propose a method in which searching and aligning motions are easily realized by taking advantage of the wrist's high compliance under gravity. Our proposed PiH strategy is completely passive: after the peg is trapped in the hole during the hole-searching process using a spherical helix trajectory, the peg is guaranteed to be automatically inserted into the hole by the effect of gravity and wrist compliance, provided the configuration of the peg lies within the no-escapable area. The no-escapable area is obtained from a potential analysis that considers the contact state in combination with the wrist compliance space. The effectiveness of the proposed method is experimentally verified using pegs of various shapes and sizes.
|
|
13:30-15:00, Paper TuBT31-NT.3 | Add to My Program |
6D Pose Estimation Based on 3D Edge Binocular Reprojection Optimization for Robotic Assembly |
|
Li, Dong | Chongqing University |
Mu, Quan | Foreign Environmental Cooperation Center, Ministry of Ecology An |
Yuan, Yilin | Chongqing University |
Wu, Shiwei | Chongqing University |
Hong, Hualin | Chongqing University |
Tian, Ye | Chongqing University |
Jiang, Qian | Chongqing University |
Liu, Fei | Chongqing University |
Keywords: Assembly, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Accurate 6D pose estimation of objects is important for robotic assembly. This letter presents a novel method for achieving high-precision 6D pose estimation by exploiting the reprojection of 3D edges onto binocular RGB image pairs. Our proposed method encompasses three phases: detection, pose initialization, and pose refinement. In the detection phase, an existing detector is employed to identify the objects within the image pairs. Subsequently, the object image patch of interest is extracted and fed into an encoder-decoder network that leverages edge maps and RGB images for the purpose of initial pose estimation. To refine the initial pose and achieve precise 6D pose estimation, we introduce a novel binocular edge-map-based nonlinear optimization technique. Our primary contributions entail an improved initial pose estimation network and a novel pose optimization technique. The improved network is dedicated to enhancing the accuracy of initial pose estimation, while the optimization technique focuses on refining the precision of the estimations. Experimental results demonstrate the effectiveness of our method, yielding an average translation precision of 0.48 mm and rotation precision of 0.45 degrees. Consequently, our proposed method can be seamlessly integrated into robotic manipulation platforms to successfully execute diverse assembly tasks.
|
|
13:30-15:00, Paper TuBT31-NT.4 | Add to My Program |
ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility |
|
Tian, Yunsheng | MIT |
Willis, Karl | Autodesk |
Al Omari, Bassel | University of Waterloo |
Luo, Jieliang | Autodesk Research |
Ma, Pingchuan | MIT CSAIL |
Li, Yichen | MIT |
Javid, Farhad | Autodesk Research |
Gu, Edward | MIT |
Jacob, Joshua | MIT CSAIL |
Sueda, Shinjiro | Texas A&M University |
Li, Hui | Autodesk Research |
Chitta, Sachin | Autodesk Inc |
Matusik, Wojciech | MIT |
Keywords: Assembly, Planning, Scheduling and Coordination, Intelligent and Flexible Manufacturing
Abstract: The automated assembly of complex products requires a system that can automatically plan a physically feasible sequence of actions for assembling many parts together. In this paper, we present ASAP, a physics-based planning approach for automatically generating such a sequence for general-shaped assemblies. ASAP accounts for gravity to design a sequence where each sub-assembly is physically stable with a limited number of parts being held and a support surface. We apply efficient tree search algorithms to reduce the combinatorial complexity of determining such an assembly sequence. The search can be guided by either geometric heuristics or graph neural networks trained on data with simulation labels. Finally, we show the superior performance of ASAP at generating physically realistic assembly sequence plans on a large dataset of hundreds of complex product assemblies. We further demonstrate the applicability of ASAP on both simulation and real-world robotic setups. Project website: asap.csail.mit.edu
|
|
13:30-15:00, Paper TuBT31-NT.5 | Add to My Program |
Simulation-Based Approach for Automatic Roadmap Design in Multi-AGV Systems (I) |
|
Žužek, Tena | University of Ljubljana |
Vrabič, Rok | Faculty of Mechanical Engineering, University of Ljubljana |
Zdesar, Andrej | University of Ljubljana |
Škulj, Gašper | University of Ljubljana |
Banfi, Igor | Epilog D.o.o |
Bošnak, Matevž | Faculty of Electrical Engineering, University of Ljubljana |
Zaletelj, Viktor | Epilog D.o.o |
Klancar, Gregor | University of Ljubljana |
Keywords: Logistics, Factory Automation, Path Planning for Multiple Mobile Robots or Agents
Abstract: This paper addresses the problem of establishing efficient intralogistic systems, focusing on the generation of roadmaps on a given layout and the coordination of multiple Automated Guided Vehicles (AGVs). A simulation-based approach for automatic roadmap design is proposed. An event-based simulator is developed that uses ant-colony inspired optimization to generate roadmaps tailored to the specific characteristics of a given intralogistic problem, i.e., the plant layout, fleet size, statistical description of tasks, dispatching algorithm, etc. The generated solutions are evaluated with a Multi-Agent Path Finding (MAPF) simulator that uses a Safe Interval Path Planning (SIPP) algorithm. By analysing the system throughput, the optimal fleet size for the system is proposed. The approach is validated through various examples and benchmarked against existing methods in the literature.
|
|
13:30-15:00, Paper TuBT31-NT.6 | Add to My Program |
MM4MM: Map Matching Framework for Multi-Session Mapping in Ambiguous and Perceptually-Degraded Environments |
|
Wu, Zhenyu | Nanyang Technological University |
Wang, Wei | Nanyang Technological University |
Zhao, Chunyang | Nanyang Technological University |
Yue, Yufeng | Beijing Institute of Technology |
Zhang, Jun | Nanyang Technological University |
Shen, Hongming | Nanyang Technological University |
Wang, Danwei | Nanyang Technological University |
Keywords: Logistics, Mapping, Probability and Statistical Methods
Abstract: Multi-session mapping serves as the pre-requisite for autonomous robots to fulfill various long-term tasks (e.g., map updating, navigation, collaboration). However, it is challenging to implement multi-session mapping in enclosed or partially enclosed ambiguous environments (e.g., long corridors, industrial warehouses). Existing solutions either depend heavily on the matching of elementary geometric features (e.g., points, lines, and planes), which tends to fail in environments with ambiguous geometric features, or depend on a given guess of the initial transformation matrix between multiple single-session maps, which is not always obtainable or sufficiently accurate. The ambient magnetic field exhibits ubiquity and high distinctiveness at different locations, which makes it suitable for estimating the initial transformation matrix. Thus, this paper proposes a novel probabilistic magnetic-aware Map Matching framework for Multi-session Mapping, namely MM4MM, to estimate the relative transformation of multiple single-session maps and to build globally consistent maps in ambiguous and perceptually-degraded environments. The key novelties of this work are the design of the hierarchical probabilistic map matching framework and a Particle Swarm Optimization strategy to associate the magnetic data of multiple sessions. Evaluations on both simulated and real-world experiments demonstrate the greatly improved utility, accuracy, and robustness of multi-session mapping over the comparative methods.
|
|
13:30-15:00, Paper TuBT31-NT.7 | Add to My Program |
Learning Generalizable Patrolling Strategies through Domain Randomization of Attacker Behaviors |
|
Diaz Alvarenga, Carlos | University of California at Merced |
Basilico, Nicola | University of Milan |
Carpin, Stefano | University of California, Merced |
Keywords: Surveillance Robotic Systems, Planning, Scheduling and Coordination
Abstract: Graph-patrolling problems in the adversarial domain typically embed models and assumptions about how hostile events, from which an environment must be protected, are generated at a specific time and location. Relying upon such attacker models prevents algorithms from synthesizing strategies that can generalize to different settings, providing good performance under different and uncertain scenarios. In this paper, we propose a first method to deal with adversarial patrolling using a data-driven approach. We cast the problem in an RL setting where the reward function is based on the ability to neutralize attacks that can follow an unknown strategy and that, hence, can be viewed as a black-box component. We apply a policy gradient framework for optimizing action probabilities under such a reward model, showing how effective patrolling strategies can be obtained from repeated attack-defense interactions between a patrolling agent and an attacker. Our results show that the data-driven patroller can effectively provide protection against multiple, diverse attacker behaviors.
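Treating the attacker as a black box that only returns a reward is the classic REINFORCE setting. A minimal single-step sketch (all names, the softmax policy, and the hyperparameters are illustrative, not taken from the paper) might look like:

```python
import numpy as np

def reinforce_patrol(attack_fn, n_vertices, episodes=2000, lr=0.1, seed=0):
    """Train a softmax policy over which vertex to patrol, treating the
    attacker purely as a black box that returns a reward signal.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_vertices)            # policy logits
    baseline = 0.0                          # running-average reward baseline
    for _ in range(episodes):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        a = rng.choice(n_vertices, p=p)     # sample a patrol action
        r = attack_fn(a, rng)               # 1.0 if the attack was neutralized
        grad = -p                           # grad of log pi(a | theta) ...
        grad[a] += 1.0                      # ... for a softmax policy
        theta += lr * (r - baseline) * grad
        baseline += 0.05 * (r - baseline)
    p = np.exp(theta - theta.max())
    return p / p.sum()
```

With a deterministic attacker that always strikes one vertex, the learned distribution concentrates on that vertex; richer attacker behaviors simply change what `attack_fn` returns, without changing the training loop.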
|
|
13:30-15:00, Paper TuBT31-NT.8 | Add to My Program |
Combining Coordination and Independent Coverage in Multirobot Graph Patrolling |
|
Diaz Alvarenga, Carlos | University of California at Merced |
Basilico, Nicola | University of Milan |
Carpin, Stefano | University of California, Merced |
Keywords: Surveillance Robotic Systems, Planning, Scheduling and Coordination
Abstract: Graph patrolling algorithms provide effective strategies to coordinate mobile robots in the context of autonomous surveillance of valuable assets. Optimizing patrolling strategies often aims at bounding the time between subsequent visits to a vertex -- a measure known in the literature as idleness. In the domain of multi-robot patrolling, two approaches have received the most attention thus far. The first involves coordinating all robots to follow a shared patrolling strategy covering the entire graph, while the second approach partitions the environment into disjoint areas that are then assigned to individual robots. Starting from these existing solutions, in this paper we introduce a new method bridging these two complementary approaches. Our technique splits the vertices of the graph into a partition that includes a shared portion of the environment patrolled collectively by all robots, along with disjoint areas allocated exclusively to individual robots. This problem is formulated in terms of minimizing the maximum weighted idleness of the graph and is shown to be NP-hard. Then, we describe an exact solution for the problem and also propose various heuristics to efficiently compute solutions for large problem instances. We evaluate and compare the proposed techniques in simulation and show that our methods in most cases produce better patrolling strategies when compared to classic solutions. Moreover, for small problem instances where the exact solution can be found, we demonstrate that our proposed heuristic has a competitive performance ratio.
|
|
13:30-15:00, Paper TuBT31-NT.9 | Add to My Program |
Longitudinal Control Volumes: A Novel Centralized Estimation and Control Framework for Distributed Multi-Agent Sorting Systems |
|
Maier, James | Carnegie Mellon University |
Sriganesh, Prasanna | Carnegie Mellon University |
Travers, Matthew | Carnegie Mellon University |
Keywords: Sustainable Production and Service Automation, Process Control, Sensor-based Control
Abstract: Centralized control of a multi-agent system improves upon distributed control, especially when multiple agents share a common task, e.g., sorting different materials in a recycling facility. Traditionally, each agent in a sorting facility is tuned individually, which leads to suboptimal performance if one agent is less efficient than the others. Centralized control overcomes this bottleneck by leveraging global system state information, but it can be computationally expensive. In this work, we propose a novel framework called Longitudinal Control Volumes (LCV) to model the flow of material in a recycling facility. We then employ a Kalman Filter that incorporates local measurements of materials into a global estimate of the material flow in the system. We utilize a model predictive control algorithm that optimizes the rate of material flow using the global state estimate in real time. We show that our proposed framework outperforms distributed control methods by 40-100 percent in simulation and physical experiments.
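As a rough illustration of the estimation half of such a pipeline (a scalar stand-in, not the paper's LCV model; the function name and all noise parameters below are assumed), one Kalman filter step that fuses a local material measurement into a flow estimate is:

```python
def kf_step(x, P, u, z, F=1.0, B=1.0, H=1.0, Q=0.01, R=0.1):
    """One scalar Kalman filter step: predict the material level carried
    forward by the flow model, then fuse the local sensor reading z.
    All model and noise parameters here are illustrative.
    """
    x_pred = F * x + B * u                  # predicted material level
    P_pred = F * P * F + Q                  # predicted variance
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)   # measurement update
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

Repeated calls with a steady measurement pull the estimate toward the sensed value while the variance settles at a small steady-state level; a downstream MPC would then consume `x_new` as its state.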
|
|
TuBT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems II |
|
|
Chair: Triebel, Rudolph | German Aerospace Center (DLR) |
Co-Chair: Fang, Zhengru | City University of Hong Kong |
|
13:30-15:00, Paper TuBT32-NT.1 | Add to My Program |
A Safety-Adapted Loss for Pedestrian Detection in Autonomous Driving |
|
Lyssenko, Maria | Robert Bosch GmbH, University of Munich |
Pimplikar, Piyush | Robert Bosch GmbH, Corporate Research, Germany |
Bieshaar, Maarten | Robert Bosch GmbH |
Nozarian, Farzad | DFKI |
Triebel, Rudolph | German Aerospace Center (DLR) |
Keywords: Intelligent Transportation Systems, AI-Based Methods, Computer Vision for Transportation
Abstract: In safety-critical domains like autonomous driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As raw evaluation metrics are not an adequate safety indicator, recent works leverage domain knowledge to identify safety-relevant VRU and back-annotate the criticality of the interaction to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNNs penalize all misdetections equally, irrespective of their importance for the safe driving task. Hence, to mitigate the occurrence of safety-critical failure cases like false negatives, a safety-aware training strategy is needed to enhance the detection performance for critical pedestrians. In this paper, we propose a novel, safety-adapted loss variation that leverages the estimated per-pedestrian criticality during training. To this end, we exploit the reachable-set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-adapted loss function mitigates the misdetection of safety-critical pedestrians while maintaining robust performance for the general case, i.e., safety-irrelevant pedestrians.
|
|
13:30-15:00, Paper TuBT32-NT.2 | Add to My Program |
PCB-RandNet: Rethinking Random Sampling for LiDAR Semantic Segmentation in Autonomous Driving Scene |
|
Han, Xian-Feng | Southwest University |
Cheng, Huixian | Southwest University |
Jiang, Hang | Southwest University |
He, Dehong | Southwest University |
Xiao, Guo-Qiang | Southwest University |
Keywords: Intelligent Transportation Systems, AI-Based Methods, Semantic Scene Understanding
Abstract: Fast and efficient semantic segmentation of large-scale LiDAR point clouds is a fundamental problem in autonomous driving. To achieve this goal, existing point-based methods mainly adopt a Random Sampling strategy to process large-scale point clouds. However, our quantitative and qualitative studies have found that Random Sampling may be less suitable for the autonomous driving scenario, since the LiDAR points follow an uneven or even long-tailed distribution across the space, which prevents the model from capturing sufficient information from points in different distance ranges and reduces the model's learning capability. To alleviate this problem, we propose a new Polar Cylinder Balanced Random Sampling method that enables the downsampled point clouds to maintain a more balanced distribution and improves segmentation performance under different spatial distributions. In addition, a sampling consistency loss is introduced to further improve the segmentation performance and reduce the model's variance under different sampling methods. Extensive experiments confirm that our approach produces excellent performance on both SemanticKITTI and SemanticPOSS benchmarks, achieving a 2.8% and 4.0% improvement, respectively.
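The core idea of distance-balanced downsampling can be sketched as follows. This is a simplified planar-range binning, not the paper's polar cylinder partition, and the function name and parameters are hypothetical:

```python
import numpy as np

def balanced_random_sample(points, n_out, n_bins=4, r_max=50.0, seed=0):
    """Downsample an (N, 3) point cloud so each radial distance bin
    contributes (up to) the same number of points, instead of plain
    uniform sampling that over-represents dense near-sensor regions.
    """
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(points[:, :2], axis=1)       # planar range per point
    edges = np.linspace(0.0, r_max, n_bins + 1)
    per_bin = n_out // n_bins                       # equal budget per bin
    chosen = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((r >= lo) & (r < hi))[0]
        k = min(per_bin, len(idx))                  # a bin may be sparse
        if k:
            chosen.append(rng.choice(idx, size=k, replace=False))
    return points[np.concatenate(chosen)]
```

Compared with uniform sampling, far-away (long-tail) points are guaranteed a fixed share of the output, which is the property the balanced-sampling argument above relies on.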
|
|
13:30-15:00, Paper TuBT32-NT.3 | Add to My Program |
STT: Stateful Tracking with Transformers for Autonomous Driving |
|
Jing, Longlong | Waymo |
Yu, Ruichi | Waymo |
Chen, Xu | Waymo |
Zhao, Zhengli | UCI |
Sheng, Shiwei | Waymo |
Graber, Colin | Waymo |
Chen, Qi | Johns Hopkins University |
Li, Qinru | University of California San Diego |
Wu, Shangxuan | Waymo |
Deng, Han | Waymo LLC |
Lee, Sangjin | Waymo |
Sweeney, Chris | Waymo LLC |
He, Qiurui | Waymo LLC |
Hung, Wei-Chih | Waymo |
He, Tong | Waymo LLC |
Zhou, Xingyi | Google Research |
Moussavi, Farshid | Waymo |
Guo, Zijian | Waymo |
Zhou, Yin | Waymo |
Tan, Mingxing | Waymo Research |
Yang, Weilong | Waymo |
Li, Congcong | Waymo Inc |
Keywords: Intelligent Transportation Systems, AI-Enabled Robotics, AI-Based Methods
Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their present states, such as velocity and acceleration. Existing works frequently focus on the association task while either neglecting the model's performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scene while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through a long-term history of detections and is jointly optimized for both data association and state estimation tasks. Since standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks across the wider spectrum of object states, we extend them with new metrics, S-MOTA and MOTPS, that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
|
|
13:30-15:00, Paper TuBT32-NT.4 | Add to My Program |
SmartCooper: Vehicle Collaborative Perception under Adaptive Fusion and Judger Mechanism |
|
Zhang, Yuang | Tsinghua University |
An, Haonan | Nanyang Technological University |
Fang, Zhengru | City University of Hong Kong |
Xu, Guowen | City University of Hong Kong |
Zhou, Yuan | Nanyang Technological University |
Chen, Xianhao | The University of Hong Kong |
Fang, Yuguang | City University of Hong Kong |
Keywords: Intelligent Transportation Systems, Automation Technologies for Smart Cities, Computer Vision for Transportation
Abstract: In recent years, autonomous driving has garnered significant attention due to its potential for improving road safety through collaborative perception among connected and autonomous vehicles (CAVs). However, time-varying channel conditions in vehicular transmission environments demand dynamic allocation of communication resources. Moreover, in the context of collaborative perception, it is important to recognize that not all CAVs contribute valuable data, and some CAV data even have detrimental effects on collaborative perception. In this paper, we introduce SmartCooper, an adaptive collaborative perception framework that incorporates communication optimization and a judger mechanism to facilitate CAV data fusion. Our approach begins with optimizing the connectivity of vehicles while considering communication constraints. We then train a learnable encoder to dynamically adjust the compression ratio based on the channel state information (CSI). Subsequently, we devise a judger mechanism to filter the detrimental image data reconstructed by adaptive decoders. We evaluate the effectiveness of our proposed algorithm on the OpenCOOD platform. Our results demonstrate a substantial reduction in communication costs by 23.10% compared to the non-judger scheme. Additionally, we achieve a significant improvement in the average precision of Intersection over Union (AP@IoU) by 7.15% compared with state-of-the-art schemes.
|
|
13:30-15:00, Paper TuBT32-NT.5 | Add to My Program |
A Neural-Evolutionary Algorithm for Autonomous Transit Network Design |
|
Holliday, Andrew | McGill University |
Dudek, Gregory | McGill University |
Keywords: Intelligent Transportation Systems, Automation Technologies for Smart Cities, Optimization and Optimal Control
Abstract: Planning a public transit network is a challenging optimization problem, but essential in order to realize the benefits of autonomous buses. We propose a novel algorithm for planning networks of routes for autonomous buses. We first train a graph neural net model as a policy for constructing route networks, and then use the policy as one of several mutation operators in an evolutionary algorithm. We evaluate this algorithm on a standard set of benchmarks for transit network design, and find that it outperforms the learned policy alone by up to 20% and a plain evolutionary algorithm approach by up to 53% on realistic benchmark instances.
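The hybrid of a learned construction policy with evolutionary search can be caricatured in a few lines: a toy one-plus-one loop on a bitstring, where a greedy operator stands in for the trained policy. Everything here (names, the 50/50 operator mix, the bitstring objective) is illustrative, not the paper's method:

```python
import numpy as np

def neuro_evolve(n_bits=32, generations=300, seed=0):
    """Tiny (1+1) evolutionary loop with two mutation operators: a plain
    random bit flip, and a 'policy' operator standing in for a learned
    construction policy (here it greedily flips a 0-bit to 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n_bits)
    fitness = x.sum()                               # objective: count of ones
    for _ in range(generations):
        child = x.copy()
        if rng.random() < 0.5:                      # random mutation
            child[rng.integers(n_bits)] ^= 1
        else:                                       # policy-guided mutation
            zeros = np.where(child == 0)[0]
            if len(zeros):
                child[rng.choice(zeros)] = 1
        if child.sum() >= fitness:                  # (1+1) selection
            x, fitness = child, child.sum()
    return x
```

The point of the design is that the learned operator injects strong, structured moves while the random operator preserves exploration; in the paper the "policy" slot is filled by the trained graph neural net.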
|
|
13:30-15:00, Paper TuBT32-NT.6 | Add to My Program |
UDE-Based Robust Control of a Quadrotor-Slung-Load System |
|
Wang, Yanhu | Shanghai Jiao Tong University |
Yu, Gan | Shanghai Jiao Tong University |
Xie, Wei | Shanghai Jiao Tong University |
Zhang, Weidong | Shanghai JiaoTong University |
Silvestre, Carlos | University of Macau |
Keywords: Intelligent Transportation Systems, Robust/Adaptive Control
Abstract: This article addresses the robust trajectory tracking problem for a Quadrotor-Slung-Load System (QSLS), which consists of a point-mass load and a rigid-body quadrotor connected by an inelastic cable. To construct the controller, we employ the backstepping technique and propose an Uncertainty and Disturbance Estimator (UDE) to compensate for uncertainties arising from imprecise model parameters and exogenous time-varying disturbances affecting both the quadrotor and the load. The main feature of the UDE is its ability to convert the robust control problem into a low-pass filter design in the frequency domain, which generates an estimate of the lumped uncertainties. To streamline the design process, we utilize a coordinate transformation strategy that converts the QSLS into a configuration that resembles the dynamics of a typical quadrotor system. The proposed controller ensures uniform ultimate boundedness of the closed-loop errors in the presence of time-varying exogenous disturbances, while guaranteeing asymptotic stability when disturbances are zero. Finally, we present comprehensive simulation and experimental results to validate the effectiveness and robustness of the proposed solution.
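The low-pass-filter view of the UDE mentioned in the abstract can be written, for a generic system, as follows. This is a textbook UDE sketch under assumed dynamics, not the paper's QSLS-specific derivation:

```latex
\dot{x} = f(x) + b\,u + d(t)
\quad\Rightarrow\quad
d = \dot{x} - f(x) - b\,u,
\qquad
\hat{d} = G_f(s)\,\bigl(\dot{x} - f(x) - b\,u\bigr),
\quad
G_f(s) = \frac{1}{T s + 1}.
```

The control law cancels the estimate $\hat{d}$, so the robust-control problem reduces to choosing the bandwidth $1/T$ of the strictly proper low-pass filter $G_f(s)$.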
|
|
13:30-15:00, Paper TuBT32-NT.7 | Add to My Program |
Are You a Robot? Detecting Autonomous Vehicles from Behavior Analysis |
|
Maresca, Fabio | NEC Laboratories Europe GmbH |
Grazioli, Filippo | NEC Laboratories Europe GmbH |
Albanese, Antonio | Flyhound Co |
Sciancalepore, Vincenzo | NEC Laboratories Europe GmbH |
Negri, Gianpiero | Amazon Global Robotics - EU Innovation Lab |
Costa-Perez, Xavier | NEC Laboratories Europe |
Keywords: Intelligent Transportation Systems, Behavior-Based Systems, AI-Based Methods
Abstract: The tremendous hype around autonomous driving is eagerly calling for emerging and novel technologies to support advanced mobility use cases. As car manufacturers keep developing SAE level 3+ systems to improve the safety and comfort of passengers, traffic authorities need to establish new procedures to manage the transition from human-driven to fully-autonomous vehicles while providing a feedback-loop mechanism to fine-tune envisioned autonomous systems. Thus, a way to automatically profile autonomous vehicles and differentiate them from human-driven ones is a must. In this paper, we present a fully-fledged framework that monitors active vehicles using camera images and state information in order to determine whether vehicles are autonomous, without requiring any active notification from the vehicles themselves. Essentially, it builds on the cooperation among vehicles, which share the data they acquire on the road to feed a machine learning model that identifies autonomous cars. We extensively tested our solution and created the NexusStreet dataset, by means of the CARLA simulator, employing an autonomous driving control agent and a steering wheel maneuvered by licensed drivers. Experiments show it is possible to discriminate the two behaviors by analyzing video clips with an accuracy of ∼80%, which improves up to ∼93% when the target's state information is available. Lastly, we deliberately degraded the state information to observe how the framework performs under non-ideal data collection conditions.
|
|
13:30-15:00, Paper TuBT32-NT.8 | Add to My Program |
RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud |
|
Pan, Zhijun | Royal College of Art |
Ding, Fangqiang | University of Edinburgh |
Zhong, Hantao | University of Cambridge |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in the 3D world thus plays a pivotal role in applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision of moving objects, largely surpassing the performance of the state of the art. We release our code and model at https://github.com/LJacksonPan/RaTrack.
|
|
13:30-15:00, Paper TuBT32-NT.9 | Add to My Program |
Mixed Traffic Control and Coordination from Pixels |
|
Villarreal, Michael | University of Tennessee, Knoxville |
Poudel, Bibek | University of Tennessee Knoxville |
Pan, Jia | University of Hong Kong |
Li, Weizi | University of Tennessee, Knoxville |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Computer Vision for Transportation
Abstract: Traffic congestion is a persistent problem in our society. Existing methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that require domain expertise and hand engineering for each road network's observation space. Additionally, precise observations use global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. We consider image observations, a modality that has not been extensively explored for mixed traffic control via RL, as the alternative: 1) images do not require a complete re-imagination of the observation space from environment to environment; 2) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve competitive performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precise observations, e.g., up to an 8% increase in average vehicle velocity in the merge environment, despite only using local traffic information as opposed to global traffic information.
|
|
TuBL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster II |
|
|
|
13:30-15:00, Paper TuBL-EX.1 | Add to My Program |
Learning User-Specific Control Policies for Lower-Limb Exoskeletons Using Gaussian Process Regression |
|
Shahrokhshahi, Ahmadreza | Simon Fraser University |
Khadiv, Majid | Technical University of Munich |
Mansouri, Saeed | Simon Fraser University |
Arzanpour, Siamak | Simon Fraser University |
Park, Edward J. | Simon Fraser University |
Keywords: Prosthetics and Exoskeletons, Machine Learning for Robot Control, Humanoid and Bipedal Locomotion
Abstract: Robotic exoskeletons provide a viable means for enabling individuals with limited or no walking ability to traverse various surfaces with maximal external support to the patient's body. However, to achieve effective performance, it is crucial to consider anatomical differences in body size and shape among users. In this paper, we propose a framework to infer adapted user-specific policies using a small dataset from past experiments performed with twelve users wearing a lower-limb self-balancing exoskeleton. Our framework utilizes Gaussian Process Regression (GPR) to learn a mapping between user characteristics and control policy parameters. We also propose to use hindsight data relabeling to improve the performance of the controller. We experimentally test the output of the GPR model on new users and demonstrate its effectiveness in predicting user-specific walking parameters that lead to high performance. We also compare the performance of this control policy with an expert-tuned policy and show that our framework can reach comparable results without the need to perform expensive and unsafe tuning of the controller for new users.
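The core GPR step, mapping user characteristics to controller parameters, can be sketched with a minimal NumPy implementation. The kernel, noise level, and feature choices below are assumptions for illustration, not the authors' setup:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gpr_predict(X, y, Xq, ls=1.0, noise=1e-4):
    """Minimal GP regression: posterior mean and std at query points Xq.
    A stand-in for the paper's GPR step; hyperparameters are assumptions."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))   # training covariance + noise
    Ks = rbf(X, Xq, ls)                          # train/query cross-covariance
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(rbf(Xq, Xq, ls)) - (Ks * v).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Here `X` would hold normalized user characteristics (e.g. height, mass) from past experiments and `y` a tuned walking parameter; the posterior standard deviation flags new users who lie far from the training set.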
|
|
13:30-15:00, Paper TuBL-EX.2 | Add to My Program |
A Novel Material Handling System for Transporting Large-Size Components Using Multiple Collaborative Autonomous Mobile Robots |
|
Qi, Lipeng | Xi'an Jiaotong University |
Yan, Chao-Bo | Xi'an Jiaotong University |
Zhang, Meng | Xi'an Jiaotong University |
Hu, Jianchen | Xi'an Jiaotong University |
Keywords: Cooperating Robots, Multi-Robot Systems, Optimization and Optimal Control
Abstract: Autonomous mobile robots (AMRs) are playing an important role in factory logistics. However, for large-size components, it is hard for a single AMR to move them due to its limited capabilities. Cobots (i.e., collaborative AMRs), which can provide flexible, reliable, and cost-effective solutions for transporting large-size components such as aircraft wings and wind turbine blades, have attracted increasing attention and interest recently. In this paper, a novel material handling system for transporting large-size parts using multiple collaborative AMRs is proposed. The virtual structure method is adopted to model the kinematics of the cobot formation. The system works under low-latency industrial routers or 5G base stations, and each AMR uses LiDAR or a motion capture system for positioning. The control algorithm, based on model predictive control (MPC), takes into account the formation shape and the smoothness of the motion by designing an appropriate objective function. Simulations and experiments demonstrate the effectiveness of the control method. The distance errors between AMRs are within ±2cm with LiDAR and ±4mm with the motion capture system. The movement of the AMRs is smooth, ensuring safe transportation of the loads.
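An MPC objective that trades off formation shape against motion smoothness, as the abstract describes, might look like the following stage cost. Names and weights are illustrative assumptions; the paper's exact formulation is not reproduced here:

```python
import numpy as np

def formation_cost(positions, leader, offsets, controls, w_shape=1.0, w_smooth=0.1):
    """Illustrative MPC-style cost for a virtual-structure formation:
    each AMR should sit at leader + its rigid offset, and successive
    control inputs should not change abruptly."""
    shape_err = sum(np.sum((p - (leader + d)) ** 2)
                    for p, d in zip(positions, offsets))
    smoothness = np.sum(np.diff(controls, axis=0) ** 2)
    return w_shape * shape_err + w_smooth * smoothness
```

An MPC solver would minimize the sum of such terms over the prediction horizon; tightening `w_shape` enforces the formation (and hence small inter-AMR distance errors), while `w_smooth` penalizes jerky motion.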
|
|
13:30-15:00, Paper TuBL-EX.3 | Add to My Program |
A Multimodal Soft Gripper with Variable Stiffness and Variable Gripping Range Based on MASH Actuator |
|
Li, Dannuo | National University of Singapore |
Zhou, Xuanyi | National University of Singapore |
Xiong, Quan | National University of Singapore |
Yeow, Chen-Hua | National University of Singapore |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Soft pneumatic actuators with integrated strain-limiting layers have been predominant components of soft gripper technology for decades. However, owing to their intrinsic strain-limiting layer design, these soft grippers possess a singular gripping functionality, rendering them incapable of adapting to diverse gripping tasks with different strategies. Building on our previous work, we introduce a novel soft gripper that offers variable stiffness, an adjustable gripping range, and multifunctionality. The MASH actuator-based soft gripper can expand its gripping range up to threefold compared to the original configuration and ensures a secure grip by enhancing stiffness when handling heavy objects. Moreover, it supports multitasking gripping through specific gripping strategy control.
|
|
13:30-15:00, Paper TuBL-EX.4 | Add to My Program |
Autonomous Grasping Control of Multi-Fingered Robot Hand for Unseen Objects Via Vision-Language Model |
|
Heo, Si-Hwan | Korea Institute of Science and Technology |
Hwang, Donghyun | Korea Institute of Science and Technology |
Yang, Sungwook | Korea Institute of Science and Technology |
Keywords: Grasping, Multifingered Hands, Control Architectures and Programming
Abstract: With the advent of large-scale vision-language models (VLMs) capable of perceiving situations and making nuanced decisions, robotic systems have increasingly utilized them to handle our daily living environments. This study introduces an autonomous grasping control framework leveraging recent VLM advancements, with a particular focus on multi-fingered robot hand applications. This approach aims to broaden VLM decision-making capabilities, extending beyond one-dimensional gripper motions focused on stability and success rates to context-based versatile grasping. Our decision-making architecture comprises four models: GPT4-vision, GPT4-turbo, Grounding DINO with SAM, and DSQnet. The first two models are instrumental in developing the grasping strategy, while the latter two translate this strategy into spatial information the robotic system can interpret. The GPT models assess the grasping situation via RGB images, with user requirements input as prompts. Additionally, constraint prompts are used to orient the VLM towards considering itself a robotic agent with a hand. Consequently, the GPT model determines which object to grasp and the appropriate posture for doing so. An edge-cloud robot control framework was developed to apply this decision-making capability in real-world scenarios. With our proposed concept and framework, we demonstrate the ability to grasp various unseen objects within the same scene differently, based on user requirements.
|
|
13:30-15:00, Paper TuBL-EX.5 | Add to My Program |
Learning Manipulation Skills for Cosmetic Services |
|
Duan, Anqing | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Service Robotics, Learning from Demonstration, Task and Motion Planning
Abstract: The increasing deployment of robots has significantly enhanced automation levels across a wide and diverse range of industries. This paper investigates the automation challenges of laser-based dermatology procedures in the beauty industry. This group of related manipulation tasks involves delivering energy from a cosmetic laser onto the skin in repetitive patterns. To automate this procedure, we propose to use a robotic manipulator and endow it with the dexterity of a skilled dermatology practitioner through a learning-from-demonstration framework. To ensure that the cosmetic laser can properly deliver the energy onto the skin surface of an individual, we develop a novel structured prediction-based imitation learning algorithm with the merit of handling geometric constraints. Notably, our proposed algorithm effectively tackles the imitation challenges associated with quasi-periodic motions, a common feature of many laser-based cosmetic tasks. The conducted real-world experiments illustrate the performance of our robotic beautician in mimicking realistic dermatological procedures. Our new method is shown to not only replicate the rhythmic movements from the provided demonstrations but also to adapt the acquired skills to previously unseen scenarios and subjects.
|
|
13:30-15:00, Paper TuBL-EX.6 | Add to My Program |
Rapid and Energy-Efficient Stiffness Control of Continuum Robots with Modular Electropermanent Magnet Joints |
|
Song, ChangSeob | Korea Institute of Science and Technology |
Yang, Sungwook | Korea Institute of Science and Technology |
Hwang, Donghyun | Korea Institute of Science and Technology |
Keywords: Compliance and Impedance Control, Compliant Joints and Mechanisms, Surveillance Robotic Systems
Abstract: Current continuum manipulators for confined spaces often lack the ability to adjust their stiffness. Variable stiffness joints have been explored to address this, but existing solutions suffer from slow stiffening or require bulky tethered actuation systems. This work introduces a novel variable stiffness joint that utilizes electropermanent magnets (EPMs) for rapid and high-rigidity stiffening. EPMs are switchable magnets that require only switching energy to maintain their stiffness state. Our design combines EPMs with magneto-rheological elastomers to maximize stiffening torque. By leveraging the high energy density of NdFeB magnets, the EPM joint achieves significant rigidity variation in continuum manipulators. This innovation has the potential to improve robots that perform maintenance and inspection tasks in confined environments such as semiconductor fabrication plants and aircraft engines.
|
|
13:30-15:00, Paper TuBL-EX.7 | Add to My Program |
Exploring Robotic Arm Movement Profiles: How Movement Shapes User Perception |
|
Liberman-Pincu, Ela | Ben-Gurion University of the Negev |
Oron Gilad, Tal | Ben-Gurion University of the Negev |
Keywords: Human Factors and Human-in-the-Loop, Human-Robot Collaboration, Motion and Path Planning
Abstract: This study examines the impact of motion design on user perceptions of and interactions with robotic arms, which are expanding from industrial applications to daily human tasks. The research investigates how different motion components influence user preferences. An online questionnaire allowed participants to watch videos of a robot performing laundry sorting tasks and then select descriptive terms for each robot's behavior. Additionally, they rated various robot behavior profiles based on their willingness to use them. The outcomes revealed that motion modality significantly affects perceptions of a robot's reliability, professionalism, and innovation. In addition, high levels of erratic behavior were found to be associated with friendliness, reliability, and intelligence, and generally received more positive views. This indicates a complex relationship between motion patterns and human perception, where innovation is appreciated but not at the cost of operational functionality. The challenge lies in balancing innovative movement with perceived reliability and professionalism.
|
|
13:30-15:00, Paper TuBL-EX.8 | Add to My Program |
Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-Aware Structure-Of-Interest Planning |
|
Qi, Jiaming | Centre for Transformative Garment Production, Hong Kong |
Zhou, Peng | The University of Hong Kong |
Zheng, Pai | The Hong Kong Polytechnic University |
Wu, Hongmin | Institute of Intelligent Manufacturing, Guangdong Academy of Sciences |
Lee, Hoi-Yin | The Hong Kong Polytechnic University |
Lu, Liang | University of Hong Kong |
Yang, Chenguang | University of Liverpool |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Pan, Jia | University of Hong Kong |
Keywords: Manipulation Planning, Dual Arm Manipulation, Perception for Grasping and Manipulation
Abstract: Bagging operations, common in packaging and assisted living applications, are challenging due to a bag's complex deformable properties. To address this, we develop a robotic system for automated bagging tasks using an adaptive structure-of-interest (SOI) manipulation approach. Our method relies on real-time visual feedback to dynamically adjust manipulation without requiring prior knowledge of bag materials or dynamics. We present a robust pipeline featuring state estimation for SOIs using Gaussian Mixture Models (GMM), SOI generation via optimization-based bagging techniques, SOI motion planning with Constrained Bidirectional Rapidly-exploring Random Trees (CBiRRT), and dual-arm manipulation coordinated by Model Predictive Control (MPC). Experiments demonstrate the system's ability to achieve precise, stable bagging of various objects using adaptive coordination of the manipulators. The proposed framework advances the capability of dual-arm robots to perform more sophisticated automation of common tasks involving interactions with deformable objects.
|
|
13:30-15:00, Paper TuBL-EX.9 | Add to My Program |
Time-Delay Compensation for Delayed Acceleration Input in CACC Using Non-Collocated Sensing of Inter-Vehicle Distance |
|
Yavuz, Ahmet | Keio University |
Kubo, Ryogo | Keio University |
Keywords: Intelligent Transportation Systems, Control Architectures and Programming, Cooperating Robots
Abstract: Cooperative adaptive cruise control (CACC) enables a shorter inter-vehicle distance (IVD) by using wireless communications and radar-based IVD measurements. The leader-follower CACC architecture with a Smith predictor (SP) has been established to compensate for communication delays and further decrease the IVDs. By changing from the front-IVD measurements of the leader-follower architecture to rear-IVD measurements, the number of communication paths is halved. However, the leader-follower architecture with SP lacks flexibility because each vehicle is managed by its preceding vehicle. This study proposes introducing the SP into a general one-vehicle look-ahead architecture with non-collocated sensing of the IVD to synchronize feedforward-based acceleration control and feedback-based IVD control, which results in higher flexibility than the conventional methods while keeping the benefit of fewer communication paths. The simulation results show that the proposed method compensates for the communication delay as in the leader-follower architecture with SP and rear-IVD measurements.
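The Smith predictor idea, letting the controller act on an undelayed output estimate built from an internal model plus a delayed copy, can be sketched in discrete time. The first-order model below is an assumption for illustration, not the paper's CACC vehicle dynamics:

```python
from collections import deque

class SmithPredictor:
    """Illustrative discrete-time Smith predictor. An internal delay-free
    model y[k+1] = a*y[k] + b*u[k] runs alongside a delayed copy of its
    output; adding the measured (delayed) plant output and subtracting the
    delayed model output gives the controller an undelayed estimate."""

    def __init__(self, a, b, delay_steps):
        self.a, self.b = a, b
        self.y_model = 0.0                            # delay-free model state
        self.buf = deque([0.0] * delay_steps, maxlen=delay_steps)

    def output_estimate(self, u, y_measured):
        y_delayed = self.buf[0]                       # model output, delay steps ago
        est = self.y_model + (y_measured - y_delayed) # SP correction term
        self.y_model = self.a * self.y_model + self.b * u
        self.buf.append(self.y_model)
        return est
```

When the model matches the plant exactly, the correction term vanishes and `est` equals the delay-free model output, which is what lets the feedback loop behave as if the delay were outside the loop.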
|
|
13:30-15:00, Paper TuBL-EX.10 | Add to My Program |
Paint with the Sun: A Robotic System for Heliography |
|
Hu, Luyin | Hong Kong Polytechnic University |
Navarro-Alarcon, David | The Hong Kong Polytechnic University |
Keywords: Sensor-based Control, Motion Control, Art and Entertainment Robotics
Abstract: We introduce a robotic system for heliography, which refers to painting with the sun. To automate this manipulation task, we developed a unique method that merges thermal servoing with vision-guided controls. Our robotic system is equipped with a magnifying lens affixed to its end-effector, complemented by thermal and visual sensing equipment. The performance of our method is evaluated with a series of heliography experiments.
|
|
13:30-15:00, Paper TuBL-EX.11 | Add to My Program |
Vision-Based Collaborative Robot Automation and Voice Control System Using Mobile Robot Arms |
|
Kim, Hanjun | Seoul National University |
Ahn, Sung-Hoon | Seoul National University |
Kim, Ayoung | Seoul National University |
Keywords: Human-Robot Collaboration, Object Detection, Segmentation and Categorization, Natural Dialog for HRI
Abstract: A collaborative robot is a robot that interacts with and performs tasks alongside workers, which can increase work efficiency. However, currently available commercial collaborative robot solutions have limited functionality and disadvantages such as inflexibility in unexpected situations. To address this problem, this study aims to develop a collaborative robot automation process driven by voice commands and implement it in the workspace. In addition, we propose a new strategy to enhance robots' ability to recognize their surroundings and to ensure the safety of people on the job. The main content of the study consists of two parts. First, we develop a process in which the robot accepts work instructions from the operator through voice commands and accurately understands them. Second, the robot leverages camera vision systems and autonomous driving platforms to recognize its surroundings and create optimal work plans to carry out its work. This study is expected to contribute to expanding the use of collaborative robots in industrial sites and improving work efficiency. In addition, the safety features using voice command and vision systems are expected to be applicable to various applications.
|
|
13:30-15:00, Paper TuBL-EX.12 | Add to My Program |
RNN-Based Shared Control for Enhanced Sense of Agency in Robotic Teleoperation |
|
Morita, Tomoya | Nagoya University |
Armleder, Simon | Technische Universität München |
Iino, Hiroto | Waseda University |
Aoyama, Tadayoshi | Nagoya University |
Cheng, Gordon | Technical University of Munich |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: Telerobotics and Teleoperation, Machine Learning for Robot Control, Embodied Cognitive Science
Abstract: Robotic teleoperation, where humans operate robots at remote locations, removes temporal and spatial barriers and enables physical interaction with people and objects. Proactive robot operation is expected to improve the operator's motivation and sense of self-efficacy. However, when manipulation interventions that support task execution occur, the sense of agency (SoA), i.e., the perception that the movement of the observed object is caused by oneself, decreases. Therefore, a method of manipulation intervention that maintains a high SoA is required. We propose a shared control method in which the robot motion does not deviate significantly from the operator's intention, achieved by predicting the next-time-step input in real time with an RNN model trained on a skilled operator's operations. We experimented with pouring tasks under different conditions, using a system in which two remote robot arms were controlled by a VR device, the operator's hand and head position and posture were synchronized with the robot, and the robot's viewpoint was shared. Experimental results confirmed that in some cases, the proposed method corrects the operation input temporally and spatially so that it approaches the trajectory of the skilled operator. Although no significant improvement in task performance was obtained with the proposed method, it was confirmed that the proposed method maintained a higher SoA than motion playback of the skilled operator's operation.
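The prediction-plus-correction structure the abstract describes can be sketched as follows. The Elman cell, the linear blend, and the alpha value are illustrative assumptions, not the paper's trained model or exact correction law:

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, Wo):
    """One Elman-RNN step: update the hidden state from the current input
    and emit a prediction of the next-time-step input."""
    h_next = np.tanh(Wx @ x + Wh @ h)
    return h_next, Wo @ h_next

def shared_control(u_operator, u_predicted, alpha=0.7):
    """Blend the operator's command with the model's prediction of a skilled
    operator's command; a larger alpha keeps more operator authority (and,
    plausibly, a higher sense of agency)."""
    return alpha * np.asarray(u_operator) + (1 - alpha) * np.asarray(u_predicted)
```

At each control cycle the RNN would consume the recent operator inputs, predict the skilled operator's next input, and the blended command would be sent to the robot arms.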
|
|
13:30-15:00, Paper TuBL-EX.13 | Add to My Program |
Autonomous Orientation Control of Forceps Based on Real-Time Action Segmentation in Robotic Surgery |
|
Yamada, Yutaro | Nagoya University |
Colan, Jacinto | Nagoya University |
Davila, Ana | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: AI-Based Methods, Recognition, Surgical Robotics: Laparoscopy
Abstract: Robotic Minimally Invasive Surgery (RMIS) systems offer only complementary assistance, resulting in a high workload for surgeons. While surgeons have high cognitive abilities, they may experience fatigue and have limited dexterity. In contrast, robots can move with higher dexterity and precision without fatigue but have limited perception and decision-making. To combine their abilities, we propose a shared control system that recognizes the surgeon's actions and provides autonomous robotic assistance. Our system consists of a real-time action segmentation system and an assistance system. The segmentation system recognizes the surgeon's actions using a hierarchical clustering framework based on visual and kinematic data. The assistance system provides the appropriate forceps orientation for the recognized actions. We evaluate the system using a pick-and-place task. The action segmentation system achieves an average accuracy of 85.3%, with a processing time of 44.5 ms per output, demonstrating its ability to accurately and swiftly recognize actions. A subject experiment comparing task completion time and mental workload with and without assistance shows significant reductions in both metrics, suggesting its effectiveness in reducing operational difficulty and cognitive workload. In conclusion, our shared control system recognizes the surgeon's actions and autonomously controls forceps orientation, enhancing operational efficiency and easing cognitive demands in robotic surgery.
|
|
13:30-15:00, Paper TuBL-EX.14 | Add to My Program |
TripletLoc: One-Shot Global Localization Using Semantic Triplet in Large-Scale Urban Environment |
|
Ma, Weixin | The Hong Kong Polytechnic University |
Sun, Yuxiang | City University of Hong Kong |
Yin, Huan | Hong Kong University of Science and Technology |
Su, Zhongqing | The Hong Kong Polytechnic University |
Keywords: Localization, Range Sensing, SLAM
Abstract: This study introduces a framework, TripletLoc, for fast and robust registration of a single LiDAR scan globally to a large-scale reference map. In contrast to the conventional method using place recognition and point cloud registration, TripletLoc generates correspondences between lightweight semantics in environments directly, which is close to how humans perceive the world. To achieve this, instances from the single query scan and the large-scale reference map are used to construct two semantic graphs, respectively. A novel semantic triplet-based histogram descriptor is proposed to match instances in the query scan to instances in the reference map. Then, graph-theoretic outlier pruning is employed to obtain inlier correspondences from the raw instance-to-instance correspondences for robust 6-DoF pose estimation. We evaluate our pipeline extensively on a public large-scale dataset, HeliPR, which covers diverse and complex scenarios in urban environments. Experimental results demonstrate that our method can achieve fast and robust global localization under diverse and challenging environments, with high memory efficiency.
|
|
13:30-15:00, Paper TuBL-EX.15 | Add to My Program |
From '鏡花水月' (Mirror Flower, Water Moon) to Visual Contrastive Prospective Learning for UAV Indoor Autonomous Navigation |
|
Chang, Yingxiu | University of Hull |
Cheng, Yongqiang | University of Sunderland |
Murray, John Christopher | University of Sunderland |
Khalid, Muhammad | University of Hull |
Manzoor, Umar | University of Sunderland |
Keywords: Aerial Systems: Perception and Autonomy, AI-Based Methods, Representation Learning
Abstract: '鏡花水月' (Flower in the Mirror, Moon in the Water) is metaphorically used to depict something that can be seen but is untouchable (illusions of the real world). The mirror and water surfaces can be regarded as a latent space that consists of representations generated from reality and connected towards illusions. Assuming there is a mirror that can foresee the future, the mirror surface (i.e., latent space) essentially requires a representation generated from the latest observations (i.e., hallucinating future representations), which is subsequently decoded to hallucinate future scenarios. Hallucinating future representations (i.e., prospective representation learning) has demonstrated better action prediction for robot motion and path planning. Therefore, this poster focuses on obtaining prospective-aware representations from the latest sequential images by leveraging the advantages of contrastive learning to benefit the performance of vision-based mapless UAV indoor navigation. The model and video are available at: https://github.com/Yingxiu-Chang/MulSCPL.
|
|
13:30-15:00, Paper TuBL-EX.16 | Add to My Program |
Inchworm-Like Biomimetic Magnetic-Driven Robotic Shell for Capsule Endoscope in Intestinal Tract |
|
Yu, Xinkai | Harbin Institute of Technology (Shenzhen) |
Wang, Jiaole | Harbin Institute of Technology, Shenzhen |
Su, Jingran | Department of Gastroenterology, Qilu Hospital of Shandong University |
Song, Shuang | Harbin Institute of Technology (Shenzhen) |
Keywords: Medical Robots and Systems, Mechanism Design, Biologically-Inspired Robots
Abstract: Wireless capsule endoscopy has become a widely utilized tool for diagnosing intestinal diseases, yet its passive movement alongside intestinal peristalsis limits its effectiveness. To overcome this limitation, we propose an innovative solution in the form of an inchworm-like biomimetic magnetic-driven robotic shell tailored for capsule endoscopy within the intestinal tract. Our robotic shell employs a magnetic torsion spring (MTS) mechanism to enable extension and contraction motions under the influence of an external magnetic field. Furthermore, flexible bristles on its surface facilitate inchworm-like locomotion through a differential friction effect (DFE). A thorough analysis of the forces and torques involved in both the MTS-driven and crawling locomotion processes has been conducted to refine design and control strategies. We have fabricated a prototype of the robotic shell measuring 16mm in diameter and 31.3mm in length, integrating a commercial capsule endoscope for experimental validation. Rigorous testing across various environments, including phantoms and in-vitro intestine scenarios, demonstrates the active locomotion and crawling capabilities of the robotic shell. Experimental results indicate its effectiveness in advancing within the porcine intestine, achieving an average speed of 3.00mm/sec.
|
|
13:30-15:00, Paper TuBL-EX.17 | Add to My Program |
Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions |
|
Korekata, Ryosuke | Keio University |
Kaneda, Kanta | Keio University |
Nagashima, Shunya | Keio University |
Imai, Yuto | Keio University |
Sugiura, Komei | Keio University |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception, AI-Enabled Robotics
Abstract: Our objective is to develop a domestic service robot (DSR) capable of following open-vocabulary instructions to transport everyday objects to the designated pieces of furniture. We specifically focus on a method that retrieves images of target objects and receptacles from pre-exploration images. Our proposed approach allows for retrieving images of both target objects and receptacles using multimodal foundation models. To validate our approach, we have developed a new dataset comprising real-world images from diverse building environments and instructions obtained from crowdsourcing. The results indicate that our method outperforms existing approaches in standard image retrieval metrics. Moreover, we showcase the effectiveness of our method on a standard DSR platform, achieving an 82% task success rate despite the zero-shot transfer setting.
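The retrieval step described above reduces to ranking candidate images by the similarity between their embeddings and the instruction embedding. A cosine-similarity sketch with toy vectors follows; a real system would use multimodal foundation-model features, and the vectors below are invented for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def rank_images(instruction_emb, image_embs):
    # Indices of candidate images, best match first.
    return sorted(range(len(image_embs)),
                  key=lambda i: cosine(instruction_emb, image_embs[i]),
                  reverse=True)

instr = [1.0, 0.0, 0.5]                      # embedded instruction (toy)
images = [[0.0, 1.0, 0.0],                   # pre-exploration image embeddings
          [0.9, 0.1, 0.4],
          [0.5, 0.5, 0.5]]
ranking = rank_images(instr, images)
```

Running the same ranking twice, once over target-object candidates and once over receptacle candidates, yields the two retrieval lists the abstract describes.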
|
|
13:30-15:00, Paper TuBL-EX.18 | Add to My Program |
Bionic Hierarchical-Embodiment Design for Soft Robot Actuation-Perception Synergy |
|
Xu, Zhidong | HIT |
Shi, Peipei | State Key Laboratory of Robotics and Systems, Harbin Institute O |
Cao, Liyong | HIT |
Yan, Jihong | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Actuation-sensing integration is one of the keys to performance improvement for soft robots. However, the highly nonlinear electro-mechanical coupling between actuators and sensors usually compromises their function after integration. Inspired by the hierarchical-embodiment structures of octopus tentacles, an actuation-perception synergy design strategy is proposed. The soft module with three preprogrammed shapes presents high-sensitivity (GF = 17.3) proprioception and sequentially variable stiffness (5.7×) without compromising the original compliance. The sensitivity of the proprioceptors can be accurately predicted and regulated by optimizing their topological structures with a proposed numerical design method. With the help of actuation-sensing functional synergy, a soft crawler can creep through a narrow space, and a gripper can successfully recognize randomly placed objects with 1 mm resolution.
|
|
13:30-15:00, Paper TuBL-EX.19 | Add to My Program |
CAD-Informed Uncertainty-Aware Robotic Assembly Sequences |
|
Kiyokawa, Takuya | Osaka University |
Rodriguez Brena, Ismael Valentin | German Aerospace Center (DLR) |
Nottensteiner, Korbinian | German Aerospace Center (DLR) |
Lehner, Peter | German Aerospace Center (DLR) |
Eiband, Thomas | German Aerospace Center (DLR) |
Roa, Maximo A. | German Aerospace Center (DLR) |
Harada, Kensuke | Osaka University |
Keywords: Assembly, Planning, Scheduling and Coordination, Planning under Uncertainty
Abstract: This study addresses a multi-objective optimization problem in the planning of uncertainty-aware sequence and motion for mechanical products with intricate structures and numerous contact areas. The proposed pipeline involves planning several elements, including assembly order of parts, object placement pose, grasp, and arm trajectory. To generate an optimized sequence and motion that satisfies multiple conditions under mandatory requirements, we use a multi-objective optimization algorithm inspired by Non-Dominated Sorting Genetic Algorithm III (NSGA-III), along with contact-rich robotic assembly-oriented constraints and objective functions. The proposed pipeline takes as input the CAD models of robot hardware, workspace, and assembled parts, conducts 3D geometrical and physical simulations of assembly motions, and then optimizes the assembly plan, including part order, object placement pose, state transition, grasp, and trajectory for the real robot to execute. The key component of the proposed pipeline is the uncertainty-aware ConCERRT-based state transition planner. Our experiments on assembly planning for a chainsaw product demonstrated that the proposed method can generate constraint-satisfied assembly plans with a 100% success rate while lowering uncertainty in simulations.
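NSGA-III, like its predecessors, is built on non-dominated sorting: a plan survives into the first front if no other plan beats it in every objective. A minimal sketch for a minimization problem over toy objective vectors (not the paper's actual assembly objectives):

```python
def dominates(a, b):
    # Minimization: a dominates b if it is no worse in every objective
    # and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(points):
    # The non-dominated (Pareto) front: points no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy objective vectors, e.g., (execution time, residual uncertainty).
plans = [(1, 5), (2, 2), (5, 1), (4, 4), (3, 3)]
front = first_front(plans)
```

NSGA-III then selects within fronts using reference directions to preserve diversity across many objectives; that step is omitted here for brevity.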
|
|
13:30-15:00, Paper TuBL-EX.20 | Add to My Program |
Insect-Inspired Perception and Navigation Systems |
|
Ye, Lingjian | Chinese Academy of Sciences |
Zhou, Yimin | Chinese Academy of Sciences |
Zhang, Qieshi | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Keywords: Biomimetics, Perception-Action Coupling, Vision-Based Navigation
Abstract: For the obstacle avoidance and navigation problems of robots, a central tension lies between limited computational resources and the need for efficient behavior, which motivates the exploration of biological responsive mechanisms, i.e., simple insect neural structures. Insects demonstrate remarkable abilities in interacting with dynamic cluttered scenes, such as collision avoidance and navigation. Inspired by the visual intelligence of insects, we study the environmental adaptability of Lobula Giant Movement Detector (LGMD) and Direction Selective Neuron (DSN) perception models based on feedforward inhibition and feedback membrane-potential-gradient perception gain mechanisms. Subsequently, these models are combined with a biomimetic navigation model based on the PI mechanism to achieve rapid obstacle avoidance and real-time navigation in unknown environments. The effectiveness and efficiency of the biomimetic perception model are validated in real-world scenarios, while the effectiveness of the fused model is tested in simulated scenarios. Future improvements will involve exploring unified control mechanisms for biological perception and motion decision-making, or alterations in the fusion of different strategies.
|
|
13:30-15:00, Paper TuBL-EX.21 | Add to My Program |
Cooperative Control of Two Magnetically Driven Microrobots for Automated Assembly |
|
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Xu, Sheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Cooperating Robots
Abstract: In this work, we propose a novel cooperative control strategy for two magnetic millirobots, based on their velocity difference, for the automatic manipulation and assembly of objects. The velocity difference between the two millirobots is modeled and controlled via the inclination angle of the external oscillating magnetic field, allowing the pair to cooperatively transport and steer arbitrarily shaped objects in contact. Controlling the positions and formations of two millirobots simultaneously in the same magnetic field makes the system underactuated, because there are more states than control inputs. A cooperative control method for the two-millirobot team, inspired by differential wheels, is proposed to address this underactuation, enabling closed-loop control of the two millirobots for collaborative object transport. Considering the efficiency and collision avoidance requirements of high-precision assembly of multiple objects, an optimal task allocation method based on the shortest total transport path and an improved real-time sampling-based path planning method are proposed. Experiments demonstrated that the millirobot team could cooperatively transport objects 4 times heavier and 26 times larger than a single robot can handle, under closed-loop path-following control. Furthermore, the millirobot team could autonomously assemble 3 strip objects into the desired shape with a position error of less than 2.16 mm and an orientation error of less than 5.07 degrees.
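The differential-wheel analogy can be made concrete with the standard differential-drive relations, where the mean speed advances the shared object and the speed difference rotates it. This is a generic kinematic sketch under assumed quantities (`v1`, `v2`, separation `d`); the paper's actual controller maps these speeds through the oscillating-field inclination angle, which is not reproduced here.

```python
def object_twist(v1: float, v2: float, d: float):
    """Differential-wheel analogy for two millirobots pushing a shared object.

    v1, v2: speeds of the two robots (m/s); d: their separation (m).
    Returns (linear velocity of the object center, angular rate about it).
    """
    v = 0.5 * (v1 + v2)    # the object center advances at the mean speed
    omega = (v2 - v1) / d  # the speed difference rotates the object
    return v, omega

# Robots at 2 mm/s and 4 mm/s, spaced 10 mm apart.
v, omega = object_twist(0.002, 0.004, 0.01)
```

Driving both robots at equal speed translates the object; commanding a speed difference steers it, which is the workaround for underactuation that the abstract describes.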
|
|
13:30-15:00, Paper TuBL-EX.22 | Add to My Program |
Magnetic Small-Scale Fish-Like Robot Motion Control by Broad Learning System for Obstacle Avoidance |
|
Xu, Sheng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sc |
Cai, Mingxue | The Chinese University of Hong Kong (CUHK), Shatin NT, Hong Kong |
Huang, Chenyang | Shenzhen Institutes of Advanced Technology, Chinese Academy of S |
Xu, Tiantian | Chinese Academy of Sciences |
Keywords: Learning from Demonstration, Automation at Micro-Nano Scales, Learning from Experience
Abstract: A new control policy using the broad learning system (BLS) is proposed for robot point-reaching motion control. A sample application, steering a magnetic small-scale robot past obstacles, is designed. The proposed learning method avoids detailed mathematical modeling of the dynamic system. The parameter constraints of the BLS-based controller are derived using Lyapunov theory, and the final motion controller is obtained through a constrained training process. Finally, the effectiveness of the proposed method is demonstrated by the convergence of the artificial magnetic fish's motion to the targeted area while successfully avoiding obstacles.
|
|
13:30-15:00, Paper TuBL-EX.23 | Add to My Program |
Tension Maintenance Mechanism for Robust Control of Twisted String Actuation-Based Hyper-Redundant Manipulator |
|
Cho, Minjae | KAIST |
Yi, Yesung | Korea Advanced Institute of Science and Technology |
Kyung, Ki-Uk | Korea Advanced Institute of Science & Technology (KAIST) |
Keywords: Mechanism Design, Actuation and Joint Mechanisms, Redundant Robots
Abstract: Hyper-redundant manipulators have been developed for hazardous-environment exploration due to their flexibility and high agility in the workspace. In this research, we designed a hyper-redundant manipulator by integrating Twisted String Actuators (TSAs) and Rolling Contact Joints (RCJs) to overcome the limitations of traditional cable-driven systems, such as difficulties with long-distance power transmission, and to achieve high payload capability with a compact design. To prevent instantaneous tension loss due to slack, and to maintain the contraction ratio of the TSA with respect to motor rotation for robust control of the manipulator, we proposed a tension maintenance mechanism using compression springs at the distal end of the manipulator. Additionally, to reduce losses from string contact friction, spring sheaths were inserted along the joint holes. Our approach enhances the repeatability and position controllability of the manipulator. We noted a 57.7% reduction of error in repeatability tests, along with 35.9% and 38.8% improvements in piecewise position control accuracy and precision, respectively, compared to a conventional manipulator, leading to enhanced controllability. We also experimentally verified that the proposed manipulator can maintain its trajectory with a variance of less than 2.83% under payloads up to 1600 g. Overall, our manipulator has the potential to expand the range of exploration environments in which robots can be used by simultaneously demonstrating large payload capacity and controllability.
|
|
13:30-15:00, Paper TuBL-EX.24 | Add to My Program |
An Eight-Neuron Network for Quadruped Locomotion with Hip-Knee Joint Control |
|
Liu, Xiyan | Zhejiang University |
Keywords: Legged Robots, Biomimetics, Motion Control
Abstract: The gait generator, capable of producing rhythmic signals for coordinating multiple joints, is crucial for quadruped robot locomotion control. In biology, its counterpart is the Central Pattern Generator, a small neural network of interacting neurons. Inspired by this architecture, researchers have designed artificial neural networks composed of simulated neurons or oscillator equations. However, existing designs often overlook the relationship between the spatiotemporal symmetry of the signals and the structural symmetry of the network, which leads to fewer gait patterns than their biological counterparts produce. Additionally, the architectures are relatively simple, limiting the controllable degrees of freedom. Lastly, neurons are modeled as oscillators, so gait transitions require altering the coupling relationships between oscillators rather than adjusting neuron stimulation. In this paper, we utilize symmetry theory to design an eight-neuron network composed of Stein neuron models, which achieves five gaits and coordinated control of the hip and knee joints. We validate its signal stability through numerical simulations, revealing various results and patterns encountered when implementing gait transitions via neuron stimulation. Based on these, we design four gait transition strategies. Using a commercial quadruped robot model, we demonstrate the feasibility of this network by implementing motion control and gait transitions through a simple mapping method.
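The idea of a small coupled network locking onto a gait pattern can be sketched with Kuramoto-style phase oscillators, a common stand-in for neuron-based CPG models. The Stein neuron dynamics, the eight-neuron topology, and hip-knee coordination of the paper are not reproduced; the four-oscillator trot example and all gains below are illustrative assumptions.

```python
import math

def cpg_step(phases, omega, K, desired, dt):
    """One Euler step of phase oscillators coupled to lock a target gait.

    Each oscillator is pulled toward the phase offsets of the desired gait:
    at the locked state, phases[j] - phases[i] = desired[j] - desired[i].
    """
    n = len(phases)
    out = []
    for i in range(n):
        coupling = sum(
            math.sin(phases[j] - phases[i] - (desired[j] - desired[i]))
            for j in range(n))
        out.append((phases[i] + dt * (omega + K * coupling)) % (2 * math.pi))
    return out

# Trot: diagonal legs in phase, lateral legs half a cycle apart.
trot = [0.0, math.pi, math.pi, 0.0]
phases = [0.1, 0.5, 2.0, 4.0]          # arbitrary initial phases
for _ in range(4000):                  # 20 s at dt = 5 ms
    phases = cpg_step(phases, omega=2 * math.pi, K=2.0, desired=trot, dt=0.005)
```

Switching the `desired` offset vector mid-run changes the gait, a coupling-based transition of the kind the abstract contrasts with stimulation-based transitions.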
|
|
13:30-15:00, Paper TuBL-EX.25 | Add to My Program |
Design, Modelling and Control of a Soft Reconfigurable Gripper Using Reinforcement Learning |
|
Vatsal, Vighnesh | Tata Consultancy Services |
George, Nijil | TCS Research & Innovation |
Lima, Rolif | TCS Research |
Das, Kaushik | TATA Consultancy Service |
Keywords: Grasping, Soft Robot Applications, Deep Learning in Grasping and Manipulation
Abstract: Grasping and manipulation are fundamental capabilities for robots. Conventional approaches, such as rigid parallel-jaw or vacuum-suction grippers, are unable to handle the full spectrum of objects, especially fragile or deformable ones. Soft robotic grasping has emerged as a possible solution for these types of objects in applications such as agriculture and retail stores. Leveraging recent advances in materials, actuators, and simulation tools, we have developed a cable-driven soft robotic gripper that can actuate each finger individually and dynamically change its finger configuration to adapt to a target object. In order to achieve stable grasps for a variety of objects, we modelled the gripper in a PyBullet simulation environment, ensuring fidelity of the compliant elements with the physical system through a vision-based system identification procedure. We then adapted an existing vision-based grasp planner and applied it in a reinforcement learning framework by expanding the state space to include the deformations and reconfigurations of the fingers. This allows for the exploration and discovery of grasp synergies that can handle irregularly-shaped objects, while ensuring safety through the compliant nature of the fingers. After training this framework in simulation, we aim to deploy it in the real world with closed-loop control using haptic sensors and to test it on fragile objects in a retail setting.
|
|
13:30-15:00, Paper TuBL-EX.26 | Add to My Program |
C-Arm Unleashed: Intuitive Inter-Operative Positioning of C-Arms Using Wearable Gesture Detection |
|
Ouyang, Jingyu | FAU Erlangen-Nuernberg |
Egle, Fabio Andre | Assisitive Intelligent Robotics Lab, Department AIBE, FAU Erlang |
Igney, Claudia | Siem |
Mutzke, Thomas | Siemens Healthineers |
Dahmani, Chiheb | Siemens Healthineers |
Castellini, Claudio | Friedrich-Alexander-Universität Erlangen-Nürnberg |
Thuerauf, Sabine | Friedrich-Alexander-University Erlangen-Nuremberg |
Keywords: Neurorobotics, Intention Recognition, Medical Robots and Systems
Abstract: Imaging equipment used in surgical and interventional procedures requires accurate positioning and often anticipative or very reactive intra-operative adjustment to ensure optimal clinical outcomes. The common way to control such imaging systems is via joysticks and buttons operated by the physicians themselves or, more frequently, by their radiographers (or radiologic technologists). However, in the latter case, communication between both can be easily affected by the loud and distracting environment of the operating room, the lack of experience of the radiographers, or the stressful emergency character of certain situations. Miscommunication and misunderstandings concerning the positioning of the imaging equipment can lead to suboptimal image guidance, longer procedure times, and even clinical mistakes. Different alternative concepts, where physicians are able to control the imaging systems by themselves and without assistance, have already been tested and reported. However, these approaches, including voice control, were neither robust nor safe enough to be adopted into clinical routine. We introduce a new wearable EMG- and IMU-based intuitive control concept for imaging systems, especially mobile surgical C-arms, to address the mentioned shortcomings. The developed control algorithm was evaluated in a TAC test with a simulation. Apart from the TAC test, we also assessed workload, usability, likeability, and perceived safety.
|
|
TuCA1-CC Award Session, CC-Main Hall |
Add to My Program |
Medical Robotics |
|
|
Chair: Lueth, Tim C. | Technical University of Munich |
Co-Chair: Althoefer, Kaspar | Queen Mary University of London |
|
16:30-18:00, Paper TuCA1-CC.1 | Add to My Program |
Exoskeleton-Mediated Physical Human-Human Interaction for a Sit-To-Stand Rehabilitation Task |
|
Vianello, Lorenzo | Shirley Ryan Ability Lab |
Kucuktabak, Emek Baris | Northwestern University, Shirley Ryan Ability Lab |
Short, Matthew | Northwestern University, Shirley Ryan AbilityLab |
Lhoste, Clément | Northwestern University |
Amato, Lorenzo | Scuola Superiore Sant'Anna |
Lynch, Kevin | Northwestern University |
Pons, Jose L. | Shirley Ryan AbilityLab |
Keywords: Rehabilitation Robotics, Physical Human-Robot Interaction, Prosthetics and Exoskeletons
Abstract: Sit-to-Stand (StS) is a fundamental daily activity that can be challenging for stroke survivors due to strength, motor control, and proprioception deficits in their lower limbs. Existing therapies involve repetitive StS exercises, but these can be physically demanding for therapists, while assistive devices may limit patient participation and hinder motor learning. To address these challenges, this work proposes the use of two lower-limb exoskeletons to mediate physical interaction between therapists and patients during a StS rehabilitative task. This approach offers several advantages, including improved therapist-patient interaction, safety enforcement, and performance quantification. The whole-body control of the two exoskeletons transmits online feedback between the two users while simultaneously assisting movement and ensuring balance, thereby helping subjects with greater difficulty. In this study we present the architecture of the framework and discuss key technical choices made in its design.
|
|
16:30-18:00, Paper TuCA1-CC.2 | Add to My Program |
Intraoperatively Iterative Hough Transform Based In-Plane Hybrid Control of Arterial Robotic Ultrasound for Magnetic Catheterization |
|
Li, Zhengyang | University of Macau |
Yeerbulati, Magejiang | University of Macau |
Xu, Qingsong | University of Macau |
Keywords: Compliant Joints and Mechanisms, Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems
Abstract: This paper presents an intraoperatively iterative Hough transform (IHT) based in-plane hybrid control of extracorporeal ultrasound (US) guided magnetic catheterization for arterial intervention. One unique aspect is that both control and tracking of the arterial robotic ultrasound end-effector are implemented to improve performance. Firstly, the magnetic catheter model and the hybrid visual/force servoing control scheme of the extracorporeal ultrasound-integrated tracking arm (EUTA) are derived based on the interaction Jacobian matrix and impedance modeling. Meanwhile, we implement a method for tracking the in-plane catheter tip in ultrasound and detecting vascular boundaries, utilizing an intensity-level iterative Hough transform with Iterative End-Point Fitting (IEPF). The effectiveness of the proposed control and tracking method has been verified through in vitro experimental studies of catheter steering in a soft tissue-imitating phantom. Results show that an average steering error of 0.56 mm and a signal-to-noise ratio (SNR) of 12.2 are obtained for the ultrasound imaging at high synchronization, along with a low target loss rate (15.8%) and constant-force tracking (2.50±1.02 N).
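The Hough transform at the heart of the method votes each edge pixel into a (ρ, θ) accumulator via ρ = x·cos θ + y·sin θ and reads off peaks as lines (e.g., vessel boundaries). A minimal voting sketch on a few hand-picked pixels follows; the data are toy coordinates, not ultrasound intensities, and the paper's intensity-level, iterative variant would weight votes by pixel intensity rather than counting.

```python
import math
from collections import Counter

def hough_peak(points, theta_steps=180):
    """Vote each point into a discretized (rho, theta) accumulator and
    return the strongest line: rho = x*cos(theta) + y*sin(theta)."""
    acc = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] += 1
    return acc.most_common(1)[0]

# Four pixels on the vertical line x = 2 (a stand-in for a vessel wall).
(rho, t), votes = hough_peak([(2, 0), (2, 1), (2, 2), (2, 3)])
```

All four pixels agree on the cell ρ = 2, θ = 0, so it collects the full four votes and is returned as the detected line.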
|
|
16:30-18:00, Paper TuCA1-CC.3 | Add to My Program |
Efficient Model Learning and Adaptive Tracking Control of Magnetic Micro-Robots for Non-Contact Manipulation |
|
Jia, Yongyi | Tsinghua University |
Miao, Shu | Tsinghua University |
Zhou, Junjian | Shenyang Institute of Automation, Chinese Academy of Sciences |
Jiao, Niandong | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Li, Xiang | Tsinghua University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Magnetic microrobots can be navigated by an external magnetic field to autonomously move within living organisms with complex and unstructured environments. Potential applications include drug delivery, diagnostics, and therapeutic interventions. Existing techniques commonly impart magnetic properties to the target object, or drive the robot to contact and then manipulate the object, both of which can induce physical damage. This paper considers a non-contact formulation, where the robot spins to generate a repulsive field that pushes the object without physical contact. Under such a formulation, the main challenge is that the motion model between the magnetic-field input and the output velocity of the target object is commonly unknown and difficult to analyze. To address this, this paper proposes a data-driven solution. A neural network is constructed to efficiently estimate the motion model. Then, an approximate model-based optimal control scheme is developed to push the object to track a time-varying trajectory while maintaining non-contact operation through distance constraints. Furthermore, a straightforward planner is introduced to assess the adaptability of non-contact manipulation in a cluttered unstructured environment. Experimental results are presented to show the tracking and navigation performance of the proposed scheme.
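The data-driven pipeline, learning the input-to-velocity map from interaction data and then inverting it for control, can be sketched with a lookup-table surrogate in place of the paper's neural network. The actuation map, sweep range, and target velocity below are invented for illustration.

```python
def collect_data(actuate, inputs):
    # Excite the system over a sweep of field inputs, recording the
    # resulting object velocities (the "motion model" training data).
    return [(u, actuate(u)) for u in inputs]

def control_for(data, v_des):
    # Invert the learned map: pick the input whose recorded velocity is
    # closest to the desired one (a nearest-sample stand-in for inverting
    # a trained network inside an optimal controller).
    return min(data, key=lambda sample: abs(sample[1] - v_des))[0]

# Hypothetical unknown map from field input to object velocity.
def true_map(u):
    return 0.8 * u + 0.1 * u ** 2

inputs = [i / 1000.0 - 1.0 for i in range(2001)]   # sweep u over [-1, 1]
data = collect_data(true_map, inputs)
u_cmd = control_for(data, 0.5)   # input expected to push the object at 0.5
```

Repeating the inversion at every control step against a time-varying reference velocity yields a crude trajectory-tracking loop; the paper's scheme additionally enforces the non-contact distance constraints.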
|
|
16:30-18:00, Paper TuCA1-CC.4 | Add to My Program |
Design and Implementation of a Robotized Hand-Held Dissector for Endoscopic Pulmonary Endarterectomy |
|
Zhu, Runfeng | The Hong Kong Polytechnic University |
Hou, Xilong | Hong Kong Institute of Science and Innovation Chinese Academy Of |
Huang, Wei | CAIR |
Du, Lei | Sichuan University |
Wu, Zhong | West China Hospital, Sichuan University |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Chu, Henry | The Hong Kong Polytechnic University |
Zhao, Qing xiang | Hong Kong Institute of Science & Innovation, Centre for Artifici |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Mechanism Design, Force Control
Abstract: Pulmonary endarterectomy for severe chronic disease requires a dissector to delicately remove proliferative intima located deep within the pulmonary artery. This work proposes a novel endoscopic robotized steerable dissector for this surgery, enabling easier access to curved, deep artery branches. The handheld surgical dissector also provides suction and visualization for surgeons to enhance effectiveness. The steerable section is a cable-driven hinged structure; through an antagonistic mechanism regulating the cable tension, the overall stiffness is adjusted to adapt to various surroundings. The mapping between actuation space and shape configuration, together with a tip force estimation model, is established for a closed-loop control scheme, achieving adaptive positioning and safe surgery. Experiments first demonstrate the feasibility of the proposed models, and ex vivo trials validate the usability and effectiveness of the robotized dissector.
|
|
16:30-18:00, Paper TuCA1-CC.5 | Add to My Program |
Colibri5: Real-Time Monocular 5-DoF Trocar Pose Tracking for Robot-Assisted Vitreoretinal Surgery |
|
Dehghani, Shervin | TUM |
Sommersperger, Michael | Technical University of Munich |
Saleh, Mahdi | Technical University Munich |
Alikhani, Alireza | Augen Klinik Und Poliklinik, Klinikum Rechts Der Isar Der Techn |
Busam, Benjamin | Technical University of Munich |
Gehlbach, Peter | Johns Hopkins Medical Institute |
Iordachita, Ioan Iulian | Johns Hopkins University |
Navab, Nassir | TU Munich |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Keywords: Computer Vision for Medical Robotics, Vision-Based Navigation, Visual Tracking
Abstract: Retinal surgery is a complex medical procedure that requires high-precision dexterity to perform delicate instrument maneuvers with sub-millimeter accuracy. Minimizing manual tremor and achieving precise, repeatable execution of surgical tasks have motivated the development of robotic platforms to overcome the limitations of manual surgery. However, specific tasks, such as instrument insertion through the trocar, are more challenging in robotic surgery than in conventional manual procedures, since the robot control is often optimized for navigation inside the eye. This challenges the integration of robotic systems, creating a high cognitive load on the operator and prolonging the surgery time. Moreover, misalignment of the robot's remote center of motion (RCM) and the trocar position during the procedure can lead to excessive forces between the instrument and the trocar, potentially causing patient trauma. Precise and rapid localization of the trocars enables the automation of the insertion procedure and dynamic compensation of eye motion. In this work, we present a real-time marker-less method for 3D pose tracking of the trocar, achieved with only a single monocular camera. Our experiments show promising results towards real-time trocar pose estimation and tracking, achieving an average error of 3 degrees in trocar orientation estimation at an average processing rate of 15 fps. This could serve as a foundation to improve the automation, integration, and efficiency of robotic systems for retinal surgery. The dataset created for this work is made publicly available.
|
|
16:30-18:00, Paper TuCA1-CC.6 | Add to My Program |
Hybrid Volitional Control of a Robotic Transtibial Prosthesis Using a Phase Variable Impedance Controller |
|
Posh, Ryan | University of Notre Dame |
Tittle, Jonathan Allen | University of Notre Dame |
Kelly, David | University of Notre Dame |
Schmiedeler, James | University of Notre Dame |
Wensing, Patrick M. | University of Notre Dame |
Keywords: Prosthetics and Exoskeletons
Abstract: For robotic transtibial prosthesis control, the global tibia kinematics can be used to monitor gait cycle progression and command smooth and continuous actuation. In this work, these global tibia kinematics define a phase variable impedance controller (PVIC), which is implemented as the non-volitional base controller within a hybrid volitional control framework (PVI-HVC). The gait progression estimation and biomechanic performance of one able-bodied individual walking on a robotic ankle prosthesis via a bypass adapter are compared for three control schemes: a benchmark passive controller, PVIC, and PVI-HVC. The different actuation of each had a direct effect on the global tibia kinematics, but the average deviation between the estimated and ground-truth gait percentages was 1.6%, 1.8%, and 2.1%, respectively, for each controller. Both PVIC and PVI-HVC produced good agreement with able-bodied kinematic and kinetic references. As designed, PVI-HVC results were similar to those of PVIC when the user exerted low volitional intent, but yielded higher peak plantarflexion, peak torque, and peak power when the user commanded high volitional input in late stance. This additional torque and power also allowed the user to volitionally and continuously achieve activities beyond level walking, such as ascending ramps, avoiding obstacles, standing on tip-toes, and tapping the foot. In this way, PVI-HVC offers the kinetic and kinematic performance of PVIC during level-ground walking, along with the freedom to volitionally pursue alternative activities.
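As a generic illustration of the phase-variable-impedance idea, one can index stiffness, damping, and equilibrium angle by a phase estimated from the tibia state. The polar-angle phase construction is a common choice in the phase-variable literature, and every gain and schedule below is an assumption, not the authors' identified parameters.

```python
import math

def gait_phase(tibia_angle: float, tibia_rate: float, c: float = 0.1) -> float:
    # Monotonic phase estimate in [0, 1): polar angle of the (angle,
    # scaled rate) point, a common phase-variable construction.
    return math.atan2(-c * tibia_rate, tibia_angle) % (2 * math.pi) / (2 * math.pi)

def impedance_torque(theta: float, theta_dot: float, phase: float) -> float:
    # Phase-indexed impedance law:
    #   tau = k(phi) * (theta_eq(phi) - theta) - b * theta_dot
    k = 3.0 + 2.0 * math.sin(2 * math.pi * phase)    # stiffness schedule (illustrative)
    b = 0.1                                          # damping (illustrative)
    theta_eq = 0.2 * math.cos(2 * math.pi * phase)   # equilibrium-angle schedule
    return k * (theta_eq - theta) - b * theta_dot
```

Because torque is a continuous function of the phase variable rather than of discrete gait events, actuation stays smooth across the stride, which is the property the abstract attributes to PVIC.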
|
|
TuCA2-CC Award Session, CC-301 |
Add to My Program |
Multi-Robot Systems |
|
|
Chair: Kelly, Jonathan | University of Toronto |
Co-Chair: Sabattini, Lorenzo | University of Modena and Reggio Emilia |
|
16:30-18:00, Paper TuCA2-CC.1 | Add to My Program |
Do We Run Large-Scale Multi-Robot Systems on the Edge? More Evidence for Two-Phase Performance in System Size Scaling |
|
Kuckling, Jonas | University of Konstanz |
Luckey, Robin | Institute of Computer Engineering, University of Lübeck |
Avrutin, Viktor | Institute for Systems Theory and Automatic Control, University O |
Vardy, Andrew | Memorial University of Newfoundland |
Reina, Andreagiovanni | Université Libre De Bruxelles |
Hamann, Heiko | University of Konstanz |
Keywords: Swarm Robotics, Multi-Robot Systems
Abstract: With increasing numbers of mobile robots arriving in real-world applications, more robots coexist in the same space, interact, and possibly collaborate. Methods to provide such systems with system-size scalability are known, for example, from swarm robotics. Example strategies are self-organizing behavior, a strictly decentralized approach, and limiting robot-robot communication. Despite applying such strategies, any multi-robot system breaks down above a certain critical system size (i.e., number of robots), as too many robots share a resource (e.g., space, a communication channel). We provide additional evidence, based on simulations, that at these critical system sizes the system performance separates into two phases: nearly optimal and minimal performance. We speculate that in real-world applications configured for optimal system size, the supposedly high-performing system may actually live on borrowed time, as it is on a transient to breakdown. We provide two modeling options (based on queueing theory and a population model) that may help to support this reasoning.
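The queueing-theory modeling option the authors mention can be illustrated with the textbook M/M/1 mean sojourn time, W = 1/(μ − λ): as the number of robots n pushes the aggregate arrival rate λ = n·λ₀ toward the service rate μ of a shared resource, waiting time diverges, producing a sharp two-phase transition at a critical system size. The rates below are illustrative, not taken from the paper's simulations.

```python
def mm1_sojourn(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system, W = 1 / (mu - lambda).

    Diverges as utilization approaches 1: the shared resource (space,
    communication channel) saturates and performance collapses.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # unstable queue: past the critical system size
    return 1.0 / (service_rate - arrival_rate)

# One shared channel serving 10 requests/s; each robot generates 1 request/s.
waits = {n: mm1_sojourn(n * 1.0, 10.0) for n in (5, 9, 11)}
```

Below saturation (n = 5) the wait is short and grows mildly; just below the critical size (n = 9) it is already five times longer; past it (n = 11) the queue is unstable, mirroring the near-optimal/minimal two-phase picture.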
|
|
16:30-18:00, Paper TuCA2-CC.2 | Add to My Program |
Learning for Dynamic Subteaming and Voluntary Waiting in Heterogeneous Multi-Robot Collaborative Scheduling |
|
Jose, Williard Joshua | University of Massachusetts Amherst |
Zhang, Hao | University of Massachusetts Amherst |
Keywords: Multi-Robot Systems, Imitation Learning
Abstract: Coordinating heterogeneous robots is essential for autonomous multi-robot teaming. To execute a set of dependent tasks as quickly as possible, and to complete tasks that cannot be addressed by individual robots, it is necessary to form subteams that can collaboratively finish the tasks. It is also advantageous for robots to wait for teammates and tasks to become available in order to form better subteams or reduce the overall completion time. To enable both abilities, we introduce a new graph learning approach that formulates heterogeneous collaborative scheduling as a bipartite matching problem that maximizes a reward matrix learned via imitation learning. We design a novel graph attention transformer network (GATN) that represents the problem of collaborative scheduling as a bipartite graph, and integrates both local and global graph information to estimate the reward matrix using graph attention networks and transformers. By relaxing the constraint of one-to-one correspondence in bipartite matching, our approach allows multiple robots to address the same task as a subteam. Our approach also enables voluntary waiting by introducing an idle task that the robots can select to wait. Experimental results have shown that our approach well addresses heterogeneous collaborative scheduling with dynamic subteam formation and voluntary waiting, and outperforms the previous and baseline methods.
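The core matching step can be sketched as maximizing the total entry sum of a reward matrix over robot-task assignments. This toy version searches exhaustively (fine for small teams; a polynomial-time matcher would be used in practice) and models voluntary waiting with an extra idle column; the matrix values are invented, not learned GATN outputs.

```python
from itertools import permutations

def best_assignment(reward):
    """Bipartite matching that maximizes total reward (exhaustive search).

    reward[i][j]: learned score for robot i taking task j.
    Returns (assignment tuple, total reward).
    """
    n = len(reward)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(reward[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm, best_score

# Rows: robots; columns: two real tasks plus an "idle" task for waiting.
R = [[5.0, 1.0, 0.5],
     [2.0, 4.0, 0.5],
     [1.0, 1.0, 3.0]]
match, total = best_assignment(R)   # robot 2 takes the idle column: it waits
```

Relaxing the one-to-one correspondence, as the paper does, would additionally let several robots select the same task column to form a subteam; the sketch keeps the strict matching for brevity.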
|
|
16:30-18:00, Paper TuCA2-CC.3 | Add to My Program |
Asynchronous Distributed Smoothing and Mapping Via On-Manifold Consensus ADMM |
|
McGann, Daniel | Carnegie Mellon University |
Lassak, Kyle | Astrobotic Technology, Inc |
Kaess, Michael | Carnegie Mellon University |
Keywords: Multi-Robot SLAM, SLAM, Distributed Robot Systems
Abstract: In this paper we present a fully distributed, asynchronous, and general purpose optimization algorithm for Consensus Simultaneous Localization and Mapping (CSLAM). Multi-robot teams require that agents have timely and accurate solutions to their state as well as the states of the other robots in the team. To optimize this solution we develop a CSLAM back-end based on Consensus ADMM called MESA (Manifold, Edge-based, Separable ADMM). MESA is fully distributed to tolerate failures of individual robots, asynchronous to tolerate communication delays and outages, and general purpose to handle any CSLAM problem formulation. We demonstrate that MESA exhibits superior convergence rates and accuracy compared to existing state-of-the-art CSLAM back-end optimizers.
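The consensus ADMM pattern underlying a back-end like this can be illustrated on a toy scalar problem. This sketch is not MESA (which operates on manifolds and factor graphs); it only shows the local-update / consensus-update / dual-update loop, with each agent holding a private cost.

```python
import numpy as np

# Toy consensus ADMM: N agents each hold a private value a_i and jointly
# minimize sum_i (x - a_i)^2. The optimum is the mean of the a_i.
a = np.array([1.0, 2.0, 4.0, 9.0])
rho = 1.0
x = np.zeros_like(a)   # local copies, one per agent
u = np.zeros_like(a)   # scaled dual variables
z = 0.0                # consensus variable

for _ in range(100):
    # Local x-updates: closed form for quadratic local costs
    # argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2.
    x = (2.0 * a + rho * (z - u)) / (2.0 + rho)
    # Consensus update and dual ascent.
    z = np.mean(x + u)
    u = u + x - z

print(z)  # converges to a.mean() == 4.0
```

In a distributed deployment each x-update runs on its own robot, and only (x_i + u_i) needs to be communicated, which is what makes the pattern tolerant of per-robot scheduling.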
|
|
16:30-18:00, Paper TuCA2-CC.4 | Add to My Program |
Uncertainty-Bounded Active Monitoring of Unknown Dynamic Targets in Road-Networks with Minimum Fleet |
|
Wang, Shuaikang | Peking University |
Kantaros, Yiannis | Washington University in St. Louis |
Guo, Meng | Peking University |
Keywords: Multi-Robot Systems, Task and Motion Planning, Integrated Planning and Control
Abstract: Fleets of unmanned robots can be beneficial for the long-term monitoring of large areas, e.g., to monitor wild flocks, detect intruders, or perform search and rescue. Monitoring numerous dynamic targets in a collaborative and efficient way is a challenging problem that requires online coordination and information fusion. The majority of existing works either assume a passive all-to-all observation model to minimize the summed uncertainties over all targets by all robots, or optimize over the joint discrete actions while neglecting the dynamic constraints of the robots and the unknown behaviors of the targets. This work proposes an online task and motion coordination algorithm that ensures an explicitly bounded estimation uncertainty for the target states, while minimizing the average number of active robots. The robots have limited-range perception and can actively track only a limited number of targets simultaneously, whose future control decisions are all unknown. The algorithm includes: (i) the assignment of monitoring tasks, modeled as a flexible-size multiple vehicle routing problem with time windows (m-MVRPTW), given the predicted target trajectories with uncertainty measures in the road network; and (ii) nonlinear model predictive control (NMPC) for optimizing the robot trajectories under uncertainty and safety constraints. It is shown that the robots can switch between active and inactive roles dynamically online as required by the unknown monitoring task. The proposed methods are validated via large-scale simulations of up to 100 robots and targets.
|
|
16:30-18:00, Paper TuCA2-CC.5 | Add to My Program |
Observer-Based Distributed MPC for Collaborative Quadrotor-Quadruped Manipulation of a Cable-Towed Load |
|
Xu, Shaohang | Huazhong University of Science and Technology |
Wang, Yi'an | Huazhong University of Science and Technology |
Zhang, Wentao | Huazhong University of Science and Technology |
Ho, Chin Pang | City University of Hong Kong |
Zhu, Lijun | Huazhong University of Science and Technology |
Keywords: Distributed Robot Systems, Swarm Robotics, Legged Robots
Abstract: This paper presents a collaborative quadrotor-quadruped robot system for the manipulation of a cable-towed payload. In particular, we aim to address the challenge posed by the unknown dynamics of the cable-towed payload. To this end, we first propose novel dynamic models for both the quadrotor and the quadruped robot, taking into account the nonlinear robot dynamics and the uncertainties associated with the cable-towed load. Moreover, we design observers for the hybrid interaction between the robots and the payload. Theoretically, the convergence of these observers is analyzed using Lyapunov functions under mild technical assumptions. Finally, we seamlessly integrate the dynamic models and the observers into a distributed Model Predictive Control (MPC) framework with kinematic limitations and collision avoidance constraints. The proposed system is validated through challenging field experiments in indoor and outdoor environments, involving push disturbances, varying and unknown payloads, uneven terrains, etc.
|
|
TuCT1-CC Oral Session, CC-303 |
Add to My Program |
Planning under Uncertainty III |
|
|
Chair: Montijano, Eduardo | Universidad De Zaragoza |
Co-Chair: Ishigami, Genya | Keio University |
|
16:30-18:00, Paper TuCT1-CC.1 | Add to My Program |
Multi-Sample Long Range Path Planning under Sensing Uncertainty for Off-Road Autonomous Driving |
|
Schmittle, Matt | University of Washington |
Baijal, Rohan | University of Washington |
Hou, Brian | University of Washington |
Srinivasa, Siddhartha | University of Washington |
Boots, Byron | University of Washington |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. This makes the planning problem tractable, but the cost of following the planned path in the real world may be different than in the determinized world. This causes collisions if the determinized world optimistically ignores obstacles, or causes unnecessarily long routes if the determinized world pessimistically imagines more obstacles. We aim to be robust to uncertainty over potential worlds while still achieving the efficiency benefits of determinization. We evaluate algorithms for dynamic replanning on a large real-world dataset of challenging long-range planning problems from the DARPA RACER program. Our method, Dynamic Replanning via Evaluating and Aggregating Multiple Samples (DREAMS), outperforms other determinization-based approaches in terms of combined traversal time and collision cost. https://sites.google.com/cs.washington.edu/dreams/
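The contrast between single-world determinization and evaluating over multiple sampled worlds can be sketched in miniature. Everything here is illustrative (the corridor, the costs, and the sampling count are invented): each candidate path is scored by its average cost across worlds sampled from per-cell obstacle probabilities, rather than against one determinized map.

```python
import numpy as np

# Per-cell obstacle probabilities on a tiny 1-D corridor, and two
# candidate paths given as cell-index sequences (illustrative values).
rng = np.random.default_rng(1)
p_obstacle = np.array([0.05, 0.9, 0.05, 0.05, 0.05])
paths = {"through": [0, 1, 2], "around": [0, 3, 4, 2]}

COLLISION_COST = 100.0
STEP_COST = 1.0
K = 200  # number of sampled worlds

def aggregated_cost(path):
    # Sample K binary worlds, then average the path's cost across them.
    worlds = rng.random((K, len(p_obstacle))) < p_obstacle
    costs = []
    for w in worlds:
        c = STEP_COST * len(path)
        if any(w[cell] for cell in path):
            c += COLLISION_COST
        costs.append(c)
    return float(np.mean(costs))

best = min(paths, key=lambda name: aggregated_cost(paths[name]))
print(best)  # the detour avoids the likely obstacle
```

An optimistic determinization (thresholding p_obstacle at 0.95, say) would pick the short "through" path and collide most of the time; the aggregate over samples prefers the slightly longer detour.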
|
|
16:30-18:00, Paper TuCT1-CC.2 | Add to My Program |
Perceptual Factors for Environmental Modeling in Robotic Active Perception |
|
Morilla-Cabello, David | Universidad De Zaragoza |
Westheider, Jonas | University of Bonn |
Popovic, Marija | University of Bonn |
Montijano, Eduardo | Universidad De Zaragoza |
Keywords: Planning under Uncertainty, Reactive and Sensor-Based Planning, Semantic Scene Understanding
Abstract: Accurately assessing the potential value of new sensor observations is a critical aspect of planning for active perception. This task is particularly challenging when reasoning about high-level scene understanding using measurements from vision-based neural networks. Due to appearance-based reasoning, the measurements are susceptible to several environmental effects such as the presence of occluders, variations in lighting conditions, and redundancy of information due to similarity in appearance between nearby viewpoints. To address this, we propose a new active perception framework incorporating an arbitrary number of perceptual effects in planning and fusion. Our method models the correlation with the environment by a set of general functions termed perceptual factors to construct a perceptual map, which quantifies the aggregated influence of the environment on candidate viewpoints. This information is seamlessly incorporated into the planning and fusion processes by adjusting the uncertainty associated with measurements to weigh their contributions. We evaluate our perceptual maps in a simulated environment that reproduces environmental conditions common in robotics applications. Our results show that, by accounting for environmental effects within our perceptual maps, we improve state estimation by selecting better viewpoints and by correctly weighting measurement noise affected by environmental factors. We furthermore deploy our approach on a ground robot to showcase its applicability to real-world active perception missions.
|
|
16:30-18:00, Paper TuCT1-CC.3 | Add to My Program |
Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment |
|
Puthumanaillam, Gokul | University of Illinois Urbana-Champaign |
Liu, Xiangyu | University of Cyprus |
Mehr, Negar | University of California Berkeley |
Ornik, Melkior | University of Illinois Urbana-Champaign |
Keywords: Planning under Uncertainty, Planning, Scheduling and Coordination, Autonomous Agents
Abstract: Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring a partially observable, time-varying environment. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
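The idea of weighted memory for time-varying transition estimates can be sketched with a simple recency weighting. This is an illustrative stand-in for MPSE, not the paper's estimator: each stored transition is discounted by how long ago it was observed, so recent experience dominates the estimate of a drifting transition probability.

```python
from collections import defaultdict

# Recency-weighted estimate of a time-varying transition distribution
# p_t(s' | s, a): an observation from time t carries weight GAMMA**(t_now - t).
GAMMA = 0.9

def estimate(memory, s, a, t_now):
    """memory: list of (t, s, a, s_next) tuples; returns p(s_next | s, a)."""
    weights = defaultdict(float)
    for (t, ms, ma, ms_next) in memory:
        if (ms, ma) == (s, a):
            weights[ms_next] += GAMMA ** (t_now - t)
    total = sum(weights.values())
    return {sn: w / total for sn, w in weights.items()} if total else {}

# The environment drifts: early on (s0, a) led to s1; lately it leads to s2.
memory = [(0, "s0", "a", "s1"), (1, "s0", "a", "s1"),
          (8, "s0", "a", "s2"), (9, "s0", "a", "s2")]
probs = estimate(memory, "s0", "a", t_now=10)
print(probs)  # s2 dominates because its observations are recent
```

An unweighted count-based estimate would give s1 and s2 equal probability here; the recency weighting is what lets the estimator track the drift.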
|
|
16:30-18:00, Paper TuCT1-CC.4 | Add to My Program |
Choosing the Right Tool for the Job: Online Decision Making Over SLAM Algorithms |
|
Nashed, Samer | University of Massachusetts Amherst |
Grupen, Rod | University of Massachusetts |
Zilberstein, Shlomo | University of Massachusetts |
Keywords: Planning under Uncertainty, SLAM, Localization
Abstract: Nearly all state-of-the-art SLAM algorithms are designed to exploit patterns in data from specific sensing modalities, such as time-of-flight and structured light depth sensors, or RGB cameras. This specialization increases localization accuracy in domains where the given modality detects many high-quality features, but comes at the cost of decreasing performance in other, less favorable environments. For robotic systems that may experience a wide variety of sensing conditions, this difficulty in generalization presents a significant challenge. In this paper, we propose running several computationally cheap SLAM front ends in parallel and choosing the most promising feature set online. This problem is similar to the Algorithm Selection Problem (ASP), but has several complicating factors that preclude application of existing methods. We first provide an extension of the ASP formalism that captures the unique challenges in the SLAM setting, and then, based on this formalism, we propose modeling the SLAM ASP as a partially observable Markov decision process (POMDP). Our experiments show that dynamically selecting SLAM front ends, even myopically, improves localization robustness compared to selecting a static front end, and that using a POMDP policy provides even greater improvement.
|
|
16:30-18:00, Paper TuCT1-CC.5 | Add to My Program |
ASPIRe: An Informative Trajectory Planner with Mutual Information Approximation for Target Search and Tracking |
|
Zhou, Kangjie | Peking University |
Wu, Pengying | Peking University |
Su, Yao | Beijing Institute for General Artificial Intelligence |
Gao, Han | Peking University |
Ma, Ji | Peking University |
Liu, Hangxin | Beijing Institute for General Artificial Intelligence (BIGAI) |
Liu, Chang | Peking University |
Keywords: Planning under Uncertainty, Motion and Path Planning
Abstract: This paper proposes an informative trajectory planning approach, namely, adaptive particle filter tree with sigma point-based mutual information reward approximation (ASPIRe), for mobile target search and tracking (SAT) in cluttered environments with limited sensing field of view. We develop a novel sigma point-based approximation to accurately estimate mutual information (MI) for general, non-Gaussian distributions utilizing particle representation of the belief state, while simultaneously maintaining high computational efficiency. Building upon the MI approximation, we develop the Adaptive Particle Filter Tree (APFT) approach with MI as the reward, which features belief state tree nodes for informative trajectory planning in continuous state and measurement spaces. An adaptive criterion is proposed in APFT to adjust the planning horizon based on the expected information gain. Simulations and physical experiments demonstrate that ASPIRe achieves real-time computation and outperforms benchmark methods in terms of both search efficiency and estimation accuracy.
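The deterministic sigma points at the heart of such an approximation can be sketched with the standard unscented transform applied to a particle set. This is an illustrative stand-in (standard unscented weights, a moment-matched Gaussian fit), not the paper's MI estimator itself.

```python
import numpy as np

def sigma_points(particles, kappa=1.0):
    """Fit a moment-matched Gaussian to the particles, then place the
    standard 2n+1 unscented-transform sigma points and weights."""
    n = particles.shape[1]
    mean = particles.mean(axis=0)
    cov = np.cov(particles, rowvar=False)
    # Matrix square root of (n + kappa) * cov via Cholesky.
    L = np.linalg.cholesky((n + kappa) * cov)
    pts = [mean]
    for i in range(n):
        pts.append(mean + L[:, i])
        pts.append(mean - L[:, i])
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    weights = np.array([w0] + [wi] * (2 * n))
    return np.array(pts), weights

rng = np.random.default_rng(2)
particles = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.3], [0.3, 2.0]], 5000)
pts, w = sigma_points(particles)
print(pts.shape, w.sum())  # 2n+1 = 5 points in 2-D, weights summing to 1
```

By construction the weighted sigma points reproduce the particle mean and covariance exactly, which is why a handful of them can replace thousands of particles when evaluating an information reward inside a planning tree.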
|
|
16:30-18:00, Paper TuCT1-CC.6 | Add to My Program |
Preprocessing-Based Planning for Utilizing Contacts in Semi-Structured High-Precision Insertion Tasks |
|
Saleem, Muhammad Suhail | Carnegie Mellon University |
Veerapaneni, Rishi | Carnegie Mellon University |
Likhachev, Maxim | Carnegie Mellon University |
Keywords: Planning under Uncertainty, Manipulation Planning
Abstract: In manipulation tasks like plug insertion or assembly that have low tolerance to errors in pose estimation (errors of the order of 2mm cause task failure), the utilization of touch/contact modality can aid in accurately localizing the object of interest. Motivated by this, in this work we model high-precision insertion tasks as planning problems under pose uncertainty, where we effectively utilize the occurrence of contacts (or the lack thereof) as observations to reduce uncertainty and reliably complete the task. We present a preprocessing-based planning framework for high-precision insertion in repetitive and time-critical settings, where the set of initial pose distributions (identified by a perception system) is finite. The finite set allows us to enumerate the possible planning problems that can be encountered online and preprocess a database of policies. Due to the computational complexity of constructing this database, we propose a general experience-based POMDP solver, E-RTDP-Bel, that uses the solutions of similar planning problems as experience to speed up planning queries and use it to efficiently construct the database. We show that the developed algorithm speeds up database creation by over a factor of 100, making the process computationally tractable. We demonstrate the effectiveness of the proposed framework in a real-world plug insertion task in the presence of port position uncertainty and an assembly task in simulation in the presence of pose uncertainty.
|
|
16:30-18:00, Paper TuCT1-CC.7 | Add to My Program |
Vision-Based Uncertainty-Aware Motion Planning Based on Probabilistic Semantic Segmentation |
|
Römer, Ralf | Technical University of Munich |
Lederer, Armin | Technical University of Munich |
Tesfazgi, Samuel | Technical University of Munich |
Hirche, Sandra | Technische Universität München |
Keywords: Planning under Uncertainty, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: For safe operation, a robot must be able to avoid collisions in uncertain environments. Existing approaches for motion planning under uncertainties often assume parametric obstacle representations and Gaussian uncertainty, which can be inaccurate or result in trajectories with excessive cost. While visual perception can deliver a more accurate representation of the environment, its use for safe motion planning is limited by the inherent miscalibration of neural networks and the challenge of obtaining adequate datasets. To address these limitations, we propose to employ ensembles of deep semantic segmentation networks trained with massively augmented datasets to ensure reliable probabilistic occupancy information. For avoiding conservatism during motion planning, we directly employ the probabilistic perception in a scenario-based path planning approach. A velocity scheduling scheme is applied to the path to ensure a safe motion despite tracking inaccuracies. We demonstrate the effectiveness of the massive data augmentation in combination with deep ensembles and the proposed scenario-based planning approach in comparisons to state-of-the-art methods and validate our framework in an experiment with a human hand as obstacle.
|
|
16:30-18:00, Paper TuCT1-CC.8 | Add to My Program |
Chance-Constrained Multi-Robot Motion Planning under Gaussian Uncertainties |
|
Theurkauf, Anne | University of Colorado Boulder |
Kottinger, Justin | University of Colorado Boulder |
Ahmed, Nisar | University of Colorado Boulder |
Lahijanian, Morteza | University of Colorado Boulder |
Keywords: Motion and Path Planning, Multi-Robot Systems, Planning under Uncertainty
Abstract: We consider a chance-constrained multi-robot motion planning problem in the presence of Gaussian motion and sensor noise. Our proposed algorithm, CC-K-CBS, leverages the scalability of kinodynamic conflict-based search (K-CBS) in conjunction with the efficiency of Gaussian belief trees as used in the Belief-A framework, and inherits the completeness guarantees of Belief-A's low-level sampling-based planner. We also develop three different methods for robot-robot probabilistic collision checking, which trade off computation with accuracy. Our algorithm generates motion plans driving each robot from its initial to goal state while accounting for uncertainty evolution with chance-constrained safety guarantees. Benchmarks compare computation time to conservatism of the collision checkers, in addition to characterizing the performance of the planner as a whole. Results show that CC-K-CBS scales up to 30 robots.
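A baseline robot-robot probabilistic collision check of the kind compared in this abstract can be sketched via Monte Carlo sampling: the relative position of two robots with Gaussian position uncertainty is itself Gaussian, and collision probability is the mass of that Gaussian inside a disk. This is an illustrative baseline, not one of the paper's three checkers.

```python
import numpy as np

def collision_probability(mu1, cov1, mu2, cov2, radius_sum, n=100_000, seed=0):
    """Monte Carlo estimate of P(||p1 - p2|| < radius_sum) for two
    disk robots with Gaussian-distributed positions."""
    rng = np.random.default_rng(seed)
    # Relative position: mean mu1 - mu2, covariance cov1 + cov2.
    d = rng.multivariate_normal(np.subtract(mu1, mu2),
                                np.add(cov1, cov2), size=n)
    return float(np.mean(np.linalg.norm(d, axis=1) < radius_sum))

cov = 0.01 * np.eye(2)  # illustrative position covariance for each robot
p_far = collision_probability([0, 0], cov, [1.0, 0], cov, radius_sum=0.3)
p_near = collision_probability([0, 0], cov, [0.2, 0], cov, radius_sum=0.3)
print(p_far, p_near)  # far pair ~0, near pair substantially likely to collide
```

A chance-constrained planner would reject any joint state where this probability exceeds the allowed risk bound; the trade-off the paper benchmarks is between such sampling-based accuracy and cheaper, more conservative bounds.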
|
|
16:30-18:00, Paper TuCT1-CC.9 | Add to My Program |
Uncertainty-Aware Trajectory Planning: Using Uncertainty Quantification and Propagation in Traversability Prediction of Planetary Rovers |
|
Takemura, Reiya | Keio University |
Ishigami, Genya | Keio University |
Keywords: Motion and Path Planning, Space Robotics and Automation, Wheeled Robots
Abstract: Motion planning for a planetary rover involves assessing robotic traversability so that the rover can travel safely without mobility hazards. While conventional planners have primarily assessed a rover traversability index with predetermined threshold values, the traversability index cannot be precisely predicted because of measurement uncertainty in onboard mapping and motion uncertainty. This study presents an uncertainty-aware trajectory planning algorithm for the rover on rough and loose terrains. The planning algorithm involves new metrics that quantify heteroscedastic uncertainties in the rover traversability prediction model, which are dependent on terrain characteristics and the robot's state and control. Further, uncertainty propagation extends the uncertainty metrics to explicitly consider the growth of uncertainty over time steps. The uncertainty metrics are used to assess tree extensions of the sampling-based search algorithm, enabling the trajectory planner to avoid the unexpected risk of vehicle rollover and extremely high slip. A simulation study confirms that the proposed algorithm achieves up to a 20% reduction in the probability of mobility hazards on realistic, challenging terrains.
|
|
TuCT2-CC Oral Session, CC-311 |
Add to My Program |
Joint Mechanism |
|
|
Chair: Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Co-Chair: Sadeghian, Hamid | Technical University of Munich |
|
16:30-18:00, Paper TuCT2-CC.1 | Add to My Program |
Perching and Grasping Using a Passive Dynamic Bioinspired Gripper |
|
Firouzeh, Amir | EPFL |
Lee, Jongeun | Seoul National University |
Yang, Hyunsoo | Seoul National University |
Lee, Dongjun | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Compliant Joint/Mechanism, Grippers and Other End-Effectors, Underactuated Robots, Aerial Systems: Mechanics and Control
Abstract: The ability to grasp objects broadens the application range of unmanned aerial vehicles (UAVs) by allowing interactions with the environment. The difficulty in performing a mid-air grasp is the high probability of impact between the UAV’s foot and the target. For a successful grasp, the foot must smoothly absorb the energy of impact and simultaneously engage with the target in a short period of time. We present a bioinspired passive dynamic foot in which the claws are actuated solely by the impact energy. Our gripper simultaneously resolves the issue of smooth absorption of the impact energy and fast closure of the claws by linking the motion of an ankle linkage and the claws through soft tendons. We study the dynamics of impact and use the stiffness of the tendon as our design/control parameter to adjust the mechanics of the gripper for smooth recycling of the impact energy. Our gripper closes within 45 milliseconds after initial contact with the impacting object without requiring any controller or actuation energy. An electro-adhesive locking mechanism attached to the tendon locks the claws within 20 milliseconds after reaching the closed configuration. We demonstrated the effectiveness of the proposed gripper.
|
|
16:30-18:00, Paper TuCT2-CC.2 | Add to My Program |
Self-Sensing Feedback Control of an Electrohydraulic Robotic Shoulder |
|
Christoph, Clemens Claudio | ETH Zürich |
Kazemipour, Amirhossein | ETH Zürich |
Vogt, Michel Ryan | ETH Zürich |
Zhang, Yu | ETH Zurich |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Compliant Joints and Mechanisms, Biologically-Inspired Robots, Motion Control
Abstract: The human shoulder, with its glenohumeral joint, tendons, ligaments, and muscles, allows for the execution of complex tasks with precision and efficiency. However, current robotic shoulder designs lack the compliance and compactness inherent in their biological counterparts. A major limitation of these designs is their reliance on external sensors like rotary encoders, which restrict mechanical joint design and introduce bulk to the system. To address this constraint, we present a bio-inspired antagonistic robotic shoulder with two degrees of freedom powered by self-sensing hydraulically amplified self-healing electrostatic actuators. Our artificial muscle design decouples the high-voltage electrostatic actuation from the pair of low-voltage self-sensing electrodes. This approach allows for proprioceptive feedback control of trajectories in the task space while eliminating the necessity for any additional sensors. We assess the platform's efficacy by comparing it to a feedback control based on position data provided by a motion capture system. The study demonstrates closed-loop controllable robotic manipulators based on an inherent self-sensing capability of electrohydraulic actuators. The proposed architecture can serve as a basis for complex musculoskeletal joint arrangements.
|
|
16:30-18:00, Paper TuCT2-CC.3 | Add to My Program |
Design and Validation of a Variable Stiffness Spiral Cam Actuator |
|
Auer, Matthew | Arizona State University |
Joglekar, Suhrud Parag | Arizona State University |
Lee, Hyunglae | Arizona State University |
Keywords: Compliant Joints and Mechanisms, Compliance and Impedance Control
Abstract: This study presents the design and validation of a variable stiffness actuator incorporating multiple cam mechanisms. The actuator is intended for use in walking assistance, focusing on assisting individuals with diminished ankle function. This study highlights the advantages of variable stiffness actuators over traditional and other modern actuators in mobility assistance. The working principles of the proposed Variable Stiffness Spiral Cam Actuator (VS-SCA) are described, focusing on the cantilever beams with adjustable supports, main cam mechanism, and symmetric support positioning architecture utilizing an Archimedean spiral cam. The design and fabrication process are discussed, considering system design considerations, cantilever beam design, cam design, and spiral cam design. The analytical methodology used for validation is also presented, which connects the subsystems of the actuator and allows for the determination of effective torsional stiffness. The experimental validation showed that the VS-SCA provides a range of stiffness from 20 to 75 Nm/rad for dorsiflexion, necessary for providing ankle assistance during the push-off phase of walking, while maintaining low stiffness (4 - 12 Nm/rad) for plantarflexion not to hinder natural ankle motion in the swing phase.
|
|
16:30-18:00, Paper TuCT2-CC.4 | Add to My Program |
Hybrid Force-Position Control of an Elastic Tendon-Driven Scrubbing Robot (TEDSR) |
|
Harmatz, Noah | Rutgers University |
Zahra, Alina | Rutgers University |
Abdelmalak, Amir | Rutgers |
Purohit, Shivam | Rutgers University |
Shin, Trevor | Rutgers University |
Mazzeo, Aaron | Rutgers University |
Keywords: Compliant Joints and Mechanisms, Tendon/Wire Mechanism, Soft Sensors and Actuators
Abstract: There is a lack of cleaning robots dedicated to the scrubbing of contaminated surfaces. Contaminated surfaces in domestic and industrial settings typically require manual scrubbing, which can be costly or hazardous. To address the opportunity to automate the scrubbing of surfaces, this work focuses on the use of series elastic actuators, which can apply consistent trajectories of scrubbing force. Consistent force during scrubbing increases the rate of removal of a contaminant. An elastic robot with rigid links and low-stiffness joints can perform friction-based cleaning of surfaces with complex geometries while maintaining consistent scrubbing force. This study uses a hybrid force-position control scheme and a low-cost elastic robot to perform scrubbing, and observes the relationship between joint stiffness in the robot and disturbance rejection for force-based control during scrubbing. There is growing demand for automated sanitization systems in hospitals, food-processing plants, and other settings where surface cleanliness is important.
|
|
16:30-18:00, Paper TuCT2-CC.5 | Add to My Program |
Investigation on the Multi-Solution Problem of the Kinetostatics of Cable-Driven Continuum Manipulators |
|
Dai, Yicheng | Harbin Institute of Technology (Shenzhen) |
Li, Zuan | Harbin Institute of Technology (Shenzhen) |
Wang, Xin | Harbin Institute of Technology, Shenzhen |
Yuan, Han | Harbin Institute of Technology |
Keywords: Tendon/Wire Mechanism, Kinematics
Abstract: Cable-driven continuum manipulators have gained considerable attention due to their high dexterity and inherent structural compliance, making them a popular research topic. However, previous studies have overlooked the kinetostatics of these manipulators, which can result in a multi-solution problem. This issue is critical, as multiple equilibrium states can lead to erroneous estimations of the manipulator's profile. To address this issue, a kinetostatic model is presented, and simulations based on both the interval analysis method and the commonly used floating-point optimization algorithm are conducted under the same actuating forces and external loads. Results show that there are multiple solutions to the kinetostatics of cable-driven continuum manipulators with either constant or variable cross sections. This paper fills a gap in the current literature and offers valuable insights for researchers in the field of cable-driven continuum manipulators.
|
|
16:30-18:00, Paper TuCT2-CC.6 | Add to My Program |
Optimization Design Method of Tendon-Sheath Transmission Path under Curvature Constraint |
|
Li, Yanan | Tsinghua University |
Lu, Weining | Department of Automation, Tsinghua University |
Liu, Yu | Harbin Institute of Technology |
Meng, Deshan | Sun Yat-Sen University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Tendon/Wire Mechanism, Mechanism Design, Contact Modeling, Optimization and Optimal Control
Abstract: The tendon-sheath mechanism is finding increasingly broad application in the field of precision machinery. However, the contact friction between the tendon and sheath seriously degrades transmission accuracy. When friction is unavoidable, optimizing the tendon transmission path to reduce tension loss and elastic deformation becomes an important research direction. In this paper, the influence of the tendon transmission path on tension and displacement transmission is characterized using two parameters related to the curvature of the path: the total bending angle and the equivalent tendon length. Then, based on optimal control theory and the minimum principle, transmission path solutions are obtained for minimum tension loss, minimum tendon deformation, and the coupling of tension and displacement; a numerical optimization method verifies the correctness of the proposed theory. Finally, an optimized design of a tendon-constrained synchronous rotation mechanism for a manipulator is carried out, and the linkage performance is greatly improved by optimizing the transmission path.
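The dependence of tension loss on the total bending angle mentioned in this abstract is commonly modeled with the classic capstan (Euler-Eytelwein) friction equation, where output tension decays exponentially with the accumulated bending angle. The sketch below uses that standard model for illustration; the friction coefficient and routing angles are invented, and the paper's full analysis also covers elastic deformation.

```python
import math

def output_tension(t_in, mu, bend_angles_rad):
    """Capstan model: T_out = T_in * exp(-mu * theta_total), where
    theta_total is the sum of all bending angles along the path."""
    theta_total = sum(bend_angles_rad)
    return t_in * math.exp(-mu * theta_total)

# Two hypothetical routings with the same endpoints: a gentle path bending
# 90 degrees in total versus a convoluted one bending 270 degrees.
mu = 0.2
gentle = output_tension(100.0, mu, [math.pi / 2])
convoluted = output_tension(100.0, mu, [math.pi / 2, math.pi / 2, math.pi / 2])
print(f"{gentle:.1f} N vs {convoluted:.1f} N")  # ~73.0 N vs ~39.0 N
```

The exponential form is why minimizing the total bending angle is such an effective path-optimization objective: each extra bend multiplies the loss rather than adding to it.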
|
|
16:30-18:00, Paper TuCT2-CC.7 | Add to My Program |
Stability Analysis of Tendon Driven Continuum Robots and Application to Active Softening |
|
Peyron, Quentin | Inria and CRIStAL UMR CNRS 9189, University of Lille |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Tendon/Wire Mechanism, Modeling, Control, and Learning for Soft Robots, Kinematics, Elastic Stability of Continuum Robots
Abstract: Tendon driven continuum robots are often considered for navigating through and operating in cluttered environments. While their compliance allows them to conform safely to obstacles, it also leads them to buckle under tendon actuation. In this work, we perform for the first time an extensive elastic stability analysis of these robots for arbitrary planar designs. The buckling phenomena are investigated and analyzed using bifurcation diagrams, complementing the current state of the art and adding new knowledge about robots composed of n spacer disks. We show the existence of multiple robot configurations with different shapes, achievable with the same actuation inputs. A global stability criterion is also established which links the critical tendon force, up to which the robot is stable, to the design parameters. Finally, the buckling phenomena are used to actively soften the robot for a better compromise between compliance and payload. An open-loop control strategy is proposed, which can theoretically decrease the stiffness to zero while maintaining the same robot shape. Experimentally, the robot is made 4 times more compliant than it is nominally using tendon actuation only.
|
|
16:30-18:00, Paper TuCT2-CC.8 | Add to My Program |
Elasto-Static Modelling and Identification of a Deployable Cable-Driven Parallel Robot with Compliant Masts |
|
Zake, Zane | IRT Jules Verne |
Caro, Stéphane | CNRS/LS2N |
Keywords: Tendon/Wire Mechanism, Parallel Robots, Compliant Joints and Mechanisms
Abstract: Some cable-driven parallel robots (CDPRs) can be rapidly deployed on-site. To achieve such deployability, the fixed frame is usually substituted by four masts. However, not having any rigid fixture between the masts reduces the overall stiffness of the CDPR. This paper introduces a CDPR called Rocaspect that has four compliant masts. The robot's behavior and accuracy are evaluated experimentally, and three different mast models are proposed.
|
|
16:30-18:00, Paper TuCT2-CC.9 | Add to My Program |
Torque Transmission in Double-Tendon Sheath Driven Actuators for Application in Exoskeletons |
|
Pérez-Suay, Daniel | Technical University of Munich |
Li, Yu | Technical University of Munich |
Sadeghian, Hamid | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Tendon/Wire Mechanism, Prosthetics and Exoskeletons, Actuation and Joint Mechanisms
Abstract: Bowden cables serve as essential components in various mechanical systems, facilitating power transmission from remote actuators to specific destinations. The pretension of Bowden cables profoundly influences system performance, notably in terms of friction. This study investigates the effects of cable pretension and shape on friction and torque efficiency. A custom-designed testbed, comprising integrated actuator units, pulleys, and a novel pretension mechanism connected by Bowden cables, is used to conduct experimental tests under varying parameters. This work adopts an integrated approach of experimentation, modeling, and validation, offering preliminary insights into the torque transmission characteristics of tendon-driven actuator systems. Additionally, the precise model exhibits excellent conformity across a broad range of shapes and provides initial insights into hysteresis modeling attributable to cable material properties.
|
|
TuCT3-CC Oral Session, CC-313 |
Add to My Program |
Big Data in Robotics and Automation |
|
|
Chair: Bohg, Jeannette | Stanford University |
Co-Chair: Johns, Edward | Imperial College London |
|
16:30-18:00, Paper TuCT3-CC.1 | Add to My Program |
OpenBot-Fleet: A System for Collective Learning with Real Robots |
|
Müller, Matthias | Intel Labs |
Brahmbhatt, Samarth Manoj | Intel Corporation |
Deka, Ankur | Intel Labs |
Leboutet, Quentin | Intel Labs |
Hafner, David | Intel Labs |
Koltun, Vladlen | Intel Labs |
Keywords: Big Data in Robotics and Automation, Machine Learning for Robot Control, Deep Learning for Visual Perception
Abstract: We introduce OpenBot-Fleet, a comprehensive open-source cloud robotics system for navigation. OpenBot-Fleet uses smartphones for sensing, local compute, and communication, Google Firebase for secure cloud storage and off-board compute, and a robust yet low-cost wheeled robot to act in real-world environments. The robots collect task data and upload it to the cloud, where navigation policies can be learned either offline or online and can then be sent back to the robot fleet. In our experiments, we distribute 72 robots to a crowd of workers who operate them in homes, and show that OpenBot-Fleet can learn robust navigation policies that generalize to unseen homes with >80% success rate. OpenBot-Fleet represents a significant step forward in cloud robotics, making it possible to deploy large continually learning robot fleets in a cost-effective and scalable manner. All materials can be found at https://www.openbot.org/
|
|
16:30-18:00, Paper TuCT3-CC.2 | Add to My Program |
WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting |
|
Chen, Kan | Waymo LLC |
Ge, Runzhou | Waymo LLC |
Qiu, Hang | University of California, Riverside |
Al-Rfou, Rami | Waymo |
Qi, Charles Ruizhongtai | Waymo |
Zhou, Xuanyu | Waymo |
Yang, Zoey Zeyu | Waymo |
Ettinger, Scott | Waymo |
Sun, Pei | Waymo |
Leng, Zhaoqi | Waymo LLC |
Baniodeh, Mustafa | Waymo LLC |
Bogun, Ivan | Cruise LLC |
Wang, Weiyue | University of Southern California |
Tan, Mingxing | Waymo Research |
Anguelov, Dragomir | Waymo |
Keywords: Big Data in Robotics and Automation, Data Sets for Robot Learning, Motion and Path Planning
Abstract: Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. The new augmented dataset, WOMD-LiDAR, consists of over 100,000 scenes, each spanning 20 seconds and containing well-synchronized and calibrated high-quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to the Waymo Open Dataset (WOD), the WOMD-LiDAR dataset contains 100x more scenes. Furthermore, we integrate the LiDAR data into the motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data brings improvement in the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.
|
|
16:30-18:00, Paper TuCT3-CC.3 | Add to My Program |
Increasing the Absolute Position Accuracy of Industrial Robots by Means of a Deep Continual Evidential Regression Model |
|
Uhlmann, Eckart | TU Berlin, Institute for Machine Tools and Factory Management |
Polte, Mitchel | TU Berlin, Institute for Machine Tools and Factory Management |
Blumberg, Julian | TU Berlin, Institute for Machine Tools and Factory Management |
Yin, Sheng | TU Berlin, Institute for Machine Tools and Factory Management |
Wang, Gang | Chongqing University |
Keywords: Deep Learning Methods, Continual Learning, Industrial Robots
Abstract: The use of industrial robots represents a key technology for increasing productivity and efficiency in manufacturing. However, their low absolute position accuracy still prevents industrial robots from broadly substituting for machine tools. In this paper, a data-driven method for accuracy enhancement of industrial robots under consideration of kinematic, elastic, and thermal effects is presented. A continual learning algorithm is proposed, which allows the model to be trained in a process-parallel manner without suffering from catastrophic forgetting. Furthermore, the model is able to determine confidence intervals of the prediction values and thus supports further processing in safety-relevant applications. The effectiveness of the model is demonstrated using a large data stream with about 3,000 real data points. As a result, we show that the absolute position accuracy of the industrial robot can be improved by 96% with the proposed method.
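The confidence intervals mentioned above can be illustrated with a toy view of an evidential-regression output: each position-error prediction carries a mean and a variance, from which an interval follows. This is a schematic sketch only; the paper's deep continual evidential model is far richer, and the function name and the numbers below are illustrative assumptions.

```python
import math

def confidence_interval(mu, var, z=1.96):
    """Return the mu +/- z*sigma interval for a Gaussian predictive density.

    mu/var are the predicted mean and variance; z=1.96 gives ~95% coverage.
    """
    half = z * math.sqrt(var)
    return mu - half, mu + half

# e.g. a predicted residual position error of 0.10 mm with variance 0.0004 mm^2
lo, hi = confidence_interval(0.10, 0.0004)
```

A downstream safety check would then compare `hi` against the application's tolerance rather than trusting the point prediction `mu` alone.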
|
|
16:30-18:00, Paper TuCT3-CC.4 | Add to My Program |
SpawnNet: Learning Generalizable Visuomotor Skills from Pre-Trained Network |
|
Lin, Xingyu | UC Berkeley |
So, John Ian | Stanford University |
Mahalingam, Sashwat | University of California, Berkeley |
Liu, Fangchen | University of California, Berkeley |
Abbeel, Pieter | UC Berkeley |
Keywords: Sensorimotor Learning, Representation Learning, Big Data in Robotics and Automation
Abstract: The existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, bringing the potential of learning policies that generalize in diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives. Still, the generalization capabilities of the learned policies and the advantages over well-tuned baselines remain unclear from prior studies. In this work, we present a focused study of the generalization capabilities of the pre-trained visual representations at the categorical level. We identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning and then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real experiments, we show significantly better categorical generalization compared to prior approaches in imitation learning settings. Open-sourced code and videos can be found on our website: https://xingyu-lin.github.io/spawnnet/.
|
|
16:30-18:00, Paper TuCT3-CC.5 | Add to My Program |
RoboAgent: Generalization and Efficiency in Robot Manipulation Via Semantic Augmentations and Action Chunking |
|
Bharadhwaj, Homanga | Carnegie Mellon University |
Vakil, Jay | Meta |
Sharma, Mohit | Carnegie Mellon University |
Gupta, Abhinav | Carnegie Mellon University |
Tulsiani, Shubham | Carnegie Mellon University |
Kumar, Vikash | Meta AI |
Keywords: Big Data in Robotics and Automation, Imitation Learning, Data Sets for Robot Learning
Abstract: The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such a universal agent requires an efficient framework capable of generalization but within a reasonable data budget. In this paper, we develop an efficient framework (MT-ACT) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies from small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using task commands. Using merely 7500 demonstrations, we are able to train a single policy, RoboAgent, capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, MT-ACT outperforms prior methods by over 40% in unseen situations while being more sample efficient. See https://robopen.github.io/ for video results and appendix.
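The action-chunking idea above can be sketched minimally: the policy predicts a short chunk of future actions at every step, and overlapping predictions for the same timestep are averaged (temporal ensembling). The names and the averaging rule below are illustrative assumptions; MT-ACT itself is a transformer policy, not shown here.

```python
def execute_with_chunking(policy, horizon):
    """policy(t) -> list of future actions predicted at time t (the chunk)."""
    pending = [[] for _ in range(horizon)]
    for t in range(horizon):
        for i, action in enumerate(policy(t)):
            if t + i < horizon:
                pending[t + i].append(action)
    # Temporal ensembling: average every prediction made for each timestep.
    return [sum(acts) / len(acts) for acts in pending]

# A dummy policy whose chunk at time t is [t, t]; later timesteps are
# smoothed across the overlapping chunks that covered them.
actions = execute_with_chunking(lambda t: [float(t)] * 2, horizon=3)
```

Chunking reduces the effective decision frequency (helping with compounding errors in imitation learning), while ensembling keeps execution smooth.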
|
|
16:30-18:00, Paper TuCT3-CC.6 | Add to My Program |
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models |
|
Kapelyukh, Ivan | Imperial College London |
Ren, Yifei | Imperial College London |
Alzugaray, Ignacio | Imperial College London |
Johns, Edward | Imperial College London |
Keywords: Big Data in Robotics and Automation, Perception for Grasping and Manipulation, Semantic Scene Understanding
Abstract: We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place. This enables language-conditioned rearrangement to be performed zero-shot, without needing to collect a training dataset of example arrangements. Results on a series of real-world tasks show that this framework is robust to distractors, controllable by language, capable of understanding complex multi-object relations, and readily applicable to both tabletop and 6-DoF rearrangement tasks.
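The render-and-score loop at the heart of a Dream2Real-style pipeline can be sketched as below: virtually rearrange objects, render each candidate arrangement, let a VLM score it against the instruction, and pick the best. The `render` and `vlm_score` callables are stand-in stubs, not the paper's actual components.

```python
def pick_arrangement(candidates, render, vlm_score, instruction):
    """Score every virtual arrangement and return the highest-scoring one."""
    scored = [(vlm_score(render(c), instruction), c) for c in candidates]
    return max(scored)[1]

# Stub renderer and scorer: the "image" is just the description string, and
# the "VLM" scores 1.0 if the instruction text appears in it.
best = pick_arrangement(
    candidates=["fork left of plate", "fork on plate"],
    render=lambda c: c,
    vlm_score=lambda img, instr: 1.0 if instr in img else 0.0,
    instruction="left of",
)
```

The selected arrangement would then be recreated in the real world with pick-and-place.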
|
|
16:30-18:00, Paper TuCT3-CC.7 | Add to My Program |
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning |
|
Yang, Jingyun | Stanford University |
Sobol Mark, Max | Stanford University |
Vu, Brandon | Stanford University |
Sharma, Archit | Stanford University |
Bohg, Jeannette | Stanford University |
Finn, Chelsea | Stanford University |
Keywords: Reinforcement Learning, Continual Learning, Big Data in Robotics and Automation
Abstract: The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet. However, reinforcement learning often requires significant human effort in the form of manual reward specification or environment resets, even if the policy is pre-trained. We introduce RoboFuME, a reset-free fine-tuning system that pre-trains a multi-task manipulation policy from diverse datasets of prior experiences and self-improves online to learn a target task with minimal human intervention. Our insights are to utilize calibrated offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy in the presence of distribution shifts and leverage pre-trained vision language models (VLMs) to build a robust reward classifier for autonomously providing reward signals during the online fine-tuning process. In a diverse set of five real robot manipulation tasks, we show that our method can incorporate data from an existing robot dataset collected at a different institution and improve on a target task within as little as 3 hours of autonomous real-world experience. We also demonstrate in simulation experiments that our method outperforms prior works that use different RL algorithms or different approaches for predicting rewards. Project website: https://robofume.github.io
|
|
16:30-18:00, Paper TuCT3-CC.8 | Add to My Program |
Scaling Motion Forecasting Models with Ensemble Distillation |
|
Ettinger, Scott | Waymo |
Goel, Kratarth | Waymo |
Srivastava, Avikalp | Waymo |
Al-Rfou, Rami | Waymo |
Keywords: Deep Learning Methods, Autonomous Agents, Big Data in Robotics and Automation
Abstract: Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting systems subject to limited compute budgets by combining model ensemble and distillation techniques. The use of ensembles of deep neural networks has been shown to improve generalization accuracy in many application domains. We first demonstrate significant performance gains by creating a large ensemble of optimized single models. We then develop a generalized framework to distill motion forecasting model ensembles into small student models which retain high performance with a fraction of the computing cost. For this study we focus on the task of motion forecasting using real world data from autonomous driving systems. We develop ensemble models that are very competitive on the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards. From these ensembles, we train distilled student models which have high performance at a fraction of the compute costs. These experiments demonstrate distillation from ensembles as an effective method for improving accuracy of predictive models for robotic systems with limited compute budgets.
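The distillation step described above can be sketched generically: the student regresses toward the aggregated predictions of the ensemble. This is a schematic stand-in; the paper's actual losses and trajectory representations differ, and the function names are assumptions.

```python
def ensemble_target(predictions):
    """Average equal-length trajectories (lists of floats) element-wise."""
    n = len(predictions)
    return [sum(vals) / n for vals in zip(*predictions)]

def distill_loss(student, target):
    """Mean squared error between the student's trajectory and the target."""
    return sum((s - t) ** 2 for s, t in zip(student, target)) / len(target)

# Two teacher trajectories produce the distillation target [0.0, 2.0]; a
# student that matches it incurs zero loss.
target = ensemble_target([[0.0, 1.0], [0.0, 3.0]])
loss = distill_loss([0.0, 2.0], target)
```

The point of the technique is that the small student, trained on such targets, retains much of the ensemble's accuracy at a fraction of the onboard compute cost.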
|
|
16:30-18:00, Paper TuCT3-CC.9 | Add to My Program |
Is It a Bug? Understanding Physical Unit Mismatches in Robot Software |
|
Canelas, Paulo | Carnegie Mellon University |
Tabor, Trenton | Carnegie Mellon University |
Ore, John-Paul | North Carolina State University |
Fonseca, Alcides | LASIGE, Faculdade De Ciências Da Universidade De Lisboa |
Le Goues, Claire | Carnegie Mellon University |
Timperley, Christopher Steven | Carnegie Mellon University |
Keywords: Software Tools for Robot Programming
Abstract: Robot software is abundant with variables that represent real-world physical units (e.g., meters, seconds). Operations over different units (e.g., adding meters and seconds) may be incorrect and can lead to dangerous system misbehaviors; manually detecting such mistakes is challenging. Current software analysis techniques identify such mismatches using dimensional analysis rules and ROS-specific assumptions to analyze the source code. However, these techniques ignore the fact that physical unit mismatches in robotics code are often intentional (e.g., when operating a differential drive robot), resulting in false positive bug reports that can impede robotics developer trust and productivity. In this work, we study how developers introduce physical unit mismatches by manually inspecting 180 errors detected by the software analysis technique Phys. We identify three types of physical unit mismatches and present a taxonomy of eight high-level categories of how these errors manifest. We find that developers often make unforced and paradigmatic physical unit mismatches through differential drives, small angle approximations, and controls. We draw insights on current development to inform future research to better detect, categorize, and address meaningful physical unit mismatches.
|
|
TuCT4-CC Oral Session, CC-315 |
Add to My Program |
Multi-Robot Systems III |
|
|
Chair: Amato, Nancy | University of Illinois |
Co-Chair: Nikolakopoulos, George | Luleå University of Technology |
|
16:30-18:00, Paper TuCT4-CC.1 | Add to My Program |
Behavior Tree Capabilities for Dynamic Multi-Robot Task Allocation with Heterogeneous Robot Teams |
|
Heppner, Georg | FZI Forschungszentrum Informatik |
Oberacker, David | FZI Forschungszentrum Informatik |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Multi-Robot Systems, Control Architectures and Programming, Behavior-Based Systems
Abstract: While individual robots are becoming increasingly capable, the complexity of expected missions increases exponentially in comparison. To cope with this complexity, heterogeneous teams of robots have become a significant research interest in recent years. Making effective use of the robots and their unique skills in a team is challenging. Dynamic runtime conditions often make static task allocations infeasible, requiring a dynamic, capability-aware allocation of tasks to team members. To this end, we propose and implement a system that allows a user to specify missions using Behavior Trees (BTs), which can then, at runtime, be dynamically allocated to the current robot team. The system allows an individual robot's capabilities to be statically modeled within our ros_bt_py BT framework. It offers a runtime auction system to dynamically allocate tasks to the most capable robot in the current team. The system leverages utility values and pre-conditions to ensure that the allocation improves the overall mission execution quality while preventing faulty assignments. To evaluate the system, we simulated a find-and-decontaminate mission with a team of three heterogeneous robots and analyzed the utilization and overall mission times as metrics. Our results show that our system can improve the overall effectiveness of a team while allowing for intuitive mission specification and flexibility in the team composition.
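The utility-and-pre-condition auction described above can be sketched as follows. This is a hypothetical illustration, not the ros_bt_py API: each robot bids its utility for a task, bids from robots whose pre-conditions fail are rejected, and the highest remaining bid wins.

```python
def allocate(task, robots):
    """Return the name of the highest-utility robot whose pre-conditions hold."""
    valid_bids = [
        (r["utility"](task), r["name"])
        for r in robots
        if all(pc(task) for pc in r["preconditions"])
    ]
    if not valid_bids:
        return None  # no capable robot in the current team
    return max(valid_bids)[1]

# Illustrative team: the UAV bids higher but has a payload pre-condition.
robots = [
    {"name": "ugv", "utility": lambda t: 0.4, "preconditions": [lambda t: True]},
    {"name": "uav", "utility": lambda t: 0.9,
     "preconditions": [lambda t: t["weight_kg"] < 1.0]},
]

winner_light = allocate({"weight_kg": 0.5}, robots)  # UAV wins on utility
winner_heavy = allocate({"weight_kg": 5.0}, robots)  # UAV filtered out
```

Pre-conditions prevent faulty assignments; utilities decide among the remaining capable robots, which is the mechanism the abstract credits for improved mission execution quality.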
|
|
16:30-18:00, Paper TuCT4-CC.2 | Add to My Program |
Multi-Robot Cooperative Navigation in Crowds: A Game-Theoretic Learning-Based Model Predictive Control Approach |
|
Le, Viet-Anh | University of Delaware |
Tadiparthi, Vaishnav | Honda Research Institute |
Chalaki, Behdad | Honda Research Institute USA, Inc |
Nourkhiz Mahjoub, Hossein | Honda Research Institute US |
D'sa, Jovin | Honda Research Institute, USA |
Moradi-Pari, Ehsan | Honda Research Institute |
Malikopoulos, Andreas | Cornell University |
Keywords: Multi-Robot Systems, Human-Aware Motion Planning, Social HRI
Abstract: In this paper, we develop a control framework for the coordination of multiple robots as they navigate through crowded environments. Our framework comprises a local model predictive control (MPC) for each robot and a social long short-term memory model that forecasts pedestrians' trajectories. We formulate the local MPC for each individual robot to include both individual and shared objectives, where the latter encourage the emergence of coordination among robots. Next, we consider the multi-robot navigation and human-robot interaction, respectively, as a potential game and a two-player game, then employ an iterative best response approach to solve the resulting optimization problems in a centralized and distributed fashion. Finally, we demonstrate the effectiveness of coordination among robots in simulated crowd navigation.
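The iterative best response scheme mentioned above can be illustrated on a two-player toy: each player repeatedly minimizes its own cost with the other's plan held fixed, converging to an equilibrium. This is a scalar schematic under made-up quadratic costs, not the paper's MPC games.

```python
def best_response(x0, y0, rounds=50):
    """Alternate unilateral minimizations until the plans stop changing.

    Toy costs: cost_x = (x - (1 - y/2))**2 and cost_y = (y - (1 - x/2))**2,
    so each best response has a closed form.
    """
    x, y = x0, y0
    for _ in range(rounds):
        x = 1.0 - 0.5 * y  # argmin over x with y held fixed
        y = 1.0 - 0.5 * x  # argmin over y with x held fixed
    return x, y

x, y = best_response(0.0, 0.0)  # converges toward the equilibrium (2/3, 2/3)
```

In the paper's setting, each "best response" is itself an MPC solve, but the fixed-point structure is the same.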
|
|
16:30-18:00, Paper TuCT4-CC.3 | Add to My Program |
Multi-Robot Human-In-The-Loop Control under Spatiotemporal Specifications |
|
Zhang, Yixiao | KTH Royal Institute of Technology |
Nan Fernandez-Ayala, Victor | KTH Royal Institute of Technology |
Dimarogonas, Dimos V. | KTH Royal Institute of Technology |
Keywords: Multi-Robot Systems, Human-Robot Collaboration, Robotics and Automation in Agriculture and Forestry
Abstract: In this work, we present a coordination strategy tailored for scenarios involving multiple agents and tasks. We devise a range of tasks using signal temporal logic (STL), each earmarked for specific agents. These tasks are then imposed through control barrier function (CBF) constraints to ensure completion. To extend existing methodologies, our framework adeptly manages interactions among multiple agents. This extension is facilitated by leveraging nonlinear model predictive control (NMPC) to compute trajectories that avoid collisions. An integral aspect of our approach is the integration of a human-in-the-loop (HIL) model. This model enables real-time integration of human directives into the coordination process. A novel task allocation protocol is embedded within the framework to guide this process. We substantiate our methodology through a series of experiments, which corroborate the viability and relevance of our algorithms.
|
|
16:30-18:00, Paper TuCT4-CC.4 | Add to My Program |
Hypergraph-Based Multi-Robot Task and Motion Planning |
|
Motes, James | University of Illinois Urbana-Champaign |
Chen, Tan | Michigan Technological University |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Morales, Marco | University of Illinois at Urbana-Champaign & Instituto Tecnológ |
Amato, Nancy | University of Illinois |
Keywords: Multi-Robot Systems, Task Planning, Motion and Path Planning, Cooperating Robots
Abstract: We present a multi-robot task and motion planning method that, when applied to the rearrangement of objects by manipulators, results in solution times up to three orders of magnitude faster than existing methods and successfully plans for problems with up to twenty objects, more than three times as many objects as comparable methods. We achieve this improvement by decomposing the planning space to consider manipulators alone, objects, and manipulators holding objects. We represent this decomposition with a hypergraph where vertices are decomposed elements of the planning spaces and hyperarcs are transitions between elements. Existing methods use graph-based representations where vertices are full composite spaces and edges are transitions between these. Using the hypergraph reduces the representation size of the planning space-for multi-manipulator object rearrangement, the number of hypergraph vertices scales linearly with the number of either robots or objects, while the number of hyperarcs scales quadratically with the number of robots and linearly with the number of objects. In contrast, the number of vertices and edges in graph representations scales exponentially with either.
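The scaling contrast above can be made concrete with a back-of-the-envelope count. The formulas below are a simplified model of the decomposition (robots alone, objects, and robot-object "holding" pairs versus a Cartesian product of per-entity modes), not the paper's exact vertex counts.

```python
def hypergraph_vertices(n_robots, n_objects):
    # Decomposed spaces: each robot alone + each object + each
    # (robot, object) holding pair -- polynomial growth.
    return n_robots + n_objects + n_robots * n_objects

def composite_vertices(n_robots, n_objects, modes_per_entity=2):
    # A composite-space graph takes the Cartesian product of per-entity
    # modes, so its size grows exponentially with the number of entities.
    return modes_per_entity ** (n_robots + n_objects)

# For a 3-manipulator, 20-object problem (the paper's largest scale):
small = hypergraph_vertices(3, 20)   # tens of vertices
large = composite_vertices(3, 20)    # millions of composite states
```

Even under this crude model, the gap of several orders of magnitude makes clear why the decomposed representation plans problems the composite one cannot.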
|
|
16:30-18:00, Paper TuCT4-CC.5 | Add to My Program |
Measurement-Limited Multi-Agent, Relative Pose Estimation for On-Orbit Inspection |
|
Mercier, Mark | Air Force Institute of Technology |
Curtis, David | Air Force Institute of Technology |
Taylor, Clark | Air Force Institute of Technology |
Keywords: Space Robotics and Automation, Multi-Robot Systems, Autonomous Vehicle Navigation
Abstract: Relative navigation methods are a critical enabling technology for the next generation of autonomous spacecraft conducting close proximity operations. This is especially true for multi-agent inspection operations, in which safety concerns, including intra-agent or agent-target collisions, are serious. Additionally, in an on-orbit servicing operation, various failure modes of the target may result in unreliable a priori knowledge or cooperation from the target. The main contribution of this work is the demonstration of a method for multi-agent, relative pose estimation that is robust to A) sensor blinding and B) dynamic uncertainty. This objective is accomplished by leveraging GTSAM, an existing toolbox for the formulation of factor graphs, along with an algorithm for the efficient, real-time solution of such factor graphs, iSAM2. This estimation method is demonstrated in an example scenario with uncertain dynamics and sensor blinding due to sun position. Results revealed that the iSAM2-based method is capable of handling sensor blinding through leveraging an inter-agent range measurement, despite a dynamically uncertain environment.
|
|
16:30-18:00, Paper TuCT4-CC.6 | Add to My Program |
Dynamic Targeting of Satellite Observations Incorporating Slewing Costs and Complex Observation Utility |
|
Kangaslahti, Akseli | University of Michigan |
Candela, Alberto | NASA Jet Propulsion Laboratory, Caltech |
Swope, Jason | Jet Propulsion Laboratory, California Institute of Technology |
Yue, Qing | Jet Propulsion Laboratory, California Institute of Technology |
Chien, Steve | Jet Propulsion Laboratory |
Keywords: Space Robotics and Automation, Planning, Scheduling and Coordination
Abstract: Maximizing the utility of limited Earth observing satellite resources is a difficult ongoing problem. Dynamic Targeting is an approach to this challenge that intelligently plans and executes primary sensor observations based on information from a lookahead sensor. However, current implementations have failed to account for realistic satellite operational constraints and have used static utility for repeat observations of the same target. To address these limitations, we implement a more general Dynamic Targeting framework that comprises a physics-based slew model, a dynamic model of observation utility, and an algorithm for gathering high-utility observations. To demonstrate this framework, we also supply complex dynamic utility models that are applicable to many missions and new algorithms for intelligently scheduling observations with slewing restrictions and changing utility, including a greedy algorithm and a depth-first search algorithm. To evaluate these algorithms, we test their performance in simulated runs over two datasets, comparing against an algorithm representative of most scheduling algorithms aboard Earth science missions today as well as against an intractable upper bound. We show that our algorithms have great potential to improve science return from Earth science missions.
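The greedy variant can be sketched as below: at each step, pick the reachable target with the highest current utility, charge slew time against the planning horizon, and decay a target's utility on every revisit. All numbers, names, and the 1-D slew model are made-up simplifications of the paper's framework.

```python
def greedy_schedule(targets, horizon, slew_rate=1.0, decay=0.5):
    """targets: {name: (pointing_angle, base_utility)}.

    Greedily observe the feasible target of highest current utility; each
    revisit multiplies that target's utility by `decay` (diminishing returns).
    """
    angle, t, total, visits, plan = 0.0, 0.0, 0.0, {}, []
    while t < horizon:
        best = None
        for name, (a, u) in targets.items():
            cost = abs(a - angle) / slew_rate          # slew time to target
            util = u * decay ** visits.get(name, 0)    # decayed repeat utility
            if t + cost < horizon and (best is None or util > best[0]):
                best = (util, cost, name, a)
        if best is None:
            break                                      # nothing reachable in time
        util, cost, name, a = best
        angle, t, total = a, t + cost + 1, total + util  # +1: observation time
        visits[name] = visits.get(name, 0) + 1
        plan.append(name)
    return plan, total

plan, total = greedy_schedule({"A": (0.0, 1.0), "B": (2.0, 0.9)}, horizon=6)
```

Note how the decay steers the schedule away from endlessly re-observing the single best target, which is exactly the static-utility failure mode the abstract criticizes.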
|
|
16:30-18:00, Paper TuCT4-CC.7 | Add to My Program |
RecNet: An Invertible Point Cloud Encoding through Range Image Embeddings for Multi-Robot Map Sharing and Reconstruction |
|
Stathoulopoulos, Nikolaos | Luleå University of Technology |
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Koval, Anton | Luleå University of Technology |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Multi-Robot Systems, AI-Based Methods, Localization
Abstract: In the field of resource-constrained robots and the need for effective place recognition in multi-robotic systems, this article introduces RecNet, a novel approach that concurrently addresses both challenges. The core of RecNet's methodology involves a transformative process: it projects 3D point clouds into range images, compresses them using an encoder-decoder framework, and subsequently reconstructs the range image, restoring the original point cloud. Additionally, RecNet utilizes the latent vector extracted from this process for efficient place recognition tasks. This approach not only achieves comparable place recognition results but also maintains a compact representation, suitable for sharing among robots to reconstruct their collective maps. The evaluation of RecNet encompasses an array of metrics, including place recognition performance, the structural similarity of the reconstructed point clouds, and the bandwidth transmission advantages derived from sharing only the latent vectors. Our proposed approach is assessed using both a publicly available dataset and field experiments, confirming its efficacy and potential for real-world applications.
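The first stage of a RecNet-style pipeline, projecting a 3D point cloud into a range image, can be sketched with a plain spherical projection. The tiny resolution and vertical field of view below are illustrative choices, not the paper's parameters.

```python
import math

def project_to_range_image(points, width=8, height=4, v_fov=(-0.4, 0.4)):
    """Project (x, y, z) points into a height x width image of ranges."""
    img = [[0.0] * width for _ in range(height)]
    v_lo, v_hi = v_fov
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)                       # azimuth in [-pi, pi]
        pitch = math.asin(z / r)                     # elevation angle
        u = int((yaw + math.pi) / (2 * math.pi) * width) % width
        v = int((pitch - v_lo) / (v_hi - v_lo) * height)
        if 0 <= v < height:
            img[v][u] = r                            # store the range value
    return img

# A point straight ahead and a point directly behind the sensor.
img = project_to_range_image([(1.0, 0.0, 0.0), (-2.0, 0.0, 0.0)])
```

The resulting dense 2D image is what the encoder-decoder compresses; inverting the projection per pixel restores the point cloud.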
|
|
TuCT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Tracking |
|
|
Chair: Kheddar, Abderrahmane | CNRS-AIST |
Co-Chair: Pathak, Sarthak | Chuo University |
|
16:30-18:00, Paper TuCT5-CC.1 | Add to My Program |
Object Permanence Filter for Robust Tracking with Interactive Robots |
|
Peng, Shaoting | University of Pennsylvania |
Wang, Margaret | Massachusetts Institute of Technology |
Shah, Julie A. | MIT |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Visual Tracking, Cognitive Control Architectures, Visual Servoing
Abstract: Object permanence, which refers to the concept that objects continue to exist even when they are no longer perceivable through the senses, is a crucial aspect of human cognitive development. In this work, we seek to incorporate this understanding into interactive robots by proposing a set of assumptions and rules to represent object permanence in multi-object, multi-agent interactive scenarios. We integrate these rules into the particle filter, resulting in the Object Permanence Filter (OPF). For multi-object scenarios, we propose an ensemble of K interconnected OPFs, where each filter predicts plausible object tracks that are resilient to missing, noisy, and kinematically or dynamically infeasible measurements. Through several interactive scenarios, we demonstrate that the proposed OPF approach provides robust tracking in human-robot interactive tasks agnostic to measurement type, even in the presence of prolonged and complete occlusion. Project webpage: https://opfilter.github.io/
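The object-permanence idea can be illustrated with a 1-D constant-velocity track that keeps predicting through a full occlusion instead of being dropped. This is a schematic scalar sketch, not the paper's particle-filter implementation; the gain and motion model are made up.

```python
def track(measurements, x0=0.0, v=1.0, gain=0.5):
    """measurements: list of floats or None (fully occluded frames).

    Predict with the motion model every frame; correct only when the
    object is actually observed, so the track survives occlusion.
    """
    x, estimates = x0, []
    for z in measurements:
        x += v                        # motion-model prediction every step
        if z is not None:             # correct only when a measurement exists
            x += gain * (z - x)
        estimates.append(x)
    return estimates

# Object visible, fully occluded for two frames, then reappearing nearby.
est = track([1.0, 2.0, None, None, 5.1])
```

During the `None` frames the estimate coasts along the motion model (3.0, then 4.0), so when the object reappears at 5.1 the track re-associates smoothly rather than restarting.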
|
|
16:30-18:00, Paper TuCT5-CC.2 | Add to My Program |
Zero-Shot Open-Vocabulary Tracking with Large-Scale Pre-Trained Models |
|
Chu, Wen-Hsuan | Carnegie Mellon University |
Harley, Adam | Stanford University |
Tokmakov, Pavel | CMU |
Dave, Achal | Toyota Research Institute |
Guibas, Leonidas | Stanford University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Keywords: Visual Tracking, Deep Learning for Visual Perception
Abstract: Object tracking is central to robot perception and scene understanding, allowing robots to parse a video stream in terms of moving objects with names. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking? In this paper, we combine an open-vocabulary detector, segmenter, and dense optical flow estimator, into a model that tracks and segments any object in 2D videos. Given a monocular video input, our method predicts object and part mask tracks with associated language descriptions, rebuilding the pipeline of Tractor with modern large pre-trained models for static image detection and segmentation: we detect open-vocabulary object instances and propagate their boxes from frame to frame using a flow-based motion model, refine the propagated boxes with the box regression module of the visual detector, and prompt an open-world segmenter with the refined box to segment the objects. We decide the termination of an object track based on the objectness score of the propagated boxes as well as forward-backward optical flow consistency. We re-identify objects across occlusions using deep feature matching. We show that our model achieves strong performance on multiple established benchmarks, and can produce reasonable tracks in manipulation data. In particular, our model outperforms previous state-of-the-art in UVO and BURST, benchmarks for open-world object tracking and segmentation, despite never being explicitly trained for tracking. We hope that our approach can serve as a simple and extensible framework for future research and enable imitation learning from videos with unconventional objects.
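The forward-backward optical-flow consistency test used above to terminate tracks can be sketched in one dimension: warp a point forward with the forward flow, warp it back with the backward flow, and keep the track only if it returns near its start. Real flow fields are dense 2-D arrays; the lambdas below are toy stand-ins.

```python
def fb_consistent(p, forward_flow, backward_flow, tol=1.0):
    """Return True if forward-then-backward warping brings p back within tol."""
    q = p + forward_flow(p)          # warp forward one frame
    p_back = q + backward_flow(q)    # warp back again
    return abs(p_back - p) <= tol

ok = fb_consistent(3.0, lambda p: 2.0, lambda q: -2.0)   # flows agree
bad = fb_consistent(3.0, lambda p: 2.0, lambda q: 4.0)   # occlusion/lost track
```

A large round-trip error signals that the pixel became occluded or the flow is unreliable, which is the paper's cue to end the object track.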
|
|
16:30-18:00, Paper TuCT5-CC.3 | Add to My Program |
Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking |
|
Feng, Shihao | Zhengzhou University |
Liang, Pengpeng | Zhengzhou University |
Gao, Jin | Institute of Automation Chinese Academy of Sciences |
Cheng, Erkang | Nullmax Inc |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Point cloud-based 3D object tracking is an important task in autonomous driving. Though great advances regarding Siamese-based 3D tracking have been made recently, it remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data. Instead of performing correlation of the two branches at just one point in the network, in this paper, we present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage based on sparse pillars. More specifically, in each stage, self-attention is first applied to each branch separately to capture the non-local context information. Then, cross-attention is used to inject the template information into the search area. This strategy allows the feature learning of the search area to be aware of the template while keeping the individual characteristics of the template intact. To enable the network to easily preserve the information learned at different stages and ease the optimization, for the search area, we densely connect the initial input sparse pillars and the output of each stage to all subsequent stages and the target localization network, which converts pillars to bird’s eye view (BEV) feature maps and predicts the state of the target with a small densely connected convolution network. Deep supervision is added to each stage to further boost the performance as well.
|
|
16:30-18:00, Paper TuCT5-CC.4 | Add to My Program |
Refining Pre-Trained Motion Models |
|
Sun, Xinglong | Stanford & UIUC |
Harley, Adam | Stanford University |
Guibas, Leonidas | Stanford University |
Keywords: Visual Tracking, Deep Learning for Visual Perception, Human Detection and Tracking
Abstract: Given the difficulty of manually annotating motion in video, the current best motion estimation methods are trained with synthetic data, and therefore struggle somewhat due to a train/test gap. Self-supervised methods hold the promise of training directly on real video, but typically perform worse. These include methods trained with warp error (i.e., color constancy) combined with smoothness terms, and methods that encourage cycle-consistency in the estimates (i.e., tracking backwards should yield the opposite trajectory as tracking forwards). In this work, we take on the challenge of improving state-of-the-art supervised models with self-supervised training. We find that when the initialization is supervised weights, most existing self-supervision techniques actually make performance worse instead of better, which suggests that the benefit of seeing the new data is overshadowed by the noise in the training signal. Focusing on obtaining a "clean" training signal from real-world unlabelled video, we propose to separate label-making and training into two distinct stages. In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency. This produces a sparse but accurate pseudo-labelling of the video. In the second stage, we fine-tune the model to reproduce these outputs, while also applying augmentations on the input. We complement this boot-strapping method with simple techniques that densify and re-balance the pseudo-labels, ensuring that we do not merely train on "easy" tracks. We show that our method yields reliable gains over fully-supervised methods in real videos, for both short-term (flow-based) and long-range (multi-frame) pixel tracking. Our code can be found here: https://github.com/AlexSunNik/refining-motion-code.
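The stage-one pseudo-label selection described above (keep only motion estimates that survive a cycle-consistency check) can be sketched as follows; the data layout and threshold are illustrative assumptions, not the paper's code:

```python
def select_pseudo_labels(fwd_tracks, bwd_tracks, thresh=2.0):
    """Stage-1 pseudo-labelling: keep only tracks that survive a
    forward-backward cycle-consistency check.

    fwd_tracks: list of trajectories [(x, y), ...] tracked forward in time.
    bwd_tracks: the same points re-tracked backwards from the final frame.
    A track is kept when re-tracking lands near the original start point,
    yielding a sparse but (hopefully) accurate pseudo-label set.
    """
    kept = []
    for f, b in zip(fwd_tracks, bwd_tracks):
        sx, sy = f[0]          # original start position
        ex, ey = b[-1]         # position after tracking back to frame 0
        if ((sx - ex) ** 2 + (sy - ey) ** 2) ** 0.5 < thresh:
            kept.append(f)
    return kept
```

Stage two would then fine-tune the model to reproduce the kept tracks under input augmentations, with densification and re-balancing applied to avoid training only on "easy" tracks.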
|
|
16:30-18:00, Paper TuCT5-CC.5 | Add to My Program |
SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking |
|
Papais, Sandro | University of Toronto |
Ren, Robert | University of Toronto |
Waslander, Steven Lake | University of Toronto |
Keywords: Visual Tracking, Intelligent Transportation Systems, Probability and Statistical Methods
Abstract: Modern robotic systems are required to operate in dense dynamic environments, requiring highly accurate real-time track identification and estimation. For 3D multi-object tracking, recent approaches process a single measurement frame recursively with greedy association and are prone to errors in ambiguous association decisions. Our method, Sliding Window Tracker (SWTrack), yields more accurate association and state estimation by batch processing many frames of sensor data while being capable of running online in real-time. The most probable track associations are identified by evaluating all possible track hypotheses across the temporal sliding window. A novel graph optimization approach is formulated to solve the multidimensional assignment problem with lifted graph edges introduced to account for missed detections and graph sparsity enforced to retain real-time efficiency. We evaluate our SWTrack implementation on the NuScenes autonomous driving dataset to demonstrate improved tracking performance.
|
|
16:30-18:00, Paper TuCT5-CC.6 | Add to My Program |
UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking |
|
Lee, Chang Won | University of Toronto |
Waslander, Steven Lake | University of Toronto |
Keywords: Visual Tracking, Object Detection, Segmentation and Categorization, Intelligent Transportation Systems
Abstract: Multi-object tracking (MOT) methods have seen a significant boost in performance recently, due to strong interest from the research community and steadily improving object detection methods. The majority of tracking methods, which follow the tracking-by-detection (TBD) paradigm, blindly trust the incoming detections with no sense of their associated localization uncertainty. This lack of uncertainty awareness poses a problem in safety-critical tasks such as autonomous driving where passengers could be put at risk due to erroneous detections that have propagated to downstream tasks, including MOT. While there are existing works in probabilistic object detection that predict the localization uncertainty around the boxes, no work in 2D MOT for autonomous driving has studied whether these estimates are meaningful enough to be leveraged effectively in object tracking. We introduce UncertaintyTrack, a collection of extensions that can be applied to multiple TBD trackers to account for localization uncertainty estimates from probabilistic object detectors. Experiments on the Berkeley Deep Drive MOT dataset show that the combination of our method and informative uncertainty estimates reduces the number of ID switches by around 19% and improves mMOTA by 2-3%.
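One natural way to exploit detector-reported localization uncertainty in a tracking-by-detection association step is Mahalanobis gating, sketched below. This is a generic illustration under assumed box parameterization and chi-square threshold, not the UncertaintyTrack extensions themselves:

```python
import numpy as np

def mahalanobis_gate(track_boxes, det_boxes, det_covs, gate=9.488):
    """Gate track-detection pairs using the detector's localization
    covariance instead of plain IoU / L2 distance.

    track_boxes: (T, 4) predicted track states (cx, cy, w, h).
    det_boxes:   (D, 4) incoming detections.
    det_covs:    (D, 4, 4) per-detection covariance from a
                 probabilistic object detector.
    Returns a (T, D) boolean gating matrix; 9.488 is the chi-square
    95% threshold for 4 degrees of freedom.
    """
    T, D = len(track_boxes), len(det_boxes)
    ok = np.zeros((T, D), dtype=bool)
    for j in range(D):
        inv = np.linalg.inv(det_covs[j])
        for i in range(T):
            d = track_boxes[i] - det_boxes[j]
            # Squared Mahalanobis distance under the detection covariance.
            ok[i, j] = float(d @ inv @ d) < gate
    return ok
```

A confidently-localized detection (small covariance) is gated tightly, while an uncertain one is allowed to match tracks farther away.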
|
|
16:30-18:00, Paper TuCT5-CC.7 | Add to My Program |
Humanoid Loco-Manipulations Using Combined Fast Dense 3D Tracking and SLAM with Wide-Angle Depth-Images (I) |
|
Chappellet, Kevin | CNRS |
Murooka, Masaki | AIST |
Caron, Guillaume | CNRS |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Visual Tracking, Perception for Grasping and Manipulation, Humanoid Robot Systems
Abstract: To efficiently achieve complex humanoid loco-manipulation tasks in industrial contexts, we propose a combined vision-based tracker-localization interplay integrated as part of a task-space whole-body optimization control. To achieve good perception complementarity between manipulation and localization, a new fast dense 3D model-based tracking using wide-angle depth image is developed and used in conjunction with a simultaneous localization and mapping software. Our approach allows humanoid robots, targeted for industrial manufacturing, to manipulate and assemble large-scale objects while walking. It is assessed with experiments consisting of rolling and assembling in an unwinder a heavy and wide bobbin, using bimanual grasping and bipedal locomotion at the same time. This experimental use-case is found in some large-scale manufacturing where bobbins are rolled with various materials (cables, papers, rubbers, etc.). The same experiments are made using two different humanoid robots of the same family.
|
|
16:30-18:00, Paper TuCT5-CC.8 | Add to My Program |
LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking |
|
Wei, Qingmao | Guangdong University of Technology |
Zeng, Bi | Guangdong University of Technology |
Liu, Jianqi | Guangdong University of Technology |
He, Li | Southern University of Science and Technology |
Zeng, Guotian | Guangdong University of Technology |
Keywords: Visual Tracking, Recognition, Computer Vision for Automation
Abstract: The recent advancements in transformer-based visual trackers have led to significant progress, attributed to their strong modeling capabilities. However, as performance improves, running latency correspondingly increases, presenting a challenge for real-time robotics applications, especially on edge devices with computational constraints. In response to this, we introduce LiteTrack, an efficient transformer-based tracking model optimized for high-speed operations across various devices. It achieves a more favorable trade-off between accuracy and efficiency than the other lightweight trackers. The main innovations of LiteTrack encompass: 1) asynchronous feature extraction and interaction between the template and search region for better feature fusion and cutting redundant computation, and 2) pruning encoder layers from a heavy tracker to refine the balance between performance and speed. As an example, our fastest variant, LiteTrack-B4, achieves 65.2% AO on the GOT-10k benchmark, surpassing all preceding efficient trackers, while running over 100 fps with ONNX on the Jetson Orin NX edge device. Moreover, our LiteTrack-B9 reaches competitive 72.2% AO on GOT-10k and 82.4% AUC on TrackingNet, and operates at 171 fps on an NVIDIA 2080Ti GPU. The code and demo materials will be available at https://github.com/TsingWei/LiteTrack.
|
|
TuCT6-CC Oral Session, CC-414 |
Add to My Program |
RGB-D Sensing and Perception II |
|
|
Co-Chair: Carlone, Luca | Massachusetts Institute of Technology |
|
16:30-18:00, Paper TuCT6-CC.1 | Add to My Program |
WeatherDepth: Curriculum Contrastive Learning for Self-Supervised Depth Estimation under Adverse Weather Conditions |
|
Wang, JiYuan | Beijing Jiaotong University |
Lin, Chunyu | Beijing Jiaotong University |
Nie, Lang | Beijing Jiaotong University |
Huang, Shujuan | Beijing Jiaotong University |
Pan, Xing | Haomo Zhixing |
Ai, Rui | Haomo AI Technology Co., Ltd |
Zhao, Yao | Beijing Jiaotong University |
Keywords: RGB-D Perception, Deep Learning for Visual Perception, Visual Learning
Abstract: Depth estimation models have shown promising performance on clear scenes but fail to generalize to adverse weather conditions due to illumination variations, weather particles, etc. In this paper, we propose WeatherDepth, a self-supervised robust depth estimation model with curriculum contrastive learning, to tackle performance degradation in complex weather conditions. Concretely, we first present a progressive curriculum learning scheme with three simple-to-complex curricula to gradually adapt the model from clear to relative adverse, and then to adverse weather scenes. It encourages the model to gradually grasp beneficial depth cues against the weather effect, yielding smoother and better domain adaptation. Meanwhile, to prevent the model from forgetting previous curricula, we integrate contrastive learning into different curricula. By drawing reference knowledge from the previous course, our strategy establishes a depth consistency constraint between different courses toward robust depth estimation in diverse weather. Besides, to reduce manual intervention and better adapt to different models, we design an adaptive curriculum scheduler to automatically search for the best timing for course switching. In the experiment, the proposed solution is proven to be easily incorporated into various architectures and demonstrates state-of-the-art (SoTA) performance on both synthetic and real weather datasets. Source code and data are available at https://github.com/wangjiyuan9/WeatherDepth.
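An adaptive curriculum scheduler of the kind described above, which searches for the switching time automatically rather than using a fixed schedule, might look like the following plateau-based sketch (class name, patience, and tolerance are illustrative assumptions):

```python
class CurriculumScheduler:
    """Advance to the next curriculum stage (e.g. clear -> mildly
    adverse -> adverse weather) when the monitored validation metric
    stops improving for `patience` consecutive evaluations."""

    def __init__(self, n_stages=3, patience=2, min_delta=1e-3):
        self.stage = 0
        self.n_stages = n_stages
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stall = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.stall = val_loss, 0   # still improving
        else:
            self.stall += 1                        # plateau detected
        if self.stall >= self.patience and self.stage < self.n_stages - 1:
            self.stage += 1                        # switch course
            self.best, self.stall = float("inf"), 0
        return self.stage
```

The training loop would query `step()` after each validation pass and swap in the next (harder) weather curriculum when the returned stage changes.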
|
|
16:30-18:00, Paper TuCT6-CC.2 | Add to My Program |
Collaborative Decision-Making Using Spatiotemporal Graphs in Connected Autonomy |
|
Gao, Peng | University of Massachusetts Amherst |
Shen, Yu | University of Maryland |
Lin, Ming C. | University of Maryland at College Park |
Keywords: RGB-D Perception, Multi-Robot Systems
Abstract: Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents' observations is very challenging. In this paper, we propose a novel collaborative decision-making method that efficiently and effectively integrates collaborators' representations to control the ego vehicle in accident-prone scenarios. Our approach formulates collaborative decision-making as a classification problem. We first represent sequences of raw observations as spatiotemporal graphs, which significantly reduce the package size to share among connected vehicles. Then we design a novel spatiotemporal graph neural network based on heterogeneous graph learning, which analyzes spatial and temporal connections of objects in a unified way for collaborative decision-making. We evaluate our approach using a high-fidelity simulator that considers realistic traffic, communication bandwidth, and vehicle sensing among connected autonomous vehicles. The experimental results show that our representation achieves over 100x reduction in the shared data size that meets the requirements of communication bandwidth for connected autonomous driving. In addition, our approach achieves over 30% improvements in driving safety.
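A minimal version of the spatiotemporal-graph representation that makes this bandwidth reduction possible is sketched below: nodes are per-frame detections, spatial edges connect nearby objects within a frame, and temporal edges connect the same object across consecutive frames. The data layout and radius are illustrative assumptions, not the paper's format:

```python
def build_spatiotemporal_graph(frames, spatial_radius=10.0):
    """frames: list of dicts {obj_id: (x, y)}, one dict per time step.
    Returns (nodes, spatial_edges, temporal_edges); sharing this compact
    graph instead of raw sensor observations is what shrinks the message."""
    nodes, spatial, temporal = [], [], []
    index = {}                       # (t, obj_id) -> node index
    for t, dets in enumerate(frames):
        for obj_id, (x, y) in dets.items():
            index[(t, obj_id)] = len(nodes)
            nodes.append((t, obj_id, x, y))
    for t, dets in enumerate(frames):
        ids = list(dets)
        # Spatial edges: objects close to each other in the same frame.
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                (x1, y1), (x2, y2) = dets[ids[i]], dets[ids[j]]
                if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= spatial_radius:
                    spatial.append((index[(t, ids[i])], index[(t, ids[j])]))
        # Temporal edges: same object id across consecutive frames.
        if t > 0:
            for obj_id in dets:
                if (t - 1, obj_id) in index:
                    temporal.append((index[(t - 1, obj_id)], index[(t, obj_id)]))
    return nodes, spatial, temporal
```

A heterogeneous graph neural network would then process the two edge types with separate message-passing functions.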
|
|
16:30-18:00, Paper TuCT6-CC.3 | Add to My Program |
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds |
|
Jin, David | MIT |
Karmalkar, Sushrut | University of Wisconsin-Madison |
Zhang, Harry Haolun | MIT |
Carlone, Luca | Massachusetts Institute of Technology |
Keywords: RGB-D Perception, Object Detection, Segmentation and Categorization, Sensor Fusion
Abstract: We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.
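The EM scheme for multi-model registration described above can be sketched concretely. The E-step soft-assigns each putative correspondence to the motion that best explains it, and the M-step refits each motion; here the M-step is instantiated with a weighted Kabsch (Procrustes) solver, which is a standard choice but an assumption on my part, as are the noise scale and initialization:

```python
import numpy as np

def fit_rigid(P, Q, w):
    """Weighted Kabsch: rigid (R, t) minimizing sum_i w_i ||R p_i + t - q_i||^2."""
    w = w / (w.sum() + 1e-12)
    pc, qc = (w[:, None] * P).sum(0), (w[:, None] * Q).sum(0)
    H = (w[:, None] * (P - pc)).T @ (Q - qc)        # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc

def em_multi_registration(P, Q, k=2, iters=20, sigma=0.1):
    """EM over k rigid motions given putative correspondences (p_i, q_i).
    E-step: responsibilities from per-model residuals; M-step: weighted Kabsch."""
    n = len(P)
    rng = np.random.default_rng(0)
    resp = rng.random((n, k))
    resp /= resp.sum(1, keepdims=True)
    models = []
    for _ in range(iters):
        models = [fit_rigid(P, Q, resp[:, j]) for j in range(k)]
        err = np.stack([np.linalg.norm(Q - (P @ R.T + t), axis=1)
                        for R, t in models], axis=1)
        resp = np.exp(-err ** 2 / (2 * sigma ** 2)) + 1e-12
        resp /= resp.sum(1, keepdims=True)
    return models, resp.argmax(1)
```

Outlier correspondences could be handled by adding a uniform "background" component to the responsibilities; the paper's theoretical convergence conditions are not reproduced here.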
|
|
16:30-18:00, Paper TuCT6-CC.4 | Add to My Program |
A Robust Deformable Linear Object Perception Pipeline in 3D: From Segmentation to Reconstruction |
|
Zhaole, Sun | Tsinghua University, the University of Edinburgh, Intel Lab Chin |
Zhou, Hang | Shanghai University |
Li, Nanbo | University of Edinburgh |
Chen, Longfei | University of Edinburgh |
Zhu, Jihong | University of York |
Fisher, Robert | University of Edinburgh |
Keywords: RGB-D Perception, Perception for Grasping and Manipulation
Abstract: 3D perception of deformable linear objects (DLOs) is crucial for DLO manipulation. However, perceiving DLOs in 3D from a single RGBD image is challenging. Previous DLO perception methods fail to extract a decent 3D DLO model due to different textures, occlusions, sparse and false depth information. To address these problems and provide a more robust DLO state estimation for downstream tasks like tracking and manipulation, this paper proposes a 3D DLO perception pipeline to first segment a DLO in 2D images and post-process masks to eliminate false positive segmentation, reconstruct the DLO in 3D space to predict the occluded part of the DLO, and physically smooth the reconstructed DLO. By testing on a synthetic DLO dataset and further validating on a real-world dataset with seven different DLOs, we demonstrate that the proposed method is an effective and robust 3D perception pipeline solution with better performance on 2D DLO segmentation and 3D DLO reconstruction compared to State-of-the-Art algorithms.
|
|
16:30-18:00, Paper TuCT6-CC.5 | Add to My Program |
Unlocking the Performance of Proximity Sensors by Utilizing Transient Histograms |
|
Sifferman, Carter | University of Wisconsin-Madison |
Wang, Yeping | University of Wisconsin-Madison |
Gupta, Mohit | University of Wisconsin-Madison |
Gleicher, Michael | University of Wisconsin - Madison |
Keywords: RGB-D Perception, Range Sensing
Abstract: We provide methods which recover planar scene geometry by utilizing the transient histograms captured by a class of close-range time-of-flight (ToF) distance sensors. A transient histogram is a one dimensional temporal waveform which encodes the arrival time of photons incident on the ToF sensor. Typically, a sensor processes the transient histogram using a proprietary algorithm to produce distance estimates, which are commonly used in several robotics applications. Our methods utilize the transient histogram directly to enable recovery of planar geometry more accurately than is possible using only proprietary distance estimates, and consistent recovery of the albedo of the planar surface, which is not possible with proprietary distance estimates alone. This is accomplished via a differentiable rendering pipeline, which simulates the transient imaging process, allowing direct optimization of scene geometry to match observations. To validate our methods, we capture 3,800 measurements of eight planar surfaces from a wide range of viewpoints, and show that our method outperforms the proprietary-distance-estimate baseline by an order of magnitude in most scenarios. We demonstrate a simple robotics application which uses our method to sense the distance to and slope of a planar surface from a sensor mounted on the end effector of a robot arm.
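The analysis-by-synthesis idea in this abstract (simulate the transient histogram from hypothesized scene parameters, then optimize those parameters to match the observation) can be illustrated with a toy one-parameter forward model. The Gaussian pulse model, grid search, and all constants below are simplifying assumptions, not the paper's rendering pipeline:

```python
import numpy as np

def render_histogram(dist, albedo, bins=64, bin_width=0.05, sigma=1.5):
    """Toy forward model: a planar target at distance `dist` produces a
    Gaussian-shaped pulse whose bin position encodes distance and whose
    amplitude encodes albedo."""
    b = np.arange(bins)
    return albedo * np.exp(-0.5 * ((b - dist / bin_width) / sigma) ** 2)

def fit_scene(observed, d_grid=None):
    """Recover (distance, albedo) by matching rendered histograms to the
    observation: grid search over distance with the albedo solved in
    closed form by least squares at each candidate."""
    if d_grid is None:
        d_grid = np.linspace(0.2, 3.0, 281)
    best = (None, None, np.inf)
    for d in d_grid:
        g = render_histogram(d, 1.0)
        a = float((observed * g).sum() / (g * g).sum())   # least-squares albedo
        r = float(((a * g - observed) ** 2).sum())
        if r < best[2]:
            best = (d, a, r)
    return best[:2]
```

The actual method optimizes richer plane geometry through a differentiable renderer; this sketch only shows why matching the full waveform recovers albedo, which a single distance estimate cannot.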
|
|
16:30-18:00, Paper TuCT6-CC.6 | Add to My Program |
VG4D: Vision-Language Model Goes 4D Video Recognition |
|
Deng, Zhichao | Sun Yat-Sen University |
Li, Xiangtai | Peking University |
Li, Xia | ETH Zurich |
Tong, Yunhai | Peking University |
Zhao, Shen | Sun Yat-Sen University |
Liu, Mengyuan | Peking University |
Keywords: RGB-D Perception, Recognition, Deep Learning for Visual Perception
Abstract: Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained visual concepts that can be transferred to various downstream tasks. However, effectively integrating VLM into the domain of 4D point clouds remains an unresolved problem. In this work, we propose the Vision-Language Models Goes 4D (VG4D) framework to transfer VLM knowledge from visual-text pre-trained models to a 4D point cloud network. Our approach involves aligning the 4D encoder's representation with a VLM learning a shared visual and text space from training on large-scale image-text pairs. By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance. To enhance the 4D encoder, we modernize the classic dynamic point cloud backbone and propose an improved version of PSTNet, im-PSTNet, which can efficiently model point cloud videos. Experiments demonstrate that our method achieves state-of-the-art performance for action recognition on both the NTU RGB+D 60 and NTU RGB+D 120 datasets.
|
|
16:30-18:00, Paper TuCT6-CC.7 | Add to My Program |
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning |
|
Gu, Qiao | University of Toronto |
Kuwajerwala, Alihusein | University of Toronto |
Morin, Sacha | Université De Montréal, Mila |
Jatavallabhula, Krishna Murthy | MIT |
Sen, Bipasha | International Institute of Information Technology |
Agarwal, Aditya | IIIT Hyderabad |
Rivera, Corban | Johns Hopkins University Applied Physics Lab |
Paul, William | Johns Hopkins University Applied Physics Lab |
Ellis, Kirsty | Mila, Université De Montréal |
Chellappa, Rama | Johns Hopkins University |
Gan, Chuang | IBM |
de Melo, Celso | CCDC US Army Research Laboratory |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Torralba, Antonio | MIT |
Shkurti, Florian | University of Toronto |
Paull, Liam | Université De Montréal |
Keywords: RGB-D Perception, Semantic Scene Understanding, Mapping
Abstract: For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. To explore the full scope of our experiments and results, we encourage readers to visit our project webpage.
|
|
16:30-18:00, Paper TuCT6-CC.8 | Add to My Program |
TrackDLO: Tracking Deformable Linear Objects under Occlusion with Motion Coherence |
|
Xiang, Jingyi | University of Illinois at Urbana-Champaign |
Dinkel, Holly | University of Illinois at Urbana-Champaign |
Zhao, Harry | University of Illinois at Urbana-Champaign |
Gao, Naixiang | University of Illinois at Urbana-Champaign |
Coltin, Brian | Carnegie Mellon University |
Smith, Trey | NASA Ames Research Center |
Bretl, Timothy | University of Illinois at Urbana-Champaign |
Keywords: RGB-D Perception, Visual Tracking, Perception for Grasping and Manipulation
Abstract: The TrackDLO algorithm estimates the shape of a Deformable Linear Object (DLO) under occlusion from a sequence of RGB-D images. TrackDLO is vision-only and runs in real-time. It requires no external state information from physics modeling, simulation, visual markers, or contact as input. The algorithm improves on previous approaches by addressing three common scenarios which cause tracking failure: tip occlusion, mid-section occlusion, and self-occlusion. This is achieved through the application of Motion Coherence Theory to impute the spatial velocity of occluded nodes, the use of the topological geodesic distance to track self-occluding DLOs, and the introduction of a non-Gaussian kernel that only penalizes lower-order spatial displacement derivatives to reflect DLO physics. Improved real-time DLO tracking under mid-section occlusion, tip occlusion, and self-occlusion is demonstrated experimentally. The source code and demonstration data are publicly released.
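The Motion Coherence idea used above to impute velocities of occluded nodes can be sketched as a Gaussian-weighted average of visible-node velocities, so that nearby nodes move coherently. This is an illustrative simplification (Euclidean distance stands in for the geodesic distance along the DLO, and the kernel is the plain Gaussian rather than the paper's non-Gaussian kernel):

```python
import math

def impute_velocities(nodes, velocities, visible, beta=2.0):
    """nodes: list of (x, y) node positions along the DLO.
    velocities: list of (vx, vy), meaningful only where visible[i] is True.
    Occluded nodes receive a kernel-weighted average of visible velocities."""
    out = []
    for i, v in enumerate(velocities):
        if visible[i]:
            out.append(v)
            continue
        wsum, vx, vy = 0.0, 0.0, 0.0
        for j, vj in enumerate(velocities):
            if not visible[j]:
                continue
            dx = nodes[i][0] - nodes[j][0]
            dy = nodes[i][1] - nodes[j][1]
            # Gaussian kernel: closer visible nodes dominate the estimate.
            w = math.exp(-(dx * dx + dy * dy) / (2.0 * beta ** 2))
            wsum += w
            vx += w * vj[0]
            vy += w * vj[1]
        out.append((vx / wsum, vy / wsum) if wsum > 0 else (0.0, 0.0))
    return out
```

In TrackDLO proper, substituting topological geodesic distance for Euclidean distance is what lets the tracker handle self-occluding configurations.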
|
|
16:30-18:00, Paper TuCT6-CC.9 | Add to My Program |
Long-Tailed 3D Semantic Segmentation with Adaptive Weight Constraint and Sampling |
|
Lahoud, Jean | MBZUAI |
Khan, Fahad | Linkoping University |
Cholakkal, Hisham | MBZUAI |
Anwer, Rao | MBZUAI |
Khan, Salman | CSIRO |
Keywords: RGB-D Perception, Recognition, Deep Learning for Visual Perception
Abstract: Existing 3D understanding datasets typically provide annotations for a limited number of object classes, with sufficient examples per class. However, real-world object classes are not equally represented in practical settings, leading to poor performance on rarely-occurring categories if the class imbalance is neglected. In this work, we address the challenge of 3D semantic segmentation with a long-tail distribution of classes. Common methods to reduce class imbalance during training include data re-sampling, loss re-weighting, and transfer learning. In contrast, our work proposes to effectively utilize network classifier weights in 3D models to balance the training on long-tail class distributions. While previous work in the 2D domain has studied imposing constraints on the classifier weights to regularize the training, it is sensitive to hyper-parameter choices and has not yet been explored for the 3D domain. To address these challenges, our work proposes adaptive regularization for frequent classes and sampling-based regularization for rare classes that alleviate the need to manually select thresholds and can dynamically focus training on the hard classes. Our experiments on the large-scale ScanNet200 benchmark show that our method achieves improved performance, surpassing methods that rely on re-sampling, re-weighting, and pre-training.
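One simple form of classifier-weight regularization in the spirit of the above is to pull the weight norms of frequent classes toward the mean norm, leaving rare classes to be handled by the sampling-based term. This sketch is an illustrative assumption on my part, not the paper's adaptive scheme:

```python
import numpy as np

def regularize_classifier(W, class_counts, freq_ratio=0.5):
    """W: (C, F) classifier weight matrix, one row per class.
    class_counts: per-class training frequencies.
    Rescales the weight rows of the most frequent classes so their norms
    match the mean norm, reducing the frequent-class bias of the logits."""
    norms = np.linalg.norm(W, axis=1)
    target = norms.mean()
    order = np.argsort(class_counts)[::-1]        # most frequent first
    n_freq = int(len(order) * freq_ratio)
    W = W.copy()
    for c in order[:n_freq]:
        W[c] *= target / (norms[c] + 1e-12)
    return W
```

Applied periodically during training, such a step prevents frequent classes from accumulating disproportionately large classifier norms, which is one known driver of long-tail bias.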
|
|
TuCT7-CC Oral Session, CC-416 |
Add to My Program |
Learning in Localization and Navigation |
|
|
Chair: Cattaneo, Daniele | University of Freiburg |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
16:30-18:00, Paper TuCT7-CC.1 | Add to My Program |
BioSLAM: A Bio-Inspired Lifelong Memory System for General Place Recognition |
|
Yin, Peng | City University of Hong Kong |
Abuduweili, Abulikemu | Carnegie Mellon University |
Zhao, Shiqi | University of California San Diego |
Xu, Lingyun | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Lifelong Learning, Localization, Deep Learning in Robotics and Automation, SLAM
Abstract: We present BioSLAM, a lifelong SLAM framework for learning various new appearances incrementally and maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget the previously visited areas when trained with new arrivals. For humans, researchers discover that there exists a memory replay mechanism in the brain to keep the neuron active for previous events. Inspired by this discovery, BioSLAM designs a gated generative replay to control the robot’s learning behavior based on the feedback rewards. Specifically, BioSLAM provides a novel dual-memory mechanism for maintenance: 1) a dynamic memory to efficiently learn new observations and 2) a static memory to balance new-old knowledge. When the agent is encountered with different appearances under new domains, the complete processing pipeline can help to incrementally update the place recognition ability, robust to the increasing complexity of long-term place recognition. We demonstrate BioSLAM in three incremental SLAM scenarios: 1) a 120km city-scale trajectory with LiDAR-based inputs, 2) a multi-visited 4.5
|
|
16:30-18:00, Paper TuCT7-CC.2 | Add to My Program |
Efficient Hierarchical Reinforcement Learning for Mapless Navigation with Predictive Neighbouring Space Scoring (I) |
|
Gao, Yan | Cardiff University |
Wu, Jing | Cardiff University |
Yang, Xintong | Cardiff University |
Ji, Ze | Cardiff University |
Keywords: Reinforcement Learning, Collision Avoidance, Motion and Path Planning
Abstract: Solving reinforcement learning-based mapless navigation tasks is challenging due to their sparse reward and long decision horizon nature. Hierarchical reinforcement learning has the ability to leverage knowledge at different abstract levels and is thus preferred in complex mapless navigation tasks. However, it is computationally expensive and inefficient to learn navigation end-to-end from raw high-dimensional sensor data, such as Lidar or RGB cameras. The use of subgoals based on a compact intermediate representation is therefore preferred for dimension reduction. This work proposes an efficient HRL-based framework to achieve this with a novel scoring method, named Predictive Neighbouring Space Scoring (PNSS). The PNSS model estimates the explorable space for a given position of interest based on the current robot observation. The PNSS values for a few candidate positions around the robot provide a compact and informative state representation for subgoal selection. We study the effects of different candidate position layouts and demonstrate that our layout design facilitates higher performance in longer-range tasks. Moreover, a penalty term is introduced in the reward function for the high-level policy, so that the subgoal selection process takes the performance of the low-level policy into consideration. Comprehensive evaluations demonstrate that using the proposed PNSS module consistently improves performance over the use of Lidar only or Lidar and encoded RGB features.
|
|
16:30-18:00, Paper TuCT7-CC.3 | Add to My Program |
Learning Diverse Skills for Local Navigation under Multi-Constraint Optimality |
|
Cheng, Jin | ETH Zurich |
Vlastelica, Marin | Max Planck Institute for Intelligent Systems |
Kolev, Pavel | Max Planck Institute for Intelligent Systems |
Li, Chenhao | ETH Zurich |
Martius, Georg | Max Planck Institute for Intelligent Systems |
Keywords: Reinforcement Learning, Legged Robots, Motion and Path Planning
Abstract: Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing multiple constraints on the reward terms. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
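The attract-repel reward term motivated by the Van der Waals force can be illustrated with a Lennard-Jones-style shaping function over the distance between two policies' behavior embeddings. The exact functional form and constants below are illustrative assumptions, not the paper's term:

```python
def attract_repel_reward(d, r0=1.0, eps=1e-6):
    """Lennard-Jones-style shaping on the distance d between a pair of
    policy embeddings: strongly repels policies that are too similar
    (d < r0), mildly attracts policies that drift too far apart, and
    peaks (value 1) at the preferred separation d == r0."""
    x = r0 / (d + eps)
    return -(x ** 12 - 2 * x ** 6)
```

Summed over policy pairs and added to the constrained task objective, such a term controls the diversity level: larger r0 pushes the skill set to spread out further in behavior space.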
|
|
16:30-18:00, Paper TuCT7-CC.4 | Add to My Program |
Snake Robot with Tactile Perception Navigates on Large-Scale Challenging Terrain |
|
Jiang, Shuo | Northeastern University |
Salagame, Adarsh | Northeastern University |
Ramezani, Alireza | Northeastern University |
Wong, Lawson L.S. | Northeastern University |
Keywords: Reinforcement Learning, Force and Tactile Sensing, Motion and Path Planning
Abstract: Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we propose a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architecture, wherein the high level orchestrates global navigation strategies while the low level uses curriculum learning for local navigation maneuvers. Because the significant computational demands of collision detection in whole-body tactile sensing severely compromise the simulator's efficiency, we adopted a distributed training pattern to mitigate this efficiency reduction. We evaluated the navigation performance of the snake robot in complex large-scale cave exploration with challenging terrains, demonstrating improvements in motion efficiency and evidencing the efficacy of tactile perception in terrain-adaptive locomotion of snake robots.
|
|
16:30-18:00, Paper TuCT7-CC.5 | Add to My Program |
RaLF: Flow-Based Global and Metric Radar Localization in LiDAR Maps |
|
Nayak, Abhijeet | University of Freiburg |
Cattaneo, Daniele | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Localization, Computer Vision for Transportation
Abstract: Localization is paramount for autonomous robots. While camera and LiDAR-based approaches have been extensively investigated, they are affected by adverse illumination and weather conditions. Therefore, radar sensors have recently gained attention due to their intrinsic robustness to such conditions. In this paper, we propose RaLF, a novel deep neural network-based approach for localizing radar scans in a LiDAR map of the environment, by jointly learning to address both place recognition and metric localization. RaLF is composed of radar and LiDAR feature encoders, a place recognition head that generates global descriptors, and a metric localization head that predicts the 3-DoF transformation between the radar scan and the map. We tackle the place recognition task by learning a shared embedding space between the two modalities via cross-modal metric learning. Additionally, we perform metric localization by predicting pixel-level flow vectors that align the query radar scan with the LiDAR map. We extensively evaluate our approach on multiple real-world driving datasets and show that RaLF achieves state-of-the-art performance for both place recognition and metric localization. Moreover, we demonstrate that our approach can effectively generalize to different cities and sensor setups than the ones used during training. We make the code and trained models publicly available at http://ralf.cs.uni-freiburg.de.
|
|
16:30-18:00, Paper TuCT7-CC.6 | Add to My Program |
VPE-SLAM: Neural Implicit Voxel-Permutohedral Encoding for SLAM |
|
Zhang, Zhiyao | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Shen, You | Northeastern University |
Rong, Lei | Northeastern University |
Wang, Sizhan | Northeastern University |
Ouyang, Xin | Northeastern University |
Li, Yulong | Northeastern University |
Keywords: Deep Learning Methods, SLAM, Mapping
Abstract: NeRF can reconstruct incredibly realistic environmental maps in dense simultaneous localization and mapping, providing robots with more comprehensive scene map information. However, NeRF often struggles with geometric distortions in indoor reconstructions. To correct geometric distortions, we develop VPE-SLAM, based on the proposed voxel-permutohedral encoding, which can incrementally reconstruct maps of unknown scenes. Specifically, voxel-permutohedral encoding combines a sparse voxel feature grid created by an octree and multi-resolution permutohedral tetrahedral feature grids to represent the scene effectively. Especially when dealing with object edges, our method can effectively encode the geometry and texture of edges by the hybrid structural grid. We propose a novel local bundle adjustment module that utilizes a sliding window mechanism to manage adjacent keyframes requiring optimization. Furthermore, the proposed method establishes local map consistency by repeatedly optimizing keyframes that were initially under-optimized through a compensation strategy. The consistency of the local map can enhance the adaptability of our method to challenging scenes. Extensive experiments demonstrate that our method can achieve accurate camera tracking and produce high-quality reconstruction results on the Replica and ScanNet datasets. The source code will be available at https://github.com/NeuCV-IRMI/VPE-SLAM.
|
|
16:30-18:00, Paper TuCT7-CC.7 | Add to My Program |
Zero-Shot Wireless Indoor Navigation through Physics-Informed Reinforcement Learning |
|
Yin, Mingsheng | New York University |
Li, Tao | New York University |
Lei, Haozhe | New York University |
Hu, Yaqi | NYU |
Rangan, Sundeep | New York University |
Zhu, Quanyan | New York University |
Keywords: Reinforcement Learning, Motion and Path Planning
Abstract: The growing focus on indoor robot navigation utilizing wireless signals has stemmed from the capability of these signals to capture high-resolution angular and temporal measurements. Prior heuristic-based methods, based on radio frequency (RF) propagation, are intuitive and generalizable across simple scenarios, yet fail to navigate in complex environments. On the other hand, end-to-end (e2e) deep reinforcement learning (RL) can explore a rich class of policies, delivering surprising performance when facing complex wireless environments. However, the price to pay is the astronomical amount of training samples, and the resulting policy, without fine-tuning (zero-shot), is unable to navigate efficiently in new scenarios unseen in the training phase. To equip the navigation agent with sample-efficient learning and zero-shot generalization, this work proposes a novel physics-informed RL (PIRL) where a distance-to-target-based cost (standard in e2e) is augmented with physics-informed reward shaping. The key intuition is that wireless environments vary, but physics laws persist. After learning to utilize the physics information, the agent can transfer this knowledge across different tasks and navigate in an unknown environment without fine-tuning. The proposed PIRL is evaluated using a wireless digital twin (WDT) built upon simulations of a large class of indoor environments from the AI Habitat dataset augmented with electromagnetic radiation simulation for wireless signals. It is shown that the PIRL significantly outperforms both e2e RL and heuristic-based solutions in terms of generalization and performance. Source code is available at https://github.com/Panshark/PIRL-WIN.
|
|
16:30-18:00, Paper TuCT7-CC.8 | Add to My Program |
An Environmental-Complexity-Based Navigation Method Based on Hierarchical Deep Reinforcement Learning |
|
Chen, Pengbin | Harbin Institute of Technology, Shenzhen |
Liu, Qi | Harbin Institute of Technology |
Li, Yanjie | Harbin Institute of Technology (Shenzhen) |
Ma, Shuaikang | Harbin Institute of Technology, Shenzhen |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, AI-Enabled Robotics
Abstract: Navigation methods based on deep reinforcement learning (RL) have recently exhibited superior performance, particularly for navigation in dynamic environments. However, most existing methods solely rely on deep neural network feature encoders to extract features from raw LiDAR data, lacking an explicit representation of environmental structure. This limitation hinders effective environmental representation and interpretability, constraining navigation performance improvement. To solve this problem, we propose two quantitative metrics based on laser scans, which explicitly represent environmental complexity and show great interpretability. Furthermore, we propose an environmental-complexity-based navigation method based on hierarchical deep RL with the proposed metrics. Experimental results show that the proposed method achieves better navigation performance than baselines, especially in challenging scenarios with corners and dynamic obstacles.
|
|
16:30-18:00, Paper TuCT7-CC.9 | Add to My Program |
Pre-Trained Masked Image Model for Mobile Robot Navigation |
|
Sharma, Vishnu D. | University of Maryland |
Singh, Anukriti | University of Maryland |
Tokekar, Pratap | University of Maryland |
Keywords: Representation Learning, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: 2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially when training data is scarce. We share more qualitative results at https://raaslab.org/projects/MIM4Robots.
|
|
TuCT8-CC Oral Session, CC-418 |
Add to My Program |
Reinforcement Learning II |
|
|
Chair: Gao, Sicun | UCSD |
|
16:30-18:00, Paper TuCT8-CC.1 | Add to My Program |
Active Automotive Augmented Reality Displays Using Reinforcement Learning |
|
Ryu, Ju-Hyeok | Seoul National University |
Kim, Chan | Seoul National University |
Kim, Seong-Woo | Seoul National University |
Keywords: Reinforcement Learning, AI-Based Methods, Autonomous Agents
Abstract: Automobiles, traditionally valued as a means of transportation, face growing demand for home-like spatial functions. Advances in technology are making this possible, and as a result, the total display area in vehicles is increasing. This is leading to increased demand for augmented reality displays in vehicles. In order to enhance driving convenience and safety, automotive augmented reality displays, e.g., head-up displays, have garnered attention and are gradually being deployed. However, when vehicles encounter uneven roads, vertical vibrations lead to mismatches between external physical objects and augmented reality overlay images, adversely affecting the AR display's visibility. Resolving the problem is quite challenging because the optical system operates on a magnification design and is highly sensitive due to its multifunctional nature involving reflection and refraction through an intermediate medium. This paper aims to address the newly emerging problem of vertical mismatches in automotive AR displays. To tackle this issue, we begin by defining the problem and then examine the effectiveness of traditional control methods, on-policy and off-policy reinforcement learning as potential solutions. Finally, we validate our approach through experiments, demonstrating a significant reduction in vertical mismatches and an improvement in the overall visibility of automotive AR displays. Our findings provide valuable insights for enhancing driving convenience and safety in real-world conditions.
|
|
16:30-18:00, Paper TuCT8-CC.2 | Add to My Program |
Extremum-Seeking Action Selection for Accelerating Policy Optimization |
|
Chang, Ya-Chien | University of California San Diego |
Gao, Sicun | UCSD |
Keywords: Reinforcement Learning, Deep Learning Methods
Abstract: Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and estimating policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In such cases, most samples taken from the ambient action space generate low-value trajectories that hardly contribute to policy improvement, resulting in slow or failed learning. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC). On each action sampled from stochastic policies, we apply sinusoidal perturbations and query for estimated Q-values as the response signal. Based on ESC, we then dynamically improve the sampled actions to be closer to nearby optima before applying them to the environment. Our method can easily be added to standard policy optimization to improve learning efficiency, as we demonstrate in various control learning environments.
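The core ESC loop described in the abstract — perturb the sampled action sinusoidally, read the estimated Q-value as the response signal, and integrate the demodulated gradient estimate — can be sketched in a few lines. This is a minimal illustration with a hypothetical 1-D quadratic Q-function standing in for the learned critic; the function names, dither schedule, and step sizes are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def q_value(action):
    # Hypothetical stand-in for the critic's Q-estimate: a smooth
    # function with a single maximum at action = 0.7.
    return -(action - 0.7) ** 2

def esc_refine(action, q, steps=200, amp=0.05, gain=0.05):
    """Nudge a sampled action toward a nearby optimum of q via
    extremum-seeking: apply a sinusoidal dither, demodulate the
    Q-value response into a gradient estimate, and ascend."""
    a = action
    for t in range(steps):
        s = np.sin(t)                                  # sinusoidal perturbation
        grad_est = s * (q(a + amp * s) - q(a)) / amp   # demodulated response
        a += gain * grad_est                           # step toward the optimum
    return a

refined = esc_refine(0.0, q_value)  # a sampled action of 0.0 drifts toward 0.7
```

In the actual method this refinement happens before the action is applied to the environment, so it composes with any stochastic policy without changing the policy-gradient machinery.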
|
|
16:30-18:00, Paper TuCT8-CC.3 | Add to My Program |
Privacy Risks in Reinforcement Learning for Household Robots |
|
Li, Miao | Carnegie Mellon University |
Ding, Wenhao | Carnegie Mellon University |
Zhao, Ding | Carnegie Mellon University |
Keywords: Reinforcement Learning, Agent-Based Systems, Autonomous Agents
Abstract: The prominence of embodied Artificial Intelligence (AI), which empowers robots to navigate, perceive, and engage within virtual environments, has attracted significant attention, owing to the remarkable advances in computer vision and large language models. Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information. However, the issue of privacy leakage in embodied AI tasks, particularly concerning reinforcement learning algorithms, has not received adequate consideration in research. This paper aims to address this gap by proposing an attack on the training process of the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervisory signals. The choice of using gradients for the attack is motivated by the fact that commonly employed federated learning techniques solely utilize gradients computed based on private user data to optimize models, without storing or transmitting the data to public servers. Nevertheless, these gradients contain sufficient information to potentially expose private data. To validate our approach, we conducted experiments on the AI2THOR simulator and evaluated our algorithm on active perception, a prevalent task in embodied AI. The experimental results demonstrate the effectiveness of our method in successfully reconstructing all information from the data in 120 room layouts. Check our website for videos.
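The leakage exploited above is easiest to see in the simplest case: for a single linear layer trained on one sample, the shared weight gradient is the outer product of the output residual and the private input, so the input can be read off exactly. The sketch below demonstrates only this well-known building block of gradient inversion, with illustrative toy data, not the paper's full attack on value- and gradient-based RL.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)        # private input (e.g., an observed state)
t = rng.normal(size=3)        # private supervisory signal
W = rng.normal(size=(3, 4))   # linear-layer weights
b = rng.normal(size=3)        # linear-layer bias

# Gradients a federated client would share for loss 0.5 * ||W @ x + b - t||^2
r = W @ x + b - t             # per-output residual
grad_W = np.outer(r, x)       # dL/dW = residual (outer) input
grad_b = r                    # dL/db = residual

# Inversion: row i of grad_W equals grad_b[i] * x, so dividing recovers x.
i = int(np.argmax(np.abs(grad_b)))
x_recovered = grad_W[i] / grad_b[i]
```

Deeper networks and batched updates make the recovery an optimization problem rather than a closed form, but the same information is present in the gradients.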
|
|
16:30-18:00, Paper TuCT8-CC.4 | Add to My Program |
MAexp: A Generic Platform for RL-Based Multi-Agent Exploration |
|
Zhu, Shaohao | Zhejiang University |
Zhou, Jiacheng | Zhejiang University |
Chen, Anjun | Zhejiang University |
Bai, Mingming | College of Control Science and Engineering, Zhejiang University |
Chen, Jiming | Zhejiang University |
Xu, Jinming | Zhejiang University |
Keywords: Reinforcement Learning, Performance Evaluation and Benchmarking, Cooperating Robots
Abstract: The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization. Existing platforms suffer from the inefficiency in sampling and the lack of diversity in Multi-Agent Reinforcement Learning (MARL) algorithms across different scenarios, restraining their widespread applications. To fill these gaps, we propose MAexp, a generic platform for multi-agent exploration that integrates a broad range of state-of-the-art MARL algorithms and representative scenarios. Moreover, we employ point clouds to represent our exploration scenarios, leading to high-fidelity environment mapping and a sampling speed approximately 40 times faster than existing platforms. Furthermore, equipped with an attention-based Multi-Agent Target Generator and a Single-Agent Motion Planner, MAexp can work with arbitrary numbers of agents and accommodate various types of robots. Extensive experiments are conducted to establish the first benchmark featuring several high-performance MARL algorithms across typical scenarios for robots with continuous actions, which highlights the distinct strengths of each algorithm in different scenarios.
|
|
16:30-18:00, Paper TuCT8-CC.5 | Add to My Program |
Improving Offline Reinforcement Learning with Inaccurate Simulators |
|
Hou, Yiwen | University of Science and Technology of China |
Sun, Haoyuan | University of Science and Technology of China |
Ma, Jinming | University of Science and Technology of China |
Wu, Feng | University of Science and Technology of China |
Keywords: Reinforcement Learning
Abstract: Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment in robotics. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, the data collected from the inaccurate simulator cannot be directly used in offline RL due to the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address this, we propose a novel approach to better combine the offline dataset and the inaccurate simulation data. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method can benefit more from both the inaccurate simulator and limited offline datasets to achieve better performance than the state-of-the-art methods.
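The discriminator-based reweighting step mentioned above has a standard form: if D(s) estimates the probability that a state came from the offline (real) data rather than the simulator, then w(s) = D(s) / (1 − D(s)) is an importance weight that corrects simulated samples toward the real state distribution. The sketch below demonstrates this with two 1-D Gaussians and the analytic Bayes-optimal discriminator standing in for a trained GAN discriminator; all names and distributions are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
real_states = rng.normal(0.0, 1.0, size=2000)  # offline-dataset states
sim_states = rng.normal(0.5, 1.0, size=2000)   # inaccurate-simulator rollouts

def gauss(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def discriminator(s):
    # Bayes-optimal D(s) = p_real(s) / (p_real(s) + p_sim(s)); a trained
    # discriminator approximates this from samples.
    p_real, p_sim = gauss(s, 0.0), gauss(s, 0.5)
    return p_real / (p_real + p_sim)

d = discriminator(sim_states)
w = d / (1.0 - d)          # importance weights, equal to p_real / p_sim
w /= w.sum()               # normalize for a weighted average

# Reweighted simulator data matches the real mean (~0.0), not the sim mean (~0.5).
reweighted_mean = float(np.sum(w * sim_states))
```

The same ratio trick lets simulator transitions enter the offline objective as if they were drawn from the real state distribution.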
|
|
16:30-18:00, Paper TuCT8-CC.6 | Add to My Program |
REFORMA: Robust REinFORceMent Learning Via Adaptive Adversary for Drones Flying under Disturbances |
|
Hsu, Hao-Lun | Duke University |
Meng, Haocheng | Duke University |
Luo, Shaocheng | Duke University |
Dong, Juncheng | Duke University |
Tarokh, Vahid | Duke University |
Pajic, Miroslav | Duke University |
Keywords: Reinforcement Learning
Abstract: In this work, we introduce REFORMA, a novel robust reinforcement learning (RL) approach to design controllers for unmanned aerial vehicles (UAVs) robust to unknown disturbances during flights. These disturbances, typically due to wind turbulence, electromagnetic interference, temperature extremes, and many other sources of external physical interference, are highly dynamic and difficult to model. REFORMA can perform real-time online adaptation to these disturbances and generate appropriate velocity actions as countermeasures to stabilize the drone. REFORMA consists of two components: a base policy trained completely in simulation using model-free RL and an adaptation module trained via supervised learning with on-policy datasets. By varying the disturbance strength in the adaptation module, i.e., adopting an adaptive adversary, the policy is then able to handle extreme cases when the velocity of the drone is immediately affected by disturbances. Finally, we demonstrate the effectiveness of our method through extensive simulated experiments. To the best of our knowledge, REFORMA is the first robust RL approach that uses adaptive adversaries to tackle uncertain disturbances in drone tasks.
|
|
16:30-18:00, Paper TuCT8-CC.7 | Add to My Program |
Brain-Inspired Hyperdimensional Computing in the Wild: Lightweight Symbolic Learning for Sensorimotor Controls of Wheeled Robots |
|
Kwon, Hyukjun | DGIST |
Kim, Kangwon | DGIST |
Lee, Junyoung | DGIST |
Lee, Hyunsei | DGIST |
Kim, Jiseung | DGIST |
Kim, Jinhyung | HYU(Hanyang University) |
Kim, Taehyeong | Coga-Robotics |
Kim, Yong Nyeon | Hanyang University |
Ni, Yang | University of California, Irvine |
Imani, Mohsen | University of California Irvine |
Suh, Il Hong | Hanyang University |
Kim, Yeseong | DGIST |
Keywords: Bioinspired Robot Learning, Representation Learning, Reinforcement Learning
Abstract: Efficiency and performance are significant challenges in applying Machine Learning (ML) to robotics, especially in energy-constrained real-world scenarios. In this context, Hyperdimensional Computing (HDC) offers an energy-efficient alternative but has been underexplored in robotics. We introduce ReactHD, an HDC-based framework tailored for perception-action-based learning for sensorimotor controls of robot tasks. ReactHD employs hypervectors to encode sensory inputs and learn the suitable high-dimensional pattern for robot actions. It also integrates two HD-based lightweight symbolic learning paradigms: HDC-based supervised learning by demonstration (HDC-IL) and HD-Reinforcement Learning (HDC-RL). It enables robots to exhibit precisely situated reactive behaviors in complex environments. Our empirical evaluations show that ReactHD achieves robust and accurate learning outcomes comparable to state-of-the-art deep learning while substantially improving performance and energy efficiency by 14.2× and 15.3×, respectively. To the best of our knowledge, ReactHD is the first HDC-based framework deployed in real-world settings.
|
|
16:30-18:00, Paper TuCT8-CC.8 | Add to My Program |
Learning for Deformable Linear Object Insertion Leveraging Flexibility Estimation from Visual Cues |
|
Li, Mingen | University of California, San Diego |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Reinforcement Learning
Abstract: Manipulation of deformable linear objects (DLOs), including iron wire, rubber, silk, and nylon rope, is ubiquitous in daily life. These objects exhibit diverse physical properties, such as Young's modulus and bending stiffness. Such diversity poses challenges for developing generalized manipulation policies. However, previous research has limited its scope to single-material DLOs and engaged in time-consuming data collection for state estimation. In this paper, we propose a two-stage manipulation approach consisting of a material property (e.g., flexibility) estimation and policy learning for DLO insertion with reinforcement learning. Firstly, we design a flexibility estimation scheme that characterizes the properties of different types of DLOs. The ground-truth flexibility data is collected in simulation to train our flexibility estimation module. During the manipulation, the robot interacts with the DLOs to estimate flexibility by analyzing their visual configurations. Secondly, we train a policy conditioned on the estimated flexibility to perform challenging DLO insertion tasks. Our pipeline trained with diverse insertion scenarios achieves an 85.6% success rate in simulation and 66.67% in real robot experiments. Please refer to our project page: https://lmeee.github.io/DLOInsert/
|
|
16:30-18:00, Paper TuCT8-CC.9 | Add to My Program |
Reinforcement Learning of Action and Query Policies with LTL Instructions under Uncertain Event Detector |
|
Hatanaka, Wataru | Ricoh Company, Ltd |
Yamashina, Ryota | Ricoh Company, Ltd |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Reinforcement Learning, Planning under Uncertainty
Abstract: Reinforcement learning (RL) with linear temporal logic (LTL) objectives can allow robots to carry out symbolic event plans in unknown environments. Most existing methods assume that the event detector can accurately map environmental states to symbolic events; however, uncertainty is inevitable for real-world event detectors. Such uncertainty in an event detector generates multiple branching possibilities on LTL instructions, confusing action decisions. Moreover, the queries to the uncertain event detector, necessary for the task's progress, may increase the uncertainty further. To cope with those issues, we propose an RL framework, Learning Action and Query over Belief LTL (LAQBL), to learn an agent that can consider the diversity of LTL instructions due to uncertain event detection while avoiding task failure due to unnecessary event-detection queries. Our framework simultaneously learns 1) an embedding of belief LTL, which represents multiple branching possibilities on LTL instructions, using a graph neural network, 2) an action policy, and 3) a query policy which decides whether or not to query the event detector. Simulations in a 2D grid world and image-input robotic inspection environments show that our method successfully learns actions to follow LTL instructions even with uncertain event detectors.
|
|
TuCT9-CC Oral Session, CC-419 |
Add to My Program |
Vision-Based Navigation and Learning |
|
|
Chair: Sugiura, Komei | Keio University |
Co-Chair: Gu, Jason | Dalhousie University |
|
16:30-18:00, Paper TuCT9-CC.1 | Add to My Program |
Guided by the Way: The Role of On-The-Route Objects and Scene Text in Enhancing Outdoor Navigation |
|
Sun, Yanjun | Keio University |
Qiu, Yue | National Institute of Advanced Industrial Science and Technology |
Aoki, Yoshimitsu | Keio University |
Kataoka, Hirokatsu | National Institute of Advanced Industrial Science and Technology |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, AI-Based Methods
Abstract: In outdoor environments, Vision-and-Language Navigation (VLN) requires an agent to rely on multi-modal cues from real-world urban environments and natural language instructions. While existing outdoor VLN models predict actions using a combination of panorama and instruction features, this approach ignores objects in the environment and learns dataset biases that cause navigation failures. According to our preliminary findings, most instances of navigation failure in previous models were due to turning or stopping at the wrong place. In contrast, humans frequently and intuitively use identifiable objects or store names as reference landmarks, ensuring accurate turns and stops, especially in unfamiliar places. To close this gap, we propose an Object-Attention VLN (OAVLN) model that helps the agent focus on relevant objects during training and understand the environment better. Our model outperforms previous methods in all evaluation metrics under both seen and unseen scenarios on two existing benchmark datasets, Touchdown and map2seq.
|
|
16:30-18:00, Paper TuCT9-CC.2 | Add to My Program |
PlaceNav: Topological Navigation through Place Recognition |
|
Suomela, Lauri Aleksanteri | Tampere University |
Kalliola, Jussi Oskari | Tampere University |
Edelman, Harry | Tampere University |
Kamarainen, Joni-Kristian | Tampere University of Technology |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Localization
Abstract: Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by robots of different types. However, the navigation methods' performance is still limited by the scarcity of suitable training data and they suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new method obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.
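The Bayesian filtering step mentioned above can be sketched as a discrete Bayes filter over the indices of the route's topological nodes: a motion prediction assuming the robot stays at or advances past its current subgoal, followed by a correction with place-recognition similarity scores. The code below is a generic illustration of such temporal filtering for subgoal selection, with made-up numbers, not PlaceNav's exact formulation.

```python
import numpy as np

def bayes_filter_step(belief, likelihood):
    """One filtering step over route-node indices: predict (the robot
    stays put or advances one node with equal probability), then
    correct with per-node place-recognition similarity scores."""
    n = len(belief)
    predicted = np.zeros(n)
    for j, b in enumerate(belief):
        predicted[j] += 0.5 * b                  # stay at node j
        predicted[min(j + 1, n - 1)] += 0.5 * b  # advance one node
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# A spurious similarity peak far along the route (node 9) is suppressed,
# because the motion model keeps the belief near the previous node (2).
belief = np.zeros(10); belief[2] = 1.0
likelihood = np.full(10, 0.1); likelihood[3] = 0.9; likelihood[9] = 1.0
posterior = bayes_filter_step(belief, likelihood)  # argmax at node 3
```

This temporal consistency is what prevents a single perceptually aliased subgoal image from yanking the robot off its route.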
|
|
16:30-18:00, Paper TuCT9-CC.3 | Add to My Program |
Aligning Knowledge Graph with Visual Perception for Object-Goal Navigation |
|
Xu, Nuo | Zhejiang Lab |
Wang, Wen | Zhejiang Lab |
Yang, Rong | Zhejiang Lab |
Qin, Mengjie | Zhejiang Lab |
Lin, Zheyuan | Zhejiang Lab |
Song, Wei | Zhejiang Lab |
Zhang, Chunlong | Zhejiang Lab |
Gu, Jason | Dalhousie University |
Li, Chao | Zhejiang Lab |
Keywords: Vision-Based Navigation, Deep Learning for Visual Perception, Reinforcement Learning
Abstract: Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph representation of the scenes, which results in misalignment with visual images. To provide more accurate and coherent scene descriptions and address this misalignment issue, we propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Technically, our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability. We extensively evaluate our method using the AI2-THOR simulator and conduct a series of experiments to demonstrate the effectiveness and efficiency of our navigator.
|
|
16:30-18:00, Paper TuCT9-CC.4 | Add to My Program |
Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation |
|
Wang, Jiaming | National University of Singapore |
Soh, Harold | National University of Singapore |
Keywords: Vision-Based Navigation, Deep Learning Methods
Abstract: In this work, we focus on object search tasks within unexplored environments. We introduce a framework centered around the Probable Object Location (POLo) score. Utilizing a 3D object probability map, the POLo score allows the agent to make data-driven decisions for efficient object search. We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score. Our approach addresses critical limitations of both end-to-end reinforcement learning methods, which suffer from memory decay over long-horizon tasks, and traditional map-based methods that neglect visibility constraints. Our experiments, involving the first phase of the Open-Vocabulary Mobile Manipulation (OVMM) 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods, including end-to-end RL techniques and prior map-based strategies. To provide a comprehensive evaluation, we introduce new performance metrics that offer insights into the efficiency and effectiveness of various agents in object goal navigation.
|
|
16:30-18:00, Paper TuCT9-CC.5 | Add to My Program |
Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill |
|
Cai, Wenzhe | Southeast University |
Huang, Siyuan | Shanghai Jiao Tong University |
Cheng, Guangran | Southeast University |
Long, Yuxing | Peking University |
Gao, Peng | Shanghai AI Lab |
Sun, Changyin | Southeast University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Deep Learning Methods, Imitation Learning
Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. More details are accessible via our project website https://sites.google.com/view/pixnav/.
|
|
16:30-18:00, Paper TuCT9-CC.6 | Add to My Program |
GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors |
|
Knights, Joshua Barton | Queensland University of Technology |
Hausler, Stephen | CSIRO |
Sridharan, Sridha | Queensland University of Technology |
Fookes, Clinton | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Keywords: Vision-Based Navigation, Deep Learning Methods, Recognition
Abstract: LiDAR place recognition approaches based on deep learning suffer from significant performance degradation when there is a shift between the distribution of training and test datasets, often requiring re-training the networks to achieve peak performance. However, obtaining accurate ground truth data for new training data can be prohibitively expensive, especially in complex or GPS-deprived environments. To address this issue we propose GeoAdapt, which introduces a novel auxiliary classification head to generate pseudo-labels for re-training on unseen environments in a self-supervised manner. GeoAdapt uses geometric consistency as a prior to improve the robustness of our generated pseudo-labels against domain shift, improving the performance and reliability of our Test-Time Adaptation approach. Comprehensive experiments show that GeoAdapt significantly boosts place recognition performance across moderate to severe domain shifts, and is competitive with fully supervised test-time adaptation approaches. Our code is available at https://github.com/csiro-robotics/GeoAdapt.
|
|
16:30-18:00, Paper TuCT9-CC.7 | Add to My Program |
ViPlanner: Visual Semantic Imperative Learning for Local Navigation |
|
Roth, Pascal | ETH Zurich |
Nubert, Julian | ETH Zürich |
Yang, Fan | ETH Zurich |
Mittal, Mayank | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Vision-Based Navigation, Integrated Planning and Learning, Semantic Scene Understanding
Abstract: Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a 38.02% decrease in traversability cost compared to purely geometric approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.
|
|
16:30-18:00, Paper TuCT9-CC.8 | Add to My Program |
UIVNAV: Underwater Information-Driven Vision-Based Navigation Via Imitation Learning |
|
Lin, Xiaomin | University of Maryland |
Karapetyan, Nare | Woods Hole Oceanographic Institution |
Joshi, Kaustubh | University of Maryland College Park |
Liu, Tianchen | University of Maryland, College Park |
Chopra, Nikhil | University of Maryland, College Park |
Yu, Miao | University of Maryland |
Tokekar, Pratap | University of Maryland |
Aloimonos, Yiannis | University of Maryland |
Keywords: Vision-Based Navigation, Marine Robotics, Visual Learning
Abstract: Autonomous navigation in the underwater environment is challenging due to limited visibility, dynamic changes, and the lack of a cost-efficient, accurate localization system. We introduce UIVNav, a novel end-to-end underwater navigation solution designed to navigate robots over Objects of Interest (OOI) while avoiding obstacles, all without relying on localization. UIVNav utilizes imitation learning and draws inspiration from the navigation strategies employed by human divers, who do not rely on localization. UIVNav consists of the following phases: (1) generating an intermediate representation (IR) and (2) training the navigation policy based on human-labeled IR. By training the navigation policy on IR instead of raw data, the second phase is domain-invariant --- the navigation policy does not need to be retrained if the domain or the OOI changes. We demonstrate this within simulation by deploying the same navigation policy to survey two distinct Objects of Interest (OOIs): oyster and rock reefs. We compared our method with complete coverage and random walk methods, showing that our approach is more efficient in gathering information for OOIs while avoiding obstacles. The results show that UIVNav chooses to visit areas with larger expanses of oysters or rocks with no prior information about the environment or localization. Moreover, a robot using UIVNav surveys on average 36% more oysters than the complete coverage method when traveling the same distance. We also demonstrate the feasibility of real-time deployment of UIVNav in pool experiments with a BlueROV underwater robot for surveying a bed of oyster shells.
|
|
16:30-18:00, Paper TuCT9-CC.9 | Add to My Program |
Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning |
|
Karnan, Haresh | The University of Texas at Austin |
Yang, Elvin | University of Michigan, Ann Arbor |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Vision-Based Navigation, Representation Learning, Autonomous Vehicle Navigation
Abstract: Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data re-collection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain preferences within the inertial-proprioceptive-tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain-awarE Robot Navigation (PATERN), a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial-proprioceptive-tactile measurements from the robot’s observations to a representation space and performs a nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN’s capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
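As a rough, hypothetical sketch of the nearest-neighbor step described above (the names and distance metric are assumptions; PATERN's learned representation space is not reproduced here), a novel terrain's feature vector can inherit the operator preference of its closest labeled neighbor:

```python
import math

# Rough sketch of nearest-neighbor preference extrapolation (hypothetical
# simplification: PATERN's learned inertial-proprioceptive-tactile
# representation is replaced by raw feature vectors and Euclidean distance).
# A novel terrain inherits the operator preference of the closest labeled
# terrain in the feature space.

def nearest_preference(labeled, query):
    """labeled: list of (feature_vector, preference); query: feature vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, preference = min(labeled, key=lambda fp: dist(fp[0], query))
    return preference
```

In the actual method, the mapping into the representation space is learned; only the final nearest-neighbor lookup resembles this sketch.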
|
|
TuCT10-CC Oral Session, CC-501 |
Add to My Program |
Soft Sensors and Actuators I |
|
|
Co-Chair: Katzschmann, Robert Kevin | ETH Zurich |
|
16:30-18:00, Paper TuCT10-CC.1 | Add to My Program |
A Biomorphic Whisker Sensor for Aerial Tactile Applications |
|
Ye, Chaoxiang | Delft University of Technology |
de Croon, Guido | TU Delft |
Hamaza, Salua | TU Delft |
Keywords: Soft Sensors and Actuators, Force and Tactile Sensing, Deep Learning Methods
Abstract: Unmanned air vehicles (UAVs) have traditionally been considered “eyes in the sky” that can move in three dimensions and need to avoid any contact with their environment. On the contrary, contact should not be considered a problem, but an opportunity to expand the range of UAV applications. In this paper, we designed, fabricated, and characterized a whisker sensor unit based on MEMS barometers suitable for tactile localization on UAVs, featuring light weight, low stiffness, high sensitivity, a broad sensing range, and scalability. Then, for the challenging task of contact point localization, we propose a Recurrent Multi-output Network (RMN) for predicting 3D contact points under continuous contact conditions to address the problems of non-linearity, hysteresis, and non-injective mapping between signals and contact points by considering time series. In addition, we propose an azimuth prediction loss function which reduces the RMSE by 3.24° compared to L1 loss. Finally, we conduct experiments on a linear stage to validate the 3D contact point localization capability of the proposed whisker system and model. The results show that our localization can achieve excellent performance, with an inference time of 1.4 ms and a mean error of only 9.18 mm in Euclidean distance within 3D space, laying a robust foundation for future implementation of tactile localization on UAVs. The design files, dataset, and source code are available on: https://github.com/BioMorphic-Intelligence-Lab/Whisker-3D-Localization.
|
|
16:30-18:00, Paper TuCT10-CC.2 | Add to My Program |
Embedded Air Channels Transform Soft Lattices into Sensorized Grippers |
|
Zhang, Annan | Massachusetts Institute of Technology |
Chin, Lillian | UT Austin |
Tong, Daniel L | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Soft Sensors and Actuators, Grippers and Other End-Effectors, Perception for Grasping and Manipulation
Abstract: Sensing plays a pivotal role in robotic manipulation, dictating the accuracy and versatility with which objects are handled. Vision-based sensing methods often suffer from fabrication complexity and low durability, while approaches that rely on direct measurements on the gripper often have limited resolution and are difficult to scale. Here, we present a soft robotic gripper made out of two cubic lattices that are sensorized by embedding air channels within the structure. The lattices are 3D printed from a single build material, simplifying the fabrication process. The flexibility of this approach offers significant control over sensor and lattice design, while the pressure-based internal sensing provides measurements with minimal disruption to the grasping surface. With only 12 sensors, 6 per lattice, this gripper can estimate an object's weight and location and offer new insights into grasp parameters like friction coefficients and grasp force.
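As a toy illustration of the idea (not the authors' estimator; the per-sensor gain and the centroid rule are assumptions), pressure readings at known sensor positions within a lattice can yield a weight estimate and a force-weighted contact location:

```python
# Toy illustration (not the authors' estimator; the calibrated per-sensor
# gain and the centroid rule are assumptions): with pressure readings at
# known lattice sensor positions, a gain converts pressure to normal force;
# the object's weight is the force sum and the contact location is the
# force-weighted centroid of the sensor positions.

def estimate_weight_and_location(readings, positions, gain=1.0):
    """readings: pressure per sensor; positions: (x, y) per sensor."""
    forces = [gain * p for p in readings]
    total = sum(forces)
    if total == 0:
        return 0.0, None  # nothing in contact
    x = sum(f * px for f, (px, _) in zip(forces, positions)) / total
    y = sum(f * py for f, (_, py) in zip(forces, positions)) / total
    return total, (x, y)
```

With 12 embedded channels (6 per lattice), even a coarse estimator of this kind has enough redundancy to localize a single contact patch.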
|
|
16:30-18:00, Paper TuCT10-CC.3 | Add to My Program |
Towards Automatic Design of Soft Pneumatic Actuators: Inner Structure Design Using CNN Model and Bézier Curve-Based Genetic Algorithm |
|
Mosser, Loïc | Icube - Université De Strasbourg |
Barbé, Laurent | University of Strasbourg, ICube CNRS |
Rubbert, Lennart | INSA - Strasbourg |
Renaud, Pierre | ICube |
Keywords: Soft Sensors and Actuators, Hydraulic/Pneumatic Actuators
Abstract: This paper describes the development of a method for the design of soft pneumatic actuators, focusing on the use of a deep learning model to explore the design space with a genetic algorithm. In particular, we propose to perform the automatic synthesis of the inner structure of pneumatic actuators using Bézier curves and Gaussian Mixture Points, yielding a simple representation of the actuator genotype. This makes it possible to represent a wide variety of structures and to take into account the presence of the actuator's pneumatic supply. We show that a CNN model can be used effectively in conjunction with FEM: FEM initially trains the CNN model and checks its accuracy, while the CNN model reduces the computational cost, offering sufficient accuracy during synthesis thanks to transfer learning. Two case studies outline the capacity to generate geometrically complex designs, such as a double-helix network for a twisting actuator. Possible extensions and further uses are also discussed.
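De Casteljau's algorithm is the standard way to evaluate the Bézier curves that such a genotype encodes; the sketch below shows that decoding step (the genotype layout itself is a hypothetical simplification of the paper's representation):

```python
# De Casteljau's algorithm for evaluating a Bézier curve at parameter t,
# the standard decoding step for a Bézier-based genotype (using a short
# control-point list as the genotype follows the paper's idea; this exact
# layout is an illustrative assumption).

def de_casteljau(control_points, t):
    """Evaluate a Bézier curve defined by control_points at t in [0, 1]."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeatedly interpolate between consecutive points until one remains.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]
```

For example, the quadratic curve with control points (0,0), (1,1), (2,0) evaluates to (1.0, 0.5) at t = 0.5; a genetic algorithm mutates the control points while this decoder stays fixed.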
|
|
16:30-18:00, Paper TuCT10-CC.4 | Add to My Program |
Design of a Rigid-Soft Hybrid Robotic Glove with Force Sensing Function |
|
Li, Hexin | Harbin Institute of Technology |
Jiang, Li | Harbin Institute of Technology |
Zhen, Ruichen | Harbin Institute of Technology |
Cheng, Ming | Harbin Institute of Technology |
Ding, Kehan | Harbin Institute of Technology |
Keywords: Soft Sensors and Actuators, Rehabilitation Robotics, Soft Robot Applications
Abstract: Soft robotic gloves can not only provide timely, effective, safe and cheap rehabilitation training for patients with impaired movement function of the hand, but also assist in completing daily grasping activities. However, most soft robotic gloves are composed entirely of flexible structures. Although these offer high flexibility and safety, they suffer from poor fit and low output force. To solve these problems, this paper draws on the structure of the human hand to design an articulated rigid-soft hybrid robotic glove, which combines the advantages of rigid and soft robotic gloves and offers high flexibility, high output force and good fit. In addition, soft robotic gloves generally lack the ability to sense the force between the human hand and the glove. Therefore, this paper designs an arrayed flexible force sensor and studies its structure, signal acquisition and fabrication process. Finally, a complete test platform was built to test the performance of the rigid-soft hybrid robotic glove with force sensing function. The test results show that the robotic glove has good fit and high output force, can effectively assist training and grasping, and can sense the contact force.
|
|
16:30-18:00, Paper TuCT10-CC.5 | Add to My Program |
SoftER: A Spiral Soft Robotic Ejector for Sorting Applications |
|
Zournatzis, Ilias | Hellenic Mediterranean University |
Kalaitzakis, Sotiris | Hellenic Mediterranean University |
Polygerinos, Panagiotis | Hellenic Mediterranean University |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Soft Robot Materials and Design
Abstract: For over thirty years, optical belt-drive configurations have been used in the food industry to automatically sort produce at high operating speeds. Despite their benefits, these multi-component assemblies are prone to faults, difficult to clean, and require frequent maintenance that halts the production lines for a considerable amount of time. In this paper, we adopt the abundantly occurring spiral motions encountered in nature and translate them to the proof-of-concept design and development of a soft pneumatic actuator (SPA), the SoftER. This novel actuator has the ability to rapidly unwind when pressurized to deliver impact forces. We explore this inherently low-cost and simple design and its potential to replace current systems based on the results of an application case study presented in this paper. Simulation-driven optimization methods are leveraged, utilizing quasi-static and dynamic finite element method models, to create a scalable framework for selecting the best-performing design parameters for each application. Using rapid manufacturing processes, the optimized actuator is constructed, and physical testing validates its high-speed and impact force delivering capabilities.
|
|
16:30-18:00, Paper TuCT10-CC.6 | Add to My Program |
Design and Validation of Soft Sliding Structure with Adjustable Stiffness for Ankle Sprain Prevention |
|
Ham, Seoyeon | Hanyang University |
Paing, Soe Lin | Arizona State University |
Kang, Brian Byunghyun | Sejong University |
Lee, Hyunglae | Arizona State University |
Kim, Wansoo | Hanyang University ERICA |
Keywords: Soft Sensors and Actuators, Soft Robot Applications, Wearable Robotics
Abstract: This study presents the design and validation of a soft sliding stiffness structure with a soft-rigid layer sliding mechanism. It aims to mitigate ankle sprains and address the progression of chronic ankle instability by providing stiffness support. The soft-rigid layer sliding mechanism of the structure is designed to achieve a wide range of stiffness while maintaining a compact form factor. The structure incorporates rigid retainer pieces within each layer, which allows for sliding within a hollow cuboid structure and enables modulation of stiffness. An analytical model is presented to investigate the variations in stiffness resulting from the different sliding states. The stiffness characteristics of the structure were validated through both bench tests and human subject tests. The gradual sliding of the structure’s layer resulted in an increase in stiffness, aligning with the analytical model’s predictions. At the most rigid stage (0% alignment), the stiffness exhibited a significant increase of 111.1% compared to the most flexible stage (100% alignment). Additionally, the human subject testing demonstrated a stiffness increase of up to 93.8%. These results underscore the potential applicability of the soft sliding structure in ankle support applications.
|
|
16:30-18:00, Paper TuCT10-CC.7 | Add to My Program |
Capacitive Origami Sensing Modules for Measuring Force in a Neurosurgical, Soft Robotic Retractor |
|
Van Lewen, Daniel | Boston University |
Wang, Catherine | Boston University |
Lee, Hun Chan | Boston University |
Devaiah, Anand | Boston University |
Upadhyay, Urvashi | Boston University |
Russo, Sheila | Boston University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In neurosurgery, soft robots have the potential to introduce significant benefits over traditional metal tools for their ability to safely interact with delicate tissues. In this paper, we introduce a proof-of-concept soft, capacitive origami sensing module (OSM) that can measure forces during neurosurgical retraction. Using origami-inspired design and fabrication principles, the OSM is easily folded and integrated within a soft robotic retractor that interacts with brain tissue to generate a surgical workspace upon actuation. We demonstrate the individual OSM signal response to forces and folding. We further characterize the OSM response within a fully-assembled soft robotic retractor to both folding and the application of forces over 0-5 N, showing a 0.38 N average prediction error and a resolution of 0.25 N. The sensing capability of the retractor is validated on an in-vitro model, demonstrating a prediction error of 0.06 N and its proposed operation during neurosurgery.
|
|
16:30-18:00, Paper TuCT10-CC.8 | Add to My Program |
A Soft Miniaturized Continuum Robot with 3D Shape Sensing Via Functionalized Soft Optical Waveguides |
|
Del Bono, Viola | Boston University |
McCandless, Max | Boston Children's Hospital |
Juliá Wise, Frank | Boston University |
Russo, Sheila | Boston University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: In this paper, we present a fully soft miniaturized continuum robot that integrates 3D optical shape sensing through functionalized tubing used as soft optical waveguides. The sensor is fabricated by laser patterning off-the-shelf medical tubing, allowing for bidirectional responses to large curvatures in two bending directions and enabling 3D shape sensing and tip tracking of the continuum robot. The robot is able to bend and sense its own shape up to a curvature of 44.7 m^-1, corresponding to a bending angle of 102°, with high-accuracy tracking capabilities that result in an average tracking error of 3.08 mm, i.e., 7.7% of the robot length. The robot's functionality was shown in validation experiments, including real-time shape prediction through a graphical user interface.
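Assuming constant curvature, the reported figures are mutually consistent: bending angle θ = κL, and a 3.08 mm tracking error stated to be 7.7% of the robot length implies L ≈ 40 mm, so the peak curvature of 44.7 m^-1 reproduces roughly the reported 102°:

```python
import math

# Consistency check of the reported numbers, assuming constant curvature
# (bending angle theta = curvature kappa * arc length L). The robot length
# is inferred from the tracking figures: 3.08 mm is stated to be 7.7% of
# the length, so L = 0.04 m, and kappa = 44.7 m^-1 then yields about the
# reported 102 degrees.

length_m = 0.00308 / 0.077           # inferred robot length: 0.04 m
theta_rad = 44.7 * length_m          # constant-curvature bending angle
theta_deg = math.degrees(theta_rad)  # roughly 102 degrees, as reported
```

The constant-curvature assumption is the usual simplification for single-segment continuum robots; the paper's shape-sensing model need not be limited to it.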
|
|
16:30-18:00, Paper TuCT10-CC.9 | Add to My Program |
Continuously Estimate and Control Prosthetic Grip Force by an Optical Waveguide Sensor |
|
Ju, Linhang | Beihang University |
Jia, Hanze | Beihang University |
Shi, Yanjun | Beihang University |
Ding, Xilun | Beijing Univerisity of Aeronautics and Astronautics |
Feng, Yanggang | Beihang University |
Zhang, Wuxiang | Beihang University |
Keywords: Intention Recognition, Sensor-based Control, Force Control
Abstract: The emergence of intelligent prostheses has facilitated the life and work of disabled patients. The interaction aspect of prostheses has become a prominent research topic in the field of rehabilitation robotics. However, most existing prosthetic interaction methods focus on using myoelectricity to classify finite gestures rather than on continuous (infinite) force detection, which greatly limits the scenarios in which prostheses can be used. In this study, a novel optical waveguide sensor was used to collect muscle deformation information from the human arm for continuous control of the prosthetic grip force. The optical waveguide sensor was embedded with carbon fiber to limit the stretching of the waveguide, making the sensor sensitive to bending deformation. Compared with EMG, the accuracy of continuous grip force control based on the optical waveguide sensor is higher. The R-squared values for prosthetic grip force and hand grip force were 0.867 and 0.9724 in the periodic and sustained grip force experiments, respectively. The results suggest that the proposed method could provide a new approach to the interaction of prostheses.
|
|
TuCT11-CC Oral Session, CC-502 |
Add to My Program |
Deep Learning for Visual Perception III |
|
|
Chair: Zhao, Na | SUTD |
Co-Chair: Kwon, Heesung | DEVCOM Army Research Laboratory |
|
16:30-18:00, Paper TuCT11-CC.1 | Add to My Program |
UAV-Sim: NeRF-Based Synthetic Data Generation for UAV-Based Perception |
|
Maxey, Christopher | University of Maryland, Army Research Laboratory |
Choi, Jaehoon | University of Maryland, College Park |
Lee, Hyungtae | US Army Research Laboratory |
Manocha, Dinesh | University of Maryland |
Kwon, Heesung | DEVCOM Army Research Laboratory |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: Tremendous variations coupled with large degrees of freedom in UAV-based imaging conditions lead to a significant lack of data for adequately learning UAV-based perception models. Using various synthetic renderers in conjunction with perception models to create synthetic data is a prevalent way to augment learning in the ground-based imaging domain. However, severe challenges in the austere UAV-based domain require distinctive solutions to image synthesis for data augmentation. In this work, we leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image synthesis, especially from high altitudes, capturing salient scene attributes. Finally, we demonstrate that a considerable performance boost is achieved when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data rather than on real or synthetic data separately.
|
|
16:30-18:00, Paper TuCT11-CC.2 | Add to My Program |
Contrastive Learning for Enhancing Robust Scene Transfer in Vision-Based Agile Flight |
|
Xing, Jiaxu | University of Zurich |
Bauersfeld, Leonard | University of Zurich (UZH) |
Song, Yunlong | University of Zurich |
Xing, Chunwei | ETH Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy, Imitation Learning
Abstract: Scene transfer for vision-based mobile robotics applications is a highly relevant and challenging problem. The utility of a robot greatly depends on its ability to perform a task in the real world, outside of a well-controlled lab environment. Existing scene transfer end-to-end policy learning approaches often suffer from poor sample efficiency or limited generalization capabilities, making them unsuitable for mobile robotics applications. This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment. Control policies relying on the embedding are able to operate in unseen environments without the need for finetuning in the deployment environment. We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight. Extensive simulation and real-world experiments demonstrate that our approach successfully generalizes beyond the training domain and outperforms all baselines.
|
|
16:30-18:00, Paper TuCT11-CC.3 | Add to My Program |
Watching the Air Rise: Learning-Based Single-Frame Schlieren Detection |
|
Achermann, Florian | ETH Zurich, ASL |
Haug, Julian Andreas | ETH Zurich |
Zumsteg, Tobias | ETH Zürich |
Lawrance, Nicholas | CSIRO Data61 |
Chung, Jen Jen | The University of Queensland |
Kolobov, Andrey | Microsoft Research |
Siegwart, Roland | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Energy and Environment-Aware Automation, Aerial Systems: Perception and Autonomy
Abstract: Detecting air flows caused by phenomena such as heat convection is valuable in multiple scenarios, including leak identification and locating thermal updrafts for extending UAV flight duration. Unfortunately, the heat signature of these flows is often too subtle to be seen by a thermal camera. While convection also leads to fluctuations in air density and hence causes so-called schlieren – intensity and color variations in images – existing techniques such as Background-oriented schlieren (BOS) allow detecting them only against a known background and from a static camera, making these approaches unsuitable for moving vehicles. In this work we demonstrate the feasibility of visualizing air movement by predicting the corresponding schlieren-induced optical flow from a single greyscale image captured by a moving camera against an unfamiliar background. We first record and label a set of optical flows in an indoor setup using standard BOS techniques. We then train a convolutional neural network (CNN) by applying the previously collected optical flow distortions to a dataset containing a mixture of real and synthetically generated images to predict the two-dimensional optical flow from a single image. Finally, we evaluate our approach on the task of extracting the optical flow caused by schlieren from both a static and moving camera on previously unseen flow patterns and background images.
|
|
16:30-18:00, Paper TuCT11-CC.4 | Add to My Program |
High-Throughput Visual Nano-Drone to Nano-Drone Relative Localization Using Onboard Fully Convolutional Networks |
|
Crupi, Luca | IDSIA USI-SUPSI |
Giusti, Alessandro | IDSIA USI-SUPSI |
Palossi, Daniele | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Micro/Nano Robots, Vision-Based Navigation
Abstract: Relative drone-to-drone localization is a fundamental building block for any swarm operation. We address this task in the context of miniaturized nano-drones, i.e., 10 cm in diameter, which show ever-growing interest due to novel use cases enabled by their reduced form factor. The price for their versatility comes with limited onboard resources, i.e., sensors, processing units, and memory, which limits the complexity of the onboard algorithms. A traditional solution to overcome these limitations is represented by lightweight deep learning models directly deployed aboard nano-drones. This work tackles the challenging relative pose estimation between nano-drones using only a gray-scale low-resolution camera and an ultra-low-power System-on-Chip (SoC) hosted onboard. We present a vertically integrated system based on a novel vision-based fully convolutional neural network (FCNN), which runs at 39 Hz within 101 mW onboard a Crazyflie nano-drone extended with the GWT GAP8 SoC. We compare our FCNN against three State-of-the-Art (SoA) systems. Considering the best-performing SoA approach, our model results in an R^2 improvement from 32 to 47% on the horizontal image coordinate and from 18 to 55% on the vertical image coordinate, on a real-world dataset of 30k images. Finally, our in-field tests show a reduction of the average tracking error of 37% compared to a previous SoA work and an endurance performance up to the entire battery lifetime of 4 min.
|
|
16:30-18:00, Paper TuCT11-CC.5 | Add to My Program |
End-To-End Semi-Supervised 3D Instance Segmentation with PCTeacher |
|
Li, Linfeng | Nanyang Technological University |
Zhao, Na | SUTD |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: 3D instance segmentation is a fundamental and critical task for enabling robots to operate effectively in unstructured 3D environments. In order to address the challenges posed by the high demand for large-scale annotated data and the limited availability of such data in the context of 3D instance segmentation, we study the semi-supervised 3D instance segmentation problem and propose a novel end-to-end framework based on the mean teacher paradigm, named PCTeacher. Our PCTeacher generates both point-level and cluster-level pseudo labels to harness knowledge from unlabeled data. It notably enhances the training stability through end-to-end training and improves pseudo-label quality. Specifically, for point-level pseudo labels, PCTeacher employs a multi-view fusion strategy to achieve higher precision and recall. Regarding cluster-level pseudo labels, it introduces a hybrid grouping strategy to generate more potential proposals and utilizes a point-cluster agreement-based thresholding (PCAT) mechanism to fully exploit cluster-level pseudo labels. By combining and strengthening both point-level and cluster-level pseudo labels, our PCTeacher achieves state-of-the-art performance on two benchmark datasets across multiple labeled data ratios with a more compact network compared to the existing method.
|
|
16:30-18:00, Paper TuCT11-CC.6 |
PAg-NeRF: Towards Fast and Efficient End-To-End Panoptic 3D Representations for Agricultural Robotics |
|
Smitt, Claus | University of Bonn |
Halstead, Michael Allan | Bonn University |
Zimmer, Patrick | University of Bonn |
Läbe, Thomas | University of Bonn |
Guclu, Esra | University of Bonn |
Stachniss, Cyrill | University of Bonn |
McCool, Christopher Steven | University of Bonn |
Keywords: Deep Learning for Visual Perception, Semantic Scene Understanding, Agricultural Automation
Abstract: Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF, a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, our system is able to output scene geometry, photo-realistic renders, and 3D consistent panoptic representations with consistent instance IDs. We evaluate this novel system in a very challenging horticultural scenario and in doing so demonstrate an end-to-end trainable system that can make use of noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach, the peak signal-to-noise ratio improves from 21.34 dB to 23.37 dB, while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to improve inference time by more than a factor of 2 while being memory efficient, with approximately 12 times fewer parameters. Code, data and interactive results are available at https://claussmitt.com/pagnerf
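The PSNR figures above follow the standard definition, 10·log10(MAX^2 / MSE), for images with peak value MAX. A minimal sketch of that metric (not the authors' code):

```python
import numpy as np

def psnr(img_a, img_b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((np.asarray(img_a, float) - np.asarray(img_b, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, the roughly 2 dB gain reported above corresponds to a reduction of the mean squared render error by about a factor of 1.6.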
|
|
16:30-18:00, Paper TuCT11-CC.7 |
GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints |
|
Cheng, Anqi | Nanyang Technological University (NTU) |
Yang, Zhiyuan | Nanyang Technological University (NTU) |
Zhu, Haiyue | Agency for Science, Technology and Research (A*STAR) |
Mao, Kezhi | Nanyang Technological University (NTU) |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, RGB-D Perception
Abstract: Self-supervised depth estimation has evolved into an image reconstruction task that minimizes a photometric loss. While recent methods have made strides in indoor depth estimation, they often produce inconsistent depth estimates in textureless areas and unsatisfactory depth discrepancies at object boundaries. To address these issues, we propose GAM-Depth, developed upon two novel components: a gradient-aware mask and semantic constraints. The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions by allocating weights based on gradient magnitudes. The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries, leveraging a co-optimization network and proxy semantic labels derived from a pretrained segmentation model. Experimental studies on three indoor datasets, including NYUv2, ScanNet, and InteriorNet, show that GAM-Depth outperforms existing methods and achieves state-of-the-art performance, signifying a meaningful step forward in indoor depth estimation. Our code will be available at https://github.com/AnqiCheng1234/GAM-Depth.
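The abstract does not give the exact weighting function, but the idea of allocating supervision weights from gradient magnitudes can be sketched as follows; the normalization and the `floor` parameter here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def gradient_aware_weights(image, floor=0.1):
    """Illustrative per-pixel supervision weights from gradient magnitude.

    High-gradient pixels (edges, texture) get weight near 1; textureless
    regions get at least `floor`, so they still receive some supervision.
    GAM-Depth's actual weighting scheme may differ.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    norm = mag / (mag.max() + 1e-8)
    return np.clip(norm, floor, 1.0)
```

On a step-edge image, the edge pixels receive weight close to 1 while flat regions fall back to the floor value.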
|
|
16:30-18:00, Paper TuCT11-CC.8 |
RoboKeyGen: Robot Pose and Joint Angles Estimation Via Diffusion-Based 3D Keypoint Generation |
|
Tian, Yang | Peking University |
Zhang, Jiyao | Peking University |
Huang, Guowei | Huawei Technologies Co., Ltd |
Wang, Bin | Noah's Ark Lab, Huawei |
Wang, Ping | Peking University |
Pang, Jiangmiao | Shanghai AI Laboratory |
Dong, Hao | Peking University |
Keywords: Deep Learning for Visual Perception, Visual Learning, Calibration and Identification
Abstract: Estimating robot pose and joint angles is pivotal in advanced robotics, underpinning applications like robot collaboration and online hand-eye calibration. However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality. Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These approaches often falter in terms of performance or efficiency and grapple with the cross-camera gap problem. This paper presents a novel framework that bifurcates the high-dimensional prediction challenge into two manageable subtasks: detecting 2D keypoints and lifting 2D keypoints to 3D. This separation promises enhanced performance without sacrificing the efficiency innate to keypoint-based techniques. A vital component of our method is the lifting of 2D projections to 3D keypoints. Common deterministic regression methods may falter when faced with uncertainties from 2D detection errors or self-occlusions. Leveraging the robust modeling potential of diffusion models, we reframe this issue as a conditional 3D keypoints generation task. To bolster cross-camera adaptability, we introduce the Normalized Camera Coordinate Space (NCCS), ensuring alignment of estimated 2D keypoints across varying camera intrinsics. Experimental results demonstrate that the proposed method outperforms the state-of-the-art render&compare method RoboPose and achieves higher inference speed. Furthermore, the tests accentuate our method's robust cross-camera generalisation capabilities. We intend to release both the dataset and code.
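The NCCS idea of aligning 2D keypoints across varying camera intrinsics builds on standard pinhole normalization, mapping pixel coordinates through the inverse intrinsic matrix. A sketch of that baseline step (the paper's full NCCS formulation may add further normalization on top):

```python
import numpy as np

def normalize_keypoints(uv, K):
    """Map pixel keypoints (N, 2) to normalized camera coordinates.

    Applies the standard pinhole normalization (u - cx)/fx, (v - cy)/fy,
    which removes the dependence on a specific camera's intrinsics K.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    uv = np.asarray(uv, dtype=float)
    return np.stack([(uv[:, 0] - cx) / fx, (uv[:, 1] - cy) / fy], axis=1)
```

After this mapping, the same 3D ray produces the same normalized coordinate regardless of focal length or principal point, which is what makes cross-camera generalization tractable.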
|
|
16:30-18:00, Paper TuCT11-CC.9 |
Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development |
|
Zhao, Runkai | University of Sydney |
Heng, Yuwen | Baidu ACG |
Wang, Heng | University of Sydney |
Gao, Yuanda | Baidu ACG |
Liu, Shilei | Baidu ACG |
Yao, Changhao | Shanghai Jiao Tong University |
Chen, Jiawen | Baidu ACG |
Cai, Weidong | University of Sydney |
Keywords: Deep Learning for Visual Perception, Visual Learning, Recognition
Abstract: Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making. However, their application in 3D lane detection for effective driving environment perception is hindered by the lack of comprehensive LiDAR datasets. The sparse nature of LiDAR point cloud data prevents an efficient manual annotation process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation. Unlike existing datasets confined to a frontal perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the ego vehicle, capturing complex lane patterns in both urban and highway environments. We leverage the geometric traits of lane lines and the intrinsic spatial attributes of LiDAR data to design a simple yet effective automatic annotation pipeline for generating finer lane labels. To propel future research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird’s Eye View (BEV) based lane identification. Experimental results indicate that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D lane detection task on the K-Lane dataset and our LiSV-3DLane.
|
|
TuCT12-CC Oral Session, CC-503 |
Deep Learning in Grasping and Manipulation III |
|
|
Chair: Watanabe, Tetsuyou | Kanazawa University |
|
16:30-18:00, Paper TuCT12-CC.1 |
STOPNet: Multiview-Based 6-DoF Suction Detection for Transparent Objects on Production Lines |
|
Kuang, Yuxuan | Peking University |
Han, Qin | Peking University |
Li, Danshi | New York University |
Dai, Qiyu | Peking University |
Ding, Lian | Huawei Cloud Computing Technologies Co., Ltd |
Sun, Dong | Huawei Cloud Computing Technologies Co., Ltd |
Zhao, Hanlin | Huawei Cloud Computing Technologies Co., Ltd |
Wang, He | Peking University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Computer Vision for Automation
Abstract: In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry; in contrast, we propose a novel framework, based on multiview stereo, that reconstructs the scene on the production line from RGB input alone. Compared to existing works, our method not only reconstructs the whole 3D scene in order to obtain high-quality 6-DoF suction poses in real time but also generalizes to novel environments, novel arrangements, and novel objects, including challenging transparent objects, both in simulation and the real world. Extensive experiments in simulation and the real world show that our method significantly surpasses the baselines and has better generalizability, which caters to practical industrial needs.
|
|
16:30-18:00, Paper TuCT12-CC.2 |
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation |
|
Vecerik, Mel | DeepMind |
Doersch, Carl | Google DeepMind |
Yang, Yi | Google DeepMind |
Davchev, Todor Bozhinov | DeepMind |
Aytar, Yusuf | DeepMind |
Zhou, Guangyao | Google DeepMind |
Hadsell, Raia | DeepMind |
Agapito, Lourdes | University College London |
Scholz, Jonathan | Google Deepmind |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Perception-Action Coupling
Abstract: For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
|
|
16:30-18:00, Paper TuCT12-CC.3 |
Learning Extrinsic Dexterity with Parameterized Manipulation Primitives |
|
Yang, Shih-Min | Örebro University |
Magnusson, Martin | Örebro University |
Stork, Johannes A. | Orebro University |
Stoyanov, Todor | Örebro University |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Grasping
Abstract: Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98% of experimental trials.
|
|
16:30-18:00, Paper TuCT12-CC.4 |
Learning Active Manipulation to Target Shapes with Model-Free, Long-Horizon Deep Reinforcement Learning |
|
Sivertsvik, Matias | Norwegian University of Science and Technology |
Sumskiy, Kirill | Norwegian University of Science and Technology |
Misimi, Ekrem | SINTEF Ocean |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning
Abstract: We investigate the active manipulation of objects using model-free and long-horizon DRL (Deep Reinforcement Learning) to achieve target shapes. Our proposed approach uses visual observations consisting of segmented images to mitigate the sim-to-real gap. We address a long-horizon manipulation task requiring a sequence of accurate actions to achieve the target shapes, using a robot arm with an RGB-D camera in eye-in-hand configuration and an elongated, volumetric, elastoplastic object. Similar objects are found in the food, marine, and manufacturing domains. The aim is to actively manipulate the object into an arbitrary target shape using image observations. We trained a DRL agent using PPO (Proximal Policy Optimization) by running 768 parallel actors in simulation, for a total of 1.2M environment interactions, and tested this on 200 unseen target deformations. Within three attempts, 82% of the trials achieved greater than 90% overlap with the 200 target shapes. By relying on segmentation images as a visual observation space, we successfully transferred the agent to the real world without supplementary training. Our approach needs neither real-world manipulation examples nor fine-tuning in the real world. The robustness of our approach was demonstrated in simulation and experimentally validated in the real world for specific manipulation tasks, achieving a 94.2% mean zero-shot overlap success rate on previously unseen target shapes.
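The abstract does not define its overlap metric precisely; intersection-over-union between the achieved and target segmentation masks is one common choice, sketched below as an illustration (the paper's exact metric may differ):

```python
import numpy as np

def shape_overlap(mask_a, mask_b):
    """Intersection-over-union of two binary shape masks.

    One common definition of 'overlap' between an achieved and a target
    shape; returns a value in [0, 1].
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty shapes trivially coincide
    return np.logical_and(a, b).sum() / union
```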
|
|
16:30-18:00, Paper TuCT12-CC.5 |
GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects |
|
Yu, Qiaojun | Shanghai Jiao Tong University |
Wang, Junbo | Shanghai Jiao Tong University |
Liu, Wenhai | Shanghai Jiao Tong University |
Hao, Ce | University of California, Berkeley |
Liu, Liu | Hefei University of Technology |
Shao, Lin | National University of Singapore |
Wang, Weiming | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Learning Categories and Concepts
Abstract: Articulated objects like cabinets and doors are widespread in daily life. However, directly manipulating 3D articulated objects is challenging because they have diverse geometrical shapes, semantic categories, and kinetic constraints. Prior works mostly focused on recognizing and manipulating articulated objects with specific joint types. They can either estimate the joint parameters or distinguish suitable grasp poses to facilitate trajectory planning. Although these approaches have succeeded on certain types of articulated objects, they lack generalizability to unseen objects, which significantly impedes their application in broader scenarios. In this paper, we propose a novel framework, Generalizable Articulation Modeling and Manipulation for Articulated Objects (GAMMA), which learns both articulation modeling and grasp pose affordance from diverse articulated objects across different categories. In addition, GAMMA adopts adaptive manipulation to iteratively reduce modeling errors and enhance manipulation performance. We train GAMMA with the PartNet-Mobility dataset and evaluate it with comprehensive experiments in SAPIEN simulation and on a real-world Franka robot. Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects. Images, videos, and code are published on the project website at: http://sites.google.com/view/gamma-articulation
|
|
16:30-18:00, Paper TuCT12-CC.6 |
Efficient End-To-End Detection of 6-DoF Grasps for Robotic Bin Picking |
|
Liu, Yushi | University Tübingen |
Qualmann, Alexander | Robert Bosch GmbH, Corporate Sector Research and Advance Enginee |
Yu, Zehao | University of Tübingen |
Gabriel, Miroslav | Bosch Center for Artificial Intelligence |
Schillinger, Philipp | Bosch Center for Artificial Intelligence |
Spies, Markus | Bosch Center for Artificial Intelligence |
Anh Vien, Ngo | Bosch GmbH |
Geiger, Andreas | Max Planck Institute for Intelligent Systems, Tübingen |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Grasping
Abstract: Bin picking is an important building block for many robotic systems, in logistics, production, and household use cases. In recent years, machine learning methods for the prediction of 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches only consider a single ground-truth grasp orientation at a grasp location during training and can therefore only predict limited grasp orientations, which leads to a reduced number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. Thereby, we also consider the grasp uncertainty, enhancing the model's robustness to noisy inputs. As a result, given a single top-down view depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around 90% in simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, the proposed approach exhibits its usability in real robot experiments without any refinement steps, even when only trained on a synthetic dataset, due to the probabilistic grasp distribution modeling.
|
|
16:30-18:00, Paper TuCT12-CC.7 |
Contact Energy Based Hindsight Experience Prioritization |
|
Sayar, Erdi | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
D'Eramo, Carlo | University of Würzburg |
Oguz, Ozgur S. | Bilkent University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Learning from Experience
Abstract: Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms due to the inefficiency in collecting successful experiences. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories and replacing the desired goal with one of the achieved states, so that any failed trajectory can contribute to learning. However, HER chooses failed trajectories uniformly, without taking into account which ones might be the most valuable for learning. In this paper, we address this problem and propose a novel approach, Contact Energy Based Prioritization (CEBP), to select samples from the replay buffer based on the rich information carried by contact, leveraging the touch sensors in the robot's gripper and object displacement. Our prioritization scheme favors sampling of contact-rich experiences, which are arguably the ones providing the largest amount of information. We evaluate our proposed approach on various sparse-reward robotic tasks and compare it with state-of-the-art methods. We show that our method surpasses or performs on par with those methods on robot manipulation tasks. Finally, we deploy the trained policy from our method to a real Franka robot for a pick-and-place task and observe that the robot solves the task successfully. The videos and code are publicly available at: https://erdiphd.github.io/HER_force/
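The two mechanisms the abstract combines, HER's goal relabeling and a contact-based sampling priority, can be sketched as follows. The relabeling follows HER's well-known "final" strategy; the priority function is an illustrative stand-in, since the paper's exact contact-energy computation is not given here:

```python
import numpy as np

def her_relabel(trajectory, reward_fn):
    """Relabel a failed trajectory with its final achieved state as the goal
    (HER 'final' strategy), so the episode yields useful reward signal."""
    new_goal = trajectory[-1]["achieved"]
    return [
        {**step, "goal": new_goal, "reward": reward_fn(step["achieved"], new_goal)}
        for step in trajectory
    ]

def contact_priorities(contact_energies, eps=1e-6):
    """Sampling probabilities proportional to per-trajectory contact energy
    (illustrative stand-in for CEBP's prioritization scheme)."""
    e = np.asarray(contact_energies, dtype=float) + eps
    return e / e.sum()
```

A replay buffer would then draw trajectories with these probabilities instead of uniformly, biasing updates toward contact-rich experience.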
|
|
16:30-18:00, Paper TuCT12-CC.8 |
ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera |
|
Shi, Jun | Samsung Research China – Beijing (SRC-B) |
A, Yong | Samsung Research China – Beijing (SRC-B) |
Jin, Yixiang | Samsung Research China – Beijing (SRC-B) |
Li, Dingzhe | Beihang University |
Niu, Haoyu | University of Chinese Academy of Sciences |
Jin, Zhezhu | Samsung Research Institute China – Beijing (SRC-B) |
Wang, He | Peking University |
Keywords: Deep Learning in Grasping and Manipulation, Perception for Grasping and Manipulation, Grasping
Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This issue is important, yet it remains unsolved within the field of robotics due to the failure of depth cameras to recover their accurate geometry. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo network for transparent object reconstruction, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D based grasp detection methods, which heavily depend on depth restoration networks and the quality of depth maps generated by depth cameras, our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction. We create an extensive synthetic dataset through domain randomization, based on GraspNet-1Billion. Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping in both simulation and the real world via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs. Project page: https://pku-epic.github.io/ASGrasp
|
|
16:30-18:00, Paper TuCT12-CC.9 |
An Offline Learning of Behavior Correction Policy for Vision-Based Robotic Manipulation |
|
Dong, Qingxiuxiong | Toshiba Corporation |
Kaneko, Toshimitsu | Toshiba Corporation |
Sugiyama, Masashi | The University of Tokyo |
Keywords: Deep Learning in Grasping and Manipulation, Reinforcement Learning, Imitation Learning
Abstract: Offline learning usually requires a large dataset for training. In this paper, we focus on vision-based robotic manipulation tasks and utilize certain task properties to achieve offline learning with a small dataset. We propose a two-stage agent consisting of a tentative decision stage and a correction stage, where the tentative decision stage determines a tentative action from the original camera image, and the correction stage determines a correction to the tentative action based on the cropped image according to the tentative action. The correction stage utilizes task properties to obtain the cropped image with task-relevant features, enabling efficient correction. In particular, the training of the two stages can be performed individually, which enables a straightforward application of general offline learning algorithms. We conduct experiments by combining the two-stage agent with conventional offline reinforcement learning and imitation learning algorithms. In both cases, we benchmark the proposed method using RLBench and demonstrate that the task performance is significantly improved by the correction stage.
|
|
TuCT13-AX Oral Session, AX-201 |
Social HRI |
|
|
Chair: Rehm, Matthias | Aalborg University |
Co-Chair: Sheng, Weihua | Oklahoma State University |
|
16:30-18:00, Paper TuCT13-AX.1 |
Training a Non-Cooperator to Identify Vulnerabilities and Improve Robustness for Robot Navigation |
|
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Zhang, Xuyang | University of Science and Technology of China |
Yao, Shunyi | University of Science and Technology of China |
Chen, Yu'an | University of Science and Technology of China |
Chen, Guangda | NetEase |
Hua, Bei | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Social HRI, Reinforcement Learning, Collision Avoidance
Abstract: Autonomous mobile robots have become popular in various applications coexisting with humans, which requires robots to navigate efficiently and safely in crowd environments with diverse pedestrians. Pedestrians may cooperate with the robot by actively avoiding it, or may ignore the robot while walking, while some pedestrians, denoted as non-cooperators, may try to block the robot. It is also challenging to identify potential vulnerabilities of a navigation policy, i.e., situations in which the robot may cause a collision, in various crowd environments, which reduces the reliability and safety of the robot. In this paper, we propose a deep reinforcement learning (DRL) approach to train a policy simulating the behavior of non-cooperators, which can effectively identify vulnerabilities of a navigation policy. We evaluate the approach both on the ROS navigation stack with DWA and on a DRL-based navigation policy, identifying useful vulnerabilities of both navigation policies for further improvements. Moreover, these non-cooperators play a game with the DRL-based navigation policy, and we can then improve the robustness of the navigation policy by retraining it in the sense of asymmetric self-play. We evaluate the retrained navigation policy in various crowd environments with diverse pedestrians. The experimental results show that the approach can improve the robustness of the navigation policy. The source code for the training and the simulation platform is released online at h
|
|
16:30-18:00, Paper TuCT13-AX.2 |
Toward Grounded Commonsense Reasoning |
|
Kwon, Minae | Stanford University |
Hu, Hengyuan | Facebook |
Myers, Vivek | UC Berkeley |
Karamcheti, Siddharth | Stanford University |
Dragan, Anca | University of California Berkeley |
Sadigh, Dorsa | Stanford University |
Keywords: AI-Based Methods, Social HRI, Human-Centered Automation
Abstract: Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not appropriate to disassemble the sports car and put it away as part of the “tidying.” How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable commonsense reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and actively gather information from the environment that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and vision language model (VLM) to help a robot actively perceive its environment to perform grounded commonsense reasoning. To evaluate our framework at scale, we release the MESSYSURFACES dataset which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MESSYSURFACES benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/grounded_commonsense_reasoning/.
|
|
16:30-18:00, Paper TuCT13-AX.3 |
The Effect of Rejection Strategy on Trust and Shopping Choices in Robot-Assisted Shopping |
|
Rehm, Matthias | Aalborg University |
Krummheuer, Antonia | Aalborg University |
Gomez Cubero, Carlos | Aalborg University |
Keywords: Social HRI, Acceptability and Trust, Human-Centered Robotics
Abstract: In this paper, we investigate how a customer-facing service robot can support decision making in shopping interactions. In this role, a robot sometimes needs to reject a customer's choice. We therefore investigate different rejection strategies with the goal of changing customer behavior. The implemented strategies were developed based on an ethnographic study of assisted shopping and tested in a lab experiment with 31 participants. The experiment showed significant differences in trust ratings and decision-making depending on the employed strategy.
|
|
16:30-18:00, Paper TuCT13-AX.4 |
Planning of Explanations for Robot Navigation |
|
Halilovic, Amar | Ulm University |
Krivic, Senka | University of Sarajevo |
Keywords: Social HRI, Human-Centered Robotics, Motion and Path Planning
Abstract: The choices made by autonomous robots in social settings bear consequences for humans and their presumptions of robot behavior. Explanations can serve to alleviate detrimental impacts on humans and amplify their comprehension of robot decisions. We model the process of explanation generation for robot navigation as an automated planning problem considering different possible explanation attributes. Our visual and textual explanations of a robot's navigation are influenced by the robot's personality. Moreover, they account for different contextual, environmental, and spatial characteristics. We present the results of a user study demonstrating that users are more satisfied with multimodal than unimodal explanations. Additionally, our findings reveal low user satisfaction with explanations of a robot with extreme personality traits. In conclusion, we deliberate on potential future research directions and the associated constraints. Our work advocates for fostering socially adept and safe autonomous robot navigation.
|
|
16:30-18:00, Paper TuCT13-AX.5 |
Learning Crowd Behaviors in Navigation with Attention-Based Spatial-Temporal Graphs |
|
Zhou, Yanying | University of Bonn |
Garcke, Jochen | Universität Bonn |
Keywords: Social HRI, Human-Aware Motion Planning, Deep Learning Methods
Abstract: Safe and efficient navigation in dynamic environments shared with humans remains an open and challenging task for mobile robots. Previous works have shown the efficacy of using reinforcement learning frameworks to train policies for efficient navigation. However, their performance deteriorates when crowd configurations change, i.e. become larger or more complex. Thus, it is crucial to fully understand the complex, dynamic, and sophisticated interactions of the crowd resulting in proactive and foresighted behaviors for robot navigation. In this paper, a novel deep graph learning architecture based on attention mechanisms is proposed, which leverages the spatial-temporal graph to enhance robot navigation. We employ spatial graphs to capture the current spatial interactions, and through the integration with RNN, the temporal graphs utilize past trajectory information to infer the future intentions of each agent. The spatial-temporal graph reasoning ability allows the robot to better understand and interpret the relationships between agents over time and space, thereby making more informed decisions. Compared to previous state-of-the-art methods, our method demonstrates superior robustness in terms of safety, efficiency, and generalization in various challenging scenarios.
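The attention mechanism at the core of the spatial graph can be illustrated with standard scaled dot-product attention, in which the robot's query vector attends over per-agent features; this is a generic sketch, not the authors' network:

```python
import numpy as np

def attention_pool(query, keys, values):
    """Scaled dot-product attention over agent features.

    `query` is the robot's feature vector (d,), `keys`/`values` are
    per-agent features (n_agents, d). Returns a weighted summary of the
    agents, with weights given by softmax of the scaled similarities.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)      # (n_agents,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over agents
    return weights @ values                 # attention-weighted summary
```

Agents whose keys align with the robot's query dominate the pooled summary, which is what lets the policy focus on the pedestrians most relevant to its next decision.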
|
|
16:30-18:00, Paper TuCT13-AX.6 | Add to My Program |
Grounding Conversational Robots on Vision through Dense Captioning and Large Language Models |
|
Grassi, Lucrezia | University of Genova |
Hong, Zhouyang | University of Genoa |
Recchiuto, Carmine Tommaso | University of Genova |
Sgorbissa, Antonio | University of Genova |
Keywords: Social HRI, Robot Companions, Natural Dialog for HRI
Abstract: This work explores a novel approach to empowering robots with visual perception capabilities using textual descriptions. Our approach involves the integration of GPT-4 with dense captioning, enabling robots to perceive and interpret the visual world through detailed text-based descriptions. To assess both user experience and the technical feasibility of this approach, experiments were conducted with human participants interacting with a Pepper robot equipped with visual capabilities. The results affirm the viability of the proposed approach, enabling effective vision-based conversations despite processing-time limitations.
|
|
16:30-18:00, Paper TuCT13-AX.7 | Add to My Program |
Exploring the Impact of Narrator Type on Response Latency and Utterance Length During Interactive Storytelling |
|
Bakhoda, Iman | Intelligent Robotics Laboratory, Oakland University, Michigan |
Shahverdi, Pourya | Oakland University, Michigan, USA |
Rousso, Katelyn | Intelligent Robotics Lab, Oakland University, Michigan |
Klotz, Justin | Intelligent Robotics Laboratory, Oakland University, Michigan |
Louie, Wing-Yue Geoffrey | Oakland University |
Keywords: Social HRI, Design and Human Factors, Human-Centered Robotics
Abstract: The inexorable progress of technology has brought forth an era in which robots increasingly integrate into human life, which necessitates an understanding of human-robot interaction (HRI). This study examines HRI within interactive storytelling contexts. Through a between-subject experiment with 28 participants, we assessed response latency and utterance lengths in response to interactive story narrations delivered by either a human or a robot. Findings indicated that participants displayed longer response latency and shorter utterances when interacting with the robot narrator, whereas in the human condition they displayed shorter response latency and longer utterances. These observations suggest significant differences in cognitive and communicative strategies in human-human versus human-robot interactions. The results underscore the challenges and potential of designing social robots that are time-sensitive in interacting with humans. Future explorations should focus on the cognitive and emotional drivers behind these interactions.
|
|
16:30-18:00, Paper TuCT13-AX.8 | Add to My Program |
Design of Embodied Mediator Haru for Remote Cross Cultural Communication |
|
Gomez, Randy | Honda Research Institute Japan Co., Ltd |
Szapiro, Deborah | University of Technology Sydney |
Cooper, Sara | Honda Research Institute Japan |
Bougria Sanchez, Nabil | Universidad Pablo De Olavide |
Pérez, Guillermo | 4i Intelligent Insights |
Nichols, Eric | Honda Research Institute Japan |
Giménez-Figueroa, Javier | Universidad Pablo De Olavide |
Perez-Moleron, Jose Manuel | Universidad Pablo De Olavide |
Peavy, Matthew | Universidad Pablo De Olavide |
Serrano, Daniel | Eurecat |
Merino, Luis | Universidad Pablo De Olavide |
Keywords: Social HRI, Human-Robot Collaboration, Design and Human Factors
Abstract: Social robots for children have focused mainly on conventional education domains such as teaching language, science, and math, while applications focusing on the enhancement of cultural competency are quite scarce. In this paper, we present a prototype of a robot-mediation framework for cross-cultural communication. This framework paves the way for a social robot to act as a mediator between groups of schoolchildren from different countries. First, we conducted a participatory design activity with an interdisciplinary team, resulting in the extraction of the design, robot roles, and technical requirements. Based on these requirements, we built the robot-mediation system prototype. We conducted a pilot study using the system with groups of high school children in Japan and Australia and our results show the potential of the system to drive children's interest in communicating, sharing, and discussing cultural themes with their remote peers through the social robot.
|
|
16:30-18:00, Paper TuCT13-AX.9 | Add to My Program |
ChatAdp: ChatGPT-Powered Adaptation System for Human-Robot Interaction |
|
Su, Zhidong | Oklahoma State University |
Sheng, Weihua | Oklahoma State University |
Keywords: Social HRI, Reinforcement Learning
Abstract: Different people have different preferences when it comes to human-robot interaction. Therefore, it is desirable for the robot to adapt its actions to fit users' preferences. Human feedback is essential to facilitating robot adaptation. However, when the task is complex or the robot action space is large, it requires a large amount of user feedback. ChatGPT is a powerful generative AI tool based on large language models (LLMs), which possesses a significant corpus of information obtained from human society and exhibits robust proficiency in the comprehension and acquisition of natural language. Therefore, in this paper, we propose a ChatGPT-powered adaptation system (ChatAdp) for human-robot interaction which requires less user feedback to achieve a good adaptation result. In the proposed ChatAdp, we use ChatGPT as a user simulator to provide feedback. We evaluated ChatAdp in a case study for context-aware conversation adaptation. The results are very promising. Our proposed method can achieve a mean success rate of 92% on the user's natural language-described preferences after receiving 33 rounds of feedback from a user on average, which is only 2% of the number of states covered by the user preferences and outperforms the two baseline methods.
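The simulator-in-the-loop idea can be sketched with a toy adaptation loop: an epsilon-greedy learner receives reward from a simulated user instead of a real one. The stub function below stands in for the LLM-based simulator, and all action names are hypothetical; this is not the authors' system, only an illustration of learning from simulated feedback:

```python
import random

def simulated_user(action):
    """Stub standing in for an LLM-based user simulator: returns positive
    feedback only for the (hypothetical) preferred action."""
    return 1.0 if action == "dim_lights" else 0.0

def adapt(actions, rounds=200, eps=0.1, seed=0):
    """Epsilon-greedy adaptation driven entirely by simulated feedback."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}   # estimated value of each action
    n = {a: 0 for a in actions}     # visit counts
    for _ in range(rounds):
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        r = simulated_user(a)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # incremental mean update
    return max(q, key=q.get)

best = adapt(["dim_lights", "play_music", "open_blinds"])
```

Replacing `simulated_user` with a call to a real language model would reproduce the general shape of the approach, with real user feedback needed only for validation.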
|
|
TuCT14-AX Oral Session, AX-202 |
Add to My Program |
Rehabilitation Robotics |
|
|
Chair: Nakata, Yoshihiro | The University of Electro-Communications |
Co-Chair: Stroppa, Fabio | Kadir Has University |
|
16:30-18:00, Paper TuCT14-AX.1 | Add to My Program |
The Impact of Evolutionary Computation on Robotic Design: A Case Study with an Underactuated Hand Exoskeleton |
|
Akbas, Baris | Kadir Has University |
Soylemez, Aleyna | Kadir Has University |
Yuksel, Huseyin Taner | Kadir Has University |
Zyada, Mazhar Eid | Kadir Has University |
Sarac, Mine | Kadir Has University |
Stroppa, Fabio | Kadir Has University |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Optimization and Optimal Control
Abstract: Robotic exoskeletons can enhance human strength and aid people with physical disabilities. However, designing them to ensure safety and optimal performance presents significant challenges. Developing exoskeletons should incorporate specific optimization algorithms to find the best design. This study investigates the potential of Evolutionary Computation (EC) methods in robotic design optimization, with an underactuated hand exoskeleton (U-HEx) used as a case study. We propose improving the performance and usability of the U-HEx design, which was initially optimized using a naive brute-force approach, by integrating EC techniques such as the Genetic Algorithm and the Big Bang-Big Crunch Algorithm. Comparative analysis revealed that EC methods consistently yield more precise and optimal solutions than brute force in a significantly shorter time. This allowed us to improve the optimization by increasing the number of variables in the design, which was impossible with naive methods. The results show significant improvements in terms of the torque magnitude the device transfers to the user, enhancing its efficiency. These findings underline the importance of performing proper optimization while designing exoskeletons, as well as providing a significant improvement to this specific robotic design.
|
|
16:30-18:00, Paper TuCT14-AX.2 | Add to My Program |
Design & Systematic Evaluation of Power Transmission Efficiency of an Ankle Exoskeleton for Walking Post-Stroke |
|
Cooper, Myles | Harvard University |
Canete, Santiago | Harvard University |
Eckert-Erdheim, Asa | Harvard University |
Kimberley, Aidan | Harvard University |
Siviy, Christopher | Harvard University School of Engineering and Applied Sciences |
Baker, Teresa | Boston University |
Ellis, Terry | Boston University |
Slade, Patrick | Harvard University |
Walsh, Conor James | Harvard University |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics, Physically Assistive Devices
Abstract: Community-based locomotor training post-stroke has shown improvements in independent ambulation by increasing dose, intensity, and specificity of walking practice. Robotic ankle exoskeletons hold the potential to facilitate continued rehabilitation at home, but understanding what aspects of the design are most relevant for successful translation to the community presents a challenge. Here, we design a portable rigid ankle exoskeleton to use as a research platform for investigating the effect of assistance on post-stroke gait during overground, community-based walking. We first test our device with stroke survivors and validate its potential for future community use. We then present a systematic method for quantifying power transmission losses at each transmission stage from the battery to the wearer, using data gathered from walking trials with healthy participants. Our evaluation method revealed inefficiencies in power transfer at the interface level, likely resulting from the compliance in the structural components of the system, which motivates future redesign considerations. Overall, our method provides a framework to identify and characterize the components that must be redesigned to lower exoskeleton weight and maximize performance.
|
|
16:30-18:00, Paper TuCT14-AX.3 | Add to My Program |
Achieving Mechanical Transparency Using Fusion Hybrid Linear Actuator for Shoulder Flexion and Extension in Exoskeleton Robot |
|
Shimoyama, Takuma | Graduate School of Informatics and Engineering, the University O |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Teramae, Tatsuya | ATR Computational Neuroscience Laboratories |
Nakata, Yoshihiro | The University of Electro-Communications |
Keywords: Prosthetics and Exoskeletons, Rehabilitation Robotics
Abstract: Recently, the importance of mechanical transparency in human-assistive robots has grown. Traditionally, its primary goal was minimizing interaction forces during assistance. However, under this conventional definition, mechanical transparency was not considered when an interaction force was required during assistance. This research focuses on achieving mechanical transparency within the context of shoulder motion in upper extremity exoskeletons for rehabilitation. Our primary goal is maintaining interaction forces at target values, even with motion disturbances. To this end, we developed a shoulder actuation testbed for exoskeletons, incorporating a fusion hybrid linear actuator distinguished by high back-drivability, robust torque generation capability, and safety features. To attain mechanical transparency, we created a model for calculating the required joint torque, accounting for gravitational dynamics, and subsequently determined the necessary actuator output. The system characteristics were evaluated based on the joint torque generated by the actuator. The actuator utilized pneumatic pressure to generate force and compensated for kinetic friction using electromagnetic forces. The results showed that compensation by the electromagnetic force reduced the root-mean-square error of the torque to less than 60% of that with pneumatic pressure alone. This demonstrated the ability to generate consistent torque with high robustness to motion disturbances.
|
|
16:30-18:00, Paper TuCT14-AX.4 | Add to My Program |
Imitation Learning-Based System for the Execution of Self-Paced Robotic-Assisted Passive Rehabilitation Exercises |
|
Escarabajal, Rafael J. | Universidad Politécnica De Valencia |
Pulloquinga, José Luis | Universidad Politécnica De Valencia |
Zamora-Ortiz, Pau | Universitat Politècnica De València |
Valera, Angel | Universidad Politécnica De Valencia |
Mata, Vicente | Universidad Politécnica De Valencia |
Valles, Marina | Universitat Politècnica De València |
Keywords: Rehabilitation Robotics, Imitation Learning, Parallel Robots
Abstract: The development of robotic-assisted rehabilitation exercises involving physical human-robot interaction requires extreme care since an injured limb may be in physical contact with the robot, so compliant behavior is imperative for these tasks. Typical approaches involve force control schemes like admittance controllers that allow humans to adapt the motion. However, when the patient's limb has limited mobility or is potentially injured, unintentional forces may occur during the robot's trajectory that could be incompatible with these controllers. This paper addresses a new way of generating compliant trajectories for passive rehabilitation exercises, considering that previous positions of the trajectory are attainable for the patient, so reversing the trajectory is a safe operation. Since there is no clear way to optimize such a goal due to the physiological variability among patients, the condition of reversal is based on imitation learning by taking the analogous healthy limb of the patient as a reference and encoding the forces using Gaussian Mixture Regression, and reversibility is accomplished by means of Reversible Dynamic Movement Primitives. The system allows for self-paced rehabilitation exercises by back-and-forth movements along the trajectory according to the patient's reaction, and it has been successfully applied to a 4-DOF parallel robot for lower-limb rehabilitation.
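The reversibility idea can be sketched with the standard second-order goal attractor that underlies Dynamic Movement Primitives, omitting the learned forcing term for brevity: reversing a motion amounts to rolling the attractor out again with start and goal swapped. This is a simplified illustration, not the paper's Reversible DMP formulation:

```python
def rollout(y0, g, tau=1.0, alpha=25.0, beta=6.25, dt=0.01, steps=200):
    """DMP transformation system without its forcing term: a critically
    damped attractor that converges from y0 toward the goal g."""
    y, z = y0, 0.0
    traj = [y]
    for _ in range(steps):
        z += dt * alpha * (beta * (g - y) - z) / tau  # z is scaled velocity
        y += dt * z / tau
        traj.append(y)
    return traj

forward = rollout(0.0, 1.0)           # exercise executed forward
backward = rollout(forward[-1], 0.0)  # reversal: swap start and goal
```

In the full method, a forcing term learned from the healthy limb shapes the path, and the reversal is triggered online by the measured interaction forces.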
|
|
16:30-18:00, Paper TuCT14-AX.6 | Add to My Program |
An Adaptable Ankle Trajectory Generation Method for Lower-Limb Exoskeletons by Means of Safety Constraints Computation and Minimum Jerk Planning |
|
Giannattasio, Raffaele | Italian Institute of Technology |
Maludrottu, Stefano | Italian Institute of Technology |
Zinni, Gaia | Istituto Italiano Di Tecnologia |
De Momi, Elena | Politecnico Di Milano |
Laffranchi, Matteo | Istituto Italiano Di Tecnologia |
De Michieli, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Rehabilitation Robotics, Prosthetics and Exoskeletons, Medical Robots and Systems
Abstract: This paper presents a method to compute smooth ankle trajectories for lower limb exoskeletons with powered ankle joints. The proposed approach defines ankle trajectories using four polynomial functions, each representing one of the four primary phases of gait. These polynomials are computed according to different safety constraints. During the single support phase, ground contact constraints are enforced. In the swing phase, an optimization problem is solved to achieve minimum jerk planning while respecting a set of equality and inequality constraints designed to minimize the risk of stumbling. The approach focuses on enabling the ankle joint to smoothly adapt in real-time to different walking styles defined by user-selected gait parameters such as step length and clearance. The primary aim is to improve the user experience by producing a secure and comfortable walking pattern. To validate the effectiveness of the proposed method, the new ankle trajectories were tested on a group of healthy volunteers using the TWIN lower limb exoskeleton.
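For the unconstrained point-to-point case, minimum-jerk planning has a textbook closed form: a quintic polynomial with zero velocity and acceleration at both endpoints. The sketch below shows that baseline profile (the paper's swing-phase planner additionally enforces stumbling-avoidance constraints, which this sketch omits):

```python
def min_jerk(x0, xf, T, t):
    """Minimum-jerk position at time t for a move of duration T:
    zero velocity and acceleration at both endpoints."""
    s = min(max(t / T, 0.0), 1.0)  # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

def min_jerk_velocity(x0, xf, T, t):
    """Analytic velocity of the minimum-jerk profile."""
    s = min(max(t / T, 0.0), 1.0)
    return (xf - x0) / T * (30 * s**2 - 60 * s**3 + 30 * s**4)

# Illustrative ankle swing: 0 to 0.3 rad over a 0.8 s swing phase.
ankle = [min_jerk(0.0, 0.3, 0.8, k * 0.8 / 100) for k in range(101)]
```

The constrained version in the paper replaces this closed form with a jerk-minimizing optimization whose inequality constraints keep the toe clear of the ground.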
|
|
16:30-18:00, Paper TuCT14-AX.7 | Add to My Program |
Controlling FES of Arm Movements Using Physics-Informed Reinforcement Learning Via Co-Kriging Adjustment |
|
Wannawas, Nat | Imperial College London |
Diaz-Pintado, Clara | Imperial College London |
Narayan, Jyotindra | Imperial College / University of Bayreuth |
Faisal, Aldo | Imperial College London |
Keywords: Rehabilitation Robotics, Reinforcement Learning, Model Learning for Control
Abstract: Upper limb paralysis affects quality of life. Functional Electrical Stimulation (FES) offers a solution to restore lost motor functions. Yet, there remain challenges in controlling FES to induce arbitrary arm movements. Reinforcement learning (RL) emerges as a promising method for controlling arm movement with success in simulation. However, challenges remain in translating the successes into real-world settings. One dominant challenge is the sample efficiency of RL. This study presents a practical RL setup to control FES for arm movements. We also present a flexible method, called co-kriging adjustment (CKA), which combines a biomechanical simulator and real data to build an accurate model of the real system. We demonstrate our RL-based control on a 2-DoF planar setting where the subject's arm, placed on a frictionless supporter, is stimulated to perform point-to-point reaching. By using 90 seconds of real interaction data, our RL-based control can perform the reaching with an average error over the workspace of 5.5 cm. Beyond the application of FES, our method can be extended to other control systems, propelling RL towards general uses in the real world.
|
|
16:30-18:00, Paper TuCT14-AX.8 | Add to My Program |
Adaptive Control for Triadic Human-Robot-FES Collaboration in Gait Rehabilitation: A Pilot Study |
|
Christou, Andreas | The University of Edinburgh |
del-Ama, Antonio J. | Rey Juan Carlos University |
Moreno, Juan C. | Cajal Institute, CSIC |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Rehabilitation Robotics, Wearable Robotics, Prosthetics and Exoskeletons
Abstract: The hybridisation of robot-assisted gait training and functional electrical stimulation (FES) can provide numerous physiological benefits to neurological patients. However, the design of an effective hybrid controller poses significant challenges. In this over-actuated system, it is extremely difficult to find the right balance between robotic assistance and FES that will provide personalised assistance, prevent muscle fatigue and encourage the patient’s active participation in order to accelerate recovery. In this paper, we present an adaptive hybrid robot-FES controller to address this challenge and enable triadic collaboration between the patient, the robot and FES. A patient-driven controller is designed where the voluntary movement of the patient is prioritised and assistance is provided using FES and the robot in a hierarchical order depending on the patient’s performance and their muscles’ fitness. The performance of this hybrid adaptive controller is tested in simulation and on one healthy subject. Our results indicate an increase in tracking performance with lower overall assistance, and less muscle fatigue when the hybrid adaptive controller is used, compared to its non-adaptive equivalent. This suggests that our hybrid adaptive controller may be able to adapt to the behaviour of the user to provide assistance as needed and prevent the early termination of physical therapy due to muscle fatigue.
|
|
16:30-18:00, Paper TuCT14-AX.9 | Add to My Program |
Stretch with Stretch: Physical Therapy Exercise Games Led by a Mobile Manipulator |
|
Lamsey, Matthew | Georgia Institute of Technology |
Wells, Meredith | Emory University School of Medicine |
Tan, You Liang | Georgia Institute of Technology |
Beatty, Madeline | Georgia Institute of Technology |
Liu, Zexuan | University of Michigan |
Majumdar, Arjun | Georgia Institute of Technology |
Washington, Kendra | The Georgia Institute of Technology |
Feldman, Jerry | Parkinson's Foundation |
Kuppuswamy, Naveen | Toyota Research Institute |
Nguyen, Elizabeth | Long School of Medicine |
Wallenstein, Arielle | Emory University |
Kemp, Charles C. | Hello Robot Inc |
Hackney, Madeleine Eve | Emory University |
Keywords: Physical Human-Robot Interaction, Rehabilitation Robotics, Human-Centered Robotics
Abstract: Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use in homes and clinics. We present Stretch with Stretch (SWS), a novel robotic system for leading stretching exercise games for older adults with PD. SWS consists of a compact and lightweight mobile manipulator (Hello Robot Stretch RE1) that visually and verbally guides users through PT exercises. The robot's soft end effector serves as a target that users repetitively reach towards and press with a hand, foot, or knee. For each exercise, target locations are customized for the individual via a visually estimated kinematic model, a haptically estimated range of motion, and the person's exercise performance. The system includes sound effects and verbal feedback from the robot to keep users engaged throughout a session and augment physical exercise with cognitive exercise. We conducted a user study for which people with PD (n=10) performed 6 exercises with the system. Participants perceived the SWS to be useful and easy to use. They also reported mild to moderate perceived exertion (RPE).
|
|
TuCT15-AX Oral Session, AX-203 |
Add to My Program |
Intention Recognition |
|
|
Chair: Figueroa, Nadia | University of Pennsylvania |
Co-Chair: Detry, Renaud | KU Leuven |
|
16:30-18:00, Paper TuCT15-AX.1 | Add to My Program |
Subject-Independent Estimation of Continuous Movements Using CNN-LSTM for a Home-Based Upper Limb Rehabilitation System |
|
Li, He | Beijing Institute of Technology |
Guo, Shuxiang | Kagawa University |
Bu, Dongdong | Beijing Institute of Technology |
Wang, Hanze | Beijing Institute of Technology |
Kawanishi, Masahiko | Kagawa University |
Keywords: Intention Recognition, Machine Learning for Robot Control, Rehabilitation Robotics
Abstract: Exoskeleton-assisted home-based rehabilitation plays a vital role in upper limb rehabilitation of stroke patients in the early stage. The surface electromyography (sEMG)-based control can facilitate friendly interactions between individuals and rehabilitation exoskeletons. The exoskeleton can also meet the requirements of home-based rehabilitation, including affordability, portability, safety, and active participation. Although various systems have been proposed to enhance upper limb training, few studies addressed the inter-subject variability of sEMG signals, which limits the generalization capability of the intention estimation model. In this paper, a subject-independent continuous motion estimation method combining convolutional neural networks (CNN) and long short-term memory (LSTM) is proposed, which is applied to a home-based bilateral training system. The sEMG-driven CNN-LSTM model builds the relationship between sEMG signals and continuous movements. To verify the effectiveness of the CNN-LSTM model in achieving subject-independent estimation, the offline estimation results of the backpropagation neural network, CNN, and CNN-LSTM are compared. Moreover, the online intention estimation and the real-time control are performed, and the estimation angle error and time delay are controlled at approximately 10° and 300 ms, proving the feasibility of the subject-independent estimation method and its availability in the upper-limb rehabilitation system.
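A common first step in sEMG-driven estimators of this kind is sliding-window feature extraction over the raw signal before it reaches the network; the sketch below shows windowed root-mean-square (RMS) amplitude, a standard sEMG feature. This is a generic preprocessing illustration, not the authors' pipeline, and the window sizes are illustrative:

```python
import math

def rms_windows(signal, win, step):
    """Sliding-window RMS features from one sEMG channel: one scalar
    amplitude estimate per window, in temporal order."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append(math.sqrt(sum(x * x for x in w) / win))
    return feats

# 1 s of a constant-amplitude test signal at 1 kHz:
# 200-sample windows with a 50-sample step yield overlapping features.
sig = [0.5] * 1000
feats = rms_windows(sig, win=200, step=50)
```

In a CNN-LSTM pipeline, stacks of such per-channel window features would form the input sequence that the recurrent layers map to continuous joint angles.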
|
|
16:30-18:00, Paper TuCT15-AX.2 | Add to My Program |
Robot Trajectron: Trajectory Prediction-Based Shared Control for Robot Manipulation |
|
Song, Pinhao | KU Leuven |
Li, Pengteng | Shenzhen University |
Aertbelien, Erwin | KU Leuven |
Detry, Renaud | KU Leuven |
Keywords: Intention Recognition, Probabilistic Inference, Deep Learning Methods
Abstract: We address the problem of (a) predicting the trajectory of an arm reaching motion, based on a few seconds of the motion's onset, and (b) leveraging this predictor to facilitate shared-control manipulation tasks, easing the cognitive load of the operator by assisting them in their anticipated direction of motion. Our novel intent estimator, dubbed the Robot Trajectron (RT), produces a probabilistic representation of the robot's anticipated trajectory based on its recent position, velocity and acceleration history. Taking arm dynamics into account allows RT to capture the operator's intent better than other SOTA models that only use the arm's position, making it particularly well-suited to assist in tasks where the operator's intent is susceptible to change. We derive a novel shared-control solution that combines RT's predictive capacity with a representation of the locations of potential reaching targets. Our experiments demonstrate RT's effectiveness in both intent estimation and shared-control tasks. We will make the code and data supporting our experiments publicly available at https://gitlab.kuleuven.be/detry-lab/public/robot-trajectron.
|
|
16:30-18:00, Paper TuCT15-AX.3 | Add to My Program |
On the Feasibility of EEG-Based Motor Intention Detection for Real-Time Robot Assistive Control |
|
Choi, Ho Jin | University of Pennsylvania |
Das, Satyajeet | University of Pennsylvania |
Peng, Shaoting | University of Pennsylvania |
Bajcsy, Ruzena | Univ of California, Berkeley |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Brain-Machine Interfaces, Intention Recognition, Physically Assistive Devices
Abstract: This paper explores the feasibility of employing EEG-based intention detection for real-time robot assistive control. We focus on predicting and distinguishing motor intentions of left/right arm movements by presenting: i) an offline data collection and training pipeline, used to train a classifier for left/right motion intention prediction, and ii) an online real-time prediction pipeline leveraging the trained classifier and integrated with an assistive robot. Central to our approach is a rich feature representation composed of the tangent space projection of time-windowed sample covariance matrices from EEG filtered signals and derivatives; allowing for a simple SVM classifier to achieve unprecedented accuracy and real-time performance. In pre-recorded real-time settings (160 Hz), a peak accuracy of 86.88% is achieved, surpassing prior works. In robot-in-the-loop settings, our system successfully detects intended motion solely from EEG data with 70% accuracy, triggering a robot to execute an assistive task. We provide a comprehensive evaluation of the proposed classifier.
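The tangent-space feature map can be illustrated for a 2×2 covariance matrix: take the matrix logarithm of the symmetric positive-definite (SPD) matrix (here projected at the identity reference point for brevity; the full method whitens by a reference covariance first) and vectorize its upper triangle with the usual √2 weighting on the off-diagonal. A simplified sketch of the standard Riemannian feature, not the authors' exact pipeline:

```python
import math

def logm_spd2(a, b, c):
    """Matrix logarithm of the 2x2 SPD matrix [[a, b], [b, c]] via
    eigendecomposition: V diag(log l) V^T."""
    if abs(b) < 1e-12:  # already diagonal: log acts entrywise
        return [[math.log(a), 0.0], [0.0, math.log(c)]]
    mean = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
    l1, l2 = mean + rad, mean - rad          # eigenvalues, l1 >= l2 > 0
    n = math.hypot(b, l1 - a)
    u, v = b / n, (l1 - a) / n               # unit eigenvector for l1
    m11 = math.log(l1) * u * u + math.log(l2) * v * v
    m12 = (math.log(l1) - math.log(l2)) * u * v
    m22 = math.log(l1) * v * v + math.log(l2) * u * u
    return [[m11, m12], [m12, m22]]

def tangent_features(cov):
    """Upper-triangle vectorization with sqrt(2) off-diagonal weighting:
    the flat feature vector a linear classifier (e.g. SVM) consumes."""
    m = logm_spd2(cov[0][0], cov[0][1], cov[1][1])
    return [m[0][0], math.sqrt(2) * m[0][1], m[1][1]]

feat = tangent_features([[2.0, 1.0], [1.0, 2.0]])
```

With more channels the same recipe applies to the n×n sample covariance of each EEG window, producing n(n+1)/2 features per window.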
|
|
16:30-18:00, Paper TuCT15-AX.4 | Add to My Program |
Microexpression to Macroexpression: Facial Expression Magnification by Single Input |
|
Song, Yaqi | Southwest University |
Chen, Tong | Southwest University |
Li, Shigang | Hiroshima City University |
Li, Jianfeng | Southwest University |
Keywords: Gesture, Posture and Facial Expressions, Intention Recognition
Abstract: Microexpressions are expressions that people inadvertently express, and therefore often represent a person's true emotion. However, because they have low intensity and short duration, they are hard to recognize correctly. In this paper, we propose a deep learning magnification method to generate macroexpressions from a single microexpression image. In the first stage, we extract the expression information from a single microexpression image. Then, we combine the ideas of CycleGAN and optical-flow consistency to model the extracted expression features as the optical flow field between the neutral face and microexpressions. To extract a reliable optical flow field from the expression information, we design an optical flow refiner. In the second stage, we adopt an encoder-decoder network and let it learn to magnify the optical flow. Finally, the magnified optical flow guides the microexpression images to generate macroexpression images. We compare our single-input-based network with current two-frame-input-based networks. The results show that our method performs better, even on wild images. We fed our magnified images directly into a simple ResNet18 network for recognition, achieving a competitive score under the MEGC2019 standard, compared with recent complex recognition networks.
|
|
16:30-18:00, Paper TuCT15-AX.5 | Add to My Program |
Looking Inside Out: Anticipating Driver Intent from Videos |
|
Kung, Yung-Chi | The University of Texas at Austin |
Zhang, Arthur | University of Texas at Austin |
Wang, Junmin | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Intention Recognition, Intelligent Transportation Systems, Human-Robot Collaboration
Abstract: Anticipating driver intention is an important task when vehicles of mixed and varying levels of human/machine autonomy share roadways. Driver intention can be leveraged to improve road safety, such as warning surrounding vehicles in the event the driver is attempting a dangerous maneuver. In this work, we propose a novel method of utilizing both in-cabin and external camera data to improve state-of-the-art performance in predicting future driver actions. Compared to existing methods, our approach explicitly extracts object and road-level features from external camera data, which we demonstrate are important features for predicting driver intention. Using our handcrafted features as inputs for both a transformer and a long short-term memory (LSTM)-based architecture, we empirically show that jointly utilizing in-cabin and external features improves performance compared to using in-cabin features alone. Furthermore, our models predict driver maneuvers more accurately and sooner than existing approaches, with an accuracy of 87.5% and an average prediction time of 4.35 seconds before the maneuver takes place. We release our model configurations and training scripts on https://github.com/ykung83/Driver-Intent-Prediction.
|
|
16:30-18:00, Paper TuCT15-AX.6 | Add to My Program |
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots |
|
Rivkin, Dmitriy | None |
Kakodkar, Nikhil Rajiv | McGill University |
Hogan, Francois | Massachusetts Institute of Technology |
Hamed Baghi, Bobak | Unaffiliated |
Dudek, Gregory | McGill University |
Keywords: Intention Recognition, AI-Enabled Robotics, Semantic Scene Understanding
Abstract: This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. We focus on following complex instructions that are more akin to natural conversation than traditional explicit procedural directives typically seen in robotics. Unlike most prior work where navigation directives are provided as simple imperative commands (e.g., "go to the fridge"), we examine implicit directives obtained through conversational interactions. We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types. We demonstrate that a robot using our method CARTIER (Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots) can parse descriptive language queries up to 42% more reliably than existing LLM-enabled methods by exploiting the ability of LLMs to interpret the user interaction in the context of the objects in the scenario.
|
|
16:30-18:00, Paper TuCT15-AX.7 | Add to My Program |
A Novel Hybrid Unsupervised Domain Adaptation Method for Cross-Subject Joint Angle Estimation from Surface Electromyography |
|
Wang, Long | Xi'an Jiaotong University |
Li, Xiaoling | Xi'an Jiaotong University |
Chen, Zhangyi | Xi'an Jiaotong University |
Sun, Zhipeng | Xi'an Jiaotong University |
Xue, Jingyi | Xi'an Jiaotong University |
Zhang, Shiwen | Xi’an Jiaotong University |
Sun, Wei | Xi'an Jiaotong University |
Chen, Guimin | Xi'an Jiaotong University |
Sun, Jiajia | Xi'an Jiaotong University |
Keywords: Intention Recognition, Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: Individual physiological differences constrain the cross-user application of joint angle estimation models based on surface electromyography (sEMG) signals. Current cross-user methods for myoelectric joint angle estimation often involve the use of angle or optical sensors, which increases the training burden and costs for new users. To enable new users to perform joint angle estimation solely using sEMG sensors, this study proposes a hybrid unsupervised domain adaptation (UDA) network that combines multi-order metric and adversarial mechanisms (MADAN). MADAN aims to minimize the distribution differences of sEMG signals between the source and target subjects by first aligning the domain distributions and then increasing domain confusion. The effectiveness of MADAN is validated through cross-subject wrist joint angle estimation involving 10 subjects, with an estimation frequency of 20 Hz. The results demonstrate that MADAN achieves a significantly higher average coefficient of determination (0.8688 ± 0.0307) than other advanced UDA cross-subject estimation methods, such as TCDA (0.6534 ± 0.234) and ADANN (0.6655 ± 0.2255). Notably, MADAN requires only 20 seconds of unlabeled samples from the target subject for training. This work is expected to alleviate the cost and training burden for new users performing myoelectric continuous motion estimation.
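For reference, the coefficient of determination used above to report estimation quality can be computed as follows. This is a generic sketch of the standard R² definition, not the authors' evaluation code, and the example angle values are invented:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Perfect prediction gives R^2 = 1; always predicting the mean gives R^2 = 0.
angles = np.array([10.0, 20.0, 30.0, 40.0])
print(r_squared(angles, angles))                     # 1.0
print(r_squared(angles, np.full(4, angles.mean())))  # 0.0
```

An R² of 0.8688 thus means the estimator explains about 87% of the variance in the measured joint angle.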
|
|
16:30-18:00, Paper TuCT15-AX.8 | Add to My Program |
A Novel Benchmarking Paradigm and a Scale and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction |
|
Rasouli, Amir | Huawei Technologies Canada |
Keywords: Intention Recognition, Performance Evaluation and Benchmarking, Intelligent Transportation Systems
Abstract: In this paper, we present a new paradigm for evaluating egocentric pedestrian trajectory prediction algorithms. Based on various contextual information, we extract driving scenarios for a meaningful and systematic approach to identifying challenges for prediction models. In this regard, we also propose a new metric for more effective ranking within the scenario-based evaluation. We conduct extensive empirical studies of existing models on these scenarios to expose shortcomings and strengths of different approaches. The scenario-based analysis highlights the importance of using multimodal sources of information and challenges caused by inadequate modeling of ego-motion and scale of pedestrians. To this end, we propose a novel egocentric trajectory prediction model that benefits from multimodal sources of data fused in an effective and efficient step-wise hierarchical fashion and two auxiliary tasks designed to learn a more robust representation of scene dynamics. We conduct empirical evaluation on common benchmark datasets and show that our model not only achieves state-of-the-art performance, but also significantly improves performance by up to 39% in challenging scenarios, such as high ego-speed, compared to prior art.
|
|
16:30-18:00, Paper TuCT15-AX.9 | Add to My Program |
A 3D Vector Field and Gaze Data Fusion Framework for Hand Motion Intention Prediction in Human-Robot Collaboration |
|
Jayasuriya, Maleen | University of Technology Sydney |
Hu, Gibson | University of Technology, Sydney |
Le, Dinh Dang Khoa | The University of Technology Sydney |
Ang, Karyne | UTS Robotics Institute |
Sankaran, Shankar | UTS Robotics Institute |
Liu, Dikai | University of Technology, Sydney |
Keywords: Intention Recognition, Human-Robot Collaboration, Multi-Modal Perception for HRI
Abstract: In human-robot collaboration (HRC) settings, hand motion intention prediction (HMIP) plays a pivotal role in ensuring prompt decision-making, safety, and an intuitive collaboration experience. Precise and robust HMIP with low computational resources remains a challenge due to the stochastic nature of hand motion and the diversity of HRC tasks. This paper proposes a framework that combines hand trajectories and gaze data to foster robust, real-time HMIP with minimal to no training. A novel 3D vector field method is introduced for hand trajectory representation, leveraging minimum jerk trajectory predictions to discern potential hand motion endpoints. This is statistically combined with gaze fixation data using a weighted Naive Bayes Classifier (NBC). Acknowledging the potential variances in saccadic eye motion due to factors like fatigue or inattentiveness, we incorporate stationary gaze entropy to gauge visual concentration, thereby adjusting the contribution of gaze fixation to the HMIP. Empirical experiments substantiate that the proposed framework robustly predicts intended endpoints of hand motion before at least 50% of the trajectory is completed. It also successfully exploits gaze fixations when the human operator is attentive and mitigates its influence when the operator loses focus. A real-time implementation in a construction HRC scenario (collaborative tiling) showcases the intuitive nature and potential efficiency gains to be leveraged by introducing the proposed HMIP into HRC contexts. The open-source implementation of the framework is made available at https://github.com/maleenj/hmip_ros.git.
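The minimum-jerk prediction idea can be illustrated with the standard fifth-order minimum-jerk profile. Scoring candidate endpoints against an observed partial trajectory, as below, is a deliberately simplified 1-D stand-in for the paper's 3D vector field and Bayesian gaze fusion; all numbers are invented:

```python
import numpy as np

def min_jerk(x0, xf, T, t):
    """Classic minimum-jerk position profile from x0 to xf over duration T."""
    tau = np.clip(t / T, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s

def endpoint_error(observed, t_obs, x0, candidate, T):
    """Mean squared mismatch between the observed partial trajectory and a
    minimum-jerk motion toward a candidate endpoint (smaller is better)."""
    pred = min_jerk(x0, candidate, T, t_obs)
    return float(np.mean((observed - pred) ** 2))

x0, xf, T = 0.0, 1.0, 2.0
t = np.linspace(0.0, 1.0, 20)            # first 50% of the motion
obs = min_jerk(x0, xf, T, t)             # synthetic observation
errs = {c: endpoint_error(obs, t, x0, c, T) for c in (0.5, 1.0, 1.5)}
print(min(errs, key=errs.get))           # the true endpoint, 1.0
```

This matches the abstract's claim in spirit: a minimum-jerk hypothesis can discriminate endpoints well before the trajectory is complete.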
|
|
TuCT16-AX Oral Session, AX-204 |
Add to My Program |
Force and Tactile Sensing III |
|
|
Chair: Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Co-Chair: Liu, Jialun | The University of Sheffield |
|
16:30-18:00, Paper TuCT16-AX.1 | Add to My Program |
Contrastive Learning-Based Attribute Extraction Method for Enhanced Terrain Classification |
|
Liu, Xiao | Harbin Institute of Technology, Shenzhen |
Chen, Hongjin | Harbin Institute of Technology Shenzhen |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Force and Tactile Sensing, Field Robots, Representation Learning
Abstract: The outdoor environment has many uneven surfaces that put the robot at risk of sinking or tipping over. Recognizing the type of terrain can help the robot avoid risks and choose an appropriate gait. One of the critical problems is how to extract terrain-related knowledge from sensor data collected as the robot traverses the ground. Many existing vision-based approaches are limited in directly perceiving the intrinsic properties of various terrains. The intuitive approach entails directly analyzing data recorded by the robot's proprioceptive sensors. However, it faces challenges in being specific to certain robot leg configurations or in the lack of interpretability of the extracted features. In this paper, a terrain attribute extraction algorithm based on contrastive learning is proposed. It leverages the haptic data generated from the interaction between the robot's legs and the terrain to automatically extract terrain attributes. The results demonstrate that the attributes extracted using this method strongly correlate with the actual softness of the terrain. Furthermore, these attributes play an important role in achieving high accuracy in terrain classification tasks.
|
|
16:30-18:00, Paper TuCT16-AX.2 | Add to My Program |
Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing |
|
Liu, Yun | Tsinghua University |
Xu, Xiaomeng | Stanford University |
Chen, Weihang | Tsinghua University |
Yuan, Haocheng | Northwestern Polytechnical University |
Wang, He | Peking University |
Xu, Jing | Tsinghua University |
Chen, Rui | Tsinghua University |
Yi, Li | Tsinghua University |
Keywords: Force and Tactile Sensing, Sensor Fusion, Visual Tracking
Abstract: When manipulating an object to accomplish complex tasks, humans rely on both vision and touch to keep track of the object's 6D pose. However, most existing object pose tracking systems in robotics rely exclusively on visual signals, which hinders a robot's ability to manipulate objects effectively. To address this limitation, we introduce TEG-Track, a tactile-enhanced 6D pose tracking system that can track previously unseen objects held in hand. From consecutive tactile signals, TEG-Track optimizes object velocities from marker flows when slippage does not occur, or regresses velocities using a slippage estimation network when slippage is detected. The estimated object velocities are integrated into a geometric-kinematic optimization scheme to enhance existing visual pose trackers. To evaluate our method and to facilitate future research, we construct a real-world dataset for visual-tactile in-hand object pose tracking. Experimental results demonstrate that TEG-Track consistently enhances state-of-the-art generalizable 6D pose trackers in synthetic and real-world scenarios. Our code and dataset are available at https://github.com/leolyliu/TEG-Track.
|
|
16:30-18:00, Paper TuCT16-AX.3 | Add to My Program |
UnfoldIR: Tactile Robotic Unfolding of Cloth |
|
Proesmans, Remko | Ghent University |
Verleysen, Andreas | Ghent University |
Wyffels, Francis | Ghent University |
Keywords: Force and Tactile Sensing, Sensor-based Control, Dual Arm Manipulation
Abstract: Robotic unfolding of cloth is challenging due to the wide range of textile materials and their ability to deform in unpredictable ways. Previous work has focused almost exclusively on visual feedback to solve this task. We present UnfoldIR ("unfolder"), a dual-arm robotic system relying on infrared (IR) tactile sensing and cloth manipulation heuristics to achieve in-air unfolding of randomly crumpled rectangular textiles by means of edge tracing. The system achieves >85% coverage on multiple textiles of different sizes and textures. After unfolding, at least three corners are visible in 83.3% to 94.7% of cases. Given these strong "tactile-only" results, we argue that the fusion of both tactile and visual sensing can bring cloth unfolding to a new level of performance.
|
|
16:30-18:00, Paper TuCT16-AX.4 | Add to My Program |
3D Force and Contact Estimation for a Soft-Bubble Visuotactile Sensor Using FEM |
|
Peng, Jing-Chen | University of Illinois at Urbana-Champaign |
Yao, Shaoxiong | University of Illinois Urbana-Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: Soft-bubble tactile sensors have the potential to capture dense contact and force information across a large contact surface. However, it is difficult to extract contact forces directly from observing the bubble surface because local contacts change the global surface shape significantly due to membrane mechanics and air pressure. This paper presents a model-based method of reconstructing dense contact forces from the bubble sensor's internal RGBD camera and air pressure sensor. We present a finite element model of the force response of the bubble sensor that uses a linear plane stress approximation that only requires calibrating 3 variables. Our method is shown to reconstruct normal and shear forces significantly more accurately than the state-of-the-art, with comparable accuracy for detecting the contact patch, and with very little calibration data.
|
|
16:30-18:00, Paper TuCT16-AX.5 | Add to My Program |
A Detachable FBG-Based Contact Force Sensor for Capturing Gripper-Vegetable Interactions |
|
Lai, Wenjie | Nanyang Technological University |
Liu, Jiajun | Nanyang Technological University |
Sim, Bing Rui | Nanyang Technological University |
Tan, Ming Rui Joel | Nanyang Technological University |
Hegde, Chidanand | Nanyang Technological University, Singapore |
Magdassi, Shlomo | Hebrew University of Jerusalem |
Phee, Louis | Nanyang Technological University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Agricultural Automation
Abstract: Vertical farming, a sustainable approach to urban agriculture, has garnered attention for its land use optimization and enhanced food production capabilities. The adoption of automation in vertical farming is a pivotal response to labor shortages, addressing the need for increased efficiency, particularly in labor-intensive tasks like harvesting. Although soft robotic grippers offer significant promise for delicately handling fragile objects, the absence of sensors has hindered their full potential to execute precise and secure grasping. To address this challenge, we present a new solution: a detachable Fiber Bragg Grating (FBG)-based flexible contact force sensor to capture gripper-vegetable interactions. The sensing module was 3D printed using soft material, and the FBG fiber was attached to the module using epoxy. In evaluation tests, this lightweight sensor demonstrated a wide measurement range of up to 9.87 N, a high sensitivity of 141.7 pm/N, good repeatability, and a hysteresis of 7.96%. Compared to commercial load cells, our sensor achieves a small measurement RMSE of 0.41 N and a percentage error of 4.15%. The sensor was integrated into two 3D-printed soft robotic grippers to enable real-time monitoring of dynamic contact force during vegetable harvesting in vertical farming scenarios. By reflecting contact status, this sensor provides a promising glimpse into the future of agricultural automation, enhancing operational efficiency and strengthening situational awareness and decision-making capabilities in vertical farms. Beyond agriculture, the versatility of this sensor extends to applications in areas such as warehousing, logistics, and the food and beverage industry.
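Using the calibration figures reported above (141.7 pm/N sensitivity, 9.87 N range), a linear FBG force readout can be sketched as follows. The linearity assumption and the sample wavelength shifts are illustrative, not taken from the paper:

```python
def force_from_shift(delta_lambda_pm, sensitivity_pm_per_N=141.7):
    """Linear FBG calibration: contact force (N) from the Bragg wavelength
    shift (pm), using the sensitivity reported in the abstract."""
    return delta_lambda_pm / sensitivity_pm_per_N

# A 141.7 pm shift maps to 1 N; roughly 1399 pm corresponds to the
# sensor's 9.87 N upper range.
print(round(force_from_shift(141.7), 3))   # 1.0
print(round(force_from_shift(1398.6), 2))  # 9.87
```

In practice an FBG readout would also compensate for temperature-induced wavelength shifts, which this one-parameter sketch ignores.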
|
|
16:30-18:00, Paper TuCT16-AX.6 | Add to My Program |
SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear |
|
Song, Ziwu | Tsinghua University |
Yu, Ran | Tsinghua University |
Zhang, Xuan | Tsinghua University |
Sou, Kit Wa | Tsinghua University |
Mu, Shilong | Tsinghua University |
Peng, Dengfeng | Shenzhen University |
Zhang, Xiao-Ping | Ryerson University |
Ding, Wenbo | Tsinghua University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Multi-Modal Perception for HRI
Abstract: Most vision-based tactile sensors use elastomer deformation to infer tactile information, which cannot sense some modalities, like temperature. As an important part of human tactile perception, temperature sensing can help robots better interact with the environment. In this work, we propose a novel multi-modal vision-based tactile sensor, SATac, which can simultaneously perceive information on temperature, pressure, and shear. SATac utilizes the thermoluminescence of strontium aluminate to sense a wide range of temperatures with exceptional resolution. Additionally, the pressure and shear can also be perceived by analyzing the Voronoi diagram. A series of experiments are conducted to verify the performance of our proposed sensor. We also discuss the possible application scenarios and demonstrate how SATac could benefit robot perception capabilities.
|
|
16:30-18:00, Paper TuCT16-AX.7 | Add to My Program |
Optimizing Multi-Touch Textile and Tactile Skin Sensing through Circuit Parameter Estimation |
|
Su, Bo Ying | Carnegie Mellon University |
Wu, Yuchen | Carnegie Mellon University |
Wen, Chengtao | Siemens |
Liu, Changliu | Carnegie Mellon University |
Keywords: Force and Tactile Sensing, Soft Sensors and Actuators, Touch in HRI
Abstract: Tactile and textile skin technologies have become increasingly important for enhancing human-robot interaction and allowing robots to adapt to different environments. Despite notable advancements, there are ongoing challenges in skin signal processing, particularly in achieving both accuracy and speed in dynamic touch sensing. This paper introduces a new framework that poses the touch sensing problem as an estimation problem for resistive sensor arrays. Utilizing a Regularized Least Squares objective function, which estimates the resistance distribution of the skin, we enhance touch sensing accuracy and mitigate ghosting effects, where false or misleading touches may be registered. Furthermore, our study presents a streamlined skin design that simplifies manufacturing processes without sacrificing performance. Experimental outcomes substantiate the effectiveness of our method, showing a 26.9% improvement in multi-touch force-sensing accuracy for the tactile skin.
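A regularized least squares estimate of the kind the abstract describes can be sketched as a ridge regression. The readout matrix and touch pattern below are synthetic stand-ins, not the paper's circuit model of the skin:

```python
import numpy as np

def regularized_least_squares(A, b, lam=1e-3):
    """Ridge solution of x = argmin ||Ax - b||^2 + lam * ||x||^2,
    via the normal equations (A^T A + lam I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Toy 2x2 taxel grid: recover per-taxel values from 8 aggregate readouts.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))            # stand-in readout model
x_true = np.array([0.0, 1.0, 0.0, 0.5])    # two simultaneous touches
b = A @ x_true                             # noiseless measurements
x_hat = regularized_least_squares(A, b, lam=1e-6)
print(np.round(x_hat, 3))                  # ~ [0.  1.  0.  0.5]
```

The regularizer is what suppresses spurious "ghost" solutions when the readout matrix is ill-conditioned, which is the gist of the accuracy improvement claimed above.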
|
|
16:30-18:00, Paper TuCT16-AX.8 | Add to My Program |
CushSense: Soft, Stretchable, and Comfortable Tactile Sensing Skin for Physical Human-Robot Interaction |
|
Xu, Boxin | Cornell University |
Zhong, Luoyan | Cornell University |
Zhang, Grace | Cornell University |
Liang, Xiaoyu | Cornell University |
Virtue, Diego | Cornell University |
Madan, Rishabh | Cornell University |
Bhattacharjee, Tapomayukh | Cornell University |
Keywords: Force and Tactile Sensing, Touch in HRI, Physical Human-Robot Interaction
Abstract: Whole-arm tactile feedback is crucial for robots to ensure safe physical interaction with their surroundings. In this paper, we introduce CushSense, a fabric-based soft and stretchable tactile-sensing skin designed for physical human-robot interaction (pHRI) tasks such as robotic caregiving. Utilizing a combination of stretchable fabric and hyper-elastic polymer, CushSense identifies contacts by monitoring capacitive changes due to skin deformation. CushSense is cost-effective and easy to fabricate. We detail the sensor design and fabrication process, provide characterization results showcasing its sensing proficiency, and present a user study underscoring its perceived safety and comfort for the assistive task of limb manipulation. We open source all sensor-related resources on our project website.
|
|
16:30-18:00, Paper TuCT16-AX.9 | Add to My Program |
Augmenting Tactile Simulators with Real-Like and Zero-Shot Capabilities |
|
Azulay, Osher | Tel Aviv University |
Mizrahi, Alon | Tel-Aviv University |
Curtis, Nimrod | Tel-Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Force and Tactile Sensing, Transfer Learning, Perception for Grasping and Manipulation
Abstract: Simulating tactile perception could potentially leverage the learning capabilities of robotic systems in manipulation tasks. However, the reality gap of simulators for high-resolution tactile sensors remains large. Models trained on simulated data often fail in zero-shot inference and require fine-tuning with real data. In addition, work on high-resolution sensors commonly focuses on those with flat surfaces, while 3D round sensors are essential for dexterous manipulation. In this paper, we propose a bi-directional Generative Adversarial Network (GAN) termed SightGAN. SightGAN builds on the original CycleGAN while adding two loss components aimed at accurately reconstructing background and contact patterns, including small contact traces. The proposed SightGAN learns real-to-sim and sim-to-real processes over difference images. It is shown to generate real-like synthetic images while maintaining accurate contact positioning. The generated images can be used to train zero-shot models for newly fabricated sensors. Consequently, the resulting sim-to-real generator could be built on top of the tactile simulator to provide a real-world framework. Potentially, the framework can be used to train, for instance, reinforcement learning policies for manipulation tasks. The proposed model is verified in extensive experiments with test data collected from real sensors and is also shown to maintain embedded force information within the tactile images.
|
|
TuCT17-AX Oral Session, AX-205 |
Add to My Program |
Legged Robots III |
|
|
Chair: Yan, Cong | Ritsumeikan University |
Co-Chair: Lin, Pei-Chun | National Taiwan University |
|
16:30-18:00, Paper TuCT17-AX.1 | Add to My Program |
Investigating Stability Outcomes across Diverse Gait Patterns in Quadruped Robots: A Comparative Analysis |
|
Ju, Zhongjin | Yanshan University |
Wei, Ke | Yanshan University |
Jin, Lei | Yanshan University |
Xu, Yundou | Parallel Robot and Mechatronic System Laboratory of Hebei Province |
Keywords: Legged Robots, Methods and Tools for Robot System Design, Motion Control
Abstract: Quadruped robots have gained attention for their potential to navigate various terrains. However, the stability of these robots under different gait sequences remains an open question. This study investigates the relationship between gait sequences and the motion stability of quadruped robots, assuming flat terrain for the purposes of the analysis. Utilizing mathematical models based on screw theory, we examine the stability margins associated with different leg movement sequences. Notably, our findings confirm that the sequence most commonly observed in both natural and robotic contexts indeed offers optimal stability. The study also scrutinizes the influence of the robot's structural parameters and gait configuration on its motion stability. These results provide a theoretical foundation for the design and stability control of quadruped robots, setting the stage for future work on more complex terrains.
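As a concrete illustration of the kind of quantity such an analysis evaluates, the classic static stability margin is the minimum distance from the center of mass's ground projection to the edges of the support polygon. This toy sketch uses made-up foot positions and is far simpler than the paper's formulation:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Euclidean distance from 2D point p to segment ab."""
    p, a, b = map(np.asarray, (p, a, b))
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def stability_margin(com_xy, support_polygon):
    """Minimum distance from the CoM ground projection to the support
    polygon edges; assumes com_xy lies inside the (ordered) polygon."""
    n = len(support_polygon)
    return min(point_segment_dist(com_xy, support_polygon[i],
                                  support_polygon[(i + 1) % n])
               for i in range(n))

# Three feet in stance form a support triangle; CoM at its centroid.
feet = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
com = (1.0, 2.0 / 3.0)
print(round(stability_margin(com, feet), 3))   # 0.596
```

Comparing this margin across the feasible leg-lift sequences of a gait is one simple way to rank them by static stability.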
|
|
16:30-18:00, Paper TuCT17-AX.2 | Add to My Program |
Pedipulate: Enabling Manipulation Skills Using a Quadruped Robot's Leg |
|
Arm, Philip | ETH Zurich |
Mittal, Mayank | ETH Zurich |
Kolvenbach, Hendrik | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Legged Robots, Mobile Manipulation, Reinforcement Learning
Abstract: Legged robots have the potential to become vital in maintenance, home support, and exploration scenarios. In order to interact with and manipulate their environments, most legged robots are equipped with a dedicated robot arm, which means additional mass and mechanical complexity compared to standard legged robots. In this work, we explore pedipulation - using the legs of a legged robot for manipulation. By training a reinforcement learning policy that tracks position targets for one foot, we enable a dedicated pedipulation controller that is robust to disturbances, has a large workspace through whole-body behaviors, and can reach far-away targets with gait emergence, enabling loco-pedipulation. By deploying our controller on a quadrupedal robot using teleoperation, we demonstrate various real-world tasks such as door opening, sample collection, and pushing obstacles. We demonstrate load carrying of more than 2.0 kg at the foot. Additionally, the controller is robust to interaction forces at the foot, disturbances at the base, and slippery contact surfaces. Videos of the experiments are available at https://sites.google.com/leggedrobotics.com/pedipulate.
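A minimal sketch of the kind of foot position-target tracking objective described above: the Gaussian reward shape and the σ value are common choices in legged-robot reinforcement learning, not taken from the paper:

```python
import numpy as np

def foot_tracking_reward(p_foot, p_target, sigma=0.2):
    """Gaussian-kernel position-tracking reward: 1 at the target,
    decaying smoothly with squared distance (sigma in meters)."""
    d2 = float(np.sum((np.asarray(p_foot) - np.asarray(p_target)) ** 2))
    return float(np.exp(-d2 / sigma**2))

target = [0.3, 0.0, 0.4]
print(foot_tracking_reward([0.3, 0.0, 0.4], target))   # 1.0 at the target
print(foot_tracking_reward([0.5, 0.0, 0.4], target) < 1.0)  # True
```

A smooth, bounded reward like this gives the policy gradient signal everywhere in the workspace, which is one reason such kernels are popular for target-reaching tasks.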
|
|
16:30-18:00, Paper TuCT17-AX.3 | Add to My Program |
An Efficient Model Based Approach on Learning Agile Motor Skills without Reinforcement |
|
Shi, Haojie | Chinese University of Hong Kong |
Li, Tingguang | The Chinese University of Hong Kong |
Zhu, Qingxu | Tencent |
Sheng, Jiapeng | Shandong University |
Han, Lei | Tencent Robotics X |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Legged Robots, Model Learning for Control, Imitation Learning
Abstract: Learning-based methods have improved the locomotion skills of quadruped robots through deep reinforcement learning. However, the sim-to-real gap and low sample efficiency still limit skill transfer. To address this issue, we propose a model-based supervised learning framework that combines a world model with a policy network. We train a differentiable world model to predict future states and use it to train a Variational Autoencoder (VAE)-based policy network through supervised learning to imitate the natural behavior of real animals. This approach significantly diminishes the requirement for large amounts of real interaction data, since it solely focuses on training the world model while allowing rapid policy updates through supervised learning. We also develop a high-level network to track diverse commands and trajectories. We initially train the policy within a simulation environment and subsequently fine-tune it using a physical robot. Simulated results show a tenfold increase in sample efficiency compared to reinforcement learning methods such as PPO. Transitioning to real-world testing, our policy achieves proficient command-following performance with only a two-minute data collection period and generalizes well to new speeds and paths.
|
|
16:30-18:00, Paper TuCT17-AX.4 | Add to My Program |
Adaptive Model Predictive Control with Data-Driven Error Model for Quadrupedal Locomotion |
|
Zeng, Xuanqi | Chinese University of Hong Kong |
Zhang, Hongbo | The Chinese University of Hong Kong |
Yue, Linzhu | The Chinese University of Hong Kong |
Song, Zhitao | The Chinese University of Hong Kong |
Zhang, Lingwei | Hong Kong Centre for Logistics Robotics |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Legged Robots, Motion Control, Dynamics
Abstract: Model Predictive Control (MPC) relies heavily on the robot model for its control law. However, a gap always exists between the reduced-order control model, with its uncertainties, and the real robot, which degrades performance. To address this issue, we propose a controller that integrates a data-driven error model into traditional MPC for quadruped robots. Our approach leverages real-world sensor data to compensate for defects in the control model. Specifically, we employ the Autoregressive Moving Average Vector (ARMAV) model to construct a state error model of the quadruped robot from data. The predicted state errors are then used to adjust the future robot states predicted by MPC. With this approach, the proposed controller provides more accurate inputs to the system, enabling it to achieve desired states even in the presence of inaccurate model parameters or disturbances. The proposed controller can partially eliminate the disparity between the model and the real-world robot, thereby enhancing the locomotion performance of quadruped robots. We validate our method through simulations and real-world experiments on a large quadruped robot carrying a 20 kg unmodeled payload (84% of its body weight).
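To illustrate the idea of compensating a reduced-order control model with a data-driven error model, here is a deliberately simplified scalar sketch that fits an AR(1) error model in place of the paper's ARMAV model; the dynamics and the "unmodeled payload" disturbance are invented for the example:

```python
import numpy as np

# Nominal (reduced-order) model misses a constant disturbance.
def nominal_step(x):
    return 0.9 * x

def true_step(x):
    return 0.9 * x + 0.05   # stand-in for an unmodeled payload effect

# Collect one-step prediction errors, as if from onboard sensors.
x = 1.0
errors = []
for _ in range(50):
    x_next = true_step(x)
    errors.append(x_next - nominal_step(x))
    x = x_next

# Fit a scalar AR(1) error model e_{k+1} = a * e_k + c by least squares.
e = np.array(errors)
E = np.column_stack([e[:-1], np.ones(len(e) - 1)])
a, c = np.linalg.lstsq(E, e[1:], rcond=None)[0]

# Corrected prediction = nominal model + predicted error.
e_pred = a * e[-1] + c
x_pred = nominal_step(x) + e_pred
print(round(x_pred, 4), round(true_step(x), 4))   # nearly identical
```

In an MPC setting, the same correction would be applied to each predicted state along the horizon before solving the optimal control problem.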
|
|
16:30-18:00, Paper TuCT17-AX.5 | Add to My Program |
Layered Control for Cooperative Locomotion of Two Quadrupedal Robots: Centralized and Distributed Approaches |
|
Kim, Jeeseop | Caltech |
Fawcett, Randall | Virginia Polytechnic Institute and State University |
Kamidi, Vinay | Virginia Tech |
Ames, Aaron | Caltech |
Akbari Hamed, Kaveh | Virginia Tech |
Keywords: Legged Robots, Motion Control, Optimization and Optimal Control, Multi-Contact Whole-Body Motion Planning and Control
Abstract: This paper presents a layered control approach for real-time trajectory planning and control of robust cooperative locomotion by two holonomically constrained quadrupedal robots. A novel interconnected network of reduced-order models, based on the single rigid body (SRB) dynamics, is developed for trajectory planning purposes. At the higher level of the control architecture, two different model predictive control (MPC) algorithms are proposed to address the optimal control problem of the interconnected SRB dynamics: centralized and distributed MPCs. The distributed MPC assumes two local quadratic programs that share their optimal solutions according to a one-step communication delay and an agreement protocol. At the lower level of the control scheme, distributed nonlinear controllers are developed to impose the full-order dynamics to track the prescribed reduced-order trajectories generated by MPCs. The effectiveness of the control approach is verified with extensive numerical simulations and experiments for the robust and cooperative locomotion of two holonomically constrained A1 robots with different payloads on variable terrains and in the presence of disturbances. It is shown that the distributed MPC has a performance similar to that of the centralized MPC, while the computation time is reduced significantly.
|
|
16:30-18:00, Paper TuCT17-AX.6 | Add to My Program |
RL + Model-Based Control: Using On-Demand Optimal Control to Learn Versatile Legged Locomotion |
|
Kang, Dongho | ETH Zurich |
Cheng, Jin | ETH Zurich |
Zamora Mora, Miguel Angel | ETH Zurich |
Zargarbashi, Fatemeh | ETH Zurich |
Coros, Stelian | ETH Zurich |
Keywords: Legged Robots, Motion Control, Reinforcement Learning
Abstract: This letter presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for the RL policy to imitate, leading to the development of robust control policies that can be learned reliably. Furthermore, by utilizing realistic simulation data that captures whole-body dynamics, RL effectively overcomes the inherent limitations in reference motions imposed by modeling simplifications. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method showcases its capability to generalize reference motions and effectively handle more complex locomotion tasks that may pose challenges for the simplified model, thanks to RL's flexibility. Additionally, our framework effortlessly supports the training of control policies for robots with diverse dimensions, eliminating the necessity for robot-specific adjustments in the reward function and hyperparameters.
|
|
16:30-18:00, Paper TuCT17-AX.7 | Add to My Program |
Generation of Steady Wheel Gait for Planar X-Shaped Walker with Reaction Wheel |
|
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Sedoguchi, Taiki | Japan Advanced Institute of Science and Technology |
Yan, Cong | Ritsumeikan University |
Keywords: Legged Robots, Motion Control, Underactuated Robots
Abstract: This paper addresses the problem of realizing a novel robotic bipedal locomotion called wheel gait, which is achieved by rotating the stance and swing legs in the same direction. First, a model of a planar 3-DOF X-shaped walker with a reaction wheel is introduced, and its mathematical equations are described. Second, the condition for stabilizing the zero dynamics is formulated as requiring that the time integral of the control input to the reaction wheel over one step be zero, and a control system achieving this is designed based on continuous-time output deadbeat control. Third, a typical steady wheel gait of the linearized model is numerically generated, and its extension to the nonlinear model is discussed. Although the nonlinear model is nonlinear only in the gravity term, numerical simulations show a significant gap between it and the linearized model. Through analysis of typical nonlinear wheel gaits, the difficulty of achieving the same walking speed as the linearized model is discussed.
|
|
16:30-18:00, Paper TuCT17-AX.8 | Add to My Program |
Trajectory Optimization Strategy That Considers Body Tip-Over Stability, Limb Dynamics, and Motion Continuity in Legged Robots |
|
Lu, Kuan-Lun | National Taiwan University |
Chang, I-Chia | Purdue University |
Yu, Wei-Shun | National Taiwan University |
Lin, Pei-Chun | National Taiwan University |
Keywords: Legged Robots, Multi-Contact Whole-Body Motion Planning and Control, Optimization and Optimal Control
Abstract: We propose a limb trajectory planning method that considers both body and limb dynamics in robots, particularly suitable for those with non-trivial limb mass. To simplify the complexity and computation cost of using the full-body dynamics of the limbs, a reduced-order model that can simulate the dynamic characteristics of the original limb is proposed. The performance of the model is experimentally validated using an exemplary single leg-wheel of the leg-wheel transformable robot. The limb trajectory optimization is developed using a genetic algorithm that considers many aspects, including body and limb dynamics, limb workspace, limb motion continuity, body tip-over stability, and power consumption. The performance of the proposed limb trajectory planning strategy is experimentally validated using the same leg-wheel transformable robot, and the results confirm the effectiveness of the strategy.
|
|
TuCT18-AX Oral Session, AX-206 |
Add to My Program |
Motion Control III |
|
|
Chair: Matsubara, Takamitsu | Nara Institute of Science and Technology |
Co-Chair: Ott, Christian | TU Wien |
|
16:30-18:00, Paper TuCT18-AX.1 | Add to My Program |
Hybrid Force-Impedance Control for Fast End-Effector Motions |
|
Iskandar, Maged | German Aerospace Center - DLR |
Ott, Christian | TU Wien |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Siciliano, Bruno | Univ. Napoli Federico II |
Dietrich, Alexander | German Aerospace Center (DLR) |
Keywords: Compliance and Impedance Control, Force Control, Motion Control
Abstract: Controlling the contact force on various surfaces is essential in many robotic applications, such as service tasks or industrial use cases. Classical impedance and hybrid motion-force control approaches are mostly employed for these kinds of physical interaction scenarios. In this work, an extended Cartesian impedance control algorithm is developed that includes geometrical constraints and enables explicit force tracking in a hybrid manner. The unified framework features compliant behavior in the free (motion) task directions and explicit force tracking in the constrained directions. Advantageously, the force subspace in the contact direction is fully dynamically decoupled from the motion subspace. Experimental validation with a torque-controlled robotic manipulator on both flat and curved surfaces demonstrates the performance during highly dynamic desired trajectories and confirms the theoretical claims of the approach.
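The hybrid decomposition this abstract describes — explicit force tracking in the constrained directions, impedance behavior in the free (motion) directions — can be sketched with a selection matrix. This is an illustrative sketch under assumed names and gains (`hybrid_wrench`, `kf`), not the authors' controller:

```python
import numpy as np

def hybrid_wrench(S, f_des, f_meas, x_err, xd_err, K, D, kf):
    """Illustrative hybrid force-impedance law (not the paper's algorithm).

    S selects the constrained (force-controlled) task directions;
    the complementary projector (I - S) spans the free (motion)
    directions, where a compliant spring-damper law acts.
    """
    I = np.eye(S.shape[0])
    # Explicit force tracking in the constrained subspace.
    f_cmd = S @ (f_des + kf * (f_des - f_meas))
    # Cartesian impedance behavior in the motion subspace.
    f_imp = (I - S) @ (K @ x_err + D @ xd_err)
    return f_cmd + f_imp
```

Because `S` and `I - S` are complementary projectors, the force and motion terms act in dynamically separate subspaces, mirroring the decoupling claimed in the abstract.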
|
|
16:30-18:00, Paper TuCT18-AX.2 | Add to My Program |
A Reinforcement Learning-Based Control Strategy for Robust Interaction of Robotic Systems with Uncertain Environments |
|
Sacerdoti, Diletta | University of Modena and Reggio-Emilia |
Benzi, Federico | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Compliance and Impedance Control, Force Control, Reinforcement Learning
Abstract: In the context of interaction with unmodelled systems, it becomes imperative for a robot controller to possess the capability to dynamically adjust its actions in real time, enhancing its resilience in the face of fluctuating environmental conditions. This adaptation process must be performed in a stability-preserving fashion and must resourcefully exploit the knowledge acquired during the interaction process. In this article, we propose a novel control strategy based on the synergistic usage of state-of-the-art passivity-based control and Deep Reinforcement Learning (DRL). The concept of an energy tank is used to provide stability guarantees for the interaction controller in uncertain environments, while an online learning policy makes it possible to properly estimate the requirements of the task and adapt the controller accordingly, thus achieving stability and performance simultaneously. The proposed architecture is successfully validated through simulations and experiments with a collaborative manipulator in a surface polishing task.
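The energy-tank mechanism this abstract relies on can be sketched in a few lines: the learned action is permitted only while the tank holds enough energy, which is what preserves passivity. A minimal scalar sketch with assumed names and a simplified power balance, not the authors' implementation:

```python
def tank_step(E, u_des, xdot, dt, E_min=0.1):
    """One update of an energy-tank gate (illustrative sketch).

    E      : current tank energy [J]
    u_des  : force the (learned) policy wants to apply
    xdot   : end-effector velocity along that force
    The power extracted from the tank is P = u_des * xdot; the
    action is allowed only while the tank stays above E_min.
    """
    P = u_des * xdot            # power drawn from (or fed to) the tank
    if E - P * dt < E_min:      # tank nearly depleted: gate the action
        return E, 0.0
    return E - P * dt, u_des
```

In a full controller the gate is usually smooth rather than on/off, and dissipated energy refills the tank; both refinements keep the same passivity argument.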
|
|
16:30-18:00, Paper TuCT18-AX.3 | Add to My Program |
Stiffness-Based Hybrid Motion/Force Control for Cable-Driven Serpentine Manipulator |
|
Li, Wenshuo | Harbin Institute of Technology |
Xu, Wenfu | Harbin Institute of Technology, Shenzhen |
Huang, Peisheng | Harbin Institute of Technology, Shenzhen |
Lin, Boyang | Harbin Institute of Technology |
Liang, Bin | Tsinghua University |
Keywords: Force Control, Integrated Planning and Control, Motion Control
Abstract: In recent years, there has been a growing demand for robotic manipulators to perform tasks in various unstructured environments and situations requiring precision and force control. However, traditional robotic arms have limitations in fully leveraging their advantages in such scenarios. To address this demand, we have designed a cable-driven serpentine manipulator (CDSM) that combines force and precision motion control. This control method allows for precise manipulation of forces and torques at the end-effector, particularly in applications like electric vehicle charging and narrow-space exploration. It also enables independent control in multiple configurations. We achieve force-position hybrid control in task space, ensuring accurate control of end-effector force while achieving precise position control in other directions. Additionally, we implement joint angle closed-loop control in joint space to reduce the impact of cable elasticity deformation and friction on joint motion accuracy. Finally, servo control is applied at the lowest motor level. This paper investigates the modeling, sensing, and control of CDSM within a unified framework of hybrid motion/force control. Through experiments and simulations, we demonstrate the high accuracy and practicality of this control method in various scenarios.
|
|
16:30-18:00, Paper TuCT18-AX.4 | Add to My Program |
Reinforcement Learning for Reduced-Order Models of Legged Robots |
|
Chen, Yu-Ming | University of Pennsylvania |
Bui, Hien | University of Pennsylvania |
Posa, Michael | University of Pennsylvania |
Keywords: Model Learning for Control, Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: Model-based approaches for planning and control of bipedal locomotion have a long history of success. They can provide stability and safety guarantees while being effective in accomplishing many locomotion tasks. Model-free reinforcement learning, on the other hand, has gained much popularity in recent years due to computational advancements. It can achieve high performance in specific tasks, but it lacks physical interpretability and flexibility in re-purposing the policy for a different set of tasks. For instance, we can initially train a neural network (NN) policy using velocity commands as inputs. However, to handle new task commands like desired hand or footstep locations at a desired walking velocity, we must retrain a new NN policy. In this work, we attempt to bridge the gap between these two bodies of work on a bipedal platform. We formulate a model-based reinforcement learning problem to learn a reduced-order model (ROM) within a model predictive control (MPC) framework. Results show a 49% improvement in viable task region size and a 21% reduction in motor torque cost. All videos and code are available at https://sites.google.com/view/ymchen/research/rl-for-roms.
|
|
16:30-18:00, Paper TuCT18-AX.5 | Add to My Program |
K-BMPC: Derivative-Based Koopman Bilinear Model Predictive Control for Tractor-Trailer Trajectory Tracking with Unknown Parameters |
|
Wang, Zehao | Shanghai Jiao Tong University |
Zhang, Han | Shanghai Jiao Tong University |
Wang, Jingchuan | Shanghai Jiao Tong University |
Keywords: Model Learning for Control, Motion Control
Abstract: Nonlinear dynamics complicate controller design for control-affine systems such as tractor-trailer vehicles, especially when the parameters in the dynamics are unknown. To address this, we propose a derivative-based lifting function construction method and show that the corresponding infinite-dimensional Koopman bilinear model over the lifting functions is equivalent to the original control-affine system. Further, we analyze the propagation and bounds of state prediction errors caused by truncation of the derivative order. The identified finite-dimensional Koopman bilinear model then serves as the predictive model. Koopman Bilinear Model Predictive Control (K-BMPC) is proposed to solve the trajectory tracking problem. We linearize the bilinear model around the estimate of the lifted state and control input, so that the bilinear model predictive control problem is approximated by a quadratic program; the estimate is updated at each iteration until convergence is reached. Moreover, we implement our algorithm on a tractor-trailer system, taking into account longitudinal and side-slip effects. Open-loop simulation shows that the proposed Koopman bilinear model captures the dynamics with unknown parameters and has good prediction performance. Closed-loop tracking results show that the proposed K-BMPC exhibits elevated tracking precision with commendable computational efficiency. The experimental results demonstrate the feasibility of K-BMPC.
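A finite-dimensional Koopman bilinear predictor of the kind identified here has the form z+ = A z + B u + Σ_i u_i H_i z; linearizing it around a lifted-state estimate is what reduces the MPC problem to a quadratic program. A minimal sketch of the one-step predictor (the matrix names are assumptions, and in practice A, B, and H_i are identified from data):

```python
import numpy as np

def koopman_bilinear_step(z, u, A, B, H):
    """One-step prediction with a finite-dimensional Koopman bilinear
    model (illustrative sketch):
        z_next = A @ z + B @ u + sum_i u[i] * (H[i] @ z)
    z : lifted state, u : control input, H : list of bilinear matrices.
    """
    z_next = A @ z + B @ u
    for i, Hi in enumerate(H):
        z_next += u[i] * (Hi @ z)
    return z_next
```

The bilinear terms `u[i] * (H[i] @ z)` are exactly what a plain linear lifted model would discard; K-BMPC keeps them and handles them by iterative linearization.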
|
|
16:30-18:00, Paper TuCT18-AX.6 | Add to My Program |
Learning to Shape by Grinding: Cutting-Surface-Aware Model-Based Reinforcement Learning |
|
Hachimine, Takumi | Nara Institute of Science and Technology |
Morimoto, Jun | ATR Computational Neuroscience Labs |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Model Learning for Control, Reinforcement Learning, Manipulation Planning
Abstract: Object shaping by grinding is a crucial industrial process in which a rotating grinding belt removes material. Object-shape transition models are essential to achieving automation by robots; however, learning such a complex model that depends on process conditions is challenging because it requires a significant amount of data, and the irreversible nature of the removal process makes data collection expensive. This paper proposes a cutting-surface-aware Model-Based Reinforcement Learning (MBRL) method for robotic grinding. Our method employs a cutting-surface-aware model as the object's shape transition model, which in turn is composed of a geometric cutting model and a cutting-surface-deviation model, based on the assumption that the robot action can specify the cutting surface made by the tool. Furthermore, according to the grinding resistance theory, the cutting-surface-deviation model does not require raw shape information, making the model's dimensions smaller and easier to learn than a naive shape transition model directly mapping the shapes. Through evaluation and comparison by simulation and real robot experiments, we confirm that our MBRL method can achieve high data efficiency for learning object shaping by grinding and also provide generalization capability for initial and target shapes that differ from the training data.
|
|
16:30-18:00, Paper TuCT18-AX.7 | Add to My Program |
Adaptive Contact-Implicit Model Predictive Control with Online Residual Learning |
|
Huang, Wei-Cheng | University of Pennsylvania, GRASP Lab |
Aydinoglu, Alp | University of Pennsylvania |
Jin, Wanxin | Arizona State University |
Posa, Michael | University of Pennsylvania |
Keywords: Model Learning for Control, Robust/Adaptive Control, Dexterous Manipulation
Abstract: The hybrid nature of multi-contact robotic systems, due to making and breaking contact with the environment, creates significant challenges for high-quality control. Existing model-based methods typically rely on either good prior knowledge of the multi-contact model or significant offline model tuning effort, resulting in low adaptability and robustness. In this paper, we propose a real-time adaptive multi-contact model predictive control framework, which enables online adaptation of the hybrid multi-contact model and continuous improvement of control performance for contact-rich tasks. This framework includes an adaptation module, which continuously learns a residual of the hybrid model to minimize the gap between the prior model and reality, and a real-time multi-contact MPC controller. We demonstrated the effectiveness of the framework in synthetic examples and applied it on hardware to solve contact-rich manipulation tasks, where a robot uses its end-effector to roll different unknown objects on a table to track given paths. The hardware experiments show that, starting from a rough prior model, the multi-contact MPC controller adapts itself on-the-fly at an adaptation rate of around 20 Hz and successfully manipulates previously unknown objects with non-smooth surface geometries. Accompanying media can be found at: https://sites.google.com/view/adaptive-contact-implicit-mpc/home
|
|
16:30-18:00, Paper TuCT18-AX.8 | Add to My Program |
DRIVE: Data-Driven Robot Input Vector Exploration |
|
Baril, Dominic | Université Laval |
Deschênes, Simon-Pierre | Université Laval |
Coupal, Luc | Université Laval |
Goffin, Cyril | EPFL |
Lépine, Julien | Université Laval |
Giguère, Philippe | Université Laval |
Pomerleau, Francois | Université Laval |
Keywords: Model Learning for Control, Software Tools for Benchmarking and Reproducibility, Field Robots
Abstract: An accurate motion model is a fundamental component of most autonomous navigation systems. While much work has been done on improving model formulation, no standard protocol exists for gathering the empirical data required to train models. In this work, we address this issue by proposing Data-driven Robot Input Vector Exploration (DRIVE), a protocol that enables characterizing uncrewed ground vehicle (UGV) input limits and gathering empirical model training data. We also propose a novel learned slip approach that outperforms similar acceleration learning approaches. Our contributions are validated through an extensive experimental evaluation, accumulating over 7 km and 1.8 h of driving data over three distinct UGVs and four terrain types. We show that our protocol offers increased predictive performance over common human-driven data-gathering protocols. Furthermore, our protocol converges with 46 s of training data, almost four times less than the shortest human data-gathering protocol. We show that the operational limit for our model is reached in extreme slip conditions encountered on surfaced ice. DRIVE is an efficient way of characterizing UGV motion in its operational conditions. Our code and dataset are available online at https://github.com/norlab-ulaval/DRIVE.
|
|
16:30-18:00, Paper TuCT18-AX.9 | Add to My Program |
Redundancy Resolution at Position Level |
|
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Sachtler, Arne | Technical University of Munich (TUM) |
Keywords: Redundant Robots, Dynamics, Kinematics, Compliance and Impedance Control
Abstract: Increasing robotic systems' degrees of freedom (DoFs) makes them more versatile and flexible. This usually renders the system kinematically redundant: the main manipulation or interaction task does not fully determine its joint maneuvers. Additional constraints or objectives are required to solve the underdetermined control and planning problems. State-of-the-art approaches arrange tasks in a hierarchy and decouple lower-priority tasks from higher-priority tasks at the velocity or torque level using projectors. We develop an approach to redundancy resolution and decoupling at the position level by determining subspaces of the configuration space independent of the primary task. We call them orthogonal foliations because they are, in a certain sense, orthogonal to the task self-motion manifolds. The approach provides better insight into the topological properties of robot kinematics and control problems, allowing a global view. A condition for the existence of orthogonal foliations is derived. If the condition is not satisfied, we can still find approximate solutions by numerical optimization. Coordinates can be defined on these orthogonal foliations and used as additional task variables for control. We show in simulations that we can control the system using these coordinates without the need for projectors, and we validate the approach experimentally on a seven-DoF robot.
|
|
TuCT19-NT Oral Session, NT-G301 |
Add to My Program |
Medical Robots III |
|
|
Chair: Arata, Jumpei | Kyushu University |
Co-Chair: Ren, Hongliang | National University of Singapore |
|
16:30-18:00, Paper TuCT19-NT.1 | Add to My Program |
A Force-Driven and Vision-Driven Hybrid Control Method of Autonomous Laparoscope-Holding Robot |
|
Fang, Jin | Hefei University of Technology |
Li, Ling | Hefei University of Technology |
Li, Xiaojian | Hefei University of Technology |
Mo, Hangjie | City University of Hong Kong |
Guo, Pengxin | Hefei University of Technology |
Xiao, Xilin | Hefei University of Technology |
Qu, Yanwei | Hefei University of Technology |
Keywords: Medical Robots and Systems, Human-Robot Collaboration
Abstract: Laparoscope-holding robots significantly enhance the stability and precision of visualization in minimally invasive surgeries. Most existing robots of this kind depend on visual servo systems and struggle with efficient, rapid adjustments in the field-of-view (FOV), especially when identifying organs and needles outside the FOV. This paper presents a laparoscope-holding robot system capable of employing both vision-driven and force-driven mechanisms for continuous and large-scale FOV adjustments, respectively. The system features an integrated tactile handle, enabling the reception of human-robot interaction forces during surgical navigation. We propose a hybrid control method that leverages both force and vision inputs for laparoscopic FOV adjustments. This approach integrates a virtual wrench, generated from visual information, and an interaction wrench, obtained from the tactile handle, into the robot's dynamic model, which complies with remote center of motion constraints. The interaction wrench's gain is adjusted with the gripping force on the integrated tactile handle, ensuring that unintended movements caused by accidental contacts are prevented, thus safeguarding operational safety. The proposed method eliminates the need to switch control modes, enabling simultaneous visual tracking and tactile interaction guidance. Experimental results demonstrate that the proposed method not only allows for FOV adjustments with surgical instrument guiding but also adapts well to large-scale FOV adjustment tasks.
|
|
16:30-18:00, Paper TuCT19-NT.2 | Add to My Program |
Inconstant Curvature Kinematics of Parallel Continuum Robot without Static Model |
|
Zhang, Tao | Chinese University of Hong Kong |
Gao, Huxin | National University of Singapore |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Keywords: Medical Robots and Systems, Kinematics, Flexible Robotics
Abstract: In the study of minimally invasive surgical robots, a mini parallel continuum robot has shown a motion advantage after passing through a long and winding working channel. However, due to the interaction forces between the elastic wires of the parallel robot during motion generation, the constant-curvature assumption introduces modeling errors, making the current geometric kinematic model unreliable. A more accurate kinematic model is therefore needed in the absence of a complicated static model, which this paper aims to provide. A simulation in ANSYS is carried out, and the shape of one of the driving wires during bending is fitted by a two-segment polynomial curve. The position of the distal wrist tip can then be calculated from the curve shape. To verify the accuracy of the proposed model, bending simulations and experiments are carried out, and its accuracy is compared with that of the kinematic model based on the constant-curvature assumption. The results show that the proposed model yields more accurate results, especially as the driving wire displacement increases. For a 10 mm parallel robot, when the displacements of the two pairs of wires are both 3.0 mm, the errors of the two models are 0.42 mm and 5.79 mm (4.2% and 57.9%), respectively.
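The core fitting idea — approximate the bent wire by two joined polynomial segments and read the tip position off the distal segment — can be sketched as follows. The polynomial degrees, the split point, and the function name are illustrative assumptions, not the paper's values:

```python
import numpy as np

def tip_from_two_segment_fit(s, x, split, deg=3):
    """Fit a sampled wire centerline x(s) with two polynomial segments
    joined at arc length `split`, then evaluate the distal tip
    (illustrative of the two-segment fitting idea only).
    s : arc-length samples, x : corresponding coordinate samples.
    """
    m = s <= split
    p_proximal = np.polyfit(s[m], x[m], deg)    # proximal segment fit
    p_distal = np.polyfit(s[~m], x[~m], deg)    # distal segment fit
    return np.polyval(p_distal, s[-1])          # tip at the last sample
```

In the paper the fitted curve shape comes from ANSYS simulation data; here any sampled centerline serves to illustrate the computation.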
|
|
16:30-18:00, Paper TuCT19-NT.3 | Add to My Program |
A Novel SEA-Based Haptic Interface for Robot-Assisted Vascular Interventional Surgery |
|
Yan, Yonggan | Beijing Institute of Technology |
Guo, Shuxiang | Kagawa University |
Lyu, Chuqiao | Beijing Institute of Technology |
Guo, Jian | Shenzhen Institute of Advanced Biomedical Robot Co., Ltd |
Wang, Jian | Shenzhen Institute of Advanced Biomedical Robot Co., Ltd |
Yang, Pengfei | Changhai Hospital |
Zhang, Yongwei | Changhai Hospital |
Zhang, Yongxin | Changhai Hospital |
Liu, Jianmin | Changhai Hospital |
Keywords: Medical Robots and Systems, Mechanism Design, Force Control
Abstract: Robot-assisted vascular interventional surgery can isolate interventionists and X-ray radiation, and improve surgical accuracy. However, the leader side outside the operating room still has problems such as incomplete collection of operating information and unrealistic tactile feedback. The main objective of this paper is to design a haptic interface that can simultaneously capture the force-position information of the interventionists and generate force to assist the interventionists in performing surgeries on the leader side. It can capture the interventionists' delivery displacement, twisting angle, clamping force, and provide real-time force feedback. A leader-follower bidirectional force feedback control strategy was proposed. Based on this strategy, on the one hand, the interventionist perceives the multi-modal information fed back from the follower side, makes judgments, and actively adjusts the surgical operation. On the other hand, the interventionist controls the grasping state of the instruments remotely to control the safety operating force threshold. Finally, the experimental setup was built and a series of evaluation experiments were performed. The experimental results verified the feasibility of the designed haptic interface. It can generate dynamic and accurate force feedback and realize leader-follower grasping force control.
|
|
16:30-18:00, Paper TuCT19-NT.4 | Add to My Program |
A Miniature 1R1T Precision Manipulator with Remote Center of Motion for Minimally Invasive Surgery |
|
Suzuki, Hiroyuki | Sony Computer Science Laboratories, Inc |
Keywords: Medical Robots and Systems, Mechanism Design, Parallel Robots
Abstract: In robotic-assisted minimally invasive surgery, the remote center of motion (RCM) achieves precision and safe manipulation of surgical devices through the insertion point into the patient’s body. One of the RCM configurations, one-rotation and one-translation (1R1T) RCM based on a closed-loop design, enables two-degrees-of-freedom transmission from the proximal end of the robotic arm to the distal end. This feature offers important advantages, particularly in enhancing safety by minimizing physical contact risks with patients or other surgical tools owing to the simplified layout near the surgical field. However, conventional 1R1T RCM robots typically employ complex structures with numerous joints consisting of pin-and-hole mating mechanisms. This complexity can increase the overall size of the robot and compromise motion precision. This study presents a miniature 1R1T precision manipulator with RCM, ORIGANOID (dimensions: W60 x D120 x H30 mm, weight: 12.6 g). The robotic arm features flexure hinges, eliminates clearance issues, and can be fabricated using an origami-inspired robotic approach. Furthermore, using a novel backlash-free coupling method, the robotic arm could be easily attached and detached from the drive units. A prototype was fabricated and experimentally validated. The results demonstrated that high-resolution motion could be achieved within 10 µm. Furthermore, a demonstration using an eyeball model confirmed the successful implementation of 1R1T RCM.
|
|
16:30-18:00, Paper TuCT19-NT.5 | Add to My Program |
A Three-Dimensional Compliant Bowtie-Shaped Mechanical Amplifier to Magnify Coaxial Displacement in a Confined Space |
|
Im, Jintaek | DGIST |
Jang, Eunsil | Daegu Gyeongbuk Institute of Science and Technology |
Song, Cheol | DGIST |
Keywords: Medical Robots and Systems, Mechanism Design
Abstract: This paper proposes a novel form of three-dimensional coaxial bowtie-shaped mechanical amplifier. The proposed model incorporates a lever mechanism into the Sarrus linkage structure, allowing the target plate to move along one axis with amplified displacement in a parallel manner. The amplifier was assembled after machining the components on a 3-axis computer numerical control machine. A flexible hinge was incorporated into the amplifier design for simplified fabrication and reduced friction in the actuation mechanism. Castigliano’s theorem is used to build a mathematical model of the proposed mechanical amplifier, and the performance was validated through finite element analysis and prototype fabrication. We achieved an amplification ratio of ×8.44, resulting in axial displacement of up to 86 µm. The demonstrated amplifier is expected to be applicable to compact microsurgical robots or biomedical imaging apparatus requiring coaxial displacement amplification in confined spaces.
|
|
16:30-18:00, Paper TuCT19-NT.6 | Add to My Program |
A Magnetic Continuum Robot with In-Situ Magnetic Reprogramming Capability |
|
Xue, Junnan | The Chinese University of Hong Kong |
Zhang, Moqiu | The Chinese University of Hong Kong |
Liu, Xurui | The Chinese University of Hong Kong |
Zhu, Jiaqi | Huazhong University of Science and Technology |
Cao, Yanfei | The Chinese University of Hong Kong |
Zhang, Li | The Chinese University of Hong Kong |
Keywords: Medical Robots and Systems, Micro/Nano Robots, Soft Robot Materials and Design
Abstract: Magnetic continuum robots (MCRs) have shown great potential in minimally invasive interventions because they can be actively and remotely navigated through complex in vivo environments. However, the deformation capability of current MCRs is limited by fixed magnetization configurations, preventing them from accessing hard-to-reach areas. This is because, under a global magnetic field, a fixed magnetization configuration exposes the magnets on the MCR to coupled magnetic forces and torques, resulting in a lack of controllable degrees of freedom. Here, we introduce a reprogrammable magnetic continuum robot (RMCR) enabled by magnetic reprogramming modules (MRMs). Actuated by shape memory alloys, the magnetic moment direction of the MRMs can be selectively reprogrammed in real time and in situ. This magnetic reprogramming capability enables the RMCR to achieve complex shape transformations. Results show that the range of motion of the RMCR's tip increases by 193% compared with a regular MCR. Moreover, MRMs on the RMCR can achieve active attraction and separation under simple magnetic fields. The reprogramming process of the RMCR is theoretically investigated. A design methodology for MRMs is then proposed, and the fabrication process of the RMCR is described in detail. Furthermore, a kinematic model of the RMCR is established, simulated, and experimentally validated.
|
|
16:30-18:00, Paper TuCT19-NT.7 | Add to My Program |
Design and Control of a Magnetically-Actuated Anti-Interference Microrobot for Targeted Therapeutic Delivery |
|
Qin, Yanding | Nankai University |
Cai, Zhuocong | Nankai University |
Han, Jianda | Nankai University |
Keywords: Medical Robots and Systems, Micro/Nano Robots
Abstract: This paper proposes an anti-interference targeted therapeutic delivery microrobot, in which targeted therapeutic delivery to a lesion site in the human intestine is operated by an external magnetic field. The robot is composed of a shell and a targeted delivery mechanism. Under actuation by the external magnetic field, the spiral structure on the outer surface of the shell can actively move the robot back and forth in the intestine. The internally embedded targeted delivery mechanism is fixed with a radially magnetized O-type permanent magnet. It not only realizes flexible movement in the fluid environment, but also provides intestinal anchoring and targeted therapeutic delivery against the constant peristalsis of the human intestine. The proposed robot is designed to precisely achieve the intended drug concentration and treatment effect at the lesion site despite intestinal peristalsis. A series of simulations and experiments are conducted to evaluate the feasibility of the developed robot. In the experiments, the robot releases drugs after anchoring at the lesion site. Finally, ex vivo experiments are carried out on fresh porcine intestines.
|
|
16:30-18:00, Paper TuCT19-NT.8 | Add to My Program |
Design and Visual Servoing Control of a Hybrid Dual-Segment Flexible Neurosurgical Robot for Intraventricular Biopsy |
|
Chen, Jian | University of Chinese Academy of Sciences |
Chen, Mingcong | City University of Hong Kong |
Zhao, Qing xiang | Hong Kong Institute of Science & Innovation, Centre for Artifici |
Wang, Shuai | HKPU (The Hong Kong Polytechnic University) |
Wang, Yihe | Centre for Artificial Intelligence and Robotics |
Xiao, Ying | Institute of Automation, Chinese Academy of Sciences |
Hu, Jian | Institute of Automation, Chinese Academy of Sciences |
Chan, Tat-Ming | Prince of Wales Hospital |
Yeung, Kam Tong Leo | Department of Surgery Faculty of Medicine the Chinese University |
Chan, David Yuen Chung | Department of Surgery Faculty of Medicine the Chinese University |
Liu, Hongbin | Hong Kong Institute of Science & Innovation, Chinese Academy Of |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Actuation and Joint Mechanisms
Abstract: Traditional rigid endoscopes struggle to flexibly treat tumors located deep in the brain, and low operability and fixed viewing angles limit their use. This study introduces MicroNeuro, a novel dual-segment flexible robotic endoscope designed to perform biopsies with dexterous surgical manipulation deep in the brain. To account for the uncertainty of the control model, image-based visual servoing with online robot Jacobian estimation has been implemented to enhance motion accuracy. Furthermore, applying model predictive control with constraints significantly bolsters the flexible robot's ability to adaptively track mobile objects and resist external interference. Experimental results underscore that the proposed control system enhances motion stability and precision. Phantom testing substantiates its considerable potential for deployment in neurosurgery.
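Online Jacobian estimation for image-based visual servoing is commonly done with a Broyden rank-one update driven by observed actuation and feature increments; the paper's exact estimator may differ. A minimal sketch:

```python
import numpy as np

def broyden_update(J, dq, ds, alpha=0.5):
    """Broyden rank-one update of an image Jacobian estimate
    (a standard online estimation scheme, shown for illustration).

    J  : current estimate mapping actuation increments dq to
         observed image-feature increments ds
    alpha : step size in (0, 1], trading adaptation speed for noise
            sensitivity
    """
    denom = float(dq @ dq)
    if denom < 1e-12:          # no motion: nothing to learn from
        return J
    # Correct J along the direction of the observed motion only.
    return J + alpha * np.outer(ds - J @ dq, dq) / denom
```

Each update corrects the estimate only along the direction actually excited by `dq`, so the Jacobian converges as the robot explores different motion directions.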
|
|
16:30-18:00, Paper TuCT19-NT.9 | Add to My Program |
Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments Using Fiber Bragg Grating Sensors |
|
Schwarz, Alexander | Johns Hopkins University |
Mehrfard, Arian | Johns Hopkins University |
Amirkhani, Golchehr | Johns Hopkins University |
Phalen, Henry | Johns Hopkins University |
Ma, Justin | Johns Hopkins University |
Grupp, Robert | Johns Hopkins University |
Martin-Gomez, Alejandro | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Keywords: Medical Robots and Systems, Modeling, Control, and Learning for Soft Robots, Flexible Robotics
Abstract: Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and subsequently reconstructing the shape using optimization algorithms. This optimization, however, is under-constrained and may be ill-posed for complex shapes, falling into local minima. In this work, we introduce a novel method capable of directly estimating a CDM's shape from FBG sensor wavelengths using a deep neural network. In addition, we propose the integration of uncertainty estimation to address the critical issue of uncertainty in neural network predictions. Neural network predictions are unreliable when the input sample is outside the training distribution or corrupted by noise. Recognizing such deviations is crucial when integrating neural networks within surgical robotics, as inaccurate estimations can pose serious risks to the patient. We present a robust method that not only improves the precision upon existing techniques for FBG-based shape estimation but also incorporates a mechanism to quantify the models' confidence through uncertainty estimation. We validate the uncertainty estimation through extensive experiments, demonstrating its effectiveness and reliability on out-of-distribution (OOD) data, adding an additional layer of safety and precision to minimally invasive surgical robotics.
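One standard way to obtain the kind of confidence signal this abstract describes is to aggregate several independently trained predictors and treat their disagreement as uncertainty: large spread suggests an out-of-distribution or corrupted input. A hedged sketch of that deep-ensemble idea (the paper's mechanism may differ, e.g. MC dropout):

```python
import numpy as np

def predict_with_uncertainty(models, wavelengths):
    """Illustrative ensemble-style uncertainty estimation.

    models      : list of independently trained predictors, each
                  mapping FBG wavelengths to a shape estimate
    wavelengths : input sensor reading (array)
    Returns the mean prediction and the per-output standard
    deviation; a large std flags low confidence (possibly OOD).
    """
    preds = np.stack([m(wavelengths) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)
```

A downstream safety layer can then reject or down-weight shape estimates whose std exceeds a calibrated threshold, which is the protective behavior the abstract motivates.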
|
|
TuCT20-NT Oral Session, NT-G302 |
Add to My Program |
Search & Rescue Robotics in Fields |
|
|
Chair: Sattar, Junaed | University of Minnesota |
Co-Chair: Tadakuma, Kenjiro | Tohoku University |
|
16:30-18:00, Paper TuCT20-NT.1 | Add to My Program |
Autonomous Robotic Re-Alignment for Face-To-Face Underwater Human-Robot Interaction |
|
Kutzke, Demetrious T. | University of Minnesota - Twin Cities |
Wariar, Ashwin | University of Minnesota Twin-Cities |
Sattar, Junaed | University of Minnesota |
Keywords: Robotics in Hazardous Fields, Human-Aware Motion Planning, Visual Servoing
Abstract: The use of autonomous underwater vehicles (AUVs) to accomplish traditionally challenging and dangerous tasks has proliferated thanks to advances in sensing, navigation, manipulation, and on-board computing technologies. Utilizing AUVs in underwater human-robot interaction (UHRI) has seen comparatively little growth due to limitations in bi-directional communication and significant technical hurdles in bridging the gap between terrestrial interaction strategies and those possible in the underwater domain. A necessary component of UHRI is a system for safe robot-diver approach that establishes face-to-face communication while accounting for non-standard human body poses. In this work, we introduce a stereo vision system for enhancing UHRI that utilizes three-dimensional reconstruction from stereo image pairs and machine learning for localizing human joint estimates. We then establish a convention for a coordinate system that encodes the direction the human is facing with respect to the camera coordinate frame. This allows automatic setpoint computation that preserves human body scale and can be used as input to an image-based visual servo control scheme. We show that our setpoint computations tend to agree both quantitatively and qualitatively with experimental setpoint baselines. The methodology introduced shows promise for enhancing UHRI by improving robotic perception of human orientation underwater.
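The 3D reconstruction step rests on the standard rectified-stereo relation Z = f·B/d. A minimal sketch of lifting a detected joint pixel to camera coordinates; the focal length, baseline, and pixel values below are illustrative, not the paper's calibration:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from the rectified-stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

def backproject(u, v, cx, cy, focal_px, depth_m):
    """Back-project pixel (u, v) at a known depth into camera coordinates."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)

# Example: 800 px focal length, 10 cm baseline, 40 px disparity -> 2 m depth.
z = stereo_depth(800.0, 0.10, 40.0)
point = backproject(420.0, 300.0, 400.0, 300.0, 800.0, z)
```

Applying this to each detected joint yields the 3D joint estimates from which a facing direction can be encoded.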
|
|
16:30-18:00, Paper TuCT20-NT.2 | Add to My Program |
Mechanism Design for New Sensors Field Deployment by LineRanger Powerline Robot |
|
Richard, Pierre-Luc | Hydro-Quebec Research Institute |
Bellemare, Jonathan | IREQ Hydro-Québec Research Institute |
Hamelin, Philippe | Hydro-Quebec Research Institute |
Hébert, Camille | Hydro-Québec's Research Institute |
Lambert, Ghislain | Hydro-Quebec Research Institute |
Lavoie, Samuel | Hydro-Quebec Research Institute |
Leprohon, Sebastien | IREQ Hydro-Québec Research Institute |
Montfrond, Matthieu | Hydro-Québec Research Institute |
Nourry, Marion | IREQ Hydro-Québec Research Institute |
Sartor, Alex | Hydro-Quebec Research Institute |
Pouliot, Nicolas | IREQ Hydro-Québec Research Institute |
Keywords: Robotics in Hazardous Fields, Mechanism Design, Field Robots
Abstract: Powerline robots are slowly becoming key tools for electric utilities. Contrary to drones, which are usually limited to inspection tasks, wheeled robots like LineRanger can support a broader range of applications. In this paper, a suite of mechanical devices is featured, as several new asset management tasks were recently added to LineRanger's capabilities. While previous applications focused on non-contact inspection (visual, electromagnetic, etc.), the new tasks at hand involve reaching adjacent conductors to probe line components with a micro-ohmmeter, installing and retrieving custom-built sensors for multi-day line monitoring, and assessing the surface properties of aging conductors to refine their thermal model and optimize line capacity during heat waves. All three applications were recently field-validated on LineRanger, and mechanical design insights are presented for each module.
|
|
16:30-18:00, Paper TuCT20-NT.3 | Add to My Program |
Translational Disturbance Rejection for Jet-Actuated Flying Continuum Robots on Mobile Bases |
|
Maezawa, Yukihiro | Tohoku University |
Ambe, Yuichi | Osaka University |
Yamauchi, Yu | Akita Prefectural University |
Konyo, Masashi | Tohoku University |
Tadakuma, Kenjiro | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
Keywords: Search and Rescue Robots, Aerial Systems: Mechanics and Control, Flexible Robotics
Abstract: Although continuum robots have the potential to operate in narrow areas by changing their shapes and propelling their bodies, they easily vibrate under sudden or periodic external forces. Suppressing vibrations is difficult in our jet-actuated continuum robot because the movements of its mobile base cannot be controlled by the same system as the movements of the robot, and mobile base oscillation increases the risk of resonance. In this study, disturbance rejection was realized for the Dragon Firefighter, a jet-actuated flying continuum robot on a mobile base, for rapid and safe fire extinguishing using a 4-m-long flying fire hose consisting of two nozzle units and flexible hoses. An H-infinity-based disturbance-rejection controller was designed to suppress the vibration of the head nozzle unit posture against the acceleration of the mobile base. The robot parameters were then identified from tensile tests and dynamic excitation experiments. Dynamic simulations confirmed that the controller reduced the peak gain of the frequency response by approximately 2 dB for various robot shapes. Robot experiments confirmed that the proposed method reduced the peak gain of the frequency response by approximately 3 dB, which increased the extra injection range of the nozzle by approximately 16%.
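The reported improvements are reductions in the peak of a frequency-response magnitude. As a hedged illustration of what a 2-3 dB peak-gain reduction means, here is a sweep over a standard second-order resonance where added damping lowers the peak; the natural frequency and damping ratios are illustrative, not the identified Dragon Firefighter parameters:

```python
import math

def mag_db(wn, zeta, w):
    """Magnitude in dB of a standard second-order response at frequency w."""
    num = wn ** 2
    den = math.sqrt((wn ** 2 - w ** 2) ** 2 + (2 * zeta * wn * w) ** 2)
    return 20 * math.log10(num / den)

def peak_gain_db(wn, zeta, n=2000):
    """Peak of the frequency response, found by a dense sweep up to 3*wn."""
    return max(mag_db(wn, zeta, 3 * wn * (k + 1) / n) for k in range(n))

# Raising the damping ratio from 0.05 to 0.07 lowers the resonant peak by
# roughly 3 dB, the same kind of reduction reported in the experiments.
before = peak_gain_db(10.0, 0.05)
after = peak_gain_db(10.0, 0.07)
reduction = before - after
```

The closed-form peak for small damping is 1/(2ζ√(1-ζ²)), so the sweep is only a numerical convenience here.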
|
|
16:30-18:00, Paper TuCT20-NT.4 | Add to My Program |
Cellular-Enabled Collaborative Robots Planning and Operations for Search-And-Rescue Scenarios |
|
Romero, Arnau | I2CAT Foundation |
Delgado, Carmen | I2CAT Foundation |
Zanzi, Lanfranco | NEC Laboratories |
Suarez, Raul | Universitat Politecnica De Catalunya (UPC) |
Costa, Xavier | I2CAT Foundation |
Keywords: Search and Rescue Robots, Energy and Environment-Aware Automation, Planning, Scheduling and Coordination
Abstract: Mission-critical operations, particularly in the context of Search-and-Rescue (SAR) and emergency response situations, demand optimal performance and efficiency from every component involved to maximize the success probability of such operations. In these settings, cellular-enabled collaborative robotic systems have emerged as invaluable assets, assisting first responders in several tasks, ranging from victim localization to hazardous area exploration. However, a critical limitation in the deployment of cellular-enabled collaborative robots in SAR missions is their energy budget, primarily supplied by batteries, which directly impacts their task execution and mobility. This paper tackles this problem and proposes a search-and-rescue framework for cellular-enabled collaborative robot use cases that, taking as input the size of the area to be explored, the robot fleet size, the robots' energy profiles, the required exploration rate, and the target response time, finds the minimum number of robots able to meet the SAR mission goals and the paths they should follow to explore the area. Our results i) show that first responders can rely on a SAR cellular-enabled robotics framework when planning mission-critical operations to make informed decisions with limited resources, and ii) illustrate the trade-off between the number of robots, the explored area, and the response time depending on the type of robot: wheeled vs. quadruped.
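The fleet-sizing idea can be illustrated with a back-of-envelope model: each robot covers area at some rate for as long as its battery and the deadline allow. A minimal sketch, with all parameter names and numbers assumed for illustration (the paper's framework also plans the exploration paths, which this sketch omits):

```python
import math

def min_robots(area_m2, coverage_rate_m2_per_s, battery_s, response_s):
    """Smallest fleet that can sweep `area_m2` within the target response time.

    Each robot covers ground at `coverage_rate_m2_per_s` and can operate for
    at most min(battery_s, response_s) seconds. Illustrative model only.
    """
    usable_s = min(battery_s, response_s)
    per_robot = coverage_rate_m2_per_s * usable_s
    if per_robot <= 0:
        raise ValueError("a robot must cover a positive area")
    return math.ceil(area_m2 / per_robot)

# 10,000 m^2 area, 2 m^2/s per robot, 30 min battery, 20 min deadline.
fleet = min_robots(10_000, 2.0, 1800, 1200)
```

Tightening the response time or shrinking the battery raises the required fleet size, which is the trade-off the paper's results quantify per robot type.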
|
|
16:30-18:00, Paper TuCT20-NT.5 | Add to My Program |
STAGE: Scalable and Traversability-Aware Graph Based Exploration Planner for Dynamically Varying Environments |
|
Patel, Akash | Luleå University of Technology |
Valdes Saucedo, Mario Alberto | Lulea University of Technology |
Kanellakis, Christoforos | LTU |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Search and Rescue Robots, Field Robots, Autonomous Vehicle Navigation
Abstract: In this article, we propose a novel navigation framework that leverages a two-layered graph representation of the environment for efficient large-scale exploration, while integrating a novel uncertainty-awareness scheme to handle dynamic scene changes in previously explored areas. The framework is structured around a novel goal-oriented graph representation, consisting of i) a local sub-graph layer and ii) a global graph layer. The local sub-graphs encode local volumetric gain locations as frontiers, based on direct pointcloud visibility, allowing fast graph building and path planning. Additionally, the global graph is built in an efficient way, using node-edge information exchange only on overlapping regions of sequential sub-graphs. Different from state-of-the-art graph-based exploration methods, the proposed approach efficiently re-uses sub-graphs built in previous iterations to construct the global navigation layer. Another merit of the proposed scheme is the ability to handle scene changes (e.g. blocked pathways), adaptively updating the obstructed part of the global graph from traversable to not-traversable. This operation involves oriented sampling of the space around a path segment in the global graph layer, removing the respective edges from connected nodes of the global graph in case of obstruction. As such, the exploration behavior directs the robot to follow another route during the global re-positioning phase through pathway updates in the global graph. Finally, we showcase the performance of the method both in simulation runs and in a real-world deployment involving a legged robot carrying camera and lidar sensors.
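The traversability update described above (flipping an obstructed global-graph edge to not-traversable and re-routing) can be sketched with a plain adjacency structure and breadth-first search; the graph and node names below are illustrative:

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search over a dict-of-sets adjacency structure."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable

def mark_blocked(adj, a, b):
    """Flip an edge from traversable to not-traversable by removing it."""
    adj[a].discard(b)
    adj[b].discard(a)

# Global-graph sketch: two 3-node routes from A to D exist until B-D
# is obstructed, after which re-planning must route through C.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
before = shortest_path(graph, "A", "D")
mark_blocked(graph, "B", "D")
after = shortest_path(graph, "A", "D")
```

The actual framework samples oriented space around the path segment to decide which edges to remove; this sketch only shows the edge-removal and re-routing step.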
|
|
16:30-18:00, Paper TuCT20-NT.6 | Add to My Program |
Computation-Aware Multi-Object Search in 3D Space Using Submodular Tree |
|
Li, Yan-Shuo | National Central University |
Tseng, Kuo-Shih | National Central University |
Keywords: Search and Rescue Robots, Human Detection and Tracking, Aerial Systems: Applications
Abstract: Searching for targets in a 3D environment can be formulated as a submodular maximization problem with routing constraints. However, it involves solving two NP-hard problems: the maximal coverage and traveling salesman problems. Since the time constraint is critical for search problems, this research proposes a Computation-Aware Search for Multiple Objects (CASMO) algorithm to further consider the computational time in the cost constraints. Due to submodularity, the greedy algorithm achieves \frac{1}{2}(1-\frac{1}{e})\overline{OPT}, where \overline{OPT} is the approximate optimum. The experiment results show that the proposed algorithms outperform state-of-the-art approaches in multi-object search.
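The greedy step behind such guarantees picks, at each iteration, the candidate with the largest marginal coverage gain. A minimal sketch of greedy maximal coverage (the routing and computation-time costs that CASMO also accounts for are omitted; viewpoint names and cell sets are illustrative):

```python
def greedy_max_coverage(candidate_views, budget):
    """Greedy submodular maximization for a coverage objective.

    `candidate_views` maps a viewpoint name to the set of cells it observes;
    pick up to `budget` viewpoints, each time taking the largest marginal
    gain. For plain budgeted coverage this greedy rule carries the classic
    (1 - 1/e)-style approximation guarantee.
    """
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(candidate_views,
                   key=lambda v: len(candidate_views[v] - covered))
        if not candidate_views[best] - covered:
            break  # no remaining viewpoint adds anything new
        chosen.append(best)
        covered |= candidate_views[best]
    return chosen, covered

views = {
    "v1": {1, 2, 3, 4},
    "v2": {3, 4, 5},
    "v3": {5, 6},
    "v4": {1, 2},
}
picked, seen = greedy_max_coverage(views, budget=2)
```

Greedy first takes v1 (four new cells), then v3 (two new cells), covering all six cells; v2's overlap with v1 makes its marginal gain smaller.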
|
|
16:30-18:00, Paper TuCT20-NT.7 | Add to My Program |
Multi-Robot Search in a 3D Environment with Intersection System Constraints |
|
Li, Yan-Shuo | National Central University |
Tseng, Kuo-Shih | National Central University |
Keywords: Search and Rescue Robots, Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: Efficient task allocation is a challenge for multi-robot search. The multi-robot search problem is reformulated as submodular maximization subject to intersection system constraints. The objective function is submodular and consists of a coverage function to cover environments and a balancing function to efficiently dispatch robots. The intersection system is composed of routing and clustering constraints. The experiment results show that the proposed approach outperforms state-of-the-art methods in multi-robot search.
|
|
16:30-18:00, Paper TuCT20-NT.8 | Add to My Program |
Wireless Communication Infrastructure Building for Mobile Robot Search and Inspection Missions |
|
Zoula, Martin | Czech Technical University in Prague |
Faigl, Jan | Czech Technical University in Prague |
Keywords: Sensor Networks, Search and Rescue Robots, Networked Robots
Abstract: In the paper, we address wireless communication infrastructure building by relay placement, based on approaches utilized in wireless sensor networks. The problem is motivated by search and inspection missions with mobile robots, where known sensing ranges may be exploited. We investigate relay placement that establishes network connectivity to support robust flood-based communication routing. The proposed method decomposes the given area into Open space and Corridor space, where specific deployment patterns allow for guaranteed k-connectivity, making the resulting network redundant while keeping channel utilization bounded. In particular, a hexagonal tessellation coverage pattern with 3-connectivity is investigated in Open space and a linear 4-connectivity pattern in Corridor space, respectively. The proposed approach is empirically evaluated in a realistic scenario and, based on the reported results, is found to be superior to the existing stochastic randomized dual sampling schema.
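The Open-space pattern places relays on a hexagonal lattice so that every point lies within range of some relay. A minimal sketch of generating such positions, assuming a rectangular area and the textbook lattice spacing of sqrt(3) times the range (the paper's exact pattern and parameters may differ):

```python
import math

def hex_relay_positions(width, height, comm_range):
    """Relay centers on a hexagonal lattice covering a width x height area.

    With column spacing sqrt(3)*R, row spacing 1.5*R, and odd rows offset by
    half a column, every interior point lies within `comm_range` of a relay.
    Illustrative geometry only.
    """
    dx = math.sqrt(3) * comm_range
    dy = 1.5 * comm_range
    positions = []
    row, y = 0, 0.0
    while y <= height:
        x = dx / 2 if row % 2 else 0.0
        while x <= width:
            positions.append((x, y))
            x += dx
        y += dy
        row += 1
    return positions

relays = hex_relay_positions(100.0, 100.0, 10.0)
```

The spacing comes from the hexagonal Voronoi cell: a lattice with nearest-neighbor distance sqrt(3)·R has circumradius R, so coverage disks of radius R tile the interior without gaps.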
|
|
16:30-18:00, Paper TuCT20-NT.9 | Add to My Program |
RB5 Low-Cost Explorer: Implementing Autonomous Long-Term Exploration on Low-Cost Robotic Hardware |
|
Seewald, Adam | Yale University |
Chancán, Marvin | Yale University |
McCann, Connor | Harvard University |
Noh, Seonghoon | Yale University |
Fallahi, Omeed | Yale University |
Castillo, Hector | Yale University |
Abraham, Ian | Yale University |
Dollar, Aaron | Yale University |
Keywords: Engineering for Robotic Systems, Software-Hardware Integration for Robot Systems
Abstract: This systems paper presents the design and implementation of RB5, a wheeled robot for autonomous long-term exploration with fewer and cheaper sensors. Requiring just an RGB-D camera and low-power computing hardware, the system consists of an experimental platform with rocker-bogie suspension. It operates in unknown and GPS-denied environments and on indoor and outdoor terrains. The exploration methodology extends frontier- and sampling-based exploration with a path-following vector field and a state-of-the-art SLAM algorithm. The methodology allows the robot to explore its surroundings at lower update frequencies, enabling the use of lower-performing and lower-cost hardware while still retaining good autonomous performance. The approach further includes a methodology for interacting with a remotely located human operator based on an inexpensive long-range and low-power communication technology from the internet-of-things domain (i.e., LoRa) and a customized communication protocol. The results and the feasibility analysis show the possible applications and limitations of the approach.
|
|
TuCT21-NT Oral Session, NT-G303 |
Add to My Program |
Bioinspired Microrobots |
|
|
Chair: Morishima, Keisuke | Osaka University |
Co-Chair: Perez-Arancibia, Nestor O | Washington State University (WSU) |
|
16:30-18:00, Paper TuCT21-NT.1 | Add to My Program |
Takeoff of a 2.1g Fully Untethered Tailless Flapping-Wing Micro Aerial Vehicle with Integrated Battery |
|
Ozaki, Takashi | Toyota Central R&D Labs. Inc |
Ohta, Norikazu | Company |
Jimbo, Tomohiko | Toyota Central R&D Labs., Inc |
Hamaguchi, Kanae | Toyota Central R&D Labs., Inc |
Keywords: Aerial Systems: Mechanics and Control, Biologically-Inspired Robots, Biomimetics
Abstract: Insect-scale micro-aerial vehicles (MAVs) are becoming increasingly important for sensing and mapping spatially constrained environments. However, achieving untethered flight powered by a small and lightweight on-board energy source remains a challenge. In this study, we successfully demonstrated the untethered takeoff of a flapping-wing MAV powered by a commercially available LiPo battery for a short duration without stability control. By incorporating a high-efficiency direct-drive piezoelectric actuator and an optimized control circuit for pulse-width modulation and charge recovery, the electronics realize over 5 min of practical operation with a thrust of 1.5 times the vehicle's own weight, while offering wireless communication capability and sensors for attitude estimation. Our MAV has a total mass of 2.1 g, an eight-fold reduction compared to the lightest battery-powered tailless flapping-wing MAV currently available, making this the first report of a battery-powered tailless flapping-wing MAV with insect-scale weight.
|
|
16:30-18:00, Paper TuCT21-NT.2 | Add to My Program |
Design and Optimization of a Miniature Locust-Inspired Stable Jumping Robot |
|
Xu, Yi | Beijing Institute of Technology |
Jin, Yanzhou | Beijing Institute of Technology |
Zhang, Weitao | Beijing Institute of Technology |
Si, Yunhao | Beijing Institute of Technology |
Zhang, Yulai | Beijing Institute of Technology |
Li, Chang | Beijing Institute of Technology |
Shi, Qing | Beijing Institute of Technology |
Keywords: Biologically-Inspired Robots, Biomimetics, Mechanism Design
Abstract: Jumping is a key locomotion mode for miniature robots, but it is difficult for a robot to jump a long distance without flipping. To solve this problem, we develop a miniature locust-inspired jumping robot with a body length of 10 cm and a weight of 60 g. On the basis of the extracted skeletal muscle movement of a locust, we make full use of the Stephenson six-bar mechanism in designing a jumping leg to achieve power amplification. Moreover, we carry out a two-step optimization of the mechanism parameters to achieve high jumping energy (first step) by optimizing the storage and dissipation of energy and then high jumping stability (second step) by optimizing the force characteristics. A series of experimental tests shows that the robot can jump to a height three times its body length and a distance seven times its body length. Remarkably, the jumping height and distance relative to the body length of our jumper exceed those of other robots with stable mechanisms by 30% and 33%, respectively. Meanwhile, our robot has a high degree of stability, which allows it to maintain a proper aerial orientation without flipping.
|
|
16:30-18:00, Paper TuCT21-NT.3 | Add to My Program |
Multi-Modal Jumping and Crawling in an Autonomous, Springtail-Inspired Microrobot |
|
Singh, Shashwat | Carnegie Mellon University |
Temel, Zeynep | Carnegie Mellon University |
St. Pierre, Ryan | University at Buffalo |
Keywords: Biologically-Inspired Robots, Biomimetics, Micro/Nano Robots
Abstract: Springtails are tiny arthropods that crawl and jump. They jump by temporarily storing elastic energy in resilin elastic cuticular structures and releasing that energy to accelerate a tail, called a furca, propelling them in the air. This paper presents an autonomous, springtail-inspired microrobot that can crawl and jump. The microrobot has a mass of 980 mg and stands 13 mm tall, and has on-board sensing, computation, and power, enabling autonomy. The microrobot was designed with a super-elastic shape memory alloy (SMA) spring that is manually loaded to store elastic energy. The on-board sensing and computation triggers an actuator at the jump frequency range that unlatches the spring, launching the microrobot into the air at speeds up to 3.171 m/s. At the same time, the microrobot is capable of crawling, when actuated at frequencies lower or higher than the jump frequency range, demonstrating autonomous multi-modal locomotion. This work opens up new pathways toward autonomy in multi-modal microrobots.
|
|
16:30-18:00, Paper TuCT21-NT.4 | Add to My Program |
High-Speed Interfacial Flight of an Insect-Scale Robot |
|
Gao, Hang | Cornell University |
Jung, Sunghwan | Cornell University |
Helbling, E. Farrell | Cornell University |
Keywords: Biologically-Inspired Robots, Micro/Nano Robots
Abstract: Several insect species are able to locomote across the air-water interface by leveraging surface tension to remain above the water surface. A subset of these insects, such as the stonefly and waterlily beetle, flap their wings to actively move around the two-dimensional surface — a locomotion strategy referred to as interfacial flight. Here, we present an insect-scale robot, the gamma-bot, inspired by these interfacial fliers. The robot is composed of a flapping-wing vehicle that generates a thrust force parallel to the water surface and three passive legs that utilize surface tension to support the body mass and maintain contact with the air-water interface. We developed and validated a simple model to characterize the drag forces acting on the vehicle and estimate the robot's velocity. This 112 mg robot can reach maximum velocities of 0.9 m/s (corresponding to 15 body lengths per second) and can initiate both left and right turns, demonstrating high maneuverability along the air-water interface. In addition, the robot can carry an additional 419 mg, enabling future sensing, control, and power autonomy.
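A drag model of this kind balances thrust against quadratic drag, which sets the top speed. A hedged sketch of that balance; the density, drag coefficient, reference area, and thrust below are illustrative stand-ins, not the paper's identified values:

```python
import math

def top_speed(thrust_n, rho, drag_coeff, ref_area_m2):
    """Steady-state speed where quadratic drag balances thrust:
    T = 0.5 * rho * Cd * A * v^2  =>  v = sqrt(2 * T / (rho * Cd * A)).
    """
    return math.sqrt(2 * thrust_n / (rho * drag_coeff * ref_area_m2))

# Illustrative numbers only: ~0.4 mN of thrust against water drag on a
# sub-mm^2 wetted reference area gives a top speed near 0.9 m/s.
v = top_speed(4.0e-4, 1000.0, 2.0, 5.0e-7)
```

Because speed scales with the square root of thrust, doubling thrust raises the top speed by only about 41%, which is why drag characterization matters for predicting velocity.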
|
|
16:30-18:00, Paper TuCT21-NT.5 | Add to My Program |
VLEIBot: A New 45-Mg Swimming Microrobot Driven by a Bioinspired Anguilliform Propulsor |
|
Blankenship, Elijah | Washington State University |
Trygstad, Conor | Washington State University |
Gonçalves, Francisco | Washington State University |
Perez-Arancibia, Nestor O | Washington State University (WSU) |
Keywords: Biologically-Inspired Robots, Micro/Nano Robots, Marine Robotics
Abstract: This paper presents the VLEIBot (Very Little Eel-Inspired roBot), a 45-mg/23-mm³ microrobotic swimmer that is propelled by a bioinspired anguilliform propulsor. The propulsor is excited by a single 6-mg high-work-density (HWD) microactuator and undulates periodically due to wave propagation phenomena generated by fluid-structure interaction (FSI) during swimming. The microactuator is composed of a carbon-fiber beam, which functions as a leaf spring, and shape-memory alloy (SMA) wires, which deform cyclically when excited periodically using Joule heating. The VLEIBot can swim at speeds as high as 15.1 mm·s⁻¹ (0.33 Bl·s⁻¹) when driven with a heuristically optimized propulsor. To improve maneuverability, we evolved the VLEIBot design into the 90-mg/47-mm³ VLEIBot+, which is driven by two propulsors and is fully controllable in the two-dimensional (2D) space. The VLEIBot+ can swim at speeds as high as 16.1 mm·s⁻¹ (0.35 Bl·s⁻¹) when driven with heuristically optimized propulsors, and achieves turning rates as high as 0.28 rad·s⁻¹ when tracking path references. The measured root-mean-square (RMS) values of the tracking errors are as low as 4 mm.
|
|
16:30-18:00, Paper TuCT21-NT.6 | Add to My Program |
Direct Learning of Home Vector Direction for Insect-Inspired Robot Navigation |
|
Firlefyn, Michiel Vital M | TU Delft |
Hagenaars, Jesse | Delft University of Technology |
de Croon, Guido | TU Delft |
Keywords: Biologically-Inspired Robots, Vision-Based Navigation, Bioinspired Robot Learning
Abstract: Insects have long been recognized for their ability to navigate and return home using visual cues from their nest's environment. However, the precise mechanism underlying this remarkable homing skill remains a subject of ongoing investigation. Drawing inspiration from the learning flights of honey bees and wasps, we propose a robot navigation method that directly learns the home vector direction from visual percepts during a learning flight in the vicinity of the nest. After learning, the robot will travel away from the nest, come back by means of odometry, and eliminate the resultant drift by inferring the home vector orientation from the currently experienced view. Using a compact convolutional neural network, we demonstrate successful learning in both simulated and real forest environments, as well as successful homing control of a simulated quadrotor. The average errors of the inferred home vectors in general stay well below the 90° required for successful homing, and below 24° if all images contain sufficient texture and illumination. Moreover, we show that the trajectory followed during the initial learning flight has a pronounced impact on the network's performance. A higher density of sample points in proximity to the nest results in a more consistent return. Code and data are available at https://mavlab.tudelft.nl/learning_to_home.
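Homing success in the abstract is stated in terms of the angular error of the inferred home vector. A minimal sketch of that error metric and the 90° success criterion (the vector values are illustrative):

```python
import math

def angular_error_deg(v_true, v_pred):
    """Unsigned angle between two 2D home-vector directions, in degrees."""
    a = math.atan2(v_true[1], v_true[0]) - math.atan2(v_pred[1], v_pred[0])
    a = (a + math.pi) % (2 * math.pi) - math.pi  # wrap into (-pi, pi]
    return abs(math.degrees(a))

def homing_succeeds(err_deg):
    """Per the abstract, errors below 90 degrees still let the robot home."""
    return err_deg < 90.0

# A predicted vector tilted slightly off the true heading: ~17 degrees error,
# well within both the 90-degree and the 24-degree bands mentioned above.
err = angular_error_deg((1.0, 0.0), (1.0, 0.3))
```

Errors under 90° keep the corrected heading pointed into the half-plane containing the nest, so odometry drift shrinks on each correction rather than growing.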
|
|
16:30-18:00, Paper TuCT21-NT.7 | Add to My Program |
A Dragonfly-Inspired Flapping Wing Robot Mimicking Force Vector Control Approach |
|
Liu, Fangyuan | Beihang University |
Li, Song | Beihang University |
Xiang, Jinwu | Beihang University |
Li, Daochun | Beihang University |
Tu, Zhan | Beihang University |
Keywords: Biomimetics, Biologically-Inspired Robots, Mechanism Design
Abstract: Dragonflies show impressive flying skills, achieving both high efficiency and agility. They can perform distinctive flight maneuvers, such as flying backwards, which has recently been shown to be achieved through a "force vectoring" mechanism. In this paper, to explore the agile flight ability of dragonflies on man-made flapping-wing systems, we designed, optimized, and fabricated a dragonfly-inspired flapping wing robot (DFWR) with inclinable stroke-plane control degrees of freedom. The proposed platform employs a four-wing configuration, in which each wing integrates an extra servo motor to enable rotation of the flapping plane and imitate the "force vectoring" mechanism. Referring to the flapping kinematics of dragonflies, the installation angle and wing pitch angle of the proposed DFWR are optimized for total lift and energy consumption through multiobjective optimization based on the NSGA-II method. The "force vector" produced by the proposed platform is illustrated through both theoretical and experimental methods. Moreover, the feasibility of the design is further verified through a series of operation validation experiments. Such a robot has the potential to provide a highly biomimetic platform for validating studies of Odonata flight mechanisms as well as related on-board applications such as bio-inspired vision.
|
|
16:30-18:00, Paper TuCT21-NT.8 | Add to My Program |
Microrobotic Flight Enabled by Ultralight Ion Thrusters with High Thrust-To-Weight Ratio and Low Fabrication Cost |
|
Gu, Yang | Nanjing University of Posts and Telecommunications |
Cai, Xianfa | Nanjing University of Posts and Telecommunications |
Thakuri, Khadga | University of Vermont |
Yang, Wenyu | Huazhong University of Science and Technology |
Guo, Yufeng | Nanjing University of Posts and Telecommunications |
Li, Wei | University of Vermont |
Keywords: Micro/Nano Robots, Mechanism Design, Aerial Systems: Mechanics and Control
Abstract: Flying microrobots have garnered growing research interest owing to their technological intricacies and suitability for various applications leveraging their miniaturized size. Electrohydrodynamic (EHD) thrust offers advantages by generating propulsion without moving parts, but real-world use is limited by insufficient thrust generation, manufacturing challenges, fragility, and cost. This work presents the design and development of an optimized ion-propelled flying microrobot that excels in low weight, high thrust-to-weight ratio, and cost efficiency. Regarding design, multiphysics simulations guided structural optimization to increase thrust while decreasing weight. For materials, metal-coated polyethylene terephthalate (PET) film was selected to leverage the combined merits of metal conductivity and polymer flexibility, light weight, and low cost, enabling further weight reduction, easy assembly, robustness, and cost-effectiveness. Various experiments, including voltage-current measurements, ionic wind speed, thrust quantification, and airflow visualization, directed design refinements and validated performance. Through structural optimization, the maximum wind speed attained 2.18 m/s. Flight demonstrations with payloads showed that the microrobot can fly stably at its inherent 16 mg weight while carrying an additional 72 mg load, achieving a record thrust-to-weight ratio of 5.5. These results open possibilities for incorporating microelectronics to enable autonomous flight functionality.
|
|
16:30-18:00, Paper TuCT21-NT.9 | Add to My Program |
A Modular Biological Neural Network-Based Neuro-Robotic System Via Local Chemical Stimulation and Calcium Imaging |
|
Chen, Zhe | Beijing Institute of Technology |
Chen, Xie | Beijing Institute of Technology |
Shimoda, Shingo | RIKEN |
Huang, Qiang | Beijing Institute of Technology |
Shi, Qing | Beijing Institute of Technology |
Fukuda, Toshio | Nagoya University |
Sun, Tao | Beijing Institute of Technology |
Keywords: Neurorobotics, Cyborgs
Abstract: Embodying in vitro biological neural networks (BNNs) with robots to explore the rise of intelligence in these simpler models and to endow robots with biological intelligence has been attracting increasing attention in the fields of neuroscience and robotics. However, current research suffers from unstable sensory-motor mapping due to the random wiring of neurons seeded on multi-electrode arrays (MEAs). Therefore, here we propose a modular BNN (mBNN)-based neuro-robotic system via local chemical stimulation and calcium recording. In this system, reliable evoked sensory-motor mapping (success rate > 89%) from the sensory to the motor area in the mBNN was demonstrated. It is achieved in the mBNNs by combining global chemical modulation (for suppressing spontaneous signal transmission) and local chemical stimulation (for inducing the evoked signal transmission). The neural signals of the motor area of the BNN are recorded by calcium imaging, analyzed, and decoded to control the motion state of the mobile robot in real-time. The sensory signals of the robot are encoded and transmitted to the sensory area of the BNN, closing the loop. This system presents a platform to investigate how information is processed and transmitted in mBNNs, and also to examine the influence of local and global chemical modulation on within-network signal transmission.
|
|
TuCT22-NT Oral Session, NT-G304 |
Add to My Program |
Marine Robotics III |
|
|
Chair: Hollinger, Geoffrey | Oregon State University |
Co-Chair: Quattrini Li, Alberto | Dartmouth College |
|
16:30-18:00, Paper TuCT22-NT.1 | Add to My Program |
An Augmented Catenary Model for Underwater Tethered Robots |
|
Filliung, Martin | CNRS LIS, COSMER Laboratory, Université De Toulon |
Drupt, Juliette | Université De Toulon |
Peraud, Charly | COSMER Laboratory, Université De Toulon |
Dune, Claire | Université De Toulon |
Boizot, Nicolas | Université De Toulon |
Comport, Andrew Ian | CNRS-I3S/UNS |
Anthierens, Cedric | Universite De Toulon |
Hugel, Vincent | University of Toulon |
Keywords: Marine Robotics
Abstract: This paper examines the relevance of using catenary-based curves to model cables in underwater tethered robotic applications in order to take into account the influence of hydrodynamic damping. To this end, an augmented catenary-based model is introduced to deal with the dynamical effects of surge motion, sway motion, or a combination of both on a cable. Experimental studies are carried out with eight cables of varying stiffness, weight, and buoyancy. One end of the cable is fixed, while the other end is moved by the underwater robot. The obtained results help to determine which cables and which dynamics are compatible with a fair estimation of the cable shape through the proposed models.
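For reference, the classical (unaugmented) catenary that the model builds on can be computed from the span and cable length alone. A minimal sketch, solving for the catenary parameter by bisection (the numbers are illustrative; the paper's augmented model additionally accounts for hydrodynamic damping):

```python
import math

def catenary_parameter(span, length):
    """Solve 2*a*sinh(span/(2*a)) = length for the catenary parameter a.

    The arc length of y = a*cosh(x/a) over a level span is
    2*a*sinh(span/(2*a)), which decreases toward `span` as a grows,
    so bisection on a converges to the unique root.
    """
    if length <= span:
        raise ValueError("cable must be longer than the straight-line span")
    f = lambda a: 2 * a * math.sinh(span / (2 * a)) - length
    lo, hi = span / 50.0, span * 1e6  # slack extreme vs. near-taut extreme
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # computed length too long: flatten the curve
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sag(span, length):
    """Mid-span sag of the cable below its (level) attachment points."""
    a = catenary_parameter(span, length)
    return a * (math.cosh(span / (2 * a)) - 1)

# A 12 m cable across a 10 m gap hangs with roughly 2.9 m of sag.
s = sag(10.0, 12.0)
```

The augmented model in the paper perturbs this static shape with terms driven by the robot's surge and sway motion; the static catenary is the baseline it is compared against.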
|
|
16:30-18:00, Paper TuCT22-NT.2 | Add to My Program |
A Hybrid Dynamical Model for Robotic Underwater Vehicles When Submerged or Surfaced: Approach and Preliminary Evaluation |
|
Hunt, James | Johns Hopkins University |
Whitcomb, Louis | The Johns Hopkins University |
Keywords: Marine Robotics, Field Robots
Abstract: This paper reports a numerical method for modeling underwater vehicle (UV) interactions with the free surface using a finite-dimensional dynamical plant model. Although finite-dimensional plant models of fully submerged UV behavior are well-established, they are unable to model the ubiquitous condition of a UV operating at or near the free surface. We report a Monte Carlo-based hybrid model approach for calculating the buoyancy and righting moment of a partially or fully submerged UV in order to model interactions with the free surface. We also report a preliminary evaluation of the hybrid model in numerical simulations, comparing the hybrid model's performance to that of a model for fully submerged UVs and to the experimentally observed behavior of an actual vehicle while fully submerged and while interacting with the free surface. The results of this preliminary study suggest that the proposed hybrid approach may offer a simple and practical method for modeling UV behavior when submerged or interacting with the free surface.
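The Monte Carlo buoyancy idea can be illustrated on a simple hull shape: sample points in the hull's bounding box and count those that are both inside the hull and below the waterline. A minimal sketch for a spherical hull (the paper applies the approach to an actual vehicle geometry; the shape, sample count, and convention below are illustrative):

```python
import math
import random

def submerged_volume_mc(radius, depth_of_center, n=200_000, seed=1):
    """Monte Carlo estimate of the submerged volume of a spherical hull.

    The water surface is the plane z = 0 (z up) and the hull center sits at
    z = depth_of_center (negative when below the surface). Points sampled in
    the bounding cube count when inside the hull and below the waterline.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        z = rng.uniform(-radius, radius)
        inside = x * x + y * y + z * z <= radius * radius
        underwater = z + depth_of_center < 0
        if inside and underwater:
            hits += 1
    cube_volume = (2 * radius) ** 3
    return cube_volume * hits / n

# Half-submerged unit sphere: the estimate approaches (2/3)*pi m^3.
v = submerged_volume_mc(1.0, 0.0)
```

Multiplying the estimated volume by fluid density and gravity gives the buoyant force, and taking the centroid of the accepted samples gives the center of buoyancy needed for the righting moment.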
|
|
16:30-18:00, Paper TuCT22-NT.3 | Add to My Program |
Surfing Algorithm: Agile and Safe Transition Strategy for Hybrid Aerial Underwater Vehicle in Waves |
|
Bi, Yuanbo | Shanghai Jiao Tong University |
Jin, Yufei | Shanghai Jiao Tong University |
Zhou, Hexiong | Shanghai Jiao Tong University |
Bai, YuLin | Shanghai Jiao Tong University |
Lyu, Chenxin | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
Lian, Lian | Shanghai Jiaotong University |
Keywords: Marine Robotics, Field Robots, Aerial Systems: Perception and Autonomy, Motion Control
Abstract: Agile and safe trans-domain transition in waves is a promising capability but also the primary bottleneck of the hybrid aerial underwater vehicle (HAUV). In this article, the surfing algorithm is proposed to search for the dynamic window that facilitates takeoff in waves while avoiding hazardous waves. For the first time, the cross-domain window, i.e., the vehicle at the wave crest and heading downstream, is characterized and defined. The novel surfing algorithm consists of the gradient perceptron, time-limited momentum gradient search, heading server, and initial conditions. Numerical simulations and experiments in regular and irregular waves reveal the effectiveness of the algorithm. The algorithm ensures a healthy initial attitude and minimal wave disturbance during takeoff, thus alleviating the thrust distraction from stability recovery and uncertainty. The average transition time and energy cost are reduced by 59.2% and 26.1% compared with random take-off cases, and the locomotion is smooth, graceful, and low-risk. Compared with the adaptive robust controller, this article provides an ingenious and enlightening strategy from the perspective of harnessing waves.
|
|
16:30-18:00, Paper TuCT22-NT.4 | Add to My Program |
ReefGlider: A Highly Maneuverable Vectored Buoyancy Engine Based Underwater Robot |
|
Macauley, Kevin | University of Wisconsin-Madison |
Cai, Levi | Massachusetts Institute of Technology |
Adamczyk, Peter G. | University of Wisconsin - Madison |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Keywords: Marine Robotics, Field Robots, Environment Monitoring and Management
Abstract: There exists a capability gap in the design of currently available autonomous underwater vehicles (AUVs). Most AUVs use a set of thrusters, and optionally control surfaces, to control their depth and pose. AUVs utilizing thrusters can be highly maneuverable, making them well-suited to operate in complex environments such as in close proximity to coral reefs. However, they are inherently power-inefficient and produce significant noise and disturbance. Underwater gliders, on the other hand, use changes in buoyancy and center of mass, in combination with a control surface, to move around. They are extremely power-efficient but not very maneuverable. Gliders are designed for long-range missions that do not require precision maneuvering. Furthermore, since gliders only activate the buoyancy engine for small time intervals, they do not disturb the environment and can also be used for passive acoustic observations. In this paper we present ReefGlider, a novel AUV that uses only buoyancy for control but is still highly maneuverable, thanks to additional buoyancy control devices. ReefGlider bridges the gap between the capabilities of thruster-driven AUVs and gliders. These combined characteristics make ReefGlider ideal for tasks such as long-term visual and acoustic monitoring of coral reefs. We present the overall design and implementation of the system, as well as an analysis of some of its capabilities.
|
|
16:30-18:00, Paper TuCT22-NT.5 | Add to My Program |
Robust Model Predictive Control with Control Barrier Functions for Autonomous Surface Vessels |
|
Wang, Wei | University of Wisconsin-Madison |
Xiao, Wei | MIT |
Gonzalez-Garcia, Alejandro | KU Leuven |
Swevers, Jan | KU Leuven |
Ratti, Carlo | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: Marine Robotics, Field Robots, Optimization and Optimal Control
Abstract: In autonomous robot navigation, the trajectories from path planners are considered to be safe regions, and deviations could endanger vessels. Model Predictive Control (MPC) stands as a popular choice for trajectory tracking problems as it naturally addresses operational constraints, such as dynamics and control constraints. Nevertheless, achieving robustness in changing environments like oceans and rivers, which are constantly subject to significant external disturbances, remains an ongoing challenge for MPC. It must consistently keep the system within a predefined safe region (such as a reference trajectory) even in the presence of model inaccuracies and perturbations. To address this challenge, we present a robust model predictive control strategy utilizing Control Barrier Functions (CBFs), which increases the disturbance-rejection abilities. We verify our method on an autonomous surface vessel in simulation and natural waters, both with external disturbances. Specifically, compared with the traditional MPC method, our proposed MPC-CBF strategy reduces tracking errors by 17.82% and 40.26% in simulations and field experiments, respectively. Although the control effort slightly increases by 7.78% and 4.20%, respectively, these results clearly demonstrate the enhanced resilience of MPC-CBF to disturbances.
|
|
16:30-18:00, Paper TuCT22-NT.6 | Add to My Program |
Design and Optimization of a Multimode Amphibious Robot with Propeller-Leg |
|
Ma, Xinmeng | Harbin Engineering University |
Wang, Gang | Harbin Engineering University |
Kaixin, Liu | Harbin Engineering University |
Keywords: Marine Robotics, Field Robots, Robotics in Hazardous Fields
Abstract: This paper describes a novel multimode motion robot named SHOALBOT, which performs multimode operations with only one type of propulsion device and can work flexibly in amphibious environments. Robots that work in water need to minimize the number of drive components to improve reliability and reduce communication pressure. Our unique design enables the robot, relying on a single propulsion device named the propeller-leg, to run rapidly on the seashore and seabed and to swim with multiple degrees of freedom in the water (using only four driving elements). We analyzed and optimized the propeller-leg through simulation combined with open water tests; according to the test results, the propeller-leg's thrust in the water before and after optimization differs by 400%. We also minimized the difference between the forward and reverse thrust of the propeller-leg to improve the stability of the robot's movement, reducing the difference from 25% to 3%. This paper provides sufficient technical detail, and a series of experiments validates that the SHOALBOT has excellent movement ability in amphibious environments.
|
|
16:30-18:00, Paper TuCT22-NT.7 | Add to My Program |
Underwater Dome-Port Camera Calibration: Modeling of Refraction and Offset through N-Sphere Camera Model |
|
Roznere, Monika | Dartmouth College |
Pediredla, Adithya | Dartmouth College |
Lensgraf, Samuel | Dartmouth College |
Girdhar, Yogesh | Woods Hole Oceanographic Institution |
Quattrini Li, Alberto | Dartmouth College |
Keywords: Marine Robotics, Calibration and Identification
Abstract: The optical effects that are observed in underwater imagery are more complex than those in air. This is partially because we enclose most underwater cameras in a watertight enclosure, such as a hemispheric dome window. We then observe optical issues including the distortion effects of the lens, e.g., wide-angle field-of-view (FOV), the refractive effects at the enclosure (water-acrylic and acrylic-air) interfaces, and offset effects of a non-centered camera with respect to the dome. In this paper, we present the N-Sphere (NS) and Shifted N-Sphere (S-NS) camera models, tailored to cameras and lenses mounted in water-tight dome enclosures. The proposed camera models treat each layer of effects as a ‘sphere’ onto which a 3D point is projected. Furthermore, the S-NS model includes additional parameters to address camera offset variability. The versatility of the NS model makes it applicable to various lenses, as validated with fisheye (FOV >120 deg) and wide-FOV (FOV ~120 deg) lenses. We validated our models with different in-water calibration sequences, lenses, and housing setups, as well as with comparisons with other state-of-the-art camera models. Additionally, we demonstrated the performance of our proposed models in an example stereo-based visual odometry application. The low computational load of the proposed models makes them ideal for integration into real-time visual navigation and reconstruction frameworks. We provide full math derivations of the proposed models as well as example C++ header files for easy incorporation in independent projects.
|
|
16:30-18:00, Paper TuCT22-NT.8 | Add to My Program |
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration |
|
Ling, Li | KTH Royal Institute of Technology |
Zhang, Jun | KTH Royal Institute of Technology |
Bore, Nils | KTH Royal Institute of Technology |
Folkesson, John | KTH |
Wåhlin, Anna | University of Gothenburg |
Keywords: Marine Robotics, Software Tools for Benchmarking and Reproducibility, Data Sets for SLAM
Abstract: Deep learning has shown promising results for multiple 3D point cloud registration datasets. However, in the underwater domain, most registration of multibeam echo-sounder (MBES) point cloud data is still performed using classical methods from the iterative closest point (ICP) family. In this work, we curate and release DotsonEast, a semi-synthetic MBES registration dataset constructed from data collected by an autonomous underwater vehicle (AUV) in West Antarctica. Using this dataset, we systematically benchmark the performance of two classical and four learning-based methods. The experimental results show that the learning-based methods work well for coarse alignment and are better at recovering rough transforms consistently at high overlap (20-50%). In comparison, GICP (a variant of ICP) performs well for fine alignment and is better across all metrics at extremely low overlap (10%). To the best of our knowledge, this is the first work to benchmark both learning-based and classical registration methods on an AUV-based MBES dataset. To facilitate future research, both the code and data are made available online.
|
|
16:30-18:00, Paper TuCT22-NT.9 | Add to My Program |
Angler: An Autonomy Framework for Intervention Tasks with Lightweight Underwater Vehicle Manipulator Systems |
|
Palmer, Evan | Oregon State University |
Holm, Chris | Oregon State University |
Hollinger, Geoffrey | Oregon State University |
Keywords: Marine Robotics, Software, Middleware and Programming Environments, Control Architectures and Programming
Abstract: Developing autonomous intervention capabilities for lightweight underwater vehicle manipulator systems (UVMS) has garnered significant attention in recent years because of the opportunity for these systems to reduce intervention operating costs. Developing autonomous UVMS capabilities is challenging, however, because of the lack of available standardized software frameworks and pipelines. Previous works offer simulation environments and deployment pipelines for underwater vehicles, but fall short of providing a complete UVMS software framework. We address this gap by creating Angler: a software framework for developing localization, control, and decision-making algorithms with support for sim-to-real transfer. We validate this framework by implementing a state-of-the-art control architecture and demonstrate the ability to perform station keeping with a mean error below 0.25 m and waypoint tracking with an average final error of 0.398 m.
|
|
TuCT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Mechanics and Control III |
|
|
Chair: Foong, Shaohui | Singapore University of Technology and Design |
Co-Chair: Chirarattananon, Pakpong | City University of Hong Kong |
|
16:30-18:00, Paper TuCT23-NT.1 | Add to My Program |
Hitchhiker: A Quadrotor Aggressively Perching on a Moving Inclined Surface Using Compliant Suction Cup Gripper (I) |
|
Liu, Sensen | Shanghai Jiao Tong University |
Wang, Zhaoying | Shanghai Jiao Tong University |
Sheng, Xinjun | Shanghai Jiao Tong University |
Dong, Wei | Shanghai Jiao Tong University |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Perching on the surface of moving objects, like vehicles, could extend the flight time and range of quadrotors. Suction cups are usually adopted for surface attachment due to their durability and large adhesive force. To seal on a surface, suction cups must be aligned with the surface and possess proper relative tangential velocity. However, quadrotors' attitude and relative velocity errors would become significant when the object's surface is moving and inclined. To address this problem, we proposed a real-time trajectory planning algorithm. The time-optimal aggressive trajectory is efficiently generated through multimodal search in a dynamic time-domain. The velocity errors relative to the moving surface are alleviated. To further adapt to the residual errors, we design a compliant gripper using self-sealing cups. Multiple cups in different directions are integrated into a wheel-like mechanism to increase the tolerance to attitude errors. The wheel mechanism also eliminates the requirement of matching the attitude and tangential velocity. Extensive tests are conducted to perch on static and moving surfaces at various inclinations. Results demonstrate that our proposed system enables a quadrotor to reliably perch on moving inclined surfaces (up to 1.07 m/s and 90°) with a success rate of 70% or higher. The efficacy of the trajectory planner is also validated.
|
|
16:30-18:00, Paper TuCT23-NT.2 | Add to My Program |
Harnessing the Differential Flatness of Monocopter Dynamics for the Purpose of Trajectory Tracking in a Stable Invertible Coaxial Actuated ROtorcraft (SICARO) |
|
Tang, Emmanuel | Singapore University of Technology & Design |
Ang, Wei Jun | Singapore University of Technology & Design |
Tan, Kian Wee | Singapore University of Technology & Design |
Foong, Shaohui | Singapore University of Technology and Design |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Biologically-Inspired Robots
Abstract: In this paper, the dynamics of an emerging class of rotating nature-inspired micro aerial vehicles known as the Monocopter is shown to be differentially flat. By exploiting this property, trajectory tracking can now be implemented on Monocopters via feed-forward terms computed from the trajectory. To demonstrate this, a Monocopter in the form of a Stable Invertible Coaxial Actuated ROtorcraft (SICARO) is chosen to harness this approach fully. The SICARO is capable of flying with either side of the wing facing up, and this feature also determines the craft’s direction of rotation about its body Z axis. In addition, it has the unique feature of a coaxial motor configuration that allows for a pitching-up moment regardless of the wing side facing up. The computed feed-forward terms are fused into a cascaded nonlinear controller on the craft to ensure its effectiveness in tracking trajectories. Lastly, the flight experiments extend to both sides of the wing to validate this method as being applicable to trajectory tracking for Monocopters such as the SICARO, which has an extended range of flying capabilities.
|
|
16:30-18:00, Paper TuCT23-NT.3 | Add to My Program |
Passive Aligning Physical Interaction of Fully-Actuated Aerial Vehicles for Pushing Tasks |
|
Hui, Tong | Technical University of Denmark |
Cuniato, Eugenio | ETH Zurich |
Pantic, Michael | ETH Zürich |
Tognon, Marco | Inria Rennes |
Fumagalli, Matteo | Danish Technical University |
Siegwart, Roland | ETH Zurich |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications, Dynamics
Abstract: Recently, the utilization of aerial manipulators for performing pushing tasks in non-destructive testing (NDT) applications has seen significant growth. Such operations entail physical interactions between the aerial robotic system and the environment. End-effectors with multiple contact points are often used for placing NDT sensors in contact with a surface to be inspected. Aligning the NDT sensor with the work surface while preserving contact requires that all available contact points at the end-effector tip are in contact with the work surface. With a standard full-pose controller, attitude errors often occur due to perturbations caused by modeling uncertainties, sensor noise, and environmental uncertainties. Even small attitude errors can cause a loss of contact points between the end-effector tip and the work surface. To preserve full alignment amidst these uncertainties, we propose a control strategy that selectively deactivates angular motion control and enables direct force control in specific directions. In particular, we derive two essential conditions to be met such that the robot can passively align with flat work surfaces, achieving full alignment through rotation along the non-actively controlled axes. Additionally, these conditions serve as hardware design and control guidelines for effectively integrating the proposed control method for practical usage. Real-world experiments are conducted to validate both the control design and the guidelines.
|
|
16:30-18:00, Paper TuCT23-NT.4 | Add to My Program |
Modeling and Control of PADUAV: A Passively Articulated Dual UAVs Platform for Aerial Manipulation |
|
Sun, Jiali | Beijing Institute of Technology |
Wang, Kaidi | Beijing Institute of Technology |
Shi, Chuanbeibei | Univeristy of Bristol |
Li, Xiujia | Beijing Institute of Technology |
Yi, Xiaojian | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Sun, Fuchun | Tsinghua University |
Dong, Yiqun | Nanyang Technological University |
Keywords: Aerial Systems: Mechanics and Control, Control Architectures and Programming, Aerial Systems: Applications
Abstract: In this paper, we introduce PADUAV, a novel 5-DOF aerial platform designed to overcome the limitations of traditional tiltrotor vehicles. PADUAV features a unique mechanical design that incorporates two off-the-shelf quadrotors passively articulated to a rigid frame. This innovation enables free pitch rotation without mechanical constraints like cable winding, significantly enhancing its capabilities for various tasks. To control PADUAV's 5 degrees of freedom, we propose a versatile and straightforward 5-DOF geometric tracking control strategy that generates 2D force and 3D torque. A decomposition approach is designed to distribute the output to the torque and thrust commands for each subplane, with no need for complex optimization. We validate our approach through three simulation experiments conducted in the Gazebo environment, leveraging the utilities provided by the RotorS simulator. These experiments not only demonstrate the feasibility of our platform but also provide new perspectives for future aerial platform development, particularly in terms of simulation-based approaches.
|
|
16:30-18:00, Paper TuCT23-NT.5 | Add to My Program |
Particle Filter with Stable Embedding for State Estimation of the Rigid Body Attitude System on the Set of Unit Quaternions |
|
Jang, Hee-Deok | Korea Advanced Institute of Science Technology |
Park, Jae-Hyeon | Korea Advanced Institute of Science and Technology (KAIST) |
Chang, Dong Eui | KAIST |
Keywords: Aerial Systems: Mechanics and Control, Kinematics, Localization
Abstract: This paper presents a novel method for state estimation of the rigid body attitude system evolving on the manifold S^3, which is crucial in robotics and drone applications. We introduce a particle filter with stable embedding that extends the system into Euclidean space while ensuring the stability of the manifold. Our particle filter with stable embedding enables accurate state estimation by maintaining estimated state values in close proximity to the manifold, while requiring significantly fewer computational resources than the standard exponential-map-based method that keeps state estimates on the manifold. Furthermore, our method facilitates the application of the usual techniques designed for particle filters in Euclidean spaces to the manifold system, as is, without any modification. The accuracy and the efficiency of our particle filter are confirmed both by simulation and by real drone experiments.
|
|
16:30-18:00, Paper TuCT23-NT.6 | Add to My Program |
End-To-End Reinforcement Learning for Time Optimal Quadcopter Flight |
|
Ferede, Robin | TU Delft |
De Wagter, Christophe | Delft University of Technology |
Izzo, Dario | European Space Agency |
de Croon, Guido | TU Delft |
Keywords: Aerial Systems: Mechanics and Control, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that gives direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E showcases a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting end-to-end reinforcement learning's potential. The performance drop observed from simulation to reality shows potential for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.
|
|
16:30-18:00, Paper TuCT23-NT.7 | Add to My Program |
Quadrolltor: A Reconfigurable Quadrotor with Controlled Rolling and Turning |
|
Jia, Huaiyuan | City University of Hong Kong |
Ding, Runze | City University of Hongkong |
Dong, Kaixu | City University of Hong Kong |
Bai, Songnan | City University of Hong Kong |
Chirarattananon, Pakpong | City University of Hong Kong |
Keywords: Aerial Systems: Mechanics and Control, Mechanism Design, Wheeled Robots
Abstract: This letter reports Quadrolltor, an aerial robot with the ability to roll and turn. Existing bimodal quadrotors feature cylindrical rolling cages that are rotationally decoupled from the robot's main rigid body. In contrast, the proposed robot employs passively reconfigurable structures to enable the second mode of locomotion, tightly coupling the attitude of the robot to the rolling cage. The benefits are precise rolling and turning control as well as improved rolling efficiency. Experiments were conducted to comprehensively validate the hybrid locomotion. The robot leveraged its superior maneuverability in the rolling mode to take photos of the surroundings at different tilting and panning angles to construct a panoramic image. In addition, the power measurements show a significant reduction in the cost of transport brought by rolling, equating to a 15-fold extension in operational range.
|
|
16:30-18:00, Paper TuCT23-NT.8 | Add to My Program |
Robust Control for Bidirectional Thrust Quadrotors under Instantaneously Drastic Disturbances |
|
Chen, Zujian | Shenzhen University |
Shaolin, Mo | Sun Yat-Sen University |
Zhang, Botao | Sun Yat-Sen University |
Cheng, Hui | Sun Yat-Sen University |
Li, Jiyu | College of Engineering, South China Agricultural University |
Keywords: Aerial Systems: Mechanics and Control, Robust/Adaptive Control, Aerial Systems: Applications
Abstract: Quadrotors may crash and cause severe accidents under instantaneously drastic disturbances. To mitigate the effect of such disturbances, these critical issues should be considered: efficient disturbance observation and compensation, full attitude controllability, and instant output power generation of the quadrotor. In this paper, to keep the quadrotor stable even under suddenly drastic disturbances, a novel control framework is presented that integrates the advantages of active disturbance rejection control (ADRC) and geometric control for a quadrotor with bidirectional thrust capabilities. Moreover, to strengthen adaptability under significant disturbances, a novel switching strategy is introduced into the control framework by virtue of the quadrotor’s bidirectional thrust capabilities. The ADRC scheme is active when the disturbances are within a preset range; alternatively, if the disturbances surpass that range and the desired control is beyond the ultimate output of the quadrotor, the quadrotor compliantly responds by executing a 180° flip into reverse flight to handle such drastic disturbances. Numerical and real-world experiments demonstrate that the proposed robust control strategy has superior performance and adapts to instantaneously drastic disturbances.
|
|
TuCT24-NT Oral Session, NT-G402 |
Add to My Program |
Field Robot Systems |
|
|
Chair: Shim, David Hyunchul | KAIST |
Co-Chair: Liarokapis, Minas | The University of Auckland |
|
16:30-18:00, Paper TuCT24-NT.1 | Add to My Program |
Topological Exploration Using Segmented Map with Keyframe Contribution in Subterranean Environments |
|
Kim, Boseong | Korea Advanced Institute of Science and Technology (KAIST) |
Seong, Hyunki | KAIST |
Shim, David Hyunchul | KAIST |
Keywords: Field Robots, Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: Existing exploration algorithms mainly generate frontiers using random sampling or motion primitive methods within a specific sensor range or search space. However, frontiers generated within constrained spaces lead to back-and-forth maneuvers in large-scale environments, thereby diminishing exploration efficiency. To address this issue, we propose a method that utilizes a 3D dense map to generate Segmented Exploration Regions (SERs) and generates frontiers from a global-scale perspective. In particular, this paper presents a novel topological map generation approach that fully utilizes the Line-of-Sight (LOS) features of LiDAR sensor points to enhance exploration efficiency inside large-scale subterranean environments. Our topological map contains the contributions of the keyframes that generate each SER, enabling rapid exploration by switching between local path planning and global path planning to each frontier. The proposed method generated a larger explored volume than the state-of-the-art algorithm in a large-scale simulation environment, demonstrating a 62% improvement in explored volume increment performance. For validation, we conducted field tests using UAVs in real subterranean environments, demonstrating the efficiency and speed of our method.
|
|
16:30-18:00, Paper TuCT24-NT.2 | Add to My Program |
A Powerline Inspection UAV Equipped with Dexterous, Lockable Gripping Mechanisms for Autonomous Perching and Contact Rolling |
|
Lynch, Angus | University of Auckland |
Duguid, Corey | University of Auckland |
Buzzatto, Joao | The University of Auckland |
Liarokapis, Minas | The University of Auckland |
Keywords: Field Robots, Aerial Systems: Mechanics and Control
Abstract: Inspection of powerlines is a hard problem that requires humans to operate in remote locations and dangerous conditions. This paper proposes a quadcopter unmanned aerial vehicle (UAV) equipped with rolling-capable perching mechanisms and a depth-vision system for the purpose of autonomous power line inspection. The perching mechanism grips onto the power line, allowing the UAV to withstand external forces such as wind disturbances. Once engaged and applying the desired gripping force, the perching mechanism requires no power through the use of a ratcheting serial elastic transmission, allowing the UAV to perch indefinitely. The depth-vision system automates the perching and unperching procedures by estimating the position and pose of the UAV relative to the powerline. These measurements are sent to a local position controller that guides the UAV to and from the power line. Once perched, rollers in the fingers of the perching mechanism drive the UAV along the powerline, providing a close-up platform for inspection equipment. The proposed system was tested in an outdoor testing environment and shown to autonomously perch and unperch from a steel cable. The gripper's force application was analysed, and the UAV's robust powerless perching was demonstrated by a total disconnect of power while perched. These results suggest that such a system could be a valuable tool for the upkeep of electricity networks.
|
|
16:30-18:00, Paper TuCT24-NT.3 | Add to My Program |
GIRA: Gaussian Mixture Models for Inference and Robot Autonomy |
|
Goel, Kshitij | Carnegie Mellon University |
Tabib, Wennie | Carnegie Mellon University |
Keywords: Field Robots, Aerial Systems: Perception and Autonomy, Multi-Robot SLAM
Abstract: This paper introduces the open-source framework, GIRA, which implements fundamental robotics algorithms for reconstruction, pose estimation, and occupancy modeling using compact generative models. Compactness enables perception in the large by ensuring that the perceptual models can be communicated through low-bandwidth channels during large-scale mobile robot deployments. The generative property enables perception in the small by providing high-resolution reconstruction capability. These properties address perception needs for diverse robotic applications, including multi-robot exploration and dexterous manipulation. State-of-the-art perception systems construct perceptual models via multiple disparate pipelines that reuse the same underlying sensor data, which leads to increased computation, redundancy, and complexity. GIRA bridges this gap by providing a unified perceptual modeling framework using Gaussian mixture models (GMMs) as well as a novel systems contribution, which consists of GPU-accelerated functions to learn GMMs 10-100x faster compared to existing CPU implementations. Because few GMM-based frameworks are open-sourced, this work seeks to accelerate innovation and broaden adoption of these techniques.
|
|
16:30-18:00, Paper TuCT24-NT.4 | Add to My Program |
Hybrid Trajectory Optimization for Autonomous Terrain Traversal of Articulated Tracked Robots |
|
Xu, Zhengzhe | Harbin Institute of Technology, Shenzhen |
Chen, Yanbo | Tsinghua University |
Jian, Zhuozhu | Tsinghua University |
Tan, Junbo | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Field Robots, Autonomous Vehicle Navigation, Optimization and Optimal Control
Abstract: Autonomous terrain traversal of articulated tracked robots can reduce operator cognitive load to enhance task efficiency and facilitate extensive deployment. We present a novel hybrid trajectory optimization method aimed at generating efficient, stable, and smooth traversal motions. To achieve this, we develop a planar robot-terrain contact model and divide the robot’s motion into hybrid modes of driving and traversing. By using a generalized coordinate description, the configuration space dimension is reduced, which facilitates real-time planning. The hybrid trajectory optimization is transcribed into a nonlinear programming problem and divided into subproblems to be solved in a receding-horizon planning fashion. Mode switching is facilitated by associating optimized motion durations with a predefined traversal sequence. A multi-objective cost function is formulated to further improve the traversal performance. Additionally, map sampling, terrain simplification, and tracking controller modules are integrated into the autonomous terrain traversal system. Our approach is validated in simulation and real-world scenarios with the Searcher robotic platform. Comparative experiments with expert operator control and state-of-the-art methods show advantages in terms of time and energy efficiency, stability, and smoothness of motion.
|
|
16:30-18:00, Paper TuCT24-NT.5 | Add to My Program |
X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments |
|
Tuna, Turcan | ETH Zurich, Robotic Systems Lab |
Nubert, Julian | ETH Zürich |
Nava Chocrón, Yoshua Alfredo | ANYbotics AG |
Khattak, Shehryar | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Field Robots, Localization, Mapping, Point Cloud Registration
Abstract: LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate registration performance and push optimization toward divergence along weakly constrained directions. To overcome this issue, this work proposes i) a robust multi-category (non-)localizability detection module, and ii) a localizability-aware constrained ICP optimization module, and couples both in a unified manner. The proposed localizability detection is achieved by utilizing the correspondences between the scan and the map to analyze the alignment strength against the principal directions of the optimization as part of its multi-category LiDAR localizability analysis. In the second part, this localizability analysis is then integrated into the scan-to-map point cloud registration to generate drift-free pose updates by enforcing controlled updates or leaving the degenerate directions of the optimization unchanged. The proposed method is thoroughly evaluated and compared to state-of-the-art methods in simulation and during real-world experiments, underlining the gain in performance and reliability.
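The localizability analysis above amounts to checking how strongly the registration constraints cover each principal direction of the optimization. A minimal sketch of that idea for point-to-plane ICP, where each surface normal constrains translation along itself; this is an illustrative reconstruction, not the paper's implementation, and the relative eigenvalue threshold is an assumed value:

```python
import numpy as np

def degenerate_directions(normals, rel_thresh=0.1):
    """Find weakly constrained translation directions of point-to-plane ICP.

    The eigen-spectrum of the normal covariance reveals directions with
    little alignment strength; `rel_thresh` is an illustrative choice.
    """
    H = normals.T @ normals / len(normals)   # 3x3 constraint covariance
    w, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
    return V[:, w < rel_thresh * w.max()]    # columns = degenerate directions
```

In a long featureless corridor most normals point at the walls, so the along-corridor direction comes out degenerate; a constrained solver can then damp or freeze updates along those columns, as X-ICP does for its detected localizability categories.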
|
|
16:30-18:00, Paper TuCT24-NT.6 | Add to My Program |
Seabed Intervention with an Underwater Legged Robot |
|
Picardi, Giacomo | Instituto De Ciencias Del Mar (ICM)—Consejo Superior De Investig |
Astolfi, Anna | Scuola Superiore Sant'Anna |
Calisti, Marcello | The University of Lincoln |
Keywords: Field Robots, Marine Robotics, Legged Robots
Abstract: Efficiently performing intervention tasks underwater is crucial in various commercial and scientific sectors; however, propeller-driven vehicles face limitations due to their floating nature. In Remotely Operated Vehicle (ROV) operations, this can be compensated for by the skill of the operator, but such operations come with high costs. Autonomous Underwater Vehicles (AUVs) have shown promise instead, but demonstrated intervention tasks are limited to controlled environments or docked operation. To address these limitations, we focused on the use of Underwater Legged Robots (ULRs), which offer greater stability and agile seabed mobility thanks to their legged propulsion system. This paper presents the field demonstration of teleoperated pick-and-place tasks using the ULR SILVER2, for which a novel stance control, Graphical User Interface (GUI), and tendon-driven gripper have been developed based on the lessons learned through several hours of field use. The methodology is validated through four field trials, including missions in both shallow water and open sea environments. The trials involve picking and placing various objects, such as plastic bottles, bags, and cans. The results demonstrate successful teleoperated object grasping and manipulation in real-world conditions, with collection times ranging from a few minutes to around ten minutes. Overall, this research contributes to advancing the capabilities of ULRs and lays the foundation for future underwater intervention missions in various scientific and industrial applications, aligning with the goals of the Decade of Ocean Science for Sustainable Development.
|
|
16:30-18:00, Paper TuCT24-NT.7 | Add to My Program |
Predicting against the Flow: Boosting Source Localization by Means of Field Belief Modeling Using Upstream Source Proximity |
|
Busch, Finn Lukas | Hamburg University of Technology |
Bauschmann, Nathalie | Hamburg University of Technology |
Haddadin, Sami | Technical University of Munich |
Seifried, Robert | Hamburg University of Technology |
Duecker, Daniel Andre | Technical University of Munich (TUM) |
Keywords: Field Robots, Marine Robotics, Robotics in Hazardous Fields
Abstract: Time-effective and accurate source localization with mobile robots is crucial in safety-critical scenarios, e.g. leakage detection. This becomes particularly challenging in realistic cluttered scenarios, i.e. in the presence of complex current flows or wind. Traditional methods often fall short due to simplifications or limited onboard resources. We propose to combine source localization with a Gaussian Markov Random Field (GMRF). This allows us to improve source localization hypotheses by building on the GMRF's concentration and flow field beliefs, which are continuously updated by gathered measurements. We introduce the upstream source proximity (USP) as a natural metric that exploits the joint knowledge represented in the field belief's concentration and flow field, i.e. predicting sources upstream. As a result, our method yields a computationally efficient source localization and field belief module providing substantially more stable gradients than conventional concentration gradient-based methods. We demonstrate the suitability of our approach in a series of numerical experiments covering complex source location scenarios. With regard to computational requirements, the method achieves update rates of 10 Hz on a Raspberry Pi 4B.
|
|
16:30-18:00, Paper TuCT24-NT.8 | Add to My Program |
A Turning Radius Prediction Scheme for Sailing Robots under Complex Marine Environment |
|
Qi, Weimin | The Chinese University of Hong Kong, Shenzhen |
Sun, Qinbo | The Chinese University of Hong Kong, Shenzhen |
Qian, Huihuan (Alex) | The Chinese University of Hong Kong, Shenzhen |
Keywords: Field Robots, Marine Robotics, Underactuated Robots
Abstract: This paper presents a strategy for predicting the turning radius of a sailing robot with consideration of aerodynamic and hydrodynamic interferences from the marine environment. The turning radius is initially obtained based on three consecutive designated points during the turning process, which is regarded as the baseline method. Subsequently, on the basis of our constructed turning datasets, a model is trained using Gaussian process regression (GPR) to achieve radius prediction. The feasibility and effectiveness of the proposed scheme have been validated in both simulation and experiments (conducted with OceanVoy as shown in Fig. 1). Under experimental circumstances, the Mean Absolute Error (MAE) of the turning radius produced by the trained prediction model is 0.58 m. Furthermore, it has been observed that during long-term sailing covering a distance of 1200 km, apart from wind speed and robot velocity, the tidal range also has a significant impact on the navigation of sailing robots.
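The regression step can be illustrated with a minimal Gaussian process predictor written directly in NumPy. The kernel, hyperparameters, and the two input features (wind speed and robot velocity) are illustrative assumptions standing in for the paper's trained model and dataset:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_predict(X, y, Xq, noise=1e-2, ls=1.0):
    # Standard GP regression: predictive mean and standard deviation.
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ls)
    mean = Ks @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq, ls) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Hypothetical samples: (wind speed [m/s], robot velocity [m/s]) -> radius [m].
X = np.array([[3.0, 1.2], [5.0, 1.0], [7.0, 0.8], [4.0, 1.5]])
y = np.array([4.1, 5.6, 7.3, 4.8])
mean, std = gp_predict(X, y, np.array([[4.5, 1.1]]))
```

Beyond the point prediction, the returned standard deviation quantifies how far the queried sailing conditions lie from the training data, which is what makes GPR attractive for this task.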
|
|
16:30-18:00, Paper TuCT24-NT.9 | Add to My Program |
A Vision-Based Autonomous UAV Inspection Framework for Unknown Tunnel Construction Sites with Dynamic Obstacles |
|
Xu, Zhefan | Carnegie Mellon University |
Chen, Baihan | Carnegie Mellon University |
Zhan, Xiaoyang | Carnegie Mellon University |
Xiu, Yumeng | Carnegie Mellon University |
Suzuki, Christopher | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Field Robots, Motion and Path Planning, Aerial Systems: Perception and Autonomy
Abstract: Tunnel construction using the drill-and-blast method requires the 3D measurement of the excavation front to evaluate underbreak locations. Considering the inspection and measurement task's safety, cost, and efficiency, deploying lightweight autonomous robots, such as unmanned aerial vehicles (UAVs), becomes more necessary and popular. Most previous works use a prior map for inspection viewpoint determination and do not consider dynamic obstacles. To maximally increase the level of autonomy, this paper proposes a vision-based UAV inspection framework for dynamic tunnel environments without using a prior map. Our approach utilizes a hierarchical planning scheme, decomposing the inspection problem into different levels. The high-level decision maker first determines the task for the robot and generates the target point. Then, the mid-level path planner finds the waypoint path and optimizes the collision-free static trajectory. Finally, the static trajectory is fed into the low-level local planner to avoid dynamic obstacles and navigate to the target point. Additionally, our framework contains a novel dynamic map module that can simultaneously track dynamic obstacles and represent static obstacles based on an RGB-D camera. After inspection, the Structure-from-Motion (SfM) pipeline is applied to generate the 3D shape of the target. To the best of our knowledge, this is the first time autonomous inspection has been realized in unknown and dynamic tunnel environments.
|
|
TuCT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization III |
|
|
Chair: Tardos, Juan D. | Universidad De Zaragoza |
Co-Chair: Montiel, J.M.M | I3A. Universidad De Zaragoza |
|
16:30-18:00, Paper TuCT25-NT.1 | Add to My Program |
WayIL: Image-Based Indoor Localization with Wayfinding Maps |
|
Kwon, Obin | Seoul National University |
Jung, Dongki | NAVER LABS |
Kim, Youngji | NAVER Labs |
Ryu, Soohyun | NAVER LABS |
Yeon, Suyong | NAVER LABS |
Oh, Songhwai | Seoul National University |
Lee, Donghwan | NAVER LABS |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: This paper tackles a localization problem in large-scale indoor environments with wayfinding maps. A wayfinding map abstractly portrays the environment, and humans can localize themselves based on the map. However, when it comes to using it for robot localization, large geometrical discrepancies between the wayfinding map and the real world make it hard to use conventional localization methods. Our objective is to estimate a robot pose within a wayfinding map, utilizing RGB images from perspective cameras. We introduce two different imagination modules which are inspired by how humans can comprehend and interpret their surroundings for localization purposes. These modules jointly learn how to effectively observe the first-person-view (FPV) world to interpret bird's-eye-view (BEV) maps. Providing explicit guidance to the two imagination modules significantly improves the precision of the localization system. We demonstrate the effectiveness of the proposed approach using real-world datasets, which are collected from various large-scale crowded indoor environments. The experimental results show that, in 85% of scenarios, the proposed localization system can estimate its pose within 3 m in large indoor spaces. Project Site: https://rllab-snu.github.io/projects/WayIL/
|
|
16:30-18:00, Paper TuCT25-NT.2 | Add to My Program |
TransAPR: Absolute Camera Pose Regression with Spatial and Temporal Attention |
|
Qiao, Chengyu | Zhejiang University |
Xiang, Zhiyu | Zhejiang University |
Fan, Yuangang | Zhejiang University |
Bai, Tingming | Zhejiang University |
Zhao, Xijun | China North Vehicle Research Institute, China North Artificial I |
Fu, Jingyun | Zhejiang University |
Keywords: Localization, Deep Learning for Visual Perception
Abstract: Visual relocalization aims to estimate the absolute camera pose from an image or sequential images. Recent works tackle this problem by exploiting deep neural networks to regress camera poses. However, spatial and temporal clues from sequential images still remain underexplored, resulting in inaccurate poses and large outliers. In this work, we introduce a novel vision Transformer based absolute pose regression model, TransAPR, to tackle this problem. Upon the traditional CNN backbone, we design Transformer based spatial and temporal fusion modules respectively to realize sufficient feature interaction among the neighboring images in the sequence. A hierarchical feature aggregation (HFA) module is further designed to aggregate multi-scale and multi-level features in the pose regressor. Benefiting from these delicate designs, our model is able to generate reliable image representations for absolute pose regression, resulting in more robust localization under challenging environments. We conduct extensive experiments on various indoor and outdoor datasets and show that our method achieves state-of-the-art performance.
|
|
16:30-18:00, Paper TuCT25-NT.3 | Add to My Program |
Globalizing Local Features: Image Retrieval Using Shared Local Features with Pose Estimation for Faster Visual Localization |
|
Wenzheng, Song | Tohoku University |
Yan, Ran | Megvii |
Lei, Boshu | University of Pennsylvania |
Okatani, Takayuki | Tohoku University |
Keywords: Localization, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Visual localization is an important sub-task in SfM and visual SLAM that involves estimating a 6-DoF camera pose for an input query image relative to a given 3D model of the environment. The most accurate approach is a hierarchical one that splits the task into two stages: image retrieval and camera pose estimation. Each stage requires different image features, with global features compactly encoding holistic image information for the first stage and local features encoding the appearance around salient image points for the second stage. While existing methods use independent networks to extract these features, one for global and one for local, this strategy is suboptimal in terms of computational efficiency. In this paper, we propose a novel approach that achieves state-of-the-art inference accuracy with significantly improved efficiency. Our approach’s core component is SuperGF, a network that aggregates local features optimized for camera pose estimation to create a global feature that enables precise image retrieval. Through extensive experiments on the standard benchmark tests, we demonstrate that the method offers a better trade-off between accuracy and computational cost.
|
|
16:30-18:00, Paper TuCT25-NT.4 | Add to My Program |
Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization |
|
Chen, Le | Max Planck Institute for Intelligent Systems |
Chen, Weirong | ETH Zurich |
Wang, Rui | Technical University of Munich |
Pollefeys, Marc | ETH Zurich |
Keywords: Localization, Deep Learning for Visual Perception, Visual Learning
Abstract: As a promising approach to visual localization, scene coordinate regression (SCR) has seen tremendous progress in the past decade. Most recent methods usually adopt neural networks to learn the mapping from image pixels to 3D scene coordinates, which requires a vast amount of annotated training data. We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for SCR. Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain, which can hinder the regression accuracy or bring unnecessary computational costs with redundant data. This paper addresses these challenges in three ways: (1) A NeRF is designed to separately predict uncertainties for the rendered color and depth images, which reveal data reliability at the pixel level. (2) SCR is formulated as deep evidential learning with epistemic uncertainty, which is used to evaluate information gain and scene coordinate quality. (3) Based on these three types of uncertainty, a novel view selection policy is formed that significantly improves data efficiency. Experiments on public datasets demonstrate that our method could select the samples that bring the most information gain and promote the performance with the highest efficiency.
|
|
16:30-18:00, Paper TuCT25-NT.5 | Add to My Program |
JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition |
|
Berton, Gabriele | Politecnico Di Torino |
Trivigno, Gabriele | Polytechnic of Turin |
Caputo, Barbara | Sapienza University |
Masone, Carlo | Politecnico Di Torino |
Keywords: Localization, Deep Learning for Visual Perception, Visual Learning
Abstract: Visual Place Recognition aims at recognizing previously visited places by relying on visual clues, and it is used in robotics applications for SLAM and localization. Since typically a mobile robot has access to a continuous stream of frames, this task is naturally cast as a sequence-to-sequence localization problem. Nevertheless, obtaining sequences of labelled data is much more expensive than collecting isolated images, which can be done in an automated way with little supervision. As a mitigation to this problem, we propose a novel Joint Image and Sequence Training protocol (JIST) that leverages large uncurated sets of images through a multi-task learning framework. With JIST we also introduce SeqGeM, an aggregation layer that revisits the popular GeM pooling to produce a single robust and compact embedding from a sequence of single-frame embeddings. We show that our model is able to outperform the previous state of the art while being faster, using descriptors that are 8 times smaller, having a lighter architecture, and allowing it to process sequences of various lengths. The code is available at https://github.com/ga1i13o/JIST
|
|
16:30-18:00, Paper TuCT25-NT.6 | Add to My Program |
OptiState: State Estimation of Legged Robots Using Gated Networks with Transformer-Based Vision and Kalman Filtering |
|
Schperberg, Alexander | University of California Los Angeles |
Tanaka, Yusuke | University of California, Los Angeles |
Mowlavi, Saviz | Mitsubishi Electric Research Laboratories |
Xu, Feng | UCLA |
Balaji, Bharathan | Amazon |
Hong, Dennis | UCLA |
Keywords: Localization, Deep Learning Methods, Legged Robots
Abstract: State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid-body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can also minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState
|
|
16:30-18:00, Paper TuCT25-NT.7 | Add to My Program |
Pose-Graph Attentional Graph Neural Network for Lidar Place Recognition |
|
Ramezani, Milad | CSIRO |
Wang, Liang | University of Queensland |
Knights, Joshua Barton | Queensland University of Technology |
Li, Zhibin | CSIRO |
Pounds, Pauline | The University of Queensland |
Moghadam, Peyman | CSIRO |
Keywords: Localization, Deep Learning Methods, Recognition
Abstract: This paper proposes a pose-graph attentional graph neural network, called P-GAT, which compares (key)nodes between sequential and non-sequential sub-graphs for place recognition tasks, as opposed to the common frame-to-frame retrieval problem formulation currently implemented in state-of-the-art place recognition methods. P-GAT uses the maximum spatial and temporal information between neighbour point cloud descriptors, generated by an existing encoder, utilising the concept of pose-graph SLAM. Leveraging intra- and inter-attention and a graph neural network, P-GAT relates point clouds captured in nearby locations in Euclidean space and their embeddings in feature space. Experimental results on large-scale publicly available datasets demonstrate the effectiveness of our approach in scenes lacking distinct features and when training and testing environments have different distributions (domain adaptation). Further, an exhaustive comparison with the state of the art shows clear performance gains. Code is available at https://github.com/csiro-robotics/P-GAT.
|
|
16:30-18:00, Paper TuCT25-NT.8 | Add to My Program |
ColonMapper: Topological Mapping and Localization for Colonoscopy |
|
Morlana, Javier | Universidad De Zaragoza, CIF: ESU5018001G, C/ Pedro Cerbuna 12 |
Tardos, Juan D. | Universidad De Zaragoza |
Montiel, J.M.M | I3A. Universidad De Zaragoza |
Keywords: Localization, Mapping, Deep Learning for Visual Perception
Abstract: We propose a topological mapping and localization system able to operate on real human colonoscopies, despite significant shape and illumination changes. The map is a graph where each node codes a colon location by a set of real images, while edges represent traversability between nodes. For close-in-time images, where scene changes are minor, place recognition can be successfully managed with the recent transformer-based local feature matching algorithms. However, under long-term changes --such as different colonoscopies of the same patient-- feature-based matching fails. To address this, we train on real colonoscopies a deep global descriptor achieving high recall with significant changes in the scene. The addition of a Bayesian filter boosts the accuracy of long-term place recognition, enabling relocalization in a previously built map. Our experiments show that ColonMapper is able to autonomously build a map and localize against it in two important use cases: localization within the same colonoscopy or within different colonoscopies of the same patient. Code will be available upon acceptance.
|
|
16:30-18:00, Paper TuCT25-NT.9 | Add to My Program |
Simultaneous Localization and Actuation Using Electromagnetic Navigation Systems |
|
von Arx, Denis | ETH Zurich |
Fischer, Cedric | ETH Zurich |
Torlakcik, Harun | ETHZ |
Pané, Salvador | ETH Zurich |
Nelson, Bradley J. | ETH Zurich |
Boehler, Quentin | ETH Zurich |
Keywords: Localization, Medical Robots and Systems, Surgical Robotics: Steerable Catheters/Needles, Electromagnetic Actuation
Abstract: Remote magnetic navigation provides a promising approach for improving the maneuverability and safety of surgical tools, such as catheters and endoscopes, in complex anatomies. The lack of existing localization systems compatible with this modality, beyond fluoroscopy and its harmful ionizing radiation, impedes its translation to clinical practice. To address this challenge, we propose a localization method that achieves full pose estimation by superimposing oscillating magnetic fields for localization onto actuation fields generated by an electromagnetic navigation system. The resulting magnetic field is measured using a three-axis magnetic field sensor embedded in the magnetic device to be localized. The method is evaluated on a three-coil system, and simultaneous actuation and localization is demonstrated with a magnetic catheter prototype with a Hall-effect sensor embedded at its tip. We demonstrate position estimation with mean accuracy and precision below 1 mm, and orientation estimation with mean errors below 2 deg at 10 Hz in a workspace of 80 x 80 x 60 mm.
|
|
TuCT26-NT Oral Session, NT-G404 |
Add to My Program |
Mapping II |
|
|
Co-Chair: Zhang, Fu | University of Hong Kong |
|
16:30-18:00, Paper TuCT26-NT.1 | Add to My Program |
Scene Action Maps: Behavioural Maps for Navigation without Metric Information |
|
Loo, Joel | National University of Singapore |
Hsu, David | National University of Singapore |
Keywords: Mapping
Abstract: Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., “follow the corridor” or “turn right”, while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information are available at: https://scene-action-maps.github.io
|
|
16:30-18:00, Paper TuCT26-NT.2 | Add to My Program |
Continuous Occupancy Mapping in Dynamic Environments Using Particles |
|
Chen, Gang | Delft University of Technology |
Dong, Wei | Shanghai Jiao Tong University |
Peng, Peng | Shanghai Jiao Tong University |
Alonso-Mora, Javier | Delft University of Technology |
Zhu, Xiangyang | Shanghai Jiao Tong University |
Keywords: Mapping, Aerial Systems: Perception and Autonomy, Collision Avoidance, Dynamic Environment
Abstract: Particle-based dynamic occupancy maps were proposed in recent years to model the obstacles in dynamic environments. Current particle-based maps describe the occupancy status in discrete grid form and suffer from the grid size problem, wherein a large grid size is unfavorable for motion planning while a small grid size lowers efficiency and causes gaps and inconsistencies. To tackle this problem, this paper generalizes the particle-based map into continuous space and builds an efficient 3D egocentric local map. A dual-structure subspace division paradigm, composed of a voxel subspace division and a novel pyramid-like subspace division, is proposed to propagate particles and update the map efficiently with the consideration of occlusions. The occupancy status of an arbitrary point in the map space can then be estimated with the particles' weights. To reduce the noise in modeling static and dynamic obstacles simultaneously, an initial velocity estimation approach and a mixture model are utilized. Compared to the grid-form particle-based map, our map enables continuous occupancy estimation and substantially improves the mapping performance at different resolutions.
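The key idea of querying occupancy at an arbitrary point from weighted particles, rather than reading a discrete grid cell, can be sketched as a kernel-weighted sum. The Gaussian kernel, bandwidth, and final clamp below are illustrative simplifications of the paper's continuous formulation:

```python
import numpy as np

def occupancy_at(query, particles, weights, bandwidth=0.3):
    """Estimate occupancy at an arbitrary 3D point from weighted particles.

    Nearby particles contribute their weight through a Gaussian kernel;
    the kernel shape and the clamp are illustrative choices.
    """
    d2 = ((particles - query) ** 2).sum(axis=1)
    density = float((weights * np.exp(-0.5 * d2 / bandwidth ** 2)).sum())
    return min(1.0, density)  # clamp to a pseudo-probability
```

Unlike a fixed-resolution grid, the same particle set can answer queries at any point and any effective resolution, which is the property the paper exploits for motion planning.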
|
|
16:30-18:00, Paper TuCT26-NT.3 | Add to My Program |
Building Volumetric Beliefs for Dynamic Environments Exploiting Map-Based Moving Object Segmentation |
|
Mersch, Benedikt | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Chen, Xieyuanli | National University of Defense Technology |
Vizzo, Ignacio | Dexory |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Mapping, Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: Mobile robots that navigate in unknown environments need to be constantly aware of the dynamic objects in their surroundings for mapping, localization, and planning. It is key to reason about moving objects in the current observation and at the same time to also update the internal model of the static world to ensure safety. In this paper, we address the problem of jointly estimating moving objects in the current 3D LiDAR scan and a local map of the environment. We use sparse 4D convolutions to extract spatio-temporal features from the scan and local map and segment all 3D points into moving and non-moving ones. Additionally, we propose to fuse these predictions in a probabilistic representation of the dynamic environment using a Bayes filter. This volumetric belief models which parts of the environment can be occupied by moving objects. Our experiments show that our approach outperforms existing moving object segmentation baselines and even generalizes to different types of LiDAR sensors. We demonstrate that our volumetric belief fusion can increase the precision and recall of moving object segmentation and even retrieve previously missed moving objects in an online mapping scenario.
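The probabilistic fusion of per-scan moving/non-moving predictions can be illustrated with a standard recursive log-odds Bayes update per voxel. This is a generic sketch of such a filter, not the paper's code, and the clamping bounds are illustrative assumptions:

```python
import math

def logodds(p):
    # Convert a probability to log-odds form.
    return math.log(p / (1.0 - p))

def update_moving_belief(l_prior, p_moving, l_min=-4.0, l_max=4.0):
    # Fuse one scan's segmentation probability into the voxel's belief;
    # clamping keeps the belief responsive to later contradicting evidence.
    l = l_prior + logodds(p_moving)
    return max(l_min, min(l_max, l))

def belief_prob(l):
    # Convert log-odds back to a probability.
    return 1.0 - 1.0 / (1.0 + math.exp(l))
```

Starting from an uninformed belief (log-odds 0, i.e. probability 0.5), a few consecutive confident "moving" predictions drive the voxel's belief toward 1, while the clamp bounds how entrenched the belief can become.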
|
|
16:30-18:00, Paper TuCT26-NT.4 | Add to My Program |
Fast and Robust Normal Estimation for Sparse LiDAR Scans |
|
Bogoslavskyi, Igor | Magic Leap |
Zampogiannis, Konstantinos | Magic Leap |
Phan, Raymond | Magic Leap |
Keywords: Mapping, Localization, SLAM
Abstract: Light Detection and Ranging (LiDAR) technology has proven to be an important part of many robotics systems. Surface normals estimated from LiDAR data are commonly used for a variety of tasks in such systems. As most of today's mechanical LiDAR sensors produce sparse data, estimating normals from a single scan in a robust manner poses difficulties. In this paper, we address the problem of estimating normals for sparse LiDAR data, avoiding the typical issue of smoothing out the normals in high curvature areas. Mechanical LiDARs rotate a set of rigidly mounted lasers. One firing of such a set of lasers produces an array of points where each point's neighbor is known due to the known firing pattern of the scanner. We use this knowledge to connect these points to their neighbors and label them using the angles of the lines connecting them. When estimating normals at these points, we only consider points with the same label as neighbors. This allows us to avoid smoothing normals across high curvature boundaries. We evaluate our approach on various data, both self-recorded and publicly available, acquired using various sparse LiDAR sensors. We show that using our method for normal estimation leads to normals that are more robust in areas with high curvature, which leads to maps of higher quality. We also show that our method only incurs a linear-factor runtime overhead with respect to a lightweight baseline normal estimation procedure and is therefore suited for operation in computationally demanding environments.
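The neighbour-labeling step can be sketched in 2D: consecutive points along one firing are joined, the joining segments are classified by direction, and a new label begins wherever the direction jumps, so normals are later estimated only from same-label neighbours. This is an illustrative reconstruction of the idea, not the authors' implementation, and the angle threshold is an assumed value:

```python
import numpy as np

def segment_labels(points, angle_jump_deg=30.0):
    """Label an ordered 2D scan line so labels change at sharp corners.

    Angle wrap-around at +/-180 degrees is ignored in this sketch.
    """
    diffs = np.diff(points, axis=0)
    angles = np.degrees(np.arctan2(diffs[:, 1], diffs[:, 0]))
    seg = [0]
    for i in range(1, len(angles)):
        seg.append(seg[-1] + int(abs(angles[i] - angles[i - 1]) > angle_jump_deg))
    # Each point inherits the label of the segment arriving at it; the first
    # point shares the first segment's label.
    return np.array([seg[0]] + seg)
```

Normals at points whose neighbours carry a different label can then be skipped or estimated one-sided, which is what prevents the smoothed-corner artefact the abstract describes.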
|
|
16:30-18:00, Paper TuCT26-NT.5 | Add to My Program |
OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds |
|
Liu, Bonan | HKUST(GZ) |
Zhao, Guoyang | HKUST(GZ) |
Jiao, Jianhao | University College London |
Cai, Guang | The Hong Kong University of Science and Technology |
Li, Chengyang | The Hong Kong University of Science and Technology (Guangzhou) |
Yin, Handi | The Hong Kong University of Science and Technology (Guangzhou), |
Wang, Yuyang | Hong Kong University of Science and Technology (Guangzhou) |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Hui, Pan | Hong Kong University of Science and Technology |
Keywords: Mapping, Omnidirectional Vision, SLAM
Abstract: A colored point cloud, as a simple and efficient 3D representation, has many advantages in various fields, including robotic navigation and scene reconstruction. This representation is now commonly used in 3D reconstruction tasks relying on cameras and LiDARs. However, fusing data from these two types of sensors is handled poorly in many existing frameworks, leading to unsatisfactory mapping results, mainly due to inaccurate camera poses. This paper presents OmniColor, a novel and efficient algorithm to colorize point clouds using an independent 360-degree camera. Given a LiDAR-based point cloud and a sequence of panorama images with initial coarse camera poses, our objective is to jointly optimize the poses of all frames for mapping images onto geometric reconstructions. Our pipeline works in an off-the-shelf manner that does not require any feature extraction or matching process. Instead, we find optimal poses by directly maximizing the photometric consistency of LiDAR maps. In experiments, we show that our method can overcome the severe visual distortion of omnidirectional images and greatly benefit from the wide field of view (FOV) of 360-degree cameras to reconstruct various scenarios with accuracy and stability. The code will be released at https://github.com/liubonan123/OmniColor/.
|
|
16:30-18:00, Paper TuCT26-NT.6 | Add to My Program |
Gaussian Process Mapping of Uncertain Building Models with GMM As Prior |
|
Zou, Qianqian | Leibniz University Hannover |
Brenner, Claus | Leibniz University Hannover |
Sester, Monika | Leibniz University Hannover, Institute of Cartography and Geoinf |
Keywords: Mapping, Probability and Statistical Methods, Range Sensing
Abstract: Mapping with uncertainty representation is required in many research domains, especially for localization. Although there are many investigations regarding the uncertainty of the pose estimation of an ego-robot with map information, the quality of the reference maps is often neglected. To avoid potential problems caused by the errors of maps and a lack of uncertainty quantification, an adequate uncertainty measure for the maps is required. In this paper, uncertain building models with abstract map surfaces using Gaussian Processes (GPs) are proposed to describe the map uncertainty in a probabilistic way. To reduce the redundant computation for simple planar objects, extracted facets from a Gaussian Mixture Model (GMM) are combined with an implicit GP map, also employing local GP-block techniques. The proposed method is evaluated on LiDAR point clouds of city buildings collected by a mobile mapping system. Compared to the performance of other methods such as Octomap, Gaussian Process Occupancy Map (GPOM) and Bayesian Generalized Kernel Inference (BGKOctomap), our method achieves a higher Precision-Recall AUC for the evaluated buildings.
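The abstract's central idea, representing map uncertainty with a Gaussian Process so that the posterior variance quantifies confidence in the surface, can be illustrated with a minimal 1-D GP regression. This is a textbook-style sketch under assumed kernel and noise parameters; the paper's GMM facet extraction and local GP-block techniques are omitted.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_query, noise=0.01):
    """Standard GP regression: posterior mean and variance at query points.
    The posterior variance is the per-point map uncertainty."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_query, x_train)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# A flat wall facet observed with noise: the GP recovers the surface and
# reports low uncertainty near observations, high uncertainty far away.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 40)
y = 2.0 + 0.05 * rng.normal(size=x.shape)        # wall at height 2 m
mean, var = gp_predict(x, y, np.array([2.0, 10.0]))
```

The query at 10.0 m lies far outside the observed facet, so its variance reverts toward the prior, exactly the behavior a localization system can use to discount unreliable map regions.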
|
|
16:30-18:00, Paper TuCT26-NT.7 | Add to My Program |
Occupancy Grid Mapping without Ray-Casting for High-Resolution LiDAR Sensors |
|
Cai, Yixi | University of Hong Kong |
Kong, Fanze | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zhu, Fangcheng | The University of Hong Kong |
Lin, Jiarong | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: Mapping, Range Sensing, Aerial Systems: Perception and Autonomy, LiDAR Perception
Abstract: This article presents an efficient occupancy mapping framework for high-resolution LiDAR sensors, termed D-Map. The framework introduces three main novelties to address the computational efficiency challenges of occupancy mapping. Firstly, we use a depth image to determine the occupancy state of regions instead of the traditional ray casting method. Secondly, we introduce an efficient on-tree update strategy on a tree-based map structure. Thirdly, we remove known grids from the map at each update by leveraging the low false alarm rate of LiDAR sensors. To support our design, we provide theoretical analyses of the accuracy of the depth image projection and time complexity of occupancy updates. Furthermore, we conduct extensive benchmark experiments on various LiDAR sensors in both public and private datasets. Our framework demonstrates superior efficiency in comparison with other state-of-the-art methods while maintaining comparable mapping accuracy and high memory efficiency. We demonstrate two real-world applications of D-Map for real-time occupancy mapping using a high-resolution LiDAR. In addition, we open-source the implementation of D-Map on GitHub.
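The first novelty above, replacing ray casting with a depth-image lookup, can be sketched in a 2-D toy: project each candidate cell into a one-row depth image along its bearing, then compare its range against the measured depth. This is an illustrative sketch, not the D-Map implementation; the field of view, tolerance, and labels are assumed values.

```python
import numpy as np

def classify_cells(cells, depth_image, fov=np.pi / 2, eps=0.1):
    """Classify 2-D cell centers as 'free', 'occupied', or 'unknown' by
    projecting them into a 1-row depth image instead of casting rays.
    A cell closer than the measured depth along its bearing is free."""
    n = len(depth_image)
    labels = []
    for x, y in cells:
        bearing = np.arctan2(y, x)
        if abs(bearing) > fov / 2:
            labels.append("unknown")               # outside the sensor FOV
            continue
        col = int((bearing + fov / 2) / fov * (n - 1))
        r = np.hypot(x, y)
        measured = depth_image[col]
        if r < measured - eps:
            labels.append("free")                  # in front of the surface
        elif r < measured + eps:
            labels.append("occupied")              # on the surface
        else:
            labels.append("unknown")               # occluded behind it
    return labels

# A wall 5 m ahead across the whole FOV.
depth = np.full(90, 5.0)
labels = classify_cells([(2.0, 0.0), (5.0, 0.0), (8.0, 0.0)], depth)
```

The efficiency gain is that each cell costs one projection and one comparison, independent of the range, whereas ray casting walks every voxel between sensor and hit point.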
|
|
16:30-18:00, Paper TuCT26-NT.8 | Add to My Program |
RH-Map: Online Map Construction Framework of Dynamic Object Removal Based on 3D Region-Wise Hash Map Structure |
|
Yan, Zihong | Tsinghua University |
Wu, Xiaoyi | Harbin Institute of Technology, Shenzhen |
Jian, Zhuozhu | Tsinghua University |
Lan, Bin | Tsinghua University |
Wang, Xueqian | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Mapping, Range Sensing, Autonomous Vehicle Navigation
Abstract: Mobile robots navigating in outdoor environments frequently encounter the issue of undesired traces left by dynamic objects and manifested as obstacles on the map, impeding robots from achieving accurate localization and effective navigation. To tackle the problem, a novel map construction framework based on a 3D region-wise hash map structure (RH-Map) is proposed, consisting of front-end scan refresh and back-end removal modules, which realizes real-time map construction and online dynamic object removal (DOR). First, a two-layer 3D region-wise hash map structure for map management is employed for effective online DOR. Then, in scan refresh, region-wise ground plane estimation (R-GPE) is proposed for incrementally estimating and preserving ground information, and Scan-to-Map Removal (S2M-R) is proposed to discriminate and remove dynamic objects. Moreover, a lightweight back-end removal module maintaining keyframes is proposed for further DOR. As experimentally verified on SemanticKITTI, our proposed framework yields promising performance on online DOR of map construction compared with state-of-the-art methods. We also validate the proposed framework in real-world environments. The source code is released to the community: https://github.com/YZH-bot/RH-Map.
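The two-layer region-wise hash map that underpins the framework can be sketched minimally: points hash first into coarse regions, then into fine voxels inside each region, so removing a dynamic object's trace can drop a whole region in one dictionary operation. This sketch is an assumption-laden illustration (the class name, sizes, and hit-count payload are invented here), not the RH-Map code.

```python
from collections import defaultdict

class RegionHashMap:
    """Minimal two-layer hash map: region key -> {voxel key: hit count}.
    Dynamic-object removal can then discard a whole region in O(1)
    instead of erasing its voxels one by one."""

    def __init__(self, region_size=2.0, voxel_size=0.1):
        self.region_size = region_size
        self.voxel_size = voxel_size
        self.regions = defaultdict(dict)

    def _keys(self, p):
        region = tuple(int(c // self.region_size) for c in p)
        voxel = tuple(int(c // self.voxel_size) for c in p)
        return region, voxel

    def insert(self, p):
        region, voxel = self._keys(p)
        self.regions[region][voxel] = self.regions[region].get(voxel, 0) + 1

    def remove_region(self, p):
        """Erase every voxel in the region containing p (e.g. a region
        flagged as holding a dynamic object's trace)."""
        region, _ = self._keys(p)
        self.regions.pop(region, None)

    def occupied(self, p):
        region, voxel = self._keys(p)
        return voxel in self.regions.get(region, {})

m = RegionHashMap()
m.insert((0.5, 0.5, 0.5))
m.insert((3.5, 0.5, 0.5))
m.remove_region((0.6, 0.6, 0.6))   # same region as the first point
```

Hashing at two granularities is what makes region-level operations (ground-plane estimates, batch removal) cheap while retaining voxel-level occupancy queries.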
|
|
16:30-18:00, Paper TuCT26-NT.9 | Add to My Program |
Photometric LiDAR and RGB-D Bundle Adjustment |
|
Di Giammarino, Luca | Sapienza University of Rome |
Giacomini, Emanuele | Sapienza University of Rome |
Brizi, Leonardo | Sapienza University of Rome |
Salem, Omar Ashraf Ahmed Khairy | Sapienza University |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: Mapping, Range Sensing, SLAM
Abstract: The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Mapping (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now offer higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typical effective global refinement techniques employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel BA photometric strategy that accounts for both RGB-D and LiDAR in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par with or better than other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation.
|
|
TuCT27-NT Oral Session, NT-G2 |
Add to My Program |
Grasping III |
|
|
Chair: Kumar, Vikash | Meta AI |
Co-Chair: Ye, Qi | Zhejiang University |
|
16:30-18:00, Paper TuCT27-NT.1 | Add to My Program |
InterRep: A Visual Interaction Representation for Robotic Grasping |
|
Cui, Yu | Zhejiang University |
Ye, Qi | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Chen, Anjun | Zhejiang University |
Li, Gaofeng | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Grasping, Representation Learning, Reinforcement Learning
Abstract: Recently, pre-trained vision models have gained significant attention in motor control, showcasing impressive performance across diverse robotic learning tasks. While previous works predominantly concentrate on the significance of the pre-training phase, the equally important task of extracting more effective representations based on existing pre-trained visual models remains unexplored. To better leverage the representation capabilities of pre-trained models for robotic grasping, we propose InterRep, a novel interaction representation method that possesses not only the strengths of pre-trained models, known for their robustness in noisy environments and their proficiency in recognizing essential features, but also the capacity of capturing dynamic interaction details and local geometric features during the grasping process. Based on the novel representation, we introduce a deep reinforcement learning method to learn generalizable grasping policies. The experimental results demonstrate that our proposed representation outperforms the baselines in terms of both training speed and generalization. For the generalized grasping tasks with dexterous robotic hands, our method boasts a success rate nearly 20% higher than methods using the global features of the entire image from pre-trained models. In addition, our proposed representation method demonstrates promising performance when applied to a different robotic hand and task. It also exhibits excellent performance on real robots with a success rate of 70%.
|
|
16:30-18:00, Paper TuCT27-NT.2 | Add to My Program |
MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation |
|
Lancaster, Patrick | Meta AI |
Hansen, Nicklas | University of California San Diego |
Rajeswaran, Aravind | Meta AI |
Kumar, Vikash | Meta AI |
Keywords: Reinforcement Learning, Sensorimotor Learning, Imitation Learning
Abstract: Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit sites.google.com/view/modemv2 for videos and more details.
|
|
16:30-18:00, Paper TuCT27-NT.3 | Add to My Program |
Towards Feasible Dynamic Grasping: Leveraging Gaussian Process Distance Field, SE(3) Equivariance, and Riemannian Mixture Models |
|
Choi, Ho Jin | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Grasping, Machine Learning for Robot Control, Perception for Grasping and Manipulation
Abstract: This paper introduces a novel approach to improve robotic grasping in dynamic environments by integrating Gaussian Process Distance Fields (GPDF), SE(3) equivariant networks, and Riemannian Mixture Models. The aim is to enable robots to grasp moving objects effectively. Our approach comprises three main components: object shape reconstruction, grasp sampling, and implicit grasp pose selection. GPDF accurately models the shape of objects, which is essential for precise grasp planning. SE(3) equivariance ensures that the sampled grasp poses are equivariant to the object's pose changes, enhancing robustness in dynamic scenarios. Riemannian Gaussian Mixture Models are employed to assess reachability, providing feasible and adaptable grasping strategies. Feasible grasp poses are targeted by novel task or joint space reactive controllers formulated using Gaussian Mixture Models and Gaussian Processes. This method resolves the challenge of discrete grasp pose selection, enabling smoother grasping execution. Experimental validation confirms the effectiveness of our approach in generating feasible grasp poses and achieving successful grasps in dynamic environments. By integrating these advanced techniques, we present a promising solution for enhancing robotic grasping capabilities in real-world scenarios.
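The distance-field component can be illustrated with the standard analytic trick used by GP-based distance fields: regress a squared-exponential "occupancy" field from surface samples and invert the kernel to recover a Euclidean distance, since k(d) = exp(-d^2 / (2 l^2)) implies d = l * sqrt(-2 ln k). The sketch below is a deliberately simplified stand-in, not the paper's GPDF: taking the max kernel response reduces it to an exact nearest-neighbor distance, whereas a real GPDF regresses over all surface points jointly.

```python
import numpy as np

def gpdf(query, surface, length=0.5):
    """Distance field from surface samples via kernel inversion.
    occ is the strongest squared-exponential response among samples;
    inverting the kernel recovers the distance to the nearest sample."""
    d2 = np.sum((query[:, None, :] - surface[None, :, :]) ** 2, axis=-1)
    occ = np.exp(-0.5 * d2 / length ** 2).max(axis=1)
    occ = np.clip(occ, 1e-12, 1.0)       # guard the logarithm
    return length * np.sqrt(-2.0 * np.log(occ))

# Surface samples on a unit circle; the recovered field should be ~|r - 1|.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
queries = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
dist = gpdf(queries, circle)
```

A smooth, differentiable distance field like this is what lets grasp samplers and reactive controllers query object proximity anywhere in the workspace.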
|
|
16:30-18:00, Paper TuCT27-NT.4 | Add to My Program |
A Surprisingly Efficient Representation for Multi-Finger Grasping |
|
Yan, Hengxu | Shanghai Jiao Tong University |
Fang, Hao-Shu | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Grasping, Multifingered Hands, Grippers and Other End-Effectors
Abstract: The problem of grasping objects using a multi-finger hand has received significant attention in recent years. However, it remains challenging to handle a large number of unfamiliar objects in real and cluttered environments. In this work, we propose a representation that can be effectively mapped to the multi-finger grasp space. Based on this representation, we develop a simple decision model that generates accurate grasp quality scores for different multi-finger grasp poses using only hundreds to thousands of training samples. We demonstrate that our representation performs well on a real robot and achieves a success rate of 78.64% after training with only 500 real-world grasp attempts and 87% with 4500 grasp attempts. Additionally, we achieve a success rate of 84.51% in a dynamic human-robot handover scenario using a multi-finger hand.
|
|
16:30-18:00, Paper TuCT27-NT.5 | Add to My Program |
GrainGrasp: Dexterous Grasp Generation with Fine-Grained Contact Guidance |
|
Zhao, Fuqiang | Dalian University of Technology |
Tsetserukou, Dzmitry | Skolkovo Institute of Science and Technology |
Liu, Qian | Dalian University of Technology |
Keywords: Grasping, Multifingered Hands, Perception for Grasping and Manipulation
Abstract: One goal of dexterous robotic grasping is to allow robots to handle objects with the same level of flexibility and adaptability as humans. However, it remains a challenging task to generate an optimal grasping strategy for dexterous hands, especially when it comes to delicate manipulation and accurate adjustment of the desired grasping poses for objects of varying shapes and sizes. In this paper, we propose a novel dexterous grasp generation scheme called GrainGrasp that provides fine-grained contact guidance for each fingertip. In particular, we employ a generative model to predict separate contact maps for each fingertip on the object point cloud, effectively capturing the specifics of finger-object interactions. In addition, we develop a new dexterous grasping optimization algorithm that solely relies on the point cloud as input, eliminating the necessity for complete mesh information of the object. By leveraging the contact maps of different fingertips, the proposed optimization algorithm can generate precise and determinable strategies for human-like object grasping. Experimental results confirm the efficiency of the proposed scheme.
|
|
16:30-18:00, Paper TuCT27-NT.6 | Add to My Program |
Regrasping on Printed Circuit Boards with the Smart Suction Cup |
|
Lee, Jungpyo | University of California, Berkeley |
Sun, Zheng | The Chinese University of Hong Kong |
Dong, Zhipeng | Northeastern University |
Chen, Fei | The Chinese University of Hong Kong |
Stuart, Hannah | UC Berkeley |
Keywords: Grasping, Force and Tactile Sensing, Grippers and Other End-Effectors
Abstract: The disposal of waste electrical and electronic equipment (WEEE) presents a sustainability challenge, particularly for waste printed circuit boards (PCBs). PCBs are challenging to sort out from other waste materials in part because traditional industrial end-effectors struggle to reliably grip these irregularly shaped objects with unmodeled surface-mounted components. Vision-based separators, while effective for object categorization, face challenges with identifying precise grasp points on PCB surfaces. This paper studies regrasping control to enhance suction cup grasping performance on PCBs, addressing issues arising from uneven surfaces and intricate features that interfere with suction sealing. We categorize PCBs into two recycling levels – with large surface features intact or removed – and conduct experiments on both stationary and conveyor belt setups with realistic vision-based grasp planners. Results show that jumping regrasping improves the pick-and-place success rate. Haptically driven jumping – using the Smart Suction Cup – is especially useful for unprocessed waste PCBs with large surface mount parts. The proposed method offers a promising solution to enhance the efficiency and reliability of robotic grasping in recycling applications.
|
|
16:30-18:00, Paper TuCT27-NT.7 | Add to My Program |
Anthropomorphic Grasping with Neural Object Shape Completion |
|
Hidalgo Carvajal, Diego Xavier | Technical University of Munich |
Chen, Hanzhi | Technical University of Munich (TUM) |
Bettelani, Gemma Carolina | Technical University of Munich |
Jung, Jaesug | Technical University of Munich |
Zavaglia, Melissa | Technische Universität München |
Busse, Laura | Division of Neuroscience, Faculty of Biology, LMU Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Grasping, Multifingered Hands, Deep Learning in Grasping and Manipulation
Abstract: The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, where dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing their full geometry from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach has significantly improved the grasping success rates of baselines with only partial reconstruction by nearly 30% and achieved over 150 successful grasps with three different object categories. This demonstrates our approach’s consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.
|
|
16:30-18:00, Paper TuCT27-NT.8 | Add to My Program |
Statistical Stratification and Benchmarking of Robotic Grasping Performance |
|
Denoun, Brice | The Shadow Robot Company |
Hansard, Miles | Queen Mary University of London |
Leon, Beatriz | Shadow Robot Company |
Jamone, Lorenzo | Queen Mary University London |
Keywords: Grasping, Performance Evaluation and Benchmarking, Probability and Statistical Methods, Dexterous Manipulation
Abstract: Robotic grasping is fundamental to many real-world applications, and new approaches must be systematically evaluated. However, in most cases, the performance of a specific approach is assessed by simply counting the number of successful attempts in a given task, and this success rate is then compared to those of other solutions, without taking into account the random variability across different experiments (e.g. due to sensor noise, or variations in object placement). To address this issue, we classify the observed performance into qualitatively ordered outcomes, thereby stratifying the results. We then show how to analyse these results in a statistical framework which accounts for the variability between experiments. The advantages of our approach are demonstrated in the practical comparison of four grasp planning algorithms. In particular, we show that the proposed approach allows us to carry out several distinct evaluations from a single set of experiments, without having to repeat the data collection process. We demonstrate that differences between the algorithms, which would not be apparent from overall success rates, can be identified and evaluated.
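The idea of stratifying grasp trials into ordered outcomes and then accounting for experiment-to-experiment variability can be illustrated with a simple bootstrap. This is a generic sketch of the statistical principle, not the paper's specific framework; the outcome coding and the mean-score statistic are assumptions made for the example.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=5000, seed=0):
    """Bootstrap 95% confidence interval for the difference in mean
    ordinal outcome score between two grasp planners. Outcomes are coded
    on an ordered scale, e.g. 0 = miss, 1 = unstable grasp, 2 = stable grasp."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a, len(a)).mean()
                    - rng.choice(b, len(b)).mean())
    return np.percentile(diffs, [2.5, 97.5])

# Planner A is mostly stable; planner B often only achieves unstable grasps.
# A raw success rate (stable grasps only) would discard the middle stratum.
planner_a = np.array([2] * 70 + [1] * 20 + [0] * 10)
planner_b = np.array([2] * 40 + [1] * 40 + [0] * 20)
lo, hi = bootstrap_diff_ci(planner_a, planner_b)
```

An interval entirely above zero supports a genuine difference between the planners, rather than one attributable to random variability across trials.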
|
|
16:30-18:00, Paper TuCT27-NT.9 | Add to My Program |
The Hydra Hand: A Mode-Switching Underactuated Gripper with Precision and Power Grasping Modes |
|
Chappell, Digby | Imperial College London |
Bello, Fernando | Imperial College London |
Kormushev, Petar | Imperial College London |
Rojas, Nicolas | Imperial College London |
Keywords: Grasping, Multifingered Hands, Grippers and Other End-Effectors
Abstract: Human hands are able to grasp a wide range of object sizes, shapes, and weights, reshaping and altering their apparent stiffness to move from compliant power grasping to rigid precision grasping. Achieving similar versatility in robotic hands remains a challenge, which has often been addressed by adding extra controllable degrees of freedom, tactile sensors, or specialised extra grasping hardware, at the cost of control complexity and robustness. We introduce a novel reconfigurable four-fingered two-actuator underactuated gripper, the Hydra Hand, that switches between compliant power and rigid precision grasps using a single motor, while generating grasps via a single hydraulic actuator, exhibiting adaptive grasping between finger pairs and enabling the power grasping of two objects simultaneously. The mode switching mechanism and the hand's kinematics are presented and analysed, and performance is tested on two grasping benchmarks: one focused on rigid objects, and the other on items of clothing. The Hydra Hand is shown to excel at grasping large and irregular objects and small objects with its respective compliant power and rigid precision configurations. The hand's versatility is then showcased by executing the challenging manipulation task of safely grasping and placing a bunch of grapes, and then plucking a single grape from the bunch.
|
|
TuCT28-NT Oral Session, NT-G4 |
Add to My Program |
In-Hand Manipulation |
|
|
Chair: Tao, Lingfeng | Oklahoma State University |
Co-Chair: Ciocarlie, Matei | Columbia University |
|
16:30-18:00, Paper TuCT28-NT.1 | Add to My Program |
Quasi-Static Soft Fixture Analysis of Rigid and Deformable Objects |
|
Dong, Yifei | KTH |
Pokorny, Florian T. | KTH Royal Institute of Technology |
Keywords: Manipulation Planning, Grasping, Task and Motion Planning
Abstract: We present a sampling-based approach to reasoning about the caging-based manipulation of rigid and a simplified class of deformable 3D objects subject to energy constraints. Towards this end, we propose the notion of soft fixtures, extending earlier work on energy-bounded caging to include a broader set of energy function constraints, such as gravitational and elastic potential energy of 3D deformable objects. Previous methods focused on establishing provably correct algorithms to compute lower bounds or analytically exact estimates of escape energy for a very restricted class of known objects with low-dimensional configuration spaces, such as planar polygons. We instead propose a practical sampling-based approach that is applicable in higher-dimensional configuration spaces, but produces only a sequence of upper-bound estimates, which nonetheless appear to converge rapidly to the actual escape energy. We present 8 simulation experiments demonstrating the applicability of our approach to various complex quasi-static manipulation scenarios. Quantitative results indicate the effectiveness of our approach in providing upper-bound estimates for escape energy in quasi-static manipulation scenarios. Two real-world experiments also show that the computed normalized escape energy estimates appear to correlate strongly with the probability of escape of an object under randomized pose perturbation.
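Why sampled paths yield upper bounds on escape energy can be shown in a 1-D toy: any path from the caged configuration to free space certifies that escape costs at most the highest potential encountered along it, so the minimum over sampled paths can only shrink toward the true barrier height. This is an illustrative sketch under an assumed potential landscape, not the paper's sampling planner.

```python
import numpy as np

def escape_energy_upper_bound(potential, start, goal_region, n_paths=100, seed=0):
    """Sampling-based upper bound on escape energy: each sampled path from
    the start to the free region certifies that escaping costs at most
    (max potential along the path) - (potential at the start). The minimum
    over sampled paths is a monotonically shrinking upper bound."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_paths):
        goal = rng.uniform(*goal_region)
        waypoints = np.linspace(start, goal, 200) + rng.normal(0.0, 0.05, 200)
        barrier = potential(waypoints).max() - potential(np.array([start]))[0]
        best = min(best, barrier)
    return best

# 1-D toy landscape: a well at x = -1, a barrier of height 1 at x = 0,
# and free space for |x| >= 1. The true escape energy is 1.
U = lambda x: np.where(np.abs(x) < 1.0, np.cos(np.pi * x / 2.0) ** 2, 0.0)
bound = escape_energy_upper_bound(U, start=-1.0, goal_region=(2.0, 3.0))
```

Each path can only over- or exactly estimate the barrier, never undercut it (up to path discretization), which is the sense in which the sequence of estimates converges from above.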
|
|
16:30-18:00, Paper TuCT28-NT.2 | Add to My Program |
In-Hand Rolling Manipulation Based on Ball-On-Cloth System |
|
Ichikura, Hinano | Osaka University |
Higashimori, Mitsuru | Osaka University |
Keywords: In-Hand Manipulation, Flexible Robotics, Grippers and Other End-Effectors
Abstract: This paper presents a novel in-hand rolling manipulation method in which a ball on a cloth attached to fingertips is controlled using flexible and adaptive deformation of the cloth. First, an analytical model of the ball-on-cloth system is introduced. The shape of the cloth is simplified, and the rolling constraint of the ball on the cloth is defined focusing on the lowest point of the ball. Next, the relationship between the input to the cloth anchor point and the position of the lowest point of the ball is expressed by a linear approximation. Then, the input to generate the desired rolling orbit is designed. Next, as an example of utilizing the rolling orbits, a manipulation method to rotate the ball around a vertical axis is developed. Finally, a multi-fingered hand with a piece of cloth attached to the fingertips is developed, and the effectiveness of the proposed system is experimentally verified.
|
|
16:30-18:00, Paper TuCT28-NT.3 | Add to My Program |
A Linkage-Driven Underactuated Robotic Hand for Adaptive Grasping and In-Hand Manipulation (I) |
|
Li, Guotao | Institute of Automation Chinese Academy of Sciences |
Liang, Xu | Institute of Automation, Chinese Academy of Sciences |
Gao, Yifan | North China University of Technology |
Su, Tingting | North China University of Technology |
Liu, Zhijie | Beihang University |
Hou, Zeng-Guang | Chinese Academy of Science |
Keywords: Multifingered Hands, In-Hand Manipulation, Grasping
Abstract: The development of robotic hands that can imitate human movements has always been an important research topic. In this paper, a linkage-driven underactuated three-finger hand is proposed to imitate the flexion/extension (f/e) and abduction/adduction (a/a) motions of the human hand. The robotic hand has three identical underactuated fingers, each of which contains an underactuated planar linkage, a spherical four-bar mechanism, and a set of bevel gears. The spherical four-bar mechanism is designed to provide 2-degree-of-freedom actuation, driving the f/e and a/a motions of the proximal joint simultaneously. Based on screw theory, the kinematic model of the spherical mechanism is established, and the maximum available workspace index (MAW) of the spherical mechanism is proposed to evaluate the workspace with the same adduction and abduction angle ranges. The effects of the parameters of the spherical mechanism on the MAW and the transmission efficiency are obtained, and the parameters of the spherical mechanism are optimized. The optimization results show that the MAW of the spherical mechanism can be increased by up to 3.5 times. Finally, experiments are carried out to show that the proposed robotic hand can perform simultaneous adaptive grasping and in-hand manipulation.
|
|
16:30-18:00, Paper TuCT28-NT.4 | Add to My Program |
Curriculum-Based Sensing Reduction in Simulation to Real-World Transfer for In-Hand Manipulation |
|
Tao, Lingfeng | Oklahoma State University |
Zhang, Jiucai | Guangzhou Automotive Group R&D Center, Silicon Valley |
Zheng, Qiaojie | Colorado School of Mines |
Zhang, Xiaoli | Colorado School of Mines |
Keywords: In-Hand Manipulation, Multifingered Hands, Reinforcement Learning
Abstract: Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Asymmetric Actor-Critic approaches are used for Sim2Real to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted through an empirically defined one-step curtail. Small feature reduction does not sufficiently remove the actor’s features, which may still cause difficulty setting up the physical system, while large feature reduction may cause difficulty and inefficiency in policy training. To address this issue, we propose Curriculum-based Sensing Reduction to enable the actor to start with the same rich feature space as the critic and then get rid of the hard-to-extract features step-by-step for higher training performance and better adaptation to the real-world feature space. The reduced features are replaced with random signals from a Deep Random Generator to remove the dependency between the output and the removed features and avoid creating new dependencies. The methods are evaluated on the Allegro robot hand in a real-world in-hand manipulation experiment. The results show that our methods have faster training and higher task performance than baselines and can solve real-world tasks when selected tactile features are reduced.
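The curriculum mechanic described above, replacing hard-to-extract features with random signals one stage at a time while keeping the input dimensionality fixed, can be sketched minimally. The class name, feature names, and plain Gaussian noise (in place of the paper's Deep Random Generator) are assumptions made for this illustration.

```python
import numpy as np

class CurriculumSensingReducer:
    """Step-by-step feature reduction for Sim2Real: the policy starts with
    the full (simulation-only) observation and, stage by stage, hard-to-get
    features are replaced with random signals so the policy learns not to
    depend on them, while the input dimensionality stays fixed."""

    def __init__(self, feature_names, removal_order, seed=0):
        self.names = list(feature_names)
        self.removal_order = list(removal_order)   # e.g. tactile removed first
        self.stage = 0
        self.rng = np.random.default_rng(seed)

    def advance(self):
        """Move to the next curriculum stage (remove one more feature)."""
        self.stage = min(self.stage + 1, len(self.removal_order))

    def observe(self, obs):
        """Replace removed features with noise drawn independently of obs,
        so no new input-output dependency can form."""
        obs = dict(obs)
        for name in self.removal_order[: self.stage]:
            obs[name] = self.rng.normal(size=np.shape(obs[name]))
        return obs

reducer = CurriculumSensingReducer(
    feature_names=["joint_pos", "object_pose", "tactile"],
    removal_order=["tactile", "object_pose"],
)
full = {"joint_pos": np.zeros(16), "object_pose": np.zeros(7), "tactile": np.zeros(12)}
stage0 = reducer.observe(full)          # everything still real
reducer.advance()
stage1 = reducer.observe(full)          # tactile replaced with noise
```

Because the replacement signal is independent of the state, gradients through the removed channels carry no information, which is what gradually weans the actor off them.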
|
|
16:30-18:00, Paper TuCT28-NT.5 | Add to My Program |
Geometric Fabrics: A Safe Guiding Medium for Policy Learning |
|
Van Wyk, Karl | NVIDIA |
Handa, Ankur | NVidia |
Makoviichuk, Viktor | NVIDIA |
Guo, Yijie | University of Michigan, Ann Arbor |
Allshire, Arthur | University of Toronto |
Ratliff, Nathan | NVIDIA |
Keywords: In-Hand Manipulation, Reinforcement Learning, Dynamics
Abstract: Robot policies are always subject to complex, second-order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
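The notion of letting policy actions steer artificial second-order dynamics, rather than command targets directly, can be sketched with a toy system: an attractor toward the policy's target, a repulsive barrier term, and damping. This is a generic illustration of behavioral dynamics, not a geometric fabric in the paper's formal sense (which requires specific geometric structure); all gains and the obstacle are assumed values.

```python
import numpy as np

def behavioral_step(x, v, target, obstacle, dt=0.01):
    """One Euler step of simple artificial second-order dynamics: an
    attractor toward the policy's action target, a repulsive barrier term
    that fades with distance from the obstacle, and damping. Abrupt
    (bang-bang) changes of `target` still produce smooth, bounded motion
    because actions only steer the attractor, not the state directly."""
    attract = 4.0 * (target - x)
    d = x - obstacle
    repel = 0.05 * d / (np.linalg.norm(d) ** 3 + 1e-9)
    damping = -3.0 * v
    a = attract + repel + damping
    return x + dt * v, v + dt * a

x, v = np.zeros(2), np.zeros(2)
target = np.array([1.0, 0.0])
obstacle = np.array([0.5, 0.3])
for _ in range(2000):
    x, v = behavioral_step(x, v, target, obstacle)
```

However roughly the RL policy switches `target`, the state only ever changes through the integrated dynamics, which is what makes the induced action space safe for a real robot.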
|
|
16:30-18:00, Paper TuCT28-NT.6 | Add to My Program |
Robust In-Hand Manipulation with Extrinsic Contacts |
|
Liang, Boyuan | University of California, Berkeley |
Ota, Kei | Tokyo Institute of Technology |
Tomizuka, Masayoshi | University of California |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Keywords: In-Hand Manipulation, Dexterous Manipulation, Manipulation Planning
Abstract: We present in-hand manipulation tasks where a robot moves an object in grasp, maintains its external contact mode with the environment, and adjusts its in-hand pose simultaneously. The proposed manipulation task leads to complex contact interactions which can be very susceptible to uncertainties in kinematic and physical parameters. Therefore, we propose a robust in-hand manipulation method, which consists of two parts. First, an in-gripper mechanics model computes a naive motion cone assuming all parameters are precise. Then, a robust planning method refines the motion cone to maintain the desired contact mode regardless of parametric errors. Real-world experiments were conducted to illustrate the accuracy of the mechanics model and the effectiveness of the robust planning framework in the presence of kinematics parameter errors.
|
|
16:30-18:00, Paper TuCT28-NT.7 | Add to My Program |
Dexterous In-Hand Manipulation by Guiding Exploration with Simple Sub-Skill Controllers |
|
Khandate, Gagan | Columbia University |
Mehlman, Cameron | Columbia University |
Wei, Xingsheng | Columbia University |
Ciocarlie, Matei | Columbia University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Sensorimotor Learning
Abstract: Recently, reinforcement learning has led to dexterous manipulation skills of increasing complexity. Nonetheless, learning these skills in simulation still exhibits poor sample efficiency, which stems from the fact that these skills are learned from scratch without the benefit of any domain expertise. In this work, we aim to improve the sample efficiency of learning dexterous in-hand manipulation skills using controllers available via domain knowledge. To this end, we design simple sub-skill controllers and demonstrate improved sample efficiency using a framework that guides exploration toward relevant state space by following actions from these controllers. We are the first to demonstrate learning hard-to-explore finger-gaiting in-hand manipulation skills without the use of an exploratory reset distribution.
|
|
16:30-18:00, Paper TuCT28-NT.8 | Add to My Program |
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing |
|
Yuan, Ying | Tsinghua University |
Che, Haichuan | University of California San Diego |
Qin, Yuzhe | UC San Diego |
Huang, Binghao | University of California, San Diego |
Yin, Zhao-Heng | University of California, Berkeley |
Lee, Kang-Won | Dongguk University |
Wu, Yi | Tsinghua University |
Lim, Soo-Chul | Dongguk University |
Wang, Xiaolong | UC San Diego |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Sensor Fusion
Abstract: Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance.
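One way to read the point cloud-based tactile representation described above is that tactile readings are lifted into the same spatial format as visual points, so a single point-cloud encoder can consume both. The sketch below is a hypothetical minimal version of that idea — the `(x, y, z, is_tactile)` encoding is invented here and is not the paper's exact representation.

```python
def fuse_visuotactile(visual_pts, tactile_pts):
    """Merge visual and tactile readings into one labeled point cloud.

    Each point becomes (x, y, z, is_tactile) so both modalities share one
    spatial representation. Illustrative sketch only.
    """
    cloud = [(x, y, z, 0.0) for (x, y, z) in visual_pts]
    cloud += [(x, y, z, 1.0) for (x, y, z) in tactile_pts]
    return cloud

# A visual point on the object and a nearby tactile contact point.
merged = fuse_visuotactile([(0.10, 0.00, 0.20)], [(0.09, 0.01, 0.19)])
```

Keeping both modalities in one coordinate frame is what lets the downstream policy reason jointly about what it sees and what it touches.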
|
|
16:30-18:00, Paper TuCT28-NT.9 | Add to My Program |
Adaptive Fingers Coordination for Robust Grasp and In-Hand Manipulation under Disturbances and Unknown Dynamics |
|
Khadivar, Farshad | EPFL |
Billard, Aude | EPFL |
Keywords: Dexterous Manipulation, Grasping, Robust/Adaptive Control of Robotic Systems, Coupled Dynamical Systems
Abstract: We present a control framework for achieving a robust object grasp and manipulation in hand. In-hand manipulation remains a demanding task as the object is never stable and task success relies on carefully synchronizing the fingers' dynamics. Indeed, fingers must simultaneously generate motion while maintaining contact with the object and, by staying within the hand's frame, ensuring that the object remains manipulable. These challenges are exacerbated once the hand gets disturbed or when the internal dynamics of the manipulated object are unknown, such as when it is filled with liquid moving during manipulation. We present a control strategy based on coupled dynamical systems, whereby the fingers move in synchronization using an intermediate dynamic responsible for coordinating fingers. To adapt to changes in forces due to model uncertainties and unexpected disturbances, we employ an adaptive torque-controller combined with a joint impedance regulator that guarantees high tracking accuracy while adapting to dynamic changes. We validate the approach in multiple experiments on a 16 degrees-of-freedom robotic hand grasping and manipulating objects with different
|
|
TuCT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection II |
|
|
Chair: Lu, Chris Xiaoxuan | University College London |
Co-Chair: Dayoub, Feras | The University of Adelaide |
|
16:30-18:00, Paper TuCT29-NT.1 | Add to My Program |
Robust 3D Object Detection from LiDAR-Radar Point Clouds Via Cross-Modal Feature Augmentation |
|
Deng, Jianning | University of Edinburgh |
Chan, King Wah Gabriel | Connecticut College |
Zhong, Hantao | University of Cambridge |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Learning
Abstract: This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination. Our proposed approach is agnostic to the direction of hallucination between LiDAR and 4D radar. We introduce multiple alignments on both spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation. Specifically, spatial alignment is proposed to deal with the geometry discrepancy for better instance matching between LiDAR and radar. The feature alignment step further bridges the intrinsic attribute gap between the sensing modalities and stabilizes the training. The trained object detection models can handle difficult detection cases better, even though only single-modal data is used as the input during the inference stage. Extensive experiments on the View-of-Delft (VoD) dataset show that our proposed method outperforms the state-of-the-art (SOTA) methods for both radar and LiDAR object detection while maintaining competitive efficiency in runtime.
|
|
16:30-18:00, Paper TuCT29-NT.2 | Add to My Program |
Predicting Class Distribution Shift for Reliable Domain Adaptive Object Detection |
|
Chapman, Nicolas Harvey | Queensland University of Technology |
Dayoub, Feras | The University of Adelaide |
Browne, Will | Queensland University of Technology |
Lehnert, Christopher | Queensland University of Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Learning
Abstract: Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous approaches to UDA-OD based on self-training have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also impact the likelihood that different objects will occur, termed class distribution shift. Motivated by this, we propose a framework for explicitly addressing class distribution shift to improve pseudo-label reliability in self-training. Our approach uses the domain invariance and contextual understanding of a pre-trained joint vision and language model to predict the class distribution of unlabelled data. By aligning the class distribution of pseudo-labels with this prediction, we provide weak supervision of pseudo-label accuracy. To further account for low-quality pseudo-labels early in self-training, we propose an approach to dynamically adjust the number of pseudo-labels per image based on model confidence. Our method outperforms state-of-the-art approaches on several benchmarks, including a 4.7 mAP improvement when facing challenging class distribution shift.
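The alignment step described above — matching the class distribution of retained pseudo-labels to a predicted distribution — can be sketched as a per-class quota on the highest-confidence detections. This is an illustrative reading under invented data structures (`(class, confidence)` tuples, a `budget` parameter), not the paper's implementation.

```python
def align_pseudo_labels(pseudo, predicted_dist, budget):
    """Keep the highest-confidence pseudo-labels per class so that the
    retained counts follow a predicted class distribution.

    pseudo: list of (class_name, confidence) detections.
    predicted_dist: class -> probability (e.g., from a vision-language model).
    budget: total number of pseudo-labels to keep. Illustrative sketch only.
    """
    kept = []
    for cls, p in predicted_dist.items():
        quota = round(p * budget)
        ranked = sorted((c for c in pseudo if c[0] == cls),
                        key=lambda c: c[1], reverse=True)
        kept.extend(ranked[:quota])
    return kept

pseudo = [("car", 0.9), ("car", 0.8), ("car", 0.4), ("person", 0.7)]
kept = align_pseudo_labels(pseudo, {"car": 0.5, "person": 0.5}, budget=2)
```

Without the quota, the over-represented "car" class would dominate the pseudo-labels; the predicted distribution acts as the weak supervision signal.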
|
|
16:30-18:00, Paper TuCT29-NT.3 | Add to My Program |
LSSAttn: Towards Dense and Accurate View Transformation for Multi-Modal 3D Object Detection |
|
Jiang, Qi | Shanghai Jiaotong University |
Sun, Hao | National University of Singapore |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Fusing the camera and LiDAR information in the unified BEV representation serves as the elegant paradigm for the 3D detection tasks. Current multi-modal fusion methods in BEV can be categorized into LSS-based and Transformer-based in terms of their view transformation. The former leverages inaccurate depth prediction and massive pseudo points for perspective-to-BEV transformation while the latter only fetches sparse image features to the BEV representation. To overcome their shortcomings, an optimized view transformation is proposed, which can be easily modulated into the LSS-based methods. The proposed module capitalizes on the LSS mechanism to establish dense associations between perspective pixels and BEV grids. It utilizes the attention mechanism to compute similarity scores for each associated pair during feature aggregation. Starting from the BEVFusion baseline, we further introduce (1) cross-attention within the associated subsets to transfer image features into the BEV, and (2) a multi-scale feature fusion mechanism for LSS-based view transformation. Extensive experiments on nuScenes validate the effectiveness and efficiency of our proposed module, which achieves an increase of 1.3% in mAP compared to the baseline model.
|
|
16:30-18:00, Paper TuCT29-NT.4 | Add to My Program |
Learning Temporal Cues by Predicting Objects Move for Multi-Camera 3D Object Detection |
|
Moon, Seokha | Korea University |
Park, Hongbeen | Korea University |
Lee, Jaekoo | Kookmin University |
Kim, Jinkyu | Korea University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: In autonomous driving and robotics, there is a growing interest in utilizing short-term historical data to enhance multi-camera 3D object detection, leveraging the continuous and correlated nature of input video streams. Recent work has focused on spatially aligning BEV-based features over timesteps. However, this is often limited as its gain does not scale well with long-term past observations. To address this, we advocate for supervising a model to predict objects' poses given past observations, thus explicitly guiding it to learn objects' temporal cues. To this end, we propose a model called DAP (Detection After Prediction), consisting of a two-branch network: (i) a branch responsible for forecasting the current objects' poses given past observations and (ii) another branch that detects objects based on the current and past observations. The features predicting the current objects from branch (i) are fused into branch (ii) to transfer predictive knowledge. We conduct extensive experiments with the large-scale nuScenes datasets, and we observe that utilizing such predictive information significantly improves the overall detection performance. Our model can be used plug-and-play, showing consistent performance gain.
|
|
16:30-18:00, Paper TuCT29-NT.5 | Add to My Program |
Improved Yolov5: HIC-Yolov5 for Small Object Detection |
|
Tang, Shiyi | Ocean University of China |
Zhang, Shu | Ocean University of China |
Fang, Yini | Hong Kong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Aerial Systems: Applications
Abstract: Small object detection has been a challenging problem in the field of object detection. Previous work has made improvements specifically for small objects, such as adding attention blocks or changing the whole structure of feature fusion networks. However, there is still room for improvement, and the computational cost remains large, which hinders real-time object detection. An improved Yolov5 model, HIC-Yolov5, is proposed to address these problems. Firstly, by adding an additional prediction head for small objects, higher-resolution feature maps can be used directly to detect small targets. Secondly, an involution block is adopted between the backbone and neck to increase the channel information of the feature map. Moreover, an attention mechanism named CBAM is applied at the end of the backbone, which not only decreases the computation cost compared with previous works but also emphasizes important information in both the channel and spatial domains. Finally, we show that the improved Yolov5 algorithm improves mAP@[.5:.95] by 6.42% and mAP@0.5 by 9.38% on the VisDrone-2019-DET dataset.
|
|
16:30-18:00, Paper TuCT29-NT.6 | Add to My Program |
CLIPUNetr: Assisting Human-Robot Interface for Uncalibrated Visual Servoing Control with CLIP-Driven Referring Expression Segmentation |
|
Jiang, Chen | University of Alberta |
Yang, Yuchen | University of Alberta |
Jagersand, Martin | University of Alberta |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Visual Servoing
Abstract: The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, which is a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr - a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while utilizing its ``U-shaped'' encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
|
|
16:30-18:00, Paper TuCT29-NT.7 | Add to My Program |
C2FDrone: Coarse-To-Fine Drone-To-Drone Detection Using Vision Transformer Networks |
|
Rebbapragada, Sairam | IIT Hyderabad |
Panda, Pranoy | Fujitsu Research of India |
Balasubramanian, Vineeth | Indian Institute of Technology, Hyderabad |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning Methods, Swarm Robotics
Abstract: A vision-based drone-to-drone detection system offers a cost-effective solution for a range of applications, including collision avoidance, countering hostile drones, and enhancing search-and-rescue operations. However, drone-to-drone detection presents a more intricate set of challenges compared to regular object detection. These challenges encompass the need to detect extremely small-sized objects, contend with strong distortion, handle severe occlusion, operate in uncontrolled environments, and execute real-time processing. While current methods attempt to address these issues by integrating multi-scale feature fusion and temporal information, we propose that these techniques may not be sufficiently equipped to handle extreme blur and minuscule objects. Instead, we put forth a novel coarse-to-fine detection strategy based on vision transformers to achieve precise drone detection. We assess the effectiveness of our approach through a series of comprehensive experiments conducted on three challenging drone-to-drone detection datasets. Our results demonstrate notable improvements, with F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Furthermore, we showcase its real-time processing capability by deploying our model on an edge-computing device. We will make our code repository publicly available.
|
|
16:30-18:00, Paper TuCT29-NT.8 | Add to My Program |
Better Monocular 3D Detectors with LiDAR from the Past |
|
You, Yurong | Cornell University |
Phoo, Cheng Perng | Cornell University |
Diaz-Ruiz, Carlos | Cornell University |
Luo, Katie | Cornell University |
Chao, Wei-Lun | Cornell University |
Campbell, Mark | Cornell University |
Hariharan, Bharath | Cornell University |
Weinberger, Kilian | Cornell University |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, RGB-D Perception
Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gain (up to 9 AP) across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
|
|
16:30-18:00, Paper TuCT29-NT.9 | Add to My Program |
A Metacognitive Approach to Out-Of-Distribution Detection for Segmentation |
|
Gummadi, Meghna | University of Pennsylvania |
Kent, Cassandra | University of Pennsylvania |
Schmeckpeper, Karl | University of Pennslyvania |
Eaton, Eric | University of Pennsylvania |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Semantic Scene Understanding
Abstract: Despite outstanding semantic scene segmentation in closed-worlds, deep neural networks segment novel instances poorly, which is required for autonomous agents acting in an open world. To improve out-of-distribution (OOD) detection for segmentation, we introduce a metacognitive approach in the form of a lightweight module that leverages entropy measures, segmentation predictions, and spatial context to characterize the segmentation model’s uncertainty and detect pixel-wise OOD data in real-time. Additionally, our approach incorporates a novel method of generating synthetic OOD data in context with in-distribution data, which we use to fine-tune existing segmentation models with maximum entropy training. This further improves the metacognitive module’s performance without requiring access to OOD data while enabling compatibility with established pre-trained models. Our resulting approach can reliably detect OOD instances in a scene, as shown by state-of-the-art performance on OOD detection for semantic segmentation benchmarks.
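The entropy cue the metacognitive module leverages can be illustrated with a minimal per-pixel check: flag a pixel as out-of-distribution when the predictive entropy of its softmax output is high. The normalization by log(num_classes) and the 0.5 threshold are invented for this sketch; the actual module combines entropy with segmentation predictions and spatial context.

```python
import math

def pixel_ood_mask(softmax_maps, threshold=0.5):
    """Flag pixels whose normalized predictive entropy exceeds a threshold.

    softmax_maps: list of per-pixel class-probability lists. Entropy is
    normalized by log(num_classes) so the threshold lies in [0, 1].
    Minimal stand-in for one cue used by a metacognitive OOD detector.
    """
    mask = []
    for probs in softmax_maps:
        h = -sum(p * math.log(p) for p in probs if p > 0)
        mask.append(h / math.log(len(probs)) > threshold)
    return mask

# A confident pixel (in-distribution) vs. a near-uniform one (flagged OOD).
mask = pixel_ood_mask([[0.98, 0.01, 0.01], [0.34, 0.33, 0.33]])
```

The confident prediction has near-zero entropy and passes, while the near-uniform prediction is flagged — the basic signal that the learned module refines with context.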
|
|
TuCT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics II |
|
|
Chair: Gu, Jason | Dalhousie University |
Co-Chair: Honda, Kohei | Nagoya University |
|
16:30-18:00, Paper TuCT30-NT.1 | Add to My Program |
When to Replan? an Adaptive Replanning Strategy for Autonomous Navigation Using Deep Reinforcement Learning |
|
Honda, Kohei | Nagoya University |
Yonetani, Ryo | CyberAgent |
Nishimura, Mai | Omron Sinic X |
Kozuno, Tadashi | Omron Sinic X |
Keywords: AI-Enabled Robotics, Reinforcement Learning, Integrated Planning and Learning
Abstract: The hierarchy of global and local planners is one of the most commonly utilized system designs in autonomous robot navigation. While the global planner generates a reference path from the current to goal locations based on the pre-built map, the local planner produces a kinodynamic trajectory to follow the reference path while avoiding perceived obstacles. To account for unforeseen or dynamic obstacles not present on the pre-built map, ``when to replan'' the reference path is critical for the success of safe and efficient navigation. However, determining the ideal timing to execute replanning in such partially unknown environments remains an open question. In this work, we first conduct an extensive simulation experiment to compare several common replanning strategies and confirm that effective strategies are highly dependent on the environment as well as the global and local planners. Based on this insight, we then derive a new adaptive replanning strategy based on deep reinforcement learning, which can learn from experience to decide appropriate replanning timings in the given environment and planning setups. Our experimental results show that the proposed replanner can perform on par with or even better than the current best-performing strategies in multiple situations regarding navigation robustness and efficiency.
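The kind of hand-tuned replanning rule that such a learned strategy replaces can be sketched in a few lines: replan periodically, or whenever the robot drifts too far from the reference path. The thresholds and the two-feature interface below are made up for illustration; the learned replanner conditions on richer observations.

```python
def should_replan(time_since_replan, path_deviation,
                  period=5.0, deviation_thresh=1.0):
    """Baseline replanning trigger: replan on a fixed period, or when the
    robot deviates from the reference path beyond a threshold.
    (Illustrative rule; thresholds are invented.)
    """
    return time_since_replan >= period or path_deviation >= deviation_thresh
```

A deep RL replanner effectively learns when this boolean should fire for a given environment and planner pair, instead of relying on fixed `period` and `deviation_thresh` values.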
|
|
16:30-18:00, Paper TuCT30-NT.2 | Add to My Program |
Resolving Loop Closure Confusion in Repetitive Environments for Visual SLAM through AI Foundation Models Assistance |
|
Li, Hongzhou | Sun Yat-Sen University |
Yu, Sijie | Sun Yat-Sen University |
Zhang, Shengkai | Wuhan University of Technology |
Tan, Guang | Sun Yat-Sen University |
Keywords: AI-Enabled Robotics, SLAM, Semantic Scene Understanding
Abstract: In visual SLAM (VSLAM) systems, loop closure plays a crucial role in reducing accumulated errors. However, VSLAM systems relying on low-level visual features often suffer from the problem of perceptual confusion in repetitive environments, where scenes in different locations are incorrectly identified as the same. Existing work has attempted to introduce object-level features or artificial landmarks. The former approach struggles to distinguish visually similar but different objects, while the latter is both time-consuming and labor-intensive. This paper introduces a novel loop closure detection method that leverages pretrained AI foundation models to extract rich semantic information about specific types of objects (e.g., door numbers), referred to as semantic anchors, that help to distinguish similar scenes better. In settings such as office buildings, hotels, and warehouses, this approach helps to improve the robustness of loop closure detection. We validate the effectiveness of our method through experiments conducted in both simulated and real-world environments.
|
|
16:30-18:00, Paper TuCT30-NT.3 | Add to My Program |
Prepare the Chair for the Bear! Robot Imagination of Sitting Affordance to Reorient Previously Unseen Chairs |
|
Meng, Xin | National University of Singapore |
Wu, Hongtao | Bytedance |
Ruan, Sipu | National University of Singapore |
Chirikjian, Gregory | National University of Singapore |
Keywords: AI-Enabled Robotics, Simulation and Animation, Manipulation Planning
Abstract: In this letter, a paradigm for the classification and manipulation of novel objects is established and demonstrated through a real example of chairs. Our approach leverages the robot's understanding of object stability, perceptibility, and affordance to prepare previously unseen and randomly oriented chairs for a teddy bear to sit on. The teddy bear is a proxy for an elderly person, hospital patient, or child. By autonomously reconstructing a complete model of the object and inserting it into a physical simulator (i.e., the robot's "imagination"), the robot assesses whether the object is a chair and determines how to reorient it properly to be used. Experiment results show that our method achieves a high success rate on the real robot task of chair preparation. Also, it outperforms several baseline methods on the task of upright pose prediction for chairs. The developed system can be easily transferred to a wide variety of application scenarios, and illustrates a broader paradigm in affordance-based reasoning.
|
|
16:30-18:00, Paper TuCT30-NT.4 | Add to My Program |
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models |
|
Katara, Pushkal | Carnegie Mellon University |
Xian, Zhou | Carnegie Mellon University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Keywords: AI-Based Methods, AI-Enabled Robotics
Abstract: Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and to code up reward functions for reinforcement learning. Such human involvement is an important bottleneck in scaling up robot learning across diverse tasks and environments. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating the generation of 3D assets, task descriptions, task decompositions, and reward functions using large pre-trained generative models of language and vision. We generate 3D assets for simulation by lifting open-world 2D object-centric images to 3D using image diffusion models and querying LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we chain-of-thought prompt LLMs to map these to relevant task descriptions, temporal decompositions, and corresponding python reward functions for reinforcement learning. We show Gen2Sim succeeds in learning policies for diverse long-horizon tasks, where reinforcement learning with non-temporally decomposed reward functions fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development, and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition in RL. Our work contributes hundreds of simulated assets, tasks, and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
|
|
16:30-18:00, Paper TuCT30-NT.5 | Add to My Program |
FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models |
|
Zhang, Jiatao | Zhejiang University |
Tang, Lanling | University of Chinese Academy of Sciences |
Song, Yufan | Zhejiang University |
Meng, Qiwei | The Chinese University of Hong Kong |
Qian, Haofu | Zhejiang University |
Shao, Jun | Zhejiang University |
Song, Wei | Zhejiang Lab |
Zhu, Shiqiang | Zhejiang University |
Gu, Jason | Dalhousie University |
Keywords: AI-Based Methods, AI-Enabled Robotics, Agent-Based Systems
Abstract: Recent planning methods based on Large Language Models (LLMs) typically employ the in-context learning paradigm. Complex long-horizon planning tasks require more context (including instructions and demonstrations) to guarantee that the generated plan can be executed correctly. However, under such conditions, LLMs may overlook (i.e., be unfaithful to) the rules in the given context, resulting in generated plans that are invalid or even lead to dangerous actions. In this paper, we investigate the faithfulness of LLMs on complex long-horizon tasks. Inspired by human intelligence, we introduce a novel framework named FLTRNN. FLTRNN employs a language-based RNN structure to integrate task decomposition and memory management into LLM planning inference, which effectively improves the faithfulness of LLMs and makes the planner more reliable. We conducted experiments on VirtualHome household tasks. Results show that our model significantly improves faithfulness and success rates on complex long-horizon tasks.
|
|
16:30-18:00, Paper TuCT30-NT.6 | Add to My Program |
Drive Anywhere: Generalizable End-To-End Autonomous Driving with Multi-Modal Foundation Models |
|
Wang, Tsun-Hsuan | Massachusetts Institute of Technology |
Maalouf, Alaa | MIT |
Xiao, Wei | MIT |
Ban, Yutong | Massachusetts Institute of Technology |
Amini, Alexander | Massachusetts Institute of Technology |
Rosman, Guy | Massachusetts Institute of Technology |
Karaman, Sertac | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Keywords: AI-Enabled Robotics, Sensor-based Control, Autonomous Agents
Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and the incorporation of latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundational model capabilities to create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness on out-of-distribution situations.
|
|
16:30-18:00, Paper TuCT30-NT.7 | Add to My Program |
AutoTAMP: Autoregressive Task and Motion Planning with LLMs As Translators and Checkers |
|
Chen, Yongchao | Harvard University |
Arkin, Jacob | Massachusetts Institute of Technology |
Dawson, Charles | MIT |
Zhang, Yang | IBM |
Roy, Nicholas | Massachusetts Institute of Technology |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: AI-Enabled Robotics, Task and Motion Planning, Semantic Scene Understanding
Abstract: For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks described by natural language. Recent advances in large language models (LLMs) have shown promise for translating natural language into robot action sequences for complex tasks. However, existing approaches either translate the natural language directly into robot trajectories or factor the inference process by decomposing language into task sub-goals and relying on a motion planner to execute each sub-goal. When complex environmental and temporal constraints are involved, inference over planning tasks must be performed jointly with motion plans using traditional task-and-motion planning (TAMP) algorithms, making factorization into subgoals untenable. Rather than using LLMs to directly plan task sub-goals, we instead perform few-shot translation from natural language task descriptions to an intermediate task representation that can then be consumed by a TAMP algorithm to jointly solve the task and motion plan. To improve translation, we automatically detect and correct both syntactic and semantic errors via autoregressive re-prompting, resulting in significant improvements in task completion. We show that our approach outperforms several methods using LLMs as planners in complex task domains. See our project website for prompts, videos, and code.
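The autoregressive re-prompting loop described above — detect an error in the translated task representation, feed it back, and ask the LLM to fix it — can be sketched as follows. The `llm` and `check` callables and the prompt wording are hypothetical placeholders; the actual prompts and checkers are on the project website.

```python
def translate_with_reprompting(task_nl, llm, check, max_rounds=3):
    """Translate natural language into an intermediate task representation,
    re-prompting with detected errors until the candidate passes a checker.

    llm(prompt) -> candidate string; check(candidate) -> None if valid,
    else an error message. Both are hypothetical stand-ins.
    """
    prompt = f"Translate to task representation: {task_nl}"
    candidate = llm(prompt)
    for _ in range(max_rounds):
        error = check(candidate)
        if error is None:
            return candidate  # passed syntactic/semantic checks
        # Append the failed attempt and its error, then ask for a fix.
        prompt = f"{prompt}\nPrevious attempt: {candidate}\nError: {error}\nFix it."
        candidate = llm(prompt)
    return candidate
```

The key design choice is that the checker's error message is part of the next prompt, so each round is conditioned on what went wrong rather than sampling blindly.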
|
|
16:30-18:00, Paper TuCT30-NT.8 | Add to My Program |
ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation |
|
Yokoyama, Naoki | Georgia Institute of Technology |
Clegg, Alexander | Georgia Institute of Technology |
Truong, Joanne | The Georgia Institute of Technology |
Undersander, Eric | Facebook AI Research |
Yang, Tsung-Yen | META |
Arnaud, Sergio | Meta |
Ha, Sehoon | Georgia Institute of Technology |
Batra, Dhruv | Georgia Tech / Facebook AI Research |
Rai, Akshara | Facebook AI Research |
Keywords: AI-Enabled Robotics, Reinforcement Learning, Deep Learning Methods
Abstract: We present Adaptive Skill Coordination (ASC) – an approach for accomplishing long-horizon tasks like mobile pick-and-place (i.e., navigating to an object, picking it, navigating to another location, and placing it). ASC consists of three components – (1) a library of basic visuomotor skills (navigation, pick, place), (2) a skill coordination policy that chooses which skill to use when, and (3) a corrective policy that adapts pre-trained skills in out-of-distribution states. All components of ASC rely only on onboard visual and proprioceptive sensing, without requiring detailed maps with obstacle layouts or precise object locations, easing real-world deployment. We train ASC in simulated indoor environments, and deploy it zero-shot (without any real-world experience or fine-tuning) on the Boston Dynamics Spot robot in eight novel real-world environments (one apartment, one lab, two microkitchens, two lounges, one office space, one outdoor courtyard). In rigorous quantitative comparisons in two environments, ASC achieves near-perfect performance (59/60 episodes, or 98%), while sequentially executing skills succeeds in only 44/60 (73%) episodes. Extensive perturbation experiments show that ASC is robust to hand-off errors, changes in the environment layout, dynamic obstacles (e.g., people), and unexpected disturbances. Supplementary videos at adaptiveskillcoordination.github.io.
|
|
16:30-18:00, Paper TuCT30-NT.9 | Add to My Program |
Forgetting in Robotic Episodic Long-Term Memory |
|
Plewnia, Joana | Karlsruhe Institute of Technology |
Peller-Konrad, Fabian | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Cognitive Control Architectures, Cognitive Modeling, Embodied Cognitive Science
Abstract: Artificial cognitive architectures traditionally rely on complex memory models to encode, store, and retrieve information. However, the conventional practice of transferring all data from working memory (WM) to long-term memory (LTM) leads to high data volumes and challenges in efficient information processing and access. Deciding what information to retain or discard within a robot’s LTM is particularly challenging since knowledge about future data utilization is absent. Drawing inspiration from human forgetting, this paper implements and evaluates novel forgetting techniques that allow consolidation in the robot’s LTM only when new information is encountered. The proposed approach combines fast filtering during data transfer to the robot’s LTM with slower yet more precise forgetting mechanisms that are periodically evaluated for offline data deletion inside the LTM. We compare different mechanisms, utilizing metrics such as data similarity, data age, and consolidation frequency. The efficacy of forgetting techniques is evaluated by comparing their performance in a task where two ARMAR robots search through their LTM for past object locations in episodic ego-centric images and robot state data. Experimental results show that our forgetting techniques significantly reduce the space requirements of a robot’s LTM while maintaining its capacity to successfully perform tasks relying on LTM information. Notably, similarity-based forgetting methods outperform frequency- and time-based approaches. The combination of online frequency-based, online similarity-based, offline similarity-based, and time-based decay methods shows superior performance compared to using individual forgetting strategies.
|
|
TuCT31-NT Oral Session, NT-G7 |
Add to My Program |
Intelligent and Flexible Manufacturing |
|
|
Chair: Liu, Lianqing | Shenyang Institute of Automation |
Co-Chair: Park, Yong-Lae | Seoul National University |
|
16:30-18:00, Paper TuCT31-NT.1 | Add to My Program |
Fixture Calibration with Guaranteed Bounds from a Few Correspondence-Free Surface Points |
|
Haugaard, Rasmus Laurvig | University of Southern Denmark |
Kim, Yitaek | University of Southern Denmark |
Iversen, Thorbjørn Mosekjær | The Maersk Mc-Kinney Moller Institute, University of Southern Denmark |
Keywords: Calibration and Identification, Software Tools for Robot Programming, Industrial Robots
Abstract: Calibration of fixtures in robotic work cells is essential but also time-consuming and error-prone, and poor calibration can easily lead to wasted debugging time in downstream tasks. Contact-based calibration methods let the user measure points on the fixture's surface with a tool tip attached to the robot's end effector. Most methods require the user to manually annotate correspondences on the CAD model; however, this is error-prone and a cumbersome user experience. We propose a correspondence-free alternative: the user simply measures a few points on the fixture's surface, and our method provides a tight superset of the poses that could explain the measured points. This naturally detects ambiguities related to symmetry and uninformative points and conveys this uncertainty to the user. Perhaps more importantly, it provides guaranteed bounds on the pose. The computation of such bounds is made tractable by the use of a hierarchical grid on SE(3). Our method is evaluated both in simulation and on a real collaborative robot, showing great potential for easier and less error-prone fixture calibration.
|
|
16:30-18:00, Paper TuCT31-NT.2 | Add to My Program |
Data-Driven Virtual Sensing for Probabilistic Condition Monitoring of Solenoid Valves (I) |
|
Vantilborgh, Victor | Ghent University |
Lefebvre, Tom | Ghent University |
Crevecoeur, Guillaume | Ghent University |
Keywords: Manufacturing, Maintenance and Supply Chains, Factory Automation
Abstract: There is an emerging industrial demand for predictive maintenance algorithms that exhibit high levels of predictive accuracy. Such condition monitoring tools must estimate dynamic quantities, such as Remaining Useful Lifetime (RUL) and State of Health (SOH), based on a typically restricted set of measurements that can be obtained in an operational setting. These quantities exhibit inherent stochasticity and can only be approximately determined a posteriori, after system failure. This paper proposes a generic prognostic tool for probabilistic condition monitoring of mechatronic systems, with the aim of improving the probabilistic prediction of condition metrics, specifically RUL and SOH. To this end, we propose to identify a Hidden Markov Model (HMM) from a fully instrumented measurement set that is only available for a restricted set of run-to-failure experiments, typically gathered in an R&D setting. Although RUL and SOH are artificial, retrospectively constructed metrics, we interpret them as physical measurements in order to identify accurate degradation dynamics. Once the degradation model is identified, we exploit the mathematical flexibility of the HMM framework to estimate, in real time, several of the no-longer-available dynamic quantities of interest from the limited set of measurements that are available in an operational setting. This modelling paradigm is known as virtual sensing.
|
|
16:30-18:00, Paper TuCT31-NT.3 | Add to My Program |
Digital Robot Judge (DR.J): Building a Task-Centric Performance Database of Real-World Manipulation with Electronic Task Boards |
|
So, Peter | Technical University of Munich |
Sarabakha, Andriy | Nanyang Technological University |
Wu, Fan | Technical University of Munich |
Culha, Utku | Technical University of Munich |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Factory Automation, Intelligent and Flexible Manufacturing, Sensor Networks
Abstract: Robotics aims to develop manipulation skills approaching human performance. However, skill complexity is often over- or underestimated based on individual experience, and the real-world performance gap is difficult or expensive to measure through in-person competitions. To bridge this gap, we propose a compact, Internet-connected, electronic task board to measure manipulation performance remotely; we call it the digital robot judge, or “DR.J.” By detecting key events on the board through performance circuitry, DR.J provides an alternative to transporting equipment to in-person competitions and serves as a portable test and data generation system that captures and grades performances, making comparisons less expensive. Data collected are automatically published on a web dashboard (WD) that provides a living performance benchmark. We share the results of a proof-of-concept electronic task board with industry-inspired tasks used in an international competition in 2021 and 2022 to benchmark localization, insertion, and disassembly tasks. We present data from 10 DR.J task boards, describe a method for deriving the relative task complexity (RTC) from timing data, and compare robot solutions with a human performer. In the best case, robots performed 9x faster than humans in specialized tasks but achieved only 16% of human speed across the full set of tasks. Finally, we present the design and software to replicate the electronic task board to promote task-centric benchmarking.
|
|
16:30-18:00, Paper TuCT31-NT.4 | Add to My Program |
Semi-Analytical Design of PDE Endpoint Controller for Flexible Manipulator with Non-Homogenous Boundary Conditions (I) |
|
Yaqubi, Sadeq | Tampere University |
Tahamipourzarandi, Seyedmohammad | Tampere University |
Mattila, Jouni | Tampere University |
Keywords: Flexible Robotics, Nonholonomic Mechanisms and Systems
Abstract: This study proposes a new semi-analytical design and implementation method for nonlinear partial differential equation (PDE) control of a flexible manipulator. The proposed scheme considers the effects of the boundary input force and gravity on the payload, which result in non-homogeneous boundary conditions. This objective is achieved through a model transformation scheme that homogenizes the boundary conditions, yielding semi-analytical solutions for the corresponding PDE model. The model transformation is assigned as a hybrid exponential–polynomial function whose coefficients are conveniently calculable without any additional boundary condition measurements. This eliminates the need for intensive numerical solvers (for example, methods based on finite element analysis) and allows the implementation of sophisticated PDE control schemes that consider fully nonlinear PDE models at high computation speed. The presented controller is robust to parametric model uncertainty due to its adaptive design. The precision and efficiency of calculating distributed states using the proposed model transformation are demonstrated on experimental data for the flexible manipulator, with a camera-based motion capture system as ground truth. The model transformation is also implemented numerically for the proposed nonlinear endpoint control method based on the original PDE model.
|
|
16:30-18:00, Paper TuCT31-NT.5 | Add to My Program |
Automated Sewing System Enabled by Machine Vision for Smart Garment Manufacturing |
|
Ku, Subyeong | Seoul National University |
Choi, HyunWoong | Seoul National University |
Kim, Ho-Young | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Intelligent and Flexible Manufacturing, Computer Vision for Manufacturing, Factory Automation
Abstract: This paper presents an automated sewing system designed for smart garment manufacturing, incorporating machine vision capabilities into a custom-built sewing machine. The vision system captures an image of the fabric pattern placed between two acrylic plates with a small opening, utilizing a deep learning model to detect and segment the opening, which represents the area of interest on the plate. Subsequently, a specialized algorithm detects a narrow seam line within the segmented image and generates a stitching path along the seam line at a consistent distance. The sewing machine then accurately stitches along the generated path automatically. The vision system achieves a spatial resolution of 68 µm per pixel. The custom-built sewing machine, controlled by an external computer, exhibits a spatial resolution of 10 µm, a translation speed of 60 mm/s, and an adjustable stitching interval ranging from 1 mm to 5 mm. The subsystems and components are interconnected using the Robot Operating System (ROS), enabling seamless communication and integration. The proposed system eliminates the need for human intervention, facilitating automated garment production, and is expected to play a critical role in realizing the vision of smart garment manufacturing.
|
|
16:30-18:00, Paper TuCT31-NT.6 | Add to My Program |
Segmentation and Coverage Planning of Freeform Geometries for Robotic Surface Finishing |
|
Schneyer, Stefan | German Aerospace Center (DLR) |
Sachtler, Arne | Technical University of Munich (TUM) |
Eiband, Thomas | German Aerospace Center (DLR) |
Nottensteiner, Korbinian | German Aerospace Center (DLR) |
Keywords: Intelligent and Flexible Manufacturing, Contact Modeling, Motion and Path Planning
Abstract: Surface finishing such as grinding or polishing is a time-consuming task, involves health risks for humans, and is still largely performed by hand. Due to the high curvatures of complex geometries, different areas of the surface cannot be optimally reached by a simple strategy using a tool with a relatively large and flat finishing disk. In this paper, a planning method is presented that uses a variable contact point on the finishing disk as an additional degree of freedom. Different strategies for covering the workpiece surface are used to optimize the surface finishing process and ensure the coverage of concave areas. To this end, an automatic segmentation method is developed to find areas with a uniform machining strategy based on the exact tool and workpiece geometry. Further, a method for planning coverage paths is presented, in which the contact area is modeled to realize an adaptive spacing between path lines. The approach was evaluated in simulation and practical experiments on the DLR SARA robot. The results show high coverage for complex freeform geometries and that adaptive spacing can optimize the overall process by reducing uncovered gaps and overlaps between coverage lines.
|
|
16:30-18:00, Paper TuCT31-NT.7 | Add to My Program |
Integrating Robot Assignment and Maintenance Management: A Multi-Agent Reinforcement Learning Approach for Holistic Control |
|
Bhatta, Kshitij | University of Virginia |
Chang, Qing | University of Virginia |
Keywords: Intelligent and Flexible Manufacturing, Manufacturing, Maintenance and Supply Chains, Planning, Scheduling and Coordination
Abstract: Modern manufacturing requires effective integration of production control and maintenance scheduling to improve productivity and quality. However, there have been few studies on this integrated control due to a lack of a comprehensive manufacturing system model. In response to this challenge, this paper presents a mathematical model framework for a mobile multi-skilled robot-operated manufacturing system that integrates three essential control aspects: robot assignment, maintenance scheduling, and product quality. To demonstrate the effectiveness of this approach, a control problem is solved in the Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework. Results show that the proposed integrated model outperforms models that consider only system-level parameters, as well as those that only address maintenance scheduling and quality-related parameters.
|
|
16:30-18:00, Paper TuCT31-NT.8 | Add to My Program |
Towards Fault-Tolerant Deployment of Mobile Robot Navigation in the Edge: An Experimental Study |
|
Mirus, Florian | Intel Labs |
Pasch, Frederik | Intel |
Scholl, Kay-Ulrich | Intel |
Keywords: Robot Safety, Failure Detection and Recovery, Industrial Robots
Abstract: Modern algorithms allow robots to reach greater levels of autonomy and fulfill more challenging tasks. However, on-board limitations in computational and battery resources hinder the deployment of such algorithms, particularly on mobile robots. Although offloading the majority of the algorithmic components to the edge, or even the cloud, offers an attractive option to leverage massive computing power in robotics applications, safety and reliability remain critical issues. This paper presents a minimalistic safety fallback mechanism for offloading mobile robot navigation to the edge that ensures safe, collision-free navigation even in the presence of failures in the connection between the on-board and edge devices. We show the effectiveness of our approach through extensive testing in three relevant scenarios in a simulated warehouse environment. Our experiments demonstrate the effects of different fallback strategies and show how our proposed approach ensures safety while allowing the robot to continue its mission during an interrupted connection, thus avoiding unnecessary downtime.
|
|
TuCT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems III |
|
|
Chair: Li, Dachuan | Tsinghua University |
Co-Chair: Tumova, Jana | KTH Royal Institute of Technology |
|
16:30-18:00, Paper TuCT32-NT.1 | Add to My Program |
Prompting Multi-Modal Tokens to Enhance End-To-End Autonomous Driving Imitation Learning with LLMs |
|
Duan, Yiqun | University of Technology Sydney |
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Intelligent Transportation Systems
Abstract: The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting a 'pure-language' strategy. In this research, we propose a hybrid end-to-end learning framework for autonomous driving that combines basic driving imitation learning with LLMs based on multi-modality prompt tokens. Instead of simply converting perception results from separately trained models into pure-language input, our novelty lies in two aspects: 1) the end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating the description bias introduced by separately pre-trained perception models; 2) rather than directly letting LLMs drive, this paper explores a hybrid setting in which LLMs help the driving model correct mistakes and handle complicated scenarios. The results of our experiments suggest that the proposed methodology attains a driving score of 49.21%, coupled with an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.
|
|
16:30-18:00, Paper TuCT32-NT.2 | Add to My Program |
Efficient and Differentiable Joint Conditional Prediction and Cost Evaluation for Tree-Structured Planning in Autonomous Driving |
|
Huang, Zhiyu | Nanyang Technological University |
Karkus, Peter | NVIDIA |
Ivanovic, Boris | NVIDIA |
Chen, Yuxiao | Nvidia Research |
Pavone, Marco | Stanford University |
Lv, Chen | Nanyang Technological University |
Keywords: Intelligent Transportation Systems, Deep Learning Methods
Abstract: Motion prediction and cost evaluation are vital components in the decision-making system of autonomous vehicles. However, existing methods often ignore the importance of cost learning and treat them as separate modules. In this study, we employ a tree-structured policy planner and propose a differentiable joint training framework for both ego-conditioned prediction and cost models, resulting in a direct improvement of the final planning performance. For conditional prediction, we introduce a query-centric Transformer model that performs efficient ego-conditioned motion prediction. For planning cost, we propose a learnable context-aware cost function with latent interaction features, facilitating differentiable joint learning. We validate our proposed approach using the real-world nuPlan dataset and its associated planning test platform. Our framework not only matches state-of-the-art planning methods but outperforms other learning-based methods in planning quality, while operating more efficiently in terms of runtime. We show that joint training delivers significantly better performance than separate training of the two modules. Additionally, we find that tree-structured policy planning outperforms the conventional single-stage planning approach.
|
|
16:30-18:00, Paper TuCT32-NT.3 | Add to My Program |
SIMMF: Semantics-Aware Interactive Multiagent Motion Forecasting for Autonomous Vehicle Driving |
|
Krishnan Nivash, Vidyaa | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Intelligent Transportation Systems, Deep Learning Methods
Abstract: Autonomous vehicles require motion forecasting of the surrounding multi-agents (pedestrians and vehicles) to make optimal decisions for navigation. Existing methods focus on techniques to utilize the positions and velocities of these agents and fail to capture semantic information from the scene. Moreover, to mitigate the increase in computational complexity associated with the number of agents in the scene, some works leverage Euclidean distance to prune far-away agents. However, a distance-based metric alone is insufficient to select relevant agents and accurately perform their predictions. To resolve these issues, we propose the Semantics-aware Interactive Multiagent Motion Forecasting (SIMMF) method to capture semantics along with spatial information and optimally select relevant agents for motion prediction. Specifically, we achieve this by implementing a semantics-aware selection of relevant agents from the scene and passing them through an attention mechanism to extract global encodings. These encodings, along with the agents' local information, are passed through an encoder to obtain time-dependent latent variables for a motion policy predicting the future trajectories. Our results show that the proposed approach outperforms state-of-the-art baselines and provides more accurate and scene-consistent predictions. The demonstration video is available at: https://youtu.be/Wjla071BPBA
|
|
16:30-18:00, Paper TuCT32-NT.4 | Add to My Program |
CausalAgents: A Robustness Benchmark for Motion Forecasting |
|
Sun, Liting | University of California, Berkeley |
Roelofs, Rebecca | Google Research, Brain Team |
Caine, Ben | Google Research |
Refaat, Khaled | Waymo |
Sapp, Benjamin | Waymo |
Ettinger, Scott | Waymo |
Chai, Wei | Waymo |
Keywords: Intelligent Transportation Systems, Deep Learning Methods, Performance Evaluation and Benchmarking
Abstract: As machine learning models become increasingly prevalent in motion forecasting for autonomous vehicles (AVs), it is critical to ensure that model predictions are safe and reliable. In this paper, we examine the robustness of motion forecasting to non-causal perturbations. We construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data. Specifically, we conduct an extensive labeling effort to identify causal agents, or agents whose presence influences human drivers' behavior, in the Waymo Open Motion Dataset (WOMD), and we use these labels to perturb the data by deleting non-causal agents from the scene. We evaluate a diverse set of state-of-the-art deep-learning models on our proposed benchmark and find that all evaluated models exhibit large shifts under non-causal perturbation: we observe a surprising 25-38% relative change in minADE as compared to the original. In addition, we investigate techniques to improve model robustness, including increasing the training dataset size and using targeted data augmentations that randomly drop non-causal agents throughout training. Finally, we release the causal agent labels as an extension to WOMD and the robustness benchmarks to aid the community in building more reliable and safe deep-learning models for motion forecasting.
|
|
16:30-18:00, Paper TuCT32-NT.5 | Add to My Program |
Highway-Driving with Safe Velocity Bounds on Occluded Traffic |
|
Nyberg, Truls | KTH Royal Institute of Technology |
van Haastregt, Jonne | KTH Royal Institute of Technology |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Intelligent Transportation Systems, Formal Methods in Robotics and Automation, Collision Avoidance
Abstract: Limited visibility and sensor occlusions pose pressing safety challenges for advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs). In this work, we pursue a balance: a method that ensures safety in occluded scenarios while preventing overly cautious behavior. We argue that such approaches are crucial for AVs' future, particularly when navigating alongside human drivers on highways at high speeds. To this end, we use reachability analysis to find safe velocity bounds on occluded traffic participants. Compared to state-of-the-art methods, we achieve velocity increases in more than 60% of the 230 cut-in scenarios from the highD dataset, without sacrificing safety.
|
|
16:30-18:00, Paper TuCT32-NT.6 | Add to My Program |
Generalizing Cooperative Eco-Driving Via Multi-Residual Task Learning |
|
Jayawardana, Vindula | Massachusetts Institute of Technology |
Li, Sirui | MIT |
Wu, Cathy | MIT |
Farid, Yashar | Toyota North America |
Oguchi, Kentaro | Toyota InfoTechnology Center, USA |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Motion and Path Planning
Abstract: Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control policies that generalize to multiple traffic scenarios is still a challenge. To address this, we introduce Multi-residual Task Learning (MRTL), a generic learning framework based on multi-task learning that, for a set of task scenarios, decomposes the control into nominal components that are effectively solved by conventional control methods and residual terms which are solved using learning. We employ MRTL for fleet-level emission reduction in mixed traffic using autonomous vehicles as a means of system control. By analyzing the performance of MRTL across nearly 600 signalized intersections and 1200 traffic scenarios, we demonstrate that it emerges as a promising approach to synergize the strengths of DRL and conventional methods in generalizable control.
|
|
16:30-18:00, Paper TuCT32-NT.7 | Add to My Program |
Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map |
|
Garces, Daniel | Harvard University |
Bhattacharya, Sushmita | Harvard University |
Bertsekas, Dimitri | MIT |
Gil, Stephanie | Harvard University |
Keywords: Intelligent Transportation Systems, Reinforcement Learning, Multi-Robot Systems
Abstract: In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a priori but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execution keeps the number of outstanding requests uniformly bounded over time. Although rollout-based approaches are well-suited for learning cooperative multiagent policies with considerations for future demand, applying such methods to a large urban environment can be computationally expensive due to the large number of taxis required for stability. We address this computational bottleneck by proposing an approximate multiagent rollout-based two-phase algorithm that reduces computational costs while still achieving a stable near-optimal policy. Our approach partitions the graph into sectors based on the predicted demand and the maximum number of taxis that can run sequentially given the user's computational resources. The algorithm then applies instantaneous assignment (IA) to re-balance taxis across sectors and a sector-wide multiagent rollout algorithm that is executed in parallel for each sector. We provide two main theoretical results: 1) we characterize the number of taxis m that is sufficient for IA to be stable; 2) we derive a necessary condition on m to maintain stability for IA as time goes to infinity. Our numerical results show that our approach achieves stability for an m that satisfies the theoretical conditions. We also empirically demonstrate that our proposed two-phase algorithm has performance equivalent to the one-at-a-time rollout over the entire map, but with significantly lower runtimes.
|
|
16:30-18:00, Paper TuCT32-NT.8 | Add to My Program |
Continual Driving Policy Optimization with Closed-Loop Individualized Curricula |
|
Niu, Haoyi | Tsinghua University |
Xu, Yizhou | Tsinghua University |
Jiang, Xingjian | Tsinghua University |
Hu, Jianming | Tsinghua University |
Keywords: Intelligent Transportation Systems, Autonomous Agents, Reinforcement Learning
Abstract: The safety of autonomous vehicles (AVs) has been a long-standing top concern, stemming from the absence of rare, safety-critical scenarios in the long-tail naturalistic driving distribution. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, with a focus on generating high-risk driving scenarios and applying them to conduct safety-critical testing of AV models. However, limited work has explored the reuse of these extensive scenarios to iteratively improve AV models. Moreover, it remains intractable and challenging to filter through gigantic scenario libraries collected from other AV models with distinct behaviors in an attempt to extract transferable information for current AV improvement. Therefore, we develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into a set of standardized sub-modules for flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task, where it estimates the chance of AV failures in these scenarios at each iteration. Subsequently, by re-sampling from historical scenarios based on these failure probabilities, CLIC tailors individualized curricula for downstream training, aligning them with the evaluated capability of the AV. Accordingly, CLIC not only maximizes the utilization of the vast pre-collected scenario library for closed-loop driving policy optimization but also facilitates AV improvement by individualizing its training with more challenging cases drawn from those poorly organized scenarios. Experimental results clearly indicate that CLIC surpasses other curriculum-based training strategies, showing substantial improvement in managing risky scenarios, while still maintaining proficiency in handling simpler cases.
|
|
16:30-18:00, Paper TuCT32-NT.9 | Add to My Program |
Task-Driven Domain-Agnostic Learning with Information Bottleneck for Autonomous Steering |
|
Shen, Yu | University of Maryland |
Zheng, Laura | University of Maryland, College Park |
Zhou, Tianyi | University of Maryland, College Park |
Lin, Ming C. | University of Maryland at College Park |
Keywords: Transfer Learning
Abstract: Environments for autonomous driving can vary from place to place, leading to challenges in designing a learning model for a new scene. Transfer learning can leverage knowledge from a learned domain to a new domain with limited data. In this work, we focus on end-to-end autonomous driving as the target task, consisting of both perception and control. We first utilize information bottleneck analysis to build a causal graph that defines our framework and the loss function; then we propose a novel domain-agnostic learning method for autonomous steering based on our analysis of training data, network architecture, and training paradigm. Experiments show that our method outperforms other SOTA methods.
|
|
TuCL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster III |
|
|
|
16:30-18:00, Paper TuCL-EX.1 | Add to My Program |
A Graph-Based Planner for Scalable Ergodic Inspection |
|
Wong, Benjamin | University of Washington |
Paine, Tyler | Massachusetts Institute of Technology |
Devasia, Santosh | University of Washington |
Banerjee, Ashis | University of Washington |
Keywords: Task and Motion Planning, Task Planning, Robotics in Hazardous Fields
Abstract: In this work, we present a method for performing ergodic exploration on a region graph using a rapidly mixing Markov chain. This method enables ergodic planning in spaces with arbitrary topology and at any scale. We have demonstrated in simulation that the ergodic planner outperforms both maximum-entropy exploration and random exploration in minimizing the maximum detection error in the context of anomaly detection in a confined space.
|
|
16:30-18:00, Paper TuCL-EX.2 | Add to My Program |
Active Pneumatic Control of Automated Picoinjection for Regulating Droplet Contents Inside Microfluidic Chips |
|
Wang, Jiahao | Shanghai University, School of Mechatronic Engineering and Autom |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Zhong, Songyi | Shanghai University |
Li, Long | Shanghai University |
Zhang, Quan | Shanghai University |
Yue, Tao | Shanghai University |
Fukuda, Toshio | Nagoya University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: We present a method for picoinjection in a microfluidic chip via pneumatic control. The microfluidic chip consists of a fluidic channel layer, an elastic film layer, and a pneumatic control layer. Numerical simulation results show that the venturi structure injects into the droplet with less pressure. When the voltage was increased from 0.1 V to 0.25 V, the diameter of the injected droplet increased; when the voltage signal continued to increase, the droplet diameter became unstable. The injected dose is affected by the injection phase velocity, the continuous phase velocity, and the dispersed phase velocity. The preparation of calcium alginate microcapsules was achieved by this approach.
|
|
16:30-18:00, Paper TuCL-EX.3 | Add to My Program |
Development of Multifunctional Legged Locomotion System Consisting of Two X-Shaped Walkers |
|
Sedoguchi, Taiki | Japan Advanced Institute of Science and Technology |
Komori, Mikito | Japan Advanced Institute of Science and Technology |
Asano, Fumihiko | Japan Advanced Institute of Science and Technology |
Keywords: Legged Robots, Mechanism Design, Flexible Robotics
Abstract: The authors have been working on the wheel gait generation problem for planar X-shaped walkers, and recently showed that a telescopic-legged X-shaped walker can generate a stable wheel gait on a horizontal plane by asymmetrizing the impact posture. In this study, we propose a novel multifunctional legged locomotion system using the telescopic-legged X-shaped walker as the minimal component. Its biggest feature is the ability to combine or separate the X-shaped walkers depending on the environment and work purpose. As a prototype, we are currently developing an experimental machine whose minimal component is a simplified X-shaped walker that omits degrees of freedom at the hip joint. We will report on the latest progress.
|
|
16:30-18:00, Paper TuCL-EX.4 | Add to My Program |
Single Actuated Amphibious Mechanism with Undulation Fin and Leg-Wheel Structure |
|
Tian, Yang | Ritsumeikan University |
Ma, Shugen | Hong Kong University of Science and Technology (Guangzhou) |
Ohira, Takeki | Ritsumeikan University |
Zhang, Guoteng | Shandong University |
Keywords: Mechanism Design, Field Robots, Search and Rescue Robots
Abstract: Amphibious robots have garnered considerable attention and interest due to their unique ability to navigate both on land and underwater. In the development of these versatile robots, a wide range of methods for underwater and land locomotion can be considered. A recent breakthrough involves creating a hybrid mechanism that combines various simple functions, enhancing the robot's adaptability to diverse environments. However, as the complexity of the mechanism increases, so does the need for additional actuators, leading to intricate control systems and reduced robustness. This paper introduces an innovative mechanism that integrates an undulation fin, legs, and wheels, all operated by a single actuator, enabling the robot to achieve amphibious capabilities. With an analysis of several simple function mechanisms, a concept of the fin-leg-wheel combination is proposed. Subsequently, the fin-wheel structure is thoroughly analyzed. Finally, a leg structure is designed to complement the fin-wheel structure. The developed robot prototype is then subjected to rigorous field experiments across various environmental conditions.
|
|
16:30-18:00, Paper TuCL-EX.5 | Add to My Program |
Pure Pursuit Path Tracking for Reversing Tractor-Trailer Mobile Robot |
|
Sagong, Uihun | University of Seoul |
Park, JeongHyun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning
Abstract: The goal of this research is to achieve path tracking for a tractor-trailer mobile robot, which consists of a two-wheel differential drive (DD) type tractor and a passively connected two-wheel trailer. The pure pursuit algorithm is chosen as the path tracking method. However, the basic pure pursuit method can cause a jack-knife problem when driving in reverse. To overcome this, a reverse strategy that uses a ‘virtual tractor’ and ‘virtual trailer’ is presented. Additionally, we introduce a strategy for switching between forward and reverse motions. To check the performance, simulations were conducted in the Gazebo simulator. After generating the path using the ‘Hybrid A*’ algorithm, the pure pursuit algorithm with the proposed reverse strategy is implemented to track the path. It was confirmed that the robot followed the path and switched well between forward and reverse motions.
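The virtual-tractor construction is not detailed in the abstract; for reference, the core pure pursuit step that both the forward and reverse strategies build on, computing steering curvature toward a lookahead point, can be sketched as follows (frame conventions and names are illustrative):

```python
import math

def pure_pursuit_curvature(pose, lookahead_pt):
    """Standard pure pursuit: curvature of the circular arc from the robot
    pose (x, y, heading theta) to a lookahead point on the path.
    Angular velocity command = curvature * linear velocity."""
    x, y, theta = pose
    dx = lookahead_pt[0] - x
    dy = lookahead_pt[1] - y
    # lateral offset of the lookahead point in the robot frame
    y_r = -math.sin(theta) * dx + math.cos(theta) * dy
    L2 = dx * dx + dy * dy           # squared lookahead distance
    return 2.0 * y_r / L2            # kappa = 2 * y_r / L^2

# robot at origin heading +x: a point straight ahead gives zero curvature
assert abs(pure_pursuit_curvature((0, 0, 0), (1.0, 0.0))) < 1e-12
# a point to the left gives positive (left-turning) curvature
assert pure_pursuit_curvature((0, 0, 0), (1.0, 0.5)) > 0
```

In the paper's reverse strategy, this same computation would presumably be applied to a mirrored ‘virtual tractor’ pose so the trailer leads, avoiding the jack-knife instability of naive reversing.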
|
|
16:30-18:00, Paper TuCL-EX.6 | Add to My Program |
Adaptive Curriculum Learning Design Using Feedback for Collision Avoidance |
|
Hwang, Gyuyong | Tech University of Korea |
Choi, Jeongmin | Tech University of Korea |
Eoh, Gyuho | Tech University of Korea |
Keywords: Collision Avoidance, Reinforcement Learning, Deep Learning Methods
Abstract: This paper proposes an adaptive curriculum learning (CL) design method for collision avoidance using feedback from the deep reinforcement learning (DRL) training process. Previous research on DRL-based collision avoidance algorithms has encountered challenges such as long training times and difficulty in convergence due to sparse rewards. To address these issues, CL has been used to divide the target task into multiple subtasks for training. However, manual or random curriculum design often generates unnecessary subtasks that do not improve performance. Furthermore, a standardized curriculum design method for collision avoidance has not yet been presented. Therefore, this paper presents a method for learning collision avoidance through adaptive curriculum design that utilizes feedback from the training process. The proposed method differs from traditional CL in that the subtask is not predetermined before training. Instead, the curriculum is modified during training based on feedback obtained from validation environments. If a robot demonstrates high collision avoidance performance in a validation environment, it is validated in sequentially more challenging environments for rigorous evaluation. Conversely, if collision avoidance performance in the validation environment is low, the robot will either train in a new environment or train more in the existing environment, depending on the situation. Simulations and practical experiments were conducted for the proposed method.
|
|
16:30-18:00, Paper TuCL-EX.7 | Add to My Program |
Mobile Manipulator Motion Planner for Human-Robot Collaborative Task |
|
Choi, JungHyun | University of Seoul |
Sagong, Uihun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning, Human-Robot Collaboration, Human Detection and Tracking
Abstract: A mobile manipulator with mobility and manipulation capabilities suits human-robot collaborative work. We propose a virtual impedance energy field motion planner for the mobile manipulator. Through this motion planner, the non-holonomic mobile robot avoids obstacles and follows the defined path at the same time. Using manipulator impedance control, we verify compliant motion caused by human force in Gazebo simulation. The planner will be used in human-robot collaborative tasks.
|
|
16:30-18:00, Paper TuCL-EX.8 | Add to My Program |
Efficient Map Merging with Object-Plane Descriptors for Multi-Robot Systems |
|
Kim, Doyeon | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Distributed Robot Systems, Industrial Robots
Abstract: This paper proposes an object-plane descriptor to enhance the efficiency of map-merging in complex indoor environments for multi-robot systems. Unlike traditional methods that rely on feature matching—prone to errors in dynamic settings—this study utilizes object detection and plane segmentation to extract information from objects and planes. This information is then projected onto a 2D image, forming a robust descriptor that identifies common objects and transformation relationships for map-merging. In experiments, two robots randomly searched for seven objects placed in the environment; they successfully matched six pairs of objects, while one pair was missed due to viewpoint differences between the robots. Future research will aim to overcome these viewpoint differences and integrate more information into the object matching process.
|
|
16:30-18:00, Paper TuCL-EX.9 | Add to My Program |
Estimating the 3D Location of the Burr for Robotic Deburring Task |
|
Lee, Dongwoo | University of Seoul |
Kim, Yeongmin | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Computer Vision for Manufacturing
Abstract: We suggest a method to estimate the position of burrs on FRP parts for robotic deburring tasks. These parts, being black and glossy, limit the acquisition of precise 3D data through depth cameras. Furthermore, it is difficult to use the CAD model because of the heat shrinkage of the parts. In this paper, for robotic process automation, the location of burrs is estimated through methods using truncated SVD and feature point detection on 2D images. Finally, the method is verified through experiments.
|
|
16:30-18:00, Paper TuCL-EX.10 | Add to My Program |
Physical Backdoor Attack Can Jeopardize Driving with Vision-Large-Language Models |
|
Ni, Zhenyang | Shanghai Jiao Tong University |
Ye, Rui | Shanghai Jiao Tong University |
Wei, Yuxi | Shanghai Jiao Tong University |
Xiang, Zhen | University of Illinois, Urbana-Champaign |
Wang, Yanfeng | Shanghai Jiao Tong University |
Chen, Siheng | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Autonomous Agents, AI-Based Methods
Abstract: Vision-Large-Language-Models (VLMs) have great application prospects in autonomous driving. Despite the ability of VLMs to comprehend and make decisions in complex scenarios, their integration into safety-critical autonomous driving systems poses serious security risks. In this paper, we propose BadVLMDriver, the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects. Unlike existing backdoor attacks against VLMs that rely on digital modifications, BadVLMDriver uses common physical items, such as a football, to induce unsafe actions like sudden acceleration, highlighting a significant real-world threat to autonomous vehicle safety. To execute BadVLMDriver, we develop an automated pipeline utilizing natural language instructions to generate backdoor training samples with embedded malicious behaviors. This approach allows for flexible trigger and behavior selection, enhancing the stealth and practicality of the attack in diverse scenarios. We conduct extensive experiments to evaluate BadVLMDriver for two representative VLMs, five different trigger objects, and two types of malicious backdoor behaviors. BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon. Thus, BadVLMDriver not only demonstrates a critical security risk but also emphasizes the urgent need for developing robust defense mechanisms to protect against such vulnerabilities.
|
|
16:30-18:00, Paper TuCL-EX.11 | Add to My Program |
Actuation Constraints in Continuum Robotics Revisited: 2 Dof Manifold and Clarke Transform |
|
Grassmann, Reinhard M. | University of Toronto |
Senyk, Anastasiia | Ukrainian Catholic University |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Tendon/Wire Mechanism, Formal Methods in Robotics and Automation, Modeling, Control, and Learning for Soft Robots
Abstract: Displacement-actuated continuum robots, such as tendon/cable-actuated continuum robots, operate under constraints intrinsic to their mechanical design. Each segment is typically actuated by three or four tendons/cables/bellows in a symmetrical configuration. Traditional approaches have limited the mechanical design space to simplify adherence to actuation constraints. However, exploring designs with an arbitrary number of tendons/cables/bellows raises significant questions about managing these constraints effectively. This poster reexamines the actuation constraint in tendon-driven continuum robots, revealing unexpected links to Kirchhoff’s law and the Clarke transformation. We identify a two-dimensional manifold within the joint space and propose a novel linear transformation. This transformation, a generalization of the Clarke transformation, maps between n tendons and two local variables, effectively disentangling the constraints imposed by the tendons.
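A generalized Clarke transformation of the kind the poster describes maps n symmetrically arranged tendon displacements to two local variables. A minimal numerical sketch (the 2/n scaling convention is assumed here; the authors' exact normalization may differ):

```python
import numpy as np

def clarke_matrix(n):
    """Generalized Clarke transform for n symmetrically arranged tendons:
    maps n displacements to two local variables (alpha, beta).
    The 2/n scaling makes it invert the sinusoidal synthesis below."""
    ang = 2.0 * np.pi * np.arange(n) / n
    return (2.0 / n) * np.vstack([np.cos(ang), np.sin(ang)])

n = 5
M = clarke_matrix(n)

# Displacements on the 2-DOF manifold: q_k = a*cos(phi_k) + b*sin(phi_k)
a, b = 0.7, -0.3
ang = 2.0 * np.pi * np.arange(n) / n
q = a * np.cos(ang) + b * np.sin(ang)

alpha, beta = M @ q
assert np.allclose([alpha, beta], [a, b])   # the two local variables recover (a, b)
assert abs(q.sum()) < 1e-12                 # actuation constraint: displacements sum to zero
```

The zero-sum property of `q` is the Kirchhoff-like constraint the poster alludes to: n tendon displacements carry only two independent degrees of freedom per segment.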
|
|
16:30-18:00, Paper TuCL-EX.12 | Add to My Program |
Strain Sensor on Chip for Quantifying the Magnitudes of Tensile Stress on Cells |
|
Zhang, Yuyin | Shanghai University |
Wang, Yue | Shanghai University |
Liu, Na | Shanghai University, Shanghai, China |
Zhong, Songyi | Shanghai University |
Li, Long | Shanghai University |
Zhang, Quan | Shanghai University |
Yue, Tao | Shanghai University |
Fukuda, Toshio | Nagoya University |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: We designed a cell stretching platform with an in-situ sensor that can accurately measure the mechanical stimulation exerted on cells by the deformation of the vacuum cavity. The platform was applied to human cardiomyocytes (AC16) with cyclic strain (5%, 10%, 15%, 20%, 25%). We found that cyclic strain promoted cell growth and induced the arrangement of cells on the membrane to gradually unify in consistency, stabilizing at 15% amplitude stimulation; the effect was even stronger after 3 days of culture.
|
|
16:30-18:00, Paper TuCL-EX.13 | Add to My Program |
Multimodal Soft Optical Waveguide Sensor with Microstructured Core-Cladding Interface for Human-Robot Interaction |
|
Lee, Eunsu | Seoul National University |
Kim, Sungjin | Seoul National University |
Park, Yong-Lae | Seoul National University |
Keywords: Soft Sensors and Actuators, Soft Robot Materials and Design, Soft Robot Applications
Abstract: The increasing demand for human-robot interaction across diverse applications highlights the need for safe and reliable sensing mechanisms. Soft optical waveguide sensors, benefiting from their intrinsic compliance along with immunity to electromagnetic interference and electrical safety, present a promising solution for HRI sensing. However, current solutions face challenges in detecting bidirectional stimuli and involve high computational costs for processing multifunctional signals. To overcome these challenges, this paper introduces a multimodal soft optical waveguide sensor with a microstructured core-cladding interface. The optimal roughness for the microstructures was determined by monitoring the normalized intensity changes within the target sensing range to achieve the anisotropic response for different roughness values. Furthermore, three colored core blocks (red, green, and blue) were introduced to provide additional cues for local pressure, considering the sensor's application as an HRI interface. Leveraging the combination of these design features, the proposed sensor is capable of simultaneous measurement of various deformations—bending with directions, angles, and local pressure—all achieved with a simple thresholding algorithm at low computational cost. The poster further details the experimental characterization to assess its performance under bending and local pressure, along with the demonstration of the sensor’s application in various teleoperation tasks.
|
|
16:30-18:00, Paper TuCL-EX.14 | Add to My Program |
Steering and Shape-Locking Mechanism for Soft Growing Robots with an Accessible Working Channel |
|
Lee, Sanghun | Korea Advanced Institute of Science and Technology |
Kim, Nam Gyun | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Soft Robot Materials and Design, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Soft growing robots offer numerous advantages, including the ability to navigate through tight spaces with their inherent flexibility and grow in length irrespective of external conditions. However, achieving high-curvature steering, shape locking, and securing the inner channel simultaneously presents a challenge. In this paper, we propose a novel mechanism that enables high-curvature steering and shape locking while simultaneously securing the inner channel. This is achieved by utilizing a main vine structure with lobed structured sub-vines.
|
|
16:30-18:00, Paper TuCL-EX.15 | Add to My Program |
Practical Methods for Reducing the Computational Time of Hybrid A* Algorithm |
|
Park, JeongHyun | University of Seoul |
Sagong, Uihun | University of Seoul |
Hwang, Myun Joong | University of Seoul |
Keywords: Motion and Path Planning, Kinematics, Wheeled Robots
Abstract: The Hybrid A* algorithm is developed from the A* algorithm and creates a continuous path that a mobile robot can follow by considering the kinematics of the robot. However, as the search space grows from 2D (x, y) to 3D (x, y, θ), the search time increases significantly. Therefore, we introduce three practical methods that reduce search time, which many open-source implementations do not use. First, we changed the heuristic function algorithm and removed the case where the calculation time diverges. Second, we changed the data type used to store nodes. Third, we used a subgoal strategy to skip unnecessary node searching. By applying all three methods, the running time of Hybrid A* was greatly reduced.
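The abstract does not say which data type change the authors made; a common fix of this kind in Hybrid A* implementations is to hash discretized (x, y, θ) states into a dict for O(1) duplicate detection instead of scanning a list. A sketch under that assumption (resolutions and names are illustrative):

```python
import heapq
import math

def state_key(x, y, theta, xy_res=0.5, th_res=math.radians(15)):
    """Discretize a continuous (x, y, theta) state into a hashable key,
    so visited/open nodes live in a dict with O(1) lookup."""
    return (round(x / xy_res), round(y / xy_res),
            round(theta / th_res) % round(2 * math.pi / th_res))

open_heap = []                    # (f-cost, tie-breaker, state) priority queue
best_g = {}                       # key -> best cost-to-come found so far
counter = 0

def push(state, g, h):
    """Insert a state only if it improves on the best known cost for its cell."""
    global counter
    key = state_key(*state)
    if g < best_g.get(key, float("inf")):
        best_g[key] = g
        heapq.heappush(open_heap, (g + h, counter, state))
        counter += 1

push((0.0, 0.0, 0.0), g=0.0, h=10.0)
push((0.1, 0.1, 0.05), g=5.0, h=10.0)   # same cell with a worse g: rejected
assert len(open_heap) == 1
```

Keeping `best_g` keyed on discretized cells is what prevents the open list from ballooning with near-duplicate continuous states.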
|
|
16:30-18:00, Paper TuCL-EX.16 | Add to My Program |
Human Head Pose and Course Estimation for Robust Localization in Dynamic Environments |
|
Kang, Suhyeon | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Recognition, Localization, Human-Centered Robotics
Abstract: This paper deals with robust robot localization using human head pose and course estimation in dynamic environments. General robot localization and path planning algorithms do not consider dynamic objects, which can lead to incorrect localization. Therefore, in this paper, a robust localization method for dynamic environments is proposed that performs human head pose and course estimation and applies the resulting estimates. Finally, socially-compliant robot navigation for dynamic environments using the above localization results and learning human behavior patterns is proposed.
|
|
16:30-18:00, Paper TuCL-EX.17 | Add to My Program |
Formulation of Principles in International Humanitarian Law to Reduce Harm to Civilians Caused by Robotic Weapon Systems |
|
Tsujita, Teppei | National Defense Academy of Japan |
Sakuma, Yutaka | National Defense Academy of Japan |
Yamada, Shunsuke | National Defense Academy |
Eto, Ryosuke | National Defense Academy of Japan |
Kurosaki, Masahiro | National Defense Academy of Japan |
Keywords: Ethics and Philosophy, Robotics in Hazardous Fields
Abstract: Unfortunately, armed conflicts have always occurred, and civilians have been sacrificed in many situations. In recent armed conflicts, the use of new weapons based on robotic technology, such as unmanned aerial vehicles, called robotic weapon systems (RWS), has increased. It is necessary to consider how to regulate these new weapons. International humanitarian law (IHL) has been established to regulate warfare and to reduce the damage as much as possible. This study examines a system that advises RWS operators to comply with IHL to reduce the harm to civilians caused by teleoperated RWS attacks. Based on the images obtained by the RWS, the system distinguishes between civilians and combatants and provides advice on prohibiting attacks on civilians or avoiding excessive damage to civilians. This paper deals with a situation in which a country's military forces use RWSs to guard an area with civilian traffic around their base, to prevent combatants of the other country from entering the base. Formulations of the principles regulating warfare in such a situation were proposed. Interpreting the IHL articles, we calculated an index of the certainty that a subject is a combatant and the degree of effort made to obtain this certainty, based on the confusion matrix of a classifier distinguishing combatants from civilians. Numerical simulations were conducted to evaluate the civilian damage that could occur under the assumed conditions, and the validity of the proposed formulas was confirmed.
|
|
16:30-18:00, Paper TuCL-EX.18 | Add to My Program |
Target Position Regression from Navigation Instructions |
|
Hosomi, Naoki | Keio University |
Iioka, Yui | Keio University |
Hatanaka, Shumpei | Keio University |
Misu, Teruhisa | Honda Research Institute USA, Inc |
Yamada, Kentaro | Honda R&D Co., Ltd |
Sugiura, Komei | Keio University |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception
Abstract: In this study, we develop a model aimed at controlling vehicles through user-friendly interactions. Specifically, we focus on regressing destinations from front camera images and instructions for navigation. For example, given the instruction “Pull over next to that traffic cone,” the model should predict an appropriate stopping point next to the traffic cone. This task is challenging because a destination is an arbitrary position on a road, which lacks distinct visual features for identification. Therefore, it is important to identify a landmark mentioned in the instruction and localize a destination relative to the identified landmark. In this study, we propose a novel approach, Target Position Regressor, for predicting destinations. We introduce Absolute-Relative Position loss, which explicitly addresses relative positional relationships between destinations and landmarks. We also introduce Target Position Localizer, which models relationships between destinations and multimodal features. We constructed a new dataset for this task that addresses the scarcity of datasets with annotated destinations on front camera images. We validated our model on the dataset, and it outperformed a baseline method in terms of Root Mean Squared Error (RMSE), a metric commonly used to evaluate regression tasks.
|
|
16:30-18:00, Paper TuCL-EX.19 | Add to My Program |
FFHFlow: A Flow-Based Variational Approach for Dexterous Grasp Synthesis in Real Time |
|
Feng, Qian | Technical University of Munich |
Feng, Jianxiang | Technical University of Munich (TUM) |
Chen, Zhaopeng | University of Hamburg |
Triebel, Rudolph | German Aerospace Center (DLR) |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Deep Learning in Grasping and Manipulation, Grasping, Probabilistic Inference
Abstract: Synthesizing diverse and accurate grasps with multi-fingered hands is an important but challenging task in robotics. Previous efforts focusing on generative modeling fall short of capturing the multi-modal, high-dimensional grasp distribution precisely. To address this, we propose to exploit a special kind of Deep Generative Model (DGM) based on Normalizing Flows (NFs). In contrast to the Variational Autoencoder (VAE), another often-used DGM, this counteracts typical pitfalls such as mode collapse and mis-specified priors. Specifically, we develop two novel grasp samplers given partially observed point clouds for a Five-Finger Hand (FFH) based on NFs: FFHFlow-cnf and FFHFlow-lvm. The first is based on a single Conditional Normalizing Flow (cNF), which can generate diverse grasps but generalizes less well to test objects due to a deficient latent representation. The second employs a novel flow-based Deep Latent Variable Model (DLVM) to mitigate this problem. Comprehensive experiments in simulation and on hardware showcase the benefits of our models through quantitative and qualitative results, demonstrating both better quality and diversity in grasp generation against the VAE baseline. Additionally, a run-time comparison is conducted along with the ablation study, revealing the high potential of our proposed models for real-time applications.
|
|
16:30-18:00, Paper TuCL-EX.20 | Add to My Program |
A Non-Linear Model Predictive Task-Space Controller Satisfying Shape Constraints for Tendon-Driven Continuum Robots |
|
Hachen, Maximilian | University of Toronto |
Shentu, Chengnan | University of Toronto |
Lilge, Sven | University of Toronto Mississauga |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Medical Robots and Systems
Abstract: Tendon-Driven Continuum Robots (TDCRs) have the potential to be used in applications such as minimally invasive surgery or industrial inspection tasks in hard to reach places. In these cases, the robot usually has to enter narrow and confined spaces for which control and path planning techniques must be proposed. A suitable controller must be able to handle the non-linear kinematics and potential kinematic redundancy of TDCRs. In this work, a Model Predictive Control (MPC) approach is proposed to solve this challenge. Combined with a local controller to compensate for disturbances, the proposed algorithm is tested in simulation and on a TDCR prototype. The results show that our MPC implementation is fast enough to control a TDCR at a rate of 30 Hz, ensuring both closed loop stability and convergence across all tested configurations, while also avoiding collisions between the robot’s geometry and its surrounding environment.
|
|
16:30-18:00, Paper TuCL-EX.21 | Add to My Program |
Self-Sensing Joints for Microrobots |
|
Gu, Panlong | Zhejiang University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: In current research on microrobots, with the help of high-precision transmission systems and actuators, robots can be controlled with considerable precision. Nevertheless, due to the lack of sensors at the relevant scale, such robots cannot perceive the angle of key joints, compromising the controllability of the robot's transmission process. This paper designs a self-sensing joint that can be integrated into micro-robotic mechanisms. The joint utilizes cracks formed on a carbon nanotube film coated on the joint to perceive the bending angle. Additionally, a data augmentation method specifically tailored for this sensor is developed, enabling angle sensing with an MSE (mean squared error) of 0.15 degrees. Compared to existing solutions, this approach significantly enhances both the precision and repeatability of the sensors.
|
|
16:30-18:00, Paper TuCL-EX.22 | Add to My Program |
Can Your SLAM Survive under Perturbations? Customizable Perturbation Synthesis for Robust SLAM Benchmarking |
|
Xu, Xiaohao | University of Michigan |
Zhang, Tianyi | Carnegie Mellon University |
Wang, Sibo | University of Michigan |
Chen, Yongqi | University of Michigan |
Li, Ye | University of Michigan |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Huang, Xiaonan | University of Michigan |
Keywords: SLAM, Data Sets for SLAM, RGB-D Perception
Abstract: Robustness is a crucial factor for the successful deployment of robots in unstructured environments, particularly in the domain of Simultaneous Localization and Mapping (SLAM). Simulation-based benchmarks have emerged as a highly scalable approach for robustness evaluation compared to real-world data collection. However, crafting a challenging and controllable noisy world with diverse perturbations remains relatively under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. This pipeline incorporates customizable hardware setups, software components, and perturbed environments. In particular, we introduce comprehensive perturbation taxonomy along with a perturbation composition toolbox, allowing the transformation of clean simulations into challenging noisy environments. Utilizing the pipeline, we instantiate the Robust-SLAM benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced multi-modal SLAM models. Our extensive analysis uncovers the susceptibilities of existing SLAM models to real-world disturbance, despite their demonstrated accuracy in standard benchmarks. Our perturbation synthesis toolbox, SLAM robustness evaluation pipeline, and Robust-SLAM benchmark will be made publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation/.
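The abstract describes a perturbation composition toolbox that turns clean simulated frames into challenging noisy ones. A minimal sketch of what composable image perturbations can look like (the perturbation functions and parameters here are illustrative, not the authors' toolbox):

```python
import numpy as np

def gaussian_noise(sigma):
    """Additive sensor noise on a [0, 1] grayscale image."""
    return lambda img, rng: np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def pixel_dropout(p):
    """Randomly zero out a fraction p of pixels (e.g. dead sensor cells)."""
    return lambda img, rng: img * (rng.random(img.shape) >= p)

def compose(*perturbations):
    """Chain individual perturbations into one pipeline, mirroring the idea
    of composing a perturbation taxonomy over clean simulation output."""
    def pipeline(img, rng):
        for p in perturbations:
            img = p(img, rng)
        return img
    return pipeline

rng = np.random.default_rng(42)
clean = np.full((8, 8), 0.5)                  # stand-in for a clean rendered frame
noisy = compose(gaussian_noise(0.05), pixel_dropout(0.1))(clean, rng)
assert noisy.shape == clean.shape
assert noisy.min() >= 0.0 and noisy.max() <= 1.0
```

Because each perturbation is a plain callable, building "customizable noisy worlds" reduces to picking and ordering entries from a library of such functions.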
|
|
16:30-18:00, Paper TuCL-EX.23 | Add to My Program |
An Embedded Driving Style Recognition Approach: Leveraging Knowledge in Learning |
|
Zhang, Chaopeng | Beijing Institute of Technology |
Wang, Wenshuo | McGill University |
Ju, Zhiyang | Beijing Institute of Technology |
Chen, Zhaokun | Beijing Institute of Technology |
Venture, Gentiane | The University of Tokyo |
Xi, Junqiang | Beijing Institute of Technology |
Keywords: Human-Centered Automation, Human-Robot Collaboration, Design and Human Factors
Abstract: Online driving style recognition can enhance the customization of human-centric driving systems, thereby improving comfort, safety, and fuel economy. However, the limited performance of automotive-grade chips makes it highly challenging to compile and run complicated algorithms in real time. To overcome this bottleneck, this paper proposes an embedded method for recognizing driving styles, which is computationally efficient. This approach leverages experts' prior knowledge in learning algorithms and applies it to the electronic control unit (ECU) characterized by limited RAM. More specifically, the approach integrates knowledge-based rules and learning-based rules. The design of knowledge-based rules relies on the correlation between driving styles and vehicle dynamics. Learning-based rules are established as explicit hyperplanes extracted through hierarchical clustering and support vector machine analysis of the naturalistic driving behaviors exhibited by 100 drivers. These knowledge- and learning-based rules are then integrated into an embedded driving style recognition model. The resulting model is compiled into an executable file that operates within the vehicle's onboard ECU. The proposed method is validated through real vehicle testing in naturalistic driving settings, demonstrating a remarkable 94.4% level of subjective-objective consistency.
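The abstract describes integrating knowledge-based rules with learned rules reduced to explicit SVM hyperplanes, cheap enough for a RAM-limited ECU. A hedged sketch of such a rule cascade (all feature names, weights, and thresholds below are invented for illustration, not the paper's calibrated values):

```python
def recognize_style(mean_accel, accel_std, speed_var, w=(1.2, 2.0, 0.8), b=-1.5):
    """Two-stage driving style recognition:
    1) knowledge-based rule on vehicle dynamics (expert threshold);
    2) learning-based rule as an explicit hyperplane check w.x + b > 0,
       i.e. the sign of a pre-trained linear SVM decision function.
    Only a few multiply-adds at runtime, so no ML runtime is needed on the ECU."""
    # knowledge-based rule: very gentle dynamics short-circuit the classifier
    if mean_accel < 0.5 and accel_std < 0.2:
        return "mild"
    # learning-based rule: hyperplane extracted offline from driver data
    score = w[0] * mean_accel + w[1] * accel_std + w[2] * speed_var + b
    return "aggressive" if score > 0 else "moderate"

assert recognize_style(0.3, 0.1, 0.5) == "mild"
assert recognize_style(2.0, 1.0, 1.0) == "aggressive"
```

Compiling such closed-form rules to an executable is what lets the recognizer run in real time on an automotive-grade chip with limited RAM.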
|
|
16:30-18:00, Paper TuCL-EX.24 | Add to My Program |
Remote Position Control of Hydraulically Driven Actuator Via Water Transmission in a Thin and Long Tube |
|
Yoshimura, Shuto | The University of Electro-Communications |
Nakamura, Yuki | The University of Electro-Communications |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Nakata, Yoshihiro | The University of Electro-Communications |
Keywords: Hydraulic/Pneumatic Actuators, Motion Control, Sensor-based Control
Abstract: Using remotely operated robots is effective in environments where high radiation, dust prevalence, or underwater conditions prevent human access. However, protecting the electronic equipment on these robots introduces challenges, including increased costs and design constraints. To solve this problem, we have been studying robots without onboard electronics that use fluidic actuation, driving them through tubes from a remote location. By employing water, an incompressible fluid with low kinematic viscosity, as the working fluid, we can use thin, long, and flexible tubes, minimizing the physical impact on robot operation. We propose a remote position control method for water hydraulically driven actuators that accounts for the transmission line. We developed an experimental setup using a tube with an inner diameter of 2.5 mm and a length of 20 m and validated the effectiveness of our method through experiments. Our method reduced positional errors by 90%.
|
|
16:30-18:00, Paper TuCL-EX.25 | Add to My Program |
Cooperative Object Map Building for Multi-Robot Systems |
|
Jang, Insik | Kumoh National Institute of Technology |
Lee, Heoncheol | Kumoh National Institute of Technology |
Keywords: Multi-Robot Systems, Cooperating Robots, Swarm Robotics
Abstract: Cooperative multi-robot systems are more efficient, flexible, and fault-tolerant than single robots. To realize these advantages, environmental information must be integrated. In this research, multiple robots exchange collected environmental information over a mesh network and then efficiently integrate it using the ICP (Iterative Closest Point) and NN (Nearest Neighbor) algorithms. The proposed method avoids repeated iterations of the ICP algorithm and overcomes the inevitable mismatches between estimated object positions caused by sensor errors.
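The nearest-neighbor integration step can be sketched as follows. This is an illustrative toy, not the authors' exact fusion rule: the matching tolerance and the simple position-averaging are assumptions, and map_a is assumed non-empty.

```python
# Hedged sketch: fuse two robots' object maps by nearest-neighbor matching.
# A point in map_b within `tol` of an existing object is treated as the same
# object (positions averaged); otherwise it is added as a new object.
import math

def merge_object_maps(map_a, map_b, tol=0.5):
    merged = [list(p) for p in map_a]  # start from robot A's map
    for q in map_b:
        dists = [math.dist(p, q) for p in merged]
        i = min(range(len(dists)), key=dists.__getitem__)  # nearest neighbor
        if dists[i] <= tol:
            # Same object seen by both robots: average the two estimates.
            merged[i] = [(a + b) / 2 for a, b in zip(merged[i], q)]
        else:
            merged.append(list(q))     # object only robot B has seen
    return merged

a = [(0.0, 0.0), (2.0, 2.0)]
b = [(0.1, 0.1), (5.0, 5.0)]
print(merge_object_maps(a, b))  # 3 objects: one fused, two distinct
```

The tolerance absorbs small sensor-error mismatches between the two robots' position estimates without any iterative alignment.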
|
|
16:30-18:00, Paper TuCL-EX.26 | Add to My Program |
Adaptive Short-Range Focal Sweep: Continuous Autofocus for Magnified Object Tracking Based on High-Speed Vision |
|
Zhang, Tianyi | Chiba University |
Shimasaki, Kohei | Hiroshima University |
Ishii, Idaku | Hiroshima University |
Namiki, Akio | Chiba University |
Keywords: Visual Servoing, Sensor-based Control, Visual Tracking
Abstract: This poster introduces a novel continuous autofocus (C-AF) approach for magnified object tracking. It uses a high-speed camera and a focus-tunable liquid lens to perform focal sweeps around the object's focus position. During each focal sweep, multiple images with different focus positions are captured; a focus measure algorithm then identifies and outputs the image with the highest focus value. What distinguishes our C-AF approach is that it adjusts the focal sweep range adaptively, guided by the object depth obtained through the depth-from-focus (DFF) technique. The approach can therefore continuously obtain well-focused images of the object with less redundancy in the input images. In our experimental setup, the high-speed camera captures images at 500 fps, while the liquid lens performs focal sweeps at 50 Hz, so 10 input images are captured during each sweep. The experimental results demonstrate the effectiveness of our approach in achieving well-focused images at a stable frame rate of 50 fps. This capability holds significant promise for applications in magnified object tracking.
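The sweep-selection step can be sketched as below. Gradient-based focus measures (e.g. Tenengrad or variance of Laplacian) are standard choices; here a sum of squared differences on toy 1-D "images" stands in for a real 2-D measure, and the sweep data are hypothetical.

```python
# Hedged sketch: among images captured at different focus positions during
# one sweep, keep the one that maximizes a focus (sharpness) measure.

def focus_measure(img):
    """Toy 1-D gradient energy: sharper images have stronger gradients."""
    return sum((b - a) ** 2 for a, b in zip(img, img[1:]))

def best_of_sweep(sweep):
    """sweep: list of (focus_position, image); returns the sharpest pair."""
    return max(sweep, key=lambda fp_img: focus_measure(fp_img[1]))

sweep = [
    (0.0, [10, 10, 10, 10]),   # defocused: flat intensities
    (0.5, [10, 40, 5, 30]),    # in focus: high contrast
    (1.0, [10, 15, 10, 15]),   # slightly defocused
]
pos, img = best_of_sweep(sweep)
print(pos)  # focus position with the highest measure
```

In the adaptive scheme described above, the winning focus position would also serve as the DFF depth estimate that re-centers (and narrows) the next sweep.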
|
|
TuE-EX Expo Session, Exhibition Hall |
Add to My Program |
ICRA EXPO Day 1 |
|
|
Chair: Ravankar, Ankit A. | Tohoku University |
Co-Chair: Salazar Luces, Jose Victorio | Tohoku University |
|
13:30-17:00, Paper TuE-EX.1 | Add to My Program |
A Multimodal Two-Wheeled Robot with Mode-Switching Motions under Active Balancing Control |
|
Sun, Botian | Peking University |
Lang, Qinglin | Peking University |
Li, Minghe | Peking University |
Wang, Xuefeng | Peking University |
|
13:30-17:00, Paper TuE-EX.2 | Add to My Program |
Asymmetric Stiffness Control for Contact-Rich Task |
|
Kato, Yasuhiro | Saitama University |
Tsuji, Toshiaki | Saitama University |
|
13:30-17:00, Paper TuE-EX.3 | Add to My Program |
Development of a Mobile Robotic System for Remote Autonomous Inspection and Digitizing of Industrial Plants with Critical Infrastructure |
|
Cuellar, Francisco | Pontificia Universidad Catolica del Peru |
Cabrera Yi, Eduardo Augusto | Pontificia Universidad Católica del Perú |
Jara Rios, Jose Alonso | Pontificia Universidad Católica del Perú |
Leiva, Martin | PUCP |
|
13:30-17:00, Paper TuE-EX.4 | Add to My Program |
Autonomous Robotic Assembly |
|
Ota, Kei | Tokyo Institute of Technology |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Jain, Siddarth | Mitsubishi Electric Research Laboratories (MERL) |
Yerazunis, William | Mitsubishi Electric Research Laboratory |
Corcodel, Radu | Mitsubishi Electric Research Laboratories |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
|
13:30-17:00, Paper TuE-EX.5 | Add to My Program |
CyberRunner: An Inexpensive Research and Education Robotics Platform |
|
Ramachandran Venkatapathy, Aswin Karthik | ETH Zürich |
Bi, Thomas | ETH Zurich |
D'Andrea, Raffaello | ETHZ |
|
13:30-17:00, Paper TuE-EX.6 | Add to My Program |
Leveraging Robot Swarms for Biodiversity Monitoring and Conservation |
|
Notomista, Gennaro | University of Waterloo |
Mayya, Siddharth | Amazon Robotics |
|
13:30-17:00, Paper TuE-EX.7 | Add to My Program |
Design and Execution of Expressive Arm Motions with a 6 DOF Arm and Blender |
|
Mercader, Alexandra Léna Victoria | École de Technologie Supérieure de Montréal |
Rochette, Audrey | Universite du Quebec a Montreal |
Dussault, Geneviève | UQAM |
St-Onge, David | Ecole de Technologie Superieure |
|
13:30-17:00, Paper TuE-EX.8 | Add to My Program |
An EXPO Demo of CloudGripper - an Open Cloud Robotics Testbed |
|
Zahid, Muhammad | KTH Royal Institute of Technology |
Pokorny, Florian T. | KTH Royal Institute of Technology |
|
13:30-17:00, Paper TuE-EX.9 | Add to My Program |
Demonstration of HASHI: Highly Adaptable Seafood Handling Instrument for Manipulation in Industrial Settings |
|
Allison, Austin | Northeastern University |
Hanson, Nathaniel | Massachusetts Institute of Technology |
Wicke, Sebastian | Northeastern University |
Padir, Taskin | Northeastern University |
|
13:30-17:00, Paper TuE-EX.10 | Add to My Program |
Stereohaptic Vibration: Out-Of-Body Localization of Virtual Vibration Source through Multiple Vibrotactile Stimuli on the Forearms |
|
Ohara, Gen | Tohoku University |
Kikuchi, Daiki | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-17:00, Paper TuE-EX.11 | Add to My Program |
RENKEI: Connecting Sleep and Assistive Robotics for Elderly Care |
|
Breuss, Alexander | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Manríquez-Cisterna, Ricardo | Tohoku University |
Gnarra, Oriella | ETH Zurich |
Fujs, Manuel | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Peña Queralta, Jorge | ETH Zürich |
Ejtehadi, Mehdi | ETH Zürich |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Paez Granados, Diego | ETH Zurich |
Hirata, Yasuhisa | Tohoku University |
Riener, Robert | Eidgenössische Technische Hochschule (ETH) Zürich |
|
13:30-17:00, Paper TuE-EX.12 | Add to My Program |
Supernumerary Robotic Limbs Integrated Onto Next Generation Space Suit Technology |
|
Ballesteros, Erik | Massachusetts Institute of Technology |
Lee, Sang-Yoep | Seoul National University |
Carpenter, Kalind | Jet Propulsion Laboratory |
Asada, Harry | MIT |
|
13:30-17:00, Paper TuE-EX.13 | Add to My Program |
Prototype of Adaptive Touch Walking Support Robot for Maximizing Human Physical Potential |
|
Terayama, Junya | Tohoku University |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
|
13:30-17:00, Paper TuE-EX.14 | Add to My Program |
Learning Agile Bipedal Motions on a Quadrupedal Robot |
|
Li, Yunfei | Tsinghua University |
Li, Jinhan | Tsinghua University |
Fu, Wei | Tsinghua University |
Wang, Chenghao | Technical University of Munich |
Wu, Yi | Tsinghua University |
|
13:30-17:00, Paper TuE-EX.15 | Add to My Program |
Inertial-Only Positioning for Human and Car Localization |
|
Wang, Han | Nanyang Technological University, Singapore |
Yuan, Shenghai | Nanyang Technological University |
Liu, Fen | Nanyang Technological University |
|
13:30-17:00, Paper TuE-EX.16 | Add to My Program |
Live Demonstration of Ringbot: Monocycle Robot with Legs |
|
Gim, Kevin | University of Illinois, Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
|
13:30-17:00, Paper TuE-EX.17 | Add to My Program |
Office Robot with Health Sensing Functions |
|
Yamamoto, Hana | Tohoku University |
Hanada, Hiroyasu | Tohoku University |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Motoe, Masashige | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Ohta, Koichi | Mitsui Fudosan Co., Ltd |
Nishida, Minoru | Mitsui Fudosan Co., Ltd |
Maekawa, Makiyo | Mitsui Fudosan Co., Ltd |
|
13:30-17:00, Paper TuE-EX.18 | Add to My Program |
Demonstration of Real-Time Shape Estimation of Elastic Rods Based on Force Sensing |
|
Tokuyama, Terumi | University of Tsukuba |
Mochiyama, Hiromi | University of Tsukuba |
|
13:30-17:00, Paper TuE-EX.19 | Add to My Program |
Learning to Estimate Incipient Slip with Tactile Sensing to Gently Grasp Objects |
|
Willemet, Laurence | TU Delft |
Vitrani, Giuseppe | Delft University of Technology (TU Delft) |
Boonstra, Dirk-Jan | Delft University of Technology |
Wiertlewski, Michael | TU Delft |
|
13:30-17:00, Paper TuE-EX.20 | Add to My Program |
Touch and Tech: A Live Demonstration of Interactive Prosthesis Technology |
|
Herneth, Christopher | Technical University Munich |
Fatoni, Muhammad Hilman | Munich Institute of Robotics and Machine Intelligence, Technical University of Munich |
Ganguly, Amartya | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
|
13:30-17:00, Paper TuE-EX.21 | Add to My Program |
Active Industrial Exoskeletons to Address Risk of Workplace Musculoskeletal Disorders |
|
Narayan, Ashwin | National University of Singapore |
Ofori, Seyram | National University of Singapore |
Bhattacharya, Shounak | National university of Singapore |
Sia, Cindy Ching Li | National University of Singapore |
Han, Shuaishuai | National University of Singapore |
Yu, Haoyong | National University of Singapore |
|
13:30-17:00, Paper TuE-EX.22 | Add to My Program |
Haptics-Net: Haptics-Mediated Swarm Rehabilitation Robotic System |
|
Sun, Chenyang | Southern University of Science and Technology |
Liu, Yudong | Southern University of Science and Technology |
Zhang, Mingming | Southern University of Science and Technology |
|
13:30-17:00, Paper TuE-EX.23 | Add to My Program |
Deep Reinforcement Learning Controller for a Furuta Pendulum |
|
Stępień, Maciej | AGH University of Krakow |
|
13:30-17:00, Paper TuE-EX.24 | Add to My Program |
Experience System of Physical Skills with a Collaborative Avatar Robot |
|
Nishimura, Takumi | Nagoya Institute of Technology |
Yukawa, Hikari | Nagoya Institute of Technology |
Minamizawa, Kouta | Keio University |
Tanaka, Yoshihiro | Nagoya Institute of Technology |
|
13:30-17:00, Paper TuE-EX.25 | Add to My Program |
ICRA Demonstration: Helix Soft Manipulator |
|
Stella, Francesco | EPFL |
Della Santina, Cosimo | TU Delft |
Hughes, Josie | EPFL |
|
13:30-17:00, Paper TuE-EX.26 | Add to My Program |
Johnbot - Swarm Dynamics of Simple Robots with Inherent Inhomogeneity |
|
Smith, John | University of Tokyo |
Ikegami, Takashi | University of Tokyo |
|
13:30-17:00, Paper TuE-EX.27 | Add to My Program |
Interactive Simulation of Dexterous Manipulation with Hand Tracking: Robot Hand Bolting and Multi-User Collaborative Simulation in VR |
|
Choi, Hyelim | Seoul National University |
Lee, Youngseon | Seoul National University |
Ji, Harim | Seoul National University |
Kim, Hyunsu | Seoul National University |
Heo, Jinuk | Seoul National University |
Park, Hyunreal | Seoul National University |
Lee, Somang | Seoul National University |
Lee, Minji | Seoul National University |
Lee, Jeongmin | Seoul National University |
Lee, Dongjun | Seoul National University |
| |