Last updated on May 3, 2024. This conference program is tentative and subject to change.
Technical Program for Thursday, May 16, 2024
ThAT1-CC Oral Session, CC-303
Motion Planning I
Co-Chair: Qureshi, Ahmed H. | Purdue University |
10:30-12:00, Paper ThAT1-CC.1
Solving Rearrangement Puzzles Using Path Defragmentation in Factored State Spaces
Bayraktar, Servet Bora | TU Berlin |
Orthey, Andreas | Realtime Robotics Inc |
Kingston, Zachary | Rice University |
Toussaint, Marc | TU Berlin |
Kavraki, Lydia | Rice University |
Keywords: Constrained Motion Planning, Task and Motion Planning, Task Planning
Abstract: Rearrangement puzzles are variations of rearrangement problems in which the elements of a problem are potentially logically linked together. To efficiently solve such puzzles, we develop a motion planning approach based on a new state space that is logically factored, integrating the capabilities of the robot through factors of simultaneously manipulatable joints of an object. Based on this factored state space, we propose less-actions RRT (LA-RRT), a planner which optimizes for a low number of actions to solve a puzzle. At the core of our approach lies a new path defragmentation method, which rearranges and optimizes consecutive edges to minimize action cost. We solve six rearrangement scenarios with a Fetch robot, involving planar table puzzles and an escape room scenario. LA-RRT significantly outperforms the next best asymptotically-optimal planner, improving final action cost by a factor of 4.01 to 6.58.
10:30-12:00, Paper ThAT1-CC.2
Leveraging Opportunism in Sample-Based Motion Planning
Lanighan, Michael | TRACLabs, Inc |
Youngquist, Oscar | University of Massachusetts Amherst |
Keywords: Motion and Path Planning
Abstract: Sample-based motion planning approaches, such as RRT*, have been widely adopted in robotics due to their support for high-dimensional state spaces and guarantees of completeness and optimality. This paper introduces an RRT* approach (ORRT*) that leverages opportunism to (1) find solutions quickly, (2) reduce wasted compute, and (3) improve data efficiency. The key insight of the approach is to make the most of compute when expanding the search tree by adding the last viable configurations found when connecting new nodes rather than rejecting the sampled nodes outright, allowing for more productive exploration of the space. We evaluate the proposed approach in a set of mobility and manipulator postural control domains, contrasting the performance of the opportunistic approach with state-of-the-art RRT* variants. Our analysis shows that such an approach has desirable characteristics and warrants further exploration.
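The opportunistic rule is easiest to see against a standard RRT-style extend step: rather than discarding a sample whose connection to the tree hits an obstacle, the planner keeps the last collision-free configuration found along the attempted connection. A minimal Python sketch under that reading follows; the function and predicate names are illustrative, not from the paper.

import numpy as np

def extend_opportunistic(tree, q_rand, step, collision_free):
    """Extend the tree toward q_rand; instead of rejecting a blocked
    sample outright, keep the last viable configuration found."""
    q_near = min(tree, key=lambda q: np.linalg.norm(q - q_rand))
    direction = q_rand - q_near
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return None
    direction /= dist
    q_last = None  # last collision-free configuration along the segment
    for i in range(1, int(np.ceil(dist / step)) + 1):
        q = q_near + direction * min(i * step, dist)
        if not collision_free(q):
            break  # blocked: stop here, but do not discard partial progress
        q_last = q
    if q_last is not None:
        tree.append(q_last)  # opportunistic node addition
    return q_last

In an RRT* variant, the appended node would still be wired through its lowest-cost neighbor and trigger the usual rewiring; only the acceptance rule sketched above changes.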
10:30-12:00, Paper ThAT1-CC.3
SE(2) Assembly Planning for Magnetic Modular Cubes
Keune, Kjell | Technische Universität Braunschweig |
Becker, Aaron | University of Houston |
Keywords: Constrained Motion Planning, Assembly, Path Planning for Multiple Mobile Robots or Agents
Abstract: Magnetic modular cubes are cube-shaped bodies with embedded permanent magnets. The cubes are uniformly controlled by a global time-varying magnetic field. A 2D physics simulator is used to simulate global control and the resulting continuous movement of magnetic modular cube structures. We develop local plans: closed-loop control algorithms for connecting two structures at desired faces. The global planner generates a building instruction graph for a target structure that we traverse in a depth-first-search approach by repeatedly applying local plans. We analyze how structure size and shape affect planning time. The planner solves 80% of the randomly created instances with up to 12 cubes in an average time of about 200 seconds.
10:30-12:00, Paper ThAT1-CC.4
A Constrained Path Following Method for Snake-Like Manipulators Via Controlled Winding Uncoiling Strategy
Luo, Mingrui | Institute of Automation, Chinese Academy of Sciences |
Tian, Yunong | Institute of Automation, Chinese Academy of Sciences |
Cao, Yinghua | Institute of Automation, Chinese Academy of Sciences
Chen, Minghao | Institute of Automation, Chinese Academy of Sciences |
Zhang, Yanfeng | Institute of Automation, Chinese Academy of Sciences |
Li, En | Institute of Automation, Chinese Academy of Sciences |
Tan, Min | Institute of Automation, Chinese Academy of Sciences |
Keywords: Constrained Motion Planning, Redundant Robots, Motion and Path Planning
Abstract: Benefiting from its hyper-redundant structure, the biomimetic snake-like manipulator retains its remarkable flexibility even within confined spaces. However, its motion planning and control pose significant challenges. This paper imitates the winding uncoiling behavior of snakes to achieve controllable constrained path following. Firstly, based on control points, a recursive computational model and an equivalent planning angle model are established, enabling efficient and analytical determination of joint positions, collision regions, and motion parameters during path following. Subsequently, the sliding control point algorithm and the motion smoothing restriction algorithm are designed. The former ensures that the remaining segments strictly stay within the collision-free regions defined by the base and path controls, while the latter smooths the control parameters based on velocity and acceleration limitations. Finally, simulation and practical experiments demonstrate the feasibility of the proposed methods. A prototype applying our method can reach targets and accomplish tasks, further validating the applicability of the snake-like manipulator.
10:30-12:00, Paper ThAT1-CC.5
Multi-Profile Quadratic Programming (MPQP) for Optimal Gap Selection and Speed Planning of Autonomous Driving
Anon, Alexandre Miranda | Honda Research Institute USA |
Bae, Sangjae | Honda Research Institute, USA |
Saroya, Manish | Honda Research Institute USA, Inc |
Isele, David | University of Pennsylvania, Honda Research Institute USA |
Keywords: Constrained Motion Planning, Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: Smooth and safe speed planning is imperative for the successful deployment of autonomous vehicles. This paper presents a mathematical formulation for the optimal speed planning of autonomous driving, which has been validated in high-fidelity simulations and real-road demonstrations with practical constraints. The algorithm explores the inter-traffic gaps in the time and space domain using a breadth-first search. For each gap, quadratic programming finds an optimal speed profile, synchronizing the time and space pair along with dynamic obstacles. Qualitative and quantitative analyses in CARLA are reported, discussing the smoothness and robustness of the proposed algorithm. Finally, we present a road demonstration result for urban city driving.
10:30-12:00, Paper ThAT1-CC.6
IKLink: End-Effector Trajectory Tracking with Minimal Reconfigurations
Wang, Yeping | University of Wisconsin-Madison |
Sifferman, Carter | University of Wisconsin-Madison |
Gleicher, Michael | University of Wisconsin - Madison |
Keywords: Constrained Motion Planning, Motion and Path Planning
Abstract: Many applications require a robot to accurately track reference end-effector trajectories. Certain trajectories may not be tracked as single, continuous paths due to the robot's kinematic constraints or obstacles elsewhere in the environment. In this situation, it becomes necessary to divide the trajectory into shorter segments. Each such division introduces a reconfiguration, in which the robot deviates from the reference trajectory, repositions itself in configuration space, and then resumes task execution. The occurrence of reconfigurations should be minimized because they increase time and energy usage. In this paper, we present IKLink, a method for finding joint motions to track reference end-effector trajectories while executing the minimum number of reconfigurations. Our graph-based method generates a diverse set of Inverse Kinematics (IK) solutions for every waypoint on the reference trajectory and utilizes a dynamic programming algorithm to find the optimal motion by linking the IK solutions. We demonstrate the effectiveness of IKLink through a simulation experiment and an illustrative demonstration using a physical robot.
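The linking step is a shortest-path dynamic program over a layered graph: one layer of IK solutions per waypoint, with a transition counted as a reconfiguration when it exceeds a joint-space step limit. A minimal sketch of such a DP, assuming the per-waypoint IK sets are already computed; the threshold test and lexicographic cost are simplifications, not the paper's exact criteria.

import numpy as np

def link_ik_solutions(ik_sets, max_step):
    """Pick one IK solution per waypoint, minimizing first the number of
    reconfigurations (jumps larger than max_step), then joint distance."""
    T = len(ik_sets)
    cost = [[(0, 0.0)] * len(ik_sets[0])]   # (n_reconfig, joint_dist) per node
    parent = [[None] * len(ik_sets[0])]
    for t in range(1, T):
        row_cost, row_parent = [], []
        for q in ik_sets[t]:
            best, best_i = None, None
            for i, p in enumerate(ik_sets[t - 1]):
                d = float(np.linalg.norm(q - p))
                r, s = cost[t - 1][i]
                c = (r + (d > max_step), s + d)  # lexicographic cost
                if best is None or c < best:
                    best, best_i = c, i
            row_cost.append(best)
            row_parent.append(best_i)
        cost.append(row_cost)
        parent.append(row_parent)
    j = min(range(len(ik_sets[-1])), key=lambda k: cost[-1][k])
    motion = [ik_sets[-1][j]]
    for t in range(T - 1, 0, -1):            # backtrack through the layers
        j = parent[t][j]
        motion.append(ik_sets[t - 1][j])
    return motion[::-1]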
10:30-12:00, Paper ThAT1-CC.7
A Control Barrier Function-Based Motion Planning Scheme for a Quadruped Robot
Unlu, Halil Utku | New York University |
Gonçalves, Vinicius Mariano | New York University Abu Dhabi, United Arab Emirates |
Chaikalis, Dimitris | New York University |
Tzes, Anthony | New York University Abu Dhabi |
Khorrami, Farshad | New York University Tandon School of Engineering |
Keywords: Motion and Path Planning, Autonomous Vehicle Navigation, Legged Robots
Abstract: A Control Barrier Function (CBF)-based motion planning algorithm is proposed. The algorithm explores an unknown environment to reach a target point, providing velocity commands to the robot controller module. CBFs, along with a circulation inequality, are used to generate safe paths toward the goal while preventing collisions with obstacles. The proposed global navigation scheme is experimentally verified on a quadruped platform to demonstrate safe, collision-free exploration over long distances.
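For context, the core of a CBF-based scheme is a safety filter on the commanded velocity: the nominal command is minimally modified so that a barrier function h(x) >= 0 (positive in free space) satisfies the CBF condition. With single-integrator dynamics and a single constraint, the quadratic program has a closed form. A minimal sketch; the paper's circulation inequality, which steers the robot around obstacles rather than merely stopping it, is deliberately omitted here.

import numpy as np

def cbf_safety_filter(u_des, h, grad_h, alpha=1.0):
    """Solve min ||u - u_des||^2 s.t. grad_h . u >= -alpha * h,
    the standard CBF condition for single-integrator dynamics."""
    g = np.asarray(grad_h, dtype=float)
    slack = g @ u_des + alpha * h
    if slack >= 0:
        return u_des  # nominal command already satisfies the CBF condition
    # otherwise, project onto the constraint boundary grad_h . u = -alpha * h
    return u_des - (slack / (g @ g)) * g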
10:30-12:00, Paper ThAT1-CC.8
Physics-Informed Neural Motion Planning on Constraint Manifolds
Ni, Ruiqi | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Constrained Motion Planning, Deep Learning Methods
Abstract: Constrained Motion Planning (CMP) aims to find a collision-free path between the given start and goal configurations on the kinematic constraint manifolds. These problems appear in various scenarios ranging from object manipulation to legged-robot locomotion. However, the zero-volume nature of manifolds makes the CMP problem challenging, and the state-of-the-art methods still take several seconds to find a path and require a computationally expensive path dataset for imitation learning. Recently, physics-informed motion planning methods have emerged that directly solve the Eikonal equation through neural networks for motion planning and do not require expert demonstrations for learning. Inspired by these approaches, we propose the first physics-informed CMP framework that solves the Eikonal equation on the constraint manifolds and trains a neural function for CMP without expert data. Our results show that the proposed approach efficiently solves various CMP problems in both simulation and the real world, including object manipulation under orientation constraints and door opening with a high-dimensional 6-DOF robot manipulator. In these complex settings, our method exhibits high success rates and finds paths in sub-seconds, which is many times faster than the state-of-the-art CMP methods.
10:30-12:00, Paper ThAT1-CC.9
Practical and Safe Navigation Function Based Motion Planning of UAVs
Sinhmar, Himani | Cornell University |
Greiff, Marcus | Mitsubishi Electric Research Laboratories |
Di Cairano, Stefano | Mitsubishi Electric Research Laboratories |
Keywords: Constrained Motion Planning, Aerial Systems: Applications, Robot Safety
Abstract: This paper offers a practical method for certifiably safe operations of an unmanned aerial vehicle (UAV) with limited power and computation, useful for real-time operations where the UAV is exposed to significant disturbances in non-convex free space. We propose a motion planning method based on the Explicit Reference Governor (ERG) framework to ensure the safety of a flying quadrotor UAV. From a small set of experimental data and assumptions on modeling errors, a Lyapunov function is synthesized, by which an ERG is constructed to modify the UAV set-points. The method can handle polyhedral obstacles and constraints imposed on the maximum thrust of the UAV and its maximum tilt. We demonstrate the approach with extensive simulations and experiments using a Crazyflie 2.1.
ThAT2-CC Oral Session, CC-311
Swarm Robotics
Chair: Birattari, Mauro | Université Libre De Bruxelles |
Co-Chair: Grosu, Radu | TU Wien |
10:30-12:00, Paper ThAT2-CC.1
Flock-Formation Control of Multi-Agent Systems Using Imperfect Relative Distance Measurements
Brandstätter, Andreas | Technische Universität Wien |
Smolka, Scott | Stony Brook University |
Stoller, Scott | Stony Brook University |
Tiwari, Ashish | Microsoft Corp |
Grosu, Radu | TU Wien |
Keywords: Swarm Robotics, Aerial Systems: Mechanics and Control
Abstract: We present distributed distance-based control (DDC), a novel approach for controlling a multi-agent system, such that it achieves a desired formation, in a resource-constrained setting. Our controller is fully distributed and only requires local state-estimation and scalar measurements of inter-agent distances. It does not require an external localization system or inter-agent exchange of state information. Our approach uses spatial-predictive control (SPC) to optimize a cost function given strictly in terms of inter-agent distances and the distance to the target location. In DDC, each agent continuously learns and updates a very abstract model of the actual system, in the form of a dictionary of three independent key-value pairs (Δs, Δd), where Δd is the partial derivative of the distance measurements along a spatial direction Δs. This is sufficient for an agent to choose the best next action. We validate our approach by using DDC to control a collection of Crazyflie drones to achieve formation flight and reach a target while maintaining flock formation.
10:30-12:00, Paper ThAT2-CC.2
From Shadows to Light: A Swarm Robotics Approach with Onboard Control for Seeking Dynamic Sources in Constrained Environments
Karagüzel, Tugay Alperen | Vrije Universiteit Amsterdam |
Retamal Guiberteau, Victor | Vrije Universiteit Amsterdam |
Cambier, Nicolas | Vrije Universiteit Amsterdam |
Ferrante, Eliseo | Vrije Universiteit Amsterdam |
Keywords: Swarm Robotics, Aerial Systems: Perception and Autonomy, Multi-Robot Systems
Abstract: In this paper, we present a swarm robotics control and coordination approach that can be used for locating a moving target or source in a GNSS-denied indoor setting. The approach is completely on-board and can be deployed on nano-drones such as the Crazyflie. The swarm acts on a simple set of rules to identify and trail a dynamically changing source gradient. To validate the effectiveness of our approach, we conduct experiments to detect the maxima of the dynamic gradient, which was implemented with a set of lights turned on and off with a time-varying pattern. Additionally, we also introduce a minimalistic, fully onboard obstacle avoidance method, and assess the flexibility of our method by introducing an obstacle into the environment. The strategies rely on local interactions among UAVs, and the sensing of the source happens only at the individual level and is scalar, making it a viable option for UAVs with limited capabilities. Our method is adaptable to other swarm platforms with only minor parameter adjustments. Our findings demonstrate the potential of this approach as a flexible solution for tackling such tasks in constrained, GNSS-denied indoor environments.
10:30-12:00, Paper ThAT2-CC.3
Multi-Swarm Interaction through Augmented Reality for Kilobots
Feola, Luigi | University of Rome "La Sapienza" - National Research Council IST |
Reina, Andreagiovanni | Université Libre De Bruxelles |
Talamali, Mohamed S. | University College London (UCL) |
Trianni, Vito | Consiglio Nazionale Delle Ricerche |
Keywords: Swarm Robotics, Autonomous Agents, Engineering for Robotic Systems
Abstract: Research with swarm robotics systems can be complicated, time-consuming, and often expensive in terms of space and resources. The situation is even worse for studies involving multiple, possibly heterogeneous robot swarms. Augmented reality can provide an interesting solution to these problems, as demonstrated by the ARK system (Augmented Reality for Kilobots), which enhanced the experimentation possibilities with Kilobots, also relieving researchers from demanding tracking and logging activities. However, ARK is mostly limited to experimentation with a single swarm. In this paper, we introduce M-ARK, a system to support studies on multi-swarm interaction. M-ARK is based on the synchronisation over a network connection of multiple ARK systems, whether real or simulated, serving a twofold purpose: (i) to study the interaction of multiple, possibly heterogeneous swarms, and (ii) to enable a gradual transition from simulation to reality. Moreover, M-ARK enables the interaction between swarms located in multiple labs worldwide, encouraging scientific collaboration and advancement in multi-swarm interaction studies.
10:30-12:00, Paper ThAT2-CC.4
Morphobot: A Platform for Morphogenesis in Robot Swarm
Qin, Xiaoyang | Shenyang Institute of Automation, CAS |
Yang, Yongliang | Shenyang Institute of Automation, CAS |
Pan, Mengyun | Shenyang Institute of Automation, Chinese Academy of Sciences |
Cui, Long | Shenyang Institute of Automation, Chinese Academy of Sciences |
Liu, Lianqing | Shenyang Institute of Automation |
Keywords: Swarm Robotics, Cellular and Modular Robots, Biologically-Inspired Robots
Abstract: Various robot platforms have been developed for investigating new algorithms for swarm robotics. Morphogenetic engineering in robot swarms, however, imposes new requirements on platforms: precise motion control, physical interactions with the environment or neighbor robots, and functionalized shells. Few current platforms fulfill all the above characteristics. Here, we present Morphobot, a robot platform for morphogenetic engineering in swarm robotics. Its direct-current coreless motors provide physical support and strong power; meanwhile, these needle-like motors also enhance the physical interactions among robots. Each Morphobot has a changeable shell, functionalized for programming local interactions, through physical contact or communication, among Morphobots. We characterized the mobility of Morphobot to test its capability of moving and physically interacting with its neighbors. To demonstrate its advantages in the morphogenesis of robot swarms, we designed two morphogenetic engineering experiments. The results revealed that swarms of Morphobots can form patterns via physical interactions and optical communications.
10:30-12:00, Paper ThAT2-CC.5
FireAntV3: A Modular Self-Reconfigurable Robot towards Free-Form Self-Assembly Using Attach-Anywhere Continuous Docks
Swissler, Petras | New Jersey Institute of Technology |
Rubenstein, Michael | Northwestern University |
Keywords: Swarm Robotics, Cellular and Modular Robots, Mechanism Design
Abstract: FireAntV3 uses a refined version of the 3D Continuous Docks to attach to other such docks at any location and orientation, with simple control and without alignment. The robot improves upon previous FireAnt-series robots by redesigning the locomotion drive system to improve mechanical and attachment reliability while also reducing the number of motors from six to three. We also expand the sensory capabilities of FireAntV3 to enable the robot to sense forces, sense the direction to a light source, and sense contacting neighbors using vibrations. We validate this robot through full-robot tests demonstrating phototaxis and neighbor-detecting behavior. This paper also describes the method for manufacturing the continuous docks in a variety of geometries.
10:30-12:00, Paper ThAT2-CC.6
Reciprocal and Non-Reciprocal Swarmalators with Programmable Locomotion and Formations for Robot Swarms
Ceron, Steven | Massachusetts Institute of Technology |
Xiao, Wei | MIT |
Rus, Daniela | MIT |
Keywords: Swarm Robotics, Cellular and Modular Robots, Micro/Nano Robots
Abstract: Natural and robotic swarms often exhibit non-reciprocal interactions; agents do not exert equal and opposite forces on each other. By studying the effects of reciprocal and non-reciprocal interactions, we are better able to design emergent behaviors in robot collectives composed of agents that exert attractive and repulsive forces on each other. Moreover, by controlling agent-specific coupling forces on demand, we can enable a collective to exhibit desired behaviors previously not possible. We use a general form of the swarming oscillator (swarmalator) model to study reciprocal and non-reciprocal interactions among agents that affect each other's motions over long and short distances. We use non-reciprocal coupling to elicit collective locomotion toward or away from target sites, and we use the control barrier function method to optimize the non-reciprocal interactions for a desired spatial formation. This work addresses the interests of the active matter, swarm robotics, and control barrier function communities and demonstrates various collective behaviors with strong potential to be realized in macro- and micro-length-scale robot swarms.
10:30-12:00, Paper ThAT2-CC.7
Automatically Designing Robot Swarms in Environments Populated by Other Robots: An Experiment in Robot Shepherding
Garzón Ramos, David | Université Libre De Bruxelles |
Birattari, Mauro | Université Libre De Bruxelles |
Keywords: Swarm Robotics, Evolutionary Robotics
Abstract: Automatic design is a promising approach to realizing robot swarms. Given a mission to be performed by the swarm, an automatic method produces the required control software for the individual robots. So far, automatic design has concentrated on missions that a swarm can execute independently, interacting only with a static environment and without the involvement of other active entities. In this paper, we investigate the design of robot swarms that perform their mission by interacting with other robots that populate their environment. We frame our research within robot shepherding: the problem of using a small group of robots—the shepherds—to coordinate a relatively larger group—the sheep. In our study, the group of shepherds is the swarm that is automatically designed, and the sheep are pre-programmed robots that populate its environment. We use automatic modular design and neuroevolution to produce the control software for the swarm of shepherds to coordinate the sheep. We show that automatic design can leverage mission-specific interaction strategies to enable effective coordination between the two groups.
10:30-12:00, Paper ThAT2-CC.8
CrazySim: A Software-In-The-Loop Simulator for the Crazyflie Nano Quadrotor
Llanes, Christian | The Georgia Institute of Technology |
Kakish, Zahi | Sandia National Laboratories |
Williams, Kyle | Sandia National Labs |
Coogan, Samuel | Georgia Tech |
Keywords: Swarm Robotics, Software-Hardware Integration for Robot Systems, Aerial Systems: Applications
Abstract: In this work, we develop a software-in-the-loop simulator platform for Crazyflie nano quadrotor drone fleets. One of the challenges in maintaining a large fleet of drones is ensuring that the fleet performs its task as expected without collision, and this becomes more challenging as the number of drones scales, possibly into the hundreds. Software-in-the-loop simulation is an important component in verifying that drone fleets operate correctly and can significantly reduce development time. The simulator interface that we develop runs an instance of the Crazyflie flight stack firmware for each individual drone on a commercial desktop machine, along with a sensor and communication plugin in Gazebo Sim. The plugin transmits simulated sensor information to the firmware and provides a socket link interface for running external scripts that would run on a ground station during hardware deployment. The plugin simulates a radio communication delay between the drones and the ground station to test offboard control algorithms and high-level fleet commands. To validate the proposed simulator, we provide a case study of decentralized model predictive control (MPC) that is run on a ground station to command a fleet of sixteen drones to follow a specified trajectory. We first run the controller on the simulator interface to verify the performance and robustness of the algorithm before deployment to a Crazyflie hardware experiment in the Georgia Tech Robotarium.
ThAT3-CC Oral Session, CC-313
Range Sensing
Chair: De Martini, Daniele | University of Oxford |
Co-Chair: Caron, Guillaume | CNRS |
10:30-12:00, Paper ThAT3-CC.1
Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images
Pulling, Conner | Carnegie Mellon University |
Tan, Je Hon | Defence Science and Technology Agency |
Hu, Yaoyu | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Omnidirectional Vision
Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed distance candidate selection method which enables the use of a very small number of candidates and reduces the computational cost. We demonstrate the use of the geometry-informed candidates in a set of model variants. We find that by adjusting the candidates during robot deployment, our geometry-informed distance candidates also improve a pre-trained model's accuracy if the extrinsics or the number of cameras changes. Without any re-training or fine-tuning, our models outperform models trained with evenly distributed distance candidates. Models are also released as hardware-accelerated versions with a new dedicated large-scale dataset. The project page, code, and dataset can be found at https://theairlab.org/gicandidates/.
10:30-12:00, Paper ThAT3-CC.2
On Camera Model Conversions
Goichon, Eva | UPJV |
Caron, Guillaume | CNRS |
Vasseur, Pascal | Université De Picardie Jules Verne |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Omnidirectional Vision
Abstract: On the one hand, cameras with the conventional field-of-view usually considered in computer vision and robotics are very often modeled as a pinhole plus, possibly, a distortion model. On the other hand, there is a large variety of models for panoramic cameras. Many camera models have been proposed for fisheye cameras, catadioptric cameras, and super fisheye cameras. In both cases, however, few models offer the possibility of conversion into another model. This paper contributes to filling this gap, allowing an algorithm designed with one projection model to accept data from a camera calibrated with another model. A pre-existing dataset can thus be used without having to recalibrate the camera. We provide the methodology and mathematical developments for three conversions considering three different types of cameras, which are evaluated with respect to calibration and within a visual Simultaneous Localization And Mapping benchmark. The source code of the camera model conversions studied in this paper is shared within the libPeR library for Perception in Robotics: https://github.com/PerceptionRobotique/libPeR_base.
10:30-12:00, Paper ThAT3-CC.3
RadCloud: Real-Time High-Resolution Point Cloud Generation Using Low-Cost Radars for Aerial and Ground Vehicles
Hunt, David | Duke University |
Luo, Shaocheng | Duke University |
Khazraei, Amir | Duke University |
Zhang, Xiao | Duke University |
Hallyburton, Robert | Duke University |
Chen, Tingjun | Duke University |
Pajic, Miroslav | Duke University |
Keywords: Range Sensing, Deep Learning for Visual Perception, Aerial Systems: Perception and Autonomy
Abstract: In this work, we present RadCloud, a novel real-time framework for directly obtaining higher-resolution lidar-like 2D point clouds from low-resolution radar frames on resource-constrained platforms commonly used in unmanned aerial and ground vehicles (UAVs and UGVs, respectively); such point clouds can then be used for accurate environmental mapping, navigating unknown environments, and other robotics tasks. While high-resolution sensing using radar data has been previously reported, existing methods cannot be used on most UAVs, which have limited computational power and energy; thus, existing demonstrations focus on offline radar processing. RadCloud overcomes these challenges by using a radar configuration with 1/4th of the range resolution and employing a deep learning model with 2.25× fewer parameters. Additionally, RadCloud utilizes a novel chirp-based approach that makes the obtained point clouds resilient to rapid movements (e.g., aggressive turns or spins) that commonly occur during UAV flights. In real-world experiments, we demonstrate the accuracy and applicability of RadCloud on commercially available UAVs and UGVs, with off-the-shelf radar platforms on board.
10:30-12:00, Paper ThAT3-CC.4
That's My Point: Compact Object-Centric LiDAR Pose Estimation for Large-Scale Outdoor Localisation
Pramatarov, Georgi | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
De Martini, Daniele | University of Oxford |
Keywords: Range Sensing, Localization, Mapping
Abstract: This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abstracts away important structural information from the scenes, which is crucial for traditional registration approaches. To mitigate this, we introduce an object-matching network based on self- and cross-correlation that captures geometric and semantic relationships between entities. The respective matches allow us to recover the relative transformation between scans through weighted Singular Value Decomposition (SVD) and RANdom SAmple Consensus (RANSAC). We demonstrate that such a representation is sufficient for metric localisation by registering point clouds taken under different viewpoints on the KITTI dataset, and at different periods of time by localising between KITTI and KITTI-360. We achieve accurate metric estimates comparable with state-of-the-art methods with almost half the representation size, specifically 1.33 kB on average.
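The weighted SVD step is the classical weighted Kabsch alignment: given matched object centroids and per-match weights, the rigid transform follows in closed form. A minimal sketch; in the paper this sits inside a RANSAC loop over the network's matches, and the array shapes below are assumptions.

import numpy as np

def weighted_svd_transform(P, Q, w):
    """Rigid transform (R, t) aligning matched 3D centroids P[i] -> Q[i]
    with match weights w[i], via weighted Kabsch/SVD. P, Q: (N, 3)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    p_bar, q_bar = w @ P, w @ Q                     # weighted centroids
    H = (P - p_bar).T @ np.diag(w) @ (Q - q_bar)    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # reflection guard
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t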
10:30-12:00, Paper ThAT3-CC.5
A Point-To-Distribution Degeneracy Detection Factor for LiDAR SLAM Using Local Geometric Models
Ji, Sehua | Guangdong University of Technology |
Chen, Weinan | Guangdong University of Technology |
Su, Zerong | Institute of Intelligent Manufacturing, Guangdong Academy of Sciences
Guan, Yisheng | Guangdong University of Technology |
Li, Jiehao | South China University of Technology |
Zhang, Hong | SUSTech |
Zhu, Haifei | Guangdong University of Technology |
Keywords: Range Sensing, Localization, Mapping
Abstract: Limited by their working principles, LiDAR-SLAM systems suffer from the degeneration phenomenon in environments such as long corridors and tunnels, due to the lack of sufficient geometric features for frame-to-frame matching. The accuracy and sensitivity of existing degeneracy detection methods need to be further improved. In this paper, we propose a novel method for degeneracy detection using local geometric models based on point-to-distribution matching. To obtain an accurate description of local geometric models, an adaptive adjustment of voxel segmentation according to the point cloud distribution and density is designed. The code of the proposed method is open-source and available at https://github.com/jisehua/Degenerate-Detection.git. Experiments with public datasets and self-built robots were conducted to evaluate the method. The results show that our proposed method achieves higher accuracy than the other existing approaches. Applying our proposed method is beneficial for improving the robustness of LiDAR-SLAM systems.
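As background, point-to-distribution matching (as in NDT) fits a Gaussian to the points in each voxel and scores a query point by its Mahalanobis distance to that voxel's distribution; the paper's degeneracy factor and adaptive voxel segmentation are built on top of such local geometric models. A minimal sketch of the basic matching cost; the voxel handling here is deliberately simple and illustrative.

import numpy as np

def voxel_gaussians(points, voxel_size, min_pts=5):
    """Fit one Gaussian (mean, covariance) per occupied voxel."""
    buckets = {}
    for p in points:
        key = tuple(np.floor(p / voxel_size).astype(int))
        buckets.setdefault(key, []).append(p)
    models = {}
    for key, pts in buckets.items():
        if len(pts) >= min_pts:  # enough points for a stable covariance
            pts = np.asarray(pts)
            models[key] = (pts.mean(axis=0), np.cov(pts.T) + 1e-6 * np.eye(3))
    return models

def point_to_distribution_cost(p, mean, cov):
    """Mahalanobis distance of point p to a voxel's local Gaussian."""
    r = p - mean
    return float(r @ np.linalg.solve(cov, r))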
10:30-12:00, Paper ThAT3-CC.6
I-Octree: A Fast, Lightweight, and Dynamic Octree for Proximity Search
Zhu, Jun | Tsinghua University |
Li, Hongyi | Tsinghua University |
Wang, Zhepeng | Lenovo Research |
Wang, Shengjie | Tsinghua University |
Zhang, Tao | Tsinghua University |
Keywords: Range Sensing, Localization, Mapping
Abstract: Establishing the correspondences between newly acquired points and historically accumulated data (i.e., map) through nearest neighbors search is crucial in numerous robotic applications. However, static tree data structures are inadequate to handle large and dynamically growing maps in real-time. To address this issue, we present the i-Octree, a dynamic octree data structure that supports both fast nearest neighbor search and real-time dynamic updates, such as point insertion, deletion, and on-tree down-sampling. The i-Octree is built upon a leaf-based octree and has two key features: a local spatially continuous storing strategy that allows for fast access to points while minimizing memory usage, and local on-tree updates that significantly reduce computation time compared to existing static or dynamic tree structures. The experiments show that i-Octree outperforms contemporary state-of-the-art approaches by achieving, on average, a 19% reduction in runtime on real-world open datasets.
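For readers new to the structure, the sketch below shows the general leaf-based design such a point octree follows: points live in leaves, a full leaf splits into eight octants, and nearest-neighbor search visits the closest octants first while pruning with a box-distance bound. It is a toy illustration only; the i-Octree's actual contributions, contiguous local storage and local on-tree updates such as deletion and down-sampling, are not captured here.

import numpy as np

class PointOctree:
    """Toy leaf-based point octree with dynamic insertion and NN search."""
    def __init__(self, center, half, cap=8):
        self.center = np.asarray(center, dtype=float)
        self.half = half          # half side length of this cube
        self.cap = cap            # leaf capacity before splitting
        self.points = []          # points stored while this node is a leaf
        self.children = None      # 8 sub-octants once split

    def insert(self, p):
        p = np.asarray(p, dtype=float)
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.cap and self.half > 1e-6:
                self._split()
        else:
            self.children[self._octant(p)].insert(p)

    def _octant(self, p):
        gx, gy, gz = p > self.center
        return int(gx) | int(gy) << 1 | int(gz) << 2

    def _split(self):
        h = self.half / 2.0
        offsets = [np.array([x, y, z])  # index = x_bit + 2*y_bit + 4*z_bit
                   for z in (-h, h) for y in (-h, h) for x in (-h, h)]
        self.children = [PointOctree(self.center + o, h, self.cap) for o in offsets]
        for p in self.points:       # push stored points down to the new leaves
            self.children[self._octant(p)].insert(p)
        self.points = []

    def nearest(self, q, best=None):
        q = np.asarray(q, dtype=float)
        if self.children is None:   # leaf: scan its points
            for p in self.points:
                d = np.linalg.norm(p - q)
                if best is None or d < best[0]:
                    best = (d, p)
            return best
        for c in sorted(self.children, key=lambda c: np.linalg.norm(c.center - q)):
            # lower bound on the distance from q to any point inside c's cube
            bound = np.linalg.norm(np.maximum(np.abs(q - c.center) - c.half, 0.0))
            if best is None or bound < best[0]:
                best = c.nearest(q, best)
        return best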
10:30-12:00, Paper ThAT3-CC.7
Self-Supervised Depth Correction of Lidar Measurements from Map Consistency Loss
Agishev, Ruslan | Czech Technical University in Prague, FEE |
Petricek, Tomas | Czech Technical University in Prague |
Zimmermann, Karel | Ceske Vysoke Uceni Technicke V Praze, FEL |
Keywords: Range Sensing, Mapping, Data Sets for SLAM
Abstract: Depth perception is considered an invaluable source of information in the context of 3D mapping and various robotics applications. However, point cloud maps acquired using consumer-level light detection and ranging sensors (lidars) still suffer from bias related to local surface properties such as the measuring beam-to-surface incidence angle. This fact has recently motivated researchers to exploit traditional filters, as well as the deep learning paradigm, in order to suppress the aforementioned depth sensor error while preserving geometric and map consistency details. Despite these efforts, depth correction of lidar measurements is still an open challenge, mainly due to the lack of clean 3D data that could be used as ground truth. In this paper, we introduce two novel point cloud map consistency losses, which facilitate self-supervised learning on real data of lidar depth correction models. Specifically, the models exploit multiple point cloud measurements of the same scene from different viewpoints in order to learn to reduce the bias based on the constructed map consistency signal. Complementary to the removal of the bias from the measurements, we demonstrate that the depth correction models help to reduce localization drift. Additionally, we release a dataset that contains point cloud scans captured in an indoor corridor environment with precise localization and ground truth mapping information.
10:30-12:00, Paper ThAT3-CC.8
GroundGrid: LiDAR Point Cloud Ground Segmentation and Terrain Estimation
Steinke, Nicolai | Freie Universität Berlin |
Goehring, Daniel | Freie Universität Berlin |
Rojas, Raul | Freie Universität Berlin |
Keywords: Range Sensing, Mapping, Field Robots
Abstract: Precise point cloud ground segmentation is a crucial prerequisite of virtually all perception tasks for LiDAR sensors in autonomous vehicles. In particular, the clustering and extraction of objects from a point cloud usually relies on an accurate removal of ground points. The correct estimation of the surrounding terrain is important for assessing surface drivability, path planning, and obstacle prediction. In this article, we propose our system GroundGrid, which relies on 2D elevation maps to solve the terrain estimation and point cloud ground segmentation problems. We evaluate the ground segmentation and terrain estimation performance of GroundGrid and compare it to other state-of-the-art methods using the SemanticKITTI dataset and a novel evaluation method relying on airborne LiDAR scanning. The results show that GroundGrid outperforms other state-of-the-art systems with an average IoU of 94.78% while maintaining a high run-time performance of 171 Hz. The source code is available at https://github.com/dcmlr/groundgrid
10:30-12:00, Paper ThAT3-CC.9
Morphable-SfS: Enhancing Shape-From-Silhouette Via Morphable Modeling
Lu, Guoyu | University of Georgia |
Keywords: Range Sensing, RGB-D Perception, Mapping
Abstract: Reconstructing accurate object shapes from single image inputs is still a critical and challenging task, mainly due to the potential shape ambiguity and occlusion. Most existing single-image 3D reconstruction approaches, trained either in a stereo setting or with structure-from-motion, estimate 2.5D visible models that generally reconstruct only one viewpoint of an object. We propose a method that leverages both a general Morphable Model of common objects and a multi-view synthesis-based shape-from-silhouette model to reconstruct complete object shapes. We use the proposed method to exploit strong geometric and perceptual cues in 3D shape reconstruction. During inference, the trained model is able to produce high-quality and complete meshes with finely detailed structures from a 2D image captured from arbitrary perspectives. The proposed method is evaluated on both the large-scale synthetic ShapeNet and the real-world Pascal 3D+ and Pix3D datasets. The proposed work achieves state-of-the-art results compared with other recent self-supervised methods. Moreover, it shows a good capability of being applied to unseen object reconstruction tasks.
ThAT4-CC Oral Session, CC-315
Path Planning for Multiple Mobile Robots or Agents I
Chair: Zhu, Quanyan | New York University |
Co-Chair: Min, Byung-Cheol | Purdue University |
10:30-12:00, Paper ThAT4-CC.1
Collision Detection and Avoidance for Black Box Multi-Robot Navigation
Ayoubi, Sara | Nokia Bell Labs |
Hadzic, Ilija | Nokia Bell Labs |
Salaun, Lou | Nokia Bell Labs |
Massaro, Antonio | Nokia Bell Labs |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Collision Avoidance, Deep Learning Methods
Abstract: To date, commercial industrial robots only provide multi-robot coordination for their own fleet of robots and treat robots from other vendors as general obstacles. The ability to enable robots from different vendors to co-exist in the same space is crucial to prevent vendor lock-in. We present the first decentralized system that achieves coordination between a heterogeneous fleet of black-box robots for which the internals of the navigation stack are presumed unmodifiable. Our system, which we call CODAK, achieves the coordination by relying on a minimum set of interfaces that are commonly available on most industrial and service robots. For each robot, CODAK uses a trained recurrent neural network to anticipate collisions from externally observable metrics. Anticipated collisions are avoided using a simple, yet effective, concurrency control scheme. We run a series of experiments in simulation and with real robots to demonstrate CODAK’s ability to enable safe navigation in different environments. We also experimentally compare CODAK with previously published white-box solutions to evaluate the penalty of the black-box constraint.
10:30-12:00, Paper ThAT4-CC.2
Stackelberg Game-Theoretic Trajectory Guidance for Multi-Robot Systems with Koopman Operator
Zhao, Yuhan | New York University |
Zhu, Quanyan | New York University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Model Learning for Control
Abstract: Guided trajectory planning involves a leader robot strategically directing a follower robot to collaboratively reach a designated destination. However, this task becomes notably challenging when the leader lacks complete knowledge of the follower's decision-making model, creating a need for learning-based methods to effectively design the cooperative plan. To this end, we develop a Stackelberg game-theoretic approach based on the Koopman operator to address the challenge. We first formulate the guided trajectory planning problem through the lens of a dynamic Stackelberg game. We then leverage Koopman operator theory to acquire a learning-based linear system model that approximates the follower's feedback dynamics. Based on this learned model, the leader devises a collision-free trajectory to guide the follower using receding-horizon planning. We use simulations to demonstrate the effectiveness of our approach in generating learned models that accurately predict the follower's multi-step behavior when compared to alternative learning techniques. Moreover, our approach successfully accomplishes the guidance task and notably reduces the leader's planning time to nearly half when compared with the model-based baseline method.
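In practice, acquiring such a learning-based linear model with the Koopman operator typically reduces to extended dynamic mode decomposition (EDMD): lift the follower's states with a dictionary of observables and fit z' ≈ A z + B u by least squares. A minimal sketch of that standard recipe; the dictionary and variable names are assumptions, not the paper's.

import numpy as np

def fit_koopman_model(X, X_next, U, lift):
    """EDMD-style least-squares fit of lifted linear dynamics
    z_{k+1} ≈ A z_k + B u_k, where z = lift(x).
    X, X_next: (N, n) state snapshots; U: (N, m) inputs."""
    Z = np.array([lift(x) for x in X])        # (N, d) lifted states
    Z_next = np.array([lift(x) for x in X_next])
    W = np.hstack([Z, U])                     # regressors [z, u], shape (N, d+m)
    theta, *_ = np.linalg.lstsq(W, Z_next, rcond=None)
    d = Z.shape[1]
    return theta[:d].T, theta[d:].T           # A: (d, d), B: (d, m)

# example dictionary: the state, its squares, and a constant term
lift = lambda x: np.concatenate([x, np.square(x), [1.0]])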
10:30-12:00, Paper ThAT4-CC.3
Optimal Path Planning for a Convoy-Support Vehicle Pair through a Repairable Network (I)
Bhadoriya, Abhay Singh | Texas A&M University |
Montez, Christopher | Texas A&M University - College Station |
Rathinam, Sivakumar | TAMU |
Darbha, Swaroop | TAMU |
Casbeer, David | AFRL |
Manyam, Satyanarayana Gupta | Infoscitex Corp |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots, Multi-Robot Systems
Abstract: In this article, we consider a multi-agent path planning problem in a partially impeded environment. The impeded environment is represented by a graph with select road segments (edges) in disrepair impeding vehicular movement in the road network. A primary vehicle, which we refer to as a convoy, wishes to travel from a starting location to a destination while minimizing some accumulated cost. The convoy may traverse an impeded edge at an additional cost (associated with repairing the edge) compared to traversing it unimpeded. A support vehicle, which we refer to as a service vehicle, is simultaneously deployed to assist the convoy by repairing edges, reducing the cost for the convoy to traverse those edges. The convoy is permitted to wait at any vertex to allow the service vehicle to repair an edge. The service vehicle is permitted to terminate its path at any vertex. The goal is then to find a pair of paths so that the convoy reaches its destination while minimizing the total time (cost) the two vehicles are active, including any time the convoy waits. We refer to this problem as the Assisted Shortest Path Problem (ASPP). We present a generalized permanent labeling algorithm (GPLA) to find an optimal solution for the ASPP. We also introduce additional modifications to the labeling algorithm that significantly improve the computation time and refer to the modified labeling algorithm as GPLA*. Computational results are presented to illustrate the effectiveness of GPLA* in solving the ASPP.
10:30-12:00, Paper ThAT4-CC.4
Learning Heterogeneous Multi-Agent Allocations for Ergodic Search
Rao, Ananya | Carnegie Mellon University |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Choset, Howie | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Distributed Robot Systems, Autonomous Agents
Abstract: Information-based coverage directs robots to move over an area to optimize a pre-defined objective function based on some measure of information. Our prior work determined that the spectral decomposition of an information map can be used to guide a set of heterogeneous agents, each with different sensor and motion models, to optimize coverage in a target region, based on a measure called ergodicity. In this paper, we build on this insight to construct a reinforcement learning formulation of the problem of allocating heterogeneous agents to different search regions in the frequency domain. We relate the spectral coefficients of the search map to each other in three different ways. The first method maps agents to pre-defined sets of spectral coefficients. In the second method, each agent learns a weight distribution over all spectral coefficients. Finally, in the third method, each agent learns weight distributions as parameterized curves over coefficients. Our numerical results demonstrate that distributing and assigning coverage responsibilities to agents depending on their sensing and motion models leads to 40%, 51%, and 46% improvement in coverage performance as measured by the ergodic metric, and 15%, 22%, and 20% improvement in time to find all targets in the search region, for the three methods respectively.
10:30-12:00, Paper ThAT4-CC.5
Multi-Robot Cooperative Socially-Aware Navigation Using Multi-Agent Reinforcement Learning
Wang, Weizheng | Purdue University |
Mao, Le | Beijing University of Chemical Technology |
Wang, Ruiqi | Purdue University |
Min, Byung-Cheol | Purdue University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Human-Aware Motion Planning, Human Factors and Human-in-the-Loop
Abstract: In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment crafted using Dec-POSMDP and multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment. Demo videos for this work can be found at: https://sites.google.com/view/samarl
10:30-12:00, Paper ThAT4-CC.6
Multi-Agent Path Finding for Cooperative Autonomous Driving
Yan, Zhongxia | Massachusetts Institute of Technology |
Zheng, Han | Massachusetts Institute of Technology |
Wu, Cathy | MIT |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Intelligent Transportation Systems, Motion and Path Planning
Abstract: Anticipating the possible future deployment of connected and automated vehicles (CAVs), cooperative autonomous driving at intersections has been studied by many works in control theory and intelligent transportation across decades. Simultaneously, recent parallel works in robotics have devised efficient algorithms for multi-agent path finding (MAPF), though often in environments with simplified kinematics. In this work, we hybridize insights and algorithms from MAPF with the structure and heuristics of optimizing the crossing order of CAVs at signal-free intersections. We devise an optimal and complete algorithm, Order-based Search with Kinematics Arrival Time Scheduling (OBS-KATS), which significantly outperforms existing algorithms, fixed heuristics, and prioritized planning with KATS. The performance is maintained under different vehicle arrival rates, lane lengths, crossing speeds, and control horizons. Through ablations and dissections, we offer insight into the factors contributing to OBS-KATS's performance. Our work is directly applicable to many similarly scaled traffic and multi-robot scenarios with directed lanes.
10:30-12:00, Paper ThAT4-CC.7
Communication-Aware Map Compression for Online Path-Planning
Psomiadis, Evangelos | Georgia Institute of Technology |
Maity, Dipankar | University of North Carolina - Charlotte |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Optimization and Optimal Control, Cooperating Robots
Abstract: This paper addresses the problem of communicating optimally compressed information for mobile robot path-planning. In this context, mobile robots compress their current local maps to assist another robot in reaching a target in an unknown environment. We propose a framework that sequentially selects the optimal level of compression, guided by the robot's path, by balancing map resolution and communication cost. Our approach is tractable in close-to-real scenarios and does not necessitate prior knowledge of the environment. We design a novel decoder that leverages compressed information to estimate the unknown environment via convex optimization with linear constraints, and an encoder that utilizes the decoder to select the optimal compression. Numerical simulations are conducted in a large close-to-real map and a maze map and compared with two alternative approaches. The results confirm the effectiveness of our framework in assisting the robot in reaching its target by reducing transmitted information, on average, by approximately 50%, while maintaining satisfactory performance.
10:30-12:00, Paper ThAT4-CC.8
Wind Field Modeling for Formation Planning in Multi-Drone Systems
Park, Minhyuk | UNIST |
Au, Tsz-Chiu | Ulsan National Institute of Science and Technology |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Planning, Scheduling and Coordination
Abstract: In multi-drone systems such as drone light shows, drones move in formation while avoiding collisions. However, few existing formation planning algorithms consider the wind fields of drones during planning. Since the wind field effect is prominent when drones have to fly close to each other, we cannot ignore the effect during planning. In this paper, we extend the reservation system in autonomous intersection management for grid-based formation planning by including a new type of reservation, called a non-exclusive reservation, specifically for handling wind fields. We train a deep learning model to predict the deviation of a drone's trajectory when the drone enters the wind field of another drone and then use the reservation grid to prevent collisions. Based on the reservation system, we develop a new formation planning algorithm that focuses on adjusting the start times of motion plans to avoid collisions. Our experimental results show that trajectory prediction can help make better decisions in task assignments for minimizing makespans.
10:30-12:00, Paper ThAT4-CC.9
Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes
Jakkala, Kalvik | University of North Carolina at Charlotte |
Akella, Srinivas | University of North Carolina at Charlotte |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Probability and Statistical Methods, Environment Monitoring and Management
Abstract: This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for both single and multi-robot IPP. We demonstrate that the proposed approach is fast and accurate on real-world data.
ThAT5-CC Oral Session, CC-411
Visual Learning I
Chair: Aragon-Camarasa, Gerardo | University of Glasgow |
Co-Chair: Yi, Li | Tsinghua University |
10:30-12:00, Paper ThAT5-CC.1
ASP-LED: Learning Ambiguity-Aware Structural Priors for Joint Low-Light Enhancement and Deblurring
Ye, Jing | Sun Yat-Sen University |
Liu, Yang | Sun Yat-Sen University |
Yu, Congjing | Sun Yat-Sen University |
Qiu, Changzhen | Sun Yat-Sen University |
Zhang, Zhiyong | Sun Yat-Sen University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: Low-light enhancement and deblurring is vital for high-level vision-related nighttime tasks. Most existing cascaded and joint enhancement methods may provide undesirable results, suffering from severe artifacts, deteriorating blur, and unclear details. In this paper, we propose a novel ambiguity-aware network (ASP-LED) with structural priors, including high-frequency and edge priors, to enable effective image representation learning for joint low-light enhancement and deblurring. Specifically, we employ a Transformer backbone to explore the global clues of the image. To compensate for the inadequate local detail optimization, we propose a multi-patch perception pyramid block that models the correlation between different-sized patches and ambiguity, and identifies non-uniform deblurring spatial features, facilitating the reconstruction of potential high-frequency and edge information. Furthermore, a prior-guided reconstruction block based on a parallel attention mechanism is presented to adaptively correct the global image with statistical features, which helps guide the model to refine sharp texture and structure. Extensive experiments performed on simulated and real-world datasets demonstrate the efficacy of our proposed method in restoring low-light blurry images with increased visual perception compared to state-of-the-art methods.
10:30-12:00, Paper ThAT5-CC.2
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Qin, Fangbo | Institute of Automation, Chinese Academy of Sciences |
Hou, Taogang | Beijing Jiaotong University |
Lin, Shan | University of California, San Diego |
Wang, Kaiyuan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Yu, Shan | Institute of Automation, Chinese Academy of Sciences |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, to yield instance-unaware candidate keypoints. Then, the entire graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, which not only demonstrates its cross-category flexibility and instance awareness, but also shows remarkable robustness to domain shift and viewpoint change.
|
|
10:30-12:00, Paper ThAT5-CC.3 | Add to My Program |
RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision |
|
Pan, Mingjie | Peking University |
Liu, Jiaming | Peking University |
Zhang, Renrui | CUHK |
Huang, Peixiang | Peking University |
Li, Xiaoqi | Peking University |
Wang, Bing | NTU |
Xie, Hongwei | Nanjing University |
Liu, Li | Xiaomi Car |
Zhang, Shanghang | Peking University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Visual Learning
Abstract: 3D occupancy prediction holds significant promise in the fields of robot perception and autonomous driving, as it quantifies 3D scenes into grid cells with semantic labels. Recent works mainly utilize complete occupancy labels in 3D voxel space for supervision. However, the expensive annotation process and sometimes ambiguous labels have severely constrained the usability and scalability of 3D occupancy models. To address this, we present RenderOcc, a novel paradigm for training 3D occupancy models using only 2D labels. Specifically, we extract a NeRF-style 3D volume representation from multi-view images and employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantic and depth labels. Additionally, we introduce an Auxiliary Ray method to tackle the issue of sparse viewpoints in autonomous driving scenarios, which leverages sequential frames to construct a comprehensive 2D rendering for each object. To the best of our knowledge, RenderOcc is the first attempt to train multi-view 3D occupancy models using only 2D labels, reducing the dependence on costly 3D occupancy annotations. Extensive experiments demonstrate that RenderOcc achieves performance comparable to models fully supervised with 3D labels, underscoring the significance of this approach in real-world applications.
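The 2D supervision rests on standard NeRF-style volume rendering: density-weighted compositing along each camera ray produces a rendered semantic vector and an expected depth that can be compared directly against 2D labels. A minimal single-ray sketch of that quadrature (illustrative only, not the RenderOcc implementation):

```python
import torch

def render_ray(densities, semantics, z_vals):
    """Classic volume-rendering quadrature along one ray.

    densities: (N,)   non-negative density at each ray sample
    semantics: (N, C) per-sample semantic logits
    z_vals:    (N,)   increasing sample depths along the ray
    Returns a rendered semantic vector and an expected depth, both of
    which can be supervised with 2D semantic and depth labels.
    """
    deltas = torch.diff(z_vals, append=z_vals[-1:] + 1e10)   # interval lengths
    alpha = 1.0 - torch.exp(-densities * deltas)             # per-sample opacity
    ones = torch.ones_like(alpha[:1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                  # rendering weights
    sem_2d = (weights[:, None] * semantics).sum(dim=0)       # rendered semantics
    depth_2d = (weights * z_vals).sum()                      # expected depth
    return sem_2d, depth_2d
```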
|
|
10:30-12:00, Paper ThAT5-CC.4 | Add to My Program |
DRO: Deep Recurrent Optimizer for Video to Depth |
|
Gu, Xiaodong | Alibaba Group |
Yuan, Weihao | Hong Kong University of Science and Technology |
Dai, Zuozhuo | Alibaba Group |
Zhu, Siyu | Alibaba AI Lab |
Tang, Chengzhou | Simon Fraser University |
Dong, Zilong | Company |
Tan, Ping | Simon Fraser University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Visual Learning
Abstract: There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques. While earlier methods directly learn a mapping from images to depth maps and camera poses, more recent works enforce multi-view geometry constraints through optimization embedded in the learning framework. This paper presents a novel optimization method based on recurrent neural networks to further exploit the potential of neural networks in V2D. Specifically, our neural optimizer alternately updates the depth and camera poses through iterations to minimize a feature-metric cost, and two gated recurrent units iteratively improve the results by tracing historical information. In this way, our network is a gradient-free zeroth-order optimizer designed for V2D and can be applied to both supervised and self-supervised V2D. Extensive experimental results demonstrate that our method outperforms previous methods and is more efficient in computation and memory consumption than cost-volume-based methods. In particular, our self-supervised method outperforms previous supervised methods on the KITTI and ScanNet datasets. Our source code will be made public.
|
|
10:30-12:00, Paper ThAT5-CC.5 | Add to My Program |
Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow |
|
Jiang, Zhenyu | The University of Texas at Austin |
Jiang, Hanwen | UT Austin |
Zhu, Yuke | The University of Texas at Austin |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Visual Learning
Abstract: Dense visual correspondence plays a vital role in robotic perception. This work focuses on establishing dense correspondence between a pair of images capturing a dynamic scene undergoing substantial transformations. We introduce Doduo to learn general dense visual correspondence from in-the-wild images and videos without ground-truth supervision. Given a pair of images, it estimates the dense flow field encoding the displacement of each pixel in one image to its corresponding pixel in the other image. Doduo uses flow-based warping to acquire supervisory signals for training. Incorporating semantic priors with self-supervised flow training, Doduo produces accurate dense correspondence robust to dynamic changes in the scene. Trained on an in-the-wild video dataset, Doduo demonstrates superior performance on point-level correspondence estimation over existing self-supervised correspondence learning baselines. We also apply Doduo to articulation estimation and zero-shot goal-conditioned manipulation, underlining its practical applications in robotics.
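Flow-based warping supervision of this kind is commonly implemented by resampling one image with the predicted flow and penalizing the photometric difference, so no ground-truth correspondence is needed. A minimal sketch assuming a dense pixel-space flow (names are illustrative; Doduo's actual objectives are richer):

```python
import torch
import torch.nn.functional as F

def flow_warp_loss(img_a, img_b, flow_ab):
    """L1 photometric loss after warping B into A's frame with the flow.

    img_a, img_b: (B, 3, H, W) image pair
    flow_ab:      (B, 2, H, W) predicted (dx, dy) flow from A to B, in pixels
    """
    b, _, h, w = img_a.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0)[None] + flow_ab    # target pixel coords
    # normalize coordinates to [-1, 1] as expected by grid_sample
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    warped_b = F.grid_sample(img_b, torch.stack((gx, gy), dim=-1),
                             align_corners=True)
    return (img_a - warped_b).abs().mean()
```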
|
|
10:30-12:00, Paper ThAT5-CC.6 | Add to My Program |
RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models |
|
Long, Zijun | University of Glasgow |
Killick, George | School of Computing Science, University of Glasgow |
McCreadie, Richard | University of Glasgow |
Aragon-Camarasa, Gerardo | University of Glasgow |
Keywords: Deep Learning for Visual Perception, Recognition
Abstract: Robotic vision applications often necessitate a wide range of visual perception tasks, such as object detection, segmentation, and identification. While there have been substantial advances in these individual tasks, integrating specialized models into a unified vision pipeline presents significant engineering challenges and costs. Recently, Multimodal Large Language Models (MLLMs) have emerged as novel backbones for various downstream tasks. We argue that leveraging the pre-training capabilities of MLLMs enables the creation of a simplified framework, thus mitigating the need for task-specific encoders. Specifically, the large-scale pretrained knowledge in MLLMs allows for easier fine-tuning to downstream robotic vision tasks and yields superior performance. We introduce the RoboLLM framework, equipped with a BEiT-3 backbone, to address all visual perception tasks in the ARMBench challenge, a large-scale robotic manipulation dataset of real-world warehouse scenarios. RoboLLM not only outperforms existing baselines but also substantially reduces the engineering burden associated with model selection and tuning. All the code used in this paper can be found at https://github.com/longkukuhi/RoboLLM.
|
|
10:30-12:00, Paper ThAT5-CC.7 | Add to My Program |
CrossVideo: Self-Supervised Cross-Modal Contrastive Learning for Point Cloud Video Understanding |
|
Liu, Yunze | Tsinghua University |
Chen, Changxi | Tsinghua University |
Wang, Zifan | Tsinghua University |
Yi, Li | Tsinghua University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Visual Learning
Abstract: This paper introduces a novel approach named CrossVideo, which aims to enhance self-supervised cross-modal contrastive learning in the field of point cloud video understanding. Traditional supervised learning methods encounter limitations due to data scarcity and challenges in label acquisition. To address these issues, we propose a self-supervised learning method that leverages the cross-modal relationship between point cloud videos and image videos to acquire meaningful feature representations. Intra-modal and cross-modal contrastive learning techniques are employed to facilitate effective comprehension of point cloud video. We also propose a multi-level contrastive approach for both modalities. Through extensive experiments, we demonstrate that our method significantly surpasses previous state-of-the-art approaches, and we conduct comprehensive ablation studies to validate the effectiveness of our proposed designs.
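Intra- and cross-modal contrastive learning between paired clips is typically realized as a symmetric InfoNCE objective. A minimal sketch of the cross-modal term, assuming batch-aligned positives (a simplification of the paper's multi-level formulation):

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE between paired point-cloud and image-video clips.

    pc_emb, img_emb: (B, D) clip embeddings; row i of each is a positive
    pair, and all other rows in the batch serve as negatives.
    """
    pc = F.normalize(pc_emb, dim=-1)
    im = F.normalize(img_emb, dim=-1)
    logits = pc @ im.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(pc.shape[0])          # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```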
|
|
10:30-12:00, Paper ThAT5-CC.8 | Add to My Program |
FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving (I) |
|
Liu, Yuxuan | Hong Kong University of Science and Technology |
Xu, Zhenhua | The Hong Kong University of Science and Technology |
Huang, Huaiyang | The Hong Kong University of Science and Technology |
Wang, Lujia | The Hong Kong University of Science and Technology (Guangzhou) |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Deep Learning for Visual Perception, Visual Learning, RGB-D Perception
Abstract: Predicting accurate depth from monocular images is important for low-cost robotic applications and autonomous driving. This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes utilizing inter-frame poses obtained from inertial measurements. In particular, we introduce a Full-Scale depth prediction network named FSNet. FSNet contains four important improvements over existing self-supervised models: (1) a multichannel output representation for stable training of depth prediction in driving scenarios, (2) an optical-flow-based mask designed for dynamic object removal, (3) a self-distillation training strategy to augment the training process, and (4) an optimization-based post-processing algorithm at test time, fusing the results from visual odometry. With this framework, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data. Extensive experiments on the KITTI, KITTI-360, and nuScenes datasets demonstrate the potential of FSNet. More visualizations are presented at https://sites.google.com/view/fsnet/home
|
|
10:30-12:00, Paper ThAT5-CC.9 | Add to My Program |
V2CE: Video to Continuous Events Simulator |
|
Zhang, Zhongyang | University of California San Diego |
Cui, Shuyang | University of California, San Diego |
Chai, Kaidong | University of Massachusetts Amherst |
Yu, Haowen | University of Massachusetts Amherst |
Dasgupta, Subhasis | University of California San Diego |
Mahbub, Upal | Qualcomm |
Rahman, Tauhidur | University of California San Diego |
Keywords: Deep Learning for Visual Perception, Visual Learning, Simulation and Animation
Abstract: Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method unquestionably as the current state-of-the-art (SOTA). The code can be found at bit.ly/v2ce.
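For orientation, the classic baseline that video-to-event converters build on fires an event whenever the log intensity at a pixel changes by more than a contrast threshold. A toy sketch of that baseline model (V2CE's learned pipeline is considerably more elaborate; this only illustrates the underlying idea):

```python
import numpy as np

def frames_to_events(frames, timestamps, c_thresh=0.2, eps=1e-3):
    """Threshold model for video-to-event conversion.

    frames:     (T, H, W) grayscale video
    timestamps: (T,) frame times
    Emits (x, y, t, polarity) tuples whenever the log intensity at a
    pixel changes by at least c_thresh since its last event.
    """
    events = []
    log_ref = np.log(frames[0].astype(np.float64) + eps)
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_cur = np.log(frame.astype(np.float64) + eps)
        diff = log_cur - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= c_thresh)
        for x, y in zip(xs, ys):
            events.append((x, y, t, 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_cur[y, x]  # reset reference at fired pixels
    return events
```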
|
|
ThAT6-CC Oral Session, CC-414 |
Add to My Program |
Computer Vision for Automation |
|
|
Chair: Bennewitz, Maren | University of Bonn |
Co-Chair: Yamakawa, Yuji | The University of Tokyo |
|
10:30-12:00, Paper ThAT6-CC.1 | Add to My Program |
Physically Grounded Vision-Language Models for Robotic Manipulation |
|
Gao, Jensen | Stanford University |
Sarkar, Bidipta | Stanford University |
Xia, Fei | Google Inc |
Xiao, Ted | Google |
Wu, Jiajun | Stanford University |
Ichter, Brian | Google Brain |
Majumdar, Anirudha | Princeton University |
Sadigh, Dorsa | Stanford University |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Big Data in Robotics and Automation
Abstract: Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world, particularly within domains such as robotic manipulation. However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects. To address this limitation, we propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects. We demonstrate that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance. We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically grounded VLMs. We additionally illustrate the benefits of our physically grounded VLM on a real robot, where it improves task success rates. We release our dataset and provide further details and visualizations of our results at https://iliad.stanford.edu/pg-vlm/.
|
|
10:30-12:00, Paper ThAT6-CC.2 | Add to My Program |
How Many Views Are Needed to Reconstruct an Unknown Object Using NeRF? |
|
Pan, Sicong | University of Bonn |
Jin, Liren | University of Bonn |
Hu, Hao | Fudan University |
Popovic, Marija | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Neural Radiance Fields (NeRFs) are gaining significant interest for online active object reconstruction due to their exceptional memory efficiency and requirement for only posed RGB inputs. Previous NeRF-based view planning methods exhibit computational inefficiency since they rely on an iterative paradigm, consisting of (1) retraining the NeRF when new images arrive; and (2) planning a path to the next best view only. To address these limitations, we propose a non-iterative pipeline based on the Prediction of the Required number of Views (PRV). The key idea behind our approach is that the required number of views to reconstruct an object depends on its complexity. Therefore, we design a deep neural network, named PRVNet, to predict the required number of views, allowing us to tailor the data acquisition based on the object complexity and plan a globally shortest path. To train our PRVNet, we generate supervision labels using the ShapeNet dataset. Simulated experiments show that our PRV-based view planning method outperforms baselines, achieving good reconstruction quality while significantly reducing movement cost and planning time. We further justify the generalization ability of our approach in a real-world experiment.
|
|
10:30-12:00, Paper ThAT6-CC.3 | Add to My Program |
Active Implicit Reconstruction Using One-Shot View Planning |
|
Hu, Hao | Fudan University |
Pan, Sicong | University of Bonn |
Jin, Liren | University of Bonn |
Popovic, Marija | University of Bonn |
Bennewitz, Maren | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Active object reconstruction using autonomous robots is gaining great interest. A primary goal in this task is to maximize the information about the object to be reconstructed, given limited on-board resources. Previous view planning methods exhibit inefficiency since they rely on an iterative paradigm based on explicit representations, consisting of (1) planning a path to the next-best view only; and (2) requiring a considerable number of less-gain views in terms of surface coverage. To address these limitations, we propose to integrate implicit representations into One-Shot View Planning (OSVP). The key idea behind our approach is to use implicit representations to obtain the small missing surface areas instead of observing them with extra views. Therefore, we design a deep neural network, named OSVP, to directly predict a set of views given a dense point cloud refined from an initial sparse observation. To train our OSVP network, we generate supervision labels using dense point clouds refined by implicit representations and set-covering optimization problems. Simulated experiments show that our method achieves sufficient reconstruction quality, outperforming several baselines under limited view and movement budgets. We further demonstrate the applicability of our approach in a real-world object reconstruction scenario.
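The set-covering optimization used for label generation is commonly approximated greedily: repeatedly pick the view that covers the most still-uncovered surface. A minimal sketch with a hypothetical data layout (not the authors' solver):

```python
def greedy_view_cover(view_coverage, surface_ids):
    """Greedy approximation to set-covering view selection.

    view_coverage: dict mapping view id -> set of surface patch ids it sees
    surface_ids:   set of all surface patches to cover
    Returns a small list of views whose union covers every patch it can.
    """
    uncovered = set(surface_ids)
    chosen = []
    while uncovered:
        # pick the view covering the most still-uncovered patches
        best = max(view_coverage, key=lambda v: len(view_coverage[v] & uncovered))
        gain = view_coverage[best] & uncovered
        if not gain:          # nothing coverable remains; stop
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# toy usage: 3 candidate views over 5 surface patches
views = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5}}
print(greedy_view_cover(views, {1, 2, 3, 4, 5}))  # -> [0, 2]
```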
|
|
10:30-12:00, Paper ThAT6-CC.4 | Add to My Program |
Multimodal Object Query Initialization for 3D Object Detection |
|
van Geerenstein, Mathijs Ruben | Delft University of Technology |
Ruppel, Felicia | Bosch Research and Ulm University |
Dietmayer, Klaus | University of Ulm |
Gavrila, Dariu | Delft University of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Sensor Fusion
Abstract: 3D object detection models that exploit both LiDAR and camera sensor features are top performers in large-scale autonomous driving benchmarks. A transformer is a popular network architecture used for this task, in which so-called object queries act as candidate objects. Initializing these object queries based on current sensor inputs is a common practice. However, existing methods rely strongly on LiDAR data for this and do not fully exploit image features; besides, they introduce significant latency. To overcome these limitations we propose EfficientQ3M, an efficient, modular, and multimodal solution for object query initialization for transformer-based 3D object detection models. The proposed initialization method is combined with a "modality-balanced" transformer decoder where the queries can access all sensor modalities throughout the decoder. In experiments, we outperform the state of the art in transformer-based LiDAR object detection on the competitive nuScenes benchmark and showcase the benefits of input-dependent multimodal query initialization, while being more efficient than the available alternatives for LiDAR-camera initialization. The proposed method can be applied with any combination of sensor modalities as input, demonstrating its modularity.
|
|
10:30-12:00, Paper ThAT6-CC.5 | Add to My Program |
SM^3: Self-Supervised Multi-Task Modeling with Multi-View 2D Images for Articulated Objects |
|
Wang, Haowen | Beijing University of Posts and Telecommunications |
Zhao, Zhen | Midea Group |
Jin, Zhao | Midea Group |
Che, Zhengping | Midea Group |
Qiao, Liang | Beijing University of Posts and Telecommunications |
Huang, Yakun | Beijing University of Posts and Telecommunications |
Fan, Zhipeng | Beijing University of Posts and Telecommunications |
Qiao, XiuQuan | Beijing University of Posts and Telecommunications |
Tang, Jian | Midea Group (Shanghai) Co., Ltd |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Simulation and Animation
Abstract: Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics. Previous research has predominantly focused on supervised approaches, relying on extensively annotated datasets to model articulated objects within limited categories. However, this approach falls short of effectively addressing the diversity present in the real world. To tackle this issue, we propose a self-supervised interaction perception method, referred to as SM3, which leverages multi-view RGB images captured before and after interaction to model articulated objects, identify movable parts, and infer the parameters of their rotating joints. By constructing 3D geometries and textures from the captured 2D images, SM3 achieves integrated optimization of movable parts and joint parameters during the reconstruction process, obviating the need for annotations. Furthermore, we introduce the MMArt dataset, an extension of PartNet-Mobility, encompassing multi-view and multi-modal data of articulated objects spanning diverse categories. Evaluations demonstrate that SM3 surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been duly validated.
|
|
10:30-12:00, Paper ThAT6-CC.6 | Add to My Program |
MF-MOS: A Motion-Focused Model for Moving Object Segmentation |
|
Cheng, Jintao | South China Normal University |
Zeng, Kang | South China Normal University |
Huang, Zhuoxu | Aberystwyth University |
Tang, Xiaoyu | South China Normal University |
Jin, Wu | UESTC |
Zhang, Chengxi | Jiangnan University |
Chen, Xieyuanli | National University of Defense Technology |
Fan, Rui | Tongji University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, SLAM
Abstract: Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants and is thus of great interest in the autonomous driving field. Dynamic capture is always critical in the MOS problem. While previous methods capture motion features from range images directly, we argue that residual maps provide greater potential for motion information, while range images contain rich semantic guidance. Based on this intuition, we propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation. As a novel contribution, we decouple the spatial-temporal information by capturing motion from residual maps and generating semantic features from range images, which serve as movable-object guidance for the motion branch. Our straightforward yet distinctive solution makes the most of both range images and residual maps, thus greatly improving the performance of the LiDAR-based MOS task. Remarkably, our MF-MOS achieved a leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset upon submission, demonstrating state-of-the-art performance. The implementation of our MF-MOS has been released at https://github.com/SCNU-RISLAB/MF-MOS.
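The residual maps feeding the motion branch are, in essence, normalized range differences between the current scan and past scans that have been ego-motion-compensated into the current frame. A minimal sketch of that computation (illustrative; pose compensation is assumed done upstream):

```python
import numpy as np

def residual_maps(range_images, num_residuals=3, eps=1e-6):
    """Build residual maps from a sequence of aligned range images.

    range_images: (T, H, W) array; range_images[-1] is the current scan,
    earlier entries are past scans reprojected into the current frame.
    Large normalized range differences highlight moving objects.
    """
    current = range_images[-1]
    valid_cur = current > 0
    residuals = []
    for k in range(1, num_residuals + 1):
        past = range_images[-1 - k]
        valid = valid_cur & (past > 0)       # both pixels must hold returns
        res = np.zeros_like(current)
        res[valid] = np.abs(current[valid] - past[valid]) / (current[valid] + eps)
        residuals.append(res)
    return np.stack(residuals)               # (num_residuals, H, W)
```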
|
|
10:30-12:00, Paper ThAT6-CC.7 | Add to My Program |
Improving Neural Indoor Surface Reconstruction with Mask-Guided Adaptive Consistency Constraints |
|
Yu, Xinyi | Zhejiang University of Technology |
Lu, Liqin | Zhejiang University of Technology |
Rong, Jintao | Zhejiang University of Technology |
Xu, Guangkai | Zhejiang University |
Ou, Linlin | Zhejiang University of Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: 3D scene reconstruction from 2D images has been a long-standing task. Instead of estimating per-frame depth maps and fusing them in 3D, recent research leverages the neural implicit surface as a unified representation for 3D reconstruction. Equipped with data-driven pre-trained geometric cues, these methods have demonstrated promising performance. However, inaccurate prior estimation, which is usually inevitable, can lead to suboptimal reconstruction quality, particularly in some geometrically complex regions. In this paper, we propose a two-stage training process, decouple view-dependent and view-independent colors, and leverage two novel consistency constraints to enhance detail reconstruction performance without requiring extra priors. Additionally, we introduce an essential mask scheme to adaptively influence the selection of supervision constraints, thereby improving performance in a self-supervised paradigm. Experiments on synthetic and real-world datasets show the capability of reducing the interference from prior estimation errors and achieving high-quality scene reconstruction with rich geometric details.
|
|
10:30-12:00, Paper ThAT6-CC.8 | Add to My Program |
NFL: Normal Field Learning for 6-DoF Grasping of Transparent Objects |
|
Lee, Junho | Seoul National University |
Kim, Sang Min | Seoul National University |
Lee, Yonghyeon | Korea Institute for Advanced Study |
Kim, Young Min | Seoul National University |
Keywords: Deep Learning for Visual Perception, Deep Learning in Grasping and Manipulation, Grasping
Abstract: We present Normal Field Learning (NFL), a robust yet practical solution to perceive 3D layouts of transparent objects and grasp them quickly. Conventional input modalities for vision-based grasping do not provide sufficient information for transparent objects. However, with recent advances in datasets and algorithms for transparent objects, we can at least obtain noisy estimates of normals and masks under various real-world conditions. Instead of directly using the RGB images, we propose to use these estimates to train a neural volume, which serves as an intermediate representation agnostic to challenging appearance variations. We formulate the training objective to account for inherent uncertainty in individual estimates, and together with the volumetric aggregation, we can reliably extract useful geometric information for grasping. Our neural volume deploys a voxel-grid based representation, motivated by acceleration techniques for neural radiance fields. However, we directly store the normal and density values in the grid cells instead of latent features. Our modification allows direct access to the geometric values without additional inference or volume rendering, further enhancing efficiency. Our results show over 85% success rates for grasping in cluttered scenes with only 40 seconds of training time.
|
|
10:30-12:00, Paper ThAT6-CC.9 | Add to My Program |
TRTM: Template-Based Reconstruction and Target-Oriented Manipulation of Crumpled Cloths |
|
Wang, Wenbo | ETH Zurich |
Li, Gen | ETH Zurich |
Zamora Mora, Miguel Angel | ETH Zurich |
Coros, Stelian | ETH Zurich |
Keywords: Deep Learning for Visual Perception, Perception for Grasping and Manipulation, Data Sets for Robotic Vision
Abstract: Precise reconstruction and manipulation of crumpled cloths is challenging due to the high dimensionality of cloth models and the limited observation of self-occluded regions. We leverage recent progress in single-view human reconstruction to reconstruct crumpled cloths in a template-based manner from only their top-view depth observations, using our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstructed mesh explicitly describes the positions and visibilities of all cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulation. Experiments demonstrate that our TRTM system can be applied to daily cloths that share the topology of our template mesh but differ in shape, size, pattern, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: https://wenbwa.github.io/TRTM/.
|
|
ThAT7-CC Oral Session, CC-416 |
Add to My Program |
Model Learning for Control |
|
|
Chair: Shin, Hyo-Sang | Cranfield University |
Co-Chair: Tilbury, Dawn | University of Michigan |
|
10:30-12:00, Paper ThAT7-CC.1 | Add to My Program |
Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control |
|
Saviolo, Alessandro | New York University |
Frey, Jonathan | University of Freiburg |
Rathod, Abhishek | University of Idaho |
Diehl, Moritz | Univ. of Heidelberg |
Loianno, Giuseppe | New York University |
Keywords: Model Learning for Control, Aerial Systems: Mechanics and Control, Learning and Adaptive Systems, Optimization and Optimal Control
Abstract: Model-based control requires an accurate model of the system dynamics for precisely and safely controlling the robot in complex and dynamic environments. Moreover, in the presence of variations in the operating conditions, the model should be continuously refined to compensate for dynamics changes. In this paper, we present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems. We combine offline learning from past experience and online learning from current robot interaction with the unknown environment. These two ingredients enable a highly sample-efficient and adaptive learning process, capable of accurately inferring model dynamics in real-time even in operating regimes that greatly differ from the training distribution. Moreover, we design an uncertainty-aware model predictive controller that is heuristically conditioned on the aleatoric (data) uncertainty of the learned dynamics. This controller actively chooses the optimal control actions that (i) optimize the control performance and (ii) improve the efficiency of online learning sample collection. We demonstrate the effectiveness of our method through a series of challenging experiments.
|
|
10:30-12:00, Paper ThAT7-CC.2 | Add to My Program |
SculptBot: Pre-Trained Models for 3D Deformable Object Manipulation |
|
Bartsch, Alison | Carnegie Mellon University |
Avra, Charlotte | Carnegie Mellon University |
Barati Farimani, Amir | Carnegie Mellon University |
Keywords: Model Learning for Control, AI-Based Methods, Dexterous Manipulation
Abstract: Deformable object manipulation presents a unique set of challenges in robotics, as such objects exhibit high degrees of freedom and severe self-occlusion. State representation for materials that exhibit plastic behavior, like modeling clay or bread dough, is also difficult because they permanently deform under stress and are constantly changing shape. In this work, we investigate each of these challenges using the task of robotic sculpting with a parallel gripper. We propose a system that uses point clouds as the state representation and leverages a pre-trained point cloud reconstruction Transformer to learn a latent dynamics model to predict material deformations given a grasp action. We design a novel action sampling algorithm that reasons about geometrical differences between point clouds to further improve the efficiency of model-based planners. All data and experiments are conducted entirely in the real world. Our experiments show the proposed system is able to successfully capture the dynamics of clay and create a variety of simple shapes. Videos and additional figures are available on our project page at: https://sites.google.com/andrew.cmu.edu/sculptbot
|
|
10:30-12:00, Paper ThAT7-CC.3 | Add to My Program |
Learning-Based Model Predictive Control for an Autonomous Formula Student Racing Car |
|
Gomes, David | Instituto Superior Técnico, University of Lisbon |
Botto, Miguel | DEM/IST |
Lima, Pedro U. | Instituto Superior Técnico - Institute for Systems and Robotics |
Keywords: Model Learning for Control, Autonomous Agents, Machine Learning for Robot Control
Abstract: Advancements in Automated Driving Systems (ADSs) have enabled a certain level of autonomy while commuting in a car. However, emergency and high-speed maneuvers still pose significant challenges for ADSs due to the intrinsic nonlinearity and fast-paced behavior of such events. These maneuvers are a distinctive feature within the recently established motorsport discipline of Autonomous Racing (AR). In this work, we explore the use of Learning-based Model Predictive Control (LMPC) to address possible mismatches of the first-principles model in high-speed racing. To this end, a Model Predictive Contouring Control (MPCC) scheme (a specific formulation of the standard Model Predictive Control, MPC) is formulated, and a Neural Network (NN) that leverages feedforward and recurrent layers is employed to learn the errors of the first-principles model. Combining the NN with the first-principles model yields the LMPC, capable of accurately predicting the future with a computational effort compatible with real-time feasibility, effectively handling the vehicle at its limits. Furthermore, the controller can adapt to changing environments by training the NN during the race. The MPCC (the formulation without the NN) is deployed on a real autonomous Formula Student racing car, showing a 16% improvement in mean lap times on the same track compared to a common geometric controller. The LMPC is analyzed in a high-fidelity simulator, achieving an 8.9% improvement in mean lap times compared to the MPCC.
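The combination of a first-principles model with a learned error term can be pictured as a residual network wrapped around a nominal dynamics step. A minimal sketch, substituting a plain MLP for the paper's feedforward-recurrent network (all names are illustrative):

```python
import torch
import torch.nn as nn

class HybridDynamics(nn.Module):
    """Nominal physics step plus a learned residual correction."""

    def __init__(self, physics_step, state_dim, ctrl_dim, hidden=64):
        super().__init__()
        self.physics_step = physics_step      # x_{k+1} = f(x_k, u_k)
        self.residual = nn.Sequential(        # learns only the model mismatch
            nn.Linear(state_dim + ctrl_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, u):
        return self.physics_step(x, u) + self.residual(torch.cat([x, u], dim=-1))

# toy nominal model: a double integrator with dt = 0.05
def double_integrator(x, u, dt=0.05):
    pos, vel = x[..., :1], x[..., 1:]
    return torch.cat([pos + vel * dt, vel + u * dt], dim=-1)

model = HybridDynamics(double_integrator, state_dim=2, ctrl_dim=1)
x_next = model(torch.zeros(1, 2), torch.ones(1, 1))
```

Learning only the residual is usually much easier than learning the full dynamics, since the network never has to reproduce the parts of the behavior that first principles already capture.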
|
|
10:30-12:00, Paper ThAT7-CC.4 | Add to My Program |
Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving with Model Predictive Path Integral Control |
|
Lee, Hojin | Agency for Defense Development |
Kim, Taekyung | Agency for Defense Development |
Mun, Jungwi | Agency for Defense Development |
Lee, Wonsuk | Agency for Defense Development |
Keywords: Model Learning for Control, Autonomous Vehicle Navigation, Field Robots
Abstract: High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning terrain-aware kinodynamic model which is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground truth force data during training. This enables the design of a safe and robust model predictive controller through appropriate cost function design which penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure.
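Model Predictive Path Integral (MPPI) control updates a nominal control sequence with an exponentially weighted average of sampled perturbations, each rolled out through the (here, learned) dynamics model. A minimal single-iteration sketch (generic MPPI; the paper's cost additionally penalizes unstable motion, unsafe interactions, and model uncertainty):

```python
import numpy as np

def mppi_step(x0, dynamics, cost, u_nominal,
              num_samples=256, noise_std=0.5, lam=1.0, seed=0):
    """One MPPI update of the control sequence.

    x0:        initial state
    dynamics:  f(x, u) -> next state (e.g. a learned kinodynamic model)
    cost:      c(x, u) -> scalar running cost
    u_nominal: (H, du) nominal control sequence to perturb
    """
    rng = np.random.default_rng(seed)
    H, du = u_nominal.shape
    noise = rng.normal(0.0, noise_std, size=(num_samples, H, du))
    total_cost = np.zeros(num_samples)
    for k in range(num_samples):              # roll out each perturbed sequence
        x = x0
        for t in range(H):
            u = u_nominal[t] + noise[k, t]
            total_cost[k] += cost(x, u)
            x = dynamics(x, u)
    beta = total_cost.min()                   # shift for numerical stability
    weights = np.exp(-(total_cost - beta) / lam)
    weights /= weights.sum()
    # exponentially weighted average of the sampled perturbations
    return u_nominal + np.einsum("k,khd->hd", weights, noise)
```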
|
|
10:30-12:00, Paper ThAT7-CC.5 | Add to My Program |
Adaptive Gait Modeling and Optimization for Principally Kinematic Systems |
|
Deng, Siming | Johns Hopkins University |
Cowan, Noah J. | Johns Hopkins University |
Bittner, Brian | JHUAPL |
Keywords: Model Learning for Control, Calibration and Identification, Optimization and Optimal Control
Abstract: Robotic adaptation to unanticipated operating conditions is crucial to achieving persistence and robustness in complex real world settings. For a wide range of cutting-edge robotic systems, such as micro- and nano-scale robots, soft robots, medical robots, and bio-hybrid robots, it is infeasible to anticipate the operating environment a priori due to complexities that arise from numerous factors including imprecision in manufacturing, chemo-mechanical forces, and poorly understood contact mechanics. Drawing inspiration from data-driven modeling, geometric mechanics (or gauge theory), and adaptive control, we employ an adaptive system identification framework and demonstrate its efficacy in enhancing the performance of principally kinematic locomotors (those governed by Rayleigh dissipation or zero momentum conservation). We showcase the capability of the adaptive model to efficiently accommodate varying terrains and iteratively modified behaviors within a behavior optimization framework. This provides both the ability to improve fundamental behaviors and perform motion tracking to precision. Notably, we are capable of optimizing the gaits of the Purcell swimmer using approximately 10 cycles per link, which for the nine-link Purcell swimmer provides a factor of ten improvement in optimization speed over the state of the art. Beyond simply a computational speed up, this ten-fold improvement may enable this method to be successfully deployed for in-situ behavior refinement, injury recovery, and terrain adaptation, particularly in domains where simulations provide poor guides for the real world.
|
|
10:30-12:00, Paper ThAT7-CC.6 | Add to My Program |
Recursive Least Squares with Log-Determinant Divergence Regularisation for Online Inertia Identification |
|
Cho, Namhoon | Cranfield University |
Lee, Taeyoon | Naver Labs |
Shin, Hyo-Sang | Cranfield University |
Keywords: Model Learning for Control, Calibration and Identification, Robust/Adaptive Control
Abstract: This study presents a recursive algorithm for solving the regularised least squares problem for online identification of rigid body dynamic model parameters, with emphasis on the physical consistency of the estimated inertial parameters. One geometric approach is to use a regulariser that represents how close the pseudo-inertia matrix is to a given reference on the feasible manifold in the regression problem. The proposed extension enables memory-efficient online learning in addition to the benefits of geometry-aware convex regularisation using the log-determinant divergence of the pseudo-inertia matrix. Also, the recursive version endows the estimator with the capability to deal with time-variation of parameters by introducing an optional forgetting mechanism. The characteristics of the recursive regularised least squares algorithm are demonstrated using the MIT Cheetah 3 leg swinging experiment dataset and compared to the existing batch optimisation method.
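The recursive backbone of such an estimator is ordinary recursive least squares with exponential forgetting; the paper's contribution layers a log-determinant-divergence regulariser on top to keep the implied pseudo-inertia matrix physically consistent. A sketch of just that backbone (the regularisation itself is omitted):

```python
import numpy as np

class RecursiveLeastSquares:
    """Plain RLS with forgetting for y_k = phi_k^T theta + noise."""

    def __init__(self, dim, forgetting=0.99, p0=1e3):
        self.theta = np.zeros(dim)       # parameter estimate
        self.P = np.eye(dim) * p0        # (scaled) inverse information matrix
        self.lam = forgetting            # lam < 1 discounts old data

    def update(self, phi, y):
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)                 # Kalman-style gain
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam   # covariance update
        return self.theta
```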
|
|
10:30-12:00, Paper ThAT7-CC.7 | Add to My Program |
Sequential Manipulation of Deformable Linear Object Networks with Endpoint Pose Measurements Using Adaptive Model Predictive Control |
|
Toner, Tyler | University of Michigan |
Molazadeh, Vahidreza | Carnegie Mellon University |
Saez, Miguel | General Motors |
Tilbury, Dawn | University of Michigan |
Barton, Kira | University of Michigan at Ann Arbor |
Keywords: Model Learning for Control, Compliant Assembly, Robust/Adaptive Control
Abstract: Robotic manipulation of deformable linear objects (DLOs) is an active area of research, though emerging applications, like automotive wire harness installation, introduce constraints that have not been considered in prior work. Confined workspaces and limited visibility complicate prior assumptions of multi-robot manipulation and direct measurement of DLO configuration (state). This work focuses on single-arm manipulation of stiff DLOs (StDLOs) connected to form a DLO network (DLON), for which the measurements (output) are the endpoint poses of the DLON, which are subject to unknown dynamics during manipulation. To demonstrate feasibility of output-based control without state estimation, direct input-output dynamics are shown to exist by training neural network models on simulated trajectories. Output dynamics are then approximated with polynomials and found to contain well-known rigid body dynamics terms. A composite model consisting of a rigid body model and an online data-driven residual is developed, which predicts output dynamics more accurately than either model alone, and without prior experience with the system. An adaptive model predictive controller is developed with the composite model for DLON manipulation, which completes DLON installation tasks, both in simulation and with a physical automotive wire harness.
|
|
10:30-12:00, Paper ThAT7-CC.8 | Add to My Program |
Physics-Informed Neural Network for Multirotor Slung Load Systems Modeling |
|
Serrano, Gil | Instituto Superior Técnico, University of Lisbon |
Jacinto, Marcelo | Instituto Superior Técnico |
Ribeiro-Gomes, Jose | Instituto Superior Tecnico, University of Lisbon |
Pinto, Joao | Instituto Superior Tecnico, Universidade De Lisboa |
Guerreiro, Bruno J. N. | NOVA School of Science and Technology |
Bernardino, Alexandre | IST - Técnico Lisboa |
Cunha, Rita | Instituto Superior Tecnico |
Keywords: Model Learning for Control, Deep Learning Methods, Aerial Systems: Applications
Abstract: Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not densely represented in the training data. In this work, we explore the use of physics-informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of future system states. An LSTM encoder-decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles together with slack variables that allow for a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first-principles physical model and a comparable neural network model trained without the proposed physics regularization.
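A physics-based loss term of this kind typically sums a data-fit term, a penalty on the discretized physics residual, and a penalty keeping the slack variables small. A schematic sketch (all names and weights are illustrative assumptions, not the paper's exact formulation):

```python
import torch

def physics_informed_loss(pred_states, target_states, dyn_residual, slack,
                          w_phys=1.0, w_slack=0.1):
    """Composite loss: data fit + discretized-physics residual + slack penalty.

    pred_states, target_states: (B, T, n) predicted / measured trajectories
    dyn_residual: callable returning the violation of a discretized
                  first-principles model over the predictions, (B, T-1, n)
    slack: learnable slack variables (same shape as the residual) allowing
           a small, penalized mismatch between model and prediction
    """
    data_loss = torch.mean((pred_states - target_states) ** 2)
    phys_loss = torch.mean((dyn_residual(pred_states) - slack) ** 2)
    slack_loss = torch.mean(slack ** 2)      # keep the permitted mismatch small
    return data_loss + w_phys * phys_loss + w_slack * slack_loss
```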
|
|
10:30-12:00, Paper ThAT7-CC.9 | Add to My Program |
A Probabilistic Motion Model for Skid-Steer Wheeled Mobile Robot Navigation on Off-Road Terrains |
|
Trivedi, Ananya | Northeastern University |
Zolotas, Mark | Northeastern University |
Abbas, Adeeb | Northeastern University |
Prajapati, Sarvesh | Northeastern University |
Bazzi, Salah | Northeastern University |
Padir, Taskin | Northeastern University |
Keywords: Model Learning for Control, Dynamics, Machine Learning for Robot Control
Abstract: Skid-Steer Wheeled Mobile Robots (SSWMRs) are increasingly being used for off-road autonomy applications. When turning at high speeds, these robots tend to undergo significant skidding and slipping. In this work, using Gaussian Process Regression (GPR) and Sigma-Point Transforms, we estimate the non-linear effects of tire-terrain interaction on robot velocities in a probabilistic fashion. Using the mean estimates from GPR, we propose a data-driven dynamic motion model that is more accurate at predicting future robot poses than conventional kinematic motion models. By efficiently solving a convex optimization problem based on the history of past robot motion, the GPR augmented motion model generalizes to previously unseen terrain conditions. The output distribution from the proposed motion model can be used for local motion planning approaches, such as stochastic model predictive control, leveraging model uncertainty to make safe decisions. We validate our work on a benchmark real-world multi-terrain SSWMR dataset. Our results show that the model generalizes to three different terrains while significantly reducing errors in linear and angular motion predictions. As shown in the attached video, we perform a separate set of experiments on a physical robot to demonstrate the robustness of the proposed algorithm.
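Gaussian Process Regression over commanded velocities is a standard way to capture skid and slip probabilistically, with the predictive variance available for downstream risk-aware planning. A minimal sketch with scikit-learn on synthetic stand-in data (the paper's inputs, outputs, and kernel choices may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# synthetic stand-in data: commanded (v, omega) -> measured velocity error.
# A real model would be fit on logged robot runs over each terrain.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -2.0], [2.0, 2.0], size=(200, 2))  # (v_cmd, w_cmd)
y = 0.1 * X[:, 0] * np.abs(X[:, 1]) + 0.01 * rng.normal(size=200)

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4),
    normalize_y=True,
).fit(X, y)

# mean and std of the predicted slip for a new command; the std can feed
# a stochastic MPC that trades speed against model uncertainty
mean, std = gpr.predict(np.array([[1.5, 1.0]]), return_std=True)
print(mean, std)
```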
|
|
ThAT8-CC Oral Session, CC-418 |
Add to My Program |
Learning in Field Robotics |
|
|
Chair: Gao, Zhi | Temasek Laboratories @ NUS |
Co-Chair: Sugiura, Hisashi | Yanmar Co., Ltd |
|
10:30-12:00, Paper ThAT8-CC.1 | Add to My Program |
TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks |
|
Sivaprakasam, Matthew | Carnegie Mellon University |
Maheshwari, Parv | Indian Institute of Technology Kharagpur |
Guaman Castro, Mateo | Carnegie Mellon University |
Triest, Samuel | Carnegie Mellon University |
Nye, Micah | University of Pittsburgh |
Willits, Steven | Carnegie Mellon University |
Saba, Andrew | Carnegie Mellon University |
Wang, Wenshan | Carnegie Mellon University |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Data Sets for Robot Learning, Field Robots, Big Data in Robotics and Automation
Abstract: We present TartanDrive 2.0, a large-scale off-road driving dataset for self-supervised learning tasks. In 2021 we released TartanDrive 1.0, which is one of the largest datasets for off-road terrain. As a follow-up to our original dataset, we collected seven hours of data at speeds of up to 15 m/s with the addition of three new LiDAR sensors alongside the original camera, inertial, GPS, and proprioceptive sensors. We also release the tools we use for collecting, processing, and querying the data, including our metadata system designed to further the utility of our data. Custom infrastructure allows end users to reconfigure the data to cater to their own platforms. These tools and infrastructure alongside the dataset are useful for a variety of tasks in the field of off-road autonomy and, by releasing them, we encourage collaborative data aggregation. These resources lower the barrier to entry to utilizing large-scale datasets, thereby helping facilitate the advancement of robotics in areas such as self-supervised learning, multi-modal perception, inverse reinforcement learning, and representation learning. The dataset is available at https://theAirLab.org/TartanDrive2.
|
|
10:30-12:00, Paper ThAT8-CC.2 | Add to My Program |
EnYOLO: A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement |
|
Wen, Junjie | The Chinese University of Hong Kong |
Cui, Jinqiang | Peng Cheng Laboratory |
Zhao, Benyun | The Chinese University of Hong Kong |
Han, Bingxin | The Chinese University of Hong Kong |
Liu, Xuchen | The Chinese University of Hong Kong |
Gao, Zhi | Temasek Laboratories @ NUS |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Deep Learning Methods, Marine Robotics, Object Detection, Segmentation and Categorization
Abstract: In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) The process of enhancing images prior to training object detectors may not necessarily yield performance improvements. (3) The complex underwater environments can induce significant domain shifts across different scenarios, seriously deteriorating the UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capabilities. Specifically, both the UIE and UOD task heads share the same network backbone and utilize a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing the performance of both functions. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
|
|
10:30-12:00, Paper ThAT8-CC.3 | Add to My Program |
GS-PKNN: An Efficient and High-Fidelity Mobility Prediction Method for Unmanned Ground Vehicles |
|
Hua, Chen | University of Science and Technology of China |
Jiang, Chunmao | University of Science and Technology of China |
Niu, Runxin | Hefei Institutes of Physical Science, Chinese Academy of Science |
Yu, Biao | Hefei Institutes of Physical Science, Chinese Academy of Science |
Zhu, Hui | Hefei Institutes of Physical Science, Chinese Academy of Science |
Li, Bichun | Hefei Institutes of Physical Science, Chinese Academy of Science |
Keywords: Field Robots, Deep Learning Methods, Simulation and Animation
Abstract: To avoid unmanned ground vehicles being obstructed by deformable terrain in off-road environments, effective vehicle mobility analysis is required. However, the computational complexity of existing mobility analysis methods, such as discrete element analysis, poses significant challenges when applied to large-scale terrains. To address this problem, we propose an efficient and high-fidelity vehicle mobility prediction method for large-scale terrain. Initially, precise terrain models are constructed employing Gaussian sampling (GS), thereby serving as optimal inputs for the mobility simulation. Subsequently, we introduce a co-simulation method based on a multi-body dynamics model and discrete element analysis to obtain high-fidelity vehicle mobility data on the sampled terrains. Following that, the mobility data is utilized to train a PSO-kriging neural network (PKNN), enabling accurate predictions of the global mobility map. Through rigorous simulation experiments, the proposed method (GS-PKNN) demonstrates its remarkable effectiveness.
|
|
10:30-12:00, Paper ThAT8-CC.4 | Add to My Program |
UNRealNet: Learning Uncertainty-Aware Navigation Features from High-Fidelity Scans of Real Environments |
|
Triest, Samuel | Carnegie Mellon University |
Fan, David D. | Jet Propulsion Laboratory, California Institute of Technology, P |
Scherer, Sebastian | Carnegie Mellon University |
Agha-mohammadi, Ali-akbar | NASA-JPL, Caltech |
Keywords: Field Robots, Legged Robots, Deep Learning Methods
Abstract: Traversability estimation in rugged, unstructured environments remains a challenging problem in field robotics. Often, the need for precise, accurate traversability estimation is in direct opposition to the limited sensing and compute capability present on affordable, small-scale mobile robots. To address this issue, we present a novel method to learn [u]ncertainty-aware [n]avigation features from high-fidelity scans of [real]-world environments (UNRealNet). This network can be deployed on-robot to predict these high-fidelity features using input from lower-quality sensors. UNRealNet predicts dense, metric-space features directly from single-frame lidar scans, thus reducing the effects of occlusion and odometry error. Our approach is label-free, and is able to produce traversability estimates that are robot-agnostic. Additionally, we can leverage UNRealNet’s predictive uncertainty to both produce risk-aware traversability estimates, and refine our feature predictions over time. We find that our method outperforms traditional local mapping and inpainting baselines by up to 40%, and demonstrate its efficacy on multiple legged platforms.
|
|
10:30-12:00, Paper ThAT8-CC.5 | Add to My Program |
Robot-Dependent Traversability Estimation for Outdoor Environments Using Deep Multimodal Variational Autoencoders |
|
Eder, Matthias | Graz University of Technology |
Steinbauer-Wagner, Gerald | Graz University of Technology |
Keywords: Field Robots, Motion and Path Planning, Data Sets for Robot Learning
Abstract: Efficient and reliable navigation in off-road environments poses a significant challenge for robotics, especially when factoring in the varying capabilities of robots across different terrains. To achieve this, the robot system's traversability is usually estimated to plan traversable routes through an environment. This paper presents a new approach that utilizes Deep Multimodal Variational Autoencoders (DMVAEs) for estimating the traversability of different robots in complex off-road terrains. Our method utilizes DMVAEs to capture essential environmental information and robot properties, effectively modeling factors that influence robotic traversability. The key contribution of this research is a two-stage traversability estimation framework for various robots in diverse off-road conditions that integrates robot properties in addition to environmental information to predict the traversability for various robots in a single model. We validate our method through real-world experiments involving four ground robots navigating an alpine environment. Comparative evaluations against state-of-the-art traversability estimation methods demonstrate the superior accuracy and robustness of our approach. Additionally, we investigate the transfer of trained models to new robots, enhancing their traversability estimation and extending the applicability of our framework.
|
|
10:30-12:00, Paper ThAT8-CC.6 | Add to My Program |
F3DMP: Foresighted 3D Motion Planning of Mobile Robots in Wild Environments |
|
Yang, Andong | Institute of Computing Technology, Chinese Academy of Sciences |
Li, Wei | Institute of Computing Technology, Chinese Academy of Sciences |
Hu, Yu | Institute of Computing Technology Chinese Academy of Sciences |
Keywords: Field Robots, Motion and Path Planning, Deep Learning Methods
Abstract: In wild environments, motion planning for mobile robots faces the challenge of locally optimal path traps due to limited sensor perception range and lack of spatial awareness. Existing approaches that avoid local optima by designing heuristic functions or high-quality global paths in wild environments are time-consuming and unstable. This work proposes F3DMP, which consists of two parts to mitigate locally optimal solutions and better utilize distant terrain information. First, the entire planning framework is adapted to three-dimensional space so that the planning result conforms to the geometric characteristics of the terrain. Second, a time allocation function based on offline reinforcement learning is proposed. This function can anticipate potential challenges or opportunities based on semantic information from the image and proactively determine a time allocation. Our planner is integrated into a complete mobile robot system and deployed on a real robot. Experiments in simulation and the real world demonstrate that our method can improve the success rate by 28% and the trajectory smoothness by 27% compared with traditional methods.
|
|
10:30-12:00, Paper ThAT8-CC.7 | Add to My Program |
MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts |
|
Xu, Zhuo | UC Berkeley |
Zhou, Rui | University of California, Berkeley |
Yin, Yida | University of California, Berkeley |
Gao, Huidong | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Li, Jiachen | University of California, Riverside |
Keywords: Intelligent Transportation Systems, Path Planning for Multiple Mobile Robots or Agents, Datasets for Human Motion
Abstract: Data-driven methods have great advantages in modeling complicated human behavioral dynamics and dealing with many human-robot interaction applications. However, collecting massive and annotated real-world human datasets has been a laborious task, especially for highly interactive scenarios. On the other hand, algorithmic data generation methods are usually limited by their model capacities, making them unable to offer realistic and diverse data needed by various application users. In this work, we study trajectory-level data generation for multi-human or human-robot interaction scenarios and propose a learning-based automatic trajectory generation model, which we call Multi-Agent TRajectory generation with dIverse conteXts (MATRIX). MATRIX is capable of generating interactive human behaviors in realistic diverse contexts. We achieve this goal by modeling the explicit and interpretable objectives so that MATRIX can generate human motions based on diverse destinations and heterogeneous behaviors. We carried out extensive comparison and ablation studies to illustrate the effectiveness of our approach across various metrics. We also presented experiments that demonstrate the capability of MATRIX to serve as data augmentation for imitation-based motion planning.
|
|
ThAT9-CC Oral Session, CC-419 |
Add to My Program |
Task Planning |
|
|
Chair: Tazaki, Yuichi | Kobe University |
Co-Chair: Manjanna, Sandeep | Plaksha University |
|
10:30-12:00, Paper ThAT9-CC.1 | Add to My Program |
Distributed Multi-Robot Online Sampling with Budget Constraints |
|
Shamshirgaran, Azin | University of California, Merced |
Manjanna, Sandeep | Plaksha University |
Carpin, Stefano | University of California, Merced |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination
Abstract: In multi-robot informative path planning, the problem is to find a route for each robot in a team to visit a set of locations that provides the most useful data for reconstructing an unknown scalar field. In the budgeted version, each robot is subject to a travel budget limiting the distance it can travel. Our interest in this problem is motivated by applications in precision agriculture, where robots are used to collect measurements to estimate domain-relevant scalar parameters such as soil moisture or nitrate concentrations. In this paper, we propose an online, distributed multi-robot sampling algorithm based on Monte Carlo Tree Search (MCTS) in which each robot iteratively selects its next sampling location through communication with the other robots and consideration of its remaining budget. We evaluate our proposed method for varying team sizes and in different environments, and we compare our solution with four baseline methods. Our experiments show that our solution outperforms the baselines when the budget is tight, collecting measurements that lead to smaller reconstruction errors.
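To make the budget-constrained MCTS selection above concrete, here is a minimal Python sketch; it is an illustration under assumptions rather than the authors' implementation (the `Node` interface with `children`, `untried`, `expand`, `parent`, `state`, `visits`, and `value`, as well as `travel_cost` and `info_gain`, are hypothetical):

```python
import math
import random

def ucb_select(node, c=1.4):
    """Pick the child maximizing the UCB1 score."""
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts_next_sample(root, budget_left, travel_cost, info_gain, n_iter=500):
    """One planning round: return the next sampling location.

    Children whose travel cost exceeds the remaining budget are never
    expanded, which is how the budget constraint enters the search.
    """
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and not node.untried:
            node = ucb_select(node)
        # Expansion: add one feasible (within-budget) child, if any.
        feasible = [a for a in node.untried
                    if travel_cost(node.state, a) <= budget_left]
        if feasible:
            node = node.expand(random.choice(feasible))
        # Evaluation + backpropagation of the information-gain reward.
        reward = info_gain(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state
```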
|
|
10:30-12:00, Paper ThAT9-CC.2 | Add to My Program |
Coupled Active Perception and Manipulation Planning for a Mobile Manipulator in Precision Agriculture Applications |
|
Xie, Shuangyu | Texas A&M University |
Hu, Chengsong | Texas A&M University |
Wang, Di | Texas A&M University |
Johnson, Joe | Texas A&M University |
Bagavathiannan, Muthukumar | Texas A&M University |
Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
Keywords: Agricultural Automation, Planning, Scheduling and Coordination, Task Planning
Abstract: A mobile manipulator often finds itself in an application where it needs to take a close-up view before performing a manipulation task. Naming this the coupled active perception and manipulation (CAPM) problem, we model the uncertainty in the perception process and devise a key state/task planning approach that considers reachability conditions as task constraints of both the perception and manipulation tasks for the mobile platform. By minimizing the expected energy usage in body key state planning while satisfying the task constraints, our algorithm achieves the best balance between task success rate and energy usage. We have implemented the algorithm and tested it in both simulation and physical experiments. The results confirm that our algorithm consumes less energy than a two-stage decoupled approach, while still maintaining a 100% task success rate.
|
|
10:30-12:00, Paper ThAT9-CC.3 | Add to My Program |
RAMP: A Benchmark for Evaluating Robotic Assembly Manipulation and Planning |
|
Collins, Jack | University of Oxford |
Robson, Mark | University of Birmingham |
Yamada, Jun | University of Oxford |
Sridharan, Mohan | University of Edinburgh |
Janik, Karol | Manufacturing Technology Centre |
Posner, Ingmar | Oxford University |
Keywords: Performance Evaluation and Benchmarking, Assembly, Task Planning
Abstract: We introduce RAMP, an open-source robotics benchmark inspired by real-world industrial assembly tasks. RAMP consists of beams that a robot must assemble into specified goal configurations using pegs as fasteners. As such, it assesses planning and execution capabilities, and poses challenges in perception, reasoning, manipulation, diagnostics, fault recovery, and goal parsing. RAMP has been designed to be accessible and extensible. Parts are either 3D printed or otherwise constructed from materials that are readily obtainable. The part designs and detailed instructions are publicly available. In order to broaden community engagement, RAMP incorporates fixtures such as April Tags which enable researchers to focus on individual sub-tasks of the assembly challenge if desired. We provide a full digital twin as well as rudimentary baselines to enable rapid progress. Our vision is for RAMP to form the substrate for a community-driven endeavour that evolves as capability matures.
|
|
10:30-12:00, Paper ThAT9-CC.4 | Add to My Program |
Towards Safe Robot Use with Edged or Pointed Objects: A Surrogate Study Assembling a Human Hand Injury Protection Database |
|
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Micheler, Carina M. | Technical University of Munich, TUM School of Medicine, Klinikum |
Zhou, Yangcan | Technical University of Munich |
Siegner, Sebastian Julian | TU Munich |
Hamad, Mazin | Technical University of Munich (TUM) |
Glowalla, Claudio | Department of Orthopaedics and Sports Orthopaedics, Klinikum Rec |
Neumann, Jan | TU Munich |
Rajaei, Nader | Technical University of Munich |
Burgkart, Rainer | Technische Universität München |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Human-Centered Robotics, Task Planning
Abstract: The use of pointed or edged tools or objects is one of the most challenging aspects of today's application of physical human-robot interaction (pHRI). One reason for this is that the severity of harm caused by such edged or pointed impactors is less well studied than for blunt impactors. Consequently, the standards specify well-reasoned force and pressure thresholds for blunt impactors and advise avoiding any edges and corners in contacts. Nevertheless, pointed or edged impactor geometries cannot be completely ruled out in real pHRI applications. For example, to allow edged or pointed tools such as screwdrivers near human operators, the knowledge of injury severity needs to be extended so that robot integrators can perform well-reasoned, time-efficient risk assessments. In this paper, we provide initial datasets on injury prevention for the human hand based on drop tests with surrogates for the human hand, namely pig claws and chicken drumsticks. We then demonstrate on two examples how the dataset eases and speeds up the risk assessment of robot contacts. Finally, our experiments provide a set of injuries that may also be expected for human subjects under certain robot mass-velocity combinations in collisions. To extend this work, testing on human samples and a collaborative effort from research institutes worldwide are needed to create a comprehensive human injury avoidance database for any pHRI scenario and thus for safe pHRI applications including edged and pointed geometries.
|
|
10:30-12:00, Paper ThAT9-CC.5 | Add to My Program |
Accelerating Long-Horizon Planning with Affordance-Directed Dynamic Grounding of Abstract Strategies |
|
Elimelech, Khen | Rice University |
Kingston, Zachary | Rice University |
Thomason, Wil | Rice University |
Vardi, Moshe | Rice University |
Kavraki, Lydia | Rice University |
Keywords: Task Planning, AI-Enabled Robotics, Learning from Experience
Abstract: Long-horizon task planning is important for robot autonomy, especially as a subroutine for frameworks such as Integrated Task and Motion Planning. However, task planning is computationally challenging and struggles to scale to realistic problem settings. We propose to accelerate task planning over an agent's lifetime by integrating abstract strategies: a generalizable planning experience encoding introduced in earlier work. In this work, we contribute a practical approach to planning with strategies by introducing a novel formalism of planning in a strategy-augmented domain. We also introduce and formulate the notion of a strategy's affordance, which indicates its predicted benefit to the solution, and use it to guide the planning and strategy grounding processes. Together, our observations yield an affordance-directed, lazy-search planning algorithm, which can seamlessly compose strategies and actions to solve long-horizon planning problems. We evaluate our planner in an object rearrangement domain, where we demonstrate performance benefits relative to a state-of-the-art task planner.
|
|
10:30-12:00, Paper ThAT9-CC.6 | Add to My Program |
Long-Horizon Planning and Execution with Functional Object-Oriented Networks |
|
Paulius, David | Brown University |
Agostini, Alejandro | University of Innsbruck |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Task Planning, Learning from Demonstration
Abstract: Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little had been done to show how plans acquired from a FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We therefore introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.
|
|
10:30-12:00, Paper ThAT9-CC.7 | Add to My Program |
From Cooking Recipes to Robot Task Trees – Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network |
|
Sakib, Md Sadman | University of South Florida |
Sun, Yu | University of South Florida |
Keywords: Task Planning, Manipulation Planning, AI-Enabled Robotics
Abstract: Task planning for robotic cooking involves generating a sequence of actions for a robot to prepare a meal successfully. This paper introduces a novel task tree generation pipeline that produces correct plans and efficient execution for cooking tasks. Our method first uses a large language model (LLM) to retrieve recipe instructions and then utilizes a fine-tuned GPT-3 to convert them into a task tree, capturing sequential and parallel dependencies among subtasks. The pipeline then mitigates the uncertainty and unreliability of LLM outputs via task tree retrieval: we combine multiple LLM task tree outputs into a graph and perform a retrieval that avoids questionable and high-cost nodes, improving both planning correctness and execution efficiency. Our evaluation results show superior performance compared to previous works in task planning accuracy and efficiency.
|
|
10:30-12:00, Paper ThAT9-CC.8 | Add to My Program |
Stepwise Large-Scale Multi-Agent Task Planning Using Neighborhood Search |
|
Zeng, Fan | The University of Tokyo |
Shirafuji, Shouhei | Kansai University |
Fan, Changxiang | Institute of Facility Agriculture, Guangdong Academy of Agricult |
Nishio, Masahiro | Toyota Motor Corporation |
Ota, Jun | The University of Tokyo |
Keywords: Task Planning, Multi-Robot Systems, Autonomous Agents
Abstract: This paper presents a novel stepwise multi-agent task planning method that incorporates neighborhood search to address large-scale problems, thereby reducing computation time. As the number of agents increases, the search space for task planning expands exponentially. Hence, conventional methods aiming to find globally optimal solutions, especially for large-scale problems, incur extremely high computational costs and may even fail. The proposed method first achieves the goals of multi-agent task planning by solving an initial problem using a minimal number of agents. Subsequently, tasks are reallocated among all agents based on this solution, and the solutions are iteratively optimized using a neighborhood search. By aiming for a near-optimal solution rather than an optimal one, the method substantially reduces the time complexity of the search to a polynomial level. The effectiveness of the proposed method is demonstrated by solving benchmark problems and comparing its results with those of other state-of-the-art methods.
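The reallocate-and-improve loop described above can be pictured with a minimal sketch; the single-task move operator, the makespan objective, and the `cost` model are simplifying assumptions, not the paper's exact neighborhood definition:

```python
import random

def makespan(assignment, cost):
    """Team completion time: the longest single-agent workload."""
    return max(sum(cost(t) for t in tasks) for tasks in assignment.values())

def neighborhood_search(assignment, cost, iters=2000):
    """Iteratively improve an initial solution by local task moves.

    A neighbor moves one task to another agent; the move is kept only if
    it shortens the makespan, so solution quality is monotone in time.
    """
    best = makespan(assignment, cost)
    agents = list(assignment)
    for _ in range(iters):
        src, dst = random.sample(agents, 2)
        if not assignment[src]:
            continue
        task = random.choice(assignment[src])
        assignment[src].remove(task)
        assignment[dst].append(task)
        cand = makespan(assignment, cost)
        if cand < best:
            best = cand                      # keep the improving move
        else:
            assignment[dst].remove(task)     # undo the non-improving move
            assignment[src].append(task)
    return assignment, best

# Usage: assignment maps each agent to its task list, e.g.
# neighborhood_search({"r1": [1, 2, 3], "r2": []}, cost=lambda t: t)
```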
|
|
10:30-12:00, Paper ThAT9-CC.9 | Add to My Program |
Bayesian-Guided Evolutionary Strategy with RRT for Multi-Robot Exploration |
|
Wu, Shuge | Beihang University |
Wang, Chunzheng | Beihang University |
Pan, Jiayi | Beihang University |
Han, Dongming | Beihang University |
Zhao, Zhongliang | Beihang University |
Keywords: Task Planning, Multi-Robot Systems, Multi-Robot SLAM
Abstract: With the increasing demand for multi-robot exploration of unknown environments, how to accomplish this task efficiently has become a focus of research. In such tasks, the strategies for frontier point detection and task allocation largely determine the overall efficiency of the system. Most existing methods implement frontier point detection based on the Rapidly-Exploring Random Tree (RRT) and use greedy algorithms for task allocation. However, the classical RRT algorithm uses a fixed growth step, which makes it difficult to grow branches in narrow environments and lowers the efficiency and correctness of frontier point detection. Meanwhile, the greedy allocation strategy causes each robot to consider only the exploration area with the largest gain for itself, which easily leads to repeated exploration and reduces the overall efficiency of the system. To solve these problems, we propose an adaptive RRT growth strategy for frontier point detection, which adjusts the step size according to the known map information and thus improves detection efficiency and accuracy, and we introduce a Bayesian-guided evolutionary strategy (BGE) for efficient task allocation, which uses current and historical information to find the optimal allocation scheme from a global perspective. We conduct comprehensive tests of the proposed strategy in ROS as well as in the real world, which demonstrate its efficiency.
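A minimal sketch of a map-adaptive growth step of this flavor is given below; the density heuristic and parameter values are illustrative assumptions, not the authors' exact rule:

```python
import numpy as np

def adaptive_step(p, occupancy, base_step=1.0, radius=5, min_step=0.2):
    """Shrink the RRT growth step in cluttered regions of the known map.

    `occupancy` is a 2D grid (1 = occupied); the local obstacle density
    around grid point `p` scales the step, so branches can still grow
    through narrow passages instead of being rejected outright.
    """
    r, c = int(p[0]), int(p[1])
    patch = occupancy[max(r - radius, 0): r + radius + 1,
                      max(c - radius, 0): c + radius + 1]
    density = patch.mean() if patch.size else 0.0
    return max(min_step, base_step * (1.0 - density))

def steer(nearest, sampled, occupancy):
    """Standard RRT steering, but with a map-dependent step length."""
    nearest = np.asarray(nearest, float)
    sampled = np.asarray(sampled, float)
    d = sampled - nearest
    dist = np.linalg.norm(d)
    step = adaptive_step(nearest, occupancy)
    return sampled if dist <= step else nearest + d / dist * step
```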
|
|
ThAT10-CC Oral Session, CC-501 |
Add to My Program |
Modeling, Control, and Learning for Soft Robots I |
|
|
Chair: Mochiyama, Hiromi | University of Tsukuba |
Co-Chair: Della Santina, Cosimo | TU Delft |
|
10:30-12:00, Paper ThAT10-CC.1 | Add to My Program |
A Novel Model for Layer Jamming-Based Continuum Robots |
|
Yi, Bowen | Polytechnique Montreal |
Fan, Yeman | University of Technology Sydney |
Liu, Dikai | University of Technology, Sydney |
Keywords: Modeling, Control, and Learning for Soft Robots, Compliance and Impedance Control
Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the first solution, to the best of our knowledge, to close the gap. We propose an energy-based model that is integrated with the LuGre frictional model for LJ-based continuum robots. We then conduct a comprehensive theoretical analysis of this model, focusing on two fundamental characteristics of LJ-based continuum robots: shape locking and adjustable stiffness. To validate the modeling approach and theoretical results, a series of experiments was conducted on our OctRobot-I continuum robotic platform. The results show that the proposed model is capable of interpreting and predicting the dynamical behaviors of LJ-based continuum robots.
|
|
10:30-12:00, Paper ThAT10-CC.2 | Add to My Program |
Lumped Parameter Dynamic Model of an Eversion Growing Robot: Analysis, Simulation and Experimental Validation |
|
Vartholomeos, Panagiotis | University of Thessaly |
Wu, Zicong | King's College London |
Sadati, S.M.Hadi | King's College London |
Bergeles, Christos | King's College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Dynamics, Medical Robots and Systems
Abstract: This paper presents a lumped-parameter dynamic model of a pressure-driven eversion robot carrying a catheter through its hollow core. A simulation framework based on the model is developed in MATLAB and is used for understanding the underlying physics, for identifying the regions of operation, and for demonstrating that, for a range of input commands, the catheter can be used as an actuation mechanism for propelling eversion; an approach especially useful for miniaturised systems. Simulations are experimentally validated on the MAMMOBOT system, a miniature steerable soft growing robot for early breast cancer detection. For most regions of operation, experimental results compare well with simulation, exhibiting an error of less than 4%. Only one region of operation showed larger deviations, possibly due to unmodeled dynamics, which will be investigated in future work.
|
|
10:30-12:00, Paper ThAT10-CC.3 | Add to My Program |
Trajectory Tracking Control of Dual-PAM Soft Actuator with Hysteresis Compensator |
|
Shen, Junyi | The University of Tokyo |
Miyazaki, Tetsuro | The University of Tokyo |
Ohno, Shingo | Bridgestone Corporation |
Sogabe, Maina | The University of Tokyo |
Kawashima, Kenji | The University of Tokyo |
Keywords: Modeling, Control, and Learning for Soft Robots, Hydraulic/Pneumatic Actuators, Robust/Adaptive Control
Abstract: Soft robotics is an emergent and swiftly evolving field. Pneumatic actuators are suitable for driving soft robots because of their superior performance. However, their control is not easy due to their hysteresis characteristics. In response to these challenges, we propose an adaptive control method to compensate for the hysteresis of a soft actuator. Employing a novel dual pneumatic artificial muscle (PAM) bending actuator, the control strategy abates hysteresis effects by dynamically modulating the gains of a traditional PID controller according to the predicted motion of the reference trajectory. Through comparative experimental evaluation, we found that the new control method outperforms its conventional counterparts in tracking accuracy and response speed. Our work reveals a new direction for advancing control in soft actuators.
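The idea of modulating PID gains from the predicted reference motion can be sketched as follows; the scheduling rule, threshold, and boost factor are hypothetical, chosen only to illustrate the mechanism:

```python
class GainScheduledPID:
    """PID controller whose gains track the predicted reference motion.

    Near reference turnaround points, where hysteresis bites hardest,
    the proportional gain is boosted by `boost`.
    """
    def __init__(self, kp, ki, kd, boost=1.5, dt=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.boost, self.dt = boost, dt
        self.int_e, self.prev_e = 0.0, 0.0

    def update(self, ref, ref_next, meas):
        e = ref - meas
        self.int_e += e * self.dt
        de = (e - self.prev_e) / self.dt
        self.prev_e = e
        # Schedule: larger kp when the predicted reference velocity is
        # small, i.e., near a direction reversal of the trajectory.
        ref_vel = (ref_next - ref) / self.dt
        kp = self.kp * (self.boost if abs(ref_vel) < 1e-2 else 1.0)
        return kp * e + self.ki * self.int_e + self.kd * de
```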
|
|
10:30-12:00, Paper ThAT10-CC.4 | Add to My Program |
Kinematic Modeling and Control of a Soft Robotic Arm with Non-Constant Curvature Deformation |
|
Wang, Zhanchi | University of Science and Technology of China |
Wang, Gaotian | University of Science and Technology of China |
Chen, Xiaoping | University of Science and Technology of China |
Freris, Nikolaos | University of Science and Technology of China |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics, Motion Control
Abstract: The passive compliance of soft robotic arms renders the development of accurate kinematic models and model-based controllers challenging. The most widely used model in soft robotic kinematics assumes Piecewise Constant Curvature (PCC). However, PCC introduces errors when the robot is subject to external forces or even gravity. In this paper, we establish a three-dimensional (3D) kinematic representation of a soft robotic arm with pseudo universal and prismatic joints that is capable of capturing non-constant curvature deformations of the soft segments. We theoretically demonstrate that this constitutes a more general methodology than PCC. Simulations and experiments on the real robot attest to the superior modeling accuracy of our approach in 3D motions with unknown load: the maximum position/rotation error of the proposed model is verified to be 6.7/4.6 times lower than that of the PCC model under gravity and external forces. Furthermore, we devise an inverse kinematic controller that is capable of positioning the tip, tracking trajectories, and performing interactive tasks in 3D space.
|
|
10:30-12:00, Paper ThAT10-CC.5 | Add to My Program |
CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots |
|
Zhang, Hanna | University of Toronto |
Giamou, Matthew | University of Toronto |
Maric, Filip | University of Toronto Institute for Aerospace Studies |
Kelly, Jonathan | University of Toronto |
Burgner-Kahrs, Jessica | University of Toronto |
Keywords: Modeling, Control, and Learning for Soft Robots, Kinematics
Abstract: The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is, finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple restarts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant-curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98% solve success rates within complex, highly cluttered environments that existing algorithms cannot handle.
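The convex-iteration idea behind CIDGIKc, a sequence of SDPs whose linear objectives drive the solution toward low rank, can be sketched generically (assuming cvxpy with an SDP-capable solver; `constraints_fn` stands in for the distance-geometric and collision constraints, which are not reproduced here):

```python
import numpy as np
import cvxpy as cp

def convex_iteration(constraints_fn, n, target_rank, iters=15):
    """Encourage a low-rank PSD solution via a sequence of SDPs.

    Each round minimizes <W, X>; W is then rebuilt from the eigenvectors
    of the (n - target_rank) smallest eigenvalues of the solution, which
    pushes those eigenvalues toward zero on the next round.
    """
    W = np.eye(n)
    X = cp.Variable((n, n), PSD=True)
    for _ in range(iters):
        prob = cp.Problem(cp.Minimize(cp.trace(W @ X)), constraints_fn(X))
        prob.solve()
        eigval, eigvec = np.linalg.eigh(X.value)    # ascending order
        U = eigvec[:, : n - target_rank]            # smallest-eigenvalue span
        W = U @ U.T
        if eigval[: n - target_rank].sum() < 1e-8:  # effectively low rank
            break
    return X.value
```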
|
|
10:30-12:00, Paper ThAT10-CC.6 | Add to My Program |
Automatically-Tuned Model Predictive Control for an Underwater Soft Robot |
|
Null, W. David | University of Illinois Urbana-Champaign |
Edwards, William | University of Illinois at Urbana-Champaign |
Jeong, Dohun | University of Illinois at Urbana-Champaign |
Tchalakov, Teodor | University of Illinois Urbana Champaign |
Menezes, James | University of Illinois at Urbana Champaign |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Z, Y | University of Illinois at Urbana-Champaign |
Keywords: Modeling, Control, and Learning for Soft Robots, Machine Learning for Robot Control
Abstract: Soft robots have desirable qualities for use in underwater environments thanks to their inherent compliance and lack of need for exposed hardware. Nevertheless, these advantages come at the cost of considerable control challenges. Data-driven model predictive control (MPC) is an approach that has shown promise in controlling soft robots. However, manually tuning the many hyperparameters in the learned dynamics model and the optimizer can be extremely tedious. In this work, we explore using data-driven MPC to control an underwater soft robot, and employ the AutoMPC method to automatically tune the hyperparameters and generate the controller. In the process, we extend AutoMPC’s capabilities to handle multi-task tuning and we add a barrier cost function to enforce actuator constraints. Our experiments show that the AutoMPC controller reaches targets with significantly higher accuracy and reliability than state-of-the-art baselines both in- and out-of-distribution of the training data.
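A barrier cost for actuator constraints, as mentioned above, is typically a log-barrier added to the tracking objective; the following is a generic illustration, not necessarily the exact form used in the paper's AutoMPC extension:

```python
import numpy as np

def barrier_cost(u, u_min, u_max, weight=1e-2, margin=1e-6):
    """Log-barrier penalty keeping optimized commands inside limits.

    The cost grows without bound as any actuator command approaches its
    bound, so the MPC optimizer stays strictly feasible; `weight` trades
    off constraint sharpness against tracking performance.
    """
    u = np.asarray(u, float)
    below = np.clip(u - u_min, margin, None)   # distance to lower bound
    above = np.clip(u_max - u, margin, None)   # distance to upper bound
    return -weight * (np.log(below).sum() + np.log(above).sum())
```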
|
|
10:30-12:00, Paper ThAT10-CC.7 | Add to My Program |
Soft Acoustic End-Effector |
|
Zhang, Zhiyuan | Acoustic Robotics Systems Laboratory, Institute of Robotics And |
Koch, Michael | Acoustic Robotics Systems Laboratory, Institute of Robotics And |
Ahmed, Daniel | ETH Zurich |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Soft Robot Applications
Abstract: Acoustic techniques have been developed as multifunctional tools for various microscale manipulations. In prevalent design paradigms, a position-fixed piezoelectric transducer (PZT) is utilized to generate ultrasound waves. However, the immobility of the PZT restricts the modulation of the acoustic field's position and orientation, consequently diminishing the adaptability and effectiveness of subsequent acoustic micromanipulation tasks. Here, we propose a miniaturized soft acoustic end-effector and demonstrate acoustic field modulation and microparticle manipulation by adjusting the PZT's position and orientation. The PZT is mounted on the end of a soft robotic arm that has three individual degrees of freedom and can be deformed in 3D space by inflating or deflating each chamber. Experiments show that the soft acoustic end-effector can change the traveling direction of microparticles and modulate the location of a standing wave field. Our approach is simple, flexible, and controllable. We envision that the soft acoustic end-effector will facilitate multiscale acoustic manipulation in interdisciplinary applications, especially for in vivo acoustic therapies.
|
|
10:30-12:00, Paper ThAT10-CC.8 | Add to My Program |
A Provably Stable Iterative Learning Controller for Continuum Soft Robots |
|
Pierallini, Michele | Centro Di Ricerca E. Piaggio - Università Di Pisa |
Stella, Francesco | EPFL |
Angelini, Franco | University of Pisa |
Deutschmann, Bastian | German Aerospace Center |
Hughes, Josie | EPFL |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Garabini, Manolo | Università Di Pisa |
Della Santina, Cosimo | TU Delft |
Keywords: Modeling, Control, and Learning for Soft Robots, Motion Control
Abstract: Fully exploiting soft robots' capabilities requires devising strategies that can accurately control their movements with the limited number of control sources available. This task is challenging for reasons including the hard-to-model dynamics, the system's underactuation, and the need for a prominent feedforward control action to preserve the soft and safe robot behavior. To tackle this challenge, this letter proposes a purely feedforward iterative learning control algorithm that refines the torque action by leveraging both knowledge of the model and data obtained from past experience. After presenting a 3D polynomial description of soft robots, we study their intrinsic properties, e.g., input-to-state stability, and we prove the convergence of the controller in the presence of locally Lipschitz nonlinearities. Finally, we validate the proposed approach through simulations and experiments involving multiple systems and trajectories, including cases with external disturbances and model mismatches.
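For readers unfamiliar with iterative learning control, the feedforward refinement it builds on has the classic form u_{k+1} = u_k + L(e_k); a minimal sketch with a PD-type learning operator (an assumption for illustration, not the paper's convergence-proven design) is:

```python
import numpy as np

def ilc_update(u, q_ref, q_meas, kp=5.0, kd=0.5, dt=0.01):
    """One iteration of a feedforward ILC torque refinement.

    u, q_ref, q_meas are arrays over the trajectory horizon (shape (T,)
    or (T, n_joints)); the next iterate adds a PD-shaped correction of
    this trial's tracking error: u_{k+1} = u_k + L(e_k).
    """
    e = q_ref - q_meas
    de = np.gradient(e, dt, axis=0)
    return u + kp * e + kd * de

# Typical loop: run the trajectory, record q_meas, then call
# u = ilc_update(u, q_ref, q_meas) before the next trial.
```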
|
|
ThAT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration I |
|
|
Chair: Tsuji, Toshiaki | Saitama University |
Co-Chair: You, Mingyu | Tongji |
|
10:30-12:00, Paper ThAT11-CC.1 | Add to My Program |
Stable Motion Primitives Via Imitation and Contrastive Learning |
|
Pérez-Dattari, Rodrigo | Delft University of Technology |
Kober, Jens | TU Delft |
Keywords: Learning from Demonstration, Motion Control of Manipulators, Deep Learning in Robotics and Automation, Dynamical Systems as Movement Primitives
Abstract: Learning from humans allows non-experts to program robots with ease, lowering the resources required to build complex robotic solutions. Nevertheless, such data-driven approaches often lack the ability to provide guarantees regarding their learned behaviors, which is critical for avoiding failures and/or accidents. In this work, we focus on reaching/point-to-point motions, where robots must always reach their goal, independently of their initial state. This can be achieved by modeling motions as dynamical systems and ensuring that they are globally asymptotically stable. Hence, we introduce a novel Contrastive Learning loss for training Deep Neural Networks (DNN) that, when used together with an Imitation Learning loss, enforces the aforementioned stability in the learned motions. Differently from previous work, our method does not restrict the structure of its function approximator, enabling its use with arbitrary DNNs and allowing it to learn complex motions with high accuracy. We validate it using datasets and a real robot. In the former case, motions are 2 and 4 dimensional, modeled as first- and second-order dynamical systems. In the latter, motions are 3, 4, and 6 dimensional, of first and second order, and are used to control a 7DoF robot manipulator in its end effector space and joint space. More details regarding the real-world experiments are presented in: https://youtu.be/OM-2edHBRfc.
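One way to picture an imitation loss combined with a contrastive term that enforces goal convergence is the sketch below; `model.encode`, `model.dynamics`, and the hinge formulation are hypothetical stand-ins, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def training_loss(model, x, x_next, goal, alpha=0.1, margin=1.0):
    """Imitation loss plus a contrastive stability surrogate.

    The imitation term regresses the demonstrated transition; the
    contrastive term requires each latent state to end up at least
    `margin` closer to the goal's latent than its predecessor, a proxy
    for the convergence a stable dynamical system must exhibit.
    """
    pred_next = model.dynamics(x)                # learned transition
    imitation = F.mse_loss(pred_next, x_next)

    z, z_next = model.encode(x), model.encode(x_next)
    z_goal = model.encode(goal)
    pos = (z_next - z_goal).norm(dim=-1)         # distance after the step
    neg = (z - z_goal).norm(dim=-1)              # distance before the step
    contrastive = F.relu(pos - neg + margin).mean()
    return imitation + alpha * contrastive
```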
|
|
10:30-12:00, Paper ThAT11-CC.2 | Add to My Program |
GAN-Based Editable Movement Primitive from High-Variance Demonstrations |
|
Xu, Xuanhui | TongJi University |
You, Mingyu | Tongji |
Zhou, Hongjun | Tongji University |
Qian, Zhifeng | Tongji University |
Xu, Weisheng | Tongji University |
He, Bin | Tongji University |
Keywords: Learning from Demonstration, Deep Learning Methods, Machine Learning for Robot Control
Abstract: Movement Primitives (MPs) are a promising Learning from Demonstration (LfD) framework, commonly used to learn movements from human demonstrations and adapt the learned movements to new task scenes. A major goal of MP research is to improve the adaptability of MPs to various target positions and obstacles. MPs enable this adaptability by capturing the variability of demonstrations. However, current MPs can only learn from low-variance demonstrations, which include varied target positions but do not vary the obstacles. Such MPs cannot adapt the learned movements to task scenes with different obstacles, which limits their adaptability, since obstacles are everywhere in daily life. In this paper, we propose a novel transformer- and GAN-based Editable Movement Primitive (EditMP), which can learn movements from high-variance demonstrations. These demonstrations include movements in task scenes with various target positions and obstacles. After movement learning, EditMP can controllably and interpretably edit the learned movements for new task scenes. Notably, EditMP enables all robot joints, rather than only the robot end-effector, to avoid hitting complex obstacles. The proposed method is evaluated on three tasks and deployed to a real-world robot. We compare EditMP with probabilistic MPs and empirically demonstrate its state-of-the-art adaptability.
|
|
10:30-12:00, Paper ThAT11-CC.3 | Add to My Program |
One-Shot Imitation Learning with Graph Neural Networks for Pick-And-Place Manipulation Tasks |
|
Di Felice, Francesco | Mechanical Intelligence Institute, Sant'Anna School of Advanced |
D'Avella, Salvatore | Sant'Anna School of Advanced Studies |
Remus, Alberto | Sant'Anna School of Advanced Studies |
Tripicchio, Paolo | Scuola Superiore Sant'Anna |
Avizzano, Carlo Alberto | Scuola Superiore Sant'Anna |
Keywords: Learning from Demonstration, Imitation Learning, Task and Motion Planning
Abstract: The proposed work presents a framework based on Graph Neural Networks (GNNs) that abstracts the task to be executed and directly allows the robot to learn task-specific rules from synthetic demonstrations given through imitation learning. A graph representation of the state space is considered to encode the task-relevant entities as nodes for a Pick-and-Place task instantiated at different levels of difficulty. During training, the GNN-based policy learns the underlying rules of the manipulation task, focusing on the structural relevance and the types of objects and goals, and relying on an external primitive to move the robot to accomplish the task. The GNN policy is trained as a node-classification approach, looking at the different configurations of the objects and goals present in the scene and learning the associations between them with respect to their type for the Pick-and-Place task. The experimental results show the high generalization capability of the proposed model in terms of the number, positions, height distributions, and even configurations of the objects/goals. Thanks to this generalization, only a single image of the desired goal configuration is required at inference time.
|
|
10:30-12:00, Paper ThAT11-CC.4 | Add to My Program |
Unsupervised Human Motion Segmentation Based on Characteristic Force Signals of Contact Events |
|
Sugawara, Keito | Saitama University |
Sakaino, Sho | University of Tsukuba |
Tsuji, Toshiaki | Saitama University |
Keywords: Learning from Demonstration, Compliance and Impedance Control
Abstract: Humans perform complex tasks involving force interactions daily. Learning from demonstration, a method for transferring such human manipulation skills to robots, requires techniques for segmenting the demonstrations into movement primitives. Therefore, we propose an unsupervised motion segmentation method that utilizes small characteristic fluctuations of 6-axis force/torque signals as features for motion segmentation. This method includes a feature extraction phase using a differentiation process and detects segmentation points based on the differentiated 6-axis force/torque signals obtained in the task demonstrations. The segmentation method was evaluated using a peg-in-hole task and a bottle-lid opening task. The experimental results demonstrate the validity of using differentiated forces and torques for motion segmentation.
|
|
10:30-12:00, Paper ThAT11-CC.5 | Add to My Program |
Robotic Skill Mutation in Robot-To-Robot Propagation During a Physically Collaborative Sawing Task |
|
Maessen, Rosa Enna Sophia | Delft University of Technology |
Prendergast, J. Micah | Delft University of Technology |
Peternel, Luka | Delft University of Technology |
Keywords: Learning from Demonstration, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: Skill propagation among robots without human involvement can be crucial in quickly spreading new physical skills to many robots. In this respect, it is a good alternative to pure reinforcement learning, which can be time-consuming, or learning from human demonstration, which requires human involvement; in the latter case, there may not be enough humans to quickly spread skills to many robots. However, propagation among robots without direct human supervision can result in robotic skills mutating from the original source. This can be beneficial when better skills emerge or when a new skill is obtained that can be used for other similar tasks. However, it can also be dangerous in terms of task execution safety. This letter studies the mutation of a robotic skill when it is propagated from one robot to another during a physically collaborative task. We chose the collaborative sawing task as a study case since it involves complex two-agent physical interaction/coordination and because its periodic nature can facilitate repetitive learning. The study employs periodic Dynamic Movement Primitives and Locally Weighted Regression to encode and learn the motion and impedance required to execute the task. To explore what influences mutation, we varied several control and environment conditions, such as the maximum stiffness, robot base position, friction coefficient of the sawed object, and movement period.
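The encoding used in this study, periodic Dynamic Movement Primitives fit by Locally Weighted Regression, can be sketched as follows (a textbook-style simplification; the gains, basis widths, and the separate impedance channel are illustrative assumptions):

```python
import numpy as np

class PeriodicDMP:
    """Rhythmic primitive: a phase oscillator drives a learned periodic
    forcing term built from von Mises basis functions."""

    def __init__(self, n_basis=20, omega=2 * np.pi, alpha=25.0, beta=6.25):
        self.c = np.linspace(0, 2 * np.pi, n_basis, endpoint=False)
        self.h = np.full(n_basis, 2.5 * n_basis)  # basis widths
        self.w = np.zeros(n_basis)                # fit from a demonstration
        self.omega, self.alpha, self.beta = omega, alpha, beta
        self.phi = 0.0                            # oscillator phase

    def fit(self, phi, f_target):
        """Locally weighted regression of the forcing term: one weight
        per basis function (diagonal LWR solution)."""
        for i in range(len(self.w)):
            psi = np.exp(self.h[i] * (np.cos(phi - self.c[i]) - 1.0))
            self.w[i] = (psi * f_target).sum() / (psi.sum() + 1e-9)

    def step(self, y, dy, g, dt):
        """Integrate one Euler step of the transformation system."""
        psi = np.exp(self.h * (np.cos(self.phi - self.c) - 1.0))
        f = psi @ self.w / (psi.sum() + 1e-9)
        ddy = self.alpha * (self.beta * (g - y) - dy) + f
        self.phi = (self.phi + self.omega * dt) % (2 * np.pi)
        return y + dy * dt, dy + ddy * dt
```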
|
|
10:30-12:00, Paper ThAT11-CC.6 | Add to My Program |
Few-Shot Learning of Force-Based Motions from Demonstration through Pre-Training of Haptic Representation |
|
Aoyama, Marina Y. | The University of Edinburgh |
Moura, Joao | The University of Edinburgh |
Saito, Namiko | The University of Edinburgh |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Learning from Demonstration, Representation Learning, Force and Tactile Sensing
Abstract: In many contact-rich tasks, force sensing plays an essential role in adapting the motion to the physical properties of the manipulated object. To enable robots to capture the underlying distribution of object properties necessary for generalising learnt manipulation tasks to unseen objects, existing Learning from Demonstration (LfD) approaches require a large number of costly human demonstrations. Our proposed semi-supervised LfD approach decouples the learnt model into a haptic representation encoder and a motion generation decoder. This enables us to pre-train the first using a large amount of unsupervised data, easily accessible, while using few-shot LfD to train the second, leveraging the benefits of learning skills from humans. We validate the approach on the wiping task using sponges with different stiffness and surface friction. Our results demonstrate that pre-training significantly improves the ability of the LfD model to recognise physical properties and generate desired wiping motions for unseen sponges, outperforming the LfD method without pre-training. We validate the motion generated by our semi-supervised LfD model on the physical robot hardware using the KUKA iiwa robot arm. We also validate that the haptic representation encoder, pre-trained in simulation, captures the properties of real objects, explaining its contribution to improving the generalisation of the downstream task.
|
|
10:30-12:00, Paper ThAT11-CC.7 | Add to My Program |
Learning-Based Risk-Bounded Path Planning under Environmental Uncertainty (I) |
|
Meng, Fei | The Chinese University of Hong Kong |
Chen, Liangliang | Georgia Institute of Technology |
Ma, Han | The Chinese University of Hong Kong |
Wang, Jiankun | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Learning from Demonstration, Planning under Uncertainty, Simulation and Animation
Abstract: Building a general and efficient path planning framework for uncertain nonconvex environments is challenging due to safety constraints and complex configurations. Traditional avenues usually involve convexifying obstacles and presuming Gaussian distributions, neither of which is universal. Meanwhile, fast convergence to high-quality solutions is not guaranteed. Therefore, we develop a novel neural risk-bounded path planner that quickly finds near-optimal solutions with an acceptable collision probability in complex environments. First, we represent the nonconvex obstacles, with arbitrary probabilistic uncertainties, as a deterministic point cloud map. A neural network sampler encodes the map into a latent embedding and, trained with sufficient expert demonstrations, predicts states in the promising subspace. We construct a neural cost estimator to select the best informed state from those samples. Then, we recursively use these simple yet effective neural networks to march toward the start and goal bidirectionally. The collision risk of the intermediate connections is verified based on sum-of-squares optimization. Simulation results show that our approach significantly saves time and resources in finding comparable solutions relative to state-of-the-art methods in seen and unseen challenging environments.
|
|
10:30-12:00, Paper ThAT11-CC.8 | Add to My Program |
ConBaT: Control Barrier Transformer for Safe Robot Learning from Demonstrations |
|
Meng, Yue | Massachusetts Institute of Technology |
Vemprala, Sai | Scaled Foundations |
Bonatti, Rogerio | Microsoft |
Fan, Chuchu | Massachusetts Institute of Technology |
Kapoor, Ashish | Microsoft |
Keywords: Learning from Demonstration, Sensorimotor Learning, Machine Learning for Robot Control
Abstract: Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid the high cost and potentially fatal critical mistakes. Traditionally, self-supervised training mainly focuses on imitating previously observed behaviors, and the training demonstrations carry no notion of which behaviors should be explicitly avoided. In this work, we propose Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion. ConBaT is inspired by the concept of control barrier functions in control theory and uses a causal transformer that learns to predict safe robot actions autoregressively using a critic that requires minimal safety data labeling. During deployment, we employ a lightweight online optimization to find actions that ensure future states lie within the learned safe set. We apply our approach to different simulated control tasks and show that our method results in safer control policies compared to other classical and learning-based methods such as imitation learning, reinforcement learning, and model predictive control.
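At deployment time, the lightweight online step searches for an action that keeps the predicted next state inside the learned safe set; a naive sampling-based sketch of such a filter is shown below (the `dynamics` and `barrier` callables are assumptions, and the paper uses a learned critic with optimization rather than this rejection scheme):

```python
import numpy as np

def safe_action(state, u_nominal, dynamics, barrier,
                n_samples=256, scale=0.1):
    """Return the action closest to the nominal one that the learned
    barrier scores as safe (barrier(s) >= 0 means inside the safe set)."""
    u_nominal = np.asarray(u_nominal, float)
    candidates = u_nominal + scale * np.random.randn(n_samples,
                                                     u_nominal.size)
    candidates = np.vstack([u_nominal, candidates])  # try nominal first
    safe = [u for u in candidates
            if barrier(dynamics(state, u)) >= 0.0]
    if not safe:  # fall back to the least-unsafe candidate
        return max(candidates, key=lambda u: barrier(dynamics(state, u)))
    return min(safe, key=lambda u: np.linalg.norm(u - u_nominal))
```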
|
|
10:30-12:00, Paper ThAT11-CC.9 | Add to My Program |
Unsupervised Learning of Neuro-Symbolic Rules for Generalizable Context-Aware Planning in Object Arrangement Tasks |
|
Sharma, Siddhant | Indian Institute of Technology Delhi |
Tuli, Shreshth | Indian Institute of Technology Delhi |
Paul, Rohan | Indian Institute of Technology Delhi |
Keywords: Learning from Demonstration, Task and Motion Planning
Abstract: As robots tackle complex object arrangement tasks, it becomes imperative for them to generalize to complex worlds and scale with the number of objects. This work postulates that extracting action primitives, such as push operations, together with their pre-conditions and effects, enables strong generalization to unseen worlds. Hence, we factorize policy learning as the inference of such generic rules, which act as strong priors for predicting actions given the world state. The learnt rules act as propositional knowledge and enable robots to reach goals in a zero-shot manner by applying the rules independently and incrementally. However, obtaining hand-engineered rules, such as PDDL descriptions, is hard, especially for unseen worlds. This work aims to learn generic, sparse, and context-aware rules that govern action primitives in robotic worlds through human demonstrations in simple domains. We demonstrate that our approach, namely RLAP, is able to extract rules without explicit supervision of rule labels and generate goal-reaching plans in complex Sokoban-styled domains that scale with the number of objects. RLAP furnishes a significantly higher goal-reaching rate and shorter planning times compared to state-of-the-art techniques. The code, dataset, and videos are hosted at https://rule-learning-rlap.github.io/.
|
|
ThAT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning I |
|
|
Chair: Valada, Abhinav | University of Freiburg |
Co-Chair: Barfoot, Timothy | University of Toronto |
|
10:30-12:00, Paper ThAT12-CC.1 | Add to My Program |
PCPNet: An Efficient and Semantic-Enhanced Transformer Network for Point Cloud Prediction |
|
Luo, Zhen | Beijing Institute of Technology |
Ma, Junyi | Beijing Institute of Technology |
Zhou, Zijie | Beijing Institute of Technology |
Xiong, Guangming | Beijing Institute of Technology |
Keywords: Deep Learning Methods, Deep Learning for Visual Perception
Abstract: The ability to predict future structural features of the environment from past perception is critical for autonomous vehicles, as it makes subsequent decision-making and path planning more reliable. Recently, point cloud prediction (PCP) has been utilized to predict and describe future environmental structures in point cloud form. In this letter, we propose a novel, efficient Transformer-based network to predict future LiDAR point clouds by exploiting past point cloud sequences. We also design a semantic auxiliary training strategy to make the predicted LiDAR point cloud sequence semantically similar to the ground truth, thus improving its usefulness for further tasks in real-vehicle applications. Our approach is completely self-supervised, which means it does not require any manual labeling and has a solid generalization ability toward different environments. The experimental results show that our method outperforms state-of-the-art PCP methods in prediction quality and semantic similarity, and has good real-time performance. Our open-source code and pre-trained models are available at https://github.com/Blurryface0814/PCPNet.
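Self-supervised point cloud prediction is typically trained against future scans with a Chamfer-style distance, since the future frames themselves serve as labels; a minimal PyTorch sketch (illustrative, not necessarily PCPNet's exact loss) is:

```python
import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets.

    pred: (N, 3) predicted points; gt: (M, 3) points of the actually
    observed future scan. No manual labels are required, which is what
    makes the training self-supervised.
    """
    d = torch.cdist(pred, gt)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```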
|
|
10:30-12:00, Paper ThAT12-CC.2 | Add to My Program |
N2M2: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments |
|
Honerkamp, Daniel | Albert Ludwigs Universität Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Learning and Adaptive Systems, Mobile Manipulation, Service Robots, Robot Learning
Abstract: Despite its importance in both industrial and service robotics, mobile manipulation remains a significant challenge, as it requires seamless integration of end-effector trajectory generation with navigation skills as well as reasoning over long horizons. Existing methods struggle to control the large configuration space and to navigate dynamic and unknown environments. In previous work, we proposed to decompose mobile manipulation tasks into a simplified motion generator for the end-effector in task space and a trained reinforcement learning agent for the mobile base that accounts for the kinematic feasibility of the motion. In this work, we introduce Neural Navigation for Mobile Manipulation (N2M2), which extends this decomposition to complex obstacle environments, extends the agent's control to the torso joint and the norm of the end-effector motion velocities, and uses a more general reward function, thereby enabling robots to tackle a much broader range of tasks in real-world settings. The resulting approach can perform unseen, long-horizon tasks in unexplored environments while instantly reacting to dynamic obstacles and environmental changes. At the same time, it provides a simple way to define new mobile manipulation tasks. We demonstrate the capabilities of our proposed approach in extensive simulation and real-world experiments on multiple kinematically diverse mobile manipulators.
|
|
10:30-12:00, Paper ThAT12-CC.3 | Add to My Program |
The Foreseeable Future: Self-Supervised Learning to Predict Dynamic Scenes for Indoor Navigation |
|
Thomas, Hugues | University of Toronto |
Zhang, Jian | Purdue University |
Barfoot, Timothy | University of Toronto |
Keywords: Learning and Adaptive Systems, Reactive and Sensor-Based Planning, Deep Learning in Robotics and Automation, Indoor Navigation
Abstract: We present a method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGM), which embed future semantic information of real dynamic scenes. We present an auto-labeling process that creates SOGMs from noisy real navigation data. We use a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs, given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for real robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGM representation, potentially capturing the complexities and uncertainties of real-world multi-agent interactions. We also design a navigation system that uses these predicted SOGMs within planning, after they have been transformed into Spatiotemporal Risk Maps (SRMs). We verify our navigation system's abilities in simulation, validate it on a real robot, study SOGM predictions on real data in various circumstances, and provide a novel indoor 3D lidar dataset, collected during our experiments, which includes our automated annotations.
|
|
10:30-12:00, Paper ThAT12-CC.4 | Add to My Program |
The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation |
|
von Hartz, Jan Ole | University of Freiburg |
Chisari, Eugenio | University of Freiburg |
Welschehold, Tim | Albert-Ludwigs-Universität Freiburg |
Burgard, Wolfram | University of Technology Nuremberg |
Boedecker, Joschka | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Deep Learning Methods, Imitation Learning, Representation Learning
Abstract: In policy learning for robotic manipulation, sample efficiency is of paramount importance. Thus, learning and extracting more compact representations from camera observations is a promising avenue. However, current methods often assume full observability of the scene and struggle with scale invariance. In many tasks and settings, this assumption does not hold, as objects in the scene are often occluded or lie outside the field of view of the camera, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects and on occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects, both in simulation and in real-world robotic experiments.
|
|
10:30-12:00, Paper ThAT12-CC.5 | Add to My Program |
PBP: Path-Based Trajectory Prediction for Autonomous Driving |
|
Afshar, Sepideh | Motional |
Deo, Nachiket | UC San Diego |
Bhagat, Akshay | Motional AD Inc |
Chakraborty, Titas | Motional |
Shao, Yunming | ZOOX |
Buddharaju, Balarama Raju | Motional AD |
Deshpande, Adwait | Georgia Institute of Technology |
Cui, Henggang | Motional |
Keywords: Deep Learning Methods, AI-Based Methods, Intelligent Transportation Systems
Abstract: Trajectory prediction plays a crucial role in the autonomous driving stack by enabling autonomous vehicles to anticipate the motion of surrounding agents. Goal-based prediction models have gained traction in recent years for addressing the multimodal nature of future trajectories. Goal-based prediction models simplify multimodal prediction by first predicting 2D goal locations of agents and then predicting trajectories conditioned on each goal. However, a single 2D goal location serves as a weak inductive bias for predicting the whole trajectory, often leading to poor map compliance, i.e., part of the trajectory going off-road or breaking traffic rules. In this paper, we improve upon goal-based prediction by proposing the Path-based prediction (PBP) approach. PBP predicts a discrete probability distribution over reference paths in the HD map using the path features and predicts trajectories in the path-relative Frenet frame. We apply the PBP trajectory decoder on top of the HiVT scene encoder and report results on the Argoverse dataset. Our experiments show that PBP achieves competitive performance on the standard trajectory prediction metrics, while significantly outperforming state-of-the-art baselines in terms of map compliance.
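The path-relative Frenet representation at the heart of PBP can be illustrated with a simple projection routine (a sketch under a dense-polyline assumption; the paper's implementation is not reproduced here):

```python
import numpy as np

def to_frenet(traj, path):
    """Project Cartesian waypoints onto a reference path's Frenet frame.

    Returns (s, d) per waypoint: arc length along the path and signed
    lateral offset, computed against the nearest path vertex (a dense
    polyline is assumed; a production version would interpolate).
    """
    path = np.asarray(path, float)
    seg = np.diff(path, axis=0)
    arclen = np.concatenate([[0.0],
                             np.cumsum(np.linalg.norm(seg, axis=1))])
    out = []
    for p in np.asarray(traj, float):
        i = np.argmin(np.linalg.norm(path - p, axis=1))
        j = min(i, len(seg) - 1)
        tangent = seg[j] / (np.linalg.norm(seg[j]) + 1e-9)
        offset = p - path[i]
        # Signed lateral distance via the 2D cross product with the tangent.
        d = tangent[0] * offset[1] - tangent[1] * offset[0]
        out.append((arclen[i] + offset @ tangent, d))
    return np.asarray(out)
```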
|
|
10:30-12:00, Paper ThAT12-CC.6 | Add to My Program |
Sensorless Estimation of Contact Using Deep-Learning for Human-Robot Interaction |
|
Shan, Shilin | Nanyang Technological University |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Deep Learning Methods, Physical Human-Robot Interaction, Industrial Robots
Abstract: Physical human-robot interaction has been an area of interest for decades. Collaborative tasks, such as joint compliance, demand high-quality joint torque sensing. While external torque sensors are reliable, they come with the drawbacks of being expensive and vulnerable to impacts. To address these issues, studies have been conducted to estimate external torques using only internal signals, such as joint states and current measurements. However, insufficient attention has been given to friction hysteresis approximation, which is crucial for tasks involving extensive dynamic to static state transitions. In this paper, we propose a deep-learning-based method that leverages a novel long-term memory scheme to achieve dynamics identification, accurately approximating the static hysteresis. We also introduce modifications to the well-known Residual Learning architecture, retaining high accuracy while reducing inference time. The robustness of the proposed method is illustrated through a joint compliance and task compliance experiment.
|
|
10:30-12:00, Paper ThAT12-CC.7 | Add to My Program |
Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies |
|
Lawson, Daniel | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Deep Learning Methods, Transfer Learning, Reinforcement Learning
Abstract: Recent work has shown the promise of creating generalist, transformer-based models for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is of interest whether we can more flexibly create generalist policies by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, or averaging, subsets of Decision Transformers in parameter space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also demonstrate the importance of various methodological choices when merging policies, such as utilizing common pre-trained initializations, increasing model capacity, and utilizing Fisher information for weighting parameter importance. In general, we believe research in this direction could help democratize and distribute the process of forming multi-task robotics policies. Our implementation is available at https://github.com/daniellawson9999/merging-decision-transformers.
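Parameter-space merging with Fisher weighting reduces to a per-parameter weighted average; a minimal PyTorch sketch over checkpoints (`state_dicts`) and matching diagonal Fisher estimates (`fishers`, assumed precomputed) is:

```python
import torch

def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    """Merge task-specific checkpoints by Fisher-weighted averaging.

    Each floating-point parameter is averaged across models with
    per-element weights given by its diagonal Fisher information, so
    entries a task relies on heavily dominate the merged value there.
    """
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name]
                  for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged

# Plain (unweighted) averaging is the special case of identical Fishers:
# fishers = [{n: torch.ones_like(p) for n, p in sd.items()}
#            for sd in state_dicts]
```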
|
|
ThAT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration IV |
|
|
Chair: Morimoto, Jun | ATR Computational Neuroscience Labs |
Co-Chair: Xiao, Jing | Worcester Polytechnic Institute (WPI) |
|
10:30-12:00, Paper ThAT13-AX.1 | Add to My Program |
Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications |
|
Yu, Pian | University of Oxford |
Dong, Shuyang | University of Virginia |
Sheng, Shili | University of Virginia |
Feng, Lu | University of Virginia |
Kwiatkowska, Marta | University of Oxford |
Keywords: Human-Robot Collaboration, Formal Methods in Robotics and Automation, Task Planning
Abstract: Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic formulas that involve human trust. Since accurately observing human trust in robots is challenging, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while introducing added complexity: the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach.
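Since the approach evaluates belief predicates over a POMDP, the underlying operation is a Bayes filter over hidden trust states; a minimal discrete sketch (the trust levels and matrices below are invented for illustration) is:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step over discrete hidden trust levels.

    b: belief over states; T[a]: state-transition matrix under action a;
    O[a]: observation-likelihood matrix (rows index the next state).
    A belief predicate such as P(trust = high) > 0.8 in an scLDTL-style
    formula would be checked against the returned belief vector.
    """
    pred = b @ T[a]              # prediction through the dynamics
    post = pred * O[a][:, o]     # weight by the observation likelihood
    return post / post.sum()

# Two trust levels (low, high), one action, a binary observation:
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3], [0.2, 0.8]])}
b = belief_update(np.array([0.5, 0.5]), 0, 1, T, O)
```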
|
|
10:30-12:00, Paper ThAT13-AX.2 | Add to My Program |
Towards Proactive Safe Human-Robot Collaborations Via Data-Efficient Conditional Behavior Prediction |
|
Pandya, Ravi | Carnegie Mellon University |
Wang, Zhuoyuan | Carnegie Mellon University |
Nakahira, Yorie | CMU |
Liu, Changliu | Carnegie Mellon University |
Keywords: Human-Robot Collaboration, Human-Robot Teaming, Safety in HRI
Abstract: We focus on the problem of how we can enable a robot to collaborate seamlessly with a human partner, specifically in scenarios where preexisting data is sparse. Much prior work in human-robot collaboration uses observational models of humans (i.e. models that treat the robot purely as an observer) to choose the robot's behavior, but such models do not account for the influence the robot has on the human's actions, which may lead to inefficient interactions. We instead formulate the problem of optimally choosing a collaborative robot's behavior based on a conditional model of the human that depends on the robot's future behavior. First, we propose a novel model-based formulation of conditional behavior prediction that allows the robot to infer the human's intentions based on its future plan in data-sparse environments. We then show how to utilize a conditional model for proactive goal selection and safe trajectory generation around human collaborators. Finally, we use our proposed proactive controller in a collaborative task with real users to show that it can improve users' interactions with a robot collaborator quantitatively and qualitatively.
|
|
10:30-12:00, Paper ThAT13-AX.3 | Add to My Program |
Risk-Bounded Online Team Interventions Via Theory of Mind |
|
Zhang, Yuening | Massachusetts Institute of Technology |
Robertson, Paul | Dynamic Object Language Labs Inc. (DOLL Inc.) |
Shu, Tianmin | Massachusetts Institute of Technology |
Hong, Sungkweon | Massachusetts Institute of Technology |
Williams, Brian | MIT |
Keywords: Human-Robot Collaboration, Human-Robot Teaming, Social HRI
Abstract: Despite advancements in human-robot teamwork, limited progress has been made in developing AI assistants capable of advising teams online during task time, due to the challenges of modeling both individual and collective beliefs of the team members. Dynamic epistemic logic has proved to be a viable tool for representing a machine Theory of Mind and for modeling communication in epistemic planning, with applications to human-robot teamwork. However, this approach has yet to be applied in an online teaming assistance context and fails to account for the real-life probabilities of potential team beliefs. We propose a novel blend of epistemic planning and POMDP techniques to create a risk-bounded AI team assistant that intervenes only when the team's expected likelihood of failure exceeds a predefined risk threshold or in the case of potential execution deadlocks. Our experiments and simulated demonstration on the VirtualHome testbed show that the assistant can effectively improve team performance.
|
|
10:30-12:00, Paper ThAT13-AX.4 | Add to My Program |
Human-Robot Complementary Collaboration for Flexible and Precision Assembly |
|
Cao, Shichen | Worcester Polytechnic Institute |
Xiao, Jing | Worcester Polytechnic Institute (WPI) |
Keywords: Human-Robot Collaboration, Compliant Assembly, Human-Centered Automation
Abstract: This paper addresses human-robot collaborative (HRC) precision assembly that complements natural human ability and the strength of an autonomous robot system. Our approach enables both flexibility and efficiency of tight-clearance assembly of various complex-shaped parts in the presence of uncertainty without requiring assembly skills and knowledge of robotics from the human operator. We demonstrated the effectiveness of our approach in a variety of experiments and comparisons with other HRC assembly approaches.
|
|
10:30-12:00, Paper ThAT13-AX.5 | Add to My Program |
HAC-SLAM: Human Assisted Collaborative 3D-SLAM through Augmented Reality |
|
Sayour, Malak | American University of Beirut |
Yassine, Mohammad Karim | American University of Beirut |
Dib, Nadim | American University of Beirut |
Elhajj, Imad | American University of Beirut |
Asmar, Boulos | Idealworks |
Khoury, Elie | Idealworks |
Asmar, Daniel | American University of Beirut |
Keywords: Human-Robot Collaboration, Human Factors and Human-in-the-Loop, SLAM
Abstract: Simultaneous Localization and Mapping (SLAM) has emerged as a prime autonomous mobile agent localization algorithm. Despite the global research effort to improve SLAM, its mapping component remains limited and serves little more than to satisfy the coupled localization problem. We present a collaborative 3D SLAM approach leveraging the power of augmented reality (AR). The system introduces a trio of diverse agents, each with its unique capability to become an active member in the mapping process: mobile robots, human operators, and AR head-mounted displays (AR-HMD). A 3D complementary mapping pipeline is developed to utilize the built-in SLAM capabilities of the AR-HMD as shareable data. Our system aligns and merges the AR-HMD’s and the robot’s local maps automatically, triggered by a human-dictated initial guess. The created merged map proves advantageous in scenarios where the robot is restricted from navigating in certain areas. To correct map imperfections resulting from problematic objects such as transparent or reflective surfaces, the fused map is overlaid onto the environment, and hand gestures are used to add or delete 3D map features in real-time. Our system is implemented in both a lab setting and a real industrial warehouse. The results show a significant improvement in map quality and mapping duration.
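The automatic map alignment step triggered by the human-dictated initial guess ultimately reduces to estimating a rigid transform between corresponding 3D points in the two maps. A minimal closed-form sketch of that step (the Kabsch/Umeyama solution, assuming correspondences between the maps are already available; the full pipeline in the paper does much more):

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.
    P, Q: (N, 3) arrays of corresponding points from the two maps."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```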
|
|
10:30-12:00, Paper ThAT13-AX.6 | Add to My Program |
GAN-Based Semi-Supervised Training of LSTM Nets for Intention Recognition in Cooperative Tasks |
|
Mavsar, Matija | Jozef Stefan Institute |
Morimoto, Jun | ATR Computational Neuroscience Labs |
Ude, Ales | Jozef Stefan Institute |
Keywords: Human-Robot Collaboration, Deep Learning Methods, Machine Learning for Robot Control
Abstract: The accumulation of a sufficient amount of data for training deep neural networks is a major hindrance in the application of deep learning in robotics. Acquiring real-world data requires considerable time and effort, yet it might still not capture the full range of potential environmental variations. The generation of new synthetic data based on existing training data has been enabled with the development of generative adversarial networks (GANs). In this paper, we introduce a training methodology based on GANs that utilizes a recurrent, LSTM-based architecture for intention recognition in robotics. The resulting networks predict the intention of the observed human or robot based on input RGB videos. They are trained in a semi-supervised manner, with the output classification networks predicting one of the possible labels for the observed motion, while the recurrent generator networks produce fake RGB videos that are leveraged in the training process. We show that utilization of the generated data during the network training process increases the accuracy and generality of motion classification compared to using only real training data. The proposed method can be applied to a variety of dynamic tasks and different LSTM-based classification networks to supplement real data.
|
|
10:30-12:00, Paper ThAT13-AX.7 | Add to My Program |
CoBT: Collaborative Programming of Behaviour Trees from One Demonstration for Robot Manipulation |
|
Jain, Aayush | Irish Manufacturing Research and Technological University Dublin |
Long, Philip | Atlantic Technological University |
Villani, Valeria | University of Modena and Reggio Emilia |
Kelleher, John D. | Trinity College Dublin |
Leva, Maria Chiara | Technological University Dublin |
Keywords: Human-Robot Collaboration, Human-Centered Automation, Learning from Demonstration
Abstract: Mass customization and shorter manufacturing cycles are becoming more important among small and medium-sized companies. However, classical industrial robots struggle to cope with product variation and dynamic environments. In this paper, we present CoBT, a collaborative programming by demonstration framework for generating reactive and modular behavior trees. CoBT relies on a single demonstration and a combination of data-driven machine learning methods with logic-based declarative learning to learn a task, thus eliminating the need for programming expertise or long development times. The proposed framework is experimentally validated on 7 manipulation tasks and we show that CoBT achieves approximately 93% success rate overall with an average programming time of 7.5 s. We conduct a pilot study with non-expert users to provide feedback regarding the usability of CoBT. More videos and generated behavior trees are available at: https://github.com/jainaayush2006/CoBT.git.
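For readers unfamiliar with behavior trees, the reactive, modular structure CoBT generates can be illustrated with a minimal pure-Python sequence/fallback implementation. The pick-and-place tree at the end is hypothetical, not one produced by CoBT.

```python
class Node:
    def tick(self):
        raise NotImplementedError

class Action(Node):
    def __init__(self, fn): self.fn = fn
    def tick(self): return self.fn()          # returns "SUCCESS" or "FAILURE"

class Sequence(Node):
    """Succeeds only if all children succeed, ticked left to right."""
    def __init__(self, *children): self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Fallback(Node):
    """Succeeds as soon as any child succeeds."""
    def __init__(self, *children): self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"

# Hypothetical pick-and-place tree; each lambda stands in for a real robot skill.
tree = Sequence(Action(lambda: "SUCCESS"),    # move_to_object
                Action(lambda: "SUCCESS"),    # grasp
                Action(lambda: "SUCCESS"))    # place
print(tree.tick())
```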
|
|
10:30-12:00, Paper ThAT13-AX.8 | Add to My Program |
A Dynamic Planner for Safe and Predictable Human-Robot Collaboration |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Minelli, Marco | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Human-Aware Motion Planning, Safety in HRI
Abstract: The new face of modern industrial scenarios involves shared workspaces where humans and robots work closely together. To ensure safe human-robot collaboration (HRC), regulations have been updated with the introduction of ISO/TS 15066. However, complying with these regulations often leads to inefficient behavior, such as unnecessarily reducing robot speed or unpredictably changing the robot path, which may negatively affect the operator's perception of the robot. In this work, an optimal approach that addresses these two issues together is proposed. Starting from a desired final configuration, the framework plans a collision-free trajectory for the robot. Subsequently, predictability is taken into account and a set of virtual tubes within which the path of the robot may vary is built. Lastly, an optimization problem is solved online to ensure that the robot stays within these tubes and that the velocities comply with ISO/TS 15066. The proposed approach has been experimentally validated in two different scenarios: one composed of a mobile manipulator, i.e. a UR10e mounted on a Neobotix MPO-500, and one composed of only a collaborative manipulator, i.e. a UR5e.
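The velocity compliance the framework enforces follows the spirit of ISO/TS 15066 speed-and-separation monitoring. A much-simplified sketch of such a velocity cap is given below; it is a coarse approximation of the standard's protective separation distance, with hypothetical parameters, and is not the paper's optimization problem.

```python
def ssm_velocity_limit(d_sep, v_human, t_react, t_stop, c_margin):
    """Largest robot speed v_r such that human and robot travel during reaction
    and stopping stays below the current separation d_sep (simplified):
        d_sep >= v_human*(t_react + t_stop) + v_r*t_react + v_r*t_stop/2 + c_margin
    assuming the robot decelerates linearly to rest over t_stop."""
    budget = d_sep - c_margin - v_human * (t_react + t_stop)
    return max(0.0, budget / (t_react + 0.5 * t_stop))

# Example: 2 m separation, human at 1.6 m/s, 0.1 s reaction, 0.3 s stop, 0.2 m margin.
print(ssm_velocity_limit(2.0, 1.6, 0.1, 0.3, 0.2))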
|
|
10:30-12:00, Paper ThAT13-AX.9 | Add to My Program |
Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning |
|
Karnan, Haresh | The University of Texas at Austin |
Yang, Elvin | University of Michigan, Ann Arbor |
Warnell, Garrett | U.S. Army Research Laboratory |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Keywords: Vision-Based Navigation, Representation Learning, Autonomous Vehicle Navigation
Abstract: Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data re-collection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain preferences within the inertial-proprioceptive-tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain-awarE Robot Navigation (PATERN), a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial-proprioceptive-tactile measurements from the robot’s observations to a representation space and performs a nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN’s capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
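The nearest-neighbour preference extrapolation at the core of PATERN can be sketched in a few lines, assuming the inertial-proprioceptive-tactile embeddings have already been learned; the array names are hypothetical.

```python
import numpy as np

def extrapolate_preference(z_query, Z_known, prefs, k=5):
    """Nearest-neighbour preference lookup in the learned representation space.
    Z_known: (N, d) embeddings of terrains with known operator preference scores
    prefs:   (N,) preference scores aligned with Z_known
    z_query: (d,) embedding of the visually novel terrain."""
    d = np.linalg.norm(Z_known - z_query, axis=1)
    idx = np.argsort(d)[:k]
    return prefs[idx].mean()   # preference estimate for the novel terrain
```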
|
|
ThAT15-AX Oral Session, AX-203 |
Add to My Program |
Modeling and Simulating Humans |
|
|
Chair: Maruyama, Hisataka | Nagoya University |
Co-Chair: Sui, Yanan | Tsinghua University |
|
10:30-12:00, Paper ThAT15-AX.1 | Add to My Program |
Towards Unifying Human Likeness: Evaluating Metrics for Human-Like Motion Retargeting on Bimanual Manipulation Tasks |
|
Meixner, Andre | Karlsruhe Institute of Technology (KIT) |
Carl, Mischa | Karlsruhe Institute of Technology (KIT) |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Natural Machine Motion, Bimanual Manipulation
Abstract: Generating human-like robot motions is pivotal for achieving smooth human-robot interactions. Such motions contribute to better predictions of robot motions by humans, thus leading to more intuitive interaction and increased acceptability. Human likeness in robot motions has been conventionally measured and realized via the optimization of human-likeness metrics. However, the abundance of such metrics and the absence of standardized criteria impede their usage in novel contexts. In this work, we introduce a unified human-likeness metric built from a hierarchically weighted sum of individual metrics. The proposed metric is derived from a thorough analysis of eleven existing human-likeness criteria and is applicable across various tasks and robot models. We evaluate its performance in the context of motion retargeting of bimanual tasks with three different humanoid robots.
|
|
10:30-12:00, Paper ThAT15-AX.2 | Add to My Program |
MiBOT: A Head-Worn Robot That Modulates Cardiovascular Responses through Human-Like Soft Massage |
|
Mylaeus, Alice | ETH Zurich |
Vogt, Stephanie | ETH Zurich |
Demirel, Berken Utku | ETH Zurich |
Gort, Marcel | ETH Zurich |
Meboldt, Mirko | ETH Zurich |
Meier, Manuel | ETH Zurich |
Holz, Christian | ETH Zürich |
Keywords: Health Care Management, Medical Robots and Systems, Rehabilitation Robotics
Abstract: Massage therapy is helpful for the rehabilitation of various conditions, such as headaches caused by migraines and stress. Existing robotic systems have focused on massage therapy for the torso and limbs, but performing massage motions through suitable actuation on a person’s head has remained a challenge. In this paper, we present MiBOT, a head-worn massage robot that actuates two soft tactors to produce touch motions mimicking human massage. A key design principle behind MiBOT is its silent actuation, which we achieve through pneumatic artificial muscles in conjunction with a controller loop to respond to contact pressure. We evaluated the effectiveness of MiBOT in a controlled study and assessed subjects’ blood pressure and heart rate levels while applying MiBOT. We found that our mechanical system produced positive and conclusive quantitative outcomes similar to those of human-administered massage, decreasing participants’ mean systolic and diastolic blood pressure by 2.8 mmHg and 1.7 mmHg, respectively, as well as calming their heart rate by 8–10% on average.
|
|
10:30-12:00, Paper ThAT15-AX.3 | Add to My Program |
ESP: Extro-Spective Prediction for Long-Term Behavior Reasoning in Emergency Scenarios |
|
Wang, Dingrui | Technical University of Munich |
Lai, Zheyuan | Inceptio |
Li, Yuda | Inceptio Technology |
Wu, Yi | Nanjing University of Posts and Telecommunications |
Ma, Yuexin | ShanghaiTech University |
Betz, Johannes | Technical University of Munich |
Yang, Ruigang | University of Kentucky |
Li, Wei | Inceptio |
Keywords: Intelligent Transportation Systems, Datasets for Human Motion, Data Sets for Robot Learning
Abstract: Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from getting reliable predictions. In this paper, we build a new dataset, which aims at long-term prediction with inconspicuous state variation in history for the emergency event, named the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events of subseconds. Interestingly, as our ESP features can naturally be described in human-readable language, integrating them into ChatGPT also shows great potential. The ESP dataset and all benchmarks are released at https://dingrui-wang.github.io/ESP-Dataset/.
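The abstract does not reproduce the exact CTE definition; the sketch below is one plausible reading, offered purely as a labelled assumption, in which per-timestep displacement error is clamped before averaging so that a few very large deviations do not dominate.

```python
import numpy as np

def clamped_temporal_error(pred, gt, clamp=2.0):
    """One plausible reading of CTE (assumption; see the paper for the exact
    definition): Euclidean error per timestep, clamped at `clamp` metres,
    then averaged over the horizon. pred, gt: (T, 2) trajectories."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return np.minimum(err, clamp).mean()
```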
|
|
10:30-12:00, Paper ThAT15-AX.4 | Add to My Program |
SynthAct: Towards Generalizable Human Action Recognition Based on Synthetic Data |
|
Schneider, David | Karlsruhe Institute of Technology |
Keller, Marco | Karlsruhe Institute of Technology (KIT) |
Zhong, Zeyun | Karlsruhe Institute of Technology |
Peng, Kunyu | Karlsruhe Institute of Technology |
Roitberg, Alina | University of Stuttgart |
Beyerer, Jürgen | Fraunhofer Gesellschaft |
Stiefelhagen, Rainer | Karlsruhe Institute of Technology |
Keywords: Modeling and Simulating Humans, Datasets for Human Motion, Human and Humanoid Motion Analysis and Synthesis
Abstract: Synthetic data generation is a proven method for augmenting training sets without the need for extensive setups, yet its application in human activity recognition is underexplored. This is particularly crucial for human-robot collaboration in household settings, where data collection is often privacy-sensitive. In this paper, we introduce SynthAct, a synthetic data generation pipeline designed to significantly minimize the reliance on real-world data. Leveraging modern 3D pose estimation techniques, SynthAct can be applied to arbitrary 2D or 3D video action recordings, making it applicable for uncontrolled in-the-field recordings by robotic agents or smarthome monitoring systems. We present two SynthAct datasets: AMARV, a large synthetic collection with over 800k multi-view action clips, and Synthetic Smarthome, mirroring the Toyota Smarthome dataset. SynthAct generates a rich set of data, including RGB videos and depth maps from four synchronized views, 3D body poses, normal maps, segmentation masks and bounding boxes. We validate the efficacy of our datasets through extensive synthetic-to-real experiments on NTU RGB+D and Toyota Smarthome.
|
|
10:30-12:00, Paper ThAT15-AX.5 | Add to My Program |
Visual-Tactile Robot Grasping Based on Human Skill Learning from Demonstrations Using a Wearable Parallel Hand Exoskeleton |
|
Lu, Zhenyu | Bristol Robotics Laboratory |
Chen, Lu | Shanxi University |
Dai, Hengtai | University of Bristol |
Li, Haoran | University of Bristol |
Zhao, Zhou | Central China Normal University |
Zheng, Bofang | University of Bristol |
Lepora, Nathan | University of Bristol |
Yang, Chenguang | University of Liverpool |
Keywords: Modeling and Simulating Humans, Deep Learning in Grasping and Manipulation, Force and Tactile Sensing
Abstract: Soft fingers and strategic grasping skills enable human hands to grasp objects stably. This paper aims to model human grasping skills and transfer the learned skills to robots to improve grasping quality and success rate. First, we designed a wearable tool-like parallel hand exoskeleton equipped with optical tactile sensors to acquire multimodal information, including hand positions and postures, the relative distance of the exoskeleton claws, and tactile images. Using the demonstration data, we summarized three characteristics observed from human demonstrations: varying-speed actions, the grasping effect read from tactile images, and grasping strategies for different positions. These characteristics were then utilized in the robot skill modelling to achieve a more human-like grasp. Since no force sensors are fixed to the claws, we introduced a new variable, called "grasp depth", to represent the grasping effect on the object. The robot grasping strategy is constructed as follows: First, grasp quality is predicted using a linear array network (LAN) with global visual images as inputs. Conditions such as grasp width, depth, position, and angle are also predicted. Second, with the grasp width and depth of the object determined, dynamic movement primitives (DMPs) are employed to mimic human grasp actions with varying velocities. A final action adjustment based on tactile detection is performed near grasp time to further enhance grasp quality.
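Movement reproduction with DMPs, as used in the second step above, integrates a spring-damper system with a learned forcing term. A minimal one-dimensional discrete-DMP rollout is sketched below (standard textbook formulation with heuristic basis widths; the weights w would come from demonstration fitting, and this is not the authors' parameterization).

```python
import numpy as np

def dmp_rollout(y0, g, w, tau=1.0, alpha=25.0, beta=6.25, alpha_x=8.0,
                dt=0.001, T=1.0):
    """Discrete DMP: tau^2*ydd = alpha*(beta*(g - y) - tau*yd) + f(x),
    with Gaussian-basis forcing f shaped by weights w learned from a demo."""
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))   # basis centres along phase
    h = n / c                                          # heuristic basis widths
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) / psi.sum() * x * (g - y0)       # forcing term
        ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau**2
        yd += ydd * dt
        y += yd * dt
        x += (-alpha_x * x / tau) * dt                 # canonical (phase) system
        traj.append(y)
    return np.array(traj)
```

Scaling time via tau is what lets the same primitive reproduce the varying-speed actions observed in the demonstrations.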
|
|
10:30-12:00, Paper ThAT15-AX.6 | Add to My Program |
Workstation Suitability Maps: Generating Ergonomic Behaviors on a Population of Virtual Humans with Multi-Task Optimization |
|
Zhong, Jacques | CEA-List |
Weistroffer, Vincent | CEA LIST |
Mouret, Jean-Baptiste | Inria |
Colas, Francis | Inria Nancy Grand Est |
Maurice, Pauline | Cnrs - Loria |
Keywords: Modeling and Simulating Humans, Human and Humanoid Motion Analysis and Synthesis
Abstract: In industrial workstations, the morphology of the worker is a key factor for the feasibility and the ergonomics of an activity. Existing digital human modeling tools can simulate different morphologies at work, but hardly scale to a large population of workers because of limited consideration of morphology-specific behaviors and computational cost. This paper presents a framework to efficiently evaluate the suitability of a workstation over a large population of workers in a physics-based simulation. Activities are simulated through a two-step optimization process, involving a quadratic-programming based whole-body controller and a multi-task optimizer for behavioral adaptation. On a screwdriving scenario, we demonstrate how our framework can help ergonomists improve workstation designs thanks to the resulting suitability maps where generated behaviors are optimized for each morphology w.r.t. ergonomics and performance.
|
|
10:30-12:00, Paper ThAT15-AX.7 | Add to My Program |
Self Model for Embodied Intelligence: Modeling Full-Body Human Musculoskeletal System and Locomotion Control with Hierarchical Low-Dimensional Representation |
|
He, Kaibo | Tsinghua University |
Zuo, Chenhui | Tsinghua University |
Shao, Jing | Tsinghua University |
Sui, Yanan | Tsinghua University |
Keywords: Modeling and Simulating Humans, Human Factors and Human-in-the-Loop, Humanoid and Bipedal Locomotion
Abstract: Modeling and control of the human musculoskeletal system is important for understanding human motor functions, developing embodied intelligence, and optimizing human-robot interaction systems. However, current open-source models are restricted to a limited range of body parts and often with a reduced number of muscles. There is also a lack of algorithms capable of controlling over 600 muscles to generate reasonable human movements. To fill this gap, we build a musculoskeletal model with 90 body segments, 206 joints, and 700 muscle-tendon units, allowing simulation of full-body dynamics and interaction with various devices. We develop a new algorithm using low-dimensional representation and hierarchical deep reinforcement learning to achieve state-of-the-art full-body control. We validate the effectiveness of our model and algorithm in simulations with real human locomotion data. The musculoskeletal model, along with its control algorithm, will be made available to the research community to promote a deeper understanding of human motion control and better design of interactive robots.
|
|
10:30-12:00, Paper ThAT15-AX.8 | Add to My Program |
Keypoints-Guided Lightweight Network for Single-View 3D Human Reconstruction |
|
Chen, Yuhang | Southeast University |
Wang, Chenxing | Southeast University |
Keywords: Modeling and Simulating Humans, Human-Centered Automation, Deep Learning for Visual Perception
Abstract: Single-view 3D human reconstruction has been a hot topic due to its potential for wide application. To achieve high accuracy, existing works usually take computationally intensive models as the backbone to extract exhaustive underlying features and then directly estimate human mesh vertices. These factors lead to redundant parameters, large computations, and low efficiency, while lightweight solutions to address these challenges are relatively scarce. In this work, based on the problems studied above, we propose a keypoints-guided lightweight network with an encoding-decoding framework. As the input is an image, a lightweight backbone named the multi-stage and global feature enhanced network is designed for 2D encoding, where operations such as multi-scale fusion and frequency-domain filtering are performed to extract more informative but low-resolution features. As the output is a mesh of the human body, we construct a keypoints-based 3D human template, with which the 2D low-resolution features can be mapped to 3D space to guide the 3D decoding with high efficiency and high accuracy. Extensive experiments on the popular benchmarks 3DPW and Human3.6M illustrate the favorable trade-off between the accuracy and complexity of our method. Our code is publicly available at https://github.com/ChrisChenYh/EfficientHuman.git.
|
|
10:30-12:00, Paper ThAT15-AX.9 | Add to My Program |
Moving Horizon Estimation of Human Kinematics and Muscle Forces |
|
Ceglia, Amedeo | University of Montreal |
Bailly, François | INRIA, Université De Montpellier |
Begon, Mickael | University of Montreal |
Keywords: Modeling and Simulating Humans, Physical Human-Robot Interaction, Sensor-based Control
Abstract: Human-robot interaction based on real-time kinematics or electromyography (EMG) feedback improves rehabilitation using assist-as-needed strategies. Muscle forces are expected to provide even more comprehensive information than EMG to control these assistive rehabilitation devices. Measuring in vivo muscle force is challenging, leading to the development of numerical methods to estimate them. Due to their high computational cost, forward dynamics-based optimization algorithms were not viable for real-time estimation until recently. To achieve muscle forces estimation in real time, a moving horizon estimator (MHE) algorithm was used to track experimental biosignals. Two participants were equipped with EMG sensors and skin markers that were streamed in real time and used as targets for the MHE. The upper-limb musculoskeletal (MSK) model was composed of 10 degrees-of-freedom actuated by 31 muscles. The MHE relies on a series of overlapping trajectory optimization subproblems of which the following parameters have been adjusted: the fixed duration and the frame to export. We based this adjustment on the estimation delay, the muscle saturation, the joint kinematic mean power frequency, and errors to experimental data. Our algorithm provided consistent estimates of muscle forces and kinematics with visual feedback at 30 Hz with a 110 ms delay. This method is promising to guide rehabilitation and enrich assistive device control laws with personalized force estimations.
|
|
ThAT16-AX Oral Session, AX-204 |
Add to My Program |
Humanoid and Bipedal Locomotion |
|
|
Chair: Yi, Jingang | Rutgers University |
Co-Chair: Cheng, Gordon | Technical University of Munich |
|
10:30-12:00, Paper ThAT16-AX.1 | Add to My Program |
Reinforcement Learning with Energy-Exchange Dynamics for Spring-Loaded Biped Robot Walking |
|
Kuo, Cheng-Yu | Nara Institute of Science and Technology |
Shin, Hirofumi | Honda R&D Co., Ltd |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Model Learning for Control, Reinforcement Learning
Abstract: This paper presents a probabilistic Model-based Reinforcement Learning (MBRL) approach for learning the Energy-exchange Dynamics (EED) of a spring-loaded biped robot. Our approach enables on-site walking acquisition with high sample efficiency, real-time planning capability and generalizability across skill conditions. Specifically, we learn the data-driven state transition dynamics of the robot in the formulation of energy-states, with their interaction characterized as energy-exchange to reduce dimensionality. To improve planning reliability with the learned EED, we design a control space based on a walking trajectory that follows the law of conservation of energy and is formulated by energy-states. We evaluated our approach using a four-degree-of-freedom spring-loaded biped robot in simulation and hardware, and generalizability is validated by using the same learning framework for different walking speeds and terrains in simulation and walking acquisition with hardware. All results showed successful on-site walking acquisition with a compact nine-dimension dynamics model, 40 Hz real-time planning, and on-site learning within a few minutes.
|
|
10:30-12:00, Paper ThAT16-AX.2 | Add to My Program |
Foot Shape-Dependent Resistive Force Model for Bipedal Walkers on Granular Terrains |
|
Chen, Xunjie | Rutgers University |
Aditya, Anikode | Rutgers University |
Yi, Jingang | Rutgers University |
Liu, Tao | Zhejiang University |
Keywords: Humanoid and Bipedal Locomotion, Modeling and Simulating Humans
Abstract: Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth correction. The enhanced force model captures fundamental kinetic results considering the robot foot shape, walking gait speed variation, and energy expense. The model is validated by extensive foot intrusion experiments with a bipedal robot. The results confirm the model accuracy on the given type of granular terrains. The model can be further integrated with the motion control of bipedal robotic walkers.
|
|
10:30-12:00, Paper ThAT16-AX.3 | Add to My Program |
Adaptive Passive Biped Dynamic Walking on Unknown Uneven Terrain |
|
Pu, Lishen | Tongji University |
Liu, Yixuan | Tongji University |
Zheng, Aiqun | Shanghai New Tobacco Product Research Institute Co., Ltd |
Qi, Bofeng | Department of Control Science & Engineering, Tongji University |
Xu, Chunquan | Tongji University |
Keywords: Humanoid and Bipedal Locomotion, Passive Walking, Robust/Adaptive Control
Abstract: In this paper, we propose an adaptive controller for virtual passive biped dynamic walking on unknown uneven terrain. The adaptive controller consists of a trajectory tracking control law, developed via the backstepping method to mimic a reference passive gait, and a slope estimator for the inclination angle of the terrain. In addition, a re-planning approach is introduced to correct the robot state when it drifts off-track from the reference gait due to terrain changes. The controller is validated through simulations on mixed uneven terrain consisting of varying slopes and steps. The results suggest that the controller achieves a comparable cost of transport and greater adaptability to terrain changes compared with certain existing methods.
|
|
10:30-12:00, Paper ThAT16-AX.4 | Add to My Program |
HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot Via Wasserstein Adversarial Imitation |
|
Tang, Annan | The University of Tokyo |
Hiraoka, Takuma | The University of Tokyo |
Hiraoka, Naoki | The University of Tokyo |
Shi, Fan | ETH Zürich |
Kawaharazuka, Kento | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Humanoid and Bipedal Locomotion, Reinforcement Learning, Human and Humanoid Motion Analysis and Synthesis
Abstract: Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with Reinforcement Learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific Integral Probability Metric (IPM), namely the Wasserstein-1 distance with a novel soft boundary constraint, to stabilize the training process and prevent model collapse. Our system is evaluated on a full-sized humanoid JAXON in the simulator. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even in the absence of transition motions in the demonstration dataset, the robot showcases an emerging ability to transition naturally between distinct locomotion patterns as the desired speed changes.
|
|
10:30-12:00, Paper ThAT16-AX.5 | Add to My Program |
Online Adaptive Motion Generation for Humanoid Locomotion on Non-Flat Terrain Via Template Behavior Extension (I) |
|
Meng, Xiang | Beijing Institute of Technology |
Yu, Zhangguo | Beijing Institute of Technology |
Chen, Xuechao | Beijing Insititute of Technology |
Huang, Zelin | Beijing Institute of Technology |
Meng, Fei | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Keywords: Humanoid and Bipedal Locomotion, Whole-Body Motion Planning and Control
Abstract: For humanoid robots, online motion generation on non-flat terrain remains an ongoing research challenge. Computational complexity is one of the primary restrictions that preclude motion planners from generating adaptive behaviors online. In this paper, we investigate this problem and decompose it into two sequential components: an Efficient Behavior Generator (EBG) and a Nonlinear Centroidal Model Predictive Controller (NC-MPC). The EBG is responsible for optimizing the physically feasible whole-body template behaviors, which can provide reliable warm-starts for NC-MPC, thereby greatly reducing the computational effort of online planning. With tailored objective function and feet complementary constraints, the EBG can search for a near-optimal solution after several iterations within seconds for different behaviors including walking, running, and jumping, even with intuitive initial guesses. To make the template behaviors extensible when the robot encounters possible different scenarios, the NC-MPC is proposed to regenerate the reactive motion online to adapt it to the real local environment. Finally, we validate the effectiveness of synthesizing EBG and NC-MPC for humanoid locomotion on non-flat terrain in simulation and on the real humanoid robot BHR7P.
|
|
10:30-12:00, Paper ThAT16-AX.6 | Add to My Program |
A Bio-Plausible Approach to Realizing Heat-Evoked Nociceptive Withdrawal Reflex on the Upper Limb of a Humanoid Robot |
|
Wang, Fengyi | Technical University of Munich |
Guadarrama-Olvera, Julio Rogelio | Technical University of Munich |
Thakor, Nitish V. | Johns Hopkins University, Baltimore, USA |
Cheng, Gordon | Technical University of Munich |
Keywords: Humanoid Robot Systems, Biomimetics
Abstract: In this letter, we present a method for realizing the heat-evoked nociceptive withdrawal reflex (NWR) in the upper limb of a humanoid robot so that it can avoid the potential damage caused by noxious heat. We use a spiking neuron network whose structure, encoding scheme, and form of information transmission mimic the reflex arc in humans to improve bio-plausibility. The proper synaptic strengths between the sensory neurons and the interneurons in the first two layers are learned using the bio-plausible reward-modulated spike timing-dependent plasticity learning algorithm. By monitoring the spikes from the motor neuron in the third layer, a reflex matching the intensity of the stimulation can be evoked. Experimental evaluations show that noxious heat stimulation can be detected online and evoke the NWR. The experiments on a full-size humanoid robot show that the method enables robots to avoid potential damage robustly with a proper NWR, depending on the site and intensity of the stimulation. We also verify that the method takes advantage of the intrinsic characteristics of its neuromorphic encoding scheme to reproduce essential features of the NWR, e.g., the spatial summation effect and temporal summation effect in humans. The improved bio-plausibility and the capability to reproduce the human-like features make the proposed method suitable for devices that provide perceptual feedback to human users and allow local processing with low energy consumption, such as cognitive prostheses.
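The building block of such a spiking reflex arc is the leaky integrate-and-fire neuron. A minimal sketch is given below (generic LIF dynamics; the paper's specific heat encoding and learned synapses are not reproduced).

```python
import numpy as np

def lif_spikes(current, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential integrates the
    (heat-encoded) input current, leaks with time constant tau, and emits a
    spike whenever it crosses threshold, after which it is reset."""
    v, spikes = v_reset, []
    for I in current:
        v += dt * (-v / tau + I)
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return np.array(spikes)

# Stronger (more "noxious") input drives a higher firing rate.
print(lif_spikes(np.full(1000, 120.0)).sum(), lif_spikes(np.full(1000, 60.0)).sum())
```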
|
|
10:30-12:00, Paper ThAT16-AX.7 | Add to My Program |
Fall Prediction for Bipedal Robots: The Standing Phase |
|
Mungai, M. Eva | University of Michigan |
Grizzle, J.W | University of Michigan |
Prabhakaran, Gokul | University of Michigan |
Keywords: Humanoid Robot Systems, Failure Detection and Recovery, Humanoid and Bipedal Locomotion
Abstract: This paper presents a novel approach to fall prediction for bipedal robots, specifically targeting the detection of potential falls while standing caused by abrupt, incipient, and intermittent faults. Leveraging a 1D convolutional neural network (CNN), our method aims to maximize lead time for fall prediction while minimizing false positive rates. The proposed algorithm uniquely integrates the detection of various fault types and estimates the lead time for potential falls. Our contributions include the development of an algorithm capable of detecting abrupt, incipient, and intermittent faults in full-sized robots, its implementation using both simulation and hardware data for a humanoid robot, and a method for estimating lead time. Evaluation metrics, including false positive rate, lead time, and response time, demonstrate the efficacy of our approach. In particular, our model achieves long lead times and fast response times across different fault scenarios with a false positive rate of zero. The findings of this study hold significant implications for enhancing the safety and reliability of bipedal robotic systems.
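A minimal sketch of the kind of 1D convolutional classifier described above, operating on a sliding window of sensor channels; the layer sizes, channel count, and window length are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FallPredictor(nn.Module):
    """1D CNN over a window of proprioceptive channels, emitting one logit:
    fall imminent vs. nominal (illustrative architecture only)."""
    def __init__(self, n_channels=12, window=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1))

    def forward(self, x):                 # x: (batch, channels, window)
        return self.net(x)

model = FallPredictor()
logits = model(torch.randn(8, 12, 100))   # 8 windows of sensor history
```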
|
|
10:30-12:00, Paper ThAT16-AX.8 | Add to My Program |
Shape-Changing Robotic Mannequin Shoulder with Bio-Inspired Layered Structure |
|
Long, Juncai | Zhejiang University |
Li, Jituo | Zhejiang University |
Lu, Yiwen | Zhejiang University |
Zhou, Chengdi | Zhejiang University |
Lu, GuoDong | Zhejiang University |
Feng, Yixiong | Zhejiang University |
Keywords: Humanoid Robot Systems, Modeling, Control, and Learning for Soft Robots
Abstract: A shape-changing robotic mannequin is a humanoid robot for imitating the shapes of human bodies. The diversity of human bodies makes it difficult to imitate various body shapes, especially the shoulders. This paper proposes a rigid-flexible-soft coupled three-layered robotic mannequin shoulder inspired by human body anatomy. The robotic mannequin can adjust the anisotropic deformation of its human-like skin to imitate body dimensions, shape details and surface curvatures of target bodies. Structurally, the inner skeleton layer is composed of a rigid framework and linear actuators for changing the global body dimensions. The middle muscle layer consists of flexible patches and layer-jamming bars with tunable stiffness for controlling the surface curvatures. The outer soft skin layer envelops the patches, forming a human-like surface of the robotic mannequin. To imitate a human body, the linear actuators drive the patches forward, which deforms the elastic skin layer. The tensioned skin layer inversely drives the bending deformation of the patches, which can be controlled by the layer-jamming bars. We design the three-layered structure by analyzing the shape differences of hundreds of scanned human models. An energy-based method is proposed to predict and control the coupled deformation of the layered structure. A physical robotic shoulder prototype has been built to verify the effectiveness of our method.
|
|
10:30-12:00, Paper ThAT16-AX.9 | Add to My Program |
UKF-Based Sensor Fusion for Joint-Torque Sensorless Humanoid Robots |
|
Sorrentino, Ines | Istituto Italiano Di Tecnologia |
Romualdi, Giulio | Istituto Italiano Di Tecnologia |
Pucci, Daniele | Italian Institute of Technology |
Keywords: Humanoid Robot Systems, Physical Human-Robot Interaction, Sensor Fusion
Abstract: This paper proposes a novel sensor fusion based on Unscented Kalman Filtering for the online estimation of joint torques of humanoid robots without joint-torque sensors. At the feature level, the proposed approach considers multi-modal measurements (e.g. currents, accelerations, etc.) and non-directly measurable effects, such as external contacts, thus leading to joint torques readily usable in control architectures for human-robot interaction. The proposed sensor fusion can also integrate distributed, non-collocated force/torque sensors, thus being a flexible framework with respect to the underlying robot sensor suite. To validate the approach, we show how the proposed sensor fusion can be integrated into a two-level torque control architecture aiming at task-space torque control. The performance of the proposed approach is shown through extensive tests on the new humanoid robot ergoCub, currently being developed at Istituto Italiano di Tecnologia. We also compare our strategy with the existing state-of-the-art approach based on the recursive Newton-Euler algorithm. Results demonstrate that our method achieves low root mean square errors in torque tracking, ranging from 0.05 Nm to 2.5 Nm, even in the presence of external contacts.
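At the heart of any Unscented Kalman Filter is the unscented transform, which propagates a mean and covariance through a nonlinearity via sigma points. A minimal numpy sketch with classic Julier weights follows; a full filter wraps prediction and measurement-update steps around this, and nothing here reproduces the paper's specific process or measurement models.

```python
import numpy as np

def unscented_transform(mu, P, f, kappa=0.0):
    """Propagate mean mu and covariance P through nonlinearity f via 2n+1
    sigma points (Julier weights with parameter kappa)."""
    n = len(mu)
    S = np.linalg.cholesky((n + kappa) * P)            # columns are the offsets
    sigmas = np.vstack([mu, mu + S.T, mu - S.T])       # (2n+1, n) sigma points
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(s) for s in sigmas])               # transformed points
    mu_y = w @ Y
    Py = (w[:, None] * (Y - mu_y)).T @ (Y - mu_y)
    return mu_y, Py
```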
|
|
ThAT17-AX Oral Session, AX-205 |
Add to My Program |
Reactive and Sensor-Based Planning |
|
|
Chair: Akai, Naoki | Nagoya University |
Co-Chair: Pantic, Michael | ETH Zürich |
|
10:30-12:00, Paper ThAT17-AX.1 | Add to My Program |
Waverider: Leveraging Hierarchical, Multi-Resolution Maps for Efficient and Reactive Obstacle Avoidance |
|
Reijgwart, Victor | ETH Zurich |
Pantic, Michael | ETH Zürich |
Siegwart, Roland | ETH Zurich |
Ott, Lionel | ETH Zurich |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Aerial Systems: Applications
Abstract: Fast and reliable obstacle avoidance is an important task for mobile robots. In this work, we propose an efficient reactive system that provides high-quality obstacle avoidance while running at hundreds of hertz with minimal resource usage. Our approach combines wavemap, a hierarchical volumetric map representation, with a novel hierarchical and parallelizable obstacle avoidance algorithm formulated through Riemannian Motion Policies (RMP). Leveraging multi-resolution obstacle avoidance policies, the proposed navigation system facilitates precise, low-latency (36ms), and extremely efficient obstacle avoidance with a very large perceptive radius (30m). We perform extensive statistical evaluations on indoor and outdoor maps, verifying that the proposed system compares favorably to fixed-resolution RMP variants and CHOMP. Finally, the RMP formulation allows the seamless fusion of obstacle avoidance with additional objectives, such as goal-seeking, to obtain a fully-fledged navigation system that is versatile and robust. We deploy the system on a Micro Aerial Vehicle and show how it navigates through an indoor obstacle course. Our complete implementation, called waverider, is made available as open source.
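The RMP machinery referenced above resolves multiple policies by metric-weighted averaging: each policy contributes an acceleration f_i with an importance metric A_i, and the combined acceleration is (sum A_i)^+ (sum A_i f_i). A minimal sketch of that resolve step (generic RMP algebra, not the wavemap-specific hierarchy):

```python
import numpy as np

def combine_rmps(policies):
    """Metric-weighted combination of Riemannian Motion Policies.
    policies: list of (f_i, A_i) pairs, f_i an acceleration vector and
    A_i its (positive semi-definite) importance metric."""
    A_sum = sum(A for _, A in policies)
    fA_sum = sum(A @ f for f, A in policies)
    return np.linalg.pinv(A_sum) @ fA_sum
```

This is what lets obstacle-avoidance and goal-seeking policies be fused seamlessly: each term simply contributes another (f, A) pair.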
|
|
10:30-12:00, Paper ThAT17-AX.2 | Add to My Program |
Whisker-Based Tactile Navigation Algorithm for Underground Robots |
|
Kossas, Tanel | Tallinn University of Technology |
Remmas, Walid | Tallinn University of Technology / Université De Montpellier |
Gkliva, Roza | Tallinn University of Technology |
Ristolainen, Asko | Tallinn University of Technology |
Kruusmaa, Maarja | Tallinn University of Technology (TalTech) |
Keywords: Reactive and Sensor-Based Planning, Autonomous Vehicle Navigation, Biologically-Inspired Robots
Abstract: This work explores the use of artificial whiskers as tactile sensors for enhancing the perception and navigation capabilities of mobile robots in challenging settings such as caves and underground mines. These environments exhibit inconsistent lighting conditions, locally self-similar textures, and general poor visibility conditions, that can cause the performance of state-of-the-art vision-based methods to decline. In order to evaluate the efficacy of tactile sensing in this context, three algorithms were developed and tested with simulated and physical experiments: a wall-follower, a navigation algorithm based on Theta*, and a hybrid approach that combines the two. The obtained results highlight the efficacy of tactile sensing for wall-following in intricate environments. When paired with an external method for pose estimation, it further aids in navigating unknown environments. Moreover, by integrating navigation with wall-following, the third, hybrid algorithm enhanced the map traversal speed by roughly 26–43% compared to standard navigation methods without wall-following.
|
|
10:30-12:00, Paper ThAT17-AX.3 | Add to My Program |
Spline-Interpolated Model Predictive Path Integral Control with Stein Variational Inference for Reactive Navigation |
|
Miura, Takato | Nagoya University |
Akai, Naoki | Nagoya University |
Honda, Kohei | Nagoya University |
Hara, Susumu | Nagoya University |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Autonomous Vehicle Navigation
Abstract: This paper presents a reactive navigation method based on model predictive path integral (MPPI) control with spline interpolation of the control input sequence and Stein variational gradient descent (SVGD). MPPI formulates a non-linear optimization problem that determines an optimal control input sequence and solves it using a sampling-based method. Results obtained with MPPI depend significantly on the sampling noise. To quickly find paths that avoid large and/or newly detected obstacles, the sampling noise should be set large. However, large noise yields non-smooth control input sequences, resulting in non-smooth paths. To prevent this problem, we first introduce spline interpolation of the control input sequence into the MPPI process. Owing to the interpolation, smooth control input sequences can be obtained even when large sampling noise is used. However, a vanilla MPPI algorithm still does not work in cases where there are optimal and near-optimal solutions in one scene, e.g., several paths that avoid obstacles, because MPPI assumes that the distribution over the optimal control input sequence can be approximated by a Gaussian distribution. To overcome this problem, we further apply SVGD to MPPI with spline interpolation. SVGD is based on optimal transport and has the property of concentrating samples around an optimal sample. As a result, we achieve robust reactive navigation that quickly finds a path to avoid obstacles while keeping the control input sequence smooth. We validate our proposals in a quadrotor simulator. Results show that our proposal, involving both spline interpolation and SVGD, outperforms other baseline methods.
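A compact sketch of the spline-interpolated MPPI update described above: noise is injected at a few spline knots, dense controls come from cubic-spline interpolation, and the knots are updated with the usual exponentially weighted average. The SVGD component and the dynamics rollout are omitted; rollout_cost is a hypothetical stand-in for simulating the system and scoring the trajectory.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def mppi_spline_step(u_knots, rollout_cost, n_samples=256, sigma=0.5,
                     lam=1.0, horizon=50):
    """One MPPI update for a scalar control parameterized by spline knots.
    Large sigma explores aggressively, yet the interpolated dense control
    sequence stays smooth."""
    n_knots = len(u_knots)
    t_knots = np.linspace(0.0, 1.0, n_knots)
    t_dense = np.linspace(0.0, 1.0, horizon)
    eps = sigma * np.random.randn(n_samples, n_knots)   # knot-level noise
    costs = np.empty(n_samples)
    for i in range(n_samples):
        u_dense = CubicSpline(t_knots, u_knots + eps[i])(t_dense)
        costs[i] = rollout_cost(u_dense)                # simulate and score
    w = np.exp(-(costs - costs.min()) / lam)            # path-integral weights
    w /= w.sum()
    return u_knots + w @ eps                            # weighted knot update
```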
|
|
10:30-12:00, Paper ThAT17-AX.4 | Add to My Program |
RETOM: Leveraging Maneuverability for Reactive Tool Manipulation Using Wrench-Fields |
|
Eberle, Felix | Technical University of Munich |
Laha, Riddhiman | Technical University of Munich |
Yao, Haowen | Technical Univerity of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Haddadin, Sami | Technical University of Munich |
Keywords: Reactive and Sensor-Based Planning, Manipulation Planning, Motion and Path Planning
Abstract: This paper investigates the problem of effective tool manipulation for motion planning in complex human-like scenarios. Vector-field-based real-time strategies, although widely used, usually do not account for unwieldy tools or incorporate systematic methods to handle the extra maneuvers needed. Instead, we formalize the problem and propose a novel field-based reactive planner that explicitly accounts for rotational forces for seamless maneuvers based on the tool’s geometry and featured points. Furthermore, we capture and encode robot performance through capability metrics and improve them using an additional quality distribution method. This enables seamless integration of the robot’s embodiment with the reactive force-torque (wrench) field, giving rise to flexible tool usage in non-stationary environments. Extensive simulation analysis on a 7 DoF collaborative robot manipulating a common tool in an unorganized table-top layout reinforces our claim of robustness in stationary and non-stationary scenarios.
|
|
10:30-12:00, Paper ThAT17-AX.5 | Add to My Program |
Optimal Prescribed-Time Control Based Reactive Planning System for Quadruped Robot Navigation |
|
Xu, Shaohang | Huazhong University of Science and Technology |
Zhang, Wentao | Huazhong University of Science and Technology |
Ho, Chin Pang | City University of Hong Kong |
Zhu, Lijun | Huazhong University of Science and Technology |
Keywords: Reactive and Sensor-Based Planning, Motion and Path Planning, Collision Avoidance
Abstract: In this paper, we propose a reactive planning system for quadruped robots based on prescribed-time control. The navigation of the quadruped robot is fundamentally depicted as omnidirectional movements, while a feedback control law is formulated to address any deviations the robot may encounter. In particular, our proposed feedback control system is theoretically proven to achieve convergence within a predefined finite time that is specified by the user. To further compute the optimal convergent time and the local goal state, we present a high-level planning node encompassing terrain-aware kinodynamic search and spatiotemporal trajectory optimization, which can generate collision-free, smooth, and efficient trajectories. The effectiveness of our proposed framework is validated through both numerical simulation and real-robot experiments in indoor and outdoor environments, including scenarios with cluttered obstacles, slopes, and external disturbances.
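Prescribed-time control typically relies on a time-varying gain that grows unbounded as the user-specified deadline T approaches. The single-integrator toy example below (not the paper's control law) shows an error e being driven to zero exactly within the prescribed time.

```python
import numpy as np

def prescribed_time_gain(t, T, k=2.0):
    """Gain that blows up as t -> T; with e' = -gain * e the closed-form
    solution is e(t) = e0 * ((T - t)/T)**k, so e(T) = 0 for any e0."""
    return k / max(T - t, 1e-6)

e, dt, T = 1.0, 0.001, 2.0
for i in range(int(T / dt) - 1):
    t = i * dt
    e += -prescribed_time_gain(t, T) * e * dt   # forward-Euler integration
print(abs(e))   # ~0 just before t = T, independent of the initial error
```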
|
|
10:30-12:00, Paper ThAT17-AX.6 | Add to My Program |
On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality |
|
Ai, Letian | Johns Hopkins University |
Liu, Yihao | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Kheradmand, Amir | Johns Hopkins University |
Martin-Gomez, Alejandro | Johns Hopkins University |
Keywords: Virtual Reality and Interfaces, Medical Robots and Systems, Surgical Robotics: Planning
Abstract: Robotic-assisted medical systems (RAMS) have gained significant attention for their advantages in alleviating surgeons' fatigue and improving patients' outcomes. These systems comprise a range of human-computer interactions, including medical scene monitoring, anatomical target planning, and robot manipulation. However, despite its versatility and effectiveness, RAMS demands expertise in robotics, leading to a high learning cost for the operator. In this work, we introduce a novel framework using mixed reality technologies to ease the use of RAMS. The proposed framework achieves real-time planning and execution of medical instruments by providing 3D anatomical image overlay, human-robot collision detection, and robot programming interface. These features, integrated with an easy-to-use calibration method for head-mounted display, improve the effectiveness of human-robot interactions. To assess the feasibility of the framework, two medical applications are presented in this work: 1) coil placement during transcranial magnetic stimulation and 2) drill and injector device positioning during femoroplasty. Results from these use cases demonstrate its potential to extend to a wider range of medical scenarios.
|
|
10:30-12:00, Paper ThAT17-AX.7 | Add to My Program |
K-VIL: Keypoints-Based Visual Imitation Learning |
|
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Tao, Zhi | Karlsruhe Institute of Technology |
Jaquier, Noémie | Karlsruhe Institute of Technology |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Learning from Demonstration, Visual Learning, Manipulation Planning, Learning of Geometric Constraints
Abstract: Visual imitation learning provides efficient and intuitive solutions for robotic systems to acquire novel manipulation skills. However, simultaneously learning geometric task constraints and control policies from visual inputs alone remains a challenging problem. In this paper, we propose the keypoint-based visual imitation learning (K-VIL) approach that automatically extracts sparse, object-centric, and embodiment-independent task representations from a small number of human demonstration videos. The task representation is composed of keypoint-based geometric constraints on principal manifolds, their associated local frames, and the movement primitives that are then needed for the task execution. Our approach is capable of extracting such task representations from a single demonstration video, and of incrementally updating them when new demonstrations are available. To reproduce manipulation skills using the learned set of prioritized geometric constraints in novel scenes, we introduce a novel keypoint-based admittance controller. We evaluate our approach in several real-world applications, showcasing its ability to deal with cluttered scenes, viewpoint mismatch, new instances of categorical objects, and large object pose and shape variations.
|
|
10:30-12:00, Paper ThAT17-AX.8 | Add to My Program |
Circular Field Motion Planning for Highly-Dynamic Multi-Robot Systems with Application to Robot Soccer |
|
Zeug, Fabrice | Gottfried Wilhelm Leibniz Universität Hannover |
Becker, Marvin | Gottfried Wilhelm Leibniz Universität Hannover |
Müller, Matthias A. | Gottfried Wilhelm Leibniz Universität Hannover |
Keywords: Reactive and Sensor-Based Planning, Collision Avoidance, Path Planning for Multiple Mobile Robots or Agents
Abstract: The rise of autonomous driving in everyday life makes efficient and collision-free motion planning more important than ever. However, multi-robot applications in highly dynamic environments still pose hard challenges for state-of-the-art motion planners. In this paper, we present a new iteration of a reactive circular fields motion planner, focusing on the simultaneous control of multiple robots in robotic soccer games, which is able to operate omnidirectional robots safely and efficiently despite high measurement delays and inaccuracies. Our extension enables the definition and effective execution of complex tasks in soccer-specific problems. We extensively evaluated our planner in several complex simulation environments and experimentally verified the approach in realistic scenarios on real soccer robots. Furthermore, we demonstrated the capabilities of our motion planner during successful participation in RoboCup 2022 and 2023.
|
|
10:30-12:00, Paper ThAT17-AX.9 | Add to My Program |
Autonomous Mapless Navigation on Uneven Terrains |
|
Jardali, Hassan | Indiana University |
Ali, Mahmoud | Indiana University |
Liu, Lantao | Indiana University |
Keywords: Reactive and Sensor-Based Planning, Autonomous Vehicle Navigation
Abstract: We propose a new method for autonomous navigation on uneven terrains by utilizing a sparse Gaussian Process (SGP) based local perception model. The SGP local perception model is trained on local ranging observations (point clouds) to learn the terrain elevation profile and extract feasible navigation subgoals around the robot. Subsequently, a cost function, which prioritizes the safety of the robot by keeping the robot's roll and pitch angles bounded within a specified range, is used to select a safety-aware subgoal that leads the robot to its final destination. The algorithm is designed to run in real-time and is intensively evaluated in simulation and real-world experiments. The results compellingly demonstrate that our proposed algorithm consistently navigates uneven terrains with high efficiency and surpasses the performance of other planners. The implementation of our method, including the supplementary video showing the experimental and real-world results, is available at https://rb.gy/3ov2r8.
|
|
ThAT18-AX Oral Session, AX-206 |
Add to My Program |
Optimization and Optimal Control III |
|
|
Chair: Del Prete, Andrea | University of Trento |
Co-Chair: Calinon, Sylvain | Idiap Research Institute |
|
10:30-12:00, Paper ThAT18-AX.1 | Add to My Program |
Optimal Control of Granular Material |
|
Aoyama, Yuichiro | Georgia Institute of Technology |
Haeri, Amin | Concordia University |
Theodorou, Evangelos | Georgia Institute of Technology |
Keywords: Optimization and Optimal Control, Machine Learning for Robot Control, Industrial Robots
Abstract: The control of granular materials, which are found in many industrial applications, is a challenging open research problem. Granular material systems exhibit complex behavior (solid-, fluid-, and gas-like regimes) and are high-dimensional (many grains/particles, each with at least 3 DOF in 3D). Recently, a machine learning-based Graph Neural Network (GNN) simulator has been proposed to learn the underlying dynamics. In this paper, we perform optimal control of a rigid body-driven granular material system whose dynamics is learned by a GNN model trained on reduced data generated via a physics-based simulator and Principal Component Analysis (PCA). We use Differential Dynamic Programming (DDP) to obtain optimal control commands that can form granular particles into a target shape. The model and results are shown to be relatively fast and accurate. The control commands are also applied to the ground truth model, i.e., the physics-based simulator, to further validate the approach.
|
|
10:30-12:00, Paper ThAT18-AX.2 | Add to My Program |
CACTO: Continuous Actor-Critic with Trajectory Optimization---Towards Global Optimality |
|
Grandesso, Gianluigi | University of Trento |
Alboni, Elisa | University of Trento |
Rosati Papini, Gastone Pietro | University of Trento |
Wensing, Patrick M. | University of Notre Dame |
Del Prete, Andrea | University of Trento |
Keywords: Optimization and Optimal Control, Machine Learning for Robot Control, Reinforcement Learning
Abstract: This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The motivations behind this algorithm are the two main limitations of TO and RL when applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a “good” minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a “good” control policy via TO-guided RL policy search that, when used as an initial-guess provider for TO, makes the trajectory optimization process less prone to converging to poor local optima. Our method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems, including a car model with a 6D state and a 3-joint planar manipulator. Our results show the great capability of CACTO to escape local minima while being more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.
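As a rough numerical illustration of this interplay, here is a toy sketch: a gradient-based trajectory optimizer stands in for TO, a scalar linear policy stands in for the actor network, TO solutions pull the policy toward optimal actions, and the policy in turn warm-starts the next TO problem. Every model and name here is a hypothetical simplification of CACTO, not the authors' algorithm.

```python
import numpy as np

# Toy sketch of the TO/RL loop: TO solves single-integrator reaching
# problems, its solutions nudge the policy gain k (u = -k x), and the policy
# provides the initial guess for the next TO problem.

rng = np.random.default_rng(0)
T, dt = 20, 0.1

def rollout_cost(x0, U):                 # reach the origin at low control effort
    x, c = x0, 0.0
    for u in U:
        c += (x**2 + 0.1 * u**2) * dt
        x = x + u * dt                   # single-integrator dynamics
    return c + 10.0 * x**2               # terminal cost

def trajectory_optimization(x0, U, lr=0.2, iters=100):
    U = U.copy()                         # finite-difference gradient descent
    for _ in range(iters):
        g = np.zeros_like(U)
        for i in range(T):
            dU = np.zeros_like(U); dU[i] = 1e-4
            g[i] = (rollout_cost(x0, U + dU) - rollout_cost(x0, U - dU)) / 2e-4
        U -= lr * g
    return U, rollout_cost(x0, U)

k = 0.0                                  # policy gain, u = -k x
for episode in range(10):
    x0 = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0)
    U_warm = np.array([-k * x0] * T)     # policy warm-starts TO
    U_star, J = trajectory_optimization(x0, U_warm)
    k = 0.9 * k + 0.1 * (-U_star[0] / x0)  # nudge the policy toward TO's action
    print(f"episode {episode}: x0={x0:+.2f} cost={J:.3f} gain k={k:.3f}")
```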
|
|
10:30-12:00, Paper ThAT18-AX.3 | Add to My Program |
Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing |
|
Xue, Haoru | Carnegie Mellon University |
Zhu, Edward | University of California, Berkeley |
Dolan, John M. | Carnegie Mellon University |
Borrelli, Francesco | University of California, Berkeley |
Keywords: Optimization and Optimal Control, Model Learning for Control, Autonomous Vehicle Navigation
Abstract: This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, linear, data-driven learning of the error dynamics. We conducted experiments in simulation and on 1/10th-scale hardware, and deployed the proposed LMPC on a full-scale autonomous race car used in the Indy Autonomous Challenge (IAC), with closed-loop experiments at the Putnam Park Road Course in Indiana, USA. The results show that the proposed control policy exhibits improved robustness to parameter tuning and data scarcity. Incremental and safety-aware exploration toward the limit of handling and iterative learning of the vehicle dynamics in high-speed domains are observed both in simulations and experiments.
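The error-dynamics regression idea can be sketched compactly: predictions of a nominal physics model are corrected by a linear model of the prediction error, fit by least squares on collected data. The toy "true" and nominal models below are illustrative stand-ins (the paper fits locally, weighting data near the current operating point; the sketch fits globally for brevity).

```python
import numpy as np

# Sketch: learn e = x_next - f_nominal(x, u) as a linear function of (x, u)
# and use it to correct the nominal model. Dynamics below are placeholders.

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200
A_true = np.array([[1.0, 0.1, 0.0], [0.0, 0.95, 0.1], [0.0, 0.0, 0.9]])
B_true = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])

def f_nominal(x, u):                    # simplified physics-based model
    return x + 0.1 * np.array([x[1], u[0], u[1]])

X, U = rng.standard_normal((N, n)), rng.standard_normal((N, m))
E, Phi = [], []
for x, u in zip(X, U):
    x_next = A_true @ x + B_true @ u    # "measured" next state
    E.append(x_next - f_nominal(x, u))  # nominal one-step prediction error
    Phi.append(np.concatenate([x, u, [1.0]]))
E, Phi = np.array(E), np.array(Phi)

Theta, *_ = np.linalg.lstsq(Phi, E, rcond=None)   # linear error model

def f_learned(x, u):                    # nominal model + learned correction
    return f_nominal(x, u) + np.concatenate([x, u, [1.0]]) @ Theta

x, u = rng.standard_normal(n), rng.standard_normal(m)
x_next = A_true @ x + B_true @ u
print("nominal error  :", np.linalg.norm(x_next - f_nominal(x, u)))
print("corrected error:", np.linalg.norm(x_next - f_learned(x, u)))
```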
|
|
10:30-12:00, Paper ThAT18-AX.4 | Add to My Program |
Robust Balancing Control of Biped Robots for External Forces |
|
Park, Hae Yeon | POSTECH |
Kim, Jung Hoon | Pohang University of Science and Technology |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Humanoid and Bipedal Locomotion
Abstract: This paper develops a controller synthesis method for keeping the admissible bound on external forces applied to biped robots at a desired level. We first introduce the authors' preceding results on a norm-based stability criterion for biped walking constructed on the linear inverted pendulum model (LIPM). More precisely, an induced norm can be taken to formulate the fact that balance for a biped robot is achieved if its zero moment point (ZMP) always stays in the supporting region at each step. Based on this norm-based criterion, we aim to make the maximum energy of external forces admissible for balancing the biped robot equal a pre-given desired bound gamma (> 0). To achieve this objective, a robust controller is designed through a linear matrix inequality (LMI)-based approach. More importantly, a necessary and sufficient condition for the existence of a robust controller attaining the desired bound is characterized by some LMI conditions. The effectiveness of the overall arguments is validated through comparative simulation results of a biped walking robot subject to external forces.
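For a flavor of the LMI machinery involved, the sketch below solves a standard state-feedback stabilization LMI with CVXPY. The toy plant and the specific inequality are illustrative textbook choices, not the paper's force-bound synthesis condition.

```python
import numpy as np
import cvxpy as cp

# Illustrative LMI-based synthesis sketch: with the change of variables
# Y = K P, feasibility of  A P + P A' + B Y + Y' B' < 0,  P > 0  yields a
# stabilizing gain K = Y P^{-1}. Plant matrices are hypothetical.

A = np.array([[0.0, 1.0], [2.0, -0.5]])     # toy open-loop unstable plant
B = np.array([[0.0], [1.0]])
n, m = A.shape[0], B.shape[1]

P = cp.Variable((n, n), symmetric=True)
Y = cp.Variable((m, n))
lmi = A @ P + P @ A.T + B @ Y + Y.T @ B.T
lmi = (lmi + lmi.T) / 2                      # symmetrize for the PSD constraint
eps = 1e-3
prob = cp.Problem(cp.Minimize(0),
                  [P >> eps * np.eye(n), lmi << -eps * np.eye(n)])
prob.solve()

K = Y.value @ np.linalg.inv(P.value)         # stabilizing feedback u = K x
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))
```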
|
|
10:30-12:00, Paper ThAT18-AX.5 | Add to My Program |
Robust Policy Iteration of Uncertain Interconnected Systems with Imperfect Data (I) |
|
Qasem, Omar | American International University |
Gao, Weinan | Florida Institute of Technology |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Reinforcement Learning
Abstract: This paper investigates the robust optimal control problem for a class of continuous-time, partially linear, interconnected systems. In addition to the dynamic uncertainties resulting from the interconnected dynamic system, unknown bounded disturbances and computational errors are taken into account throughout the learning process, wherein the system’s dynamics are also assumed unknown. These challenges cause the collected online data to be imperfect. In this scenario, traditional data-driven control techniques, such as adaptive dynamic programming (ADP) and robust ADP, encounter difficulty in learning the optimal control policy precisely due to imperfect data. In this paper, a novel data-driven robust policy iteration method is proposed to solve the robust optimal control problem. Without relying on knowledge of the system’s dynamics, the external disturbances or the complete state, the implementation of the proposed method only needs access to the input and partial state information. Based on the small-gain theorem and the notions of strong unboundedness observability and input-to-output stability, it is guaranteed that the learned robust optimal control gain is stabilizing and that the solution of the closed-loop system is uniformly ultimately bounded despite the existence of dynamic uncertainties and unknown external disturbances. The simulation results reveal the efficiency and practicality of the proposed data-driven control method.
|
|
10:30-12:00, Paper ThAT18-AX.6 | Add to My Program |
Robust Co-Design of Canonical Underactuated Systems for Increased Certifiable Stability |
|
Girlanda, Federico | University of Padua |
Kumar, Shivesh | DFKI GmbH |
Shala, Lasse | Deutsches Forschungszentrum Für Künstliche Intelligenz |
Kirchner, Frank | University of Bremen |
Keywords: Optimization and Optimal Control, Robust/Adaptive Control, Underactuated Robots
Abstract: Optimal behaviours of a system to perform a specific task can be achieved by leveraging the coupling between trajectory optimization, stabilization and design optimization. This approach proves particularly advantageous for underactuated systems, which have fewer actuators than degrees of freedom and thus require more elaborate control systems. This paper proposes a novel co-design algorithm, namely Robust Trajectory Control with Design optimization (RTC-D). An inner optimization layer (RTC) simultaneously performs direct transcription (DIRTRAN) to find a nominal trajectory while computing optimal hyperparameters for a stabilizing time-varying linear quadratic regulator (TVLQR). RTC-D augments RTC with a design optimization layer, maximizing the system’s robustness through a time-varying Lyapunov-based region of attraction (ROA) analysis. This analysis provides a formal guarantee of stability for a set of off-nominal states. The proposed algorithm has been tested on two different underactuated systems: the torque-limited simple pendulum and the cart-pole. Extensive simulations of off-nominal initial conditions demonstrate improved robustness, while real-system experiments show increased insensitivity to torque disturbances.
|
|
10:30-12:00, Paper ThAT18-AX.7 | Add to My Program |
Whole-Body Ergodic Exploration with a Manipulator Using Diffusion |
|
Bilaloglu, Cem | Idiap Research Institute, École Polytechnique Fédérale De Lausanne |
Löw, Tobias | Idiap Research Institute, EPFL |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Optimization and Optimal Control, Sensorimotor Learning, Whole-Body Motion Planning and Control
Abstract: This paper presents a whole-body robot control method for exploring and probing a given region of interest. The ergodic control formalism behind such an exploration behavior consists of matching the time-averaged statistics of a robot trajectory with the spatial statistics of the target distribution. Most existing ergodic control approaches model the robots/sensors as individual point agents moving in space. We introduce an approach that decomposes the whole body of a robotic manipulator into multiple kinematically constrained agents. Then, we generate control actions by calculating a consensus among the agents. To do so, we use an ergodic control formulation called heat equation-driven area coverage (HEDAC) and slow the diffusion using the non-stationary heat equation. Our approach extends HEDAC to applications where robots have multiple sensors on the whole body (such as tactile skin) and use all sensors to optimally explore the given region. We show that our approach increases exploration performance in terms of ergodicity and scales well to real-world problems. We compare our method with the state-of-the-art in kinematic simulations and demonstrate its applicability in an online exploration task with a 7-axis Franka Emika robot.
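A minimal grid-world sketch of the HEDAC idea follows: a coverage deficit acts as a heat source, the non-stationary heat equation diffuses it into a potential field, and several point agents (standing in for kinematically constrained body-point agents) ascend the field's gradient. Grid resolution, gains, and the Gaussian target are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

# HEDAC-style coverage sketch: diffuse the un-covered part of a target
# density and let point agents follow the resulting potential's gradient.

n, dt, alpha = 64, 0.1, 1.0
xs = np.linspace(0, 1, n)
XX, YY = np.meshgrid(xs, xs, indexing="ij")
target = np.exp(-((XX - 0.7) ** 2 + (YY - 0.3) ** 2) / 0.02)
target /= target.sum()                                    # target distribution

coverage = np.zeros((n, n))
agents = np.array([[0.1, 0.1], [0.2, 0.8], [0.8, 0.9]])   # multiple body points

def laplacian(f):                                         # periodic 5-point stencil
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)

u = np.zeros((n, n))
for step in range(1, 501):
    source = np.maximum(step * dt * target - coverage, 0.0)  # coverage deficit
    u += dt * (alpha * laplacian(u) + source)                # heat equation step
    for a in agents:
        i, j = np.clip((a * (n - 1)).astype(int), 1, n - 2)
        grad = np.array([u[i + 1, j] - u[i - 1, j], u[i, j + 1] - u[i, j - 1]])
        a += 0.02 * grad / (np.linalg.norm(grad) + 1e-9)     # gradient ascent
        np.clip(a, 0.0, 1.0, out=a)
        coverage[i, j] += dt                                  # record visitation

print("remaining deficit:", np.maximum(500 * dt * target - coverage, 0.0).sum())
```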
|
|
10:30-12:00, Paper ThAT18-AX.8 | Add to My Program |
ReLU-QP: A GPU-Accelerated Quadratic Programming Solver for Model-Predictive Control |
|
Bishop, Arun | Carnegie Mellon University |
Zhang, John | Carnegie Mellon University |
Gurumurthy, Swaminathan | Carnegie Mellon University |
Tracy, Kevin | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Keywords: Optimization and Optimal Control, Whole-Body Motion Planning and Control
Abstract: We present ReLU-QP, a GPU-accelerated solver for quadratic programs (QPs) that is capable of solving high-dimensional control problems at real-time rates. ReLU-QP is derived by exactly reformulating the Alternating Direction Method of Multipliers (ADMM) algorithm for solving QPs as a deep, weight-tied neural network with rectified linear unit (ReLU) activations. This reformulation enables the deployment of ReLU-QP on GPUs using standard machine-learning toolboxes. We evaluate the performance of ReLU-QP across three model-predictive control (MPC) benchmarks: stabilizing random linear dynamical systems with control limits, balancing an Atlas humanoid robot on a single foot, and performing a whole-body pick-up motion on a quadruped equipped with a six-degree-of-freedom arm. These benchmarks indicate that ReLU-QP is competitive with state-of-the-art CPU-based solvers for small-to-medium-scale problems and offers order-of-magnitude speed improvements for larger-scale problems.
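The reformulation can be illustrated on a box-constrained QP: one ADMM iteration is an affine map followed by a clamp, and the clamp itself decomposes into two ReLUs, so the iteration is a weight-tied ReLU layer. This NumPy sketch with random placeholder data illustrates that observation only; it is not the ReLU-QP solver.

```python
import numpy as np

# ADMM for  min 0.5 x'Hx + g'x  s.t.  l <= x <= u,  written as an affine map
# plus a clamp, where clip(v, l, u) = l + relu(v - l) - relu(v - u).

rng = np.random.default_rng(2)
n, rho = 10, 1.0
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                   # positive definite Hessian
g = rng.standard_normal(n)
l, u = -np.ones(n), np.ones(n)

K = np.linalg.inv(H + rho * np.eye(n))    # fixed "weights", factored once

x = z = lam = np.zeros(n)
for _ in range(200):
    x = K @ (rho * (z - lam) - g)         # affine "layer"
    v = x + lam
    z = l + np.maximum(v - l, 0.0) - np.maximum(v - u, 0.0)  # clamp via ReLUs
    lam = lam + x - z                     # scaled dual update

print("within bounds:", bool(np.all(z >= l - 1e-6) and np.all(z <= u + 1e-6)))
print("stationarity residual:", np.linalg.norm(H @ z + g + rho * lam))
```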
|
|
ThAT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics I |
|
|
Chair: Jones, Dominic | University of Leeds |
Co-Chair: Kanoulas, Dimitrios | University College London |
|
10:30-12:00, Paper ThAT19-NT.1 | Add to My Program |
Markerless Ultrasound Probe Pose Estimation in Mini-Invasive Surgery |
|
Kalantari, Mohammad Mahdi | Clermont Auvergne INP, CNRS, SIGMA Clermont, Institut Pascal |
Ozgur, Erol | SIGMA-Clermont / Institut Pascal |
Alkhatib, Mohammad | Université Clermont Auvergne |
Buc, Emmanuel | University Hospital of Clermont-Ferrand, France |
Le Roy, Bertrand | University Hospital of Saint-Etienne, France |
Modrzejewski, Richard | SurgAR |
Mezouar, Youcef | Clermont Auvergne INP - SIGMA Clermont |
Bartoli, Adrien | UCA |
Keywords: Surgical Robotics: Laparoscopy, Computer Vision for Medical Robotics
Abstract: In mini-invasive surgery, the laparoscopic ultrasound probe is visible in the laparoscopic image. We address the problem of estimating the probe pose with respect to the laparoscope without using markers and additional sensors. We propose the first method using a single standard laparoscopic monocular RGB image. It is robust, initialization-free and runs at 10 fps, thus forming a promising tool to improve robotic and augmented reality-based surgery.
|
|
10:30-12:00, Paper ThAT19-NT.2 | Add to My Program |
Occlusion-Robust Autonomous Robotic Manipulation of Human Soft Tissues with 3D Surface Feedback |
|
Hu, Junlei | University of Leeds |
Jones, Dominic | University of Leeds |
Dogar, Mehmet R | University of Leeds |
Valdastri, Pietro | University of Leeds |
Keywords: Surgical Robotics: Laparoscopy, Dual Arm Manipulation, Manipulation Planning, Soft Object Manipulation
Abstract: Robotic manipulation of 3D soft objects remains challenging in both the industrial and medical fields. Various methods based on mechanical modelling, data-driven approaches or explicit feature tracking have been proposed. A unifying disadvantage of these methods is the high computational cost of simultaneous image processing, identification of mechanical properties, and motion planning, leading to a need for less computationally intensive methods. We propose a method for autonomous robotic manipulation with 3D surface feedback to solve these issues. First, we produce a deformation model of the manipulated object, which estimates the robots’ movements by monitoring the displacement of surface points surrounding the manipulators. Then we develop a 6-degree-of-freedom velocity controller to manipulate the grasped object to achieve a desired shape. To validate our approach, we conduct comparative simulations with existing methods and perform experiments using phantom and cadaveric soft tissues with the da Vinci Research Kit. The results demonstrate the robustness of the method to occlusions and various materials. Compared to state-of-the-art linear and data-driven methods, our approach is 46.5% and 15.9% more precise and saves 55.2% and 25.7% of manipulation time, respectively.
|
|
10:30-12:00, Paper ThAT19-NT.3 | Add to My Program |
Sensorless Transparency Optimized Haptic Teleoperation on the Da Vinci Research Kit |
|
Yilmaz, Nural | Marmara University |
Burkhart, Brendan | Johns Hopkins University |
Deguet, Anton | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Tumerdem, Ugur | Marmara University |
Keywords: Surgical Robotics: Laparoscopy, Haptics and Haptic Interfaces, Telerobotics and Teleoperation
Abstract: The da Vinci surgical robot introduced remote control of instruments, providing surgeons with increased dexterity and precision. A major drawback, however, is the loss of sense of touch due to a lack of kinesthetic coupling between the surgical field and the surgeon. This paper presents a framework for sensorless transparency optimized four channel teleoperation. It is sensorless because forces are estimated from existing actuator feedback, with a deep network for dynamics identification. Performance is further optimized by introducing robust acceleration control, with disturbance observers. Experiments performed on the da Vinci Research Kit (dVRK), an open research platform based on the clinically deployed robotic hardware, show improvements in control, force estimation and reflection. The significance is that we demonstrate that high-performance bilateral teleoperation is feasible in clinical systems, without hardware changes, and is available to the dVRK community through a software update.
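For intuition about sensorless force estimation, below is a classical single-joint momentum-observer sketch in which external torque is recovered from commanded torque and measured velocity alone. The first-order plant, gains, and step disturbance are toy assumptions; the paper instead identifies the dynamics of the full dVRK arms with a deep network.

```python
import numpy as np

# Momentum observer: the residual r converges to the external torque without
# any force sensor. All model parameters here are hypothetical.

dt, K_obs = 1e-3, 50.0                         # step size, observer bandwidth
inertia, damping = 0.1, 0.05                   # assumed joint model
tau_ext = lambda t: 0.2 if t > 0.5 else 0.0    # unknown step disturbance

q_dot, p_hat, r = 0.0, 0.0, 0.0                # velocity, est. momentum, residual
for k in range(2000):
    t = k * dt
    tau_cmd = -0.5 * q_dot                     # commanded (known) torque
    # ground-truth plant
    q_ddot = (tau_cmd - damping * q_dot + tau_ext(t)) / inertia
    q_dot += q_ddot * dt
    # observer update
    p = inertia * q_dot                        # measured generalized momentum
    p_hat += (tau_cmd - damping * q_dot + r) * dt
    r = K_obs * (p - p_hat)

print(f"estimated external torque: {r:.3f} N·m (true: 0.200 N·m)")
```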
|
|
10:30-12:00, Paper ThAT19-NT.4 | Add to My Program |
Multimodal Transformers for Real-Time Surgical Activity Prediction |
|
Weerasinghe, Keshara | University of Virginia |
Roodabeh, Seyed HamidReza | University of Virginia |
Hutchinson, Kay | University of Virginia |
Alemzadeh, Homa | University of Virginia |
Keywords: Surgical Robotics: Laparoscopy, Recognition, Kinematics
Abstract: Real-time recognition and prediction of surgical activities are fundamental to advancing safety and autonomy in robot-assisted surgery. This paper presents a multimodal transformer architecture for real-time recognition and prediction of surgical gestures and trajectories based on short segments of kinematic and video data. We conduct an ablation study to evaluate the impact of fusing different input modalities and their representations on gesture recognition and prediction performance. We perform an end-to-end assessment of the proposed architecture using the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset. Our model outperforms the state-of-the-art (SOTA) with 89.5% accuracy for gesture prediction through effective fusion of kinematic features with spatial and contextual video features. It achieves real-time performance of 1.1-1.3 ms for processing a 1-second input window by relying on a computationally efficient model.
|
|
10:30-12:00, Paper ThAT19-NT.5 | Add to My Program |
Learning Needle Pick-And-Place without Expert Demonstrations |
|
Bendikas, Rokas | UCL |
Modugno, Valerio | University College London |
Kanoulas, Dimitrios | University College London |
Vasconcelos, Francisco | University College London |
Stoyanov, Danail | University College London |
Keywords: Surgical Robotics: Laparoscopy, Reinforcement Learning, Autonomous Agents
Abstract: We introduce a novel approach for learning a complex multi-stage needle pick-and-place manipulation task for surgical applications using Reinforcement Learning without expert demonstrations or explicit curriculum. The proposed method is based on a recursive decomposition of the original task into a sequence of sub-tasks with increasing complexity and utilizes an actor-critic algorithm with deterministic policy output. In this work, exploratory bottlenecks have been used by a human expert as convenient boundary points for partitioning complex tasks into simpler subunits. Our method has successfully learnt a policy for the needle pick-and-place task, whereas the state-of-the-art TD3+HER method is unable to achieve success without the help of expert demonstrations. Comparison results show that our method achieves the highest performance with a 91% average success rate.
|
|
10:30-12:00, Paper ThAT19-NT.6 | Add to My Program |
Learning Nonprehensile Dynamic Manipulation: Sim2real Vision-Based Policy with a Surgical Robot |
|
Gondokaryono, Radian | University of Toronto |
Haiderbhai, Mustafa | University of Toronto |
Suryadevara, Sai Aneesh | Indian Institute of Technology Bombay |
Kahrs, Lueder Alexander | University of Toronto Mississauga |
Keywords: Surgical Robotics: Laparoscopy, Reinforcement Learning, Visual Servoing
Abstract: Surgical tasks such as tissue retraction, tissue exposure, and needle suturing remain challenging in autonomous surgical robotics. One challenge in these tasks is nonprehensile manipulation such as pushing tissue, pressing cloth, and needle threading. In this work, we isolate the problem of nonprehensile manipulation by implementing a vision-based reinforcement learning agent for rolling a block, a task that involves complex dynamic interactions, small-scale objects, and a narrow field of view. We train agents in simulation with a reward formulation that encourages efficient and safe learning, domain randomization that allows for robust sim2real transfer, and a recurrent memory layer that enables reasoning about randomized dynamics parameters. We successfully transfer our agents from simulation to the real system and show robust execution of our vision-based policy with a 96.3% success rate. We analyze and discuss the success rates, trajectories, and recovery behaviours for various models that either use the recurrent memory layer or are trained in a difficult physics environment. Further project information is available at https://medcvr.utm.utoronto.ca/ral2023-rollblock.html.
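To illustrate what per-episode domain randomization typically looks like in such a pipeline, here is a hypothetical parameter sampler; all parameter names and ranges are invented for illustration and are not the paper's configuration.

```python
import numpy as np

# Hypothetical domain-randomization sampler. Randomizing dynamics and visual
# parameters per episode, while a recurrent memory layer observes their
# consequences over time, is the mechanism behind robust sim2real transfer.

rng = np.random.default_rng(4)

def sample_episode_params():
    return {
        "block_mass_kg": rng.uniform(0.005, 0.05),
        "surface_friction": rng.uniform(0.2, 1.2),
        "camera_offset_mm": rng.normal(0.0, 2.0, size=3).round(2).tolist(),
        "light_intensity": rng.uniform(0.5, 1.5),
        "action_latency_ms": rng.uniform(0.0, 40.0),
    }

for episode in range(3):
    params = sample_episode_params()
    # env.reset(**params)  # hypothetical environment hook; the recurrent
    # layer must infer these hidden parameters from observation history
    print(f"episode {episode}: {params}")
```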
|
|
10:30-12:00, Paper ThAT19-NT.7 | Add to My Program |
Realistic Data Generation for 6D Pose Estimation of Surgical Instruments |
|
Barragan, Juan Antonio | Johns Hopkins University |
Zhang, Jintan | Johns Hopkins University |
Zhou, Haoying | Worcester Polytechnic Institute |
Munawar, Adnan | Johns Hopkins University |
Kazanzides, Peter | Johns Hopkins University |
Keywords: Surgical Robotics: Laparoscopy, Simulation and Animation, Deep Learning Methods
Abstract: Automation in surgical robotics has the potential to improve patient safety and surgical efficiency, but it is difficult to achieve due to the need for robust perception algorithms. In particular, 6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers based on visual feedback. In recent years, supervised deep learning algorithms have shown increasingly better performance at 6D pose estimation tasks; yet, their success depends on the availability of large amounts of annotated data. In household and industrial settings, synthetic data, generated with 3D computer graphics software, has been shown to be an alternative that minimizes the annotation costs of 6D pose datasets. However, this strategy does not translate well to surgical domains, as commercial graphics software has limited tools to generate images depicting realistic instrument-tissue interactions. To address these limitations, we propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets for 6D pose estimation of surgical instruments. Among the improvements, we developed an automated data generation pipeline and an improved surgical scene. To show the applicability of our system, we generated a dataset of 7.5k images with pose annotations of a surgical needle that was used to evaluate a state-of-the-art pose estimation network. The trained model obtained a mean translational error of 2.59 mm on a challenging dataset that presented varying levels of occlusion. These results highlight our pipeline's success in training and evaluating novel vision algorithms for surgical robotics applications.
|
|
10:30-12:00, Paper ThAT19-NT.8 | Add to My Program |
Surgical Gym: A High-Performance GPU-Based Platform for Reinforcement Learning with Surgical Robots |
|
Schmidgall, Samuel | Johns Hopkins University |
Krieger, Axel | Johns Hopkins University |
Eshraghian, Jason | University of California, Santa Cruz |
Keywords: Surgical Robotics: Laparoscopy, Software Architecture for Robotic and Automation, Reinforcement Learning
Abstract: Recent advances in robot-assisted surgery have resulted in progressively more precise, efficient, and minimally invasive procedures, sparking a new era of robotic surgical intervention. This enables doctors, in collaborative interaction with robots, to perform traditional or minimally invasive surgeries with improved outcomes through smaller incisions. Recent efforts are working toward making robotic surgery more autonomous, which has the potential to reduce the variability of surgical outcomes and reduce complication rates. Deep reinforcement learning methodologies offer scalable solutions for surgical automation, but their effectiveness relies on extensive data acquisition due to the absence of prior knowledge in successfully accomplishing tasks. Due to the intensive nature of simulated data collection, previous works have focused on making existing algorithms more efficient. In this work, we focus on making the simulator more efficient, making training data much more accessible than previously possible. We introduce Surgical Gym, an open-source, high-performance platform for surgical robot learning where both the physics simulation and reinforcement learning occur directly on the GPU. We demonstrate between 100-5000x faster training times compared with previous surgical learning platforms. The code is available at: https://github.com/SamuelSchmidgall/SurgicalGym.
|
|
10:30-12:00, Paper ThAT19-NT.9 | Add to My Program |
Multi-Objective Cross-Task Learning Via Goal-Conditioned GPT-Based Decision Transformers for Surgical Robot Task Automation |
|
Fu, Jiawei | Institute of Artificial Intelligence and Robotics |
Long, Yonghao | The Chinese University of Hong Kong |
Chen, Kai | The Chinese University of Hong Kong |
Wei, Wang | The Chinese University of Hong Kong |
Dou, Qi | The Chinese University of Hong Kong |
Keywords: Surgical Robotics: Laparoscopy, Surgical Robotics: Planning
Abstract: Surgical robot task automation has been a promising research topic for improving surgical efficiency and quality. Learning-based methods have been recognized as an interesting paradigm and have been increasingly investigated. However, existing approaches encounter difficulties in long-horizon goal-conditioned tasks due to their intricate compositional structure, which requires decision-making over a sequence of sub-steps and an understanding of the dynamics inherent in goal-reaching tasks. In this paper, we propose a new learning-based framework that leverages the strong reasoning capability of the GPT-based architecture to automate surgical robotic tasks. The key to our approach is developing a goal-conditioned decision transformer that achieves sequential representations with goal-aware future indicators in order to enhance temporal reasoning. Moreover, to exploit a general understanding of the dynamics inherent in manipulation and thus make the model's reasoning ability task-agnostic, we also design a cross-task pretraining paradigm that uses multiple training objectives associated with data from diverse tasks. We have conducted extensive experiments on 10 tasks using the surgical robot learning simulator SurRoL. The results show that our new approach achieves promising performance and task versatility compared to existing methods. The learned trajectories can be deployed on the da Vinci Research Kit (dVRK), validating its practicality in real surgical robot settings. Our project website is at: https://med-air.github.io/SurRoL.
|
|
ThAT20-NT Oral Session, NT-G302 |
Add to My Program |
Safety in HRI |
|
|
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Kamezaki, Mitsuhiro | The University of Tokyo |
|
10:30-12:00, Paper ThAT20-NT.1 | Add to My Program |
Safe-By-Design Digital Twins for Human-Robot Interaction: A Use Case for Humanoid Service Robots |
|
Škerlj, Jon | Technical University of Munich |
Hamad, Mazin | Technical University of Munich (TUM) |
Elsner, Jean | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Service Robotics, Software-Hardware Integration for Robot Systems
Abstract: Integrating humanoid service mobile robots into human environments presents numerous challenges, primarily concerning the safety of interactions between robots and humans. To address these safety concerns, we propose a novel approach that leverages the capabilities of digital twin technology by tailoring it to incorporate comprehensive and robust safety concepts. This paper introduces a "safe-by-design" digital twin that operates alongside the real twin robot in the loop, engaging a real-time safety framework during physical interactions with the surrounding environment, including humans. To validate the effectiveness of our proposed safe-by-design digital twin framework, we conducted experiments using a humanoid service mobile robot alongside simulated human counterparts. Our results demonstrate the capability of the integrated impact safety module within the proposed digital twin approach to limit the velocities of both the robot's base and arms, adhering to injury-biomechanics-based safety thresholds. These findings emphasize the promise of our proposed approach for ensuring the physical safety of humanoid service mobile robots operating in dynamic human environments. It enables the digital twin to preemptively identify potential safety hazards and formulate safe intervention actions to ensure the robot's compliance with safety regulations, paving the way for safer and more widespread adoption of robotic systems in various service domains.
|
|
10:30-12:00, Paper ThAT20-NT.2 | Add to My Program |
Safe Execution of Learned Orientation Skills with Conic Control Barrier Functions |
|
Shen, Zheng | TU Munich |
Saveriano, Matteo | University of Trento |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Safety in HRI, Learning from Demonstration, Telerobotics and Teleoperation
Abstract: In the field of Learning from Demonstration (LfD), Dynamical Systems (DSs) have gained significant attention due to their ability to generate real-time motions and reach predefined targets. However, the conventional convergence-centric behavior exhibited by DSs may fall short in safety-critical tasks, specifically, those requiring precise replication of demonstrated trajectories or strict adherence to constrained regions even in the presence of perturbations or human intervention. Moreover, existing DS research often assumes demonstrations solely in Euclidean space, overlooking the crucial aspect of orientation in various applications. To alleviate these shortcomings, we present an innovative approach geared toward ensuring the safe execution of learned orientation skills within constrained regions surrounding a reference trajectory. This involves learning a stable DS on SO(3), extracting time-varying conic constraints from the variability observed in expert demonstrations, and bounding the evolution of the DS with Conic Control Barrier Function (CCBF) to fulfill the constraints. We validated our approach through extensive evaluation in simulation and showcased its effectiveness for a cutting skill in the context of assisted teleoperation.
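The core safety-filtering step can be sketched in closed form for a fixed cone: a barrier keeps a body axis within a cone around a reference direction, and a minimum-norm correction enforces the CBF inequality. The fixed cone, gains, and nominal input below are illustrative stand-ins for the paper's time-varying conic constraints extracted from demonstrations.

```python
import numpy as np

# Conic CBF filter on SO(3): keep R @ e_tool inside a cone of half-angle
# `alpha` around e_ref. Barrier h(R) = e_ref . (R e_tool) - cos(alpha), with
# body-frame kinematics R_dot = R hat(w) giving dh/dt = a . w.

def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

e_tool, e_ref = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])
alpha, gamma, dt = np.deg2rad(20), 5.0, 1e-2
R = np.eye(3)

for _ in range(300):
    w_nom = np.array([1.0, 0.0, 0.0])            # nominal motion tilts the axis
    h = e_ref @ (R @ e_tool) - np.cos(alpha)     # barrier: >= 0 inside the cone
    a = np.cross(e_tool, R.T @ e_ref)            # dh/dt = a . w
    if a @ w_nom + gamma * h < 0:                # CBF condition violated:
        w = w_nom + (-(gamma * h) - a @ w_nom) / (a @ a + 1e-12) * a
    else:
        w = w_nom                                # nominal input is already safe
    R = R @ (np.eye(3) + hat(w) * dt)            # first-order integration step
    U, _, Vt = np.linalg.svd(R); R = U @ Vt      # re-orthonormalize

angle = np.arccos(np.clip(e_ref @ (R @ e_tool), -1.0, 1.0))
print("final cone angle (deg):", np.rad2deg(angle))   # held near 20 degrees
```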
|
|
10:30-12:00, Paper ThAT20-NT.3 | Add to My Program |
Real-Time Batched Distance Computation for Time-Optimal Safe Path Tracking |
|
Fujii, Shohei | Nanyang Technological University, DENSO Corporation |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Safety in HRI, Human-Aware Motion Planning, Industrial Robots
Abstract: In human-robot collaboration, there is a trade-off between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real-time and provides the safe and fastest control input for every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at the many robot configurations to be examined along a trajectory, (2) in real-time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory within less than 1 millisecond using a GPU at runtime, making it well suited for time-critical robotic control. Additionally, a neural approximation is proposed to accelerate preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method allows a 6-DoF robot to react earlier than with a geometric-primitives-based distance checker in a dynamic and collaborative environment.
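The batching idea can be sketched as follows: once an SDF is precomputed on a grid, distance queries for hundreds of waypoints reduce to array lookups that vectorize trivially (plain NumPy here; a GPU tensor library in practice). The planar 2-link arm, disc obstacle, and nearest-cell lookup are illustrative simplifications of the paper's link-local SDFs.

```python
import numpy as np

# Batched distance checking against a precomputed 2D SDF: clearance for 500
# waypoints is obtained with a single vectorized lookup. All geometry is toy.

rng = np.random.default_rng(3)
n, lo, hi = 64, -2.0, 2.0
xs = np.linspace(lo, hi, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
sdf = np.hypot(X - 1.0, Y - 1.0) - 0.3          # SDF of a disc at (1,1), r=0.3

def lookup(points):                              # batched nearest-cell SDF query
    idx = ((points - lo) / (hi - lo) * (n - 1)).round().astype(int)
    idx = np.clip(idx, 0, n - 1)
    return sdf[idx[..., 0], idx[..., 1]]

def link_points(q, samples=8):                   # sample points on a 2-link arm
    j1 = np.stack([np.cos(q[:, 0]), np.sin(q[:, 0])], -1)        # elbow
    j2 = j1 + np.stack([np.cos(q.sum(1)), np.sin(q.sum(1))], -1)  # wrist
    t = np.linspace(0, 1, samples)[:, None, None]
    return np.concatenate([t * j1, j1 + t * (j2 - j1)], axis=0)

Q = rng.uniform(-np.pi, np.pi, size=(500, 2))    # 500 waypoints at once
d = lookup(link_points(Q))                       # (2*samples, 500) distances
min_dist = d.min(axis=0)                         # clearance per waypoint
print("waypoints in collision:", int((min_dist < 0).sum()))
```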
|
|
10:30-12:00, Paper ThAT20-NT.4 | Add to My Program |
Overcoming Hand and Arm Occlusion in Human-To-Robot Handovers: Predicting Safe Poses with a Multimodal DNN Regression Model |
|
Lollett, Catherine | Waseda University |
Sriram, Advaith | Waseda University |
Kamezaki, Mitsuhiro | The University of Tokyo |
Sugano, Shigeki | Waseda University |
Keywords: Safety in HRI, Human-Aware Motion Planning, Deep Learning for Visual Perception
Abstract: Handovers play a key role in human-robot interactions. However, current research focuses on visible-hand handovers, thereby relying heavily on hand detection. Large objects in human-robot interactions present a unique challenge: they inherently block the person's hands and arms from the robot's view. This occlusion raises the robot's risk of unintended physical contact with the person, leading to discomfort and safety concerns. This study aims to develop a model that can determine a robot pose that ensures a handover avoiding physical contact with the person, especially in scenarios where hands and arms are occluded. Toward this goal, a three-branch multimodal Deep Neural Network (DNN) regression model was implemented. First, robust human-pose keypoint detection is applied to calculate shoulder-elbow angles. Second, we extract a refined segmentation mask of the object. Third, we compute two intrinsic object properties. The concatenated outputs from these branches pass through additional dense layers, resulting in the prediction of the robot's 14 arm-joint angles. Compared to a model based only on processed keypoint data, our multimodal approach improves accuracy by 17.7%. The experiments highlight the significance of each pipeline step, showing strong results even when hands and arms were heavily occluded and across different variations.
|
|
10:30-12:00, Paper ThAT20-NT.5 | Add to My Program |
Legible and Proactive Robot Planning for Prosocial Human-Robot Interactions |
|
Geldenbott, Jasper | University of Washington |
Leung, Karen | University of Washington |
Keywords: Human-Aware Motion Planning, Safety in HRI
Abstract: Humans have a remarkable ability to fluently engage in joint collision avoidance in crowded navigation tasks despite the complexities and uncertainties inherent in human behavior. Underlying these interactions is a mutual understanding that (i) individuals are prosocial, that is, there is equitable responsibility in avoiding collisions, and (ii) individuals should behave legibly, that is, move in a way that clearly conveys their intent to reduce ambiguity in how they intend to avoid others. Toward building robots that can safely and seamlessly interact with humans, we propose a general robot trajectory planning framework for synthesizing legible and proactive behaviors and demonstrate that our robot planner naturally leads to prosocial interactions. Specifically, we introduce the notion of a markup factor to incentivize legible and proactive behaviors and an inconvenience budget constraint to ensure equitable collision avoidance responsibility. We evaluate our approach against well-established multi-agent planning algorithms and show that using our approach produces safe, fluent, and prosocial interactions. We demonstrate the real-time feasibility of our approach with human-in-the-loop simulations. Project page can be found at https://uw-ctrl.github.io/phri/.
|
|
10:30-12:00, Paper ThAT20-NT.6 | Add to My Program |
Integrated Data-Driven Inference and Planning-Based Human Motion Prediction for Safe Human-Robot Interaction |
|
Nam, Youngim | Ulsan National Institute of Science and Technology |
Kwon, Cheolhyeon | Ulsan National Institute of Science and Technology |
Keywords: Human-Aware Motion Planning, Safety in HRI, Planning under Uncertainty
Abstract: This paper presents a unified prediction and planning algorithm for an autonomous vehicle to interact with an uncertain human-driven vehicle. Predicting human motion is challenging due to inherent uncertainties in diverse human internal states, i.e., driving styles and rationality. To address these complexities, we propose a hierarchical prediction strategy that combines data-driven internal state inference and planning-based human motion prediction. First, we employ Long Short-Term Memory (LSTM) network-based inference modules to capture both driving style and rationality from the observed motion of the human driver. With these inferred internal states, we predict the future trajectories of the human-driven vehicle by formulating a human planning model as an optimization problem. Lastly, we present a Stochastic Model Predictive Control (SMPC) scheme for the autonomous vehicle to safely interact with the human-driven vehicle while actively inferring human internal states. The simulation results, covering lane-change scenarios, indicate that the proposed method outperforms existing work in both predicting the human motion and achieving the robot's goal.
|
|
10:30-12:00, Paper ThAT20-NT.7 | Add to My Program |
How Does Perception Affect Safety: New Metrics and Strategy |
|
Zhang, Xiaotong | Massachusetts Institute of Technology |
Chong, Jinger | Massachusetts Institute of Technology |
Youcef-Toumi, Kamal | Massachusetts Institute of Technology |
Keywords: Safety in HRI, Human-Robot Collaboration, Object Detection, Segmentation and Categorization
Abstract: Perception plays a pivotal role in enhancing the functionality of autonomous agents. However, the intricate relationship between robotic perception metrics and actuation metrics remains unclear, leading to ambiguity in the development and fine-tuning of perception algorithms. In this paper, we introduce a methodology for quantifying this relationship, taking into account factors such as detection rate, detection quality, and latency. Furthermore, we introduce two novel perception metrics for Human-Robot Collaboration safety predicated upon basic perception metrics: Critical Collision Probability (CCP) and Average Collision Probability (ACP). To validate the utility of these metrics in facilitating algorithm development and tuning, we develop an attentive processing strategy that focuses exclusively on key input features. This approach significantly reduces computational time while preserving a similar level of accuracy. Experimental findings demonstrate that integrating this strategy into an object detector results in a notable maximum reduction of 30.09% in inference time and 26.53% in total time per frame. Additionally, the strategy lowers the CCP and ACP in a baseline model by 11.25% and 13.50%, respectively.
|
|
10:30-12:00, Paper ThAT20-NT.8 | Add to My Program |
Constrained Passive Interaction Control: Leveraging Passivity and Safety for Robot Manipulators |
|
Zhang, Zhiquan | University of Pennsylvania |
Li, Tianyu | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Safety in HRI, Collision Avoidance, Machine Learning for Robot Control
Abstract: Passivity is necessary for robots to fluidly collaborate and interact with humans physically. Nevertheless, due to the unconstrained nature of passivity-based impedance control laws, the robot is vulnerable to infeasible and unsafe configurations upon physical perturbations. In this paper, we propose a novel control architecture that allows a torque-controlled robot to guarantee safety constraints such as kinematic limits, self-collisions, external collisions and singularities and is passive only when feasible. This is achieved by constraining a dynamical system based impedance control law with a relaxed hierarchical control barrier function quadratic program subject to multiple concurrent, possibly contradicting, constraints. Joint space constraints are formulated from efficient data-driven self- and external C^2 collision boundary functions. We theoretically prove constraint satisfaction and show that the robot is passive when feasible. Our approach is validated in simulation and real robot experiments on a 7DoF Franka Research 3 manipulator.
|
|
10:30-12:00, Paper ThAT20-NT.9 | Add to My Program |
Boosting Adversarial Training in Safety-Critical Systems through Boundary Data Selection |
|
Jia, Yifan | Zhejiang University |
Poskitt, Christopher M. | Singapore Management University |
Zhang, Peixin | Singapore Management University |
Wang, Jingyi | Zhejiang University |
Sun, Jun | Singapore Management University |
Chattopadhyay, Sudipta | Singapore University of Technology and Design |
Keywords: Safety in HRI, AI-Enabled Robotics, Big Data in Robotics and Automation
Abstract: AI-enabled collaborative robots are designed to be used in close collaboration with humans, thus demanding stringent safety standards and quick response times. However, adversarial attacks pose a significant threat to the deep learning models of AI-enabled industrial systems, making it crucial to develop methods that improve the models' robustness against them. Adversarial training is one approach to achieve this, but its effectiveness heavily relies on the quality of the training data, which can be expensive to acquire. In this work, we try to balance the need for quality data with the goal of minimizing cost by selecting the most 'important' data for adversarial training. In particular, we propose a task-based robust fast (RAST) learning method that selects training data near the decision boundary by considering adversarial samples. Our method improves the speed of model training on CIFAR-10 by 68.67% and, compared to other data selection methods, achieves 10% higher accuracy with 10% of the training data selected and 7% higher robustness with 4% of the training data selected. Our method also significantly improves efficiency, by at least 25%, on adversarial training with the same performance. Finally, we evaluate our method on a physical robotic-arm system with object detection, generating adversarial patches as the attack and adopting our method as the defense. We find that RAST can defend against 60% of untargeted attacks and 20% of targeted attacks. Therefore, our work highlights the benefits of boundary data selection for efficient and robust adversarial training in safety-critical systems.
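A minimal sketch of boundary-proximal data selection is given below, using the classifier's top-two score margin as the proximity proxy. RAST additionally considers adversarial samples, so this margin criterion is a simplified stand-in, and the random "logits" are placeholders for a real model's outputs.

```python
import numpy as np

# Margin-based boundary selection sketch: samples whose top-1 minus top-2
# score gap is smallest lie near the decision boundary and are prioritized
# for (adversarial) training. The 10% budget mirrors the paper's setting.

rng = np.random.default_rng(5)
N, C = 1000, 10
logits = rng.standard_normal((N, C))          # placeholder model outputs

top2 = np.sort(logits, axis=1)[:, -2:]        # two largest scores per sample
margin = top2[:, 1] - top2[:, 0]              # small margin => near boundary

budget = int(0.10 * N)                        # select 10% of the data
selected = np.argsort(margin)[:budget]        # most boundary-proximal samples
print("selected", len(selected), "samples; median margin",
      float(np.median(margin[selected]).round(3)))
```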
|
|
ThAT21-NT Oral Session, NT-G303 |
Add to My Program |
Micro/Nano Robots I |
|
|
Chair: Xu, Qingsong | University of Macau |
Co-Chair: Cappelleri, David | Purdue University |
|
10:30-12:00, Paper ThAT21-NT.1 | Add to My Program |
Automated Assembly by Two-Fingered Microhand for Fabrication of Soft Magnetic Microrobots |
|
Zhao, Yue | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Wang, Ruixi | Tsinghua University |
Liu, Dan | Beijing Institute of Technology |
Kojima, Masaru | Osaka University |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots
Abstract: Micro-assembly is an emerging method to fabricate microrobots with multiple modules or particles. However, there is still a lack of a flexible and efficient method to freely create desired magnetic soft microrobots. In this paper, an automated assembly system based on a two-fingered microhand is presented for fabricating magnetic soft microrobots. Our proposed system can automatically pick and place components to assemble microrobots with a two-fingered micromanipulator, and orient these components through an external magnetic field. The automated assembly has the advantages of high accuracy, high speed, and a high success rate. It can endow magnetic microrobots with flexible material selection, arbitrary geometry design, and a programmable magnetization profile. We make full use of this system to fabricate multiple magnetic soft microrobots. The experimental results demonstrate that this system can efficiently fabricate microrobots with excellent mechanical properties, which have application potential in robotics, biomedical engineering, and environmental governance.
|
|
10:30-12:00, Paper ThAT21-NT.2 | Add to My Program |
Thin-Film NiTi Microactuator with a Magnetic Spring for a Tiny Launcher Mechanism |
|
Kim, Sukjun | University of California, San Diego |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Micro/Nano Robots, Additive Manufacturing
Abstract: In this work, we present a thin-film shape memory alloy (NiTi) microactuator with a magnetic spring. This novel actuator design utilizes two permanent magnets and 3D-printed magnet holders to effectively apply a tensile strain on the NiTi thin-film. This actuator is expected to generate 8.7 mN of blocking force, and a free displacement of 30 μm is experimentally characterized. The actuator leverages bare NiTi film (∼ 1 μm thick) for actuation, enabling a high actuator bandwidth up to 50 Hz. A comprehensive analytical model is also studied, which was then validated by comparing to the experimental results. A launcher mechanism was designed and integrated with the NiTi actuator, and this mechanism was used to launch a microscale projectile (a salt grain) thereby demonstrating the relative high power actuation achievable with thin-film NiTi.
|
|
10:30-12:00, Paper ThAT21-NT.3 | Add to My Program |
Nature-Inspired Bubble Magnetic Microrobots for Multimode Locomotion, Cargo Delivery, Imaging, and Biosensing |
|
Xu, Zichen | University of Macau |
Xu, Qingsong | University of Macau |
Yu, Hon Ho | Kiang Wu Hospital |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Biologically-Inspired Robots
Abstract: Wirelessly actuated magnetic microrobots are promising tools in medical applications due to their tiny sizes and attractive robotic properties. However, it remains a huge challenge to integrate sufficient functionalities in a limited volume. Microscopic natural phenomena are a great reference for current microrobot design, where the underlying intelligence and subtlety spur related modern artificial systems. Inspired by air bubbles in nature, herein we report a kind of novel magnetic air-bubble microrobot. The air bubble-based structure enables multiple functionalities including cargo delivery, multimode locomotion, micromanipulation, medical imaging, and biosensing. The proposed microrobots are essentially Pickering bubbles composed of magnetic particles and air bubbles. Their hollow structures help produce lighter microrobots with density less than 1 g/cm^3, enabling buoyancy-based self-propulsion. Buoyancy- and magnetic-force actuation enables flexible 3D locomotion in fluidic environments. Experimental results show that the microrobots can be controlled properly for designated assignments. Furthermore, the introduction of air bubbles enhances ultrasound imaging, facilitating further in vivo applications. These findings offer a significant microrobot design paradigm by exploiting natural physical intelligence at the small scale.
|
|
10:30-12:00, Paper ThAT21-NT.4 | Add to My Program |
Magnetic Mobile Micro-Gripping MicroRobots (MMμGRs) with Two Independent Magnetic Actuation Modes |
|
Davis, Aaron C. | Purdue University |
Freeman, Emmett | Purdue University |
Cappelleri, David | Purdue University |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Mechanism Design
Abstract: In this paper, we introduce magnetic mobile micro-gripping microrobots with two independent actuation modes. By aligning two magnets with slight variations in magnetic moment orientations, we create a net magnetic moment for precise position and orientation control through external fields, while harnessing opposing torques on the magnets to induce internal stresses needed for gripping. Our microrobot design features a compliant spring-like structure for significant deflection, enabling a gripping motion under specific magnetic field conditions. Magnet rotation allows precise control over gripper actions, returning to a default state (normally open or closed) when the magnetic field diminishes. This work advances magnetic field-controlled microrobotics, bridging the millimeter-to-micrometer gap. It holds promise for applications in microsurgery, micro-assembly, and microscale exploration.
|
|
10:30-12:00, Paper ThAT21-NT.6 | Add to My Program |
Electroosmotic Self-Propelled Microswimmer with Magnetic Steering |
|
Yamanaka, Toshiro | The University of Tokyo |
Arai, Fumihito | The University of Tokyo |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Motion Control
Abstract: Microswimmers have significant potential for medical applications such as long-term drug administration, precise surgery, and so on. The outstanding challenge is to realize power supply, propulsion, and steering mechanisms suitable for operation within the human body and microscale fluids. We propose a microswimmer composed of a self-propulsive disk-shaped module with multiple channels using a biofuel cell (BFC) and electroosmotic propulsion (EOP), and a magnetic rod for magnetic steering (MS). The BFC produces an open-circuit potential (OCP) between a bioanode and a biocathode by redox reactions. The EOP generates a self-propulsive velocity due to the counteracting forces of electroosmotic flows produced by the OCP in the channels arranged between the electrodes. The MS works by aligning the magnetic rod with a controlled magnetic field direction. The prototype was designed and fabricated using an insulating polymer layer, two conductive layers incorporating silver nanoparticles with anodic/cathodic enzymes, and a magnetic layer containing magnetic nanoparticles. Fast self-propulsion of continuously rotating 30 µm prototypes under magnetic steering in a glucose solution was demonstrated, as expected theoretically. This concept has the potential to be used in microrobots for future medical applications, such as a pulling mechanism to assist guidewire insertion or agents delivering drugs.
|
|
10:30-12:00, Paper ThAT21-NT.7 | Add to My Program |
A Flat Tendon-Driven Continuum Microrobot for Brain Interventions |
|
Noseda, Lorenzo | EPFL |
Liu, Addison | Harvard University |
Pancaldi, Lucio | EPFL |
Sakar, Mahmut Selman | EPFL |
Keywords: Micro/Nano Robots, Automation at Micro-Nano Scales, Surgical Robotics: Steerable Catheters/Needles
Abstract: Navigating biomedical instruments inside the brain remains challenging and high-risk. The delicate nature of the tissues involved requires the development of cutting-edge robotic technologies to enhance precision and safety. In response to these demands, this paper presents a novel ribbon-shaped, tendon-driven continuum microrobot designed explicitly to navigate through brain tissues. The microrobot has a cross-sectional area of 1 mm^2, and its design is readily compatible with conventional microfabrication techniques for further miniaturization. The flat geometry aims to provide superior maneuverability and opens up new challenges for modeling and control. We detail the design methodology and fabrication, followed by in vitro characterization and testing within brain tissue phantoms.
|
|
10:30-12:00, Paper ThAT21-NT.8 | Add to My Program |
A Selectively Controllable Triple-Helical Micromotor |
|
Zhao, Hongyu | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Ye, Min | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Nelson, Bradley J. | ETH Zurich |
Wang, Xiaopu | Shenzhen Institute of Artificial Intelligence and Robotics for Society |
Keywords: Micro/Nano Robots, Biologically-Inspired Robots, Additive Manufacturing
Abstract: Selective control mechanisms for microrobots have attracted significant attention from researchers. So far, selective control within multiple/swarm magnetic microrobots has been achieved with many strategies, such as utilizing locally specified magnetic fields, applying electrostatic anchoring, and taking advantage of the geometry/wettability heterogeneity of the microrobots. Using the step-out behavior of helical microrobots driven by a rotating magnetic field, researchers have proposed a mathematical model for multi-helical motors that can be selectively controlled. Based on this model, we developed a micromotor that consists of three geometrically heterogeneous helices, each of which can be selectively driven within a specific narrow frequency range. This type of micromotor shows bidirectional motion capability and has the potential to be used as an actuation unit for multiple types of functional micromechanisms.
|
|
ThAT22-NT Oral Session, NT-G304 |
Add to My Program |
Space Robotics II |
|
|
Chair: Abiko, Satoko | Shibaura Institute of Technology |
Co-Chair: Hirano, Daichi | Japan Aerospace Exploration Agency |
|
10:30-12:00, Paper ThAT22-NT.1 | Add to My Program |
System Identification of Space Manipulator Systems and Its Implications on Robust Control Performance |
|
Rekleitis, Georgios | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Space Robotics and Automation, Calibration and Identification
Abstract: Space manipulator system (SMS) maneuvers can excite flexible appendages, while fuel sloshing effects impact its dynamics and performance. To predict this behavior and control such systems, sloshing and flexible appendages are modeled. A novel system identification (SYSID) scheme is developed, which identifies all parameters required for the reconstruction of the system dynamics despite unmeasurable sloshing and modal states. This is achieved by two identification experiments. In Exp. 1, all unmeasurable states are eliminated, while in Exp. 2, the unmeasurable sloshing states are eliminated and a novel estimator is used for the unmeasurable modal states. The significance of accurate SYSID for controller design and performance is demonstrated by simulating a 3D SMS controlled by model-based and robust controllers. In both cases, using the identified parameters results in a significant enhancement of robust control performance.
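For readers unfamiliar with the mechanics, identification of this kind is commonly reduced to linear least squares, since manipulator dynamics are linear in the unknown inertial parameters. The single-joint regressor below is a toy stand-in for the paper's two-experiment SMS scheme, with all values hypothetical.

```python
import numpy as np

# Generic least-squares SYSID sketch: unknown parameters theta enter the
# dynamics linearly, tau = Phi(q, q_dot, q_ddot) @ theta, so stacking
# regressors from an excitation experiment recovers theta.

rng = np.random.default_rng(6)
theta_true = np.array([1.3, 0.4])      # [inertia, viscous friction]
N = 400

q_ddot = rng.standard_normal(N)        # "measured" accelerations
q_dot = rng.standard_normal(N)         # "measured" velocities
Phi = np.stack([q_ddot, q_dot], axis=1)            # linear-in-parameters regressor
tau = Phi @ theta_true + 0.01 * rng.standard_normal(N)  # noisy joint torques

theta_hat, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
print("identified parameters:", theta_hat, "true:", theta_true)
```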
|
|
10:30-12:00, Paper ThAT22-NT.2 | Add to My Program |
Model Design and Concept of Operations of Standard Interface for On-Orbit Construction |
|
Zhao, Jingdong | Harbin Institute of Technology |
Wang, Zirui | Harbin Institute of Technology |
Liu, Ziyi | Harbin Institute of Technology |
Zhao, Liangliang | Harbin Institute of Technology |
Duan, Qifan | Harbin Institute of Technology |
Liu, Hong | Harbin Institute of Technology |
Keywords: Space Robotics and Automation, Compliance and Impedance Control, Mechanism Design
Abstract: The construction of large-scale space facilities requires the use of on-orbit construction technology. However, several of its key components, such as standard interface design, compliant control methods, and path planning for multi-branch robots, still need improvement before practical application. This paper presents a comprehensive solution for on-orbit construction tasks, encompassing a novel standard interface, a docking control method, and a path planning method for space multi-branch robots. Firstly, a novel standard interface is introduced, which features multiple mating modes and a lightweight design. Additionally, a compliant docking method is provided to generate lower contact forces along the Z-direction. Furthermore, for four-armed space robots, a hierarchical planning method is proposed, which innovates in environment map construction and locomotion planning. Specifically, the closed-form Minkowski sum method is employed to solve for the robot’s free space, and a concise locomotion method is elucidated based on transition support points. Finally, simulations and experiments are conducted.
|
|
10:30-12:00, Paper ThAT22-NT.3 | Add to My Program |
Enabling Faster Locomotion of Planetary Rovers with a Mechanically-Hybrid Suspension |
|
Rodríguez-Martínez, David | École Polytechnique Fédérale De Lausanne (EPFL) |
Uno, Kentaro | Tohoku University |
Sawa, Kenta | Tohoku University |
Uda, Masahiro | Tohoku University |
Kudo, Gen | Tohoku University |
Diaz Huenupan, Gustavo Hernan | Tohoku University |
Umemura, Ayumi | Tohoku University |
Santra, Shreya | Tohoku University |
Yoshida, Kazuya | Tohoku University |
Keywords: Space Robotics and Automation, Compliant Joints and Mechanisms, Mechanism Design
Abstract: The exploration of the lunar poles and the collection of samples from the martian surface are characterized by shorter time windows demanding increased autonomy and speeds. Autonomous mobile robots must intrinsically cope with a wider range of disturbances. Faster off-road navigation has been explored for terrestrial applications, but the combined effects of increased speeds and reduced gravity fields are yet to be fully studied. In this paper, we design and demonstrate a novel fully passive suspension design for wheeled planetary robots, which couples for the first time a high-range passive rocker with elastic in-wheel coil-over shock absorbers. The design was initially conceived and verified in a reduced-gravity (1.625 m/s^2) simulated environment, where three different passive suspension configurations were evaluated against steep slopes and unexpected obstacles, and later prototyped and validated in a series of field tests. The proposed mechanically-hybrid suspension proves to mitigate more effectively the negative effects (high-frequency/high-amplitude vibrations and impact loads) of faster locomotion (~1 m/s) over unstructured terrains under varied gravity fields.
|
|
10:30-12:00, Paper ThAT22-NT.4 | Add to My Program |
RoboBall: An All-Terrain Spherical Robot with a Pressurized Shell |
|
Oevermann, Micah | Texas A&M University |
Pravecek, Derek | Texas A&M University |
Jibrail, Joseph Garrett | Texas A&M University |
Jangale, Rishi | Texas A&M University |
Ambrose, Robert | Texas A&M University |
Keywords: Space Robotics and Automation, Compliant Joints and Mechanisms, Soft Robot Materials and Design
Abstract: Spherical robots are a distinctive class of mobility platform. A spherical robot is self-contained within its shell rather than relying on a chassis with wheels to navigate, and inside this shell it is completely shielded from dust and the environment. This geometric simplicity has made the spherical robot an advantageous option for all-terrain exploration and surveying. This paper focuses on a novel iteration of such a robot with a pressurized pneumatic shell design. A soft robot of this type brings the benefit of a passive, compliant contact surface that can improve its performance. However, the added softness of its shell introduces new unmodeled dynamics into the system that impair commonly used control schemes. This paper outlines the design and manufacture of a soft, inflatable, spherical shell designed for a robot driven by an internal 2-DOF pendulum. In addition, it presents models for controlling the pendulum and understanding the shell dynamics. The paper concludes with experimental validations of these models and field tests of the system on slopes, gravel, rough grass, and water.
|
|
10:30-12:00, Paper ThAT22-NT.5 | Add to My Program |
Enhanced Multifunctional Interface for Reconfigurability of Robotic Teams in Planetary Applications |
|
Yüksel, Mehmed | DFKI GmbH |
Brinkmann, Wiebke | DFKI Robotics Innovation Center Bremen |
Jankovic, Marko | German Research Center for Artificial Intelligence GmbH (DFKI) |
Kücüker, Hilmi Dogu | DFKI-RIC |
Kirchner, Frank | University of Bremen |
Keywords: Space Robotics and Automation, Mechanism Design, Engineering for Robotic Systems
Abstract: Exploration missions on extraterrestrial celestial bodies are to date performed by complex and heavy robotic systems. The trend is towards lighter modular systems that can be (re)configured in situ according to mission-specific requirements. To facilitate flexible configurability, a multifunctional interconnect is used to mechanically couple the involved systems while providing electrical power and data transmission. The paper presents the further development of the reliable electro-mechanical interface (EMI) from the TransTerrA project, which has been proven in several field tests and reached TRL 4. Docking under loads of up to 550 N has been successfully tested with the new design. The experiments presented include undocking at various inclinations with different loads expected for the application scenario. The maximum determined static load that can be carried by the further developed EMI is 2000 N. In further experiments, new contact blocks responsible for the transfer of electrical power and data were tested for water resistance and resilience to environmental factors, as well as for power and data transfer. The obtained results will be helpful in the development of a multifunctional interface suitable for lunar applications and missions with similarly challenging environmental conditions.
|
|
10:30-12:00, Paper ThAT22-NT.6 | Add to My Program |
Autonomous Perching on Flat Surfaces for Free-Flying Robots with Gecko Adhesive Gripper |
|
Hirano, Daichi | Japan Aerospace Exploration Agency |
Tanishima, Nobutaka | JAXA |
Chen, Tony G. | Stanford University |
Keywords: Space Robotics and Automation, Grippers and Other End-Effectors, Compliant Joints and Mechanisms
Abstract: Gecko-inspired adhesives have the advantage of being able to grasp and release flat surfaces in a vacuum using their microwedge structures. This makes them an especially attractive solution for perching on and grasping flat objects in space for free-flying robots. To grasp and anchor onto these flat surfaces, the gripper must ensure contact between the gecko adhesives and the surface before applying the appropriate forces to activate their adhesion. However, in the case of a free-flying robot in microgravity, physical contact with the surface induces reaction forces, causing the robot to quickly bounce away from the surface. To solve this issue, we propose a simple passive mechanism and a control method for a robotic arm on a free-flying robot with a gecko adhesive gripper. The gripper utilizes a single-motor-controlled tendon-driven mechanism mounted at the end of a robotic arm equipped with controllable-stiffness joints and a linear spring-damper system. A free-flying robot on an air-bearing platform can successfully perch on a flat surface with a velocity of up to 72.5 mm/s and with an approach angle misalignment of up to 33.0 degrees.
|
|
10:30-12:00, Paper ThAT22-NT.7 | Add to My Program |
VWDER: A Variable Wheel-Diameter Ellipsoidal Robot |
|
Qin, Ziao | Beijing University of Posts and Telecommunications |
Song, Jingzhou | Beijing University of Posts and Telecommunications |
Gong, Xinglong | Beijing University of Posts and Telecommunications |
Liu, Changrui | Beijing University of Posts and Telecommunications |
Keywords: Space Robotics and Automation, Product Design, Development and Prototyping, Surveillance Robotic Systems
Abstract: In recent years, many researchers have conducted extensive research on spherical robots due to their high flexibility and anti-overturning capabilities. Nevertheless, compared with legged and traditional wheeled robots, spherical robots face certain limitations. A spherical robot is composed of a closed spherical shell, which limits its capacity to carry payloads. At the same time, its single-point contact with the ground produces only a small friction force, making it difficult to climb obstacles such as steps and doorsills. Therefore, we propose a new solution: the variable wheel-diameter ellipsoidal robot (VWDER), which combines the characteristics of a two-wheel differential-drive robot and a spherical robot driven by an equivalent pendulum. The VWDER is equipped with six retractable shell-shaped legs on each side, and this innovative design allows both wheels to independently change diameter while rolling. Under the action of the equivalent pendulum, the main frame of the VWDER keeps its top facing up during locomotion, which makes it possible to carry payloads such as manipulator arms, cameras, and IMUs. The VWDER can climb steps or doorsills using its two adjacent shell-shaped legs. This paper introduces the design of the VWDER and analyzes its kinematics and dynamics. The experimental results verify the performance of the VWDER, including its autonomous opening and closing, obstacle crossing, automatic reorientation, and slope climbing.
|
|
10:30-12:00, Paper ThAT22-NT.8 | Add to My Program |
On Robust Control Laws Trade-Off Analysis for Space Manipulators with Uncertain Parameters and Flexible Appendages |
|
Nanos, Kostas | National Technical University of Athens |
Chachamis, Efstathios | National Technical University of Athens |
Papadopoulos, Evangelos | National Technical University of Athens |
Keywords: Space Robotics and Automation, Motion Control, Dynamics
Abstract: To accurately accomplish on-orbit tasks using Space Manipulator Systems (SMS), advanced model-based controllers, dependent on the knowledge of SMS parameters, can be employed. However, these parameters may change on orbit for several reasons. Also, during an SMS task, excitation of flexible appendages, such as solar panels, or fuel sloshing may introduce significant end-effector errors. Therefore, controllers robust to parametric uncertainty and disturbances are needed. A robust controller attractive due to its small computational effort is the Linear Parameter Varying (LPV) gain-scheduled controller. However, its design for spatial SMS is not trivial and has not been studied yet. Therefore, the aim of this work is to study and compare robust controllers and examine their applicability to SMS. An LPV plus H∞ controller is compared with a Model-Based PD and a Model-Based PD plus H∞ controller, in the presence of parametric uncertainty, noisy measurements and disturbances, using a planar example. The criteria considered include: (i) Design Complexity, (ii) Trajectory Errors, (iii) Required Torques, and (iv) Computational Effort.
|
|
ThAT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy I |
|
|
Chair: Scherer, Sebastian | Carnegie Mellon University |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
10:30-12:00, Paper ThAT23-NT.1 | Add to My Program |
N-MPC for Deep Neural Network-Based Collision Avoidance Exploiting Depth Images |
|
Jacquet, Martin | NTNU |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: This paper introduces a Nonlinear Model Predictive Control (N-MPC) framework exploiting a Deep Neural Network for processing onboard-captured depth images for collision avoidance in trajectory-tracking tasks with UAVs. The network is trained on simulated depth images to output a collision score for queried 3D points within the sensor field of view. This network is then translated into an algebraic symbolic equation and included in the N-MPC, explicitly constraining predicted positions to be collision-free throughout the receding horizon. The N-MPC achieves real-time control of a UAV at a control frequency of 100 Hz. The proposed framework is validated through statistical analysis of the collision classifier network, as well as Gazebo simulations and real experiments, to assess the resulting capabilities of the N-MPC to effectively avoid collisions in cluttered environments. The associated code is released open-source along with the training images.
|
|
10:30-12:00, Paper ThAT23-NT.2 | Add to My Program |
Aerial Transportation of Cable-Suspended Loads with an Event Camera |
|
Panetsos, Fotis | National Technical University of Athens |
Karras, George | University of Thessaly |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy
Abstract: In this work, we investigate the integration of a Dynamic Vision Sensor (DVS) into an Unmanned Aerial Vehicle (UAV) with a cable-suspended load in order to achieve a robust and fast estimation of the cable's state during the transportation of the load. Based on the advantageous properties of event cameras, our ultimate goal is to design a computationally lightweight event processing method that persistently identifies the cable and estimates its complete state – required for any controller with feedback of the cable's state – within a much shorter time period compared to frame-based algorithms. Using a point cloud representation for the incoming event streams, the proposed method achieves the fast detection of the cable while the respective measurements are afterward fitted to a Bézier curve in order to approximate both the cable angle and angular velocity. Our method is initially validated in an indoor environment, where ground truth is available from a motion capture system, and is subsequently deployed in an outdoor one in order to evaluate its robustness against noise. Throughout the outdoor experiment, the feedback provided by the DVS is incorporated into a Nonlinear Model Predictive Control (NMPC) scheme which drives an octorotor towards reference setpoint positions while minimizing the cable angular motion.
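As a rough illustration of the curve-fitting step described above, the sketch below fits a quadratic Bézier curve to 2D event points by linear least squares and reads off the cable angle from the tangent at the attachment point. The helper names, the uniform parameterization, and the downward-vertical angle convention are illustrative assumptions, not the paper's implementation.

import numpy as np

def fit_quadratic_bezier(points):
    # Least-squares fit of B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2
    # to ordered 2D points (hypothetical helper; uniform t for simplicity).
    n = len(points)
    t = np.linspace(0.0, 1.0, n)
    A = np.stack([(1 - t)**2, 2 * t * (1 - t), t**2], axis=1)  # (n, 3) basis
    ctrl, *_ = np.linalg.lstsq(A, points, rcond=None)          # (3, 2) control points
    return ctrl

def cable_angle(ctrl):
    # Tangent at the attachment point t = 0 is B'(0) = 2 (P1 - P0);
    # the angle is measured from the downward vertical (an assumption).
    d = 2.0 * (ctrl[1] - ctrl[0])
    return np.arctan2(d[0], -d[1])

# Toy usage: a few noisy points along a slightly bent hanging cable.
pts = np.array([[0.0, 0.0], [0.1, -0.5], [0.25, -1.0], [0.45, -1.5]])
print(cable_angle(fit_quadratic_bezier(pts)))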
|
|
10:30-12:00, Paper ThAT23-NT.3 | Add to My Program |
Autonomous Overhead Powerline Recharging for Uninterrupted Drone Operations |
|
Duong Hoang, Viet | University of Southern Denmark |
Falk Nyboe, Frederik | University of Southern Denmark |
Malle, Nicolaj | University of Southern Denmark |
Ebeid, Emad | University of Southern Denmark |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Aerial Systems: Mechanics and Control
Abstract: We present a fully autonomous self-recharging drone system capable of long-duration sustained operations near powerlines. The drone is equipped with a robust onboard perception and navigation system that enables it to locate powerlines and approach them for landing. A passively actuated gripping mechanism grasps the powerline cable during landing after which a control circuit regulates the magnetic field inside a split-core current transformer to provide sufficient holding force as well as battery recharging. The system is evaluated in an active outdoor three-phase powerline environment. We demonstrate multiple contiguous hours of fully autonomous uninterrupted drone operations composed of several cycles of flying, landing, recharging, and takeoff, validating the capability of extended, essentially unlimited, operational endurance.
|
|
10:30-12:00, Paper ThAT23-NT.4 | Add to My Program |
LPNet: A Reaction-Based Local Planner for Autonomous Collision Avoidance Using Imitation Learning |
|
Lu, Junjie | Tianjin University |
Tian, Bailing | Tianjin University |
Shen, Hongming | Nanyang Technological University |
Zhang, Xuewei | Tianjin University |
Hui, Yulin | Tianjin University |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Integrated Planning and Learning
Abstract: In this work, we propose a reaction-based local planner for autonomous collision avoidance of a quadrotor in obstacle-cluttered environments without relying on an explicit map. Our approach searches for a feasible trajectory using a set of motion primitives in a state lattice and represents the optimal one as a polynomial by solving an optimal control problem. A modified Q-network, termed LPNet, is presented to predict the action-values of motion primitives directly from the current depth image and the state estimate of the quadrotor. To train the proposed LPNet, a primitive-based expert policy with privileged information about the surroundings and an unconstrained computational budget is developed to provide demonstrations for imitation learning. Finally, a series of experiments are conducted to demonstrate the effectiveness and time-efficiency of the proposed method in both simulation and the real world.
|
|
10:30-12:00, Paper ThAT23-NT.5 | Add to My Program |
Autonomous Exploration of Unknown 3D Environments Using a Frontier-Based Collector Strategy |
|
Changoluisa Caiza, Ivan David | University of Zagreb |
Milas, Ana | University of Zagreb, Faculty of Electrical Engineering and Comp |
Montes Grova, Marco Antonio | Center for Advanced Aerospace Technologies (CATEC) |
Perez Grau, Francisco Javier | Center for Advanced Aerospace Technologies |
Petrovic, Tamara | Univ. of Zagreb |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Mapping
Abstract: Autonomous exploration using unmanned aerial vehicles (UAVs) is essential for various tasks such as building inspections, rescue operations, deliveries, and warehousing. However, there are two main limitations to previous approaches: they may not be able to provide a complete map of the environment and assume that the map built during exploration is accurate enough for safe navigation, which is usually not the case. To address these limitations, a novel exploration method is proposed that combines frontier-based exploration with a collector strategy that achieves global exploration and complete map creation. In each iteration, the collector strategy stores and validates frontiers detected during exploration and selects the next best frontier to navigate to. The collector strategy ensures global exploration by balancing the exploitation of a known map with the exploration of unknown areas. In addition, the online path replanning ensures safe navigation through the map created during motion. The performance of the proposed method is verified by exploring 3D simulation environments in comparison with the state-of-the-art methods. Finally, the proposed approach is validated in a real-world experiment.
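The collector strategy can be pictured with a short sketch of its store/validate/select loop. Everything below (the helper callables and the gain-minus-cost selection rule) is hypothetical and only meant to convey the idea in the abstract, not the paper's implementation.

def explore_step(octomap, pose, collected, fns):
    # One iteration of the collector strategy (sketch). `fns` bundles
    # hypothetical helpers: detect_frontiers, still_frontier,
    # is_reachable, information_gain, travel_cost.
    collected |= set(fns.detect_frontiers(octomap))          # 1) store new frontiers
    valid = {f for f in collected                            # 2) re-validate stored ones
             if fns.still_frontier(f, octomap)
             and fns.is_reachable(f, octomap, pose)}
    if not valid:
        return None                                          # map complete, done
    # 3) next-best frontier: trade exploration gain against travel cost
    return max(valid, key=lambda f: fns.information_gain(f, octomap)
                                    - fns.travel_cost(pose, f))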
|
|
10:30-12:00, Paper ThAT23-NT.6 | Add to My Program |
Learning Multi-Scale Context Mask-RCNN Network for Slant Angled Aerial Imagery in Instance Segmentation Using Sim2Real |
|
Saadiyean, Qiranul | Indian Institute of Science, Bangalore |
S P, Samprithi | PES University |
Sundaram, Suresh | Indian Institute of Science |
Keywords: Aerial Systems: Applications, Aerial Systems: Perception and Autonomy, Surveillance Robotic Systems
Abstract: While instance segmentation models excel at object detection in satellite imagery, their performance drops when applied to slant-angled aerial images due to occlusion and scale variation. This is mainly caused by a lack of training data for such diverse viewpoints and scales. To address this limitation, we propose the Sim2Real-based Multi-Scale Context Mask-RCNN (MSC-RCNN) network, specifically designed for slant-angled aerial imagery. Sim2Real-based transfer learning is adopted to compensate for the limited availability of real-world slant-angle training data. A synthetic dataset is generated using Unreal Engine, detailing the methodology for replicating real-world scenes to produce diverse slant-angle drone datasets with various weather conditions and backgrounds. The model leverages two distinct feature pyramid backbones, one incorporating dilated convolutions to address large-scale objects and the other optimized for regular convolutions. Their outputs are fused to effectively detect objects across various scales and angles. Experiments demonstrate that incorporating this synthetic data significantly reduces reliance on real data while maintaining high mean Average Precision (mAP) scores. Compared to the baseline Mask R-CNN, the proposed approach with Sim2Real adaptation and the MSC-RCNN architecture achieves a remarkable 7.6% improvement in instance segmentation accuracy with only a 6% increase in model size.
|
|
10:30-12:00, Paper ThAT23-NT.7 | Add to My Program |
Real-Time Multi-Modal Active Vision for Object Detection on UAVs Equipped with Limited Field of View LiDAR and Camera |
|
Shi, Chuanbeibei | University of Bristol |
Lai, Ganghua | Beijing Institute of Technology |
Yu, Yushu | Beijing Institute of Technology |
Bellone, Mauro | Tallinn University of Technology |
Lippiello, Vincenzo | University of Naples FEDERICO II |
Keywords: Aerial Systems: Applications, Perception-Action Coupling, Sensor Fusion
Abstract: This paper aims to solve the challenging problems in multi-modal active vision for object detection on unmanned aerial vehicles (UAVs) with a monocular camera and a limited Field of View (FoV) LiDAR. The point cloud acquired from the low-cost LiDAR is first converted into a 3-channel tensor via motion compensation, accumulation, projection, and upsampling processes. The generated 3-channel point cloud tensor and the RGB image are fused into a 6-channel tensor using an early fusion strategy for object detection based on a Gaussian YOLO network structure. To address the limited onboard computational resources and improve real-time performance, the velocity information of the UAV is further fused with the detection results based on an extended Kalman filter (EKF). A perception-aware model predictive control (MPC) is designed to achieve active vision on our UAV. According to our performance evaluation, our pre-processing step improves the running time of other methods in the literature by a factor of 10 while maintaining acceptable detection performance. Furthermore, our fusion architecture reaches 94.6 mAP on the test set, outperforming the individual sensor networks by roughly 5%. We also describe an implementation of the overall algorithm on a UAV platform and validate it in real-world experiments.
|
|
10:30-12:00, Paper ThAT23-NT.8 | Add to My Program |
High-Speed Stereo Visual SLAM for Low-Powered Computing Devices |
|
Kumar, Ashish | Indian Institute of Technology, Kanpur |
Park, Jaesik | Seoul National University |
Behera, Laxmidhar | IIT Kanpur |
Keywords: Aerial Systems: Applications, SLAM, Embedded Systems for Robotic and Automation
Abstract: We present an accurate and GPU-accelerated Stereo Visual SLAM design called Jetson-SLAM. It exhibits frame-processing rates above 60 FPS on NVIDIA's low-powered 10 W Jetson-NX embedded computer and above 200 FPS on desktop-grade 200 W GPUs, even in stereo configuration and in the multiscale setting. Our contributions are threefold: (i) a Bounded Rectification technique to prevent tagging many non-corner points as corners in FAST detection, improving SLAM accuracy; (ii) a novel Pyramidal Culling and Aggregation (PyCA) technique that yields robust features while suppressing redundant ones at high speeds by harnessing a GPU device. PyCA uses our new Multi-Location Per Thread (MLPT) culling strategy and Thread-Efficient Warp-Allocation (TEWA) scheme for the GPU to enable Jetson-SLAM to achieve high accuracy and speed on embedded devices; (iii) the Jetson-SLAM library achieves resource efficiency through a data-sharing mechanism. Our experiments on three challenging datasets (KITTI, EuRoC, and KAIST-VIO) and two highly accurate SLAM backends (Full-BA and ICE-BA) show that Jetson-SLAM is the fastest available accurate and GPU-accelerated SLAM system (Fig. 1).
|
|
ThAT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry II |
|
|
Chair: Karydis, Konstantinos | University of California, Riverside |
Co-Chair: Singh, Arun Kumar | University of Tartu |
|
10:30-12:00, Paper ThAT24-NT.1 | Add to My Program |
Robotic Assessment of a Crop's Need for Watering |
|
Dechemi, Amel | University of California, Riverside |
Chatziparaschis, Dimitrios | UC Riverside |
Chen, Joshua | University of California, Riverside |
Campbell, Merrick | University of California, Riverside |
Shamshirgaran, Azin | University of California, Merced |
Mucchiani, Caio | University of California Riverside |
Roy-Chowdhury, Amit | University of California, Riverside |
Carpin, Stefano | University of California, Merced |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Agricultural Automation, Mechanism Design, Motion and Path Planning
Abstract: This paper focuses on developing a robot-assisted system for stem water potential (SWP) measurement in orchards. SWP is a metric frequently used by agronomists and growers to optimize irrigation schedules for crops. However, such measurements are currently made via a time- and labor-intensive procedure that faces the challenges of sparse sampling and human variability in determining SWP. In response to these challenges, our proposed robotic system aims to automate time-consuming and difficult-to-perform tasks by collecting multiple leaves and automating parts of the overall SWP analysis process. To achieve this, this work considers three core components: 1) informed planning, to determine where to collect leaves to obtain the most informative readings; 2) system design and integration for autonomous leaf retrieval with a mobile manipulator and a custom-made end-effector; and 3) learning-based machine vision for automated visual identification of leaf xylem wetness during SWP analysis. Taken together, these constitute the core building blocks toward enabling complete robot autonomy in physical specimen sampling and transport in the field.
|
|
10:30-12:00, Paper ThAT24-NT.2 | Add to My Program |
Enhancement on Target-Gripper Alignment: A Tomato Harvesting Robot with Dual-Camera Image-Based Visual Servoing |
|
Lian, Feng-Li | National Taiwan University |
Wang, Lu-Ching | National Taiwan University |
Chu, Yen-Cheng | National Taiwan University |
Keywords: Agricultural Automation, Mobile Manipulation, Field Robots
Abstract: The application of automation to crop harvesting has increased over the past decades, and various types of harvesting robots are emerging in both commercial and research settings. One of the main challenges is the precise alignment of the gripper and the target crop: an undesired dislocation can harm both gripper and crop, and is mainly caused by uncertainties from the sensors and the manipulator. To solve this problem, a dual-camera setup is designed and implemented on a self-built robot. The perception of the tomato is done by a fixed depth camera and a depth-less camera on the gripper. The proposed dual-camera IBVS (Image-Based Visual Servoing) controller is designed to handle the image feedback from both cameras, and a proof of asymptotic convergence is provided. Furthermore, cumulative error compensation reduces the time required for the harvesting process. The experiments were conducted in a greenhouse and tested under various conditions. The time cost is formulated as a function, and the success rate of picking tomatoes is 68.4%.
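For readers unfamiliar with IBVS, a minimal single-camera sketch of the underlying control law is given below; the paper's dual-camera controller stacks feedback from both cameras and is more involved. The gain and the point-feature interaction matrix follow the standard textbook formulation, not the paper's exact design.

import numpy as np

def point_interaction_matrix(x, y, Z):
    # Standard interaction matrix of a normalized image point (x, y)
    # at depth Z, mapping the 6-DOF camera twist to feature velocity.
    return np.array([
        [-1/Z,    0, x/Z,     x*y, -(1 + x*x),  y],
        [   0, -1/Z, y/Z, 1 + y*y,       -x*y, -x],
    ])

def ibvs_velocity(s, s_star, L, lam=0.5):
    # Classic IBVS law v = -lambda * pinv(L) @ (s - s*): drive the
    # stacked image-feature error e to zero via the camera twist.
    e = s - s_star
    return -lam * np.linalg.pinv(L) @ e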
|
|
10:30-12:00, Paper ThAT24-NT.3 | Add to My Program |
Few-Shot Fruit Segmentation Via Transfer Learning |
|
James, Jordan | University of Texas at Arlington |
Manching, Heather K. | North Carolina State University |
Hulse-Kemp, Amanda M. | North Carolina State University |
Beksi, William J. | The University of Texas at Arlington |
Keywords: Agricultural Automation, Computer Vision for Automation, Object Detection, Segmentation and Categorization
Abstract: Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.
|
|
10:30-12:00, Paper ThAT24-NT.4 | Add to My Program |
Spatio-Temporal Correspondence Estimation of Growing Plants by Hausdorff Distance Based Skeletonization for Organ Tracking |
|
Pandey, Sharmistha B | ISRO |
Colliaux, David | Sony CSL Paris |
Chaudhury, Ayan | Indian Institute of Technology Kharagpur |
Keywords: Agricultural Automation, Computer Vision for Automation, Visual Tracking
Abstract: Tracking of plant organs over spatio-temporal sequences of point cloud data is one of the demanding tasks of agricultural robotics for automated plant monitoring and growth analysis. Due to the complex geometry of plants, it is extremely difficult to identify and track the individual organs in different growth stages of plants. In this paper, we present an approach to perform correspondence estimation of different plant organs over a series of spatio-temporal data. The approach is based on two stages. In the first stage, we develop a robust skeleton extraction method from unstructured plant point cloud data by adopting the Hausdorff distance metric and a modified breadth-first search algorithm. The proposed skeletonization method is shown to perform better than the state of the art, especially in handling very thin and delicate branches. We also address an overlooked problem of connecting skeleton points in the form of a graph, and demonstrate that different types of plant phenotype parameters can be obtained in a fully automatic manner from the skeleton graph. In the second stage, we exploit the skeleton graphs in developing an algorithm to perform correspondence estimation among the skeleton nodes using a cosine similarity based approach. We demonstrate the effectiveness of the proposed skeletonization technique in tracking different organs of the plant by finding good quality correspondences. Experiments are performed on three datasets of real and synthetic sequences of spatio-temporal plant point cloud data to demonstrate the effectiveness of the proposed method.
|
|
10:30-12:00, Paper ThAT24-NT.5 | Add to My Program |
Field Robot for High-Throughput and High-Resolution 3D Plant Phenotyping |
|
Esser, Felix | University of Bonn |
Rosu, Radu Alexandru | University of Bonn |
Cornelissen, Andre | University of Bonn |
Klingbeil, Lasse | University of Bonn |
Kuhlmann, Heiner | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Agricultural Automation, Field Robots, Robotics and Automation in Agriculture and Forestry
Abstract: With the need to feed a growing world population, the efficiency of crop production is of paramount importance. To support breeding and field management, various characteristics of the plant phenotype need to be measured, a time-consuming process when performed manually. We present a robotic platform equipped with multiple laser and camera sensors for high-throughput, high-resolution in-field plant scanning. We create digital twins of the plants through 3D reconstruction. This allows the estimation of phenotypic traits such as leaf area, leaf angle, and plant height. We validate our system on a real field, where we reconstruct accurate point clouds and meshes of sugar beet, soybean, and maize.
|
|
10:30-12:00, Paper ThAT24-NT.6 | Add to My Program |
A Hybrid Controller Enhancing Transient Performance for an Aerial Manipulator Extracting a Wedged Object (I) |
|
Byun, Jeonghyun | Seoul National University |
Jang, Inkyu | Seoul National University |
Lee, Dongjae | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Aerial Systems: Applications, Hybrid Logical/Dynamical Planning and Verification, Robust/Adaptive Control
Abstract: Autonomous aerial manipulation requires the capability to handle inevitable dynamic changes during physical interaction. Previously, very few studies have addressed the stability and transient performance of the scenarios involving abrupt changes in dynamics. This paper proposes a hybrid controller enhancing transient performance for an aerial manipulator extracting an object wedged in a static structure. This task incurs a significant jump in the interaction force on the end-effector so that the analysis using the concept of hybrid dynamical systems is required. To demonstrate the dynamic characteristics of the object-extracting aerial manipulator, we derive the dynamic equations for two flight modes, i.e., free-flight and object-extracting, and the rule of state jumps. Also, we design control strategies which enhance the transient performance during flight mode transition. Then, the stability of the proposed control law is proven, and the overshoot reduction after the object extraction is analyzed. To show the improved performance, we conduct plug-pulling experiments with a quadrotor-based aerial manipulator using the proposed controller and two different existing controllers. The comparative results confirm that our controller enables the aerial manipulator to maintain its stability after the flight mode transition and shows the best transient performance in overshoot minimization among three controllers.
|
|
10:30-12:00, Paper ThAT24-NT.7 | Add to My Program |
VACNA: Visibility-Aware Cooperative Navigation with Application in Inventory Management |
|
Masnavi, Houman | Institute of Technology, University of Tartu |
Shrestha, Jatan | University of Tartu |
Kruusamäe, Karl | University of Tartu |
Singh, Arun Kumar | University of Tartu |
Keywords: Aerial Systems: Applications, Machine Learning for Robot Control, Optimization and Optimal Control
Abstract: This paper presents an online trajectory planning algorithm for an Unmanned Aerial Vehicle (UAV) to autonomously scan warehouse racks for inventory management. Our main motivation is to make small-sized UAVs with limited computing and sensing hardware capable of reliably performing the scanning task in cluttered environments. To this end, we propose a cooperative system where an Unmanned Ground Vehicle (UGV) guides the UAV using the novel template of visibility-aware cooperative navigation (VACNA). We propose a Cross-Entropy Method (CEM) based approach for solving the trajectory optimization underpinning VACNA. In particular, our CEM projects sampled vehicle trajectories onto the constraint sets before evaluating the cost functions. We further learn a deep generative model in the form of a Conditional Variational Autoencoder (CVAE) from expert demonstrations to warm-start our optimizer. We improve the state-of-the-art in the following respects. First, we present a detailed analysis of the role of our proposed cost and constraint functions for cooperative occlusion-free navigation. Second, we compare our custom CEM optimizer with conventional variants and show significantly reduced collision and occlusion rates. Finally, our CVAE initialization allows our optimizer to operate with smaller batch sizes and achieve real-time performance even on embedded hardware devices like NVIDIA Jetson Xavier.
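A minimal sketch of a projection-augmented CEM loop like the one the abstract describes is given below; the cost and projection functions are problem-specific placeholders, and the elite-refit update is the generic textbook variant rather than the paper's custom optimizer.

import numpy as np

def cem_plan(cost, project, mu, sigma, iters=20, batch=100, elite_frac=0.1):
    # Cross-Entropy Method over flattened trajectory parameters; each
    # sample is projected onto the constraint set *before* costing.
    n_elite = max(1, int(batch * elite_frac))
    for _ in range(iters):
        samples = np.random.randn(batch, mu.size) * sigma + mu
        samples = np.array([project(s) for s in samples])   # constraint projection
        elites = samples[np.argsort([cost(s) for s in samples])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu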
|
|
10:30-12:00, Paper ThAT24-NT.8 | Add to My Program |
A CBF-Adaptive Control Architecture for Visual Navigation for UAV in the Presence of Uncertainties |
|
Sankaranarayanan, Viswa Narayanan | Luleå University of Technology |
Saradagi, Akshit | Luleå University of Technology, Luleå, Sweden |
Satpute, Sumeet | Luleå University of Technology |
Nikolakopoulos, George | Luleå University of Technology |
Keywords: Aerial Systems: Applications, Robot Safety
Abstract: In this article, we propose a control solution for the safe transfer of a quadrotor UAV between two surface robots, positioning itself using only the visual features on the surface robots, which enforces safety constraints for precise landing and visual locking in the presence of modeling uncertainties and external disturbances. The controller handles the ascending and descending phases of the navigation using a visual-locking control barrier function (VCBF) and a parametrizable switching descending CBF (DCBF), respectively, eliminating the need for an external planner. The control scheme uses a backstepping approach for the position controller, with the CBF filter acting on the position kinematics to produce a filtered virtual velocity control input, which is tracked by an adaptive controller to overcome modeling uncertainties and external disturbances. The experimental validation is carried out with a UAV that navigates from the base to the target using an RGB camera.
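As background, a CBF acts as a quadratic-program safety filter on a nominal control input. With a single affine constraint the QP has a closed form, sketched below; the paper's VCBF/DCBF construction and the adaptive tracking layer go well beyond this illustration.

import numpy as np

def cbf_filter(u_des, a, b):
    # Minimal safety filter: min ||u - u_des||^2  s.t.  a @ u >= b,
    # where the constraint encodes a CBF condition
    # Lf_h + Lg_h @ u >= -alpha(h), i.e. a = Lg_h, b = -alpha(h) - Lf_h.
    # A single affine constraint admits a closed-form QP solution.
    slack = a @ u_des - b
    if slack >= 0:
        return u_des                        # nominal input already safe
    return u_des - slack * a / (a @ a)      # minimal correction onto the boundary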
|
|
ThAT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization VII |
|
|
Chair: Lu, Chris Xiaoxuan | University College London |
Co-Chair: Schwertfeger, Sören | ShanghaiTech University |
|
10:30-12:00, Paper ThAT25-NT.1 | Add to My Program |
Multimodal Indoor Localization Using Crowdsourced Radio Maps |
|
Yi, Zhaoguang | University of Edinburgh |
Wen, Xiangyu | The University of Edinburgh |
Xia, Qiyue | University of Edinburgh |
Li, Peize | University of Edinburgh |
Zampella, Francisco | Edinburgh Research Center, Huawei Technologies Co., Ltd |
Alsehly, Firas | Huawei Technologies R&D UK |
Lu, Chris Xiaoxuan | University of Edinburgh |
Keywords: Localization
Abstract: Indoor Positioning Systems (IPS) traditionally rely on odometry and building infrastructures like WiFi, often supplemented by building floor plans for increased accuracy. However, the limitation of floor plans in terms of availability and timeliness of updates challenges their wide applicability. In contrast, the proliferation of smartphones and WiFi-enabled robots has made crowdsourced radio maps – databases pairing locations with their corresponding Received Signal Strengths (RSS) – increasingly accessible. These radio maps not only provide WiFi fingerprint-location pairs but encode movement regularities akin to the constraints imposed by floor plans. This work investigates the possibility of leveraging these radio maps as a substitute for floor plans in multimodal IPS. We introduce a new framework to address the challenges of radio map inaccuracies and sparse coverage. Our proposed system integrates an uncertainty-aware neural network model for WiFi localization and a bespoke Bayesian fusion technique for optimal fusion. Extensive evaluations on multiple real-world sites indicate a significant performance enhancement, with results showing approximately 25% improvement over the best baseline.
|
|
10:30-12:00, Paper ThAT25-NT.2 | Add to My Program |
CLIP-Loc: Multi-Modal Landmark Association for Global Localization in Object-Based Maps |
|
Matsuzaki, Shigemichi | Toyota Motor Corporation |
Sugino, Takuma | Toyota Motor Corporation |
Tanaka, Kazuhito | Toyota Motor Corporation |
Sha, Zijun | Toyota Motor Corporation |
Nakaoka, Shintaro | Keio University |
Yoshizawa, Shintaro | Toyota Motor Corporation |
Shintani, Kazuhiro | Toyota Motor Corporation |
Keywords: Localization
Abstract: This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. This approach becomes infeasible as the number of landmarks increases due to the exponential growth of correspondence candidates. In this paper, we propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations using a Vision Language Model (VLM). By leveraging detailed text information, our approach efficiently extracts correspondences compared to methods using only object categories. Through experiments, we demonstrate that the proposed method enables more accurate global localization with fewer iterations compared to baseline methods, exhibiting its efficiency.
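A generic sketch of the similarity-based association idea follows, assuming landmark-description and image-crop embeddings (e.g., from a VLM such as CLIP) have already been computed; the similarity threshold and one-to-one assignment are illustrative choices, not the paper's exact procedure.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_landmarks(det_emb, lm_emb, thresh=0.25):
    # Associate detected objects with text-labelled map landmarks by
    # cosine similarity of embeddings, then solve a one-to-one assignment.
    det = det_emb / np.linalg.norm(det_emb, axis=1, keepdims=True)
    lm = lm_emb / np.linalg.norm(lm_emb, axis=1, keepdims=True)
    sim = det @ lm.T                           # cosine similarity matrix
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] > thresh]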
|
|
10:30-12:00, Paper ThAT25-NT.3 | Add to My Program |
Multiple Update Particle Filter: Position Estimation by Combining GNSS Pseudorange and Carrier Phase Observations |
|
Suzuki, Taro | Chiba Institute of Technology |
Keywords: Localization
Abstract: This paper presents an efficient method for updating particles in a particle filter (PF) to address the position estimation problem when dealing with sharp-peaked likelihood functions derived from multiple observations. Sharp-peaked likelihood functions commonly arise from millimeter-accurate distance observations of carrier phases in the global navigation satellite system (GNSS). However, when such likelihood functions are used for particle weight updates, the absence of particles within the peaks leads to all particle weights becoming zero. To overcome this problem, this study introduces a straightforward and effective approach for updating particles when dealing with sharp-peaked likelihood functions obtained from multiple observations. The proposed method, termed the multiple update PF, leverages prior knowledge regarding the spread of distribution of each likelihood function and conducts weight updates and resampling iteratively in the particle update process, prioritizing the likelihood function spreads. Experimental results demonstrate the efficacy of our proposed method, particularly when applied to position estimation utilizing GNSS pseudorange and carrier phase observations. The multiple update PF exhibits faster convergence with fewer particles when compared to the conventional PF. Moreover, vehicle position estimation experiments conducted in urban environments reveal that the proposed method outperforms conventional GNSS positioning techniques, yielding more accurate position estimates.
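The core idea, applying broad likelihoods before sharp ones with resampling in between, can be sketched in a few lines. The (eval_fn, spread) interface, the ordering by spread, and the resampling jitter are assumptions made for this illustration.

import numpy as np

def multiple_update(particles, weights, likelihoods):
    # Apply each observation's likelihood in order of decreasing spread
    # (broad pseudorange first, sharp carrier phase last), resampling
    # in between so particles survive inside ever narrower peaks.
    # `likelihoods` is a list of (eval_fn, spread) pairs (hypothetical).
    for eval_fn, _ in sorted(likelihoods, key=lambda lf: -lf[1]):
        weights = weights * eval_fn(particles)
        weights /= weights.sum()
        idx = np.random.choice(len(particles), len(particles), p=weights)
        particles = particles[idx] + np.random.randn(*particles.shape) * 1e-3
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights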
|
|
10:30-12:00, Paper ThAT25-NT.4 | Add to My Program |
Robust Lifelong Indoor LiDAR Localization Using the Area Graph |
|
Xie, Fujing | Shanghaitech University |
Schwertfeger, Sören | ShanghaiTech University |
Keywords: Localization, Autonomous Vehicle Navigation, Semantic Scene Understanding
Abstract: Lifelong indoor localization in a given map is the basis for navigation of autonomous mobile robots. In this letter, we address the problem of robust localization in cluttered indoor environments like office spaces and corridors, using 3D LiDAR point clouds in a given Area Graph, which is a hierarchical, topometric semantic map representation that uses polygons to demark areas such as rooms, corridors or buildings. This representation is very compact, can represent different floors of buildings through its hierarchy, and provides semantic information that helps with localization, like the poses of doors and glass. In contrast, commonly used map representations, such as occupancy grid maps or point clouds, lack these features and require frequent updates in response to environmental changes (e.g. moved furniture). Our approach, AGLoc, instead matches against lifelong architectural features such as walls and doors. For that, we apply filtering to remove clutter from the 3D input point cloud and then employ further scoring and weight functions for localization. Given a broad initial guess from WiFi and barometer localization, our experiments show that our global localization and the weighted point-to-line ICP pose tracking perform very well, even when compared to localization and SLAM algorithms that use the current, feature-rich cluttered map for localization.
|
|
10:30-12:00, Paper ThAT25-NT.5 | Add to My Program |
Multi-Camera Asynchronous Ball Localization and Trajectory Prediction with Factor Graphs and Human Poses |
|
Xiao, Qingyu | Georgia Institute of Technology |
Zaidi, Zulfiqar | Georgia Institute of Technology |
Gombolay, Matthew | Georgia Institute of Technology |
Keywords: Localization, Probabilistic Inference
Abstract: The rapid and precise localization and prediction of a ball are critical for developing agile robots in ball sports, particularly in sports like tennis characterized by high-speed ball movements and powerful spins. The Magnus effect induced by spin adds complexity to trajectory prediction during flight and to bounce dynamics upon contact with the ground. In this study, we introduce an innovative approach that combines a multi-camera system with factor graphs for real-time and asynchronous 3D tennis ball localization. Additionally, we estimate hidden states like velocity and spin for trajectory prediction. Furthermore, to enhance spin inference early in the ball's flight, where limited observations are available, we integrate human pose data using a temporal convolutional network (TCN) to compute spin priors within the factor graph. This refinement provides more accurate spin priors at the beginning of the factor graph, leading to improved early-stage hidden-state inference for prediction. Our results show the trained TCN can predict the spin priors with an RMSE of 5.27 Hz. Integrating the TCN into the factor graph reduces the prediction error of landing positions by over 63.6% compared to a baseline method that utilizes an adaptive extended Kalman filter.
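For intuition, flight prediction with spin amounts to integrating a point-mass model with drag and a Magnus term; a minimal Euler-step sketch with illustrative coefficients (not the paper's identified values, and with bounce dynamics omitted) is:

import numpy as np

def step_ball(p, v, w, dt=0.005, kd=0.1, km=0.02):
    # One Euler step of a spinning-ball flight model:
    #   a = g - kd * |v| * v + km * (w x v)
    # where w is the spin vector (rad/s) producing the Magnus force.
    g = np.array([0.0, 0.0, -9.81])
    a = g - kd * np.linalg.norm(v) * v + km * np.cross(w, v)
    return p + v * dt, v + a * dt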
|
|
10:30-12:00, Paper ThAT25-NT.6 | Add to My Program |
Doppler-Only Single-Scan 3D Vehicle Odometry |
|
Galeote-Luque, Andres | Universidad De Málaga |
Kubelka, Vladimir | Örebro University |
Magnusson, Martin | Örebro University |
Ruiz-Sarmiento, J.R. | University of Malaga |
Gonzalez-Jimenez, Javier | University of Malaga |
Keywords: Localization, Range Sensing, Autonomous Vehicle Navigation
Abstract: We present a novel 3D odometry method that recovers the full motion of a vehicle from a Doppler-capable range sensor alone. It leverages the radial velocities measured from the scene, estimating the sensor's velocity from a single scan. The vehicle's 3D motion, defined by its linear and angular velocities, is calculated taking into consideration its kinematic model, which provides a constraint between the velocity measured at the sensor frame and the vehicle frame. Experiments carried out prove the viability of our single-sensor method compared to mounting an additional IMU: our method provides a more reliable translation estimate, avoiding the errors that IMUs accumulate due to noise and biases. Its short-term accuracy and fast operation (~5 ms) make it a proper candidate to supply the initialization to more complex localization algorithms or mapping pipelines. Not only does it reduce the error of the mapper, but it does so at a level of accuracy comparable to an IMU, all without the need to mount and calibrate an extra sensor on the vehicle.
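The single-scan velocity estimate at the heart of such methods reduces to a small least-squares problem; a minimal sketch follows (static-scene assumption, outlier rejection of moving objects omitted):

import numpy as np

def sensor_velocity(dirs, v_radial):
    # Estimate the sensor's linear velocity from one Doppler scan.
    # For a static scene, each return with unit direction d_i measures
    # v_r,i = -d_i @ v_sensor, so v_sensor is the least-squares solution
    # of dirs @ v = -v_radial.
    v, *_ = np.linalg.lstsq(dirs, -v_radial, rcond=None)
    return v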
|
|
10:30-12:00, Paper ThAT25-NT.7 | Add to My Program |
Do We Need Scan-Matching in Radar Odometry? |
|
Kubelka, Vladimir | Örebro University |
Fritz, Emil | Örebro University |
Magnusson, Martin | Örebro University |
Keywords: Localization, Range Sensing, Data Sets for SLAM
Abstract: There is a current increase in the development of “4D” Doppler-capable radar and lidar range sensors that produce 3D point clouds where all points also have information about the radial velocity relative to the sensor. 4D radars in particular are interesting for object perception and navigation in low-visibility conditions (dust, smoke) where lidars and cameras typically fail. With the advent of high-resolution Doppler-capable radars comes the possibility of estimating odometry from single point clouds, foregoing the need for scan registration which is error-prone in feature-sparse field environments. We compare several odometry estimation methods, from direct integration of Doppler/IMU data and Kalman filter sensor fusion to 3D scan-to-scan and scan-to-map registration, on three datasets with data from two recent 4D radars and two IMUs. Surprisingly, our results show that the odometry from Doppler and IMU data alone give similar or better results than 3D point cloud registration. In our experiments, the position drift can be as low as 0.9% over 1.8 and 4.5 km trajectories. That allows accurate estimation of 6-DOF ego-motion over long distances also in feature-sparse mine environments. These results are useful not least for applications of navigation with resource-constrained robot platforms in feature-sparse and low-visibility conditions such as mining, construction, and search & rescue operations.
|
|
10:30-12:00, Paper ThAT25-NT.8 | Add to My Program |
Outram: One-Shot Global Localization Via Triangulated Scene Graph and Global Outlier Pruning |
|
Yin, Pengyu | Nanyang Technological University |
Cao, Haozhi | Nanyang Technological University |
Nguyen, Thien-Minh | Nanyang Technological University |
Yuan, Shenghai | Nanyang Technological University |
Zhang, Shuyang | The Hong Kong University of Science and Technology |
Liu, Kangcheng | ETH Zurich |
Xie, Lihua | Nanyang Technological University |
Keywords: Localization, Semantic Scene Understanding, SLAM
Abstract: One-shot LiDAR localization refers to the ability to estimate the robot pose from one single point cloud, which yields significant advantages in initialization and relocalization processes. In the point cloud domain, the topic has been extensively studied as a global descriptor retrieval (i.e., loop closure detection) and pose refinement (i.e., point cloud registration) problem both in isolation or combined. However, few have explicitly considered the relationship between candidate retrieval and correspondence generation in pose estimation, leaving them brittle to substructure ambiguities. To this end, we propose a hierarchical one-shot localization algorithm called Outram that leverages substructures of 3D scene graphs for locally consistent correspondence searching and global substructure-wise outlier pruning. Such a hierarchical process couples the feature retrieval and the correspondence extraction to resolve the substructure ambiguities by conducting a local-to-global consistency refinement. We demonstrate the capability of Outram in a variety of scenarios in multiple large-scale outdoor datasets. Our implementation is open-sourced: https://github.com/Pamphlett/Outram.
|
|
10:30-12:00, Paper ThAT25-NT.9 | Add to My Program |
Fast and Consistent Covariance Recovery for Sliding-Window Optimization-Based VINS |
|
Chen, Chuchu | University of Delaware |
Peng, Yuxiang | University of Delaware |
Huang, Guoquan | University of Delaware |
Keywords: Localization, Visual-Inertial SLAM, SLAM
Abstract: In this paper, we introduce a novel and efficient technique for consistent covariance recovery in nonlinear optimization-based Visual-Inertial Navigation Systems (VINS). Estimating uncertainty in real time is crucial for evaluating system performance and enhancing downstream operations such as data association. However, accessing the marginal covariance of the state variables of interest in optimization-based VINS presents a significant challenge: a computational bottleneck due to the need to invert the high-dimensional information (Hessian) matrix. In our recent work [1], the First-Estimates Jacobian (FEJ) methodology was used to properly fix state linearization points in optimization-based VINS, which seems counter-intuitive but improves estimation performance in both consistency and accuracy. Capitalizing on this unique aspect of the FEJ strategy, in this work we carefully design the covariance recovery algorithm to improve efficiency by avoiding redundant computation. Remarkably, our approach achieves a computational speed that is 4-10 times faster than existing methods. Through comprehensive numerical evaluations across four state-of-the-art marginalization archetypes, we not only affirm the consistency of our covariance estimates but also underscore their superior computational efficiency.
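For context, the textbook way to recover a marginal covariance block from the information matrix without forming the full inverse is sketched below (dense for clarity; the paper's contribution lies in avoiding redundant computation beyond this baseline by exploiting FEJ):

import numpy as np

def marginal_covariance(H, idx):
    # Recover the marginal covariance of the state block `idx` from the
    # information (Hessian) matrix H: solve H X = E for the selection
    # columns via a Cholesky factor, then read off the block of H^{-1}.
    L = np.linalg.cholesky(H)
    E = np.zeros((H.shape[0], len(idx)))
    E[idx, np.arange(len(idx))] = 1.0
    X = np.linalg.solve(L.T, np.linalg.solve(L, E))   # X = H^{-1} E
    return X[idx, :]                                  # marginal block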
|
|
ThAT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM IV |
|
|
Chair: Pu, Jian | Fudan University |
Co-Chair: Rosen, David | Northeastern University |
|
10:30-12:00, Paper ThAT26-NT.1 | Add to My Program |
LONER: LiDAR Only Neural Representations for Real-Time SLAM |
|
Isaacson, Seth | University of Michigan |
Kung, Pou-Chun | University of Michigan, Ann Arbor |
Srinivasan Ramanagopal, Manikandasriram | Carnegie Mellon University |
Vasudevan, Ram | University of Michigan |
Skinner, Katherine | University of Michigan |
Keywords: SLAM, Mapping, Deep Learning Methods
Abstract: This paper proposes LONER, the first real-time LiDAR SLAM algorithm that uses a neural implicit scene representation. Existing implicit mapping methods for LiDAR show promising results in large-scale reconstruction, but either require groundtruth poses or run slower than real-time. In contrast, LONER uses LiDAR data to train an MLP to estimate a dense map in real-time, while simultaneously estimating the trajectory of the sensor. To achieve real-time performance, this paper proposes a novel information-theoretic loss function that accounts for the fact that different regions of the map may be learned to varying degrees throughout online training. The proposed method is evaluated qualitatively and quantitatively on two open-source datasets. This evaluation illustrates that the proposed loss function converges faster and leads to more accurate geometry reconstruction than other loss functions used in depth-supervised neural implicit frameworks. Finally, this paper shows that LONER estimates trajectories competitively with state-of-the-art LiDAR SLAM methods, while also producing dense maps competitive with existing real-time implicit mapping methods that use groundtruth poses.
|
|
10:30-12:00, Paper ThAT26-NT.2 | Add to My Program |
LIO-EKF: High Frequency LiDAR-Inertial Odometry Using Extended Kalman Filters |
|
Wu, Yibin | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Wiesmann, Louis | University of Bonn |
Klingbeil, Lasse | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Kuhlmann, Heiner | University of Bonn |
Keywords: SLAM, Mapping, Localization
Abstract: Odometry estimation is crucial for every autonomous system requiring navigation in an unknown environment. In modern mobile robots, 3D LiDAR-inertial systems are often used for this task. By fusing LiDAR scans and IMU measurements, these systems can reduce the accumulated drift caused by sequentially registering individual LiDAR scans and provide a robust pose estimate. Although effective, LiDAR-inertial odometry systems require proper parameter tuning to be deployed. In this paper, we propose LIO-EKF, a tightly-coupled LiDAR-inertial odometry system based on point-to-point registration and the classical extended Kalman filter scheme. We propose an adaptive data association that considers the relative pose uncertainty, the map discretization errors, and the LiDAR noise. In this way, we can substantially reduce the parameters to tune for a given type of environment. The experimental evaluation suggests that the proposed system performs on par with the state-of-the-art LiDAR-inertial odometry pipelines but is significantly faster in computing the odometry. The source code of our implementation is publicly available (https://github.com/YibinWu/LIO-EKF).
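For reference, the measurement-update step of any such tightly-coupled filter has the standard EKF form; a generic dense sketch is given below (the paper's contribution lies in the adaptive data association that feeds r, H, and R, not in the update itself):

import numpy as np

def ekf_update(x, P, r, H, R):
    # Standard EKF measurement update: r is the stacked point-to-point
    # residual, H its Jacobian w.r.t. the (error) state, R the
    # measurement covariance.
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ r                            # state correction
    P = (np.eye(len(x)) - K @ H) @ P         # covariance update
    return x, P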
|
|
10:30-12:00, Paper ThAT26-NT.3 | Add to My Program |
Multi-LIO: A Lightweight Multiple LiDAR-Inertial Odometry System |
|
Chen, Qi | Fudan University |
Li, Guanghao | Fudan University |
Xue, Xiangyang | Fudan University |
Pu, Jian | Fudan University |
Keywords: SLAM, Mapping, Localization
Abstract: The integration of multiple LiDAR sensors has the potential to significantly enhance odometry systems by providing comprehensive environmental measurements. However, current multiple LiDAR-inertial odometry frameworks face challenges in real-time processing due to the voluminous data generated. This paper introduces a real-time, computationally efficient multiple LiDAR-inertial odometry system (Multi-LIO) that outperforms existing state-of-the-art solutions in accuracy and scalability. Utilizing a novel parallel strategy for state updates and a voxelized map format, Multi-LIO optimizes computational efficiency. Furthermore, we introduce a point-wise uncertainty estimation method to augment the accuracy of scan-to-map registration, particularly in large-scale and complex scenarios. We validate our system's performance through extensive experiments on various challenging sequences. Multi-LIO emerges as a robust, scalable, and extensible solution, adaptable to various LiDAR configurations.
|
|
10:30-12:00, Paper ThAT26-NT.4 | Add to My Program |
The Importance of Coordinate Frames in Dynamic SLAM |
|
Morris, Jesse | University of Sydney |
Wang, Yiduo | University of Sydney |
Ila, Viorela | The University of Sydney |
Keywords: SLAM, Mapping, Localization
Abstract: Most simultaneous localisation and mapping (SLAM) systems have traditionally assumed a static world, which does not align with real-world scenarios. To enable robots to safely navigate and plan in dynamic environments, it is essential to employ representations capable of handling moving objects. Dynamic SLAM is an emerging field in SLAM research as it improves the overall system accuracy while providing additional estimates of object motions. The state-of-the-art literature presents two main formulations for Dynamic SLAM, representing dynamic object points in either the world or the object coordinate frame. While expressing object points in their local reference frame may seem intuitive, it does not necessarily lead to the most accurate and robust solutions. This paper conducts and presents a thorough analysis of various Dynamic SLAM formulations, identifying the best approach to address the problem. To this end, we introduce a front-end agnostic framework using GTSAM that can be used to evaluate various Dynamic SLAM formulations.
|
|
10:30-12:00, Paper ThAT26-NT.5 | Add to My Program |
VoxelMap++: Mergeable Voxel Mapping Method for Online LiDAR(-Inertial) Odometry |
|
Wu, Chang | University of Electronic Science and Technology of China (UESTC) |
You, Yuan | University of Electronic Science and Technology of China |
Yuan, Yifei | University of Electronic Science and Technology of China |
Kong, Xiaotong | University of Electronic Science and Technology of China (UESTC) |
Zhang, Ying | University of Electronic Science and Technology of China |
Li, Qiyan | University of Electronic Science and Technology of China |
Zhao, Kaiyong | Hong Kong Baptist University |
Keywords: SLAM, Mapping, Localization
Abstract: This paper presents VoxelMap++, a voxel mapping method with plane merging that can effectively improve the accuracy and efficiency of LiDAR(-inertial) based simultaneous localization and mapping (SLAM). The map is a collection of voxels, each containing one plane feature with a 3DOF representation and a corresponding covariance estimate. Since the full map contains a large number of coplanar features (child planes), the 3DOF estimates of these child planes can be regarded as measurements, with covariance, of a larger plane (the parent plane). We therefore design a plane merging module based on union-find which saves resources and further improves the accuracy of plane fitting. This module distinguishes the child planes in different voxels and merges them to estimate the parent plane. After merging, the parent plane's 3DOF representation is more accurate than those of the child planes and its uncertainty decreases significantly, which further improves the performance of LiDAR(-inertial) odometry. Experiments in challenging environments such as corridors and forests demonstrate the high accuracy and efficiency of our method compared to other state-of-the-art methods (see our attached video). Our implementation of VoxelMap++ is open-sourced on GitHub and is applicable to both non-repetitive scanning LiDARs and traditional scanning LiDARs.
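As a concrete illustration of the data structure behind the merging module, the sketch below implements a minimal union-find and a naive pairwise merging pass. The coplanar predicate is a hypothetical stand-in for the paper's statistical coplanarity test, and the quadratic loop is for clarity only.

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, i):
            while self.parent[i] != i:
                self.parent[i] = self.parent[self.parent[i]]  # path halving
                i = self.parent[i]
            return i

        def union(self, i, j):
            ri, rj = self.find(i), self.find(j)
            if ri != rj:
                self.parent[ri] = rj

    def merge_planes(planes, coplanar):
        # After this pass, each union-find root stands for one parent plane
        # aggregating the 3DOF estimates of its child planes.
        uf = UnionFind(len(planes))
        for i in range(len(planes)):
            for j in range(i + 1, len(planes)):
                if coplanar(planes[i], planes[j]):
                    uf.union(i, j)
        return uf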
|
|
10:30-12:00, Paper ThAT26-NT.6 | Add to My Program |
Efficient and Consistent Bundle Adjustment on Lidar Point Clouds |
|
Liu, Zheng | University of Hong Kong |
Liu, Xiyuan | The University of Hong Kong |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Mapping, Localization, Lidar bundle adjustment
Abstract: Simultaneous determination of sensor poses and scene geometry is a fundamental problem for robot vision that is often achieved by Bundle Adjustment (BA). This paper presents an efficient and consistent bundle adjustment method for lidar sensors. The method employs edge and plane features to represent the scene geometry, and directly minimizes the natural Euclidean distance from each raw point to the respective geometry feature. A nice property of this formulation is that the geometry features can be analytically solved, drastically reducing the dimension of the numerical optimization. To represent and solve the resultant optimization problem more efficiently, this paper then adopts and formalises the concept of point cluster, which encodes all raw points associated to the same feature by a compact set of parameters, the point cluster coordinates. We derive the closed-form derivatives, up to the second order, of the BA optimization based on the point cluster coordinates and show their theoretical properties such as the null spaces and sparsity. Based on these theoretical results, this paper develops an efficient second-order BA solver. Besides estimating the lidar
|
|
10:30-12:00, Paper ThAT26-NT.7 | Add to My Program |
DORF: A Dynamic Object Removal Framework for Robust Static LiDAR Mapping in Urban Environments |
|
Chen, Zhiming | Hong Kong University of Science and Technology |
Zhang, Kun | Hong Kong University of Science and Technology |
Chen, Hua | Southern University of Science and Technology |
Wang, Michael Yu | Great Bay University |
Zhang, Wei | Southern University of Science and Technology |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Keywords: SLAM, Mapping, Range Sensing
Abstract: 3D point cloud maps are widely used in robotic tasks like localization and planning. However, dynamic objects such as cars and pedestrians can introduce ghost artifacts during the map generation process, leading to reduced map quality and hindering normal robot navigation. Online dynamic object removal methods are restricted to local scope information and have limited performance. To address this challenge, we propose DORF (Dynamic Object Removal Framework), a novel coarse-to-fine offline framework that exploits global 4D spatial-temporal LiDAR information to generate clean static point cloud maps, reaching state-of-the-art performance among existing offline methods. DORF first conservatively preserves the definitely static points leveraging our proposed Receding Horizon Sampling (RHS) mechanism. Then DORF gradually recovers more ambiguous static points, guided by the inherent characteristic of dynamic objects in urban environments: they must interact with the ground. We validate the effectiveness and robustness of DORF across various types of highly dynamic datasets.
|
|
10:30-12:00, Paper ThAT26-NT.8 | Add to My Program |
ImMesh: An Immediate LiDAR Localization and Meshing Framework |
|
Lin, Jiarong | The University of Hong Kong |
Yuan, Chongjian | The University of Hong Kong |
Cai, Yixi | University of Hong Kong |
Li, Haotian | The University of Hong Kong |
Ren, Yunfan | The University of Hong Kong |
Zou, Yuying | The University of Hong Kong |
Hong, Xiaoping | Southern University of Science and Technology |
Zhang, Fu | University of Hong Kong |
Keywords: SLAM, Mapping, Sensor Fusion, 3D Reconstruction
Abstract: In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real time. This proposed framework, termed ImMesh, comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module first utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The primary contribution of this work is the meshing module, which represents a scene by an efficient voxel structure, performs fast finding of voxels observed by new scans, and incrementally reconstructs triangle facets in each voxel. To the best of our knowledge, this is the first work in the literature to reconstruct the triangle mesh of large-scale scenes online, relying only on a standard CPU without GPU acceleration.
|
|
10:30-12:00, Paper ThAT26-NT.9 | Add to My Program |
OASIS: Optimal Arrangements for Sensing in SLAM |
|
Kaveti, Pushyami | Northeastern University |
Giamou, Matthew | McMaster University |
Singh, Hanumant | Northeastern University |
Rosen, David | Northeastern University |
Keywords: SLAM, Methods and Tools for Robot System Design, Probability and Statistical Methods
Abstract: The number and arrangement of sensors on a mobile robot dramatically influence its perception capabilities. Ensuring that sensors are mounted in a manner that enables accurate detection, localization, and mapping is essential for the success of downstream control tasks. However, when designing a new robotic platform, researchers and practitioners alike usually mimic standard configurations or maximize simple heuristics like field-of-view (FOV) coverage to decide where to place exteroceptive sensors. In this work, we conduct an information-theoretic investigation of this overlooked element of robotic perception in the context of simultaneous localization and mapping (SLAM). We show how to formalize the sensor arrangement problem as a form of subset selection under the E-optimality performance criterion. While this formulation is NP-hard in general, we show that a combination of greedy sensor selection and fast convex relaxation-based post-hoc verification enables the efficient recovery of certifiably optimal sensor designs in practice. Results from synthetic experiments reveal that sensors placed with OASIS outperform benchmarks in terms of mean squared error of visual SLAM estimates.
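The selection step described above can be made concrete with a few lines of code. The sketch below greedily picks sensor candidates to maximize the minimum eigenvalue of the accumulated information matrix (the E-optimality criterion); the candidate matrices are synthetic placeholders, and the paper's convex relaxation-based verification stage is omitted.

    import numpy as np

    def greedy_e_optimal(infos, k):
        """Greedily pick k candidate information matrices so that the
        smallest eigenvalue of their sum is as large as possible."""
        d = infos[0].shape[0]
        chosen, total = [], np.zeros((d, d))
        for _ in range(k):
            best, best_val = None, -np.inf
            for i, Mi in enumerate(infos):
                if i in chosen:
                    continue
                val = np.linalg.eigvalsh(total + Mi)[0]  # smallest eigenvalue
                if val > best_val:
                    best, best_val = i, val
            chosen.append(best)
            total += infos[best]
        return chosen, np.linalg.eigvalsh(total)[0]

    # Toy usage with random positive semidefinite candidates:
    rng = np.random.default_rng(0)
    cands = []
    for _ in range(10):
        A = rng.normal(size=(3, 3))
        cands.append(A @ A.T)
    print(greedy_e_optimal(cands, k=4))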
|
|
ThAT27-NT Oral Session, NT-G2 |
Add to My Program |
Dexterous Manipulation I |
|
|
Chair: Pinto, Lerrel | New York University |
Co-Chair: Kasaei, Mohammadreza | University of Edinburgh |
|
10:30-12:00, Paper ThAT27-NT.1 | Add to My Program |
See to Touch: Learning Tactile Dexterity through Visual Incentives |
|
Guzey, Irmak | New York University |
Dai, Yinlong | NYU |
Evans, Ben | New York University |
Chintala, Soumith | Facebook AI Research |
Pinto, Lerrel | New York University |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Imitation Learning
Abstract: Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function from these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. This success rate is 108% higher than that of policies using tactile and vision-based rewards, and 135% higher than that of policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.
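To make the reward construction more concrete, here is a minimal sketch of an optimal-transport matching reward between robot and demonstration feature sequences. The entropic (Sinkhorn) solver, the cosine cost, and all dimensions are generic assumptions rather than the authors' exact design.

    import numpy as np

    def sinkhorn(C, eps=0.1, iters=200):
        """Entropic optimal-transport plan for cost matrix C, uniform marginals."""
        n, m = C.shape
        K = np.exp(-C / eps)
        v = np.ones(m)
        for _ in range(iters):
            u = (1.0 / n) / (K @ v)
            v = (1.0 / m) / (K.T @ u)
        return u[:, None] * K * v[None, :]

    def ot_reward(robot_feats, demo_feats):
        a = robot_feats / np.linalg.norm(robot_feats, axis=1, keepdims=True)
        b = demo_feats / np.linalg.norm(demo_feats, axis=1, keepdims=True)
        C = 1.0 - a @ b.T                  # cosine distance cost
        P = sinkhorn(C)
        return -np.sum(P * C)              # higher reward = closer to the demo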
|
|
10:30-12:00, Paper ThAT27-NT.2 | Add to My Program |
Real-Time Contact State Estimation in Shape Control of Deformable Linear Objects under Small Environmental Constraints |
|
Chen, Kejia | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Wu, Yansong | Technische Universität München |
Wu, Fan | Technical University of Munich |
Zhang, Liding | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Dual Arm Manipulation, Assembly, Force and Tactile Sensing
Abstract: Controlling the shape of deformable linear objects using robots and constraints provided by environmental fixtures has diverse industrial applications. In order to establish robust contacts with these fixtures, accurate estimation of the contact state is essential for preventing and rectifying potential anomalies. However, this task is challenging due to the small sizes of the fixtures, the requirement for real-time performance, and the infinite degrees of freedom of deformable linear objects. In this paper, we propose a real-time approach for estimating both contact establishment and subsequent changes by leveraging the dependency between the applied and detected contact force on the deformable linear objects. We seamlessly integrate this method into the robot control loop and achieve an adaptive shape control framework which avoids, detects, and corrects anomalies automatically. Real-world experiments validate the robustness and effectiveness of our contact estimation approach across various scenarios, significantly increasing the success rate of shape control processes.
|
|
10:30-12:00, Paper ThAT27-NT.3 | Add to My Program |
Self-Supervised Learning for Joint Pushing and Grasping Policies in Highly Cluttered Environments |
|
Wang, Yongliang | University of Groningen |
Mokhtar, Kamal | University of Groningen |
Heemskerk, Cock | Heemskerk Innovative Technology |
Kasaei, Hamidreza | University of Groningen |
Keywords: Dual Arm Manipulation, Reinforcement Learning
Abstract: Robotic systems often face challenges when attempting to grasp a target object due to interference from surrounding items. We propose a Deep Reinforcement Learning (DRL) method that develops joint policies for grasping and pushing, enabling effective manipulation of target objects within untrained, densely cluttered environments. In particular, a dual RL model is introduced, which presents high resilience in handling complicated scenes, reaching an average of 98% task completion in simulation and real-world scenes. To evaluate the proposed method, we conduct comprehensive simulation experiments in three distinct environments: densely packed building blocks, randomly positioned building blocks, and common household objects. Further, real-world tests are conducted using actual robots to confirm the robustness of our approach in various untrained and highly cluttered environments. The results from experiments underscore the superior efficacy of our method in both simulated and real-world scenarios, outperforming recent state-of-the-art methods. To ensure reproducibility and further the academic discourse, we make available a demonstration video, the trained models, and the source code for public access.
|
|
10:30-12:00, Paper ThAT27-NT.4 | Add to My Program |
A Robust Model Predictive Controller for Tactile Servoing |
|
Wang, Shuai | Tencent |
Huang, Yihao | The University of Manchester |
Lee, Wang Wei | Tencent |
Liu, Tianliang | Harbin Institute of Technology |
Teng, Xiao | Keppel-NUS Corporate Lab, National University of Singapore |
Zheng, Yu | Tencent |
Li, Qiang | Shenzhen Technology University |
Keywords: Dexterous Manipulation, Grasping, Sensor-based Control
Abstract: Tactile servoing is an effective approach to enabling robots to safely interact with unknown environments. One of the core problems in tactile servoing is to robustly converge the contact features to the desired ones via a dedicated controller. This paper proposes a Data-Driven Model Predictive Controller (DDMPC) to compute the motion command given the previous interaction experience and feature deviations in tactile space. Compared with manually designed PID-based controllers, the proposed controller is grounded in sound control theory and its convergence is guaranteed from a computational perspective. It is applied to the balancing control of a rolling bottle on a robotic forearm covered by a custom tactile sensor array. The real-world experiment demonstrates the superior robustness of the proposed approach and shows its great potential for other tactile servoing scenarios with measurement noise, which is inevitable for current tactile sensors.
|
|
10:30-12:00, Paper ThAT27-NT.5 | Add to My Program |
Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios |
|
Kasaei, Hamidreza | University of Groningen |
Kasaei, Mohammadreza | University of Edinburgh |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Sensorimotor Learning
Abstract: In this work, we delve into the intricate synergy among non-prehensile actions like pushing, and prehensile actions such as grasping and throwing, within the domain of robotic manipulation. We introduce an innovative approach to learning these synergies by leveraging model-free deep reinforcement learning. The robot's workflow involves detecting the pose of the target object and the basket at each time step, predicting the optimal push configuration to isolate the target object, determining the appropriate grasp configuration, and inferring the necessary parameters for an accurate throw into the basket. This empowers robots to skillfully reconfigure cluttered scenarios through pushing, creating space for collision-free grasping actions. Simultaneously, we integrate throwing behavior, showcasing how this action significantly extends the robot's operational reach. Ensuring safety, we developed a simulation environment in Gazebo for robot training, applying the learned policy directly to our real robot. Notably, this work represents a pioneering effort to learn the synergy between pushing, grasping, and throwing actions. Extensive experimentation in both simulated and real-robot scenarios substantiates the effectiveness of our approach across diverse settings. Our approach achieves a success rate exceeding 80% in both simulated and real-world scenarios. A video showcasing our experiments is available online at: https://youtu.be/q1l4BJVDbRw
|
|
10:30-12:00, Paper ThAT27-NT.6 | Add to My Program |
Direct Self-Identification of Inverse Jacobians for Dexterous Manipulation through Particle Filtering |
|
Grace, Joshua | Yale University |
Chanrungmaneekul, Podshara | Rice University |
Hang, Kaiyu | Rice University |
Dollar, Aaron | Yale University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Model Learning for Control
Abstract: The ability to plan and control robotic in-hand manipulation is challenged by several issues, including the required amount of prior knowledge of the system and the sophisticated physics that varies across different robot hands or even grasp instances. One of the most direct models of in-hand manipulation is the inverse Jacobian, which can directly map from the desired in-hand object motions to the required hand actuator controls. However, acquiring such inverse Jacobians without complex hand-object system models is typically infeasible. We present a method for controlling in-hand manipulation using inverse Jacobians that are self-identified by a particle filter-based estimation scheme that leverages the ability of underactuated hands to maintain a passively stable grasp during self-identification movements. This method requires no a priori knowledge of the specific hand-object system and learns the system's inverse Jacobian through small exploratory motions. Our system approximates the underlying inverse Jacobian closely, which can be used to perform manipulation tasks across a range of objects successfully. With extensive experiments on a Yale Model O hand, we show that the proposed system can provide accurate in-hand manipulation of sub-millimeter precision and that the inverse Jacobian-based controller can support real-time manipulation control of up to 900Hz.
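The self-identification idea lends itself to a compact toy example. Below, a particle filter estimates an unknown linear action-to-motion map from small exploratory motions; every number here (dimensions, noise scales, jitter) is an illustrative assumption, and the real system estimates a local inverse Jacobian of a far richer hand-object model.

    import numpy as np

    rng = np.random.default_rng(1)
    n_particles, du, dx = 500, 3, 2
    J_true = rng.normal(size=(dx, du))             # unknown hand-object map
    particles = rng.normal(size=(n_particles, dx, du))

    for _ in range(50):                            # small exploratory motions
        u = 0.01 * rng.normal(size=du)             # probing action
        x_obs = J_true @ u + 1e-4 * rng.normal(size=dx)
        pred = particles @ u                       # each particle's prediction
        err = np.sum((pred - x_obs) ** 2, axis=1)
        logw = -err / (2 * 1e-3 ** 2)              # soft measurement likelihood
        w = np.exp(logw - logw.max())              # stabilized weights
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)   # resample
        particles = particles[idx] + 1e-2 * rng.normal(size=particles.shape)

    J_est = particles.mean(axis=0)                 # point estimate of the map
    print(np.linalg.norm(J_est - J_true))          # should shrink toward zero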
|
|
10:30-12:00, Paper ThAT27-NT.7 | Add to My Program |
Masked Visual-Tactile Pre-Training for Robot Manipulation |
|
Liu, Qingtao | Zhejiang University |
Ye, Qi | Zhejiang University |
Sun, Zhengnan | Zhejiang University |
Cui, Yu | Zhejiang University |
Li, Gaofeng | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: Dexterous Manipulation, Sensor Fusion, Visual Learning
Abstract: Recent works on pre-training for robot manipulation have demonstrated that representations learned from large-scale human manipulation data can generalize well to new manipulation tasks and environments. However, these approaches mainly focus on human vision or natural language, neglecting tactile feedback. In this article, we explore how to pre-train a representation model for robotic manipulation using both visual and tactile data from human manipulation. We develop a system for collecting visual and tactile data, featuring a cost-effective tactile glove to capture human tactile data and a HoloLens 2 to capture visual data. With this system, we collect a dataset of turning bottle caps. Furthermore, we introduce a novel visual-tactile fusion network and learning strategy, with one key module tokenizing 20 sparse binary tactile signals that sense touch states for learning tactile context, and another applying attention and masking to the interaction of visual and tactile tokens for visual-tactile representation learning. We use our dataset to pre-train the fusion model and embed the pre-trained model into a reinforcement learning framework for downstream tasks. Experimental results demonstrate that our pre-trained model significantly aids in learning manipulation skills. Compared to methods without pre-training, our approach achieves a success rate increase of over 60%. Additionally, our success rate exceeds that of current visual pre-training methods by more than 50%.
|
|
10:30-12:00, Paper ThAT27-NT.8 | Add to My Program |
Tactile Estimation of Extrinsic Contact Patch for Stable Placement |
|
Ota, Kei | Tokyo Institute of Technology |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Jatavallabhula, Krishna Murthy | MIT |
Kanezaki, Asako | Tokyo Institute of Technology |
Tenenbaum, Joshua | Massachusetts Institute of Technology |
Keywords: Dexterous Manipulation, Force and Tactile Sensing, Deep Learning in Grasping and Manipulation
Abstract: Precise perception of contact interactions is essential for fine-grained robot manipulation skills. In this paper, we present the design of feedback skills for robots that must learn to stack complex-shaped objects on top of each other (see Fig. 1). To design such a system, a robot should be able to reason about the stability of placement from very gentle contact interactions. Our results demonstrate that it is possible to infer the stability of object placement from tactile readings during contact formation between the object and its environment. In particular, we estimate the contact patch between a grasped object and its environment using force and tactile observations during contact formation; this patch can then be used to estimate the stability of the object upon release of the grasp. The proposed method is demonstrated on various pairs of objects that are used in a very popular board game.
|
|
10:30-12:00, Paper ThAT27-NT.9 | Add to My Program |
Robotic Manipulation of Hand Tools: The Case of Screwdriving |
|
Tang, Ling | Iowa State University |
Jia, Yan-Bin | Iowa State University |
Xue, Yuechuan | Amazon.com |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Force Control
Abstract: Despite decades of steady research progress, the robotic hand is still far behind the human hand in terms of dexterity and versatility. A milestone in this quest for human-level performance will be possessing the skills of manipulating hand tools, for their non-trivial geometries and for the intricacies of controlling their contact-based interactions with objects, which are the final targets of manipulation. This paper investigates screwdriving by a robotic arm/hand pair, dealing with the chain of contacts connecting the substrate, screw, screwdriver, and fingertips. Considering rolling contacts and finger gaits, our force control scheme is derived through backward chaining to leverage the dynamics of the screwdriver and arm/hand. To maintain the fastening effort, estimations are carried out sequentially for the screwdriver's pose via optimization under visual and kinematic constraints, and for its applied wrench on the screw via solution drawing upon dynamics. This wrench, adjusted based on position/force feedback, is mapped by the grasp matrix to the desired fingertip forces, which are then used for computing torques to be exerted by the arm and hand to close the loop. Simulation and experiments with a Shadow Hand have been conducted for validations.
|
|
ThAT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation I |
|
|
Chair: Jin, Long | Lanzhou University |
Co-Chair: Lee, Dongjun | Seoul National University |
|
10:30-12:00, Paper ThAT28-NT.1 | Add to My Program |
You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects |
|
Zhou, Lei | National University of Singapore |
Wang, Haozhe | National University of Singapore |
Zhang, Zhengshen | National University of Singapore |
Liu, Zhiyang | National University of Singapore |
Tay, Francis | NUS |
Ang Jr, Marcelo H | National University of Singapore |
Keywords: Perception for Grasping and Manipulation
Abstract: In the realm of robotic grasping, achieving accurate and reliable interactions with the environment is a pivotal challenge. Traditional grasp planning methods that utilize partial point clouds derived from depth images often suffer from reduced scene understanding due to occlusion, ultimately impeding their grasping accuracy. Furthermore, scene reconstruction methods have primarily relied upon static techniques, which are susceptible to environment changes during the manipulation process, limiting their efficacy in real-time grasping tasks. To address these limitations, this paper introduces a novel two-stage pipeline for dynamic scene reconstruction. In the first stage, our approach takes a scene scan as input and registers each target object via mesh reconstruction and novel object pose tracking. In the second stage, pose tracking continues to provide object poses in real time, enabling our approach to transform the reconstructed object point clouds back into the scene. Unlike conventional methodologies that rely on static scene snapshots, our method continuously captures the evolving scene geometry, resulting in a comprehensive and up-to-date point cloud representation. By circumventing the constraints posed by occlusion, our method enhances the overall grasp planning process and empowers state-of-the-art 6-DoF robotic grasping algorithms to exhibit markedly improved accuracy.
|
|
10:30-12:00, Paper ThAT28-NT.2 | Add to My Program |
Thermoformed Electronic Skins for Conformal Tactile Sensor Arrays |
|
Lu, Peng | Tencent |
Liang, Jiaming | Tencent |
Huang, Bidan | Tencent |
Yang, Sicheng | Tencent |
Lee, Wang Wei | Tencent |
Keywords: Perception for Grasping and Manipulation
Abstract: Robots and prostheses are increasingly designed with curvilinear surfaces for functional, aesthetic, aerodynamic, and safety reasons. Electronic skins (e-skins) capable of sensing contact location and pressure across complex, non-developable surfaces are essential for empowering next-generation robots with tactile awareness. This will facilitate safe and natural human-machine interactions while enhancing object manipulation capabilities. Despite the evident advantages of conformal e-skins, current fabrication methods face significant challenges in realizing their full potential. In this paper, we introduce thermoforming as a technique to efficiently fabricate tactile sensitive e-skins that conform to curvilinear surfaces. The performance, repeatability and uniformity of the sensors are characterized in detail. We also present a custom calibration pipeline where accurate digital replicas of conformal e-skins are generated for use in simulations. Finally, we demonstrate the benefits of 3D e-skins in a tool manipulation task.
|
|
10:30-12:00, Paper ThAT28-NT.3 | Add to My Program |
The GEM-C Controller for Load Compensation in Object Manipulation |
|
Papadakis, Emmanouil | Foundation for Research and Technology - Hellas |
Sigalas, Markos | Foundation for Research and Technology - Hellas |
Vangos, Michail | Foundation for Research and Technology-Hellas |
Trahanias, Panos | Foundation for Research and Technology – Hellas (FORTH) |
Keywords: Perception for Grasping and Manipulation, Control Architectures and Programming, Industrial Robots
Abstract: Nowadays, robotic arms are ubiquitously employed for object manipulation across a spectrum of applications, spanning from production lines to warehouses, and encompassing both stationary and mobile robotic systems. Among the most prevalent end-effectors used for the majority of these applications are suction cups. The rudimentary act of grasping an object and relocating it, without awareness of the forces stemming from the object's motion and grip, can result in suboptimal and inefficient robot movements. In more dire circumstances, such negligent handling may cause the object to detach from the end-effector, potentially damaging either the object or the arm. In this paper, we build upon the advanced sensing and attaching capabilities of our suction cup MIGHTY and introduce GEM-C, a novel Gravity, External forces and Motion Compensation controller that constantly adapts the orientation of the suction cup so as to enhance the quality of attachment. Throughout all examined scenarios and experiments, our approach remarkably improved the robot's performance by providing the optimal end-effector pose while also reducing the stress on the motors and the overall power consumption. The derived results clearly demonstrate the potential of the MIGHTY and GEM-C schema for a wide range of long-term and complex robotic manipulation applications.
|
|
10:30-12:00, Paper ThAT28-NT.4 | Add to My Program |
Sim-To-Real Grasp Detection with Global-To-Local RGB-D Adaptation |
|
Ma, Haoxiang | Beihang University |
Qin, Ran | Beihang University |
Shi, Modi | Beihang University |
Gao, Boyang | Geometry Robotics Ltd. Harbin Institute of Technology |
Huang, Di | Beihang University |
Keywords: Perception for Grasping and Manipulation
Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We then propose a global-to-local alignment pipeline with individual global domain classifiers for scene features of RGB and depth images as well as a local one specifically working for grasp features in the two modalities. In particular, we propose a grasp prototype adaptation module, which aims to facilitate fine-grained local feature alignment by dynamically updating and matching the grasp prototypes from the simulation and real-world scenarios throughout the training process. Due to such designs, the proposed method substantially reduces the domain shift and thus leads to consistent performance improvements. Extensive experiments are conducted on the GraspNet-Planar benchmark and physical environment, and superior results are achieved which demonstrate the effectiveness of our method.
|
|
10:30-12:00, Paper ThAT28-NT.5 | Add to My Program |
Residual-NeRF: Learning Residual NeRFs for Transparent Object Manipulation |
|
Duisterhof, Bart | Carnegie Mellon University |
Mao, Yuemin | Carnegie Mellon University |
Teng, Si Heng | DSTA |
Ichnowski, Jeffrey | Carnegie Mellon University |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation, RGB-D Perception
Abstract: Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and these depth maps can be used to grasp transparent objects with high accuracy. NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions. In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same area, such as a kitchen. By first learning a background NeRF of the scene without transparent objects to be manipulated, we reduce the ambiguity faced by learning the changes with the new object. We propose training two additional networks: a residual NeRF learns to infer residual RGB values and densities, and a Mixnet learns how to combine background and residual NeRFs. We contribute synthetic and real experiments that suggest Residual-NeRF improves depth perception of transparent objects. The results on synthetic data suggest Residual-NeRF outperforms the baselines with a 46.1% lower RMSE and a 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF leads to more robust depth maps with less noise and fewer holes. Website: https://residual-nerf.github.io
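The compositing idea can be illustrated with standard volume rendering over one ray, where a per-sample blend weight stands in for the Mixnet described above. This is a rough sketch under assumed interfaces, not the authors' network.

    import numpy as np

    def render_ray(sigma_bg, rgb_bg, sigma_res, rgb_res, w, deltas):
        """Volume-render one ray of N samples whose density/color is a
        weighted mix of a background field and a residual field.
        w: per-sample blend weights in [0, 1] (Mixnet stand-in)."""
        sigma = w * sigma_bg + (1.0 - w) * sigma_res
        rgb = w[:, None] * rgb_bg + (1.0 - w)[:, None] * rgb_res
        alpha = 1.0 - np.exp(-sigma * deltas)                # per-sample opacity
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = alpha * trans                              # rendering weights
        color = (weights[:, None] * rgb).sum(axis=0)
        depth = (weights * np.cumsum(deltas)).sum()          # expected depth
        return color, depth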
|
|
10:30-12:00, Paper ThAT28-NT.6 | Add to My Program |
Online 3D Edge Reconstruction of Wiry Structures from Monocular Image Sequences |
|
Choi, Hyelim | Seoul National University |
Lee, Minji | Seoul National University |
Kang, Jiseock | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Perception for Grasping and Manipulation, Computer Vision for Automation, RGB-D Perception
Abstract: Three-dimensional (3D) reconstruction of wiry structures from vision suffers from thin geometry, lack of texture, and severe self-occlusions. We propose an online 3D edge reconstruction framework that uses monocular image sequences to reconstruct the wiry structures whose skeletons are mainly straight as commonly found in the real world. To reconstruct such structures in an efficient manner, we employ straight edges constructed from points as underlying primitives of the representation. This is to address the harsh geometric nature of wiry objects (e.g., severe self-occlusion) and also to avoid a typically expensive line matching process. Specifically, we first construct sparse 3D points by tracking feature points, while simultaneously refining the camera poses via a robust maximum a posteriori (MAP) inference. These sparse points are then used to generate edge candidates and the belief of each candidate is updated in a Bayesian fashion using a likelihood evaluated on the image observation. Finally, we take the set of 3D edges with beliefs greater than a threshold and apply a post-processing step to reject false edges. We experimentally validate our framework using real-world wiry objects and demonstrate a manipulation task using the reconstruction. The proposed framework exhibits superior performance over state-of-the-art algorithms for the class of wiry structures and the potential to be easily used for subsequent robotic tasks.
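The per-candidate belief maintenance reduces to a scalar Bayes update, sketched below with made-up likelihood values; in the actual system the likelihoods come from evaluating each edge hypothesis against the image observations.

    def update_belief(prior, p_obs_given_edge, p_obs_given_no_edge):
        # One Bayes step for the hypothesis "this candidate is a real edge".
        num = p_obs_given_edge * prior
        den = num + p_obs_given_no_edge * (1.0 - prior)
        return num / den

    belief = 0.5                      # uninformative prior for a new candidate
    for like_e, like_ne in [(0.8, 0.3), (0.7, 0.4), (0.9, 0.2)]:
        belief = update_belief(belief, like_e, like_ne)
    print(belief > 0.9)               # accept once belief passes a threshold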
|
|
10:30-12:00, Paper ThAT28-NT.7 | Add to My Program |
Kitchen Artist: Precise Control of Liquid Dispensing for Gourmet Plating |
|
Huang, Hung-Jui | Carnegie Mellon University |
Xiang, Jingyi | University of Illinois at Urbana-Champaign |
Yuan, Wenzhen | University of Illinois |
Keywords: Perception for Grasping and Manipulation, Art and Entertainment Robotics, Perception-Action Coupling
Abstract: Manipulating liquid is widely required for many tasks, especially in cooking. A common way to address this is extruding viscous liquid from a squeeze bottle. In this work, our goal is to create a sauce plating robot, which requires precise control of the thickness of squeezed liquids on a surface. Different liquids demand different manipulation policies. We command the robot to rotate the container and monitor the liquid response using a force sensor to identify liquid properties. Based on the liquid properties, we predict the liquid behavior with fixed squeezing motions in a data-driven way and calculate the required drawing speed for the desired stroke size. This open-loop system works effectively even without sensor feedback. Our experiments demonstrate accurate stroke size control across different liquids and fill levels. We show that understanding liquid properties can facilitate effective liquid manipulation. More importantly, our dish garnishing robot has a wide range of applications and holds significant commercialization potential.
|
|
10:30-12:00, Paper ThAT28-NT.8 | Add to My Program |
HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image |
|
Choi, Hongsuk | Samsung AI Center, New York |
Chavan-Dafle, Nikhil | Samsung Research America |
Yuan, Jiacheng | University of Minnesota |
Isler, Volkan | University of Minnesota |
Park, Hyun Soo | University of Minnesota |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. The inference as well as training-data generation for 3D hand-object scene reconstruction is challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping for robotic hand-over.
|
|
10:30-12:00, Paper ThAT28-NT.9 | Add to My Program |
Robotic Grasping of Harvested Tomato Trusses Using Vision and Online Learning |
|
van den Bent, Luuk | Technical University Delft |
Coleman, Tomás | TU Delft |
Babuska, Robert | Delft University of Technology |
Keywords: Perception for Grasping and Manipulation, Agricultural Automation, Continual Learning
Abstract: Currently, truss tomato weighing and packaging require significant manual work. The main obstacle to automation lies in the difficulty of developing a reliable robotic grasping system for already harvested trusses. We propose a method to grasp trusses that are stacked in a crate with considerable clutter, which is how they are commonly stored and transported after harvest. The method consists of a deep learning-based vision system to first identify the individual trusses in the crate and then determine a suitable grasping location on the stem. To this end, we have introduced a grasp pose ranking algorithm with online learning capabilities. After selecting the most promising grasp pose, the robot executes a pinch grasp without needing touch sensors or geometric models. Lab experiments with a robotic manipulator equipped with an eye-in-hand RGB-D camera showed a 100% clearance rate when tasked to pick all trusses from a pile. 93% of the trusses were successfully grasped on the first try, while the remaining 7% required more attempts.
|
|
ThAT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection III |
|
|
Chair: Nguyen, Anh | University of Liverpool |
Co-Chair: Xiang, Yu | University of Texas at Dallas |
|
10:30-12:00, Paper ThAT29-NT.1 | Add to My Program |
RISeg: Robot Interactive Object Segmentation Via Body Frame-Invariant Features |
|
Qian, Howard H. | Rice University |
Lu, Yangxiao | The University of Texas at Dallas |
Ren, Kejia | Rice University |
Wang, Gaotian | Rice University |
Khargonkar, Ninad | University of Texas at Dallas |
Xiang, Yu | University of Texas at Dallas |
Hang, Kaiyu | Rice University |
Keywords: Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation
Abstract: In order to successfully perform manipulation tasks in new environments, such as grasping, robots must be proficient in segmenting unseen objects from the background and/or other objects. Previous works perform unseen object instance segmentation (UOIS) by training deep neural networks on large-scale data to learn RGB/RGB-D feature embeddings, where cluttered environments often result in inaccurate segmentations. We build upon these methods and introduce a novel approach to correct inaccurate segmentation, such as under-segmentation, of static image-based UOIS masks by using robot interaction and a designed body frame-invariant feature. We demonstrate that the relative linear and rotational velocities of frames randomly attached to rigid bodies due to robot interactions can be used to identify objects and accumulate corrected object-level segmentation masks. By introducing motion to regions of segmentation uncertainty, we are able to drastically improve segmentation accuracy in an uncertainty-driven manner with minimal, non-disruptive interactions (ca. 2-3 per scene). We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%, an increase of 28.2% when compared with other state-of-the-art UOIS methods.
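The invariant behind the designed feature is classical rigid-body kinematics: frames attached to the same rigid body share an angular velocity, and their linear velocities differ only by omega x (p2 - p1). The toy check below encodes that consistency test; the tolerance is illustrative.

    import numpy as np

    def same_rigid_body(p1, v1, w1, p2, v2, w2, tol=1e-3):
        """True if two frames (position p, linear velocity v, angular
        velocity w) are kinematically consistent with one rigid body."""
        if np.linalg.norm(w1 - w2) > tol:
            return False
        return np.linalg.norm(v2 - (v1 + np.cross(w1, p2 - p1))) < tol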
|
|
10:30-12:00, Paper ThAT29-NT.2 | Add to My Program |
PUMA: Fully Decentralized Uncertainty-Aware Multiagent Trajectory Planner with Real-Time Image Segmentation-Based Frame Alignment |
|
Kondo, Kota | Massachusetts Institute of Technology |
Tewari, Claudius Taroon | Massachusetts Institute of Technology |
Peterson, Mason B. | Massachusetts Institute of Technology |
Thomas, Annika | Massachusetts Institute of Technology |
Kinnari, Jouko | Saab Finland Oy |
Tagliabue, Andrea | Massachusetts Institute of Technology |
How, Jonathan | Massachusetts Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Planning under Uncertainty, Distributed Robot Systems
Abstract: Fully decentralized, multiagent trajectory planners enable complex tasks like search and rescue or package delivery by ensuring safe navigation in unknown environments. However, deconflicting trajectories with other agents and ensuring collision-free paths in a fully decentralized setting is complicated by dynamic elements and localization uncertainty. To this end, this paper presents (1) an uncertainty-aware multiagent trajectory planner and (2) an image segmentation-based frame alignment pipeline. The uncertainty-aware planner propagates uncertainty associated with the future motion of detected obstacles, and by incorporating this propagated uncertainty into optimization constraints, the planner effectively navigates around obstacles. Unlike conventional methods that emphasize explicit obstacle tracking, our approach integrates implicit tracking. Sharing trajectories between agents can cause potential collisions due to frame misalignment. Addressing this, we introduce a novel frame alignment pipeline that rectifies inter-agent frame misalignment. This method leverages a zero-shot image segmentation model for detecting objects in the environment and a data association framework based on geometric consistency for map alignment. Our approach accurately aligns frames with only 0.18 m and 2.7 degrees of mean frame alignment error in our most challenging simulation scenario. In addition, we conducted hardware experiments and successfully achieved 0.29 m and 2.59 degrees of frame alignment error. Together with the alignment framework, our planner ensures safe navigation in unknown environments and collision avoidance in decentralized settings.
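Once the data association stage has produced matched object locations between two agents' maps, the frame correction itself is the classical least-squares rigid alignment (Kabsch/Umeyama), sketched below. The inputs A and B are assumed to be 3 x N arrays of matched points; the segmentation and geometric-consistency stages are not shown.

    import numpy as np

    def align_frames(A, B):
        """Find the rotation R and translation t minimizing ||R @ A + t - B||."""
        ca, cb = A.mean(axis=1, keepdims=True), B.mean(axis=1, keepdims=True)
        H = (A - ca) @ (B - cb).T                 # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # proper rotation (det = +1)
        t = cb - R @ ca
        return R, t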
|
|
10:30-12:00, Paper ThAT29-NT.3 | Add to My Program |
Open-Vocabulary Affordance Detection Using Knowledge Distillation and Text-Point Correlation |
|
Van Vo, Tuan | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Huang, Baoru | Imperial College London |
Nguyen, Tien Toan | FPT Software |
Le, Ngan | University of Arkansas |
Vo, Thieu | Ton Duc Thang University |
Nguyen, Anh | University of Liverpool |
Keywords: Object Detection, Segmentation and Categorization, Perception for Grasping and Manipulation
Abstract: Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method for 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. Intensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% in mIoU over the baselines. Furthermore, it offers real-time inference, which is well-suited to robotic manipulation applications.
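The text-point correlation step can be pictured as a cosine-similarity matrix between per-point features and label embeddings, as in the toy sketch below; both feature sets are random placeholders, and the paper's learned correlation is more elaborate than a plain dot product.

    import numpy as np

    rng = np.random.default_rng(0)
    point_feats = rng.normal(size=(2048, 512))   # features from a 3D backbone
    text_feats = rng.normal(size=(5, 512))       # embeddings of affordance labels

    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    scores = p @ t.T                             # (num_points, num_labels)
    labels = scores.argmax(axis=1)               # per-point open-vocabulary label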
|
|
10:30-12:00, Paper ThAT29-NT.4 | Add to My Program |
MEDL-U: Uncertainty-Aware 3D Automatic Annotation Based on Evidential Deep Learning |
|
Paat, Helbert | HKUST |
Lian, Qing | HKUST |
Yao, Weilong | Autowise.AI |
Zhang, Tong | Hong Kong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Probabilistic Inference
Abstract: Advancements in deep learning-based 3D object detection necessitate the availability of large-scale datasets. However, this requirement introduces the challenge of manual annotation, which is often both burdensome and time-consuming. To tackle this issue, the literature has seen the emergence of several weakly supervised frameworks for 3D object detection which can automatically generate pseudo labels for unlabeled data. Nevertheless, these generated pseudo labels contain noise and are not as accurate as those labeled by humans. In this paper, we present the first approach that addresses the inherent ambiguities present in pseudo labels by introducing an Evidential Deep Learning (EDL) based uncertainty estimation framework. Specifically, we propose MEDL-U, an EDL framework based on MTrans, which not only generates pseudo labels but also quantifies the associated uncertainties. However, applying EDL to 3D object detection presents three key challenges: (1) lower pseudo label quality in comparison to other autolabelers; (2) high evidential uncertainty estimates; and (3) lack of clear interpretability and effective utilization of uncertainties for downstream tasks. We tackle these issues through the introduction of an uncertainty-aware IoU-based loss, an evidence-aware multi-task loss, and the implementation of a post-processing stage for uncertainty refinement. Our experimental results demonstrate that probabilistic detectors trained using the outputs of MEDL-U surpass deterministic detectors trained using outputs from previous 3D annotators on the KITTI val set for all difficulty levels. Moreover, MEDL-U achieves state-of-the-art results on the KITTI official test set compared to existing 3D automatic annotators. Code is publicly available at https://github.com/paathelb/MEDL-U.
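For reference, the evidential regression machinery that EDL-based frameworks of this kind build on (Amini et al., 2020) places a Normal-Inverse-Gamma head on each regression target; its negative log-likelihood and uncertainty decomposition are sketched below. This is the generic formulation, not MEDL-U's full multi-task loss.

    import numpy as np
    from scipy.special import gammaln

    def nig_nll(y, gamma, nu, alpha, beta):
        """NLL of target y under NIG parameters (gamma, nu, alpha, beta)."""
        omega = 2.0 * beta * (1.0 + nu)
        return (0.5 * np.log(np.pi / nu)
                - alpha * np.log(omega)
                + (alpha + 0.5) * np.log(nu * (y - gamma) ** 2 + omega)
                + gammaln(alpha) - gammaln(alpha + 0.5))

    def uncertainties(nu, alpha, beta):
        aleatoric = beta / (alpha - 1.0)          # E[sigma^2], needs alpha > 1
        epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]
        return aleatoric, epistemic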
|
|
10:30-12:00, Paper ThAT29-NT.5 | Add to My Program |
HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors |
|
Zhang, Shutong | University of Toronto |
Qiao, Yi-Ling | University of Maryland, College Park |
Zhu, Guanglei | University of Toronto, Carnegie Mellon University |
Heiden, Eric | NVIDIA |
Turpin, Dylan | University of Toronto |
Liu, Jingzhou | University of Toronto, NVIDIA |
Lin, Ming C. | University of Maryland at College Park |
Macklin, Miles | University of Copenhagen, NVIDIA |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Multifingered Hands
Abstract: Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
|
|
10:30-12:00, Paper ThAT29-NT.6 | Add to My Program |
Dynamic Occupancy Grids for Object Detection: A Radar-Centric Approach |
|
Ronecker, Max Peter | SETLabs Research GmbH |
Schratter, Markus | Virtual Vehicle Research GmbH |
Kuschnig, Lukas | Virtual Vehicle Research GmbH |
Watzenig, Daniel | TU Graz |
Keywords: Object Detection, Segmentation and Categorization, Intelligent Transportation Systems, Sensor Fusion
Abstract: Dynamic Occupancy Grid Mapping is a technique used to generate a local map of the environment, containing both static and dynamic information. Typically, these maps are primarily generated using lidar measurements. However, with improvements in radar sensing, resulting in better accuracy and higher resolution, radar is emerging as a viable alternative to lidar as the primary sensor for mapping. In this paper, we propose a radar-centric dynamic occupancy grid mapping algorithm with adaptations to the state computation, inverse sensor model, and field-of-view computation tailored to the specifics of radar measurements. We extensively evaluate our approach with real data to demonstrate its effectiveness and establish the first benchmark for radar-based dynamic occupancy grid mapping using the publicly available Radarscenes dataset.
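Underneath any (dynamic) occupancy grid sits a log-odds update driven by an inverse sensor model. The 1D toy below updates cells along a single ray; the probabilities are placeholders, and the dynamic-state estimation that distinguishes this paper is omitted.

    import numpy as np

    L_FREE, L_OCC = np.log(0.3 / 0.7), np.log(0.9 / 0.1)

    def update_ray(log_odds, cells_before_hit, hit_cell):
        log_odds[cells_before_hit] += L_FREE     # space traversed by the ray
        log_odds[hit_cell] += L_OCC              # cell containing the return
        return log_odds

    grid = np.zeros(100)                         # log-odds 0 = p(occ) = 0.5
    grid = update_ray(grid, np.arange(0, 42), 42)
    p_occ = 1.0 - 1.0 / (1.0 + np.exp(grid))     # back to probabilities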
|
|
10:30-12:00, Paper ThAT29-NT.7 | Add to My Program |
Dynamic Object Classification of Low-Resolution Point Clouds: An LSTM-Based Ensemble Learning Approach |
|
Zhang, Shaoming | Tongji University |
Yao, Tangjun | Tongji University |
Wang, Jianmei | Tongji University |
Feng, Tiantian | Tongji University |
Wang, Zhong | Tongji University |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Computer Vision for Automation
Abstract: In unmanned vehicle perception, dynamic object classification is applied to classify objects accurately and in a timely manner, providing decision-making input for obstacle avoidance and planning. Low-resolution LiDAR is one of the most important sensors for this task. Unfortunately, existing approaches perform unsatisfactorily due to the huge domain gap between low-resolution and high-resolution point cloud classification. Some schemes try to reduce this gap by fusing multi-scan information through SLAM or by completing single-scan point clouds. However, these methods rely on high positioning accuracy or on complete object data. We instead propose a dynamic object classification method for low-resolution data from the perspective of time-series fusion. By modeling time series of sparse data, we capture how the outputs of separate classification models change over an object's track. Subsequently, based on ensemble learning, our method performs feature-level fusion of multiple networks to exploit their different expressive capabilities. Finally, we use long short-term memory to progressively classify dynamic objects. We also propose a dataset of low-resolution point clouds with manually annotated ground truth, containing abundant samples of cars, pedestrians, and motorcycles. On actual low-resolution data, our method is verified to achieve substantially higher accuracy than state-of-the-art approaches.
|
|
10:30-12:00, Paper ThAT29-NT.8 | Add to My Program |
Dynablox: Real-Time Detection of Diverse Dynamic Objects in Complex Environments |
|
Schmid, Lukas M. | Massachusetts Institute of Technology (MIT) |
Andersson, Olov | KTH Royal Institute of Technology |
Sulser, Aurelio | ETH Zurich |
Pfreundschuh, Patrick | ETH Zurich |
Siegwart, Roland | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Mapping, Range Sensing
Abstract: Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex unstructured environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.
|
|
10:30-12:00, Paper ThAT29-NT.9 | Add to My Program |
3D Object Detection with VI-SLAM Point Clouds: The Impact of Object and Environment Characteristics on Model Performance |
|
Duan, Lin | Duke University |
Scargill, Tim | Duke University |
Chen, Ying | Duke University |
Gorlatova, Maria | Duke University |
Keywords: Object Detection, Segmentation and Categorization, Environment Monitoring and Management, Semantic Scene Understanding
Abstract: 3D object detection (OD) is a crucial element in scene understanding. However, most existing 3D OD models have been tailored to work with light detection and ranging (LiDAR) and RGB-D point cloud data, leaving their performance on commonly available visual-inertial simultaneous localization and mapping (VI-SLAM) point clouds unexamined. In this paper, we create and release two datasets: VIP500, 4772 VI-SLAM point clouds covering 500 different object and environment configurations, and VIP500-D, an accompanying set of 20 RGB-D point clouds for the object classes and shapes in VIP500. We then use these datasets to quantify the differences between VI-SLAM point clouds and dense RGB-D point clouds, as well as the discrepancies between VI-SLAM point clouds generated with different object and environment characteristics. Finally, we evaluate the performance of three leading OD models on the diverse data in our VIP500 dataset, revealing the promise of OD models trained on VI-SLAM data; we examine the extent to which both object and environment characteristics impact performance, along with the underlying causes.
|
|
ThAT30-NT Oral Session, NT-G6 |
Add to My Program |
Robotics with Large Language Models |
|
|
Chair: Hori, Chiori | Mitsubishi Electric Research Laboratories (MERL) |
Co-Chair: Sukhatme, Gaurav | University of Southern California |
|
10:30-12:00, Paper ThAT30-NT.1 | Add to My Program |
ERRA: An Embodied Representation and Reasoning Architecture for Long-Horizon Language-Conditioned Manipulation Tasks |
|
Zhao, Chao | Hong Kong University of Science and Technology |
Yuan, Shuai | The Hong Kong University of Science and Technology |
Jiang, Chunli | The Hong Kong University of Science and Technology |
Cai, Junhao | Hong Kong University of Science and Technology |
Yu, Hongyu | The Hong Kong University of Science and Technology |
Wang, Michael Yu | Great Bay University |
Chen, Qifeng | HKUST |
Keywords: Manipulation Planning, Integrated Planning and Learning
Abstract: This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly-coupled probabilistic inferences at two granularity levels. A coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in environments provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to the novel but similar natural language instructions.
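The two-level loop reads naturally as code. The sketch below shows its high-level shape with all components stubbed out as callables; the names and interfaces are assumptions, not the authors' API.

    def erra_loop(instruction, env, llm, policy, max_steps=20):
        """Coarse-to-fine loop: an LLM proposes the next action phrase, a
        learned policy executes it, and the outcome feeds back upward."""
        state_summary = env.describe()
        for _ in range(max_steps):
            action_language = llm(instruction, state_summary)   # coarse inference
            if action_language == "done":
                break
            obs = env.observe()
            action = policy(action_language, obs)               # fine inference
            state_summary = env.step(action)                    # feedback upward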
|
|
10:30-12:00, Paper ThAT30-NT.2 | Add to My Program |
Grasp-Anything: Large-Scale Grasp Dataset from Foundation Models |
|
Vuong, An Dinh | FPT Software |
Vu, Minh Nhat | TU Wien, Austria |
Hieu, Le Trung | FPT Software |
Huang, Baoru | Imperial College London |
Binh, Huynh Thi Thanh | School of Information and Communication Technology, Hanoi University of Science and Technology |
Vo, Thieu | Ton Duc Thang University |
Kugi, Andreas | TU Wien |
Nguyen, Anh | University of Liverpool |
Keywords: Big Data in Robotics and Automation, Data Sets for Robot Learning
Abstract: Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.
|
|
10:30-12:00, Paper ThAT30-NT.3 | Add to My Program |
Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments |
|
Arora, Raghav | IIIT Hyderabad |
Singh, Shivam | International Institute of Information Technology Hyderabad |
Swaminathan, Karthik | International Institute of Information Technology, Hyderabad (IIIT-H) |
Datta, Ahana | International Institute of Information Technology, Hyderabad |
Banerjee, Snehasis | IIIT-H / TCS |
Bhowmick, Brojeshwar | Tata Consultancy Services |
Jatavallabhula, Krishna Murthy | MIT |
Sridharan, Mohan | University of Edinburgh |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Integrated Planning and Learning, Task Planning, AI-Enabled Robotics
Abstract: Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.
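A minimal sketch of the anticipation-to-planning hand-off described above, under the assumption that anticipated tasks map to PDDL goal literals (all names and the mapping are hypothetical; the framework's actual prompts and planner interface are not shown):

```python
# Anticipated tasks (here hard-coded; in the framework they would come from
# a few-shot LLM prompt) become extra goal literals, so one classical plan
# jointly achieves the current and upcoming tasks.
CURRENT_TASK = "make breakfast"
ANTICIPATED = ["wash dishes", "wipe counter"]

TASK_TO_GOAL = {  # hypothetical mapping from task names to PDDL goal literals
    "make breakfast": "(served breakfast)",
    "wash dishes": "(clean dishes)",
    "wipe counter": "(clean counter)",
}

goals = [TASK_TO_GOAL[t] for t in [CURRENT_TASK] + ANTICIPATED]
pddl_goal = "(:goal (and {}))".format(" ".join(goals))
print(pddl_goal)  # joint goal handed to the classical planner
```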
|
|
10:30-12:00, Paper ThAT30-NT.4 | Add to My Program |
Conditionally Combining Robot Skills Using Large Language Models |
|
Zentner, K.R. | University of Southern California |
Julian, Ryan | Google |
Ichter, Brian | Google Brain |
Sukhatme, Gaurav | University of Southern California |
Keywords: Deep Learning Methods, Software Tools for Benchmarking and Reproducibility, Imitation Learning
Abstract: This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described using natural language. By using the same set of tasks as Meta-World, Language-World results can be easily compared to Meta-World results, allowing for a point of comparison between recent methods using Large Language Models (LLMs) and those using Deep Reinforcement Learning. Second, we introduce a method we call Plan Conditioned Behavioral Cloning (PCBC), which allows fine-tuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC is able to achieve strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration. We have made Language-World available as open-source software at https://github.com/krzentner/language-world/
|
|
10:30-12:00, Paper ThAT30-NT.5 | Add to My Program |
Interactive Planning Using Large Language Models for Partially Observable Robotic Tasks |
|
Sun, Lingfeng | University of California, Berkeley |
Jha, Devesh | Mitsubishi Electric Research Laboratories |
Hori, Chiori | Mitsubishi Electric Research Laboratories (MERL) |
Jain, Siddarth | Mitsubishi Electric Research Laboratories (MERL) |
Corcodel, Radu | Mitsubishi Electric Research Laboratories |
Zhu, Xinghao | University of California, Berkeley |
Tomizuka, Masayoshi | University of California |
Romeres, Diego | Mitsubishi Electric Research Laboratories |
Keywords: Planning under Uncertainty, Deep Learning Methods, Task and Motion Planning
Abstract: Designing robotic agents to perform open vocabulary tasks has been a long-standing goal in robotics and AI. Recently, Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. However, planning for these tasks in the presence of uncertainties is challenging as it requires "chain-of-thought" reasoning, aggregating information from the environment, updating state estimates, and generating actions based on the updated state estimates. In this paper, we present an interactive planning technique for partially observable tasks using LLMs. In the proposed method, an LLM is used to collect missing information from the environment using a robot, and infer the state of the underlying problem from collected observations while guiding the robot to perform the required actions. We also use a fine-tuned Llama 2 model via self-instruct and compare its performance against a pre-trained LLM like GPT-4. Results are demonstrated on several tasks in simulation as well as real-world environments.
|
|
10:30-12:00, Paper ThAT30-NT.6 | Add to My Program |
Optimal Scene Graph Planning with Large Language Model Guidance |
|
Dai, Zhirui | UC San Diego |
Asgharivaskasi, Arash | University of California, San Diego |
Duong, Thai | University of California, San Diego |
Lin, Shusen | University of California, San Diego |
Tzes, Mariliza | University of Pennsylvania |
Pappas, George J. | University of Pennsylvania |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Task and Motion Planning, Semantic Scene Understanding, Formal Methods in Robotics and Automation
Abstract: Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with concept grounding capabilities to interpret natural language tasks. Leveraging these capabilities, this work develops an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph model of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.
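The interplay between a provably consistent heuristic and potentially inadmissible LLM guidance can be illustrated with a toy A* in which the admissible term alone determines the f-value (preserving optimality) while the guidance term only orders ties. This is a simplification of the paper's multi-heuristic planning, on a toy grid rather than a scene graph; all names below are illustrative:

```python
import heapq

def astar(start, goal, neighbors, h_adm, h_guide):
    g = {start: 0.0}
    openq = [(h_adm(start), h_guide(start), start)]
    while openq:
        _, _, s = heapq.heappop(openq)
        if s == goal:
            return g[s]
        for t, c in neighbors(s):
            if g[s] + c < g.get(t, float("inf")):
                g[t] = g[s] + c
                # the admissible term sets f and preserves optimality;
                # the guidance term only breaks ties in expansion order
                heapq.heappush(openq, (g[t] + h_adm(t), h_guide(t), t))
    return None

neighbors = lambda s: [((s[0] + 1, s[1]), 1.0), ((s[0], s[1] + 1), 1.0)]
h_adm = lambda s: max(0, (4 - s[0]) + (3 - s[1]))  # Manhattan: consistent
h_llm = lambda s: 0.0 if s[0] >= s[1] else 5.0     # stand-in "LLM" guidance
print(astar((0, 0), (4, 3), neighbors, h_adm, h_llm))  # 7.0, still optimal
```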
|
|
10:30-12:00, Paper ThAT30-NT.7 | Add to My Program |
CAPE: Corrective Actions from Precondition Errors Using Large Language Models |
|
Sundara Raman, Shreyas | Brown University |
Cohen, Vanya | Brown University |
Idrees, Ifrah | Brown University |
Rosen, Eric | Brown University |
Mooney, Raymond | University of Texas at Austin |
Tellex, Stefanie | Brown |
Paulius, David | Brown University |
Keywords: Task and Motion Planning, AI-Enabled Robotics, Human-Centered Robotics
Abstract: Extracting knowledge and reasoning from large language models (LLMs) offers a path to designing intelligent robots. Common approaches that leverage LLMs for planning are unable to recover when actions fail and resort to retrying failed actions without resolving the underlying cause. We propose a novel approach (CAPE) that generates corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans through few-shot reasoning on action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while maintaining semantic correctness and minimizing re-prompting. In VirtualHome, CAPE improves a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan, whilst achieving competitive executability. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves correctness by 76.49% with higher executability compared to SayCan. Our approach enables embodied agents to follow natural language commands and robustly recover from failures.
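A minimal sketch of the corrective re-prompting idea (hypothetical actions, with a lookup table standing in for the LLM): instead of retrying a failed action, the precondition error is resolved first:

```python
def precondition_error(action, state):
    if action == "open fridge" and not state["near_fridge"]:
        return "robot is not near the fridge"
    return None

def corrective_step(error):            # stand-in for re-prompting the LLM
    return {"robot is not near the fridge": "walk to fridge"}[error]

def apply_action(action, state):       # toy action effects
    if action == "walk to fridge":
        state["near_fridge"] = True

state = {"near_fridge": False}
plan, executed = ["open fridge", "grab milk"], []
while plan:
    action = plan[0]
    err = precondition_error(action, state)
    if err:
        plan.insert(0, corrective_step(err))  # resolve the cause, don't retry
        continue
    apply_action(action, state)
    executed.append(plan.pop(0))
print(executed)  # ['walk to fridge', 'open fridge', 'grab milk']
```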
|
|
10:30-12:00, Paper ThAT30-NT.8 | Add to My Program |
GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping |
|
Tang, Chao | Southern University of Science and Technology |
Huang, Dehao | Southern University of Science and Technology |
Ge, Wenqi | Southern University of Science and Technology |
Liu, Weiyu | Stanford University |
Zhang, Hong | SUSTech |
Keywords: Grasping, Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, the existing semantic knowledge is typically constructed based on closed-world concept sets, restricting generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts outside the training set. The effectiveness of GraspGPT is further validated in real-robot experiments. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/graspgpt.
|
|
10:30-12:00, Paper ThAT30-NT.9 | Add to My Program |
DOS: A Deployment Operating System for RoboOps |
|
Ye, Guo | Northwestern University |
Lin, Qinjie | Northwestern University |
Luo, Zening | Northwestern University |
Liu, Han | Northwestern University |
Keywords: Computer Architecture for Robotic and Automation, Software Architecture for Robotic and Automation, Software Tools for Benchmarking and Reproducibility
Abstract: We propose a new system named DOS (Deployment Operating System for RoboOps) for reliably deploying data-driven robots in both production and simulation environments. Compared to existing systems, DOS features a unique CI/CD (continuous integration and continuous deployment) architecture which allows us to seamlessly integrate agile development and reliable operation in a fully automated fashion. With this CI/CD architecture, this paper mainly introduces three essential components that uniquely differentiate DOS from existing robotic systems: (i) a cloud orchestrator that builds and schedules pipeline components; (ii) a bridge tool named DOS Connect that makes cloud-edge bidirectional communication feasible; (iii) an analytical profiler that collects any set of user-defined performance metrics for system optimization. DOS significantly increases the reliability and maintainability of the deployed robotic systems. To illustrate this point, we demonstrate the performance of DOS by training and deploying deep reinforcement learning based methods in a challenging real-world environment with a swarm of robots.
|
|
ThAT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation I |
|
|
Chair: Ding, Wenchao | Fudan University |
Co-Chair: Yang, Ming | Shanghai Jiao Tong University |
|
10:30-12:00, Paper ThAT31-NT.1 | Add to My Program |
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving |
|
Chen, Long | Wayve |
Sinavski, Oleg | Wayve Technologies Ltd |
Hünermann, Jan | Wayve |
Karnsund, Alice | Wayve |
Willmott, Andrew | Wayve |
Birch, Danny | Wayve |
Maund, Daniel | Wayve |
Shotton, Jamie | Wayve Technologies Ltd |
Keywords: Autonomous Vehicle Navigation, AI-Enabled Robotics, Machine Learning for Robot Control
Abstract: Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
|
|
10:30-12:00, Paper ThAT31-NT.2 | Add to My Program |
HHGNN: Heterogeneous Hypergraph Neural Network for Traffic Agents Trajectory Prediction in Grouping Scenarios |
|
Guo, Hetian | Southern University of Science and Technology |
Peng, Yingzhi | Southern University of Science and Technology |
Fan, Zipei | University of Tokyo |
Zhu, He | Southern University of Science and Technology |
Song, Xuan | University of Tokyo |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, AI-Based Methods
Abstract: In many intelligent transportation systems, predicting the future motion of heterogeneous traffic participants is a fundamental but challenging task due to various factors encompassing the agents' dynamic states, interactions with neighboring agents and surrounding traffic infrastructures, and their stochastic and multi-modal natural behavior tendencies. However, existing approaches have limitations as they either focus solely on static, pairwise interactions, ignoring interactions of varied granularity, or fail to tackle agents' heterogeneity. In this paper, instead of focusing solely on pairwise interactions, we propose a Heterogeneous Hypergraph Neural Network (HHGNN) based motion prediction model that leverages the nature of hypergraphs to encode the groupwise interactions among traffic participants. Moreover, we propose the type-aware two-level hypergraph message passing module (TTHMS) with learnable hyperedge-type embeddings to model the intra-group and inter-group level interactions among heterogeneous traffic agents (e.g., vehicles, pedestrians, and cyclists). We also integrate a scene context fusion layer in TTHMS to incorporate the scene context. Comparison and ablation experiments on the Waymo Open Motion Dataset (WOMD) demonstrate HHGNN's effectiveness within the motion prediction task.
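The two-level (intra-group, then inter-group) message passing can be sketched with a hypergraph incidence matrix; the sketch below omits the paper's type-aware hyperedge embeddings and attention, and all shapes and weights are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # 5 agents, 8-dim features
H = np.array([[1, 0],                  # incidence: agent i belongs to group j
              [1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)

W_in, W_out = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

# intra-group level: mean of member features per hyperedge (group)
E = (H.T @ X) / H.sum(axis=0, keepdims=True).T
# inter-group level: each agent receives the mean of its groups' messages
M = (H @ E) / H.sum(axis=1, keepdims=True)
X_next = np.tanh(X @ W_in + M @ W_out)
print(X_next.shape)  # (5, 8): updated agent features
```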
|
|
10:30-12:00, Paper ThAT31-NT.3 | Add to My Program |
Odometry Estimation by Fusing Multiple Radar Sensors and an Inertial Measurement Unit |
|
Brühl, Tim | Dr. Ing. H.c. F. Porsche AG |
Eberhardt, Tim Dieter | HTW Berlin |
Schwager, Robin | Porsche AG |
Ewecker, Lukas | Dr. Ing. H.c. F. Porsche AG |
Sohn, Tin Stribor | Dr. Ing H.c. F. Porsche AG |
Hohmann, Sören | Institute of Control Systems, Karlsruhe Institute of Technology |
Keywords: Autonomous Vehicle Navigation, Autonomous Agents, Intelligent Transportation Systems
Abstract: This paper presents a framework for odometry estimation in automotive applications using six asynchronously operating millimeter wave radar sensors and a combination of gyroscope and accelerometer. Two different motion models are combined to estimate motion with three degrees of freedom. For this purpose, we propose a novel three-part radar filtering method for outlier detection: By analyzing uncertainties and system limits, sensor-specific outliers are detected and removed in the first filter. We introduce knowledge about the previous motion state by a status-quo-ante filter and hereby identify further false-positive raw targets in the current measurement which are not accessible from the previous state. Moreover, we suggest employing a downstream, resampling-based algorithm for additional outlier detection. Based on the filtered data, radar motion state estimation is performed by use of curve fitting methods. To fuse the radar odometry estimation with the acceleration and yaw rate measurements while handling non-linearities, an Unscented Kalman Filter is used. The developed framework is evaluated with reference data in various scenarios. The results demonstrate that it accurately and robustly determines motion and position states even in radar-challenging scenes, such as environments with few radar targets or with heavy metal structures. Our method keeps up with common approaches such as wheel speed sensor odometry while outperforming them in terms of drift.
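One standard building block behind radar ego-motion estimation of this kind is a least-squares fit of ego velocity to the radial velocities of static targets, with residual-based outlier rejection; the sketch below shows that generic idea (synthetic data, simplified rejection rule), not the authors' three-part filter:

```python
# For static targets, the radial velocity measured at azimuth theta satisfies
# v_r = -(vx*cos(theta) + vy*sin(theta)), which is linear in the ego velocity.
import numpy as np

rng = np.random.default_rng(1)
vx_true, vy_true = 8.0, 0.5
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=40)
v_r = -(vx_true * np.cos(theta) + vy_true * np.sin(theta)) + rng.normal(0, 0.05, 40)
v_r[:3] += 5.0                          # three moving targets act as outliers

A = -np.column_stack([np.cos(theta), np.sin(theta)])
for _ in range(3):                      # simple iterative outlier rejection
    v_est, *_ = np.linalg.lstsq(A, v_r, rcond=None)
    resid = np.abs(A @ v_est - v_r)
    keep = resid < 3 * np.median(resid)
    A, v_r = A[keep], v_r[keep]
print(v_est)                            # close to [8.0, 0.5]
```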
|
|
10:30-12:00, Paper ThAT31-NT.4 | Add to My Program |
Thermal Voyager: A Comparative Study of RGB and Thermal Cameras for Night-Time Autonomous Navigation |
|
Ng, Aditya | PES University |
Pb, Dhruval | PES University |
Shalabi, Jehan | Purdue University |
Jape, Shubhankar | Purdue University |
Wang, Xueji | Purdue University |
Jacob, Zubin | Purdue University |
Keywords: Autonomous Vehicle Navigation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Achieving reliable autonomous navigation during nighttime remains a substantial obstacle in the field of robotics. Although systems utilizing Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR) enable environmental perception regardless of lighting conditions, they face significant challenges in environments with a high density of agents due to their dependence on active emissions. Cameras operating in the visible spectrum represent a quasi-passive alternative, yet they see a substantial drop in efficiency in low-light conditions, consequently hindering both scene perception and path planning. Here, we introduce a novel end-to-end navigation system, the "Thermal Voyager", which leverages infrared thermal vision to achieve true passive perception in autonomous entities. The system utilizes our architecture, TrajNet, to interpret thermal visual inputs to produce desired trajectories and employs a model predictive control strategy to determine the optimal steering angles needed to actualize those trajectories. We train our TrajNet on a comprehensive video dataset incorporating visible and thermal footage alongside Controller Area Network (CAN) frames. We demonstrate that nighttime navigation facilitated by Long-Wave Infrared (LWIR) thermal cameras can rival the performance of daytime navigation systems using RGB cameras. Our work paves the way for scene perception and trajectory prediction empowered entirely by passive thermal sensing technology, heralding a new era where autonomous navigation is both feasible and reliable irrespective of the time of day. We make our code and thermal trajectory dataset public.
|
|
10:30-12:00, Paper ThAT31-NT.5 | Add to My Program |
Rethinking Imitation-Based Planners for Autonomous Driving |
|
Cheng, Jie | Hong Kong University of Science and Technology |
Chen, Yingbing | The Hong Kong University of Science and Technology |
Mei, Xiaodong | HKUST |
Yang, Bowen | The Hong Kong University of Science and Technology, Robotics Institute |
Li, Bo | Lotus Technology Ltd |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Autonomous Vehicle Navigation, Imitation Learning
Abstract: In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model—PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website https://jchengai.github.io/planTF.
|
|
10:30-12:00, Paper ThAT31-NT.6 | Add to My Program |
Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario |
|
Qin, Tong | Shanghai Jiao Tong University |
Huang, Haihui | Zhejiang University |
Wang, Ziqiang | Autonomous Driving Solution, IAS BU, Huawei |
Chen, Tongqing | Huawei Technology |
Ding, Wenchao | Fudan University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems
Abstract: An accurate road topological structure is of great importance for autonomous driving in complex urban environments. Currently, most autonomous vehicles rely heavily on High-Definition maps (HD maps) to cruise across the city. Without a prior map, it is hard for vehicles to find right-turning and left-turning paths through large intersections. However, due to the complexity of intersections, producing such a map by human labor is time-consuming and error-prone. In this paper, we propose a framework to automatically produce topological maps of complicated intersections. This framework adopts a crowdsourcing approach to collect semantic information about the environment and traffic flows. The topological structure is inferred from traffic flows correctly and automatically. We highlight that this framework is highly automatic and scalable, which can greatly speed up HD map production and decrease its cost. The proposed system is validated with real-world crowdsourcing data and the result is comparable to traditional HD maps.
|
|
10:30-12:00, Paper ThAT31-NT.7 | Add to My Program |
Scene Informer: Anchor-Based Occlusion Inference and Trajectory Prediction in Partially Observable Environments |
|
Lange, Bernard | Stanford University |
Li, Jiachen | University of California, Riverside |
Kochenderfer, Mykel | Stanford University |
Keywords: Autonomous Vehicle Navigation, Intelligent Transportation Systems, Deep Learning Methods
Abstract: Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting both observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, as well as forecasting motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
|
|
10:30-12:00, Paper ThAT31-NT.8 | Add to My Program |
Monocular Localization with Semantics Map for Autonomous Vehicles |
|
Wan, Jixiang | OPPO Research Institute |
Zhang, Xudong | OPPO Research Institute |
Dong, Shuzhou | OPPO |
Zhang, Yuwei | OPPO Research Institute |
Yang, Yuchen | OPPO Research Institute |
Wu, Ruoxi | OPPO Research Institute, Shanghai, China |
Jiang, Ye | OPPO Research Institute |
Li, Jijunnan | OPPO Research Institute |
Lin, Jinquan | OPPO Research Institute |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Autonomous Vehicle Navigation, Localization, Intelligent Transportation Systems
Abstract: Accurate and robust localization remains a significant challenge for autonomous vehicles. The cost of sensors and limitations in local computational efficiency make it difficult to scale to large commercial applications. Traditional vision-based approaches focus on texture features that are susceptible to changes in lighting, season, perspective, and appearance. Additionally, the large storage size of maps with descriptors and complex optimization processes hinder system performance. To balance efficiency and accuracy, we propose a novel lightweight visual semantic localization algorithm that employs stable semantic features instead of low-level texture features. First, semantic maps are constructed offline by detecting semantic objects, such as ground markers, lane lines, and poles, using cameras or LiDAR sensors. Then, online visual localization is performed through data association of semantic features and map objects. We evaluated our proposed localization framework on the publicly available KAIST Urban dataset and in scenarios we recorded ourselves. The experimental results demonstrate that our method is a reliable and practical localization solution in various autonomous driving localization tasks.
|
|
10:30-12:00, Paper ThAT31-NT.9 | Add to My Program |
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving |
|
Knittel, Anthony | Five AI |
Hawasly, Majd | FiveAI Ltd |
Albrecht, Stefano V. | University of Edinburgh |
Redford, John | Five AI Ltd |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Deep Learning Methods
Abstract: Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner.
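The trade-off between the two objectives can be written down compactly; the following simplified sketch (1-D endpoints, unit variances, illustrative alpha weighting, not the paper's training regime) combines a GMM negative log-likelihood with a min-over-modes error:

```python
import numpy as np

def combined_loss(means, log_weights, target, alpha=0.5):
    # probabilistic term: GMM NLL, -log sum_k w_k N(target; mu_k, 1)
    log_comp = log_weights - 0.5 * (target - means) ** 2 - 0.5 * np.log(2 * np.pi)
    nll = -np.logaddexp.reduce(log_comp)
    # closest-mode term: error of the best-matching component only
    closest = np.min((target - means) ** 2)
    return alpha * nll + (1 - alpha) * closest

means = np.array([-2.0, 0.1, 3.0])       # predicted mode endpoints
log_w = np.log(np.array([0.2, 0.5, 0.3]))
print(combined_loss(means, log_w, target=0.0))
```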
|
|
ThAT32-NT Oral Session, NT-G8 |
Add to My Program |
Intelligent Transportation Systems IV |
|
|
Chair: Xue, Jianru | Xi'an Jiaotong University |
Co-Chair: Ma, Jun | The Hong Kong University of Science and Technology |
|
10:30-12:00, Paper ThAT32-NT.1 | Add to My Program |
MacFormer: Map-Agent Coupled Transformer for Real-Time and Robust Trajectory Prediction |
|
Feng, Chen | Hong Kong University of Science and Technology |
Zhou, Hangning | Megvii |
Lin, Huadong | Beihang University |
Zhang, Zhigang | Megvii.Inc |
Xu, Ziyao | Megvii |
Zhang, Chi | Mach |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Deep Learning Methods, Representation Learning, Autonomous Vehicle Navigation
Abstract: Predicting the future behavior of agents is a fundamental task in autonomous vehicle domains. Accurate prediction relies on comprehending the surrounding map, which significantly regularizes agent behaviors. However, existing methods have limitations in exploiting the map and exhibit a strong dependence on historical trajectories, yielding unsatisfactory prediction performance and robustness. Additionally, their heavy network architectures impede real-time applications. To tackle these problems, we propose the Map-Agent Coupled Transformer (MacFormer) for real-time and robust trajectory prediction. Our framework explicitly incorporates map constraints into the network via two carefully designed modules named coupled map and reference extractor. A novel multi-task optimization strategy (MTOS) is presented to enhance learning of topology and rule constraints. We also devise a bilateral query scheme in context fusion for a more efficient and lightweight network. We evaluated our approach on the Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, achieving state-of-the-art performance on all of them with the lowest inference latency and smallest model size. Experiments also demonstrate that our framework is resilient to imperfect tracklet inputs. Furthermore, we show that by combining with our proposed strategies, classical models outperform their baselines, further validating the versatility of our framework.
|
|
10:30-12:00, Paper ThAT32-NT.2 | Add to My Program |
Real-Time Capable Decision Making for Autonomous Driving Using Reachable Sets |
|
Kochdumper, Niklas | Stony Brook University |
Bak, Stanley | Stony Brook University |
Keywords: Intelligent Transportation Systems, Motion and Path Planning, Dynamics
Abstract: Despite large advances in recent years, real-time capable motion planning for autonomous road vehicles remains a huge challenge. In this work, we present a decision module that is based on set-based reachability analysis: First, we identify all possible driving corridors by computing the reachable set for the longitudinal position of the vehicle along the lanelets of the road network, where lane changes are modeled as discrete events. Next, we select the best driving corridor based on a cost function that penalizes lane changes and deviations from a desired velocity profile. Finally, we generate a reference trajectory inside the selected driving corridor, which can be used to guide or warm start low-level trajectory planners. For the numerical evaluation we combine our decision module with a motion-primitive-based and an optimization-based planner and evaluate the performance on 2000 challenging CommonRoad traffic scenarios as well as in the realistic CARLA simulator. The results demonstrate that our decision module is real-time capable and yields significant speed-ups compared to executing a motion planner standalone without a decision module.
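The driving-corridor computation can be illustrated with interval propagation of the longitudinal position under bounded acceleration, intersected with free space at each step; this toy double-integrator sketch omits the lanelet network and the discrete lane-change events:

```python
dt, a_max = 0.5, 3.0
s_lo, s_hi = 0.0, 0.0          # longitudinal position interval [m]
v_lo, v_hi = 10.0, 10.0        # velocity interval [m/s]
free = (0.0, 120.0)            # free range, e.g., up to a lead vehicle

corridor = []
for _ in range(10):
    # interval arithmetic for s' = s + v*dt + 0.5*a*dt^2, v' = v + a*dt
    s_lo = s_lo + v_lo * dt - 0.5 * a_max * dt**2
    s_hi = s_hi + v_hi * dt + 0.5 * a_max * dt**2
    v_lo, v_hi = max(v_lo - a_max * dt, 0.0), v_hi + a_max * dt
    # intersect with the drivable region to keep only collision-free states
    s_lo, s_hi = max(s_lo, free[0]), min(s_hi, free[1])
    corridor.append((round(s_lo, 1), round(s_hi, 1)))
print(corridor)                # per-step position bounds = driving corridor
```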
|
|
10:30-12:00, Paper ThAT32-NT.3 | Add to My Program |
Vehicle Behavior Prediction by Episodic-Memory Implanted NDT |
|
Shen, Peining | Chang'an University |
Fang, Jianwu | Xian Jiaotong University |
Yu, Hongkai | Cleveland State University |
Xue, Jianru | Xi'an Jiaotong University |
Keywords: Learning Categories and Concepts, Intelligent Transportation Systems, Representation Learning
Abstract: In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but the black-box nature makes it untrustworthy to apply them in practical use. In this work, we explore the interpretability of behavior prediction of target vehicles by an Episodic Memory implanted Neural Decision Tree (abbrev. eMem-NDT). The structure of eMem-NDT is constructed by hierarchically clustering the text embedding of vehicle behavior descriptions. eMem-NDT is a neural-backed part of a pre-trained deep learning model by changing the soft-max layer of the deep model to eMem-NDT, for grouping and aligning the memory prototypes of the historical vehicle behavior features in training data on a neural decision tree. Each leaf node of eMem-NDT is modeled by a neural network for aligning the behavior memory prototypes. By eMem-NDT, we infer each instance in behavior prediction of vehicles by bottom-up Memory Prototype Matching (MPM) (searching the appropriate leaf node and the links to the root node) and top-down Leaf Link Aggregation (LLA) (obtaining the probability of future behaviors of vehicles for certain instances). We validate eMem-NDT on BLVD and LOKI datasets, and the results show that our model can obtain a superior performance to other methods with clear explainability. The code is available in https://github.com/JWFangit/eMem-NDT.
|
|
10:30-12:00, Paper ThAT32-NT.4 | Add to My Program |
Optimal Driver Warning Generation in Dynamic Driving Environment |
|
Li, Chenran | University of California, Berkeley |
Xu, Aolin | HRI |
Sachdeva, Enna | Honda Research Institute |
Misu, Teruhisa | Honda Research Institute USA, Inc |
Dariush, Behzad | Honda Research Institute USA |
Keywords: Intelligent Transportation Systems, Collision Avoidance
Abstract: The driver warning system that alerts the human driver about potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly the forward collision warning and unsafe lane change warning, can reduce the risk of collision caused by human errors. However, the current design methods have several major limitations. Firstly, the warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions and surrounding objects, which reduces the flexibility and generality of the system over different scenarios. Additionally, the triggering conditions of warnings are mostly rule-based threshold checks on the current state, which lack prediction of the potential risk over a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering the interactions among the generated warning, the driver behavior, and the states of ego and surrounding vehicles on a long horizon. The warning generation problem is formulated as a partially observed Markov decision process (POMDP). An optimal warning generation framework is proposed as a solution to the proposed POMDP. The simulation experiments demonstrate the superiority of the proposed solution over existing warning generation methods.
|
|
10:30-12:00, Paper ThAT32-NT.5 | Add to My Program |
Active Learning with Dual Model Predictive Path-Integral Control for Interaction-Aware Autonomous Highway On-Ramp Merging |
|
Knaup, Jacob | Georgia Institute of Technology |
D'sa, Jovin | Honda Research Institute, USA |
Chalaki, Behdad | Honda Research Institute USA, Inc |
Naes, Tyler | Honda Research Institute, USA |
Nourkhiz Mahjoub, Hossein | Honda Research Institute US |
Moradi-Pari, Ehsan | Honda Research Institute |
Tsiotras, Panagiotis | Georgia Tech |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Intelligent Transportation Systems
Abstract: Merging into dense highway traffic for an autonomous vehicle is a complex decision-making task, wherein the vehicle must identify a potential gap and coordinate with surrounding human drivers, each of whom may exhibit diverse driving behaviors. Many existing methods consider other drivers to be dynamic obstacles and, as a result, they are incapable of capturing the full intent of the human drivers through this passive planning. In this paper, we propose a novel dual control framework based on Model Predictive Path-Integral control to generate interactive trajectories. This framework incorporates a Bayesian inference approach to actively learn the agents’ parameters, i.e., other drivers’ model parameters. The proposed framework employs a sampling-based approach that is suitable for real-time implementation through the utilization of GPUs. We illustrate the effectiveness of our proposed methodology through comprehensive numerical simulations conducted in both high and low-fidelity simulation scenarios focusing on autonomous on-ramp merging.
|
|
10:30-12:00, Paper ThAT32-NT.6 | Add to My Program |
Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions |
|
Bogdoll, Daniel | FZI Research Center for Information Technology |
Qin, Jing | Karlsruhe Institute of Technology |
Nekolla, Moritz | FZI Research Center for Information Technology |
Abouelazm, Ahmed | FZI Forschungszentrum Informatik |
Joseph, Tim | FZI Research Center for Information Technology |
Zöllner, Johann Marius | FZI Forschungszentrum Informatik |
Keywords: Reinforcement Learning, Intelligent Transportation Systems, AI-Enabled Robotics
Abstract: Reinforcement Learning is a highly active research field with promising advancements. In the field of autonomous driving, however, often only very simple scenarios are examined. Common approaches use non-interpretable control commands as the action space and unstructured reward designs, which are unsuitable for complex scenarios. In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source. We learn trajectories and assess them with a situation-aware reward design, leading to a dynamic reward that allows the agent to learn situations that require controlled traffic rule exceptions. Our method is applicable to arbitrary RL models. We successfully demonstrate high completion rates of complex scenarios with recent model-based agents.
|
|
10:30-12:00, Paper ThAT32-NT.7 | Add to My Program |
Chance-Aware Lane Change with High-Level Model Predictive Control through Curriculum Reinforcement Learning |
|
Wang, Yubin | The Hong Kong University of Science and Technology (Guangzhou) |
Li, Yulin | Hong Kong University of Science and Technology (HKUST) |
Peng, Zengqi | The Hong Kong University of Science and Technology (Guangzhou) |
Ghazzai, Hakim | King Abdullah University of Science and Technology |
Ma, Jun | The Hong Kong University of Science and Technology |
Keywords: Intelligent Transportation Systems, Integrated Planning and Learning, Motion and Path Planning
Abstract: Lane change in dense traffic typically requires the recognition of an appropriate opportunity for maneuvers, which remains a challenging problem in self-driving. In this work, we propose a chance-aware lane-change strategy with high-level model predictive control (MPC) through curriculum reinforcement learning (CRL). In our proposed framework, full-state references and regulatory factors concerning the relative importance of each cost term in the embodied MPC are generated by a neural policy. Furthermore, effective curricula are designed and integrated into an episodic reinforcement learning (RL) framework with policy transfer and enhancement, to improve the convergence speed and ensure a high-quality policy. The proposed framework is deployed and evaluated in numerical simulations of dense and dynamic traffic. It is noteworthy that, given a narrow chance, the proposed approach generates high-quality lane-change maneuvers such that the vehicle merges into the traffic flow with a high success rate of 96%. Finally, our framework is validated in the high-fidelity simulator under dense traffic, demonstrating satisfactory practicality and generalizability.
|
|
10:30-12:00, Paper ThAT32-NT.8 | Add to My Program |
Human Observation-Inspired Trajectory Prediction for Autonomous Driving in Mixed-Autonomy Traffic Environments |
|
Liao, Haicheng | University of Macau |
Liu, Shangqian | University of Macau |
Li, Yong Kang | University of Electronic Science and Technology of China |
Li, Zhenning | University of Macau |
Wang, Chengyue | University of Macau |
Li, Yunjian | Macau University of Science and Technology |
Li, Shengbo Eben | Tsinghua University |
Xu, Chengzhong | University of Macau |
Keywords: Intelligent Transportation Systems, Motion and Path Planning, Deep Learning Methods
Abstract: In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as time-series analysis. Our research diverges significantly by adopting an interdisciplinary approach that integrates principles of human cognition and observational behavior into trajectory prediction models for AVs. We introduce a novel "adaptive visual sector" mechanism that mimics the dynamic allocation of attention human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. Additionally, we develop a "dynamic traffic graph" using Convolutional Neural Networks (CNN) and Graph Attention Networks (GAT) to capture spatio-temporal dependencies among agents. Benchmark tests on the NGSIM, HighD, and MoCAD datasets reveal that our model (GAVA) outperforms state-of-the-art baselines by at least 15.2%, 19.4%, and 12.0%, respectively. Our findings underscore the potential of leveraging human cognition principles to enhance the proficiency and adaptability of trajectory prediction algorithms in AVs.
|
|
10:30-12:00, Paper ThAT32-NT.9 | Add to My Program |
Context-Aware Timewise VAEs for Real-Time Vehicle Trajectory Prediction |
|
Xu, Pei | Stanford University |
Hayet, Jean-Bernard | CIMAT |
Karamouzas, Ioannis | Clemson University |
Keywords: Motion and Path Planning, Computer Vision for Transportation, Deep Learning Methods
Abstract: Real-time, accurate prediction of human steering behaviors has wide applications, from developing intelligent traffic systems to deploying autonomous driving systems in both real and simulated worlds. In this paper, we present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Built upon the backbone architecture of a timewise variational autoencoder, ContextVAE employs a dual attention mechanism for observation encoding that accounts for the environmental context information and the dynamic agents' states in a unified way. By utilizing features extracted from semantic maps during agent state encoding, our approach takes into account both the social features exhibited by agents on the scene and the physical environment constraints to generate map-compliant and socially-aware trajectories. We perform extensive testing on the nuScenes prediction challenge, Lyft Level 5 dataset and Waymo Open Motion Dataset to show the effectiveness of our approach and its state-of-the-art performance. In all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real-time.
|
|
ThAT33-CC Oral Session, CC-301 |
Add to My Program |
Integrated Planning and Control |
|
|
Chair: Lehnert, Christopher | Queensland University of Technology |
Co-Chair: Devasia, Santosh | University of Washington |
|
10:30-12:00, Paper ThAT33-CC.1 | Add to My Program |
Probably Approximately Correct Nonlinear Model Predictive Control (PAC-NMPC) |
|
Polevoy, Adam | Johns Hopkins University Applied Physics Lab |
Kobilarov, Marin | Johns Hopkins University |
Moore, Joseph | Johns Hopkins University Applied Physics Lab |
Keywords: Planning under Uncertainty, Integrated Planning and Control, Robot Safety
Abstract: Approaches for stochastic nonlinear model predictive control (SNMPC) typically make restrictive assumptions about the system dynamics and rely on approximations to characterize the evolution of the underlying uncertainty distributions. For this reason, they are often unable to capture more complex distributions (e.g., non-Gaussian or multi-modal) and cannot provide accurate guarantees of performance. In this paper, we present a sampling-based SNMPC approach that leverages recently derived sample complexity bounds to certify the performance of a feedback policy without making assumptions about the system dynamics or underlying uncertainty distributions. By parallelizing our approach, we are able to demonstrate real-time receding-horizon SNMPC with statistical safety guarantees in simulation and on hardware using a 1/10th scale rally car and a 24-inch wingspan fixed-wing UAV.
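The flavor of a sample-based performance certificate can be conveyed with a simple one-sided Hoeffding bound; the paper derives tighter PAC-style bounds, so this sketch only shows the generic estimate-plus-confidence-radius pattern on stand-in rollouts:

```python
import math
import random

random.seed(0)
N, delta = 2000, 0.01
# stand-in rollouts: each "fails" with true probability 0.03
failures = sum(random.random() < 0.03 for _ in range(N))

p_hat = failures / N
eps = math.sqrt(math.log(1.0 / delta) / (2.0 * N))   # Hoeffding radius
print(f"P(failure) <= {p_hat + eps:.4f} with confidence {1 - delta:.2%}")
```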
|
|
10:30-12:00, Paper ThAT33-CC.2 | Add to My Program |
QuAD: Query-Based Interpretable Neural Motion Planning for Autonomous Driving |
|
Biswas, Sourav | Waabi, University of Toronto |
Casas Romero, Sergio | University of Toronto |
Sykora, Quin | University of Toronto |
Agro, Ben | UofT, Waabi |
Sadat, Abbas | Waabi |
Urtasun, Raquel | University of Toronto |
Keywords: Motion and Path Planning, Deep Learning Methods, Imitation Learning
Abstract: A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate a candidate trajectory around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art on high-fidelity closed-loop simulations.
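The query paradigm can be sketched as an implicit occupancy function evaluated only at the spatio-temporal points a candidate trajectory visits (toy occupancy field and hypothetical names, not the learned model):

```python
import numpy as np

def occupancy(x, y, t):                 # stand-in for the learned implicit field
    ox, oy = 5.0 - 1.0 * t, 0.0         # one agent approaching along -x
    return float(np.exp(-((x - ox) ** 2 + (y - oy) ** 2)))

def trajectory_risk(traj):
    # query occupancy only at the points the candidate trajectory reaches,
    # instead of predicting a dense grid for the whole scene
    return max(occupancy(x, y, t) for t, (x, y) in enumerate(traj))

candidate = [(0.5 * t, 0.0) for t in range(8)]   # slow ego trajectory
print(round(trajectory_risk(candidate), 3))      # peak collision risk queried
```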
|
|
10:30-12:00, Paper ThAT33-CC.3 | Add to My Program |
Safe Receding Horizon Motion Planning with Infinitesimal Update Interval |
|
Jang, Inkyu | Seoul National University |
Hwang, Sunwoo | Seoul National University |
Byun, Jeonghyun | Seoul National University |
Kim, H. Jin | Seoul National University |
Keywords: Integrated Planning and Control, Robot Safety, Optimization and Optimal Control
Abstract: Safety verification in motion planning is known to be computationally burdensome, despite its importance in robotics. In this paper, we investigate the behavior of safe receding horizon motion planners when the update interval becomes infinitesimal. By requiring the trajectory parameters to evolve continuously in time, the trajectory optimization problem is reformulated into a time-derivative form, whose decision variables are their rate of change. This results in a quadratic programming problem which directly provides safe input, and can be regarded as a real-time safety filter. The input expressivity is also enhanced by leveraging the differentiable structure of the parameter space. The proposed safety filter is experimentally validated using a wheeled ground robot in obstacle-cluttered environments. The result shows that the safety filter is capable of generating safe inputs in real-time, while addressing hundreds of constraints simultaneously.
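The resulting quadratic program is easy to illustrate in the single-constraint case, where it reduces to a closed-form projection of the desired input onto a safe half-space; this is a generic safety-filter sketch, not the paper's full time-derivative formulation:

```python
import numpy as np

def safety_filter(u_des, a, b):
    """Minimize ||u - u_des||^2 subject to a^T u <= b (one linear constraint)."""
    if a @ u_des <= b:
        return u_des                                  # already safe
    return u_des - ((a @ u_des - b) / (a @ a)) * a    # project onto boundary

u_des = np.array([1.0, 0.5])           # planner's desired input
a, b = np.array([1.0, 1.0]), 1.0       # safety half-space a^T u <= b
print(safety_filter(u_des, a, b))      # safe input closest to u_des
```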
|
|
10:30-12:00, Paper ThAT33-CC.4 | Add to My Program |
NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks |
|
Ren, Jiaping | Inceptio Technology |
Xiang, Jiahao | Tongji University, Inceptio Technology |
Gao, Hongfei | Inceptio Technology |
Zhang, Jinchuan | Inceptio Technology |
Ren, Yiming | ShanghaiTech University |
Ma, Yuexin | ShanghaiTech University |
Wu, Yi | Nanjing University of Posts and Telecommunications |
Yang, Ruigang | University of Kentucky |
Li, Wei | Inceptio |
Keywords: Integrated Planning and Control, Planning under Uncertainty, Integrated Planning and Learning
Abstract: Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks, as it economizes on costs and decreases carbon emissions. Current predictive control methods depend on an accurate model of vehicle dynamics and engine, including weight, drag coefficient, and the Brake-specific Fuel Consumption (BSFC) map of the engine. We propose a purely data-driven method, Neural Predictive Control (NPC), which does not use any physical model for the vehicle. After training with over 20,000 km of historical data, the proposed NVFormer implicitly models the relationship between vehicle dynamics, road slope, fuel consumption, and control commands using the attention mechanism. Based on primitives sampled online from the past of the current freight trip and anchor-based future data synthesis, the NVFormer can infer optimal control commands for reasonable fuel consumption. The physical-model-free NPC outperforms the baseline PCC method with 2.41% and 3.45% more fuel saving in simulation and open-road highway testing, respectively.
|
|
10:30-12:00, Paper ThAT33-CC.5 | Add to My Program |
Robustified Time-Optimal Collision-Free Motion Planning for Autonomous Mobile Robots under Disturbance Conditions |
|
Zhang, Shuhao | KU Leuven |
Bos, Mathias | KU Leuven |
Vandewal, Bastiaan | KU Leuven |
Decré, Wilm | Katholieke Universiteit Leuven |
Gillis, Joris | KU Leuven |
Swevers, Jan | KU Leuven |
Keywords: Planning under Uncertainty, Collision Avoidance, Integrated Planning and Control
Abstract: This paper presents a robustified time-optimal motion planning approach for navigating an Autonomous Mobile Robot (AMR) from an initial state to a terminal state without colliding with obstacles, even when subjected to disturbances, which are modeled as random process noise and measurement noise. The approach iteratively solves the robustified problem by incorporating updated state-dependent safety margins for collision avoidance, the evolution of which is derived separately from the robustified problem. Additionally, a strategy for selecting an alternative terminal state to reach is introduced, which comes into play when the desired terminal state becomes infeasible considering the disturbances. Both of these contributions are integrated into a robustified motion planning and control pipeline, the efficacy of which is validated through simulation experiments.
|
|
10:30-12:00, Paper ThAT33-CC.6 | Add to My Program |
Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic |
|
Bouzidi, Mohamed-Khalil | Continental, FU Berlin |
Yao, Yue | Freie Universität Berlin & Continental AG |
Reichardt, Joerg | Continental AG |
Goehring, Daniel | Freie Universität Berlin |
Keywords: Integrated Planning and Learning, Integrated Planning and Control, Constrained Motion Planning
Abstract: Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural network based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.
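The selection step of such a warmstart can be sketched as scoring multiple network proposals with the planner's objective and seeding the solver with the cheapest feasible one (toy costs and constraints; the sampling-based refinement stage is omitted, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# stand-in for the multimodal predictor: five candidate trajectories (T x 2)
proposals = [rng.normal(0, 1, size=(20, 2)) for _ in range(5)]

def cost(traj):                        # toy surrogate of the MPC objective
    return float(np.sum(traj ** 2))

def feasible(traj):                    # toy constraint check
    return bool(np.all(np.abs(traj) < 3.0))

candidates = [t for t in proposals if feasible(t)]
warmstart = min(candidates, key=cost)  # initial guess handed to the NLP solver
print(cost(warmstart))
```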
|
|
10:30-12:00, Paper ThAT33-CC.7 | Add to My Program |
Co-Learning Planning and Control Policies Constrained by Differentiable Logic Specifications |
|
Xiong, Zikang | Purdue University |
Lawson, Daniel | Purdue University |
Eappen, Joe Kurian | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Jagannathan, Suresh | Purdue University |
Keywords: Reinforcement Learning, Integrated Planning and Control, Deep Learning Methods
Abstract: Synthesizing planning and control policies in robotics is a fundamental task, further complicated by factors such as complex logic specifications and high-dimensional robot dynamics. This paper presents a novel reinforcement learning approach to solving high-dimensional robot navigation tasks with complex logic specifications by co-learning planning and control policies. Notably, this approach significantly reduces the sample complexity in training, allowing us to train high-quality policies with much fewer samples compared to existing reinforcement learning algorithms. In addition, our methodology streamlines complex specification extraction from map images and enables the efficient generation of long-horizon robot motion paths across different map layouts. Moreover, our approach also demonstrates capabilities for high-dimensional control and avoiding suboptimal policies via policy alignment. The efficacy of our approach is demonstrated through experiments involving simulated high-dimensional quadruped robot dynamics and a real-world differential drive robot (TurtleBot3) under different types of task specifications.
|
|
10:30-12:00, Paper ThAT33-CC.8 | Add to My Program |
Output-Sampled Model Predictive Path Integral Control (o-MPPI) for Increased Efficiency |
|
Yan, Leon | University of Washington |
Devasia, Santosh | University of Washington |
Keywords: Integrated Planning and Control, Optimization and Optimal Control
Abstract: The success of the model predictive path integral control (MPPI) approach depends on the appropriate selection of the input distribution used for sampling. However, it can be challenging to select inputs that satisfy output constraints in dynamic environments. The main contribution of this paper is to propose an output-sampling-based MPPI (o-MPPI), which improves the ability of samples to satisfy output constraints and thereby increases MPPI efficiency. Comparative simulations and experiments of dynamic autonomous driving of bots around a track show that the proposed o-MPPI is more efficient and requires a substantially (20 times) smaller number of rollouts and a (4 times) shorter prediction horizon when compared with the standard MPPI for similar success rates.
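For reference, the standard MPPI update that o-MPPI builds on turns rollout costs into softmax weights over sampled perturbations; o-MPPI's change is to draw samples in output space so rollouts satisfy output constraints more often. The sketch below shows the standard input-space update on toy dynamics:

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, lam = 64, 15, 1.0
u = np.zeros(T)                            # nominal control sequence

def rollout_cost(u_seq):                   # toy cost: track a velocity of 1.0
    v = np.cumsum(u_seq) * 0.1
    return np.sum((v - 1.0) ** 2)

eps = rng.normal(0.0, 0.5, size=(K, T))    # input-space samples (standard MPPI)
costs = np.array([rollout_cost(u + e) for e in eps])
w = np.exp(-(costs - costs.min()) / lam)   # information-theoretic weights
w /= w.sum()
u = u + w @ eps                            # weighted-average update
print(rollout_cost(u) <= np.median(costs)) # usually True: updated u improves
```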
|
|
10:30-12:00, Paper ThAT33-CC.9 | Add to My Program |
The Virtues of Laziness: Multi-Query Kinodynamic Motion Planning with Lazy Methods |
|
Pasricha, Anuj | University of Colorado Boulder |
Roncone, Alessandro | University of Colorado Boulder |
Keywords: Motion and Path Planning, Integrated Planning and Control, Manipulation Planning
Abstract: In this work, we introduce LazyBoE, a multi-query method for kinodynamic motion planning with forward propagation. This algorithm allows for the simultaneous exploration of a robot's state and control spaces, thereby enabling a wider suite of dynamic tasks in real-world applications. Our contributions are three-fold: i) a method for discretizing the state and control spaces to amortize planning times across multiple queries; ii) lazy approaches to collision checking and propagation of control sequences that decrease the cost of physics-based simulation; and iii) LazyBoE, a robust kinodynamic planner that leverages these two contributions to produce dynamically-feasible trajectories. The proposed framework not only reduces planning time but also increases success rate in comparison to previous approaches.
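The "lazy" ingredient can be illustrated with a generic lazy shortest-path loop (a sketch of the general technique, not the authors' exact LazyBoE procedure): search as if every edge were valid, and spend the expensive collision check only on edges of candidate paths.

    import heapq

    def lazy_shortest_path(edges, start, goal, edge_valid):
        # edges: dict node -> list of (neighbor, cost)
        # edge_valid(u, v): expensive collision/propagation check, done lazily
        # (a real implementation would also cache edges that pass the check)
        invalid = set()

        def dijkstra():
            dist, prev = {start: 0.0}, {}
            pq = [(0.0, start)]
            while pq:
                d, u = heapq.heappop(pq)
                if u == goal:                       # reconstruct candidate path
                    path = [u]
                    while u in prev:
                        u = prev[u]
                        path.append(u)
                    return path[::-1]
                if d > dist.get(u, float('inf')):
                    continue
                for v, c in edges.get(u, []):
                    if (u, v) in invalid:
                        continue
                    if d + c < dist.get(v, float('inf')):
                        dist[v], prev[v] = d + c, u
                        heapq.heappush(pq, (d + c, v))
            return None

        while True:
            path = dijkstra()
            if path is None:
                return None                         # nothing survives validation
            bad = next(((u, v) for u, v in zip(path, path[1:])
                        if not edge_valid(u, v)), None)
            if bad is None:
                return path                         # candidate fully validated
            invalid.add(bad)                        # invalidate and replan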
|
|
ThAL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster VII |
|
|
|
10:30-12:00, Paper ThAL-EX.1 | Add to My Program |
Development of Control Systems for Autonomous Mobile Robots in Holonic Manufacturing Systems |
|
Yang, Dayeon | Korea Institute of Industrial Technology |
Ju, Chanyoung | Korea Institute of Industrial Technology |
Keywords: Industrial Robots, Autonomous Agents, SLAM
Abstract: An innovative manufacturing system called the holonic manufacturing system was created to increase production process efficiency. To automate manufacturing, autonomous mobile robots must be included in the holonic manufacturing system. Thus, this study presents control methods for an autonomous mobile manipulator that can operate independently in machining operations such as welding and drilling. The developed robot consists of a manipulator for machining tasks and a mobile robot that can navigate the manufacturing facility. An enhanced bacteria foraging optimization algorithm-based controller is proposed to optimize performance for a class of nonlinear process models. This work outlines the robot's mechanical design, analyzes it, and suggests future research.
|
|
10:30-12:00, Paper ThAL-EX.2 | Add to My Program |
Development of a Printable Joint Structure Using Multi-Material 4D Printing |
|
Park, Jong Hoo | Seoul National University |
Lee, Haemin | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Methods and Tools for Robot System Design, Assembly, Compliant Joints and Mechanisms
Abstract: 3D printing, or additive manufacturing, enables the creation of complex structures that conventional manufacturing technology cannot replicate. Such advantages have prompted researchers to create custom articulated mechanisms without manual assembly using various 3D printing techniques, an approach that can be called Single-step 3D Printing. However, in 3D printing, joint clearance between bodies must be considered; otherwise, undesired fusion between bodies may occur. Current research therefore focuses on minimizing joint clearance through geometric design; even so, the robustness of 3D-printable articulated mechanisms remains far inferior to that of mechanisms made with machining and manual assembly. In this paper, 4D printing technology is utilized to enhance the robustness of the motion of printable joint structures. The joint structure designed by the proposed approach is programmed to minimize its joint clearances, as 4D printing implies shape change after 3D printing under a stimulus. Multi-material 3D printing is also incorporated to achieve appropriate material properties and to realize selective shape change for components of the joint. As a result, the joint performance is compared with that of conventional 3D-printable joints. The proposed approach can generate robust joint motions by realizing a joint clearance that is unachievable with conventional 3D printing technologies and is expected to be applied in various robotics applications in the future.
|
|
10:30-12:00, Paper ThAL-EX.3 | Add to My Program |
A Machine Vision for the Automation Process of Slaughtering Ducks |
|
Ko, KwangEun | Korea Institute of Industrial Technology |
Yoon, Chanyoung | Korea Institute of Industrial Technology |
Yang, Gi-Hun | KITECH |
Kang, Jaehyeon | Korea Institute of Industrial Technology |
Han, Sang Kuy | Korea Institute of Industrial Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Visual Learning
Abstract: The duck slaughter process consists of a total of 16 stages from arrival to shipping. Most of the process has been automated. However, the bloodletting stage still requires repetitive and dangerous labor from workers who are constantly exposed to physical and psychological hazards, including blood and filth. To automate the bloodletting process for duck slaughtering, artificial intelligence (A.I.) and robot technologies have attracted attention. In this study, we propose a machine vision based bloodletting area recognition technology to automate the bloodletting process during duck slaughter. A Mask-RCNN, the most general deep learning model for background segmentation, is utilized for object-background segmentation. Following the background segmentation, an integrated pipeline was implemented to find the bloodletting area based on the cervical vertebrae estimated from the neck area of each duck. For evaluating the proposed method, a training dataset of 1,789 RGB images was collected from the duck slaughter process, and the deep learning model for duck neck-background segmentation was trained. The binary mask for the duck neck region estimated by the trained model is applied to the input images. For the automatic detection of the neck area, i.e., the target bloodletting area, the testing results show high accuracy. Two cases of non-faint ducks, i.e., ducks with a bent neck or excessive body movement, were also successfully detected.
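For context, segmentation inference with a stock Mask R-CNN looks roughly like the following (torchvision's pretrained model as a stand-in; the authors' network is trained on their own duck dataset, and the bloodletting-area localization built on top of the mask is not shown):

    import torch
    import torchvision

    def segment_foreground(image_chw, score_thresh=0.7):
        # image_chw: float tensor (3, H, W) scaled to [0, 1]
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights='DEFAULT')
        model.eval()
        with torch.no_grad():
            out = model([image_chw])[0]            # dict with boxes/scores/masks
        keep = out['scores'] > score_thresh
        return out['masks'][keep] > 0.5            # (N, 1, H, W) boolean masks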
|
|
10:30-12:00, Paper ThAL-EX.4 | Add to My Program |
Vision-Based UAV Geo-Localization Using Satellite Images in GNSS-Denied Environments |
|
Choi, Euncheol | Inha University |
Jung, Sungwook | KETI (Korea Electronics Technology Institute) |
Cho, Younggun | Inha University |
Keywords: Aerial Systems: Perception and Autonomy, Field Robots, Localization
Abstract: Robust and accurate localization is an essential technology for the autonomous flight of unmanned aerial vehicles (UAVs). Currently, many UAV localization methodologies rely on GNSS signals, which can be easily blocked in urban environments and can be neutralized by jamming and spoofing. Therefore, geo-localization through matching UAV images and satellite images has been actively studied recently, but difficulties remain due to the large appearance gap between the two types of images. In this paper, we present a geo-localization method that matches UAV images to satellite images using a foundation vision model in a GNSS-denied environment and estimates the global position of the UAV through integration with a Kalman filter. To validate the effectiveness of the presented geo-localization methodology, experiments were conducted on real aerial datasets.
|
|
10:30-12:00, Paper ThAL-EX.5 | Add to My Program |
Visual Affordance Model for Apple Harvesting Based on Hybrid Egocentric Dataset Collection |
|
Kim, Geonkuk | Korea University |
Park, Juyoun | Korea Institute of Science and Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, AI-Enabled Robotics
Abstract: This poster introduces a pioneering approach applying a visual affordance model to apple harvesting tasks, estimating the contact points and post-contact trajectories necessary for robotic manipulation. Affordance, in this context, refers to the potential actions that a robot can perform with a given object. By leveraging this concept of affordance embedded within an action induction model, we propose a robot that automatically generates motions to harvest apples. Video datasets of apple harvesting are collected from both the real world and simulation, and they are used to train the visual affordance model. The trained visual affordance model is qualitatively evaluated through visualizations of contact point heatmaps and generated trajectories for a given apple image. Additionally, for further quantitative evaluation, we apply the proposed model to various tasks in kitchen environments. We refine the algorithm into a task-conditioned affordance model, which assigns multiple tasks to a single object based on the verb. Evaluations are conducted against a baseline in terms of metrics such as ADE, FDE, SIM, AUC-J, and NSS. The evaluation results confirm that the task-conditioned affordance model outperforms the baseline across all five metrics. Future plans include utilizing the Isaac Sim and O3DE simulators for a range of tasks to validate and verify the practicality of the proposed apple harvesting affordance model, with further tests to be conducted in the real world.
|
|
10:30-12:00, Paper ThAL-EX.6 | Add to My Program |
Trajectory Planning Based on Time-Space Network with Dijkstra Algorithm for Security Screening Using THz Sensor-Equipped UGV |
|
Uchida, Yuki | National Defense Academy of Japan |
Tsujita, Teppei | National Defense Academy of Japan |
Sakuma, Yutaka | National Defense Academy of Japan |
Abiko, Satoko | Shibaura Institute of Technology |
Sato, Daisuke | Tokyo City University |
Keywords: Surveillance Robotic Systems, Motion and Path Planning
Abstract: This poster presents a trajectory planning algorithm for security screening by a robot equipped with a terahertz sensor. The algorithm represents the robot and pedestrians in a two-dimensional grid and predicts the number of pedestrians that can be inspected at a given time and location. It then formulates the inspection problem as a shortest-path problem on a time-space network, in which trajectories that allow more pedestrians to be inspected incur lower cost. Dijkstra's algorithm is used to find the shortest path.
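A sketch of the time-space-network formulation as we read it (Python; the edge cost below, c_max minus the predicted inspection count, is an illustrative choice that keeps costs nonnegative for Dijkstra, and the exact cost definition is the authors'):

    import heapq

    def plan_inspection_cost(moves, inspectable, T, start, c_max):
        # Nodes are (cell, t); moves(cell) yields neighbor cells (incl. waiting);
        # inspectable(cell, t) is the predicted number of inspectable pedestrians.
        dist = {(start, 0): 0.0}
        pq = [(0.0, start, 0)]
        best = None
        while pq:
            d, cell, t = heapq.heappop(pq)
            if t == T:                              # reached the planning horizon
                best = d if best is None else min(best, d)
                continue
            for nxt in moves(cell):
                c = c_max - inspectable(nxt, t + 1) # more inspections -> lower cost
                key = (nxt, t + 1)
                if d + c < dist.get(key, float('inf')):
                    dist[key] = d + c
                    heapq.heappush(pq, (d + c, nxt, t + 1))
        return best    # path reconstruction via predecessors is omitted here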
|
|
10:30-12:00, Paper ThAL-EX.7 | Add to My Program |
Walking in Constrained Environment Using Model-Based Reinforcement Learning for Virtual Constraint-Based Gait |
|
Jin, Takanori | National Institute of Informatics/SOKENDAI |
Kobayashi, Taisuke | National Institute of Informatics |
Matsubara, Takamitsu | Nara Institute of Science and Technology |
Keywords: Humanoid and Bipedal Locomotion, Legged Robots, Machine Learning for Robot Control
Abstract: Virtual constraint-based gait control can produce a variety of gait motions depending on the virtual constraints provided. However, this method suffers from footstep constraints because the center of mass (CoM) motion cannot be predicted analytically. The method therefore requires a numerical solution of the forward problem, which is vulnerable to errors. To solve this issue, we propose a footstep planning method using model-based reinforcement learning for virtual constraint-based walking. In the proposed method, model predictive control (MPC) evaluates the stability of each step while adjusting the footstep and manipulating the conserved quantities. To simplify the optimal control problem, passive dynamic autonomous control (PDAC), which compresses the CoM motion to the lowest dimension, is employed for walking control. The entire transition model used to predict the future is decomposed into three segments to improve learning speed by utilizing knowledge of the gait phases. The three decomposed models and a stability cost, which evaluates footstep stability, are trained with ensemble learning to reduce modeling error and enable efficient exploration. Simulation results showed that the proposed method achieved a goal achievement rate nearly twice as high as the simplified baseline. Furthermore, the proposed method successfully maintains more than 70% of the constraints even in constrained environments.
|
|
10:30-12:00, Paper ThAL-EX.8 | Add to My Program |
Action Recognition Based Variable Scaling Teleoperation System for Robotic Spine Surgery |
|
Lee, Hunjo | Korea University of Science and Technology, Korea Institute of I |
Kim, Donghyun | University of Science and Technology |
Yang, Gi-Hun | KITECH |
Keywords: Telerobotics and Teleoperation, Surgical Robotics: Laparoscopy
Abstract: This paper introduces a variable scaling teleoperation framework designed to enhance the intuitiveness of robotic spine surgery. Challenges such as a limited field of view, asymmetry between master devices and surgical robots, and the absence of haptic feedback complicate teleoperated surgery, making it less intuitive for surgeons. To address these issues, we propose a variable scaling framework capable of real-time adjustments to the motion scale or stiffness scale of a slave robot based on the surgeon's current actions. This framework uses grip force as an indicator of the operator's intended motion, adjusting scale factors accordingly. The relationships between grip force and scale factors are grounded in innate human skill, making the system intuitive to use. The scaling modes are switched based on outputs from an action recognition system, which infers surgeons' actions through surgical instruments. For this method, we employ an object detection algorithm to identify the current action during surgery. Our proposed framework thus facilitates the intuitive execution of robotic surgeries. Future studies will focus on implementing this system in actual surgical environments and verifying its effectiveness.
|
|
10:30-12:00, Paper ThAL-EX.9 | Add to My Program |
Legged Robot State Estimation within Non-Inertial Environments |
|
Zijian, He | Purdue University |
Teng, Sangli | University of Michigan, Ann Arbor |
Lin, Tzu-Yuan | University of Michigan |
Ghaffari, Maani | University of Michigan |
Gu, Yan | Purdue University |
Keywords: Sensor Fusion, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: This work investigates the robot state estimation problem within a non-inertial environment. The proposed state estimation approach relaxes the common assumption of static ground in the system modeling. The process and measurement models explicitly treat the movement of the non-inertial environment without requiring knowledge of its motion in the inertial frame or relying on GPS or sensing environmental landmarks. Further, the proposed state estimator is formulated as an invariant extended Kalman filter (InEKF) with the deterministic part of its process model obeying the group-affine property, leading to log-linear error dynamics. The observability analysis confirms that the robot's pose (i.e., position and orientation) and velocity relative to the non-inertial environment are observable under the proposed InEKF.
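For reference, the group-affine property invoked above is standard in the InEKF literature: the deterministic dynamics f must satisfy, for all group elements X_1, X_2,

    f(X_1 X_2) = f(X_1)\,X_2 + X_1\,f(X_2) - X_1\,f(\mathrm{Id})\,X_2,

which is precisely the condition under which the logarithmic (invariant) error \xi_t evolves as the trajectory-independent linear system \dot{\xi}_t = A_t\,\xi_t.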
|
|
10:30-12:00, Paper ThAL-EX.10 | Add to My Program |
Continual Skill Learning with Vision-Language Model for Robotic Manipulation |
|
Tan, Runjia | Nanyang Technological University |
Lou, Shanhe | Nanyang Technological University |
Huang, Wenhui | NanYang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Manipulation Planning, AI-Enabled Robotics, Intelligent and Flexible Manufacturing
Abstract: The advent of Large Language Models (LLMs) has significantly enhanced the capability of robots to perform tasks based on human instructions. However, a persistent challenge remains: enabling robots to autonomously learn and improve from their past experiences. Addressing this, our paper introduces a systematic approach that assists robots in acquiring new skills through their interaction history. At the heart of our methodology is the development of a meta-task framework, which conceptualizes all tasks as sequences of meta-tasks within a hierarchical skill library that categorizes tasks based on their complexity. To ensure the applicability of acquired skills across different settings, we incorporate a scene understanding module that maintains skill consistency across diverse environments. Moreover, our system is designed to allow human operators to effortlessly invoke these newly acquired skills through direct instructions. We have conducted extensive testing of our system in both simulated and real-world environments to validate its effectiveness and versatility.
|
|
10:30-12:00, Paper ThAL-EX.11 | Add to My Program |
Characterizing Robot Vision Solutions for Anomaly Detection in Confined Spaces |
|
Patil, Apoorva | University of Washington |
Lee, Ryan | University of Washington |
Balasubramaniyam, Shankruth | Indian Institute of Technology Madras |
Zhang, Tommy | University of Washington |
Wong, Benjamin | University of Washington |
Banerjee, Ashis | University of Washington |
Keywords: Robotics in Hazardous Fields, RGB-D Perception
Abstract: We present a framework to experimentally characterize robot vision systems for anomaly detection during inspection of confined spaces, containing key structural and functional elements, such as pipes, cables, columns, and I-beams. The anomalies typically comprise rust patches and corroded sections, and foreign object debris of different kinds such as industrial tools. We first collect our own dataset in a realistic confined space resembling a large ballast tank, using two commodity RGB-D cameras with a large collection of FODs and rust patches of different shapes and sizes. We then employ fine-tuned object detection models to classify the anomalies in the RGB-D images, and the classification performance is used to characterize the effectiveness of the two depth cameras. The results indicate that one of the cameras tends to capture higher-quality depth images for anomaly detection purposes.
|
|
10:30-12:00, Paper ThAL-EX.12 | Add to My Program |
Integration of Active Object Search and Object Manipulation in Constrained Robot Operational Spaces through Real-Time Efficient |
|
Sakamaki, Arata | Tamagawa University |
Contreras-Toledo, Luis Angel | Tamagawa University |
Inamura, Tetsunari | Tamagawa University |
Okada, Hiroyuki | Tamagawa University |
Keywords: Domestic Robotics, Service Robotics, Mobile Manipulation
Abstract: In recent years, various methods and algorithms have been researched and proposed, leading to remarkable advancements in the performance of autonomous domestic service robots. One example task is manipulating objects within cluttered, occluded shelves, where the robot's workspace is constrained, in order to discover specific items. Despite the numerous methods proposed for such challenging problems, many studies have focused on evaluating the number of actions and success rates, with few addressing real-time efficiency. Therefore, in this study, we focus on real-time efficiency under conditions with a constrained feasible workspace, such as shelves in domestic settings. Initially, operating on the assumption that the robot's actions succeed, we employ a combination of simple methods based on human empirical knowledge to operate the robot reliably. This approach aims to clarify interactions between the robot and objects, highlighting where the robot fails at tasks and incurs time losses, aspects that cannot be measured by success rates alone.
|
|
10:30-12:00, Paper ThAL-EX.13 | Add to My Program |
Cardiac Assistive Device Based on Electroactive Polymer |
|
Kim, Jiyeop | Seoul National University |
Lee, Junheon | Seoul National University |
Song, Sein | Seoul National University |
Kang, Si-Hyuck | Seoul National University Bundang Hospital |
Han, Amy Kyungwon | Seoul National University |
Keywords: Medical Robots and Systems, Soft Robot Applications, Soft Sensors and Actuators
Abstract: Over 6 million heart failure with reduced ejection fraction (HFrEF) patients face a 70% 5-year mortality rate. These patients rely on a continuous flow pump, a left ventricular assist device (LVAD), while waiting for a heart donor. Research on direct ventricular compression assist devices utilizing soft robotics, such as pneumatic and shape memory alloy systems, has been conducted to overcome the limitations of LVADs. Still, these devices face challenges with driveline removal, high power consumption, and portability. We present a ventricular assistive device based on electroactive polymers (EAPs) that mechanically assists ventricular contraction with low power consumption and the potential to be wireless. The device utilizes compact EAPs with negative-bias springs (NBS) to effectively assist ventricular contraction by mimicking the systolic pressure-volume curve. The EAP-NBS units are arranged on a flexible sleeve that wraps the ventricle and can be adjusted to be patient-specific. The device was tested on an HFrEF-model silicone phantom and demonstrated an 18.8% increase in cardiac output and a 1.19x higher ejection fraction with <0.3 W of power consumption compared to the scenario without the device. To the best of our knowledge, this is the first EAP-based direct ventricular compression assistive device. Unlike LVADs, our device is blood-contact free, minimizing the risk of thrombosis, and does not require a large battery or a driveline, significantly increasing quality of life.
|
|
10:30-12:00, Paper ThAL-EX.14 | Add to My Program |
Human-In-The-Loop Physics Simulation of a Mobile Robotic Balance Assistant |
|
Chan, Sherwin Stephen | Nanyang Technological University |
Wang, Yifan | Nanyang Technological University |
Lei, Mingyuan | Nanyang Technological University |
Johan, Henry | Nanyang Technological University |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Human Factors and Human-in-the-Loop, Simulation and Animation
Abstract: As the world ages, rehabilitation and assistive devices will be pivotal in meeting mobility-related challenges, enhancing rehabilitation outcomes, and alleviating demands on caregivers. The Mobile Robotic Balance Assistant (MRBA) is a balance assistive robot used during gait training and activities of daily living. As with all other healthcare robots, extensive design iterations and clinical trials were needed to refine the robot system design, resulting in a slow and expensive development process. Our work explores the use of Human-in-the-Loop physics simulation with accurate Human-Robot Interaction to allow for a more efficient and cost-effective way to optimise the robot system design. Our simulation uses a personalized digital human model with various trained walking control policies as a digital guinea pig, and our digital robot model replicates the kinematic, dynamic and control properties of its real counterpart. We model the interaction between the human and robot as a six-degree-of-freedom constrained mass-spring-damper model to generate the interaction forces. Our results show that the digital human's reaction when using the robot is similar to experimental observations, which highlights the potential of using simulation as a tool to customise the robot to the user and enhance its effectiveness for each user.
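The coupling model admits a very short sketch (the gain matrices here are illustrative placeholders, not the authors' identified parameters):

    import numpy as np

    def interaction_wrench(pose_err, vel_err, K, C):
        # pose_err, vel_err: 6-vectors (translational + rotational error and rates)
        # K, C: 6x6 stiffness and damping matrices of the constrained
        # mass-spring-damper coupling between digital human and robot
        return K @ pose_err + C @ vel_err   # 6-DoF wrench: force + torque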
|
|
10:30-12:00, Paper ThAL-EX.15 | Add to My Program |
DART: Directed Action Realtime Tracking |
|
Lin, Huai-Ti | Imperial College London |
Zhou, Rui | Imperial College London |
Hadfield, Charles | Imperial College London |
Keywords: Mechanism Design, Visual Tracking, Biomimetics
Abstract: Visual tracking is an important, ubiquitous and challenging task for both animals and robots. Precise localization of moving objects in the world using vision allows safe path planning, efficient guidance, and better situation awareness. However, visually detecting and tracking a fast-moving target with erratic trajectories is computationally expensive and often requires a model of the world. Here, we take inspiration from our study of dragonfly vision and visual tracking behavior to develop an energy-efficient, model-free, high-speed visual tracking system. The system consists of two parts: 1) a pan-tilt mirror system for steering the line-of-sight of a capture camera, and 2) a bioinspired visual tracking algorithm for commanding the system in the real world to follow a moving target. The system was first developed to investigate visual guidance and flight control in insects by tracking freely flying insects such as the dragonfly in the lab. The initial system performance evaluation through motion capture control showed robust results and demonstrated promise for bioinspired visual tracking control in the outdoor environment. Our late-breaking results demonstrate how the dragonfly-inspired visual tracking strategies can be implemented to track fast-flying dragonflies in their habitat. The system can be further adapted for many other machine vision applications for detecting and tracking small fast-moving objects with minimal computation overhead.
|
|
10:30-12:00, Paper ThAL-EX.16 | Add to My Program |
Approximate Multiagent Reinforcement Learning for Large-Scale On-Demand Urban Mobility |
|
Bhattacharya, Sushmita | Harvard University |
Garces, Daniel | Harvard University |
Bertsekas, Dimitri | MIT |
Gil, Stephanie | Harvard University |
Keywords: Intelligent Transportation Systems, Multi-Robot Systems, Reinforcement Learning
Abstract: We focus on the autonomous multiagent taxi routing problem for a large urban environment with realistic travel time constraints, where the location and number of future ride requests are unknown a priori but can be estimated by an empirical distribution. Motivated by recent theory showing that a rollout algorithm with a stable base policy produces a near-optimal stable policy, our previous work proposed a hierarchical RL approach that scales sub-linearly with the number of agents. In this paper, we provide a provably stable hierarchical two-phase rollout-based RL approach, which keeps the number of outstanding requests uniformly bounded over time and improves the run time compared to our previous work. Our preliminary results show that the new two-phase approach outperforms our previous work for unit-travel-time cases. The new two-phase policy also outperforms a stable base policy for a fleet size five times larger than the one considered in our previous work, with realistic travel time constraints.
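The rollout principle the paper builds on can be sketched as a one-step lookahead over a Monte Carlo estimate of the base policy's cost-to-go (a generic sketch under simplifying assumptions; the paper's provably stable two-phase hierarchy is substantially richer):

    def rollout_action(state, actions, simulate, base_policy, horizon, n_samples=32):
        # simulate(s, a) -> (next_state, stage_cost), possibly stochastic

        def base_cost_to_go(s):
            total = 0.0
            for _ in range(horizon):               # follow the base policy
                s, c = simulate(s, base_policy(s))
                total += c
            return total

        def q_value(a):
            total = 0.0
            for _ in range(n_samples):             # Monte Carlo over request arrivals
                s1, c = simulate(state, a)
                total += c + base_cost_to_go(s1)
            return total / n_samples

        return min(actions, key=q_value)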
|
|
10:30-12:00, Paper ThAL-EX.17 | Add to My Program |
Pedestrian Trajectory Prediction with Pose Estimation and Monte Carlo Dropout |
|
Tadano, Shunya | Tohoku University |
Tamura, Yusuke | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Keywords: RGB-D Perception
Abstract: Pedestrian movement prediction is essential for autonomous mobile robots (AMRs) and social robots that share the same environment with humans. A key challenge is the accurate prediction of pedestrian trajectories, due to uncertainties from the environment, sensor factors, and unpredictable pedestrian behaviors. In this study, we propose a dual approach to enhance prediction accuracy by managing these uncertainties. First, we employ Monte Carlo Dropout in our predictive model to produce multiple prediction outputs. This method accounts for uncertainty in pedestrian movements, addressing the inherent unpredictability of human behavior. Second, we incorporate pedestrian pose information, using a posture estimation model with data from a first-person RGB-D camera. This integration provides critical cues for prediction, improving the model's performance on the Average Displacement Error (ADE) and Final Displacement Error (FDE) metrics. In future work, we will validate our model in more complex scenarios and real-world applications.
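A minimal sketch of the Monte Carlo Dropout step (a generic PyTorch pattern, not the authors' network): keep dropout stochastic at inference time and aggregate repeated forward passes.

    import torch

    def mc_dropout_predict(model, obs, n_samples=30):
        # Enable stochastic dropout at inference. (In practice one would set only
        # the nn.Dropout modules to train mode so batch-norm statistics stay frozen.)
        model.train()
        with torch.no_grad():
            preds = torch.stack([model(obs) for _ in range(n_samples)])
        # Mean trajectory plus a per-step spread usable as an uncertainty estimate
        return preds.mean(dim=0), preds.std(dim=0)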
|
|
10:30-12:00, Paper ThAL-EX.18 | Add to My Program |
An Enhanced Shopping Cart Guidance System for Visually Impaired Users with Passive Control Interface |
|
Shi, Zhan | TOHOKU University |
Tamura, Yusuke | Tohoku University |
Liao, Zhenyu | Tohoku University |
He, Weizan | Tohoku University |
Hirata, Yasuhisa | Tohoku University |
Keywords: Human-Centered Robotics
Abstract: To address the challenges faced by visually impaired individuals in supermarkets, this paper introduces a shopping cart-type robot equipped with a camera and environmental markers for intuitive navigation. It employs a passive control strategy for safety and provides verbal and haptic feedback to guide users to their destinations. In ten validation experiments, the system effectively led blindfolded users to desired locations, demonstrating its potential as an assistive tool for visually impaired shoppers.
|
|
10:30-12:00, Paper ThAL-EX.19 | Add to My Program |
Robust Side Following Robotic Wheelchair by Using Homotopy Class of Human Intention |
|
Tan, Kuan Yuee | Nanyang Technological University |
Garg, Neha Priyadarshini | NUS |
Ramanathan, Manoj | Nanyang Technological University |
Ang, Wei Tech | Nanyang Technological University |
Keywords: Human-Aware Motion Planning, Intention Recognition, Human-Centered Automation
Abstract: A side-by-side following robot can alleviate the need for wheelchair pushing, thereby reducing the burden on caregivers and porters. Existing works cannot be easily adapted to different environments, as they either need prior knowledge of the person's path or may take a different path than the human when moving around obstacles. In this work, we propose a side-by-side following system that ensures the wheelchair takes the same path as the human around obstacles, while avoiding them, by leveraging the homotopy class of the human's intended path. Our system can also be easily deployed in various real-world environments, as it only needs prior knowledge of the human's final goal. Through experiments in simulation, we show that our method performs significantly better than a baseline approach that does not leverage the homotopy class of the human's intended path. For further validation, we are in the process of testing our method with real human subjects using an actual wheelchair.
|
|
10:30-12:00, Paper ThAL-EX.20 | Add to My Program |
Avoidance Behavior Selection of Autonomous Mobile Robot Based on the Feasibility of Model Predictive Control |
|
Kada, Aiki | Nagoya University |
Suzuki, Kosuke | Nagoya University |
Honda, Kohei | Nagoya University |
Okuda, Hiroyuki | Nagoya University |
Suzuki, Tatsuya | Nagoya University |
Keywords: Human-Aware Motion Planning, Human-Robot Collaboration, Intention Recognition
Abstract: This study focuses on considerate behavior between AMRs and humans in shared spaces. In such shared environments, adjusting the balance between efficient navigation and yielding to humans is crucial. To switch the behavior appropriately, a framework based on the feasibility of Model Predictive Control (MPC) is proposed. In this framework, variation in behavior is realized by changing parameters in the MPC. To avoid redundancy between the behavior planner and the controller, the feasibility of the MPC itself is used to switch the behavior, and its usefulness was confirmed.
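The switching rule reduces to a few lines (solver interfaces are hypothetical placeholders):

    def select_behavior(mpc, x, params_navigate, params_yield):
        # Try the efficient-navigation parameterization first; the solver's own
        # feasibility flag is the switching signal, so no separate behavior
        # planner is needed.
        sol = mpc.solve(x, params_navigate)
        if sol.feasible:
            return sol, 'navigate'
        # Infeasible -> relax to the yielding parameterization
        return mpc.solve(x, params_yield), 'yield'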
|
|
10:30-12:00, Paper ThAL-EX.21 | Add to My Program |
Two-Dimensional Airflow Analysis of Enclosed Space with an Opening Based on PINNs for Real-Time Control of UAVs |
|
Abiko, Satoko | Shibaura Institute of Technology |
Seki, Misato | Shibaura Institute of Technology |
Tsujita, Teppei | National Defense Academy of Japan |
Sato, Daisuke | Tokyo City University |
Keywords: Aerial Systems: Perception and Autonomy, Aerial Systems: Applications, Aerial Systems: Mechanics and Control
Abstract: Recently, UAV delivery in urban areas has attracted significant attention. The authors focus on the concept of delivering packages to verandas. However, navigating drones in urban environments is considerably more complex than in open spaces due to the intricate airflow patterns between buildings and other structures. Consequently, accurate airflow prediction becomes essential for ensuring safe UAV operations. Typically, fluid dynamics is governed by the Navier-Stokes equation, a complex partial differential equation. Solving this precisely requires computational methods, as direct solutions are challenging to obtain. Computational Fluid Dynamics (CFD), using methods like the Finite Difference Method (FDM), divides the space into grids to approximate equations. Smaller grids improve accuracy but increase computation time, making real-time prediction impractical for UAV use due to this accuracy-computation time trade-off. This poster describes the application of Physics-Informed Neural Networks (PINNs) to predict airflow in real-time. PINNs are specialized neural networks trained to adhere to the underlying physical laws of fluid dynamics. The poster presents a study on two-dimensional airflow analysis within an enclosed space with an opening using PINNs. It compares the airflow predictions obtained with PINNs against those calculated using the FDM. The comparison focuses on the performance of real-time airflow analysis and the accuracy of these methods.
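The physics-residual loss at the heart of such a PINN can be sketched as follows (generic 2D incompressible Navier-Stokes form with an assumed kinematic viscosity nu; boundary-condition and data losses are omitted):

    import torch

    def ns_residual_loss(net, xyt, nu=0.01):
        # net maps (x, y, t) -> (u, v, p); xyt is an (N, 3) batch of
        # collocation points inside the enclosed space
        xyt = xyt.clone().requires_grad_(True)
        u, v, p = net(xyt).unbind(dim=1)

        def grad(f):
            g = torch.autograd.grad(f.sum(), xyt, create_graph=True)[0]
            return g[:, 0], g[:, 1], g[:, 2]        # d/dx, d/dy, d/dt

        u_x, u_y, u_t = grad(u)
        v_x, v_y, v_t = grad(v)
        p_x, p_y, _ = grad(p)
        u_xx, _, _ = grad(u_x)
        _, u_yy, _ = grad(u_y)
        v_xx, _, _ = grad(v_x)
        _, v_yy, _ = grad(v_y)

        r_u = u_t + u * u_x + v * u_y + p_x - nu * (u_xx + u_yy)  # x-momentum
        r_v = v_t + u * v_x + v * v_y + p_y - nu * (v_xx + v_yy)  # y-momentum
        r_c = u_x + v_y                                           # continuity
        return (r_u**2 + r_v**2 + r_c**2).mean()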
|
|
10:30-12:00, Paper ThAL-EX.22 | Add to My Program |
Pass through the Bottleneck: MEMS-Mirror LiDAR SLAM on Degraded Tunnels with Dynamic Noise |
|
Ruan, Jianyuan | Hong Kong Polytechnic University |
Keywords: SLAM, Range Sensing, Field Robots
Abstract: MEMS-mirror LiDAR has been deployed on commercial automotive vehicles as a low-cost, reliable, and compact choice. The MEMS-mirror technique enables dense range data and vibration resistance compared with conventional spinning LiDARs. This work explores the advantages of this new type of sensor in SLAM (Simultaneous Localization and Mapping). The system can achieve precise, fast, and robust localization and mapping and overcome one of the bottlenecks of the SLAM system: degenerated scenarios with dynamic noise. First, we use an image-based method leveraging the high-resolution range data for segmentation and feature extraction. This dramatically enhances the utilization of information from the current frame. Then, to address dynamic and degraded scenes, we propose a method to remove dynamic objects during scan registration optimization by utilizing object size information. We improve the signal-to-noise ratio in extreme environments by integrating the earlier techniques. Extensive experiments demonstrate that our method performs robustly in city tunnels with heavy traffic, while most state-of-the-art methods fail.
|
|
10:30-12:00, Paper ThAL-EX.23 | Add to My Program |
Optical-Based Slip Sensing for Reliable Tissue Handling in Robotic Minimally Invasive Surgery |
|
Lee, Minjae | Seoul National University |
Han, Amy Kyungwon | Seoul National University |
Keywords: Surgical Robotics: Laparoscopy, Medical Robots and Systems, Grasping
Abstract: In robotic minimally invasive surgery, grasping and manipulating biological tissue are the most frequent tasks, affecting the accuracy, safety, and duration of surgery. Grasping delicate, deformable, and moist tissue presents a challenge: excessive force leads to tissue damage and bleeding, while insufficient force causes tissue slippage, resulting in critical accidents and delays. The optimal gripping force is precisely enough to prevent slippage. Therefore, detecting slippage and adjusting gripping force accordingly is crucial for enhancing the safety, accuracy, and duration of surgery. Several groups have developed slip-sensing surgical graspers based on temperature [1], coil inductance [2], and micro-vibration [3]. However, their ability to detect slip speed or displacement is limited with respect to directions, response time, and minimum displacement threshold. Here, we present a surgical grasper equipped with optical slip-sensing technology that precisely measures slippage across a wide range of slip speeds (0.33 to 60 mm/s) in a 2D plane. The sensor accurately estimates slip distance under various gripping forces with a mean error of <0.32 mm. By incorporating two sensors on a surgical grasper, we identified various slippage cases, including translational, rotational, and combined slippage, as well as the stretch amount and direction of the tissue during pulling. The detection of tissue slippage and deformation will enhance tissue handling in robotic surgery.
|
|
10:30-12:00, Paper ThAL-EX.24 | Add to My Program |
Zero-Shot Safety Prediction for Autonomous Robots with Foundation World Models |
|
Mao, Zhenjiang | University of Florida |
Dai, Siqi | University of Florida |
Geng, Yuang | University of Florida |
Ruchkin, Ivan | University of Florida |
Keywords: Robot Safety, AI-Based Methods, Representation Learning
Abstract: A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems. However, the existing world models rely solely on statistical learning of how observations change in response to actions, lacking precise quantification of how accurate the surrogate dynamics are, which poses a significant challenge in safety-critical systems. To address this challenge, we propose foundation world models that embed observations into meaningful and interpretable latent representations. This enables the surrogate dynamics to directly predict interpretable future states by leveraging a training-free large language model. In two common benchmarks, this novel model outperforms standard world models in the safety prediction task and has a performance comparable to supervised learning despite not using any data. We evaluate its performance with a more specialized and system-relevant metric by comparing estimated states instead of aggregating observation-wide error.
|
|
10:30-12:00, Paper ThAL-EX.25 | Add to My Program |
Cell Segmentation of the Carnivorous Plant and Hydrogel-Based Actuator Inspired by the Sundew |
|
Zeng, Xiangli | Osaka University |
Wang, Yingzhe | Osaka University |
Morishima, Keisuke | Osaka University |
Keywords: Biologically-Inspired Robots, Biomimetics, Micro/Nano Robots
Abstract: Plants can respond to stimuli (light, humidity, sound, touch, etc.), and they have thus inspired functional materials and soft robot design. Plant-inspired robots typically feature simple structures, simple control strategies, and pre-determined deformation. The sundew is a carnivorous plant that catches insects with its sticky tentacles and the curling of its leaf. Here, to explore the motion mechanism of this plant, dissections of the leaf were obtained and cell segmentation was conducted to measure cell area and cell distribution. To quantitatively analyze the morphology, the data were imported into MATLAB and processed. Based on the statistical results, an actuating principle was proposed and tested with a simulation model. In addition, a physical model was fabricated from hydrogel, which could be composed of several units to amplify deformation. In conclusion, an actuation method in which the structure is actuated according to cell distribution was proposed and examined with both a simulation model and a physical model. The mechanism identified in this study could inspire soft robot design.
|
|
10:30-12:00, Paper ThAL-EX.26 | Add to My Program |
Late Breaking Results on End-To-End Generation of Factorized Scene Graphs |
|
Millan Romera, Jose Andres | University of Luxembourg |
Bavle, Hriday | University of Luxembourg |
Shaheer, Muhammad | University of Luxembourg |
Oswald, Martin R. | ETH Zurich |
Voos, Holger | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Keywords: Cognitive Modeling, SLAM, Deep Learning Methods
Abstract: Scene Graphs (SGs) model the geometric-semantic information of the robot's environment, enabling it for any downstream task. However, SG generation has been limited to classifying the edge type between observable objects or to ad-hoc algorithms that generate one specific type of semantic entity [1]. We overcome this with a GNN-based diffusion model which generates the SG independently of node type (G-GNN). At each denoising step, one node, its edges, and its node/edge features are generated based on GraphARM [2]. Furthermore, our work S-Graphs+ [1] includes a factor on every edge, tightly coupling the optimization of the SG with the SLAM graph. These factors are manually defined by different functions depending on the related node types. We present a novel factor definition based on a GNN common to every edge (F-GNN). A unique architecture encodes the geometrical relationship between the entities of every generated edge, with a different model for each combination of connected node types. G-GNN and F-GNN are trained on our synthetic dataset containing planes, walls, rooms and floors and tested in simulated and real scenarios.
|
|
ThBT1-CC Oral Session, CC-303 |
Add to My Program |
Motion Planning II |
|
|
Chair: Hutter, Marco | ETH Zurich |
Co-Chair: Halperin, Dan | Tel Aviv University |
|
13:30-15:00, Paper ThBT1-CC.1 | Add to My Program |
Characterizing Physical Adversarial Attacks on Robot Motion Planners |
|
Wu, Wenxi | King's College London |
Pierazzi, Fabio | King's College London |
Du, Yali | King's College London |
Brandao, Martim | King's College London |
Keywords: Motion and Path Planning, Robot Safety
Abstract: As the adoption of robots across society increases, so does the importance of considering cybersecurity issues such as vulnerability to adversarial attacks. In this paper we investigate the vulnerability of an important component of autonomous robots to adversarial attacks - robot motion planning algorithms. We particularly focus on attacks on the physical environment, and propose the first such attacks to motion planners: "planner failure" and "blindspot" attacks. Planner failure attacks make changes to the physical environment so as to make planners fail to find a solution. Blindspot attacks exploit occlusions and sensor field-of-view to make planners return a trajectory which is thought to be collision-free, but is actually in collision with unperceived parts of the environment. Our experimental results show that successful attacks need only to make subtle changes to the real world, in order to obtain a drastic increase in failure rates and collision rates - leading the planner to fail 95% of the time and collide 90% of the time in problems generated with an existing planner benchmark tool. We also analyze the transferability of attacks to different planners, and discuss underlying assumptions and future research directions. Overall, the paper shows that physical adversarial attacks on motion planning algorithms pose a serious threat to robotics, which should be taken into account in future research and development.
|
|
13:30-15:00, Paper ThBT1-CC.2 | Add to My Program |
Eclares: Energy-Aware Clarity-Driven Ergodic Search |
|
Naveed, Kaleb Ben | University of Michigan, Ann Arbor |
Agrawal, Devansh | University of Michigan |
Vermillion, Christopher | University of Michigan |
Panagou, Dimitra | University of Michigan, Ann Arbor |
Keywords: Constrained Motion Planning, Motion and Path Planning, Integrated Planning and Control
Abstract: Planning informative trajectories while considering the spatial distribution of the information over the environment, as well as constraints such as the robot's limited battery capacity, makes the long-time horizon persistent coverage problem complex. Ergodic search methods consider the spatial distribution of environmental information while optimizing robot trajectories; however, current methods lack the ability to construct the target information spatial distribution for environments that vary stochastically across space and time. Moreover, current coverage methods dealing with battery capacity constraints either assume simple robot and battery models or are computationally expensive. To address these problems, we propose a framework called Eclares, in which our contribution is two-fold. 1) First, we propose a method to construct the target information spatial distribution for ergodic trajectory optimization using clarity, an information measure bounded between [0,1]. The clarity dynamics allow us to capture information decay due to a lack of measurements and to quantify the maximum attainable information in stochastic spatiotemporal environments. 2) Second, instead of directly tracking the ergodic trajectory, we introduce the energy-aware (eware) filter, which iteratively validates the ergodic trajectory to ensure that the robot has enough energy to return to the charging station when needed. The proposed eware filter is applicable to nonlinear robot models and is computationally lightweight. We demonstrate the working of the framework through a simulation case study.
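The eware filter's gate can be caricatured in a few lines (our paraphrase of the abstract; the energy models and margin are placeholders, not the authors' formulation):

    def eware_gate(x, battery, next_on_plan, step_energy, return_energy, margin):
        # Commit to the next step of the ergodic trajectory only if, afterwards,
        # the remaining battery still covers the trip back to the charger.
        x_next = next_on_plan(x)
        needed = step_energy(x, x_next) + return_energy(x_next)
        if battery - needed >= margin:
            return x_next, 'track_ergodic_trajectory'
        return x, 'return_to_charging_station'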
|
|
13:30-15:00, Paper ThBT1-CC.3 | Add to My Program |
Tight Motion Planning by Riemannian Optimization for Sliding and Rolling with Finite Number of Contact Points |
|
Livnat, Dror | Tel Aviv University |
Bilevich, Michael M. | Tel Aviv University |
Halperin, Dan | Tel Aviv University |
Keywords: Constrained Motion Planning, Disassembly
Abstract: We address a challenging problem in motion planning where robots must navigate through narrow passages in their configuration space. Our novel approach leverages optimization techniques to facilitate sliding and rolling movements across critical regions, which represent semi-free configurations, where the robot and the obstacles are in contact. Our algorithm seamlessly traverses widely free regions, follows semi-free paths in narrow passages, and smoothly transitions between the two types. We specifically focus on scenarios resembling 3D puzzles, intentionally designed to be complex for humans by requiring intricate simultaneous translations and rotations. Remarkably, these complexities also present computational challenges. Our contributions are threefold: First, we solve previously unsolved problems; second, we outperform state-of-the-art algorithms on certain problem types; and third, we present a rigorous analysis supporting the consistency of the algorithm. In the Supplementary Material we provide theoretical foundations for our approach. The Supplementary Material and our open source software are available at https://github.com/TAU-CGL/tr-rrt-public. This research sheds light on effective approaches to address motion planning difficulties in intricate 3D puzzle-like scenarios.
|
|
13:30-15:00, Paper ThBT1-CC.4 | Add to My Program |
Online Trajectory Deformation and Tracking for Self-Entanglement-Free Differential-Driven Robots |
|
Liu, Jiangpin | Zhejiang University |
Yang, Tong | Zhejiang University |
Lu, Wangtao | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Constrained Motion Planning, Nonholonomic Motion Planning, Planning under Uncertainty
Abstract: This paper introduces an optimisation-based trajectory deformation and tracking algorithm for tethered differential-driven mobile robots. The motivation of this work is to generate self-entanglement-free (SEF) commands for a tethered differential-driven robot to track a path. Whilst existing path planners are capable of generating SEF paths for tethered differential-driven robots lacking an omni-directional tether-retracting mechanism, no trajectory planner can handle the unavoidable movement errors that cause the robot pose to deviate from the pre-defined path. Trajectory deformation and tracking is challenging because the admissible heading direction of the robot is highly constrained by the SEF constraint. As a result, even with an SEF path, the robot can still encounter self-entanglement during execution. This paper fills this gap by formulating the trajectory deforming and tracking (TDT) problem of a tethered robot as a multi-objective optimisation framework. The framework explicitly requires that the relative angle between the tether stretching direction and the robot's heading direction remain admissible throughout the robot's movement. The proposed algorithm repeatedly deforms the pre-defined path for easier tracking, whilst generating a suitable velocity profile for robot execution. Compared to directly applying commonly used untethered trajectory deformation and tracking algorithms to tethered cases, the proposed algorithm demonstrates improved performance in terms of minimising the risk of self-entanglement and maximising robot safety. These results are validated in both simulated and real scenarios. An open-source implementation has also been provided for the benefit of the robotics community.
|
|
13:30-15:00, Paper ThBT1-CC.5 | Add to My Program |
Efficient Motion Planning for Manipulators with Control Barrier Function-Induced Neural Controller |
|
Yu, Mingxin | Massachusetts Institute of Technology |
Yu, Chenning | University of California San Diego |
Naddaf Shargh, Mohammad Mahdi | Ford Motor Company |
Upadhyay, Devesh | Saab |
Gao, Sicun | UCSD |
Fan, Chuchu | Massachusetts Institute of Technology |
Keywords: Motion and Path Planning, Robot Safety, Deep Learning in Grasping and Manipulation
Abstract: Sampling-based motion planning methods for manipulators in crowded environments often suffer from expensive collision checking and high sampling complexity, which make them difficult to use in real time. To address this issue, we propose a new generalizable control barrier function (CBF)-based steering controller to reduce the number of samples needed in the sampling-based motion planner RRT. Our method combines the strength of CBFs for real-time collision-avoidance control and of RRT for long-horizon motion planning, by using a CBF-induced neural controller (CBF-INC) to generate control signals that steer the system towards configurations sampled by RRT. CBF-INC is learned as a neural network and has two variants handling different inputs: state (signed distance) input and point-cloud input from LiDAR. In the latter case, we also study two settings: fully and partially observed environmental information. Compared to a manually crafted CBF, which suffers from over-approximating the robot geometry, CBF-INC can better balance safety and goal-reaching without being over-conservative. Given state-based input, our CBF-INC-enhanced RRT (CBF-INC-RRT) can increase the success rate by 14% while reducing the number of nodes explored by 30%, compared with vanilla RRT on hard test cases. Given LiDAR input, where vanilla RRT is not directly applicable, we demonstrate that CBF-INC-RRT can improve the success rate by 10% compared with planning with other steering controllers. Our project page with supplementary material is at https://mit-realm.github.io/CBF-INC-RRT-website/.
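The kind of real-time safety filter a CBF-based steering controller builds on has a closed form in the single-constraint case (a generic sketch: CBF-INC replaces the hand-crafted barrier function h with a learned neural one):

    import numpy as np

    def cbf_safety_filter(u_nom, h, Lfh, Lgh, alpha=1.0):
        # Solves  min ||u - u_nom||^2  s.t.  Lfh + Lgh @ u >= -alpha * h
        # in closed form (projection onto the safe half-space).
        a = np.asarray(Lgh, dtype=float)
        b = Lfh + alpha * h                  # constraint reads a @ u + b >= 0
        if a @ u_nom + b >= 0.0:
            return u_nom                     # nominal control is already safe
        # Minimal-norm correction onto the constraint boundary (assumes a != 0)
        return u_nom + a * (-(b + a @ u_nom)) / (a @ a)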
|
|
13:30-15:00, Paper ThBT1-CC.6 | Add to My Program |
Planning Optimal Trajectories for Mobile Manipulators under End-Effector Trajectory Continuity Constraint |
|
Nguyen, Quang-Nam | Nanyang Technological University |
Pham, Quang-Cuong | NTU Singapore |
Keywords: Constrained Motion Planning, Task and Motion Planning, Mobile Manipulation
Abstract: Mobile manipulators have been employed in many applications traditionally performed by either multiple fixed-base robots or a large robotic system. This capability is enabled by the mobility of the mobile base. However, the mobile base also brings redundancy to the system, which makes mobile manipulator motion planning more challenging. In this paper, we tackle the mobile manipulator motion planning problem under the end-effector trajectory continuity constraint, in which the end-effector is required to traverse a continuous task-space trajectory (time-parametrized path), such as in mobile printing or spraying applications. Our method decouples the problem: (1) plan an optimal base trajectory subject to geometric task constraints, the end-effector trajectory continuity constraint, collision avoidance, and a base velocity constraint, which ensures that (2) a manipulator trajectory can subsequently be computed from the obtained base trajectory. To validate our method, we propose a discrete optimal base trajectory planning algorithm and solve several mobile printing tasks in hardware experiments and simulations.
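The discrete base-trajectory search lends itself to a dynamic-programming sketch (our reading of "discrete optimal"; the interfaces and discretization below are hypothetical, and base poses are assumed hashable):

    def plan_base_trajectory(ee_traj, base_candidates, reachable, move_cost):
        # For each timed end-effector waypoint, keep only base poses from which
        # the waypoint is reachable, then find the minimum-cost base sequence.
        layers = [[b for b in base_candidates if reachable(b, ee)] for ee in ee_traj]
        cost = {b: 0.0 for b in layers[0]}   # assumes every layer is non-empty
        back = [{}]
        for t in range(1, len(layers)):
            new_cost, back_t = {}, {}
            for b in layers[t]:
                p = min(cost, key=lambda q: cost[q] + move_cost(q, b))
                new_cost[b] = cost[p] + move_cost(p, b)
                back_t[b] = p
            cost, back = new_cost, back + [back_t]
        b = min(cost, key=cost.get)          # best terminal base pose
        seq = [b]
        for t in range(len(layers) - 1, 0, -1):
            b = back[t][b]
            seq.append(b)
        return seq[::-1]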
|
|
13:30-15:00, Paper ThBT1-CC.7 | Add to My Program |
Zero-Shot Constrained Motion Planning Transformers Using Learned Sampling Dictionaries |
|
Johnson, Jacob | UCSD |
Qureshi, Ahmed H. | Purdue University |
Yip, Michael C. | University of California, San Diego |
Keywords: Constrained Motion Planning, Deep Learning Methods, Manipulation Planning
Abstract: In this work, we explore using a transformer-based model for motion planning with task-space constraints for manipulation systems. The Vector Quantized-Motion Planning Transformer (VQ-MPT) is a recent learning-based model that reduces the search space of sampling-based motion planners for unconstrained planning. We propose to adapt a pre-trained VQ-MPT model to reduce the search space for constrained planning without retraining or finetuning the model. We also propose updating the neural network output to move sampling regions closer to the constraint manifold. Our experiments show how VQ-MPT improves planning times and accuracy compared to traditional planners in simulated and real-world environments. Unlike previous learning methods, which require task-related data, our method uses pre-trained neural network models and requires no additional data for training and finetuning. We also tested our method on a physical Franka Panda robot with real-world sensor data, demonstrating the generalizability of our algorithm.
|
|
13:30-15:00, Paper ThBT1-CC.8 | Add to My Program |
LSTP: Long Short-Term Motion Planning for Legged and Legged-Wheeled Systems |
|
Jelavic, Edo | Swiss Federal Institute of Technology Zurich |
Qu, Kaixian | ETH Zürich |
Farshidian, Farbod | ETH Zurich |
Hutter, Marco | ETH Zurich |
Keywords: Motion and Path Planning, Robotics in Construction, Optimization and Optimal Control, Legged-wheeled Robots
Abstract: This article presents a hybrid motion planning and control approach applicable to various ground robot types and morphologies. Our two-step approach uses a sampling-based planner to compute an approximate motion which is then fed to numerical optimization for refinement. The sampling-based stage finds a long-term global plan consisting of a contact schedule and a sequence of stable whole-body configurations. Subsequently, the optimization refines the solution with a short-term planning horizon to satisfy all dynamics constraints. The proposed planner can compute plans for scenarios that would be difficult for trajectory optimization or a sampling-based planner alone. We present tasks of traversing challenging terrain that require discovering a contact schedule, navigating non-convex obstacles, and coordinating many degrees of freedom. Our hybrid planner has been applied to three different robots: a quadruped, a wheeled quadruped, and a legged excavator. We validate our hybrid planner in the real world and simulation, generating behaviors we could not achieve with previous methods. The results show that computing and executing hybrid locomotion plans is possible on hardware.
|
|
13:30-15:00, Paper ThBT1-CC.9 | Add to My Program |
Risk-Inspired Aerial Active Exploration for Enhancing Autonomous Driving of UGV in Unknown Off-Road Environments |
|
Wang, Rongchuan | Beijing Institute of Technology |
Fu, Mengyin | Beijing Institute of Technology |
Yu, Jing | Beijing Institute of Technology |
Yang, Yi | Beijing Institute of Technology |
Song, Wenjie | Beijing Institute of Technology |
Keywords: Search and Rescue Robots, Planning under Uncertainty, Field Robots
Abstract: Unknown-area exploration is a crucial but challenging task for the autonomous driving of unmanned ground vehicles (UGVs) in unknown off-road environments. However, the exploration efficiency of a single UGV is low due to its limited sensing range. To solve this problem, this paper proposes a risk-inspired aerial active exploration system, which utilizes the flexibility and field-of-view advantages of Unmanned Aerial Vehicles (UAVs) to guide the UGV in unknown off-road environments. First, a fast terrain risk mapping method that can be used by both the UAV and the UGV is developed. This method efficiently combines quadtree and hash table data structures to enable the UAV to analyze large-scale terrain point clouds in real time. Based on the risk mapping result, a risk-inspired active exploration method is proposed to actively search for a safe reference path for the UGV, introducing terrain risk information into the process of travel point selection. Finally, the reference path is gradually generated and optimized, so that the UGV can safely and smoothly follow the path to the target location. Compared with a single-UGV exploration system, our approach reduces the overall path risk by 26.8% in simulated experiments, showing that the proposed system can enhance autonomous driving of the UGV and help it effectively avoid high-risk areas in unknown off-road environments.
|
|
ThBT2-CC Oral Session, CC-311 |
Add to My Program |
Autonomous Agents I |
|
|
Co-Chair: Wang, Shenlong | University of Illinois at Urbana-Champaign |
|
13:30-15:00, Paper ThBT2-CC.1 | Add to My Program |
Sim-On-Wheels: Physical World in the Loop Simulation for Self-Driving |
|
Shen, Yuan | UIUC |
Chandaka, Bhargav | University of Illinois at Urbana-Champaign |
Lin, Zhi-Hao | University of Illinois at Urbana-Champaign |
Zhai, Albert | UIUC |
Cui, Hang | University of Illinois at Urbana-Champaign |
Forsyth, David | University of Illinois at Urbana-Champaign |
Wang, Shenlong | University of Illinois at Urbana-Champaign |
Keywords: Autonomous Agents, Simulation and Animation, Robot Safety
Abstract: We present Sim-on-Wheels, a safe, realistic, vehicle-in-the-loop framework to test autonomous vehicles’ performance in the real world under safety-critical scenarios. Sim-on-Wheels runs on a self-driving vehicle operating in the physical world. It creates virtual traffic participants with risky behaviors and seamlessly inserts the virtual events into images perceived from the physical world in real time. The manipulated images are fed into the autonomy stack, allowing the self-driving vehicle to react to such virtual events. The full pipeline runs on the actual vehicle and interacts with the physical world, but the safety-critical events it sees are virtual. Sim-on-Wheels is safe, interactive, realistic, and easy to use. The experiments demonstrate the potential of Sim-on-Wheels to facilitate the process of testing autonomous driving in challenging real-world scenes with high fidelity and low risk.
|
|
13:30-15:00, Paper ThBT2-CC.2 | Add to My Program |
Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing |
|
Liu, Haolan | University of California San Diego |
Zhang, Liangjun | Baidu |
Hari, Siva Kumar Sastry | NVIDIA |
Zhao, Jishen | UC San Diego |
Keywords: Autonomous Agents, Reinforcement Learning, AI-Based Methods
Abstract: Generating safety-critical scenarios is essential for testing and verifying the safety of autonomous vehicles. Traditional optimization techniques suffer from the curse of dimensionality and limit the search space to fixed parameter spaces. To address these challenges, we propose a deep reinforcement learning approach that generates scenarios by sequential editing, such as adding new agents or modifying the trajectories of existing agents. Our framework employs a reward function consisting of both risk and plausibility objectives. The plausibility objective leverages generative models, such as a variational autoencoder, to learn the likelihood of the generated parameters from the training datasets; it penalizes the generation of unlikely scenarios. Our approach overcomes the dimensionality challenge and explores a wide range of safety-critical scenarios. Our evaluation demonstrates that the proposed method generates safety-critical scenarios of higher quality compared with previous approaches.
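A minimal sketch of a reward that combines a risk term with a VAE-based plausibility term, as outlined above; the `vae` interface (`encode`/`decode`) and all parameters are assumed for illustration, and the ELBO stands in for the intractable log-likelihood:

```python
import torch

def scenario_reward(risk_score, vae, scenario_params, beta=0.1):
    """Reward = risk + beta * ELBO, so risky but implausible edits are
    penalized. Assumes a Gaussian decoder for the reconstruction term."""
    mu, logvar = vae.encode(scenario_params)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = vae.decode(z)
    recon_ll = -((recon - scenario_params) ** 2).sum(dim=-1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    elbo = recon_ll - kl  # lower bound on log p(scenario)
    return risk_score + beta * elbo
```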
|
|
13:30-15:00, Paper ThBT2-CC.3 | Add to My Program |
Robust Autonomous Vehicle Pursuit without Expert Steering Labels |
|
Pan, Jiaxin | Technical University of Munich |
Zhou, Changyao | Technical University of Munich |
Gladkova, Mariia | Technical University of Munich |
Khan, Qadeer | Technical University of Munich |
Cremers, Daniel | Technical University of Munich |
Keywords: Autonomous Agents, Intelligent Transportation Systems, Motion Control
Abstract: In this work, we present a learning method for lateral and longitudinal motion control of an ego-vehicle for vehicle pursuit. The car being controlled does not have a pre-defined route; rather, it adapts to follow a target vehicle while maintaining a safety distance. To train our model, we do not rely on steering labels recorded by an expert driver but effectively leverage a classical controller as an offline label generation tool. In addition, we account for the errors in the predicted control values, which can lead to a loss of tracking and catastrophic crashes of the controlled vehicle. To this end, we propose an effective data augmentation approach, which allows the training of a network capable of handling different views of the target vehicle. During the pursuit, the target vehicle is first localized using a Convolutional Neural Network. The network takes a single RGB image along with the cars' velocities and estimates the target vehicle's pose with respect to the ego-vehicle. This information is then fed to a Multi-Layer Perceptron, which regresses the control commands for the ego-vehicle, namely throttle and steering angle. We extensively validate our approach using the CARLA simulator on various terrains. Our method demonstrates real-time performance, robustness to different scenarios including unseen trajectories, and high route completion. Our project page can be found at https://changyaozhou.github.io/Autonomous-Vehicle-Pursuit/.
|
|
13:30-15:00, Paper ThBT2-CC.4 | Add to My Program |
Risk-Aware Trajectory Prediction by Incorporating Spatio-Temporal Traffic Interaction Analysis |
|
Thuremella, Divya | University of Oxford, Robotics Institute |
Ince, Lewis | University of Oxford, Robotics Institute |
Kunze, Lars | University of Oxford |
Keywords: Autonomous Agents, Intelligent Transportation Systems, Computer Vision for Transportation
Abstract: To operate in open-ended environments where humans interact in complex, diverse ways, autonomous robots must learn to predict human behavior, especially when that behavior is potentially dangerous to other agents or to the robot. However, reducing the risk of accidents requires prior knowledge of where potential collisions may occur and how. Therefore, we propose to gain this information by analyzing the locations and speeds that commonly correspond to high-risk interactions within the dataset, and to use it during training to generate better predictions in high-risk situations. Through these location-based and speed-based re-weighting techniques, we achieve improved overall performance, as measured by most-likely FDE and KDE, as well as improved performance on high-speed vehicles and on vehicles within high-risk locations.
|
|
13:30-15:00, Paper ThBT2-CC.5 | Add to My Program |
Reinforcement Learning with Human Feedback for Realistic Traffic Simulation |
|
Cao, Yulong | NVIDIA |
Ivanovic, Boris | NVIDIA |
Xiao, Chaowei | University of Michigan |
Pavone, Marco | Stanford University |
Keywords: Autonomous Agents, Deep Learning Methods, Reinforcement Learning
Abstract: In light of the challenges and costs of real-world testing, autonomous vehicle developers often rely on testing in simulation for the creation of reliable systems. A key element of effective simulation is the incorporation of realistic traffic models that align with human knowledge, an aspect that has proven challenging due to the need to balance realism and diversity. Towards this end, in this work we develop a framework that employs reinforcement learning from human feedback (RLHF) to enhance the realism of existing traffic models. This work also identifies two main challenges: capturing the nuances of human preferences on realism and unifying diverse traffic simulation models. To tackle these issues, we propose using human feedback for alignment and employ RLHF due to its sample efficiency. We also introduce the first dataset for realism alignment in traffic modeling to support such research. Our framework, named TrafficRLHF, demonstrates its proficiency in generating realistic traffic scenarios that are well-aligned with human preferences through comprehensive evaluations on the nuScenes dataset.
|
|
13:30-15:00, Paper ThBT2-CC.6 | Add to My Program |
Plug in the Safety Chip: Enforcing Constraints for LLM-Driven Robot Agents |
|
Yang, Ziyi | Brown University |
Sundara Raman, Shreyas | Brown University |
Shah, Ankit Jayesh | Massachusetts Institute of Technology |
Tellex, Stefanie | Brown |
Keywords: Agent-Based Systems, Formal Methods in Robotics and Automation, Safety in HRI
Abstract: Recent advancements in large language models (LLMs) have enabled a new research domain, LLM agents, for solving robotics and planning tasks by leveraging the world knowledge and general reasoning abilities of LLMs obtained during pretraining. However, while considerable effort has been made to teach the robot the "dos", the "don'ts" have received relatively little attention. We argue that, for any practical usage, it is as crucial to teach the robot the "don'ts": conveying explicit instructions about prohibited actions, assessing the robot's comprehension of these restrictions, and, most importantly, ensuring compliance. Moreover, verifiable safe operation is essential for deployments that must satisfy international standards such as ISO 61508, which defines requirements for safely deploying robots in industrial factory environments. Aiming at deploying LLM agents in a collaborative environment, we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously enables encoding natural language (NL) into temporal constraints, reasoning about and explaining safety violations, and pruning unsafe actions. To demonstrate the effectiveness of our system, we conducted experiments in the VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well with complex temporal constraints, highlighting its potential for practical utility.
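One way to picture the unsafe-action pruning described above is to run candidate actions through an automaton compiled from the LTL constraint; the `dfa`/`label_of` interface below is purely illustrative, not the paper's API:

```python
def prune_unsafe(actions, dfa_state, dfa, label_of):
    """Keep only actions whose induced proposition labels have a valid,
    non-trap transition in the safety automaton; return the surviving
    actions paired with the automaton state they lead to."""
    safe = []
    for a in actions:
        props = label_of(a)               # e.g., {"enter_kitchen"}
        nxt = dfa.step(dfa_state, props)  # None = no valid transition
        if nxt is not None and not dfa.is_trap(nxt):
            safe.append((a, nxt))
    return safe
```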
|
|
13:30-15:00, Paper ThBT2-CC.7 | Add to My Program |
InterCoop: Spatio-Temporal Interaction Aware Cooperative Perception for Networked Vehicles |
|
Wang, Wentao | Sun Yat-Sen University |
Xu, Haoran | Sun Yat-Sen University |
Tan, Guang | Sun Yat-Sen University |
Keywords: Autonomous Agents, Computer Vision for Automation, Networked Robots
Abstract: In autonomous driving, leveraging cooperative perception through vehicle-to-vehicle (V2V) communication is considered crucial for enhancing traffic safety and efficiency. However, existing methods often simplify the handling of perception data from multiple vehicles. In these approaches, the ego-vehicle aggregates observations from all neighboring connected cooperative vehicles (CCV), without considering the interactions between agents or making differentiated use of the acquired sensing data. This approach can result in suboptimal system performance due to the amplification of noise and the large transmission delay. In this paper, we introduce a novel approach to cooperative perception. By fusing both the road topology and trajectory histories of neighboring CCVs, our model learns an interaction score for each CCV. These scores prioritize vehicles that are most relevant to the current driving scenario, offering valuable guidance for selective fusion of sensor data, thereby enhancing driving decision-making. The proposed method is validated through extensive experiments conducted on the CARLA simulator. Results demonstrate that our approach surpasses existing methods in terms of performance and robustness.
|
|
13:30-15:00, Paper ThBT2-CC.8 | Add to My Program |
Improving Autonomous Driving Safety with POP: A Framework for Accurate Partially Observed Trajectory Predictions |
|
Wang, Sheng | Hong Kong University of Science and Technology |
Chen, Yingbing | The Hong Kong University of Science and Technology |
Cheng, Jie | Hong Kong University of Science and Technology |
Mei, Xiaodong | HKUST |
Xin, Ren | The Hong Kong University of Science and Technology |
Song, Yongkang | Ningbo Lotus Robotics Co., Ltd |
Liu, Ming | Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Autonomous Agents, Deep Learning Methods, Human Detection and Tracking
Abstract: Accurate trajectory prediction is crucial for safe and efficient autonomous driving, but handling partial observations presents significant challenges. To address this, we propose a novel trajectory prediction framework called Partial Observations Prediction (POP) for congested urban road scenarios. The framework consists of two key stages: self-supervised learning (SSL) and feature distillation. POP first employs SSL to help the model learn to reconstruct history representations, and then utilizes feature distillation as the fine-tuning task to transfer knowledge from the teacher model, which has been pre-trained with complete observations, to the student model, which has only a few observations. POP achieves comparable results to top-performing methods in open-loop experiments and outperforms the baseline method in closed-loop simulations, including safety metrics. Qualitative results illustrate the superiority of POP in providing reasonable and safe trajectory predictions.
|
|
13:30-15:00, Paper ThBT2-CC.9 | Add to My Program |
FIMP: Future Interaction Modeling for Multi-Agent Motion Prediction |
|
Woo, Sungmin | Yonsei University |
Kim, Minjung | Yonsei University |
Kim, Donghyeong | Yonsei University |
Jang, Sungjun | Yonsei University |
Lee, Sangyoun | Yonsei University |
Keywords: Autonomous Agents, AI-Based Methods, Computer Vision for Transportation
Abstract: Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains a challenge owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities by using only the definite data from past timesteps, since future information is not available and involves high uncertainty. However, without sufficient guidance for capturing future states of interacting agents, they frequently produce unrealistic trajectory overlaps. In this work, we propose Future Interaction modeling for Motion Prediction (FIMP), which captures potential future interactions in an end-to-end manner. FIMP adopts a future decoder that implicitly extracts the potential future information at an intermediate feature level, and identifies the interacting entity pairs through future affinity learning and a top-k filtering strategy. Experiments show that our future interaction modeling improves performance remarkably, leading to superior performance on the Argoverse motion forecasting benchmark.
|
|
ThBT3-CC Oral Session, CC-313 |
Add to My Program |
Calibration and Identification I |
|
|
Chair: Gossard, Thomas | University of Tübingen |
Co-Chair: Su, Hao | UCSD |
|
13:30-15:00, Paper ThBT3-CC.1 | Add to My Program |
EasyHeC: Accurate and Automatic Hand-Eye Calibration Via Differentiable Rendering and Space Exploration |
|
Chen, Linghao | Zhejiang University |
Qin, Yuzhe | UC San Diego |
Zhou, Xiaowei | Zhejiang University |
Su, Hao | UCSD |
Keywords: Calibration and Identification, Computer Vision for Automation, Recognition
Abstract: Hand-eye calibration is a critical task in robotics, as it directly affects the efficacy of critical operations such as manipulation and grasping. Traditional methods for achieving this objective necessitate the careful design of joint poses and the use of specialized calibration markers, while most recent learning-based approaches using solely pose regression are limited in their abilities to diagnose inaccuracies. In this work, we introduce a new approach to hand-eye calibration called EasyHeC, which is markerless, white-box, and delivers superior accuracy and robustness. We propose to use two key technologies: differentiable rendering-based camera pose optimization and consistency-based joint space exploration, which together enable accurate end-to-end optimization of the calibration process and eliminate the need for the laborious manual design of robot joint poses. Our evaluation demonstrates superior performance on synthetic and real-world datasets, enhancing downstream manipulation tasks by providing precise camera poses for locating and interacting with objects. The code is available at the project page: https://ootts.github.io/easyhec.
|
|
13:30-15:00, Paper ThBT3-CC.2 | Add to My Program |
Zero-Training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model |
|
Luo, Zhaotong | Tsinghua University |
Yan, Guohang | Shanghai AI Laboratory |
Cai, Xinyu | Shanghai AI Laboratory |
Shi, Botian | Shanghai AI Laboratory |
Keywords: Calibration and Identification, Computer Vision for Automation, Sensor Fusion
Abstract: Extrinsic calibration for LiDAR and camera is an essential prerequisite for sensor fusion. Recently, automatic and target-less extrinsic calibration has become the mainstream of academic research. However, geometric feature-based methods still place requirements on the scene. Deep learning methods, while achieving high accuracy and good adaptability, rely on large annotated datasets and need additional training. We propose a novel LiDAR-camera calibration method using the Segment Anything Model (SAM) without additional training. With the automatically generated masks, we optimize the extrinsic parameters by maximizing the consistency score of the point attributes that fall on each mask. The point cloud attributes include intensity, normal vector, and segmentation class. Experiments on different real-world datasets demonstrate the accuracy and robustness of our proposed method. The code is available at https://github.com/OpenCalib/CalibAnything.
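A simplified sketch of the per-mask consistency idea described above, using intensity variance as a stand-in for the paper's consistency score; the exact objective and all interfaces are assumptions:

```python
import numpy as np

def consistency_score(points, intensities, masks, K, T):
    """Project LiDAR points with candidate extrinsics T (4x4) and intrinsics
    K (3x3), then reward low intensity variance inside each SAM mask."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 0.1                      # keep points in front of camera
    uv = (K @ cam[front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    vals_front = intensities[front]
    score = 0.0
    for m in masks:                              # m: HxW boolean mask
        h, w = m.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        sel = inside.copy()
        sel[inside] = m[uv[inside, 1], uv[inside, 0]]
        vals = vals_front[sel]
        if len(vals) > 5:
            score += 1.0 / (1.0 + np.var(vals))  # consistent masks score high
    return score
```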
|
|
13:30-15:00, Paper ThBT3-CC.3 | Add to My Program |
Dive Deeper into Rectifying Homography for Stereo Camera Online Self-Calibration |
|
Zhao, Hongbo | Tongji University |
Zhang, Yikang | Tongji University |
Chen, Qijun | Tongji University |
Fan, Rui | Tongji University |
Keywords: Calibration and Identification, Computer Vision for Automation, SLAM
Abstract: Accurate estimation of stereo camera extrinsic parameters is crucial to guarantee the performance of stereo matching algorithms. In prior art, the online self-calibration of stereo cameras has commonly been formulated as a specialized visual odometry problem, without taking into account the principles of stereo rectification. In this paper, we first delve deeply into the concept of rectifying homography, which serves as the cornerstone for the development of our novel stereo camera online self-calibration algorithm, for cases where only a single pair of images is available. Furthermore, we introduce a simple yet effective solution for globally optimal extrinsic parameter estimation in the presence of stereo video sequences. Additionally, we emphasize the impracticality of using three Euler angles and three components in the translation vectors for performance quantification. Instead, we introduce four new evaluation metrics to quantify the robustness and accuracy of extrinsic parameter estimation, applicable to both single-pair and multi-pair cases. Extensive experiments conducted across indoor and outdoor environments using various experimental setups validate the effectiveness of our proposed algorithm. The comprehensive evaluation results demonstrate its superior performance in comparison to the baseline algorithm. Our source code, demo video, and supplement are publicly available at mias.group/StereoCalibrator.
|
|
13:30-15:00, Paper ThBT3-CC.4 | Add to My Program |
Online Camera-LiDAR Calibration Monitoring and Rotational Drift Tracking |
|
Moravec, Jaroslav | Faculty of Electrical Engineering, Czech Technical University in Prague |
Šára, Radim | Faculty of Electrical Engineering, Czech Technical University in Prague |
Keywords: Calibration and Identification, Computer Vision for Transportation, Sensor Fusion, LiDAR-Camera Systems
Abstract: The relative poses of visual perception sensors distributed over a vehicle’s body may vary due to dynamic forces, thermal dilation, or minor accidents. This paper proposes two methods, OCAMO and LTO, that monitor and track the LiDAR-camera extrinsic calibration parameters online. Calibration monitoring provides a certificate of validity for the reference calibration parameters. Tracking follows the drift of the calibration parameters over time. OCAMO is based on adaptive online stochastic optimization with a memory of past evolution. LTO uses a fixed-grid search for the optimal parameters per frame, without memory. Both methods use low-level point-like features and a robust kernel-based loss function, and work with a small memory footprint and computational overhead. Both include a preselection of informative data that limits their divergence. The statistical accuracy of both calibration monitoring methods is over 98%; OCAMO monitoring detects small decalibrations better, whereas LTO monitoring reacts faster to abrupt decalibrations. The tracking variants of both methods follow random calibration drift with an accuracy of about 0.03 degrees in the yaw angle.
|
|
13:30-15:00, Paper ThBT3-CC.5 | Add to My Program |
PeLiCal: Targetless Extrinsic Calibration Via Penetrating Lines for RGB-D Cameras with Limited Co-Visibility |
|
Shin, Jaeho | SNU |
Yun, Seungsang | Seoul National University, SNU |
Kim, Ayoung | Seoul National University |
Keywords: Calibration and Identification, RGB-D Perception, SLAM
Abstract: RGB-D cameras are crucial in robotic perception, given their ability to produce images augmented with depth data. However, their limited field of view (FOV) often requires multiple cameras to cover a broader area. In multi-camera RGB-D setups, the goal is typically to reduce camera overlap, optimizing spatial coverage with as few cameras as possible. The extrinsic calibration of these systems introduces additional complexities. Existing methods for extrinsic calibration either necessitate specific tools or depend heavily on the accuracy of camera motion estimation. To address these issues, we present PeLiCal, a novel line-based calibration approach for RGB-D camera systems exhibiting limited overlap. Our method leverages long line features from the surroundings and filters out outliers with a novel convergence voting algorithm, achieving targetless, real-time, and outlier-robust performance compared to existing methods. We open-source our implementation at https://github.com/joomeok/PeLiCal.git.
|
|
13:30-15:00, Paper ThBT3-CC.6 | Add to My Program |
A Novel, Efficient and Accurate Method for Lidar Camera Calibration |
|
Huang, Zhanhong | Worcester Polytechnic Institute |
Zhang, Xiao | Worcester Polytechnic Institute |
Garcia, Antony | Worcester Polytechnic Institute |
Huang, Xinming | Worcester Polytechnic Institute |
Keywords: Calibration and Identification, Sensor Fusion
Abstract: As autonomous systems evolve, the precise calibration of lidar and camera sensors remains a pivotal concern. Among the myriad of available techniques, target-based calibration methods, which employ planar boards with distinct geometry and image patterns, have been a popular choice. These methods simplify the task of extracting corresponding features between the image and the lidar point cloud. However, many of these approaches are sensitive to lidar resolution and field of view (FOV), which may degrade the reliability of the calibration results. Therefore, our research introduces a novel calibration method using a uniquely designed acrylic checkerboard that allows the lidar beam to pass through the white grids and reflect back from the black grids. This innovative technique sidesteps the common challenges associated with lidar feature extraction. Our method's distinct advantage lies in its ability to perform accurate calibrations at close distances, owing to efficient feature extraction from both lidar and camera sensors. This novel, efficient, and accurate method provides state-of-the-art results for camera-lidar calibration in the field.
|
|
13:30-15:00, Paper ThBT3-CC.7 | Add to My Program |
An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving |
|
Pi, Jiahao | Shanghai AI Laboratory |
Yan, Guohang | Shanghai AI Laboratory |
Wang, Chengjie | Fudan University |
Cai, Xinyu | Shanghai AI Laboratory |
Shi, Botian | Shanghai AI Laboratory |
Keywords: Calibration and Identification, Sensor Fusion, Mapping
Abstract: Accurate and reliable sensor calibration is critical for fusing LiDAR and inertial measurements in autonomous driving. This paper proposes a novel three-stage extrinsic calibration method between LiDAR and GNSS/INS for autonomous driving. The first stage quickly calibrates the extrinsic parameters between the sensors through point cloud surface features, so that the extrinsics can be narrowed from a large initial error to a small error range in little time. The second stage further calibrates the extrinsic parameters based on LiDAR-mapping space occupancy while removing motion distortion. In the final stage, the z-axis (the vertical direction relative to the ground plane) errors caused by the planar motion of the autonomous vehicle are corrected, and accurate extrinsic parameters are finally obtained. Specifically, this method utilizes the planar features in the environment, making it possible to carry out calibration quickly. Experimental results on real-world datasets demonstrate the reliability and accuracy of our method. The code is open-sourced at https://github.com/OpenCalib/LiDAR2INS.
|
|
13:30-15:00, Paper ThBT3-CC.8 | Add to My Program |
SGCalib: A Two-Stage Camera-LiDAR Calibration Method Using Semantic Information and Geometric Features |
|
Lin, Zhipeng | The Chinese University of Hong Kong |
Gao, Zhi | Temasek Laboratories @ NUS |
Liu, Xinyi | Wuhan University |
Wang, Jialiang | Wuhan University |
Song, Weiwei | Peng Cheng Laboratory |
Chen, Ben M. | Chinese University of Hong Kong |
Li, Chenyang | Wuhan University |
Huang, Yue | Wuhan University |
Zhu, Yuhan | Wuhan University |
Keywords: Calibration and Identification, Sensor Fusion, Object Detection, Segmentation and Categorization
Abstract: Extrinsic calibration is an essential prerequisite for the applications of camera-LiDAR fusion. Existing methods either suffer from the complex offline setting of man-made targets or tend to produce suboptimal and non-robust results. In this paper, we propose an online two-stage calibration method that estimates robust and accurate extrinsic parameters between camera and LiDAR. This work jointly uses semantic information and geometric features in calibration to promote accuracy and robustness. In the first stage, we detect objects in the image and point cloud and build graphs on the objects using Delaunay triangulation. We then design a novel graph matching algorithm to associate the objects in the two data domains and extract pairs of 2D-3D points. Using a PnP solver, we obtain robust initial extrinsic parameters. In the second stage, we design a new optimization formulation with semantic information and geometric features to generate accurate extrinsic parameters from the initial value of the first stage. Extensive experiments on solid-state LiDAR, conventional spinning LiDAR, and the KITTI datasets have verified the robustness and accuracy of our method, which outperforms existing works. We will share the code publicly to benefit the community (after the review stages).
|
|
13:30-15:00, Paper ThBT3-CC.9 | Add to My Program |
EWand: An Extrinsic Calibration Framework for Wide Baseline Frame-Based and Event-Based Camera Systems |
|
Gossard, Thomas | University of Tübingen |
Ziegler, Andreas | University of Tuebingen |
Kolmar, Levin | University of Tübingen |
Tebbe, Jonas | University of Tübingen |
Zell, Andreas | University of Tübingen |
Keywords: Calibration and Identification, Visual Tracking
Abstract: Accurate calibration is crucial for using multiple cameras to triangulate the position of objects precisely. However, it is also a time-consuming process that needs to be repeated for every displacement of the cameras. The standard approach is to use a printed pattern with known geometry to estimate the intrinsic and extrinsic parameters of the cameras. The same idea can be applied to event-based cameras, though it requires extra work. By using frame reconstruction from events, a printed pattern can be detected. A blinking pattern can also be displayed on a screen. Then, the pattern can be directly detected from the events. Such calibration methods can provide accurate intrinsic calibration for both frame- and event-based cameras. However, using 2D patterns has several limitations for multi-camera extrinsic calibration, with cameras possessing highly different points of view and a wide baseline. The 2D pattern can only be detected from one direction and needs to be of significant size to compensate for its distance to the camera. This makes the extrinsic calibration time-consuming and cumbersome. To overcome these limitations, we propose eWand, a new method that uses blinking LEDs inside opaque spheres instead of a printed or displayed pattern. Our method provides a faster, easier-to-use extrinsic calibration approach that maintains high accuracy for both event- and frame-based cameras.
|
|
ThBT4-CC Oral Session, CC-315 |
Add to My Program |
Path Planning for Multiple Mobile Robots or Agents II |
|
|
Chair: Johnson, Aaron M. | Carnegie Mellon University |
Co-Chair: Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
13:30-15:00, Paper ThBT4-CC.1 | Add to My Program |
Benchmarking Multi-Robot Coordination in Realistic, Unstructured Human-Shared Environments |
|
Heuer, Lukas | Örebro University, Robert Bosch GmbH |
Palmieri, Luigi | Robert Bosch GmbH |
Mannucci, Anna | Robert Bosch GmbH Corporate Research |
Koenig, Sven | University of Southern California |
Magnusson, Martin | Örebro University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Human-Aware Motion Planning, Multi-Robot Systems
Abstract: Coordinating a fleet of robots in unstructured, human-shared environments is challenging. Human behavior is hard to predict, and its uncertainty impacts the performance of the robotic fleet. Various multi-robot planning and coordination algorithms have been proposed, ranging from Multi-Agent Path Finding (MAPF) methods to precedence-based algorithms. However, it is still unclear how human presence impacts different coordination strategies in both simulated environments and the real world. With the goal of studying and further improving multi-robot planning capabilities in these settings, we propose a method to develop and benchmark different multi-robot coordination algorithms in realistic, unstructured, human-shared environments. To this end, we introduce a multi-robot benchmark framework that is based on state-of-the-art open-source navigation and simulation frameworks and can use different types of robots, environments, and human motion models. We show a possible application of the benchmark framework with two different environments and three centralized coordination methods (two MAPF algorithms and a loosely coupled coordination method based on precedence constraints). We evaluate each environment for different human densities to investigate their impact on each coordination method. We also present preliminary results that show how informing each coordination method about human presence can help it find faster paths for the robots.
|
|
13:30-15:00, Paper ThBT4-CC.2 | Add to My Program |
Conflict Area Prediction for Boosting Search-Based Multi-Agent Pathfinding Algorithms |
|
Ryu, Jaesung | Chung-Ang University |
Kwon, Youngjoon | Chung-Ang University |
Yoon, Sangho | Chung-Ang University |
Lee, Kyungjae | Chung-Ang University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Integrated Planning and Learning, Multi-Robot Systems
Abstract: We address the challenge of efficiently controlling multi-agent systems, which is crucial in fields like logistics and traffic management. We propose a novel approach that combines learning-based techniques with search-based methods, focusing on enhancing conflict-based search (CBS). CBS ensures optimality but suffers from increasing complexity as the number of agents or the map size grows. To tackle this, we leverage learning-based approaches to enhance computational efficiency. By training a conflict area prediction (CAP) network, we anticipate potential conflict zones, allowing low-level path planners to explore conflict-free paths. Our experiments demonstrate the effectiveness of our method in reducing computational demands compared to existing approaches.
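The way a conflict-area prediction can steer the low-level search is easy to sketch: fold the predicted heat map into the edge cost so planners prefer cold cells. Everything below (the heat-map interface, the `astar` signature) is illustrative, not the paper's implementation:

```python
def cap_weighted_cost(conflict_heat, alpha=2.0):
    """Edge cost with a penalty proportional to the predicted conflict
    probability of the destination cell (conflict_heat: cell -> [0, 1])."""
    def cost(u, v):
        return 1.0 + alpha * conflict_heat[v]
    return cost

def plan_agent(astar, grid, start, goal, conflict_heat, constraints):
    # CBS constraints from the high level still apply unchanged; the CAP
    # term only biases the search away from likely conflict zones.
    return astar(grid, start, goal,
                 cost_fn=cap_weighted_cost(conflict_heat),
                 constraints=constraints)
```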
|
|
13:30-15:00, Paper ThBT4-CC.3 | Add to My Program |
AMSwarmX: Safe Swarm Coordination in CompleX Environments Via Implicit Non-Convex Decomposition of the Obstacle-Free Space |
|
Adajania, Vivek Kantilal | University of Toronto |
Zhou, Siqi | Technical University of Munich |
Singh, Arun Kumar | University of Tartu |
Schoellig, Angela P. | TU Munich |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Collision Avoidance
Abstract: Quadrotor motion planning in complex environments leverages the concept of a safe flight corridor (SFC) to facilitate static obstacle avoidance. Typically, SFCs are constructed through convex decomposition of the environment's free space into cuboids, convex polyhedra, or spheres. However, such SFCs can be overly conservative when dealing with a quadrotor swarm, substantially limiting the free space available for the quadrotors to coordinate. This paper presents an Alternating Minimization-based approach that does not require building a conservative free-space approximation. Instead, both static and dynamic collision constraints are treated in a unified manner. Dynamic collisions are handled based on shared position trajectories of the quadrotors. Static obstacle avoidance is coupled with distance queries from the Octomap, providing an implicit non-convex decomposition of free space. As a result, our approach is scalable to arbitrarily complex environments. Through extensive comparisons in simulation, we demonstrate a 60% improvement in success rate, an average 1.8× reduction in mission completion time, and an average 23× reduction in per-agent computation time compared to SFC-based approaches. We also experimentally validated our approach using a Crazyflie quadrotor swarm of up to 12 quadrotors in obstacle-rich environments. The code, supplementary materials, and videos are released for reference.
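The implicit, decomposition-free obstacle handling described above reduces, in sketch form, to penalizing waypoints whose queried obstacle distance falls below a safety radius; `distance_query` (e.g., backed by an OctoMap/EDT) and the penalty shape are assumptions:

```python
import numpy as np

def static_collision_cost(traj, distance_query, r_safe=0.3):
    """Smooth hinge penalty on nearest-obstacle distance: zero when every
    waypoint keeps at least r_safe clearance, quadratic otherwise."""
    d = np.array([distance_query(p) for p in traj])  # one query per waypoint
    viol = np.maximum(0.0, r_safe - d)
    return float((viol ** 2).sum())
```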
|
|
13:30-15:00, Paper ThBT4-CC.4 | Add to My Program |
Conflict-Based Model Predictive Control for Scalable Multi-Robot Motion Planning |
|
Tajbakhsh, Amirardalan | Carnegie Mellon University |
Biegler, Lorenz | Carnegie Mellon University |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Multi-Robot Systems
Abstract: This paper presents a scalable multi-robot motion planning algorithm called Conflict-Based Model Predictive Control (CB-MPC). Inspired by Conflict-Based Search (CBS), the planner leverages a modified high-level conflict tree to efficiently resolve robot-robot conflicts in the continuous space, while reasoning about each agent's kinematic and dynamic constraints and actuation limits using MPC as the low-level planner. We show that tracking high-level multi-robot plans with a vanilla MPC controller is insufficient, and results in unexpected collisions in tight navigation scenarios under realistic execution. Compared to other variations of multi-robot MPC like joint, prioritized, and distributed, we demonstrate that CB-MPC improves the executability and success rate, allows for closer robot-robot interactions, and scales better with higher numbers of robots without compromising the solution quality across a variety of environments.
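A hypothetical skeleton of the high-level loop, with MPC as the low-level planner as described above; `mpc_solve`, `find_conflict`, and the cost proxy are placeholders, not the paper's implementation:

```python
import heapq, itertools

def cb_mpc(starts, goals, mpc_solve, find_conflict):
    """CBS-style conflict tree: each node stores per-robot constraints and
    MPC-generated trajectories; conflicts spawn two children, one per robot."""
    tie = itertools.count()  # tiebreaker so heapq never compares dicts
    root = {r: [] for r in range(len(starts))}
    plans = {r: mpc_solve(r, starts[r], goals[r], root[r]) for r in root}
    openq = [(sum(len(p) for p in plans.values()), next(tie), root, plans)]
    while openq:
        _, _, constraints, plans = heapq.heappop(openq)
        conflict = find_conflict(plans)       # e.g., (r1, r2, t, region)
        if conflict is None:
            return plans                      # collision-free under execution
        r1, r2, t, region = conflict
        for r in (r1, r2):
            child = {k: list(v) for k, v in constraints.items()}
            child[r].append((t, region))      # forbid that region at time t
            new_plans = dict(plans)
            new_plans[r] = mpc_solve(r, starts[r], goals[r], child[r])
            cost = sum(len(p) for p in new_plans.values())  # crude cost proxy
            heapq.heappush(openq, (cost, next(tie), child, new_plans))
    return None
```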
|
|
13:30-15:00, Paper ThBT4-CC.5 | Add to My Program |
Db-CBS: Discontinuity-Bounded Conflict-Based Search for Multi-Robot Kinodynamic Motion Planning |
|
Moldagalieva, Akmaral | Technical University of Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Toussaint, Marc | TU Berlin |
Hoenig, Wolfgang | TU Berlin |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Motion and Path Planning, Nonholonomic Motion Planning
Abstract: This paper presents a multi-robot kinodynamic motion planner that enables a team of robots with different dynamics, actuation limits, and shapes to reach their goals in challenging environments. We solve this problem by combining Conflict-Based Search (CBS), a multi-agent path finding method, and discontinuity-bounded A*, a single-robot kinodynamic motion planner. Our method, db-CBS, operates in three levels. Initially, we compute trajectories for individual robots using a graph search that allows bounded discontinuities between precomputed motion primitives. The second level identifies inter-robot collisions and resolves them by imposing constraints on the first level. The third and final level uses the resulting solution with discontinuities as an initial guess for a joint space trajectory optimization. The procedure is repeated with a reduced discontinuity bound. Our approach is anytime, probabilistically complete, asymptotically optimal, and finds near-optimal solutions quickly. Experimental results with robot dynamics such as unicycle, double integrator, and car with trailer in different settings show that our method is capable of solving challenging tasks with a higher success rate and lower cost than the existing state-of-the-art.
|
|
13:30-15:00, Paper ThBT4-CC.6 | Add to My Program |
ALPHA: Attention-Based Long-Horizon Pathfinding in Highly-Structured Areas |
|
He, Chengyang | National University Singapore |
Yang, Tianze | National University of Singapore |
Duhan, Tanishq Harish | National University of Singapore |
Wang, Yutong | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: The multi-agent pathfinding (MAPF) problem seeks collision-free paths for a team of agents from their current positions to their pre-set goals in a known environment, and is an essential problem found at the core of many logistics, transportation, and general robotics applications. Existing learning-based MAPF approaches typically only let each agent make decisions based on a limited field-of-view (FOV) around its position, as a natural means to fix the input dimensions of its policy network. However, this often makes policies short-sighted, since agents lack the ability to perceive and plan for obstacles/agents beyond their FOV. To address this challenge, we propose ALPHA, a new framework combining ground-truth proximal (local) information and fuzzy distal (global) information to let agents sequence local decisions based on the full current state of the system and avoid such myopia. We further allow agents to make short-term predictions about each other's paths, as a means to reason about each other's path intentions, thereby enhancing the level of cooperation among agents at the whole-system level. Our neural structure relies on a Graph Transformer architecture to allow agents to selectively combine these different sources of information and reason about their inter-dependencies at different spatial scales. Our simulation experiments demonstrate that ALPHA outperforms both globally-guided MAPF solvers and communication-learning based ones, showcasing its potential for scalability in realistic deployments.
|
|
13:30-15:00, Paper ThBT4-CC.7 | Add to My Program |
Online On-Demand Multi-Robot Coverage Path Planning |
|
Mitra, Ratijit | IIT Kanpur |
Saha, Indranil | IIT Kanpur |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: We present an online centralized path planning algorithm to cover a large, complex, unknown workspace with multiple homogeneous mobile robots. Our algorithm is horizon-based, synchronous, and on-demand. The recently proposed horizon-based synchronous algorithms compute the paths for all the robots in each horizon, significantly increasing the computation burden in large workspaces with many robots. As a remedy, we propose an algorithm that computes the paths for a subset of robots that have traversed previously computed paths entirely (thus on-demand) and reuses the remaining paths for the other robots. We formally prove that the algorithm guarantees the complete coverage of the unknown workspace. Experimental results on several standard benchmark workspaces show that our algorithm scales to hundreds of robots in large complex workspaces and consistently outperforms a state-of-the-art online centralized multi-robot coverage path planning algorithm in terms of the time required to achieve complete coverage. For validation, we perform ROS+Gazebo simulations in five 2D grid benchmark workspaces with 10 Quadcopters and 10 TurtleBots, respectively. In addition, to establish its practical feasibility, we conduct one indoor experiment with two real TurtleBot2 robots and one outdoor experiment with three real Quadcopters.
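The on-demand aspect described above amounts to replanning only for robots that have exhausted their paths; a minimal loop under an assumed robot/workspace interface:

```python
def on_demand_coverage(robots, plan_paths, workspace):
    """Each horizon, only robots with fully traversed paths receive new ones;
    the remaining robots keep executing their previously computed paths."""
    while not workspace.fully_covered():
        finished = [r for r in robots if r.path_exhausted()]
        if finished:  # plan on demand instead of for the whole fleet
            new_paths = plan_paths(finished, workspace.known_map())
            for r, p in zip(finished, new_paths):
                r.assign(p)
        for r in robots:  # one synchronous motion step for every robot
            workspace.mark_covered(r.step())
```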
|
|
13:30-15:00, Paper ThBT4-CC.8 | Add to My Program |
Scalable Multi-Robot Motion Planning for Congested Environments with Topological Guidance |
|
McBeth, Courtney | University of Illinois Urbana-Champaign |
Motes, James | University of Illinois Urbana-Champaign |
Uwacu, Diane | Texas A&M University |
Morales, Marco | University of Illinois at Urbana-Champaign & Instituto Tecnológico Autónomo de México |
Amato, Nancy | University of Illinois |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: Multi-robot motion planning (MRMP) is the problem of finding collision-free paths for a set of robots in a continuous state space. The difficulty of MRMP increases with the number of robots and is exacerbated in environments with narrow passages that robots must pass through, like warehouse aisles where coordination between robots is required. In single-robot settings, topology-guided motion planning methods have shown improved performance in these constricted environments. In this work, we extend an existing topology-guided single-robot motion planning method to the multi-robot domain to leverage the improved efficiency provided by topological guidance. We demonstrate our method’s ability to efficiently plan paths in complex environments with many narrow passages, scaling to robot teams up to 25 times larger than those handled by existing methods in this class of problems. By leveraging knowledge of the topology of the environment, we also find higher-quality solutions than other methods.
|
|
13:30-15:00, Paper ThBT4-CC.9 | Add to My Program |
Mixed Integer Programming for Time-Optimal Multi-Robot Coverage Path Planning with Efficient Heuristics |
|
Tang, Jingtao | Simon Fraser University |
Ma, Hang | Simon Fraser University |
Keywords: Path Planning for Multiple Mobile Robots or Agents, Multi-Robot Systems, Motion and Path Planning
Abstract: We investigate the time-optimal Multi-Robot Coverage Path Planning (MCPP) problem for both unweighted and weighted terrains, which aims to minimize the coverage time, defined as the maximum travel time of all robots. Specifically, we focus on a reduction from the MCPP problem to the Rooted Min-Max Tree Cover (RMMTC) problem. For the first time, we propose a Mixed Integer Programming (MIP) model to optimally solve the RMMTC problem, resulting in an MCPP solution with a coverage time that is provably at most four times the optimal. Moreover, we propose two suboptimal yet effective heuristics that reduce the number of variables in the MIP model, thus improving its efficiency for large-scale MCPP problems. We show that both heuristics result in reduced-size MIP models that remain complete for RMMTC problems. Additionally, we explore the use of model optimization warm-startup to further improve the efficiency of both the original MIP model and the reduced-size MIP models. We validate the effectiveness of our MIP-based MCPP planner through experiments that compare it with two state-of-the-art MCPP planners on various instances, demonstrating a reduction in the coverage time by an average of 42.42% and 39.16% over them, respectively.
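The min-max objective at the core of such a model is linearized in the standard way: introduce a variable tau, bound every robot's tree weight by it, and minimize tau. A PuLP skeleton under that assumption, with the RMMTC-specific constraints elided:

```python
import pulp

def minmax_skeleton(num_robots, edge_weights):
    """edge_weights: list of edge costs. Only the min-max linearization is
    shown; tree-structure, rooting, and coverage constraints are omitted."""
    prob = pulp.LpProblem("rmmtc", pulp.LpMinimize)
    tau = pulp.LpVariable("tau", lowBound=0)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
         for i in range(num_robots) for j in range(len(edge_weights))}
    prob += tau  # objective: minimize the maximum per-robot tree weight
    for i in range(num_robots):
        prob += pulp.lpSum(w * x[i, j]
                           for j, w in enumerate(edge_weights)) <= tau
    # ... tree-structure, rooting, and full-coverage constraints go here ...
    return prob, x, tau
```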
|
|
ThBT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Learning II |
|
|
Chair: Li, Zhibin (Alex) | University College London |
Co-Chair: Zeng, Long | Tsinghua University |
|
13:30-15:00, Paper ThBT5-CC.1 | Add to My Program |
MOTPose: Multi-Object 6D Pose Estimation for Dynamic Video Sequences Using Attention-Based Temporal Fusion |
|
Periyasamy, Arul Selvam | University of Bonn |
Behnke, Sven | University of Bonn |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Data Sets for Robotic Vision
Abstract: Cluttered bin-picking environments are challenging for pose estimation models. Despite the impressive progress enabled by deep learning, single-view RGB pose estimation models perform poorly in cluttered dynamic environments. Incorporating the rich temporal information contained in videos of such scenes has the potential to enhance a model's ability to deal with the adverse effects of occlusion and the dynamic nature of the environment. Moreover, joint object detection and pose estimation models are better suited to leverage the co-dependent nature of the two tasks, improving the accuracy of both. To this end, we propose attention-based temporal fusion for multi-object 6D pose estimation that accumulates information across multiple frames of a video sequence. Our MOTPose method takes a sequence of images as input and performs joint object detection and pose estimation for all objects in one forward pass. It learns to aggregate both object embeddings and object parameters over multiple time steps using cross-attention-based fusion modules. We evaluate our method on the physically realistic cluttered bin-picking dataset SynPick and the YCB-Video dataset and demonstrate improved pose estimation accuracy as well as better object detection accuracy.
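The cross-attention-based accumulation described above can be sketched as a standard PyTorch module in which current-frame object embeddings attend to embeddings from earlier frames; dimensions and names are illustrative, not the authors' exact module:

```python
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Current-frame object queries attend over a memory of embeddings
    aggregated from previous frames; a residual keeps per-frame cues."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current, memory):
        # current: (B, N_obj, dim); memory: (B, T * N_obj, dim)
        fused, _ = self.attn(query=current, key=memory, value=memory)
        return self.norm(current + fused)
```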
|
|
13:30-15:00, Paper ThBT5-CC.2 | Add to My Program |
Generalizable Thermal-Based Depth Estimation Via Pre-Trained Visual Foundation Model |
|
Fan, Ruoyu | Tsinghua University |
Zhao, Wang | Tsinghua University |
Lin, Matthieu | Tsinghua University |
Wang, Qi | Guizhou University |
Liu, Yong-Jin | Tsinghua University |
Wang, Wenping | The University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Computer Vision for Automation, Sensor Fusion
Abstract: Depth estimation is a crucial task in computer vision, applicable to various domains such as 3D reconstruction, robotics, and autonomous driving. In particular, thermal-based depth estimation has unique advantages, including night-time vision. However, existing depth estimation methods still struggle to generalize robustly due to limited data resources and the spectral differences between thermal and RGB images. In this paper, we present a self-supervised approach that enhances thermal-based depth estimation by leveraging pre-trained visual models initially designed for RGB data. In detail, we design a novel two-stage training strategy, incorporating Low-rank Adapters and Convolutional Adapters, which not only significantly improves accuracy and robustness but also enables impressive zero-shot generalization capabilities. Our method outperforms existing thermal-based depth estimation models, opening new possibilities for cross-modal applications in computer vision and robotics research.
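A generic low-rank adapter of the kind mentioned above wraps a frozen pre-trained layer and learns only a small low-rank update; the rank, scaling, and initialization below are common conventions, not the paper's exact settings:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank residual;
    zero-initializing the up-projection makes the adapter start as identity."""
    def __init__(self, base: nn.Linear, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep RGB-pretrained weights fixed
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```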
|
|
13:30-15:00, Paper ThBT5-CC.3 | Add to My Program |
OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-Assisted Surgery |
|
Bai, Long | The Chinese University of Hong Kong |
Wang, Guankun | The Chinese University of Hong Kong |
Wang, Jie | Beijing Institute of Technology |
Yang, Xiaoxiao | Qilu Hospital of Shandong University |
Gao, Huxin | National University of Singapore |
Liang, Xin | Tongji University |
Wang, An | The Chinese University of Hong Kong |
Islam, Mobarakol | University College London |
Ren, Hongliang | Chinese Univ Hong Kong (CUHK) & National Univ Singapore (NUS) |
Keywords: Deep Learning for Visual Perception, Computer Vision for Medical Robotics, Surgical Robotics: Planning
Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from classes unseen during the training phase. To tackle this problem, we introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework. Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space. Additionally, we address the issue of over-confidence in the closed set by refining model calibration, avoiding misclassification of unknown classes as known ones. To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset. We also collect a novel dataset on endoscopic submucosal dissection for surgical activity tasks. Extensive comparisons and ablation experiments on these datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. Our proposed solution can effectively address the challenges of real-world surgical scenarios. Our code is publicly accessible at github.com/longbai1006/OSSAR.
|
|
13:30-15:00, Paper ThBT5-CC.4 | Add to My Program |
FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects |
|
Lunayach, Mayank | Georgia Institute of Technology |
Zakharov, Sergey | Toyota Research Institute |
Chen, Dian | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Kira, Zsolt | Georgia Institute of Technology |
Irshad, Muhammad Zubair | Georgia Institute of Technology |
Keywords: Deep Learning for Visual Perception, RGB-D Perception, Recognition
Abstract: In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer from inefficiencies arising from non-end-to-end processing, reliance on separate models for different object categories, and slow surface extraction during the training of implicit reconstruction models; thus hindering both the speed and real-world applicability of the 3D recognition process. Our proposed method leverages a multi-stage training pipeline, designed to efficiently transfer synthetic performance to the real-world domain. This approach is achieved through a combination of 2D and 3D supervised losses during the synthetic domain training, followed by the incorporation of 2D supervised and 3D self-supervised losses on real-world data in two additional learning stages. By adopting this comprehensive strategy, our method successfully overcomes the aforementioned limitations and outperforms existing self-supervised 6D pose and size estimation baselines on the NOCS test-set with a 16.4% absolute improvement in mAP for 6D pose estimation while running in near real-time at 5 Hz. Project page: https://fsd6d.github.io/
|
|
13:30-15:00, Paper ThBT5-CC.5 | Add to My Program |
Keypoint Detection and Tracking in Low-Quality Image Frames with Events |
|
Wang, Xiangyuan | Wuhan University |
Chen, Kuangyi | Wuhan University |
Yang, Wen | Wuhan University |
Yu, Lei | Wuhan University |
Xing, Yannan | SynSense Co. Ltd |
Yu, Huai | Wuhan University |
Keywords: Deep Learning for Visual Perception, Sensor Fusion
Abstract: Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, their performance in practical applications is limited by the inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose FE-DeTr, a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams. The network leverages temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods. Our code, pre-trained models, and dataset are available at https://github.com/yuyangpoi/FE-DeTr.
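A spatio-temporal nearest-neighbor association of the kind mentioned above can be sketched as greedy matching within a time-scaled radius; the speed bound and the greedy one-to-one policy are assumptions for illustration:

```python
import numpy as np

def track_keypoints(prev_kps, curr_kps, dt, max_speed_px=120.0):
    """Match each previous keypoint (N,2) to the nearest current detection
    (M,2) within a radius that grows with the elapsed time dt (seconds)."""
    radius = max_speed_px * dt
    matches, taken = [], set()
    for i, p in enumerate(prev_kps):
        d = np.linalg.norm(curr_kps - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= radius and j not in taken:  # greedy one-to-one matching
            matches.append((i, j))
            taken.add(j)
    return matches
```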
|
|
13:30-15:00, Paper ThBT5-CC.6 | Add to My Program |
TiV-ODE: A Neural ODE-Based Approach for Controllable Video Generation from Text-Image Pairs |
|
Xu, Yucheng | University of Edinburgh |
Li, Nanbo | University of Edinburgh |
Goel, Arushi | University of Edinburgh |
Yao, Zonghai | UMass Amherst |
Guo, Zijian | Boston University |
Kasaei, Mohammadreza | University of Edinburgh |
Kasaei, Hamidreza | University of Groningen |
Li, Zhibin (Alex) | University College London |
Keywords: Deep Learning for Visual Perception, Visual Learning, Data Sets for Robotic Vision
Abstract: Videos capture the evolution of continuous dynamical systems over time in the form of discrete image sequences. Recently, video generation models have been widely used in robotic research. However, generating controllable videos from image-text pairs is an important yet underexplored research topic in both robotic and computer vision communities. This paper introduces an innovative and elegant framework named TiV-ODE, formulating this task as modeling the dynamical system in a continuous space. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to model the complex dynamical system depicted by videos as a nonlinear ordinary differential equation. The resulting framework offers control over the generated videos' dynamics, content, and frame rate, a feature not provided by previous methods. Experiments demonstrate the ability of the proposed method to generate highly controllable and visually consistent videos and its capability of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
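The continuous-time formulation above implies that frame rate is just the choice of evaluation times handed to the ODE solver; a sketch assuming the torchdiffeq package, with the latent-to-frame decoder omitted:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed third-party dependency

class LatentDynamics(nn.Module):
    """Latent vector field f(t, z) integrated by the ODE solver."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t, z):
        return self.net(z)

def rollout(z0, dynamics, n_frames, fps):
    """Frame-rate control falls out of the evaluation grid: denser time
    stamps yield more frames from the same latent trajectory."""
    ts = torch.arange(n_frames, dtype=torch.float32) / fps
    return odeint(dynamics, z0, ts)  # (n_frames, B, dim) latent states
```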
|
|
13:30-15:00, Paper ThBT5-CC.7 | Add to My Program |
TVFusionGAN: Thermal-Visible Image Fusion Based on Multi-Level Adversarial Network Strategy |
|
Lu, Guoyu | University of Georgia |
Keywords: Deep Learning for Visual Perception, Visual Learning, Object Detection, Segmentation and Categorization
Abstract: Thermal images excel at distinguishing objects from the background in low-light or nighttime conditions due to thermal radiation differences. However, they lack texture compared to visible images. Conversely, visible images retain more texture information at higher resolutions, particularly during the daytime, but perform poorly at night. To address the limitations of both image modalities, recent methods have employed traditional fusion techniques or fusion networks to generate fused images that combine thermal and visible properties. This paper introduces an end-to-end fusion network that leverages generative adversarial networks (GANs) to fuse salient image components from the two modalities. Our network comprises a generator and two discriminators. The generator aims to produce fusion images with salient objects using a specially designed CIoU loss. The two adversarial networks ensure that the fused images are salient both in a holistic sense and at a local scale. One discriminator encourages the fused images to resemble visible images holistically, while the other ensures that the targeted objects in the fused images are as salient as in thermal images. Our method effectively preserves the thermal radiation of salient objects in infrared images while incorporating the textures of visible images.
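The two-critic setup described above yields, on the generator side, one adversarial term per discriminator; a hedged sketch with illustrative module and tensor names, not the paper's exact losses:

```python
import torch
import torch.nn.functional as F

def generator_adv_loss(fused, saliency_masks, d_global, d_local):
    """d_global judges the whole fused image against visible-image statistics;
    d_local judges the masked salient regions for thermal-like saliency."""
    logits_g = d_global(fused)                  # holistic realism
    logits_l = d_local(fused * saliency_masks)  # local saliency
    loss_g = F.binary_cross_entropy_with_logits(logits_g, torch.ones_like(logits_g))
    loss_l = F.binary_cross_entropy_with_logits(logits_l, torch.ones_like(logits_l))
    return loss_g + loss_l
```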
|
|
13:30-15:00, Paper ThBT5-CC.8 | Add to My Program |
You Only Label Once: 3D Box Adaptation from Point Cloud to Image with Semi-Supervised Learning |
|
Shi, Jieqi | The Hong Kong University of Science and Technology |
Li, Peiliang | HKUST, Robotics Institute |
Chen, Xiaozhi | DJI |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Visual Learning, RGB-D Perception
Abstract: The image-based 3D object detection task expects that the predicted 3D bounding box has a “tightness” projection (also referred to as a cuboid) to facilitate 2D-based training, which fits the object contour well on the image while still remaining reasonable in 3D space. These requirements bring significant challenges to the annotation. Simply projecting the Lidar-labeled 3D boxes onto the image leads to non-trivial misalignment, while directly drawing a cuboid on the image cannot access the original 3D information. In this work, we propose a learning-based 3D box adaptation approach that automatically adjusts minimal parameters of the 360° Lidar 3D bounding box to perfectly fit the image appearance of panoramic cameras. With only a few 2D box annotations as guidance during the training phase, our network can produce accurate image-level cuboid annotations with 3D properties from Lidar boxes. We call our method “you only label once”, which means labeling on the point cloud once and automatically adapting to all surrounding cameras. Our refinement balances accuracy and efficiency well and dramatically reduces the labeling effort for accurate cuboid annotation. Extensive experiments on the public Waymo and NuScenes datasets show that our method can produce human-level cuboid annotations on the image without manual adjustment, and can accelerate monocular 3D training tasks.
|
|
13:30-15:00, Paper ThBT5-CC.9 | Add to My Program |
Self-Supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry |
|
Chidlovskii, Boris | Naver Labs Europe |
Antsfeld, Leonid | Naver Labs Europe |
Keywords: Deep Learning for Visual Perception, Visual Learning, Vision-Based Navigation
Abstract: For the task of simultaneous monocular depth and visual odometry estimation, we propose two-step learning of self-supervised transformer-based models. We first benefit from generic pretrained models oriented towards understanding 3D geometry, followed by self-supervised finetuning on non-annotated videos. We show that our self-supervised models can reach state-of-the-art performance 'without bells and whistles', using standard components such as visual transformers, dense prediction transformers, and adapters. We demonstrate the effectiveness of the proposed method by running evaluations on six benchmark datasets, both static and dynamic, indoor and outdoor, with synthetic and real images. For all datasets, our method outperforms state-of-the-art methods, in particular for the depth prediction task.
|
|
ThBT6-CC Oral Session, CC-414 |
Add to My Program |
Computer Vision for Transportation |
|
|
Chair: de Croon, Guido | TU Delft |
Co-Chair: Valada, Abhinav | University of Freiburg |
|
13:30-15:00, Paper ThBT6-CC.1 | Add to My Program |
Amodal Optical Flow |
|
Luz, Maximilian | University of Freiburg |
Mohan, Rohit | University of Freiburg |
Sekkat, Ahmed Rida | IAV GmbH |
Sawade, Oliver | IAV GmbH |
Matthes, Elmar | IAV GmbH |
Brox, Thomas | University of Freiburg |
Valada, Abhinav | University of Freiburg |
Keywords: Data Sets for Robotic Vision, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
|
|
13:30-15:00, Paper ThBT6-CC.2 | Add to My Program |
AutoGraph: Predicting Lane Graphs from Traffic Observations |
|
Zürn, Jannik | University of Freiburg |
Posner, Ingmar | Oxford University |
Burgard, Wolfram | University of Technology Nuremberg |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Lane graph estimation is a long-standing problem in the context of autonomous driving. Previous works aimed at solving this problem by relying on large-scale, hand-annotated lane graphs, introducing a data bottleneck for training models to solve this task. To overcome this limitation, we propose to use the motion patterns of traffic participants as lane graph annotations. In our AutoGraph approach, we employ a pre-trained object tracker to collect the tracklets of traffic participants such as vehicles and trucks. Based on the location of these tracklets, we predict the successor lane graph from an initial position using overhead RGB images only, not requiring any human supervision. In a subsequent stage, we show how the individual successor predictions can be aggregated into a consistent lane graph. We demonstrate the efficacy of our approach on the UrbanLaneGraph dataset and perform extensive quantitative and qualitative evaluations, indicating that AutoGraph is on par with models trained on hand-annotated graph data. Model and dataset will be made available at http://autograph.cs.uni-freiburg.de/.
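The aggregation step, turning raw tracklets into a consistent successor graph, can be approximated by snapping tracklet points to grid cells and counting observed cell-to-cell transitions. The grid size and the support threshold below are illustrative assumptions, not AutoGraph's actual aggregation procedure.

```python
import numpy as np

def tracklets_to_lane_graph(tracklets, cell=2.0, min_support=1):
    """Aggregate tracklets into a directed successor graph over grid cells."""
    def to_cell(p):
        return (int(np.floor(p[0] / cell)), int(np.floor(p[1] / cell)))

    edges = {}
    for track in tracklets:
        cells = [to_cell(p) for p in track]
        for u, v in zip(cells[:-1], cells[1:]):
            if u != v:                              # record only cell transitions
                edges[(u, v)] = edges.get((u, v), 0) + 1
    # drop weakly supported edges to suppress tracker noise
    return {e: n for e, n in edges.items() if n >= min_support}

tracklets = [np.array([[0.0, 0.0], [2.5, 0.2], [5.1, 0.4], [7.8, 1.9]]),
             np.array([[0.2, 0.1], [2.6, 0.3], [5.0, 0.5]])]
for (u, v), n in tracklets_to_lane_graph(tracklets).items():
    print(u, "->", v, f"(observed {n}x)")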
|
|
13:30-15:00, Paper ThBT6-CC.3 | Add to My Program |
3DSF-MixNet: Mixer-Based Symmetric Scene Flow Estimation from 3D Point Clouds |
|
Wang, Shuaijun | Southern University of Science and Technology |
Gao, Rui | Southern University of Science and Technology |
Han, Ruihua | University of Hong Kong |
Hao, Qi | Southern University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Scene flow estimation aims to accurately recover the motion of 3D points, which poses challenges such as mis-registration, object occlusions, and non-uniform upsampling. This paper introduces a scene flow estimation framework featuring a unified scene flow estimator, a symmetric cost volume approach, and a geometric/semantic feature based upsampling strategy. The novelty of this work is threefold: (1) developing a novel progressive framework which integrates the cost volume module and the scene flow estimator, enhancing scene flow estimation; (2) developing a symmetric inter-frame correlation feature extraction method through cost volume estimation using MLP-Mixer operations; (3) developing an upsampling strategy based on both the semantic and geometric feature similarities between sparse and dense samples. Experimental results show that our method outperforms state-of-the-art baselines, especially under challenging conditions, with improvements of up to 0.1094 m / 0.089 m / 0.091 m in EPE3D, 54.23% / 53.67% / 74.1% in AS, 32.75% / 21.87% / 40.25% in AR, and 70.981% / 58.06% / 43.56% in outliers, when tested on the FlyingThings3D (FT3D_S, FT3D_H) and KITTI_H datasets, respectively.
|
|
13:30-15:00, Paper ThBT6-CC.4 | Add to My Program |
CVFormer: Learning Circum-View Representation and Consistency Constraints for Vision-Based Occupancy Prediction Via Transformers |
|
Bai, Zhengqi | Shanghai Institute of Microsystem and Information Technology |
Shi, Wenjun | Shanghai Institute of Microsystem and Information Technology |
Zhu, Dongchen | Shanghai Institute of Microsystem and Information Technology, Chi |
Kang, HanLong | Lotus Robotics |
Zhang, Guanghui | Shanghai Institute of Microsystem and Information Technology, Ch |
Ye, Gang | Lotus Robotics |
Xiao, Yang | Lotus Technology Ltd |
Wang, Lei | Shanghai Institute of Microsystem and Information Technology, Ch |
Zhang, Xiaolin | Shanghai Institute of Microsystem and Information Technology, Chi |
Li, Bo | Lotus Technology Ltd |
Li, Jiamao | Shanghai Institute of Microsystem and Information Technology, Chi |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: With increasing demands for perception accuracy in autonomous driving, there is a growing focus on fine-grained 3D semantic occupancy prediction. Effectively representing detailed three-dimensional scenes has become a significant challenge in the development of this task. In this paper, we present a novel transformer-based framework named CVFormer, which leverages two-dimensional circum-views from the ego vehicle to extract three-dimensional features of the surrounding environment. Circum-views provide a novel solution for effectively representing dense and fine-grained scenes. Specifically, a multi-attention module, CTMA, is designed to fuse temporal features from circum-views, fully exploiting the spatiotemporal correlations between frames and capturing more comprehensive cues. Furthermore, a novel 2D projection constraint is established by observing objects from different perspective directions, and multiple 3D constraints based on object invariance and semantic consistency are also imposed to supervise the network, which enhances its scene understanding. Experimental results on the nuScenes dataset demonstrate that the proposed CVFormer clearly outperforms existing methods for occupancy prediction.
|
|
13:30-15:00, Paper ThBT6-CC.5 | Add to My Program |
Lightweight Event-Based Optical Flow Estimation Via Iterative Deblurring |
|
Wu, Yilun | TU Delft |
Paredes-Valles, Federico | Delft University of Technology |
de Croon, Guido | TU Delft |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Inspired by frame-based methods, state-of-the-art event-based optical flow networks rely on the explicit construction of correlation volumes, which are expensive to compute and store, rendering them unsuitable for robotic applications with limited compute and energy budgets. Moreover, correlation volumes scale poorly with resolution, prohibiting them from estimating high-resolution flow. We observe that the spatiotemporally continuous traces of events provide a natural search direction for seeking pixel correspondences, obviating the need to rely on gradients of explicit correlation volumes as such search directions. We introduce IDNet (Iterative Deblurring Network), a lightweight yet high-performing event-based optical flow network that directly estimates flow from event traces without using correlation volumes. We further propose two iterative update schemes: "ID", which iterates over the same batch of events, and "TID", which iterates over time with streaming events in an online fashion. Our top-performing model (ID) sets a new state of the art on the DSEC benchmark. Meanwhile, the base model (TID) is competitive with prior art while using 80% fewer parameters, consuming a 20x smaller memory footprint, and running 40% faster on the NVIDIA Jetson Xavier NX. Furthermore, the "TID" scheme is even more efficient, offering an additional 5x faster inference speed and 8 ms ultra-low latency at the cost of only a 9% performance drop, making it the only model in the current literature capable of real-time operation while maintaining decent performance.
|
|
13:30-15:00, Paper ThBT6-CC.6 | Add to My Program |
ActFormer: Scalable Collaborative Perception Via Active Queries |
|
Huang, Suozhi | Tsinghua University |
Zhang, Juexiao | New York University |
Li, Yiming | New York University |
Feng, Chen | New York University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Computer Vision for Automation
Abstract: Collaborative perception leverages rich visual observations from multiple robots to extend a single robot's perception ability beyond its field of view. Many prior works receive messages broadcast from all collaborators, leading to a scalability challenge when dealing with a large number of robots and sensors. In this work, we aim to address scalable camera-based collaborative perception with a Transformer-based architecture. Our key idea is to enable a single robot to intelligently discern the relevance of the collaborators and their associated cameras according to a learned spatial prior. This proactive understanding of the visual features' relevance does not require the transmission of the features themselves, enhancing both communication and computation efficiency. Specifically, we present ActFormer, a Transformer that learns bird's eye view (BEV) representations by using predefined BEV queries to interact with multi-robot multi-camera inputs. Each BEV query can actively select relevant cameras for information aggregation based on pose information, instead of interacting with all cameras indiscriminately. Experiments on the V2X-Sim dataset demonstrate that ActFormer improves the detection performance from 29.89% to 45.15% in terms of AP@0.7 with about 50% fewer queries, showcasing the effectiveness of ActFormer in multi-agent collaborative 3D object detection.
|
|
13:30-15:00, Paper ThBT6-CC.7 | Add to My Program |
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models |
|
Nakashima, Kazuto | Kyushu University |
Kurazume, Ryo | Kyushu University |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Representation Learning
Abstract: Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots, such as scalable simulation, scene manipulation, and sparse-to-dense completion of LiDAR point clouds. While existing approaches have demonstrated the feasibility of image-based LiDAR data generation using deep generative models, they still struggle with fidelity and training stability. In this work, we present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds based on the image representation of range and reflectance intensity. Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks in recent years. To effectively train DDPMs in the LiDAR domain, we first conduct an in-depth analysis of data representation, loss functions, and spatial inductive biases. Leveraging our R2DM model, we also introduce a flexible LiDAR completion pipeline based on the powerful capabilities of DDPMs. We demonstrate that our method surpasses existing methods in generation tasks on the KITTI-360 and KITTI-Raw datasets, as well as in the completion task on the KITTI-360 dataset. Our project page can be found at https://kazuto1011.github.io/r2dm.
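The training objective behind DDPMs is compact enough to sketch: noise a clean sample with the closed-form forward process and regress the injected noise. Below, a 2-channel tensor stands in for the range/reflectance image representation, and a toy convolutional net (which, unlike a real DDPM backbone, ignores the timestep) stands in for the denoiser; the schedule is the common linear one, not necessarily R2DM's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

eps_model = nn.Sequential(                   # toy stand-in for the denoising network
    nn.Conv2d(2, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 2, 3, padding=1))

x0 = torch.randn(4, 2, 64, 256)              # batch of range + reflectance "images"
t = torch.randint(0, T, (4,))                # random timestep per sample
eps = torch.randn_like(x0)
ab = alphas_bar[t].view(-1, 1, 1, 1)
x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # forward process q(x_t | x_0)
loss = F.mse_loss(eps_model(x_t), eps)           # simple denoising (epsilon) loss
loss.backward()
print(loss.item())
```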
|
|
13:30-15:00, Paper ThBT6-CC.8 | Add to My Program |
Multi-Task Learning for Real-Time Autonomous Driving Leveraging Task-Adaptive Attention Generator |
|
Choi, Wonhyeok | DGIST |
Shin, Mingyu | Daegu Gyeongbuk Institute of Science and Technology |
Lee, Hyukzae | Hyundai Motor Company |
Cho, Jaehoon | Hyundai Motor Company R&D Division |
Park, Jaehyeon | Hyundai Motor |
Im, Sunghoon | DGIST |
Keywords: Deep Learning for Visual Perception, Computer Vision for Transportation, Vision-Based Navigation
Abstract: Real-time processing is crucial in autonomous driving systems due to the imperative of instantaneous decision-making and rapid response. In real-world scenarios, autonomous vehicles are continuously tasked with interpreting their surroundings, analyzing intricate sensor data, and making decisions within split seconds to ensure safety through numerous computer vision tasks. In this paper, we present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation. To counter the challenge of negative transfer, the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator. This generator is designed to automatically discern interrelations across the three tasks and arrange the task-sharing pattern, all while leveraging the efficiency of the hard-parameter sharing approach. To the best of our knowledge, the proposed model is pioneering in its capability to concurrently handle multiple tasks, notably 3D object detection, while maintaining real-time processing speeds. Our rigorously optimized network, when tested on the Cityscapes-3D dataset, consistently outperforms various baseline models. Moreover, an in-depth ablation study substantiates the efficacy of the methodologies integrated into our framework.
|
|
13:30-15:00, Paper ThBT6-CC.9 | Add to My Program |
LiDARFormer: A Unified Transformer-Based Multi-Task Network for LiDAR Perception |
|
Zhou, Zixiang | University of Central Florida |
Ye, Dongqiangzi | Tusimple |
Chen, Weijia | N/A |
Xie, Yufei | Tusimple |
Wang, Yu | N/A |
Wang, Panqu | ZERON |
Foroosh, Hassan | University of Central Florida |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: There is a recent need in the LiDAR perception field for unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and the Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it achieves state-of-the-art performance on both tasks.
|
|
ThBT7-CC Oral Session, CC-416 |
Add to My Program |
Machine Learning for Robot Control I |
|
|
Chair: Falotico, Egidio | Scuola Superiore Sant'Anna |
Co-Chair: Controzzi, Marco | Scuola Superiore Sant'Anna |
|
13:30-15:00, Paper ThBT7-CC.1 | Add to My Program |
Adaptive Robot-Human Handovers with Preference Learning |
|
Perovic, Gojko | Scuola Superiore Sant'Anna |
Iori, Francesco | Scuola Superiore Sant'Anna |
Mazzeo, Angela | Scuola Superiore Sant'Anna |
Controzzi, Marco | Scuola Superiore Sant'Anna |
Falotico, Egidio | Scuola Superiore Sant'Anna |
Keywords: Machine Learning for Robot Control, Human-Robot Collaboration, Human-Aware Motion Planning
Abstract: This paper proposes an adaptive method for robot-to-human handovers under different scenarios. The method combines Dynamic Movement Primitives (DMP) with Preference Learning (PL) to generate online trajectories that are reactive to human motion, modulating the speed of the robot. PL tunes the coupling parameters of the DMP, tailoring the interaction to each participant and enabling qualitative analysis of user preferences. A simulation of an interaction-constrained learning task with different optimization techniques is performed to determine an appropriate learning approach for the handover task. The validity of the approach is demonstrated through experiments with participants on two handover tasks, with results indicating that the proposed method leads to seamless and pleasurable interactions.
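A DMP reduces each degree of freedom to a stable second-order system plus a learned forcing term, and the speed modulation tuned by preference learning enters through the temporal scaling (here the single parameter tau). The sketch below is a generic 1-D discrete DMP rollout with standard gains, not the paper's coupled formulation.

```python
import numpy as np

def dmp_rollout(y0, g, forcing, tau=1.0, dt=0.01, alpha=25.0, beta=6.25,
                alpha_x=3.0, T=2.0):
    """Euler rollout of a 1-D discrete DMP from y0 toward goal g.

    forcing(x) is the learned shape term over the canonical phase x;
    tau > 1 slows execution, tau < 1 speeds it up.
    """
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(T / dt)):
        f = forcing(x) * x * (g - y0)                 # standard forcing scaling
        dz = (alpha * (beta * (g - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return np.array(traj)

traj = dmp_rollout(y0=0.0, g=0.3, forcing=lambda x: 0.0, tau=1.2)
print(traj[0], traj[-1])   # the goal is approached as the phase x decays
```

In the paper, the DMP coupling parameters that react to human motion are the quantities tuned per participant by preference learning.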
|
|
13:30-15:00, Paper ThBT7-CC.2 | Add to My Program |
NaviFormer: A Data-Driven Robot Navigation Approach Via Sequence Modeling and Path Planning with Safety Verification |
|
Zhang, Xuyang | University of Science and Technology of China |
Feng, Ziyang | University of Science and Technology of China |
Qiu, Quecheng | School of Data Science, USTC, Hefei 230026, China |
Chen, Yu'an | University of Science and Technology of China |
Hua, Bei | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Keywords: Machine Learning for Robot Control, Reinforcement Learning
Abstract: Reinforcement learning has shown great potential for improving the performance of robot navigation. In response to the increasing deployment of mobile robots across various scenarios, a data-driven navigation paradigm with safety verification is preferred, in which one can train RL algorithms with large amounts of prior data, keep learning continuously, and ensure safe navigation in applications. Conventional end-to-end reinforcement learning navigation paradigms have encountered multiple challenges in meeting these demands. In this work, we introduce a novel robot navigation approach termed NaviFormer. This approach handles navigation tasks based on sequence modeling to obtain data-driven ability, and integrates rule-based verification for safety assurance. We conduct a series of experiments to validate the data-driven ability of our approach and to compare it with existing navigation methods. We also perform quantitative tests on a real-world robot platform, TurtleBot. The experimental results show our method's strong data-driven ability and highlight its superior arrival rate and generalization compared to other state-of-the-art methods such as the PPO-based navigation method.
|
|
13:30-15:00, Paper ThBT7-CC.3 | Add to My Program |
Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models |
|
Zhang, Yu | Technical University of Munich |
Wen, Long | Technical University of Munich |
Yao, Xiangtong | Technical University of Munich |
Bing, Zhenshan | Technical University of Munich |
Kong, Linghuan | University of Macau |
He, Wei | University of Science and Technology Beijing |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Keywords: Machine Learning for Robot Control, Robot Safety, Collision Avoidance
Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Subsequently, the hyperparameters of the Gaussian model are trained with a specially designed compound kernel, and the Gaussian model's online inference capability and computational efficiency are strengthened by updating a solitary inducing point derived from new samples, in conjunction with the learned hyperparameters. In the second phase, we propose a safety filter based on high-order control barrier functions (HOCBFs), synergized with the previously trained learning model. By leveraging the compound kernel from the first phase, we effectively address the inherent limitations of GPs in handling high-dimensional problems for real-time applications. The derived controller ensures a rigorous lower bound on the probability of satisfying the safety specification. Finally, the efficacy of our proposed algorithm is demonstrated through real-time obstacle-avoidance experiments executed on both a simulation platform and a real-world 7-DOF robot.
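The safety-filter idea, minimally modifying a nominal command so a barrier condition holds, is easiest to see in the first-order, known-dynamics case, where the one-constraint QP has a closed form. The sketch below uses a plain (not high-order) CBF for a single integrator avoiding a disc; the GP-learned dynamics and the HOCBF machinery of the paper are deliberately left out.

```python
import numpy as np

def cbf_filter(x, u_des, x_obs, r, alpha=1.0):
    """Minimal-intervention safety filter for x_dot = u with h(x) = |x - x_obs|^2 - r^2.

    Enforces grad_h(x) . u >= -alpha * h(x); for a single affine constraint the
    QP solution is a projection of u_des onto the constraint half-space.
    """
    h = np.dot(x - x_obs, x - x_obs) - r ** 2
    a = 2.0 * (x - x_obs)                 # gradient of h at x
    b = -alpha * h
    if a @ u_des >= b:                    # nominal command already satisfies the CBF
        return u_des
    return u_des + (b - a @ u_des) / (a @ a) * a

x = np.array([1.5, 0.0])                  # robot state; obstacle of radius 1 at origin
u = cbf_filter(x, u_des=np.array([-1.0, 0.0]), x_obs=np.zeros(2), r=1.0)
print(u)   # the command toward the obstacle is deflected to the constraint boundary
```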
|
|
13:30-15:00, Paper ThBT7-CC.4 | Add to My Program |
DeformNet: Latent Space Modeling and Dynamics Prediction for Deformable Object Manipulation |
|
Li, Chenchang | Tsinghua University |
Ai, Zihao | Tsinghua University |
Wu, Tong | Tsinghua University |
Li, Xiaosa | Tsinghua University |
Ding, Wenbo | Tsinghua University |
Xu, Huazhe | Tsinghua University |
Keywords: Machine Learning for Robot Control, Representation Learning, AI-Based Methods
Abstract: Manipulating deformable objects is a ubiquitous task in household environments, demanding adequate representation and accurate dynamics prediction due to the objects' infinite degrees of freedom. This work proposes DeformNet, which utilizes latent space modeling with a learned 3D representation model to tackle these challenges effectively. The proposed representation model combines a PointNet encoder and a conditional neural radiance field (NeRF), facilitating a thorough acquisition of object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts the transformation of the latent representation over time. Extensive simulation experiments with diverse objectives demonstrate the generalization capabilities of DeformNet for various deformable object manipulation tasks, even in the presence of previously unseen goals. Finally, we deploy DeformNet on an actual UR5 robotic arm to demonstrate its capability in real-world scenarios.
|
|
13:30-15:00, Paper ThBT7-CC.5 | Add to My Program |
Actor-Critic Model Predictive Control |
|
Romero, Angel | University of Zurich |
Song, Yunlong | University of Zurich |
Scaramuzza, Davide | University of Zurich |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Aerial Systems: Applications
Abstract: An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior.
|
|
13:30-15:00, Paper ThBT7-CC.6 | Add to My Program |
Tractable Joint Prediction and Planning Over Discrete Behavior Modes for Urban Driving |
|
Villaflor, Adam | CMU |
Yang, Brian | University of California, Berkeley |
Su, Huangyuan | Harvard University |
Fragkiadaki, Aikaterini | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Schneider, Jeff | Carnegie Mellon University |
Keywords: Machine Learning for Robot Control, Deep Learning Methods
Abstract: Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches is still an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining. We consider recent trajectory prediction approaches which leverage learned anchor embeddings to predict multiple trajectories, finding that these anchor embeddings can parameterize discrete and distinct modes representing high-level driving behaviors. We propose to perform fully reactive closed-loop planning over these discrete latent modes, allowing us to tractably model the causal interactions between agents at each step. We validate our approach on a suite of more dynamic merging scenarios, finding that our approach avoids the frozen robot problem which is pervasive in conventional planners. Our approach also outperforms the previous state-of-the-art in CARLA on challenging dense traffic scenarios when evaluated at realistic speeds.
|
|
13:30-15:00, Paper ThBT7-CC.7 | Add to My Program |
GenDOM: Generalizable One-Shot Deformable Object Manipulation with Parameter-Aware Policy |
|
Kuroki, So | The University of Tokyo |
Guo, Jiaxian | The University of Tokyo |
Matsushima, Tatsuya | The University of Tokyo |
Okubo, Takuya | University of Tokyo |
Kobayashi, Masato | Osaka University |
Ikeda, Yuya | University of Tokyo |
Takanami, Ryosuke | The University of Tokyo |
Yoo, Paul | The University of Tokyo |
Matsuo, Yutaka | The University of Tokyo |
Iwasawa, Yusuke | The University of Tokyo |
Keywords: Machine Learning for Robot Control, Transfer Learning, Deep Learning in Grasping and Manipulation
Abstract: Due to the inherent uncertainty in their deformability during motion, previous methods for manipulating deformable objects, such as ropes and cloth, often required hundreds of real-world demonstrations to train a manipulation policy for each object, which hinders their application in our ever-changing world. To address this issue, we introduce GenDOM, a framework that allows the manipulation policy to handle different deformable objects with only a single real-world demonstration. To achieve this, we augment the policy by conditioning it on deformable object parameters and training it with a diverse range of simulated deformable objects so that the policy can adjust actions based on different object parameters. At inference time, given a new object, GenDOM can estimate the deformable object parameters with only a single real-world demonstration by minimizing the disparity between the grid density of point clouds of real-world demonstrations and simulations in a differentiable physics simulator. Empirical validations on both simulated and real-world object manipulation setups clearly show that our method can manipulate different objects with a single demonstration and significantly outperforms the baseline in both environments (a 62% improvement for in-domain ropes and a 15% improvement for out-of-distribution ropes in simulation, as well as a 26% improvement for ropes and a 50% improvement for cloths in the real world), demonstrating the effectiveness of our approach in one-shot deformable object manipulation.
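The parameter-estimation objective, matching grid densities of real and simulated point clouds, can be written down directly. The numpy version below only illustrates the disparity being minimized; in the paper the simulator is differentiable so the parameters can be optimized by gradient descent, whereas this sketch is not differentiable and all sizes are assumptions.

```python
import numpy as np

def grid_density(points, lo, hi, bins=16):
    """Normalized occupancy histogram of a point cloud over a fixed 3D grid."""
    hist, _ = np.histogramdd(points, bins=(bins, bins, bins),
                             range=list(zip(lo, hi)))
    return hist / max(points.shape[0], 1)

def density_disparity(real_pts, sim_pts, lo, hi):
    """L1 disparity between real and simulated grid densities."""
    return np.abs(grid_density(real_pts, lo, hi)
                  - grid_density(sim_pts, lo, hi)).sum()

lo, hi = np.zeros(3), np.ones(3)             # workspace bounds (assumed)
real = np.random.rand(500, 3)                # points from the real demonstration
sim = np.random.rand(500, 3) * 0.9           # points from one simulated rollout
print(density_disparity(real, sim, lo, hi))  # minimized over object parameters
```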
|
|
ThBT8-CC Oral Session, CC-418 |
Add to My Program |
Datasets for Robotic Vision |
|
|
Co-Chair: Chen, Yi-Ting | National Yang Ming Chiao Tung University |
|
13:30-15:00, Paper ThBT8-CC.1 | Add to My Program |
RiskBench: A Scenario-Based Benchmark for Risk Identification |
|
Kung, Chi-Hsi | National Tsing Hua University |
Pao, Pang-Yuan | National Yang Ming Chiao Tung University |
Chen, Pin-Lun | National Yang Ming Chiao Tung University |
Lu, Hsin-Cheng | National Taiwan University |
Yang, Chieh Chi | National Yang Ming Chiao Tung University |
Lu, Shu-Wei | National Yang Ming Chiao Tung University |
Chen, Yi-Ting | National Yang Ming Chiao Tung University |
Keywords: Data Sets for Robotic Vision, Collision Avoidance, Motion and Path Planning
Abstract: Intelligent driving systems aim to achieve a zero-collision mobility experience, requiring interdisciplinary efforts to enhance safety performance. This work focuses on risk identification, the process of identifying and analyzing risks stemming from dynamic traffic participants and unexpected events. While significant advances have been made in the community, the current evaluation of different risk identification algorithms uses independent datasets, leading to difficulty in direct comparison and hindering collective progress toward safety performance enhancement. To address this limitation, we introduce RiskBench, a large-scale scenario-based benchmark for risk identification. Our benchmark is created using a scenario-based approach, which is widely accepted in the automotive industry. We design a scenario taxonomy and augmentation pipeline to enable systematic collection of ground-truth risks under different scenarios. We assess the ability of ten algorithms to (1) detect and locate risks, (2) anticipate risks, and (3) facilitate decision-making. We conduct extensive experiments and summarize future research directions on risk identification. Our aim is to encourage collaborative endeavors toward a society with zero collisions. To facilitate this, we have made our dataset and benchmark toolkit publicly available at https://hcis-lab.github.io/RiskBench/
|
|
13:30-15:00, Paper ThBT8-CC.2 | Add to My Program |
Enhancing Inland Water Safety: The Lake Constance Obstacle Detection Benchmark |
|
Griesser, Dennis | University of Applied Sciences Konstanz, Institute for Optical S |
Franz, Matthias | University of Applied Sciences Konstanz, Institute for Optical S |
Umlauf, Georg | University of Applied Sciences Konstanz, Institute for Optical S |
Keywords: Data Sets for Robotic Vision, Computer Vision for Automation, Recognition
Abstract: Autonomous navigation on inland waters requires an accurate understanding of the environment in order to react to possible obstacles. Deep learning is a promising technique to detect obstacles robustly. However, supervised deep learning models require large datasets to adjust their weights and to generalize to unseen data. Therefore, we equipped our research vessel with a laser scanner and a stereo camera to record a novel obstacle detection dataset for inland waters. We annotated 1974 stereo images and lidar point clouds with 3D bounding boxes. Furthermore, we provide an initial approach and a suitable metric to compare results on the test set. The dataset is publicly available and seeks to contribute towards increasing safety on inland waters.
|
|
13:30-15:00, Paper ThBT8-CC.3 | Add to My Program |
IDD-X: A Multi-View Dataset for Ego-Relative Important Object Localization and Explanation in Dense and Unstructured Traffic |
|
Parikh, Chirag | International Institute of Information Technology, Hyderabad |
Saluja, Rohit | IIT Mandi |
Jawahar, C.V. | IIIT, Hyderabad |
Sarvadevabhatla, Ravi Kiran | IIIT Hyderabad |
Keywords: Data Sets for Robotic Vision, Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction. Overall, our dataset and introduced prediction models form the foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
|
|
13:30-15:00, Paper ThBT8-CC.4 | Add to My Program |
LiDAR-CS Dataset: LiDAR Point Cloud Dataset with Cross-Sensors for 3D Object Detection |
|
Fang, Jin | Baidu |
Zhou, Dingfu | Baidu |
Zhao, Jingjing | Tusimple |
Wu, Chenming | Baidu Research |
Tang, Chulin | University of California, Irvine |
Xu, Chengzhong | University of Macau |
Zhang, Liangjun | Baidu |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Over the past few years, there has been remarkable progress in research on 3D point clouds, and their use in autonomous driving scenarios has become widespread. However, deep learning methods heavily rely on annotated data and often face domain generalization issues. Unlike 2D images, whose domains usually pertain to the texture information present in them, the features derived from a 3D point cloud are affected by the distribution of the points. The lack of a 3D domain adaptation benchmark leads to the common practice of training a model on one benchmark (e.g., Waymo) and then assessing it on another dataset (e.g., KITTI). This setting results in two distinct domain gaps: scenarios and sensors, making it difficult to analyze and evaluate methods accurately. To tackle this problem, this paper presents the LiDAR Dataset with Cross Sensors (LiDAR-CS Dataset), which contains large-scale annotated LiDAR point clouds captured under six groups of different sensors but with the same corresponding scenarios, generated from a hybrid realistic LiDAR simulator. To our knowledge, the LiDAR-CS Dataset is the first dataset that addresses sensor-related gaps in the domain of 3D object detection in real traffic. Furthermore, we evaluate and analyze performance using various baseline detectors and demonstrate its potential applications. Project page: https://opendriving.github.io/lidar-cs.
|
|
13:30-15:00, Paper ThBT8-CC.5 | Add to My Program |
ROV6D: 6D Pose Estimation Benchmark Dataset for Underwater Remotely Operated Vehicles |
|
Tang, Jingyi | Tsinghua University |
Chen, Zeyu | Tsinghua University |
Fu, Bowen | Tsinghua University |
Lu, Wenjie | Harbin Institute of Technology (Shenzhen) |
Li, Shengquan | Pengcheng Lab |
Li, Xiu | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Accurate localization between multiple robots is crucial for many underwater applications, such as tracking, convoying, and subsea intervention tasks. 6D pose estimation is a fundamental task that enables precise object localization in 3D space with full six degrees of freedom. However, one critical challenge is the lack of available large-scale datasets due to the prohibitive cost of labelled data collection. To overcome this difficulty, we propose a benchmark dataset, ROV6D, for 6D pose estimation of remotely operated vehicles (ROVs). The training subset consists of a large number of synthetic images with 6D pose ground truth for ROVs. These synthetic images are generated using BlenderProc and further rendered with the underwater neural rendering (UWNR) strategy to enhance their realism. The testing subsets cover different real-world scenarios, including the Pool subset and the Maoming subset, focusing on challenging cases that involve partial occlusion and low visibility. Diverse recent methods are evaluated on the constructed dataset. The results show that methods based on dense coordinates currently perform best, outperforming both keypoint-based and refinement-based methods. Our dataset will be made publicly available soon.
|
|
13:30-15:00, Paper ThBT8-CC.6 | Add to My Program |
The GOOSE Dataset for Perception in Unstructured Environments |
|
Mortimer, Peter | Universität Der Bundeswehr München |
Hagmanns, Raphael | Karlsruhe Institute of Technology |
Granero, Miguel | Fraunhofer IOSB |
Luettel, Thorsten | Universität Der Bundeswehr München |
Petereit, Janko | Fraunhofer IOSB |
Wuensche, Hans Joachim Joe | Universität Der Bundeswehr München |
Keywords: Data Sets for Robotic Vision, Deep Learning for Visual Perception, Sensor Fusion
Abstract: The potential for deploying autonomous systems can be significantly increased by improving the perception and interpretation of the environment. However, the development of deep learning-based techniques for autonomous systems in unstructured outdoor environments poses challenges due to limited data availability for training and testing. To address this gap, we present the German Outdoor and Offroad Dataset (GOOSE), a comprehensive dataset specifically designed for unstructured outdoor environments. The GOOSE dataset incorporates 10,000 labeled pairs of images and point clouds, which are utilized to train a range of state-of-the-art segmentation models on both image and point cloud data. We open-source the dataset, along with an ontology for unstructured terrain, as well as dataset standards and guidelines. This initiative aims to establish a common framework, enabling the seamless inclusion of existing datasets and a fast way to enhance the perception capabilities of various robots operating in unstructured environments. This framework also makes it possible to query data for specific weather conditions or sensor setups from a database in the future. The dataset, pre-trained models for offroad perception, and additional documentation can be found at https://goose-dataset.de/.
|
|
13:30-15:00, Paper ThBT8-CC.7 | Add to My Program |
CoAS-Net: Context-Aware Suction Network with a Large-Scale Domain Randomized Synthetic Dataset |
|
Son, Yeong Gwang | SungKyunKwan University |
Bui, Tat Hieu | Sungkyunkwan University |
Hong, Juyong | Sungkyunkwan Univ |
Kim, Yong Hyeon | SungKyunKwan University |
Moon, Seung Jae | Sungkyunkwan, Mechanical Engineering, Robottory |
Kim, ChunSoo | SKKU |
Rhee, Issac | Sungkyunkwan University |
Kang, Hansol | Sungkyunkwan University |
Choi, Hyouk Ryeol | Sungkyunkwan University |
Keywords: Data Sets for Robotic Vision, Deep Learning in Grasping and Manipulation, Computer Vision for Automation
Abstract: Robotic grasping is one of the essential skills in robotics. From industrial settings to housework, robots are required to handle objects, enabling them to interact with their surroundings. Among the various tasks in robotic grasping, bin-picking is considered one of the most challenging because of the cluttered bin filled with objects. Also, for next-level automation, robots need to handle unseen objects and distinguish target objects from outliers. This paper proposes a novel dataset generation pipeline for suction grasping in bin-picking tasks. This pipeline consists of a series of methods that progressively transition from single-object evaluation to whole-scene evaluation and lower the dimension of the labels to the image space. We trained a suction prediction FCN (Fully Convolutional Network) on a dataset generated from the pipeline and conducted bin-picking experiments. Our large-scale collision-free annotation enables the network to understand the context of a bin-picking task, where collisions between the gripper and the bin or objects are a concern and distinguishing the background is crucial. The results show that our solution surpasses existing methods, and the network demonstrates context-aware grasping of objects with a loosely defined RoI (Region of Interest). Our dataset and the grasp detection model are available at https://github.com/SonYeongGwang/CoAS-Net.git.
|
|
13:30-15:00, Paper ThBT8-CC.8 | Add to My Program |
NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation |
|
Sheng, Diwei | New York University |
Yang, Anbang | New York University |
Rizzo, John-Ross | NYU School of Medicine / NYU Tandon School of Engineering |
Feng, Chen | New York University |
Keywords: Data Sets for Robotic Vision, Localization, Deep Learning for Visual Perception
Abstract: Visual Place Recognition (VPR) in indoor environments is beneficial to humans and robots for better localization and navigation. It is challenging due to appearance changes at various frequencies and the difficulty of obtaining ground-truth metric trajectories for training and evaluation. This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City, taken under varying lighting conditions and with appearance changes. Each scene has multiple revisits across a year. To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image. Our method takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations. The accuracy of this matching is refined by human annotators, who use our annotation software to correlate the selected keyframes. Finally, we present a benchmark evaluation of several state-of-the-art VPR algorithms using our annotated dataset, revealing its challenges and thus its value for VPR research.
|
|
13:30-15:00, Paper ThBT8-CC.9 | Add to My Program |
TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards |
|
Cheng, Derek | University of Pennsylvania |
Cladera, Fernando | University of Pennsylvania |
Prabhu, Ankit | University of Pennsylvania |
Liu, Xu | University of Pennsylvania |
Zhu, Alan | University of Pennsylvania |
Green, Patrick Corey | Virginia Polytechnic Institute and State University |
Ehsani, Reza | UC Merced |
Chaudhari, Pratik | University of Pennsylvania |
Kumar, Vijay | University of Pennsylvania |
Keywords: Data Sets for Robotic Vision, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics dataset for precision agriculture and forestry addressing the counting and mapping of trees in forestry and orchards. TreeScope provides LiDAR data from agricultural environments collected with robotics platforms, such as UAV and mobile robot platforms carried by vehicles and human operators. In the first release of this dataset, we provide ground-truth data with over 1,800 manually annotated semantic labels for tree stems and field-measured tree diameters. We share benchmark scripts for these tasks that researchers may use to evaluate the accuracy of their algorithms. Finally, we run our open-source diameter estimation and off-the-shelf semantic segmentation algorithms and share our baseline results.
|
|
ThBT9-CC Oral Session, CC-419 |
Add to My Program |
Task and Motion Planning I |
|
|
Chair: Song, Dezhen | Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Texas A&M University (TAMU) |
Co-Chair: Chandraker, Manmohan | University of California, San Diego |
|
13:30-15:00, Paper ThBT9-CC.1 | Add to My Program |
Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport |
|
N N, Sriram | NEC Labs America |
Jayaraman, Dinesh | University of Pennsylvania |
Chandraker, Manmohan | University of California, San Diego |
Keywords: Task and Motion Planning, Semantic Scene Understanding, Autonomous Agents
Abstract: We aim to address key challenges in long-horizon embodied exploration and navigation by proposing a long-horizon object transport task called Long-HOT and a novel modular framework for temporally extended navigation. Agents in Long-HOT need to efficiently find and pick up target objects that are scattered in the environment, carry them to a goal location with load constraints, and optionally have access to a container. We propose a modular topological graph-based transport policy (HTP) that explores efficiently with the help of weighted frontiers. Our hierarchical approach uses a combination of motion planning algorithms to reach point goals within explored locations and object navigation policies for moving towards semantic targets at unknown locations. Experiments on both our proposed Habitat transport task and the MultiOn benchmarks show that our method outperforms baselines and prior works. Further, we analyze the agent's behavior regarding the usage of the container and demonstrate meaningful generalization to harder transport scenes when training only on simpler versions of the task.
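The weighted-frontier idea, trading estimated information gain against travel cost when choosing where to explore next, reduces to a small scoring rule. The weights and utility values below are illustrative assumptions rather than the HTP policy's actual quantities.

```python
import numpy as np

def pick_frontier(frontiers, agent_pos, info_gain, w_info=2.0, w_dist=1.0):
    """Pick the frontier maximizing weighted utility minus travel cost."""
    dists = np.linalg.norm(frontiers - agent_pos, axis=1)
    scores = w_info * np.asarray(info_gain) - w_dist * dists
    return frontiers[int(np.argmax(scores))]

frontiers = np.array([[5.0, 0.0], [2.0, 2.0], [-4.0, 1.0]])
best = pick_frontier(frontiers, agent_pos=np.zeros(2), info_gain=[0.6, 0.4, 0.9])
print(best)   # a nearby, informative frontier wins over a distant one
```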
|
|
13:30-15:00, Paper ThBT9-CC.2 | Add to My Program |
COAST: Constraints and Streams for Task and Motion Planning |
|
Vu, Brandon | Stanford University |
Migimatsu, Toki | Stanford University |
Bohg, Jeannette | Stanford University |
Keywords: Task and Motion Planning, Integrated Planning and Control
Abstract: Task and Motion Planning (TAMP) algorithms solve long-horizon robotics tasks by integrating task planning with motion planning; the task planner proposes a sequence of actions towards a goal state and the motion planner verifies whether this action sequence is geometrically feasible for the robot. However, state-of-the-art TAMP algorithms do not scale well with the difficulty of the task and require an impractical amount of time to solve relatively small problems. We propose Constraints and Streams for Task and Motion Planning (COAST), a probabilistically-complete, sampling-based TAMP algorithm that combines stream-based motion planning with an efficient, constrained task planning strategy. We validate COAST on three challenging TAMP domains and demonstrate that our method outperforms baselines in terms of cumulative task planning time by an order of magnitude. You can find more supplementary materials on our project website at https://branvu.github.io/coast.github.io.
|
|
13:30-15:00, Paper ThBT9-CC.3 | Add to My Program |
Extending the Cooperative Dual-Task Space in Conformal Geometric Algebra |
|
Löw, Tobias | Idiap Research Institute, EPFL |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Dual Arm Manipulation, Optimization and Optimal Control
Abstract: In this work, we present an extension of the cooperative dual-task space (CDTS) in conformal geometric algebra. The CDTS was first defined using dual quaternion algebra and is a well-established framework for the simplified definition of tasks using two manipulators. By integrating conformal geometric algebra, we aim to further enhance the geometric expressiveness and thus simplify the modeling of various tasks. We show this formulation by first presenting the CDTS and then its extension, which is based around a cooperative point pair. This extension keeps all the benefits of the original dual-quaternion formulation but adds more tools for geometric modeling of dual-arm tasks. We also present how this CGA-CDTS can be seamlessly integrated with an optimal control framework in geometric algebra that was derived in previous work. In the experiments, we demonstrate how to model different objectives and constraints using the CGA-CDTS. Using a setup of two Franka Emika robots, we then show the effectiveness of our approach using model predictive control in real-world experiments.
|
|
13:30-15:00, Paper ThBT9-CC.4 | Add to My Program |
D-LGP: Dynamic Logic-Geometric Program for Reactive Task and Motion Planning |
|
Xue, Teng | Idiap/EPFL |
Razmjoo, Amirreza | Idiap Research Institute |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Task and Motion Planning
Abstract: Many real-world sequential manipulation tasks involve a combination of discrete symbolic search and continuous motion planning, collectively known as combined task and motion planning (TAMP). However, prevailing methods often struggle with the computational burden and intricate combinatorial challenges, limiting their applications for online replanning in the real world. To address this, we propose Dynamic Logic-Geometric Program (D-LGP), a novel approach integrating Dynamic Tree Search and global optimization for efficient hybrid planning. Through empirical evaluation on three benchmarks, we demonstrate the efficacy of our approach, showcasing superior performance in comparison to state-of-the-art techniques. We validate our approach through simulation and demonstrate its reactive capability to cope with online uncertainty and external disturbances in the real world.
|
|
13:30-15:00, Paper ThBT9-CC.5 | Add to My Program |
Indoor Exploration and Simultaneous Trolley Collection through Task-Oriented Environment Partitioning |
|
Gao, Junjie | Harbin Institute of Technology |
Xie, Peijia | Southern University of Science and Technology |
Gao, Xuheng | Southern University of Science and Technology |
Sun, Zhirui | Southern University of Science and Technology |
Wang, Jiankun | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Keywords: Motion and Path Planning, Search and Rescue Robots, Mapping
Abstract: In this paper, we present a simultaneous exploration and object search framework for the application of autonomous trolley collection. For environment representation, a task-oriented environment partitioning algorithm is presented to extract diverse information for each sub-task. First, LiDAR data is classified as potential objects, walls, and obstacles after outlier removal. Segmented point clouds are then transformed into a hybrid map with the following functional components: object proposals to avoid missing trolleys during exploration; room layouts for semantic space segmentation; and polygonal obstacles containing geometry information for efficient motion planning. For exploration and simultaneous trolley collection, we propose an efficient exploration-based object search method. First, a traveling salesman problem with precedence constraints (TSP-PC) is formulated by grouping frontiers and object proposals. The next target is selected by prioritizing object search while avoiding excessive robot backtracking. Then, feasible trajectories with adequate obstacle clearance are generated by topological graph search. We validate the proposed framework through simulations and demonstrate the system with real-world autonomous trolley collection tasks.
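The TSP-PC formulation, ordering targets while respecting which ones must be visited first, can be illustrated with a nearest-feasible-first greedy pass. A real solver would optimize the tour globally; this sketch (with an assumed acyclic precedence map) only shows how precedence constraints prune the candidate set at each step.

```python
import numpy as np

def greedy_tsp_pc(points, precedes, start):
    """Greedy visit order over targets with precedence constraints.

    precedes[j] lists target indices that must be visited before target j.
    """
    n, visited, order = len(points), set(), []
    pos = np.asarray(start, dtype=float)
    while len(visited) < n:
        feasible = [i for i in range(n) if i not in visited
                    and all(p in visited for p in precedes.get(i, []))]
        nxt = min(feasible, key=lambda i: np.linalg.norm(points[i] - pos))
        visited.add(nxt)
        order.append(nxt)
        pos = points[nxt]
    return order

pts = np.array([[1.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
print(greedy_tsp_pc(pts, precedes={1: [0]}, start=[0.0, 0.0]))   # e.g. [0, 1, 2]
```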
|
|
13:30-15:00, Paper ThBT9-CC.6 | Add to My Program |
Effort Level Search in Infinite Completion Trees with Application to Task-And-Motion Planning |
|
Toussaint, Marc | TU Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Hartmann, Valentin | ETH Zürich |
Karpas, Erez | Technion |
Hoenig, Wolfgang | TU Berlin |
Keywords: Task and Motion Planning, Planning, Scheduling and Coordination, Manipulation Planning
Abstract: Solving a Task-and-Motion Planning (TAMP) problem can be represented as a sequential (meta-) decision process, where early decisions concern the skeleton (sequence of logic actions) and later decisions concern what to compute for such skeletons (e.g., action parameters, bounds, RRT paths, or full optimal manipulation trajectories). We consider the general problem of how to schedule compute effort in such hierarchical solution processes. More specifically, we introduce infinite completion trees as a problem formalization, where before we can expand or evaluate a node, we have to solve a preemptible computational sub-problem of a priori unknown compute effort. Infinite branchings represent an infinite choice of random initializations of computational sub-problems. Decision making in such trees means to decide on where to invest compute or where to widen a branch. We propose a heuristic to balance branching width and compute depth using polynomial level sets. We show completeness of the resulting solver and that a round robin baseline strategy used previously for TAMP becomes a special case. Experiments confirm the robustness and efficiency of the method on problems including stochastic bandits and a suite of TAMP problems, and compare our approach to a round robin baseline. An appendix comparing the framework to bandit methods and proposing a corresponding tree policy version is found on the supplementary webpage.
|
|
13:30-15:00, Paper ThBT9-CC.7 | Add to My Program |
Sense in Motion with Belief Clustering: Efficient Gas Source Localization with Mobile Robots |
|
Jin, Wanting | EPFL |
Martinoli, Alcherio | EPFL |
Keywords: Environment Monitoring and Management, Task and Motion Planning, Probabilistic Inference
Abstract: Given the patchy nature of gas plumes and the slow response of conventional gas sensors, the use of mobile robots for Gas Source Localization (GSL) tasks presents significant challenges. These factors increase the difficulty of obtaining gas measurements, in both qualitative and quantitative terms. Most existing model-based GSL algorithms rely on lengthy stops at each sampling point to ensure accurate gas measurements. However, this approach not only prolongs the time required for a single measurement but also prevents sampling during robot motion, thus exacerbating the scarcity of available gas measurements. In this work, our goal is to push the boundaries in terms of continuity in sampling to enhance system efficiency. Firstly, we decouple and comprehensively evaluate the impact of both plume dynamics and gas sensor properties on GSL performance. Secondly, we demonstrate that adopting a continuous sampling strategy, which has been generally overlooked in prior research, markedly enhances system efficiency by obviating prolonged measurement pauses and leveraging all the data gathered during robot motion. Thirdly, we further expand the capabilities of continuous sampling by introducing a novel informative path-planning strategy that takes into account all the information gathered along the robot's movement. The proposed method is evaluated in both simulation and reality under different scenarios emulating indoor environmental conditions.
|
|
13:30-15:00, Paper ThBT9-CC.8 | Add to My Program |
R-LGP: A Reachability-Guided Logic-Geometric Programming Framework for Optimal Task and Motion Planning on Mobile Manipulators |
|
Ly, Kim Tien | University of Oxford |
Semenov, Valeriy | Oxford University |
Risiglione, Mattia | Italian Institute of Technology |
Merkt, Wolfgang Xaver | University of Oxford |
Havoutis, Ioannis | University of Oxford |
Keywords: Task and Motion Planning
Abstract: This paper presents an optimization-based solution to task and motion planning (TAMP) on mobile manipulators. Logic-geometric programming (LGP) has shown promising capabilities for optimally dealing with hybrid TAMP problems that involve abstract and geometric constraints. However, LGP does not scale well to high-dimensional systems (e.g. mobile manipulators) and can suffer from obstacle avoidance issues due to local minima. In this work, we extend LGP with a sampling-based reachability graph to enable solving optimal TAMP on high-DoF mobile manipulators. The proposed reachability graph can incorporate environmental information (obstacles) to provide the planner with sufficient geometric constraints. This reachability-aware heuristic efficiently prunes infeasible sequences of actions in the continuous domain, hence reducing replanning by securing feasibility at the final full-path trajectory optimization. Our framework proves to be time-efficient in computing optimal and collision-free solutions, while outperforming the current state of the art on metrics of success rate, planning time, path length and number of steps. We validate our framework on the physical Toyota HSR robot and report comparisons on a series of mobile manipulation tasks of increasing difficulty.
|
|
13:30-15:00, Paper ThBT9-CC.9 | Add to My Program |
Solving Sequential Manipulation Puzzles by Finding Easier Subproblems |
|
Levit, Svetlana | TU Berlin |
Ortiz-Haro, Joaquim | TU Berlin |
Toussaint, Marc | TU Berlin |
Keywords: Task and Motion Planning, Manipulation Planning
Abstract: We consider a set of challenging sequential manipulation puzzles, where an agent has to interact with multiple movable objects and navigate narrow passages. Such settings are notoriously difficult for Task-and-Motion Planners, as they require interdependent regrasps and solving hard motion planning problems. In this paper, we propose to search over sequences of easier pick-and-place subproblems, which can lead to the solution of the manipulation puzzle. Our method combines a heuristic-driven forward subproblem search with an optimization-based Task-and-Motion Planning solver. To guide the search, we introduce heuristics to generate and prioritize subgoals. We evaluate our approach on various manually designed and automatically generated scenes, demonstrating the benefits of auxiliary subproblems in sequential manipulation planning.
|
|
ThBT10-CC Oral Session, CC-501 |
Add to My Program |
Modeling, Control, and Learning for Soft Robots II |
|
|
Chair: Hughes, Josie | EPFL |
Co-Chair: Sadati, S.M.Hadi | King's College London |
|
13:30-15:00, Paper ThBT10-CC.1 | Add to My Program |
Control and Implementation of a Fluidic Elastomer Actuator for Active Suppression of Hand Tremor |
|
Wang, Yixin | Tsinghua University |
Liu, Xin-Jun | Tsinghua University |
Zhao, Huichan | Tsinghua University |
Keywords: Modeling, Control, and Learning for Soft Robots, Neural and Fuzzy Control, Prosthetics and Exoskeletons
Abstract: Active exoskeletons for tremor suppression show potential for the treatment of pathological tremor thanks to their non-invasive nature. However, in previous work the active force was used only to follow the voluntary movement. As a potential alternative, fluidic elastomer actuators (FEAs) possess the compliance and flexibility that are important for wearable devices. In this study, we introduce a control implementation that applies a FEA to the active suppression of hand tremor, allowing a wearable FEA to actively exert force on the finger against the tremor while following the voluntary motion. The proposed pressure control algorithm pushes the closed-loop pressure control to a 19 Hz cutoff frequency. A combined GRU-MLP (Gated Recurrent Unit-Multilayer Perceptron) neural network is proposed to identify and control a fiber-reinforced FEA following the voluntary movement of the hand. The active tremor suppression effectiveness of the proposed method was tested on a bench-top tremor simulator, where it suppressed hand tremor from an original amplitude of more than 5° to less than 1°. The proposed method paves a new way for tremor suppression exoskeletons.
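For orientation, a minimal PyTorch sketch of a GRU-MLP model of the kind described above is shown below: a recurrent encoder over a window of sensor samples followed by an MLP head. The layer sizes and the 4-dimensional input are illustrative assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class GRUMLP(nn.Module):
    """GRU encoder over a time window, MLP regression head."""
    def __init__(self, in_dim=4, hidden=64, out_dim=1):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, out_dim))

    def forward(self, x):            # x: (batch, time, in_dim)
        h, _ = self.gru(x)           # (batch, time, hidden)
        return self.mlp(h[:, -1])    # predict from the last time step

model = GRUMLP()
window = torch.randn(8, 50, 4)       # 8 windows of 50 sensor samples
print(model(window).shape)           # torch.Size([8, 1])
```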
|
|
13:30-15:00, Paper ThBT10-CC.2 | Add to My Program |
Adaptive State Estimation with Constant-Curvature Dynamics Using Force-Torque Sensors with Application to a Soft Pneumatic Actuator |
|
Mehl, Phillip Maximilian | Leibniz University Hannover |
Bartholdt, Max | Institute of Mechatronic Systems, Leibniz Universität Hannover |
Ehlers, Simon F. G. | Leibniz University Hannover |
Seel, Thomas | Leibniz Universität Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Modeling, Control, and Learning for Soft Robots, Probability and Statistical Methods, Dynamics
Abstract: Using compliant materials leads to continuum robots undergoing large deformations. Their nonlinear behavior motivates the use of model-based controllers, which require state estimation as an essential step before deployment. Available sensors are usually realized by introducing rigid bodies to the soft robot or inserting soft sensors made of materials different from the robot itself. Both approaches result in changes in the system’s dynamics. Optical measurements are problematic, especially in confined spaces. This can be avoided when the sensor is located at the robot’s base. This paper studies the state estimation of a pneumatically actuated soft robot using the measured forces and torques at its base. For the first time, this is done using an unscented Kalman filter without restraining the dynamics to a planar or quasi-static motion while applying it to a real system. Real-time capability is achieved with our implementation. The state estimation is tested in a Cosserat rod simulation and on the physical system. The position is estimated with an accuracy of three to five millimeters for a 130-millimeter-long pneumatic robot.
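For readers unfamiliar with the filtering step, here is a minimal unscented Kalman filter driven by base torque readings, using the filterpy library. The 1D curvature state and the linear measurement model are illustrative assumptions, far simpler than the paper's full 3D dynamics.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 0.01

def fx(x, dt):
    """Toy process model: x = [kappa, kappa_dot], constant curvature rate."""
    return np.array([x[0] + dt * x[1], x[1]])

def hx(x):
    """Hypothetical measurement model: base bending moment assumed
    proportional to curvature (stiffness k_b is an invented constant)."""
    k_b = 0.8
    return np.array([k_b * x[0]])

points = MerweScaledSigmaPoints(n=2, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt, hx=hx, fx=fx, points=points)
ukf.x = np.zeros(2)
ukf.P *= 0.1
ukf.R *= 1e-4          # force/torque sensor noise (assumed)
ukf.Q *= 1e-5          # process noise (assumed)

for torque_meas in [0.01, 0.02, 0.03]:   # stand-in F/T readings
    ukf.predict()
    ukf.update(np.array([torque_meas]))
print(ukf.x)   # estimated [kappa, kappa_dot]
```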
|
|
13:30-15:00, Paper ThBT10-CC.3 | Add to My Program |
SofToss: Learning to Throw Objects with a Soft Robot |
|
Bianchi, Diego | Scuola Superiore Sant'Anna, Pisa, Italy |
Antonelli, Michele Gabrio Ernesto | University of L’Aquila |
Laschi, Cecilia | National University of Singapore |
Sabatini, Angelo Maria | Scuola Superiore Sant'Anna |
Falotico, Egidio | Scuola Superiore Sant'Anna |
Keywords: Modeling, Control, and Learning for Soft Robots, Reinforcement Learning, Machine Learning for Robot Control
Abstract: In this paper, we present, for the first time, a soft robot control system (SofToss) capable of throwing life-size objects toward target positions. SofToss is an open-loop controller based on deep reinforcement learning that generates, given the target position, an actuation pattern for the tossing task. To deal with the high non-linearity of the dynamics of soft robots, we deployed a neural network to learn the relationship between the actuation pattern and the target landing position, i.e., the direct model of the task. Then, a reinforcement learning method is used to predict the actuation pattern given the goal position. The proposed controller was tested on a modular soft robotic arm, I-Support, by tossing four objects of different shapes and weights into 140 mm square target boxes. We registered a success rate of almost 65% of the throws in two actuation modalities (i.e., partial, keeping one module of the soft arm passive, and complete, with both modules active). This performance rises to 85% if one can choose the number of modules to actuate for each throwing direction. Furthermore, the results show that the proposed learning-based real-time controller achieves performance comparable to an optimization-based non-real-time controller. Our study contributes to the foundations for bringing soft robots into everyday life and industry, by performing more complex, dynamic tasks.
|
|
13:30-15:00, Paper ThBT10-CC.4 | Add to My Program |
A Soft Robot Inverse Kinematics for Virtual Reality |
|
Bern, James | Williams College |
May, William | Williams College |
Osborn, Austin | Williams College |
Stella, Francesco | EPFL |
Zargarzadeh, Sadra | University of Alberta |
Hughes, Josie | EPFL |
Keywords: Modeling, Control, and Learning for Soft Robots, Simulation and Animation, Virtual Reality and Interfaces
Abstract: We show how a variety of techniques from Computer Graphics can be leveraged to intuitively control the shape (configuration) of arbitrary 3D Soft Robots in VR. Our pipeline, Virtual Reality Soft Robot Inverse Kinematics (VR-Soft IK), overcomes fundamental limitations of general-purpose drag-and-drop soft robot control interfaces by leaving the 2D computer screen for 3D Virtual Reality (VR). VR-Soft IK uses a simulation based on the Finite Element Method (FEM) and a control method based on sensitivity analysis. Additionally, we show that our general control pipeline can be fused with techniques from 3D character animation to skin our simulation with a high-resolution surface mesh, pointing a way toward Mixed Reality Soft Robots. This full Skinned VR-Soft IK pipeline uses skeletal animation and GPU picking. We demonstrate the utility of our pipeline by doing real-time, open-loop control of the real-world 3D soft robotic arm Helix.
|
|
13:30-15:00, Paper ThBT10-CC.5 | Add to My Program |
Toward the Use of Proxies for Efficient Learning Manipulation and Locomotion Strategies on Soft Robots |
|
Ménager, Etienne | Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL |
Peyron, Quentin | Inria and CRIStAL UMR CNRS 9189, University of Lille |
Duriez, Christian | INRIA |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: Soft robots are naturally designed to perform safe interactions with their environment, like locomotion and manipulation. The literature now offers many concepts, often bio-inspired, that propose new modes of locomotion or grasping. However, a methodology for implementing motion planning for these tasks, as exists for rigid robots, is still lacking. One of the difficulties comes from the modeling of these robots, which is very different, as it is based on the mechanics of deformable bodies. These models, whose dimension is often very large, make learning and optimization methods very costly. In this paper, we propose a proxy approach, as used in humanoid robotics. This proxy is a simplified model of the robot that enables frugal learning of a motion strategy. This strategy is then transferred to the complete model to obtain the corresponding actuation inputs. Our methodology is illustrated and analyzed on two classical designs of soft robots performing manipulation and locomotion tasks.
|
|
13:30-15:00, Paper ThBT10-CC.6 | Add to My Program |
Vision-Based Autonomous Steering of a Miniature Eversion Growing Robot |
|
Wu, Zicong | King's College London |
Sadati, S.M.Hadi | King's College London |
Rhode, Kawal | King's College London |
Bergeles, Christos | King's College London |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Applications
Abstract: This paper presents vision-based autonomous navigation of a steerable soft growing robot. Our experimental platform is the previously presented MAMMOBOT, which is a small-diameter eversion growing robot with an embedded steerable catheter. The current manuscript first models the robot using kinematics (constant curvature) and mechanics (virtual work). Modelling considers the potential misalignment between the everting sheath and the embedded catheter. Second, a switching control architecture is proposed, wherein a model-based controller is employed for rapid convergence to a target position, followed by a closed-loop proportional controller that minimises the system’s steady-state error. Feedback is visually provided from a calibrated stereo vision system. Target-positioning and trajectory-tracking experiments are conducted to evaluate the performance of the control architecture. Experimental results demonstrate the superiority of the mechanics-based modelling and control approach, showing an average accuracy of 0.67 mm (0.66% arclength) in target positioning experiments, and an accuracy of 0.72 mm (1.11% arclength) and 0.72 mm (1.01% arclength) for tracking a square trajectory and a circular trajectory, respectively. The autonomous steering framework is showcased within a 3D-printed mammary duct phantom. This work sets the stage for endoscope-based autonomous navigation of MAMMOBOT and similar soft growing steerable robots.
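The constant-curvature kinematics mentioned above follows the standard arc parameterization (e.g. Webster & Jones, 2010). The sketch below computes a single segment's tip position; it is a generic textbook model that omits the sheath/catheter misalignment the paper explicitly handles.

```python
import numpy as np

def cc_tip_position(kappa, phi, length):
    """Tip position of one constant-curvature segment: curvature kappa,
    bending-plane angle phi, arc length `length`. The arc lies in the
    plane rotated by phi about the base z-axis."""
    if abs(kappa) < 1e-9:                     # straight-segment limit
        return np.array([0.0, 0.0, length])
    r = 1.0 / kappa                           # radius of the bending arc
    x = r * (1 - np.cos(kappa * length))      # in-plane deflection
    return np.array([x * np.cos(phi), x * np.sin(phi),
                     r * np.sin(kappa * length)])

# Toy numbers: a 65 mm segment bent at curvature 10 1/m.
print(cc_tip_position(kappa=10.0, phi=0.0, length=0.065))
```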
|
|
13:30-15:00, Paper ThBT10-CC.7 | Add to My Program |
POE: Acoustic Soft Robotic Proprioception for Omnidirectional End-Effectors |
|
Yoo, Uksang | Carnegie Mellon University |
Lopez, Ziven | Northeastern University |
Ichnowski, Jeffrey | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Grippers and Other End-Effectors
Abstract: Shape estimation is crucial for precise control of soft robots. However, soft robot shape estimation and proprioception are challenging due to their complex deformation behaviors and infinite degrees of freedom. Their continuously deforming bodies complicate integrating rigid sensors and reliably estimating their shape. In this work, we present Proprioceptive Omnidirectional End-effector (POE), a tendon-driven soft robot with six embedded microphones. We first introduce novel applications of 3D reconstruction methods to acoustic signals from the microphones for soft robot shape proprioception. To improve the proprioception pipeline's training efficiency and model prediction consistency, we present POE-M. POE-M predicts key point positions from acoustic signal observations and uses an energy-minimization method to reconstruct a physically admissible high-resolution mesh of POE. We evaluate mesh reconstruction on simulated data and the POE-M pipeline with real-world experiments. Ablation studies suggest POE-M's guidance of the key points during the mesh reconstruction process provides robustness and stability to the pipeline. POE-M reduced the maximum Chamfer distance error by 23.1% compared to the state-of-the-art end-to-end soft robot proprioception models and achieved 4.91 mm average Chamfer distance error during evaluation. Supplemental materials, experiment data, and visualizations are available at sites.google.com/view/acoustic-poe.
|
|
13:30-15:00, Paper ThBT10-CC.8 | Add to My Program |
A Data-Driven Approach to Geometric Modeling of Systems with Low-Bandwidth Actuator Dynamics |
|
Deng, Siming | Johns Hopkins University |
Liu, Junning | Department of Mechanics and Engineering Sciences, College of Eng |
Datta, Bibekananda | Johns Hopkins University |
Pantula, Aishwarya | Johns Hopkins University |
Gracias, David H. | The Johns Hopkins University, Baltimore |
Nguyen, Thao | Johns Hopkins University |
Bittner, Brian | JHUAPL |
Cowan, Noah J. | Johns Hopkins University |
Keywords: Modeling, Control, and Learning for Soft Robots, Underactuated Robots, Biologically-Inspired Robots
Abstract: It is challenging to perform system identification on soft robots due to their underactuated, high-dimensional dynamics. In this work, we present a data-driven modeling framework, based on geometric mechanics (also known as gauge theory) that can be applied to systems with low-bandwidth control of the system's internal configuration. This method constructs a series of connected models comprising actuator and locomotor dynamics based on data points from stochastically perturbed, repeated behaviors. By deriving these connected models from general formulations of dissipative Lagrangian systems with symmetry, we offer a method that can be applied broadly to robots with first-order, low-pass actuator dynamics, including swelling-driven actuators used in hydrogel crawlers. These models accurately capture the dynamics of the system shape and body movements of a simplified swimming robot model. We further apply our approach to a stimulus-responsive hydrogel simulator that captures the complexity of chemo-mechanical interactions that drive shape changes in biomedically relevant micromachines. Finally, we propose an approach of numerically optimizing control signals by iteratively refining models, which is applied to optimize the input waveform for the hydrogel crawler. This transfer to realistic environments provides promise for applications in locomotor design and biomedical engineering.
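The first-order, low-pass actuator dynamics targeted by the framework can be written as x_dot = (u - x)/tau. The sketch below integrates this model with forward Euler; the time constant and test input are chosen purely for illustration.

```python
import numpy as np

def first_order_lowpass(u, x0, tau, dt):
    """Discretized first-order (low-pass) actuator dynamics
    x_dot = (u - x) / tau, the class of dynamics the framework targets
    (e.g. slow swelling of hydrogel actuators)."""
    x, out = x0, []
    for u_t in u:
        x = x + dt * (u_t - x) / tau
        out.append(x)
    return np.array(out)

# A square-wave command gets smoothed and lagged by the actuator.
t = np.arange(0, 10, 0.01)
u = np.sign(np.sin(2 * np.pi * 0.2 * t))
x = first_order_lowpass(u, x0=0.0, tau=1.5, dt=0.01)
print(x[:5])
```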
|
|
13:30-15:00, Paper ThBT10-CC.9 | Add to My Program |
Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid Robots |
|
Patterson, Zachary | MIT |
Della Santina, Cosimo | TU Delft |
Rus, Daniela | MIT |
Keywords: Modeling, Control, and Learning for Soft Robots
Abstract: While much work has been done recently in the realm of model-based control of soft robots and soft-rigid hybrids, most works examine robots that have an inherently serial structure. While these systems have been prevalent in the literature, there is an increasing trend toward designing soft-rigid hybrids with intrinsically coupled elasticity between various degrees of freedom. In this work, we seek to address the issues of modeling and controlling such structures, particularly when underactuated. We introduce several simple models for elastic coupling, typical of those seen in these systems. We then propose a controller that compensates for the elasticity, and we prove its stability with Lyapunov methods without relying on the elastic dominance assumption. This controller is applicable to the general class of underactuated soft robots. After evaluating the controller in simulated cases, we then develop a simple hardware platform to evaluate both the models and the controller. Finally, using the hardware, we demonstrate a novel use case for underactuated, elastically coupled systems in "sensorless" force control.
|
|
ThBT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration II |
|
|
Chair: Schmitz, Alexander | Waseda University |
Co-Chair: Lin, Hsiu-Chin | McGill University |
|
13:30-15:00, Paper ThBT11-CC.1 | Add to My Program |
A Combination of a Controllable Clutch and an Oscillating Slider Crank Mechanism for Ease of Direct-Teaching with Various Payloads |
|
Arifin, Muhammad | Waseda University |
Kage, Yuta | Waseda University |
Yang, Yuchen | Waseda University |
Schmitz, Alexander | Waseda University |
Sugano, Shigeki | Waseda University |
Keywords: Learning from Demonstration, Physical Human-Robot Interaction, Compliant Joints and Mechanisms
Abstract: Direct teaching is a straightforward way of teaching new motions to robots. Active methods with torque sensors, for example, can be used so that the robot follows the movements of the human, but such methods introduce delays. Alternatively, series clutch actuators are easily backdrivable without delay. However, vertical joints are subject to gravity torques, which need to be compensated when disengaging the clutch. We implemented passive gravity compensation to counteract the robot’s weight, but this mechanism cannot compensate for varying payloads, as adjustable passive gravity compensation is relatively slow and mechanically complex. A varying payload causes unintended joint movement, i.e., the arm falls on its own, which is unacceptable during direct teaching. Therefore, this paper demonstrates how the controlled torque output of series clutch actuators can be used to compensate for varying payloads while maintaining high backdrivability. The proposed method is evaluated on a collaborative robot with a clutch in series with each actuator. Real-world experiments with payloads from 0 to 3 kg are conducted. During the experiments, the operator force is measured to evaluate the proposed method.
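A minimal sketch of the compensation idea, under the assumption that the passive mechanism cancels the link's own gravity torque and the clutch only transmits the payload term; all masses and lengths below are invented for illustration.

```python
import numpy as np

def holding_torque(theta, m_link, l_com, m_payload, l_link, g=9.81):
    """Gravity torque a vertical joint must resist at angle `theta`
    (0 = horizontal): a link term plus a payload term at the link tip."""
    tau_link = m_link * g * l_com * np.cos(theta)
    tau_payload = m_payload * g * l_link * np.cos(theta)
    return tau_link, tau_payload

tau_link, tau_payload = holding_torque(theta=0.3, m_link=4.0, l_com=0.2,
                                       m_payload=3.0, l_link=0.45)
# Command the series clutch to transmit only the uncompensated payload
# part, keeping the joint backdrivable for direct teaching.
clutch_torque_cmd = tau_payload
print(clutch_torque_cmd)
```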
|
|
13:30-15:00, Paper ThBT11-CC.2 | Add to My Program |
WayEx: Waypoint Exploration Using a Single Demonstration |
|
Levy, Mara | University of Maryland, College Park |
Saini, Nirat | University of Maryland, College Park |
Shrivastava, Abhinav | University of Maryland, College Park |
Keywords: Learning from Demonstration, Imitation Learning, Reinforcement Learning
Abstract: We propose WayEx, a new method for learning complex goal-conditioned robotics tasks from a single demonstration. Our approach distinguishes itself from existing imitation learning methods by demanding fewer expert examples and eliminating the need for information about the actions taken during the demonstration. This is accomplished by introducing a new reward function and employing a knowledge expansion technique. We demonstrate the effectiveness of WayEx, our waypoint exploration strategy, across six diverse tasks, showcasing its applicability in various environments. Notably, our method reduces training time by 50% compared to traditional reinforcement learning methods. WayEx obtains a higher reward than existing imitation learning methods given only a single demonstration. Furthermore, we demonstrate its success in tackling complex environments where standard approaches fall short.
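As a hedged illustration of the core idea, a single action-free demonstration can be treated as an ordered list of waypoints that defines a reward. The sketch below is an illustrative reduction, not WayEx's exact reward or its knowledge expansion step.

```python
import numpy as np

def waypoint_reward(state, waypoints, idx, eps=0.05):
    """+1 for reaching the next demonstration waypoint (then advance
    the index), 0 otherwise. Needs only demo states, never demo actions."""
    if np.linalg.norm(state - waypoints[idx]) < eps:
        return 1.0, min(idx + 1, len(waypoints) - 1)
    return 0.0, idx

demo = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])  # states only
idx, total = 0, 0.0
for s in [np.array([0.01, 0.0]), np.array([0.5, 0.1]), np.array([1.0, 0.2])]:
    r, idx = waypoint_reward(s, demo, idx)
    total += r
print(total)  # 3.0: all waypoints visited in order
```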
|
|
13:30-15:00, Paper ThBT11-CC.3 | Add to My Program |
Generating Robotic Elliptical Excisions with Human-Like Tool-Tissue Interactions |
|
Straizys, Arturas | University of Edinburgh |
Burke, Michael | Monash University |
Ramamoorthy, Subramanian | The University of Edinburgh |
Keywords: Learning from Demonstration, Imitation Learning, Force and Tactile Sensing
Abstract: In surgery, the application of appropriate force levels is critical for the success and safety of a given procedure. While many studies are focused on measuring in situ forces, little attention has been devoted to relating these observed forces to surgical techniques. Answering questions like “Can certain changes to a surgical technique result in lower forces and increased safety margins?” could lead to improved surgical practice, and importantly, patient outcomes. However, such studies would require a large number of trials and professional surgeons, which is generally impractical to arrange. Instead, we show how robots can learn several variations of a surgical technique from a smaller number of surgical demonstrations and interpolate learnt behaviour via a parameterised skill model. This enables a large number of trials to be performed by a robotic system and the analysis of surgical techniques and their downstream effects on tissue. Here, we introduce a parameterised model of the elliptical excision skill and apply a Bayesian optimisation scheme to optimise the excision behaviour with respect to expert ratings, as well as individual characteristics of excision forces. Results show that the proposed framework can successfully align the generated robot behaviour with subjects across varying levels of proficiency in terms of excision forces.
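As a hedged illustration of optimising a parameterised skill against expert ratings, the sketch below runs a basic Bayesian optimisation loop (Gaussian process surrogate plus expected improvement) over a single made-up excision parameter; the toy objective stands in for real trial ratings and is not the paper's skill model.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rating(theta):
    """Stand-in for an expert rating of one excision parameter
    (e.g. a normalized blade angle); the real objective comes from trials."""
    return -(theta - 0.6) ** 2

X = np.array([[0.1], [0.5], [0.9]])          # parameters tried so far
y = np.array([rating(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, y)
    cand = np.linspace(0, 1, 200).reshape(-1, 1)
    mu, sd = gp.predict(cand, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sd, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, rating(x_next[0]))

print(X[np.argmax(y)])   # parameter with the best observed rating
```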
|
|
13:30-15:00, Paper ThBT11-CC.4 | Add to My Program |
Fitting Parameters of Linear Dynamical Systems to Regularize Forcing Terms in Dynamical Movement Primitives |
|
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Colomé, Adrià | Institut De Robòtica I Informàtica Industrial (CSIC-UPC), Q28180 |
Torras, Carme | CSIC - UPC |
Keywords: Learning from Demonstration
Abstract: Due to their flexibility and ease of use, Dynamical Movement Primitives (DMPs) are widely used in robotics applications and research. DMPs combine linear dynamical systems to achieve robustness to perturbations and adaptation to moving targets with non-linear function approximators to fit a wide range of demonstrated trajectories. We propose a novel DMP formulation with a generalized logistic function as a delayed goal system. This formulation inherently has low initial jerk, and generates the bell-shaped velocity profiles that are typical of human movement. As the novel formulation is more expressive, it is able to fit a wide range of human demonstrations well, even without a non-linear forcing term. We exploit this increased expressiveness by automating the fitting of the dynamical system parameters through optimization. Our experimental evaluation demonstrates that this optimization regularizes the forcing term, and improves the interpolation accuracy of parametric DMPs.
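A minimal sketch of a logistic delayed goal system driving a standard DMP transformation system (forcing term omitted). The logistic rate and delay below are illustrative; the paper fits such parameters to demonstrations by optimization.

```python
import numpy as np

def logistic_goal(t, g0, g, t0=0.5, rate=20.0):
    """Delayed goal: moves smoothly from start g0 to goal g around t0."""
    return g0 + (g - g0) / (1.0 + np.exp(-rate * (t - t0)))

# Spring-damper transformation system driven by the moving goal.
dt, tau = 0.002, 1.0
alpha_z, beta_z = 25.0, 25.0 / 4.0      # critically damped gains
y, yd, g0, g = 0.0, 0.0, 0.0, 1.0
traj = []
for t in np.arange(0, 1.5, dt):
    gt = logistic_goal(t, g0, g)
    ydd = (alpha_z * (beta_z * (gt - y) - tau * yd)) / tau**2
    yd += ydd * dt
    y += yd * dt
    traj.append(y)
print(traj[-1])   # ~1.0: converges to the goal with a bell-shaped velocity
```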
|
|
13:30-15:00, Paper ThBT11-CC.5 | Add to My Program |
AirExo: Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild |
|
Fang, Hongjie | Shanghai Jiao Tong University |
Fang, Hao-Shu | Shanghai Jiao Tong University |
Wang, Yiming | Shanghai Jiao Tong University |
Ren, Jieji | Shanghai Jiao Tong University |
Chen, Jingjing | Shanghai Jiao Tong University |
Zhang, Ruo | University College London |
Wang, Weiming | Shanghai Jiao Tong University |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Bimanual Manipulation, Learning from Demonstration, Imitation Learning
Abstract: While humans can use parts of their arms other than the hands for manipulations like gathering and supporting, whether robots can effectively learn and perform the same type of operations remains relatively unexplored. As these manipulations require joint-level control to regulate the complete poses of the robots, we develop AirExo, a low-cost, adaptable, and portable dual-arm exoskeleton, for teleoperation and demonstration collection. As collecting teleoperated data is expensive and time-consuming, we further leverage AirExo to collect cheap in-the-wild demonstrations at scale. Under our in-the-wild learning framework, we show that with only 3 minutes of the teleoperated demonstrations, augmented by diverse and extensive in-the-wild data collected by AirExo, robots can learn a policy that is comparable to or even better than one learned from teleoperated demonstrations lasting over 20 minutes. Experiments demonstrate that our approach enables the model to learn a more general and robust policy across the various stages of the task, enhancing the success rates in task completion even with the presence of disturbances. Project website: https://airexo.github.io/
|
|
13:30-15:00, Paper ThBT11-CC.6 | Add to My Program |
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models |
|
Papadimitriou, Dimitris | UC Berkeley |
Brown, Daniel | University of Utah |
Keywords: Learning from Demonstration, Reinforcement Learning, Probabilistic Inference
Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.
|
|
13:30-15:00, Paper ThBT11-CC.7 | Add to My Program |
Instructing Robots by Sketching: Learning from Demonstration Via Probabilistic Diagrammatic Teaching |
|
Zhi, Weiming | Carnegie Mellon University |
Zhang, Tianyi | Carnegie Mellon University |
Johnson-Roberson, Matthew | Carnegie Mellon University |
Keywords: Learning from Demonstration, Probabilistic Inference
Abstract: Learning from Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, which are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
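The ray-tracing step above starts from standard pinhole back-projection of a sketched pixel into a 3D viewing ray. The sketch below shows this step only; the intrinsics and camera pose are illustrative values, and density fitting is omitted.

```python
import numpy as np

def pixel_to_ray(u, v, K, R, cam_origin):
    """Back-project a sketched 2D pixel (u, v) into a 3D viewing ray.
    K: camera intrinsic matrix; R: camera-to-world rotation."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_world = R @ d_cam
    return cam_origin, d_world / np.linalg.norm(d_world)

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
origin, direction = pixel_to_ray(400, 260, K, np.eye(3), np.zeros(3))
# Points along the ray: candidate 3D regions for fitting the
# trajectory density in task space.
samples = origin + np.outer(np.linspace(0.2, 2.0, 5), direction)
print(samples)
```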
|
|
13:30-15:00, Paper ThBT11-CC.8 | Add to My Program |
Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation |
|
Wang, Jenny | Carnegie Mellon University |
Donca, Octavian | Carnegie Mellon University |
Held, David | Carnegie Mellon University |
Keywords: Learning from Demonstration, Deep Learning Methods, Representation Learning
Abstract: Relative placement tasks are an important category of tasks in which one object needs to be placed in a desired pose relative to another object. Previous work has shown success in learning relative placement tasks from just a small number of demonstrations, when using relational reasoning networks with geometric inductive biases. However, such methods fail to consider that demonstrations for the same task can be fundamentally multimodal, like a mug hanging on any of n racks. We propose a method that retains the provably translation-invariant and relational properties of prior work but incorporates additional properties that enable learning multimodal, distributional examples. We show that our method is able to learn precise relative placement tasks with a small number of multimodal demonstrations with no human annotations across a diverse set of objects within a category. Supplementary information can be found on the website: https://sites.google.com/view/tax-posed/home.
|
|
13:30-15:00, Paper ThBT11-CC.9 | Add to My Program |
Globally Stable Neural Imitation Policies |
|
Abyaneh, Amin | McGill University |
Sosa Guzmán, Mariana | Universidad Veracruzana |
Lin, Hsiu-Chin | McGill University |
Keywords: Learning from Demonstration, Imitation Learning, Manipulation Planning
Abstract: Imitation learning mitigates the resource-intensive nature of learning policies from scratch by mimicking expert behavior. While existing methods can accurately replicate expert demonstrations, they often exhibit unpredictability in unexplored regions of the state space, thereby raising major safety concerns when facing perturbations. We propose SNDS, an imitation learning approach aimed at efficient training of scalable neural policies while formally ensuring global stability. SNDS leverages a neural architecture that enables the joint training of the policy and its associated Lyapunov candidate to ensure global stability throughout the learning process. We validate our approach through extensive simulations and deploy the trained policies on a real-world manipulator arm. The results confirm SNDS's ability to address instability, accuracy, and computational intensity challenges highlighted in the literature, positioning it as a promising solution for scalable and stable policy learning in complex environments.
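A common stability-by-construction trick that matches the joint policy/Lyapunov idea above is to project a nominal learned vector field so that a Lyapunov candidate strictly decreases (in the style of Kolter & Manek, 2019). The sketch below is one plausible reading of such a construction, not necessarily SNDS's exact architecture.

```python
import numpy as np

def stable_dynamics(x, f_nominal, alpha=1.0):
    """Project f_nominal(x) onto the half-space where the quadratic
    Lyapunov candidate V(x) = 0.5 * ||x||^2 decreases at rate alpha."""
    f = f_nominal(x)
    grad_v = x                                    # gradient of V
    violation = grad_v @ f + alpha * 0.5 * (x @ x)  # want this <= 0
    if violation > 0:
        f = f - violation * grad_v / (grad_v @ grad_v + 1e-9)
    return f

f_nominal = lambda x: np.array([x[1], x[0]])      # an unstable "network"
x = np.array([1.0, 0.5])
for _ in range(2000):
    x = x + 0.01 * stable_dynamics(x, f_nominal)  # forward Euler rollout
print(np.linalg.norm(x))   # decays toward the equilibrium at the origin
```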
|
|
ThBT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning II |
|
|
Chair: Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Co-Chair: Natale, Lorenzo | Istituto Italiano Di Tecnologia |
|
13:30-15:00, Paper ThBT12-CC.1 | Add to My Program |
Pedestrian Trajectory Prediction Using Dynamics-Based Deep Learning |
|
Wang, Honghui | University of New South Wales |
Zhi, Weiming | Carnegie Mellon University |
Batista, Gustavo | UNSW |
Chandra, Rohitash | UNSW Sydney |
Keywords: Deep Learning Methods, Dynamics, Collision Avoidance
Abstract: Pedestrian trajectory prediction plays an important role in autonomous driving systems and robotics. Recent work utilizing prominent deep learning models for pedestrian motion prediction makes limited a priori assumptions about human movements, resulting in a lack of explainability and of explicit constraints enforced on predicted trajectories. We present a dynamics-based deep learning framework with a novel asymptotically stable dynamical system integrated into a Transformer-based model. We use an asymptotically stable dynamical system to model human goal-targeted motion by enforcing that the predicted walking trajectory converges to a predicted goal position, and to provide the Transformer model with prior knowledge and explainability. Our framework features a Transformer model that works with a goal estimator and the dynamical system to learn features from pedestrian motion history. The results show that our framework outperforms prominent models on five benchmark human motion datasets.
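A minimal sketch of the constraint such a stable system imposes: every rollout of x_dot = -k (x - goal) converges to the goal regardless of the start. The gain below is illustrative; in the paper the goal comes from the learned goal estimator.

```python
import numpy as np

def goal_dynamics(x, goal, k=1.5):
    """Asymptotically stable prior: velocity points at the goal and
    vanishes there, so predicted trajectories cannot diverge."""
    return -k * (x - goal)

goal = np.array([5.0, 2.0])     # e.g. output of the goal estimator
x, dt, path = np.array([0.0, 0.0]), 0.1, []
for _ in range(60):
    x = x + dt * goal_dynamics(x, goal)
    path.append(x.copy())
print(path[-1])   # ~[5, 2]: the rollout approaches the predicted goal
```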
|
|
13:30-15:00, Paper ThBT12-CC.2 | Add to My Program |
POAQL: A Partially Observable Altruistic Q-Learning Method for Cooperative Multi-Agent Reinforcement Learning |
|
Tao, Lesong | Xi'an Jiaotong University |
Kang, Miao | Xi’an Jiaotong University |
Dong, Jinpeng | Xi'an Jiaotong University |
Zhang, Songyi | Xi'an Jiaotong University |
Ye, Ke | Xi'an Jiaotong University |
Chen, Shitao | Xi'an Jiaotong University |
Zheng, Nanning | Xi'an Jiaotong University |
Keywords: Reinforcement Learning, Path Planning for Multiple Mobile Robots or Agents, Cooperating Robots
Abstract: Multi-Agent Path Finding (MAPF) is an important issue in multi-agent cooperation. Many studies apply Multi-Agent Reinforcement Learning (MARL) to solve MAPF in partially observable settings. The objective of cooperative MARL is to maximize the cumulative team reward. Nevertheless, in partially observable settings, the team reward is misleading due to unpredictable factors from the behavior and state of unobserved agents. To address this issue, we propose a Partially Observable Altruistic Q-learning (POAQL) method. POAQL considers the cumulative reward of the observed subteam instead of the whole team, where Altruistic Q-learning plays an important role in learning the subteam action value. In addition, we design a new conflict-resolution mechanism that requires no additional guidance, emphasizing the cooperative nature of MARL frameworks. Experimental results show that POAQL outperforms existing reinforcement learning methods in terms of efficiency and performance.
|
|
13:30-15:00, Paper ThBT12-CC.3 | Add to My Program |
Statler: State-Maintaining Language Models for Embodied Reasoning |
|
Yoneda, Takuma | Toyota Technological Institute at Chicago |
Fang, Jiading | Toyota Technological Institute at Chicago |
Li, Peng | Fudan University |
Zhang, Huanyu | The University of Chicago |
Jiang, Tianchong | University of Chicago |
Lin, Shengjie | TTI-Chicago |
Picker, Benjamin | University of Chicago |
Yunis, David | Toyota Technological Institute at Chicago |
Mei, Hongyuan | Toyota Technological Institute at Chicago |
Walter, Matthew | Toyota Technological Institute at Chicago |
Keywords: Deep Learning Methods, Manipulation Planning, Task and Motion Planning
Abstract: There has been a significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and track its transition as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
|
|
13:30-15:00, Paper ThBT12-CC.4 | Add to My Program |
Multi-Granular Transformer for Motion Prediction with LiDAR |
|
Gan, Yiqian | Tusimple, Inc |
Xiao, Hao | TuSimple Inc |
Zhao, Yizhe | TuSimple |
Zhang, Ethan | University of Michigan |
Huang, Zhe | TuSimple Inc |
Ye, Xin | Arizona State University |
Ge, Lingting | TuSimple Inc |
Keywords: Deep Learning Methods, AI-Based Methods, Autonomous Agents
Abstract: Motion prediction has been an essential component of autonomous driving systems since it handles highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents. To further enhance MGTR’s capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on Waymo Open Dataset motion prediction benchmark and show that the proposed method achieved state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/) .
|
|
13:30-15:00, Paper ThBT12-CC.5 | Add to My Program |
What Matters for Active Texture Recognition with Vision-Based Tactile Sensors |
|
Böhm, Alina | TU Darmstadt |
Schneider, Tim | Technical University Darmstadt |
Belousov, Boris | German Research Center for Artificial Intelligence - DFKI |
Kshirsagar, Alap | Technische Universität Darmstadt |
Lin, Lisa | Justus-Liebig-Universität Gießen |
Doerschner, Katja | Justus Liebig University Giessen |
Drewing, Knut | Giessen University |
Rothkopf, Constantin | Frankfurt Institute for Advanced Studies |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Probabilistic Inference, Deep Learning Methods, Force and Tactile Sensing
Abstract: This paper explores active sensing strategies that employ vision-based tactile sensors for robotic perception and classification of fabric textures. We formalize the active sampling problem in the context of tactile fabric recognition and provide an implementation of information-theoretic exploration strategies based on minimizing predictive entropy and variance of probabilistic models. Through ablation studies and human experiments, we investigate which components are crucial for quick and reliable texture recognition. Along with the active sampling strategies, we evaluate neural network architectures, representations of uncertainty, influence of data augmentation, and dataset variability. By evaluating our method on a previously published Active Clothing Perception Dataset and on the real system, we establish that the choice of the exploration strategy has only a minor influence on the recognition accuracy, whereas data augmentation and dropout rate play a significantly larger role. In a comparison study, while humans achieve 66.9% recognition accuracy, our best approach reaches 90.0% in under 5 touches, highlighting that vision-based tactile sensors are highly effective for fabric texture recognition.
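For reference, the predictive entropy used by such information-theoretic strategies is the entropy of the mean class distribution over stochastic forward passes (e.g. MC dropout). The sketch below computes it for two hypothetical candidate touches; Dirichlet samples stand in for real model outputs.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the mean class distribution; probs has shape
    (passes, classes). High entropy marks fabrics worth touching again."""
    p_mean = probs.mean(axis=0)
    return -np.sum(p_mean * np.log(p_mean + 1e-12))

# Toy: 10 dropout passes over 4 fabric classes for two candidate touches.
rng = np.random.default_rng(0)
confident = rng.dirichlet([20, 1, 1, 1], size=10)
uncertain = rng.dirichlet([2, 2, 2, 2], size=10)
print(predictive_entropy(confident), predictive_entropy(uncertain))
# An entropy-minimizing strategy would sample where entropy is larger.
```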
|
|
13:30-15:00, Paper ThBT12-CC.6 | Add to My Program |
Learning with Chemical versus Electrical Synapses - Does It Make a Difference? |
|
Farsang, Monika | TU Wien |
Lechner, Mathias | Massachusetts Institute of Technology |
Lung, David | TU Wien |
Hasani, Ramin | Massachusetts Institute of Technology (MIT) |
Rus, Daniela | MIT |
Grosu, Radu | TU Wien |
Keywords: Bioinspired Robot Learning, Deep Learning Methods
Abstract: Bio-inspired neural networks have the potential to advance our understanding of neural computation and improve the state-of-the-art of AI systems. Bio-electrical synapses directly transmit neural signals by enabling fast current flow between neurons. In contrast, bio-chemical synapses transmit neural signals indirectly, through neurotransmitters. Prior work showed that interpretable dynamics for complex robotic control can be achieved by using chemical synapses within a sparse, bio-inspired architecture called Neural Circuit Policies (NCPs). However, a comparison of these two synaptic models, within the same architecture, remains an unexplored area. In this work we aim to determine the impact of using chemical synapses compared to electrical synapses, in both sparse and all-to-all connected networks. We conduct experiments with autonomous lane-keeping through a photorealistic autonomous driving simulator to evaluate their performance under diverse conditions and in the presence of noise. The experiments highlight the substantial influence of the architectural and synaptic-model choices, respectively. Our results show that employing chemical synapses yields noticeable improvements compared to electrical synapses, and that NCPs lead to better results in both synaptic models.
|
|
13:30-15:00, Paper ThBT12-CC.7 | Add to My Program |
Distill-Then-Prune: An Efficient Compression Framework for Real-Time Stereo Matching Network on Edge Devices |
|
Pan, Baiyu | University of Macau |
Jiao, Jichao | Beijing University of Posts and Telecommunications |
Pang, Jianxin | UBtech Robotics Corp |
Cheng, Jun | Shenzhen Institutes of Advanced Technology |
Keywords: Deep Learning Methods, Transfer Learning, Computer Vision for Transportation
Abstract: In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtained a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. Firstly, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely-used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.
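For orientation, the distillation step in such a pipeline typically uses the standard response-based objective (Hinton et al.): a KL term between temperature-softened student and teacher distributions plus a hard cross-entropy term. The PyTorch sketch below is generic; the temperature, weighting, and bin count are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.7):
    """KL(student_T || teacher_T) * T^2 plus cross-entropy on the
    ground-truth classes (e.g. discretized disparity bins)."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 48)            # lightweight student logits
t = torch.randn(8, 48)            # frozen efficient-teacher logits
y = torch.randint(0, 48, (8,))
print(distillation_loss(s, t, y))
```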
|
|
13:30-15:00, Paper ThBT12-CC.8 | Add to My Program |
Incremental 3D Reconstruction through a Hybrid Explicit-And-Implicit Representation |
|
Li, Feifei | The Chinese University of Hong Kong Shenzhen |
Hu, Panwen | The Chinese University of Hong Kong, Shenzhen |
Song, Qi | The Chinese University of Hong Kong, Shenzhen |
Huang, Rui | The Chinese University of Hong Kong, Shenzhen |
Keywords: Incremental Learning, Visual Learning, Deep Learning Methods
Abstract: 3D reconstruction is an important task in computer vision and is widely used in robotics and autonomous driving. When building large-scale scenes, limitations in computing resources and the difficulty of accessing the entire dataset in a single task are inevitable. Therefore, an incremental reconstruction approach is desired. On the one hand, traditional explicit 3D reconstruction methods such as SLAM and SFM require global optimization, which means that time and space resources increase dramatically with the growth of training data. On the other hand, implicit methods like Neural Radiance Fields (NeRF) suffer from catastrophic forgetting if trained incrementally. In this paper, we incrementally reconstruct 3D models in a hybrid representation, where the density of the radiance field is formulated by a voxel grid, and the view-dependent color information of the points is inferred by a shallow MLP. The expansion of the voxel grid and the distillation of the shallow MLP are efficient in this case. Experimental results demonstrate that our incremental method achieves a level of accuracy on par with approaches employing global optimization techniques.
|
|
13:30-15:00, Paper ThBT12-CC.9 | Add to My Program |
Sim2Real Bilevel Adaptation for Object Surface Classification Using Vision-Based Tactile Sensors |
|
Caddeo, Gabriele Mario | Istituto Italiano Di Tecnologia |
Maracani, Andrea | Istituto Italiano Di Tecnologia and University of Genoa |
Alfano, Paolo Didier | Istituto Italiano Di Tecnologia |
Piga, Nicola Agostino | Istituto Italiano Di Tecnologia |
Rosasco, Lorenzo | Istituto Italiano Di Tecnologia & Massachusetts Institute of Technology |
Natale, Lorenzo | Istituto Italiano Di Tecnologia |
Keywords: Deep Learning Methods, Force and Tactile Sensing
Abstract: In this paper, we address the Sim2Real gap in the field of vision-based tactile sensors for classifying object surfaces. We train a Diffusion Model to bridge this gap using a relatively small dataset of real-world images randomly collected from unlabeled everyday objects via the DIGIT sensor. Subsequently, we employ a simulator to generate images by uniformly sampling the surface of objects from the YCB Model Set. These simulated images are then translated into the real domain using the Diffusion Model and automatically labeled to train a classifier. During this training, we further align features of the two domains using an adversarial procedure. Our evaluation is conducted on a dataset of tactile images obtained from a set of ten 3D printed YCB objects. The results reveal a total accuracy of 81.9%, a significant improvement compared to the 34.7% achieved by the classifier trained solely on simulated images. This demonstrates the effectiveness of our approach. We further validate our approach using the classifier on a 6D object pose estimation task from tactile data.
|
|
ThBT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration V |
|
|
Chair: Osa, Takayuki | University of Tokyo |
Co-Chair: Wang, Ziwei | Lancaster University |
|
13:30-15:00, Paper ThBT13-AX.1 | Add to My Program |
Towards Human-Robot Collaborative Surgery: Trajectory and Strategy Learning in Bimanual Peg Transfer |
|
Hu, Zhaoyang Jacopo | Imperial College London |
Wang, Ziwei | Lancaster University |
Huang, Yanpei | Imperial College London |
Sena, Aran | Foster + Partners |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Burdet, Etienne | Imperial College London |
Keywords: Human-Robot Collaboration, Medical Robots and Systems, Autonomous Agents
Abstract: While the traditional control of surgical robots relies on fully manual teleoperation, human-robot collaborative systems promise to address issues such as workspace constraints and laborious tasks. In particular, shared control between human and robot can reduce the surgeon's workload and improve overall surgical performance by supporting the surgeon's effort during movements while keeping them in charge of complex control phases. In this letter, we propose a task segmentation of the bimanual peg transfer procedure that alternates manual and autonomous control correspondingly. The authority allocation in this shared control framework considers both the limitations of learning-based methods and the higher dexterity of humans during physical interaction. The human motion and strategies are transferred from an expert human to a da Vinci Research Kit (dVRK) using an epsilon-greedy maximum entropy inverse reinforcement learning algorithm. The learned model enables training an intelligent agent that can skillfully collaborate with the human operator during the surgical task. The proposed shared control framework is verified first on a virtual platform and then on a real dVRK to assess its usability and robustness. The results show that, compared to traditional manual teleoperation, our method can achieve faster and more consistent peg transfers. An analysis of the participants' effort also reveals a significantly lower perception of the workload.
|
|
13:30-15:00, Paper ThBT13-AX.2 | Add to My Program |
SIREN: Underwater Robot-To-Human Communication Using Audio |
|
Fulton, Michael | University of Minnesota |
Sattar, Junaed | University of Minnesota |
Absar, Rafa | Metro State University |
Keywords: Human-Robot Collaboration, Marine Robotics, Field Robots
Abstract: In this paper we present SIREN: a novel audio-based communication system for underwater human-robot interaction. SIREN utilizes a surface transducer to produce sound by vibrating the frame of an underwater robot, essentially turning the robot's outer surface into the vibrating membrane of a speaker. We employ this hardware in two forms of robot-to-human communication: synthesized text-to-speech (TTS-Sonemes) and synthesized musical indicators (Tonal-Sonemes). To profile the system's capabilities with respect to underwater communication, we perform a substantial in-person human study with 12 participants. In this study, participants were trained on the use of one of the previously mentioned audio communication systems. Participants were then asked to identify the communication from their system in a pool at various distances. This study's results demonstrate that sound is a viable method of underwater communication. TTS-Sonemes outperform Tonal-Sonemes at close distances but fail at greater distances, while Tonal-Sonemes remain recognizable as the distance to the robot increases.
|
|
13:30-15:00, Paper ThBT13-AX.3 | Add to My Program |
Shared Autonomy Via Variable Impedance Control and Virtual Potential Fields for Encoding Human Demonstrations |
|
Jadav, Shail | IIT Gandhinagar |
Heidersberger, Johannes | Technische Universität Wien |
Ott, Christian | TU Wien |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Compliance and Impedance Control
Abstract: This article introduces a framework for complex human-robot collaboration tasks, such as the co-manufacturing of furniture. For these tasks, it is essential to encode tasks from human demonstration and reproduce these skills in a compliant and safe manner. Therefore, two key components are addressed in this work: motion generation and shared autonomy. We propose a motion generator based on a time-invariant potential field, capable of encoding wrench profiles and complex, closed-loop trajectories, and additionally incorporating obstacle avoidance. The paper also addresses shared autonomy (SA), which enables synergetic collaboration between human operators and robots by dynamically allocating authority. Variable impedance control (VIC) and force control are employed, where impedance and wrench are adapted based on the human-robot autonomy factor derived from interaction forces. System passivity is ensured by an energy-tank-based task passivation strategy. The framework's efficacy is validated through simulations and an experimental study employing a Franka Emika Research 3 robot.
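A generic potential-field sketch for context: an attractive quadratic well at the goal plus classic repulsive terms inside an influence radius around each obstacle. All gains below are illustrative, and the paper's generator additionally encodes wrench profiles and closed-loop paths, which this sketch does not.

```python
import numpy as np

def potential_force(x, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=0.4):
    """Command from a virtual potential field: gradient of an attractive
    well at `goal` plus repulsion within radius rho0 of each obstacle."""
    f = -k_att * (x - goal)
    for o in obstacles:
        d = np.linalg.norm(x - o)
        if 1e-6 < d < rho0:
            f += k_rep * (1.0 / d - 1.0 / rho0) / d**2 * (x - o) / d
    return f

x, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])
obstacles = [np.array([0.5, 0.55])]
for _ in range(800):                      # simple kinematic rollout
    x = x + 0.01 * potential_force(x, goal, obstacles)
print(x)   # close to the goal, having skirted the obstacle
```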
|
|
13:30-15:00, Paper ThBT13-AX.4 | Add to My Program |
Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks |
|
Osa, Takayuki | University of Tokyo |
Harada, Tatsuya | The University of Tokyo |
Keywords: Human-Robot Collaboration, Multi-Robot Systems, Reinforcement Learning
Abstract: Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver’s policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver’s policy by training it for diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trial and error. In addition, to robustify the caregiver’s policy, we propose a strategy for sampling a care-receiver’s response in an adversarial manner during training. We evaluated the proposed method using tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.
|
|
13:30-15:00, Paper ThBT13-AX.5 | Add to My Program |
Human Robot Shared Control in Surgery: A Performance Assessment |
|
Chen, Longrui | Imperial College London |
Hu, Zhaoyang Jacopo | Imperial College London |
Huang, Yanpei | Imperial College London |
Burdet, Etienne | Imperial College London |
Rodriguez y Baena, Ferdinando | Imperial College, London, UK |
Keywords: Human-Robot Collaboration, Medical Robots and Systems, Physical Human-Robot Interaction
Abstract: While surgical robots, such as the da Vinci Surgical System, have become prevalent in minimally invasive surgeries, they are predominantly used by the human operator to directly teleoperate the tools. This paper aims to analyse the different methods of human robot shared control in the surgical domain. We propose a reinforcement learning algorithm, transverse generative adversarial imitation learning (tGAIL), which is employed to train the robot from the expert's demonstration and shows competitive generalization ability compared to inverse reinforcement learning and conventional GAIL. We then propose a priority-changing shared control method to effectively combine the surgeon's and the robot's strengths by dynamically adjusting control priority based on the deviation distance. We show that using this method in a supervision framework boosts the performance of the human operator when completing the peg transfer task. By learning from the expert and collaborating with the human during the task, the intelligent agent helps to reduce surgery time by 31.7% and the human input by 60.5% compared to direct teleoperation.
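One plausible reading of "priority based on deviation distance" is a smooth blend between human and autonomous commands via a sigmoid authority factor. The blend form and constants below are illustrative assumptions, not the paper's exact arbitration law.

```python
import numpy as np

def blended_command(u_human, u_robot, deviation, d0=0.02, s=200.0):
    """Priority-changing arbitration: authority shifts smoothly to the
    autonomous agent as the deviation distance grows.
    alpha = 0 -> pure human command, alpha = 1 -> pure robot command."""
    alpha = 1.0 / (1.0 + np.exp(-s * (deviation - d0)))
    return (1 - alpha) * u_human + alpha * u_robot

u_h = np.array([0.01, 0.0, 0.0])       # surgeon teleoperation velocity
u_r = np.array([0.0, 0.01, 0.0])       # learned-policy velocity
print(blended_command(u_h, u_r, deviation=0.005))  # mostly human
print(blended_command(u_h, u_r, deviation=0.050))  # mostly robot
```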
|
|
13:30-15:00, Paper ThBT13-AX.6 | Add to My Program |
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation Via Language Corrections |
|
Zha, Lihan | Stanford University |
Cui, Yuchen | Stanford University |
Lin, Li-Heng | Stanford University |
Kwon, Minae | Stanford University |
Gonzalez Arenas, Montserrat | Google Inc |
Zeng, Andy | Google DeepMind |
Xia, Fei | Google Inc |
Sadigh, Dorsa | Stanford University |
Keywords: Human-Robot Collaboration, Long term Interaction, Incremental Learning
Abstract: Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Adapting to and learning from online human corrections is therefore essential but non-trivial: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to feedback ranging from high-level corrections of human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), an LLM-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms the Code as Policies baseline, using only half of the total number of corrections needed in the first round and requiring little to no corrections after 2 iterations.
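The retrieval half of such a system is typically nearest-neighbor search over embeddings of stored corrections. A minimal sketch, assuming a hypothetical upstream embed() produces the vectors; the entries and query are invented for illustration.

```python
import numpy as np

def retrieve(query_vec, knowledge, k=2):
    """Return the k most relevant distilled corrections by cosine
    similarity between a query embedding and stored embeddings."""
    keys = np.stack([e["vec"] for e in knowledge])
    sims = keys @ query_vec / (
        np.linalg.norm(keys, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [knowledge[i]["text"] for i in np.argsort(sims)[::-1][:k]]

knowledge = [
    {"text": "grasp mugs by the rim, not the handle", "vec": np.array([0.9, 0.1, 0.0])},
    {"text": "slow down near the shelf edge", "vec": np.array([0.1, 0.9, 0.2])},
    {"text": "prefer the left bin for soft objects", "vec": np.array([0.0, 0.2, 0.9])},
]
query = np.array([0.85, 0.2, 0.05])   # embedding of the new task context
print(retrieve(query, knowledge))
```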
|
|
13:30-15:00, Paper ThBT13-AX.7 | Add to My Program |
An Intuitive Manual Guidance Scheme to Operate Rotation and Translation Simultaneously |
|
Shao, Fan | University of Naples Federico II |
Ficuciello, Fanny | Universitŕ Di Napoli Federico II |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction
Abstract: During certain human-robot collaboration tasks, the operator interacts with the robot by hand guidance to adjust the end-effector pose for spatial operations. Rotation is less intuitive to humans than translation, since imagining the path to a target orientation is more challenging. In the literature on control strategies for robot manual guidance, translation and rotation are usually controlled independently. Our research explores and quantifies the factors that influence operational intuition. A Virtual Fixture spatial guidance framework with intuition maintenance is proposed. This novel guidance scheme enables operators to effortlessly and simultaneously control both orientation and position in an intuitive way. High operation precision and efficiency can be achieved without interfering with the main task by exploring the null space with constraint optimization.
|
|
13:30-15:00, Paper ThBT13-AX.8 | Add to My Program |
An Ergo-Interactive Framework for Human-Robot Collaboration Via Learning from Demonstration |
|
Liao, Zhiwei | Xi'an Jiaotong University |
Lorenzini, Marta | Istituto Italiano Di Tecnologia |
Leonori, Mattia | Istituto Italiano Di Tecnologia |
Zhao, Fei | Xi'an Jiaotong University |
Jiang, Gedong | State Key Laboratory for Manufacturing Systems Engineering Xi'an |
Ajoudani, Arash | Istituto Italiano Di Tecnologia |
Keywords: Human-Robot Collaboration, Learning from Demonstration, Human Factors and Human-in-the-Loop
Abstract: This work presents an ergonomic and interactive human-robot collaboration (HRC) framework, through which new collaborative skills are extracted from a one-shot human demonstration and learned through Riemannian dynamic movement primitives (DMP). The proposed framework responds to human-robot interaction forces to adapt to the task requirements, while generating virtual “ergonomic forces” that guide the human toward more ergonomic postures, based on online monitoring of a kinematics-based index. The resulting motion is then integrated into the learned task trajectories. The framework is implemented on a mobile manipulator with a weighted whole-body Cartesian velocity controller, which meets the needs of large-scale HRC. To evaluate the proposed framework, a multi-subject experiment involving a human-robot co-carrying task is conducted. The performance of the ergo-interactive control in terms of task performance and ergonomics adaptation is verified under different experimental conditions. This is followed by a comparative statistical analysis. The experimental results show that the learned trajectory can be reproduced and generalized to several targets and adjusted online according to human preferences and ergonomics.
|
|
13:30-15:00, Paper ThBT13-AX.9 | Add to My Program |
A User-Centered Shared Control Scheme with Learning from Demonstration for Robotic Surgery |
|
Zheng, Haoyi | Imperial College London |
Hu, Zhaoyang Jacopo | Imperial College London |
Huang, Yanpei | Imperial College London |
Cheng, Xiaoxiao | Imperial College of Science, Technology and Medicine, London UK |
Wang, Ziwei | Lancaster University |
Burdet, Etienne | Imperial College London |
Keywords: Human-Robot Collaboration, Learning from Demonstration, Surgical Robotics: Laparoscopy
Abstract: The utilization of shared control in the realm of surgical robotics augments precision and safety by amalgamating human expertise with autonomous assistance. This paper proposes a user-centered shared control framework enabling a robot to learn from expert demonstration, predict the operator's intent, and modulate control authority to provide natural assistance when needed. We employ deep inverse reinforcement learning (IRL) to enable the robot to learn path planning from expert demonstrations with fast convergence, subsequently enhancing the policy with a potential field method. The control authority is allocated seamlessly between the human operator and the autonomous agent based on the prediction of the operator's movement from an adaptive filter and fuzzy logic inference. The proposed method is executed using the da Vinci Research Kit (dVRK) robot in a simulation environment, and its effectiveness is assessed through user performance evaluation in a trajectory tracking task. Compared to direct control and simple shared control, the proposed shared control scheme exhibits superior tracking accuracy and trajectory smoothness under external disturbances. Subjective responses underscore users' perception of the method's efficacy in enhancing their performance.
|
|
ThBT15-AX Oral Session, AX-203 |
Add to My Program |
Human-Aware Motion Planning |
|
|
Co-Chair: Bera, Aniket | Purdue University |
|
13:30-15:00, Paper ThBT15-AX.1 | Add to My Program |
Virtual Borders in 3D: Defining a Drone’s Movement Space Using Augmented Reality |
|
Riechmann, Malte | Bielefeld University of Applied Sciences |
Kirsch, André | Bielefeld University of Applied Sciences and Arts |
König, Matthias | Bielefeld University of Applied Sciences |
Rexilius, Jan | Bielefeld University of Applied Sciences and Arts |
Keywords: Human-Aware Motion Planning, Virtual Reality and Interfaces, Aerial Systems: Perception and Autonomy
Abstract: Robots are increasingly finding their way into home environments, where they can assist with household tasks like vacuuming or surveillance. While the robots can navigate on their own, users might not want them to go everywhere, or not in a specific way. For example, users might not want a drone to fly over a table where important letters and the newspaper are stored, even though it is the shortest path to the goal. Therefore, an application is required that is easy to learn and apply, even for inexperienced users. In this paper, we present a framework that uses a tablet as an augmented reality (AR) device to modify a robot's movement space in 3D. A user can define virtual borders in the real world with the tablet and add them to a map, changing the navigational behavior of the robot. The framework is evaluated in a user study with inexperienced participants that verifies our approach. Further analyses show that even complex scenarios can be covered with our framework.
|
|
13:30-15:00, Paper ThBT15-AX.2 | Add to My Program |
Trajectory Prediction for Robot Navigation Using Flow-Guided Markov Neural Operator |
|
Bhaskara, Rashmi | Purdue University |
Viswanath, Hrishikesh | Purdue University |
Bera, Aniket | Purdue University |
Keywords: Human-Aware Motion Planning, Deep Learning Methods
Abstract: Predicting pedestrian movements remains a complex and persistent challenge in robot navigation research. Accurate predictions must account for several factors, such as pedestrian interactions, the environment, crowd density, and social and cultural norms. Accurate prediction of pedestrian paths is vital for ensuring safe human-robot interaction, especially in robot navigation. Furthermore, this research has potential applications in autonomous vehicles, pedestrian tracking, and human-robot collaboration. In this paper, we therefore introduce FlowMNO, an Optical Flow-Integrated Markov Neural Operator designed to capture pedestrian behavior across diverse scenarios. We model trajectory prediction as a Markovian process, where future pedestrian coordinates depend solely on the current state. This problem formulation eliminates the need to store previous states. We conducted experiments using standard benchmark datasets like ETH, HOTEL, ZARA1, ZARA2, UCY, and RGB-D pedestrian datasets. Our study demonstrates that FlowMNO outperforms state-of-the-art deep learning methods, such as LSTM-, GAN-, and CNN-based approaches, by approximately 86.46% when predicting pedestrian trajectories. Thus, we show that FlowMNO can seamlessly integrate into robot navigation systems, enhancing their ability to navigate crowded areas smoothly.
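The Markovian formulation implies that a rollout only ever conditions on the current state. A toy stand-in (not the paper's neural-operator architecture; the residual MLP and the feature sizes are assumptions) might look like this:

```python
import torch
import torch.nn as nn

class MarkovStep(nn.Module):
    """Toy stand-in for the Markov formulation: the next position depends
    only on the current position and an optical-flow feature, so no
    trajectory history is stored."""
    def __init__(self, flow_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + flow_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, pos, flow):
        return pos + self.net(torch.cat([pos, flow], dim=-1))  # residual step

def rollout(model, pos, flow, steps=8):
    traj = [pos]
    for _ in range(steps):
        pos = model(pos, flow)  # Markov property: current state only
        traj.append(pos)
    return torch.stack(traj)    # (steps + 1, batch, 2)
```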
|
|
13:30-15:00, Paper ThBT15-AX.3 | Add to My Program |
Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-Based Social Robot Navigation |
|
Pohland, Sara | University of California, Berkeley |
Tan, Alvin | University of California, Berkeley |
Dutta, Prabal | University of California, Berkeley |
Tomlin, Claire | UC Berkeley |
Keywords: Human-Aware Motion Planning, Reinforcement Learning
Abstract: Reinforcement learning (RL) methods for social robot navigation have shown great success in navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and high diversity of such situations present a significant challenge for these data-driven methods. To overcome this issue, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that key high-level behaviors of our approach transfer to a physical robot.
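Point (3) can be illustrated with a hedged sketch of the reward shaping: penalize proximity to pedestrians in proportion to their estimated unpredictability. The margin, weight, and hinge form are illustrative assumptions, not the paper's exact reward.

```python
def shaped_reward(base_reward, distances, unpredictabilities, margin=0.45, w=0.1):
    """Penalize getting within `margin` meters of a pedestrian, weighted by
    that pedestrian's estimated unpredictability (both lists are per-pedestrian)."""
    penalty = sum(u * max(0.0, margin - d)
                  for d, u in zip(distances, unpredictabilities))
    return base_reward - w * penalty
```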
|
|
13:30-15:00, Paper ThBT15-AX.4 | Add to My Program |
Robot Navigation in Risky, Crowded Environments: Understanding Human Preferences |
|
Suresh, Aamodh | US Army Research Laboratory |
Taylor, Angelique | Cornell Tech |
Riek, Laurel D. | University of California San Diego |
Martinez, Sonia | UC San Diego |
Keywords: Human-Aware Motion Planning, Social HRI, Human-Centered Robotics
Abstract: The effective deployment of robots in risky and crowded environments (RCE) requires the specification of robot plans that are consistent with humans' behaviors. As is well known, humans perceive uncertainty and risk in a biased way, which can lead to a diversity of actions and expectations when interacting with others. To gain a better understanding of these behaviors, this work presents new data that aims to verify how these biases translate into a human navigational setting. More precisely, we conduct a novel study that recreates a COVID-19 pandemic grocery shopping scenario and asks participants to select among various paths with different levels of time-risk tradeoffs. The data shows that participants exhibit a variety of path preferences: from risky and urgent to safe and relaxed. To model users' decision making, we evaluate three popular risk models and find that CPT captures people's decisions most accurately, corroborating previous theoretical results that CPT is more expressive and inclusive. We also find that people's self-assessments of risk and time-urgency do not correlate with their path preferences in RCEs. Finally, we conduct a thematic analysis of custom open-ended questions to gauge interest in and preferences for navigational Explainable AI (XAI) in robots. A large majority showed interest in understanding the robot's intention (path plans and decisions) through various modalities like speech, touchscreen, and gestures. We provide crucial XAI design insights.
|
|
13:30-15:00, Paper ThBT15-AX.5 | Add to My Program |
MAC-ID: Multi-Agent Reinforcement Learning with Local Coordination for Individual Diversity |
|
Chung, Hojun | Seoul National University |
Oh, Jeongwoo | Seoul National University |
Heo, Jae Seok | Seoul National University |
Lee, Gunmin | Seoul National University |
Oh, Songhwai | Seoul National University |
Keywords: Human-Aware Motion Planning, Data Sets for Robot Learning, Motion and Path Planning
Abstract: As robots increasingly navigate crowded environments in our daily lives, the demand for socially-aware navigation methods that consider human-robot interaction has risen. When developing and assessing socially-aware navigation methods, pedestrian motion modeling plays a significant role. However, existing pedestrian models often struggle in complex environments and lack the capacity to generate diverse pedestrian styles. In this paper, we propose multi-agent reinforcement learning with local coordination for individual diversity (MAC-ID), which can synthesize diverse pedestrian motions via a local coordination factor (LCF). Our experiments demonstrate that manipulating the LCF induces interpretable changes in pedestrian behaviors, along with superior performance compared to existing pedestrian motion models. For evaluating socially-aware navigation methods using MAC-ID, we present a novel benchmark called BSON. It offers realistic and diverse social environments with pedestrians modeled via MAC-ID. We have trained and compared various navigation methods in BSON using a newly proposed metric called the socially-aware navigation score. Through BSON, users can evaluate their socially-aware navigation methods and compare them to baselines.
|
|
13:30-15:00, Paper ThBT15-AX.6 | Add to My Program |
Interactive Joint Planning for Autonomous Vehicles |
|
Chen, Yuxiao | Nvidia Research |
Veer, Sushant | NVIDIA |
Karkus, Peter | NVIDIA |
Pavone, Marco | Stanford University |
Keywords: Human-Aware Motion Planning, Integrated Planning and Learning, Motion and Path Planning
Abstract: In highly interactive driving scenarios, the actions of one agent greatly influence those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved considerable success in trajectory prediction, and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planners. Despite the ability of gradient-based planning algorithms, such as model predictive control (MPC), to generate fine-grained high-quality motion plans, it is difficult for them to leverage ego-conditioned prediction due to their iterative nature and need for gradients. We present Interactive Joint Planning (IJP), which bridges MPC with learned prediction models in a computationally scalable manner to provide us with the best of both worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the joint trajectory optimization tries to stay close to. Furthermore, by leveraging free-end homotopy classes—a novel concept we introduce in this paper—IJP efficiently searches over diverse motion plans. Closed-
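One way to picture the "prediction as prior" idea is a joint cost that keeps the optimized agent trajectories near the learned predictions while penalizing ego-agent proximity. This is a hedged sketch only: the weights, the quadratic prior term, and the hinge collision penalty are assumptions, and the real IJP solves such an objective inside an MPC with homotopy-class search.

```python
import numpy as np

def ijp_cost(ego_traj, agent_trajs, pred_trajs, w_prior=1.0, w_col=10.0, d_safe=2.0):
    """Joint objective sketch: keep each optimized agent trajectory close to
    its learned prediction (the prior) and penalize ego-agent proximity.
    All trajectories are (T, 2) arrays of planar positions."""
    prior = sum(np.sum((a - p) ** 2) for a, p in zip(agent_trajs, pred_trajs))
    collision = sum(
        np.sum(np.maximum(0.0, d_safe - np.linalg.norm(ego_traj - a, axis=-1)) ** 2)
        for a in agent_trajs)
    return w_prior * prior + w_col * collision
```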
|
|
13:30-15:00, Paper ThBT15-AX.7 | Add to My Program |
Improve Computing Efficiency and Motion Safety by Analyzing Environment with Graphics (I) |
|
Zhang, Qianyi | Nankai University |
Wu, Shichao | Nankai University |
Jia, Yuhang | Nankai University |
Xu, Yuang | Nankai University |
Liu, Jingtai | Nankai University |
Keywords: Human-Aware Motion Planning, Nonholonomic Motion Planning, Collision Avoidance
Abstract: Exploring topologically distinctive trajectories provides more options for robot motion planning. Since computing time grows greatly with environment complexity, improving exploration efficiency and picking the optimal trajectory in complex environments are critical issues. To this end, this paper proposes a Graphic- and Timed-Elastic-Band-based approach (GraphicTEB) with spatial completeness and high computing efficiency. The environment is analyzed using computer graphics, where obstacles are extracted as nodes and their relationships are built as edges. Three contributions are presented. 1) By assembling directed detours formed by nodes and segmented paths formed by edges, a generalized path consisting of nodes and edges derives various normal paths efficiently. 2) By multiplying two vectors starting from the obstacle point closest to the waypoint and the boundary point farthest from the waypoint, a novel obstacle gradient is introduced to guide safer optimization. 3) By modeling edges with an asymmetric Gaussian model, a trajectory evaluation strategy is designed to reflect the motion tendency and motion uncertainty of dynamic obstacles. Qualitative and quantitative simulations demonstrate that the proposed GraphicTEB achieves spatial completeness, a higher scene pass rate, and the fastest computing efficiency. Experiments are conducted in long-corridor and broad-room scenarios, where the robot passes through gaps safely, finds trajectories quickly, and passes pedestrians politely.
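Contribution 3) can be sketched as follows: the cost field around a moving pedestrian is inflated along the walking direction, so trajectories that cut in front of a person score worse. The specific sigma values and the exponential form are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

def asymmetric_gaussian_cost(p, ped_pos, ped_heading,
                             sigma_front=1.2, sigma_side=0.5, sigma_back=0.5):
    """Cost of waypoint `p` near a pedestrian, inflated along the
    pedestrian's motion direction (sigma values are illustrative)."""
    d = np.asarray(p) - np.asarray(ped_pos)
    c, s = np.cos(ped_heading), np.sin(ped_heading)
    lon, lat = c * d[0] + s * d[1], -s * d[0] + c * d[1]  # rotate into ped frame
    sigma_lon = sigma_front if lon >= 0 else sigma_back
    return np.exp(-0.5 * ((lon / sigma_lon) ** 2 + (lat / sigma_side) ** 2))
```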
|
|
13:30-15:00, Paper ThBT15-AX.8 | Add to My Program |
Generating Environment-Based Explanations of Motion Planner Failure: Evolutionary and Joint-Optimization Algorithms |
|
Liu, Qishuai | University of Nebraska-Lincoln |
Brandao, Martim | King's College London |
Keywords: Human-Aware Motion Planning, Human-Centered Robotics
Abstract: Motion planning algorithms are important components of autonomous robots, and they are difficult to understand and debug when they fail to find a solution to a problem. In this paper, we propose a solution to the failure-explanation problem: automatically generated, environment-based explanations. These explanations reveal the objects in the environment that are responsible for the failure, and how their location in the world should change so as to make the planning problem feasible. Concretely, we propose two methods - one based on evolutionary optimization and another on joint trajectory-and-environment continuous optimization. We show that the evolutionary method is well-suited to explaining sampling-based motion planners, or even optimization-based motion planners in situations where computation speed is not a concern (e.g. post-hoc debugging). However, the optimization-based method is 4000 times faster and thus more attractive for interactive applications, albeit at the cost of a slightly lower success rate. We demonstrate the capabilities of the methods through concrete examples and quantitative evaluation.
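A minimal, hedged stand-in for the evolutionary variant: randomly perturb the obstacle positions, keep the feasible layout with the smallest total displacement, and report it as the explanation. Real implementations use proper evolutionary operators with the planner in the loop; `plan_feasible` is a placeholder for such a planner call.

```python
import random

def explain_failure(obstacles, plan_feasible, iters=200, step=0.1):
    """Random-search stand-in for the evolutionary method: perturb the 2D
    obstacle positions and keep the feasible layout that moves them least.
    `plan_feasible(layout)` wraps a call to the motion planner."""
    best, best_cost = None, float("inf")
    for _ in range(iters):
        trial = [(x + random.gauss(0.0, step), y + random.gauss(0.0, step))
                 for x, y in obstacles]
        if plan_feasible(trial):
            cost = sum(abs(tx - x) + abs(ty - y)
                       for (tx, ty), (x, y) in zip(trial, obstacles))
            if cost < best_cost:
                best, best_cost = trial, cost
    return best  # None if no feasible rearrangement was found
```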
|
|
ThBT16-AX Oral Session, AX-204 |
Add to My Program |
Wearable Robotics I |
|
|
Chair: Zhu, Yaonan | Nagoya University |
Co-Chair: Hassan, Modar | University of Tsukuba |
|
13:30-15:00, Paper ThBT16-AX.1 | Add to My Program |
Vision-Based Wearable Steering Assistance for People with Impaired Vision in Jogging |
|
Liu, Xiaotong | University of Science and Technology of China |
Wang, Binglu | Northwestern Polytechnical University |
Li, Zhijun | University of Science and Technology of China |
Keywords: Wearable Robotics, Human Performance Augmentation, Data Sets for Robotic Vision
Abstract: Outdoor sports pose a challenge for people with impaired vision. The demand for higher-speed mobility inspired us to develop a vision-based wearable steering assistance system. To ensure broad applicability, we focused on a representative sports environment, the athletics track. Our efforts centered on improving the speed and accuracy of perception, enhancing planning adaptability for the real world, and providing swift and safe assistance for people with impaired vision. For perception, we engineered a lightweight multitask network capable of simultaneously detecting track lines and obstacles. Additionally, because existing datasets do not support multi-task detection on athletics tracks, we collected and annotated a new dataset (MAT) containing 1000 images. For planning, we integrated sampling and spline-curve methods, addressing the planning challenges of curves. Meanwhile, we used the positions of the track lines and obstacles as constraints to guide people with impaired vision safely along the current track. Our system is deployed on an embedded device, the Jetson Orin NX. Through outdoor experiments, it demonstrated adaptability in different sports scenarios, assisting users in moving freely over 400 meters at an average speed of 1.34 m/s, comparable to the jogging pace of sighted people. Our MAT dataset is publicly available at https://github.com/snoopy-l/MAT
|
|
13:30-15:00, Paper ThBT16-AX.2 | Add to My Program |
Variable Grounding Flexible Limb Tracking Center of Gravity for Sit-To-Stand Transfer Assistance |
|
Sugiura, Sojiro | Nagoya University |
Unde, Jayant | Nagoya University |
Zhu, Yaonan | Nagoya University |
Hasegawa, Yasuhisa | Nagoya University |
Keywords: Wearable Robotics, Physically Assistive Devices, Mechanism Design
Abstract: Wearable robotic limbs support sit-to-stand (STS) transfer with increased stability while maintaining a compact size, as well as providing body weight support. This paper proposes a new robotic limb, the Variable Grounding Flexible Limb (VGFL). The VGFL implements a novel strategy in which its grounding point tracks the forward shift of the wearer's center of gravity (CoG) during STS. Since the strategy keeps the grounding point close to the CoG, an upward force can be applied efficiently for STS assistance. To implement the strategy, this paper utilizes the High-Strength & Flexible Mechanism (HSFM). The HSFM can change its grounding point while being capable of lifting body weight with one motor. Owing to these unique characteristics, the VGFL can realize the strategy without multiple actuators and complex controllers. Furthermore, real-world experiments confirmed that the grounding point of the VGFL accurately tracked the forward shift of the CoG during STS assistance. Moreover, experiments conducted with three healthy subjects showed that the VGFL reduced the surface myoelectricity of the lower limbs during STS transfer. The VGFL thus demonstrates high support performance in STS with the CoG tracking strategy.
|
|
13:30-15:00, Paper ThBT16-AX.3 | Add to My Program |
Towards Enhanced Stability of Human Stance with a Supernumerary Robotic Tail |
|
Abeywardena, Sajeeva | University of Surrey |
Farkhatdinov, Ildar | Queen Mary University of London |
Keywords: Wearable Robotics, Human Performance Augmentation, Physical Human-Robot Interaction
Abstract: Neural control is paramount in maintaining the upright stance of a human; however, the associated time delay affects stability. In the design and control of wearable robots to augment human stance, the neural delay dynamics are often overly simplified or ignored, leading to over-specified systems. In this letter, the neural delay dynamics of human stance are modelled and embedded in the control of a supernumerary robotic tail to augment human balance. The actuation, geometric, and inertial parameters of the tail are examined. Through simulations, it is shown that by incorporating the delay dynamics, the requirements of the tail can be greatly reduced. Further, it is shown that the robustness of stance is significantly enhanced with a supernumerary tail and that there is a positive impact on muscle fatigue.
|
|
13:30-15:00, Paper ThBT16-AX.4 | Add to My Program |
Active, Quasi-Passive, Pneumatic, and Portable Knee Exoskeleton with Bidirectional Energy Flow for Efficient Air Recovery in Sit-Stand Tasks |
|
Miskovic, Luka | Jožef Stefan Institute |
Brecelj, Tilen | Jozef Stefan Institute |
Dezman, Miha | Karlsruhe Institute of Technology |
Petric, Tadej | Jozef Stefan Institute |
Keywords: Wearable Robotics, Hydraulic/Pneumatic Actuators, Prosthetics and Exoskeletons
Abstract: While existing literature encompasses exoskeleton-assisted sit-stand tasks, the integration of energy recovery mechanisms remains unexplored. To push the boundaries further, this study introduces a portable pneumatic knee exoskeleton that operates in both quasi-passive and active modes. Active mode aids in standing up (power generation), with energy flowing from the exoskeleton to the user; quasi-passive mode aids in sitting down (power absorption), during which the device absorbs and can store energy in the form of compressed air, leading to energy savings in active mode. The absorbed energy can be stored and later reused without compromising exoskeleton transparency in the meantime. In active mode, an air pump inflates the pneumatic artificial muscle (PAM), which stores the compressed air that can then be released into a pneumatic cylinder to generate torque. All electronic and pneumatic components are integrated into the system, and the exoskeleton weighs 3.9 kg with a maximum torque of 20 Nm at the knee joint. The paper describes the mechatronic design and mathematical model, and includes a pilot study with an able-bodied subject performing sit-to-stand tasks. The results show that the exoskeleton can recover energy while assisting the subject and reducing mean muscle activity by ∼31%. Further results highlight air regeneration's potential for energy saving in portable pneumatic exoskeletons, showing that the proposed device extends exoskeleton operation by ∼27%.
|
|
13:30-15:00, Paper ThBT16-AX.5 | Add to My Program |
Task-Based Human-Robot Collaboration Control of Supernumerary Robotic Limbs for Overhead Tasks |
|
Tu, Zhixin | Southern University of Science and Technology |
Fang, Yijun | Southern University of Science and Technology |
Leng, Yuquan | Southern University of Science and Technology |
Fu, Chenglong | Southern University of Science and Technology (SUSTech) |
Keywords: Wearable Robotics, Human-Robot Collaboration, Human Performance Augmentation
Abstract: Supernumerary robotic limbs (SRLs) are novel wearable robots that can augment human operating ability in completing difficult and complex tasks. In this work, a task-based human-SRLs collaboration control method for overhead tasks is developed. It is autonomous and safe, requiring neither active commands nor prior data. A task model is proposed to model the human-SRLs collaboration process, featuring different task states and state-transition conditions. Specifically, the overhead task process is modeled as a finite state machine (FSM) with four task states, three trigger events, and three SRLs actions. Real-time measured human motion data is used to trigger the task state transitions and estimate the task parameters, which serve as constraints for SRLs motion planning. The proposed admittance control with adjustable parameters allows the SRLs to behave like spring-damping systems with different characteristics in different states and actions, which enhances the safety and reliability of the human-SRLs interaction. Finally, the effectiveness of the proposed control method for overhead tasks is validated on a prototype of the human-SRLs system with two subjects under different installation heights. Trigger events and task parameters are successfully detected and estimated during the task process to trigger the coordination actions of the SRLs. The results demonstrate that the task-based collaboration method is useful for overhead tasks with different installation heights.
|
|
13:30-15:00, Paper ThBT16-AX.6 | Add to My Program |
Safe and Individualized Motion Planning for Upper-Limb Exoskeleton Robots Using Human Demonstration and Interactive Learning |
|
Chen, Yu | Tsinghua University |
Chen, Gong | Shenzhen MileBot Robotics |
Ye, Jing | Shenzhen MileBot Robotics Co. Ltd |
Qiu, Xiangjun | Tsinghua University |
Li, Xiang | Tsinghua University |
Keywords: Wearable Robotics, Safety in HRI, Physical Human-Robot Interaction
Abstract: A typical application of upper-limb exoskeleton robots is deployment in rehabilitation training, helping patients to regain manipulative abilities. However, as the patient is not always capable of following the robot, safety issues may arise during the training. Due to the bias in different patients, an individualized scheme is also important to ensure that the robot suits the specific conditions (e.g., movement habits) of a patient, hence guaranteeing effectiveness. To fulfill this requirement, this paper proposes a new motion planning scheme for upper-limb exoskeleton robots, which drives the robot to provide customized, safe, and individualized assistance using both human demonstration and interactive learning. Specifically, the robot first learns from a group of healthy subjects to generate a reference motion trajectory via probabilistic movement primitives (ProMP). It then learns from the patient during the training process to further shape the trajectory inside a moving safe region. The interactive data is fed back into the ProMP iteratively to enhance the individualized features for as long as the training process continues. The robot tracks the individualized trajectory under a variable impedance model to realize the assistance. Finally, the experimental results are presented in this paper to validate the proposed control scheme.
|
|
ThBT17-AX Oral Session, AX-205 |
Add to My Program |
Whole-Body Motion Planning and Control |
|
|
Chair: Semini, Claudio | Istituto Italiano Di Tecnologia |
Co-Chair: Okada, Kei | The University of Tokyo |
|
13:30-15:00, Paper ThBT17-AX.1 | Add to My Program |
Reactive Landing Controller for Quadruped Robots |
|
Roscia, Francesco | Istituto Italiano Di Tecnologia |
Focchi, Michele | Universitŕ Di Trento |
Del Prete, Andrea | University of Trento |
Caldwell, Darwin G. | Istituto Italiano Di Tecnologia |
Semini, Claudio | Istituto Italiano Di Tecnologia |
Keywords: Legged Robots, Whole-Body Motion Planning and Control
Abstract: Quadruped robots are machines intended for challenging and harsh environments. Despite progress in locomotion strategies, safely recovering from unexpected falls or planned drops is still an open problem, and it becomes even more difficult when high horizontal velocities are involved. In this work, we propose an optimization-based reactive Landing Controller that uses only proprioceptive measures for torque-controlled quadruped robots that free-fall onto flat horizontal ground, knowing neither the distance to the landing surface nor the flight time. Based on an estimate of the Center of Mass horizontal velocity, the method uses the Variable Height Springy Inverted Pendulum model to continuously recompute the feet position while the robot is falling. In this way, the quadruped is ready to attain a successful landing in all directions, even in the presence of significant horizontal velocities. Compared to a naive approach that keeps the feet still during the airborne stage, the method is demonstrated to dramatically enlarge the region of horizontal velocities that can be handled. To the best of our knowledge, this is the first time that a quadruped robot successfully recovers from falls with horizontal velocities up to 3 m/s in simulation. Experiments prove that the platform used, the Go1, can attain a stable standing configuration from falls with various horizontal velocities and different angular perturbations.
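As a rough intuition (not the paper's VHS-IP computation), foot placement during flight can be sketched as a capture-point-like offset along the estimated CoM horizontal velocity. The gain and omega values below are placeholder assumptions.

```python
import numpy as np

def landing_foot_offset(com_vel_xy, omega=3.0, gain=1.0):
    """Capture-point-like heuristic: shift the nominal footholds along the
    estimated CoM horizontal velocity by gain * v / omega. The gain and
    omega values are placeholders, not the paper's VHS-IP solution."""
    v = np.asarray(com_vel_xy, dtype=float)
    return gain * v / omega  # (x, y) offset applied to each nominal foothold
```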
|
|
13:30-15:00, Paper ThBT17-AX.2 | Add to My Program |
Hierarchical Optimization-Based Control for Whole-Body Loco-Manipulation of Heavy Objects |
|
Rigo, Alberto | University of Southern California |
Hu, Muqun | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Legged Robots, Whole-Body Motion Planning and Control, Mobile Manipulation
Abstract: In recent years, the field of legged robotics has seen growing interest in enhancing the capabilities of these robots through the integration of articulated robotic arms. However, achieving successful loco-manipulation, especially involving interaction with heavy objects, is far from straightforward, as object manipulation can introduce substantial disturbances that impact the robot's locomotion. This paper presents a novel framework for legged loco-manipulation that considers whole-body coordination through a hierarchical optimization-based control framework. First, an online manipulation planner computes the manipulation forces and manipulated object task-based reference trajectory. Then, pose optimization aligns the robot's trajectory with kinematic constraints. The resultant robot reference trajectory is executed via a linear MPC controller incorporating the desired manipulation forces into its prediction model. Our approach has been validated in simulation and hardware experiments, highlighting the necessity of whole-body optimization compared to the baseline locomotion MPC when interacting with heavy objects. Experimental results with Unitree Aliengo, equipped with a custom-made robotic arm, showcase its ability to successfully lift and carry an 8kg payload and manipulate doors.
|
|
13:30-15:00, Paper ThBT17-AX.3 | Add to My Program |
Toward Self-Righting and Recovery in the Wild: Challenges and Benchmarks |
|
Scalise, Rosario | University of Washington |
Caglar, Ege | University of Washington - Seattle |
Boots, Byron | University of Washington |
Kessens, Chad C. | United States Army Research Laboratory |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Failure Detection and Recovery, Performance Evaluation and Benchmarking
Abstract: Self-recovery is a critical capability for robust, agile robots operating in the real world. Given truly challenging terrain, it is nearly inevitable that, at some point, the robot will fail and subsequently need to recover if it is to continue its task. One critical subset of recovery is standing back up after falling down (aka “self-righting”), an essential early milestone for babies learning to walk, and an existential capability for animals. While some robots can be designed with multiple orientations for mobility, most robots that seek to affect the world would significantly benefit from planners/policies that facilitate self-righting whenever possible. In this work, we present a series of challenges that outline why recovery in the wild is difficult. We then present a set of benchmark policies trained in simulation using deep reinforcement learning (RL) and the Student-Teacher approach. Finally, we evaluate the performance of these policies on a set of benchmark contexts in simulation, and provide baseline validation on a physical robot.
|
|
13:30-15:00, Paper ThBT17-AX.4 | Add to My Program |
Design of Morphable StateNet Based on Pseudo-Generalization of Standing up Motions for Humanoid with Variable Body Structure |
|
Makabe, Tasuku | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Humanoid Robot Systems, Whole-Body Motion Planning and Control
Abstract: In this paper, we present Morphable StateNet, a StateNet with pseudo-generalized behaviors for robots with various degree-of-freedom arrangements and link lengths. Pseudo-generalization is performed by analytically calculating joint angles that satisfy the desired support conditions, focusing on the link lengths and antigravity joints that contribute to the motion, with constraints placed on the contact conditions between the environment and the robot body. We apply Morphable StateNet to the standing-up motion of humanoids with variable body structures and conduct evaluation experiments. We demonstrate the usefulness of the proposed method in environments with low friction coefficients using both a simulator and an actual humanoid.
|
|
13:30-15:00, Paper ThBT17-AX.5 | Add to My Program |
Agile and Dynamic Standing-Up Control for Humanoids Using 3D Divergent Component of Motion in Multi-Contact Scenario |
|
Zambella, Grazia | TU Wien |
Schuller, Robert | German Aerospace Center (DLR) |
Mesesan, George | German Aerospace Center (DLR) |
Bicchi, Antonio | Fondazione Istituto Italiano Di Tecnologia |
Ott, Christian | TU Wien |
Lee, Jinoh | German Aerospace Center (DLR) |
Keywords: Multi-Contact Whole-Body Motion Planning and Control, Whole-Body Motion Planning and Control
Abstract: Standing up is a task that humanoids need to be able to perform in order to be employed in real-world scenarios. This paper proposes a new robust strategy for a humanoid to stand up in challenging scenarios where no completely preplanned motion can accomplish the task. This strategy exploits the concept of the three-dimensional divergent component of motion and passivity-based whole-body control. The latter first maximizes the push forces applied to the robot's center of mass to generate an agile whole-body recovery motion. Then, during the rising phase, it reduces these forces to zero and stabilizes the robot in an upright position. Optimization of the centroidal angular momentum is fully integrated into the proposed whole-body standing-up control to create the trajectories of the hip and the upper-body joints online. The effectiveness of the proposed method is validated in simulations and experiments on the humanoid TORO.
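For reference, the divergent component of motion used in such controllers is commonly defined as xi = x + xdot/omega. A minimal sketch follows, with omega treated as a given scalar (the 3D-DCM formulation also admits a time-varying omega).

```python
import numpy as np

def divergent_component_of_motion(com_pos, com_vel, omega):
    """3D DCM: xi = x + xdot / omega, where x and xdot are the CoM position
    and velocity and omega is the natural frequency of the underlying
    inverted-pendulum dynamics."""
    return np.asarray(com_pos) + np.asarray(com_vel) / omega

# Example with omega = sqrt(g / z) for a 1 m pendulum height:
xi = divergent_component_of_motion([0.0, 0.0, 1.0], [0.5, 0.0, 0.0],
                                   omega=np.sqrt(9.81 / 1.0))
```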
|
|
13:30-15:00, Paper ThBT17-AX.6 | Add to My Program |
Representing Robot Geometry As Distance Fields: Applications to Whole-Body Manipulation |
|
Li, Yiming | Idiap Research Institute, École Polytechnique Fédérale De Lausan |
Zhang, Yan | EPFL |
Razmjoo, Amirreza | Idiap Research Institute |
Calinon, Sylvain | Idiap Research Institute |
Keywords: Representation Learning, Whole-Body Motion Planning and Control, Collision Avoidance
Abstract: In this work, we propose a novel approach to represent robot geometry as distance fields (RDF) that extends the principle of signed distance fields (SDFs) to articulated kinematic chains. Our method employs a combination of Bernstein polynomials to encode the signed distance for each robot link with high accuracy and efficiency, while ensuring the mathematical continuity and differentiability of SDFs. We further leverage the kinematic chain of the robot to produce the SDF representation in joint space, allowing robust distance queries in arbitrary joint configurations. The proposed RDF representation is differentiable and smooth in both task and joint spaces, enabling its direct integration into optimization problems. Additionally, the 0-level set of the robot corresponds to the robot surface, which can be seamlessly integrated into whole-body manipulation tasks. We conduct various experiments both in simulation and with 7-axis Franka Emika robots, comparing against baseline methods and demonstrating effectiveness in collision avoidance and whole-body manipulation tasks. Project page: https://sites.google.com/view/lrdf/home
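The joint-space query can be sketched as: express the task-space point in each link frame via forward kinematics, evaluate each link's distance model, and take the minimum. Here `link_sdfs` stands in for the paper's Bernstein-polynomial models, and the (rotation, translation) pose format is an assumption.

```python
import torch

def robot_sdf(point, link_sdfs, link_poses):
    """RDF-style query sketch: express the task-space `point` in every link
    frame (poses (R, t) come from forward kinematics at the current joint
    configuration) and take the minimum per-link signed distance.
    `link_sdfs[i]` stands in for the paper's Bernstein-polynomial model."""
    dists = []
    for sdf, (R, t) in zip(link_sdfs, link_poses):
        local = R.T @ (point - t)  # world frame -> link frame
        dists.append(sdf(local))
    return torch.min(torch.stack(dists))
```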
|
|
13:30-15:00, Paper ThBT17-AX.7 | Add to My Program |
Singularity-Robust Prioritized Whole-Body Tracking and Interaction Control with Smooth Task Transitions |
|
Wu, Xuwei | German Aerospace Center (DLR) |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Dietrich, Alexander | German Aerospace Center (DLR) |
Keywords: Whole-Body Motion Planning and Control, Compliance and Impedance Control, Redundant Robots
Abstract: In this work, we propose a singularity-robust whole-body control framework that ensures smooth task transitions while maintaining strict priorities. The weighted generalized inverse is adopted to derive a hierarchical control law compatible with singular and redundant tasks. Moreover, a smooth activation matrix is proposed to continuously shape both null-space projectors and task-level control actions. Validation has been conducted in MATLAB/Simulink and MuJoCo simulations with Rollin’ Justin.
|
|
13:30-15:00, Paper ThBT17-AX.8 | Add to My Program |
Learning Force Control for Legged Manipulation |
|
Portela, Tifanny | EPFL |
Margolis, Gabriel | Massachusetts Institute of Technology |
Ji, Yandong | UCSD |
Agrawal, Pulkit | MIT |
Keywords: Whole-Body Motion Planning and Control, Reinforcement Learning, Legged Robots
Abstract: Controlling the contact force during interactions is an inherent requirement for locomotion and manipulation tasks. Current reinforcement learning approaches to locomotion and manipulation rely implicitly on forceful interaction to accomplish tasks but do not explicitly regulate it. This paper proposes a reinforcement learning task specification that focuses on matching desired contact force levels. Integrating force control with the coordination of a robot's body and arm, we present an end-to-end policy for legged manipulator control. Force control enables us to realize compliant gripper and whole-body pulling movements that have not been previously demonstrated using a learned policy. It also facilitates a characterization of the force-tracking performance of learned policies in simulation and the real world, indicating their performance potential for force-critical tasks.
|
|
13:30-15:00, Paper ThBT17-AX.9 | Add to My Program |
A Study of Shared-Control with Bilateral Feedback for Obstacle Avoidance in Whole-Body Telelocomotion of a Wheeled Humanoid |
|
Baek, DongHoon | University of Illinois Urbana-Champaign |
Chang, Yu-Chen (Johnny) | University of Illinois, Urbana-Champaign |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Whole-Body Motion Planning and Control, Telerobotics and Teleoperation, Humanoid Robot Systems
Abstract: Teleoperation has emerged as an alternative to fully-autonomous systems for achieving human-level capabilities on humanoids. Specifically, teleoperation with whole-body control is a promising hands-free strategy for commanding humanoids, but it imposes greater physical and mental demands. To mitigate this limitation, researchers have proposed shared-control methods that incorporate robot decision-making to aid humans in low-level tasks, further reducing operation effort. However, shared-control methods for wheeled humanoid telelocomotion at the whole-body level have yet to be explored. In this work, we explore how whole-body bilateral feedback with haptics affects the performance of different shared-control methods for obstacle avoidance in diverse environments. A time-derivative Sigmoid function (TDSF) is implemented to generate more intuitive haptic feedback from obstacles. Comprehensive human experiments were conducted, and the results show that bilateral feedback enhances whole-body telelocomotion performance in unfamiliar environments but can reduce performance in familiar environments. Conveying the robot's intention through haptics showed further improvements, since the operator can use the haptic feedback for reactive short-distance planning and the visual feedback for long-distance planning.
|
|
ThBT18-AX Oral Session, AX-206 |
Add to My Program |
Representation Learning I |
|
|
Chair: Joho, Dominik | KUKA Deutschland GmbH |
Co-Chair: Moghadam, Peyman | CSIRO |
|
13:30-15:00, Paper ThBT18-AX.1 | Add to My Program |
MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning Via Interactive Perception |
|
Tatiya, Gyan | Tufts University |
Francis, Jonathan | Bosch Center for Artificial Intelligence |
Wu, Ho-Hsiang | Bosch Research |
Bisk, Yonatan | Carnegie Mellon University |
Sinapov, Jivko | Tufts University |
Keywords: Representation Learning, Sensorimotor Learning, Learning Categories and Concepts
Abstract: A holistic understanding of object properties across diverse sensory modalities (e.g., visual, audio, and haptic) is essential for tasks ranging from object categorization to complex manipulation. Drawing inspiration from cognitive science studies that emphasize the significance of multi-sensory integration in human perception, we introduce MOSAIC (Multimodal Object property learning with Self-Attention and Interactive Comprehension), a novel framework designed to facilitate the learning of unified multi-sensory object property representations. While it is undeniable that visual information plays a prominent role, we acknowledge that many fundamental object properties extend beyond the visual domain to encompass attributes like texture, mass distribution, or sounds, which significantly influence how we interact with objects. In MOSAIC, we leverage this profound insight by distilling knowledge from multimodal foundation models and aligning these representations not only across vision but also haptic and auditory sensory modalities. Through extensive experiments on a dataset where a humanoid robot interacts with 100 objects across 10 exploratory behaviors, we demonstrate the versatility of MOSAIC in two task families: object categorization and object-fetching tasks. Our results underscore the efficacy of MOSAIC's unified representations, showing competitive performance in category recognition through a simple linear probe setup and excelling in the fetch object task under zero-shot transfer conditions. This work pioneers the application of sensory grounding in foundation models for robotics, promising a significant leap in multi-sensory perception capabilities for autonomous systems. We have released the code, datasets, and additional results: https://github.com/gtatiya/MOSAIC.
|
|
13:30-15:00, Paper ThBT18-AX.2 | Add to My Program |
Neural Rearrangement Planning for Object Retrieval from Confined Spaces Perceivable by Robot's In-Hand RGB-D Sensor |
|
Ren, Hanwen | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Representation Learning, Reactive and Sensor-Based Planning, Task Planning
Abstract: Rearrangement planning for object retrieval tasks from confined spaces is a challenging problem, primarily due to the lack of open space for robot motion and limited perception. Several traditional methods exist to solve object retrieval tasks, but they require overhead cameras for perception, involve a time-consuming exhaustive search to find a solution, and often make unrealistic assumptions, such as having identical, simple-geometry objects in the environment. This paper presents a neural object retrieval framework that efficiently performs rearrangement planning of unknown, arbitrary objects in confined spaces to retrieve the desired object using a given robot grasp. Our method actively senses the environment with the robot's in-hand camera. It then selects and relocates the non-target objects such that they do not block the robot's path homotopy to the target object, thus also aiding an underlying path planner in quickly finding robot motion sequences. Furthermore, we demonstrate our framework in challenging scenarios, including real-world cabinet-like environments with arbitrary household objects. The results show that our framework achieves the best performance among all presented methods and is, on average, two orders of magnitude computationally faster than the best-performing baselines.
|
|
13:30-15:00, Paper ThBT18-AX.3 | Add to My Program |
MMPI: A Flexible Radiance Field Representation by Multiple Multi-Plane Images Blending |
|
He, Yuze | Tsinghua University |
Wang, Peng | The University of Hong Kong |
Hu, Yubin | Tsinghua University |
Zhao, Wang | Tsinghua University |
Yi, Ran | Shanghai Jiao Tong University |
Liu, Yong-Jin | Tsinghua University |
Wang, Wenping | The University of Hong Kong |
Keywords: Representation Learning, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: This paper presents a flexible representation of neural radiance fields based on multi-plane images (MPI) for high-quality view synthesis of complex scenes. MPI with Normalized Device Coordinate (NDC) parameterization is widely used in NeRF learning for its simple definition, easy calculation, and powerful ability to represent unbounded scenes. However, existing NeRF works that adopt the MPI representation for novel view synthesis can only handle simple forward-facing unbounded scenes (e.g., the scenes in the LLFF dataset), where the input cameras all observe in similar directions with small relative translations. Hence, extending these MPI-based methods to more complex scenes, such as large-range or even 360-degree scenes, is very challenging. In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, not limited to simple forward-facing scenes. Our key idea is to encode the neural radiance field with multiple MPIs facing different directions and blend them with an adaptive blending operation. For each region of the scene, the blending operation gives larger blending weights to those advantaged MPIs with stronger local representation abilities and lower weights to those with weaker representation abilities. This blending operation automatically modulates the multiple MPIs to appropriately represent the diverse local density and color information. Experiments on the KITTI and ScanNet datasets demonstrate that our proposed MMPI synthesizes high-quality images from diverse camera pose distributions and is fast to train, outperforming previous fast-training NeRF methods for novel view synthesis. Moreover, we show that MMPI can encode extremely long trajectories and produce novel view renderings, demonstrating its potential in applications like autonomous driving.
|
|
13:30-15:00, Paper ThBT18-AX.4 | Add to My Program |
Neural Implicit Swept Volume Models for Fast Collision Detection |
|
Joho, Dominik | KUKA Deutschland GmbH |
Schwinn, Jonas | Kuka Deutschland GmbH |
Safronov, Kirill | KUKA Deutschland GmbH |
Keywords: Representation Learning, Integrated Planning and Learning, Deep Learning in Grasping and Manipulation
Abstract: Collision detection is one of the most time-consuming operations during motion planning. Thus, there is increasing interest in exploring machine learning techniques to speed up collision detection and sampling-based motion planning. A recent line of research focuses on utilizing neural signed distance functions of either the robot geometry or the swept volume of the robot motion. Building on this, we present a novel neural implicit swept volume model to continuously represent arbitrary motions parameterized by their start and goal configurations. This allows signed distances from any point in the task space to the robot motion to be computed quickly. Further, we present an algorithm combining the speed of deep-learning-based signed distance computations with the strong accuracy guarantees of geometric collision checkers. We validate our approach in simulated and real-world robotic experiments, and demonstrate that it is able to speed up a commercial bin-picking application.
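A hedged sketch of the two ideas: a network conditioned on (point, start configuration, goal configuration) that outputs a signed distance, and a hybrid check that trusts the network only outside a conservative margin, deferring to a geometric checker otherwise. Architecture, margin logic, and names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SweptVolumeSDF(nn.Module):
    """Toy stand-in: signed distance from a task-space point to the volume
    swept by a motion parameterized by its start/goal configurations."""
    def __init__(self, dof=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 2 * dof, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, points, q_start, q_goal):
        # points: (N, 3); q_start, q_goal: (N, dof)
        return self.net(torch.cat([points, q_start, q_goal], dim=-1))

def is_motion_safe(model, points, q_start, q_goal, margin, exact_check):
    """Hybrid check: trust the learned distances only outside a conservative
    margin; fall back to the geometric collision checker near the surface."""
    n = points.shape[0]
    qs = q_start.unsqueeze(0).expand(n, -1)
    qg = q_goal.unsqueeze(0).expand(n, -1)
    d = model(points, qs, qg).squeeze(-1)
    if bool((d > margin).all()):
        return True                      # clearly collision-free
    return exact_check(q_start, q_goal)  # authoritative but slower
```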
|
|
13:30-15:00, Paper ThBT18-AX.5 | Add to My Program |
Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields |
|
Hausler, Stephen | CSIRO |
Hall, David | Commonwealth Scientific and Industrial Research Organisation |
Mahendren, Sutharsan | Queensland University of Technology |
Moghadam, Peyman | CSIRO |
Keywords: Representation Learning, Deep Learning for Visual Perception
Abstract: Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods that are based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way which is compact and ideal for robotics applications. However, few prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural fields-based registration that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilizing neural fields in unconstrained environments.
|
|
13:30-15:00, Paper ThBT18-AX.6 | Add to My Program |
3D-OAE: Occlusion Auto-Encoders for Self-Supervised Learning on Point Clouds |
|
Zhou, Junsheng | Tsinghua University |
Wen, Xin | NVIDIA |
Ma, Baorui | Beijing Academy of Artificial Intelligence |
Liu, Yu-Shen | Tsinghua University |
Gao, Yue | Tsinghua University |
Fang, Yi | New York University |
Han, Zhizhong | Wayne State University |
Keywords: Representation Learning
Abstract: Manual annotation of large-scale point clouds is tedious and unavailable for many harsh real-world tasks. Self-supervised learning, which pre-trains deep neural networks on raw, unlabeled data, is a promising approach to address this issue. Existing works usually rely on auto-encoders, establishing self-supervision through a self-reconstruction scheme. However, these auto-encoders merely focus on global shapes and do not distinguish local from global geometric features. To address this problem, we present a novel and efficient self-supervised point cloud representation learning framework, named 3D Occlusion Auto-Encoder (3D-OAE), to facilitate the detailed supervision inherent in local regions and global shapes. We propose to randomly occlude some local patches of point clouds and establish supervision by inpainting the occluded patches using the remaining ones. Specifically, we design an asymmetrical encoder-decoder architecture based on the standard Transformer, where the encoder operates only on the visible subset of patches to learn local patterns, and a lightweight decoder leverages these visible patterns to infer the missing geometries via self-attention. We find that occluding a very high proportion of the input point cloud (e.g. 75%) still yields nontrivial self-supervisory performance, which enables 3-4 times faster training while also improving accuracy. Experimental results show that our approach outperforms the state-of-the-art on a diverse range of downstream discriminative and generative tasks. Code is available at https://github.com/junshengzhou/3D-OAE.
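The occlusion step itself is simple to sketch: given a point cloud already grouped into patches, hide a high fraction of them and feed only the visible subset to the encoder. The (P, K, 3) patch layout below is an assumption made for illustration.

```python
import torch

def occlude_patches(patches, occlusion_ratio=0.75):
    """Split pre-grouped point-cloud patches of shape (P, K, 3) into visible
    and occluded subsets; only the visible patches go to the encoder, and
    the decoder must inpaint the occluded ones."""
    num_patches = patches.shape[0]
    n_occ = int(num_patches * occlusion_ratio)
    perm = torch.randperm(num_patches)
    occluded_idx, visible_idx = perm[:n_occ], perm[n_occ:]
    return patches[visible_idx], patches[occluded_idx]
```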
|
|
13:30-15:00, Paper ThBT18-AX.7 | Add to My Program |
Composing Pre-Trained Object-Centric Representations for Robotics from "What" and "Where" Foundation Models |
|
Shi, Junyao | University of Pennsylvania |
Qian, Jianing | University of Pennsylvania |
Ma, Yecheng Jason | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Keywords: Representation Learning, Imitation Learning, Sensorimotor Learning
Abstract: There have recently been large advances both in pre-training visual representations for robotic control and in segmenting unknown-category objects in general images. To leverage these for improved robot learning, we propose POCR, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate, across timesteps, the various entities in the scene, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Our pre-trained object-centric representations for control are thus constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state-of-the-art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.
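The "what-where" composition can be sketched as: for each mask from the pre-trained segmenter, pool a frozen descriptor over the masked region ("what") and append the mask centroid ("where"). The pooling choice and the centroid as the "where" code are illustrative assumptions, not necessarily POCR's exact design.

```python
import torch

def pocr_state(image_feats, masks, descriptor):
    """Compose per-entity 'what-where' slots: pool a frozen descriptor over
    each pre-computed segmentation mask ('what') and append the mask
    centroid in pixel coordinates ('where'). `descriptor` is a placeholder
    callable wrapping an off-the-shelf, frozen model."""
    slots = []
    for m in masks:                        # m: (H, W) boolean mask
        what = descriptor(image_feats, m)  # e.g., masked average pooling -> 1D tensor
        ys, xs = torch.nonzero(m, as_tuple=True)
        where = torch.stack([xs.float().mean(), ys.float().mean()])
        slots.append(torch.cat([what, where]))
    return torch.stack(slots)              # (num_entities, d_what + 2)
```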
|
|
13:30-15:00, Paper ThBT18-AX.8 | Add to My Program |
SKT-Hang: Hanging Everyday Objects Via Object-Agnostic Semantic Keypoint Trajectory Generation |
|
Kuo, Chia-Liang | National Yang Ming Chiao Tung University |
Chao, Yu-Wei | NVIDIA |
Chen, Yi-Ting | National Yang Ming Chiao Tung University |
Keywords: Representation Learning, Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task encountered in numerous aspects of our everyday lives. However, both the objects and the supporting items can exhibit substantial variations in shape and structure, raising two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object-agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements of our framework over existing robot hanging methods in both success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world. For videos and supplementary materials, please visit our project webpage: https://hcis-lab.github.io/SKTHang/.
|
|
13:30-15:00, Paper ThBT18-AX.9 | Add to My Program |
Object-Centric Cross-Modal Feature Distillation for Event-Based Object Detection |
|
Li, Lei | ETH Zurich |
Liniger, Alexander | ETH Zurich |
Millhaeusler, Mario | Huawei Zurich |
Tsiminaki, Vagia | Huawei Zurich |
Li, Yuanyou | Huawei |
Dai, Dengxin | ETH Zurich |
Keywords: Object Detection, Segmentation and Categorization, Computer Vision for Transportation
Abstract: Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we propose a cross-modality feature distillation method that can focus on regions where the knowledge distillation works best to shrink the detection performance gap between these two modalities. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as a teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.
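The slot-wise distillation step can be sketched as follows (a hypothetical PyTorch fragment under our own assumptions; the paper's slot attention module and loss design are more involved): the slot masks softly assign pixels to objects, and student and teacher features are pooled per slot before matching.

import torch
import torch.nn.functional as F

def object_centric_distill_loss(student_feat, teacher_feat, slot_masks):
    # student_feat, teacher_feat: (B, C, H, W); slot_masks: (B, S, H, W),
    # softmax-normalized over slots at every pixel.
    m = slot_masks.flatten(2)                                   # (B, S, HW)
    m = m / (m.sum(dim=-1, keepdim=True) + 1e-6)                # normalize per slot
    s = torch.bmm(m, student_feat.flatten(2).transpose(1, 2))   # (B, S, C)
    t = torch.bmm(m, teacher_feat.flatten(2).transpose(1, 2))   # (B, S, C)
    return F.mse_loss(s, t)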
|
|
ThBT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics II |
|
|
Chair: Yip, Michael C. | University of California, San Diego |
Co-Chair: Rucker, Caleb | University of Tennessee |
|
13:30-15:00, Paper ThBT19-NT.1 | Add to My Program |
A Novel Robotic Bronchoscope with a Spring-Based Extensible Segment for Improving Steering Ability |
|
Wang, Jie | Tsinghua University |
Hu, Chengquan | Tsinghua University |
Kang, Jingyi | Tsinghua University |
Liu, Jiayuan | Tsinghua University |
Ma, Longfei | Tsinghua University |
Liao, Hongen | Tsinghua University |
Keywords: Surgical Robotics: Laparoscopy, Tendon/Wire Mechanism, Medical Robots and Systems
Abstract: Bronchoscopy, as an essential minimally invasive diagnostic and therapeutic modality, assumes a pivotal role in the early detection of lung cancer. However, the complex anatomy of the airway and the fixed length of the bronchoscope’s bending segment, along with its external propulsion property, pose challenges, including the risk of bleeding. This paper introduces a 4 mm diameter robot-assisted bronchoscope with a spring-based extensible segment. By manipulating two driven rods, the segment can be lengthened or shortened. The advantages of the extensible segment are discussed in two main aspects through theoretical analysis and experimentation. Firstly, the extensible segment enables the bronchoscope to move in a follow-the-leader motion mode or fixed-angle motion mode, navigating through narrow corners that are inaccessible to fixed-length bronchoscopes. It can also be shortened to increase its stiffness when it reaches the target position, creating a stable surgical platform for procedures like biopsies. In addition, a tailored master device has been developed to control the extensible bronchoscope in an isotropic manner. Phantom experiments confirm the feasibility and effectiveness of the extensible bronchoscope.
|
|
13:30-15:00, Paper ThBT19-NT.2 | Add to My Program |
Robust Surgical Tool Tracking with Pixel-Based Probabilities for Projected Geometric Primitives |
|
D'Ambrosia, Christopher | University of California, San Diego |
Richter, Florian | University of California, San Diego |
Chiu, Zih-Yun | University of California, San Diego |
Shinde, Nikhil | University of California San Diego |
Liu, Fei | UCSD |
Christensen, Henrik | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Surgical Robotics: Laparoscopy, Visual Servoing, Computer Vision for Medical Robotics
Abstract: Controlling robotic manipulators via visual feedback requires a known coordinate frame transformation between the robot and the camera. Uncertainties in mechanical systems as well as camera calibration create errors in this coordinate frame transformation. These errors result in poor localization of robotic manipulators and create a significant challenge for applications that rely on precise interactions between manipulators and the environment. In this work, we estimate the camera-to-base transform and joint angle measurement errors for surgical robotic tools using an image-based insertion-shaft detection algorithm and probabilistic models. We apply our proposed approach in both a structured and an unstructured environment to demonstrate the efficacy of our methods.
|
|
13:30-15:00, Paper ThBT19-NT.3 | Add to My Program |
Ada-Tracker: Soft Tissue Tracking Via Inter-Frame and Adaptive-Template Matching |
|
Guo, Jiaxin | The Chinese University of Hong Kong |
Wang, Jiangliu | The Chinese University of Hong Kong |
Li, Zhaoshuo | Johns Hopkins University |
Jia, Tongyu | Faculty of Urology, Third Medical Center, Chinese PLA General Hospital |
Dou, Qi | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Surgical Robotics: Laparoscopy, Visual Tracking
Abstract: Soft tissue tracking is crucial for computer-assisted interventions. Existing approaches mainly rely on extracting discriminative features from the template and videos to recover corresponding matches. However, it is difficult to adopt these techniques in surgical scenes, where tissues change in shape and appearance throughout the surgery. To address this problem, we exploit optical flow to naturally capture the pixel-wise tissue deformations and adaptively correct the tracked template. Specifically, we first implement an inter-frame matching mechanism to extract a coarse region of interest based on optical flow from consecutive frames. To accommodate appearance change and alleviate drift, we then propose an adaptive-template matching method, which updates the tracked template based on the reliability of the estimates. Our approach, Ada-Tracker, enjoys both short-term dynamics modeling, by capturing local deformations, and long-term dynamics modeling, by introducing global temporal compensation. We evaluate our approach on the public SurgT benchmark, which is generated from the Hamlyn, SCARED, and Kidney boundary datasets. The experimental results show that Ada-Tracker achieves superior accuracy and performs more robustly than prior works. Code is available at https://github.com/wrld/Ada-Tracker.
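A rough sketch of the two matching stages (our reading of the mechanism, with OpenCV's Farneback flow as a stand-in; function names, parameters, and the reliability test are assumptions, not the released Ada-Tracker code):

import cv2

def track_step(prev_gray, cur_gray, box, template, reliability_thresh=0.8):
    # Inter-frame matching: propagate the box with dense optical flow.
    x, y, w, h = box
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x = int(round(x + flow[y:y+h, x:x+w, 0].mean()))
    y = int(round(y + flow[y:y+h, x:x+w, 1].mean()))
    # Adaptive-template matching: refresh the template only when the match
    # is reliable (image bounds checks omitted for brevity).
    patch = cur_gray[y:y+h, x:x+w]
    score = cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED).max()
    if score > reliability_thresh:
        template = patch.copy()
    return (x, y, w, h), template, score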
|
|
13:30-15:00, Paper ThBT19-NT.4 | Add to My Program |
Real-To-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery |
|
Liang, Xiao | University of California San Diego |
Liu, Fei | UCSD |
Zhang, Yutong | University of California San Diego |
Li, Yuelei | University of California, San Diego |
Lin, Shan | University of California, San Diego |
Yip, Michael C. | University of California, San Diego |
Keywords: Surgical Robotics: Planning, Computational Geometry, Computer Vision for Medical Robotics
Abstract: Accurate deformable object manipulation (DOM) is essential for achieving autonomy in robotic surgery, where soft tissues are displaced, stretched, and dissected. Many DOM methods can be powered by simulation, which ensures realistic deformation by adhering to the governing physical constraints and allowing for model prediction and control. However, real soft objects in robotic surgery, such as membranes and soft tissues, have complex, anisotropic physical parameters that a simulation with simple initialization from cameras may not fully capture. To use simulation techniques in real surgical tasks, the real-to-sim gap needs to be properly compensated. In this work, we propose an online, adaptive parameter tuning approach for simulation optimization that (1) bridges the real-to-sim gap between a physics simulation and observations obtained from 3D perception by estimating a residual mapping and (2) optimizes the simulation's stiffness parameters online. Our method ensures a small residual gap between the simulation and observation and improves the simulation's predictive capabilities. The effectiveness of the proposed mechanism is evaluated in the manipulation of both thin-shell and volumetric tissue, representative of most tissue scenarios. This work contributes to the advancement of simulation-based deformable tissue manipulation and holds potential for improving surgical autonomy.
|
|
13:30-15:00, Paper ThBT19-NT.5 | Add to My Program |
Efficient and Accurate Mapping of Subsurface Anatomy Via Online Trajectory Optimization for Robot Assisted Surgery |
|
Cho, Brian Y | University of Utah |
Kuntz, Alan | University of Utah |
Keywords: Surgical Robotics: Planning, Mapping, Medical Robots and Systems
Abstract: Robotic surgical subtask automation has the potential to reduce the per-patient workload of human surgeons. There are a variety of surgical subtasks that require geometric information of subsurface anatomy, such as the location of tumors, which necessitates accurate and efficient surgical sensing. In this work, we propose an automated sensing method that maps 3D subsurface anatomy to provide such geometric knowledge. We model the anatomy via a Bayesian Hilbert map-based probabilistic 3D occupancy map. Using the 3D occupancy map, we plan sensing paths on the surface of the anatomy via a graph search algorithm, A* search, with a cost function that enables the trajectories generated to balance between exploration of unsensed regions and refining the existing probabilistic understanding. We demonstrate the performance of our proposed method by comparing it against 3 different methods in several anatomical environments including a real-life CT scan dataset. The experimental results show that our method efficiently detects relevant subsurface anatomy with shorter trajectories than the comparison methods, and the resulting occupancy map achieves high accuracy.
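The exploration/refinement trade-off in the path cost can be illustrated with a toy graph search (hypothetical cost terms; with the heuristic omitted, this reduces to Dijkstra rather than A*): each surface node carries an occupancy probability from the Bayesian Hilbert map, and uncertain cells, with probability near 0.5, are made cheaper to visit.

import heapq

def sensing_path(graph, probs, start, goal, alpha=2.0, beta=1.5):
    # graph: node -> [(neighbor, edge_length)]; probs: node -> P(occupied).
    # Nodes are assumed to be ints or strings.
    def cell_cost(n):
        uncertainty = 1.0 - 2.0 * abs(probs[n] - 0.5)   # 1 at p=0.5, 0 at p in {0, 1}
        return alpha - beta * uncertainty                # stays positive for alpha > beta

    frontier, seen = [(0.0, start, [start])], set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if node in seen:
            continue
        seen.add(node)
        for nbr, length in graph[node]:
            if nbr not in seen:
                heapq.heappush(frontier, (cost + length + cell_cost(nbr), nbr, path + [nbr]))
    return None, float("inf")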
|
|
13:30-15:00, Paper ThBT19-NT.6 | Add to My Program |
A Cross-Entropy Motion Planning Framework for Hybrid Continuum Robots |
|
Chen, Jibiao | The Chinese University of Hong Kong |
Yan, Junyan | The Chinese University of Hong Kong |
Qiu, Yufu | The Chinese University of Hong Kong |
Fang, Haiyang | The Chinese University of Hong Kong |
Chen, Jianghua | The Chinese University of Hong Kong |
Cheng, Shing Shin | The Chinese University of Hong Kong |
Keywords: Surgical Robotics: Planning, Motion and Path Planning
Abstract: Sampling-based motion planners, including the Rapidly-exploring Random Tree (RRT) algorithms, are widely utilized in continuum robots, enabling efficient search for feasible motion plans in constrained environments. In surgical robotics, the complex mapping among the high-dimensional kinematics of continuum robots, trajectory parameterization, and path redundancy may lead to non-optimal motion paths, which in turn affect efficiency and surgical task performance (e.g., path following), and ultimately the patient outcome. In this letter, a cross-entropy (CE) motion planning framework is proposed for continuum robots, wherein the RRT* planner is equipped with a CE estimation method serving as a probabilistic model to sample elite trajectories at optimal computation cost. It can asymptotically optimize the sampling distributions among individuals in terms of either robot states or parameterized trajectories. The presented CE motion planners were implemented on a hybrid continuum robot to enable obstacle avoidance, approximate follow-the-leader (FTL) motion, and navigation in a clinical scenario. They are shown to offer lower sampling cost and higher computational efficiency compared to existing approaches.
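The core cross-entropy estimation loop can be sketched generically (our simplification, assuming a Gaussian over parameterized trajectories; in the paper this distribution is coupled with an RRT*-style planner over robot states or trajectories):

import numpy as np

def ce_optimize(cost_fn, dim, iters=30, pop=200, elite_frac=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))   # candidate trajectories
        order = np.argsort([cost_fn(s) for s in samples])
        elites = samples[order[:n_elite]]                  # lowest-cost samples
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy usage: recover an 8-dimensional target waypoint vector.
target = np.linspace(0.0, 1.0, 8)
best = ce_optimize(lambda s: float(np.sum((s - target) ** 2)), dim=8)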
|
|
13:30-15:00, Paper ThBT19-NT.7 | Add to My Program |
Evaluating the Task Generalization of Temporal Convolutional Networks for Surgical Gesture and Motion Recognition Using Kinematic Data |
|
Hutchinson, Kay | University of Virginia |
Reyes, Ian | IBM |
Li, Zongyu | The University of Virginia |
Alemzadeh, Homa | University of Virginia |
Keywords: Surgical Robotics: Planning, Recognition, Deep Learning Methods
Abstract: Fine-grained activity recognition enables explainable analysis of procedures for skill assessment, autonomy, and error detection in robot-assisted surgery. However, existing recognition models suffer from the limited availability of annotated datasets with both kinematic and video data and an inability to generalize to unseen subjects and tasks. Kinematic data from the surgical robot is particularly critical for safety monitoring and autonomy, as it is unaffected by common camera issues such as occlusions and lens contamination. We leverage an aggregated dataset of six dry-lab surgical tasks from a total of 28 subjects to train activity recognition models at the gesture and motion primitive (MP) levels and for separate robotic arms using only kinematic data. The models are evaluated using the LOUO (Leave-One-User-Out) and our proposed LOTO (Leave-One-Task-Out) cross-validation methods to assess their ability to generalize to unseen users and tasks, respectively. Gesture recognition models achieve higher accuracies and edit scores than MP recognition models. However, using MPs enables the training of models that generalize better to unseen tasks. Also, higher MP recognition accuracy can be achieved by training separate models for the left and right robot arms. For task generalization, MP recognition models perform best if trained on similar tasks and/or tasks from the same dataset.
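The proposed LOTO protocol amounts to the following split logic (an illustrative sketch with an assumed data layout, not the authors' evaluation code):

def loto_splits(samples):
    # samples: list of (features, label, task_id); yields one train/test
    # split per task, holding that task's data out entirely.
    tasks = sorted({task for _, _, task in samples})
    for held_out in tasks:
        train = [s for s in samples if s[2] != held_out]
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test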
|
|
13:30-15:00, Paper ThBT19-NT.8 | Add to My Program |
Lens Capsule Tearing in Cataract Surgery Using Reinforcement Learning |
|
Peter, Rebekka Charlotte | Carl Zeiss AG |
Peikert, Steffen | Friedrich-Alexander-University (FAU) Erlangen-Nuremberg |
Haide, Ludwig | Carl Zeiss AG |
Pham, Doan Xuan Viet | Carl Zeiss AG |
Chettaoui, Tahar | Carl Zeiss AG, Karlsruhe Institute of Technology - KIT (master T |
Tagliabue, Eleonora | Carl Zeiss AG |
Scheikl, Paul Maria | Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) |
Fauser, Johannes | Carl Zeiss AG |
Hillenbrand, Matthias | Carl Zeiss AG |
Neumann, Gerhard | Karlsruhe Institute of Technology |
Mathis-Ullrich, Franziska | Friedrich-Alexander-University Erlangen-Nurnberg (FAU) |
Keywords: Surgical Robotics: Planning, Reinforcement Learning, Simulation and Animation
Abstract: Cataract is the leading cause of blindness worldwide, with an increasing number of patients due to changing demographics, making automation an important part of future surgical treatment. In this work, we focus on a substep of cataract surgery, the Continuous Curvilinear Capsulorhexis (CCC). Given its high complexity, this task is an ideal candidate for Reinforcement Learning (RL) in simulation. First, we present an interactive and physically realistic simulation based on the Finite Element Method (FEM) that mimics the tearing behavior of soft tissue during CCC. Then, we train and evaluate RL models in simulation, demonstrating that the trained policies can complete the CCC in 85% of cases. We also show that applying domain randomization techniques makes the policy more robust against changes in geometrical and biomechanical boundary conditions.
|
|
13:30-15:00, Paper ThBT19-NT.9 | Add to My Program |
ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity |
|
Yu, Qinxi | University of Toronto |
Moghani, Masoud | University of Toronto |
Dharmarajan, Karthik | UC Berkeley |
Schorp, Vincent | UC Berkeley, AUTOLab |
Panitch, William | University of California, Berkeley |
Liu, Jingzhou | University of Toronto, NVIDIA |
Hari, Kush | UC Berkeley |
Huang, Huang | University of California at Berkeley |
Mittal, Mayank | ETH Zurich |
Goldberg, Ken | UC Berkeley |
Garg, Animesh | Georgia Institute of Technology |
Keywords: Surgical Robotics: Planning, Simulation and Animation
Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research Kit (dVRK) and Smart Tissue Autonomous Robot (STAR) which represent common subtasks in surgical training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and imitation learning algorithms to facilitate study of robot learning to augment human surgical skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active perception tasks. We demonstrate ORBIT-Surgical sim-to-real transfer of learned policies onto a physical dVRK robot. Project website: orbit-surgical.github.io
|
|
ThBT20-NT Oral Session, NT-G302 |
Add to My Program |
Failure Detection and Recovery |
|
|
Chair: Haddadin, Sami | Technical University of Munich |
Co-Chair: Aksoy, Eren Erdal | Halmstad University |
|
13:30-15:00, Paper ThBT20-NT.1 | Add to My Program |
Relaxed Hover Solution Based Control for a Bi-Copter with Rotor and Servo Stuck Failure |
|
Zhao, Haixin | Beihang University |
Li, Ruifeng | Beihang University |
Quan, Quan | Beihang University |
Keywords: Failure Detection and Recovery, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: As the usage of bi-copters increases in military and civilian fields, the demand for reliable bi-copters is on the rise. This study focuses on controlling a bi-copter under rotor or servo stuck failure. A relaxed hover solution is derived for the bi-copter by solving an optimization problem subject to rotor and servo stuck failures. The solution is used for designing a reduced attitude controller based on a linear quadratic regulator (LQR). To ensure hover capability, we introduce a position controller based on a cascaded PID. Numerical simulations are conducted to demonstrate that position control is possible, even with complete rotor or servo stuck failure, by driving the bi-copter into the relaxed hover state through the abandonment of the yaw channel. Meanwhile, the fault-tolerant control (FTC) scheme is examined under constant wind disturbances and uncertainties in the rotational damping parameters.
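For reference, a generic continuous-time LQR gain of the kind used above can be computed as follows (the paper's linearization about the relaxed hover solution is not reproduced here; the double integrator is only a stand-in for the reduced attitude dynamics):

import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)   # solve the algebraic Riccati equation
    return np.linalg.solve(R, B.T @ P)     # K = R^{-1} B^T P

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # toy double-integrator dynamics
B = np.array([[0.0], [1.0]])
K = lqr_gain(A, B, np.eye(2), np.eye(1))  # state feedback u = -K x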
|
|
13:30-15:00, Paper ThBT20-NT.2 | Add to My Program |
Aim-Aware Collision Monitoring: Discriminating between Expected and Unexpected Post-Impact Behaviors |
|
Proper, Benn | Eindhoven University of Technology |
Kurdas, Alexander Andreas | Technical University of Munich |
Abdolshah, Saeed | KUKA Deutschland GmbH |
Haddadin, Sami | Technical University of Munich |
Saccon, Alessandro | Eindhoven University of Technology |
Keywords: Failure Detection and Recovery, Contact Modeling, Perception for Grasping and Manipulation
Abstract: To speed up and reduce power consumption per cycle in robotic manipulation, one option is to exploit intentional collisions with the surrounding environment and objects, an approach referred to as impact-aware manipulation. Within this context, this paper focuses on developing an online collision monitoring framework for distinguishing between expected and unexpected post-impact behaviors. The classification is based on a desired post-impact motion created via an idealized rigid robot-object-environment model. To generate a classification error bound, it employs a causal envelope filter that is needed due to the unavoidable joint and environment flexibility. In this way, it becomes possible to compare a desired idealized rigid response, which is straightforward to obtain with existing tools, with a measured impact response, which is affected by difficult-to-model post-impact oscillations. The classifier can be used for single-contact as well as multi-contact impact scenarios, such as those occurring in surface-to-surface impacts, and allows for tuning of the sensitivity between expected and unexpected post-impact behaviors. The monitoring framework fuses a (bandpass) momentum observer with impact-aware control to extend the classical collision event pipeline. As a proof of concept, we show the effectiveness of the approach through numerical simulations as well as preliminary experimental results.
|
|
13:30-15:00, Paper ThBT20-NT.3 | Add to My Program |
The Voraus-AD Dataset for Anomaly Detection in Robot Applications |
|
Brockmann, Jan Thieß | Voraus Robotik GmbH |
Rudolph, Marco | Leibniz University Hannover |
Rosenhahn, Bodo | Institute of Information Processing, Leibniz Universität Hannover |
Wandt, Bastian | Linköping University |
Keywords: Failure Detection and Recovery, Datasets for Anomaly Detection, Deep Learning in Robotics and Automation, Probability and Statistical Methods
Abstract: During the operation of industrial robots, unusual events may endanger the safety of humans and the quality of production. When collecting data to detect such cases, it is not ensured that data from all potentially occurring errors is included, as unforeseeable events may happen over time. Therefore, anomaly detection (AD) delivers a practical solution, using only normal data to learn to detect unusual events. We introduce a dataset that allows training and benchmarking of anomaly detection methods for robotic applications based on machine data, and it will be made publicly available to the research community. As a typical robot task, the dataset includes a pick-and-place application which involves movement, actions of the end effector, and interactions with the objects of the environment. Since several of the contained anomalies are not task-specific but general, evaluations on our dataset are transferable to other robotics applications as well. Additionally, we present MVT-Flow as a new baseline method for anomaly detection: it relies on deep-learning-based density estimation with normalizing flows, tailored to the data domain by taking its structure into account.
|
|
13:30-15:00, Paper ThBT20-NT.4 | Add to My Program |
Multimodal Detection and Classification of Robot Manipulation Failures |
|
Inceoglu, Arda | Istanbul Technical University |
Aksoy, Eren Erdal | Halmstad University |
Sariel, Sanem | Istanbul Technical University |
Keywords: Failure Detection and Recovery, Deep Learning in Grasping and Manipulation, Sensor Fusion
Abstract: An autonomous service robot should be able to interact with its environment safely and robustly without requiring human assistance. Unstructured environments are challenging for robots since the exact prediction of outcomes is not always possible. Even when robot behaviors are well-designed, the unpredictable nature of physical robot-object interaction may prevent success in object manipulation. Therefore, the execution of a manipulation action may result in an undesirable outcome involving accidents or damage to the objects or environment. Situation awareness becomes important in such cases to enable the robot to (i) maintain the integrity of both itself and the environment, (ii) recover from failed tasks in the short term, and (iii) learn to avoid failures in the long term. For this purpose, robot executions should be continuously monitored, and failures should be detected and classified appropriately. In this work, we focus on detecting and classifying both manipulation and post-manipulation phase failures using the same exteroception setup. We cover a diverse set of failure types for primary tabletop manipulation actions. In order to detect these failures, we propose FINO-Net [1], a deep multimodal sensor-fusion-based classifier network. The proposed network accurately detects and classifies failures from raw sensory data without any prior knowledge. In this work, we use our extended FAILURE dataset [1] with 99 new multimodal manipulation recordings and annotate them with failure labels.
|
|
13:30-15:00, Paper ThBT20-NT.5 | Add to My Program |
FT-Net: Learning Failure Recovery and Fault-Tolerant Locomotion for Quadruped Robots |
|
Luo, Zeren | The University of Hong Kong |
Xiao, Erdong | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Failure Detection and Recovery, Legged Robots
Abstract: Quadruped robots have in recent years been increasingly used in extremely harsh and dangerous conditions. Consequently, diverse severe hardware failures may occur at any time during the working cycle of the robots. In this work, we propose a fault-tolerant (FT) control pipeline based on model-free reinforcement learning -- FT-Net, which is guided by an inverted pendulum model and the support polygon. This pipeline allows the robot to dynamically and autonomously adapt to both partial and complete motor failures. Unlike conventional FT control methods that need to pinpoint the failed location, our controller identifies the fault implicitly with a neural network-based adaptor. Furthermore, we achieve a unified policy that is capable of switching from four-legged to three-legged walking mode when a complete motor failure occurs. Both extensive simulation and hardware experiments show that FT-Net learns to effectively perform recovery behaviors. The fault-tolerant locomotion can even be executed in various dynamic tasks and terrains.
|
|
13:30-15:00, Paper ThBT20-NT.6 | Add to My Program |
Utilizing a Malfunctioning 3D Printer by Modeling Its Dynamics with Machine Learning |
|
Caballero, Renzo | King Abdullah University of Science and Technology |
Piękos, Piotr | King Abdullah University of Science and Technology |
Feron, Eric | King Abdullah University of Science and Technology |
Schmidhuber, Jurgen | Technische Universität München |
Keywords: Failure Detection and Recovery, Model Learning for Control, Robust/Adaptive Control
Abstract: To create a self-repairing 3D printer, it must continue operating even after experiencing corruption. This work focuses on developing a method to effectively utilize a malfunctioning printer for reliable printing. This method can be applied by the printer itself for self-repair and can enhance the reliability of commercial 3D printers. We achieve this by modeling the dynamics of the corrupted printer with a machine learning model that infers the corrupted printer's dynamics from a single observed trajectory to improve its accuracy. Our method is evaluated on a digital twin of the 3D printer, demonstrating its capability to enable the printer to operate reliably, even when encountering new corruptions not seen during training.
|
|
13:30-15:00, Paper ThBT20-NT.7 | Add to My Program |
A Novel Metric for Detecting Quadrotor Loss-Of-Control |
|
van Beers, Jasper | Delft University of Technology |
Solanki, Prashant | Delft University of Technology |
de Visser, Coen | TU Delft |
Keywords: Failure Detection and Recovery, Robot Safety, Aerial Systems: Mechanics and Control
Abstract: Unmanned aerial vehicles (UAVs) are becoming an integral part of both industry and society. In particular, the quadrotor is now invaluable across a plethora of fields, and recent developments, such as the inclusion of aerial manipulators, only extend their versatility. As UAVs become more widespread, preventing loss-of-control (LOC) is an ever-growing concern. Unfortunately, LOC is not clearly defined for quadrotors, or indeed, many other autonomous systems. Moreover, any existing definitions are often incomplete and restrictive. A novel metric, based on actuator capabilities, is introduced to detect LOC in quadrotors. The potential of this metric for LOC detection is demonstrated through both simulated and real quadrotor flight data. It is able to detect LOC induced by actuator faults without explicit knowledge of the occurrence and nature of the failure. The proposed metric is also sensitive enough to detect LOC in more nuanced cases, where the quadrotor remains undamaged but nevertheless loses control through an aggressive yawing manoeuvre. As the metric depends only on system and actuator models, it is sufficiently general to be applied to other systems.
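One simple actuator-capability indicator in this spirit (our illustrative reading, not the paper's metric) measures how much headroom the commanded rotor inputs leave before saturation; sustained near-zero headroom would flag LOC risk.

import numpy as np

def actuator_headroom(u_cmd, u_min, u_max):
    # u_cmd: commanded rotor inputs; returns a margin in [0, 1], 0 = saturated.
    margin = np.minimum(u_cmd - u_min, u_max - u_cmd) / (0.5 * (u_max - u_min))
    return float(np.clip(margin, 0.0, 1.0).min())

# Flag LOC risk when most samples in a window sit close to saturation.
window = [actuator_headroom(u, 0.0, 1.0) for u in np.random.rand(100, 4)]
loc_suspected = np.mean(np.array(window) < 0.05) > 0.5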
|
|
13:30-15:00, Paper ThBT20-NT.8 | Add to My Program |
Specifying and Monitoring Safe Driving Properties with Scene Graphs |
|
Toledo, Felipe | University of Virginia |
Woodlief, Trey | University of Virginia |
Elbaum, Sebastian | University of Virginia |
Dwyer, Matthew | University of Virginia |
Keywords: Failure Detection and Recovery, Robot Safety, Semantic Scene Understanding
Abstract: With the proliferation of autonomous vehicles (AVs) comes the need to ensure they abide by safe driving properties. Specifying and monitoring such properties, however, is challenging because of the mismatch between the semantic space over which typical driving properties are asserted (e.g., vehicles, pedestrians, intersections) and the sensed inputs of AVs. Existing efforts either assume for such semantic data to be available or develop bespoke methods for capturing it. Instead, this work introduces a framework that can extract scene graphs (SGs) from sensor inputs to capture the entities related to the AV, and a domain-specific language that enables building propositions over those graphs and composing them through temporal logic. We implemented the framework to monitor for specification violations of 3 top AVs from the CARLA Autonomous Driving Leaderboard, and found that the AVs violated 71% of properties during at least one test. Artifact available at https://github.com/less-lab-uva/SGSM
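The flavor of propositions over scene graphs composed through temporal logic can be sketched as follows (the graph encoding and names are ours, not the artifact's DSL):

def prop_ped_ahead(sg):
    return any(e == ("pedestrian", "ahead_of", "ego") for e in sg["edges"])

def prop_braking(sg):
    return sg["ego"]["decel"] > 0.5

def always_implies(trace, antecedent, consequent):
    # G(antecedent -> consequent) over a finite trace of scene graphs.
    return all((not antecedent(sg)) or consequent(sg) for sg in trace)

trace = [{"edges": [("pedestrian", "ahead_of", "ego")], "ego": {"decel": 0.9}}]
print(always_implies(trace, prop_ped_ahead, prop_braking))   # True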
|
|
13:30-15:00, Paper ThBT20-NT.9 | Add to My Program |
Using Large Language Models to Generate and Apply Contingency Handling Procedures in Collaborative Assembly Applications |
|
Kang, Jeon Ho | University of Southern California |
Dhanaraj, Neel | University of Southern California |
Wadaskar, Siddhant Ravindra | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Failure Detection and Recovery, Task Planning, Intelligent and Flexible Manufacturing
Abstract: In manufacturing, minimizing operational delays is crucial for efficiency and resilience. Therefore, efficiently handling contingencies is an important capability for human-robot teams working on assembly (i.e., collaborative assembly) applications. This paper introduces a novel approach to generating contingency handling procedures by leveraging recent advances in Large Language Models (LLMs). Our approach uses LLMs to update the required tasks in hierarchical task networks (HTNs) to handle contingencies. The results demonstrate that our approach is able to handle a wide variety of contingencies in assembly applications and minimizes the impact on assembly completion time.
|
|
ThBT21-NT Oral Session, NT-G303 |
Add to My Program |
Micro/Nano Robots II |
|
|
Chair: Yamanishi, Yoko | Kyushu University |
Co-Chair: Zhang, Jiachen | City University of Hong Kong |
|
13:30-15:00, Paper ThBT21-NT.1 | Add to My Program |
A Robotic Surgery Platform for Automated Tissue Micromanipulation in Zebrafish Embryos |
|
Ozelci, Ece | EPFL |
Etesami, Erfan | EPFL |
Rohde, Laurel | EPFL |
Oates, Andrew | EPFL |
Sakar, Mahmut Selman | EPFL |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation
Abstract: Microsurgical manipulations are key experimental techniques in life science research, particularly in embryology. These techniques are most often performed manually by highly skilled scientists, posing limitations on speed, precision, and reproducibility. Here we introduce a fully automated robotic microsurgery platform that generates explants of specific tail tissue from growing zebrafish embryos, a popular model organism for vertebrate development. Our work leverages both classical and deep learning-based image-processing techniques to perform robotic micromanipulation on biological specimens. Using two example experimental cases as proof of concept, we show that our automated platform is more precise, accurate, and efficient than teleoperated and manual microsurgery conducted by experienced scientists. Moreover, we demonstrate the usefulness of our platform for inexperienced experimentalists, supporting an important role for robotic microsurgery in broadening the use of such techniques in experimental research.
|
|
13:30-15:00, Paper ThBT21-NT.2 | Add to My Program |
Skill Learning in Robot-Assisted Micro-Manipulation through Human Demonstrations with Attention Guidance |
|
An, Yujian | Shanghai Jiao Tong University |
Yang, Jianxin | Shanghai Jiao Tong University |
Li, Jinkai | Shanghai Jiao Tong University |
He, Bingze | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Automation at Micro-Nano Scales, Learning from Experience, Learning from Demonstration
Abstract: For the development of robotic systems for micro-manipulation, it is challenging to design appropriate control strategies due to either the lack of sufficient information for feedback or the difficulty in extracting subtle yet critical visual features. With the same system under the teleoperated mode, however, human operators seem to be able to complete the task more successfully with an inherent motion and control strategy. The extraction of implicit human attention during the task and its integration with robot control could provide crucial guidance in the design of feature extraction and motion control algorithms. In this paper, a micro-assembly task of miniature thin membrane sensors is considered. For human demonstrations, we collected data from repeated tests performed by ten operators following three motion strategies. The human attention during the task is explored according to the coordinates of the eye gaze, and then a neural network with gaze-guided attention is trained to segment the visual Region of Interest (ROI). After quantitative evaluation of operator results in terms of success rate, efficiency, reset time, and the Index of Pupillary Activity (IPA), an optimized motion strategy based on the "palpation" framework was derived. Consequently, we apply this strategy to automated tasks and achieve results superior to those of human operators, with an average task completion time of 34.8±5.9 s and a success rate of over 90%.
|
|
13:30-15:00, Paper ThBT21-NT.3 | Add to My Program |
Automated Surgical Knot Tying on Mini-Incision with Micro-Suture Based on Dual-Arm Nanorobot under Stereo Microscope |
|
Jiang, Yujie | ShanghaiTech University |
Fu, Xiang | ShanghaiTech University |
Zhong, Chengxi | ShanghaiTech University |
Li, Teng | ShanghaiTech University |
Lu, Haojian | Zhejiang University |
Liu, Song | ShanghaiTech University |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots, Dual Arm Manipulation
Abstract: Knot tying is an essential task for robotic surgery, which is routinely realized by dual-arm robotic manipulation. Despite the well-established protocols and progress at the macro scale so far, challenges remain in further advancing robotic knot-tying techniques, particularly in terms of decreasing space consumption while offering better dexterity, higher precision, and good biomechanical compatibility. In this paper, we propose a novel dual-arm nanorobotic system for automated knot tying performed on a mini-incision under a stereo microscope, featuring an additional rotational degree of freedom mounted on each arm. With this setup, an optimized motion trajectory planning under the standard knot-tying protocol is also presented in order to support tying knots with shorter and thinner suture. Leveraging the natural advantages of nanorobotics and microscopy, the proposed system is capable of tying consecutive throws with micro-suture on a mini-incision, as in vascular anastomosis or microsurgery. We successfully evaluated the knot-tying system on a 2.0 mm wide bionic blood vessel with 30 mm long #8-0 micro-suture. We finally tested the mechanical strength of the knots for potential medical assessment.
|
|
13:30-15:00, Paper ThBT21-NT.4 | Add to My Program |
Weakly-Supervised Depth Completion During Robotic Micromanipulation from a Monocular Microscopic Image |
|
Yang, Han | The Chinese University of Hong Kong, Shenzhen |
Jin, Yufei | The Chinese University of Hong Kong (Shenzhen) |
Shan, Guanqiao | University of Toronto |
Wang, Yibin | The Chinese University of Hong Kong, Shenzhen |
Zheng, YongBin | The Chinese University of Hong Kong, Shenzhen |
Yu, Jiangfan | Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Deep Learning Methods
Abstract: Obtaining three-dimensional information, especially z-axis depth information, is crucial for robotic micromanipulation. Due to the unavailability of depth sensors such as lidars in micromanipulation setups, traditional depth acquisition methods such as depth from focus or depth from defocus infer depth directly from microscopic images and suffer from poor resolution. Alternatively, micromanipulation tasks obtain accurate depth information by detecting the contact between an end-effector and an object (e.g., a cell). Despite its high accuracy, only sparse depth data can be obtained due to its low efficiency. This paper aims to address the challenge of acquiring dense depth information during robotic cell micromanipulation. A weakly-supervised depth completion network is proposed that takes cell images and sparse depth data obtained by contact detection as input to generate a dense depth map. A two-stage data augmentation method is proposed to augment the sparse depth data, and the depth map is optimized by a network refinement method. The experimental results show that the MAE of the depth prediction error is less than 0.3 µm, which proves the accuracy and effectiveness of the method. This deep learning pipeline can be seamlessly integrated with robotic micromanipulation tasks to provide accurate depth information.
|
|
13:30-15:00, Paper ThBT21-NT.5 | Add to My Program |
Dynamic Adaptive Imaging System on Optoelectronic Tweezers Platform |
|
Wang, Ao | Beihang University |
Gan, Chunyuan | Beihang University |
Han, Haocheng | Beihang University |
Xiong, Hongyi | Beihang University |
Zhao, Jiawei | Beihang University, School of Mechanical Engineering and Automation |
Wang, Chutian | Beihang University |
Feng, Lin | Beihang University |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: Optoelectronic tweezers (OET) has shown great promise in various applications, especially in the precise manipulation of microparticles and microorganisms on a micron and nanometer scale. This technology significantly enhances the efficiency of single-cell sorting and the development of antibody-based drugs. However, conventional OET platforms are limited by issues such as low autofocusing accuracy, restricted imaging field of view, and uneven illumination. To overcome these limitations, we have innovatively developed a dynamic adaptive imaging system. By incorporating peak-finding and in situ Gaussian blur compensation algorithms, we achieved rapid automatic focusing and illumination shadow compensation across an expanded field of view. At the same time, the system can also dynamically adjust compensation parameters under different lighting conditions. Our system has successfully completed comprehensive scanning of the optoelectronic tweezers chip, achieving a 60% reduction in autofocus time and a 15.8% improvement in lighting uniformity. Moreover, this imaging system demonstrates robust versatility and can serve as a reference for other optical systems.
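The peak-finding autofocus step can be illustrated with a standard sharpness sweep (variance of the Laplacian is an assumed focus metric; the paper's peak-finding and compensation algorithms may differ):

import cv2
import numpy as np

def sharpness(gray):
    return cv2.Laplacian(gray, cv2.CV_64F).var()   # variance-of-Laplacian metric

def autofocus(stack):
    # stack: list of grayscale frames captured while sweeping the focus axis;
    # returns the index of the sharpest slice and all scores.
    scores = np.array([sharpness(frame) for frame in stack])
    return int(np.argmax(scores)), scores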
|
|
13:30-15:00, Paper ThBT21-NT.6 | Add to My Program |
Automated Dissection of Intact Single Cell from Tissue Using Robotic Micromanipulation System |
|
Zhang, Youchao | Zhejiang University |
Guo, Xiangyu | Zhejiang University |
Wang, Qingyu | Zhejiang University |
Wang, Fanghao | Zhejiang University |
Liu, Chuanjie | Zhejiang University |
Zhou, Mingchuan | Zhejiang University |
Ying, Yibin | Zhejiang University |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales, Visual Tracking
Abstract: Obtaining single cells from tissue is important for research at the intersection of information science and bioscience. In this article, a robotic framework based on a micromanipulation system is proposed, which can automatically and intelligently cut intact single cells from tissue sections. The proposed method consists of several steps. An attention-mechanism-improved (AMI) tip localization neural network is proposed to detect and track the needle tip of the micro-scale displacement end-effector within the limited field of view under microscopy. Then, the transformation matrix between the camera and the coordinate system of the robot is calculated, and the cutting trajectory is generated and optimized. Finally, the end-effector is controlled to obtain intact single cells from tissue by model predictive control (MPC). The performance of the framework is verified in a paraffin tissue section dissection experiment, which shows the proposed framework is robust and precise enough to obtain an intact single cell. The error of autonomous single-cell dissection is no more than 0.61 µm.
|
|
13:30-15:00, Paper ThBT21-NT.7 | Add to My Program |
Development of a 3-RRS Micromanipulator Based on Origami-Inspired Spherical Joint |
|
Han, Haoqi | Shanghai Jiao Tong University |
Liu, Xiaoming | Beijing Institute of Technology |
Chen, Yan | Beijing Institute of Technology |
Pang, Hao | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots, Compliant Joints and Mechanisms, Parallel Robots
Abstract: In recent years, micromanipulation technology has achieved extensive application in industry and life science. Improving the precision and bandwidth of micromanipulators while simultaneously reducing size, weight, and cost poses significant challenges to existing micromanipulator design and fabrication methods. Here, we propose a 3-RRS micromanipulator with an origami-inspired spherical joint based on the PC-MEMS process, aiming for miniaturization and cost-effectiveness. The spherical joint allows rotations of approximately 140° around the x-axis, 140° around the y-axis, and 20° around the z-axis. The micromanipulator weighs 0.8 g, measures 16 mm × 16 mm × 22 mm, and has a workspace of 0.7 mm³. The end platform of the micromanipulator can be equipped with various effectors to accomplish different kinds of tasks. Experimental results validated its high precision and bandwidth, exhibiting its potential to perform intricate micromanipulation tasks.
|
|
13:30-15:00, Paper ThBT21-NT.8 | Add to My Program |
Singularity Analysis and Solutions for the Origami Transmission Mechanism of Fast-Moving Untethered Insect-Scale Robot |
|
Liu, Yide | Zhejiang University |
Feng, Bo | Zhejiang University |
Cheng, Tianlun | Zhejiang University |
Chen, Yanhong | Zhejiang University |
Liu, Xiyan | Zhejiang University |
Zhang, Jiahang | Zhejiang University |
Qu, Shaoxing | Zhejiang University |
Yang, Wei | Zhejiang University |
Keywords: Micro/Nano Robots, Mechanism Design, Parallel Robots, Grassmann-Cayley algebra
Abstract: Designing insect-scale robots with high mobility is becoming an essential challenge in the field of robotics research. Among the methods for fabricating the transmission mechanism of an insect-scale robot, the smart composite microstructure (SCM) method is attracting increasing attention. This method can construct compact and functional miniature origami mechanisms through planarized fabrication and folding assembly processes. Our previous work proposed an untethered robot, S2worm, equipped with a novel 2-DoF origami transmission mechanism. The S2worm is fabricated through SCM and holds a top speed of 27.4 cm/s. In this work, we propose a novel strategy for designing insect-scale robots with high mobility: applying Grassmann-Cayley algebra to avoid singularities of the transmission mechanism. The experimental results prove that the singularity of the previous work has been solved. The new robot prototype, S2worm-G, weighs 4.71 g, measures 4.0 cm, and achieves a top speed of 75.0 cm/s and a relative speed of 18.8 bodylengths/s. To the best of our knowledge, the 2-DoF origami transmission mechanism is the first parallel mechanism designed for an insect-scale robot, and its singularity is identified and solved here. The experimental results prove that the refined S2worm-G robot is among the best insect-scale robots for its size, mass, and mobility.
|
|
13:30-15:00, Paper ThBT21-NT.9 | Add to My Program |
A Theoretical Investigation of the Ability of Magnetic Miniature Robots to Exert Forces and Torques for Biomedical Functionalities |
|
Xiang, Yuxuan | City University of Hong Kong |
Zhang, Jiachen | City University of Hong Kong |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: Magnetic miniature robots exert forces and torques on the environment to conduct minimally invasive diagnostic and therapeutic tasks. The orders of magnitude of these forces and torques determine what functionalities the robots can achieve. Although some studies have been reported in scattered form, the forces and torques have yet to be systematically investigated within a biomedical context from underlying physical principles, leaving their theoretical limits elusive. This work constructs a theoretical framework from governing equations to calculate the forces and torques exerted by magnetic miniature robots in their respective targeted workspaces to achieve functionalities. It reports that existing miniature robots with a maximum characteristic length of 10^-2 m can exert a force and a torque up to the order of 10^-1 N and 10^-2 Nm, respectively, considering realistic actuation paradigms and constraints. The attainable force and torque magnitudes are on par with the requirements of surgeries at the human head (e.g., brain, eye, and ear surgeries) or within regions adjacent to human skin (e.g., surgeries in the bladder and some blood vessels), as well as surgeries on small animals. However, they are insufficient for operations in deep-buried regions of large animals and humans (e.g., implant therapy, biopsy, and tissue removal). Hence, potential strategies to raise the ceiling of these ranges are examined to extend the functionality catalog and expand the operating scope of these robots.
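The quoted orders of magnitude can be sanity-checked with a back-of-the-envelope dipole calculation (material and field values below are our own assumptions, not the paper's framework): torque |m × B| ≤ mB and gradient-pulling force |(m·∇)B| ≤ m|∇B|, with dipole moment m = MV.

M = 1e6            # magnetization of an NdFeB-like material, A/m (assumed)
L = 1e-2           # characteristic length from the abstract, m
V = L ** 3         # volume, m^3
B = 1e-2           # assumed typical field magnitude, T
gradB = 1e-1       # assumed typical field gradient, T/m

m = M * V                      # dipole moment: 1.0 A*m^2
print("torque ~", m * B)       # ~1e-2 Nm, matching the abstract
print("force  ~", m * gradB)   # ~1e-1 N, matching the abstract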
|
|
ThBT22-NT Oral Session, NT-G304 |
Add to My Program |
Telerobotics and Teleoperation I |
|
|
Chair: Yokokohji, Yasuyoshi | Kobe University |
Co-Chair: Muratore, Luca | Istituto Italiano Di Tecnologia |
|
13:30-15:00, Paper ThBT22-NT.1 | Add to My Program |
Wearable Haptics for a Marionette-Inspired Teleoperation of Highly Redundant Robotic Systems |
|
Torielli, Davide | Humanoids and Human Centered Mechatronics (HHCM), Istituto Italiano di Tecnologia |
Franco, Leonardo | University of Siena |
Pozzi, Maria | University of Siena |
Muratore, Luca | Istituto Italiano Di Tecnologia |
Malvezzi, Monica | University of Siena |
Tsagarakis, Nikos | Istituto Italiano Di Tecnologia |
Prattichizzo, Domenico | University of Siena |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Human-Centered Robotics
Abstract: The teleoperation of complex, kinematically redundant robots with loco-manipulation capabilities represents a challenge for human operators, who have to learn how to operate the many degrees of freedom of the robot to accomplish a desired task. In this context, developing an easy-to-learn and easy-to-use human-robot interface is paramount. Recent works introduced a novel teleoperation concept that relies on a virtual physical interaction interface between the human operator and the remote robot, equivalent to "Marionette" control, but whose feedback on the human side was limited to vision. In this paper, we propose extending the "Marionette" interface by adding a wearable haptic interface to cope with the limitations of the previous work. Leveraging the additional haptic feedback modality, the human operator gains full sensorimotor control over the robot, and awareness of the robot's response and interactions with the environment is greatly improved. We evaluated the proposed interface and the related teleoperation framework with naive users, assessing the teleoperation performance and the user experience with and without haptic feedback. The conducted experiments consisted of a loco-manipulation mission with the CENTAURO robot, a hybrid leg-wheel quadruped with a humanoid dual-arm upper body.
|
|
13:30-15:00, Paper ThBT22-NT.2 | Add to My Program |
NetLfD: Network-Aware Learning from Demonstration for In-Contact Skills Via Teleoperation |
|
Güleçyüz, Başak | Technical University of Munich |
von Büren, Vincent | Technical University of Munich |
Xu, Xiao | Technical University of Munich |
Steinbach, Eckehard | Technical University of Munich |
Keywords: Learning from Demonstration, Telerobotics and Teleoperation
Abstract: When providing task demonstrations to a remote robot over the network via bilateral teleoperation, communication impairments are unavoidable, hindering the human operator from delivering high-quality demonstrations. Poor-quality demonstrations can negatively impact the robot's ability to learn and generalize. In this work, we propose to enhance learning performance by introducing a network-aware confidence weighting strategy for remote learning from demonstration. Our approach extends the Hidden Semi-Markov Model (HSMM) and its task-parameterized version (TP-HSMM) to their confidence-weighted versions, WHSMM and WTP-HSMM. We evaluated various weight metrics that serve as teleoperation transparency measures and demonstration quality indicators under varying communication delays. We validated the proposed approach in two different in-contact tasks using data collected from 18 participants. The results show that weighting improves task performance in reproduction by up to 42% in force precision and 63% in success rate, demonstrating the potential of the proposed approach to enhance the effectiveness of robot learning from remote demonstrations.
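The confidence-weighting idea can be sketched at its simplest (the WHSMM/WTP-HSMM formulation is in the paper; the delay-to-weight mapping below is purely illustrative):

import numpy as np

def demo_weight(delays_ms, d0=50.0):
    # Map the per-sample round-trip delays of one demonstration to a
    # confidence in (0, 1]; demonstrations recorded over slower links
    # contribute less to the learned model.
    return float(np.mean(np.exp(-np.asarray(delays_ms) / d0)))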
|
|
13:30-15:00, Paper ThBT22-NT.3 | Add to My Program |
Lightweight and Compliant Bilateral Teleoperation System with Anthropomorphic Arms for Aerial and Ground Service Operations |
|
Suarez, Alejandro | University of Seville |
Gonzalez-Morgado, Antonio | Universidad De Sevilla |
Ollero, Anibal | AICIA. G41099946 |
Keywords: Telerobotics and Teleoperation, Dual Arm Manipulation, Aerial Systems: Applications
Abstract: This paper presents a bilateral teleoperation system based on smart servos for the realization of dexterous manipulation tasks with aerial robots or in ground service applications, facilitating the transfer of the cognitive capabilities of human workers to robots operating remotely or in high-altitude workspaces. The system consists of a pair of lightweight and compliant anthropomorphic dual-arm manipulators (LiCAS) in a leader-follower configuration. The leader dual arm (LDA) captures the movements of the operator's arms to obtain the desired joint references, which are sent to the follower dual arm (FDA) to reproduce the manipulation task in a natural and intuitive way. A model of the smart servos is derived, exploiting the feedback from the FDA actuators to provide kinesthetic feedback to the LDA, using the pulse-width modulation (PWM) signal along with the joint speed to estimate the interaction torque. The mechanical joint compliance of the FDA allows the passive accommodation of the arms to physical interactions with the manipulated objects or the environment, whereas the very low weight of the arms (1.0 kg LDA, 2.5 kg FDA) and the human-size, human-like kinematics facilitate their use in a wide variety of applications. The performance of the system is evaluated using an industrial task board for benchmarking, and in two illustrative bimanual aerial manipulation tasks.
|
|
13:30-15:00, Paper ThBT22-NT.4 | Add to My Program |
Intelligent Mode-Switching Framework for Teleoperation |
|
Kizilkaya, Burak | University of Glasgow |
She, Changyang | University of Sydney |
Zhao, Guodong | University of Glasgow, UK |
Imran, Muhammad Ali | University of Glasgow |
Keywords: Telerobotics and Teleoperation, AI-Enabled Robotics, Reinforcement Learning
Abstract: Teleoperation can be very difficult due to limited perception, high communication latency, and limited degrees of freedom (DoFs) at the operator side. Autonomous teleoperation is proposed to overcome this difficulty by predicting user intentions and performing some parts of the task autonomously to decrease the demand on the operator and increase the task completion rate. However, decision-making for mode-switching is generally assumed to be done by the operator, which brings an extra DoF to be controlled by the operator and introduces extra mental demand. On the other hand, the communication perspective is not investigated in the current literature, although communication imperfections and resource limitations are the main bottlenecks for teleoperation. In this study, we propose an intelligent mode-switching framework by jointly considering mode-switching and communication systems. User intention recognition is done at the operator side. Based on user intention recognition, a deep reinforcement learning (DRL) agent is trained and deployed at the operator side to seamlessly switch between autonomous and teleoperation modes. A real-world data set is collected from our teleoperation testbed to train both user intention recognition and DRL algorithms. Our results show that the proposed framework can achieve up to 50% communication load reduction with improved task completion probability.
|
|
13:30-15:00, Paper ThBT22-NT.5 | Add to My Program |
Digital Twin-Driven Mixed Reality Framework for Immersive Teleoperation with Haptic Rendering |
|
Fan, Wen | University of Bristol |
Guo, Xiaoqing | University of Bristol |
Feng, Enyang | University of Bristol |
Lin, Jialin | University of Bristol |
Wang, Yuanyi | City University of Hong Kong |
Liang, Jiaming | Tencent |
Garrad, Martin | University of Bristol |
Rossiter, Jonathan | University of Bristol |
Zhang, Zhengyou | Tencent |
Lepora, Nathan | University of Bristol |
Wei, Lei | Deakin University |
Zhang, Dandan | Imperial College London |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces
Abstract: Teleoperation has contributed widely to many applications. Consequently, the design of intuitive and ergonomic control interfaces for teleoperation has become crucial. The rapid advancement of Mixed Reality (MR) has yielded tangible benefits in human-robot interaction. MR provides an immersive environment for interacting with robots, effectively reducing the mental and physical workload of operators during teleoperation. Additionally, the incorporation of haptic rendering, including kinaesthetic and tactile rendering, could further amplify the intuitiveness and efficiency of MR-based immersive teleoperation. In this study, we developed an immersive, bilateral teleoperation system, integrating Digital Twin-driven Mixed Reality (DTMR) manipulation with haptic rendering. This system comprises a commercial remote controller with a kinaesthetic rendering feature and a wearable, cost-effective tactile rendering interface called the Soft Pneumatic Tactile Array (SPTA). We carried out two user studies to assess the system's effectiveness, including a performance evaluation of key components within DTMR and a quantitative assessment of the newly developed SPTA. The results demonstrate an enhancement in both the human-robot interaction experience and teleoperation performance.
|
|
13:30-15:00, Paper ThBT22-NT.6 | Add to My Program |
Design Octree-Based Method to Improve Model-Mediated Teleoperation in Tactile Internet |
|
Antonsen, Mads Mørch | Aarhus University |
Chinello, Francesco | Aarhus University |
Zhang, Qi | Aarhus University |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces
Abstract: In this paper, we propose a model-mediated teleoperation (MMT) system using an octree-based model (OBM) to spatially map the environment impedance for emerging Tactile Internet use cases. Unlike existing just-noticeable-difference (JND) based MMT, our method avoids continuous transmission of the environment impedance. Moreover, it allows the local model to generate accurate force feedback and reduces the number of model updates. Furthermore, the OBM can be deployed with or without prior knowledge of the environment. An online estimation of the OBM is proposed using a JND and a rate-of-change threshold. An offline estimation method is also proposed for when the geometry and impedance parameters of the remote environment are known. In addition, a point cloud-based force rendering algorithm is tailored to the OBM, thereby allowing the generation of force feedback for complex environments. An experiment without a human in the loop showed that, for an online estimated OBM, the accuracy of the force feedback was improved by up to 44 percent while using less than half the number of model updates compared to JND-based MMT. Another experiment with a human operator interacting with a virtual environment showed that using an offline estimated OBM improves the accuracy of the force feedback and is reliable against packet loss and short temporal breakdowns of the communication link.
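To make the JND-gated update idea tangible, here is a minimal sketch of an octree-style spatial impedance map that only commits an update when the relative change exceeds a Weber-type JND threshold. The flat quantization (standing in for true hierarchical subdivision), the leaf size, and the 10% JND fraction are assumptions for illustration, not the paper's parameters.

```python
# Minimal octree-style impedance map with a JND-gated update rule.
# Fixed leaf size and the 10% JND fraction are illustrative
# assumptions; the paper's OBM and thresholds may differ.
import math

class ImpedanceOctree:
    def __init__(self, leaf_size=0.05, jnd_fraction=0.10):
        self.leaf_size = leaf_size        # edge length of a leaf cell [m]
        self.jnd = jnd_fraction           # Weber fraction for stiffness JND
        self.cells = {}                   # leaf key -> stiffness estimate
        self.updates = 0                  # number of committed model updates

    def _key(self, p):
        # Quantize a 3D contact point to its leaf cell (a flat stand-in
        # for true hierarchical subdivision, kept short for clarity).
        return tuple(math.floor(c / self.leaf_size) for c in p)

    def update(self, p, stiffness):
        """Commit a new stiffness only if it is just-noticeably different."""
        k = self._key(p)
        old = self.cells.get(k)
        if old is None or abs(stiffness - old) > self.jnd * abs(old):
            self.cells[k] = stiffness
            self.updates += 1

    def force(self, p, penetration):
        """Local model renders feedback without new transmissions."""
        return self.cells.get(self._key(p), 0.0) * penetration

tree = ImpedanceOctree()
for s in [1000, 1020, 1015, 1200, 1210]:   # noisy stiffness samples [N/m]
    tree.update((0.10, 0.20, 0.00), s)
print(tree.updates, "updates committed;",
      "rendered force:", tree.force((0.10, 0.20, 0.00), 0.002), "N")
```

Only two of the five samples are committed here, illustrating how the JND gate suppresses transmissions while the local model keeps rendering force.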
|
|
13:30-15:00, Paper ThBT22-NT.7 | Add to My Program |
Autonomous and Teleoperation Control of a Drawing Robot Avatar |
|
Chen, Lingyun | Technical University of Munich |
Naceri, Abdeldjallil | Technical University of Munich |
Swikir, Abdalla | Technical University of Munich |
Hirche, Sandra | Technische Universität München |
Haddadin, Sami | Technical University of Munich |
Keywords: Telerobotics and Teleoperation, Art and Entertainment Robotics
Abstract: A drawing robot avatar is a robotic system that enables telepresence-based drawing, allowing users to remotely control a robotic arm and create drawings in real time from a remote location. The proposed control framework aims to improve bimanual robot telepresence quality by reducing the user workload and required prior knowledge through the automation of secondary or auxiliary tasks. The introduced method calculates the near-optimal Cartesian end-effector pose in terms of visual feedback quality for the attached eye-to-hand camera, taking motion constraints into consideration. Its effectiveness is demonstrated through user studies of drawing reference shapes using the implemented robot avatar, compared to stationary and teleoperated camera pose conditions. Our results demonstrate that the proposed control framework offers improved visual feedback quality and drawing performance.
|
|
13:30-15:00, Paper ThBT22-NT.8 | Add to My Program |
Adaptive Haptic Control Interface for Safeguarding Robotic Teleoperation in Hazardous Steelmaking Environments |
|
Park, Jaehyun | Pohang University of Science and Technology |
Choi, Il Seop | POSCO |
Choi, Sang-Woo | PoscoHoldings |
Kim, Keehoon | POSTECH, Pohang University of Science and Technology |
Keywords: Telerobotics and Teleoperation, Haptics and Haptic Interfaces, Industrial Robots
Abstract: Steel mills are among the most extreme and hazardous working environments due to molten iron erupting from the blast furnace. Removing lump iron near the outlet, which is essential to prevent it from scattering or blocking the flow of molten iron, is currently performed manually by protectively equipped workers using a long stick-shaped tool. A robotic teleoperation system is therefore in demand to ensure worker safety. However, the conventional command interface is not intuitive for tool manipulation (i.e., pivoting, sweeping). In addition, haptic interfaces, which are used to render interaction results efficiently, still limit performance due to their narrow workspace and kinesthetic feedback output that falls short of the task requirements. This paper proposes a novel haptic command interface (POstick) tailored to the lump-iron removal task, in two variants (KF and VF). Both POsticks have a rod-shaped end tip identical to the actual tool, which accelerates operator training. POstick-KF offers a large workspace and high kinesthetic feedback output that satisfies the requirements, while POstick-VF offers an unlimited workspace at the expense of the amount of haptic information, providing only simple vibrotactile feedback. A user study comparing the POsticks with a conventional interface reveals that POstick-KF and POstick-VF show superior interaction and tracking ability, respectively, and that these two properties stand in a trade-off that cannot be reconciled in a single device. Finally, we propose a seamless, automatic conversion mechanism between POstick-VF and POstick-KF to overcome the inherent limits of each haptic device.
|
|
13:30-15:00, Paper ThBT22-NT.9 | Add to My Program |
A Probabilistic Approach for Learning and Adapting Shared Control Skills with the Human in the Loop |
|
Quere, Gabriel | DLR |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Filliat, David | ENSTA ParisTech |
Silvério, João | German Aerospace Center (DLR) |
Keywords: Learning from Demonstration, Telerobotics and Teleoperation, Incremental Learning
Abstract: Assistive robots promise to be of great help to wheelchair users with motor impairments, for example for activities of daily living. Using shared control to provide task-specific assistance -- for instance with the Shared Control Templates (SCT) framework -- facilitates user control, even with low-dimensional input signals. However, designing SCTs is a laborious task requiring robotic expertise. To facilitate their design, we propose a method to learn one of their core components -- active constraints -- from demonstrated end-effector trajectories. We use a probabilistic model, Kernelized Movement Primitives, which additionally allows adaptation from user commands to improve the shared control skills, during both design and execution. We demonstrate that the SCTs so acquired can be successfully used to pick up an object, as well as adjusted for new environmental constraints, with our assistive robot EDAN.
|
|
ThBT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy II |
|
|
Chair: Okada, Yoshito | Tohoku University |
Co-Chair: Chli, Margarita | ETH Zurich & University of Cyprus |
|
13:30-15:00, Paper ThBT23-NT.1 | Add to My Program |
Air Bumper: A Collision Detection and Reaction Framework for Autonomous MAV Navigation |
|
Wang, Ruoyu | The Chinese University of Hong Kong |
Guo, Zixuan | The Chinese University of Hong Kong |
Chen, Yizhou | Chinese University of Hong Kong |
Wang, Xinyi | The Chinese University of Hong Kong |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Aerial Systems: Applications
Abstract: Autonomous navigation in unknown environments with obstacles remains challenging for micro aerial vehicles (MAVs) due to their limited onboard computing and sensing resources. Although various collision avoidance methods have been developed, it is still possible for drones to collide with unobserved obstacles due to unpredictable disturbances, sensor limitations, and control uncertainty. Instead of attempting to avoid all collisions, this article proposes Air Bumper, a collision detection and reaction framework for fully autonomous flight in 3D environments that improves flight safety. Our framework utilizes only the onboard inertial measurement unit (IMU) to detect and estimate collisions. We further design a collision recovery controller for rapid recovery and a collision-aware mapping module that integrates collision information into general LiDAR-based sensing and planning frameworks. Our simulation and experimental results show that the drone can rapidly detect, estimate, and recover from collisions with obstacles in 3D space and continue the flight smoothly with the help of the collision-aware map. In addition, we will open-source the implementation of Air Bumper on GitHub.
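A minimal sketch of IMU-only collision detection in the spirit described above: flag a collision when the high-pass-filtered accelerometer magnitude exceeds a threshold, and take the impact direction from the filtered acceleration vector. The filter constant and the 8 m/s^2 threshold are assumed values, not those of Air Bumper.

```python
# Sketch of IMU-only collision detection: a collision is declared when the
# high-pass-filtered accelerometer magnitude exceeds a threshold, and the
# impact direction is read from the filtered vector. The cutoff and the
# 8 m/s^2 threshold are illustrative assumptions.
import numpy as np

def detect_collisions(accel, dt=0.005, tau=0.05, threshold=8.0):
    """accel: (N,3) body-frame accelerometer samples [m/s^2]."""
    alpha = tau / (tau + dt)              # 1st-order high-pass coefficient
    hp = np.zeros(3)
    prev = accel[0]
    events = []
    for i in range(1, len(accel)):
        hp = alpha * (hp + accel[i] - prev)   # high-pass filter step
        prev = accel[i]
        mag = np.linalg.norm(hp)
        if mag > threshold:
            events.append((i * dt, hp / mag)) # (time, unit impact direction)
    return events

# Synthetic hover data with one impulsive lateral bump at t = 0.5 s.
rng = np.random.default_rng(1)
acc = rng.normal([0.0, 0.0, 9.81], 0.2, size=(200, 3))
acc[100] += np.array([-30.0, 5.0, 0.0])       # simulated collision impulse
for t, d in detect_collisions(acc):
    print(f"collision at t={t:.3f}s, direction~{np.round(d, 2)}")
```

The high-pass filter removes gravity and slow maneuvering accelerations, so only impulsive contacts trip the threshold; a recovery controller would then be triggered from the estimated direction.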
|
|
13:30-15:00, Paper ThBT23-NT.2 | Add to My Program |
Safety-Aware Perception for Autonomous Collision Avoidance in Dynamic Environments |
|
Bena, Ryan | University of Southern California |
Zhao, Chongbo | University of Southern California |
Nguyen, Quan | University of Southern California |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Robot Safety
Abstract: Autonomous collision avoidance requires accurate environmental perception; however, flight systems often possess limited sensing capabilities with field-of-view (FOV) restrictions. To navigate this challenge, we present a safety-aware approach for online determination of the optimal sensor-pointing direction, ψ_d, which utilizes control barrier functions (CBFs). First, we generate a spatial density function, Φ, which leverages CBF constraints to map the collision risk of all local coordinates. Then, we convolve Φ with an attitude-dependent sensor FOV quality function to produce the objective function, Γ, which quantifies the total observed risk for a given pointing direction. Finally, by finding the global optimizer of Γ, we identify the value of ψ_d which maximizes the perception of risk within the FOV. We incorporate ψ_d into a safety-critical flight architecture and conduct a numerical analysis using multiple simulated mission profiles. Our algorithm achieves a success rate of 88-96%, constituting a 16-29% improvement compared to the best heuristic methods. We demonstrate the functionality of our approach via a flight demonstration using the Crazyflie 2.1 micro-quadrotor. Without a priori obstacle knowledge, the quadrotor follows a dynamic flight path while simultaneously calculating and tracking ψ_d to perceive and avoid two static obstacles, with an average computation time of 371 μs.
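A numerical sketch of the pointing objective described above: score each candidate yaw by the CBF-derived risk it keeps inside an assumed wedge-shaped FOV, then pick the maximizer. The CBF form h = d - r, the risk density Φ ~ 1/h, the grid resolution, and the 90-degree FOV are illustrative assumptions, not the paper's formulation.

```python
# Sketch of risk-aware sensor pointing: score each candidate yaw by the
# CBF-derived collision risk it keeps inside an assumed field of view,
# then pick the maximizer. The CBF form h = d - r, the risk map Phi ~
# 1/h, and the 90-degree FOV are illustrative assumptions.
import numpy as np

obstacles = np.array([[2.0, 1.0], [1.5, -2.0]])   # local obstacle centers
r_safe = 0.5                                       # safety radius [m]

# Risk density Phi over a local grid: higher where the CBF margin is small.
xs, ys = np.meshgrid(np.linspace(-4, 4, 81), np.linspace(-4, 4, 81))
pts = np.stack([xs, ys], axis=-1)
h = np.min(np.linalg.norm(pts[..., None, :] - obstacles, axis=-1),
           axis=-1) - r_safe                       # CBF margin per cell
phi = 1.0 / np.maximum(h, 0.05)                    # risk density (assumed)

def observed_risk(psi, half_fov=np.deg2rad(45)):
    """Total risk Phi inside a wedge-shaped FOV pointed along yaw psi."""
    ang = np.arctan2(ys, xs)
    diff = np.arctan2(np.sin(ang - psi), np.cos(ang - psi))  # wrap to [-pi,pi]
    return phi[np.abs(diff) < half_fov].sum()

cands = np.linspace(-np.pi, np.pi, 72, endpoint=False)
psi_d = cands[np.argmax([observed_risk(p) for p in cands])]
print(f"optimal pointing direction psi_d = {np.degrees(psi_d):.1f} deg")
```

Here the wedge sum plays the role of the convolution of Φ with the FOV quality function, and the argmax over candidate yaws stands in for the global optimization of Γ.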
|
|
13:30-15:00, Paper ThBT23-NT.3 | Add to My Program |
Incremental Multimodal Surface Mapping Via Self-Organizing Gaussian Mixture Models |
|
Goel, Kshitij | Carnegie Mellon University |
Tabib, Wennie | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Field Robots, Multi-Robot Systems
Abstract: This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model. This model enables high-resolution reconstruction while simultaneously compressing spatial and intensity point cloud data. The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment. While prior GMM-based mapping works have developed methodologies to determine the number of mixture components using information-theoretic techniques, these approaches either operate on individual sensor observations, making them unsuitable for incremental mapping, or are not real-time viable, especially for applications where high-fidelity modeling is required. To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud. These contributions increase computational speed by an order of magnitude compared to state-of-the-art incremental GMM-based mapping. In addition, the proposed approach yields a superior tradeoff in map accuracy and size when compared to state-of-the-art mapping methodologies (both GMM- and not GMM-based). Evaluations are conducted using both simulated and real-world data. The software is released open-source to benefit the robotics community.
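The spatial-hash idea above can be illustrated with a short sketch: points are hashed into fixed-size blocks, one Gaussian is fit per block, and a submap query gathers the Gaussians of the blocks it touches. The block size and the single-Gaussian-per-block fit are simplifying assumptions; the paper fits full mixtures and also models intensity.

```python
# Sketch of a spatial hash map for GMM submap extraction: points are
# hashed into fixed-size blocks, one Gaussian is fit per block, and a
# submap query simply gathers the Gaussians of the blocks it touches.
# Block size and the per-block single-Gaussian fit are assumptions.
import numpy as np
from collections import defaultdict

BLOCK = 1.0  # block edge length [m] (assumed)

def build_hash_map(points):
    buckets = defaultdict(list)
    for p in points:
        buckets[tuple(np.floor(p / BLOCK).astype(int))].append(p)
    gmm = {}
    for key, pts in buckets.items():
        pts = np.asarray(pts)
        if len(pts) >= 4:                      # need enough support to fit
            gmm[key] = (pts.mean(0), np.cov(pts.T))
    return gmm

def query_submap(gmm, lo, hi):
    """Return (mean, cov) components whose block overlaps [lo, hi]."""
    lo_k = np.floor(np.asarray(lo) / BLOCK).astype(int)
    hi_k = np.floor(np.asarray(hi) / BLOCK).astype(int)
    return [g for k, g in gmm.items()
            if all(lo_k[i] <= k[i] <= hi_k[i] for i in range(3))]

cloud = np.random.default_rng(2).uniform(0, 5, size=(5000, 3))
gmap = build_hash_map(cloud)
sub = query_submap(gmap, [0, 0, 0], [2, 2, 2])
print(len(gmap), "blocks total;", len(sub), "components in submap")
```

The hash lookup makes submap extraction O(number of touched blocks) rather than a pass over the whole model, which is the speed-up the abstract attributes to the spatial hash map.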
|
|
13:30-15:00, Paper ThBT23-NT.4 | Add to My Program |
Learning to Explore Indoor Environments Using Autonomous Micro Aerial Vehicles |
|
Tao, Yuezhan | University of Pennsylvania |
Iceland, Eran | Hebrew University Jerusalem Israel |
Li, Beiming | University of Pennsylvania |
Zwecher, Elchanan | Hebrew University |
Heinemann, Uri | Hebrew University of Jerusalem |
Cohen, Avraham | Technion |
Avni, Amir | Technion |
Gal, Oren | Technion - Israel Institute of Technology |
Barel, Ariel | Technion - Israel Institute of Technology |
Kumar, Vijay | University of Pennsylvania |
Keywords: Aerial Systems: Perception and Autonomy, Mapping, Reinforcement Learning
Abstract: In this paper, we address the challenge of exploring unknown indoor environments using autonomous aerial robots with Size, Weight, and Power (SWaP) constraints. The SWaP constraints induce limits on mission time, requiring efficient exploration. We present a novel exploration framework that uses Deep Learning (DL) to predict the most likely indoor map given the previous observations, and Deep Reinforcement Learning (DRL) for exploration, designed to run on modern SWaP-constrained neural processors. The DL-based map predictor provides a prediction of the occupancy of the unseen environment, while the DRL-based planner determines the best navigation goals that can be safely reached to provide the most information. The two modules are tightly coupled and run onboard, allowing the vehicle to safely map an unknown environment. Extensive experimental and simulation results show that our approach surpasses state-of-the-art methods by 50-60% in efficiency, which we measure by the fraction of the explored space as a function of trajectory length.
|
|
13:30-15:00, Paper ThBT23-NT.5 | Add to My Program |
Multi-Robot Multi-Room Exploration with Geometric Cue Extraction and Circular Decomposition |
|
Kim, Seungchan | Carnegie Mellon University |
Corah, Micah | Colorado School of Mines |
Keller, John | Carnegie Mellon University |
Best, Graeme | University of Technology Sydney |
Scherer, Sebastian | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Multi-Robot Systems, Vision-Based Navigation
Abstract: This work proposes an autonomous multi-robot exploration pipeline that coordinates the behaviors of robots in an indoor environment composed of multiple rooms. Contrary to simple frontier-based exploration approaches, we aim to enable robots to methodically explore and observe an unknown set of rooms in a structured building, keeping track of which rooms are already explored and sharing this information among robots to coordinate their behaviors in a distributed manner. To this end, we propose (1) a geometric cue extraction method that processes 3D point cloud data and detects the locations of potential cues such as doors and rooms, and (2) a circular decomposition of free space used for target assignment. Using these two components, our pipeline effectively assigns tasks among robots and enables a methodical exploration of rooms. We evaluate the performance of our pipeline using a team of up to 3 aerial robots, and show that our method outperforms the baseline by 33.4% in simulation and 26.4% in real-world experiments.
|
|
13:30-15:00, Paper ThBT23-NT.6 | Add to My Program |
Fast Multi-UAV Decentralized Exploration of Forests |
|
Bartolomei, Luca | ETH Zurich |
Teixeira, Lucas | ETH Zurich |
Chli, Margarita | ETH Zurich & University of Cyprus |
Keywords: Aerial Systems: Perception and Autonomy, Path Planning for Multiple Mobile Robots or Agents
Abstract: Efficient exploration strategies are vital in tasks such as search-and-rescue missions and disaster surveying. Unmanned Aerial Vehicles (UAVs) have become particularly popular in such applications, promising to cover large areas at high speeds. Moreover, with the increasing maturity of onboard UAV perception, research focus has been shifting toward higher-level reasoning for multi-robot missions. However, autonomous navigation and exploration of previously unknown large spaces still constitute an open challenge, especially when the environment is cluttered and exhibits large and frequent occlusions due to high obstacle density, as is the case of forests. Moreover, the problem of long-distance wireless communication in such scenes can become a limiting factor, especially when automating the navigation of a UAV fleet. In this spirit, this work proposes an exploration strategy that enables multiple UAVs to quickly explore complex scenes in a decentralized fashion. By providing the decision-making capabilities to each UAV to switch between different execution modes, the proposed strategy is shown to strike a great balance between cautious exploration of yet completely unknown regions and more aggressive exploration of smaller areas of unknown space. This results in full coverage of forest areas in multi-UAV setups up to 30% faster than the state of the art.
|
|
13:30-15:00, Paper ThBT23-NT.7 | Add to My Program |
Reinforcement Learning for Collision-Free Flight Exploiting Deep Collision Encoding |
|
Kulkarni, Mihir | NTNU: Norwegian University of Science and Technology |
Alexis, Kostas | NTNU - Norwegian University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning
Abstract: This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning, such that it compresses the high-dimensional depth data to a low-dimensional latent space encoding collision information while accounting for the robot size. This compressed encoding is combined with an estimate of the robot's odometry and the desired target location to train a deep reinforcement learning navigation policy that offers low-latency computation and robust sim2real performance. A set of simulation and experimental studies in diverse environments is conducted and demonstrates the efficiency of the emergent behavior and its resilience in real-life deployments.
|
|
13:30-15:00, Paper ThBT23-NT.8 | Add to My Program |
Learning Agile Flights through Narrow Gaps with Varying Angles Using Onboard Sensing |
|
Xie, Yuhan | The University of Hong Kong |
Lu, Minghao | The University of Hong Kong |
Peng, Rui | The University of Hong Kong |
Lu, Peng | The University of Hong Kong |
Keywords: Aerial Systems: Perception and Autonomy, Reinforcement Learning, Integrated Planning and Learning
Abstract: This paper addresses the problem of traversing unknown, tilted, and narrow gaps with quadrotors using Deep Reinforcement Learning (DRL). Previous learning-based methods relied on accurate knowledge of the environment, including the gap's pose and size. In contrast, we integrate onboard sensing and detect the gap from a single onboard camera. The training problem is challenging for two reasons: a precise and robust whole-body planning and control policy is required for variably tilted and narrow gaps, and an effective Sim2Real method is needed to successfully conduct real-world experiments. To this end, we propose a learning framework for agile gap-traversal flight, which successfully trains the vehicle to traverse through the center of the gap with an attitude approximately aligned to the gap, even at aggressive tilt angles. The policy, trained only in a simulation environment, can be transferred to different domains with fine-tuning while maintaining the success rate. Our proposed framework, which integrates onboard sensing and a neural network controller, achieves a success rate of 87.36% in real-world experiments, with gap orientations up to 60°. To the best of our knowledge, this is the first paper to perform learning-based, variably tilted narrow gap traversal flight in the real world without prior knowledge of the environment.
|
|
ThBT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry III |
|
|
Chair: Fukao, Takanori | University of Tokyo |
Co-Chair: Sugiura, Hisashi | Yanmar Co., Ltd |
|
13:30-15:00, Paper ThBT24-NT.1 | Add to My Program |
Osiris: Building Hierarchical Representations for Agricultural Environments |
|
Mukuddem, Adam | University of Cape Town |
Amayo, Paul | University of Cape Town |
Keywords: Robotics and Automation in Agriculture and Forestry
Abstract: 3D scene graphs have recently emerged as a powerful and human-understandable way of representing complex 3D environments. They describe environments through a layered, hierarchical graph in which nodes represent spatial concepts (from low-level geometry to higher-level, scene-scale reasoning) and edges represent the relationships between them. While these representations have shown great promise in well-structured indoor environments, their use in structured outdoor environments such as agriculture has been under-explored. A key challenge is that the concepts and structure observed in urban indoor environments cannot be easily transferred to these novel scenes. Motivated by this challenge, this paper presents Osiris, a 3D scene graph builder for agricultural environments. We first propose a hierarchical graph structure for agricultural environments consisting of rowed crops, and through Osiris we incrementally construct a 3D scene graph from data taken onboard a mobile robot. We validate and evaluate the performance of Osiris using real-world data collected at several farms, and show that the system accurately recovers the underlying structure of these agricultural environments while providing a metrically accurate and human-understandable representation.
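A minimal sketch of a layered agricultural scene graph in the spirit of the hierarchy described above. The layer names (farm -> row -> plant) and the attributes are assumptions inferred from the abstract, not the actual Osiris schema.

```python
# Minimal sketch of a layered agricultural scene graph. The layer names
# (farm -> row -> plant) and attributes are assumptions inferred from
# the abstract, not the actual Osiris schema.
from dataclasses import dataclass, field

@dataclass
class Node:
    layer: str                      # e.g. "farm", "row", "plant"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # hierarchical edges

    def add(self, child):
        self.children.append(child)
        return child

    def walk(self, depth=0):
        yield depth, self
        for c in self.children:
            yield from c.walk(depth + 1)

farm = Node("farm", {"name": "demo_farm"})
for r in range(2):                               # incremental insertion of
    row = farm.add(Node("row", {"index": r}))    # rows as they are observed
    for p in range(3):
        row.add(Node("plant", {"row": r, "idx": p,
                               "xyz": (r * 2.0, p * 0.5, 0.0)}))

for depth, node in farm.walk():
    print("  " * depth + f"{node.layer} {node.attrs}")
```

Incremental construction then amounts to appending row and plant nodes as new observations arrive, while the hierarchy keeps the representation human-readable.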
|
|
13:30-15:00, Paper ThBT24-NT.2 | Add to My Program |
Streamlined Acquisition of Large Sensor Data for Autonomous Mobile Robots to Enable Efficient Creation and Analysis of Datasets |
|
Niemeyer, Mark | DFKI |
Arkenau, Julian | German Research Center for Artificial Intelligence |
Pütz, Sebastian | German Research Center for Artificial Intelligence |
Hertzberg, Joachim | University of Osnabrueck |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The increasing usage of modern AI techniques represents a transformative shift in the robotics domain. Training and assessing new models requires substantial amounts of application-specific data, but the limited resources onboard mobile robots (processing power, network bandwidth, etc.) pose a challenge for the development of efficient data recording and provisioning pipelines. Furthermore, accessing specific information based on a combination of spatial, temporal, and semantic criteria is generally not supported by currently available tools. In this paper, we present a methodology for the efficient recording of robotic sensor data streams. We show that our approach reduces the overall time needed until the data can be served via the spatio-temporal-semantic query interface of the semantic environment representation SEEREP. We further show that it increases the maximum sensor data rate that can be stored to disk in real time for large robotic data types such as images and point clouds, compared to frequently employed solutions within the ROS ecosystem.
|
|
13:30-15:00, Paper ThBT24-NT.3 | Add to My Program |
Development of an Automatic Sweet Pepper Harvesting Robot and Experimental Evaluation |
|
Pan, Qinghui | Dalian University of Technology |
Wang, Dong | Dalian University of Technology |
Lian, Jie | Dalian University of Technology |
Dong, Yongxiang | Dalian University of Technology |
Qiu, Chaochao | Dalian University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: The aging population and diminishing agricultural workforce motivate the development of autonomous harvesting robots. Although autonomous harvesting is expanding rapidly, the commercial application of sweet pepper harvesting robots still faces challenges. This paper presents the development of a sweet pepper harvesting robot and reports its experimental verification, covering end-effector design, visual perception, and grasping pose control. The electrically controlled end-effector is mainly composed of a servo-electric two-finger parallel clamping module, a swing-cutting module, and a fruit recovery device. Equipped with a tactile sensor array, it can accurately sense the sweet pepper peduncle position and the end-effector state (e.g., harvesting failure) to complete a precise cut. A grasping pose control algorithm for the manipulator's end-effector is proposed, which, by estimating the pose of the sweet pepper peduncle, controls the end-effector to grasp along the direction of the peduncle and perpendicular to the tangent direction at the picking point. Finally, the robot and the proposed method were verified in a plant factory. The experimental findings demonstrate that the developed harvesting robot achieves robust detection of fruit peduncles and non-destructive picking of sweet peppers, with an average picking time of about 15 seconds.
|
|
13:30-15:00, Paper ThBT24-NT.4 | Add to My Program |
LiDAR-Based Robot Transplanter |
|
Asano, Masaki | University of Tokyo, Graduate School of Information Science And |
Fukao, Takanori | University of Tokyo |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: In Japan, the agricultural labor shortage is becoming increasingly severe due to the declining and aging farmer population. The automation of vegetable production tasks such as transplanting, harvesting, and transporting is therefore required. In this paper, a LiDAR-based self-localization method and a robust control method for a transplanter are proposed for accurate transplanting. In this system, the transplanter's path is generated from 3D point cloud data, and the transplanting unit follows it to plant cabbage seedlings accurately. Path generation takes into account the vehicle's tilt in the roll direction, which depends on the groove environment. An accurate calculation of the lateral and angular position of the transplanting unit is also proposed. For path-following control, sliding-mode control and inverse optimal control are applied to the transplanter. The experimental results demonstrated the effectiveness of the proposed methods and revealed remaining problems to tackle. Automated transplanting was generally performed accurately, but an occasional offset error from zero was observed. It was confirmed that inverse optimal control is superior to sliding-mode control and is more robust to environmental changes.
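For readers unfamiliar with the sliding-mode path-following control mentioned above, the following sketch tracks a straight row with a kinematic bicycle model. The sliding surface s = de + lambda*e, the gains, and the boundary-layer saturation are textbook choices assumed for illustration, not the paper's tuned controller.

```python
# Sketch of sliding-mode lateral control for straight-row path following
# with a kinematic bicycle model. The sliding surface s = de + lambda*e,
# the gains, and the boundary-layer saturation are textbook choices
# assumed for illustration.
import numpy as np

v, L, dt = 0.5, 1.0, 0.02        # speed [m/s], wheelbase [m], step [s]
lam, k, bl = 1.0, 0.8, 0.1       # surface slope, switching gain, boundary layer

def sat(x):                       # smooth sign() to reduce chattering
    return np.clip(x / bl, -1.0, 1.0)

e, psi = 0.3, 0.2                # initial lateral error [m] and heading [rad]
for i in range(600):
    de = v * np.sin(psi)         # lateral error rate
    s = de + lam * e             # sliding surface
    # choose heading rate so the surface is driven to zero: ds = -k*sat(s)
    psi_dot = (-k * sat(s) - lam * de) / (v * np.cos(psi))
    delta = np.arctan(L * psi_dot / v)          # steering from bicycle model
    psi += (v / L) * np.tan(delta) * dt
    e += v * np.sin(psi) * dt
    if i % 150 == 0:
        print(f"t={i*dt:4.1f}s  e={e:+.3f} m  psi={psi:+.3f} rad")
```

The boundary layer in sat() trades a small steady-state band for reduced chattering, one reason the paper's inverse optimal controller can behave more smoothly under environmental changes.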
|
|
13:30-15:00, Paper ThBT24-NT.5 | Add to My Program |
EdgeSoil 2.0 – Soil Analyzer Using Convolutional Neural Network and Camera Imaging for Agricultural Robotics |
|
Kasemi, Roni | UBT |
Lammer, Lara | ACIN, TU Wien |
Thalhammer, Stefan | TU Wien |
Vincze, Markus | Vienna University of Technology |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: Soil is the most important building element of agriculture, and its analysis is crucial for healthy plants and a high crop yield. Despite its importance, soil analysis is a tedious and time-consuming task. This paper presents EdgeSoil 2.0, a non-invasive, accurate, and real-time robotic system for predicting soil pH, a key parameter of soil status for farmers. EdgeSoil 2.0 predicts soil pH in real time from a live webcam video stream at an average of 7 FPS. The method is suitable for the edge devices required by the application: we use a mobile robot with an NVIDIA Jetson Nano module running a pH estimator trained with a Convolutional Neural Network (CNN) on a novel dataset we built for this purpose. Predictions are performed while the robot moves over the plowed field before the planting process starts. To achieve the best performance, we train the pH estimator with different input modalities and validate each result using Mean Squared Error (MSE) and Standard Deviation (SD). We achieve accurate results with an MSE of 0.08 and an SD of 0.15, with field tests showing at most ±0.3 deviation from the ground-truth value during prediction, which is sufficient to comply with agricultural standards.
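As a rough illustration of a CNN regression head for image-based pH prediction, consider the sketch below. The architecture, input size, and synthetic data are invented for illustration; the paper trains on a purpose-built soil dataset with tuned modalities.

```python
# Tiny CNN regression head for image-based soil pH prediction, in the
# spirit of the approach above. The architecture, input size, and
# synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                 # single scalar: predicted pH
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in batch: 8 RGB crops with random pH labels in [4, 9].
x = torch.rand(8, 3, 64, 64)
y = 4.0 + 5.0 * torch.rand(8, 1)

for step in range(50):                # overfit the toy batch as a smoke test
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final MSE on toy batch:", float(loss))
```

A small regression head like this is the kind of model that fits the compute budget of a Jetson-class edge device while still producing a continuous pH estimate per frame.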
|
|
13:30-15:00, Paper ThBT24-NT.6 | Add to My Program |
Semiautonomous Precision Pruning of Upright Fruiting Offshoot Orchard Systems: An Integrated Approach |
|
You, Alexander | Oregon State University |
Parayil, Nidhi | Oregon State University |
Josyula, Gopala Krishna | Oregon State University |
Bhattarai, Uddhav | Washington State University |
Sapkota, Ranjan | Washington State University |
Ahmed, Dawood | Washington State University |
Whiting, Matthew | Washington State University |
Karkee, Manoj | Washington State University |
Grimm, Cindy | Oregon State University |
Davidson, Joseph | Oregon State University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Field Robots
Abstract: Dormant pruning is an important orchard activity for maintaining tree health and producing high-quality fruit. Due to decreasing worker availability, pruning is a prime candidate for robotics. However, pruning also represents a uniquely difficult problem, requiring robust systems for perception, pruning point determination, and manipulation that must operate under variable lighting conditions and in complex, highly unstructured environments. In this article, we introduce a system for pruning modern, planar orchard architectures with simple pruning rules that combines various subsystems from our previous work on perception and manipulation. The integrated system demonstrates the ability to autonomously detect and cut pruning targets with minimal control of the environment, laying the groundwork for a fully autonomous system in the future. We validate the performance of our system through field trials in a sweet cherry orchard, ultimately achieving a cutting success rate of 58% across ten trees. Though not fully robust and requiring improvements in throughput, our system is the first to operate on fruit trees and represents a useful base platform to be improved in the future.
|
|
13:30-15:00, Paper ThBT24-NT.7 | Add to My Program |
On-The-Go Tree Detection and Geometric Traits Estimation with Ground Mobile Robots in Fruit Tree Groves |
|
Chatziparaschis, Dimitrios | UC Riverside |
Teng, Hanzhe | University of California, Riverside |
Wang, Yipeng | University of California, Riverside |
Peiris, Pamodya | University of California, Riverside |
Scudiero, Elia | University of California, Riverside |
Karydis, Konstantinos | University of California, Riverside |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Object Detection, Segmentation and Categorization
Abstract: By-tree information gathering is an essential task in precision agriculture achieved by ground mobile sensors, but it can be time- and labor-intensive. In this paper we present an algorithmic framework to perform real-time and on-the-go detection of trees and key geometric characteristics (namely, width and height) with wheeled mobile robots in the field. Our method is based on the fusion of 2D domain-specific data (normalized difference vegetation index [NDVI] acquired via a red-green-near-infrared [RGN] camera) and 3D LiDAR point clouds, via a customized tree landmark association and parameter estimation algorithm. The proposed system features a multi-modal, entropy-based landmark correspondence approach, integrated into an underlying Kalman filter system to recognize the surrounding trees and jointly estimate their spatial and vegetation-based characteristics. Realistic simulated tests are used to evaluate our proposed algorithm's behavior in a variety of settings. Physical experiments in agricultural fields help validate our method's efficacy in acquiring accurate by-tree information on-the-go and in real-time by employing only onboard computational and sensing resources.
|
|
13:30-15:00, Paper ThBT24-NT.8 | Add to My Program |
Autonomous Apple Fruitlet Sizing with Next Best View Planning |
|
Freeman, Harry | Carnegie Mellon University |
Kantor, George | Carnegie Mellon University |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, Agricultural Automation
Abstract: In this paper, we present a next-best-view planning approach to autonomously size apple fruitlets. State-of-the-art viewpoint planners in agriculture are designed to size large and more sparsely populated fruit. They rely on lower resolution maps and sizing methods that do not generalize to smaller fruit sizes. To overcome these limitations, our method combines viewpoint sampling around semantically labeled regions of interest, along with an attention-guided information gain mechanism to more strategically select viewpoints that target the small fruits' volume. Additionally, we integrate a dual-map representation of the environment that is able to both speed up expensive ray casting operations and maintain the high occupancy resolution required to informatively plan around the fruit. When sizing, a robust estimation and graph clustering approach is introduced to associate fruit detections across images. Through simulated experiments, we demonstrate that our viewpoint planner improves sizing accuracy compared to state of the art and ablations. We also provide quantitative results on data collected by a real robotic system in the field.
|
|
13:30-15:00, Paper ThBT24-NT.9 | Add to My Program |
Gradient-Based Local Next-Best-View Planning for Improved Perception of Targeted Plant Nodes |
|
Burusa, Akshay Kumar | Wageningen University and Research |
Van Henten, Eldert J. | Wageningen University |
Kootstra, Gert | Wageningen University |
Keywords: Robotics and Automation in Agriculture and Forestry, Computer Vision for Automation, Autonomous Agents
Abstract: Robots are increasingly used in tomato greenhouses to automate labour-intensive tasks such as selective harvesting and de-leafing. To perform these tasks, robots must be able to accurately and efficiently perceive the plant nodes that need to be cut, despite the high levels of occlusion from other plant parts. We formulate this problem as a local next-best-view (NBV) planning task where the robot has to plan an efficient set of camera viewpoints to overcome occlusion and improve the quality of perception. Our formulation focuses on quickly improving the perception accuracy of a single target node to maximise its chances of being cut. Previous methods of NBV planning mostly focused on global view planning and used random sampling of candidate viewpoints for exploration, which could suffer from high computational costs, ineffective view selection due to poor candidates, or non-smooth trajectories due to inefficient sampling. We propose a gradient-based NBV planner using differential ray sampling, which directly estimates the local gradient direction for viewpoint planning to overcome occlusion and improve perception. Through simulation experiments, we showed that our planner can handle occlusions and improve the 3D reconstruction and position estimation of nodes equally well as a sampling-based NBV planner, while requiring one-tenth of the computation and generating 28% more efficient trajectories.
|
|
ThBT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization and Mapping I |
|
|
Chair: Garg, Sourav | University of Adelaide |
Co-Chair: Sui, Wei | Soochow University |
|
13:30-15:00, Paper ThBT25-NT.1 | Add to My Program |
A Vision-Centric Approach for Static Map Element Annotation |
|
Zhang, Jiaxin | Soochow University |
Shiyuan, Chen | Soochow University |
Yin, Haoran | Soochow University |
Mei, Ruohong | Soochow University |
Liu, Xuan | Northeast Normal University |
Yang, Cong | Soochow University |
Zhang, Qian | Horizon Robotics |
Sui, Wei | Soochow University |
Keywords: Data Sets for Robotic Vision, Mapping, Computer Vision for Transportation
Abstract: The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data with respect to consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotation achieves high reprojection accuracy across all surrounding cameras and is spatio-temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map elements, models trained with annotations from CAMA achieve lower reprojection errors (e.g., 4.73 vs. 8.03 pixels).
|
|
13:30-15:00, Paper ThBT25-NT.2 | Add to My Program |
VBR: A Vision Benchmark in Rome |
|
Brizi, Leonardo | Sapienza University of Rome |
Giacomini, Emanuele | Sapienza University of Rome |
Di Giammarino, Luca | Sapienza University of Rome |
Ferrari, Simone | Sapienza University of Rome |
Salem, Omar Ashraf Ahmed Khairy | Sapienza University of Rome |
De Rebotti, Lorenzo | Sapienza University of Rome |
Grisetti, Giorgio | Sapienza University of Rome |
Keywords: Data Sets for SLAM, Mapping, Range Sensing
Abstract: This paper presents a robotics perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, and urban and highway scenarios. Combining handheld and car-based data collection, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment (BA). All sequences, divided into training and validation sets, are accessible at www.rvp-group.net/datasets/slam.
|
|
13:30-15:00, Paper ThBT25-NT.3 | Add to My Program |
Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation |
|
Song, Linna | National University of Defense Technology |
Shi, Dianxi | Defense Innovation Institute |
Xia, Jianqiang | National Innovation Institute of Defense Technology |
Ouyang, Qianying | Intelligent Game and Decision Lab; Tianjin Artificial Intelligence |
Qiao, Ziteng | National Innovation Institute of Defense Technology |
Jin, Songchang | Defense Innovation Institute |
Yang, Shaowu | National University of Defense Technology |
Keywords: Deep Learning for Visual Perception, Deep Learning Methods, Mapping
Abstract: Self-supervised monocular depth estimation has attracted extensive attention in recent years. Lightweight depth estimation methods are crucial for resource-constrained edge devices. However, existing lightweight methods often encounter the challenge of limited representation capacity and increased computational resource consumption for image reconstruction. To alleviate these issues, we propose a novel spatial-aware dynamic lightweight monocular depth estimation method (SAD-Depth). Specifically, we propose a spatial-aware dynamic encoder, which can capture spatial information of the input and generate input-adaptive dynamic convolutions, thereby significantly enhancing the model's adaptability to complex scenes. Meanwhile, we propose a multi-scale sub-pixel lightweight decoder that generates high-quality depth maps while maintaining a lightweight design. Experimental results demonstrate that our proposed SAD-Depth exhibits superiority in both model size and inference speed, achieving state-of-the-art performance on the KITTI benchmark.
|
|
13:30-15:00, Paper ThBT25-NT.4 | Add to My Program |
VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition |
|
Ramtoula, Benjamin | University of Oxford |
De Martini, Daniele | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
Keywords: Deep Learning for Visual Perception, Localization
Abstract: This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the list of images being represented, and is not limited to a particular neural network layer, therefore having access to high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can allow for better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery.
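A minimal sketch of the VDNA idea described above: represent an image list by per-neuron histograms of activation values, then compare two representations with a mean per-neuron L1 distance. Random activations stand in for a real feature extractor, and the bin count and distance choice are assumptions.

```python
# Sketch of the VDNA idea: represent an image list by per-neuron
# histograms of activation values, then compare representations with a
# mean per-neuron L1 distance. Random activations stand in for a real
# feature extractor; bin count and distance are assumptions.
import numpy as np

def vdna(activations, bins=32, rng=(0.0, 1.0)):
    """activations: (num_images, num_neurons) -> (num_neurons, bins)."""
    hists = [np.histogram(activations[:, j], bins=bins, range=rng,
                          density=True)[0]
             for j in range(activations.shape[1])]
    return np.asarray(hists)

def vdna_distance(a, b):
    return np.mean(np.abs(a - b).sum(axis=1))   # mean per-neuron L1

rng_ = np.random.default_rng(3)
seq_a = rng_.beta(2, 5, size=(100, 64))   # one image sequence / domain
seq_b = rng_.beta(2, 5, size=(100, 64))   # similar distribution
seq_c = rng_.beta(5, 2, size=(100, 64))   # shifted domain
da, db, dc = vdna(seq_a), vdna(seq_b), vdna(seq_c)
print("similar :", round(vdna_distance(da, db), 3))
print("shifted :", round(vdna_distance(da, dc), 3))
```

Because the histograms are built over a whole image list, the representation handles sequences natively, which is the property the VPR descriptor above exploits.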
|
|
13:30-15:00, Paper ThBT25-NT.5 | Add to My Program |
NISB-Map: Scalable Mapping with Neural Implicit Spatial Block |
|
Xiang, Beichen | Nanjing University of Science and Technology |
Sun, Yuxin | Shanghai Jiao Tong University |
Xie, Zhongqu | Nanjing University of Science and Technology |
Yang, Xiaolong | Nanjing University of Science and Technology |
Wang, Yulin | Nanjing University of Science and Technology |
Keywords: Deep Learning for Visual Perception, Mapping
Abstract: Recently, neural implicit representations have been applied to the mapping process of simultaneous localization and mapping (SLAM), offering lower storage overhead and a continuous representation. Nevertheless, related methods use a single neural network to represent the whole scene, resulting in forgetting of observed regions due to the limited capacity of a single network in large-scale scenes. Several methods encode the scene into implicit voxels to avoid parameter forgetting, but at the cost of memory. In this paper, we introduce a scalable mapping framework that utilizes extensible, fixed-size Neural Implicit Spatial Blocks (NISB) to cover the entire scene by incrementally creating multiple Multi-Layer Perceptron (MLP) networks. In evaluations against alternative methods on 3 indoor datasets, our method avoids forgetting the observed areas during the mapping process with a small memory footprint and smoothly updates the global map at 2 Hz.
|
|
13:30-15:00, Paper ThBT25-NT.6 | Add to My Program |
Regressing Transformers for Data-Efficient Visual Place Recognition |
|
Leyva-Vallina, Maria | University of Groningen |
Strisciuglio, Nicola | University of Twente |
Petkov, Nicolai | University of Groningen |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distances for similar images and larger distances for dissimilar ones in a latent space. However, this approach struggles to ensure an accurate distance-based representation of image similarity, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as the similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
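The regression framing can be illustrated with a short training-loop sketch in which descriptor cosine similarity is regressed directly onto a graded FOV-overlap label. The encoder and the synthetic overlap labels are stand-ins assumed for illustration, not the paper's architecture or data.

```python
# Sketch of place recognition as regression: descriptor similarity is
# trained to match a graded FOV-overlap label directly, instead of a
# binary contrastive objective. The encoder and the synthetic overlap
# labels are stand-ins assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def overlap_regression_loss(feat_a, feat_b, overlap):
    """overlap in [0,1]: camera FOV overlap used as similarity ground truth."""
    da = F.normalize(encoder(feat_a), dim=-1)
    db = F.normalize(encoder(feat_b), dim=-1)
    sim = (da * db).sum(-1)               # cosine similarity in [-1, 1]
    return F.mse_loss((sim + 1) / 2, overlap)

# Toy batch: raw image features and their graded overlap labels.
fa, fb = torch.randn(16, 128), torch.randn(16, 128)
ov = torch.rand(16)
for step in range(100):
    opt.zero_grad()
    loss = overlap_regression_loss(fa, fb, ov)
    loss.backward()
    opt.step()
print("toy regression loss:", float(loss))
```

Because the descriptor distance itself is trained to carry the graded similarity, retrieval can rank candidates directly, which is why no separate re-ranking stage is needed.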
|
|
13:30-15:00, Paper ThBT25-NT.7 | Add to My Program |
On the Study of Data Augmentation for Visual Place Recognition |
|
Jang, Suji | Gwangju Institute of Science and Technology |
Kim, Ue-Hwan | Gwangju Institute of Science and Technology (GIST) |
Keywords: Deep Learning for Visual Perception, Recognition, Visual Learning
Abstract: In the field of robotics engineering and autonomous driving vehicles, precise estimation of positions through visual place recognition (VPR) is crucial not only for reducing localization errors caused by visual odometry but also for preventing the creation of ambiguous maps in unfamiliar environments. Despite numerous research efforts aimed at improving VPR performance by addressing challenges such as illumination variation, occlusions, and dynamic objects, contemporary approaches have primarily focused on model-based methods, with limited attention given to data augmentation (DA) methods. Therefore, there is a need to investigate the impact of DA on the generalization ability of VPR. To achieve this objective, this study compares VPR learning approaches, conducts a comprehensive empirical analysis, and presents crucial insights. The results of this study can provide useful guidance for the design of future VPR systems and contribute to the advancement of computer vision and robotics research.
|
|
13:30-15:00, Paper ThBT25-NT.8 | Add to My Program |
Enhancing Visual Place Recognition with Multi-Modal Features and Time-Constrained Graph Attention Aggregation |
|
Wang, Zhuo | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Zhao, Xinge | Northeastern University |
Ning, Jian | Northeastern University |
Zou, Dehao | Northeastern University |
Pei, Meiqi | Northeastern University |
Keywords: Deep Learning for Visual Perception, RGB-D Perception
Abstract: Visual place recognition (VPR) is a crucial technology for autonomous driving and robotic navigation. However, severe appearance and perspective changes often degrade algorithm performance. Current methods mainly utilize single-modality RGB images, which are sensitive to environmental changes. To address this challenge, we propose a novel multi-modal visual place recognition method that incorporates depth information as auxiliary data to enhance the robustness of the VPR algorithm. The pipeline involves dual-branch feature extraction and a transformer-based shared multi-modal feature fusion module (SFFM) to enable full interaction between semantic and structural information. Furthermore, we introduce a time-constrained graph attention aggregation (TC-GAT) that propagates node information across time and space to deal with perceptual aliasing. Extensive experiments on the Oxford RobotCar and MSLS datasets demonstrate that the proposed algorithm is not only effective under appearance changes but also competitive under opposing viewpoints.
|
|
13:30-15:00, Paper ThBT25-NT.9 | Add to My Program |
MBFusion: A New Multi-Modal BEV Feature Fusion Method for HD Map Construction |
|
Hao, Xiaoshuai | Samsung Research China - Beijing (SRC-B) |
Zhang, Hui | Samsung Research China - Beijing (SRC-B) |
Yang, Yifan | Samsung Research China - Beijing (SRC-B) |
Zhou, Yi | Samsung Research |
Jung, Sangil | Samsung Advanced Institute of Technology |
Park, Seung-In | Samsung Advanced Institute of Technology |
Yoo, ByungIn | Samsung Advanced Institute of Technology |
Keywords: Deep Learning for Visual Perception, Sensor Fusion, Computer Vision for Automation
Abstract: HD map construction is a fundamental and challenging task in autonomous driving for understanding the surrounding environment. Recently, camera-LiDAR BEV feature fusion methods have attracted increasing attention in the HD map construction task, as they can significantly boost benchmark performance. However, existing fusion methods ignore modal interaction and use very simple fusion strategies, which suffer from misalignment and information loss. To tackle this, we propose a novel multi-modal BEV feature fusion method named MBFusion. Specifically, to solve the semantic misalignment problem between camera and LiDAR features, we design a Cross-modal Interaction Transform (CIT) module that lets the two feature spaces exchange knowledge with each other, enhancing the feature representation via a cross-attention mechanism. Then, we propose a Dual Dynamic Fusion (DDF) module to automatically select valuable information from different modalities for better feature fusion. Moreover, MBFusion is simple and can be plugged into existing pipelines. We evaluate MBFusion on three architectures, including HDMapNet, VectorMapNet, and MapTR, to show its versatility and effectiveness. Compared with state-of-the-art methods, MBFusion achieves 3.6% and 4.1% absolute mAP improvements on the nuScenes and Argoverse2 datasets, respectively, demonstrating the superiority of our method.
|
|
ThBT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM V |
|
|
Chair: Mordohai, Philippos | Stevens Institute of Technology |
Co-Chair: Date, Hisashi | University of Tsukuba |
|
13:30-15:00, Paper ThBT26-NT.1 | Add to My Program |
HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features |
|
Su, Xin | Technical University of Munich |
Eger, Sebastian | TUM |
Misik, Adam | Siemens Technology, Technical University Munich |
Yang, Dong | Technical University of Munich |
Pries, Rastin | Nokia |
Steinbach, Eckehard | Technical University of Munich |
Keywords: SLAM, Multi-Robot SLAM, Autonomous Agents
Abstract: Visual SLAM is an essential tool in diverse applications such as robot perception and extended reality, where feature-based methods are prevalent due to their accuracy and robustness. However, existing methods employ either hand-crafted or solely learnable point features and are thus limited by those features' attributes. In this paper, we propose efficiently incorporating hybrid point features into a single system. By integrating hand-crafted and learnable features, we seek to capitalize on their complementary attributes in both key-point identification and descriptor expressiveness. To this end, we design a pre-processing module, which includes extraction, inter-class processing, and post-processing of hybrid point features. We present an efficient matching approach that performs data association exclusively within the same class of features. Moreover, we design a Hybrid Bag-of-Words (H-BoW) model to handle hybrid point features in matching and loop closure detection. By integrating the proposed framework into a modern feature-based system, we introduce HPF-SLAM. We evaluate the system on the EuRoC-MAV and TUM-RGBD benchmarks. The experimental results show that our method consistently surpasses the baseline at comparable speed.
|
|
13:30-15:00, Paper ThBT26-NT.2 | Add to My Program |
2D-3D Object Shape Alignment for Camera-Object Pose Compensation in Object-Visual SLAM |
|
Lee, Hanyeol | Seoul National University |
Jung, Jaehyung | Technical University of Munich |
Park, Chan Gook | Seoul National University |
Keywords: SLAM, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: In this study, we propose an object shape alignment method with a robust optimization scheme for 6-degrees-of-freedom (DOF) object pose compensation. Although camera-based 3D object pose estimation has improved rapidly in recent years with the development of deep learning, the estimates still contain errors due to several factors. To compensate for this, we perform a shape alignment between the 2D segmentation of the object and the projection of the 3D object in the image plane. To avoid convergence to a local minimum in the nonlinear optimization, we separate the pose into translation and rotation. This approach yields a linear optimization in the translation with reduced computational cost. For the rotation, a parallel optimization is performed with multiple initial values, reflecting the uncertainty of the initial value. We formulate an invariant extended Kalman filter (EKF)-based object-visual simultaneous localization and mapping (SLAM) with the camera-object relative pose as the measurement model. To verify the performance of the proposed algorithm, we present improved camera-object relative pose accuracy as well as localization and mapping accuracy on several sequences of the YCB-Video dataset.
|
|
13:30-15:00, Paper ThBT26-NT.3 | Add to My Program |
Spectral Trade-Off for Measurement Sparsification of Pose-Graph SLAM |
|
Nam, Jiyeon | ASRI, Seoul National University |
Hyeon, Soojeong | Seoul National University |
Joo, Youngjun | Sookmyung Women's University |
Noh, DongKi | LG Electronics Inc |
Shim, Hyungbo | Seoul National University |
Keywords: SLAM, Optimization and Optimal Control, Mapping
Abstract: In this paper, we propose a trade-off optimization algorithm to compute an appropriate number of edges for measurement (edge) sparsification in pose-graph SLAM. The greater the amount of measurement data, the larger the computational burden. To reduce this burden, one can remove a portion of the measurements; however, reliable data such as odometric measurements can be lost if measurements are removed without any principle. To remove redundant measurements, we propose a trade-off optimization between maximizing the Fiedler value and minimizing the largest eigenvalue of the adjacency matrix of the measurement graph. This formulation offers two virtues. First, it is scalable: for any dataset, once the trade-off weight is given, the algorithm determines the appropriate number of edges. Second, the edges of the measurement graph can be distributed evenly: since the algorithm minimizes the largest eigenvalue of the adjacency matrix, it suppresses the upper bound on the maximum degree of the measurement graph, removing redundant information concentrated on a few nodes and improving the estimation accuracy of the sparsified graph. To validate the performance of the proposed algorithm, we apply our approach to the CSAIL, Intel, and Manhattan datasets.
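The spectral trade-off can be illustrated numerically: for a candidate edge set, score = Fiedler value of the Laplacian minus a weight times the largest adjacency eigenvalue, and edges are removed greedily while the score improves. The greedy scheme, the weight, and the protected odometry chain are illustrative assumptions; the paper solves a proper trade-off optimization.

```python
# Sketch of the spectral trade-off objective: for a candidate edge set,
# score = Fiedler value of the Laplacian minus a weight times the largest
# adjacency eigenvalue, and edges are removed greedily while the score
# improves. The greedy scheme and weight are illustrative assumptions.
import numpy as np

def spectral_score(n, edges, w=0.3):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(1)) - A
    fiedler = np.sort(np.linalg.eigvalsh(L))[1]       # 2nd-smallest of L
    lam_max = np.linalg.eigvalsh(A)[-1]               # largest of A
    return fiedler - w * lam_max

n = 6
chain = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}      # "odometry" edges
edges = chain | {(0, 2), (0, 3), (0, 4), (0, 5), (2, 5)}  # + loop closures
improved = True
while improved:                                       # greedy sparsification
    improved = False
    base = spectral_score(n, edges)
    for e in sorted(edges - chain):                   # never touch odometry
        if spectral_score(n, edges - {e}) > base:
            edges = edges - {e}; improved = True; break
print(len(edges), "edges kept, score:", round(spectral_score(n, edges), 3))
```

Protecting the odometry chain mirrors the paper's motivation that reliable odometric measurements should not be lost, while the lam_max term discourages loop closures piling onto a single node.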
|
|
13:30-15:00, Paper ThBT26-NT.4 | Add to My Program |
Learning Covariances for Estimation with Constrained Bilevel Optimization |
|
Qadri, Mohamad | Carnegie Mellon University |
Manchester, Zachary | Carnegie Mellon University |
Kaess, Michael | Carnegie Mellon University |
Keywords: SLAM, Probabilistic Inference, Probability and Statistical Methods
Abstract: We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.
|
|
13:30-15:00, Paper ThBT26-NT.5 | Add to My Program |
UWB Radar SLAM: An Anchorless Approach in Vision Denied Indoor Environments |
|
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Liu, Ran | Southwest University of Science and Technology |
Yuen, Chau | Nanyang Technological University |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: SLAM, Range Sensing
Abstract: LiDAR and cameras are frequently used as sensors for simultaneous localization and mapping (SLAM). However, these sensors are prone to failure under low visibility (e.g., smoke) or in places with reflective surfaces (e.g., mirrors). On the other hand, electromagnetic waves exhibit better penetration properties as the wavelength increases, and are therefore unaffected by low visibility. Hence, this letter presents ultra-wideband (UWB) radar as an alternative to the existing sensors. UWB is generally used in anchor-tag SLAM systems: one or more anchors are installed in the environment and tags are attached to the robots. Although this method performs well under low visibility, modifying the existing infrastructure is not always feasible. UWB has also been used in peer-to-peer ranging collaborative SLAM systems. However, this requires more than a single robot and does not include mapping in the aforementioned low-visibility environments. Therefore, the approach presented in this letter depends solely on UWB transceivers mounted on-board. In addition, an extended Kalman filter (EKF) SLAM is used to solve the SLAM problem at the back-end. Experiments were conducted and demonstrated that the proposed UWB-based radar SLAM is able to map natural point landmarks inside an indoor environment while improving robot localization.
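For reference, a minimal EKF measurement update for a single range-only landmark observation, the kind of back-end step the letter relies on; the state layout and noise handling here are assumptions, not the authors' implementation.

```python
# Minimal EKF range-only landmark update (illustrative state layout).
import numpy as np

def ekf_range_update(mu, Sigma, lm_idx, z, R_noise):
    # State mu = [x, y, theta, lx1, ly1, lx2, ly2, ...]; z is a UWB range.
    px, py = mu[0], mu[1]
    lx, ly = mu[3 + 2 * lm_idx], mu[4 + 2 * lm_idx]
    dx, dy = lx - px, ly - py
    q = np.hypot(dx, dy)                            # predicted range
    H = np.zeros((1, len(mu)))                      # measurement Jacobian
    H[0, 0], H[0, 1] = -dx / q, -dy / q             # w.r.t. robot position
    H[0, 3 + 2 * lm_idx], H[0, 4 + 2 * lm_idx] = dx / q, dy / q
    S = H @ Sigma @ H.T + R_noise                   # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)              # Kalman gain
    mu = mu + (K @ np.array([[z - q]])).ravel()
    Sigma = (np.eye(len(mu)) - K @ H) @ Sigma
    return mu, Sigma
```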
|
|
13:30-15:00, Paper ThBT26-NT.6 | Add to My Program |
Less Is More: Physical-Enhanced Radar-Inertial Odometry |
|
Huang, Qiucan | Hong Kong University of Science and Technology |
Liang, Yuchen | The Hong Kong University of Science and Technology |
Qiao, Zhijian | Hong Kong University of Science and Technology |
Shen, Shaojie | Hong Kong University of Science and Technology |
Yin, Huan | Hong Kong University of Science and Technology |
Keywords: SLAM, Range Sensing, Localization
Abstract: Radar offers the advantage of providing additional physical properties related to observed objects. In this study, we design a physical-enhanced radar-inertial odometry system that capitalizes on the Doppler velocities and radar cross-section information. The filter for static radar points, correspondence estimation, and residual functions are all strengthened by integrating the physical properties. We conduct experiments on both public datasets and our self-collected data, with different mobile platforms and sensor types. Our quantitative results demonstrate that the proposed radar-inertial odometry system outperforms alternative methods using the physical-enhanced components. Our findings also reveal that using the physical properties results in fewer radar points for odometry estimation, but the performance is still guaranteed and even improved, thus aligning with the "less is more" principle.
|
|
13:30-15:00, Paper ThBT26-NT.7 | Add to My Program |
Linear Four-Point LiDAR SLAM for Manhattan World Environments |
|
Jeong, Eunju | Sookmyung Women's University |
Lee, Jina | Sookmyung Women's University |
Kang, Suyoung | Sookmyung Women's University |
Kim, Pyojin | Gwangju Institute of Science and Technology (GIST) |
Keywords: SLAM, Range Sensing, Sensor Fusion
Abstract: We present a new SLAM algorithm that utilizes an inexpensive four-point LiDAR to compensate for the short range and limited viewing angle of RGB-D cameras. The four-point LiDAR can detect distances up to 40 m but senses only four distance measurements per scan. In open spaces, RGB-D SLAM approaches, such as L-SLAM, fail to estimate robust 6-DoF camera poses due to the limitations of the RGB-D camera. We detect walls beyond the range of RGB-D cameras using the four-point LiDAR and subsequently build a reliable global Manhattan world (MW) map while simultaneously estimating 6-DoF camera poses. By leveraging the structural regularities of indoor MW environments, we overcome the challenge of SLAM with the sparse sensing of four-point LiDARs. We expand the application range of L-SLAM while preserving its strong performance, even in low-textured environments, using a linear Kalman filter (KF) framework. Our experiments in various indoor MW spaces, including open spaces, demonstrate that the performance of the proposed method is comparable to that of other state-of-the-art SLAM methods.
|
|
13:30-15:00, Paper ThBT26-NT.8 | Add to My Program |
IBoW3D: Place Recognition Based on Incremental and General Bag of Words in 3D Scans |
|
Lin, Yuxiaotong | ZJU |
Chen, Jiming | Zhejiang University |
Li, Liang | Zhejiang University |
Keywords: SLAM, Recognition, Localization
Abstract: Existing methods for place recognition in 3D point clouds either ignore partial structure information by converting 3D scans to 2D images or construct constrained bag-of-words (BoW) representations reliant on specific feature extraction algorithms. In this paper, we propose a novel method based on an incremental and general bag of words. Incorporating an adaptable keypoint and 3D local feature extraction method, we employ an incremental BoW model that is updated regularly. This enables coarse-to-fine candidate selection from the database, and a revisit is then identified following geometric verification. In addition, we propose a new supplementary metric that addresses the leaving-out issue of the conventional metric, enhancing the identification of true loops. Employing a state-of-the-art (SOTA) keypoint and feature extraction algorithm, we evaluate our method as well as SOTA place recognition methods on diverse datasets of varying quality. Experimental results demonstrate that our method outperforms the baselines across all three datasets, showcasing robust performance and notable generalization capabilities.
|
|
13:30-15:00, Paper ThBT26-NT.9 | Add to My Program |
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-Time Visual Scene Understanding |
|
Kassab, Christina | University of Oxford |
Mattamala, Matias | University of Oxford |
Zhang, Lintong | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: SLAM, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Versatile and adaptive semantic understanding would enable autonomous systems to comprehend and interact with their surroundings. Existing fixed-class models limit the adaptability of indoor mobile and assistive autonomous systems. In this work, we introduce LEXIS, a real-time indoor Simultaneous Localization and Mapping (SLAM) system that harnesses the open-vocabulary nature of Large Language Models (LLMs) to create a unified approach to scene understanding and place recognition. The approach first builds a topological SLAM graph of the environment (using visual-inertial odometry) and embeds Contrastive Language-Image Pretraining (CLIP) features in the graph nodes. We use this representation for flexible room classification and segmentation, serving as a basis for room-centric place recognition. This allows loop closure searches to be concentrated on semantically relevant places. Our proposed system is evaluated using both public simulated data and real-world data, covering office and home environments. It successfully categorizes rooms with varying layouts and dimensions and outperforms the state-of-the-art (SOTA). For place recognition and trajectory estimation tasks, we achieve performance equivalent to the SOTA, all while utilizing the same pre-trained model. Lastly, we demonstrate the system's potential for planning.
|
|
ThBT27-NT Oral Session, NT-G2 |
Add to My Program |
Dexterous Manipulation II |
|
|
Chair: Temel, Zeynep | Carnegie Mellon University |
Co-Chair: Taniguchi, Tadahiro | Ritsumeikan University |
|
13:30-15:00, Paper ThBT27-NT.1 | Add to My Program |
Helical Control in Latent Space: Enhancing Robotic Craniotomy Precision in Uncertain Environments |
|
Jia, Yuanyuan | Ritsumeikan University |
Qu, Jessica | Canadian Academy |
Taniguchi, Tadahiro | Ritsumeikan University |
Keywords: Dexterous Manipulation, Medical Robots and Systems, Machine Learning for Robot Control
Abstract: In this paper, we introduce a double-stage transfer learning framework based on expert data. It employs probabilistic graphical models to effectively capture helical periodic features in the latent space, integrating Bayesian variational inference and neural networks for implementation. Compared to traditional methods, it achieves high precision and stable control even in environments with limited observation signals and high noise levels. We have successfully applied this method to a biomedical task, a simulated cranial window procedure. Preliminary results show promising performance comparable to that of human experts using only image information, further validating the efficacy of the proposed method.
|
|
13:30-15:00, Paper ThBT27-NT.2 | Add to My Program |
1 kHz Behavior Tree for Self-Adaptable Tactile Insertion |
|
Wu, Yansong | Technische Universität München |
Wu, Fan | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Chen, Kejia | Technical University of Munich |
Schneider, Samuel | TUM |
Johannsmeier, Lars | Franka Robotics GmbH |
Bing, Zhenshan | Technical University of Munich |
Abu-Dakka, Fares | Mondragon University |
Knoll, Alois | Tech. Univ. Muenchen TUM |
Haddadin, Sami | Technical University of Munich |
Keywords: Dexterous Manipulation, Assembly, Force Control
Abstract: Insertion is an essential skill for robots in both modern manufacturing and service robotics. In a previous study, we proposed an insertion skill framework based on force-domain wiggle motion. The main limitation of that method lies in the robot's inability to adjust its behavior according to the changing contact state during interaction. In this paper, we extend the skill formalism by incorporating a behavior tree-based primitive switching mechanism that leverages high-frequency tactile data to estimate the contact state. The efficacy of our proposed framework is validated with a series of experiments involving the execution of tightly constrained peg-in-hole tasks. The experimental results demonstrate a significant improvement in performance, characterized by reduced execution time, heightened robustness, and superior adaptability when confronted with unknown tasks. Moreover, in the context of transfer learning, our paper provides empirical evidence that the proposed skill framework enhances transferability across distinct operational contexts and tasks.
|
|
13:30-15:00, Paper ThBT27-NT.3 | Add to My Program |
DexDLO: Learning Goal-Conditioned Dexterous Policy for Dynamic Manipulation of Deformable Linear Objects |
|
Zhaole, Sun | Tsinghua University, the University of Edinburgh, Intel Lab China |
Zhu, Jihong | University of York |
Fisher, Robert | University of Edinburgh |
Keywords: Dexterous Manipulation, Reinforcement Learning
Abstract: Deformable linear object (DLO) manipulation is needed in many fields. Previous research on DLO manipulation has primarily involved parallel-jaw gripper manipulation with fixed grasping positions. However, the potential for dexterous manipulation of DLOs using an anthropomorphic hand is under-explored. We present DexDLO, a model-free framework that learns dexterous dynamic manipulation policies for DLOs with a fixed-base dexterous hand in an end-to-end way. By abstracting several common DLO manipulation tasks into goal-conditioned tasks, DexDLO can perform tasks such as DLO grabbing, DLO pulling, and DLO end-tip position control. Using the MuJoCo physics simulator, we demonstrate that our framework can efficiently and effectively learn five different DLO manipulation tasks with the same framework parameters. We further provide a thorough analysis of the learned policies, reward functions, and reduced observations for a comprehensive understanding of the framework.
|
|
13:30-15:00, Paper ThBT27-NT.4 | Add to My Program |
Everyday Finger: A Robotic Finger That Meets the Needs of Everyday Interactive Manipulation |
|
Castro Ornelas, Ruben | Massachusetts Institute of Technology |
Cantu, Tomas | Massachusetts Institute of Technology |
Sperandio, Isabel | Massachusetts Institute of Technology |
Slocum, Alexander | Massachusetts Institute of Technology |
Agrawal, Pulkit | MIT |
Keywords: Dexterous Manipulation, Actuation and Joint Mechanisms, Compliant Joints and Mechanisms
Abstract: We provide the mechanical and dynamical requirements for a robotic finger capable of performing a large number of everyday tasks. To meet these requirements, we present a novel actuator and finger design, the Everyday Finger, that comes close to many characteristics of human fingers. In particular, we focus on minimizing the size of components to achieve proper performance without sacrificing compactness. A robotic hand that uses two Everyday Fingers demonstrated an 80% success rate in picking up and placing dishes in a rack, and the ability to pick up flat objects like napkins and delicate ones like strawberries. Videos are available at the project website: https://sites.google.com/view/everydayfinger.
|
|
13:30-15:00, Paper ThBT27-NT.5 | Add to My Program |
Quadratic Programming Based Inverse Kinematics for Precise Bimanual Manipulation |
|
Chaki, Tomohiro | Honda R&D Co., Ltd |
Kawakami, Tomohiro | Honda R&D Co., Ltd |
Keywords: Dual Arm Manipulation, Bimanual Manipulation, Telerobotics and Teleoperation
Abstract: We discuss the precise cooperative motion of a dual-arm manipulator. For the inverse kinematics of cooperative redundant manipulators, a hierarchical method using the null space and an optimization method prioritizing the end-effectors' relative position in the objective function have been proposed. However, there is no guarantee that the relative position will be maintained in regions subject to joint limits and task-space reachability constraints. As a result, unacceptable errors may occur, and some tasks cannot be accomplished. We propose designing the maximum permissible errors in advance by expressing the target relative position as inequality constraints in a Quadratic Programming (QP) problem. By extending the formulation to include a virtual spring, we also achieve subtle force application by two cooperating manipulators. The proposed method was verified in simulation and experiments.
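A hedged sketch of the core idea: track both end-effector targets in the objective while bounding the relative-position error with inequality constraints, so the maximum permissible error eps is designed in advance. The SLSQP solver and all names are illustrative assumptions; the paper's exact QP formulation and its virtual-spring extension are not reproduced here.

```python
# Illustrative QP-style IK step with a relative-position error band.
import numpy as np
from scipy.optimize import minimize

def bimanual_qp_step(J1, J2, dx1, dx2, eps, dq_max):
    n = J1.shape[1]
    J_rel = J2 - J1                                 # relative end-effector Jacobian
    dx_rel = dx2 - dx1

    def cost(dq):                                   # task tracking + damping
        e1, e2 = J1 @ dq - dx1, J2 @ dq - dx2
        return e1 @ e1 + e2 @ e2 + 1e-3 * dq @ dq

    # |J_rel dq - dx_rel| <= eps, written as two linear inequality constraints.
    cons = [
        {"type": "ineq", "fun": lambda dq: (dx_rel + eps) - J_rel @ dq},
        {"type": "ineq", "fun": lambda dq: J_rel @ dq - (dx_rel - eps)},
    ]
    res = minimize(cost, np.zeros(n), method="SLSQP",
                   bounds=[(-dq_max, dq_max)] * n,   # joint-rate limits
                   constraints=cons)
    return res.x
```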
|
|
13:30-15:00, Paper ThBT27-NT.6 | Add to My Program |
Model-Free 3D Shape Control of Deformable Objects Using Novel Features Based on Modal Analysis |
|
Yang, Bohan | The Chinese University of Hong Kong |
Lu, Bo | Soochow University |
Chen, Wei | The Chinese University of Hong Kong |
Zhong, Fangxun | The Chinese University of Hong Kong |
Liu, Yunhui | Chinese University of Hong Kong |
Keywords: Deformable Object Manipulation, Visual Servoing, Learning and Adaptive Systems, Sensor-based Control
Abstract: Shape control of deformable objects is challenging and important. This paper proposes a model-free controller using novel 3D global deformation features based on modal analysis. Unlike most existing controllers using geometric features, our controller employs physically based deformation features designed by decoupling global deformation into low-frequency modes. Although modal analysis is widely adopted in computer vision and simulation, its usage in robotic deformation control is still an open topic. We develop a new model-free framework for the modal-based deformation control. Physical interpretation of the modes enables us to formulate an analytical deformation Jacobian matrix mapping the robot manipulation onto changes of the modal features. In the Jacobian matrix, unknown geometric and physical models of the object are treated as low-dimensional modal parameters which can be used to linearly parameterize the closed-loop system. Thus, an adaptive controller with proven stability can be designed to deform the object while online estimating the modal parameters. Simulations, experiments, and comparative studies are conducted for validation.
|
|
13:30-15:00, Paper ThBT27-NT.7 | Add to My Program |
Global Planning for Contact-Rich Manipulation Via Local Smoothing of Quasi-Dynamic Contact Models |
|
Pang, Tao | Boston Dynamics AI Institute |
Suh, Hyung Ju Terry | Massachusetts Institute of Technology |
Yang, Lujie | MIT |
Tedrake, Russ | Massachusetts Institute of Technology |
Keywords: Manipulation Planning, Dexterous Manipulation, Motion and Path Planning, Contact Modeling
Abstract: The empirical success of Reinforcement Learning (RL) in contact-rich manipulation leaves much to be understood from a model-based perspective, where key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding / discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by sampling and averaging contact modes. In contrast, model-based methods smooth contact dynamics analytically. Our first contribution establishes the theoretical equivalence of the two methods for simple systems, and shows empirical equivalence on several complex examples. To further alleviate (ii), our second contribution is a convex, differentiable and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes. Our final contribution resolves (iii), where we show that with smoothing, classical sampling-based motion planning can be effective in global planning. Applying our method on challenging contact-rich manipulation tasks, we show that model-based motion planning can perform comparably to RL with dramatically less computation.
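The smoothing idea the abstract contrasts with RL can be illustrated with a zeroth-order estimator: the gradient of the noise-averaged dynamics E_w[f(x+w)] is estimated by Monte Carlo sampling, which stays informative even where f itself is stiff or non-smooth. A purely illustrative sketch:

```python
# Randomized smoothing sketch: score-function estimate of d/dx E[f(x + w)]
# with w ~ N(0, sigma^2 I), i.e. E[f(x + w) w] / sigma^2 for Gaussian noise.
import numpy as np

def smoothed_gradient(f, x, sigma=0.1, n_samples=256, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=sigma, size=(n_samples, x.size))
    fx = np.array([f(x + wi) for wi in w])
    return (fx[:, None] * w).mean(axis=0) / sigma**2

# Example: a stiff, non-smooth 1D "contact" map with a hard wall at 0.
wall = lambda x: np.maximum(x, 0.0).sum()
g = smoothed_gradient(wall, np.array([-0.05]))  # nonzero even behind the wall
```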
|
|
13:30-15:00, Paper ThBT27-NT.8 | Add to My Program |
Enhancing Dexterity in Robotic Manipulation Via Hierarchical Contact Exploration |
|
Cheng, Xianyi | Carnegie Mellon University |
Patil, Sarvesh | Carnegie Mellon University School of Computer Science |
Temel, Zeynep | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Mason, Matthew T. | Carnegie Mellon University |
Keywords: Dexterous Manipulation, Manipulation Planning, In-Hand Manipulation
Abstract: Planning robot dexterity is challenging due to the non-smoothness introduced by contacts, intricate fine motions, and ever-changing scenarios. We present a hierarchical planning framework for dexterous robotic manipulation (HiDex). This framework explores in-hand and extrinsic dexterity by leveraging contacts. It generates rigid-body motions and complex contact sequences. Our framework is based on Monte-Carlo Tree Search and has three levels: 1) planning object motions and environment contact modes; 2) planning robot contacts; 3) path evaluation and control optimization. This framework offers two main advantages. First, it allows efficient global reasoning over high-dimensional complex space created by contacts. It solves a diverse set of manipulation tasks that require dexterity, both intrinsic (using the fingers) and extrinsic (also using the environment), mostly in seconds. Second, our framework allows the incorporation of expert knowledge and customizable setups in task mechanics and models. It requires minor modifications to accommodate different scenarios and robots. Hence, it provides a flexible and generalizable solution for various manipulation tasks. As examples, we analyze the results on 7 hand configurations and 15 scenarios. We demonstrate 8 tasks on two robot platforms.
|
|
13:30-15:00, Paper ThBT27-NT.9 | Add to My Program |
Inter-Finger Small Object Manipulation with DenseTact Optical Tactile Sensor |
|
Do, Won Kyung | Stanford University |
Aumann, Bianca | Stanford University |
Chungyoun, Camille | Stanford University |
Kennedy, Monroe | Stanford University |
Keywords: Dexterous Manipulation, In-Hand Manipulation, Grasping
Abstract: The ability to grasp and manipulate small objects in cluttered environments remains a significant challenge. This letter introduces a novel approach that utilizes a tactile sensor-equipped gripper with eight degrees of freedom to overcome these limitations. We employ DenseTact 2.0 for the gripper, enabling precise control and improved grasp success rates, particularly for small objects ranging from 5 mm to 25 mm. Our integrated strategy incorporates the robot arm, gripper, and sensor to effectively manipulate and orient small objects for subsequent classification. We contribute a specialized dataset designed for classifying these objects based on tactile sensor output and a new control algorithm for in-hand orientation tasks. Our system demonstrates an 88% grasp success rate and successfully classifies small objects in cluttered scenarios.
|
|
ThBT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation II |
|
|
Chair: Yang, Chenguang | University of Liverpool |
Co-Chair: Iba, Soshi | Honda Research Institute USA |
|
13:30-15:00, Paper ThBT28-NT.1 | Add to My Program |
ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion |
|
Li, Hongyu | Brown University |
Dikhale, Snehal | Honda Research Institute USA |
Iba, Soshi | Honda Research Institute USA |
Jamali, Nawid | Honda Research Institute USA, Inc |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Force and Tactile Sensing
Abstract: In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on a volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with the state-of-the-art. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% using the Intersection over Union metric and achieve 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm the gain on the 6D object pose estimate from explicitly completing the shape. Ultimately, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.
|
|
13:30-15:00, Paper ThBT28-NT.2 | Add to My Program |
VERGNet: Visual Enhancement Guided Robotic Grasp Detection under Low-Light Condition |
|
Niu, Mingdi | Shanxi University |
Lu, Zhenyu | Bristol Robotics Laboratory |
Chen, Lu | Shanxi University |
Yang, Jing | Shanxi University |
Yang, Chenguang | University of Liverpool |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation
Abstract: Although existing grasp detection methods have achieved encouraging performance under well-lit conditions, repeated experiments show that detection performance deteriorates drastically under low-light conditions. While supplementary information can be provided by additional sensors, such as depth cameras, the sparse and weak visual features still hinder improvements in detection accuracy. To address these issues, we propose a visual enhancement guided grasp detection model (VERGNet) to improve the robustness of robotic grasping in low-light conditions. First, a simultaneous grasp detection and low-light feature enhancement framework is designed, which integrates residual blocks with coordinate attention to re-optimize grasping features. Then, an unsupervised low-light feature enhancement strategy is adopted to reduce the dependence on paired data and improve the algorithm's robustness to low-light conditions. Extensive experiments are conducted on two newly constructed low-light grasp datasets, and the proposed method achieves 98.9% and 91.2% detection accuracy respectively, surpassing comparative methods. The effectiveness of our method has also been validated in real-world low-light imaging scenarios.
|
|
13:30-15:00, Paper ThBT28-NT.3 | Add to My Program |
TactileAR: Active Tactile Pattern Reconstruction |
|
Wu, Bing | Dalian University of Technology, China |
Liu, Qian | Dalian University of Technology |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Contact Modeling
Abstract: High-resolution (HR) contact surface information is essential for robotic grasping and precise manipulation tasks. However, it remains a challenge for current taxel-based sensors to obtain HR tactile information. In this paper, we focus on utilizing low-resolution (LR) tactile sensors to reconstruct a localized, dense, and HR representation of contact surfaces. In particular, we build a Gaussian triaxial tactile sensor degradation model and propose a tactile pattern reconstruction framework based on the Kalman filter. This framework enables the reconstruction of 2D HR contact surface shapes from collected LR tactile sequences. In addition, we present an active exploration strategy to enhance reconstruction efficiency. We evaluate the proposed method in real-world scenarios in comparison with existing prior-information-based approaches. Experimental results confirm the efficiency of the proposed approach and demonstrate satisfactory reconstructions of complex contact surface shapes.
|
|
13:30-15:00, Paper ThBT28-NT.4 | Add to My Program |
Online Estimation of Articulated Objects with Factor Graphs Using Vision and Proprioceptive Sensing |
|
Buchanan, Russell | University of Edinburgh |
Röfer, Adrian | University of Freiburg |
Moura, Joao | The University of Edinburgh |
Valada, Abhinav | University of Freiburg |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Autonomous Agents
Abstract: From dishwashers to cabinets, humans interact with articulated objects every day, and for a robot to assist in common manipulation tasks, it must learn a representation of articulation. Recent deep learning methods can provide powerful vision-based priors on the affordance of articulated objects from previous, possibly simulated, experiences. In contrast, many other works estimate articulation by observing the object in motion, requiring the robot to already be interacting with the object. In this work, we propose to combine the best of both worlds by introducing an online estimation method that merges vision-based affordance predictions from a neural network with interactive kinematic sensing in an analytical model. Our approach uses vision to predict an articulation model before touching the object, while also being able to update the model quickly from kinematic sensing during the interaction. In this paper, we implement a full system using shared autonomy for robotic opening of articulated objects, in particular objects whose articulation is not apparent from vision alone. We implemented our system on a real robot and performed several autonomous closed-loop experiments in which the robot had to open a door with an unknown joint while estimating the articulation online. Our system achieved an 80% success rate for autonomous opening of unknown articulated objects.
|
|
13:30-15:00, Paper ThBT28-NT.5 | Add to My Program |
Learning to Estimate Incipient Slip with Tactile Sensing to Gently Grasp Objects |
|
Boonstra, Dirk-Jan | Delft University of Technology |
Willemet, Laurence | TU Delft |
Luijkx, Jelle Douwe | Delft University of Technology |
Wiertlewski, Michael | TU Delft |
Keywords: Perception for Grasping and Manipulation, Force and Tactile Sensing, Machine Learning for Robot Control
Abstract: To gently grasp objects, robots need to generate enough friction while avoiding forces large enough to damage the object. In practice, force regulation is challenging to implement since it requires knowledge of the friction coefficient, which can vary from object to object and even from grasp to grasp. Tactile sensing offers a window into the contact mechanics and provides information about friction. Notably, touch can detect the precursor of the object slipping away from the grasp. To find this information, tactile sensors measure the deformation field of an artificial skin in both the normal and tangential directions. However, current approaches only react to slip and therefore respond too late to perturbations: the object slips, causing the grasp to fail and damaging the object. In this study, we introduce a method that uses machine learning to anticipate slip by computing the so-called safety margin of the grasp. This safety margin represents the extra lateral force that keeps the contact away from the frictional limit. To find this value, we use a high-density camera-based tactile sensor that measures the 3D deformation of the surface via the movement of 82 colored markers, and we train a Convolutional Neural Network (CNN) to estimate the safety margin from the tactile images. Because it gives a distance to slip, the safety margin is a powerful metric for regulating grasp forces. As a testament to this effectiveness, we show that a simple proportional controller can robustly grasp a wide variety of objects. The results show that this control method outperforms slip detection methods, reducing regrasp reaction times while decreasing grasping forces to 1-3 N.
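The control law described above can be sketched in a few lines: a proportional controller on the CNN-estimated safety margin (the extra lateral force the contact can sustain before slipping). The CNN itself is omitted, and all gains, targets, and limits below are illustrative assumptions:

```python
# Proportional grip-force regulation on an estimated safety margin.
def grip_force_update(f_normal, safety_margin, margin_target=0.5,
                      k_p=2.0, f_min=1.0, f_max=3.0):
    # Tighten the grip when the margin falls below target, relax otherwise.
    f_new = f_normal + k_p * (margin_target - safety_margin)
    return min(max(f_new, f_min), f_max)  # clamp to the 1-3 N range reported
```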
|
|
13:30-15:00, Paper ThBT28-NT.6 | Add to My Program |
Learning Interaction Constraints for Robot Manipulation Via Set Correspondences |
|
Nan, Junyu | Carnegie Mellon University |
Hodgins, Jessica | Carnegie Mellon University |
Okorn, Brian | Boston Dynamics AI Institute |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception
Abstract: Cross-pose estimation between rigid objects is a fundamental building block for robotic applications. In this paper, we propose a new cross-pose estimation method that predicts correspondences on a set level as opposed to a point level. This contrasts with methods that predict cross-pose from per-point correspondences, which can encounter optimization problems for objects with symmetries, since each point may have multiple valid correspondences. Our method, SCAlign, consists of a Set Correspondence Network (SCN), which predicts these sets and their correspondences, and an alignment module that computes their relative cross-pose. Taking point clouds of two objects as input, SCN predicts a set label for each point such that points sharing a set label form a cross-object correspondence. The alignment module then computes the cross-pose as the SE(3) transformation that aligns these set correspondences. We compare SCAlign against other cross-pose estimation baselines on a synthetically generated dataset, SynWidth, which contains randomly generated width-mate objects with symmetric or near-symmetric intercepts. SCAlign significantly outperforms the baselines on this challenging dataset. Additionally, we show that set correspondences can be leveraged to distinguish positive and negative matches between pegs and holes. Robot experiments further validate the practical application of this approach.
|
|
13:30-15:00, Paper ThBT28-NT.7 | Add to My Program |
Joint-Loss Enhanced Self-Supervised Learning for Refinement-Coupled Object 6D Pose Estimation |
|
Mu, Fengjun | University of Electronic Science and Technology of China |
Sun, Shixiang | University of Electronic Science and Technology of China |
Huang, Rui | University of Electronic Science and Technology of China |
Zou, Chaobin | University of Electronic Science and Technology of China |
Li, Wenjiang | University of Electronic Science and Technology of China |
Zhan, Huayi | Changhong AI Lab (CHAIR), Sichuan Changhong Electronics Holding |
Cheng, Hong | University of Electronic Science and Technology |
Keywords: Perception for Grasping and Manipulation, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: 6D object pose estimation plays a crucial role in robot grasping and manipulation. However, the prevalent methods for 6D object pose estimation heavily rely on 6D annotated data to train deep neural networks, which poses challenges due to the difficulty in obtaining sufficient pose annotations. To address this limitation, this paper presents a self-supervised pose estimation method based on a novel pixel-wise weighted dense fusion architecture. This method allows for direct learning from unannotated RGB-D data facilitated by an Iterative Annotation Resolver. Furthermore, a self-supervised pose refinement method based on joint loss is proposed to enhance the pose estimation accuracy. This refinement method employs a differentiable renderer to construct joint optimization constraints. The experimental results demonstrate that our approach achieves a level of pose estimation accuracy that closely rivals that of supervised methods.
|
|
13:30-15:00, Paper ThBT28-NT.8 | Add to My Program |
Force-Based Semantic Representation and Estimation of Feature Points for Robotic Cable Manipulation with Environmental Contacts |
|
Monguzzi, Andrea | Politecnico Di Milano |
Karayiannidis, Yiannis | Lund University |
Rocco, Paolo | Politecnico Di Milano |
Zanchettin, Andrea Maria | Politecnico Di Milano |
Keywords: Perception for Grasping and Manipulation, Dual Arm Manipulation, Force and Tactile Sensing
Abstract: This work demonstrates the utility of dual-arm robots with dual-wrist force-torque sensors in manipulating a Deformable Linear Object (DLO) within an unknown environment that imposes constraints on the DLO's movement through contacts and fixtures. We propose a strategy to estimate the pose of unknown environmental contacts encountered during the manipulation of a DLO, classifying the induced constraints as unilateral, bilateral and fully constrained, exploiting the redundancy of force sensors. A semantic approach to define environmental constraints is introduced and incorporated into a graph-based model of the DLO. This model remains accurate as long as the DLO is under tension and is dynamically updated throughout the manipulation process, built by sequencing a set of primitives. The estimation strategy is validated through simulations and real-world experiments, demonstrating its potential in handling DLOs under various, possibly uncertain, constraints.
|
|
13:30-15:00, Paper ThBT28-NT.9 | Add to My Program |
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains |
|
Fang, Hao-Shu | Shanghai Jiao Tong University |
Wang, Chenxi | Shanghai Jiao Tong University |
Fang, Hongjie | Shanghai Jiao Tong University |
Gou, Minghao | Shanghai Jiao Tong University |
Liu, Jirong | Shanghai Jiaotong University |
Yan, Hengxu | Shanghai Jiao Tong University |
Liu, Wenhai | Shanghai Jiao Tong University |
Xie, Yichen | University of California, Berkeley |
Lu, Cewu | ShangHai Jiao Tong University |
Keywords: Perception for Grasping and Manipulation, Grasping, Computer Vision for Automation, Grasp Tracking
Abstract: As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans. Our innate grasping system is prompt, accurate, flexible, and continuous across spatial and temporal domains. Few existing methods cover all these properties for robot grasping. In this paper, we propose AnyGrasp for grasp perception, endowing robots with these abilities using a parallel gripper. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' center-of-mass is incorporated into the learning process to help improve grasping stability. Utilization of grasp correspondence across observations enables dynamic grasp tracking. Our model can efficiently generate accurate, 7-DoF, dense, and temporally-smooth grasp poses and works robustly against large depth-sensing noise. Using AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is on par with human subjects under controlled conditions. Over 900 mean-picks-per-hour is reported on a single-arm system. For dynamic grasping, we demonstrate catching swimming robot fish in the water.
|
|
ThBT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection IV |
|
|
Chair: Yue, Yufeng | Beijing Institute of Technology |
Co-Chair: Feng, Chen | New York University |
|
13:30-15:00, Paper ThBT29-NT.1 | Add to My Program |
Uplifting Range-View-Based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion |
|
Tan, Shiqi | University of Toronto, Huawei Technologies Canada, Co., Ltd |
Fazlali, Hamidreza | Noah's Ark Lab |
Xu, Yixuan | Huawei Technologies Canada Co., Ltd |
Ren, Yuan | Noah's Ark Lab, Huawei Technologies Canada Inc |
Liu, Bingbing | Huawei Technologies |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, in order to address the limited resolution of RV and its insufficient 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluated the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
|
|
13:30-15:00, Paper ThBT29-NT.2 | Add to My Program |
Radar Tracker: Moving Instance Tracking in Sparse and Noisy Radar Point Clouds |
|
Zeller, Matthias | CARIAD SE |
Casado Herraez, Daniel | University of Bonn & CARIAD SE |
Behley, Jens | University of Bonn |
Heidingsfeld, Michael | CARIAD SE |
Stachniss, Cyrill | University of Bonn |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning Methods
Abstract: Robots and autonomous vehicles should be aware of what happens in their surroundings. The segmentation and tracking of moving objects are essential for reliable path planning, including collision avoidance. We investigate this estimation task for vehicles using radar sensing. We address moving instance tracking in sparse radar point clouds to enhance scene interpretation. We propose a learning-based radar tracker incorporating temporal offset predictions to enable direct center-based association and enhance segmentation performance by including additional motion cues. We implement attention-based tracking for sparse radar scans to include appearance features and enhance performance. The final association combines geometric and appearance features to overcome the limitations of center-based tracking to associate instances reliably. Our approach shows an improved performance on the moving instance tracking benchmark of the RadarScenes dataset compared to the current state of the art.
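A minimal sketch of center-based association with predicted temporal offsets: each detected instance center is shifted back by its predicted offset and matched to the nearest existing track. Greedy nearest-neighbor matching is an assumption here; the paper's final association additionally combines geometric and appearance features.

```python
# Illustrative center-offset association for instance tracking.
import numpy as np

def associate(centers, offsets, track_centers, max_dist=2.0):
    # centers, offsets: (N, d) arrays; track_centers: (M, d) array.
    matches, used = [], set()
    for i, (c, o) in enumerate(zip(centers, offsets)):
        c_prev = c - o                              # project center back in time
        if len(track_centers) == 0:
            matches.append((i, None)); continue
        d = np.linalg.norm(track_centers - c_prev, axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist and j not in used:
            matches.append((i, j)); used.add(j)     # extend existing track j
        else:
            matches.append((i, None))               # spawn a new track
    return matches
```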
|
|
13:30-15:00, Paper ThBT29-NT.3 | Add to My Program |
ProEqBEV: Product Group Equivariant BEV Network for 3D Object Detection in Road Scenes of Autonomous Driving |
|
Liu, Hongwei | Fuzhou University |
Yang, Jian | Information Engineering University |
Li, Zhengyu | East China Normal University |
Li, Ke | Information Engineering University |
Zheng, Jianzhang | Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences |
Wang, Xihao | Technische Universität München |
Tang, Xuan | East China Normal University |
Chen, Mingsong | East China Normal University |
Wei, Xian | East China Normal University |
You, Xiong | Information Engineering University |
Keywords: Object Detection, Segmentation and Categorization, Recognition, Deep Learning for Visual Perception
Abstract: With the rapid development of autonomous driving systems, 3D object detection based on Bird's Eye View (BEV) in road scenes has witnessed great progress over the past few years. As a road scene exhibits a part-whole hierarchy between the objects within it and the scene itself, simple parts (e.g., roads, lane lines, vehicles, and pedestrians) can be assembled into progressively more complex shapes to form a BEV representation of the whole road scene. A BEV therefore has multiple levels of freedom of motion: the rotation and translation of the whole BEV, and the random movements of objects (e.g., pedestrians and vehicles) inside the BEV. However, most current single-sensor or multi-sensor-fusion-based BEV object detection methods do not capture such multi-level motion in a BEV. To address this problem, we propose a product group equivariant object detection network framework, based on multi-sensor fusion, that is equivariant with respect to multiple levels of symmetry groups. The proposed framework extracts local equivariant features of objects in point clouds, while global equivariant features are extracted in both point clouds and images. Furthermore, the network learns diverse rotation-equivariant features and mitigates a significant amount of detection errors caused by rotations of the BEV and of objects inside it, thereby further enhancing detection performance. The experimental results show that the network architecture significantly improves object detection in terms of mAP and NDS. In addition, to demonstrate the effectiveness of the proposed local-multi-global equivariant components, we conduct extensive ablation experiments; the results show that the individual components are indispensable to the object detection performance of the overall architecture.
|
|
13:30-15:00, Paper ThBT29-NT.4 | Add to My Program |
Orientation-Aware Multi-Modal Learning for Road Intersection Identification and Mapping |
|
He, Qibin | University of Chinese Academy of Sciences |
Xiao, Zhongyang | Autonomous Driving Division of NIO Inc., China |
Huang, Ze | Fudan University |
Yuan, Hongyuan | Autonomous Driving Division of NIO Inc |
Sun, Li | University of Sheffield |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Mapping
Abstract: Accurate identification of road intersections is the pivotal task for automatic construction of high-definition maps, particularly in unstructured scenes. Existing methods predominantly rely on single-modal data and thus show an obvious unimodal limitation, i.e., lack of contextual information. Moreover, these approaches overlook the benefits of leveraging multi-modal data fusion and representation learning that is crucial for generalizability. To this end, we propose a novel orientation-aware multi-modal learning paradigm, which formulates intersection identification as an oriented object detection task. Specifically, heterogeneous fusion is introduced to harmonize disparate data modalities, i.e., vector maps, point clouds, and vehicle trajectories, into a unified feature space. Concurrently, we present trigonometric-induced adaptive regression to elevate orientation estimation, while mitigating issues related to scale imbalance and boundary confusion through dual-objective matching with spatial adaptation. To evaluate our methodology, we assemble the first-of-its-kind multi-modal benchmark tailored for complex low-speed environments, complete with fine-grained semantic annotations for intersections. Comprehensive empirical analyses, including ablation studies, affirm both the superior performance of our proposed framework and the efficacy of its constituent modules.
|
|
13:30-15:00, Paper ThBT29-NT.5 | Add to My Program |
Masked Gamma-SSL: Learning Uncertainty Estimation Via Masked Image Modeling |
|
Williams, David | University of Oxford |
Gadd, Matthew | University of Oxford |
Newman, Paul | Oxford University |
De Martini, Daniele | University of Oxford |
Keywords: Object Detection, Segmentation and Categorization, Representation Learning, Deep Learning for Visual Perception
Abstract: This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; therefore it is crucial to understand a network’s limitations at run time and act accordingly. To this end, we test our proposed method on a number of test domains including the SAX Segmentation benchmark, which includes labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark.
|
|
13:30-15:00, Paper ThBT29-NT.6 | Add to My Program |
EfficientDPS: Efficient and End-To-End Depth-Aware Panoptic Segmentation |
|
Wu, Shengkai | CVTE |
Ren, Liangliang | CVTE |
Gao, Linfeng | CVTE |
Li, Yupeng | CVTE |
Liu, Wenyu | Huazhong University of Science and Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Deep Learning for Visual Perception
Abstract: Depth-aware panoptic segmentation (DPS) combines image segmentation and monocular depth estimation in a single model to achieve semantic and geometric perception simultaneously. The DPS task has important applications in robotics, but previous DPS models are too computationally heavy to deploy. Thus, we propose EfficientDPS, an efficient, end-to-end, and unified model for DPS. In our method, query features extracted with convolutional networks are used to represent things and stuff. In this way, different vision tasks such as classification, segmentation, and depth estimation can be realized in a unified manner, leading to a compact and efficient model. EfficientDPS can be trained and tested end-to-end via bipartite matching, and no complex post-processing is needed at inference. To enhance the supervision signal, a group query representation is proposed, leading to better performance without affecting inference speed. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show that EfficientDPS achieves a better trade-off between speed and accuracy than state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT29-NT.7 | Add to My Program |
Robust Collaborative Perception against Temporal Information Disturbance |
|
He, Xunjie | Beijing Institute of Technology |
Li, Yiming | Beijing Institute of Technology |
Cui, Te | Beijing Institute of Technology |
Wang, Meiling | Beijing Institute of Technology |
Liu, Tong | Beijing Institute of Technology |
Yue, Yufeng | Beijing Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, AI-Based Methods
Abstract: Collaborative perception facilitates a more comprehensive representation of the environment by leveraging complementary information shared among various agents and sensors. However, practical applications often encounter temporal information disturbance, including perception packet loss and time delays, and a comprehensive framework that can address such issues simultaneously has been absent. In addition, the feature extraction process prior to fusion is often insufficient, as it lacks exploration of the local semantics and context dependencies of individual features. To enhance both accuracy and robustness, this paper introduces a novel framework named Robust Collaborative Perception against Temporal Information Disturbance, which predicts perception information when disturbance occurs. Specifically, the Historical Frame Prediction (HFP) module is introduced to compensate for information loss by mining the temporal associations of historical features. Based on the predicted features generated by the HFP module, the Pyramid Attention Integration (PAI) module is introduced to augment local semantics and incorporate global long-range dependencies through multi-scale window attention. Compared with existing methods on the publicly available OPV2V dataset, our approach exhibits superior performance and improved robustness in the 3D object detection task. The code will be publicly available at https://github.com/hexunjie/Ro-temd.
|
|
13:30-15:00, Paper ThBT29-NT.8 | Add to My Program |
Concavity-Induced Distance for Unoriented Point Cloud Decomposition |
|
Wang, Ruoyu | New York University |
Xue, Yanfei | New York University |
Surianarayanan, Bharath | New York University |
Tian, Dong | InterDigital |
Feng, Chen | New York University |
Keywords: Object Detection, Segmentation and Categorization, Semantic Scene Understanding, Computational Geometry
Abstract: We propose Concavity-induced Distance (CID) as a novel way to measure the dissimilarity between a pair of points in an unoriented point cloud. CID indicates the likelihood of two points or two sets of points belonging to different convex parts of an underlying shape represented as a point cloud. After analyzing its properties, we demonstrate how CID can benefit point cloud analysis without the need for meshing or normal estimation, which is beneficial for robotics applications when dealing with raw point cloud observations. By randomly selecting very few points for manual labeling, a CID-based point cloud instance segmentation via label propagation achieves comparable average precision as recent supervised deep learning approaches, on S3DIS and ScanNet datasets. Moreover, CID can be used to group points into approximately convex parts whose convex hulls can be used as compact scene representations in robotics, and it outperforms the baseline method in terms of grouping quality. Our project website is available at: https://ai4ce.github.io/CID/
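One plausible reading of the construction, offered purely as a hedged sketch (the paper's exact definition may differ): two points are likely to sit in different convex parts if the straight segment between them strays far from the point cloud, so a concavity score can be taken as the maximum nearest-cloud distance over sampled segment points.

```python
# Illustrative concavity score between two points of an unoriented point cloud.
import numpy as np
from scipy.spatial import cKDTree

def concavity_score(p, q, cloud_tree, n_samples=32):
    ts = np.linspace(0.0, 1.0, n_samples)
    segment = p[None, :] * (1 - ts[:, None]) + q[None, :] * ts[:, None]
    dists, _ = cloud_tree.query(segment)            # nearest-cloud distances
    return dists.max()                              # large => likely concave gap

# Usage: cloud_tree = cKDTree(points)  # build once per point cloud
```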
|
|
13:30-15:00, Paper ThBT29-NT.9 | Add to My Program |
FocoTrack: Multi Object Tracking by Focusing on Overlap at Low Frame Rate |
|
Lee, Jae-Hyeok | Korea Advanced Institute of Science Technology |
Park, Jae-Hyeon | Korea Advanced Institute of Science and Technology (KAIST) |
Chang, Dong Eui | KAIST |
Keywords: Human Detection and Tracking
Abstract: Multi-object tracking (MOT) presents a crucial challenge in robotics. Due to the limited computational resources embedded in robots, the processing time of an algorithm per time step can be considerably large. This scenario necessitates operating MOT at a low frame rate. However, algorithms in the MOT research field have been built around datasets running at 10-30 frames per second (fps), which can be difficult to sustain on such limited resources. In response, we introduce a new algorithm, called FocoTrack, which maintains tracking ability in four situations, one of which is when objects overlap each other. Our algorithm exhibits remarkable performance without using any deep appearance descriptor, surpassing existing MOT methods that do use deep appearance descriptors on a 2.5 fps dataset. We also demonstrate strong results with our algorithm on the DanceTrack dataset at 20 fps and provide comprehensive insights through a detailed analysis of our tracking model.
|
|
ThBT30-NT Oral Session, NT-G6 |
Add to My Program |
AI-Enabled Robotics and Learning |
|
|
Chair: Lu, Peng | The University of Hong Kong |
Co-Chair: Inamura, Tetsunari | Tamagawa University |
|
13:30-15:00, Paper ThBT30-NT.1 | Add to My Program |
Ethically Compliant Autonomous Systems under Partial Observability |
|
Lu, Qingyuan | Massachusetts Institute of Technology |
Svegliato, Justin | University of California Berkeley |
Nashed, Samer | University of Massachusetts Amherst |
Zilberstein, Shlomo | University of Massachusetts |
Russell, Stuart Jonathan | University of California, Berkeley |
Keywords: Ethics and Philosophy, Planning under Uncertainty
Abstract: Ethically compliant autonomous systems (ECAS) are the prevailing approach to building robotic systems that perform sequential decision making subject to ethical theories in fully observable environments. However, in real-world robotics settings, these systems often operate under partial observability because of sensor limitations, environmental conditions, or limited inference due to bounded computational resources. Therefore, this paper proposes a partially observable ECAS (PO-ECAS), bringing this work one step closer to being a practical and useful tool for roboticists. First, we formally introduce the PO-ECAS framework and a MILP-based solution method for approximating an optimal ethically compliant policy. Next, we extend an existing ethical framework for prima facie duties to belief space and offer an ethical framework for virtue ethics inspired by Aristotle's Doctrine of the Mean. Finally, we demonstrate that our approach is effective in a simulated campus patrol robot domain.
|
|
13:30-15:00, Paper ThBT30-NT.2 | Add to My Program |
Prompt, Plan, Perform: LLM-Based Humanoid Control Via Quantized Imitation Learning |
|
Sun, Jingkai | The Hong Kong University of Science and Technology (GZ) |
Zhang, Qiang | The Hong Kong University of Science and Technology (Guangzhou) |
Duan, Yiqun | University of Technology Sydney |
Jiang, Xiaoyang | Northeastern University |
Cheng, Chong | HKUST(GZ) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Imitation Learning, Whole-Body Motion Planning and Control, Human and Humanoid Motion Analysis and Synthesis
Abstract: In recent years, reinforcement learning and imitation learning have shown great potential for controlling humanoid robots' motion. However, these methods typically create simulation environments and rewards for specific tasks, resulting in the requirements of multiple policies and limited capabilities for tackling complex and unknown tasks. To overcome these issues, we present a novel approach that combines adversarial imitation learning with large language models (LLMs). This innovative method enables the agent to learn reusable skills with a single policy and solve zero-shot tasks under the guidance of LLMs. In particular, we utilize the LLM as a strategic planner for applying previously learned skills to novel tasks through the comprehension of task-specific prompts. This empowers the robot to perform the specified actions in a sequence. To improve our model, we incorporate codebook-based vector quantization, allowing the agent to generate suitable actions in response to unseen textual commands from LLMs. Furthermore, we design general reward functions that consider the distinct motion features of humanoid robots, ensuring the agent imitates the motion data while maintaining goal orientation without additional guiding direction approaches or policies. To the best of our knowledge, this is the first framework that controls humanoid robots using a single learning policy network and LLM as a planner. Extensive experiments demonstrate that our method exhibits efficient and adaptive ability in complicated motion tasks.
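The codebook-based vector quantization step can be sketched as snapping an LLM-derived command embedding to its nearest learned skill code before it conditions the policy; shapes and names below are illustrative assumptions, not the paper's architecture.

```python
# Nearest-codebook lookup for quantizing a command embedding into a skill code.
import numpy as np

def quantize(z, codebook):
    # z: (d,) embedding; codebook: (K, d) learned skill codes.
    idx = int(np.argmin(np.linalg.norm(codebook - z, axis=1)))
    return codebook[idx], idx  # quantized code fed to the policy, and its index
```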
|
|
13:30-15:00, Paper ThBT30-NT.3 | Add to My Program |
Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations Via Inverse Reinforcement Learning |
|
Wu, Feiyang | Georgia Institute of Technology |
Gu, Zhaoyuan | Georgia Institute of Technology |
Wu, Hanran | Georgia Institute of Technology |
Wu, Anqi | Georgia Institute of Technology |
Zhao, Ye | Georgia Institute of Technology |
Keywords: Learning from Demonstration, Humanoid and Bipedal Locomotion, Reinforcement Learning
Abstract: Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
|
|
13:30-15:00, Paper ThBT30-NT.4 | Add to My Program |
Online Distribution Shift Detection Via Recency Prediction |
|
Luo, Rachel | Stanford University |
Sinha, Rohan | Stanford University |
Sun, Yixiao | Stanford University |
Hindy, Ali | Stanford University |
Zhao, Shengjia | Stanford University |
Savarese, Silvio | Stanford University |
Schmerling, Edward | Stanford University |
Pavone, Marco | Stanford University |
Keywords: Probability and Statistical Methods, AI-Enabled Robotics, Methods and Tools for Robot System Design
Abstract: When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate --- i.e., when there is no distribution shift, our system is very unlikely (with probability < epsilon) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
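To make the "guaranteed false positive rate" idea concrete, here is a loose stand-in for detection via recency prediction: if a classifier can tell recent samples from older ones better than chance, flag a shift. The classifier, the Hoeffding-bound test, and all thresholds below are simplified assumptions; the paper's guarantee is derived from a different, stronger construction.

```python
# Simplified shift detection by "recency prediction": under no shift, a
# recency classifier should be at chance accuracy; a conservative Hoeffding
# bound keeps the false positive rate below epsilon. Not the paper's method.
import numpy as np

rng = np.random.default_rng(1)

def detect_shift(old, new, epsilon=0.01):
    """old, new: (n, d) arrays. True if a shift is detected."""
    n = min(len(old), len(new)) // 2
    # Hypothetical minimal classifier: nearest centroid, fit on first halves.
    c_old, c_new = old[:n].mean(0), new[:n].mean(0)
    test = np.vstack([old[n:2 * n], new[n:2 * n]])
    labels = np.r_[np.zeros(n), np.ones(n)]
    pred = (np.linalg.norm(test - c_new, axis=1)
            < np.linalg.norm(test - c_old, axis=1)).astype(float)
    acc = float((pred == labels).mean())
    m = 2 * n
    # Under no shift, accuracy is ~1/2: P(acc >= 1/2 + t) <= exp(-2 m t^2).
    p_bound = np.exp(-2 * m * max(acc - 0.5, 0.0) ** 2)
    return p_bound < epsilon

old = rng.normal(0.0, 1.0, size=(200, 8))
new = rng.normal(0.8, 1.0, size=(200, 8))   # shifted mean
print("shift detected:", detect_shift(old, new))
```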
|
|
13:30-15:00, Paper ThBT30-NT.5 | Add to My Program |
Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-Dependent Constraints |
|
Zhitnikov, Andrey | Technion – Israel Institute of Technology |
Indelman, Vadim | Technion - Israel Institute of Technology |
Keywords: Probability and Statistical Methods, Autonomous Agents, SLAM, Belief Space Planning
Abstract: Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem in Robotics and Artificial Intelligence. Due to an abundance of plausible future unravelings, calculating an optimal course of action inflicts an enormous computational burden on the agent. Moreover, in many scenarios, e.g., information gathering, it is required to introduce a belief-dependent constraint. Prompted by this demand, in this paper, we consider a recently introduced probabilistic belief-dependent constrained POMDP. We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint, before expanding a complete set of sampled future observation episodes and without any loss in accuracy. Moreover, using our proposed framework, we contribute an adaptive method to find a maximal feasible return (e.g., information gain) in terms of Value at Risk and a corresponding action sequence, given a set of candidate action sequences, with substantial acceleration. On top of that, we introduce an adaptive simplification technique for the probabilistically constrained setting. Such an approach provably returns an identical-quality solution while dramatically accelerating online decision making. Our universal framework applies to any belief-dependent constrained continuous POMDP with parametric beliefs, as well as nonparametric beliefs represented by particles.
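As a small illustration of the quantities the abstract works with, the sketch below checks a probabilistic belief-dependent constraint and a Value-at-Risk objective from sampled episodes; the distribution, thresholds, and names are hypothetical, and this does not show the paper's adaptive acceptance/rejection scheme.

```python
# Minimal sketch: estimate P(information gain >= threshold) from sampled
# future episodes and compute a Value-at-Risk objective. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)

# Suppose each sampled observation episode yields an information gain for a
# candidate action sequence (hypothetical distribution).
info_gain = rng.normal(loc=1.2, scale=0.5, size=500)

delta, g_min = 0.9, 0.8   # require P(gain >= g_min) >= delta
p_hat = float((info_gain >= g_min).mean())
feasible = p_hat >= delta

alpha = 0.1               # VaR level: worst-case gain ignoring the alpha tail
var_alpha = float(np.quantile(info_gain, alpha))

print(f"P(gain >= {g_min}) = {p_hat:.2f}, feasible = {feasible}")
print(f"VaR at level {alpha}: {var_alpha:.2f}")
```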
|
|
13:30-15:00, Paper ThBT30-NT.6 | Add to My Program |
Conformal Policy Learning for Sensorimotor Control under Distribution Shifts |
|
Huang, Huang | University of California at Berkeley |
Sharma, Satvik | University of California, Berkeley |
Loquercio, Antonio | UC Berkeley |
Angelopoulos, Anastasios | University of California, Berkeley |
Goldberg, Ken | UC Berkeley |
Malik, Jitendra | UC Berkeley |
Keywords: Probability and Statistical Methods, Machine Learning for Robot Control, Sensorimotor Learning
Abstract: This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller’s observables. The key idea is the design of policies that can take conformal quantiles as input to detect distribution shifts with formal statistical guarantees, which we define as conformal policy learning. We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics, e.g., safety or speed, or by directly augmenting a policy observation with a quantile and training it with reinforcement learning. Theoretically, we show that such policies achieve formal convergence guarantees in finite time. In addition, we thoroughly evaluate their advantages and limitations on two use cases: simulated autonomous driving and active perception with a physical quadruped. Empirical results demonstrate that our approach outperforms five baselines and is the simplest of the baseline strategies besides one ablation. Being easy to use, flexible, and equipped with formal guarantees, our work demonstrates how conformal prediction can be an effective tool for sensorimotor learning under uncertainty.
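The policy-switching pattern the abstract describes can be sketched in a few lines: compute a finite-sample conformal quantile on calibration nonconformity scores, then fall back to a conservative base policy whenever a new score exceeds it. The scores, policies, and alpha below are assumptions for illustration, not the paper's setup.

```python
# Minimal split-conformal sketch driving a policy switch. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)

cal_scores = rng.exponential(1.0, size=200)   # calibration nonconformity scores
alpha = 0.1
n = len(cal_scores)
# Finite-sample-corrected (1 - alpha) conformal quantile.
q = float(np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                      method="higher"))

def act(obs_score, fast_policy, safe_policy):
    """Switch between base policies using the conformal quantile."""
    return safe_policy if obs_score > q else fast_policy

for s in [0.3, 2.9]:
    print(f"score {s:.1f} ->", act(s, "fast", "safe"))
```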
|
|
13:30-15:00, Paper ThBT30-NT.7 | Add to My Program |
Resampling-Free Particle Filters in High-Dimensions |
|
Boopathy, Akhilan | Massachusetts Institute of Technology |
Muppidi, Aneesh | Harvard |
Yang, Peggy | MIT |
Iyer, Abhiram | MIT |
Yue, William | MIT |
Fiete, Ila | MIT |
Keywords: Probability and Statistical Methods, Probabilistic Inference, Localization
Abstract: State estimation is crucial for the performance and safety of numerous robotic applications. Among the suite of estimation techniques, particle filters have been identified as a powerful solution due to their non-parametric nature. Yet, in high-dimensional state spaces, these filters face challenges such as 'particle deprivation' which hinders accurate representation of the true posterior distribution. This paper introduces a novel resampling-free particle filter designed to mitigate particle deprivation by forgoing the traditional resampling step. This ensures a broader and more diverse particle set, especially vital in high-dimensional scenarios. Theoretically, our proposed filter is shown to offer a near-accurate representation of the desired posterior distribution in high-dimensional contexts. Empirically, the effectiveness of our approach is underscored through a high-dimensional synthetic state estimation task and a 6D pose estimation derived from videos. We posit that as robotic systems evolve with greater degrees of freedom, particle filters tailored for high-dimensional state spaces will be indispensable.
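To see what "forgoing the resampling step" changes, the toy below runs sequential importance sampling with no resampling on a 1D linear-Gaussian model; note how the effective sample size shrinks, which is the degeneracy a plain unresampled filter suffers and which the paper's actual update is designed to avoid. This is a generic contrast sketch, not the paper's filter.

```python
# Sequential importance sampling WITHOUT resampling on a toy 1D model.
import numpy as np

rng = np.random.default_rng(4)
N, T = 1000, 30
q_std, r_std = 0.5, 0.7          # process / observation noise

x_true = 0.0
particles = rng.normal(0.0, 1.0, size=N)
logw = np.zeros(N)               # log weights, never resampled

for t in range(T):
    x_true = x_true + rng.normal(0, q_std)
    y = x_true + rng.normal(0, r_std)
    # Propagate and reweight every particle; weights accumulate over time.
    particles = particles + rng.normal(0, q_std, size=N)
    logw += -0.5 * ((y - particles) / r_std) ** 2
    logw -= logw.max()           # numerical stabilization

w = np.exp(logw); w /= w.sum()
est = float((w * particles).sum())
ess = float(1.0 / (w ** 2).sum())
print(f"estimate {est:.2f} vs true {x_true:.2f}, effective sample size {ess:.0f}")
```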
|
|
13:30-15:00, Paper ThBT30-NT.8 | Add to My Program |
A New Perspective of DL Testing Framework: Human-Computer Interaction Based Neural Network Testing |
|
Kong, Wei | National Key Laboratory on Science and Technology of Information |
Hu, Li | National Key Laboratory on Science and Technology of Information |
Du, Qianjin | Tsinghua University |
Cao, Huayang | National Key Laboratory on Science and Technology of Information |
Kuang, Xiaohui | National Key Laboratory on Science and Technology of Information |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Safety in HRI, Humanoid Robot Systems
Abstract: Deep learning models have revolutionized various domains but have also raised concerns regarding their security and reliability. Adversarial attacks and coverage-based testing have been extensively studied to assess and enhance the dependability of deep neural networks. However, current research in this area has reached a state of stagnation. Adversarial attacks focus on exploiting vulnerabilities in models, while coverage-based testing aims to achieve comprehensive testing but overlooks application scenarios. Moreover, evaluating test cases solely based on their fault-revealing capability is insufficient. To address these limitations, we propose an innovative interdisciplinary framework that incorporates human-computer interaction methods in deep learning security testing. By considering the attributes of model application scenarios, we can design more effective test suites. Additionally, we establish a comprehensive evaluation metric for test suite quality, considering factors such as diversity and naturalness. This framework promotes reliable and secure deployment of deep learning models, fostering interdisciplinary collaboration between artificial intelligence and human-computer interaction.
|
|
ThBT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation II |
|
|
Chair: Xiao, Xuesu | George Mason University |
|
13:30-15:00, Paper ThBT31-NT.1 | Add to My Program |
MARC: Multipolicy and Risk-Aware Contingency Planning for Autonomous Driving |
|
Li, Tong | Hong Kong University of Science and Technology |
Zhang, Lu | Hong Kong University of Science and Technology |
Liu, Sikang | DJI |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
Abstract: Generating safe and non-conservative behaviors in dense, dynamic environments remains challenging for automated vehicles due to the stochastic nature of traffic participants' behaviors and their implicit interaction with the ego vehicle. This paper presents a novel planning framework, Multipolicy And Risk-aware Contingency planning (MARC), that systematically addresses these challenges by enhancing the multipolicy-based pipelines from both behavior and motion planning aspects. Specifically, MARC realizes a critical scenario set that reflects multiple possible futures conditioned on each semantic-level ego policy. Then, the generated policy-conditioned scenarios are further formulated into a tree-structured representation with a dynamic branchpoint based on the scene-level divergence. Moreover, to generate diverse driving maneuvers, we introduce risk-aware contingency planning, a bi-level optimization algorithm that simultaneously considers multiple future scenarios and user-defined risk tolerance levels. Owing to the more unified combination of behavior and motion planning layers, our framework achieves efficient decision-making and human-like driving maneuvers. Comprehensive experimental results demonstrate superior performance to other strong baselines in various driving environments.
|
|
13:30-15:00, Paper ThBT31-NT.2 | Add to My Program |
Graph-Based Scenario-Adaptive Lane-Changing Trajectory Planning for Autonomous Driving |
|
Dong, Qing | Tsinghua University |
Yan, Zhanhong | The University of Tokyo |
Nakano, Kimihiko | The University of Tokyo |
Ji, Xuewu | Tsinghua University |
Liu, Yahui | Tsinghua University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Learning from Demonstration
Abstract: Trajectory planning is one of the key challenges to the rapid and large-scale deployment of autonomous driving. The lane-changing trajectory planning algorithm for autonomous driving is typically formulated as an optimization of a cost function, which can be challenging to tune manually for different traffic scenarios. This paper presents a graph-based scenario-adaptive lane-changing trajectory planning approach that overcomes this challenge. Specifically, a cost function recovery method based on maximum entropy inverse reinforcement learning (IRL) is proposed to recover the cost functions of all the demonstrated lane-changing trajectories, and a cost function database is constructed. Then, a scenario matching model based on a spatial-temporal graph convolutional network (ST-GCN) is proposed to match the recovered cost functions with traffic scenarios, making the lane-changing trajectory planning method scenario-adaptive. Our proposed method is evaluated through simulations on the well-known NGSIM dataset and experiments on two typical lane-changing scenarios on our autonomous driving platform. The results show that our method is capable of learning the lane-changing cost function from demonstration and performing scenario-adaptive lane-changing trajectory planning.
|
|
13:30-15:00, Paper ThBT31-NT.3 | Add to My Program |
Toward Wheeled Mobility on Vertically Challenging Terrain: Platforms, Datasets, and Algorithms |
|
Datar, Aniket | George Mason University |
Pan, Chenhui | George Mason University |
Nazeri, Mohammad | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Most conventional wheeled robots can only move in flat environments and simply divide their planar workspaces into free spaces and obstacles. Deeming obstacles as non-traversable significantly limits wheeled robots’ mobility in real-world, extremely rugged, off-road environments, where part of the terrain (e.g., irregular boulders and fallen trees) will be treated as non-traversable obstacles. To improve wheeled mobility in those environments with vertically challenging terrain, we present two wheeled platforms with little hardware modification compared to conventional wheeled robots; we collect datasets of our wheeled robots crawling over previously non-traversable, vertically challenging terrain to facilitate data-driven mobility; we also present algorithms and their experimental results to show that conventional wheeled robots have previously unrealized potential of moving through vertically challenging terrain. We make our platforms, datasets, and algorithms publicly available to facilitate future research on wheeled mobility.
|
|
13:30-15:00, Paper ThBT31-NT.4 | Add to My Program |
Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds |
|
Raj, Amir Hossain | George Mason University |
Hu, Zichao | University of Texas at Austin |
Karnan, Haresh | The University of Texas at Austin |
Chandra, Rohan | UT Austin |
Payandeh, Amirreza | George Mason University |
Mao, Luisa | University of Texas Austin |
Stone, Peter | University of Texas at Austin |
Biswas, Joydeep | University of Texas at Austin |
Xiao, Xuesu | George Mason University |
Keywords: Autonomous Vehicle Navigation, Motion and Path Planning, Machine Learning for Robot Control
Abstract: Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and efficiency. However, the many complex factors of social compliance make geometric navigation systems hard to adapt to social situations, where no amount of tuning enables them to be both safe (people are too unpredictable) and efficient (the frozen robot problem). With recent advances in deep learning approaches, the common reaction has been to entirely discard classical navigation systems and start from scratch, building a completely new learning-based social navigation planner. In this work, we find that this reaction is unnecessarily extreme: using a large-scale real-world social navigation dataset, SCAND, we find that geometric systems can produce trajectory plans that align with the human demonstrations in a large number of social situations. We, therefore, ask if we can rethink the social robot navigation problem by leveraging the advantages of both geometric and learning-based methods. We validate this hybrid paradigm through a proof-of-concept experiment, in which we develop a hybrid planner that switches between geometric and learning-based planning. Our experiments on both SCAND and two physical robots show that the hybrid planner can achieve better social compliance compared to using either the geometric or learning-based approach alone.
|
|
13:30-15:00, Paper ThBT31-NT.5 | Add to My Program |
Continuous Robotic Tracking of Dynamic Targets in Complex Environments Based on Detectability |
|
Wang, Zhihao | Harbin Institute of Technology, Shenzhen |
Huang, Shixing | Harbin Institute of Technology, Shenzhen |
Li, Minghang | Harbin Institute of Technology, Shenzhen |
Ouyang, Junyuan | Harbin Institute of Technology, ShenZhen |
Wang, Yu | Harbin Institute of Technology, Shenzhen |
Chen, Haoyao | Harbin Institute of Technology, Shenzhen |
Keywords: Autonomous Vehicle Navigation, Search and Rescue Robots, Task and Motion Planning
Abstract: Target tracking is a fundamental task in the domain of robotics. The effectiveness of target tracking hinges upon various factors, such as tracking distance, occlusions, collision avoidance, etc. However, few existing works can simultaneously tackle these considerations of tracking single and multiple targets in complex environments. In this study, the interaction mechanism of target tracking between the robot, the environment and the targets is analyzed, and a general measure named detectability is introduced to correlate the tracking performance for guiding robotic motion planning. Based on the detectability measure, a robotic motion planning framework based on Model Predictive Control (MPC) is proposed to achieve continuous and robust tracking of single, two and three targets in complex environments. Simulations and experiments verify that our method performs better than state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT31-NT.6 | Add to My Program |
Talk2BEV: Language-Enhanced Bird’s-Eye View Maps for Autonomous Driving |
|
Choudhary, Tushar | International Institute of Information Technology, Hyderabad |
Dewangan, Vikrant | International Institute of Information Technology, Hyderabad |
Chandhok, Shivam | University of British Columbia |
Priyadarshan, Shubham | International Institute of Information Technology Hyderabad |
Jain, Anushka | International Institute of Information Technology, Hyderabad |
Singh, Arun Kumar | University of Tartu |
Srivastava, Siddharth | TensorTour Inc |
Jatavallabhula, Krishna Murthy | MIT |
Krishna, Madhava | IIIT Hyderabad |
Keywords: Autonomous Vehicle Navigation, Semantic Scene Understanding, Vision-Based Navigation
Abstract: This work introduces Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps commonly used in autonomous driving. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV eliminates the need for BEV-specific training, relying instead on well-performing pre-trained LVLMs. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret freeform natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
|
|
13:30-15:00, Paper ThBT31-NT.7 | Add to My Program |
DualAT: Dual Attention Transformer for End-To-End Autonomous Driving |
|
Chen, Zesong | Sun Yat-Sen University |
Yu, Ze | Sun Yat-Sen University |
Li, Jun | Sun Yat-Sen University |
You, Linlin | Sun Yat-Sen University |
Tan, Xiaojun | Sun Yat-Sen University |
Keywords: Autonomous Vehicle Navigation, Sensor Fusion, Intelligent Transportation Systems
Abstract: The effective reasoning of integrated multimodal perception information is crucial for achieving enhanced end-to-end autonomous driving performance. In this paper, we introduce a novel multitask imitation learning framework for end-to-end autonomous driving that leverages a dual attention transformer (DualAT) to enhance the multimodal fusion and waypoint prediction processes. A self-attention mechanism captures global context information and models the long-term temporal dependencies of waypoints for multiple time steps. On the other hand, a cross-attention mechanism implicitly associates the latent feature representations derived from different modalities through a learnable geometrically linked positional embedding. Specifically, the DualAT excels at processing and fusing information from multiple camera views and LiDAR sensors, enabling comprehensive scene understanding for multitask learning. Furthermore, the DualAT introduces a novel waypoint prediction architecture that combines the temporal relationships between waypoints with the spatial features extracted from sensor inputs. We evaluate our approach on both the Town05 and Longest6 benchmarks using the closed-loop CARLA urban driving simulator and provide extensive ablation studies. The experimental results demonstrate that our approach significantly outperforms the state-of-the-art methods.
|
|
13:30-15:00, Paper ThBT31-NT.8 | Add to My Program |
SCALE: Self-Correcting Visual Navigation for Mobile Robots Via Anti-Novelty Estimation |
|
Chen, Chang | The Chinese University of Hong Kong, Shenzhen |
Liu, Yuecheng | Huawei Noah's Ark Lab |
Zhuang, Yuzheng | Huawei Technologies Company |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Xue, Kaiwen | The Chinese University of Hong Kong, Shenzhen |
Zhou, Shunbo | The Chinese University of Hong Kong |
Keywords: Autonomous Vehicle Navigation, Service Robotics
Abstract: Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains a challenging task. Recent work learns directly from offline datasets to achieve broader generalization in real-world tasks, which, however, faces the out-of-distribution (OOD) issue and potential robot localization failures in a given map for unseen observations. This significantly drops the success rate and can even induce collisions. In this paper, we present a self-correcting visual navigation method, SCALE, that can autonomously prevent the robot from entering OOD situations without human intervention. Specifically, we develop an image-goal conditioned offline reinforcement learning method based on implicit Q-learning (IQL). When facing an OOD observation, our novel localization recovery method generates potential future trajectories by learning from the navigation affordance, and estimates the future novelty via random network distillation (RND). A tailored cost function searches for the candidates with the least novelty that can lead the robot to familiar places. We collect offline data and conduct evaluation experiments in three real-world urban scenarios. Experiment results show that SCALE outperforms the previous state-of-the-art methods for open-world navigation with a unique capability of localization recovery, significantly reducing the need for human intervention. Code is available at https://github.com/KubeEdge4Robotics/ScaleNav.
|
|
13:30-15:00, Paper ThBT31-NT.9 | Add to My Program |
UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios |
|
Mahjourian, Reza | Waymo |
Mu, Rongbing | Waymo |
Likhosherstov, Valerii | Waymo UK Ltd |
Mougin, Paul | Waymo |
Huang, Xiukun | Waymo |
Teixeira de Sousa Messias, João Vicente | Waymo UK |
Whiteson, Shimon | Waymo |
Keywords: Autonomous Vehicle Navigation, Simulation and Animation, Deep Learning Methods
Abstract: This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.
|
|
ThBT32-NT Oral Session, NT-G8 |
Add to My Program |
Image-Based Navigation I |
|
|
Chair: Ding, Wenbo | Tsinghua University |
|
13:30-15:00, Paper ThBT32-NT.1 | Add to My Program |
S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality |
|
Li, Jinlong | Cleveland State University |
Xu, Runsheng | UCLA |
Liu, Xinyu | Cleveland State University |
Li, Baolu | Cleveland State University |
Zou, Qin | Wuhan University |
Ma, Jiaqi | University of California, Los Angeles |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Computer Vision for Transportation
Abstract: Due to the lack of sufficient real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually use simulated sensor data for training and validation. However, the perception performance is degraded when these simulation-trained models are deployed to the real world, due to the significant domain gap between the simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Deployment Gap and Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our intensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and outperform other methods significantly for point cloud-based 3D object detection.
|
|
13:30-15:00, Paper ThBT32-NT.2 | Add to My Program |
Eliminating Cross-Modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection |
|
Fu, Jiahui | Beihang University |
Gao, Chen | Beihang University |
Wang, Zitian | Beihang University |
Yang, Lirong | Meituan |
Wang, Xiaofei | Meituan |
Mu, Beipeng | Meituan |
Liu, Si | Beihang University |
Keywords: Computer Vision for Automation, Deep Learning for Visual Perception, Visual Learning
Abstract: Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird’s-eye view (BEV) representation space. However, our empirical findings indicate that previous methods have limitations in generating fusion BEV features free from cross-modal conflicts. These conflicts encompass extrinsic conflicts caused by BEV feature construction and inherent conflicts stemming from heterogeneous sensor signals. Therefore, we propose a novel Eliminating Conflicts Fusion (ECFusion) method to explicitly eliminate the extrinsic/inherent conflicts in BEV space and produce improved multi-modal BEV features. Specifically, we devise a Semantic-guided Flow-based Alignment (SFA) module to resolve extrinsic conflicts via unifying spatial distribution in BEV space before fusion. Moreover, we design a Dissolved Query Recovering (DQR) mechanism to remedy inherent conflicts by preserving objectness clues that are lost in the fusion BEV feature. In general, our method maximizes the effective information utilization of each modality and leverages inter-modal complementarity. Our method achieves state-of-the-art performance in the highly competitive nuScenes 3D object detection dataset.
|
|
13:30-15:00, Paper ThBT32-NT.3 | Add to My Program |
EMIFF: Enhanced Multi-Scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection |
|
Wang, Zhe | Institute for AI Industry Research, Tsinghua University |
Fan, Siqi | Tsinghua University |
Huo, Xiaoliang | Beihang University |
Xu, Tongda | Tsinghua University |
Wang, Yan | Tsinghua University |
Liu, Jingjing | Institute for AI Industry Research (AIR), Tsinghua University |
Chen, Yilun | Tsinghua University |
Zhang, Ya-Qin | Institute for AI Industry Research(AIR), Tsinghua University |
Keywords: Computer Vision for Transportation, Computer Vision for Automation, Deep Learning for Visual Perception
Abstract: In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss in the transmission process resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves SOTA on DAIR-V2X-C datasets, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
|
|
13:30-15:00, Paper ThBT32-NT.4 | Add to My Program |
Vehicle Intention Classification Using Visual Clues |
|
Klemp, Marvin | Karlsruhe Institute of Technology - KIT |
Wagner, Royden | KIT |
Rösch, Kevin | FZI Forschungszentrum Informatik |
Lauer, Martin | Karlsruhe Institute of Technology |
Stiller, Christoph | Karlsruhe Institute of Technology |
Keywords: Computer Vision for Transportation, Data Sets for Robotic Vision, Deep Learning for Visual Perception
Abstract: Classifying intentions of other traffic agents is an essential task for intelligent transportation systems. To simplify this task, vehicles are equipped with various illumination systems, including turn indicators, emergency lights, rear lights, and brake lights. We extend the Waymo open perception dataset with ground truth annotations for different visual intentions to develop methods designed to classify the state of such systems. Furthermore, we propose the visual intention former, a two-step transformer-based architecture to classify visual intentions in image sequences of tracked traffic participants. We use a vision transformer to extract image features, which are passed into a transformer encoder that reasons about temporal dependencies among them. We evaluate against different baseline architectures where our proposed method achieves state-of-the-art results. Additionally, we conduct an in-depth performance analysis of our method regarding different input sequence lengths, vehicle headings, and daytime conditions.
|
|
13:30-15:00, Paper ThBT32-NT.5 | Add to My Program |
CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation |
|
Liu, Zihua | Tokyo Institute of Technology |
Li, Yizhou | Tokyo Institute of Technology |
Okutomi, Masatoshi | Tokyo Institute of Technology |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception
Abstract: Stereo matching under foggy scenes remains a challenging task since the scattering effect degrades the visibility and results in less distinctive features for dense correspondence matching. While some previous learning-based methods integrated a physical scattering function for simultaneous stereo-matching and dehazing, simply removing fog might not aid depth estimation because the fog itself can provide crucial depth cues. In this work, we introduce a framework based on contrastive feature distillation (CFD). This strategy combines feature distillation from merged clean-fog features with contrastive learning, ensuring balanced dependence on fog depth hints and clean matching features. This framework helps enhance model generalization across both clean and hazy environments. Comprehensive experiments on synthetic and real-world datasets affirm the superior strength and adaptability of our method.
|
|
13:30-15:00, Paper ThBT32-NT.6 | Add to My Program |
Multi-Class Road Defect Detection and Segmentation Using Spatial and Channel-Wise Attention for Autonomous Road Repairing |
|
Yu, Jongmin | King's College London |
Chen, Chi Bene | King's College London |
Fichera, Sebastiano | University of Liverpool |
Paoletti, Paolo | University of Liverpool |
Mehta, Devansh | Robotiz3d |
Luo, Shan | King's College London |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Intelligent Transportation Systems
Abstract: Road pavement defect detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement images, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks that learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of the morphological information (spatial characteristics) of road defects and the colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation.
|
|
13:30-15:00, Paper ThBT32-NT.7 | Add to My Program |
HPL-ViT: A Unified Perception Framework for Heterogeneous Parallel LiDARs in V2V |
|
Liu, Yuhang | Chinese Academy of Sciences |
Sun, Boyi | University of Chinese Academy of Sciences |
Li, Yuke | Waytous Co. Ltd |
Hu, Yuzheng | University of Illinois Urbana-Champaign |
Wang, Feiyue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: To develop the next generation of intelligent LiDARs, we propose a novel framework of parallel LiDARs and construct a hardware prototype in our experimental platform, DAWN (Digital Artificial World for Natural). It emphasizes the tight integration of physical and digital space in LiDAR systems, with networking being one of its supported core features. In the context of autonomous driving, V2V (Vehicle-to-Vehicle) technology enables efficient information sharing between different agents which significantly promotes the development of LiDAR networks. However, current research operates under an ideal situation where all vehicles are equipped with identical LiDAR, ignoring the diversity of LiDAR categories and operating frequencies. In this paper, we first utilize OpenCDA and RLS (Realistic LiDAR Simulation) to construct a novel heterogeneous LiDAR dataset named OPV2V-HPL. Additionally, we present HPL-ViT, a pioneering architecture designed for robust feature fusion in heterogeneous and dynamic scenarios. It uses a graph-attention Transformer to extract domain-specific features for each agent, coupled with a cross-attention mechanism for the final fusion. Extensive experiments on OPV2V-HPL demonstrate that HPL-ViT achieves SOTA (state-of-the-art) performance in all settings and exhibits outstanding generalization capabilities.
|
|
13:30-15:00, Paper ThBT32-NT.8 | Add to My Program |
FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View |
|
Hou, Jiawei | Fudan University |
Li, Xiaoyan | Chinese Academy of Sciences |
Guan, Wenhao | Fudan University |
Zhang, Gang | Damo Academy, Alibaba Group |
Feng, Di | Mogo Auto Intelligence and Telematics Information Technology Co |
Du, Yuheng | Fudan University |
Xue, Xiangyang | Fudan University |
Pu, Jian | Fudan University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency from four parts, including the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. Targeted at improving this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.
|
|
13:30-15:00, Paper ThBT32-NT.9 | Add to My Program |
LPFormer: LiDAR Pose Estimation Transformer with Multi-Task Network |
|
Ye, Dongqiangzi | Tusimple |
Xie, Yufei | Tusimple |
Chen, Weijia | N/A |
Zhou, Zixiang | University of Central Florida |
Ge, Lingting | TuSimple Inc |
Foroosh, Hassan | University of Central Florida |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: Due to the difficulty of acquiring large-scale 3D human keypoint annotations, previous methods for 3D human pose estimation (HPE) have often relied on 2D image features and sequential 2D annotations. Furthermore, the training of these networks typically assumes the prediction of a human bounding box and the accurate alignment of 3D point clouds with 2D images, making direct application in real-world scenarios challenging. In this paper, we present the first framework for end-to-end 3D human pose estimation, named LPFormer, which uses only LiDAR as its input along with its corresponding 3D annotations. LPFormer consists of two stages: first, it identifies the human bounding box and extracts multi-level feature representations; second, it utilizes a transformer-based network to predict human keypoints based on these features. Our method demonstrates that 3D HPE can be seamlessly integrated into a strong LiDAR perception network and benefit from the features extracted by the network. Experimental results on the Waymo Open Dataset demonstrate state-of-the-art performance, with improvements even over previous multi-modal solutions.
|
|
ThBT33-CC Oral Session, CC-301 |
Add to My Program |
Integrated Planning and Learning |
|
|
Chair: Pu, Jian | Fudan University |
Co-Chair: Kyriakopoulos, Kostas | New York University - Abu Dhabi |
|
13:30-15:00, Paper ThBT33-CC.1 | Add to My Program |
A Tube-Based Reinforcement Learning Approach for Optimal Motion Planning in Unknown Workspaces |
|
Rousseas, Panagiotis | National Technical University of Athens |
Bechlioulis, Charalampos | University of Patras |
Kyriakopoulos, Kostas | New York University - Abu Dhabi |
Keywords: Motion and Path Planning, Optimization and Optimal Control
Abstract: In this work, a tube-based nearly optimal solution to motion planning in unknown workspaces is presented. The advantages of reactive motion planning are combined with a Policy Iteration Reinforcement Learning scheme to yield a novel solution for unknown workspaces that inherits provable safety, convergence and optimality. Moreover, in simply-connected workspaces, our method is proven to asymptotically provide the globally optimal path. Our method is compared against a provably asymptotically optimal RRT* method, as well as a relevant reactive method and provides satisfactory performance, closely matching or outperforming the former.
|
|
13:30-15:00, Paper ThBT33-CC.2 | Add to My Program |
Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models |
|
LaGrassa, Alex | Carnegie Mellon University |
Lee, Moonyoung | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Integrated Planning and Learning, Learning from Experience, Manipulation Planning
Abstract: When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate: also known as a model precondition. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. In order to achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques. More material can be found on our project website: https://sites.google.com/view/active-mde
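A minimal sketch of the two ingredients the abstract names, labeling executed transitions by model accuracy and actively choosing the next trajectory to query, is shown below; the feature encoding, tolerance, and distance-based uncertainty proxy are assumptions, not the paper's acquisition function.

```python
# Sketch: label state-action pairs by whether the model matched reality within
# a tolerance (the precondition data), then query the candidate trajectory
# farthest from labeled data as a simple uncertainty proxy. Illustrative only.
import numpy as np

rng = np.random.default_rng(5)
tau = 0.2

# Labeled data from executed trajectories: (state-action features, model error)
X = rng.uniform(-1, 1, size=(50, 4))
err = rng.exponential(0.15, size=50)
accurate = err < tau                         # precondition labels so far

def trajectory_uncertainty(traj):
    """Mean distance from each trajectory point to its nearest labeled point."""
    d = np.linalg.norm(traj[:, None, :] - X[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

candidates = [rng.uniform(-1, 1, size=(10, 4)) for _ in range(5)]
scores = [trajectory_uncertainty(t) for t in candidates]
print("query trajectory:", int(np.argmax(scores)),
      "| fraction of data inside precondition:", float(accurate.mean()))
```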
|
|
13:30-15:00, Paper ThBT33-CC.3 | Add to My Program |
Risk-Predictive Planning for Off-Road Autonomy |
|
Lao Beyer, Lukas | Massachusetts Institute of Technology |
Ryou, Gilhyun | Massachusetts Institute of Technology |
Spieler, Patrick | JPL |
Karaman, Sertac | Massachusetts Institute of Technology |
Keywords: Integrated Planning and Learning, Planning under Uncertainty, Representation Learning
Abstract: Efficiently navigating off-road environments presents a number of challenges arising from their unstructured nature. In the absence of high-fidelity maps, occlusions from obstacles and terrain lead to limited information being available to inform planning decisions. Furthermore, resolution and latency limitations of real-world perception systems can lead to degraded perception performance when traversing such environments at high speeds. We address these problems by proposing an algorithm which plans trajectories while anticipating future observations. In particular, we introduce a model which learns to predict the evolution of future riskmaps conditioned on the future path and speed profile of the vehicle. The model is trained in a self-supervised fashion using recordings of vehicle trajectories. We then present an algorithm which efficiently queries the model along candidate paths and speed profiles to produce time-optimal trajectories while maintaining a bound on the expected future risk. We assess the predictive performance of our risk model through a comparison with real vehicle driving logs. Furthermore, our closed-loop simulations of several benchmark scenarios demonstrate how the behavior of our planner leads to qualitatively distinct trajectories, with improvements in both success rate and speed of up to 60%.
|
|
13:30-15:00, Paper ThBT33-CC.4 | Add to My Program |
BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks |
|
Styrud, Jonathan | ABB |
Mayr, Matthias | Lund University |
Hellsten, Erik | Lund University |
Krueger, Volker | Lund University |
Smith, Claes Christian | KTH Royal Institute of Technology |
Keywords: Integrated Planning and Learning, Reinforcement Learning, Behavior-Based Systems
Abstract: Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past, robot programs were often written statically and tuned manually, the current, faster transition times call for robust, modular and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose the method Behavior-based Bayesian optimization and Planning (BeBOP) that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the Random Forest surrogate models that drastically improves the results.
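To illustrate the Bayesian-optimization half of the approach, here is a generic BO loop with a random-forest surrogate, where uncertainty is taken as the spread across trees; that spread heuristic is a common default and not necessarily the paper's modified uncertainty estimate, and the objective and parameter range are hypothetical.

```python
# Generic Bayesian optimization with a random-forest surrogate and UCB
# acquisition over a 1D behavior-tree parameter. Illustrative sketch only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # hidden task objective

X = rng.uniform(-2, 2, size=(5, 1))               # initial random evaluations
y = f(X).ravel()

for it in range(15):
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    cand = rng.uniform(-2, 2, size=(256, 1))
    per_tree = np.stack([t.predict(cand) for t in forest.estimators_])
    mu, sigma = per_tree.mean(0), per_tree.std(0)  # spread across trees
    x_next = cand[np.argmax(mu + 2.0 * sigma)]     # UCB acquisition
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, f(x_next)[0])

print(f"best parameter {X[np.argmax(y)].item():.3f}, value {y.max():.3f}")
```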
|
|
13:30-15:00, Paper ThBT33-CC.5 | Add to My Program |
Motion Memory: Leveraging past Experiences to Accelerate Future Motion Planning |
|
Das, Dibyendu | George Mason University |
Lu, Yuanjie | George Mason University |
Plaku, Erion | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Integrated Planning and Learning, Motion and Path Planning
Abstract: When facing a new motion-planning problem, most motion planners solve it from scratch, e.g., via sampling and exploration or by starting optimization from a straight-line path. However, most motion planners have to experience a variety of planning problems throughout their lifetimes, which are yet to be leveraged for future planning. In this paper, we present a simple but efficient method called Motion Memory, which allows different motion planners to accelerate future planning using past experiences. Treating existing motion planners as either closed or open boxes, we present a variety of ways that Motion Memory can contribute to reducing the planning time when facing a new planning problem. We provide extensive experiment results with three different motion planners on three classes of planning problems with over 30,000 problem instances and show that planning time can be reduced by up to 89% with the proposed Motion Memory technique, with larger gains as past planning experience accumulates.
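The core "remember and reuse" idea can be sketched as a nearest-neighbor store of past problems and their solution paths that warm-starts a new query; the problem encoding and planner interface below are hypothetical placeholders, not the paper's representation.

```python
# Minimal motion-memory sketch: store past (problem, path) pairs and retrieve
# the path of the most similar past problem as a warm start. Illustrative only.
import numpy as np

rng = np.random.default_rng(7)

class MotionMemory:
    def __init__(self):
        self.problems, self.paths = [], []

    def add(self, problem_vec, path):
        self.problems.append(problem_vec)
        self.paths.append(path)

    def retrieve(self, problem_vec):
        """Return the stored path of the most similar past problem."""
        P = np.asarray(self.problems)
        i = int(np.argmin(np.linalg.norm(P - problem_vec, axis=1)))
        return self.paths[i]

mem = MotionMemory()
for _ in range(100):                             # past planning experiences
    p = rng.uniform(0, 1, size=6)                # e.g., encoded start/goal/obstacles
    mem.add(p, rng.uniform(0, 1, size=(20, 2)))  # a previously solved path

query = rng.uniform(0, 1, size=6)
warm_start_path = mem.retrieve(query)            # seed the planner, not scratch
print("warm-start path shape:", warm_start_path.shape)
```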
|
|
13:30-15:00, Paper ThBT33-CC.6 | Add to My Program |
Mitigating Causal Confusion in Vector-Based Behavior Cloning for Safer Autonomous Planning |
|
Guo, Jiayu | Fudan University |
Feng, Mingyue | Mogo Auto Intelligence and Telematics Information Technology Com |
Zhu, Pengfei | Mogo Ai |
Dou, Jinsheng | Mogo Auto Intelligence and Telematics Information Technology Co |
Feng, Di | Mogo Auto Intelligence and Telematics Information Technology Co |
Li, Chengjun | Mogo Auto Intelligence and Telematics Information Technology Com |
Wan, Ru | Mogo Ai |
Pu, Jian | Fudan University |
Keywords: Integrated Planning and Learning, Learning from Demonstration, Deep Learning Methods
Abstract: The utilization of vector-based deep learning techniques has great prospects in the realm of autonomous driving, particularly in the domains of prediction and planning tasks. However, the application of vector-based backbones for prediction and planning tasks may lead to the occurrence of causal confusion. Previous studies have explored the phenomenon of causal confusion, with a specific emphasis on the context of visual imitation learning. As for the vector-based model, we observe that the states of surrounding vehicles can be a nuisance shortcut. In our work, an off-policy approach is proposed to alleviate the issue by incorporating de-confounding supervision. Additionally, to better capture the environmental cues, such as route and traffic lights, in vectorized representation, a decoder utilizing iterative route fusion is devised. By incorporating auxiliary supervision and employing a dedicated decoder, we demonstrate the effectiveness of our methods in reducing causal confusion and improving performance in planning tasks through reactive and nonreactive closed-loop simulations on the nuPlan dataset.
|
|
13:30-15:00, Paper ThBT33-CC.7 | Add to My Program |
Learning-Based Motion Planning with Mixture Density Networks |
|
Wang, Yinghan | Shanghai Jiao Tong University |
He, Jianping | Shanghai Jiao Tong University |
Duan, Xiaoming | Shanghai Jiao Tong University |
Keywords: Motion and Path Planning, Imitation Learning, Deep Learning Methods
Abstract: The trade-off between computation time and path optimality is a key consideration in motion planning algorithms. While classical sampling-based algorithms fall short of computational efficiency in high-dimensional planning, learning-based methods have shown great potential in achieving time-efficient and optimal motion planning. The state-of-the-art learning-based motion planning algorithms utilize paths generated by sampling-based methods as expert supervision data and train networks via regression techniques. However, these methods often overlook the important multimodal property of the optimal paths in the training set, making them incapable of finding good paths in some scenarios. In this paper, we propose a Multimodal Neuron Planner (MNP) based on mixture density networks that explicitly takes into account the multimodality of the training data and simultaneously achieves time efficiency and path optimality. For environments represented by point clouds, MNP first efficiently compresses point clouds into a latent vector via encoding networks that are suitable for processing point clouds. We then design multimodal planning networks which enable MNP to learn and predict multiple optimal solutions. Simulation results show that our method outperforms the state-of-the-art learning-based method MPNet and the advanced sampling-based methods IRRT* and BIT*.
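The mixture-density mechanism the abstract relies on is shown in the PyTorch sketch below: a head that predicts K Gaussian modes over a next waypoint, trained with the mixture negative log-likelihood. The dimensions are invented and there is no point-cloud encoder here, so this is the generic multimodal-regression building block, not MNP's architecture.

```python
# Minimal mixture-density-network head and its NLL loss. Illustrative only.
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim=32, out_dim=2, K=5):
        super().__init__()
        self.K, self.out_dim = K, out_dim
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, K)                   # mixture logits
        self.mu = nn.Linear(64, K * out_dim)         # component means
        self.log_sigma = nn.Linear(64, K * out_dim)  # component log-stddevs

    def forward(self, x):
        h = self.net(x)
        return (self.pi(h),
                self.mu(h).view(-1, self.K, self.out_dim),
                self.log_sigma(h).view(-1, self.K, self.out_dim))

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    y = y.unsqueeze(1)                               # (B, 1, D)
    comp_logp = (-0.5 * ((y - mu) / log_sigma.exp()) ** 2
                 - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)  # (B, K)
    log_mix = torch.log_softmax(logits, dim=-1) + comp_logp
    return -torch.logsumexp(log_mix, dim=-1).mean()

model = MDNHead()
x, y = torch.randn(16, 32), torch.randn(16, 2)       # latent env vec, waypoint
loss = mdn_nll(*model(x), y)
loss.backward()
print(f"mixture NLL: {loss.item():.3f}")
```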
|
|
13:30-15:00, Paper ThBT33-CC.8 | Add to My Program |
Subgoal Diffuser: Coarse-To-Fine Subgoal Generation to Guide Model Predictive Control for Robot Manipulation |
|
Huang, Zixuan | University of Michigan |
Lin, Yating | University of Michigan |
Yang, Fan | University of Michigan |
Berenson, Dmitry | University of Michigan |
Keywords: Integrated Planning and Learning, Deep Learning in Grasping and Manipulation
Abstract: Manipulation of articulated and deformable objects can be difficult due to their compliant and under-actuated nature. Unexpected disturbances can cause the object to deviate from a predicted state, making it necessary to use Model-Predictive Control (MPC) methods to plan motion. However, these methods need a short planning horizon to be practical. Thus, MPC is ill-suited for long-horizon manipulation tasks due to local minima. In this paper, we present a diffusion-based method that guides an MPC method to accomplish long-horizon manipulation tasks by dynamically specifying sequences of subgoals for the MPC to follow. Our method, called Subgoal Diffuser, generates subgoals in a coarse-to-fine manner, producing sparse subgoals when the task is easily accomplished by MPC and more dense subgoals when the MPC method needs more guidance. The density of subgoals is determined dynamically based on a learned estimate of reachability, and subgoals are distributed to focus on challenging parts of the task. We evaluate our method on two robot manipulation tasks and find it improves the planning performance of an MPC method, and also outperforms prior diffusion-based methods.
|
|
13:30-15:00, Paper ThBT33-CC.9 | Add to My Program |
Motion Planning As Online Learning: A Multi-Armed Bandit Approach to Kinodynamic Sampling-Based Planning |
|
Faroni, Marco | Politecnico Di Milano |
Berenson, Dmitry | University of Michigan |
Keywords: Motion and Path Planning, Integrated Planning and Learning
Abstract: Kinodynamic motion planners allow robots to perform complex manipulation tasks under dynamics constraints or with black-box models. However, they struggle to find high-quality solutions, especially when a steering function is unavailable. This paper presents a novel approach that adaptively biases the sampling distribution to improve the planner's performance. The key contribution is to formulate the sampling bias problem as a non-stationary multi-armed bandit problem, where the arms of the bandit correspond to sets of possible transitions. High-reward regions are identified by clustering transitions from sequential runs of kinodynamic RRT and a bandit algorithm decides what region to sample at each timestep. The paper demonstrates the approach on several simulated examples as well as a 7-degree-of-freedom manipulation task with dynamics uncertainty, suggesting that the approach finds better solutions faster and leads to a higher success rate in execution.
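The bandit framing can be previewed with the toy below: candidate sampling regions act as arms, and a sliding-window UCB rule adapts when the rewarding region changes. The fixed arms, window size, and reward model are placeholders; the paper instead clusters transitions from sequential RRT runs.

```python
# Sliding-window UCB on a non-stationary bandit over sampling regions.
import numpy as np
from collections import deque

rng = np.random.default_rng(8)
n_arms, window, c = 4, 50, 1.0
history = [deque(maxlen=window) for _ in range(n_arms)]  # recent rewards only

def true_reward(arm, t):
    # Non-stationary: the most promising region changes mid-run.
    best = 0 if t < 300 else 2
    return rng.normal(1.0 if arm == best else 0.3, 0.2)

choices = []
for t in range(600):
    counts = np.array([max(len(h), 1) for h in history])
    means = np.array([np.mean(list(h)) if h else np.inf for h in history])
    ucb = means + c * np.sqrt(np.log(t + 1) / counts)    # inf forces exploration
    arm = int(np.argmax(ucb))
    history[arm].append(true_reward(arm, t))
    choices.append(arm)

print("most-sampled arm, first half:", np.bincount(choices[:300]).argmax(),
      "| second half:", np.bincount(choices[300:]).argmax())
```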
|
|
ThBL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster VIII |
|
|
|
13:30-15:00, Paper ThBL-EX.1 | Add to My Program |
Visual-Inertial Pose Estimation of Externally-Actuated Modular Manipulators |
|
Cho, Jaehwi | Seoul National University |
Kang, Jiseock | Seoul National University |
Kong, Doyoon | Seoul National University |
Lee, Dongjun | Seoul National University |
Keywords: Aerial Systems: Perception and Autonomy, Sensor Fusion, Localization
Abstract: In this poster session, we present a visual-inertial pose estimation method for externally-actuated modular manipulators (EAMMs), a new type of manipulator consisting of multiple rotor-actuated links connected via joints. While this structure enables the realization of a large and dexterous manipulator, it necessitates accurate pose estimation for all links to ensure stable flight and precise manipulation. However, directly applying visual-inertial odometry (VIO) to all links may lead to excessive computation and hardware complexity. To overcome this challenge, we employ factor graph optimization, defining the robot state as a combination of joint angles, joint speeds, and IMU biases. This approach leverages the kinematic structure of EAMMs to enhance the speed and robustness of the framework. We conduct experiments to validate the real-time capability and high accuracy of the proposed framework.
|
|
13:30-15:00, Paper ThBL-EX.2 | Add to My Program |
Influence of Pole Assignment Constraint in P-PI Vibration Suppression Control for Motion Control Systems |
|
Urakawa, Yoshiyuki | Nippon Institute of Technology |
Ngamlamai, Sirichai | Nippon Institute of Technology |
Keywords: Motion Control, Actuation and Joint Mechanisms, Automation at Micro-Nano Scales
Abstract: Motor-driven positioning systems are widely used in industry. Their controllers are typically proportional-to-proportional-integral (P-PI) controllers, with a velocity PI loop inside and a position P loop outside; they are easy to tune and show small overshoot. However, the authors previously showed that the P-PI controller has a pole-assignment constraint and that a feedforward controller for the velocity loop improves the disturbance response. Vibration-suppressing PI velocity controllers have also been widely investigated, and they should be applied to positioning systems as P-PI vibration suppression controllers. The authors also showed that the pole-assignment constraint exists in P-PI vibration suppression control as well, but the influence of the constraint was not obvious. In this poster, we analyze the condition of the constraint, show through simulation that the constraint depends on the plant, and confirm that as the closed-loop poles are assigned farther from the origin, the constrained pole approaches the origin.
|
|
13:30-15:00, Paper ThBL-EX.3 | Add to My Program |
Posture Dependent Variable Transmission Mechanism for Prosthetic Hand Inspired by Human Grasping Characteristics |
|
Chang, Mun Hyeok | Seoul National University Biorobotics Lab |
Jeong, Inchul | Seoul National University |
Park, Jong Hoo | Seoul National University |
Choi, Hyungmin | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Actuation and Joint Mechanisms, Prosthetics and Exoskeletons, Mechanism Design
Abstract: Gripping objects firmly and quickly is an important function of the human hand for everyday life. Prosthetic devices face significant challenges in replicating these capabilities, particularly in achieving a delicate balance between swift grasping and substantial grip strength while adhering to weight and form-factor constraints. To address these challenges, this study introduces a novel Posture Dependent Variable Transmission (PDVT) that mimics the human hand's behavior by employing a spiral-shaped spool. The PDVT's spiral-shaped spool replicates the human hand's quick and gentle pre-contact movements followed by a stronger force application after contact with the object. Additionally, a compressive series elastic spring enhances tendon tension across a wide range of finger postures. The manufacturing method of PDVT, utilizing both 3D printing and metal processing, enables the creation of complex spiral shapes. The PDVT demonstrates improvements in both speed and grip strength compared to conventional rigid spool mechanisms. The PDVT has the potential to be applied to various robotic grasping systems.
|
|
13:30-15:00, Paper ThBL-EX.4 | Add to My Program |
APF-MRPPO: Navigating Graph-Based Maps with APF and Multi-Reward Reinforcement Learning |
|
O'Hara, Christopher | The University of Tokyo |
Yairi, Takehisa | University of Tokyo |
Keywords: Reinforcement Learning, Legged Robots, Motion and Path Planning
Abstract: This research develops an adaptive navigation algorithm for robots operating in hazardous environments, specifically leveraging Boston Dynamics' Spot, equipped with a graph-based navigation system incorporating cameras, depth sensors, and laser scanning. The algorithm utilizes Reinforcement Learning with Artificial Potential Fields (APF), forming a Multi-Reward Proximal Policy Optimization (PPO) variant that interprets waypoints as attractive forces and hazards as repulsive forces for pathfinding. The MRPPO algorithm inherits the potential field characteristics as part of an environment map reward strategy. This APF-MRPPO approach enables Spot to navigate complex terrains, such as nuclear reactors and disaster sites, optimizing paths and avoiding hazards in real-time by balancing local and global navigational objectives. Initial testing in simulated environments (Unity and ROS) showcases the algorithm's potential, maintaining high success rates despite increased obstacle proximity. The ongoing work focuses on improving algorithmic efficacy with enhanced reward functions, more sophisticated potential fields, and integration of Graph Neural Networks for dynamic, context-aware path planning—aiming for robust adaptability in changeable environments.
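A minimal numpy sketch of the APF term that shapes such rewards follows; the gains, influence radius, and positions are illustrative assumptions, not values from the study.

import numpy as np

# Artificial potential field: attraction toward the active waypoint plus
# repulsion from hazards inside an influence radius d0.
def apf_force(pos, goal, hazards, k_att=1.0, k_rep=2.0, d0=2.0):
    force = k_att * (goal - pos)                      # attractive term
    for h in hazards:
        diff = pos - h
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:                             # inside influence radius
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return force

pos, goal = np.array([0.0, 0.0]), np.array([5.0, 5.0])
hazards = [np.array([2.0, 2.5])]
print(apf_force(pos, goal, hazards))  # e.g., one term of a shaped RL reward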
|
|
13:30-15:00, Paper ThBL-EX.5 | Add to My Program |
Evaluation of Robust Visual Servoing for Cucumis Melo Harvesting Robot |
|
Park, Woojun | Chonnam National University |
Park, Yonghyun | Chonnam National University |
Kim, Changjo | Chonnam National University |
Son, Hyoung Il | Chonnam National University |
Keywords: Robotics and Automation in Agriculture and Forestry, Agricultural Automation, Perception for Grasping and Manipulation
Abstract: This study introduces robust visual servoing using computer vision for fast and stable fruit pedicel estimation. The key focus of this research is to perform Video Stabilization (ViS) to maintain stable detection even under unstable conditions. Additionally, the Fast Point Feature Histogram (FPFH) is utilized for pedicel estimation. By leveraging FPFH, the system detects stems and estimates the pose of harvested crops through curvature vector differences in point clouds. This robust visual servoing system is capable of swift and stable detection. The performance of this system was evaluated by measuring Position Error (PE) using motion capture cameras in a laboratory environment and calculating the Root Mean Square Error (RMSE) based on it. Furthermore, experimentation was conducted targeting cucumber, Korean melon, and tomato to explore its practical applicability. In the future, based on these results, we will investigate the system's performance in the field, identify issues, and make improvements accordingly.
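FPFH descriptors of the kind used here are available in, for example, the Open3D library; the sketch below shows the basic computation, with the file name, voxel size, and search radii as placeholder assumptions rather than the authors' pipeline.

import open3d as o3d

# FPFH features on a downsampled crop scan (file name is a placeholder).
pcd = o3d.io.read_point_cloud("crop_scan.pcd")
pcd = pcd.voxel_down_sample(voxel_size=0.005)
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.025, max_nn=100))
print(fpfh.data.shape)   # a 33-bin histogram per point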
|
|
13:30-15:00, Paper ThBL-EX.6 | Add to My Program |
Spatiotemporal Analysis System for Precision Aerial Spraying |
|
Ju, Eunji | Chonnam National University |
Seol, Jaehwi | Chonnam National University |
Kim, Changjo | Chonnam National University |
Son, Hyoung Il | Chonnam National University |
Keywords: Aerial Systems: Applications, Sensor-based Control, Deep Learning Methods
Abstract: This study proposes a perception and analysis method for precise aerial spraying based on three-dimensional deep learning. Point cloud data for water droplets are acquired using 3D LiDAR, and the PointNet++ deep learning model is trained to classify and segment the spraying pattern. Then, spatial-temporal data are processed for the segmented point cloud data. Spatial data processing is accomplished by employing k-means clustering and the Voronoi diagram. The k-means step calculates four centroids to distinguish the nozzles in the spatial data. After the clustering task is completed, the spray shape emitted from each nozzle is classified and visualized through the Voronoi diagram. In this way, the spraying form at each nozzle is clustered through spatial data processing, allowing each nozzle to be distinguished and mapped. Temporal data processing is accomplished by employing the Kalman filter. Processing temporal data compensates for unsensed or noisy data points and predicts the trajectory of the water droplets, enhancing the spraying data. This method enables a more accurate measurement of the water droplet shape. Experiments varying the flight conditions of the UAV were conducted to assess the proposed framework, demonstrating that processing is feasible in the onboard system of the UAV. These results indicate the potential applicability of the proposed methods in control systems for precise spraying in the future.
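A compact sketch of the spatial-processing step, k-means to separate the four nozzles followed by a Voronoi partition over the centroids, is shown below with synthetic points; the temporal step would pass each nozzle's droplet track through a Kalman filter.

import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import Voronoi

# Spatial step: cluster segmented droplet points into four nozzle groups,
# then partition the plane with a Voronoi diagram over the centroids.
points = np.random.rand(400, 2)                 # placeholder droplet (x, y)
km = KMeans(n_clusters=4, n_init=10).fit(points)
vor = Voronoi(km.cluster_centers_)              # one cell per nozzle
labels = km.labels_                             # droplet-to-nozzle assignment
print(km.cluster_centers_)
# Temporal step (not shown): a per-nozzle Kalman filter smooths the track
# and fills in unsensed or noisy frames.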
|
|
13:30-15:00, Paper ThBL-EX.7 | Add to My Program |
Development of a Mobile Robotic System for Remote Autonomous Inspection and Digitalizing of Industrial Plants with Critical Infrastructure |
|
Zinanyuca Yábar, Miguel Andrés | Pontificia Universidad Católica Del Perú |
Hilario Poma, Javier Alfredo | Pontificia Universidad Católica Del Perú |
Jara Rios, Jose Alonso | Pontificia Universidad Católica Del Perú |
Cabrera Yi, Eduardo Augusto | Pontificia Universidad Católica Del Perú |
Leiva, Martin | PUCP |
Miyahira Yagui, Alessandro | Pontificia Universidad Catolica Del Peru |
Rivadeneira, Franco | Pontificia Universidad Catolica Del Peru |
Cuellar, Francisco | Pontificia Universidad Catolica Del Peru |
Rivera Farfan, Juan Diego | Universidad Católica San Pablo |
Castillo-Araníbar, Patricia | Universidad Católica San Pablo, Departamento De Ingeniería Eléct |
Keywords: Autonomous Agents, Industrial Robots, Deep Learning for Visual Perception
Abstract: Monitoring industrial plants usually requires routine inspection of different plant elements, where delay in detecting the failure of any component can be critical. In this context, the proposed system consists of a mobile platform with a modular articulated arm with 6 degrees of freedom mounted on it, allowing it to perform inspection tasks autonomously. The system can also detect gauges in industrial settings, aiding data gathering using the YOLO v8 architecture. The mobile platform and the robotic arm were implemented and are able to travel over irregular surfaces such as stairs and slopes [5]. Furthermore, the autonomous navigation was validated using the ROS framework. Finally, a predictive model was integrated, achieving an F1-score of 87.72%.
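Gauge detection with a YOLOv8 model typically takes only a few lines through the ultralytics package; the weight file and image path below are hypothetical placeholders, not the authors' trained model.

from ultralytics import YOLO

# Detect gauges in a plant image (weights and image path are placeholders).
model = YOLO("gauge_detector.pt")
results = model("plant_corridor.jpg", conf=0.5)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)   # class id, confidence, pixel bbox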
|
|
13:30-15:00, Paper ThBL-EX.8 | Add to My Program |
Design of a Low Backlash and Backdrivable Gearbox for Robot Actuators |
|
Shin, Wonseok | KITECH |
Park, Seungtae | Korea National University of Science and Technology |
Kang, Jihun | UST Graduate School |
Ahn, Bummo | Korea Institute of Industrial Technology |
Kwon, Suncheol | KITECH |
Keywords: Actuation and Joint Mechanisms, Compliant Joints and Mechanisms, Mechanism Design
Abstract: This paper provides a modeling- and simulation-based design framework for a low-backlash, highly backdrivable gearbox. Robot actuators generally combine an electric motor with a high-gear-ratio gearbox to achieve high torque capacity. The 3K compound planetary gearbox with a high gear ratio is a promising candidate for robot actuators deployed in physically interacting environments because its high transmission efficiency endows high backdrivability. However, compared with high-ratio gearboxes such as the harmonic drive, the backlash of existing 3K compound planetary gearboxes is known to be high. The proposed design framework therefore includes 1) forward and backward transmission efficiency modeling, 2) derivation of the backlash, and 3) maximization of efficiency under bounded backlash constraints. The simulation results indicate that the backlash could be improved from 24 arcmin to 2 arcmin, with a forward drive efficiency of approximately 86% and a backward drive efficiency of approximately 84%. We expect this knowledge to provide intuition for designing low-backlash, high-efficiency gears for physically interacting robot actuators.
|
|
13:30-15:00, Paper ThBL-EX.9 | Add to My Program |
Video-Based Human-Object Interaction Detection with Pairwise Tracking |
|
Shao, Zhanpeng | Hunan Normal University |
Peng, Yuxiao | Hunan Normal University |
Wang, Xueping | Hunan Normal University |
Li, You-Fu | City University of Hong Kong |
Keywords: Recognition, Human-Robot Collaboration, Human Detection and Tracking
Abstract: Human-Object Interaction (HOI) detection aims to locate all human and object instances in an image or video and infer their interactions. HOI can help robots understand human activities at a high semantic level in the context of human-robot interaction. Currently, most work is dedicated to image-based HOI detection, while video-based HOI detection is still under-explored. This work integrates multi-object tracking and HOI detection into one framework, Human-Object Pairwise Tracking with tRansformer (HOPTR), where human-object pairs are treated as tracking pairs. During tracking, HOPTR predicts HOIs at each frame while maintaining a temporal trajectory for each human-object pair. This strategy enforces temporal consistency, allowing the model to learn more effective spatio-temporal features and to keep a set of consistent HOI detections in video streams.
|
|
13:30-15:00, Paper ThBL-EX.10 | Add to My Program |
Fast Image Quality Assessment for Semantic Segmentation |
|
Farahani, Mohammad | The Chinese University of Hong Kong |
Lau, Darwin | The Chinese University of Hong Kong |
Keywords: Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Semantic segmentation is essential for robots to understand their surroundings by labeling every part of an image. While current methods are getting better, they still struggle with unfamiliar scenes and conditions like changes in illumination, unexpected objects, shadows, noise, etc. Our work introduces SSIMA (Semantic Segmentation IMage Assessment), a compact, quick model designed to evaluate an image's quality for semantic segmentation before actual processing, akin to a human's intuition. It works without needing the original model or any correct answers. SSIMA was developed using a unique training approach based on synthetic changes to images the main model has not seen, assessing errors in its predictions. Early tests on a Jetson Xavier NX showed promising results, achieving a 0.103 Root Mean Square Error (RMSE) and running at 71 frames per second (14 ms per frame).
|
|
13:30-15:00, Paper ThBL-EX.11 | Add to My Program |
A General Purpose Service Robot System Capable of Handling Commands Containing Abstract Nouns |
|
Yamao, Kosei | Kyushu Institute of Technology |
Kanaoka, Daiju | Kyushu Institute of Technology |
Isomoto, Kosei | Kyushu Institute of Technology |
Tamukoh, Hakaru | Kyushu Institute of Technology |
Keywords: Task and Motion Planning, Computer Vision for Automation, Intention Recognition
Abstract: A general-purpose service robot system must understand and execute various types of human commands, such as "Bring me the object behind the lemon." in a real-world environment. However, generating task plans from complex commands and interpreting the positional relationships of objects such as "the object behind the lemon" is challenging. In this study, we propose 1) a task planning system using only a large language model with rule-based constraints and 2) a system to interpret the relevance of target objects using object detection and visual language model (VLM). Our task planning system allows robots to plan tasks more accurately, with rule-based constraints excluding inappropriate skills. Next, our object relevance interpretation system overlays marks on objects in the image based on the result of object detection. After that, VLM interprets the relevance of the marked object in the image based on the question and outputs an answer. In the experiments, we confirmed that the task planning system showed its practical applicability and effectiveness by achieving high results in tests from RoboCup@Home2023. Furthermore, the object relevance interpreting system was effective in experiments focused on interpreting relevant object locations compared to previous methods.
|
|
13:30-15:00, Paper ThBL-EX.12 | Add to My Program |
Robotic Manipulation and Multi-Physical Characterization with a Picometer-Scale Resolution |
|
Zhang, Wenqi | City University of Hong Kong |
Hou, Chaojian | City University of Hong Kong |
Chen, Donglei | City University of Hong Kong |
Wang, Shuideng | City University of Hong Kong |
Qu, Zhi | City University of Hong Kong |
Yu, Zejie | City University of Hong Kong |
Dong, Lixin | City University of Hong Kong |
Keywords: Micro/Nano Robots, Nanomanufacturing
Abstract: Nanorobotic manipulation (NRM) enables precise handling of micro/nano-objects for observation and assembly under microscopes. Achieving only nanometer-scale precision, its accuracy has been bottlenecked by limited dynamic microscope imaging. Spherical aberration correction (Cs) in transmission electron microscopes (TEM) breaks this limit, offering sub-angstrom resolution, which is expected to boost NRM precision. Besides, the TEM's confined space restricts the application of various physical environments, which is crucial for maximizing TEM-based NRM insights into performance-mechanism correlation and device-level prototyping. Therefore, advancements in positioning accuracy and versatile nanorobotic manipulators, compatible across diverse physical environments, are key to evolving TEM-based NRM. Here, we propose an advanced NRM system built inside a Cs-TEM that achieves precise manipulation and multi-physical stimuli simultaneously. An ultra-high resolution of 204 pm, 171 pm, and 140 pm in the X, Y, and Z directions is achieved by the manipulator of our NRM system inside a Cs-TEM. Diverse physical environments, including electrical, thermal, optical, liquid, and magnetic, are successfully integrated for operando characterization. This NRM system is shaped into a powerful platform for the investigation of performance-mechanism correlations and device-level prototyping.
|
|
13:30-15:00, Paper ThBL-EX.13 | Add to My Program |
Occlusion-Aware Contactless Surface Tracking Control Using 3-D Point Cloud Registration for Body Search |
|
Kitahara, Tadamasa | National Defense Academy of Japan |
Tsujita, Teppei | National Defense Academy of Japan |
Abiko, Satoko | Shibaura Institute of Technology |
Sato, Daisuke | Tokyo City University |
Keywords: Sensor-based Control, Surveillance Robotic Systems, Telerobotics and Teleoperation
Abstract: In this research, we aim to perform body searches with a robot. We discuss contactless body search, such as the detection of hazardous materials with a portable metal detector. Research on contactless surface tracking with manipulators, e.g., for vehicle painting, requires that the shape of the object be known in advance or that the object be stationary; a person's shape, however, is neither known nor stationary. Therefore, we use 3D LiDAR to obtain the surface shape in real time. When obtaining the shape in real time, the surface point cloud cannot be acquired directly because of occlusion by the end-effector, so distance control cannot be performed. To solve this problem, we propose a method that keeps the distance between the end-effector and the unknown moving object surface below a certain value while recognizing the shape from the point cloud even in the presence of occlusion. The surface shape, free of occlusion and deficient points, is acquired as a point cloud before occlusion occurs; when occlusion occurs, the partially deficient point cloud of the target is superimposed on the previously acquired complete point cloud. In this way, the point cloud of the deficient area can be estimated. Experimental results show that the distance between the end-effector and the unknown moving target surface can be kept below a certain value even in the presence of occlusion, clarifying the usefulness of the proposed method.
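The superposition step can be prototyped with standard point-cloud registration, for instance ICP in Open3D; in this sketch the file names and correspondence distance are assumptions, and the real system must additionally track the motion of the target.

import open3d as o3d

# Align the occluded live scan to the previously captured complete surface,
# then borrow the missing points from the earlier cloud.
complete = o3d.io.read_point_cloud("surface_before_occlusion.pcd")
partial = o3d.io.read_point_cloud("surface_with_occlusion.pcd")
reg = o3d.pipelines.registration.registration_icp(
    partial, complete, 0.02,    # source, target, max correspondence distance
    estimation_method=o3d.pipelines.registration
        .TransformationEstimationPointToPoint())
merged = partial.transform(reg.transformation) + complete  # filled-in cloud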
|
|
13:30-15:00, Paper ThBL-EX.14 | Add to My Program |
Information-Driven Search Strategy for Multiple Hazardous Gas Sources in Turbulent Environments |
|
Jang, Hongro | UNIST |
Seo, Jaemin | UNIST |
Oh, Hyondong | UNIST |
Keywords: Environment Monitoring and Management, Planning under Uncertainty
Abstract: This study proposes an information-driven two-stage parallel particle filter method to address the challenge of estimating and searching for the unknown number of hazardous gas sources using a mobile sensor in turbulent environments. In the first stage, the estimation particle filter is designed to accurately estimate the number and state of the sources. The second stage, the decision-making particle filter, focuses on refining the search strategy by incorporating the fusion range and potential field concept to effectively differentiate between overlapping sources and avoid possible local minima. Subsequently, information gained from the decision-making particle filter enables the mobile sensor to efficiently explore the search area while obtaining more informative measurements, ensuring the balanced exploitation and exploration search behavior. The proposed algorithm also provides a clear criterion for determining when to conclude the search, leveraging the coverage information from the occupancy grid map and the high confidence in source estimation from the estimation particle filter. Extensive numerical simulations validate the efficacy of this approach in simultaneously estimating and searching for multiple sources in turbulent environments.
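For intuition, a bare-bones particle-filter update for a single source is sketched below; the state layout, Gaussian likelihood, and noise levels are placeholder assumptions standing in for the paper's turbulent-dispersion model and two-stage design, and plume_model is a hypothetical helper.

import numpy as np

# Particle filter over one source state (x, y, release rate).
rng = np.random.default_rng(1)
N = 2000
particles = rng.uniform([0.0, 0.0, 0.1], [50.0, 50.0, 5.0], size=(N, 3))
weights = np.full(N, 1.0 / N)

def pf_update(measured, expected, noise=0.2):
    """Reweight on one concentration measurement; resample if degenerate."""
    global particles, weights
    weights *= np.exp(-0.5 * ((measured - expected) / noise) ** 2)
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < N / 2:      # effective sample size low
        idx = rng.choice(N, size=N, p=weights)  # resample and jitter
        particles = particles[idx] + rng.normal(0.0, 0.1, size=(N, 3))
        weights = np.full(N, 1.0 / N)

# 'expected' comes from evaluating a plume model at every particle, e.g.:
# pf_update(measured=0.8, expected=plume_model(particles, sensor_pos))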
|
|
13:30-15:00, Paper ThBL-EX.15 | Add to My Program |
Enhancing Active SLAM with Illuminating the Occluded Area |
|
Lee, Handong | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Deep Learning for Visual Perception, Vision-Based Navigation, AI-Based Methods
Abstract: Navigating dynamic environments like underground parking lots, with their ever-changing and occluded spaces, challenges traditional navigation methods. In this context, Active Simultaneous Localization and Mapping (Active SLAM) provides an innovative solution. Its adaptability makes it well suited to underground navigation, where environments frequently change and areas are often obscured. Our research focuses on enhancing navigation accuracy through the creation of Bird-Eye-View (BEV) local maps that incorporate recovered occluded spaces. To address the dynamic obstacles of occluded environments, we employed Isaac Sim, a cutting-edge simulation platform. By simulating various floor plans, we have accumulated a global map from odometry point clouds. Portions of this global map serve as ground truth, against which we validate the BEV images recovered from occlusion. Our proposed method and subsequent validation demonstrate its potential to significantly improve the autonomy and reliability of Active SLAM systems, moving autonomous platforms toward independent, robust, and self-sufficient navigation in places they have never visited before or that are surrounded by occluded areas.
|
|
13:30-15:00, Paper ThBL-EX.16 | Add to My Program |
Sim2real: Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning |
|
Singh, Rohan Pratap | University of Tsukuba, National Institute of Advanced Industrial |
Morisawa, Mitsuharu | National Inst. of AIST |
Benallegue, Mehdi | AIST Japan |
Xie, Zhaoming | Stanford University |
Kanehiro, Fumio | National Inst. of AIST |
Keywords: Humanoid and Bipedal Locomotion, Humanoid Robot Systems, Sensorimotor Learning
Abstract: For the deployment of legged robots in real-world environments, it is essential to develop robust locomotion control methods for challenging terrains that may exhibit unexpected deformability and irregularity. In this paper, we explore the application of sim2real deep reinforcement learning (RL) for the design of locomotion controllers for large-sized humanoid robots on compliant and uneven terrains. Our key contribution is to show that a simple training curriculum for exposing the RL agent to randomized terrains in simulation can achieve robust walking on the real humanoid robot using only proprioceptive feedback. We train an end-to-end omnidirectional locomotion policy using the proposed approach and show extensive real robot demonstration on the HRP-5P humanoid over several difficult terrains inside and outside the lab environment. Additionally, we propose a new control policy to enable modification of the observed clock signal, leading to adaptive gait frequencies depending on the terrain and command velocity. In simulation experiments, we show the effectiveness of this policy specifically for walking over challenging terrains by controlling swing and stance durations.
|
|
13:30-15:00, Paper ThBL-EX.17 | Add to My Program |
Collision-Free Path for Real Time Vehicle Teleoperation |
|
Kashwani, Fatima | Khalifa University |
Hassan, Bilal | Khalifa University, Abu Dhabi |
Kong, Peng-Yong | Khalifa University |
Khonji, Majid | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: Telerobotics and Teleoperation, Motion and Path Planning, Computer Vision for Transportation
Abstract: Remote driving, known as vehicle teleoperation, requires predictive methods to maintain a real-time system because of the delay between operator and vehicle. To this end, we propose a predictive display based on a segmentation-detection framework. We showcase our dual-transformer network (DTNet) and its performance both on image datasets and in real-time video applications. Our model achieves a mIoU score of 83.89% for road free-space segmentation and a mAP score of 34.20% for road object detection, in addition to smooth and robust results in real-time applications.
|
|
13:30-15:00, Paper ThBL-EX.18 | Add to My Program |
Development of an Aerial Search System for Habitat Candidates in Unknown Environments |
|
Son, Hyoung Il | Chonnam National University |
Kim, Bosung | Chonnam National University |
Pak, Jeonghyeon | Chonnam National University |
Keywords: Agricultural Automation, Field Robots
Abstract: From an ecosystem management and protection perspective, tracking harmful species such as the black-backed wasp (Vespa velutina) is an important task. Recently, sensor-network-based tracking research for ecosystem management through population control has been active, but the small sensors attached to insects cause problems in habitat exploration, including inevitable tracking errors. This study proposes an aerial search system for habitat candidates in unknown environments that can compensate for these errors. The search-and-tracking approach attaches and releases a sensor onto the tracking target, after which an unmanned aerial vehicle equipped with a directional antenna tracks the target. The antenna rotates through 360 degrees in 10-degree increments, collects signals, and estimates the direction of the strongest signal as the target's location. At the end of tracking, a drone equipped with a camera takes aerial images within a 100 m radius of the tracking end point. A 3D map is then created from the captured images, and the habitat is explored and GPS information extracted through deep-learning-based object recognition of the results.
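The bearing-estimation step reduces to an argmax over a 36-bin signal sweep; a toy sketch with synthetic RSSI values follows.

import numpy as np

# 360-degree directional-antenna sweep in 10-degree increments.
angles = np.arange(0, 360, 10)                  # 36 measurement bins
rssi = (-70 + 8 * np.cos(np.radians(angles - 130))
        + np.random.randn(angles.size))         # toy signal plus noise
bearing = angles[np.argmax(rssi)]               # strongest-signal direction
print(f"estimated bearing: {bearing} deg")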
|
|
13:30-15:00, Paper ThBL-EX.19 | Add to My Program |
Object Orientation Estimation Using TRIAD and Oriented Bounding Box Based Object Detection |
|
Won, Seungjae | University of Science and Technology |
Lee, Hunjo | Korea University of Science and Technology, Korea Institute of I |
Han, JiWoong | KITECH, University of Science & Technology |
Pyo, Dongbum | Korea Institute of Industrial Technology |
Yang, Gi-Hun | KITECH |
Keywords: Computer Vision for Automation, RGB-D Perception
Abstract: Robots often need to interact with objects, especially in the field of driving pegs into holes, which requires pose estimation algorithms to mate pegs with holes. To successfully mate a peg and a hole, it is important to know not only the position but also the orientation. Various 6D pose estimation algorithms have been developed in the field of robot vision; however, they have been demonstrated on relatively large objects and require object model information or a series of images, masks, etc. Socket 6D pose estimation requires knowledge of the various shapes of the objects and the locations of the sockets on them. Sockets and connectors are embedded in objects, and it takes a lot of data to learn them all, but the sockets and connectors themselves are already well structured and known. Therefore, we propose an object orientation estimation algorithm for ports that can be applied to any object without end-to-end 6D pose estimation. We use the TRIAD method and an oriented object detection model for port orientation estimation, and experiment on objects rotated around the X-axis or the X and Z axes.
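The TRIAD method itself is a classical closed-form attitude solution from two vector observations; a self-contained sketch follows (the yaw-rotation check at the end is our toy example, not from the poster).

import numpy as np

# TRIAD: build orthonormal triads from two non-parallel directions seen in
# the reference and observed frames, then compose them into a rotation.
def triad(v1_ref, v2_ref, v1_obs, v2_obs):
    def make_triad(a, b):
        t1 = a / np.linalg.norm(a)
        t2 = np.cross(a, b); t2 /= np.linalg.norm(t2)
        return np.column_stack((t1, t2, np.cross(t1, t2)))
    M_ref = make_triad(v1_ref, v2_ref)
    M_obs = make_triad(v1_obs, v2_obs)
    return M_ref @ M_obs.T     # rotation taking observed frame to reference

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)     # 30-degree yaw to recover
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
v1, v2 = np.array([1.0, 0, 0]), np.array([0, 1.0, 0.5])
print(triad(v1, v2, R_true.T @ v1, R_true.T @ v2))   # approx. R_true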
|
|
13:30-15:00, Paper ThBL-EX.20 | Add to My Program |
Abraded Optical Fibre-Based Dynamic Range Force Sensor for Tissue Palpation |
|
Dawood, Abu Bakar | Queen Mary University of London |
Althoefer, Kaspar | Queen Mary University of London |
Keywords: Surgical Robotics: Laparoscopy, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Tactile information gleaned through palpation plays a crucial role in relation to surface characterization and tissue differentiation - an essential clinical requirement during surgery. In the case of Minimally Invasive Surgery, access is restricted, and tactile feedback available to surgeons is therefore reduced. This poster presents a stiffness controllable, dynamic force range sensor that can provide remote haptic feedback. The sensor has an abraded optical fibre integrated into a silicone dome. Forces applied to the dome change the curvature of the optical fibres, resulting in light attenuation. By changing the pressure within the dome and thereby adjusting the sensor’s stiffness, we are able to modify the force measurement range.
|
|
13:30-15:00, Paper ThBL-EX.21 | Add to My Program |
Energy Efficient Legged Robot Structure Using Pneumatic-Electric Hybrid Actuator |
|
Kim, Yongjun | KAIST |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Hydraulic/Pneumatic Actuators, Legged Robots, Force Control
Abstract: Due to their capability to navigate complex terrains unattainable by wheeled robots, interest in bipedal and quadrupedal robots is increasing. However, the continuous energy consumption of motors, even when idle, poses challenges for long-term operations. Integrating pneumatic actuators with electric motors can reduce energy consumption in static phases, potentially extending operational durations. Notably, the inherent compliance of pneumatic actuators permits movement within a limited range using motors, even when the valves are locked. In this paper, we integrate Pneumatic Electric Hybrid Actuators (PEHA) in parallel with the structure of the MIT mini cheetah. Experimental results have shown that pneumatic cylinders provide stable payload compensation for a static load of 1 kg. Trajectory tracking was achievable both when controlling the pressure of the cylinders and after initial pressurization followed by valve locking. Future research will explore control methods linked with high-level motion controllers.
|
|
13:30-15:00, Paper ThBL-EX.22 | Add to My Program |
Underactuated Robotic Finger with SMA Spring for Switching between Adaptive Grasp Mode and Finger-Tip Force Mode |
|
Jeon, Hyerim | KITECH, Incheon Univ |
Kim, Yeongjin | Incheon National University |
Yang, Gi-Hun | KITECH |
Choi, Hyeunseok | KITECH |
Keywords: Grippers and Other End-Effectors, Agricultural Automation, Grasping
Abstract: Under-constrained systems are widely utilized in robotics due to their lower control complexity despite their high degree of freedom. In an under-constrained linkage drive system, springs are used to prevent undesired motion caused by the weight and inertia of the links. In this case, since the deformation energy of the spring equals the work done by the actuator, the stiffness of the spring is proportional to the resistance to rotation of the link. Therefore, it is desirable to select an appropriate spring stiffness based on the task. For instance, low stiffness is effective when grasping fragile objects, while high stiffness is efficient for tasks requiring high fingertip force. In this work, we propose a robotic finger that can change its grasp mode by controlling the stiffness of a spring made of shape memory alloy, which has a different stiffness depending on temperature. The experimental results showed that the spring stiffness is inversely proportional to the driving efficiency of the gripper and proportional to the maximum contact force. It was also confirmed that the operation mode of the linkage system can be switched by controlling the current applied to the SMA spring. This work can contribute to fruit harvesting and food packaging, which require not only adaptive grasping but also high fingertip force.
|
|
13:30-15:00, Paper ThBL-EX.23 | Add to My Program |
Pompeii Robotic Vision: RINGHIO and DDB for Heritage Preservation |
|
Marchello, Gabriele | Istituto Italiano Di Tecnologia |
Galdelli, Alessandro | Università Politecnica Delle Marche |
Baizid, Khelifa | Italian Institute of Technology |
D'Imperio, Mariapaola | Istituto Italiano Di Tecnologia |
Martini, Michele | Italian Institute of Technology |
Zambrano, Alessandra | Parco Archeologico Di Pompei |
Mancini, Adriano | Università Politecnica Delle Marche |
Frontoni, Emanuele | Università Politecnica Delle Marche |
Traviglia, Arianna | Istituto Italiano Di Tecnologia |
Cannella, Ferdinando | Istituto Italiano Di Tecnologia |
Keywords: Art and Entertainment Robotics, Mechanism Design, Data Sets for Robotic Vision
Abstract: The inspection of archaeological sites is a multifaceted task that combines technology, tradition, and interdisciplinary knowledge. In this context, we present a robotic system designed to autonomously detect defects and damage on the walls of the ancient buildings that make up the Pompeii Archaeological Park. Our system consists of an autonomous rover (RINGHIO: Robot for Inspection and Navigation to Generate Heritage and Infrastructures Observation) equipped with a vision system mounted on a vibration compensation device (DDB: Defect Detection Box). We conducted rigorous tests on a carefully selected section of the ancient road leading to Porta Stabia and on an insula strategically chosen for its varied building-wall defects and complex topography. This study highlights the fundamental role of automated survey techniques in preserving our shared cultural heritage. The preservation of archaeological sites is of great importance, and this work demonstrates the value of collaborative efforts to ensure the longevity of these priceless monuments for future generations.
|
|
13:30-15:00, Paper ThBL-EX.24 | Add to My Program |
Modelling and Control of Multiple Coupled Dielectric Elastomer Actuators |
|
Li, Jisen | Shenzhen Institute of Artificial Intelligence and Robotics for S |
Cheng, Anjing | The Chinese University of Hong Kong, Shenzhen |
Wang, Hao | The Chinese University of Hong Kong, Shenzhen |
Zhu, Jian | Chinese University of Hong Kong, Shenzhen |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Sensors and Actuators, Soft Robot Applications
Abstract: Dielectric elastomer actuators (DEAs) have emerged as promising artificial muscles for soft robots due to their ability to undergo large voltage-induced deformation and exhibit bioinspired muscle-like motion. However, the nonlinear response of DEAs, arising from rate-dependent viscoelasticity, poses challenges in their modeling and control. While previous studies have mainly focused on analyzing single-degree-of-freedom DEAs, the modeling and control of multiple coupled DEAs remain unexplored and challenging. This paper presents a novel framework for modeling and control of multiple DEAs, leveraging the sparse identification method. This method enables the development of parsimonious governing equations that accurately capture the hysteresis, creep, and cross-coupling effects of DEAs. The model incorporates prior knowledge and/or experience on DEAs and can be further validated using experimental data. By utilizing the identified explicit dynamic equations, various model predictive controllers can be designed to enable DEAs to track desired trajectories with specific task requirements. Furthermore, the proposed method can be extended to model and analyze other soft actuators, such as pneumatic actuators and shape memory polymers, thereby unlocking the full potential of soft actuators and soft robots.
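Sparse identification of this kind is implemented in, for example, the pysindy package; the sketch below assumes logged state data and placeholder library/optimizer settings, not the authors' identified model.

import numpy as np
import pysindy as ps

# Fit parsimonious governing equations to logged actuator states
# (the data file and hyperparameters are placeholders).
data = np.load("dea_log.npz")    # assumed to hold states X and timestep dt
model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.05),              # promotes sparsity
    feature_library=ps.PolynomialLibrary(degree=2))  # candidate terms
model.fit(data["X"], t=float(data["dt"]))
model.print()                    # identified dynamic equations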
|
|
13:30-15:00, Paper ThBL-EX.25 | Add to My Program |
Explainable Decision Making for Autonomous Driving with LLMs |
|
Feng, Yuchao | The Hong Kong Polytechnic University |
Chu, Henry | The Hong Kong Polytechnic University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Autonomous Agents
Abstract: Recently, Large Language Models (LLMs) have received widespread attention in the field of autonomous driving. It is believed that utilizing the common-sense ability of LLMs in autonomous driving may be a promising solution to alleviate the long-tail effect, addressing rare and unpredictable driving scenarios. Moreover, LLMs may contribute to increased explainability in autonomous driving by generating natural-language explanations for driving actions, fostering trust and transparency in autonomous driving systems. Within this context, we propose an explainable decision-making network with LLMs to investigate the effect of LLMs on driving performance and explainability.
|
|
13:30-15:00, Paper ThBL-EX.26 | Add to My Program |
Study of an Autonomous Mobile Robot for Cultivation Management of Pleurotus Eryngii Mushroom |
|
Jou, Rong-Yuan | National Formosa University |
Chi, Jin-Chuan | National Formosa University, Department of Mechanical Design Eng |
Shih, Hsin-Der | Plant Pathology Division, Taiwan Agricultural Research Institute, |
Keywords: Agricultural Automation, Environment Monitoring and Management, Product Design, Development and Prototyping
Abstract: Mushroom cultivation relies on precise environmental control to ensure optimal growth conditions. Manual inspections for data collection and analysis in mushroom houses have traditionally been labor-intensive and expertise-dependent. To address these challenges, this study presents an innovative Autonomous Mobile Robot (AMR) platform designed for autonomous fixed-point inspections, data acquisition, pollution detection, and navigation within mushroom cultivation facilities. Integrated with the Robot Operating System (ROS) and powered by Raspberry Pi 4, the platform incorporates a vertical positioning system equipped with sensors for comprehensive environmental data collection. The platform achieves pollution detection on mushroom bags with precision exceeding 90% using the YOLO algorithm. All inspection data is securely stored in a remote PostgreSQL database, facilitating seamless access and further analysis. Successful implementation trials in real-world mushroom houses underscore the platform's potential to revolutionize inspection processes, offering substantial efficiency gains and advancements for the mushroom cultivation industry.
|
|
13:30-15:00, Paper ThBL-EX.27 | Add to My Program |
Exploring Human's Gender Perception and Bias Toward Non-Humanoid Robots |
|
Ramezani, Mahya | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Keywords: Social HRI, Acceptability and Trust
Abstract: As non-humanoid robots increasingly permeate various sectors, understanding their design implications for human acceptance becomes paramount. Despite their ubiquity, studies on how to improve human-robot interaction are sparse. Our investigation, conducted through two surveys, addresses this gap. The first survey focuses on non-humanoid robots and human perceptions of gender attributions, suggesting that both design and perceived gender influence acceptance. The second survey investigates the effects of varying gender cues in robot designs and their consequent impacts on human-robot interactions. Our findings highlight that distinct gender cues can bolster or impede interaction comfort.
|
|
ThCT1-CC Oral Session, CC-303 |
Add to My Program |
Planning, Scheduling and Coordination |
|
|
Chair: Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Co-Chair: Xiao, Xuesu | George Mason University |
|
16:30-18:00, Paper ThCT1-CC.1 | Add to My Program |
Quadcopter Trajectory Time Minimization and Robust Collision Avoidance Via Optimal Time Allocation |
|
Xu, Zhefan | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Planning, Scheduling and Coordination, Collision Avoidance, Aerial Systems: Perception and Autonomy
Abstract: Autonomous navigation requires robots to generate trajectories for collision avoidance efficiently. Although many previous works have proven successful in generating smooth and spatially collision-free trajectories, their solutions often suffer from suboptimal time efficiency and potential safety violations, particularly when accounting for uncertainties in robot perception and control. To address this issue, this paper presents the Robust Optimal Time Allocation (ROTA) framework, which optimizes the time progression of trajectories and serves as a post-processing tool to enhance trajectory time efficiency and safety under uncertainties. In this study, we begin by formulating a non-convex optimization problem aimed at minimizing trajectory execution time while incorporating constraints on collision probability as the robot approaches obstacles. Subsequently, we introduce the concept of the trajectory braking zone and adopt the chance-constrained formulation for robust collision avoidance in the braking zones. Finally, the non-convex optimization problem is reformulated into a second-order cone programming problem to achieve real-time performance. Through simulations and physical flight experiments, we demonstrate that the proposed approach effectively reduces trajectory execution time while enabling robust collision avoidance in complex environments. Our software is available on GitHub, along with the developed autonomy framework, as open-source ROS packages.
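For intuition on the chance-constrained step: under a Gaussian uncertainty model, bounding the collision probability is equivalent to inflating the safety distance by a quantile-scaled margin, as in the sketch below (the numbers are illustrative, not from the paper).

import numpy as np
from scipy.stats import norm

# P(distance < d_safe) <= eps with Gaussian clearance is equivalent to
# requiring the mean clearance to exceed d_safe by norm.ppf(1 - eps) * sigma.
eps = 0.05        # allowed collision probability
sigma = 0.10      # 1-sigma position uncertainty [m]
d_safe = 0.30     # hard safety distance [m]
margin = norm.ppf(1.0 - eps) * sigma
print(f"plan with mean clearance >= {d_safe + margin:.3f} m")

This deterministic reformulation is the standard route by which such probabilistic constraints become second-order cone constraints.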
|
|
16:30-18:00, Paper ThCT1-CC.2 | Add to My Program |
Scaling Team Coordination on Graphs with Reinforcement Learning |
|
Limbu, Manshi | George Mason University |
Hu, Zechen | George Mason University |
Wang, Xuan | George Mason University |
Shishika, Daigo | George Mason University |
Xiao, Xuesu | George Mason University |
Keywords: Planning, Scheduling and Coordination, Cooperating Robots, Multi-Robot Systems
Abstract: This paper studies Reinforcement Learning (RL) techniques to enable team coordination behaviors in graph environments with support actions among teammates to reduce the costs of traversing certain risky edges in a centralized manner. While classical approaches can solve this non-standard multi-agent path planning problem by converting the original Environment Graph (EG) into a Joint State Graph (JSG) to implicitly incorporate the support actions, those methods do not scale well to large graphs and teams. To address this curse of dimensionality, we propose to use RL to enable agents to learn such graph traversal and teammate supporting behaviors in a data-driven manner. Specifically, through a new formulation of the team coordination on graphs with risky edges problem into Markov Decision Processes (MDPs) with a novel state and action space, we investigate how RL can solve it in two paradigms: First, we use RL for a team of agents to learn how to coordinate and reach the goal with minimal cost on a single EG. We show that RL efficiently solves problems with up to 20/4 or 25/3 nodes/agents, using a fraction of the time needed for JSG to solve such complex problems; Second, we learn a general RL policy for any N-node EGs to produce efficient supporting behaviors. We present extensive experiments and compare our RL approaches against their classical counterparts.
|
|
16:30-18:00, Paper ThCT1-CC.3 | Add to My Program |
Dynamic Crane Scheduling with Reinforcement Learning for a Steel Coil Warehouse |
|
Cho, Sang-Hyun | Korea Advanced Institute of Science and Technology |
Shin, Woo-Jin | Korea Advanced Institute of Science & Technology |
Ahn, Jeongsun | KAIST |
Joo, Sanghyun | Korea Advanced Institute of Science and Technology(KAIST) |
Kim, Hyun-Jung | Korea Advanced Institute of Science and Technology |
Keywords: Planning, Scheduling and Coordination, Intelligent and Flexible Manufacturing, Factory Automation
Abstract: This paper tackles the dynamic crane scheduling problem in a steel coil warehouse, involving tasks such as coil storage, retrieval, and shuffling. Tasks arrive dynamically with precedence relations, while multiple cranes share a track, necessitating collision avoidance. Our goal is to minimize the average task waiting time by allocating tasks to cranes and optimizing their execution sequence. Unlike prior research focusing on static scenarios or rule-based heuristics, we introduce a real-time, reinforcement-learning-based algorithm. To effectively handle precedence relations and global information, we propose a policy network based on graph neural networks. Experimental results demonstrate its superiority over traditional heuristics such as dispatching rules in dynamic scenarios.
|
|
16:30-18:00, Paper ThCT1-CC.4 | Add to My Program |
Tree-Based Representation of Locally Shortest Paths for 2D K-Shortest Non-Homotopic Path Planning |
|
Yang, Tong | Zhejiang University |
Huang, Li | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Planning, Scheduling and Coordination, Motion and Path Planning
Abstract: A novel algorithm to solve the 2D k-shortest non-homotopic path planning (k-SNPP) task is proposed in this paper. The task is of practical significance as a sub-module for higher-level planning and scheduling tasks, and has been gaining increasing attention in recent years. Existing algorithms explicitly characterise non-homotopic paths using topological invariants such as the h-signature and the winding number. However, these algorithms are inefficient due to their separate treatment of topology and geometry: topological invariants are used solely to distinguish non-homotopic paths, which significantly increases the volume of the robot configuration space, while distance-optimal path planners must search for locally shortest paths in this augmented space, which becomes extremely time-consuming. In this paper, a topological tree is proposed to simultaneously leverage topology and geometry. The tree grows from the starting location and explores all topological routes until the best k of its leaves reach the goal. It is proven that different branches of the tree explore different homotopy classes of paths, and all the branches are locally shortest. Comparative experiments for k-SNPP are conducted in challenging grid-based simulated environments to validate the performance of the proposed algorithm. The C++ implementation of the proposed algorithm is released for the benefit of the robotics community.
|
|
16:30-18:00, Paper ThCT1-CC.5 | Add to My Program |
Well-Connected Set and Its Application to Multi-Robot Path Planning |
|
Guo, Teng | Rutgers University |
Yu, Jingjin | Rutgers University |
Keywords: Planning, Scheduling and Coordination, Multi-Robot Systems, Path Planning for Multiple Mobile Robots or Agents
Abstract: Parking lots and autonomous warehouses for accommodating many vehicles/robots adopt designs in which the underlying graphs are well-connected to simplify planning and reduce congestion. In this study, we formulate and delve into the largest well-connected set (LWCS) problem and explore its applications in layout design for multi-robot path planning. Roughly speaking, a well-connected set over a connected graph is a set of vertices such that there is a path on the graph connecting any pair of vertices in the set without passing through any additional vertices of the set. Identifying an LWCS has many potential high-utility applications, e.g., for determining parking garage layout and capacity, as prioritized planning can be shown to be complete when start/goal configurations belong to an LWCS. In this work, we establish that computing an LWCS is NP-complete. We further develop optimal and near-optimal LWCS algorithms, with the near-optimal algorithm targeting large maps. A complete prioritized planning method is given for planning paths for multiple robots residing on an LWCS.
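The definition can be checked directly on small graphs; the sketch below tests a vertex set against it using networkx (the toy grid and set are our example, and this brute-force check is unrelated to the paper's optimal and near-optimal LWCS algorithms).

import itertools
import networkx as nx

# S is well-connected in G iff every pair in S is joined by a path that
# avoids all *other* vertices of S.
def is_well_connected(G, S):
    S = set(S)
    for u, v in itertools.combinations(S, 2):
        keep = (set(G.nodes) - S) | {u, v}
        if not nx.has_path(G.subgraph(keep), u, v):
            return False
    return True

G = nx.grid_2d_graph(4, 4)                # toy parking-lot grid
S = [(0, 0), (0, 3), (3, 0), (3, 3)]      # the four corner cells
print(is_well_connected(G, S))            # True on this grid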
|
|
16:30-18:00, Paper ThCT1-CC.6 | Add to My Program |
Dynamic Coalition Formation and Routing for Multirobot Task Allocation Via Reinforcement Learning |
|
Dai, Weiheng | National University of Singapore |
Bidwai, Aditya | National University of Singapore |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
Keywords: Planning, Scheduling and Coordination, Path Planning for Multiple Mobile Robots or Agents, Reinforcement Learning
Abstract: Many multi-robot deployments, such as automated construction of buildings, distributed search, or cooperative mapping, often require agents to intelligently coordinate their trajectories and form coalitions over a large domain, to complete spatially distributed tasks as quickly as possible. We focus on scenarios involving homogeneous robots, but where tasks vary in the number of agents required to start them. For example, construction robots may need to collaboratively air-lift heavy objects at different locations (e.g., prefabricated rooms, crates of material/equipment), where the weight of each payload defines the required coalition size. To balance the total travel time of the agents and their waiting time (before task initiation), agents need to carefully sequence tasks but also dynamically form/disband coalitions. While simpler problems can be approached using heuristics or optimization, these methods struggle with more complex instances involving large task-to-agent ratios, where frequent coalition changes are needed. In this work, we propose to let agents learn to iteratively build cooperative schedules to solve such problems, by casting the problem in the reinforcement learning framework. Our approach relies on an attention-based neural network, allowing agents to reason about the current state of the system to sequence movement decisions that optimize short-term coalition formation and long-term task scheduling. We further propose a novel leader-follower technique to boost cooperation learning and compare our performance to conventional baselines in a wide variety of scenarios. There, our method closely matches or outperforms the baselines; in particular, it yields higher-quality solutions and is at least two orders of magnitude faster than an exact solver in cases where frequent coalition updates are required.
|
|
16:30-18:00, Paper ThCT1-CC.7 | Add to My Program |
Multi-Robot Task Allocation under Uncertainty Via Hindsight Optimization |
|
Dhanaraj, Neel | University of Southern California |
Kang, Jeon Ho | University of Southern California |
Mukherjee, Anirban | University of Southern California |
Nemlekar, Heramb | University of Southern California |
Nikolaidis, Stefanos | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Planning, Scheduling and Coordination, Task Planning, Intelligent and Flexible Manufacturing
Abstract: Multi-robot systems are becoming increasingly prevalent in various real-world applications, such as manufacturing and warehouse logistics. These systems face complex challenges in 1) task allocation, due to factors like time-extended tasks and agent specialization, and 2) uncertainties in task execution. Potential task failures can add contingency tasks to recover from the failure, thereby causing delays. This paper addresses the problem of Multi-Robot Task Allocation under Uncertainty by proposing a hierarchical approach that decouples the problem into two layers. We use a low-level optimization formulation to find the optimal solution for a deterministic multi-robot task allocation problem with known task outcomes. The higher-level search intelligently generates the more likely combinations of failures and calls the low-level search repeatedly to find the optimal task allocation sequence, given the known outcomes. We validate our results in simulation for a manufacturing domain and demonstrate that our method can reduce the effect of potential delays from contingencies. We show that our algorithm is computationally efficient while improving the average makespan compared to other baselines.
|
|
16:30-18:00, Paper ThCT1-CC.8 | Add to My Program |
Traffic Flow Learning Enhanced Large-Scale Multi-Robot Cooperative Path Planning under Uncertainties |
|
Han, Xingyao | Shanghai Jiao Tong University |
Chen, Siyuan | Shanghai JiaoTong University |
Xiong, Xinye | Shanghai Jiao Tong University |
Liu, Qiming | Shanghai Jiao Tong University |
Zhou, Shunbo | Huawei |
Zhang, Heng | Shanghai Jiao Tong University |
Liu, Zhe | University of Cambridge |
Keywords: Planning, Scheduling and Coordination, Task Planning, Logistics
Abstract: Robotic systems with hundreds or even thousands of robots are widely implemented in logistics and industrial applications. In such systems, cooperative path planning is of great importance, as local congestion and motion conflicts may greatly degrade system performance, especially in the presence of uncertainties. Our idea is to consider traffic flow equilibrium in path planning to relieve potential congestion and increase efficiency. In this paper, we propose a hierarchical framework, which includes a traffic flow prediction layer, a sector-level planning layer, and a road-level coordination layer. In traffic flow prediction, we propose a spatio-temporal graph neural network that integrates local information to predict the evolution of the future robot density distribution. In sector-level planning, we generate sector-level paths that consider travel distance and traffic flow equilibrium simultaneously. In road-level coordination, we implement the conflict-based search algorithm within each sector to ensure conflict-free local paths. In addition, we explicitly consider the motion/communication uncertainties that are unavoidable in practical systems. We validate the effectiveness of our approach in simulations with over 1000 robots, and real-world experiments are also provided.
|
|
16:30-18:00, Paper ThCT1-CC.9 | Add to My Program |
Accounting for Travel Time and Arrival Time Coordination During Task Allocations in Legged-Robot Teams |
|
Chen, Shengqiang | University of Southern California |
Chen, Yiyu | University of Southern California |
Jain, Ronak | University of Southern California |
Zhang, Xiaopan | University of Southern California |
Nguyen, Quan | University of Southern California |
Gupta, Satyandra K. | University of Southern California |
Keywords: Planning, Scheduling and Coordination, Task Planning, Multi-Robot Systems
Abstract: Many applications require the deployment of legged-robot teams to effectively and efficiently carry out missions. The use of multiple robots allows tasks to be executed concurrently, expediting mission completion. It also enhances resilience by enabling task transfer in case of a robot failure. This paper presents a formulation based on Mixed Integer Linear Programming (MILP) for allocating tasks to robots by taking into account travel time and ensuring efficient execution of collaborative tasks. We extended the MILP formulation to account for complexities with legged robot teams. Our results demonstrate that this approach leads to improved performance in terms of the makespan of the mission. We demonstrate the usefulness of this approach using a case study involving the disinfection of a building consisting of multiple rooms.
|
|
ThCT2-CC Oral Session, CC-311 |
Add to My Program |
Autonomous Agents II |
|
|
Chair: Sun, Liang | New Mexico State University |
Co-Chair: Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
|
16:30-18:00, Paper ThCT2-CC.1 | Add to My Program |
Adaptive Pedestrian Agent Modeling for Scenario-Based Testing of Autonomous Vehicles through Behavior Retargeting |
|
Muktadir, Golam Md | University of California, Santa Cruz |
Whitehead, Jim | University of California, Santa Cruz |
Keywords: Behavior-Based Systems, Modeling and Simulating Humans, Task and Motion Planning
Abstract: This work proposes a new representation of pedestrian crossing scenarios and a hybrid modeling approach, RePed, that facilitates transferring microscopic behavior models from behavior research to higher-level trajectories. With this, real-world trajectory-based scenarios can be augmented with a diverse set of human crossing maneuvers, producing a wealth of new scenarios and addressing the scarcity of rare case data that existing works struggle to deal with. Leveraging the controllability of this modeling approach, perturbation-based augmentation can be applied to enrich scenarios further. In addition, the representation is rooted in the Ego vehicle's coordinate system with a logical representation of roads. This design enables scenario retargeting to various road structures, traffic conditions, and ego vehicle behaviors. Thus, it strongly supports scenario-based testing by forcing pedestrians to produce certain situations in simulation even when the Ego Vehicle tries to evade them.
|
|
16:30-18:00, Paper ThCT2-CC.2 | Add to My Program |
KT-BT: A Framework for Knowledge Transfer through Behavior Trees in Multi-Robot Systems |
|
Oruganti Venkata, Sanjay Sarma | Rensselaer Polytechnic Institute |
Parasuraman, Ramviyas | University of Georgia |
Pidaparti, Ramana | University of Georgia |
Keywords: Behavior-Based Systems, Multi-Robot Systems, Behavior Trees, Cooperating Robots
Abstract: Multi-Robot and Multi-Agent Systems demonstrate collective (swarm) intelligence through systematic and distributed integration of local behaviors in a group. Agents sharing knowledge about the mission and environment can enhance performance at the individual and mission levels. However, this is difficult to achieve, partly due to the lack of a generic framework for transferring part of the known knowledge (behaviors) between agents. This paper presents a new knowledge representation framework and a transfer strategy called KT-BT: Knowledge Transfer through Behavior Trees. The KT-BT framework follows a query-response-update mechanism through an online Behavior Tree framework, where agents broadcast queries for unknown conditions and respond with appropriate knowledge using a condition-action-control sub-flow. We embed a novel grammar structure called stringBT that encodes knowledge, enabling behavior sharing. We theoretically investigate the properties of the KT-BT framework in achieving homogeneity of knowledge across the entire group, compared with a heterogeneous system without the capability of sharing knowledge. We extensively verify our framework in a simulated multi-robot environment.
|
|
16:30-18:00, Paper ThCT2-CC.3 | Add to My Program |
Distributed Matching-By-Clone Hungarian-Based Algorithm for Task Allocation of Multi-Agent Systems |
|
Samiei, Arezoo | USA |
Sun, Liang | New Mexico State University |
Keywords: Autonomous Agents, Distributed Robot Systems, Multi-Robot Systems, Task Planning
Abstract: In this article, we present a novel approach, namely the distributed matching-by-clone Hungarian-based algorithm (DMCHBA), to multi-agent task-allocation problems in which the number of agents is smaller than the number of tasks. The proposed DMCHBA assumes that agents employ an implicit coordination mechanism and consists of two iterative phases, i.e., the communication phase and the assignment phase. In the communication phase, agents communicate with their connected neighbors and exchange their local knowledge base until they converge on the global knowledge base. In the assignment phase, each agent builds a square cost matrix by cloning agents, adding pseudo-tasks when necessary, and applying the Hungarian method for task allocation. A local planning algorithm is then applied to identify the order of task execution for an agent. The proposed DMCHBA is proven to produce conflict-free assignments among agents in finite time. We compare the performance of DMCHBA with the consensus-based bundle algorithm, the distributed recursive Hungarian-based algorithms, and the cluster-based Hungarian algorithm (CBHA) in Monte-Carlo simulations with different numbers of agents and tasks. The numerical results reveal the superior convergence and optimality of DMCHBA over all other selected algorithms.
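To make the matrix-squaring step concrete, here is a toy sketch (not the authors' code; SciPy's Hungarian solver stands in for the assignment step, and the costs are random placeholders): with fewer agents than tasks, each agent's row is cloned and zero-cost pseudo-tasks pad the matrix until it is square.

```python
# Square-by-cloning sketch for the assignment phase; costs are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_agents, n_tasks = 2, 5
cost = rng.uniform(1, 10, size=(n_agents, n_tasks))   # assumed travel costs

clones = -(-n_tasks // n_agents)                      # ceil(n_tasks / n_agents)
square = np.repeat(cost, clones, axis=0)              # clone each agent's row
n_pseudo = square.shape[0] - n_tasks
square = np.hstack([square, np.zeros((square.shape[0], n_pseudo))])  # pseudo-tasks

rows, cols = linear_sum_assignment(square)            # Hungarian method
for r, c in zip(rows, cols):
    if c < n_tasks:                                   # skip pseudo-task slots
        print(f"agent {r // clones} -> task {c}")
```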
|
|
16:30-18:00, Paper ThCT2-CC.4 | Add to My Program |
Convolutional Vision Transformer As a Path Following Controller for Omnidirectional Robots |
|
Athni Hiremath, Sandesh | TU Kaiserslautern |
Huang, ChengYi | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Tika, Argtim | Technische Universität Kaiserslautern |
Bajcinca, Naim | TU Kaiserslautern |
Keywords: Autonomous Agents, Motion Control, Deep Learning Methods
Abstract: A novel deep neural network (DNN) based controller for omnidirectional robots is proposed. The controller decomposes the prescribed reference path, corresponding to a fixed prediction horizon, into multiple shorter paths corresponding to shorter prediction horizons. This implicitly enforces a Hankel structure on the input and consequently also on the output. Taking advantage of this, a convolutional vision transformer model is used to realize the controller, which is then trained to predict states and controls over multiple prediction horizons. Model training is performed in a self-supervised manner using a synthetic dataset. The proposed controller is shown to be more efficient than a model designed for a single prediction horizon. In comparison to a model predictive controller, the proposed approach exhibits competitive performance in path-following tasks and is 3 times faster on average for the same prediction length.
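As a small illustration of the Hankel structure mentioned above (a generic sketch, not the paper's model; the path and window length are arbitrary): decomposing one long reference path into overlapping shorter horizons stacks shifted sub-paths into a matrix whose anti-diagonals are constant.

```python
# Hankel-structured decomposition of a long horizon into short horizons.
import numpy as np

path = np.arange(10.0)      # hypothetical 1D reference path, horizon N = 10
short_horizon = 4
hankel = np.lib.stride_tricks.sliding_window_view(path, short_horizon)
print(hankel)               # each row is a shifted short-horizon sub-path
```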
|
|
16:30-18:00, Paper ThCT2-CC.5 | Add to My Program |
Can an Embodied Agent Find Your “Cat-Shaped Mug”? LLM-Based Zero-Shot Object Navigation |
|
Dorbala, Vishnu Sashank | University of Maryland, College Park |
Mullen, James | University of Maryland |
Manocha, Dinesh | University of Maryland |
Keywords: Autonomous Agents, Domestic Robotics, AI-Enabled Robotics
Abstract: We present LGX (Language-guided Exploration), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), where an embodied agent navigates to a uniquely described target object in a previously unseen environment. Our approach makes use of Large Language Models (LLMs) for this task by leveraging the LLM’s commonsense-reasoning capabilities for making sequential navigational decisions. Simultaneously, we perform generalized target object detection using a pre-trained Vision-Language grounding model. We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline, OWL-ViT CLIP on Wheels (OWL CoW). Furthermore, we study the usage of LLMs for robot navigation and present an analysis of various prompting strategies affecting the model output. Finally, we showcase the benefits of our approach via real-world experiments that indicate the superior performance of LGX in detecting and navigating to visually unique objects.
|
|
16:30-18:00, Paper ThCT2-CC.6 | Add to My Program |
AutoExplorers: Autoencoder-Based Strategies for High-Entropy Exploration in Unknown Environments for Mobile Robots |
|
Puck, Lennart | FZI Forschungszentrum Informatik |
Schik, Maximilian | FZI Forschungszentrum Informatik |
Schnell, Tristan | FZI Forschungszentrum Informatik |
Buettner, Timothee | FZI Research Center for Information Technology |
Roennau, Arne | FZI Forschungszentrum Informatik, Karlsruhe |
Dillmann, Rüdiger | FZI - Forschungszentrum Informatik - Karlsruhe |
Keywords: Autonomous Agents, AI-Enabled Robotics, Space Robotics and Automation
Abstract: Deciding where to go next is a challenging task for humans. However, for robots in unknown environments, this becomes even more demanding. In planetary exploration, robots are continuously challenged with the task of exploring novel areas, yet so far, humans decide for the robots where to go. Even then, prioritizing the next target based on previous knowledge is complex. In our proposed work, the robot utilizes data about its surroundings from drone or satellite images. Alternatively, a volumetric representation can be reduced to form a suitable input. From the input, tiles are selected and embedded by different autoencoder variants. The robot can then select the most promising next exploration goal based on the distance in the embedding space to previously seen samples. In this work, a variational autoencoder, a Wasserstein autoencoder, and a spherical autoencoder are evaluated against each other. The latter two variants yield a high information gain when evaluated on satellite data from the Netherlands. Additionally, the framework was employed on data from an analog mission in the Tabernas desert. Through the framework, the robots gain an understanding of which goals yield the most information and can therefore quickly improve their knowledge about their surroundings.
|
|
16:30-18:00, Paper ThCT2-CC.7 | Add to My Program |
LLM-BT: Performing Robotic Adaptive Tasks Based on Large Language Models and Behavior Trees |
|
Zhou, Haotian | Wuhan University of Science and Technology |
Lin, Yunhan | Wuhan University of Science and Technology |
Yan, Longwu | Wuhan University of Science and Technology |
Zhu, Jihong | University of York |
Min, Huasong | Robotics Institute of Beihang University of China |
Keywords: Behavior-Based Systems, AI-Based Methods, Control Architectures and Programming
Abstract: Large Language Models (LLMs) have been widely utilized to perform complex robotic tasks. However, handling external disturbances during tasks is still an open challenge. This paper proposes a novel method to achieve robotic adaptive tasks based on LLMs and Behavior Trees (BTs). It utilizes ChatGPT to reason about the descriptive steps of tasks. In order to enable ChatGPT to understand the environment, semantic maps are constructed by an object recognition algorithm. Then, we design a Parser module based on Bidirectional Encoder Representations from Transformers (BERT) to parse these steps into initial BTs. Subsequently, a BTs Update algorithm is proposed to expand the initial BTs dynamically to control robots to perform adaptive tasks. Different from other LLM-based methods for complex robotic tasks, our method outputs variable BTs that can add and execute new actions according to environmental changes, which is robust to external disturbances. Our method is validated in simulation across different practical scenarios.
|
|
16:30-18:00, Paper ThCT2-CC.8 | Add to My Program |
DyHGDAT: Dynamic Hypergraph Dual Attention Network for Multi-Agent Trajectory Prediction |
|
Lin, Weilong | Fudan University |
Zeng, Xinhua | Fudan University |
Teng, Jing | North China Electric Power University |
Chengxin, Pang | Shanghai University of Electric Power |
Liu, Jing | Fudan University |
Keywords: AI-Based Methods, Autonomous Agents, Agent-Based Systems
Abstract: Modeling the interactions among agents based on their historical trajectories is key to precise multi-agent trajectory prediction. Hypergraph Convolutional Networks (HGCN) have become a proper choice for capturing high-order interactions among agents in this field. However, most existing works only consider static hypergraphs and ignore that, in a hypergraph, the power of influence varies between vertices (or hyperedges). Therefore, we propose DyHGDAT, a dynamic hypergraph dual attention network to capture the high-order interactions among agents, which not only models the evolution of the hypergraph over time but also highlights the vertices and hyperedges with larger impacts. We apply DyHGDAT to a CVAE-based prediction system for predicting plausible trajectories. To validate the effectiveness of prediction, we evaluate our proposed method on two well-established trajectory prediction datasets: the ETH/UCY datasets and the Stanford Drone Dataset (SDD). The experimental results show that with DyHGDAT, the CVAE-based prediction system outperforms state-of-the-art methods by 12.5%/5.3% in ADE/FDE on ETH/UCY, and the improvement on SDD is 6.4%/7.4%.
|
|
ThCT3-CC Oral Session, CC-313 |
Add to My Program |
Calibration and Identification II |
|
|
Chair: Zhang, Jun | Nanyang Technological University |
Co-Chair: Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
|
16:30-18:00, Paper ThCT3-CC.1 | Add to My Program |
LCCRAFT: LiDAR and Camera Calibration Using Recurrent All-Pairs Field Transforms without Precise Initial Guess |
|
Lee, Yu-Chen | National Yang Ming Chiao Tung University |
Chen, Kuan-Wen | National Yang Ming Chiao Tung University |
Keywords: Calibration and Identification, Deep Learning for Visual Perception, Visual Learning
Abstract: LiDAR-camera fusion plays a pivotal role in 3D reconstruction for self-driving applications. A fundamental prerequisite for effective fusion is the precise calibration between LiDAR and camera systems. Many existing calibration methods are constrained by predefined mis-calibration ranges in the training data, essentially tying the network to a specific data distribution. However, if the range of evaluation data differs from what the network has been trained on, the resulting estimates may not meet expectations. Moreover, most methods require a precise initial guess for calibration to succeed. In this paper, we introduce LCCRAFT, an online calibration network designed for LiDAR and camera systems. Leveraging the 4D correlation volume and correlation lookup techniques inherited from RAFT, we apply them to correlate RGB images and depth maps derived from the projection of point clouds. Through weight sharing between update iterations and by enabling the update operator to learn from data with varying degrees of error, LCCRAFT demonstrates adaptability to diverse mis-calibration scenarios. This includes cases where the initial mis-calibration is even more severe than what the system encountered during training, demonstrating the robustness of the model. The calibration process executes in 93 ms on a single GPU, meeting real-time requirements. Despite the modest 9M model parameters, LCCRAFT achieves competitive performance as compared to the state-of-the-art method, which entails 69M parameters.
|
|
16:30-18:00, Paper ThCT3-CC.2 | Add to My Program |
Physics-Informed Neural Network for Model Prediction and Dynamics Parameter Identification of Collaborative Robot Joints |
|
Yang, Xingyu | Aarhus University |
Du, Yixiong | Aarhus University |
Li, Leihui | Aarhus University |
Zhou, Zhengxue | University of Liverpool |
Zhang, Xuping | Aarhus University |
Keywords: Calibration and Identification, Deep Learning Methods, Dynamics
Abstract: Collaborative robots have promising potential for widespread use in small-and-medium-sized enterprise (SME) manufacturing and production due to the development of increasingly sophisticated Human-Robot Collaboration technologies. However, predicting and identifying the behavior of collaborative robots remains a challenging problem due to the significant non-linear properties of their unique gearbox, the harmonic drive. To tackle the engineering problem, this work proposes a physics-informed neural network (PINN) to predict and identify collaborative robot joint dynamics. The procedure involves deriving the state-space dynamic model, embedding the system's dynamics into a recurrent neural network (RNN) with customized Runge-Kutta cells, obtaining labeled training data, predicting system responses, and estimating dynamic parameters. The proposed method is applied to predict and identify collaborative robot joint dynamics, and the results are verified and validated through numerical simulations and experimental testing, respectively. The obtained results demonstrate a high level of agreement with the ground truth and exhibit superior performance compared to the conventional PINN and the non-linear grey-box state-space estimation algorithm when confronted with non-linearity and dynamic coupling. Moreover, the PINN exhibits the potential for extension to various dynamic systems.
|
|
16:30-18:00, Paper ThCT3-CC.3 | Add to My Program |
Estimating Material Properties of Interacting Objects Using Sum-GP-UCB |
|
Seker, Muhammet Yunus | Carnegie Mellon University |
Kroemer, Oliver | Carnegie Mellon University |
Keywords: Calibration and Identification, Incremental Learning, Perception for Grasping and Manipulation
Abstract: Robots need to estimate the material and dynamic properties of objects from observations in order to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects based on a set of observations. Our focus is on estimating these properties based on observations of scenes with different sets of interacting objects. We propose an approach that exploits the structure of the reward function by modeling the reward for each observation separately and using only the parameters of the objects in that scene as inputs. The resulting lower-dimensional models generalize better over the parameter space, which in turn results in faster optimization. To speed up the optimization process further, and to reduce the number of simulation runs needed to find good parameter values, we also propose partial evaluations of the reward function, wherein the selected parameters are only evaluated on a subset of real-world evaluations. The approach was successfully evaluated on a set of scenes with a wide range of object interactions, and we showed that our method can effectively perform incremental learning without resetting the rewards of the gathered observations.
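A toy rendering of the factored-reward idea (our reading of the abstract, not the authors' implementation; scenes, rewards, and the exploration weight kappa are made up): each scene gets its own GP over only the parameters of the objects it contains, and candidate parameter vectors are scored by the sum of per-scene UCB values.

```python
# Sum of per-scene GP-UCB scores over a shared parameter vector (toy example).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
n_params = 3                                  # one material parameter per object
scenes = [[0, 1], [1, 2]]                     # object indices interacting per scene

X = rng.uniform(0, 1, size=(8, n_params))     # hypothetical past evaluations
gps = []
for objs in scenes:
    y = -np.sum((X[:, objs] - 0.5) ** 2, axis=1)   # stand-in per-scene reward
    gps.append(GaussianProcessRegressor().fit(X[:, objs], y))

def sum_ucb(x, kappa=2.0):
    total = 0.0
    for gp, objs in zip(gps, scenes):
        mu, sigma = gp.predict(x[None, objs], return_std=True)
        total += mu[0] + kappa * sigma[0]     # per-scene UCB, summed
    return total

candidates = rng.uniform(0, 1, size=(100, n_params))
print(max(candidates, key=sum_ucb))           # next parameters to simulate
```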
|
|
16:30-18:00, Paper ThCT3-CC.4 | Add to My Program |
LiDAR-Camera Extrinsic Calibration with Hierarchical and Iterative Feature Matching |
|
Hu, Xuzhong | Huazhong University of Science and Technology |
Duan, ZaiPeng | Huazhong University of Science and Technology |
Ding, Junfeng | Huazhong University of Science and Technology |
Zhang, Zhe | Huazhong University of Science and Technology |
Huang, Xiao | China Ship Development and Design Center |
Ma, Jie | Huazhong University of Science and Technology |
Keywords: Calibration and Identification, Intelligent Transportation Systems, AI-Based Methods
Abstract: In autonomous driving, the LiDAR-Camera system plays a crucial role in a vehicle's perception of 3D environments. To effectively fuse information from both camera and LiDAR, extrinsic calibration is indispensable. Recently, some researchers have proposed deep learning-based methods that utilize convolutional networks to automatically extract features from LiDAR depth images and RGB images for calibration. However, these features do not sufficiently interact during feature matching, which limits the calibration accuracy. To this end, we introduce a novel extrinsic calibration network (HIFMNet) in this paper. It establishes a comprehensive connection between camera and LiDAR features by calculating a globally-aware map-to-map cost volume and hierarchical point-to-map cost volumes. The former is used to regress large extrinsic offsets. The latter is employed to iteratively fine-tune extrinsic parameters, while the rigidity of LiDAR points is considered in each iteration to enhance regression robustness. Extensive experiments on the KITTI-odometry dataset demonstrate the superior performance of our HIFMNet compared to other state-of-the-art learning-based methods.
|
|
16:30-18:00, Paper ThCT3-CC.5 | Add to My Program |
GBEC: Geometry-Based Hand-Eye Calibration |
|
Liu, Yihao | Johns Hopkins University |
Zhang, Jiaming | Johns Hopkins University |
She, Zhangcong | Johns Hopkins University |
Kheradmand, Amir | Johns Hopkins University |
Armand, Mehran | Johns Hopkins University |
Keywords: Calibration and Identification, Medical Robots and Systems, Kinematics
Abstract: Hand-eye calibration is the problem of solving the transformation from the end-effector of a robot to the sensor attached to it. Commonly employed techniques, such as AX = XB or AX = ZB formulations, rely on regression methods that require collecting pose data from different robot configurations, which can produce low accuracy and repeatability. However, the derived transformation should solely depend on the geometry of the end-effector and the sensor attachment. We propose Geometry-Based End-Effector Calibration (GBEC) that enhances the repeatability and accuracy of the derived transformation compared to traditional hand-eye calibrations. To demonstrate improvements, we apply the approach to two different robot-assisted procedures: Transcranial Magnetic Stimulation (TMS) and femoroplasty. We also discuss the generalizability of GBEC for camera-in-hand and marker-in-hand sensor mounting methods. In the experiments, we perform GBEC between the robot end-effector and an optical tracker's rigid body marker attached to the TMS coil or femoroplasty drill guide. Previous research documents low repeatability and accuracy of the conventional methods for robot-assisted TMS hand-eye calibration. Applying GBEC to repeated calibrations, we obtain transformations with standard deviations of 0.37 mm, 0.65 mm, and 0.40 mm (translation) along the x, y, and z axes of the end-effector, respectively. The tool alignment experiments after using GBEC achieve a mean accuracy of around 0.2 mm in Euclidean distance. When compared to some existing methods, the proposed method relies solely on the geometry of the flange and the pose of the rigid-body marker, making it independent of workspace constraints or robot accuracy, without sacrificing the orthogonality of the rotation matrix. Our results validate the accuracy and applicability of the approach, providing a new and generalizable methodology for obtaining the transformation from the end-effector to a sensor.
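For context, the regression-style formulation that GBEC moves away from can be sketched as follows (a generic illustration with random stand-in motions, not the paper's method): hand-eye calibration seeks the transform X minimizing the residual of A_i X = X B_i over recorded motion pairs.

```python
# Residual of the classical AX = XB hand-eye formulation (illustrative only).
import numpy as np

def residual(X, motions):
    """Sum of Frobenius-norm errors ||A @ X - X @ B|| over motion pairs."""
    return sum(np.linalg.norm(A @ X - X @ B) for A, B in motions)

rng = np.random.default_rng(0)
X_true = np.eye(4); X_true[:3, 3] = [0.1, 0.0, 0.05]   # hypothetical hand-eye offset
motions = []
for _ in range(5):
    A = np.eye(4); A[:3, 3] = rng.normal(size=3)       # end-effector motion
    B = np.linalg.inv(X_true) @ A @ X_true             # consistent sensor motion
    motions.append((A, B))

print(residual(X_true, motions))   # ~0 at the true transform
```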
|
|
16:30-18:00, Paper ThCT3-CC.6 | Add to My Program |
A Learning-Based Approach for Estimating Inertial Properties of Unknown Objects from Encoder Discrepancies |
|
Lao, Zizhou | National University of Singapore |
Han, Yuanfeng | Johns Hopkins University |
Ma, Yunshan | National University of Singapore |
Chirikjian, Gregory | National University of Singapore |
Keywords: Calibration and Identification, Representation Learning
Abstract: Many robots utilize commercial force/torque sensors to identify inertial properties of unknown objects. However, such sensors can be difficult to apply to small-sized robots due to their weight, size, and cost. In this paper, we propose a learning-based approach for estimating the mass and center of mass (COM) of unknown objects without using force/torque sensors at the end effector or on the joints. In our method, a robot arm carries an unknown object as it moves through multiple discrete configurations. Measurements are collected when the robot reaches each discrete configuration and stops. A neural network then estimates joint torques from encoder discrepancies. Given multiple samples, we derive the closed-form relation between joint torques and the object’s inertial properties. Based on the derivation, the mass and COM of the object are identified by weighted least squares. In order to improve the accuracy of the inferred inertial properties, an attention model is designed to generate the weights used in least squares, which indicate the relative importance of each joint. Our framework requires only encoder measurements without using any force/torque sensors, but still maintains accurate estimation capability. The proposed approach has been demonstrated on a 4-degree-of-freedom (DOF) robot arm.
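The weighted least-squares identification step can be sketched as below (the regressor, weights, and data are illustrative; the paper's closed-form relation and attention-generated weights are not reproduced): stacking per-configuration gravity regressors relates the estimated torques linearly to phi = [m, m*cx, m*cy, m*cz].

```python
# Weighted least squares for mass and COM from stacked torque measurements.
import numpy as np

rng = np.random.default_rng(2)
phi_true = np.array([1.2, 0.12, -0.06, 0.24])     # [m, m*cx, m*cy, m*cz]

A = rng.normal(size=(20, 4))                      # stacked regressors A(q_k), assumed
tau = A @ phi_true + 0.01 * rng.normal(size=20)   # torques estimated from encoders
w = rng.uniform(0.5, 1.0, size=20)                # per-sample weights (attention stand-in)

W = np.diag(w)
phi_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ tau)
mass, com = phi_hat[0], phi_hat[1:] / phi_hat[0]
print(mass, com)
```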
|
|
16:30-18:00, Paper ThCT3-CC.7 | Add to My Program |
CalibFormer: A Transformer-Based Automatic LiDAR-Camera Calibration Network |
|
Xiao, Yuxuan | University of Science and Technology of China |
Li, Yao | University of Science and Technology of China |
Meng, Chengzhen | University of Science and Technology of China |
Li, XingChen | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Calibration and Identification, Sensor Fusion, Deep Learning Methods
Abstract: The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of 0.8751 cm and a mean rotation error of 0.0562° on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.
|
|
16:30-18:00, Paper ThCT3-CC.8 | Add to My Program |
Target-Free Extrinsic Calibration of Event-LiDAR Dyad Using Edge Correspondences |
|
Xing, Wanli | The University of Hong Kong |
Lin, Shijie | The University of Hong Kong |
Yang, Lei | The University of Hong Kong |
Pan, Jia | University of Hong Kong |
Keywords: Calibration and Identification, Sensor Fusion, Range Sensing
Abstract: Calibrating the extrinsic parameters of sensory devices is crucial for fusing multi-modal data. Recently, event cameras have emerged as a promising type of neuromorphic sensors, with many potential applications in fields such as mobile robotics and autonomous driving. When combined with LiDAR, they can provide more comprehensive information about the surrounding environment. Nonetheless, due to the distinctive representation of event cameras compared to traditional frame-based cameras, calibrating them with LiDAR presents a significant challenge. In this paper, we propose a novel method to calibrate the extrinsic parameters between a dyad of an event camera and a LiDAR without the need for a calibration board or other equipment. Our approach takes advantage of the fact that when an event camera is in motion, changes in reflectivity and geometric edges in the environment trigger numerous events, which can also be captured by LiDAR. Our proposed method leverages the edges extracted from events and point clouds and correlates them to estimate extrinsic parameters. Experimental results demonstrate that our proposed method is highly robust and effective in various scenes.
|
|
16:30-18:00, Paper ThCT3-CC.9 | Add to My Program |
LB-R2R-Calib: Accurate and Robust Extrinsic Calibration of Multiple Long Baseline 4D Imaging Radars for V2X |
|
Zhang, Jun | Nanyang Technological University |
Yang, Zihan | Nanyang Technological University |
Zhang, Fangwei | Nanyang Technological University |
Wu, Zhenyu | Nanyang Technological University |
Peng, Guohao | Nanyang Technological University |
Liu, Yiyao | Nanyang Technological University |
Lyu, Qiyang | Nanyang Technological University |
Wen, Mingxing | China-Singapore International Joint Research Center |
Wang, Danwei | Nanyang Technological University |
Keywords: Calibration and Identification, Sensor Networks, Intelligent Transportation Systems
Abstract: As a new sensor, 4D radar (x, y, z, velocity) has great potential for V2X, due to its 3D point cloud, direct Doppler velocity output, long-distance ranging, low cost, and, more importantly, robust perception in all weathers. However, the extrinsic calibration of multiple long-baseline 4D radars, which is the key to fusing multiple radars, is rarely researched in V2X. The main reasons are threefold: (1) New sensor. Thus, it is not surprising that little related work can be found. (2) Long baseline and large viewpoint-difference. Current works mainly focus on unmanned vehicles, which involve short baselines and small viewpoint-differences. (3) Sparse, noisy, and very cluttered 4D radar point clouds. Thus, it is challenging to rapidly and accurately locate the target and extract the feature. In this paper, LB-R2R-Calib (Long Baseline Radar to Radar extrinsic Calibration) is proposed to address these problems. The novelties are: (1) A new target is introduced: an eight-quadrant corner reflector enclosed by a foam sphere. The benefit is that the target center is a viewpoint-invariant feature; thus, it is ideal for large viewpoint-difference calibration. (2) A new feature extraction algorithm is proposed to rapidly locate the target and extract the target center from a very cluttered point cloud, based on some important characteristics of 4D radar we observed. Experiments with two 4D radars in real environments with four configurations demonstrate that our method is highly accurate and robust.
|
|
ThCT4-CC Oral Session, CC-315 |
Add to My Program |
Cooperating Cellular Robots |
|
|
Co-Chair: Wen, John | Rensselaer Polytechnic Institute |
|
16:30-18:00, Paper ThCT4-CC.1 | Add to My Program |
AirTwins: Modular Bi-Copters Capable of Splitting from Their Combined Quadcopter in Midair |
|
Li, Song | Beihang University |
Liu, Fangyuan | Beihang University |
Gao, Yuzhe | Beihang University |
Xiang, Jinwu | Beihang University |
Tu, Zhan | Beihang University |
Li, Daochun | Beihang University |
Keywords: Cellular and Modular Robots, Aerial Systems: Mechanics and Control, Mechanism Design
Abstract: Micro tandem bi-copters are capable of passing through narrow gaps owing to their particularly slender shape. However, the introduction of the tilting servo motors leads to non-minimum-phase roll dynamics, which affects their flight stability when exploring environments with unpredictable disturbances. In this paper, we propose and design a re-configurable aerial platform consisting of two modular bi-copters with an undocking mechanism. In the combined configuration, a crossover docking approach is employed to compensate for the poor stability of each bi-copter's servo-controlled attitude. In the bi-copter configuration, the minimum size of the system (equal to the width of the ideal passable gap) is reduced by 58% through mid-air separation. In detail, to compare the attitude responses of the two configurations, a dynamic model considering servo response and non-minimum-phase behavior is established and simulated, and in-flight poking experiments were also conducted on each configuration. In addition, the performance of the single bi-copter, including trajectory tracking and passing through narrow gaps, was demonstrated through flight tests. Finally, the feasibility of the undocking mechanism was verified by mid-air separation experiments. The proposed system is promising for scenarios containing both complex perturbations and confined spaces, while also having the potential to improve exploration efficiency through collaborative work.
|
|
16:30-18:00, Paper ThCT4-CC.2 | Add to My Program |
ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch |
|
Xue, Zhengrong | Tsinghua University |
Zhang, Han | Tsinghua University, Shanghai Qi Zhi Institute |
Jingwen, Cheng | Tsinghua University |
He, Zhengmao | Shanghai Qi Zhi Institute |
Ju, Yuanchen | Southwest University |
Lin, Changyi | Carnegie Mellon University |
Zhang, Gu | Shanghai Jiaotong University |
Xu, Huazhe | Tsinghua University |
Keywords: Cellular and Modular Robots, Deep Learning in Grasping and Manipulation, Force and Tactile Sensing
Abstract: We present ArrayBot, a distributed manipulation system consisting of a 16x16 array of vertically sliding pillars integrated with tactile sensors. Functionally, ArrayBot is designed to simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Intriguingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also have the ability to transfer to the physical robot without any sim-to-real fine-tuning. Leveraging the deployed policy, we derive more real-world manipulation skills on ArrayBot to further illustrate the distinctive merits of our proposed system.
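A sketch of the frequency-domain action reshaping described above (not the released code; the number of retained frequencies is an assumption): the policy outputs only a small block of low-frequency DCT coefficients, which is inverse-transformed into the full 16x16 pillar command.

```python
# Low-frequency DCT action space for a 16x16 actuator array (illustrative).
import numpy as np
from scipy.fft import idctn

k = 4                                                       # assumed low frequencies kept
coeffs_low = np.random.default_rng(0).normal(size=(k, k))   # raw policy output
coeffs = np.zeros((16, 16))
coeffs[:k, :k] = coeffs_low                                 # embed into full spectrum
action = idctn(coeffs, norm="ortho")                        # smooth 16x16 pattern
print(action.shape)
```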
|
|
16:30-18:00, Paper ThCT4-CC.3 | Add to My Program |
Optimizing Modular Robot Composition: A Lexicographic Genetic Algorithm Approach |
|
Külz, Jonathan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Cellular and Modular Robots, Methods and Tools for Robot System Design, Mechanism Design
Abstract: Industrial robots are designed as general-purpose hardware with limited ability to adapt to changing task requirements or environments. Modular robots, on the other hand, offer flexibility and can be easily customized to suit diverse needs. The morphology, i.e., the form and structure of a robot, significantly impacts the primary performance metrics: acquisition cost, cycle time, and energy efficiency. However, identifying an optimal module composition for a specific task remains an open problem, presenting a substantial hurdle in developing task-tailored modular robots. Previous approaches either lack adequate exploration of the design space or the ability to adapt to complex tasks. We propose combining a genetic algorithm with a lexicographic evaluation of solution candidates to overcome this problem and navigate search spaces that exceed those in prior work by orders of magnitude in the number of possible compositions. We demonstrate that our approach outperforms a state-of-the-art baseline and is able to synthesize modular robots for industrial tasks in cluttered environments.
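The lexicographic evaluation can be pictured as follows (a minimal sketch with made-up objectives, not the paper's fitness function): candidates are compared objective by objective, so cycle time only breaks ties among equally cheap, feasible designs.

```python
# Lexicographic ranking of module compositions in a GA selection step.
def objectives(robot):
    # hypothetical (task_violations, acquisition_cost, cycle_time) per candidate
    return (robot["violations"], robot["cost"], robot["cycle_time"])

population = [
    {"violations": 0, "cost": 5, "cycle_time": 2.0},
    {"violations": 0, "cost": 5, "cycle_time": 1.5},   # wins the tie on cycle time
    {"violations": 1, "cost": 1, "cycle_time": 0.5},   # cheap but infeasible: ranked last
]
parents = sorted(population, key=objectives)[:2]       # tuple comparison = lexicographic
print(parents)
```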
|
|
16:30-18:00, Paper ThCT4-CC.4 | Add to My Program |
WiBot 1.0: A Modular Reconfigurable Glass Cleaning Robot for High-Rise Buildings |
|
Akalanka, Sudheera | University of Moratuwa |
Sandeepa, Harith | University of Moratuwa |
Athauda Pathirana, Manu | University of Moratuwa |
Amarasinghe, Ranjith | University of Moratuwa |
Jayasekara, A.G.B.P. | University of Moratuwa |
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Cellular and Modular Robots, Task and Motion Planning, Climbing Robots
Abstract: Cleaning glass surfaces is a prevailing maintenance problem in high-rise buildings. In the traditional methods of cleaning windows, hanging on ropes poses significant occupational hazards to workers. Furthermore, most glass facades feature window frames to securely fasten the glass panels to the building structure, ensuring durability and elegance. In this context, existing robotic cleaning methods are limited in their capability to move over window frames and lack the flexibility to access tight corners and curved surfaces. This paper presents a novel reconfigurable glass cleaning robot called "WiBot" to address these limitations. WiBot is a kinematic chain comprising modular linkages with a prismatic joint and two revolute joints at each end. Each revolute joint has a suction unit that enables locomotion and adhesion. Window frames are detected using image processing with an onboard camera, and design optimizations were performed to improve the robot's capabilities. The prototype WiBot 1.0 was developed, and several experiments were conducted to evaluate the feasibility of the proposed system, focusing on robot motion, window-frame detection, and the move-over mechanism. The results show that WiBot can overcome the limitations of existing window cleaning solutions. Finally, several promising research directions involving the proposed reconfigurable robot architecture in cleaning operations are outlined.
|
|
16:30-18:00, Paper ThCT4-CC.5 | Add to My Program |
Collaborative Manipulation of Deformable Objects with Predictive Obstacle Avoidance |
|
Aksoy, Burak | Rensselaer Polytechnic Institute |
Wen, John | Rensselaer Polytechnic Institute |
Keywords: Cooperating Robots, Collision Avoidance, Simulation and Animation
Abstract: The manipulation of deformable objects arises in daily life and in numerous applications. Despite phenomenal advances in industrial robotics, manipulation of deformable objects remains mostly a manual task, because of their high number of internal degrees of freedom and the complexity of predicting their motion. In this paper, we apply the computationally efficient position-based dynamics method to predict object motion and distance to obstacles. This distance is incorporated in a control barrier function for the resolved-motion kinematic control of one or more robots, which adjust their motion to avoid colliding with the obstacles. The controller has been applied in simulations to 1D and 2D deformable objects with varying numbers of assistant agents, demonstrating its versatility across different object types and multi-agent systems. Results indicate the feasibility of real-time collision avoidance through deformable object simulation, minimizing path tracking error while maintaining a predefined minimum distance from obstacles and preventing overstretching of the deformable object. The implementation is performed in ROS, allowing ready portability to different applications.
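The barrier-constrained velocity adjustment can be sketched as a small QP (a toy 2D example, not the paper's controller; the barrier gain and geometry are assumptions): the commanded velocity is minimally modified so the obstacle distance h satisfies dh/dt >= -alpha * h.

```python
# Control-barrier-function QP deflecting a nominal velocity away from an obstacle.
import numpy as np
import cvxpy as cp

p = np.array([0.0, 0.0])               # controlled point (e.g., on the object mesh)
obstacle = np.array([0.5, 0.0])
d_min = 0.2
u_des = np.array([1.0, 0.0])           # nominal velocity, heading at the obstacle

h = np.linalg.norm(p - obstacle) - d_min          # barrier value (from simulation)
grad_h = (p - obstacle) / np.linalg.norm(p - obstacle)
alpha = 2.0

u = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_des)),
                  [grad_h @ u >= -alpha * h])     # CBF condition: h stays nonnegative
prob.solve()
print(u.value)                                    # velocity capped toward the obstacle
```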
|
|
16:30-18:00, Paper ThCT4-CC.6 | Add to My Program |
D-Lite: Navigation-Oriented Compression of 3D Scene Graphs for Multi-Robot Collaboration |
|
Chang, Yun | MIT |
Ballotta, Luca | Delft University of Technology |
Carlone, Luca | Massachusetts Institute of Technology |
Keywords: Cooperating Robots, Multi-Robot Systems, Motion and Path Planning
Abstract: For a multi-robot team that collaboratively explores an unknown environment, it is of vital importance that the collected information is efficiently shared among robots in order to support exploration and navigation tasks. Practical constraints of wireless channels, such as limited bandwidth, urge robots to carefully select information to be transmitted. In this paper, we consider the case where environmental information is modeled using a 3D Scene Graph, a hierarchical map representation that describes both geometric and semantic aspects of the environment. Then, we leverage graph-theoretic tools, namely graph spanners, to design greedy algorithms that efficiently compress 3D Scene Graphs with the aim of enabling communication between robots under bandwidth constraints. Our compression algorithms are navigation-oriented in that they are designed to approximately preserve shortest paths between locations of interest, while meeting a user-specified communication budget constraint. The effectiveness of the proposed algorithms is demonstrated in synthetic robot navigation experiments in a realistic simulator.
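The classical greedy spanner that such graph-theoretic tools build on can be sketched compactly (the budget-aware, navigation-oriented variants in the paper add more machinery; graph and stretch factor t here are arbitrary):

```python
# Greedy t-spanner: keep an edge only if the spanner's current detour is too long.
import networkx as nx

G = nx.gnm_random_graph(12, 40, seed=0)
for u, v in G.edges:
    G[u][v]["weight"] = 1.0 + ((u * v) % 5)       # arbitrary edge weights

t = 2.0                                           # assumed stretch factor
spanner = nx.Graph()
spanner.add_nodes_from(G)
for u, v, data in sorted(G.edges(data=True), key=lambda e: e[2]["weight"]):
    try:
        d = nx.shortest_path_length(spanner, u, v, weight="weight")
    except nx.NetworkXNoPath:
        d = float("inf")
    if d > t * data["weight"]:                    # path too long: edge is needed
        spanner.add_edge(u, v, weight=data["weight"])

print(G.number_of_edges(), "->", spanner.number_of_edges())
```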
|
|
16:30-18:00, Paper ThCT4-CC.7 | Add to My Program |
ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation |
|
Li, Zhehan | Zhejiang University |
Mao, Rui | Sun Yat-Sen University |
Chen, Nanhe | Zhejiang University |
Xu, Chao | Zhejiang University |
Gao, Fei | Zhejiang University |
Cao, Yanjun | Zhejiang University, Huzhou Institute of Zhejiang University |
Keywords: Cooperating Robots, Multi-Robot Systems, Planning, Scheduling and Coordination
Abstract: Perception is necessary for autonomous navigation in an unknown area crowded with obstacles. It is challenging for a robot without any sensors that can perceive the environment, i.e., a blind robot, to navigate safely, and the problem becomes even more difficult for a group of such robots. However, it could be costly to equip all robots with expensive perception or SLAM systems. In this paper, we propose a novel system named ColAG to solve the problem of autonomous navigation for a group of blind UGVs by introducing cooperation with one UAV, which is the only robot in the group with full perception capabilities. The UAV uses SLAM for its odometry and mapping while sharing this information with the UGVs via limited relative pose estimation. The UGVs plan their trajectories in the received map and predict possible failures caused by the uncertainty of their wheel odometry and unknown risky areas. The UAV dynamically schedules waypoints to prevent UGVs from collisions, formulated as a Vehicle Routing Problem with Time Windows to optimize the UAV’s trajectories and minimize the time UGVs have to wait to guarantee safety. We validate our system through extensive simulations with 7 UGVs and real-world experiments with 3 UGVs.
|
|
16:30-18:00, Paper ThCT4-CC.8 | Add to My Program |
GRF-Based Predictive Flocking Control with Dynamic Pattern Formation |
|
Yu, Chenghao | Sun Yat-Sen University |
Zhang, Dengyu | Sun Yat-Sen University |
Zhang, Qingrui | Sun Yat-Sen University |
Keywords: Cooperating Robots, Swarm Robotics, Multi-Robot Systems
Abstract: It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in an optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used to characterize "robot-robot" and "robot-environment" interactions. Specialized performance-related energies, e.g., motion smoothness, are introduced in the proposed design to improve the flocking behaviors. The optimal control is obtained by maximizing a posterior distribution of a GRF. A region-based shape control is accomplished for pattern formation in light of a mean-shift technique. The proposed algorithm is evaluated via comparison with two state-of-the-art flocking control methods in an environment with obstacles. Both numerical simulations and real-world experiments are conducted to demonstrate the efficiency of the proposed design.
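A toy rendering of GRF-style action selection (our simplified reading, not the paper's algorithm; all potentials and weights are made up): candidate velocities are scored by a sum of potential energies, and the minimizer, i.e., the MAP action under the Gibbs posterior, is executed.

```python
# Energy-minimizing (MAP) velocity selection under a toy Gibbs random field.
import numpy as np

rng = np.random.default_rng(3)
p_self, p_neighbor, p_obst = np.zeros(2), np.array([0.4, 0.0]), np.array([0.0, 0.6])
v_prev = np.array([0.2, 0.2])
dt, d0 = 0.1, 0.5                                  # step size, desired spacing

def energy(v):
    p_next = p_self + dt * v
    e_rr = (np.linalg.norm(p_next - p_neighbor) - d0) ** 2   # robot-robot spacing
    e_ro = 1.0 / max(np.linalg.norm(p_next - p_obst), 1e-3)  # robot-environment repulsion
    e_smooth = np.linalg.norm(v - v_prev) ** 2               # motion smoothness
    return e_rr + 0.1 * e_ro + 0.5 * e_smooth

candidates = rng.uniform(-1, 1, size=(200, 2))
print(min(candidates, key=energy))                 # maximizes exp(-energy)
```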
|
|
ThCT5-CC Oral Session, CC-411 |
Add to My Program |
Visual Learning III |
|
|
Chair: Ryoo, Michael S. | Google, Stony Brook University |
Co-Chair: Chen, Yingcong | The Hong Kong University of Science and Technology (Guangzhou) |
|
16:30-18:00, Paper ThCT5-CC.1 | Add to My Program |
From Bird’s-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model |
|
Xu, Xiaojie | The Hong Kong University of Science and Technology (Guangzhou) |
Xu, Tianshuo | Hong Kong University of Science and Technology (Guangzhou) |
Ma, Fulong | The Hong Kong University of Science and Technology |
Chen, Yingcong | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Computer Vision for Automation, Computer Vision for Transportation
Abstract: We explore Bird’s-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently, diffusion-based conditional image generation models have demonstrated remarkable outcomes, adept at producing diverse, high-quality, and condition-aligned results. Nonetheless, the training of these models demands substantial data and computational resources. Hence, exploring methods to fine-tune these advanced models, like Stable Diffusion, for specific conditional generation tasks emerges as a promising avenue. In this paper, we introduce a practical framework for generating images from a BEV layout. Our approach comprises two main components: the Neural View Transformation and the Street Image Generation. The Neural View Transformation phase converts the BEV map into aligned multi-view semantic segmentation maps by learning the shape correspondence between the BEV and perspective views. Subsequently, the Street Image Generation phase utilizes these segmentations as a condition to guide a fine-tuned latent diffusion model. This fine-tuning process ensures both view and style consistency. Our model leverages the generative capacity of large pretrained diffusion models within traffic contexts, effectively yielding diverse and condition-coherent street view images.
|
|
16:30-18:00, Paper ThCT5-CC.2 | Add to My Program |
Lightning NeRF: Efficient Hybrid Scene Representation for Autonomous Driving |
|
Cao, Junyi | Shanghai Jiao Tong University |
Li, Zhichao | Tusimple.ai |
Wang, Naiyan | TuSimple |
Ma, Chao | Shanghai Jiao Tong University |
Keywords: Visual Learning, Computer Vision for Automation, Simulation and Animation
Abstract: Recent studies have highlighted the promising application of NeRF in autonomous driving contexts. However, the complexity of outdoor environments, combined with the restricted viewpoints in driving scenarios, complicates the task of precisely reconstructing scene geometry. Such challenges often lead to diminished quality in reconstructions and extended durations for both training and rendering. To tackle these challenges, we present Lightning NeRF. It uses an efficient hybrid scene representation that effectively utilizes the geometry prior from LiDAR in autonomous driving scenarios. Lightning NeRF significantly improves the novel view synthesis performance of NeRF and reduces computational overheads. Through evaluations on real-world datasets, such as KITTI-360, Argoverse2, and our private dataset, we demonstrate that our approach not only exceeds the current state-of-the-art in novel view synthesis quality but also achieves a five-fold increase in training speed and a ten-fold improvement in rendering speed. Codes are available at https://github.com/VISION-SJTU/Lightning-NeRF.
|
|
16:30-18:00, Paper ThCT5-CC.3 | Add to My Program |
Physical Priors Augmented Event-Based 3D Reconstruction |
|
Wang, Jiaxu | Hong Kong University of Science and Technology (Guangzhou) |
He, Junhao | The Hong Kong University of Science and Technology (Guangzhou) |
Zhang, Ziyi | Hong Kong University of Science and Technology (Guangzhou) |
Xu, Renjing | The Hong Kong University of Science and Technology (Guangzhou) |
Keywords: Visual Learning, Data Sets for Robotic Vision, Representation Learning
Abstract: 3D neural implicit representations play a significant role in many robotic applications. However, reconstructing neural radiance fields (NeRF) from realistic event data remains a challenge due to the sparsity and the lack of information when only event streams are available. In this paper, we utilize motion, geometry, and density priors behind event data to impose strong physical constraints to augment NeRF training. The proposed novel pipeline can directly benefit from those priors to reconstruct 3D scenes without additional inputs. Moreover, we present a novel density-guided patch-based sampling strategy for robust and efficient learning, which not only accelerates training procedures but also improves the expression of local geometries. More importantly, we establish the first large dataset for event-based 3D reconstruction, which contains 101 objects with various materials and geometries, along with ground-truth images and depth maps for all camera viewpoints, significantly facilitating other research in the related fields. The code and dataset will be publicly available at https://github.com/Zerory1/Ev3D.
|
|
16:30-18:00, Paper ThCT5-CC.4 | Add to My Program |
SLAM Based on Camera-2D LiDAR Fusion |
|
Lu, Guoyu | University of Georgia |
Keywords: Visual Learning, Deep Learning for Visual Perception, Deep Learning Methods
Abstract: The SLAM system plays a pivotal role in robotic mapping and localization, leveraging various sensor technologies to achieve precision. Traditional passive sensors, such as RGB cameras, offer high-resolution imagery at a lower cost for SLAM applications, yet they fall short in accurately estimating 3D positions and camera orientations. On the other hand, LiDARs excel in generating accurate 3D maps but often come at a higher price and lower resolution. While active illumination sensors like LiDAR provide precise depth estimation, the prohibitive cost of high-resolution LiDAR systems restricts their widespread adoption across diverse applications. Although single-beam LiDAR is more affordable, its limited depth sensing capability hampers comprehensive environmental perception. Addressing these limitations, this study introduces a deep learning framework aimed at enhancing SLAM performance through the strategic fusion of camera and 2D LiDAR data. Our approach employs a novel self-supervised network alongside an economical single-beam LiDAR, striving to achieve or surpass the performance of more expensive LiDAR systems. The integration of single-beam LiDAR with our system allows for dynamic adjustment of scale uncertainty in depth maps generated by monocular camera systems within SLAM. Consequently, this fusion method enjoys the high-resolution and accuracy benefits of advanced LiDAR systems with the cost-effectiveness of single-beam LiDAR technology. Through this innovative combination, we demonstrate a SLAM system that not only maintains high fidelity in mapping and localization but also ensures affordability and broad applicability.
|
|
16:30-18:00, Paper ThCT5-CC.5 | Add to My Program |
NeRF-Enhanced Outpainting for Faithful Field-Of-View Extrapolation |
|
Yu, Rui | University of Louisville |
Liu, Jiachen | Pennsylvania State University |
Zhou, Zihan | Manycore Tech Inc |
Huang, Sharon X. | The Pennsylvania State University |
Keywords: Visual Learning, Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: In various applications, such as robotic navigation and remote visual assistance, expanding the field of view (FOV) of the camera proves beneficial for enhancing environmental perception. Unlike image outpainting techniques aimed solely at generating aesthetically pleasing visuals, these applications demand an extended view that faithfully represents the scene. To achieve this, we formulate a new problem of faithful FOV extrapolation that utilizes a set of pre-captured images as prior knowledge of the scene. To address this problem, we present a simple yet effective solution called NeRF-Enhanced Outpainting (NEO) that uses extended-FOV images generated through NeRF to train a scene-specific image outpainting model. To assess the performance of NEO, we conduct comprehensive evaluations on three photorealistic datasets and one real-world dataset. Extensive experiments on the benchmark datasets showcase the robustness and potential of our method in addressing this challenge. We believe our work lays a strong foundation for future exploration within the research community.
|
|
16:30-18:00, Paper ThCT5-CC.6 | Add to My Program |
DL-PoseNet: A Differential Lightweight Network for Pose Regression Over SE(3) |
|
Li, Wenjie | Nanjing University |
Liu, Jia | Nanjing University |
Wang, Yanyan | Hohai University |
Ren, Dayong | Nanjing University |
Hao, Wei | Nanjing University |
Chen, Lijun | Nanjing University |
Keywords: Visual Learning, Deep Learning Methods, Localization
Abstract: Accurate pose estimation over SE(3) is fundamentally crucial for numerous perception tasks, including camera re-localization. While existing learning-based methods that estimate pose from a series of RGB images have significantly improved accuracy, the majority of models still face one or two limitations. First, few representations of SE(3) are smooth and differentiable, making them difficult to apply in deep learning frameworks. Second, they often require high computational resources due to complex deep network designs. In this paper, we propose DL-PoseNet to address these issues. Specifically, we present a novel representation of SE(3) which preserves the smoothness of the pose. We then design a lightweight neural network to regress the pose by developing a differential pose layer. Finally, we introduce a novel loss function and gradient descent method to better supervise the proposed lightweight pose network. Extensive experiments on the camera re-localization task on the Cambridge Landmarks and 7-Scenes datasets demonstrate the superior predictive accuracy and benefits of our method in comparison with the state-of-the-art.
|
|
16:30-18:00, Paper ThCT5-CC.7 | Add to My Program |
Crossway Diffusion: Improving Diffusion-Based Visuomotor Policy Via Self-Supervised Learning |
|
Li, Xiang | Stony Brook University |
Belagali, Varun | Stony Brook University |
Shang, Jinghuan | Stony Brook University |
Ryoo, Michael S. | Google, Stony Brook University |
Keywords: Visual Learning, Imitation Learning, Representation Learning
Abstract: Diffusion models have been adopted for behavioral cloning in a sequence modeling fashion, benefiting from their exceptional capabilities in modeling complex data distributions. The standard diffusion-based policy iteratively denoises action sequences from random noise conditioned on the input states and the model is typically trained with a singular diffusion loss. This paper explores the potential enhancements in such models when the denoising process is informed by a better visual representation. We study the scenario where the model is jointly optimized using the standard diffusion loss alongside an auxiliary objective based on self-supervised learning. After experimenting with various objectives, we introduce Crossway Diffusion, a simple yet effective way to enhance diffusion-based visuomotor policy learning via a state decoder and an auxiliary reconstruction objective. During training, the state decoder reconstructs raw image pixels and other states from the intermediate representations of the model. Experiments demonstrate the effectiveness of our method in various simulated and real-world tasks, confirming its consistent advantages over the standard diffusion-based policy and other baselines.
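The joint objective can be schematized as follows (shapes and modules are placeholders, not the released Crossway Diffusion code; the 0.1 weight is an assumption): the standard denoising loss is combined with an auxiliary reconstruction loss from a state decoder applied to the model's intermediate representation.

```python
# Diffusion loss plus auxiliary state-reconstruction loss (schematic stand-ins).
import torch
import torch.nn as nn

enc = nn.Linear(32, 16)            # stand-in visual encoder -> intermediate repr.
denoiser = nn.Linear(16 + 8, 8)    # stand-in noise predictor for 8-D actions
decoder = nn.Linear(16, 32)        # state decoder reconstructing the observation

obs = torch.randn(4, 32)
noisy_action, noise = torch.randn(4, 8), torch.randn(4, 8)

z = enc(obs)
noise_pred = denoiser(torch.cat([z, noisy_action], dim=-1))
loss_diff = nn.functional.mse_loss(noise_pred, noise)      # standard diffusion loss
loss_recon = nn.functional.mse_loss(decoder(z), obs)       # auxiliary SSL objective
loss = loss_diff + 0.1 * loss_recon                        # assumed weighting
loss.backward()
print(float(loss))
```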
|
|
16:30-18:00, Paper ThCT5-CC.8 | Add to My Program |
Bi-KVIL: Keypoints-Based Visual Imitation Learning of Bimanual Manipulation Tasks |
|
Gao, Jianfeng | Karlsruhe Institute of Technology (KIT) |
Jin, Xiaoshu | Karlrsuhe Institute of Technology |
Krebs, Franziska | Karlsruhe Institute of Technology (KIT) |
Jaquier, Noémie | Karlsruhe Institute of Technology (KIT) |
Asfour, Tamim | Karlsruhe Institute of Technology (KIT) |
Keywords: Visual Learning, Learning from Demonstration, Bimanual Manipulation
Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, thus generalizing well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.
|
|
16:30-18:00, Paper ThCT5-CC.9 | Add to My Program |
Neural Radiance Fields for Unbounded Lunar Surface Scene |
|
Zhang, Xu | Beihang University |
Cui, Linyan | Beihang Universitity |
Yin, Jihao | Beihang University |
Keywords: Visual Learning, Mapping
Abstract: Accurate understanding of lunar surface topography is vital for effective decision-making and remote control of lunar rovers during exploration missions. Conventional sensing methods often struggle to capture the intricate details of the lunar landscape. In response, we propose an innovative approach that leverages NeRF to synthesize new viewpoints within the expansive lunar environment. By blending 3D hash-grid and 2D plane-grid representations, our approach provides a comprehensive scene representation. We employ spiral sampling and feature rendering to enhance rendering quality while simultaneously reducing training time. Additionally, we leverage sparse point clouds to help the model better learn the geometric structure of the lunar environment. Through experimentation, we have demonstrated that our method is capable of synthesizing realistic images of lunar environments.
|
|
ThCT6-CC Oral Session, CC-414 |
Add to My Program |
Simulation and Animation |
|
|
Chair: Urbann, Oliver | Fraunhofer IML |
Co-Chair: Uno, Kentaro | Tohoku University |
|
16:30-18:00, Paper ThCT6-CC.1 | Add to My Program |
A Convex Formulation of Frictional Contact between Rigid and Deformable Bodies |
|
Han, Xuchen | Toyota Research Institute |
Masterjohn, Joseph | Toyota Research Institute |
Castro, Alejandro | Toyota Research Institute |
Keywords: Simulation and Animation, Contact Modeling, Modeling, Control, and Learning for Soft Robots
Abstract: We present a novel convex formulation that models rigid and deformable bodies coupled through frictional contact. The formulation incorporates a new corotational material model with positive semi-definite Hessian, which allows us to extend our previous work on the convex formulation of compliant contact to model large body deformations. We rigorously characterize our approximations and present implementation details. With proven global convergence, effective warm-start, the ability to take large time steps, and specialized sparse algebra, our method runs robustly at interactive rates. We provide validation results and performance metrics on challenging simulations relevant to robotics applications. Our method is made available in the open-source robotics toolkit Drake.
|
|
16:30-18:00, Paper ThCT6-CC.2 | Add to My Program |
SocialGAIL: Faithful Crowd Simulation for Social Robot Navigation |
|
Ling, Bo | Southeast University |
Lyu, Yan | Southeast University |
Li, Dongxiao | Southeast University |
Gao, Guanyu | Nanjing University of Science and Technology |
Shi, Yi | Southeast University |
Xu, Xueyong | North Information Control Research Academy Group Co., Ltd |
Wu, Weiwei | Southeast University |
Keywords: Imitation Learning, Motion and Path Planning
Abstract: Navigation through crowded human environments is challenging for social robots. While reinforcement learning has been adopted for its capacity to capture complex interactions, the training process often relies on simulators to replicate realistic crowd behaviors, ensuring cost-efficiency. Existing crowd simulation methods typically rely on either handcrafted rules, which may lead to overly aggressive navigation, or learning from human trajectory demonstrations, which can be challenging to generalize effectively. In this paper, we introduce a data-driven crowd simulation method called SocialGAIL, which leverages Generative Adversarial Imitation Learning (GAIL) to emulate real pedestrian navigation in crowded environments. SocialGAIL utilizes an attention-based graph neural network to encode observations and employs a generator-discriminator architecture to closely mimic pedestrian behavior. We also propose a set of metrics to evaluate the faithfulness of crowd simulation. Experimental results demonstrate that SocialGAIL outperforms baseline methods in terms of goal-reaching, intermediate state faithfulness, trajectory faithfulness, and adherence to global trajectory patterns.
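For background, the saddle-point objective that this family of methods optimizes (following Ho and Ermon, 2016) is shown below, with policy π, expert demonstrations π_E, discriminator D, and causal-entropy regularizer H; the abstract does not give SocialGAIL's exact losses, so this is generic GAIL background rather than the paper's formulation:

$$ \min_{\pi}\,\max_{D}\; \mathbb{E}_{\pi}\big[\log D(s,a)\big] + \mathbb{E}_{\pi_E}\big[\log\big(1-D(s,a)\big)\big] - \lambda H(\pi) $$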
|
|
16:30-18:00, Paper ThCT6-CC.3 | Add to My Program |
MuRoSim – a Fast and Efficient Multi-Robot Simulation for Learning-Based Navigation |
|
Jestel, Christian | Fraunhofer IML |
Rösner, Karol | Fraunhofer IML |
Dietz, Niklas | Fraunhofer IML |
Bach, Nicolas | Fraunhofer IML |
Eßer, Julian | Fraunhofer IML |
Finke, Jan | Fraunhofer IML |
Urbann, Oliver | Fraunhofer IML |
Keywords: Simulation and Animation, Reinforcement Learning, Multi-Robot Systems
Abstract: Multi-robot navigation and dynamic obstacle avoidance are challenging problems in robot learning. Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated great potential in this area. Nonetheless, they often face challenges related to low sample efficiency. To overcome this challenge, some research proposes simulators that incorporate hardware acceleration. Although these simulators improve efficiency, they often lack the flexibility to generate the diverse learning scenarios needed in multi-robot settings, where different environments have varying numbers of agents. In this paper, we introduce MuRoSim, a multi-robot simulation for lidar-based navigation specifically designed for DRL applications. Due to its high level of abstraction, complete implementation in C++, and rigorous thread-pool utilization, MuRoSim achieves high computational performance. We apply MuRoSim to train navigation policies for omnidirectional mobile robots equipped with lidar sensors using DRL. Finally, we conduct extensive sim-to-real experiments to confirm the realism of the simulator, deploying the learned policy for dynamic navigation with up to six robots in numerous real-world experiments.
|
|
16:30-18:00, Paper ThCT6-CC.4 | Add to My Program |
STARK: A Unified Framework for Strongly Coupled Simulation of Rigid and Deformable Bodies with Frictional Contact |
|
Fernández-Fernández, José Antonio | RWTH Aachen University |
Lange, Ralph | Robert Bosch GmbH |
Laible, Stefan | University of Tuebingen |
Arras, Kai Oliver | University of Stuttgart |
Bender, Jan | RWTH Aachen University |
Keywords: Simulation and Animation
Abstract: The use of simulation in robotics is increasingly widespread for the purposes of testing, synthetic data generation and skill learning. A relevant aspect of simulation for a variety of robot applications is physics-based simulation of robot-object interactions. This involves the challenge of accurately modeling and implementing different mechanical systems such as rigid and deformable bodies, as well as their interactions via constraints, contact or friction. Most state-of-the-art physics engines commonly used in robotics either cannot couple deformable and rigid bodies in the same framework, lack important systems such as cloth or shells, have stability issues in complex friction-dominated setups, or cannot robustly prevent penetrations. In this paper, we propose a framework for strongly coupled simulation of rigid and deformable bodies with a focus on usability, stability, robustness and easy access to state-of-the-art deformation and frictional contact models. Our system uses the Finite Element Method (FEM) to model deformable solids, the Incremental Potential Contact (IPC) approach for frictional contact, and a robust second-order optimizer to ensure stable and penetration-free solutions to tight tolerances. It is a general-purpose framework, not tied to a particular use case such as grasping or learning; it is written in C++ and comes with a Python interface. We demonstrate our system’s ability to reproduce complex real-world experiments where a mobile vacuum robot interacts with a towel on different floor types and towel geometries. Our system is able to reproduce 100% of the qualitative outcomes observed in the laboratory environment. The simulation pipeline, named Stark (the German word for strong, as in strong coupling), is made open-source.
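For context, the IPC approach referenced above replaces hard non-penetration constraints with a smooth log-barrier energy on each contact distance d, active only below a threshold d̂; this is the canonical form from the IPC literature, and Stark's exact scaling may differ:

$$ b(d,\hat{d}) = \begin{cases} -\big(d-\hat{d}\big)^{2}\,\ln\!\big(d/\hat{d}\big), & 0 < d < \hat{d},\\ 0, & d \ge \hat{d}. \end{cases} $$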
|
|
16:30-18:00, Paper ThCT6-CC.5 | Add to My Program |
Hydrodynamic Interactions in Schooling Fish: Prioritizing Real Fish Kinematics Over Travelling-Wavy Undulation |
|
Chao, Li-Ming | Max Planck Institute of Animal Behavior |
Li, Liang | Max-Planck Institute of Animal Behavior |
Keywords: Simulation and Animation, Biomimetics, Biologically-Inspired Robots
Abstract: Hydrodynamic interactions are crucial for understanding fish movement, particularly within the realm of robotic applications. Traditionally, many studies have favoured simplified travelling-wavy undulations derived from observed real fish kinematics. This approach often neglects higher-order undulations, thereby missing the subtleties of authentic fish movements. In this study, we utilised Computational Fluid Dynamics (CFD) to investigate the implications of using real fish kinematics in hydrodynamic interactions among schooling fish. We analysed two scenarios: one driven by real fish kinematics in spatiotemporal formations, and the other by travelling-wavy undulations inferred from the same real fish kinematics. Our results highlight the advantages of using real fish body kinematics for a more accurate representation of hydrodynamics in fish swimming. In contrast, the idealised travelling-wavy undulations tend to apply excessive force, displacing real fish more than expected. Additionally, the vortices and corresponding flow fields generated by real fish kinematics were found to be more stable than those arising from simplified travelling-wavy undulations. Our study underscores the significance of integrating real fish kinematics into robotic fish design and hydrodynamic studies in schooling fish.
|
|
16:30-18:00, Paper ThCT6-CC.6 | Add to My Program |
OmniLRS: A Photorealistic Simulator for Lunar Robotics |
|
Richard, Antoine | University of Luxembourg |
Kamohara, Junnosuke | Tohoku University |
Uno, Kentaro | Tohoku University |
Santra, Shreya | Tohoku University |
van der Meer, Dave | Interdisciplinary Centre for Security, Reliability and Trust - U |
Olivares-Mendez, Miguel A. | Interdisciplinary Centre for Security, Reliability and Trust - U |
Yoshida, Kazuya | Tohoku University |
Keywords: Simulation and Animation, Space Robotics and Automation, Object Detection, Segmentation and Categorization
Abstract: Developing algorithms for extra-terrestrial robotic exploration has always been challenging. Along with the complexity associated with these environments, one of the main issues remains the evaluation of said algorithms. With the regained interest in lunar exploration, there is also a demand for quality simulators that will enable the development of lunar robots. In this paper, we propose the Omniverse Lunar Robotic-Sim (OmniLRS), a photorealistic lunar simulator based on Nvidia's robotic simulator. This simulation provides fast procedural environment generation and multi-robot capabilities, along with a synthetic data pipeline for machine-learning applications. It comes with ROS1 and ROS2 bindings to control not only the robots, but also the environments. This work also performs sim-to-real rock instance segmentation to show the effectiveness of our simulator for image-based perception. Trained on our synthetic data, a YOLOv8 model achieves performance close to a model trained on real-world data, with a 5% performance gap. When finetuned with real data, the model achieves 14% higher average precision than the model trained on real-world data, demonstrating our simulator's photorealism. The code is fully open-source, accessible here: https://github.com/AntoineRichard/LunarSim, and comes with demonstrations.
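As an illustration of the kind of sim-to-real experiment described above, fine-tuning a YOLOv8 model on synthetic data takes only a few lines with the ultralytics package; the dataset config names below are hypothetical placeholders, and the paper's actual training setup is not specified in the abstract:

```python
from ultralytics import YOLO  # pip install ultralytics

# Fine-tune a pretrained segmentation checkpoint on synthetic lunar imagery
# (hypothetical dataset config "lunar_rocks_synthetic.yaml").
model = YOLO("yolov8n-seg.pt")
model.train(data="lunar_rocks_synthetic.yaml", epochs=100, imgsz=640)

# Validate on a real-image split to estimate the sim-to-real gap.
metrics = model.val(data="lunar_rocks_real.yaml")
print(metrics.seg.map)  # mask mAP50-95, per ultralytics segmentation metrics
```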
|
|
16:30-18:00, Paper ThCT6-CC.7 | Add to My Program |
SceneControl: Diffusion for Controllable Traffic Scene Generation |
|
Lu, Jack | University of Waterloo |
Wong, Kelvin | University of Toronto |
Zhang, Chris | Waabi / University of Toronto |
Suo, Simon | Waabi |
Urtasun, Raquel | University of Toronto |
Keywords: Simulation and Animation, Deep Learning Methods, Probabilistic Inference
Abstract: We consider the task of traffic scene generation. A common approach in the self-driving industry is to use manual creation to generate scenes with specific characteristics and automatic generation to generate canonical scenes at scale. However, manual creation is not scalable, and automatic generation typically uses rule-based algorithms that lack realism. In this paper, we propose SceneControl, a framework for controllable traffic scene generation. To capture the complexity of real traffic, SceneControl learns an expressive diffusion model from data. Then, using guided sampling, we can flexibly control the sampling process to generate scenes that exhibit desired characteristics. Our experiments show that SceneControl achieves greater realism and controllability than the existing state-of-the-art. We also illustrate how SceneControl can be used as a tool for interactive traffic scene generation.
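The guided-sampling idea admits a compact sketch: at each reverse-diffusion step, the denoiser's predicted mean is nudged down the gradient of a differentiable cost encoding the desired scene characteristics (e.g., collision avoidance among placed agents). This is a generic classifier-guidance-style sketch, not SceneControl's actual implementation:

```python
import torch

@torch.no_grad()
def guided_reverse_step(denoiser, x_t, t, cost_fn, scale, sigma_t):
    # denoiser(x_t, t) is assumed to return the mean of p(x_{t-1} | x_t).
    mean = denoiser(x_t, t)
    # Guidance: differentiate a scene-level cost w.r.t. the noisy sample.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        grad = torch.autograd.grad(cost_fn(x_in).sum(), x_in)[0]
    # Shift the mean against the cost gradient, then add sampling noise.
    mean = mean - scale * sigma_t**2 * grad
    return mean + sigma_t * torch.randn_like(x_t)
```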
|
|
16:30-18:00, Paper ThCT6-CC.8 | Add to My Program |
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact |
|
Yang, Gang | National University of Singapore |
Luo, Siyuan | Xi'an Jiaotong University |
Feng, Yunhai | University of California, San Diego |
Sun, Zhixin | Nanjing University |
Tie, Chenrui | Peking University |
Shao, Lin | National University of Singapore |
Keywords: Simulation and Animation, Contact Modeling
Abstract: We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as a Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt a backtracking strategy to prevent intersection between bodies with complex geometric shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to obtain valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks. Supplemental materials and videos are available on our project webpage.
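For reference, the LCP that such engines solve at each step couples the unknown contact impulses λ with the post-step separating velocities w through a matrix A (the Delassus operator) and a bias term b:

$$ \mathbf{w} = \mathbf{A}\boldsymbol{\lambda} + \mathbf{b}, \qquad \mathbf{w} \ge \mathbf{0}, \quad \boldsymbol{\lambda} \ge \mathbf{0}, \quad \mathbf{w}^{\top}\boldsymbol{\lambda} = 0, $$

which the (modified) Dantzig pivoting algorithm mentioned in the abstract solves.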
|
|
16:30-18:00, Paper ThCT6-CC.9 | Add to My Program |
Simulation Modeling of Highly Dynamic Omnidirectional Mobile Robots Based on Real-World Data |
|
Wiedemann, Marvin | Fraunhofer Institute for Material Flow and Logistics |
Ahmed, Ossama | Fraunhofer IML |
Dieckhoefer, Anna | Fraunhofer Institute for Material Flow and Logistics |
Gasoto, Renato | Worcester Polytechnic Institute, NVIDIA |
Kerner, Sören | Fraunhofer IML |
Keywords: Simulation and Animation
Abstract: Simulation is a key technology in robotics as it enables the generation of environmental data and testing scenarios for development and maintenance purposes. However, simulations are an imperfect representation of the real world, and the so-called sim-to-real gap between simulation and reality hinders the deployment of virtually developed solutions without additional effort. Modeling complex systems like highly dynamic and holonomic mobile robots presents additional complexities in simulation. This paper addresses these challenges through a case study on creating a model for a highly dynamic logistics robot. The study breaks the modeling of the whole system down to creating appropriate colliders for the rollers of a Mecanum wheel. Additionally, the impact of significant physics parameters is presented. To bridge the sim-to-real gap, a pipeline is developed that utilizes a Motion Capture system to compare the behavior of a real robot with its simulated counterpart across various motions. By leveraging expert knowledge gained from the real-world data, the simulation model is manually tuned to replicate complex system behaviors, such as sliding effects.
|
|
ThCT7-CC Oral Session, CC-416 |
Add to My Program |
Machine Learning for Robot Control II |
|
|
Chair: Berenson, Dmitry | University of Michigan |
Co-Chair: Atanasov, Nikolay | University of California, San Diego |
|
16:30-18:00, Paper ThCT7-CC.1 | Add to My Program |
Sim-To-Real Learning for Humanoid Box Loco-Manipulation |
|
Dao, Jeremy | Oregon State University |
Duan, Helei | Oregon State University |
Fern, Alan | Oregon State University |
Keywords: Machine Learning for Robot Control, Bimanual Manipulation, Legged Robots
Abstract: In this work we propose a learning-based approach to box loco-manipulation for a humanoid robot. This is a particularly challenging problem due to the need for whole-body coordination in order to lift boxes of varying weight, position, and orientation while maintaining balance. To address this challenge, we present a sim-to-real reinforcement learning approach for training general box pickup and carrying skills for the bipedal robot Digit. Our reward functions are designed to produce the desired interactions with the box while also valuing balance and gait quality. We combine the learned skills into a full system for box loco-manipulation to achieve the task of moving boxes from one table to another with a variety of sizes, weights, and initial configurations. In addition to quantitative simulation results, we demonstrate successful sim-to-real transfer on the humanoid robot Digit. To our knowledge this is the first demonstration of a learned controller for such a task on real world hardware.
|
|
16:30-18:00, Paper ThCT7-CC.2 | Add to My Program |
Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control |
|
Altawaitan, Abdullah | University of California San Diego |
Stanley, Jason | University of California, San Diego |
Ghosal, Sambaran | University of California San Diego |
Duong, Thai | University of California, San Diego |
Atanasov, Nikolay | University of California, San Diego |
Keywords: Machine Learning for Robot Control, Model Learning for Control, Wheeled Robots
Abstract: Reliable autonomous navigation requires adapting the control policy of a mobile robot in response to dynamics changes in different operational conditions. Hand-designed dynamics models may struggle to capture model variations due to a limited set of parameters. Data-driven dynamics learning approaches offer higher model capacity and better generalization but require large amounts of state-labeled data. This paper develops an approach for learning robot dynamics directly from point-cloud observations, removing the need for state estimation and its associated errors, while embedding Hamiltonian structure in the dynamics model to improve data efficiency. We design an observation-space loss that relates motion prediction from the dynamics model with motion prediction from point-cloud registration to train a Hamiltonian neural ordinary differential equation. The learned Hamiltonian model enables the design of an energy-shaping model-based tracking controller for rigid-body robots. We demonstrate dynamics learning and tracking control on a real nonholonomic wheeled robot.
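As background, embedding Hamiltonian structure means the learned model is constrained to the canonical form below, here with an input matrix g(q) for actuation; the paper's exact parameterization and point-cloud loss are not reproduced here:

$$ \dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}} + \mathbf{g}(\mathbf{q})\,\mathbf{u}, \qquad H(\mathbf{q},\mathbf{p}) = \tfrac{1}{2}\,\mathbf{p}^{\top}\mathbf{M}(\mathbf{q})^{-1}\mathbf{p} + V(\mathbf{q}). $$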
|
|
16:30-18:00, Paper ThCT7-CC.3 | Add to My Program |
Deep Model Predictive Optimization |
|
Sacks, Jacob | University of Washington |
Rana, Rwik | University of Washington |
Huang, Kevin | University of Washington |
Spitzer, Alexander | University of Washington |
Shi, Guanya | Carnegie Mellon University |
Boots, Byron | University of Washington |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Optimization and Optimal Control
Abstract: A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often under-performs the optimal strategy. This is due to model quality, myopic behavior from short planning horizons, and approximations due to computational constraints. Even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. To this end, we propose Deep Model Predictive Optimization (DMPO), which learns the inner loop of an MPC optimization algorithm directly via experience, specifically tailored to the needs of the control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking task, on which it improves performance over a baseline MPC algorithm for a given computational budget. It can outperform the best MPC algorithm by up to 27% with fewer samples and an end-to-end policy trained with MFRL by 19%. Moreover, because DMPO requires fewer samples, it can also achieve these benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind fields with an attached drag plate, DMPO can adapt zero-shot while still outperforming all baselines. Additional results can be found at https://tinyurl.com/mr2ywmnw.
|
|
16:30-18:00, Paper ThCT7-CC.4 | Add to My Program |
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving |
|
Wang, Sean J. | Carnegie Mellon University |
Zhu, Honghao | CMU |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Reinforcement Learning, Wheeled Robots, Machine Learning for Robot Control
Abstract: Autonomous off-road driving is challenging as unsafe actions may lead to catastrophic damage. As such, developing controllers in simulation is often desirable. However, robot dynamics in unstructured off-road environments can be highly complex and difficult to simulate accurately. Domain randomization addresses this problem by randomizing simulation dynamics to train policies that are robust towards modeling errors. While these policies are robust across a range of dynamics, they are sub-optimal for any particular system dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. We train a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill target system state-transition observations into a context vector, which provides an abstraction for the target dynamics. Conditioned on this, the ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware Model Predictive Path Integral controller to safely control the robot under its current understanding of dynamics. We demonstrate in simulation and in the real world that this approach enables safer behaviors upon initialization and becomes less conservative (i.e. faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach results in an approximately 41% improvement in lap-time over the non-adaptive baseline while remaining safe across different environments.
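The Model Predictive Path Integral (MPPI) update at the core of such controllers is an exponentially weighted average over sampled rollouts; below is a minimal NumPy sketch with illustrative shapes and temperature lam (the paper's risk-aware variant additionally reweights costs by a risk measure):

```python
import numpy as np

def mppi_update(nominal_u, noise, costs, lam):
    """nominal_u: (T, m) control plan; noise: (K, T, m) sampled perturbations;
    costs: (K,) rollout costs; lam: temperature of the softmin weighting."""
    beta = costs.min()                      # subtract min cost for stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()                            # normalized rollout weights
    return nominal_u + (w[:, None, None] * noise).sum(axis=0)
```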
|
|
16:30-18:00, Paper ThCT7-CC.5 | Add to My Program |
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning |
|
Luo, Jianlan | UC Berkeley |
Hu, Zheyuan | University of California, Berkeley |
Xu, Charles | University of California, Berkeley |
Tan, You Liang | Georgia Institute of Technology |
Herman Berg, Jacob | University of Washington |
Sharma, Archit | Stanford University |
Schaal, Stefan | Google X |
Finn, Chelsea | Stanford University |
Gupta, Abhishek | University of Washington |
Levine, Sergey | UC Berkeley |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Assembly
Abstract: Recent years have seen the development of many methods for robotic reinforcement learning (RL), some of which can even operate on complex image observations, run directly in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that, separately from the fundamental technical ideas, the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm that is actually used. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample-efficient modern deep RL method, together with frameworks for computing rewards and resetting the environment, high-quality controllers for a few common robots, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation in less than an hour of training per policy, comparing very favorably to state-of-the-art results reported for similar tasks in the literature. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to study new developments in robotic RL. Our code and videos can be found at https://serl-robot.github.io.
|
|
16:30-18:00, Paper ThCT7-CC.6 | Add to My Program |
Improving Out-Of-Distribution Generalization of Learned Dynamics by Learning Pseudometrics and Constraint Manifolds |
|
Lin, Yating | University of Michigan |
Chou, Glen | MIT |
Berenson, Dmitry | University of Michigan |
Keywords: Machine Learning for Robot Control
Abstract: We propose a method for improving the prediction accuracy of learned robot dynamics models on out-of-distribution (OOD) states. We achieve this by leveraging two key sources of structure often present in robot dynamics: 1) sparsity, i.e., some components of the state may not affect the dynamics, and 2) physical limits on the set of possible motions, in the form of nonholonomic constraints. Crucially, we do not assume this structure is known a priori, and instead learn it from data. We use contrastive learning to obtain a distance pseudometric that uncovers the sparsity pattern in the dynamics, and use it to reduce the input space when learning the dynamics. We then learn the unknown constraint manifold by approximating the normal space of possible motions from the data, which we use to train a Gaussian process (GP) representation of the constraint manifold. We evaluate our approach on a physical differential-drive robot and a simulated quadrotor, showing improved prediction accuracy on OOD data relative to baselines.
|
|
16:30-18:00, Paper ThCT7-CC.7 | Add to My Program |
Robotic Offline RL from Internet Videos Via Value-Function Learning |
|
Bhateja, Chethan | Stanford University |
Guo, Derek | UC Berkeley |
Ghosh, Dibya | UC Berkeley |
Singh, Anikait | Stanford University |
Tomar, Manan | University of Alberta |
Vuong, Quan | UC San Diego |
Chebotar, Yevgen | Google |
Levine, Sergey | UC Berkeley |
Kumar, Aviral | CMU |
Keywords: Machine Learning for Robot Control, Reinforcement Learning, Representation Learning
Abstract: Pre-training on Internet data has proven to be a key ingredient for broad generalization in many modern ML systems. What would it take to enable such capabilities in robotic reinforcement learning (RL)? Offline RL methods, which learn from datasets of robot experience, offer one way to leverage prior data into the robotic learning pipeline. However, these methods have a "type mismatch" with video data (such as Ego4D), which are the largest prior datasets available for robotics, since video offers observation-only experience without the action or reward annotations needed for RL methods. In this paper, we develop a system for leveraging large-scale human video datasets in robotic offline RL, based entirely on learning value functions via temporal-difference learning. We show that value learning on video datasets learns representations that are more conducive to downstream robotic offline RL than other approaches for learning from video data. Our system, called V-PTR, combines the benefits of pre-training on video data with robotic offline RL approaches that train on diverse robot data, resulting in value functions and policies for manipulation tasks that perform better, act robustly, and generalize broadly. On several manipulation tasks on a real WidowX robot and in simulated settings, our framework produces policies that greatly improve over other prior methods. Our video and additional details can be found at https://dibyaghosh.com/vptr/.
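The temporal-difference backbone of such value learning minimizes the squared Bellman error below, where V̄ is a slowly updated target network; since raw video lacks reward annotations, r_t is typically a surrogate such as a goal-reaching indicator (the paper's intent-conditioned variant adds further structure):

$$ \mathcal{L}(\theta) = \mathbb{E}_{(s_t,\, s_{t+1})}\Big[\big(V_{\theta}(s_t) - r_t - \gamma\,\bar{V}(s_{t+1})\big)^{2}\Big]. $$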
|
|
16:30-18:00, Paper ThCT7-CC.8 | Add to My Program |
Learning Manipulation of Steep Granular Slopes for Fast Mini Rover Turning |
|
Kerimoglu, Deniz | Georgia Institute of Technology |
Soto, Daniel | Georgia Institute of Technology |
Hemsley, Malone Lincoln | Morehouse College |
Brunner, Joseph | Georgia Institute of Technology |
Ha, Sehoon | Georgia Institute of Technology |
Zhang, Tingnan | Google |
Goldman, Daniel | Georgia Institute of Technology |
Keywords: Machine Learning for Robot Control, Space Robotics and Automation, Wheeled Robots
Abstract: Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet’s internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology, and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency. In this work, we used a robophysical rover model and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in a high-dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn nearly 90 degrees in just over 4 seconds with minimal downhill slip. Combining gait optimization and human-tuning approaches, we found that fast turning is empowered by creating anisotropic torques with the sweeping wheel.
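A gait search of this kind can be driven by off-the-shelf Bayesian optimization; the sketch below uses scikit-optimize with invented gait parameters and a stand-in objective, since the paper's exact parameterization is not detailed in the abstract:

```python
from skopt import gp_minimize  # pip install scikit-optimize

# Hypothetical gait parameters: sweep amplitude, wheel-spin differential, phase.
space = [(0.0, 1.0), (-1.0, 1.0), (0.0, 6.28)]

def turning_cost(params):
    amp, diff, phase = params
    # Stand-in analytic objective for illustration only; a real trial would
    # execute the gait on the rover and return -turn_angle plus a slip penalty.
    return -(amp * abs(diff)) + 0.1 * abs(phase - 3.14)

result = gp_minimize(turning_cost, space, n_calls=50, random_state=0)
print(result.x, result.fun)  # best gait parameters and best cost found
```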
|
|
16:30-18:00, Paper ThCT7-CC.9 | Add to My Program |
Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery |
|
Zhang, Xiao | Tongji University |
Zhang, Hai | Tongji University |
Zhou, Hongtu | Tongji University |
Huang, Chang | Tongji University |
Zhang, Di | TongJi University |
Ye, Chen | Tongji University |
Zhao, Junqiao | Tongji University |
Keywords: Machine Learning for Robot Control, AI-Based Methods, Robot Safety
Abstract: Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To ensure safety during and after the training process, existing methods tend to adopt overly conservative policies to avoid unsafe situations. However, overly conservative policies severely hinder exploration and make the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates between safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states, indicating the maximum extent to which safe exploration is guaranteed, and thus imposes minimal limitations on exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled RL framework to learn two policies: (1) a task policy that only considers improving the task performance, and (2) a recovery policy that maximizes safety. The recovery policy and a corresponding safety critic are pretrained on an offline dataset, in which the safety critic evaluates the upper bound of safety in each state, serving as the agent's awareness of environmental safety. During online training, a behavior correction mechanism is adopted, ensuring that the agent interacts with the environment using only safe actions. Finally, experiments on continuous control tasks demonstrate that our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
|
|
ThCT8-CC Oral Session, CC-418 |
Add to My Program |
Data Sets for Robotic Vision II |
|
|
Chair: Nasseri, M. Ali | Technische Universitaet Muenchen |
Co-Chair: Sommersperger, Michael | Technical University of Munich |
|
16:30-18:00, Paper ThCT8-CC.1 | Add to My Program |
Exploring the Needle Tip Interaction Force with Retinal Tissue Deformation in Vitreoretinal Surgery |
|
Pannek, Simon Marc | TUM |
Dehghani, Shervin | TUM |
Sommersperger, Michael | Technical University of Munich |
Zhang, Peiyao | Johns Hopkins University |
Gehlbach, Peter | Johns Hopkins Medical Institute |
Nasseri, M. Ali | Technische Universitaet Muenchen |
Iordachita, Ioan Iulian | Johns Hopkins University |
Navab, Nassir | TU Munich |
Keywords: Data Sets for Robotic Vision, Medical Robots and Systems, Data Sets for Robot Learning
Abstract: Recent advancements in age-related macular degeneration treatments necessitate precision delivery into the subretinal space, emphasizing minimally invasive procedures targeting the retinal pigment epithelium (RPE)-Bruch’s membrane complex without causing trauma. Even for skilled surgeons, the inherent hand tremors during manual surgery can jeopardize the safety of these critical interventions. This has fostered the evolution of robotic systems designed to prevent such tremors. These robots are enhanced by fiber Bragg grating (FBG) sensors, which sense the small interaction forces between the surgical instruments and retinal tissue. To enable the community to design algorithms that take advantage of such force feedback data, this paper addresses the need for a specialized dataset integrating optical coherence tomography (OCT) imaging together with the aforementioned force data. We introduce a unique dataset, integrating force sensing data synchronized with OCT B-scan images, derived from a sophisticated setup involving robotic assistance and OCT integrated microscopes. Furthermore, we present a neural network model for image-based force estimation to demonstrate the dataset’s applicability.
|
|
16:30-18:00, Paper ThCT8-CC.2 | Add to My Program |
PanNote: An Automatic Tool for Panoramic Image Annotation of People's Positions |
|
Bacchin, Alberto | University of Padua |
Barcellona, Leonardo | University of Padova |
Shamsizadeh, Sepideh | University of Padova |
Olivastri, Emilio | University of Padua |
Pretto, Alberto | University of Padova |
Menegatti, Emanuele | The University of Padua |
Keywords: Data Sets for Robotic Vision, Omnidirectional Vision, Human Detection and Tracking
Abstract: Panoramic cameras offer a 4π steradian field of view, which is desirable for tasks like people detection and tracking since nobody can exit the field of view. Despite the recent diffusion of low-cost panoramic cameras, their usage in robotics remains constrained by the limited availability of datasets featuring annotations in the robot space, including people's 2D or 3D positions. To tackle this issue, we introduce PanNote, an automatic annotation tool for people's positions in panoramic videos. Our tool is designed to be cost-effective and straightforward to use, requiring no human intervention during the labeling process and enabling the training of machine learning models with low effort. The proposed method introduces a calibration model and a data association algorithm to fuse data from panoramic images and 2D LiDAR readings. We validate the capabilities of PanNote by collecting a real-world dataset. On these data, we compare manual labels, automatic labels, and the predictions of a baseline deep neural network. Results clearly show the advantage of using our method, with a 15-fold speed-up in labeling time and a considerable gain in performance when training deep neural models on automatically labeled data.
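The data-association step between camera and LiDAR detections can be posed as a linear assignment problem and solved with the Hungarian algorithm; this sketch assumes both detection sets are reduced to azimuth bearings in the panorama frame, which is an illustrative simplification of the tool's actual algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cam_bearings, lidar_bearings, max_err=0.1):
    """cam_bearings, lidar_bearings: 1-D arrays of azimuths in radians."""
    diff = np.abs(cam_bearings[:, None] - lidar_bearings[None, :])
    cost = np.minimum(diff, 2 * np.pi - diff)    # wrap around the panorama
    rows, cols = linear_sum_assignment(cost)     # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_err]
```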
|
|
16:30-18:00, Paper ThCT8-CC.3 | Add to My Program |
A Multimodal Handover Failure Detection Dataset and Baselines |
|
Thoduka, Santosh | Hochschule Bonn-Rhein-Sieg |
Hochgeschwender, Nico | University of Bremen |
Gall, Juergen | University of Bonn |
Plöger, Paul G. | Hochschule Bonn Rhein Sieg |
Keywords: Data Sets for Robotic Vision, Performance Evaluation and Benchmarking, Human-Robot Collaboration
Abstract: An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach which jointly classifies the human action, robot action and overall outcome of the action. The results show that video is an important modality, but using force-torque data and gripper position help improve failure detection and action segmentation accuracy.
|
|
16:30-18:00, Paper ThCT8-CC.4 | Add to My Program |
Introducing CEA-IMSOLD: An Industrial Multi-Scale Object Localization Dataset |
|
Meden, Boris | Université Paris Saclay, CEA, LIST, F-91120 Palaiseau, France |
Vega, Emanuel Pablo | CEA, LIST, F-91120 Gif-Sur-Yvette Cedex |
Mayran de Chamisso, Fabrice | CEA, LIST, F-91120 Gif-Sur-Yvette Cedex |
Bourgeois, Steve | CEA LIST |
Keywords: Data Sets for Robotic Vision, RGB-D Perception, Object Detection, Segmentation and Categorization
Abstract: We introduce the CEA Industrial Multi-Scale Object Localization Dataset (CEA-IMSOLD), a new BOP format dataset for 6-DoF object localization, crucial for robotics. This dataset aims to evaluate the current localization methods with respect to a new difficulty: large variations in observation distance and, consequently, large variations in image appearance. Compared to the other publicly available datasets, our dataset provides both images with objects small and completely visible in the image, and images where objects are observed close enough so they appear larger than the field of view of the camera. We also propose to consider the observation distance in the evaluation process and introduce new metrics to do so. Finally, our dataset contains a large variety of industrial objects, from small and simple objects such as bolts to sizable and complex ones such as large car parts. We provide baseline results and the dataset is made publicly available to support the community at https://cea-list.github.io/CEA-IMSOLD/.
|
|
16:30-18:00, Paper ThCT8-CC.5 | Add to My Program |
PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion |
|
Yan, Yuxiang | Fudan University |
Liu, Boda | Fudan University |
Ai, Jianfei | Moo Auto Intelligence and Telematics Information Technology Comp |
Li, Qinbu | MOGO |
Wan, Ru | Mogo Ai |
Pu, Jian | Fudan University |
Keywords: Data Sets for Robotic Vision, Semantic Scene Understanding, Computer Vision for Transportation
Abstract: Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative, but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes offer long perception range and minimal occlusion. We develop an automated annotation pipeline leveraging Semantic Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation. The code and datasets are available at https://github.com/yyxssm/PointSSC.
|
|
16:30-18:00, Paper ThCT8-CC.6 | Add to My Program |
Close the Sim2real Gap Via Physically-Based Structured Light Synthetic Data Simulation |
|
Bai, Kaixin | University of Hamburg |
Zhang, Lei | University of Hamburg |
Chen, Zhaopeng | University of Hamburg |
Wan, Fang | Southern University of Science and Technology |
Zhang, Jianwei | University of Hamburg |
Keywords: Data Sets for Robotic Vision, Transfer Learning, Deep Learning for Visual Perception
Abstract: Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixin.github.io/structured_light_3D_synthesizer/.
|
|
16:30-18:00, Paper ThCT8-CC.7 | Add to My Program |
Interacting Objects: A Dataset of Object-Object Interactions for Richer Dynamic Scene Representations |
|
Unmesh, Asim | Purdue University |
Jain, Rahul | Purdue University |
Shi, Jingyu | Purdue University |
Manam, V. K. Chaithanya | Purdue University |
Chi, Hyung-gun | Purdue University |
Chidambaram, Subramanian | Purdue University |
Quinn, Alexander | Purdue University |
Ramani, Karthik | Purdue University |
Keywords: Data Sets for Robotic Vision, Visual Learning
Abstract: Dynamic environments in factories, surgical robotics, and warehouses increasingly involve humans, machines, robots, and various other objects such as tools, fixtures, conveyors, and assemblies. In these environments, numerous interactions occur not just between humans and objects but also between objects themselves. However, current scene-graph datasets predominantly focus on human-object interactions (HOI) and overlook object-object interactions (OOIs) despite the necessity of OOIs in effectively representing dynamic environments. This oversight creates a significant gap in the coverage of interactive elements in dynamic scenes. We address this gap by proposing, to the best of our knowledge, the first dataset annotating for OOIs in dynamic scenes. To model OOIs, we establish a classification taxonomy for spatio-temporal interactions. We use our taxonomy to annotate OOIs in video clips of dynamic scenes. Then, we introduce a spatio-temporal OOI classification task which aims at identifying interaction categories between two given objects in a video clip. Further, we benchmark our dataset for the spatio-temporal OOI classification task by adopting state-of-the-art approaches from the related areas of Human-Object Interaction Classification, Visual Relationship Classification, and Scene-Graph Generation. Additionally, we utilize our dataset to examine the effectiveness of OOI and HOI-based features in the context of Action Recognition. Notably, our experimental results show th
|
|
16:30-18:00, Paper ThCT8-CC.8 | Add to My Program |
RTS-GT: Robotic Total Stations Ground Truthing Dataset |
|
Vaidis, Maxime | Université Laval |
Hassanzadeh Shahraji, Mohsen | Université Laval |
Daum, Effie | Université Laval |
Dubois, William | Université Laval |
Giguère, Philippe | Université Laval |
Pomerleau, Francois | Université Laval |
Keywords: Data Sets for SLAM, Localization, Field Robots
Abstract: Numerous datasets and benchmarks exist to assess and compare Simultaneous Localization and Mapping (SLAM) algorithms. Nevertheless, their precision must keep pace with the rate at which SLAM algorithms have improved in recent years. Moreover, current datasets fall short of a comprehensive data-collection protocol for reproducibility and for evaluating the precision or accuracy of the recorded trajectories. With this objective in mind, we propose the Robotic Total Stations Ground Truthing (RTS-GT) dataset to support localization research with the generation of six-degrees-of-freedom (DOF) ground truth trajectories. This novel dataset includes six-DOF ground truth trajectories generated using a system of three Robotic Total Stations (RTSs) tracking moving robotic platforms. Furthermore, we compare the performance of the RTS-based system to a Global Navigation Satellite System (GNSS)-based setup. The dataset comprises around sixty experiments conducted in various conditions over a period of 17 months, and encompasses over 49 kilometers of trajectories, making it the most extensive dataset of RTS-based measurements to date. Additionally, we provide the precision of all poses for each experiment, a feature not found in the current state-of-the-art datasets. Our results demonstrate that RTSs provide measurements that are 22 times more stable than GNSS in various environmental settings, making them a valuable resource for SLAM benchmark development.
|
|
16:30-18:00, Paper ThCT8-CC.9 | Add to My Program |
RaSim: A Range-Aware High-Fidelity RGB-D Data Simulation Pipeline for Real-World Applications |
|
Liu, Xingyu | Tsinghua University |
Zhang, Chenyangguang | Tsinghua University |
Wang, Gu | Tsinghua University |
Zhang, Ruida | Tsinghua University |
Ji, Xiangyang | Tsinghua University |
Keywords: Deep Learning for Visual Perception, Data Sets for Robotic Vision, RGB-D Perception
Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a Range-aware RGB-D data Simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. A range-aware rendering strategy is further introduced to enrich data diversity. Extensive experiments show that models trained with RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks. Data and code are available at https://github.com/shanice-l/RaSim.
|
|
ThCT9-CC Oral Session, CC-419 |
Add to My Program |
Task and Motion Planning II |
|
|
Chair: Kessens, Chad C. | United States Army Research Laboratory |
Co-Chair: Domae, Yukiyasu | The National Institute of Advanced Industrial Science and Technology (AIST) |
|
16:30-18:00, Paper ThCT9-CC.1 | Add to My Program |
When Prolog Meets Generative Models: A New Approach for Managing Knowledge and Planning in Robotic Applications |
|
Saccon, Enrico | University of Trento |
Tikna, Ahmet | University of Trento |
De Martini, Davide | Università Degli Studi Di Trento |
Lamon, Edoardo | University of Trento |
Palopoli, Luigi | University of Trento |
Roveri, Marco | University of Trento |
Keywords: Task and Motion Planning, Multi-Robot Systems, Human-Robot Collaboration
Abstract: In this paper, we propose a robot-oriented knowledge representation system based on the use of the Prolog language. Our framework hinges on a special organisation of the knowledge base that enables: 1) its efficient population from natural language texts using semi-automated procedures based on Large Language Models (LLMs); 2) the seamless generation of temporal parallel plans for multi-robot systems through a sequence of transformations; 3) the automated translation of the plan into an executable formalism. The framework is supported by a set of open-source tools and its functionality is shown with a realistic application.
|
|
16:30-18:00, Paper ThCT9-CC.2 | Add to My Program |
HAPFI: History-Aware Planning Based on Fused Information |
|
Jeon, Sujin | Seoul National University |
Shin, Suyeon | Seoul National University |
Zhang, Byoung-Tak | Seoul National University |
Keywords: Deep Learning Methods, AI-Based Methods, Task Planning
Abstract: Embodied Instruction Following (EIF) is the task of planning a long sequence of sub-goals given high-level natural language instructions, such as "Rinse a slice of lettuce and place on the white table next to the fork". To successfully execute these long-horizon tasks, we argue that an agent must consider its past, i.e., historical data, when making decisions at each step. Nevertheless, recent approaches in EIF often neglect the knowledge from historical data and also do not effectively utilize information across the modalities. To this end, we propose History-Aware Planning based on Fused Information (HAPFI), effectively leveraging the historical data from diverse modalities that agents collect while interacting with the environment. Specifically, HAPFI integrates multiple modalities, including historical RGB observations, bounding boxes, sub-goals, and high-level instructions, by effectively fusing modalities via our Mutually Attentive Fusion method. Through experiments with diverse comparisons, we show that an agent utilizing historical multi-modal information surpasses all the compared methods that neglect the historical data in terms of action planning capability, enabling the generation of well-informed action plans for the next step. Moreover, we provide qualitative evidence highlighting the significance of leveraging historical multi-modal data, particularly in scenarios where the agent encounters intermediate failures, showcasing its robust re-planning capabilities.
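The abstract does not detail the Mutually Attentive Fusion method; a generic two-way cross-attention block of the kind such multimodal planners build on looks like this in PyTorch, where the dimensions and pooling are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Two-way cross-attention between token sequences of two modalities,
    e.g. visual history and instruction tokens; illustrative only."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):             # a: (B, Na, dim), b: (B, Nb, dim)
        a_fused, _ = self.a2b(a, b, b)   # tokens of A attend over B
        b_fused, _ = self.b2a(b, a, a)   # tokens of B attend over A
        return torch.cat([a_fused.mean(1), b_fused.mean(1)], dim=-1)
```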
|
|
16:30-18:00, Paper ThCT9-CC.3 | Add to My Program |
Non-Axiomatic Reasoning for an Autonomous Mobile Robot |
|
Hammer, Patrick | KTH Royal Institute of Technology |
Isaev, Peter | Temple University |
Feng, Lei | KTH Royal Institute of Technology |
Johansson, Robert | Stockholm University |
Tumova, Jana | KTH Royal Institute of Technology |
Keywords: Sensorimotor Learning, Learning from Experience, AI-Based Methods
Abstract: We present the integration of a Non-Axiomatic Reasoning System (NARS) with mobile robots for planning and decision making. NARS enables robots to effectively handle uncertainty in real time with complete sensor and actuator integration, thereby ensuring adaptability to evolving scenarios. We discuss essential parts of the logic, the architecture and working principles of NARS, and the integration of NARS as a ROS node. A case study demonstrates the system's proficiency in carrying out a garbage collection task in an open-air environment by operating a mobile robot with a manipulator arm, and we demonstrate its ability to learn about the place-dependent accumulation of garbage items. The case study also reveals that our approach performs more effectively on the overall task than the Belief-Desire-Intention model we compared it with.
|
|
16:30-18:00, Paper ThCT9-CC.4 | Add to My Program |
Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning |
|
Sung, Yoonchang | The University of Texas at Austin |
Shome, Rahul | The Australian National University |
Stone, Peter | University of Texas at Austin |
Keywords: Task and Motion Planning, Motion and Path Planning
Abstract: This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem—which, given a task plan, finds valid assignments of variables corresponding to solution trajectories—as a hybrid constraint satisfaction problem. The proposed algorithm follows several design principles that yield the following features: (1) efficient solution finding due to sequential heuristics and implicit time and roadmap representations, and (2) maximized feasible solution space obtained by introducing minimally necessary coordination-induced constraints and not relying on prevalent simplifications that exist in the literature. The evaluation results demonstrate the planning efficiency of the proposed algorithm, outperforming the synchronous approach in terms of makespan.
|
|
16:30-18:00, Paper ThCT9-CC.5 | Add to My Program |
Optimal Planning for Timed Partial Order Specifications |
|
Watanabe, Kandai | University of Colorado Boulder |
Fainekos, Georgios | Toyota NA-R&D |
Hoxha, Bardh | Southern Illinois University |
Lahijanian, Morteza | University of Colorado Boulder |
Okamoto, Hideki | Toyota Motor North America |
Sankaranarayanan, Sriram | University of Colorado, Boulder |
Keywords: Task and Motion Planning, Formal Methods in Robotics and Automation
Abstract: This paper addresses the challenge of planning a sequence of tasks to be performed by multiple robots while minimizing the overall completion time subject to timing and precedence constraints. Our approach uses the Timed Partial Orders (TPO) model to specify these constraints. We translate this problem into a Traveling Salesman Problem (TSP) variant with timing and precedence constraints, and we solve it as a Mixed Integer Linear Programming (MILP) problem. Our contributions include a general planning framework for TPO specifications, a MILP formulation accommodating time windows and precedence constraints, its extension to multi-robot scenarios, and a method to quantify plan robustness. We demonstrate our framework on several case studies, including an aircraft turnaround task involving three Jackal robots, highlighting the approach's potential applicability to important real-world problems. Our benchmark results show that our MILP method outperforms the state-of-the-art open-source TSP solver OR-Tools.
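To make the MILP encoding concrete, the toy model below schedules two tasks with one precedence edge and one deadline while minimizing makespan, using OR-Tools' linear solver wrapper; the durations and bounds are invented for illustration:

```python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
t1 = solver.NumVar(0, 1e4, "t1")        # start time of task 1
t2 = solver.NumVar(0, 1e4, "t2")        # start time of task 2
mk = solver.NumVar(0, 1e4, "makespan")
d1, d2 = 5.0, 3.0                        # known task durations
solver.Add(t2 >= t1 + d1)                # precedence: task 1 before task 2
solver.Add(t2 + d2 <= 20.0)              # time-window deadline on task 2
solver.Add(mk >= t2 + d2)
solver.Minimize(mk)
assert solver.Solve() == pywraplp.Solver.OPTIMAL
print(t1.solution_value(), t2.solution_value(), mk.solution_value())
```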
|
|
16:30-18:00, Paper ThCT9-CC.6 | Add to My Program |
On the Convergence of a Closed-Loop Inverse Kinematics Solver with Time-Varying Task Functions |
|
Fiore, Mario Daniele | Università Degli Studi Della Campania "Luigi Vanvitelli" |
Natale, Ciro | Università Degli Studi Della Campania "Luigi Vanvitelli" |
Keywords: Constrained Motion Planning, Motion Control, Formal Methods in Robotics and Automation
Abstract: Many control algorithms devised to allow redundant robots to execute complex multiple tasks with priorities require a numerical inverse kinematics (IK) solver. The present letter investigates the conditions that, if satisfied, guarantee that a specific module of closed-loop numerical IK solvers, which lies at the kernel of some of the aforementioned algorithms, converges to a feasible solution. The aim is to prove convergence in cases where the task function is time-varying. The conditions found to ensure convergence involve not only the initial task error and the loop gain - as happens for stationary task functions - but also the maximum sampling time to be used in the computation of the solution.
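The solver module in question is the classic closed-loop IK iteration, shown here in continuous time and in its sampled form with period T, whose upper bound is part of the convergence conditions discussed above (J† is the Jacobian pseudoinverse, K the gain matrix):

$$ \dot{\mathbf{q}} = \mathbf{J}^{\dagger}(\mathbf{q})\big(\dot{\mathbf{x}}_{d} + \mathbf{K}\mathbf{e}\big), \qquad \mathbf{e} = \mathbf{x}_{d}(t) - \mathbf{f}(\mathbf{q}), \qquad \mathbf{q}_{k+1} = \mathbf{q}_{k} + T\,\dot{\mathbf{q}}_{k}. $$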
|
|
16:30-18:00, Paper ThCT9-CC.7 | Add to My Program |
PROTAMP-RRT: A Probabilistic Integrated Task and Motion Planner Based on RRT |
|
Saccuti, Alessio | University of Parma |
Monica, Riccardo | University of Parma |
Aleotti, Jacopo | University of Parma |
Keywords: Task and Motion Planning, Manipulation Planning, Motion and Path Planning
Abstract: Solving complex robot manipulation tasks requires a Task and Motion Planner (TAMP) that searches for a sequence of symbolic actions, i.e. a task plan, and also computes collision-free motion paths. As the task planner and the motion planner are closely interconnected TAMP is considered a challenging problem. In this paper, a Probabilistic Integrated Task and Motion Planner (PROTAMP-RRT) is presented. The proposed method is based on a unified Rapidly-exploring Random Tree (RRT) that operates on both the geometric space and the symbolic space. The RRT is guided by the task plan and it is enhanced with a probabilistic model that estimates the probability of sampling a new robot configuration towards the next sub-goal of the task plan. When the RRT is extended, the probabilistic model is updated alongside. The probabilistic model is used to generate a new task plan if the feasibility of the previous one is unlikely. The performance of PROTAMP-RRT was assessed in simulated pick-and-place tasks, and it was compared against state-of-the-art approaches TM-RRT and Planet, showing better performance.
|
|
ThCT10-CC Oral Session, CC-501 |
Add to My Program |
Contact Modeling |
|
|
Chair: Oh, Sehoon | DGIST |
|
16:30-18:00, Paper ThCT10-CC.1 | Add to My Program |
Intrinsic Contact Sensing and Object Perception of an Adaptive Fin-Ray Gripper Integrating Compact Deflection Sensors |
|
Chen, Genliang | Shanghai Jiao Tong University |
Tang, Shujie | Shanghai Jiao Tong University |
Xu, Shaoqiu | Shanghai Jiao Tong University |
Guan, Tong | Shanghai Jiao Tong University |
Xun, Yuanhao | Shanghai Jiao Tong University |
Zhang, Zhuang | Westlake University |
Wang, Hao | Shanghai Jiao Tong University |
Lin, Zhongqin | SJTU |
Keywords: Contact Modeling, Force and Tactile Sensing, Perception for Grasping and Manipulation, Adaptive Fin-ray Gripper
Abstract: Owing to their tremendous adaptability to free-form objects, soft grippers with fin-ray structures have a wide range of applications. However, kinetostatic analysis and contact sensing for such grippers are still challenging due to large structural deformations. In this paper, a model-based method for intrinsic contact sensing, object perception, and interactive manipulation is proposed for these adaptive grippers. The contributions arise from the integration of compact deflection sensors, which are specifically fabricated for the large deformations of flexible beams. Using a discretization-based approach, the contact condition can be identified via the local deformations from the deflection sensors. Prototypes are developed using simple materials and manufacturing methods, on which various validation experiments are conducted. Based on contact sensing, the developed adaptive gripper can perceive the boundary geometry and structural compliance of unstructured objects. Moreover, sensor-based feedback control can be accomplished to perform interactive manipulation, in which the contact force between the finger and object can be regulated precisely (5% RMS error) in real time.
|
|
16:30-18:00, Paper ThCT10-CC.2 | Add to My Program |
Incipient Slip-Based Rotation Measurement Via Visuotactile Sensing During In-Hand Object Pivoting |
|
Li, Mingxuan | Tsinghua University |
Zhou, Yen Hang | Tsinghua University |
Li, Tiemin | Tsinghua University |
Jiang, Yao | Tsinghua University |
Keywords: Contact Modeling, Perception for Grasping and Manipulation, Force and Tactile Sensing
Abstract: In typical in-hand manipulation tasks represented by object pivoting, the real-time perception of rotational slippage has been proven beneficial for improving the dexterity and stability of robotic hands. An effective strategy is to obtain the contact properties for measuring the rotation angle through visuotactile sensing. However, existing methods for rotation estimation do not consider the impact of incipient slip during the pivoting process, which introduces measurement errors and makes it hard to determine the boundary between stable contact and macro slip. This paper describes a generalized 2-D contact model under pivoting and proposes a rotation measurement method based on the line features in the stick region. The proposed method was applied to the Tac3D vision-based tactile sensors using continuous marker patterns. Experiments show that the rotation measurement system achieves an average static measurement error of 0.17°±0.15° and an average dynamic measurement error of 1.34°±0.48°. Moreover, the proposed method requires no training data and can achieve real-time sensing during in-hand object pivoting.
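For orientation, a standard least-squares estimate of in-plane rotation from tracked markers looks as follows; it is a generic stand-in restricted to stick-region markers, not the paper's line-feature method, and the names are ours.

import numpy as np

def rotation_from_markers(p0, p1):
    # p0, p1: (N, 2) marker positions before/after motion, stick region only.
    c0, c1 = p0.mean(axis=0), p1.mean(axis=0)
    H = (p0 - c0).T @ (p1 - c1)               # 2x2 cross-covariance of centered points
    theta = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
    return np.degrees(theta)                  # least-squares rotation angle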
|
|
16:30-18:00, Paper ThCT10-CC.3 | Add to My Program |
Leveraging Compliant Tactile Perception for Haptic Blind Surface Reconstruction |
|
Emile Ramos Cheret, Laurent Yves | Lakehead University |
Prado da Fonseca, Vinicius | Memorial University of Newfoundland |
Alves de Oliveira, Thiago Eustaquio | Lakehead University |
Keywords: Contact Modeling, Soft Sensors and Actuators, Haptics and Haptic Interfaces
Abstract: Non-flat surfaces pose difficulties for robots operating in unstructured environments. Reconstructions of uneven surfaces may only be partially possible due to non-compliant end-effectors and limitations of vision systems, such as transparency, reflections, and occlusions. This study achieves blind surface reconstruction by harnessing the robotic manipulator's kinematic data and a compliant tactile sensing module, which incorporates inertial, magnetic, and pressure sensors. The module's flexibility enables us to estimate contact positions and surface normals by analyzing its deformation during interactions with unknown objects. While previous works collect only positional information, we include the local normals in a geometrical approach to estimate curvatures between adjacent contact points. These parameters then guide a spline-based patch generation, which allows us to recreate larger surfaces without an increase in complexity while reducing the time-consuming step of probing the surface. Experimental validation demonstrates that this approach outperforms an off-the-shelf vision system in estimation accuracy. Moreover, this compliant haptic method works effectively even when the manipulator's approach angle is not aligned with the surface normals, which is ideal for unknown non-flat surfaces.
|
|
16:30-18:00, Paper ThCT10-CC.4 | Add to My Program |
Differentiable Compliant Contact Primitives for Estimation and Model Predictive Control |
|
Haninger, Kevin | Fraunhofer IPK |
Samuel, Kangwagye | DGIST |
Rozzi, Filippo | Politecnico Di Milano |
Oh, Sehoon | DGIST |
Roveda, Loris | SUPSI-IDSIA |
Keywords: Contact Modeling, Compliance and Impedance Control, Model Learning for Control
Abstract: Control techniques like MPC can realize contact-rich manipulation that exploits dynamic information while maintaining friction limits and safety constraints. However, the contact geometry and dynamics must be known. This information is often extracted from CAD, limiting scalability and the ability to handle tasks with varying geometry. To reduce the need for a priori models, we propose a framework for estimating contact models online based on torque and position measurements. To do this, compliant contact models are used, connected in parallel to model multi-point contact and constraints such as a hinge. They are parameterized to be differentiable with respect to all of their parameters (rest position, stiffness, contact location), allowing the coupled robot/environment dynamics to be linearized or used efficiently in gradient-based optimization. These models are then applied to: offline gradient-based parameter fitting, online estimation via an extended Kalman filter, and online gradient-based MPC. The proposed approach is validated on two robots, showing the efficacy of sensorless contact estimation and the effects of online estimation on MPC performance.
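As a minimal illustration of such a primitive (our simplified, scalar notation; the paper's primitives also parameterize contact location and are embedded in the full robot/environment dynamics):

import numpy as np

def primitive_force(x, rest, stiffness):
    # One compliant contact primitive: a linear spring active only in
    # compression. The force is an explicit function of (rest, stiffness),
    # so gradients for fitting, EKF updates, and MPC linearization are
    # direct; the kink at zero penetration can be softened (e.g., with a
    # softplus) when strict smoothness is required.
    return stiffness * np.maximum(rest - x, 0.0)

def parallel_contact_force(x, primitives):
    # Primitives connected in parallel simply sum their forces, modeling
    # multi-point contact or hinge-like constraints, per the abstract.
    return sum(primitive_force(x, r, k) for r, k in primitives)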
|
|
16:30-18:00, Paper ThCT10-CC.5 | Add to My Program |
TacShade: A New 3D-Printed Soft Optical Tactile Sensor Based on Light, Shadow and Grey Scale for Shape Reconstruction |
|
Lu, Zhenyu | Bristol Robotics Laboratory |
Yang, Jialong | South China University of Technology; Peng Cheng Laboratory |
Li, Haoran | University of Bristol |
Li, Yifan | University of Bristol |
Si, Weiyong | University of Essex |
Lepora, Nathan | University of Bristol |
Yang, Chenguang | University of Liverpool |
Keywords: Contact Modeling, Force and Tactile Sensing, Soft Sensors and Actuators
Abstract: In this paper, we present the TacShade, a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction, inspired by sketch drawing, which uses the density of sketch lines to render light and shadow and thereby create a 3D-view effect. TacShade builds upon the strengths of the TacTip, a single-camera tactile sensor with large in-depth deformation that is well suited to edge and surface following, and improves the structure by distributing the markers within the gaps between papillae pins. Variations in light, dark, and grey effects are generated inside the sensor under external contact interactions. The contours of the contacting objects are outlined by white markers, while the contact depth characteristics can be indirectly obtained from the distribution of black pins and white markers, creating a 2.5D visualization. Based on this imaging effect, we adapt the Shape from Shading (SFS) algorithm to process tactile images, enabling a coarse but fast reconstruction of the contacted objects. Two experiments are performed. The first verifies TacShade's ability to reconstruct the shape of contacted objects from a single image for object distinction, avoiding a lengthy deep-learning process. The second experiment shows the shape reconstruction capability of TacShade for a large panel with ridged patterns, based on robot localization and image stitching.
|
|
16:30-18:00, Paper ThCT10-CC.6 | Add to My Program |
Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact |
|
Saleh, Mahdi | Technical University Munich |
Sommersperger, Michael | Technical University of Munich |
Navab, Nassir | TU Munich |
Tombari, Federico | Technische Universität München |
Keywords: Contact Modeling, Simulation and Animation, Deep Learning in Grasping and Manipulation
Abstract: In robotics, it is crucial to understand object deformation during tactile interactions. A precise understanding of deformation can improve robotic simulations and has broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. As in robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We have made our code and dataset public to advance research in robotic simulation and grasping.
|
|
16:30-18:00, Paper ThCT10-CC.7 | Add to My Program |
Unwieldy Object Delivery with Nonholonomic Mobile Base: A Stable Pushing Approach |
|
Tang, Yujie | Delft University of Technology |
Zhu, Hai | Defense Innovation Institute |
Potters, Susan | Delft University of Technology |
Wisse, Martijn | Delft University of Technology |
Pan, Wei | The University of Manchester |
Keywords: Contact Modeling, Motion and Path Planning, Manipulation Planning
Abstract: This paper addresses the problem of pushing manipulation with nonholonomic mobile robots. Pushing is a fundamental skill that enables robots to move unwieldy objects that cannot be grasped. We propose a stable pushing method that maintains stiff contact between the robot and the object to avoid consuming repositioning actions. We prove that a line contact, rather than a single point contact, is necessary for nonholonomic robots to achieve stable pushing. We also show that the stable pushing constraint and the nonholonomic constraint of the robot can be simplified as a concise linear motion constraint. Then, the pushing planning problem can be formulated as a constrained optimization problem using nonlinear model predictive control (NMPC). According to the experiments, our NMPC-based planner outperforms a reactive pushing strategy in terms of efficiency, reducing the robot’s travelled distance by 23.8% and time by 77.4%. Furthermore, our method requires four fewer hyperparameters and decision variables than the Linear Time-Varying (LTV) MPC approach, making it easier to implement. Real-world experiments are carried out to validate the proposed method with two differential-drive robots, Husky and Boxer, under different friction conditions.
|
|
16:30-18:00, Paper ThCT10-CC.8 | Add to My Program |
Robotic Contact Juggling |
|
Woodruff, James | Northwestern University |
Lynch, Kevin | Northwestern University |
Keywords: Contact Modeling, Nonholonomic Motion Planning, Dynamics, Manipulation Planning
Abstract: In this article, we define "robotic contact juggling" to be the purposeful control of the motion of a 3-D smooth object as it rolls freely on a motion-controlled robot manipulator, or "hand." While specific examples of robotic contact juggling have been studied before, in this article, we provide the first general formulation and solution method for the case of an arbitrary smooth object in a single-point rolling contact on an arbitrary smooth hand. Our formulation splits the problem into four subproblems: deriving the second-order rolling kinematics; deriving the 3-D rolling dynamics; planning rolling motions that satisfy the rolling dynamics and achieve the desired goal; and stabilization of planned rolling trajectories. The theoretical results are demonstrated in 3-D simulations and 2-D experiments using feedback from a high-speed vision system.
|
|
16:30-18:00, Paper ThCT10-CC.9 | Add to My Program |
Beyond Coulomb: Stochastic Friction Models for Practical Grasping and Manipulation |
|
Liu, Zixi | Harvard University |
Howe, Robert D. | Harvard University |
Keywords: Contact Modeling, Grasping, In-Hand Manipulation
Abstract: Reliable grasping and manipulation in daily tasks and unstructured environments require accurate contact modeling and grasp stability estimation. One key component is the coefficient of friction, which in robotics applications is typically treated, following Coulomb's law, as a constant taken from the literature, even though actual friction behavior is variable and depends on many factors. In this work, we conducted sliding experiments with robot fingers and a hand, and show that rubber friction varies strongly with normal force Fn and contact velocity v, and includes a significant stochastic component. We present a framework for modeling the coefficient of friction as a distribution rather than a single constant, and show how this distribution can be narrowed given a prior on Fn or v. For a given distribution, the likelihood of slipping is a continuous function of the tangential-to-normal force ratio, instead of a step function according to Coulomb's law. By modeling friction as a function of Fn and v, we demonstrate that friction parameters can be estimated using regression models from a single sliding stroke of the fingertip against the object surface, and that strokes spanning a larger range of the Fn-v space provide better friction estimates. These results can be applied to grasp control, enabling a quantitative trade-off between the likelihood of slipping and grasp force levels, and to sliding manipulation planning.
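The continuous slip likelihood such a model yields can be sketched as follows; the Gaussian form is our illustrative assumption, while the paper's actual distribution is fit from data and conditioned on Fn and v.

import numpy as np
from scipy.stats import norm

def slip_probability(ft, fn, v, mu_model):
    # mu_model(fn, v) returns (mean, std) of the friction coefficient,
    # narrowing the distribution when a prior on fn or v is available.
    # The result is a smooth function of the force ratio ft/fn, replacing
    # Coulomb's step function ft/fn > mu.
    mean, std = mu_model(fn, v)
    return norm.cdf((ft / fn - mean) / std)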
|
|
ThCT11-CC Oral Session, CC-502 |
Add to My Program |
Learning from Demonstration III |
|
|
Chair: Si, Weiyong | University of Essex |
Co-Chair: Ariki, Yuka | Sony Group Corporation |
|
16:30-18:00, Paper ThCT11-CC.1 | Add to My Program |
Learning Barrier-Certified Polynomial Dynamical Systems for Obstacle Avoidance with Robots |
|
Schonger, Martin | Technical University of Munich |
Kussaba, Hugo Tadashi | Technical University of Munich |
Chen, Lingyun | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Swikir, Abdalla | Technical University of Munich |
Billard, Aude | EPFL |
Haddadin, Sami | Technical University of Munich |
Keywords: Learning from Demonstration, Formal Methods in Robotics and Automation, Robot Safety
Abstract: Established techniques that enable robots to learn from demonstrations are based on learning a stable dynamical system (DS). To increase the robots' resilience to perturbations during tasks that involve static obstacle avoidance, we propose incorporating barrier certificates into an optimization problem to learn a stable and barrier-certified DS. Such an optimization problem can be very complex or extremely conservative when the traditional linear parameter-varying formulation is used. Thus, in contrast to previous approaches in the literature, we propose to use polynomial representations for DSs, which yields an optimization problem that can be tackled by sum-of-squares techniques. Finally, our approach can handle obstacle shapes that fall outside the scope of assumptions typically found in the literature concerning obstacle avoidance within the DS learning framework.
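In generic sum-of-squares form (our notation; the paper's exact formulation may differ in sign conventions and multiplier classes), certifying a polynomial DS \dot{x} = f(x) reduces to memberships in the SOS cone \Sigma[x]:

V(x) - \epsilon\,\lVert x\rVert^{2} \in \Sigma[x], \qquad -\nabla V(x)^{\top} f(x) \in \Sigma[x], \qquad \nabla h(x)^{\top} f(x) - \lambda(x)\,h(x) \in \Sigma[x],

where V is a Lyapunov candidate, {x : h(x) >= 0} is the safe (obstacle-free) set, and \lambda is a polynomial multiplier. Each membership becomes a semidefinite constraint on polynomial coefficients, which is what makes the polynomial representation tractable.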
|
|
16:30-18:00, Paper ThCT11-CC.2 | Add to My Program |
Domain Adaptation of Visual Policies with a Single Demonstration |
|
Wang, Weiyao | The Johns Hopkins University |
Hager, Gregory | Johns Hopkins University |
Keywords: Transfer Learning, Learning from Demonstration, Deep Learning Methods
Abstract: Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for visuomotor policies that utilize high-dimensional images as input, especially when those images are generated via simulation. A common method to tackle this issue is domain randomization, which aims to broaden the span of the training distribution to cover the test-time distribution. However, this approach is only effective when the domain randomization encompasses the actual shifts in the test-time distribution. We take a different approach, where we make use of a single demonstration (a prompt) to learn a policy that adapts to the testing target environment. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose.
|
|
16:30-18:00, Paper ThCT11-CC.3 | Add to My Program |
Learning Complex Motion Plans Using Neural ODEs with Safety and Stability Guarantees |
|
Savvas Sadiq Ali, Farhad Nawaz | University of Pennsylvania |
Li, Tianyu | University of Pennsylvania |
Matni, Nikolai | University of Pennsylvania |
Figueroa, Nadia | University of Pennsylvania |
Keywords: Learning from Demonstration, Safety in HRI, Robust/Adaptive Control
Abstract: We propose a Dynamical System (DS) approach to learn complex, possibly periodic motion plans from kinesthetic demonstrations using Neural Ordinary Differential Equations (NODE). To ensure reactivity and robustness to disturbances, we propose a novel approach that selects a target point at each time step for the robot to follow, by combining tools from control theory and the target trajectory generated by the learned NODE. A correction term to the NODE model is computed online by solving a quadratic program that guarantees stability and safety using control Lyapunov functions and control barrier functions, respectively. Our approach outperforms baseline DS learning techniques on the LASA handwriting dataset and complex periodic trajectories. It is also validated on the Franka Emika robot arm to produce stable motions for wiping and stirring tasks that do not have a single attractor, while being robust to perturbations and safe around humans and obstacles.
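The online correction described above is, in its generic shape, a min-norm quadratic program; the cvxpy sketch below is illustrative (our names and simplified scalar rates, not the authors' exact formulation), and in practice a slack on the Lyapunov constraint is common to guarantee feasibility.

import cvxpy as cp
import numpy as np

def corrected_velocity(f_node, grad_V, grad_h, clf_rate, cbf_rate):
    # Min-norm correction u to the NODE vector field f_node(x) so that a
    # control Lyapunov (stability) and a control barrier (safety) condition
    # hold. clf_rate / cbf_rate stand in for the class-K terms, e.g. c*V(x)
    # and a*h(x) evaluated at the current state.
    u = cp.Variable(f_node.shape[0])
    xdot = f_node + u
    constraints = [grad_V @ xdot <= -clf_rate,   # Lyapunov decrease
                   grad_h @ xdot >= -cbf_rate]   # barrier forward invariance
    cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints).solve()
    return f_node + u.value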
|
|
16:30-18:00, Paper ThCT11-CC.4 | Add to My Program |
Learning a Stable Dynamic System with a Lyapunov Energy Function for Demonstratives Using Neural Networks |
|
Zhang, Yu | University of Chinese Academy of Sciences |
Zou, Yongxiang | Institute of Automation, Chinese Academy of Sciences |
Zhang, Haoyu | Institute of Automation, Chinese Academy of Sciences |
Xia, Xiuze | Institute of Automation, Chinese Academy of Sciences |
Cheng, Long | Chinese Academy of Sciences |
Keywords: Learning from Demonstration, Imitation Learning, Motion and Path Planning
Abstract: Autonomous Dynamic System (DS)-based algorithms hold a pivotal and foundational role in the field of Learning from Demonstration (LfD). Nevertheless, they confront the formidable challenge of striking a delicate balance between achieving precision in learning and ensuring the overall stability of the system. In response to this challenge, this paper introduces a novel DS algorithm rooted in neural network technology. This algorithm not only extracts critical insights from demonstration data but also learns a candidate Lyapunov energy function that is consistent with the provided demonstrations. The model presented in this paper employs a simple neural network architecture that fulfills a dual objective: optimizing accuracy while preserving global stability. To comprehensively evaluate the effectiveness of the proposed algorithm, rigorous assessments are conducted using the LASA dataset, further reinforced by empirical validation through a robotic experiment.
|
|
16:30-18:00, Paper ThCT11-CC.5 | Add to My Program |
Learning a Flexible Neural Energy Function with a Unique Minimum for Globally Stable and Accurate Demonstration Learning |
|
Jin, Zhehao | Zhejiang University of Technology |
Si, Weiyong | University of Essex |
Liu, Andong | Zhejiang University of Technology |
Zhang, Wen-An | Zhejiang University of Technology, China |
Yu, Li | Zhejiang University of Technology |
Yang, Chenguang | University of Liverpool |
Keywords: Learning from Demonstration, Learning and Adaptive Systems, Cooperating Robots, dynamic system learning
Abstract: Learning a stable autonomous dynamic system (ADS) encoding human motion rules has been shown as an effective way for demonstration learning. However, the stability guarantee may sacrifice the demonstration learning accuracy. This article solves the issue by learning a stability certificate, represented by a neural energy function, on the demonstration set. We propose a Polar-like space analysis approach to derive parameter constraints to guarantee the unique-minimum property of the neural energy function, which is essential for it to be a cogent stability certificate. Then, the neural energy function is learned to capture the demonstration preferences via constrained optimization algorithms. With the learned neural energy function, a globally asymptotically stable ADS with predefined position constraint is further formulated. We also quantitatively analyze the generalization ability of the learned ADS by utilizing the substantial flexibility of the neural energy function. The effectiveness of the proposed approach is validated on the LASA data set and two representative robotic experiments.
|
|
16:30-18:00, Paper ThCT11-CC.6 | Add to My Program |
Inverse Constraint Learning and Generalization by Transferable Reward Decomposition |
|
Jang, Jaehwi | Korea Advanced Institute of Science and Technology |
Song, Minjae | KAIST |
Park, Daehyung | Korea Advanced Institute of Science and Technology, KAIST |
Keywords: Learning from Demonstration, Constrained Motion Planning
Abstract: We present the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations to autonomously reproduce constrained skills in new scenarios. However, ICL suffers from an ill-posed nature, leading to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. Our method, TCL, additively decomposes the overall reward into a task reward and its residual as soft constraints, maximizing policy divergence between task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method and five baselines in three simulated environments, we show that TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to 72% higher task-success rates with accurate decomposition compared to the next best approach in novel scenarios. Further, we demonstrate the robustness of TCL on two real-world robotic tasks.
|
|
16:30-18:00, Paper ThCT11-CC.7 | Add to My Program |
Learning Robot Motion in a Cluttered Environment Using Unreliable Human Skeleton Data Collected by a Single RGB Camera |
|
Takamido, Ryota | Research into Artifacts, Center for Engineering (RACE), School O |
Ota, Jun | The University of Tokyo |
Keywords: Learning from Demonstration, Motion and Path Planning, Collision Avoidance
Abstract: Current learning from demonstration (LfD) frameworks have difficulty dealing with an unreliable, limited number of demonstrations. To address this issue, we propose a novel motion planning framework referred to as experience-driven random tree connect with human demonstration (ERTC-HD), which can identify valid motions in cluttered environments using only human skeleton information extracted from a single red, green, and blue (RGB) camera. The key idea of this framework is to extract only the comprehensive features of human motion from unreliable demonstrations and use them as a rough guide for solving complex planning problems rather than as a strict solution. During the ERTC-HD process, robot motions generated from the extracted features of human motion are saved as a path experience and modified through the path adaptation process of an existing ERTC planner when transferred to a new problem. The results of three simulation experiments revealed that ERTC-HD can identify valid motions in cluttered environments in less time than other state-of-the-art planners, even when using unreliable demonstration data collected by a single RGB camera. Lowering the required accuracy of the original information sources can extend the range of applications of this LfD framework.
|
|
ThCT12-CC Oral Session, CC-503 |
Add to My Program |
Deep Learning III |
|
|
Co-Chair: Dolan, John M. | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT12-CC.1 | Add to My Program |
Robot Interaction Behavior Generation Based on Social Motion Forecasting for Human-Robot Interaction |
|
Valls Mascaro, Esteve | Technische Universitat Wien |
Yan, Yashuai | Vienna University of Technology |
Lee, Dongheui | Technische Universität Wien (TU Wien) |
Keywords: Deep Learning Methods, Gesture, Posture and Facial Expressions, Imitation Learning
Abstract: Integrating robots into populated environments is a complex challenge that requires an understanding of human social dynamics. In this work, we propose to model social motion forecasting in a shared human-robot representation space, which enables us to synthesize robot motions that interact with humans in social scenarios even though no robot motions are observed during training. We develop a transformer-based architecture called ECHO, which operates in this shared space to predict the future motions of the agents encountered in social scenarios. Contrary to prior works, we reformulate the social motion problem as the refinement of the predicted individual motions based on the surrounding agents, which facilitates training while allowing for single-motion forecasting when only one human is in the scene. We evaluate our model on multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin while being efficient and running in real time. Additionally, our qualitative results showcase the effectiveness of our approach in generating human-robot interaction behaviors that can be controlled via text commands.
|
|
16:30-18:00, Paper ThCT12-CC.2 | Add to My Program |
SPCGC: Scalable Point Cloud Geometry Compression for Machine Vision |
|
Xie, Liang | Peking University |
Gao, Wei | Peking University Shenzhen Graduate School |
Zheng, Huiming | Peking University |
Li, Ge | Peking University Shenzhen Graduate School |
Keywords: Deep Learning Methods
Abstract: With the proliferation of sensor devices, the extensive utilization of three-dimensional data in multimedia continues to grow. Point clouds are widely adopted within this domain because they are one of the most intuitive representations of three-dimensional data. However, the substantial volume of point cloud data poses significant challenges for storage and transmission. Moreover, a considerable portion of the data loses its semantic information during transmission. Consequently, how can we ensure both the perceptual quality for humans and the performance of downstream tasks during transmission? To address this issue, we propose a scalable point cloud geometry compression framework (SPCGC) for machine perception. This framework tackles the fidelity issues associated with point cloud compression and preserves more semantic information, enhancing the performance of machine vision tasks. Our solution consists of a base layer bitstream and an enhancement layer bitstream. The base layer bitstream contains geometry data, while the enhancement layer bitstream carries semantic-guided residual data. Additionally, we introduce two modules for extracting and coding residual features, and we incorporate classification and segmentation losses from downstream tasks into the Rate-Distortion (RD) optimization. Our approach outperforms existing learning-based lossy point cloud coding methods through empirical validation on downstream tasks without sacrificing point cloud compression performance.
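A task-augmented RD objective of this kind has, in generic form (our notation and weights, not necessarily the paper's),

\mathcal{L} \;=\; R_{\text{base}} + R_{\text{enh}} \;+\; \lambda\,D \;+\; \mu_{\text{cls}}\,\mathcal{L}_{\text{cls}} \;+\; \mu_{\text{seg}}\,\mathcal{L}_{\text{seg}},

where the R terms are the two bitstream rates, D is the geometry distortion, and the \mu-weighted terms are the downstream classification and segmentation losses that pull the enhancement-layer residuals toward semantically useful content.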
|
|
16:30-18:00, Paper ThCT12-CC.3 | Add to My Program |
CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning |
|
Morgan, Jeremy | University of Southern California |
Millard, David | University of Southern California |
Sukhatme, Gaurav | University of Southern California |
Keywords: Motion and Path Planning, Constrained Motion Planning
Abstract: In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. By combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state-of-the-art methods on a set of established baselines as well as new ones introduced in this work and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. Additional results and an open-source implementation are available at https://jstmn.github.io/cppflow-website
|
|
16:30-18:00, Paper ThCT12-CC.4 | Add to My Program |
Safe Deep Policy Adaptation |
|
Xiao, Wenli | Carnegie Mellon University |
He, Tairan | Carnegie Mellon University |
Dolan, John M. | Carnegie Mellon University |
Shi, Guanya | Carnegie Mellon University |
Keywords: Reinforcement Learning, Robot Safety, Model Learning for Control
Abstract: A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on a Control Barrier Function (CBF), placed on top of the RL policy, ensures safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate the clear superiority of SafeDPA in both safety and task performance over state-of-the-art baselines. In particular, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines under unseen disturbances in real-world experiments.
|
|
16:30-18:00, Paper ThCT12-CC.5 | Add to My Program |
Physics-Informed Neural Networks for Continuum Robots: Towards Fast Approximation of Static Cosserat Rod Theory |
|
Bensch, Martin | Leibniz University Hanover |
Job, Tim-David | Leibniz University Hanover |
Habich, Tim-Lukas | Leibniz University Hannover |
Seel, Thomas | Leibniz Universität Hannover |
Schappler, Moritz | Institute of Mechatronic Systems, Leibniz Universitaet Hannover |
Keywords: Deep Learning Methods, Motion and Path Planning
Abstract: Sophisticated models can accurately describe deformations of continuum robots while being computationally demanding, which limits their application. Especially when considering sampling-based path planning, the model has to be evaluated frequently, which can lead to substantially increased computation times. We present a new approach to compute the entire shape of a tendon-driven continuum robot by a physics-informed neural network (PINN). The underlying physics is modelled with the Cosserat rod theory and incorporated into the PINN’s loss function. The boundary values for the training are obtained from a reference model, solved by the shooting method. Our approach allows for a computation of the learned Cosserat rod model multiple orders of magnitude faster than a publicly available reference model. The median position deviation from the reference model lies below 1mm (0.5% of the simulated robot length) for each of the robot’s 20 disks.
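Schematically, a physics-informed loss for a static rod combines an ODE residual at collocation points with boundary terms from the reference (shooting-method) solution; the sketch below is a generic PyTorch rendering with illustrative names, not the authors' implementation.

import torch

def pinn_loss(model, s_colloc, s_bc, y_bc, residual_fn, w_bc=10.0):
    # model: maps arc length s (and, in the paper, tendon actuation) to the
    # rod state y(s). residual_fn(s, y, dy) evaluates the Cosserat rod ODE
    # residual; boundary values y_bc come from the reference model.
    s = s_colloc.clone().requires_grad_(True)
    y = model(s)
    dy = torch.stack([  # d y_i / d s at each collocation point
        torch.autograd.grad(y[:, i].sum(), s, create_graph=True)[0].squeeze(-1)
        for i in range(y.shape[1])], dim=1)
    loss_pde = residual_fn(s, y, dy).pow(2).mean()   # physics residual term
    loss_bc = (model(s_bc) - y_bc).pow(2).mean()     # boundary-value term
    return loss_pde + w_bc * loss_bc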
|
|
16:30-18:00, Paper ThCT12-CC.6 | Add to My Program |
Fast Kinodynamic Planning on the Constraint Manifold with Deep Neural Networks |
|
Kicki, Piotr | Poznan University of Technology |
Liu, Puze | Technische Universität Darmstadt |
Tateo, Davide | Technische Universität Darmstadt |
Bou Ammar, Haitham | Huawei |
Walas, Krzysztof Tadeusz | Poznan University of Technology |
Skrzypczynski, Piotr | Poznan University of Technology |
Peters, Jan | Technische Universität Darmstadt |
Keywords: Motion and Path Planning, Deep Learning in Robotics and Automation, Learning to plan, Manipulation Planning
Abstract: Motion planning is a mature area of research in robotics with many well-established methods based on optimization or sampling the state space, suitable for solving kinematic motion planning. However, when dynamic motions under constraints are needed and computation time is limited, fast kinodynamic planning on the constraint manifold is indispensable. In recent years, learning-based solutions have become alternatives to classical approaches, but they still lack comprehensive handling of complex constraints, such as planning on a lower-dimensional manifold of the task space while considering the robot's dynamics. This paper introduces a novel learning-to-plan framework that exploits the concept of constraint manifold, including dynamics, and neural planning methods. Our approach generates plans satisfying an arbitrary set of constraints and computes them in a short constant time, namely the inference time of a neural network. This allows the robot to plan and replan reactively, making our approach suitable for dynamic environments. We validate our approach on two simulated tasks and in a demanding real-world scenario, where we use a Kuka LBR Iiwa 14 robotic arm to perform the hitting movement in robotic Air Hockey.
|
|
16:30-18:00, Paper ThCT12-CC.7 | Add to My Program |
MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments |
|
Gupta, Vivek | Purdue University, West Lafayette |
Dhir, Prabhpreet | Purdue University |
Dani, Jeegn | Purdue University |
Qureshi, Ahmed H. | Purdue University |
Keywords: Task and Motion Planning, Multi-Robot Systems, Deep Learning Methods
Abstract: Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens. While existing research has primarily focused on single-agent solutions, real-world scenarios often require multiple robots to work together on rearrangement tasks. This paper proposes a comprehensive learning-based framework for multi-agent object rearrangement planning, addressing the challenges of task sequencing and path planning in complex environments. The proposed method iteratively selects objects, determines their relocation regions, and pairs them with available robots under kinematic feasibility and task reachability for execution to achieve the target arrangement. Our experiments on a diverse range of simulated and real-world environments demonstrate the effectiveness and robustness of the proposed framework. Furthermore, results indicate improved performance in terms of traversal time and success rate compared to baseline approaches.
|
|
16:30-18:00, Paper ThCT12-CC.8 | Add to My Program |
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints |
|
Kasaura, Kazumi | Omron Sinic X |
Miura, Shuwa | University of Massachusetts, Amherst |
Kozuno, Tadashi | Omron Sinic X |
Yonetani, Ryo | CyberAgent |
Hoshino, Kenta | Kyoto University |
Hosoe, Yohei | Kyoto University |
Keywords: Reinforcement Learning, Machine Learning for Robot Control, Performance Evaluation and Benchmarking
Abstract: This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments, encompassing multiple action constraint types. Our evaluation provides the first in-depth perspective of the field, revealing surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and associated code utilized in our experiments are made available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
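One common straightforward baseline in action-constrained RL is to project the policy's raw action onto the feasible set before execution; whether this is the exact baseline the authors highlight is not stated in the abstract. A minimal sketch for linear constraints (our assumption, using cvxpy):

import cvxpy as cp
import numpy as np

def project_action(a_raw, A, b):
    # Project the raw action onto {a : A a <= b} in the least-squares
    # sense. Linear constraints are an assumption of this sketch; the
    # benchmark covers several constraint types.
    a = cp.Variable(a_raw.shape[0])
    cp.Problem(cp.Minimize(cp.sum_squares(a - a_raw)), [A @ a <= b]).solve()
    return a.value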
|
|
ThCT13-AX Oral Session, AX-201 |
Add to My Program |
Human-Robot Collaboration VI |
|
|
Chair: Vijayakumar, Sethu | University of Edinburgh |
Co-Chair: Secchi, Cristian | Univ. of Modena & Reggio Emilia |
|
16:30-18:00, Paper ThCT13-AX.1 | Add to My Program |
Robust and Dexterous Dual-Arm Tele-Cooperation Using Adaptable Impedance Control |
|
Kouhkiloui Babarahmati, Keyhan | University of Edinburgh |
Kasaei, Mohammadreza | University of Edinburgh |
Tiseo, Carlo | University of Sussex |
Mistry, Michael | University of Edinburgh |
Vijayakumar, Sethu | University of Edinburgh |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Telerobotics and Teleoperation
Abstract: In recent years, the need for robots to transition from isolated industrial tasks to shared environments, including human-robot collaboration and teleoperation, has become increasingly evident. Building on the foundation of Fractal Impedance Control (FIC) introduced in our previous work, this paper presents a novel extension to dual-arm tele-cooperation, leveraging the non-linear stiffness and passivity of FIC to adapt to diverse cooperative scenarios. Unlike traditional impedance controllers, our approach ensures stability without relying on energy tanks, as demonstrated in our prior research. In this paper, we further extend the FIC framework to bimanual operations, allowing for stable and smooth switching between different dynamic tasks without gain tuning. We also introduce a telemanipulation architecture that offers higher transparency and dexterity, addressing the challenges of signal latency and low-bandwidth communication. Through extensive experiments, we validate the robustness of our method and the results confirm the advantages of the FIC approach over traditional impedance controllers, showcasing its potential for applications in planetary exploration and other scenarios requiring dexterous telemanipulation. This paper's contributions include the seamless integration of FIC into multi-arm systems, the ability to perform robust interactions in highly variable environments, and the provision of a comprehensive comparison with competing approaches, thereby significantly enhancing the robustness and adaptability of robotic systems.
|
|
16:30-18:00, Paper ThCT13-AX.2 | Add to My Program |
PlanCollabNL: Leveraging Large Language Models for Adaptive Plan Generation in Human-Robot Collaboration |
|
Izquierdo-Badiola, Silvia | Eurecat |
Canal, Gerard | King's College London |
Rizzo, Carlos | University of Zaragoza |
Alenyà, Guillem | CSIC-UPC |
Keywords: Human-Robot Collaboration, Task Planning, AI-Enabled Robotics
Abstract: "Hey, robot. Let's tidy up the kitchen. By the way, I have back pain today". How can a robotic system devise a shared plan with an appropriate task allocation from this abstract goal and agent condition? Classical AI task planning has been explored for this purpose, but it involves a tedious definition of an inflexible planning problem. Large Language Models (LLMs) have shown promising generalisation capabilities in robotics decision-making through knowledge extraction from Natural Language (NL). However, the translation of NL information into constrained robotics domains remains a challenge. In this paper, we use LLMs as translators between NL information and a structured AI task planning problem, targeting human-robot collaborative plans. The LLM generates information that is encoded in the planning problem, including specific subgoals derived from an NL abstract goal, as well as recommendations for subgoal allocation based on NL agent conditions. The framework, PlanCollabNL, is evaluated for a number of goals and agent conditions, and the results show that correct and executable plans are found in most cases. With this framework, we intend to add flexibility and generalisation to HRC plan generation, eliminating the need for a manual and laborious definition of restricted planning problems and agent models.
|
|
16:30-18:00, Paper ThCT13-AX.3 | Add to My Program |
Multi-Agent Strategy Explanations for Human-Robot Collaboration |
|
Pandya, Ravi | Carnegie Mellon University |
Zhao, Michelle | Carnegie Mellon University |
Liu, Changliu | Carnegie Mellon University |
Simmons, Reid | Carnegie Mellon University |
Admoni, Henny | Carnegie Mellon University |
Keywords: Human-Robot Teaming, Human-Robot Collaboration
Abstract: As robots are deployed in human spaces, it is important that they are able to coordinate their actions with the people around them. Part of such coordination involves ensuring that people have a good understanding of how a robot will act in the environment. This can be achieved through explanations of the robot's policy. Much prior work in explainable AI and RL focuses on generating explanations for single-agent policies, but little has been explored in generating explanations for collaborative policies. In this work, we investigate how to generate multi-agent strategy explanations for human-robot collaboration. We formulate the problem using a generic multi-agent planner, show how to generate visual explanations through strategy-conditioned landmark states and generate textual explanations by giving the landmarks to an LLM. Through a user study, we find that when presented with explanations from our proposed framework, users are able to better explore the full space of strategies and collaborate more efficiently with new robot partners.
|
|
16:30-18:00, Paper ThCT13-AX.4 | Add to My Program |
Efficient ISO/TS 15066 Compliance through Model Predictive Control |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Safety in HRI, Human-Aware Motion Planning
Abstract: In actual industrial scenarios, human operators and robots work together, sharing the workspace. Such proximity requires special attention to ensuring safety for the human operator, which is often translated into collision-avoidance behaviour or drastic speed reduction. Adhering to safety requirements, however, is not the only aspect that must be taken into account. For many tasks, such as welding, it is crucial to ensure that the robot follows exactly the planned path. To optimize robot performance while complying with safety regulations, this work introduces a novel optimal nonlinear control problem. It prioritizes path preservation, exploiting redundancy to minimize task execution time while explicitly adhering to the constraints imposed by ISO/TS 15066. To achieve high performance, the control problem is addressed using the Model Predictive Control (MPC) approach. The proposed strategy has been experimentally validated both in simulation and in a real-world industrial task involving a Kuka LWR4+ robot.
|
|
16:30-18:00, Paper ThCT13-AX.5 | Add to My Program |
Dual-Mode Human-Robot Collaboration with Guaranteed Safety Using Time-Varying Zeroing Control Barrier Functions and Quadratic Program |
|
Shi, Kaige | Nanyang Technological University |
Hu, Guoqiang | Nanyang Technological University |
Keywords: Human-Robot Collaboration, Physical Human-Robot Interaction, Safety in HRI
Abstract: Safety and efficiency are two important aspects of human-robot collaboration (HRC). Most existing control methods for HRC consider either contactless HRC or physical HRC, hindering more efficient HRC. The proposed control framework enables dual-mode HRC, filling the gap between contactless and physical HRC. With the framework, the robot can perform contactless HRC under safety regulations with respect to the co-working human. Meanwhile, the human can safely interrupt the robot via physical contact to enter physical HRC, in which he/she can hand-guide the robot or take over its gripped object. First, human safety is defined as bounded approach velocities between the human and multiple robot links based on ISO/TS 15066, allowing the gradual establishment of physical contact. Then, a time-varying zeroing control barrier function is proposed and defined to guarantee the bounded approach velocities through a safety control set. Second, a unified task control set is designed to achieve different robot tasks for the different HRC modes in a unified manner. The unified task control set enables the robot to switch smoothly between the two HRC modes. An optimal final control input is determined by a quadratic program (QP) based on the different control sets. Experiments were conducted to verify the proposed framework and compare it with existing methods. An application example is presented to show the versatility of the proposed framework.
|
|
16:30-18:00, Paper ThCT13-AX.6 | Add to My Program |
A Time-Optimal Energy Planner for Safe Human-Robot Collaboration |
|
Pupa, Andrea | University of Modena and Reggio Emilia |
Minelli, Marco | University of Modena and Reggio Emilia |
Secchi, Cristian | Univ. of Modena & Reggio Emilia |
Keywords: Human-Robot Collaboration, Safety in HRI, Human-Aware Motion Planning
Abstract: Human-robot collaboration scenarios are characterized by the presence of human operators and robots that work in close contact with each other. As a consequence, safety regulations have been updated to provide guidelines on how to assess safety in these new scenarios. In particular, the Power and Force Limiting (PFL) collaborative mode describes how energy should be regulated during the collaboration. Based on these guidelines, we propose a new optimal trajectory planner which, by exploiting the variability of the robot's inertia as a function of its configuration, is able to return trajectories that can be travelled at greater speed and in less time while guaranteeing the safety limits according to the standard. The proposed planner was validated first in simulation, comparing completion times with other state-of-the-art planning algorithms, and then experimentally, demonstrating the performance of the planned trajectories during physical interaction with the environment. Both validations confirm the effectiveness of the proposed planner, which returns shorter completion times while ensuring safe interaction.
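For context, the PFL limit of ISO/TS 15066 is energy-based: the kinetic energy transferable in a human-robot contact must stay below a body-region-dependent limit E_max. Restated in our notation (the standard relation from the specification, not the paper's formulation),

\frac{1}{2}\,\mu\,v_{\mathrm{rel}}^{2} \le E_{\max}, \qquad \mu = \left(\frac{1}{m_H} + \frac{1}{m_R(q)}\right)^{-1} \quad\Longrightarrow\quad v_{\mathrm{rel,max}}(q) = \sqrt{\frac{2\,E_{\max}}{\mu}},

where m_H is the effective mass of the contacted human body part and m_R(q) is the configuration-dependent effective robot mass. Because m_R(q) varies along the trajectory, a configuration with lower apparent inertia permits a higher safe speed - the leverage the proposed planner exploits.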
|
|
16:30-18:00, Paper ThCT13-AX.7 | Add to My Program |
Discuss before Moving: Visual Language Navigation Via Multi-Expert Discussions |
|
Long, Yuxing | Peking University |
Li, Xiaoqi | Peking University |
Cai, Wenzhe | Southeast University |
Dong, Hao | Peking University |
Keywords: Vision-Based Navigation, Natural Dialog for HRI, Task Planning
Abstract: Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods rely entirely on a single model's own reasoning to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle to handle multiple tasks through single-round self-thinking. In this work, drawing inspiration from the expert consultation meeting, we introduce a novel zero-shot VLN framework. Within this framework, large models possessing distinct abilities serve as domain experts. Our proposed navigation agent, namely DiscussNav, actively discusses with these experts to collect essential information before moving at every step. These discussions cover critical navigation subtasks such as instruction understanding, environment perception, and completion estimation. Through comprehensive experiments, we demonstrate that discussions with domain experts can effectively facilitate navigation by perceiving instruction-relevant information, correcting inadvertent errors, and sifting through inconsistent movement decisions. The performance on the representative VLN task R2R shows that our method surpasses the leading zero-shot VLN model by a large margin on all metrics. Additionally, real-robot experiments demonstrate the clear advantages of our method over single-round self-thinking.
|
|
16:30-18:00, Paper ThCT13-AX.8 | Add to My Program |
Personality and Memory-Based Software Framework for Human-Robot Interaction |
|
Nardelli, Alice | University of Genoa |
Sgorbissa, Antonio | University of Genova |
Recchiuto, Carmine Tommaso | University of Genova |
Keywords: Emotional Robotics, Cognitive Modeling, Human-Robot Collaboration
Abstract: The synergistic orchestration of cognitive and psychological dimensions characterizes human intelligence. Accordingly, carefully designing this mechanism in artificial intelligence can be a successful strategy for increasing human likeness in a robot, enhancing mutual understanding and building a more natural and intuitive interaction. For this purpose, the main contribution of this work is a psychological and cognitive architecture tailored for HRI based on the interplay between robotic personality and memory-based cognitive processes. Indeed, the artificial personality manifests itself not only in various aspects of behavior but also within the action selection process, which is closely intertwined with personality-dependent hedonic experiences linked to memories. In this paper, we propose a task- and platform-independent framework, evaluated in a multiparty collaborative scenario. The obtained results show that a robot connected to our proposed framework is perceived as a cognitive agent capable of manifesting perceivable and distinguishable personality traits.
|
|
ThCT15-AX Oral Session, AX-203 |
Add to My Program |
Service Robots |
|
|
Chair: Iwasawa, Yusuke | The University of Tokyo |
Co-Chair: Alami, Rachid | CNRS |
|
16:30-18:00, Paper ThCT15-AX.1 | Add to My Program |
Self-Recovery Prompting: Promptable General Purpose Service Robot System with Foundation Models and Self-Recovery |
|
Shirasaka, Mimo | The University of Tokyo |
Matsushima, Tatsuya | The University of Tokyo |
Tsunashima, Soshi | The University of Tokyo |
Ikeda, Yuya | University of Tokyo |
Horo, Aoi | The University of Tokyo |
Ikoma, So | The University of Tokyo |
Tsuji, Chikaha | The University of Tokyo |
Wada, Hikaru | The University of Tokyo |
Omija, Tsunekazu | The University of Tokyo |
Komukai, Dai | The University of Tokyo |
Matsuo, Yutaka | The University of Tokyo |
Iwasawa, Yusuke | The University of Tokyo |
Keywords: Service Robotics
Abstract: A general-purpose service robot (GPSR), which can execute diverse tasks in various environments, requires a system with high generalizability and adaptability to tasks and environments. In this paper, we first developed a top-level GPSR system for the worldwide competition RoboCup@Home 2023 based on multiple foundation models. This system is both generalizable to task variations and adaptable through prompting of each model. Then, by analyzing the performance of the developed system, we identified three types of failure in more realistic GPSR application settings: insufficient information, incorrect plan generation, and plan execution failure. We then propose the self-recovery prompting pipeline, which explores the necessary information and modifies its prompts to recover from failure. We experimentally confirm that the system with the self-recovery mechanism can accomplish tasks by resolving various failure cases.
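The pipeline can be pictured as a plan-execute-diagnose loop; the following is hypothetical glue code (all names are ours) showing where prompt modification enters, not the authors' implementation.

def run_with_self_recovery(task_prompt, plan_fn, execute_fn, diagnose_fn, max_retries=3):
    # plan_fn: foundation-model planner; execute_fn: runs the plan and
    # returns (success, feedback); diagnose_fn: classifies the failure
    # (insufficient information, bad plan, execution failure) and rewrites
    # the prompt accordingly before re-planning.
    prompt = task_prompt
    for _ in range(max_retries):
        plan = plan_fn(prompt)
        ok, feedback = execute_fn(plan)
        if ok:
            return plan
        prompt = diagnose_fn(prompt, feedback)
    return None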
|
|
16:30-18:00, Paper ThCT15-AX.2 | Add to My Program |
Autonomous Quilt Spreading for Caregiving Robots |
|
Guo, Yuchun | Harbin Institute of Technology, Shenzhen |
Lu, Zhiqing | Harbin Institute of Technology, Shenzhen |
Zhou, Yanling | Harbin Institute of Technology (Shenzhen) |
Jiang, Xin | Harbin Institute of Technology, Shenzhen |
Keywords: Service Robotics
Abstract: In this work, we propose a novel strategy to ensure that infants who inadvertently displace their quilts during sleep are promptly and accurately re-covered. Our approach is formulated in two subsequent steps: interference resolution and quilt spreading. By leveraging the DWPose human skeletal detection and the Segment Anything instance segmentation models, the proposed method can accurately recognize the states of the infant and the quilt over her, which involves addressing the interferences resulting from an infant's limbs lying on part of the quilt. Building upon prior research, the EM*D deep learning model is employed to forecast quilt state transitions before and after quilt spreading actions. To improve the sensitivity of the network in distinguishing state variations of the handled quilt, we introduce an enhanced loss function that translates the voxelized quilt state into a more representative one. Both simulation and real-world experiments validate the efficacy of our method in spreading and recovering a quilt over an infant.
|
|
16:30-18:00, Paper ThCT15-AX.3 | Add to My Program |
CNS: Correspondence Encoded Neural Image Servo Policy |
|
Chen, Anzhe | Zhejiang University |
Yu, Hongxiang | Zhejiang University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Service Robotics, Deep Learning in Grasping and Manipulation, Visual Servoing
Abstract: Image servoing is an indispensable technique in robotic applications that helps to achieve high-precision positioning. The intermediate representation of an image servo policy is important for sensor input abstraction and policy output guidance. Classical approaches achieve high precision but require clean keypoint correspondence and suffer from limited convergence basins or weak robustness to feature errors. Recent learning-based methods achieve moderate precision and large convergence basins on specific scenes but face issues when generalizing to novel environments. In this paper, we encode keypoints and correspondences into a graph and use a graph neural network as the controller architecture. This design combines the advantages of both: a generalizable intermediate representation from keypoint correspondence and the strong modeling ability of neural networks. Other techniques, including realistic data generation, feature clustering, and distance decoupling, are proposed to further improve efficiency, precision, and generalization. Experiments in simulation and the real world verify the effectiveness of our method in speed (maximum 40 fps, observer included), precision (<0.3° and sub-millimeter accuracy), and generalization (sim-to-real without fine-tuning). Project homepage (full paper with supplementary text, video and code): https://github.com/hhcaz/CNS
|
|
16:30-18:00, Paper ThCT15-AX.4 | Add to My Program |
Adapting for Calibration Disturbances: A Neural Uncalibrated Visual Servoing Policy |
|
Yu, Hongxiang | Zhejiang University |
Chen, Anzhe | Zhejiang University |
Xu, Kechun | Zhejiang University |
Guo, Dashun | Zhejiang University |
Zhou, Zhongxiang | Zhejiang University |
Wei, Yufei | Zhejiang University |
Zhang, Xuebo | Nankai University |
Wang, Yue | Zhejiang University |
Xiong, Rong | Zhejiang University |
Keywords: Service Robotics, Deep Learning in Grasping and Manipulation, Visual Servoing
Abstract: Visual servoing (VS) is a widely used technique in industries with hundreds of robots, but it requires accurate camera calibration, including the camera's intrinsic and extrinsic parameters. However, calibrating robots one by one is labour-intensive in practical use. In this paper, we propose a neural uncalibrated VS policy (NUVS) that can adapt to calibration disturbances through an adaptation mechanism and control-oriented guidance. It bridges the disturbance adaptation of classical VS methods and the large convergence basin of learning-based VS methods. NUVS estimates the calibration embedding from past observations and servos to the desired pose under the supervision of a PBVS controller that can access the ground truth in simulation. With this adaptation mechanism, NUVS outperforms the classical IBUVS algorithm when facing large initial camera pose offsets under calibration disturbance. Supplementary material: https://sites.google.com/view/neural-uncalibrated-vs
|
|
16:30-18:00, Paper ThCT15-AX.5 | Add to My Program |
ARIS 1.0: An Autonomous Multitasking Medical Service Robot for Hospital Environments |
|
Dunuwila, Anurisha Piyathma | University of Moratuwa |
Gunawardhana, Lahiru | University of Moratuwa |
Basnayake, Hirantha | University of Moratuwa, Sri Lanka |
Amarasinghe, Ranjith | University of Moratuwa |
Jayasekara, A.G.B.P. | University of Moratuwa |
Hanchapola Appuhamilage, Gihan Charith Premachandra | Singapore University of Technology and Design |
Tamura, Hiroki | University of Miyazaki |
Tan, U-Xuan | Singapore University of Technology and Design |
Keywords: Service Robotics, Human-Robot Collaboration, Medical Robots and Systems
Abstract: Introducing robotics into the healthcare sector revolutionizes medical services by providing advanced treatments, medication management, and robotic assistance while overcoming resource limitations. In the current healthcare domain, an intermediate robotic communication platform is essential for distributing medical services equitably, facilitating remote consultations, and maintaining the integrity of medical education, especially in rural areas and during pandemics. This work introduces ARIS, a multitasking medical service robot designed for telemedicine and for facilitating remote medical education activities such as ward rounds. A prototype called ARIS 1.0 was developed, comprising a three-wheeled omnidirectional mobile platform, a torso, and a novel movable neck mechanism with a face. The prototype robot can generate an online summarized report using its integrated language interaction and IoT-based vital sign extraction modules. ROS-based semi-autonomous navigation enables the robot to act as an assistive agent, allowing it to either accompany doctors or visit patients individually. Ultimately, ARIS 1.0 offers telepresence and regional-language capabilities, specifically Sinhala-based communication features. This enables inter-party communication among doctors, medical students, and patients. The functionalities of ARIS 1.0 were validated in an emulated indoor environment to evaluate their feasibility. The results indicate that ARIS 1.0 is feasible for providing remote medical services. Furthermore, the paper discusses several promising research directions related to the proposed concept.
|
|
16:30-18:00, Paper ThCT15-AX.6 | Add to My Program |
Design, Modeling and Analysis of a Spherical Parallel Continuum Manipulator for Nursing Robots |
|
Gong, Zhenhua | Soochow University |
Ning, Chuanxin | Soochow University |
Liang, Jiejunyi | Huazhong University of Science and Technology |
Zhang, Ting | Soochow University |
Keywords: Service Robotics, Medical Robots and Systems, Redundant Robots
Abstract: In the healthcare industry, nursing robots have made great contributions, assisting in the delivery of food and medicine as well as the movement and transfer of patients. However, traditional continuum manipulators often suffer from a limited workspace and weak carrying capacity. Compared with traditional rigid-link manipulators, continuum manipulators have the advantages of a small moment of inertia and high dexterity. This paper proposes an original cable-driven parallel continuum manipulator with spherical parallel mechanisms as the continuous segments. Due to the characteristics of spherical parallel mechanisms, the proposed cable-driven spherical parallel continuum manipulator offers many inherent advantages for nursing robots. The prototype is tested and analyzed, and the kinematics and statics are verified. The results show that the cable-driven spherical parallel continuum manipulator for nursing robots has low workspace requirements, is suitable for complex spaces, and can have a large carrying capacity.
|
|
16:30-18:00, Paper ThCT15-AX.7 | Add to My Program |
LeagTag: An Elongated High-Accuracy Fiducial Marker for Tight Spaces |
|
Tanaka, Hideyuki | National Institute of AIST |
Ogata, Kunihiro | National Institute of Advanced Industrial Science and Technology |
Keywords: Service Robotics, Mobile Manipulation, Sensor-based Control
Abstract: Fiducial markers enable reliable service robot control. In human-robot coexistence environments, efficient placement of square or circular markers can be challenging due to limited space. In this study, we developed a world-first elongated fiducial marker, capable of high-accuracy 6-DoF measurements and designed to be installable in tight spaces. We introduced two types of lenticular angle gauges to enhance pose estimation and developed new marker patterns and measurement algorithms to maintain recognition distance and accuracy. The proposed marker achieved a measurement accuracy of 0.1% position error and 0.5 deg orientation error. This technology will enhance the practicality and applicability of fiducial markers, contributing to the creation of robot-friendly spaces for future service robots.
|
|
16:30-18:00, Paper ThCT15-AX.8 | Add to My Program |
An Open and Flexible Robot Perception Framework for Mobile Manipulation Tasks |
|
Mania, Patrick | University of Bremen |
Stelter, Simon | Universität Bremen |
Kazhoyan, Gayane | University of Bremen |
Beetz, Michael | University of Bremen |
Keywords: Service Robotics, Perception for Grasping and Manipulation, Software, Middleware and Programming Environments
Abstract: Over the last years, powerful methods for solving specific perception problems such as object detection, pose estimation or scene understanding have been developed. While performing mobile manipulation actions, a robot's perception framework needs to execute a series of these methods in a specific sequence each time it receives a new perception task. Generating proficient combinations of vision methods to solve individual perception tasks remains a challenge, as the combination depends on the requirements of the task and the capabilities of the robot's hardware. In this paper, we propose RoboKudo, an open-source knowledge-enabled perception framework that leverages the strengths of the Unstructured Information Management (UIM) principle and the flexibility of Behavior Trees to model task-specific perception processes. The framework can combine state-of-the-art computer vision methods to satisfy the requirements of each perception task and scales to different robot platforms. The generality and effectiveness of the framework are evaluated in real world experiments where it solves various perception tasks in the context of mobile manipulation actions in a household domain. Code and additional material are available at https://robokudo.ai.uni-bremen.de/rkop.
|
|
16:30-18:00, Paper ThCT15-AX.9 | Add to My Program |
Toward Mass Customization of a Robot's Morphology Design for Improving Area Coverage |
|
Muthugala Arachchige, Viraj Jagathpriya Muthugala | Singapore University of Technology and Design |
Samarakoon Mudiyanselage, Bhagya Prasangi Samarakoon | Singapore University of Technology and Design |
Enjikalayil Abdulkader, Raihan | Singapore University of Technology and Design |
Elara, Mohan Rajesh | Singapore University of Technology and Design |
Keywords: Service Robotics, Product Design, Development and Prototyping, Mechanism Design
Abstract: Floor cleaning robots have been developed to cater to building maintenance needs. Complete area coverage is crucial for a floor cleaning robot, and its morphology design plays a vital role in realizing complete area coverage. However, floor cleaning robots with fixed morphologies have difficulty in achieving high area coverage performance. Mass customization of a robot's morphology would improve its productivity in terms of area coverage. This paper proposes a novel system that can be used for mass customizing the morphology of a robot to improve area coverage performance in an environment of interest. The customized morphology is determined through an optimization technique by considering the environment of interest and design constraints. The area coverage of a candidate morphology design is evaluated by simulating the robot's navigation in the environment of interest. Generalized pattern search, particle swarm optimization, and surrogate optimization are independently considered as optimization techniques. Experiments have been conducted considering realistic robot deployment cases. The statistical conclusions on the experimental results validate that the proposed system can synthesize a morphology that significantly improves area coverage performance in an environment of interest.
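To make the evaluate-and-optimize loop concrete, here is a deliberately simple sketch under stated assumptions (the coverage simulator is a hypothetical placeholder, and plain random search stands in for the pattern search, PSO, and surrogate optimizers the paper actually uses):

```python
# Hypothetical sketch: search morphology parameters for best simulated coverage.
import random

def simulate_coverage(width, length):
    # Placeholder for the navigation-simulation coverage evaluation.
    return 1.0 - abs(width - 0.35) - abs(length - 0.45)

def optimize_morphology(samples=200, bounds=((0.2, 0.6), (0.2, 0.6))):
    best, best_cov = None, float("-inf")
    for _ in range(samples):
        w = random.uniform(*bounds[0])     # candidate robot width [m]
        d = random.uniform(*bounds[1])     # candidate robot length [m]
        cov = simulate_coverage(w, d)      # score this candidate design
        if cov > best_cov:
            best, best_cov = (w, d), cov
    return best, best_cov

print(optimize_morphology())
```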
|
|
ThCT16-AX Oral Session, AX-204 |
Add to My Program |
Wearable Robotics II |
|
|
Chair: Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Co-Chair: Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
|
16:30-18:00, Paper ThCT16-AX.1 | Add to My Program |
Leaf-Inspired FSR Array and Insole-Type Sensor Module for Mobile Three-Dimensional Ground Reaction Force Estimation |
|
Kim, Taeyeon | Korea Advanced Institute of Science and Technology |
Song, Eunseok | Korea Advanced Institute of Science and Technology (KAIST) |
An, Seongbin | KAIST |
Choi, Hyunjin | Sangmyung University |
Kong, Kyoungchul | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Robotics, Soft Sensors and Actuators
Abstract: This paper presents an insole-type sensor module with a novel leaf-inspired force-sensitive resistor (FSR) array for accurate three-dimensional ground reaction force (GRF) estimation during various human motions. Joint torque analysis, essential for numerous applications in biomechanics and wearable robotics, necessitates the measurement of three-dimensional GRF vector information, traditionally achieved in indoor environments using costly force plates. To overcome these limitations, this study proposes an alternative method by incorporating FSRs on three inclined planes within the insole. A vector scaling process transforms the force values from the FSRs into the three-dimensional force vector, enabling continuous and user-independent estimation of the GRF. The sensor module is integrated with machine learning, demonstrating its accuracy and usability in various motion scenarios. The results confirm the effectiveness of the leaf-inspired FSR array, opening possibilities for portable and cost-effective motion analysis systems.
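The vector-scaling idea admits a small sketch (the plane normals and scale below are hypothetical, not the paper's calibrated geometry): each FSR on an inclined plane measures force along that plane's normal, and the 3D GRF is approximated by summing the scaled per-plane vectors:

```python
# Hypothetical sketch of combining three inclined-plane FSR readings into a GRF.
import numpy as np

PLANE_NORMALS = np.array([        # assumed unit normals of the three planes
    [0.00,  0.50, 0.866],
    [0.43, -0.25, 0.866],
    [-0.43, -0.25, 0.866],
])

def estimate_grf(fsr_forces, scale=1.0):
    """fsr_forces: (3,) per-plane force magnitudes [N] -> 3D GRF vector."""
    f = np.asarray(fsr_forces, dtype=float)
    return scale * (f[:, None] * PLANE_NORMALS).sum(axis=0)

# Equal loading on all three planes yields a mostly vertical GRF.
print(estimate_grf([100.0, 100.0, 100.0]))
```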
|
|
16:30-18:00, Paper ThCT16-AX.2 | Add to My Program |
Human-Exoskeleton Locomotion Interaction Experience Transfer: Speeding up and Improving the Performance of Preference-Based Optimizations of Exoskeleton Assistance During Walking |
|
Li, Hongwu | Harbin Institute of Technology |
Liu, Junchen | Harbin Institute of Technology |
Wang, Ziqi | Harbin Institute of Technology |
Ju, Haotian | Harbin Institute of Technology |
Zheng, Tianjiao | Harbin Institute of Technology |
Gao, Yongsheng | Harbin Institute of Technology |
Zhao, Jie | Harbin Institute of Technology |
Zhu, Yanhe | Harbin Institute of Technology |
Keywords: Wearable Robotics
Abstract: Preference-based optimization methods have shown their advantages and potential in exploring individualized, comfortable, and effective control strategies and assistance parameters for exoskeletons during locomotion. Research indicates that, compared with naive wearers, knowledgeable wearers with abundant exoskeleton assistance experience have obvious advantages in speeding up the parameter exploration process and improving assistance performance. However, no existing method can utilize the human-exoskeleton locomotion interaction experience (HELIE) to assist naive wearers during the exploration process. In this work, we propose a novel preference-based human-exoskeleton locomotion interaction experience transfer (LIET) framework, which can speed up the exploration of human-preferred parameters and acquire more satisfying results for naive wearers via the HELIE acquired from knowledgeable wearers. In addition, based on the proposed LIET framework, we establish a mathematical expression of HELIE transfer during exoskeleton assistance. This will promote future research on utilizing HELIE for exoskeleton control parameter optimization. Finally, experiments demonstrate that the proposed LIET framework can speed up the exploration process and acquire more satisfying optimized results for naive wearers.
|
|
16:30-18:00, Paper ThCT16-AX.3 | Add to My Program |
Design of a Knee-Joint Exoskeleton to Reduce Misalignment in Both the Sagittal and Coronal Planes |
|
Sengupta, Shubhranil | Korea Advanced Institute of Science and Technology |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Robotics, Rehabilitation Robotics, Prosthetics and Exoskeletons
Abstract: Many individuals experience knee dysfunctions attributed to the natural aging process and degenerative conditions. To aid individuals in regaining knee functionality, supportive exoskeletons were designed to be affixed to both the shin and thigh. However, a common issue encountered in knee exoskeletons involves the misalignment of joints between the exoskeleton and the user, resulting in discomfort and potential injuries. To reduce misalignment with the knee joint, it is essential for the thigh and shin harnesses of the exoskeleton to replicate the natural trajectories of the knee. However, achieving this is a complex task due to the shifting center of rotation of the knee in both the Sagittal and Coronal planes. Previous knee exoskeletons primarily focus on aligning the joint in the Sagittal plane, neglecting alignment in the other dimension due to inherent design constraints. For the first time, this study introduces a knee-joint exoskeleton capable of conforming to the natural movement of the knee in both the Sagittal and Coronal planes, with the aim of minimizing joint misalignment without the use of inherently soft materials. A spherical scissor linkage mechanism (SSLM) was utilized in conjunction with a customized guide rail to adjust the center of rotation of the SSLM. This configuration facilitates knee flexion/extension while accommodating the knee joint's center of rotation in both the Sagittal and Coronal planes. The experimental outcomes demonstrated a substantial reduction in misalignment with the knee when compared to a commercial knee-support brace with a one-degree-of-freedom revolute joint.
|
|
16:30-18:00, Paper ThCT16-AX.4 | Add to My Program |
Adaptive Active Disturbance Rejection Control of an Actuated Ankle Foot Orthosis for Ankle Movement Assistance |
|
Jradi, Rami | UPEC |
Rifai, Hala | University of Paris Est Créteil |
Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
Keywords: Wearable Robotics, Robust/Adaptive Control, Motion Control
Abstract: Foot-drop (FD) is a post-stroke gait disorder characterized by impaired foot lifting during the swing phase. This paper focuses on providing continuous ankle joint assistance throughout the gait cycle using an actuated ankle foot orthosis (AAFO). The control strategy is based on an adaptive active disturbance rejection controller (AADRC), such that the orthosis provides only the required amount of assistance to complement the human effort needed to perform the walking activity. The proposed controller exhibits adaptability, making it suitable for various subjects without the need for prior parameter identification. To demonstrate its effectiveness, the control strategy is experimentally validated with five healthy subjects and compared to state-of-the-art controllers.
|
|
16:30-18:00, Paper ThCT16-AX.5 | Add to My Program |
A Novel Funnel-Based L1 Adaptive Fuzzy Approach for the Control of an Actuated Ankle Foot Orthosis |
|
Bey, Oussama | University Paris-Est Créteil - UPEC |
Jradi, Rami | UPEC |
Moon, Huiseok | LISSI-Lab, Université de Paris-Est Créteil (UPEC) |
Rifai, Hala | University of Paris Est Créteil |
Das Sharma, Kaushik | University of Calcutta |
Amirat, Yacine | University of Paris Est Créteil (UPEC) |
Mohammed, Samer | University of Paris Est Créteil - (UPEC) |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Physically Assistive Devices
Abstract: This paper introduces a novel funnel-based adaptive L1 fuzzy control strategy for assisting ankle joint movement during walking using an actuated ankle foot orthosis (AAFO). A projection-based adaptation mechanism employing a fuzzy system is used to estimate the unknown time-varying parameters of the L1 control law, ensuring precise tracking of the AAFO-wearer system by the state estimator. The projection operator guarantees the convergence of the parameters while keeping the assistance torque bounded. Funnel-based feedback control is used to mitigate the time lag typically seen in L1-based approaches due to the low-pass filter they commonly employ. The effectiveness of the proposed control strategy is demonstrated through experiments involving five healthy subjects.
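For background, one standard form of the projection operator used in L1-type adaptive laws is sketched below; this is textbook material from the adaptive control literature, not an equation reproduced from the paper. Here theta is the parameter estimate, y the unconstrained adaptation signal, and f a smooth convex function bounding the admissible parameter set:

```latex
\mathrm{Proj}(\theta, y) =
\begin{cases}
  y - \dfrac{\nabla f(\theta)\, \nabla f(\theta)^{\top}}
            {\lVert \nabla f(\theta) \rVert^{2}}\, y\, f(\theta),
    & \text{if } f(\theta) > 0 \ \text{and}\ y^{\top} \nabla f(\theta) > 0,\\[6pt]
  y, & \text{otherwise.}
\end{cases}
```

The deflection term removes the component of y that would push theta outside the bound, which is how the operator keeps the estimated parameters (and hence the assistance torque) bounded.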
|
|
16:30-18:00, Paper ThCT16-AX.6 | Add to My Program |
Pneumatic Back Exoskeleton for Lifting Posture Detection and Correction |
|
Chen, Yu | Nanyang Technological University |
Wang, Minda | Nanyang Technological University |
Wang, Yifan | Nanyang Technological University |
Keywords: Wearable Robotics, Prosthetics and Exoskeletons, Soft Robot Applications
Abstract: Low back pain is a widespread issue that affects people worldwide and can lead to serious conditions such as herniated discs, spinal stenosis, or lumbar radiculopathy. Improper posture while lifting heavy weights is a common cause of back pain, especially among laborers. However, current back exoskeletons are often bulky and require electric motors, making them challenging to use and consuming significant power. Some passive exoskeletons don’t require power, but their fixed stiffness constrains normal motion. This paper presents a novel solution: a pneumatic back exoskeleton made of structured fabrics that can adjust stiffness under various air pressures. Additionally, it includes IMU sensors to detect lifting posture and correct it in real time. The exoskeleton’s effectiveness was tested through lifting experiments, demonstrating that it significantly corrects lifting posture, reduces stress on the lumbar spine, and mitigates back muscle stress. This pneumatic back exoskeleton offers a promising solution to prevent low back pain during weight-lifting tasks and provides guidance for future back exoskeleton designs.
|
|
ThCT18-AX Oral Session, AX-206 |
Add to My Program |
Representation Learning II |
|
|
Chair: So, Peter | Technical University of Munich |
Co-Chair: Fuxin, Li | Oregon State University |
|
16:30-18:00, Paper ThCT18-AX.1 | Add to My Program |
CITR: A Coordinate-Invariant Task Representation for Robotic Manipulation |
|
So, Peter | Technical University of Munich |
Cabral Muchacho, Rafael Ignacio | KTH Royal Institute of Technology |
Kirschner, Robin Jeanne | TU Munich, Institute for Robotics and Systems Intelligence |
Swikir, Abdalla | Technical University of Munich |
Figueredo, Luis | Technical University of Munich (TUM) |
Abu-Dakka, Fares | Mondragon University |
Haddadin, Sami | Technical University of Munich |
Keywords: Representation Learning, Learning Categories and Concepts, Dexterous Manipulation
Abstract: The basis for robotics skill learning is an adequate representation of manipulation tasks based on their physical properties. As manipulation tasks are inherently invariant to the choice of reference frame, an ideal task representation would also exhibit this property. Nevertheless, most robotic learning approaches use unprocessed, coordinate-dependent robot state data for learning new skills, thus inducing challenges regarding the interpretability and transferability of the learned models. In this paper, we propose a transformation from spatial measurements to a coordinate-invariant feature space, based on the pairwise inner product of the input measurements. We describe and mathematically deduce the concept, establish the task fingerprints as an intuitive image-based representation, experimentally collect task fingerprints, and demonstrate the usage of the representation for task classification. This representation motivates further research on data-efficient and transferable learning methods for online manipulation task classification and task-level perception.
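The coordinate-invariance property is easy to verify numerically: the matrix of pairwise inner products (a Gram matrix) is unchanged by any orthonormal change of reference frame. A minimal sketch with illustrative variable names, not the paper's code:

```python
# The Gram matrix of spatial measurements is invariant to frame rotations.
import numpy as np

def task_fingerprint(measurements):
    """Pairwise inner products of spatial measurements (T, 3) -> (T, T)."""
    X = np.asarray(measurements)
    return X @ X.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                    # e.g., force/position samples
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthonormal frame change
assert np.allclose(task_fingerprint(X), task_fingerprint(X @ Q.T))
print(task_fingerprint(X).shape)               # (5, 5) image-like fingerprint
```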
|
|
16:30-18:00, Paper ThCT18-AX.2 | Add to My Program |
SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics |
|
Rezazadeh, Alireza | University of Minnesota |
Badithela, Athreyi | University of Minnesota - Twin Cities |
Desingh, Karthik | University of Minnesota |
Choi, Changhyun | University of Minnesota, Twin Cities |
Keywords: Representation Learning, Deep Learning for Visual Perception, Visual Learning
Abstract: Learning multi-object dynamics from visual data using unsupervised techniques is challenging due to the need for robust object representations that can be learned through robot interactions. This paper presents a novel framework with two new architectures: SlotTransport for discovering object representations from RGB images and SlotGNN for predicting their collective dynamics from RGB images and robot interactions. Our SlotTransport architecture is based on slot attention for unsupervised object discovery and uses a feature transport mechanism to maintain temporal alignment in object-centric representations. This enables the discovery of slots that consistently reflect the composition of multi-object scenes. These slots robustly bind to distinct objects, even under heavy occlusion or absence. Our SlotGNN, a novel unsupervised graph-based dynamics model, predicts the future state of multi-object scenes. SlotGNN learns a graph representation of the scene using the discovered slots from SlotTransport and performs relational and spatial reasoning to predict the future appearance of each slot conditioned on robot actions. We demonstrate the effectiveness of SlotTransport in learning object-centric features that accurately encode both visual and positional information. Further, we highlight the accuracy of SlotGNN in downstream robotic tasks, including challenging multi-object rearrangement and long-horizon prediction. Finally, our unsupervised approach proves effective in the real world. With only minimal additional data, our framework robustly predicts slots and their corresponding dynamics in real-world control tasks.
|
|
16:30-18:00, Paper ThCT18-AX.3 | Add to My Program |
What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments? |
|
Silwal, Sneha | Meta |
Yadav, Karmesh | Georgia Tech |
Wu, Tingfan | Meta AI |
Vakil, Jay | Meta |
Majumdar, Arjun | Georgia Institute of Technology |
Arnaud, Sergio | Meta |
Chen, Claire | Stanford University |
Berges, Vincent-Pierre | Meta AI Research |
Batra, Dhruv | Georgia Tech / Facebook AI Research |
Rajeswaran, Aravind | Meta AI |
Kalakrishnan, Mrinal | Meta |
Meier, Franziska | Facebook |
Maksymets, Oleksandr | Facebook AI Research |
Keywords: Representation Learning, Deep Learning for Visual Perception, Vision-Based Navigation
Abstract: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study involves five different PVRs, each trained for five distinct manipulation or indoor navigation tasks. We performed this evaluation using three different robots and two different policy learning paradigms. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data-augmentation and fine-tuning, also transfer to real-world performance.
|
|
16:30-18:00, Paper ThCT18-AX.4 | Add to My Program |
L-DYNO: Framework to Learn Consistent Visual Features Using Robot’s Motion |
|
Singh, Kartikeya | University at Buffalo |
Adhivarahan, Charuvahan | University at Buffalo, State University of New York |
Dantu, Karthik | University at Buffalo |
Keywords: Representation Learning, Deep Learning for Visual Perception, Localization
Abstract: Historically, feature-based approaches have been used extensively for camera-based robot perception tasks such as localization, mapping, tracking, and others. Several of these approaches also combine other sensors (inertial sensing, for example) to perform combined state estimation. Our work rethinks this approach; we present a representation learning mechanism that identifies visual features that best correspond to robot motion as estimated by an external signal. Specifically, we utilize the robot's transformations through an external signal (inertial sensing, for example) and give attention to the image space that is most consistent with the external signal. We use a pairwise consistency metric as a representation to keep the visual features consistent through a sequence with the robot's relative pose transformations. This approach enables us to incorporate information from the robot's perspective instead of solely relying on image attributes. We evaluate our approach on real-world datasets such as KITTI and EuRoC and compare the refined features with existing feature descriptors. We also evaluate our method in a real-robot experiment. We observe an average 49% reduction in the image search space without compromising trajectory estimation accuracy. Our method reduces the execution time of visual odometry by 4.3% and also reduces reprojection errors. We demonstrate the need to select only the most important features and show competitive performance against various feature detection baselines.
|
|
16:30-18:00, Paper ThCT18-AX.5 | Add to My Program |
Point Cloud Models Improve Visual Robustness in Robotic Learners |
|
Peri, Skand | Oregon State University |
Lee, Iain | University of Utah |
Kim, Chanho | Oregon State University |
Fuxin, Li | Oregon State University |
Hermans, Tucker | University of Utah |
Lee, Stefan | Oregon State University |
Keywords: Representation Learning, Reinforcement Learning
Abstract: Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training -- often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners.
|
|
16:30-18:00, Paper ThCT18-AX.6 | Add to My Program |
HIO-SDF: Hierarchical Incremental Online Signed Distance Fields |
|
Vasilopoulos, Vasileios | Samsung Research America |
Garg, Suveer | University of Pennsylvania |
Huh, Jinwook | Samsung |
Lee, Bhoram | SRI International |
Isler, Volkan | University of Minnesota |
Keywords: Representation Learning, Incremental Learning
Abstract: A good representation of a large, complex mobile robot workspace must be space-efficient yet capable of encoding relevant geometric details. When exploring unknown environments, it needs to be updatable incrementally in an online fashion. We introduce HIO-SDF, a new method that represents the environment as a Signed Distance Field (SDF). State-of-the-art representations of SDFs are based on either neural networks or voxel grids. Neural networks are capable of representing the SDF continuously. However, they are hard to update incrementally, as neural networks tend to forget previously observed parts of the environment unless an extensive sensor history is stored for training. Voxel-based representations do not have this problem, but they are not space-efficient, especially in large environments with fine details. HIO-SDF combines the advantages of these representations using a hierarchical approach which employs a coarse voxel grid that captures the observed parts of the environment together with high-resolution local information to train a neural network. HIO-SDF achieves a 46% lower mean global SDF error across all test scenes than a state-of-the-art continuous representation, and a 30% lower error than a discrete representation at the same resolution as our coarse global SDF grid. Videos and code are available at: https://samsunglabs.github.io/HIO-SDF-project-page/
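The hierarchy's coarse half is easy to make concrete: a trilinear lookup into the global voxel grid, whose values (together with high-resolution local samples) would supervise the continuous network. A minimal sketch, not the released HIO-SDF code:

```python
# Trilinear interpolation of a coarse SDF voxel grid at a query point.
import numpy as np

def trilinear_sdf(grid, origin, voxel, p):
    u = (np.asarray(p, dtype=float) - origin) / voxel  # continuous voxel coords
    i = np.floor(u).astype(int)                        # lower corner index
    f = u - i                                          # fractional offsets
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                val += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return val

# Toy grid: SDF of a sphere of radius 2 voxels centered in an 8^3 grid.
ax = np.arange(8)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
grid = np.sqrt((X - 3.5)**2 + (Y - 3.5)**2 + (Z - 3.5)**2) - 2.0
print(trilinear_sdf(grid, origin=np.zeros(3), voxel=1.0, p=[3.5, 3.5, 3.5]))
```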
|
|
16:30-18:00, Paper ThCT18-AX.7 | Add to My Program |
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders for Manipulation Policies |
|
Qian, Jianing | University of Pennsylvania |
Panagopoulos, Anastasios | University of Pennsylvania |
Jayaraman, Dinesh | University of Pennsylvania |
Keywords: Representation Learning, Imitation Learning, Sensorimotor Learning
Abstract: Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From Transformers, abbreviated as SOFT, a wrapper around pre-trained vision transformer (PVT) models that bridges this gap without any further training. Rather than construct representations out of only the final layer activations, SOFT individuates and locates object-like entities from PVT attentions, and describes them with PVT activations, producing an object-centric representation. Across standard choices of generic pre-trained vision transformers PVT, we demonstrate in each case that policies trained on SOFT(PVT) far outstrip standard PVT representations for manipulation tasks in simulated and real settings, approaching the state-of-the-art robotics-aware representations. Appendix and videos: https://sites.google.com/view/robot-soft/
|
|
16:30-18:00, Paper ThCT18-AX.8 | Add to My Program |
NeRF-Loc: Transformer-Based Object Localization within Neural Radiance Fields |
|
Sun, Jiankai | Stanford University |
Xu, Yan | The Chinese University of Hong Kong |
Ding, Mingyu | UC Berkeley |
Yi, Hongwei | Max Planck Institute for Intelligent Systems |
Wang, Chen | Stanford University |
Wang, Jingdong | Baidu |
Zhang, Liangjun | Baidu |
Schwager, Mac | Stanford University |
Keywords: Representation Learning, Semantic Scene Understanding, Computer Vision for Automation
Abstract: Neural Radiance Fields (NeRFs) have become a widely-applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real-time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and details of target objects. The encoded features are then fused together with attention layers to alleviate ambiguities for accurate object localization. We have compared our method with conventional RGB(-D) based methods that take rendered RGB images and depths from NeRFs as inputs. Our method achieves state-of-the-art performance. In addition, we also present the first NeRF samples-based object localization benchmark, NeRFLocBench. We will make the benchmark and code publicly available.
|
|
ThCT19-NT Oral Session, NT-G301 |
Add to My Program |
Surgical Robotics III |
|
|
Chair: Huang, Huang | University of California at Berkeley |
Co-Chair: Iordachita, Ioan Iulian | Johns Hopkins University |
|
16:30-18:00, Paper ThCT19-NT.1 | Add to My Program |
Iterative PnP and Its Application in 3D-2D Vascular Image Registration for Robot Navigation |
|
Song, Jingwei | University of Michigan |
Yang, Keke | United Imaging |
Zhang, Zheng | Institute of Medical Imaging Technology, School of Biomed |
Li, Meng | Shanghai United Imaging Healthcare Co., Ltd |
Cao, Tuoyu | United Imaging Healthcare |
Ghaffari, Maani | University of Michigan |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Localization, Computer Vision for Medical Robotics
Abstract: This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and the computational efficiency requirements of intervention robot applications. We cast centerline-based vascular 3D-2D image registration as an iterative Perspective-n-Point (PnP) problem and propose using the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the "big-to-small" problem in typical robotic scenarios. Finally, iteratively reweighted least squares is applied to solve the RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm performs registration at over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains registration accuracy competitive with other works. The results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.
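For background, the iteratively reweighted least squares step the abstract mentions follows a standard pattern, sketched below with Huber weights on a generic linear model r = Ax - b; the paper's actual formulation is RKHS-based and solved on the Lie manifold, which this toy version does not reproduce:

```python
# Generic IRLS with Huber weights (illustrative, not the paper's solver).
import numpy as np

def irls(A, b, delta=1.0, iters=20):
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS initialisation
    for _ in range(iters):
        r = A @ x - b                          # current residuals
        absr = np.maximum(np.abs(r), 1e-12)    # avoid division by zero
        w = np.where(absr <= delta, 1.0, delta / absr)  # Huber weights
        sw = np.sqrt(w)                        # solve the reweighted LS problem
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# Outliers in b barely move the IRLS estimate, unlike plain least squares.
A = np.vstack([np.ones(10), np.arange(10.0)]).T
b = A @ np.array([1.0, 2.0]); b[3] += 50.0     # one gross outlier
print(irls(A, b))                              # close to [1, 2]
```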
|
|
16:30-18:00, Paper ThCT19-NT.2 | Add to My Program |
Sim2Real Transfer of Reinforcement Learning for Concentric Tube Robots |
|
Iyengar, Keshav Kannan | University College London |
Sadati, S.M.Hadi | King's College London |
Bergeles, Christos | King's College London |
Spurgeon, Sarah | University College London |
Stoyanov, Danail | University College London |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Machine Learning for Robot Control, Reinforcement Learning
Abstract: Concentric Tube Robots (CTRs) are promising for minimally invasive interventions due to their miniature diameter, high dexterity, and compliance with soft tissue. CTRs comprise individual pre-curved tubes, usually composed of NiTi, arranged concentrically. As each tube is relatively rotated and translated, the backbone elongates, twists, and bends with a dexterity that is advantageous in confined spaces. Tube interactions, unmodelled phenomena, and inaccurate tube parameter estimation make physical modeling of CTRs challenging, complicating in turn kinematics and control. Deep reinforcement learning (RL) has been investigated as a solution. However, hardware validation has remained a challenge due to differences between the simulation and hardware domains. In this work, domain randomization is proposed as a strategy for transferring a policy trained only in simulation to hardware, with no additionally acquired physical training data. The differences in simulation and hardware forward kinematics accuracy and precision are characterized by errors of 14.74 +/- 8.87 mm, or 26.61 +/- 17.00% of robot length. We show that the proposed domain randomization approach reduces mean errors by 56% compared to no domain randomization. Furthermore, we demonstrate path-following capability in hardware on a line path, with resulting errors of 4.37 +/- 2.39 mm, or 5.61 +/- 3.11% of robot length.
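Domain randomization itself is a small loop (parameter names, ranges, and the simulator call below are illustrative assumptions, not the paper's values): each training episode perturbs the simulated tube parameters so the learned policy cannot overfit to one parameter set:

```python
# Hypothetical sketch of per-episode CTR parameter randomization.
import numpy as np

rng = np.random.default_rng(0)

def randomize_tube_params(nominal, spread=0.1):
    """Perturb nominal tube parameters uniformly within +/-spread."""
    return {k: v * rng.uniform(1 - spread, 1 + spread)
            for k, v in nominal.items()}

nominal = {"curvature": 5.0, "length": 0.15, "stiffness": 1.0}
for episode in range(3):
    params = randomize_tube_params(nominal)
    # env.reset(tube_params=params)        # hypothetical simulator call
    print(episode, params)
```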
|
|
16:30-18:00, Paper ThCT19-NT.4 | Add to My Program |
A Kinetostatic Model for Concentric Push-Pull Robots |
|
Childs, Jake | EndoTheia, Inc |
Rucker, Caleb | University of Tennessee |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Kinematics, Continuum Robots
Abstract: Concentric push-pull robots (CPPR) operate through the mechanical interactions of concentrically nested, laser-cut tubes with offset stiffness centers. The distal tips of the tubes are attached to each other, and relative displacement of the tube bases generates bending in the CPPR. Previous CPPR kinematic models assumed two tubes, planar shapes, no torsion, and no external loads. In this paper, we develop a new, more general CPPR model accounting for any number of tubes, describing their variable-curvature 3D shape when actuated, including the effects of torsion and external loads. To accomplish this, we employ a modified Kirchhoff rod model for each tube (with offset stiffness center) and embed the constraints of concentricity. We use an energy method to determine robot shape as a function of actuation and external loading. We experimentally validate this kinetostatic model on prototype CPPRs with two tubes and three tubes and non-constant laser-cut patterns that create variable curvature and stiffness. Experimental results agree with the model, paving the way for use of this model in design optimization, planning, and control of CPPRs.
|
|
16:30-18:00, Paper ThCT19-NT.5 | Add to My Program |
Fully Distributed Shape Sensing of a Flexible Surgical Needle Using Optical Frequency Domain Reflectometry for Prostate Interventions |
|
Francoeur, Jacynthe | Polytechnique Montréal |
Lezcano, Dimitri A. | Johns Hopkins University |
Zhetpissov, Yernar | Johns Hopkins University |
Kashyap, Raman | Polytechnique Montreal |
Iordachita, Ioan Iulian | Johns Hopkins University |
Kadoury, Samuel | Polytechnique Montréal |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems, Visual Tracking
Abstract: In minimally invasive procedures such as biopsies and prostate cancer brachytherapy, accurate needle placement remains challenging due to limitations in current tracking methods related to interference, reliability, resolution or image contrast. This often leads to frequent needle adjustments and reinsertions. To address these shortcomings, we introduce an optimized needle shape-sensing method using a fully distributed grating-based sensor. The proposed method uses simple trigonometric and geometric modeling of the fiber using optical frequency domain reflectometry (OFDR), without requiring prior knowledge of tissue properties or needle deflection shape and amplitude. Our optimization process includes a reproducible calibration process and a novel tip curvature compensation method. We validate our approach through experiments in artificial isotropic and inhomogeneous animal tissues, establishing ground truth using 3D stereo vision and cone beam computed tomography (CBCT) acquisitions, respectively. Our results yield an average RMSE ranging from 0.58 ± 0.21 mm to 0.66 ± 0.20 mm depending on the chosen spatial resolution, achieving the submillimeter accuracy required for interventional procedures.
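The underlying geometric idea is compact enough to sketch in the planar case (the paper's method is 3D and OFDR-specific; this toy version only illustrates integrating curvature into shape):

```python
# Planar sketch: integrate distributed curvature samples into a needle shape.
import numpy as np

def shape_from_curvature(kappa, ds):
    """kappa: (N,) curvature samples [1/m]; ds: arclength step [m]."""
    theta = np.cumsum(kappa) * ds          # tangent angle along the fiber
    x = np.cumsum(np.sin(theta)) * ds      # lateral deflection
    z = np.cumsum(np.cos(theta)) * ds      # insertion depth
    return np.stack([x, z], axis=1)

# Constant curvature of 2 1/m over 120 mm reconstructs a circular arc.
points = shape_from_curvature(np.full(120, 2.0), ds=1e-3)
print(points[-1])                          # approximate tip position [m]
```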
|
|
16:30-18:00, Paper ThCT19-NT.6 | Add to My Program |
Integrated Magnetic Location Sensing and Actuation of Steerable Robotic Catheters for Peripheral Arterial Disease Treatment |
|
Wu, Jingjie | The University of Texas at Austin |
Yu, Kevin | University of Texas at Austin |
Lopez, Ithza | University of Texas at Austin |
Aguilar Izquierdo, Alexa | The University of Texas at Austin |
Saber, Hamidreza | The University of Texas at Austin |
Alambeigi, Farshid | University of Texas at Austin |
Zhou, Lei | University of Wisconsin-Madison |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Medical Robots and Systems
Abstract: Magnetically steerable robotic catheters (MSRC) are a promising technology for percutaneous endovascular intervention procedures to treat peripheral arterial diseases, where magnetic actuation is used to steer the catheter tip during navigation. However, today's MSRC systems require fluoroscopic imaging for catheter localization during navigation, which risks creating radiation-induced injuries to both the patient and the surgeon. Aiming to reduce the duration of x-ray radiation in interventions using MSRCs, this letter introduces a new steerable robotic catheter system that integrates magnetic location sensing and magnetic actuation. The proposed catheter uses a magnetic tip to enable magnetic steering. In addition, a cylindrical array of magnetic sensors is used to measure the field from the catheter tip to enable real-time catheter localization. To improve localization accuracy, a novel nested calibration algorithm for sensor positions and magnet dipole strength is introduced. This letter further proposes a novel integration of magnetic actuation and magnetic localization in MSRC systems, where fluoroscopic imaging is only required during catheter steering at bifurcations in the vasculature. The proposed methodology is tested with an MSRC prototype, where the magnet location estimation algorithm is implemented for real-time visual feedback to the operator with a low latency of 400 ms. Experiments show that an average localization error of 0.95 mm can be achieved.
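Localization from such a cylindrical sensor array typically inverts the standard point-dipole field model for the tip magnet; the textbook form below is background material, not an equation taken from the letter. Here m is the tip's dipole moment and p_tip its position:

```latex
\mathbf{B}(\mathbf{p}_{\text{sensor}})
  = \frac{\mu_0}{4\pi \lVert \mathbf{r} \rVert^{3}}
    \left( 3\,\hat{\mathbf{r}}\hat{\mathbf{r}}^{\top} - \mathbf{I} \right)\mathbf{m},
\qquad
\mathbf{r} = \mathbf{p}_{\text{sensor}} - \mathbf{p}_{\text{tip}},
\quad
\hat{\mathbf{r}} = \mathbf{r} / \lVert \mathbf{r} \rVert .
```

Estimating p_tip then amounts to a nonlinear least-squares fit of this model to the array's field measurements.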
|
|
16:30-18:00, Paper ThCT19-NT.7 | Add to My Program |
Semi-Autonomous Robotic Manipulator for Minimally Invasive Aortic Valve Replacement |
|
Tamadon, Izadyar | University of Twente |
Sadati, S.M.Hadi | King's College London |
Mamone, Virginia | University of Pisa, EndoCAS |
Ferrari, Vincenzo | Università di Pisa |
Bergeles, Christos | King's College London |
Menciassi, Arianna | Scuola Superiore Sant'Anna - SSSA |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Motion Control of Manipulators, Force and Tactile Sensing, Visual Servoing
Abstract: Aortic valve surgery is the preferred procedure for replacing a damaged valve with an artificial one. The ValveTech robotic platform comprises a flexible articulated manipulator and a surgical interface supporting the effective delivery of an artificial valve by tele-operation and endoscopic vision. This manuscript presents our recent work on force-perceptive, safe, semi-autonomous navigation of the ValveTech platform prior to valve implantation. First, we present a force observer that transfers forces from the manipulator body and tip to a haptic interface. Second, we demonstrate how hybrid forward/inverse mechanics together with endoscopic visual servoing lead to autonomous valve positioning. Benchtop experiments and an artificial phantom quantify the performance of the developed robot controller and navigator. Valves can be autonomously delivered with a 2.0±0.5 mm position error and minimal misalignment of 3.4±0.9°. The hybrid Force/Shape Observer (FSO) algorithm was able to predict distributed external forces on the articulated manipulator body with an average error of 0.09 N. FSO can also estimate loads on the tip with an average accuracy of 3.3%. The presented system can lead to better
|
|
16:30-18:00, Paper ThCT19-NT.8 | Add to My Program |
Robotic Needle Insertion with 2D Ultrasound – 3D CT Fusion Guidance (I) |
|
Lei, Long | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
Zhao, Baoliang | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Qi, Xiaozhi | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Mi, Rui | Department of Radiology, Shenzhen University General Hospital, Shenzhen |
Ye, Hai | Department of Radiology, Shenzhen University General Hospital, Shenzhen |
Zhang, Peng | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Wang, Qiong | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences |
Heng, Pheng Ann | The Chinese University of Hong Kong |
Hu, Ying | Shenzhen Institute of Advanced Technology, Shenzhen, China |
Keywords: Surgical Robotics: Steerable Catheters/Needles, Software Architecture for Robotic and Automation, Medical Robots and Systems
Abstract: Puncture robots pave a new way for stable, accurate, and safe percutaneous liver tumor puncture operations. However, affected by respiratory motion, accurate intraoperative localization of the tumor and its surrounding anatomical structures remains a difficult problem in existing robot-assisted puncture operations. In this paper, a dual-arm robotic needle insertion system guided by the fusion of intraoperative 2D ultrasound (US) and preoperative 3D computed tomography (CT) is proposed, addressing the shortcomings of existing puncture robots. To deal with the challenge of cross-modal and cross-dimensional registration between 2D US and 3D CT, a decoupled two-stage registration approach is proposed, combining initial vessel-structure-based 3D US – 3D CT registration with intraoperative intensity-based 2D US – 3D US registration. To achieve fast and robust ultrasound probe calibration, a method based on an improved N-wire phantom is proposed. Twenty puncture experiments are performed at different breath-holding positions on a respiratory motion simulation platform, and the experimental results show that the mean puncture error is 2.48 mm, which can meet the requirements of a wide range of clinical scenarios.
|
|
ThCT20-NT Oral Session, NT-G302 |
Add to My Program |
Software for Robotic and Automation |
|
|
Chair: Oldemeyer, Carsten | German Aerospace Center |
Co-Chair: Reiher, Lennart | RWTH Aachen University |
|
16:30-18:00, Paper ThCT20-NT.1 | Add to My Program |
MoRC - a Modular Robot Controller |
|
Oldemeyer, Carsten | German Aerospace Center |
Hellerer, Matthias | German Aerospace Center |
Reiner, Matthias | German Aerospace Center |
Thiele, Bernhard | German Aerospace Center |
Weber, Patrick | German Aerospace Center (DLR) |
Bellmann, Tobias | German Aerospace Center |
Keywords: Software Architecture for Robotic and Automation, Control Architectures and Programming, Industrial Robots
Abstract: MoRC is a high-performance modular robot controller based on the Functional Mock-up Interface (FMI) standard. The goal is to control any (industrial) robot with electrical drives using a customizable vendor-agnostic control cabinet and an innovative, self-developed software architecture based on exchangeable multi-rate real-time control components with standardized interfaces. On the hardware side, the use of EtherCAT (Ethernet for Control Automation Technology) allows connecting a freely selectable number of COTS (commercial off-the-shelf) electrical drives and sensors. On the software side, this is matched with exchangeable control software modules based on the FMI standard. These can be interconnected to form user-defined multi-rate control structures, which can be executed as synchronized real-time threads on a central Linux-based multi-core computing unit. This unlocks additional computational potential for advanced high-frequency control algorithms. Control structures can be switched at runtime to handle highly diverse control tasks. This paper presents the architectural concepts as well as first experiments on an industrial robot testbed.
|
|
16:30-18:00, Paper ThCT20-NT.2 | Add to My Program |
Enabling the Deployment of Any-Scale Robotic Applications in Microservice Architectures through Automated Containerization |
|
Busch, Jean-Pierre | RWTH Aachen University |
Reiher, Lennart | RWTH Aachen University |
Eckstein, Lutz | Institute for Automotive Engineering, RWTH Aachen University |
Keywords: Software Architecture for Robotic and Automation, Software Tools for Robot Programming, Intelligent Transportation Systems
Abstract: In an increasingly automated world – from warehouse robots to self-driving cars – streamlining the development and deployment process and operations of robotic applications becomes ever more important. Automated DevOps processes and microservice architectures have already proven successful in other domains such as large-scale customer-oriented web services (e.g., Netflix). We recommend employing similar microservice architectures for the deployment of small- to large-scale robotic applications in order to accelerate development cycles, loosen functional dependence, and improve resiliency and elasticity. In order to facilitate the involved DevOps processes, we present and release a tooling suite for automating the development of microservices for robotic applications based on the Robot Operating System (ROS). Our tooling suite covers the automated minimal containerization of ROS applications, a collection of useful machine learning-enabled base container images, as well as a CLI tool for simplified interaction with container images during the development phase. Within the scope of this paper, we embed our tooling suite into the overall context of streamlined robotics deployment and compare it to alternative solutions. We release our tools as open-source software at https://github.com/ika-rwth-aachen/dorotos.
|
|
16:30-18:00, Paper ThCT20-NT.3 | Add to My Program |
Plug’n Play Task-Level Autonomy for Robotics Using POMDPs and Probabilistic Programs |
|
Wertheim, Or | Ben Gurion University of the Negev |
Suissa, Dan Rouven | Ben-Gurion University of the Negev |
Brafman, Ronen | Ben-Gurion University |
Keywords: Software Architecture for Robotic and Automation, Task Planning
Abstract: We describe AOS, the first general-purpose system for model-based control of autonomous robots using AI planning that fully supports partial observability and noisy sensing. The AOS provides a code-based language for specifying a generative model of the system, making model specification easier and model sampling efficient. It also provides a language for specifying the relationship between the model and the actual code, using which it auto-generates all required integration code. This allows Plug'n Play behavior, which facilitates incremental and modular system design. Extensive experiments on real and simulated robotic platforms demonstrate these advantages.
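As a concrete illustration of the generative-model idea (plain Python, not the AOS specification language; the toy domain is invented for this sketch), a single sampling function maps a state and action to a next state, observation, and reward, and a POMDP solver simply calls it repeatedly:

```python
# Toy noisy-corridor POMDP expressed as a code-based generative model.
import random

def generative_model(state, action):
    """Sample (next_state, observation, reward) for a 1D corridor."""
    noise = random.choice([-1, 0, 0, 1])                  # stochastic transition
    next_state = max(0, min(10, state + action + noise))  # stay within corridor
    observation = next_state + random.choice([-1, 0, 1])  # noisy position sensor
    reward = 10.0 if next_state == 10 else -1.0           # goal at cell 10
    return next_state, observation, reward

state = 0
for _ in range(5):
    state, obs, rew = generative_model(state, +1)
    print(state, obs, rew)
```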
|
|
16:30-18:00, Paper ThCT20-NT.4 | Add to My Program |
CoBRA: A Composable Benchmark for Robotics Applications |
|
Mayer, Matthias | Technical University of Munich |
Külz, Jonathan | Technical University of Munich |
Althoff, Matthias | Technische Universität München |
Keywords: Software Tools for Benchmarking and Reproducibility, Cellular and Modular Robots, Industrial Robots
Abstract: Selecting an optimal robot, its base pose, and trajectory for a given task is currently mainly done by human expertise or trial and error. To evaluate automatic approaches to this combined optimization problem, we introduce a benchmark suite encompassing a unified format for robots, environments, and task descriptions. Our benchmark suite is especially useful for modular robots, where the multitude of robots that can be assembled creates a host of additional parameters to optimize. We include tasks such as machine tending and welding in synthetic environments and 3D scans of real-world machine shops. All benchmarks are accessible through cobra.cps.cit.tum.de, a platform to conveniently share, reference, and compare tasks, robot models, and solutions.
|
|
16:30-18:00, Paper ThCT20-NT.5 | Add to My Program |
GSL-Bench: High Fidelity Gas Source Localization Benchmarking Tool |
|
Erwich, Hajo Henricus | Delft University of Technology |
Duisterhof, Bart | Carnegie Mellon University |
de Croon, Guido | Delft University of Technology |
Keywords: Software Tools for Benchmarking and Reproducibility, Localization, Data Sets for Robot Learning
Abstract: Gas Source Localization (GSL) is a challenging field of research within the robotics community, with high-stakes search-and-rescue applications. Existing methods vary widely, and each has its strengths and weaknesses. Comparisons of different methods are limited due to the lack of a broadly adopted and standardized testing methodology. Existing GSL evaluations vary in environment size, wind conditions, and gas simulation fidelity. They also lack photo-realistic rendering for the integration of obstacle avoidance. In this paper, we propose GSL-Bench, a benchmarking tool that can evaluate the performance of existing GSL algorithms. GSL-Bench features high-fidelity graphics and gas simulation, building on NVIDIA's Isaac Sim and the OpenFOAM computational fluid dynamics (CFD) software. Realism is further increased by simulating relevant gas and wind sensors. Scene generation is simplified with the introduction of AutoGDM+, capable of procedural environment generation, CFD, and particle-based gas dispersion simulation. To illustrate GSL-Bench's capabilities, three algorithms are compared in six warehouse settings of increasing complexity: E. coli, dung beetle, and a random walker. Our results demonstrate GSL-Bench's ability to provide valuable insights into algorithm performance.
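Of the three baselines, the random walker is simple enough to sketch in full (grid size and step count are illustrative; the benchmark itself runs agents in Isaac Sim, not on a toy grid):

```python
# Toy random-walker GSL baseline: moves uniformly at random, ignoring gas cues.
import random

def random_walker(start, steps=500, size=(50, 50)):
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = max(0, min(size[0] - 1, x + dx))   # clamp to the arena
        y = max(0, min(size[1] - 1, y + dy))
        path.append((x, y))
    return path

print(random_walker((25, 25))[-5:])            # last few visited cells
```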
|
|
16:30-18:00, Paper ThCT20-NT.6 | Add to My Program |
Cook2LTL: Translating Cooking Recipes to LTL Formulae Using Large Language Models |
|
Mavrogiannis, Angelos | University of Maryland, College Park |
Mavrogiannis, Christoforos | University of Michigan |
Aloimonos, Yiannis | University of Maryland |
Keywords: Software Tools for Robot Programming, AI-Enabled Robotics, AI-Based Methods
Abstract: Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
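The caching scheme lends itself to a very small sketch (the `query_llm` stand-in is hypothetical, and the real system grounds actions to LTL formulae rather than the toy strings used here): the action library is consulted first, and only cache misses trigger an API call, which is where the reported savings in calls, latency, and cost come from:

```python
# Hypothetical sketch of a runtime action library with LLM-backed misses.
action_library = {}  # high-level action name -> grounded primitive sequence

def query_llm(action):
    # Stand-in for the expensive LLM API call.
    return ["grab(obj)", action + "(obj)", "release(obj)"]

def ground_action(action):
    if action not in action_library:       # cache miss: one LLM query
        action_library[action] = query_llm(action)
    return action_library[action]          # cache hit: no query, no latency

for step in ["chop", "stir", "chop"]:      # the second "chop" hits the cache
    print(step, "->", ground_action(step))
```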
|
|
16:30-18:00, Paper ThCT20-NT.7 | Add to My Program |
Toward Automated Programming for Robotic Assembly Using ChatGPT |
|
Cote, Nicholas | Autodesk, Inc |
Macaluso, Annabella | University of California, San Diego |
Chitta, Sachin | Autodesk Inc |
Keywords: Software Tools for Robot Programming, Assembly, Process Control
Abstract: Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.
|
|
16:30-18:00, Paper ThCT20-NT.8 | Add to My Program |
A Method for Multi-Robot Asynchronous Trajectory Execution in MoveIt2 |
|
Stoop, Pascal | OST |
Ratnayake, Tharaka | Zurich Applied Science University |
Toffetti, Giovanni | Zurich University of Applied Sciences (ZHAW) |
Keywords: Software Tools for Robot Programming, Collision Avoidance, Dual Arm Manipulation
Abstract: This paper introduces a method that enables the parallel, independent execution of trajectories for multi-robot multi-arm systems in a shared workspace in MoveIt2. The proposed method leverages a centralized scheduler in a distributed setup to prevent collisions while the robots move independently. We argue that this approach is better suited than the state of the art (i.e., synchronous execution) for flexible/adaptive robotic tasks where the actions to be performed may vary in planning and execution time depending on sensor data (e.g., pick and place with inspection, assembly), as it is able to reduce the total execution time w.r.t. current approaches leveraging a single arm or multiple arms with synchronous motion planning.
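The centralized-scheduler idea can be sketched compactly (the collision test and volume representation below are placeholders, not MoveIt2's actual planning-scene API): each arm asks for clearance before executing, and clearance is granted only if the trajectory's swept volume is disjoint from every volume currently in execution:

```python
# Hypothetical sketch of a centralized clearance scheduler for multiple arms.
import threading

class CentralScheduler:
    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}                     # arm name -> swept volume in use

    def try_start(self, arm, swept_volume, collide):
        with self._lock:
            for other, vol in self._active.items():
                if other != arm and collide(swept_volume, vol):
                    return False              # defer: would intersect a peer
            self._active[arm] = swept_volume
            return True                       # clearance granted; execute now

    def finish(self, arm):
        with self._lock:
            self._active.pop(arm, None)       # release the volume when done

def boxes_collide(a, b):
    # Toy swept volumes as axis-aligned boxes ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    return all(a[i][0] <= b[i][1] and b[i][0] <= a[i][1] for i in range(3))
```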
|
|
16:30-18:00, Paper ThCT20-NT.9 | Add to My Program |
Improving the ROS 2 Navigation Stack with Real-Time Local Costmap Updates for Agricultural Applications |
|
Sani, Ettore | University of Genova |
Sgorbissa, Antonio | University of Genova |
Carpin, Stefano | University of California, Merced |
Keywords: Software Tools for Robot Programming, Software, Middleware and Programming Environments, Agricultural Automation
Abstract: The ROS 2 Navigation Stack (Nav2) has emerged as a widely used software component providing the underlying basis to develop a variety of high-level functionalities. However, when used in outdoor environments such as orchards and vineyards, its functionality is notably limited by the presence of obstacles and/or situations not commonly found in indoor settings. One such example is given by tall grass and weeds that can be safely traversed by a robot, but that are perceived as obstacles by LiDAR sensors, forcing the robot to take longer paths to avoid them or to abort navigation altogether. To overcome these limitations, domain-specific extensions must be developed and integrated into the software pipeline. This paper presents a new, lightweight approach to address this challenge and improve outdoor robot navigation. Leveraging the multi-scale nature of the costmaps supporting Nav2, we developed a system that uses a depth camera to perform pixel-level classification on the images and, in real time, injects corrections into the local costmap, thus enabling the robot to traverse areas that would otherwise be avoided by Nav2. Our approach has been implemented and validated on a Clearpath Husky, and we demonstrate that with this extension the robot is able to perform navigation tasks that would otherwise be impractical with the standard components.
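Conceptually, the correction step amounts to downgrading the cost of cells that the LiDAR layer marked lethal but the image classifier labeled as traversable vegetation. A numpy stand-in is sketched below; it is an assumption-level illustration, not the actual Nav2 costmap plugin.

    # Illustrative costmap correction (numpy stand-in, Nav2-style costs).
    import numpy as np

    FREE, LETHAL = 0, 254                 # conventional Nav2 cost values

    def inject_corrections(costmap: np.ndarray, traversable_cells):
        """Clear cells flagged as obstacles by LiDAR but classified as
        traversable (e.g., tall grass) from the depth-camera pixels."""
        corrected = costmap.copy()
        for i, j in traversable_cells:    # cells hit by projected pixels
            if corrected[i, j] == LETHAL:
                corrected[i, j] = FREE    # planner may now cut through
        return corrected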
|
|
ThCT21-NT Oral Session, NT-G303 |
Add to My Program |
Microrobotics for Biology |
|
|
Chair: Arai, Fumihito | The University of Tokyo |
Co-Chair: Boudaoud, Mokrane | Sorbonne Université |
|
16:30-18:00, Paper ThCT21-NT.1 | Add to My Program |
Automated Non-Invasive Analysis of Motile Sperms Using Cross-Scale Guidance Network |
|
Dai, Wei | City University of Hong Kong |
Wu, Zixuan | City University of Hong Kong |
Wang, Jiaqi | The Chinese University of Hong Kong, Shenzhen |
Liu, Rui | City University of Hong Kong |
Wang, Min | City University of Hong Kong |
Wu, Tianyi | City University of Hong Kong |
Zhou, Junxian | City University of Hong Kong |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Liu, Jun | City University of Hong Kong |
Keywords: Automation at Micro-Nano Scales, Biological Cell Manipulation, Micro/Nano Robots
Abstract: Unbiased measurement of sperm morphometric and motility parameters is essential for assessing fertility potential and guiding visual feedback for microrobotic manipulation. Automated analysis of multiple sperms and selection of an optimal sperm are crucial for in vitro fertilisation treatments such as robotic intracytoplasmic sperm injection. However, conventional image processing methods have limitations in analysing small sperm objects under microscopic imaging. The emergence of convolutional neural networks (CNNs) has offered promising advancements in microscopic image analysis. However, previous CNN methods have struggled to accurately segment tiny objects, requiring staining or fluorescence techniques to enhance the visual contrast between sperm and culture medium, which makes them clinically impractical. To address these limitations, we introduce a novel segmentation network named the cross-scale guidance (CSG) network for accurate and efficient segmentation of minute sperm objects. The CSG network employs innovative modules, including collateral multi-scale convolution, cross-scale feature map guidance, and multi-scale feature fusion, to preserve essential sperm details despite their small size. Experimental results indicate that the CSG network surpassed state-of-the-art models designed for small object segmentation, achieving up to 18.62% higher mean intersection over union (mIoU). Additionally, the CSG network excelled in sperm morphometric analysis, achieving errors below 20%. Moreover, sperm motility parameters were further derived from the segmentation results for comprehensive sperm fertility analysis.
|
|
16:30-18:00, Paper ThCT21-NT.2 | Add to My Program |
Multi-Scale Visual Servoing Framework for Optical Microscopy Based on SIFT Matching |
|
Zhang, Yameng | The Chinese University of Hong Kong |
Xu, Ao | The Chinese University of Hong Kong |
Chen, Yuhan | Southern University of Science and Technology |
Meng, Max Q.-H. | The Chinese University of Hong Kong |
Liu, Li | The Chinese University of Hong Kong |
Keywords: Automation at Micro-Nano Scales, Calibration and Identification, Visual Servoing
Abstract: This paper introduces an innovative multi-scale visual servoing framework for optical microscopy, engineered to automatically reposition the microscope for high-magnification target view across multiple magnifications, thereby facilitating repetitive and accurate histologic biopsies. The framework encompasses an active microscope-camera system equipped with both auto-calibration and multi-scale visual servoing capabilities. The auto-calibration technique addresses the challenges posed by the limited depth of field and pattern requirements of the microscope-camera system, and determines its intrinsic and hand-eye parameters through a two-step algorithm. The calibration data is then utilized to execute a SIFT matching-based visual servoing control at progressively increasing magnifications, using only a single high-magnification target view as a reference, ultimately enabling rapid and precise repositioning of the microscope. Experimental results demonstrate the precision and stability of the auto-calibration method, as well as the robustness of the visual servoing method against occlusion, blur, and low illumination.
|
|
16:30-18:00, Paper ThCT21-NT.3 | Add to My Program |
Robotic Capillary Insertion to the Xenopus Oocyte Using Microscopic Image Analysis and QCR Force Sensor |
|
Otani, Kazusa | The University of Tokyo |
Sugiura, Hirotaka | The University of Tokyo |
Watanabe, Shiro | The University of Tokyo |
Turan, Bilal | Nagoya University |
Amaya, Satoshi | The University of Tokyo |
Arai, Fumihito | The University of Tokyo |
Keywords: Automation at Micro-Nano Scales, Force Control, Biological Cell Manipulation
Abstract: This paper presents a three-dimensional oocyte manipulation system for the two-electrode voltage clamp (TEVC) experiment under stereomicroscopy. We first developed a sequential calibration method to correlate the workspace of the stereomicroscope with the image and the micromanipulator. Even though the focal depth of the microscope is limited, the proposed method enables three-dimensional position detection and calculates the homogeneous transformation matrix. We then employed the hybrid use of image-based manipulation and a quartz crystal resonator (QCR) force sensor. The imaging technique was used to detect the tip of the glass capillary and its contact with the cell membrane, whereas the QCR force sensor was incorporated to detect the force interaction between the sample and the glass capillary. Using this system and the proposed techniques, we demonstrated automatic capillary insertion for the TEVC experiment, for which a low insertion depth is preferable. The results indicate that the coordinate calibration technique provides a positioning accuracy of the capillary tip on the order of 10 μm, and that the imaging technique can detect contact with elastic objects and the cell membrane. The QCR force sensor achieved very small force measurements and feedback control at a control frequency of 100 Hz without latency.
|
|
16:30-18:00, Paper ThCT21-NT.4 | Add to My Program |
Robotic Mosaic Atomic Force Microscopy through Sequential Imaging and Multiview Iterative Closest Points Method |
|
Romero Leiro, Freddy | Sorbonne Université - Institut des Systèmes Intelligents et de Robotique (ISIR) |
Régnier, Stéphane | Sorbonne University |
Delarue, Frederic | Sorbonne University |
Boudaoud, Mokrane | Sorbonne Université |
Keywords: Automation at Micro-Nano Scales, Micro/Nano Robots
Abstract: This paper presents a functionality developed for the home-made AFM-in-SEM robotic system at the ISIR laboratory. The method extends the range of an Atomic Force Microscope (AFM) and deals with drift issues by fusing multiple individually acquired AFM topography patches. The merging of the patches into a single image is done through a Generalized Procrustes Analysis Iterative Closest Point (GPA-ICP) algorithm. To validate the effectiveness of the approach, an AFM image of a TGX1 calibration grid and a 3.4-billion-year-old organic-walled microfossil are reconstructed by automatically merging 50 elementary AFM topography patches of dimension 0.9 μm × 1.2 μm based on feature matching. The overlap between two adjacent patches is 50% and 33% in the X and Y axes, respectively. The result is a coherent 3.2 μm × 3.0 μm drift-free long-range AFM topography without significant artifacts. The method is tested using an AFM-in-SEM system based on a 3-DOF Cartesian robot equipped with inertial piezoelectric actuators and can be used to extend the range of any type of AFM with a dual XY-stage setup. It thus opens the door to high-resolution long-range AFM by adding a long-range coarse-resolution stage to a preexisting AFM system, all without needing to actuate both stages simultaneously.
|
|
16:30-18:00, Paper ThCT21-NT.5 | Add to My Program |
Automated Sperm Immobilization with a Clinically-Compatible and Compact XYZ Stage |
|
Song, Haocong | University of Toronto |
Chen, Wenyuan | University of Toronto |
Dai, Changsheng | Dalian University of Technology |
Shan, Guanqiao | University of Toronto |
Yang, Steven | University of Toronto |
Jiang, Aojun | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Keywords: Biological Cell Manipulation, Automation at Micro-Nano Scales
Abstract: Automated positioning systems play a pivotal role in micro-scale cell manipulation. In clinical intracytoplasmic sperm injection (ICSI) for in vitro fertilization (IVF) treatment, a motile sperm needs to be immobilized by glass-micropipette tapping for subsequent surgical steps. The process requires accurate tracking of the target sperm and precise alignment between the sperm tail and the micropipette. Manual sperm immobilization suffers from inconsistent success rates, and current robotic systems developed for the task fail to comply with the standard clinical setup. Instead of using a motorized micromanipulator as in existing robotic systems, this paper presents an automated, compact three-dimensional positioning stage for sperm immobilization that can be seamlessly integrated into standard clinical platforms. Based on an analysis of the sperm head orientation, an adaptive tail-tapping planning strategy is established to avoid the risk of touching the sperm head, where the DNA is contained. A visual servo controller equipped with a dynamic sperm motion observer is employed to achieve precise three-dimensional tracking and positioning of the target sperm. Experimental results revealed that the system achieved a success rate of 93.5% and a time cost of 5.5 s for automated sperm immobilization.
|
|
16:30-18:00, Paper ThCT21-NT.6 | Add to My Program |
Automated Sperm Morphology Analysis Based on Instance-Aware Part Segmentation |
|
Chen, Wenyuan | University of Toronto |
Song, Haocong | University of Toronto |
Dai, Changsheng | Dalian University of Technology |
Jiang, Aojun | University of Toronto |
Shan, Guanqiao | University of Toronto |
Liu, Hang | University of Toronto |
Zhou, Yanlong | Henan University |
Abdalla, Khaled | CReATe Fertility Centre |
Dhanani, Shivani N | CReATe Fertility Centre |
Moosavi, Katy Fatemeh | CReATe Fertility Centre |
Pathak, Shruti | CReATe Fertility Centre |
Librach, Clifford | University of Toronto |
Zhang, Zhuoran | The Chinese University of Hong Kong, Shenzhen |
Sun, Yu | University of Toronto |
Keywords: Computer Vision for Automation
Abstract: Traditional sperm morphology analysis is based on tedious manual annotation. Automated morphology analysis of a high number of sperm requires accurate segmentation of each sperm part and quantitative morphology evaluation. State-of-the-art instance-aware part segmentation networks follow a “detect-then-segment” paradigm. However, due to sperm’s slim shape, their segmentation suffers from large context loss and feature distortion caused by bounding-box cropping and resizing during ROI Align. Moreover, morphology measurement of the sperm tail is demanding because of its long, curved shape and uneven width. This paper presents automated techniques to measure sperm morphology parameters automatically and quantitatively. A novel attention-based instance-aware part segmentation network is designed to reconstruct lost contexts outside bounding boxes and to fix distorted features by refining preliminary segmented masks through merging features extracted by a feature pyramid network. An automated centerline-based tail morphology measurement method is also proposed, in which an outlier-filtering method and an endpoint detection algorithm are designed to accurately reconstruct tail endpoints. Experimental results demonstrate that the proposed network outperformed the state-of-the-art top-down RP-R-CNN by 9.2% AP_vol^p, and the proposed automated tail morphology measurement method achieved high measurement accuracies of 95.34%, 96.39%, and 91.20% for length, width, and curvature, respectively.
|
|
16:30-18:00, Paper ThCT21-NT.7 | Add to My Program |
Fast Photoacoustic Microscopy with Robot Controlled Microtrajectory Optimization |
|
Luo, Yating | Shanghai Jiao Tong University |
Liu, Yuxuan | Shanghai Jiao Tong University |
Zhou, Jiasheng | Shanghai Jiao Tong University |
Chen, Sung-Liang | Shanghai Jiao Tong University |
Guo, Yao | Shanghai Jiao Tong University |
Yang, Guang-Zhong | Shanghai Jiao Tong University |
Keywords: Computer Vision for Medical Robotics, Surgical Robotics: Planning, Optimization and Optimal Control
Abstract: Photoacoustic Microscopy (PAM) is a relatively new imaging modality in biomedicine. However, point-by-point raster scanning in PAM suffers from low imaging speed. Sparse sampling has been studied in recent years and, with the development of deep learning algorithms, extensive efforts have been devoted to sparse image reconstruction, while little attention has been paid to the sparse sampling trajectory design required for actual implementation. Real-time adaptive, robotically controlled sampling with micro-scale accuracy, under due consideration of physical constraints, can pave the way for using PAM in robot-assisted microsurgery. This work proposes a fast PAM scheme with robot-controlled microtrajectory optimization. The proposed method adapts to the imaging details of different regions of interest (ROI), and detailed experiments have been conducted in both simulation and in-vivo settings. Results show that our proposed method achieves faster scanning than traditional raster scanning and better image quality in the ROI than the standard spiral trajectory, demonstrating the effectiveness of our proposed method and its potential for deployment in other point-by-point scanning systems.
|
|
16:30-18:00, Paper ThCT21-NT.8 | Add to My Program |
Acoustically Driven Micropipette for Hydrodynamic Manipulation of Mouse Oocytes |
|
Zuo, Zhaofeng | Beijing Institute of Technology |
Liu, Xiaoming | Beijing Institute of Technology |
Chen, Zhuo | Beijing Institute of Technology |
Li, Yuyang | Beijing Institute of Technology |
Tang, Xiaoqing | Beijing Institute of Technology |
Liu, Dan | Beijing Institute of Technology |
Huang, Qiang | Beijing Institute of Technology |
Arai, Tatsuo | University of Electro-Communications |
Keywords: Micro/Nano Robots, Biological Cell Manipulation
Abstract: Micromanipulation techniques that can achieve controlled fine operations at the micro scale play an important role in biomedical fields including embryo engineering, gene engineering, drug screening, and cell analysis. However, micromanipulation of biological micro-objects, such as cells and micro-tissues, suffers from mechanical damage and low efficiency. Several techniques have been introduced to manipulate cells more easily, but most are restricted by expensive devices, limited working areas, and potential damage to cellular structures. Here we develop a hydrodynamic manipulation method to rotate and transport mouse oocytes, which utilizes acoustic waves and a micropipette to generate acoustic radiation force and excite microstreaming. This method can accomplish rotational and translational operations precisely and controllably. We tested the process of trapping, rotating, and transporting mouse oocytes, and measured rotational and translational speeds over a range of applied voltages. The method was able to shorten the time required for delivery and posture adjustment before oocyte injection. Our study provides an easy-to-use technique for contactless oocyte manipulation, with the potential to be applied universally in many cellular studies.
|
|
16:30-18:00, Paper ThCT21-NT.9 | Add to My Program |
Remote Control of Untethered Magnetic Robots within a Lumen Using X-Ray-Guided Robotic Platform |
|
Ligtenberg, Leendert-Jan Wouter | University of Twente |
Rabou, Nicole Christina Antoinetta | University of Twente |
Peters, Sander Lars | Universiteit Twente |
Vengetela, Trishal Sai Srinivas | University of Twente |
Schut, Vincent | Saxion University |
Liefers, Herman Remco | University of Twente |
Warle, Michiel | Radboud University Medical Center |
Khalil, Islam S.M. | University of Twente |
Keywords: Micro/Nano Robots, Medical Robots and Systems
Abstract: Until now, the potential of untethered magnetic robots (UMRs), propelled by external time-periodic magnetic fields, has been hindered by the combined limitations of wireless manipulation systems and noninvasive imaging techniques. The need for simultaneous actuation and noninvasive localization imposes a strict constraint on both functionalities. This study addresses this challenge through experimental validation, showcasing the direct teleoperation of UMRs within a fluid-filled lumen. This teleoperation capability is facilitated by a scalable X-ray-guided robotic platform, extendable to match the dimensions required for in vivo applications, marking a noteworthy advancement. Our methodology is demonstrated by teleoperating a 12-mm-long screw-shaped UMR (5 mm in diameter) within a bifurcated lumen filled with blood. This navigation is achieved using controlled rotating magnetic fields, guided by real-time X-ray fluoroscopy images. Incorporating a two-degree-of-freedom control system, we demonstrate the operator's capability to use X-ray fluoroscopy images to keep the UMR coupled with the external field during wireless teleoperation, resulting in a success rate of 76.6% when moving along the intended pathways, with a mean absolute position error of 1.6 ± 2.1 mm.
|
|
ThCT22-NT Oral Session, NT-G304 |
Add to My Program |
Telerobotics and Teleoperation II |
|
|
Chair: Aragon-Camarasa, Gerardo | University of Glasgow |
Co-Chair: Li, Songpo | Honda Research Institute |
|
16:30-18:00, Paper ThCT22-NT.1 | Add to My Program |
TELESIM: A Modular and Plug-And-Play Framework for Robotic Arm Teleoperation Using a Digital Twin |
|
Audonnet, Florent | University of Glasgow |
Grizou, Jonathan | University of Glasgow |
Hamilton, Andrew | School of Computing Science, University of Glasgow |
Aragon-Camarasa, Gerardo | University of Glasgow |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Human-Robot Collaboration
Abstract: Teleoperating robotic arms can be a challenging task for non-experts, particularly when using complex control devices or interfaces. To address the limitations and challenges of existing teleoperation frameworks, such as cognitive strain, control complexity, robot compatibility, and user evaluation, we propose TELESIM, a modular and plug-and-play framework that enables direct teleoperation of any robotic arm using a digital twin as the interface between users and the robotic system. Due to TELESIM's modular design, it is possible to control the digital twin using any device that outputs a 3D pose, such as a virtual reality controller or a finger-mapping hardware controller. To evaluate the efficacy and user-friendliness of TELESIM, we conducted a user study with 37 participants. The study involved a simple pick-and-place task, performed using two different robots equipped with two different control modalities. Our experimental results show that most users succeeded in building a tower of at least 3 cubes within 10 minutes, after only 5 minutes of training beforehand, regardless of the control modality or robot used, demonstrating the usability and user-friendliness of TELESIM.
|
|
16:30-18:00, Paper ThCT22-NT.2 | Add to My Program |
Synchronized Human-Humanoid Motion Imitation |
|
Dallard, Antonin | LIRMM |
Benallegue, Mehdi | AIST Japan |
Kanehiro, Fumio | National Inst. of AIST |
Kheddar, Abderrahmane | CNRS-AIST |
Keywords: Telerobotics and Teleoperation, Intention Recognition, Human and Humanoid Motion Analysis and Synthesis
Abstract: We present a tele-operation control framework that (i) enhances upper-body motion synchrony between a user and a robot using the minimum-jerk model coupled with a recursive least-squares filter, and (ii) synchronizes the walking pace by predicting the user's stepping frequency using motion capture data and a deep learning model. By integrating (i) and (ii) in a task-space whole-body controller, we achieve full-body synchronization. We assess our humanoid-to-human whole-body synchronized motion model on the HRP-4 humanoid robot in experiments with forward, lateral, and backward walks and concurrent upper-limb motions.
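The minimum-jerk model referenced in (i) is the classical fifth-order polynomial profile with zero velocity and acceleration at both endpoints; the sketch below is a generic reference implementation, not the authors' controller.

    # Classical minimum-jerk point-to-point profile.
    def min_jerk(x0, xf, t, T):
        """Position at time t on a minimum-jerk move from x0 to xf over
        duration T (zero boundary velocity and acceleration)."""
        s = min(max(t / T, 0.0), 1.0)                        # normalized time
        return x0 + (xf - x0) * (10*s**3 - 15*s**4 + 6*s**5)

In the paper's setting, the recursive least-squares filter would adapt such a model online to the user's observed motion; that adaptation step is omitted here.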
|
|
16:30-18:00, Paper ThCT22-NT.3 | Add to My Program |
SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems |
|
Lee, Joonhyung | Korea University |
Park, Sangbeom | Korea University |
Park, Jeongeun | Korea University |
Lee, Kyungjae | Chung-Ang University |
Choi, Sungjoon | Korea University |
Keywords: Telerobotics and Teleoperation, Simulation and Animation, Semantic Scene Understanding
Abstract: Pick-and-place is one of the fundamental tasks in robotics research. However, attention has mostly been focused on the "pick" task, leaving the "place" task relatively unexplored. In this paper, we address the problem of placing objects in the context of a teleoperation framework. In particular, we focus on two aspects of the place task: stability robustness and contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim with the semantic reasoning capability of large language models. In other words, given place context information (e.g., user preferences, the object to place, and current scene information), our proposed method outputs a probability distribution over possible placement candidates, considering the robustness and reasonableness of the place task. Our proposed method is extensively evaluated in two simulation environments and one real-world environment, and we show that it can greatly increase the physical plausibility of placements as well as their contextual soundness while accounting for user preferences. Code, video, and details are available at: https://joonhyung-lee.github.io/spots/
|
|
16:30-18:00, Paper ThCT22-NT.4 | Add to My Program |
Online Minimization of the Robot Silhouette Viewed from Eye-To-Hand Camera |
|
Cortigiani, Giovanni | University of Siena |
Brogi, Bernardo | University of Siena |
Villani, Alberto | University of Siena |
Lisini Baldi, Tommaso | University of Siena |
D'Aurizio, Nicole | University of Siena |
Prattichizzo, Domenico | University of Siena |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Human-Centered Robotics
Abstract: Redundant robots can perform internal joint motions without modifying the pose of the end-effector by exploiting the null space of the Jacobian matrix. Capitalizing on this feature, we developed a control technique for minimizing the robot's visual appearance when observed from an eye-to-hand camera. Such an algorithm is instrumental in contexts where quickly adjusting the perspective to see objects obstructed by the robot is impractical (e.g., teleoperation in narrow environments). Diminished reality techniques are frequently employed in these cases to mitigate the robot's intrusion into the environment, although they may sometimes compromise the perceived realism. The experimental evaluation confirmed the effectiveness of our control algorithm, demonstrating an average reduction of 4.67% in the area covered by the robot within the frame when compared to the case without the optimization action.
|
|
16:30-18:00, Paper ThCT22-NT.5 | Add to My Program |
IRoCo: Intuitive Robot Control from Anywhere Using a Smartwatch |
|
Weigend, Fabian Clemens | Arizona State University |
Liu, Xiao | Arizona State University |
Sonawani, Shubham | Arizona State University |
Kumar, Neelesh | Procter and Gamble |
Vasudevan, Venugopal | Procter & Gamble |
Ben Amor, Heni | Arizona State University |
Keywords: Wearable Robotics, Multi-Modal Perception for HRI, Telerobotics and Teleoperation
Abstract: This paper introduces iRoCo (intuitive Robot Control) - a framework for ubiquitous human-robot collaboration using a single smartwatch and smartphone. By integrating probabilistic differentiable filters, iRoCo optimizes a combination of precise robot control and unrestricted user movement from ubiquitous devices. We demonstrate and evaluate the effectiveness of iRoCo in practical teleoperation and drone piloting applications. Comparative analysis shows no significant difference between task performance with iRoCo and gold-standard control systems in teleoperation tasks. Additionally, iRoCo users complete drone piloting tasks 32% faster than with a traditional remote control and report less frustration in a subjective load index questionnaire. Our findings strongly suggest that iRoCo is a promising new approach for intuitive robot control through smartwatches and smartphones from anywhere, at any time. The code is available at www.github.com/wearable-motion-capture
|
|
16:30-18:00, Paper ThCT22-NT.6 | Add to My Program |
Integrating Open-World Shared Control in Immersive Avatars |
|
Naughton, Patrick | University of Illinois at Urbana-Champaign |
Nam, James Seungbum | University of Illinois at Urbana-Champaign |
Stratton, Andrew | University of Michigan |
Hauser, Kris | University of Illinois at Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Virtual Reality and Interfaces, Intention Recognition
Abstract: Teleoperated avatar robots allow people to transport their manipulation skills to environments that may be difficult or dangerous to work in. Current systems are able to give operators direct control of many components of the robot to immerse them in the remote environment, but operators still struggle to complete tasks as competently as they could in person. We present a framework for incorporating open-world shared control into avatar robots to combine the benefits of direct and shared control. This framework preserves the fluency of our avatar interface by minimizing obstructions to the operator's view and using the same interface for direct, shared, and fully autonomous control. In a human subjects study (N=19), we find that operators using this framework complete a range of tasks significantly more quickly and reliably than those that do not.
|
|
16:30-18:00, Paper ThCT22-NT.7 | Add to My Program |
Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks |
|
Cai, Mingyu | University of California Riverside |
Patel, Karankumar | Honda Research Institute |
Iba, Soshi | Honda Research Institute USA |
Li, Songpo | Honda Research Institute |
Keywords: Telerobotics and Teleoperation, Human-Robot Collaboration
Abstract: In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation and improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user's intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. Our framework presents an intention estimation technique at two hierarchical levels, i.e., low-level actions and high-level tasks, by incorporating multi-scale hierarchical information in neural networks. Technically, we employ a hierarchical dependency loss to boost overall accuracy. Furthermore, we propose a multi-window method that assigns proper hierarchical prediction windows to the input data. An analysis of the predictive power with various inputs demonstrates the predominance of the deep hierarchical model in terms of prediction accuracy and early intention identification. We implement the algorithm on a virtual reality (VR) setup to teleoperate robotic hands in a simulation with various assembly tasks, showing the effectiveness of online estimation.
|
|
16:30-18:00, Paper ThCT22-NT.8 | Add to My Program |
Dynamic Mobile Manipulation Via Whole-Body Bilateral Teleoperation of a Wheeled Humanoid |
|
Purushottam, Amartya | University of Illinois, Urbana-Champaign |
Xu, Christopher | University of Illinois Urbana-Champaign |
Jung, Yeongtae | Jeonbuk National University |
Ramos, Joao | University of Illinois at Urbana-Champaign |
Keywords: Telerobotics and Teleoperation, Whole-Body Motion Planning and Control, Human and Humanoid Motion Analysis and Synthesis
Abstract: Humanoid robots have the potential to help human workers by realizing physically demanding manipulation tasks such as moving large boxes within warehouses. We define such tasks as Dynamic Mobile Manipulation (DMM). This paper presents a framework for DMM via whole-body teleoperation, built upon three key contributions: Firstly, a teleoperation framework employing a Human Machine Interface (HMI) and a bi-wheeled humanoid, SATYRR, is proposed. Secondly, the study introduces a dynamic locomotion mapping, utilizing human-robot reduced order models, and a kinematic retargeting strategy for manipulation tasks. Additionally, the paper discusses the role of whole-body haptic feedback for wheeled humanoid control. Finally, the system's effectiveness and mappings for DMM are validated through locomanipulation experiments and heavy box pushing tasks. Here we show two forms of DMM: grasping a target moving at an average speed of 0.4 m/s, and pushing boxes weighing up to 105% of the robot's weight. By simultaneously adjusting their pitch and using their arms, the pilot adjusts the robot pose to apply larger contact forces and move a heavy box at a constant velocity of 0.2 m/s.
|
|
16:30-18:00, Paper ThCT22-NT.9 | Add to My Program |
3D Autocomplete: Enhancing UAV Teleoperation with AI in the Loop |
|
Ibrahim, Batool | American University of Beirut AUB |
Elhajj, Imad | American University of Beirut |
Asmar, Daniel | American University of Beirut |
Keywords: Telerobotics and Teleoperation, Human-Robot Collaboration, Virtual Reality and Interfaces
Abstract: Manually teleoperating a flying robot can be a demanding task, especially for users with limited experience. This is primarily due to the non-linear dynamics of such robots, in addition to the difficulty of controlling several degrees of freedom at the same time. 3D Autocomplete helps mitigate these limitations by assisting users in teleoperation. It aids in teleoperating 3D motions, such as helical motions, which are more challenging for users. The proposed framework uses Artificial Intelligence (AI) to predict the user's intended motion just in time and then, if the user accepts, completes it autonomously in 3D. The AI component of 3D Autocomplete was presented in our previous work, where we introduced a deep learning model and an algorithm to predict the user's desired motion as early as possible. Moving forward in this work, we focus on synthesizing and completing the user-intended motion autonomously. We also introduce a Mixed Reality (MR) user interface for better human-robot interaction. Finally, we evaluate our system subjectively and objectively through human-subject experiments. Autocomplete outperformed the traditional method on all criteria, with at least 30% improvement in all objective measures.
|
|
ThCT23-NT Oral Session, NT-G401 |
Add to My Program |
Aerial Systems: Perception and Autonomy III |
|
|
Chair: Schoellig, Angela P. | TU Munich |
Co-Chair: Scherer, Sebastian | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT23-NT.1 | Add to My Program |
Control-Barrier-Aided Teleoperation with Visual-Inertial SLAM for Safe MAV Navigation in Complex Environments |
|
Zhou, Siqi | Technical University of Munich |
Papatheodorou, Sotiris | Technical University of Munich |
Leutenegger, Stefan | Technical University of Munich |
Schoellig, Angela P. | TU Munich |
Keywords: Aerial Systems: Perception and Autonomy, Robot Safety, Motion Control
Abstract: In this paper, we consider a Micro Aerial Vehicle (MAV) system teleoperated by a non-expert and introduce a perceptive safety filter that leverages Control Barrier Functions (CBFs) in conjunction with Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) and dense 3D occupancy mapping to guarantee safe navigation in complex and unstructured environments. Our system relies solely on onboard IMU measurements, stereo infrared images, and depth images and autonomously corrects teleoperated inputs when they are deemed unsafe. We define a point in 3D space as unsafe if it satisfies either of two conditions: (i) it is occupied by an obstacle, or (ii) it remains unmapped. At each time step, an occupancy map of the environment is updated by the VI-SLAM by fusing the onboard measurements, and a CBF is constructed to parameterize the (un)safe region in the 3D space. Given the CBF and state feedback from the VI-SLAM module, a safety filter computes a certified reference that best matches the teleoperation input while satisfying the safety constraint encoded by the CBF. In contrast to existing perception-based safe control frameworks, we directly close the perception-action loop and demonstrate the full capability of safe control in combination with real-time VI-SLAM without any external infrastructure or prior knowledge of the environment. We verify the efficacy of the perceptive safety filter in real-time MAV experiments using exclusively onboard sensing and computation and show that the teleoperated MAV is able to safely navigate through unknown environments despite arbitrary inputs sent by the teleoperator.
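At its core, a safety filter of this kind solves a small quadratic program at every control step: stay as close as possible to the teleoperated input while satisfying the CBF condition. For control-affine dynamics xdot = f(x) + g(x)u and a single constraint, the projection has a closed form, sketched below as an illustrative simplification of the paper's filter.

    # Closed-form CBF safety filter for one affine constraint:
    #   min ||u - u_tele||^2   s.t.   grad_h . (f + g u) >= -alpha * h
    import numpy as np

    def safety_filter(u_tele, grad_h, f, g, h, alpha):
        a = g.T @ grad_h                 # constraint coefficients on u
        b = -alpha * h - grad_h @ f      # required lower bound on a . u
        slack = a @ u_tele - b
        if slack >= 0.0:
            return u_tele                # teleop input is already safe
        return u_tele - (slack / (a @ a)) * a   # project onto the boundary

Treating unmapped space as unsafe corresponds, roughly, to letting h take non-positive values in regions the occupancy map has not yet observed as free.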
|
|
16:30-18:00, Paper ThCT23-NT.2 | Add to My Program |
Flow Shadowing: A Method to Detect Multiple Flow Headings Using an Array of Densely Packed Whisker-Inspired Sensors |
|
Kent, Teresa | Carnegie Mellon University |
Bergbreiter, Sarah | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Sensor Fusion, Soft Sensors and Actuators
Abstract: Understanding airflow around a drone is critical for performing advanced maneuvers while maintaining flight stability. Recent research has worked to understand this flow by employing 2D and 3D flow sensors to measure flow from a single source like wind or the drone’s relative motion. Our current work advances flow detection by introducing a strategy to distinguish between two flow sources applied simultaneously from different directions. By densely packing an array of flow sensors (or whiskers), we alter the path of airflow as it moves through the array. We have named this technique “flow shadowing” because we take advantage of the fact that a downstream whisker shadowed (or occluded) by an upstream whisker receives less incident flow. We show that this relationship is predictable for two whiskers based on the percent of occlusion. We then show that a 2x2 spatial array of whiskers responds asymmetrically when multiple flow sources from different headings are applied to the array. This asymmetry is direction-dependent, allowing us to predict the headings of flow from two different sources, like wind and a drone’s relative motion.
|
|
16:30-18:00, Paper ThCT23-NT.3 | Add to My Program |
Onboard Dynamic-Object Detection and Tracking for Autonomous Robot Navigation with RGB-D Camera |
|
Xu, Zhefan | Carnegie Mellon University |
Zhan, Xiaoyang | Carnegie Mellon University |
Xiu, Yumeng | Carnegie Mellon University |
Suzuki, Christopher | Carnegie Mellon University |
Shimada, Kenji | Carnegie Mellon University |
Keywords: Aerial Systems: Perception and Autonomy, Collision Avoidance, Vision-Based Navigation
Abstract: Deploying autonomous robots in crowded indoor environments usually requires them to have accurate dynamic obstacle perception. Although plenty of previous works in the autonomous driving field have investigated the 3D object detection problem, the use of dense point clouds from a heavy Light Detection and Ranging (LiDAR) sensor and the high computation cost of learning-based data processing make those methods inapplicable to small robots, such as vision-based UAVs with small onboard computers. To address this issue, we propose a lightweight 3D dynamic obstacle detection and tracking (DODT) method based on an RGB-D camera, designed for low-power robots with limited computing power. Our method adopts a novel ensemble detection strategy, combining multiple computationally efficient but low-accuracy detectors to achieve real-time, high-accuracy obstacle detection. Besides, we introduce a new feature-based data association and tracking method that prevents mismatches by utilizing the point clouds' statistical features. In addition, our system includes an optional, auxiliary learning-based module to enhance the obstacle detection range and dynamic obstacle identification. The proposed method is implemented on a small quadcopter, and the results show that it achieves the lowest position error (0.11 m) and a comparable velocity error (0.23 m/s) across the benchmarked algorithms running on the robot's onboard computer.
|
|
16:30-18:00, Paper ThCT23-NT.4 | Add to My Program |
APACE: Agile and Perception-Aware Trajectory Generation for Quadrotor Flights |
|
Chen, Xinyi | The Hong Kong University of Science and Technology |
Zhang, Yichen | The Hong Kong University of Science and Technology |
Zhou, Boyu | Sun Yat-Sen University |
Shen, Shaojie | Hong Kong University of Science and Technology |
Keywords: Aerial Systems: Perception and Autonomy, View Planning for SLAM, Motion and Path Planning
Abstract: Various perception-aware planning approaches have attempted to enhance state estimation accuracy during maneuvers, while the feature matchability among frames, a crucial factor influencing estimation accuracy, has often been overlooked. In this paper, we present APACE, an Agile and Perception-Aware trajeCtory gEneration framework for aggressive quadrotor flight that takes feature matchability into account during trajectory planning. We seek to generate a perception-aware trajectory that reduces the error of the visual-based estimator while satisfying constraints on smoothness, safety, agility, and the quadrotor dynamics. The perception objective is achieved by maximizing the number of covisible features while ensuring small enough parallax angles. Additionally, we propose a differentiable and accurate visibility model that allows decomposition of the trajectory planning problem for efficient optimization. Through validations conducted in both a photorealistic simulator and real-world experiments, we demonstrate that the trajectories generated by our method significantly improve state estimation accuracy, with the root mean square error (RMSE) reduced by up to an order of magnitude. The source code will be released to benefit the community.
|
|
16:30-18:00, Paper ThCT23-NT.5 | Add to My Program |
SpECULARIA: Towards Fully Autonomous Robotic Indoor Farming System |
|
Car, Marsela | University of Zagreb |
Arbanas Ferreira, Barbara | University of Zagreb, Faculty of Electrical Engineering and Computing |
Vuletic, Jelena | University of Zagreb, Faculty of Electrical Engineering and Computing |
Orsag, Matko | University of Zagreb, Faculty of Electrical Engineering and Computing |
Keywords: Agricultural Automation, AI-Enabled Robotics, Multi-Robot Systems
Abstract: To support the hypothesis that embracing robotics has the potential to address farming challenges while replacing large and complex farm machinery, this paper proposes designing a farm around a heterogeneous robotic system dubbed SpECULARIA. Within this multi-robot system, mobile robots are deployed to work just like in a warehouse, moving plants grown in containers to make sure every plant receives optimal care and ideal growing conditions. By structuring the work-cell environment around a stationary dual-arm manipulator, the system can plan and execute procedures to control every plant's growth and hygiene, from seed to harvest. Such a system surpasses current farming robots in scalability and versatility. We showcase compliance control algorithms combined with artificial intelligence that help us build a functional model of the plant. The same approach is used to program different plant treatments. Finally, we benchmark the proposed setup against a classical mobile manipulation approach, demonstrating its feasibility.
|
|
16:30-18:00, Paper ThCT23-NT.6 | Add to My Program |
End-To-End Thermal Updraft Detection and Estimation for Autonomous Soaring Using Temporal Convolutional Networks |
|
Gall, Christian | University of Stuttgart |
Fichter, Walter | University of Stuttgart |
Ahmad, Aamir | University of Stuttgart |
Keywords: Aerial Systems: Perception and Autonomy, AI-Based Methods, Energy and Environment-Aware Automation
Abstract: Exploiting thermal updrafts to gain altitude can significantly extend the endurance of fixed-wing aircraft, as human glider pilots have demonstrated for decades. In this work, we present a novel end-to-end deep learning approach for the simultaneous detection of multiple thermal updrafts and the estimation of their properties - a key capability for letting autonomous unmanned aerial vehicles soar as well. In contrast to previous works, our approach does not require separate algorithms for the detection of individual updrafts. Instead, a sequence of sensor measurements from a time window of interest can be fed directly into our temporal convolutional network, which estimates the position, strength, and spread of the encountered updrafts. We demonstrate in simulations that our approach can reliably detect updrafts based solely on measurements of the aircraft's position and the local vertical wind velocity. Moreover, our method can additionally make use of measurements of the roll moment induced by updrafts, which further improves precision. Compared with a particle-filter-based method, we can determine the correct number of encountered updrafts with an accuracy of 99.99% instead of 79.50%, significantly improve the precision of strength and spread estimates, and reduce the computational demand.
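For readers unfamiliar with temporal convolutional networks, the core ingredient is a stack of causal, dilated 1D convolutions whose receptive field grows exponentially with depth. The PyTorch skeleton below is a generic sketch, not the authors' architecture.

    # Generic causal dilated TCN block (PyTorch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TCNBlock(nn.Module):
        def __init__(self, channels, dilation, kernel_size=3):
            super().__init__()
            self.left_pad = (kernel_size - 1) * dilation   # causal padding
            self.conv = nn.Conv1d(channels, channels,
                                  kernel_size, dilation=dilation)

        def forward(self, x):                   # x: (batch, channels, time)
            x = F.pad(x, (self.left_pad, 0))    # pad the past only
            return torch.relu(self.conv(x))

    # Exponentially growing dilations; a regression head on the final
    # features would output position, strength, and spread per updraft.
    tcn = nn.Sequential(*[TCNBlock(32, 2 ** i) for i in range(4)])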
|
|
16:30-18:00, Paper ThCT23-NT.7 | Add to My Program |
SANet: Small but Accurate Detector for Aerial Flying Object |
|
Zhou, Xunkuai | Tongji University |
Zhao, Benyun | The Chinese University of Hong Kong |
Yang, Guidong | The Chinese University of Hong Kong |
Zhang, Jihan | Chinese University of Hong Kong |
Li, Li | Tongji University |
Chen, Ben M. | Chinese University of Hong Kong |
Keywords: Aerial Systems: Applications, AI-Based Methods, Object Detection, Segmentation and Categorization
Abstract: This paper proposes SANet, a small but accurate detector for aerial flying objects. The detector introduces an attention module into the feature extraction module (FEM) to enhance accuracy. This FEM, with fewer convolutional kernel channels, reduces the parameter count, speeds up inference, and mitigates the computational burden. Furthermore, we optimize the Spatial Pyramid Pooling (SPP) module to enhance both accuracy and speed. By analyzing the structural characteristics of the ResNet and RepVGG networks that are usually utilized to extract features, a feature fusion module named RepNeck is designed to comprehensively fuse features extracted by the FEM, further enhancing speed and accuracy. Eventually, we develop a neural network with an impressively small model size of only 4.5M. This network achieves state-of-the-art performance on three challenging datasets. Apart from its superior performance, our approach enjoys a real-time detection speed of 14.8 frames per second (fps) and a power consumption of only 2.9 W, while the CPU and GPU temperatures are maintained below 50 °C even on an edge-computing device, highlighting the practicality of our approach for long-duration flying object detection and monitoring tasks.
|
|
16:30-18:00, Paper ThCT23-NT.8 | Add to My Program |
N-QR: Natural Quick Response Codes for Multi-Robot Instance Correspondence |
|
Glaser, Nathaniel | Georgia Institute of Technology |
Ravi, Rajashree | Bowery Farming |
Kira, Zsolt | Georgia Institute of Technology |
Keywords: Agricultural Automation, Multi-Robot Systems, Deep Learning for Visual Perception
Abstract: Image correspondence serves as the backbone for many tasks in robotics, such as visual fusion, localization, and mapping. However, existing correspondence methods do not scale to large multi-robot systems, and they struggle when image features are weak, ambiguous, or evolving. In response, we propose Natural Quick Response codes, or N-QR, which enables rapid and reliable correspondence between large-scale teams of heterogeneous robots. Our method works like a QR code, using keypoint-based alignment, rapid encoding, and error correction via ensembles of image patches of natural patterns. We deploy our algorithm in a production-scale robotic farm, where groups of growing plants must be matched across many robots. We demonstrate superior performance compared to several baselines, obtaining a retrieval accuracy of 88.2%. Our method generalizes to a farm with 100 robots, achieving a 12.5x reduction in bandwidth and a 20.5x speedup. We leverage our method to establish correspondences across 700k plants and confirm a link between a robotic seeding policy and germination.
|
|
ThCT24-NT Oral Session, NT-G402 |
Add to My Program |
Robotics and Automation in Agriculture and Forestry IV |
|
|
Chair: Valada, Abhinav | University of Freiburg |
Co-Chair: Stachniss, Cyrill | University of Bonn |
|
16:30-18:00, Paper ThCT24-NT.1 | Add to My Program |
Containerized Vertical Farming Using Cobots |
|
Mahalingam, Dasharadhan | Stony Brook University |
Patankar, Aditya | Stony Brook University |
Phi, Khiem | Stony Brook University |
Chakraborty, Nilanjan | Stony Brook University |
McGann, Ryan | CubicAcres LLC |
Ramakrishnan, Iv | Stony Brook University |
Keywords: Robotics and Automation in Agriculture and Forestry, Constrained Motion Planning, Motion and Path Planning
Abstract: Containerized vertical farming is a type of vertical farming practice using hydroponics in which plants are grown in vertical layers within a mobile shipping container. Space limitations within shipping containers make the automation of different farming operations challenging. In this paper, we explore the use of cobots (i.e., collaborative robots) to automate two key farming operations, namely, the transplantation of saplings and the harvesting of grown plants. Our method uses a single demonstration from a farmer to extract the motion constraints associated with the tasks, namely, transplanting and harvesting, and can then generalize to different instances of the same task. For transplantation, the motion constraint arises during insertion of the sapling within the growing tube, whereas for harvesting, it arises during extraction from the growing tube. We present experimental results to show that using RGBD camera images (obtained from an eye-in-hand configuration) and one demonstration for each task, it is feasible to perform transplantation of saplings and harvesting of leafy greens using a cobot, without task-specific programming.
|
|
16:30-18:00, Paper ThCT24-NT.2 | Add to My Program |
INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields |
|
Hindel, Julia | University of Freiburg |
Gosala, Nikhil | University of Freiburg |
Bregler, Kevin | Fraunhofer IPA |
Valada, Abhinav | University of Freiburg |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception, Computer Vision for Automation
Abstract: Perception datasets for agriculture are limited in both quantity and diversity, which hinders effective training of supervised learning approaches. Self-supervised learning techniques alleviate this problem; however, existing methods are not optimized for dense prediction tasks in agricultural domains, which results in degraded performance. In this work, we address this limitation with our proposed Injected Noise Discriminator (INoD), which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning. INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task. Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset. This allows the network to reason about higher-level semantics of the entailed objects, thus improving its performance on various downstream tasks. Additionally, we introduce the novel Fraunhofer Potato 2022 dataset consisting of over 16,800 images for object detection in potato fields. Extensive evaluations of our proposed INoD pretraining strategy on the tasks of object detection, semantic segmentation, and instance segmentation on the Sugar Beets 2016 and our potato dataset demonstrate that it achieves state-of-the-art performance.
|
|
16:30-18:00, Paper ThCT24-NT.3 | Add to My Program |
Unsupervised Generation of Labeled Training Images for Crop-Weed Segmentation in New Fields and on Different Robotic Platforms |
|
Chong, Yue Linn | University of Bonn |
Weyler, Jan | University of Bonn |
Lottes, Philipp | University of Bonn |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Deep Learning for Visual Perception, Object Detection, Segmentation and Categorization
Abstract: Agricultural robots have the potential to improve the efficiency and sustainability of existing agricultural practices. Most autonomous agricultural robots rely on machine vision systems. Such systems, however, often perform worse in new fields or when the robotic platform changes. While we can alleviate the performance degradation by manually labeling more data obtained in the new setup, this procedure is labor- and cost-intensive. Therefore, we propose an approach to improve the performance of machine vision systems for new fields and different robotic platforms without additional manual labeling. In an unsupervised manner, our approach can generate images and corresponding labels to train machine vision systems. We use StyleGAN2 to generate images that appear as if they came from the desired new field or robotic platform. Additionally, we propose a label refinement method to generate labels corresponding to the generated images. We show that our approach can improve the performance of the crop-weed segmentation task in new fields and on different robotic platforms without additional manual labeling.
|
|
16:30-18:00, Paper ThCT24-NT.4 | Add to My Program |
Log Loading Automation for Timber-Harvesting Industry |
|
Ayoub, Elie | FPInnovations |
Fernando, Heshan | FP Innovations |
Larrivée-Hardy, William | Laval University |
Lemieux, Nicolas | FPInnovations |
Giguère, Philippe | Université Laval |
Sharf, Inna | McGill University |
Keywords: Robotics and Automation in Agriculture and Forestry, Field Robots, Perception for Grasping and Manipulation
Abstract: The timber-harvesting industry lags behind its peer industries, such as mining and agriculture, with respect to the deployment of robotic, AI, and autonomous technologies. In this paper, we tackle the automation of a critical task that arises in transporting logs from the forest to the sawmill: the log loading operation. This work is motivated by acute shortages of human operators and the need to improve the efficiency of timber-harvesting processes. To this end, we demonstrate the full autonomy pipeline for the log loading operation with a fixed-base manipulator (a.k.a. the crane), starting with perception of logs around the machine, then grasp planning for where to grasp logs, through motion planning and control of the log loading maneuver. Our main contribution is the full integration of the elements necessary to achieve a completely autonomous loading cycle, where the crane picks up and loads all logs within its reach onto a trailer. Notable features of our implementation are a generalizable perception stack, a grasp planner that picks up multiple logs at a time, and an extensive experimental campaign conducted outdoors on a commercial log loader retrofitted for autonomy. Our results demonstrate an overall 87% success rate for the log loading operation, with the primary failure cases due to log segmentation errors and deficiencies in the final height-adjustment algorithm for grasping logs. We also present detailed timing results for the main parts of the autonomy pipeline, which support the feasibility of deployment in an operational environment.
|
|
16:30-18:00, Paper ThCT24-NT.5 | Add to My Program |
Region-Determined Localization Method for Unmanned Ground Vehicle under Pole-Like Feature Environment |
|
Lai, Yu-Hsiang | National Taiwan University |
Chuang, Chia-Yun | National Taiwan University |
Chen, Yu-Qiang | National Taiwan University |
Lian, Feng-Li | National Taiwan University |
Keywords: Robotics and Automation in Agriculture and Forestry, Localization
Abstract: In this paper, a region-determined navigation method for unmanned ground vehicles (UGVs) is presented. The method aims to solve the GNSS-denied localization problem using pole-like features such as trees or street lights. The approach comprises three parts: mapping, bounding, and localization. To map and reconstruct the environment, the Hector mapping approach and a circle-fitting method are adopted for occupancy mapping and feature mapping. To bound the available working region, we define the intersection of the features' enlarged radii and the desired operating area as negative and positive virtual boundaries. While the robot is cruising, a likelihood detection method is adopted for obstacle searching and comparison. Using the detection results as a feedback reference, an Extended Kalman Filter (EKF) corrects the drift between the GNSS signal and the true waypoints of the mowing robot. Three cruising demonstrations are presented to show the mapping and optimization results; the different demonstration cases represent different situations and potential issues.
|
|
16:30-18:00, Paper ThCT24-NT.6 | Add to My Program |
Tree Instance Segmentation and Traits Estimation for Forestry Environments Exploiting LiDAR Data Collected by Mobile Robots |
|
Malladi, Meher Venkata Ramakrishna | University of Bonn |
Guadagnino, Tiziano | University of Bonn |
Lobefaro, Luca | University of Bonn |
Mattamala, Matias | University of Oxford |
Griess, Holger | Swiss Federal Institute for Forest, Snow and Landscape Research |
Schweier, Janine | Swiss Federal Institute for Forest, Snow and Landscape Research |
Chebrolu, Nived | University of Oxford |
Fallon, Maurice | University of Oxford |
Behley, Jens | University of Bonn |
Stachniss, Cyrill | University of Bonn |
Keywords: Robotics and Automation in Agriculture and Forestry, Mapping
Abstract: Forests play a crucial role in our ecosystems, functioning as carbon sinks, climate stabilizers, biodiversity hubs, and sources of wood. By the very nature of their scale, monitoring and maintaining forests is a challenging task. Robotics in forestry can have the potential for substantial automation toward efficient and sustainable foresting practices. In this paper, we address the problem of automatically producing a forest inventory by exploiting LiDAR data collected by a mobile platform. To construct an inventory, we first extract tree instances from point clouds. Then, we process each instance to extract forestry inventory information. Our approach provides per-tree geometric traits such as diameter at breast height together with the individual tree locations in a plot. We validate our results against manual measurements collected by foresters during field trials. Our experiments show strong segmentation and tree trait estimation performance, underlining the potential for automating forestry services.
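A standard way to obtain diameter at breast height (DBH) from a segmented tree instance is to slice the stem point cloud at breast height and fit a circle to the cross-section. The least-squares (Kasa) fit below is a common textbook choice, shown here as an illustrative sketch rather than the authors' exact estimator.

    # Algebraic (Kasa) circle fit to a 2D stem cross-section slice.
    import numpy as np

    def dbh_from_slice(points_xy: np.ndarray) -> float:
        """Estimate stem diameter from an (N, 2) array of slice points."""
        x, y = points_xy[:, 0], points_xy[:, 1]
        A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
        b = x**2 + y**2
        cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]  # center + offset
        radius = np.sqrt(c + cx**2 + cy**2)
        return 2.0 * radius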
|
|
16:30-18:00, Paper ThCT24-NT.7 | Add to My Program |
Automated Testing of Spatially-Dependent Environmental Hypotheses through Active Transfer Learning |
|
Harrison, Nicholas | The University of Sydney: The Australian Centre for Field Roboti |
Wallace, Nathan Daniel | University of Sydney |
Sukkarieh, Salah | The University of Sydney: The Australian Centre for Field Roboti |
Keywords: Robotics and Automation in Agriculture and Forestry, Reactive and Sensor-Based Planning, Transfer Learning
Abstract: The efficient collection of samples is an important factor in outdoor information-gathering applications because of high sampling costs such as time, energy, and potential destruction to the environment. Utilizing available a priori data can be a powerful way to increase efficiency. However, the relationships between this data and the quantity of interest are often not known ahead of time, limiting the ability to leverage this knowledge for improved planning efficiency. To this end, this work combines transfer learning and active learning through a Multi-Task Gaussian Process and an information-based objective function. This combination allows the planner to explore the space of hypothetical inter-quantity relationships and to evaluate these hypotheses in real time, so that new knowledge is immediately exploited in future plans. The performance of the proposed method is evaluated against synthetic data and is shown to evaluate multiple hypotheses correctly. Its effectiveness is also demonstrated on real datasets. The technique is able to identify and leverage hypotheses showing a medium or strong correlation to reduce prediction error by a factor of 1.4--3.4 within the first 7 samples, while poor hypotheses are quickly identified and rejected, eventually having no adverse effect.
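A minimal sketch of the active-sampling loop this abstract describes, with two loud simplifications: the a priori layer is folded in as an extra GP input rather than through a true Multi-Task GP, and the information objective is reduced to picking the highest predictive uncertainty. All function names are hypothetical.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def next_sample(X_obs, y_obs, X_cand, prior_layer):
    """Pick the candidate location with the highest predictive uncertainty.

    prior_layer(X) returns the co-located a-priori quantity at locations X;
    it is appended as an extra input feature (stand-in for a Multi-Task GP).
    """
    F = lambda X: np.column_stack([X, prior_layer(X)])
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-3))
    gp.fit(F(X_obs), y_obs)
    _, std = gp.predict(F(X_cand), return_std=True)
    return X_cand[np.argmax(std)]   # maximum-entropy choice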
|
|
16:30-18:00, Paper ThCT24-NT.8 | Add to My Program |
Decentralized Multi-Phase Formation Control for Cattle Herding |
|
Nguyen, Dac Dang Khoa | University of Technology Sydney |
Paul, Gavin | University of Technology Sydney |
Alempijevic, Alen | University of Technology Sydney |
Keywords: Swarm Robotics, Agricultural Automation, Multi-Robot Systems
Abstract: Herding is performed by people or trained animals to control the movement of livestock in a direction desired by an operator. This paper presents a novel decentralized control strategy for a group of robots to herd animals, consisting of two phases: a surrounding phase and a driving phase. In the surrounding phase, a custom artificial potential field is employed to simultaneously guide the robots to encircle the herd by tracking the outermost animals while maintaining a safe distance from neighboring robots. Once the encirclement is complete, the robots transition to driving the animals toward a designated goal by simply maintaining their initial formation and traversing toward it. Unlike existing works on herding using flocking control, local observations of the nearest animals and communication with other robots within sensing range are the only requirements for the robots to effectively surround and herd the animals. Moreover, the animal-robot behavior model resembles the interaction of livestock in the presence of an external predatory threat, with the robots acting as predators. An analytical proof and empirical results collected from different simulators demonstrate that the proposed control enables the robots to converge around the boundary of the herd and guide it toward the designated goal.
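A toy sketch of the surrounding-phase control law, assuming a standard attractive/repulsive artificial potential field; the gains, safety distance, and exact field shape are illustrative, not the paper's.

import numpy as np

def surround_velocity(robot, outmost_animal, neighbors,
                      k_att=1.0, k_rep=2.0, d_safe=2.0):
    """Velocity command for one robot from a simple potential field:
    attraction toward the tracked outermost animal, repulsion from any
    neighboring robot inside the safety radius d_safe."""
    v = k_att * (outmost_animal - robot)          # track the herd boundary
    for n in neighbors:
        d = np.linalg.norm(robot - n)
        if 1e-6 < d < d_safe:
            # Repulsion grows as the neighbor enters the safety radius.
            v += k_rep * (1.0 / d - 1.0 / d_safe) * (robot - n) / d ** 2
    return v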
|
|
ThCT25-NT Oral Session, NT-G403 |
Add to My Program |
Localization and Mapping II |
|
|
Chair: Huang, Guoquan | University of Delaware |
Co-Chair: Fallon, Maurice | University of Oxford |
|
16:30-18:00, Paper ThCT25-NT.1 | Add to My Program |
Quantized Visual-Inertial Odometry |
|
Peng, Yuxiang | University of Delaware |
Chen, Chuchu | University of Delaware |
Huang, Guoquan | University of Delaware |
Keywords: Localization, Visual-Inertial SLAM, SLAM
Abstract: As edge devices equipped with cameras and inertial measurement units (IMUs) emerge, endowing these mobile devices with spatial computing capability has huge implications. However, ultra-efficient visual-inertial estimation that provides accurate 3D motion tracking on size, weight, and power (SWAP)-constrained edge devices remains challenging. This is exacerbated by data transfer (between different processors and memory), which consumes significantly more energy than computing itself. To push the state of the art, this paper proposes the first-of-its-kind quantized visual-inertial odometry (QVIO) to offer energy-efficient 3D motion tracking. In particular, we first quantize raw visual measurements in an intuitive way with a given small number of bits and then perform an EKF update with these quantized measurements (termed zQVIO). To improve upon this ad-hoc quantizer (although it works well in practice), we systematically quantize each measurement residual into a single bit and perform maximum-a-posteriori (MAP) estimation. Thanks to these quantizers, the proposed QVIO estimators significantly reduce data transfer and thus improve energy efficiency. As shown in our extensive experiments, the proposed residual-quantized VIO (rQVIO) achieves remarkably competitive performance even when using an average of only 3.7 bits per measurement, equivalent to a data reduction of 8.6 times compared to transmitting single-precision measurements.
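A minimal sketch of what a zQVIO-style uniform quantizer could look like before the EKF update; the measurement range, bit width, and rounding scheme are assumptions for illustration, not the paper's design.

import numpy as np

def quantize(z, n_bits=4, z_min=-1.0, z_max=1.0):
    """Uniformly quantize measurements to n_bits and return grid values,
    emulating what would be reconstructed after a low-bit transfer."""
    levels = 2 ** n_bits - 1
    z_clipped = np.clip(z, z_min, z_max)
    idx = np.round((z_clipped - z_min) / (z_max - z_min) * levels)
    return z_min + idx * (z_max - z_min) / levels   # de-quantized value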
|
|
16:30-18:00, Paper ThCT25-NT.2 | Add to My Program |
OCC-VO: Dense Mapping Via 3D Occupancy-Based Visual Odometry for Autonomous Driving |
|
Li, Heng | University of Science and Technology of China |
Duan, Yifan | University of Science and Technology of China |
Zhang, Xinran | University of Science and Technology of China |
Liu, Haiyi | University of Science and Technology of China |
Ji, Jianmin | University of Science and Technology of China |
Zhang, Yanyong | University of Science and Technology of China |
Keywords: Mapping, SLAM
Abstract: Visual Odometry (VO) plays a pivotal role in autonomous systems, with a principal challenge being the lack of depth information in camera images. This paper introduces OCC-VO, a novel framework that capitalizes on recent advances in deep learning to transform 2D camera images into 3D semantic occupancy, thereby circumventing the traditional need for concurrent estimation of ego poses and landmark locations. Within this framework, we utilize the TPV-Former to convert images from surround-view cameras into 3D semantic occupancy. Addressing the challenges presented by this transformation, we have specifically tailored a pose estimation and mapping algorithm that incorporates a Semantic Label Filter and a Dynamic Object Filter and, finally, utilizes a Voxel PFilter to maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark not only showcase a 20.6% improvement in Success Ratio and a 29.6% improvement in trajectory accuracy against ORB-SLAM3, but also emphasize our ability to construct a comprehensive map. Our implementation is open-sourced and available at: https://github.com/USTCLH/OCC-VO.
|
|
16:30-18:00, Paper ThCT25-NT.3 | Add to My Program |
NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping |
|
Yu, Xuan | Zhejiang University |
Liu, Yili | Zhejiang University |
Mao, Sitong | ShenZhen Huawei Cloud Computing Technologies Co., Ltd |
Zhou, Shunbo | The Chinese University of Hong Kong |
Xiong, Rong | Zhejiang University |
Liao, Yiyi | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Mapping, SLAM
Abstract: LiDAR mapping has been a long-standing problem in robotics. Recent progress in neural implicit representation has brought new opportunities to robotic mapping. In this paper, we propose multi-volume neural feature fields, called NF-Atlas, which bridge neural feature volumes with pose graph optimization. By regarding each neural feature volume as a pose graph node and the relative pose between volumes as a pose graph edge, the entire neural feature field becomes both locally rigid and globally elastic. Locally, each neural feature volume employs a sparse feature octree and a small MLP to encode the signed distance function (SDF) of the submap, optionally with semantics. Learning the map with this structure allows for end-to-end solving of maximum a posteriori (MAP) based probabilistic mapping. Globally, the map is built volume by volume independently, avoiding catastrophic forgetting when mapping incrementally. Furthermore, when a loop closure occurs, thanks to the elastic pose-graph-based representation, only the origins of the neural volumes need to be updated, without remapping. Finally, we validate these functionalities of NF-Atlas. Thanks to the sparsity and the optimization-based formulation, NF-Atlas shows competitive performance in terms of accuracy, efficiency, and memory usage on both simulated and real-world datasets. The project page is: https://yuxuan1206.github.io/NFAtlas/.
|
|
16:30-18:00, Paper ThCT25-NT.4 | Add to My Program |
Dusk Till Dawn: Self-Supervised Nighttime Stereo Depth Estimation Using Visual Foundation Models |
|
Vankadari, Madhu | University of Oxford |
Hodgson, Samuel | University of Oxford |
Shin, Sangyun | University of Oxford |
Zhou, Kaichen | University of Oxford |
Markham, Andrew | Oxford University |
Trigoni, Niki | University of Oxford |
Keywords: Mapping, SLAM, Deep Learning for Visual Perception
Abstract: Self-supervised depth estimation algorithms rely heavily on frame-warping relationships and exhibit substantial performance degradation when applied in challenging circumstances, such as low-visibility and nighttime scenarios with varying illumination conditions. Addressing this challenge, we introduce an algorithm designed to achieve accurate self-supervised stereo depth estimation focusing on nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels that violate the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach designed to filter out such pixels. Lastly, addressing weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments, conducted on challenging datasets including Oxford RobotCar and Multi-Spectral Stereo, demonstrate the robust improvements realized by our approach.
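As a hedged illustration of the masking idea, the snippet below shows a common auto-masking-style criterion from the self-supervised depth literature (keep a pixel only when warping explains it better than not warping); the paper's actual mask may be defined differently.

import numpy as np

def consistency_mask(err_warped, err_identity):
    """Per-pixel boolean mask: True where the warped source frame matches
    the target better than the unwarped source frame does, so stationary
    pixels, occlusions, and non-Lambertian regions drop out of the loss."""
    return err_warped < err_identity

# Usage sketch: loss = (err_warped * consistency_mask(err_warped, err_identity)).mean()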
|
|
16:30-18:00, Paper ThCT25-NT.5 | Add to My Program |
SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection |
|
Tao, Yifu | University of Oxford |
Bhalgat, Yash Sanjay | University of Oxford |
Fu, Lanke Frank Tarimo | University of Oxford |
Mattamala, Matias | University of Oxford |
Chebrolu, Nived | University of Oxford |
Fallon, Maurice | University of Oxford |
Keywords: Mapping, SLAM, Deep Learning for Visual Perception
Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. The system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data, which adds strong geometric constraints on depth and surface normals. We exploit the trajectory from a real-time lidar SLAM system to bootstrap a Structure-from-Motion (SfM) procedure, both to significantly reduce the computation time and to provide metric scale, which is crucial for the lidar depth loss. We use submapping to scale the system to large-scale environments captured over long trajectories. We demonstrate the reconstruction system with data from a multi-camera, lidar sensor suite carried onboard a legged robot, hand-held while scanning building scenes for 600 metres, and onboard an aerial robot surveying a multi-storey mock disaster site building. Website: https://ori.ox.ac.uk/labs/drs/nerf-mapping/
|
|
16:30-18:00, Paper ThCT25-NT.6 | Add to My Program |
LESS-Map: Lightweight and Evolving Semantic Map in Parking Lots for Long-Term Self-Localization |
|
MingRui, Liu | Zhejiang University |
Tang, Xinyang | Shanghai Jiao Tong University |
Qian, Yeqiang | Shanghai Jiao Tong University |
Chen, Jiming | Zhejiang University |
Li, Liang | Zhejiang University |
Keywords: Mapping, SLAM, Omnidirectional Vision
Abstract: Precise and long-term stable localization is essential in parking lots for tasks such as autonomous driving and autonomous valet parking. Existing methods rely on a fixed and memory-inefficient map and lack robust data association approaches, making them unsuitable for precise localization or long-term map maintenance. In this paper, we propose a novel mapping, localization, and map update system based on ground semantic features, utilizing low-cost cameras. We present a precise and lightweight parameterization method to establish improved data association and achieve accurate localization at the centimeter level. Furthermore, we propose a novel map update approach that implements high-quality data association for the parameterized semantic features, allowing continuous map update and refinement during re-localization while maintaining centimeter-level accuracy. We validate the performance of the proposed method in real-world experiments and compare it against state-of-the-art algorithms. The proposed method achieves an average accuracy improvement of 5 cm during the registration process. The generated maps consume only 450 KB/km and remain adaptable to evolving environments through continuous updates.
|
|
16:30-18:00, Paper ThCT25-NT.7 | Add to My Program |
Observation Time Difference: An Online Dynamic Objects Removal Method for Ground Vehicles |
|
Wu, Rongguang | Northeastern University |
Pang, Chenglin | Northeastern University |
Wu, Xuankang | Northeastern University |
Fang, Zheng | Northeastern University |
Keywords: Mapping, SLAM, Range Sensing
Abstract: In the process of urban environment mapping, the sequential accumulation of dynamic objects leaves a large number of traces in the map. These traces usually degrade the localization accuracy and navigation performance of the robot. Therefore, dynamic object removal plays an important role in creating a clean map. However, conventional dynamic object removal methods usually run offline; that is, the map is reprocessed after it is constructed, which undoubtedly incurs additional time costs. To tackle this problem, this paper proposes a novel method for online dynamic object removal for ground vehicles. According to the observation time difference between an object and the ground where it is located, dynamic objects are classified into two types: suddenly appearing and suddenly disappearing. For these two kinds of dynamic objects, we propose downward retrieval and upward retrieval methods to eliminate them, respectively. We validate our method on the SemanticKITTI dataset and an author-collected dataset with highly dynamic objects. Compared with other state-of-the-art methods, our method is more efficient and robust, reducing the running time per frame by more than 60% on average. Our method is open-sourced on GitHub.
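A hedged sketch of the core classification rule suggested by the abstract: compare when a point was first observed with when the ground beneath it was first observed. The sign convention and tolerance below are assumptions, not taken from the paper.

def classify_dynamic(obj_first_seen, ground_first_seen, tol=0.5):
    """Label an object column by its observation time difference (seconds).
    Positive difference: the ground was mapped before the object appeared;
    negative: the object was mapped before the ground became visible."""
    dt = obj_first_seen - ground_first_seen
    if dt > tol:
        return "suddenly_appear"
    if dt < -tol:
        return "suddenly_disappear"
    return "static"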
|
|
16:30-18:00, Paper ThCT25-NT.8 | Add to My Program |
RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields |
|
Han, Xiao | University of Electronic Science and Technology of China |
Liu, Houxuan | University of Electronic Science and Technology of China |
Ding, Yunchao | University of Electronic Science and Technology of China |
Yang, Lu | University of Electronic Science and Technology of China |
Keywords: Mapping, SLAM, Semantic Scene Understanding
Abstract: Accurate perception of objects in the environment is important for improving the scene understanding capability of SLAM systems. In robotic and augmented reality applications, object maps with semantic and metric information show attractive advantages. In this paper, we present RO-MAP, a novel multi-object mapping pipeline that does not rely on 3D priors. Given only monocular input, we use neural radiance fields to represent objects and couple them with a lightweight object SLAM based on multi-view geometry to simultaneously localize objects and implicitly learn their dense geometry. We create a separate implicit model for each detected object and train the models dynamically and in parallel as new observations are added. Experiments on synthetic and real-world datasets demonstrate that our method can generate semantic object maps with shape reconstruction, and is competitive with offline methods while achieving real-time performance (25 Hz). The code and dataset will be available at: https://github.com/XiaoHan-Git/RO-MAP
|
|
16:30-18:00, Paper ThCT25-NT.9 | Add to My Program |
OctoMap-RT: Fast Probabilistic Volumetric Mapping Using Ray-Tracing GPUs |
|
Min, Heajung | Ewha Womans University |
Han, Kyungmin | Ewha Womans University |
Kim, Young J. | Ewha Womans University |
Keywords: Mapping, Simulation and Animation, Hardware-Software Integration in Robotics
Abstract: A 3D occupancy map that is accurately modeled after real-world environments is essential for reliably performing robotic tasks. Probabilistic volumetric mapping (PVM) is a well-known environment mapping method using volumetric voxel grids that represent the probability of occupancy. The main bottleneck of current CPU-based PVM, such as OctoMap, is determining voxel grids with occupied and free states using ray-shooting. In this paper, we propose an octree-based PVM, called OctoMap-RT, using a hybrid of off-the-shelf ray-tracing GPUs and CPUs to substantially improve CPU-based PVM. OctoMap-RT employs massively parallel ray-shooting using GPUs to generate occupied and free voxel grids and to update their occupancy states in parallel, and it exploits CPUs to restructure the PVM using the updated voxels. Our experiments using various large-scale real-world benchmarking environments with dense and high-resolution sensor measurements demonstrate that OctoMap-RT builds maps up to 41.2 times faster than OctoMap and 9.3 times faster than the recent SuperRay CPU implementation. Moreover, OctoMap-RT constructs a map with 0.52% higher accuracy, in terms of the number of occupancy grids, than both OctoMap and SuperRay.
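For context, the per-voxel update that a PVM applies once ray-shooting has produced the occupied and free voxel sets is the classic clamped log-odds rule sketched below (in plain Python; OctoMap-RT's contribution is running the ray-shooting itself on RT GPUs). The probabilities and clamping bounds are typical OctoMap defaults, not values from the paper.

import math

L_OCC, L_FREE = math.log(0.7 / 0.3), math.log(0.4 / 0.6)  # hit / miss updates
L_MIN, L_MAX = -2.0, 3.5                                   # clamping bounds

def update_logodds(logodds, occupied_voxels, free_voxels):
    """Clamped log-odds occupancy update for one sensor scan.
    logodds maps voxel keys to their current log-odds value."""
    for v in occupied_voxels:
        logodds[v] = min(logodds.get(v, 0.0) + L_OCC, L_MAX)
    for v in free_voxels:
        logodds[v] = max(logodds.get(v, 0.0) + L_FREE, L_MIN)
    return logodds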
|
|
ThCT26-NT Oral Session, NT-G404 |
Add to My Program |
SLAM VI |
|
|
Chair: Leutenegger, Stefan | Technical University of Munich |
Co-Chair: Xu, Yang | Zhejiang University |
|
16:30-18:00, Paper ThCT26-NT.1 | Add to My Program |
VICAN: Very Efficient Calibration Algorithm for Large Camera Networks |
|
Moreira, Gabriel | Carnegie Mellon University |
Marques, Manuel | Instituto Superior Técnico |
Costeira, Joao Paulo | Insituto Superior Tecnico |
Hauptmann, Alexander | Carnegie Mellon University |
Keywords: SLAM, Sensor Networks, Multi-Robot SLAM
Abstract: The precise estimation of camera poses within large camera networks is a foundational problem in computer vision and robotics, with broad applications spanning autonomous navigation, surveillance, and augmented reality. In this paper, we introduce a novel methodology that extends state-of-the-art Pose Graph Optimization (PGO) techniques. Departing from the conventional PGO paradigm, which primarily relies on camera-camera edges, our approach centers on the introduction of a dynamic element (any rigid object free to move in the scene) whose pose can be reliably inferred from a single image. Specifically, we consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step. This shift not only offers a solution to the challenges encountered in directly estimating relative poses between cameras, particularly in adverse environments, but also leverages the inclusion of numerous object poses to ameliorate and integrate errors, resulting in accurate camera pose estimates. Though our framework retains compatibility with traditional PGO solvers, its efficacy benefits from a custom-tailored optimization scheme. To this end, we introduce an iterative primal-dual algorithm capable of handling large graphs. Empirical benchmarks, conducted on a new dataset of simulated indoor environments, substantiate the efficacy and efficiency of our approach.
|
|
16:30-18:00, Paper ThCT26-NT.2 | Add to My Program |
Tightly-Coupled LiDAR-Visual-Inertial SLAM and Large-Scale Volumetric-Occupancy Mapping |
|
Boche, Simon | Technical University of Munich |
Barbas Laina, Sebastián | TU Munich |
Leutenegger, Stefan | Technical University of Munich |
Keywords: SLAM, Visual-Inertial SLAM, Mapping
Abstract: Autonomous navigation is one of the key requirements for every potential application of mobile robots in the real world. Besides high-accuracy state estimation, a suitable and globally consistent representation of the 3D environment is indispensable. We present a fully tightly-coupled LiDAR-Visual-Inertial SLAM system and 3D mapping framework that applies local submapping strategies to achieve scalability to large-scale environments. A novel, correspondence-free, and inherently probabilistic formulation of LiDAR residuals is introduced, expressed only in terms of the occupancy fields and their respective gradients. These residuals can be added to a factor graph optimisation problem, either as frame-to-map factors for the live estimates or as map-to-map factors aligning the submaps with respect to one another. Experimental validation demonstrates that the approach achieves state-of-the-art pose accuracy and furthermore produces globally consistent volumetric occupancy submaps which can be directly used in downstream tasks such as navigation or exploration.
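A loose sketch of what a correspondence-free occupancy residual of this kind can look like for a single LiDAR endpoint: evaluate the occupancy field at the transformed point and use its spatial gradient for the Jacobian. The field accessors and the translation-only Jacobian are simplifying assumptions, not the paper's formulation.

import numpy as np

def occupancy_residual(T_WS, p_S, occ_field, occ_grad, occ_surface=0.0):
    """Residual and translation Jacobian for one LiDAR endpoint.

    T_WS: 4x4 submap-from-sensor transform; p_S: endpoint in sensor frame.
    occ_field(p) returns a scalar occupancy value at a 3D point and
    occ_grad(p) its spatial gradient (both hypothetical accessors)."""
    p_W = T_WS[:3, :3] @ p_S + T_WS[:3, 3]
    r = occ_field(p_W) - occ_surface   # endpoint should lie on the surface
    J_t = occ_grad(p_W)                # chain rule w.r.t. translation
    return r, J_t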
|
|
16:30-18:00, Paper ThCT26-NT.3 | Add to My Program |
Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach |
|
Hanlon, Matthew | ETH Zurich |
Sun, Boyang | ETH Zurich |
Pollefeys, Marc | ETH Zurich |
Blum, Hermann | ETH Zurich |
Keywords: View Planning for SLAM, Deep Learning for Visual Perception
Abstract: Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map from another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing, e.g., a ground robot in the map of a drone or a head-mounted MR headset presents unique challenges due to viewpoint changes. This work investigates how active visual localization can be used to overcome such viewpoint changes. Specifically, we focus on the problem of selecting the optimal viewpoint at a given location. We compare existing approaches from the literature with additional proposed baselines and propose a novel data-driven approach. The results demonstrate the superior performance of our data-driven approach when compared to existing methods, both in controlled simulation experiments and in real-world deployment.
|
|
16:30-18:00, Paper ThCT26-NT.4 | Add to My Program |
Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration |
|
Zeng, Jing | Zhejiang University |
Li, Yanxu | Zhejiang University |
Sun, Jiahao | Zhejiang University |
Ye, Qi | Zhejiang University |
Ran, Yunlong | Zhejiang University |
Chen, Jiming | Zhejiang University |
Keywords: View Planning for SLAM, Mapping, Continual Learning
Abstract: Implicit neural representations have demonstrated significant promise for 3D scene reconstruction. Recent works have extended their application to autonomous implicit reconstruction through Next Best View (NBV) based methods. However, the NBV method cannot guarantee complete scene coverage and often necessitates extensive viewpoint sampling, particularly in complex scenes. In this paper, we propose to 1) incorporate frontier-based exploration tasks for global coverage with implicit surface uncertainty-based reconstruction tasks to achieve high-quality reconstruction, and 2) introduce a method to compute implicit surface uncertainty from color uncertainty, which reduces the time needed for view selection. With these two tasks, we further propose an adaptive strategy for switching modes in view path planning, reducing planning time while maintaining superior reconstruction quality. Our method exhibits the highest reconstruction quality among all planning methods and superior planning efficiency among methods involving reconstruction tasks. We deploy our method on a UAV, and the results show that our method can plan multi-task views and reconstruct a scene with high quality.
|
|
16:30-18:00, Paper ThCT26-NT.5 | Add to My Program |
Probabilistic Active Loop Closure for Autonomous Exploration |
|
Yin, He | Amazon.com, Inc |
Park, Jong Jin | Amazon Lab126 |
Mendes de Almeida Neto, Marcelino | University of Texas at Austin |
Labrie, Martin | Amazon |
Zamiska, James | Amazon |
Kim, Richard | Amazon, Lab126 |
Keywords: View Planning for SLAM, SLAM, Motion and Path Planning
Abstract: When a mobile robot autonomously explores an indoor space to produce a localization and navigation map, it is important to create both a stable pose graph and a high-quality occupancy map that covers all the navigable areas. In this work, we propose a novel probabilistic active loop closure framework which attempts to maximally reduce pose graph uncertainty during exploration and improves occupancy map quality. We calculate a probabilistic reward of getting a loop closure at any pose on a pose graph, which considers both how much pose graph uncertainty would be reduced by getting a loop closure there, and the robot’s travel cost to navigate to that pose. By choosing poses that provide the largest rewards, we can maximally reduce pose graph uncertainty while avoiding long travel times. The effectiveness of the method is illustrated through on-device testing in various floor plans.
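A compact sketch of the reward trade-off the abstract describes, with the loop-closure probability model, uncertainty-reduction estimate, and cost weighting left as hypothetical callables.

import numpy as np

def best_loop_closure_pose(candidates, p_closure, uncertainty_drop,
                           travel_cost, alpha=0.1):
    """Score each candidate pose on the graph: expected pose-graph
    uncertainty reduction from closing a loop there, discounted by the
    cost of navigating to it; higher is better."""
    rewards = [p_closure(c) * uncertainty_drop(c) - alpha * travel_cost(c)
               for c in candidates]
    return candidates[int(np.argmax(rewards))]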
|
|
16:30-18:00, Paper ThCT26-NT.6 | Add to My Program |
CARE: Confidence-Rich Autonomous Robot Exploration Using Bayesian Kernel Inference and Optimization |
|
Xu, Yang | Zhejiang University |
Zheng, Ronghao | Zhejiang University |
Zhang, Senlin | Zhejiang University |
Liu, Meiqin | Zhejiang University |
Huang, Shoudong | University of Technology, Sydney |
Keywords: View Planning for SLAM, SLAM, Motion and Path Planning
Abstract: In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). The trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration) can be realized. To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving an approximate logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures its comparable accuracy to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments.
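A minimal sketch of the GPBO action-selection step, using a UCB-style combination of predicted CRMI and predictive uncertainty; the surrogate kernel and the weight beta are illustrative assumptions, and the BKIO speed-up is not reproduced here.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_action(actions_tried, crmi_values, candidate_actions, beta=2.0):
    """Fit the GP surrogate on evaluated (action, CRMI) pairs, then pick
    the candidate maximizing predicted CRMI plus an exploration bonus."""
    gp = GaussianProcessRegressor()
    gp.fit(actions_tried, crmi_values)
    mu, std = gp.predict(candidate_actions, return_std=True)
    return candidate_actions[np.argmax(mu + beta * std)]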
|
|
16:30-18:00, Paper ThCT26-NT.7 | Add to My Program |
Event-Based Stereo Visual Odometry with Native Temporal Resolution Via Continuous-Time Gaussian Process Regression |
|
Wang, Jianeng | University of Oxford |
Gammell, Jonathan | University of Oxford |
Keywords: Vision-Based Navigation, Localization, SLAM
Abstract: Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motions and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but, absent additional sensors, sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation in the estimation state. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9·10⁻³ and 5.9·10⁻³ RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by two and four times, respectively.
|
|
16:30-18:00, Paper ThCT26-NT.8 | Add to My Program |
MSCEqF: A Multi State Constraint Equivariant Filter for Vision-Aided Inertial Navigation |
|
Fornasier, Alessandro | University of Klagenfurt |
van Goor, Pieter | The Australian National University |
Allak, Eren | University of Klagenfurt |
Mahony, Robert | Australian National University |
Weiss, Stephan | Universität Klagenfurt |
Keywords: Vision-Based Navigation, Visual-Inertial SLAM, Localization
Abstract: This letter revisits the problem of visual-inertial navigation systems (VINS) and presents a novel filter design we dub the multi state constraint equivariant filter (MSCEqF, in analogy to the well-known MSCKF). We define a symmetry group and a corresponding group action that specifically allow the design of an equivariant filter for the problem of visual-inertial odometry (VIO), including IMU bias and camera intrinsic and extrinsic calibration states. In contrast to state-of-the-art invariant extended Kalman filter (IEKF) approaches that simply tack IMU bias and other states onto the SE2(3) group, our filter builds upon a symmetry that properly includes all the states in the group structure. Thus, we achieve improved behavior, particularly when linearization points deviate largely from the truth (i.e., on transients upon state disturbances). Our approach is inherently consistent, even during convergence phases from significant errors, without the need for error uncertainty adaptation, observability constraints (OC), or other consistency-enforcing techniques. This leads to greatly improved estimator behavior under significant errors and unexpected state changes during, e.g., long-duration missions. We evaluate our approach in a multitude of different experiments using three prominent real-world datasets.
|
|
16:30-18:00, Paper ThCT26-NT.9 | Add to My Program |
L-VIWO: Visual-Inertial-Wheel Odometry Based on Lane Lines |
|
Zhao, Bin | Northeastern University |
Zhang, Yunzhou | Northeastern University |
Huang, Junjie | Northeastern University |
Zhang, Xichen | Northeastern University |
Long, Zeyu | Northeastern University |
Li, Yulong | Northeastern University |
Keywords: Localization, SLAM, Visual-Inertial SLAM
Abstract: To achieve precise localization for autonomous vehicles and mitigate the accumulated drift error in odometry, this paper proposes L-VIWO, a visual-inertial-wheel odometry based on lane lines. The method effectively utilizes the lateral constraints provided by lane lines to eliminate and mitigate incrementally accumulated pose errors. First, we introduce a lane line tracking method that enables multi-frame tracking of the same lane line, thereby obtaining multi-frame data for each lane line. Then, we utilize the multi-frame lane line data and the curvature characteristics of adjacent lane lines to optimize the positions of the lane line sample points, thus building a reliable lane line map. Finally, we use the built local lane line map to correct the position of the vehicle. Based on the corrected position and the prior pose from the odometry, we build a graph optimization model to optimize the pose of the vehicle. Localization experiments on the KAIST dataset demonstrate that the proposed method effectively enhances the localization accuracy of odometry, confirming the effectiveness of the method.
|
|
ThCT27-NT Oral Session, NT-G2 |
Add to My Program |
Multifingered Hands |
|
|
Chair: Katzschmann, Robert Kevin | ETH Zurich |
Co-Chair: Liarokapis, Minas | The University of Auckland |
|
16:30-18:00, Paper ThCT27-NT.1 | Add to My Program |
Fast Force-Closure Grasp Synthesis with Learning-Based Sampling |
|
Xu, Wei | Shanghai Jiao Tong University |
Guo, Weichao | Shanghai Jiao Tong University |
Shi, Xu | Shanghai Jiao Tong University |
Sheng, Xinjun | Shanghai Jiao Tong University |
Zhu, Xiangyang | Shanghai Jiao Tong University |
Keywords: Multifingered Hands, Grasping, Deep Learning in Grasping and Manipulation
Abstract: Anthropomorphic robotic hands have been widely investigated for dexterous object manipulation because of their anatomical similarity to the human hand. However, the large dimension of the configuration space challenges the real-time performance of existing grasp planning methods and drastically limits the application of anthropomorphic hands. In this letter, we propose a fast force-closure grasp synthesis (FFCGS) method for anthropomorphic hands to efficiently grasp unknown objects. FFCGS takes a signed distance field (SDF) as input. First, a network that samples feasible 6D wrist poses is trained end-to-end to reduce the dimension of the search space. Furthermore, a fast optimization algorithm is presented to find finger configurations for force-closure precision grasps based on the differentiable Q-distance metric. We validate our method in both simulated and real-world environments. Experimental results show that the proposed FFCGS achieves significantly improved performance in terms of time efficiency (5 times faster), grasp quality metrics, and success rate (5%-10% improvement) over benchmark methods. The outcomes of this study have great significance in promoting motion planning for robot hand-arm systems and upper-limb prostheses.
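To illustrate how an SDF input can drive grasp optimization, here is a toy contact objective: fingertip sample points are pulled onto the zero level set while other hand links are pushed outside a clearance margin. This is a generic construction for illustration, not the paper's differentiable Q-distance metric.

def contact_cost(sdf, fingertips, links, margin=0.005):
    """Penalty that pulls fingertips onto the object surface while keeping
    the remaining hand links at least `margin` metres clear of it.
    sdf(p) returns the signed distance of a 3D point to the object."""
    touch = sum(sdf(p) ** 2 for p in fingertips)             # want SDF ~ 0
    collide = sum(max(0.0, margin - sdf(p)) ** 2 for p in links)
    return touch + collide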
|
|
16:30-18:00, Paper ThCT27-NT.2 | Add to My Program |
The New Dexterity Modular, Dexterous, Anthropomorphic, Open-Source, Bimanual Manipulation Platform: Combining Adaptive and Hybrid Actuation Systems with Lockable Joints |
|
Chang, Che-Ming | University of Auckland |
Sanches, Felipe Padula | University of Auckland |
Gao, Geng | Acumino Inc |
Liarokapis, Minas | The University of Auckland |
Keywords: Dual Arm Manipulation, Multifingered Hands, Dexterous Manipulation
Abstract: This work introduces the New Dexterity modular, dexterous, anthropomorphic, open-source, bimanual manipulation platform (OpenBMP), designed for research and rapid experimentation in robot grasping, dexterous manipulation, and bimanual manipulation. The platform combines adaptive and hybrid actuation systems with lockable joints, facilitating transitions between the execution of delicate and forceful tasks. Antagonistic tendon-driven elbows and inline actuator transmissions reduce the system's inertial mass while enhancing energy efficiency and overall performance. Leveraging 3D printing and carbon-fiber-reinforced manufacturing of core parts, the platform is easy to replicate and highly modular. This paper presents the details of the design, the actuation principles, and experimental validation of the platform's efficiency through the execution of complex teleoperation and telemanipulation tasks. The designs, electronics, and code are open-sourced to allow replication by others.
|
|
16:30-18:00, Paper ThCT27-NT.3 | Add to My Program |
Fully 3D Printable Robot Hand and Soft Tactile Sensor Based on Air-Pressure and Capacitive Proximity Sensing |
|
Taylor, Sean | University of Illinois at Urbana Champaign |
Park, Kyungseo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Yamsani, Sankalp | University of Illinois Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
Keywords: Multifingered Hands, Force and Tactile Sensing, Grasping
Abstract: Soft tactile sensors can enable robots to grasp objects easily and stably by simultaneously providing tactile data and mechanical compliance to robotic hands. If there are low-cost and easy-to-build robotic hands equipped with soft tactile sensors, they would be highly accessible and facilitate many robotics projects. To this end, we propose an accessible robot hand capable of tactile sensing, which can be produced through digital fabrication. We made the robot hand using commercial servo motors as well as components 3D printed from PETG, TPU, and conductive TPU. These materials allow the robot hand to have a soft, durable, and even functional structure. Specifically, the soft fingertip was crafted from TPU and conductive TPU, and their mechanical and electrical properties enable easy implementation of tactile sensing capabilities, such as force and capacitive touch, simply by adding off-the-shelf sensors (air-pressure and capacitance). The proposed robot hand could effectively sense interaction forces and proximity to conductive objects, and its utilization in various tasks was also demonstrated successfully.
|
|
16:30-18:00, Paper ThCT27-NT.4 | Add to My Program |
TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning |
|
Li, Haoming | Zhejiang University |
Ye, Qi | Zhejiang University |
Huo, Yuchi | Zhejiang University |
Liu, Qingtao | Zhejiang University |
Jiang, Shijian | Zhejiang University |
Zhou, Tao | Zhejiang University |
Li, Xiang | Oppo Us Research Center |
Zhou, Yang | OPPO |
Chen, Jiming | Zhejiang University |
Keywords: Constrained Motion Planning, Grasping, Multifingered Hands
Abstract: Grasping motion planning aims to find a feasible grasping trajectory in the configuration space given an input target grasp. While optimizing grasp motion for two- or three-fingered grippers has been well studied, natural grasp motion planning with a dexterous hand remains a very challenging problem due to the high-dimensional working space. In this work, we propose a novel temporal-parametric grasp prior (TPGP) optimization method that simplifies grasping trajectory optimization for the dexterous hand while maintaining the smooth and natural properties of the grasping motion. Specifically, we reformulate the discrete trajectory parameters into a temporal-based parameterization, where a prior constraint provided by a hand poser network is introduced to ensure that the hand pose is natural and reasonable throughout the trajectory. Finally, we present a joint target optimization strategy that enhances the target pose for more feasible trajectories. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp motion on various metrics.
|
|
16:30-18:00, Paper ThCT27-NT.5 | Add to My Program |
A Wearable Robotic Hand for Hand-Over-Hand Imitation Learning |
|
Wei, Dehao | Tsinghua University |
Xu, Huazhe | Tsinghua University |
Keywords: Multifingered Hands, Learning from Demonstration, In-Hand Manipulation
Abstract: Dexterous manipulation through imitation learning has gained significant attention in robotics research. The collection of high-quality expert data is of paramount importance when using imitation learning. Existing approaches for acquiring expert data commonly use a data glove to capture hand motion information. However, this method suffers from limitations, as the collected information cannot be directly mapped to the robotic hand due to discrepancies in their degrees of freedom or structures. Furthermore, it fails to accurately capture force feedback between the hand and objects during the demonstration process. To overcome these challenges, this paper presents a novel solution in the form of a wearable dexterous hand, the Hand-over-hand Imitation learning wearable Robotic Hand (HIRO Hand), which integrates expert data collection and enables the implementation of dexterous operations. The HIRO Hand empowers the operator to utilize their own tactile feedback to determine appropriate force, position, and actions, resulting in more accurate imitation of the expert's actions. We develop both non-learning and visual behavior cloning-based controllers, allowing the HIRO Hand to successfully achieve grasping and in-hand manipulation.
|
|
16:30-18:00, Paper ThCT27-NT.6 | Add to My Program |
WARABI Hand: Five-Fingered Robotic Hand with Flexible Skin and Force Sensors for Social Interaction |
|
Nakane, Aoi | The University of Tokyo |
Yanokura, Iori | University of Tokyo |
Hasegawa, Shun | The University of Tokyo |
Yamaguchi, Naoya | The University of Tokyo |
Kojima, Kunio | The University of Tokyo |
Okada, Kei | The University of Tokyo |
Inaba, Masayuki | The University of Tokyo |
Keywords: Physical Human-Robot Interaction, Touch in HRI, Multifingered Hands
Abstract: A robotic hand for social interaction should be capable of comfortable touch with humans. However, it is difficult to mount the skin, tactile sensors, and driving mechanisms required for human contact, especially holding hands, on a slender finger. In addition, to unitize the hand for easy use with any robot and for maintainability, the mechanism must be contained within the small space of the fingers and palm. In this paper, we propose a human-sized five-fingered robotic hand named WARABI Hand. It is covered with multi-layered rubber skin to realize a human-like soft and pleasant feel. Force sensors on each finger link detect contact with humans and adjust the gripping force. We conducted experiments in which a humanoid equipped with WARABI Hand grasped a forearm, held hands, and interlocked fingers with a person. The performance of object grasping was also evaluated. We demonstrate that our proposed hand is useful for interaction with humans, including receiving and handing over objects.
|
|
16:30-18:00, Paper ThCT27-NT.7 | Add to My Program |
Sensorized Soft Skin for Dexterous Robotic Hands |
|
Egli, Jana | ETHZ |
Forrai, Benedek | ETH Zürich |
Buchner, Thomas Jakob Konrad | ETH Zurich |
Su, Jiangtao | Nanyang Technological University |
Chen, Xiaodong | Nanyang Technological University |
Katzschmann, Robert Kevin | ETH Zurich |
Keywords: Multifingered Hands, Force and Tactile Sensing, Dexterous Manipulation
Abstract: Conventional industrial robots often use two-fingered grippers or suction cups to manipulate objects or interact with the world. Because of their simplified design, they are unable to reproduce the dexterity of human hands when manipulating a wide range of objects. While the control of humanoid hands has evolved greatly, hardware platforms still lack capabilities, particularly in tactile sensing and providing soft contact surfaces. In this work, we present a method that equips the skeleton of a tendon-driven humanoid hand with a soft and sensorized tactile skin. Multi-material 3D printing allows us to iteratively approach a cast skin design which preserves the robot's dexterity in terms of range of motion and speed. We demonstrate that the soft skin enables firmer grasps and that piezoresistive sensor integration enhances the hand's tactile sensing capabilities.
|
|
16:30-18:00, Paper ThCT27-NT.8 | Add to My Program |
Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies |
|
Wang, Qiang | University College Dublin |
McCarthy, Robert | CeADAR - Ireland’s Centre for Applied AI, University College Dub |
Cordova Bulens, David | University College Dublin |
Sanchez, Francisco Roldan | Dublin City University |
McGuinness, Kevin | Dublin City University |
O’Connor, Noel E. | Dublin City University |
Redmond, Stephen J. | University College Dublin |
Keywords: Imitation Learning, Deep Learning Methods, Dexterous Manipulation
Abstract: This paper presents our solution for the Real Robot Challenge III, aiming to address dexterous robotic manipulation tasks through learning from offline data. In this competition, participants were given two types of datasets for each task: expert and mixed. Each expert dataset is collected by a high-skill policy, whereas the mixed dataset is collected using both expert and non-expert policies. We found that vanilla behavioural cloning (BC) can learn a very proficient policy with minimal human intervention when trained on expert datasets. Notably, BC outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, when applied to mixed datasets, the performance of BC deteriorates; the performance of offline RL algorithms is also less than satisfactory. Upon examining the provided datasets, it was apparent that each mixed dataset contained a significant proportion of expert data, which should enable the training of a proficient BC agent. However, the expert data is not labelled in the datasets. As a result, we propose a classifier to identify the pattern of the expert behaviour within a mixed dataset and then utilize it to isolate the expert data. To further boost the BC performance, we take advantage of the geometric symmetry of the arena to augment the training dataset through mathematical transformations. Our submission outperformed that of other participants. Site: https://github.com/wq13552463699/Real-Robot-Challenge-III-Winning-Solution
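A rough sketch of the expert-isolation idea: score transitions in the mixed dataset with a classifier trained to separate expert from mixed data (a positive-unlabelled-style shortcut), then keep only high-confidence transitions for behavioural cloning. The classifier choice and threshold are assumptions, not the authors' design.

import numpy as np
from sklearn.linear_model import LogisticRegression

def isolate_expert(expert_X, mixed_X, keep_above=0.9):
    """Return the rows of mixed_X most likely to be expert behaviour.
    expert_X / mixed_X are (N, d) arrays of flattened transitions."""
    X = np.vstack([expert_X, mixed_X])
    y = np.r_[np.ones(len(expert_X)), np.zeros(len(mixed_X))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(mixed_X)[:, 1]   # P(expert-like)
    return mixed_X[scores > keep_above]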
|
|
16:30-18:00, Paper ThCT27-NT.9 | Add to My Program |
Development of a Versatile Robotic Hand Toward Jig-Less Assembly of a Shaft-Shaped Part |
|
Shibata, Kohei | Wakayama University |
Dobashi, Hiroki | Wakayama University |
Keywords: Multifingered Hands, Grasping
Abstract: Jig-less assembly of a shaft-shaped part with a single versatile robotic hand requires several functions of the hand to achieve a series of operations such as alignment, picking, reorientation, and positioning of the part. In this research, we propose a novel robotic hand with these functions and corresponding finger mechanisms. Moreover, we propose a manipulation strategy for grasping shaft-shaped parts with the proposed hand, and experimentally verify the feasibility of desired operations with the proposed method as well as the versatility of the hand for several different parts.
|
|
ThCT28-NT Oral Session, NT-G4 |
Add to My Program |
Perception for Grasping and Manipulation III |
|
|
Chair: Alzugaray, Ignacio | Imperial College London |
Co-Chair: Han, Yiheng | Beijing University of Technology |
|
16:30-18:00, Paper ThCT28-NT.1 | Add to My Program |
AHPPEBot: Autonomous Robot for Tomato Harvesting Based on Phenotyping and Pose Estimation |
|
Li, Xingxu | Beijing University of Technology, Beijing, China |
Ma, Nan | Beijing University of Technology, Beijing, China |
Han, Yiheng | Beijing University of Technology |
Yang, Shun | Beijing AIForceTech Technology Co., Ltd |
Zheng, Siyi | Beijing AIForce Technology Co., Ltd |
Keywords: Perception for Grasping and Manipulation, Robotics and Automation in Agriculture and Forestry, Agricultural Automation
Abstract: To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel robot named AHPPEBot, which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are accomplished through a multi-task YOLOv5 model coupled with a detection-based adaptive DBSCAN clustering algorithm. In pose estimation, we employ a deep learning model to predict seven semantic keypoints on the pedicel. These keypoints assist in the robot's path planning, minimize target contact, and facilitate the use of our specialized end effector for harvesting. In autonomous tomato harvesting experiments conducted in commercial greenhouses, our proposed robot achieved a harvesting success rate of 86.67%, with an average successful harvest time of 32.46 s, showcasing its continuous and robust harvesting capabilities. These results underscore the potential of harvesting robots to bridge the labor gap in agriculture.
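A small sketch of how detections could be grouped into trusses with an adaptive DBSCAN, where the neighbourhood radius scales with detected fruit size; the parameter values and the adaptation rule are illustrative guesses at the abstract's "detection-based adaptive" variant, not the paper's algorithm.

import numpy as np
from sklearn.cluster import DBSCAN

def group_fruits_into_trusses(centers, box_sizes, k=1.5):
    """Cluster fruit detection centers (N, d) into trusses.
    The DBSCAN radius adapts to the median detected fruit size, so the
    grouping scales with how close the camera is to the plant."""
    eps = k * float(np.median(box_sizes))
    return DBSCAN(eps=eps, min_samples=2).fit_predict(centers)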
|
|
16:30-18:00, Paper ThCT28-NT.2 | Add to My Program |
Unknown Object Grasping for Assistive Robotics |
|
Miller, Elle | University of Edinburgh |
Durner, Maximilian | German Aerospace Center DLR |
Humt, Matthias | German Aerospace Center (DLR), Technical University Munich (TUM) |
Quere, Gabriel | DLR |
Boerdijk, Wout | German Aerospace Center (DLR) |
Sundaram, Ashok M. | German Aerospace Center (DLR) |
Stulp, Freek | DLR - Deutsches Zentrum Für Luft Und Raumfahrt E.V |
Vogel, Jörn | German Aerospace Center (DLR) |
Keywords: Perception for Grasping and Manipulation, Physically Assistive Devices, Grasping
Abstract: We propose a novel pipeline for unknown object grasping in shared robotic autonomy scenarios. State-of-the-art methods for fully autonomous scenarios are typically learning-based approaches optimised for a specific end-effector that generate grasp poses directly from sensor input. In the domain of assistive robotics, we seek instead to utilise the user's cognitive abilities for enhanced satisfaction, grasping performance, and alignment with their high-level task-specific goals. Given a pair of stereo images, we perform unknown object instance segmentation and generate a 3D reconstruction of the object of interest. In shared control, the user then guides the robot end-effector across a virtual hemisphere centered around the object to their desired approach direction. A physics-based grasp planner finds the most stable local grasp on the reconstruction, and finally the user is guided by shared control to this grasp. In experiments on the DLR EDAN platform, we report a grasp success rate of 87% for 10 unknown objects and demonstrate the method's capability to grasp objects in structured clutter and from shelves.
|
|
16:30-18:00, Paper ThCT28-NT.3 | Add to My Program |
Liquids Identification and Manipulation Via Digitally Fabricated Impedance Sensors |
|
Zhu, Junyi | Massachusetts Institute of Technology |
Lee, Young Joong | Massachusetts Institute of Technology |
Luo, Yiyue | Massachusetts Institute of Technology |
Xu, Tianyu | Massachusetts Institute of Technology, Google |
Liu, Chao | Massachusetts Institute of Technology |
Rus, Daniela | MIT |
Mueller, Stefanie | MIT CSAIL |
Matusik, Wojciech | MIT |
Keywords: Perception for Grasping and Manipulation, Grippers and Other End-Effectors, Intelligent and Flexible Manufacturing
Abstract: Despite recent exponential advancements in computer vision and reinforcement learning, it remains challenging for robots to interact with liquids. These challenges are particularly pronounced due to the limitations imposed by opaque containers, transparent liquids, fine-grained splashes, and visual obstructions arising from the robot's own manipulation activities. Yet, there exists a substantial opportunity for robotics to excel in liquid identification and manipulation, given its potential role in chemical handling in laboratories and various manufacturing sectors such as pharmaceuticals or beverages. In this work, we present a novel approach for liquid class identification and state estimation leveraging electrical impedance sensing. We design and mount a digitally embroidered electrode array to a commercial robot gripper. Coupled with a customized impedance sensing board, we collect data on liquid manipulation with a swept frequency sensing mode and a frequency-specific impedance measuring mode. Our developed learning-based model achieves an accuracy of 93.33% in classifying 9 different types of liquids (8 liquids + air), and 97.65% in estimating the liquid state. We investigate the effectiveness of our system with a series of ablation studies. These findings highlight our work as a promising solution for enhancing robotic manipulation in liquid-related tasks.
|
|
16:30-18:00, Paper ThCT28-NT.4 | Add to My Program |
Learning to Grasp in Clutter with Interactive Visual Failure Prediction |
|
Murray, Michael | University of Washington |
Gupta, Abhishek | University of Washington |
Cakmak, Maya | University of Washington |
Keywords: Perception for Grasping and Manipulation, Learning from Experience, Grasping
Abstract: Modern warehouses process millions of unique objects which are often stored in densely packed containers. To automate tasks in this environment, a robot must be able to pick diverse objects from highly cluttered scenes. Real-world learning is a promising approach, but executing picks in the real world is time-consuming, can induce costly failures, and often requires extensive human intervention, which causes operational burden and limits the scope of data collection and deployments. In this work, we leverage interactive probes to visually evaluate grasps in clutter without fully executing picks, a capability we refer to as Interactive Visual Failure Prediction (IVFP). This enables autonomous verification of grasps during execution to avoid costly downstream failures as well as autonomous reward assignment, providing supervision to continuously shape and improve grasping behavior as the robot gathers experience in the real world, without constantly requiring human intervention. Through experiments on a Stretch RE1 robot, we study the effect that IVFP has on performance - both in terms of effective data throughput and success rate, and show that this approach leads to grasping policies that outperform policies trained with human supervision alone, while requiring significantly less human intervention.
|
|
16:30-18:00, Paper ThCT28-NT.5 | Add to My Program |
Kinesthetic-Based In-Hand Object Recognition with an Underactuated Robotic Hand |
|
Arolovitch, Julius | Carnegie Mellon University |
Azulay, Osher | Tel Aviv University |
Sintov, Avishai | Tel-Aviv University |
Keywords: Perception for Grasping and Manipulation, Tendon/Wire Mechanism
Abstract: Tendon-based underactuated hands are intended to be simple, compliant, and affordable. Often, they are 3D printed and do not include tactile sensors, so performing in-hand object recognition with direct touch sensing is not feasible. Adding tactile sensors can complicate the hardware and introduce extra costs to the robotic hand, and the common alternative of visual perception may not be available due to occlusions. In this paper, we explore whether kinesthetic haptics can provide indirect information regarding the geometry of a grasped object during in-hand manipulation with an underactuated hand. By solely sensing actuator positions and torques over a period of time during motion, we show that a classifier can recognize an object from a set of trained ones with a high success rate of almost 95%. The implementation of a real-time majority vote during manipulation further improves recognition. A trained classifier is also shown to be successful in distinguishing between shape categories rather than just specific objects.
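A minimal sketch of the real-time majority vote over per-window classifications; the predict interface is a hypothetical stand-in for the trained classifier described above.

from collections import Counter

def majority_vote(predict, windows):
    """predict(window) -> class label (hypothetical interface).
    Classify each sliding window of actuator positions/torques collected
    during manipulation, then return the modal label."""
    votes = [predict(w) for w in windows]
    return Counter(votes).most_common(1)[0][0]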
|
|
16:30-18:00, Paper ThCT28-NT.6 | Add to My Program |
Fit-NGP: Fitting Object Models to Neural Graphics Primitives |
|
Taher, Marwan | Imperial College London |
Alzugaray, Ignacio | Imperial College London |
Davison, Andrew J | Imperial College London |
Keywords: Perception for Grasping and Manipulation, Semantic Scene Understanding, Object Detection, Segmentation and Categorization
Abstract: Accurate 3D object pose estimation is key to enabling many robotic applications that involve challenging object interactions. In this work, we show that the density field created by a state-of-the-art, efficient radiance field reconstruction method is suitable for highly accurate and robust pose estimation of objects with known 3D models, even when they are very small and have challenging reflective surfaces. We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera, which can scan a scene from scratch and detect and estimate the 6-Degrees-of-Freedom (DoF) poses of multiple objects within a couple of minutes of operation. Small objects such as bolts and nuts are estimated with accuracy on the order of 1 mm.
|
|
16:30-18:00, Paper ThCT28-NT.7 | Add to My Program |
Efficient Object Rearrangement Via Multi-View Fusion |
|
Huang, Dehao | Southern University of Science and Technology |
Tang, Chao | Southern University of Science and Technology |
Zhang, Hong | SUSTech |
Keywords: Perception for Grasping and Manipulation, Service Robotics
Abstract: The prospect of assistive robots aiding in object organization has always been compelling. In an image-goal setting, the robot rearranges the current scene to match a single image captured from the goal scene. The key to an image-goal rearrangement system is estimating the desired placement pose of each object based on the single goal image and observations from the current scene. In order to establish sufficient associations for accurate estimation, the system should observe an object from a viewpoint similar to that in the goal image. Existing image-goal rearrangement systems, due to their reliance on a fixed viewpoint for perception, often require redundant manipulations to randomly adjust an object's pose for a better perspective. Addressing this inefficiency, we introduce a novel object rearrangement system that employs multi-view fusion. By observing the current scene from multiple viewpoints before manipulating objects, our approach can estimate a more accurate pose without redundant manipulations. A standard visual localization pipeline at the object level is developed to capitalize on the advantages of multi-view observations. Simulation results demonstrate that our system outperforms existing single-view systems in efficiency. The effectiveness of our system is further validated in a physical experiment. For videos, please visit https://sites.google.com/view/multi-view-rearr.
|
|
16:30-18:00, Paper ThCT28-NT.8 | Add to My Program |
EDOPT: Event-Camera 6-DoF Dynamic Object Pose Tracking |
|
Glover, Arren | Istituto Italiano Di Tecnologia |
Gava, Luna | University of Genova |
Li, Zhichao | Istituto Italiano Di Tecnologia |
Bartolozzi, Chiara | Istituto Italiano Di Tecnologia |
Keywords: Perception for Grasping and Manipulation, Visual Tracking, Object Detection, Segmentation and Categorization
Abstract: High-frequency, low-latency, 6-DoF object tracking is useful for grasping objects in motion, taking robots beyond pick-and-place tasks. We propose using an event-camera for tracking objects, leveraging its low-latency and continuous (i.e. not fixed-rate) data capture for high-frequency tracking. We propose the EDOPT algorithm, which maintains real-time operation under a variable event-rate (which occurs due to variation in camera velocity and scene texture) and avoids the frame-jumps and motion-blur that are problematic in traditional computer vision solutions. EDOPT uses a strong object prior, leading to a novel solution possible only with an event-camera. To our knowledge, this is the first method for 6-DoF object pose estimation using only an event-camera. The proposed method achieves comparable results to a state-of-the-art DNN technique that fuses frames, depth, and events. We demonstrate smooth, online object pose tracking with a live camera feed at >300 Hz.
|
|
16:30-18:00, Paper ThCT28-NT.9 | Add to My Program |
Attention-Based Cloth Manipulation from Model-Free Topological Representation |
|
Galassi, Kevin | Università di Bologna |
Wu, Bingbing | Naver Labs Europe |
Perez, Julien | Naver Labs Europe |
Palli, Gianluca | University of Bologna |
Renders, Jean-Michel | Naver Labs Europe |
Keywords: Perception for Grasping and Manipulation, Imitation Learning, Manipulation Planning
Abstract: The robotic manipulation of deformable objects, such as clothes and fabric, is known to be a complex task from both the perception and planning perspectives. Indeed, the stochastic nature of the underlying environment dynamics makes it an interesting research field for statistical learning approaches and neural policies. In this work, we introduce a novel attention-based neural architecture capable of solving a smoothing task for such objects by means of a single robotic arm. To train our network, we leverage an oracle policy, executed in simulation, which uses the topological description of a mesh of points to represent the object to smooth. In a second step, we transfer the resulting behavior to the real world with imitation learning, using the cloth point cloud as decision support, captured from a single RGBD camera placed egocentrically on the wrist of the arm. This approach allows fast training of the real-world manipulation neural policy while not requiring scene reconstruction at test time, but solely a point cloud acquired from a single RGBD camera. Our resulting policy first predicts the desired point to choose from the given point cloud and then the correct displacement to achieve a smoothed cloth. Experimentally, we first assess our results in a simulation environment by comparing them with an existing heuristic policy, as well as several baseline attention architectures. Then, we validate the performance of our approach in a real-world scenario.
|
|
ThCT29-NT Oral Session, NT-G5 |
Add to My Program |
Object Detection V |
|
|
Chair: Shi, Qing | Beijing Institute of Technology |
Co-Chair: Dias, Jorge | Khalifa University |
|
16:30-18:00, Paper ThCT29-NT.1 | Add to My Program |
WLST: Weak Labels Guided Self-Training for Weakly-Supervised Domain Adaptation on 3D Object Detection |
|
Tsou, Tsung Lin | National Taiwan University |
Wu, Tsung-Han | National Taiwan University |
Hsu, Winston | National Taiwan University |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Transfer Learning
Abstract: In the field of domain adaptation (DA) for 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between the UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplored yet practical task that requires only a small labeling effort on the target domain. To improve the DA performance in a cost-effective way, we propose a general weak-labels-guided self-training framework, WLST, designed for WDA on 3D object detection. By incorporating an autolabeler, which can generate 3D pseudo labels from 2D bounding boxes, into the existing self-training pipeline, our method is able to generate more robust and consistent pseudo labels that benefit the training process on the target domain. Extensive experiments demonstrate the effectiveness, robustness, and detector-agnosticism of our WLST framework. Notably, it outperforms previous state-of-the-art methods on all evaluation tasks.
|
|
16:30-18:00, Paper ThCT29-NT.2 | Add to My Program |
Towards a Robust Sensor Fusion Step for 3D Object Detection on Corrupted Data |
|
Wozniak, Maciej Kazimierz | KTH Royal Institute of Technology |
Karefjard, Viktor | KTH Royal Institute of Technology |
Thiel, Marko | Hamburg University of Technology (TUHH) |
Jensfelt, Patric | KTH - Royal Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: Multimodal sensor fusion methods for 3D object detection have been revolutionizing the autonomous driving research field. Nevertheless, most of these methods heavily rely on dense LiDAR data and accurately calibrated sensors, which is often not the case in real-world scenarios. Data from LiDAR and cameras often come misaligned due to miscalibration, decalibration, or different sensor frequencies. Additionally, some parts of the LiDAR data may be occluded and parts of the data may be missing due to hardware malfunction or weather conditions. This work presents a novel fusion step that addresses data corruptions and makes sensor fusion for 3D object detection more robust. Through extensive experiments, we demonstrate that our method performs on par with state-of-the-art approaches on normal data and outperforms them on misaligned data.
|
|
16:30-18:00, Paper ThCT29-NT.3 | Add to My Program |
TerrainSense: Vision-Driven Mapless Navigation for Unstructured Off-Road Environments |
|
Hassan, Bilal | Khalifa University, Abu Dhabi |
Sharma, Arjun | Khalifa University |
Abdel Madjid, Nadya | Khalifa University |
Khonji, Majid | Khalifa University |
Dias, Jorge | Khalifa University |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, AI-Based Methods
Abstract: Navigating autonomous vehicles efficiently across unstructured and off-road terrains remains a formidable challenge, often requiring intricate mapping or multi-step pipelines. However, these conventional approaches struggle to adapt to dynamic environments. This paper presents TerrainSense, an end-to-end framework that overcomes these limitations. By utilizing a transformer, TerrainSense detects lane semantics and topology from camera images, enabling mapless path planning without reliance on highly detailed maps. TerrainSense was rigorously assessed on six diverse datasets, evaluating its detection, segmentation, and path-prediction performance using various metrics. Notably, it outperforms other state-of-the-art methods by 9.32% in path-prediction precision, with an 18.28% faster inference time.
|
|
16:30-18:00, Paper ThCT29-NT.4 | Add to My Program |
RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection |
|
Kim, Jisong | Hanyang University |
Seong, Minjae | Hanyang University |
Bang, Geonho | Hanyang University |
Kum, Dongsuk | KAIST |
Choi, Jun Won | Seoul National University |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models could not fully utilize the potential of radar information. In this paper, we propose Radar-Camera Multi-level fusion (RCM-Fusion), which attempts to fuse both modalities at the feature and instance levels. For feature-level fusion, we propose a Radar Guided BEV Encoder which transforms camera features into precise BEV representations using the guidance of radar Bird's-Eye-View (BEV) features and combines the radar and camera BEV features. For instance-level fusion, we propose a Radar Grid Point Refinement module that reduces localization error by accounting for the characteristics of the radar point clouds. The experiments on the public nuScenes dataset demonstrate that our proposed RCM-Fusion achieves state-of-the-art performance among single frame-based radar-camera fusion methods in the nuScenes 3D object detection benchmark. The code will be made publicly available.
|
|
16:30-18:00, Paper ThCT29-NT.5 | Add to My Program |
One-Vs-All Semi-Automatic Labeling Tool for Semantic Segmentation in Autonomous Driving |
|
Jing, Gu | Expleo Germany GmbH |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Computer Vision for Automation
Abstract: Semantic image segmentation plays a pivotal role in creating High-Definition (HD) maps for autonomous driving, where every pixel in an image is assigned a label from a specific semantic class. However, obtaining dense pixel-level annotations for model training is a laborious and expensive process. Active learning holds promise as a method to reduce the human annotation effort needed for semantic segmentation. However, existing active learning methods often perform well on the majority classes but struggle with the minority classes, negatively impacting segmentation performance. To tackle this challenge, we propose a novel One-vs-All (OVA) active learning framework, known as OVAAL. This paper explains how OVAAL can shift more attention towards the minority classes and thoroughly analyzes its contributions to performance enhancement. Additionally, we introduce an OVA-based semi-supervised learning method for post-processing, referred to as OVAAL+. Our results demonstrate that both OVAAL and OVAAL+ lead to significant improvements, with mean Intersection over Union (mIoU) gains of 4.55% and 6.38%, respectively, compared to the state-of-the-art active learning method Pixelpick on the Cityscapes semantic segmentation benchmark. These improvements are achieved while maintaining an economical annotation budget of 1.44% of the training data. We foresee further research exploring the potential of OVA-based active selection to address challenges in cold start scenarios and resource-constrained training environments.
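A minimal sketch of the one-vs-all selection idea: train one binary classifier per class and query the samples whose highest OVA confidence is lowest, which tends to surface minority-class samples. The synthetic data, logistic-regression classifiers, and query budget are assumptions for illustration, not OVAAL's actual per-class heads or scoring:

    # Schematic OVA active selection on synthetic, imbalanced data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = rng.choice(3, size=500, p=[0.8, 0.15, 0.05])   # imbalanced classes

    ova = [LogisticRegression(max_iter=200).fit(X, (y == k).astype(int))
           for k in range(3)]                          # one binary head per class
    scores = np.stack([clf.predict_proba(X)[:, 1] for clf in ova], axis=1)
    uncertainty = 1.0 - scores.max(axis=1)             # low best OVA confidence
    query = np.argsort(uncertainty)[-20:]              # query the 20 most uncertain
    print("class mix of queried samples:", np.bincount(y[query], minlength=3))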
|
|
16:30-18:00, Paper ThCT29-NT.6 | Add to My Program |
LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection |
|
Song, Jingyu | University of Michigan |
Zhao, Lingjun | University of Michigan - Ann Arbor |
Skinner, Katherine | University of Michigan |
Keywords: Object Detection, Segmentation and Categorization, Sensor Fusion, Deep Learning for Visual Perception
Abstract: We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection to fill the performance gap of existing LiDAR-radar detectors. To improve the feature extraction capabilities from these two modalities, we design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.
|
|
16:30-18:00, Paper ThCT29-NT.7 | Add to My Program |
BaSAL: Size Balanced Warm Start Active Learning for LiDAR Semantic Segmentation |
|
Wei, Jiarong | Delft University of Technology |
Lin, Yancong | Delft University of Technology (TU Delft) |
Caesar, Holger | Delft University of Technology |
Keywords: Semantic Scene Understanding, Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception
Abstract: Active learning strives to reduce the need for costly data annotation by repeatedly querying an annotator to label the most informative samples from a pool of unlabeled data, and then training a model from these samples. We identify two problems with existing active learning methods for LiDAR semantic segmentation. First, they overlook the severe class imbalance inherent in LiDAR semantic segmentation datasets. Second, to bootstrap the active learning loop when there is no labeled data available, they train their initial model from randomly selected data samples, leading to low performance. This situation is referred to as the cold start problem. To address these problems, we propose BaSAL, a size-balanced warm start active learning model, based on the observation that each object class has a characteristic size. By sampling object clusters according to their size, we can thus create a size-balanced dataset that is also more class-balanced. Furthermore, in contrast to existing information measures like entropy or CoreSet, size-based sampling does not require a pretrained model, thus addressing the cold start problem effectively. Results show that we are able to improve the performance of the initial model by a large margin. Combining warm start and size-balanced sampling with established information measures, our approach achieves comparable performance to training on the entire SemanticKITTI dataset, despite using only 5% of the annotations, outperforming existing active learning methods. We also match the existing state-of-the-art in active learning on nuScenes.
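The core size-balanced sampling step might be sketched as follows, assuming DBSCAN clustering of the unlabeled points and hand-picked size-bin edges; none of these constants come from the paper:

    # Toy size-balanced cluster selection; clustering parameters and bin edges
    # are illustrative guesses.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(1)
    points = np.vstack([rng.normal(c, 0.2, size=(n, 3))                # toy scene:
                        for c, n in [(0, 40), (5, 400), (10, 1000)]])  # three blobs

    labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(points)
    clusters = [np.where(labels == k)[0] for k in set(labels) if k != -1]
    sizes = np.array([len(c) for c in clusters])

    bins = np.digitize(sizes, [100, 600])        # small / medium / large clusters
    queries = []
    for b in range(3):                           # spend the budget evenly per bin
        pool = [i for i, bb in enumerate(bins) if bb == b]
        if pool:
            queries.append(int(rng.choice(pool)))
    print("clusters chosen for annotation:", queries)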
|
|
16:30-18:00, Paper ThCT29-NT.8 | Add to My Program |
Trajectory-Prediction-Based Dynamic Tracking of a UGV to a Moving Target under Multi-Disturbed Conditions |
|
Si, Jinge | Beijing Institute of Technology |
Li, Bin | Beijing Institute of Technology |
Xu, Yongkang | Beijing Institute of Technology |
Wang, Liang | Beijing Institute of Technology |
Deng, ChenCheng | Beijing Institute of Technology |
Wang, Shoukun | Beijing Institute of Technology |
Wang, Junzheng | Beijing Institute of Technology |
Keywords: Robust/Adaptive Control, Wheeled Robots, Motion Control
Abstract: Tracking dynamic targets poses a significant challenge for Unmanned Ground Vehicles (UGVs). Existing methods rarely address multi-disturbed conditions. To address this issue, we propose a trajectory-prediction-based dynamic tracking scheme, which includes target localization, trajectory prediction, and UGV control. Firstly, an estimation algorithm based on the Extended Kalman Filter (EKF) is employed to mitigate noise and accurately estimate the absolute states of the target. To enhance robustness, we present an Adaptive Trajectory Prediction (ATP) algorithm based on prediction anchors, in which a quantization standard for trajectory disturbance is designed for adaptive control. Subsequently, we iteratively solve prediction anchor points based on two motion models to robustly predict the target trajectory even in the presence of unknown disturbances. Finally, Linear Time-Varying Model Predictive Control (LTV-MPC) is utilized in the UGV controller for dynamic tracking. Experimental results demonstrate that the ATP exhibits superior prediction robustness and accuracy in perturbed environments compared to other prediction algorithms. In addition, the proposed scheme effectively achieves dynamic tracking of an Unmanned Aerial Vehicle (UAV) by the UGV under multi-disturbed conditions. Specifically, when the target moves at a speed of 1.0 m/s, the UGV can maintain a tracking error within 0.346 m.
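For the EKF-based estimation step, a minimal sketch with an assumed 2D constant-velocity model and position-only measurements is shown below; with this linear model the EKF reduces to the standard Kalman filter equations, and the paper's actual state vector and noise settings are not specified here:

    # Kalman-filter target-state estimation under a constant-velocity model.
    import numpy as np

    dt = 0.1
    F = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])    # state: [x, y, vx, vy]
    H = np.hstack([np.eye(2), np.zeros((2, 2))])     # position-only measurement
    Q, R = 0.01 * np.eye(4), 0.05 * np.eye(2)        # illustrative noise levels

    x, P = np.zeros(4), np.eye(4)
    for z in [np.array([0.1, 0.0]), np.array([0.2, 0.01])]:   # toy UAV fixes
        x, P = F @ x, F @ P @ F.T + Q                # predict
        S = H @ P @ H.T + R                          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
        x = x + K @ (z - H @ x)                      # update state
        P = (np.eye(4) - K @ H) @ P                  # update covariance
    print("estimated target state:", x.round(3))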
|
|
ThCT30-NT Oral Session, NT-G6 |
Add to My Program |
AI Robotics |
|
|
Co-Chair: Görner, Michael | University of Hamburg |
|
16:30-18:00, Paper ThCT30-NT.1 | Add to My Program |
Sim-To-Real Robotic Sketching Using Behavior Cloning and Reinforcement Learning |
|
Jia, Biao | University of Maryland at College Park |
Manocha, Dinesh | University of Maryland |
Keywords: Art and Entertainment Robotics, AI-Enabled Robotics
Abstract: Robotic sketching in real-world scenarios poses a challenging problem with diverse applications in art, robotics, and digital design. This paper introduces an approach aimed at bridging the gap between simulated and real-world robotic sketching through the integration of behavior cloning and reinforcement learning techniques. Our approach trains painting policies that operate effectively in both virtual environments and real-world robotic sketching systems. We have implemented a robotic sketching system featuring an UltraArm robot equipped with a RealSense D415 camera, closely emulating the MyPaint virtual environment. Our system can perceive its environment and adapt painting policies to natural painting media. Our results highlight the effectiveness of our agent in terms of acquiring policies for high-dimensional continuous action spaces, enabling the seamless transfer of brush manipulation techniques from simulation to practical robotic sketching. Furthermore, we demonstrate our robotic sketching system's capability to generate complex images and strokes using various configurations.
|
|
16:30-18:00, Paper ThCT30-NT.2 | Add to My Program |
Safe Table Tennis Swing Stroke with Low-Cost Hardware |
|
Cursi, Francesco | Imperial College London |
Kalander, Marcus | Huawei Technologies |
Wu, Shuang | Huawei |
Xue, Xidi | Huawei |
Tian, Yu | The Chinese University of Hong Kong |
Tian, Guangjian | Huawei |
Quan, Xingyue | Huawei |
Hao, Jianye | Noah's Ark Lab |
Keywords: Art and Entertainment Robotics, Constrained Motion Planning, Reinforcement Learning
Abstract: Playing table tennis with a human player is a challenging robotic task due to its dynamic nature. Despite considerable research devoted to developing robotic table tennis systems, most works have demanding hardware requirements and ignore safety measures when generating the swing stroke. To address these issues, we propose a safe motion planning framework that fully exploits the robotic hardware's performance limits to play table tennis. In particular, we propose a pipeline to generate manipulator joint trajectories under environmental safety constraints and scale the trajectories to satisfy joint movement limitations. We use three different agents to validate the planning algorithm with our handmade robot platform in both simulation and real-world environments.
|
|
16:30-18:00, Paper ThCT30-NT.3 | Add to My Program |
Pluck and Play: Self-Supervised Exploration of Chordophones for Robotic Playing |
|
Görner, Michael | University of Hamburg |
Hendrich, Norman | University of Hamburg |
Zhang, Jianwei | University of Hamburg |
Keywords: Art and Entertainment Robotics, Representation Learning, Incremental Learning
Abstract: Existing robotic musicians utilize detailed handcrafted instrument models to generate or learn policies for playing because model-free or inaccurate policy rollouts might easily damage or wear out fragile instruments. We introduce an approach to characterize geometric models of chordophones and their audio onset responses directly through audio-tactile exploration with a physical robot arm. Initially, the system refines prior estimates of string positions, provided by kinesthetic teaching or visual estimation, through repeated attempts to pluck individual strings. A subsequent stage implements a Safe Active Exploration paradigm based on Gaussian Processes to explore and characterize the audio onset response of feasible plucking motions while minimizing invalid attempts. The resulting models can be used to actuate an imprecise robotic arm to play sequences of notes with varying loudness on a Chinese Guzheng.
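The Safe Active Exploration stage might be sketched roughly as follows: fit a Gaussian Process to the observed onset responses and pick the most uncertain candidate whose lower confidence bound still looks valid. The 1-D plucking parameter, the validity threshold, and the toy response function are assumptions for illustration, not the paper's setup:

    # GP-based safe active exploration of a single plucking parameter.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    def pluck(depth):                      # hypothetical audio-onset response
        return float(depth > 0.3) * depth + 0.02 * rng.normal()

    X = np.array([[0.2], [0.5]])           # seed attempts (e.g., from teaching)
    y = np.array([pluck(x[0]) for x in X])
    candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-3).fit(X, y)
        mu, std = gp.predict(candidates, return_std=True)
        safe = mu - std > -0.05            # lower confidence bound stays valid
        if not safe.any():
            break
        pick = candidates[safe][np.argmax(std[safe])]  # most informative safe try
        X = np.vstack([X, [pick]])
        y = np.append(y, pluck(pick[0]))
    print("explored depths:", X.ravel().round(2))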
|
|
16:30-18:00, Paper ThCT30-NT.4 | Add to My Program |
MBot: A Modular Ecosystem for Scalable Robotics Education |
|
Gaskell, Peter | University of Michigan |
Pavlasek, Jana | University of Michigan |
Gao, Tom | University of Michigan |
Narula, Abhishek | University of Michigan |
Lewis, Stanley | University of Michigan |
Jenkins, Odest Chadwicke | University of Michigan |
Keywords: Education Robotics, Hardware-Software Integration in Robotics
Abstract: The Michigan Robotics MBot is a low-cost mobile robot platform that has been used to train over 1,400 students in autonomous navigation since 2014 at the University of Michigan and our collaborating colleges. The MBot platform was designed to meet the needs of teaching robotics at scale to match the growth of robotics as a field and an academic discipline, spanning all levels of undergraduate and graduate experiences. Transformative advancements in robot navigation over the past decades have led to a significant demand for skilled roboticists across industry and academia. This demand has sparked a need for robotics courses in higher education. Incorporating real robot platforms into such courses and curricula is effective for sparking student motivation and conveying the unique challenges of programming embodied agents in real-world environments. However, teaching with real robots remains challenging due to the cost of hardware and the development effort involved in adapting existing hardware for a new course. In this paper, we describe the design and evolution of the MBot platform, and the keys to its success in terms of scalability and flexibility.
|
|
16:30-18:00, Paper ThCT30-NT.5 | Add to My Program |
SO(2)-Equivariant Downwash Models for Close Proximity Flight |
|
Smith, Henry | University of Cambridge |
Shankar, Ajay | University of Cambridge, UK |
Gielis, Jennifer | University of Cambridge |
Blumenkamp, Jan | University of Cambridge |
Prorok, Amanda | University of Cambridge |
Keywords: Machine Learning for Robot Control, Aerial Systems: Applications, Multi-Robot Systems
Abstract: Multirotors flying in close proximity induce aerodynamic wake effects on each other through propeller downwash. Conventional methods have fallen short of providing adequate 3D force-based models that can be incorporated into robust control paradigms for deploying dense formations. Thus, learning a model for these downwash patterns presents an attractive solution. In this paper, we present a novel learning-based approach for modelling the downwash forces that exploits the latent geometries (i.e. symmetries) present in the problem. We demonstrate that when trained with only 5 minutes of real world flight data, our geometry-aware model outperforms state-of-the-art baseline models trained with more than 15 minutes of data. In dense real-world flights with two vehicles, deploying our model online improves 3D trajectory tracking by nearly 36% on average (and vertical tracking by 56%).
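The SO(2) symmetry the paper exploits can be illustrated in a few lines: canonicalize the relative position by a yaw rotation, predict in that frame, and rotate the force back, which makes the predictor equivariant by construction. The per-frame predictor here is a hypothetical stand-in for the learned downwash model:

    # Constructing an SO(2)-equivariant force predictor by canonicalization.
    import numpy as np

    def yaw_rot(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def canonical_predictor(p):            # toy downwash: downward, decaying
        r = np.hypot(p[0], p[1])
        return np.array([0.0, 0.0, -np.exp(-r - abs(p[2]))])

    def downwash_force(rel_pos):
        theta = np.arctan2(rel_pos[1], rel_pos[0])
        aligned = yaw_rot(-theta) @ rel_pos            # canonicalize about z
        return yaw_rot(theta) @ canonical_predictor(aligned)

    # equivariance check: rotating the input rotates the output identically
    p, R = np.array([0.5, 0.2, -0.3]), yaw_rot(1.1)
    print(np.allclose(downwash_force(R @ p), R @ downwash_force(p)))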
|
|
16:30-18:00, Paper ThCT30-NT.6 | Add to My Program |
Hierarchical Meta-Learning-Based Adaptive Controller |
|
Xie, Fengze | California Institute of Technology |
Shi, Guanya | Carnegie Mellon University |
O'Connell, Michael | California Institute of Technology |
Yue, Yisong | California Institute of Technology |
Chung, Soon-Jo | Caltech |
Keywords: Machine Learning for Robot Control, Aerial Systems: Mechanics and Control, Robust/Adaptive Control
Abstract: We study how to design learning-based adaptive controllers that enable fast and accurate online adaptation in changing environments. In these settings, learning is typically done during an initial (offline) design phase, where the vehicle is exposed to different environmental conditions and disturbances (e.g., a drone exposed to different winds) to collect training data. Our work is motivated by the observation that real-world disturbances fall into two categories: 1) those that can be directly monitored or controlled during training, which we call "manageable"; and 2) those that cannot be directly measured or controlled (e.g., nominal model mismatch, air plate effects, and unpredictable wind), which we call "latent". Imprecise modeling of these effects can result in degraded control performance, particularly when latent disturbances continuously vary. This paper presents the Hierarchical Meta-learning-based Adaptive Controller (HMAC) to learn and adapt to such multi-source disturbances. Within HMAC, we develop two techniques: 1) Hierarchical Iterative Learning, which jointly trains representations to capture the various sources of disturbances, and 2) Smoothed Streaming Meta-Learning, which learns to capture the evolving structure of latent disturbances over time (in addition to standard meta-learning on the manageable disturbances). Experimental results demonstrate that HMAC exhibits more precise and rapid adaptation to multi-source disturbances than other adaptive controllers.
|
|
16:30-18:00, Paper ThCT30-NT.7 | Add to My Program |
A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching |
|
Long, Xianlei | Chongqing University |
Zhao, Hui | College of Computer Science, China University of Geoscience |
Chen, Chao | Chongqing University |
Gu, Fuqiang | Chongqing University |
Gu, Qingyi | Institute of Automation, Chinese Academy of Sciences |
Keywords: Surveillance Robotic Systems, Hardware-Software Integration in Robotics, Computer Vision for Transportation
Abstract: In recent years, wide-area visual detection systems have been widely applied in various industrial and security sectors. These systems, however, face significant challenges when implementing multi-object detection due to conflicts arising from the need for high-resolution imaging, efficient object searching, and accurate localization. To address these challenges, this paper presents a hybrid system that incorporates a wide-angle camera, a high-speed search camera, and a galvano-mirror. In this system, the wide-angle camera offers panoramic images as prior information, which helps the search camera capture detailed images of the targeted objects. This integrated approach enhances the overall efficiency and effectiveness of wide-area visual detection systems. Specifically, in this study, we introduce a wide-angle camera-based method to generate a panoramic probability map (PPM) for estimating high-probability regions of target object presence. Then, we propose a probability searching module that uses the PPM-generated prior information to dynamically adjust the sampling range and refine target coordinates based on the uncertainty variance computed by the object detector. Finally, the integration of the PPM and the probability searching module yields an efficient hybrid vision system capable of 120 fps multi-object search and detection. Extensive experiments are conducted to verify the system's effectiveness and robustness.
|
|
ThCT31-NT Oral Session, NT-G7 |
Add to My Program |
Autonomous Vehicle Navigation III |
|
|
Chair: Yang, Ming | Shanghai Jiao Tong University |
Co-Chair: Song, Ran | Shandong University |
|
16:30-18:00, Paper ThCT31-NT.1 | Add to My Program |
Implicit Point Function for LiDAR Super-Resolution in Autonomous Driving |
|
Park, Minseong | Yonsei University |
Son, Haengseon | Korea Electronics Technology Institute |
Kim, Euntai | Yonsei University |
Keywords: Autonomous Vehicle Navigation, Deep Learning for Visual Perception
Abstract: LiDAR super-resolution is a relatively new problem in which we seek to fill in the blanks between measured points when a low-resolution LiDAR is given, producing a high-resolution or even resolution-free LiDAR. Recently, several research works have been reported regarding LiDAR super-resolution. However, most works on LiDAR super-resolution have the drawback that they first transform the 3D LiDAR point cloud into a 2D depth map and upsample the LiDAR output by applying image super-resolution methods, ignoring the 3D geometric information of the point cloud obtained from a LiDAR. To solve the above problem, we propose a new deep learning network named implicit point function (IPF). The basic idea of IPF is that, given a low-resolution point cloud and a query ray, we generate 3D target point embeddings on the query ray using on-the-ray positional embedding and local features, preserving the 3D geometric information of the given point cloud. Then, we aggregate them into one target point via the attention mechanism. IPF enables us to learn a continuous representation of 3D space from low-resolution LiDAR and upsample a small number of layers to any number that we want. Finally, our IPF is applied to a large-scale synthetic dataset and a real dataset, and its validity is demonstrated by comparison with previous methods.
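A tiny numpy sketch of the aggregation idea: embed candidate points along the query ray with an on-the-ray positional encoding and pool them with softmax attention into one target-point feature. The dimensions, the sinusoidal encoding, and the random "learned" query are illustrative assumptions, not the paper's architecture:

    # Attention pooling of on-the-ray point embeddings into one target point.
    import numpy as np

    def pos_embed(t, dim=8):                # on-the-ray positional encoding
        freqs = 2.0 ** np.arange(dim // 2)
        return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

    rng = np.random.default_rng(0)
    ray_ts = np.array([0.2, 0.5, 0.9])      # depths of nearby points on the ray
    feats = rng.normal(size=(3, 8))         # local features of those points
    tokens = feats + np.stack([pos_embed(t) for t in ray_ts])

    query = rng.normal(size=8)              # stands in for a learned query
    logits = tokens @ query / np.sqrt(8)
    attn = np.exp(logits - logits.max()); attn /= attn.sum()   # softmax weights
    target_feature = attn @ tokens          # one upsampled point embedding
    print(target_feature.round(3))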
|
|
16:30-18:00, Paper ThCT31-NT.2 | Add to My Program |
Circular Accessible Depth: A Robust Traversability Representation for UGV Navigation |
|
Xie, Shikuan | Shandong University |
Song, Ran | Shandong University |
Zhao, Yuenan | Shandong University |
Huang, Xueqin | Shandong University |
Li, Yibin | Shandong University |
Zhang, Wei | Shandong University |
Keywords: Autonomous Vehicle Navigation, Deep Learning in Robotics and Automation, AI-Based Methods, Traversability Representation
Abstract: In this paper, we present the Circular Accessible Depth (CAD), a robust traversability representation for an unmanned ground vehicle (UGV) to learn traversability in various scenarios containing irregular obstacles. To predict CAD, we propose a neural network, namely CADNet, with an attention-based multi-frame point cloud fusion module, the Stability-Attention Module (SAM), to encode spatial features from point clouds captured by LiDAR. CAD is designed based on the polar coordinate system and focuses on predicting the border of the traversable area. CAD encodes the spatial information of the surrounding environment, which enables semi-supervised learning of CADNet and thus desirably avoids annotating a large amount of data. Extensive experiments demonstrate that CAD outperforms baselines in terms of robustness and precision. We also implement our method on a real UGV and show that it performs well in real-world scenarios.
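The polar border representation itself can be sketched geometrically as follows: bin obstacle points into angular sectors around the UGV and keep the nearest obstacle range per sector. The sector count, sensing range, and the naive height-based obstacle test are illustrative assumptions, unrelated to CADNet's learned prediction:

    # Toy polar "accessible depth" border from a point cloud.
    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(-10, 10, size=(2000, 3))     # toy LiDAR points (x, y, z)
    obstacles = pts[pts[:, 2] > 0.3]               # naive height-based test

    n_sectors = 72                                  # 5-degree angular bins
    theta = np.arctan2(obstacles[:, 1], obstacles[:, 0])
    r = np.hypot(obstacles[:, 0], obstacles[:, 1])
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors

    border = np.full(n_sectors, 10.0)               # fall back to max range
    for s, rr in zip(sector, r):
        border[s] = min(border[s], rr)              # nearest obstacle per sector
    print("accessible depth per sector:", border.round(1)[:8], "...")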
|
|
16:30-18:00, Paper ThCT31-NT.3 | Add to My Program |
Robots That Can See: Leveraging Human Pose for Trajectory Prediction |
|
Salzmann, Tim | Technical University Munich |
Chiang, Hao-Tien | Google Deepmind |
Ryll, Markus | Technical University Munich |
Sadigh, Dorsa | Stanford University |
Parada, Carolina | Google |
Bewley, Alex | Google |
Keywords: Autonomous Vehicle Navigation, Deep Learning Methods, Human-Aware Motion Planning
Abstract: Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer-based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty of future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.
|
|
16:30-18:00, Paper ThCT31-NT.4 | Add to My Program |
Uncertainty-Aware Reinforcement Learning for Autonomous Driving with Multimodal Digital Driver Guidance |
|
Huang, Wenhui | NanYang Technological University |
Shan, Zitong | Jilin University |
Lou, Shanhe | Nanyang Technological University |
Lv, Chen | Nanyang Technological University |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Human Factors and Human-in-the-Loop
Abstract: While existing Learning-from-Intervention (LfI) methods within the human-in-the-loop reinforcement learning (HiL-RL) paradigm mainly operate on the assumption that human policies are homogeneous and deterministic with low variance, natural human driving behaviors are multimodal with intrinsic uncertainties; hence, accommodating diverse human capabilities is significant for practical applications. This work proposes an enhanced LfI approach for learning the optimal RL policy by leveraging multimodal human behaviors in the setting of N-driver concurrent interventions. Specifically, we first learn N human digital drivers from a multi-human demonstration dataset, wherein each driver possesses its own policy distribution. Then, the post-trained drivers are kept in the training loop of the RL algorithms, providing diverse driving guidance whenever intervention is required. Additionally, to better utilize the provided guidance, we augment the RL architecture and optimization objectives to facilitate the proposed uncertainty-aware reinforcement learning (UnaRL) algorithm. The proposed approach, which won 2nd place in the Alibaba Future Car Innovation Challenge 2022, is solidly compared in two challenging autonomous driving scenarios against state-of-the-art (SOTA) LfI baselines, and results of both simulation and real-world experiments confirm the superiority of our method in terms of learning robustness and driving performance. Videos and source code are provided.
|
|
16:30-18:00, Paper ThCT31-NT.5 | Add to My Program |
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills |
|
Li, Zenan | Tsinghua University |
Nie, Fan | Shanghai Jiao Tong University |
Sun, Qiao | Shanghai QiZhi Institute |
Da, Fang | QCraft |
Zhao, Hang | Tsinghua University |
Keywords: Autonomous Vehicle Navigation, Reinforcement Learning, Integrated Planning and Learning
Abstract: Vehicle planning is receiving increasing attention with the emergence of diverse driving simulators and large-scale driving datasets. While offline reinforcement learning (RL) is well suited for these safety-critical tasks, it still struggles to plan over extended periods. In this work, we present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge. Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations. To mitigate the posterior collapse problem, we introduce a two-branch sequence encoder to capture both discrete options and continuous variations of the complex driving skills. The final policy treats learned skills as actions and can be trained by any off-the-shelf offline RL algorithm. This facilitates a shift in focus from per-step actions to temporally extended skills, thereby enabling long-term reasoning into the future. Extensive results on CARLA demonstrate that our model consistently outperforms strong baselines in both training and new scenarios. Additional visualizations and experiments demonstrate the interpretability and transferability of extracted skills.
|
|
16:30-18:00, Paper ThCT31-NT.6 | Add to My Program |
A Framework for Real-Time Generation of Multi-Directional Traversability Maps in Unstructured Environments |
|
Huang, Tao | Chongqing University |
Wang, Gang | Chongqing University |
Liu, Hongliang | Chongqing University |
Luo, Jun | Chongqing University |
Wu, Lang | Huazhong University of Science and Technology |
Zhu, Tao | Chongqing University |
Pu, Huayan | Shanghai University |
Luo, Jun | Chongqing University |
Wang, Shuxin | Tianjin University |
Keywords: Foundations of Automation, Autonomous Vehicle Navigation, Task Planning
Abstract: In complex unstructured environments, accurate terrain traversability analysis is a fundamental requirement for the successful execution of any movements of ground robots, especially given that terrain traversability often exhibits anisotropy. However, the difficulty in obtaining multi-directional terrain labels hinders the emergence of end-to-end multi-directional traversability networks. This paper introduces a framework for real-time generation of multi-directional traversability maps (MTraMap) tailored for unstructured environments. It involves pre-training a uni-directional traversability classifier, termed UniTraT, through self-supervised learning using ground robot travel simulation. Furthermore, it employs Uni-directional to Multi-directional Traversability Distillation (UMTraDistill) to distill a multi-directional traversability network, termed MultiTCNN, which is capable of directly generating MTraMap. We evaluated both networks on our traversability dataset, achieving 89% accuracy in terrain traversability classification with UniTraT. Compared to UniTraT, the accuracy of the MultiTCNN distilled via UMTraDistill decreases by only 1.8%, and it can process a 10 m × 10 m elevation map at a speed of 74 fps. Field robotics experiments were also conducted and showed that MultiTCNN can generate an MTraMap of the surrounding 20 m × 20 m environment at a rate of 9.39 fps, a slight reduction of 0.61 fps compared to the lidar data publishing rate, and the generated MTraMap can clearly delineate the multi-directional traversability of the surrounding environments.
|
|
16:30-18:00, Paper ThCT31-NT.7 | Add to My Program |
Cross-Modal Registration Using Adaptive Modeling in Infrastructure-Based Vehicle Localization |
|
Wang, Fei | Shanghai JiaoTong University |
He, Yuesheng | Shanghai Jiao Tong University |
Zhuang, Hanyang | Shanghai Jiao Tong University |
Yang, Chenxi | Shanghai Jiao Tong University |
Yang, Ming | Shanghai Jiao Tong University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle Navigation
Abstract: Infrastructure-based vehicle localization, in comparison to single-agent approaches, offers several advantages including reduced system cost, extended perception range, enhanced data fusion capabilities, and energy savings. Many conventional approaches impose limitations on the types of objects due to the need for specific object-end modifications, such as applying perceptual markers like color-labeled plates and reflective balls. LiDAR presents a solution in terms of object arbitrariness, as it addresses the challenges of feature-free object modeling and continuous registration. However, achieving complete environmental coverage with LiDAR remains prohibitively expensive, particularly in extensive areas. Hence, this study proposes a cross-modal localization approach using adaptive modeling, employing LiDAR for object modeling and cost-effective cameras for object tracking through image-point-cloud registration. Accurate correspondence between the model and observation can be estimated in real-time. The experiments are conducted in a typical scenario that requires adaptive modeling: Autonomous Valet Parking (AVP). Results demonstrate that the proposed system achieves comparable performance with significantly reduced system costs, highlighting its potential for large-scale deployment.
|
|
16:30-18:00, Paper ThCT31-NT.8 | Add to My Program |
Efficient Gas Source Active Search in Unfamiliar Environments |
|
Zhai, Yu | China University of Mining and Technology |
Miao, Yanzi | China University of Mining and Technology |
Keywords: Service Robotics, Autonomous Vehicle Navigation
Abstract: Actively and efficiently searching for gas sources in unknown hazardous environments is an important but challenging problem. Using mobile robots to autonomously search for and navigate to the gas source location provides a promising way forward. Existing methods are mostly based on a modularized framework that investigates the gas-source search and robot navigation tasks independently, leading to a decoupled approach that results in higher collision risks and lower navigation efficiency. Moreover, existing robot navigation techniques grapple with the intricacies of navigating through unknown environments. To tackle these complexities, we introduce an integrated framework that merges gas source localization with robot navigation. This unified structure, underpinned by an end-to-end learning approach, resolves the inherent conflicts between gas exploration and collision avoidance. Our approach aggregates the local observations (raw 3D-LiDAR data) and the expert guidance information (gas distribution), and directly generates navigation actions by implementing reinforcement learning with a novel reward function based on region-dynamic guidance, thus effectively addressing the challenges of active gas source searching in unknown environments. Simulation results underscore the adaptability of our method to diverse unknown environments, along with its superior gas source searching capabilities compared to conventional approaches. Finally, we conduct real-world experiments to demonstrate the feasibility of our approach.
|
|
16:30-18:00, Paper ThCT31-NT.9 | Add to My Program |
RGBD-Based Image Goal Navigation with Pose Drift: A Topo-Metric Graph Based Approach |
|
Ye, Shuhao | Zhejiang University |
Cui, Yuxiang | Zhejiang University |
Sha, Hao | Zhejiang University |
Lu, Sha | Zhejiang University |
Zhang, Yu | Zhejiang University |
Xiong, Rong | Zhejiang University |
Wang, Yue | Zhejiang University |
Keywords: Service Robotics, Autonomous Vehicle Navigation, Domestic Robotics
Abstract: Image-goal navigation in unknown environments with sensor error is of considerable difficulty for autonomous robots. In this paper, we propose a drift-resisting topo-metric graph to map the environment and localize the robot using only relative poses. The error-sharing mechanism under this representation effectively reduces the impact of accumulated drifts commonly encountered in navigation tasks. A reinforcement-learning-based policy is proposed for sub-goal selection on this topo-metric graph, which improves navigation efficiency by handling task-driven features that take both image correlation and topological layout into account. We adopt a modular system design with this map representation and graph policy, leaving the low-level motion planning problems to classical controllers for better stability and generalizability. Experimental results demonstrate that our method achieves robust navigation performance in a variety of unknown environments and up to a 50% higher success rate than existing methods in complex environments with odometry drift.
|
|
ThCT32-NT Oral Session, NT-G8 |
Add to My Program |
Image-Based Navigation II |
|
|
Chair: Zhao, Hao | Tsinghua University |
Co-Chair: Yu, Hongkai | Cleveland State University |
|
16:30-18:00, Paper ThCT32-NT.1 | Add to My Program |
MonoOcc: Digging into Monocular Semantic Occupancy Prediction |
|
Zheng, Yupeng | School of Artificial Intelligence, University of Chinese Academy of Sciences |
Li, Xiang | Department of Computer Science and Technology, Tsinghua University |
Li, Pengfei | Institute for AI Industry Research (AIR), Tsinghua University |
Zheng, Yuhang | Beihang University |
Jin, Bu | Institute of Automation, Chinese Academy of Sciences |
Zhong, Chengliang | Tsinghua University |
Long, Xiaoxiao | The University of Hong Kong |
Zhao, Hao | Tsinghua University |
Zhang, Qichao | Institute of Automation, Chinese Academy of Sciences |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Learning
Abstract: Monocular Semantic Occupancy Prediction aims to infer the complete 3D geometry and semantic information of scenes from only 2D images. It has garnered significant attention, particularly due to its potential to enhance the 3D perception of autonomous vehicles. However, existing methods rely on a complex cascaded framework with relatively limited information to restore 3D scenes, including a dependency on supervision solely on the whole network's output, single-frame input, and the utilization of a small backbone. These challenges, in turn, hinder the optimization of the framework and yield inferior prediction results, particularly concerning smaller and long-tailed objects. To address these issues, we propose MonoOcc. In particular, we (i) improve the monocular occupancy prediction framework by proposing an auxiliary semantic loss as supervision to the shallow layers of the framework and an image-conditioned cross-attention module to refine voxel features with visual clues, and (ii) employ a distillation module that transfers temporal information and richer knowledge from a larger image backbone to the monocular semantic occupancy prediction framework with low hardware cost. With these advantages, our method yields state-of-the-art performance on the camera-based SemanticKITTI Scene Completion benchmark. Codes and models can be accessed at https://github.com/ucaszyp/MonoOcc.
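The distillation module's general pattern, transferring features from a larger frozen backbone to a smaller trainable one via a projection layer, might look roughly like the sketch below; the stand-in conv stems, the 1x1 projector, and the plain MSE loss are assumptions, not MonoOcc's actual design:

    # Schematic feature distillation from a frozen teacher to a student.
    import torch
    import torch.nn as nn

    student = nn.Conv2d(3, 32, 3, padding=1)       # small trainable backbone stem
    teacher = nn.Conv2d(3, 128, 3, padding=1)      # larger, frozen backbone stem
    proj = nn.Conv2d(32, 128, 1)                   # match channel widths
    for p in teacher.parameters():
        p.requires_grad_(False)

    img = torch.randn(2, 3, 64, 64)
    loss = nn.functional.mse_loss(proj(student(img)), teacher(img))
    loss.backward()                                # trains student and projector
    print(float(loss))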
|
|
16:30-18:00, Paper ThCT32-NT.2 | Add to My Program |
ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking |
|
Sadjadpour, Tara | Stanford University |
Li, Jie | Toyota Research Institute |
Ambrus, Rares | Toyota Research Institute |
Bohg, Jeannette | Stanford University |
Keywords: Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Tracking
Abstract: Multi-object tracking (MOT) is a cornerstone capability of any robotic system. Tracking quality is largely dependent on the quality of input detections. In many applications, such as autonomous driving, it is preferable to over-detect objects to avoid catastrophic outcomes due to missed detections. As a result, current state-of-the-art 3D detectors produce high rates of false-positives to ensure a low number of false-negatives. This can negatively affect tracking by making data association and track lifecycle management more challenging. Additionally, occasional false-negative detections due to difficult scenarios like occlusions can harm tracking performance. To address these issues in a unified framework, we propose ShaSTA, which learns shape and spatio-temporal affinities between tracks and detections in consecutive frames. The affinity is a probabilistic matching that leads to robust data association, track lifecycle management, false-positive elimination, false-negative propagation, and sequential track confidence refinement. We offer the first self-contained framework that addresses all aspects of the 3D MOT problem. We quantitatively evaluate ShaSTA on the nuScenes tracking benchmark with 5 metrics, including the most common tracking accuracy metric, AMOTA, to demonstrate how ShaSTA may impact the ultimate goal of an autonomous mobile agent. ShaSTA achieves 1st place amongst LiDAR-only trackers that use CenterPoint detections. The open-source code for reproducing our results is publicly available.
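The affinity-then-assignment step at the heart of such trackers can be sketched as below, using a simple distance kernel in place of ShaSTA's learned shape and spatio-temporal affinity, and a single threshold in place of its lifecycle management:

    # Affinity scoring plus one-to-one track-detection assignment.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    tracks = np.array([[0.0, 0.0], [5.0, 5.0]])        # previous-frame centers
    dets = np.array([[0.2, -0.1], [5.1, 5.2], [9.0, 0.0]])

    d = np.linalg.norm(tracks[:, None] - dets[None], axis=-1)
    affinity = np.exp(-d)                              # pseudo-probabilistic score
    rows, cols = linear_sum_assignment(-affinity)      # maximize total affinity
    for t, j in zip(rows, cols):
        if affinity[t, j] > 0.1:                       # gate out weak matches
            print(f"track {t} <- detection {j} (affinity {affinity[t, j]:.2f})")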
|
|
16:30-18:00, Paper ThCT32-NT.3 | Add to My Program |
Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources |
|
Li, Jinlong | Cleveland State University |
Li, Baolu | Cleveland State University |
Liu, Xinyu | Cleveland State University |
Xu, Runsheng | UCLA |
Ma, Jiaqi | University of California, Los Angeles |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems
Abstract: The diverse agents in multi-agent perception systems may be from different companies. Each company might use an identical classic neural network architecture as the encoder for feature extraction. However, the data source used to train the various agents is independent and private to each company, leading to a Distribution Gap between the different private datasets used for training distinct agents in a multi-agent perception system. The data silos created by the above Distribution Gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the above Distribution Gap in multi-agent perception. FDA comprises two key components: a Learnable Feature Compensation Module and a Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point-cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems. The code is available at https://github.com/jinlong17/BDS-V2V.
|
|
16:30-18:00, Paper ThCT32-NT.4 | Add to My Program |
AdvGPS: Adversarial GPS for Multi-Agent Perception Attack |
|
Li, Jinlong | Cleveland State University |
Li, Baolu | Cleveland State University |
Liu, Xinyu | Cleveland State University |
Fang, Jianwu | Xian Jiaotong University |
Juefei-Xu, Felix | Meta AI |
Guo, Qing | Agency for Science, Technology and Research (A*STAR) |
Yu, Hongkai | Cleveland State University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Autonomous Agents
Abstract: The multi-agent perception system collects visual data from sensors located on various agents and leverages their relative poses determined by GPS signals to effectively fuse information, mitigating the limitations of single-agent sensing, such as occlusion. However, the precision of GPS signals can be influenced by a range of factors, including wireless transmission and obstructions like buildings. Given the pivotal role of GPS signals in perception fusion and the potential for various interference, it becomes imperative to investigate whether specific GPS signals can easily mislead the multi-agent perception system. To address this concern, we frame the task as an adversarial attack challenge and introduce ADVGPS, a method capable of generating adversarial GPS signals which are also stealthy for individual agents within the system, significantly reducing object detection accuracy. To enhance the success rates of these attacks in a black-box scenario, we introduce three types of statistically sensitive natural discrepancies: appearance-based discrepancy, distribution-based discrepancy, and task-aware discrepancy. Our extensive experiments on the OPV2V dataset demonstrate that these attacks substantially undermine the performance of state-of-the-art methods, showcasing remarkable transferability across different point-cloud-based 3D detection systems. This alarming revelation underscores the pressing need to address security implications within multi-agent perception systems, thereby underscoring a critical area of research. The code is available at https://github.com/jinlong17/AdvGPS.
|
|
16:30-18:00, Paper ThCT32-NT.5 | Add to My Program |
Towards Motion Forecasting with Real-World Perception Inputs: Are End-To-End Approaches Competitive? |
|
Xu, Yihong | Valeo.ai |
Chambon, Loick | Valeo |
Zablocki, Eloi | Valeo |
Chen, Mickaël | Valeo |
Alahi, Alexandre | EPFL |
Cord, Matthieu | Sorbonne Université, Valeo.ai |
Perez, Patrick | Valeo |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Computer Vision for Automation
Abstract: Motion forecasting is crucial in enabling autonomous vehicles to anticipate the future trajectories of surrounding agents. To do so, it requires solving mapping, detection, tracking, and then forecasting problems, in a multi-step pipeline. In this complex system, advances in conventional forecasting methods have been made using curated data, i.e., with the assumption of perfect maps, detection, and tracking. This paradigm, however, ignores any errors from upstream modules. Meanwhile, an emerging end-to-end paradigm, which tightly integrates the perception and forecasting architectures into joint training, promises to solve this issue. So far, however, the evaluation protocols of the two approaches were incompatible and their comparison was not possible. In fact, and perhaps surprisingly, conventional forecasting methods are usually neither trained nor tested in real-world pipelines (e.g., with upstream detection, tracking, and mapping modules). In this work, we aim to bring forecasting models closer to real-world deployment. First, we propose a unified evaluation pipeline for forecasting methods with real-world perception inputs, allowing us to compare the performance of conventional and end-to-end methods for the first time. Second, our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data. In particular, we show that this gap (1) stems not only from differences in precision but also from the nature of imperfect inputs provided by perception modules, and that (2) is not trivially reduced by simply finetuning on perception outputs. Based on extensive experiments, we provide recommendations for critical areas that require improvement and guidance towards more robust motion forecasting in the real world. We will release an evaluation library to benchmark models under standardized and practical conditions.
|
|
16:30-18:00, Paper ThCT32-NT.6 | Add to My Program |
QUEST: Query Stream for Practical Cooperative Perception |
|
Fan, Siqi | Tsinghua University |
Yu, Haibao | The University of Hong Kong |
Yang, Wenxian | Tsinghua University |
Yuan, Jirui | Tsinghua University |
Nie, Zaiqing | Tsinghua University |
Keywords: Computer Vision for Transportation, Intelligent Transportation Systems, Multi-Robot Systems
Abstract: Cooperative perception can effectively enhance individual perception performance by providing additional viewpoints and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable instance-level flexible feature interaction. To make the concept concrete, we propose a cooperative perception framework, termed QUEST, which lets a query stream flow among agents. Cross-agent queries interact via fusion for co-aware instances and via complementation for individually unaware instances. Taking camera-based vehicle-infrastructure perception as a typical practical application scenario, the experimental results on the real-world dataset DAIR-V2X-Seq demonstrate the effectiveness of QUEST and further reveal the advantage of the query cooperation paradigm in transmission flexibility and robustness to packet dropout. We hope our work can further facilitate cross-agent representation interaction for better cooperative perception in practice.
|
|
16:30-18:00, Paper ThCT32-NT.7 | Add to My Program |
Towards Visibility Estimation and Noise-Distribution-Based Defogging for LiDAR in Autonomous Driving |
|
Zhan, Jie | Huazhong University of Science and Technology |
Duan, Yucong | Huazhong University of Science and Technology |
Ding, Junfeng | Huazhong University of Science and Technology |
Hu, Xuzhong | Huazhong University of Science and Technology |
Huang, Xiao | China Ship Development and Design Center |
Ma, Jie | Huazhong University of Science and Technology |
Keywords: Computer Vision for Transportation, Range Sensing, Intelligent Transportation Systems
Abstract: Point clouds play a crucial role in robots and intelligent vehicles. Noise caused by fog droplets seriously degrades the quality of point clouds. Previous research has shown that the extent of degradation is correlated with visibility, and the fog attenuation coefficient is in turn associated with visibility. Against this background, this paper proposes a noise-distribution-based defogging method for point clouds. Our approach hinges on the estimation of the fog attenuation coefficient, facilitated by road-based prior knowledge. Our method then integrates the fog-induced noise distribution inferred from the LiDAR imaging model with the spatially non-uniform distribution of point clouds caused by the LiDAR structure. The fused results are input to a statistical filter based on the relative sparsity of noise to achieve defogging. This paper is one of the early works focusing on point cloud defogging. Its core insight lies in the estimation of the attenuation coefficient and the use of the fog-induced noise distribution for defogging. Experiments demonstrate that our method can accurately mitigate the impact of fog while enhancing the performance of a 3D object detection network.
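As a rough illustration of the final filtering stage, the sketch below applies Open3D's statistical outlier removal to a point cloud, with the attenuation-coefficient estimate reduced to a hypothetical placeholder; the paper's LiDAR imaging model and road-based prior are not reproduced.

import numpy as np
import open3d as o3d

def estimate_attenuation_coefficient(points):
    # Hypothetical placeholder: the paper estimates this coefficient from
    # road-based prior knowledge; a fixed value is returned here.
    return 0.05

def defog(points):
    # Fog noise is relatively sparse compared to true surface returns, so a
    # statistical filter over neighbor distances can suppress it. Denser fog
    # (larger coefficient) tightens the outlier threshold in this toy rule.
    alpha = estimate_attenuation_coefficient(points)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    filtered, _ = pcd.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=max(0.5, 2.0 - 10.0 * alpha))
    return np.asarray(filtered.points)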
|
|
16:30-18:00, Paper ThCT32-NT.8 | Add to My Program |
CenterCoop: Center-Based Feature Aggregation for Communication-Efficient Vehicle-Infrastructure Cooperative 3D Object Detection |
|
Zhou, Linyi | Fudan University |
Gan, Zhongxue | Fudan University |
Fan, Jiayuan | Fudan University |
Keywords: Computer Vision for Transportation, Sensor Fusion, Object Detection, Segmentation and Categorization
Abstract: Vehicle-Infrastructure Cooperative (VIC) 3D object detection is a challenging task that requires balancing communication bandwidth and detection performance. Intermediate fusion has recently been studied to reach a better balance by transferring feature maps. Existing works mainly perform spatial-wise fusion and adopt feature compression to alleviate the bandwidth cost of high-resolution feature maps, which inevitably leads to information loss. Besides, overlapping observations between the two sensors lead to near-duplicate detections, bringing only trivial improvement to the cooperative task while incurring unnecessary bandwidth cost. To mitigate these problems, we propose a novel feature aggregation framework called CenterCoop, which first encodes the informative clues from the whole Bird's Eye View (BEV) context into compact center representations, enabling feature aggregation at the sequence level to significantly reduce the communication cost. Furthermore, to tackle the redundancy of transmitted data, we incorporate communication-aware regularization, which drives the network to extract complementary and beneficial cues for the collaboration task. From an information-theoretic perspective, the proposed auxiliary constraints facilitate cooperative-view independence mining, resulting in an enlarged perception range within the limited bandwidth. Extensive experiments on the DAIR-V2X dataset demonstrate the superior performance-bandwidth trade-off of CenterCoop, which achieves state-of-the-art performance.
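The core compression step, encoding a dense BEV feature map into a handful of center vectors before transmission, can be sketched as a top-k gather. Everything here (tensor shapes, the use of a detection heatmap to rank centers, k=100) is an assumption for illustration, not the CenterCoop code.

import torch

def bev_to_center_queries(bev_feat, heatmap, k=100):
    # bev_feat: (B, C, H, W) dense BEV features; heatmap: (B, H, W) center
    # confidences. Returns (B, C, k) compact center representations, far
    # cheaper to transmit than the full feature map.
    B, C, H, W = bev_feat.shape
    _, idx = heatmap.flatten(1).topk(k, dim=1)      # top-k center locations
    flat = bev_feat.flatten(2)                      # (B, C, H*W)
    return flat.gather(2, idx.unsqueeze(1).expand(B, C, k))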
|
|
16:30-18:00, Paper ThCT32-NT.9 | Add to My Program |
Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving Via Differentiable Multi-Sensor Kalman Filter |
|
Chiu, Hsu-kuang | Carnegie Mellon University |
Wang, Chien-Yi | NVIDIA |
Chen, Min-Hung | NVIDIA |
Smith, Stephen F. | Carnegie Mellon University |
Keywords: Computer Vision for Transportation, Visual Tracking, Deep Learning for Visual Perception
Abstract: Current state-of-the-art autonomous driving vehicles mainly rely on each individual sensor system to perform perception tasks. Such a framework's reliability could be limited by occlusion or sensor failure. To address this issue, more recent research proposes using vehicle-to-vehicle (V2V) communication to share perception information with others. However, most relevant works focus only on cooperative detection and leave cooperative tracking an underexplored research field. A few recent datasets, such as V2V4Real, provide 3D multi-object cooperative tracking benchmarks. However, their proposed methods mainly use cooperative detection results as input to a standard single-sensor Kalman Filter-based tracking algorithm. In their approach, the measurement uncertainty of different sensors from different connected autonomous vehicles (CAVs) may not be properly estimated to utilize the theoretical optimality property of Kalman Filter-based tracking algorithms. In this paper, we propose a novel 3D multi-object cooperative tracking algorithm for autonomous driving via a differentiable multi-sensor Kalman Filter. Our algorithm learns to estimate measurement uncertainty for each detection that can better utilize the theoretical property of Kalman Filter-based tracking methods. The experiment results show that our algorithm improves the tracking accuracy by 17% with only 0.037x communication costs compared with the state-of-the-art method in V2V4Real.
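The key idea, making the Kalman measurement update differentiable so the per-detection measurement noise can be learned, can be sketched as follows. The network shape, feature input, and position-only observation model are assumptions for illustration; this is not the authors' model.

import torch
import torch.nn as nn

class LearnedNoiseKalmanUpdate(nn.Module):
    """One Kalman measurement update with a learned, per-detection
    measurement covariance R. Every operation is differentiable, so the
    noise network can be trained end-to-end from a tracking loss."""
    def __init__(self, state_dim=4, meas_dim=2, feat_dim=16):
        super().__init__()
        # Hypothetical small MLP mapping detection features to log-variances.
        self.noise_net = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, meas_dim))
        H = torch.zeros(meas_dim, state_dim)
        H[:, :meas_dim] = torch.eye(meas_dim)       # observe positions only
        self.register_buffer('H', H)

    def forward(self, x, P, z, feat):
        R = torch.diag(self.noise_net(feat).exp())  # learned, positive R
        S = self.H @ P @ self.H.T + R               # innovation covariance
        K = P @ self.H.T @ torch.linalg.inv(S)      # Kalman gain
        x_new = x + K @ (z - self.H @ x)
        P_new = (torch.eye(x.shape[0]) - K @ self.H) @ P
        return x_new, P_new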
|
|
ThCT33-CC Oral Session, CC-301 |
Add to My Program |
Motion Analysis and Planning |
|
|
Chair: Wilhelm, Nikolas Jakob | Technical University of Munich |
Co-Chair: Johnson, Aaron M. | Carnegie Mellon University |
|
16:30-18:00, Paper ThCT33-CC.1 | Add to My Program |
Design and Implementation of a Robotic Testbench for Analyzing Pincer Grip Execution in Human Specimen Hands |
|
Wilhelm, Nikolas Jakob | Technical University of Munich |
Glowalla, Claudio | Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar |
Haddadin, Sami | Technical University of Munich |
Schote, Julian | Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar |
Hoeppner, Hannes | Berliner Hochschule Für Technik, BHT |
van der Smagt, Patrick | Volkswagen Group |
Karl, Maximilian | Volkswagen AG |
Burgkart, Rainer | Technische Universität München |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Datasets for Human Motion
Abstract: This study presents an innovative test rig engineered to explore the kinematic and viscoelastic characteristics of human specimen hands. The rig features eight force-controlled motors linked to muscle tendons, enabling precise stimulation of hand specimens. Hand movements are monitored through an optical tracking system, while a force-torque sensor quantifies the resultant fingertip loads. Employing this setup, we successfully demonstrated a pincer grip using a cadaver hand and measured both muscle forces and grip strength. Our results reveal a nonlinear relationship between tendon forces and grip strength, which can be modeled by an exponential fit. This investigation serves as a nexus between biomechanical and robotics-focused research, providing critical insights for the advancement of robotic hand actuation and therapeutic interventions.
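The exponential relationship between tendon force and grip strength reported above can be fitted with a standard least-squares routine; the sketch below uses synthetic numbers, since the paper's measurements are not reproduced here.

import numpy as np
from scipy.optimize import curve_fit

def model(f, a, b, c):
    # Exponential model of grip strength as a function of tendon force.
    return a * np.exp(b * f) + c

tendon_force = np.linspace(0.0, 60.0, 20)             # N, synthetic
grip_strength = model(tendon_force, 2.0, 0.04, -1.5)  # synthetic data
params, _ = curve_fit(model, tendon_force, grip_strength, p0=(1.0, 0.01, 0.0))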
|
|
16:30-18:00, Paper ThCT33-CC.2 | Add to My Program |
Human Gait Cost Function Varies with Walking Speed: An Inverse Optimal Control Study |
|
Weng, Jiacheng | University of Waterloo |
Hashemi, Ehsan | University of Alberta |
Arami, Arash | University of Waterloo |
Keywords: Human and Humanoid Motion Analysis and Synthesis, Optimization and Optimal Control, Human-Centered Robotics
Abstract: This work investigates the optimal cost function composition for human gait at different walking speeds. Kinematic and kinetic data for walking at four walking speeds were collected from five able-bodied individuals. The data were then used to recover optimal cost functions in a predictive simulation environment with musculoskeletal models. Twenty inverse optimal control (IOC) problems were solved for cost function weight tuning using the previously developed and validated Adaptive Reference IOC (AR-IOC) algorithm. Over the walking speed range examined (0.6–1.5 m/s), the converged cost function weights suggest that increasing walking speed is associated with a reduction of the foot-sliding penalty weight and increased weights for center of mass (CoM) acceleration and stability, as confirmed by several experiments. Furthermore, we did not observe any significant weight shift in effort reduction between the upper and the lower body with respect to walking speed. The results of this study can be used in a toolbox for obtaining subject- and task-specific cost functions and for assisting the development of personalized rehabilitation technologies.
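In standard IOC form, the recovered objective is a weighted sum of basis terms, and the inverse problem tunes the weights so the predicted gait matches the observed one. The basis terms below are generic placeholders consistent with the abstract (foot sliding, CoM acceleration, effort), not the paper's exact definitions:

\[
J(w) = \int_0^T \Big( w_{\mathrm{slide}}\,\phi_{\mathrm{slide}}(x) + w_{\mathrm{CoM}}\,\lVert \ddot{x}_{\mathrm{CoM}} \rVert^2 + w_{\mathrm{effort}}\,\lVert u \rVert^2 + \cdots \Big)\,dt,
\qquad
w^\star = \arg\min_{w}\; d\big(x^\star(w),\, x_{\mathrm{obs}}\big),
\]

where \(x^\star(w)\) is the gait produced by the predictive simulation under weights \(w\) and \(x_{\mathrm{obs}}\) is the measured gait.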
|
|
16:30-18:00, Paper ThCT33-CC.3 | Add to My Program |
LORIS: A Lightweight Free-Climbing Robot for Extreme Terrain Exploration |
|
Nadan, Paul | Carnegie Mellon University |
Backus, Spencer | NASA Jet Propulsion Lab |
Johnson, Aaron M. | Carnegie Mellon University |
Keywords: Climbing Robots, Grippers and Other End-Effectors, Compliant Joints and Mechanisms
Abstract: Climbing robots can investigate scientifically valuable sites that conventional rovers cannot access due to steep terrain features. Robots equipped with microspine grippers are particularly well-suited to ascending rocky cliff faces, but most existing designs are either large and slow or limited to relatively flat surfaces such as walls. We present a novel free-climbing robot to bridge this gap through innovations in gripper design and force control. Fully passive grippers and wrist joints allow secure grasping while reducing mass and complexity. Forces are distributed among the robot's grippers using an optimization-based control strategy to minimize the risk of unexpected detachment. The robot prototype has demonstrated vertical climbing on both flat cinder block walls and uneven rock surfaces in full Earth gravity.
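The optimization-based force distribution can be pictured as a small convex program; the sketch below is a toy planar version with made-up numbers and a simplified detachment-risk objective, not the LORIS controller.

import cvxpy as cp
import numpy as np

n = 4                                   # grippers in contact (assumed)
f = cp.Variable((n, 2))                 # [tangential, normal] per gripper
load = np.array([5.0, -19.6])           # toy net load: lateral + gravity (N)
margin = cp.Variable()                  # worst-case detachment load

constraints = [cp.sum(f, axis=0) + load == 0]     # static force balance
for i in range(n):
    constraints += [cp.abs(f[i, 0]) <= margin,    # limit shear at each grip
                    -f[i, 1] <= margin]           # limit pull-off at each grip

# Minimizing the worst-case load spreads forces across grippers,
# reducing the risk that any single microspine grasp detaches.
cp.Problem(cp.Minimize(margin), constraints).solve()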
|
|
16:30-18:00, Paper ThCT33-CC.4 | Add to My Program |
Floating-Base Manipulation on Zero-Perturbation Manifolds |
|
Bittner, Brian | JHUAPL |
Reid, Jason | Jet Propulsion Laboratory |
Wolfe, Kevin | Johns Hopkins University Applied Physics Laboratory |
Keywords: Nonholonomic Motion Planning, Bimanual Manipulation, Mobile Manipulation
Abstract: To achieve high-dexterity motion planning on floating-base systems, the base dynamics induced by arm motions must be treated carefully. In general, it is a significant challenge to establish a fixed-base frame during tasking due to forces and torques on the base that arise directly from arm motions (e.g., arm drag in low Reynolds environments and arm momentum in high Reynolds environments). While thrusters can in theory be used to regulate the vehicle pose, this is often insufficient to establish a stable pose for precise tasking, whether due to underactuation, modeling inaccuracy, suboptimal control parameters, or insufficient power. We propose a solution that asks the thrusters to do less high-bandwidth perturbation correction by planning arm motions that induce zero perturbation on the base. We are able to cast our motion planner as a nonholonomic rapidly-exploring random tree (RRT) by representing the floating-base dynamics as Pfaffian constraints on joint velocity. These constraints guide the manipulators to move on zero-perturbation manifolds (which inhabit a subspace of the tangent space of the internal configuration space). To invoke this representation (termed a perturbation map), we assume the body velocity (perturbation) of the base to be a joint-defined linear mapping of joint velocity and describe situations where this assumption is realistic (including underwater, aerial, and orbital environments). The core insight of this work is that when the perturbation of the floating base has affine structure with respect to joint velocity, it provides the system a class of kinematic reduction that permits the use of sample-based motion planners (specifically a nonholonomic RRT). We show that this allows rapid, exploration-geared motion planning for high degree-of-freedom systems in obstacle-rich environments, even on floating-base systems with nontrivial dynamics.
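In the notation the abstract alludes to (symbols generic, not necessarily the paper's), the assumed linear perturbation map and the zero-perturbation condition read:

\[
\xi_{\mathrm{base}} = A(q)\,\dot{q},
\qquad
A(q)\,\dot{q} = 0 \;\Longleftrightarrow\; \dot{q} \in \ker A(q),
\]

so the nonholonomic RRT extends nodes by integrating \(\dot{q} = N(q)\,u\), where the columns of \(N(q)\) span \(\ker A(q)\) and \(u\) is a sampled input; arm motions built this way leave the base unperturbed by construction.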
|
|
16:30-18:00, Paper ThCT33-CC.5 | Add to My Program |
Towards Geometric Motion Planning for High-Dimensional Systems: Gait-Based Coordinate Optimization and Local Metrics |
|
Yang, Yanhao | Oregon State University |
Bass, Capprin | Oregon State University |
Hatton, Ross | Oregon State University |
Keywords: Nonholonomic Motion Planning, Whole-Body Motion Planning and Control, Biologically-Inspired Robots
Abstract: Geometric motion planning offers effective and interpretable gait analysis and optimization tools for locomoting systems. However, due to the curse of dimensionality in coordinate optimization, a key component of geometric motion planning, it is almost infeasible to apply current geometric motion planning to high-dimensional systems. In this paper, we propose a gait-based coordinate optimization method that overcomes the curse of dimensionality. We also identify a unified geometric representation of locomotion by generalizing various nonholonomic constraints into local metrics. By combining these two approaches, we take a step towards geometric motion planning for high-dimensional systems. We test our method in two classes of high-dimensional systems - low Reynolds number swimmers and free-falling Cassie - with up to 11-dimensional shape variables. The resulting optimal gait in the high-dimensional system shows better efficiency compared to that of the reduced-order model. Furthermore, we provide a geometric optimality interpretation of the optimal gait.
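In the usual geometric-mechanics notation (again generic, not necessarily the paper's exact formulation), the generalized nonholonomic constraints form a local connection, and the local metric supplies a pathlength cost over shape space:

\[
g^{-1}\dot{g} = -\mathbf{A}(r)\,\dot{r},
\qquad
\mathrm{cost}(\phi) = \oint_{\phi} \sqrt{\dot{r}^{\top} M(r)\,\dot{r}}\;dt,
\]

where \(r\) denotes the shape variables, \(g\) the body frame position, \(\mathbf{A}\) the local connection, and \(M\) the local metric; coordinate optimization chooses the body frame so that the connection-based estimate of net displacement over a gait \(\phi\) is as accurate as possible.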
|
|
ThCL-EX Poster Session, Exhibition Hall |
Add to My Program |
Late Breaking Results Poster IX |
|
|
|
16:30-18:00, Paper ThCL-EX.1 | Add to My Program |
To Help or Not to Help: LLM-Based Attentive Support for Human-Robot Group Interactions |
|
Tanneberg, Daniel | Honda Research Institute |
Ocker, Felix | Honda Research Institute Europe |
Hasler, Stephan | Honda Research Institute Europe |
Deigmoeller, Joerg | Honda Research Institute Europe |
Belardinelli, Anna | Honda Research Institute Europe |
Wang, Chao | Honda Research Institute Europe GmbH |
Wersing, Heiko | Honda Research Institute Europe |
Sendhoff, Bernhard | Honda Research Institute Europe GmbH |
Gienger, Michael | Honda Research Institute Europe |
Keywords: Robot Companions, Physical Human-Robot Interaction, Social HRI
Abstract: How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models. In addition to following user instructions, Attentive Support is capable of deciding when and how to support the humans, and when to remain silent to not disturb the group. With a diverse set of scenarios, we show and evaluate the robot's attentive behavior, which supports and helps the humans when required, while not disturbing if no help is needed.
|
|
16:30-18:00, Paper ThCL-EX.2 | Add to My Program |
Disaster Robotics Category: Standard Disaster Robotics Challenge in World Robot Summit 2025 |
|
Kinugasa, Tetsuya | Okayama University of Science |
Kimura, Tetsuya | Nagaoka University of Technology |
Sato, Noritaka | Nagoya Institute of Technology |
Suzuki, Soichiro | JAEA |
Keywords: Competition, Aerial Systems: Applications, Field Robots
Abstract: Drone technology is a highly competitive field and essential for future industrial development. Its applications include infrastructure and plant inspection as well as disaster prevention and response. It is important to clearly define the direction of technology development by understanding social implementation needs, and to develop and popularize standard test methods (STMs) for objectively evaluating performance. Regarding search and rescue procedures in disasters, INSARAG has issued guidelines as international standards, and the NFPA is establishing standards for the operation of drones in police, firefighting, and emergency services. STMs for drones, which are rapidly being deployed for infrastructure inspection and disaster response, are therefore required. Against this backdrop, F-REI will host the Disaster Robotics Category as one of the World Robot Summit 2025 competitions. We are organizing the Standard Disaster Robotics Challenge, a featured competition in this category. This report presents the STMs for drones used in the challenge, developed based on the NIST STM apparatus and the ReAMo specimens. It also provides an overview of the evaluation criteria under severe environmental conditions, including information gathering and processing capabilities and autonomy, which are crucial during disaster response.
|
|
16:30-18:00, Paper ThCL-EX.3 | Add to My Program |
Novel Three-Fingered Gripper Designs for Bussing Table Service |
|
Choi, Jeongseok | Hanyang University |
Shin, Jeongpil | Hanyang University |
Kim, YoungHwan | Hanyang University |
Lee, Wonhyoung | Hanyang University |
Won, Jeeho | Hanyang University |
Lee, Minsu | Hanyang University |
Seo, TaeWon | Hanyang University |
Keywords: Grippers and Other End-Effectors, Compliant Assembly, Service Robotics
Abstract: The performance of a robot differs depending on the design of its grippers. In this paper, we present novel three-fingered gripper designs for bussing table service. The main objective of this study is to grasp and release plates, dishes, and cups vertically with three fingers. To accommodate dishes of various sizes and weights, we adopted compliant fingers. The first proposed gripper design focuses on the kinematic structure, utilizing springs, linkages, and stoppers. Because the dishes vary in shape, the configuration of the fingers adjusts to grasp them firmly. The second proposed gripper design is based on belts and a kinematic structure. Unlike the first design, the second is thinner and lighter, achieved by replacing some of the kinematic structures with a soft material, namely a belt. Both proposed grippers were demonstrated to grasp and release dishes of various sizes and weights effectively. Moreover, the first design, based on a kinematic structure, can handle heavier dishes owing to the reliability of its kinematic structure compared to the second. The second gripper, based on soft materials such as belts, is suitable for grasping dishes of various sizes and shapes. Through the design and demonstration of these novel three-fingered grippers for bussing table service, we study their effectiveness in handling dishes of various sizes and weights.
|
|
16:30-18:00, Paper ThCL-EX.4 | Add to My Program |
Enhancing Clarity for Sky-High Insights: Drone-Enhanced Aerial Object Detection with YOLOv5 and Super Resolution |
|
Nihal, Md Ragib Amin | Tokyo Institute of Technology |
Yen, Benjamin | Tokyo Institute of Technology |
Itoyama, Katsutoshi | Tokyo Institute of Technology |
Nakadai, Kazuhiro | Tokyo Institute of Technology |
Keywords: Object Detection, Segmentation and Categorization, Deep Learning for Visual Perception, Aerial Systems: Applications
Abstract: The rise in drone and satellite usage has spiked the demand for precise object detection in aerial images. Traditional models falter with small, clustered objects common in these scenarios. To tackle this, we introduce a novel method merging super-resolution with a modified YOLOv5 design, optimized for lightweight operation and real-time use. Our model, tested on datasets like VisDrone, incorporates Transformer encoder blocks for enhanced global and local context awareness, significantly boosting detection in cluttered environments. This approach not only heightens accuracy but also maintains resource efficiency, ideal for real-time scenarios. Results reveal its prowess in pinpointing small, grouped objects, with a notable 52.5% mAP on VisDrone, surpassing existing standards. This strategy is poised to refine aerial object detection, ensuring more precise and dependable outcomes across various applications.
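The two-stage inference path described above (upscale, then detect) can be sketched as follows. A bicubic upsample stands in for the authors' super-resolution network, and the stock yolov5s hub model stands in for their Transformer-augmented variant; only the pipeline shape is illustrated.

import torch
import torch.nn.functional as F

# Load the public YOLOv5s weights from the ultralytics hub entry.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def detect_aerial(image_bchw):
    # Placeholder for the learned SR stage: small objects cover more
    # pixels after upsampling, which helps the detector resolve them.
    sr = F.interpolate(image_bchw, scale_factor=2, mode='bicubic',
                       align_corners=False)
    return model(sr)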
|
|
16:30-18:00, Paper ThCL-EX.5 | Add to My Program |
Linear Parameter Estimation Using Physics-Informed Machine Learning Algorithm for Leader Follower Tracking |
|
Lee, Sangmoon | Kyungpook National University |
Shin, Woosang | KITECH |
Jin, Yongsik | Electronics and Telecommunications Research Institute |
Keywords: AI-Based Methods, Machine Learning for Robot Control, Wheeled Robots
Abstract: Mobile robots are playing an increasingly important role in modern industries and services. These robots appear in various forms, such as automobiles, drones, and robot arms, and are actively used in applications such as automation at industrial sites, environmental exploration, structure inspection, logistics, and transportation. In particular, the leader-follower problem of mobile robots is recognized as a key problem when robots perform tasks cooperatively. The leader-follower problem aims to accurately estimate the movement of the leader robot while it guides the follower, and to perform tracking control for the follower robot. In this paper, we propose a linear parameter estimation system that can estimate the dynamic characteristics of the real model given only a bounded uncertainty range, in order to overcome the model uncertainty that can arise in the leader-follower problem. The characteristics of a state observer are incorporated into the learning algorithm so that the state estimation error converges to zero at a faster rate, and an algorithm that estimates the uncertain model from the estimated state is presented.
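The state-observer structure the abstract refers to is, in its standard linear form (generic symbols, not the paper's notation):

\[
\dot{\hat{x}} = A\hat{x} + Bu + L\,(y - C\hat{x}),
\qquad
\dot{e} = (A - LC)\,e, \quad e = x - \hat{x},
\]

so choosing the gain \(L\) to place the eigenvalues of \(A - LC\) deep in the left half-plane drives the estimation error to zero quickly; the uncertain model parameters are then estimated from \(\hat{x}\).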
|
|
16:30-18:00, Paper ThCL-EX.6 | Add to My Program |
FROG: A New People Detection Dataset for Knee-High 2D Range Finders |
|
Amodeo, Fernando | Universidad Pablo De Olavide |
Perez-Higueras, Noe | University Pablo De Olavide |
Merino, Luis | Universidad Pablo De Olavide |
Caballero, Fernando | Universidad De Sevilla |
Keywords: Data Sets for Robot Learning, Human Detection and Tracking, Deep Learning Methods
Abstract: Mobile robots require knowledge of the environment, especially of humans located in their vicinity. While the most common approaches for detecting humans involve computer vision, an often overlooked hardware feature for people detection is the robot's 2D range finder. In most robots, it is conveniently located at a height approximately between the ankle and the knee, so it can also be used for detecting people, with a larger field of view and better depth resolution than cameras. In this paper, we present FROG, a new dataset for people detection using knee-high 2D range finders. This dataset has greater laser resolution, higher scanning frequency, and more complete annotation data than existing datasets such as DROW. In particular, the FROG dataset contains annotations for 100% of its laser scans (unlike DROW, which annotates only 5%), 17x more annotated scans, 100x more people annotations, and over twice the distance traveled by the robot. We propose a benchmark based on the FROG dataset and analyze a collection of state-of-the-art people detectors based on 2D range finder data. We also propose and evaluate a new end-to-end deep learning approach for people detection. Our solution works directly with the raw sensor data (without hand-crafted input features). Experimental results show that the proposed people detector attains results comparable to the state of the art, while an optimized implementation for ROS can operate at more than 500 Hz.
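A minimal end-to-end detector over raw scans might look like the 1D CNN below; the architecture and channel counts are purely illustrative assumptions, not the paper's network.

import torch
import torch.nn as nn

class ScanPeopleDetector(nn.Module):
    """Toy 1D-CNN producing a per-beam person logit directly from a raw
    2D range scan, without hand-crafted input features."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=1))   # per-beam logit

    def forward(self, ranges):        # ranges: (batch, n_beams)
        return self.net(ranges.unsqueeze(1)).squeeze(1)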
|
|
16:30-18:00, Paper ThCL-EX.7 | Add to My Program |
Construction of the AR-Haptic Feedback System for Hyper-Elastic Materials Using RFEA |
|
Kang, Hyeseon | Seoul National University of Science and Technology |
Bae, Jaehyoung | Seoul National University of Science and Technology |
Kim, Jinhyun | Seoul National University of Science and Technology |
Keywords: Haptics and Haptic Interfaces, Human Factors and Human-in-the-Loop
Abstract: Augmented reality (AR) shows promise for innovative technological advancements in the design and diagnosis of mechanical, construction, and medical systems. In particular, utilizing AR for real-time deformation and stress analysis of structures and providing realistic user feedback plays an important role in shaping the metaverse of the future. Rubber, which is widely used in engineering, is a representative hyper-elastic material with a nonlinear strain-stress relationship. It is therefore difficult to implement hyper-elastic materials as spring-damper systems, but real-time deformation and stress analysis can be implemented by utilizing pre-built RFEA (Real-time Finite Element Analysis). In this study, we used RFEA data to build an augmented reality system that allows users to interact with hyper-elastic materials in AR and provides visualization of real-time finite element analysis as well as haptic feedback to simulate their nonlinear elasticity.
|
|
16:30-18:00, Paper ThCL-EX.8 | Add to My Program |
Do Humans Retaliate against Immoral Robots? |
|
Rezaei Khavas, Zahra | Umass Lowell |
Kotturu, Monish Reddy | University of Massachusetts Lowell |
Azadeh, Reza | University of Massachusetts Lowell |
Robinette, Paul | University of Massachusetts Lowell |
Keywords: Human-Centered Robotics, Human-Centered Automation, Human Factors and Human-in-the-Loop
Abstract: The growing implementation of robots in societal contexts necessitates a deeper exploration of the dynamics of trust between humans and robots. This exploration should expand beyond traditional viewpoints that primarily emphasize the influence of robot performance. In the burgeoning area of social robotics, fine-tuning a robot's personality traits is increasingly recognized as a crucial element in shaping users' experiences during human-robot interaction (HRI). Research in this field has led to the creation of trust scales that encompass various trust dimensions in HRI. These scales include aspects related to performance as well as moral dimensions. Our study investigates how these trust aspects affect human trust in robots, particularly examining if breaches of moral trust by robots impact human trust more negatively than performance trust breaches. We also explore if trust loss and retaliation tendencies differ between human and robotic teammates following the violations of these different trust aspects. Through multiple versions of an online search task, we examined our research questions and found that moral trust violations by robotic teammates, like with human teammates, damage human trust more severely than performance violations, and also violations of moral trust by a teammate increase retaliation tendencies. These findings highlight the importance of moral trust in determining how humans view a robot's trustworthiness.
|
|
16:30-18:00, Paper ThCL-EX.9 | Add to My Program |
Wearable Robotic Tail to Support Balance |
|
Anwar, Eisa | Queen Mary University of London |
Abeywardena, Sajeeva | University of Surrey |
Miller, Stuart | Queen Mary University of London |
Farkhatdinov, Ildar | Queen Mary University of London |
Keywords: Physically Assistive Devices, Body Balancing, Wearable Robotics
Abstract: Workers in many industries frequently have to manipulate heavy loads as part of their work. This comes in many forms, for example, loading and unloading vehicles as part of last-mile delivery or packing shelves in warehouses or supermarkets. These actions can move the body's centre of mass further away from the middle of the body, and subsequently the body may have to strain itself to maintain balance, which can potentially lead to chronic pain. Inspired by nature, where animals use tails for balance, we have built a supernumerary robotic limb in the form of a robotic tail. Its position is controlled based on the distance of a carried object from the user's body, with actuating motors serving the dual purpose of moving the tail and acting as its counterbalance. This characteristic gives a higher counterweight-to-overall-weight ratio, maximising its effectiveness whilst minimising its weight. Tests with a human participant have shown that the tail can keep the centre of mass positioned more closely to the middle of the base of support.
|
|
16:30-18:00, Paper ThCL-EX.10 | Add to My Program |
Robotic Grasping of Small-Sized Industrial Components Via Multi-Stage Visual Servoing |
|
Qian, Kun | Heriot-Watt University |
Erden, Mustafa Suphi | Heriot-Watt University |
Kong, Xianwen | Heriot-Watt University |
Keywords: Perception for Grasping and Manipulation, Deep Learning in Grasping and Manipulation, Assembly
Abstract: Manual assembly of industrial components with complex geometries and small dimensions, such as concentrator photovoltaics solar panel units, often results in suboptimal accuracy, efficiency, and throughput. Our ultimate goal is to develop a robotic assembly system capable of delicately manipulating and precisely assembling industrial objects. To address the challenge of localization and grasping of small sized components, this paper introduces a multi-stage visual servoing framework. The initial positioning of the object is conducted using deep learning-based 6D pose estimation, followed by feature matching to accomplish the final grasping. Leveraging the established robotic system, experiments are conducted to validate the efficacy of the proposed framework, utilizing the smallest component within a photovoltaic unit, known as the solar cell (length: 12mm, width: 10mm, height: 3mm).
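The feature-matching refinement stage can be sketched with standard OpenCV calls; the deep-learning 6D pose initialization is assumed to have already coarsely aligned the views, and the detector choice (ORB) and match count are assumptions for illustration.

import cv2

def match_features(template_gray, live_gray, max_matches=50):
    # Detect and describe keypoints in the template and live camera views.
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(template_gray, None)
    kp2, des2 = orb.detectAndCompute(live_gray, None)
    # Brute-force Hamming matching with cross-checking for reliability.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    # The best correspondences feed the final pose refinement for grasping.
    return kp1, kp2, matches[:max_matches]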
|
|
16:30-18:00, Paper ThCL-EX.11 | Add to My Program |
Energy-Aware Hierarchical Reinforcement Learning for CubeSat Task Scheduling |
|
Ramezani, Mahya | University of Luxembourg |
Amiri Atashgah, M.A. | University of Tehran |
Rezaee, Alireza | University of Tehran |
Alandihallaj, Mohammadamin | University of Luxembourg |
Sanchez-Lopez, Jose Luis | University of Luxembourg |
Hein, Andreas | University of Luxembourg |
Keywords: AI-Based Methods, Energy and Environment-Aware Automation
Abstract: This work presents an energy-aware Hierarchical Reinforcement Learning (HierRL) methodology tailored for optimizing CubeSat task scheduling in Low Earth Orbit (LEO). Incorporating a high-level policy for global task distribution and a low-level policy for real-time adaptation as a safety mechanism, our approach integrates the Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting. Together, these mechanisms create a safe and fault-tolerant system for CubeSat task scheduling. Simulation results validate HierRL's superior convergence and task success rate, outperforming both the MADDPG model and traditional random scheduling across multiple CubeSat configurations.
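The two-level loop described above can be sketched as follows; all names are illustrative, and the SABE encoder, energy model, and MADDPG baseline are not reproduced.

def run_episode(high_policy, low_policy, energy_mlp, env):
    # High level: one global task-distribution decision per episode.
    obs = env.reset()
    plan = high_policy(obs)
    done = False
    while not done:
        # Low level: real-time adaptation acting as a safety mechanism,
        # conditioned on a forecast of energy consumption.
        predicted_energy = energy_mlp(obs)
        action = low_policy(obs, plan, predicted_energy)
        obs, reward, done, info = env.step(action)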
|
|
16:30-18:00, Paper ThCL-EX.12 | Add to My Program |
Development of a Standard Process Model for Robotic Equipment for Cell Culture Inspection and Media Change Processes |
|
Choo, Sungwon | KITECH |
Baek, Sunhyuk | Hanyang University |
Koo, Taehoon | KITECH |
Nam, Kyung-Tae | KITECH |
Keywords: Industrial Robots, Grasping, Process Control
Abstract: The bio industry, reliant on skilled professionals, encompasses various processes such as cell line selection, cultivation, purification, and final product processing. Cell cultivation, pivotal in producing therapeutics, vaccines, and bio-products, often relies on manual labor, leading to inconsistent results due to operator proficiency and environmental factors. Minimizing cell damage during critical operations such as harvesting and splitting is crucial. Automation systems are being developed to quantify operator skills and streamline tasks, aiming to enhance consistency and efficiency in bio-manufacturing processes, particularly in cell culture. In this paper, we propose a cell culture automation system that utilizes a single robotic arm to manipulate various objects within a confined space, aiming to address these challenges. Analyzing the cell culture process, we design an Automatic Tool Changer (ATC) that allows the robotic arm to easily interchange grippers and dispensing modules. This ATC, along with the gripper, facilitates operations such as consumable handling, pick-and-place, suction, and dispensing by seamlessly swapping tools during the cell culture process.
|
|
16:30-18:00, Paper ThCL-EX.13 | Add to My Program |
Design Considerations for Bioinspired Multi-Morphing Omnidirectional Robot |
|
S P S, Paramesh | Department of Mechanical Engineering, Amrita School of Engineering |
T, Ruvanthika | Department of Mechanical Engineering, Amrita School of Engineering |
S, Sanjeev | Department of Mechanical Engineering, Amrita School of Engineering |
A, Barathwaj | Department of Mechanical Engineering, Amrita School of Engineering |
V, Shourya | Department of Mechanical Engineering, Amrita School of Engineering |
R, Sharan | Department of Mechanical Engineering, Amrita School of Engineering |
S, Anil | SISC, Bengaluru |
S, Rammohan | Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham |
Keywords: Mechanism Design, Kinematics, Wheeled Robots
Abstract: Bioinspired multi-morphing omnidirectional robots are attracting increasing research attention. The ability to navigate confined spaces by morphing body shape is seen in creatures such as spiders and crabs. The present work aims to identify design strategies for a morphing robot mathematically and to propose an effective scheme for controlling the robot via an API. We propose a novel quadruped robot configuration with joints similar to the major axes of an articulated manipulator. The selected joint architecture is inspired by the motion of creatures such as crabs and spiders. This configuration not only facilitates bioinspired morphing but also simplifies the mathematical modeling of the robot. The values of wheel radius, wheel offset, and steering angle for which the condition number of the Jacobian matrix equals 1 are determined. A simplified API-based control scheme is also proposed so that any end user can maneuver the robot.
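The isotropy condition (Jacobian condition number equal to 1) can be searched numerically; the Jacobian below is a toy stand-in for the paper's kinematic model, and the parameter ranges are made up.

import numpy as np

def jacobian(r, d, a):
    # Hypothetical toy Jacobian in wheel radius r, offset d, steering angle a.
    return np.array([[r * np.cos(a), -d * np.sin(a)],
                     [r * np.sin(a),  d * np.cos(a)]])

candidates = [(r, d, a)
              for r in np.linspace(0.02, 0.10, 20)
              for d in np.linspace(0.01, 0.05, 20)
              for a in np.linspace(0.0, np.pi / 2, 20)]
# Isotropic design: parameters whose Jacobian condition number is closest to 1.
best = min(candidates, key=lambda p: abs(np.linalg.cond(jacobian(*p)) - 1.0))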
|
|
16:30-18:00, Paper ThCL-EX.16 | Add to My Program |
A Soft Wearable Robot for Upper Limb Assistance Using Layer Jamming Mechanisms |
|
Kim, Namho | Chung-Ang University |
Park, Jong Hoon | Yonsei University |
Shin, Dongjun | Yonsei University |
Keywords: Wearable Robotics, Soft Sensors and Actuators, Soft Robot Materials and Design
Abstract: Wearable robotic systems, especially those integrating soft materials, are increasingly capturing attention due to their comfort, ease of use, and versatility in supporting various tasks. Striking a balance between ensuring wearer comfort with low impedance and delivering sufficient assistive force presents a significant design challenge in wearable robotics. In this study, we propose the utilization of variable impedance tailored to different types of muscle contractions in the human body. By specifically addressing eccentric muscle contractions, adjusting impedance can alleviate muscular strain as both forces act in the same direction. To realize this concept, we introduce layer jamming mechanisms capable of adjusting impedance across multiple directions. This mechanism not only allows for a broad range of impedance variation in multi-degree-of-freedom (DoF) rotations but also facilitates customized directional torque in human multi-DoF joints. Through the development of a wearable robot prototype equipped with the proposed layer jamming mechanisms, experimental validation confirms the effectiveness of this impedance-based assistance strategy. The findings of this study unveil new possibilities in wearable robot design, showcasing how finely tuned impedance can enhance human motion, potentially boosting task efficiency and minimizing injury risks. Therefore, this work presents a fresh perspective for researchers involved in the field of wearable robotics.
|
|
16:30-18:00, Paper ThCL-EX.17 | Add to My Program |
Development and Fundamental Experiment of Bladeless Fan Propulsions for Small Unmanned Aerial Vehicles |
|
Saito, Tatsuki | Shibaura Institute of Technology |
Hamane, Hiroto | Kogakuin University |
Abiko, Satoko | Shibaura Institute of Technology |
Keywords: Aerial Systems: Mechanics and Control, Aerial Systems: Applications
Abstract: Unmanned aerial vehicles (UAVs) are generally equipped with propellers to achieve flight. However, the exposed propellers of conventional UAVs can cause human injury, bird strikes, and other problems that lead to severe accidents involving human lives. Previous research developed multicopter-type UAVs with bladeless fans, but there has been no research on fixed-wing bladeless UAVs. This presentation therefore describes the development of a small fixed-wing UAV propelled by a bladeless fan instead of a propeller. The bladeless fans are designed and developed to be installed on the small UAV. Computational fluid dynamics is used to analyze three different bladeless-fan shapes and visualize their airflow. Based on the results of this analysis, the second design, which generated the highest airflow velocity, was selected for the airframe. In addition, a model with two bladeless fans was built to provide additional thrust to the fuselage. Flight tests showed a flight time of only about 9 seconds. In future work, the bladeless fan will be redesigned and a weight-reduction study conducted to reduce the total weight of the fuselage and realize long-distance flight.
|
|
16:30-18:00, Paper ThCL-EX.18 | Add to My Program |
PatLink: Patella-Inspired Linkage Joint for Force Transmission Mechanism |
|
Lee, Sinyoung | Chungang University |
Lee, Dongun | Yonsei University |
Shin, Dongjun | Yonsei University |
Keywords: Modeling, Control, and Learning for Soft Robots, Soft Robot Materials and Design, Soft Robot Applications
Abstract: Pneumatic artificial muscles (PAMs) are promising actuators for safe human-robot interaction due to their high force density and compliance. However, challenges like force reduction and poor control performance hinder their practical use. To address this, we propose the Patella-Inspired Linkage Joint (PatLink), amplifying joint rotation with muscle contraction. This enhances joint speed and torque output. Experimental results demonstrate a 104% increase in range of motion and a 108% enhancement in mean work rate, validating PatLink's effectiveness. Furthermore, PatLink can be extended to similar actuators like shape memory alloy and twisted string actuators.
|
|
16:30-18:00, Paper ThCL-EX.19 | Add to My Program |
ADAPT Hand: Robust Robotic Manipulation through Biomimetic Distributed Compliance |
|
Junge, Kai | École Polytechnique Fédérale De Lausanne |
Hughes, Josie | EPFL |
Keywords: Multifingered Hands, Biomimetics, Dexterous Manipulation
Abstract: The impressive capability of humans to robustly perform manipulation relies on compliant interactions, enabled through the structure and materials spatially distributed in our hands. We propose that by mimicking this distributed compliance in an anthropomorphic robotic hand, open-loop manipulation robustness increases, and we observe the emergence of human-like behaviours. To achieve this, we introduce the ADAPT Hand, equipped with tunable compliance throughout the skin, fingers, and wrist. Through extensive automated pick-and-place tests, we show that the grasping robustness closely mirrors an estimated geometric theoretical limit. In a grasping task in a constrained environment, we demonstrate hand-object self-organization behavior, where the hand automatically exhibits different grasp types depending on object geometry. Furthermore, the robot grasp type mimics a natural human grasp with a direct similarity of 68%.
|
|
16:30-18:00, Paper ThCL-EX.20 | Add to My Program |
An Agile Monopedal Hopping Quadcopter with Synergistic Hybrid Locomotion |
|
Bai, Songnan | City University of Hong Kong |
Pan, Qiqi | Hong Kong University of Science and Technology |
Ding, Runze | City University of Hongkong |
Jia, Huaiyuan | City University of Hong Kong |
Yang, Zhengbao | Hong Kong University of Science and Technology |
Chirarattananon, Pakpong | City University of Hong Kong |
Keywords: Aerial Systems: Mechanics and Control, Legged Robots, Biologically-Inspired Robots
Abstract: Nature abounds with examples of superior mobility through the fusion of aerial and ground movement. Drawing inspiration from such multimodal locomotion, we introduce a high-performance hybrid hopping and flying robot. The proposed robot seamlessly integrates a nano quadcopter with a passive telescopic leg, overcoming limitations of previous jumping mechanisms that rely on stance phase leg actuation. Based on the identified dynamics, a thrust-based control method and detachable active aerodynamic surfaces were devised for the robot to perform continuous jumps with and without position feedback. This unique design and actuation strategy enable tuning of jump height and reduced stance phase duration, leading to agile hopping locomotion. The robot recorded an average vertical hopping speed of 2.38 meters per second at a jump height of 1.63 meters. By harnessing multimodal locomotion, the robot is capable of intermittent midflight jumps that result in substantial instantaneous accelerations and rapid changes in flight direction, offering enhanced agility and versatility in complex environments. The passive leg design holds potential for direct integration with conventional rotorcraft, unlocking seamless hybrid hopping and flying locomotion.
|
|
16:30-18:00, Paper ThCL-EX.21 | Add to My Program |
From Imitation Learning to Instruction Learning: A New Paradigm for Efficient Motion Learning |
|
Ye, Linqi | Shanghai University |
Li, Jiayi | Tsinghua University |
Cheng, Yi | Tsinghua University |
Xianglong, Li | Qiyuan Lab |
Liu, Houde | Shenzhen Graduate School, Tsinghua University |
Liang, Bin | Center for Artificial Intelligence and Robotics, Graduate School |
Keywords: Reinforcement Learning, Legged Robots, Humanoid and Bipedal Locomotion
Abstract: Recent years have witnessed many successful trials in the robot learning field. For contact-rich robotic tasks, it is challenging to learn coordinated motor skills by reinforcement learning. Imitation learning solves this problem by using a mimic reward to encourage the robot to track a given reference trajectory. However, imitation learning is not very efficient and may constrain the learned motion. We propose instruction learning, which is inspired by the human learning process and is highly efficient, flexible, and versatile for robot motion learning. Instead of using a reference signal in the reward, instruction learning applies the reference signal directly as a feedforward action, which is combined with a feedback action learned by reinforcement learning to control the robot. In addition, we propose an action bounding technique and remove the mimic reward, which is shown to be crucial for efficient and flexible learning. We compare the performance of instruction learning with imitation learning, showing that instruction learning can greatly speed up the training process and guarantee that the desired motion is learned correctly. The effectiveness of instruction learning is validated through a range of motion learning examples for a biped robot and a quadruped robot, where skills can typically be learned within several million steps.
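The action composition the abstract describes can be written in a few lines; the bound value and names are illustrative assumptions, not the paper's hyperparameters.

import numpy as np

def instruction_action(reference, policy, obs, t, bound=0.3):
    # Feedforward: the reference signal applied directly as an action,
    # rather than entering the reward as in imitation learning.
    feedforward = reference[t]
    # Feedback: an RL-learned correction, kept small by action bounding.
    feedback = np.clip(policy(obs), -bound, bound)
    return feedforward + feedback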
|
|
16:30-18:00, Paper ThCL-EX.22 | Add to My Program |
Development of Quad Nozzle FFF 3D Printer for Multi-Material Single-Step 3D Printing |
|
Lee, Haemin | Seoul National University |
Park, Jong Hoo | Seoul National University |
Cho, Kyu-Jin | Seoul National University, Biorobotics Laboratory |
Keywords: Methods and Tools for Robot System Design, Embedded Systems for Robotic and Automation, Product Design, Development and Prototyping
Abstract: Building robots involves the assembly of numerous components such as mechanical joints, links, sensors, actuators, and power sources. There has long been a focus on minimizing the number of components and reducing the resources required for assembling them, particularly in categories of systems where specific factors are crucial, such as small-scale robots, point-of-use manufacturing, and highly customizable designs. Traditional fabrication processes often involve dedicated manufacturing for individual components, followed by subsequent assembly steps requiring serial, complex, and occasionally manual interventions. In contrast, the single-step 3D printing approach, which refers to fabricating a whole object with monolithically integrated components in a single printing process without requiring assembly, challenges the long-standing tradition of treating components as independent modules. Although single-step 3D printing offers an opportunity to revolutionize the process by simultaneously creating entire robots with integrated components, the capacity to concurrently print multiple materials with distinct characteristics has been a significant challenge. This study presents the development of a Quad Nozzle 3D Printer equipped with four nozzles to enable single-step 3D printing of up to four different materials simultaneously.
|
|
ThE-EX Expo Session, Exhibition Hall |
Add to My Program |
ICRA EXPO Day 3 |
|
|
Chair: Ravankar, Ankit A. | Tohoku University |
Co-Chair: Salazar Luces, Jose Victorio | Tohoku University |
|
13:30-18:00, Paper ThE-EX.1 | Add to My Program |
Nezha-F: Design and Demonstration of a Foldable and Self-Deployable HAUV |
|
Bi, Yuanbo | Shanghai Jiao Tong University |
Xu, Zhuxiu | Shanghai Jiao Tong University |
Bai, YuLin | Shanghai Jiao Tong University |
Zhou, Hexiong | Shanghai Jiao Tong University |
Zeng, Zheng | Shanghai Jiao Tong University |
|
13:30-18:00, Paper ThE-EX.2 | Add to My Program |
STAR: Swarm Technology for Aerial Robotics Research |
|
Chiun, Jimmy | National University of Singapore |
Leong, Wai Lun | National University of Singapore |
Cao, Yuhong | National University of Singapore |
Tan, Yan Rui | National University of Singapore |
Teo, Rodney | Defense Science Organization |
Sartoretti, Guillaume Adrien | National University of Singapore (NUS) |
|
13:30-18:00, Paper ThE-EX.3 | Add to My Program |
Demonstration of Multi-Modal Aerial Robot System for Forest Canopy Research: Perching and Disentangling Maneuvers |
|
Romanello, Luca | TUM |
Lan, Tian | Technical University of Munich |
Kovac, Mirko | Imperial College London |
Armanini, Sophie Franziska | Technical University of Munich |
Kocer, Basaran Bahadir | Imperial College London |
|
13:30-18:00, Paper ThE-EX.4 | Add to My Program |
Surface Normal Estimation As Always-On Perception for Vision-Based Robots |
|
Bae, Gwangbin | Imperial College London |
Davison, Andrew J | Imperial College London |
|
13:30-18:00, Paper ThE-EX.5 | Add to My Program |
SCALER-B: A Multi-Modal Versatile Robot for Simultaneous Locomotion and Grasping |
|
Tanaka, Yusuke | University of California, Los Angeles |
Schperberg, Alexander | University of California Los Angeles |
Hong, Dennis | UCLA |
|
13:30-18:00, Paper ThE-EX.6 | Add to My Program |
Remote Control of a Water Hydraulically Driven Robot through Thin and Long 20 m Tubes |
|
Nakamura, Yuki | The University of Electro-Communications |
Noda, Tomoyuki | ATR Computational Neuroscience Laboratories |
Yoshimura, Shuto | The University of Electro-Communications |
Nakata, Yoshihiro | The University of Electro-Communications |
|
13:30-18:00, Paper ThE-EX.7 | Add to My Program |
A Sturdy, Two-Body Robot for Handlebar Placement in Any Location |
|
Bolli, Roberto | MIT |
Asada, Harry | MIT |
|
13:30-18:00, Paper ThE-EX.8 | Add to My Program |
Multi-Platform Mixed-Reality Robotics Teaching and Learning System |
|
Zhao, Xinyan | The Chinese University of Hong Kong |
Shang, Siqi | Columbia University |
Lau, Darwin | The Chinese University of Hong Kong |
Lee, Jimmy H.M. | The Chinese University of Hong Kong |
|
13:30-18:00, Paper ThE-EX.9 | Add to My Program |
Tac3D: Robot Fingertip Multimodal Tactile Sensor |
|
Zhang, Lunwei | Tsinghua University |
Zhou, Yen Hang | Tsinghua University |
Sui, Ruomin | Tsinghua University |
Jiang, Yao | Tsinghua University |
|
13:30-18:00, Paper ThE-EX.10 | Add to My Program |
Minimalistic Grasping Automation - Multimodal Soft Gripper |
|
Dontu, Saikrishna | Singapore University of Technology and Design |
Kanhere, Elgar | Singapore University of Technology and Design |
Valdivia y Alvarado, Pablo | Singapore University of Technology and Design, MIT |
Stalin, Thileepan | Singapore University of Technology and Design |
|
13:30-18:00, Paper ThE-EX.11 | Add to My Program |
Robot Teleoperation in Constrained Spaces with a SLIM End Effector |
|
Thomasson, Rachel | Stanford University |
Bernardini, Alessandra | University of Bologna |
Zhu, Peizhang | Flexiv Ltd. |
Zhang, Zhipeng | Flexiv Ltd. |
Cutkosky, Mark | Stanford University |
|
13:30-18:00, Paper ThE-EX.12 | Add to My Program |
Bidirectional Haptic Transmission through a Bracelet Device Using a Sensory Equivalence Conversion of High-Frequency Vibration |
|
Nida, Takaya | Tohoku University |
Matsubara, Toru | Tohoku University |
Waga, Masamune | Tohoku University |
Konyo, Masashi | Tohoku University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-18:00, Paper ThE-EX.13 | Add to My Program |
Cyber-Physical Interactions through a Human Coincident Robot |
|
Sasaki, Tomoya | Tokyo University of Science |
Watanabe, Takafumi | Preferred Robotics Inc. |
Inami, Masahiko | The University of Tokyo |
Yoshida, Eiichi | Tokyo University of Science |
|
13:30-18:00, Paper ThE-EX.14 | Add to My Program |
Human Support Robot |
|
Okada, Hiroyuki | Tamagawa University |
Contreras-Toledo, Luis Angel | Tamagawa University |
Mizutani, Akinobu | Kyushu Institute of Technology |
Tamukoh, Hakaru | Kyushu Institute of Technology |
|
13:30-18:00, Paper ThE-EX.15 | Add to My Program |
Design of a Self-Righting Shell for a Robotic Hexapod |
|
King, Katelyn | University of Michigan |
Revzen, Shai | University of Michigan |
|
13:30-18:00, Paper ThE-EX.16 | Add to My Program |
On Bringing Robots Home |
|
Shafiullah, Nur Muhammad (Mahi) | New York University |
Etukuru, Haritheja | New York University |
Pinto, Lerrel | New York University |
|
13:30-18:00, Paper ThE-EX.17 | Add to My Program |
Contact-Safe Lead Screw Actuator with Simple-Shaped Magnets |
|
Heya, Akira | Nagoya University |
Nakata, Yoshihiro | The University of Electro-Communications |
|
13:30-18:00, Paper ThE-EX.18 | Add to My Program |
Omnidirectional Crawler Mechanisms with Circular Cross Section |
|
Tadakuma, Kenjiro | Osaka University |
Sano, Shunsuke | Tohoku University |
Kayawake, Ryotaro | Tohoku University |
Abe, Kazuki | Osaka University |
Watanabe, Masahiro | Osaka University |
Tadakuma, Riichiro | Yamagata University |
Tadokoro, Satoshi | Tohoku University |
|
13:30-18:00, Paper ThE-EX.19 | Add to My Program |
Demonstration of Fully 3D Printable Robot Hand and Soft Tactile Sensor Based on Air-Pressure and Capacitive Proximity Sensing |
|
Taylor, Sean | University of Illinois at Urbana-Champaign |
Park, Kyungseo | Daegu Gyeongbuk Institute of Science and Technology (DGIST) |
Yamsani, Sankalp | University of Illinois Urbana-Champaign |
Kim, Joohyung | University of Illinois at Urbana-Champaign |
|
13:30-18:00, Paper ThE-EX.20 | Add to My Program |
Adaptive Speed Control of Treadmill Device for Enhanced Exercise in Virtual and Mixed Reality Experiences |
|
Manríquez-Cisterna, Ricardo | Tohoku University |
Breuss, Alexander | Sensory-Motor Systems Lab, Institute of Robotics and Intelligent Systems, ETH Zurich |
Gnarra, Oriella | ETH Zurich |
Peña Queralta, Jorge | ETH Zürich |
Ravankar, Ankit A. | Tohoku University |
Salazar Luces, Jose Victorio | Tohoku University |
Paez Granados, Diego Felipe | ETH Zurich |
Riener, Robert | Eidgenössische Technische Hochschule (ETH) Zürich |
Hirata, Yasuhisa | Tohoku University |
|
13:30-18:00, Paper ThE-EX.21 | Add to My Program |
A Rigid-Flexible Coupled Robotic Arm for Efficient and Accurate Harvest |
|
Chen, BinHao | Shanghai Jiao Tong University |
Gong, Liang | Shanghai Jiao Tong University |
Luo, Cheng | Shanghai Jiao Tong University |
Chen, Jiayu | Shanghai Jiao Tong University |
Sun, Yefeng | Shanghai Jiao Tong University |
Bishu, Gao | Shanghai Jiao Tong University |
Chen, Feifei | Shanghai Jiao Tong University |
Li, Yanming | Shanghai Jiao Tong University |
Huang, Yixiang | Shanghai Jiao Tong University |
Liu, Chengliang | Shanghai Jiao Tong University |
|
13:30-18:00, Paper ThE-EX.22 | Add to My Program |
Onix: Rescue Robotics for Power Plant Inspection and Complex Terrain Navigation |
|
Kojima, Shotaro | Tohoku University |
|
13:30-18:00, Paper ThE-EX.23 | Add to My Program |
Recognition of Coffee Roasting Degree Using an Intelligent Multi-Spectral Vision System |
|
Lin, Ming-Yi | Yuan Ze University |
|
13:30-18:00, Paper ThE-EX.24 | Add to My Program |
Gaussian Splatting SLAM |
|
Matsuki, Hidenobu | Imperial College London |
Murai, Riku | Imperial College London |
Kelly, Paul H J | Imperial College London |
Davison, Andrew J | Imperial College London |
|
13:30-18:00, Paper ThE-EX.25 | Add to My Program |
Bimanual Torque Research Reference Platform Demo |
|
Bien, Seongjin | Technical University of Munich |
Eberle, Felix | Technical University of Munich |
Vorndamme, Jonathan | Chair of Robotics and Systems Intelligence, Technical University of Munich |
Škerlj, Jon | Technical University of Munich |
Figueredo, Luis | University of Nottingham (UoN) |
Naceri, Abdeldjallil | Technical University of Munich |
Haddadin, Sami | Technical University of Munich |
|
13:30-18:00, Paper ThE-EX.26 | Add to My Program |
Demonstration of Novel TRUST-Based Upper Limb Support Exoskeleton |
|
Seong, Hyeonseok | Korea Advanced Institute of Science and Technology |
Farag, Seif | Korea Advanced Institute of Science and Technology |
Lee, Sang Wook | Samsung Heavy Industries Co., Ltd. |
Gaponov, Igor | University College London |
Ryu, Jee-Hwan | Korea Advanced Institute of Science and Technology |
|
13:30-18:00, Paper ThE-EX.27 | Add to My Program |
CoFRIDA: A Human-Robot Collaborative Drawing Demonstration |
|
Schaldenbrand, Peter | Carnegie Mellon University |
Parmar, Gaurav | Carnegie Mellon University |
Zhu, Jun-Yan | Carnegie Mellon University |
McCann, James | Carnegie Mellon University |
Oh, Jean | Carnegie Mellon University |
|
13:30-18:00, Paper ThE-EX.28 | Add to My Program |
Automotive Workloads Based on Autoware's Open AD Kit and PIXKit 3.0 Autonomous Developer Chassis |
|
Carballo, Alexander | Gifu University |
Walmroth, David | PIX Moving Inc. |
Wong, David | Nagoya University |
Kütük, Samet | LeoDrive |
| |